1 Introduction

The Vertex Coloring Problem (VCP) asks for the minimum number of colors needed to color the vertices of a graph \(G \) so that no two adjacent vertices share a color. This number, denoted \(\chi ({G})\), is called the chromatic number of the graph.

The VCP has numerous applications. For instance, when allocating frequencies, devices at nearby locations should operate on different frequencies to avoid interference. The chromatic number of the distance-induced graph is thus the minimum span of required frequencies [1, 23]. In compilers, finding an optimal register allocation is a coloring problem on an interference graph of value live ranges [6]. In timetabling, assigning time slots to lectures so that no two classes attended by a common subset of students happen in parallel is a VCP [8].

The best performing approaches to the VCP often do not scale to extremely large graphs such as, for instance, social networks. In fact, on networks with several million nodes, even local search methods are seldom used, and the best approaches rely on scale reduction and greedy heuristics for both lower and upper bounds [16, 28]. Indeed, the main technique for reducing the graph consists in removing vertices whose degree is lower than some lower bound on the chromatic number. This technique can be very effective on sparse graphs, especially when a maximum or maximal clique provides a good lower bound. Several real-world extremely large sparse graphs can be efficiently tackled, even by complete algorithms, after such preprocessing. However, even relatively sparse graphs can have a large core of vertices whose degree within the core is higher than the chromatic number. In this case, there are few practical techniques for upper bounds, and most proposed approaches rely on greedy heuristics, in particular Brelaz's Dsatur [5]. Likewise, in this context there is virtually no method for computing a lower bound other than finding a large clique in the graph. As a result, there is little hope of optimally solving an instance with a large core whose chromatic number is strictly larger than the size of its largest clique.

In this paper we consider two datasets of very large graphs. The first, dimacs10, contains 30 graphs from the 10th DIMACS challenge [3]. It consists of two subclasses: graphs with a heavy-tailed degree distribution, and quasi-regular graphs. The second, snap, contains 75 graphs from the Stanford Large Network Dataset Collection [15]. These graphs correspond to social, citation, collaboration, communication, road or internet networks. They range from tens of thousands to several million vertices and all have extremely low density.

Whereas about half of these graphs are easy or even trivial for state-of-the-art approaches, the rest remain too large and hard to color even after preprocessing. By combining several methods, including local search, heuristics and complete algorithms, we can solve a significant proportion (close to 40%) of these hardest instances to optimality, even when they contain hundreds of thousands of vertices after preprocessing and even when their chromatic number is larger than their clique number. We survey the related work in Sect. 2, describe our main contribution, a method to obtain good lower bounds on very large graphs, in Sect. 3, an effective local search approach to obtain good upper bounds in Sect. 4, and a way to combine these in Sect. 5. We report on an experimental comparison with the state of the art in Sect. 6.

2 Related Work

Heuristic methods are very relevant since they easily scale to very large inputs. In particular, the Dsatur heuristic proposed by Brelaz [5] is instrumental in the state-of-the-art method on the datasets we consider [16]. The Dsatur heuristic builds a coloring \(C \) mapping vertices to colors. It iteratively chooses a vertex from a set \(U\) initially containing all vertices \(V \) of the graph. The chosen vertex \(v \) is the one with maximum saturation degree \(\delta ^{sat}({v})\), defined as the number of distinct colors among its colored neighbors, i.e., \(\delta ^{sat}({v}) = |\{C ({u}) \mid u \in (N({v}) \setminus U)\}|\), where \(N({v})\) denotes the neighborhood of \(v\). In case of a tie, the vertex with maximum degree \(|N({v})|\) is selected. Then it sets \(C ({v})\) to the smallest possible color \(\min (\mathbb {N} \setminus \{C ({u}) \mid u \in (N({v}) \setminus U)\})\).
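To make the selection and assignment rules concrete, here is a minimal Python sketch of the heuristic (a sketch only: the dictionary-based graph representation and the linear scans are ours for readability; efficient implementations maintain dedicated priority structures):

```python
from collections import defaultdict

def dsatur(adj):
    """Greedy Dsatur coloring; `adj` maps each vertex to its set of neighbors."""
    colors = {}                # the coloring C
    sat = defaultdict(set)     # colors present in each vertex's neighborhood
    uncolored = set(adj)       # the set U
    while uncolored:
        # maximum saturation degree, ties broken by maximum degree
        v = max(uncolored, key=lambda u: (len(sat[u]), len(adj[u])))
        # smallest color absent from v's colored neighborhood
        c = next(c for c in range(len(adj)) if c not in sat[v])
        colors[v] = c
        for u in adj[v]:
            sat[u].add(c)
        uncolored.remove(v)
    return colors
```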

Dsatur-based branch and bound algorithms [9, 26] are among the best complete methods, alongside column generation approaches [18, 20] and SAT-based models and hybrid algorithms [11, 25, 27, 30]. However, none of these scale to graphs with more than a few thousand vertices.

2.1 Local Search

Local search and meta-heuristics have long been applied to graph coloring (e.g. [12]), and with great success. All the best known colorings on the commonly used dataset from the second DIMACS challenge [13] were obtained by such methods.

In principle, local search approaches seem very well suited to coloring large graphs, and indeed most algorithms scale very well to relatively large graphs. Surprisingly, however, we could not find a report of a local search or meta-heuristic approach applied to the large graphs of the snap and dimacs10 datasets, or to graphs of similar magnitude.

When the number of vertices grows very large, one must be very careful about implementation details. As a matter of fact, several off-the-shelf algorithms we tried use data structures with a space complexity quadratic in the number of vertices, and are de facto irrelevant. Another critical point is the size of the neighborhood. For instance, the most common tabu scheme considers all the (non-tabu) moves of any node sharing a color with a neighbor to a different color. Typical methods evaluate every such move and choose the one that most decreases the number of conflicts. The number of such moves to consider in a graph with millions of vertices can be prohibitive, especially when starting from low quality initial solutions. The state-of-the-art memetic algorithm HEAD [21] uses a similar tabu search, and although we made superficial changes to make it capable of loading massive graphs in memory, it performed poorly on them. After a non-exhaustive review of the literature and of the available software, our belief is that these methods could be adapted to extremely large and sparse graphs, but that it would require non-trivial implementation work.

Blöchliger and Zufferey’s local search algorithm [4] appears to be relatively promising in this context. The idea is to try to complete a partial coloring, i.e., a partition of the vertices into k disjoint independent sets \(\{C_{1},\ldots ,C_{k}\}\) plus an extra set \(U\) of “uncolored” vertices. A move consists in swapping a node \(v \in U\) with the vertices \(N({v}) \cap C_{i}\) for some color \(i \in \{1,\ldots ,k\}\). A move \(({v},{i})\) minimizing \(|N({v}) \cap C_{i}|\) is randomly chosen. In order to escape local minima, after each move \(({v},{i})\), the moves \(({u},{i})\) for \(u \in N({v})\) are added to a tabu list so that \(v \) will keep color i for a given number of iterations. When the set \(U\) becomes empty, a k-coloring is obtained and the process can continue by randomly eliminating one color i, that is, setting \(U= C_{i}\) and removing \(C_{i}\) from the partition.
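A hedged sketch of one such move (the dictionary-based tabu list, the `tenure` parameter and the assumption that at least one non-tabu move exists are ours):

```python
import random

def partial_coloring_move(adj, U, classes, tabu, it, tenure):
    """One move of the scheme: insert some v from U into a color class C_i,
    ejecting its neighbors N(v) & C_i back into U."""
    best_cost, candidates = None, []
    for v in U:
        for i, Ci in enumerate(classes):
            if tabu.get((v, i), -1) > it:      # move (v, i) is currently tabu
                continue
            cost = len(adj[v] & Ci)            # vertices that would be ejected
            if best_cost is None or cost < best_cost:
                best_cost, candidates = cost, [(v, i)]
            elif cost == best_cost:
                candidates.append((v, i))
    v, i = random.choice(candidates)           # random among the minimizing moves
    ejected = adj[v] & classes[i]
    classes[i] -= ejected
    classes[i].add(v)
    U.remove(v)
    U |= ejected
    for u in adj[v]:                           # keep v colored i for a while
        tabu[(u, i)] = it + tenure
    return ejected
```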

2.2 Independent Set Extraction

Whereas sequence-based coloring heuristics (such as Dsatur) explore the vertices and insert them into the smallest possible color class (or independent set), Leighton's RLF heuristic [14] extracts one maximal independent set (or color class) at a time. This technique has been shown to be more effective than Dsatur on some graphs, although it has a higher computational cost.

Recent effective methods for large graphs rely on this principle. For instance, Hao and Wu [10] recently proposed a method which iteratively extracts maximal independent sets until the graph contains no more than a given number of vertices. Then, any algorithm can be used on the residual graph to produce a k-coloring, which can be trivially extended to a \((k+p)\)-coloring of the whole graph if p independent sets have been extracted. Moreover, the authors show that it may be effective to iteratively expand the residual graph by re-inserting the vertices of some independent set extracted in the first phase and to run the coloring method again on the larger residual graph. This method, however, was not tested on graphs larger than a few thousand vertices.

2.3 Peeling-Based Approaches

The so-called “peeling” procedure is an efficient scale reduction technique introduced by Abello et al. [2] for the maximum clique problem. Since the vertices of a \((k+1)\)-clique each have at least k neighbors, one can ignore vertices of degree \(k-1\) or less. As observed in [28], this procedure corresponds to restricting the search to the maximum \(\chi ^{low}\)-core of \(G \), where \(\chi ^{low}\) is some lower bound on \(\omega ({G})\):

Definition 1

(k-core and degeneracy). A subset \(S \subseteq V \) is called a k-core of the graph \(G = (V,E)\) if the minimum degree of any vertex in the subgraph of \(G \) induced by S is at least k. The maximum value of k for which \(G \) has a non-empty k-core is called the degeneracy of \(G \).

As observed by Verma et al. [28], the peeling technique can also be used for graph coloring, since low-degree vertices can be colored greedily.

Theorem 1

(Verma et al. 2015). \(G \) is k-colorable if and only if the maximum k-core of \(G \) is k-colorable.

Indeed, starting from a k-coloring of the maximum k-core of \(G \), one can explore the vertices of \(G \) that do not belong to the core and add them back in the inverse of the degeneracy order, so that any vertex is preceded by at most \(k-1\) of its neighbors, and hence can be colored without introducing a \((k+1)\)-th color. The other direction is trivial since the maximum k-core is a subgraph of \(G \).
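As a sketch, assuming `peeled` lists the removed vertices in the order in which peeling eliminated them:

```python
def extend_coloring(adj, core_coloring, peeled):
    """Extend a k-coloring of the maximum k-core to the whole graph by
    re-inserting peeled vertices in reverse elimination order; each sees at
    most k-1 colored neighbors on re-insertion, so no new color is needed."""
    colors = dict(core_coloring)
    for v in reversed(peeled):
        used = {colors[u] for u in adj[v] if u in colors}
        colors[v] = next(c for c in range(len(used) + 1) if c not in used)
    return colors
```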

This preprocessing technique can be extremely effective on very sparse graphs, and computing a lower bound on the chromatic number is relatively easy: computing the clique number of a graph is NP-hard, but in practice it is much easier than computing its chromatic number. However, the \(\chi ^{low}\)-core might be too large, and therefore a second use of the peeling technique was proposed in [28]. The idea is to find a coloring of the maximum \((\chi ^{up}-1)\)-core of \(G \), where \(\chi ^{up}\) is an upper bound on \(\chi ({G})\). The maximum \((\chi ^{up}-1)\)-core has several good properties: it is often small, its chromatic number is a lower bound on \(\chi ({G})\), and if it has a k-coloring with \(k<\chi ^{up}\), then this coloring can be extended, in the worst case, to a \((\chi ^{up}-1)\)-coloring of \(G \).

Therefore, Verma et al. proposed the following method: starting from the bounds \(\chi ^{low}\le \chi ({G}) \le \chi ^{up}\), the algorithm colors the maximum \((\chi ^{up}-1)\)-core of \(G \) optimally, and greedily extends the corresponding k-coloring, following the inverse degeneracy order, to a \(k'\)-coloring of \(G\). Then it sets \(\chi ^{low}\) to \(\max (\chi ^{low},k)\) and \(\chi ^{up}\) to \(k'\). The algorithm converges since \(\chi ^{low}\) cannot decrease and \(\chi ^{up}\) is guaranteed to decrease at each step.
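In outline, with `max_k_core`, `exact_chromatic` and `extend_greedily` as assumed helpers, the loop reads:

```python
def verma_et_al(G, chi_low, chi_up):
    """Sketch of the peeling loop of Verma et al. (2015)."""
    while chi_low < chi_up:
        core = max_k_core(G, chi_up - 1)    # maximum (chi_up - 1)-core
        k = exact_chromatic(core)           # must be solved to optimality
        k2 = extend_greedily(G, core, k)    # reverse degeneracy order
        chi_low = max(chi_low, k)           # chi(core) <= chi(G)
        chi_up = k2                         # guaranteed to decrease
    return chi_up                           # chi_low == chi_up == chi(G)
```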

Unfortunately, some graphs simply do not have small k-cores, even for k larger than their chromatic number, so this method is limited to extremely sparse graphs. Moreover, notice that the core must be solved to optimality in order to extract relevant information from the iteration and converge.

The algorithm proposed by Lin et al. [16] also uses peeling, but in a slightly different way. A k-bounded independent set is an independent set whose vertices all have degree strictly smaller than k. Their method iteratively finds a maximal clique using a very effective sampling-based heuristic; removes from the graph a \(\chi ^{low}\)-bounded independent set, where \(\chi ^{low}\) is the size of the clique; and computes an upper bound using the Dsatur heuristic.

This method is very effective, outperforming the approach of Verma et al. on graphs with large cores. However, notice that the vertices of a \(\chi ^{low}\)-bounded independent set cannot belong to a \(\chi ^{low}\)-core, since their degree is strictly less than \(\chi ^{low}\); therefore this variant of peeling is less effective than Verma et al.'s. The two main components are the method to find a clique and the Dsatur heuristic to find upper bounds. The former essentially samples a set of vertices to be expanded into a maximal clique. When extending a clique, a number p of neighbors are probed and the one that maximizes the size of the residual candidate set of vertices to expand the clique is chosen. Several runs are performed, with the parameter p growing exponentially at every run. However, this method cannot prove a lower bound greater than the clique number. The runs of Dsatur are randomized and augmented with the recolor technique [24]: when a new color class i is created for a vertex \(v \), if there exist two color classes \(C_{j},C_{k}\) with \(j<k\) and a vertex \(u \) such that \(N({v}) \cap C_{j} = \{u \}\) and \(N({u}) \cap C_{k} = \emptyset \), then \(v \) and \(u \) can be recolored to j and k respectively, thus leaving the color i free.
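A sketch of that recolor step (color classes as sets of vertices; the function name and representation are ours):

```python
def try_recolor(adj, classes, v):
    """Before opening a new color class for v, look for classes j < k and a
    vertex u with N(v) & C_j == {u} and N(u) & C_k empty; if found, put v in
    C_j and u in C_k instead of creating a new class."""
    for j, Cj in enumerate(classes):
        common = adj[v] & Cj
        if len(common) != 1:
            continue
        (u,) = common
        for k in range(j + 1, len(classes)):
            if not adj[u] & classes[k]:
                Cj.remove(u)          # u leaves C_j ...
                Cj.add(v)             # ... making room for v
                classes[k].add(u)     # u moves to C_k
                return True
    return False
```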

3 Iterated Dsatur

By far the most common lower bound technique is to find a large clique. Several other lower bounds have been used. For instance, two extra lower bounds were proposed in [9]: the Lovász Theta number [17] and a second lower bound based on a mapping between coloring and independent sets on a reformulation of the graph [7]. Another lower bound based on finding embedded Mycielskian graphs [22] was proposed in [11]. Moreover, the bounds obtained by linear relaxation of either the standard model or the set covering problem from the branch & price approach are very strong. However, it is difficult to make any of these methods scale up to graphs with millions of vertices.

Many graphs of the dimacs10 and snap datasets have a chromatic number equal to their clique number. Moreover, finding a maximum clique turns out to be much easier in practice than solving the VCP. Therefore, it is often possible to find a maximum clique, which then provides a good lower bound.

In this section, we introduce a method to solve the VCP that scales up to very large graphs. Moreover, it may compute non-trivial lower bounds, that is, larger than the clique number. As a consequence, this method can produce optimality proofs, even when \(\omega ({G}) < \chi ({G})\). The principle is to iteratively compute a coloring with Dsatur, and optimize its prefix up to the first occurrence of the color \(\chi ^{low}+1\). If there exists a \(\chi ^{low}\)-coloring of the prefix, then the next iteration of Dsatur will follow the optimized prefix, whose length will thus increase. Otherwise, the lower bound can be incremented.

[Algorithm 1: pseudocode of the iterated Dsatur scheme]

Algorithm 1 uses a variant of Dsatur which takes a total order O of a subset of the vertices and a coloring C of these vertices. It first assigns those vertices in the given order and coloring, then colors the remaining vertices using the standard Dsatur heuristic. It returns the coloring C as well as the total order \(O=\left\langle o_1, \ldots , o_{|V |} \right\rangle \) that it followed. In the following, we write \(\max (C)\) for the maximum color used, and \(C ({v})\) for the color of \(v\).

Algorithm 1 proceeds as follows. Given initial bounds \(\chi ^{low}\) and \(\chi ^{up}\), as well as a coloring and an ordering that witness the upper bound, we extract the core graph, which is the subgraph \(G _{O^{1}}\) of \(G\) induced by the vertices \(\{o_1, \ldots , o_p\}\), where p is the maximum index for which all vertices \(o_1, \ldots , o_{p-1}\) are assigned colors in \([1, \chi ^{low}]\). In other words, p is the index of the first vertex that is assigned a color greater than the current lower bound \(\chi ^{low}\). The order of these p vertices is fixed for all subsequent runs of Dsatur. We then compute \(\chi ({G _{O^{1}}})\), using any exact coloring algorithm; in our implementation this is the satisfiability-based algorithm from [11]. If \(\chi ({G _{O^{1}}}) > \chi ^{low}\), then we can update \(\chi ^{low}= \chi ({G _{O^{1}}})\). This is because \(G _{O^{1}}\) is an induced subgraph of \(G\), so \(\chi ({G _{O^{1}}})\) is a lower bound on \(\chi ({G})\). On the other hand, if \(\chi ({G _{O^{1}}}) \le \chi ^{low}\), we fix the first p vertices to their order, color them as in the optimal coloring of \(G _{O^{1}}\), and use them as the starting point for a new run of Dsatur. In either case, we proceed to the next iteration.

Algorithm 1 converges because at every iteration a growing subset of the vertices is included in the core. Indeed, if \(\chi ({G _{O^{i}}}) > \chi ^{low}\), then the lower bound is increased, which means that \(G _{O^{i+1}}\) is larger. If \(\chi ({G _{O^{i}}}) \le \chi ^{low}\), then the next run of Dsatur is constrained to assign at least \(o_{p}\) a color in \([1, \chi ^{low}]\), so the core graph at the next iteration contains at least one more vertex. In the extreme, the algorithm terminates when \(G _{O^{i}} = G \).
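The following sketch summarizes the loop (colors numbered from 1; `exact_color`, `induced_subgraph` and `dsatur_with_prefix` stand respectively for the exact coloring algorithm, subgraph extraction, and the Dsatur variant described above, and are assumed helpers):

```python
def iterated_dsatur(G, chi_low, chi_up, order, coloring):
    """Sketch of Algorithm 1; `order` and `coloring` witness chi_up."""
    while chi_low < chi_up:
        # index of the first vertex whose color exceeds chi_low
        p = next(i for i, v in enumerate(order) if coloring[v] > chi_low)
        core = induced_subgraph(G, order[:p + 1])
        chi_core, optimal = exact_color(core)
        if chi_core > chi_low:
            chi_low = chi_core    # core is induced, so chi(core) <= chi(G)
        else:
            # fix the prefix and its optimal coloring, finish with Dsatur
            coloring, order = dsatur_with_prefix(G, order[:p + 1], optimal)
            chi_up = min(chi_up, max(coloring.values()))
    return chi_low, chi_up        # equal on exit: chi(G)
```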

4 Local Search for Massive Graphs

As far as we know, the best upper bounds for the datasets we consider were obtained either with Brelaz's heuristic [16] or by greedily extending the optimal solution of a k-core [28]. Therefore, whether local search can help remains to be seen. In this section we describe the modifications we made to Blöchliger and Zufferey's tabu-search algorithm in order to adapt it to extremely large graphs.

Initialization. A first very modest, but significant, addition is a method to efficiently initialize the solution of the local search. The algorithm described in [4] is given an integer k and tries to find a k-coloring. Since our method produces colorings during preprocessing (from the computation of the degeneracy ordering and from Dsatur), it is straightforward to initialize the solution with such a coloring, whereby the vertices of one color class are considered “uncolored”. However, we observed that it is important to choose a small color class: the classes can be extremely unbalanced, and choosing randomly could lead to a prohibitively large neighborhood to explore in the initial steps.

Chained Flat Moves. Recall that a move consists in swapping a node \(v \) from the set \(U\) of uncolored vertices with its neighbors \(N({v}) \cap C_{i}\) in some color class i. When \(N({v}) \cap C_{i} = \emptyset \), this is an improving move, as we have one fewer uncolored node. We call a move \((v,i)\) such that \(N({v}) \cap C_{i} = \{u \}\) for some vertex \(u\) a flat move. When a flat move is chosen, we know that no strictly improving move was possible, so if there is an improving or a flat move involving \(u \), it is likely to be selected next. Therefore, in the event of a flat move, we greedily follow chains of flat moves from the previously ejected vertex until reaching an improving move, or until no flat or improving move is possible for that vertex. This technique does not change the neighborhood, but explores it in a greedier way, and is often beneficial. Moreover, we observed that it is relatively easy to assess whether such moves are effective, by counting how many of them lead to an improving move and by checking their length.
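A sketch of this chained exploration, ignoring tabu status for brevity (`u` is the vertex ejected by the initial flat move; the `max_len` cap anticipates the limit discussed below):

```python
def follow_flat_chain(adj, classes, U, u, max_len):
    """Greedily chain moves on the ejected vertex: stop on an improving move
    (nothing ejected) or when only worsening moves remain."""
    length = 0
    while length < max_len:
        # best class for u: fewest neighbors to eject
        cost, i = min((len(adj[u] & Ci), i) for i, Ci in enumerate(classes))
        if cost > 1:
            break                   # only worsening moves remain
        ejected = adj[u] & classes[i]
        classes[i] -= ejected
        classes[i].add(u)
        U.discard(u)
        U |= ejected
        length += 1
        if cost == 0:
            break                   # improving move: the chain succeeded
        (u,) = ejected              # flat move: continue from the ejected vertex
    return length
```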

[Algorithm 2: pseudocode of the modified tabu search]

Algorithm 2 is a pseudo-code of our implementation of Blöchliger and Zufferey's tabu search. We denote by \(C_{i}\) the set of vertices of color i, that is, \(C_{i} = \{v \mid C ({v})=i\}\). The outer loop and the color selection in line 1 are not in the original implementation, nor is the random path of flat moves corresponding to lines 2 to 3. Notice that ties are broken randomly in every “\(\mathrm {arg\, min}\)” operator. Moreover, the management of the tabu list (\(T_v^i\)) and of the iteration limit, as well as the choice of applying a random path move, are more complex than the pseudo-code shows. We set the parameters as follows.

Tabu List. Here we used a relatively straightforward scheme, which is in fact a simplified version of what is done in the original code. Every 10000 iterations, the tabu tenure parameter \(t\) is decremented, unless it is zero or the difference between the smallest and largest size of \(U\) (the set of “uncolored” vertices) since the last update of the tabu tenure is at most 1. In both of the latter cases, \(t\) is increased by its initial value (which was 10 in all our experiments).

Iteration Limit. In order to dynamically adapt the number of iterations to the progress made by the tabu search, we used the following policy: Let k be the current number of iterations and \(I\) the current limit. When the limit is reached within the outer loop, we check if there was any progress on the upper bound \(\chi ^{up}\) since the last limit update. If there was some progress, then we increase the limit by the current number of iterations (\(I= I+ k\)). Now, let \(\delta \) be the value of \(I- k\) at the start of the inner loop. When the limit is reached within the inner loop, we check if there was any progress on the number of uncolored vertices (\(|U|\)) since the last limit update. If there was some progress, then we increase the limit by \(\delta \), otherwise we increase it by \(\delta /2\). We used an initial limit of 250000.
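Both update rules can be stated compactly (a sketch; the bookkeeping of “progress since the last update” is left implicit):

```python
def new_outer_limit(limit, iterations, upper_bound_improved):
    """Outer loop: extend the limit by the iteration count only on progress."""
    return limit + iterations if upper_bound_improved else limit

def new_inner_limit(limit, delta, uncolored_decreased):
    """Inner loop: extend by delta on progress, by delta / 2 otherwise."""
    return limit + (delta if uncolored_decreased else delta // 2)
```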

Limit on Chains of Flat Moves. In some cases it is possible to explore very long paths of flat moves, which slows down the algorithm. We introduce a parameter p (initially set to 1) controlling the probability 1/p of preferring such moves. We then simply track the average length l of these chains and their frequency f, and adjust p accordingly. In practice, we double p when \(l \times f \ge 20\) and decrement it when it is strictly greater than 1 and \(l \times f \le 3\).
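As a sketch, the adjustment of p reads:

```python
def adjust_p(p, avg_length, frequency):
    """Adapt the probability 1/p of following chains of flat moves."""
    load = avg_length * frequency
    if load >= 20:
        return 2 * p      # chains are too costly: make them rarer
    if p > 1 and load <= 3:
        return p - 1      # chains are cheap: allow them more often
    return p
```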

[Algorithm 3: pseudocode of the overall approach]

5 Overall Approach

Our approach combines the peeling preprocessing from Sect. 2, the tabu search described in Sect. 4 and the iterated Dsatur scheme described in Sect. 3.

The principle we use for choosing the exact sequence of techniques is to apply first those that have the greatest effect for the least computational cost. Therefore, we first compute the degeneracy of the graph together with the smallest-last ordering [19] O, which is the order in which vertices are processed by the degeneracy algorithm, and the array D, which contains the degree of each vertex at the time of its elimination. The degeneracy itself is only implicitly contained there, as the maximum value in the array, and this value plus one is an upper bound on the chromatic number. We also compute a lower bound \(\chi ^{low}\) by finding a clique. Using this lower bound and the order O, we can compute the peeled graph H by removing the vertices whose degree in D is smaller than \(\chi ^{low}\).
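A bucket-based sketch of this computation, which runs in \(O(|V|+|E|)\) (the representation is ours):

```python
from collections import defaultdict

def smallest_last(adj):
    """Return the smallest-last order, the per-vertex elimination degrees,
    and the degeneracy (their maximum)."""
    deg = {v: len(adj[v]) for v in adj}
    buckets = defaultdict(set)
    for v, d in deg.items():
        buckets[d].add(v)
    order, elim, removed = [], {}, set()
    d = 0
    for _ in range(len(adj)):
        while not buckets[d]:            # find the current minimum degree
            d += 1
        v = buckets[d].pop()
        order.append(v)
        elim[v] = d                      # degree of v at elimination time
        removed.add(v)
        for u in adj[v]:                 # update the remaining degrees
            if u not in removed:
                buckets[deg[u]].discard(u)
                deg[u] -= 1
                buckets[deg[u]].add(u)
        d = max(0, d - 1)                # degrees dropped by at most one
    return order, elim, max(elim.values())
```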

Although finding a maximum clique is NP-hard, it turns out to be much easier than coloring on the datasets we used, so we solve this problem exactly rather than heuristically. Doing so also has a great effect on the rest of the algorithm, as a better initial lower bound results in greater scale reduction from peeling and hence improves all heuristics used further on.

After peeling, we first improve the upper bound using the Dsatur heuristic and then local search. Finally, we switch to iterated Dsatur, which is exact and hence the most computationally expensive phase.

One complication is that the iterated Dsatur phase is initialized with the current best solution. If this solution was found by the local search algorithm, there is no ordering that Algorithm 1 can use to extract a core. We can produce a relevant ordering from the local search solution simply by sorting the vertices by saturation degree within the local search coloring, as shown in line 2. However, this coloring may not use the smallest colors for the first vertices in the order; therefore, we apply the following transformation:

We run Dsatur following the ordering O. When processing node \(v \), we check whether the color \(C ({v})\) assigned to \(v \) by the tabu search has already been mapped to some color; if not, we map it to the minimum color c that \(v \) can take and assign c to \(v \). We do the same if the color \(C ({v})\) happens to be already mapped to c. Otherwise, we switch to the standard Dsatur from that point on.

The resulting coloring is similar (at least in the prefix) to the local search solution, but it is in a form that might have been produced by Dsatur.
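A sketch of this transformation (`dsatur_with_prefix` is the variant of Sect. 3; `order` is the saturation-degree ordering above; the names are ours):

```python
def remap_ls_coloring(adj, ls_colors, order):
    """Rename the local search colors on the fly so that the prefix looks like
    a Dsatur run; fall back to standard Dsatur when renaming breaks."""
    mapping, colors = {}, {}
    for idx, v in enumerate(order):
        used = {colors[u] for u in adj[v] if u in colors}
        c = next(x for x in range(len(used) + 1) if x not in used)
        old = ls_colors[v]
        if old not in mapping:      # first occurrence of this LS color
            mapping[old] = c
        if mapping[old] != c:       # renaming inconsistent: give up here
            return dsatur_with_prefix(adj, order[:idx], colors)
        colors[v] = c
    return colors, order
```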

6 Experimental Results

Our implementation uses the maximum clique solver of [29] for finding the initial maximum clique, and a CDCL CSP solver as the underlying engine during the iterated Dsatur phase.

We compare it to the state of the art, the approach of Lin et al. [16]. Unfortunately, we could not compare with the approach described in [28], since the coloring part of this code is now lost. However, this latter approach is dominated by the approach of [16] on instances with large cores, hence on the hardest ones.

Every method was run 20 times with different random seeds, with a time limit of one hour and a memory limit of 10 GB. The memory limit was an issue only for the approach of [16], which exceeded it on 3 instances; we raised the limit to 50 GB in these three cases. We used 4 cluster nodes, each with 35 Intel Xeon CPU E5-2695 v4 2.10 GHz cores running Linux Ubuntu 16.04.4.

Table 1. CPU time (easy dimacs10 instances)
Table 2. CPU time (easy snap instances)

The first two columns of Tables 1, 2, 3 and 5 give the size of the graph (number of vertices/edges) before and after scale reduction. In all these tables, bold font is used to highlight the (strictly) best outcomes. In Tables 1 and 2 we report the CPU time in milliseconds for the “easy” instances of the dimacs10 and snap sets, respectively. We say that an instance is easy when both approaches solved it to optimality. We give the minimum, maximum and average CPU time – parsing excluded – across the 20 random runs on the same instance.

Table 3. Lower and upper bounds (hard dimacs10 instances)
Table 4. Summary (hard dimacs10 instances)

Tables 3 and 5 show the lower (\(\chi ^{low}\)) and upper bounds (\(\chi ^{up}\)) found by our approach and by the approach of [16] on the rest of the dataset (“hard” instances). For both the lower and the upper bound, we give the best and average value across the 20 random runs on the same instance. We use an asterisk (\(^*\)) to denote that the maximum lower bound found over the 20 runs is as high as the minimum upper bound, signifying that the method is able to close the instance. Moreover, for the results of our approach, we denote via a superscript in which phase the best outcome was found: 0 stands for the computation of the degeneracy ordering, 1 for the preprocessing phase, 2 for the local search and 3 for the iterated Dsatur phase.

Table 5. Lower and upper bounds (hard snap instances)
Table 6. Summary (hard snap instances)

Finally, Tables 4 and 6 give a summary view of the hard instances of the dimacs10 and snap datasets, respectively, with the arithmetic and geometric mean bounds, the overall ratio of optimality, and the overall mean CPU time.

We first observe that for many of these graphs (see Tables 1 and 2) finding an optimal coloring is easy. One reason is that their clique and chromatic numbers are equal. However, this is also the case for some graphs classified here as “hard”. Whereas we use a complete maximum clique algorithm in our approach, the approach of [16] does not, and yet it finds a maximum clique in all the “easy” graphs and in most of the “hard” ones. Moreover, both solvers were able to quickly find a maximum clique and an optimal coloring. In particular, many easy graphs are solved during the preprocessing phase, the maximum \((\chi ^{low}-1)\)-core being very small. Those graphs are therefore trivial both for the approach of [16] and for ours, which in fact behave similarly on them. There is a slight advantage to our method in terms of average run time, both on easy dimacs10 and on easy snap instances, which can presumably be attributed to our peeling method being more efficient than the independent set extraction of [16].

Of the hard dimacs10 instances in Table 3, all but kron_g500-logn16 are quasi-regular, i.e., every vertex has roughly the same degree. These graphs do not have small cores, hence the peeling phase is irrelevant. We can see that on these graphs the tabu search algorithm significantly outperforms Dsatur, and therefore our approach dominates for the upper bound. For instance, on ldoor, our approach finds colorings with 29.85 colors on average, whereas the best coloring found by the approach of [16] has 32 colors. On the instance kron_g500-logn16, the tabu search performs poorly and is on average dominated by the approach of [16]. In one run, however, the iterated Dsatur algorithm was able to find a much better coloring, using 6 fewer colors than the best one found by the approach of [16]. The aggregated results given in Table 4 show that our approach outperforms that of [16] both on the lower and the upper bounds on this dataset.

The iterated Dsatur algorithm is also able to improve the lower bound on 2 instances out of 8 (ldoor and G_n_pin_pout). However, for the latter, the approach of [16] produces the same lower bound (4), which is larger than the maximum clique we found. We do not know how to explain this.

On the hard instances of the snap dataset (Table 5), the picture is very different, with the tabu search in particular being almost useless. The best coloring found by our method was obtained during the local search phase only once, on the instance HR_edges. In all other cases the best coloring was produced either during preprocessing via Dsatur, or during the iterated Dsatur phase. Overall, as shown in Table 6, our method is slightly less efficient for the upper bound than the approach of [16], which repeatedly uses Dsatur and eventually finds better colorings on several instances, whilst ours is best on only four instances.

The iterated Dsatur phase, however, is very effective with respect to the lower bound. It improves on the maximum clique found during preprocessing in 25 out of 34 instances, and it matches the best upper bound on 14 instances. Here again, on three instances (cit-HepTh, email-Enron and p2p-Gnutella24), the approach of [16] outputs a lower bound greater than the clique we found. Overall, our approach can close 14 of the hard instances; for 10 of these, as far as we know, the optimal coloring was not previously known. The approach of [16] can close only one of them.

7 Conclusions

We have presented a new algorithm for exactly computing the chromatic number of large real-world graphs. This scheme combines a novel local search component, which performs well on massive graphs and gives improved upper bounds, with an iterative reduction method that produces much smaller graphs than previous state-of-the-art scale reduction methods. It involves extracting more information than simply a coloring from the Dsatur greedy heuristic and iteratively solving reduced instances with a complete, branch-and-bound solver, in such a way that lower bounds produced for the reduced graphs are also lower bounds for the original graph. Combined with the fact that we achieve more significant reduction than the current state of the art, this means we can find non-trivial lower bounds even when peeling-based reduction cannot reduce the graph to fewer than hundreds of thousands of vertices. Indeed, in our experimental evaluation on a set of massive graphs, this method produces both better lower and upper bounds than existing solvers and proves optimality on several (almost 75%) of them.

We expect that finding a method to extract cores from other heuristics, such as our local search procedure, will further improve performance.