1 Introduction

A simple, undirected graph G=(V,E) is defined by a set of vertices \(V=\left \{ {v_{1} ,v_{2} ,{\cdots } ,v_{n}} \right \}\) and a set of edges E made up of pairs of distinct vertices (\(E\subseteq V\times V\)). A clique in graph G is a complete subgraph, that is, a subgraph in which vertices are pairwise adjacent. In this work, we consider the maximum clique problem (MCP), which asks for a clique of the largest cardinality in the graph. The size of a maximum clique is called the clique number of the graph and is usually denoted as ω(G).

The MCP is a well known and deeply studied NP-hard problem in graph theory. Moreover, it has found applications in many different fields, such as data association problems in bioinformatics and computational biology [13], computer vision [4], and robotics [5]. Such association problems may be reduced to the MCP in a correspondence graph, which subsumes the matching criteria between the two entities involved. With the upsurge of Web technologies, cliques have also been applied to capture the structure of massive networks. For example, in social networks a clique can identify a group of cooperating agents (e.g. a terrorist cell); in the World Wide Web, cliques or quasi-cliques can help detect frequently visited pages concerning a certain topic. Clique kernels can also help to identify clusters.

Relevant definitions and notation used in the paper are the following:

  • G[W]=(W,E[W]): a subgraph of graph G induced by a vertex set \(W\subseteq V\).

  • \(N(u)=\left \{{v\in V\vert (u,v)\in E} \right \}\): the neighbour set of vertex u in graph G, that is, the set of vertices adjacent to u. Notation may include a vertex set as a subscript (e.g. \(N_{W}(u)\)) to refer to the neighbourhood in the induced subgraph G[W].

  • deg(u)=|N(u)|: the degree of vertex u.

  • k-colouring: an assignment of one of k different numbers (colours) to every vertex of graph G such that adjacent vertices have different colours, that is, \(u\in N(v)\Rightarrow c(u)\neq c(v)\). A k-colouring partitions the vertex set V into k disjoint colour sets \(C_{1},C_{2},\ldots ,C_{k}\), also called colour classes. Each colour set is an independent set, that is, a set of pairwise non-adjacent vertices.

  • χ(G): the chromatic number of graph G, that is, the minimum number k of colours required to colour G.

  • greedy sequential colouring (SEQ): a colouring heuristic which sequentially assigns the lowest possible colour number to each vertex.

  • \(w(v_{i})=|N(v_{i})\cap \{v_{1},\ldots ,v_{i-1}\}|\): the width at the i-th vertex in a sequence, that is, the number of vertices in \(N(v_{i})\) that precede \(v_{i}\). The width of an ordering is the maximum width at any of its vertices.

  • degeneracy (or width) of G: the minimum width of any ordering of V.

  • minimum-degree-last ordering of vertices: a vertex ordering of minimum width obtained by iteratively removing vertices with minimum degree and placing them in reverse order in the new ordering.

  • \(\sigma (v)=\sum \limits _{u\in N(v)}\text {deg}(u)\): the neighbourhood support of v, that is, the sum of its neighbours’ degrees.
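As an illustration, the definitions above can be expressed in a few lines of Python. This is a minimal sketch under stated assumptions: the adjacency-set representation, the function names, the toy graph, and the tie-break by vertex number are all hypothetical choices made for the example and are not part of the original notation.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def support(g: Graph, v: int) -> int:
    """sigma(v): sum of the degrees of the neighbours of v."""
    return sum(len(g[u]) for u in g[v])


def width_of_ordering(g: Graph, order: List[int]) -> int:
    """Largest number of preceding neighbours over all vertices of the ordering."""
    seen: Set[int] = set()
    width = 0
    for v in order:
        width = max(width, len(g[v] & seen))
        seen.add(v)
    return width


def min_degree_last(g: Graph) -> List[int]:
    """Repeatedly remove a minimum-degree vertex; return the reversed removal order."""
    remaining = {v: set(nbrs) for v, nbrs in g.items()}
    removed: List[int] = []
    while remaining:
        # Ties are broken by vertex number (an assumption of this sketch).
        v = min(remaining, key=lambda x: (len(remaining[x]), x))
        for u in remaining.pop(v):
            remaining[u].discard(v)
        removed.append(v)
    return removed[::-1]


# Hypothetical toy graph: triangle 1-2-3 with a pendant vertex 4 attached to 3.
toy: Graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
order = min_degree_last(toy)
print(order, width_of_ordering(toy, order), support(toy, 3))
```

On this toy graph the minimum-degree-last ordering has width 2, which equals the degeneracy of a graph containing a triangle and a pendant vertex, whereas an ordering that places vertex 3 last has width 3.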

1.1 Exact branch and bound algorithms

In the literature, there are many different approaches to solving the MCP exactly. Most successful exact solvers belong to the family of branch-and-bound algorithms that employ approximate-colour bounds [6–19]. Fahle’s algorithm [7] is possibly the first solver of this type.

Exhaustive enumeration can be traced back to the classic Bron-Kerbosch algorithm [19]. Exact solvers keep track of a growing clique S and a candidate set of vertices U that can enlarge S. At each step, a single vertex v is selected from U to build a bigger clique in S and create a new, smaller subproblem with candidate set \(N_{U}(v)\). Leaf nodes of the search tree correspond to maximal cliques; during enumeration, each is checked to see whether its size is greater than that of the incumbent solution, stored in a global variable \(S_{\max}\). Every time a bigger clique is found, it is written to \(S_{\max}\).
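The enumeration scheme just described can be sketched as follows. The code is only an illustration: the names `max_clique` and `expand`, the adjacency-set representation, and the toy graph are hypothetical, and the cut-off test based on |S|+|U| is the trivial size bound, added here so that the sketch is not a pure enumeration.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def max_clique(g: Graph) -> List[int]:
    best: List[int] = []  # plays the role of S_max

    def expand(S: List[int], U: Set[int]) -> None:
        nonlocal best
        if len(S) > len(best):
            best = list(S)  # a bigger clique has been found
        for v in sorted(U):
            # Trivial bound: even taking every candidate cannot beat the incumbent.
            if len(S) + len(U) <= len(best):
                return
            U = U - {v}                # v is not reconsidered in sibling branches
            expand(S + [v], U & g[v])  # child candidate set is N_U(v)

    expand([], set(g))
    return best


# Hypothetical toy graph: triangle 1-2-3 with a pendant vertex 4 attached to 3.
toy: Graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(max_clique(toy))
```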

The basic branch-and-bound approach for the MCP can be traced to Carraghan and Pardalos in [22]. Approximate colour bounds for a maximum clique achieve a good compromise between tightness and computational effort. Proposition 1 provides theoretical justification for this, and may be derived trivially from [20].

Proposition 1

Any k-colouring of a graph G gives an upper bound on its clique number (ω(G)≤χ(G)≤k).

In most of the effective approximate-colour algorithms for the MCP, the greedy sequential colouring heuristic SEQ is employed to colour each subproblem. SEQ is a constructive heuristic that iteratively assigns the smallest possible colour to every vertex such that no conflicts with the already coloured vertices occur. It has a worst-case running time of \(O(n^{2})\).
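A minimal sketch of SEQ, and of how its output bounds the clique number in the sense of Proposition 1, is given below. The function name, the adjacency-set representation, and the 5-cycle used as input are assumptions made for the example.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def seq_colouring(g: Graph, order: List[int]) -> Dict[int, int]:
    """Greedy sequential colouring: each vertex takes the lowest conflict-free colour."""
    colour: Dict[int, int] = {}
    for v in order:
        used = {colour[u] for u in g[v] if u in colour}
        c = 1
        while c in used:
            c += 1
        colour[v] = c
    return colour


# Hypothetical input: the 5-cycle 0-1-2-3-4-0, whose clique number is 2.
c5: Graph = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
colouring = seq_colouring(c5, [0, 1, 2, 3, 4])
k = max(colouring.values())
print(colouring, k)
```

Here SEQ uses k = 3 colours, so 3 is a valid (although not tight) upper bound on the clique number of the 5-cycle.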

Relevant recent improvements reported in the literature for exact maximum clique algorithms that employ approximate-colour bounds are (in chronological order):

  • Branching on maximum colour: at each step (a recursive call of the algorithm), vertices are selected for branching in non-increasing order of their colour numbers. This was first described in algorithm MCQ [8].

  • Recolouring: an additional computation which aims at reducing the size of the colouring obtained by SEQ, at the cost of increasing its worst-case complexity by a linear factor to \(O(n^{3})\). It was first described in algorithm MCS [9].

  • Static ordering of vertices: vertices in every subproblem are always sorted in the order determined at the beginning of the search. This was first described independently both in MCS and in the bit-parallel kernel of the BBMC family of algorithms [10, 11].

  • Bitstring encoding of the MCP [10–12, 15]: the BBMC family of algorithms represents vertex sets, as well as the input graph, via bitstrings. The advantage is that critical operations related to child problem generation and bound computation are performed more efficiently using bitmasks.

  • Selective colouring: a partial SEQ colouring in which only the subset of vertices to be pruned in the child subproblem is coloured. It was first described in BBMCL [12] (the ‘L’ stands for seLective).

  • Strong heuristic for a ‘good’ initial solution, as described in [17].

  • Infra-chromatic bound: a bound tighter than the one obtained by SEQ. This bound can possibly be lower than the chromatic number of the input graph. In MaxCLQ [13, 14], the authors proposed one such bound based on reducing the maximum clique problem determined by each coloured subgraph to the partial maximum satisfiability problem. The term infra-chromatic first appeared in [15], where the BBMCX algorithm is described. BBMCX shares the bitstring BBMC kernel and implements an infra-chromatic bound by looking for triplets of colour sets in which there are no vertices that can form a triangle. For each such triplet, denoted inconsistent, the bound is decremented from 3 to 2. BBMCX is currently the fastest published algorithm of the BBMC family for dense graphs.
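To make the interplay between colour bounds, branching on maximum colour, and static ordering more concrete, the following simplified sketch combines the three ideas. It is written in the spirit of the algorithms listed above but is not an implementation of any of them: it uses plain Python lists and sets instead of bitstrings, omits recolouring, selective colouring, and infra-chromatic bounds, and all names and the test graph are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def seq_classes(g: Graph, cand: List[int]) -> List[List[int]]:
    """SEQ over the candidates, taken in the given (static) order."""
    classes: List[List[int]] = []
    for v in cand:
        for cls in classes:
            if all(u not in g[v] for u in cls):
                cls.append(v)
                break
        else:
            classes.append([v])
    return classes


def max_clique_colour_bound(g: Graph, order: List[int]) -> List[int]:
    best: List[int] = []

    def expand(S: List[int], cand: List[int]) -> None:
        nonlocal best
        classes = seq_classes(g, cand)
        # Vertices in non-decreasing colour number, paired with their colour.
        coloured = [(k, v) for k, cls in enumerate(classes, start=1) for v in cls]
        remaining = set(cand)
        for k, v in reversed(coloured):  # branch on the highest colour first
            if len(S) + k <= len(best):
                return  # all vertices still remaining have colour <= k
            remaining.discard(v)
            # The child keeps the relative order of the parent (static ordering).
            child = [u for u in cand if u in remaining and u in g[v]]
            if child:
                expand(S + [v], child)
            elif len(S) + 1 > len(best):
                best = S + [v]

    expand([], list(order))
    return best


# Hypothetical graph: K4 on {1,2,3,4}, vertex 5 adjacent to 1 and 2, vertex 6 adjacent to 5.
g: Graph = {1: {2, 3, 4, 5}, 2: {1, 3, 4, 5}, 3: {1, 2, 4},
            4: {1, 2, 3}, 5: {1, 2, 6}, 6: {5}}
clique = max_clique_colour_bound(g, sorted(g))
print(sorted(clique))
```

The pruning test is valid because, when a vertex of colour k is considered, every candidate not yet branched on belongs to one of the first k colour classes and therefore cannot contribute more than k vertices to a clique.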

Table 1 summarizes the majority of algorithms described in this section, together with their most relevant properties.

Table 1 A number of relevant exact maximum clique solvers in chronological order

This work describes two initial orderings of vertices that are efficient for successful approximate-colour solvers, such as MCS or BBMCX, with the exception of MaxSAT-based MaxCLQ.

Related to branching on maximum colour and static ordering is the fact that the initial sorting of vertices is well known to have a significant impact on the size of the MCP search tree. We discuss this issue in the next subsection, as it is very much concerned with the contribution of this work.

1.2 Initial ordering of vertices

A well-known initial sorting strategy for exact MCP solvers is to branch on vertices with the smallest degree at the root node. The idea is similar to branching on variables with a small number of values used in constraint satisfaction problems. In practice, vertices with the smallest degree are placed last in V and branching in all subproblems, including the root node, is done by selecting vertices in reverse order.

The most successful initial sorting strategies for exact MCP algorithms reported in the literature are the following:

  • Minimum width (MW): a minimum-degree-last ordering (ties broken randomly or in natural order), which can be traced back to [22] in connection with the MCP. As explained previously, it is a degenerate ordering achieved by removing, at each iteration, the vertex with the minimum degree and placing it in reverse order in the new ordering.

  • Minimum width with tie break by minimum support (MWS): Similar to MW but with a tie-break strategy: it selects the vertex with the minimum support from the set of vertices with the same degree.

A lighter variant for computing MWS was brought to our attention in a personal communication [23] and will be referred to as MWSS (Minimum Width with Static Support). It is similar to MWS, but instead of recalculating neighbour support at each step it uses the vertex support determined by the initial ordering in every iteration. MWSS is very useful in graphs of high order, in which the computational cost of MWS is high. By default, support tiebreak in this paper always refers to MWS. MWSS will be explicitly mentioned when disambiguation is required.
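The three degree-based orderings can be sketched with a single routine that differs only in its tie-break. This is an illustrative reading of the descriptions above, not the authors' code: the function name, the `tie` parameter, the final tie-break by vertex number, and the six-vertex graph are assumptions. In particular, MWSS is interpreted here as using supports computed once in the input graph.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def degenerate_ordering(g: Graph, tie: str = "number") -> List[int]:
    """tie = 'number' (MW), 'support' (MWS) or 'static_support' (MWSS)."""
    rem = {v: set(nbrs) for v, nbrs in g.items()}
    static = {v: sum(len(g[u]) for u in g[v]) for v in g}  # supports in the input graph
    removed: List[int] = []
    while rem:
        def key(v: int):
            if tie == "support":
                s = sum(len(rem[u]) for u in rem[v])  # recomputed at every step
            elif tie == "static_support":
                s = static[v]
            else:
                s = 0
            return (len(rem[v]), s, v)

        v = min(rem, key=key)
        for u in rem.pop(v):
            rem[u].discard(v)
        removed.append(v)
    return removed[::-1]  # vertices removed first are placed last


# Hypothetical six-vertex graph used only for illustration.
g: Graph = {1: {2, 3, 5, 6}, 2: {1, 3}, 3: {1, 2, 4},
            4: {3, 5}, 5: {1, 4, 6}, 6: {1, 5}}
mw = degenerate_ordering(g, "number")
mws = degenerate_ordering(g, "support")
mwss = degenerate_ordering(g, "static_support")
print(mw, mws, mwss)
```

On this small input the three tie-breaks already produce three different orderings, which illustrates why the choice of tie-break is reported explicitly in the experiments.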

Initial sorting by degree has been the standard choice of successful approximate-colour exact algorithms for the MCP. Recently, a colour-based ordering was described in [18] as a possible enhancement of the MaxCLQ algorithm. However, the specific impact of the ordering was not analysed.

In Section 2 of this paper, we describe a new initial sorting procedure, DEG_SORT, which improves standard MW/MWS degree-based orderings. Section 3 starts by describing a colour-based initial sorting procedure COLOUR_SORT, based on [18], and explains why it can also be successful for MCS or the BBMC family of algorithms. The final part of the section describes the NEW_SORT algorithm proposed in this work. NEW_SORT selects DEG_SORT or COLOUR_SORT according to a new evaluation function. Section 4 covers the experiments and validation. Finally, Section 5 presents the conclusions and future work.

2 Improved degree-based initial sorting

Branching on vertices with the highest colour was first proposed in MCQ [8] and, since then, has been applied by most successful MCP exact solvers. In practice, as mentioned previously, vertices are sorted in a highest-colour-last fashion and taken in reverse order in every subproblem. Figure 1 depicts the control flow: vertices are coloured by greedy SEQ according to the initial ordering (Fig. 1, top), sorted according to non-decreasing colour number, and then selected in reverse order (Fig. 1, bottom).

Fig. 1 Colouring and vertex selection directions in each subproblem

2.1 Analysis of largest-first vertex colouring heuristic

To evaluate the quality of the bounds obtained by direct implementation of the flow in Fig. 1 (very much related to the proposed new sorting heuristic), we consider the Largest-First (LF) decision heuristic for greedy vertex colouring of Welsh and Powell. In [24], they proved that given a non-increasing degree ordering of vertices (i.e. \(\text {deg}(v_{1})\geq \text {deg}(v_{2})\geq \cdots \geq \text {deg}(v_{n})\)), SEQ would always produce not more than \(\max \limits _{i \in V}\min \{i,1+\text {deg}(v_{i})\}\) colours. This is known as the Welsh and Powell bound. We will refer to the opposite (bad) ordering \(\text {deg}(v_{1})\leq \text {deg}(v_{2})\leq \cdots \leq \text {deg}(v_{n})\) as Smallest-First (SF).
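The Welsh and Powell bound and the number of colours actually used by SEQ under an LF ordering can be compared with a short sketch. The helper names and the choice of the complete bipartite graph \(K_{3,3}\) as input are assumptions made for illustration.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def welsh_powell_bound(g: Graph) -> int:
    """max over i of min(i, 1 + deg(v_i)) for degrees sorted in non-increasing order."""
    degs = sorted((len(nbrs) for nbrs in g.values()), reverse=True)
    return max(min(i, 1 + d) for i, d in enumerate(degs, start=1))


def lf_order(g: Graph) -> List[int]:
    """Largest-First: non-increasing degree, ties broken by vertex number."""
    return sorted(g, key=lambda v: (-len(g[v]), v))


def seq_colours(g: Graph, order: List[int]) -> int:
    """Number of colours used by greedy sequential colouring."""
    colour: Dict[int, int] = {}
    for v in order:
        used = {colour[u] for u in g[v] if u in colour}
        c = 1
        while c in used:
            c += 1
        colour[v] = c
    return max(colour.values(), default=0)


# Hypothetical input: K_{3,3} with sides {0,1,2} and {3,4,5}.
k33: Graph = {v: set(range(3, 6)) if v < 3 else set(range(3)) for v in range(6)}
bound = welsh_powell_bound(k33)
used = seq_colours(k33, lf_order(k33))
print(bound, used)
```

For this graph the bound evaluates to 4 while SEQ under the LF ordering needs only 2 colours, a small instance of the gap between the guarantee and typical behaviour.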

The key idea of LF is to assign colour numbers to the most conflicting vertices early in the hope that those remaining will require a small number of colours (ideally, not different from those used in the early stages). Moreover, Observation 1 is widely accepted (see for example [25]) and many recent exact MCP solvers apply some variant of LF ordering for SEQ colouring [8–12, 15–17].

Observation 1

Greedy sequential colouring of vertices sorted according to the LF rule almost always produces tighter colourings than the Welsh and Powell bound.

A qualitative measure of the impact of LF ordering in SEQ may be found in Table 2. There, LF is compared with its counterpart SF in structured and non-structured uniform random graphs. Note that we do not compare the number of colours in LF colouring with the chromatic number since it is impractical to compute the chromatic number in most of the graphs.

Table 2 Comparison between Largest-First and Smallest-First colouring heuristics for a number of structured (see Appendix) and uniform random graphs

In the case of structured instances, the table reports average colour sizes for typical members of each family (brock200_1–brock400_4, dsjc 500.1/5–dsjc 1000.1/5, MANN_a9–MANN_a27, etc.; see the Appendix for the full list). In the case of Erdös-Rényi random graphs G(n,p), the table reports average colour sizes for different values of n and density p (we consider 50 instances for each graph type). Table 2 gives evidence that the LF rule produces tighter sequential colourings, on average, than the SF one: up to 12 % improvement for non-structured graphs and 30 % for structured graphs. The exception is the MANN family, in which SF actually improves the colouring. This may be explained by the high density (p>0.92) of these particular graphs. We note that even a small bound improvement can produce an exponential reduction in the size of the maximum clique search tree.
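A comparison of this kind for uniform random graphs could be reproduced, in outline, along the following lines. This is only a sketch of the experimental idea: the generator, the seeding scheme, the tie-breaks, and the function names are assumptions, and the numbers it prints are not those of Table 2.

```python
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def gnp(n: int, p: float, rng: random.Random) -> Graph:
    """Erdos-Renyi G(n,p): every pair of vertices is joined with probability p."""
    g: Graph = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                g[u].add(v)
                g[v].add(u)
    return g


def seq_colours(g: Graph, order: List[int]) -> int:
    colour: Dict[int, int] = {}
    for v in order:
        used = {colour[u] for u in g[v] if u in colour}
        c = 1
        while c in used:
            c += 1
        colour[v] = c
    return max(colour.values(), default=0)


def lf_vs_sf(n: int, p: float, trials: int = 50, seed: int = 0) -> Tuple[float, float]:
    """Average number of SEQ colours under LF and SF orderings."""
    rng = random.Random(seed)
    lf_total = sf_total = 0
    for _ in range(trials):
        g = gnp(n, p, rng)
        lf = sorted(g, key=lambda v: (-len(g[v]), v))  # Largest-First
        sf = sorted(g, key=lambda v: (len(g[v]), v))   # Smallest-First
        lf_total += seq_colours(g, lf)
        sf_total += seq_colours(g, sf)
    return lf_total / trials, sf_total / trials


lf_avg, sf_avg = lf_vs_sf(100, 0.3, trials=10, seed=1)
print(lf_avg, sf_avg, 100.0 * (sf_avg - lf_avg) / sf_avg)
```

The last printed value is the gap between SF and LF as a percentage of the SF average, which is one possible way to express the improvement discussed in the text.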

Another interesting result to be derived from Table 2 is Observation 2, where the term benefit refers to the gap between LF and SF as a percentage.

Observation 2

The benefit of LF ordering for SEQ colouring diminishes with the growth of graph size and density in the case of uniform random graphs.

Figure 2 corresponds with the data in Table 2 but includes density information for each graph order considered. Observation 2 is captured by the fact that lines in the line chart are aligned by increasing graph size from top to bottom in the figure. Lines cross for some neighbouring sizes and different densities [e.g. (200, 0.1) shows a 16.81 % improvement whereas (250, 0.1) shows a superior 17.65 % improvement], but the trend is clearly there. We are not aware of this fact being reported elsewhere and consider Observation 2 as an additional contribution of the paper.

Fig. 2 Impact of Largest-First sequential greedy colouring on Erdös-Rényi graphs of different sizes and densities. The Y-axis refers to the improvement with respect to Smallest-First as a percentage. Each line corresponds to a different graph order

We propose the following intuition as an explanation. Let us consider the cases in which any sequential ordering mistakenly uses colour χ(G)+1 for some vertex v, where χ(G) is the chromatic number of the coloured graph G. Such a case is shown in Fig. 3, in which vertex v has χ neighbours with colours 1, …, χ and has to be coloured with colour χ+1.

Fig. 3 An example where vertex v has χ neighbours \(v_{1},\ldots ,v_{\chi }\) with colour numbers 1,…,χ respectively

In the case of the LF sequential colouring, this happens when vertex v has a smaller degree than each of these χ neighbours. For a better understanding, we present several such cases in Fig. 4. Colours are shown with numbers near vertices. In the first graph, vertex v has two neighbours and colour 3, although the chromatic number is 2; in the second graph, it has three neighbours and colour 4; in the third, it has four neighbours and colour 5.

Fig. 4 Examples in which Largest-First sequential colouring uses χ+1 colours

We now show informally that the bad case depicted in Fig. 3 is more likely to occur when the number of edges per vertex is small. This would explain the decrease in performance of LF with graph order as well as density. For a given graph G=(V,E), let us consider an increase in the ratio |E|/|V| and thus an increase in average and maximum degrees. This also results in an increment of the clique number ω and the chromatic number χ because χ≥ω. In this scenario, the probability of the case shown in Fig. 3 decreases mainly for two reasons: first, because the degree of vertex v cannot be less than χ, and therefore its expected degree increases faster than the expected minimum degree of its χ neighbours; second, because the probability of these χ neighbours all having degree greater than deg(v) decreases as the number χ of these neighbours increases.

What we have presented is just an intuitive explanation of Observation 2. We believe that attempting to provide rigorous proof is, at this point, impractical. It would probably require a big theorem for a relatively simple result.

2.2 Sorting a fraction of vertices by non-increasing degree

Having established the relevance of LF sorting in sequential colouring, we now proceed to describe a new sorting procedure for exact maximum clique algorithms. In MCQ [8], the colour ordering required for branching (Fig. 1, bottom) is inherited in child subproblems. As a consequence, SEQ is given a suboptimal (non-LF) ordering and its pruning ability is diminished. A first alternative to improve this situation, described in [16], was to reorder vertices by non-increasing degree prior to colouring (i.e. explicit LF), but its computational cost is high. The same paper also described a way to apply this strategy selectively in the shallower levels of the search tree.

A better compromise (currently considered the best approach) is to use a static ordering in all subproblems. As mentioned in the introductory section, this decision heuristic was first proposed independently in [9] and [10] and is currently used by state-of-the-art BBMC and MCS solvers. In static ordering, vertices in every subproblem are always kept in the same relative order as determined initially. Specifically, the pruning ability of static ordering is high in the shallow levels of the search tree and degrades with depth, as subproblems become smaller and the initial sorting is gradually lost.

Related to the colour flow in Fig. 1, both initial vertex ordering strategies MW and MWS described in Section 1.2 are reasonably consistent with LF greedy colouring, in the sense that vertices with high degrees are implicitly placed first in V and colouring proceeds from first to last. However, vertices are actually placed following a smallest-degree-last strategy, which can differ considerably from an explicit highest-degree-first sorting because both MW and MWS are degenerate orderings.

It is easy to see this effect with the example depicted in Fig. 5. Figure 5A shows a simple graph G in which vertices are numbered according to an initial default ordering that uniquely identifies them in the rest of the figures. This ordering will also determine tiebreaks when required. From the perspective of the control flow in Fig. 1, vertices are coloured in natural order (i.e. starting from vertex {1} and going anti-clockwise) and selected in reverse order (i.e. starting from {6} and going clockwise).

Fig. 5 Different initial vertex orderings for the MCP. The small numbers near the vertices in B, C, and D indicate their new positions

Figure 5B presents the minimum width ordering (MW) of the graph, and Fig. 5C the minimum width ordering with vertex support (MWS). The difference between them lies in the support of vertices {2} and {4}, which both have the same degree (deg(2) = deg(4) = 2). Ties are broken by vertex number for MW, so vertex {2} is picked first (and placed last) in the new ordering. In the case of MWS, σ(2)=7, whereas σ(4)=6, so vertex {4} is the one placed at the end. After removing {4}, two triangles appear: {1, 2, 3} and {1, 5, 6}; vertices {2, 3, 5, 6} all have minimum degree and support, so vertex {2} is selected in second place and so on.

Examining the resulting MW and MWS orderings from the perspective of the control flow in Fig. 1, it is clear that vertices are not sorted by non-increasing degree at the head of the ordering. In particular, the vertex with the highest degree {1} (deg(1) = 4) comes in third place in both cases. The reason for this lies in the degenerate ordering, which iteratively removes each sorted vertex and thus reduces the degree of the remaining vertices to their core number. In the example, vertices {1, 5, 6} are the last remaining vertices for both MW and MWS (a three-clique). The latter graph is obviously also regular, so all vertices have the same degree and are sorted in reverse order of their numbers. As a consequence, vertex {1} is misplaced.

In the light of the above considerations, we propose an improved initial sorting procedure DEG_SORT, which can be seen as a repair mechanism for MW and MWS with respect to (maximum) degree at the head of the ordering. DEG_SORT takes as input MWS and sorts, according to non-increasing degree, a subset of the first k vertices \(v_{1},v_{2},\ldots ,v_{k}\) (vertices with the same degree are taken according to their number). This second ordering is absolute (not degenerate) since it is directed to be as close as possible to LF in the subproblems that appear in the shallow levels of the search tree. The remaining n−k vertices are not modified and remain sorted by minimum width with vertex support. Figure 5D shows the ordering obtained by DEG_SORT in the example: vertex {1} with the highest degree is swapped with vertex {6} and placed first in the list.

Parameter k (the number of vertices reordered by DEG_SORT) should be neither too small (and thus with low impact) nor too big (the original minimum width ordering would be lost). Rather than using k as a tuning parameter, we consider a new parameter p related to the total number of vertices and define it as follows:

$$p=\left\lfloor\frac{|V|}{k}\right\rfloor,\quad p\in\{2,3,\ldots\} $$

In practice, DEG_SORT performs best when p ranges between 2 (50 % of the vertices) and 10 (10 % of the vertices). In non-structured Erdös-Rényi graphs, the best results on average appear when p is set to 3. In the case of structured graphs, they are obtained when p is set to 4, but tuning is recommended in both cases whenever possible.
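The procedure can be sketched as follows. Several points are assumptions of the sketch rather than statements of the original: k is obtained from p as ⌊|V|/p⌋, the function names are hypothetical, and the six-vertex graph was chosen only to be consistent with the degrees, supports, and orderings quoted for the example of Fig. 5; it is not necessarily the graph drawn in the figure.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def mws_order(g: Graph) -> List[int]:
    """Minimum width ordering, ties broken by minimum support, then vertex number."""
    rem = {v: set(nbrs) for v, nbrs in g.items()}
    removed: List[int] = []
    while rem:
        v = min(rem, key=lambda x: (len(rem[x]),
                                    sum(len(rem[u]) for u in rem[x]), x))
        for u in rem.pop(v):
            rem[u].discard(v)
        removed.append(v)
    return removed[::-1]


def deg_sort(g: Graph, p: int) -> List[int]:
    """Re-sort the first k = |V| // p vertices of MWS by non-increasing degree."""
    base = mws_order(g)
    k = len(base) // p  # assumed reading of the relation between p and k
    head = sorted(base[:k], key=lambda v: (-len(g[v]), v))  # absolute degrees in G
    return head + base[k:]  # the tail keeps its minimum width order


# Hypothetical graph consistent with the values quoted in the text:
# deg(1)=4, deg(2)=deg(4)=2, sigma(2)=7, sigma(4)=6.
g: Graph = {1: {2, 3, 5, 6}, 2: {1, 3}, 3: {1, 2, 4},
            4: {3, 5}, 5: {1, 4, 6}, 6: {1, 5}}
print(mws_order(g))    # vertex 1 appears in third place
print(deg_sort(g, 2))  # vertex 1 is moved to the head
```

With p = 2 the first three vertices are re-sorted, so the highest-degree vertex exchanges places with vertex 6, as in the description of the example.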

3 Colour-based initial ordering of vertices

3.1 Preliminaries

As explained in previous sections, an initial ordering of vertices based on degree is well known to reduce the size of the search tree in exact maximum clique search. It is also employed by successful modern algorithms such as BBMC and MCS. The logic behind it is to minimize branching in the first level of the tree. Moreover, BBMC and MCS preserve the ordering in every other subproblem as well (to improve the bound obtained by SEQ; see Fig. 1), so the benefits of a good initial ordering also propagate down the search tree to a certain depth.

In [18], the possibility of sorting vertices initially according to a colouring of the graph \(C(G)=\{C_{1},C_{2},\ldots ,C_{k}\}\) was described. The intuition is that it should somehow prune the maximum clique search space effectively in graphs where k is a good bound on the clique number, but this was not analysed systematically in the original paper. Interestingly, the current implementations of BBMCX and MCS spend little effort in computing upper bounds on maximum clique at the root node. A typical strategy is to assign to a vertex as colour number the minimum value between its index and maximum graph degree. The above considerations motivate a systematic study of colour-based initial sorting.

The next subsection describes the sorting procedure COLOUR_SORT, which is based on [18] with additional refinements. In Subsection 3.3, we give additional explanations as to why COLOUR_SORT can be successful for BBMCX or MCS with an example. Finally, the last subsection describes the new sorting algorithm NEW_SORT, which is the main contribution of this work.

3.2 The colour-based sorting algorithm

COLOUR_SORT is described in Algorithm 1. The main computation is a variant of the constructive recursive-largest-first (RLF) colouring heuristic, which was first described in [26]. RLF computes colour classes one at a time and does not proceed with another colour until no more vertices can enlarge the current one. In the original paper, the assignment is implemented in the following way: when a new colour class \(C_{k}\) is opened, set \(W_{1}\) contains all remaining uncoloured vertices and set \(W_{2}\) is empty. Iteratively, a vertex \(v\in W_{1}\) is selected, added to \(C_{k}\), and removed from \(W_{1}\). If v has any neighbours in \(W_{1}\), they are also removed from \(W_{1}\) and placed in \(W_{2}\). The assignment of vertices proceeds until \(W_{1}=\emptyset \). The selection of vertices is based on degree. The first vertex is the one with maximum degree in \(G[W_{1}]\) and each of the remaining vertices is the one in \(W_{1}\) with the largest number of neighbours in \(W_{2}\). Once \(W_{1}\) becomes empty, the next colour class is built.

COLOUR_SORT orders vertices in V according to the colour classes obtained by RLF. The specific variant used takes into account two factors:

  • A strong exact maximum clique algorithm is available, in this case BBMCX.

  • The graph to be ordered is expected to be dense, since finding its clique number presents a challenge.

The actual RLF variant used by COLOUR_SORT computes each new colour set as an independent set (a maximum clique in the complement graph \(\bar {G}\)) (steps 2 to 7). Once a colour set is produced, its vertices are placed in order in \(O_{color}\) and removed from \(\bar {G}\). COLOUR_SORT then proceeds with a new colour set until no more vertices are left in \(\bar {G}\).

Algorithm 1 COLOUR_SORT
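The idea behind COLOUR_SORT, as described in the text, can be sketched as follows. The sketch does not reproduce the steps of Algorithm 1: a plain recursive search stands in for BBMCX, the vertices of each colour set are appended in increasing vertex number, and the function names and the five-vertex path used as input are all assumptions.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def complement(g: Graph) -> Graph:
    vs = set(g)
    return {v: vs - g[v] - {v} for v in g}


def max_clique(g: Graph) -> List[int]:
    """Small exact search; a stand-in for a strong solver such as BBMCX."""
    best: List[int] = []

    def expand(S: List[int], U: Set[int]) -> None:
        nonlocal best
        if len(S) > len(best):
            best = list(S)
        for v in sorted(U):
            if len(S) + len(U) <= len(best):
                return
            U = U - {v}
            expand(S + [v], U & g[v])

    expand([], set(g))
    return best


def colour_sort(g: Graph) -> Tuple[List[int], List[List[int]]]:
    """Each colour set is a maximum clique of what is left of the complement graph."""
    comp = complement(g)
    remaining = set(g)
    order: List[int] = []
    classes: List[List[int]] = []
    while remaining:
        sub = {v: comp[v] & remaining for v in remaining}
        cls = sorted(max_clique(sub))  # an independent set of G
        classes.append(cls)
        order.extend(cls)              # placement order is an assumption
        remaining -= set(cls)
    return order, classes


# Hypothetical input: the path 3-1-5-4-2.
path: Graph = {1: {3, 5}, 2: {4}, 3: {1}, 4: {2, 5}, 5: {1, 4}}
o_color, colour_classes = colour_sort(path)
print(o_color, colour_classes)
```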

3.3 An example

To see why COLOUR_SORT can be beneficial for successful approximate-colour algorithms, we will use the coloured graph G depicted in Fig. 6. We assume G to be a subproblem, close to a leaf node, of a maximum clique search tree. The output of SEQ for the graph is \(C_{1}=\{1,2\}\) (green), \(C_{2}=\{3,4\}\) (yellow), and \(C_{3}=\{5\}\) (cyan), as shown. The figure also indicates the colour threshold \(k_{\min }\) (the difference between the size of the best clique found so far, \(\left |S_{\max }\right |\), and the size of the clique being built in the branch, \(\left |S\right |\)) for the subproblem, which is 3. This implies that all vertices belonging to colour classes below this threshold (in the example, sets \(C_{1}\) and \(C_{2}\)) will be pruned in any derived child node (for a more detailed description of the threshold, see [12] amongst others).

Fig. 6 An example of a coloured graph

In algorithms such as BBMC or MCS, pruning the search space can be seen as a technique that accumulates as many vertices as possible behind the \(k_{\min }\) threshold. There are three main alternatives to achieve this:

  1. Incrementing the colour threshold \(k_{\min }\), or, alternatively, moving the dotted line to the right: this can be done by finding good solutions early, either by making good branching choices or by computing a strong initial solution. Note that the latter can produce very effective pruning, since it increases \(\left |S_{\max }\right |\) in the shallow levels of the search tree.

  2. Shifting vertices from the right to the left of the threshold: this can be achieved with techniques such as recolouring or infra-chromatic pruning. In the example, BBMCX detects that the induced subgraph \(G[C_{1}\cup C_{2}\cup C_{3}]\) is triangle-free and reduces the bound from 3 to 2, so that {5} now falls below the threshold.

  3. Improving the quality of the greedy SEQ colouring, that is, changing its output to produce colour classes \(C_{i}\), \(i<k_{\min }\), that are as large as possible.

The last point is especially relevant to explain why COLOUR_SORT could be successful for some graphs. SEQ is sensitive to the order in which vertices are presented. If, in the example, the vertices were presented in the order {2}, {3}, {5}, {1}, {4}, it would find the optimum colouring \(C_{1}=\{2,3,5\}\) and \(C_{2}=\{1,4\}\) (after all, the graph is bipartite). Intuitively, since the relative order of vertices determined initially remains the same for all subproblems (see Subsection 1.1), a colour-based sorting of vertices at the root node could improve the SEQ colourings of many subproblems (possibly also in the deeper levels of the search tree). This can prune the search space better (sometimes even exponentially better) than a standard degree-based ordering in some cases, as will be shown in the next section.
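The dependence of SEQ on the vertex order, and its effect on the vertices that survive the threshold, can be checked with a small sketch. The five-vertex path below is a hypothetical graph chosen to be consistent with the colour classes quoted in the example; it is not necessarily the graph of Fig. 6, and the function names are assumptions.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def seq_classes(g: Graph, order: List[int]) -> List[List[int]]:
    """Greedy sequential colouring, returned as a list of colour classes."""
    classes: List[List[int]] = []
    for v in order:
        for cls in classes:
            if all(u not in g[v] for u in cls):
                cls.append(v)
                break
        else:
            classes.append([v])
    return classes


def branching_candidates(classes: List[List[int]], k_min: int) -> List[int]:
    """Vertices whose colour number reaches the threshold; the rest are pruned."""
    return [v for c, cls in enumerate(classes, start=1) if c >= k_min for v in cls]


# Hypothetical graph: the path 3-1-5-4-2 (bipartite and triangle-free).
g: Graph = {1: {3, 5}, 2: {4}, 3: {1}, 4: {2, 5}, 5: {1, 4}}

natural = seq_classes(g, [1, 2, 3, 4, 5])
reordered = seq_classes(g, [2, 3, 5, 1, 4])
print(natural, branching_candidates(natural, 3))
print(reordered, branching_candidates(reordered, 3))
```

With the natural order, vertex 5 receives colour 3 and remains a branching candidate for a threshold of 3; with the reordered sequence, two colours suffice and no candidate is left.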

To summarize, we believe that COLOUR_SORT can be successful for the BBMC family of algorithms when the following two conditions are met:

  • it is possible to greedily find a colouring of the input graph that is close to optimal.

  • the chromatic number of the graph is a tight bound on its clique number.

Moreover, COLOUR_SORT can be even more effective if it is combined with a strong initial solution at the start of the search. As explained, a good initial lower bound would shift the threshold \(k_{\min }\) to the right and increase the number of colour classes to the left of the threshold in the shallow (and critical) levels of the search tree.

3.4 The initial sorting algorithm

Before selecting COLOUR_SORT as the initial sorting procedure, we first need to compare it with its degree-based counterpart. In [18], the tail of the colouring, that is, the colour classes with the highest colour numbers, is used for evaluation. A colouring is defined as regular if its tail contains not more than one colour class with a single vertex. If two or more singleton sets exist, it is considered irregular and dismissed.

In this work, we propose to compare any two initial vertex orderings for exact maximum clique search in the following manner. For a given vertex ordering \(O=(v_{1},v_{2},\ldots ,v_{n})\), let \(G_{v_{i}}=G[N_{\{v_{1},v_{2},\ldots ,v_{i-1}\}}(v_{i})]\) be the subproblem induced by the preceding neighbours of \(v_{i}\) in the ordering and let \(u(v_{i})\geq 1+\omega (G_{v_{i}})\) be any upper bound on \(\omega (G[N_{\{v_{1},v_{2},\ldots ,v_{i-1}\}}(v_{i})\cup \{v_{i}\}])\). We then define an upper bound for the ordering O as \(u(O)=\max \limits _{v_{i}\in V}\{u(v_{i})\}\). We consider the ordering \(O_{1}\) to be preferable to the ordering \(O_{2}\) if \(u(O_{1})<u(O_{2})\).
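One way to evaluate u(O) is to colour, for each vertex, the subgraph induced by its preceding neighbours with SEQ and add one. The sketch below follows that reading; the function names and the six-vertex input (a five-vertex path plus a vertex 0 adjacent to all others) are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def seq_colours(g: Graph, vertices: List[int]) -> int:
    """Colours used by SEQ on the subgraph induced by `vertices`, in that order."""
    colour: Dict[int, int] = {}
    for v in vertices:
        used = {colour[u] for u in g[v] if u in colour}
        c = 1
        while c in used:
            c += 1
        colour[v] = c
    return max(colour.values(), default=0)


def ordering_bound(g: Graph, order: List[int]) -> int:
    """u(O): maximum over v_i of 1 + (SEQ colours of the preceding neighbours of v_i)."""
    u_max = 0
    for i, v in enumerate(order):
        preceding = [w for w in order[:i] if w in g[v]]  # kept in the order of O
        u_max = max(u_max, 1 + seq_colours(g, preceding))
    return u_max


# Hypothetical graph: path 3-1-5-4-2 plus vertex 0 adjacent to every other vertex.
g: Graph = {0: {1, 2, 3, 4, 5}, 1: {0, 3, 5}, 2: {0, 4},
            3: {0, 1}, 4: {0, 2, 5}, 5: {0, 1, 4}}
u1 = ordering_bound(g, [1, 2, 3, 4, 5, 0])
u2 = ordering_bound(g, [2, 3, 5, 1, 4, 0])
print(u1, u2)
```

Both values are valid upper bounds on the clique number of this graph, which is 3; the second ordering attains it and would therefore be preferred under the criterion above.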

With the help of this new bound u(O), our algorithm NEW_SORT (Algorithm 2) evaluates both vertex ordering procedures, degree-based \(O_{deg}\) (described in Section 2) and colour-based \(O_{color}\), and selects the one with the smallest value of u(O). There are different ways to compute valid upper bounds for an ordering according to our previous definition. NEW_SORT uses greedy colouring SEQ (step 5). The notation \(SEQ_{O_{deg}}\) indicates that \(O_{deg}\) is the initial order of vertices for SEQ. The value of \(u(O_{color})\) is equal to the number of colours k of the RLF colouring \(\{C_{1},\ldots ,C_{k}\}\) computed by COLOUR_SORT. This is because, in this colouring, \(v_{i}\in C_{j}\Rightarrow u(v_{i})=j\) for any vertex in the ordering. Based on the u(O) value for both orderings, a decision is made: NEW_SORT selects \(O_{color}\) if k is strictly lower than \(u(O_{deg})\) and selects \(O_{deg}\) otherwise (step 6).

Algorithm 2 NEW_SORT

Finally, we note that if the input graph is not sufficiently dense, the task of finding a maximum clique in the complement graph becomes impractical. To avoid this, NEW_SORT follows the same strategy as [18] and dismisses \(O_{color}\) if the average density of the graph p(G) is below a certain threshold (step 2).
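The selection rule can be summarised in a short sketch. It does not reproduce the steps of Algorithm 2: the two orderings and the number of colours k are passed in as precomputed inputs, and the function names, the parameter `min_density`, and the threshold values used in the example are hypothetical, since the text does not state the threshold.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def seq_colours(g: Graph, vertices: List[int]) -> int:
    colour: Dict[int, int] = {}
    for v in vertices:
        used = {colour[u] for u in g[v] if u in colour}
        c = 1
        while c in used:
            c += 1
        colour[v] = c
    return max(colour.values(), default=0)


def ordering_bound(g: Graph, order: List[int]) -> int:
    """u(O) computed with SEQ on the preceding neighbours of each vertex."""
    return max((1 + seq_colours(g, [w for w in order[:i] if w in g[v]])
                for i, v in enumerate(order)), default=0)


def density(g: Graph) -> float:
    n = len(g)
    m = sum(len(nbrs) for nbrs in g.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def new_sort(g: Graph, o_deg: List[int], o_color: List[int], k: int,
             min_density: float) -> List[int]:
    """Choose between a degree-based and a colour-based ordering."""
    if density(g) < min_density:  # sparse graph: the colour ordering is dismissed
        return o_deg
    return o_color if k < ordering_bound(g, o_deg) else o_deg


# Hypothetical inputs: path 3-1-5-4-2 plus vertex 0 adjacent to every other vertex.
g: Graph = {0: {1, 2, 3, 4, 5}, 1: {0, 3, 5}, 2: {0, 4},
            3: {0, 1}, 4: {0, 2, 5}, 5: {0, 1, 4}}
o_deg = [1, 2, 3, 4, 5, 0]    # stands in for the output of DEG_SORT
o_color = [2, 3, 5, 1, 4, 0]  # classes {2,3,5}, {1,4}, {0}, hence k = 3
print(new_sort(g, o_deg, o_color, 3, min_density=0.5))
```

The strict inequality means that ties are resolved in favour of the degree-based ordering, in line with the rule stated in the text.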

4 Experiments

The hardware used for the experiments was a 20-core Xeon with 128 GB of RAM and Linux OS. All the algorithms considered were run on a single core. These were the following:

  • BBMCX [15]: The most recent and efficient variant of the BBMC family of algorithms. Worth noting is the fact that in the comparison survey [21], the bitstring kernel of BBMCX [10–12] reported the best performance over a set of graphs from public benchmarks. A similar comment appears in a more recent survey [27], and therefore we consider the choice of BBMCX justified.

  • MaxCLQ [14]: A state-of-the-art PMAX-SAT-based maximum clique solver, which uses an upper bound based on the Partial MAXimum SATisfiability problem. It was considered very efficient in [27].

For this report we consider the following initial sorting procedures:

  • MW: Minimum width sorting of vertices.

  • MWS: Minimum width sorting, breaking ties by minimum vertex support σ. In all graphs with 1,000 or more vertices, σ has been computed statically (MWSS) because this is much faster.

  • NEW_SORT: the sorting procedure described in Algorithm 2, which selects the better of the two orderings produced by DEG_SORT and COLOUR_SORT. DEG_SORT is implemented with the parameter p∈{3,4,…,10} tuned for the best performance for each family of graphs. For this task we consider only easy instances in each family, that is, graphs with estimated running times below 5 s. Thus, the tuning process does not constitute a significant constraint in practice.
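For readers less familiar with minimum width orderings, the following Python sketch shows one common way to compute them. It rests on assumptions that are not spelled out in this section: a vertex of minimum degree in the remaining graph is removed at each step, the support σ(v) is taken as the sum of the degrees of the neighbours of v, remaining ties are broken by vertex label, and the function name `min_width_order` is hypothetical. Whether the removal sequence is read forwards or in reverse depends on the solver's convention.

```python
def min_width_order(adj, static_support=True):
    """Return vertices in the order they are removed by a minimum width
    procedure, breaking degree ties by minimum support.

    adj            : {vertex: set of neighbours}, undirected graph
    static_support : if True, support is computed once on the input graph
                     (MWSS-like); otherwise it is recomputed at every step.
    """
    remaining = {v: set(nb) for v, nb in adj.items()}
    # Static support: sum of the original degrees of the neighbours.
    support = {v: sum(len(adj[w]) for w in adj[v]) for v in adj}
    removal = []
    while remaining:
        if not static_support:
            # Dynamic support: degrees measured in the remaining graph.
            support = {v: sum(len(remaining[w]) for w in remaining[v])
                       for v in remaining}
        v = min(remaining, key=lambda x: (len(remaining[x]), support[x], x))
        removal.append(v)
        for w in remaining.pop(v):
            remaining[w].discard(v)
    return removal
```

Recomputing the support at every step costs an additional pass over the remaining graph per removed vertex, which is consistent with the static variant being much faster on large graphs.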

We also compute a strong initial solution with a state-of-the-art heuristic. This was reported to improve the performance of exact maximum clique solvers in [17]. It was also discussed in Section 3.3 as a possible enhancement of COLOUR_SORT. The heuristic we used was ILS (Iterated Local Search, described in [28]) as in the original paper [17]. In all experiments, time is measured in seconds (with precision of milliseconds) and only running times for the actual search are given (the common procedure in maximum clique literature). The time limit for each experiment was fixed at 24 h.

Graphs employed for the tests are taken from the DIMACS (presented at the Second DIMACS Implementation Challenge) and BHOSLIB public data sets. The 67 instances chosen are representative of all families and are frequently used in similar reports in the literature.

Table 3 reports all the results used to evaluate NEW_SORT. The best time for each graph is shown in bold and the minimum number of steps is shown in italics. The column header ω o shows the initial clique computed during standard preprocessing. The column header ω o(ILS) shows the initial clique found by the ILS heuristic, which was optimal in 60 out of the 67 graphs considered. Concerning the algorithm configuration, MaxCLQ was run as provided by the developer and given the same initial solution ω o as the one computed by BBMCX; BBMCX + MW is the current release of BBMCX and BBMCX + NEW_SORT is the enhanced algorithm, which also includes the stronger ω o(ILS) lower bound. Finally, the column headers under BBMCX/NEW report the time and steps ratio between BBMCX + MW and BBMCX + NEW_SORT. In the cases where the performance of an algorithm is below a millisecond (reported as <0.001), the actual value is rounded up to a millisecond to compute the time ratio.
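The rounding convention used for the time ratio can be made explicit with a small Python sketch. The function name `time_ratio` is hypothetical, and running times reported as <0.001 are assumed to be passed in as any value below one millisecond.

```python
def time_ratio(t_mw, t_new, floor=0.001):
    """Ratio between the running times (in seconds) of BBMCX + MW and
    BBMCX + NEW_SORT, rounding sub-millisecond times up to 1 ms."""
    return max(t_mw, floor) / max(t_new, floor)
```

Under this convention, a graph solved in 0.5 s with MW and in less than a millisecond with NEW_SORT is reported with a ratio of 500, which is therefore a lower bound on the actual speed-up.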

Table 3 Evaluation of NEW_SORT

4.1 Evaluation

Of the 67 instances considered, BBMCX + NEW_SORT (or NEW_SORT for simplicity) performs better than BBMCX without NEW_SORT in 49 graphs. It is slower in only 5 graphs and prunes the search space better in 56 graphs. Moreover, the performance is improved by more than 15 times in 15 graphs, notably from the gen, keller, frb, and san families. Interestingly, NEW_SORT prefers COLOUR_SORT to the degree-based sorting computed by DEG_SORT in all graphs of three of those four families, specifically gen, keller, and frb.

We will now discuss the results for each family of graphs concerning BBMCX and NEW_SORT to try to provide explanations for the obtained results. The results by families may be summarized as follows:

  • MANN, hamming and johnson: these sets are not significantly affected by any of the enhancements. A possible explanation for the MANN family is its very high density, which makes preprocessing irrelevant. The graphs from the other two families are easy for all the algorithms, so it is not possible to draw any conclusion.

  • C: DEG_SORT and the strong initial solution together explain the difference in performance of BBMCX for this family of graphs. We estimate the reduction of the search tree with the new initial ordering to be around 7 % in the more difficult C250.9 graph.

  • brock and dsjc: The impact of DEG_SORT is not very significant here. Where the improvement is almost an order of magnitude (brock400_3 and brock400_4), it is explained by the strong initial solution.

  • frb, gen, and keller: When exponential improvements occur, they are mainly due to COLOUR_SORT. Specifically, the frb-30 instances have 30 as both the chromatic and the clique number, and DEG_SORT is unable to capture this structure. COLOUR_SORT, however, finds an optimum colouring, and when vertices are initially sorted in that way, the problem becomes trivial. Instances gen400_p0.9_55 and gen400_p0.9_65 are also trivially solved by BBMCX with COLOUR_SORT, while keller5 is solved more than 30 times faster.

  • p_hat: This family contains non-structured graphs in which significant differences between DEG_SORT and prior orderings were not expected. Interestingly, in three cases DEG_SORT reduces the size of the search tree by more than 10 %. Improvements beyond this threshold are due to the improved initial solution.

  • san, sanr: DEG_SORT improves performance by a small margin, compared with MW, in the more difficult graphs (i.e. those with 0.9 density). However, these types of instances are well known to be sensitive to a good initial solution, so whenever NEW_SORT gives a vast improvement in performance (as in the san400 graphs with densities 0.7 to 0.9), the main explanation is the strong initial solution.

Concerning parameter p in DEG_SORT, the best overall value is 4 (in five families) followed by 5 (in san and sanr), 3 (in C), 8 (in frb30), and finally 10 for the p_hat family. As mentioned previously, the tuning procedure uses the easier instances, so it does not constitute a significant disadvantage in a real application.

With respect to MaxCLQ, the proposed NEW_SORT enhances BBMCX so that the latter performs better in the majority of graphs; specifically, it is faster in 60 cases, more than three times faster in 43 cases, and more than an order of magnitude faster in 26 cases. MaxCLQ is expected to outperform standard BBMCX only in some of the harder, denser graphs (independently of the initial sorting). It does so significantly in the graphs MANN_a27, MANN_a45, and C250.9.

5 Conclusions

This work describes a new initial vertex ordering (NEW_SORT) that significantly improves the performance of a family of exact approximate-colour-based solvers for the MCP.

It does so by selecting the "best" ordering between an improved typical degree-based ordering and a colour-based one. Both sorting procedures have polynomial time complexity and are easy to implement, which makes them useful in practical applications where the exact solution for the maximum clique problem is critical. The best results are obtained when NEW_SORT is further enhanced with a strong initial solution. The reported results show that the improved performance may even be exponential for some graphs.

As a side result, this work also provides an interesting observation for Erdős–Rényi uniform random graphs. It has been observed that the effectiveness of ordering vertices by non-increasing degree for the sequential greedy colouring heuristic SEQ is inversely related to the size of these graphs. Work in progress is concerned with further analysis of this result and, if considered appropriate, establishing a theoretical proof.