Improved initial vertex ordering for exact maximum clique search

Segundo, Pablo San; Lopez, Alvaro; Batsyn, Mikhail; Nikolaev, Alexey; Pardalos, Panos M.

doi:10.1007/s10489-016-0796-9

Published: 24 May 2016

Volume 45, pages 868–880, (2016)

Applied Intelligence

An Erratum to this article was published on 18 November 2016

Abstract

This paper describes a new initial vertex-ordering procedure NEW_SORT designed to enhance approximate-colour exact algorithms for the maximum clique problem (MCP). NEW_SORT considers two different vertex orderings: degree-based and colour-based. The degree-based vertex ordering is an improvement over a well-known vertex ordering used by exact solvers. Moreover, colour-based vertex orderings for the MCP have been traditionally considered suboptimal with respect to degree-based ones. NEW_SORT chooses the “best” of the two orderings according to a new evaluation function. The reported experiments on graphs taken from public datasets show that a leading exact solver using NEW_SORT (and further enhanced with a strong initial solution) can improve its performance very significantly (sometimes even exponentially).

1 Introduction

A simple, undirected graph G=(V,E) is defined by a set of vertices $V=\left\{v_{1},v_{2},\cdots,v_{n}\right\}$ and a set of edges E made up of pairs of distinct vertices ($E\subseteq V\times V$). A clique in graph G is a complete subgraph, that is, a subgraph in which vertices are pairwise adjacent. In this work, we consider the maximum clique problem (MCP), which asks for a clique of the largest cardinality in the graph. The size of a maximum clique is called the clique number of the graph and is usually denoted as ω(G).

The MCP is a well known and deeply studied NP-hard problem in graph theory. Moreover, it has found applications in many different fields, such as data association problems in bioinformatics and computational biology [1–3], computer vision [4], and robotics [5]. Such association problems may be reduced to the MCP in a correspondence graph, which subsumes the matching criteria between the two entities involved. With the upsurge of Web technologies, cliques have also been applied to capture the structure of massive networks. For example, in social networks a clique can identify a group of cooperating agents (e.g. a terrorist cell); in the World Wide Web, cliques or quasi-cliques can help detect frequently visited pages concerning a certain topic. Clique kernels can also help to identify clusters.

Relevant definitions and notation used in the paper are the following:

G[W]=(W,E[W]): a subgraph of graph G induced by a vertex set $W\subseteq V$.
$N(u)=\left\{v\in V \mid (u,v)\in E\right\}$: the neighbour set of vertex u in graph G, that is, the set of vertices adjacent to u. Notation may include a vertex set as a subscript (e.g. $N_W(u)$) to refer to the neighbourhood in the induced subgraph G[W].
deg(u)=|N(u)|: the degree of vertex u.
k-colouring: an assignment of k different numbers (colours) to every vertex of graph G such that adjacent vertices have different colours, that is, $u\in N(v)\Rightarrow c(u)\neq c(v)$. A k-colouring partitions the vertex set V into k disjoint colour sets $C_1,C_2,\ldots,C_k$, also called colour classes. Each colour set $C_i$ is an independent set, that is, a set of pairwise non-adjacent vertices.
χ(G): the chromatic number of graph G, that is, the minimum number k of colours required to colour G.
greedy sequential colouring (SEQ): a colouring heuristic which sequentially assigns the lowest possible colour number to each vertex.
$w(v_{i})=|N(v_{i})\cap \{v_{1},\ldots ,v_{i-1}\}|$: denotes the width at the i-th vertex in a sequence, that is, the number of vertices in $N(v_i)$ that precede $v_i$. The width of an ordering is the maximum width at any of its vertices.
degeneracy (or width) of G: the minimum width of any ordering of V.
minimum-degree-last ordering of vertices: a vertex ordering of minimum width obtained by iteratively removing vertices with minimum degree and placing them in reverse order in the new ordering.
$\sigma (v)=\sum \limits _{u\in N(v)}\text {deg}(u)$: the neighbourhood support of v, that is, the sum of its neighbours’ degrees.

1.1 Exact branch and bound algorithms

In the literature, there are many different approaches to solving the MCP exactly. Most successful exact solvers belong to the family of branch-and-bound algorithms that employ approximate-colour bounds [6–19]. Fahle’s algorithm [7] is possibly the first solver of this type.

Exhaustive enumeration can be traced back to the classic Bron-Kerbosch algorithm [19]. Exact solvers keep track of a growing clique in S and a candidate set of vertices U that can enlarge S. At each step, a single vertex v is selected from U to build a bigger clique in S and create a new, smaller subproblem, with a set of candidates $N_U(v)$. Leaf nodes of the search tree correspond to maximal cliques and, during enumeration, they are checked to see whether their size is greater than the incumbent solution stored in a global variable $S_{\max}$. Every time a bigger clique is found, it is written to $S_{\max}$.

The basic branch-and-bound approach for the MCP can be traced to Carraghan and Pardalos in [22]. Approximate colour bounds for a maximum clique achieve a good compromise between tightness and computational effort. Proposition 1 provides theoretical justification for this, and may be derived trivially from [20].

Proposition 1

Any k-colouring of a graph G gives an upper bound on its clique number (ω(G)≤χ(G)≤k).

In most of the effective approximate-colour exact algorithms for the MCP, the greedy sequential colouring heuristic SEQ is employed to colour each subproblem. SEQ is a constructive heuristic that iteratively assigns the smallest possible colour to every vertex such that no conflicts with the already coloured vertices occur. It has a worst-case running time of O(n²).
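To make the role of the colour bound concrete, the following Python sketch combines the enumeration scheme (a growing clique S and a candidate set U) with a SEQ colouring of U used for pruning, as justified by Proposition 1. It is a minimal illustration under stated assumptions, not the implementation of any of the cited solvers: the function names, the dictionary-of-sets graph representation and the small example graph are all hypothetical.

```python
def seq_num_colours(adj, vertices):
    """Greedy sequential colouring (SEQ) of the subgraph induced by
    `vertices`, taken in the given order; returns the number of colours."""
    colour = {}
    for v in vertices:
        used = {colour[u] for u in adj[v] if u in colour}
        c = 1
        while c in used:          # lowest colour not used by a neighbour
            c += 1
        colour[v] = c
    return max(colour.values(), default=0)


def max_clique(adj):
    """Basic branch and bound with a SEQ colour bound (illustrative only)."""
    best = []                     # incumbent solution (S_max in the text)

    def expand(S, U):
        nonlocal best
        if not U:                 # leaf node: S is a maximal clique
            if len(S) > len(best):
                best = list(S)
            return
        U = list(U)
        while U:
            # Proposition 1: no clique inside U is larger than the number
            # of colours, so |S| + colours bounds this whole subproblem.
            if len(S) + seq_num_colours(adj, U) <= len(best):
                return
            v = U.pop()           # branch on the last candidate
            expand(S + [v], [u for u in U if u in adj[v]])

    expand([], sorted(adj))
    return best


# Hypothetical 6-vertex example: a hexagon 1-2-3-4-5-6 with chords 1-3, 1-5.
graph = {1: {2, 3, 5, 6}, 2: {1, 3}, 3: {1, 2, 4},
         4: {3, 5}, 5: {1, 4, 6}, 6: {1, 5}}
clique = max_clique(graph)
```

The sketch recolours the candidate set before every branching step, which is simple but wasteful; it is only meant to show where the bound enters the search.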

Relevant recent improvements reported in the literature for exact maximum clique algorithms that employ approximate-colour bounds are (in chronological order):

Branching on maximum colour: at each step (a recursive call of the algorithm), vertices are selected for branching in non-increasing order of their colour numbers. This was first described in algorithm MCQ [8].
Recolouring: an additional computation which aims at reducing the size of the colouring obtained by SEQ, at the cost of raising its worst-case complexity by a linear factor, to O(n³). It was first described in algorithm MCS [9].
Static ordering of vertices: vertices in every subproblem are always sorted in the order determined at the beginning of the search. This was first described independently both in MCS and in the bit-parallel kernel of the BBMC family of algorithms [10, 11].
Bitstring encoding of the MCP [10–12, 15]: the BBMC family of algorithms represents vertex sets, as well as the input graph, via bitstrings. The advantage is that critical operations related to child problem generation and bound computation are performed more efficiently using bitmasks.
Selective colouring: a partial SEQ colouring in which only the subset of vertices to be pruned in the child subproblem is coloured. It was first described in BBMCL [12] (the ‘L’ stands for seLective).
Strong heuristic for a ’good’ initial solution, as described in [17].
Infra-chromatic bound: a bound tighter than the one obtained by SEQ. This bound can possibly be lower than the chromatic number of the input graph. In MaxCLQ [13, 14], the authors proposed one such bound based on reducing the maximum clique problem determined by each coloured subgraph to the partial maximum satisfiability problem. The term infra-chromatic first appeared in [15], where the BBMCX algorithm is described. BBMCX shares the bitstring BBMC kernel and implements an infra-chromatic bound by looking for triplets of colour sets in which there are no vertices that can form a triangle. For each such triplet, denoted inconsistent, the bound is decremented from 3 to 2. BBMCX is currently the fastest published algorithm of the BBMC family for dense graphs.
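The bitstring idea mentioned in the list above can be illustrated in a few lines. The sketch below uses Python integers as stand-ins for fixed-width bitstrings and shows how the candidate set of a child subproblem is obtained with a single AND operation. It is an assumption-laden toy: the helper names and the 4-vertex graph are hypothetical, and the actual data structures of the BBMC family are not reproduced here.

```python
def to_bitmasks(adj):
    """Encode each neighbour set as an integer: bit u of mask[v] is set
    iff u is adjacent to v (vertices are numbered from 0)."""
    return {v: sum(1 << u for u in ns) for v, ns in adj.items()}


def members(mask):
    """Decode a bitmask back into a sorted list of vertex numbers."""
    out = []
    while mask:
        low = mask & -mask               # isolate the lowest set bit
        out.append(low.bit_length() - 1)
        mask ^= low                      # clear it
    return out


# Hypothetical graph: triangle {0, 1, 2} plus a pendant vertex 3 attached to 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
nbr = to_bitmasks(graph)

U = 0b1111                               # all four vertices are candidates
child_after_2 = U & nbr[2]               # candidates after branching on vertex 2
child_after_2_0 = child_after_2 & nbr[0] # ... and then on vertex 0
```

Here `members(child_after_2)` lists the vertices adjacent to 2, and a second AND narrows the set to the common neighbours of 2 and 0.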

Table 1 summarizes the majority of algorithms described in this section, together with their most relevant properties.

Table 1 A number of relevant exact maximum clique solvers in chronological order


This work describes two initial orderings of vertices that are efficient for successful approximate-colour solvers, such as MCS or BBMCX, with the exception of MaxSAT-based MaxCLQ.

Related to branching on maximum colour and static ordering is the fact that the initial sorting of vertices is well known to have a significant impact on the size of the MCP search tree. We discuss this issue in the next subsection, as it is very much concerned with the contribution of this work.

1.2 Initial ordering of vertices

A well-known initial sorting strategy for exact MCP solvers is to branch on vertices with the smallest degree at the root node. The idea is similar to branching on variables with a small number of values used in constraint satisfaction problems. In practice, vertices with the smallest degree are placed last in V and branching in all subproblems, including the root node, is done by selecting vertices in reverse order.

The most successful initial sorting strategies for exact MCP algorithms reported in the literature are the following:

Minimum width (MW): a minimum-degree-last ordering (ties broken randomly or in natural order), which can be traced back to [22] in connection with the MCP. As explained previously, it is a degenerate ordering achieved by removing, at each iteration, the vertex with the minimum degree and placing it in reverse order in the new ordering.
Minimum width with tie break by minimum support (MWS): Similar to MW but with a tie-break strategy: it selects the vertex with the minimum support from the set of vertices with the same degree.

A lighter variant for computing MWS was brought to our attention in a personal communication [23] and will be referred to as MWSS (Minimum Width with Static Support). It is similar to MWS, but instead of recalculating neighbour support at each step it uses the vertex support determined by the initial ordering in every iteration. MWSS is very useful in graphs of high order, in which the computational cost of MWS is high. By default, support tiebreak in this paper always refers to MWS. MWSS will be explicitly mentioned when disambiguation is required.
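The three degree-based orderings differ only in how ties between minimum-degree vertices are broken, which the following Python sketch makes explicit. It is a minimal illustration, assuming a dictionary-of-sets graph and breaking any remaining ties by vertex number; the function name, the `tie_break` parameter and the example graph are hypothetical and not taken from the cited solvers.

```python
def min_width_ordering(adj, tie_break="none"):
    """Minimum-degree-last ordering of the vertices of `adj`.

    tie_break = "none"    -> MW   (ties by vertex number)
    tie_break = "support" -> MWS  (support recomputed in the remaining graph)
    tie_break = "static"  -> MWSS (support computed once, in the input graph)
    """
    remaining = {v: set(ns) for v, ns in adj.items()}
    static_support = {v: sum(len(adj[u]) for u in adj[v]) for v in adj}
    removed = []
    while remaining:
        def key(v):
            if tie_break == "support":
                sup = sum(len(remaining[u]) for u in remaining[v])
            elif tie_break == "static":
                sup = static_support[v]
            else:
                sup = 0
            return (len(remaining[v]), sup, v)

        v = min(remaining, key=key)       # minimum degree, then support
        for u in remaining[v]:
            remaining[u].discard(v)       # degrees of neighbours drop by one
        del remaining[v]
        removed.append(v)
    return removed[::-1]                  # first removed is placed last


# Hypothetical example: a hexagon 1-2-3-4-5-6 with chords 1-3 and 1-5.
graph = {1: {2, 3, 5, 6}, 2: {1, 3}, 3: {1, 2, 4},
         4: {3, 5}, 5: {1, 4, 6}, 6: {1, 5}}
mw = min_width_ordering(graph)
mws = min_width_ordering(graph, "support")
mwss = min_width_ordering(graph, "static")
```

MWSS avoids the inner sum over the remaining graph at every step, which is where the saving for graphs of high order comes from.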

Initial sorting by degree has been the standard choice of successful approximate-colour exact algorithms for the MCP. Recently, a colour-based ordering was described in [18] as a possible enhancement of the MaxCLQ algorithm. However, the specific impact of the ordering was not analysed.

In Section 2 of this paper, we describe a new initial sorting procedure, DEG_SORT, which improves standard MW/MWS degree-based orderings. Section 3 starts by describing a colour-based initial sorting procedure COLOUR_SORT, based on [18], and explains why it can also be successful for MCS or the BBMC family of algorithms. The final part of the section describes the NEW_SORT algorithm proposed in this work. NEW_SORT selects DEG_SORT or COLOUR_SORT according to a new evaluation function. Section 4 covers the experiments and validation. Finally, Section 5 presents the conclusions and future work.

2 Improved degree-based initial sorting

Branching on vertices with the highest colour was first proposed in MCQ [8] and, since then, has been applied by most successful MCP exact solvers. In practice, as mentioned previously, vertices are sorted in a highest-colour-last fashion and taken in reverse order in every subproblem. Figure 1 depicts the control flow: vertices are coloured by greedy SEQ according to the initial ordering (Fig. 1, top), sorted according to non-decreasing colour number, and then selected in reverse order (Fig. 1, bottom).

2.1 Analysis of largest-first vertex colouring heuristic

To evaluate the quality of the bounds obtained by direct implementation of the flow in Fig. 1 (very much related to the proposed new sorting heuristic), we consider the Largest-First (LF) decision heuristic for greedy vertex colouring of Welsh and Powell. In [24], they proved that given a non-increasing degree ordering of vertices (i.e. deg(v ₁)≥ deg(v ₂)≥…≥ deg(v _n)), SEQ would always produce not more than $\max \limits _{i \in V}\min \{i,1+\text {deg}(v_{i})\}$ colours. This is known as the Welsh and Powell bound. We will refer to the opposite (bad) ordering deg(v ₁)≤ deg(v ₂)≤…≤ deg(v _n) as Smallest-First (SF).
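A short Python sketch can be used to check the Welsh and Powell bound against SEQ on a toy instance. The helper names and the 4-vertex path used as an example are assumptions made for illustration; the bound itself is the expression given in the paragraph above, with degrees taken in non-increasing order.

```python
def seq_num_colours(adj, order):
    """Number of colours used by greedy sequential colouring (SEQ)
    when vertices are processed in `order`."""
    colour = {}
    for v in order:
        used = {colour[u] for u in adj[v] if u in colour}
        c = 1
        while c in used:
            c += 1
        colour[v] = c
    return max(colour.values())


def welsh_powell_bound(adj):
    """max over i of min(i, 1 + deg(v_i)), degrees sorted non-increasingly."""
    degrees = sorted((len(ns) for ns in adj.values()), reverse=True)
    return max(min(i, 1 + d) for i, d in enumerate(degrees, start=1))


# Hypothetical example: the path 1-2-3-4.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
lf_order = sorted(path, key=lambda v: -len(path[v]))   # Largest-First
sf_order = sorted(path, key=lambda v: len(path[v]))    # Smallest-First

lf_colours = seq_num_colours(path, lf_order)
sf_colours = seq_num_colours(path, sf_order)
bound = welsh_powell_bound(path)
```

On this path, LF meets the bound with two colours, whereas SF colours both end vertices first and is forced to open a third colour.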

The key idea of LF is to assign colour numbers to the most conflicting vertices early in the hope that those remaining will require a small number of colours (ideally, not different from those used in the early stages). Moreover, Observation 1 is widely accepted (see for example [25]) and many recent exact MCP solvers apply some variant of LF ordering for SEQ colouring [8–12, 15–17].

Observation 1

Greedy sequential colouring of vertices sorted according to the LF rule almost always produces tighter colourings than the Welsh and Powell bound.

A qualitative measure of the impact of LF ordering in SEQ may be found in Table 2. There, LF is compared with its counterpart SF in structured and non-structured uniform random graphs. Note that we do not compare the number of colours in LF colouring with the chromatic number since it is impractical to compute the chromatic number in most of the graphs.

Table 2 Comparison between Largest-First and Smallest-First colouring heuristics for a number of structured (see Appendix) and uniform random graphs


In the case of structured instances, the table reports average colour sizes for typical members of each family (brock200_1 – brock400_4, dsjc 500.1/5 – dsjc 1000.1/5, MANN_a9 – MANN_a27 etc.; see the Appendix for the full list). In the case of Erdös-Rényi random graphs G(n,p), the table reports average colour sizes for different values of n and density p (we consider 50 instances for each graph type). Table 2 gives evidence that the LF rule produces tighter sequential colourings, on average, than the SF one: up to 12 % improvement for non-structured graphs and 30 % for structured graphs. The exception is the MANN family, in which SF actually improves the colouring. This may be explained by the high density (p>0.92) of these particular graphs. We note that even a small bound improvement can produce an exponential reduction in the size of the maximum clique search tree.

Another interesting result to be derived from Table 2 is Observation 2, where the term benefit refers to the gap between LF and SF as a percentage.

Observation 2

The benefit of LF ordering for SEQ colouring diminishes with the growth of graph size and density in the case of uniform random graphs.

Figure 2 corresponds to the data in Table 2 but includes density information for each graph order considered. Observation 2 is captured by the fact that lines in the line chart are aligned by increasing graph size from top to bottom in the figure. Lines cross for some neighbouring sizes and different densities [e.g. (200, 0.1) shows a 16.81 % improvement whereas (250, 0.1) shows a superior 17.65 % improvement], but the trend is clearly there. We are not aware of this fact being reported elsewhere and consider Observation 2 as an additional contribution of the paper.

We propose the following intuition as an explanation. Let us consider the cases in which any sequential ordering mistakenly uses colour χ(G)+1 for some vertex v, where χ(G) is the chromatic number of the coloured graph G. Such a case is shown in Fig. 3, in which vertex v has χ neighbours with colours 1, …, χ and has to be coloured with colour χ+1.

In the case of the LF sequential colouring, this happens when vertex v has a smaller degree than each of these χ neighbours. For a better understanding, we present several such cases in Fig. 4. Colours are shown with numbers near vertices. In the first graph, vertex v has two neighbours and colour 3, although the chromatic number is 2; in the second graph, it has three neighbours and colour 4; in the third, it has four neighbours and colour 5.

We now show informally that the bad case depicted in Fig. 3 is more likely to occur when the number of edges per vertex is small. This would explain the decrease in performance of LF with graph order as well as density. For a given graph G=(V,E), let us consider an increase in the ratio |E|/|V| and thus an increase in average and maximum degrees. This also results in an increment of the clique number ω and the chromatic number χ because χ≥ω. In this scenario, the probability of the case shown in Fig. 3 decreases mainly for two reasons: first, because the degree of vertex v cannot be less than χ, and therefore its expected degree increases faster than the expected minimum degree of its χ neighbours; second, because the probability of these χ neighbours all having degree greater than deg(v) decreases as the number χ of these neighbours increases.

What we have presented is just an intuitive explanation of Observation 2. We believe that attempting to provide rigorous proof is, at this point, impractical. It would probably require a big theorem for a relatively simple result.

2.2 Sorting a fraction of vertices by non-increasing degree

Having established the relevance of LF sorting in sequential colouring, we now proceed to describe a new sorting procedure for exact maximum clique algorithms. In MCQ [8], the colour ordering required for branching (Fig. 1, bottom) is inherited in child subproblems. As a consequence, SEQ is given a suboptimal (non-LF) ordering and its pruning ability is diminished. A first alternative to improve this situation, described in [16], was to reorder vertices by non-increasing degree prior to colouring (i.e. explicit LF), but its computational cost is high. The same paper also described a way to selectively apply this strategy in the shallower levels of the search tree.

A better compromise (currently considered the best approach) is to use a static ordering in all subproblems. As mentioned in the introductory section, this decision heuristic was first proposed independently in [9] and [10] and is currently used by state-of-the-art BBMC and MCS solvers. In static ordering, vertices in every subproblem are always kept in the same relative order as determined initially. Specifically, the pruning ability of static ordering is high in the shallow levels of the search tree and degrades with depth, as subproblems become smaller and the initial sorting is gradually lost.

Related to the colour flow in Fig. 1, both initial vertex ordering strategies MW and MWS described in Section 1.2 are reasonably consistent with LF greedy colouring, in the sense that vertices with high degrees are implicitly placed first in V and colouring proceeds from first to last. However, vertices are actually placed following a smallest-degree-last strategy, which can differ considerably from an explicit highest-degree-first sorting because both MW and MWS are degenerate orderings.

It is easy to see this effect with the example depicted in Fig. 5. Figure 5A shows a simple graph G in which vertices are numbered according to an initial default ordering that uniquely identifies them in the rest of the figures. This ordering will also determine tiebreaks when required. From the perspective of the control flow in Fig. 1, vertices are coloured in natural order (i.e. starting from vertex {1} and going anti-clockwise) and selected in reverse order (i.e. starting from {6} and going clockwise).

Figure 5B presents the minimum width ordering (MW) of the graph, and Fig. 5C the minimum width ordering with vertex support (MWS). The difference between them lies in the support of vertices {2} and {4}, which have both the same degree (deg(2) = deg(4) = 2). Ties are broken by vertex number for MW, so vertex {2} is picked first (and placed last) in the new ordering. In the case of MWS, σ(2)=7, whereas σ(4)=6, so vertex {4} is the one placed at the end. After removing {4}, two triangles appear: {1, 2, 3} and {1, 5, 6}; vertices {2, 3, 5, 6} all have minimum degree and support, so vertex {2} is selected in second place and so on.

Examining the resulting MW and MWS orderings from the perspective of the control flow in Fig. 1, it is clear that vertices are not sorted by non-increasing degree at the head of the ordering. In particular, the vertex with the highest degree {1} (deg(1) = 4) comes in third place in both cases. The reason for this lies in the degenerate ordering, which iteratively removes each sorted vertex and thus reduces the degree of the remaining vertices to their core number. In the example, vertices {1, 5, 6} are the last remaining vertices for both MW and MWS (a three-clique). The latter graph is obviously also regular, so all vertices have the same degree and are sorted in reverse order of their numbers. As a consequence, vertex {1} is misplaced.

In the light of the above considerations, we propose an improved initial sorting procedure DEG_SORT, which can be seen as a repair mechanism for MW and MWS with respect to (maximum) degree at the head of the ordering. DEG_SORT takes as input MWS and sorts, according to non-increasing degree, a subset of the first k vertices $v_1,v_2,\ldots,v_k$ (vertices with the same degree are taken according to their number). This second ordering is absolute (not degenerate) since it is directed to be as close as possible to LF in the subproblems that appear in the shallow levels of the search tree. The remaining n – k vertices are not modified and remain sorted by minimum width with vertex support. Figure 5D shows the ordering obtained by DEG_SORT in the example: vertex {1} with the highest degree is swapped with vertex {6} and placed first in the list.

Parameter k (the number of vertices reordered by DEG_SORT) should be neither too small (and thus with low impact) nor too big (the original minimum width ordering would be lost). Rather than using k as a tuning parameter, we consider a new parameter p related to the total number of vertices and define it as follows:

$$p=\left\lfloor\frac{|V|}{k}\right\rfloor,\quad p\in\{2,3,\ldots\}$$

In practice, DEG_SORT performs best when p ranges between 2 (50 % of the vertices) and 10 (10 % of the vertices). In non-structured Erdös-Rényi graphs, the best results on average appear when p is set to 3. In the case of structured graphs, they are obtained when p is set to 4, but tuning is recommended in both cases whenever possible.
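The DEG_SORT repair step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, k is recovered from p as ⌊|V|/p⌋ (an assumed inversion of the definition of p), and the 6-vertex graph was chosen so that its degrees and supports agree with the values quoted for the example in the text; whether it coincides edge for edge with the graph of Fig. 5 is an assumption.

```python
def mws_ordering(adj):
    """Minimum width ordering with ties broken by minimum support (MWS);
    any remaining ties are broken by vertex number."""
    remaining = {v: set(ns) for v, ns in adj.items()}
    removed = []
    while remaining:
        v = min(remaining,
                key=lambda x: (len(remaining[x]),
                               sum(len(remaining[u]) for u in remaining[x]),
                               x))
        for u in remaining.pop(v):
            remaining[u].discard(v)
        removed.append(v)
    return removed[::-1]                 # minimum-degree-last


def deg_sort(adj, p):
    """Re-sort the first k = |V| // p vertices of the MWS ordering by
    non-increasing degree in the input graph; leave the tail untouched."""
    order = mws_ordering(adj)
    k = len(order) // p                  # assumed relation between p and k
    head = sorted(order[:k], key=lambda v: (-len(adj[v]), v))
    return head + order[k:]


# Hexagon 1-2-3-4-5-6 with chords 1-3 and 1-5 (assumed stand-in for Fig. 5).
graph = {1: {2, 3, 5, 6}, 2: {1, 3}, 3: {1, 2, 4},
         4: {3, 5}, 5: {1, 4, 6}, 6: {1, 5}}
before = mws_ordering(graph)
after = deg_sort(graph, p=2)
```

With p = 2 the first three vertices are re-sorted, so the highest-degree vertex 1 moves from third place to the head of the ordering while the tail keeps its minimum-width order.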

3 Colour-based initial ordering of vertices

3.1 Preliminaries

As explained in previous sections, an initial ordering of vertices based on degree is well known to reduce the size of the search tree in exact maximum clique search. It is also employed by successful modern algorithms such as BBMC and MCS. The logic behind it is to minimize branching in the first level of the tree. Moreover, BBMC and MCS preserve the ordering in every other subproblem as well (to improve the bound obtained by SEQ; see Fig. 1), so the benefits of a good initial ordering also propagate down the search tree to a certain depth.

In [18], the possibility of sorting vertices initially according to a colouring of the graph $C(G)=\{C_1,C_2,\ldots,C_k\}$ was described. The intuition is that it should somehow prune the maximum clique search space effectively in graphs where k is a good bound on the clique number, but this was not analysed systematically in the original paper. Interestingly, the current implementations of BBMCX and MCS spend little effort in computing upper bounds on maximum clique at the root node. A typical strategy is to assign to a vertex as colour number the minimum value between its index and maximum graph degree. The above considerations motivate a systematic study of colour-based initial sorting.

The next subsection describes the sorting procedure COLOUR_SORT, which is based on [18] with additional refinements. In Subsection 3.3, we give additional explanations as to why COLOUR_SORT can be successful for BBMCX or MCS with an example. Finally, the last subsection describes the new sorting algorithm NEW_SORT, which is the main contribution of this work.

3.2 The colour-based sorting algorithm

COLOUR_SORT is described in Algorithm 1. The main computation is a variant of the constructive recursive-largest-first (RLF) colouring heuristic, which was first described in [26]. RLF computes colour classes one at a time and does not proceed with another colour until no more vertices can enlarge the current one. In the original paper, the assignment is implemented in the following way: when a new colour class $C_k$ is opened, set $W_1$ contains all remaining uncoloured vertices and set $W_2$ is empty. Iteratively, a vertex $v\in W_1$ is selected, added to $C_k$, and removed from $W_1$. If v has any neighbours, they are also removed from $W_1$ and placed in $W_2$. The assignment of vertices proceeds until $W_1=\emptyset$. The selection of vertices is based on degree. The first vertex is the one with maximum degree in $G[W_1]$ and the rest of vertices are those with maximum degree in $G[W_2]$. Once $W_1$ becomes empty, the next colour class is built.

COLOUR_SORT orders vertices in V according to the colour classes obtained by RLF. The specific variant used takes into account two factors:

A strong exact maximum clique algorithm is available, in this case BBMCX.
The graph to be ordered is expected to be dense, since finding its clique number presents a challenge.

The actual RLF variant used by COLOUR_SORT computes each new colour set as an independent set (a maximum clique in the complement graph $\bar {G}$) (steps 2 to 7). Once a colour set is produced, its vertices are placed in order in $O_{color}$ and removed from $\bar {G}$. COLOUR_SORT then proceeds with a new colour set until no more vertices are left in $\bar {G}$.
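The colour-class loop described above can be sketched in Python. The sketch is deliberately naive and rests on several assumptions: a brute-force search over vertex subsets stands in for the exact maximum clique solver run on the complement graph, the function names are hypothetical, and the steps of Algorithm 1 are not reproduced. It is only practical for very small graphs.

```python
from itertools import combinations


def max_independent_set(adj, vertices):
    """Largest independent set among `vertices`, found by brute force
    (equivalently, a maximum clique of the complement graph)."""
    for size in range(len(vertices), 0, -1):
        for cand in combinations(vertices, size):
            if all(b not in adj[a] for a, b in combinations(cand, 2)):
                return list(cand)
    return []


def colour_sort(adj):
    """Build colour classes one at a time, each a maximum independent set
    of the still-uncoloured vertices, and concatenate them into an ordering."""
    remaining = sorted(adj)
    order, classes = [], []
    while remaining:
        cls = max_independent_set(adj, remaining)
        classes.append(cls)
        order.extend(cls)                 # class members are placed in order
        remaining = [v for v in remaining if v not in cls]
    return order, classes


# Hypothetical example: the 5-vertex path 3-1-5-4-2 (a bipartite graph).
path = {1: {3, 5}, 2: {4}, 3: {1}, 4: {2, 5}, 5: {1, 4}}
order, classes = colour_sort(path)
```

On this path, the first class takes the three pairwise non-adjacent vertices 2, 3 and 5, and the two remaining vertices form the second class, so the number of classes equals the chromatic number.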

3.3 An example

To see why COLOUR_SORT can be beneficial for successful approximate-colour algorithms, we will use the coloured graph G depicted in Fig. 6. We assume G to be a subproblem, close to a leaf node, of a maximum clique search tree. The output of SEQ for the graph is C₁={1,2} (green), C₂={3,4} (yellow), and C₃={5} (cyan), as shown. The figure also indicates the colour threshold $k_{\min}$ (the difference between the size of the best clique found so far $\left|S_{\max}\right|$ and the size of the clique being built in the branch $\left|S\right|$) for the subproblem, which is 3. This implies that all vertices belonging to colour classes below this threshold (in the example, sets C₁ and C₂) will be pruned in any derived child node (for a more detailed description of the threshold, see [12] amongst others).

In algorithms such as BBMC or MCS, pruning the search space can be seen as a technique that accumulates as many vertices as possible behind the $k_{\min}$ threshold. There are three main alternatives to achieve this:

I. Incrementing the colour threshold $k_{\min}$, or, alternatively, moving the dotted line to the right: this can be done by finding good solutions early, either by making good branching choices or by computing a strong initial solution. Note that the latter can produce very effective pruning, since it increases $\left|S_{\max}\right|$ in the shallow levels of the search tree.
II. Shifting vertices from the right to the left of the threshold: this can be achieved with techniques such as recolouring or infra-chromatic pruning. In the example, BBMCX detects that the induced subgraph $G[C_{1}\cup C_{2}\cup C_{3}]$ is triangle-free and reduces the bound from 3 to 2, so that {5} now falls below the threshold.
III. Improving the quality of the greedy SEQ colouring, that is, changing its output to produce colour classes $C_i$, $i<k_{\min}$, that are as large as possible.

The last point is especially relevant to explain why COLOUR_SORT could be successful for some graphs. SEQ is an oriented heuristic. If, in the example, the vertices were presented in the order {2}, {3}, {5}, {1}, {4}, it would find the optimum colouring C₁={2,3,5} and C₂={1,4} (after all, the graph is bipartite). Intuitively, since the relative order of vertices determined initially remains the same for all subproblems (see Subsection 1.1), a colour-based sorting of vertices at the root node could improve the SEQ colourings of many subproblems (possibly also in the deeper levels of the search tree). This can prune the search space better (sometimes even exponentially better) than a standard degree-based ordering in some cases, as will be shown in the next section.
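The dependence of SEQ on the vertex order can be reproduced with a few lines of Python. The edge set below is an assumption: it is the only bipartite graph on five vertices for which SEQ yields the two colourings quoted in the example (classes {1,2}, {3,4}, {5} in natural order and {2,3,5}, {1,4} in the alternative order), but it has not been checked against Fig. 6 itself. The function name is hypothetical.

```python
def seq_colour_classes(adj, order):
    """Greedy sequential colouring (SEQ): each vertex joins the first
    colour class that contains none of its neighbours."""
    classes = []
    for v in order:
        for cls in classes:
            if not (adj[v] & cls):        # no conflict with this class
                cls.add(v)
                break
        else:
            classes.append({v})           # open a new colour class
    return classes


# Assumed edge set: the path 3-1-5-4-2.
graph = {1: {3, 5}, 2: {4}, 3: {1}, 4: {2, 5}, 5: {1, 4}}

natural = seq_colour_classes(graph, [1, 2, 3, 4, 5])
resorted = seq_colour_classes(graph, [2, 3, 5, 1, 4])
```

The same heuristic needs three colours in natural order and only two when the vertices arrive grouped by colour class, which is the effect a colour-based initial ordering tries to exploit.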

To summarize, we believe that COLOUR_SORT can be successful for the BBMC family of algorithms when the following two conditions are met:

it is possible to greedily find a colouring of the input graph that is close to optimal.
the chromatic number of the graph is a tight bound on its clique number.

Moreover, COLOUR_SORT can be even more effective if it is combined with a strong initial solution at the start of the search. As explained, a good initial lower bound would shift the threshold $k_{\min}$ to the right and increase the number of colour classes to the left of the threshold in the shallow (and critical) levels of the search tree.

3.4 The initial sorting algorithm

Before selecting COLOUR_SORT as the initial sorting procedure, we first need to compare it with its degree-based counterpart. In [18], the tail of the colouring, that is, the colour classes with the highest colour numbers, is used for evaluation. A colouring is defined as regular if its tail contains not more than one colour class with a single vertex. If two or more singleton sets exist, it is considered irregular and dismissed.

In this work, we propose to compare any two initial vertex orderings for exact maximum clique search in the following manner. For a given vertex ordering $O=(v_1,v_2,\ldots,v_n)$, let $G_{v_{i}}=G[N_{\{v_{1},v_{2},\ldots ,v_{i-1}\}}(v_{i})]$ be the subproblem induced by the preceding neighbours of $v_i$ in the ordering and let $u(v_{i})\geq 1+\omega (G_{v_{i}})$ be any upper bound on $\omega (G[N_{\{v_{1},v_{2},\ldots ,v_{i-1}\}}(v_{i})\cup \{v_{i}\}])$. We then define an upper bound for the ordering O as $u(O)=\max \limits _{v_{i}\in V}\{u(v_{i})\}$. We consider the ordering $O_1$ to be preferable to the ordering $O_2$ if $u(O_1)<u(O_2)$.
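One way to instantiate the bound u(O) is to colour, for every vertex, the subgraph induced by its preceding neighbours with greedy sequential colouring and add one for the vertex itself. The Python sketch below does exactly that. It is one possible choice of upper bound under the definition above, not necessarily the computation performed by the authors; the function names and the wheel graph used as an example are hypothetical.

```python
def seq_num_colours(adj, vertices):
    """Number of colours used by greedy sequential colouring on the
    subgraph induced by `vertices`, taken in the given order."""
    colour = {}
    for v in vertices:
        used = {colour[u] for u in adj[v] if u in colour}
        c = 1
        while c in used:
            c += 1
        colour[v] = c
    return max(colour.values(), default=0)


def ordering_bound(adj, order):
    """u(O): maximum over vertices of 1 + (colours needed for the
    neighbours that precede the vertex in the ordering)."""
    bound = 0
    for i, v in enumerate(order):
        preceding = [u for u in order[:i] if u in adj[v]]
        bound = max(bound, 1 + seq_num_colours(adj, preceding))
    return bound


# Hypothetical example: a wheel with hub 0 and rim 1-2-3-4-5 (clique number 3).
wheel = {0: {1, 2, 3, 4, 5}, 1: {0, 2, 5}, 2: {0, 1, 3},
         3: {0, 2, 4}, 4: {0, 3, 5}, 5: {0, 1, 4}}
hub_first = ordering_bound(wheel, [0, 1, 2, 3, 4, 5])
hub_last = ordering_bound(wheel, [1, 2, 3, 4, 5, 0])
```

Placing the hub first gives the tight value 3, whereas placing it last forces the whole odd cycle to be coloured as its preceding neighbourhood and yields 4, so the first ordering would be preferred.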

With the help of this new bound u(O), our algorithm NEW_SORT (Algorithm 2) evaluates both vertex ordering procedures, degree-based $O_{deg}$ (described in Section 2) and colour-based $O_{color}$, and selects the one with the smallest value of u(O). There are different ways to compute valid upper bounds for an ordering according to our previous definition. NEW_SORT uses greedy colouring SEQ (step 5). The notation $SEQ_{O_{deg}}$ indicates that $O_{deg}$ is the initial order of vertices for SEQ. The value of $u(O_{color})$ is equal to the number of colours k of the RLF colouring $\{C_1,\ldots,C_k\}$ computed by COLOUR_SORT. This is because, in this colouring, $v_{i}\in C_{j}\Rightarrow u(v_{i})=j$ for any vertex in the ordering. Based on the u(O) value for both orderings, a decision is made: NEW_SORT selects $O_{color}$ if k is strictly lower than $u(O_{deg})$ and selects $O_{deg}$ otherwise (step 6).

Finally, we note that if the input graph is not sufficiently dense, the task of finding a maximum clique in the complement graph becomes impractical. To avoid this, NEW_SORT follows the same strategy as [18] and dismisses $O_{color}$ if the average density of the graph p(G) is below a certain threshold (step 2).
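The selection rule described in this section can be sketched as follows. This is a rough illustration rather than Algorithm 2 itself: the two orderings and the number k of RLF colour classes are assumed to be computed elsewhere, the threshold value is not given in this section and appears here as the hypothetical parameter `min_density`, and all function names are illustrative.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def seq_colour_count(adj: Graph, order: List[int]) -> int:
    """Number of colours used by sequential greedy colouring (SEQ) in `order`."""
    colour: Dict[int, int] = {}
    for v in order:
        used = {colour[w] for w in adj[v] if w in colour}
        c = 1
        while c in used:
            c += 1
        colour[v] = c
    return max(colour.values(), default=0)


def density(adj: Graph) -> float:
    """Edge density p(G) = 2|E| / (n (n - 1))."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    return 0.0 if n < 2 else 2.0 * m / (n * (n - 1))


def choose_ordering(adj: Graph, order_deg: List[int], order_colour: List[int],
                    k: int, min_density: float) -> List[int]:
    """Pick the colour-based ordering only if the graph is dense enough
    and k is strictly lower than u(O_deg); otherwise keep the degree-based one."""
    if density(adj) < min_density:  # hypothetical threshold, cf. step 2
        return order_deg
    u_deg = seq_colour_count(adj, order_deg)  # cf. step 5
    return order_colour if k < u_deg else order_deg  # cf. step 6
```

Ties go to the degree-based ordering, which mirrors the strict inequality used in the text.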

4 Experiments

The hardware used for the experiments was a 20-core Xeon with 128 GB of RAM running Linux. All the algorithms considered were run on a single core. These were the following:

BBMCX [15]: The most recent and efficient variant of the BBMC family of algorithms. Worth noting is the fact that in the comparison survey [21], the bitstring kernel of BBMCX [10–12] reported the best performance over a set of graphs from public benchmarks. A similar comment appears in a more recent survey [27], and therefore we consider the choice of BBMCX justified.
MaxCLQ [14]: A state-of-the-art PMAX-SAT-based maximum clique solver, which uses an upper bound based on the Partial MAXimum SATisfiability problem. It was considered very efficient in [27].

For this report we consider the following initial sorting procedures:

MW: Minimum width sorting of vertices.
MWS: Minimum width sorting, breaking ties by minimum vertex support σ. In all graphs over (and including) 1,000 vertices, σ has been computed statically (MWSS) because it is much faster.
NEW_SORT: the sorting procedure described in Algorithm 2, which selects the best ordering between DEG_SORT and COLOUR_SORT. DEG_SORT is implemented with the parameter p∈{3,4,…,10} tuned for the best performance for each family of graphs. For this task we consider only easy instances in each family, that is, graphs with estimated running times below 5s. Thus, the tuning process does not constitute a significant constraint in practice.
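For readers unfamiliar with minimum width sorting, the following Python sketch shows one common way to compute it, with optional tie-breaking by a statically computed support (in the spirit of MWSS). It assumes that the support of a vertex is the sum of the degrees of its neighbours in the input graph and that remaining ties are broken by vertex index; both choices, and the function name `min_width_order`, are assumptions made for illustration only.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def min_width_order(adj: Graph, use_support: bool = True) -> List[int]:
    """Repeatedly remove a vertex of minimum degree in the remaining graph.
    Ties are broken by static support if requested, then by vertex index.
    Returns the vertices in removal order."""
    # Static support: computed once on the input graph, never updated.
    support = {v: sum(len(adj[w]) for w in adj[v]) for v in adj}
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    removal: List[int] = []
    while remaining:
        v = min(remaining,
                key=lambda x: (degree[x], support[x] if use_support else 0, x))
        removal.append(v)
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1  # degrees are updated dynamically
    return removal
```

The sketch returns the removal sequence as is; whether a solver consumes it in this or in reverse order depends on its branching convention, which is not specified here.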

We also compute a strong initial solution with a state-of-the-art heuristic. This was reported to improve the performance of exact maximum clique solvers in [17]. It was also discussed in Section 3.3 as a possible enhancement of COLOUR_SORT. The heuristic we used was ILS (Iterated Local Search, described in [28]) as in the original paper [17]. In all experiments, time is measured in seconds (with precision of milliseconds) and only running times for the actual search are given (the common procedure in maximum clique literature). The time limit for each experiment was fixed at 24 h.

Graphs employed for the tests are taken from the DIMACS (presented at the Second DIMACS Implementation Challenge) and BHOSHLIB public data sets. The 67 instances chosen are representative of all families and are frequently used in similar reports.

Table 3 reports all the results used to evaluate NEW_SORT. The best time for each graph is shown in bold and the minimum number of steps is shown in italics. The column header ω_o shows the initial clique computed during standard preprocessing. The column header ω_o(ILS) shows the initial clique found by the ILS heuristic, which was optimal in 60 out of the 67 graphs considered. Concerning the algorithm configuration, MaxCLQ was run as provided by the developer and given the same initial solution ω_o as the one computed by BBMCX; BBMCX + MW is the current release of BBMCX and BBMCX + NEW_SORT is the enhanced algorithm, which also includes the stronger ω_o(ILS) lower bound. Finally, the column headers under BBMCX/NEW report the time and steps ratio between BBMCX + MW and BBMCX + NEW_SORT. In the cases where the performance of an algorithm is below a millisecond (reported as <0.001), the actual value is rounded up to a millisecond to compute the time ratio.
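The rounding convention for the time ratio can be written as a short Python sketch. The function name `time_ratio` and the idea of passing sub-millisecond times as small floats are assumptions for illustration; the only rule taken from the text is that times below one millisecond are raised to 0.001 s before dividing.

```python
def time_ratio(t_baseline: float, t_new: float, floor: float = 0.001) -> float:
    """Ratio of baseline time to new time, in seconds, with times below
    one millisecond rounded up to the millisecond floor."""
    return max(t_baseline, floor) / max(t_new, floor)
```

For instance, a baseline of 0.5 s against a run reported as <0.001 s gives a ratio of about 500 under this convention, rather than an unbounded value.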

Table 3 Evaluation of NEW_SORT


4.1 Evaluation

Of the 67 instances considered, BBMCX + NEW_SORT (or NEW_SORT for simplicity) performs better than BBMCX without NEW_SORT in 49 graphs. It is slower in only 5 graphs and prunes the search space better in 56 graphs. Moreover, the performance is improved by more than 15 times in 15 graphs, notably from the gen, keller, frb, and san families. Interestingly, NEW_SORT prefers COLOUR_SORT to the degree-based sorting computed by DEG_SORT in all graphs of three of those four families, specifically gen, keller, and frb.

We will now discuss the results for each family of graphs concerning BBMCX and NEW_SORT to try to provide explanations for the obtained results. The results by families may be summarized as follows:

MANN, hamming and johnson: these sets are not significantly affected by any of the enhancements. A possible explanation for the MANN family is its very high density, which makes preprocessing irrelevant. The graphs from the other two families are easy for all the algorithms, so it is not possible to draw any conclusion.
C: DEG_SORT, as well as the strong initial solution, explains the difference in performance of BBMCX for this family of graphs. We estimate the reduction of the search tree with the new initial ordering to be around 7 % in the more difficult C250.9 graph.
brock and dsjc: The impact of DEG_SORT is not very significant here. In the cases of almost an order of magnitude of improvement (e.g. brock400_3 or brock400_4), it is explained by a strong initial solution.
frb, gen, and keller: When exponential improvements occur, the explanation is mainly due to COLOUR_SORT. Specifically, the frb-30 instances have 30 as both the chromatic and the clique number, and DEG_SORT is unable to capture this structure. COLOUR_SORT, however, finds an optimum colouring, and when vertices are initially sorted in that way, the problem becomes trivial. Instances gen400_p0.9_55 and gen400_p0.9_65 are also trivially solved by BBMCX with COLOUR_SORT, while keller5 is solved more than 30 times faster.
p_hat: This family contains non-structured graphs in which significant differences between DEG_SORT and prior orderings were not expected. Interestingly, in three cases DEG_SORT reduces the size of the search tree by more than 10 %. Improvements above this threshold are due to the improved initial solution.
san, sanr: DEG_SORT improves performance by a small margin, compared with MW, in the more difficult graphs (i.e. those with 0.9 density). However, these types of instances are well known to be sensitive to a good initial solution, so whenever NEW_SORT gives a vast improvement in performance (as in the san400_0.7 and san400_0.9 graphs), the main explanation is the strong initial solution.

Concerning parameter p in DEG_SORT, the best overall value is 4 (in five families) followed by 5 (in san and sanr), 3 (in C), 8 (in frb30), and finally 10 for the p_hat family. As mentioned previously, the tuning procedure uses the easier instances, so it does not constitute a significant disadvantage in a real application.

With respect to MaxCLQ, the proposed NEW_SORT enhances BBMCX so that the latter performs better in the majority of graphs; specifically, it is faster in 60 cases, more than three times faster in 43 cases, and more than an order of magnitude faster in 26 cases. MaxCLQ is expected to outperform standard BBMCX only in some of the harder, denser graphs (independently of the initial sorting). It does so significantly in the graphs MANN_a27, MANN_a45, and C250.9.

5 Conclusions

This work describes a new initial vertex ordering (NEW_SORT) that significantly improves the performance of a family of exact approximate-colour-based solvers for the MCP.

It does so by selecting the “best” ordering between an improved typical degree-based ordering and a colour-based one. Both sorting procedures have polynomial time complexity and are easy to implement, which makes them useful in practical applications where the exact solution for the maximum clique problem is critical. The best results are obtained when NEW_SORT is further enhanced with a strong initial solution. The reported results show that the improved performance may even be exponential for some graphs.

As a side result, this work also provides an interesting observation for Erdős–Rényi uniform random graphs. It has been observed that the effectiveness of ordering vertices by non-increasing degree for the sequential greedy colouring heuristic SEQ is inversely related to the size of these graphs. Work in progress is concerned with further analysis of this result and, if considered appropriate, establishing a theoretical proof.

References

1. Konc J, Janežič D (2010) ProBiS algorithm for detection of structurally similar protein binding sites by local structural alignment. Bioinformatics 26:1160–1168
2. Eblen J, Phillips C, Rogers G, Langston M (2012) The maximum clique enumeration problem: algorithms, applications, and implementations. BMC Bioinforma 13:S5
3. Butenko S, Chaovalitwongse W, Pardalos P (eds) (2009) Clustering challenges in biological networks. World Scientific, Singapore
4. San Segundo P, Artieda J (2015) A novel clique formulation for the visual feature matching problem. Appl Intell 43(2):325–342
5. San Segundo P, Rodriguez-Losada D (2013) Robust global feature based data association with a sparse bit optimized maximum clique algorithm. IEEE Trans Robot 29(5):1332–1339
6. Östergård P (2002) A fast algorithm for the maximum clique problem. Discrete Appl Math 120:197–207
7. Fahle T (2002) Simple and fast: Improving a branch-and-bound algorithm for maximum clique. In: Proceedings ESA-2002, pp 485–498
8. Tomita E, Seki T (2003) An efficient branch and bound algorithm for finding a maximum clique. In: Calude C, Dinneen M, Vajnovszki V (eds) Discrete Mathematics and Theoretical Computer Science. LNCS, vol 2731, pp 278–289
9. Tomita E, Sutani Y, Higashi T, Takahashi S, Wakatsuki M (2010) A simple and faster branch-and-bound algorithm for finding a maximum clique. LNCS 5942:191–203
10. San Segundo P, Rodriguez-Losada D, Jimenez A (2011) An exact bit-parallel algorithm for the maximum clique problem. Comput Oper Res 38(2):571–581
11. San Segundo P, Matia F, Rodriguez-Losada D, Hernando M (2013) An improved bit parallel exact maximum clique algorithm. Optim Lett 7(3):467–479
12. San Segundo P, Tapia C (2014) Relaxed approximate coloring in exact maximum clique search. Comput Oper Res 44:185–192
13. Li C-M, Quan Z (2010) An Efficient Branch-and-Bound Algorithm based on MaxSAT for the Maximum Clique Problem. In: Proceedings AAAI, pp 128–133
14. Li C-M, Quan Z (2010) Combining Graph Structure Exploitation and Propositional Reasoning for the Maximum Clique Problem. In: Proceedings ICTAI, pp 344–351
15. San Segundo P, Nikolaev A, Batsyn M (2015) Infra-chromatic bound for exact maximum clique search. Comput Oper Res 64:293–303
16. Konc J, Janežič D (2007) An improved branch and bound algorithm for the maximum clique problem. MATCH Commun Math Comput Chem 58:569–590
17. Batsyn M, Goldengorin B, Maslov E, Pardalos P (2014) Improvements to MCS algorithm for the maximum clique problem. J Comb Optim 27:397–416
18. Li C-M, Fang Z, Xu K (2013) Combining MaxSAT Reasoning and Incremental Upper Bound for the Maximum Clique Problem. In: Proceedings ICTAI, pp 939–946
19. Bron C, Kerbosch J (1973) Algorithm 457: finding all cliques of an undirected graph. Commun ACM 16(9):575–577
20. Balas E, Yu C (1986) Finding a maximum clique in an arbitrary graph. SIAM J Comput 15(4):1054–1068
21. Prosser P (2012) Exact algorithms for maximum clique: a computational study. Algorithms 5(4):545–587
22. Carraghan R, Pardalos P (1990) An exact algorithm for the maximum clique problem. Oper Res Lett 9(6):375–382
23. Personal communication with researchers Ciaran McCreesh and Patrick Prosser
24. Welsh D, Powell M (1967) An upper bound for the chromatic number of a graph and its application to timetabling problems. Comput J 10(1):85–86
25. Syslo M (1989) Sequential coloring versus Welsh-Powell bound. Discret Math 74:241–243
26. Leighton F (1979) A graph coloring algorithm for large scheduling problems. J Res Natl Bur Stand 84(6):489–506
27. Wu Q, Hao J (2015) A review on algorithms for maximum clique problems. Eur J Oper Res 242(3):693–709
28. Andrade D, Resende MG, Werneck R (2012) Fast local search for the maximum independent set problem. J Heuristics 18(4):525–547


Acknowledgments

Pablo San Segundo and Alvaro Lopez are funded by the Spanish Ministry of Economy and Competitiveness (grants ARABOT: DPI 2010-21247-C02-01 and NAVEGASE: DPI 2014-53525-C3-1-R). Mikhail Batsyn, Alexey Nikolaev, and Panos M. Pardalos are supported by the Laboratory of Algorithms and Technologies for Network Analysis, NRU HSE. We would also like to thank Jorge Artieda for his help with the experiments. Finally, we express our gratitude to Chu-Min Li for providing the source code of MaxCLQ.

Author information

Authors and Affiliations

Centre for Automation and Robotics (UPM-CSIC), C/ Jose Gutiérrez Abascal, 2; 28006, Madrid, Spain
Pablo San Segundo & Alvaro Lopez
Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, 136 Rodionova, Nizhny Novgorod, Russia
Mikhail Batsyn, Alexey Nikolaev & Panos M. Pardalos
Center for Applied Optimization, University of Florida, 303 Weil Hall, Gainesville, FL, 32611, USA
Panos M. Pardalos

Corresponding author

Correspondence to Pablo San Segundo.

Additional information

An erratum to this article is available at http://dx.doi.org/10.1007/s10489-016-0862-3.

Appendix

The list of instances from DIMACS and BHOSHLIB benchmarks employed in the reported results in Table 2 is:

C125.9, C250.9, MANN_a9, MANN_a27, MANN_a45, brock200_1/4, brock400_1/4, c-fat200-1, c-fat200-2, c-fat200-5, c-fat500-1, c-fat500-2, c-fat500-5, c-fat500-10, dsjc500.1, dsjc500.5, dsjc1000.1, dsjc1000.5, frb30-15-1/5, gen200_p0.9_44, gen200_p0.9_55, hamming6-2, hamming6-4, hamming8-2, hamming8-4, hamming10-2, johnson8-2-4, johnson8-4-4, johnson16-2-4, keller4, p_hat300-1/3, p_hat500-1/3, p_hat700-1/3, p_hat1000-1/2, p_hat1500-1, san200_0.7_1/2, san200_0.9_1/3, san400_0.5_1, san400_0.7_1/3, san400_0.9_1, san1000, sanr200_0.7, sanr200_0.9, sanr400_0.5, sanr400_0.9.

About this article

Cite this article

Segundo, P.S., Lopez, A., Batsyn, M. et al. Improved initial vertex ordering for exact maximum clique search. Appl Intell 45, 868–880 (2016). https://doi.org/10.1007/s10489-016-0796-9

Published: 24 May 2016
Issue Date: October 2016
DOI: https://doi.org/10.1007/s10489-016-0796-9
