1 Introduction

The maximum independent set (MIS) problem takes a connected, undirected graph G=(V,E) as input, and tries to find the largest subset S of V such that no two vertices in S have an edge between them. Besides having several direct applications (Bomze et al. 1999), MIS is closely related to two other well-known optimization problems. To find the maximum clique (the largest complete subgraph) of a graph G, it suffices to find the maximum independent set of the complement of G. Similarly, to find the minimum vertex cover of G=(V,E) (the smallest subset of vertices that contains at least one endpoint of each edge in the graph), one can find the maximum independent set S of V and return V∖S. Because these problems are NP-hard (Karp 1972), for most instances one must resort to heuristics to obtain good solutions within reasonable time.

Most successful heuristics (Battiti and Protasi 2001; Grosso et al. 2004, 2008; Hansen et al. 2004; Katayama et al. 2005; Pullan and Hoos 2006; Richter et al. 2007) maintain a single current solution that is steadily modified by very simple operations, such as individual insertions, individual deletions, and swaps (replacing a vertex by one of its neighbors). In particular, many algorithms use the notion of plateau search, which consists in performing a randomized sequence of swaps. A swap does not improve the solution value by itself, but with luck it may cause a non-solution vertex to become free, thus allowing a simple insertion to be performed. Grosso et al. (2008) have recently obtained exceptional results in practice by performing plateau search almost exclusively. Their method (as well as several others) occasionally applies a more elaborate operation for diversification purposes, but spends most of its time performing basic operations (insertions, deletions, and swaps), often chosen at random.

This paper expands the set of tools that can be used effectively within metaheuristics. We present a fast (in theory and practice) implementation of a natural local search algorithm. It is based on (1, 2)-swaps, in which a single vertex is removed from the solution and replaced by two others. We show that, given any maximal solution, one can find such a move (or prove that none exists) in linear time. In practice, an incremental version runs in sublinear time. The local search is more powerful than simple swaps, but still cheap enough for effective use within more elaborate heuristics. We also discuss a generalization of this method to deal with (2, 3)-swaps, which consist of two removals followed by three insertions. We can find such a move (or prove that none exists) in O(mΔ) time, where Δ is the highest degree in the graph.

Another contribution is a more elaborate heuristic that illustrates the effectiveness of our local search. Although the algorithm is particularly well-suited for large, sparse instances, it is competitive with previous algorithms on a wide range of instances from the literature. As an added contribution, we augment the standard set of instances from the literature with new (and fundamentally different) instances, never previously studied in the context of the MIS problem.

This paper is organized as follows. Section 2 establishes the notation and terminology we use. Our local search algorithms are described in Sect. 3. Section 4 illustrates how they can be applied within a more elaborate heuristic. We report our experimental results in Sect. 5. Finally, concluding remarks are presented in Sect. 6.

2 Basics

The input to the MIS problem is a connected graph G=(V,E), with |V|=n and |E|=m. We assume that vertices are labeled from 1 to n. We use the adjacency list representation: each vertex keeps a list of all adjacent vertices.

A solution S is simply a subset of V in which no two vertices are adjacent. The tightness of a vertex \(v \not\in S\), denoted by τ(v), is the number of neighbors of v that belong to S. We say that a vertex is k-tight if it has tightness k. The tightnesses of all vertices can be computed in O(m) time: initialize all values to zero, then traverse the adjacency list of each solution vertex v and increment τ(w) for every arc (v,w). Vertices that are 0-tight are called free.
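As an illustration, the following C++ sketch computes all tightnesses in a single pass over the adjacency lists of the solution vertices. The representation (adjacency lists plus an incidence vector for S) and the function name are ours, chosen for exposition.

```cpp
#include <vector>
using namespace std;

// Compute tau(v) for every vertex: the number of neighbors of v that belong to S.
// adj[v] lists the neighbors of v (0-based labels); inS[v] is true iff v is in S.
// Free vertices are exactly those with tau == 0.
vector<int> computeTightness(const vector<vector<int>>& adj, const vector<bool>& inS) {
    vector<int> tau(adj.size(), 0);          // initialize all tightnesses to zero
    for (size_t v = 0; v < adj.size(); ++v) {
        if (!inS[v]) continue;               // traverse only the adjacency lists of solution vertices
        for (int w : adj[v]) ++tau[w];       // each arc (v,w) increments tau(w)
    }
    return tau;
}
```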

A (j,k)-swap consists of removing j vertices from a solution and inserting k vertices into it. For simplicity, we refer to a (k,k)-swap as a k-swap (or simply a swap when k=1), and to a (k−1,k)-swap as a k-improvement. In particular, the main algorithms described in this paper (Sect. 3) are fast implementations of 2-improvements (Sect. 3.1) and 3-improvements (Sect. 3.2). We use the term move to refer to a generic (j,k)-swap.

We say a solution is k-maximal if no k-improvement is possible. In particular, a 1-maximal (or simply maximal) solution has no free vertices.

Solution representation

Being just a subset of V, there are several ways of representing a solution S. Straightforward representations, such as lists or incidence vectors, easily allow insertions and deletions to be performed in constant time. Unfortunately, more complex operations (such as finding a free vertex or determining whether a solution is maximal) can take as much as Θ(m) time. We therefore opt for a slightly more complicated (but more powerful) representation within our algorithms.

We represent a solution S as a permutation of all vertices that partitions them into three blocks: first the |S| vertices in the solution, then the free vertices, and finally the non-solution vertices that are not free. The order among vertices within a block is irrelevant. The sizes of the first two blocks are stored explicitly. In addition, the data structure maintains, for each vertex, its tightness (which allows us to determine when a vertex becomes free) and its position in the permutation. Keeping this position allows a vertex to be moved between any two blocks in constant time. (For example, to move a vertex v from the first block to the second, we first swap v with the last vertex of the first block, then change the boundary between the blocks to make v the first vertex of the second block.)

Our data structure must be updated whenever a vertex v is inserted into or removed from S. To insert a vertex v, we must first move it to the first block of the permutation. Then, for each neighbor w of v, we increase τ(w) by one and move w to the third block (if it was previously free). To remove a vertex v, we first move it to the second block (since it becomes free). Then, for each neighbor w of v, we decrease τ(w) by one and, if its new tightness is zero, we move w from the third to the second block.

This means that inserting or deleting a vertex v takes time proportional to the degree of v, which we denote by deg(v). While this is more expensive than in simpler solution representations (such as lists or incidence vectors), this data structure allows several powerful operations to be performed in constant time. The tightness of any vertex, for example, is explicitly maintained. To determine whether the solution is maximal, it suffices to check if the second block is empty. To find out whether v∈S, we only need to check if it is in the first block. Finally, we can pick a vertex uniformly at random within any of the three blocks, since each block is represented as a contiguous array of known size. This operation will be especially useful for metaheuristics.
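The sketch below shows one possible C++ realization of this representation. The struct and its member names (MISSolution, insert, remove, and so on) are ours and are meant only to make the block manipulation concrete; they are not taken from the authors' implementation. Later sketches in this paper reuse this structure.

```cpp
#include <vector>
#include <utility>
using namespace std;

// Three-block permutation: positions [0, solSize) hold the solution S, positions
// [solSize, freeEnd) hold the free vertices, and [freeEnd, n) the remaining
// non-solution vertices. pos[v] is v's position in perm; tau[v] is its tightness.
struct MISSolution {
    vector<int> perm, pos, tau;
    int solSize = 0, freeEnd = 0;
    const vector<vector<int>>& adj;               // adjacency lists (not owned)

    explicit MISSolution(const vector<vector<int>>& a)
        : perm(a.size()), pos(a.size()), tau(a.size(), 0), adj(a) {
        for (int v = 0; v < (int)a.size(); ++v) perm[v] = pos[v] = v;
        freeEnd = (int)a.size();                  // S starts empty, so every vertex is free
    }
    bool inSolution(int v) const { return pos[v] < solSize; }
    bool isFree(int v) const { return pos[v] >= solSize && pos[v] < freeEnd; }
    bool isMaximal() const { return solSize == freeEnd; }     // no free vertices left

    void swapPositions(int v, int w) {            // exchange v and w in the permutation
        swap(perm[pos[v]], perm[pos[w]]);
        swap(pos[v], pos[w]);
    }
    void insert(int v) {                          // insert a free vertex v into S: O(deg(v))
        swapPositions(v, perm[solSize]); ++solSize;              // v joins the first block
        for (int w : adj[v]) {
            if (isFree(w)) { --freeEnd; swapPositions(w, perm[freeEnd]); }  // w is no longer free
            ++tau[w];
        }
    }
    void remove(int v) {                          // remove a solution vertex v from S: O(deg(v))
        --solSize; swapPositions(v, perm[solSize]);              // v becomes free (second block)
        for (int w : adj[v]) {
            --tau[w];
            if (!inSolution(w) && tau[w] == 0) {                 // w just became free:
                swapPositions(w, perm[freeEnd]); ++freeEnd;      // move it into the second block
            }
        }
    }
};
```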

3 Local search

We now present our two main algorithmic contributions. We start with 2-improvements (in Sect. 3.1), then discuss 3-improvements (in Sect. 3.2).

3.1 Finding 2-improvements

Our main local search algorithm is based on 2-improvements. These natural operations have been studied before (see e.g. Feo et al. 1994); our contribution is a faster algorithm that implements them. Given a maximal solution S, we would like to replace some vertex x∈S with two vertices, v and w (both originally outside the solution), thus increasing the total number of vertices in the solution by one.

Before we get to the actual algorithm, we establish some important facts about any valid 2-improvement that removes a vertex x from the solution. First, both v and w (the vertices inserted) must be neighbors of x, otherwise the original solution would not be maximal. Moreover, both v and w must be 1-tight, or else the new solution would not be valid. Finally, v and w must not be adjacent to each other.

Our algorithm processes each vertex x∈S in turn, using the data structures mentioned in the previous section. First, temporarily remove x from S, creating a solution S′. If S′ has fewer than two free vertices, stop: there is no 2-improvement involving x. Otherwise, for each neighbor v of x that is free in S′, insert v into S′ and check if the new solution (S″) contains a free vertex w. If it does, inserting w leads to a 2-improvement; if it does not, remove v from S″ (thus restoring S′) and process the next neighbor of x. If no improvement is found, reinsert x into the solution.
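A sketch of this scan, written against the hypothetical MISSolution structure from Sect. 2, could look as follows; it applies the first 2-improvement found and reports whether the solution improved.

```cpp
// Look for a 2-improvement in a maximal solution S, as described above.
// Applies the first improving move found and returns true; returns false otherwise.
bool twoImprovement(MISSolution& S) {
    vector<int> sol(S.perm.begin(), S.perm.begin() + S.solSize);   // snapshot of S
    for (int x : sol) {
        S.remove(x);                               // S' = S \ {x}: x and possibly some 1-tight
        if (S.freeEnd - S.solSize >= 2) {          // neighbors of x are now free
            for (int v : S.adj[x]) {
                if (!S.isFree(v)) continue;        // only neighbors freed by x's removal qualify
                S.insert(v);                       // S'' = S' + {v}
                if (!S.isMaximal()) {              // some vertex w is still free:
                    S.insert(S.perm[S.solSize]);   // inserting it completes the 2-improvement
                    return true;
                }
                S.remove(v);                       // restore S' and try the next neighbor
            }
        }
        S.insert(x);                               // no move through x: restore S
    }
    return false;
}
```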

This algorithm clearly finds a 2-improvement if there is one. We can determine its running time by bounding the total number of vertex scans performed. A vertex x∈S is scanned only when x itself is the candidate for removal, when it is removed from and (possibly) reinserted into the solution. A non-solution vertex v is only scanned if it is 1-tight: it is inserted into the solution (and possibly removed) when its unique solution neighbor x is processed as a candidate for removal. This means every vertex in the graph is scanned O(1) times; equivalently, every edge is visited O(1) times. We have thus proven the following:

Theorem 1

Given a maximal solution S, one can find a 2-improvement (or prove that none exists) in O(m) time.

3.1.1 A direct implementation

Although our algorithm is relatively simple as described, some of its complexity is in fact hidden by the solution representation. For example, we rely on the representation’s ability to determine in constant time whether there is a free vertex, and to produce one such vertex if it exists. To answer such queries efficiently, the algorithm must temporarily modify the solution by performing insertions and deletions, which are somewhat expensive. In practice, we can save some constant factors in the running time with a more direct implementation. Although it may appear to be more complex, it does essentially the same thing.

The implementation assumes the neighbors of any vertex v are sorted by increasing order of label in v’s adjacency list. This order can be enforced in linear time (for all vertices) in a preprocessing step with the application of radix sort to all edges.
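One standard way to obtain sorted adjacency lists in O(n+m) total time is to rebuild them with a single pass over all edges, as in the sketch below (the function name is ours); scanning the vertices in increasing order of label appends the neighbors of each vertex in sorted order.

```cpp
#include <vector>
using namespace std;

// Rebuild all adjacency lists so that each is sorted by increasing vertex label.
// Since u increases monotonically in the outer loop, labels arrive at sorted[v] in order.
vector<vector<int>> sortAdjacencyLists(const vector<vector<int>>& adj) {
    vector<vector<int>> sorted(adj.size());
    for (int u = 0; u < (int)adj.size(); ++u)
        for (int v : adj[u])
            sorted[v].push_back(u);   // arc (u,v) contributes u to v's new list
    return sorted;
}
```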

As in the original algorithm, we process each solution vertex x in turn. To process x, we first build a list L(x) consisting of all 1-tight neighbors of x, also sorted by label. If L(x) has fewer than two elements, we are done with x: it is not involved in any 2-improvement. Otherwise, we must find, among all candidates in L(x), a pair {v,w} such that there is no edge between v and w. We do this by processing each element v∈L(x) in turn. For a fixed candidate v, we check if there is a vertex w∈L(x) (besides v) that does not belong to A(v), the adjacency list of v. Since both L(x) and A(v) are sorted by vertex identifier, this can be done by traversing both lists in tandem. All elements of L(x) should appear in the same order within A(v); if there is a mismatch, the element of L(x) missing in A(v) is the vertex w we are looking for.
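The tandem traversal itself can be sketched as follows, assuming both lists are sorted by label (the function name is ours):

```cpp
#include <vector>
using namespace std;

// Tandem traversal: L is L(x) (the 1-tight neighbors of x) and A is A(v), the adjacency
// list of a candidate v in L, both sorted by label. Returns a vertex w in L, w != v,
// that is absent from A (so {v,w} gives a valid 2-improvement), or -1 if none exists.
int findNonNeighbor(const vector<int>& L, const vector<int>& A, int v) {
    size_t i = 0, j = 0;
    while (i < L.size()) {
        if (L[i] == v) { ++i; continue; }                 // skip the candidate itself
        while (j < A.size() && A[j] < L[i]) ++j;          // advance A up to L[i]
        if (j == A.size() || A[j] != L[i]) return L[i];   // mismatch: L[i] is not adjacent to v
        ++i; ++j;
    }
    return -1;                                            // every other element of L appears in A
}
```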

We claim that this algorithm finds a valid 2-improvement (or determines that none exists) in O(m) time. This is clearly a valid bound on the time spent scanning all vertices (i.e., traversing their adjacency lists), since each vertex is scanned at most once. Each solution vertex x is scanned to build L(x) (the list of 1-tight neighbors), and each 1-tight non-solution vertex v is scanned when its only solution neighbor is processed. (Non-solution vertices that are not 1-tight are not scanned at all.) We still need to bound the time spent traversing the L(x) lists. Each list L(x) may be traversed several times, but each occurs in tandem with the traversal of the adjacency list A(v) of a distinct 1-tight neighbor v of x. Unless the traversal finds a valid swap (which occurs only once), traversing L(x) costs no more than O(deg(v)), since each element of L(x) (except v) also occurs in A(v). This bounds the total cost of such traversals to O(m).

3.1.2 Incremental version

A typical local search procedure does not restrict itself to a single iteration. If a valid 2-improvement is found, the algorithm will try to find another in the improved solution. This can of course be accomplished in linear time, but we can do better with an incremental version of the local search, which uses information gathered in one iteration to speed up later ones.

The algorithm maintains a set of candidates, which are solution vertices that might be involved in a 2-improvement. So far, we have assumed that all solution vertices are valid candidates, and we test them one by one. After a move, we would test all vertices again. Clearly, if we establish that a candidate x cannot be involved in a 2-improvement, we should not reexamine it unless we have good reason to do so. More precisely, when we “discard” a candidate vertex x, it is because it does not have two independent 1-tight neighbors. Unless at least one other neighbor of x becomes 1-tight, there is no reason to look at x again.

With this in mind, we maintain a list of candidates that is updated whenever the solution changes. Any move (including a 2-improvement) can be expressed in terms of insertions and deletions of individual vertices. When we insert a vertex v into the solution, its neighbors are the only vertices that can become 1-tight, so we simply (and conservatively) add v to the list of candidates. When a vertex x is removed from the solution, the update is slightly more complicated. We must traverse the adjacency list of x and look for vertices that became 1-tight due to the removal of x. By definition, each such vertex v will have a single neighbor y in the solution; y must be inserted into the candidate list. We can find the solution vertex adjacent to each 1-tight neighbor v in constant time, as long as we maintain with each non-solution vertex the list of its solution neighbors. Therefore, we could still update the candidate list after removing x in O(deg(x)) time. For simplicity, however, we do not maintain the auxiliary data structures in our implementation, and explicitly scan each 1-tight neighbor of x.
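The conservative updates can be sketched as follows, again on the hypothetical MISSolution structure; this follows the simpler variant above, which rescans the neighbors of newly 1-tight vertices instead of maintaining solution-neighbor lists.

```cpp
// Candidate-list maintenance. candidates holds solution vertices that may still
// be involved in a 2-improvement; duplicates are harmless for this sketch.

// After inserting v, only v's neighbors can have become 1-tight, so v itself is
// the only solution vertex that needs to be (re)examined.
void onInsert(vector<int>& candidates, int v) {
    candidates.push_back(v);
}

// After removing x, every neighbor v that became 1-tight has exactly one solution
// neighbor y, which must be reconsidered; here y is found by scanning v's adjacency list.
void onRemove(const MISSolution& S, vector<int>& candidates, int x) {
    for (int v : S.adj[x]) {
        if (S.inSolution(v) || S.tau[v] != 1) continue;   // v became 1-tight when x left S
        for (int y : S.adj[v])
            if (S.inSolution(y)) { candidates.push_back(y); break; }
    }
}
```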

Although we have framed our discussion in terms of 2-improvements, these updates can of course be performed for any sequence of removals and/or insertions. As we will see, this means we can easily embed the incremental local search algorithm into more elaborate heuristics.

Once invoked, the local search itself is quite simple: it processes the available candidates in random order. If a candidate x leads to a 2-improvement, we perform the move and update the list of candidates accordingly; otherwise, x is simply removed from the list of candidates. The local search stops when the list of candidates becomes empty, indicating that a local optimum has been reached.

3.1.3 Maximum clique

Although our experiments focus mainly on the MIS problem, it is worth mentioning that one can also implement a linear-time 2-improvement algorithm for the maximum clique problem. Note that simply running the algorithm above on the complement of the input is not enough to ensure linear time, since the complement may be much denser than the original graph.

Given a maximal clique C, we must determine if there is a vertex x∈C and two vertices \(v,w \not\in C\) such that the removal of x and the insertion of v and w would lead to a larger clique. Such a move only exists if the following holds: (1) v and w are neighbors; (2) both v and w are adjacent to all vertices in C∖{x}; and (3) neither v nor w is adjacent to x (or else C would not be maximal). For a vertex v with tightness |C|−1, define its missing neighbor μ(v) as the only solution vertex to which v is not adjacent. There is a 2-improvement involving \(v \not\in C\) if it has a neighbor \(w \not\in C\) such that τ(w)=|C|−1 and μ(w)=μ(v). Knowing this, the local search procedure can be implemented in O(m) time as follows. First, determine the tightness of all vertices, as well as the missing neighbors of those that are (|C|−1)-tight. Then, for each (|C|−1)-tight vertex v, determine in O(deg(v)) time if it has a neighbor w satisfying the conditions above.
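A possible O(m) realization of this procedure is sketched below. To recover the missing neighbor μ(v) in O(deg(v)) time we use label sums; this particular trick is our own choice for the sketch, not necessarily how the procedure would be implemented in practice.

```cpp
#include <vector>
#include <array>
using namespace std;

// 2-improvement search for maximum clique. inC[v] is true iff v belongs to the current
// maximal clique C of size cliqueSize. Returns {x, v, w}, meaning "remove x, insert the
// adjacent pair v and w", or {-1, -1, -1} if no such move exists.
array<int,3> cliqueTwoImprovement(const vector<vector<int>>& adj,
                                  const vector<bool>& inC, int cliqueSize) {
    int n = adj.size();
    vector<int> tau(n, 0);
    vector<long long> sumAdjC(n, 0);     // sum of labels of each vertex's neighbors inside C
    long long sumC = 0;                  // sum of labels of all clique vertices
    for (int x = 0; x < n; ++x) {
        if (!inC[x]) continue;
        sumC += x;
        for (int w : adj[x]) { ++tau[w]; sumAdjC[w] += x; }
    }
    // For a (|C|-1)-tight vertex v, mu(v) is the single clique vertex it is NOT adjacent to.
    auto mu = [&](int v) { return (int)(sumC - sumAdjC[v]); };
    for (int v = 0; v < n; ++v) {
        if (inC[v] || tau[v] != cliqueSize - 1) continue;
        for (int w : adj[v])             // look for an adjacent candidate missing the same vertex
            if (!inC[w] && tau[w] == cliqueSize - 1 && mu(w) == mu(v))
                return {mu(v), v, w};
    }
    return {-1, -1, -1};
}
```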

3.2 Finding 3-improvements

We now discuss a potentially more powerful local search algorithm for the maximum independent set problem. Given a 2-maximal solution S (i.e., one in which no 2-improvement can be performed), we want to find a valid 3-improvement. In other words, we look for vertices {x,y}⊂S and {u,v,w}⊂V∖S such that we can turn S into a better (valid) solution S′ by removing x and y and inserting u, v, and w. To devise an efficient algorithm for this problem, we first establish some useful facts about the vertices involved in such a move.

Lemma 1

Edges (u,v), (u,w), and (v,w) do not belong to the graph.

Proof

If they did, the new solution S′ would not be valid. □

Lemma 2

Each vertex in {u,v,w} is adjacent to x, y, or both, but to no other vertex in S.

Proof

Consider vertex u (one can reason about v and w similarly). If u is adjacent to any other vertex zS besides x and y, the new solution S′ would not be valid. If u were adjacent to neither x nor y, the original solution S would not be maximal, which contradicts our initial assumption. Therefore, u must be adjacent to either x or y (or both). □

Lemma 3

If any two vertices in {u,v,w} are 1-tight, they must have different neighbors in {x,y}.

Proof

By contradiction (and without loss of generality), suppose that both v and w are 1-tight and adjacent to x. Then a 2-improvement replacing x by v and w would be valid, which contradicts our assumption that S is 2-maximal. □

Lemma 4

At least one vertex in {u,v,w} must be adjacent to both x and y.

Proof

Immediate from Lemmas 2 and 3. □

Together, these lemmas imply that any valid 3-improvement involves vertices x, y, u, v, and w such that:

1. x, y ∈ S and u, v, w ∉ S;

2. u is 2-tight and adjacent to both x and y;

3. v is adjacent to x (and maybe y) and to no other vertex in S;

4. w is adjacent to y (and maybe x) and to no other vertex in S;

5. u, v, and w are not adjacent to one another.

To find a 3-improvement, our algorithm processes each 2-tight vertex u in turn, as follows. Let x and y be u’s neighbors in the solution. Our goal is to find vertices v and w satisfying conditions 3, 4, and 5 above. To accomplish this, first temporarily remove x and y from the solution and insert u (which is now free). Let S′ be the new solution. If they exist, v and w (as defined in the constraints above) must be free in S′. Therefore, if S′ has fewer than two free vertices, we are done with u; there is no pair {v,w} leading to a valid 3-improvement. Otherwise, for each free neighbor v of x, temporarily add it to S′, creating S″. If S″ is not maximal, adding any free vertex w will create a valid solution (thus leading to a 3-improvement). Otherwise, remove v and try another free neighbor of x.
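The search can be sketched as follows, reusing the hypothetical MISSolution structure from Sect. 2; it applies the first 3-improvement found.

```cpp
// Look for a 3-improvement in a 2-maximal solution S, as described above.
// Applies the first improving move found and returns true; returns false otherwise.
bool threeImprovement(MISSolution& S) {
    for (int u = 0; u < (int)S.adj.size(); ++u) {
        if (S.inSolution(u) || S.tau[u] != 2) continue;      // u must be 2-tight
        int x = -1, y = -1;                                  // u's two solution neighbors
        for (int z : S.adj[u])
            if (S.inSolution(z)) { if (x == -1) x = z; else y = z; }
        S.remove(x); S.remove(y); S.insert(u);               // S' = S \ {x,y} + {u}
        if (S.freeEnd - S.solSize >= 2) {
            for (int v : S.adj[x]) {
                if (!S.isFree(v)) continue;                  // candidate v: free neighbor of x
                S.insert(v);                                 // S'' = S' + {v}
                if (!S.isMaximal()) {                        // some vertex w is still free:
                    S.insert(S.perm[S.solSize]);             // inserting it completes the move
                    return true;
                }
                S.remove(v);                                 // restore S', try the next neighbor
            }
        }
        S.remove(u); S.insert(x); S.insert(y);               // no move through u: restore S
    }
    return false;
}
```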

Running time

It is easy to see that the algorithm above will find a valid 3-improvement if there is one. We must now bound its total running time. When processing a 2-tight vertex u, only u, x, y, and the newly-free neighbors v of x will be scanned (no more than twice each). Note that this includes scans performed during insertions and deletions. Because these are all distinct vertices, the total time for all these scans is O(m). Since there are O(n) 2-tight vertices u, the total time to find a valid 3-improvement (or prove that no such move exists) is O(mn).

We can obtain a tighter bound in terms of the maximum degree Δ in the graph. Using the notation above, we can make the following observations. Every 2-tight vertex is scanned O(1) times as u (it is first deleted, then reinserted). Furthermore, every solution vertex (x or y) is scanned O(Δ) times; more precisely, it is scanned O(1) times when each 2-tight neighbor u is processed, and there are at most Δ such neighbors. Finally, every 1- or 2-tight neighbor v of x is scanned O(Δ) times: O(1) times when each of its (at most two) neighbors in the solution is scanned. Together, these observations imply that every vertex can be scanned at most O(Δ) times. Equivalently, each edge is visited O(Δ) times, thus bounding the total time of the algorithm by O(mΔ).

For many of the benchmark instances we tested, Δ is a small constant, and in most it is o(n). For denser instances, however, this bound on the running time may be overly pessimistic. In fact, we can use the exact same argument to get a bound of O(mk), where k≤Δ is the maximum number of 2-tight neighbors of any solution vertex x. We have thus proven the following:

Theorem 2

Given a 2-maximal solution S, one can find a valid 3-improvement (or prove that none exists) in O(mk) time, where k≤Δ<n is the maximum number of 2-tight neighbors of any vertex xS.

Practical considerations

Ideally, we would like to have an incremental version of the algorithm in practice, following the approach described in Sect. 3.1.2 for 2-improvements. There, we suggested keeping a list of candidate vertices, which would be updated every time the solution changes. Although we could keep a similar list for 3-improvements (keeping all 2-tight vertices u that might be part of a valid move), maintaining it would be substantially more expensive because changes in the neighborhoods of any of the five relevant vertices (u, v, w, x, and y) would have to be taken into account.

Instead, we decided to use a simpler strategy in our implementation of 3-improvements. It works in passes. Each pass starts by running a local search based on 2-improvements to ensure the solution is 2-maximal. It then processes all vertices in turn, performing 3-improvements as they are found. If vertex u is 2-tight when it is its turn to be processed, the algorithm looks for x, y, v, and w, as explained above. Otherwise, u is just skipped. In practice, the solution becomes 3-maximal after a very small number of passes (often 1 or 2) and almost all the improvement (relative to the starting solution) is obtained in the first pass.

4 Metaheuristics

4.1 Iterated local search

To test our local searches, we use them within a heuristic based on the iterated local search (ILS) metaheuristic (Lourenço et al. 2003). We start from a random solution S, apply local search to it, then repeatedly execute the following steps:

1. S′ ← perturb(S);

2. S′ ← localsearch(S′);

3. S ← S′ if certain conditions are met.

Any reasonable stopping criterion can be used, and the algorithm returns the best solution found. The remainder of this section details our implementation of each step of this generic algorithm.

The perturbations in Step 1 are performed with the force(k) routine, which sequentially inserts k vertices into the solution (the choice of which ones will be explained shortly) and removes the neighboring vertices as necessary. (We call these forced insertions.) It then adds free vertices at random until the solution is maximal. We set k=1 in most iterations, which means a single vertex will be inserted. With small probability (1/(2⋅|S|)), however, we pick a higher value: k is set to i+1 with probability proportional to 1/2i, for i≥1. We must then choose which k vertices to insert. If k=1, we pick a random non-solution vertex. If k is larger, we also start with a random non-solution vertex, and pick the j-th vertex (for j>1) among the non-solution vertices within distance exactly two from the first j−1 vertices. (If there is no such vertex, we simply stop inserting.)
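The choice of k can be made with two random draws, as in the sketch below; since the weights 1/2^i already sum to one over i≥1, the second draw is just a sequence of fair coin flips. The sampler and its name are ours.

```cpp
#include <random>

// Sample the perturbation strength k: k = 1 most of the time; with probability
// 1/(2|S|) set k = i + 1, where Pr[i] = 1/2^i for i >= 1.
int sampleForceStrength(int solutionSize, std::mt19937& rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    if (unif(rng) >= 1.0 / (2.0 * solutionSize)) return 1;   // the common case: one insertion
    int i = 1;
    while (unif(rng) < 0.5) ++i;   // geometric draw: stop with probability 1/2 at each step
    return i + 1;
}
```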

We use two techniques for diversification. The first is soft tabu. We keep track of the last iteration in which each non-solution vertex was part of the solution. Whenever the force routine has a choice of multiple vertices to insert, it looks at κ (an input parameter) candidates uniformly at random (with replacement) and picks the “oldest” one, i.e., the one which has been outside the solution for the longest time. We set κ=4 in our experiments. The second diversification technique is employed during local search. If v was the only vertex inserted by the force routine, the subsequent local search will only allow v to be removed from the solution after all other possibilities have been tried.

Step 2 of ILS is a standard local search using the 2-improvement algorithm described in Sect. 3.1. It stops when a local optimum is reached. We also tried incorporating the local search based on 3-improvements into ILS, but the fact that it is not incremental made it too slow to improve the results. By restricting ourselves to 2-improvements, we allow more ILS iterations to be performed.

In Step 3, if the solution S′ obtained after the local search is at least as good as S (i.e., if |S′|≥|S|), S′ becomes the new current solution. We have observed that always going to S′ (even when |S′|<|S|) may cause the algorithm to stray from the best known solution too fast. To avoid this, we use a technique akin to plateau search. If ILS arrives at the current solution S from a solution that was better, it is not allowed to go to a worse solution for at least |S| iterations. If the current solution does not improve in this time, the algorithm is again allowed to go to a worse solution S′. It does so with probability 1/(1+δ⋅δ*), where δ=|S|−|S′|, δ*=|S*|−|S′|, and S* is the best solution found so far. Intuitively, the farther S′ is from S and S*, the less likely the algorithm is to set S←S′. If the algorithm does not go to S′ (including during plateau search), we undo the insertions and deletions that led to S′, then add a small perturbation by performing a random 1-swap in S, if possible. We can find a 1-swap in constant time by keeping the list of all 1-tight vertices explicitly.
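Once the plateau budget is exhausted, the acceptance test for a worse solution reduces to the following check (a sketch; the interface and names are ours):

```cpp
#include <random>

// Accept a worse solution S' with probability 1/(1 + delta * deltaBest),
// where delta = |S| - |S'| and deltaBest = |S*| - |S'|.
bool acceptWorse(int sizeS, int sizeSp, int sizeBest, std::mt19937& rng) {
    int delta = sizeS - sizeSp;          // how much worse S' is than the current solution S
    int deltaBest = sizeBest - sizeSp;   // how far S' is from the best solution S* found so far
    double p = 1.0 / (1.0 + double(delta) * double(deltaBest));
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    return unif(rng) < p;                // the farther from S and S*, the less likely to accept
}
```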

Finally, we consider the stopping criterion. As already mentioned, any reasonable criterion works. In our experiments, we stop the algorithm when the average number of scans per arc exceeds a predetermined limit (which is the same for every instance within each family we tested). An arc scan is the most basic operation performed by our algorithm: in fact, the total running time is proportional to the number of such scans. By fixing the number of scans per arc (instead of the total number of scans) in each family, we make the algorithm spend more time on larger instances, which is a sensible approach in practice. To minimize the overhead of counting arc scans individually, our code converts the bound on arc scans into a corresponding bound on vertex scans (using the average vertex degree), and only keeps track of vertex scans during execution.

4.2 The GLP algorithm

We now discuss the algorithm of Grosso et al. (2008), which we call GLP. Although it was originally formulated for the maximum clique problem, our description refers to the MIS problem, as does our implementation. We implemented “Algorithm 1 with restart rule 2,” which seems to give the best results overall among the several variants proposed in Grosso et al. (2008). What follows is a rough sketch of the algorithm. See the original paper for details.

The algorithm keeps a current solution S (initially empty), and spends most of its time performing plateau search (simple swaps). A simple tabu mechanism ensures that vertices that leave the solution during plateau search do not return during the same iteration, unless they become free and there are no alternative moves. A successful iteration ends when a non-tabu vertex becomes free: we simply insert it into the solution and start a new iteration. An iteration is considered unsuccessful if this does not happen after roughly |S| moves: in this case, the solution is perturbed with the forced insertion of a single non-solution vertex (with at least four solution neighbors, if possible), and a new iteration starts. GLP does not use local search.

Unlike Grosso et al.’s implementation of GLP, ours does not stop as soon as it reaches the best solution reported in the literature. Instead, we use the same stopping criterion as the ILS algorithm, based on the number of arc scans. Although different, both ILS and GLP have scans as their main basic operation. Using the number of arc scans as the stopping criterion ensures that both algorithms have similar running times for all instances.

5 Experimental results

All algorithms were implemented by the authors in C++ and compiled with gcc v. 4.3.2 with the full optimization (-O3) flag. All runs were made on one core of a 3.16 GHz Intel Core 2 Duo CPU with 4 GB of RAM running Windows 7 Enterprise 64-bit Edition. CPU times were measured with the getrusage function, which has precision of 1/60 second. Times do not include reading the graph and building the adjacency lists, since these are common to all algorithms. But they do include the allocation, initialization, and destruction of the data structures specific to each algorithm.

5.1 Instances

We test our algorithms on five families of instances. The DIMACS family contains instances of the maximum clique problem from the 2nd DIMACS Implementation Challenge (Johnson and Trick 1996), which have been frequently tested in the literature. It includes a wide variety of instances, with multiple topologies and densities. Since we deal with the MIS problem, we use the complements of the original graphs. For instances with no known optimum, we report the best results available at the time of writing (as listed in Grosso et al. 2008; Richter et al. 2007).

The SAT family contains transformed satisfiability instances originally from the SAT’04 competition, available at Xu (2004) and tested in Grosso et al. (2008), Richter et al. (2007). All optima are known.

The CODE family, made available by Sloane (2000), consists of challenging graphs arising from coding theory. Each subfamily refers to a different error-correcting code, with vertices representing code words and edges indicating conflicts between them. The best known results for the hardest instances were found by the specialized algorithms of Butenko et al. (2002, 2008).

The last two families, MESH and ROAD, are novel in the context of the independent set problem. MESH is motivated by an application in Computer Graphics recently described by Sander et al. (2008). To process a triangulation efficiently in graphics hardware, their algorithm must find a small subset of triangles that covers all the edges in the mesh. This is the same as finding a small set cover on the corresponding dual graph (adjacent faces in the original mesh become adjacent vertices in the dual). The MESH family contains the duals of well-known triangular meshes. While converting the original primal meshes, we repeatedly eliminated vertices of degree one and zero from the dual, since there is always a maximum independent set that contains them. (Degree-one vertices arise when the original mesh is open, i.e., when it has edges that are adjacent to a single triangle instead of the usual two.) Almost all vertices in the resulting MIS instances (which are available upon request) have degree three.

The ROAD family contains planar graphs representing parts of the road network of the United States, originally made available for the 9th DIMACS Implementation Challenge, on shortest paths (Demetrescu et al. 2006). Vertices represent intersections, and edges correspond to the road segments connecting them. As in the previous family, these graphs have numerous vertices of degree one. We chose not to eliminate them explicitly, since these instances are already available in full form.

For conciseness, we only report results on a few representatives of each family, leaving out easy instances and those that behave similarly to others.

5.2 Local search

We start our evaluation with an analysis of the local search algorithms by themselves, in terms of both solution quality and running time.

We first ran the local searches on solutions found by a linear-time algorithm that inserts free vertices uniformly at random into an initially empty solution until it becomes maximal. Table 1 reports the results obtained after 999 runs of this random algorithm (which we call R), with different random seeds. The first four columns characterize the instance: its name, number of vertices (n), average vertex degree (deg), and best known solution (best). (Note that best is taken from the literature for CODE, DIMACS, and SAT; for ROAD and MESH, we show the best results found by the metaheuristics tested in this paper.) The next column shows the average solution obtained by the constructive algorithm (L1), followed by the average local maxima obtained by the 2-improvement (L2) and 3-improvement (L3) local searches. Finally, the last three columns show the average running time (in milliseconds) of these algorithms. Note that local search times include construction of the initial solution.

Table 1 Performance of the local search algorithms when starting from a random solution (R). For each instance, the table shows its name, number of vertices (n), average vertex degree (deg), and the best known solution (best). Average solutions are then shown for the constructive algorithm (L1), 2-improvement local search (L2), and 3-improvement local search (L3). The last three columns show the average running times of these methods in milliseconds (local search times include construction). Each block of instances corresponds to one family (DIMACS, SAT, CODE, MESH, and ROAD, respectively)

Given the somewhat low precision of our timing routine (and how fast the algorithms are in this experiment), we did not measure running times directly. Instead, we ran each subsequence of 111 seeds repeatedly until the total running time was at least 5 seconds, then took the average time per run. Before each timed run, we ran the whole subsequence of 111 once to warm up the cache and minimize fluctuations. (Single runs would be slightly slower, but would have little effect on the relative performance of the algorithms.)

For a more complete understanding of the local searches, we also ran them on solutions found by a natural greedy algorithm. It builds the solution one vertex at a time, always picking for insertion the vertex with the minimum residual degree, i.e., with the fewest free neighbors. Its goal is to preserve as many free vertices as possible after each insertion.

This algorithm can be easily implemented in O(m) time if we keep free vertices in buckets according to their residual degrees, which are maintained explicitly. When a new vertex is inserted into the solution, we scan its previously free neighbors to update the residual degrees of their own neighbors, which are then moved between buckets as necessary. Buckets are implemented as doubly-linked lists. We add a small degree of randomization to the algorithm by randomly permuting all vertices before adding them to their original buckets. Table 2 reports the results obtained by this implementation (which we refer to as G) by itself and when followed by local search. The columns are the same as in Table 1.
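The sketch below illustrates the minimum-residual-degree greedy. For brevity it keeps buckets as hash sets and rescans the buckets from degree zero after each insertion, so, unlike the list-based implementation described above, it is not strictly linear-time; the selection rule and the residual-degree updates are the same.

```cpp
#include <vector>
#include <unordered_set>
#include <algorithm>
#include <random>
using namespace std;

// Greedy construction (algorithm G): repeatedly insert a free vertex with the fewest
// free neighbors, updating residual degrees as vertices leave the free set.
vector<int> greedyMIS(const vector<vector<int>>& adj, mt19937& rng) {
    int n = adj.size(), maxDeg = 0;
    vector<bool> isFree(n, true);
    vector<int> resDeg(n);                        // number of free neighbors of each free vertex
    for (int v = 0; v < n; ++v) { resDeg[v] = adj[v].size(); maxDeg = max(maxDeg, resDeg[v]); }

    vector<int> order(n);
    for (int v = 0; v < n; ++v) order[v] = v;
    shuffle(order.begin(), order.end(), rng);     // small randomization of the initial bucket order
    vector<unordered_set<int>> bucket(maxDeg + 1);
    for (int v : order) bucket[resDeg[v]].insert(v);

    auto makeNonFree = [&](int w) {               // w leaves the free set
        isFree[w] = false;
        bucket[resDeg[w]].erase(w);
        for (int z : adj[w])
            if (isFree[z]) {                      // each free neighbor of w loses one free neighbor
                bucket[resDeg[z]].erase(z);
                bucket[--resDeg[z]].insert(z);
            }
    };

    vector<int> solution;
    for (int d = 0; d <= maxDeg; ) {
        if (bucket[d].empty()) { ++d; continue; }
        int u = *bucket[d].begin();               // a free vertex of minimum residual degree
        solution.push_back(u);
        makeNonFree(u);
        for (int w : adj[u])
            if (isFree[w]) makeNonFree(w);        // u's free neighbors become 1-tight
        d = 0;                                    // residual degrees may have dropped below d
    }
    return solution;                              // maximal: no free vertex remains
}
```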

Table 2 Performance of the local search algorithms starting from greedy (G) solutions. L1 refers to the constructive algorithm, L2 to the 2-improvement local search, and L3 to the 3-improvement local search

Finally, we consider a variant of the greedy algorithm that picks a vertex uniformly at random among all candidates with minimum residual degree. An efficient implementation of this method cannot use buckets as G does, since one cannot sample uniformly at random from linked lists in constant time. Instead, we use a technique similar to the one used by our standard solution representation (Sect. 2). Instead of keeping each bucket separately, we maintain a permutation of the entire list of vertices in a single array. All vertices with a given residual degree belong to a contiguous block in the array, in arbitrary order. Non-free vertices appear first, followed by the free vertices sorted by residual degree. By keeping the initial positions of all blocks explicitly, we can pick a random element of each block in constant time. It is easy to see that this data structure can be updated efficiently after a vertex is inserted in the solution. Although it has higher data structure overhead than G, this randomized greedy algorithm (RG) still runs in O(m) time. The results obtained with it are presented in Table 3.

Table 3 Performance of the local search algorithms starting from randomized greedy solutions (RG). L1 refers to the constructive algorithm, L2 to the 2-improvement local search, and L3 to the 3-improvement local search

Comparing all three tables, we observe that the greedy algorithms (G and RG) find solutions of similar quality, and are usually much better than random (R). Random is consistently faster, however, especially for very dense instances, such as p_hat1500-1. While the greedy algorithms must visit every edge in the graph, the random algorithm only traverses the adjacency lists of the vertices that end up in the solution. Applying the 2-improvement local search to R is often cheaper than running G or RG by themselves, but the local maxima reached from R are usually worse. Moreover, on very sparse instances, 2-improvement is actually faster when applied to greedy solutions than to random ones (even including construction times), since it starts much closer to a local maximum.

The 2-improvement local search is remarkably fast when applied to the greedy solutions, particularly when the solution is small compared to the total number of vertices (as in san1000, for example). Even for large, sparse instances (such as fla and buddha) the local search is about as fast as the constructive algorithm. (Recall that the local search times reported in the tables actually include construction.)

On these large, sparse instances, 2-improvements are much more effective on RG than on G. In fact, G tends to find better solutions than RG, but after local search the opposite is true. A possible explanation is that the stack-like nature of the buckets in G causes the solutions it generates to be more “packed” than those produced by RG.

The 3-improvement local search finds much better solutions than the 2-improvement for all three constructive algorithms (especially G). In relative terms, however, the two local searches are similar: both find their best solutions when starting from RG, and their worst when starting from R. The running times of the 3-improvement local search are often not much higher than those of the greedy algorithm, even for some dense instances (such as keller6). As anticipated in Sect. 3.2, this indicates that bounding the running time by O(mΔ) is sometimes too pessimistic. Of course, bad cases do happen: on johnson32-2-4, for example, the local search is 10 times slower than the standard greedy algorithm and does not even improve the solution.

The higher variance of RG helps after local search. Over all 999 runs, for a fixed local search, the best solution found when starting from RG is almost always at least as good as when starting from R or G. The main exception is dragonsub: the best solution found by RG followed by 2-improvement was worse than the best solution found by G, which is actually 2-maximal. Despite this exception, these results suggest that some variant of RG could be well-suited to multistart-based metaheuristics, such as GRASP (Feo et al. 1994).

5.3 Metaheuristics

Although local search can improve the results found by constructive heuristics, we have seen that the local optima are usually somewhat far from the best known bounds. For near-optimal solutions, we turn to metaheuristics. We compare our iterated local search (ILS) with our implementation of Grosso et al.’s GLP algorithm. Our version of GLP deals with the maximum independent set problem directly, and its time per operation is comparable to the original implementation (Grosso et al. 2008).

Tables 4, 5, and 6 present results for DIMACS, CODE, and SAT, respectively. For each instance, we first show its number of vertices, its density, and the best known solution. We then report the minimum, average, and maximum solutions found over 15 runs of each algorithm (the numbers in parentheses indicate how many of the 15 runs found the maximum). Finally, we give the average running time in seconds. Both algorithms were run until the average number of scans per arc reached 2^17. The best average solution found in each case is highlighted in bold.

Table 4 DIMACS family. For each algorithm, we show the worst (min), average (avg), and best (max) solution found over 15 runs (the number of runs that found the maximum is in parentheses). We also show the average running times in seconds (time). Both algorithms were run until the average arc was scanned 2^17 times
Table 5 Results for the CODE family with 2^17 scans per arc (aggregated over 15 runs)
Table 6 Results for the SAT family with 2^17 scans per arc (aggregated over 15 runs)

As anticipated by our choice of stopping criterion, the relative running times of the algorithms do not fluctuate much: ILS is consistently about twice as fast on average. This gap is due to constant factors in the implementation of the algorithms. In particular, GLP maintains separate data structures to be able to pick non-tabu free vertices and non-tabu 1-tight vertices uniformly at random in constant time. Although this gap could probably be reduced with additional tuning, it is small enough for our purposes. The number of operations is similar and implementation-independent. For the remainder of this section, our discussion ignores any difference in running times between the algorithms.

Regarding solution quality, we note that, together, the algorithms do rather well on these families. For almost all instances, the best known bound was found at least once (the main exceptions are C2000.9, brock800_2, and the largest SAT graphs). Moreover, the average solutions found by ILS and GLP are usually very close to one another. GLP was consistently better on the brock instances (dense random graphs with a “hidden” large clique) and on C2000.9 and C4000.5 (also random, with larger cliques naturally hidden by the high value of n). GLP’s advantage at finding these cliques is probably due to its stronger tabu mechanism. In contrast, GLP does poorly on the MANN instances (sparse graphs with large independent sets), while ILS finds the optimal solution of MANN_a81 in less than one second on average. On the SAT family, ILS found the best known solutions for all instances in at least one run, but GLP was more consistent, reaching these solutions more often on average; ILS does better on the hardest (larger) instances, but misses the best known solution more often on smaller ones.

Although our algorithm does well on these families, GLP is somewhat more robust, especially on DIMACS and CODE. This is not the case for large, sparse graphs, to which we now turn our attention. Table 7 presents results for the MESH family. Because it includes much larger graphs than the previous series, we limit the average number of arc scans to 2^15.

Table 7 Results for the MESH family with 2^15 scans per arc (aggregated over 15 runs)

Note that GLP tends to find slightly better results for smaller instances; for some larger instances, notably dragon and buddha, ILS is clearly better. The relative performance of the algorithms appears to be correlated with the regularity of the meshes: GLP is better for regular meshes, whereas ILS is superior for more irregular ones. We verified this by visual inspection—see Fig. 1 for a couple of examples. Alternatively, we can use the standard deviation of the vertex degrees in the original (primal) mesh as a rough proxy for irregularity. It is smaller for bunny (0.58) and dragonsub (0.63), on which GLP is the best algorithm, and larger for buddha (1.28) and dragon (1.26), on which ILS is superior. Note that dragonsub is a subdivision of dragon: a new vertex is inserted in the middle of each edge, and each triangle is divided into four. Both meshes represent the same model, but because every new vertex has degree exactly six, dragonsub is much more regular.

Fig. 1

Detailed view of the meshes from which bunny (left) and buddha (right) were derived. Notice that bunny is significantly more regular. (Pictures provided by the authors of Sander et al. 2008.)

Although optimal solutions for the MESH family are not known, Sander et al. (2008) computed lower bounds on the cover solutions for seven of their original meshes, which can be easily translated into upper bounds for our (MIS) instances. The instances (and MIS upper bounds) are: buddha (495807), bunny (32850), dragon (68480), fandisk (4168), feline (19325), gargoyle (9120), and turtle (125438). For ILS, the lowest gaps observed were on bunny (the solutions were 1.9% lower than the upper bounds), and the highest on buddha (more than 3.0%). The extremes of GLP were on the same instances: 1.8% on bunny and 3.3% on buddha.

Finally, Table 8 presents the results for ROAD, with the average number of scans per arc limited to 2^12. Here ILS has a clear advantage: on every instance, the worst result found by ILS was better than the best found by GLP.

Table 8 Results for the ROAD family with 2^12 scans per arc (aggregated over 15 runs)

We note that MESH and ROAD are fundamentally different from the previous families. These are large graphs with linear-sized maximum independent sets. Both ILS and GLP start from relatively bad solutions, which are then steadily improved, one vertex at a time. To illustrate this, Fig. 2 shows the average solutions found for three representative instances from these families (buddha, turtle, and fla) as the algorithms progress. GLP initially finds better solutions, but is soon overtaken by ILS. In the case of turtle, GLP eventually catches up again. The third curve in the plots (ILS+plateau) refers to a version of our algorithm that also performs plateau search when the current solution improves (recall that ILS only performs plateau search when the solution worsens). Although faster at first, ILS+plateau is eventually surpassed by ILS in all three instances.

Fig. 2

Average solutions found as the number of scans per vertex increases. Results for buddha (top left), turtle (top right), fla (middle left), frb100-40 (middle right), C2000.9 (bottom left), and 1zc.4096 (bottom right)

For comparison, Fig. 2 also shows results for longer runs (2^20 scans per arc, with 15 different seeds) on instances from the other three families: frb100-40 (SAT), C2000.9 (DIMACS), and 1zc.4096 (CODE). As before, GLP starts much better. On frb100-40, it is soon surpassed by ILS. On 1zc.4096, ILS slowly reduces the gap, but does not quite close it. On C2000.9, GLP is consistently better, even as the number of scans increases. Note that the performance of ILS+plateau is not much different from ILS itself, unlike on MESH and ROAD instances.

6 Final remarks

We have proposed a fast implementation of a natural local search procedure for the independent set problem. Within an iterated local search (a metaheuristic), it provided results competitive with the best methods previously proposed, often matching the best known solutions (including optima) on the well-studied DIMACS, CODE, and SAT families. On some large, sparse instances (road networks and irregular meshes), its performance is consistently superior to that of GLP. For these large instances, however, we do not know exactly how far our method is from the optimal solution: there may be room for improvement. It seems reasonable, for example, to deal with these problems more locally. Instead of looking at the entire graph at once, we conjecture that one could do better by focusing on individual regions at a time.