1 Introduction

Given an edge-weighted, undirected graph \(G = (V,E)\) and a set \(T \subseteq V\) of terminals, the Steiner Problem in Graphs (SPG) is that of finding a minimum-cost tree that contains all vertices in T. This has applications in many areas, including computational biology, networking, and circuit design [9]. Unfortunately, it is NP-hard not only to find an optimal solution [30], but also to approximate it within a factor of 96/95 [10]. The best known approximation ratio is 1.39 [8, 24] (see [46] for a 1.55 approximation). Given its practical importance, there is a wealth of exact algorithms [12, 18, 25, 31, 33, 38, 40, 41, 45, 51] and heuristics [3, 6, 17, 18, 37, 44, 45, 52] to deal with real-world instances. State-of-the-art algorithms use a diverse toolkit that includes linear relaxations, branch-and-bound, reduction tests (preprocessing), and primal and dual heuristics.

Our goal in this paper is to develop an algorithm that is, above all, robust. For any input instance, regardless of its characteristics, we want to quickly produce a good solution. Moreover, the algorithm should scale well: when given more time to run, it should produce better solutions.

Our basic algorithm follows the principles of a heuristic proposed by Ribeiro, Uchoa, and Werneck [45]: it is a multistart algorithm with an evolutionary component, using perturbation for randomization. Under the hood, however, we introduce significant improvements that lead to much better results.

First, we leverage fast local search algorithms recently proposed by Uchoa and Werneck [52], which are asymptotically faster (in theory and practice) than previous approaches. Second, we propose a cascaded combination strategy, which immediately combines each fresh (newly-created) solution with multiple entries from a pool of elite solutions, leading to much quicker convergence. Third, we counterbalance this intensification strategy with a series of diversification measures (including careful perturbation and replacement policies in the pool) in order to explore the search space more comprehensively.

As a result, long runs of our algorithm can match or even improve the best published solutions (at the time of writing) for several open instances in the literature. For easier inputs, our basic algorithm still finds very good results, but for some graph classes it can be slower than alternative approaches that rely heavily on preprocessing and small duality gaps [12, 40].

To make our overall approach more robust, we include some preprocessing routines. Moreover, we propose a Guarded Multistart algorithm, which runs a branch-and-bound routine at the same time as our basic (primal-only) algorithm. For easy instances, these two threads can help each other, often leading to drastic reductions in total CPU time. To compute lower bounds, we propose a novel and efficient implementation of a well-known combinatorial dual ascent algorithm due to Wong [58]. For several hard instances, our branch-and-bound routine finds provably optimal solutions faster than any published algorithm.

Even with these optimizations, there are important graph classes (such as some VLSI instances) in which our method is not as effective as other approaches, notably those based on advanced reduction techniques and linear programming [12, 40] or on dynamic programming [25] (when the number of terminals is small). Even in such cases, however, the solutions found by our approach are not much worse, confirming its robustness. Overall, our algorithm provides a reliable, general-purpose solution for the Steiner Problem in Graphs.

This paper is organized as follows. Section 2 explains our multistart algorithm. Section 3 discusses our lower-bounding techniques, including branch-and-bound. Section 4 shows how preprocessing and lower-bounding make our basic algorithm more robust. Section 5 presents our experimental results, and we conclude in Sect. 6.

Notation The input to the Steiner Problem in Graphs is an undirected graph \(G = (V,E)\) and a set \(T \subseteq V\) of terminals. Each edge \(e = (v,w)\) has an associated nonnegative cost (length) denoted by \( cost (e)\) or \( cost (v,w)\). A solution \(S = (V_S, E_S)\) is a tree with \(T \subseteq V_S \subseteq V\) and \(E_S \subseteq E\); its cost is the sum of the costs of its edges. Our goal is to find a solution of minimum cost.

2 Basic algorithm

Our basic algorithm follows the multistart approach and runs in M iterations (where M is an input parameter). Each iteration generates a new solution from scratch using a constructive algorithm (with randomization), followed by local search. We also maintain a pool of elite solutions, i.e., good solutions found so far, including the very best one. The main feature of our algorithm is a cascaded combination strategy, which aggressively combines each new solution with several existing ones. This finds very good solutions early on, but has a very strong intensification effect. To counterbalance it, we look for diversification in other aspects of the algorithm. The outline of the entire algorithm is as follows:

  1. Create an empty pool P of elite solutions with capacity \(\lceil {\sqrt{M/2}}\rceil \).

  2. Repeat for M iterations:

     (a) Generate a new solution S using a constructive algorithm, local search, and randomization.

     (b) Try to add S to the pool P.

     (c) Generate a solution \(S'\) by combining S with solutions in the pool P.

     (d) Try to add \(S'\) to the pool P.

  3. Return the best solution in the pool P.
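A minimal C++ skeleton of this outline is sketched below. The Solution type, the pool's insertion policy, and the routines generateSolution and cascadedCombination are placeholders (our naming, not the original code) for the components described in Sects. 2.2–2.4; they are only declared here, so the fragment shows structure rather than a complete implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical skeleton: a solution is reduced to its cost and edge list.
struct Solution { double cost = 0.0; std::vector<int> edges; };

class ElitePool {
public:
    explicit ElitePool(std::size_t capacity) : capacity_(capacity) {}
    void tryAdd(const Solution& s);                       // insertion/eviction policy, Sect. 2.4
    const std::vector<Solution>& solutions() const { return pool_; }
    Solution best() const {                                // cheapest stored solution (pool nonempty)
        return *std::min_element(pool_.begin(), pool_.end(),
            [](const Solution& a, const Solution& b) { return a.cost < b.cost; });
    }
private:
    std::size_t capacity_;
    std::vector<Solution> pool_;
};

Solution generateSolution(std::mt19937& rng);              // construction + local search, Sect. 2.2
Solution cascadedCombination(Solution s, const std::vector<Solution>& elite,
                             std::mt19937& rng, int phi);  // Sect. 2.3

Solution multistart(int M, unsigned seed = 17) {
    std::mt19937 rng(seed);
    ElitePool pool(static_cast<std::size_t>(std::ceil(std::sqrt(M / 2.0))));  // step 1
    for (int i = 0; i < M; ++i) {
        Solution s = generateSolution(rng);                                    // step 2(a)
        pool.tryAdd(s);                                                        // step 2(b)
        Solution s2 = cascadedCombination(s, pool.solutions(), rng, /*phi=*/3);// step 2(c)
        pool.tryAdd(s2);                                                       // step 2(d)
    }
    return pool.best();                                                        // step 3
}
```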

The remainder of this section describes details omitted from this outline. Section 2.1 explains the local search routines, Sect. 2.2 describes how fresh solutions are generated, Sect. 2.3 deals with the cascaded combination algorithm, and Sect. 2.4 addresses the insertion and eviction policies for the pool.

2.1 Local search

A local search algorithm tries to improve an existing solution S by examining a neighborhood \(\mathcal{N}(S)\) of S, a set of solutions obtainable from S by performing a restricted set of operations. Evaluating \(\mathcal{N}(S)\) consists of either finding an improving solution \(S'\) (i.e., one with \(cost(S') < cost(S)\)) or proving that no such \(S'\) exists in \(\mathcal{N}(S)\). A local search heuristic repeatedly replaces the current solution by an improving neighbor until it reaches a local optimum. Uchoa and Werneck [52] present algorithms to evaluate in \(O(|E| \log |V|)\) time four natural (and well-studied) neighborhoods: Steiner-vertex insertion, Steiner-vertex elimination, key-path exchange, and key-vertex elimination.

The first two represent a solution \(S = (V_S,E_S)\) in terms of its set \(V_S {\setminus } T\) of Steiner vertices. The minimum spanning tree (MST) of the subgraph of G induced by \(V_S\) (which we denote by \(\text{ MST }(G[V_S])\)) costs no more than S. In particular, if S is optimal, so is \(\text{ MST }(G[V_S])\). Uchoa and Werneck use dynamic graph techniques to efficiently evaluate neighborhoods defined by the insertion or removal of a single Steiner vertex [35, 36, 48, 54]. These neighborhoods had been considered in metaheuristics before [3, 44, 45], but evaluation required \(O(|V|^2)\) time for insertions and O(|E| |V|) time for removals.

The other two neighborhoods represent a solution S in terms of its key vertices \(K_S\), which are Steiner vertices with degree at least three in S. If S is optimal, it costs the same as the MST of its distance network restricted to \(K_S \cup T\) (the complete graph on \(|K_S \cup T|\) vertices whose edge lengths reflect shortest paths in G). Uchoa and Werneck show that the neighborhood corresponding to the elimination of a single key vertex can be evaluated in \(O(|E| \log |V|)\) time. They prove the same bound for the key-path exchange local search [14, 17, 53], which attempts to replace an existing key path (linking two vertices of \(K_S \cup T\) in S) by a shorter path between the components it connects. Both implementations improve on previous time bounds by a factor of O(|T|).

In this paper, we take the local searches mostly as black boxes, but use the fact that these implementations work in passes. If there is an improving move in the neighborhood, a pass is guaranteed to find one in \(O(|E| \log |V|)\) time. To accelerate convergence, the algorithms may perform multiple independent moves in the same pass (within the same time bound). In practice, there are almost always fewer than 10 passes, with most of the improvements achieved early on [52].

We follow Uchoa and Werneck and use what they call the vq local search within our algorithm. It alternates between a pass that evaluates Steiner-vertex insertion (v) and a pass that evaluates (simultaneously) both key-vertex removal and key-path exchange (q). Since in practice Steiner vertex removal (u) is rarely better than key-vertex removal, our algorithm does not use it.

2.2 Generating new solutions

New solutions are generated by a constructive algorithm followed by local search, using randomization. Rather than randomizing the algorithms themselves (constructive heuristic and local search), we follow Ribeiro et al. [45] and apply perturbations to the edge costs, preserving the running time guarantees of all algorithms.

Using the perturbation To build a constructive solution, we apply a random perturbation to the edge costs (details will be given later), then run a near-linear time (in practice) implementation [39] of the shortest-path heuristic (SPH) [49]. Starting from a random root vertex, SPH greedily adds to the solution the entire shortest path to the terminal that is closest (on the perturbed graph) to previously picked vertices.

Ribeiro et al. [45] suggest applying local search to the constructive solution, but using the original (unperturbed) costs during local search. Since the constructive solution can be quite far from the local optimum, however, the effects of the perturbation may disappear quite soon, hurting diversification.

We propose an alternative approach, which leverages the fact that our local searches work in passes. We start the local search on the perturbed instance and, after each pass it makes, we dampen the perturbation, bringing all costs closer to their original values. For each edge e with original (unperturbed) cost \( cost (e)\), let \( cost _i(e)\) be its (perturbed) cost at the end of pass i. For pass \(i+1\), we set \( cost _{i+1}(e) = \alpha cost _i(e) + (1 - \alpha ) cost (e)\), where \(0< \alpha < 1\) is a decay factor (we use \(\alpha = 0.5\)). This dampened approach makes better use of the guidance provided by the perturbation, thus increasing diversification. For efficiency, after three passes with perturbation, we restore the original (unperturbed) costs and run the local search until a local optimum is reached. Uchoa and Werneck [52] (Table 5) show that, on non-trivial graph classes, the local search takes between three and four passes (on average) to reach a local optimum (on the unperturbed graph). By making the local search operate on a (decreasingly) perturbed graph for the first three passes, we improve diversity with minimal effect on the time to reach an (unperturbed) local optimum, as Sect. 5.1 will show.
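As a concrete illustration, the dampening step between passes can be written as follows (a minimal sketch with our naming; the perturbed and original costs are kept in parallel arrays):

```cpp
#include <cstddef>
#include <vector>

// Dampen the perturbation after one local search pass (Sect. 2.2):
// cost_{i+1}(e) = alpha * cost_i(e) + (1 - alpha) * cost(e), with alpha = 0.5.
void dampenPerturbation(std::vector<double>& perturbedCost,
                        const std::vector<double>& originalCost,
                        double alpha = 0.5) {
    for (std::size_t e = 0; e < perturbedCost.size(); ++e)
        perturbedCost[e] = alpha * perturbedCost[e] + (1.0 - alpha) * originalCost[e];
}
```

With \(\alpha = 0.5\), each pass halves the remaining deviation from the original costs, so after three dampened passes only about one eighth of the initial perturbation is left and switching to the exact original costs is a small step.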

Computing the perturbation We now return to the issue of how initial perturbations are computed. Ribeiro et al. propose a simple edge-based approach, in which the cost of each edge is multiplied by a random factor (between 1.0 and 1.2). This is reasonably effective, but has a potential drawback: because edges are perturbed independently, the perturbations applied to the edges incident to any particular vertex tend to cancel one another out. We thus propose a vertex-based perturbation, which associates an independent random factor with each vertex in the graph. The perturbed cost of an edge (u, v) is then its original cost multiplied by the average of the factors of its two endpoints (u and v).

To enhance diversification, the choice of parameters that control the perturbation itself is randomized. Each iteration chooses either edge-based or vertex-based perturbation with equal probability. It then picks a maximum perturbation Q uniformly at random in the range [1.25, 2.00]. Finally, it defines the actual perturbation factors: for each element (vertex or edge), it sets the factor to \(1 + \rho Q\), where \(\rho \) is generated (for each element) uniformly at random in [0, 1].

For further diversification, we actually use a slightly non-uniform distribution parameterized by a small threshold \(\tau = (\log _2 n) / n\). If the random number \(\rho \) is at least \(\tau \) (as is usually the case), we use the formula above. Otherwise, we use a perturbation factor of \(\rho /\tau \). Note that this factor is between 0.00 and 1.00, while the standard factor is always greater than 1.00. This means that a small fraction of the elements can become significantly cheaper than the others, making them much more likely to appear in the solution. This allows us to test key vertices that our standard local searches would not normally consider, since they do not include a fast implementation of key-vertex insertion.
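The following sketch shows how one iteration might draw its perturbation factors according to the rules above (uniform choice of Q in [1.25, 2.00], a per-element value \(\rho\), and the threshold \(\tau = (\log_2 n)/n\)); the function name and RNG plumbing are ours.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Draw one perturbation factor per element (vertex or edge) for a single
// multistart iteration, following Sect. 2.2; n is the number of vertices.
std::vector<double> drawPerturbationFactors(std::size_t numElements, std::size_t n,
                                            std::mt19937& rng) {
    std::uniform_real_distribution<double> uniQ(1.25, 2.00), uni01(0.0, 1.0);
    const double Q = uniQ(rng);        // maximum perturbation for this iteration
    const double tau = std::log2(static_cast<double>(n)) / static_cast<double>(n);
    std::vector<double> factor(numElements);
    for (double& f : factor) {
        const double rho = uni01(rng);
        f = (rho >= tau) ? 1.0 + rho * Q   // standard case: factor greater than 1
                         : rho / tau;      // rare case: factor in [0, 1)
    }
    return factor;
}
```

For edge-based perturbation, the perturbed cost of an edge e is \( cost (e)\) times factor[e]; for vertex-based perturbation, the cost of edge (u, v) is multiplied by the average of factor[u] and factor[v], as described above.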

We stress that the algorithm already works reasonably well with the standard edge-based perturbation proposed by Ribeiro et al. [45]; although we observed some improvement with the vertex-based perturbation (and the non-uniform distribution), the effects were relatively minor. Given that these alternatives are quite simple to implement, they are still worth using.

2.3 Cascaded combination

The cascaded combination algorithm takes as input an initial solution \(S_0\), the pool of elite solutions, and the maximum number of allowed failures, denoted by \(\phi \) (we use \(\phi =3\)). The procedure combines \(S_0\) with elements in the pool, generating a (potentially better) solution \(S^*\).

The basic building block of this procedure is the randomized merge operation [45], which takes as input two solutions (\(S_a\) and \(S_b\)) and produces a third (potentially cheaper) one. It does so by first generating a perturbed graph from G by perturbing each edge cost depending on which of the original solutions (\(S_a\) and/or \(S_b\)) it appears in. If an edge appears in both solutions, it keeps its original cost. If it appears in none of the solutions, its cost is multiplied by 1000. If it appears in exactly one solution (\(S_a\) or \(S_b\)), its cost is multiplied by a random number between 100 and 500. We run the SPH heuristic on the resulting instance. Note that, regardless of graph size, the combined solution preserves all edges that appear in both \(S_a\) and \(S_b\) (since they are relatively much cheaper); the remaining edges are likely to come from either \(S_a\) or \(S_b\) (since such edges are still significantly cheaper than those that appear in neither solution). We then remove all perturbations and apply local search to the combined solution, producing \(S_c\), the result of the perturbed combination.
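The cost perturbation used by a randomized merge can be summarized as follows (a sketch under the multipliers given above; the solution representation is reduced to per-edge membership flags, and the subsequent SPH and local search calls are omitted):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Edge-cost multipliers for a randomized merge of solutions Sa and Sb (Sect. 2.3).
// inA[e] / inB[e] indicate whether edge e appears in Sa / Sb.
std::vector<double> mergeCosts(const std::vector<double>& cost,
                               const std::vector<bool>& inA,
                               const std::vector<bool>& inB,
                               std::mt19937& rng) {
    std::uniform_real_distribution<double> uni(100.0, 500.0);
    std::vector<double> merged(cost.size());
    for (std::size_t e = 0; e < cost.size(); ++e) {
        if (inA[e] && inB[e])       merged[e] = cost[e];            // shared edges stay cheap
        else if (inA[e] || inB[e])  merged[e] = cost[e] * uni(rng); // edges in exactly one solution
        else                        merged[e] = cost[e] * 1000.0;   // edges in neither solution
    }
    return merged;  // run SPH on these costs, then local search on the original costs
}
```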

The cascaded combination procedure maintains an incumbent solution \(S^*\), originally set to \(S_0\). In each step, it performs a randomized merge of \(S^*\) and a solution \(S'\) picked uniformly at random from the pool. Let \(S''\) be the resulting solution. If \( cost (S'') < cost (S^*)\), we make \(S''\) the new incumbent (i.e., we set \(S^* \leftarrow S''\)). Otherwise, we say that the randomized merge failed and keep \(S^*\) as the incumbent. When the number of failures reaches \(\phi \), the cascaded combination algorithm stops and returns \(S^*\).
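Continuing the skeleton from the beginning of this section, the cascaded combination procedure is a short loop; randomizedMerge stands for the operation described above and is only declared here.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Perturbed-cost SPH followed by local search, as described above (declaration only).
Solution randomizedMerge(const Solution& a, const Solution& b, std::mt19937& rng);

// Cascaded combination (Sect. 2.3): repeatedly merge the incumbent with a
// uniformly chosen elite solution; stop once phi merges (in total) have failed.
Solution cascadedCombination(Solution incumbent, const std::vector<Solution>& elite,
                             std::mt19937& rng, int phi) {
    if (elite.empty()) return incumbent;
    std::uniform_int_distribution<std::size_t> pick(0, elite.size() - 1);
    int failures = 0;
    while (failures < phi) {
        Solution merged = randomizedMerge(incumbent, elite[pick(rng)], rng);
        if (merged.cost < incumbent.cost) incumbent = merged;  // success: new incumbent
        else ++failures;                                       // failed merge
    }
    return incumbent;
}
```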

Note that the resulting solution \(S^*\) may have elements from several other solutions in the pool. This makes it a powerful intensification agent, helping achieve good solutions quite quickly. That said, the first few solutions added to the pool will have a disproportionate influence on all others, potentially confining the multistart algorithm to a very restricted region of the search space. This is why we prioritize diversification elsewhere in the algorithm.

On average, each multistart iteration touches a constant number of solutions in the pool. We set the capacity of the elite pool to \(\varTheta (\sqrt{M})\) to ensure that most pairs of elite solutions are (indirectly) combined with one another at some point during the algorithm. We set the precise capacity to \(\lceil \sqrt{M/2} \rceil \), but the algorithm is not too sensitive to this constant; results were not much different with \(\lceil \sqrt{M} \rceil \) or \(\lceil \sqrt{M/4} \rceil \).

2.4 Pool management

We now address the insertion and eviction policies for the pool of elite solutions. When our algorithm attempts to add a solution S to the pool, we must consider three simple cases and a nontrivial one. First, if S is identical to a solution already in the pool, it is not added. Second, if the pool is not full and S is not identical to any solution, S is simply added. Third, if the pool is full and S is not better than any solution in the pool, S is not added.

The nontrivial case happens when the pool is full, S is different from all solutions in the pool, and S is better than the worst current solution. In this case, S replaces a solution that is at least as bad as S, with (randomized) preference for solutions that are similar to S. More precisely, we define the relative symmetric difference between S and \(S'\) as \(\varDelta (S,S') = 1 - |E(S) \cap E(S')| / |E(S) \cup E(S')|\); this is 0 if the solutions are identical, and 1 if they have no edge in common. When inserting S into the pool, we pick a solution to replace among all solutions \(S'\) that cost at least as much as S, with probability proportional to a score given by \((1 - \varDelta (S,S'))^{2}\). Note that solutions that are similar to S are much more likely to be picked; for example, if S and \(S'\) are nearly identical, the score is close to 1; if they share only half their edges, the score is close to 0.25. This technique (biased towards replacing similar solutions) has been shown to increase diversification for other problems [43].
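A sketch of the replacement step for this nontrivial case, assuming each Solution stores its cost and a sorted list of edge identifiers (a representation we adopt only for illustration):

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Relative symmetric difference between two solutions, given their sorted edge-id
// lists: 0 if identical, 1 if they share no edge.
double relSymmetricDifference(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(common));
    const double unionSize = static_cast<double>(a.size() + b.size() - common.size());
    return 1.0 - static_cast<double>(common.size()) / unionSize;
}

// Choose which elite solution to evict when inserting s (Sect. 2.4). Precondition:
// the pool is full and at least one pool solution costs at least as much as s.
std::size_t pickVictim(const std::vector<Solution>& pool, const Solution& s,
                       std::mt19937& rng) {
    std::vector<double> score(pool.size(), 0.0);
    for (std::size_t i = 0; i < pool.size(); ++i) {
        if (pool[i].cost < s.cost) continue;  // never evict a strictly better solution
        const double delta = relSymmetricDifference(pool[i].edges, s.edges);
        // Similar solutions get a higher score; the tiny floor avoids an all-zero
        // weight vector in the unlikely case that every candidate is edge-disjoint from s.
        score[i] = std::max((1.0 - delta) * (1.0 - delta), 1e-12);
    }
    std::discrete_distribution<std::size_t> pick(score.begin(), score.end());
    return pick(rng);
}
```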

2.5 Discussion

Although our algorithm borrows some elements from the multistart approach of Ribeiro et al. [45], Sect. 5.1.3 will show that it significantly outperforms theirs. This section outlines the most relevant similarities and differences between the two approaches.

Ribeiro et al. run a pure multistart routine, in which solutions generated by the end of each iteration are mostly independent from previous ones; these solutions are used to populate an elite pool. At the very end, all pairs of elite solutions in the pool are combined with one another. The resulting solutions populate a second pool (generation). This procedure is repeated until it reaches a generation whose average solution value does not improve upon the previous one. The algorithm makes extensive use of randomization: each iteration chooses between three types of constructive algorithm, three types of perturbation (used to randomize the construction), and two local search schemes. The post-optimization phase chooses between two types of combination (randomized merges or path relinking by complementary moves).

Although there are many differences between the algorithms, the main reason for our good performance is that we leverage asymptotically faster implementations of the local searches. Like us, Ribeiro et al. use local searches based on adding and removing Steiner vertices and a local search based on key-path exchanges (we also add a local search based on key-vertex removals). While they must explicitly build each neighboring solution in linear or quasi-linear time, we can evaluate them implicitly in logarithmic amortized time by applying the recent algorithms from Uchoa and Werneck [52]. This allows us to reach local optima much faster in practice. Our local optima tend to be slightly worse, however, since explicitly building neighboring solutions allows Ribeiro et al. to look for “opportunistic” moves (such as removing non-terminal leaves) during the evaluation. As Sect. 5.1.3 will show, this is a worthy trade-off within the overall multistart algorithm, allowing us to find better results within the same time limit.

Faster local searches allow us not only to run more iterations within the same time limit, but also to do more in each iteration using cascaded combinations, another contribution of this paper. Both cascaded combinations and the algorithm of Ribeiro et al. use randomized merges as building blocks. While Ribeiro et al. perform randomized merges only at the very end, cascaded combinations (as proposed in this paper) use randomized merges much more aggressively, with combinations happening during the multistart phase itself. The result of one iteration is thus heavily dependent on others. As our experiments will show (see Fig. 2), this allows very good solutions to be generated much earlier in the execution.

Despite being less relevant to explain the performance gap, there are several other differences between the algorithms. We use a single constructive algorithm (Ribeiro et al. choose between three in each iteration) and a single local search scheme (Ribeiro et al. alternate between two). We use a difference-based replacement scheme when updating the pool (Ribeiro et al. do not). We alternate between vertex-based and edge-based perturbations, while Ribeiro et al. use only the latter. We stress that these are minor differences, however. The parameters we reported are the ones we ended up using in the final version of our code, but Sect. 5.1 will show that our algorithm is not very sensitive to these parameters.

3 Lower bounds

We now turn our attention to finding lower bounds. We propose an efficient implementation of a known greedy combinatorial algorithm (due to Wong [58]) associated with a powerful linear programming formulation. We first describe the formulation and an abstract version of the algorithm, then discuss our implementation.

3.1 Formulation and dual ascent

We use the dual of the well-known directed cut formulation for the Steiner problem in graphs [58], defined on the directed graph obtained by replacing each edge of G with two oppositely directed arcs of the same cost. It takes a terminal \(r \in T\) as the root. A set \(W \subset V\) is a Steiner cut if W contains at least one terminal but not the root. Let \(\delta ^-(W)\) be the set consisting of all arcs (u, v) such that \(u \not \in W\) and \(v \in W\).

The dual formulation associates a nonnegative variable \(\pi _W\) with each Steiner cut W. The set \(\mathcal W\) of all Steiner cuts has exponential size. Given a dual solution \(\pi \), the reduced cost \(\bar{c}(a)\) of an arc a is defined as \( cost (a) - \sum _{W \in \mathcal{W}: a \in \delta ^-(W)} \pi _W\). The dual formulation maximizes the sum of all \(\pi _W\) variables, subject to all reduced costs being nonnegative. An arc whose reduced cost is zero is said to be saturated.
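Written out, the dual program is the following linear program; the constraints simply restate that all reduced costs must remain nonnegative:

\[
\begin{aligned}
\max\quad & \sum_{W \in \mathcal{W}} \pi_W \\
\text{s.t.}\quad & \sum_{W \in \mathcal{W}:\, a \in \delta^-(W)} \pi_W \le cost(a) \quad \text{for every arc } a, \\
& \pi_W \ge 0 \quad \text{for every } W \in \mathcal{W}.
\end{aligned}
\]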

The technique we use for finding lower bounds is a dual ascent routine proposed by Wong [58], which finds a greedy feasible solution to the dual formulation. The algorithm maintains a set C of active terminals, initially consisting of \(T {\setminus } \{r\}\). For a terminal \(t \in T\), let \( cut (t)\) be the set of vertices that can reach t through saturated arcs only. We say that t induces a root component if t is the only active terminal in \( cut (t)\).

The algorithm (implicitly) initializes with 0 the variables associated with all Steiner cuts. In each iteration, it picks a vertex \(v \in C\) and checks if \( cut (v)\) is a root component. If it is not, it makes v inactive and removes it from C; if it is a root component, the algorithm increases \(\pi _{ cut (v)}\) until one of the arcs in \(\delta ^-( cut (v))\) is saturated (and keeps v in C). We stop when C becomes empty, at which point every terminal can be reached from the root using only saturated arcs.

Each iteration takes O(|E|) time to process a candidate root component. There are O(|T|) unsuccessful iterations, since each reduces the number of active vertices. A successful iteration saturates at least one arc and increases the size (number of vertices) of at least one root component. There can thus be at most \(\min \{|V||T|, |E|\}\) such iterations, bounding the total running time by \(O(|E| \min \{|V||T|, |E|\})\) [15].

3.2 Our implementation

We now describe details of our implementation of Wong’s algorithm that are crucial to its good performance in practice.

Processing a root component Once a vertex v is picked at the beginning of an iteration, we process its (potential) root component in three passes.

The first pass performs a graph search (we use BFS) from v following only saturated incoming arcs. If the search hits another active vertex, the iteration stops immediately: v does not define a root component. Otherwise, the search finishes with two data structures: a set S consisting of all vertices in \( cut (v)\), and a list L which includes all unsaturated arcs (a, b) such that \(a \not \in cut (v)\) and \(b \in cut (v)\). To ensure both structures can be built during the BFS, we allow L to also contain some unsaturated arcs (a, b) such that both a and b belong to \( cut (v)\). These may appear because, when the BFS scans b, it may not know yet whether its neighbor a is part of \( cut (v)\); to be safe, we add (a, b) to L anyway.

The second pass traverses L with two aims: (1) remove from L all arcs (a, b) that are invalid (with \(a \in cut (v)\)); and (2) pick, among the remaining arcs, the one with the minimum residual capacity \(\varDelta \).

The third pass performs an augmentation by reducing the residual capacity of each arc in L by \(\varDelta \). It also builds a set X with the tails of all arcs that become saturated, which will be part of the new root component of v (after augmentation).

Note that only the first pass of the algorithm performs a graph search; the other two passes are much cheaper, as they merely traverse arrays.
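For concreteness, the second and third passes might look as follows (a sketch with our data layout: arcs are stored in a flat array, L holds arc indices, and inCut marks the vertices of \( cut (v)\)):

```cpp
#include <algorithm>
#include <limits>
#include <vector>

struct Arc { int tail, head; double redCost; };  // redCost: current reduced cost (residual capacity)

// Passes 2 and 3 of Sect. 3.2: drop invalid candidates, find the minimum residual
// capacity delta, augment all remaining arcs, and collect in X the tails of arcs
// that become saturated. Names and layout are ours.
double augment(std::vector<Arc>& arcs, std::vector<int>& L,
               const std::vector<char>& inCut, std::vector<int>& X) {
    // Pass 2: keep only arcs whose tail lies outside cut(v); track the minimum.
    std::vector<int> valid;
    double delta = std::numeric_limits<double>::infinity();
    for (int a : L) {
        if (inCut[arcs[a].tail]) continue;  // invalid: both endpoints inside the cut
        valid.push_back(a);
        delta = std::min(delta, arcs[a].redCost);
    }
    // Pass 3: reduce residual capacities; record tails of newly saturated arcs.
    X.clear();
    for (int a : valid) {
        arcs[a].redCost -= delta;
        if (arcs[a].redCost <= 1e-9) X.push_back(arcs[a].tail);
    }
    L.swap(valid);
    return delta;  // the dual objective (lower bound) increases by delta
}
```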

Selection rules and lazy evaluation The bound given by the algorithm depends on which active vertex (root component) it selects in each iteration. Without loss of generality, we assume each iteration picks the active vertex that minimizes some score function. Poggi de Aragão, Uchoa, and Werneck [38] (see also [56]) found that using the number of incident arcs as the score works well in practice. Polzin and Vahdati Daneshmand [12, 40] show that a related (but coarser) measure, the number of vertices in the component, also works well. For either score function (and others), the main challenge is to maintain scores efficiently for all root components during the algorithm, since augmenting on one root component may affect several others.

For efficiency, we focus on nondecreasing score functions: as the root component grows, its score can either increase or stay the same. This allows us to use lazy evaluation. We maintain each active vertex v in a priority queue, with a priority \(\sigma (v)\) that is a lower bound on the score of its root component. Each round of the algorithm removes the minimum element t from the queue. It then verifies (using the procedure above) whether t defines a root component; if it does not, we just discard t. Otherwise, we perform the corresponding augmentation, provided the actual score is not higher than the priority of the second element in the queue; whether or not we augment, we then reinsert t into the priority queue with its updated priority.

When we do augment, computing the new exact score of t can be expensive. Instead, we update the score assuming the vertices in the root component are the union of X (the tails of all arcs saturated during the augmentation) and the original vertices (at the beginning of the iteration). Although we may miss some vertices, this is relatively cheap to compute and provides a valid lower bound on the actual score.

Since the number of arcs incident to a root component may decrease, we cannot use it as score function. Instead, we use a refined version of the number of vertices in the root component. Given a component c, let \(\text{ vc }(c)\) be its number of vertices and let \(\text{ deg }(c)\) be the sum of their in-degrees. We use \(\text{ deg }(c) - (\text{ vc }(c) - 1)\) as the score. This is an upper bound on the number of incoming arcs on the component (the \(\text{ vc }(c) - 1\) term discards arcs in a spanning tree of the component, which must exist). This function is nondecreasing, as cheap to compute as the number of vertices, and gives a better estimate on the number of incoming arcs.

Eager evaluation Even with lazy evaluation, we may process a root component multiple times before actually performing an augmentation (or discarding the component). To make the algorithm more efficient, we also use eager evaluation: after removing a component from the priority queue, we sometimes perform an augmentation even if its real score does not match the priority in the queue. More precisely, as long as the actual score is no more than 25% higher than the priority, the augmentation is performed. This has almost no effect on solution quality but makes the algorithm significantly faster. Note that any constant factor (including 25%) implies a logarithmic bound on the number of times any component can be reevaluated during the algorithm. The maximum time spent reevaluating components (without subsequent augmentations) is thus \(O(|E||T|\log |V|)\).

Last component Typically, the initial cuts found by the dual ascent algorithm have far fewer vertices than later ones. In particular, when there is only one active vertex v left, we may have to perform several expensive augmentations until it becomes reachable from the root. We can obtain the same bounds faster by dealing with this case differently: we run (forward) Dijkstra’s algorithm from the root to v, using reduced costs as arc lengths. We then use a linear pass to update the reduced costs of the remaining arcs appropriately [56].

3.3 Branch-and-bound

We use our dual ascent algorithm within a simple branch-and-bound procedure. We follow the basic principles of most previous work [12, 38, 40, 56, 58], using dual ascent for lower bounds and branching on vertices. The remainder of this section describes other features of our implementation.

The dual ascent root is picked uniformly at random (among the terminals) and independently for each node of the branch-and-bound tree.

To find primal (upper) bounds, we run the SPH heuristic on the (directed) subgraph consisting only of arcs saturated by the dual ascent procedure, using the same root. We then run a single pass of the Steiner vertex insertion and elimination local search procedures (using all edges, not just saturated ones).

We branch on the vertex that has maximum degree in the primal solution found in the current branch-and-bound node. In case of ties, we look beyond the current primal solution and prefer vertices that maximize the sum of incoming saturated arcs, outgoing saturated arcs, and total degree. Remaining ties are broken at random. If v is the chosen vertex, we remove it from the graph on the “zero” side and make it a terminal on the “one” side. We traverse the branch-and-bound tree in DFS order, visiting the “one” side first. This tends to find good primal solutions quicker than other approaches we tried.

We can eliminate an arc (u, v) if its reduced cost is at least as high as the difference between the best known primal solution and the current dual solution. We actually take the extended reduced cost [41], which also considers the distance (using reduced costs) from the root to u. Since the root can change between nodes in the branch-and-bound tree, we only eliminate an (undirected) edge when both corresponding arcs (directions) could be fixed by reduced cost. If we eliminate at least |E| / 5 edges in a branch-and-bound node, we create a single child node rather than branching.
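The elimination test itself is a one-line comparison; in this sketch (our naming), rootDist is the reduced-cost distance from the current root to the arc's tail u:

```cpp
// Reduced-cost fixing (Sect. 3.3), as a sketch: an arc (u, v) can be discarded if its
// extended reduced cost is at least the current primal-dual gap. The undirected edge
// {u, v} is removed only if both of its arcs pass this test.
bool canEliminateArc(double reducedCost, double rootDist,
                     double primalBound, double dualBound) {
    return reducedCost + rootDist >= primalBound - dualBound;
}
```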

4 Improving robustness

While the algorithm from Sect. 2 works well on many graph classes, there are still opportunities to make it more robust (compared to other approaches) for very easy or very hard instances. Section 4.1 shows how the lower bounds described in Sect. 3 allow our heuristic to stop sooner. Section 4.2 describes some basic preprocessing techniques to reduce the size of the graph on certain classes of instances. Finally, Sect. 4.3 describes a two-level version of the multistart algorithm that achieves greater diversification on longer runs.

4.1 Guarded multistart

Being a pure heuristic, the multistart algorithm described in Sect. 2 can be wasteful. Because it cannot prove that the best solution it found is optimal (even if it actually is), it cannot stop until it completes its scheduled number of iterations. Ideally, we would like to stop sooner on easy instances.

To that end, we propose a Guarded Multistart (GMS) algorithm. It runs two threads in parallel: the first runs our standard multistart algorithm, while the second runs the branch-and-bound routine from Sect. 3.3. The algorithm terminates as soon as either the multistart thread completes its iterations, or the branch-and-bound thread proves that the incumbent solution is optimal. Communication between threads is limited: the threads inform one another about termination and share the best incumbent solution.

The benefits of this approach are twofold. First, as already mentioned, on easy instances the branch-and-bound algorithm can often prove optimality well before the scheduled number of multistart iterations is reached, making the algorithm faster. Moreover, sometimes the branch-and-bound algorithm quickly finds better primal solutions by itself, leading to better quality as well.

Of course, these advantages are not free: the cycles spent on the branch-and-bound computation could have been used for the multistart itself. On harder instances, we can only afford to perform roughly half as many iterations within the same CPU time. We could make this problem less pronounced using a simple heuristic to detect cases in which the branch-and-bound computation is obviously unhelpful: if its depth reaches (say) 100, we could stop it and proceed only with the multistart computation, saving CPU time. Another potential drawback is nondeterminism: due to scheduling, multiple runs of GMS (even with the same random seed) may find different results. Although one could make the algorithm deterministic by carefully controlling when communication occurs, it is not clear this is worth the extra effort and complexity.

Similarly, it is possible that greater integration between the algorithms (for instance, by tentatively adding branch-and-bound solutions to the multistart pool) would help. In practice, however, we found that the greatest advantage of GMS is saving time on easy instances by allowing the algorithm to stop sooner. Since the primal solutions found by dual ascent tend to be of lower quality for harder instances (with larger duality gaps), it is unlikely that greater integration would justify the extra effort.

4.2 Reduction techniques

To be competitive with state-of-the-art algorithms on standard benchmark instances, we must deal with “easy” inputs effectively. We thus use some basic reduction (preprocessing) techniques that transform the input into a potentially much smaller instance with the same solution.

In particular, we delete non-terminal vertices of degree one (alongside their incident edges). Also, if there is a non-terminal vertex v with exactly two neighbors, u and w, we replace edges (u, v) and (v, w) with a single edge (u, w) with cost \( cost (u,w) = cost (u,v) + cost (v,w)\).
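These two reductions can be applied exhaustively with a simple work queue. The sketch below uses an adjacency-map representation of our own choosing (each vertex maps its neighbors to the cheapest connecting cost), which also handles the parallel edges that the degree-two reduction may create:

```cpp
#include <queue>
#include <unordered_map>
#include <vector>

// Degree-one and degree-two reductions (Sect. 4.2), sketched on adjacency maps:
// adj[v] maps each neighbor of v to the cheapest edge cost between them.
void basicReductions(std::vector<std::unordered_map<int, double>>& adj,
                     const std::vector<bool>& isTerminal) {
    const int n = static_cast<int>(adj.size());
    std::queue<int> candidates;
    for (int v = 0; v < n; ++v)
        if (!isTerminal[v] && adj[v].size() <= 2) candidates.push(v);
    while (!candidates.empty()) {
        const int v = candidates.front(); candidates.pop();
        if (isTerminal[v]) continue;
        if (adj[v].size() == 1) {                        // degree-1 non-terminal: delete v
            const int u = adj[v].begin()->first;
            adj[u].erase(v);
            adj[v].clear();
            if (!isTerminal[u] && adj[u].size() <= 2) candidates.push(u);
        } else if (adj[v].size() == 2) {                 // degree-2 non-terminal: bypass v
            auto it = adj[v].begin();
            const int u = it->first; const double cu = it->second; ++it;
            const int w = it->first; const double cw = it->second;
            adj[u].erase(v); adj[w].erase(v); adj[v].clear();
            const double c = cu + cw;
            auto uw = adj[u].find(w);                    // keep the cheaper parallel edge
            if (uw == adj[u].end() || c < uw->second) { adj[u][w] = c; adj[w][u] = c; }
            if (!isTerminal[u] && adj[u].size() <= 2) candidates.push(u);
            if (!isTerminal[w] && adj[w].size() <= 2) candidates.push(w);
        }
    }
}
```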

Finally, we implemented a limited version of the Bottleneck Steiner Distance [16] test, which states that an edge (u, v) can be removed from the graph if there is a (bottleneck) path \(P_{uv}\) between u and v such that (1) \(P_{uv}\) excludes edge (u, v) and (2) every subpath of \(P_{uv}\) without an internal terminal has length at most \( cost (u,v)\). Identifying all removable edges can be expensive, so we consider only a couple of common (and cheap) special cases, which can be seen as restricted versions of existing algorithms [12, 40].

The first case is simple. If the combined degree of u and v is small (up to 20), we scan both vertices looking for a common neighbor x such that \( cost (u,x) + cost (x,v) \le cost (u,v)\); we also check if a parallel (u, v) edge exists. This heuristic helps with some grid-like graphs, such as VLSI instances.

The second case may find more elaborate paths. We first use a modified version of Dijkstra’s algorithm [13] to build the Voronoi diagram associated with the terminals of G [34, 39, 52]. For each vertex v, this structure defines \( vb (v)\) (the closest terminal to v), \( vd (v)\) (the distance from that terminal), and \( vp (v)\) (the parent edge on the path from \( vb (v)\)). By looking at boundary edges of this Voronoi diagram, we compute the MST of the distance network of G [34, 39, 52]. Let \(E_t\) be the set of edges of this MST, each corresponding to a path in the original graph. Let \(E_f \subseteq E\) be the set of free edges, i.e., original edges that neither are parents in the Voronoi diagram nor belong to a path in \(E_t\).

We try to eliminate edges in \(E_f\), using \(E_t\) and the Voronoi diagram to find bottleneck paths. To do so efficiently, we first sort the union of \(E_t\) and \(E_f\) in increasing order of cost, breaking ties in favor of entries in \(E_t\). We also initialize a union-find data structure [50] with |V| disjoint sets. We then traverse this list in order. Consider edge \(e_i = (u,v)\). If \(e_i \in E_t\), we join the groups of its (terminal) endpoints u and v in the union-find data structure. Otherwise (if \(e_i \in E_f\)) we delete \(e_i\) if all of the following three conditions hold: (1) \( vb (u)\) and \( vb (v)\) belong to the same component in the union-find data structure; (2) \( vd (u) \le cost (e_i)\); and (3) \( vd (v) \le cost (e_i)\). This Voronoi-based test is particularly effective on random and dense Euclidean graphs.
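A sketch of this sweep is given below. It assumes the entries of \(E_t \cup E_f\) have already been collected (tree entries connect two terminals; free entries are original edges) and that the Voronoi arrays vb and vd are available; the data layout and names are ours.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Minimal union-find for the Voronoi-based elimination test (Sect. 4.2).
struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int a, int b) { parent[find(a)] = find(b); }
};

struct Entry { int id, u, v; double cost; bool isTreeEdge; };  // isTreeEdge: entry of E_t

// Returns the ids of free edges that pass the elimination test.
// vb[x] is the Voronoi base (closest terminal) of x and vd[x] the distance to it.
std::vector<int> voronoiTest(int n, std::vector<Entry> entries,  // copied, so caller's order is kept
                             const std::vector<int>& vb, const std::vector<double>& vd) {
    std::sort(entries.begin(), entries.end(), [](const Entry& a, const Entry& b) {
        if (a.cost != b.cost) return a.cost < b.cost;
        return a.isTreeEdge && !b.isTreeEdge;           // break ties in favor of E_t
    });
    UnionFind uf(n);
    std::vector<int> removable;
    for (const Entry& e : entries) {
        if (e.isTreeEdge) {
            uf.unite(e.u, e.v);                         // join the two terminals it links
        } else if (uf.find(vb[e.u]) == uf.find(vb[e.v]) &&
                   vd[e.u] <= e.cost && vd[e.v] <= e.cost) {
            removable.push_back(e.id);                  // free edge dominated by a bottleneck path
        }
    }
    return removable;
}
```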

4.3 Two-phase multistart

Even with all the measures we take to increase diversification, our multistart algorithm can still be strongly influenced by the first few solutions it finds, since they will be heavily used during cascaded combination. If the algorithm is unlucky in the choice of the first few solutions, it may be unable to escape a low-quality local optimum.

When the number M of iterations is large (in the thousands), we obtain more consistent results with a two-phase version of our algorithm. The first phase independently runs the standard algorithm four times, with M / 8 iterations each. The second phase runs the standard algorithm with M / 2 iterations, but starting from a pool of elite solutions obtained from the union of the four pools created in the first phase. Note that the combined size of all pools in the first phase is \(4 \sqrt{M/16} = \sqrt{M}\), while the second-phase pool can hold only \(\sqrt{M/4} = \sqrt{M}/2\) solutions. We thus take the elite solutions from the first phase in random order and try to add them to the (initially empty) final pool, using the criteria outlined in Sect. 2.4 to decide which solutions are kept.

We call this two-phase version of our multistart algorithm MS2. Since it has multiple independent starts, it is less likely than the single-phase version of our algorithm (which we call MS) to be adversely influenced by a particularly bad set of initial solutions.

4.4 Time-bounded algorithm

The standard version of our multistart algorithm (as described in Sect. 2) is parameterized by the number of iterations. In some real-world applications, one would like to have instead a time-bounded algorithm, whose goal is to find the best possible solution within a time budget \(\tau \). (This was in fact how the software competition from the 11th DIMACS Implementation Challenge [28] was set up.) Recall, however, that our multistart algorithm sets the size of the elite pool to be proportional to the square root of the number of iterations, which is unknown in the time-bounded setup. A solution would be to fix the size of the elite pool to some constant, and just stop the algorithm when it runs out of time.

We get better results by computing an initial estimate on the number of iterations in the obvious way: we run a single iteration of the multistart algorithm (i.e., constructive algorithm (SPH) followed by local search). Here we use the actual edge weights, without perturbation. Let its running time be \(\tau _1\).

We then run the standard multistart algorithm described in Sect. 2, using an (initially empty) elite pool of size \(\sqrt{M'}\). Here \(M' = \tau / (2.5 \tau _1)\) is an estimate on the number of iterations the algorithm can run within the time budget \(\tau \), meaning that we estimate that a standard iteration of the algorithm will take about 2.5 times the first one, which does not use combinations. This constant (2.5) was found empirically, and the algorithm is not too sensitive to it. Note that the algorithm actually stops when the time limit is reached, so its number of iterations may be lower or higher than \(M'\).

The same approach can be used to make the two-phase MS2 algorithm time-bounded: we run a single iteration to get an estimate \(M'\) on the total number of iterations, then run MS2 parameterized by \(M'\). In fact, we can even have an adaptive algorithm that picks either MS or MS2 for the main runs depending on the value of \(M'\). As Sect. 5 will show, a good strategy is to use MS for \(M' \le 2048\) and MS2 otherwise.

5 Experiments

We implemented all algorithms in C++ and compiled them using Visual Studio 2013 (optimizing for speed). We ran our main experiments on a machine with two 3.33 GHz Intel Xeon X5680 processors running Windows 2008R2 Server with 96 GB of DDR3-1333 RAM. This machine scores 388.914 according to the benchmark code made available for the 11th DIMACS Implementation Challenge (http://dimacs11.zib.de/downloads.html). All runs we report are sequential, except those of the Guarded Multistart algorithm, which use two cores. In every case, we report total CPU times, i.e., the sum of the times spent by each CPU involved in the computation.

Our main experiments evaluate all 1437 instances available by August 1, 2014 from the 11th DIMACS Implementation Challenge [28] (accessible from http://dimacs11.zib.de/downloads.html) and report error rates relative to the best solutions published by then (http://dimacs11.zib.de/instances/bounds20140801.txt). Section 5.3 discusses recent developments.

For ease of exposition, we group the original series into classes, as shown in Table 1 (augmented from [52]). More detailed information about the dimensions of the instances in each series can be found in Table 12, in the Appendix. Most instances are available from the SteinLib [32], with two exceptions: cph14 (graphs obtained from rectilinear problems [27]) and vienna (road networks from telecommunication applications [33]).

Table 1 Classes of instances tested in our main experiments

For experiments on the entire set of benchmark instances, we use a single (identical) random seed for each of the 1437 instances, since they are already quite numerous. Experiments restricted to a subset of the benchmark use multiple random seeds for each instance. These cases will be noted explicitly.

5.1 Multistart

In our first experiment, we ran the default version of our multistart algorithm on all instances from the DIMACS Challenge [28]. Recall that this version is not guarded (no branch-and-bound) and uses the lightweight preprocessing routine. We vary the number of multistart iterations from 1 to 256 (by factors of 4). Table 2 shows average running times (in seconds) and percent errors relative to the best known solutions (http://dimacs11.zib.de/instances/bounds20140801.txt). Error rates are also shown in Fig. 1. Results aggregated by series can be found in Tables 13 and 14, in the Appendix. To improve readability, errors smaller than 0.001% are shown multiplied by 1000 and in brackets. Such small errors are particularly common for wrp instances as a side effect of a reduction from the Group Steiner Tree problem, but occasionally appear for other classes (including euclidean and vienna).

Table 2 Multistart algorithm: average running time in seconds and average percent error relative to the best known solution, with number of iterations (it.) varying from 1 to 256
Fig. 1 Multistart algorithm: average error rates as the total number of iterations varies

Our algorithm is quite effective. With as few as 16 multistart iterations, the average error rate is below 0.5% on all classes except hard, which consists of adversarial synthetic instances. With 256 iterations, the average error falls below 0.5% for hard, and below 0.06% for all other classes. Average running times are still quite reasonable: the only outlier is vienna, which has much bigger graphs on average (see Table 12, in the Appendix). These instances are relatively easy: a single iteration is enough to find solutions that are on average within 0.1% of the best previously known bounds.

Fig. 2 Solution quality of the multistart algorithm (on 30 representative instances) as a function of the maximum number \(\phi \) of failures within cascaded combination. Mean solution values are shown relative to a single iteration with no cascaded combination

One reason for the success of our approach is its use of cascaded combination. To confirm it is important, we tested six variants of our algorithm on 30 nontrivial representative instances chosen for the DIMACS Challenge competition. (The set is available at http://dimacs11.zib.de/contest/instances/SPG.tgz.) The variants differ only on the maximum number \(\phi \) of failures allowed during cascaded combination. Figure 2 summarizes the results. There is one curve for each value of \(\phi \) (0 to 5, with 0 meaning “no combination”), with different numbers of iterations (\(1, 2, 4, \ldots , 256\)). Each point is the mean of 5 runs with different random seeds. The x-axis shows the (geometric) mean running time, whereas the y-axis represents the (geometric) mean solution value for all 30 instances, normalized so that one iteration with \(\phi =0\) has value 1.00 (the actual mean solution for \(\phi =0\) was 36,871.1873).

The figure shows that, even though cascaded combination significantly increases the time per iteration, it leads to much better solutions within the same allotted time. We use \(\phi = 3\) by default, but any small positive \(\phi \) also works well—there is very little difference between the curves with \(\phi > 0\).

In contrast to cascaded combination, other parameters we considered have only minor effect on solution quality. For instance, recall that each iteration picks either vertex-based or edge-based perturbation with equal probability. With 256 iterations, our default algorithm found a mean solution value (on the same set of 30 instances, with nine runs per instance) of 35,822.2. Using only edge-based perturbations increases this to 35,827.0; vertex-based perturbations increase it to 35,832.9. Running times are essentially the same in all three cases, and the difference in quality is negligible.

Similarly, using dampened perturbations (i.e., running up to three local search passes on the perturbed graph, with some decay) does improve quality, but only slightly. On the same set of instances (and nine seeds per instance), the mean solution value without dampened perturbations is 35,830.2, which is marginally higher than the baseline of 35,822.2. Dampened perturbations typically increase running times by less than 10%.

Table 3 Guarded multistart: average CPU time in seconds and average percent error relative to the best known solutions, with the number of iterations (it.) set to 2, 8, 32, or 128

5.1.1 Guarded multistart

We now consider the Guarded Multistart (GMS) algorithm from Sect. 4.1, which runs the standard multistart in parallel with a branch-and-bound algorithm. We tested GMS with 2, 8, 32, and 128 multistart iterations. Table 3 reports the error rates and average running times (see also Tables 15 and 16, in the Appendix). For consistency, we report total CPU times; since GMS uses two cores, the actual wall-clock time is lower.

The CPU time spent by GMS with i iterations cannot be (by design) much worse than that of the unguarded algorithm (MS) with 2i iterations. A comparison of Tables 2 and 3 shows that running times are indeed similar for several classes, such as hard, vlsi, and wrp. But GMS can stop much sooner on “easy” instances, when its branch-and-bound portion can prove the optimality of the incumbent. For random, incidence, and especially euclidean, GMS becomes significantly faster than MS as the number of allowed iterations increases.

The relative solution quality of the two variants also depends on the type of instance. For classes such as hard and wrp, the guarded variant finds slightly worse results (within the same CPU time), since most of the useful computation is done by the multistart portion of the algorithm, which has fewer iterations to work with. For a few classes (such as incidence), the guarded version actually finds much better solutions, thanks to the branch-and-bound portion of the algorithm. In most cases, the difference is quite small. On balance, the guarded version is more robust and should be used unless there is reason to believe the branch-and-bound portion will be ineffective.

5.1.2 Comparison with Polzin and Vahdati Daneshmand

Table 4 compares Guarded Multistart (GMS) against the three state-of-the-art heuristics by Polzin and Vahdati Daneshmand [12, 40]: prune, ascend&prune, and slack-prune. As the authors report [12, 40], these heuristics dominate the multistart approach by Ribeiro et al. [45].

Since the three algorithms have very different time/quality tradeoffs, we report results for GMS with 1, 8, 32, and 128 iterations. For consistency with how the results are reported in [12, 40], Table 4 shows the lin series separately from the remaining vlsi instances. Running times for their algorithms are scaled (divided by 6.12) to roughly match our machine.

Table 4 Comparison of various heuristics in terms of percent errors (multiplied by \(10^3\) for wrp, in brackets) and average running times in seconds

The table shows that the algorithms have different profiles. Both prune and ascend&prune are quite fast, with running times comparable to GMS1, which runs a single multistart iteration. They provide much better solutions on series d and e, whereas GMS1 is significantly better on 1r, i080, i160, i320, and x. Error rates are usually within a factor of two of one another otherwise.

The slack-prune algorithm usually finds better solutions, but takes much longer; it should then be compared with GMS with a few dozen iterations. The slack-prune approach is superior when advanced reduction techniques (exploiting small duality gaps) work very well: 2r, e, and vlsi are good examples. When these techniques are less effective, our algorithm dominates: see es10000fst, i080, i160, i320, mc, and wrp, for example. Performance is comparable for several cases in between, such as es1000fst, lin, or tspfst.

Unfortunately, Polzin and Vahdati Daneshmand [12, 40] only report results for series in which all optimal solutions are known, which consist mostly of small inputs or instances for which reduction techniques work well. It is encouraging that GMS is competitive even in the absence of very hard instances. This confirms that, although reduction and dual-based techniques are powerful, primal heuristics based on local search (the core of our approach) are essential for a truly robust algorithm.

5.1.3 Comparison with Ribeiro et al.

For completeness, we also compare our algorithm to the hybrid multistart heuristic of Ribeiro et al. [45], on which our approach builds. Both codes were compiled with g++ and full optimization and run on an Intel Core i7-6700K CPU with 24 GB of DDR4-2133 RAM; this machine scores 486.387 according to the benchmark routine from the 11th DIMACS Implementation Challenge. For this analysis, we consider the reference set of 30 representative instances from the DIMACS Challenge. For a fair comparison, neither algorithm uses preprocessing.

We first compare (in Table 5) their performance with a single multistart iteration (i.e., a single constructive heuristic followed by local search until a local optimum is reached). It shows that the local search used by Ribeiro et al. [45] tends to find better results than the one we use (from Uchoa and Werneck [52]), since it can exploit opportunistic moves. Average solution values are 1.1% lower for Ribeiro et al.’s local search, and its advantage is clearer on instances with small solution values, which tend to have more ties. Due to its worse asymptotics, however, their approach is prohibitively slow for a robust general-purpose solver. We are hundreds of times faster on the larger instances tested (with a few tens of thousands of vertices), and never slower by much more than a factor of two. These results confirm the findings by Uchoa and Werneck [52].

Table 5 Comparison between a single multistart iteration (constructive algorithm followed by local search) by the heuristic of Ribeiro et al. [45] (RUW02) and our algorithm (MS) on a set of 30 representative instances
Table 6 Comparison between the heuristic of Ribeiro et al. [45] (RUW02) and our algorithm (MS) on a set of 30 representative instances

Table 6 compares the same two algorithms, but now using 128 multistart iterations and a pool of 10 initial solutions, the setup recommended by Ribeiro et al. [45]. Once again, we used nine random seeds for almost all runs. For the six slowest instances (G106ac1, I064ac1, alue7080, es10000fst01, fnl4461fst, s5), we ran Ribeiro et al.’s code only once, since each run took from 4 h to more than a week (in fact, s5 failed to finish within 10 days). Although there are some cases in which Ribeiro et al. have marginally better results (notably some of the cc instances), our approach finds solutions that are 0.2% better on average, overcoming most of the limitations of its more strict local search. Its greatest advantage, however, is speed. On larger instances, it is orders of magnitude faster and still finds better solutions.

5.1.4 Long runs

To test the scalability of our algorithm, we consider the 41 SteinLib instances that had no published proof of optimality by August 1, 2014. We consider three versions of our algorithm. The baseline is MS, the multistart algorithm described in Sect. 2. MS2 is the two-phase version of MS, as described in Sect. 4.3. Finally, MSK augments plain MS by also using the key-vertex insertion local search implemented as calls to the SPH algorithm (as proposed in [37]); it is very expensive, but can potentially find better results. None of these variants is guarded, since branch-and-bound is ineffective on hard instances. To find near-optimal solutions, we test up to 262,144 (\(2^{18}\)) iterations (1024 for MSK). Since these experiments were very time-consuming, we could only afford to test each set of parameters with a single random seed; we used the same seed for all runs.

Table 7 Results for three MS variants on 41 open SteinLib instances: (geometric) mean time in seconds and average percent error relative to the best known solution
Fig. 3 Performance of three multistart variants on 41 open SteinLib instances. Errors relative to best known solutions by August 1, 2014. Times are geometric means of all runs

For each variant (and number of iterations), Table 7 shows the (geometric) mean time in seconds and average error with respect to the best published solutions. Figure 3 represents the same data visually.

With 1024 iterations, MSK finds better results than other variants, but is much slower: increasing the number of iterations of either MS or MS2 to 16,384 is cheaper and leads to better solutions. Unsurprisingly, MS and MS2 have comparable running times for the same number of iterations. As argued in Sect. 4.3, increasing the number of iterations is more effective for MS2 than for MS. In fact, the average solution quality for MS does not even improve when we increase the number of iterations from 65,536 to 262,144. By starting from four independent sets of solutions, MS2 is less likely to be confined to a particularly bad region of the search space.

Considering all 13 runs from Table 7 (five runs each for MS and MS2, and three for MSK), there were only eight cases (out of 41) for which we could not at least match the best bound published by August 1, 2014. (See Table 17, in the Appendix.) In 19 cases, we found a strictly better solution. Most of these were found by MS2 (see Table 18, in the Appendix); MSK was better only for cc11-2u and cc12-2u.

5.1.5 Time-bounded algorithms

We now consider the time-bounded versions of our multistart algorithms, as described in Sect. 4.4. We ran three variants: single-phase always runs MS; two-phase always runs MS2; and adaptive runs MS if the expected number of iterations is at most 2048 (and MS2 otherwise). In this experiment, we consider the same set of 30 representative instances (see Figs. 2 and 4), each with seven time bounds ranging from 2 s to 2 h.
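
The adaptive rule itself is simple; the minimal sketch below shows one way to apply it. Only the 2048-iteration threshold comes from the text above; the per-iteration time estimate (assumed to come from a quick calibration) and all names are hypothetical.

```python
def pick_variant(time_budget_sec, est_sec_per_iteration, threshold=2048):
    """Pick the multistart variant for a time-bounded run: single-phase (MS)
    when only a few iterations are expected to fit in the budget, two-phase
    (MS2) otherwise. Illustrative sketch, not the paper's implementation."""
    expected_iterations = time_budget_sec / est_sec_per_iteration
    return "MS" if expected_iterations <= threshold else "MS2"

# Example: with ~0.5 s per iteration, a 2-s budget yields MS (4 iterations),
# while a 2-h budget (7200 s, ~14,400 iterations) yields MS2.
assert pick_variant(2.0, 0.5) == "MS"
assert pick_variant(7200.0, 0.5) == "MS2"
```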

Fig. 4

Relative performance (on a set of 30 representative instances) of time-bounded versions of our multistart algorithm: single-phase multistart (MS), two-phase multistart (MS2), and adaptive (with threshold 2048). Mean solution quality is relative to the single-phase version with a 2-s time bound

Figure 4 shows how the (geometric) mean solution value improves with time for all three variants. For clarity, solution values are given relative to the single-phase algorithm with a 2-s time bound (for which the actual mean solution value was 36,195.018). As expected, all three algorithms scale well with time. Moreover, the adaptive algorithm is effective at picking a good strategy in all regimes (single-phase for short runs and two-phase for longer ones). Within 2 h, the adaptive algorithm achieves a mean solution value of 35,715.321, an improvement of more than 1.3% over the baseline.

5.2 Branch-and-bound

We now consider the effectiveness of our branch-and-bound procedure as a standalone exact algorithm. Unlike our heuristics, it is not robust. There are some graphs (such as large vlsi or fst instances) for which it will not produce a good solution in reasonable time, let alone prove its optimality. On large instances with small duality gaps, exact algorithms based on dual ascent are generally not competitive with those using linear programming.

Table 8 Performance of our branch-and-bound algorithm on select hard instances, in comparison with the exact algorithm by Polzin and Vahdati Daneshmand [12, 40]

We thus focus on small instances with large duality gaps. Series i080, i160, and i320 have been solved to optimality [31, 38], as have 95 of 100 instances from i640 [12, 40] (all but i640-311, i640-312, i640-313, i640-314, and i640-315). We also consider all solved instances from the bip and cc series. Our method could solve every such instance in less than 2 h; most took fractions of a second.

For perspective, Table 8 compares our exact algorithm against a state-of-the-art approach (as of August 1, 2014) for such instances, due to Polzin and Vahdati Daneshmand [12, 40]. The table has all instances they solved from series i160, i320, i640, cc, and bip. For each series, we show the number of instances tested, our average time in seconds, the average time of their method (divided by 6.12 to match our machine), and the ratio between them. The remaining columns use geometric means instead of averages.

Our method is quite competitive on these instances. Running times are comparable for bip instances, and we are faster on the other graph classes. The relative difference is larger for arithmetic averages than for geometric means, indicating that our advantage is greater on harder instances (which have a more pronounced effect on the average). This confirms that the engineering effort outlined in Sect. 3 pays off.
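
To see why the two summaries diverge, consider the toy numbers below (entirely hypothetical; they are not taken from Table 8): the arithmetic average is dominated by the hardest instance, while the geometric mean weighs every instance equally on a multiplicative scale.

```python
import math

# Hypothetical per-instance running times in seconds (not data from Table 8).
times_ours   = [0.05, 0.2, 600.0]
times_theirs = [0.10, 0.3, 6000.0]   # 2x, 1.5x, and 10x slower, respectively

def amean(xs):
    return sum(xs) / len(xs)

def gmean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# The ratio of arithmetic averages (~10) mostly reflects the hardest
# instance; the ratio of geometric means (~3.1) treats all three equally.
print(amean(times_theirs) / amean(times_ours))   # ~10.0
print(gmean(times_theirs) / gmean(times_ours))   # ~3.1
```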

We stress, however, that Table 8 contains only a very small (and not particularly representative) subset of all instances tested. Because Polzin and Vahdati Daneshmand use linear programming and advanced reduction techniques, there are several classes of instances (such as vlsi) that they can easily solve but we cannot. This is true for other algorithms as well [25].

Table 9 Branch-and-bound comparison with the B&A algorithm from Poggi de Aragão et al. [38] and Werneck [56] on i320 instances

For completeness, we also compare our algorithm with the branch-and-ascent implementation proposed by Poggi de Aragão et al. [38] (described in further detail in [56]). This method (which we call B&A) can be seen as a precursor of our approach. We limit our tests to i320, the series that was first solved to optimality by B&A itself [38]. This series has 100 random graphs with 320 vertices and adversarial (incidence) costs. The instances are divided into 20 groups of five; instances in each group are generated with the same number of terminals and edges, but with different random seeds.

For conciseness, Table 9 reports results only for the first instance in each group (see Table 21 in the Appendix for full results). To solve all 100 instances, B&A needs 767.8 s on average; the (geometric) mean time is 770 ms. Our new algorithm takes an average time of 11.7 s and a mean time of 55 ms. (Both codes were evaluated using the same compiler and machine as in Tables 5 and 6.) The number of branch-and-bound nodes visited is comparable: B&A visits 27,913 nodes on average, while we visit 19,976 (the geometric means are 24.4 and 51.3, respectively). This indicates that most of our advantage comes from processing each node much faster; the bounds themselves are not stronger. For easy instances, in particular, we actually tend to visit more nodes, partly because of different accounting: B&A may run dual ascent multiple times on the same node, as long as each run reduces the number of edges by at least 1% (in such cases, our algorithm would create a new child node instead). For harder instances, we often visit fewer nodes.

Finally, we note that our branch-and-bound algorithm could prove that 35,535 is the optimal solution for i640-313, a formerly open incidence instance. On a machine about 28% faster than the one we used for our main experiments, it took 15.16 days and visited 7.31 billion branch-and-bound nodes. For this particular run, we gave the algorithm 35,536 (one unit above the optimum) as the initial upper bound and used strong branching.

5.3 Recent developments

So far, we have considered only instances and results available before August 1, 2014; these roughly correspond to the state of the art before the final phase of the 11th DIMACS Implementation Challenge. This section considers subsequent developments, which were motivated by the challenge itself.

Algorithms First, Polzin and Vahdati Daneshmand reran their exact algorithm on a newer machine with different sets of parameters and an up-to-date version of CPLEX and made the results available on the challenge web page [42]. Although the results are mostly consistent with their previous publications [12, 40], the additional tuning has made the algorithm more competitive for some “hard” graph classes—notably those in Table 8. For series i160, i320, i640, and bip, their (scaled) average running times in seconds are now 0.08, 65.2, 169.3, and 389.4. These improvements (of at least a factor of three) bring their algorithm closer to ours. For cc, however, scaled average times are only slightly better (4890 instead of 5385), which makes our method still more than 20 times faster. In addition, their algorithm can solve all vienna instances to optimality within an hour or so.

Another contribution to the challenge was the SCIP-Jack algorithm by Gamrath, Koch, Maher, Rehfeldt, and Shinano [22] (see also [23]), which can be seen as an updated version of the MIP-based work by Koch and Martin [31], made massively parallel using the SCIP [1] framework. On instances with small duality gaps, it is essentially dominated by the work of Polzin and Vahdati Daneshmand, which uses more modern reduction techniques. For small, hard instances, however, running on hundreds of cores allowed them to improve the best known bounds for several instances. Other notable contributions were the algorithms of Althaus and Blumenstock [2] and of Biazzo, Braunstein, and Zecchina [7]. The latter works particularly well on small adversarial instances, notably the hard class. For VLSI (and other) instances with a small number of terminals, Hougardy, Silvanus, and Vygen [25] (see also [26]) presented an exact algorithm based on dynamic programming that can significantly outperform any other approach. None of these algorithms, however, is particularly robust: although they slightly outperform our method on their core classes, they are much worse on others (often being more than 10% off or failing to produce a solution in reasonable time).

The most robust contribution to the challenge (besides ours) was the algorithm of Fischetti, Leitner, Ljubic, Luipersbeck, Monaci, Resch, Salvagnin, and Sinnl [19] (see also [20]), which combines several techniques. Besides including the local search implementations proposed by Uchoa and Werneck [52], their algorithm relies heavily on mathematical programming techniques, such as using local branching to search large neighborhoods or to fix almost-feasible solutions produced by a set covering heuristic. Depending on some characteristics of the input instance, their algorithm decides which techniques and strategies to use. In particular, on instances with uniform or near-uniform costs, a new lightweight integer programming formulation based on node separators can replace the usual directed cut formulation [58]. This works extremely well on some hard instances, especially those from series hc and bip. On more general instances, however, they do not outperform Polzin and Vahdati Daneshmand [12, 40]. One advantage of their approach relative to ours is that it can also handle other variants of the Steiner problem (such as prize-collecting).

With these developments, the bounds for several instances have been improved by other submissions to the 11th DIMACS Challenge [28]. Together, Gamrath et al. [22] and Fischetti et al. [19] improved the bounds that were best known by August 1, 2014 for 25 of the 41 open SteinLib instances. Compared against these improved bounds, our algorithm is still strictly better on 9 instances and matches another 14.

The challenge included a contest comparing long (2-h) runs of the algorithms above on the set of 30 representative instances we considered in Figs. 2 and 4. The best performers were the algorithm of Fischetti et al. and our own (our entry was the time-bounded single-phase multistart approach); the algorithm of Biazzo et al. [7] had the best performance on some hard instances, but was less consistent overall. Despite being very different in nature, the top two contenders had very similar performance. If one simply considers mean solution quality, Fischetti et al. had a slight advantage; in a points-based system (in the style of Formula 1, with algorithms assigned points for each instance based on their relative rank), the slight advantage was ours. The algorithms were also evaluated on how quickly they found good solutions (rather than just the quality of the final solution), using the primal integral method [5]. Once again, our algorithm was worse in terms of the mean value but better in the points-based system. Since the points-based system rewards consistency, this confirms the robustness of our approach. Moreover, we note that the time limit (2 h) was rather generous: Fig. 4 shows that our algorithm can produce a reasonable solution within a couple of seconds even for large instances, as it does not have to solve a large linear program. Finally, although the adaptive version of our time-bounded algorithm outperforms our (single-phase) contest entry (see Fig. 4), it was developed only later, motivated by the results of the contest. We expect other algorithms to have improved since the challenge as well.
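
For readers unfamiliar with the measure, the primal integral [5] accumulates the primal gap of the incumbent solution over time, so algorithms that find good solutions early score better. The sketch below reflects our reading of that definition for a minimization problem with positive solution values; it is not the challenge's evaluation script, and all names are illustrative.

```python
def primal_integral(incumbents, best_known, time_limit):
    """Integrate the primal gap (a step function of time) from 0 to the time
    limit. `incumbents` is a time-sorted list of (time, value) pairs, one per
    improving solution; the gap is 1 before the first solution is found.
    Assumes minimization with positive solution values."""
    def gap(value):
        if value is None:          # no incumbent yet
            return 1.0
        return abs(value - best_known) / max(abs(value), abs(best_known))

    total, prev_time, current = 0.0, 0.0, None
    for t, value in incumbents:
        if t > time_limit:
            break
        total += gap(current) * (t - prev_time)
        prev_time, current = t, value
    total += gap(current) * (time_limit - prev_time)
    return total

# Example: a solver that reaches a 1%-off solution after 10 s and the best
# known value after 100 s, evaluated with a 7200-s limit.
print(primal_integral([(10.0, 101.0), (100.0, 100.0)], 100.0, 7200.0))
```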

Table 10 Results on open instances from the ccn series for MS (single-phase multistart) with 256 iterations and for MS2 (two-phase multistart) with 65,536 iterations

New instances We now discuss instances made available after August 1, 2014.

To highlight the strengths of their algorithm, Fischetti et al. [19] introduced the ccn series, created from the existing cc series [57] by setting all edge costs to one. Five of its instances can be solved to optimality by both Fischetti et al. and our branch-and-bound (although we are two orders of magnitude slower for cc6-3n, the hardest among those). For the remaining (open) instances, Table 10 shows the performance of two versions of our heuristic (MS with 256 iterations and MS2 with 65,536 iterations). Even the faster version is quite competitive: it finds a better bound on the largest instance and is never worse by more than two edges. With longer runs, our two-phase algorithm improves the best solutions found by Fischetti et al. [19] in five cases.

Table 11 reports the performance of our multistart algorithm on the efst class [29], which consists of graph instances generated by the GeoSteiner package [29, 55] from Euclidean inputs (each graph is the union of a small set of full Steiner trees). The class is divided into four series, depending on whether the terminals originate from TSPLib instances (tspefst) or from random points on the plane (r25kefst, r50kefst, r100kefst). For each series and number of iterations, the table shows the average running time (in seconds) as well as the percent error relative to the best known solution obtained by Juhl et al. [29], available at http://dimacs11.zib.de/instances/bounds-efst-20150324.txt.

Table 11 Multistart algorithm on EFST series: average running time in seconds and average percent error relative to the best known solutions, with the number of iterations varying from 1 to 256

Although we cannot match the quality of their specialized algorithm (which works on the original geometric instances rather than on their graph representations), a single iteration of our algorithm is enough to get within half a percent of the best known solution on average. With more iterations, the error decreases to below 0.2%. These graphs are quite sparse, with average degrees ranging from 2.4 (for random instances) to about 4.3 (for the very regular tsp instances). In contrast, the ratio of nonterminals to terminals, which is roughly 0.58 for random instances (the r*kefst series), reaches more than 240 for some tspefst instances (such as fl1400efst and p654efst). These high-ratio instances are the hardest for our method. That said, even the highest error we observed with 256 iterations (1.37% on instance u1432efst, with 98 nonterminals for each terminal) was still fairly small.

Finally, we consider a class of synthetic instances created to illustrate duality gaps for certain graph classes. Series gap-csd, gap-g, and gap-smc consist of very small instances, every one of which we can solve to optimality in less than 10 ms (by either preprocessing or branch-and-bound). For series gap-s, a single iteration of our multistart algorithm finds the optimal solutions of all five instances (their optimality was proven by [19]).

6 Conclusion

We presented a new heuristic approach for the Steiner problem in graphs, based on fast local searches, multistart with an evolutionary component, and fast combinatorial algorithms for finding lower bounds. Although the algorithm could be further improved, notably by incorporating more elaborate preprocessing techniques, it is already quite robust. For short runs, it is competitive with any previous approach on a wide variety of instance classes. Moreover, it is scalable: when given more time, it improved the best published solutions for several hard instances from the literature. Overall, our results show that primal heuristics can be an important component of robust solvers for the Steiner problem in graphs.