1 Introduction

When faced with a new, computationally hard problem to solve, researchers do not typically want to reinvent the wheel. Instead, it makes sense to draw on design ideas from existing high-performance solvers. Such an approach can be made systematic by designing a single, highly parameterized solver that incorporates these different ideas, and then identifying a parameter configuration that achieves good performance via an automatic algorithm configuration method [9, 15]. Indeed, many powerful configuration procedures have recently become available to meet this challenge [10, 11, 14, 20]. The types of solvers configured in this way can range from simple heuristic switching [30] to a complex combination of multiple algorithms [28]. While the result is often an algorithm with excellent performance characteristics, it can be difficult to understand such an algorithm, e.g., in terms of how similar (or dissimilar) it is to the existing solvers from which design ideas were drawn—a problem that has received little attention from the research community to date. This work seeks to address this gap. We propose a new metric for quantitatively assessing the similarity between configurations of highly parametric solvers, which computes the distance between two algorithm configurations in two steps. In the first step, the hierarchical structure of algorithm parameters is represented by a novel data structure called a concept DAG. In the second step, we estimate the similarity of two configurations as the transformation cost from one configuration to another, using concept DAGs.

In order to demonstrate the effectiveness of our approach, we investigate the configurations of SATenstein, a well-known, highly parameterized SLS solver. SATenstein has a rich and complex design space with 43 parameters, drawing design ideas from several existing solvers, and is one of the most complex SLS solvers in the literature. We show that visualizations based on transformation costs can provide useful insights into similarities and differences between solver configurations. In addition, we argue that this metric can help to suggest potential links between algorithm structure and algorithm performance.

To our knowledge, there is little previous work directly relevant to the problem of quantifying the similarity of algorithm configurations. Visualization techniques have been used previously to characterize the structure of instances of the well-known propositional satisfiability problem (SAT) [26]; we focus instead on algorithm design elements. Most similar to our work, Nikolić et al. [21] used the notion of edit distance to automatically quantify algorithm similarity. Our main innovation is to address hierarchies of conditional parameters by treating edits to lower-level parameters as less significant than edits to higher-level parameters. Conditional parameters are increasingly important as algorithm development shifts to rely on algorithm configuration tools and hence parameter spaces become richer and more complex; see, e.g., recent work on assessing parameter importance [13] and finding critical parameters [4].

The remainder of this paper is organized as follows. We present a high-level description of SATenstein in Sect. 2. Next, we describe concept DAGs (Sect. 3) and then present our experimental setup (Sect. 4). We describe our results on quantifying similarities between algorithm configurations in Sect. 5 and then conclude (Sect. 6).

2 SATenstein

In this section, we provide a short description of the overall design of SATenstein. A detailed description of SATenstein is given in [16]. As shown in the high-level algorithm outline, any instantiation of SATenstein proceeds as follows:

  1. Optionally execute B1, which performs search diversification.

  2. Execute either B2, B3 or B4, thus performing WalkSAT-based local search, dynamic local search or G\(^{2}\)WSAT-based local search, respectively.

  3. Optionally execute B5 to update data structures such as promising list, clause penalties, dynamically adaptable parameters or tabu attributes.

SATenstein consists of five building blocks and eight components, some of which are shared across different building blocks. It has 43 parameters in total. The choice of building block is encoded by several high-level categorical parameters, while the design strategies within each component are determined by a larger number of low-level parameters.

Algorithm outline: high-level structure of SATenstein (building blocks B1-B5).
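To make this control flow concrete, the sketch below shows how the building blocks might compose in code. It is a minimal illustration only: the function names, the parameter names useDiversification and searchCategory, and the state representation are all hypothetical stand-ins, not SATenstein's actual identifiers.

```python
# Hypothetical sketch of SATenstein's high-level control flow.
# The block functions are stubs standing in for the real building
# blocks B1-B5; only the composition logic mirrors the outline above.

def b1_diversify(state, params):   # B1: search diversification
    pass

def b2_walksat(state, params):     # B2: WalkSAT-based variable selection
    return 0

def b3_dls(state, params):         # B3: dynamic local search
    return 0

def b4_g2wsat(state, params):      # B4: G2WSAT-based variable selection
    return 0

def b5_update(state, params):      # B5: promising list, clause penalties,
    pass                           #     adaptive parameters, tabu attributes

def satenstein_step(state, params):
    """One search step, dispatched on high-level categorical parameters."""
    if params.get("useDiversification"):           # optional block B1
        b1_diversify(state, params)
    pick = {"walksat": b2_walksat, "dls": b3_dls,
            "g2wsat": b4_g2wsat}[params["searchCategory"]]
    var = pick(state, params)                      # exactly one of B2/B3/B4
    state["assignment"][var] = not state["assignment"][var]  # flip variable
    if params.get("useUpdates"):                   # optional block B5
        b5_update(state, params)
```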

3 Concept DAGs

We now introduce concept DAGs, a novel data structure for representing algorithm configurations that preserves the hierarchical structure of parameter dependencies. Our notion of a concept DAG is based on that of a concept tree [31]. We work with a DAG-based data structure because a parameter may have more than one parent, such that the child is active only if its parents take certain values (e.g., SATenstein’s noise parameter phi is only activated when both useAdaptiveMechanism and singleClauseAsNeighbor are turned on). We then define four operators whose repeated application can be used to map between arbitrary concept DAGs, and assign each operator a cost. To compare two parameter configurations, we first represent them using concept DAGs and then define their similarity as the minimal total cost of transforming one DAG into the other.

A concept DAG is a six-tuple \( G =(V, E, L^V, R, D, M)\), where V is a set of nodes, E is a set of directed edges between the nodes in V such that (V, E) is an acyclic graph, \(L^V\) is a set of lexicons (terms) for concepts used as node labels, R is a distinguished node called the root, D is the domain of discourse (i.e., the set of all possible node labels), and M is an injective mapping from V to \(L^V\) that assigns a unique label to every node. A parameter configuration can be expressed as a concept DAG in which each node in V represents a parameter, and each directed edge in E represents the conditional dependence relationship between two parameters. \(L^V\) is the set of parameter values used in a particular configuration (i.e., a set containing exactly one value from the domain of each parameter), D is the union of the domains of all parameters, and M specifies the value assigned to each parameter \(v \in V\) in the given configuration. We add an artificial root node R, which connects to all parameter nodes that do not have any parent; we refer to these parameters as top-level parameters.
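As an illustration, such a configuration can be captured in a small data structure along the following lines. This is a sketch only, assuming a set-based encoding of nodes and edges; the class, field and method names are ours, not those of our actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptDAG:
    """A parameter configuration as a concept DAG (an illustrative sketch).

    nodes  : the parameters V, plus the artificial root "ROOT" (R)
    edges  : directed conditional dependencies E; (u, v) means v is active
             only when its parent u takes certain values
    labels : the mapping M, i.e., the value from L^V assigned to each
             parameter in this particular configuration
    """
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)
    labels: dict = field(default_factory=dict)

    def children(self, u):
        return {v for (p, v) in self.edges if p == u}

    def depth(self, v):
        """Length of the longest root-to-v path; a node may have several
        parents, so we take the maximum over all of them."""
        parents = {p for (p, c) in self.edges if c == v}
        return 0 if not parents else 1 + max(self.depth(p) for p in parents)

    def descendants(self, v):
        out = set()
        for c in self.children(v):
            out |= {c} | self.descendants(c)
        return out
```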

We can transform one concept DAG into another by a series of delete, insert, relabel and move operations, each of which has an associated cost. For measuring the degree of similarity between two algorithm configurations, we first express them as concept DAGs, \(DAG_1\) and \(DAG_2\). We define the distance between these DAGs as the minimal total cost required for transforming \(DAG_1\) into \(DAG_2\). Obviously, the distance between two identical configurations is 0.

The parameters with the biggest impact on an algorithm’s execution path are likely to appear high in the DAG (i.e., to be conditional upon few or no other parameters) and/or to turn on a complex mechanism (i.e., to have many parameters conditional upon them). Therefore, we say that the importance of a parameter v is a function of its depth (the length of the longest path from the root R of the given concept DAG to v) and the total number of other parameters conditional on it. To capture this definition of importance, we define the cost of each of the four DAG-transforming operations as follows.  

Deletion cost:

\(C(\text {delete}(v))=\frac{1}{|V|} \cdot (height(DAG)-depth(v)+1+|DE(v)|)\), where height(DAG) is the height of the DAG, depth(v) is the depth of node v and DE(v) is the set of descendants of node v. This captures the idea that it is more costly to delete top-level parameters and parameters that (de-)activate complex mechanisms.

Insertion cost:

\(C(\text {insert}(u,v))=\frac{1}{|V|} \cdot (height(DAG)-depth(u)+1+|DE(v)|)\), where DE(v) is the set of descendants of v after the insertion, and u is the node under which v is inserted.

Moving cost:

\(C(\text {move}(u,v))= \frac{|V|-2}{2 \cdot |V|} \cdot [C(\text {delete}(v))+C(\text {insert}(u,v))]\), where \(|V|>2\).

Relabelling cost:

\(C(\text {relabel}(v, l^{v}, l^{v^*}))=[C(\text {delete}(v))+C(\text {insert}(u,v))] \cdot s(l^{v}, l^{v^*})\), where u is the parent node of v and \(s(l^{v}, l^{v^*})\) is a measure of the distance between the old label, \(l^v\), and the new label, \(l^{v^*}\), of node v. For parameters with continuous domains, \(s(l^{v}, l^{v^*})=|l^{v} - l^{v^*}|\). For parameters whose domains are finite, ordinal and discrete sets \(\{l^{v_1}, l^{v_2}, \ldots , l^{v_k}\}\), \(s(l^{v}, l^{v^*})=\text {abs}(v-v^*)/(k-1)\), where \(\text {abs}(v-v^*)\) is the number of positions separating \(l^{v}\) and \(l^{v^*}\) in this ordering. For categorical parameters, \(s(l^{v}, l^{v^*}) = 0\) if \(l^{v}= l^{v^*}\) and 1 otherwise. Since ParamILS, the algorithm configurator we used, requires discrete parameter domains, all our parameters are either categorical or have finite, ordinal and discrete domains; therefore, \(s(l^{v}, l^{v^*})\) always lies in [0, 1].
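Given such a representation, these cost definitions translate directly into code. The following sketch builds on the hypothetical ConceptDAG class above and is one possible implementation, not the one used in our experiments.

```python
def height(dag):
    """Height of the DAG: the maximum node depth."""
    return max((dag.depth(v) for v in dag.nodes), default=0)

def delete_cost(dag, v):
    # Deleting shallow nodes (near the root) or nodes gating complex
    # mechanisms (many descendants) is more expensive.
    return (height(dag) - dag.depth(v) + 1 + len(dag.descendants(v))) / len(dag.nodes)

def insert_cost(dag, u, v):
    # v is inserted under u; dag is the DAG after insertion, so that
    # descendants(v) reflects the inserted subtree.
    return (height(dag) - dag.depth(u) + 1 + len(dag.descendants(v))) / len(dag.nodes)

def move_cost(dag, u, v):
    n = len(dag.nodes)          # defined only for |V| > 2
    return (n - 2) / (2 * n) * (delete_cost(dag, v) + insert_cost(dag, u, v))

def relabel_cost(dag, u, v, s):
    # s is the label distance s(l^v, l^{v*}), which lies in [0, 1] for the
    # discrete and categorical domains used in our experiments.
    return (delete_cost(dag, v) + insert_cost(dag, u, v)) * s

def ordinal_label_distance(domain, old, new):
    """s for a finite, ordered domain: position difference / (k - 1)."""
    return abs(domain.index(old) - domain.index(new)) / (len(domain) - 1)
```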

 

In our implementation, we did not use the move operator, because the structure of SATenstein’s parameter space does not provide much scope for its application. Also, instead of finding the minimal transformation cost over all sequences of delete and insert operations (a potentially expensive computation), we used an easily implemented yet effective stochastic local search procedure to produce upper bounds. This procedure is based on randomised iterative first improvement (with a random walk probability of 0.01); starting from a uniformly random permutation of the delete and insert operations required to transform the first concept DAG into the second, it swaps two operations in each step. The search process is restarted whenever no improvement in the best transformation cost seen so far has been obtained within 200 iterations, and it terminates upon the 10th such restart. For each sequence of inserts and deletes, it is straightforward to compute the transformation cost that also takes into account the corresponding relabelling operations.
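The sketch below conveys the flavour of this procedure. It assumes a hypothetical helper seq_cost that applies a complete delete/insert ordering and returns its total cost (including the induced relabellings); the bookkeeping details differ from our actual implementation.

```python
import random

def upper_bound_transformation_cost(ops, seq_cost, walk_prob=0.01,
                                    patience=200, max_restarts=10):
    """Randomised iterative first improvement over orderings of the given
    delete/insert operations; returns an upper bound on the minimal
    transformation cost. `seq_cost` scores a complete operation sequence."""
    if len(ops) < 2:
        return seq_cost(ops)
    best = float("inf")
    for _ in range(max_restarts):
        seq = ops[:]
        random.shuffle(seq)         # start from a uniformly random permutation
        current = seq_cost(seq)
        since_improvement = 0
        while since_improvement < patience:
            i, j = random.sample(range(len(seq)), 2)
            seq[i], seq[j] = seq[j], seq[i]      # swap two operations
            cost = seq_cost(seq)
            if cost < current or random.random() < walk_prob:
                current = cost                   # accept improving / walk step
            else:
                seq[i], seq[j] = seq[j], seq[i]  # undo the swap
            if current < best:
                best, since_improvement = current, 0
            else:
                since_improvement += 1           # restart after 200 misses
    return best
```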

Table 1. Our eleven challenger algorithms.
Table 2. Our six benchmark distributions.

4 Experimental Setup

Our quantitative analysis of SATenstein configurations is based on performance comparisons with eleven high-performance SLS solvers (we call each of these solvers a challenger) on six well-known SAT distributions, listed in Tables 1 and 2, respectively.

We performed algorithm configuration using ParamILS [11], a well-known automatic algorithm configurator. On each benchmark distribution, we configured SATenstein on the training set and evaluated the performance of the resulting configuration on the test set. For each test-set instance, we ran each solver 25 times with a per-run cutoff of 600 CPU seconds. Following [11], we evaluate performance in terms of penalized average run time (PAR), which is defined as average run time with each timed-out run counted as having completed in 10 times the cutoff time (in this case, 6000 CPU seconds). For a particular solver, we consider an instance solved if a majority of runs found a satisfying assignment. In practice, PAR can be sensitive to the choice of cutoff; however, in past work [15], we showed that this choice did not affect the qualitative evaluation of SATenstein’s performance on any of the six distributions we considered. We conducted all of our experiments on a cluster of 55 machines, each equipped with dual 3.2 GHz Intel Xeon CPUs with 2 MB cache and 2 GB RAM, running OpenSuSE Linux 11.1 and managed by Sun Grid Engine (version 6.0). Code for SATenstein and our transformation cost computation can be found at http://www.cs.ubc.ca/labs/beta/Projects/SATenstein/.
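For concreteness, PAR with this penalty (often called PAR-10) can be computed as in the following sketch; runtimes are in CPU seconds, and the function name is ours.

```python
def par10(runtimes, cutoff=600.0, penalty_factor=10):
    """Penalised average runtime: each timed-out run (runtime >= cutoff)
    is counted as penalty_factor * cutoff. A sketch of the metric used here."""
    return sum(t if t < cutoff else penalty_factor * cutoff
               for t in runtimes) / len(runtimes)

# e.g., par10([12.3, 600.0, 45.6]) counts the second run as 6000 CPU seconds
```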

5 Quantitative Comparison of Algorithm Configurations

In previous work, we performed an extensive performance evaluation on six well-known benchmark distributions, finding that SATenstein outperformed all challengers in every distribution [16]. Moreover, we found that SATenstein outperformed tuned challengers as well, albeit to a reduced extent. In order to refer to them in what follows, we summarize these results in Tables 4 and 5.

Table 3. High-level summary of SATenstein solvers.

Table 3 gives a high-level description of SATenstein solvers in terms of building blocks used and overall SLS category. Recall that SATenstein draws components from three major SLS solver categories: WalkSAT, dynamic local search and G\(^{2}\)WSAT-based algorithms.

Table 4. Performance of SATenstein and the 11 challengers. Every algorithm was run 25 times on each instance with a cutoff of 600 CPU seconds per run. Each cell \(\langle i,j \rangle \) summarizes the test-set performance of algorithm i on distribution j as a / b, where a (top) is the penalized average runtime; b (bottom) is the percentage of instances solved (i.e., those with median runtime < cutoff). The best-scoring algorithm(s) in each column are indicated in bold, and the best-scoring challenger(s) are underlined.
Table 5. Performance summary of the automatically configured versions of 8 challengers (three challengers have no parameters). Every algorithm was run 25 times on each problem instance with a cutoff of 600 CPU seconds per run. Each cell \(\langle i,j \rangle \) summarizes the test-set performance of algorithm i on distribution j as a / b, where a (top) is the penalized average runtime; b (bottom) is the percentage of instances solved (i.e., having median runtime < cutoff). The best-scoring algorithm(s) in each column are indicated in bold.

5.1 Comparison of SATenstein Configurations

We now compare our automatically identified SATenstein solver designs to all of the challengers. As shown in Table 1, 3 of our 11 challengers (AG2p, AG2+, and AG20) are parameter-less. Furthermore, RANOV only differs from ANOV by the addition of a preprocessing step, and so can be understood as a variant of the same algorithm. This leaves us with 7 parameterized challengers to consider. For each, we sampled 50 configurations (consisting of the default configuration, one configuration optimized for each of our 6 benchmark distributions, and 43 random configurations). We then computed the pairwise transformation cost between the resulting 359 configurations (7 \(\times \) 50 challengers’ configurations + 6 SATenstein solvers + AG2p + AG2+ + AG20). The result can be understood as a graph with 359 nodes and 128 522 edges, with nodes corresponding to concept DAGs, and edges labeled by the minimum transformation cost between them. To visualize this graph, we used a dimensionality reduction method to map it onto a plane, with the aim of positioning points so that the Euclidean distance between every pair of points approximates their transformation cost as accurately as possible; in particular, we used the MDS algorithm [2] as implemented in MATLAB’s mdscale function, with option sstress.
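Since MATLAB's mdscale is not universally available, a rough Python analogue using scikit-learn is sketched below; note that sklearn's MDS minimizes metric stress rather than the sstress criterion we used, so the resulting layout would differ in detail.

```python
import numpy as np
from sklearn.manifold import MDS

def embed_2d(dist, seed=0):
    """Map an n x n matrix of pairwise transformation costs (symmetrised
    if necessary) to 2D coordinates. A rough analogue of MATLAB's
    mdscale with option sstress, not the exact procedure we used."""
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=seed)
    return mds.fit_transform(np.asarray(dist))   # shape (n, 2)
```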

Fig. 1. Visualization of the transformation costs in the design of 16 high-performance solvers (359 configurations), obtained via MDS.

Fig. 2. True vs. mapped distances in Fig. 1. The data points correspond to the complete set of SATenstein[D] for all domains and all challengers with their default and domain-specific, optimized configurations.

The final layout of similarities among 359 configurations (16 algorithms) is shown in Fig. 1. Observe that in most cases the 50 different configurations for a given challenger solver were so similar that they mapped to virtually the same point in the graph.

As noted earlier, the distance between any two configurations shown in Fig. 1 only approximates their true distance. In addition, the result of the visualization depends on the number of configurations considered: adding a further configuration may affect the position of many or all other configurations. Thus, before drawing further conclusions about the results illustrated in Fig. 1, we validated the fidelity of the visualization to the original distance data. As can be seen from Fig. 2, there was a strong correlation between the computed and mapped distances (Pearson correlation coefficient: 0.96). The mapping also preserved the relative ordering of the true distances between configurations quite well (Spearman correlation coefficient: 0.96); in other words, distances that appear similar in the 2D plot tend to correspond to similar true distances (and vice versa). Digging deeper, we confirmed that the two challengers closest to each given SATenstein solver in Fig. 1 were always the ones with the lowest true transformation costs. For distant challengers, relative distance in the visualization did not always reflect true relative transformation costs; however, we find this acceptable, since we are mainly interested in examining which configurations are similar to each other.
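Such a fidelity check is easy to reproduce. The sketch below, assuming dist is the matrix of true pairwise transformation costs and coords the 2D embedding from the previous sketch, computes both correlation coefficients:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import pearsonr, spearmanr

def embedding_fidelity(dist, coords):
    """Correlate true pairwise costs with mapped 2D Euclidean distances
    (upper triangle only, to avoid counting each pair twice)."""
    mapped = squareform(pdist(coords))           # n x n Euclidean distances
    iu = np.triu_indices_from(mapped, k=1)
    return (pearsonr(dist[iu], mapped[iu])[0],   # linear agreement
            spearmanr(dist[iu], mapped[iu])[0])  # rank-order agreement
```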

Having confirmed that our dimensionality reduction method performs reliably, let us examine Fig. 1 in more detail. Overall, and unsurprisingly, we first note that the transformation cost between two configurations in the design space is only very weakly related to their performance difference (quantitatively, the Spearman correlation coefficient between performance difference (PAR ratio) and configuration difference (transformation cost) was 0.25). Examining algorithms by type, we note that all dynamic local search algorithms are grouped together, on the right side of Fig. 1; likewise, the algorithms using adaptive mechanisms are grouped together at the bottom-left of Fig. 1. SATenstein() solvers were typically more similar to each other than to challengers, and fell into two broad clusters. The first cluster also includes the SAPS variants (SAPS, RSAPS), while the second also includes G2 and VW. None of the SATenstein solvers uses an adaptive mechanism to automatically adjust other parameters. In fact, as shown in Table 5, the same is true of the best performance-optimized challengers: none of SAPS, G2, and VW uses an adaptive mechanism. This suggests that in many cases, contrary to common belief (see, e.g., [8, 19]), it may be preferable to expose parameters so they can be instantiated by sophisticated configurators, rather than automatically adjusting them at runtime using a simple adaptive mechanism.

We now consider benchmarks individually. For the FAC benchmark, SATenstein[FAC] had similar performance to SAPS[FAC]; as seen in Fig. 1, both solvers are structurally very similar as well. Overall, for the ‘industrial’ distributions, CBMC(SE) and FAC, dynamic local search algorithms often yielded the best performance amongst all challengers. Our automatically-constructed SATenstein solvers for these two distributions are also dynamic local search algorithms. Due to the larger search neighbourhood and the use of clause penalties, dynamic local search algorithms are more suitable for solving industrial SAT instances, which often have some special global structure.

Fig. 3. Transformation costs between configurations of individual challengers and selected SATenstein solvers. (a): SAPS (best on HGEN and FAC); (b): SAPS and SATenstein[HGEN, FAC]; (c): G2 (best on SWGCP); (d): G2 and SATenstein[SWGCP]; (e): VW (best on CBMC(SE), QCP, and R3SAT); (f): VW and SATenstein[CBMC, QCP, R3SAT].

For R3SAT, a well-studied distribution, many challengers showed good performance (the top three challengers were VW, RSAPS, and SAPS). The performance of SATenstein[R3SAT] is only slightly better than that of VW[R3SAT]. Figure 1 shows that SATenstein[R3SAT] is a dynamic local search algorithm similar to RSAPS and SAPS.

For HGEN, even the best performance-optimized challengers, RSAPS[HGEN] and SAPS[HGEN], performed poorly. SATenstein[HGEN] achieves more than 1 000-fold speedups against all challengers. Its configuration is far away from any dynamic local search algorithm (the best challengers), and closest to VW, a WalkSAT algorithm, and G2.

For QCP, VW[QCP] does not reach the performance of SATenstein[QCP], but significantly outperforms all other challengers. Our transformation cost analysis shows that VW is the closest neighbour to SATenstein[QCP]. For SWGCP, many challengers achieve similar performance to SATenstein[SWGCP]. Figure 1 shows that SATenstein[SWGCP] is close to G2[SWGCP], which is the best performing challenger on SWGCP.

5.2 Comparison to Configured Challengers

Since there were large performance gaps between default and configured challengers, we were also interested in the transformation costs between the configurations of individual challenger solvers. Table 5 shows that after configuring each challenger for each distribution, SAPS was best on HGEN and FAC, G2 was best on SWGCP, and VW was best on CBMC(SE), QCP, and R3SAT. Figure 3 (left) visualizes the parameter space of each of these three solvers (43 random configurations + default configuration + 6 optimized configurations). Figure 3 (right) shows the same configurations, but adds the best SATenstein() configuration for each benchmark on which the challenger exhibited top performance.

Examining the figures in the left column of Fig. 3, we first note that the SAPS configurations optimized for FAC and HGEN are very similar but differ substantially from SAPS’s default configuration. On SWGCP, the optimized configuration of G2 not only performs much better than the default but, as seen in Fig. 3(c), is also quite different from it. All three top-performing VW configurations are rather different from VW’s default, and none of them uses the adaptive mechanism for choosing the parameters wpWalk, s, and c. Since useAdaptiveMechanism is a top-level parameter and many other parameters are conditionally dependent on it, the transformation costs between VW’s default and optimized configurations are very large, due to the high relabelling cost for these nodes in our concept DAGs.

The right column of Fig. 3 illustrates the similarity between optimized SATenstein() solvers and the best performing challenger for each benchmark. As previously noted, SATenstein[FAC] and SAPS[FAC] are not only very similar in performance, but also structurally similar. Likewise, SATenstein[SWGCP] is similar to G2[SWGCP]. On R3SAT, many challengers had similar performance. SATenstein[R3SAT] (\(\text {PAR}=1.11\)) was quite different from the best challenger VW[R3SAT] (\(\text {PAR}=1.26\)), but resembled SAPS[R3SAT] (\(\text {PAR}=1.53\)). For the three remaining benchmarks, SATenstein() solvers exhibited much better performance than the best optimized challengers, and their configurations likewise differed substantially from the challengers’ configurations.

As an aside, it might initially be surprising that qualitative features of the visualizations in Fig. 3 appear to be absent from Fig. 1. In particular, the sets of randomly sampled challenger configurations that are quite well separated in Fig. 3 are nearly collapsed into single points in Fig. 1. The reason is that the 2D mapping of the highly non-planar pairwise distance data performed by MDS aims for minimal overall distortion. For example, when visualizing the differences within a set of randomly sampled SAPS configurations (Fig. 3(a)), MDS spreads these out into a cloud of points to represent their differences. However, the presence of a single SATenstein configuration with large transformation costs to all of these SAPS configurations forces MDS to use one dimension to capture those differences, leaving essentially only one dimension to represent the much smaller differences between the SAPS configurations (Fig. 3(b)). Adding further, very different configurations (as present in Fig. 1) leads to mappings in which the smaller differences between configurations of the same challenger become insignificant.

6 Conclusion

We have proposed a new metric for quantitatively assessing the similarity between configurations of highly parametric solvers. Our metric is based on a novel data structure, the concept DAG, which preserves the internal hierarchical structure of parameters; we estimate the similarity of two configurations as the transformation cost from one configuration to the other. In the context of SATenstein, a highly parameterized SLS-based SAT solver, we have demonstrated that visualizations based on transformation costs can provide useful insights into similarities and differences between solver configurations. In addition, we believe that this metric could be useful for suggesting potential links between algorithm structure and algorithm performance, the further exploration of which could be an interesting direction for future research.