
1 Introduction

In this paper, we propose a new perspective and visualization tools to understand and analyze the behavior of the backtrack-search procedure for solving Constraint Satisfaction Problems (CSPs). Backtrack search is commonly used for solving CSPs. However, its performance is unpredictable and can differ widely on similar instances. Further, maintaining a given consistency property during search has become a common practice  [13, 22, 30, 42, 43] and new strategies for dynamically switching between consistency algorithms are being investigated  [1,2,3, 9, 15, 16, 18, 25, 29, 41, 46, 48]. While consistency algorithms can dramatically reduce the size of the search space, their impact on the CPU cost of search can vary widely. This cost is currently poorly understood, hard to predict, and difficult to control.

To gain an understanding of the behavior of search and its performance, most previous work has focused on the visualization of various metrics at the end of search. In contrast, we claim that it is sometimes advantageous to inspect the history [20] or evolution of these metrics along the search process in order to detect bottlenecks that can be addressed by specific tools (e.g., modifying the topological structure of the constraint network [43] or enforcing a particular high-level consistency). As we demonstrate in this paper, it is sometimes not practical, or even feasible, to manually carry out such an analysis. For this reason, we advocate automatically organizing the history of search (i.e., its evolution over time) as a sequence of regimes [23] of qualitatively distinct behaviors detected by sampling the performance metrics during search. We claim that our visualizations are useful to the researcher designing new adaptive strategies and consistency algorithms as well as to the savvy engineer deploying constraint-programming solutions.

Building on the visualizations proposed by Woodward et al. [46] for describing search performance, we introduce the following contributions:

  1. A criterion for computing the distance between two samples, quantifying the change in search behavior.

  2. A clustering technique for automatically summarizing the history of search and its progress over time in a hierarchical structure that a human user can inspect at various levels of detail.

  3. A new visualization that reveals and summarizes the behavior of a dynamic variable ordering heuristic.

This paper is structured as follows. We first review background information and the relevant literature. Then, we describe our contributions. For each contribution, we highlight its usefulness with illustrative examples. Finally, we discuss the usefulness of our approach and conclude with directions for future research.

2 Background and Related Work

Constraint Satisfaction Problems (CSPs) are used to model many combinatorial decision problems of practical importance in Computer Science, Engineering, and Management. A CSP is defined by a tuple (X, D, C), where X is a set of variables, D is the set of the variables’ domains, and C is a set of constraints that restrict the combinations of values that the variables can take at the same time. We denote the number of variables \(|X|=n\). A solution to a CSP is an assignment of values to variables such that all constraints are simultaneously satisfied. Determining whether or not a given CSP has a solution is known to be NP-complete. The constraint network of a CSP instance is a graph where the vertices represent the variables in the CSP and the edges represent the constraints and connect the variables in their scope.
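For concreteness, the following minimal Python sketch shows one possible encoding of a CSP; the variable names, domains, and constraints below are illustrative assumptions, not taken from any benchmark instance.

```python
# A minimal, illustrative CSP encoding (a sketch, not the representation used
# by any particular solver): variables, their domains, and constraints given
# as (scope, predicate) pairs restricting the allowed combinations of values.

csp = {
    "X": ["x1", "x2", "x3"],
    "D": {"x1": {1, 2, 3}, "x2": {1, 2, 3}, "x3": {1, 2, 3}},
    "C": [
        (("x1", "x2"), lambda a, b: a != b),   # x1 and x2 must differ
        (("x2", "x3"), lambda a, b: a < b),    # x2 must be smaller than x3
    ],
}

def is_solution(assignment, csp):
    """A complete assignment is a solution iff every constraint is satisfied."""
    return all(pred(*(assignment[v] for v in scope)) for scope, pred in csp["C"])
```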

To date, backtrack search (BT) is the only sound and complete algorithm for solving CSPs [7]. Backtracking suffers from thrashing when it repeatedly explores similar subtrees without making progress. Strategies for reducing thrashing include intelligent backtracking, ordering heuristics, learning no-goods, and enforcing a given consistency property after every variable instantiation. Enforcing consistency reduces the size of the search space by deleting, from the variables’ domains, values that cannot appear in a consistent solution given the current search path (i.e., conditioning). In recent years, the research community has investigated higher-level consistencies (HLC) as inference techniques to prune larger portions of the search space at the cost of increased processing effort [6, 13, 17, 33], leading to a trade-off between the search effort and the time for enforcing consistency. We claim that our visualizations are insightful tools to understand why and where to apply HLCs and to assess their effectiveness.
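The following sketch illustrates the overall scheme of backtrack search with consistency enforcement after every variable instantiation; the helpers select_var and propagate are assumptions made for the sketch and stand in for a dynamic variable ordering heuristic and a consistency algorithm (e.g., GAC), respectively.

```python
# Illustrative sketch of backtrack search that enforces a consistency property
# after each variable instantiation, as described above; `select_var` and
# `propagate` are assumed helpers, not the paper's implementation.

def backtrack_search(assignment, domains, csp):
    if len(assignment) == len(csp["X"]):
        return dict(assignment)                   # all variables assigned: a solution
    var = select_var(assignment, domains)         # dynamic variable ordering (e.g., dom/wdeg)
    for value in sorted(domains[var]):
        assignment[var] = value
        pruned = propagate(domains, csp, var)     # enforce consistency on future variables
        if pruned is not None:                    # None signals a domain wipeout
            result = backtrack_search(assignment, pruned, csp)
            if result is not None:
                return result
        del assignment[var]                       # undo and try the next value (backtrack)
    return None                                   # all values failed: backtrack further up
```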

In this paper, we use by default dom/wdeg [10], a dynamic variable ordering heuristic (and dom/deg [5] where indicated). Below, we summarize the consistency properties that we enforce. We always maintain Generalized Arc Consistency (GAC) [26] during search. In addition to GAC, we enforce, as HLCs, Partition-One Arc-Consistency (POAC) [4], which filters the domains of the variables by performing singleton testing, and Partial Path Consistency (PPC) [8], which filters the constraints after triangulating the constraint graph. Both POAC and (strong) PPC are strictly stronger than GAC. In order to control their cost, we enforce the HLCs in a ‘selective/adaptive’ manner using the PrePeak\(^+\) strategy of Woodward et al. [46] for POAC (denoted PrePeak\(^+\)-POAC) and the Directional Partial Path Consistency of Woodward [45] (denoted DPPC\(^+\)). PrePeak\(^+\) triggers POAC by watching the number of backtracks and follows geometric laws to increase or decrease the frequency of the calls to POAC. DPPC\(^+\) chooses a subset of the triangles of the triangulated constraint network on which to call PPC, in a manner that reduces the addition of new constraints while increasing propagation across the network. Finally, we use a k-way branching strategy for search; however, preliminary tests on Choco [12], on which we tested GAC and implemented POAC, show similar shapes of the visualizations.

Prior research on search visualization has appeared in the Constraint Programming literature. Most previous approaches, such as the ones discussed below, can be viewed as tools for inspecting and debugging search. The DiSCiPl project provides extensive visual functionalities to develop, test, and debug constraint logic programs such as displaying variables’ states, effect of constraints and global constraints, event propagation at each node of the search tree, and identifying isomorphic subtrees [11, 38]. Many useful methodologies from the DiSCiPl project are implemented in CP-Viz [39] and other works [37]. The OZ Explorer displays the search tree allowing the user to access detailed information about each tree node and to collapse and expand failing trees for closer examination [34]. This work is currently incorporated into Gecode’s Gist [35]. The above approaches focus on exploring in detail the search tree to examine and inspect the impact of individual search decisions on a particular variable, constraint, variable domain, or even a subtree of the search. In contrast, our work proposes to summarize the evolution of search along a given dimension (i.e., projection), providing a qualitative, abstract view of a search tree that is too large to be explored. We believe that the above approaches are orthogonal and complementary to ours.

Ghoniem et al. [19] propose a variety of heatmap visualizations to illustrate the operation of constraint propagation across an instance. Their heatmaps illustrate the impact of the activity of a variable or of a constraint on other variables or constraints in terms of filtering the variables’ domains or causing further constraint propagation. These visualizations are fine grained and inform the user of the effect of a particular decision. They have pedagogical value and can also be useful for debugging. They differ from our approach in that the user has the burden of identifying the animation frames of interest. Relatively short behaviors can go unnoticed and long behaviors can waste the user’s time. In contrast, our regime-identification process automatically organizes search history into a sequence of ‘interesting’ episodes. We consider their contribution, too, to be orthogonal and complementary to our approach.

Fig. 1. From Woodward et al. [46]: Number of backtracks (BpD) and calls (CpD) to POAC per depth at the end of search for benchmark instance pseudo-aim-200-1-6-4 using dom/wdeg. Left: GAC. Right: APOAC, an adaptive POAC by Balafrej et al. [1]. The CPU time, #NV (number of nodes visited), and #BT (number of backtracks) are typically used to indicate the cost of search. (Color figure online)

Tracing the search effort by depth of the search tree was first proposed for the number of constraint checks and values removed per search level (Epstein et al.  [16]) and for the number of nodes visited (Simonis et al. [39] in CP-Viz, also used for solving a packing problem [40]). More recently, Woodward et al.  [46] propose to focus on the number of backtracks per depth (BpD) to assess the severity of thrashing and on the number of calls to an HLC per depth (CpD) to explain the cost of enforcing an HLC, see Fig. 1. Further, they split the CpD into three line charts, showing in green the HLC calls that prune inconsistent subtrees and yield domain wipeouts (most effective), in blue those that filter the domains of future variables (effective), and in red those that do not filter the domain of any future variable (wasteful). Finally, they superimpose the CpD to the BpD chart in order to reveal the effectiveness of the consistency enforced (and that of the adaptive strategy for enforcing an HLC, when applicable).
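As a rough illustration of the bookkeeping behind these charts (a sketch, not the authors’ instrumentation), the BpD and CpD data can be maintained as counters indexed by search depth, with each HLC call tagged by its outcome:

```python
# Sketch of per-depth accounting for the BpD and CpD charts: counters indexed
# by search depth, with HLC calls tagged by outcome as described above.
from collections import Counter

bpd = Counter()                  # depth -> cumulative number of backtracks
cpd = {"wipeout": Counter(),     # HLC calls pruning an inconsistent subtree (green)
       "filtering": Counter(),   # HLC calls filtering some future domain (blue)
       "no_effect": Counter()}   # HLC calls filtering nothing (red)

def on_backtrack(depth):
    bpd[depth] += 1

def on_hlc_call(depth, outcome):
    cpd[outcome][depth] += 1     # outcome in {"wipeout", "filtering", "no_effect"}
```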

3 Tracking Search History

To the best of our knowledge, previous approaches to visualizing the various search ‘metrics per depth’ of the search tree (e.g., number of nodes visited, constraint checks, or backtracks) present the values reached at the end of search, thus providing a ‘blanket’ summary of where thrashing occurred and where search invested the most resources. They fail to identify the problems that occur during search. We argue that these visualizations may mask important information, such as transient behaviors of search, that an algorithm developer or a savvy user needs to examine and address in order to solve the problem. In this paper, we propose to track the progress of search by collecting samples of the BpD ‘profile’ during search, see Fig. 2. In the examples discussed in this paper, we use a sampling rate of 100 ms. To this end, we propose a criterion for deciding whether two consecutive samples are ‘equivalent’ and pertain to the same qualitative regime [23] of search. The sequence of regimes yields a history [20] of the search.
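A minimal sketch of the sampling step follows; it assumes the bpd counter of the previous sketch and a sampling interval of 100 ms, and simply snapshots the cumulative counts to form the time-ordered sequence of samples.

```python
# Sketch of the periodic sampling of the BpD profile during search; the
# resulting time-ordered sequence of snapshots is what the regime
# identification below operates on.
import time

samples = []                # samples[i] = BpD profile (depth -> #backtracks) at time t_i
_last_sample = [0.0]

def maybe_sample(bpd, interval=0.1):
    """Called from the search loop; records a snapshot every `interval` seconds."""
    now = time.monotonic()
    if now - _last_sample[0] >= interval:
        samples.append(dict(bpd))      # copy the cumulative counts at time t_i
        _last_sample[0] = now
```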

Fig. 2. BpD and CpD are sampled, during search, at times \(t_i\) (\(0<i\le \mathrm {max}\)), and the corresponding line charts are partitioned into successive regimes of equivalent behavior that collectively describe the history of search progress

Below, we first introduce a criterion for assessing distance between two BpD samples. Then, we describe how to build a summarization tree using agglomerative hierarchical clustering and how the tree can be used to allow the human user to examine the search history at the desired level of detail. Finally, we provide two illustrative examples.

3.1 Distance Between Two BpD Samples

We adopt the following notations. \(D=[1,d_{\text {max}}]\) is the domain of the depth of the search tree for solving the CSP. In a k-way backtrack search, \(d_{\text {max}}=n\); in a binary-branching scheme, \(d_{\text {max}}\le na\), where a is the maximum domain size. We sample search, recording the cumulative number of backtracks per depth (BpD) at each sampling instant. \(\mathrm {BT}(d,t)\) designates the number of backtracks executed by search at depth \(d \in D \) from the beginning of search until time t. The first regime starts at \(t=0\) and we set \(\mathrm {BT}_{d\in D}(d,0) \equiv 0\). The last regime ends at \(t_{\mathrm {m}}\) and corresponds to the last sample collected. We say that two time instants t and \(t'\) do not belong to the same qualitative regime if their corresponding BpD curves ‘differ enough’ according to some divergence or distance criterion. Below, we introduce one such criterion.

To measure the distance between BpDs at time stamps t and \(t'\), we adapt, to our context, the Kullback-Leibler (KL) divergence (also called relative entropy) [24]. Given two discrete probability distributions P and Q defined on the same probability space \(\chi \), the KL-divergence from Q to P, which is always non-negative, is defined as \(D_{\mathrm {KL}}(P \parallel Q) = \sum _{x\in \chi } P(x)\log \left( \frac{P(x)}{Q(x)}\right) \).

In order to adapt the KL-divergence to our context, we define the probability that \(\mathrm {BT}(d,t)\) takes a given value as \(p(d,t) = \frac{\mathrm {BT}(d,t)}{\sum _{d\in D} \mathrm {BT}(d,t)}\). To avoid having this probability be zero when \(\mathrm {BT}(d,t)=0\), we use Laplace smoothing [27], approximating this probability with \(\widehat{p}(d,t)= \frac{\mathrm {BT}(d,t)+ \alpha }{\sum _{d\in D}\mathrm {BT}(d,t) + \alpha d_{\max }}\) and using \(\alpha =0.1\) in our experiments. Finally, we define the divergence \(\mathrm {div}(t,t') =\mathrm {div}(t',t)\) between two BpD curves at times t and \(t'\) as

$$\begin{aligned} \mathrm {div}(t,t') = \max \left( \sum _{d\in D}\widehat{p}(d,t)\log {\left( \frac{\widehat{p}(d,t)}{\widehat{p}(d,t')}\right) }, \sum _{d\in D}\widehat{p}(d,t')\log {\left( \frac{\widehat{p}(d,t')}{\widehat{p}(d,t)}\right) } \right) \end{aligned}$$
(1)

Further, we define the distance between two regimes as the divergence between the BpDs at the middle samples of the corresponding regimes. An alternative, a little more costly, is to use the largest divergence between any two samples in the regimes. Other options exist; preliminary evaluations did not exhibit a significant sensitivity to this choice.
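The following sketch implements Eq. (1) as described above: BpD samples are dictionaries mapping each depth to its cumulative number of backtracks, probabilities are Laplace-smoothed with \(\alpha =0.1\), and the divergence is the maximum of the two directed KL terms.

```python
# Sketch of Eq. (1): Laplace-smoothed per-depth probabilities and the
# symmetric divergence between two BpD samples (dicts mapping depth -> #backtracks).
import math

ALPHA = 0.1   # smoothing constant used in the paper's experiments

def smoothed_p(bt, d_max, alpha=ALPHA):
    """Smoothed probability p-hat(d, t) for every depth d in 1..d_max."""
    total = sum(bt.get(d, 0) for d in range(1, d_max + 1))
    denom = total + alpha * d_max
    return [(bt.get(d, 0) + alpha) / denom for d in range(1, d_max + 1)]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def div(bt_t, bt_t2, d_max):
    """div(t, t') = max of the two directed KL divergences (Eq. 1)."""
    p, q = smoothed_p(bt_t, d_max), smoothed_p(bt_t2, d_max)
    return max(kl(p, q), kl(q, p))
```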

3.2 Clustering Tree and Summarization Tree

We propose to summarize the history of the search following the idea of agglomerative hierarchical clustering described by Nielsen [28, Chapter 8], which yields a binary tree.

Starting from the sequence of BpD samples (collected during search and ordered along the time line), we form the leaf nodes of the clustering tree storing one sample per node. We place these nodes in a vector Q. Then,

  1. We determine the divergence between every two consecutive nodes in Q (those that store temporally adjacent samples).

  2. We generate a parent for the two nodes with the smallest divergence value.

  3. We replace, in Q, the two nodes with their parent.

  4. We associate, to the parent, the concatenation of the sample sequences of the two children, ordered in increasing time.

  5. We store, for the parent, the divergence value between its children.

  6. We choose as representative of the new node the middle sample, breaking ties by using the earlier sample.

We iteratively apply the process until \(|Q|=1\). We call this tree the clustering tree and reuse it every time a new summarization is needed.
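The construction can be summarized by the following sketch; it assumes the div function from the sketch in Sect. 3.1 and merges only temporally adjacent nodes, smallest divergence first.

```python
# Sketch of the clustering-tree construction: leaves hold one sample each,
# only temporally adjacent nodes are merged, and each parent stores the
# divergence between its two children. Uses `div` from the sketch in Sect. 3.1.

class Node:
    def __init__(self, samples, children=(), divergence=0.0):
        self.samples = samples                      # time-ordered BpD samples
        self.children = list(children)
        self.divergence = divergence                # divergence between the two children
        mid = (len(samples) - 1) // 2               # middle sample; earlier one on ties
        self.representative = samples[mid]

def build_clustering_tree(samples, d_max):
    q = [Node([s]) for s in samples]                # leaf nodes, one sample each
    while len(q) > 1:
        # step 1: divergence between every two consecutive nodes (via representatives)
        divs = [div(q[i].representative, q[i + 1].representative, d_max)
                for i in range(len(q) - 1)]
        i = min(range(len(divs)), key=divs.__getitem__)   # step 2: smallest divergence
        parent = Node(q[i].samples + q[i + 1].samples,    # steps 4-6 via the constructor
                      children=[q[i], q[i + 1]], divergence=divs[i])
        q[i:i + 2] = [parent]                             # step 3: replace children by parent
    return q[0]
```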

Finally, we identify the deepest nodes in the tree with a stored divergence value larger than or equal to a user-defined value \(\delta \) and replace each such node by a new regime node. The regime node inherits, from the node that it replaces, its representative middle sample, its divergence value, and the sequence of samples stored at the leaves of the corresponding subtree. We call the resulting structure a \(\delta \)-summarization tree. The dendrogram shown in Fig. 3 illustrates this process.

Fig. 3. Dendrogram of the summarization tree, showing the sequence of ‘cluster’ mergers and the divergence value at which each merger occurs (with \(\epsilon \ge 0\))

The clustering tree stores samples at its leaves and does not use or depend on \(\delta \). The summarization tree uses \(\delta \) to ‘trim’ the lower levels of the clustering tree, replacing them with regimes for explanation, which can be done interactively and is computationally cheap. Thus, if the summarization is too detailed or too abstract, the user may adjust \(\delta \) to quickly regenerate a new history from the clustering tree to reach a level of detail that is cognitively satisfying. Further, iteratively adding samples to the clustering tree as search proceeds can be done, online, in logarithmic time. Consequently, building and maintaining the clustering tree can efficiently be done online, during search. However, this direction remains to be investigated.
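A direct transcription of this trimming rule might look as follows; this is a sketch under the assumption that regimes are obtained by collapsing the deepest nodes whose stored divergence is at least \(\delta \), and it reuses the Node class from the previous sketch.

```python
# Sketch of deriving a delta-summarization tree from the clustering tree: the
# deepest nodes whose stored divergence is at least delta are collapsed into
# regime nodes (keeping their samples, representative, and divergence).

def has_deep_split(node, delta):
    """True if some descendant of `node` has divergence >= delta."""
    return any(c.divergence >= delta or has_deep_split(c, delta)
               for c in node.children)

def summarize(node, delta):
    if node.divergence >= delta and not has_deep_split(node, delta):
        return Node(node.samples, divergence=node.divergence)   # collapsed regime node
    return Node(node.samples,
                children=[summarize(c, delta) for c in node.children],
                divergence=node.divergence)
```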

In addition to the data structures used for the hierarchical clustering [36], we introduce two data structures: first, Q, implemented as a vector of size m, where m is the number of samples collected; second, another vector to store the data of the non-leaf nodes of the clustering tree. The size of this second vector is \(m-1\) because the clustering tree is a full binary tree. The clustering tree can be built in \(\mathcal {O}(m^2)\) time and \(\mathcal {O}(m)\) space [14].

Fig. 4. aim-200-2-0-sat-1: Histories with GAC (top) and with PrePeak\(^+\)-POAC (bottom), using dom/wdeg. Showing the middle BpD sample in each regime and, for reference, the final sample collected. (Color figure online)

3.3 Illustrative Example: Explaining Adaptive Consistency

In this section, we compare the behavior of search for solving the SAT instance aim-200-2-0-sat-1 [50]. GAC takes less than 10 min and generates 14 million backtracks. PrePeak\(^+\)-POAC, on the other hand, solves the instance in a little over a minute, netting only 1.2 million backtracks. To see why, we examine the corresponding search histories shown in Fig. 4a and 4b. We see that, initially, both searches proceed in exactly the same manner. The behavior of PrePeak\(^+\)-POAC starts to diverge from that of GAC at the fifth regime of PrePeak\(^+\)-POAC (Fig. 4b), where PrePeak\(^+\) reacts to the sharp increase of the number of backtracks around depth 100 and triggers POAC. POAC successfully prunes inconsistent subtrees (green calls) or decreases the size of the search tree (blue calls). By examining the last three regimes of PrePeak\(^+\)-POAC (Fig. 4b), we see how the HLC successfully reduces the severity of the backtracking, reducing the peak in height and width and even shifting its position. By investing in opportunistic and effective (even if costly) calls to POAC, we significantly reduce the search time.

3.4 Illustrative Example: Analyzing Structure

In this section, we show how the identification and visualization of the regimes of search provide insight into the structure of a problem instance and guide the choice of the appropriate type of consistency for solving it.

We consider the graph-coloring instance mug100-25-3 [50] and try to solve it with GAC using dom/wdeg for dynamic variable ordering. Search fails to terminate within two hours. The inspection of the BpD of GAC (left of Fig. 5) at the end of the unsuccessful search reveals the presence of a ‘dramatic’ peak of the number of backtracks at depth 83 (i.e., relatively deep in the search tree). This behavior hints that search may have made a bad decision at a shallower level of search from which it could not recover.

Fig. 5. mug100-25-3: GAC (left); constraint network (center); DPPC\(^+\) (right).

Fig. 6. Histories of mug100-25-3 with GAC and DPPC\(^+\). Showing the middle BpD sample in each regime and, for reference, the final sample collected.

The inspection of the history of search, shown in Fig. 6a, reveals six regimes. Search generates a first peak at depth 34 (Regimes 1 and 2) before generating a second peak at depth 86 (Regimes 3 and 4). Then, this second peak grows larger, dwarfs the first, shifts to depth 61, and settles at depth 83. Further, Regime 1 reveals a peak of magnitude 2834 backtracks at depth 34, which corresponds to instantiating about one third of the variables in the problem. Regime 2 shows that search has overcome the initial bottleneck in the instance occurring at depth 34, with a magnitude of 4556 backtracks, but is struggling again with a second bottleneck at depth 86 with a magnitude of 4624 backtracks. Regime 3 shows that the severity of the first bottleneck is dwarfed by a dramatic increase of the second bottleneck, which reaches 237705 backtracks at depth 86. As time progresses, the peak moves to shallower levels, to depth 61. Although the peak later moves back to deeper levels, search continues to struggle and is unable to conclude. Another noteworthy observation is the time scale at which the regimes are identified. The first three regimes take place within the first five seconds of search whereas the last three regimes occur at larger time scales. It seems presumptuous to expect a human user to identify and isolate behavior occurring at such small time scales.

The observation of the two bottlenecks prompts us to examine the constraint network of the problem instance, looking for a structural explanation for the non-termination of search. Indeed, we expect the graph to exhibit some particular configuration such as two overlapping cliques. The examination of the constraint graph, shown at the center of Fig. 5, refutes our two-cliques hypothesis but reveals the existence of a few large cycles as well as many cycles of size three or more. Previous work argued that the presence of cycles can be detrimental to the effectiveness of constraint propagation and showed how triangulation of the constraint network allows us to remedy the situation [43, 44, 47]. With this insight, we consider enforcing, during search, Partial Path Consistency (PPC) instead of GAC because the algorithm for PPC operates on existing triangles and on triangulated cycles of the constraint network [8]. Because PPC is too expensive to enforce during search, Woodward proposed a computationally competitive algorithm to enforce a relaxed version of Directional Partial Path Consistency (DPPC\(^+\)) [45]. By enforcing a strong consistency along cycles, search is able to detect the inconsistency of mug100-25-3 and terminates in less than 20 s. The corresponding history of search, shown in Fig. 6b, exhibits a single regime.

Figure 5 (right) shows the BpD of search while enforcing DPPC\(^+\) on the mug100-25-3 instance. This chart shows a peak of 13536 backtracks at depth 34. Table 1 compares the cost of search with GAC and with DPPC\(^+\). The sign ‘>’ indicates that search does not terminate within the allotted two hours.

Table 1. Performance of search (dom/wdeg) on mug100-25-3: GAC versus DPPC\(^+\)

Above, we showed how the visualizations and regimes of search behavior allow us to form a hypothesis about the issues encountered by search and attempt an alternative technique that is usually avoided because of its cost, culminating in successfully solving a difficult instance.

4 Visualizing Variable Ordering

Heuristics for variable ordering remain an active area of research in Constraint Programming. Currently, dom/wdeg [10] is considered the most effective general-purpose heuristic. When running experiments on benchmark problems, we encountered a puzzling case where search did not terminate when using dom/wdeg but safely terminated, determining inconsistency, when using dom/deg. We propose variable-instantiation per depth (VIpD) as a visualization to summarize the behavior of a variable ordering heuristic.

4.1 Variable-Instantiation per Depth (VIpD)

We denote by I(v, d, t) the number of times variable v is instantiated at a given depth d from the beginning of search until time t. For a variable \(v\in X\), we define the weighted depth:

$$\begin{aligned} d_{w}(v,t) = \frac{\sum _{d\in D} I(v,d,t)\cdot d}{\sum _{d\in D} I(v,d,t)} \end{aligned}$$
(2)

We introduce the VIpD chart at time t as a two-dimensional heatmap of \(I(\cdot ,\cdot ,t)\) using, as the x-axis, the depth of search and, as the y-axis, the variables of the CSP listed by increasing \(d_w(\cdot ,t)\) value (ties broken using a lexicographical ordering of the names of the variables).
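A sketch of this computation follows; here I is assumed to be a nested dictionary mapping each variable to its per-depth instantiation counts \(I(v,d,t)\), and the function returns the row order together with the heatmap matrix.

```python
# Sketch of building the VIpD heatmap data: I[v] maps each depth d to the number
# of instantiations of variable v at d up to time t; rows are ordered by the
# weighted depth of Eq. (2), ties broken lexicographically by variable name.

def weighted_depth(row):
    """d_w(v, t) for one variable, given row: dict depth -> instantiation count."""
    total = sum(row.values())
    return sum(d * c for d, c in row.items()) / total if total else 0.0

def vipd_matrix(I, d_max):
    order = sorted(I, key=lambda v: (weighted_depth(I[v]), v))
    heatmap = [[I[v].get(d, 0) for d in range(1, d_max + 1)] for v in order]
    return order, heatmap
```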

Table 2. Performance of search on mug100-1-3 using POAC

4.2 Illustrative Example

We try to solve the instance mug100-1-3 [50] of a graph coloring problem using GAC, but searches with dom/wdeg and with dom/deg fail to terminate within two hours of wall-clock time. In a second attempt, we maintain POAC during search; search with dom/wdeg did not terminate but, with dom/deg, it terminated after 47 min and determined that the instance is inconsistent. Table 2 compares the performance of POAC with dom/wdeg and dom/deg. The sign ‘>’ indicates that search did not terminate within two hours.

The comparison of the VIpD of POAC with dom/wdeg and that with dom/deg reveals an erratic behavior of dom/wdeg (Fig. 7). Indeed, dom/wdeg, which learns from domain wipeouts, is unable to determine the most conflicting variables on which it should focus. Instead, it continually instantiates a large number of variables over a wide range of depth levels in search. In contrast, dom/deg exhibits a more stable, less chaotic behavior. It focuses on the variables that yield inconsistency and is able to successfully terminate. The VIpD charts are also available as an Excel file where the user may analyze individual cells to examine their \(I(\cdot ,\cdot ,t)\) values.

Fig. 7. Variable Instantiation per Depth (VIpD) on mug100-1-3. Left: POAC with dom/wdeg after two hours. Right: POAC with dom/deg after 47 min.

Comparing the histories alongside the VIpD charts (see Fig. 8), we see clearly that dom/wdeg is erratic and chaotic from the beginning of search, whereas dom/deg is able to quickly focus on a relatively localized conflict to determine the inconsistency of the problem instance. The peak of the BpD is sharper for dom/deg than for dom/wdeg, and the VIpD shows that a smaller set of variables is affected by the backtracking with dom/deg than with dom/wdeg. Future work could analyze this information to identify the source of conflict.

Fig. 8. History and VIpD of POAC on mug100-1-3: dom/wdeg (top), dom/deg (bottom)

5 Discussion

We illustrated the usefulness of our approach in three case studies (summarized in Table 3). The first example (Sect. 3.3) compares two search histories to explain the operation and effectiveness of an adaptive technique (perhaps to a student or to a layperson). The same process can be used by a researcher or an application developer to explore the impact of, and adjust, the parameters of a new method. In the second example (Sect. 3.4), search fails because of a transient behavior at a tiny time scale, detected by our automatic regime identification. Without this tool, the user might never notice the issue, let alone explore effective solutions.

Table 3. Illustrative examples: the instances used and the corresponding search history

Beyond the scenarios and the metrics discussed above, we believe that building and comparing search histories are useful to explore, understand, and explain the impact, on the behavior and performance of search, of many advanced techniques such as no-good/clause learning, restart strategies, consistency algorithms, variable ordering heuristics, as well as the design of new such algorithms.

We strongly believe that our approach is beneficial in the context of existing constraint solvers such as Choco [12] and Gecode [35]. As stated above, preliminary tests on GAC and POAC [32] on Choco have shown BpD and CpD line charts with shapes similar to those we see with k-way branching. Further, propagators in such solvers are often implemented on the individual constraints themselves, which makes the accounting of the calls to the various types of consistencies particularly straightforward.

The SAT community is another one that could benefit from our work. In SAT solvers, inprocessing (in the form of the application of the resolution rule) interleaves search and inference steps [21, 49]. Resolution is allocated a fixed percentage of the CPU time (e.g., 10%) and sometimes its effectiveness is monitored for early termination. We believe that inference should be targeted at the ‘areas’ where search is struggling rather than following a predetermined and fixed effort allocation. We claim that understanding where search struggles and how that struggle changes can be used to identify where inference is best invested.

While our work does not generate verbal explanations, we claim that the graphical tools directly ‘speak’ to a user’s intuitions.

6 Conclusions

In this paper, we presented a summarization technique and a visualization to allow the user to understand search behavior and performance on a given problem instance.

Currently, our approach provides a ‘post-mortem’ analysis of search, but our ultimate goal is to provide an ‘in-vivo’ analysis and allow the user to intervene in, and guide, the search process, trying alternatives and mixing strategies while observing their effects on the effectiveness of problem solving. A significantly more ambitious objective is to use the tools discussed in this paper as richer representations than mere numerical values to automatically guide search [31].

Future work includes the development of animation techniques based on the proposed visualizations.