Conflict history based heuristic for constraint satisfaction problem solving

Habet, Djamal; Terrioux, Cyril

doi:10.1007/s10732-021-09475-z

Conflict history based heuristic for constraint satisfaction problem solving

Published: 30 June 2021

Volume 27, pages 951–990, (2021)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal of Heuristics Aims and scope Submit manuscript

Conflict history based heuristic for constraint satisfaction problem solving

Download PDF

Djamal Habet¹ &
Cyril Terrioux¹

388 Accesses
3 Citations
1 Altmetric
Explore all metrics

Abstract

The variable ordering heuristic is an important module in algorithms dedicated to solve Constraint Satisfaction Problems (CSP), while it impacts the efficiency of exploring the search space and the size of the search tree. It also exploits, often implicitly, the structure of the instances. In this paper, we propose Conflict-History Search (CHS), a dynamic and adaptive variable ordering heuristic for CSP solving. It is based on the search failures and considers the temporality of these failures throughout the solving steps. The exponential recency weighted average is used to estimate the evolution of the hardness of constraints throughout the search. The experimental evaluation on XCSP3 instances shows that integrating CHS to solvers based on MAC (Maintaining Arc Consistency) and BTD (Backtracking with Tree Decomposition) achieves competitive results and improvements compared to the state-of-the-art heuristics. Beyond the decision problem, we show empirically that the solving of the constraint optimization problem (COP) can also take advantage of this heuristic.

Profound Degree: A Conservative Heuristic to Repair Dynamic CSPs

Using conflict and support counts for variable and value ordering in CSPs

Article 06 December 2017

Adapting Consistency in Constraint Solving

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The Constraint Satisfaction Problem (CSP) is an important formalism in Artificial Intelligence (AI) which allows to model and efficiently solve problems that occur in various fields, both academic and industrial (e.g. Cabon et al. 1999; Holland and O’Sullivan 2005; Rossi et al. 2006; Simonin et al. 2015). A CSP instance is defined on a set of variables, which must be assigned in their respective finite domains. Variable assignments must satisfy a set of constraints, which express restrictions on assignments. A solution is an assignment of each variable, which satisfies all constraints.

CSP solving is often based on backtracking algorithms. In recent years, it has made significant progress thanks to research on several aspects. In particular, considerable effort is devoted to global constraints, filtering techniques, learning and restarts (Rossi et al. 2006). An important component in CSP solvers is the variable ordering heuristic. Indeed, the corresponding heuristics define, statically or dynamically, the order in which the variables will be assigned and, thus, the way that the search space will be explored and the size of the search tree. The problem of finding the best variable to assign (i.e. one which minimizes the search tree size) is NP-Hard (Liberatore 2000).

Many heuristics have been proposed (e.g. Bessière et al. 2001; Bessière and Régin 1996; Boussemart et al. 2004; Brélaz 1979; Geelen 1992; Golomb and Baumert 1965; Hebrard and Siala 2017; Michel and Hentenryck 2012; Refalo 2004) aiming mainly to satisfy the first-fail principle (Haralick and Elliot 1980) which advises “to succeed, try first where you are likely to fail”. Nowadays, the most efficient heuristics are adaptive and dynamic (Boussemart et al. 2004; Geelen 1992; Hebrard and Siala 2017; Michel and Hentenryck 2012; Refalo 2004), where the variable ordering is defined according to the collected information since the beginning of the search. For instance, some heuristics consider the effect of filtering when decisions and propagations are applied (Michel and Hentenryck 2012; Refalo 2004). dom/wdeg is one of the simplest, the most used and efficient variable ordering heuristic (Boussemart et al. 2004). It is based on the hardness of constraints and, more specifically, reflects how often a constraint fails. It uses a weighting process to focus on the variables appearing in constraints with high weights which are assumed to be hard to satisfy. In addition, some heuristics, such as LC (Lecoutre et al. 2006) and COS (Gay et al. 2015), attempt to consider the search history while they require the use of auxiliary heuristics.

In this paper, we propose Conflict-History Search (CHS), a new dynamic and adaptive variable ordering heuristic for CSP solving. It is based on the history of search failures, which happen as soon as a domain of a variable is emptied after constraint propagations. The goal is to reward the scores of constraints that have recently been involved in conflicts and therefore to favor the variables appearing in these constraints. The scores of constraints are estimated on the basis of the exponential recency weighted average technique, which comes from reinforcement learning (Sutton and Barto 1998). It was also recently used in defining powerful branching heuristics for solving the satisfiability problem (SAT) (Liang et al. 2016a, b). We have integrated CHS in solvers based on MAC (Maintaining Arc Consistency) (Sabin and Freuder 1994) and BTD (Backtracking with Tree-Decomposition) (Jégou and Terrioux 2003). The empirical evaluation on XCSP3 instances^{Footnote 1} shows that CHS is competitive and brings improvements to the state-of-the-art heuristics. In addition, this evaluation provides an extensive study of the performance of state-of-the-art search heuristics on more than 12,000 instances. Finally, we also study, from a practical viewpoint, the benefits of the proposed heuristic for solving constraint optimization problems (COP).

The paper is structured as follows. Section 2 includes some necessary definitions and notations. Section 3 presents and details our contribution, the CHS variable ordering heuristic. Section 4 describes related work on variable ordering heuristics for CSP and on branching heuristics for the satisfiability problem. CHS is evaluated experimentally and compared to the main powerful heuristics of the state-of-the-art on CSP instances in Sect. 5 and on COP ones in Sect. 6. Finally, we conclude and give some perspectives on extending the application of CHS.

2 Preliminaries

This section is dedicated to the definition of CSP and Exponential Recency Weighted Average, which we use to propose our heuristic.

2.1 Constraint satisfaction problem

An instance of a Constraint Satisfaction Problem (CSP) is given by a triple (X, D, C), such that: $X = \{x_1, \ldots ,x_n\}$ is a set of n variables, $D = \{D_{1},\ldots ,D_{n}\}$ is a set of finite domains, and $C = \{c_1, \ldots , c_e\}$ is a set of e constraints. The domain of each variable $x_i$ is $D_i$. Each constraint $c_j$ is defined by its scope $S(c_j)$ and its compatibility relation $R(c_j)$, where $S(c_j) = \{x_{j_1} , \ldots , x_{j_k}\} \subseteq X$ and $R(c_j) \subseteq D_{{j_1}} \times \cdots \times D_{{j_k}}$. The constraint satisfaction problem asks for an assignment of the variables $x_i \in X$ within their respective domains $D_i$ ($1 \le i \le n$) that satisfies each constraint in C. Such consistent assignment is a solution. Checking whether a CSP instance has a solution is NP-complete (Rossi et al. 2006).

In the past decades, many solvers have been proposed for solving CSPs. Generally, from a practical viewpoint, they succeed in solving efficiently a large kind of instances despite of the NP-completeness of the CSP decision problem. In most cases, they rely on optimized backtracking algorithms whose time complexity is at least in $O(e.d^n)$ where d denotes the size of the largest domain. In order to ensure an efficient solving, they commonly exploit jointly several techniques (see Rossi et al. 2006 for more details) among which we can cite:

variable ordering heuristics which aim to guide the search by choosing the next variable to assign (we discuss about some state-of-the-art heuristics in Sect. 4),
constraint learning and non-chronological backtracking which aim to avoid some redundancies during the exploration of the search space,
filtering techniques enforcing some consistency level which aim to simplify the instance by removing some values from domains or tuples from constraint relations which cannot participate to a solution.

For instance, most state-of-the-art solvers maintain some consistency level at each step of the search, like MAC (Maintaining Arc-Consistency Sabin and Freuder 1994) or RFL (Real Full Look-ahead Nadel 1988) do for arc-consistency. This latter turns out to be a relevant tradeoff between the number of removed values and the runtime.

We now recall MAC with more details. During the solving, MAC develops a binary search tree whose nodes correspond to decisions. More precisely, it can make two kinds of decisions: positive decisions $x_i=v_i$ which assign the value $v_i$ to the variable $x_i$ and negative decisions $x_i \ne v_i$ which ensure that $x_i$ cannot be assigned with $v_i$. Let us consider $\varSigma =\langle \delta _1,\ldots ,\delta _i\rangle $ (where each $\delta _j$ may be a positive or negative decision) as the current decision sequence. At each node of the search tree, MAC takes either a positive decision or negative one. When reaching a new level, it starts by a positive decision which requires to choose a variable among the unassigned variables and a value. Both choices are achieved thanks to heuristics. Then, once the decision made, MAC applies an arc-consistency filtering. This filtering deletes some values of unassigned variables which are not consistent with the last taken decision and $\varSigma $. By so doing, a domain may become empty. In such a case, we say that a dead-end or a conflict occurs. This means that the current set of decisions cannot lead to a solution. If no dead-end occurs, the search goes on to the next level by choosing a new positive decision. Otherwise, the current decision is called into question. If it is a positive decision $x_i = v_i$, MAC makes the corresponding negative decision $x_i \ne v_i$, that is the value $v_i$ is deleted from the domain $D_{i}$. Otherwise, it is a negative decision and MAC backtracks to the last positive decision $x_\ell = v_\ell $ in $\varSigma $ and makes the decision $x_\ell \ne v_\ell $. If no such decision exists, it means that the instance has no solution. In contrast, if MAC succeeds in assigning all the variables, the corresponding assignment is, by construction, a solution of the considered instance.

More recently, restart techniques have been introduced in the CSP framework (e.g. in Lecoutre et al. 2007). They generally allow to reduce the impact of bad choices performed thanks to heuristics (like the variable ordering heuristic) or of the occurrence of heavy-tailed phenomena (Gomes et al. 2000). For efficiency reasons, they are usually exploited with some learning techniques like recording of nld-nogoods in Lecoutre et al. (2007). These nogoods can be seen as a set of decisions which cannot be extended to a solution. They are used to avoid visiting again a part of the search space which has already been visited by MAC. These nogoods are recorded each time a restart occurs.

2.2 Exponential recency weighted average

Given a time series of m numbers $y=(y_1, y_2, \ldots , y_m)$, the simple average of y is $\sum _{i=1}^{m}\frac{1}{m}y_i$ where each $y_i$ has the same weight $\frac{1}{m}$. There are situations where recent data are more relevant than old data to describe the current situation. The Exponential Recency Weighted Average (ERWA) (Sutton and Barto 1998) takes into account such considerations by giving higher weights to the recent data than the older ones. More precisely, the exponential moving average ${\bar{y}}_m$ is computed as follows:

$$\begin{aligned} {\bar{y}}_m = \sum _{i=1}^{m} \alpha \times (1-\alpha )^{m-i}\times y_i \end{aligned}$$

where $0<\alpha <1$ is a step-size parameter which controls the relative weights between recent and past data. The moving average can also be calculated incrementally by the formula:

$$\begin{aligned} {\bar{y}}_{m} = (1-\alpha )\times {\bar{y}}_{m-1} + \alpha \times {y}_{m}. \end{aligned}$$

The base case is ${\bar{y}}_0=0$. ERWA is used to solve the bandit problem to estimate the expected reward of different actions in nonstationary environments (Sutton and Barto 1998). In bandit problems, the agent must select an action to play, from a given set of actions, while maximizing its long term expected reward.

3 Conflict-history search for CSP

This section is dedicated to our contribution by defining and describing a new variable ordering heuristic for CSP solving, which we call Conflict-History Search (CHS). The main idea is to consider the history of constraint failures and favor the variables that often appear in recent failures. In this order, the conflicts are dated and the constraints are weighted on the basis of the exponential recency weighted average. These weights are coupled with the variable domains to calculate the Conflict-History scores of the variables.

3.1 CHS description

Formally, CHS maintains for each constraint $c_j$ a score $q(c_j)$ which is initialized to 0 at the beginning of the search. If $c_j$ leads to a failure during the search because the domain of a variable in $S(c_j)$ is emptied then $q(c_j)$ is updated by the formula below derived from ERWA (Sutton and Barto 1998):

$$\begin{aligned} q(c_j) = (1-\alpha ) \times q(c_j) + \alpha \times r(c_j) \end{aligned}$$

The parameter $0<\alpha <1$ is the step-size and $r(c_j)$ is the reward value. The parameter $\alpha $ fixes the importance given to the old value of q at the expense of the reward r. The value of $\alpha $ decreases over time as it is applied in reinforcement learning to converge towards relevant values of q (Sutton and Barto 1998). In other words, decreasing the value of $\alpha $ amounts to giving more importance to the last value of q and considering that the values of q are more and more relevant as the search progresses. Furthermore, we are interested by the constraint failure to follow the first-fail principle (Haralick and Elliot 1980).

CHS applies the decreasing policy of $\alpha $, which is successfully used for designing efficient branching heuristic for the satisfiability problem (Liang et al. 2016a, b). More precisely, starting from an initial value $\alpha _0$, $\alpha $ decreases by $10^{-6}$ at each constraint failure to a minimum of 0.06. This minimum value of $\alpha $ controls the number of steps before considering that a convergence is reached.

The reward value $r(c_j)$ is based on how recently $c_j$ occurred in conflicts. More precisely, it relies on the proximity between the previous conflict in which $c_j$ is involved and the current one. By so doing, we aim to give a higher reward to constraints that fail regularly over short periods of time during the search space exploration. The reward value is calculated according to the formula:

$$\begin{aligned} r(c_j)=\frac{1}{Conflicts-Conflict(c_j)+1} \end{aligned}$$

Initialized to 0, Conflicts is the number of conflicts which have occurred since the beginning of the search. $Conflict(c_j)$ is also initialized to 0 for each constraint $c_j \in C$. When a conflict occurs on $c_j$, $r(c_j)$ and $q(c_j)$ are computed. Then Conflicts is incremented by 1 and $Conflict(c_j)$ is updated to the new value of Conflicts.

At this stage, we define the Conflict-History score of a variable $x_i \in X$ as follows:

$$\begin{aligned} chv(x_i) =\frac{\sum \nolimits _{c_j \in C:\ x_i \in S(c_j) \wedge |Uvars({S(c_j)})| >1} q(c_j)}{|D_i|} \end{aligned}$$

(1)

$Uvars({Y})$ is the set of unassigned variables in ${Y}$. $D_i$ is the current domain of $x_i$ and its size may be reduced by the propagation process in the current step of the search. CHS chooses the variable to assign with the highest chv value. In this manner, CHS focuses branching on the variables with a small domain size belonging to constraints which appear recently and repetitively in conflicts.

One can observe that at the beginning of the search, all the variables have the same score, which is equal to 0. To avoid random selection, we update Eq. 1 to calculate chv as given below, where $\delta $ is a positive real number close to 0.

$$\begin{aligned} chv(x_i) =\frac{\sum \nolimits _{c_j \in C: \ x_i \in S(c_j) \wedge |Uvars({S(c_j)})| >1} (q(c_j)+\delta )}{|D_i|} \end{aligned}$$

(2)

Thus, when the search starts, the branching will be oriented according to the degree of the variables without having a negative influence on the ERWA-based calculation later in the search. CHS selects the branching variable with the highest chv value calculated according to Eq. 2.

The heuristic CHS is described in Algorithm 1 with an event-driven approach. Lines 2–7 correspond to the initialization step. If a conflict occurs when enforcing the filtering with the constraint $c_j$, the associated event is triggered and the score is update (Lines 8–14). The selection of a new variable is achieved thanks to Lines 15–16.

3.2 CHS and restarts

Restart techniques are known to be important for the efficiency of solving algorithms (see for example Lecoutre et al. 2007). Restarts may allow to reduce the impact of irrelevant choices done during the search according to heuristics, such as variable selection.

As it will be detailed later, CHS is integrated into CSP solving algorithms, which include restarts. In the corresponding implementations, the $Conflict(c_j)$ value of each constraint $c_j$ is not reinitialized when a restart occurs. It is the same for $q(c_j)$. However, a smoothing may be applied and will be explained below. Keeping this information unchanged reinforces learning from the search history.

Concerning the step-size $\alpha $, which defines the importance given to the old value of $q(c_j)$ at the expense of the reward $r(c_j)$, CHS reinitializes the value of $\alpha $ to $\alpha _0$ at each restart (Line 18 of Algorithm 1). This may guide the search through different parts of the search space.

3.3 CHS and smoothing

At each conflict, CHS updates the chv score of one constraint at a time: the constraint $c_j$ which is used to wipe out the domain of a variable in $S(c_j)$. As long as they do not appear in new conflicts, some constraints can have their weights unchanged for several search steps. These constraints may have high scores while their importance does not seem significant for the current part of the search. To avoid this situation, we propose to smooth the scores $q(c_j)$ of all the constraints $c_j \in C$ at each restart by the following formula:

$$\begin{aligned} q(c_j) = q(c_j) \times 0.995^{Conflicts-Conflict(c_j)} \end{aligned}$$

Hence, the scores of constraints are decayed according to the date of their last appearances in conflicts (Lines 19–20 of Algorithm 1).

4 Related work

Before providing a detailed experimental evaluation of CHS and its components, we present the most efficient and common variable ordering heuristics for CSP. As CHS, the recalled heuristics share the same behavior. In effect, the variables and/or constraints are weighted dynamically throughout the search by considering the collected information since its beginning. Some of these heuristics, such as Last Conflict (Lecoutre et al. 2006), require the use of an auxiliary heuristic as it will be explained later. We also recall briefly branching heuristics for the satisfiability problem. It should be recalled that ERWA was first used in the context of the satisfiability problem (Liang et al. 2016a, b).

4.1 Impact-based search (IBS)

This heuristic selects the variable which leads to the largest search space reduction (Refalo 2004). The impact on the search space size is approximated as the reduction of the product of the variable domain sizes. Formally, the impact of assigning the variable $x_i$ to the value $v_i \in D_i$ is defined by:

$$\begin{aligned} I(x_i=v_i) = 1 - \frac{P_{after}}{P_{before}} \end{aligned}$$

$P_{after}$ and $P_{before}$ are respectively the products of the domain cardinalities after and before branching on $x_i=v_i$ and applying constraint propagations. By doing so, selecting the next branching variable requires the computation of the impact of each variable assignment, by simulating filtering at each node of the search tree. This can be very time consuming. Hence, IBS considers the impact of an assignment at a given node as the average of its observed impacts. More precisely, if K is the index set of impacts observed of $x_i=v_i$, IBS estimates an averaged impact of this assignment as follows, where $I_k$ is kth impact value:

$$\begin{aligned} {\bar{I}}(x_i=v_i) = \frac{\sum \nolimits _{k \in K} I_k(x_i=v_i)}{|K|} \end{aligned}$$

Finally, the impact of a variable according to its current domain, which may be filtered, is defined as follows:

$$\begin{aligned} \mathcal{I}(x_i) = \sum \limits _{v\in D_i} 1-{\bar{I}}(x_i=v) \end{aligned}$$

IBS selects the variable with the highest impact value $\mathcal{I}(x_i)$.

4.2 Conflict-driven heuristic

A popular variable ordering heuristic for CSP solving is dom/wdeg (Boussemart et al. 2004). It guides the search towards the variables appearing in the constraints which seem hard to satisfy. For each constraint $c_j$, the dom/wdeg heuristic maintains a weight $w(c_j)$, initially set to 1, counting the number of times that $c_j$ has led to a failure (i.e. the domain of a variable $x_i$ in $S(c_j)$ is emptied during propagation from $c_j$). The weighted degree of a variable $x_i$ is defined as:

$$\begin{aligned} wdeg(x_i)= \sum \limits _{c_j \in C:\ x_i \in S(c_j) \wedge |Uvars({S(c_j)})| >1} w(c_j) \end{aligned}$$

The dom/wdeg heuristic selects the variable $x_i$ to assign with the smallest ratio $|D_i| / wdeg(x_i)$, such that $D_i$ is the current domain of $x_i$ (the size of $D_i$ may be reduced in the current search step). Note that the constraint weights are not smoothed in dom/wdeg. Also, variants of dom/wdeg were introduced, such as in Hebrard and Siala (2017), but are not widely used in practice. Very recently, a refined version of wdeg (called $wdeg^{ca.cd}$) has been defined in Wattez et al. (2019). When a conflict occurs for a constraint $c_j$, instead of increasing its weight by 1 as in dom/wdeg, $wdeg^{ca.cd}$ increases its weight by a value depending on the number of unassigned variables in the scope of $c_j$ and their current domain size.

4.3 Activity-based heuristic (ABS)

ABS is motivated by the prominent role of filtering techniques in CSP solving (Michel and Hentenryck 2012). It exploits this filtering information and maintains measures of how often the variable domains are reduced during the search. In practice, at each node of the search tree, constraint propagation may filter the domains of some variables after the decision process. Let $X_f$ be the set of such variables. Accordingly, the activities $A(x_i)$, initially set to 0, of the variables $x_i \in X$ are updated as follows:

$A(x_i) = A(x_i)+1$ if $x_i \in X_f$
$A(x_i) =\gamma \times A(x_i)$ if $x_i \not \in X_f$

$\gamma $ is a decay parameter, such that $0 \le \gamma \le 1$. The ABS heuristic selects the variable $x_i$ with the highest ratio $A(x_i)/|D_i|$.

4.4 CHB in gecode

Dedicated to constraint programming, Gecode solver implements Conflict-History based Branching (CHB) heuristic since version 5.1.0 released in April 2017 (Schulte 2018). It follows the same steps of the first definition of CHB in the context of the satisfiability problem (Liang et al. 2016a, b). In Gecode, the following parameters are used to update the Q-score of each variable $x_i$ of the CSP instance, denoted $qs(x_i)$. f is the number of failures encountered since the beginning of the search and $lf(x_i)$ is the last failure number of $x_i$, corresponding to the last time that $D_i$ is emptied.

Initialized to 0.05 for each variable $x_i$, CHB update the Q-score $qs(x_i)$ of $x_i$ during the constraint propagation as follows:

If $D_i$ is not reduced then $qs(x_i)$ remains unchanged
If $D_i$ is pruned and the search leads to a failure, $lf(x_i)$ is set to f and $qs(x_i)$ is updated by:
$$\begin{aligned} qs(x_i)=(1-\alpha )\times qs(x_i) + \alpha \times r \end{aligned}$$
The step-size $\alpha $, initialized to 0.4, is updated to $\alpha -10^{-6}$ if $\alpha > 0.06$. The value of the reward r is given by:
$$\begin{aligned} r=\frac{1}{f-lf(x_i)+1} \end{aligned}$$
If $D_i$ is pruned and the search does not lead to a failure, $qs(x_i)$ is also updated by:
$$\begin{aligned} qs(x_i)=(1-\alpha )\times qs(x_i) + \alpha \times r \end{aligned}$$
In this case, the reward value is defined by:
$$\begin{aligned} r=\frac{0.9}{f-lf(x_i)+1} \end{aligned}$$

CHB in Gecode selects the variable with the highest Q-score.

4.5 Last conflict (LC)

Last Conflict (LC) reasoning (Lecoutre et al. 2006) aims to better identify and exploit nogoods in a binary tree search, where each node has a first branch corresponding to a positive decision ($x_i = v_i$) and eventually a second branch with a negative decision ($x_i \ne v_i$).

If a positive decision $x_i = v_i$ leads to a conflict then LC records the variable $x_i$ as a conflicting variable. The value $v_i$ is removed from the domain $D_i$ of $x_i$. After developing the negative branch $x_i \ne v_i$, LC continues the search by assigning a new value $v'_i$ to $x_i$ instead of choosing a new decision variable. This treatment is repeated until a successful assignment of $x_i$ is achieved. In this case, the variable $x_i$ is unrecorded as a conflicting one and the next decision variable is decided by an auxiliary variable ordering heuristic. Hence, this last one is used when no conflicting variable is recorded by LC.

4.6 Conflict order search (COS)

Conflict Order Search (COS) (Gay et al. 2015) is intended to focus the search on the variables which lead to recent conflicts. When a branching on a variable $x_i$ fails, $x_i$ is stamped by the total number of failures since the beginning of the search (the initial stamp value of each variable is 0). COS prefers the variable with the highest stamp value. An auxiliary heuristic is used if all the unassigned variables have the stamp value 0.

4.7 Branching heuristics for the satisfiability problem

In the context of the satisfiability problem, modern solvers based on Conflict-Driven Clause Learning (CDCL) (Eén and Sörensson 2003; Marques-Silva and Sakallah 1999; Moskewicz et al. 2001) employ variable branching heuristics correlated to the ability of the variable to participate in producing learnt clauses when conflicts arise (a conflict is a clause falsification). The Variable State Independent Decaying Sum (VSIDS) heuristic (Moskewicz et al. 2001) maintains an activity value for each Boolean variable. The activities are modified by two operations: the bump (increase the activity of variables appearing in the process of generating a new learnt clause when a conflict is analyzed) and the multiplicative decay of the activities (often applied at each conflict). VSIDS selects the variable with the highest activity to branch on.

Recently, a conflict history based branching heuristic (CHB) (Liang et al. 2016a), based on the exponential recency weighted average, was introduced. It rewards the activities to favor the variables that were recently assigned by decision or propagation. The rewards are higher if a conflict is discovered. The Learning Rate Branching (LRB) heuristic (Liang et al. 2016b) extends CHB by exploiting locality and introducing the learning rate of the variables.

4.8 Discussion

Reinforcement learning techniques have already been studied in constraint programming. The multi-armed bandit framework is used to select adaptively the consistency level of propagation at each node of the search tree (Balafrej et al. 2015). A linear regression method is used to learn the scoring function of value heuristics (Chu and Stuckey 2015). Rewards are calculated and used to select adaptively the backtracking strategy (Bachiri et al. 2015). Learning process based on Least Squares Policy Iteration technique is used to tune adaptively the parameters of stochastic local search algorithms (Battiti and Campigotto 2012).

More recently, upper confidence bound and Thompson Sampling techniques are employed to select automatically a variable ordering heuristic for CSP, among a set of candidate ones, at each node of the search tree (Xia and Yap 2018). The considered candidate set contains notably IBS, ABS and dom/wdeg. Knowing that no heuristic always outperforms another, Xia and Yap exploit reinforcement learning (under the form of a multi-armed bandit) to choose the search heuristic to employ at each node of the search rather than choosing a particular heuristic before the solving. More recently, Wattez et al. have proposed another MAB approach (Wattez et al. 2020). Like in the work of Xia and Yap, each heuristic corresponds to an arm. In contrast, an new arm is chosen at each restart instead of each node. On the other hand, in CHS, reinforcement learning allows to select the branching variable based on ERWA. Note also that CHS can be used as an additional arm in the work of Xia and Yap while it is already exploited as an arm in Wattez et al. (2020).

To return to the heuristics detailed in this section, LC, COS and CHB are also conceptually interested in the search history as CHS. They act directly on the variable scores while CHS considers this history by weighting the constraints that are responsible for failures before scoring the variables. As an illustration, CHB in Gecode updates the Q-score values of variables according to ERWA while CHS uses ERWA to update the weight of constraints to calculate the score of the variables. The update of the $\alpha $ parameter is also different between CHS and CHB, especially during restarts.

Weight and score decaying is also used in other heuristics such as ABS. However, it is applied to the score of the variables and not that of the constraints such as in CHS. It is also important to note that there is no decaying in CHB. Furthermore, CHS and dom/wdeg calculate differently the score of the constraints leading to failures. In the first case, the score of the constraint is always incremented by a constant value 1. In the second case, the new score is a tradeoff between the current one and the reward that varies at each failure. Moreover, the scores of constraints are not decayed in dom/wdeg contrary to CHS. Finally, unlike LC and COS, CHS does not require the use of an auxiliary heuristic.

5 Experimental evaluation on CSP instances

This section is devoted to the evaluation of the behavior of our heuristic when solving CSP instances (decision problem). We first describe the experimental protocol we use. In Sect. 5.2, we assess the sensitivity of our heuristic CHS to its parameters and the benefits of smoothing and resetting. Afterwards, we compare CHS with state-of-the-art variable ordering heuristics in Sect. 5.3, before studying the behavior of CHS when it is used jointly with LC or COS in Sect. 5.4. Finally, in Sect. 5.5, we evaluate the practical interest of CHS in the particular case where the search is guided by a tree-decomposition.

5.1 Experimental protocol

We consider all the CSP instances from the XCSP3 repository^{Footnote 2} and the XCSP3 competition 2018,^{Footnote 3} resulting in 16,947 instances. XCSP3, for XML-CSP version 3, is an XML-based format to represent instances of combinatorial constrained problems. Our solvers are compliant with the rules of the competition except that the global constraints cumulative, circuit and some variants of the allDifferent constraint (namely except and list) or the noOverlap constraint are not supported yet. Consequently, from the 16,947 obtained instances, we first discard 1233 unsupported instances. We also remove 2813 instances which are detected as inconsistent by the initial arc-consistency preprocessing and having no interest for the present comparison. Finally, we have noted that some instances appear more than once. In such a case, we keep only one copy. In the end, our benchmark contains 12,829 instances, including notably structured instances and instances with global constraints.

Regarding the solving step, we exploit MAC with restarts (Lecoutre et al. 2007) before assessing the impact of our approach on a structural solving method, namely BTD-MAC+RST+Merge (Jégou et al. 2016). Roughly speaking, BTD-MAC+RST+Merge differs from MAC by the exploitation of the structure via the notion of tree-decomposition (i.e. a collection of subsets of variables, called clusters, which are arranged in the form of a tree Robertson and Seymour 1986). While the search performed by MAC considers at each step all the remaining variables, one performed by BTD-MAC+RST+Merge only takes into account the unassigned variables of the current cluster. The clusters of the computed tree-decomposition are processed according to a depth-first traversal of the tree-decomposition starting from a cluster called the root cluster (see Jégou et al. 2016 for more details). For BTD-MAC+RST+Merge, the tree-decompositions are computed with the heuristic H$_5$-TD-WT (Jégou et al. 2016). The first root cluster is the cluster having the maximum ratio number of constraints to its size minus one. At each restart, the selected root cluster is one which maximizes the sum of the weights of the constraints whose scope intersects the cluster. The merging heuristic is the one provided in Jégou et al. (2016). Note that these settings except the variable ordering heuristic correspond to those used for the XCSP3 competitions 2017 and 2018 (Habet et al. 2018; Jégou et al. 2017, 2018).

MAC and BTD-MAC+RST+Merge use a geometric restart strategy based on the number of backtracks with an initial cutoff set to 100 and an increasing factor set to 1.1. In order to make the comparison fair, the lexicographic ordering is used for the choice of the next value to assign. We consider the following heuristics dom/wdeg, $wdeg^{ca.cd}$, ABS, IBS and CHB as implemented in Gecode. For ABS, we fix the decay parameter $\gamma $ to 0.999 as in Michel and Hentenryck (2012). Note that we do not exploit a probing step like one mentioned in Michel and Hentenryck (2012). So all the weights are initially set to 0. For CHB, we use the value parameters as given in Schulte (2018). We also introduce a new variant dom/wdeg+s which we define as dom/wdeg where the weights of constraints are smoothed at each restart, exactly as in CHS. For all the heuristics, ties (if any) are broken by using the lexicographic ordering.

We have written our own C++ code to implement all the compared variable ordering heuristics in this section, as well as the solvers that exploit them (MAC and BTD). By so doing, we avoid any bias related to the way the heuristics and solvers are implemented. In particular, the variable ordering heuristics are all implemented with equal refinement and care. Moreover, when comparing the variable ordering heuristics for a given solver, the only thing which differs is the variable ordering heuristic. Indeed, we use exactly the same propagators, the same value heuristic, etc. This ensures that we make a fair comparison. Finally, given a solver and a CSP instance, we consider that a variable ordering heuristic $h_1$ is better than another one $h_2$ if $h_1$ allows the solver to solve the instance faster than $h_2$. Indeed, the aim of variable ordering heuristic is to make a good tradeoff between the size of the explored search tree and the runtime spent for choosing a relevant variable (remember that finding the best one is an NP-Hard task Liberatore 2000). Since all the other parts of the solver are identical, the solving runtime turns to be a relevant measure of the quality of this tradeoff. Thus, when the comparison relies on a collection of instances, $h_1$ is said better than $h_2$ if it leads the solver to solve more instances than $h_2$. If both lead to solve the same number of instance, ties are broken by considering the smaller cumulative runtime. At the end, note that our protocol is consistent with the recommendations outlined in Hooker (1995).

The experiments are performed on Dell PowerEdge R440 servers with Intel Xeon Silver 4112 processors (clocked at 2.6 GHz) under Ubuntu 18.04. Each solving process is allocated a slot of 30 minutes and at most 16 GB of memory per instance. In the following tables, #solved (abbreviated sometimes #solv.) denotes the number of solved instances for a given solver and time is the cumulative runtime, i.e. the sum of the runtime over all the considered instances.

5.2 Impact of CHS settings

In this part, we assess the sensitivity of CHS with respect to the chosen values for $\alpha _0$ or $\delta $. First, we observe the impact of $\alpha _0$ value. Hence, we fix $\delta $ to $10^{-4}$ to start the search by considering the variable degrees then quickly exploit ERWA-based computation. We then vary the value of $\alpha _0$.

Table 1 Number of instances solved by MAC+CHS depending on the value of $\alpha _0$ (between 0.1 and 0.9) for consistent instances (SAT), inconsistent ones (UNSAT), and all the instances (ALL) and the cumulative runtime (in hours) of MAC+CHS for all the instances

Conflict history based heuristic for constraint satisfaction problem solving

Abstract

Similar content being viewed by others

Profound Degree: A Conservative Heuristic to Repair Dynamic CSPs

Using conflict and support counts for variable and value ordering in CSPs

Adapting Consistency in Constraint Solving

Explore related subjects

1 Introduction

2 Preliminaries

2.1 Constraint satisfaction problem

2.2 Exponential recency weighted average

3 Conflict-history search for CSP

3.1 CHS description

3.2 CHS and restarts

3.3 CHS and smoothing

4 Related work

4.1 Impact-based search (IBS)

4.2 Conflict-driven heuristic

4.3 Activity-based heuristic (ABS)

4.4 CHB in gecode

4.5 Last conflict (LC)

4.6 Conflict order search (COS)

4.7 Branching heuristics for the satisfiability problem

4.8 Discussion

5 Experimental evaluation on CSP instances

5.1 Experimental protocol

5.2 Impact of CHS settings

5.3 CHS versus other search heuristics

5.4 Combination with LC and COS

5.5 CHS and tree-decomposition

6 Experimental evaluation on COP instances

6.1 Experimental protocol

6.2 Impact of CHS settings

6.3 CHS versus other search heuristics

7 Conclusion

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation