
1 Introduction

Dealing with a combinatorial problem often leads to the natural question of computing or estimating its number of solutions. Such a question arises, for instance, in several works on probabilistic reasoning and machine learning [8, 9], or when exploring the structure of the solution space [17]. Counting solutions has indeed been an active research topic in Constraint Programming, in particular on global constraints [13]. Unfortunately, designing an efficient counting algorithm for a specific constraint is as hard as the constraint development itself. Hence, solution counting methods require customized counting algorithms for bounding, or estimating, the number of solutions for each global constraint. We propose here a systematic method to estimate the number of solutions of most of the cardinality constraints.

This article focuses on ten of them: \(\texttt {alldifferent}\), \(\texttt {nvalue}\), \(\texttt {atmostNValues}\), \(\texttt {atleastNValues}\), \(\texttt {occurrence}\), \(\texttt {atmost}\), \(\texttt {atleast}\), \(\texttt {among}\), \(\texttt {uses}\), \(\texttt {disjoint}\). They all constrain the number of occurrences of certain values or the number of different values in a solution. They can be mathematically modelled with bipartite graphs. In [13], the problem of counting solutions for \(\texttt {alldifferent}\) and \(\texttt {gcc}\) is transformed into counting matchings in these graphs. Solving such problems is very hard: they often belong to the #P-complete complexity class. This is why counting-based search, as presented in [13], is not based on exact counting but on estimations or upper bounds. In this article, we introduce a probabilistic approach to compute such an estimation.

In [2], the authors introduce two new global constraints \(\texttt {range}\) and \(\texttt {roots}\), that can be used to specify many cardinality constraints. In other words, for almost every cardinality constraint, there is an equivalent model using only the more primitive \(\texttt {range}\) and \(\texttt {roots}\) constraints (and some arithmetic constraints). This equivalent model is called the decomposition of the initial cardinality constraint. We show how to use the \(\texttt {range}\) and \(\texttt {roots}\) decomposition for counting solutions. More precisely, we develop a probabilistic approach to estimate the number of solutions on a \(\texttt {range}\) and on a \(\texttt {roots}\) constraint and we derive from it a systematic method to estimate the number of solutions on many cardinality constraints. Compared to [13], we obtain an estimation instead of an upper bound, and we propose a method that can be generalized to a large set of cardinality constraints without redesigning a dedicated model.

Outline: The paper is organized as follows. Section 2 gives an introduction to the \(\texttt {range}\) and \(\texttt {roots}\) constraints and some background on the associated bipartite graph model. In Sect. 3, we detail how to count exactly the number of solutions on \(\texttt {range}\) and \(\texttt {roots}\) and then we apply a probabilistic model to develop an estimation of the true number of solutions. In Sect. 4, we give the \(\texttt {range}\) and \(\texttt {roots}\) decomposition and an estimation of the number of solutions for several cardinality constraints, and we synthesize our estimators under a general formula. In Sect. 5, we evaluate our probabilistic estimators within the counting-based strategy \(\texttt {maxSD}\).

2 Preliminaries: Introduction to \(\texttt {range}\) and \(\texttt {roots}\)

Throughout the article, we use the following notation. Let \(X=\lbrace x_1, \ldots , x_n \rbrace \) be the set of variables. For each variable \(x_i \in X\), we denote by \(D_i\) its domain, by \(Y=\bigcup _{i=1}^n D_i= \lbrace y_1, \ldots , y_m \rbrace \) the union of the domains and by \(\mathcal {D}=D_1 \times \ldots \times D_n\) the Cartesian product of the domains. We denote by \(d_i=|D_i|\) the size of the domain of \(x_i\). Given a constraint C on variables X, we write \(\mathcal {S}_{C(X)}\) for the set of solutions of C for X and \(\# C(X)\) for the number of tuples allowed by C for X.

Cardinality constraints restrict the number of occurrences of particular values taken by a set of variables, or the number of values or variables meeting some conditions. Among them, we can list \(\texttt {alldifferent}\), \(\texttt {gcc}\), \(\texttt {nvalue}\), \(\texttt {atleast}\), \(\texttt {atmost}\). We will come back to these constraints and define them properly, one by one, in Sect. 4. Most of the time, these constraints can be modelled with a bipartite graph, in which we look for some mathematical structure, such as a matching.

Fig. 1. Value graph and sub-value graph of Examples 1 and 2.

Definition 1

(Value Graph). Let \(G_{X,Y} =G(X \cup Y, E)\), the graph on nodes \(X\cup Y\), with edges \(E = \lbrace (x_i, y_j) \mid y_j \in D_i\rbrace \). \(G_{X,Y}\) is a bipartite graph representing the domain of each variable. There is an edge between \(x_i\) and \(y_j\) iff \(y_j \in D_i\).

Example 1

Let \(X=\lbrace x_1, x_2, x_3, x_4, x_5\rbrace \) with \(D_1=\lbrace 1,2,4 \rbrace \), \(D_2=\lbrace 2,3 \rbrace \), \(D_3=\lbrace 1,2,3,5 \rbrace \), \(D_4=\lbrace 4,5 \rbrace \) and \(D_5=\lbrace 2,4,5 \rbrace \). We obtain the value graph \(G_{X,Y}\) depicted on Fig. 1a.

We also define the sub-value graph induced by two subsets \(X' \subseteq X\) and \(Y' \subseteq Y\), as the value graph restricted to the considered subset of nodes.

Definition 2

(Sub-Value Graph induced by subsets of X and Y). Let \(G_{X',Y'}=G(X' \cup Y', E)\), the value graph of \(X'\) with \(E = \lbrace (x_i, y_j) \mid y_j \in D'_i = D_i \cap Y' \rbrace \). \(G_{X',Y'}\) is a bipartite graph representing the sub-domain induced by \(Y'\) of each variable. There is an edge between \(x_i\) and \(y_j\) iff \(y_j \in D'_i\).

We will also note \(d_i(Y')=|D_i'|\) the size of the domain of \(x_i\) restricted to the values of \(Y'\). Example 2 illustrates a sub-value graph of the value graph presented in Example 1.

Example 2

Let \(X'=\lbrace x_2, x_3, x_4\rbrace \subseteq X\) and \(Y'=\lbrace 3,5 \rbrace \subseteq Y\). The sub-value graph induced by \(X'\) and \(Y'\) is represented in Fig. 1b.
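Definitions 1 and 2 can be illustrated with a few lines of Python. This is only a minimal sketch: the dictionary encoding of the domains and the helper name `value_graph_edges` are illustrative assumptions, not part of the original model.

```python
# Domains of Example 1, keyed by the variable index i of x_i.
domains = {1: {1, 2, 4}, 2: {2, 3}, 3: {1, 2, 3, 5}, 4: {4, 5}, 5: {2, 4, 5}}


def value_graph_edges(domains, x_sub=None, y_sub=None):
    """Edges (i, y) with y in D_i, optionally restricted to x_sub and y_sub."""
    xs = set(domains) if x_sub is None else set(x_sub)
    return {
        (i, y)
        for i in xs
        for y in domains[i]
        if y_sub is None or y in y_sub
    }


# Value graph of Example 1 and sub-value graph of Example 2.
full_edges = value_graph_edges(domains)
sub_edges = value_graph_edges(domains, x_sub={2, 3, 4}, y_sub={3, 5})
print(len(full_edges))    # number of edges, i.e. the sum of the d_i
print(sorted(sub_edges))  # edges (i, y) kept in the sub-value graph
```

The number of edges of the full graph equals the sum of the domain sizes, which is the quantity used later to estimate the edge density.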

The \(\texttt {range}\) and \(\texttt {roots}\) constraints [2] are two auxiliary constraints that help decompose many cardinality constraints. In this study, we use these decompositions to count solutions on cardinality constraints. As the authors wrote in [2], “\(\texttt {range}\) captures the notion of image of a function and \(\texttt {roots}\) captures the notion of domain”. In this paper, we use alternative definitions for these constraints, equivalent to those of [2] and better suited to our needs.

Definition 3

(\(\mathtt {range}\)). Let \(X'\subseteq X\) and \(Y' \subseteq Y\). The constraint \(\texttt {range}\) \((X,X',Y')\) holds if the values assigned to the variables of \(X'\) cover exactly \(Y'\) and not more. Formally:

$$\begin{aligned} \mathcal {S}_{\texttt {range}(X,X',Y')} = \lbrace (v_1, \ldots , v_n) \in \mathcal {D}\ |\ \lbrace v_i | x_i \in X' \rbrace = Y' \rbrace \end{aligned}$$
(1)

Definition 4

(\(\mathtt {roots}\)). Let \(X'\subseteq X\) and \(Y' \subseteq Y\). The constraint \(\texttt {roots}\) \((X,X',Y')\) holds if the variables that are assigned to values of \(Y'\) cover exactly \(X'\) and not more. Formally:

$$\begin{aligned} \mathcal {S}_{\texttt {roots}(X,X',Y')} = \lbrace (v_1, \ldots , v_n) \in \mathcal {D}\ |\ \lbrace x_i | v_i \in Y' \rbrace = X' \rbrace \end{aligned}$$
(2)

Example 3

Let us take the value graph given in Example 1 (Fig. 1a).

  • The tuple (2, 2, 3, 4, 5) is allowed by the constraint \(\texttt {range}(X,\lbrace x_1, x_2, x_3\rbrace ,\) \(\lbrace 2,3 \rbrace )\).

  • The tuple (2, 2, 3, 4, 5) is allowed by the constraint \(\texttt {roots}(X,\lbrace x_1, x_2, x_3\rbrace ,\) \(\lbrace 2,3 \rbrace )\).

Note that \(\texttt {range}\) and \(\texttt {roots}\) are not exactly reciprocal because every variable must be assigned to a value, but a value is not necessarily assigned to a variable.
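Definitions 3 and 4 translate directly into set comparisons. The sketch below assumes tuples are given as Python tuples \((v_1, \ldots , v_n)\) that already belong to \(\mathcal {D}\), and that variables are referred to by their index i; the function names are illustrative.

```python
def satisfies_range(tup, x_sub, y_sub):
    """True iff the values taken by the variables x_i, i in x_sub, are exactly y_sub."""
    return {tup[i - 1] for i in x_sub} == set(y_sub)


def satisfies_roots(tup, x_sub, y_sub):
    """True iff the variables taking a value in y_sub are exactly those of x_sub."""
    return {i for i, v in enumerate(tup, start=1) if v in y_sub} == set(x_sub)


x_sub, y_sub = {1, 2, 3}, {2, 3}
# Tuple of Example 3: allowed by both constraints.
print(satisfies_range((2, 2, 3, 4, 5), x_sub, y_sub),
      satisfies_roots((2, 2, 3, 4, 5), x_sub, y_sub))
# Value 3 is not covered: roots still holds, range does not.
print(satisfies_range((2, 2, 2, 4, 5), x_sub, y_sub),
      satisfies_roots((2, 2, 2, 4, 5), x_sub, y_sub))
# x_5 also takes a value of Y': range still holds, roots does not.
print(satisfies_range((2, 2, 3, 4, 2), x_sub, y_sub),
      satisfies_roots((2, 2, 3, 4, 2), x_sub, y_sub))
```

The last two tuples show that neither constraint implies the other on the value graph of Example 1.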

3 Counting Solutions on the \(\texttt {range}\) and \(\texttt {roots}\) Constraints

As developed in [13], counting solutions on cardinality constraints requires a dedicated counting algorithm for each constraint. In this section, we are interested in computing the number of solutions on the \(\texttt {range}\) and the \(\texttt {roots}\) constraints. The idea is then to only use the decomposition of cardinality constraints into these more primitive constraints and to reuse the counting method on \(\texttt {range}\) and \(\texttt {roots}\) to count solutions on cardinality constraints.

3.1 Exact Solutions Counting on \(\texttt {range}\) and \(\texttt {roots}\)

In this subsection, we are interested in computing exactly the number of allowed tuples for a \(\texttt {range}\) constraint and a \(\texttt {roots}\) constraint.

Proposition 1

Let \(X' \subseteq X\) and \(Y' \subseteq Y\). We note \(\overline{X'}\), the complement of \(X'\) in X, such that \(\overline{X'}\cup X'= X\) and \(\overline{X'}\cap X'= \emptyset \). Then, the number of tuples allowed by \(\texttt {range}(X,X',Y')\) is

$$\begin{aligned} \# \texttt {range}(X,X',Y') = \# \texttt {range}(X',X',Y') \cdot \underset{x_i \in \overline{X'}}{\prod } d_i \end{aligned}$$
(3)

Proof

On one side, we must consider every possible assignment for the variables of \(\overline{X'}\), which are not constrained: \(\prod _{x_i \in \overline{X'}} d_i\). On the other side, we must count every tuple allowed for the variables of \(X'\), which are constrained, that is simply \(\# \texttt {range}(X',X',Y')\). The number of tuples is thus the product of these two quantities.    \(\square \)

Proposition 1 reduces the problem of counting allowed tuples for every variable in X to only counting tuples for the constrained variables \(X'\). We thus have reduced the problem to counting the number of allowed tuples in the case where every variable and value is constrained.

Proposition 2

$$\begin{aligned} \# \texttt {range}(X,X,Y) =\underset{x_i \in X}{\prod }d_i - \underset{Y' \subsetneq Y}{\sum } \# \texttt {range}(X,X,Y') \end{aligned}$$
(4)

Proof

Inside \(G_{X,Y}\), we must count every possible assignment of variables of X such that every value of Y is covered. To do that, we first count the number of every possible assignment of variables of X in \(G_{X,Y}\) (without considering the \(\texttt {range}\) constraint):

$$\begin{aligned} \underset{x_i \in X}{\prod }d_i \end{aligned}$$

And then, we subtract the assignments of X such that Y is not fully covered, that is, for every subset \(Y' \subsetneq Y\), the solutions of \(\texttt {range}(X,X,Y')\):

$$\begin{aligned} \underset{Y' \subsetneq Y}{\sum } \# \texttt {range}(X,X,Y') \end{aligned}$$

Indeed, for two different subsets \(Y'_1 \ne Y'_2 \subsetneq Y\), the sets of allowed tuples \(\mathcal {S}_{\texttt {range}(X,X, Y'_1)}\) and \(\mathcal {S}_{\texttt {range}(X,X, Y'_2)}\) are necessarily disjoint: there is a value \(y_j \in Y\) such that \(y_j \in Y'_1\) and \(y_j \notin Y'_2\) (or \(y_j \in Y'_2\) and \(y_j \notin Y'_1\)), so the value \(y_j\) must be assigned to one of the variables of X to satisfy \(\texttt {range}(X,X, Y'_1)\) but none of the variables of X may be assigned to \(y_j\) to satisfy \(\texttt {range}(X,X,Y'_2)\) (or vice-versa). A solution of \(\texttt {range}(X,X,Y'_1)\) cannot be a solution of \(\texttt {range}(X,X,Y'_2)\) and vice-versa. No solution is counted twice in \(\underset{Y' \subsetneq Y}{\sum } \# \texttt {range}(X,X,Y')\). We have:

$$\begin{aligned} \# \texttt {range}(X,X,Y) = \underset{x_i \in X}{\prod }d_i- \underset{Y' \subsetneq Y}{\sum } \# \texttt {range}(X,X,Y') \end{aligned}$$

   \(\square \)
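Propositions 1 and 2 can be checked on Example 1 by comparing a brute-force enumeration with the recursive formula. The sketch below is illustrative only: variables are indexed from 0, the function names are assumptions, and the recursion applies Proposition 2 to the sub-value graph induced by each subset of values, which is exponential in the number of values.

```python
from itertools import combinations, product
from math import prod

# Domains of Example 1, listed for x_1 .. x_5 (indexed 0 .. 4 in the code).
domains = [{1, 2, 4}, {2, 3}, {1, 2, 3, 5}, {4, 5}, {2, 4, 5}]


def count_range_brute(domains, x_sub, y_sub):
    """Enumerate D and count the tuples whose image on x_sub is exactly y_sub."""
    return sum(1 for t in product(*domains) if {t[i] for i in x_sub} == set(y_sub))


def count_range_recursive(domains, x_sub, y_sub):
    """Proposition 1 (free variables) combined with Proposition 2 (recursion)."""

    def constrained(ys):
        # Assignments of x_sub inside ys, minus those covering a proper subset.
        total = prod(len(domains[i] & ys) for i in x_sub)
        for size in range(len(ys)):
            for smaller in combinations(sorted(ys), size):
                total -= constrained(frozenset(smaller))
        return total

    free = prod(len(d) for i, d in enumerate(domains) if i not in x_sub)
    return free * constrained(frozenset(y_sub))


x_sub, y_sub = {0, 1, 2}, {2, 3}
print(count_range_brute(domains, x_sub, y_sub))
print(count_range_recursive(domains, x_sub, y_sub))
```

Both functions return the same count; only the brute-force version depends on the size of \(\mathcal {D}\).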

Remark 1

Proposition 2 can be used in Proposition 1 and we obtain:

$$\begin{aligned} \# \texttt {range}(X,X',Y') = \underset{x_i \in \overline{X'}}{\prod } d_i \cdot \left( \underset{x_i \in X'}{\prod }d_i(Y')- \underset{Y'' \subsetneq Y'}{\sum } \# \texttt {range}(X',X',Y'') \right) \end{aligned}$$

This formula requires recursively summing and evaluating terms over an exponential-size set and is not tractable in practice (we believe that it is a \(\#P\)-complete problem). In the next subsection, we give an approximation which is much faster to compute. We now deal with the \(\texttt {roots}\) constraint.

Proposition 3

Let \(X' \subseteq X\) and \(Y' \subseteq Y\). We note \(\overline{X'}\), the complement of \(X'\) in X and \(\overline{Y'}\) the complement of \(Y'\) in Y. Then, the number of tuples allowed by \(\texttt {roots}(X,X',Y')\) is

$$\begin{aligned} \# \texttt {roots}(X,X',Y') = \underset{x_i \in X'}{\prod } d_i(Y') \cdot \underset{x_i \in \overline{X'}}{\prod } d_i(\overline{Y'}) \end{aligned}$$
(5)

Proof

In order to satisfy \(\texttt {roots}(X,X',Y')\), every variable from \(X'\) must take a value in \(Y'\) and no value from \(Y'\) must be assigned to a variable from \(\overline{X'}\), that is every variable from \(\overline{X'}\) must be assigned to values from \(\overline{Y'}\):

  • \(\prod _{x_i \in X'} d_i(Y')\) represents the number of ways of assigning every variable of \(X'\)

  • \(\prod _{x_i \in \overline{X'}} d_i(\overline{Y'})\) represents the number of ways of assigning every variable of \(\overline{X'}\)    \(\square \)
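The product formula of Proposition 3 can be compared with a brute-force count on Example 1. This is a minimal sketch, with variables indexed from 0 and illustrative function names.

```python
from itertools import product
from math import prod

# Domains of Example 1, listed for x_1 .. x_5 (indexed 0 .. 4 in the code).
domains = [{1, 2, 4}, {2, 3}, {1, 2, 3, 5}, {4, 5}, {2, 4, 5}]


def count_roots_formula(domains, x_sub, y_sub):
    """Eq. (5): product of d_i(Y') over X' times product of d_i(complement of Y') elsewhere."""
    inside = prod(len(domains[i] & y_sub) for i in x_sub)
    outside = prod(len(d - y_sub) for i, d in enumerate(domains) if i not in x_sub)
    return inside * outside


def count_roots_brute(domains, x_sub, y_sub):
    """Enumerate D and keep the tuples whose variables valued in y_sub are exactly x_sub."""
    return sum(
        1
        for t in product(*domains)
        if {i for i, v in enumerate(t) if v in y_sub} == set(x_sub)
    )


x_sub, y_sub = {0, 1, 2}, {2, 3}
print(count_roots_formula(domains, x_sub, y_sub))
print(count_roots_brute(domains, x_sub, y_sub))
```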

The formula given by Proposition 3 can be computed in polynomial time. In practice, the formula depends on the subsets \(X'\) and \(Y'\). Applying the Erdős-Renyi model to \(\texttt {roots}\) allows the estimation of \(\# \texttt {roots}(X,X',Y')\) using only the sizes of \(X'\) and \(Y'\), with linear complexity.

In Sect. 4, we compose these constraints to count solutions on other cardinality constraints.

3.2 Probabilistic Model Applied to \(\texttt {range}\) and \(\texttt {roots}\)

This subsection presents a probabilistic model for cardinality constraints based on the work of Erdős and Renyi in [5]. The idea is to randomize the domain of the variables. Then, we use this model to get a computable estimation of the number of solutions on \(\texttt {range}\) and \(\texttt {roots}\).

Erdős-Renyi Model Applied to CSP. In [5], Erdős and Renyi  studied the existence and the number of perfect matchings on random graphs. Expressed in the vocabulary we introduced above, the idea is to randomize the domain of each variable such that: for all \(x_i \in X\) and for all \(y_j \in Y\), the event \(\lbrace y_j \in D_i \rbrace \) happens with a predefined probability \(p\in {[0,1]}\) and all such events are independent:

$$\begin{aligned} \mathbb {P}\left( \lbrace y_j \in D_i \rbrace \right) = p \in [0,1] \end{aligned}$$
(6)

Erdős-Renyi Model Applied to \(\texttt {range}\) Constraint. We will study the expectancy of the number of solutions of a \(\texttt {range}\) constraint within these random graphs. In the case where every variable of X and every value of Y are constrained, the expectancy of \(\# \texttt {range}(X,X,Y)\) is a function of n, m and p (as a reminder, \(|X|=n\) and \(|Y|=m\)). More precisely:

Proposition 4

In the case where every variable of X and every value of Y are constrained, there exists a coefficient \(a_{n,m}\) such that:

$$\begin{aligned} \mathbb {E}\left( \# \texttt {range}(X,X,Y) \right) = a_{n,m}\cdot p^n \end{aligned}$$
(7)

where \(\mathbb {E}\left( \# \texttt {range}(X,X,Y) \right) \) is the expectancy of \(\# \texttt {range}(X,X,Y)\) under the hypothesis of the Erdős-Renyi Model.

Proof

To prove this result, we simply reason with a mathematical induction on \(|Y|=m\). Let \(|X|=n \in \mathbb {N}\).

Base Case: Let \(Y=\lbrace y \rbrace \) be a singleton. In this particular case, an instance \(\texttt {range}(X,X,Y)\) has one allowed tuple if y is inside every domain \(D_i\), and has no allowed tuple otherwise. Then,

$$\begin{aligned} \mathbb {E}\left( \# \texttt {range}(X,X,\lbrace y \rbrace ) \right)&= 0 \cdot \mathbb {P}\left( \lbrace \texttt {range}(X,X,\lbrace y \rbrace ) \text { has no solution} \rbrace \right) \\&\qquad + 1 \cdot \mathbb {P}\left( \lbrace \texttt {range}(X,X,\lbrace y \rbrace ) \text { has one solution} \rbrace \right) \\&= \mathbb {P}\left( \lbrace \texttt {range}(X,X,\lbrace y \rbrace ) \text { has one solution} \rbrace \right) \\&= \mathbb {P}\left( \lbrace \forall x_i \in X, y \in D_i \rbrace \right) \\&= \prod _{i=1}^n \mathbb {P}\left( \lbrace y \in D_i \rbrace \right) , \text { by hypothesis of independence} \\&= p^n \end{aligned}$$

We thus set \(a_{n,1}=1\), which proves the result.

Inductive Step. We assume that the property is true for all \(|Y|=k \in \lbrace 1, \ldots , m-1 \rbrace \): \(\forall Y, \text { such that } 1\le |Y|=k \le m-1, \exists a_{n,k} \in \mathbb {N},\)

$$\begin{aligned} \mathbb {E}\left( \# \texttt {range}(X,X,Y) \right) =a_{n,k}\cdot p^n. \end{aligned}$$

We want to prove that, under this assumption, for a set Y with \(|Y|=m\), there exists \(a_{n,m}\) such that \(\mathbb {E}\left( \# \texttt {range}(X,X,Y) \right) =a_{n,m}\cdot p^n\)

According to Proposition 2, we have:

$$\begin{aligned}&\mathbb {E}\left( \# \texttt {range}(X,X,Y) \right) \\&= \mathbb {E}\left( \underset{x_i \in X}{\prod }d_i \right) - \underset{Y' \subset Y}{\sum } \mathbb {E}\left( \# \texttt {range}(X,X,Y') \right) , \text { by linearity of the operator } \mathbb {E}\left( . \right) \\&=\mathbb {E}\left( \underset{x_i \in X}{\prod } d_i \right) - \sum _{k=1}^{m-1}\left( {\begin{array}{c}m\\ k\end{array}}\right) a_{n,k}\cdot p^n, \text { by hypothesis of induction.} \\&=\underset{x_i \in X}{\prod } \mathbb {E}\left( d_i \right) - \sum _{k=1}^{m-1}\left( {\begin{array}{c}m\\ k\end{array}}\right) a_{n,k}\cdot p^n, \text { by hypothesis of independence} \\&= (mp)^n - \sum _{k=1}^{m-1}\left( {\begin{array}{c}m\\ k\end{array}}\right) a_{n,k}\cdot p^n, \text { because } \forall x_i \in X, \mathbb {E}\left( d_i \right) = mp\\&= \left( m^n - \sum _{k=1}^{m-1}\left( {\begin{array}{c}m\\ k\end{array}}\right) a_{n,k} \right) \cdot p^n \end{aligned}$$

We have identified the coefficient \(a_{n,m}\):

$$\begin{aligned} a_{n,m} = m^n - \sum _{k=1}^{m-1}\left( {\begin{array}{c}m\\ k\end{array}}\right) a_{n,k} \end{aligned}$$
(8)

   \(\square \)

Remarking that \(\left( {\begin{array}{c}m\\ m\end{array}}\right) =1\), we can rewrite Eq. (8) as follows:

$$\begin{aligned} m^n = \sum _{k=1}^m \left( {\begin{array}{c}m\\ k\end{array}}\right) a_{n,k} \end{aligned}$$
(9)

Also, \(\forall n \in \mathbb {N}^+, a_{n,1}=1\). These coefficients are referenced as the “triangles of numbers” in OEIS. The coefficients \(a_{n,m}\) correspond to the number of possible surjections from a set of cardinality n onto a set of cardinality m. There is a non-recursive formula to compute these coefficients. The following result is stated here without proof; intuitively, it is an application of the inclusion-exclusion principle, see Sect. 1.9, The Twelvefold Way, of [18].

Proposition 5

For \(0<m\le n\),

$$\begin{aligned} a_{n,m} = \sum _{k=0}^m (-1)^k \left( {\begin{array}{c}m\\ k\end{array}}\right) (m-k)^n \end{aligned}$$
(10)

Proposition 6 is a property of this triangle of numbers and will be used to simplify later derivations.

Proposition 6

$$\begin{aligned} a_{n,n} = n! \end{aligned}$$
(11)

Proof

\(a_{n,n}\) is the number of possible surjections from a set of cardinality n onto a set of cardinality n, which is actually the number of bijections in that specific case.    \(\square \)
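The recurrence of Eq. (8), the closed form of Eq. (10) and the identity of Eq. (11) can be cross-checked numerically. The sketch below is illustrative; the function names `a_recursive` and `a_closed` are assumptions introduced for this example.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def a_recursive(n, m):
    """Eq. (8): a_{n,m} = m^n - sum_{k=1}^{m-1} C(m,k) * a_{n,k}."""
    return m**n - sum(comb(m, k) * a_recursive(n, k) for k in range(1, m))


def a_closed(n, m):
    """Eq. (10): inclusion-exclusion count of the surjections from n onto m elements."""
    return sum((-1) ** k * comb(m, k) * (m - k) ** n for k in range(m + 1))


for n in range(1, 7):
    for m in range(1, n + 1):
        assert a_recursive(n, m) == a_closed(n, m)
    assert a_closed(n, n) == factorial(n)  # Eq. (11)

print(a_closed(4, 3))  # surjections from a 4-element set onto a 3-element set
```

Memoizing the recurrence mirrors the precomputation of the coefficients \(a_{n,m}\) mentioned in the experimental section.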

We can now extend Proposition 4 to the case where the \(\texttt {range}\) constraint only concerns subsets \(X'\subseteq X\) and \(Y' \subseteq Y\):

Proposition 7

Let \(X' \subseteq X\) and \(Y' \subseteq Y\). We note \(|X'|=n'\) and \(|Y'|=m'\).

$$\begin{aligned} \mathbb {E}\left( \# \texttt {range}(X,X',Y') \right) = a_{n',m'} \cdot m^{n-n'}\cdot p^n \end{aligned}$$
(12)

Proof

According to Propositions 1 and 4 and by hypothesis of independence:

$$\begin{aligned} \mathbb {E}\left( \# \texttt {range}(X,X',Y') \right)&= \mathbb {E}\left( \# \texttt {range}(X',X',Y') \right) \cdot \mathbb {E}\left( \underset{x_i \in \overline{X'}}{\prod } d_i \right) \\&= a_{n',m'}\cdot p^{n'} \cdot \underset{x_i \in \overline{X'}}{\prod }\mathbb {E}\left( d_i \right) \\&= a_{n',m'} \cdot p^{n'} \cdot (mp)^{n-n'} \\&= a_{n',m'} \cdot m^{n-n'}\cdot p^n \end{aligned}$$

   \(\square \)

Erdős-Renyi Model Applied to \(\texttt {roots}\) Constraint. We study now the expectancy of the number of solutions of a \(\texttt {roots}\) constraint.

Proposition 8

Let \(X' \subseteq X\) and \(Y' \subseteq Y\). We note \(|X'|=n'\) and \(|Y'|=m'\).

$$\begin{aligned} \mathbb {E}\left( \# \texttt {roots}(X,X',Y') \right) = m'^{n'} \cdot (m-m')^{n-n'} \cdot p^n \end{aligned}$$
(13)

Proof

According to Proposition 3 and by hypothesis of independence:

$$\begin{aligned} \mathbb {E}\left( \# \texttt {roots}(X,X',Y') \right)&= \mathbb {E}\left( \underset{x_i \in X'}{\prod } d_i(Y') \right) \cdot \mathbb {E}\left( \underset{x_i \in \overline{X'}}{\prod } d_i(\overline{Y'}) \right) \\&= \underset{x_i \in X'}{\prod } \mathbb {E}\left( d_i(Y') \right) \cdot \underset{x_i \in \overline{X'}}{\prod } \mathbb {E}\left( d_i(\overline{Y'}) \right) \\&= (m'p)^{n'} \cdot \left( (m-m')p \right) ^{n-n'} \\&= m'^{n'} \cdot (m-m')^{n-n'} \cdot p^n \end{aligned}$$

   \(\square \)
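For very small n and m, Propositions 7 and 8 can be verified exactly by enumerating all \(2^{n \cdot m}\) value graphs and weighting each one by its probability. The sketch below is an illustration under the independence assumption of Eq. (6); it uses exact fractions, indexes variables and values from 0, and all function names are assumptions.

```python
from fractions import Fraction
from itertools import product
from math import comb


def surjections(n, m):
    """Coefficient a_{n,m} of Eq. (10)."""
    return sum((-1) ** k * comb(m, k) * (m - k) ** n for k in range(m + 1))


def exact_expectation(n, m, p, count):
    """Expectation of count(domains) over all random value graphs with edge probability p."""
    total = Fraction(0)
    for mask in product([0, 1], repeat=n * m):
        domains = [{y for y in range(m) if mask[i * m + y]} for i in range(n)]
        edges = sum(mask)
        total += p**edges * (1 - p) ** (n * m - edges) * count(domains)
    return total


def count_range(domains, x_sub, y_sub):
    return sum(1 for t in product(*domains) if {t[i] for i in x_sub} == y_sub)


def count_roots(domains, x_sub, y_sub):
    return sum(
        1 for t in product(*domains)
        if {i for i, v in enumerate(t) if v in y_sub} == x_sub
    )


n, m, p = 3, 3, Fraction(1, 3)
x_sub, y_sub = {0, 1}, {0, 1}  # n' = 2 and m' = 2
e_range = exact_expectation(n, m, p, lambda d: count_range(d, x_sub, y_sub))
e_roots = exact_expectation(n, m, p, lambda d: count_roots(d, x_sub, y_sub))
print(e_range, surjections(2, 2) * m ** (n - 2) * p**n)  # two sides of Eq. (12)
print(e_roots, 2**2 * (m - 2) ** (n - 2) * p**n)         # two sides of Eq. (13)
```

The enumeration is only feasible for tiny instances; its purpose is to confirm the closed forms, not to replace them.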

The parameter p corresponds to the density of edges in the value graph. To use the estimators in practice, we need to estimate p: we will later set p to the sum of the domain sizes divided by the total number of possible edges, \(n \cdot m\), that is \(p = \frac{1}{n \cdot m}\sum _{i=1}^n d_i\).
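As a small illustration, the edge density of the value graph of Example 1 can be computed as follows. The helper name `edge_density` is an assumption, and an exact fraction is used to avoid rounding.

```python
from fractions import Fraction

# Domains of Example 1, listed for x_1 .. x_5.
domains = [{1, 2, 4}, {2, 3}, {1, 2, 3, 5}, {4, 5}, {2, 4, 5}]


def edge_density(domains):
    """Sum of the domain sizes divided by n * m, where m = |union of the domains|."""
    n = len(domains)
    m = len(set().union(*domains))
    return Fraction(sum(len(d) for d in domains), n * m)


p = edge_density(domains)
print(p, float(p))  # 14 edges out of 5 * 5 possible ones
```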

4 Generalization to Cardinality Constraints

This section details, in a systematic way, how to count solutions for many cardinality constraints thanks to their \(\texttt {range}\) and \(\texttt {roots}\) decompositions. Due to space limitations, only four constraints are given in detail. For the other six constraints to which our method applies, a synthesis then summarises all the formulae as well as the general computation pattern. Each subsection first recalls the definition of the considered constraint, then details its decomposition as extracted from [2] and finally provides the formula for the expectancy of its number of solutions in our model.

4.1 \(\texttt {alldifferent}\) [16]

Definition 5

A constraint \(\texttt {alldifferent}\)(X) is satisfied iff each variable \(x_i \in X\) is instantiated to a value of its domain \(D_i\) and each value \(y_j \in Y\) is chosen at most once. We define formally the set of allowed tuples:

$$\begin{aligned} \mathcal {S}_{\texttt {alldifferent}(X)} = \lbrace (v_1, \ldots , v_n) \in \mathcal {D} \mid \forall i,j \in \lbrace 1,...,n \rbrace , i \ne j \Leftrightarrow v_i \ne v_j \rbrace \end{aligned}$$
(14)

A decomposition of \(\texttt {alldifferent}\) with a \(\texttt {range}\) constraint is given by the following:

$$\begin{aligned} \texttt {alldifferent}(X) \Leftrightarrow \texttt {range}(X,X,Y')\; \wedge \; |Y'|=n \end{aligned}$$

From this decomposition, we can deduce a formula for the expectancy of the number of solutions on an \(\texttt {alldifferent}\) constraint, within the Erdős-Renyi Model.

Proposition 9

$$\begin{aligned} \mathbb {E}\left( \# \texttt {alldifferent}(X) \right) = \frac{m!}{(m-n)!} \cdot p^n \end{aligned}$$
(15)

Proof

According to the decomposition of \(\texttt {alldifferent}\):

$$\begin{aligned} \# \texttt {alldifferent}(X) = \underset{Y' \subseteq Y, \, |Y'|=n}{\sum } \# \texttt {range}(X,X,Y') \end{aligned}$$

Then,

$$\begin{aligned} \mathbb {E}\left( \# \texttt {alldifferent}(X) \right)&= \underset{Y' \subseteq Y, \, |Y'|=n}{\sum } \mathbb {E}\left( \# \texttt {range}(X,X,Y') \right) \\&= \left( {\begin{array}{c}m\\ n\end{array}}\right) \cdot a_{n,n} \cdot p^n = \frac{m!}{(m-n)!} \cdot p^n \end{aligned}$$

   \(\square \)
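To give a feeling for Eq. (15), the sketch below compares the estimator with the exact number of solutions of \(\texttt {alldifferent}\) on Example 1. It assumes that p is set to the edge density of the value graph; the exact count is obtained by brute force and the function names are illustrative.

```python
from itertools import product
from math import perm

# Domains of Example 1, listed for x_1 .. x_5.
domains = [{1, 2, 4}, {2, 3}, {1, 2, 3, 5}, {4, 5}, {2, 4, 5}]


def alldifferent_estimate(domains):
    """Eq. (15): m! / (m - n)! * p^n, with p the edge density of the value graph."""
    n = len(domains)
    m = len(set().union(*domains))
    p = sum(len(d) for d in domains) / (n * m)
    return perm(m, n) * p**n  # perm(m, n) = m! / (m - n)!, and 0 when n > m


def alldifferent_exact(domains):
    """Brute-force count of the tuples of D with pairwise distinct values."""
    return sum(1 for t in product(*domains) if len(set(t)) == len(t))


print(alldifferent_exact(domains))     # exact number of solutions
print(alldifferent_estimate(domains))  # probabilistic estimate
```

On this instance the estimate is of the same order of magnitude as the exact count, which is what a counting-based heuristic needs.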

4.2 \(\texttt {nvalue}\) [11]

Definition 6

The constraint \(\texttt {nvalue}(X,N)\) holds if exactly N values from Y are assigned to the variables. Formally:

$$\begin{aligned} \mathcal {S}_{\texttt {nvalue}(X,N)} = \lbrace (v_1, \ldots , v_n) \in \mathcal {D} | \; N = |\lbrace y_j \in Y | \exists i \in \lbrace 1, \ldots , n \rbrace , v_i = y_j \rbrace | \rbrace \end{aligned}$$
(16)

A decomposition of \(\texttt {nvalue}\) with a \(\texttt {range}\) constraint is given by the following:

$$\begin{aligned} \texttt {nvalue}(X,N) \Leftrightarrow \texttt {range}(X,X,Y')\; \wedge \; |Y'|= N \end{aligned}$$

From this decomposition, we can deduce a formula to estimate solutions on a \(\texttt {nvalue}\) constraint, within the Erdős-Renyi Model.

Proposition 10

Let \(N \in \mathbb {N}\),

$$\begin{aligned} \mathbb {E}\left( \# \texttt {nvalue}(X,N) \right) = \left( {\begin{array}{c}m\\ N\end{array}}\right) \cdot a_{n,N} \cdot p^n \end{aligned}$$
(17)

Proof

The proof is the same as Proposition 9.    \(\square \)

We can generalize Proposition 10 to the case where N is a variable. The sets of solutions for two different values of N are disjoint, so we can simply sum these estimates over the domain of N to compute an estimate in the general case.
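The estimator of Eq. (17), summed over the candidate values of N, can be sketched as follows. The function names and the use of exact fractions are assumptions made for this illustration. As a sanity check, summing over every possible N gives \((mp)^n\), the expected size of \(\mathcal {D}\), in agreement with Eq. (9).

```python
from fractions import Fraction
from math import comb


def surjections(n, k):
    """Coefficient a_{n,k} of Eq. (10)."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))


def nvalue_estimate(n, m, p, n_values):
    """Sum of Eq. (17) over the values that N may take."""
    return sum(comb(m, N) * surjections(n, N) * p**n for N in n_values)


n, m, p = 5, 5, Fraction(14, 25)          # sizes and density of Example 1
print(nvalue_estimate(n, m, p, [5]))      # N = n: same value as Eq. (15)
print(nvalue_estimate(n, m, p, [2, 3]))   # N is a variable with domain {2, 3}
print(nvalue_estimate(n, m, p, range(n + 1)) == (m * p) ** n)
```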

4.3 \(\texttt {among}\) [1]

Definition 7

(\(\mathtt {among}\)). Let \(Y' \subseteq Y\). The constraint \(\texttt {among}(X,Y',N)\) holds iff exactly N variables are assigned a value from \(Y'\).

$$\begin{aligned} \mathcal {S}_{\texttt {among}(X,Y',N)} = \lbrace (v_1, \ldots , v_n) | N = |\lbrace x_i | v_i \in Y' \rbrace | \rbrace \end{aligned}$$

The decomposition of \(\texttt {among}\) is given by the following equivalence:

$$\begin{aligned} \texttt {among}(X,Y',N) \Leftrightarrow \texttt {roots}(X,X',Y') \; \wedge \; |X'| = N \end{aligned}$$

Proposition 11

Let \(m'=|Y'|\) and \(N \in \mathbb {N}\),

$$\begin{aligned} \mathbb {E}\left( \# \texttt {among}(X,Y',N) \right) = \left( {\begin{array}{c}n\\ N\end{array}}\right) m'^{N}(m-m')^{n-N} \cdot p^n \end{aligned}$$
(18)

Proof

According to the decomposition of \(\texttt {among}\), we can write:

$$\begin{aligned} \# \texttt {among}(X,Y',N) = \underset{X' \subseteq X, |X'| = N}{\sum } \# \texttt {roots}(X,X',Y') \end{aligned}$$

Indeed, for two different subsets \(X_1', X_2' \subseteq X\), the sets of solutions of \(\texttt {roots}(X,X'_1,Y')\) and \(\texttt {roots}(X,X'_2,Y')\) have an empty intersection, then no solution is counted twice. And:

$$\begin{aligned} \mathbb {E}\left( \# \texttt {among}(X,Y',N) \right)&=\underset{X' \subseteq X, |X'| = N}{\sum } \mathbb {E}\left( \# \texttt {roots}(X,X',Y') \right) \\&=\underset{X' \subseteq X, |X'| = N}{\sum } m'^{|X'|}(m-m')^{n-|X'|} \cdot p^n, \text { by Proposition 8}\\&= \left( {\begin{array}{c}n\\ N\end{array}}\right) m'^{N}(m-m')^{n-N} \cdot p^n \end{aligned}$$

   \(\square \)

In the same way as for \(\texttt {nvalue}\), we can generalize Proposition 11 to the case where N is a variable.
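Eq. (18) is a single closed-form expression, so the estimator is a one-liner. The sketch below uses illustrative names and exact fractions; the final check relies on the binomial theorem, which makes the estimates over all N sum to \((mp)^n\).

```python
from fractions import Fraction
from math import comb


def among_estimate(n, m, m_sub, p, N):
    """Eq. (18): C(n, N) * m'^N * (m - m')^(n - N) * p^n, with m' = |Y'|."""
    return comb(n, N) * m_sub**N * (m - m_sub) ** (n - N) * p**n


n, m, p = 5, 5, Fraction(14, 25)  # sizes and density of Example 1
m_sub = 2                         # for instance Y' = {2, 3}
print(among_estimate(n, m, m_sub, p, 3))
# N as a variable with domain {0, ..., n}: the estimates are simply summed.
print(sum(among_estimate(n, m, m_sub, p, N) for N in range(n + 1)) == (m * p) ** n)
```

Taking `m_sub = 1` gives the estimator of a constraint counting the occurrences of a single value.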

4.4 \(\texttt {occurrence}\) [4]

Definition 8

(\(\mathtt {occurrence}\)). Let \(y \in Y\), the constraint \(\texttt {occurrence}(X,y,N)\) holds iff exactly N variables are assigned to value y.

$$\begin{aligned} \mathcal {S}_{\texttt {occurrence}(X,y,N)} = \lbrace (v_1, \ldots , v_n) | N = |\lbrace x_i | v_i=y \rbrace | \rbrace \end{aligned}$$

The decomposition of \(\texttt {occurrence}\) is given by the following equivalence:

$$\begin{aligned} \texttt {occurrence}(X,y,N) \Leftrightarrow \texttt {roots}(X,X',\lbrace y \rbrace ) \; \wedge \; |X'| = N \end{aligned}$$

Proposition 12

Let \(N \in \mathbb {N}\),

$$\begin{aligned} \mathbb {E}\left( \# \texttt {occurrence}(X,y,N) \right) = \left( {\begin{array}{c}n\\ N\end{array}}\right) (m-1)^{n-N} \cdot p^n \end{aligned}$$
(19)

Proof

The proof is the same as Proposition 11 in the case where \(Y'=\lbrace y \rbrace \) is a singleton.    \(\square \)

Proposition 12 can also be generalized to the case where N is a variable.

4.5 Synthesis

We report the estimators of the number of solutions in Table 1 for several cardinality constraints. We observe a pattern in all these formulae: the estimation of the number of allowed tuples is always \(p^n\) multiplied by the number of tuples allowed by the constraint if every domain were equal to the set of values Y (if the value graph were complete). This remark leads to the following Proposition.

Table 1. Counting formulae extracted from \(\texttt {range}\) and \(\texttt {roots}\) reformulation

Proposition 13

Let C be a constraint over X with \(|X|=n\), Y be the union of the domains and p the edge density in the value graph \(G_{X,Y}\), then:

$$\begin{aligned} \mathbb {E}\left( \# C \right) = \# C^* \cdot p^n \end{aligned}$$
(20)

with \(\# C^*\) the number of allowed tuples if \(G_{X,Y}\) were complete.

Proof

Let \(\mathcal {S}_{C^*}\) be the set of allowed tuples if \(G_{X,Y}\) were complete. For each \(s \in \mathcal {S}_{C^*}\), let \(Z_s\) be the random variable such that, \(Z_s=1\) if s is in the set of allowed tuples \(\mathcal {S}_{C}\) of C, and \(Z_s=0\) otherwise. A solution s is an instantiation of every variable, then, in the Erdős-Renyi Model, \(\mathbb {P}\left( \lbrace Z_s=1 \rbrace \right) =\mathbb {E}\left( Z_s \right) =p^n\). Then,

$$\begin{aligned} \mathbb {E}\left( \# C \right) = \mathbb {E}\left( \underset{s \in \mathcal {S}_{C^*}}{\sum } Z_s \right) = \underset{s \in \mathcal {S}_{C^*}}{\sum } \mathbb {E}\left( Z_s \right) = \# C^* \cdot p^n \end{aligned}$$

   \(\square \)
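Proposition 13 suggests a generic, if naive, way to obtain the estimator of any constraint: count the tuples allowed on the complete value graph and multiply by \(p^n\). The sketch below does this by brute force, so it is only usable on tiny instances; the predicate-based interface `holds` is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product


def generic_estimate(n, values, p, holds):
    """Eq. (20): (number of tuples of Y^n accepted by `holds`) * p^n."""
    c_star = sum(1 for t in product(values, repeat=n) if holds(t))
    return c_star * p**n


values, p = range(4), Fraction(1, 2)  # m = 4 values, density 1/2


def alldifferent(t):
    return len(set(t)) == len(t)


def among_two(t):
    # Exactly N = 2 variables take a value in Y' = {0, 1}.
    return sum(v in {0, 1} for v in t) == 2


print(generic_estimate(3, values, p, alldifferent))  # compare with Eq. (15)
print(generic_estimate(3, values, p, among_two))     # compare with Eq. (18)
```

The closed forms of Table 1 avoid this enumeration by counting \(\# C^*\) combinatorially.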

In Sect. 3, we have shown how to count solutions on a \(\texttt {range}\) and a \(\texttt {roots}\) constraint and, in Sect. 4, how to use the \(\texttt {range}\) and \(\texttt {roots}\) decomposition to estimate the number of solutions on many cardinality constraints. Proposition 13 highlights a general pattern for such estimates. In Sect. 5, we evaluate these probabilistic estimators within counting-based heuristics on some problems using cardinality constraints.

Fig. 2. Performances of \(\texttt {maxSD\_ER}\), \(\texttt {maxSD\_PQZ}\), \(\texttt {dom/wdeg}\), \(\texttt {ibs}\) and \(\texttt {abs}\) on 40 hard Latin Square instances, in number of backtracks (left) and time (right).

5 Experimental Analysis

In this section, we present two problems on which we have run different heuristics: \(\texttt {maxSD}\) [13], \(\texttt {dom/wdeg}\) [3], \(\texttt {abs}\) (activity-based search) [10] and \(\texttt {ibs}\) (impact-based search) [15]. This benchmark has been chosen by taking the problems in XCSP, CSPLib and MiniZinc that matched our testing needs: no COP, with cardinality constraints at the core of the problem but no \(\texttt {gcc}\). Also, the lack of knowledge on how to use \(\texttt {maxSD}\) on problems with several constraints greatly restricts the practical use of the heuristic. These conditions restricted our benchmark to Latin Squares and Sports Tournament Scheduling.

\(\texttt {maxSD}\) consists in choosing a variable/value pair based on the estimation of the number of remaining solutions. More precisely, for each constraint, and for each variable/value pair in this constraint, we compute an estimation of the number of remaining allowed tuples and we associate with each pair a solution density. \(\texttt {maxSD}\) chooses the variable/value pair that maximizes the solution density over all constraints.
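The selection rule can be sketched as follows. This is only an illustration under stated assumptions: the solution density of a pair is taken here as the share of the constraint's allowed tuples in which the variable takes the value, exact brute-force counting stands in for the estimators of Sect. 4, and the encoding of constraints as (scope, predicate) pairs is hypothetical.

```python
from itertools import product


def solution_densities(domains, scope, holds):
    """Densities of one constraint: share of its allowed tuples with x_i = v."""
    allowed = [t for t in product(*(sorted(domains[i]) for i in scope)) if holds(t)]
    if not allowed:
        return {}
    return {
        (i, v): sum(1 for t in allowed if t[pos] == v) / len(allowed)
        for pos, i in enumerate(scope)
        for v in domains[i]
    }


def max_sd(domains, constraints):
    """Variable/value pair with the highest density over all constraints."""
    best = None
    for scope, holds in constraints:
        for (i, v), density in solution_densities(domains, scope, holds).items():
            # Variables that are already assigned are skipped.
            if len(domains[i]) > 1 and (best is None or density > best[1]):
                best = ((i, v), density)
    return best


# Example 1 with a single alldifferent constraint (variables indexed from 0).
domains = [{1, 2, 4}, {2, 3}, {1, 2, 3, 5}, {4, 5}, {2, 4, 5}]
constraints = [(tuple(range(5)), lambda t: len(set(t)) == len(t))]
print(max_sd(domains, constraints))  # ((variable index, value), density)
```

In an actual solver the brute-force count would be replaced by the constant-time estimators, evaluated on the domains obtained after each tentative assignment.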

We actually do not run \(\texttt {maxSD}\) as presented in [13], but a slightly different version. It consists in re-computing the ordering of the variables only when the product of the domain sizes has decreased enough, as suggested in [6]. Here, we set a threshold at 20%. Also, the coefficients \(a_{n,m}\), the binomial coefficients and the factorials are computed in advance. The computation of the approximations is thus made in linear time in n.

We first introduce each problem and the cardinality constraints used in its model, and then compare the efficiency of the heuristics in terms of solving time and number of required backtracks. The instances and the strategies are implemented in Choco solver [14] and we run them on a 2.2 GHz Intel Core i7 with 2.048 GB.

5.1 Latin Square Problem

A Latin Square problem is defined by an \(n \times n\) grid whose cells each contain an integer from 1 to n such that each integer appears exactly once per row and per column [12]. The model uses a matrix of integer variables and an \(\texttt {alldifferent}\) constraint for each row and each column. We tested on the 40 hard instances used in [13] with n = 30 and 42% of holes (corresponding to the phase transition), generated following [7]. For these instances, we also compare our probabilistic estimator (\(\texttt {maxSD\_ER}\)) with the estimator proposed in [13] (\(\texttt {maxSD\_PQZ}\)) for \(\texttt {alldifferent}\). We set a time limit of 10 min.
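The structure of this model can be illustrated with a small consistency checker. This is a minimal Python sketch, independent of Choco: the helper names `alldifferent` and `is_consistent` are hypothetical, holes are encoded as `None` (an assumed convention), and the 4 x 4 grid is a hand-made example, not one of the benchmark instances nor an output of the generator of [7].

```python
def alldifferent(values):
    """True if the assigned values (holes are None) are pairwise distinct."""
    assigned = [v for v in values if v is not None]
    return len(assigned) == len(set(assigned))

def is_consistent(grid):
    """Check one alldifferent per row and per column of an n x n grid."""
    n = len(grid)
    in_range = all(v is None or 1 <= v <= n for row in grid for v in row)
    rows_ok = all(alldifferent(row) for row in grid)
    cols_ok = all(alldifferent(col) for col in zip(*grid))
    return in_range and rows_ok and cols_ok

# Cyclic Latin square of order 4, then the same grid with a few holes.
n = 4
full = [[(i + j) % n + 1 for j in range(n)] for i in range(n)]
partial = [row[:] for row in full]
for i, j in [(0, 1), (1, 3), (2, 0), (3, 2)]:
    partial[i][j] = None

print(is_consistent(full), is_consistent(partial))
```

A partially filled grid that passes this check is not necessarily completable; deciding that is precisely the search problem solved in the experiments.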

Figure 2 shows the percentage of solved instances as a function of the number of required backtracks and of the solving time. The strategies \(\texttt {maxSD}\) (for both estimators \(\texttt {maxSD\_ER}\) and \(\texttt {maxSD\_PQZ}\)) and \(\texttt {abs}\) performed better than \(\texttt {dom/wdeg}\) and \(\texttt {ibs}\). \(\texttt {abs}\) solved more instances than the two versions of \(\texttt {maxSD}\), but required more backtracks. \(\texttt {maxSD}\) seems to perform better on the easiest instances (in terms of number of backtracks). \(\texttt {maxSD\_PQZ}\) performs slightly better than \(\texttt {maxSD\_ER}\) on the medium instances, and the two perform very comparably on the hardest ones.

5.2 Sports Tournament Scheduling Problem

This problem is taken from [19] and is presented as follows: schedule a tournament of n teams over \(n-1\) weeks, with each week divided into n / 2 periods, and each period divided into two slots. A tournament must satisfy the following three constraints: every team plays once a week; every team plays at most twice in the same period over the tournament; every team plays every other team. The first and the third constraints are modeled with \(\texttt {alldifferent}\), and the second one with \(\texttt {atmost}\). We ran this problem with the following settings: \(n \in \lbrace 6,8,10,12,14 \rbrace \).
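The three constraints can be made concrete with a small schedule checker. This is an illustrative Python sketch, not the Choco model used in the experiments: the function name `is_valid_schedule`, the encoding `schedule[week][period] = (team_a, team_b)` and the instance `SCHEDULE_6` are assumptions introduced here, and the schedule for n = 6 was built by hand for this example.

```python
from itertools import chain

def is_valid_schedule(schedule, n):
    """Check the three tournament constraints on schedule[week][period]."""
    weeks, periods = n - 1, n // 2
    if len(schedule) != weeks or any(len(week) != periods for week in schedule):
        return False
    # Every team plays once a week: the n slots of a week hold n distinct teams.
    for week in schedule:
        if sorted(chain.from_iterable(week)) != list(range(n)):
            return False
    # Every team plays at most twice in the same period over the tournament.
    for p in range(periods):
        teams = [team for week in schedule for team in week[p]]
        if any(teams.count(team) > 2 for team in range(n)):
            return False
    # Every team plays every other team: the n(n-1)/2 games are all distinct.
    games = {frozenset(game) for week in schedule for game in week}
    return len(games) == weeks * periods

# Hand-built schedule for n = 6: rows are weeks, columns are periods.
SCHEDULE_6 = [
    [(5, 0), (1, 4), (2, 3)],
    [(2, 0), (3, 4), (5, 1)],
    [(3, 1), (5, 2), (4, 0)],
    [(4, 2), (0, 1), (5, 3)],
    [(5, 4), (0, 3), (1, 2)],
]

print(is_valid_schedule(SCHEDULE_6, 6))
```

Since there are exactly n(n-1)/2 games and each one involves two distinct teams, requiring all games to be distinct is equivalent to requiring that every pair of teams meets.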

In Table 2, we report the number of backtracks (and the time) required to solve the problem for different values of n with four different heuristics. Here \(\texttt {maxSD\_PQZ}\) cannot be used, as there is no estimator for \(\texttt {atmost}\) in [13]. Consequently, we only consider our approach \(\texttt {maxSD\_ER}\). We set a time limit of 5 min. We observe that \(\texttt {maxSD\_ER}\) outperforms \(\texttt {abs}\) and \(\texttt {dom/wdeg}\). For \(n \in \lbrace 6,8,10 \rbrace \), \(\texttt {maxSD\_ER}\) and \(\texttt {ibs}\) have similar performance, but \(\texttt {ibs}\) could not find a solution within 5 min for \(n=12\) and \(n=14\).

Table 2. Number of backtracks (time in s) for different settings of n

We have shown that our probabilistic estimator for \(\texttt {alldifferent}\) gives results very comparable to those of the estimator given in [13] on the Latin Square instances. Also, our estimators within \(\texttt {maxSD\_ER}\) give better results than \(\texttt {ibs}\), \(\texttt {abs}\) and \(\texttt {dom/wdeg}\) on the Sports Tournament Scheduling problem.

6 Conclusion

In this paper, we have presented a method to estimate the number of solutions of the \(\texttt {range}\) and \(\texttt {roots}\) constraints with a probabilistic Erdős-Rényi model. We can estimate the number of solutions of ten cardinality constraints using their \(\texttt {range}\) and \(\texttt {roots}\) decompositions. We detailed our method on \(\texttt {alldifferent}\), \(\texttt {nvalue}\), \(\texttt {among}\) and \(\texttt {occurrence}\), and we reported our estimators for \(\texttt {atmostNValues}\), \(\texttt {atleastNValues}\), \(\texttt {atmost}\), \(\texttt {atleast}\), \(\texttt {uses}\) and \(\texttt {disjoint}\). We highlighted a general formula to compute such an estimation on cardinality constraints. We have implemented the heuristic \(\texttt {maxSD\_ER}\) with these new probabilistic estimators and compared its efficiency to that of \(\texttt {dom/wdeg}\), \(\texttt {abs}\), and \(\texttt {ibs}\).

We think that the main asset of this approach is its systematic nature. We have shown here an application of solution counting to counting-based search. Such an approach could also be used, for example, for uniform random instance generation, probabilistic reasoning or search space structure analysis.

We did not study the \(\texttt {gcc}\) constraint in this article, as its decomposition involves several non-disjoint subsets of the variables. Further research includes extending our approach to the case where several \(\texttt {range}\) and \(\texttt {roots}\) constraints may apply to a common set of variables. This will lead us to estimators of the number of solutions for conjunctions of cardinality constraints, or \(\texttt {gcc}\) constraints.