1 Introduction

#SAT (a.k.a. model counting) is the problem of counting the number of satisfying assignments of a given Boolean formula; it is #P-complete. Indeed, every NP Turing machine can be encoded as a formula whose satisfying assignments correspond to the accepting paths of the machine [24]. Thus, model counting is harder than satisfiability: #SAT remains intractable in cases for which SAT is tractable (e.g., sets of Horn clauses or sets of 2-literal clauses) [25]. Still, there are cases in which model counting is tractable (e.g., OBDDs and d-DNNFs). For a thorough overview of the problem and of some approaches to it, see [14].

Our interest in model counting originates from its applications in the field of Quantitative Information Flow (qif) [8, 20]. Indeed, a basic result in qif is that the maximum min-entropy leakage of a deterministic program is \(\log _2 k\), where k is the number of distinct outputs the program can return as the input varies [20]. If the program is modeled as a Boolean formula, then computing its leakage reduces to #SAT, specifically to computing the number of models of the formula obtained by existentially projecting out the non-output variables; see [3, 17].

Over the years, several exact counting algorithms have been put forward and implemented, among others [12, 17, 19, 23], with applications to qif [16]. The problem with exact counters is that, although they perform reasonably well when certain parameters of the formula – size, number of variables, number of clauses – are relatively small, they rapidly run out of memory as these parameters grow.

For this reason, approximate counters have received increasing attention. Indeed, in many applications the exact count of models is not required: it may suffice to provide an estimate, as long as the method is fast and comes with a formal guarantee of correctness. This is typically the case in qif, where knowing the count within a factor of \(\eta \) is sufficient to estimate leakage within \(\log _2\eta \) bits. For probabilistic counters, correctness is usually expressed in terms of two parameters: accuracy – the desired maximum difference between the reported and the true count – and confidence – the probability that the reported result is actually within the specified accuracy.

We set ourselves in the line of research pioneered by [25] and followed, e.g., by [1, 6, 13, 17]. The basic idea of a probabilistic model counting algorithm is the following (Sect. 2): given a formula \(\phi \) in the Boolean variables \(y_1,\ldots ,y_m\), one chooses at random \(\langle a_0,\ldots , a_m\rangle \in \{0,1\}^{m+1}\). The resulting XOR constraint \(a_0 = a_1y_1\oplus \cdots \oplus a_m y_m\) splits the set of models of \(\phi \) evenly into two parts: those satisfying the constraint and those not satisfying it. If one independently generates s such constraints \(c_1,\ldots ,c_s\), the formula \(\phi ' {\mathop {=}\limits ^{\triangle }}\,\phi \wedge c_1\wedge \cdots \wedge c_s\) has an expected \(\frac{N}{2^s}\) models, where N is the number of models of \(\phi \) (i.e., the number we aim at estimating). If \(\phi '\) is still satisfiable, then with high probability \(N\ge 2^s\); otherwise, \(N<2^s\). By repeating this process, one can arrive at a good estimate of N. This procedure can be implemented by relying on any SAT solver capable of dealing with XOR constraints, e.g. CryptoMiniSat [22], or even by converting the XOR constraints into CNF before feeding \(\phi '\) to the SAT solver. In any case, the branching factor associated with searching for models of \(\phi '\) quickly explodes as the length (number of variables) of the XOR constraints grows. The random generation outlined above leads to an expected length of \(\frac{m}{2}\) for each constraint, making the procedure not easily scalable as m grows.

In the present paper, we study under what conditions one can employ sparse, hence shorter, constraints while keeping the same guarantees of correctness. We generalize the results of [9] to arrive at an improved understanding of how sparsity is related to the minimum distance between models, and of how this affects the counting procedure. Based on these results, we also suggest a possible direction for a new counting methodology based on the use of Low Density Parity Check (LDPC) codes [11, 18]; however, we leave a thorough experimentation with this new methodology for future work.

The main point is to generate the coefficients \(a_1,\ldots ,a_m\) according to a probability value \(\lambda \in \left( 0,\frac{1}{2}\right] \), rather than uniformly. This way, the constraints will have an average length of \(\lambda m \le \frac{m}{2}\) each. Basically, the correctness guarantees of the algorithm depend, via the Chebyshev inequality, on keeping the variance of the number of models of \(\phi '\), the formula obtained by joining the constraints, below a certain threshold. A value of the density \(\lambda \) that achieves this is said to be feasible for the formula. In our main result (Sect. 3), we provide a bound on the variance that also depends on the minimum Hamming distance d between the formula's models: a larger d yields a smaller variance, hence smaller feasible \(\lambda \)'s. Our bound essentially coincides with that of [9] for \(d=1\). Therefore, in principle, a lower bound on the minimum distance can be used to obtain tighter bounds on \(\lambda \), making the XOR constraints shorter and the counting procedure more efficient in practice.

We will show this phenomenon at work (Sect. 4) on some formulae where the value of d is known by construction, comparing our results with the state-of-the-art model counter ApproxMC3 [21]. These considerations also suggest that, if no information on d is available, one can encode the formula's models using an error-correcting code with a known minimum distance. We will briefly discuss the use of LDPC codes for this purpose, although at the moment we have no experimental results available in this respect. A comparison with recent related work concludes the paper (Sect. 5). For reasons of space, proofs are only sketched here; full proofs are available in [4].

2 A General Counting Algorithm

In what follows, we let \(\phi (y)\), or just \(\phi \), denote a generic Boolean formula with Boolean variables \(y=(y_1,...,y_m)\), for \(m\ge 1\).

2.1 A General Scheme

According to a well-known general scheme [13], the building block of a statistical counting procedure is a probabilistic decision algorithm: with high probability, this algorithm correctly decides whether or not the number of models is below a given threshold \(2^s\), within some tolerance given by the slack parameters \(\alpha \) and \(\beta \) below.

Definition 1 (#SAT decision algorithm)

Let \(0\le \delta < \frac{1}{2}\) (error probability), \(\alpha > 1\) and \(\beta > 1\) (two slack parameters) be three reals. An \((\alpha ,\beta ,\delta )\)-decision algorithm (for #SAT) is a probabilistic algorithm \(A(\cdot ,\cdot )\), taking as input an integer s and a Boolean formula \(\phi \), and returning either 1 (meaning ‘\(\#\phi \ge 2^{s-\alpha }\)’) or 0 (meaning ‘\(\#\phi \le 2^{s+\beta }\)’), such that for each integer \(s\ge 0\) and formula \(\phi \):

  1.

    \(\#\phi > 2^{s+\beta }\) implies \(\Pr \left( A(s,\phi )=0\right) \le \delta \);

  2.

    \(\#\phi < 2^{s-\alpha }\) implies \(\Pr \left( A(s,\phi )=1\right) \le \delta \).

The use of two different slack parameters in the definition above is justified by the need to state formal guarantees on the outcome of the algorithm while keeping its precision as high as possible.

As usual, we can boost the confidence in the reported answer, and get an arbitrarily small error probability, by running \(A(s,\phi )\) several times independently. In particular, consider the algorithm \(RA_t(s,\phi )\) obtained by running \(A(s,\phi )\) independently an odd number \(t\ge 1\) of times, and then reporting the majority answer. Call Err the event that \(RA_t\) reports a wrong answer; then,

$$\begin{aligned} \Pr (Err) &= \Pr \left( \text{at least } \left\lceil \tfrac{t}{2} \right\rceil \text{ runs of } A(s,\phi ) \text{ report the wrong answer}\right) \\ &= \sum _{k=\left\lceil \frac{t}{2} \right\rceil }^t \Pr \left( \text{exactly } k \text{ runs of } A(s,\phi ) \text{ report the wrong answer}\right) \\ &= \sum _{k=\left\lceil \frac{t}{2} \right\rceil }^t \binom{t}{k} p^k (1-p)^{t-k} \end{aligned}$$
(1)

where

$$\begin{aligned} p \ &{\mathop {=}\limits ^{\triangle }}\ \Pr (A(s,\phi ) \text{ reports the wrong answer}) \\ &= \left\{ \begin{array}{ll} \Pr (A(s,\phi )= 0) & \text{ if } \#\phi > 2^{s+\beta }\\ \Pr (A(s,\phi )= 1) & \text{ if } \#\phi < 2^{s-\alpha } \end{array} \right. \\ &\le \delta \end{aligned}$$
(2)

Now, replacing (2) in (1), we obtain

$$\begin{aligned} \Pr (Err) \le \sum _{k=\left\lceil \frac{t}{2} \right\rceil }^t \binom{t}{k} \delta ^k (1-\delta )^{t-k} \end{aligned}$$
(3)

Let us call \(\varDelta (t,\delta )\) the right hand side of (3); then, \(RA_t\) is an \((\alpha ,\beta ,\varDelta (t,\delta ))\)-decision algorithm whenever A is an \((\alpha ,\beta ,\delta )\)-decision algorithm.
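The bound \(\varDelta (t,\delta )\) is just a binomial tail and is straightforward to compute; the following Python sketch (ours, purely illustrative) does so.

```python
from math import comb

def majority_error_bound(t, delta):
    """The bound Delta(t, delta) of (3): probability that at least
    ceil(t/2) of t independent runs err, each with probability <= delta."""
    half = (t + 1) // 2  # ceil(t/2) for odd t
    return sum(comb(t, k) * delta**k * (1 - delta)**(t - k)
               for k in range(half, t + 1))

# For instance, with delta = 4/9 and t = 9 independent runs:
# majority_error_bound(9, 4/9) ~= 0.37, already below the single-run 4/9.
```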

We now show that any \((\alpha ,\beta ,\delta )\)-decision algorithm A for #SAT can be used as a building block for a counting algorithm, \(C_A(\phi )\), that determines an interval [l, u] such that \(\lfloor 2^l \rfloor \le \#\phi \le \lceil 2^u \rceil \) with high probability. Informally, starting with an initial interval \([-1, m]\), the algorithm \(C_A\) performs a binary search, using A to decide which half of the current interval \(\log _2(\#\phi )\) lies in. The search stops when the current interval cannot be further narrowed, taking into account the slack parameters \(\alpha \) and \(\beta \), or when a certain predefined number of iterations is reached. Formally, let \(I_0{\mathop {=}\limits ^{\triangle }}\,[-1, m]\). Assume \(k\ge 0\) and \(I_k=[l_k,u_k]\); then:

  (a)

    if \(u_k - l_k \le 2\max (\alpha ,\beta )+1\) or \(k = \lceil \log _2(m)\rceil \), then return \(I_k\);

  (b)

    otherwise, let \(s= round \!\left( \frac{u_k+l_k}{2}\right) \); if \(A(s,\phi )=0\) then \(I_{k+1}{\mathop {=}\limits ^{\triangle }}\,[l_k,s+\beta ]\) otherwise \(I_{k+1}{\mathop {=}\limits ^{\triangle }}\,[s-\alpha ,u_k]\).
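For concreteness, a direct Python rendering of \(C_A\) might look as follows. This is an illustrative sketch, not the implementation used in Sect. 4: the decision algorithm A is passed in as a function, and round(·) is realized as \(\lfloor x+\frac{1}{2}\rfloor \) (either rounding direction is covered by the proof below).

```python
from math import ceil, floor, log2

def count_interval(A, phi, m, alpha, beta):
    """The counting algorithm C_A: binary search driven by the decision
    algorithm A(s, phi) -> {0, 1}; returns [l, u] such that, with high
    probability, floor(2**l) <= #phi <= ceil(2**u)."""
    l, u = -1.0, float(m)                    # I_0 = [-1, m]
    for _ in range(ceil(log2(m))):           # at most ceil(log2 m) iterations
        if u - l <= 2 * max(alpha, beta) + 1:
            break
        s = floor((u + l) / 2 + 0.5)         # round((l_k + u_k)/2)
        if A(s, phi) == 0:                   # evidence that #phi <= 2**(s+beta)
            u = s + beta
        else:                                # evidence that #phi >= 2**(s-alpha)
            l = s - alpha
    return l, u
```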

Theorem 1

Let A be an \((\alpha ,\beta ,\delta )\)-decision algorithm. Then:

  1.

    \(C_A(\phi )\) terminates in \(k\le \lceil \log _2 m \rceil \) iterations returning an interval \(I_k=[l,u]\) such that \(u-l\le 2\max (\alpha ,\beta )+2\);

  2.

    The probability that \(\#\phi \notin [\lfloor 2^l \rfloor , \lceil 2^u\rceil ]\) is at most \( \lceil \log _2 m \rceil \delta \).

Proof (Sketch)

If the algorithm terminates because \(u_k - l_k \le 2\max (\alpha ,\beta )+1\), the first claim is trivial. Otherwise, by construction of the algorithm, we have \(|I_0| = m+1\). Furthermore, in passing from \(I_{k-1}\) to \(I_{k}\), we have \(s_k = round \!\left( \frac{u_{k-1}+l_{k-1}}{2}\right) \) and

$$ |I_{k}| \le \left\{ \begin{array}{ll} \frac{u_{k-1}-l_{k-1}}{2} + \alpha + \frac{1}{2} & \text{ if } A(s_k,\phi ) = 1 \text{ and } s_k = \left\lfloor \frac{u_{k-1}+l_{k-1}}{2} \right\rfloor \\ \frac{u_{k-1}-l_{k-1}}{2} + \beta & \text{ if } A(s_k,\phi ) = 0 \text{ and } s_k = \left\lfloor \frac{u_{k-1}+l_{k-1}}{2} \right\rfloor \\ \frac{u_{k-1}-l_{k-1}}{2} + \alpha & \text{ if } A(s_k,\phi ) = 1 \text{ and } s_k = \left\lceil \frac{u_{k-1}+l_{k-1}}{2} \right\rceil \\ \frac{u_{k-1}-l_{k-1}}{2} + \beta + \frac{1}{2} & \text{ if } A(s_k,\phi ) = 0 \text{ and } s_k = \left\lceil \frac{u_{k-1}+l_{k-1}}{2} \right\rceil \end{array} \right. $$

Thus, by letting \(M = \max (\alpha ,\beta ) + \frac{1}{2}\), we have that \(|I_k| \le \frac{|I_{k-1}|}{2} + M\); this suffices to obtain \(|I_{\lceil \log _2 m\rceil }| \le 2\max (\alpha ,\beta ) + 2\).

The error probability is the probability that either \(\#\phi < \lfloor 2^l \rfloor \) or \(\#\phi > \lceil 2^u \rceil \); this is the probability that at least one of the \(\lceil \log _2 m\rceil \) calls to A has returned a wrong answer.    \(\square \)

In all the experiments we have run, the algorithm has always returned an interval of width at most \(2\max (\alpha ,\beta )+1\), sometimes in fewer than \(\lceil \log _2(m)\rceil \) iterations: this observation pragmatically justifies the exit condition used in the algorithm. We leave a more elaborate analysis of the algorithm, formally establishing the \(2\max (\alpha ,\beta )+1\) bound, for future work.

2.2 XOR-based Decision Algorithms

Recall that an XOR constraint c on the variables \(y_1,...,y_m\) is an equality of the form

$$\begin{aligned} a_0 = a_1y_1\oplus \cdots \oplus a_m y_m \end{aligned}$$

where \(a_i\in \mathbb {F}_2\) for \(i=0,...,m\) (here \(\mathbb {F}_2=\{0,1\}\) is the two-element field). Hence c can be identified with an \((m+1)\)-tuple in \(\mathbb {F}^{m+1}_2\). Assume that a probability distribution on \(\mathbb {F}^{m+1}_2\) is fixed. A simple proposal for a decision algorithm \(A(s,\phi )\) is as follows:

  1.

    generate s XOR constraints \(c_1,\ldots ,c_s\) independently, according to the fixed probability distribution;

  2.

    if \(\phi \wedge c_1\wedge \cdots \wedge c_s\) is unsatisfiable then return 0, else return 1.

Indeed [13], every XOR constraint splits the set of Boolean assignments into two parts, according to whether an assignment satisfies the constraint or not. Thus, if \(\phi \) has fewer than \(2^s\) models (and so fewer than \(2^{s+\beta }\)), the formula \(\phi \wedge c_1\wedge \cdots \wedge c_s\) is likely to be unsatisfiable.

In step 2 above, any off-the-shelf SAT solver can be employed: one appealing possibility is CryptoMiniSat [22], which offers support for specifying XOR constraints (see e.g. [3, 6, 17]). Similarly to [13], it can be proved that this algorithm indeed yields an \( (\alpha ,\beta ,\delta )\)-decision algorithm, for a suitable \(\delta <\frac{1}{2}\), if the constraints \(c_i\) at step 1 are chosen uniformly at random. This, however, generates ‘long’ constraints, with an average of \(\frac{m}{2}\) variables each, which a SAT solver cannot manage as m grows.
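For illustration, here is a minimal Python sketch of this decision algorithm; it assumes the pycryptosat bindings of CryptoMiniSat (with their add_clause/add_xor_clause interface) and a formula given as a list of CNF clauses over variables 1, ..., m. It is a sketch under these assumptions, not the implementation evaluated in Sect. 4.

```python
import random
from pycryptosat import Solver   # CryptoMiniSat bindings (an assumption; any
                                 # XOR-aware solver interface would do)

def sample_constraint(m):
    """A uniformly random XOR constraint a0 = a1*y1 xor ... xor am*ym,
    returned as (vars, rhs): each coefficient is 1 with probability 1/2."""
    vs = [i for i in range(1, m + 1) if random.random() < 0.5]
    rhs = random.random() < 0.5
    return vs, rhs

def decide(s, cnf_clauses, m):
    """Decision algorithm A(s, phi): conjoin s random XOR constraints to
    phi and return 1 iff the result is still satisfiable."""
    solver = Solver()
    for cl in cnf_clauses:
        solver.add_clause(cl)
    for _ in range(s):
        vs, rhs = sample_constraint(m)
        if not vs:
            if rhs:            # the constraint 1 = 0: unsatisfiable outright
                return 0
            continue           # the constraint 0 = 0: trivially true
        solver.add_xor_clause(vs, rhs)
    sat, _ = solver.solve()
    return 1 if sat else 0
```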

3 Counting with Sparse XORs

We want to explore alternative ways of generating constraints, which will make it possible to work with short (‘sparse’) XOR constraints, while keeping the same guarantees of correctness. In what follows, we assume a probability distribution over the constraints, where each coefficient \(a_i\), for \(i>0\), is chosen independently with probability \(\lambda \), while \(a_0\) is chosen uniformly (and independently from all the other \(a_i\)’s). In other words, we assume that the probability distribution over \(\mathbb {F}^{m+1}_2\) is of the following form, for \(\lambda \in \left( 0,\frac{1}{2}\right] \):

$$\begin{aligned} \Pr (a_0,a_1,...,a_m)&{\mathop {=}\limits ^{\triangle }}\,p(a_0) p'(a_1)\cdots p'(a_m) \nonumber \\ \text { where: }&\quad p(1)=p(0)=\frac{1}{2} \qquad p'(1)=\lambda \qquad p'(0)=1-\lambda \end{aligned}$$
(4)

The expected number of variables appearing in a constraint chosen according to this distribution is therefore \(m\lambda \). Let us still call A the algorithm presented in Sect. 2.2, with this strategy for choosing the constraints. We want to establish conditions on \(\lambda \) under which A can be proved to be a decision algorithm.
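Concretely, sampling according to (4) only changes the way the coefficients \(a_1,\ldots ,a_m\) are drawn with respect to the uniform case; an illustrative variant of the sampler from the previous sketch:

```python
import random

def sample_sparse_constraint(m, lam):
    """An XOR constraint drawn according to (4): each a_i (i >= 1) is 1
    with probability lam, while a0 stays uniform; the expected number of
    variables in the constraint is m * lam."""
    vs = [i for i in range(1, m + 1) if random.random() < lam]
    rhs = random.random() < 0.5
    return vs, rhs
```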

Throughout this section, we fix a boolean formula \(\phi (y_1,...,y_m)\) and \(s\ge 1\) XOR constraints. Let

$$\chi _s{\mathop {=}\limits ^{\triangle }}\,\phi \wedge c_1\wedge \cdots \wedge c_s$$

where the \(c_i\) are chosen independently according to (4). For any assignment \(\sigma \) from the variables \(y_1,...,y_m\) to \(\mathbb {F}_2\), let us denote by \(Y_\sigma \) the Bernoulli r.v. which is 1 iff \(\sigma \) satisfies all the constraints \(c_1,\ldots ,c_s\) (hence, for a model \(\sigma \) of \(\phi \), \(Y_\sigma =1\) iff \(\sigma \) satisfies \(\chi _s\)).

We now list the steps needed for proving that A is a decision algorithm. The result follows from Proposition 1(2) (which derives from Lemma 1(1)), together with Theorem 2 below, which combines Proposition 1(1), Lemma 2(3) (which derives from Lemma 1(2)) and Lemma 3. In what follows, we shall let \(\rho {\mathop {=}\limits ^{\triangle }}\,1-2\lambda \).

Lemma 1

  1.

    \(\Pr (Y_\sigma =1)= E[Y_\sigma ]= 2^{-s}\).

  2.

    Let \(\sigma ,\sigma '\) be any two assignments and d be their Hamming distance in \(\mathbb {F}_2^m\) (i.e., the size of their symmetric difference seen as subsets of \(\{1,...,m\}\)). Then \(\Pr (Y_\sigma =1,Y_{\sigma '} =1)=E[Y_\sigma \cdot Y_{\sigma '}]=\left( \frac{1+\rho ^d}{4}\right) ^s\) (where we let \(\rho ^d{\mathop {=}\limits ^{\triangle }}\,1\) whenever \(\rho =d=0\).)

Proof (Sketch)

The first claim holds by construction. For the second claim, the crucial thing to prove is that, by fixing just one constraint \(c_1\) (so, \(s=1\)), we have that \(\Pr (Y_\sigma =1,Y_{\sigma '}=1) = \frac{1+\rho ^d}{4}\). To this aim, call A and B the sets of variables that are assigned value 1 in \(\sigma \) and \(\sigma '\), respectively; then, we let \(U=A\setminus B\), \(V=B\setminus A\) and \(I=A\cap B\). Let C be the set of variables appearing in the constraint \(c_1\); then, \(U_e\) and \(U_o\) abbreviate \(\Pr ( |C\cap U|\text { is even})\) and \(\Pr ( |C\cap U|\text { is odd})\), respectively; similarly for \(I_e,I_o\) and \(V_e,V_o\). Then,

$$ \begin{array}{l} \Pr (Y_\sigma =1,Y_{\sigma '}=1|a_0=0) = U_e I_e V_e+U_o I_o V_o \\ \Pr (Y_\sigma =1,Y_{\sigma '}=1|a_0=1) = U_o V_o + U_e V_e- U_o I_o V_o - U_e I_e V_e \end{array} $$

By elementary probability theory, \(\Pr (Y_\sigma =1,Y_{\sigma '}=1) = \frac{1}{2} (1- V_e- U_e+ 2 U_e V_e )\); the result is obtained by noting that \(U_e \ = \ \frac{1}{2} (1+\rho ^{|U|})\), \(V_e \ = \ \frac{1}{2} (1+\rho ^{|V|})\) and \(d=|U\cup V|=|U|+|V|\).    \(\square \)
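The closed form of Lemma 1(2) is easy to sanity-check numerically. The following self-contained simulation (purely illustrative) estimates \(\Pr (Y_\sigma =1,Y_{\sigma '}=1)\) for a single constraint (\(s=1\)), taking \(\sigma = 0\cdots 0\) and \(\sigma '\) the assignment flipping the first d bits, so that their Hamming distance is d, and compares the estimate with \(\frac{1+\rho ^d}{4}\).

```python
import random

def joint_sat_estimate(m, d, lam, trials=200_000):
    """Monte Carlo estimate of Pr(Y_sigma = 1, Y_sigma' = 1) for s = 1,
    with sigma = 0...0 and sigma' flipping the first d bits."""
    sigma, sigma2 = [0] * m, [1] * d + [0] * (m - d)
    hits = 0
    for _ in range(trials):
        a = [1 if random.random() < lam else 0 for _ in range(m)]
        a0 = random.randrange(2)
        v1 = sum(ai * yi for ai, yi in zip(a, sigma)) % 2
        v2 = sum(ai * yi for ai, yi in zip(a, sigma2)) % 2
        hits += (v1 == a0) and (v2 == a0)
    return hits / trials

m, d, lam = 20, 4, 0.1
rho = 1 - 2 * lam
print(joint_sat_estimate(m, d, lam), (1 + rho**d) / 4)  # both approx. 0.352
```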

Now let \(T_s\) be the random variable that counts the number of models of \(\chi _s\), when the constraints \(c_1,...,c_s\) are chosen independently according to distribution (4):

$$\begin{aligned} T_s \ {\mathop {=}\limits ^{\triangle }}\ \#\chi _s\,. \end{aligned}$$

The event that \(\chi _s\) is unsatisfiable can be expressed as \(T_s=0\). A first step toward establishing conditions under which A yields a decision algorithm is the following result, which makes clear that a viable strategy is to keep the variance of \(T_s\), which in turn depends on \(\lambda \), under control. Let us denote by \(\mu _s\) the expectation of \(T_s\) and by \(\mathrm {var}(T_s)\) its variance. Note that \(\mathrm {var}(T_s)>0\) if \(\#\phi >0\).

Proposition 1

  1.

    \(\#\phi > 2^{s+\beta }\) implies \(\Pr \left( A(s,\phi )=0 \right) \le \frac{1}{1+\frac{\mu _s^2}{\mathrm {var}(T_s)}}\);

  2.

    \(\#\phi < 2^{s-\alpha }\) implies \(\Pr \left( A(s,\phi )=1 \right) < 2^{-\alpha }\).

Proof (Sketch)

The first claim relies on a version of the Cantelli-Chebyshev inequality for integer nonnegative random variables (a.k.a. Alon-Spencer’s inequality); the second claim relies on Lemma 1(1) and Markov’s inequality.    \(\square \)

By the previous proposition, assuming \(\alpha >1\), we obtain a decision algorithm (Definition 1) provided that \(\mathrm {var}(T_s)<\mu _s^2\). This will depend on the value of \(\lambda \) that is chosen, which leads to the following definition.

Definition 2 (feasibility)

Let \(\phi \), s and \(\beta \) be given. A value \(\lambda \in \left( 0,\frac{1}{2}\right] \) is said to be \((\phi ,s,\beta )\)-feasible if \(\#\phi > 2^{s+\beta }\) implies \(\mathrm {var}(T_s)<\mu _s^2\), where the constraints in \(\chi _s\) are chosen according to (4).

Our goal is now to give a method to minimize \(\lambda \) while preserving feasibility. Recall that \(T_s{\mathop {=}\limits ^{\triangle }}\,\#\chi _s\). Denote by \(\sigma _1,...,\sigma _N\) the distinct models of \(\phi \) (hence, \(T_s \le N\)). Note that \(T_s=\sum _{i=1}^N Y_{\sigma _i}\). Given any two models \(\sigma _i\) and \(\sigma _j\), \(1\le i,j\le N\), let \(d_{ij}\) denote their Hamming distance. The following lemma gives exact formulae for the expected value and variance of \(T_s\).

Lemma 2

Let \(\rho =1-2\lambda \).

  1.

    \(\mu _s=E[T_s]= N 2^{-s}\);

  2.

    \(\mathrm {var}(T_s)= \mu _s + 4^{-s}\sum _{i=1}^N\sum _{j\ne i} ( {1+\rho ^{d_{ij}}} )^s - \mu ^2_s\);

  3.

If \(N \ne 0\), then \( \frac{\mathrm {var}(T_s)}{\mu _s^2} = \mu ^{-1}_s + N^{-2} \sum _{i=1}^N\sum _{j\ne i} ( {1+\rho ^{d_{ij}}} )^s -1\)

Proof (Sketch)

The first two items are a direct consequence of Lemma 1 and linearity of expectation; the third item derives from the previous ones.    \(\square \)
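When the models of \(\phi \) are explicitly available (e.g., for small benchmark formulae), Lemma 2(3) can be evaluated exactly; a minimal sketch:

```python
def hamming(u, v):
    """Hamming distance between two equal-length bit-tuples."""
    return sum(b1 != b2 for b1, b2 in zip(u, v))

def variance_ratio(models, s, lam):
    """Exact var(T_s)/mu_s^2 from Lemma 2(3), given the full list of
    (distinct) models of phi as bit-tuples; O(N^2) distance computations."""
    rho = 1 - 2 * lam
    N = len(models)
    mu_s = N * 2.0**-s
    S = sum((1 + rho**hamming(models[i], models[j]))**s
            for i in range(N) for j in range(N) if i != j)
    return 1 / mu_s + S / N**2 - 1
```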

Looking at the third item above, we clearly see that the upper bound on the error probability we are after depends heavily on how sparse the set of \(\phi \)'s models is in the Hamming space \(\mathbb {F}_2^m\): the sparser the set (hence the greater the pairwise distances), the lower the value of the double summation, and the better the bound. Let us denote by S the double summation in the third item of the above lemma:

$$\begin{aligned} S \ {\mathop {=}\limits ^{\triangle }}\ \sum _{i=1}^N\sum _{j\ne i} ( {1+\rho ^{d_{ij}}} )^s \end{aligned}$$

In what follows, we will give an upper bound on S which is easy to compute and depends on the minimum Hamming distance d between any two models of \(\phi \). We need some notation about models of a formula. Below, we let \(j=d,...,m\).

$$\begin{aligned} l_j \ &{\mathop {=}\limits ^{\triangle }}\ \left\{ \begin{array}{ll} j-\left\lceil \frac{d}{2}\right\rceil +1 & \text{ if } j\le \frac{m}{2} \\ \max \{0,\,m-j-\left\lceil \frac{d}{2}\right\rceil +1\} & \text{ if } j> \frac{m}{2} \end{array} \right. \\ w^* \ &{\mathop {=}\limits ^{\triangle }}\ \min \left\{ w\,:\,d\le w\le m\text{ and }\sum _{j=d}^w \binom{m}{l_j} \ge N-1 \right\} \end{aligned}$$
(5)
$$\begin{aligned} N^* \ {\mathop {=}\limits ^{\triangle }}\ \sum _{j=d}^{w^*-1} \binom{m}{l_j} \end{aligned}$$
(6)

where we stipulate that \(\min \emptyset =0\). Note that the definitions of \(w^*\) and \(N^*\) depend solely on N, m and d.
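The quantities in (5) and (6) are straightforward to compute by following the definitions literally; a sketch (with the stipulation \(\min \emptyset =0\)):

```python
from math import ceil, comb

def l_j(j, m, d):
    """The exponent l_j from (5)."""
    if j <= m / 2:
        return j - ceil(d / 2) + 1
    return max(0, m - j - ceil(d / 2) + 1)

def w_and_N_star(N, m, d):
    """w* and N* from (5)-(6); they depend only on N, m and d."""
    total = 0
    for w in range(d, m + 1):
        total += comb(m, l_j(w, m, d))
        if total >= N - 1:
            return w, sum(comb(m, l_j(j, m, d)) for j in range(d, w))
    return 0, 0  # min(emptyset) stipulated to be 0
```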

With the above definitions and results, we have the following upper bound on S.

Lemma 3

Let the minimal distance between any two models of \(\phi \) be at least d. Then

$$\begin{aligned} S \le N \left( \sum _{j=d}^{w^*-1} \binom{m}{l_j} ( {1+\rho ^{j}} )^s + (N-1-N^*) ( {1+\rho ^{w^*}} )^s\right) \end{aligned}$$

Proof (Sketch)

Fix one of the models of \(\phi \) (say \(\sigma _i\)), and consider the sub-summation it originates, \(S_i{\mathop {=}\limits ^{\triangle }}\,\sum _{j\ne i} ( {1+\rho ^{d_{ij}}} )^s\). Let us group the remaining \(N-1\) models into disjoint families, \(\mathcal {F}_d,\mathcal {F}_{d+1},\ldots \), of models that are at distance \(d,d+1,...\), respectively, from \(\sigma _i\). Note that each of the \(N-1\) models gives rise to exactly one term in the summation \(S_i\); hence, \(S_i = \sum _{j=d}^{m} |\mathcal {F}_j| ( {1+\rho ^{j}} )^s\). By the Ray-Chaudhuri–Wilson Lemma [2, Th. 4.2], \(|\mathcal {F}_j|\le \binom{m}{l_j}\). Hence, upper-bounding \(S_i\) amounts, e.g., to choosing a tuple of integers \(x_d, \ldots , x_m\) such that \(\sum _{j=d}^{m} x_j ( {1+\rho ^{j}} )^s \ge \sum _{j=d}^{m} |\mathcal {F}_j| ( {1+\rho ^{j}} )^s\), under the constraints \(0 \le x_j \le \binom{m}{l_j}\), for \(j=d,\ldots ,m\), and \(\sum _{j=d}^{m} x_j = N-1\). An optimal solution is

$$\begin{aligned} x_j = \left\{ \begin{array}{ll} \binom{m}{l_j} & \text{ for } j=d,\ldots ,w^*-1 \\ N-1-N^* & \text{ for } j = w^* \\ 0 & \text{ for } j > w^* \end{array} \right. \end{aligned}$$

The thesis is obtained by summing over all models \(\sigma _i\).    \(\square \)

Definition 3

Given \(s\ge 1\), \(\beta >0\), \(d\ge 1\) and \(\lambda \in \left( 0,\frac{1}{2}\right] \), let us define

$$\begin{aligned} B(s,m,\beta ,d,\lambda ) \ {\mathop {=}\limits ^{\triangle }}\ 2^{-\beta } + \frac{1}{N} \left( \sum _{j=d}^{w^*-1} \binom{m}{l_j} ( {1+\rho ^{j}} )^s + (N-1-N^*) ( {1+\rho ^{w^*}} )^s \right) - 1 \end{aligned}$$

where \(\rho {\mathop {=}\limits ^{\triangle }}\,1 - 2\lambda \) and \(N=\lceil 2^{s+\beta }\rceil \), also in the definition of \(w^*\) and \(N^*\).

Using the facts collected so far, the following theorem follows, giving an upper bound on \(\frac{\mathrm {var}(T_s)}{\mu ^2_s}\).

Theorem 2 (upper bound)

Let the minimal distance between models of \(\phi \) be at least d and \(\#\phi > 2^{s+\beta }\). Then, \(\frac{\mathrm {var}(T_s)}{\mu ^2_s} \le B(s,m,\beta ,d,\lambda )\).

Proof

First note that we can assume without loss of generality that \(\#\phi =N=\lceil 2^{s+\beta }\rceil \). If this were not the case, we could consider any formula \(\phi '\) whose models are models of \(\phi \) and are exactly \(\lceil 2^{s+\beta }\rceil \) in number (\(\phi '\) can be obtained by adding to \(\phi \) conjuncts that exclude \(\#\phi - \lceil 2^{s+\beta }\rceil \) models). Then, \(\Pr \left( A(s,\phi )=0 \right) \le \Pr \left( A(s,\phi ')=0 \right) \), which suffices for the purpose of upper-bounding \(\Pr \left( A(s,\phi )=0 \right) \). The result then follows from Proposition 1(1), Lemma 2(3), Lemma 3 and the fact that \(N = \lceil 2^{s+\beta }\rceil \ge 2^{s+\beta }\).    \(\square \)

The following notation will be useful in the rest of the paper. For \(0\le \gamma \le 1\), define

$$\begin{aligned} \lambda ^*_\gamma (s,m,\beta ,d) {\mathop {=}\limits ^{\triangle }}\,\inf \left\{ \lambda \in \left( 0, \frac{1}{2} \right] \,:\,B(s,m,\beta ,d,\lambda )\le \gamma \right\} \end{aligned}$$
(7)

where we stipulate \(\inf \emptyset =+\infty \).
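Numerically, \(B(s,m,\beta ,d,\lambda )\) and the threshold \(\lambda ^*_\gamma \) can be approximated as follows. The sketch below reuses l_j and w_and_N_star from the previous sketch; since B is non-increasing in \(\lambda \), a simple grid scan (a basic alternative to a proper root search) suffices.

```python
from math import ceil, comb

def B(s, m, beta, d, lam):
    """The bound B(s, m, beta, d, lambda) of Definition 3, following
    Lemma 2(3) and Lemma 3 with N = ceil(2**(s+beta))."""
    rho = 1 - 2 * lam
    N = ceil(2 ** (s + beta))
    w_star, N_star = w_and_N_star(N, m, d)   # from the previous sketch
    inner = sum(comb(m, l_j(j, m, d)) * (1 + rho**j)**s
                for j in range(d, w_star))
    inner += (N - 1 - N_star) * (1 + rho**w_star)**s
    return 2.0**-beta + inner / N - 1

def lambda_star(s, m, beta, d, gamma=1.0, grid=2000):
    """Approximates lambda*_gamma of (7) by scanning (0, 1/2]: the first
    grid point with B <= gamma approximates the infimum from above."""
    for k in range(1, grid + 1):
        lam = 0.5 * k / grid
        if B(s, m, beta, d, lam) <= gamma:
            return lam
    return float('inf')  # inf(emptyset) stipulated to be +infinity
```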

Corollary 1 (Feasibility)

Assume the minimal distance between any two models of \(\phi \) is at least d. Then every \(\lambda \in \left( \lambda ^*_1(s,m,\beta ,d) , \frac{1}{2} \right] \) is \((\phi ,s,\beta )\)-feasible.

4 Evaluation

4.1 Theoretical Bounds on Expected Constraint Length

To assess the improvements that our theory introduces on XOR-based approximate counting, we start by considering the impact of minimum Hamming distance on the expected length of the XOR constraints. First, in Fig. 1 we plot \(\lambda ^*_1\) as a function of s, for fixed values of \(m=32\) and 64, \(\beta =1.5\), and four different values of d. Note that the difference between different values of d tends to vanish as s gets large – i.e. close to m.

Fig. 1. Plots of \(\lambda ^*_1\) as a function of s, for \(m=32\) and \(m=64\), \(\beta =1.5\), and different values of d. For any value of s and d, any value of \(\lambda \) above the curve is feasible.

Next, we compare our theoretical bounds with those in [9], where a goal similar to ours is pursued. Interestingly, their bounds coincide with ours when setting \(d=1\) – no assumption on the minimum Hamming distance – showing that our approach generalizes theirs. We report a numerical comparison in Table 1, where several hypothetical values of m (no. of variables) and s (no. of constraints) are considered. Following a similar evaluation conducted in [9, Tab. 1], here we fix the error probability to \(\delta =\frac{4}{9}\) and the upper slack parameter to \(\beta =2\), and report the values of \(\lambda \times m\) for the minimal value of \(\lambda \) that would guarantee a confidence of at least \(1- \delta \) in case an upper bound is found, computed with their approach and ours. Specifically, in their case \(\lambda \) is obtained via the formulae in [9, Cor.1,Th.3], while in our case \(\lambda = \lambda ^*_{\gamma }\) for \(\gamma =0.8\), which entails the wanted confidence according to Proposition 1(2). We see that, under the assumption that lower bounds on d as illustrated are known, in some cases a dramatic reduction of the expected length of the XOR constraints is obtained.

Table 1. Comparison with the provable bounds from [9, Tab. 1].

4.2 Execution Times

Although the focus of the present paper is mostly theoretical, it is instructive to look at the results of some simple experiments for a first concrete assessment of the proposed methodology. To this aim, we have implemented the algorithm \(C_A\) in Python, with A as described in Sect. 3, relying on CryptoMiniSat [22] as the SAT solver, and conducted a few experiments.

The crucial issue in applying Theorem 2 is the knowledge of (a lower bound on) the minimal distance d among the models of the input formula. In general, this information is unknown; we discuss a possible approach to this problem in the next section. For the moment, we use a few formulae describing the sets of codewords of certain error-correcting codes, for which the number of models and the minimum distance are known by construction. Each such formula derives from a specific BCH code [5, 15] (well-known error-correcting codes whose minimal distance among codewords is lower-bounded by construction). In particular, Fxx-yy-zz.cnf is a CNF formula describing membership in the BCH code with \(2^\mathtt{xx}\) messages (the number of models), encoded via codewords of yy bits and with distance at least zz.

For these formulae, we ran three kinds of experiments. First, we ran the tool on every formula without using the improvement of the bounds given by knowing the minimum distance (i.e., we used Theorem 2 with \(d=1\)). Second, we ran the tool on every formula using the known minimum distance. Third, we ran the state-of-the-art tool for approximate model counting, ApproxMC3 [21] (an improved version of ApproxMC2 [7]). The results are reported in Table 2.

Table 2. Results for our tool with \(\alpha =\beta =1.5\), \(d=1\) and \(d=d_{min}\), compared to ApproxMC3 with a tolerance \(\epsilon = 3\). In all trials, the error probability \(\delta \) is 0.1.

To compare our results with theirs, we have to consider that, if our tool returns [l, u], then the number of models lies in \([\lfloor 2^l\rfloor , \lceil 2^u \rceil ]\) with error probability \(\delta \) (set to 0.1 in all experiments). By contrast, if ApproxMC3 returns a value M, then the number of models lies in \(\left[ \frac{M}{1+\epsilon }, M(1+\epsilon )\right] \) with error probability \(\delta \) (again, set to 0.1 in all experiments). So, we have to choose for ApproxMC3 a tolerance \(\epsilon \) that produces an interval of possible solutions comparable to what we obtain with our tool. The ratio between the sup and the inf of our intervals is \(2^{2\max (\alpha ,\beta )+1}\) (indeed, \(C_A\) always returned an interval such that \(u-l \le 2\max (\alpha ,\beta )+1\)); when \(\alpha =\beta =1.5\), this value is 16. By contrast, the ratio between the sup and the inf of ApproxMC3's intervals is \((1+\epsilon )^2\); this value is 16 for \(\epsilon = 3\).
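The matching of tolerances is simple arithmetic; a quick check:

```python
from math import sqrt

alpha = beta = 1.5
ratio = 2 ** (2 * max(alpha, beta) + 1)   # sup/inf of our intervals: 2^4 = 16
epsilon = sqrt(ratio) - 1                 # ApproxMC3 tolerance with equal ratio
print(ratio, epsilon)                     # 16.0 3.0
```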

In all formulae that have ‘sufficiently many’ models – empirically, at least \(2^{11}\) – our approach outperforms ApproxMC3. Moreover, making use of the minimum-distance information yields a gain in performance. Of course, the larger the distance, the greater the gain – again, provided that the formula has sufficiently many models: compare, e.g., the first and the last formula.

4.3 Towards a Practical Methodology

To be used in practice, our technique requires a lower bound on the minimum Hamming distance between any two models of \(\phi \). We discuss below how error-correcting codes might be used, in principle, to obtain such a bound. Generally speaking, an error-correcting code adds redundancy to a string of bits and inflates the minimum distance between the resulting codewords. The idea here is to transform the given formula \(\phi \) into a new formula \(\phi '\) that describes an encoding of the original formula's models: as a result, \(\#\phi '=\#\phi \), but the models of \(\phi '\) live in a higher-dimensional space, where a minimum distance between models is ensured.

Assume that \(\phi (y)\) is already in Conjunctive Normal Form. Fix a binary linear [n, m, d] block code \(\mathcal {C}\) (i.e. a code with \(2^m\) codewords of n bits and minimum distance d), and let G be its generator matrix, i.e. an \(m \times n\) binary matrix such that the codeword associated to \(u\in \mathbb {F}_2^m\) is uG (where we use vector-matrix multiplication over the field \(\mathbb {F}_2\)). The fact that a vector \(c\in \mathbb {F}_2^n\) is a codeword can be expressed by requiring that some u satisfy the conjunction of the n XOR constraints:

$$ c_i = \bigoplus _{j=1}^m u_j \cdot G_{ji} \quad \text{ for } i = 1,\ldots ,n\,. $$

Again, the important condition here is that G be sparse (on its columns), so that the above formula effectively corresponds to a conjunction of sparse XOR constraints. That is, we should confine ourselves to low-density parity check (LDPC) codes [11, 18]. Now, we consider the formula

$$\begin{aligned}&\phi '(z) {\mathop {=}\limits ^{\triangle }}\,\exists y(\phi (y)\wedge z = yG)\,. \end{aligned}$$

It can easily be proved that \(\phi \) and \(\phi '\) have the same number of models. If we can now assume a minimum distance of d when applying Theorem 2, we get a decrease in the feasibility threshold \(\lambda ^*\), as prescribed by (7). This gain must of course be balanced against the increased number of Boolean variables in the formula (viz. n). Using \(\mathcal {C}\) will be an actual advantage over not using it (and simply assuming \(d=1\)) if and only if the expected length of the resulting XOR constraints is actually smaller. Letting \(\lambda ^{*,{d}}{\mathop {=}\limits ^{\triangle }}\,\lambda ^*_1(s,n,\beta ,d)\), this holds if and only if

$$\begin{aligned} n\lambda ^{*,{d}} \le m\lambda ^{*,1} \end{aligned}$$
(8)

or equivalently \(\lambda ^{*,{d}} \le R\lambda ^{*,{1}}\), where \(R{\mathop {=}\limits ^{\triangle }}\,\frac{m}{n}\) is the rate of \(\mathcal {C}\). This points to codes with high rate and large minimum distance. Although these two parameters pull against each other, (8) can be fulfilled, and good expected-length bounds obtained, by choosing \(\mathcal {C}\) appropriately.
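To make the construction concrete, the following sketch (illustrative; the generator matrix G is an assumed input, e.g. produced by an external LDPC library) builds the XOR constraints linking the original variables y (the message u) to fresh codeword variables z, in the (vars, rhs) format of the samplers above; conjoined with the CNF of \(\phi \), they realize \(\phi '\).

```python
def embedding_constraints(G, y_vars, z_vars):
    """XOR constraints z_i = XOR_j y_j * G[j][i], with G an m x n 0/1
    generator matrix (rows index y, columns index z). Each constraint is
    returned as (vars, rhs), with rhs False since
    z_i xor (XOR_j y_j * G[j][i]) = 0."""
    m, n = len(G), len(G[0])
    return [([y_vars[j] for j in range(m) if G[j][i] == 1] + [z_vars[i]], False)
            for i in range(n)]
```

For a generator matrix that is sparse on its columns, as required above, each such constraint is short, which is precisely the point of the construction.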

Fig. 2. Plots of the expected length of XOR constraints as a function of s, with and without the code, and the relative percentage gain. Here, \(m=32\) (left) and 64 (right), \(\beta =1.5\), and the code is a [155, 64, 20]-LDPC.

For example, [10] presents a [155, 64, 20]-LDPC code, that is, a code with block length 155, \(2^{64}\) codewords and minimum distance 20 between codewords. In Fig. 2 we compare the expected length of the resulting XOR constraints in the two cases – \(m\times \lambda ^*_1(s,m,\beta ,1)\) (without the code, for \(m=32\) and \(m=64\)) and \(155\times \lambda ^*_1(s,155,\beta ,20)\) (with the code) – as functions of s, for fixed \(\beta =1.5\). As the plots show, the use of the code offers a significant advantage in terms of expected length up to \(s=10\) and \(s=26\), respectively.

We have performed a few tests for a preliminary practical assessment of this idea. Unfortunately, in all but a few cases, the use of the said [155, 64, 20]-LDPC code does not offer a significant advantage in terms of execution time. Indeed, embedding a formula in this code means adding 155 new XOR constraints: the presence of so many constraints, however short, apparently outweighs the benefit of a minimum distance \(d=20\). We hope that alternative codes, with a more advantageous block-length versus minimum-distance tradeoff, will fare better. Indeed, as we showed in Table 1, relatively small distances (e.g. \(d=5\)) can already yield interesting gains, if the number of extra constraints is small. We leave this as a subject for future research.

5 Conclusion

We have studied the relation between sparse XOR constraints and minimum Hamming distance in model counting. Our findings suggest that minimum distance plays an important role in making the feasibility threshold for \(\lambda \) (density) lower, thus potentially improving the effectiveness of XOR based model counting procedures. These results also prompt a natural direction for future research: embedding the set of models into a higher dimensional Hamming space, so as to enforce a given minimum distance.

Besides the already mentioned [9], our work also relates to the recent [1]. There, constraints are represented as systems \(Ay = b\), for A a random LDPC matrix enjoying certain properties, b a random vector, and y the vector of variables. Their results are quite different from ours, but they also take the geometry of the set of models into account, including minimum distance. In particular, their bounds also depend on a “boost” parameter which appears quite difficult to compute. This leads to a methodology that is only empirically validated – that is, the model-count results come with no formal guarantee.