
1 Introduction

Weighted model counting (WMC), a generalisation of propositional model counting that assigns weights to literals and computes the total weight of all models of a propositional formula [11], has emerged as a powerful computational framework for problems in many domains, e.g., probabilistic graphical models such as Bayesian networks and Markov networks [3, 8, 9, 15, 32], neuro-symbolic artificial intelligence [37], probabilistic programs [26], and probabilistic logic programs [21]. It has been extended to support continuous variables [6], infinite domains [4], first-order logic [24, 36], and arbitrary semirings [5, 27]. However, as the definition of WMC puts weights on literals, additional variables often need to be added for the sole purpose of holding a weight [3, 8, 9, 15, 32]. This can be particularly detrimental to WMC algorithms that rely on variable ordering heuristics.

One approach to this problem considers weighted clauses and probabilistic semantics based on Markov networks [22]. However, with a new representation comes the need to invent new encodings and inference algorithms. Our work is similar in spirit in that it introduces a new representation for computational problems but can reuse recent WMC algorithms based on pseudo-Boolean function manipulation, namely, ADDMC [19] and DPMC [20]. Furthermore, we identify sufficient conditions for transforming a WMC instance into our new format. As many WMC inference algorithms [16, 29] work by compilation to tractable representations such as arithmetic circuits, deterministic decomposable negation normal form (d-DNNF) [14], and sentential decision diagrams (SDDs) [17], another way to avoid parameter variables could be via direct compilation to a more convenient representation. Direct compilation of Bayesian networks to SDDs has been investigated [13]. However, SDDs only support weights on literals and so are not expressive enough to avoid the issue. To the best of the authors’ knowledge, neither approach [13, 22] has a publicly available implementation.

In this work, we introduce a way to transform WMC problems into a new format based on pseudo-Boolean functions—pseudo-Boolean projection (PBP). We formally show that every WMC problem instance has a corresponding PBP instance and identify conditions under which this transformation can remove parameter variables. Four out of the five known WMC encodings for Bayesian networks [3, 8, 9, 15, 32] can indeed be simplified in this manner. We are able to eliminate 43 % of variables on average and up to 99 % on some instances. This transformation enables two encodings that were previously incompatible with most WMC algorithms (due to using a different definition of WMC [8, 9]) to be run with ADDMC and DPMC and results in a significant performance boost for one other encoding, making it about three times faster than the state of the art. Finally, our theoretical contributions result in a convenient algebraic way of reasoning about two-valued pseudo-Boolean functions and position WMC encodings on common ground, identifying their key properties and assumptions.

2 Weighted Model Counting

We begin with an overview of some notation and terminology. We use \(\wedge \), \(\vee \), \(\lnot \), \(\Rightarrow \), and \(\Leftrightarrow \) to denote conjunction, disjunction, negation, material implication, and material biconditional, respectively. Throughout the paper, we use set-theoretic notation for many concepts in logic. A clause is a set of literals that are part of an implicit disjunction. Similarly, a formula in CNF is a set of clauses that are part of an implicit conjunction. We identify a model with a set of variables that correspond to the positive literals in the model (and all other variables are the negative literals of the model). We can then define the cardinality of a model as the cardinality of this set. For example, let \(\phi = (\lnot a \vee b) \wedge a\) be a propositional formula over variables a and b. Then an equivalent set-theoretic representation of \(\phi \) is \(\{ \{ \lnot a, b \}, \{ a \} \}\). Any subset of \(\{ a, b \}\) is an interpretation of \(\phi \), e.g., \(\{ a, b \}\) is a model of \(\phi \) (written \(\{ a, b \} \models \phi \)) of cardinality two, while \(\emptyset \) is an interpretation but not a model. We can now formally define WMC.
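To make these set-theoretic conventions concrete, here is a minimal Python sketch (our own illustration; the representation is an assumption of ours, not one used by any tool discussed in this paper) with literals as variable-polarity pairs and models as sets of true variables:

```python
# A literal is a (variable, polarity) pair, a clause is a frozenset of
# literals, a CNF formula is a set of clauses, and a model is the set of
# variables assigned true.
phi = {frozenset({("a", False), ("b", True)}),  # ¬a ∨ b
       frozenset({("a", True)})}                # a

def satisfies(model, formula):
    """True iff `model` (a set of true variables) satisfies the CNF formula."""
    return all(any((var in model) == polarity for var, polarity in clause)
               for clause in formula)

assert satisfies({"a", "b"}, phi)  # {a, b} is a model of cardinality two
assert not satisfies(set(), phi)   # ∅ is an interpretation but not a model
```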

Definition 1 (WMC)

A WMC instance is a tuple \((\phi , X_I, X_P, w)\), where \(X_I\) is the set of indicator variables, \(X_P\) is the set of parameter variables (with \(X_I \cap X_P = \emptyset \)), \(\phi \) is a propositional formula in CNF over \(X_I \cup X_P\), and \(w:X_I \cup X_P \cup \{\lnot x \mid x \in X_I \cup X_P\} \rightarrow \mathbb {R}\) is a weight function such that \(w(x) = w(\lnot x) = 1\) for all \(x \in X_I\). The answer of the instance is \(\sum _{Y \models \phi } \prod _{Y \models l} w(l)\).

That is, the answer to a WMC instance is the sum of the weights of all models of \(\phi \), where the weight of a model is defined as the product of the weights of all (positive and negative) literals in it. Our definition of WMC is largely based on the standard definition [11], but explicitly partitions variables into indicator and parameter variables. In practice, we identify this partition in one of two ways. If an encoding is generated by AceFootnote 1, then variable types are explicitly identified in a file generated alongside the encoding. Otherwise, we take \(X_I\) to be the set of all variables x such that \(w(x) = w(\lnot x) = 1\). Next, we formally define a variation of the WMC problem used by some of the Bayesian network encodings [8, 9].
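Definition 1 can be read off as a brute-force reference implementation. The sketch below (ours; exponential-time and purely illustrative) reuses `satisfies` from the previous listing:

```python
from itertools import chain, combinations
from math import prod

def powerset(xs):
    """All subsets of xs, as tuples."""
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def wmc(formula, variables, weight):
    """Answer of a WMC instance (Definition 1): the sum, over all models Y of
    `formula`, of the product of the weights of the literals that Y satisfies.
    `weight` maps each literal (variable, polarity) to a real number."""
    return sum(prod(weight[(v, v in model)] for v in variables)
               for model in (set(t) for t in powerset(variables))
               if satisfies(model, formula))
```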

Definition 2

Let \(\phi \) be a formula over a set of variables X. Then \(Y \subseteq X\) is a minimum-cardinality model of \(\phi \) if \(Y \models \phi \) and \(|Y| \le |Z|\) for all \(Z \models \phi \).

Definition 3 (Minimum-Cardinality WMC)

A minimum-cardinality WMC instance consists of the same tuple as a WMC instance, but its answer is defined to be \(\sum _{Y \models \phi \text {, }|Y| = k} \prod _{Y \models l} w(l)\) (where \(k = \min _{Y \models \phi } |Y|\)) if \(\phi \) is satisfiable, and zero otherwise.

Example 1

Let \(\phi = (x \vee y) \wedge (\lnot x \vee \lnot y) \wedge (\lnot x \vee p) \wedge (\lnot y \vee q) \wedge x\), \(X_I = \{ x, y \}\), \(X_P = \{ p, q \}\), \(w(p) = 0.2\), \(w(q) = 0.8\), and \(w(\lnot p) = w(\lnot q) = 1\). Then \(\phi \) has two models: \(\{x, p\}\) and \(\{ x, p, q \}\) with weights 0.2 and \(0.2 \times 0.8 = 0.16\), respectively. The WMC answer is then \(0.2 + 0.16 = 0.36\), and the minimum-cardinality WMC answer is 0.2.
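Continuing the brute-force sketch, a minimum-cardinality variant restricts the sum to models of minimum cardinality; running both on Example 1 (with all indicator literals given weight one, as Definition 1 requires) reproduces the answers above:

```python
def min_card_wmc(formula, variables, weight):
    """Answer of a minimum-cardinality WMC instance (Definition 3)."""
    models = [set(t) for t in powerset(variables)
              if satisfies(set(t), formula)]
    if not models:
        return 0.0  # the formula is unsatisfiable
    k = min(len(m) for m in models)
    return sum(prod(weight[(v, v in m)] for v in variables)
               for m in models if len(m) == k)

phi = {frozenset({("x", True), ("y", True)}),    # x ∨ y
       frozenset({("x", False), ("y", False)}),  # ¬x ∨ ¬y
       frozenset({("x", False), ("p", True)}),   # ¬x ∨ p
       frozenset({("y", False), ("q", True)}),   # ¬y ∨ q
       frozenset({("x", True)})}                 # x
w = {("x", True): 1, ("x", False): 1, ("y", True): 1, ("y", False): 1,
     ("p", True): 0.2, ("p", False): 1, ("q", True): 0.8, ("q", False): 1}
assert abs(wmc(phi, {"x", "y", "p", "q"}, w) - 0.36) < 1e-9
assert abs(min_card_wmc(phi, {"x", "y", "p", "q"}, w) - 0.2) < 1e-9
```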

2.1 Bayesian Network Encodings

A Bayesian network is a directed acyclic graph with random variables as vertices and edges representing conditional dependencies. As is common in related literature [15, 32], we assume that each variable has a finite number of values. We call a Bayesian network binary if every variable has two values. If all variables have finite numbers of values, the probability function associated with each variable v can be represented as a conditional probability table (CPT), i.e., a table with a row for each combination of values that v and its parent vertices can take. Each row then also has a probability, i.e., a number in [0, 1].
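For illustration only, such a CPT can be stored as a mapping from value combinations to probabilities (the variable names and numbers below are hypothetical):

```python
# A hypothetical CPT for Pr(B | A), where A and B each have two values.
# Each key pairs a value of B with a value of its sole parent A, i.e.,
# one row of the table; rows sharing the same parent value sum to one.
cpt_b_given_a = {
    ("b1", "a1"): 0.3, ("b2", "a1"): 0.7,
    ("b1", "a2"): 0.9, ("b2", "a2"): 0.1,
}
```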

WMC is a well-established technique for Bayesian network inference, particularly effective on networks where most variables have only a few possible values [15]. Many ways of encoding a Bayesian network into a WMC instance have been proposed. We will refer to them based on the initials of the authors and the year of publication. Darwiche was the first to suggest the d02 [15] encoding, which, in many ways, remains the foundation for most other encodings. He also introduced the distinction between indicator and parameter variables; the former represent variable-value pairs in the Bayesian network, while the latter are associated with probabilities in the CPTs. The encoding sbk05 [32] is the only encoding that deviates from this arrangement: for each variable in the Bayesian network, one indicator variable acts simultaneously as a parameter variable. Chavira and Darwiche propose cd05 [8], where they shift from WMC to minimum-cardinality WMC because that allows the encoding to have fewer variables and clauses. In particular, they propose a way to use the same parameter variable to represent all probabilities in a CPT that are equal and keep only clauses that ‘imply’ parameter variables (i.e., omit clauses where a parameter variable implies indicator variables).Footnote 2 In their next encoding, cd06 [9], the same authors optimise the aforementioned implication clauses, choosing the smallest sufficient selection of indicator variables. A decade later, Bart et al. present bklm16 [3], which improves upon cd06 in two ways. First, they reduce the number of indicator variables used per Bayesian network variable from linear to logarithmic. Second, they introduce a scaling factor that can ‘absorb’ one probability per Bayesian network variable. However, for this work, we choose to disable the latter improvement since this scaling factor is often small enough to be indistinguishable from zero without the use of arbitrary precision arithmetic, making it completely unusable on realistic instances. Indeed, the reader is free to check that even a small Bayesian network with seven mutually independent binary variables, each with probabilities 0.1 and 0.9, is already big enough for the scaling factor (as produced by the bklm16 encoderFootnote 3) to be exactly equal to zero. We suspect that this issue was not identified during the original set of experiments because the authors never looked at numerical answers.

Example 2

Let \(\mathcal {B}\) be a Bayesian network with one variable X which has two values \(x_1\) and \(x_2\) with probabilities \(\Pr (X = x_1) = 0.2\) and \(\Pr (X = x_2) = 0.8\). Let xy be indicator variables, and pq be parameter variables. Then Example 1 is both the cd05 and the cd06 encoding of \(\mathcal {B}\). The bklm16 encoding is \((x \Rightarrow p) \wedge (\lnot x \Rightarrow q) \wedge x\) with \(w(p) = w(\lnot q) = 0.2\), and \(w(\lnot p) = w(q) = 0.8\). And the d02 encoding is \((\lnot x \Rightarrow p) \wedge (p \Rightarrow \lnot x) \wedge (x \Rightarrow q) \wedge (q \Rightarrow x) \wedge \lnot x\) with \(w(p) = 0.2\), \(w(q) = 0.8\), and \(w(\lnot p) = w(\lnot q) = 1\). Note how all other encodings have fewer clauses than d02. While cd05 and cd06 require minimum-cardinality WMC to make this work, bklm16 achieves the same thing by adjusting weights.Footnote 4

3 Pseudo-Boolean Functions

In this work, we propose a more expressive representation for WMC based on pseudo-Boolean functions. A pseudo-Boolean function is a function of the form \(\{ 0, 1 \}^n \rightarrow \mathbb {R}\) [7]. Equivalently, let X denote a set with n elements (we will refer to them as variables), and let \(2^X\) denote its powerset. Then a pseudo-Boolean function can have \(2^X\) as its domain, in which case it is also known as a set function.

Pseudo-Boolean functions, most commonly represented as algebraic decision diagrams (ADDs) [2] (although a tensor-based approach has also been suggested [18, 20]), have seen extensive use in value iteration for Markov decision processes [25], both exact and approximate Bayesian network inference [10, 23], and sum-product network [30] to Bayesian network conversion [38]. ADDs have been extended to compactly represent additive and multiplicative structure [35], sentences in first-order logic [33], and continuous variables [34], the last of which was also applied to weighted model integration, i.e., the WMC extension for continuous variables [6, 28].

Since two-valued pseudo-Boolean functions will be used extensively henceforth, we introduce some new notation. For any propositional formula \(\phi \) over X and \(p, q \in \mathbb {R}\), let \([\phi ]^p_q:2^X \rightarrow \mathbb {R}\) be the pseudo-Boolean function defined as

$$ [\phi ]^p_q(Y) = {\left\{ \begin{array}{ll} p &{} \text {if } Y \models \phi , \\ q &{} \text {otherwise,} \end{array}\right. } $$

for any \(Y \subseteq X\). Next, we define some useful operations on pseudo-Boolean functions. The definitions of multiplication and projection are equivalent to those in previous work [19, 20].

Definition 4 (Operations)

Let \(f, g:2^X \rightarrow \mathbb {R}\) be pseudo-Boolean functions, \(x, y \in X\), \(Y = \{y_i\}_{i=1}^n \subseteq X\), and \(r \in \mathbb {R}\). Operations such as addition and multiplication are defined pointwise, i.e., \((f + g)(Y) = f(Y) + g(Y)\), and likewise for multiplication. Note that properties such as associativity and commutativity are inherited from \(\mathbb {R}\). By regarding a real number as a constant pseudo-Boolean function, we can reuse the same definitions to define scalar operations as \((r+f)(Y) = r+f(Y)\), and \((r \cdot f)(Y) = r \cdot f(Y)\).

Restrictions \(f|_{x=0}, f|_{x=1}:2^X \rightarrow \mathbb {R}\) of f are defined as \(f|_{x=0}(Y) = f(Y \setminus \{x\})\) and \(f|_{x=1}(Y) = f(Y \cup \{x\})\) for all \(Y \subseteq X\).

Projection \(\exists _x\) is an endomorphism \(\exists _x:\mathbb {R}^{2^X} \rightarrow \mathbb {R}^{2^X}\) defined as \(\exists _x f = f|_{x=1} + f|_{x=0}\). Since projection is commutative (i.e., \(\exists _x\exists _yf = \exists _y\exists _xf\)) [19, 20], we can define \(\exists _Y:\mathbb {R}^{2^X} \rightarrow \mathbb {R}^{2^X}\) as \(\exists _Y = \exists _{y_1} \circ \exists _{y_2} \circ \dots \circ \exists _{y_n}\). Throughout the paper, projection is assumed to have the lowest precedence (e.g., \(\exists _x fg = \exists _x (fg)\)).
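To make these definitions concrete, we continue the Python sketch, representing a pseudo-Boolean function as a callable from a set of true variables to a real number (an illustrative encoding of ours; ADD-based tools such as ADDMC and DPMC realise the same operations symbolically):

```python
def two_valued(phi, p, q):
    """[phi]^p_q: maps Y to p if Y satisfies phi, and to q otherwise."""
    return lambda Y: p if satisfies(Y, phi) else q

def add(f, g):
    """Pointwise addition: (f + g)(Y) = f(Y) + g(Y)."""
    return lambda Y: f(Y) + g(Y)

def multiply(f, g):
    """Pointwise multiplication: (f * g)(Y) = f(Y) * g(Y)."""
    return lambda Y: f(Y) * g(Y)

def restrict(f, x, value):
    """Restrictions f|_{x=1}(Y) = f(Y ∪ {x}) and f|_{x=0}(Y) = f(Y \\ {x})."""
    return (lambda Y: f(Y | {x})) if value == 1 else (lambda Y: f(Y - {x}))

def project(f, x):
    """Projection: ∃_x f = f|_{x=1} + f|_{x=0}."""
    return add(restrict(f, x, 1), restrict(f, x, 0))
```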

Below we list some properties of the operations on pseudo-Boolean functions discussed in this section that can be conveniently expressed in our new notation. The proofs of all these properties follow directly from the definitions.

Proposition 1 (Basic Properties)

For any propositional formulas \(\phi \) and \(\psi \), and \(a, b, c, d \in \mathbb {R}\),

  • \([\phi ]^a_b = [\lnot \phi ]^b_a\);

  • \(c + [\phi ]^a_b = [\phi ]^{a+c}_{b+c}\);

  • \(c \cdot [\phi ]^a_b = [\phi ]^{ac}_{bc}\);

  • \([\phi ]^a_b \cdot [\phi ]^c_d = [\phi ]^{ac}_{bd}\);

  • \([\phi ]^1_0 \cdot [\psi ]_0^1 = [\phi \wedge \psi ]_0^1\).

And for any pair of pseudo-Boolean functions \(f, g :2^X \rightarrow \mathbb {R}\) and \(x \in X\), \((fg)|_{x=i} = f|_{x=i} \cdot g|_{x=i}\) for \(i = 0, 1\).
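All of these identities can be spot-checked exhaustively on small examples with the sketch above, e.g.:

```python
x_cnf = {frozenset({("x", True)})}   # the formula x
y_cnf = {frozenset({("y", True)})}   # the formula y
xy_cnf = x_cnf | y_cnf               # x ∧ y (union of clause sets)
for t in powerset({"x", "y"}):
    Y = set(t)
    # c · [φ]^a_b = [φ]^{ac}_{bc}, with a, b, c = 2, 3, 5
    assert 5 * two_valued(x_cnf, 2, 3)(Y) == two_valued(x_cnf, 10, 15)(Y)
    # [φ]^1_0 · [ψ]^1_0 = [φ ∧ ψ]^1_0
    assert (two_valued(x_cnf, 1, 0)(Y) * two_valued(y_cnf, 1, 0)(Y)
            == two_valued(xy_cnf, 1, 0)(Y))
```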

Remark 1

Note that our definitions of binary operations assumed equal domains. For convenience, we can assume domains to shrink whenever a function is independent of some of the variables (i.e., \(f|_{x=0} = f|_{x=1}\)) and expand for binary operations to make the domains of both functions equal. For instance, let \([x]_0^1,[\lnot x]_0^1:2^{\{x\}} \rightarrow \mathbb {R}\) and \([y]_0^1:2^{\{y\}} \rightarrow \mathbb {R}\) be pseudo-Boolean functions. Then \([x]_0^1 \cdot [\lnot x]_0^1\) has \(2^\emptyset \) as its domain. To multiply \([x]_0^1\) and \([y]_0^1\), we expand \([x]_0^1\) into \(\left( [x]_0^1\right) ':2^{\{x, y\}} \rightarrow \mathbb {R}\) defined as \(\left( [x]_0^1\right) '(Z) = [x]_0^1(Z \cap \{x\})\) for all \(Z \subseteq \{ x, y \}\) (and equivalently for \([y]_0^1\)).

4 Pseudo-Boolean Projection

We introduce a new type of computational problem called pseudo-Boolean projection, based on two-valued pseudo-Boolean functions. While the same computational framework can handle arbitrary pseudo-Boolean functions, two-valued functions are particularly convenient because DPMC can be easily adapted to use them as input. Since we will only encounter functions of the form \([\phi ]^a_b\), where \(\phi \) is a conjunction of literals, we can represent such a function in text as w \(\langle \phi \rangle \) a b, where \(\langle \phi \rangle \) is a representation of \(\phi \) analogous to the representation of a clause in the DIMACS CNF format.

Definition 5 (PBP Instance)

A PBP instance is a tuple \((F, X, \omega )\), where X is the set of variables, F is a set of two-valued pseudo-Boolean functions \(2^X \rightarrow \mathbb {R}\), and \(\omega \in \mathbb {R}\) is the scaling factor.Footnote 5 Its answer is \(\omega \cdot \left( \exists _X\prod _{f \in F}f\right) (\emptyset )\).
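Definition 5, too, has a direct (exponential-time) reference implementation in our running sketch; DPMC instead evaluates such instances over ADDs, using early projection (Theorem 1 below) to stay tractable. For example, \(F = \{ [x]_0^1, [x]_{0.5}^{0.9} \}\) over \(X = \{x\}\) with \(\omega = 1\) has answer \(1 \cdot 0.9 + 0 \cdot 0.5 = 0.9\):

```python
from functools import reduce

def pbp_answer(F, X, omega):
    """Answer of a PBP instance (Definition 5): omega * (∃_X ∏_{f∈F} f)(∅)."""
    f = reduce(multiply, F, lambda Y: 1.0)  # the product of all functions in F
    for x in X:                             # project out every variable
        f = project(f, x)
    return omega * f(set())

x_cnf = {frozenset({("x", True)})}
F = [two_valued(x_cnf, 1, 0), two_valued(x_cnf, 0.9, 0.5)]
assert abs(pbp_answer(F, {"x"}, 1.0) - 0.9) < 1e-9
```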

Algorithm 1. Transforming a WMC instance into a PBP instance (listing not reproduced; see the description in Sect. 4.1).

4.1 From WMC to PBP

In this section, we describe an algorithm for transforming WMC instances into the PBP format while removing all parameter variables. We chose to transform existing encodings instead of creating a new one in order to reuse existing techniques (such as prime implicants and limited forms of resolution) for encoding each CPT into a minimal logical representation [3, 8, 9]. The transformation algorithm works on four out of the five Bayesian network encodings: bklm16 [3], cd05 [8], cd06 [9], and d02 [15]. There is no obvious way to adjust it to work with sbk05 because the roles of indicator and parameter variables overlap [32].

The algorithm is based on several observations that will be made more precise in Sect. 4.2. First, all weights except for \(\{w(p) \mid p \in X_P\}\) are redundant, as they either duplicate an already-defined weight or are equal to one. Second, each clause has at most one parameter variable. Third, if the parameter variable is negated, we can ignore the clause (this idea first appears in the cd05 paper [8]). Note that we formulate our algorithm as a post-processing step after WMC encoding primarily because the implementations of Bayesian network WMC encodings are all closed-source; since all transformations in the algorithm are local, it could equally be incorporated into a WMC encoding algorithm with no slowdown.

The algorithm is listed as Algorithm 1. The main part of the algorithm is the first loop, which iterates over clauses. If a clause consists of a single parameter variable, we incorporate its weight into \(\omega \). If a clause is of the form \(\alpha \Rightarrow p\), where \(p \in X_P\), and \(\alpha \) is a conjunction of literals over \(X_I\), we transform it into the pseudo-Boolean function \([\alpha ]_1^{w(p)}\). If a clause \(c \in \phi \) has no parameter variables, we reformulate it as the pseudo-Boolean function \([c]_0^1\). Finally, clauses with negative parameter literals are omitted.

As all ‘weighted’ pseudo-Boolean functions produced by the first loop are of the form \([\alpha ]_1^p\) (for some \(p \in \mathbb {R}\) and formula \(\alpha \)), the second loop merges two functions into one whenever \(\alpha \) is a literal. Note that taking into account the order in which clauses are typically generated by encoding algorithms allows us to do this in linear time (i.e., the two mergeable functions will be generated one after the other).
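The sketch below reconstructs Algorithm 1 from the prose description above; since the original listing is not reproduced here, the details (names, data layout) are our own. A function is stored as a triple \((\alpha , a, b)\) denoting \([\bigwedge \alpha ]^a_b\), mirroring the textual format w \(\langle \phi \rangle \) a b:

```python
def wmc_to_pbp(phi, X_I, X_P, w):
    """A sketch of Algorithm 1: transform a WMC instance into a PBP
    instance (F, X_I, omega), removing all parameter variables.
    A triple (alpha, a, b), with alpha a frozenset of literals, stands
    for the two-valued function [⋀alpha]^a_b."""
    F, omega = [], 1.0
    for clause in phi:
        params = [(v, pol) for (v, pol) in clause if v in X_P]
        if any(not pol for (_, pol) in params):
            continue                      # negative parameter literal: omit clause
        if params:
            (p, _), = params              # at most one parameter variable per clause
            if len(clause) == 1:          # unit clause {p}: fold w(p) into omega
                omega *= w[(p, True)]
            else:                         # clause α ⇒ p becomes [α]_1^{w(p)}
                alpha = frozenset((v, not pol) for (v, pol) in clause if v != p)
                F.append((alpha, w[(p, True)], 1.0))
        else:                             # parameter-free clause c becomes [c]_0^1,
            alpha = frozenset((v, not pol) for (v, pol) in clause)
            F.append((alpha, 0.0, 1.0))   # stored as the equivalent [⋀_{l∈c} ¬l]_0^1
    merged, i = [], 0                     # second loop: merge adjacent [v]_1^p
    while i < len(F):                     # and [¬v]_1^q into [v]_q^p
        if i + 1 < len(F):
            (a1, hi1, lo1), (a2, hi2, lo2) = F[i], F[i + 1]
            if len(a1) == len(a2) == 1 and lo1 == lo2 == 1.0:
                (v, pol), = a1
                (u, qol), = a2
                if v == u and pol != qol:
                    hi, lo = (hi1, hi2) if pol else (hi2, hi1)
                    merged.append((frozenset({(v, True)}), hi, lo))
                    i += 2
                    continue
        merged.append(F[i])
        i += 1
    return merged, set(X_I), omega
```

Applied to Example 1 (a cd05/cd06-style encoding), this produces \([x \vee y]_0^1\), \([\lnot x \vee \lnot y]_0^1\), \([x]_0^1\), \([x]_1^{0.2}\), and \([y]_1^{0.8}\) with \(\omega = 1\); evaluating the resulting PBP instance gives 0.2, the minimum-cardinality WMC answer, as Theorem 3 (Sect. 4.2) predicts.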

4.2 Correctness Proofs

In this section, we outline key conditions that a (WMC or minimum-cardinality WMC) encoding has to satisfy for Algorithm 1 to output an equivalent PBP instance. We divide the correctness proof into two theorems: Theorem 2 for WMC encodings (i.e., bklm16 and d02) and Theorem 3 for minimum-cardinality WMC encodings (i.e., cd05 and cd06). We begin by listing some properties of pseudo-Boolean functions and establishing a canonical transformation from WMC to PBP.

Theorem 1

(Early Projection [19, 20]). Let X and Y be sets of variables. For all pseudo-Boolean functions \(f:2^X \rightarrow \mathbb {R}\) and \(g:2^Y \rightarrow \mathbb {R}\), if \(x \in X \setminus Y\), then \(\exists _x (f \cdot g) = (\exists _x f) \cdot g\).
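Early projection is what allows DPMC to interleave multiplications with projections rather than materialise the full product. A quick exhaustive spot-check with the earlier sketch, where x occurs only in f (integer weights keep the comparison exact):

```python
f = two_valued({frozenset({("x", True)})}, 3, 7)  # [x]^3_7, mentions only x
g = two_valued({frozenset({("y", True)})}, 2, 5)  # [y]^2_5, mentions only y
lhs = project(multiply(f, g), "x")                # ∃_x (f · g)
rhs = multiply(project(f, "x"), g)                # (∃_x f) · g
for t in powerset({"x", "y"}):
    assert lhs(set(t)) == rhs(set(t))
```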

Lemma 1

For any pseudo-Boolean function \(f:2^X \rightarrow \mathbb {R}\), we have that \((\exists _X f)(\emptyset ) = \sum _{Y \subseteq X} f(Y)\).

Proof

If \(X = \{x\}\), then \((\exists _xf)(\emptyset ) = (f|_{x=1} + f|_{x=0})(\emptyset ) = f|_{x=1}(\emptyset ) + f|_{x=0}(\emptyset ) = f(\{x\}) + f(\emptyset ) = \sum _{Y \subseteq \{x\}} f(Y)\). This extends to \(|X| > 1\) by induction, applying the definition of projection on sets of variables one variable at a time.

Proposition 2

Let \((\phi , X_I, X_P, w)\) be a WMC instance. Then

$$\begin{aligned} \left( \left\{ \left. [c]_0^1 \;\right| \; c \in \phi \right\} \cup \left\{ \left. [x]_{w(\lnot x)}^{w(x)} \;\right| \; x \in X_I \cup X_P \right\} , X_I \cup X_P, 1\right) \end{aligned}$$
(1)

is a PBP instance with the same answer (as defined in Definitions 1 and 5).

Proof

Let \(f = \prod _{c \in \phi } [c]_0^1\), and \(g = \prod _{x \in X_I \cup X_P} [x]_{w(\lnot x)}^{w(x)}\). Then the answer of the PBP instance (1) is \((\exists _{X_I \cup X_P} fg)(\emptyset ) = \sum _{Y \subseteq X_I \cup X_P} (fg)(Y) = \sum _{Y \subseteq X_I \cup X_P} f(Y)g(Y)\) by Lemma 1. Note that

$$ f(Y) = {\left\{ \begin{array}{ll} 1 &{} \text {if } Y \models \phi , \\ 0 &{} \text {otherwise}, \end{array}\right. } \quad \text {and} \quad g(Y) = \prod _{Y \models l} w(l), $$

which means that \(\sum _{Y \subseteq X_I \cup X_P} f(Y)g(Y) = \sum _{Y \models \phi } \prod _{Y \models l} w(l)\) as required.

Theorem 2 (Correctness for WMC)

Algorithm 1, when given a WMC instance \((\phi , X_I, X_P, w)\), returns a PBP instance with the same answer (as defined in Definitions 1 and 5), provided either of the two conditions is satisfied:

  1. for all \(p \in X_P\), there is a non-empty family of literals \((l_i)_{i=1}^n\) such that

     (a) \(w(\lnot p) = 1\),

     (b) \(l_i \in X_I\) or \(\lnot l_i \in X_I\) for all \(i = 1, \dots , n\),

     (c) and \(\{ c \in \phi \mid p \in c \text { or } \lnot p \in c \} = \left\{ p \vee \bigvee _{i=1}^n \lnot l_i \right\} \cup \{ l_i \vee \lnot p \mid i = 1, \dots , n \}\);

  2. or for all \(p \in X_P\),

     (a) \(w(p) + w(\lnot p) = 1\),

     (b) for any clause \(c \in \phi \), \(|c \cap X_P| \le 1\),

     (c) there is no clause \(c \in \phi \) such that \(\lnot p \in c\),

     (d) if \(\{ p \} \in \phi \), then there is no clause \(c \in \phi \) such that \(c \ne \{ p \}\) and \(p \in c\),

     (e) and for any \(c, d \in \phi \) such that \(c \ne d\), \(p \in c\), and \(p \in d\), the conjunction \(\bigwedge _{l \in c \setminus \{ p \}} \lnot l \wedge \bigwedge _{l \in d \setminus \{ p \}} \lnot l\) is unsatisfiable.

Condition 1 (for d02) simply states that each parameter variable is equivalent to a conjunction of indicator literals. Condition 2 is for encodings that have implications rather than equivalences associated with parameter variables (which, in this case, is bklm16). It ensures that each clause has at most one positive parameter literal and no negative ones, and that at most one implication clause per parameter variable \(p \in X_P\) can ‘force p to be positive’.

Proof

By Proposition 2,

$$\begin{aligned} \left( \left\{ \left. [c]_0^1 \;\right| \; c \in \phi \right\} \cup \left\{ \left. [x]_{w(\lnot x)}^{w(x)} \;\right| \; x \in X_I \cup X_P \right\} , X_I \cup X_P, 1\right) \end{aligned}$$
(2)

is a PBP instance with the same answer as the given WMC instance. By Definition 5, its answer is \(\left( \exists _{X_I \cup X_P} \left( \prod _{c \in \phi } [c]_0^1 \right) \prod _{x \in X_I \cup X_P} [x]_{w(\lnot x)}^{w(x)} \right) (\emptyset )\). Since both Conditions 1 and 2 ensure that each clause in \(\phi \) has at most one parameter variable, we can partition \(\phi \) into the set \(\phi _*\) of clauses with no parameter literals and the sets \(\phi _p = \{ c \in \phi \mid p \in c \text { or } \lnot p \in c \}\) for \(p \in X_P\). We can then use Theorem 1 to reorder the answer into \(\left( \exists _{X_I} \left( \prod _{x \in X_I} [x]_{w(\lnot x)}^{w(x)} \right) \left( \prod _{c \in \phi _*} [c]_0^1 \right) \prod _{p \in X_P} \exists _p [p]_{w(\lnot p)}^{w(p)} \prod _{c \in \phi _p} [c]_0^1 \right) (\emptyset )\).

Let us first consider how the unfinished PBP instance \((F, X_I, \omega )\) after the loop on Lines 3 to 10 differs from (2). Note that Algorithm 1 leaves each \(c \in \phi _*\) unchanged, i.e., adds \([c]_0^1\) to F. We can then fix an arbitrary \(p \in X_P\) and let \(F_p\) be the set of functions added to F as a replacement of \(\phi _p\). It is sufficient to show that

$$\begin{aligned} \omega \prod _{f \in F_p} f = \exists _p [p]_{w(\lnot p)}^{w(p)} \prod _{c \in \phi _p} [c]_0^1. \end{aligned}$$
(3)

Note that under Condition 1, \(\bigwedge _{c \in \phi _p} c \equiv p \Leftrightarrow \bigwedge _{i=1}^n l_i\) for some family of indicator variable literals \((l_i)_{i=1}^n\). Thus, \(\exists _p [p]_{w(\lnot p)}^{w(p)} \prod _{c \in \phi _p} [c]_0^1 = \exists _p [p]_1^{w(p)} \left[ p \Leftrightarrow \bigwedge _{i=1}^n l_i \right] _0^1\). If \(w(p) = 1\), then

$$\begin{aligned} \exists _p [p]_1^{w(p)} \left[ p \Leftrightarrow \bigwedge _{i=1}^n l_i \right] _0^1 = \left. \left[ p \Leftrightarrow \bigwedge _{i=1}^n l_i \right] _0^1\right| _{p=1} + \left. \left[ p \Leftrightarrow \bigwedge _{i=1}^n l_i \right] _0^1\right| _{p=0}. \end{aligned}$$
(4)

Since for any input, \(\bigwedge _{i=1}^n l_i\) is either true or false, exactly one of the two summands in Eq. (4) will be equal to one, and the other will be equal to zero, and so \(\left. \left[ p \Leftrightarrow \bigwedge _{i=1}^n l_i \right] _0^1\right| _{p=1} + \left. \left[ p \Leftrightarrow \bigwedge _{i=1}^n l_i \right] _0^1\right| _{p=0} = 1\), where 1 is a pseudo-Boolean function that always returns one. On the other side of Eq. (3), since \(F_p = \emptyset \), and \(\omega \) is unchanged, we get \(\omega \prod _{f \in F_p} f = 1\), and so Eq. (3) is satisfied under Condition 1 when \(w(p) = 1\).

If \(w(p) \ne 1\), then \(F_p = \left\{ \left[ \bigwedge _{i = 1}^n l_i \right] _1^{w(p)} \right\} \), and \(\omega = 1\), and so we want to show that \(\left[ \bigwedge _{i = 1}^n l_i \right] _1^{w(p)} = \exists _p [p]_1^{w(p)} \left[ p \Leftrightarrow \bigwedge _{i=1}^n l_i \right] _0^1\). Indeed, \(\exists _p [p]_1^{w(p)} \left[ p \Leftrightarrow \bigwedge _{i=1}^n l_i \right] _0^1 = w(p) \cdot \left[ \bigwedge _{i=1}^n l_i \right] _0^1 + \left[ \bigwedge _{i=1}^n l_i \right] _1^0 = \left[ \bigwedge _{i=1}^n l_i \right] _1^{w(p)}\). This finishes the proof of the correctness of the first loop under Condition 1.

Now let us assume Condition 2. We still want to prove Eq. (3). If \(w(p) = 1\), then \(F_p = \emptyset \), and \(\omega = 1\), and so the left-hand side of Eq. (3) is equal to one. Then the right-hand side is \(\exists _p [p]_0^1 \prod _{c \in \phi _p} [c]_0^1 = \exists _p \left[ p \wedge \bigwedge _{c \in \phi _p} c \right] _0^1 = \exists _p [p]_0^1 = 0 + 1 = 1\) since \(p \in c\) for every clause \(c \in \phi _p\).

If \(w(p) \ne 1\), and \(\{ p \} \in \phi _p\), then, by Condition 2d, \(\phi _p = \{ \{ p \} \}\), and Algorithm 1 produces \(F_p = \emptyset \), and \(\omega = w(p)\), and so \(\exists _p [p]_{w(\lnot p)}^{w(p)} [p]_0^1 = \exists _p [p]^{w(p)}_0 = w(p) = \omega \prod _{f \in F_p} f\). The only remaining case is when \(w(p) \ne 1\) and \(\{ p \} \not \in \phi _p\). Then \(\omega = 1\), and \(F_p = \left\{ \left. \left[ \bigwedge _{l \in c \setminus \{ p \}} \lnot l\right] _1^{w(p)} \;\right| \; c \in \phi _p \right\} \), so we need to show that \(\prod _{c \in \phi _p} \left[ \bigwedge _{l \in c \setminus \{p \}} \lnot l\right] _1^{w(p)} = \exists _p [p]_{1-w(p)}^{w(p)} \prod _{c \in \phi _p} [c]_0^1\). We can rearrange the right-hand side as

$$\begin{aligned} \exists _p [p]_{1-w(p)}^{w(p)} \prod _{c \in \phi _p} [c]_0^1&= \exists _p [p]_{1-w(p)}^{w(p)} \left[ p \vee \bigwedge _{c \in \phi _p} c \setminus \{ p \} \right] _0^1 \\&= w(p) + (1-w(p)) \left[ \bigwedge _{c \in \phi _p} c \setminus \{ p \} \right] _0^1 \\&= \left[ \bigwedge _{c \in \phi _p} c \setminus \{ p \} \right] _{w(p)}^1 = \left[ \bigvee _{c \in \phi _p} \bigwedge _{l \in c \setminus \{ p \}} \lnot l \right] _1^{w(p)}. \end{aligned}$$

By Condition 2e, \(\bigwedge _{l \in c \setminus \{ p \}} \lnot l\) can be true for at most one \(c \in \phi _p\), and so \(\left[ \bigvee _{c \in \phi _p} \bigwedge _{l \in c \setminus \{ p \}} \lnot l \right] _1^{w(p)} = \prod _{c \in \phi _p} \left[ \bigwedge _{l \in c \setminus \{ p \}} \lnot l \right] _1^{w(p)}\) which is exactly what we needed to show. This ends the proof that the first loop of Algorithm 1 preserves the answer under both Condition 1 and Condition 2. Finally, the loop on Lines 11 to 12 of Algorithm 1 replaces \([v]_1^p[\lnot v]_1^q\) with \([v]_q^p\) (for some \(v \in X_I\) and \(p, q \in \mathbb {R}\)), but, of course, \([v]_1^p[\lnot v]_1^q = [v]_1^p[v]_q^1 = [v]_q^p\), i.e., the answer is unchanged.

Theorem 3 (Minimum-Cardinality Correctness)

Let \((\phi , X_I, X_P, w)\) be a minimum-cardinality WMC instance that satisfies Conditions 2b to 2e of Theorem 2 as well as the following:

  1. for all parameter variables \(p \in X_P\), \(w(\lnot p) = 1\);

  2. all models of \(\{ c \in \phi \mid c \cap X_P = \emptyset \}\) (as subsets of \(X_I\)) have the same cardinality;

  3. the quantity \(\min \{ |Z| \mid Z \subseteq X_P \text { and } Y \cup Z \models \phi \}\) is the same for all \(Y \models \{ c \in \phi \mid c \cap X_P = \emptyset \}\).

Then Algorithm 1, when applied to \((\phi , X_I, X_P, w)\), outputs a PBP instance with the same answer (as defined in Definitions 3 and 5).

In this case, we have to add some assumptions about the cardinality of models. Condition 2 states that all models of the indicator-only part of the formula have the same cardinality. Bayesian network encodings such as cd05 and cd06 satisfy this condition by assigning an indicator variable to each possible variable-value pair and requiring each random variable to be paired with exactly one value. Condition 3 then says that the smallest number of parameter variables needed to turn an indicator-only model into a full model is the same for all indicator-only models. As some ideas are shared between the proofs of Theorems 2 and 3, the following proof is slightly less explicit and assumes that \(\omega = 1\).

Proof

Let \((F, X_I, \omega )\) be the tuple returned by Algorithm 1 and note that \(F = \left\{ [c]_0^1 \mid c \in \phi \text {, } c \cap X_P = \emptyset \right\} \cup \left\{ \left. \left[ \bigwedge _{l \in c \setminus \{ p \}} \lnot l \right] _1^{w(p)} \;\right| \; p \in X_P \text {, } p \in c \in \phi \text {, } c \ne \{ p \} \right\} \). We split the proof into two parts. In the first part, we show that there is a bijection between minimum-cardinality models of \(\phi \) and \(Y \subseteq X_I\) such that \(\left( \prod _{f \in F} f\right) (Y) \ne 0\).Footnote 6 Let \(Y \subseteq X_I\) and \(Z \subseteq X_I \cup X_P\) be related via this bijection. Then in the second part we will show that

$$\begin{aligned} \prod _{Z \models l} w(l) = \left( \prod _{f \in F} f\right) (Y). \end{aligned}$$
(5)

On the one hand, if \(Z \subseteq X_I \cup X_P\) is a minimum-cardinality model of \(\phi \), then \(\left( \prod _{f \in F} f\right) (Z \cap X_I) \ne 0\) under the given assumptions. On the other hand, if \(Y \subseteq X_I\) is such that \(\left( \prod _{f \in F} f\right) (Y) \ne 0\), then \(Y \models \{ c \in \phi \mid c \cap X_P = \emptyset \}\). Let \(Y \subseteq Z \subseteq X_I \cup X_P\) be the smallest superset of Y such that \(Z \models \phi \) (it exists by Condition 2c of Theorem 2). We need to show that Z has minimum cardinality. Let \(Y'\) and \(Z'\) be another such pair, defined analogously. We will show that \(|Z| = |Z'|\). Note that \(|Y| = |Y'|\) by Condition 2, and \(|Z \setminus Y| = |Z' \setminus Y'|\) by Condition 3. Combining this with the general property that \(|Z| = |Y| + |Z \setminus Y|\) finishes the first part of the proof.

For the second part, let us consider the multiplicative influence of a single parameter variable \(p \in X_P\) on Eq. (5). If the left-hand side is multiplied by w(p) (i.e., \(p \in Z\)), then there must be some clause \(c \in \phi \) such that \(Z \setminus \{ p \} \not \models c\). But then \(Y \models \bigwedge _{l \in c \setminus \{ p \}} \lnot l\), and so the right-hand side is multiplied by w(p) as well (exactly once because of Condition 2e of Theorem 2). This argument works in the other direction as well.

5 Experimental Evaluation

We run a set of experiments, comparing all five original Bayesian network encodings (bklm16, cd05, cd06, d02, sbk05) as well as the first four with Algorithm 1 applied afterwards.Footnote 7 For each encoding e, we write e++ to denote the combination of encoding a Bayesian network as a WMC instance using e and transforming it into a PBP instance using Algorithm 1. Along with DPMCFootnote 8, we also include WMC algorithms used in the papers that introduce each encoding: Ace for cd05, cd06, and d02; CachetFootnote 9 [31] for sbk05; and c2dFootnote 10 [16] with query-dnnfFootnote 11 for bklm16. Ace is also used to encode Bayesian networks into WMC instances for all encodings except for bklm16 which uses another encoder mentioned previously. We focus on the following questions:

  • Can parameter variable elimination improve inference speed?

  • How does DPMC combined with encodings without (and with) parameter variables compare with other WMC algorithms and other encodings?

  • Which instances is our approach particularly successful on (compared to other algorithms and encodings and to the same encoding before our transformation)?

  • What proportion of variables is typically eliminated?

  • Do some encodings benefit from this transformation more than others?

5.1 Setup

DPMC is run with tree decomposition-based planning and ADD-based execution—the best-performing combination in the original set of experiments [20]. We use a single iteration of htd [1] to generate approximately optimal tree decompositions—we found that this configuration is efficient enough to handle huge instances, and yet the width of the returned decomposition is unlikely to differ from optimal by more than one or two. We also enabled DPMC’s greedy mode. This mode (which was not part of the original paper [20]) optimises the order in which ADDs are multiplied by prioritising those with small representations.

For experimental data, we use Bayesian networks available with Ace and Cachet. We split them into the following groups:

  • DQMR (390 instances) and Grid networks (450 instances), as described by Sang et al. [32];

  • Mastermind (144 instances) and Random Blocks (256 instances) by Chavira et al. [12];

  • other binary Bayesian networks (50 instances), including Plan Recognition [32], Friends and Smokers, Students and Professors [12], and tcc4f;

  • non-binary classic networks (176 instances): alarm, diabetes, hailfinder, mildew, munin14, pathfinder, pigs, and water.

To perform Bayesian network inference with DPMC (or with any other WMC algorithm not based on compilation such as Cachet), one needs to select a probability to compute [20, 31]. If a network comes with an evidence file, we compute the probability of this evidence. Otherwise, let X be the variable last mentioned in the Bayesian network file. If true is one of the values of X, then we compute \(\Pr (X = \textsf {true})\), otherwise we choose the first-mentioned value of X.

The experiments were run on a computing cluster with Intel Xeon E5-2630, Intel Xeon E7-4820, and Intel Xeon Gold 6138 processors with a 1000 s timeout separately on both encoding and inference, and a 32 GiB memory limit.Footnote 12

5.2 Results

Fig. 1. Cactus plot of all algorithm-encoding pairs. The dotted line denotes the total number of instances used.

Fig. 2. An instance-by-instance comparison between \(\textsf {DPMC} + \texttt {bklm16++}\) (the best combination according to Fig. 1) and the second and third best-performing combinations: \(\textsf {Ace} + \texttt {cd06}\) and \(\textsf {DPMC} + \texttt {bklm16}\).

Fig. 3. Box plots of the numbers of variables in each encoding across all benchmark instances before and after applying Algorithm 1. Outliers and the top parts of some whiskers are omitted.

Table 1. The numbers of instances (out of 1466) that each algorithm and encoding combination solved faster than any other combination and in total.

Figure 1 shows \(\textsf {DPMC}+\texttt {bklm16++}\) to be the best-performing combination across all time limits up to 1000 s, with \(\textsf {Ace} + \texttt {cd06}\) and \(\textsf {DPMC}+\texttt {bklm16}\) not far behind. Overall, \(\textsf {DPMC}+\texttt {bklm16++}\) is 3.35 times faster than \(\textsf {DPMC}+\texttt {bklm16}\) and 2.96 times faster than \(\textsf {Ace}+\texttt {cd06}\). Table 1 further shows that \(\textsf {DPMC}+\texttt {bklm16++}\) solves almost a hundred more instances than any other combination and is the fastest on 69.1 % of them.

The scatter plots in Fig. 2 show that how \(\textsf {DPMC} + \texttt {bklm16++}\) (and perhaps DPMC more generally) compares to \(\textsf {Ace} + \texttt {cd06}\) depends significantly on the data set: the former is a clear winner on DQMR and Grid instances, while the latter performs well on Mastermind and Random Blocks. Perhaps because the underlying WMC algorithm remains the same, the comparison between \(\textsf {DPMC} + \texttt {bklm16++}\) and \(\textsf {DPMC} + \texttt {bklm16}\) is quite noisy, with most instances scattered around the line of equality. However, our transformation does enable DPMC to solve many instances that were previously beyond its reach.

We also record the numbers of variables in each encoding before and after applying Algorithm 1. Figure 3 shows a significant reduction in the number of variables. For instance, the median number of variables in instances encoded with bklm16 was reduced by a factor of four: from 1499 to 376. While bklm16++ results in the overall lowest number of variables, the difference between bklm16++ and d02++ seems small. Indeed, the numbers of variables in these two encodings are equal for binary Bayesian networks (i.e., most of our data). Nonetheless, bklm16++ is still much faster than d02++ when run with DPMC.

It is also worth noting that there was no observable difference in the width of the project-join tree used by DPMC (which is equivalent to the treewidth of the primal/Gaifman graph of the input formula [20]) before and after applying Algorithm 1—the observed performance improvement is more likely related to the variable ordering heuristic used by ADDs.Footnote 13

Overall, transforming WMC instances to the PBP format allows us to significantly simplify each instance. This transformation is particularly effective on bklm16, allowing it to surpass cd06 and become the new state of the art. While there is a similarly significant reduction in the number of variables for d02, the performance of \(\textsf {DPMC}+\texttt {d02}\) is virtually unaffected. Finally, while our transformation makes it possible to use cd05 and cd06 with DPMC, the two combinations remain inefficient.

6 Conclusion

In this paper, we showed how the number of variables in a WMC instance can be significantly reduced by transforming it into a representation based on two-valued pseudo-Boolean functions. In some cases, this led to significant improvements in inference speed, allowing \(\textsf {DPMC} + \texttt {bklm16++}\) to overtake \(\textsf {Ace}+\texttt {cd06}\) as the new state-of-the-art WMC technique for Bayesian network inference. Moreover, we identified key properties of Bayesian network encodings that allow for parameter variable removal. However, these properties were rather different for each encoding, and so an interesting question for future work is whether they can be unified into a more abstract and coherent list of conditions.

Bayesian network inference was chosen as the example application of WMC because it is the first and the most studied one [3, 8, 9, 15, 32]. While the distinction between indicator and parameter variables is often not explicitly described in other WMC encodings [21, 26, 37], perhaps in some cases variables could still be partitioned in this way, allowing for not just faster inference with DPMC or ADDMC but also for well-established WMC encoding and inference techniques (such as in the cd05 and cd06 papers [8, 9]) to be transferred to other application domains.