Abstract
Bounded treewidth is one of the most cited combinatorial invariants and has been applied in the literature to solve several counting problems efficiently. A canonical counting problem is #Sat, which asks to count the satisfying assignments of a Boolean formula. Recent work shows that benchmark instances for #Sat often have reasonably small treewidth. This paper deals with counting problems for instances of small treewidth. We introduce a general framework to solve counting questions based on state-of-the-art database management systems (DBMSs). Our framework explicitly takes advantage of small treewidth by solving instances using dynamic programming (DP) on tree decompositions (TDs). To this end, we implement the concept of DP in a DBMS (PostgreSQL), since DP algorithms are often already specified in terms of table manipulations in theory. This allows for elegant specifications of DP algorithms and the use of SQL to manipulate records and tables, which gives us a natural way to bring DP algorithms into practice. To the best of our knowledge, we present the first approach that employs a DBMS for algorithms on TDs. A key advantage of our approach is that DBMSs naturally allow one to deal with huge tables with a limited amount of main memory (RAM), to parallelize, and to suspend computation.
Our system dpdb is available under GPL3 license at github.com/hmarkus/dp_on_dbs. The work has been supported by the Austrian Science Fund (FWF), Grants Y698, P26696, and P32830, and the German Science Fund (DFG), Grant HO 1294/11-1.
Keywords
- Dynamic programming
- Parameterized algorithmics
- Bounded treewidth
- Database systems
- SQL
- Relational algebra
- Counting
1 Introduction
Counting solutions is a well-known task in mathematics, computer science, and other areas [8, 17, 24, 38]. In combinatorics, for instance, one characterizes the number of solutions to problems by means of mathematical expressions, e.g., generating functions [18]. One particular counting problem, namely model counting (#Sat), asks to output the number of solutions of a given Boolean formula. Model counting and variants thereof have already been applied for solving a variety of real-world applications [8, 10, 19, 44]. Such problems are typically considered rather hard, since #Sat is complete for the class #P [3, 35], i.e., one can simulate any problem of the polynomial hierarchy with polynomially many calls [41] to a #Sat solver. Taming this high complexity is possible with techniques from parameterized complexity [12]. In fact, many of the publicly available #Sat instances show good structural properties after using standard preprocessors like pmc [29], see [23]. By good structural properties, we mean that graph representations of these instances have reasonably small treewidth. Treewidth is a structural parameter of graphs that measures how close a graph is to being a tree; it is one of the most cited combinatorial invariants studied in parameterized complexity [12] and the subject of recent competitions [15].
This observation gives rise to a general framework for counting problems that leverages treewidth. The general idea to develop such frameworks is indeed not new, since there are both specialized solvers [9, 23, 25] and general systems like D-FLAT [5], Jatatosk [4], and sequoia [31] that exploit treewidth. Some of these systems explicitly use dynamic programming (DP) to directly exploit treewidth by means of so-called tree decompositions (TDs), whereas others provide some kind of declarative layer to model the problem (and perform decomposition and DP internally). In this work, we solve (counting) problems by means of explicitly specified DP algorithms, where essential parts of the DP algorithm are specified in the form of SQL SELECT queries. The actual run of the DP algorithm is then delegated to our system dpdb, which employs database management systems (DBMSs) [43]. This has not only the advantage of naturally describing and manipulating the tables that are obtained during DP, but also allows dpdb to benefit from decades of database technology in the form of the capability to deal with huge tables using a limited amount of main memory (RAM), dedicated database joins, as well as query optimization and data-dependent execution plans.
Contribution. We implement a system dpdb for solving counting problems based on dynamic programming on tree decompositions, and present the following contributions. (i) Our system dpdb uses database management systems to efficiently handle the table operations needed for dynamic programming. The system dpdb is written in Python and employs PostgreSQL as DBMS, but can easily work with other DBMSs. (ii) The architecture of dpdb makes it possible to solve general problems of bounded treewidth that can be solved by means of table operations (in the form of relational algebra and SQL) on tree decompositions. As a result, dpdb is a generalized framework for dynamic programming on tree decompositions, where one only needs to specify the essential and problem-specific parts of the dynamic programming in order to solve (counting) problems. (iii) Finally, we show how to solve the canonical problem #Sat with the help of dpdb, for which the architecture of dpdb seems particularly well-suited. Concretely, we compare the runtime of our system with state-of-the-art model counters, where we observe competitive behavior and promising indications for future work.
2 Preliminaries
We assume familiarity with terminology of graphs and trees. For details, we refer to the literature and standard textbooks [16].
Boolean Satisfiability. We define Boolean formulas and their evaluation in the usual way, cf., [26]. A literal is a Boolean variable x or its negation \(\lnot x\). A CNF formula \(\varphi \) is a set of clauses interpreted as conjunction. A clause is a set of literals interpreted as disjunction. For a formula or clause X, we abbreviate by \(\mathrm{var}(X)\) the variables that occur in X. An assignment of \(\varphi \) is a mapping \(I: \mathrm{var}(\varphi ) \rightarrow \{0,1\}\). The formula \(\varphi (I)\) under assignment I is obtained by removing every clause c from \(\varphi \) that contains a literal set to 1 by I, and removing from every remaining clause of \(\varphi \) all literals set to 0 by I. An assignment I is satisfying if \(\varphi (I)=\emptyset \). Problem #Sat asks to output the number of satisfying assignments of a formula.
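These definitions translate directly into a minimal brute-force model counter. The sketch below mirrors the clause-removal semantics of \(\varphi (I)\); the literal encoding as (variable, is_positive) pairs and all names are illustrative:

```python
from itertools import product

def reduce_formula(phi, I):
    """Return phi(I): drop clauses containing a literal set to 1 by I,
    and remove from the remaining clauses all literals set to 0 by I."""
    result = []
    for clause in phi:
        if any(I[x] == pos for (x, pos) in clause if x in I):
            continue  # clause satisfied under I -> removed
        result.append({(x, pos) for (x, pos) in clause if x not in I})
    return result

def count_models(phi):
    """#Sat by enumerating all assignments of var(phi)."""
    variables = sorted({x for clause in phi for (x, pos) in clause})
    return sum(
        reduce_formula(phi, dict(zip(variables, bits))) == []
        for bits in product([0, 1], repeat=len(variables)))

# phi = (x or y) and (not x or y); literals are (variable, is_positive) pairs
phi = [{('x', 1), ('y', 1)}, {('x', 0), ('y', 1)}]
print(count_models(phi))  # 2 (both models set y to 1)
```

An assignment I is satisfying exactly when `reduce_formula(phi, I)` yields the empty list, matching \(\varphi (I)=\emptyset \).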
Tree Decomposition and Treewidth. A tree decomposition (TD) [12, 27] of a given graph G is a pair \(\mathcal {T}=(T,\chi )\) where T is a rooted tree and \(\chi \) is a mapping which assigns to each node \(t\in V(T)\) a set \(\chi (t)\subseteq V(G)\), called bag, such that (i) \(V(G)=\bigcup _{t\in V(T)}\chi (t)\) and \(E(G)\subseteq \{\,\{u,v\} \;{|}\;t\in V(T), \{u,v\}\subseteq \chi (t)\,\}\); and (ii) for each \(r, s, t\in V(T)\), such that s lies on the path from r to t, we have \(\chi (r) \cap \chi (t) \subseteq \chi (s)\). We let \(\mathrm{width}(\mathcal {T}) \,\mathrel {\mathop :}=\max _{t\in V(T)}|\chi (t)|-1\). The treewidth \(\mathrm{tw}(G)\) of G is the minimum \(\mathrm{width}({\mathcal {T}})\) over all TDs \(\mathcal {T}\) of G. For a node \(t \in V(T)\), we say that \(\mathrm{type}(t)\) is \(\textit{leaf}\) if t has no children and \(\chi (t)=\emptyset \); \(\textit{join}\) if t has children \(t'\) and \(t''\) with \(t'\ne t''\) and \(\chi (t) = \chi (t') = \chi (t'')\); \(\textit{intr}\) (“introduce”) if t has a single child \(t'\), \(\chi (t') \subseteq \chi (t)\) and \(|\chi (t)| = |\chi (t')| + 1\); \(\textit{rem}\) (“removal”) if t has a single child \(t'\), \(\chi (t') \supseteq \chi (t)\) and \(|\chi (t')| = |\chi (t)| + 1\). If for every node \(t\in V(T)\), \(\mathrm{type}(t) \in \{ \textit{leaf}, \textit{join}, \textit{intr}, \textit{rem}\}\), then the TD is called nice.
Example 1
Figure 1 depicts a graph G and a TD \(\mathcal {T}\) of G of width 2. The treewidth of G is also 2 since G contains a complete graph with 3 vertices [27].
\(\blacksquare \)
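The two TD conditions and the width can be checked mechanically. Below is a minimal sketch (all names illustrative), applied to a graph of the same shape as G of Fig. 1, i.e., a triangle a, b, c plus an edge a, d (cf. the primal graph discussed later in Example 2), and a two-bag TD of width 2:

```python
from collections import deque

def width(bags):
    """width(T): maximum bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1

def is_td(vertices, edges, tree_adj, bags):
    """Check conditions (i) and (ii) of the tree decomposition definition."""
    # (i) every vertex and every edge of G is covered by some bag
    if set().union(*bags.values()) != set(vertices):
        return False
    if not all(any({u, v} <= bag for bag in bags.values()) for (u, v) in edges):
        return False
    # (ii) for each vertex, the TD nodes whose bags contain it form a subtree:
    # a BFS restricted to those nodes must reach all of them
    for x in vertices:
        nodes = {t for t, bag in bags.items() if x in bag}
        start = next(iter(nodes))
        seen, queue = {start}, deque([start])
        while queue:
            t = queue.popleft()
            for s in tree_adj[t]:
                if s in nodes and s not in seen:
                    seen.add(s)
                    queue.append(s)
        if seen != nodes:
            return False
    return True

vertices = ['a', 'b', 'c', 'd']
edges = [('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'd')]
bags = {1: {'a', 'b', 'c'}, 2: {'a', 'd'}}
tree_adj = {1: [2], 2: [1]}
print(is_td(vertices, edges, tree_adj, bags), width(bags))  # True 2
```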
Relational Algebra. We use relational algebra [11] for manipulation of relations, which forms the theoretical basis of the database standard Structured Query Language (SQL) [43] on tables. An attribute a is of a certain finite domain \(\mathrm{dom}(a)\). Then, a tuple r over set \(\mathrm{att}(r)\) of attributes is a set of pairs of the form (a, v) with \(a\in \mathrm{att}(r),v\in \mathrm{dom}(a)\) s.t. for each \(a\in \mathrm{att}(r)\), there is exactly one \(v\in \mathrm{dom}(a)\) with \((a,v)\in r\). A relation R is a finite set of tuples r over set \(\mathrm{att}(R)\,\mathrel {\mathop :}=\mathrm{att}(r)\) of attributes. Given a relation R over \(\mathrm{att}(R)\). Then, we let \(\mathrm{dom}(R)\,\mathrel {\mathop :}=\bigcup _{a\in \mathrm{att}(R)}\mathrm{dom}(a)\), and let relation R projected to \(A\subseteq \mathrm{att}(R)\) be given by \(\varPi _{A}(R)\,\mathrel {\mathop :}=\{r_A \mid r\in R\}\), where \(r_A \,\mathrel {\mathop :}=\{(a, v) \mid (a, v) \in r, a \in A\}\). This concept can be lifted to extended projection \(\dot{\varPi }_{A,S}\), where we assume in addition to \(A\subseteq \mathrm{att}(R)\), a set S of expressions of the form \(a \leftarrow f\), such that \(a\in \mathrm{att}(R)\setminus A\), and f is an arithmetic function that takes a tuple \(r\in R\), such that there is at most one expression in S for each \(a\in \mathrm{att}(R)\setminus A\). Formally, we define \(\dot{\varPi }_{A,S}(R)\,\mathrel {\mathop :}=\{r_A \cup r^S \mid r\in R\}\) with \(r^S \,\mathrel {\mathop :}=\{(a, f(r)) \mid a \in \mathrm{att}(r), (a \leftarrow f) \in S\}\). Later, we use aggregation by grouping \(_A G_{(a\leftarrow g)}\), where we assume \(A\subseteq \mathrm{att}(R), a\in \mathrm{att}(R)\setminus A\) and a so-called aggregate function g, which takes a relation \(R'\subseteq R\) and returns a value of domain \(\mathrm{dom}(a)\). 
Therefore, we let \(_A G_{(a\leftarrow g)}(R)\,\mathrel {\mathop :}=\{r\cup \{(a,g(R[r]))\} \mid r\in \varPi _{A}(R)\}\), where \(R[r]\,\mathrel {\mathop :}=\{r'\mid r'\in R, r\subseteq r'\}\). We define renaming of R given set A of attributes, and a bijective mapping \(m:\mathrm{att}(R) \rightarrow A\) s.t. \(\mathrm{dom}(a)=\mathrm{dom}(m(a))\) for \(a\in \mathrm{att}(R)\), by \(\rho _m(R) \,\mathrel {\mathop :}=\{(m(a),v) \mid (a,v)\in R\}\). Selection of rows in R according to a given Boolean formula \(\varphi \) with equalityFootnote 1 is defined by \(\sigma _{\varphi }(R)\,\mathrel {\mathop :}=\{ r \mid r\in R, \varphi (r^E)=\emptyset \}\), where \(r^{E}\) is a truth assignment over \(\mathrm{var}(\varphi )\) such that for each \(v,v',v''\in \mathrm{dom}(R)\cup \mathrm{att}(R)\) (1) \(r^E(v\,{\approx }\,v')=1\) if \((v, v')\in r\), (2) \(r^E(v\,{\approx }\,v)=1\), (3) \(r^E(v\,{\approx }\,v')=r^E(v'\,{\approx }\,v)\), and (4) if \(r^E(v\,{\approx }\,v')=1\), and \(r^E(v'\,{\approx }\,v'')=1\), then \(r^E(v\,{\approx }\,v'')=1\). Given a relation \(R'\) with \(\mathrm{att}(R')\cap \mathrm{att}(R)=\emptyset \). Then, we refer to the cross-join by \(R\times R'\,\mathrel {\mathop :}=\{ r\cup r' \mid r\in R, r'\in R'\}\). Further, a \(\theta \)-join (according to \(\varphi \)) corresponds to \(R \bowtie _\varphi R' \,\mathrel {\mathop :}=\sigma _\varphi (R\times R')\).
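These operators can be prototyped directly over relations represented as sets of attribute-value pairs. A minimal sketch of projection \(\varPi _{A}\), grouping \(_A G_{(a\leftarrow g)}\), and the \(\theta \)-join, with illustrative toy relations:

```python
def project(R, A):
    """Pi_A(R): restrict every tuple to the attributes in A."""
    return {frozenset((a, v) for (a, v) in r if a in A) for r in R}

def group(R, A, attr, g):
    """_A G_(attr <- g)(R): group by A, aggregate matching tuples with g."""
    out = set()
    for rA in project(R, A):
        matching = {r for r in R if rA <= r}  # R[r] from the definition
        out.add(rA | {(attr, g(matching))})
    return out

def theta_join(R, S, phi):
    """R join_phi S: cross-join, then select by predicate phi."""
    return {r | s for r in R for s in S if phi(dict(r | s))}

# Toy relations; tuples are frozensets of (attribute, value) pairs
R = {frozenset({('a', 0), ('cnt', 2)}), frozenset({('a', 1), ('cnt', 3)})}
S = {frozenset({('b', 1), ('cnt2', 4)})}
print(project(R, {'a'}))
print(group(R, set(), 'total', lambda rs: sum(dict(r)['cnt'] for r in rs)))
print(theta_join(R, S, lambda t: t['a'] == t['b']))
```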
3 Towards Relational Algebra for Dynamic Programming
A solver based on dynamic programming (DP) evaluates the input \(\mathcal {I}\) in parts along a given TD of a graph representation G of the input. Thereby, for each node t of the TD, intermediate results are stored in a table \(\tau _{t}\). This is achieved by running a so-called table algorithm \(\mathsf {A}\), which is designed for a certain graph representation, and stores in \(\tau _{t}\) results of problem parts of \(\mathcal {I}\), thereby considering tables \(\tau _{t'}\) for child nodes \(t'\) of t. DP works for many problems \(\mathcal {P}\) as follows.
1. Construct a graph representation G of the given input instance \(\mathcal {I}\).
2. Heuristically compute a tree decomposition \(\mathcal {T}=(T,\chi )\) of G.
3. Traverse the nodes in V(T) in post-order, i.e., perform a bottom-up traversal of T. At every node t during post-order traversal, execute a table algorithm \(\mathsf {A}\) that takes as input t, bag \(\chi (t)\), a local problem \(\mathcal {P}(t,\mathcal {I})=\mathcal {I}_t\) depending on \(\mathcal {P}\), as well as previously computed child tables of t, and stores the result in \(\tau _{t}\).
4. Interpret table \(\tau _{n}\) for the root n of T in order to output the solution of \(\mathcal {I}\).
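Steps 3 and 4 above can be sketched as a generic driver with a pluggable table algorithm (Steps 1 and 2 are assumed to be done; the toy table algorithm here merely counts leaves, and all names are illustrative):

```python
def dp_on_td(root, children, table_algorithm):
    """Step 3: bottom-up traversal calling table_algorithm(t, child_tables)."""
    tables = {}
    # iterative post-order so deep decompositions do not hit recursion limits
    stack, order = [root], []
    while stack:
        t = stack.pop()
        order.append(t)
        stack.extend(children.get(t, []))
    for t in reversed(order):  # children are processed before their parent
        child_tables = [tables[c] for c in children.get(t, [])]
        tables[t] = table_algorithm(t, child_tables)
    return tables[root]  # Step 4: interpret the root table

# Illustrative table algorithm: each leaf contributes 1, inner nodes sum up
result = dp_on_td(1, {1: [2, 3], 3: [4, 5]},
                  lambda t, kids: 1 if not kids else sum(kids))
print(result)  # 3 leaves
```

In dpdb, the per-node work is of course not a Python callback but a generated SQL query, as described in Sect. 4.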
For solving problem \(\mathcal {P}=\textsc {\#Sat} \), we need the following graph representation. The primal graph \(G_\varphi \) [36] of a formula \(\varphi \) has as vertices the variables of \(\varphi \), where two variables are joined by an edge if they occur together in a clause of \(\varphi \). Sometimes, we refer to the treewidth of the primal graph of a given formula as the treewidth of the formula. Given a formula \(\varphi \), a TD \(\mathcal {T}=(T,\chi )\) of \(G_\varphi \), and a node t of T, we let the local problem \(\textsc {\#Sat} (t, \varphi )=\varphi _t\) be \(\varphi _t \,\mathrel {\mathop :}=\{\,c \;{|}\;c \in \varphi , \mathrm{var}(c) \subseteq \chi (t)\,\}\), i.e., the clauses entirely covered by \(\chi (t)\).
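A small sketch of the primal graph and the local problem \(\varphi _t\), using the formula \(\varphi \) of Example 2 below and an illustrative literal encoding as (variable, is_positive) pairs:

```python
from itertools import combinations

def primal_graph(phi):
    """Edges between variables that occur together in some clause."""
    edges = set()
    for clause in phi:
        for u, v in combinations(sorted({x for (x, pos) in clause}), 2):
            edges.add((u, v))
    return edges

def local_problem(phi, bag):
    """phi_t: the clauses entirely covered by the bag chi(t)."""
    return [c for c in phi if {x for (x, pos) in c} <= bag]

# Formula of Example 2: c1 .. c4
phi = [{('a', 0), ('b', 1), ('c', 1)}, {('a', 1), ('b', 0), ('c', 0)},
       {('a', 1), ('d', 1)}, {('a', 1), ('d', 0)}]
print(primal_graph(phi))               # edges a-b, a-c, b-c, a-d
print(local_problem(phi, {'a', 'd'}))  # the two clauses over a and d
```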
Table algorithm \(\mathsf {S}\) as presented in Algorithm 1 shows all the cases that are needed to solve #Sat by means of DP over nice TDs. Each table \(\tau _{t}\) consists of rows of the form \(\langle I, c\rangle \), where I is an assignment of \(\varphi _t\) and c is a counter. The table of a node t with \(\mathrm{type}(t)=\textit{leaf}\) consists of the empty assignment with counter 1, cf., Line 1. For a node t with introduced variable \(a\in \chi (t)\), we guess in Line 3 for each assignment \(\beta \) of the child table whether a is set to true or to false, and ensure that \(\varphi _t\) is satisfied. When an atom a is removed in node t, we project assignments of child tables to \(\chi (t)\), cf., Line 5, and sum up the counters of equal assignments. For join nodes, counters of common assignments are multiplied, cf., Line 7.
Example 2
Consider formula \(\varphi \,\mathrel {\mathop :}=\{\overbrace{\{\lnot a, b, c\}}^{c_1}, \overbrace{\{a, \lnot b, \lnot c\}}^{c_2}, \overbrace{\{a, d\}}^{c_3}, \overbrace{\{a, \lnot d\}}^{c_4}\}\). Satisfying assignments of formula \(\varphi \) are, e.g., \(\{a\mapsto 1,b\mapsto 1, c\mapsto 0, d\mapsto 0\}\), \(\{a\mapsto 1, b\mapsto 0,c\mapsto 1, d\mapsto 0\}\) or \(\{a\mapsto 1, b\mapsto 1,c\mapsto 1, d\mapsto 1\}\). In total, there are 6 satisfying assignments of \(\varphi \). Observe that graph G of Fig. 1 actually depicts the primal graph \(G_\varphi \) of \(\varphi \). Intuitively, \(\mathcal{T}\) of Fig. 1 allows to evaluate formula \(\varphi \) in parts. Figure 2 illustrates a nice TD \(\mathcal {T}'=(\cdot , \chi )\) of the primal graph \(G_\varphi \) and tables \(\tau _{1}\), \(\ldots \), \(\tau _{12}\) that are obtained during the execution of \({\mathsf {S}}\) on nodes \(t_1,\ldots ,t_{12}\). We assume that each row in a table \(\tau _{t}\) is identified by a number, i.e., row i corresponds to \(\varvec{u_{t.i}} = \langle I_{t.i}, c_{t.i} \rangle \).
Table \(\tau _{1}=\{\,\langle \emptyset , 1\rangle \,\}\) has \(\mathrm{type}(t_1) = \textit{leaf}\). Since \(\mathrm{type}(t_2) = \textit{intr}\), we construct table \(\tau _{2}\) from \(\tau _{1}\) by taking \(I_{1.i}\cup \{a\mapsto 0\}\) and \(I_{1.i}\cup \{a \mapsto 1\}\) for each \(\langle I_{1.i}, c_{1.i}\rangle \in \tau _{1}\). Then, \(t_3\) introduces c and \(t_4\) introduces b. \(\varphi _{t_1}=\varphi _{t_2}=\varphi _{t_3} = \emptyset \), but since \(\chi (t_4) \subseteq \mathrm{var}(c_1)\) we have \(\varphi _{t_4} = \{c_1,c_2\}\) for \(t_4\). In consequence, for each \(I_{4.i}\) of table \(\tau _{4}\), we have \(\{c_1,c_2\}({{I_{4.i}}})=\emptyset \) since \(\mathsf {S}\) enforces satisfiability of \(\varphi _t\) in node t. Since \(\mathrm{type}(t_5) = \textit{rem}\), we remove variable c from all elements in \(\tau _{4}\) and sum up counters accordingly to construct \(\tau _{5}\). Note that we have already seen all clauses in which c occurs, and hence c can no longer affect assignments during the remaining traversal. We similarly create \(\tau _{6}=\{\langle \{a\mapsto 0\}, 3 \rangle , \langle \{a \mapsto 1\}, 3 \rangle \}\) and \(\tau _{{10}}=\{\langle \{a \mapsto 1\}, 2 \rangle \}\). Since \(\mathrm{type}(t_{11})=\textit{join}\), we build table \(\tau _{11}\) by taking the intersection of \(\tau _{6}\) and \(\tau _{{10}}\). Intuitively, this combines assignments agreeing on a, where counters are multiplied accordingly. By definition (primal graph and TDs), for every \(c \in \varphi \), the variables \(\mathrm{var}(c)\) occur together in at least one common bag. Hence, since \(\tau _{12} = \{\langle \emptyset , 6 \rangle \}\), we can reconstruct, for example, model \(\{a\mapsto 1,b \mapsto 1, c\mapsto 0, d\mapsto 1\} = I_{11.1} \cup I_{5.4} \cup I_{9.2}\) of \(\varphi \) using the highlighted (yellow) rows in Fig. 2. On the other hand, if \(\varphi \) was unsatisfiable, \(\tau _{12}\) would be empty (\(\emptyset \)). \(\blacksquare \)
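The four cases of Algorithm 1 can be mirrored as plain dictionary manipulations. The sketch below recomputes the count for the formula \(\varphi \) of this example on a hand-built nice TD whose branch structure only roughly mirrors \(\mathcal {T}'\) of Fig. 2 (the introduction order is our assumption); literals are encoded as (variable, is_positive) pairs:

```python
from collections import defaultdict

# Rows <I, c>: assignment I as a frozenset of (var, value) pairs, counter c.
def leaf():
    return {frozenset(): 1}

def satisfies(I, clauses):
    assignment = dict(I)
    return all(any(assignment[x] == pos for (x, pos) in c) for c in clauses)

def intr(table, a, local_clauses):
    """Guess a -> 0/1 for every row; keep only rows satisfying phi_t."""
    out = {}
    for I, c in table.items():
        for val in (0, 1):
            J = I | {(a, val)}
            if satisfies(J, local_clauses):
                out[J] = c
    return out

def rem(table, a):
    """Project a away and sum the counters of collapsing rows."""
    out = defaultdict(int)
    for I, c in table.items():
        out[frozenset((x, v) for (x, v) in I if x != a)] += c
    return dict(out)

def join(t1, t2):
    """Multiply the counters of rows with identical assignments."""
    return {I: c * t2[I] for I, c in t1.items() if I in t2}

# Formula of Example 2; literals encoded as (variable, is_positive).
c1 = {('a', 0), ('b', 1), ('c', 1)}; c2 = {('a', 1), ('b', 0), ('c', 0)}
c3 = {('a', 1), ('d', 1)};           c4 = {('a', 1), ('d', 0)}

# Left branch: introduce a, c, b (checking c1, c2), then remove c and b.
left = rem(rem(intr(intr(intr(leaf(), 'a', []), 'c', []), 'b', [c1, c2]),
               'c'), 'b')
# Right branch: introduce a and d (checking c3, c4), then remove d.
right = rem(intr(intr(leaf(), 'a', []), 'd', [c3, c4]), 'd')
# Join on the shared variable a, then remove a at the root.
root = rem(join(left, right), 'a')
print(root[frozenset()])  # 6, the number of models of phi
```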
Alternative: Relational Algebra. Instead of using set theory to describe how tables are obtained during dynamic programming, one could alternatively use relational algebra. There, tables \(\tau _{t}\) for each TD node t are pictured as relations, where \(\tau _{t}\) distinguishes a unique column (attribute) \([\![x]\!]\) for each \(x\in \chi (t)\). Further, there might be additional attributes required depending on the problem at hand, e.g., we need an attribute cnt for counting in #Sat, or an attribute for modeling costs or weights in case of optimization problems. Algorithm 2 presents a table algorithm for problem #Sat that is equivalent to Algorithm 1, but relies on relational algebra only for computing tables. This step from set notation to relational algebra is driven by the observation that in these table algorithms one can identify recurring patterns, and one mainly has to adjust problem-specific parts of it (highlighted by coloring in Algorithm 1). In particular, one typically derives for nodes t with \(\mathrm{type}(t)=\textit{leaf}\), a fresh initial table \(\tau _{t}\), cf., Line 1 of Algorithm 2. Then, whenever an atom a is introduced, such algorithms often use \(\theta \)-joins with a fresh initial table for the introduced variable a representing potential values for a. In Line 3 the selection of the \(\theta \)-join is performed according to \(\varphi _t\), i.e. corresponding to the local problem of #Sat. Further, for nodes t with \(\mathrm{type}(t)=\textit{rem}\), these table algorithms typically need projection. In case of Algorithm 2, Line 5 also needs grouping in order to maintain the counter, as several rows of \(\tau _{1}\) might collapse in \(\tau _{t}\). Finally, for a node t with \(\mathrm{type}(t)=\textit{join}\), in Line 7 we use extended projection and \(\theta \)-joins, where we join on the same truth assignments, which allows us later to leverage advanced database technology. 
Extended projection is needed for multiplying the counters of the two rows containing the same assignment.
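The relational-algebra counterparts of the removal and join cases translate directly into SQL. Below is a minimal self-contained sketch using SQLite instead of the PostgreSQL backend that dpdb employs, with made-up table contents; the second query shows the extended projection that multiplies counters:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Child table over bag {a, b} with a counter column cnt
cur.execute("CREATE TABLE t_child (a INT, b INT, cnt INT)")
cur.executemany("INSERT INTO t_child VALUES (?,?,?)",
                [(0, 0, 2), (0, 1, 1), (1, 0, 1), (1, 1, 2)])
# Second child table over bag {a}
cur.execute("CREATE TABLE t_other (a INT, cnt INT)")
cur.execute("INSERT INTO t_other VALUES (1, 2)")

# Removal of b: projection to {a} plus grouping; collapsed counters are summed
rem_rows = cur.execute(
    "SELECT a, SUM(cnt) AS cnt FROM t_child GROUP BY a ORDER BY a").fetchall()
print(rem_rows)  # [(0, 3), (1, 3)]

# Join node: theta-join on the shared column a; the extended projection
# multiplies the counters of rows with the same assignment
join_rows = cur.execute(
    "SELECT c.a, c.cnt * o.cnt AS cnt "
    "FROM (SELECT a, SUM(cnt) AS cnt FROM t_child GROUP BY a) c "
    "JOIN t_other o ON c.a = o.a").fetchall()
print(join_rows)  # [(1, 6)]
```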
4 Dynamic Programming on TDs Using Databases and SQL
In this section, we present a general architecture to model table algorithms by means of database management systems. The architecture is influenced by the DP approach of the previous section and works as depicted in Fig. 3, where the steps highlighted in yellow and blue need to be specified depending on the problem \(\mathcal {P}\). Steps outside Step 3 are mainly setup tasks; the yellow “E”s indicate events that might be needed to solve more complex problems on the polynomial hierarchy. For example, one could create and drop auxiliary sub-tables for each node during Step 3 within such events. Observe that after the generation of a TD \(\mathcal {T}=(T,\chi )\), Step 2b automatically creates a table \(\tau _{t}\) for each node t of T, where the corresponding table schema of \(\tau _{t}\) is specified in the blue part, i.e., within \(\mathsf {A_{RAlg}}\). The default schema of such a table \(\tau _{t}\) that is assumed in this section foresees one column for each element of the bag \(\chi (t)\); additional columns such as counters or costs can be added.
Actually, the core of this architecture is the table algorithm \(\mathsf {A_{RAlg}}\) executed for each node t of T of TD \(\mathcal {T}=(T,\chi )\). Besides the definition of table schemas, the blue part concerns the specification of the table algorithm by means of a procedural generator template that describes how to dynamically obtain SQL code for each node t, oftentimes depending on \(\chi (t)\). This generated SQL code is then used internally for the manipulation of tables \(\tau _{t}\) during the tree decomposition traversal in Step 3 of dynamic programming. Listing 3 presents a general template, where the typically problem-specific parts of table algorithms are replaced by colored placeholders of the form \(\#\mathsf {\ldots }\#\), cf., Algorithm 2. Observe that Line 3 of Listing 3 uses extended projection as in Line 7. This is needed for some problems requiring changes on vertex introduction.
Note, however, that the whole architecture does not depend on a certain normalization or form of TDs, e.g., whether they are nice or not. Instead, a table algorithm for any TD is specified simply by providing problem-specific implementations of the placeholders of Listing 3, where a system following this architecture is responsible for interleaving and overlapping these cases within a node t. In fact, we discuss an implementation of a system according to this architecture next, where supporting non-nice TDs is crucial for obtaining higher efficiency.
4.1 System dpdb: Dynamic Programming with Databases
We implemented the proposed architecture of the previous section in the prototypical dpdb system. The system is open-sourceFootnote 2, written in Python 3 and uses PostgreSQL as DBMS. We are convinced though that one can easily replace PostgreSQL by any other state-of-the-art relational database that uses SQL. In the following, we discuss implementation specifics that are crucial for a performant system that is still extendable and flexible.
Computing TDs. TDs are computed mainly with the library htd version 1.2 with default settings [2], which, due to heuristics, finds TDs extremely quickly even for interesting instances [23]. Note that dpdb directly supports the TD format of recent competitions [15], i.e., one could easily replace the TD library. It is important, though, not to enforce htd to compute nice TDs, as this would cause a lot of overhead later in dpdb for copying tables. However, in order to benefit from the implementation of \(\theta \)-joins, query optimization, and state-of-the-art database technology in general, we observed that it is crucial to limit the number of child nodes of every TD node. Then, especially when huge tables are involved, \(\theta \)-joins among child node tables cover at most a limited number of child node tables. In consequence, the query optimizer of the database system still has a chance to come up with meaningful execution plans depending on the contents of the tables. Note, though, that one should consider \(\theta \)-joins with more than just two tables, since binary \(\theta \)-joins would already fix the order in which these tables are combined, thereby again limiting the query optimizer. Apart from this trade-off, we outsource the task of joining tables to the DBMS, since the performance of database systems highly depends on query optimization. The actual limit, which is a restriction from experience and practice only, highly depends on the DBMS that is used. For PostgreSQL, we set a limit of at most 5 child nodes for each node of the TD, i.e., each \(\theta \)-join covers at most 5 child tables.
Towards non-nice TDs. Although this paper presents the algorithms for nice TDs (mainly for simplicity), the system dpdb interleaves these cases as presented in Listing 3. Concretely, the system executes one query per table \(\tau _{t}\) for each node t during the traversal of TD \(\mathcal {T}\). This query consists of several parts, which we briefly explain from the inside out. The inner-most part computes the row candidates for \(\tau _{t}\), consisting of the \(\theta \)-join as in Line 7 of Listing 3, including parts of Line 3, namely cross-joins for each introduced variable, involving \(\#\mathsf {intrTab}\#\) without the filtering on \(\#\mathsf {localProbFilter}\#\). Then, there are different configurations of dpdb concerning these row candidates. For debugging (see below) one could (1) actually materialize the result in a table, whereas for performance runs, one should use (2) common table expressions (CTEs or WITH-queries) or (3) sub-queries (nested queries), both of which result in one nested SQL query per table \(\tau _{t}\). On top of these row candidates, projectionFootnote 3 and grouping involving \(\#\mathsf {aggrExp}\#\) as in Line 5 of Listing 3, as well as selection according to \(\#\mathsf {localProbFilter}\#\), cf., Line 3, are specified. It turns out that PostgreSQL performs better with sub-queries, where the query optimizer oftentimes pushes selection and projection into the sub-query if needed, which is not the case for CTEs, as discussed in the PostgreSQL manual [1, Sec. 7.8.1]. On other DBMSs or with other vendors, e.g., Oracle, it might be better to use CTEs instead.
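The resulting one-query-per-node shape can be sketched as follows. This is a hedged illustration only, using SQLite for self-containedness and an invented local problem (a single clause \(a \vee d\)) in place of \(\#\mathsf {localProbFilter}\#\), not dpdb's actual generated SQL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Child table over bag {a} with counters, as produced further down the TD
cur.execute("CREATE TABLE t_child (a INT, cnt INT)")
cur.executemany("INSERT INTO t_child VALUES (?,?)", [(0, 3), (1, 3)])

# One query per node: the inner sub-query builds the row candidates
# (child rows cross-joined with candidate values 0/1 for an introduced
# variable d); on top of it, selection applies the local-problem filter
# and grouping sums the counters after d has been projected away again.
query = """
SELECT a, SUM(cnt) AS cnt
FROM (SELECT t.a AS a, i.column1 AS d, t.cnt AS cnt
      FROM t_child t CROSS JOIN (VALUES (0), (1)) i)
WHERE a = 1 OR d = 1      -- stands in for #localProbFilter#
GROUP BY a ORDER BY a
"""
node_rows = cur.execute(query).fetchall()
print(node_rows)  # [(0, 3), (1, 6)]
```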
Example 3
Consider again Example 2 and Fig. 1. Assume that we use table algorithm \(\mathsf {S_{RAlg}}\) with dpdb on formula \(\varphi \) and TD \(\mathcal {T}\) with Option (3), i.e., the row candidates are expressed via sub-queries. Then, for each node \(t_i\) of \(\mathcal {T}\), dpdb generates a view vi as well as a table \(\tau _{i}\) containing, in the end, the content of vi. Observe that each view has only one column \([\![a]\!]\) for each variable a of \(\varphi \) since the truth assignments of the other variables are not needed later. This keeps the tables compact; only \(\tau _{1}\) has two rows, while \(\tau _{2}\) and \(\tau _{3}\) have one row each. We obtain the following views.
\(\blacksquare \)
Parallelization. A further reason not to over-restrict the number of child nodes within the TD lies in parallelization. In dpdb, we compute tables in parallel along the TD, where multiple tables can be computed at the same time, as soon as all their child tables have been computed. For this reason, we try to keep the number of child nodes in the TD as high as possible. In our system dpdb, we currently allow for at most 24 worker threads for table computations and 24 database connections at the same time (both pooled and configurable). On top of that, we have 2 additional threads and database connections for job assignments to workers, as well as one dedicated watcher thread for clean-up and connection termination.
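A schematic sketch of such dependency-driven scheduling: a node's table is submitted to a worker pool as soon as all of its children have finished, so independent subtrees run concurrently. dpdb's actual worker and connection pooling is more involved; all names here are illustrative:

```python
import concurrent.futures

def parallel_dp(children, table_algorithm, max_workers=4):
    """Run table_algorithm(t, child_tables) for each node, dispatching a
    node as soon as all its children are done."""
    parent = {c: t for t, cs in children.items() for c in cs}
    pending = {t: len(cs) for t, cs in children.items()}  # unfinished kids
    tables = {}
    leaves = [t for t in children if pending[t] == 0]
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(table_algorithm, t, []): t for t in leaves}
        while futures:
            done, _ = concurrent.futures.wait(
                futures, return_when=concurrent.futures.FIRST_COMPLETED)
            for fut in done:
                t = futures.pop(fut)
                tables[t] = fut.result()
                p = parent.get(t)
                if p is not None:
                    pending[p] -= 1
                    if pending[p] == 0:  # all child tables are available
                        kids = [tables[c] for c in children[p]]
                        futures[pool.submit(table_algorithm, p, kids)] = p
    return tables

# Toy tree: node 1 has children 2 and 3; leaves yield 1, inner nodes sum
children = {1: [2, 3], 2: [], 3: []}
tables = parallel_dp(children, lambda t, kids: 1 if not kids else sum(kids))
print(tables[1])  # 2
```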
Logging, Debugging and Extensions. Currently, we have two versions of the dpdb system implemented. One version aims for performance, and the other one tries to achieve comprehensive logging and easy debugging of problems (instances), thereby increasing explainability. The former, for instance, neither keeps intermediate results nor creates database tables in advance (Step 2b), as depicted in Fig. 3, but creates tables according to an SQL SELECT statement. In the latter, we keep all intermediate results, record database timestamps before and after certain nodes, and provide statistics, e.g., width and number of rows. Further, since for each table \(\tau _{t}\) exactly one SQL statement is executed to fill this table, we also have a dedicated view of the SQL SELECT statement whose result is then inserted in \(\tau _{t}\). Together with the power and flexibility of SQL queries, we observed that this helps in finding errors in the table algorithm specifications.
Besides convenient debugging, the system dpdb immediately contains an extension for approximation, where we restrict the table contents to a maximum number of rows. This allows for certain approximations on counting or optimization problems where it is infeasible to compute the full tables. Further, dpdb foresees a dedicated randomization on this restricted number of rows, such that we obtain different approximate results with different random seeds.
Note that dpdb can be easily extended. Each problem can overwrite existing default behavior, and dpdb also supports a problem-specific argument parser for each problem individually. Out of the box, we support the formats DIMACS sat and DIMACS graph [32] as well as the common format for TDs [15].
4.2 Table Algorithms With dpdb for Selected Problems
The system dpdb allows for easy prototyping of DP algorithms on TDs. This covers decision, counting, as well as optimization problems. As a proof of concept, we present the relevant parts of the table algorithm specification according to the template in Listing 3 for a selection of problems belowFootnote 4. To this end, we assume in this section a not necessarily nice TD \(\mathcal {T}=(T,\chi )\) of the corresponding graph representation of our given instance \(\mathcal {I}\). Further, for the following specifications of the table algorithm using the template \(\mathsf {A_{RAlg}}\) in Algorithm 2, we assume any node t of T and its child nodes \(t_1, \ldots , t_\ell \).
Problem \(\textsc {\#Sat} \). Given an instance formula \(\mathcal {I}=\varphi \), the specific parts for \(\textsc {\#Sat} \) for node t with \(\varphi _t = \{\{l_{1,1},\ldots ,l_{1,k_1}\}, \ldots , \{l_{n,1},\ldots ,l_{n,k_n}\}\}\) are as follows.
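A hedged sketch of how the placeholder \(\#\mathsf {localProbFilter}\#\) could be generated from \(\varphi _t\): one SQL disjunction per clause, conjoined with AND. dpdb's actual column encoding and quoting may differ; the literal encoding is illustrative:

```python
def local_prob_filter(clauses):
    """Translate phi_t into an SQL condition: a conjunction (AND) of
    disjunctions (OR), one per clause; (x, 1) is a positive literal."""
    def lit(x, pos):
        return f'"{x}" = 1' if pos else f'"{x}" = 0'
    return " AND ".join(
        "(" + " OR ".join(lit(x, pos) for (x, pos) in sorted(c)) + ")"
        for c in clauses)

# Clauses c3 = {a, d} and c4 = {a, not d} of Example 2
c3 = [('a', 1), ('d', 1)]
c4 = [('a', 1), ('d', 0)]
print(local_prob_filter([c3, c4]))
# ("a" = 1 OR "d" = 1) AND ("a" = 1 OR "d" = 0)
```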
Observe that for the corresponding decision problem \(\textsc {Sat} \), where the goal is to decide only the existence of a satisfying assignment for given formula \(\varphi \), \(\#\varepsilon \mathsf {Tab}\#\) returns the empty table and parts \(\#\mathsf {aggrExp}\#, \#\mathsf {extProj}\#\) are just empty since there is no counter needed.
Problem \(\#o\hbox {-}\textsc {Col}\). For a given input graph \(\mathcal {I}=G=(V,E)\), an o-coloring is a mapping \(\iota : V \rightarrow \{1,\ldots ,o\}\) such that for each edge \(\{u,v\}\in E\), we have \(\iota (u)\ne \iota (v)\). Problem \(\#o\hbox {-}\textsc {Col} \) asks to count the number of o-colorings of G. The local problem \(\#o\hbox {-}\textsc {Col}\)(t,G) is defined by the graph \(G_t \,\mathrel {\mathop :}=(V\cap \chi (t), E\cap [\chi (t)\times \chi (t)])\).
The specific parts for \(\#o\hbox {-}\textsc {Col} \) for node t with \(E(G_t)=\{\{u_1,v_1\},\ldots , \{u_n,v_n\}\}\) are as follows.
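Analogously to #Sat, the corresponding placeholders for \(\#o\hbox {-}\textsc {Col} \) could be generated as sketched below. This is a hedged illustration only: the local filter demands distinct colors on edge endpoints, and the introduce-table enumerates the o candidate colors instead of the truth values 0/1:

```python
def col_filter(edges):
    """#o-Col local filter: endpoint columns must get distinct colors."""
    return " AND ".join(f'"{u}" <> "{v}"' for (u, v) in edges)

def intr_tab(o):
    """Introduce-table for a fresh vertex: candidate colors 1..o."""
    return "(VALUES " + ", ".join(f"({i})" for i in range(1, o + 1)) + ")"

print(col_filter([('u1', 'v1'), ('u2', 'v2')]))
# "u1" <> "v1" AND "u2" <> "v2"
print(intr_tab(3))  # (VALUES (1), (2), (3))
```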
Problem MinVC. Given input graph \(\mathcal {I}=G=(V,E)\), a vertex cover is a set of vertices \(C\subseteq V\) of G such that for each edge \(\{u,v\}\in E\), we have \(\{u,v\}\cap C\ne \emptyset \). Then, MinVC asks to find the minimum cardinality |C| among all vertex covers C, i.e., C is such that there is no vertex cover \(C'\) with \(|C'| < |C|\). Local problem \(\textsc {MinVC}(t,G)\,\mathrel {\mathop :}=G_t\) is defined as above. We use an additional column card for storing cardinalities.
Problem \(\textsc {MinVC} \) for node t with \(E(G_t)=\{\{u_1,v_1\},\ldots , \{u_n,v_n\}\}\) and \(\chi (t)=\{a_1,\ldots ,a_{k}\}\) can be specified as follows.
Observe that \(\#\mathsf {extProj}\#\) is a bit more involved on non-nice TDs since, whenever the column for a vertex a is set to 1, i.e., vertex a is in the vertex cover, we have to count a with cost 1 only once, even if a appears in several child node bags.
Note that concrete implementations could generate and apply parts of this specification, as for example in \(\#\mathsf {localProbFilter}\#\) only edges involving newly introduced vertices need to be checked.
Similarly to MinVC and #\(o\)-Col, one can model several other (graph) problems. One could also count the number of solutions of problem MinVC, using both a column for cardinalities and one for counting. There, in addition to grouping with GROUP BY in dpdb, we could additionally use the HAVING construct of SQL, where only rows whose column card is minimal are kept.
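The GROUP BY/HAVING idea can be sketched on a hypothetical root-level table of such a counting-MinVC run (SQLite, invented data; each row stores a cardinality card and the number cnt of covers of that cardinality):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t_root (card INT, cnt INT)")
cur.executemany("INSERT INTO t_root VALUES (?,?)",
                [(2, 3), (3, 5), (4, 1)])

# Keep only the rows whose column card is minimal, via GROUP BY ... HAVING
row = cur.execute("""
    SELECT card, SUM(cnt) AS cnt FROM t_root
    GROUP BY card
    HAVING card = (SELECT MIN(card) FROM t_root)
""").fetchone()
print(row)  # (2, 3): three minimum vertex covers, each of size 2
```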
5 Experiments
We conducted a series of experiments using publicly available benchmark sets for #Sat. The benchmark instances [22] are publicly available, and our results can be found on GitHub at github.com/hmarkus/dp_on_dbs/padl2020.
5.1 Setup
Measure & Resources. We mainly compare wall clock time and number of timeouts. The measured time includes preprocessing as well as the time for computing a TD with a fixed random seed. For parallel solvers we allowed access to 24 physical cores. We set a timeout of 900 s and limited available RAM to 14 GB per instance and solver.
Benchmark Instances. We considered a selection of overall 1494 instances from various publicly available benchmark sets for #Sat, consisting of the fre/meel benchmarksFootnote 5 (1480 instances) and the c2d benchmarksFootnote 6 (14 instances). We considered instances preprocessed by the standard #Sat preprocessor pmc [29], in line with recent work on #Sat [23], where it was also shown that more than 80% of the #Sat instances have primal treewidth below 19 after preprocessing.
Benchmarked System. We used PostgreSQL 9.5 for our system dpdb, since this was the version available on the benchmark hardware described below. We expect major performance increases with more recent versions, which were not available on our benchmark machines; in particular, parallel query evaluation was introduced in version 9.6 and improved in every subsequent version.
Other Benchmarked Systems. In our experimental work, we present results for the most recent versions of publicly available #Sat solvers, namely c2d 2.20 [13], d4 1.0 [30], DSHARP 1.0 [33], miniC2D 1.0.0 [34], cnf2eadt 1.0 [28], bdd_minisat_all 1.0.2 [42], and sdd 2.0 [14], which are all based on knowledge compilation techniques. We also considered the recent approximate solvers ApproxMC2, ApproxMC3 [7] and sts 1.0 [20], as well as the CDCL-based solvers Cachet 1.21 [37], sharpCDCLFootnote 7, and sharpSAT 13.02 [40]. Finally, we also included the multi-core solvers gpusat 1.0 and gpusat 2.0 [23], both of which are based on dynamic programming, as well as countAntom 1.0 [6] on 12 physical CPU cores, where it performed better than on 24 cores. Experiments were conducted with default solver options.
Benchmark Hardware. Almost all solvers were executed on a cluster of 12 nodes. Each node is equipped with two Intel Xeon E5-2650 CPUs with 12 physical cores each at 2.2 GHz clock speed, 256 GB RAM, and 1 TB Seagate ST1000NM0033 hard disk drives (not SSDs). The results were gathered on Ubuntu 16.04.1 LTS machines with disabled hyperthreading on kernel 4.4.0-139. Since we also took into account solvers using a GPU, for gpusat1 and gpusat2 we used a machine equipped with a consumer GPU: an Intel Core i3-3245 CPU operating at 3.4 GHz, 16 GB RAM, and one Sapphire Pulse ITX Radeon RX 570 GPU running at 1.24 GHz with 32 compute units, 2048 shader units, and 4 GB VRAM, using driver amdgpu-pro-18.30-641594 and OpenCL 1.2. This system ran Ubuntu 18.04.1 LTS with kernel 4.15.0-34.
5.2 Results
Figure 4 illustrates the top 15 solvers on instances preprocessed by pmc in a cactus-like plot, which provides an overview of all benchmarked #Sat instances. The x-axis refers to the number of instances and the y-axis depicts the runtime, sorted in ascending order for each solver individually. Overall, dpdb seems quite competitive and beats most of the solvers, for example gpusat1, sharpSAT, dsharp, approxmc, and cachet. Surprisingly, our system shows a different runtime behavior than the other solvers. We believe the reason lies in an initial overhead caused by the creation of the tables, which seems to depend on the number of nodes of the used TD; there, I/O operations for writing from main memory to hard disk seem to kick in. Table 1 presents more detailed runtime results, showing a solid fifth place for dpdb, as our system solves the vast majority of the instances. If we only consider instances whose treewidth upper boundFootnote 8 is at most 35, dpdb even solves slightly more instances than countAntom.
6 Final Discussion and Conclusions
We presented a generic system dpdb for explicitly exploiting treewidth by means of dynamic programming on databases. The idea of dpdb is to use database management systems (DBMS) for table manipulation, which makes it (1) easy and elegant to perform rapid prototyping for problems, and (2) possible to leverage decades of database theory and database system tuning. It turned out that all the cases occurring in dynamic programming can be handled quite elegantly with plain SQL queries. Our system dpdb can be used for both decision and counting problems, and also supports optimization. We see our system as particularly well-suited for counting problems, especially since it was shown that model counting (#Sat) instances of practical relevance typically have small treewidth [23]. Consequently, we carried out preliminary experiments on publicly available instances for #Sat, where we observe competitive behavior compared to most recent solvers.
Future Work. Our results give rise to several research questions. First of all, we want to push towards PostgreSQL 12, but at the same time also consider other vendors and systems, e.g., Oracle. In particular, the behavior of different systems might change when we use different strategies for writing and evaluating our SQL queries, e.g., sub-queries vs. common table expressions. Currently, we neither create nor use any indices, since preliminary tests showed that meaningful B*-tree indices are hard to design and oftentimes too costly to create. Further, exploring bitmap indices, as available in Oracle's enterprise DBMS, would be worth trying in our case (and for #Sat), since they allow database columns to be combined via extremely efficient bit operations.
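The sub-query vs. common-table-expression distinction mentioned above can be made concrete with a toy sketch (hypothetical table and columns, shown here in sqlite3 as a stand-in for the DBMS under test): both formulations express the same projection, but a query planner may treat them differently, which is precisely what such an experiment would measure.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INT, card INT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(0, 2), (1, 1), (1, 3)])

# The same grouping projection written as a sub-query ...
sub = con.execute(
    "SELECT a, m FROM (SELECT a, MIN(card) AS m FROM t GROUP BY a)"
).fetchall()

# ... and as a common table expression (WITH clause).
cte = con.execute(
    "WITH proj AS (SELECT a, MIN(card) AS m FROM t GROUP BY a) "
    "SELECT a, m FROM proj"
).fetchall()
print(sub == cte)  # both formulations return the same table
```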
It might be worthwhile to rigorously test and explore our extensions for limiting the number of rows per table in order to approximate #Sat or other counting problems, cf. [8, 19, 39]. Another interesting research direction is to study whether efficient data representation techniques in a DBMS can be combined with dynamic programming in order to lift our solver to quantified Boolean formulas. Finally, we are also interested in extending this work to projected model counting [21].
Notes
- 1.
We allow for \(\varphi \) to contain expressions \(v{\approx }v'\) as variables for \(v,v'\in \mathrm{dom}(R)\cup \mathrm{att}(R)\), and we abbreviate for \(v\in \mathrm{att}(R)\) with \(\mathrm{dom}(v)=\{0,1\}\), \(v{\approx }1\) by v and \(v{\approx }0\) by \(\lnot v\).
- 2.
Our system dpdb is available under GPL3 license at github.com/hmarkus/dp_on_dbs.
- 3.
Actually, dpdb keeps only columns relevant for the table of the parent node of t.
- 4.
Implementation for problems #Sat as well as MinVC is readily available in dpdb.
- 5.
- 6.
- 7.
- 8.
These upper bounds were obtained via decomposer htd in at most two seconds.
References
PostgreSQL 12 documentation (2019). https://www.postgresql.org/docs/12/queries-with.html
Abseher, M., Musliu, N., Woltran, S.: htd – a free, open-source framework for (customized) tree decompositions and beyond. In: Salvagnin, D., Lombardi, M. (eds.) CPAIOR 2017. LNCS, vol. 10335, pp. 376–386. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59776-8_30
Bacchus, F., Dalmao, S., Pitassi, T.: Algorithms and complexity results for #SAT and Bayesian inference. In: FOCS 2003, pp. 340–351. IEEE Computer Society (2003)
Bannach, M., Berndt, S.: Practical access to dynamic programming on tree decompositions. Algorithms 12(8), 172 (2019)
Bliem, B., Charwat, G., Hecher, M., Woltran, S.: D-flat\({}^{\text{2 }}\): subset minimization in dynamic programming on tree decompositions made easy. Fundam. Inform. 147(1), 27–61 (2016)
Burchard, J., Schubert, T., Becker, B.: Laissez-Faire caching for parallel #SAT solving. In: Heule, M., Weaver, S. (eds.) SAT 2015. LNCS, vol. 9340, pp. 46–61. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24318-4_5
Chakraborty, S., Fremont, D.J., Meel, K.S., Seshia, S.A., Vardi, M.Y.: Distribution-aware sampling and weighted model counting for SAT. In: AAAI 2014, pp. 1722–1730. The AAAI Press (2014)
Chakraborty, S., Meel, K.S., Vardi, M.Y.: Improving approximate counting for probabilistic inference: from linear to logarithmic SAT solver calls. In: IJCAI 2016, pp. 3569–3576. The AAAI Press (2016)
Charwat, G., Woltran, S.: Expansion-based QBF solving on tree decompositions. Fundam. Inform. 167(1–2), 59–92 (2019)
Choi, A., Van den Broeck, G., Darwiche, A.: Tractable learning for structured probability spaces: a case study in learning preference distributions. In: IJCAI 2015. The AAAI Press (2015)
Codd, E.F.: A relational model of data for large shared data banks. Commun. ACM 13(6), 377–387 (1970)
Cygan, M., et al.: Parameterized Algorithms. Springer, Switzerland (2015). https://doi.org/10.1007/978-3-319-21275-3
Darwiche, A.: New advances in compiling CNF to decomposable negation normal form. In: ECAI 2004, pp. 318–322. IOS Press (2004)
Darwiche, A.: SDD: a new canonical representation of propositional knowledge bases. In: IJCAI 2011, pp. 819–826. AAAI Press/IJCAI (2011)
Dell, H., Komusiewicz, C., Talmon, N., Weller, M.: The PACE 2017 parameterized algorithms and computational experiments challenge: the second iteration. In: IPEC 2017, Leibniz International Proceedings in Informatics (LIPIcs), vol. 89, pp. 30:1–30:12. Dagstuhl Publishing (2018)
Diestel, R.: Graph Theory. Graduate Texts in Mathematics, vol. 173, 4th edn. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-662-53622-3
Domshlak, C., Hoffmann, J.: Probabilistic planning via heuristic forward search and weighted model counting. J. Artif. Intell. Res. 30, 565–620 (2007)
Doubilet, P., Rota, G.C., Stanley, R.: On the foundations of combinatorial theory (VI): the idea of generating function. In: Berkeley Symposium on Mathematical Statistics and Probability, vol. 2, pp. 267–318 (1972)
Dueñas-Osorio, L., Meel, K.S., Paredes, R., Vardi, M.Y.: Counting-based reliability estimation for power-transmission grids. In: AAAI 2017, pp. 4488–4494. The AAAI Press (2017)
Ermon, S., Gomes, C.P., Selman, B.: Uniform solution sampling using a constraint solver as an oracle. In: UAI 2012, pp. 255–264. AUAI Press (2012)
Fichte, J.K., Hecher, M., Morak, M., Woltran, S.: Exploiting treewidth for projected model counting and its limits. In: Beyersdorff, O., Wintersteiger, C.M. (eds.) SAT 2018. LNCS, vol. 10929, pp. 165–184. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94144-8_11
Fichte, J.K., Hecher, M., Woltran, S., Zisser, M.: A benchmark collection of #SAT instances and tree decompositions (benchmark set), June 2018. https://doi.org/10.5281/zenodo.1299752
Fichte, J.K., Hecher, M., Zisser, M.: An improved GPU-based SAT model counter. In: Schiex, T., de Givry, S. (eds.) CP 2019. LNCS, vol. 11802, pp. 491–509. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30048-7_29
Gomes, C.P., Sabharwal, A., Selman, B.: Chapter 20: Model counting. In: Handbook of Satisfiability, Frontiers in Artificial Intelligence and Applications, vol. 185, pp. 633–654. IOS Press (2009)
Kiljan, K., Pilipczuk, M.: Experimental evaluation of parameterized algorithms for feedback vertex set. In: SEA. LIPIcs, vol. 103, pp. 12:1–12:12. Schloss Dagstuhl (2018)
Kleine Büning, H., Lettman, T.: Propositional Logic: Deduction and Algorithms. Cambridge University Press, Cambridge (1999)
Kloks, T.: Treewidth: Computations and Approximations. LNCS, vol. 842. Springer, Heidelberg (1994). https://doi.org/10.1007/BFb0045375
Koriche, F., Lagniez, J.M., Marquis, P., Thomas, S.: Knowledge compilation for model counting: affine decision trees. In: IJCAI 2013. The AAAI Press (2013)
Lagniez, J., Marquis, P.: Preprocessing for propositional model counting. In: AAAI, pp. 2688–2694. AAAI Press (2014)
Lagniez, J.M., Marquis, P.: An improved decision-DNNF compiler. In: IJCAI 2017, pp. 667–673. The AAAI Press (2017)
Langer, A., Reidl, F., Rossmanith, P., Sikdar, S.: Evaluation of an MSO-solver. In: Proceedings of ALENEX. pp. 55–63. SIAM/Omnipress (2012)
Liu, J., Zhong, W., Jiao, L.: Comments on “the 1993 DIMACS graph coloring challenge” and “energy function-based approaches to graph coloring”. IEEE Trans. Neural Netw. 17(2), 533 (2006)
Muise, C., McIlraith, S.A., Beck, J.C., Hsu, E.I.: Dsharp: fast d-DNNF compilation with sharpSAT. In: Kosseim, L., Inkpen, D. (eds.) AI 2012. LNCS (LNAI), vol. 7310, pp. 356–361. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30353-1_36
Oztok, U., Darwiche, A.: A top-down compiler for sentential decision diagrams. In: IJCAI 2015, pp. 3141–3148. The AAAI Press (2015)
Roth, D.: On the hardness of approximate reasoning. Artif. Intell. 82(1–2), 273–302 (1996)
Samer, M., Szeider, S.: Algorithms for propositional model counting. J. Discrete Algorithms 8(1), 50–64 (2010)
Sang, T., Bacchus, F., Beame, P., Kautz, H., Pitassi, T.: Combining component caching and clause learning for effective model counting. In: SAT 2004 (2004)
Sang, T., Beame, P., Kautz, H.: Performing Bayesian inference by weighted model counting. In: AAAI 2005. The AAAI Press (2005)
Sharma, S., Roy, S., Soos, M., Meel, K.S.: GANAK: a scalable probabilistic exact model counter. In: IJCAI, pp. 1169–1176. ijcai.org (2019)
Thurley, M.: sharpSAT – counting models with advanced component caching and implicit BCP. In: Biere, A., Gomes, C.P. (eds.) SAT 2006. LNCS, vol. 4121, pp. 424–429. Springer, Heidelberg (2006). https://doi.org/10.1007/11814948_38
Toda, S.: PP is as hard as the polynomial-time hierarchy. SIAM J. Comput. 20(5), 865–877 (1991)
Toda, T., Soh, T.: Implementing efficient all solutions SAT solvers. ACM J. Exp. Algorithmics 21, 1–12 (2015). Special Issue SEA 2014
Ullman, J.D.: Principles of Database and Knowledge-Base Systems, vol. II. Computer Science Press (1989)
Xue, Y., Choi, A., Darwiche, A.: Basing decisions on sentences in decision diagrams. In: AAAI 2012. The AAAI Press (2012)
© 2020 Springer Nature Switzerland AG
Fichte, J.K., Hecher, M., Thier, P., Woltran, S. (2020). Exploiting Database Management Systems and Treewidth for Counting. In: Komendantskaya, E., Liu, Y. (eds) Practical Aspects of Declarative Languages. PADL 2020. Lecture Notes in Computer Science(), vol 12007. Springer, Cham. https://doi.org/10.1007/978-3-030-39197-3_10
Print ISBN: 978-3-030-39196-6
Online ISBN: 978-3-030-39197-3