Efficient Certified RAT Verification

Cruz-Filipe, Luís; Heule, Marijn J. H.; Hunt, Warren A.; Kaufmann, Matt; Schneider-Kamp, Peter

doi:10.1007/978-3-319-63046-5_14

Luís Cruz-Filipe¹⁴,
Marijn J. H. Heule¹⁵,
Warren A. Hunt Jr.¹⁵,
Matt Kaufmann¹⁵ &
…
Peter Schneider-Kamp¹⁴

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10395))

Included in the following conference series:

International Conference on Automated Deduction

1297 Accesses
45 Citations
1 Altmetric

Abstract

Clausal proofs have become a popular approach to validate the results of SAT solvers. However, validating clausal proofs in the most widely supported format (DRAT) is expensive even in highly optimized implementations. We present a new format, called LRAT, which extends the DRAT format with hints that facilitate a simple and fast validation algorithm. Checking validity of LRAT proofs can be implemented using trusted systems such as the languages supported by theorem provers. We demonstrate this by implementing two certified LRAT checkers, one in Coq and one in ACL2.

Supported by the National Science Foundation under grant CCF-1526760 and by the Danish Council for Independent Research, Natural Sciences, grant DFF-1323-00247.

Access provided by CONRICYT-eBooks. Download conference paper PDF

Efficient Certified Resolution Proof Checking

Efficient, Verified Checking of Propositional Proofs

Mechanical Verification of SAT Refutations with Extended Resolution

1 Introduction

Satisfiability (SAT) solvers are used in many applications in academia and industry, for example to check the correctness of hardware and software [5, 8, 9]. A bug in such a SAT solver could result in an invalid claim that some hardware or software model is correct. In order to deal with this trust issue, we believe a SAT solver should produce a proof of unsatisfiability [17]. In turn, this proof can and should be validated with a trusted checker. In this paper we will present a method and tools to do this efficiently.

Early work on proofs of unsatisfiability focused on resolution proofs [14, 32]. In short, a resolution proof states how every new clause can be constructed using resolution steps. Resolution proofs are easy to validate, but difficult and costly to produce from today’s SAT solvers [19]. Moreover, several state-of-the-art solvers use techniques, such as automated re-encoding [25] and symmetry breaking [11, 20], that go beyond resolution, and therefore cannot be expressed using resolution proofs.

An alternative method is to produce clausal proofs [15, 18, 29], that is, sequences of steps that each modify the current formula by specifying the deletion of an existing clause or the addition of a new clause. Such proofs are supported by all state-of-the-art SAT solvers [6]. The most widely supported clausal proof format is called DRAT [16], which is the format required by the recent SAT competitions^{Footnote 1}. The DRAT proof format was designed to make it as easy as possible to produce proofs, in order to make it easy for implementations to support it [31]. DRAT checkers increase the confidence in the correctness of unsatisfiability results, but there is still room for improvement, i.e., by checking the result using a highly-trusted system [10, 22, 28]. The only mechanically-verified checkers for DRAT [30] or RUP [14] are too slow for practical use. This holds for certified SAT solving [7, 26, 27] as well.

Our tool chain works as follows. When a SAT solver produces a clausal proof of unsatisfiability for a given formula, we validate this proof using a fast non-certified proof checker, which then produces an optimized proof with hints. Then, using a certified checker, we validate that the optimized proof is indeed a valid proof for the original formula. We do not need to trust whether the original proof is correct. In fact, the non-certified checker might even produce an optimized proof from an incorrect proof: since our non-certified checker trims the proof starting from the step that added the empty clause and chaining back through the steps that are necessary to support that step, if the proof contains incorrect steps that are not needed to support the addition of the empty clause, these will be ignored.

Validating clausal proofs is potentially expensive [31]. For each clause addition step in a proof of unsatisfiability, unit clause propagation (explained below) should result in a conflict when performed on the current formula, based on an assignment obtained by negating the clause to be added. Thus, we may need to propagate thousands of unit clauses to check the validity of a single clause addition step. Scanning over the formula thousands of times for a single check would be very expensive. This problem has been mitigated through the use of watch pointers. However, validating clausal proofs is often costly even with watch pointers.

In this paper we first present the new expressive proof format LRAT and afterwards show that this proof format enables the development of efficient certified proof checkers. This work builds upon previous work of some of the co-authors [12], as the LRAT format and the certified Coq checker presented here extend the GRIT format and the certified Coq checker presented there, respectively. Additionally, we implemented an efficient certified checker in the ACL2 theorem proving system, extending [30].

The LRAT format poses several restrictions on the syntax in order to make validation as fast as possible. Each clause in the proof must be suitably sorted. This allows a simple check that the clause does not contain duplicate or complementary literals. Hints are also sorted in such a way that they become unit from left to right. Finally, resolution candidates are sorted by increasing clause index; this allows scanning the formula once.

This paper is structured as follows. In Sect. 2 we briefly recapitulate the checking procedure for clausal proofs based on the DRAT format. The novel LRAT format is introduced in Sect. 3. Section 4 presents an algorithm for verifying LRAT proofs, and discusses its worst-case complexity. We demonstrate the benefits of LRAT by extracting two certified checkers for the format: one in Coq (Sect. 5) and one in ACL2 (Sect. 6). We evaluate the checkers and the potential of LRAT in Sect. 7. Finally, we draw some conclusions in Sect. 8.

Related Work. Independent of our work, Peter Lammich has developed a new format called GRAT and a certified checker based on Isabelle/HOL [23]. Both GRAT and LRAT build on the ideas from [12] and enrich DRAT proofs in the same way. As a consequence, there are now three different certified checkers for enriched DRAT proofs based on three major theorem provers. While equivalent from a theoretical standpoint, these checkers differ by tool chains, performance characteristics, and the extents and contents of the trusted base.

2 Background on Clausal Proof Checking

Consider a formula, or set of clauses implicitly conjoined, where each clause is a list of literals (Boolean proposition letters or their negations), implicitly disjoined. Satisfiability (SAT) solvers decide the question of whether a given formula is satisfiable, that is, true under some assignment of true and false values to the Boolean proposition letters of the formula. A formula is unsatisfiable if there is no assignment under which the formula is true.

Example 1

Consider the formula below, which we will use as a running example:

$$\begin{aligned} F\!=\! & {} (x_1 \vee x_2 \vee \lnot x_3) \wedge (\lnot x_1 \vee \lnot x_2 \vee x_3) \wedge (x_2 \vee x_3 \vee \lnot x_4) \wedge (\lnot x_2 \vee \lnot x_3 \vee x_4)\,\wedge \\&(\lnot x_1 \vee \lnot x_3 \vee \lnot x_4) \wedge ( x_1 \vee x_3 \vee x_4) \wedge (\lnot x_1 \vee x_2 \vee x_4) \wedge ( x_1 \vee \lnot x_2 \vee \lnot x_4) \end{aligned}$$

Each step in a clausal proof is either the addition or the deletion of a clause. Each clause addition step should preserve satisfiability; this should be checkable in polynomial time. The polynomial time checking procedure is described in detail below. Clause deletion steps need not be checked, because they trivially preserve satisfiability. The main reason to include clause deletion steps in proofs is to reduce the computational and memory costs to validate proofs.

A clause with only one literal is called a unit clause. Unit clauses are used to simplify CNF formulas via an algorithm called Unit Clause Propagation (UCP). UCP works as follows: for each unit clause (l), all other clauses containing l are removed from the formula, and all literal occurrences of $\bar{l}$ are removed from all clauses in the formula. Notice that this can result in new unit clauses. UCP terminates when either no literals can be removed, or when it results in a conflict, i.e., all literals in a clause have been removed.

If C is a clause, then $\overline{C}$ denotes its negation, which is a conjunction of all negated literals in C. A clause C has the property Asymmetric Tautology (AT) with respect to a CNF formula F iff UCP on $F \wedge (\overline{C})$ results in a conflict. This operational definition is also known as Reverse Unit Propagation (RUP). The core property used in the DRAT format is Resolution Asymmetric Tautology (RAT). A clause C has the RAT property with respect to a CNF formula F if either it has the AT property, or there exists a literal $l \in C$ (the pivot) such that for all clauses D in F with $\lnot l \in D$, the clause $C \vee (D \setminus \{\lnot l\})$ has the property AT with respect to F. In this case, C can be added to F while preserving satisfiability. The proof of this last property is included in our formalization.

DRAT proof checking works as follows. Let F be the input formula and P be the clausal proof. At each step i, the formula is modified. The initial state is: $F_{0} = F$. At step $i > 0$, the $i^{th}$ line of P is read. If the line has the prefix d, then the clause C described on that line is removed: $F_{i} = F_{i-1} \setminus \{C\}$. Otherwise, if there is no prefix, then C must have the RAT property with respect to formula $F_{i-1}$. This must be validated. If the RAT property can be validated, then the clause is added to the formula: $F_{i} = F_{i-1} \wedge C$. If the validation fails, then the proof is invalid.

The empty clause, typically at the end of the proof, should have the AT property, as it does not have a first literal.

3 Introducing the LRAT Format

The Linear RAT (LRAT) proof format is based on the RAT property, and it is designed to make proof checking as straightforward as possible. The purpose of LRAT proofs is to facilitate the implementation of proof validation software using highly trusted systems such as theorem provers. An LRAT proof can be produced when checking a DRAT proof with a non-certified checker (cf. the end of this section).^{Footnote 2}

The most costly operation during clausal proof validation is finding the unit clauses during unit propagation. The GRIT format [12] removes this problem by requiring proofs to include hints that list all unit clauses. This makes it much easier and faster to validate proofs, because the checker no longer needs to find the unit clauses. However, the GRIT format does not allow checking of all possible clauses that can be learned by today’s SAT solvers and are expressible in the DRAT format.

The LRAT format extends the GRIT format to remove this limitation, by adding support for checking the addition of clauses justified by the non-trivial case of the RAT property. For efficiency, the LRAT format requires that all clauses containing the negated pivot be specified. Furthermore, for each resolvent it has to be specified how to perform UCP, as is done for AT in the GRIT approach. In addition, the pivot must be the first literal in the clause being added.

While the LRAT format is semantically an extension of the GRIT format, we updated two aspects. First, the clauses from the original CNF are not included, as this required verification that these clauses do indeed occur in the original CNF. The advantage of working only with a subset of clauses from the original CNF can be achieved by starting with a deletion step for clauses not relevant for the proof. Second, the syntax of the deletion information has been extended to include a clause identifier. To be recognized, deletion statements are now identified with lines that start with an index followed by “d”. This change makes the format stable under permutations of lines. In practice, checkers expect proof statements in ascending order, which easily can be achieved by sorting the lines numerically. Stability under permutation is useful, as non-certified checkers performing backward analysis often output the steps in a different order than the one needed. This property ensures that e.g. deletions are performed at the right point of time.

To demonstrate these two changes, we first consider an example, which does not require the RAT property. Figure 1 shows an original CNF, the DRUP proof obtained by a SAT solver, the GRIT version of that proof, and, finally, the equivalent LRAT proof.

To specify the addition of a clause justified by the RAT property, we extend the format used for the AT property in GRIT. The line starts with the clause identifier of the new clause followed by the 0-terminated new clause. The first literal of the new clause is required to be the pivot literal. Next, for each clause with clause identifier i containing the negated pivot, we specify the (negative) integer $-i$ followed by a (possibly empty) list of (positive) clause identifiers used in UCP of the new clause with clause i.

For example, consider the first line of the LRAT proof in Fig. 2:

The first number, 9 expresses that the new clause will get identifier 9. The numbers in between the identifier and the first 0 are the literals in the clause. In clause of clause 9 this is only literal 1. The first 0 is followed by the hints. All hints are clause identifiers or their negations. Positive hints express that the clause becomes unit or falsified. Negative hints express that the clause is a candidate for a RAT check, i.e., it contains the complement of the pivot. In the example line, there are three such negative hints: –2, –5, and –7. The LRAT format prescribes that negative literals are listed in increasing order of their absolute value.

After a negative hint, there may be positive hints that list the identifiers of clauses that become unit and eventually falsified. For example, assigning false to the literal in the new clause (1) and to the literals in the second clause apart from the negated pivot (2 and -3) causes the sixth clause to become unit (4), which in turn falsifies the eigth clause.

There are two extensions to this kind of simple RAT checking. (1) It is possible that there are no positive hints following a negative hint. In this case, the new clause and the candidate for a RAT check have two pairs of complementary literals. (2) It is also possible that some positive hints are listed before the first negative hint. In this case, these clauses (i.e., whose identifiers are listed) become unit after assigning the literals in the new clause to false.

The full syntax of the LRAT format is given by the grammar in Fig. 3, where, for the sake of sanity, whitespace (tabs and spaces) is ignored. Note that, syntactically, AT and RAT lines are both covered by RAT lines. AT is just the special case where there is a non-empty list of only positive hints.

Producing LRAT proofs directly from SAT solvers would add significant overhead both in runtime and memory usage, and it might require the addition of complicated code. Instead, we extended the DRAT-trim proof checker [16] to emit LRAT proofs. DRAT-trim already supported the emitting of optimized proofs in the DRAT and TraceCheck+ formats. DRAT-trim emits an LRAT proof after validation of a proof using the “-L proof.lrat” option.

We implemented an uncertified checker for LRAT in C that achieves runtimes comparable to the one from [12] on proofs without RAT lines.

4 Verifying LRAT Proofs

We now discuss how to check an LRAT proof. The algorithm we present takes as input a formula in CNF and an LRAT proof, and returns:

YES, indicating that the proof has successfully been checked, and a new CNF, which is satisfiable if the input CNF was satisfiable;
or NO, indicating that the proof could not be checked.

We are thus able both to check unsatisfiability (if the formula returned in the first case contains the empty clause) and addition of clauses preserving satisfiability.

The algorithm assumes a CNF to be a finite map from a set of positive integers to clauses. We write $C_i$ for the clause with index i. The main step is checking individual RAT steps, which is done by Algorithm check_rat. We use the notation $\widetilde{i}$ to denote a list $[i_1,\ldots ,i_n]$.

Lines 4–10 perform UCP on $\varphi \wedge \overline{C_j} $ using the clauses referred to by $i^0_1,\ldots ,i^0_n$. If the empty clause is derived at some stage, then $C_j$ has the AT property w.r.t. $\varphi $. Otherwise, we store the extended clause C and let p be its first element (line 12). We then check that this clause has the RAT property: we go through all clauses in $\varphi $; lines 14 and 15 deal with the trivial cases, while lines 18–24 again perform UCP to show that $C'$ has the AT property. If the algorithm terminates and returns YES, we have successfully verified that $C_j$ satisfies the RAT property with respect to $\varphi $.

Algorithm check_lrat verifies an LRAT proof by giving each line denoting a RAT step to Algorithm check_rat.

Lemma 1

(Termination). Algorithm check_lrat always terminates.

Proof

Straightforward, as all cycles in both algorithms are for loops.

Theorem 1

(Soundness). If the result of running check_lrat on $\varphi $ and an LRAT proof is YES and $\varphi '$, then: (i) all the steps in the LRAT proof are valid, and (ii) if $\varphi $ is satisfiable, then $\varphi '$ is also satisfiable.

We skip the proof of this theorem, as this algorithm has been directly translated to ACL2 and proved sound therein (Sect. 6).

We now discuss the complexity of these algorithms. We assume efficient data structures, so that e.g. finding an element in a collection can be done in time logarithmic in the number of elements in the collection. In particular, literals in clauses are ordered, and we have constant-time access to any position in a clause. The main challenge is analysing the complexity of a single RAT check.

Lemma 2

Algorithm check_rat runs in time

$$\begin{aligned} \mathcal O\left( |\mathcal I|\cdot |\ell |\cdot \left( \log |\mathcal I|+c\cdot \log (\max (c,|\ell |))\right) \right) \,, \end{aligned}$$

where c is the number of literals in the longest clause in $\varphi $ and $|\ell |$ is the length of the input line.

Proof

Lines 4, 11, 12 and 27 can obviously be done in constant time, while line 2 can be done in time linear in $|\ell |$. Furthermore, the loop in lines 5–10 is the same as that in lines 19–24 (starting with $|C|\le |C'|$), so the worst-case asymptotic complexity of the whole algorithm is that of the loop in lines 13–26.

When reaching line 13, $|C|\le |\ell |$: each literal in C comes either from $C_j$ (which is part of $\ell $) or from one iteration of the loop in lines 5–10, whose hint is obtained from $\ell $. Similarly, $|C'|\le c+|\ell |$ throughout the whole cycle: its literals come either from $C_i\in \varphi $, from C, or from an iteration of the loop in lines 19–24, whose hint is in a different part of $\ell $ than that used to build C.

Line 14 requires looking for a literal in $C_i$, which can be done in time $\mathcal O(\log c)$. Line 15 requires looking for $|C_i|$ literals in C, which can be done in time $\mathcal O(c\cdot \log (|\ell |))$. Line 16 requires finding an index in the data structure generated from $\ell $ in line 2, which can be done in time $\mathcal O(\log |\ell |)$. Lines 17, 18 and 25 can be done in constant time.

We now analyze the loop in lines 19–24, observing that it is executed at most $|\ell |$ times. The loop begins by retrieving $C_m$ from $\varphi $, which can be done in time $\mathcal O(\log |\mathcal I|)$ if we assume CNFs to be stored e.g. in a binary tree. Line 20 then removes all elements of $C'$ from $C_m$, which can be done efficiently by going through $C_m$ and checking whether each element is in $C'$; this has a global complexity of $\mathcal O(c\cdot \log (c+|\ell |))$. (Note that, in the successful case – the one we are interested in – the result is always the empty clause or a single literal.) All the remaining lines can be done in constant time, so the total time required by the loop in lines 19–24 is $\mathcal O(|\ell |(\log |\mathcal I|+c\cdot \log (c+|\ell |)))$.

Since the loop in lines 13–26 is executed $|\mathcal I|$ times, the total time for the whole algorithm is thus

$$\begin{aligned} \mathcal O\left( |\mathcal I|\cdot \left( \log c+c\cdot \log |\ell | +\log |\ell |+|\ell |(\log |\mathcal I|+c\cdot \log (c+|\ell |))\right) \right) \,. \end{aligned}$$

Since both $\log c$ and $\log |\ell |$ are bounded by $\log (c+|\ell |)$, we can replace $\log c+c\cdot \log (c+|\ell |)+\log |\ell |$ by $(c+2)\log (c+|\ell |)$, obtaining

$$\begin{aligned} \mathcal O\left( |\mathcal I|\cdot \left( (c+2)\cdot \log (c+|\ell |)+|\ell |(\log |\mathcal I|+c\cdot \log (c+|\ell |))\right) \right) \end{aligned}$$

which we can simplify to

$$\begin{aligned} \mathcal O\left( |\mathcal I|\cdot \left( |\ell |\log |\mathcal I|+(|\ell |+2)\cdot c\cdot \log (c+|\ell |))\right) \right) \end{aligned}$$

or, equivalently,

$$\begin{aligned} \mathcal O\left( |\mathcal I|\cdot |\ell |\cdot \left( \log |\mathcal I|+c\cdot \log (c+|\ell |))\right) \right) \end{aligned}$$

since $|\ell |$ and $|\ell |+2$ are asymptotically equivalent. Observing that $\log (c+|\ell |)\le \log (2(\max (c,|\ell |))) = \log (2) + \log (\max (c,|\ell |))$ yields the bound in the lemma.

Theorem 2

(Complexity). The complexity of checking an LRAT proof is

$$\begin{aligned} \mathcal O\left( n\cdot (|\mathcal I|+n)\cdot l\left( \log (|\mathcal I|+n)+k\cdot \log k\right) \right) \end{aligned}$$

where n is the number of lines in the DRAT proof, l is the length of the longest line in the proof, $\mathcal I$ and c are as before, and $k=\max (c,l)$.

Proof

The bound follows from observing that the loop in Algorithm check_lrat is executed n times (in case of success); in the worst case, all steps are RAT steps, adding one clause to $\varphi $ (hence the increase in $|\mathcal I|$ to $|\mathcal I|+n$) and potentially making the size of the longest clause in $\varphi $ increase to l (hence raising the multiplicative factor from c to k in the rightmost logarithmic term).

We make some observations. If we allow only the lengths of the proof n to grow while keeping all other parameters fixed, the asymptotic complexity of check_lrat is $\mathcal O(n^2\log n)$. Similarly, if we compare proofs of the same length but consider variations of the length of the clauses in the original CNF, the asymptotic complexity is $O(c\log c)$. In practice, we observe that algorithm check_rat typically terminates in line 7; in these cases, the bound in Lemma 2 can be improved to $\mathcal O(|\ell |\cdot \left( \log |\mathcal {I}|+c\log c\right) )$.

5 Checking LRAT Proofs in Coq

Our development of a verifier of LRAT proofs in Coq does not follow Algorithm check_lrat directly. This is due to the fact that we had previously developed a certified checker for GRIT proofs [12], by extracting an OCaml program from a Coq formalization, and we opted for extending this construction. In particular, the addition of clauses justified by AT (where check_rat returns YES in line 7) is verified using the original checker.

The complexity of checking the RAT property in our development is better than the theoretical upper bound, because we preprocess the LRAT proof and add additional information to bypass line 15 when it fails. (This preprocessing amounts to checking the proof with an untrusted verifier, so the overall complexity including this line is still that of Theorem 2.) The rationale for this preprocessing is that there is a big overhead in using extracted data structures (see [24]), which means that, even if the overall complexity of the extracted checker is optimal, there are large constants that slow down the checker’s performance in practice. We work with a pure extracted program, where all data structures are extracted from their Coq formalizations.^{Footnote 3} This means, in particular, that we do not have lists with direct access. Thus, clauses are represented as binary search trees, which allows most of the operations to have optimal complexity; the exception is the addition in lines 9, 18 and 23, which takes time logarithmic in the size of the original clause, but which is dominated by other steps in the corresponding cycles.

Our experiments show that, with the optimizations enabled by preprocessing, this checker is fast enough to be used in the largest instances available.

The development of the checker in [12] is modular, with different functions that verify each type of line in a GRIT proof. We thus extended this set of functions with a function RAT_check that verifies RAT lines. This function implements a modified variant of Algorithm check_rat: the enriched proof indicates whether we should execute line 15 (and if so, it tells us which literal to look for). Its soundness theorem states that, if the check succeeds, then the clause given can be added to the CNF preserving satisfiability. The term c is the given CNF, while the clause $C_j$ is pivot::cl (so the pivot is already singled out), and L contains the remaining information in the line justifying the RAT step.

(For readibility, we omit type injections from the Coq listings.)

We then enrich the overall loop to include the case where the proof includes RAT lines, and reprove the correctness of the main function refute from [12], whose task it is to prove unsatisfiability of a given formula. Its arguments are only the CNF c (given as a list of pairs index/clause) and the preprocessed LRAT proof (whose type is formalized as Oracle).

By extracting refute, we again obtain a correct-by-construction checker for proofs of unsatisfiability using the full LRAT format. If this checker returns when given a particular CNF and proof, this guarantees that the CNF is indeed unsatisfiable. The universal quantification over the oracle ensures that any errors in its implementation (and in particular in the interface connecting it to the checker) do not affect the correctness of this answer.

Satisfiability-preserving addition of clauses. Algorithm check_lrat is formulated not in terms of unsatisfiability, but of preservation of satisfiability – with unsatisfiability being a particular case where the empty clause is added. In order to provide this functionality, we tweaked our checker to return a pair consisting of a boolean value and a CNF. In the base case (when the input proof is empty), the checker now returns (instead of ) together with the CNF currently stored. If the empty clause is derived at some point, the checker still returns as before, but now together with a CNF containing only the empty clause. If any step fails, we return and also provide the formula currently stored (which results from applying the longest initial segment of the LRAT proof that is verifiable); otherwise we proceed as before.

With these changes, we can still verify unsatisfiability as before, but we can also provide a target CNF and check that the oracle provides a correct reduction from the initial CNF to the target. Function enrich offers this new functionality.

(The predicate ICNF_reduces states that any valuation satisfying c can be used to construct a valuation satisfying c’.)

Results. After adapting the interface to be able to transform proofs in the full LRAT format into the oracle syntax defined above, we tested the extracted checker on 225 unsatisfiability proofs output by SAT solvers supporting RAT proofs. See Sect. 7 for further details.

We also used the possibility of adding new clauses to check the transformation proof from [21], the only SAT-related step in the original proof of the Boolean Pythagorean Triples problem that we were unable to verify in [12]. The certified LRAT checker in Coq was able to verify this proof in 8 minutes and 25 s, including approx. 15 s for checking that the formula generated by the proof coincides with the formula produced by the original SAT solver.

6 LRAT Checker in ACL2

In this section, in order to demonstrate the general applicability of our approach, we extended the original ACL2-based DRAT checker [30] to permit the checking of UNSAT proofs in the LRAT format. We have certified this extension using the ACL2 theorem-proving system.

We outline our formalization below using the Lisp-style ACL2 syntax, with comments to assist readers unfamiliar with Lisp syntax. Note that embedded comments begin with a “;” character and continue to the end of a line.

We omit the code here but note that it has been optimized for efficiency. In particular, applicative hash tables represent formulas, and are heuristically cleaned on occasion after deletion; and mutable objects [2] are used for assignments. These techniques reduce the complexity substantially. Of course, correctness of such optimizations was necessarily proved as part of the overall correctness proof. The code and top-level theorem are available from the top-level file top.lisp in the full proof development [4], included in the GitHub repository [3] that holds ACL2 and its libraries. Also see the README file in that directory. Here we focus primarily on the statement of correctness.

The top-level correctness theorem is as follows.

The command defthm is an ACL2 system command that demands that the ACL2 theorem-proving system establish the validity of the claim that follows the name (in this case main-theorem) of the theorem to be checked.

The theorem above is expressed in terms of the three functions formula-p, refutation-p, and satisfiable. The first of these recognizes structures that represent sets of clauses; our particular representation uses applicative hash tables [1]. The function refutation-p recognizes valid proofs that yield a contradiction; thus, it calls other functions, including one that performs the necessary RAT checks. We verify an alleged proof by checking that each of its steps preserves satisfiability.

Finally, we define satisfiable to mean that there exists an assignment satisfying a given formula. The first definition says that the given assignment satisfies the given formula, while the second uses an existential quantifier to say that some assignment satisfies the given formula.

Before our SAT proof checker can be called, an LRAT-style proof is read from a file, and during the reading process it is converted into an internal Lisp format that is used by our checker. Using the ACL2 theorem prover, we have verified the theorem main-theorem above, which states that our code correctly checks the validity of a proof of the empty clause.

Results. The ACL2 checker is able to check the validity of adding each of the 68, 667 clauses in the transformation proof from [21] in less than 9 s. The certified checking of this LRAT proof is almost as fast as non-certified checking and conversion of the DRAT proof into the LRAT proof by DRAT-trim. This is a testament to the efficiency potential of the LRAT format in particular, and the approach taken in our work in general. At the moment of writing, the correspondence between the formula generated by the original SAT solver and by executing the proof has not been ported yet to the ACL2 checker, but this can easily be added in a similar way as we did for the Coq checker.

7 Experimental Evaluation

In order to evaluate the potential of the LRAT format, we performed extensive experiments on benchmarks from the 2016 SAT competition and the 2015 SAT race. The set of instances we considered consists of the 241 instances from the main and parallel tracks that could be shown to be UNSAT within 5, 000 s using the 2016 competition version of CryptoMiniSat v5 [13]. (CryptoMiniSat was the only solver from this competition where we were able to obtain a non-trivial number of RAT lines in most proofs.) All experiments were performed on identical nodes equipped with dual Intel Xeon E5-2680v3 running at 2.50 GHz with 64 GByte RAM on CentOS with a 3.10.0 Linux kernel.

For each of these instances, the original CNF and proof were first trimmed and optimized and then output in LRAT using drat-trim in backward checking mode. A total of 225 out the 241 instances could be successfully processed by drat-trim within 20, 000 s. Out of the remaining 16 instances, 12 timed out, 3 resulted in a segmentation fault and 1 proof could not be verified. In total there were 381, 468, 814 lines in the 225 proofs totalling 250 GByte, out of which 3, 260, 037 were non-trivial RAT lines.

The Coq checker verified 161 out of these 225 instances within a maximum runtime of 24 h. For the remaining 64 instances, it timed out (59), ran out of memory (1), or determined that the proof was invalid (4). The 161 verified proofs amount to a total of 88 GByte and were processed in just under 3 weeks of CPU time, or in other words at an average speed of 3 MByte per minute.

The ACL2 checker verified 212 out of the 225 instances within a maximum runtime of 6, 708 s, typically being at least an order of magnitude faster than the Coq checker. For the remaining 13 instances, it ran out of memory (1), terminated unexpectedly^{Footnote 4} (1), or determined that the proofs were invalid (11). The 212 verified proofs amount to a total of 205 GByte and were processed in just under 17 h of CPU time, or in other words at an average speed of 207 MByte per minute.

The alleged LRAT proofs for the 11 instances where verification using the ACL2 checker failed range in size from 50 MByte to 6.4 GByte. The Coq checker either agrees with the result (7) or times out (4). We then inspected the smallest alleged proofs by hand and found that they indeed are not valid LRAT proofs.

Given the size of the proofs involved, determining the reason for a failed verification is definitely a challenge. When the non-certified checker claims that it successfully verified the proof, but outputs an LRAT proof that cannot be verified by the certified checker, it seems reasonable to assume that at least the non-certified checker has bugs. This is because the non-certified checker does not only transform the proof from one format into another, but also checks the individual steps.

To summarize the experiments, both certified checkers have been found to be able to verify LRAT proofs of up to several GByte within reasonable computational resources. The input data, the executables, and instructions how to rerun the experiments are available from: http://imada.sdu.dk/~petersk/lrat/

8 Conclusions

We have introduced a novel format for clausal proof checking, Linear RAT (LRAT), which extends the GRIT format [12] to support checking all techniques used in state-of-the-art SAT solvers. We have shown that it allows for implementing efficient certified proof checkers for UNSAT proofs with the RAT property, both using Coq and using ACL2. The ACL2 LRAT checker is almost as fast as — and in some cases even faster than — non-certified checking by DRAT-trim of the corresponding DRAT proof. This suggests that certified checking can be achieved with a reasonable overhead.

Furthermore, we have shown that our Coq checker’s ability to check transformation proofs has allowed us to check the transformation proof from [21], the only SAT-related step in the original proof of the Boolean Pythagorean Triples problem that we were unable to verify in [12].

Notes

1.
see http://satcompetition.org.
2.
DRAT proofs and LRAT proofs are syntactic objects that do not necessarily represent valid proofs. However, they are produced by tools that should only generate objects that correspond to semantically valid proofs, so we adopt this terminology. By “validating a DRAT/LRAT proof”, we mean verifying by independent means that such an object indeed represents a valid proof.
3.
With the exception of integers, which are only used as labels and therefore can be extracted to a native type without compromising soundness of the extracted code.
4.
Termination seems to occur due to an error condition of the underlying LISP runtime system (CCL) used, and could not be reproduced using another system (SBCL).

References

ACL2 Community. ACL2 documentation topic: FAST-ALISTS. http://www.cs.utexas.edu/users/moore/acl2/current/manual/index.html?topic=ACL2____FAST-ALISTS
ACL2 Community. ACL2 documentation topic: STOBJ. http://www.cs.utexas.edu/users/moore/acl2/v7-2/manual/?topic=ACL2____STOBJ
ACL2 Community. ACL2 system and libraries on GitHub. https://github.com/acl2/acl2/
ACL2 LRAT checker. https://github.com/acl2/acl2/tree/master/books/projects/sat/lrat/
Ivančić, F., Yang, Z., Ganai, M.K., Gupta, A., Ashar, P.: Efficient SAT-based bounded model checking for software verification. Theoretical Computer Science 404(3), 256–274 (2008)
Article MathSciNet MATH Google Scholar
Balyo, T., Heule, M.J.H., Järvisalo, M.: Sat competition 2016: Recent developments. In: AAAI 2017 (2017)
Google Scholar
Blanchette, J.C., Fleury, M., Weidenbach, C.: A verified SAT solver framework with learn, forget, restart, and incrementality. In: Olivetti, N., Tiwari, A. (eds.) IJCAR 2016. LNCS (LNAI), vol. 9706, pp. 25–44. Springer, Cham (2016). doi:10.1007/978-3-319-40229-1_4
Google Scholar
Clarke, E.M., Biere, A., Raimi, R., Zhu, Y.: Bounded model checking using satisfiability solving. Formal Methods Syst. Des. 19(1), 7–34 (2001)
Article MATH Google Scholar
Copty, F., Fix, L., Fraer, R., Giunchiglia, E., Kamhi, G., Tacchella, A., Vardi, M.Y.: Benefits of bounded model checking at an industrial setting. In: Berry, G., Comon, H., Finkel, A. (eds.) CAV 2001. LNCS, vol. 2102, pp. 436–453. Springer, Heidelberg (2001). doi:10.1007/3-540-44585-4_43
Chapter Google Scholar
The Coq proof assistant. https://coq.inria.fr/
Crawford, J., Ginsberg, M., Luks, E., Roy, A.: Symmetry-breaking predicates for search problems. In: $KR\tilde{O}$ 1996, pp. 148–159. Morgan Kaufmann (1996)
Google Scholar
Cruz-Filipe, L., Marques-Silva, J., Schneider-Kamp, P.: Efficient certified resolution proof checking. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10205, pp. 118–135. Springer, Heidelberg (2017). doi:10.1007/978-3-662-54577-5_7
Chapter Google Scholar
Cryptominisat v5. http://baldur.iti.kit.edu/sat-competition-2016/solvers/main/cmsat5_main2.zip
Darbari, A., Fischer, B., Marques-Silva, J.: Industrial-strength certified SAT solving through verified SAT proof checking. In: Cavalcanti, A., Deharbe, D., Gaudel, M.-C., Woodcock, J. (eds.) ICTAC 2010. LNCS, vol. 6255, pp. 260–274. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14808-8_18
Chapter Google Scholar
Goldberg, E.I., Novikov, Y.: Verification of proofs of unsatisfiability for CNF formulas. In: DATE, pp. 10886–10891 (2003)
Google Scholar
Heule, M.J.H.: The DRAT format and DRAT-trim checker. CoRR, abs/1610.06229 (2016). Source code, https://github.com/marijnheule/drat-trim
Heule, M.J.H., Biere, A.: Proofs for satisfiability problems. In: All about Proofs, Proofs for All (APPA), July 2014. http://www.easychair.org/smart-program/VSL2014/APPA-index.html
Heule, M.J.H., Hunt Jr., W.A., Wetzler, N.D.: Trimming while checking clausal proofs. In: FMCAD, pp. 181–188 (2013)
Google Scholar
Heule, M.J.H., Hunt Jr., W.A., Wetzler, N.D.: Bridging the gap between easy generation and efficient verification of unsatisfiability proofs. Softw. Test., Verif. Reliab. 24(8), 593–607 (2014)
Google Scholar
Heule, M.J.H., Hunt Jr., W.A., Wetzler, N.D.: Expressing symmetry breaking in DRAT proofs. In: Felty, A.P., Middeldorp, A. (eds.) CADE 2015. LNCS (LNAI), vol. 9195, pp. 591–606. Springer, Cham (2015). doi:10.1007/978-3-319-21401-6_40
Chapter Google Scholar
Heule, M.J.H., Kullmann, O., Marek, V.W.: Solving and verifying the boolean pythagorean triples problem via cube-and-conquer. In: Creignou, N., Le Berre, D. (eds.) SAT 2016. LNCS, vol. 9710, pp. 228–245. Springer, Cham (2016). doi:10.1007/978-3-319-40970-2_15
Google Scholar
Kaufmann, M., Moore, J S.: An industrial strength theorem prover for a logic based on common LISP. IEEE Trans. Softw. Eng. 23(4), 203–213 (1997)
Google Scholar
Lammich, P.: Efficient verified (UN)SAT certificate checking. In: CADE-26. LNCS. Springer (to appear, 2017)
Google Scholar
Letouzey, P.: Extraction in Coq: an overview. In: Beckmann, A., Dimitracopoulos, C., Löwe, B. (eds.) CiE 2008. LNCS, vol. 5028, pp. 359–369. Springer, Heidelberg (2008). doi:10.1007/978-3-540-69407-6_39
Chapter Google Scholar
Manthey, N., Heule, M.J.H., Biere, A.: Automated reencoding of boolean formulas. In: Biere, A., Nahir, A., Vos, T. (eds.) HVC 2012. LNCS, vol. 7857, pp. 102–117. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39611-3_14
Chapter Google Scholar
Maric, F.: Formal verification of a modern SAT solver by shallow embedding into Isabelle/HOL. Theor. Comput. Sci. 411(50), 4333–4356 (2010)
Article MathSciNet MATH Google Scholar
Maric, F., Janicic, P.: Formalization of abstract state transition systems for SAT. Logical Methods in Comput. Sci. 7(3) (2011)
Google Scholar
Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL - A Proof Assistant for Higher-Order Logic. Springer, Heidelberg (2002)
MATH Google Scholar
Van Gelder, A.: Producing and verifying extremely large propositional refutations - have your cake and eat it too. Ann. Math. Artif. Intell. 65(4), 329–372 (2012)
Article MathSciNet MATH Google Scholar
Wetzler, N.D., Heule, M.J.H., Hunt Jr., W.A.: Mechanical verification of SAT refutations with extended resolution. In: Blazy, S., Paulin-Mohring, C., Pichardie, D. (eds.) ITP 2013. LNCS, vol. 7998, pp. 229–244. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39634-2_18
Chapter Google Scholar
Wetzler, N.D., Heule, M.J.H., Hunt Jr., W.A.: DRAT-trim: efficient checking and trimming using expressive clausal proofs. In: Sinz, C., Egly, U. (eds.) SAT 2014. LNCS, vol. 8561, pp. 422–429. Springer, Cham (2014). doi:10.1007/978-3-319-09284-3_31
Google Scholar
Zhang, L., Malik, S.: Validating SAT solvers using an independent resolution-based checker: Practical implementations and other applications. In: DATE, pp. 10880–10885 (2003)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
Luís Cruz-Filipe & Peter Schneider-Kamp
Department of Computer Science, The University of Texas at Austin, Austin, USA
Marijn J. H. Heule, Warren A. Hunt Jr. & Matt Kaufmann

Authors

Luís Cruz-Filipe
View author publications
You can also search for this author in PubMed Google Scholar
Marijn J. H. Heule
View author publications
You can also search for this author in PubMed Google Scholar
Warren A. Hunt Jr.
View author publications
You can also search for this author in PubMed Google Scholar
Matt Kaufmann
View author publications
You can also search for this author in PubMed Google Scholar
Peter Schneider-Kamp
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Luís Cruz-Filipe .

Editor information

Editors and Affiliations

Microsoft Research, Redmond, Washington, USA
Leonardo de Moura

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Cruz-Filipe, L., Heule, M.J.H., Hunt, W.A., Kaufmann, M., Schneider-Kamp, P. (2017). Efficient Certified RAT Verification. In: de Moura, L. (eds) Automated Deduction – CADE 26. CADE 2017. Lecture Notes in Computer Science(), vol 10395. Springer, Cham. https://doi.org/10.1007/978-3-319-63046-5_14

Download citation

DOI: https://doi.org/10.1007/978-3-319-63046-5_14
Published: 11 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63045-8
Online ISBN: 978-3-319-63046-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Efficient Certified RAT Verification

Abstract

Similar content being viewed by others

Efficient Certified Resolution Proof Checking

Efficient, Verified Checking of Propositional Proofs

Mechanical Verification of SAT Refutations with Extended Resolution

1 Introduction