1 Introduction

In a fruitful, recent trend, many that aspire to innovate in verification seek advice from research in machine learning [e.g. 36, 37, 11, 38, 18, 21, 12, 22, 13, 24]. The topic of this paper is the application of the monotone theory, developed by Bshouty in exact concept learning, to improve theoretical complexity results for inductive invariant inference.

One of the modi operandi for automatically proving that a system is safe—that it cannot reach a state it should not—is an inductive invariant, which is an assertion that (i) holds for the initial states, (ii) does not hold in any bad state, and (iii) is closed under transitions of the system. These properties are reminiscent of a data classifier, separating good from bad points, prompting the adaptation of algorithms from classical classification to invariant inference [e.g. 36, 37, 11, 38, 18, 35, 13, 19]. In this paper we focus on inductive invariants for propositional transition systems, which are customary in hardware verification and also applicable to software systems through predicate abstraction [17, 20].

The monotone theory by Bshouty [5] is a celebrated achievement in learning theory (most notably in exact learning with queries [1]) that is the foundation for learning Boolean formulas with complex syntactic structures. At its core, the monotone theory studies the monotonization \(\mathcal {M}_{b}({\varphi })\) of a formula \(\varphi \) w.r.t. a valuation \(b\), which is the smallest \(b\)-monotone formula that overapproximates \(\varphi \). (This concept is explained in Sect. 3.) In Bshouty’s work, several monotonizations are used to efficiently reconstruct \(\varphi \).

Recently, the monotone theory has been applied to theoretical studies of invariant inference in the context of two prominent SAT-based inference approaches. In this paper, we solve open problems in each, using a new efficient algorithm to compute monotonizations.

Efficient Interpolation-Based Inference. The study [13] of interpolation-based invariant inference—a hugely influential approach pioneered by McMillan [26]—identified the fence condition as a property of systems and invariants under which the success of a model-based inference algorithm [3, 6] is guaranteed. (We explain the fence condition in Sect. 5.1.) Under this condition, the number of SAT calls (specifically, bounded model checking queries) of the original model-based interpolation algorithm was shown to be polynomial in the DNF size (the number of terms in the smallest DNF representation) of the invariant, but only when the invariant is monotone (containing no negated variables) [13]. Based on the monotone theory, the authors of [13] further introduced an algorithm that, under the same fence condition, could efficiently infer invariants that were almost monotone (containing \(O(1)\) terms with negated variables). However, their techniques could not extend to mimic the pinnacle result of Bshouty’s paper: the CDNF algorithm [5], which can learn formulas in a number of queries that is polynomial in their DNF size, their CNF size (the number of clauses in their smallest CNF representation), and the number of variables. It was unclear whether an analogous result is possible in invariant inference without strengthening the fence condition, e.g. to assume that the fence condition holds both forwards and backwards (see Sect. 7).

We solve this question, and introduce an algorithm that can infer an invariant in a number of SAT queries (specifically, bounded model checking queries) that is polynomial in the invariant’s DNF size, CNF size, and the number of variables, under the assumption that the invariant satisfies the fence condition, without further restrictions (Theorem 5). In particular, this implies that invariants that are representable by a small decision tree can be inferred efficiently. The basic idea is to learn an invariant I as a conjunction of monotonizations \(\mathcal {M}_{\sigma }({I})\) where the states \(\sigma \) are chosen as counterexamples to induction. The challenge is that I is unknown, and the relatively weak assumption that the fence condition places on the transition system does not allow the use of several operations (e.g. membership queries) that learning algorithms rely on to efficiently generate such a representation.

Efficient Abstract Interpretation. The study of IC3/PDR [4, 9] revealed that part of the overapproximation this sophisticated algorithm performs is captured by an abstract interpretation procedure, in an abstract domain founded on the monotone theory [14]. In this procedure, dubbed \(\varLambda \)-PDR, each iteration involves several monotonizations of the set of states reachable in one step from the value of the previous iteration. Upper bounds on the number of iterations in \(\varLambda \)-PDR were investigated to shed light on the number of frames of PDR [14]. However, it was unclear whether the abstract domain itself can be implemented in an efficient manner, and whether efficient complexity bounds on the number of SAT queries (and not just the number of iterations) in \(\varLambda \)-PDR can be obtained.

We solve this question, and show that \(\varLambda \)-PDR can be implemented to yield an overall upper bound on the number of SAT calls which is polynomial in the same quantity that was previously used to bound the number of iterations in \(\varLambda \)-PDR (Theorem 7). This is surprising because, until now, there was no way to compute monotonizations of the post-image of the previous iteration that did not suffer from the fact that the exact post-image of a set of states may be much more complex to represent than its abstraction.

Super-efficient Monotonization. Bshouty [5] provided an algorithm to compute the monotonization \(\mathcal {M}_{b}({\varphi })\), but the complexity of this algorithm depends on the DNF size of the original formula \(\varphi \). Our aforementioned results build on a new algorithm for the same task, whose complexity depends on the DNF size of the monotonization \(\mathcal {M}_{b}({\varphi })\) (Theorem 2), which may be much smaller (and is never larger). This enables our efficient interpolation-based inference algorithm and our efficient implementation of abstract interpretation, although each result requires additional technical sophistication: For our efficient model-based interpolation result, the key idea is that the monotonization of an invariant satisfying the fence condition can be computed through the monotonization of the set of states reachable in at most s steps, and our new monotonization algorithm makes it possible to do this efficiently even when the latter set is complex to represent exactly. For our efficient abstract interpretation result, the key idea is that the DNF size of an abstract iterate is bounded by a quantity related to monotonizations of the transition relation, and our new monotonization algorithm computes it efficiently w.r.t. the same quantity even though the DNF size of the exact post-image of the previous iterate may be larger.

Overall, we make the following contributions:

  • We introduce a new efficient algorithm to compute monotonizations, whose complexity in terms of the number of SAT queries is proportional to the DNF size of its output (Sect. 4).

  • We prove that an invariant that satisfies the fence condition can be inferred in a number of SAT (bounded model checking) queries that is polynomial in its CNF size, its DNF size, and the number of variables; in particular, invariants represented by small decision trees can be efficiently inferred (Sect. 5).

  • We prove an efficient complexity upper bound for the number of SAT queries performed by abstract interpretation in a domain based on the monotone theory (Sect. 6).

Section 2 sets preliminary notation and Sect. 3 provides background on the monotone theory. Section 7 discusses related work and Sect. 8 concludes.

2 Preliminaries

We work with propositional transition systems defined over a vocabulary \(\varSigma = \{{p_1,\ldots ,p_n}\}\) of n Boolean variables. We identify a formula with the set of its valuations, and at times also identify a set of valuations with a formula that represents it, chosen arbitrarily (one always exists in propositional logic). \(\varphi \Longrightarrow \psi \) denotes the validity of the formula \(\varphi \rightarrow \psi \).

States, Transition Systems, Inductive Invariants. A state is a valuation to \(\varSigma \). If x is a state, x[p] is the value (\({\textit{true}}/{\textit{false}}\) or 1/0) that x assigns to the variable \(p \in \varSigma \). A transition system is a triple \(({\textit{Init}},\delta ,\textit{Bad})\) where \({\textit{Init}},\textit{Bad}\) are formulas over \(\varSigma \) denoting the sets of initial and bad states respectively, and the transition relation \(\delta \) is a formula over \(\varSigma \uplus \varSigma '\), where \(\varSigma ' = \{ p' \mid p\in \varSigma \}\) is a copy of the vocabulary used to describe the post-state of a transition. If \(\tilde{\varSigma },\tilde{\varSigma }'\) are distinct copies of \(\varSigma \), \(\delta [\tilde{\varSigma },\tilde{\varSigma }']\) denotes the substitution in \(\delta \) of each \(p \in \varSigma \) by its counterpart in \(\tilde{\varSigma }\), and likewise for \(\varSigma ',\tilde{\varSigma }'\). Given a set of states S, the post-image of S is \(\delta (S) = \{{\sigma ' \mid \exists \sigma \in S. \ (\sigma ,\sigma ') \,\,\models \,\, \delta }\}\). A transition system is safe if all the states that are reachable from \({\textit{Init}}\) via any number of steps of \(\delta \) satisfy \(\lnot \textit{Bad}\). An inductive invariant is a formula I over \(\varSigma \) such that (i) \({\textit{Init}}\Longrightarrow I\), (ii) \(I \wedge \delta \Longrightarrow I'\), and (iii) \(I \Longrightarrow \lnot \textit{Bad}\), where \(I'\) denotes the result of substituting each \(p\in \varSigma \) by \(p' \in \varSigma '\) in I.

In the context of propositional logic, a transition system is safe iff it has an inductive invariant.

Use of SAT in Invariant Inference. Given a candidate, the requirements for being an inductive invariant can be verified using SAT; we refer to the SAT query that checks requirement (ii) as the inductiveness check. When an inductiveness check fails, a SAT solver returns a counterexample to induction, which is a transition \((\sigma ,\sigma ')\) with \(\sigma \,\,\models \,\, I\) but \(\sigma ' \not \models I\) (for the candidate I). Another important check in invariant inference algorithms that can be implemented using SAT is bounded model checking (BMC) [2], which asks whether a set of states described by a formula \(\psi \) is forwards unreachable in a bounded number \(s \in \mathbb {N}\) of steps; we write this as the check \({\underline{\delta }}^{{s}}({{\textit{Init}}}) \cap \psi \overset{?}{=}\ \emptyset \). Using SAT it is also possible to obtain a counterexample \(\sigma \in {\underline{\delta }}^{{s}}({{\textit{Init}}}) \cap \psi \) if it exists.
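To make the role of these checks concrete, the following is a minimal brute-force sketch, with explicit state enumeration standing in for a SAT solver; the encoding and the helper names (`find_cti`, `bmc_intersects`) are ours and not from any particular tool.

```python
from itertools import product
from typing import Callable, Optional, Set, Tuple

State = Tuple[int, ...]
Pred = Callable[[State], bool]          # a formula over Sigma, given semantically
Delta = Callable[[State, State], bool]  # a transition relation over Sigma u Sigma'

def states(n: int):
    """All 2^n states; a stand-in for what a SAT solver explores symbolically."""
    return product((0, 1), repeat=n)

def find_cti(n: int, delta: Delta, cand: Pred) -> Optional[Tuple[State, State]]:
    """Inductiveness check (requirement (ii)): a counterexample to induction
    (sigma, sigma') with sigma |= cand, (sigma, sigma') |= delta, sigma' |/= cand."""
    return next(((s, t) for s in states(n) if cand(s)
                 for t in states(n) if delta(s, t) and not cand(t)), None)

def reach_upto(n: int, init: Pred, delta: Delta, s: int) -> Set[State]:
    """The states reachable from Init in at most s steps (the set behind an s-BMC query)."""
    cur = {x for x in states(n) if init(x)}
    for _ in range(s):
        cur |= {y for x in cur for y in states(n) if delta(x, y)}
    return cur

def bmc_intersects(n: int, init: Pred, delta: Delta, s: int, psi: Pred) -> Optional[State]:
    """s-BMC check: a state reachable in at most s steps that satisfies psi, or None."""
    return next((x for x in reach_upto(n, init, delta, s) if psi(x)), None)
```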

We measure the complexity of a SAT-based inference algorithm by the number of SAT calls it performs (including inductiveness checks, BMC, and other SAT calls), and the number of other steps, when each SAT call is considered one step (an oracle call).

Literals, Cubes, Clauses, CNF, DNF. A literal \(\ell \) is a variable p or its negation \(\lnot p\). A clause c is a disjunction of literals. The empty clause is \({\textit{false}}\). A formula is in conjunctive normal form (CNF) if it is a conjunction of clauses. A cube or term d is a conjunction of a consistent set of literals; at times, we also refer directly to the set and write \(\ell \in d\). The empty cube is \({\textit{true}}\). A formula is in disjunctive normal form (DNF) if it is a disjunction of terms. The domain, \(dom({d})\), of a cube d is the set of variables that appear in it (positively or negatively). Given a state \(\sigma \), we use the state and the (full) cube that consists of all the literals that are satisfied in \(\sigma \) interchangeably. \({\left| {\varphi }\right| }_\textrm{dnf}\) is the minimal number of terms in any DNF representation of \(\varphi \). \({\left| {\varphi }\right| }_\textrm{cnf}\) is the minimal number of clauses in any CNF representation of \(\varphi \).

3 Background: The Monotone Theory

This section provides necessary definitions and results from the monotone theory by Bshouty [5] as used in this paper. Our presentation is based on [13, 14] (lemmas that are stated here slightly differently are proved in the extended version [15]).

Boolean functions which are monotone are special in many ways; one is that they are easier to learn [e.g. 1, 41]. Syntactically, a monotone function can be written in DNF so that all variables appear positively. This is easily generalized to b-monotone formulas, where each variable appears only at one polarity specified by b (Definition 2). The monotone theory aims to handle functions that are not monotone through the conjunction of b-monotone formulas. Section 3.1 considers the (over)approximation of a formula by a b-monotone formula, the “monotonization” of a formula; Sect. 3.2 studies the conjunction of several such monotonizations through the monotone hull operator.

3.1 Least b-Monotone Overapproximations

Definition 1

(b-Monotone Order). Let b be a cube. We define a partial order over states where \(v \le _b x\) when x, v agree on all variables not present in b, and x disagrees with b on all variables on which also v disagrees with b: \(\forall p \in \varSigma . \ x[p] \ne v[p] \text{ implies } p \in dom({b}) \wedge v[p]=b[p]\).

Intuitively, \(v \le _b x\) when x can be obtained from v by flipping bits to the opposite of their value in b.

Definition 2

(b-Monotonicity). A formula \(\psi \) is b-monotone for a cube b if \( \forall v \le _b x. \ v \,\,\models \,\, \psi \text{ implies } x \,\,\models \,\, \psi . \)

That is, if v satisfies \(\psi \), so do all the states that are farther away from b than v. For example, if \(\psi \) is 000-monotone and \(100 \,\,\models \,\, \psi \), then because \(100 \le _{000} 111\) (starting in 100 and moving away from 000 can reach 111), also \(111 \,\,\models \,\, \psi \). In contrast, \(100 \not \le _{000} 011\) (the same process cannot flip the 1 bit that already disagrees with 000), so 011 does not necessarily belong to \(\psi \). (000-monotonicity corresponds to the usual notion of monotone formulas.)
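As a small illustration (our own encoding, with a cube represented as a partial assignment from variable index to bit), the order of Definition 1 can be checked directly; the assertions reproduce the 000-monotone example above, and `leq_b` is our name.

```python
from typing import Dict, Tuple

State = Tuple[int, ...]
Cube = Dict[int, int]  # partial assignment: variable index -> bit

def leq_b(v: State, x: State, b: Cube) -> bool:
    """v <=_b x (Definition 1): x may disagree with v only on variables of b
    on which v agrees with b, i.e., x is v with some bits flipped away from b."""
    return all(x[p] == v[p] or (p in b and v[p] == b[p]) for p in range(len(v)))

b000 = {0: 0, 1: 0, 2: 0}
assert leq_b((1, 0, 0), (1, 1, 1), b000)      # 100 <=_000 111
assert not leq_b((1, 0, 0), (0, 1, 1), b000)  # 100 is not <=_000 011
```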

Definition 3

(Least b-Monotone Overapproximation). For a formula \(\varphi \) and a cube b, the least b-monotone overapproximation of \(\varphi \) is a formula \(\mathcal {M}_{b}({\varphi })\) defined by

$$\begin{aligned} x \,\,\models \,\, \mathcal {M}_{b}({\varphi }) \text{ iff } \exists v. \ v \le _b x \wedge v \,\,\models \,\, \varphi . \end{aligned}$$

For example, if \(100 \,\,\models \,\, \varphi \), then \(100 \,\,\models \,\, \mathcal {M}_{000}({\varphi })\) because \(\mathcal {M}_{000}({\varphi })\) is an overapproximation, and hence \(111 \,\,\models \,\, \mathcal {M}_{000}({\varphi })\) because it is 000-monotone, as above. Here, thanks to minimality, 011 does not belong to \(\mathcal {M}_{000}({\varphi })\), unless 000, 001, 010, or 011 belong to \(\varphi \).
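For small n, Definition 3 can be evaluated by brute force, which makes the example above mechanical; `monotonization_states` is our name for this semantic (set-of-states) computation, not the efficient algorithm of Sect. 4.

```python
from itertools import product
from typing import Callable, Dict, Set, Tuple

State = Tuple[int, ...]
Cube = Dict[int, int]

def leq_b(v: State, x: State, b: Cube) -> bool:
    return all(x[p] == v[p] or (p in b and v[p] == b[p]) for p in range(len(v)))

def monotonization_states(n: int, phi: Callable[[State], bool], b: Cube) -> Set[State]:
    """M_b(phi) as a set of states: x is included iff some v <=_b x satisfies phi."""
    sts = list(product((0, 1), repeat=n))
    return {x for x in sts if any(leq_b(v, x, b) and phi(v) for v in sts)}

# If 100 |= phi, then 100 and 111 belong to M_000(phi), while 011 does not.
m = monotonization_states(3, lambda s: s == (1, 0, 0), {0: 0, 1: 0, 2: 0})
assert (1, 0, 0) in m and (1, 1, 1) in m and (0, 1, 1) not in m
```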

The minimality property of \(\mathcal {M}_{b}({\varphi })\) is formalized as follows:

Lemma 1

\(\mathcal {M}_{b}({\varphi })\) (Definition 3) is the least b-monotone formula \(\psi \) (Definition 2) s.t. \(\varphi \Longrightarrow \psi \) (i.e., for every other b-monotone formula \(\psi \), if \(\varphi \Longrightarrow \psi \) then \(\mathcal {M}_{b}({\varphi }) \Longrightarrow \psi \)).

An immediate but useful fact is that \(\mathcal {M}_{b}({\cdot })\) is a monotone operator:

Lemma 2

If \(\varphi _1 \Longrightarrow \varphi _2\) then \(\mathcal {M}_{b}({\varphi _1}) \Longrightarrow \mathcal {M}_{b}({\varphi _2})\).

Syntactic Intuition. Ordinary monotone formulas are \(\overline{0}\)-monotone; for general b, a formula \(\psi \) in DNF is b-monotone if interchanging \(p,\lnot p\) whenever \(b[p]={\textit{true}}\) results in a formula that is monotone DNF per the standard definition. When \(\varphi \) is not b-monotone, the monotonization \(\mathcal {M}_{b}({\varphi })\) is the “closest thing”, in the sense that it is the smallest b-monotone \(\psi \) s.t. \(\varphi \Longrightarrow \psi \). As we shall see, \(\mathcal {M}_{b}({\varphi })\) can be efficiently obtained from \(\varphi \) by deleting literals. The syntactic viewpoint is key for our results in Sect. 4 and Sect. 5.

Geometric Intuition. Geometrically, \(\psi \) is b-monotone if \(v \,\,\models \,\, \psi \Longrightarrow x \,\,\models \,\, \psi \) for every pair of states v, x s.t. \(v \le _b x\); the partial order \(\le _b\) indicates that x is “farther away” from b in the Hamming cube than v, namely, that there is a shortest path w.r.t. Hamming distance from b to x (or from \(\pi _{b}({x})\)—the projection of x onto b—to x, when b is not a full cube) that goes through v. A formula \(\psi \) is b-monotone when it is closed under this operation of getting farther from b. In this way, \(\mathcal {M}_{b}({\varphi })\) corresponds to the set of states x to which there is a shortest path from b that intersects \(\varphi \). The geometric viewpoint is key for our results in Sect. 4 and for the abstract domain in Sect. 6.

Disjunctive Form. The monotone overapproximation can be obtained from a DNF representation of the original formula, a fact that is useful for algorithms that compute the monotone overapproximation. Starting with a DNF representation of \(\varphi \), we can derive a DNF representation of \(\mathcal {M}_{b}({\varphi })\) by dropping in each term the literals that agree with b. Intuitively, if \(\ell \) agrees with b, the “constraint” that \(\sigma \,\,\models \,\, \ell \) is dropped from \(\mathcal {M}_{b}({t})\) because if \(\sigma \,\,\models \,\, \mathcal {M}_{b}({t})\) then flipping the value of \(\ell \) in \(\sigma \) results in a state \(\tilde{\sigma }\) such that \(\sigma \le _{b} \tilde{\sigma }\) and hence also \(\tilde{\sigma } \,\,\models \,\, \mathcal {M}_{b}({t})\).

Lemma 3

Let \(\varphi = t_1 \vee \ldots \vee t_m\) in DNF. Then the monotonization \(\mathcal {M}_{b}({\varphi }) \equiv \mathcal {M}_{b}({t_1}) \vee \ldots \vee \mathcal {M}_{b}({t_m})\) where \(\mathcal {M}_{b}({t_i}) \equiv t_i \setminus b = \bigwedge \{{\ell \in t_i \mid \ell \not \in b}\}\).

This fact has several useful corollaries. First, for the important special case of a state (full cube) v, the monotonization \(\mathcal {M}_{b}({v})\) is the conjunction of all literals that hold in v except those that are present in b, written

$$ \textit{cube}_{{b}}({v}) {\mathop {=}\limits ^\textrm{def}}\mathcal {M}_{b}({v}) = \bigwedge {\{{p \ | \ v[p]={\textit{true}}, \, p \not \in b}\}} \wedge \bigwedge {\{{\lnot p \ | \ v[p]={\textit{false}}, \, \lnot p \not \in b}\}}. $$

In particular, if \(v \,\,\models \,\, \varphi \) then \(\textit{cube}_{{b}}({v}) \Longrightarrow \mathcal {M}_{b}({\varphi })\) (follows from Lemma 3 thinking about the representation \(v \vee \varphi \)). A similar property holds under the weaker premise that v is known to belong to the monotonization:
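The syntactic reading of Lemma 3 and of \(\textit{cube}_{{b}}({v})\) is easy to state in code; in this sketch (our representation) a term or cube is a partial assignment and a DNF is a list of terms.

```python
from typing import Dict, List, Tuple

State = Tuple[int, ...]
Cube = Dict[int, int]  # a term/cube as a partial assignment: variable index -> bit
DNF = List[Cube]

def monotonize_term(t: Cube, b: Cube) -> Cube:
    """M_b(t) = t \\ b: drop from t the literals that agree with b (Lemma 3)."""
    return {p: bit for p, bit in t.items() if b.get(p) != bit}

def monotonize_dnf(phi: DNF, b: Cube) -> DNF:
    """M_b(t_1 v ... v t_m) = M_b(t_1) v ... v M_b(t_m)."""
    return [monotonize_term(t, b) for t in phi]

def cube_b(v: State, b: Cube) -> Cube:
    """cube_b(v) = M_b(v): all the literals of the state v except those present in b."""
    return monotonize_term(dict(enumerate(v)), b)

# The term (p0 ^ ~p1 ^ p2) monotonized w.r.t. the cube b = (p0 ^ ~p1) keeps only p2.
assert monotonize_term({0: 1, 1: 0, 2: 1}, {0: 1, 1: 0}) == {2: 1}
assert cube_b((1, 0, 1), {0: 1, 1: 0}) == {2: 1}
```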

Lemma 4

If \(v \,\,\models \,\, \mathcal {M}_{b}({\varphi })\) then \(\textit{cube}_{{b}}({v}) \Longrightarrow \mathcal {M}_{b}({\varphi })\).

Another corollary is that the DNF size cannot increase from \(\varphi \) to \(\mathcal {M}_{b}({\varphi })\):

Lemma 5

\({\left| {\mathcal {M}_{b}({\varphi })}\right| }_\textrm{dnf} \le {\left| {\varphi }\right| }_\textrm{dnf}\).

3.2 Monotone Hull

We now define the monotone hull, which is a conjunction of b-monotone overapproximations over all the b’s from a fixed set of states B.

Definition 4

(Monotone Hull). The monotone hull of a formula \(\varphi \) w.r.t. a set of states B is \(\textrm{MHull}_{B}({\varphi }) = \bigwedge _{b \in B}{\mathcal {M}_{b}({\varphi })}\).

The monotone hull can be simplified to use a succinct DNF representation of the basis B instead of a conjunction over all states.

Lemma 6

If \(B \equiv b_1 \vee \ldots \vee b_m\) where \(b_1,\ldots ,b_m\) are cubes, then \(\textrm{MHull}_{B}({\varphi }) \equiv \mathcal {M}_{b_1}({\varphi }) \wedge \ldots \wedge \mathcal {M}_{b_m}({\varphi })\).

Note that when \(B = b\) is a single cube, \(\textrm{MHull}_{b}({\varphi }) = \mathcal {M}_{b}({\varphi })\).

Similarly to \(\mathcal {M}_{b}({\varphi })\), the monotone hull is an overapproximation:

Lemma 7

\(\varphi \Longrightarrow \textrm{MHull}_{B}({\varphi })\).

In general, \(\textrm{MHull}_{B}({\varphi })\) is not equivalent to \(\varphi \). However, we can always choose B so that \(\textrm{MHull}_{B}({\varphi }) \equiv \varphi \). A set B that suffices for this is called a basis:

Definition 5

(Monotone Basis). A monotone basis is a set of states B. It is a basis for a formula \(\varphi \) if \( \varphi \equiv \textrm{MHull}_{B}({\varphi }). \)

Conversely, given a set B, we are interested in the set of formulas for which B forms a basis:

Definition 6

(Monotone Span). \(\textrm{MSpan}({B}) = \{{\textrm{MHull}_{B}({\varphi }) \mid \varphi \text{ over } \varSigma }\}\), the set of formulas for which B is a monotone basis.

The following theorem provides a syntactic characterization of \(\textrm{MSpan}({B})\), as the set of all formulas that can be written in CNF using clauses that exclude states from the basis. The connection between CNF and the monotone basis is useful in Sect. 5 where a monotone basis is constructed automatically, and in Sect. 6 where it is used to define an abstract domain.

Theorem 1

([5]). \(\varphi \in \textrm{MSpan}({B})\) iff there exist clauses \(c_1,\ldots ,c_s\) such that \(\varphi \equiv c_1 \wedge \ldots \wedge c_s\) and for every \(1 \le i \le s\) there exists \(b_j \in B\) such that \(b_j \not \models c_i\).

In particular, a basis B for \(\varphi \) can be constructed by writing a CNF representation of \(\varphi \) and choosing for B a state \(b_j \not \models c_j\) for each clause \(c_j\).
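This construction is straightforward to carry out; a small sketch follows, under our own representation (a clause maps each of its variables to the bit that satisfies its literal), with helper names that are ours.

```python
from typing import Dict, List, Tuple

Clause = Dict[int, int]  # clause: variable index -> the bit that makes its literal true
State = Tuple[int, ...]

def falsifying_state(clause: Clause, n: int) -> State:
    """A state b with b |/= clause: flip every literal of the clause;
    variables outside the clause can be set arbitrarily (here, to 0)."""
    return tuple(1 - clause[p] if p in clause else 0 for p in range(n))

def basis_from_cnf(cnf: List[Clause], n: int) -> List[State]:
    """A monotone basis for a formula given in CNF: one falsifying state per clause."""
    return [falsifying_state(c, n) for c in cnf]

# Example: (p0 v ~p1) ^ (~p0 v p2) over three variables.
assert basis_from_cnf([{0: 1, 1: 0}, {0: 0, 2: 1}], 3) == [(0, 1, 0), (1, 0, 0)]
```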

Exact Learning Using the Monotone Theory. The monotone theory was first developed by Bshouty for the purpose of exact learning formulas that are not monotone. Essentially, the idea is to reconstruct the formula \(\varphi \) by finding a monotone basis \(B = \{{b_1,\ldots ,b_t}\}\) for it, and constructing \(\textrm{MHull}_{B}({\varphi })\) while using equivalence and membership queries. The CDNF Algorithm [5] achieves efficient learning in terms of the DNF & CNF size of the target formula, and its code is shown in the extended version [15]. Our CDNF invariant inference algorithm (Sect. 5) is inspired by it, although it departs from it in significant ways (see Remark 3).

4 Super-Efficient Monotonization

In this section we develop an efficient procedure to compute \(\mathcal {M}_{b}({\varphi })\), which is a technical enabler of the results in following sections. The algorithm, presented in Algorithm 1, satisfies the following:

Theorem 2

Let \(\varphi \) be a formula and \(b\) a cube. The algorithm \(\textsc {Monotonize}\) \((\varphi ,b)\) computes \(\mathcal {M}_{b}({\varphi })\) in \(O(n^2 {\left| {\mathcal {M}_{b}({\varphi })}\right| }_\textrm{dnf})\) SAT queries and time.

(Throughout this paper, n denotes the number of propositional variables \({\left| {\varSigma }\right| }\).)

What distinguishes Theorem 2 is that the complexity bound depends on the DNF size of the output, the monotonization \(\mathcal {M}_{b}({\varphi })\), and not on the size of the input \(\varphi \), in contrast to the algorithm by Bshouty [5] (see Remark 1).

[Algorithm 1: \(\textsc {Monotonize}(\varphi ,b)\) with subprocedure \(\textsc {generalize}\) (pseudocode figure)]

Starting from the candidate \(H = {\textit{false}}\), the algorithm iteratively samples—through satisfying models of a SAT query—states that belong to \(\varphi \) but are not yet included in H. Every such state \(\sigma _r\) generates a new term in H. Since H is supposed to be \(b\)-monotone, the minimal term to include is \(\textit{cube}_{{b}}({\sigma _r})\). To be efficient, the algorithm generalizes each example, trying to flip bits to find an example v that also should be included in \(\mathcal {M}_{b}({\varphi })\) and is closer in Hamming distance to \(b\), which would result in a smaller term \(\textit{cube}_{{b}}({v})\), thereby including more states in each iteration and converging faster. The criterion for v is that a bit cannot be flipped if this would result in a state x where the Hamming interval \([x, \pi _{b}({x})]\) does not intersect \(\varphi \). Here, \(\pi _{b}({x})\), the projection [e.g. 42] of x onto the (possibly partial) cube \(b\), is the state s.t.

$$\begin{aligned} \pi _{b}({x})[p] = {\left\{ \begin{array}{ll} b[p] &{} p \in dom({b}) \\ x[p] &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$

and the Hamming interval \([\sigma _1,\sigma _2]\) between two states \(\sigma _1,\sigma _2\) is the smallest cube that contains both—the conjunction of the literals on which they agree. In sum, \([x, \pi _{b}({x})]\) is the conjunction of the literals where \(x,b\) agree and the literals of x over variables that are not present in \(b\). As we will show, the intersection of \([x, \pi _{b}({x})]\) with \(\varphi \) is an indicator for x belonging to the monotonization of \(\varphi \).

The use of SAT queries in the algorithm does not necessarily assume that \(\varphi \) is given explicitly, and indeed in Sect. 5 we apply this algorithm with an implicit representation of \(\varphi \) (using additional copies of the vocabulary).
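The following is a brute-force sketch of this procedure in the spirit of Algorithm 1, with explicit state enumeration standing in for the SAT queries (so it illustrates only the logic, not the complexity claim); the encoding and helper names are ours, and here `generalize` repeatedly flips bits towards b as long as the Hamming-interval check still passes.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, ...]
Cube = Dict[int, int]           # partial assignment: variable index -> bit
Pred = Callable[[State], bool]  # a formula, given semantically

def all_states(n: int):
    return product((0, 1), repeat=n)

def sat_and(n: int, phi: Pred, cube: Cube) -> Optional[State]:
    """Brute-force stand-in for SAT(phi ^ cube): a model, or None."""
    return next((s for s in all_states(n)
                 if phi(s) and all(s[p] == bit for p, bit in cube.items())), None)

def proj(x: State, b: Cube) -> State:
    """pi_b(x): set the variables of b to their value in b, keep the rest from x."""
    return tuple(b.get(p, x[p]) for p in range(len(x)))

def interval(x: State, y: State) -> Cube:
    """The Hamming interval [x, y] as a cube: the literals on which x and y agree."""
    return {p: x[p] for p in range(len(x)) if x[p] == y[p]}

def cube_b(v: State, b: Cube) -> Cube:
    """cube_b(v) = M_b(v): the literals of v except those that agree with b."""
    return {p: v[p] for p in range(len(v)) if b.get(p) != v[p]}

def generalize(n: int, phi: Pred, b: Cube, sigma: State) -> State:
    """Flip bits of sigma towards b while the interval [x, pi_b(x)] still intersects phi."""
    v = list(sigma)
    changed = True
    while changed:
        changed = False
        for p, bit in b.items():
            if v[p] != bit:
                x = tuple(v[:p] + [bit] + v[p + 1:])
                if sat_and(n, phi, interval(x, proj(x, b))) is not None:
                    v[p] = bit
                    changed = True
    return tuple(v)

def monotonize(n: int, phi: Pred, b: Cube) -> List[Cube]:
    """M_b(phi) as a DNF (a list of terms), built term by term as described above."""
    H: List[Cube] = []
    covered = lambda s: any(all(s[p] == bit for p, bit in t.items()) for t in H)
    while True:
        # sample a state of phi not yet covered by H (a SAT query in the real algorithm)
        sigma = next((s for s in all_states(n) if phi(s) and not covered(s)), None)
        if sigma is None:
            return H
        H.append(cube_b(generalize(n, phi, b, sigma)))

# Reproducing the example of Sect. 3: M_000(phi) for phi = {100} is the single term p0.
assert monotonize(3, lambda s: s == (1, 0, 0), {0: 0, 1: 0, 2: 0}) == [{0: 1}]
```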

The rest of this section proves Theorem 2. First, the result v of generalization is such that when we disjoin the term \(\textit{cube}_{{b}}({v})\), we do not “overshoot” to include states that do not belong to the true monotonization:

Lemma 8

If \(\sigma _r \,\,\models \,\, \varphi \), then \(\textsc {generalize}(\varphi , b, \sigma _r)\) returns v s.t. \(\textit{cube}_{{b}}({v}) \Longrightarrow \mathcal {M}_{b}({\varphi })\).

Proof

v is chosen s.t. \([v, \pi _{b}({v})] \cap \varphi \ne \emptyset \)—note that this holds trivially for the initial choice \(v = \sigma _r\), since \(\sigma _r \,\,\models \,\, \varphi \). Let \(\tilde{\sigma } \,\,\models \,\, \varphi \) s.t. \(\tilde{\sigma } \,\,\models \,\, [v, \pi _{b}({v})]\). The latter means that \(\tilde{\sigma } \le _{b} v\), because \([v, \pi _{b}({v})]\) consists of all the literals in v except for those that disagree with \(b\), so \(\tilde{\sigma }\) agrees with v whenever \(v,b\) agree. In more detail, \(\tilde{\sigma }\) agrees with v on all \(p \not \in dom({b})\) (because \(\pi _{b}({v})[p] = v[p]\) on such variables), and for \(p \in dom({b})\), if \(\tilde{\sigma }[p] \ne v[p]\) then \(v,b\) disagree on p (had they agreed, \(v[p]=\pi _{b}({v})[p]\) and p would be retained in the conjunction that forms the Hamming interval, forcing \(\tilde{\sigma }[p] = v[p]\)), and hence \(\tilde{\sigma }[p] = b[p]\); this satisfies Definition 1. As also \(\tilde{\sigma } \,\,\models \,\, \varphi \), this implies that \(v \,\,\models \,\, \mathcal {M}_{b}({\varphi })\) per Definition 3. Hence \(\textit{cube}_{{b}}({v}) \Longrightarrow \mathcal {M}_{b}({\varphi })\), by Lemma 4.    \(\square \)

This shows that it is reasonable to disjoin the term \(\textit{cube}_{{b}}({v})\) to H in the hope of eventually obtaining \(H = \mathcal {M}_{b}({\varphi })\). The following lemma argues that the algorithm continues to sample states until it converges to the true monotonization.

Lemma 9

\(\textsc {Monotonize}(\varphi , b)\) terminates and returns \(\mathcal {M}_{b}({\varphi })\).

Proof

First we show that when it terminates, the result is correct. Always \(H \subseteq \mathcal {M}_{b}({\varphi })\), because in each iteration we disjoin to H a formula that satisfies the same property, by Lemma 8. The algorithm terminates when \(\varphi \subseteq H\), and H is always a \(b\)-monotone formula (by Lemma 3 the monotonization of H is H itself, which is b-monotone as in Lemma 1). From the minimality of \(\mathcal {M}_{b}({\varphi })\) (Lemma 1), necessarily also \(\mathcal {M}_{b}({\varphi }) \subseteq H\).

To show termination it suffices to show that H strictly increases in each iteration (because the number of non-equivalent propositional formulas is finite). To see this, note that generalize(\(\varphi \), \(b\), \(\sigma _r\)) returns v s.t. \(v \le _{b} \sigma _r\), since the procedure starts with \(\sigma _r\) and only flips literals to agree with \(b\). This implies that \(\sigma _r \,\,\models \,\, \textit{cube}_{{b}}({v})\), so after the iteration \(\sigma _r \,\,\models \,\, H\) whereas previously \(\sigma _r \not \models H\).    \(\square \)

The novelty of the algorithm is its efficiency, which we now turn to establish. The crucial point is that generalization is able to produce, term by term, a minimal representation of \(\mathcal {M}_{b}({\varphi })\). To this end, we first show that the \(\textit{cube}_{{b}}({v})\) that the algorithm computes in lines 5 to 6 is a prime implicant of \(\mathcal {M}_{b}({\varphi })\). Recall that a term t is an implicant of a formula \(\psi \) if \(t \Longrightarrow \psi \), and it is prime if this no longer holds after dropping a literal, that is, for every \(\ell \in t\) (as a set of literals), \(t \setminus \{{\ell }\} \not \Longrightarrow \psi \). It is non-trivial if \(t \not \equiv {\textit{false}}\) (not an empty set of literals).

Lemma 10

If \(\sigma _r \,\,\models \,\, \varphi \), then \(\textsc {generalize}(\varphi , b, \sigma _r)\) returns v s.t. \(\textit{cube}_{{b}}({v})\) is a non-trivial prime implicant of \(\mathcal {M}_{b}({\varphi })\).

Proof

Lemma 8 shows that it is an implicant. It is non-trivial because \(\sigma _r\) is a model of it, as shown as part of the proof of Lemma 9. Suppose that \(\textit{cube}_{{b}}({v})\) is not prime. Then there is a literal over some variable p that can be dropped. It is present in \(\textit{cube}_{{b}}({v})\), which means that \(p \in dom({b})\) and \(v[p] \ne b[p]\). Then the cube obtained from dropping the literal can be written as \(\textit{cube}_{{b}}({x})\) where \(x = v[p \mapsto \lnot v[p]]\). If this cube is an implicant of \(\mathcal {M}_{b}({\varphi })\), then, because \(x \,\,\models \,\, \textit{cube}_{{b}}({x})\), in particular \(x \,\,\models \,\, \mathcal {M}_{b}({\varphi })\). By Definition 3, there is \(\tilde{\sigma } \,\,\models \,\, \varphi \) such that \(\tilde{\sigma } \le _{b} x\). But the latter implies that \(\tilde{\sigma } \,\,\models \,\, [x, \pi _{b}({x})]\), because, by Definition 1, for every \(p \not \in dom({b})\), \(\tilde{\sigma }[p] = x[p] = \pi _{b}({x})[p]\), and for every \(p \in dom({b})\) on which \(x,\pi _{b}({x})\) agree (i.e., \(x[p] = b[p]\)) also \(\tilde{\sigma },x\) agree (otherwise \(\tilde{\sigma }[p] = b[p] = x[p]\), a contradiction). Thus \([x, \pi _{b}({x})] \cap \varphi \ne \emptyset \), in contradiction to the choice of v, according to the check in line 20.    \(\square \)

Thanks to the fact that \(\mathcal {M}_{b}({\varphi })\) is \(b\)-monotone, through one of the basic properties of monotone functions that dates back to Quine [30], a prime implicant reproduces a term from the (unique) minimal representation of \(\mathcal {M}_{b}({\varphi })\). We use this to show that the monotonization is computed in few iterations:

Lemma 11

The number of iterations of the loop in line 3 of \(\textsc {Monotonize}(\varphi , b)\) is at most \({\left| {\mathcal {M}_{b}({\varphi })}\right| }_\textrm{dnf}\).

Proof

Lemma 10 shows that in each iteration we disjoin a prime implicant. It is a property of monotone functions that they have a unique irredundant DNF representation, which consists of the disjunction of all non-trivial prime implicants [30], and this extends to b-monotone functions (through a simple renaming of variables to make the function monotone). Thus the non-trivial prime implicant we disjoin is a term of the minimal DNF representation of \(\mathcal {M}_{b}({\varphi })\). Each additional \(\sigma _r\) produces a new term, as shown in the proof of Lemma 9.    \(\square \)

We are now ready to prove that the algorithm overall is efficient.

Proof of Theorem 2. By Lemma 11 the number of iterations of the loop in line 3 is bounded by \({\left| {\mathcal {M}_{b}({\varphi })}\right| }_\textrm{dnf}\). Each iteration calls generalize, which performs at most n iterations of the loop in line 14 because the same variable is never flipped twice. Each iteration of this loop performs n SAT queries in line 20. Note that the cube \([x, \pi _{b}({x})]\) is straightforward to compute in linear time.    \(\square \)

Remark 1

Bshouty [5] used an algorithm for computing \(\mathcal {M}_{b}({\varphi })\) whose complexity is bounded by the DNF input size \({\left| {\varphi }\right| }_\textrm{dnf}\), whereas Algorithm 1’s complexity is bounded by the DNF output size, \({\left| {\mathcal {M}_{b}({\varphi })}\right| }_\textrm{dnf}\), which is never worse (Lemma 5), and sometimes significantly smaller. When considered as learning algorithms, the improved complexity of Algorithm 1 comes at the expense of the need for richer queries: Bshouty’s algorithm is similar to Algorithm 1 (using an equivalence query in line 3 that produces a positive example—see the extended version [15]), except that the condition in line 20 is replaced by checking whether \(x \,\,\models \,\, \varphi \). This is a membership query to \(\varphi \), whereas our check amounts to a disjointness query [1].

5 Efficient Inference of CDNF Invariants


In this section we build on the algorithm of Sect. 4 to devise a new model-based interpolation-based algorithm that can efficiently infer invariants that have poly-size CNF and DNF representations (dubbed “CDNF invariants”). We start with background on the theoretical condition that guarantees the success of the original model-based algorithm for simpler forms of invariants.

5.1 Background: Interpolation with the Fence Condition

The essence of interpolation-based invariant inference (ITP) is to generalize a proof of bounded unreachability—i.e., bounded model checking [2]—into a proof of unbounded unreachability, that is, a part of the inductive invariant. The reader is more likely to be familiar with the structure of the original algorithm by McMillan [26], which uses bounded unreachability to the bad states, where in each iteration the algorithm adds states to the candidate by disjoining a formula from which it is impossible to reach a bad state in s steps. However, for our purposes here, it is more convenient to consider the dual version, which uses bounded unreachability from the initial states, where in each iteration the algorithm excludes states from the candidate by conjoining a formula which does not exclude any state that the system can reach in s steps from an initial state (i.e., the candidate \(\varphi \) is updated by \(\varphi \leftarrow \varphi \wedge H\) where H is a formula s.t. \({\underline{\delta }}^{{s}}({{\textit{Init}}}) \subseteq H\)).

The original interpolation-based algorithm by McMillan uses a procedure that relies on the internals of the SAT solver [26]. Complexity bounds on interpolation-based algorithms analyze later approaches that exercise control on how interpolants are generated, and do this in a model-based fashion [3, 6] inspired by IC3/PDR [4, 9]. The dual version is presented in Algorithm 2. After starting the candidate \(\varphi \) as \(\lnot \textit{Bad}\), each iteration checks for a counterexample to induction (line 6), whose pre-state \(\sigma \) is excluded from \(\varphi \) at the end of the iteration (line 10). Many states are excluded in each iteration beyond the counterexample, by conjoining to the candidate a minimal clause that excludes \(\sigma \) but retains all the states that are reachable in the system in s steps (line 9—this involves up to n queries of s-BMC, each time dropping a literal and checking whether the shortened clause still retains all the s-reachable states). If the counterexample cannot be blocked, because it is in fact reachable in s steps, this is an indication that s needs to be larger (line 7) to find a proof or a safety violation. The algorithm detects that the transition system is unsafe in line 3 when s is enough to find an execution from \({\textit{Init}}\) to \(\textit{Bad}\) with at most s transitions. (Our analysis of the algorithms in the paper focuses on the safe case, the complexity of finding an invariant.)
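The following is a brute-force sketch of this dual loop under our own encoding (explicit enumeration instead of SAT/BMC queries, a clause represented as the literals of \(\lnot \sigma \) that survive greedy dropping); it illustrates the structure described above and does not reproduce Algorithm 2 line by line, so the line numbers in the text refer to the paper's figure, not to this sketch.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Set, Tuple

State = Tuple[int, ...]
Pred = Callable[[State], bool]
Delta = Callable[[State, State], bool]
Clause = Dict[int, int]  # variable index -> the bit that satisfies the clause's literal

def reach_upto(n: int, init: Pred, delta: Delta, s: int) -> Set[State]:
    """Brute-force stand-in for s-BMC: all states reachable from Init in at most s steps."""
    cur = {x for x in product((0, 1), repeat=n) if init(x)}
    for _ in range(s):
        cur |= {y for x in cur for y in product((0, 1), repeat=n) if delta(x, y)}
    return cur

def dual_itp(n: int, init: Pred, delta: Delta, bad: Pred, s: int) -> Optional[List[Clause]]:
    """The candidate is ~Bad conjoined with the returned clauses; None means a
    counterexample to induction was s-reachable, so s should be increased."""
    r = reach_upto(n, init, delta, s)
    assert not any(bad(x) for x in r), "unsafe within s steps"
    clauses: List[Clause] = []
    cand = lambda x: not bad(x) and all(any(x[p] == v for p, v in c.items()) for c in clauses)
    while True:
        cti = next(((x, y) for x in product((0, 1), repeat=n) if cand(x)
                    for y in product((0, 1), repeat=n) if delta(x, y) and not cand(y)), None)
        if cti is None:
            return clauses                       # the candidate is inductive
        sigma = cti[0]
        if sigma in r:
            return None                          # sigma cannot be blocked: s is too small
        clause = {p: 1 - sigma[p] for p in range(n)}   # the clause ~sigma
        for p in range(n):                       # greedily drop literals while the clause
            smaller = {q: v for q, v in clause.items() if q != p}   # keeps all s-reachable states
            if smaller and all(any(x[q] == v for q, v in smaller.items()) for x in r):
                clause = smaller
        clauses.append(clause)
```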

A condition that guarantees that s is large enough for Algorithm 2 to successfully find an inductive invariant, called the fence condition, was recently put forward [13], involving the Hamming-geometric boundary of the invariant.

Definition 7

(Boundary). Let I be a set of states. Then the (inner) boundary of I, denoted \(\partial ^{+}({I})\), is the set of states \(\sigma ^+ \,\,\models \,\, I\) s.t. there is a state \(\sigma ^-\) that differs from \(\sigma ^+\) in exactly one variable and \(\sigma ^- \not \models I\).

Definition 8

(Fence Condition). Let I be an inductive invariant for a transition system \(({\textit{Init}},\delta ,\textit{Bad})\) and \(s \in \mathbb {N}\). Then I is s-forwards fenced if \(\partial ^{+}({I}) \subseteq {\underline{\delta }}^{{s}}({{\textit{Init}}})\).

Example 1

Let I be the set of all states where at least two bits are 0 and at least two bits are 1. Then \(\partial ^{+}({I})\) is the set where exactly two bits are 0 (and at least two bits are 1) or exactly two bits are 1 (and at least two bits are 0). Note that \(I \setminus \partial ^{+}({I})\) contains many (most) states—those where three or more bits are 0 and three or more bits are 1. The fence condition requires only the states in \(\partial ^{+}({I})\) to be reachable in s steps.
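A quick brute-force check of this example (for a small n, with helper names of our own): `boundary` implements Definition 7 by flipping single bits, and the assertion confirms the description of \(\partial ^{+}({I})\) above.

```python
from itertools import product
from typing import Callable, Set, Tuple

State = Tuple[int, ...]

def boundary(n: int, inv: Callable[[State], bool]) -> Set[State]:
    """The inner boundary of inv: its states that have a one-bit-flipped neighbour outside inv."""
    flip = lambda x, p: x[:p] + (1 - x[p],) + x[p + 1:]
    return {x for x in product((0, 1), repeat=n)
            if inv(x) and any(not inv(flip(x, p)) for p in range(n))}

n = 7
inv = lambda x: x.count(0) >= 2 and x.count(1) >= 2
expected = {x for x in product((0, 1), repeat=n)
            if (x.count(0) == 2 and x.count(1) >= 2) or (x.count(1) == 2 and x.count(0) >= 2)}
assert boundary(n, inv) == expected
```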

The fence condition guarantees that throughout the algorithm’s execution, \(\varphi \) contains the fenced invariant, giving rise to the following correctness property:

Theorem 3

([13]). If there exists an s-forwards fenced invariant for \(({\textit{Init}},\delta ,\textit{Bad})\), then Algorithm 2 (invoked with the bound s) successfully finds an invariant.

The idea is that the property that \(I \Longrightarrow \varphi \) is maintained because the fence condition ensures that it suffices to verify that the clause that generalize computes, starting from a state \(\sigma \,\,\models \,\, \varphi \) that is outside of I, does not exclude a state from \({\underline{\delta }}^{{s}}({{\textit{Init}}})\) to guarantee that it also does not exclude a state from I. (Note that the fence condition does not provide a way to know whether an arbitrary state belongs to I.).

The fence condition ensures the algorithm’s success, but not that it is efficient—the number of iterations until convergence may be large, even when there is a fenced inductive invariant that has a short representation in CNF. (Note that in Algorithm 2, \(\varphi \) is always in CNF.) This was ameliorated in [13] by the assumption that the invariant is monotone:

Theorem 4

([13]). If I is an s-forwards fenced inductive invariant for \(({\textit{Init}},\delta ,\textit{Bad})\) and I is monotone (can be written in CNF/DNF with all variables un-negated), then Algorithm 2 (invoked with the bound s) successfully finds an inductive invariant in \(O({\left| {I}\right| }_\textrm{cnf})\) inductiveness checks, and \(O\left( n \cdot {\left| {I}\right| }_\textrm{cnf}\right) \) checks of s-BMC and time.

However, when I is not monotone, it is possible for the algorithm to require an exponential number of iterations even though \({\left| {I}\right| }_\textrm{cnf}\) is small (and in fact, even though every representation of I without redundant clauses is small).

The challenge that we address in this section is to create an invariant inference algorithm that efficiently infers inductive invariants that are not monotone, while relying only on the fence condition.

5.2 CDNF Inference with the Fence Condition

We now present our new invariant inference algorithm (Algorithm 3), that is guaranteed to run in time polynomial in \(n,{\left| {I}\right| }_\textrm{cnf},{\left| {I}\right| }_\textrm{dnf}\) of a fenced invariant I:

Theorem 5

Let I be a forwards s-fenced inductive invariant for \(({\textit{Init}},\delta ,\textit{Bad})\). Then \(\textsc {CDNF-ITP}({\textit{Init}},\delta ,\textit{Bad},s)\) finds an inductive invariant in at most \({\left| {I}\right| }_\textrm{cnf} \cdot {\left| {I}\right| }_\textrm{dnf} \cdot n^2\) s-BMC checks, \({\left| {I}\right| }_\textrm{cnf}\) inductiveness checks, and \(O({\left| {I}\right| }_\textrm{cnf} \cdot {\left| {I}\right| }_\textrm{dnf} \cdot n^2)\) time.

Example 2

I from Example 1 has poly-size representations in both CNF and DNF. We write:

$$\begin{aligned} I \equiv \bigwedge _{i=1}^{n} \Big (\bigvee _{j \ne i} \lnot x_j\Big ) \wedge \bigwedge _{i=1}^{n} \Big (\bigvee _{j \ne i} x_j\Big ) \equiv \bigvee _{\begin{array}{c} i_1< i_2, \ j_1 < j_2, \\ \{{i_1,i_2}\} \cap \{{j_1,j_2}\} = \emptyset \end{array}} \left( \lnot x_{i_1} \wedge \lnot x_{i_2} \wedge x_{j_1} \wedge x_{j_2}\right) . \end{aligned}$$

The CNF formula has 2n clauses, and the DNF has \(\varTheta (n^4)\) terms. Theorem 5 shows that such I satisfying the fence condition can be inferred in a number of queries and time that is polynomial in n. Note that these formulas fall outside the classes that previous results can handle efficiently (see Sect. 7) as they are not monotone nor almost-monotone (the number of terms/clauses with negated variables is not constant).

As noted by Bshouty [5], the class of formulas with short DNF and CNF includes the formulas that can be expressed by a small decision tree: a binary tree in which every internal node is labeled by a variable and every leaf by \({\textit{true}}\)/\({\textit{false}}\), and \(\sigma \) satisfies the formula if the path defined by starting from the root, turning left when \(\sigma \) assigns \({\textit{false}}\) to the variable labeling the node and right otherwise, reaches a leaf labeled \({\textit{true}}\). The size of a decision tree is the number of leaves in the tree. We conclude that (see the proof in the extended version [15]):
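A minimal rendering of this representation, under our own encoding (an internal node is a tuple of a variable index and the two subtrees, a leaf is a Boolean), with a small tree whose formula indeed has both a short DNF and a short CNF.

```python
from typing import Tuple, Union

Tree = Union[bool, Tuple[int, "Tree", "Tree"]]  # leaf, or (variable, if-false, if-true)

def evaluate(tree: Tree, sigma: Tuple[int, ...]) -> bool:
    """Follow the root-to-leaf path determined by sigma (left on false, right on true)."""
    while not isinstance(tree, bool):
        var, left, right = tree
        tree = right if sigma[var] else left
    return tree

# A tree with 4 leaves for (p0 ^ p1) v (~p0 ^ p2); it has a 2-term DNF
# and the 2-clause CNF (p0 v p2) ^ (~p0 v p1).
t: Tree = (0, (2, False, True), (1, False, True))
assert evaluate(t, (1, 1, 0)) is True and evaluate(t, (1, 0, 1)) is False
```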

Corollary 1

Let I be a forwards s-fenced inductive invariant for \(({\textit{Init}},\delta ,\textit{Bad})\) that can be expressed as a decision tree of size m. Then \(\textsc {CDNF-ITP}({\textit{Init}},\delta ,\textit{Bad},s)\) finds an inductive invariant in at most \(m^2 \cdot n^2\) s-BMC checks, m inductiveness checks, and \(O(m^2 \cdot n^2)\) time.

The algorithm CDNF-ITP which attains Theorem 5 is presented in Algorithm 3. Its overall structure is similar to Algorithm 2, except that the formula used to block a counterexample is the monotonization of the s-reachable states. Specifically, starting from the candidate \(\varphi = \lnot \textit{Bad}\) (line 4), the algorithm iteratively samples counterexamples to induction (line 6) and blocks the pre-state \(\sigma \) from \(\varphi \) by conjoining \(\mathcal {M}_{\sigma }({{\underline{\delta }}^{{s}}({{\textit{Init}}})})\), computed by invoking Algorithm 1. The SAT queries of the form \(\textsc {sat}(\varphi \wedge \theta )\) that Algorithm 1 performs (see Sect. 4) have \(\varphi = {\underline{\delta }}^{{s}}({{\textit{Init}}})\), and they amount to BMC checks of whether \({\underline{\delta }}^{{s}}({{\textit{Init}}}) \cap \theta \overset{?}{=}\ \emptyset \).

It is important for the efficiency result that Algorithm 3 uses Algorithm 1 as a subprocedure. Using Bshouty’s procedure (see Remark 1) would yield a bound of \(n \cdot {\left| {{\underline{\delta }}^{{s}}({{\textit{Init}}})}\right| }_\textrm{dnf}\) checks of s-BMC, and it is likely that \({\underline{\delta }}^{{s}}({{\textit{Init}}})\) is complex to capture in a formula when s is significant (as common for sets defined by exact reachability, such as the set of the reachable states).
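The overall loop can be sketched as follows, again with brute-force enumeration standing in for the SAT and s-BMC queries; in particular, the monotonization below is the semantic one of Definition 3 rather than Algorithm 1, so this sketch conveys only the structure of CDNF-ITP, not its complexity. Names and encoding are ours.

```python
from itertools import product
from typing import Callable, Set, Tuple

State = Tuple[int, ...]
Pred = Callable[[State], bool]
Delta = Callable[[State, State], bool]

def reach_upto(n: int, init: Pred, delta: Delta, s: int) -> Set[State]:
    cur = {x for x in product((0, 1), repeat=n) if init(x)}
    for _ in range(s):
        cur |= {y for x in cur for y in product((0, 1), repeat=n) if delta(x, y)}
    return cur

def monotonization(n: int, phi: Set[State], b: State) -> Set[State]:
    """Semantic M_b(phi) (Definition 3); the actual algorithm computes a DNF via Algorithm 1."""
    leq = lambda v, x: all(x[p] == v[p] or v[p] == b[p] for p in range(n))
    return {x for x in product((0, 1), repeat=n) if any(leq(v, x) for v in phi)}

def cdnf_itp(n: int, init: Pred, delta: Delta, bad: Pred, s: int) -> Set[State]:
    """Conjoin to ~Bad the monotonizations of the s-reachable states w.r.t. the
    pre-states of counterexamples to induction, until the candidate is inductive."""
    r = reach_upto(n, init, delta, s)
    assert not any(bad(x) for x in r), "unsafe within s steps"
    cand = {x for x in product((0, 1), repeat=n) if not bad(x)}
    while True:
        cti = next(((x, y) for x in cand for y in product((0, 1), repeat=n)
                    if delta(x, y) and y not in cand), None)
        if cti is None:
            return cand                          # an inductive invariant
        sigma = cti[0]
        assert sigma not in r, "s too small: the counterexample is s-reachable"
        cand &= monotonization(n, r, sigma)      # block sigma by M_sigma(reach_s(Init))
```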

We now proceed to prove the correctness and efficiency of the algorithm (Theorem 5). Throughout, assume that I is an inductive invariant for \(({\textit{Init}},\delta ,\textit{Bad})\). I will be s-forwards fenced; we state this explicitly in the premise of lemmas where this assumption is used. The idea behind the correctness and efficiency of Algorithm 3 is that \(\mathcal {M}_{\sigma }({I})\) is a stronger formula than the clauses that are produced in Algorithm 2, causing the candidate to converge down to the invariant in fewer iterations, while never excluding states that belong to I (because \(I \subseteq \mathcal {M}_{\sigma }({I})\), as used in Lemma 14). As we will show (in Lemma 15), this strategy results in a number of iterations that is bounded by the CNF size of I (without further assumptions on the syntactic structure of I). The trick, however, is to show (in Lemma 13) that what the algorithm computes in line 9 is indeed \(\mathcal {M}_{\sigma }({I})\), even though I is unknown. The crucial observation is that under the fence condition, the monotonization of the s-reachable states matches the monotonization of the invariant (even though these are different sets!). Note that this holds for any invariant that satisfies the fence condition. To prove this we need to recall a fact about the monotonization of the boundary of a set:

Lemma 12

([14]). Let I, S be sets of states s.t. \(\partial ^{+}({I}) \subseteq S\) and \(\sigma \) a state s.t. \(\sigma \not \models I\). Then \(I \subseteq \mathcal {M}_{\sigma }({S})\).

The idea is that for every \(x \in I\), there is a state on the boundary \(v \in \partial ^{+}({I})\) s.t. \(v \le _\sigma x\) (a shortest path from x to \(\sigma \) must exit I, since \(\sigma \not \models I\), and it does so at a boundary state), and because also \(v \in S\) we would have that \(x \in \mathcal {M}_{\sigma }({S})\).

We proceed to relate the monotonizations of \({\underline{\delta }}^{{s}}({{\textit{Init}}}),I\):

Lemma 13

If I is forwards s-fenced for \(({\textit{Init}},\delta ,\textit{Bad})\), and \(\sigma \not \models I\), then we have that \(\mathcal {M}_{\sigma }({{\underline{\delta }}^{{s}}({{\textit{Init}}})}) = \mathcal {M}_{\sigma }({I})\).

Proof

Since I is an inductive invariant, \({\underline{\delta }}^{{s}}({{\textit{Init}}}) \subseteq I\), so \(\mathcal {M}_{\sigma }({{\underline{\delta }}^{{s}}({{\textit{Init}}})}) \subseteq \mathcal {M}_{\sigma }({I})\) from Lemma 2. For the other direction we use Lemma 12: by the fence condition, \(\partial ^{+}({I}) \subseteq {\underline{\delta }}^{{s}}({{\textit{Init}}})\) and hence, as \(\sigma \not \models I\), we obtain \(I \subseteq \mathcal {M}_{\sigma }({{\underline{\delta }}^{{s}}({{\textit{Init}}})})\). By Lemma 1, this implies that \(\mathcal {M}_{\sigma }({I}) \subseteq \mathcal {M}_{\sigma }({{\underline{\delta }}^{{s}}({{\textit{Init}}})})\).    \(\square \)

We use this to characterize the candidate invariant the algorithm constructs:

Lemma 14

If I is forwards s-fenced for \(({\textit{Init}},\delta ,\textit{Bad})\), then in each step of CDNF-ITP\(({\textit{Init}},\delta ,\textit{Bad},s)\), \(\varphi = \textrm{MHull}_{\mathcal {C}_i}({I}) \wedge \lnot \textit{Bad}\), where \(\mathcal {C}_i\) is the set of counterexamples \(\sigma \) the algorithm has observed so far. In particular, \(I \subseteq \varphi \).

Proof

First, \(I \subseteq \varphi \) follows from the rest of the lemma because \(I \subseteq \lnot \textit{Bad}\) (it is an inductive invariant), and \(I \subseteq \textrm{MHull}_{\mathcal {C}_i}({I})\) by Lemma 7. The proof of \(\varphi = \textrm{MHull}_{\mathcal {C}_i}({I}) \wedge \lnot \textit{Bad}\) is by induction on iterations of the loop in line 5. Initially, \(\mathcal {C} = \emptyset \) and indeed \(\varphi = \lnot \textit{Bad}\). In each iteration, \(I \subseteq \varphi \) using the argument above and the induction hypothesis. Hence, the counterexample to induction of line 6 has \(\sigma \not \models I\) (otherwise \(\sigma ' \,\,\models \,\, I\) because I is an inductive invariant, and this would imply also \(\sigma ' \,\,\models \,\, \varphi \), in contradiction to \(\sigma ' \not \models \varphi \)). Then Lemma 9 ensures that \(H = \mathcal {M}_{\sigma }({{\underline{\delta }}^{{s}}({{\textit{Init}}})})\). Lemma 13 shows that this is \(\mathcal {M}_{\sigma }({I})\), as required.    \(\square \)

Essentially, the algorithm gradually learns a monotone basis (Definition 5) for I from the counterexamples to induction, and constructs I via the monotone hull w.r.t. this basis. The next lemma shows that the size of the basis that the algorithm finds is bounded by \({\left| {I}\right| }_\textrm{cnf}\).

Lemma 15

If I is forwards s-fenced for \(({\textit{Init}},\delta ,\textit{Bad})\), then CDNF-ITP\(({\textit{Init}},\delta ,\textit{Bad},s)\) successfully finds an inductive invariant. Further, the number of iterations of the loop in line 5 is at most \({\left| {I}\right| }_\textrm{cnf}\).

Proof

Since \(\sigma \not \models I\), it is also not a model of the monotonization w.r.t. itself, \(\sigma \not \models \mathcal {M}_{\sigma }({I})\) (because the only state \(x \le _\sigma \sigma \) is \(x=\sigma \)—see Definitions 1 and 3). This shows, using Lemma 14, that at least one state is excluded from the candidate \(\varphi \) in each iteration. By the same lemma always \(I \Longrightarrow \varphi \), and the algorithm terminates when \(\varphi \) is inductive, so this shows that the algorithm successfully converges to an inductive invariant.

To see that this occurs in at most \({\left| {I}\right| }_\textrm{cnf}\) iterations, consider a minimal CNF representation of I, \(I = c_1 \wedge \ldots \wedge c_{{\left| {I}\right| }_\textrm{cnf}}\). We argue that each iteration produces at least one new clause from that representation, in the sense that for some i, \(\varphi \wedge \mathcal {M}_{\sigma }({I}) \Longrightarrow c_i\) whereas previously \(\varphi \not \Longrightarrow c_i\). Let \(c_i\) be a clause s.t. \(\sigma \not \models c_i\); one exists because \(\sigma \not \models I\) (as shown in the proof of Lemma 14), and since \(\sigma \,\,\models \,\, \varphi \), previously \(\varphi \not \Longrightarrow c_i\). Then \(\mathcal {M}_{\sigma }({I}) \subseteq c_i\), since \(c_i\) is \(\sigma \)-monotone (\(\mathcal {M}_{\sigma }({c_i}) = c_i\), using Lemma 3, because all the literals disagree with \(\sigma \)) and \(I \subseteq c_i\), and \(\mathcal {M}_{\sigma }({I})\) is the smallest such (Lemma 1). Thus when we conjoin \(H = \mathcal {M}_{\sigma }({I})\) to \(\varphi \) we obtain \(\varphi \Longrightarrow c_i\) for at least one new \(c_i\); this can happen at most \({\left| {I}\right| }_\textrm{cnf}\) times.    \(\square \)

Overall:

Proof of Theorem 5. The algorithm’s success in finding an invariant is established in Lemma 15. As for efficiency, by Lemma 15, there are at most \({\left| {I}\right| }_\textrm{cnf}\) iterations of the loop in CDNF-ITP, each performs a single inductiveness query, and calls Monotonize. By Theorem 2 each such call performs at most \(O(n^2 {\left| {\mathcal {M}_{\sigma }({I})}\right| }_\textrm{dnf})\) s-BMC queries. The claim follows because \({\left| {\mathcal {M}_{\sigma }({I})}\right| }_\textrm{dnf} \le {\left| {I}\right| }_\textrm{dnf}\) (Lemma 5).    \(\square \)

Remark 2

(Backwards fence condition). Our main theorem in this section, Theorem 5, also has a dual version that applies to a fence condition concerning backwards reachability. I is s-backwards fenced if every state in the outer boundary \(\partial ^{-}({I}) {\mathop {=}\limits ^\textrm{def}}\partial ^{+}({\lnot I})\) can reach a state in \(\textit{Bad}\) in at most s steps [13]. The dual of Theorem 5 is that there is an algorithm that achieves the same complexity bound under the assumption that I is s-backwards fenced (instead of s-forwards fenced). The dual-CDNF algorithm is obtained by running our CDNF algorithm on the dual transition system \((\textit{Bad},\delta ^{-1},{\textit{Init}})\) (see e.g. [12, Appendix A]) and negating the resulting invariant; notice that the CDNF class is closed under negation. This algorithm also achieves the same bound for decision trees as in Corollary 1, under the backwards fence assumption.

Remark 3

(Comparison to Bshouty’s CDNF algorithm). Our CDNF algorithm, Algorithm 3, is inspired by Bshouty’s CDNF algorithm [5], but diverges from it in several ways. The reason is the different queries available in each setting. (The code for Bshouty’s CDNF algorithm is provided in the extended version [15].) Structurally, while the candidate in both algorithms is gradually constructed to be \(\textrm{MHull}_{\mathcal {C}_i}({I}) = \bigwedge _{\sigma \in \mathcal {C}_i}{\mathcal {M}_{\sigma }({I})}\) (I being the unknown invariant/formula, and \(\mathcal {C}_i\) the set of negative examples so far), Algorithm 3 constructs each monotonization separately, one by one, whereas Bshouty’s algorithm increases all monotonizations simultaneously. Bshouty’s design follows from having equivalence queries (checking whether the candidate matches I) as the source of both positive and negative examples. A membership query is necessary to decide whether the differentiating example is positive or negative for I, in order to decide whether to add disjuncts to the existing monotonizations or to add a new monotonization, respectively. This procedure is problematic in invariant inference, because we cannot in general decide, for a counterexample \((\sigma ,\sigma ')\) showing that our candidate is not inductive, whether \(\sigma \not \models I\) (negative) or \(\sigma ' \,\,\models \,\, I\) (positive) [12, 18]. The solution in previous work [13] was to assume that the invariant satisfies both the forwards and backwards fence condition (see Remark 2). Under this assumption it is possible to decide whether \(\sigma \,\,\models \,\, I\) for an arbitrary state \(\sigma \). However, this condition is much stronger than a one-sided version of the fence condition. Instead, in our inference algorithm, the candidate is ensured to be an overapproximation of the true I, so each counterexample to induction in line 6 yields a negative example. Positive examples are obtained in line 4 from \({\underline{\delta }}^{{s}}({{\textit{Init}}}) \subseteq I\); there is no obvious counterpart to that in exact learning, because in that setting we have no a-priori knowledge of some set S that underapproximates I, let alone one where we know—as the fence condition guarantees through Lemma 12—that covering S in the monotonization is enough to cover I.

6 Efficient Implementation of Abstract Interpretation

In this section we build on the algorithm of Sect. 4 to prove a complexity upper bound on abstract interpretation in the domain based on the monotone theory (Theorem 7). We begin with background on this domain.

6.1 Background: Abstract Interpretation in the Monotone Theory

Recall that given a set of states B, the monotone span (Definition 6) of B, \(\textrm{MSpan}({B})\), is the set of formulas \(\varphi \) s.t. \(\textrm{MHull}_{B}({\varphi }) \equiv \varphi \), or, equivalently, the set of formulas that can be written as conjunctions of clauses that exclude states from B (Theorem 1). The abstract domain \(\mathbb {M}[{B}]=\langle \textrm{MSpan}({B}), \Longrightarrow , \mathbin {\sqcup }_{B}, {\textit{false}}\rangle \), introduced in [14], is a join-semilattice over the monotone span of B, ordered by logical implication, with bottom element \({\textit{false}}\). The lub \(\mathbin {\sqcup }_{B}\) exists because the domain is finite and closed under conjunction (follows from Theorem 1). A Galois connection with the concrete domain is obtained through the concretization \(\gamma (\varphi ) = \{{\sigma \, | \, \sigma \,\,\models \,\, \varphi }\}\) and the abstraction \(\alpha _{B}(\psi ) = \textrm{MHull}_{B}({\psi })\) [14].

[Algorithm 4: abstract interpretation in \(\mathbb {M}[{B}]\) (pseudocode figure)]

Given a transition system \(({\textit{Init}},\delta )\), iterations of abstract interpretation with the abstract transformer are given by \(\xi _0 = \alpha _{B}({\textit{Init}}), \xi _{i+1} = \alpha _{B}\left( \delta (\gamma (\xi _i)) \cup {\textit{Init}}\right) \). Substituting \(\gamma ,\alpha _{B}\) yields the iterations as shown in Algorithm 4.

Each iterate in Algorithm 4 involves a monotone hull (lines 4 and 6), which is a conjunction of monotonizations. Using Algorithm 1 this can be computed efficiently. We build on this idea to prove an efficient complexity upper bound on Algorithm 4.
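A brute-force sketch of these iterations follows (our encoding, with the basis given as a list of states and the monotone hull computed semantically); an efficient implementation replaces `mono` by Algorithm 1, which is exactly the point of the next subsection.

```python
from itertools import product
from typing import Callable, List, Set, Tuple

State = Tuple[int, ...]
Delta = Callable[[State, State], bool]

def mono(n: int, phi: Set[State], b: State) -> Set[State]:
    """Semantic M_b(phi); Algorithm 1 computes it as a DNF instead."""
    leq = lambda v, x: all(x[p] == v[p] or v[p] == b[p] for p in range(n))
    return {x for x in product((0, 1), repeat=n) if any(leq(v, x) for v in phi)}

def mhull(n: int, phi: Set[State], basis: List[State]) -> Set[State]:
    """alpha_B(phi) = MHull_B(phi): the conjunction of the monotonizations over b in B."""
    out = set(product((0, 1), repeat=n))
    for b in basis:
        out &= mono(n, phi, b)
    return out

def ai_iterates(n: int, init: Set[State], delta: Delta, basis: List[State]) -> List[Set[State]]:
    """xi_0 = alpha_B(Init); xi_{i+1} = alpha_B(delta(xi_i) u Init); stop at a fixpoint."""
    xs = [mhull(n, init, basis)]
    while True:
        post = {y for x in xs[-1] for y in product((0, 1), repeat=n) if delta(x, y)}
        nxt = mhull(n, post | init, basis)
        if nxt == xs[-1]:
            return xs
        xs.append(nxt)
```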

6.2 Complexity Upper Bound

To obtain a complexity upper bound on Algorithm 4 we need to bound the time needed to compute each \(\xi _i\) as well as the number of \(\xi _i\)’s. A bound for the latter is provided by [14]:

Theorem 6

([14]). Let \(({\textit{Init}},\delta ,\textit{Bad})\) be a transition system. Then the algorithm AI-\(\mathbb {M}[{B}]({\textit{Init}},\delta ,\textit{Bad})\) converges within a number of iterations that is at most

where B can be written in DNF as \(b_1 \vee \ldots \vee b_m\), the common cube of \(b_1,\ldots ,b_m\) consists of the literals that appear in all of \(b_1,\ldots ,b_m\) (i.e., their intersection as sets of literals), and the reflection of a cube \(d = \ell _1 \wedge \ldots \wedge \ell _r\) is \(\textit{Ref}({d}) = \lnot \ell _1 \wedge \ldots \wedge \lnot \ell _r\).

We fix a DNF representation of \(B = b_1 \vee \ldots \vee b_m\). For brevity, we use \(\zeta \) to refer to the bound in Theorem 6. When \(\zeta \) is small, it reflects the benefit of using abstract interpretation in \(\textrm{MSpan}({B})\) over exact reachability (even though \(\zeta \) is not always a tight bound) [14]. An example of \(\zeta \) for a simple system appears in Example 3.

In this section we prove that it is possible to implement Algorithm 4 so that its overall complexity is polynomial in the same quantity \(\zeta \), the number of variables n, and the number m of terms in the representation of B:

Theorem 7

Algorithm 4 can be implemented to terminate in \(O(n^2 \zeta + (n+m)\zeta ^2)\) SAT queries and time.

Example 3

Let n be an odd number. Consider a transition system over \(\overline{x} = x_1,\ldots ,x_n\), where \({\textit{Init}}\) is \(\overline{x} = 00\ldots 00\) and the transition relation chooses an even number of variables that are 0 from the initial state and turns them into 1. If we take B to be the singleton set containing the state \(\overline{x} = 11\ldots 11\) (hence, B is a cube and \(m=1\)), then Lemma 3 yields that \(\mathcal {M}_{\overline{x}=00\ldots 00\wedge \overline{x}'=11\ldots 11}({\delta }) = \bigvee _{i=1}^{n}{(x'_i = 0)}\) (see the extended version [15] for details) so \(\zeta = {\left| {\mathcal {M}_{\overline{x}=00\ldots 00\wedge \overline{x}'=11\ldots 11}({\delta })}\right| }_\textrm{dnf} = O(n)\). Theorem 7 shows that an implementation of abstract interpretation in \(\mathbb {M}[{B}]\) for this system terminates in \(O(n^3)\) SAT queries and time. This is significant because a naive implementation of Algorithm 4 would start, for the first iteration of line 6, by computing the exact post-image \(\delta ({\textit{Init}})\); in our example this is the set of states where the parity of \(\overline{x}\) is 0, which cannot be represented in polynomial-size DNF nor CNF [e.g. 8]. Our implementation is able to compute the abstraction of the post-image without constructing the post-image and avoids the blowup in complexity. [14] contains other examples with small \(\zeta \).

At this point, the direct approach to implementing Algorithm 4 is to compute \(\textrm{MHull}_{B}({\varphi })\) in lines 4 and 6 as the conjunction of monotonizations \(\bigwedge _{j=1}^{m}{\mathcal {M}_{b_j}({\varphi })}\), invoking Algorithm 1 on \(\varphi \) once per \(b_j\). Indeed, this achieves a bound that is only slightly worse than Theorem 7 (see Remark 4). In what follows we provide an implementation that both makes the connection to \(\zeta \) explicit and achieves exactly the bound of Theorem 7.

(Algorithm 5: an efficient implementation of Algorithm 4)

Our implementation is displayed in Algorithm 5. The first iterate is computed as described above by invoking Algorithm 1 on \({\textit{Init}}\) (line 4). The SAT queries performed by Algorithm 1 are in this case straightforward, with \(\varphi = {\textit{Init}}\).

To compute the next iterates, we first compute monotonizations of the concrete transformer, \(\delta \vee {\textit{Init}}'\) (line 6). This is a two-vocabulary formula, and accordingly the monotonizations are w.r.t. two-vocabulary cubes. The monotonizations are computed in DNF form and stored. The next iterate \(\xi _{i+1}\) is formed from these DNFs by taking all the combinations of terms whose pre-state part is satisfied by at least one state in \(\xi _i\), and forming the conjunction of the post-state parts: for a term \(t = \ell _1 \wedge \ldots \wedge \ell _{i_1} \wedge \ell '_{i_1+1} \wedge \ldots \wedge \ell '_{i_2}\) over \(\varSigma \uplus \varSigma '\), the restriction \({t}\big |_{\varSigma } = \ell _1 \wedge \ldots \wedge \ell _{i_1}\) and \({t}\big |_{\varSigma '} = \ell '_{i_1+1} \wedge \ldots \wedge \ell '_{i_2}\).
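To illustrate the splitting step, here is a small sketch (ours, not the paper's pseudocode) that represents a two-vocabulary term as a set of literals, restricts it to \(\varSigma \) and \(\varSigma '\), and keeps the post-state parts of the terms whose pre-state part is satisfied by some state of the previous iterate; the explicit-state membership check stands in for the SAT query of the actual algorithm.

```python
# a literal is (variable_index, polarity, primed); a term is a frozenset of literals
def restrict(term, primed):
    # the restriction t|_Sigma (primed=False) or t|_Sigma' (primed=True)
    return frozenset(lit for lit in term if lit[2] == primed)

def pre_part_intersects(term, prev_iterate):
    # would be a SAT query in the actual algorithm; here prev_iterate is an
    # explicit set of pre-states given as bit-tuples
    pre = restrict(term, False)
    return any(all(state[i] == pol for (i, pol, _) in pre) for state in prev_iterate)

def surviving_post_parts(terms, prev_iterate):
    # keep the post-state part of every term whose pre-state part is
    # satisfied by at least one state of the previous iterate
    return [restrict(t, True) for t in terms if pre_part_intersects(t, prev_iterate)]
```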

The intuition is that in the original algorithm, given a set of states \(\xi _i\), we find the set of states in \(\xi _{i+1}\) by applying the transformer \(\delta \vee {\textit{Init}}'\) to the specific \(\xi _i\) and then, for the monotonization of the result, adding also the states that are required by the \(b_j\)-monotone order; this is done for every disjunct \(b_j\) of B, letting \(\xi _{i+1}\) be the conjunction of the said monotonizations. Here, instead, we monotonize \(\delta \vee {\textit{Init}}'\) itself w.r.t. every \(b_j\), so that for every pre-state we have ready the monotonization of the corresponding post-states. We then form \(\xi _{i+1}\) by picking and conjoining the right monotonizations, namely the ones whose pre-state is in the previous frame. (Monotonizing the pre-state part as well is useful for decreasing \(\zeta \), and hence the obtained bound, without altering \(\xi _{i+1}\); the latter stems from the fact that the input \(\xi _i\) is itself the result of such a procedure, so the presence of a pre-state in \(\xi _i\) indicates the presence of all the states in its monotone hull in \(\xi _i\).)

The invocation of Algorithm 1 in line 6 is on a two-vocabulary formula; still, the SAT queries it performs are simple queries about two-vocabulary formulas (and a counterexample is a pair of states).
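The shape of such a two-vocabulary query can be sketched with an off-the-shelf solver. The snippet below is illustrative only (the transition relation and the queried formula are ours, not from the paper); it shows how a counterexample is read back as a pair of states.

```python
from z3 import Bools, Solver, Or, And, Not, sat, is_true

n = 3
xs = Bools(" ".join(f"x{i}" for i in range(n)))    # pre-state vocabulary Sigma
xps = Bools(" ".join(f"xp{i}" for i in range(n)))  # post-state vocabulary Sigma'

# an illustrative transition relation: exactly one variable is flipped
delta = Or(*[And(xps[i] != xs[i],
                 *[xps[j] == xs[j] for j in range(n) if j != i])
             for i in range(n)])

# an illustrative two-vocabulary query: is there a transition whose
# post-state falsifies the formula "some variable is 0"?
candidate = Or(*[Not(x) for x in xps])

s = Solver()
s.add(delta, Not(candidate))
if s.check() == sat:
    m = s.model()
    pre = tuple(1 if is_true(m.evaluate(x, model_completion=True)) else 0 for x in xs)
    post = tuple(1 if is_true(m.evaluate(x, model_completion=True)) else 0 for x in xps)
    print("counterexample: pair of states", pre, post)
```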

It is important for the efficiency result that Algorithm 3 uses Algorithm 1 as a subprocedure. Using Bshouty’s procedure (see Remark 1) would yield a bound in terms of the DNF size of the original transition relation, which could be significantly larger, especially in cases where the abstract interpretation procedure terminates faster than exact forward reachability [14].

The rest of this section proves that Algorithm 5 realizes Theorem 7. To show that it correctly implements Algorithm 4, we need the following fact about Algorithm 4:

Lemma 16

([14]). In Algorithm 4, \(\sigma ' \,\,\models \,\, \xi _{i+1}\) iff there exist \(\sigma _1,\ldots ,\sigma _m \,\,\models \,\, \xi _i\) s.t.

\((\sigma _j, \sigma ') \,\,\models \,\, \mathcal {M}_{\textit{Ref}({b_j}) \wedge b_j'}({\delta \vee {\textit{Init}}'})\) for every \(1 \le j \le m\).     (1)

We use this to show the correctness of Algorithm 5:

Lemma 17

\(\xi _i\) in Algorithm 5 is logically equivalent to \(\xi _i\) in Algorithm 4.

Proof

By induction over i. The correctness of \(\xi _0\) follows from the correctness of Algorithm 1 (Theorem 2). For the same reasons, the DNF representations computed in line 6 of Algorithm 5 are equivalent to the corresponding monotonizations of \(\delta \vee {\textit{Init}}'\). A short DNF manipulation then shows that, for every \(\sigma '\), there exist \(\sigma _1,\ldots ,\sigma _m \,\,\models \,\, \xi _i\) of Algorithm 5 that together with \(\sigma '\) satisfy Eq. 1 iff \(\sigma ' \,\,\models \,\, \xi _{i+1}\) of Algorithm 5. Lemma 16 and the induction hypothesis complete the proof.    \(\square \)

We can now proceed to prove the complexity bound for Algorithm 5.

Lemma 18

Algorithm 5 terminates in \(O(n^2 \zeta + (n+m)\zeta ^2)\) SAT queries and time.

Proof

By Theorem 2, each invocation of Algorithm 1 in line 4 takes \(O\left( n^2 \cdot {\left| {\mathcal {M}_{b_j}({{\textit{Init}}})}\right| }_\textrm{dnf}\right) = O(n^2 \zeta )\) queries and time. Similarly, each invocation in line 6 takes \(O\left( n^2 \cdot {\left| {\mathcal {M}_{\textit{Ref}({b_j}) \wedge b_j'}({\delta \vee {\textit{Init}}'})}\right| }_\textrm{dnf}\right) \) queries and time; this quantity is \(O(n^2 \zeta )\) by Lemma 3.

In each iteration, the number of combinations of terms considered in line 8 is at most \(\zeta \). For each of the m terms in a combination, we split the term into its \(\varSigma \) and \(\varSigma '\) parts in time linear in the term size, which is at most n, and perform a SAT check for whether the term intersects \(\xi _i\). Overall this step involves \(O\left( m \cdot \zeta \right) \) queries and \(O\left( (n+m) \cdot \zeta \right) \) time. This is the cost of each iteration; the number of iterations is bounded by \(\zeta \) by Theorem 6. The claim follows.    \(\square \)

The proof of Theorem 7 follows from Lemmas 17 and 18.

Remark 4

Lemmas 17 and 18 have the consequence that in Algorithm 4, \({\left| {\xi _i}\right| }_\textrm{dnf} \le \zeta \) (interestingly, this holds in particular for the resulting inductive invariant). It follows that the direct implementation of the monotone hull by m calls to Algorithm 1 amounts to \(O(n^2 m \zeta )\) SAT queries in each iteration, and \(O(n^2 m \zeta ^2)\) queries and time overall thanks to Theorem 6. Though asymptotically inferior, this implementation approach may be more efficient than Algorithm 5 when \({\left| {\xi _i}\right| }_\textrm{dnf} \ll \zeta \).

7 Related Work

Complexity Bounds for Invariant Inference. Conjunctive/disjunctive invariants can be inferred in a linear number of SAT calls [16, 25]. On the other hand, inferring CNF/DNF invariants for general transition systems is \({\textbf {NP}}\)-hard with access to a SAT solver [25], even when the invariants are restricted to monotone formulas [12]. Complexity results for model-based interpolation-based invariant inference (stemming from the analysis of the algorithm by [3, 6]) were presented in [13], based on the (backwards) fence condition, which tames reachability enough to efficiently infer monotone and almost-monotone DNF invariants, or monotone and almost-monotone CNF invariants under the (forwards) fence condition with the dual algorithms. Our algorithm (Algorithm 5) can achieve the same bounds if, similarly to [13], the monotone basis (set of counterexamples) is fixed in advance. (This alteration is needed because some short monotone DNF formulas have large CNF size [27].) Our algorithm is more versatile because it applies to CDNF formulas which are not monotone or almost-monotone, and alleviates the need to know a monotone basis in advance. A CDNF complexity bound similar to ours was obtained in [13] under the much stronger assumption that both the backwards and the forwards fence condition hold simultaneously (see Remark 2). Property-directed reachability algorithms [4, 9] were shown efficient on several parametrized examples [14, 34] and the very special case of maximal transition systems for monotone invariants [12]. We show (Sect. 6) that abstract interpretation in the monotone theory, studied under the name \(\varLambda \)-PDR [14] in relation to standard PDR, is efficient in broader circumstances, when the DNF size of certain monotonizations of the transition relation is small, using the same quantity \(\zeta \) that in previous work [14] was established as an upper bound on the number of iterations (without an overall complexity result).

Monotone Theory in Invariant Inference. The monotone theory has been employed in previous works on invariant inference. The aforementioned previous results on inference under the fence condition [13] also employ the monotone theory, and are based on one-to-one translations of Bshouty’s algorithms to invariant inference, replacing equivalence queries by inductiveness checks and membership queries by bounded model checking. For Bshouty’s CDNF algorithm this only works under the stronger two-sided fence condition, which is why our algorithm differs significantly (see Remark 3). The one-sided fence condition suffices for the translation of Bshouty’s \(\varLambda \)-algorithm, which is suitable when a monotone basis is known in advance, e.g. for almost-monotone invariants, whereas the CDNF algorithm learns a monotone basis on-the-fly. Another translation of Bshouty’s algorithm to invariant inference is by Jung et al. [23], who combine the CDNF algorithm of Bshouty [5] with predicate abstraction and templates to infer quantified invariants. They overcome the problem of membership queries in the original algorithm (see Remark 3) heuristically, using under- and over-approximations and sometimes random guesses, which could lead to the need to restart. The monotone theory is also used in a non-algorithmic way in [14] to analyze overapproximation in IC3/PDR through abstract interpretation in the monotone theory, for which we prove a complexity upper bound.

Inferring Decision Tree Invariants. In machine learning, decision trees are a popular representation of hypotheses and target concepts. Garg et al. [19] adapt an algorithm by Quinlan [31] to infer invariants (later, general Horn clauses [11]) in the form of decision trees over numerical and Boolean attributes, which is guaranteed to converge, but not necessarily efficiently overall (even though the algorithm efficiently generates the candidates, the number of candidates could be large). Similarly to our algorithm, the translation of the CDNF algorithm to invariant inference in [13] is applicable also to Boolean decision trees, but, as previously mentioned, requires the stronger two-sided fence condition, whereas our result is the first to do so under the (one-sided) fence condition.

Complexity of Abstract Interpretation. The efficiency of the abstract transformers is crucial to the overall success of abstract interpretation, and is often at odds with the accuracy of the domain; a famous example is the octagon abstract domain [28], whose motivation is the prohibitive cost of the more expressive polyhedra domain [7]. We provide a way to compute abstract transformers in the monotone span domain that is efficient in terms of the DNF size of the result (see also Remark 4). The computation of the abstract transformer in Algorithm 5 is inspired by works in symbolic abstraction [32, 40] about finding representations of the best abstract transformer, rather than computing it anew per input [10, 33, 39].

8 Conclusion

This work has developed invariant inference algorithms with efficient complexity guarantees in two settings, model-based interpolation and property-directed reachability, resolving open problems where the missing component (as it turns out) was a new way to compute monotone overapproximations. A common theme is the use of rich syntactic forms for the formulas that the algorithms maintain: in our model-based interpolation algorithm, the candidate invariant is a conjunction of DNFs, even though the target invariant has both short CNF and DNF representations. In our efficient implementation of abstract interpretation, each iterate is again a conjunction of DNFs, although the natural definition (Theorem 1, and as inspired by PDR) is with CNF formulas. We hope that these ideas can inspire new interpolation-based and property-directed reachability algorithms that benefit in practice from richer hypothesis classes and from techniques of the monotone theory.