
1 Introduction

In this article, we consider programs operating over arrays, or, more generally, maps from an index type to a value type (in the following, we shall use “array” and “map” interchangeably). Such programs contain read (e.g. \(v:=a[i]\)) and write (\(a[i]:=v\)) operations over arrays, as well as “scalar” operations.Footnote 1 We wish to fully automatically verify properties on such programs; e.g. that a sorting algorithm outputs a sorted permutation of its input.

Universally Quantified Properties. Very often, desirable properties over arrays are universally quantified; e.g. sortedness may be expressed as \(\forall k_1,k_2~ k_1 < k_2 \implies a[k_1] \le a[k_2]\). However, formulas with universal quantification and linear arithmetic over integers and at least one predicate symbol (a predicate being a function to the Booleans) form an undecidable class [20], of which some decidable subclasses have however been identified [8]. There is therefore no general algorithm for checking that such invariants hold, let alone inferring them. Yet, there have been several approaches proposed to infer such invariants (see Sect. 7).

We here propose a method for inferring such universally quantified invariants, given a specification on the output of the program. Since the problem is undecidable, our approach may fail to terminate in the general case, or may return “unknown”. Experiments however show that our approach can successfully and automatically verify nontrivial properties (e.g. that the output of selection sort is sorted and is a permutation of the input).

Our key insight is that if there is a proof of safety of an array-manipulating program, it is likely that there exists a proof that can be expressed with simple steps over properties relating only a small number N of (parametric) array cells, called “distinguished cells”. For instance, all the sorting algorithms we tried can be proved correct with \(N=2\), and simple array manipulations (copying, reversing...) with \(N=1\).

Horn Clauses. We convert the verification problem to Horn clauses, a common format for program verification problems [36] supported by a number of tools. Usual conversions [18] map variables and operations from the program to variables of the same type and the same operations in the Horn clause problem:Footnote 2 an integer is mapped to an integer, an array to an array, etc. If arrays are not supported by the back-end analysis, they may be abstracted away (reads replaced by nondeterministic choices, writes discarded) at the expense of precision. In contrast, our approach abstracts programs much less violently, with tunable precision, even though the result is still a Horn clause problem without arrays. Section 3 explains how many properties (e.g. initialization) can be proved using one “distinguished cell” (\(N=1\)), Sect. 4 explains how properties such as sortedness can be proved using two cells; completely discarding arrays corresponds to using zero of them.

An interesting characteristic of the Horn clauses we produce is that they are nonlinearFootnote 3, even though a straightforward translation of the semantics of a control-flow graph into Hoare triples expressed as clauses yields a linear system (whose unfoldings correspond to abstract execution traces). If a final property to prove (e.g. “all values are 0”) queries one cell position, this query may morph, by the backward unfolding of the clauses, into a tree of queries at other locations, in contrast to some earlier approaches [31].

We illustrate this approach with automated proofs of several examples from the literature: we apply Sects. 3, 4 or 5 to obtain a system of Horn clauses without arrays. This system is then fed to the Z3, Eldarica or Spacer solver, which produces a model of this system, meaning that the postcondition (e.g. sortedness or multiset of the output equal to that of the input) truly holds.Footnote 4

Previous approaches [31] using “distinguished cells” amounted (even though not described as such) to linear Horn rules; in contrast, our abstract semantics uses nonlinear Horn rules, which leads to higher precision (Sect. 7.1).

Contributions. Our main contribution is a system of rules for transforming the atomic program statements in a program operating over arrays or maps, as well as (optionally) the universally quantified postcondition to prove, into a system of non-linear Horn clauses over scalar variables only. The precision of this transformation is tunable using a Galois connection parameterized by the number of “distinguished cells”; e.g. properties such as sortedness need two distinguished cells (Sect. 4) while simpler properties need only one (Sect. 3). Statements operating over non-arrays variables are mapped (almost) identically to their concrete semantics. This system over-approximates the behavior of the program. A solution of that system can be mapped to inductive invariants over the original programs, including universal properties over arrays.

A second contribution, based on the first, is a system of rules that also keeps track of array/map contents (Sect. 5) as a multiset. This system is suitable for showing content properties, e.g. that the output of a sorting algorithm is a permutation of the input, even though the sequence of operations is not directly a sequence of swaps.

We implemented our approach and benchmarked it over several classical examples of array algorithms (Sect. 6), comparing it favorably to other tools.

2 Program Verification as Solving Horn Clauses

A classical approach to program analysis is to consider a program as a control-flow graph and to attach to each vertex \(p_i\) (control point) an inductive invariant \(I_i\): a set of possible values \(\mathbf {x}\) of the program variables (and memory stack and heap, as needed) so that (i) the set associated to the initial control point \(p_{i_0}\) contains the possible initialization values \(S_{i_0}\); (ii) for each edge \(p_i \rightarrow _c p_j\) (c for concrete), the set \(I_j\) associated to the target control point \(p_j\) should include all the states reachable from the states in the set \(I_i\) associated to the source control point \(p_i\) according to the transition relation \(\tau _{i,j}\) of the edge. Inductiveness is thus defined by Horn clauses:Footnote 5

$$\begin{aligned} \forall \mathbf {x},~ S_{i_0}(\mathbf {x}) \implies I_{i_0}(\mathbf {x}) \end{aligned}$$
(1)
$$\begin{aligned} \forall \mathbf {x},\mathbf {x}',~ I_i(\mathbf {x}) \wedge \tau _{i,j}(\mathbf {x},\mathbf {x}') \implies I_j(\mathbf {x}') \end{aligned}$$
(2)

For proving safety properties, in addition to inductiveness, one requires that error locations \(p_{e_1},\dots ,p_{e_n}\) are proved to be unreachable (the associated set of states is empty): this amounts to Horn clauses implying false: \(\forall \mathbf {x},~ I_{e_i}(\mathbf {x}) \Rightarrow \bot \).

Various tools can solve such systems of Horn clauses, that is, can synthesize suitable predicates \(I_i\), which constitute inductive invariants. In this article, we tried Z3 Footnote 6 with the PDR fixed point solver [22], Z3 with the Spacer solver [24, 25],Footnote 7 and Eldarica [35].Footnote 8 Since program verification is undecidable, such tools, in general, may fail to terminate, or may return “unknown”.

For the sake of simplicity, we shall consider, in this article, that all integer variables in programs are mathematical integers (\(\mathbb {Z}\)) as opposed to machine integersFootnote 9 and that arrays are infinite. Again, it is easy to modify our semantics to include systematic array bound checks, jumps to error conditions, etc.

In examples, instead of writing \(I_{ stmt }\) for the name of the predicate (inductive invariant) at statement \( stmt \), we shall write \( stmt \) directly, for readability’s sake: thus we write e.g. \( loop \) for a predicate at the head of a loop. Furthermore, for readability, we shall sometimes coalesce several successive statements into one.

Fig. 1. Compact control-flow graph for Program 1 (figure omitted).

Example 1

(Motivating example). Consider the program:

figure a
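The program listing itself does not survive extraction; the following minimal Python rendering, consistent with the control-flow graph of Fig. 1 and the clauses below, is our reconstruction (the function name `program1` is ours):

```python
def program1(n):
    """Hedged reconstruction of Program 1: fill a[0..n-1] with 42."""
    a = [0] * n          # stand-in for an array of arbitrary initial content
    i = 0
    while i < n:         # control point: loop
        a[i] = 42        # the array write
        i = i + 1        # control point: incr
    return a             # control point: end

assert all(v == 42 for v in program1(5))   # the property to prove
```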

We would like to prove that this program truly fills the array a with the value 42. The flat encoding into Horn clauses assigns a predicate (set of states) to each of the control nodes (Fig. 1), and turns each transition into a Horn rule with variables ranging in \( Arr \left( A,B\right) \), the type of arrays of B indexed by A [26, Chap. 7]:

$$\begin{aligned} \begin{aligned} \forall n\in \mathbb {Z}~ \forall a\in Arr \left( \mathbb {Z},\mathbb {Z}\right) ~ n>0 \implies loop (n,0,a) \end{aligned} \end{aligned}$$
(3)
$$\begin{aligned} \begin{aligned} \forall n,i\in \mathbb {Z}~ \forall a\in Arr \left( \mathbb {Z},\mathbb {Z}\right) ~ i<n \wedge loop (n,i,a) \implies loop (n,i+1, store (a,i,42)) \end{aligned} \end{aligned}$$
(4)
$$\begin{aligned} \begin{aligned} \forall n,i\in \mathbb {Z}~ \forall a\in Arr \left( \mathbb {Z},\mathbb {Z}\right) ~ i\ge n \wedge loop (n,i,a) \implies end (n,i,a) \end{aligned} \end{aligned}$$
(5)
$$\begin{aligned} \begin{aligned} \forall x,n,i\in \mathbb {Z}~ \forall a\in Arr \left( \mathbb {Z},\mathbb {Z}\right) ~ 0 \le x < n \wedge end (n,i,a) \implies select(a,x) = 42 \end{aligned} \end{aligned}$$
(6)

where \( store (a,i,v)\) is array a where the value at index i has been replaced by v and \( select (a,x)\) denotes a[x].

None of the tools we have tried (Z3, Spacer, Eldarica) has been able to solve this system, presumably because they cannot infer universally quantified invariants over arrays.Footnote 10 Indeed, here the loop invariant needed is

$$\begin{aligned} 0 \le i \le n \wedge (\forall k~ 0 \le k < i \implies a[k] = 42) \end{aligned}$$
(7)

While \(0 \le i \le n\) is inferred by a variety of approaches, the rest is tougher. We shall also see (Example 6) that a slight alteration of this example also prevents some earlier abstraction approaches from checking the desired property.

Most software model checkers attempt to construct invariants from Craig interpolants obtained from refutations [9] of the reachability of error states in local [22] or global [29] unfoldings of the problem. However, interpolation over array properties is difficult, especially since the goal is not to provide just any interpolant, but interpolants that generalize well to invariants [1, 2]. This article instead introduces a way to derive universally quantified invariants from the analysis of a system of Horn clauses over scalar variables (without array variables).

3 Getting Rid of the Arrays

To use the power of Horn solvers, we soundly abstract problems with arrays to problems without arrays. In the Horn clauses for Example 1, we attached to each program point \(p_\ell \) a predicate \(I_\ell \) over \(\mathbb {Z}\times \mathbb {Z}\times Arr \left( \mathbb {Z},\mathbb {Z}\right) \): the program variables are two integers i, n and one integer-valued, integer-indexed array a.Footnote 11 In any solution of the system of clauses, if the valuation (i, n, a) is reachable at program point \(p_\ell \), then \(I_\ell (i,n,a)\) holds. Instead, in the case of Example 1, we will consider a predicate \(I^\sharp _\ell \) over \(\mathbb {Z}\times \mathbb {Z}\times \mathbb {Z}\times \mathbb {Z}\) (the array \( key \rightarrow value \) has been replaced by a pair \(( key , value )\)) such that \(I^\sharp _\ell (i,n,k,a_k)\)Footnote 12 holds for each reachable state (i, n, a) satisfying \(a[k]=a_k\). This is the same Galois connection [11] as in some earlier works [31] [13, Sect. 2.1]; yet, as we shall see, our abstract transformers are more precise.

Definition 1

The “one distinguished cell” abstraction of \(I \subseteq \chi \times Arr \left( \iota ,\beta \right) \) is \(\alpha _{}(I) = \{ (\mathbf {x},i,a[i]) \mid (\mathbf {x},a) \in I, i \in \iota \}\). The concretization of \(I^\sharp \subseteq \chi \times (\iota \times \beta )\) is \(\gamma _{}(I^\sharp ) = \{ (\mathbf {x},a) \mid \forall i\in \iota ~ (\mathbf {x},i,a[i]) \in I^\sharp \}\).

Theorem 1

\(\alpha _{}\) and \(\gamma _{}\) form a Galois connection.
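On a finite toy domain, the pair \((\alpha , \gamma )\) of Definition 1 can be checked directly. The following sketch (the helper names and the tiny domain are ours) verifies the over-approximation property \(\gamma (\alpha (I)) \supseteq I\) and the idempotence \(\alpha \circ \gamma \circ \alpha = \alpha \) expected of a Galois connection:

```python
from itertools import product

# Finite sketch of Definition 1: states are pairs (x, a) with x a scalar
# and a an array of length 3 over {0, 1, 2}.
def alpha(I):
    return {(x, i, a[i]) for (x, a) in I for i in range(len(a))}

def gamma(I_sharp, length=3, values=range(3)):
    out = set()
    for x in {t[0] for t in I_sharp}:
        for a in product(values, repeat=length):
            if all((x, i, a[i]) in I_sharp for i in range(length)):
                out.add((x, a))
    return out

I = {(7, (0, 1, 2)), (7, (2, 1, 0))}
assert I <= gamma(alpha(I))                # gamma(alpha(I)) over-approximates I
assert alpha(gamma(alpha(I))) == alpha(I)  # alpha . gamma . alpha = alpha
```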

To provide the abstract transformers, we will suppose in the sequel that any statement in the program (control-flow graph) is: (i) an array read into a fresh variable, \(v:=a[i]\) in pseudo-code; the variables of the program are \((\mathbf {x},i,v)\), where \(\mathbf {x}\) is a vector of arbitrarily many variables; (ii) an array write, \(a[i]:=v\) in pseudo-code (where i and v are variables); the variables of the program are \((\mathbf {x},i,v)\) before and after the statement; (iii) or a scalar operation, including assignments and guards over scalar variables. More complex statements can be transformed into a sequence of such statements, by introducing temporary variables if needed: for instance, \(a[i]:=a[j]\) is transformed into \( temp :=a[j];~ a[i]:= temp \).

Definition 2

(Read statement). Let v be a variable of type \(\beta \), i a variable of type \(\iota \), and a an array of values of type \(\beta \) with an index of type \(\iota \). Let \(\mathbf {x}\) be the other program variables, taken in \(\chi \). The concrete “next state” relation for the read statement \(v:=a[i]\) between locations \(p_1\) and \(p_2\) is \((\mathbf {x},i,v,a) \rightarrow _c (\mathbf {x},i,a[i],a)\).

Its forward abstract semantics is encoded into two Horn clauses:

$$\begin{aligned} \begin{aligned} \forall \mathbf {x}\in \chi ~ \forall i \in \iota ~ \forall v,a_i \in \beta ~ \forall k \in \iota ~ \forall a_k \in \beta \\ k \ne i \wedge I^\sharp _1\big ((\mathbf {x},i,v),(k,a_k)\big ) \wedge I^\sharp _1\big ((\mathbf {x},i,v),(i,a_i)\big ) \implies I^\sharp _2\big ((\mathbf {x},i,a_i),(k,a_k)\big ) \end{aligned} \end{aligned}$$
(8)
$$\begin{aligned} \begin{aligned} \forall \mathbf {x}\in \chi ~ \forall i \in \iota ~ \forall v,a_i \in \beta ~ \forall k \in \iota ~ \forall a_k \in \beta \\ I^\sharp _1\big ((\mathbf {x},i,v),(i,a_i)\big ) \implies I^\sharp _2\big ((\mathbf {x},i,a_i),(i,a_i)\big ) \end{aligned} \end{aligned}$$
(9)

The tuple \((k,a_k)\) now represents a “distinguished cell”. While rule 9 is straightforward (\(a_i\) is assigned to the variable v), the nonlinear rule 8 may be more difficult to comprehend. The intuition is that, to have both \(a_i = a[i]\) and \(a_k = a[k]\) at the read instruction with a given valuation \((\mathbf {x},i)\) of the other variables, both \(a_i = a[i]\) and \(a_k = a[k]\) had to be reachable with the same valuation.

Remark 1

We use two separate rules for \(k=i\) and \(k\ne i\) for better precision. A single rule \(I^\sharp _1\big ((\mathbf {x},i,v),(k,a_k)\big ) \wedge I^\sharp _1\big ((\mathbf {x},i,v),(i,a_i)\big ) \implies I^\sharp _2\big ((\mathbf {x},i,a_i),(k,a_k)\big )\) would not enforce that if \(i=k\) then \(a_i = a_k\) in the consequent.
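The effect of rules 8 and 9 can be replayed on finite abstract sets. In the sketch below (the tuple encoding and the function `read_step` are ours; the scalar vector \(\mathbf {x}\) is omitted for brevity), the \(k \ne i\) guard of rule 8 blocks exactly the spurious tuple that the merged single rule of this remark would admit:

```python
# Rules (8) and (9) as a transformer on finite abstract sets of tuples
# ((i, v), (k, ak)): i is the read index, v the pre-read value of the
# destination variable, (k, ak) the distinguished cell.
def read_step(I1):
    I2 = set()
    for ((i, v), (k, ak)) in I1:
        if k == i:                                         # rule (9)
            I2.add(((i, ak), (i, ak)))
    for ((i, v), (k, ak)) in I1:
        for ((i2, v2), (k2, a2)) in I1:
            if (i2, v2) == (i, v) and k2 == i and k != i:  # rule (8)
                I2.add(((i, a2), (k, ak)))
    return I2

# Before v := a[i] with i = 0: two possible arrays, a = [1, 2] and a = [3, 2].
I1 = {((0, 9), (0, 1)), ((0, 9), (1, 2)), ((0, 9), (0, 3))}
I2 = read_step(I1)
assert ((0, 3), (0, 1)) not in I2   # the k != i guard blocks this spurious tuple
assert ((0, 1), (0, 1)) in I2 and ((0, 3), (0, 3)) in I2
```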

From now on, we shall omit all universal quantifiers inside rules, for readability.

Definition 3

(Write statement). The concrete “next state” relation for the write statement is \((\mathbf {x},i,v,a) \rightarrow _c (\mathbf {x},i,v, store (a,i,v))\). Its forward abstract semantics is encoded into two Horn clauses, depending on whether the distinguished cell is i or not:

$$\begin{aligned} \begin{aligned} I^\sharp _1\big ((\mathbf {x},i,v),(k,a_k)\big ) \wedge i \ne k \implies I^\sharp _2\big ((\mathbf {x},i,v),(k,a_k)\big ) \end{aligned} \end{aligned}$$
(10)
$$\begin{aligned} \begin{aligned} I^\sharp _1\big ((\mathbf {x},i,v),(i,a_i)\big ) \implies I^\sharp _2\big ((\mathbf {x},i,v),(i,v)\big ) \end{aligned} \end{aligned}$$
(11)
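Rules 10 and 11 amount to a case split on whether the distinguished cell is the written one. A finite sketch (the tuple encoding is ours, other scalars omitted):

```python
# Abstract write transformer for a[i] := v (rules 10 and 11), on finite
# abstract sets of tuples ((i, v), (k, ak)).
def write_step(I1):
    I2 = set()
    for ((i, v), (k, ak)) in I1:
        if k != i:
            I2.add(((i, v), (k, ak)))   # rule (10): untouched cell kept
        else:
            I2.add(((i, v), (i, v)))    # rule (11): cell i now holds v
    return I2

# i = 0, v = 42, one possible array a = [7, 8]:
I1 = {((0, 42), (0, 7)), ((0, 42), (1, 8))}
assert write_step(I1) == {((0, 42), (0, 42)), ((0, 42), (1, 8))}  # a = [42, 8]
```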

Example 2

(Example 1, cont.). The \(a[i]:=42\) statement of Example 1 is translated into (the loop control point is divided into loop / write / incr, all predicates of arity 4):

$$\begin{aligned} \begin{aligned} i \ne k \wedge write (n,i,k,a_k) \implies incr (n,i,k,a_k) \end{aligned} \end{aligned}$$
(12)
$$\begin{aligned} \begin{aligned} write (n,i,i,a_i) \implies incr (n,i,i,42) \end{aligned} \end{aligned}$$
(13)

Definition 4

(Initialization). The creation of an array variable with nondeterministically chosen initial content is abstracted by \(I^\sharp _1(\mathbf {x}) \implies I^\sharp _2(\mathbf {x},k,a_k)\).

Definition 5

(Scalar statements). With the same notations as above, we consider a statement (or sequence thereof) operating only on scalar variables: \(\mathbf {x}\rightarrow _s \mathbf {x}'\) if it is possible to obtain scalar values \(\mathbf {x}'\) after executing the statement on scalar values \(\mathbf {x}\). The concrete “next state” relation for that statement is \((\mathbf {x},i,v,a) \rightarrow _c (\mathbf {x}',i,v,a)\). Its forward abstract semantics is encoded into:

$$\begin{aligned} I^\sharp _1(\mathbf {x},k,a_k) \wedge \mathbf {x}\rightarrow _s \mathbf {x}' \implies I^\sharp _2(\mathbf {x}',k,a_k) \end{aligned}$$
(14)

Example 3

A test \(x \ne y\) gets abstracted as

$$\begin{aligned} I^\sharp _1(x,y,k,a_k) \wedge x \ne y \implies I^\sharp _2(x,y,k,a_k) \end{aligned}$$
(15)

Definition 6

The scalar operation \( kill (v_1,\dots ,v_n)\) removes variables \(v_1,\dots ,v_n\): \((\mathbf {x},v_1,\dots ,v_n) \rightarrow \mathbf {x}\). We shall apply it to get rid of dead variables, sometimes, for the sake of brevity, without explicit note, by coalescing it with other operations.

Our Horn rules are of the form \(\forall \mathbf {y}~ I_1^\sharp (\mathbf {f_1}(\mathbf {y})) \wedge \dots \wedge I_1^\sharp (\mathbf {f_m}(\mathbf {y})) \wedge P(\mathbf {y}) \implies I^\sharp _2(\mathbf {g}(\mathbf {y}))\) (\(\mathbf {y}\) is a vector of variables, \(\mathbf {f_1},\dots ,\mathbf {f_m}\) vectors of terms depending on \(\mathbf {y}\), P an arithmetic predicate over \(\mathbf {y}\)). In other words, they impose in \(I^\sharp _2\) the presence of \(\mathbf {g}(\mathbf {y})\) as soon as certain \(\mathbf {f_1}(\mathbf {y}),\dots ,\mathbf {f_m}(\mathbf {y})\) are found in \(I_1^\sharp \). Let \(I^\sharp _{2-}\) be the set of such imposed elements. This Horn rule is said to be sound if \(\gamma (I^\sharp _{2-})\) includes all states \((\mathbf {x}',a')\) such that there exists \((\mathbf {x},a)\) in \(\gamma (I^\sharp _1)\) and \((\mathbf {x},a) \rightarrow _c (\mathbf {x}',a')\).

Lemma 1

The forward abstract semantics of the read statement (Definition 2), of the write statement (Definition 3), of array initialization (Definition 4), of the scalar statements (Definition 5) are sound w.r.t the Galois connection.

Remark 2

The scalar statements include “killing” dead variables (Definition 6). Note that, contrary to many other abstractions, in ours, removing some variables may cause irrecoverable loss of precision on other variables [31, Sect. 4.2]: if v is live, then one can represent \(\forall k,~ a[k]=v\), which implies \(\forall k_1,k_2~ a[k_1] = a[k_2]\) (constantness), but if v is discarded, the constantness of a is lost.
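The loss described in this remark can be replayed on arrays of length 2 over \(\{0,1\}\) (the finite domain and helper names below are ours):

```python
from itertools import product

# Remark 2 replayed on arrays of length 2 over {0, 1}.
V, K = (0, 1), (0, 1)

def gamma_with_v(I):      # states (v, a) whose every triple (v, k, a[k]) is in I
    return {(v, a) for v in V for a in product(V, repeat=2)
            if all((v, k, a[k]) in I for k in K)}

def gamma_without_v(I):   # after killing v: only pairs (k, a[k]) remain
    return {a for a in product(V, repeat=2)
            if all((k, a[k]) in I for k in K)}

# Abstract value expressing "forall k, a[k] = v":
I_v = {(v, k, v) for v in V for k in K}
assert {a for (v, a) in gamma_with_v(I_v)} == {(0, 0), (1, 1)}  # constant arrays only

# Killing v projects it away; constantness of a is irrecoverably lost:
I_kill = {(k, ak) for (v, k, ak) in I_v}
assert gamma_without_v(I_kill) == set(product(V, repeat=2))     # all arrays
```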

Theorem 2

If \(I^\sharp _1,\dots ,I^\sharp _m\) are a solution of a system of Horn clauses sound in the above sense, then \(\gamma (I^\sharp _1),\dots ,\gamma (I^\sharp _m)\) are inductive invariants w.r.t the concrete semantics \(\rightarrow _c\).

Definition 7

(Property conversion). A property “at program point \(p_\ell \), for all \(\mathbf {x}\in \chi \) and all \(k \in \iota \), \(\phi (\mathbf {x},k,a[k])\) holds” (where \(\phi \) is a formula, say over arithmetic) is converted into a Horn query \(\forall \mathbf {x}\in \chi ~ \forall k \in \iota ~ I^\sharp _{\ell }(\mathbf {x},k,a_k) \implies \phi (\mathbf {x},k,a_k)\).

Our method for converting a scalar program into a system of Horn clauses over scalar variables is thus:

Algorithm 1

(Abstraction into Horn constraints). Given the control-flow graph of the program:

  1. To each control point \(p_\ell \), with vector of scalar variables \(\mathbf {x}_\ell \), associate a predicate \(I^\sharp _\ell (\mathbf {x}_\ell , k, a_k)\) in the Horn clause system (the vector of scalar variables may change from control point to control point).

  2. For each transition of the program, generate Horn rules according to Definitions 2, 3, 5 as applicable (an initialization node has no antecedents in its rule).

  3. Generate Horn queries from desired properties according to Definition 7.

Example 4

(Example 1, continued). Let us now apply the Horn abstract semantics from Definitions 3, 4 and 5 to Program 1; in this case, \(\beta = \mathbb {Z}\), \(\iota = \{ 0,\dots ,n-1 \}\) (thus we always have \(0\le k < n\)), \(\chi = \mathbb {Z}\). After slight simplification, we get:

$$\begin{aligned} \begin{aligned} 0\le k < n \implies loop (n,0,k,a_k) \end{aligned}\end{aligned}$$
(16)
$$\begin{aligned} \begin{aligned} 0 \le k< n \wedge i < n \wedge loop (n,i,k,a_k) \implies write (n,i,k,a_k) \end{aligned} \end{aligned}$$
(17)
$$\begin{aligned} \begin{aligned} 0 \le k < n \wedge i \ne k \wedge write (n,i,k,a_k) \implies incr (n,i,k,a_k) \end{aligned}\end{aligned}$$
(18)
$$\begin{aligned} \begin{aligned} write (n,i,i,a_i) \implies incr (n,i,i,42) \end{aligned}\end{aligned}$$
(19)
$$\begin{aligned} \begin{aligned} 0 \le k < n \wedge incr (n,i,k,a_k) \implies loop (n,i+1,k,a_k) \end{aligned}\end{aligned}$$
(20)
$$\begin{aligned} \begin{aligned} ~ 0 \le k < n \wedge i \ge n \wedge loop (n,i,k,a_k) \implies end (n,i,k,a_k) \end{aligned} \end{aligned}$$
(21)

Finally, we add the postcondition (using Definition 7):

$$\begin{aligned} \begin{aligned} 0 \le k < n \wedge end (n,i,k,a_k) \Rightarrow a_k = 42 \end{aligned} \end{aligned}$$
(22)

A solution to the resulting system of Horn clauses can be found by e.g. Z3.
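Indeed, one can check by brute force over a small finite domain that the expected invariant is a model of clauses (16)–(22). The predicate definitions below are our candidate solution (essentially invariant (7) restricted to the distinguished cell), not solver output:

```python
# Candidate model for clauses (16)-(22), checked by brute force on a
# finite domain. Each assert encodes one clause as an implication.
def loop(n, i, k, ak):  return 0 <= k < n and (ak == 42 or k >= i)
def write(n, i, k, ak): return loop(n, i, k, ak) and i < n
def incr(n, i, k, ak):  return 0 <= k < n and i < n and (ak == 42 or k > i)
def end(n, i, k, ak):   return loop(n, i, k, ak) and i >= n

R, V = range(-1, 4), (41, 42)
for n in R:
    for i in R:
        for k in R:
            for ak in V:
                assert not (0 <= k < n) or loop(n, 0, k, ak)                                   # (16)
                assert not (0 <= k < n and i < n and loop(n, i, k, ak)) or write(n, i, k, ak)  # (17)
                assert not (0 <= k < n and i != k and write(n, i, k, ak)) or incr(n, i, k, ak) # (18)
                assert not write(n, i, i, ak) or incr(n, i, i, 42)                             # (19)
                assert not (0 <= k < n and incr(n, i, k, ak)) or loop(n, i + 1, k, ak)         # (20)
                assert not (0 <= k < n and i >= n and loop(n, i, k, ak)) or end(n, i, k, ak)   # (21)
                assert not (0 <= k < n and end(n, i, k, ak)) or ak == 42                       # (22)
```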

Our approach can also be used to establish relationships between several arrays, or between the initial values in an array and the final values: arrays a[i] and b[j] can be abstracted by a quadruple \((i,a_i,j,b_j)\).Footnote 13

Example 5

Consider the problem of finding the minimum b of an array slice \(a[d \dots h-1]\), together with a position p such that \(b=a[p]\):

figure b
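The listing above is not reproduced in this extraction; a hedged Python version matching the specification (the function name `slice_min` is ours) is:

```python
def slice_min(a, d, h):
    """Hedged reconstruction: minimum b of a[d..h-1] and a position p with b = a[p]."""
    p, b = d, a[d]
    i = d + 1
    while i < h:
        if a[i] < b:
            b, p = a[i], i
        i = i + 1
    return b, p

a = [5, 3, 9, 3, 7]
b, p = slice_min(a, 1, 4)
assert b == a[p]                            # postcondition (23)
assert all(b <= a[k] for k in range(1, 4))  # postcondition (24)
```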

Again, we encode the abstraction of the statements (Definitions 2, 3, 5) as Horn clauses. We obtain a predicate \( end (d,h,p,b,k,a_k)\) constrained as follows:

$$\begin{aligned} \begin{aligned} end (d,h,p,b,p,a_p) \implies b = a_p \end{aligned} \end{aligned}$$
(23)
$$\begin{aligned} \begin{aligned} d \le k < h \wedge end (d,h,p,b,k,a_k) \implies b \le a_k \end{aligned} \end{aligned}$$
(24)

Rule 23 imposes the postcondition \(b = a[p]\), Rule 24 imposes the postcondition \(\forall k~ d \le k < h \implies b \le a[k]\). This example is again solved by Z3.

Earlier approaches [31], based on translation to programs (thus to transition systems), are equivalent to a translation into linear Horn clauses where \(x_1,\dots ,x_p\) are the same in the antecedent and the consequent:

$$\begin{aligned} I_1(\dots ,x_1,a_1,\dots ,x_p, a_p) \wedge condition \rightarrow I_2(\dots ,x_1,a'_1,\dots ,x_p,a'_p) \end{aligned}$$
(25)

In contrast, in this article we use a much more powerful translation to nonlinear Horn clauses (Sect. 7.1) where \(x''_1,\dots ,x''_p\) differ from \(x_1,\dots ,x_p\):

$$\begin{aligned} I_1(\dots ,x_1,a_1,\dots ,x_p,a_p) \wedge \dots \wedge&I_1(\dots ,x''_1,a''_1,\dots ,x''_p,a''_p) \wedge condition \nonumber \\&\qquad \qquad \rightarrow I_2(\dots ,x_1,a'_1,\dots ,x_p,a'_p) \end{aligned}$$
(26)

Example 6

(Motivating example, altered). The earlier work [31] could successfully analyze Example 1. However, a slight modification of the program prevents it from doing so:

figure c
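The listing above is not reproduced in this extraction; from the clauses below, the altered program plausibly fills the array with 42 and then scans it, setting a flag f when some cell differs from 42. A hedged Python rendering:

```python
def altered(n):
    """Hedged reconstruction: fill a with 42, then scan for non-42 cells."""
    a = [0] * n
    i = 0
    while i < n:          # first loop: a[i] := 42
        a[i] = 42
        i = i + 1
    i, f = 0, 0
    while i < n:          # second loop (loop2): x := a[i]; if x != 42: f := 1
        x = a[i]
        if x != 42:
            f = 1
        i = i + 1
    return f              # property (34): f = 0 at end2

assert altered(5) == 0
```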

For this particular example, the array would be abstracted all over the program using a fixed number of cells \(a[x_1],\dots ,a[x_p]\), where \(x_1,\dots ,x_p\) are symbolic constants.

The second loop is then analyzed as though each read of a[i] yielded \(a_{x_j}\) when \(i = x_j\) for some j, and a nondeterministic value otherwise (listing omitted).Footnote 14

One easily sees that if \(p < n\), there must be a loop iteration where i differs from all of \(x_1,\dots ,x_p\); at that iteration, x takes any value, and thus f may take value 1.

The output of our translation process in this example is the same up to the \( end \) control point; from there on it is:Footnote 15

$$\begin{aligned} \begin{aligned} end (n,i,k,a_k) \implies loop2 (n,0,k,a_k,0) \end{aligned} \end{aligned}$$
(27)
$$\begin{aligned} \begin{aligned} loop2 (n,i,k,a_k,f) \wedge i< n \implies rd2 (n,i,x,k,a_k,f) \end{aligned}\end{aligned}$$
(28)
$$\begin{aligned} \begin{aligned} rd2 (n,i,x,k,a_k,f) \wedge i\ne k \wedge {\varvec{rd}}{} \mathbf{2}{\varvec{(n,i,x,i,}}{\varvec{a}}_{\varvec{i}}{\varvec{,f)}} \implies test2 (n,i,a_i,k,a_k,f) \end{aligned}\end{aligned}$$
(29)
$$\begin{aligned} \begin{aligned} rd2 (n,i,x,i,a_i,f) \implies test2 (n,i,a_i,i,a_i,f) \end{aligned}\end{aligned}$$
(30)
$$\begin{aligned} \begin{aligned} test2 (n,i,x,k,a_k,f) \wedge x\ne 42 \implies loop2 (n,i+1,k,a_k,1) \end{aligned}\end{aligned}$$
(31)
$$\begin{aligned} \begin{aligned} test2 (n,i,x,k,a_k,f) \wedge x= 42 \implies loop2 (n,i+1,k,a_k,f) \end{aligned}\end{aligned}$$
(32)
$$\begin{aligned} \begin{aligned} loop2 (n,i,k,a_k,f) \wedge i\ge n \implies end2 (f) \end{aligned}\end{aligned}$$
(33)
$$\begin{aligned} \begin{aligned} end2 (f) \Rightarrow f=0 \quad \text {(property to prove)} \end{aligned} \end{aligned}$$
(34)

This abstraction is precise enough to prove the desired property, thanks to the bold antecedent (multiple cell indices occur within the same unfolding). Without this nonlinear rule (removing this antecedent yields a sound abstraction equivalent to [31]), the unfoldings of the system of rules all carry the same k (index of the distinguished cell) between predicates: the program is analyzed with respect to one single a[k], with k symbolic, leading to insufficient precision.

No Restrictions on Domain Type and Relationships, and Matrices. The kind of relationship that can be inferred between loop indices, array indices and array contents is limited only by the capabilities of the Horn solver. For instance, invariants of the form \(\forall i~ i \equiv 0 \pmod 2 \implies a[i]=0\) may be inferred if the Horn solver supports numeric invariants involving divisibility. Similarly, we have made no assumption regarding the nature of the indexing variable: we used integers because arrays indexed by an integer range are a very common kind of data structure, but really it can be any type supported by the Horn clause solver, e.g. rationals or strings. For instance, matrices (resp. n-tensors) are specified by having pairs of integers (resp. n-tuples) as indices.

4 Sortedness and Other N-ary Predicates

The Galois connection of Definition 1 expresses relations of the form \(\forall k \in \iota ~ \phi (\mathbf {x}, k, a[k])\) where \(\mathbf {x}\) are variables from the program, a a map and k an index into the map a; in other words, relations between each array element individually and the rest of the variables. It cannot express properties such as sortedness, which link two array elements: \(\forall k_1,k_2 \in \iota ~ k_1 < k_2 \implies a[k_1] \le a[k_2]\).

For such properties, we need two “distinguished cells”, with indices \(k_1\) and \(k_2\). For efficiency, we break the symmetry between indices \(k_1\) and \(k_2\) by imposing \(k_1 < k_2\) for some total order \(<\) on \(\iota \).

Definition 8

The abstraction with indices \(k_1 < k_2\) is

$$\begin{aligned} \gamma _{2<}(I^\sharp ) = \{ (\mathbf {x},a) \mid \forall k_1 < k_2\in \iota ~ (\mathbf {x},k_1,a[k_1],k_2,a[k_2]) \in I^\sharp \} \end{aligned}$$
(35)
$$\begin{aligned} \alpha _{2<}(I) = \{ (\mathbf {x},k_1,a[k_1],k_2,a[k_2]) \mid (\mathbf {x},a) \in I, k_1 < k_2 \in \iota \} \end{aligned}$$
(36)

Theorem 3

\(\alpha _{2<}\) and \(\gamma _{2<}\) form a Galois connection.

These constructions easily generalize to arbitrary N indices \(k_1,\dots ,k_N\).

Definition 9

(Read, two indices \(k_1 < k_2\) ). The abstraction of \(v := a[i]\) is:

$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, k_2, a_{k_2}) \wedge I^\sharp _1(\mathbf {x}, i, v, i, a_i, k_2, a_{k_2}) \wedge \\ I^\sharp _1(\mathbf {x}, i, v, i, a_i, k_1, a_{k_1}) \wedge i< k_1 < k_2 \implies I^\sharp _2(\mathbf {x}, i, a_i, k_1, a_{k_1}, k_2, a_{k_2}) \end{aligned} \end{aligned}$$
(37)
$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, i, a_i, k_2, a_{k_2}) \wedge I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, k_2, a_{k_2}) \wedge \\ I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, i, a_i) \wedge k_1< i < k_2 \implies I^\sharp _2(\mathbf {x}, i, a_i, k_1, a_{k_1}, k_2, a_{k_2}) \end{aligned}\end{aligned}$$
(38)
$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, k_2, a_{k_2}, i, a_i) \wedge I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, i, a_i) \wedge \\ I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, k_2, a_{k_2}) \wedge k_1< k_2 < i \implies I^\sharp _2(\mathbf {x}, i, a_i, k_1, a_{k_1}, k_2, a_{k_2}) \end{aligned}\end{aligned}$$
(39)
$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, i, a_i, k_2, a_{k_2}) \wedge i < k_2 \implies I^\sharp _2(\mathbf {x}, i, a_i, i, a_i, k_2, a_{k_2}) \end{aligned}\end{aligned}$$
(40)
$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, i, a_i) \wedge k_1 < i \implies I^\sharp _2(\mathbf {x}, i, a_i, k_1, a_{k_1}, i, a_i) \end{aligned} \end{aligned}$$
(41)

This generalizes to N-ary abstraction by considering all orderings of i inside \(k_1< \dots < k_N\), and for each ordering taking all sub-orderings of size N.

Definition 10

(Write statement, two indices \(k_1 < k_2\) ). The abstraction of \(a[i]:=v\) is:

$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, k_2, a_{k_2}) \wedge i \ne k_1 \wedge i \ne k_2 \implies I^\sharp _2(\mathbf {x}, i, v, k_1, a_{k_1}, k_2, a_{k_2}) \end{aligned}\end{aligned}$$
(42)
$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, i, a_{i}, k_2, a_{k_2}) \wedge i < k_2 \implies I^\sharp _2(\mathbf {x}, i, v, i, v, k_2, a_{k_2}) \end{aligned}\end{aligned}$$
(43)
$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, i, a_{i}) \wedge k_1 < i \implies I^\sharp _2(\mathbf {x}, i, v, k_1, a_{k_1}, i, v) \end{aligned} \end{aligned}$$
(44)

Lemma 2

The abstract forward semantics of the read statement (Definition 9) and of the write statement (Definition 10) are sound w.r.t the Galois connection.

Example 7

(Selection sort). Selection sort finds the least element in \(a[d \dots h-1]\) (using Program 5 as its inner loop) and swaps it with a[d], then sorts \(a[d+1 \dots h-1]\). At the end, \(a[d_0\dots h-1]\) is sorted, where \(d_0\) is the initial value of d.

figure e
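The listing above is not reproduced in this extraction; a hedged Python rendering of selection sort on \(a[d \dots h-1]\), with the swap made explicit, is:

```python
def selection_sort(a, d, h):
    """Hedged reconstruction: sort a[d..h-1] in place by selection."""
    while d < h:
        p = d                        # inner loop (cf. Program 5): position of
        for i in range(d + 1, h):    # the minimum of a[d..h-1]
            if a[i] < a[p]:
                p = i
        a[d], a[p] = a[p], a[d]      # swap the minimum into position d
        d = d + 1
    return a

a = selection_sort([3, 1, 2, 5, 4], 0, 5)
# Postcondition (45): a[k1] <= a[k2] for d0 <= k1 < k2 < h.
assert all(a[k1] <= a[k2] for k1 in range(5) for k2 in range(k1 + 1, 5))
```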

Using the rules for the read (Definition 9) and write (Definition 10) statements, we write the abstract forward semantics of this program as a system of Horn clauses.

We wish to prove that, at the end, \(a[d_0 \dots h-1]\) is sorted: at the \( exit \) node,

$$\begin{aligned} \forall d_0 \le k_1< k_2 < h,~ a[k_1] \le a[k_2] \end{aligned}$$
(45)

This is expressed as the final condition:

$$\begin{aligned} \begin{aligned} d_0 \le k_1< k_2 < h \wedge exit (d_0, h, k_1, a_{k_1}, k_2, a_{k_2}) \implies a_{k_1} \le a_{k_2} \end{aligned} \end{aligned}$$
(46)

By running a solver on these clauses, we show that the output of selection sort is truly sorted.Footnote 16 Let us note that this proof relies on nontrivial invariants:Footnote 17

$$\begin{aligned} \forall k_1,k_2,~ d_0 \le k_1< d \wedge k_1 \le k_2 < h \implies a[k_1] \le a[k_2] \end{aligned}$$
(47)

This invariant can be expressed in our Horn clauses as:

$$\begin{aligned} \begin{aligned} d_0 \le k_1< d \wedge k_1< k_2 < h \wedge outerloop (d_0,d,h,k_1,a_{k_1},k_2,a_{k_2}) \implies a_{k_1} \le a_{k_2} \end{aligned} \end{aligned}$$
(48)

If this invariant is added to the problem as an additional query to prove, solving time is reduced from 6 min to 1 s. It may seem counter-intuitive that a solver takes less time on a problem with an additional constraint; but this constraint expresses an invariant needed by the proof, and thus nudges the solver towards the solution.

Our approach is therefore flexible: if a solver fails to prove the desired property on its own, it is possible to help it by providing partial invariants. This is a less tedious approach than having to provide full invariants at every loop header, as common in assisted Floyd-Hoare proofs.

5 Sets and Multisets

Our abstraction for maps may be used to abstract (multi)sets. Let us see for instance how to abstract the multiset of elements of an array, so as to show that the output of a sorting algorithm is a permutation of the input.

In Example 7, we showed how to prove that the output of selection sort is sorted. This is not enough for functional correctness: we also have to prove that the output is a permutation of the input, or, equivalently, that the multiset of elements in the output array is the same as that in the input array.

Let us remark that it is easy to keep track, in an auxiliary map, of the number \(\#a(x)\) of elements of value x in the array a[]. Only write accesses to a[] have an influence on \(\#a\): a write \(a[i]:=v\) is replaced by a sequence:

$$\begin{aligned} \#a(a[i]) := \#a(a[i])-1;~ a[i]:=v;~ \#a(v) := \#a(v)+1 \end{aligned}$$
(49)

(that is, in addition to the array write, the count for the value that gets overwritten is decremented, and the count for the new value is incremented).
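This instrumentation can be sketched as follows (an illustrative sketch, not the tool's output; a Python dict plays the role of the auxiliary map \(\#a\), and is assumed to already count the array's current contents):

```python
def write(a, count, i, v):
    """a[i] := v, instrumented to maintain count[x] = number of cells of a equal to x."""
    count[a[i]] = count[a[i]] - 1   # #a(a[i]) := #a(a[i]) - 1
    a[i] = v                        # the actual array write
    count[v] = count.get(v, 0) + 1  # #a(v) := #a(v) + 1

a = [5, 7, 5]
count = {5: 2, 7: 1}   # multiset of the current contents of a
write(a, count, 0, 7)  # a becomes [7, 7, 5]
```

After the write, `count` is `{5: 1, 7: 2}`, matching the multiset of `[7, 7, 5]`.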

This auxiliary map \(\#a\) can itself be abstracted using our approach! Let us now see how to implement this in our abstract forward semantics expressed using Horn clauses. We enrich our Galois connection (Definition 1) as follows:

Definition 11

The concretization of \(I^\sharp \subseteq \chi \times (\iota \times \beta ) \times (\beta \times \mathbb {N})\) is

$$\begin{aligned} \gamma _{\#}(I^\sharp ) = \Big \{ (\mathbf {x},a) \mid \forall i\in \iota ~ \forall v \in \beta ~ \big (\mathbf {x},(i,a[i]), (v,{{\mathrm{card}}}\{ j \in \iota \mid a[j] = v \})\big ) \in I^\sharp \Big \} \end{aligned}$$
(50)

where \({{\mathrm{card}}}X\) denotes the number of elements in the set X.

The abstraction of \(I \subseteq \chi \times Arr \left( \iota ,\beta \right) \) is

$$\begin{aligned} \alpha _{\#}(I) = \Big \{ \big (\mathbf {x},(i,a[i]),(v,{{\mathrm{card}}}\{ j \in \iota \mid a[j] = v \})\big ) ~\Big |~ (\mathbf {x},a) \in I,~ i \in \iota ,~ v \in \beta \Big \} \end{aligned}$$
(51)

Theorem 4

The Horn rules for array reads and for scalar operations are the same as those for our first abstraction, except that we carry over the extra two components identically.

Definition 12

(Read statement). Same notations as in Definition 2:

$$\begin{aligned} \begin{aligned} k \ne i \wedge I^\sharp _1\big ((\mathbf {x},i,v),(k,a_k),(z,a_{\#z})\big ) \wedge \\ I^\sharp _1\big ((\mathbf {x},i,v),(i,a_i),(z,a_{\#z})\big ) \implies I^\sharp _2 \big ((\mathbf {x},i,a_i),(k,a_k),(z,a_{\#z}) \big ) \end{aligned}\\ \begin{aligned} I^\sharp _1\big ((\mathbf {x},i,v),(i,a_i),(z,a_{\#z})\big ) \implies I^\sharp _2\big ((\mathbf {x},i,a_i),(i,a_i),(z,a_{\#z})\big ) \end{aligned} \end{aligned}$$

Lemma 3

The abstract forward semantics of the read statement (Definition 12) is a sound abstraction of the concrete semantics given in Definition 2.

The abstraction of the write statement is more complicated (see the sequence of instructions in Formula 49). To abstract a write \(a[i]:=v\) between control points \(p_1\) and \(p_2\), we execute a read of the old value from the cell abstraction of the array, decrement the number of cells with this value, execute the write to the cell abstraction, and increment the number of cells with the new value.

Definition 13

(Write statement). With the same notations as in Definition 3:

(Figure: Horn rules for the write statement of Definition 13)

Lemma 4

The abstract forward semantics of the write statement (Definition 13) is a sound abstraction of the concrete semantics given in Definition 3.

If one wants to compare the multiset of the contents of an array a at the end of a procedure with its contents at the beginning, one needs to keep a copy of the old multiset. It is common that the property sought is a relation between the number of occurrences \(\#a(z)\) of an element z in the output array a and its number of occurrences \(\#a_0(z)\) in the input array \(a^0\). In the above formulas, one may therefore replace the pair \((z, a_{\#z})\) by the triple \((z, a_{\#z}, a^0_{\#z})\), with \(a^0_{\#z}\) always propagated identically.

Example 8

Consider again selection sort (Program 7). We use the abstract semantics for read (Definition 12) and write (Definition 13), with an additional component \(a^0_{\#z}\) for tracking the original number of values z in the array a.

We specify the final property as the query

$$\begin{aligned} \begin{aligned} exit (d_0,h,k,a_k,z,a_{\#z},a^0_{\#z}) \implies a_{\#z} = a^0_{\#z} \end{aligned} \end{aligned}$$
(52)
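Query (52) states that every value occurs equally often in the output as in the input. On concrete runs this is the familiar permutation check (a sketch for illustration only; `collections.Counter` stands in for the multisets \(\#a_0\) and \(\#a\)):

```python
from collections import Counter

def selection_sort(a):
    """Return a sorted copy of a, computed by selection sort."""
    a = list(a)
    for d in range(len(a)):
        m = min(range(d, len(a)), key=lambda j: a[j])
        a[d], a[m] = a[m], a[d]
    return a

xs = [4, 1, 3, 1, 2]
before = Counter(xs)        # #a0: multiset of the input
ys = selection_sort(xs)
after = Counter(ys)         # #a : multiset of the output
# the query demands #a(z) = #a0(z) for every value z
assert after == before
assert ys == sorted(xs)
```

What the solver establishes is of course the universal statement, for all inputs at once, not a check on individual runs.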

6 Experiments

Implementation. We implemented our prototype vaphor in 2k lines of OCaml. vaphor takes as input a mini-Java program (a variation of While with array accesses and assertions) and produces an SMT-LIB 2 file.Footnote 18 The core analyzer implements the translation for one- and two-dimensional arrays described in Sects. 3 and 4, as well as the direct translation towards a formula with array variables.

Experiments. We have tested our analyzer on several examples from the literature, including the array benchmarks proposed in [15] and also used in [3] (Table 1), and other classical array algorithms including selection sort, bubble sort and insertion sort (Table 2). We compared our approach to existing Horn clause solvers capable of dealing with arrays. All these files are available from the webpage https://hal.archives-ouvertes.fr/hal-01206882

Table 1. Comparison on the array benchmarks of [15]. (Average) timings are in seconds, CPU time. Abstraction with \(N=1\). “sat” means the property was proved, “unsat” that it could not be proved. “hints” means that some invariants had to be manually supplied to the solver (e.g. even/odd conditions). A star means that we used another version of the solver. Timeout was 5 min unless otherwise noted. The machine has 32 i3-3110M cores, 64 GiB RAM; C/C++ solvers were compiled with gcc 4.8.4, and the JVM is OpenJDK 1.7.0-85.
Table 2. Other array-manipulating programs, including various sorting algorithms. A star means that we used another version of the solver, R1 means random_seed=1. The result is likely a bug in Z3; the alternative is a bug in Spacer, since the same system cannot be satisfiable and unsatisfiable at the same time.

Limitations. Our tool does not currently implement the reasoning over array contents (multiset of values). Experiments for these were thus conducted by manually applying the transformations described in this article in order to obtain a system of Horn clauses. Because applying the rules manually is tedious and error-prone, the only sorting algorithm for which we have checked that the multiset of the output equals the multiset of the input is selection sort. We are however confident that the two other algorithms would go through, given that they use similar or simpler swapping structures.

Some examples from Dillig et al. [15] involve invariants with even/odd constraints. The Horn solvers we tried do not seem able to infer invariants involving divisibility predicates unless these are given by the user. For these cases we added the even/odd properties as additional invariants to prove.

Efficiency caveats. Our tool does not currently simplify the system of Horn clauses that it produces. We have observed that, in some cases, manually simplifying the clauses (removing useless variables, inlining single antecedents by substitution...) dramatically reduces solving times. Also, precomputing some simple scalar invariants on the Horn clauses (e.g. \(0 \le k < i\) for a loop from k to \(i-1\)) and adding them as properties to prove in the Horn system sometimes reduces solving time.

We have observed that the execution time of a Horn solver may dramatically change depending on minor changes in the input, pseudo-random number generator seed, or version of the solver. For instance, the same version of Z3 solves the same system of Horn clauses (proving the correctness of selection sort) in 3 min 40 s or 3 h 52 min depending on whether the random seed is 1 or 0.Footnote 19

Furthermore, we have run into numerous problems with solvers, including one example that, on successive versions of the same solver, produced “sat”, then “unknown”, and finally “unsat”; we have also encountered crashes.

For all these reasons, we believe that solving times should not be scrutinized too closely. The purpose of our experimental evaluation is not to benchmark solvers against each other, but to show that our abstraction, even though it is incomplete, is powerful enough to lead to fully automated proofs of functional correctness of nontrivial array manipulations, including sorting algorithms. Tools for solving Horn clauses are still in their infancy, and we thus expect their performance and reliability to increase dramatically.

7 Related Work

7.1 Cell-Based Abstractions

Smashing. The simplest abstraction for an array is to “smash” all cells into a single one — this amounts to removing the k component from our first Galois connection (Definition 1). The weakness of that approach is that all writes are treated as “may writes” or weak updates: \(a[i]:=x\) adds the value x to the values possibly found in the array a, but there is no way to remove any value from that set. Such an approach thus cannot treat initialization loops (e.g. Program 1) precisely.
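For illustration, a smashed array can be modelled as a single set of possible cell values in which every write is a weak update (our own sketch, not the paper's encoding):

```python
class SmashedArray:
    """All cells collapsed into one abstract cell: a set of possible values."""
    def __init__(self, initial_values):
        self.values = set(initial_values)

    def write(self, v):
        # weak update: the single abstract cell stands for *all* concrete cells,
        # so writing v adds it as a possible value but removes nothing
        self.values.add(v)

    def may_contain(self, v):
        return v in self.values

a = SmashedArray({0})        # an array whose cells are all 0
for i in range(10):          # initialization loop: a[i] := 42 for every i
    a.write(42)
assert a.may_contain(42)
assert a.may_contain(0)      # imprecision: cannot show the zeros are gone
```

Hence an initialization loop cannot be proved to overwrite every cell, which is exactly why smashing fails on Program 1.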

Exploding. At the other extreme, for an array of statically known finite length N (which is common in embedded safety-critical software), one can distinguish all cells \(a[0], \dots , a[N-1]\) and treat them as separate variables \(a_0, \dots , a_{N-1}\). This is a good solution when N is small, but a terrible one when N is large: (i) many analyses scale poorly with the number of active variables; (ii) an initialization loop will have to be unrolled N times to show that it initializes all cells. Both smashing and exploding have been used with success in the Astrée static analyzer [4, 5].

Slices. More sophisticated analyses [12, 17, 19, 32, 33] distinguish slices or segments in the array; their boundaries depend on the index variables. For instance, in array initialization (Program 1), one slice is the part already initialized (indices \(<i\)), the other the part yet to be initialized (indices \(\ge i\)). In the simplest case, each slice is “smashed” into a single value, but more refined analyses express relationships between slices. Since the slices are segments [ab] of indices, these analyses generalize poorly to multidimensional arrays. Also, there is often a combinatorial explosion in analyzing how array slices may or may not overlap.

Cornish et al. [10] similarly apply a program-to-program translation over the LLVM intermediate representation, followed by a scalar analysis.

To the best of our knowledge, all these approaches factor through our Galois connections, or combinations thereof: that is, their abstraction can be expressed as the composition of our abstraction with a further abstraction — even though our implementation of the abstract transfer functions is completely different from theirs. Our approach, however, separates the concerns of (i) abstracting array problems to array-less problems and (ii) abstracting the relationships between different cells and indices.

Fluid updates. Dillig et al. [15] extend the slice approach by introducing “fluid updates” to overcome the dichotomy between strong and weak updates. They specifically exclude sortedness from the kind of properties they can study.

Array removal by program transformation. Monniaux and Alberti [31] analyze array programs by transforming them into array-free programs, which are sent to a back-end analyzer. The resulting invariants contain extra index variables, which can be universally quantified away, similar to the Skolem constants of earlier invariant inference approaches [16, 27]. We have explained (p. 3) why these approaches are less precise than ours.

Another difficulty they obviously faced was the limitations of the back-end solvers that they could use. The integer acceleration engine Flata severely limits the kind of transition relations that can be considered and scales poorly. The abstract interpreter ConcurInterproc can infer disjunctive properties (necessary to distinguish two slices in an array) only if given case splits using observer Boolean variables; but the cost increases greatly (exponentially, in the worst case) with the number of such variables.

7.2 Horn Clauses

Transformations. De Angelis et al. [14] start from a system of Horn clauses over array variables and apply a sequence of transformation rules that (i) yield the empty set of clauses, meaning the program is correct, (ii) yield the “false” fact, meaning the program is incorrect, or (iii) fail to terminate. These rules are based on the axioms of arrays and on a generalization scheme for arithmetic predicates using widening and convex hull.

In contrast to their approach, ours (i) does not require a target property to prove (though the backend solver may need one), (ii) does not mix concerns about arrays and arithmetic constraints, and (iii) can prove the correctness of the full insertion sort algorithm (they prove only the inner loop).

Instantiation. Bjørner et al. [3] propose an approach for solving universally quantified Horn clauses: a Horn clause \((\forall x~ P(x,y)) \rightarrow Q(y)\), not handled by current solvers, is abstracted by \(P(x_1(y),y) \wedge \dots \wedge P(x_n(y),y) \rightarrow Q(y)\), where the \(x_i\) are heuristically chosen instantiations. Our approach can be construed as an application of their approach to the axioms of arrays, with specific instantiation heuristics.

We improve on their interesting contribution in several ways. (i) Instead of presenting our approach as a heuristic instantiation scheme, we show that it corresponds to specific Galois connections, which clarifies what abstraction is done and what kind of properties can or cannot be represented. (ii) We handle sortedness properties. None of their examples deal with sortedness and it is unclear how their instantiation heuristics would behave on them. (iii) We handle multisets (and thus permutation properties) by reduction to arrays. It is possible that our approach in this respect can be described as an instantiation scheme over the axioms for arrays (including the multiset of array contents), but, again, it is unclear how their instantiation heuristics would behave in this respect.

Their approach has not been implemented except in private research prototypes; we could not run a comparison.Footnote 20

7.3 Predicate Abstraction, CEGAR and Array Interpolants

There exist a variety of approaches based on counterexample-guided abstraction refinement using Craig interpolants [28–30]. In a nutshell, Craig interpolants are predicates suitable for proving, using Hoare triples, that some unfolding of the execution cannot lead to an error state. They are typically extracted from the proof of unsatisfiability of the unfolding produced by an SMT solver.

Generating good interpolants from purely arithmetic problems is already a difficult problem, and generating good universally quantified interpolants on array properties has proved even more challenging [1, 2, 23].

7.4 Acceleration

Bozga et al. [7] have proposed a method for accelerating certain transition relations involving actions over arrays, outputting the transitive closure in the form of a counter automaton. Translating the counter automaton into a first-order formula expressing the array properties however results in a loss of precision.

8 Conclusion and Perspectives

We have proposed a generic approach to abstract programs and universal properties over arrays (or arbitrary maps) by syntactic transformation into a system of Horn clauses without arrays, which is then sent to a solver. This transformation is powerful enough to prove, fully automatically and within minutes, that the output of selection sort is sorted and is a permutation of the input.

While some solvers have difficulties with the kind of Horn systems that we generate, some (e.g. Spacer) are capable of solving them quite well. We have used the stock version of the solvers, without tuning or help from their designers, thus higher performance is to be expected in the future. If the solver cannot find the invariants on its own, it can be helped by partial invariants from the user.

As experiments show, our approach significantly improves on the procedures currently implemented in array-capable Horn solvers, as well as on earlier approaches for inferring quantified array invariants: they typically cannot prove sorting algorithms correct.

Our rules are for forward analysis: a solution to our Horn clauses defines a super-set of all states reachable from program initialization, and the desired property is proved if this set is included in the property. We intend to investigate backward analysis: find a super-set of the set of all states reachable from a property violation, not intersecting the initial states.

One advantage of some of the approaches (the abstract interpretation ones from Sect. 7.1 and the transformation from [31]) is that they are capable of inferring what a program does, or at least a meaningful abstraction of it (e.g. “at the end of this program all cells in the array a contain 42”), as opposed to merely proving a property supplied by the user. Our approach can achieve this as well, provided it is used with a Horn clause solver that produces interesting solutions without needing a query. Such a Horn clause solver should however be capable of generating disjunctive properties (e.g. \((k < i \wedge a_k = 0) \vee (k \ge i \wedge a_k = 42)\)); thus a simple approach by abstract interpretation of the Horn clauses in, say, a sub-class of the convex polyhedra, will not do. We know of no such Horn solver; designing one is a research challenge. Maybe certain partitioning approaches used in sequential program verification [21, 34] could be transposed to Horn clauses.

We have considered simple programs operating over arrays or maps, as opposed to a real-life programming language with objects, references or, horror, pointer arithmetic. Yet, our approach can be adapted to such languages, following methods that view memory as arrays [6], whose disjointness is proved by typing (e.g. two values of different types can never be aliased, two fields of different types can never be aliased) or by alias analysis.