
1 Introduction

In this article, we consider programs operating over arrays, or, more generally, maps from an index type to a value type (in the following, we shall use “array” and “map” interchangeably). Such programs contain read (e.g. \(v:=a[i]\)) and write (\(a[i]:=v\)) operations over arrays, as well as “scalar” operations.Footnote 1 We wish to fully automatically verify properties on such programs; e.g. that a sorting algorithm outputs a sorted permutation of its input.

Universally Quantified Properties. Very often, desirable properties over arrays are universally quantified; e.g. sortedness may be expressed as \(\forall k_1,k_2~ k_1 < k_2 \implies a[k_1] \le a[k_2]\). However, formulas with universal quantification and linear arithmetic over integers and at least one predicate symbol (a predicate being a function to the Booleans) form an undecidable class [20], of which some decidable subclasses have however been identified [8]. There is therefore no general algorithm for checking that such invariants hold, let alone inferring them. Yet, there have been several approaches proposed to infer such invariants (see Sect. 7).

We here propose a method for inferring such universally quantified invariants, given a specification on the output of the program. Since the problem is undecidable, our approach may fail to terminate in the general case, or may return “unknown”. Experiments however show that our approach can successfully and automatically verify nontrivial properties (e.g. that the output of selection sort is sorted and is a permutation of the input).

Our key insight is that if there is a proof of safety of an array-manipulating program, it is likely that there exists a proof that can be expressed with simple steps over properties relating only a small number N of (parametric) array cells, called “distinguished cells”. For instance, all the sorting algorithms we tried can be proved correct with \(N=2\), and simple array manipulations (copying, reversing...) with \(N=1\).

Horn Clauses. We convert the verification problem to Horn clauses, a common format for program verification problems [36] supported by a number of tools. Usual conversions [18] map variables and operations from the program to variables of the same type and the same operations in the Horn clause problem:Footnote 2 an integer is mapped to an integer, an array to an array, etc. If arrays are not supported by the back-end analysis, they may be abstracted away (reads replaced by nondeterministic choices, writes discarded) at the expense of precision. In contrast, our approach abstracts programs much less violently, with tunable precision, even though the result is still a Horn clause problem without arrays. Section 3 explains how many properties (e.g. initialization) can be proved using one “distinguished cell” (\(N=1\)), Sect. 4 explains how properties such as sortedness can be proved using two cells; completely discarding arrays corresponds to using zero of them.

An interesting characteristic of the Horn clauses we produce is that they are nonlinearFootnote 3, even though a straightforward translation of the semantics of a control-flow graph into Hoare triples expressed as clauses yields a linear system (whose unfoldings correspond to abstract execution traces). If a final property to prove (e.g. “all values are 0”) queries one cell position, this query may morph, by the backward unfolding of the clauses, into a tree of queries at other locations, in contrast to some earlier approaches [31].

We illustrate this approach with automated proofs of several examples from the literature: we apply Sects. 3, 4 or 5 to obtain a system of Horn clauses without arrays. This system is then fed to the Z3, Eldarica or Spacer solver, which produces a model of this system, meaning that the postcondition (e.g. sortedness or multiset of the output equal to that of the input) truly holds.Footnote 4

Previous approaches [31] using “distinguished cells” amounted (even though not described as such) to linear Horn rules; in contrast, our abstract semantics uses nonlinear Horn rules, which leads to higher precision (Sect. 7.1).

Contributions. Our main contribution is a system of rules for transforming the atomic program statements in a program operating over arrays or maps, as well as (optionally) the universally quantified postcondition to prove, into a system of non-linear Horn clauses over scalar variables only. The precision of this transformation is tunable using a Galois connection parameterized by the number of “distinguished cells”; e.g. properties such as sortedness need two distinguished cells (Sect. 4) while simpler properties need only one (Sect. 3). Statements operating over non-arrays variables are mapped (almost) identically to their concrete semantics. This system over-approximates the behavior of the program. A solution of that system can be mapped to inductive invariants over the original programs, including universal properties over arrays.

A second contribution, based on the first, is a system of rules that also keeps track of array/map contents (Sect. 5) as a multiset. This system is suitable for showing content properties, e.g. that the output of a sorting algorithm is a permutation of the input, even though the sequence of operations is not directly a sequence of swaps.

We implemented our approach and benchmarked it over several classical examples of array algorithms (Sect. 6), comparing it favorably to other tools.

2 Program Verification as Solving Horn Clauses

A classical approach to program analysis is to consider a program as a control-flow graph and to attach to each vertex \(p_i\) (control point) an inductive invariant \(I_i\): a set of possible values \(\mathbf {x}\) of the program variables (and memory stack and heap, as needed) so that (i) the set associated to the initial control point \(p_{i_0}\) contains the possible initialization values \(S_{i_0}\); (ii) for each edge \(p_i \rightarrow _c p_j\) (c for concrete), the set \(I_j\) associated to the target control point \(p_j\) should include all the states reachable from the states in the set \(I_i\) associated to the source control point \(p_i\) according to the transition relation \(\tau _{i,j}\) of the edge. Inductiveness is thus defined by Horn clauses:Footnote 5

$$\begin{aligned} \forall \mathbf {x},~ S_{i_0}(\mathbf {x}) \implies I_{i_0}(\mathbf {x}) \end{aligned}$$
(1)
$$\begin{aligned} \forall \mathbf {x},\mathbf {x}',~ I_i(\mathbf {x}) \wedge \tau _{i,j}(\mathbf {x},\mathbf {x}') \implies I_j(\mathbf {x}') \end{aligned}$$
(2)

For proving safety properties, in addition to inductiveness, one requires that error locations \(p_{e_1},\dots ,p_{e_n}\) are proved to be unreachable (the associated set of states is empty): this amounts to Horn clauses implying false: \(\forall \mathbf {x},~ I_{e_i}(\mathbf {x}) \Rightarrow \bot \).

Various tools can solve such systems of Horn clauses, that is, can synthesize suitable predicates \(I_i\), which constitute inductive invariants. In this article, we tried Z3 Footnote 6 with the PDR fixed point solver [22], Z3 with the Spacer solver [24, 25],Footnote 7 and Eldarica [35].Footnote 8 Since program verification is undecidable, such tools, in general, may fail to terminate, or may return “unknown”.

For the sake of simplicity, we shall consider, in this article, that all integer variables in programs are mathematical integers (\(\mathbb {Z}\)) as opposed to machine integersFootnote 9 and that arrays are infinite. Again, it is easy to modify our semantics to include systematic array bound checks, jumps to error conditions, etc.

In examples, instead of writing \(I_{ stmt }\) for the name of the predicate (inductive invariant) at statement \( stmt \), we shall write \( stmt \) directly, for readability’s sake: thus we write e.g. \( loop \) for a predicate at the head of a loop. Furthermore, for readability, we shall sometimes coalesce several successive statements into one.

Fig. 1. Compact control-flow graph for Program 1 (figure omitted).

Example 1

(Motivating example). Consider the program:

figure a
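The program listing itself does not survive extraction; the following minimal Python rendering, consistent with the control-flow graph of Fig. 1 and the clauses below, is our reconstruction (the function name `program1` is ours):

```python
def program1(n):
    """Hedged reconstruction of Program 1: fill a[0..n-1] with 42."""
    a = [0] * n          # stand-in for an array of arbitrary initial content
    i = 0
    while i < n:         # control point: loop
        a[i] = 42        # the array write
        i = i + 1        # control point: incr
    return a             # control point: end

assert all(v == 42 for v in program1(5))   # the property to prove
```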

We would like to prove that this program truly fills the array a with the value 42. The flat encoding into Horn clauses assigns a predicate (set of states) to each of the control nodes (Fig. 1), and turns each transition into a Horn rule with variables ranging in \( Arr \left( A,B\right) \), the type of arrays of B indexed by A [26, Chap. 7]:

$$\begin{aligned} \begin{aligned} \forall n\in \mathbb {Z}~ \forall a\in Arr \left( \mathbb {Z},\mathbb {Z}\right) ~ n>0 \implies loop (n,0,a) \end{aligned} \end{aligned}$$
(3)
$$\begin{aligned} \begin{aligned} \forall n,i\in \mathbb {Z}~ \forall a\in Arr \left( \mathbb {Z},\mathbb {Z}\right) ~ i<n \wedge loop (n,i,a) \implies loop (n,i+1, store (a,i,42)) \end{aligned} \end{aligned}$$
(4)
$$\begin{aligned} \begin{aligned} \forall n,i\in \mathbb {Z}~ \forall a\in Arr \left( \mathbb {Z},\mathbb {Z}\right) ~ i\ge n \wedge loop (n,i,a) \implies end (n,i,a) \end{aligned} \end{aligned}$$
(5)
$$\begin{aligned} \begin{aligned} \forall x,n,i\in \mathbb {Z}~ \forall a\in Arr \left( \mathbb {Z},\mathbb {Z}\right) ~ 0 \le x < n \wedge end (n,i,a) \implies select(a,x) = 42 \end{aligned} \end{aligned}$$
(6)

where \( store (a,i,v)\) is array a where the value at index i has been replaced by v and \( select (a,x)\) denotes a[x].

None of the tools we have tried (Z3, Spacer, Eldarica) has been able to solve this system, presumably because they cannot infer universally quantified invariants over arrays.Footnote 10 Indeed, here the loop invariant needed is

$$\begin{aligned} 0 \le i \le n \wedge (\forall k~ 0 \le k < i \implies a[k] = 42) \end{aligned}$$
(7)

While \(0 \le i \le n\) is inferred by a variety of approaches, the rest is tougher. We shall also see (Example 6) that a slight alteration of this example also prevents some earlier abstraction approaches from checking the desired property.

Most software model checkers attempt to construct invariants from Craig interpolants obtained from refutations [9] of the reachability of error states in local [22] or global [29] unfoldings of the problem. However, interpolation over array properties is difficult, especially since the goal is not to provide just any interpolant, but interpolants that generalize well to invariants [1, 2]. This article instead introduces a way to derive universally quantified invariants from the analysis of a system of Horn clauses over scalar variables (without array variables).

3 Getting Rid of the Arrays

To use the power of Horn solvers, we soundly abstract problems with arrays to problems without arrays. In the Horn clauses for Example 1, we attached to each program point \(p_\ell \) a predicate \(I_\ell \) over \(\mathbb {Z}\times \mathbb {Z}\times Arr \left( \mathbb {Z},\mathbb {Z}\right) \): the program variables are two integers i, n and one integer-valued, integer-indexed array a.Footnote 11 In any solution of the system of clauses, if the valuation (i, n, a) is reachable at program point \(p_\ell \), then \(I_\ell (i,n,a)\) holds. Instead, in the case of Example 1, we will consider a predicate \(I^\sharp _\ell \) over \(\mathbb {Z}\times \mathbb {Z}\times \mathbb {Z}\times \mathbb {Z}\) (the array \( key \rightarrow value \) has been replaced by a pair \(( key , value )\)) such that \(I^\sharp _\ell (i,n,k,a_k)\)Footnote 12 holds for each reachable state (i, n, a) satisfying \(a[k]=a_k\). This is the same Galois connection [11] as in some earlier works [31] [13, Sect. 2.1]; yet, as we shall see, our abstract transformers are more precise.

Definition 1

The “one distinguished cell” abstraction of \(I \subseteq \chi \times Arr \left( \iota ,\beta \right) \) is \(\alpha _{}(I) = \{ (\mathbf {x},i,a[i]) \mid (\mathbf {x},a) \in I, i \in \iota \}\). The concretization of \(I^\sharp \subseteq \chi \times (\iota \times \beta )\) is \(\gamma _{}(I^\sharp ) = \{ (\mathbf {x},a) \mid \forall i\in \iota ~ (\mathbf {x},i,a[i]) \in I^\sharp \}\).

Theorem 1

\(\alpha _{}\) and \(\gamma _{}\) form a Galois connection.
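On a finite toy domain, the pair \((\alpha , \gamma )\) of Definition 1 can be checked directly. The following sketch (the helper names and the tiny domain are ours) verifies the over-approximation property \(\gamma (\alpha (I)) \supseteq I\) and the idempotence \(\alpha \circ \gamma \circ \alpha = \alpha \) expected of a Galois connection:

```python
from itertools import product

# Finite sketch of Definition 1: states are pairs (x, a) with x a scalar
# and a an array of length 3 over {0, 1, 2}.
def alpha(I):
    return {(x, i, a[i]) for (x, a) in I for i in range(len(a))}

def gamma(I_sharp, length=3, values=range(3)):
    out = set()
    for x in {t[0] for t in I_sharp}:
        for a in product(values, repeat=length):
            if all((x, i, a[i]) in I_sharp for i in range(length)):
                out.add((x, a))
    return out

I = {(7, (0, 1, 2)), (7, (2, 1, 0))}
assert I <= gamma(alpha(I))                # gamma(alpha(I)) over-approximates I
assert alpha(gamma(alpha(I))) == alpha(I)  # alpha . gamma . alpha = alpha
```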

To provide the abstract transformers, we will suppose in the sequel that any statement in the program (control-flow graph) is: (i) an array read into a fresh variable, \(v:=a[i]\) in pseudo-code; the variables of the program are \((\mathbf {x},i,v)\), where \(\mathbf {x}\) is a vector of arbitrarily many variables; (ii) an array write, \(a[i]:=v\) in pseudo-code (where i and v are variables); the variables of the program are \((\mathbf {x},i,v)\) before and after the statement; (iii) or a scalar operation, including assignments and guards over scalar variables. More complex statements can be transformed into a sequence of such statements, by introducing temporary variables if needed: for instance, \(a[i]:=a[j]\) is transformed into \( temp :=a[j];~ a[i]:= temp \).

Definition 2

(Read statement). Let v be a variable of type \(\beta \), i a variable of type \(\iota \), and a an array of values of type \(\beta \) with an index of type \(\iota \). Let \(\mathbf {x}\) be the other program variables, taken in \(\chi \). The concrete “next state” relation for the read statement \(v:=a[i]\) between locations \(p_1\) and \(p_2\) is \((\mathbf {x},i,v,a) \rightarrow _c (\mathbf {x},i,a[i],a)\).

Its forward abstract semantics is encoded into two Horn clauses:

$$\begin{aligned} \begin{aligned} \forall \mathbf {x}\in \chi ~ \forall i \in \iota ~ \forall v,a_i \in \beta ~ \forall k \in \iota ~ \forall a_k \in \beta \\ k \ne i \wedge I^\sharp _1\big ((\mathbf {x},i,v),(k,a_k)\big ) \wedge I^\sharp _1\big ((\mathbf {x},i,v),(i,a_i)\big ) \implies I^\sharp _2\big ((\mathbf {x},i,a_i),(k,a_k)\big ) \end{aligned} \end{aligned}$$
(8)
$$\begin{aligned} \begin{aligned} \forall \mathbf {x}\in \chi ~ \forall i \in \iota ~ \forall v,a_i \in \beta ~ \forall k \in \iota ~ \forall a_k \in \beta \\ I^\sharp _1\big ((\mathbf {x},i,v),(i,a_i)\big ) \implies I^\sharp _2\big ((\mathbf {x},i,a_i),(i,a_i)\big ) \end{aligned} \end{aligned}$$
(9)

The tuple \((k,a_k)\) now represents a “distinguished cell”. While rule 9 is straightforward (\(a_i\) is assigned to the variable v), the nonlinear rule 8 may be more difficult to comprehend. The intuition is that, to have both \(a_i = a[i]\) and \(a_k = a[k]\) at the read instruction with a given valuation \((\mathbf {x},i)\) of the other variables, both \(a_i = a[i]\) and \(a_k = a[k]\) had to be reachable with the same valuation.

Remark 1

We use two separate rules for \(k=i\) and \(k\ne i\) for better precision. A single rule \(I^\sharp _1\big ((\mathbf {x},i,v),(k,a_k)\big ) \wedge I^\sharp _1\big ((\mathbf {x},i,v),(i,a_i)\big ) \implies I^\sharp _2\big ((\mathbf {x},i,a_i),(k,a_k)\big )\) would not enforce that if \(i=k\) then \(a_i = a_k\) in the consequent.
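The effect of rules 8 and 9 can be replayed on finite abstract sets. In the sketch below (the tuple encoding and the function `read_step` are ours; the scalar vector \(\mathbf {x}\) is omitted for brevity), the \(k \ne i\) guard of rule 8 blocks exactly the spurious tuple that the merged single rule of this remark would admit:

```python
# Rules (8) and (9) as a transformer on finite abstract sets of tuples
# ((i, v), (k, ak)): i is the read index, v the pre-read value of the
# destination variable, (k, ak) the distinguished cell.
def read_step(I1):
    I2 = set()
    for ((i, v), (k, ak)) in I1:
        if k == i:                                         # rule (9)
            I2.add(((i, ak), (i, ak)))
    for ((i, v), (k, ak)) in I1:
        for ((i2, v2), (k2, a2)) in I1:
            if (i2, v2) == (i, v) and k2 == i and k != i:  # rule (8)
                I2.add(((i, a2), (k, ak)))
    return I2

# Before v := a[i] with i = 0: two possible arrays, a = [1, 2] and a = [3, 2].
I1 = {((0, 9), (0, 1)), ((0, 9), (1, 2)), ((0, 9), (0, 3))}
I2 = read_step(I1)
assert ((0, 3), (0, 1)) not in I2   # the k != i guard blocks this spurious tuple
assert ((0, 1), (0, 1)) in I2 and ((0, 3), (0, 3)) in I2
```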

From now on, we shall omit all universal quantifiers inside rules, for readability.

Definition 3

(Write statement). The concrete “next state” relation for the write statement is \((\mathbf {x},i,v,a) \rightarrow _c (\mathbf {x},i,v, store (a,i,v))\). Its forward abstract semantics is encoded into two Horn clauses, depending on whether the distinguished cell is i or not:

$$\begin{aligned} \begin{aligned} I^\sharp _1\big ((\mathbf {x},i,v),(k,a_k)\big ) \wedge i \ne k \implies I^\sharp _2\big ((\mathbf {x},i,v),(k,a_k)\big ) \end{aligned} \end{aligned}$$
(10)
$$\begin{aligned} \begin{aligned} I^\sharp _1\big ((\mathbf {x},i,v),(i,a_i)\big ) \implies I^\sharp _2\big ((\mathbf {x},i,v),(i,v)\big ) \end{aligned} \end{aligned}$$
(11)
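Rules 10 and 11 amount to a case split on whether the distinguished cell is the written one. A finite sketch (the tuple encoding is ours, other scalars omitted):

```python
# Abstract write transformer for a[i] := v (rules 10 and 11), on finite
# abstract sets of tuples ((i, v), (k, ak)).
def write_step(I1):
    I2 = set()
    for ((i, v), (k, ak)) in I1:
        if k != i:
            I2.add(((i, v), (k, ak)))   # rule (10): untouched cell kept
        else:
            I2.add(((i, v), (i, v)))    # rule (11): cell i now holds v
    return I2

# i = 0, v = 42, one possible array a = [7, 8]:
I1 = {((0, 42), (0, 7)), ((0, 42), (1, 8))}
assert write_step(I1) == {((0, 42), (0, 42)), ((0, 42), (1, 8))}  # a = [42, 8]
```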

Example 2

(Example 1, cont.). The \(a[i]:=42\) statement of Example 1 is translated into (the loop control point is divided into loop / write / incr, all predicates of arity 4):

$$\begin{aligned} \begin{aligned} i \ne k \wedge write (n,i,k,a_k) \implies incr (n,i,k,a_k) \end{aligned} \end{aligned}$$
(12)
$$\begin{aligned} \begin{aligned} write (n,i,i,a_i) \implies incr (n,i,i,42) \end{aligned} \end{aligned}$$
(13)

Definition 4

(Initialization). The creation of an array variable with nondeterministically chosen initial content is abstracted by \(I^\sharp _1(\mathbf {x}) \implies I^\sharp _2(\mathbf {x},k,a_k)\).

Definition 5

(Scalar statements). With the same notations as above, we consider a statement (or sequence thereof) operating only on scalar variables: \(\mathbf {x}\rightarrow _s \mathbf {x}'\) if it is possible to obtain scalar values \(\mathbf {x}'\) after executing the statement on scalar values \(\mathbf {x}\). The concrete “next state” relation for that statement is \((\mathbf {x},i,v,a) \rightarrow _c (\mathbf {x}',i,v,a)\). Its forward abstract semantics is encoded into:

$$\begin{aligned} I^\sharp _1(\mathbf {x},k,a_k) \wedge \mathbf {x}\rightarrow _s \mathbf {x}' \implies I^\sharp _2(\mathbf {x}',k,a_k) \end{aligned}$$
(14)

Example 3

A test \(x \ne y\) gets abstracted as

$$\begin{aligned} I^\sharp _1(x,y,k,a_k) \wedge x \ne y \implies I^\sharp _2(x,y,k,a_k) \end{aligned}$$
(15)

Definition 6

The scalar operation \( kill (v_1,\dots ,v_n)\) removes variables \(v_1,\dots ,v_n\): \((\mathbf {x},v_1,\dots ,v_n) \rightarrow \mathbf {x}\). We shall apply it to get rid of dead variables, sometimes, for the sake of brevity, without explicit note, by coalescing it with other operations.

Our Horn rules are of the form \(\forall \mathbf {y}~ I_1^\sharp (\mathbf {f_1}(\mathbf {y})) \wedge \dots \wedge I_1^\sharp (\mathbf {f_m}(\mathbf {y})) \wedge P(\mathbf {y}) \implies I^\sharp _2(\mathbf {g}(\mathbf {y}))\) (\(\mathbf {y}\) is a vector of variables, \(\mathbf {f_1},\dots ,\mathbf {f_m}\) vectors of terms depending on \(\mathbf {y}\), P an arithmetic predicate over \(\mathbf {y}\)). In other words, they impose in \(I^\sharp _2\) the presence of \(\mathbf {g}(\mathbf {y})\) as soon as certain \(\mathbf {f_1}(\mathbf {y}),\dots ,\mathbf {f_m}(\mathbf {y})\) are found in \(I_1^\sharp \). Let \(I^\sharp _{2-}\) be the set of such imposed elements. This Horn rule is said to be sound if \(\gamma (I^\sharp _{2-})\) includes all states \((\mathbf {x}',a')\) such that there exists \((\mathbf {x},a)\) in \(\gamma (I^\sharp _1)\) and \((\mathbf {x},a) \rightarrow _c (\mathbf {x}',a')\).

Lemma 1

The forward abstract semantics of the read statement (Definition 2), of the write statement (Definition 3), of array initialization (Definition 4), of the scalar statements (Definition 5) are sound w.r.t the Galois connection.

Remark 2

The scalar statements include “killing” dead variables (Definition 6). Note that, contrary to many other abstractions, in ours, removing some variables may cause irrecoverable loss of precision on other variables [31, Sect. 4.2]: if v is live, then one can represent \(\forall k,~ a[k]=v\), which implies \(\forall k_1,k_2~ a[k_1] = a[k_2]\) (constantness), but if v is discarded, the constantness of a is lost.
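The loss described in this remark can be replayed on arrays of length 2 over \(\{0,1\}\) (the finite domain and helper names below are ours):

```python
from itertools import product

# Remark 2 replayed on arrays of length 2 over {0, 1}.
V, K = (0, 1), (0, 1)

def gamma_with_v(I):      # states (v, a) whose every triple (v, k, a[k]) is in I
    return {(v, a) for v in V for a in product(V, repeat=2)
            if all((v, k, a[k]) in I for k in K)}

def gamma_without_v(I):   # after killing v: only pairs (k, a[k]) remain
    return {a for a in product(V, repeat=2)
            if all((k, a[k]) in I for k in K)}

# Abstract value expressing "forall k, a[k] = v":
I_v = {(v, k, v) for v in V for k in K}
assert {a for (v, a) in gamma_with_v(I_v)} == {(0, 0), (1, 1)}  # constant arrays only

# Killing v projects it away; constantness of a is irrecoverably lost:
I_kill = {(k, ak) for (v, k, ak) in I_v}
assert gamma_without_v(I_kill) == set(product(V, repeat=2))     # all arrays
```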

Theorem 2

If \(I^\sharp _1,\dots ,I^\sharp _m\) are a solution of a system of Horn clauses sound in the above sense, then \(\gamma (I^\sharp _1),\dots ,\gamma (I^\sharp _m)\) are inductive invariants w.r.t the concrete semantics \(\rightarrow _c\).

Definition 7

(Property conversion). A property “at program point \(p_\ell \), for all \(\mathbf {x}\in \chi \) and all \(k \in \iota \), \(\phi (\mathbf {x},k,a[k])\) holds” (where \(\phi \) is a formula, say over arithmetic) is converted into a Horn query \(\forall \mathbf {x}\in \chi ~ \forall k \in \iota ~ I^\sharp _{\ell }(\mathbf {x},k,a_k) \implies \phi (\mathbf {x},k,a_k)\).

Our method for converting a scalar program into a system of Horn clauses over scalar variables is thus:

Algorithm 1

(Abstraction into Horn constraints). Given the control-flow graph of the program:

  1. To each control point \(p_\ell \), with vector of scalar variables \(\mathbf {x}_\ell \), associate a predicate \(I^\sharp _\ell (\mathbf {x}_\ell , k, a_k)\) in the Horn clause system (the vector of scalar variables may change from control point to control point).

  2. For each transition of the program, generate Horn rules according to Definitions 2, 3, 5 as applicable (an initialization node has no antecedents in its rule).

  3. Generate Horn queries from desired properties according to Definition 7.

Example 4

(Example 1, continued). Let us now apply the Horn abstract semantics from Definitions 3, 4 and 5 to Program 1; in this case, \(\beta = \mathbb {Z}\), \(\iota = \{ 0,\dots ,n-1 \}\) (thus we always have \(0\le k < n\)), \(\chi = \mathbb {Z}\). After slight simplification, we get:

$$\begin{aligned} \begin{aligned} 0\le k < n \implies loop (n,0,k,a_k) \end{aligned}\end{aligned}$$
(16)
$$\begin{aligned} \begin{aligned} 0 \le k< n \wedge i < n \wedge loop (n,i,k,a_k) \implies write (n,i,k,a_k) \end{aligned} \end{aligned}$$
(17)
$$\begin{aligned} \begin{aligned} 0 \le k < n \wedge i \ne k \wedge write (n,i,k,a_k) \implies incr (n,i,k,a_k) \end{aligned}\end{aligned}$$
(18)
$$\begin{aligned} \begin{aligned} write (n,i,i,a_i) \implies incr (n,i,i,42) \end{aligned}\end{aligned}$$
(19)
$$\begin{aligned} \begin{aligned} 0 \le k < n \wedge incr (n,i,k,a_k) \implies loop (n,i+1,k,a_k) \end{aligned}\end{aligned}$$
(20)
$$\begin{aligned} \begin{aligned} ~ 0 \le k < n \wedge i \ge n \wedge loop (n,i,k,a_k) \implies end (n,i,k,a_k) \end{aligned} \end{aligned}$$
(21)

Finally, we add the postcondition (using Definition 7):

$$\begin{aligned} \begin{aligned} 0 \le k < n \wedge end (n,i,k,a_k) \Rightarrow a_k = 42 \end{aligned} \end{aligned}$$
(22)

A solution to the resulting system of Horn clauses can be found by e.g. Z3.
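Indeed, one can check by brute force over a small finite domain that the expected invariant is a model of clauses (16)–(22). The predicate definitions below are our candidate solution (essentially invariant (7) restricted to the distinguished cell), not solver output:

```python
# Candidate model for clauses (16)-(22), checked by brute force on a
# finite domain. Each assert encodes one clause as an implication.
def loop(n, i, k, ak):  return 0 <= k < n and (ak == 42 or k >= i)
def write(n, i, k, ak): return loop(n, i, k, ak) and i < n
def incr(n, i, k, ak):  return 0 <= k < n and i < n and (ak == 42 or k > i)
def end(n, i, k, ak):   return loop(n, i, k, ak) and i >= n

R, V = range(-1, 4), (41, 42)
for n in R:
    for i in R:
        for k in R:
            for ak in V:
                assert not (0 <= k < n) or loop(n, 0, k, ak)                                   # (16)
                assert not (0 <= k < n and i < n and loop(n, i, k, ak)) or write(n, i, k, ak)  # (17)
                assert not (0 <= k < n and i != k and write(n, i, k, ak)) or incr(n, i, k, ak) # (18)
                assert not write(n, i, i, ak) or incr(n, i, i, 42)                             # (19)
                assert not (0 <= k < n and incr(n, i, k, ak)) or loop(n, i + 1, k, ak)         # (20)
                assert not (0 <= k < n and i >= n and loop(n, i, k, ak)) or end(n, i, k, ak)   # (21)
                assert not (0 <= k < n and end(n, i, k, ak)) or ak == 42                       # (22)
```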

Our approach can also be used to establish relationships between several arrays, or between the initial values in an array and the final values: arrays a[i] and b[j] can be abstracted by a quadruple \((i,a_i,j,b_j)\).Footnote 13

Example 5

Consider the problem of finding the minimum b of an array slice \(a[d \dots h-1]\), together with a position p such that \(b=a[p]\):

figure b
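The listing above is not reproduced in this extraction; a hedged Python version matching the specification (the function name `slice_min` is ours) is:

```python
def slice_min(a, d, h):
    """Hedged reconstruction: minimum b of a[d..h-1] and a position p with b = a[p]."""
    p, b = d, a[d]
    i = d + 1
    while i < h:
        if a[i] < b:
            b, p = a[i], i
        i = i + 1
    return b, p

a = [5, 3, 9, 3, 7]
b, p = slice_min(a, 1, 4)
assert b == a[p]                            # postcondition (23)
assert all(b <= a[k] for k in range(1, 4))  # postcondition (24)
```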

Again, we encode the abstraction of the statements (Definitions 2, 3, 5) as Horn clauses. We obtain a predicate \( end (d,h,p,b,k,a_k)\) constrained as follows:

$$\begin{aligned} \begin{aligned} end (d,h,p,b,p,a_p) \implies b = a_p \end{aligned} \end{aligned}$$
(23)
$$\begin{aligned} \begin{aligned} d \le k < h \wedge end (d,h,p,b,k,a_k) \implies b \le a_k \end{aligned} \end{aligned}$$
(24)

Rule 23 imposes the postcondition \(b = a[p]\), Rule 24 imposes the postcondition \(\forall k~ d \le k < h \implies b \le a[k]\). This example is again solved by Z3.

Earlier approaches [31], based on translation to programs (thus to transition systems), are equivalent to a translation into linear Horn clauses where \(x_1,\dots ,x_p\) are the same in the antecedent and the consequent:

$$\begin{aligned} I_1(\dots ,x_1,a_1,\dots ,x_p, a_p) \wedge condition \rightarrow I_2(\dots ,x_1,a'_1,\dots ,x_p,a'_p) \end{aligned}$$
(25)

In contrast, in this article we use a much more powerful translation to nonlinear Horn clauses (Sect. 7.1) where \(x''_1,\dots ,x''_p\) differ from \(x_1,\dots ,x_p\):

$$\begin{aligned} I_1(\dots ,x_1,a_1,\dots ,x_p,a_p) \wedge \dots \wedge&I_1(\dots ,x''_1,a''_1,\dots ,x''_p,a''_p) \wedge condition \nonumber \\&\qquad \qquad \rightarrow I_2(\dots ,x_1,a'_1,\dots ,x_p,a'_p) \end{aligned}$$
(26)

Example 6

(Motivating example, altered). The earlier work [31] could successfully analyze Example 1. However, a slight modification of the program prevents it from doing so:

figure c
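The listing above is not reproduced in this extraction; from the clauses below, the altered program plausibly fills the array with 42 and then scans it, setting a flag f when some cell differs from 42. A hedged Python rendering:

```python
def altered(n):
    """Hedged reconstruction: fill a with 42, then scan for non-42 cells."""
    a = [0] * n
    i = 0
    while i < n:          # first loop: a[i] := 42
        a[i] = 42
        i = i + 1
    i, f = 0, 0
    while i < n:          # second loop (loop2): x := a[i]; if x != 42: f := 1
        x = a[i]
        if x != 42:
            f = 1
        i = i + 1
    return f              # property (34): f = 0 at end2

assert altered(5) == 0
```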

For this particular example, the array would be abstracted all over the program using a fixed number of cells \(a[x_1],\dots ,a[x_p]\), where \(x_1,\dots ,x_p\) are symbolic constants.

The second loop is then analyzed as though each read of a[i] yielded \(a_{x_j}\) when \(i = x_j\) for some j, and a nondeterministic value otherwise (listing omitted).Footnote 14

One easily sees that if \(p < n\), there must be a loop iteration where i differs from all of \(x_1,\dots ,x_p\); at that iteration, x takes any value, and thus f may take value 1.

The output of our translation process in this example is the same up to the \( end \) control point; from there on it is:Footnote 15

$$\begin{aligned} \begin{aligned} end (n,i,k,a_k) \implies loop2 (n,0,k,a_k,0) \end{aligned} \end{aligned}$$
(27)
$$\begin{aligned} \begin{aligned} loop2 (n,i,k,a_k,f) \wedge i< n \implies rd2 (n,i,x,k,a_k,f) \end{aligned}\end{aligned}$$
(28)
$$\begin{aligned} \begin{aligned} rd2 (n,i,x,k,a_k,f) \wedge i\ne k \wedge {\varvec{rd}}{} \mathbf{2}{\varvec{(n,i,x,i,}}{\varvec{a}}_{\varvec{i}}{\varvec{,f)}} \implies test2 (n,i,a_i,k,a_k,f) \end{aligned}\end{aligned}$$
(29)
$$\begin{aligned} \begin{aligned} rd2 (n,i,x,i,a_i,f) \implies test2 (n,i,a_i,i,a_i,f) \end{aligned}\end{aligned}$$
(30)
$$\begin{aligned} \begin{aligned} test2 (n,i,x,k,a_k,f) \wedge x\ne 42 \implies loop2 (n,i+1,k,a_k,1) \end{aligned}\end{aligned}$$
(31)
$$\begin{aligned} \begin{aligned} test2 (n,i,x,k,a_k,f) \wedge x= 42 \implies loop2 (n,i+1,k,a_k,f) \end{aligned}\end{aligned}$$
(32)
$$\begin{aligned} \begin{aligned} loop2 (n,i,k,a_k,f) \wedge i\ge n \implies end2 (f) \end{aligned}\end{aligned}$$
(33)
$$\begin{aligned} \begin{aligned} end2 (f) \Rightarrow f=0 \quad \text {(property to prove)} \end{aligned} \end{aligned}$$
(34)

This abstraction is precise enough to prove the desired property, thanks to the bold antecedent (multiple cell indices occur within the same unfolding). Without this nonlinear rule (removing this antecedent yields a sound abstraction equivalent to [31]), the unfoldings of the system of rules all carry the same k (index of the distinguished cell) between predicates: the program is analyzed with respect to one single a[k], with k symbolic, leading to insufficient precision.

No Restrictions on Domain Type and Relationships, and Matrices. The kind of relationship that can be inferred between loop indices, array indices and array contents is limited only by the capabilities of the Horn solver. For instance, invariants of the form \(\forall i~ i \equiv 0 \pmod 2 \implies a[i]=0\) may be inferred if the Horn solver supports numeric invariants involving divisibility. Similarly, we have made no assumption regarding the nature of the indexing variable: we used integers because arrays indexed by an integer range are a very common kind of data structure, but really it can be any type supported by the Horn clause solver, e.g. rationals or strings. For instance, matrices (resp. n-tensors) are specified by having pairs of integers (resp. n-tuples) as indices.

4 Sortedness and Other N-ary Predicates

The Galois connection of Definition 1 expresses relations of the form \(\forall k \in \iota ~ \phi (\mathbf {x}, k, a[k])\) where \(\mathbf {x}\) are variables from the program, a a map and k an index into the map a; in other words, relations between each array element individually and the rest of the variables. It cannot express properties such as sortedness, which link two array elements: \(\forall k_1,k_2 \in \iota ~ k_1 < k_2 \implies a[k_1] \le a[k_2]\).

For such properties, we need two “distinguished cells”, with indices \(k_1\) and \(k_2\). For efficiency, we break the symmetry between indices \(k_1\) and \(k_2\) by imposing \(k_1 < k_2\) for some total order \(<\) on \(\iota \).

Definition 8

The abstraction with indices \(k_1 < k_2\) is

$$\begin{aligned} \gamma _{2<}(I^\sharp ) = \{ (\mathbf {x},a) \mid \forall k_1 < k_2\in \iota ~ (\mathbf {x},k_1,a[k_1],k_2,a[k_2]) \in I^\sharp \} \end{aligned}$$
(35)
$$\begin{aligned} \alpha _{2<}(I) = \{ (\mathbf {x},k_1,a[k_1],k_2,a[k_2]) \mid (\mathbf {x},a) \in I, k_1 < k_2 \in \iota \} \end{aligned}$$
(36)

Theorem 3

\(\alpha _{2<}\) and \(\gamma _{2<}\) form a Galois connection.

These constructions easily generalize to arbitrary N indices \(k_1,\dots ,k_N\).

Definition 9

(Read, two indices \(k_1 < k_2\) ). The abstraction of \(v := a[i]\) is:

$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, k_2, a_{k_2}) \wedge I^\sharp _1(\mathbf {x}, i, v, i, a_i, k_2, a_{k_2}) \wedge \\ I^\sharp _1(\mathbf {x}, i, v, i, a_i, k_1, a_{k_1}) \wedge i< k_1 < k_2 \implies I^\sharp _2(\mathbf {x}, i, a_i, k_1, a_{k_1}, k_2, a_{k_2}) \end{aligned} \end{aligned}$$
(37)
$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, i, a_i, k_2, a_{k_2}) \wedge I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, k_2, a_{k_2}) \wedge \\ I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, i, a_i) \wedge k_1< i < k_2 \implies I^\sharp _2(\mathbf {x}, i, a_i, k_1, a_{k_1}, k_2, a_{k_2}) \end{aligned}\end{aligned}$$
(38)
$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, k_2, a_{k_2}, i, a_i) \wedge I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, i, a_i) \wedge \\ I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, k_2, a_{k_2}) \wedge k_1< k_2 < i \implies I^\sharp _2(\mathbf {x}, i, a_i, k_1, a_{k_1}, k_2, a_{k_2}) \end{aligned}\end{aligned}$$
(39)
$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, i, a_i, k_2, a_{k_2}) \wedge i < k_2 \implies I^\sharp _2(\mathbf {x}, i, a_i, i, a_i, k_2, a_{k_2}) \end{aligned}\end{aligned}$$
(40)
$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, i, a_i) \wedge k_1 < i \implies I^\sharp _2(\mathbf {x}, i, a_i, k_1, a_{k_1}, i, a_i) \end{aligned} \end{aligned}$$
(41)

This generalizes to N-ary abstraction by considering all orderings of i inside \(k_1< \dots < k_N\), and for each ordering taking all sub-orderings of size N.

Definition 10

(Write statement, two indices \(k_1 < k_2\) ). The abstraction of \(a[i]:=v\) is:

$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, k_2, a_{k_2}) \wedge i \ne k_1 \wedge i \ne k_2 \implies I^\sharp _2(\mathbf {x}, i, v, k_1, a_{k_1}, k_2, a_{k_2}) \end{aligned}\end{aligned}$$
(42)
$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, i, a_{i}, k_2, a_{k_2}) \wedge i < k_2 \implies I^\sharp _2(\mathbf {x}, i, v, i, v, k_2, a_{k_2}) \end{aligned}\end{aligned}$$
(43)
$$\begin{aligned} \begin{aligned} I^\sharp _1(\mathbf {x}, i, v, k_1, a_{k_1}, i, a_{i}) \wedge k_1 < i \implies I^\sharp _2(\mathbf {x}, i, v, k_1, a_{k_1}, i, v) \end{aligned} \end{aligned}$$
(44)

Lemma 2

The abstract forward semantics of the read statement (Definition 9) and of the write statement (Definition 10) are sound w.r.t the Galois connection.

Example 7

(Selection sort). Selection sort finds the least element in \(a[d \dots h-1]\) (using Program 5 as its inner loop) and swaps it with a[d], then sorts \(a[d+1 \dots h-1]\). At the end, \(a[d_0\dots h-1]\) is sorted, where \(d_0\) is the initial value of d.

figure e
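The listing above is not reproduced in this extraction; a hedged Python rendering of selection sort on \(a[d \dots h-1]\), with the swap made explicit, is:

```python
def selection_sort(a, d, h):
    """Hedged reconstruction: sort a[d..h-1] in place by selection."""
    while d < h:
        p = d                        # inner loop (cf. Program 5): position of
        for i in range(d + 1, h):    # the minimum of a[d..h-1]
            if a[i] < a[p]:
                p = i
        a[d], a[p] = a[p], a[d]      # swap the minimum into position d
        d = d + 1
    return a

a = selection_sort([3, 1, 2, 5, 4], 0, 5)
# Postcondition (45): a[k1] <= a[k2] for d0 <= k1 < k2 < h.
assert all(a[k1] <= a[k2] for k1 in range(5) for k2 in range(k1 + 1, 5))
```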

Using the rules for the read (Definition 9) and write (Definition 10) statements, we write the abstract forward semantics of this program as a system of Horn clauses.

We wish to prove that, at the end, \(a[d_0 \dots h-1]\) is sorted: at the \( exit \) node,

$$\begin{aligned} \forall d_0 \le k_1< k_2 < h,~ a[k_1] \le a[k_2] \end{aligned}$$
(45)

This is expressed as the final condition:

$$\begin{aligned} \begin{aligned} d_0 \le k_1< k_2 < h \wedge exit (d_0, h, k_1, a_{k_1}, k_2, a_{k_2}) \implies a_{k_1} \le a_{k_2} \end{aligned} \end{aligned}$$
(46)

By running a solver on these clauses, we show that the output of selection sort is truly sorted.Footnote 16 Let us note that this proof relies on nontrivial invariants:Footnote 17

$$\begin{aligned} \forall k_1,k_2,~ d_0 \le k_1< d \wedge k_1 \le k_2 < h \implies a[k_1] \le a[k_2] \end{aligned}$$
(47)

This invariant can be expressed in our Horn clauses as:

$$\begin{aligned} \begin{aligned} d_0 \le k_1< d \wedge k_1< k_2 < h \wedge outerloop (d_0,d,h,k_1,a_{k_1},k_2,a_{k_2}) \implies a_{k_1} \le a_{k_2} \end{aligned} \end{aligned}$$
(48)

If this invariant is added to the problem as an additional query to prove, solving time is reduced from 6 min to 1 s. It may seem counter-intuitive that a solver takes less time on a problem with an additional constraint; but this constraint expresses an invariant needed by the proof, and thus nudges the solver towards the solution.

Our approach is therefore flexible: if a solver fails to prove the desired property on its own, it is possible to help it by providing partial invariants. This is a less tedious approach than having to provide full invariants at every loop header, as common in assisted Floyd-Hoare proofs.

5 Sets and Multisets

Our abstraction for maps may be used to abstract (multi)sets. Let us see for instance how to abstract the multiset of elements of an array, so as to show that the output of a sorting algorithm is a permutation of the input.

In Example 7, we showed how to prove that the output of selection sort is sorted. This is not enough for functional correctness: we also have to prove that the output is a permutation of the input, or, equivalently, that the multiset of elements in the output array is the same as that in the input array.

Let us remark that it is easy to keep track, in an auxiliary map, of the number \(\#a(x)\) of elements of value x in the array a[]. Only write accesses to a[] have an influence on \(\#a\): a write \(a[i]:=v\) is replaced by a sequence:

$$\begin{aligned} \#a(a[i]) := \#a(a[i])-1;~ a[i]:=v;~ \#a(v) := \#a(v)+1 \end{aligned}$$
(49)

(that is, in addition to the array write, the count for the value that gets overwritten is decremented, and the count for the new value is incremented).
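This instrumentation can be sketched as follows (an illustrative sketch, not the tool's output; a Python dict plays the role of the auxiliary map \(\#a\), and is assumed to already count the array's current contents):

```python
def write(a, count, i, v):
    """a[i] := v, instrumented to maintain count[x] = number of cells of a equal to x."""
    count[a[i]] = count[a[i]] - 1   # #a(a[i]) := #a(a[i]) - 1
    a[i] = v                        # the actual array write
    count[v] = count.get(v, 0) + 1  # #a(v) := #a(v) + 1

a = [5, 7, 5]
count = {5: 2, 7: 1}   # multiset of the current contents of a
write(a, count, 0, 7)  # a becomes [7, 7, 5]
```

After the write, `count` is `{5: 1, 7: 2}`, matching the multiset of `[7, 7, 5]`.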

This auxiliary map \(\#a\) can itself be abstracted using our approach! Let us now see how to implement this in our abstract forward semantics expressed using Horn clauses. We enrich our Galois connection (Definition 1) as follows:

Definition 11

The concretization of \(I^\sharp \subseteq \chi \times (\iota \times \beta ) \times (\beta \times \mathbb {N})\) is

$$\begin{aligned} \gamma _{\#}(I^\sharp ) = \Big \{ (\mathbf {x},a) \mid \forall i\in \iota ~ \forall v \in \beta ~ \big (\mathbf {x},(i,a[i]), (v,{{\mathrm{card}}}\{ j \in \iota \mid a[j] = v \})\big ) \in I^\sharp \Big \} \end{aligned}$$
(50)

where \({{\mathrm{card}}}X\) denotes the number of elements in the set X.

The abstraction of \(I \subseteq \chi \times Arr \left( \iota ,\beta \right) \) is

$$\begin{aligned} \alpha _{\#}(I) = \Big \{ \big (\mathbf {x},(i,a[i]),(v,{{\mathrm{card}}}\{ j \in \iota \mid a[j] = v \})\big ) ~\Big |~ (\mathbf {x},a) \in I,~ i \in \iota ,~ v \in \beta \Big \} \end{aligned}$$
(51)

Theorem 4

The Horn rules for array reads and for scalar operations are the same as those for our first abstraction, except that we carry over the extra two components identically.

Definition 12

(Read statement). Same notations as in Definition 2:

$$\begin{aligned} \begin{aligned} k \ne i \wedge I^\sharp _1\big ((\mathbf {x},i,v),(k,a_k),(z,a_{\#z})\big ) \wedge \\ I^\sharp _1\big ((\mathbf {x},i,v),(i,a_i),(z,a_{\#z})\big ) \implies I^\sharp _2 \big ((\mathbf {x},i,a_i),(k,a_k),(z,a_{\#z}) \big ) \end{aligned}\\ \begin{aligned} I^\sharp _1\big ((\mathbf {x},i,v),(i,a_i),(z,a_{\#z})\big ) \implies I^\sharp _2\big ((\mathbf {x},i,a_i),(i,a_i),(z,a_{\#z})\big ) \end{aligned} \end{aligned}$$

Lemma 3

The abstract forward semantics of the read statement (Definition 12) is a sound abstraction of the concrete semantics given in Definition 2.

The abstraction of the write statement is more complicated (see the sequence of instructions in Formula 49). To abstract a write \(a[i]:=v\) between control points \(p_1\) and \(p_2\), we execute a read of the old value from the cell abstraction of the array, decrement the number of cells with this value, execute the write to the cell abstraction, and increment the number of cells with the new value.

Definition 13

(Write statement). With the same notations as in Definition 3:

(Figure: Horn rules for the write statement of Definition 13)

Lemma 4

The abstract forward semantics of the write statement (Definition 13) is a sound abstraction of the concrete semantics given in Definition 3.

If one wants to compare the multiset of the contents of an array a at the end of a procedure with its contents at the beginning, one needs to keep a copy of the old multiset. It is common that the property sought is a relation between the number of occurrences \(\#a(z)\) of an element z in the output array a and its number of occurrences \(\#a_0(z)\) in the input array \(a^0\). In the above formulas, one may therefore replace the pair \((z, a_{\#z})\) by the triple \((z, a_{\#z}, a^0_{\#z})\), with \(a^0_{\#z}\) always propagated identically.

Example 8

Consider again selection sort (Program 7). We use the abstract semantics for read (Definition 12) and write (Definition 13), with an additional component \(a^0_{\#z}\) for tracking the original number of values z in the array a.

We specify the final property as the query

$$\begin{aligned} \begin{aligned} exit (d_0,h,k,a_k,z,a_{\#z},a^0_{\#z}) \implies a_{\#z} = a^0_{\#z} \end{aligned} \end{aligned}$$
(52)
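Query (52) states that every value occurs equally often in the output as in the input. On concrete runs this is the familiar permutation check (a sketch for illustration only; `collections.Counter` stands in for the multisets \(\#a_0\) and \(\#a\)):

```python
from collections import Counter

def selection_sort(a):
    """Return a sorted copy of a, computed by selection sort."""
    a = list(a)
    for d in range(len(a)):
        m = min(range(d, len(a)), key=lambda j: a[j])
        a[d], a[m] = a[m], a[d]
    return a

xs = [4, 1, 3, 1, 2]
before = Counter(xs)        # #a0: multiset of the input
ys = selection_sort(xs)
after = Counter(ys)         # #a : multiset of the output
# the query demands #a(z) = #a0(z) for every value z
assert after == before
assert ys == sorted(xs)
```

What the solver establishes is of course the universal statement, for all inputs at once, not a check on individual runs.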

6 Experiments

Implementation. We implemented our prototype vaphor in 2k lines of OCaml. vaphor takes as input a mini-Java program (a variation of While with array accesses and assertions) and produces an SMT-LIB 2 file.Footnote 18 The core analyzer implements the translation for one- and two-dimensional arrays described in Sects. 3 and 4, as well as the direct translation towards a formula with array variables.

Experiments. We have tested our analyzer on several examples from the literature, including the array benchmarks proposed in [15] and also used in [3] (Table 1), and other classical array algorithms including selection sort, bubble sort and insertion sort (Table 2). We compared our approach to existing Horn clause solvers capable of dealing with arrays. All these files are available from the webpage https://hal.archives-ouvertes.fr/hal-01206882

Table 1. Comparison on the array benchmarks of [15]. (Average) timings are in seconds, CPU time. Abstraction with \(N=1\). “sat” means the property was proved, “unsat” that it could not be proved. “hints” means that some invariants had to be manually supplied to the solver (e.g. even/odd conditions). A star means that we used another version of the solver. Timeout was 5 min unless otherwise noted. The machine has 32 i3-3110M cores, 64 GiB RAM; C/C++ solvers were compiled with gcc 4.8.4, and the JVM is OpenJDK 1.7.0-85.
Table 2. Other array-manipulating programs, including various sorting algorithms. A star means that we used another version of the solver, R1 means random_seed=1. The result is likely a bug in Z3; the alternative is a bug in Spacer, since the same system cannot be satisfiable and unsatisfiable at the same time.

Limitations. Our tool does not currently implement the reasoning over array contents (multiset of values). Experiments for these were thus conducted by manually applying the transformations described in this article in order to obtain a system of Horn clauses. Because applying the rules manually is tedious and error-prone, the only sorting algorithm for which we have checked that the multiset of the output equals the multiset of the input is selection sort. We are however confident that the two other algorithms would go through, given that they use similar or simpler swapping structures.

Some examples from Dillig et al. [15] involve invariants with even/odd constraints. The Horn solvers we tried do not seem able to infer invariants involving divisibility predicates unless these are given by the user. For these cases we added the even/odd properties as additional invariants to prove.

Efficiency caveats. Our tool does not currently simplify the system of Horn clauses that it produces. We have observed that, in some cases, manually simplifying the clauses (removing useless variables, inlining single antecedents by substitution...) dramatically reduces solving times. Also, precomputing some simple scalar invariants on the Horn clauses (e.g. \(0 \le k < i\) for a loop from k to \(i-1\)) and adding them as properties to prove in the Horn system sometimes reduces solving time.

We have observed that the execution time of a Horn solver may dramatically change depending on minor changes in the input, pseudo-random number generator seed, or version of the solver. For instance, the same version of Z3 solves the same system of Horn clauses (proving the correctness of selection sort) in 3 min 40 s or 3 h 52 min depending on whether the random seed is 1 or 0.Footnote 19

Furthermore, we have run into numerous problems with solvers, including one example that, on successive versions of the same solver, produced “sat”, then “unknown”, and finally “unsat”; we have also encountered crashes.

For all these reasons, we believe that solving times should not be scrutinized too closely. The purpose of our experimental evaluation is not to benchmark solvers against each other, but to show that our abstraction, even though it is incomplete, is powerful enough to lead to fully automated proofs of functional correctness of nontrivial array manipulations, including sorting algorithms. Tools for solving Horn clauses are still in their infancy, and we thus expect their performance and reliability to increase dramatically.

7 Related Work

7.1 Cell-Based Abstractions

Smashing. The simplest abstraction for an array is to “smash” all cells into a single one — this amounts to removing the k component from our first Galois connection (Definition 1). The weakness of that approach is that all writes are treated as “may writes” or weak updates: \(a[i]:=x\) adds the value x to the values possibly found in the array a, but there is no way to remove any value from that set. Such an approach thus cannot treat initialization loops (e.g. Program 1) precisely.
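For illustration, a smashed array can be modelled as a single set of possible cell values in which every write is a weak update (our own sketch, not the paper's encoding):

```python
class SmashedArray:
    """All cells collapsed into one abstract cell: a set of possible values."""
    def __init__(self, initial_values):
        self.values = set(initial_values)

    def write(self, v):
        # weak update: the single abstract cell stands for *all* concrete cells,
        # so writing v adds it as a possible value but removes nothing
        self.values.add(v)

    def may_contain(self, v):
        return v in self.values

a = SmashedArray({0})        # an array whose cells are all 0
for i in range(10):          # initialization loop: a[i] := 42 for every i
    a.write(42)
assert a.may_contain(42)
assert a.may_contain(0)      # imprecision: cannot show the zeros are gone
```

Hence an initialization loop cannot be proved to overwrite every cell, which is exactly why smashing fails on Program 1.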

Exploding. At the other extreme, for an array of statically known finite length N (which is common in embedded safety-critical software), one can distinguish all cells \(a[0], \dots , a[N-1]\) and treat them as separate variables \(a_0, \dots , a_{N-1}\). This is a good solution when N is small, but a terrible one when N is large: (i) many analyses scale poorly with the number of active variables; (ii) an initialization loop will have to be unrolled N times to show that it initializes all cells. Both smashing and exploding have been used with success in the Astrée static analyzer [4, 5].

Slices. More sophisticated analyses [12, 17, 19, 32, 33] distinguish slices or segments in the array; their boundaries depend on the index variables. For instance, in array initialization (Program 1), one slice is the part already initialized (indices \(<i\)), the other the part yet to be initialized (indices \(\ge i\)). In the simplest case, each slice is “smashed” into a single value, but more refined analyses express relationships between slices. Since the slices are segments [ab] of indices, these analyses generalize poorly to multidimensional arrays. Also, there is often a combinatorial explosion in analyzing how array slices may or may not overlap.

Cornish et al. [10] similarly apply a program-to-program translation over the LLVM intermediate representation, followed by a scalar analysis.

To the best of our knowledge, all these approaches factor through our Galois connections, or combinations thereof: that is, their abstraction can be expressed as the composition of our abstraction with a further abstraction — even though our implementation of the abstract transfer functions is completely different from theirs. Our approach, however, separates the concerns of (i) abstracting array problems to array-less problems and (ii) abstracting the relationships between different cells and indices.

Fluid updates. Dillig et al. [15] extend the slice approach by introducing “fluid updates” to overcome the dichotomy between strong and weak updates. They specifically exclude sortedness from the kind of properties they can study.

Array removal by program transformation. Monniaux and Alberti [31] analyze array programs by transforming them into array-free programs, which are sent to a back-end analyzer. The resulting invariants contain extra index variables, which can be universally quantified away, similar to the Skolem constants of earlier invariant inference approaches [16, 27]. We have explained (p. 3) why these approaches are less precise than ours.

Another difficulty they obviously faced was the limitations of the back-end solvers that they could use. The integer acceleration engine Flata severely limits the kind of transition relations that can be considered and scales poorly. The abstract interpreter ConcurInterproc can infer disjunctive properties (necessary to distinguish two slices in an array) only if given case splits using observer Boolean variables; but the cost increases greatly (exponentially, in the worst case) with the number of such variables.

7.2 Horn Clauses

Transformations. De Angelis et al. [14] start from a system of Horn clauses over array variables and apply a sequence of transformation rules that (i) yield the empty set of clauses, meaning the program is correct, (ii) yield the “false” fact, meaning the program is incorrect, or (iii) fail to terminate. These rules are based on the axioms of arrays and on a generalization scheme for arithmetic predicates using widening and convex hull.

In contrast to their approach, ours (i) does not require a target property to prove (though the backend solver may need one), (ii) does not mix concerns about arrays and arithmetic constraints, and (iii) can prove the correctness of the full insertion sort algorithm (they prove only the inner loop).

Instantiation. Bjørner et al. [3] propose an approach for solving universally quantified Horn clauses: a Horn clause \((\forall x~ P(x,y)) \rightarrow Q(y)\), not handled by current solvers, is abstracted by \(P(x_1(y),y) \wedge \dots \wedge P(x_n(y),y) \rightarrow Q(y)\), where the \(x_i\) are heuristically chosen instantiations. Our approach can be construed as an application of their approach to the axioms of arrays, with specific instantiation heuristics.

We improve on their interesting contribution in several ways. (i) Instead of presenting our approach as a heuristic instantiation scheme, we show that it corresponds to specific Galois connections, which clarifies what abstraction is done and what kind of properties can or cannot be represented. (ii) We handle sortedness properties. None of their examples deal with sortedness and it is unclear how their instantiation heuristics would behave on them. (iii) We handle multisets (and thus permutation properties) by reduction to arrays. It is possible that our approach in this respect can be described as an instantiation scheme over the axioms for arrays (including the multiset of array contents), but, again, it is unclear how their instantiation heuristics would behave in this respect.

Their approach has not been implemented except in private research prototypes; we could not run a comparison.Footnote 20

7.3 Predicate Abstraction, CEGAR and Array Interpolants

There exist a variety of approaches based on counterexample-guided abstraction refinement using Craig interpolants [28–30]. In a nutshell, Craig interpolants are predicates suitable for proving, using Hoare triples, that some unfolding of the execution cannot lead to an error state. They are typically extracted from the proof of unsatisfiability of the unfolding produced by an SMT solver.

Generating good interpolants from purely arithmetic problems is already a difficult problem, and generating good universally quantified interpolants on array properties has proved even more challenging [1, 2, 23].

7.4 Acceleration

Bozga et al. [7] have proposed a method for accelerating certain transition relations involving actions over arrays, outputting the transitive closure in the form of a counter automaton. Translating the counter automaton into a first-order formula expressing the array properties however results in a loss of precision.

8 Conclusion and Perspectives

We have proposed a generic approach to abstract programs and universal properties over arrays (or arbitrary maps) by syntactic transformation into a system of Horn clauses without arrays, which is then sent to a solver. This transformation is powerful enough to prove, fully automatically and within minutes, that the output of selection sort is sorted and is a permutation of the input.

While some solvers have difficulties with the kind of Horn systems that we generate, some (e.g. Spacer) are capable of solving them quite well. We have used the stock version of the solvers, without tuning or help from their designers, thus higher performance is to be expected in the future. If the solver cannot find the invariants on its own, it can be helped by partial invariants from the user.

As experiments show, our approach significantly improves on the procedures currently implemented in array-capable Horn solvers, as well as on earlier approaches for inferring quantified array invariants: they typically cannot prove sorting algorithms correct.

Our rules are for forward analysis: a solution to our Horn clauses defines a super-set of all states reachable from program initialization, and the desired property is proved if this set is included in the property. We intend to investigate backward analysis: find a super-set of the set of all states reachable from a property violation, not intersecting the initial states.

One advantage of some of the approaches (the abstract interpretation ones from Sect. 7.1 and the transformation from [31]) is that they are capable of inferring what a program does, or at least a meaningful abstraction of it (e.g. “at the end of this program all cells in the array a contain 42”), as opposed to merely proving a property supplied by the user. Our approach can achieve this as well, provided it is used with a Horn clause solver that produces interesting solutions without needing a query. Such a Horn clause solver should however be capable of generating disjunctive properties (e.g. \((k < i \wedge a_k = 0) \vee (k \ge i \wedge a_k = 42)\)); thus a simple approach by abstract interpretation of the Horn clauses in, say, a sub-class of the convex polyhedra, will not do. We know of no such Horn solver; designing one is a research challenge. Maybe certain partitioning approaches used in sequential program verification [21, 34] could be transposed to Horn clauses.

We have considered simple programs operating over arrays or maps, as opposed to a real-life programming language with objects, references or, horror, pointer arithmetic. Yet, our approach can be adapted to such languages, following methods that view memory as arrays [6], whose disjointness is proved by typing (e.g. two values of different types can never be aliased, two fields of different types can never be aliased) or by alias analysis.