1 Introduction

The Why3 environment for deductive program verification is built around a kernel that implements a formal specification language, based on typed first-order logic. Logical goals can be proved using a large set of automated and interactive external theorem provers, such as Alt-Ergo [4], CVC3 [2], CVC4 [1], Z3 [8], E [17], SPASS [19], Vampire [16], Coq [3], or PVS [15]. When a goal is sent to a prover that does not support some features of the language, Why3 applies a series of encoding transformations, for example, to eliminate pattern matching or polymorphic types [7].

On top of this kernel, Why3 features a programming language WhyML, where functions can be formally specified using contracts. A VC generator produces proof obligations that need to be discharged to prove that a program respects its specification [9].

In this paper, we illustrate the use of Why3 by providing solutions to the three challenges that were given at the VerifyThis competition, held at the 18th FM symposium in August 2012. The description of the challenges can be found at http://fm2012.verifythis.org/challenges/. Reference Java implementations were given for the first two problems, and an algorithm in pseudocode was given for the third one.

We entered the competition with two teams of two members each, both using Why3. By the end of the competition, our teams had partial solutions for the proposed challenges. After the competition, we merged our teams. For each challenge, we took the better of the two approaches as a basis upon which we built a complete solution. We estimate that we spent approximately 30 person-hours in that process. Our solutions use Why3 version 0.82 [6].

2 Why3 in a nutshell

In this section, we briefly describe Why3: the pure logic language of specifications, the programming language WhyML, and some peculiarities of program verification in Why3.

2.1 Specification language

Why3 is based on first-order logic with ML-style polymorphic types and several extensions: recursive definitions, algebraic data types, and inductive predicates. The specification language of Why3 does not depend on any features of its programming language, and can serve as a rich common format of theorem proving problems, readily suitable (via Why3) for multiple external provers.

Types. Built-in types in Why3 include integers (int), real numbers (real), and tuples. A user-defined type can be non-interpreted or be a synonym for a type expression:

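For instance, such declarations can look as follows (a minimal sketch; the names are illustrative, not those of the original listing):

    type char                  (* a non-interpreted type *)
    type position = int        (* a synonym for a type expression *)
    type pair 'a = ('a, 'a)    (* a polymorphic synonym for pairs *)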

Otherwise, it is an algebraic data type or a record. For instance, polymorphic lists and binary trees are defined in the standard library of Why3 as follows:

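A sketch of these standard-library definitions (the actual constructor names for binary trees may differ slightly, e.g., Empty vs. Leaf):

    type list 'a = Nil | Cons 'a (list 'a)

    type tree 'a = Empty | Node (tree 'a) 'a (tree 'a)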

Record types are a special case of algebraic types with a single unnamed constructor and named fields. Here is a definition of a banker’s queue:

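A sketch of such a record type (the field names are illustrative):

    type queue 'a = {
      front: list 'a;  (* first elements, in order        *)
      lenf:  int;      (* length of front                 *)
      rear:  list 'a;  (* last elements, in reverse order *)
      lenr:  int       (* length of rear                  *)
    }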

All types must be inhabited and Why3 checks that every algebraic type declaration admits at least one value. For example, the above definition of the list type would be rejected without the constructor Nil.

Function and predicate symbols. Every function or predicate symbol in Why3 has a polymorphic type signature. For example, an abstract function that applies a mapping to an element of the domain can be declared as follows:

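A sketch of such a declaration; in the Why3 standard library the corresponding abstract type and accessor are called map and get:

    type map 'a 'b
    function get (map 'a 'b) 'a : 'b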

Both functions and predicates can be given definitions, possibly mutually recursive. As examples, we can specify that an integer-to-integer mapping is strictly increasing:

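A possible definition, reusing the map type and get function sketched above (and assuming int.Int is imported):

    predicate increasing (m: map int int) =
      forall i j: int. i < j -> get m i < get m j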

or calculate the height of a tree:

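A sketch of the height function over the tree type sketched above; max comes from the standard theory int.MinMax:

    use import int.MinMax

    function height (t: tree 'a) : int =
      match t with
      | Empty      -> 0
      | Node l _ r -> 1 + max (height l) (height r)
      end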

Why3 automatically verifies that recursive definitions are terminating. To do so, it looks for an appropriate lexicographic order of arguments that guarantees a structural descent. Currently, we only support recursion over algebraic types. Other kinds of recursive functions have to be axiomatized or defined as programs, where termination is proved using variants (see Sect. 2.2).

Another extension to the first-order language adopted in Why3 is inductive predicates. Such a predicate is the least relation satisfying a set of clauses. For instance, the subsequence relation over lists is inductively defined as follows:

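A sketch of such a definition, where sub l1 l2 means that l1 is a subsequence of l2 (the names are ours):

    inductive sub (list 'a) (list 'a) =
      | sub_nil:  forall l: list 'a. sub Nil l
      | sub_cons: forall x: 'a, l1 l2: list 'a.
                    sub l1 l2 -> sub (Cons x l1) (Cons x l2)
      | sub_skip: forall x: 'a, l1 l2: list 'a.
                    sub l1 l2 -> sub l1 (Cons x l2)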

Standard positivity restrictions apply to ensure the existence of the least fixed point.

Terms and formulas. The first-order language is extended, both in terms and formulas, with pattern matching, let-expressions, and conditional expressions. We stay faithful to the usual distinction between terms and formulas that is made in first-order logic. Thus we distinguish a predicate symbol from a function symbol that returns a value of type bool, the latter being defined as an algebraic data type with two constructors, True and False. However, to facilitate writing, conditional expressions are allowed in terms, as in the following definition of absolute value:

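A sketch of this definition (assuming int.Int is imported):

    function abs (x: int) : int = if x >= 0 then x else -x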

Such a construct is directly accepted by provers that make no distinction between terms and formulas. To translate it for traditional first-order provers, Why3 lifts the conditional to the level of formulas and rewrites it as a conjunction of two implications.

Theories. Pure logical definitions, axioms, and lemmas are organized in collections, called theories. The standard library of Why3 contains numerous theories describing integer and real arithmetic, lists, binary trees, mappings, abstract algebraic notions, etc. To provide fine-grained control over the premise set, we favor small theories that build on each other and introduce one or two concepts each, such as Euclidean division, list membership, or injective maps. The instruction use imports a theory into the current context:

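For example, a small theory combining several imports might look like this (the module names int.Int, list.List, and list.Length exist in the standard library; the rest is illustrative):

    theory Example
      use import int.Int
      use import list.List
      use import list.Length

      (* integer arithmetic, lists, and their length are now in scope *)
      predicate singleton (l: list 'a) = length l = 1
    end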

2.2 Programming language

WhyML can be seen as a dialect of ML with extensive support for specification annotations. Program functions are provided with pre- and postconditions for normal and exceptional termination, and loop statements are annotated with invariants. To ensure termination, recursive functions and while loops can be given variants, i.e., terms that decrease at each recursive call or iteration with respect to some well-founded ordering. Statically checked assertions can be inserted at arbitrary points in a program.

Verification conditions are generated using a standard weakest-precondition procedure. To produce first-order proof obligations, WhyML is restricted to the first order, too: nested function definitions are allowed but higher-order functions are not. Furthermore, to keep proof obligations more tractable for provers and more readable (hence debuggable) for users, the type system of WhyML requires all aliases to be known statically, at the time of verification condition generation. This allows us to apply the Hoare-style rule for assignment, without resorting to a heap memory model. One consequence of this discipline is that recursive data types cannot have mutable components. As we demonstrate below, these restrictions do not preclude us from writing and verifying complex algorithms and data structures.

Types. WhyML extends the pure types of the specification language in several ways. First and foremost, the mutable state of a computation is exclusively embodied in mutable fields of record data types:

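The archetypal example is the type of mutable references from the standard library's ref module:

    type ref 'a = { mutable contents: 'a }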

A program type can be provided with an invariant, i.e., a logical property imposed on any value of the type:

[listing omitted]

Type invariants in WhyML are verified at function call boundaries. Since the WhyML type system tracks aliases and side effects statically, it is easy to detect when a type invariant must be re-established. The keyword model in place of the equal sign means that, inside WhyML programs, the type is not a record but an abstract data type. Thus, an attempt to access one of its fields in a program would be rejected. However, inside specifications, the type is a record and its fields may be accessed.

Finally, a record field (or, more generally, an argument of a constructor in an algebraic type) can be declared ghost. Ghost data and ghost computations serve strictly for verification purposes. In particular, a typical use case for ghost fields is to equip a data structure with a pure logical “view”. For example, some intricate implementation of sparse matrices may carry a ghost field:

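A sketch of what such a declaration could look like (the field names and the concrete representation are invented for illustration):

    use import map.Map

    type sparse_matrix = {
      rows:        int;
      columns:     int;
      (* ... concrete fields of the sparse representation ... *)
      ghost view:  map (int, int) int   (* pure logical contents, for specifications only *)
    }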

Formal arguments of program functions, as well as locally defined variables, can also be declared ghost.

To guarantee that ghost data and computations do not interfere with the program and cannot affect its final result, the WhyML type system imposes a number of restrictions on their use. Ghost data cannot be used in a non-ghost computation. Ghost computations cannot modify a non-ghost mutable value or raise exceptions that escape into non-ghost code. However, ghost computations can read non-ghost values, and ghost data can be used in program specifications.

Function prototypes. Unlike other ML variants, WhyML does not separate interface and implementation. One can freely mix, in the same WhyML module, fully implemented program functions with abstract function prototypes carrying only a type and a specification. For “model” types like array, whose structure is inaccessible from programs, function prototypes are the only way to provide a usable interface. Here is how lookup and update are specified for arrays:

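The two prototypes are essentially as follows (reconstructed from memory of the Why3 0.82 array module, so details such as the reads clause may differ):

    val ([]) (a: array 'a) (i: int) : 'a
      requires { 0 <= i < length a }
      reads    { a }
      ensures  { result = a[i] }

    val ([]<-) (a: array 'a) (i: int) (v: 'a) : unit
      requires { 0 <= i < length a }
      writes   { a }
      ensures  { a.elts = (old a.elts)[i <- v] }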

The names ([]) and ([]<-) define the mixfix operators a[i] and a[i] <- v. The reads and writes clauses specify the side effects of function prototypes; for example, ([]<-) modifies the mutable field of a, namely elts. The term old a.elts in the postcondition of the second prototype refers to the pre-call value of the field elts, before it is modified by ([]<-).

Programs. The syntax of WhyML programs should give no surprise to anyone familiar with ML. As examples, let us show several functions to handle mutable references:

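A sketch of these functions, close to (but not necessarily identical to) the ref module of the standard library; the type ref 'a shown above is assumed:

    function (!) (r: ref 'a) : 'a = r.contents   (* pure access, usable in specifications *)

    let ref (v: 'a) = { contents = v }

    let (!) (r: ref 'a) = r.contents

    let (:=) (r: ref 'a) (v: 'a) ensures { !r = v }
      = r.contents <- v

    let incr (r: ref int) ensures { !r = old !r + 1 }
      = r := !r + 1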

Contrary to function prototypes, we do not indicate the reads and writes effects, since Why3 can extract this information from the code. Notice that the same prefix symbol ! is used as the name of both a pure access function and a program function. Since program symbols cannot appear in specifications, ! in pre- and postconditions can only refer to the pure function. In the program code, ! refers to the WhyML function.

The last definition shows that pure types, functions, and predicates are accepted in WhyML programs. For instance, the type of integers and the basic arithmetic operations are shared between specifications and programs. The only exception is made for logic functions and predicates specified directly on program types: such symbols can only be used in specifications. One reason for this restriction is that these functions and predicates have uncontrolled access to the ghost components of program types. Had we not reused the ! operator, WhyML would reject the last definition, reporting that a pure function cannot appear in a program.

Modules. Akin to pure logical theories, WhyML declarations and definitions are grouped into modules. A module may import logical theories or contain pure declarations. The standard library modules introduce mutable references, arrays, hash tables, stacks, and queues.

2.3 Verifying programs with Why3

Simple Why3 formalizations can be verified directly from the command line: the why3 tool can run a designated automated prover on each proof obligation generated from a WhyML file and report any proof failures. For more complex developments, Why3 provides an interactive graphical environment whose main window is shown in Fig. 1. The big tree on the left shows the current state of the session. The root of the tree is the WhyML file being verified, the first-level nodes are theories and modules, and the second-level nodes are primary proof obligations: logical lemmas and verification conditions for top-level WhyML functions.

Fig. 1. Why3 graphical user interface (color figure online)

Proof obligations, also called proof tasks, can be sent to automated provers or handled using interactive proof assistants (see the stack of buttons to the left of the session view). Why3 also puts at the user's disposal a number of so-called transformations, which can be used to simplify a proof obligation under consideration or to split it into a number of sub-tasks. These sub-tasks appear at the lower levels of the session tree.

In the session shown in the figure, the file being checked is verifythis_fm2012_LRS.mlw. The first module in this file contains a WhyML function (whose source code is shown in the bottom-right frame), and the verification condition for this function is decomposed into seven sub-tasks: two preconditions ensuring the safety of array accesses in the loop condition, loop invariant initialization and preservation, and three postconditions covering each branch of the negated loop condition. The cursor is positioned on the first postcondition, where the first two parts of the loop condition are true and the third one is false. The proof obligation itself is shown in the top-right frame. Four automated provers were able to discharge this proof task successfully.

Due to differences in prover technology (which are especially deep between SMT solvers and resolution-based provers), there is no single best prover for the purposes of program verification. Quite often, a proof obligation is discharged by only one or two provers out of the half a dozen we use regularly. Being able to target diverse back-end provers is an important feature of Why3, which allows us to automatically prove more goals than we could with any single dedicated prover.

Proof sessions are saved between runs of Why3, which facilitates development and debugging of verified programs. Special algorithms were devised for reconstruction of session trees after modification of the program code or specification [5]. Along with the session, Why3 stores proof scripts for interactive proof assistants (at this moment, Coq and PVS are supported) that handle proof obligations falling beyond the reach of automated provers. One characteristic case where one has to resort to interactive proof is proof tasks requiring reasoning by induction.

Another important functionality of Why3 is code extraction: a verified WhyML program can be translated to a compilable correct-by-construction OCaml program (a mechanism similar to that of Coq). Recently, work has started on a native WhyML evaluator which would allow for quick testing of programs and assertions, speeding up the development process.

3 Challenge 1: longest repeated substring

This challenge aims at computing the longest repeated substring in a given string. The two occurrences of that substring are allowed to overlap. For example, the longest repeated substrings in ABCDBBCDA and ABCDBCDB are, respectively, BCD and BCDB, with an overlap in the second case.

A naive algorithm that checks all possible occurrences of each substring would have cubic complexity. This challenge proposes a more efficient algorithm, which amounts to computing the suffix array of the given string. It is an array of integers that lists the positions of the suffixes of the input string, in increasing lexicographic order. For example, the sorted array of suffixes of ABCDBCDB is [0; 7; 4; 1; 5; 2; 6; 3], corresponding to suffixes ABCDBCDB, B, BCDB, BCDBCDB, CDB, CDBCDB, DB, DBCDB, in that order. The longest repeated substring is then the longest common prefix of two consecutive strings in this array.

A preliminary step of this challenge is thus to compute the longest common prefix of two given substrings of the input; this part was expected to be solved during the competition.

We provide a complete solution to the challenge. The Why3 code follows the Java reference implementation as closely as possible. A noticeable difference is that the sorting function does not operate on a suffixArray object: the Java reference implementation calls the sorting method inside a constructor, which is not possible in Why3. Our solution is made of four Why3 modules:

  • LCP: computation of the longest common prefix of substrings.

  • SuffixSort: helper functions for sorting suffixes, roughly corresponding to the private part of the SuffixArray class of the Java code.

  • SuffixArray: the suffixArray data type, roughly corresponding to the public part of the SuffixArray class of the Java code.

  • LRS: computation of the longest repeated substring, corresponding to the LRS class of the Java code.

In the following, we detail each module in turn, presenting first its specifications, and then the implementation and proofs. The full solution, including the complete Why3 input and the proof results, is available on the Toccata gallery of verified programs at http://toccata.lri.fr/gallery/verifythis_fm2012_LRS.en.html.

3.1 Longest common prefix

Our first module implements the first part of the challenge: a function that computes the longest common prefix at two given indexes.

Specification. The specifications are given in Fig. 2. We import arrays from the Why3 standard library. The first predicate is true when the prefixes of a given length at two respective positions in the array are identical, i.e., when the corresponding subarrays are equal. The second predicate is true when a given length is the maximal length of common prefixes at the two positions. The postcondition of the function is then the natural way of specifying that it computes the longest common prefix.

Fig. 2. Challenge 1: specification of longest common prefixes
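To convey the flavor of these definitions, here is a sketch in which the predicate and parameter names are our own reconstruction and may differ from the actual code of Fig. 2:

    use import int.Int
    use import array.Array

    predicate is_common_prefix (a: array int) (x y l: int) =
      0 <= l /\ x + l <= length a /\ y + l <= length a /\
      forall i: int. 0 <= i < l -> a[x + i] = a[y + i]

    predicate is_longest_common_prefix (a: array int) (x y l: int) =
      is_common_prefix a x y l /\
      forall m: int. l < m -> not (is_common_prefix a x y m)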

Implementation and proofs. The Why3 code for the function that computes the longest common prefix is given in Fig. 3. It is a direct translation of the Java reference implementation. Since we need a mutable variable for this code, we use a reference (incr is a function of the standard library that increments an integer reference; see Sect. 2.2). The loop invariant is natural, since the algorithm amounts to incrementing a counter until the characters at the two compared positions differ.

Fig. 3. Challenge 1: implementation of the longest common prefix function

To prove this function, we first state and prove two lemmas (Fig. 3). These lemmas are not strictly necessary for the proof, though they help some automated provers, and we need them later anyway. The first lemma states that if the characters at a given offset from the two positions differ, then there is no common prefix extending past that offset. The second lemma states that, for the longest common prefix to be of a given length, it suffices to have a common prefix of that length but not of the next one. The two lemmas and the verification conditions for the function are proved automatically, using SMT solvers. (Detailed statistics on proofs are summarized in Sect. 3.5.)

3.2 Sorting suffixes

Our next module deals with lexicographic order on suffixes and sorting.

Specification. In the specifications of this module (Fig. 4) we define the lexicographic order on suffixes as a predicate: the suffix at one position is smaller than the suffix at another position if there is some length such that the prefixes of that length are the same, and the next character of the first suffix, if any, is smaller than the next character of the second. The comparison function is then specified with postconditions that tell, depending on the sign of the result, whether the first suffix is equal to, smaller than, or greater than the second.

Fig. 4. Challenge 1: specification of lexicographic comparison and sorting

To specify the sorting algorithm, we need several ingredients (Fig. 4). First we define the “less or equal” predicate on suffixes, which is the ordering relation used for sorting. As usual with sorting algorithms, the postcondition is twofold: first, it says that the resulting array is sorted; second, that it contains a permutation of its initial contents. This sorting function is peculiar since it does not take a single array as usual: it takes two arrays as arguments and sorts the array of suffix positions using an order relation that depends on the array of characters. Because of this peculiarity, we cannot reuse the sortedness predicate from the Why3 standard library to specify the function: the definition of that predicate is parametrized by only one array (see Appendix for details). We thus had to write our own version (Fig. 4) by copying the relevant part of the Why3 standard library and adding the extra argument.

On the other hand, we can reuse the permutation predicate from the standard library (see Fig. 24 in the Appendix for details). Finally, additional requirements are that the input arrays have the same length \(l\) and that the values in the array of suffix positions range over \(0\ldots l-1\), for which we can reuse the corresponding predicate from the standard library, stating that an array maps the domain \([0,\ldots ,n-1]\) into \([0,\ldots ,n-1]\) (see Fig. 25 in the Appendix).

Implementation and proofs. The implementation is given in Fig. 5. The code of the comparison function is close to the Java reference implementation. Notice the use of the keyword absurd to state that the last case is unreachable. All VCs for these functions are proved automatically.

Fig. 5. Challenge 1: implementation of comparison and sorting

To prove the sorting algorithm, we first have to show that the ordering relation on suffixes is transitive. To prove this lemma automatically, we state an auxiliary lemma handling the equality case. The loop invariants needed are no different from the ones needed for a generic insertion sort algorithm (see for instance http://toccata.lri.fr/gallery/sorting.en.html). We just add the fact that the values in the array of suffix positions remain within \(0\ldots l-1\). The two assertions in the code are intended to help the automated provers.

3.3 The suffixArray data structure

This module corresponds to the public part of the SuffixArray class in the Java reference implementation.

Specification. We first declare a suffixArray type (Fig. 6) corresponding to the SuffixArray class in the Java code. We equip this type with an invariant stating that the two arrays have the same length \(l\), that the suffix array is a permutation of \(0,\ldots ,l-1\), and that it is sorted in increasing order. The three functions correspond to the methods of the same name in the Java code, and their postconditions are natural.

Fig. 6. Challenge 1: specification of SuffixArray

Fig. 7. Challenge 1: implementation of SuffixArray

Implementation and proofs. The implementation (Fig. 7) of the first two functions is simple and easy to prove. The third function acts as a constructor, and the difficulty is to establish the type invariant. For this, we state a lemma: when one permutes the elements of an array that represents a permutation, the result is still a permutation. Such a lemma cannot be proved automatically, because it requires induction (the permutation predicate is defined inductively). We perform this proof in Coq, which requires a few lines of tactics. The notion of permutation of an interval \(0 \ldots l-1\) and this lemma are good candidates for addition to the Why3 standard library.

Fig. 8. Challenge 1: specification of longest repeated substring

3.4 Longest repeated substring

Specification. This module contains only the main function for computing the longest repeated substring, specified in Fig. 8. As in the Java code, there are two global variables that hold the result of the computation: the start index of the repeated substring and its length. To specify this function, we add an extra global ghost variable that holds the index of the second occurrence of the repeated substring. The postconditions thus say that the two indexes are distinct, that the result length is the length of their longest common prefix, and that for any other pair of indexes, the longest common prefix is not longer.

Implementation and proofs.

Fig. 9. Challenge 1: implementation of longest repeated substring

The implementation follows the Java code and is given in Fig. 9. The first three loop invariants are naturally derived from the first three postconditions. The most difficult part is to establish the last postcondition, for which we add a fourth loop invariant and two assertions after the loop.

Fig. 10. Challenge 1: lemmas for longest repeated substring

The fourth loop invariant states that we have already computed the longest repeated substring over the suffixes considered so far. To show that it is preserved by a loop iteration, in which only the suffixes at ranks \(i-1\) and \(i\) of the suffix array are compared, it is necessary to show that for any \(j\) between \(0\) and \(i-1\), the longest common prefix of the suffixes at ranks \(j\) and \(i\) is not longer than that of the suffixes at ranks \(i-1\) and \(i\). This property holds only because the suffixes are sorted lexicographically. To complete the proof with automated provers, we state the lemma given in Fig. 10. To prove it, we also state an auxiliary lemma. Both lemmas are proved automatically.

The last part of the proof is to show the fourth postcondition, knowing that the fourth loop invariant holds at loop exit. To achieve this, we need to add two assertions. The first assertion just makes the final loop invariant symmetric in \(j\) and \(k\), and needs the lemma of Fig. 10. The second assertion states that the array of suffixes is surjective, that is, we are sure that all indexes have been considered when the loop finishes. Since this second assertion could not be proved automatically, we had to perform its proof in Coq. The main reason is that it involves an existential quantification, which is typically hard to handle automatically. Another difficulty is that this surjectivity property follows from the classical but non-trivial mathematical result that an injective function mapping a finite set (namely, \(0,\ldots ,l-1\)) to itself is also surjective. Fortunately, such a result is already proved in the Why3 standard library, so we can reuse it in our Coq proof, which, in the end, is just a dozen lines long.

Finally, proving the postcondition from the second assertion must also be done in Coq: it seems that the automated provers are not able to exploit the existential quantifications successfully. Only 3 lines of tactics are needed.

3.5 Proof statistics

The detailed proof results are available at URL http://toccata.lri.fr/gallery/verifythis_fm2012_LRS.en.html. The table below summarizes these results. The time limit given to automated provers is 10 s.

Module        Number of VCs   Automatically proved
LCP            10             10 (100 %)
Suffixsort     62             62 (100 %)
Suffixarray    22             21 (95 %)
LRS            30             28 (93 %)
Total         124            121 (97 %)

The table below shows the results per prover, among the 121 VCs proved automatically.

Prover            Number of VCs proved
Alt-Ergo 0.95.2   116
CVC3 2.4.1        106
CVC4 1.3          105
Eprover 1.6        77
Vampire 0.6        75
Z3 3.2             85
Z3 4.3.1           82

Three VCs are proved only by Alt-Ergo. The three interactive proofs done using Coq amount to a total of 49 lines of script written manually, most of them for the permutation lemma of Sect. 3.3.

3.6 Lessons learned

The first part of the challenge, namely our LCP module, was easy and was indeed solved in almost the same way within the 45-minute slot. On the other hand, achieving the other parts required significantly more work, which we estimate at a dozen hours. It should be noticed that although the final specifications are, in our opinion, relatively short and natural, they were not easy to obtain at first: a significant amount of time is needed to find adequate specifications that make the proofs almost fully automated.

Our proofs were not done fully automatically, as we had to invoke Coq and write a few lines of proof script manually. In the case of the permutation lemma, this was necessary because inductive reasoning on the definition of the permutation predicate was needed. For the other VCs, this was needed to manually provide witnesses for some existential quantifications, which could not be found by the automated provers at our disposal. Notice that when we go to Coq to discharge a goal, we typically use a few lines of Coq tactics, e.g., to perform an induction. For the resulting subgoals, we can still use automated provers via the why3 Coq tactic, which amounts to sending a Coq goal back to Why3 and calling a given automated prover. It is an interesting question whether an interactive proof assistant is mandatory for such purposes; we will come back to this in the concluding section.

Another point is that we were able to reuse a significant number of theories from the standard library. This is good news, since good library support is certainly important for the efficient use of a verification environment. However, we also realized that some parts were not sufficiently generic, namely the sortedness predicate, for which the ordering relation cannot depend on an extra argument. A simple solution would be to add an extra parameter to that predicate in the library. Another direction would be to allow some form of partial application in the logic of Why3, which would be a first step towards higher-order logic.

4 Challenge 2: sums of prefixes

This challenge proposes to compute the sums of the prefixes of an array. More precisely, given an array \(\mathtt{a}\), it amounts to computing each partial sum \(\sum _{0\le k < i}\mathtt {a}[k]\) and storing it back into \(\mathtt {a}[i]\). The array is thus modified in place. The proposed algorithm requires the length of the array to be a power of 2, and does not use any extra storage.

The Why3 specification of the algorithm is given in Fig. 11. We import the function sum from the corresponding theory of the Why3 standard library, where sum a i j denotes the sum \(\sum _{\mathtt {i}\le k < \mathtt {j}}\mathtt {a}[k]\).

The challenge is to verify a sequential version of a parallelizable algorithm. The idea is to identify the array with a binary tree. The algorithm traverses this tree, gathering information first from the leaves to the root (the upsweep phase) and then from the root to the leaves (the downsweep phase). First, we specify the state of the array during the upsweep and downsweep phases. Then we explain each phase separately.

The full solution is available on the Toccata gallery of verified programs at http://toccata.lri.fr/gallery/verifythis_PrefixSumRec.en.html.

4.1 Modeling the binary tree

The main difficulty in this challenge is to capture the operations performed by the upsweep phase. The array is identified with a complete binary tree, and this phase updates nodes with sums of subtrees in an intricate way. The simplest idea is to use an inductive predicate (see Fig. 12) that mimics the recursive structure of the algorithm. It takes as arguments two indices, left and right, identifying a subtree, and two arrays: one stands for the initial contents of the array, and the other for its current state with partial sums.

Fig. 11. Challenge 2: specification of sums of prefixes

Fig. 12. Challenge 2: modeling the binary tree

A subtree is represented by a pair of indices (left, right), with the distance between them being a power of two. The elements of this subtree span the segment of the array between the two indices. We introduce two functions to descend into the left and right subtrees:

[listing omitted]

These two functions compute the index pairs of the left and right subtrees of a given pair (left, right). A schematic representation of the algorithm and of the tree view of the array is given in Fig. 13.

The main idea of the predicate is to describe the value stored at index left but not the value at index right. The reason is that the root of a left subtree is no longer modified once that subtree has been processed. Since the predicate is defined inductively, it describes the values at all indices that are roots of left subtrees. For instance, on the full tree of an array of length 8, it defines the values stored at indices 0 to 6. On the contrary, the value at index right is not specified by the predicate: indeed, the upsweep and downsweep functions have different specifications for it.

Fig. 13. Challenge 2: schematic view of the algorithm. The left column shows the upsweep phase, the right column the downsweep phase. The index pairs \((\mathit{left}_i, \mathit{right}_i)\) correspond to the subtrees and to the parameters (left, right) of the recursive calls. The initial array is at the bottom left, the array after the upsweep phase is at the top left, the array before the downsweep phase is at the top right, and the resulting array is at the bottom right. A plain arrow stands for a sum and an assignment performed in the first and second phases. A dashed arrow stands for a plain assignment performed in the second phase

4.2 The “upsweep” phase

Fig. 14. Challenge 2: upsweep phase

The implementation and the complete specification of the upsweep function are presented in Fig. 14. Called on a pair (left, right), this function sets the element at index right to the sum of the elements of the subtree. So, once the left subtree of (left, right) has been processed, the element at index left contains the sum of the elements of that left subtree; this is exactly the property described by the inductive predicate. Moreover, the array is modified by the recursive calls, so we need to specify the extent of these modifications. This is the purpose of the last two formulas in the postcondition. The conditions under which the inductive predicate is preserved are described by the two frame lemmas given in Fig. 15. These two lemmas are proved by induction, using the Coq proof assistant.

Fig. 15. Challenge 2: frame properties for the inductive predicate

4.3 The “downsweep” phase

Fig. 16. Challenge 2: downsweep phase

The implementation of the downsweep function is presented in Fig. 16. We need a ghost argument that contains the initial array: the current array cannot be used for this purpose, because it represents the intermediate state between the upsweep and downsweep phases. The traversal is the same as in the first phase, so the inductive predicate is used directly for the proof. During the second phase, the element at index right is again used in a special way. Before the call to downsweep, it contains the sum of the elements of the initial array located before the subtree (left, right):

[listing omitted]
Fig. 17. Challenge 2: main procedure

Finally, the downsweep function ensures that all the values of the subtree (left, right) are prefix sums of the initial array. As for the upsweep function, we need to specify the extent of the modifications of the array, but we do not have to state and prove dedicated frame lemmas here, because they can be derived, without induction, from the frame lemmas about sum available in the Why3 standard library. A frame lemma is also used for the frame of the first recursive call.

4.4 The main procedure

The main procedure, shown in Fig. 17, calls the two phases sequentially and initializes the root element to the prefix sum at index 0, which is 0. The harness test proposed in the challenge is also proved (we do not show its code in this paper).

4.5 Proof statistics

The detailed proof results are available at http://toccata.lri.fr/gallery/verifythis_PrefixSumRec.en.html. The table below summarizes these results. The time limit given to automated provers is 10 s.

Function   Number of VCs   Automatically proved
            4               2 (50 %)
           24              24 (100 %)
           29              29 (100 %)
           12              12 (100 %)
           20              20 (100 %)
Total      89              87 (98 %)

The table below shows the results per prover, among the 87 VCs proved automatically.

Prover            Number of VCs proved
Alt-Ergo 0.95.2   100
CVC3 2.4.1         85
CVC4 1.2           96
Z3 3.2             64
Z3 4.3.1           70

The two frame lemmas are proved using Coq with 4 lines of tactics: one induction on the inductive predicate, after which the subgoals are discharged using the why3 tactic, as in challenge 1. All remaining proof obligations are discharged by at least two automated provers.

4.6 Lessons learned

As for challenge 1, we had to use the interactive proof assistant Coq to discharge some VCs. See the general conclusion for a discussion on such a use of Coq.

The hardest part of this challenge is the specification of the state of the array between the two phases. During the competition, when we came up with the idea of using an inductive predicate, we got the indexes wrong several times. When the automated provers failed to prove a proof obligation, we added assertions intended to pinpoint the particular facts that could not be proved. We also tried to prove these facts in Coq to find the hole in the proof. After the competition, when the indexes in the specification were corrected, we removed the useless assertions and Coq proofs. This raises the general question of how to debug specifications. For this purpose, we are currently implementing a step-by-step evaluator of WhyML functions. The user will thus be able to easily compare the behavior of the program with its specification.

5 Challenge 3: deletion in a binary search tree

The third challenge is to verify a procedure that removes the node with the minimal key from a binary search tree. The pseudocode given at the competition descends along the leftmost branch of the tree using a while loop. When it reaches a node with no left child, it makes the right child of that node the new left child of its parent. The tree is mutated in place. Our solution follows the reference implementation. The full solution is available in the Toccata gallery of verified programs at http://toccata.lri.fr/gallery/verifythis_fm2012_treedel.en.html.

5.1 Preliminaries

Why3 has no native support for mutable trees. Hence we build a minimal memory model, given in Fig. 18. It introduces an uninterpreted type loc of memory locations, with a particular value null. The heap is then modeled as a global reference holding a purely applicative map from memory locations to nodes. A node is a record with two fields left and right of type loc and a third field holding the data.

Fig. 18. Challenge 3: preliminaries

To account for possible null dereferences, we do not access these three fields directly; instead, we use accessor functions with suitable preconditions:

[listing omitted]

(and similarly for the other two fields). In this model, we assume that any pointer other than null can be safely dereferenced, which is the case in languages that do not permit explicit deallocation, e.g., Java.

5.2 Specification

We import polymorphic immutable lists and trees from the Why3 standard library (see Sect. 2.1 for definitions). We also import a function that lists the elements of a tree according to an inorder traversal and a predicate expressing that all elements of a given list are distinct. The types list and tree only appear in specifications.

Fig. 19. Challenge 3: specification

Our specification for problem 3 is given in Fig. 19. The central component is an inductively defined predicate relating the heap to a logical tree. Such an inductive definition should be read as a set of inference rules.

Given a memory state \(m\), a loc \(p\), and a tree of memory locations \(t\) (of type tree loc), the predicate means that memory \(m\) contains a well-formed tree rooted at \(p\), whose shape is exactly \(t\). The inductive nature of the predicate ensures that such a tree is both finite and acyclic. But nothing prevents the heap-allocated tree from being a DAG; we will take care of that later.

The tree deletion operation takes a non-null loc as argument, which is required to be the root of a tree whose shape is passed as an additional ghost parameter. To account for the absence of sharing in that tree, we also require all its locations to be distinct, which is conveniently expressed with the predicate imported above. The postcondition, which constrains the pair returned by the function, makes use of a second ghost parameter to describe the shape of the resulting tree. It simply says that the inorder traversal of the initial tree is obtained by putting the data field of some location in front of the inorder traversal of the resulting tree. We avoid the use of an existential quantifier over that location by performing pattern matching over the ghost tree instead.

It is worth pointing out that our postcondition simply says that we deleted the first element of the inorder traversal, that is, the leftmost innermost element of the tree rooted at the given location. There is no need for the notion of binary search tree to show up. It would be an orthogonal (and easy) lemma to show that the minimal element of a binary search tree is located at the leftmost innermost node, and that after removal the remaining tree is still a binary search tree.

5.3 Proof

The code itself is straightforward. First, it handles the particular case where there is no left subtree. Otherwise, it descends along the leftmost branch of the tree, using three variables to hold three consecutive nodes on this branch. This can be depicted as follows:

[diagram omitted]

We are done with the descent when the left child of the current node becomes null. Then we simply remove the current node by turning its right subtree into the left subtree of its parent.

Fig. 20. Challenge 3: implementation

Proving the code, however, is not straightforward. The whole code, annotated with assertions and equipped with ghost statements, is given in Fig. 20. The loop performs some local computation, focusing on the subtree rooted at the current node, but the postcondition we wish to establish is related to the whole tree rooted at the argument. Thus we have to account for the “tree with a hole” depicted in gray in the picture above. Fortunately, there is a convenient way to define such a notion: Huet’s zipper [11]. The idea is to represent the placeholder of a subtree by the path from the root of the subtree to the root of the whole tree. In the general case, this path should indicate whether we took the left or the right subtree during the descent. In our case, however, we always move to the left subtree, so the zipper degenerates into a mere list:

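A sketch of this degenerate zipper type, with the constructor names Top and Left used in the explanation below:

    type zipper 'a = Top | Left (zipper 'a) 'a (tree 'a)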

The zipper \((\mathtt{Left}~z~x~t)\) denotes a hole in place of the left subtree of a node with value \(x\), with right subtree \(t\), and with some upper context \(z\). The zipper Top denotes a hole at the root of the tree. For instance, the “tree with a hole” \((\mathtt{Node}~(\mathtt{Node}~\Box~x_2~t_2)~x_1~t_1)\) is denoted by the zipper \((\mathtt{Left}~(\mathtt{Left}~\mathtt{Top}~x_1~t_1)~x_2~t_2)\).

From a zipper and a subtree, we can recover the whole tree with the following recursive function that rebuilds the nodes along the path:

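A sketch of this rebuilding function (the name zip is ours), assuming the zipper type above and the tree type of Sect. 2.1:

    function zip (t: tree 'a) (z: zipper 'a) : tree 'a =
      match z with
      | Top         -> t
      | Left z' x r -> zip (Node t x r) z'
      end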

The idea behind our proof is to maintain, in a ghost variable, the zipper corresponding to the subtree rooted at the current node, as well as its left and right subtrees in two additional ghost variables. This can be depicted as follows:

[diagram omitted]

These three ghost variables are updated along the descent. Thus we have the following loop invariant:

[listing omitted]

Since the memory is not mutated within the loop, showing the preservation of this loop invariant is straightforward.

The difficult part of the proof lies in the final statement, once we have exited the loop and mutated the memory. The new tree is built from the zipper using

[listing omitted]

We have to show that the inorder traversal of that tree is the tail of the inorder traversal of the initial tree. But this only holds thanks to the absence of sharing in the tree, which is provided by the precondition that all locations are distinct. Otherwise, more than one element could have disappeared from the list. We move that key property into the following lemma:

[listing omitted]

It is proved interactively using the Coq proof assistant. The proof introduces 8 sub-lemmas and requires 114 lines of Coq tactics, including uses of the why3 tactic to call external SMT solvers, as already mentioned for the first two challenges. The main reason why this proof is that long is the lack of separation-logic reasoning in Why3 (no notion of footprints, few supporting lemmas, etc.).

It is worth pointing out that the use of zippers is only an artifact of our proof. Zippers do not appear at all in our specification.

5.4 Proof statistics

Detailed proof results are given in Fig. 21. (This does not include five auxiliary functions, for which proof details are available at http://toccata.lri.fr/gallery/verifythis_fm2012_treedel.en.html.) The table below summarizes these results. The time limit given to automated provers is 5 s (apart from one goal, for which it is 60 s).

Function              # VCs   Automatically proved
Auxiliary functions    5       5 (100 %)
Lemmas                 3       2 (67 %)
                      30      29 (97 %)
Total                 38      36 (95 %)

Two VCs are discharged using Coq (128 lines of tactics).

The table below shows the results per prover, among the 36 VCs that are proved automatically.

Prover            Number of VCs proved
Alt-Ergo 0.95.2   35
CVC3 2.4.1        34
CVC4 1.3          38
Z3 3.2            34
Z3 4.2            34

Since 36 VCs are discharged automatically, it is clear from this table that the cooperative use of several ATPs is a true asset.

Fig. 21. Proof results on challenge 3. An answer on a green (light) background indicates that the prover succeeded in discharging the VC, in the given number of seconds. An answer on a red (dark) background indicates that the prover reached the time limit given in parentheses (color figure online)

5.5 Lessons learned

It is slightly unsatisfactory that we cannot handle this challenge in a direct way: Since WhyML does not allow arbitrary mutable data structures, we have to use an explicit encoding of a memory heap. Although we are able to handle this challenge successfully, it is clear that Why3 is not the language of choice to specify and prove pointer-heavy programs. It would be more natural to use verification tools dedicated to Java or C code, in particular since there exist such tools that are built upon Why3, using WhyML as an intermediate language: Krakatoa [13] for Java and Frama-C [10] and its Jessie plug-in [14] for C. Yet we think that doing the verification with Why3 has real benefits: it is easier to write and debug the specification and proof when VCs are not clobbered with a complex encoding of the memory heap. Once the specification with Why3 is completed and verified, one can adapt it to a Java or C implementation. In particular, we think that our idea of using a zipper in the proof can be readily reused.

6 Conclusions

We found the Why3 environment adequate for the given challenges. The specification language provides advanced tools that proved to be useful: algebraic data types, inductive predicates, and a rich standard library.

To perform the proofs, we needed several back-end provers; in particular, the more complex lemmas had to be proved interactively, using Coq. Even when considering only the VCs that were proved automatically, there is no prover among Alt-Ergo, CVC3, CVC4, and Z3 that was able to discharge all of them. Though the ability to use several provers is a clear advantage, it also makes the maintenance of proof sessions a difficult task. Why3 provides a mechanism for storing proof sessions that records which transformations and provers were used to prove each VC, so that a complete verification project can be replayed if needed [5]. Moreover, it is possible to dump such a session, e.g., Fig. 21 is automatically generated from the proof session of challenge 3. Similar tables for problems 1 and 2 are available online, in Why3’s gallery.

On the use of Coq. For each of the three challenges, we had to use Coq to discharge a few VCs that were not proved by the automated provers. Writing a proof script for such VCs may seem to be a complex task that requires a fair knowledge of Coq. However, the why3 tactic helped us keep such tasks reasonable: a proof starts with some powerful tactic that generates a few subgoals, and after a small number of additional tactics, the why3 tactic is able to complete the proof. A typical example is a proof that requires induction. Another typical case is when explicit witnesses have to be provided for existential quantifiers, as in challenge 1. Although this process is reasonably quick and painless, it is necessary to know a little bit of Coq to use it. It is thus desirable to propose alternatives. Two recent features added to Why3 may help: first, a transformation that is able to perform induction on algebraic data types; second, a mechanism of “lemma functions”, similar to that of VeriFast and Dafny, which allows the user to write a recursive program to simulate an induction scheme of his choice. Lemma functions can also be used to generate witnesses for existentially quantified assertions. Still, the ability to perform induction over an inductive predicate, as we did in challenges 1 and 2, remains to be studied. Last but not least, we are now planning to extend Why3 with a dedicated lightweight interactive prover that would simplify such proofs even further.

On the possible use of Frama-C/Jessie or Krakatoa. Why3 is indeed an intermediate verification language, used by the front ends Frama-C/Jessie (for C) and Krakatoa (for Java). Thus one might ask why we chose Why3 for the competition: after all, Java implementations were given for the first two challenges, and a language with pointers was mandatory for the third one. The first reason is that we decided, prior to the competition, that we were going to use Why3, because it is the tool we are developing. An important improvement that we made in Why3 lies in the expressiveness of the specification language, which allows us to structure logical models into theories [7] and thus to design a well-structured, reusable standard library. This feature proved important during the competition. In the Jessie and Krakatoa front ends, there is no such large standard library of specifications yet. Thus, another lesson we learned is that we should now improve the specification languages of the front ends. A non-trivial issue is how to reuse the same generic standard library for both Why3 and the front ends. Another point is that the modeling of the heap memory used by the front ends results in VCs that are sometimes obscure. Finally, we believe that, when one wants to verify a given program, the first step is the design of adequate specifications and adequate proof elements such as loop invariants, auxiliary lemmas, and assertions in the code. This work is easier to carry out on a simple language such as WhyML. On a language such as C or Java, one immediately faces the extra burden of showing the absence of runtime errors such as integer overflows and invalid pointer dereferences.