1 The Problem: Scaling Deductive Program Verification

Deductive program verification is a technique to prove the correctness of a program w.r.t. its specification, which is given in terms of pre- and postconditions of the methods occurring in the program, following the Design-by-Contract principle [20]. Typically, deductive program verification uses (an extension or variant of) Hoare logic [12] or dynamic logic [8] as its underlying verification technique.

Over the last few years, enormous progress has been made on the use of such deductive program verification techniques for non-trivial examples, such as the discovery of a bug in Timsort [11], the verification of a Linux USB keyboard driver [25], the verification of avionics software [7], and the various VerifyThis challenges (see e.g., [14, 17]). Many different factors have contributed to this progress, such as:

  • the increase in power of automated provers,

  • efficient use of multi-core hardware for formal verification tools,

  • developments in specification languages, and

  • the development of new verification theories, such as the use of concurrent separation logics to reason in a modular way about concurrent programs [3, 18, 21].

Of course, there exist other formal analysis techniques that provide a much higher level of automation than deductive program verification, but the attractiveness of deductive program verification lies in the fact that (1) it can be used to reason about a very large and flexible class of program properties, and (2) it makes it possible to reason about programs with an unbounded state space, and in particular about parametrised programs, i.e., it is possible to prove that a method void m (int n) respects its specification for any possible value of its parameter n.
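
As a minimal illustration of such a Design-by-Contract specification, the following sketch uses JML-style comment annotations; the class name Counter and the field total are purely illustrative and not taken from any particular tool or example.

    // Hypothetical example of a method contract in JML-style notation.
    public class Counter {
        private int total;

        // The postcondition must hold for *every* possible value of the
        // parameter n; this is exactly what deductive verification proves.
        //@ ensures total == \old(total) + n;
        public void m(int n) {
            total = total + n;
        }
    }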

Therefore, I believe it is important to investigate why the use of deductive program verification on large-scale industrial examples remains difficult, and what can be done to improve this situation. Many different reasons can be given for why this is the case, but I believe the most important ones are the following.

  • Applications are often simply too large to handle, and the verifier lacks an overview of the complete application. Deductive program verification is traditionally quite closely connected to the concrete code, and as a result, it can be difficult to reason about the application at a suitable level of abstraction, because too many low-level details have to be dealt with.

  • Deductive program verification typically requires a large number of auxiliary annotations (loop invariants, intermediate assertions), which require a detailed understanding of the code and of the verification process (the sketch after this list illustrates such a loop invariant).

  • If we wish to use deductive program verification on large applications during the development process, when not all components are available yet, the verification tools typically require at least some stubs for the missing parts (for example, method contracts for the unimplemented methods, as in the stub in the sketch after this list) before the available components can be verified.

  • To reason about global system properties (which is necessary if we wish to show that the program requirements are fulfilled) we need to have some way to reason about the missing components as well.

  • As mentioned above, deductive program verification techniques are developed for pre-postcondition-style specifications, which usually do not match well with how high-level program requirements are expressed. We need formal techniques to connect these two levels of specifications.
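
To make the second and third point more concrete, the following sketch (again in hypothetical JML-style notation; all names are illustrative) shows the kind of auxiliary loop invariant a verifier typically has to provide, together with a contract-only stub for a not-yet-implemented method:

    // All names are illustrative; annotations use JML-style comment syntax.
    public abstract class Account {

        // Stub: only the contract of this not-yet-implemented method is given,
        // so that components calling it can already be verified.
        //@ ensures \result >= 0;
        public abstract int currentRate();

        // Auxiliary annotations: the loop invariants must be supplied by the
        // verifier and require a detailed understanding of the code.
        //@ requires years >= 0;
        //@ ensures \result >= 0;
        public int totalRate(int years) {
            int total = 0;
            //@ loop_invariant 0 <= i && i <= years;
            //@ loop_invariant total >= 0;
            for (int i = 0; i < years; i++) {
                total = total + currentRate();
            }
            return total;
        }
    }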

In this position paper, I propose to work on the unification of models and code, to provide a solution to this problem. I will then sketch the first steps towards this solution, which we are currently developing within the VerCors tool set. Finally, I will conclude by outlining further research challenges that need to be addressed to fully achieve my proposed solution.

Fig. 1. Multiple views on a system component

2 The Solution: Unification of Models and Code

I believe that to make the use of deductive program verification on large-scale industrial examples possible, we need to soften the border between program and model (or global/high-level specification). Ideally, one can have different views on the components of an application; see Fig. 1 for a visualisation. First of all, there should of course be a code view, which is executable and deterministic, and which provides many low-level details. But one also needs a specification view of the same component, which is high-level, declarative, abstract, possibly non-deterministic, and leaves out many details. Many more views of the same component should be possible: a high-level specification view can be refined into a more detailed, but still declarative, model, which can be further refined into an executable model and finally into a code view. This code view might in turn be refined into a program that is optimised further, e.g. for performance or memory usage.

When we take this approach, there are two crucial requirements:

  1. we need techniques to connect the views at all the different levels, and this connection needs to be provably correct, and

  2. we need to be able to compose the components at all different view levels, i.e. it should be possible to “build” an application where some of its components are only described by a high-level specification, and to combine these with code-level components, in order to reason about properties of the application as a whole, as visualised in Fig. 2.

Fig. 2. Global verification with multiple component views

There already exists some work on refinement between different views, such as the work done in VDM [5, 10, 15], Z [16], or Event-B [1]. However, most of these approaches focus on refinement between different models, and if they go all the way to executable code, the code is typically extracted from a low-level model description, which is close to the code in spirit, but the connection between the code and the low-level model is not proven correct. Two exceptions that I am aware of are: (1) the work by Dalvandi et al. [9], which aims at extracting code and annotations from an Event-B model and then proving these correct using Dafny, and (2) the work of Tran-Jörgensen et al. [26], which generates JML annotations from VDM specifications.

Also, in the nineties, early ideas on the transformation from models to programs, and from programs to programs, were developed; see e.g. [6, 22, 24]. These ideas need to be studied carefully, and it needs to be investigated whether and how they can be incorporated into current state-of-the-art deductive program verification tools.

Finally, the work on the CompCert project (see e.g. [4, 19]) can also be a source of inspiration. In this project, a verified tool chain for C is developed in Coq. However, in the approach I advocate, practical verification and applicability to different programming languages should also be major driving forces.

Fig. 3. Example: abstract behaviour specifications

3 First Steps: The VerCors Approach

Within the VerCors tool set, we have started to develop techniques to support this idea. In particular, we make it possible to specify an abstract model view of a component using process algebra, and we use an extension of concurrent separation logic to prove that the concrete code behaves according to this model [23]; model checking technology can then be used to derive global properties of the program from the process algebra models.

To illustrate this idea, let us consider the small code example in Fig. 3. Suppose we have a shared variable x, protected by a lock, and two threads that manipulate x: one thread multiplies x by 4, the other thread adds 4 to x. The specifications of the two threads capture the threads’ behaviours abstractly: assuming that the behaviour of the thread before this method call was equal to the process algebra term H (written Hist(H)), execution of the method adds the action mult(4) or add(4) to this behaviour (where \(H.a\) denotes a process algebra term \(H\) followed by action \(a\); see the thread postconditions in lines 4 and 16 of Fig. 3).

The action annotations (lines 6–10 and lines 18–22) inside the method body indicate the concrete code fragments that correspond to the abstract actions. Given the action specifications that describe the effect of the actions mult and add in Fig. 4, we use our program logic to prove that the action implementations behave as specified.

Fig. 4. Example action specifications
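
To give an impression of what such specifications look like, the following sketch uses a simplified, JML-like comment notation rather than the exact VerCors syntax of Figs. 3 and 4; the names x, H, Hist, mult and add follow the description above, and everything else (class name, lock object) is illustrative.

    // Simplified sketch, not the exact VerCors syntax: the thread that
    // multiplies the shared variable x by 4, with the action spec of mult.
    public class MultThread extends Thread {

        static int x;                              // shared variable
        static final Object lock = new Object();   // lock protecting x

        // Action specification (cf. Fig. 4): the abstract action mult(n)
        // multiplies the shared variable x by n.
        //@ action mult(int n): ensures x == \old(x) * n;

        // Thread specification (cf. Fig. 3): if the behaviour of the thread
        // so far is the process algebra term H (written Hist(H)), then after
        // run() it is H followed by mult(4), i.e. Hist(H.mult(4)).
        //@ requires Hist(H);
        //@ ensures  Hist(H.mult(4));
        public void run() {
            synchronized (lock) {
                //@ action mult(4) {   // this code fragment implements mult(4)
                x = x * 4;
                //@ }
            }
        }
    }

The thread that adds 4 to x would be specified analogously, with postcondition Hist(H.add(4)) and an action specification add(n) whose effect is x == \old(x) + n.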

Moreover, the program logic can also be used to verify that a process algebra term describes the global behaviour of the program. Suppose we have a method that starts the two threads and then waits for them to terminate. We can prove that the behaviour of this method is to execute the mult and the add action in any order (see the postcondition in the listing below, where \(P + Q\) denotes a non-deterministic choice between \(P\) and \(Q\), and empty denotes an empty history). Finally, we can use existing model checking technology to reason about this abstract model, combined with the action specifications, to derive that the possible final values of variable x are 4 and 16 (assuming x is initially 0: performing mult(4) before add(4) yields 0 · 4 + 4 = 4, whereas performing add(4) before mult(4) yields (0 + 4) · 4 = 16).

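A sketch of this composing method, again in simplified notation rather than the exact syntax of the original listing, could look as follows; the class Program, the use of Thread.join, and an AddThread class defined analogously to MultThread above are illustrative assumptions.

    // Simplified sketch, not the exact VerCors syntax: starting from the
    // empty history, running both threads and waiting for them yields the
    // actions mult(4) and add(4) in either order (+ is a non-deterministic
    // choice between two process terms).
    public class Program {

        //@ requires Hist(empty);
        //@ ensures  Hist(mult(4).add(4) + add(4).mult(4));
        public static void main(String[] args) throws InterruptedException {
            Thread t1 = new MultThread();  // thread multiplying x by 4
            Thread t2 = new AddThread();   // analogous thread adding 4 to x
            t1.start();
            t2.start();
            t1.join();                     // wait for both threads to terminate
            t2.join();
        }
    }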

This example is very simple, but we have used the same approach on larger and non-terminating programs [2, 23, 27].

4 Future Steps

The approach described above is still in its early stages. To fully realise the goal of a seamless integration of code and models, more work is needed. I believe that the theory of how to make a connection between different levels of abstract models is reasonably well understood [1, 5, 15, 16], but how to make a provably correct transformation from model (or high-level specification) to code is less clear. There are approaches to generate a model from code, but correctness of the extraction is then typically a meta-property, and cannot be proven for the model and code directly (and thus, in particular, depends on whether the extraction is correctly implemented). There also exists work on (provably correct) model-to-code generation, see e.g. [28], but the generated code is still very close to the model, and needs to be improved to achieve reasonable performance.

Therefore, I believe we need to address the following research challenges:

  • we need to further develop refinement techniques that can generate annotated, verifiable code from an abstract model, where it is important that the generated code can be executed efficiently;

  • we need techniques to prove that a program remains correct after it has been transformed to optimise its performance; see [13] for further ideas;

  • we need to consider whether it is possible to automatically derive a model or abstract view from a concrete program; and

  • we need to further develop the abstract model theory for concurrent software, in particular making the abstract models compositional, such that it is possible to reason about the global behaviour of a system that is composed of both abstract models and concrete code components.