1 Introduction

SPARK [1] is a subset of the Ada programming language targeted at safety- and security-critical applications. SPARK restrictions ensure that the behavior of a SPARK program is unambiguously defined, and simple enough that formal verification tools can perform an automatic diagnosis of conformance between a program specification and its implementation. As a consequence, it forbids the use of features that either prevent automatic proof, or make it possible only at the expense of extensive user annotation effort. The lack of support for pointers is the main example of this choice.

Among the various problems related to the use of pointers in the context of formal program verification, the most difficult problem is that two names may refer to overlapping memory locations, a.k.a. aliasing. Formal verification platforms that support pointer aliasing like Frama-C [2] require users to annotate programs to specify when pointers are not aliased. This can take the form of inequalities between pointers when a typed memory model is used, or the form of separation predicates between memory zones when an untyped memory model is used. In both cases, the annotation burden is acceptable for leaf functions which manipulate single-level pointers, and quickly becomes overwhelming for functions that manipulate pointer-rich data structures. In parallel to the increased cost of annotations, the benefits of automation decrease, as automatic provers have difficulties reasoning explicitly with these inequalities and separation predicates.

Programs often rely on non-aliasing in general for correctness, when such aliasing would introduce interferences between two unrelated names. We call aliasing potentially harmful when a memory location modified through one name could be read through another name, within the scope of a verification condition. Otherwise, the aliasing is benign, when the memory location is only read through both names. A reasonable approach to formal program verification is thus to detect and forbid potentially harmful aliasing of names. Although this restricted language fragment cannot include all pointer-manipulating programs, it still allows us to introduce pointers to SPARK with minimal overhead for its program verification engine.

In this paper, we provide a formal description of the inclusion of pointers in the Ada language subset supported in SPARK, generalizing intuitions that can be found in [3, 4] or on Adacore’s blog [5, 6]. As our main contribution, we show that it is possible to borrow and adapt the ideas underlying the safe support for pointers in permission-based languages like Rust, to safely restrict the use of pointers in usual imperative languages like Ada.

The rest of the paper is organized as follows. In Sect. 2, we give an informal description of our approach. Section 3 introduces a small formal language for which we define the formal alias analysis rules in Sect. 4. In Sect. 5, we describe the implementation of the analysis in GNATProve, a formal verification tool for Ada, and discuss some limitations with the help of various examples. We survey related works in Sect. 6 and future works in Sect. 7.

2 Informal Overview of Alias Analysis in SPARK

In Ada, the access to memory areas is given through paths that start with an identifier (a variable name) and follow through record fields, array indices, or through a special field all, which corresponds to pointer dereferencing. In what follows, we only consider record and pointer types, and discuss the treatment of arrays in Sect. 5.

As an example, we use the following Ada type, describing singly linked lists where each node carries a Boolean flag and a pointer to a shared integer value.

figure a

Given a variable A : List, the paths A.Flag, A.Key.all, A.Next.all.Key are valid and their respective types are Boolean, Integer, and access Integer (a pointer to an Integer). The important difference between pointers and records in Ada is that—similarly to C—assignment of a record copies the values of fields, whereas assignment of a pointer only copies the address and creates an alias.

The alias analysis procedure runs after the type checking. The idea is to associate one of the four permissions—\(\mathsf {RW}\), \(\mathsf {R}\), \(\mathsf {W}\) or \(\mathsf {NO}\)—to each possible path (starting from the available variables) at each sequence point in the program. A set of rules ensures that for any two aliased pointers, at most one has the ownership of the underlying memory area, that means the ability to read and modify it.

The absence of permission is denoted as the \(\mathsf {NO}\) permission. Any modification or access to the value accessible from the path is forbidden. This typically applies to aliased memory areas that have lost the ownership over their stored values.

The read-only permission \(\mathsf {R}\) allows us to read any value accessible from the path: use it in a computation, or pass it as an in parameter in a procedure call. As a consequence, if a given path has the \(\mathsf {R}\) permission, then each valid extension of this path also has it.

The write-only permission \(\mathsf {W}\) allows us to modify memory occupied by the value: use it on the left-hand side in an assignment or pass it as an out parameter in a procedure call. For example, having a write permission for a path of type List allows us to modify the Flag field or to change the addresses stored in the pointer fields Key and Next. However, this does not necessarily give us the permission to modify memory accessible from those pointers. Indeed, to dereference a pointer, we must read the address stored in it, which requires the read permission. Thus, the \(\mathsf {W}\) permission only propagates to path extensions that do not dereference pointers, i.e., do not contain additional all fields.

The read-write permission \(\mathsf {RW}\) combines the properties of the \(\mathsf {R}\) and \(\mathsf {W}\) permissions and grants the full ownership of the path and every value accessible from it. In particular, the \(\mathsf {RW}\) permission propagates to all valid path extensions including those that dereference pointers. The \(\mathsf {RW}\) permission is required to pass a value as an in out parameter in a procedure call.

Execution of program statements changes permissions. A simple example of this is procedure call: all out parameters must be assigned by the callee and get the \(\mathsf {RW}\) permission after the call. The assignment statement is more complicated and several cases must be considered. If we assign a value that does not contain pointers (say, an integer or a pointer-free record), the whole value is copied into the left-hand side, and we only need to check that we have the appropriate permissions: \(\mathsf {W}\) or \(\mathsf {RW}\) for the left-hand side and \(\mathsf {R}\) or \(\mathsf {RW}\) for the right-hand side. However, whenever we copy a pointer, an alias is created. We want to make the left-hand side the new full owner of the value (i.e., give it the \(\mathsf {RW}\) permission), and therefore, after the permission checks, we must revoke the permissions from the right-hand side, to avoid potentially harmful aliasing. The permission checks are also slightly different in this case, as we require the right-hand side to have the \(\mathsf {RW}\) permission in order to move it to the left-hand side.

Let us now consider several simple programs and see how the permission checks allow us to detect potentially harmful aliasing.

Fig. 1.
figure 1

Examples of potentially harmful aliasing, with some verification conditions that require tracking aliases throughout the program to be checked.

Fig. 2.
figure 2

Graphical representation of the permissions attributed to B and its extensions after assignment A := B; in P1.

Procedure P1 in Fig. 1 receives two in out parameters A and B of type List. At the start of the procedure, all in out parameters assume permission \(\mathsf {RW}\). In particular, this implies that each in out parameter is separated from all other parameters, in the sense that no memory area can be reached from two different parameters. The first assignment copies the structure B into A. Thus, the paths A.Flag, A.Key, and A.Next are separated, respectively, from B.Flag, B.Key, and B.Next. However, the paths A.Key.all and B.Key.all are aliased, and A.Next.all and B.Next.all are aliased as well.

The first assignment does not change the permissions of A and its extensions: they retain the \(\mathsf {RW}\) permission and keep the full ownership of their respective memory areas, even if the areas themselves have changed. The paths under B, however, must relinquish (some of) their permissions, as shown in Fig. 2. The paths B.Key.all and B.Next.all as well as all their extensions get the \(\mathsf {NO}\) permission, that is, lose both read and write permissions. This is necessary, as the ownership over their memory areas is transferred to the corresponding paths under A. The paths B, B.Key, and B.Next lose the read permission but keep the write-only \(\mathsf {W}\) permission. Indeed, we forbid reading from memory that can be altered through a concurrent path. However, it is allowed to “redirect” the pointers B.Key and B.Next, either by assigning those fields directly or by copying some different record into B. The field B.Flag is not aliased, nor has aliased extensions, and thus retains the initial \(\mathsf {RW}\) permission. This \(\mathsf {RW}\) permission allows us to perform the assignment B.Flag := True on the next line.

The third assignment, however, is now illegal, since B.Key.all does not have the write permission anymore. What is more, at the end of the procedure the in out parameters A and B are not separated. This is forbidden, as the caller assumes that all out and in out parameters are separated after the call just as they were before.

Procedure P2 in Fig. 1 receives two pointers A and B, and manipulates them inside a while loop. Since the permissions are assigned statically, we must ensure that at the end of a single iteration, we did not lose the permissions necessary for the next iteration. This requirement is violated in the example: after the last assignment A := B, the path B receives permission \(\mathsf {W}\) and the path B.all, permission \(\mathsf {NO}\), as B.all is now an alias of A.all. The new permissions for B and B.all are thus weaker than the original ones (\(\mathsf {RW}\) for both), and the procedure is rejected. Should it be accepted, we would have conflicting memory modifications from two aliased paths at the beginning of the next iteration.

3 \(\mu \)SPARK Language

For the purposes of formal presentation, we introduce \(\mu \)SPARK, a small subset of SPARK featuring pointers, records, loops, and procedure calls. We present the syntax of \(\mu \)SPARK, and define the rules of alias safety.

The data types of \(\mu \)SPARK are as follows:

Every \(\mu \)SPARK program starts with a list of record type declarations:

$$ \begin{array}{rcl} record &{} \,{:}{:}\!\!=\; &{} \texttt {type}\,\, ident \,\, \texttt {is} \,\, \texttt {record} \,\, field^\star \,\, \texttt {end} \\[1ex] field &{} \,{:}{:}\!\!=\; &{} ident \,\, \texttt {:} \,\, type \end{array} $$

We require all field names to be distinct. The field types must not refer to the record types declared later in the list. Nevertheless, a record type R can be made recursive by adding a field whose type is a pointer to R (written access R). We discuss the handling of array types in Sect. 5.

The syntax of \(\mu \)SPARK statements is defined by the following rules:

$$ \begin{array}{rclcl} path &{} \,{:}{:}\!\!=\; &{} ident &{} \qquad &{} \text {variable} \\ &{}|&{} path \,\texttt {.}\, ident &{}&{} \text {record field} \\ &{}|&{} path \,\texttt {.}\, \texttt {all} &{}&{} \text {pointer dereference} \\[2ex] expr &{} \,{:}{:}\!\!=\; &{} path &{}&{} \text {l-value} \\ &{}|&{} \texttt {42} \;|\; \texttt {3.14} \;|\; \texttt {True} \;|\; \texttt {False} \;|\; \ldots &{} \qquad &{} \text {scalar value} \\ &{}|&{} expr \,\,(\,\texttt {+}\,|\,\texttt {-}\,|\,\texttt {<}\,|\,\texttt {=}\,|\,\ldots \,)\,\, expr &{}&{} \text {binary operator} \\ &{}|&{} \texttt {null} &{}&{} \text {null pointer} \\[2ex] stmt &{} \,\,{:}{:}\!\!=\; &{} path \,\,\texttt {:=}\;\; expr &{}&{} \text {assignment} \\ &{}|&{} path \,\,\texttt {:=}\;\; \texttt {new} \,\, type &{}&{} \text {allocation} \\ &{}|&{} \texttt {if} \,\,expr\,\, \texttt {then}\,\, stmt^{\star } \,\, \texttt {else} \,\, stmt^{\star } \,\, \texttt {end} &{}&{} \text {conditional} \\ &{}|&{} \texttt {while} \,\,expr\,\, \texttt {loop}\,\, stmt^{\star } \,\, \texttt {end} &{}&{} \text {``while'' loop} \\ &{}|&{} ident \,\,\texttt {(}\,\, { expr }^{\star } \,\,\texttt {)} &{}&{} \text {procedure call} \end{array} $$

Following the record type declarations, a \(\mu \)SPARK program contains a set of mutually recursive procedure declarations:

We require all formal parameters and local variables in a procedure to have distinct names. A procedure call can only pass left-values (i.e., paths) for in out and out parameters. The execution starts from a procedure named Main with the empty parameter list.

The type system for \(\mu \)SPARK is rather standard and we do not show it here in full. We assume that binary operators only operate on scalar types. The null pointer can have any pointer type access \(\tau \). The dereference operator .all converts access \(\tau \) to \(\tau \). Allocation p := new \(\tau \) requires path p to have type access \(\tau \). In what follows, we only consider well-typed \(\mu \)SPARK programs. A formal semantics for \(\mu \)SPARK statements is given in Appendix A.

On the semantic level, we need to distinguish the units of allocation, such as whole records, from the units of access, such as individual record fields. We use the term location to refer to the memory area occupied by an allocated value. We treat locations as elements of an abstract infinite set, and denote them with letter \(\ell \). We use the term address to designate either a location, denoted \(\ell \), or a specific component inside the location of a record, denoted \(\ell .f.g\), where f and g are field names (assuming that at \(\ell \) we have a record whose field f is itself a record with a field g). A value is either a scalar, an address, a null pointer or a record, that is, a finite mapping from field names to values.

A \(\mu \)SPARK program is executed in the context defined by a binding \(\varUpsilon \) that maps variable names to addresses and a store \(\varSigma \) that maps locations to values. By a slight abuse of notation, we apply \(\varSigma \) to arbitrary addresses, so that \(\varSigma (\ell .f)\) is \(\varSigma (\ell )(f)\), the value of the field f of the record value stored in \(\varSigma \) at \(\ell \).

The evaluation of expressions is effect-free and is denoted . We also need to evaluate l-values to the corresponding addresses in the store, written \(\langle p \rangle ^\varUpsilon _\varSigma \), where p is the evaluated path. Illicit operations, such as dereferencing a null pointer, cannot be evaluated and stall the execution (blocking semantics). In the formal rules below, c stands for a scalar constant and \(\odot \), for a binary operator:

4 Access Policies, Transformers, and Alias Safety Rules

We denote paths with letters p and q. We write \(p \sqsubset q\) to denote that p is a strict prefix of q or, equivalently, q is a strict extension of p. In what follows, we always mean strict prefixes and extensions, unless explicitly said otherwise.

In the typing context of a given procedure, a well-typed path is said to be deep if it has a non-strict extension of an access type, otherwise it is called shallow. We extend these notions to types: a type \(\tau \) is deep (resp. shallow) if and only if a \(\tau \)-typed path is deep (resp. shallow). In other words, a path or a type is deep if a pointer can be reached from it, and shallow otherwise. For example, the List type in Sect. 2 is a deep type, and so is access Integer, whereas any scalar type or any record with scalar fields only is shallow.

An extension q of a path p is called a near extension if it has as many pointer dereferences as p, otherwise it is a far extension. For instance, given a variable A of type List, the paths A.Flag, A.Key, and A.Next are the near extensions of A, whereas A.Key.all, A.Next.all, and their extensions are far extensions, since they all create an additional pointer dereference by passing through all.

We say that sequence points are the program points before or after a given statement. For each sequence point in a given \(\mu \)SPARK program, we statically compute an access policy: a partial function that maps each well-typed path to one of the four permissions: \(\mathsf {RW}\), \(\mathsf {R}\), \(\mathsf {W}\), and \(\mathsf {NO}\), which form a diamond lattice: \(\mathsf {RW}> \mathsf {R}| \mathsf {W}> \mathsf {NO}\). We denote permissions by \(\pi \) and access policies by \(\varPi \).

Permission transformers modify policies at a given path, as well as its prefixes and extensions. Symbolically, we write \(\varPi \xrightarrow {T}_p \varPi '\) to denote that policy \(\varPi '\) results from application of transformer T to \(\varPi \) at path p. We define a composition operation that allows chaining the application of permission transformers \(T_1\) at path \(p_1\) and \(T_2\) at path \(p_2\) to \(\varPi \) resulting in the policy \(\varPi '\). We write as an abbreviation for (that is, for some \(\varPi ''\), \(\varPi \xrightarrow {T_1}_p \varPi '' \xrightarrow {T_2}_p \varPi '\)). We write \( \varPi \xrightarrow {T}_{p,q} \varPi ' \) as an abbreviation for .

Permission transformers can also apply to expressions, which consists in updating the policy for every path in the expression. This only includes paths that occur as sub-expressions: in an expression X.f.g + Y.h, only the paths X.f.g and Y.h are concerned, whereas X, X.f and Y are not. The order in which the individual paths are treated must not affect the final result.

Fig. 3.
figure 3

Alias safety rules for statements.

We define the rules of alias safety for \(\mu \)SPARK statements in the context of a current access policy. An alias-safe statement yields an updated policy which is used to check the subsequent statement. We write \(\varPi \cdot s \rightarrow \varPi '\) to denote that statement s is safe with respect to policy \(\varPi \) and yields the updated policy \(\varPi '\). We extend this notation to sequences of statements in an obvious way, as the reflexive-transitive closure of the update relation on \(\varPi \). The rules for checking the alias safety of statements are given in Fig. 3. These rules use a number of permission transformers such as ‘fresh’, ‘check’, ‘move’, ‘observe’, and ‘borrow’, which we define and explain below.

Let us start with the (P-assign) rule. Assignments grant the full ownership over the copied value to the left-hand side. If we copy a value of a shallow type, we merely have to ensure that the right-hand side has the read permission. Whenever we copy a deep-typed value, aliases may be created, and we must check that the right-hand side is initially the sole owner of the copied value (that is, possesses the \(\mathsf {RW}\) permission) and revoke the ownership from it.

To define the ‘\({\mathrm {move}}\)’ transformer that handles permissions for the right-hand side of an assignment, we need to introduce several simpler transformers.

Definition 1

Permission transformer \(\mathrm {check} \, ~\pi \) does not modify the access policy and only verifies that a given path p has permission \(\pi \) or greater. In other words, \(\varPi \xrightarrow {\mathrm {check} \, \pi }_p \varPi '\) if and only if \(\varPi (p) \geqslant \pi \) and \(\varPi = \varPi '\). This transformer also applies to expressions: \(\varPi \xrightarrow {\mathrm {check} \, \pi }_e \varPi '\) states that \(\varPi \xrightarrow {\mathrm {check} \, \pi }_p \varPi ' (= \varPi )\) for every path p occurring in e.

Definition 2

Permission transformer \(\mathrm {fresh} \, \pi \) assigns permission \(\pi \) to a given path p and all its extensions.

Definition 3

Permission transformer \(\mathrm {cut}\) assigns restricted permissions to a deep path p and its extensions: the path p and its near deep extensions receive permission \(\mathsf {W}\), the near shallow extensions keep their current permissions, and the far extensions receive permission \(\mathsf {NO}\).

Going back to the procedure P1 in Fig. 1, the change of permissions on the right-hand side after the assignment A := B corresponds to the definition of ‘\(\mathrm {cut}\)’. In the case where the right-hand side of an assignment is a deep path, we also need to change permissions of the prefixes, to reflect the ownership transfer.

Definition 4

Permission transformer \(\mathrm {block}\) propagates the loss of the read permission from a given path to all its prefixes. Formally, it is defined by the following rules, where x stands for a variable and f for a field name:

$$\begin{array}{ccc} \dfrac{}{\varPi \xrightarrow {\mathrm {block}}_x \varPi } &{}&{} \dfrac{ \varPi [p \mapsto \mathsf {W}]\! \xrightarrow {\mathrm {block}}_p \!\varPi '}{\varPi \xrightarrow {\mathrm {block}}_{p.{\texttt {all}}} \varPi '} \\[4ex] \dfrac{ \varPi (p) = \mathsf {NO}}{\varPi \xrightarrow {\mathrm {block}}_{p.f} \varPi } &{}\qquad &{} \dfrac{ \varPi (p) \geqslant \mathsf {W}\quad \varPi [p \mapsto \mathsf {W}]\! \xrightarrow {\mathrm {block}}_p \!\varPi '}{\varPi \xrightarrow {\mathrm {block}}_{p.f} \varPi '} \end{array}$$

Definition 5

Permission transformer \(\mathrm {move}\) applies to expressions:

  • if e has a shallow type, then \(\varPi \xrightarrow {\mathrm {move}}_e \varPi ' \,\Leftrightarrow \, \varPi \xrightarrow {\mathrm {check} \, \mathsf {R}}_e \varPi '\),

  • if e is a deep path p, then ,

  • if e is null, then \(\varPi \xrightarrow {\mathrm {move}}_e \varPi ' \,\Leftrightarrow \, \varPi ' = \varPi \).

To further illustrate the ‘\(\mathrm {move}\)’ transformer, let us consider two variables P and Q of type access List and an assignment P := Q.all.Next. We assume that Q and all its extensions have full ownership (\(\mathsf {RW}\)) before the assignment. We apply the second case in the definition of ‘\(\mathrm {move}\)’ to the deep path Q.all.Next. The ‘\(\mathrm {check} \, \mathsf {RW}\)’ condition is verified, and the ‘\(\mathrm {cut}\)’ transformer sets the permission for Q.all.Next to \(\mathsf {W}\) and the permission for Q.all.Next.all and all its extensions to \(\mathsf {NO}\). Indeed, P.all becomes an alias of Q.all.Next.all and steals the full ownership for this memory area. However, we still can reassign Q.all.Next to a different address. Moreover, we still can write some new values into Q.all or Q, without compromising safety. This is enforced by the application of the ‘\(\mathrm {block}\)’ transformer at the end. We cannot keep the read permission for Q or Q.all, since it implies the read access to the data under Q.all.Next.all.

Finally, we need to describe the change of permissions on the left-hand side of an assignment, in order to reflect the gain of the full ownership. The idea is that as soon as we have the full ownership for each field of a record, we can assume the full ownership of the whole record, and similarly for pointers.

Definition 6

Permission transformer \(\mathrm {lift}\) propagates the \(\mathsf {RW}\) permission from a given path to its prefixes, wherever possible:

$$\begin{array}{ccc} \dfrac{}{\varPi \xrightarrow {\mathrm {lift}}_x \varPi } &{} \qquad &{} \dfrac{ \varPi [p \mapsto \mathsf {RW}]\! \xrightarrow {\mathrm {lift}}_p \!\varPi '}{\varPi \xrightarrow {\mathrm {lift}}_{p.{\texttt {all}}} \varPi '} \\ \dfrac{ \forall q \sqsupset p.\, \varPi (q) = \mathsf {RW}\quad \varPi [p \mapsto \mathsf {RW}]\! \xrightarrow {\mathrm {lift}}_p \!\varPi '}{\varPi \xrightarrow {\mathrm {lift}}_{p.f} \varPi '} &{}&{} \dfrac{ \exists q \sqsupset p.\, \varPi (q) \ne \mathsf {RW}}{\varPi \xrightarrow {\mathrm {lift}}_{p.f} \varPi } \end{array}$$

In the (P-assign) rule, we revoke the permissions from the right-hand side of an assignment before granting the ownership to the left-hand side. This is done in order to prevent creation of circular data structures. Consider an assignment A.Next.all := A, where A has type List. According to the definition of ‘\(\mathrm {move}\)’, all far extensions of the right-hand side, notably A.Next.all, receive permission \(\mathsf {NO}\). This makes the left-hand side fail the write permission check.

Allocations p := new \(\tau \) are handled by the (P-alloc) rule. We grant the full permission on the newly allocated memory, as it cannot possibly be aliased.

In a conditional statement, the policies at the end of the two branches are merged selecting the most restrictive permission for each path. Loops require that no permissions are lost at the end of a loop iteration, compared to the entry, as explained above for procedure P2 in Fig. 1.

Procedure calls guarantee to the callee that every argument with mode in, in out, or out has at least permission \(\mathsf {R}\), \(\mathsf {RW}\) or \(\mathsf {W}\), respectively. To ensure the absence of potentially harmful aliasing, we revoke the necessary permissions using the ‘\(\mathrm {observe}\)’ and ‘\(\mathrm {borrow}\)’ transformers.

Definition 7

Permission transformer \(\mathrm {borrow}\) assigns permission \(\mathsf {NO}\) to a given path p and all its prefixes and extensions.

Definition 8

Permission transformer \(\mathrm {freeze}\) removes the write permission from a given path p and all its prefixes and extensions. In other words, \(\mathrm {freeze}\) assigns to each path q comparable to p the minimum permission \(\varPi (q) \wedge \mathsf {R}\).

Definition 9

Permission transformer \(\mathrm {observe}\) applies to expressions:

  • if e has a shallow type, then \(\varPi \xrightarrow {\mathrm {observe}}_e \varPi ' \,\Leftrightarrow \, \varPi ' = \varPi \),

  • if e is a deep path p, then \(\varPi \xrightarrow {\mathrm {observe}}_e \varPi ' \,\Leftrightarrow \, \varPi \xrightarrow {\mathrm {freeze}}_p \varPi '\),

  • if e is null, then \(\varPi \xrightarrow {\mathrm {observe}}_e \varPi ' \,\Leftrightarrow \, \varPi ' = \varPi \).

We remove the write permission from the deep-typed in parameters using the ‘\(\mathrm {observe}\)’ transformer, in order to allow aliasing between the read-only paths. As for the in out and out parameters, we transfer the full ownership over them to the callee, which is reflected by dropping every permission on the caller’s side using ‘\(\mathrm {borrow}\)’.

In the (P-call) rule, we revoke permissions right after checking them for each parameter. In this way, we cannot pass, for example, the same path as an in and in out parameter in the same call. Indeed, the ‘\(\mathrm {observe}\)’ transformer will remove the write permission, which is required by ‘\(\mathrm {check} \, \mathsf {RW}\)’ later in the transformer chain. At the end of the call, the callee transfers to the caller the full ownership over each in out and out parameter.

We apply our alias safety analysis to each procedure declaration. We start with an empty access policy, denoted \(\varnothing \). Then we fill the policy with the permissions for the formal parameters and the local variables and check the procedure body. At the end, we verify that every in out and out parameter has the \(\mathsf {RW}\) permission. Formally, this is expressed with the following rule:

figure b

We say that a \(\mu \)SPARK program is alias-safe if all its procedures are.

By the end of the analysis, an alias-safe program has an access policy associated to each sequence point in it. We say that an access policy \(\varPi \) is consistent whenever it satisfies the following conditions for all valid paths \(\pi \), \(\pi .f\), \(\pi .\texttt {all}\):

$$\begin{aligned} \varPi (\pi ) = \mathsf {RW}&\Longrightarrow \varPi (\pi .f) = \mathsf {RW}&\varPi (\pi ) = \mathsf {RW}&\Longrightarrow \varPi (\pi .{\texttt {all}}) = \mathsf {RW}\end{aligned}$$
(1)
$$\begin{aligned} \varPi (\pi ) = \mathsf {R}&\Longrightarrow \varPi (\pi .f) = \mathsf {R}&\varPi (\pi ) = \mathsf {R}&\Longrightarrow \varPi (\pi .{\texttt {all}}) = \mathsf {R}\end{aligned}$$
(2)
$$\begin{aligned} \varPi (\pi ) = \mathsf {W}&\Longrightarrow \varPi (\pi .f) \ge \mathsf {W}\end{aligned}$$
(3)

These invariants correspond to the informal explanations given in Sect. 2. Invariant (1) states that the full ownership over a value propagates to all values reachable from it. Invariant (2) states that the read-only permission must also propagate to all extensions. Indeed, a modification of a reachable component can be observed from any prefix. Invariant (3) states that write permission over a record value implies a write permission over each of its fields. However, the write permission does not necessarily propagate across pointer dereference.

Lemma 1 (Policy Consistency)

The alias safety rules in Fig. 3 preserve policy consistency.

When, during an execution, we arrive at a given sequence point with the set of variable bindings \(\varUpsilon \), store \(\varSigma \), and statically computed and consistent access policy \(\varPi \), we say that the state of the execution respects the Concurrent Read, Exclusive Write condition (CREW), if and only if for any two distinct valid paths p and q, \(\langle p \rangle ^\varUpsilon _\varSigma = \langle q \rangle ^\varUpsilon _\varSigma \wedge \varPi (p) \ge \mathsf {W}\Longrightarrow \varPi (q) = \mathsf {NO}\).

The main result about the soundness of our approach is as follows.

Theorem 1 (Soundness)

A terminating evaluation of a well-typed alias-safe \(\mu \)SPARK program respects the CREW condition at every sequence point.

The full proof, for a slightly different definition of \(\mu \)SPARK, is given in [7]. The argument proceeds by induction on the evaluation derivation, following the rules provided in Appendix A. The only difficult cases are assignment, where the required permission withdrawal is ensured by the ‘\(\mathrm {move}\)’ transformer, and procedure call, where the chain of ‘\(\mathrm {observe}\)’ and ‘\(\mathrm {borrow}\)’ transformers, together with the corresponding checks, on the caller’s side, ensures that the CREW condition is respected at the beginning of the callee.

For the purposes of verification, an alias-safe program can be treated with no regard for sharing. More precisely, we can safely transform access types into records with a single field that contains either null or the referenced value. Since records are copied on assignment, we obtain a program that can be verified using the standard rules of Floyd-Hoare logic or weakest-precondition calculus (as the rules have also ensured the absence of aliasing between procedure parameters).

Indeed, consider an assignment A := B where A and B are pointers. In an alias-safe program, B loses its ownership over the referenced value and cannot be used anymore without being reassigned. Then, whenever we modify that value through A.all, we do not need to update B.all in the verification condition. In other words, we can safely treat A := B as a deep copy of B.all into A.all. The only adjustment that needs to be made to the verification condition generator consists in adding checks against the null pointer dereferencement, which is not handled by our rules.

5 Implementation and Evaluation

The alias safety rules presented above have been implemented in the SPARK proof tool, called GNATprove. The real SPARK subset differs from \(\mu \)SPARK in several respects: arrays, functions, additional loop constructs, and global variables. For arrays, permission rules apply to all elements, without taking into account the exact index of an element, which may not be known statically in the general case. Functions return values and cannot perform side effects. They only take in parameters and may be called inside expressions. To avoid creating aliases between the function parameters and the returned value, the full \(\mathsf {RW}\) permission is required on the latter at the end of the callee. The rules for loops have been extended to handle for-loops and plain loops (which have no exit condition), and also the exit (break) statements inside loops. Finally, global variables are considered as implicit parameters of subprograms that access them, with mode depending on whether the subprogram reads and/or modifies the variable.

Though our alias safety rules are constraining, we feel that they significantly improve the expressive power of the SPARK subset. To demonstrate it, let us go over some examples.Footnote 1 One of the main uses of pointers is to serve as references to avoid copying potentially big data structures. We believe this use case is supported as long as the CREW condition is respected. We demonstrate this on a small procedure that swaps two pointers.

figure c

This code is accepted by our alias safety rules. We can provide it with a contract, which can then be verified by the SPARK proof tool.

figure d

Another common use case for pointers in Ada is to store indefinite types (that is, the types whose size is not known statically, such as String) inside aggregate data structures like arrays or records. The usual workaround consists in storing pointers to indefinite elements instead. This usage is also supported by our alias analysis, as illustrated by an implementation of word sets, which is accepted and fully verified by SPARK.

figure e

The last use case that we want to consider is the implementation of recursive data structures such as lists and trees. While alias safety rules exclude structures whose members do not have a single owner like doubly linked lists or arbitrary graphs, they are permissive enough for many non-trivial tree data structures, for example, red-black trees. To insert a value in a red-black tree, the tree is first traversed top-down to find the correct leaf for the insertion, and then it is traversed again bottom-up to reestablish balancing. Doing this traversal iteratively requires storing a link to the parent node in children, which is not allowed as it would introduce an alias. Therefore, we went for a recursive implementation, partially shown above. The rotating functions, which are used by the Balance procedure (not shown here) can be implemented straightforwardly, since rotation moves pointers around without creating any cycles.

This example passes alias safety analysis successfully (i.e. without errors from the tool) and can be verified to be free of runtime exceptions (such as dereferences of null pointers) by the SPARK proof tool.

6 Related Work

The recent adoption of permission-based typing systems by programming languages is the culmination of several decades of research in this field. Going back as early as 1987 for Girard’s linear logic [8] and 1983 for Ada’s limited types [9], Baker was the first to suggest using linear types in programming languages [10], formalised in 1998 by Clarke et al. [11]. More recent works focus on Java, such as Javari and Uno [12, 13].

Separation logic [14] is an extension of Hoare-Floyd logic that allows reasoning about pointers. In general, it is difficult to integrate into automated deductive verification: in particular, it is not directly supported by SMT provers, although some recent attempts try to have it mended [15, 16].

Permission-based programming languages generalize the issue of avoiding harmful aliasing to the more general problem of preventing harmful sharing of resources (memory, but also network connections, files, etc.).

Cyclone and Rust achieve absence of harmful aliasing by enforcing an ownership type system on the memory pointed to by objects [17, 18]. Furthermore, Rust has many sophisticated lifetime checks, that prevent dangling pointers, double free, and null pointer dereference. In SPARK, those checks are handled by separate analysis passes of the toolset. Even though there is still no formal description of Rust’s borrow-checker, we must note a significant recent effort to provide a rigorous formal description of the foundations of Rust [19].

Dafny associates each object with its dynamic frame, the set of pointers that it owns [20]. This dynamic version of ownership is enforced by modeling the ownership of pointers in logic, generating verification conditions to detect violations of the single-owner model, and proving them using SMT provers. In Spec#, ownership is similarly enforced by proof, to detect violations of the so-called Boogie methodology [21].

In our work, we use a permission-based mechanism for detecting potentially harmful aliasing, in order to make the presence of pointers transparent for automated provers. In addition, our approach does not require additional user annotations, that are required in some of the previously mentioned techniques. We thus expect to achieve high automation and usability, which was our goal for supporting pointers in SPARK.

7 Future Work

The GNAT+SPARK Community release in 2020 contains support for pointers, as defined in Sect. 3.10 of the SPARK Reference Manual [22], with two important improvement not discussed in this paper: local observe/borrow operations and support for proof of absence of memory leaks.

Both these features require extensive changes to the generation of verification conditions. Support for local borrows requires special mechanisms to report changes on the borrower to the borrowee at the end of the borrow, as shown by recent work on Rust [23]. Support for proof of absence of memory leaks requires special mechanisms to track values that are either null or moved so that we can make sure that all values going out of scope are in this case.

8 Conclusion

In this paper, we have presented the rules for alias safety analysis that allow implementing and verifying in SPARK a wide range of programs using pointers and dynamic allocation. To the best of our knowledge, this is a novel approach to control aliasing introduced by arbitrary pointers in a programming language supported by proof. Our approach does not require additional user annotations or proof of additional verification conditions, which makes it much simpler to adopt. We provided a formalization of our rules for a subset of SPARK in order to mathematically prove the safety of our analysis.

In the future, we plan to extend our formalism and proof to non-terminating executions. For that purpose, we can provide a co-inductive definition of the big-step semantics and perform a similar co-inductive soundness proof, as described by Leroy and Grall [24].

Another long-term goal would be extending our analysis so that it could handle automatic reclamation, parallelism, initialization and lifetime checks, instead of relying on external checks.