1 Introduction

Concurrent programs are difficult to reason about either formally or informally because of potential interference between threads; interference can be managed by separation of the parts of the state accessible to threads; separation arguments are often complicated by dynamic changes of ownership.

It is useful to distinguish the issues arising in the design of concurrent programs before fixing on specific notations — clearly, separation/ownership and interference constitute underlying issues. An obvious demarcation is to employ Separation Logic to tackle the former set of issues and something like Rely/Guarantee reasoning for the latter.

It has been shown elsewhere that ‘pulling apart’ the standard rely/guarantee notation throws light on the issue of interference. In [JHC15], the benefits of studying issues prior to choosing a notation are discussed. In particular, that paper takes a new look at specifying and reasoning about interference (the new presentation is more fully explained in [HJC14]).

In the same spirit, the current paper examines the issue of separation. The separation of storage into disjoint portions is clearly an issue for concurrent program design — when it can be established, it is possible to reason separately about threads or processes that operate on the disjoint sections. Tony Hoare’s early attempt to extend his ‘axiomatic basis’ [Hoa69] to parallel programs provides this insight in [Hoa72]. Hoare shows that pre/post conditions of the code for separate threads can be conjoined provided the variables used by the threads are disjoint. He tackled normal (or ‘scoped’) variables where dynamic ownership might be controlled by something like monitors.

In comparison to scoped variables, it is more delicate to reason about separation over ‘heap’ variables whose addresses are computed by the programs in which they occur. Furthermore, exchange of ownership of heap addresses between threads is often disguised by intricate pointer manipulation.

The issues of separation and ownership are certainly handled well by Concurrent Separation Logic [O’H07]. The current paper suggests that some forms of separation can be specified by using data abstraction. The only novelty with respect to standard data abstraction/reification is that the representation must be shown to preserve the separation property of the abstraction.

Two examples are presented here: a simple list reversal algorithm that is sequential and comes from one of Reynolds’ early papers [Rey02] on Separation Logic and a concurrent sorting algorithm. In both cases the implementation uses (separate portions of) heap storage and the ownership of heap cells is exchanged between threads. It would be possible to object that the examples presented look like simple data reifications but that is, in fact, the main point. Using data abstraction, along with the one additional idea that separate abstract variables can be reified onto a shared data structure, throws light on the concepts of separation and ownership.

Of course, some notation has to be used for the specifications and requisite proof obligations but this is well-established and was not devised for concurrency. The authors happen to use ideas from VDM but the same points could be made in Z or Event-B. In more complicated examples, it is useful to be explicit about ‘framing’ and VDM does offer ways of specifying read and write access to parts of the state. For framing, the ideas in [Bor00] or ‘dynamic frames’ [Kas11] would also be options.

The observation that it is possible to tackle some cases of reasoning about separation by using layers of abstraction is in no way intended to challenge research on separation logics. However, as with the reported reformulation of rely/guarantee reasoning, focussing on the issue rather than a specific notation might give a new angle on notations for separation and/or reduce the need to develop new logics.

Hints for a top-down development of the list reversal algorithm are sketched in [JHC15]. The current paper completes the development and fills in details omitted there — more importantly, it draws out the consequences (cf. Sect. 4) and adds the more substantial example of concurrent merge sorting in Sect. 3.

2 In-Place List Reversal

As observed in [JHC15], as well as separation being crucial for concurrent programs, it also has a role in sequential programs. In fact, Separation Logic [Rey02] was conceived for sequential programs; the development of Concurrent Separation Logic [O’H07] came later. While Sect. 3 applies the idea of separation as an abstraction to a concurrent sorting algorithm, this section shows the application of the same idea to the development of a sequential program whose final implementation performs in-place reversal of a sequence.

2.1 Original Presentation

In [Rey02], John Reynolds presented an efficient sequential list reversal algorithm; the fact that the code operates in-place makes it an ideal vehicle for introducing the idea of using abstraction to handle separation. Interestingly, Reynolds introduced the problem by starting with the algorithm, shown in Fig. 1. The list is represented by a value for each item, with the subsequent address containing a pointer to the next item. The algorithm utilises three pointers (i, j, k), where i initially points to the start of the list, k is a temporary place-holder and at termination of the algorithm, j points to the reversed list.

Reynolds used the separating conjunction of Separation Logic to develop a useful specification of the algorithm from the code. His specification demonstrates the ability of the separating conjunction operator to hide the details of the separation, such as showing that the two lists must remain separate and that they are separate from all other lists. While this is certainly a useful method for handling the complexities of separation, the following sections show how layered abstractions can offer a viable alternative.

Fig. 1.
figure 1

Reynolds’ in-place list reversal program in C notation (*n is the C-style pointer dereference of pointer n).
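The shape of the algorithm in Fig. 1 can be sketched in C++ (an illustrative reconstruction: a node struct stands in for raw address arithmetic, and the struct and function names are ours rather than the paper's):

```cpp
#include <cassert>
#include <cstddef>

// Illustrative cell type: a value plus a pointer to the next cell
// (standing in for Reynolds' value/pointer pairs in the heap).
struct Node {
    int val;
    Node* next;
};

// In-place reversal in the style of Reynolds' algorithm: i walks the
// original list, j accumulates the reversed list, k is the temporary
// place-holder for the remainder of the input.
Node* reverse_in_place(Node* i) {
    Node* j = nullptr;
    while (i != nullptr) {
        Node* k = i->next;  // save the rest of the input list
        i->next = j;        // re-point the current cell at the reversed list
        j = i;              // the current cell becomes the head of the result
        i = k;              // advance along the original list
    }
    return j;
}
```

At termination i is nil and j points to the reversed list, exactly as described above; no cell is copied or allocated.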

2.2 Abstract Specification

The notion of reversing a sequence is expressed simply as a recursive function:

figure a
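An executable reading of recursive reversal, given here as a hedged sketch over std::vector (the function name is ours; the paper's definition is in VDM):

```cpp
#include <cassert>
#include <vector>

// reverse([])      = []
// reverse([h] ^ t) = reverse(t) ^ [h]
std::vector<int> reverse_seq(const std::vector<int>& s) {
    if (s.empty()) return {};
    std::vector<int> result = reverse_seq(
        std::vector<int>(s.begin() + 1, s.end()));  // reverse the tail
    result.push_back(s.front());                    // append the head
    return result;
}
```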

The initial step is to develop a program whose state is a pair of lists:

figure b

where the first, referred to as s, is the original list and the second, referred to as r, should finally contain the reversed list. It is worth observing that the two fields of \(\Sigma _{0}\) are implicitly separate — they are ‘scoped’ variables and, unless a language allows something like ‘parameter passing by reference’, there is no debate about a lack of separation.

An operation to compute the reverse of a list can be specified as follows:

figure c

It is straightforward to develop the abstract program in Fig. 2 (the body of the while loop is given as a specified operation because its isolation makes the reification below clearer). The loop preserves the value of an invariant expression relating s and r to the original list; the standard VDM proof rule for loops handles termination by requiring that the relation be well-founded — here, each iteration strictly shortens s.

Fig. 2.
figure 2

Abstract list reversal program.
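The abstract program of Fig. 2, which repeatedly moves the head of s to the front of r until s is empty, can be rendered as an illustrative C++ sketch (state and names are ours):

```cpp
#include <cassert>
#include <vector>

// One execution of the loop body (corresponding to the specified STEP
// operation): move the head of s to the front of r.
void step0(std::vector<int>& s, std::vector<int>& r) {
    r.insert(r.begin(), s.front());
    s.erase(s.begin());
}

// The abstract reversal program: iterate the step until s is empty;
// s strictly shrinks on each iteration, so the loop terminates.
std::vector<int> reverse_abstract(std::vector<int> s) {
    std::vector<int> r;
    while (!s.empty()) step0(s, r);
    return r;
}
```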

2.3 Representing Sequences

The program in Fig. 2 is based on abstract sequences and cannot address things like moving pointers to achieve in-place operation. To show how the list reversal can occur without moving the data, the abstract state needs to be represented as a heap:

figure d

(In VDM, maps (\(D\buildrel m\over \longrightarrow R\)) are finite constructed functions; the fields of a pair \(pr \in (Val \times \big [Ptr\big ])\) are accessed here by index, e.g. \(pr_{1}\); the square brackets around Ptr indicate that it is optional and that \(\mathbf{{nil}}\notin Ptr\) is a possible value.)

Such a heap might contain information for other threads and/or garbage discarded by processes. Section 2.4 completes the reification to just such a Heap but, here, an intermediate step is introduced which shows two scoped variables each containing a sub-heap that is precisely a sequence representation (Srep). (Although this intermediate representation could actually be elided, a significant advantage of its use is that Srep objects are also useful for the development of the concurrent program in Sect. 3.) One could define Srep using a datatype invariant but the proofs below benefit from defining the concept inductively as the least map \(Srep \subseteq Heap\) containing:

figure e

Furthermore, a useful function that defines the start element can be defined over the recursive construction:

figure f

The state for this intermediate development step contains two Srep objects which are required to have disjoint domains:

figure g
figure h

On the \(\Sigma _{1}\) representation, the specification of the operation corresponding to the body of the while loop in Fig. 2 is:

figure i

Lemma 1

It is necessary to show that \(STEP_{1}\) preserves the invariant of \(\Sigma _{1}\).

figure j

The proof is by induction over Srep.

Proof obligations for data reification are standard in methods such as VDM (cf. [Jon90, Chap. 8]): retrieve functions are homomorphisms from the representation back to the abstraction.

figure k

The gather function is again defined over the inductive construction of Srep:

figure l

VDM defines an ‘adequacy’ proof obligation which requires that, for each abstract state, there exists at least one representation state.

Lemma 2

There is at least one representation for each abstract state:

figure m

The proof of this lemma is by induction on s.

The key commutativity proof for reification shows that the design step models the abstract specification:

Lemma 3

\(STEP_{1}\) models (under \(retr_{0}\)) the abstract \(STEP_{0}\)

figure n

The proof follows from unfolding the defined functions/predicates.

2.4 The Heap

Although the two Srep variables in the preceding section are ‘heap-like’, each is used like a scoped variable. This section shows that the scoped variables can be represented in a single heap and that the behaviour on the heap remains as specified in Sect. 2.3.

This final representation uses a single heap (hp) and two pointers (i, j). The hp field of \(\Sigma _{2}\) is essentially the heap underlying Fig. 1.

figure o

This is again an exercise in data reification. Here, it is mandatory that sep holds between the two sub-heaps because their union is used in \((sr \cup rr) \subseteq hp\); the fact that this is not an equality admits the possibility of other information in the heap. The retrieve function in this case is:

figure p

where:

figure q

The definedness of trace for \(Srep \subseteq Heap\) follows from \(inv\text {-}\Sigma _{2}\).

Lemma 4

The trace function applied to the start of an Srep returns exactly the pointers in that Srep; therefore, restricting the domain of a heap containing an Srep to such a trace yields the original Srep.

figure r

The proof is by induction over Srep.

The adequacy proof obligation for \(\Sigma _{2}\) is:

Lemma 5

There is at least one representation in \(\Sigma _{2}\) for each \(\Sigma _{1}\) state:

figure s

The proof creates a minimal hp that contains exactly the union of sr/rr which are disjoint.

On \(\Sigma _{2}\), the specification of the operation corresponding to \(STEP_{1}\) above is:

figure t

for which the reification proof obligation is:

Theorem 1

\(STEP_{2}\) models (under \(retr_{1}\)) the abstract \(STEP_{1}\)

figure u

The proof again follows from unfolding the defined functions/predicates.

Code (in C++) that satisfies \(post\text {-}STEP_{2}\) is given in Fig. 3. The final step in the correctness argument is to note that the loop in Fig. 2 terminates when \(s = [\,]\) and the loop on the representation terminates when \(i = \mathbf{{nil}}\); under \(retr_{1}/retr_{0}\), these conditions are equivalent.

Fig. 3.
figure 3

C++ implementation of the list reversal algorithm.

2.5 Observations

This simple sequential example illustrates how the motto separation is an abstraction can work in practice. In the abstraction (\(\Sigma _{0}\)) of Sect. 2.2, the two variables are assumed to be distinct; standard data reification rules apply where that distinction is obvious; in the step to \(\Sigma _{2}\), it must be established that the abstraction of separation holds in the representation as (changing) portions of a shared heap.

A valuable by-product of the layered design is that the algorithm is discussed on the abstraction and neither the reification step nor its justification are concerned with list reversal as such. This is, of course, in line with the message of [Wir76].

There are some incidental bonuses from the use of VDM: invariants (and the use of predicate restricted types) effectively provide pre conditions for the functions; use of relational post conditions avoids the need for what are essentially auxiliary variables to refer to the initial state; and the use of ‘LPF’ [BCJ84] simplifies the construction of logical expressions where terms and/or propositions can fail to denote.

This example is simple and, in fact, the development presented here is even clearer than that in an earlier draft. The point is that the important notion of separation has been tackled without any special notation. Section 3 employs the same approach on a program that uses parallelism.

3 Mergesort

The preceding list reversal example demonstrates the idea of handling separation via abstraction in a sequential development. This section applies the same idea to a concurrent design: the well-known mergesort algorithm which sorts by recursively splitting lists. At each step, the argument list is divided into two parts (preferably, but not necessarily, of roughly equal sizes) which are recursively submitted to mergesort; as the recursion unwinds, the two sorted lists are merged into a single sorted list.

3.1 Specification

The notion of sorting is easy to specify as a relation:

figure v

The ordered predicate tests that its argument is an ascending sequence.

figure w

The permutes predicate tests that its two arguments contain the same elements; here this is done by comparing the ‘bag’ (‘multiset’) of occurrences:

figure x
figure y
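Both predicates have direct executable counterparts. The sketch below (names ours) realises the ‘bag of occurrences’ comparison with a count map:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// ordered: the argument is an ascending sequence.
bool ordered(const std::vector<int>& s) {
    for (std::size_t i = 1; i < s.size(); ++i)
        if (s[i - 1] > s[i]) return false;
    return true;
}

// bag: count the occurrences of each element (a 'multiset').
std::map<int, int> bag(const std::vector<int>& s) {
    std::map<int, int> b;
    for (int x : s) ++b[x];
    return b;
}

// permutes: the two sequences contain the same elements,
// compared as bags of occurrences.
bool permutes(const std::vector<int>& a, const std::vector<int>& b) {
    return bag(a) == bag(b);
}
```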

3.2 Algorithm

The basic idea of merge sorting can be established with a recursive function (mergesort defined below). This uses a merge function that selects the minimum head element from its two argument lists and recurses:

figure z
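A hedged executable transcription of the merge function (select the smaller head and recurse; the name merge_seq is ours):

```cpp
#include <cassert>
#include <vector>

// Merge two (ordered) sequences by repeatedly selecting the
// minimum head element and recursing on what remains.
std::vector<int> merge_seq(std::vector<int> l, std::vector<int> r) {
    if (l.empty()) return r;
    if (r.empty()) return l;
    std::vector<int> result;
    if (l.front() <= r.front()) {
        result.push_back(l.front());
        std::vector<int> rest = merge_seq(
            std::vector<int>(l.begin() + 1, l.end()), r);
        result.insert(result.end(), rest.begin(), rest.end());
    } else {
        result.push_back(r.front());
        std::vector<int> rest = merge_seq(
            l, std::vector<int>(r.begin() + 1, r.end()));
        result.insert(result.end(), rest.begin(), rest.end());
    }
    return result;
}
```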

Lemma 6

The merge function has the property that the final list is a permutation of the initial two lists conjoined:

figure aa

The proof is by nested induction on the lists.

Lemma 7

The merge function also satisfies the property that, if the argument lists are ordered, so is the resulting merged list:

figure ab

The proof is identical in structure to that of Lemma 6.

The mergesort function itself is defined as follows:

figure ac
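An executable reading of mergesort (illustrative, names ours): the definition permits an arbitrary split, and the sketch below halves the list for balance, using an iterative form of merge for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge two ordered sequences (iterative form of the merge function).
std::vector<int> merge_seq(const std::vector<int>& l,
                           const std::vector<int>& r) {
    std::vector<int> out;
    std::size_t i = 0, j = 0;
    while (i < l.size() && j < r.size())
        out.push_back(l[i] <= r[j] ? l[i++] : r[j++]);
    while (i < l.size()) out.push_back(l[i++]);
    while (j < r.size()) out.push_back(r[j++]);
    return out;
}

// mergesort: lists of length <= 1 are already sorted; otherwise split
// (here: in half, though any split would do), sort each part
// recursively, and merge the two sorted results.
std::vector<int> mergesort(const std::vector<int>& s) {
    if (s.size() <= 1) return s;
    std::size_t mid = s.size() / 2;
    std::vector<int> left(s.begin(), s.begin() + mid);
    std::vector<int> right(s.begin() + mid, s.end());
    return merge_seq(mergesort(left), mergesort(right));
}
```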

Lemma 8

The mergesort function ensures that the resulting list is both sorted and a permutation of the initial list:

figure ad

Because of the arbitrary split, the proof uses course-of-values induction on s.

3.3 Representing Sequences

Having dealt with the algorithmic ideas in Sect. 3.2, the method used in Sect. 2.3 can be followed by reifying the abstract sequences into Srep objects as defined in Sect. 2.3.

The implementation consists of two operations: \(MSORT_{1}\) operates on \(S_{1}\):

figure ae

while the \(MERGE_{1}\) operation uses a state that contains three instances of Srep:

figure af

where the three fields are pairwise separate (sep cf. Sect. 2.3). As in Sect. 2.3, this notion of separation is used here only to simplify the exchange of ownership of cells between lr and a. In Sect. 3.4, separation justifies the embedding of three Srep objects in a single heap.

Turning to the presentation of the (abstract) program, standard sequential program constructs (e.g. the while loop) were used in Sect. 2.2. This approach is not followed here because it would be a digression to derive a proof rule for the (non-tail) recursion needed in \(MSORT_{1}\) (this construct is not covered in [Jon90]). Instead the recursion in both \(MERGE_{1}\) and \(MSORT_{1}\) is represented as predicates by ‘quoting post conditions’ (cf. [Jon90, Sect. 9.3]).

figure ag

Lemma 9

\(MERGE_{1}\) preserves separation:

figure ah

The proof of this lemma is obvious from the form of the proof of Lemma 1.

Lemma 10

The operation \(MERGE_{1}\) mirrors the function merge

figure ai

Here again, the proof follows that of Lemma 3.

It is necessary to split an Srep into two separate values of that type. The function split recurses until the argument p is located in the representation:

figure aj
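At the pointer level, split can be pictured as severing a chain of cells at a given address, leaving two separate chains. The sketch below is illustrative only (it assumes p occurs in the chain and is not its head; the types and names are ours, not the paper's Srep formulation):

```cpp
#include <cassert>
#include <cstddef>

// Illustrative cell type for a nil-terminated chain.
struct Node {
    int val;
    Node* next;
};

struct SplitResult {
    Node* first;   // the cells before p, now nil-terminated
    Node* second;  // the chain starting at p
};

// Split the chain starting at head at the cell pointed to by p.
// Assumes p occurs in the chain and p != head.
SplitResult split_at(Node* head, Node* p) {
    Node* cur = head;
    while (cur->next != p) cur = cur->next;  // locate p in the chain
    cur->next = nullptr;  // sever: the two parts are now separate
    return {head, p};
}
```

After the cut, the two chains have disjoint cells, mirroring the separation property asserted by Lemma 11.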

Lemma 11

The split function yields two instances of Srep that are separate:

figure ak

The proof is by induction on sr.

Lemma 12

Under the gather function, concatenation of the two lists produced by split gives the argument list:

figure al

This proof follows the structure of that of Lemma 11.

Whereas \(MERGE_{1}\) is used sequentially (there are no concurrent threads), instances of \(MSORT_{1}\) are to be run in parallel. The term ‘parallel’ is used in preference to ‘concurrent’ precisely because the instances are executed on separate parts of the heap.

figure am

Theorem 2

The final conclusion is that the operation \(MSORT_{1}\) mirrors the function mergesort:

figure an

which follows from the lemmas.

3.4 The Heap

It is almost as straightforward as in Sect. 2.4 to develop code for \(MSORT_{2}\) and \(MERGE_{2}\). There is one interesting addition required because of the concurrent execution of two instances of \(MSORT_{2}\). The invariants follow the same pattern as with the sequence reversal example — for \(MERGE_{2}\), the representation in the Heap is:

figure ao

and the corresponding representation for \(MSORT_{2}\) is simply:

figure ap

The respective retrieve functions are:

figure aq
figure ar

It is, however, necessary to establish non-interference between the concurrent threads. This can be done with a simple use of rely/guarantee reasoning:

figure as

The code in Figs. 4 and 5 satisfies the specifications of \(MERGE_{2}\) and \(MSORT_{2}\) respectively; a specific implementation of split is also provided.

Fig. 4.
figure 4

C++ implementation of MERGE.

Fig. 5.
figure 5

C++ implementation of MSORT.
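A pointer-level sketch of the merge step over linked cells conveys how the implementations re-link, rather than copy, heap cells, so that ownership of each cell passes into the result chain (illustrative only; the Node type and function name are assumptions, not the paper's code):

```cpp
#include <cassert>
#include <cstddef>

// Illustrative cell type for a nil-terminated chain.
struct Node {
    int val;
    Node* next;
};

// In-place merge of two nil-terminated ordered chains: cells are
// re-linked, never copied, so each cell's ownership transfers from
// one of the argument chains into the result chain.
Node* merge_chains(Node* l, Node* r) {
    Node dummy{0, nullptr};
    Node* tail = &dummy;
    while (l != nullptr && r != nullptr) {
        // Pick the chain with the smaller head; 'smaller' aliases
        // l or r so that advancing it updates the right variable.
        Node*& smaller = (l->val <= r->val) ? l : r;
        tail->next = smaller;     // transfer the cell to the result
        smaller = smaller->next;  // advance in the source chain
        tail = tail->next;
    }
    tail->next = (l != nullptr) ? l : r;  // append the remaining chain
    return dummy.next;
}
```

Because the two argument chains occupy disjoint cells, the result re-uses every cell exactly once; this is the ownership exchange that the separation invariants above make safe.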

3.5 Observations

As in Sect. 2, the approach of viewing separation as an abstraction has benefits. As in the earlier example, aspects of VDM such as types restricted by predicates and relational post conditions play a small part in the development of merge sort. More significant is that the layered development makes it possible to divorce the reasoning about merging and sorting from details of how the abstract state is reified onto heap storage.

Although this example has used some aspects of VDM not needed in Sect. 2 — in particular, quoting post conditions — it is important to remember that these are long-standing ideas in VDM and are not specific to reasoning about the separation issue.

4 Discussion

The research reported in this paper is one vector of the ‘Taming Concurrency’ project in which it is hoped to identify and/or to develop apposite notations for reasoning about the underlying issues that make designing and justifying intricate concurrent programs challenging. In contrast, starting with a fixed notation might be seen as a version of ‘to a man with a hammer, everything looks like a nail’. Of course, using existing notation is not precluded but ensuring that the issues are clear looks to be a prudent starting point.

The Rely/Guarantee (R/G) approach (of which more below) was devised for reasoning about the issue of interference. The R/G concept has been substantially recast in [HJC14] and the new version is summarised in [JHC15]. In contrast to the monolithic five-tuple approach of [Jon81, Jon83a, Jon83b] for R/G specifications, [HJC14] presents separate rely and guar constructs in a refinement calculus style and shows their algebraic structure.

The current paper is written in the same spirit. Separation is also a key issue in thinking about parallel programs. One example of the importance of separation is the way in which storage is allocated between threads in an operating system. Separation Logic (SL) has a well-crafted collection of operators for reasoning about separation/ownership and an attractive feature is the pleasing algebraic properties of the operators.

This paper — with the help of examples previously tackled with SL — explores the option of reasoning about separation using predicates defined over heaps. The idea can be summarised with the motto that separation is an abstraction. A corollary of this point of view is that representations (e.g. of separate scoped variables into heap representations) have to preserve the separation property of the abstraction. Other than the twist of viewing separation as an abstraction, the method of data reification used here is long-established in the literature.

Analogous to the pulling apart of R/G specifications, an alternative view of SL might lead to different notational ideas than if the notation itself is taken as the fixed point. Obviously, the fact that it is possible to reason about separation without the need to use SL itself is not an argument against SL. One huge benefit of SL is the tool support that has been developed around the notation. These tools support a ‘bottom-up’ approach that is advantageous with legacy software. The pleasing algebraic relationship between SL operators has been referred to above. These operators are also able to express some constraints in a succinct way (e.g. the use of separating conjunction with recursion to state that a chain of pointers has no loops).

A bonus from the top-down approach can be seen in the examples in this paper: the essence of each algorithm is documented and reasoned about on the abstraction and this is separated from arguments about the messy details of the (heap) representations. The hope is that seeing what can be done in a top-down view using abstraction could prompt new requirements for SL-like notations. The approach might, conceivably, also control the proliferation against which Matt Parkinson warns in [Par10].

Separation is, of course, a way of ruling out interference so it is interesting to understand those situations where a user can choose which approach to adopt. With scoped variables, there is a variety of ways to define the named variables (frame) of different threads. VDM allows state components to be marked as having rd/wr access; the keyword notation is rather heavy but serves the purpose and many alternatives could be considered. In the refinement calculus presentation of [HJC14, JHC15], write access is made clear but not access for reading. Section 3.4 above indicates the recording of read/write access to subsets of heap addresses. (There are, of course, occasions where read/write clashes require assumptions in the reading process and rely conditions are an obvious candidate for recording such assumptions.) One approach that is used with separation logics to handle such access constraints is to employ ‘fractional permissions’ [Boy03].

Technical connections between R/G and SL are considered in [VP07, Vaf07]. It might also be worth noting one of the Laws in [HJC14]:

figure at

which both handles the general case of interference and rather clearly shows that the attractive prospect of conjoining the post conditions of parallel threads can be achieved (only) if their respective guarantee conditions ensure sufficient separation. This emphasises that complete separation is an extreme case of minimising interference.

One last comment on the similarities is that the importance of (data) abstraction in the proposed way of looking at separation nicely mirrors its key role in R/G methods [Jon07].

More narrowly, on the content of this paper, alternatives considered by the authors include:

  • It would simplify the notation to separate the Heap into two mappings (one for the Val and the other for the next Ptr) because it would remove the need to use subscripts to access the components of the pair.

  • In both examples, it would be possible to omit the intermediate representation and to move directly from the respective abstract states to the general Heap. As mentioned in Sect. 2.3, the fact that Srep is used in both examples is one argument for its separation — the other argument is the divorce of the algorithm design from the messy heap representation details.

For future work, it would be useful to develop a ‘theory’ of Srep objects. Another interesting avenue to explore is the extent to which recording the relationship between a clean abstraction and its representation (given here as ‘retrieve functions’) could be used to generate code automatically from the abstract algorithm. Finally, the need to reason about both separation and interference will be discussed in another paper on which the current authors are working (together with Andrius Velykis) which covers the design of concurrent implementations of tree and graph representations.