

# **Verifying Correctness of Persistent Concurrent Data Structures**

John Derrick<sup>1</sup>, Simon Doherty<sup>1</sup>, Brijesh Dongol<sup>2( $\boxtimes$ )</sup>, Gerhard Schellhorn<sup>3</sup>, and Heike Wehrheim<sup>4</sup>

> <sup>1</sup> University of Sheffield, Sheffield, UK <sup>2</sup> University of Surrey, Guildford, UK b.dongol@surrey.ac.uk <sup>3</sup> University of Augsburg, Augsburg, Germany

> <sup>4</sup> Paderborn University, Paderborn, Germany

**Abstract.** Non-volatile memory (NVM), aka persistent memory, is a new paradigm for memory preserving its contents even after power loss. The expected ubiquity of NVM has stimulated interest in the design of *persistent* concurrent data structures, together with associated notions of correctness. In this paper, we present the first formal proof technique for *durable linearizability*, which is a correctness criterion that extends linearizability to handle crashes and recovery in the context of NVM. Our proofs are based on refinement of IO-automata representations of concurrent data structures. To this end, we develop a generic procedure for transforming any standard sequential data structure into a durable specification. Since the durable specification only exhibits durably linearizable behaviours, it serves as the abstract specification in our refinement proof. We exemplify our technique on a recently proposed persistent memory queue that builds on Michael and Scott's lock-free queue.

## **1 Introduction**

Recent technological advances indicate that future architectures will employ some form of *non-volatile memory* (NVM) that retains its contents after a system crash (e.g., power outage). NVM is intended to be used as an intermediate layer between traditional *volatile memory* (VM) and secondary storage and has the potential to vastly improve system speed and stability. Software that uses NVM has the potential to be more robust; in case of a crash, a system state before the crash may be recovered using contents from NVM, as opposed to being restarted from secondary storage. However, because the same data is stored in both a volatile and non-volatile manner, and because NVM is updated at a slower rate than VM, recovery to a consistent state may not always be possible. This is particularly true for concurrent systems, where coping with NVM requires introduction of additional synchronisation instructions into a program.

-c Springer Nature Switzerland AG 2019

M. H. ter Beek et al. (Eds.): FM 2019, LNCS 11800, pp. 179–195, 2019. [https://doi.org/10.1007/978-3-030-30942-8](https://doi.org/10.1007/978-3-030-30942-8_12)\_12

Derrick, Dongol and Doherty are supported by EPSRC grants EP/R032351/1, EP/R032556/2, EP/R019045/2; Wehrheim by DFG grant WE 2290/12-1.

Recently, researchers have developed *persistent* extensions to existing concurrent objects (e.g., concurrent data structures or transactional memory). This work has been accompanied by extensions to known notions of consistency, such as linearizability and opacity that cope with crashes and subsequent recovery.

In this paper, we examine correctness of the recently developed persistent queue by Friedman et al. [\[11](#page-15-0)], against the (also) recently developed notion of *durable linearizability* [\[14](#page-15-1)]. Friedman et al.'s queue extends the well-known Michael-Scott queue [\[20\]](#page-16-0), whereas durable linearizability extends the standard notion of linearizability [\[12\]](#page-15-2) so that completed executions are guaranteed to survive a system crash.

Our verification follows a well-established methodology: (1) we develop an operational model of durable linearizability that is parameterised by a generic sequential object (e.g., a queue data structure with enqueue and dequeue operations), (2) we prove that this operational model is sound, and (3) we establish a series of refinements between the operational model and the concrete implementation. The final (and most complex) of these steps, which establishes that the implementation refines the operational model, is fully mechanised in the KIV theorem prover [\[10](#page-15-3)]. It is important to note that the operational model is generic and for any particular verification one needs therefore just to establish step (3) in order to show that a particular algorithm is durable linearizable.

Ours is the first paper to address formal verification of persistent data structures. We consider the development of our sound operational characterisation of durable linearizability and the refinement proofs, including mechanisation in KIV, to be the main contributions of this paper. The mechanisation and the full version of the paper may be accessed from [\[17\]](#page-15-4).

We present Friedman et al.'s queue in Sect. [2,](#page-1-0) durable linearizability in Sect. [3,](#page-4-0) an operational characterisation of durable linearizability in Sect. [4,](#page-6-0) and address correctness of the queue in Sect. [5.](#page-9-0)

## <span id="page-1-0"></span>**2 A Persistent Queue**

The persistent queue of Friedman et al. [\[11](#page-15-0)] is an extension of the Michael-Scott queue (MSQ) [\[20](#page-16-0)] to cope with NVM (see Algorithms [1](#page-2-0) and [2\)](#page-3-0). The MSQ uses a linked list of nodes with global head and tail pointers. The first node is a sentinel that simplifies handling of empty queues. The MSQ is initialised by allocating a dummy node with a null next pointer, then setting the global head and tail pointers to this dummy node.

The enqueue operation creates a new node that is inserted at the end of the linked list. The insertion is performed using an atomic compare-and-swap (CAS) instruction that atomically updates the *next* pointer of the last node provided this next pointer hasn't changed since it was read at the beginning of the enqueue operation. The CAS returns true if it succeeds and false otherwise. Immediately after a new node is inserted, the tail pointer is *lagging* one node behind the true tail of the queue, and hence, must be updated to point to the last node in a separate step.

<span id="page-2-0"></span>

| <b>Algorithm 1.</b> Constructors           |                                          |
|--------------------------------------------|------------------------------------------|
| 1: class NODE                              | 1: class DURABLEQUEUE                    |
| 2: T val;                                  | 2: Node* head:                           |
| 3: Node* next;                             | 3: Node* tail;                           |
| 4: int degID;                              | $4: T*$ RVals [MAX];                     |
| 5: Node(T k):                              | 5: DURABLEQUEUE()                        |
| 6: $val(k)$ , $deqID(-1)$ , $next(null)$ ; | 6: $T*$ node := new Node(T());           |
|                                            | $7:$ flush(node);                        |
|                                            | $8:$ head $:=$ node;                     |
|                                            | $9: \quad \text{ flush}(\text{khead})$ ; |
|                                            | 10: $tail := node;$                      |
|                                            | flush(&tail);<br>11:                     |
|                                            | $RVals[i] := null; //all i$<br>12:       |
|                                            | $flush(\&RVals[i]);$<br>13:              |

The dequeue operation returns empty if the head and tail pointer both point to the sentinel node and the tail is not lagging. If the queue is not empty, the dequeue reads from the value of the node immediately after the sentinel and atomically swings the head pointer to this next node provided it has not changed. Thereby, the next node becomes the new sentinel node of the queue.

A key feature of MSQ is a *helping mechanism* where a different thread from the original enqueue may advance the tail pointer if it is lagging. In the case of a dequeue, this only occurs if head and tail pointers are equal, but the queue is not empty.

Friedman et al. [\[11\]](#page-15-0) adapt MSQ to a system comprising both VM and NVM. In such systems, computations take place in VM as normal, but data is periodically flushed to NVM by the system. In addition to system controlled flushes, a programmer may introduce explicit **flush** events that transfer data from VM to NVM. Only data in NVM persists after a crash (e.g., power loss). A persistent data structure must enable recovery from such an event, as opposed to a full system restart. In doing this, it must ensure some notion of consistency in the presence of crashes and a subsequent recovery operation. Following Friedman et al. [\[11](#page-15-0)], the notion of consistency we use is durable linearizability (see Sect. [3\)](#page-4-0).

The persistent queue uses the same underlying data structure as MSQ (see Algorithm [1\)](#page-2-0), but nodes contain an additional field, deqID (initialised to  $-1$ ), which holds the ID of the thread that removed the node from the queue. In addition to the head and tail pointers, it uses an array of pointers, RVals, with one index for each thread, containing either null (which is the initial value), of a pointer to a cell which itself either contains empty (which signifies that the thread last saw an empty queue), or a value (which is the value that was last dequeued). Unlike MSQ, the persistent dequeue operation does not return a value; instead the returned value for tid is stored in the cell pointed to by RVals[tid].

**Persistent Enqueue.** The basic structure (see Algorithm [2\)](#page-3-0) is the same as the enqueue of MSQ. In addition, to ensure that the linked list data structure



<span id="page-3-0"></span>

is recoverable after a crash, nodes and next pointers have to be persisted after being modified in VM.

This is achieved by using three **flush** operations in lines 3, 10 and 14. The first ensures that the node is persisted before it is inserted into the queue; the second and third ensure that the next pointer of a lagging tail pointer is persisted before the tail is advanced. Note that updates to tail do not need to be explicitly flushed because it can be recomputed during recovery by traversing the persistent list.

**Persistent Dequeue.** The basic structure of the dequeue operation also resembles the dequeue of MSQ. In addition it uses variables RVals and deqID to guarantee durable linearizability. RVals is an array of pointers to cells that are used to store the value returned by each dequeue. A dequeue creates a new cell at

Line 2, then flushes it at Line 3. The pointer to this cell is stored in RVals at Line 4, and this pointer is made persistent at Line 5.

The deqID field is used to logically mark nodes that are dequeued, which occurs at the successful CAS at Line 20. This logical dequeue is made persistent by flushing the deqID at Line 21. After a node has been logically dequeued, the dequeued value is stored in the cell pointed to by RVals[tid] (see Line 22) where tid is the thread ID of the dequeuing thread. This dequeued value is made persistent at Line 23. A dequeue by thread tid stores empty in RVals[tid] if the queue is empty in Line 13, and this value is made persistent at Line 14.

The persistent dequeue operation employs an additional helping mechanism to ensure that these new fields are made persistent in the correct order. In particular, a node that has been logically dequeued in VM must be made persistent before another dequeue is allowed to succeed. Therefore, if a thread recognises that deqID is not  $-1$  at Line 20, it helps the other thread by flushing the deqID field, writing the dequeued value into the cell pointed to by  $RVals[{\text{nxt}} \rightarrow \text{tid}],$ flushing this cell, and finally advancing the head pointer. Note that the helping thread may be delayed between the read at Line 27 and the write at Line 30, and the original thread tid may begin a new dequeue operation in this interval. In this case, since tid allocates a fresh cell at Line 2, the helping thread's write at Line 30 will harmlessly modify a previous cell.

After a crash, and prior to resuming normal operation, persistent data structures must perform a *recovery* operation that restores the state of the data structure in VM from NVM. The recovery procedure proposed by Friedman et al. is multithreaded (and complex), so we elide its details here. Instead, we provide a simpler single-threaded recovery operation (see Sect. [5.1\)](#page-9-1).

## <span id="page-4-0"></span>**3 Durable Linearizability**

We now define *durable linearizability* [\[14](#page-15-1)], a central correctness condition for persistent concurrent data structures. Like linearizability, durable linearizability is defined over *histories* recording the *invocation* and *response* events of operations executed on the concurrent data structure. Unlike linearizability, durably linearizable histories include crash events.

Formally, we let *Σ* be the set of operations. For a queue,  $\mathcal{Z} = \{ \text{Eng}, \text{Deg} \}$ . A history is a sequence of events, each of which is either (a) an invocation of an operation *op* by a thread  $t \in T$  with values *v*, written  $inv<sub>t</sub>(op, v)$ , (b) responses of *op* in thread t with value v, written  $res<sub>t</sub>(op, v)$ , or (c) a system-wide crash c.

Given a history  $h$ , we let  $ops(h)$  denote h restricted to non-crash events, and  $h_{\downarrow t}$  denote h restricted to (non-crash) events of thread  $t \in T$ . The crash events partition a history into  $h = h_0 c_1 h_1 c_2 ... h_{n-1} c_n h_n$ , such that *n* is the number of crash events in h,  $c_i$  is the *i*th crash event and  $ops(h_i) = h_i$  (i.e.,  $h_i$  contains no crash events). We call the subhistory h*<sup>i</sup>* the i*-th era* of h. For a history h and events  $e_1, e_2$ , we write  $e_1 < h e_2$  whenever  $h = h_0 e_1 h_1 e_2 h_2$ .

A history h is said to be *sequential* iff every invocation event (except if it is the last event in  $h$ ) is immediately followed by its corresponding response event; it is *well formed* if and only if (a)  $h_{1t}$  is sequential for every thread t and (b) each thread id appears in at most one era. Any invocation that is not followed by its response event is called a *pending* invocation. We consider well-formed histories only. A history h defines a *happens-before* ordering on the events occuring in h by letting  $e_1 \prec_h e_2$  iff  $e_1 \prec_h e_2$  and  $e_1$  is a response and  $e_2$  an invocation event. Linearizability (and durable linearizability) requires a notion of a *legal history*, which we define using a sequential object. Every history of a sequential object is both sequential and legal.

**Definition 1 (Sequential Object).** *A* sequential object *over a base type Val is a 5-tuple*  $(\Sigma, S, s_0, in, \rho)$  *where* 

- $\sim$  *Z is an* alphabet *of operations, S is a set of states and*  $s_0$  *the initial state,*
- $-$  in :  $\Sigma \rightarrow \mathbb{N}$  is an input function *telling us the number of inputs an operation*  $op \in \Sigma$  *takes, and*
- *–*  $\rho$  :  $S \times \Sigma \times \text{Val}^* \rightarrow S \times (\text{Val} \cup \{\text{empty}, \bot\})$  *is a partial* transition function.

We assume outputs of operations to consist of a single value which possibly is the symbol empty or no value denoted by  $\bot$ . In the following we let  $\boldsymbol{v} = v_1v_2 \ldots v_n$ denote a string of *n* elements and write  $\#v$  to denote its length *n*. We write  $inv<sub>t</sub>(op, v)$  for an invocation of the operation *op* with  $n = \text{\#}v$  inputs by thread t and let *Inv* be the set of all such invocations. Similarly, we let *Res* be the set of all responses.

The *legal* histories of a sequential object  $\mathbb{S} = (\Sigma, S, s_0, in, \rho)$  are defined as follows. We write  $s \xrightarrow{inv_t(op, v) res_t(op, v)} s'$  for  $\rho(s, op, v) = (s', v)$  and  $t \in T$ . For a sequence w of invocations and responses, we write  $s \stackrel{w}{\longrightarrow} s'$  iff either  $w = \langle \rangle$  and  $s = s'$ , or  $w = u \circ w'$  and there exists an s'' such that  $s \xrightarrow{u} s''$  and  $s'' \xrightarrow{w'} s'$ . The set of *legal histories* of S is given by  $\text{legal}_{S} = \{w \in (\text{Inv} \cup \text{Res})^* \mid \exists s \in S. s_0 \xrightarrow{w} s\}.$ 

<span id="page-5-0"></span>*Example 2.* A sequential queue,  $\mathbb{Q}$ , storing elements of type V is defined by  $\Sigma = {\text{Eng, Deg}}, \, \text{in}(\text{Eng}) = 1, \text{in}(\text{Deg}) = 0, \, q_0 = \langle \rangle, \, \text{and}$ 

$$
\rho = \{ \big((q, \mathtt{Eng}, v), (q \cdot v, \bot) \big) \mid v \in V \land q \in V^* \} \cup \{ \big((v \cdot q, \mathtt{Deg}, \varepsilon), (q, v) \big) \mid v \in V \land q \in V^* \} \cup \{ \big((\langle \rangle, \mathtt{Deg}, \varepsilon), (\langle \rangle, \mathtt{empty}) \big) \}
$$

where  $\varepsilon$  is the empty string,  $\langle \rangle$  is the empty sequence and  $\cdot$  is used for sequence concatenation. For  $\mathbb Q$ , the history h below is sequential and legal

$$
h = \langle \mathit{inv}_1(\mathtt{Eng},a), \mathit{res}_1(\mathtt{Eng},\bot), \mathit{inv}_2(\mathtt{Deg},\varepsilon), \mathit{res}_2(\mathtt{Deg},a)\rangle
$$

whereas the history  $h \cdot \langle inv_3(\text{Deg}, \varepsilon), res_3(\text{Deg}, b) \rangle$  is sequential but not legal.

For the definition of durable linearizability some more notation is needed. We write  $h \equiv h'$  if  $h_{|t} = h'_{|t}$  for all threads t. We let  $compl(h)$  (the completion) be the set of histories that can be obtained from  $h$  by appending (some) missing responses at the end, and use  $trunc(h)$  to remove pending invocations from a history h (or a set of histories). Following Herlihy and Wing [\[12\]](#page-15-2), h is *linearizable* if there is some  $h' \in trunc(compl(h))$  and some legal sequential history  $h_S$  such that (i)  $h' \equiv h_S$  and (ii)  $\forall e_1, e_2 \in h': e_1 \prec_{h'} e_2 \Rightarrow e_1 \prec_{h_S} e_2.$ 

<span id="page-6-1"></span>For durable linearizability, this definition is now simply lifted to histories with crashes.

**Definition 3 (Durable Linearizability** [\[14](#page-15-1)]**).** *A history* h *is* durably linearizable *if it is well formed and ops*(h) *is linearizable.*

Informally, durable linearizability guarantees that even after a crash the state of the concurrent object remains consistent with the abstract specification. This means that the effect of any operations that completed before a crash are preserved after the crash. The effect of operations that did not complete before a crash may or may not be preserved. For example, the concurrent history

$$
hc = \langle inv_1(\texttt{Enq},a), inv_3(\texttt{Deq},\varepsilon), res_1(\texttt{Enq},\bot), c, inv_2(\texttt{Deq},\varepsilon), res_2(\texttt{Deq},a)\rangle
$$

is durably linearizable since  $ops(hc) = \langle inv_1(\text{Eng}, a), inv_3(\text{Deg}, \varepsilon), res_1(\text{Eng}, \perp),$  $inv_2(\text{Deq}, \varepsilon)$ ,  $res_2(\text{Deq}, a)$  is linearizable with respect to the history h in Example [2.](#page-5-0) On the other hand the history

 $\langle inv_1(\texttt{Eng}, a), inv_3(\texttt{Eng}, b), res_1(\texttt{Eng}, \perp), c, inv_2(\texttt{Deg}, c), res_2(\texttt{Deg}, \texttt{empty})\rangle$ 

is not durably linearizable since the effect of the completed operation  $\text{Eng}(a)$  is not preserved after the crash.

Our methodology for proving durable linearizability does not use Definition [3](#page-6-1) directly; instead it uses the following characterisation, which defines the set of all durably linearizable histories for a sequential object.

We let  $\text{Lin}(\mathbb{S})$  be the set of histories linearizable wrt. the legal histories of sequential object S and define

$$
\text{Durlin}(\mathbb{S}) = \{ h \in (\mathit{Inv} \cup \mathit{Res} \cup \{c\})^* \mid \mathit{ops}(h) \in \text{Lin}(\mathbb{S}) \}
$$

For a given concurrent durable data structure implementing a sequential object S, proving its correctness thus amounts to showing that all histories of the implementation are in DurLin(S). To this end, for a given S, we develop an operational model  $DURAUT(S)$  whose behaviours generate  $DURLIN(S)$ . We then use a standard refinement approach to show that the implementation model is a refinement of  $DURAUT(S)$ . This is enough to guarantee that the original implementation is durably linearizable.

## <span id="page-6-0"></span>**4 An Operational Model for Durable Linearizability**

The operational model for durable linearizability is formalised in terms of an *Input/Output automaton (IOA)* [\[18\]](#page-15-5). This framework is often used for proving linearizability via refinement [\[9](#page-15-6)].



 $states(A): pc: T \rightarrow \{idle, ready, crashed\} \cup \{inv(op), res(op) | op \in \Sigma\}$  $s: S$  (the state of the sequential object)  $val: T \to Val^*$  (values of inputs) *out*:  $T \rightarrow Val$  (the value of the output)  $start(A): \forall t \in T : pc(t) = idle \wedge val(t) = \epsilon \wedge out(t) = \epsilon \wedge s = s_0$ 

<span id="page-7-1"></span>**Fig. 1.** Durable automaton  $A = \text{DurAur}(S)$  for  $S = (\Sigma, S, s_0, in, \rho)$ 

**Definition 4.** *An* IOA *is a labeled transition system* A *with*

- *a set of* states states(A)*,*
- *a set of* start *states* start(A) ⊆ states(A)*,*
- *a set of* actions acts(A)*, and*
- *a transition relation*  $trans(A) ⊆ states(A) × acts(A) × states(A)$  *(so that the actions label the transitions).*

The set acts(A) is partitioned into *internal* actions, internal(A) and *external* actions,  $external(A).<sup>1</sup>$  $external(A).<sup>1</sup>$  $external(A).<sup>1</sup>$  The internal actions represent events of the system that are not visible to the environment, whereas the external actions represent the automaton's interactions with its environment.

An *execution* of an IOA A is a sequence  $\sigma = s_0 a_1 s_1 a_2 s_2 a_3 \dots$  of alternating states and actions such that  $s_0 \in start(A)$  and for each i,  $(s_i, a_{i+1}, s_{i+1}) \in$  $trans(A)$ . A *reachable* state of A is a state appearing in an execution of A. An *invariant* of A is any superset of the reachable states of A (equivalently, any predicate satisfied by all reachable states of A). A *trace* of A is any sequence of (external) actions obtained by projecting onto the external actions of any execution of A. The set of traces of A,  $traces(A)$ , represents A's externally visible behaviour. If every trace of an automaton  $C$  is also a trace of an automaton  $A$ , then we say that C *implements* or *refines* A.

For an arbitrary sequential object S, we next construct a durable automaton  $DURAUT(S)$  (see Fig. [1\)](#page-7-1) whose traces are histories in  $DURLIN(S)$  only. This automaton can serve as a specification automaton in a refinement proof. The state of this automaton incorporates the state s of the sequential object S, plus for every thread  $t \in T$ :

<span id="page-7-0"></span><sup>&</sup>lt;sup>1</sup> In the standard IOA setting, external actions are further subdivided into input and output actions; this distinction is not needed for this current work.

- a program counter fixing whether the thread is still *idle*, is *ready* to be started, is *crashed* (i.e., has been active during a crash), or is currently executing an operation,
- possible input values of the thread's operations and a possible output value.

The transition relation of the automaton is  $-$  as usual  $-$  given in the form of pre- and postconditions of actions. For every operation *op* in the sequential object, the automaton has actions  $inv(op)$ ,  $do(op)$  and  $res(op)$ , where  $do(op)$ corresponds to execution of the abstract operation *op*, potentially changing the state of the sequential object. We use  $inv<sub>t</sub>(op, v)$  and  $res<sub>t</sub>(op, v)$  for  $inv(op)<sub>t</sub>(v)$ and  $res(\omega p)_t(v)$ , respectively. Any step of the implementation that refines  $d\omega(\omega p)$ is a step that persists the corresponding operation *op* (i.e., a *persistence point*, see Sect. [5\)](#page-9-0). Persistence points in durable linearizability are analogous to linearization points in linearizability [\[9](#page-15-6)]. Note that a thread may only invoke an operation if it is ready. We furthermore have a dedicated *crash* action that may be executed at any time that sets all active threads to *crashed*. To ensure that crashed threads are confined to a single era, we use a separate action run that enables idle threads to become ready. While inv(*op*), res(*op*) and *crash* are external actions, run and do(*op*) are internal.

<span id="page-8-0"></span>The theorem below ensures that traces of the durable automaton are the durably linearizable histories of S.

#### **Theorem 5.** *If* S *is a sequential object, then traces*( $DURAUT(S)$ ) ⊂  $DURLIN(S)$ *.*

*Proof.* Let  $\sigma = cs_0a_1cs_1 \dots a_ncs_n$  be an execution of DURAUT(S) and let  $cs_i.s$ , cs<sub>i</sub>.out etc. be the components of state  $cs_i$ . Let tr be the trace of  $\sigma$ . We construct the history  $h$  by making the following changes to  $tr$  (in this order).

**Completion** For every  $a_i$  being a do action  $d_{o_t}(op)$  in  $\sigma$  without matching  $res<sub>t</sub>(op)$ , we add  $res<sub>t</sub>(op, v)$  such that  $v = cs<sub>i</sub>.out(t)$  to the end of tr.

**Truncation** We remove all  $inv<sub>t</sub>(op, v)$  without matching response.

Next, we need to construct a legal sequential history  $h_S$  such that  $ops(h) \equiv h_S$ . Let  $i_1, \ldots, i_k$  be the indices of  $\sigma$  such that  $a_{i_j}$  is a do action  $d_{o_t}(op)$ . Then  $\rho(cs_{i_{j-1}}.s, op, v) = (cs_{i_j}.s, cs_{i_j}.out(t))$  by definition of the durable automaton. We set

$$
w_{i_j} = inv_t(op, v) res_t(op, cs_{i_j}.out(t)) .
$$

We let  $h_S = w_{i_1} \ldots w_{i_k}$  and  $h_S \in legal(\mathbb{S})$ .

Now assume  $e_1 \prec_h e_2$ . By definition,  $e_1 = res_t(op, v)$  and  $e_2 = inv_{t'}(op', v)$ for some  $t, t' \in T$ . Then  $e_1$  has not been added to the trace tr by completion since responses are added to the end. By construction of the durable automaton threads execute inv, do and res operations in this ordering only. Hence the execution  $\sigma$  contains an action  $do_t(op)$  prior to  $e_1$  and an action  $do_{t'}(op')$  following  $e_2$ . Hence  $e_1 \prec_{h_S} e_2$ .

In fact, we believe that the two sets in Theorem [5](#page-8-0) are equal. However, we do not need this property for our proof methodology.

<span id="page-9-2"></span>

**Fig. 2.** Possible state of persistent queue; volatile data represented using shading and volatile pointers represented using dashed arrows

## <span id="page-9-0"></span>**5 Correctness of the Persistent Queue**

In this section we present a formal verification of the persistent queue. In Section [5.1,](#page-9-1) we describe a model of the queue. In Sect. [5.2](#page-10-0) we describe the application of the refinement-based verification to this example, where we establish the relationship between an intermediate automaton and the durable automaton. Section [5.3](#page-11-0) describes the persistence points in the concrete implementation that are used in the proof, and Sect. [5.4](#page-12-0) describes the main invariants and abstraction relations.

#### <span id="page-9-1"></span>**5.1 Modelling the Persistent Queue**

To verify durable linearizability we need to model the persistent queue. The persistent queue contains two versions of each variable: one in VM and one in NVM. We model this in the automaton by two mappings  $ps, vs : Loc \rightarrow X$ , where  $Loc$  is a set of locations and  $X$  is a generic set that contains enqueued values, references, thread ids, etc. Mappings ps and vs represent the persistent and volatile states, respectively. A flush of location k updates the value of  $ps(k)$ to  $vs(k)$ , while recovery moves data from  $ps(k)$  to  $vs(k)$ . All other operations take place in  $vs(k)$ .

In order to help illustrate the structure of the queue, Fig. [2](#page-9-2) depicts a possible state of the persistent queue. Each node contains three values: a data value, a thread id (possibly −1, which is the initial value), and a next pointer. Variables phead and vhead are the persistent and volatile head pointers, respectively, and *vtail* is the volatile tail pointer. In the KIV model phead =  $ps(head)$ , *vhead* =  $vs(head)$ , etc. The values depicted by shading and the dashed arrows in the figure are volatile; in Fig. [2,](#page-9-2) these are the  $\text{degID}$  of node d and the next pointer of node  $f$ , as well as the volatile head and tail pointers. Here enqueues for nodes labelled  $a$  to  $f$  have all taken place and persisted, whereas node labelled  $g$  has been partly enqueued but not yet persisted. Nodes labelled a to c have been dequeued and persisted, but the node labelled  $d$  has been marked for dequeue, but not persisted. Here, the phead is lagging behind vhead; in an execution, phead may be lagging by an arbitrary amount as the flush of vhead is controlled by the system as opposed to an explicit flush statement in the program code.

The persistent contents of the queue (which we refer to as the *queue reference list*) corresponds to the abstract queue  $\langle d, e, f \rangle$ . In addition, our proof makes use



<span id="page-10-1"></span>**Fig. 3.** The intermediate automaton *IDQ*

of the so called old reference list, which are elements that had been persistently enqueued, but have also been persistently dequeued.

#### <span id="page-10-0"></span>**5.2 Refinement-Based Verification**

As outlined in Sect. [3,](#page-4-0) we verify durable linearizability by proving refinement between the implementation model and  $DURAUT(\mathbb{Q})$  using the IO automata formalism introduced in Sect. [4.](#page-6-0) Refinement can be proven via *forward* or *backward simulations* [\[19\]](#page-16-1); such simulations allow a step-by-step comparison between the operations using an abstraction relation. In our proof, we establish a backward simulation between the intermediate automaton and  $DurAur(Q)$  as well as a forward simulation between the implementation of the persistent queue and the intermediate automaton. The proof uses an intermediate automaton that resolves non-determinism at the abstract level as used in existing proofs of MSQ [\[8](#page-15-7)[,9](#page-15-6)]. Since refinement guarantees trace inclusion, this is sufficient to show that the persistent queue is durably linearizable.

The intermediate automaton *IDQ*, presented in Fig. [3,](#page-10-1) is similar to the durable automaton for the queue datatype,  $DURAUT(\mathbb{Q})$  (see Fig. [1](#page-7-1) instanti-ated for the queue from Example [2\)](#page-5-0). As with  $DurAur(\mathbb{Q})$ , it has variables pc, *val* and *out*, which play the same role, and variable  $q$  instantiates the state  $s$ . Furthermore, all its actions except for *checkEmp* are also actions of  $DurAur(\mathbb{Q})$ , and have essentially the same effect. For *IDQ* we get the following property[2](#page-11-1).

## **Theorem 6.**  $traces(IDQ) \subseteq traces(DURAUT(Q))$ .

The additional features of *IDQ* exist to model a behaviour where a dequeue thread first *observes* that the queue is empty, and later decides to return empty, at a point when the queue may no longer *be* empty. The observation is modelled by a *checkEmp<sup>t</sup>* action, which records in the *obsEmp<sup>t</sup>* variable the fact that the queue was empty during the execution of t's dequeue operation. In this automaton, it is possible for a thread t to execute a  $dof(Deq)$  transition and set the output value to empty whenever  $obsEmp(t)$  has been set to *true*. We note that the queue may not actually be empty when this transition takes place, but this does not affect soundness of the proof method since  $obsEmp(t)$  being set to true indicates that the queue has been empty at *some point* during the operation's execution. Further details of this technique, in the context of linearizability, may be found in  $[4,8,9]$  $[4,8,9]$  $[4,8,9]$  $[4,8,9]$ .

## <span id="page-11-0"></span>**5.3 Identification of Persistence Points**

To match executions of the concrete implementation with the abstract level, we must identify the *persistence points* of the implementation, which are atomic events whose execution causes the effect of the corresponding operations to take effect at the abstract level. These are analogous to standard linearizability, where proofs proceed via identification of *linearization points* [\[9\]](#page-15-6). In durable linearizability, persistence points are typically statements (flush events) that cause the operation under consideration to become durable. Thus these statements must be simulated by the abstract do operation. Note that persistent points must occur after an operation has taken effect in NVM, but before the operation returns.

In MSQ, the enqueue operation linearizes upon successful execution of the CAS at line 9. However, in the persistent queue, this line is not the persistence point of the operation, rather it is the first operation that *flushes the effect of this CAS*, i.e., the first flush of the next pointer to the enqueued element. This may occur in line 10 of the same thread, line 14 executed by another thread, or due to a system-controlled flush. Despite there being several possible choices for the persistence point, it is possible to prove forward simulation with respect to the do(Enq) operation of the intermediate automaton *IDQ*.

The verification of the empty dequeue follows a similar pattern to the verification of the empty dequeue of MSQ. The persistence point is conditional on the future execution of the operation, thus we refer to the persistence point as a *potential persistence point* (this is similar to the concept of potential linearization points  $[4,8,9]$  $[4,8,9]$  $[4,8,9]$  $[4,8,9]$ . The empty dequeue potentially takes effect at line 9 if the value

<span id="page-11-1"></span> $2$  For the proof, see the full paper at [\[17](#page-15-4)].

loaded for  $n \times t$  is null, but this decision is not resolved until later in the operation (line 12). Using the intermediate automaton (Fig. [3\)](#page-10-1), it allows the proof to proceed via forward simulation, like earlier proofs of linearizability [\[8,](#page-15-7)[9\]](#page-15-6).

A non-empty dequeue linearizes in VM when the node that is dequeued is marked for deletion by updating its deqID field at line 20. Like the enqueue, the persistence point of the dequeue is the first flush of this deqID field. This may occur either at line 21 of the same thread, line 29 of another thread, or a system flush. Again, we show that each of these steps simulates the  $do(Deq)$  operation of the intermediate automaton.

## <span id="page-12-0"></span>**5.4 Key Invariants and Abstraction Relation**

There are several key properties that the persistent queue must maintain in order to ensure correctness. These are formalised as invariants in our proof. Here we describe them in plain English:

- 1. We keep track of two sublists: *old reference list*, which are elements that have been dequeued, and *queue reference list*, which are elements that form the current queue. Formalising the structure of these lists and ensuring global correctness of the invariant is one of the most difficult parts of the proof. This is particularly true for steps that correspond to persistence points (see below) since the volatile pointers can be "lagging" immediately after persistence.
- 2. All nodes of the queue must be reachable in NVM (i.e.,  $ps$ ) from phead. This means that the nodes including the next pointers must be made persistent prior to inserting a new node.
- 3. All nodes in the old reference list must have a deqID field different from <sup>−</sup><sup>1</sup> in ps, indicating that they have been dequeued.
- 4. All nodes in the queue reference list must have a deqID field value  $-1$  in ps.
- 5. Only the first node in queue reference list may have a deqID field value different from  $-1$  in vs.
- 6. Pointers phead and ptail may be lagging behind vhead and vtail, respectively. However, vhead may not overtake vtail.

We now describe the state of the queue after execution of some key steps of the algorithm.

*Dequeue Persistence Point.* A node is considered to be dequeued if its logical deletion is flushed, i.e., the deqID marked by a thread id is flushed. For the queue depicted in Fig. [2,](#page-9-2) the queue immediately after the volatile deqID of node  $d$  is flushed is as follows.



The abstract queue corresponding to this queue is  $\langle e, f \rangle$ . Note that in the queue above, vhead pointer is now lagging and must be updated to point to the new sentinel node d.

*Enqueue Persistence Point.* A node is considered to be enqueued if it can be reached from *phead* in NVM, and its deqID field in persistent memory is  $-1$ . Consider again the queue in Fig. [2.](#page-9-2) The queue immediately after the next pointer of f becomes persistent is as follows.



Note that this transformation must be performed before moving *vtail*, otherwise the nodes after  $g$  could be lost upon system crash. In the queue above *vtail* is lagging, and hence, must be updated before a new node can be enqueued. As soon as the next pointer of  $f$  becomes persistent, the node  $g$  is considered to be part of the queue, i.e., the abstract queue corresponding to the queue above is  $\langle d, e, f, q \rangle$ .

*Crash and Recovery.* Finally, consider the queue in Fig. [2](#page-9-2) after a crash and recovery:



The volatile deqID value for node  $d$  is restored from persistent memory, but the node  $g$  is lost.

*Abstraction Relation and Mechanisation in KIV.* These invariants enable us to prove a refinement between the implementation and *IDQ* in Fig. [3.](#page-10-1) The main part of the abstraction relation states that the abstract queue corresponds to values in the queue reference list. For an enqueue, the first flush that persists the next pointer (i.e., the effect of line 9) must match  $dof(ap)$  with  $op = Eng$ . For a non-empty dequeue, the first flush that persists deqID must match  $dof(bp)$  with  $op = Deg$  in Fig. [3.](#page-10-1) An empty dequeue must match *checkEmp<sub>t</sub>* when it loads nxt at line 9 and  $dof(Deq)$  if the test at line 12 succeeds.

This refinement has been interactively, mechanically proven in the KIV theorem prover [\[10](#page-15-3)] (see [\[17\]](#page-15-4) for the KIV proof and the encodings), which has been used extensively in the verification of concurrent data structures (e.g.,  $[4,24]$  $[4,24]$  $[4,24]$ ). The proof of the invariant in KIV is simplified via the use of a *rely condition* [\[15](#page-15-9)] that captures interference from a thread's environment in an abstract manner. Roughly speaking, a rely condition is a relation over the states of an automaton that must preserve the invariant of each transaction, and that must abstract the

transitions of each transaction. Similar techniques have been used in previous proofs of concurrent algorithms [\[6\]](#page-15-10).

## **6 Conclusion**

There are numerous approaches to proving (standard) linearizability of concurrent data structures (e.g., [\[1,](#page-14-0)[24](#page-16-2)[,27](#page-16-3)]; see [\[9\]](#page-15-6) for an overview), including specialisations to cope with weak memory models (e.g.,  $[2,5,7,22,25,26]$  $[2,5,7,22,25,26]$  $[2,5,7,22,25,26]$  $[2,5,7,22,25,26]$  $[2,5,7,22,25,26]$  $[2,5,7,22,25,26]$  $[2,5,7,22,25,26]$  $[2,5,7,22,25,26]$ ). The recent development of NVM has been accompanied by persistent versions of well-known concurrent constructs, including concurrent objects [\[3,](#page-15-13)[11\]](#page-15-0), synchronisation primitives [\[13](#page-15-14)[,21](#page-16-7)] and transactional memory [\[16\]](#page-15-15). This paper has focussed on a persistent queue [\[11](#page-15-0)], against the recently developed notion of durable linearizability [\[14](#page-15-1)].

Development of objects implemented for NVM presents a similar challenge to weak memory, in the sense that there are multiples levels of memory to consider. Moreover, caches and registers are volatile, while cache flush instructions allow reordering with store instructions in accordance with the memory model of the system (e.g., [\[23](#page-16-8)]). Correctness in the presence of crashes and recovery can be affected by the order in which elements are persisted, which necessitates the use of programmer-controlled flush operations, increasing complexity. Unfortunately, proofs of correctness (e.g., of durable linearizability) are either given informally or are entirely lacking. This gives little confidence in the correctness of the underlying persistent objects.

Verification of persistent memory algorithms is inherently more complex than in the standard setting. Since an operation only takes effect after a flush event, helping is inevitably required to bring the data structure into a consistent state and for an operation to take effect. For proofs by refinement, these additional helping steps have to be considered in the simulation proof. This ultimately complicates the invariant since the helping is performed by another thread (including a system thread). Moreover, since the state of the data structure can be "lagging" immediately after helping is performed, precisely formalising the underlying helping mechanism further complicates the invariant. Future work will consider how best to manage this additional proof complexity.

**Acknowledgements.** We thank Lindsay Groves for comments that have helped improve this paper.

### **References**

- <span id="page-14-0"></span>1. Abdulla, P.A., Haziza, F., Holík, L., Jonsson, B., Rezine, A.: An integrated specification and verification technique for highly concurrent data structures. In: Piterman, N., Smolka, S.A. (eds.) TACAS 2013. LNCS, vol. 7795, pp. 324–338. Springer, Heidelberg (2013). [https://doi.org/10.1007/978-3-642-36742-7](https://doi.org/10.1007/978-3-642-36742-7_23) 23
- <span id="page-14-1"></span>2. Batty, M., Dodds, M., Gotsman, A.: Library abstraction for  $C/C++$  concurrency. In: Giacobazzi, R., Cousot, R. (eds.) Symposium on Principles of Programming Languages, POPL, pp. 235–248. ACM (2013). [https://doi.org/10.1145/2429069.](https://doi.org/10.1145/2429069.2429099) [2429099](https://doi.org/10.1145/2429069.2429099)
- <span id="page-15-13"></span>3. Cohen, N., Aksun, D.T., Larus, J.R.: Object-oriented recovery for non-volatile memory. PACMPL **2**(OOPSLA), 153:1–153:22 (2018)
- <span id="page-15-8"></span>4. Derrick, J., Schellhorn, G., Wehrheim, H.: Verifying linearisability with potential linearisation points. In: Butler, M., Schulte, W. (eds.) FM 2011. LNCS, vol. 6664, pp. 323–337. Springer, Heidelberg (2011). [https://doi.org/10.1007/978-3-](https://doi.org/10.1007/978-3-642-21437-0_25) [642-21437-0](https://doi.org/10.1007/978-3-642-21437-0_25) 25
- <span id="page-15-11"></span>5. Derrick, J., Smith, G., Groves, L., Dongol, B.: A proof method for linearizability on TSO architectures. In: Hinchey, M.G., Bowen, J.P., Olderog, E.-R. (eds.) Provably Correct Systems. NMSSE, pp. 61–91. Springer, Cham (2017). [https://doi.org/10.](https://doi.org/10.1007/978-3-319-48628-4_4) [1007/978-3-319-48628-4](https://doi.org/10.1007/978-3-319-48628-4_4) 4
- <span id="page-15-10"></span>6. Doherty, S., Dongol, B., Derrick, J., Schellhorn, G., Wehrheim, H.: Proving opacity of a pessimistic STM. In: OPODIS, LIPIcs, vol. 70, pp. 35:1–35:17. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2016)
- <span id="page-15-12"></span>7. Doherty, S., Dongol, B., Wehrheim, H., Derrick, J.: Making linearizability compositional for partially ordered executions. In: Furia, C.A., Winter, K. (eds.) IFM 2018. LNCS, vol. 11023, pp. 110–129. Springer, Cham (2018). [https://doi.org/10.](https://doi.org/10.1007/978-3-319-98938-9_7) [1007/978-3-319-98938-9](https://doi.org/10.1007/978-3-319-98938-9_7) 7
- <span id="page-15-7"></span>8. Doherty, S., Groves, L., Luchangco, V., Moir, M.: Formal verification of a practical lock-free queue algorithm. In: de Frutos-Escrig, D., Núñez, M. (eds.) FORTE 2004. LNCS, vol. 3235, pp. 97–114. Springer, Heidelberg (2004). [https://doi.org/10.1007/](https://doi.org/10.1007/978-3-540-30232-2_7) [978-3-540-30232-2](https://doi.org/10.1007/978-3-540-30232-2_7) 7
- <span id="page-15-6"></span>9. Dongol, B., Derrick, J.: Verifying linearisability: a comparative survey. ACM Comput. Surv. **48**(2), 19:1–19:43 (2015). <https://doi.org/10.1145/2796550>
- <span id="page-15-3"></span>10. Ernst, G., Pfähler, J., Schellhorn, G., Haneberg, D., Reif, W.: KIV-overview and verifythis competition. Softw. Tools Technol. Transf. (STTT) **17**(6), 677–694 (2015)
- <span id="page-15-0"></span>11. Friedman, M., Herlihy, M., Marathe, V.J., Petrank, E.: A persistent lock-free queue for non-volatile memory. In: Krall, A., Gross, T.R. (eds.) ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP, pp. 28–40. ACM (2018). <https://doi.org/10.1145/3178487.3178490>
- <span id="page-15-2"></span>12. Herlihy, M., Wing, J.M.: Linearizability: a correctness condition for concurrent objects. ACM TOPLAS **12**(3), 463–492 (1990)
- <span id="page-15-14"></span>13. Huang, Y., Pavlovic, M., Marathe, V.J., Seltzer, M., Harris, T., Byan, S.: Closing the performance gap between volatile and persistent key-value stores using crossreferencing logs. In: USENIX Annual Technical Conference, pp. 967–979. USENIX Association (2018)
- <span id="page-15-1"></span>14. Izraelevitz, J., Mendes, H., Scott, M.L.: Linearizability of persistent memory objects under a full-system-crash failure model. In: Gavoille, C., Ilcinkas, D. (eds.) DISC 2016. LNCS, vol. 9888, pp. 313–327. Springer, Heidelberg (2016). [https://](https://doi.org/10.1007/978-3-662-53426-7_23) [doi.org/10.1007/978-3-662-53426-7](https://doi.org/10.1007/978-3-662-53426-7_23) 23
- <span id="page-15-9"></span>15. Jones, C.B.: Tentative steps toward a development method for interfering programs. ACM Trans. Program. Lang. Syst. **5**(4), 596–619 (1983). [https://doi.org/](https://doi.org/10.1145/69575.69577) [10.1145/69575.69577](https://doi.org/10.1145/69575.69577)
- <span id="page-15-15"></span>16. Joshi, A., Nagarajan, V., Cintra, M., Viglas, S.: DHTM: durable hardware transactional memory. In: ISCA, pp. 452–465. IEEE Computer Society (2018)
- <span id="page-15-4"></span>17. KIV proofs for the durable linearizable queue (2019). [http://www.informatik.uni](http://www.informatik.uni-augsburg.de/swt/projects/Durable-Queue.html)[augsburg.de/swt/projects/Durable-Queue.html](http://www.informatik.uni-augsburg.de/swt/projects/Durable-Queue.html)
- <span id="page-15-5"></span>18. Lynch, N.A., Tuttle, M.R.: Hierarchical correctness proofs for distributed algorithms. In: PODC, pp. 137–151. ACM, New York (1987). [https://doi.org/10.1145/](https://doi.org/10.1145/41840.41852) [41840.41852](https://doi.org/10.1145/41840.41852)
- <span id="page-16-1"></span>19. Lynch, N., Vaandrager, F.W.: Forward and backward simulations part I: untimed systems. Inf. Comput. Inf. Control - IANDC **121**, 214–233 (1995). [https://doi.org/](https://doi.org/10.1006/inco.1995.1134) [10.1006/inco.1995.1134](https://doi.org/10.1006/inco.1995.1134)
- <span id="page-16-0"></span>20. Michael, M.M., Scott, M.L.: Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In: Proceedings of 15th ACM Symposium on Principles of Distributed Computing, pp. 267–275 (1996)
- <span id="page-16-7"></span>21. Pavlovic, M., Kogan, A., Marathe, V.J., Harris, T.: Brief announcement: persistent multi-word compare-and-swap, In: PODC, pp. 37–39. ACM (2018)
- <span id="page-16-4"></span>22. Raad, A., Doko, M., Rozic, L., Lahav, O., Vafeiadis, V.: On library correctness under weak memory consistency: specifying and verifying concurrent libraries under declarative consistency models. PACMPL **3**(POPL), 68:1–68:31 (2019). <https://dl.acm.org/citation.cfm?id=3290381>
- <span id="page-16-8"></span>23. Raad, A., Vafeiadis, V.: Persistence semantics for weak memory: integrating epoch persistency with the TSO memory model. PACMPL **2**(OOPSLA), 137:1–137:27 (2018)
- <span id="page-16-2"></span>24. Schellhorn, G., Derrick, J., Wehrheim, H.: A sound and complete proof technique for linearizability of concurrent data structures. ACM Trans. Comput. Logic **15**(4), 31:1–31:37 (2014). <https://doi.org/10.1145/2629496>
- <span id="page-16-5"></span>25. Travkin, O., Mütze, A., Wehrheim, H.: SPIN as a linearizability checker under weak memory models. In: Bertacco, V., Legay, A. (eds.) HVC 2013. LNCS, vol. 8244, pp. 311–326. Springer, Cham (2013). [https://doi.org/10.1007/978-3-319-03077-7](https://doi.org/10.1007/978-3-319-03077-7_21) 21
- <span id="page-16-6"></span>26. Travkin, O., Wehrheim, H.: Verification of concurrent programs on weak memory models. In: Sampaio, A., Wang, F. (eds.) ICTAC 2016. LNCS, vol. 9965, pp. 3–24. Springer, Cham (2016). [https://doi.org/10.1007/978-3-319-46750-4](https://doi.org/10.1007/978-3-319-46750-4_1) 1
- <span id="page-16-3"></span>27. Vafeiadis, V.: Automatically proving linearizability. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 450–464. Springer, Heidelberg (2010). [https://doi.org/10.1007/978-3-642-14295-6](https://doi.org/10.1007/978-3-642-14295-6_40) 40