1 Introduction

Non-volatile memory (NVM) technologies, e.g., Intel Optane, enable byte-addressable accesses, as DRAM does, while retaining the benefits of persistent storage. NVM has the potential to radically impact future systems, since these can be designed to efficiently recover from a system-wide crash. However, NVM also introduces new programming challenges and requires previous notions of correctness to be reconsidered. Such challenges are particularly acute for concurrent programs, where one must additionally understand the interaction between persistency (concerning the order in which memory updates are persisted) and weak memory consistency (concerning the order in which memory updates in one thread become visible to other threads).

There has been widespread interest in NVM, with several works characterising its semantics in the context of hardware weak memory models [12, 57, 59, 60]. Alongside these low-level semantics, a separate line of work has focussed on adapting correctness conditions such as linearisability [34] and opacity [33], obtaining corresponding conditions such as durable linearisability [38] and durable opacity [5]. Such conditions provide a basis for developing high-level synchronisation mechanisms such as concurrent objects, in the case of (durable) linearisability, and transactional memory, in the case of (durable) opacity.

Our work brings these two lines of work together in the context of recoverable concurrent transactional memory [14, 39, 41, 47]. In particular, we develop \(\text {dTML}_{\text {Px86}} \), an adaptation of the durable Transactional Mutex Lock (dTML) algorithm [5], which is itself a durable extension of the Transactional Mutex Lock [15] with logging mechanisms that support recoverability. The dTML algorithm was designed for PSC [44], a strong memory model that extends sequential consistency (SC) [48] with persistency. However, PSC is unrealistic for modern architectures (e.g., Intel x86), which provide only weak memory guarantees.

1.1 Designing, modelling and verifying \({dTML}_{{Px86}}\)

Unlike prior works, as the name implies, \(\text {dTML}_{\text {Px86}} \) assumes Intel’s x86 persistency and consistency model (\(\text {Px86} \)) [37], which extends the x86 Total Store Order (TSO) model [64] with a persistency semantics [12, 44, 57, 59, 60]. Here, like in TSO, each write is first cached in a local FIFO store buffer (and only visible to the writing thread), then later propagated to the volatile shared memory (whereby it becomes visible to other threads). Writes in the volatile shared memory are later persisted by propagating them to NVM.

In \(\text {Px86} \), the order in which writes become persistent may differ from the order in which they were issued. To address this, \(\text {Px86} \) provides instructions, e.g., \({{\textbf {flush}}}\) and \({{\textbf {flush}}}_{\text {opt}}\), that explicitly flush locations to NVM, ensuring that the corresponding locations are persisted. The \({{\textbf {flush}}}\) instruction flushes a location synchronously, blocking the executing thread until the prior write has been persisted. The optimised flush instruction (\({{\textbf {flush}}}_{\text {opt}}\)) also flushes a single location, but asynchronously (without blocking the execution of the corresponding thread). A \({{\textbf {flush}}}_{\text {opt}}\) instruction is not ordered with respect to any following write, \({{\textbf {flush}}}_{\text {opt}}\), or \({{\textbf {flush}}}\) instruction applied to a different location [57, Fig. 3], and only serves to tag locations that are to be persisted later. As a result, the execution of a \({{\textbf {flush}}}_{\text {opt}}\) on an address x does not provide any guarantee about the value of x in persistent memory. To restrict the additional weak behaviours that \({{\textbf {flush}}}_{\text {opt}}\) introduces, \(\text {Px86} \) provides a store fence (\({{\textbf {sfence}}} \)) instruction that orders store instructions with \({{\textbf {flush}}}_{\text {opt}}\). A \({{\textbf {flush}}}_{\text {opt}}\) instruction is guaranteed to have taken effect (i.e., the contents of the given location have reached persistent memory) once a following \({{\textbf {sfence}}} \) instruction is executed.

\(\text {dTML}_{\text {Px86}}\) is designed to make use of \({{\textbf {flush}}}_{\text {opt}}\) instructions for efficiency. However, this introduces new verification challenges. Namely, \({{\textbf {flush}}}_{\text {opt}}\) instructions are difficult to reason about and, in fact, some earlier logics [58] only provided partial support for \({{\textbf {flush}}}_{\text {opt}}\) instructions, requiring a program with \({{\textbf {flush}}}_{\text {opt}}\) instructions to be transformed into a program with \({{\textbf {flush}}}\) instructions only. This transformation technique was known to be incomplete [58]. Full support for \({{\textbf {flush}}}_{\text {opt}}\) was only provided after the development of the view-based semantics of \(\text {Px86} \) (which we call \(\text {Px86}_{\textrm{view}} \)) [12] and a corresponding Owicki–Gries logic [8]. Our proofs for \(\text {dTML}_{\text {Px86}}\) represent the first large-scale proofs of correctness for a realistic program that uses \({{\textbf {flush}}}_{\text {opt}}\) instructions.

We aim to achieve full operational proofs of correctness; we therefore build on the aforementioned \(\text {Px86}_{\textrm{view}} \) semantics [12] and Owicki–Gries logic [8]. However, this logic cannot be used directly in our current work, for two reasons.

  1. (1)

    Like prior works on verifying \(\text {Px86} \) programs [12, 58], Bila et al. [8] focussed only on reasoning about the behaviour up to the first crash of the program. To fully establish correctness of \(\text {dTML}_{\text {Px86}}\), it is critical to also reason about the program after the system restarts.

  2. (2)

    The assertions of the Pierogi logic defined by Bila et al. [8] are inadequate for reasoning about certain phenomena that occur in \(\text {dTML}_{\text {Px86}} \). In particular, we must often reason about memory patterns by considering the order in which writes occur.

One of our contributions is an extension of the view-based semantics [12] as well as the associated logic [8] to enable reasoning about program recovery (after a crash), as well as new assertions that enable reasoning about the last writes to a location.

Our correctness proof of \(\text {dTML}_{\text {Px86}}\) uses forward simulation to establish a refinement with respect to an abstract operational specification (called \(\text {dTMS2} \) [5]). This is, to our knowledge, the first operational proof of refinement for \(\text {Px86} \). Other works have used refinement to verify durable linearisability directly under the declarative \(\text {Px86} \) model [24, 56]; unlike our work, however, these are not accompanied by any mechanisation. Dalvandi and Dongol [16] have considered operational refinement proofs of transactional memory algorithms under the RC11 memory model, with full mechanisation in Isabelle/HOL. Their proofs involve a different set of complexities (due to relaxed and release-acquire accesses), but do not require consideration of durability or recovery as ours do for \(\text {dTML}_{\text {Px86}}\).

1.2 Contributions

This paper comprises the following main contributions.

  1. (1)

    We develop a durable transactional memory \(\text {dTML}_{\text {Px86}} \) that guarantees durable opacity under \(\text {Px86}_{\textrm{view}} \) (and hence Px86). As mentioned above, \(\text {dTML}_{\text {Px86}}\) makes use of \({{\textbf {flush}}}_{\text {opt}}\) instructions for improved efficiency, which increases the verification challenge.

  2. (2)

    We develop an extension of the \(\text {Px86}_{\textrm{view}} \) semantics to enable operational reasoning about the behaviour of a program after a crash, i.e., the recovery and subsequent execution. This is coupled with an extended Owicki–Gries logic that is also capable of reasoning about recovery steps.

  3. (3)

    To take advantage of our operational reasoning technique, we apply a simulation-based proof to show correctness of \(\text {dTML}_{\text {Px86}} \) by refinement. The proof proceeds via the long-established technique of establishing a forward simulation between the implementation and an abstract specification [5, 18, 22]. In the context of transactional memory, we prove that \(\text {dTML}_{\text {Px86}} \) is a refinement of an operational model, dTMS2 [5], whose traces are guaranteed to be durably opaque.

  4. (4)

    We mechanise our entire development in Isabelle/HOL, covering the semantics, the logic (including soundness of the atomic Hoare triples), and all proofs pertaining to \(\text {dTML}_{\text {Px86}} \), including the invariant and simulation proofs.

1.3 Supplementary material

The Isabelle/HOL development accompanying this paper is available at [7].

1.4 Overview

This paper is organised as follows. In Sect. 2, we provide some background and further motivation for our work, and in Sect. 3, we recap durable opacity as well as an operational model that guarantees durable opacity. In Sect. 4, we present a view-based operational model for \(\text {Px86} \), including our extensions that model recovery after a crash. We present our extended \(\text {dTML}_{\text {Px86}}\) algorithm in Sect. 5. We present the Owicki–Gries proof technique (further extended to cope with recovery) and the invariants of \(\text {dTML}_{\text {Px86}}\) in Sect. 6. In Sect. 7 we present the durable opacity proof of \(\text {dTML}_{\text {Px86}}\) and in Sect. 8 we discuss related work.

2 Background and motivation

In this section, we provide some basic high-level background and general motivation for our work.

2.1 Px86 semantics

To illustrate the behaviours of different persistent memory instructions, we use three examples (see Fig. 1) by Raad et al. [59], which demonstrate the behaviour of \({{\textbf {flush}}}_{\text {opt}}\) instructions. The assertion at the end of each program expresses a persistent invariant [8], i.e., a property of the persistent memory state should the corresponding program crash.

The program in Fig. 1a first writes the value 1 to location x, then issues an optimised flush instruction on x. Finally, it writes the value 1 to location y. During its execution, both 0 and 1 are possible values for both locations x and y in persistent memory. This is because \({{\textbf {flush}}}_{\text {opt}}\ x\) by itself does not guarantee that x is persisted before \({{\textbf {store}}}\,\,y\,\, 1\) is executed. In fact, after \({{\textbf {store}}}\,\,y\,\, 1\) is executed, y may be set to 1 in persistent memory before x is set to 1.

To prevent such reorderings between optimised flushes and later instructions, one can use the \({{\textbf {sfence}}} \) (store fence) instruction mentioned above; alternatively, one can use read-modify-write (RMW) instructions such as compare-and-swap (CAS) or fetch-and-add (FAA). As illustrated in Fig. 1b, adding an sfence instruction before \({{\textbf {store}}}\,\,y\,\, 1\) prevents the \({{\textbf {flush}}}_{\text {opt}}\ x\) from being reordered after it. Thus, if the persistent value of y is 1, then \({{\textbf {store}}}\,\,y\,\, 1\) must have been executed, and hence \({{\textbf {sfence}}} \) must also have been executed, which means that the persistent value of x is 1.
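
To make the persistency guarantees of Fig. 1a and b concrete, the following Python sketch (our own illustration, not the formal \(\text {Px86} \) model) enumerates the location values that may survive a crash. It uses a deliberately simplified single-thread persist model: any subset of the issued stores may have persisted at a crash, subject to per-location FIFO order and the constraint that a store after an \({{\textbf {sfence}}} \) cannot persist before stores covered by an earlier \({{\textbf {flush}}}_{\text {opt}}\).

```python
from itertools import combinations

# The programs of Fig. 1a and 1b as instruction lists (our encoding).
prog_a = [('store', 'x', 1), ('flushopt', 'x'), ('store', 'y', 1)]
prog_b = [('store', 'x', 1), ('flushopt', 'x'), ('sfence',), ('store', 'y', 1)]

def persist_order(prog):
    """Pairs (i, j): store i must persist before store j, because some
    sfence between them follows a flush_opt on store i's location."""
    pairs = set()
    for j, instr in enumerate(prog):
        if instr[0] != 'store':
            continue
        for s in range(j):                      # sfence before store j
            if prog[s][0] != 'sfence':
                continue
            for f in range(s):                  # flush_opt before the sfence
                if prog[f][0] != 'flushopt':
                    continue
                for i in range(f):              # earlier store to that location
                    if prog[i][0] == 'store' and prog[i][1] == prog[f][1]:
                        pairs.add((i, j))
    return pairs

def crash_states(prog, locs):
    """All location-value maps that may be in NVM when the program crashes."""
    stores = [i for i, e in enumerate(prog) if e[0] == 'store']
    order = persist_order(prog)
    states = set()
    for r in range(len(stores) + 1):
        for subset in combinations(stores, r):
            s = set(subset)
            # the persisted set must be closed under the persist order ...
            ok = all(i in s for (i, j) in order if j in s)
            # ... and respect per-location FIFO order of stores
            ok = ok and all(i in s for j in s for i in stores
                            if i < j and prog[i][1] == prog[j][1])
            if ok:
                mem = {l: 0 for l in locs}      # all locations start at 0
                for i in sorted(s):
                    mem[prog[i][1]] = prog[i][2]
                states.add(tuple(sorted(mem.items())))
    return states
```

For prog_a the sketch yields all four combinations of x and y in NVM, whereas for prog_b the state with y persisted as 1 but x still 0 is excluded, matching the discussion above.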

The program in Fig. 1c constitutes a message passing example. As in TSO, loading the value 1 for y (stored in register r) in the second thread indicates that the store of value 1 at y from the first thread has already been evicted from its local store buffer. Since the store of value 1 to x precedes the store to y in the first thread, this means that the write to x has also been evicted from the first thread’s store buffer and is therefore visible to the second thread. The \({{\textbf {flush}}}_{\text {opt}}\ x\) instruction in the second thread cannot be reordered before the preceding load. Hence, when the \({{\textbf {flush}}}_{\text {opt}}\ x\) is executed, the value of x seen by the second thread must be 1. Moreover, after the \({{\textbf {sfence}}} \) is executed, we can be sure that the value of x in persistent memory is 1. Thus, if the persistent value of z is 1 (meaning that the store to z has been executed), then the persistent value of x is also 1.

Fig. 1

Example Px86 programs by Raad et al. [59] where the assertion defines the possible persisted values during the execution. In all examples x, y, z are distinct locations with initial value 0, and r is a (thread-local) register.

2.2 Implementation challenges under Px86

Transactional memory (TM) aims to simplify concurrent programming by executing operations (loads, stores) within a transaction with an illusion of atomicity. That is, all changes to data inside a transaction are performed as if they were a single operation. Transactions also execute in an all-or-nothing manner—either all operations occur (i.e., the corresponding transaction commits), or none occur (i.e., the corresponding transaction aborts). We aim to develop a TM algorithm that ensures durable opacity [5], which we discuss in §3.

There are two main challenges when developing durable TM algorithms under weak memory models such as \(\text {Px86} \).

  1. (1)

    The first challenge concerns thread synchronisation. In a weak memory context, a read of a shared location may return a stale value, i.e., a value that is not the location’s last written value. To address this, we must use instructions with strong ordering guarantees (e.g., CAS) at key points within \(\text {dTML}_{\text {Px86}}\) to prevent transactions from reading stale values.

  2. (2)

    The second challenge concerns durability. Without correct placement of explicit flush instructions and the careful design of a recovery mechanism, there is no guarantee of correctness after a system crash. To tackle this, we must strategically position \({{\textbf {flush}}}_{\text {opt}}\) and \({{\textbf {sfence}}} \) instructions in a way that does not compromise the algorithm’s efficiency. We must also design a recovery process that enables the state to be reset to a consistent state after a crash.

3 Durable opacity

Opacity has been extensively covered in the literature [2, 3, 17, 21, 33, 49], while the formal definition of durable opacity may be found in [5, 6]. We provide these formal definitions in Sect. C, and explain the key concepts here through examples (Sect. 3.1). Formally, we only require an operational characterisation of durable opacity called \(\text {dTMS2} \) [5, 6], which we present as an input/output automaton in Sect. 3.2. dTMS2 has been used in prior proofs of durable opacity [5, 6, 23], including recent model checking encodings under Px86 [61].

Note that in this paper, for simplicity, we conflate threads and transactions, i.e., each thread is assumed to execute at most one transaction. This restriction can easily be lifted, but at the cost of additional notational overhead [16], whereby we explicitly track the transaction executed by each thread in a special state variable. In the following sections, we often use the terms thread and transaction interchangeably.

3.1 Opacity and durable opacity

The discussion and example below is adapted from our earlier work [23].

Correctness conditions for TM are defined in terms of histories of externally visible events, which are the external calls (invocations) and returns (responses) of TM operations. Typically, we have a pair of events for operations TMBegin, TMRead, TMWrite and TMCommit, noting that an operation call may return with an abort.

A concurrent history comprises an interleaving of (external) events from the different operations executed by different transactions. Each history is assumed to be well formed, i.e., the history, when restricted to a single transaction, starts with a TMBegin, possibly followed by a number of TMRead and TMWrite operations, possibly followed by a TMCommit operation (see Fig. 2). Moreover, each operation executed by a transaction must have responded before the next operation is invoked.
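
The per-transaction well-formedness condition above can be captured by a simple pattern. The following sketch (our own encoding; abort responses are elided for brevity) checks a single transaction's projected history:

```python
import re

def well_formed(ops):
    """ops: the operation names of one transaction's history, in order.
    Pattern: TMBegin, then any number of TMRead/TMWrite, then an
    optional TMCommit."""
    pattern = r"TMBegin(?: (?:TMRead|TMWrite))*(?: TMCommit)?"
    return re.fullmatch(pattern, " ".join(ops)) is not None
```

For instance, a history consisting of just a TMBegin is well formed (a live transaction), while one that begins with a TMRead, or continues after a TMCommit, is not.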

A transaction is complete in a history if it has responded with TMCommit(ok) or an abort event, and once completed, the transaction must not execute any further operations. However, a transaction within a history may not be complete, i.e., may be a live transaction.

TM implementations are typically designed to be serialisable, i.e., there is a total order of committed transactions that is consistent with a sequential history. The TM implementations of interest in this paper in fact guarantee strict serialisability, which means that the total order of operations must additionally respect the real-time order, i.e., if transaction \(t_1\) commits before transaction \(t_2\) starts, then \(t_1\) must serialise before \(t_2\). Concurrent (i.e., overlapping) transactions may, however, be serialised in any order. TM implementations also typically provide a semantics for live and aborted transactions. A well-studied condition here is opacity [33], which ensures that there exists a total order across all transactions so that the committed transactions are strictly serialised and the aborted transactions are consistent with the serialisation order.

While the above provides semantics for transaction consistency, under NVM, we also require a further guarantee of failure atomicity. To this end, we follow the notion of durable opacity [5], where all transactions committed before a crash are persistent (after the crash), and in addition, the effects of any partially executed transactions are generally not visible after the crash. This concept is similar to that of durable linearisability [38], for concurrent objects.

A durable concurrent history is a concurrent history interleaved with crash events. A durable concurrent history is well formed iff the history with crash events removed is well formed and, moreover, no transaction that started before the crash continues executing after the crash.

Durable opacity, defined over durable concurrent histories, simply requires that the given history is opaque after all crash events are removed. Note that this means that any live transactions before a crash are aborted, and the writes of any committed transactions are persisted, i.e., are not lost after crash.
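
As a sketch of this reduction (with our own event encoding, where an event is either ('crash',) or a (transaction id, operation) pair), the following function removes crash events and simultaneously checks the well-formedness condition that no transaction continues past a crash:

```python
def strip_crashes(history):
    """Remove crash events; return None if the durable history is ill-formed,
    i.e. some transaction has events on both sides of a crash."""
    frozen, active, out = set(), set(), []
    for ev in history:
        if ev == ('crash',):
            frozen |= active            # everything live so far must stop
            active = set()
        else:
            tid, _op = ev
            if tid in frozen:
                return None             # transaction continues after a crash
            active.add(tid)
            out.append(ev)
    return out
```

The resulting crash-free history is then checked for opacity in the usual way; transactions that were live at a crash are treated as aborted.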

Example 1

(Dongol and Le-Papin [23]) Consider the history given below, where we elide the response events as well as the \(\texttt {TMBegin}\) / \(\texttt {TMCommit}\) operations, focussing instead on the allowable order of the transactions \(t_1\)–\(t_8\). We use \(R_i\, x\, v\) to denote a completed read operation by transaction \(t_i\) on variable x returning value v. (Similarly \(W_i\, x\, v\).) We use \(R_i\, x\, \_\) to denote a \(\texttt {TMRead}\) operation that has been invoked by \(t_i\) but not returned. All transactions except \(t_4\) and \(t_5\) are committed. Transaction \(t_4\) is a live transaction that is interrupted by a crash, and transaction \(t_5\) is an aborted complete transaction.

To show that the history above is durably opaque, we must remove the crash events, and show that the remaining history is opaque. Here, we must find a total order among all (including live and aborted) transactions so that the values returned by the read operations are consistent with the memory semantics w.r.t. the committed transactions. This total order must respect the real-time order of transactions, e.g., \(t_1\) and \(t_2\) may not be reordered. Assuming all variables are initialised to 0, an ordering that satisfies these constraints is: \(t_5 \prec t_1 \prec t_3 \prec t_2 \prec t_4\prec t_6\prec t_7\prec t_8\). Other orders are also possible; however, \(t_1\) cannot, for example, occur before \(t_5\), even though \(t_5\) aborts (if it did, \(R_5\, y\, 0\) would be inconsistent with the memory semantics).

One caveat of durable opacity pertains to transactions that have already invoked (but not returned from) \(\texttt {TMCommit}\) when a crash occurs. When removing crash events from the history, such transactions may either be treated as a committed transaction, or a live (and hence aborted) transaction [5].

Example 2

Consider the history given below, which comprises committed transactions \(t_1\) and \(t_5\) and live transactions \(t_2\), \(t_3\) and \(t_4\) that are interrupted by a crash. We assume that both \(t_2\) and \(t_3\) have started committing when the crash occurs, but \(t_4\) has not.

The history is durably opaque, e.g., \(t_1 \prec t_2 \prec t_3\prec t_4 \prec t_5\) is a valid total order, where we treat \(t_2\) as an aborted transaction, but \(t_3\) as a committed transaction. Transaction \(t_4\) can only be considered as a live (and hence aborted) transaction.

3.2 The dTMS2 operational specification

While Sect. 3.1 provides a pedagogical overview of durable opacity, the formal aspects needed for this paper are provided by an operational specification, dTMS2  [5]. dTMS2 extends the TMS2 operational model [20] with a crash-recovery operation (Crash). The transitions of dTMS2 are given in Fig. 2. TMS2 has been shown to imply opacity [50], while dTMS2 has been shown to imply durable opacity [5]. Thus every history of dTMS2 is guaranteed to be durably opaque. Later in Sect. 6, we will show that \(\text {dTML}_{\text {Px86}}\) is a refinement of dTMS2, and hence also guaranteed to satisfy durable opacity.

The memory of \(\text {dTMS2} \) is modelled as a sequence of mappings from locations to values \(L\in (\textsc {Loc}\rightarrow \textsc {Val})^*\). We refer to each such mapping as a memory snapshot. Interestingly, as in other works on refinement-based proofs of durability [18], there is no need to distinguish between volatile and persistent memory state in the abstract specification; the entire state is considered to be persistent. Like \(\text {dTML}_{\text {Px86}} \), \(\text {dTMS2} \) supports begin (\(\texttt {TMBegin}\)), read (\(\texttt {TMRead}\)), write (\(\texttt {TMWrite}\)) and commit (\(\texttt {TMCommit}\)) operations.

Fig. 2

Transitions of \(\text {dTMS2} \) for a transaction t, where \(\textit{validIdx}(t, n) = {\textsf {beginIdx}}_{t}\le n < |{L}|\wedge \textsf{rdSet}_t\subseteq L(n)\) and \(\oplus \) denotes a functional override. For example \(f \oplus \{x \rightarrow v \} = \; \lambda y. \; {{\textbf {if}}} \; y = x \; {{\textbf {then}}}\; v \; else \; f(y) \). arrows represent possible transitions for either an abort or crash. arrows represent transitions that are possible for a crash only. Dashed arrows (e.g., \(\texttt {doBegin}_t\)) represent internal transitions.

Writing transactions of \(\text {dTMS2} \) assume a deferred update policy, i.e., each transaction t maintains a write set (\(\textsf{wrSet}_t\)) that records the values that t has written during its execution (see \(\texttt {doWrite}_t(x,v)\) in Fig. 2). The memory is updated by writing back the elements of \(\textsf{wrSet}_t\) to memory when the transaction commits (\(\texttt {TMCommit}\)). In particular, when a writing transaction t commits (see \(\texttt {doCommit}_t \) in Fig. 2), t creates a new memory snapshot by applying its write set to the final memory snapshot in \(L\), then appending the resulting snapshot to \(L\).

To ensure read consistency, the reads performed from memory are recorded in a local read set (\(\textsf{rdSet}_t\)) for each transaction t (see the else case of \(\texttt {doRead}_t(x)\) in Fig. 2). To judge consistency, the largest memory index at the time the transaction begins is stored in \({\textsf {beginIdx}}_{t}\), recording the earliest memory snapshot against which \(t\) can serialise (see \(\texttt {doBegin}_t\) in Fig. 2). A read in a read-only transaction must be validated w.r.t. some memory snapshot indexed at or after \({\textsf {beginIdx}}_{t}\) (see \(\texttt {doRead}_t(x, n)\) in Fig. 2). Validation against a memory snapshot L(n) only succeeds (see \(\textit{validIdx}\)) if each read in the given read set is consistent with L(n). A read in a writing transaction is similar, except that the validity check must be w.r.t. the last memory snapshot when the location being read is not in the transaction’s write set. If the location being read is in the transaction’s write set, then the value in the write set is returned.
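
The bookkeeping described above can be sketched as follows (our own rendering of the transitions of Fig. 2, with simplified signatures; L is the snapshot list and each snapshot is a location-to-value map):

```python
def valid_idx(L, begin_idx, rd_set, n):
    # validIdx(t, n) = beginIdx_t <= n < |L|  and  rdSet_t is contained in L(n)
    return begin_idx <= n < len(L) and all(L[n][x] == v for x, v in rd_set.items())

def do_read(L, begin_idx, rd_set, wr_set, x, n):
    if x in wr_set:                     # deferred update: read own write set
        return wr_set[x]
    assert valid_idx(L, begin_idx, rd_set, n)
    rd_set[x] = L[n][x]                 # record the read for later validation
    return rd_set[x]

def do_commit(L, begin_idx, rd_set, wr_set):
    if wr_set:                          # writing transaction: validate against
        assert valid_idx(L, begin_idx, rd_set, len(L) - 1)  # the last snapshot
        L.append({**L[-1], **wr_set})   # apply the write set (functional override)
```

A read-only transaction may choose any index n satisfying valid_idx, while a committing writer extends L with a new snapshot, mirroring the deferred-update policy.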

As shown in Fig. 2, following the notion of a canonical automaton [52], each ‘do’ transition is internal, and is preceded and succeeded by a corresponding external invocation and response transition. A transaction can abort (indicated by the pc value \(\textit{Aborted}\)) after an invocation, but before the corresponding response. It can crash by transitioning to \(\textit{Aborted}\) from any state after it has started, but before it has committed or aborted.

4 View-based Px86 model

This paper builds on the view-based model for \(\text {Px86}_{\textrm{view}} \) proposed by Cho et al. [12], which has been shown to be equivalent to \(\text {Px86} \) [59]. \(\text {Px86}_{\textrm{view}} \) abstractly captures the underlying architectural complexities in terms of timestamps. To support the modelling and verification of our TM implementation, we extend \(\text {Px86}_{\textrm{view}} \) as follows:

  1. (1)

    We add a new crash transition to model a system-wide crash. This is needed because in contrast to prior work [12, 58], we wish to allow reasoning about our TM execution even after a crash/recovery event takes place.

  2. (2)

    We introduce a syntax and semantics for high-level durably linearisable [38] objects.

In the following section, we describe the \(\text {Px86}_{\textrm{view}} \) programming language and semantics, emphasising our extensions.

4.1 Programming language

Fig. 3

Programming language syntax.

The syntax of our language is given in Fig. 3, which is the syntax from prior work [8, 12] extended with high-level method calls.

Atomic statements (in \(\textsc {ASt}\)) may be a no-op (\({{\textbf {skip}}}\)), a local assignment (\(a\,{:=}\,e\)), a load of a shared location (\(a\,{:=}\,{{\textbf {load}}}\,x\)), a store to a shared location (\({{\textbf {store}}}\,\,x\,\, e\)), an atomic compare-and-swap (\(a\,{:=}\,{{\textbf {CAS}}}\,\,x\,\,e_1\,\,e_2\)), a memory fence (\({{\textbf {mfence}}} \)), a flush instruction (\({{\textbf {flush }}}x\)), an optimised flush instruction (\({{\textbf {flush}}}_{\text {opt}}\ x\)), a store fence (\({{\textbf {sfence}}} \)) or a call to an atomic method f of object o (o.f).

A labelled statement \(\textsc {LSt}\) is either:

  1. (1)

    a statement of the form \(\alpha \,\,{{\textbf {goto}}}\,\,j\), comprising an atomic statement \(\alpha \) to be executed and the label j of the next statement;

  2. (2)

    a conditional statement of the form \( {{\textbf {if}}}\,\, B\,\,{{\textbf {goto}}}\,\,j\,\,{{\textbf {else to}}}\,\,k\), which facilitates branching, directing execution to label j if B holds and to k, otherwise; and

  3. (3)

    a statement incorporating an auxiliary update, denoted as \(\langle \alpha \,\,{{\textbf {goto}}}\,\,j, \hat{a}:= \hat{e}\rangle \). An auxiliary update behaves like \(\alpha \,\,{{\textbf {goto}}}\,\,j\), but additionally updates the value of the auxiliary variable \(\hat{a}\) with the auxiliary expression \(\hat{e}\) within the same atomic step.

Following [8], a program \(\Pi \) is represented as a function that maps pairs \((t, i) \in \textsc {Tid}\times \textsc {Lab}\) to labelled statements in \(\textsc {LSt}\), representing the next statement to be executed.

Control flow within each thread is tracked by a program counter function, \(\textit{pc}\), which records the program counter of each thread. The initial label of each thread is a designated label \(\iota \) (in \(\textsc {Lab}\)). During a program’s execution, the \(\textit{pc}\) value of a thread changes according to \(\Pi \), and at the end of the thread’s execution, \(\textit{pc}\) is assigned a designated value \(\zeta \in \textsc {Lab}\).

Example 3

(Program) The program in Fig. 1a, assuming that the executing thread has id 1, is given as follows:

\( \Pi \triangleq \left\{ \begin{array}{@{~}l@{~}} (1, \iota ) \mapsto {{\textbf {store}}}\,\,x\,\, 1\,\,{{\textbf {goto}}}\,\,2, \\ (1, 2) \mapsto {{\textbf {flush}}}_{\text {opt}}\ x\,\,{{\textbf {goto}}}\,\,3, \\ (1, 3) \mapsto {{\textbf {store}}}\,\,y\,\, 1\,\,{{\textbf {goto}}}\,\,\zeta \end{array} \right\} \)
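
To illustrate this representation, the following toy executor (ours; the labels \(\iota \) and \(\zeta \) are rendered as strings) runs a program given as such a mapping, following each goto until the final label:

```python
IOTA, ZETA = 'iota', 'zeta'   # the designated initial and final labels

# The program of Example 3: (thread, label) -> (atomic statement, next label).
PI = {
    (1, IOTA): (('store', 'x', 1), 2),
    (1, 2):    (('flushopt', 'x'), 3),
    (1, 3):    (('store', 'y', 1), ZETA),
}

def run(pi, tid):
    """Execute thread tid of program pi, collecting the atomic statements."""
    pc, trace = IOTA, []
    while pc != ZETA:
        stmt, nxt = pi[(tid, pc)]   # each entry reads "alpha goto j"
        trace.append(stmt)
        pc = nxt
    return trace
```

Running thread 1 yields the store, optimised flush and store of Fig. 1a in program order.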

4.2 The \(Px86_{\textrm{view}}\) semantics

Fig. 4

Sample of transition rules of \(Px86_{\textrm{view}}\) for a program \(\Pi \).

Our operational semantics is based on earlier work by Cho et al. [12]. A selection of transition rules of the semantics is given in Fig. 4.

A state is modelled by a tuple \(\sigma = {\langle {pc,\textit{rec},\mathbb {T},M,G}\rangle }\).

  • \(pc:\textsc {Tid}\rightarrow \textsc {Lab}\) maps each thread to the next instruction to be executed.

  • \(\textit{rec}: bool\) is a flag that indicates when a recovery process is in progress. In the event of a crash, \(\textit{rec}\) is set to \(\textsf {true}\) to indicate that an implementation-specific recovery process is about to start its execution. We assume that after the recovery process completes, \(\textit{rec}\) is reset to \(\textsf {false}\).

  • \(\mathbb {T}:\textsc {Tid}\rightarrow \textsc {Thread}\) maps each thread to its current thread state, where \(\textsc {Thread}\) is a record of thread views (see below) and local register store (\(\textsf {regs}: \textsc {Reg}\rightarrow \textsc {Val}\)).

  • \(M\in \textsc {Memory}\) is a list of messages modelling the current memory. The first message of each memory is a store \(\textit{CM}: \textsc {Loc}\rightarrow \textsc {Val}\), and the subsequent messages have the form \(\langle \textsc {Loc}\,{:=}\,\textsc {Val} \rangle \). Initially, we assume \(M = \langle \textit{CM}\rangle \), where \(\textit{CM}(x) = 0\) for all \(x\in \textsc {Loc}\).

  • \(G:\textsc {AuxVar}\rightarrow \textsc {Val}\) records the current values of auxiliary variables.

We denote the components of state \(\sigma \) as \(\sigma .\mathbb {T}\), \(\sigma .M\), etc. We refer to the indices of a memory list as timestamps. For \(ts> 0\), the location and value of a message m are denoted as \(m.\textsf {loc}\) and \(m.\textsf {val}\), respectively. The length of the memory list M is denoted as \(|{M}|\). We say that a message with timestamp \(ts_1\) and location x is not overwritten from timestamp \(ts_2\)’s perspective if the following holds: \(\forall ts.\; ts_1 < ts \le ts_2 \Rightarrow M[ts].\textsf {loc} \ne x\). We denote this as \({x} \not \in {M}({ts_1}..{ts_2}]\). Furthermore, we use \(\sqcup \) to obtain the maximum of timestamps (i.e. \(ts_1 \sqcup ts_2 = \textrm{max}(ts_1,ts_2)\)).
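
The not-overwritten predicate can be encoded directly on the memory list (a sketch with our own representation, where M[0] is the initial store and later messages are (location, value) pairs):

```python
def not_overwritten(M, x, ts1, ts2):
    """x not in M(ts1..ts2]: no message strictly after ts1 and up to ts2
    writes to x. M[0] is the initial store CM; later messages are
    (location, value) pairs."""
    return all(M[ts][0] != x for ts in range(ts1 + 1, ts2 + 1))
```

For example, if the list after the initial store contains a write to x and then a write to y, x is not overwritten from the perspective of the final timestamp, but it is overwritten when starting from timestamp 0.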

The views of a thread state \(\textsc {Thread}\) comprise the following components.

  • \(\textsf {coh}: \textsc {Loc}\rightarrow \mathbb N\), modelling the coherence view, which is used to determine the last write to the given location seen by the thread. In combination with \({\textsf {v}_{\textsf {rNew}}}\) below, \(\textsf {coh}\) determines the range of values observable by \(t\) for a given location.

  • \({\textsf {v}_{\textsf {rNew}}}: \mathbb N\), modelling the latest timestamp among all timestamps seen by the thread.

  • \({\textsf {v}_{\textsf {pReady}}}: \mathbb {N}\), used to ensure that \({{\textbf {load}}} \), \({{\textbf {sfence}}} \), \({{\textbf {mfence}}} \) and \({{\textbf {CAS}}} \) instructions are ordered w.r.t. subsequent \({{\textbf {flush}}}_{\text {opt}} \) instructions.

  • \({\textsf {v}_{\textsf {pAsync}}}: \textsc {Loc}\rightarrow \mathbb N\), modelling the asynchronous view, which is used to determine values to be persistent after the execution of an \({{\textbf {sfence}}} \).

  • \(\textsf {v}_{\textsf {pCommit}}: \textsc {Loc}\rightarrow \mathbb N\) modelling the persistent view, which is used to determine the set of values of a given location in persistent memory.

We assume that all the registers and views are initialised to 0.

As shown in Fig. 4, execution of a \({{\textbf {store}}}\,\,x\,\, v\) instruction adds a message \(\langle x\,{:=}\,v \rangle \) to the memory list and updates the coherence view. A \(r\,{:=}\,{{\textbf {load}}}\,x\) instruction either reads from an earlier write performed by the same thread (load-internal) or from a write performed by another thread (load-external); the two cases update different thread view components. If the load reads the first message of the memory, it returns \(M[0](x)\); otherwise it returns \(M[ts].\textsf {val}\) (assuming \(M[ts].\textsf {loc} = x\)). We capture both scenarios using the notation \(M[ts] \equiv \langle x\,{:=}\,v \rangle \).

The \({{\textbf {CAS}}} \) instruction is modelled by two transition rules (CAS-success and CAS-failure). The CAS-success transition (for \(a\,{:=}\,{{\textbf {CAS}}}\,\,x\,\,e_1\,\,e_2\)) takes place when the value \(v_1\) of register \(e_1\) is equal to the value of the last write at \(x\) (the last memory message with location x). In this case, a message \(\langle x\,{:=}\,v_2 \rangle \), where \(v_2\) is the value of register \(e_2\), is appended to the end of the memory list and register a is assigned \(\textsf {true}\). The CAS-failure transition takes place when the last write at \(x\) does not have the value \(v_1\). In this case, register a is assigned \(\textsf {false}\), and the effect of the transition is equivalent to that of a load instruction on location \(x\).
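The store, load and CAS transitions above can be illustrated with a small executable sketch. This is not the authors' Isabelle/HOL model; all names (Memory, Thread, store, readable, cas) and representation choices are our own illustrative assumptions: the memory is a list whose head is the initial store and whose tail holds \(\langle \textit{loc} := \textit{val} \rangle \) messages.

```python
class Memory:
    def __init__(self, init):
        # msgs[0] is the initial store CM : Loc -> Val; later entries are
        # single-location messages <loc := val>, indexed by timestamp
        self.msgs = [dict(init)]

    def loc(self, ts, x):
        # MemLoc: timestamp 0 "contains" every location
        return x if ts == 0 else self.msgs[ts][0]

    def val(self, ts, x):
        return self.msgs[0][x] if ts == 0 else self.msgs[ts][1]

    def last_write(self, x):
        # timestamp of the last memory message with location x
        return max(ts for ts in range(len(self.msgs)) if self.loc(ts, x) == x)

class Thread:
    def __init__(self):
        self.coh = {}     # coherence view (per location), initially 0
        self.v_rnew = 0   # latest timestamp seen by the thread

def store(mem, th, x, v):
    # append <x := v> and update the coherence view
    mem.msgs.append((x, v))
    ts = len(mem.msgs) - 1
    th.coh[x] = ts
    th.v_rnew = max(th.v_rnew, ts)

def readable(mem, th, x):
    # timestamps a load by `th` may read from: messages on x at or after
    # the coherence view that are not overwritten from v_rnew's
    # perspective, i.e. x not-in M(ts..v_rnew]
    return [ts for ts in range(len(mem.msgs))
            if mem.loc(ts, x) == x
            and th.coh.get(x, 0) <= ts
            and not any(mem.loc(k, x) == x
                        for k in range(ts + 1, th.v_rnew + 1))]

def cas(mem, th, x, exp, new):
    last = mem.last_write(x)
    if mem.val(last, x) != exp:
        th.v_rnew = max(th.v_rnew, last)   # CAS-failure acts like a load
        return False
    store(mem, th, x, new)                 # CAS-success appends <x := new>
    return True
```

For instance, after one thread stores \(x := 1\), another thread may still read the stale initial value, whereas a successful CAS is always performed against the last write.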

A more detailed description of the views of a thread is given in §A, while an example execution is given below.

Example 4

(Program execution) Fig. 5 illustrates how the view components of a thread state \(\mathbb {T}(t)\) change when \(t\) executes a program.

Fig. 5
figure 5

A depiction of a subset of the current views, the thread state (\(\mathbb {T}(t)\)), and \(\text {Px86}_{\textrm{view}} \) memory list (M). The assertions over the thread state are explained in Example 8. The highlighted components of the state capture the effects of each instruction.

  1. (1)

    Initially, the memory \(\sigma .M\) only includes the initial message, and all the components of \(t\)’s view state point to timestamp 0 (i.e., the initial message).

  2. (2)

    After the execution of \({{\textbf {store}}}\,\,x\,\, 1\) the message \(\langle x:= 1 \rangle \) is added to the memory. The store transition causes the coherence view of \(t\) for x (i.e., \(\sigma .\mathbb {T}(t).\textsf {coh}(x)\)) to become 1.

  3. (3)

    Execution of \({{\textbf {flush}}}_{\text {opt}}\ x \) causes \(\sigma .\mathbb {T}(t).{\textsf {v}_{\textsf {pAsync}}}(x)\) to point to memory index 1. Thus, after executing a subsequent \({{\textbf {sfence}}} \), \(x:= 1\) will be guaranteed to have been persisted (after step 4 below).

  4. (4)

    The \({{\textbf {sfence}}} \) instruction causes the \(\sigma .\mathbb {T}(t).\textsf {v}_{\textsf {pCommit}}(x)\) view for \(t\) to point to the same message as the \(\sigma .\mathbb {T}(t).{\textsf {v}_{\textsf {pAsync}}}(x)\) view, indicating that \(x:= 1\) is now persisted.

  5. (5)

    Finally, execution of \({{\textbf {store}}}\,\,y\,\, 1\) adds to memory M the message \(\langle y:= 1 \rangle \) at index 2. All the views in \(\mathbb {T}(t)\) remain the same apart from the coherence view of y (i.e., \(\sigma .\mathbb {T}(t).\textsf {coh}(y)\)).

4.2.1 Modelling crashes and recovery

In contrast to prior works [8, 12], which only modelled execution up to the first crash, we provide explicit mechanisms for reasoning about crashes and the subsequent recovery operation. We introduce a crash transition that creates a new initial message and resets the views of each thread.

Specifically, immediately after a crash in state \(\sigma \), the new initial message \(\textit{CM}\) satisfies the following: for every \(x\in \textsc {Loc}\), there exists some \(ts\) such that \(\sigma .M[ts] \equiv \langle x\,{:=}\,\textit{CM}(x) \rangle \) and \({x} \not \in {\sigma .M}({ts}..{\bigsqcup _t \sigma .\mathbb {T}(t).\textsf {v}_{\textsf {pCommit}}(x) }]\). To formalise this, we first define the set of possible persistent timestamps for location \(x\) in \(\sigma \):

$$\begin{aligned} {\textsf{TS}^{\textsf{P}}}(\sigma , x) \triangleq \left\{ ts\mid \textsf{MemLoc} ( x, ts, \sigma .M ) = x\wedge {x} \not \in {\sigma .M}({ts}..{\bigsqcup _t \sigma .\mathbb {T}(t).\textsf {v}_{\textsf {pCommit}}(x)}] \right\} \end{aligned}$$

where \( \textsf{MemLoc} ( x, t, M ) \triangleq {{\textbf {if}}}\ (t = 0)\ {{\textbf {then}}}\ x\ {{\textbf {else}}}\ M[t].\textsf {loc}\). The set \({\textsf{TS}^{\textsf{P}}}(\sigma , x)\) represents the timestamps of messages that have been persisted in state \(\sigma \), and thus the corresponding values may be read for location \(x\) if a crash occurs at this point of the execution. It comprises the timestamps of the memory messages with location \(x\) that are not overwritten before the maximum of the threads’ \(\textsf {v}_{\textsf {pCommit}}\) views for location \(x\) (i.e., \(\bigsqcup _t \sigma .\mathbb {T}(t).\textsf {v}_{\textsf {pCommit}}(x)\)).

Example 5

Consider the program executed in Example 4. The set \({\textsf{TS}^{\textsf{P}}}(\sigma , x)\) changes as follows: \({\textsf{TS}^{\textsf{P}}}(\sigma ,x) = \{0\} \xrightarrow {{{\textbf {store}}}\,\,x\,\, 1} {\textsf{TS}^{\textsf{P}}}(\sigma ,x) = \{ 0, 1\}\xrightarrow {{{\textbf {flush}}}_{\text {opt}}\ x} {\textsf{TS}^{\textsf{P}}}(\sigma ,x) = \{ 0, 1\} \xrightarrow {{{\textbf {sfence}}}} {\textsf{TS}^{\textsf{P}}}(\sigma ,x) = \{ 1\} \xrightarrow {{{\textbf {store}}}\,\,y\,\, 1} {\textsf{TS}^{\textsf{P}}}(\sigma ,x) = \{ 1\}\)

The set of values corresponding to these timestamps is given by \({[}x]^\textsf{P} \triangleq \textsf{Vals}({\textsf{TS}^{\textsf{P}}}(\sigma , x), x, \sigma .M)\),

where \(\textsf{Vals}(TS,x, M) \triangleq \{ \textsf{MemVal} (x, t, M ) \mid t \in TS\}\) returns the values at the given set of timestamps, assuming \( \textsf{MemVal} (x, t, M ) \triangleq {{\textbf {if}}}\ (t = 0)\ {{\textbf {then}}}\ M[0](x)\ {{\textbf {else}}}\ M[t].\textsf {val}\). We call the set \({[}x]^\textsf{P}\) the persistent view of location x. The persistent view of any location \(x\) in \(\textsc {Loc}\) is global (not specific to a thread) and captures the possible values of \(x\) in persistent memory. It constitutes one of the view-based expressions that we use to form assertions in the proof outlines of \(\text {Px86}_{\textrm{view}} \) programs [8]. We present other view-based expressions in Sect. 6.1.
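The persistent-view computation can be sketched executably as follows, under assumed data representations: M is a list whose head is the initial store (a dict) and whose tail holds (loc, val) messages, and pcommit_views is the collection of the threads' \(\textsf {v}_{\textsf {pCommit}}\) mappings. Function names are illustrative.

```python
def mem_loc(x, ts, M):
    return x if ts == 0 else M[ts][0]

def mem_val(x, ts, M):
    return M[0][x] if ts == 0 else M[ts][1]

def persistent_timestamps(M, pcommit_views, x):
    # lim = join over threads of v_pCommit(x)
    lim = max(v.get(x, 0) for v in pcommit_views)
    return {ts for ts in range(len(M))
            if mem_loc(x, ts, M) == x
            and not any(mem_loc(x, k, M) == x       # x not-in M(ts..lim]
                        for k in range(ts + 1, lim + 1))}

def persistent_view(M, pcommit_views, x):           # [x]^P
    return {mem_val(x, ts, M)
            for ts in persistent_timestamps(M, pcommit_views, x)}
```

Replaying Example 4 with this sketch: before the sfence both 0 and 1 may survive a crash for x; after the sfence only 1 does.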

We assume that recovery is executed by a unique system thread, \(\textit{syst}\), that is different from any program thread. Recovery is only enabled in state \(\sigma \) if \(\sigma .\textit{rec}\) holds. Moreover, we assume a special label, \(\textit{Rec}_\textit{pending}\), which is the label of the first recovery instruction. Upon completion of the recovery procedure, we assume that \(pc_\textit{syst}\) is set to \(\textit{Rec}_\textit{complete}\), and that there is a transition from this state to a state in which \(\textit{rec}\) is set to \(\textsf {false}\).

5 \(\text {dTML}_{\text {Px86}}\): a durable transaction mutex lock for Px86

In this section, we describe \(\text {TML} \) and the extensions required for durable opacity under \(\text {Px86} \). An adaptation of \(\text {TML} \) that ensures durable opacity under the simpler PSC memory model (cf. [44]) has been presented in prior work [5]. In addition to assuming a more realistic memory model, our adapted algorithm \(\text {dTML}_{\text {Px86}} \), unlike that of Bila et al. [5], uses optimised flush instructions to increase performance [37], at the cost of significantly increasing the verification challenge.

5.1 The \(\text {dTML}_{\text {Px86}}\) algorithm

Pseudocode for \(\text {dTML}_{\text {Px86}} \) is given in Fig. 6 as “fall-through” execution, which is notationally more convenient than our goto language (Sect. 4.1). Our Isabelle/HOL encoding uses the goto model (and hence is consistent with the language in Sect. 4.1).

In order to handle the weak behaviours introduced by \(\text {Px86} \), we introduce several extensions to the original TML implementation [15]. Specifically, some of the highlighted lines in Fig. 6 ensure correct thread synchronisation under weak memory, while the remaining highlighted lines are required to ensure correctness under persistency; the highlighted variables are auxiliary. All the local variables apart from the auxiliary ones are modelled as registers. To distinguish them from global variables, we index the registers with the id of the transaction that they belong to. As before, we assume that thread identifiers coincide with the transaction identifiers. Moreover, for simplicity, line numbers for return statements are omitted. From now on, we will use the term internal read for a read that a transaction performs on a location that the same transaction previously wrote, and external read for a read of a location that has been written by another transaction.

We assume that all locations, the registers for every transaction, the global variable \( \textsf{glb} \) and the auxiliary variable \(\textsf{recGlb}\), are initialised to zero. The auxiliary variable \(\textsf{writer}\) is initialised to \(\textit{None}\). We explain the behaviour of \(\text {dTML}_{\text {Px86}} \) in stages, starting with the basic algorithm.

Fig. 6
figure 6

Durable Transactional Mutex Lock.

5.1.1 The basic TML algorithm

TML performs writes in an eager manner, also known as direct update, i.e., it updates shared memory within the write operation itself. This is in contrast to lazy algorithms that store writes locally in a write set and update shared memory at a later stage, e.g., during the commit operation. Additionally, TML adopts a strict policy for transactional synchronisation: as soon as a transaction attempts to write to a variable, all other transactions running concurrently will be aborted when they invoke a read or a write operation. To enforce this synchronisation policy, TML uses a single global versioned lock [19], \( \textsf{glb} \), and a local register \( \textsf{loc}_t \) to record a snapshot of \( \textsf{glb} \) at the beginning of transaction t. A writing transaction is in progress iff the value of \( \textsf{glb} \) is odd.

A transaction t starts by calling \(\texttt {TMBegin}\), then reading \( \textsf{glb} \) and storing the read value in the register \( \textsf{loc}_t \) (Bp). If the value of \( \textsf{glb} \) is odd, another writing transaction is in progress so t does not start. Instead, it reattempts to start by rereading \( \textsf{glb} \).

Operation \(\texttt {TMWrite}(x, v)\) first checks whether \( \textsf{loc}_t \) is even (Wp). If not, then t must already be the writing transaction, and hence it can proceed and update the value of the given location \(x\) to v (W7). If \( \textsf{loc}_t \) is even, the current transaction is not yet a writing transaction, thus it attempts to become one by performing a compare-and-swap (\({{\textbf {CAS}}} \)) operation (W1). If this \({{\textbf {CAS}}} \) succeeds, t becomes the writing transaction and increments \( \textsf{loc}_t \) (W3), making \( \textsf{loc}_t \) odd, then proceeds to update \(x\) to v (W7). In addition, at W3, the auxiliary variable \(\textsf{writer}\) is set to t. If the \({{\textbf {CAS}}} \) at W1 fails, the transaction t aborts.

Operation \(\texttt {TMRead}(x)\) first reads the value at the given location x and stores it in the register \(r_t\) (Rp). The lines R1 to R3 are used to ensure weak-memory synchronisation under TSO and are explained below. At line R4, the operation reads the current value of \( \textsf{glb} \). If this value is the same as \( \textsf{loc}_t \), then either this transaction is the writing transaction, or no other transaction has performed any writes since this transaction started. In both cases the transaction returns the read value. If the test at R5 fails, then the transaction aborts.

Transaction t commits by first checking whether \( \textsf{loc}_t \) is odd (Cp). If so, it means that t is a writing transaction (and hence \( \textsf{glb} \) is odd), thus it makes \( \textsf{glb} \) even by incrementing \( \textsf{glb} \) and setting the auxiliary variable \(\textsf{writer}\) to \(\textit{None}\). If t is a read-only transaction (i.e., \( \textsf{loc}_t \) is even), it simply commits.
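The basic protocol just described can be sketched at the sequentially consistent level, omitting all weak-memory and persistency machinery. The class and method names below are our own illustrative choices, not the paper's pseudocode; the versioned lock \( \textsf{glb} \) is odd exactly when a writer is in flight, and \( \textsf{loc}_t \) snapshots \( \textsf{glb} \) at begin.

```python
class Abort(Exception):
    pass

class TML:
    def __init__(self):
        self.glb = 0            # global versioned lock; odd <=> writer active
        self.mem = {}

    def begin(self):
        while True:
            snap = self.glb
            if snap % 2 == 0:   # Bp: only start when no writer is in flight
                return {'loc': snap}

    def read(self, txn, x):
        v = self.mem.get(x, 0)       # Rp: read the location
        if self.glb != txn['loc']:   # R4/R5: an interfering writer appeared
            raise Abort
        return v

    def write(self, txn, x, v):
        if txn['loc'] % 2 == 0:          # Wp: not yet the writer
            if self.glb != txn['loc']:   # W1: CAS glb from loc to loc+1
                raise Abort
            self.glb += 1
            txn['loc'] += 1              # W3: loc_t becomes odd
        self.mem[x] = v                  # W7: eager (direct) update

    def commit(self, txn):
        if txn['loc'] % 2 == 1:          # Cp: writer releases the lock
            self.glb = txn['loc'] + 1
```

The strict synchronisation policy is visible in use: once a transaction acquires the lock by writing, any concurrently running reader aborts at its next read.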

We now describe the necessary extensions for adapting TML to the persistent x86 setting. From now on, we assume that the underlying memory model is persistent x86 and that the instructions used correspond to the atomic statements of the \(\text {Px86}_{\textrm{view}} \) programming language (see Sect. 4.1).

5.1.2 Correct synchronisation under Px86

Under \(\text {Px86} \), in the presence of multiple writes to a location, a read may return a stale value, i.e., a value that is not the last written value. To ensure that a writing transaction serialises correctly, it must successfully perform a \({{\textbf {CAS}}}\) at line W1, which guarantees that it reads the last written value of \( \textsf{glb} \). However, in the standard \(\text {TML} \) and \(\text {dTML} \) algorithms [5, 15, 17] (which assume SC and PSC memory, respectively), this synchronisation is never performed by read-only transactions. Using this approach in the \(\text {Px86} \) setting is problematic since a read-only transaction may complete with a stale value of \( \textsf{glb} \), without ever reading from the latest write to \( \textsf{glb} \).

Example 6

Consider the program in Fig. 6 without lines R1–R3 (which have been introduced to address correctness under \(\text {Px86} \)). An execution of this program can reach a state with the following memory sequence:

$$\begin{aligned} {\langle {M_0, {\langle { \textsf{glb} := 1}\rangle }, {\langle {x:= 1}\rangle }, {\langle { \textsf{glb} := 2}\rangle }, {\langle { \textsf{glb} := 3}\rangle }, {\langle {x:= 2}\rangle }, {\langle { \textsf{glb} := 4}\rangle } }\rangle } \end{aligned}$$

after executing two transactions \(t_1\) and \(t_2\), where \(t_1\) writes 1 to location \(x\) and commits, after which \(t_2\) writes 2 to location \(x\) and commits. Now suppose transaction \(t_3\) starts, reads \( \textsf{glb} := 2\) (i.e. \( \textsf{loc}_t = 2\)), allowing it to complete \(\texttt {TMBegin}\), and then performs a \(\texttt {TMRead}(x)\) operation. The \(\text {Px86} \) semantics allows it to read from the stale write \({\langle {x:= 1}\rangle }\) (which has been written by transaction \(t_1\)), and then commit. Since \(t_1 \prec t_2\) and \(t_2 \prec t_3\), \(t_3\) reading the value of \(x\) written by \(t_1\) causes the generated history to violate the real-time ordering constraint of opacity.

To address this, we follow a similar approach to Dalvandi and Dongol [16] in the RC11 memory model,Footnote 2 and introduce a \({{\textbf {CAS}}}\) in the \(\texttt {TMRead}\) operation (R2), mimicking a fetch-and-add-zero, to ensure that the last value of \( \textsf{glb} \) is read. If this \({{\textbf {CAS}}}\) succeeds, the executing transaction can immediately return the read value, and if this \({{\textbf {CAS}}}\) fails, the transaction can immediately abort (R3). Note that this \({{\textbf {CAS}}}\) only needs to be performed if the corresponding transaction has not previously performed a read or a write. Thus at line R1, we bypass R2 when \( \textsf{loc}_t \) is odd or \(\textsf{hasRead}_t\) holds. To see how the introduction of lines R1–R3 addresses the issue, consider the following example.
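The read-path check can be distilled into a small sketch, under the assumption that a successful \({{\textbf {CAS}}}( \textsf{glb} , \textsf{loc}_t , \textsf{loc}_t )\) is equivalent to observing that \( \textsf{loc}_t \) is still the latest value of \( \textsf{glb} \). The function name and parameters below are ours, not the paper's pseudocode.

```python
def tmread_r1_r3(latest_glb, loc_t, has_read, value_read):
    # R1: writers (odd loc_t) and repeat readers skip the CAS; they are
    # validated by the ordinary R4/R5 check instead
    if loc_t % 2 == 1 or has_read:
        return value_read
    # R2: fetch-and-add-zero CAS succeeds only against the latest glb
    if latest_glb == loc_t:
        return value_read
    raise RuntimeError('abort')     # R3: stale glb => abort immediately
```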

Example 7

Consider the program in Fig. 6 (with lines R1–R3). Execution of this program can also reach the state in Example 6 after the execution of the transactions \(t_1\) and \(t_2\) described in Example 6. Once again, suppose transaction t starts, then reads \( \textsf{glb} := 2\) (i.e., \( \textsf{loc}_t = 2\)), allowing it to complete \(\texttt {TMBegin}\). Suppose t then executes a \(\texttt {TMRead}(x)\) operation reading the stale write \({\langle {x:= 1}\rangle }\). However, now (unlike Example 6) t proceeds to line R2 and since \( \textsf{loc}_t \) is not the last written value of \( \textsf{glb} \), the \({{\textbf {CAS}}}\) fails, and thus t aborts.

5.1.3 Read-only transactions in Px86

Like Dalvandi and Dongol [16], we observe new behaviours of \(\text {dTML}_{\text {Px86}}\) that would not be present under SC memory, but without violating durable opacity. In particular, a read-only transaction, \(t\), is not immediately invalidated when \( \textsf{glb} \) is updated by another writing transaction, provided \(t\) continues to read from transactional locations that are consistent with a stale value of \( \textsf{glb} \). This read-only transaction would be able to successfully commit if it never reads a value for \(x\) that is more recent than its copy of \( \textsf{glb} \). In case a read-only transaction reads a value of a location \(x\) at Rp that is more recent than its local copy of \( \textsf{glb} \), the load of \( \textsf{glb} \) at R4 would also read a more recent copy of \( \textsf{glb} \) and the transaction would subsequently abort.Footnote 3

5.1.4 Ensuring durability

Durability of \(\text {dTML} \) under PSC has been studied in previous work [5]. The main idea there was to introduce a durably linearisable [38] persistent undo log that records the previous values of locations that have been overwritten by incomplete writing transactions. The log is reset to empty when the writing transaction commits. If a crash occurs while an incomplete writing transaction t is in flight, the subsequent recovery operation restores the last consistent state by undoing the writes of t using the undo log. Our recovery mechanism is similar to this previous work [5], but we use \({{\textbf {flush}}}_{\text {opt}}\) and \({{\textbf {sfence}}} \) instructions instead of \({{\textbf {flush}}}\).
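The undo-log recovery idea can be sketched as follows, under stated assumptions: the log is modelled as a persistent map from locations to the values they held before the incomplete writer overwrote them, and the function name is illustrative. The comments indicate where the \({{\textbf {flush}}}_{\text {opt}}\) and \({{\textbf {sfence}}} \) instructions would appear in \(\text {dTML}_{\text {Px86}} \).

```python
def recover(persistent_mem, undo_log):
    for x, old in list(undo_log.items()):
        persistent_mem[x] = old   # undo the in-flight writer's update
        # a flush_opt on x would be issued here (persist the restored value)
    # an sfence here orders all restores before the log reset below
    undo_log.clear()              # empty log <=> memory is consistent
    return persistent_mem
```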

As in earlier work [5], there is no need to explicitly persist \( \textsf{glb} \). For transactions to successfully execute \(\texttt {TMBegin}\) after a crash, there is no necessity for transactions to read a particular value of \( \textsf{glb} \) at line Bp, as long as the read value is even. Lines Rec8–Rec10 of \(\texttt {TMRecover}\) ensure that there is at least one even value visible for \( \textsf{glb} \) after a system crash.

5.1.5 Alternative designs

While developing \(\text {dTML}_{\text {Px86}} \), we considered several design alternatives. For instance, one option is to move the \({{\textbf {CAS}}} \) instruction of line R2 to line Bp. In this way, a transaction t would retry loading the most recent value of \( \textsf{glb} \) into \( \textsf{loc}_t \) until it succeeds before starting, allowing the transaction to avoid aborting at a later stage. However, while this design may result in fewer aborts, it would likely lead to a considerable increase in overall latency since transactions would be required to execute several \({{\textbf {CAS}}}\) instructions within the \(\texttt {TMBegin}\) operation.

Another design alternative is to use a \({{\textbf {flush}}} \) instruction instead of the \({{\textbf {flush}}}_{\text {opt}}\ ;{{\textbf {sfence}}} \) sequence at Rec4 and Rec5. Since the value of each location recorded in the log is persisted sequentially and by only one thread, we expect the \({{\textbf {flush}}} \) instruction in this case to be equally or more efficient than the current solution.

Neither alternative design would significantly affect the verification effort.

5.2 \(\text {dTML}_{\text {Px86}}\) model

We build a transition system model for \(\text {dTML}_{\text {Px86}} \). In this model, we must define the possible histories of the algorithm, which in turn requires us to define the invocation and response events. We assume that the algorithm is executed by a most-general client [22] that calls the operations of \(\text {dTML}_{\text {Px86}} \).

5.2.1 \({dTML}_{{Px86}}\) executions and histories

For each transaction t, we assume a program counter, \(pc_t\), (initially \(\textit{NotStarted}\)) that is used to model the control flow of transaction t. When t is in flight, but not executing any operation, we have \(pc_t = \textit{Ready}\). Similarly, \(pc_t = \textit{Aborted}\) and \(pc_t = \textit{Committed}\) iff t has aborted or committed, respectively. Otherwise \(pc_t\) is a line number corresponding to the instruction of the operation t is executing.

We assume each operation \(op \in \{\texttt {TMBegin}, \texttt {TMRead}(x), \) \(\texttt {TMWrite}(x, v), \) \(\texttt {TMCommit}\} \) generates an event \( inv _t(op)\) when op starts executing and \( res _t(op)\), when op completes.

5.2.2 Ensuring well-formed histories

To ensure well-formedness of histories, we must ensure that transaction identifiers are not reused. Additionally, a transaction that is live (i.e., in flight) before a crash must not continue its execution after the crash. To this end, we implicitly assume a persistent transaction manager that allocates new transaction identifiers. In our model, like earlier works [5], we use program counters to concisely characterise this assumption. First note that the program counter values of all threads except the system thread are unchanged by a crash transition (see Fig. 4), thus any transaction t with \(pc_t = \textit{NotStarted}\) can be executed after a crash. To ensure that in-flight transactions are not resumed, we assume that recovery starts by setting \(pc_t\) to \(\textit{Aborted}\) for every transaction t such that \(pc_t \notin \{\textit{NotStarted}, \textit{Aborted}, \textit{Committed}\}\) (cf. TMCrashRecovery in Fig. 2).
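The program-counter discipline above can be sketched directly (status names as in the text; the function name is an assumption): a crash leaves all program counters unchanged, and recovery then aborts every in-flight transaction so that it cannot resume.

```python
TERMINAL = {'NotStarted', 'Aborted', 'Committed'}

def abort_in_flight(pc):
    # cf. TMCrashRecovery: pc_t := Aborted for every live transaction t
    return {t: (s if s in TERMINAL else 'Aborted') for t, s in pc.items()}
```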

5.2.3 Modelling log operations

The final source of complexity is the durably linearisable [38] log, log, which we model as a (persistent) mapping from locations to values. In our model, we use a sequential specification of \(\textit{log} \) that does not enforce any client-side memory synchronisation (see [16, 66]) because the TML algorithm only allows a single writer at a time, and hence there is never any race on \(\textit{log} \). Moreover, because we assume that \(\textit{log}\) is durably linearisable, the effect of each log operation is persisted before the operation returns, and hence its client (i.e., our \(\text {dTML}_{\text {Px86}}\) algorithm) never accesses unpersisted log values. We assume that \(\textit{log} \) supports the following operations.

  • \(\textit{log}.{\textbf {isEmpty}}\)() that returns true whenever the log is empty (i.e., all elements are mapped to \(\bot \)).

  • \(\textit{log}.{{\textbf {contains}}}(x)\) that returns true whenever the log contains x (i.e., x is not mapped to \(\bot \)).

  • \(\textit{log}.{{\textbf {insert}}}(x, v)\) that updates the logged location \(x\) to value \(v\).

  • \({\textit{log}.{{\textbf {getKey}}} ()}\) that non-deterministically returns a location whose value is not \(\bot \).

  • \({\textit{log}.{{\textbf {getVal}}} (x)}\) that returns the value of \(x\) in \(\textit{log}\).

The log is stored in the G state component in Fig. 4 and updated according to SC semantics. An actual implementation of \(\textit{log} \) may synchronise threads, e.g., with \({{\textbf {mfence}}} \) operations, which affects the persistency and thread views of the variables of \(\text {dTML}_{\text {Px86}}\). Our proof makes no such assumptions about \(\textit{log} \), namely we assume the weakest possible ordering guarantees. Thus, an implementation of \(\textit{log} \) that performs additional thread synchronisation would not affect soundness of our result.
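A sequential sketch of this interface can be given as a Python mapping, where absence of a key plays the role of \(\bot \). The class and method names are our own illustrative rendering of the operations listed above.

```python
class UndoLog:
    def __init__(self):
        self._m = {}                  # location -> logged value

    def is_empty(self):
        return not self._m            # every location mapped to bottom

    def contains(self, x):
        return x in self._m

    def insert(self, x, v):
        self._m[x] = v                # record the previous value of x

    def get_key(self):
        # non-deterministic choice of a logged location; here we simply
        # take an arbitrary key of the mapping
        return next(iter(self._m))

    def get_val(self, x):
        return self._m[x]
```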

Fig. 7
figure 7

\(\texttt {TMRead}\) annotation.

6 Invariants of \(\text {dTML}_{\text {Px86}}\)

This section describes the key invariants of \(\text {dTML}_{\text {Px86}} \) and the mechanisms for proving their correctness. These will be used in the simulation proof in Sect. 7. Our work builds on the Pierogi logic for \(\text {Px86}_{\textrm{view}} \) [8], which uses view-based expressions derived from the view components of the thread state. We only require a subset of the Pierogi assertions. However, we also introduce new view-based expressions that simplify reasoning about \(\text {dTML}_{\text {Px86}} \) (see Sect. 6.1). This is combined with an Owicki–Gries style proof method to establish correctness of proof outlines (Sect. 6.2). However, unlike Pierogi, because we additionally reason about the behaviour of a program after a crash, we slightly modify the interpretation of a persistent invariant as used in Pierogi (see Sect. 6.2). Pierogi requires that we establish a set of proof rules for atomic statements. We present a subset of these, including our new view-based expressions for \(\text {dTML}_{\text {Px86}} \), in §B. In Sect. 6.3 we present the persistent invariant, and finally, in Sect. 6.4, we present an example proof outline for the \(\texttt {TMRead}\) operation.

6.1 View-based expressions

We first recap two key Pierogi view-based expressions that are used in our proof.

The thread view expression, \(\mathop {[x]_{t}}\), of a thread \(t\) for a location \(x\) captures the values that are visible to \(t\) for \(x\), i.e., the values that \(t\) can read via the execution of a \({{\textbf {load}}} \) or \({{\textbf {CAS}}} \) instruction on \(x\). The formal definition of \(\mathop {[x]_{t}}\) is constructed by first specifying the set of timestamps of the memory messages with location x that are visible to \(t\) (\({\textsf{TS}_{\textit{t}}}(\sigma , x)\)), and then extracting the set of values that correspond to those timestamps using \(\textsf{Vals}\). We define:

$$\begin{aligned} \mathop {[x]_{t}} \triangleq \textsf{Vals}({\textsf{TS}_{\textit{t}}}(\sigma , x), x, \sigma .M) \end{aligned}$$

where \({\textsf{TS}_{\textit{t}}}(\sigma , x) \triangleq \left\{ \begin{array}{@{}l|l@{}} ts &{} \textsf{MemLoc} ( x, ts, \sigma .M ) = x\wedge {} \\ &{} \sigma .\mathbb {T}(t).\textsf {coh}(x) \le ts\wedge {x} \not \in {\sigma .M}({ts}..{\sigma .\mathbb {T}(t).{\textsf {v}_{\textsf {rNew}}}}]\end{array}\right\} \).

Similarly, the asynchronous view expression, \({[}x]_{t}^\textsf{A}\), of a thread \(t\) for a location \(x\) is thread-local and captures the values that can be persisted after the execution of an \({{\textbf {sfence}}} \) instruction by \(t\). This only depends on the view \({\textsf {v}_{\textsf {pAsync}}}(x)\) of \(t\), which potentially changes after a \({{\textbf {flush}}}_{\text {opt}} \) on \(x\) by \(t\). The formal definition of \({[}x]_{t}^\textsf{A}\) is constructed by first specifying the set of timestamps of the asynchronous view of thread \(t\) for location x and state \(\sigma \), and then, as before, extracting the set of values that correspond to those timestamps using \(\textsf{Vals}\). We define:

$$\begin{aligned} {[}x]_{t}^\textsf{A} \triangleq \textsf{Vals}({\textsf{TS}^{\textsf{A}}_{\textit{t}}}(\sigma , x), x, \sigma .M) \end{aligned}$$

where \({\textsf{TS}^{\textsf{A}}_{\textit{t}}}(\sigma , x) \triangleq \left\{ ts\mid \textsf{MemLoc} ( x, ts, \sigma .M ) = x\wedge {x} \not \in {\sigma .M}({ts}..{ \sigma .\mathbb {T}(t).{\textsf {v}_{\textsf {pAsync}}}(x)}]\right\} \).
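These two expressions can be computed with a short sketch, under the same assumed memory representation as before (M[0] is the initial store, later entries are (loc, val) messages); coh, v_rnew and v_pasync stand for the relevant components of a single thread's state, and all function names are illustrative.

```python
def loc_of(x, ts, M):
    return x if ts == 0 else M[ts][0]

def val_of(x, ts, M):
    return M[0][x] if ts == 0 else M[ts][1]

def not_overwritten(x, ts, lim, M):     # x not-in M(ts..lim]
    return not any(loc_of(x, k, M) == x for k in range(ts + 1, lim + 1))

def thread_view(M, coh, v_rnew, x):     # [x]_t
    return {val_of(x, ts, M) for ts in range(len(M))
            if loc_of(x, ts, M) == x
            and coh.get(x, 0) <= ts
            and not_overwritten(x, ts, v_rnew, M)}

def async_view(M, v_pasync, x):         # [x]_t^A
    return {val_of(x, ts, M) for ts in range(len(M))
            if loc_of(x, ts, M) == x
            and not_overwritten(x, ts, v_pasync.get(x, 0), M)}
```

On the memory of Fig. 5, after the thread's store of \(x := 1\), the thread view of x is \(\{1\}\), while the asynchronous view is \(\{0,1\}\) before the \({{\textbf {flush}}}_{\text {opt}}\) and \(\{1\}\) after it.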

Example 8

Consider again the example execution in Fig. 5. This time we consider the assertions associated with each program state. Initially, views \(\mathop {[z]_{t}}\), \({[}z]_{t}^\textsf{A}\) and \({[}z]^\textsf{P}\) for \(z \in \{x, y\}\) all comprise the set \(\{0\}\), meaning that the only value they can read is from the initial message.

  1. (1)

    After execution of \({{\textbf {store}}}\,\,x\,\, 1\), we have \(\mathop {[x]_{t}} = \{1\}\), since the coherence view changes, while \({[}x]_{t}^\textsf{A} = {[}x]^\textsf{P} = \{0,1\}\) since these views can see the value for x in either the initial message or \({\langle {x:= 1}\rangle }\). The view assertions on y are unchanged.

  2. (2)

    After execution of \({{\textbf {flush}}}_{\text {opt}}\ x\), since \(\textsf{v}_\textsf{pAsync}(x)\) is updated, the value 0 is no longer visible to the asynchronous view, and hence \({[}x]_{t}^\textsf{A} = \{1\}\). Note that the persistent memory may still see both 0 and 1 and hence \( {[}x]^\textsf{P} = \{0,1\}\).

  3. (3)

    Next, \({{\textbf {sfence}}} \) is executed, whereby both \(\textsf{v}_\textsf{pCommit}(x)\) and \(\textsf{v}_\textsf{pReady}\) are updated, and hence \( {[}x]^\textsf{P} = \{1\}\).

  4. (4)

    Finally \({{\textbf {store}}}\,\,y\,\, 1\) is executed, which has a similar effect to the first step, but on y instead of x.

Next, we present an extension to Pierogi that enables reasoning about written values before a given timestamp. The last entry views return the timestamp of the last memory message with location equal to the given location and a timestamp less than or equal to the given limit.

\(\textsf{MemLastEntryLim}(x, t, M)\) returns the maximum timestamp of the memory messages with location x and timestamp less than or equal to timestamp t, \({\textsf {LE}} (x) \) returns the timestamp of the last memory message on location x, and \({\textsf {LE}_\textsf{coh}}(y,t,x) \) returns the timestamp of the last write to \(x\) before \(t\)’s coherence view for \(y\). The expression \(\vec {x} \) returns the value of the last message of the memory with location x in the given state.

6.2 Owicki–Gries reasoning

In this section, we describe the Owicki–Gries style framework that we use to show that a proof outline is valid. Our framework follows Pierogi  [8], but we revise the notion of a persistent invariant to enable one to describe the execution of a program after a crash. In particular, given a multi-threaded program \(\Pi \), in addition to the local correctness and global correctness checks, we also check that the persistent invariant is maintained by all program transitions, including those of the recovery operation. As such, the persistent invariant can be used as an assumption when proving local and global correctness. The use of a global invariant to simplify Owicki–Gries proofs is a well-known technique [26].

We refer to the set of assertions (i.e. predicates over \(\text {Px86}_{\textrm{view}} \) states) that use view-based expressions (§6.1) as an \(\textsc {Assertion}_\textsc {pv}\). A proof outline is a tuple \(( in , ann , I , fin )\), where \( in , fin \in \textsc {Assertion}_\textsc {pv}\) are the initial and final assertions, I is the persistent invariant and \( ann \) is an annotation function that models program annotations. Specifically, \( ann \in \textsc {Ann}= \textsc {Tid}\times \textsc {Lab}\rightarrow \textsc {Assertion}_\textsc {pv}\), associates each program point \((t, i)\) with its associated assertion. We let \(\textit{Recovery}\) denote the set of all statements of the recovery operation and crash be a statement corresponding to a crash transition.

Definition 1

(Valid proof outline) A proof outline \(( in , ann , I , fin )\) is valid for a program \(\Pi \) iff the following hold:

  • Initialisation. For all \(t\in \textsc {Tid}\), \( in \Rightarrow I \wedge ann (t, \iota ) \).

  • Finalisation. \(I \wedge (\bigwedge _{t\in \textsc {Tid}}\ ann (t, \zeta )) \Rightarrow fin \).

  • Local correctness. For all \(t\in \textsc {Tid}\) and \(i \in \textsc {Lab}\), either:

    • \(\bullet \) \(\Pi (t, i) = \alpha \,\,{{\textbf {goto}}}\,\,j\) and \(\{ I \wedge ann (t, i) \}\ \alpha \ \{ ann (t, j) \}\) holds; or

    • \(\bullet \) \(\Pi (t, i) = {{\textbf {if}}}\,\, B\,\,{{\textbf {goto}}}\,\,j\,\,{{\textbf {else to}}}\,\,k\) and both

      • \(I \wedge ann (t, i) \wedge B \Rightarrow ann (t,j)\) and

      • \(I \wedge ann (t, i) \wedge \lnot B \Rightarrow ann (t,k)\) hold; or

    • \(\bullet \) \(\Pi (t, i) = \langle \alpha \,\,{{\textbf {goto}}}\,\,j, \hat{a}:= \hat{e}\rangle \) and \(\{ I \wedge ann (t, i) \}\ \langle \alpha , \hat{a}:= \hat{e}\rangle \ \{ ann (t, j) \}\) holds.

  • Global Correctness. For all \(t_1, t_2 \in \textsc {Tid}\) such that \(t_1 \ne t_2\) and \(i_1, i_2 \in \textsc {Lab}\):

    • \(\bullet \) if \(\Pi (t_1, i_1) = \alpha \,\,{{\textbf {goto}}}\,\,j\), then \(\{ I \wedge ann (t_1, i_1) \wedge ann (t_2, i_2) \}\ \alpha \ \{ ann (t_2, i_2) \}\) holds;

    • \(\bullet \) if \(\Pi (t_1, i_1) = \langle \alpha \,\,{{\textbf {goto}}}\,\,j, \hat{a}:= \hat{e}\rangle \), then \(\{ I \wedge ann (t_1, i_1) \wedge ann (t_2, i_2) \}\ \langle \alpha , \hat{a}:= \hat{e}\rangle \ \{ ann (t_2, i_2) \}\) holds.

  • Crash invariance. Both of the following hold:

    • \(\bullet \) for all \(\alpha \in \textit{Recovery}\), \(\{I\}\ \alpha \ \{I\}\); and

    • \(\bullet \) \(\{I\}\ {{\textbf {crash}}}\ \{I\}\).

Initialisation (resp. Finalisation) ensures that the initial (resp. final) assertion of each thread holds in the initial (resp. final) state. Local correctness ensures the validity of the program annotation of each thread, while global correctness ensures that each thread's annotation is stable under the execution of the other threads. In essence, the local correctness proof for a thread \(t\) checks, for each atomic statement of \(t\), that its post-condition (given as an annotation) is established from its pre-condition (given as an annotation). Similarly, the global correctness proof for a thread \(t\) checks that the pre-condition of each atomic statement of \(t\) is stable against the atomic statements of the other threads. Note that \({{\textbf {if}}}\,\, B\,\,{{\textbf {goto}}}\,\,j\,\,{{\textbf {else goto}}}\,\,k\) does not generate a global correctness proof obligation since B is an expression over thread-local variables and thus does not change the global state.
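
To make the shape of these obligations concrete, the following small Python sketch (our own encoding, not part of the paper's Isabelle/HOL development) enumerates the local and global proof obligations that Definition 1 generates for a toy program; the statement kinds and the obligation tag format are hypothetical.

```python
# A rough sketch of the obligations of Definition 1: one local check per
# program point, plus stability of every other thread's annotation against
# each statement that may change the shared state.
def obligations(program):
    """program maps (thread, label) to a statement kind; 'if' statements
    read only thread-local variables, so they yield no global obligations."""
    obs = []
    for (t, i), kind in program.items():
        obs.append(("local", t, i))  # pre-condition establishes post-condition
        if kind != "if":
            for (t2, i2) in program:
                if t2 != t:  # ann(t2, i2) must be stable under (t, i)
                    obs.append(("stable", t2, i2, t, i))
    return obs

# Two threads, three program points; the 'if' at (t1, b) contributes only
# a local obligation.
prog = {("t1", "a"): "store", ("t1", "b"): "if", ("t2", "c"): "store"}
assert len(obligations(prog)) == 6
```

This also makes visible why annotating with weaker, more stable assertions pays off: every state-changing statement multiplies the number of stability checks by the number of other threads' program points.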

To show that a proof outline is valid (Definition 1) we use two types of rules: standard decomposition rules and rules for atomic statements.

6.2.1 Standard decomposition rules

The standard decomposition rules of Hoare logic such as weakening preconditions, strengthening postconditions, and decomposing conjunctions and disjunctions apply (see [8]).

6.2.2 Rules for atomic statements and correctness of view-based assertions

The proof rules that we use constitute all the rules of the \(\textsc {Pierogi} \) framework [8] as well as some additional rules developed to enable proofs of correctness for \(\text {dTML}_{\text {Px86}} \). All the proof rules used in this work have been mechanised and proved sound against our extensions to \(\text {Px86}_{\textrm{view}} \) in Isabelle/HOL. Each rule captures the impact of the execution of atomic statements (discussed in §4.1) on assertions formed by view-based expressions (outlined in §6.1).

We have two general types of rules used to discharge local and global correctness proof obligations. Local correctness proof rules often describe how views are changed through the execution of a thread:

Example 9

Assuming that the statement in question is executed by thread t, the rule states that after executing \({{\textbf {flush}}}_{\text {opt}}\ x\), the asynchronous view of x for thread \(t\) in the post-state is equal to or a subset of its asynchronous view in the pre-state.

Global correctness proof rules are often used to show stability of assertions.

Example 10

Assuming that the statement in question is executed by thread t and \(t \ne t'\), the rule states that the thread view of any address y for any thread remains unchanged after the execution of \({{\textbf {sfence}}} \).

Other global correctness rules describe how the memory (and hence the available values for a thread to observe) changes.

Example 11

Assuming that the statement in question is executed by thread t and \(t \ne t'\), the rule states that the value v for x is available for thread \(t'\) to read after the execution of \({{\textbf {store}}}\,\,x\,\, v\).

A full set of rules for proving local and global correctness of view-based assertions is presented in Sect. B.

6.3 Persistent invariant of \(\text {dTML}_{\text {Px86}}\)

To prove correctness of \(\text {dTML}_{\text {Px86}} \), we construct a multithreaded program \(\Pi _{\text {dTML}_{\text {Px86}}}\) based on the model introduced in Sect. 5.2. \(\Pi _{\text {dTML}_{\text {Px86}}}\) includes all \(\text {dTML}_{\text {Px86}}\) operations, invocation events, response events and the system crash event. With the exception of the system thread, which is only capable of executing the \(\texttt {TMRecover}\) operation, any thread t in \(\textsc {Tid}\) is free to perform any number of operations (excluding the recovery operation) as long as the resulting execution history conforms to the control flow and well-formedness constraints.

In this section, we present the most important aspects of the persistent invariant, which comprises a collection of properties that the \(\text {dTML}_{\text {Px86}} \) implementation guarantees in every program state. The corresponding proofs have been mechanised in Isabelle/HOL.

6.3.1 Memory properties

The first three properties describe memory patterns that occur during the execution of \(\text {dTML}_{\text {Px86}} \). In each of the properties below, we assume that \(i, j \in {{\textbf {dom}}}(M)\) and that \(i < j\).

Property 1

The values of \( \textsf{glb} \) are monotonically increasing within the memory sequence M, i.e.,

$$\begin{aligned} \forall v_i, v_j.\ M[i] \equiv {\langle { \textsf{glb} := v_i}\rangle } \wedge M[j] \equiv {\langle { \textsf{glb} := v_j}\rangle } \ \ \implies \ \ v_i \le v_j \end{aligned}$$

Property 1 is needed because, unlike in prior work [5], the recovery process of \(\text {dTML}_{\text {Px86}}\) does not reset \( \textsf{glb} \) to 0. Not resetting \( \textsf{glb} \) is necessary to avoid \(\texttt {TMRead}\) operations returning stale values (i.e., values that were in persistent memory, but subsequently modified) after a crash. The following example demonstrates this phenomenon.

Example 12

Consider the program in Fig. 6 that resets \( \textsf{glb} \) to zero (\({{\textbf {store}}}\,\, \textsf{glb} \,\, 0\)) after Rec6 instead of executing lines \(Rec7-Rec10\). An execution of this program can reach a state with the following memory sequence:

$$\begin{aligned} {\langle { \left\{ \begin{array}{@{}c@{}} \textsf{glb} \mapsto 2, x \mapsto 5, \_ \mapsto 0 \end{array} \right\} , \langle x\,{:=}\,3 \rangle ,\langle \textsf{glb} \,{:=}\,0 \rangle , \langle \textsf{glb} \,{:=}\,1 \rangle , \langle y\,{:=}\,1 \rangle , \langle \textsf{glb} \,{:=}\,2 \rangle }\rangle } \end{aligned}$$

which is reached from the initial state as follows:

  1. A writing transaction updates \(x\) to 3 and then commits (so \( \textsf{glb} = 2\)),

  2. Another writing transaction updates \(x\) to 5 (so \(\textit{log} (x) = 3\)),

  3. A crash occurs (resulting in the initial state above),

  4. The modified recovery operation described above executes (appending \(\langle x\,{:=}\,3 \rangle \) then \(\langle \textsf{glb} \,{:=}\,0 \rangle \) to the memory),

  5. A third writing transaction updates \(y\) to 1 and commits successfully.

Now assume that another transaction t starts, then reads 2 for \( \textsf{glb} \) from the initial message, allowing it to complete \(\texttt {TMBegin}\), then performs a \(\texttt {TMRead}(x)\) operation. In this case, according to \(\text {Px86} \) semantics, the initial value of \(x\) (i.e., 5) is still observable at Rp. The test at R1 succeeds and the \({{\textbf {CAS}}} \) instruction at R2 can still succeed, since the last value of \( \textsf{glb} \) is 2. As a result, t can successfully complete the \(\texttt {TMRead}\) operation and subsequently commit, violating durable opacity.
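
The failure of monotonicity in this scenario can be checked mechanically. Below is a small Python sketch, in our own encoding (not the paper's Isabelle/HOL mechanisation), that models the memory sequence as a list of (location, value) write messages, with index 0 standing in for the initial message; the "repaired" sequence is hypothetical.

```python
def glb_monotonic(memory):
    """Property 1: values written to glb never decrease along M."""
    vals = [v for loc, v in memory if loc == "glb"]
    return all(a <= b for a, b in zip(vals, vals[1:]))

# Memory sequence of Example 12, where the faulty recovery resets glb to 0:
faulty = [("glb", 2), ("x", 3), ("glb", 0), ("glb", 1), ("y", 1), ("glb", 2)]
assert not glb_monotonic(faulty)

# A hypothetical run in which recovery does not reset glb, so the later
# writer's updates continue from 2:
repaired = [("glb", 2), ("x", 3), ("glb", 2), ("glb", 3), ("y", 1), ("glb", 4)]
assert glb_monotonic(repaired)
```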

Property 2

If two writes to \( \textsf{glb} \) write the same value, then any intermediate write between them must also be to \( \textsf{glb} \), i.e.,

$$\begin{aligned}{}\begin{array}{@{}l@{}} \forall v.\ M[i] \equiv {\langle { \textsf{glb} := v}\rangle } \wedge M[j] \equiv {\langle { \textsf{glb} := v}\rangle } \ \ \implies \ \ \forall k \in [i, j].\; M[k].\textsf {loc}= \textsf{glb} \end{array} \end{aligned}$$

Property 2 holds since the memory pattern described by the antecedent only occurs when two or more transactions that have not yet executed a \(\texttt {TMRead}\) or \(\texttt {TMWrite}\) invoke \(\texttt {TMRead}\) operations and successfully execute their \({{\textbf {CAS}}} \) instruction at R2. The first of these reading transactions introduces a write to \( \textsf{glb} \) that immediately follows either

  1. The initial message, or

  2. A write to \( \textsf{glb} \) by a writing transaction at C3, or

  3. A message added by the \(\texttt {TMRecover}\) process at Rec9 or Rec10.

The subsequent \(\texttt {TMRead}\) operations introduce writes to \( \textsf{glb} \) with unchanged values.
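
Property 2 can be rendered as a small executable check. The sketch below again models memory as a list of (location, value) write messages (index 0 standing in for the initial message); the encoding is ours, for illustration only.

```python
def property2(memory):
    """Two writes of the same value to glb enclose only writes to glb."""
    for i, (li, vi) in enumerate(memory):
        for j in range(i + 1, len(memory)):
            if li == "glb" and memory[j] == ("glb", vi):
                if any(loc != "glb" for loc, _ in memory[i:j + 1]):
                    return False
    return True

# Two reading transactions CAS glb back to the same value 2: allowed.
assert property2([("glb", 0), ("glb", 1), ("x", 5), ("glb", 2), ("glb", 2)])
# A data write strictly between two equal writes to glb: forbidden.
assert not property2([("glb", 2), ("x", 5), ("glb", 2)])
```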

Property 3

Between a memory message on \( \textsf{glb} \) with an even value and a later memory message on a location different from \( \textsf{glb} \), there exists a message on \( \textsf{glb} \) with an odd value, i.e.,

$$\begin{aligned}{}\begin{array}{@{}l@{}} i > 0 \wedge M[i].\textsf {loc}= \textsf{glb} \wedge \textit{even}(M[i].\textsf {val}) \wedge M[j].\textsf {loc}\ne \textsf{glb} \ \implies \ \\ \qquad \qquad \qquad \qquad \qquad \qquad \quad \qquad \exists k \in (i, j).\; M[k].\textsf {loc}= \textsf{glb} \wedge \textit{odd}(M[k].\textsf {val})\end{array} \end{aligned}$$

Property 3 describes a memory pattern that occurs when a transaction successfully performs a \(\texttt {TMWrite}\). Note that excluding the initial message and the messages added from the recovery process, the only way that messages with a location different from \( \textsf{glb} \) are added to the memory is by executing W7. Prior to this, the writing transaction performs a successful \({{\textbf {CAS}}}\) at W1. The execution of W1 adds a message to memory with location \( \textsf{glb} \) and odd value.
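This locking discipline can likewise be checked on concrete memory sequences. The sketch below uses the same hypothetical list-of-writes encoding as above (ours, not the paper's).

```python
def property3(memory):
    """After any even write to glb (excluding the initial message at index
    0), a write to a data location must be preceded by an odd write to glb
    (the lock acquisition at W1)."""
    for i, (li, vi) in enumerate(memory):
        if i > 0 and li == "glb" and vi % 2 == 0:
            for j in range(i + 1, len(memory)):
                if memory[j][0] != "glb":
                    if not any(l == "glb" and v % 2 == 1
                               for l, v in memory[i + 1:j]):
                        return False
    return True

# A writer acquires the lock (glb := 1), writes x at W7, releases at C3:
assert property3([("glb", 0), ("glb", 1), ("x", 5), ("glb", 2)])
# A data write with no intervening lock acquisition: forbidden.
assert not property3([("glb", 0), ("glb", 2), ("x", 5)])
```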

6.3.2 Coherence property for non-writing transactions

The next property uses \( \textsf {maxcoh}_t \triangleq \lambda \sigma .\ \bigsqcup _x(\sigma .\mathbb {T}(t)).\textsf {coh}(x)\), which denotes the maximum coherence value for t across all locations and \( {\textsf {vrnew}}_t \triangleq \lambda \sigma .\ (\sigma .\mathbb {T}(t)).{\textsf {v}_{\textsf {rNew}}}\), which retrieves the value of \({\textsf {v}_{\textsf {rNew}}}\) for t. We let \(\textsf{Recovering}\triangleq \lambda \sigma .\ \sigma .\textit{rec}= \textsf {true}\).

Property 4

When a \(\texttt {TMRecover}\) process is not in progress, for any transaction that is not a writing transaction, the coherence view for all the locations in memory is less than or equal to its \({\textsf {v}_{\textsf {rNew}}}\) view, i.e.,

$$\begin{array}{@{}l@{}} \forall t \in \textsc {Tid}.\; \lnot \textsf{Recovering}\wedge \textsf{writer}\ne t \ \ \implies \ \ \textsf {maxcoh}_t \le {\textsf {vrnew}}_t \end{array}.$$

Property 4 holds because the only cases in which \({\textsf {coh}}_t(x) > {\textsf {vrnew}}_t \) are when t is executing a write to x or performing an internal read of x. Both cases are precluded for non-writing transactions.
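
A minimal executable rendering of Property 4, under a hypothetical encoding of the per-thread view state (a map from thread to its coherence views and \({\textsf {v}_{\textsf {rNew}}}\) timestamp), might look as follows; it is an illustration of the statement, not the mechanised invariant.

```python
def property4(tstates, writer, recovering):
    """maxcoh_t <= vrnew_t for every non-writing transaction t when no
    recovery is in progress."""
    if recovering:
        return True  # the property is conditional on not recovering
    return all(max(st["coh"].values()) <= st["vrnew"]
               for t, st in tstates.items() if t != writer)

states = {"t1": {"coh": {"x": 3, "glb": 5}, "vrnew": 5},
          "t2": {"coh": {"x": 9, "glb": 6}, "vrnew": 5}}
assert property4(states, writer="t2", recovering=False)  # t2 is the writer
assert not property4(states, writer=None, recovering=False)
```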

6.3.3 Properties about tracked locations and log

We now describe a set of properties describing the memory locations that are tracked by \(\text {Px86} \) and \(\textit{log} \). Note that we assume that all locations in \(\textsc {Loc}\) different from \( \textsf{glb} \) can be transactionally written and read.

Property 5

The domain of \(\textit{log} \) does not contain the location \( \textsf{glb} \), i.e.,

$$\begin{aligned} \forall x \in {{\textbf {dom}}}( \textit{log}). \; x \ne \textsf{glb} \end{aligned}$$

Property 6

For every location \(x \ne \textsf{glb} \) that is not in \(\textit{log} \), the persistent view of \(x\) includes only its last written value, i.e.,

$$\begin{aligned} \forall x \in \textsc {Loc}. \; x \ne \textsf{glb} \wedge x \notin {{\textbf {dom}}}( \textit{log}) \; \implies \; {[}x]^\textsf{P} = \{\vec {x}\} \end{aligned}$$

6.3.4 Properties about \( \textsf{glb} \) and \(\textsf{recGlb}\)

Next we have three properties for \( \textsf{glb} \) and the auxiliary variable \(\textsf{recGlb}\).

Property 7

In the presence of a writing transaction, the last value of \( \textsf{glb} \) in the memory must be odd, i.e.,

$$\begin{aligned} \textsf{writer}\ne \textit{None}\ \ \implies \ \ \textit{odd}(\overrightarrow{ \textsf{glb} }) \end{aligned}$$

Property 7 holds due to the successful execution of W1. Note that the implication does not hold in the other direction because, in our model, we reset the auxiliary variable \(\textsf{writer}\) to \(\textit{None}\) during a crash, yet the last value of \( \textsf{glb} \) after a crash may be odd. One could have defined a stronger invariant: \(\lnot \textsf{Recovering}\implies (\textsf{writer}\ne \textit{None}\Leftrightarrow \textit{odd}(\overrightarrow{ \textsf{glb} }))\), however, we have not needed this strengthening in our proofs.

Property 8

With the exception of the initial message, the value of \( \textsf{glb} \) is greater than or equal to \(\textsf{recGlb}\), i.e.,

$$\begin{aligned} \forall i\in {{\textbf {dom}}}(M).\; 0 < i \wedge M[i].\textsf {loc}= \textsf{glb} \; \implies \; M[i].\textsf {val}\ge {\textsf{recGlb}} \end{aligned}$$

Property 9

After a transaction \(t\) successfully executes \(\texttt {TMBegin}\), the value of \( \textsf{loc}_t \) must be less than or equal to the last value of \( \textsf{glb} \) \((\overrightarrow{ \textsf{glb} })\). Moreover, after a successful \(\texttt {TMWrite}\) and/or \(\texttt {TMRead}\) operation has taken place (i.e. \(\textsf{hasRead}_t\vee \textsf{hasWritten}_t\) holds), the value of \(\textsf{recGlb}\) is less than or equal to \( \textsf{loc}_t \), i.e.

$$\begin{aligned} \forall t\in \textsc {Tid}. \; \begin{array}{@{}l@{}}(\textit{pc}_{t} \notin \{ \textit{NotStarted}, Bp, B1,B2, Aborted, Committed \} \implies \textsf{loc}_t \le \overrightarrow{ \textsf{glb} }) \ \ \wedge {} \\ (\textsf{hasRead}_t\vee \textsf{hasWritten}_t\implies {\textsf{recGlb}} \le \textsf{loc}_t ) \end{array} \end{aligned}$$

6.3.5 Properties about recovery

Finally, we have a set of properties about the state immediately after a crash (before recovery has begun) and after recovery has finished.

Property 10

When a \(\texttt {TMRecover}\) process is in progress, all the transactions are either \(\textit{NotStarted}\), \(\textit{Aborted}\) or \(\textit{Committed}\), i.e.,

$$\begin{aligned} \textsf{Recovering}\ \ \implies \ \ ( \forall t. \; pc_{t} \in \{ \textit{NotStarted}, \textit{Aborted}, \textit{Committed}\} ) \end{aligned}$$

Property 11

When a \(\texttt {TMRecover}\) process is not in progress (i.e., has completed), the value of \( \textsf{glb} \) in the initial message is less than the value of the auxiliary variable \(\textsf{recGlb}\), which in turn is at most the last value of \( \textsf{glb} \). Moreover, the value of \(\textsf{recGlb}\) is even, i.e.,

$$\begin{aligned} \lnot \textsf{Recovering}\ \ \implies \ \ M[0]( \textsf{glb} ) < {\textsf{recGlb}} \wedge {\textsf{recGlb}} \le \overrightarrow{ \textsf{glb} }\wedge \textit{even}(\textsf{recGlb}) \end{aligned}$$
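
In the hypothetical list-of-writes encoding used earlier (index 0 standing in for the initial message), Property 11 amounts to the following check; this is our illustration, not the mechanised statement.

```python
def property11(memory, rec_glb):
    """After recovery completes: M[0](glb) < recGlb <= last glb value,
    and recGlb is even."""
    glb_vals = [v for loc, v in memory if loc == "glb"]
    return glb_vals[0] < rec_glb <= glb_vals[-1] and rec_glb % 2 == 0

assert property11([("glb", 2), ("glb", 5), ("glb", 6)], rec_glb=4)
assert not property11([("glb", 2), ("glb", 3)], rec_glb=3)  # recGlb not even
```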

6.4 \(\text {dTML}_{\text {Px86}}\) program annotation

We now enumerate the local properties of each thread by adding program annotations at each atomic step. The program annotation is formed by view-based expressions (see Sect. 6.1). As an example, we give the annotated proof outline of \(\texttt {TMRead}\) in Fig. 7. The assertions of \(\text {dTML}_{\text {Px86}}\) can be classified into three categories:

  1. Transactions that have not yet performed a read or a write,

  2. Read-only transactions, and

  3. Writing transactions.

The assertions highlighted in Fig. 7 capture the effects of the preceding instruction.

We define an assertion \( \textsf{ready}_t\), which holds when an in-flight transaction is in an idle state (i.e., not executing any transactional operation). The assertion is a disjunction of three cases, which we discuss in turn.

The first disjunct captures two local conditions of \(t\): that the local snapshot of \( \textsf{glb} \) is even and the writer is not \(t\), as well as a visibility guarantee: if \(t\)'s local snapshot of \( \textsf{glb} \) is consistent with the last write to \( \textsf{glb} \), then the thread's view of each location \(y\) is the last write to \(y\). The visibility guarantee ensures that the transaction \(t\) can be serialised after the last writing transaction in case \(t\) successfully performs its reads and commits.

The second disjunct covers read-only transactions as described in Sect. 5.1 using the predicate \( \mathsf{read_{pre}}(t, y)\) below. We let \(\textsf {coh}_t(x) \triangleq \lambda \sigma .\ (\sigma .\mathbb {T}(t)).\textsf {coh}(x)\).

$$\begin{aligned} \mathsf{read_{pre}}(t, y)&= \begin{array}{@{}l@{}} {\textsf {coh}}_t( \textsf{glb} ) > 0 \wedge M[{\textsf {coh}}_t( \textsf{glb} )] \equiv {\langle { \textsf{glb} := \textsf{loc}_t }\rangle } \end{array} \end{aligned}$$

Predicate \( \mathsf{read_{pre}}(t, y)\) is established by a successful CAS transition (see Fig. 4). Namely, the successful \({{\textbf {CAS}}} \) transition at R2 in Fig. 7 shifts the coherence view of \( \textsf{glb} \) to the length of the memory in the pre-state, which is greater than zero since the memory always includes the initial message. Furthermore, the second conjunct of \( \mathsf{read_{pre}}\) holds because the successful \({{\textbf {CAS}}} \) transition appends the message \( {\langle { \textsf{glb} := \textsf{loc}_t }\rangle }\) to the end of the memory.

The third disjunct of \( \textsf{ready}\) is straightforward since a writing transaction takes the lock (by making \( \textsf{glb} \) odd). The only additional guarantee required is that \(t\)'s asynchronous view of each location in \(\textit{log} \) is maximal. This guarantees that when \(t\) later performs an \({{\textbf {sfence}}} \) at C1, all of the writes performed by \(t\) are persisted.

Fig. 8 Example execution for read-only transactions.

We discuss correctness of read-only transactions (pink assertions), which is the most challenging aspect of the proof. We use the example in Fig. 8 with an abstract history h comprising three transactions \(t_1\)-\(t_3\). Transactions \(t_1\) and \(t_2\) cannot be reordered due to the real-time order constraint of durable opacity (see Definition 4). Moreover, since the first read of transaction \(t_3\) has returned 1 for x, the only valid sequential history corresponds to the ordering (\(t_1 \prec t_3 \prec t_2\)). Thus, the second read in transaction \(t_3\) must either return 7 for y, or abort.

In the implementation, we must identify the timestamp of the write that a read-only transaction reads from without leading to an abort. To this end, let

$$\begin{aligned} {\textsf {LE}_\textsf{coh}}(y,t,x) \triangleq \textsf {LE}({\textsf {coh}}_t(y), x) \end{aligned}$$

where \({\textsf {LE}_\textsf{coh}}(y,t,x) \) returns the timestamp of the last write to \(x\) before \(t\)'s coherence view for \(y\). In the example in Fig. 8, we have \({\textsf {LE}_\textsf{coh}}( \textsf{glb} ,t_3,y) = 3\) since \(t_3\)'s coherence view of \( \textsf{glb} \) is memory index 5, and the last write to y before index 5 is at index 3.
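
The computation can be sketched executably. The memory sequence below is hypothetical, chosen only to be consistent with the indices quoted above (\(t_3\)'s coherence view of \( \textsf{glb} \) is index 5, y was last written at index 3); the encoding is ours.

```python
def LE(i, x, memory):
    """Timestamp (index) of the last write to x strictly before index i."""
    return max(k for k in range(i) if memory[k][0] == x)

def LE_coh(y, coh, x, memory):
    """Last write to x before the coherence view for y, where the thread's
    coherence views are passed in as the map coh."""
    return LE(coh[y], x, memory)

# Hypothetical memory consistent with the Fig. 8 discussion: t1 writes
# x := 1 and y := 7 and commits; t3's CAS lands at index 5; t2 then locks
# glb (index 6) and writes y at index 8.
memory = [("glb", 0), ("glb", 1), ("x", 1), ("y", 7), ("glb", 2),
          ("glb", 2), ("glb", 3), ("x", 3), ("y", 8), ("glb", 4)]
assert LE_coh("glb", {"glb": 5}, "y", memory) == 3
```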

Provided that \(t_3\) reads from the memory at index 3, the message at index 5 will still be observable to \(t_3\). Therefore, it can read this message at R4 so that the check at R5 does not fail. We prove by contradiction that the second read of \(t_3\) can only succeed if it reads from index 3.

  • Case 1: \(t_3\) reads a message with timestamp greater than \({\textsf {LE}_\textsf{coh}}( \textsf{glb} ,t_3,y) \) at Rp. In Fig. 8, the only such message is at index 8. Using Property 3, in the post-state of Rp, there exists a timestamp ts between \(\textsf {coh}_{t_3}( \textsf{glb} )\) (i.e., 5 in our example) and \(\textsf {coh}_{t_3}(y)\) (i.e., 8 in our example), such that \(M[ts] \equiv {\langle { \textsf{glb} := v}\rangle }\) and \(\textit{odd}(v)\). In our example, \(ts = 6\) and \(v = 3\).

       By Property 4, every observable timestamp for \( \textsf{glb} \) must be greater than or equal to ts (i.e. 6) and thus greater than \(\textsf {coh}_{t_3}( \textsf{glb} )\) (i.e., 5). Thus, \(t_3\) must read a value for \( \textsf{glb} \) at R4 that is different from \( \textsf{loc}_{t_3}\). By the third conjunct of \( \textsf{ready}_{t_3}\), we have \(\textit{even}( \textsf{loc}_{t_3})\). Moreover, by Property 1, each value of \( \textsf{glb} \) after ts is at least v. Since \(\textit{odd}(v)\), we have \(v \ne \textsf{loc}_{t_3}\), thus \(t_3\) cannot observe \( \textsf{loc}_{t_3}\) for \( \textsf{glb} \).

  • Case 2: \(t_3\) reads a message with timestamp less than \({\textsf {LE}_\textsf{coh}}( \textsf{glb} ,t_3,y) \) at Rp. In Fig. 8, such a message is the initial message (with timestamp 0). By Property 4, \( {\textsf {vrnew}}_{t_3} \) must be at least \(\textsf {coh}_{t_3}( \textsf{glb} )\) (i.e., timestamp 5). However, the write at timestamp \({\textsf {LE}_\textsf{coh}}( \textsf{glb} ,t_3,y) \) overwrites this earlier message and hence the earlier timestamp is no longer visible to \(t_3\).

The annotations for the remaining \(\text {dTML}_{\text {Px86}} \) operations are given in Sect. B.1.

Theorem 1

The proof outline for \(\text {dTML}_{\text {Px86}}\) is valid.

7 Durable opacity via refinement

We are now ready to prove durable opacity. The proof proceeds by showing refinement between \(\text {dTML}_{\text {Px86}}\) and the dTMS2 operational specification (see Sect. 3.2). In particular, we establish a forward simulation (Definition 2) between the \(\text {dTML}_{\text {Px86}}\) transition system and the dTMS2 specification. It is well known that forward simulation is a sound proof technique for refinement. As in proofs of linearisability [22], refinement must guarantee trace inclusion, i.e., every externally observable behaviour of the concrete system (here, \(\text {dTML}_{\text {Px86}}\)) is an externally observable behaviour of the abstract system (here, dTMS2). The external steps (transitions) correspond to invocations and responses of transactional operations as well as system crashes.

Definition 2

(Forward simulation) For an abstract system A and a concrete system C, a relation R between the states of A and C is a forward simulation iff each of the following holds.

  • Initialisation. For any initial state \(cs_0\) of C, there exists an initial state \(as_0\) of A such that \(R(as_0, cs_0)\).

  • External step correspondence. For any external transition from cs to \(cs'\) in C and any state as of A such that \(R(as, cs)\), there exists a corresponding external transition (performing the same action) from as to \(as'\) such that \(R(as', cs')\).

  • Internal step correspondence. For any internal transition from cs to \(cs'\) of C and any state as of A such that \(R(as, cs)\), either:

    • \(\bullet \) \(R(as,cs')\), or (stuttering step)

    • \(\bullet \) There is an internal transition of A from as to \(as'\) such that \(R(as',cs')\) (non-stuttering step)
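
For finite transition systems, Definition 2 can be checked exhaustively. The following Python sketch uses a toy encoding of our own ('ext' holds (pre, action, post) triples, 'int' holds (pre, post) pairs); it is an illustration of the definition, not the paper's Isabelle/HOL development.

```python
def is_forward_simulation(A, C, R):
    """Check Definition 2 for finite transition systems."""
    # Initialisation: every concrete initial state matches some abstract one.
    if not all(any(R(a, c) for a in A["init"]) for c in C["init"]):
        return False
    # External step correspondence: same action, related post-states.
    for c, act, c2 in C["ext"]:
        for a in A["states"]:
            if R(a, c) and not any(R(a2, c2) for a1, act1, a2 in A["ext"]
                                   if a1 == a and act1 == act):
                return False
    # Internal step correspondence: stutter, or take an abstract step.
    for c, c2 in C["int"]:
        for a in A["states"]:
            if R(a, c) and not (R(a, c2) or any(
                    R(a2, c2) for a1, a2 in A["int"] if a1 == a)):
                return False
    return True

# Toy instance: the concrete system takes an internal step plus an external
# 'inc' for every abstract 'inc'; R relates a to c whenever a == c // 2.
A = {"states": {0, 1, 2}, "init": {0}, "int": set(),
     "ext": {(0, "inc", 1), (1, "inc", 2)}}
C = {"init": {0}, "int": {(0, 1), (2, 3)},
     "ext": {(1, "inc", 2), (3, "inc", 4)}}
assert is_forward_simulation(A, C, lambda a, c: a == c // 2)
```

The internal steps of C are pure stuttering here; a non-stuttering internal step would instead be matched by a transition in A["int"].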

The forward simulation relation (\(\textit{simRel}\)) is split into two relations: a global relation, globalRel (see Sect. 7.1), and a transaction relation, txnRel (see Sect. 7.2). The global relation describes how the shared states of the two transition systems are related, while the transaction relation specifies the relation between the state of each transaction in the concrete and abstract transition systems. In particular, we have:

$$\begin{aligned} \textit{simRel}(as, cs) \triangleq \textit{globalRel}(as, cs) \wedge \forall t\in \textsc {Tid}.\ \textit{txnRel}(as, cs, t) \end{aligned}$$

where as and cs are states of \(\text {dTMS2} \) and \(\text {dTML}_{\text {Px86}} \), respectively.

To explain these relations, we start by identifying the linearisation points of the \(\text {dTML}_{\text {Px86}} \) operations.

Operation \(\texttt {TMBegin}\) linearises at B1 provided \( \textsf{loc}_t \) is even. For transactions that have not performed any \(\texttt {TMRead}\) or \(\texttt {TMWrite}\), the linearisation point of \(\texttt {TMRead}\) is a successful \({{\textbf {CAS}}}\) at R2. For any other type of transaction, \(\texttt {TMRead}\) linearises at R5 provided the value read from \( \textsf{glb} \) is \( \textsf{loc}_t \). Operation \(\texttt {TMWrite}\) linearises when the memory is updated at W7. Operation \(\texttt {TMCommit}\) has two linearisation points depending on whether the transaction has successfully executed a \(\texttt {TMWrite}\) operation. For a writing transaction (i.e., when \( \textsf{loc}_t \) is odd), \(\texttt {TMCommit}\) linearises at C2. Otherwise, \(\texttt {TMCommit}\) linearises at Cp.

Our simulation relation assumes the following auxiliary definitions. We let \(\textit{intHalf}(n) \triangleq \left\lfloor \frac{n}{2}\right\rfloor \) be the integer division of n by 2.

$$\begin{aligned} \textit{writes}(as, cs)&\triangleq {\textbf {if }} cs.\textsf{writer}= t \wedge pc_{t} \ne C3 {\textbf { then }} as.\textsf{wrSet}_t{\textbf { else }} \emptyset \\ \textit{logicalGlb}(cs)&\triangleq \begin{array}{@{}l@{}}{\textbf {if }} cs.\textsf{writer}= t \wedge pc_{t} = C3 \\ {\textbf {then }} \overrightarrow{ \textsf{glb} }(cs) - cs.\textsf{recGlb}+ 1 {\textbf { else }} \overrightarrow{ \textsf{glb} }(cs) - cs.\textsf{recGlb}\end{array} \\ wrCount(cs)&\triangleq \textit{intHalf}(\textit{logicalGlb}(cs)) \\ \textit{inFlight}(t, cs)&\triangleq t \in \textsc {Tid}\wedge pc_t \notin \{\textit{NotStarted}, \textit{Aborted}, \textit{Committed}\} \end{aligned}$$

The function \(\textit{writes}\) returns the abstract \(\textsf{wrSet}_t\) of a writing transaction. Note that the abstract \(\textsf{wrSet}_t\) is empty after the writing transaction has cleared its log, and hence \(\texttt {TMCommit}\) linearises at C2. The function \(\textit{logicalGlb}\) is used to determine the logical value of \( \textsf{glb} \) (since initialisation or the last recovery) and compensates for the fact that a committing writing transaction has linearised but not yet incremented \( \textsf{glb} \) at C3. The function wrCount(cs) returns the number of committed writing transactions in the concrete state, taking into account the fact that each writing transaction updates \( \textsf{glb} \) twice. Finally, \(\textit{inFlight}\) is used to determine whether the given transaction t in state cs is live (has been started but has not been committed or aborted).
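
The arithmetic behind \(\textit{logicalGlb}\) and wrCount can be sketched directly (a Python transliteration of ours, with the writer-at-C3 case passed in as a flag rather than read off a program counter):

```python
def int_half(n):
    """intHalf(n): integer division of n by 2."""
    return n // 2

def logical_glb(glb_last, rec_glb, writer_at_C3):
    # Compensate for a committing writer that has linearised (at C2) but
    # has not yet incremented glb at C3.
    return glb_last - rec_glb + (1 if writer_at_C3 else 0)

def wr_count(glb_last, rec_glb, writer_at_C3=False):
    # Each committed writing transaction updates glb twice (W1 and C3),
    # so halving the logical glb counts committed writers.
    return int_half(logical_glb(glb_last, rec_glb, writer_at_C3))

# With recGlb = 4 and glb now 8, two writers have committed since the last
# recovery; a writer paused just before C3 (glb = 9) also counts:
assert wr_count(8, 4) == 2
assert wr_count(9, 4, writer_at_C3=True) == 3
```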

7.1 Global relation

The global relation \(\textit{globalRel}\) comprises conditions (1)-(4) below. The definition relies on \(\textsf {LE}(i,x)\), which returns the timestamp of the last write to \(x\) before timestamp i.

$$\begin{aligned}&\lnot \textsf{Recovering}\implies ( \forall x. \; x \ne \textsf{glb} \ \ \implies \ \ \vec {x} = (\textit{last}(as.L) \oplus \textit{writes}(as, cs))(x) ) \end{aligned}$$
(1)
$$\begin{aligned}&\lnot \textsf{Recovering}\ \ \implies \ \ (\textit{wrCount}(cs)= |{as.L}| - 1) \end{aligned}$$
(2)
$$\begin{aligned}&\forall x. \; x \ne \textsf{glb} \ \ \implies \ \ \textit{last}(as.L) (x) = \begin{array}{@{}l@{}}{\textbf {if }} x \notin {{\textbf {dom}}}(cs.\textit{log}) \\ {\textbf {then }} \vec {x} {\textbf { else }}(cs.\textit{log}) (x)\end{array} \end{aligned}$$
(3)
$$\begin{aligned}&\forall i \in {{\textbf {dom}}}(M).\ \forall v, x.\; M[i] \equiv {\langle { \textsf{glb} := v}\rangle } \wedge x \ne \textsf{glb} \ \ \implies \ \ \\&\qquad \qquad M[\textsf {LE}(i,x)].\textsf {val}= as.L[\textit{intHalf}(v - cs.\textsf{recGlb})](x) \end{aligned}$$
(4)
  • Condition (1) states that, for each location \(x\) different from \( \textsf{glb} \), the last written value for \(x\) in \(\text {dTML}_{\text {Px86}} \) is the value of \(x\) in the last memory snapshot of \(\text {dTMS2} \) overwritten by the write set of an in-flight writing transaction (if there is any).

  • Condition (2) states that the number of memory snapshots in \(\text {dTMS2} \) memory since initialisation or the last crash is equal to \(\textit{wrCount}(cs)\).

  • Condition (3) states that, for each location \(x\) different from \( \textsf{glb} \), the value of \(x\) in the last memory snapshot of dTMS2 is the last written value for \(x\) in \(\text {dTML}_{\text {Px86}}\) whenever \(x\) is not in \(\textit{log} \) and is the corresponding value in \(\textit{log} \), otherwise.

  • Condition (4) states that whenever \(\text {dTML}_{\text {Px86}}\) ’s memory index i contains a write to \( \textsf{glb} \) with value v, for any location \(x\) different from \( \textsf{glb} \), the value of the last write to \(x\) before index i is precisely the value of \(x\) in the abstract memory snapshot indexed \(\textit{intHalf}(v - cs.\textsf{recGlb})\).
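
Condition (4), as described in the last bullet, can be exercised on a concrete run. The sketch below uses the same hypothetical list-of-writes encoding as earlier (index 0 standing in for the initial message, initial values 0, and recGlb = 0 for a crash-free run); it is our illustration, not the mechanised relation.

```python
def LE(i, x, memory):
    """Index of the last write to x strictly before index i; falls back to
    the initial message (index 0) if x has not been written."""
    ks = [k for k in range(i) if memory[k][0] == x]
    return max(ks) if ks else 0

def condition4(memory, L, rec_glb, locs):
    # Every write <glb := v> pins the current value of each data location
    # to abstract snapshot L[intHalf(v - recGlb)].
    for i, (loc, v) in enumerate(memory):
        if loc == "glb":
            snap = L[(v - rec_glb) // 2]
            if any(memory[LE(i, x, memory)][1] != snap[x] for x in locs):
                return False
    return True

# Two writers: the first commits x := 5 (glb reaches 2), the second
# commits y := 7 (glb reaches 4); L holds the matching abstract snapshots.
memory = [("glb", 0), ("x", 0), ("y", 0),
          ("glb", 1), ("x", 5), ("glb", 2),
          ("glb", 3), ("y", 7), ("glb", 4)]
L = [{"x": 0, "y": 0}, {"x": 5, "y": 0}, {"x": 5, "y": 7}]
assert condition4(memory, L, rec_glb=0, locs=["x", "y"])
```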

7.2 Transaction relation

The transaction relation (txnRel) comprises conditions (5)-(10) given below, as well as a condition matching abstract and concrete program counters, return values, and validity of completing transactions, which we discuss later.

$$\begin{aligned} \forall t. \; inFlight(t, cs) \wedge \lnot cs.\textsf{hasWritten}_t\wedge \lnot cs.\textsf{hasRead}_t&\implies as.\textsf{rdSet}_t= \emptyset \end{aligned}$$
(5)
$$\begin{aligned} \forall t. \; inFlight(t, cs) \wedge cs.\textsf{hasRead}_t&\implies as.\textsf{rdSet}_t\ne \emptyset \end{aligned}$$
(6)
(7)
(8)
$$\begin{aligned} \forall t. \; inFlight(t, cs) \wedge \textsf{writer}= t \wedge cs.pc_t \notin \{ W4 - W7 \}&\implies as.\textsf{wrSet}_t\ne \emptyset \end{aligned}$$
(9)
$$\begin{aligned} \forall t.\ \forall x \in {{\textbf {dom}}}(as.\textsf{wrSet}_t).\ cs.\textsf{writer}= t&\implies (as.\textsf{wrSet}_t)(x) = cs.\vec {x} \end{aligned}$$
(10)

The first five conditions relate the \(\text {dTML}_{\text {Px86}} \) state of an inFlight transaction t with the \(\textsf{wrSet}_t\) and \(\textsf{rdSet}_t\) variables of the corresponding \(\text {dTMS2} \) state. Condition (10) resolves internal reads: the write set (of the dTMS2 state) of a writing transaction t contains, for each location in its domain, the last value written to that location by \(\text {dTML}_{\text {Px86}} \).

We elide formal presentation of the final condition of \(\textit{txnRel}\), and instead provide a textual description of its remaining parts. We refer the interested reader to our mechanisation [7] for full details.

(1):

\(\textit{txnRel}\) maps \(\text {dTML}_{\text {Px86}} \) program counter values to their \(\text {dTMS2} \) counterparts, which also enables one to identify the linearisation points of \(\text {dTML}_{\text {Px86}} \).

(2):

\(\textit{txnRel}\) ensures that the value returned by a \(\text {dTML}_{\text {Px86}} \)’s successful read on x (\(\texttt {TMRead}(x)\)) at its linearisation point is the same as the value returned in the corresponding non-stuttering step of \(\text {dTMS2} \). For this, the last condition leverages both the global relation and the TMRead annotation (Fig. 7). The way in which abstract and concrete values are matched differs for read-only and writing transactions. We give an overview of the proof for identifying the corresponding abstract and concrete values below.

Read-only transaction. The first read (i.e., \(\texttt {TMRead}(x)\)) of a read-only transaction t succeeds if \(cs. \textsf{loc}_t \) obtains the maximum value of \( \textsf{glb} \) in cs. Otherwise, the \({{\textbf {CAS}}} \) instruction at R2 fails. Based on the precondition of R2 (P2) (see Fig. 7), if the \({{\textbf {CAS}}} \) instruction succeeds, we can deduce that \(\overrightarrow{ \textsf{glb} }(cs) = cs. \textsf{loc}_t \) and \(cs. \textsf{loc}_t \) is even. Additionally, we can infer that there is no message with a timestamp greater than or equal to the timestamp corresponding to the last write to \( \textsf{glb} \) in cs with a location different from \( \textsf{glb} \). This is because, according to Property 3, if such a write existed, the value of the last write of \( \textsf{glb} \) would be odd. Therefore, the timestamp of the message read for x precedes the timestamp of the message of the last write to \( \textsf{glb} \) in cs and must have the form \(\textsf {LE}(i,x)\), where \(cs.M[i] \equiv {\langle { \textsf{glb} := cs. \textsf{loc}_t }\rangle }\). By instantiating Condition (4), we can infer that the value read for x corresponds to the abstract value \(as.L[\textit{intHalf}(cs. \textsf{loc}_t - cs.\textsf{recGlb})](x)\).

   For any subsequent read operation (i.e., \(\texttt {TMRead}(x)\)) performed by a read-only transaction t, we can derive the index of the memory snapshot of \(\text {dTMS2} \) that contains the returned write directly from the \(\texttt {TMRead}\) program annotation. Specifically, the \(\texttt {TMRead}\) program annotation (assertion P5 in Fig. 7) imposes that the only value that can be successfully returned by \(\text {dTML}_{\text {Px86}} \) corresponds to the concrete memory message with timestamp \({\textsf {LE}_\textsf{coh}}( \textsf{glb} ,t,x) \). By expanding the definition of \(\textsf {LE}_\textsf{coh}\), we obtain \({\textsf {LE}_\textsf{coh}}( \textsf{glb} ,t,x) = \textsf {LE}({\textsf {coh}}_t( \textsf{glb} ),x) \). Given this, and by using condition (4), we can determine that the memory snapshot of \(\text {dTMS2} \) containing this write is \(as.L[\textit{intHalf}(cs. \textsf{loc}_t - cs.\textsf{recGlb})]\).

Writing transaction. By the \(\texttt {TMRead}\) program annotation (assertion P5 in Fig. 7), a read operation on x (i.e., \(\texttt {TMRead}(x)\)) of a writing transaction t can only return the last value written to x (\(\vec {x}(cs)\)). If the read is external, then by condition (1) we can deduce that the corresponding abstract value is equal to \(\textit{last}(as.L)(x)\). If the read is internal, then by condition (10) the corresponding abstract value is equal to \((as.\textsf{wrSet}_t)(x)\).
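The even/odd discipline on \( \textsf{glb} \) that underlies both cases can be illustrated with a minimal, volatile (non-persistent) sketch of the underlying TML protocol [15]. The sketch below is illustrative only: the class and method names are our own, the \({{\textbf {CAS}}} \) is modelled by a plain check-and-update, and all persistency, undo-logging and weak-memory aspects of \(\text {dTML}_{\text {Px86}} \) are elided.

```python
# Volatile sketch of the TML protocol [15]: glb is even when no writer is
# active and odd while a writer holds the lock. Persistency, undo logging
# and weak-memory effects of dTML_Px86 are deliberately elided.

class Abort(Exception):
    pass

class TML:
    def __init__(self):
        self.glb = 0   # global versioned lock
        self.mem = {}  # shared memory

class Txn:
    def __init__(self, tm):
        self.tm = tm

    def begin(self):
        self.loc = self.tm.glb
        if self.loc % 2 == 1:            # a writer is active: cannot start
            raise Abort()

    def read(self, addr):
        v = self.tm.mem.get(addr, 0)
        if self.tm.glb != self.loc:      # snapshot changed since begin: abort
            raise Abort()
        return v

    def write(self, addr, val):
        if self.loc % 2 == 0:            # first write: acquire the lock
            if self.tm.glb != self.loc:  # models a failing CAS on glb
                raise Abort()
            self.tm.glb = self.loc + 1   # CAS succeeds; glb becomes odd
            self.loc += 1
        self.tm.mem[addr] = val          # in-place update (undo-logged in dTML)

    def commit(self):
        if self.loc % 2 == 1:            # writer: release with a new even value
            self.tm.glb = self.loc + 1

tm = TML()
w = Txn(tm)
w.begin(); w.write("x", 42); w.commit()
r = Txn(tm)
r.begin()
assert r.read("x") == 42 and tm.glb == 2
```

An even \(\texttt{glb}\) means no writer is active; a writer makes it odd while writing and re-evens it at commit, so readers detect interference by re-checking \(\texttt{glb}\) against their sampled \(\texttt{loc}\).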

(3):

It ensures that the ordering (validity) constraints of \(\text {dTMS2} \) are met. To this end, it guarantees that at the linearisation points of \(\text {dTML}_{\text {Px86}} \)’s \(\texttt {TMRead}\) and \(\texttt {TMCommit}\) operations, we have:

$$\begin{aligned}{}\begin{array}{@{}l@{}}as.\textsf{rdSet}_t\subseteq as.L[\textit{intHalf}({cs. \textsf{loc}_t \!-\! cs.\textsf{recGlb}})] \wedge {} \\ as.{\textsf {beginIdx}}_{t}\le \textit{intHalf}({cs. \textsf{loc}_t - cs.\textsf{recGlb}})\end{array} \end{aligned}$$
(11)

For a transaction t, the validity constraint of \(\text {dTMS2} \) requires that if its read set \((as.\textsf{rdSet}_t)\) is not empty, it must be consistent with a memory snapshot whose index is greater than or equal to \({\textsf {beginIdx}}_{t}\) and less than the length of the abstract memory (as indicated in \(\texttt {doRead}_t(x, n)\) and \(\texttt {doCommit}_t \) in Fig. 2). By condition (11), condition (2), and property (9), an index satisfying these conditions exists and is equal to \(\textit{intHalf}({cs. \textsf{loc}_t - cs.\textsf{recGlb}})\). In the case where t is a writing transaction (\(\textsf{wrSet}_t\ne \bot \)), this index must correspond to the last element of the abstract memory (as defined in \(\texttt {doCommit}_t \) in Fig. 2). The \(\texttt {TMCommit}\) annotation (see Appendix) specifies that at the linearisation point of the \(\texttt {TMCommit}\) operation for a writing transaction, \(cs. \textsf{loc}_t = \textsf{glb} \). Combining this with condition (11), we can deduce that \(as.\textsf{rdSet}_t\subseteq as.L[\textit{intHalf}({\overrightarrow{ \textsf{glb} }(cs) - cs.\textsf{recGlb}})]\), which by condition (2) corresponds to the last element of the abstract memory.
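The role of \(\textit{intHalf}\) in condition (11) can be seen with a small arithmetic sketch: every committed writer advances \( \textsf{glb} \) by 2 (odd while writing, even again at commit), so halving its distance from the post-recovery value \(\textsf{recGlb}\) yields an index into the abstract memory list. All concrete values below are hypothetical.

```python
def int_half(n):
    # intHalf: glb advances by 2 per committed writer transaction, so
    # halving the distance from recGlb yields an index into as.L
    return n // 2

rec_glb = 4              # value of glb established by recovery (hypothetical)
L = ["s0", "s1", "s2"]   # abstract memory snapshots (hypothetical)

loc_t = 8                # a transaction that sampled glb after two commits
idx = int_half(loc_t - rec_glb)
assert idx == 2 and L[idx] == "s2"   # the snapshot the read set must match
```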

(4):

It ensures that immediately after a crash, the length of the \(\text {dTMS2} \) memory list is 1, that the transaction executing the \(\texttt {TMRecover}\) operation is syst, and that the value of each location x that is read and cleared from \(cs.\textit{log} \) in each recovery loop iteration is equal to the corresponding value of x in the memory snapshot of \(\text {dTMS2} \).
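The recovery loop that condition (4) describes follows the standard undo-log pattern. The sketch below, with hypothetical names (`recover`, `mem`, `log`), writes the logged pre-crash values back and clears the log; it elides the Px86 flush instructions that the real \(\texttt {TMRecover}\) operation requires.

```python
def recover(mem, log):
    # Undo-log recovery: write back the logged pre-crash values and clear
    # the log, so memory again matches the last consistent snapshot.
    while log:
        addr, old_val = log.pop()
        mem[addr] = old_val
    return mem

# Example: a writer crashed after updating x and y in place
mem = {"x": 42, "y": 7}     # post-crash (possibly partial) state
log = [("x", 0), ("y", 1)]  # old values recorded before the in-place writes
recover(mem, log)
assert mem == {"x": 0, "y": 1} and log == []
```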

Theorem 2

\(\textit{simRel}\) is a forward simulation between \(\text {dTMS2} \) and \(\text {dTML}_{\text {Px86}}\).

7.3 Mechanisation

The refinement proof has been fully mechanised in Isabelle/HOL. The mechanisation builds on the \(\textsc {Pierogi}\) framework [8] and comprised three main steps:

  (1) Encoding the necessary modifications to the \(\text {Px86}_{\textrm{view}} \) model [12] to reflect the revised version presented in Sect. 4, then adapting a selection of the Pierogi proof rules to the new context and proving the additional Pierogi proof rules (Sect. B).

  (2) Proving validity of the persistent invariant of \(\text {dTML}_{\text {Px86}} \) (Sect. 6.3) and the proof outline for \(\text {dTML}_{\text {Px86}} \) (Theorem 1).

  (3) Proving that the \(\text {dTML}_{\text {Px86}} \) implementation refines the \(\text {dTMS2} \) specification (Theorem 2). Specifically, we established the simulation relation for each step of the \(\text {dTML}_{\text {Px86}} \) transition system, resulting in a total of 47 sub-proofs.

Steps (1) and (2) together took approximately 3 months of full-time work and step (3) required approximately 2 months.

The core development, including the semantics, the view-based assertions, and the soundness of the proof rules, consists of approximately 10,000 lines of Isabelle/HOL code. With this foundation in place, the proofs of the persistent invariant and of the validity of the proof outline for \(\text {dTML}_{\text {Px86}} \) encompass approximately 20,000 lines of Isabelle/HOL code.

7.3.1 Lessons learnt

The initial step involved developing a persistent invariant and program annotations for dTML, which we expressed using Pierogi [8] view-based expressions. While it was necessary to extend the Pierogi expression syntax to account for memory patterns that occur in \(\text {dTML}_{\text {Px86}}\), a substantial subset of the existing expressions remained applicable without modification. Thus, we believe that Pierogi [8] can be widely utilised, or easily extended, to verify persistent memory algorithms in general. We extended the existing semantics [12] to model both crash and recovery. Additionally, we used an extended Owicki–Gries [54] logic that makes use of a persistent invariant, which is shown to hold for all program transitions, including the crash transition and the subsequent recovery process. These extensions can be readily applied to model and reason about other programs, including their behaviour after a crash.

The subsequent step comprised proving a forward simulation [52], using the \(\text {dTML}_{\text {Px86}}\) persistent invariant and program annotations, to establish refinement between \(\text {dTML}_{\text {Px86}}\) and \(\text {dTMS2} \) [5] (which in turn guarantees durable opacity). Although this was not needed in the current proof, a strength of a simulation-based approach lies in its ability to enable hierarchical reasoning, e.g., if one were required to use an intermediate model to establish both a forward and a backward simulation [22, 49]. Our simulation relation is inspired by existing work [18], showing that established concepts and methods for verifying volatile algorithms provide a stepping stone to the more complex Px86 domain.

Each \(\text {Px86}_{\textrm{view}} \) assertion that we use requires the introduction of proof rules for that assertion over the different atomic program statements (see §B), which must be proved sound against the operational semantics. Proving correctness of these rules can be challenging, since it requires examination of the low-level operational semantics and of their effect on the views of the different system components. However, once soundness is established, the rules can be reused to validate proof outlines without extra effort. In particular, Isabelle/HOL can generate the required proof obligations with minimal interaction and then automatically identify the appropriate set of high-level proof rules needed to discharge each obligation using the integrated Sledgehammer tool [9].

In our proof (see [7]), there are several repeated patterns of unfoldings and theorem application. These could be automated through specific tactics (e.g., using Eisbach [53]). One could also make further improvements to the proof structure and modularisation (e.g., using locales [43]). However, we leave these aspects for future work.

8 Related work

Various works focus on adapting algorithms for the conventional volatile RAM model to non-volatile memory (NVM). FliT [70] is a C++ library that can be used to make any linearisable [34] data structure durably linearisable [38] by selectively flushing only those writes that are subsequently read. NVTraverse [28] and Mirror [29] automatically translate lock-free data structures into durable data structures. In particular, NVTraverse requires the given lock-free data structure to be in a traversal form, while Mirror employs a data shadowing technique, which requires maintaining two replicas of the data: a persistent memory version, which is updated first, and a volatile version, which is updated second and is used for fast data access. In this work, we focus on adapting a software transactional memory implementation to the persistency setting.

Our example implementation, \(\text {dTML}_{\text {Px86}} \), extends \(\text {TML} \) [15] with a persistent undo log and associated modifications, such as the introduction of a recovery operation. The undo log technique is used by several persistent STMs [11, 13, 46] as a means of achieving failure atomicity. An alternative technique is to use a redo log [30, 32, 51, 62, 69]. Other persistent transactional memory algorithms rely on hardware modifications to achieve failure atomicity [40, 63, 67].

The literature includes numerous notions of correctness for software transactional memory, many of which were originally introduced as consistency conditions for database transactions. As an example, strict serialisability [55] requires that all non-aborted completed transactions can be ordered to form a sequential history that is valid (i.e., respects the memory semantics) and respects the real-time order of the transactions. Although opacity offers robust safety guarantees that render it suitable for transactional memory, it may be viewed as overly restrictive and needlessly complex to implement in TM systems. This is primarily due to its requirement that every live and aborted transaction must be consistent with all prior committed transactions. To this end, a number of correctness conditions have been suggested that modify various aspects of opacity while preserving its essential safety guarantees, such as elastic opacity [27], live opacity [25], virtual world consistency [35] and last-use opacity [65].

In the context of non-volatile memory, Raad et al. [57] proposed a notion of durable serialisability under weak memory, which extends the concept of serialisability to the persistency setting. However, this correctness condition does not handle aborted transactions. Here, we focus on durable opacity, which was first introduced by Bila et al. [5].

The Pierogi logic, including the extensions developed in the current work, is proven sound against the \(\text {Px86}_{\textrm{view}} \) semantics of Cho et al. [12]. Other operational semantics for persistent TSO include [1, 44, 59] and [60], the latter of which extends the model introduced in [59] to encompass non-temporal writes as well as reads and writes to a richer set of Intel-x86 memory types. In terms of program logics, apart from Pierogi, POG [58] also addresses persistent memory programs. However, it is not mechanised and cannot directly handle examples that involve \({{\textbf {flush}}}_{\text {opt}}\) instructions. Vindum and Birkedal [68] have recently developed an architecture-independent concurrent separation logic for weak persistency. This logic is built upon the Perennial [10] and Iris [42] logic frameworks and has been mechanised in the Coq proof assistant.

Numerous works have aimed to simplify proofs of persistent memory programs (including persistent TM algorithms). Gorjiara et al. [31] have developed a notion of robustness for Px86, which holds if every post-crash execution of a program under Px86 is also a post-crash execution under a strict persistency model. Here, strict persistency is defined in terms of TSO, i.e., if two stores are ordered under the TSO semantics, they must be persisted in the same order. Thus, when a program is robust, one can reason about it using the TSO semantics, simplifying verification. However, while there are efficient bug-finding tools for detecting robustness violations, as far as we are aware there is currently no technique that enables robustness under Px86 to be checked for all possible executions.

Beillahi et al. [4] have developed notions of robustness for causally consistent transactions, which aim to reduce weak transactional consistency models to serialisability. This work studies transactional consistency as opposed to implementations of transactional memory, and is thus orthogonal (but complementary) to our work.

Two recent works have developed modular proofs for durable opacity. Bila et al. [6] develop a durable library that supports the transformation of simulation-based proofs of opaque TM implementations into proofs of durable opacity for the same TM using a persistency library. Given that TML has already been verified to be opaque [17], such a proof could technically be reused. However, the library currently only supports the stronger PSC memory model. Raad et al. [61] present another modularisation technique that builds on the PMDK transactional library [36], which provides support for failure-atomic transactions, but not concurrency. In particular, they show that PMDK transactions can be embedded within an STM to achieve both failure atomicity and thread safety, including under Px86, with validation performed using the FDR model checker [23]. Our intention is to support proofs of durable opacity directly, rather than rely on a third party.

9 Conclusions

In this work, we presented a revised version of the Px86 model [12] and Pierogi [8], which allows reasoning about Px86 programs even after a system crash. Subsequently, we presented a durably opaque STM implementation under \(\text {Px86} \) (\(\text {dTML}_{\text {Px86}} \)) and demonstrated a proof technique based on refinement for establishing correctness.

A possible extension of this work is exploring the connection between durable opacity and contextual refinement. This is particularly relevant for persistent STM implementations like \(\text {dTML}_{\text {Px86}} \), which are primarily used as libraries. Relevant work in the context of C11 has been conducted by Dalvandi and Dongol [16], who demonstrate the insufficiency of \(\text {TMS2} \) for providing client guarantees under the weak memory model of C11, and propose a more adequate specification, an adaptation of \(\text {TMS2} \), together with a program logic for verifying client programs. In the context of persistent memory, Khyzha and Lahav [45] have introduced a correctness criterion for contextual refinement.

We believe that our methodology can also be applied to construct a program logic for other weak persistent memory models, such as the PArmv8 model. A potential starting point could be the PArmv8 view-based semantics presented by Cho et al. [12].

Finally, formalising further weak correctness conditions for persistent STMs (e.g., buffered durable opacity, which can be defined in the same fashion as buffered durable linearisability [38]), as well as exploring their performance implications, would be an interesting subject for future work.