
1 Introduction

There are many ways that software can fail: either the software itself is the cause of the failure (e.g. memory overflow or null-pointer dereferencing), or the failure arises independently of the software. These unpredictable failures are either transient faults, such as a bit flipped by cosmic radiation, or host failures (also referred to as crashes). Host failures can be classified into soft failures, such as those arising from power loss, which can be fixed by rebooting the host, and hard failures, such as permanent hardware failure.

Consider a simple transfer operation that moves money between bank accounts. Assuming that bank accounts can have overdrafts, the transfer can be regarded as a sequence of two steps: first, subtract the money from one bank account; and then add the money to the other account. In the absence of host failures, the operation should succeed. However, if a host failure occurs in the middle of the transfer, money is lost. Programmers employ various techniques to recover some consistency after a crash, such as write-ahead logging (WAL) and associated recovery code. In this paper, we develop the reasoning to verify programs that can recover from host failures, assuming hard failures do not happen.

Resource reasoning, as introduced by separation logic [15], is a method for verifying that programs do not fail. A triple \(\{\, P \,\} \, {\mathbb {C}}\, \{\, Q \,\}\) is given a fault-avoiding, partial correctness interpretation. This means that, assuming the precondition P holds then, if program \(\mathbb {C}\) terminates, it must be the case that \(\mathbb {C}\) does not fail and has all the resource necessary to yield a result which satisfies postcondition Q. Such reasoning guarantees the correct behaviour of the program, ensuring that the software does not crash itself due to bugs, e.g. invalid memory access. However, it assumes that there are no other failures of any form. To reason about programs that can recover from host failures, we must change the underlying assumptions of resource reasoning.

We replace the traditional resource model with one that distinguishes between volatile and durable resource: the volatile resource (e.g. in RAM) does not survive crashes, whereas the durable resource (e.g. on the hard drive) does. Recovery operations use the durable state to repair any corruption caused by the host failure. We introduce fault-tolerant resource reasoning to reason about programs in the presence of host failures and their associated recovery operations. We introduce a new fault-tolerant Hoare triple judgement of the form:

$$ {S} \, \vdash \,\{\, P_{V} \ | \ P_{D}\, \} \, {{\mathbb {C}}} \, \{\, Q_{V} \ | \ Q_{D} \,\} $$

which has a partial-correctness, resource fault-avoiding and host-failing interpretation. Under the standard resource fault-avoiding interpretation: assuming the precondition \(P_V \ |\ P_D\) holds, where the volatile state satisfies \(P_V\) and the durable state \(P_D\), then if \(\mathbb {C}\) terminates and there is no host failure, the volatile and durable resource will satisfy \(Q_V\) and \(Q_D\) respectively. Under the host-failing interpretation: when there is a host failure, the volatile state is lost and, after potential recovery operations, the remaining durable state will satisfy the fault-condition S.

We extend the Views framework [3], which provides a general account of concurrent resource reasoning, with these fault-tolerant triples to provide a general framework for fault-tolerant resource reasoning. We instantiate our framework to give a fault-tolerant extension of concurrent separation logic [11] as an illustrative example. We use this instantiation to verify the correctness of programs that make use of recovery protocols to guarantee different levels of fault tolerance. In particular, we study a simple bank transaction using write-ahead logging and a simplified ARIES recovery algorithm [8], widely used in database systems.

2 Motivating Examples

We introduce fault-tolerant resource reasoning by showing how a simple bank transfer can be implemented and verified to be robust against host failures.

2.1 Naive Bank Transfer

Consider a simple transfer operation that moves money between bank accounts. Using a separation logic [15] triple, we can specify the transfer operation as:

$$ \vdash \begin{array}{@{}c@{}} \left\{ { \mathsf {Account}(\mathtt {from}, v) * \mathsf {Account}(\mathtt {to}, w) } \right\} \\ {\mathtt {transfer}(\mathtt {from}, \mathtt {to}, \mathtt {amount})} \\ \left\{ {\mathsf {Account}(\mathtt {from}, v - \mathtt {amount}) * \mathsf {Account}(\mathtt {to}, w + \mathtt {amount})} \right\} \end{array} $$

The internal structure of the account is abstracted using the abstract predicate [12], \(\mathsf {Account}(x, v)\), which states that there is an account x with balance v. The specification says that, with access to the accounts \(\mathtt {from}\) and \(\mathtt {to}\), the transfer will not fault. It will decrease the balance of account \(\mathtt {from}\) by \(\mathtt {amount}\) and increase the balance of account \(\mathtt {to}\) by the same value. We can implement the transfer operation as follows:

$$ \begin{array}{@{}l@{}} \mathtt {function} \ \mathtt {transfer}(\mathtt {from}, \mathtt {to}, \mathtt {amount}) \ \{ \\ \quad \mathtt {withdraw}(\mathtt {from}, \mathtt {amount}); \ \mathtt {deposit}(\mathtt {to}, \mathtt {amount}); \\ \}\end{array} $$

Using separation logic, it is possible to prove that this implementation satisfies the specification, assuming no host failures. However, this implementation gives no guarantees in the presence of host failures, and for this example it is clearly desirable for the implementation to account for them. In particular, the implementation should guarantee that in the event of a host failure the operation is atomic: either it happened as a whole, or nothing happened. Note that the word atomic is also used in the concurrency literature to describe an operation that takes effect at a single, discrete instant in time. In Sect. 3 we combine the concurrency atomicity of concurrent separation logic with host-failure atomicity: if an operation is concurrently atomic then it is also host-failure atomic.

2.2 Fault-Tolerant Bank Transfer: Implementation

We want an implementation of transfer to be robust against host failures and guarantee atomicity. One way to achieve this is to use write-ahead logging (WAL) combined with a recovery operation. We assume a file-system module which provides standard atomic operations to create and delete files, test their existence, and write to and read from files. Since file systems are critical, their operations have associated internal recovery operations in the event of a host failure.

Given an arbitrary program \(\mathbb {C}\), we use \([\mathbb {C}]\) to identify that the program is associated with a recovery. We can now rewrite the transfer operation, making use of the file-system operations to implement a stylised WAL protocol as follows:

$$ \begin{array}{@{}l@{}} \mathtt {function} \ \mathtt {transfer}(\mathtt {from}, \mathtt {to}, \mathtt {amount}) \ \{ \\ \quad \mathtt {fromAmount} \mathtt {\ :=\ } \mathtt {getAmount}(\mathtt {from}); \\ \quad \mathtt {toAmount} \mathtt {\ :=\ } \mathtt {getAmount}(\mathtt {to}); \\ \quad \left[ \mathtt {create}(\mathtt {log}) \right] ; \\ \quad \left[ \mathtt {write}(\mathtt {log}, (\mathtt {from}, \mathtt {to}, \mathtt {fromAmount}, \mathtt {toAmount})) \right] ; \\ \quad \mathtt {setAmount}(\mathtt {from}, \mathtt {fromAmount} - \mathtt {amount}); \\ \quad \mathtt {setAmount}(\mathtt {to}, \mathtt {toAmount} + \mathtt {amount}); \\ \quad \left[ \mathtt {delete}(\mathtt {log}) \right] ; \\ \}\end{array} $$

The operation works by first reading the amounts stored in each account. It then creates a log file, \(\mathtt {log}\), where it stores the amounts for each account. It then updates each account, and finally deletes the log file. If a host failure occurs, the log provides enough information to implement a recovery operation. In particular, its absence from the durable state means the transfer either happened in full or not at all, while its presence indicates the operation may not have completed. In the latter case, we restore the initial balance by reading the log. An example of a recovery operation is the following:

$$ \begin{array}{@{}l@{}} \mathtt {function} \ \mathtt {transferRecovery}() \ \{ \\ \quad \mathtt {b} \mathtt {\ :=\ } \left[ \mathtt {exists}(\mathtt {log}) \right] ; \\ \quad \mathtt {if} \ (\mathtt {b}) \ \{ \\ \quad \quad (\mathtt {from}, \mathtt {to}, \mathtt {fromAmount}, \mathtt {toAmount}) \mathtt {\ :=\ } \left[ \mathtt {read}(\mathtt {log}) \right] ; \\ \quad \quad \mathtt {if} \ (\mathtt {from} \ne \mathtt {nil} \ \& \& \ \mathtt {to} \ne \mathtt {nil}) \ \{ \\ \quad \quad \quad \mathtt {setAmount}(\mathtt {from}, \mathtt {fromAmount}); \ \mathtt {setAmount}(\mathtt {to}, \mathtt {toAmount}); \\ \quad \quad \}\\ \quad \quad \left[ \mathtt {delete}(\mathtt {log}) \right] ; \\ \quad \}\\ \}\end{array} $$

The operation tests if the log file exists. If it does not, the recovery completes immediately since the balance is already consistent. Otherwise, the values of the accounts are reset to those stored in the log file which correspond to the initial balance. While the recovery operation is running, a host failure may occur, which means that upon reboot the recovery operation will run again. Eventually the recovery operation completes, at which point the transfer either occurred or did not. This guarantees that transfer is atomic with respect to host failures.
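As an executable illustration of this protocol, the following Python sketch models the durable state as a dict and a host failure as an exception. The names (`new_store`, `transfer`, `recover`), the crash points, and the dict representation are illustrative assumptions, not the paper's file-system model.

```python
class Crash(Exception):
    """Simulated host failure: volatile state is lost, durable state survives."""

def new_store():
    # Durable state: account balances plus an optional write-ahead log entry.
    return {"accounts": {"A": 100, "B": 50}, "log": None}

def transfer(store, src, dst, amount, crash_after=None):
    acc = store["accounts"]
    from_amt, to_amt = acc[src], acc[dst]            # read balances (volatile)
    steps = [
        lambda: store.update(log=(src, dst, from_amt, to_amt)),  # create+write log
        lambda: acc.update({src: from_amt - amount}),            # debit source
        lambda: acc.update({dst: to_amt + amount}),              # credit target
        lambda: store.update(log=None),                          # delete log
    ]
    for i, step in enumerate(steps):
        if crash_after == i:
            raise Crash()          # host failure strikes mid-protocol
        step()

def recover(store):
    # Runs on reboot: a surviving log means the transfer may be incomplete,
    # so restore the recorded balances (undo) and delete the log.
    if store["log"] is not None:
        src, dst, from_amt, to_amt = store["log"]
        store["accounts"][src] = from_amt
        store["accounts"][dst] = to_amt
        store["log"] = None
```

Crashing before any of the four steps and then running the recovery always leaves the accounts in one of the two atomic outcomes: the original balances or the fully transferred ones.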

2.3 Fault-Tolerant Bank Transfer: Verification

We introduce the following new Hoare triple for specifying programs that run in a machine where host failures can occur:

$$ {S}\, \vdash \, \{ \,P_{V} \ |\ P_{D}\, \} \, {\mathbb {C}} \, \{ \,Q_{V} \ | \ Q_{D}\, \} $$

where \(P_V\), \(P_D\), \(Q_V\), \(Q_D\) and S are assertions in the style of separation logic and \(\mathbb {C}\) is a program. \(P_V\) and \(Q_V\) describe the volatile resource, and \(P_D\) and \(Q_D\) describe the durable resource. The judgement is read as a normal Hoare triple when there are no host failures. The interpretation of the triples is partial resource fault avoiding and host failing. Given an initial \(P_V \ |\ P_D\), it is safe to execute \(\mathbb {C}\) without causing a resource fault. If no host failure occurs, and \(\mathbb {C}\) terminates, the resulting state will satisfy \(Q_V \ |\ Q_D\). If a host failure occurs, then the durable state will satisfy the fault-condition S.

Given the new judgement we can describe the resulting state after a host failure. Protocols designed to make programs robust against host failures make use of the durable resource to return to a consistent state after reboot. We must be able to describe programs that have a recovery operation running after reboot. We introduce the following triple:

$$ {R} \, \vdash \, \{ \,P_{V} \ |\ P_{D}\, \} \, {\left[ \mathbb {C} \right] } \, \{\, Q_{V} \ |\ Q_{D}\, \} $$

The notation \(\left[ \mathbb {C} \right] \) is used to identify a program with an associated recovery. The assertion R describes the durable resource after the recovery takes place.

We can now use the new judgements to verify the write-ahead logging transfer and its recovery. In their implementations, we use a simplified journaling file system as the durable resource, with the operations specified in Fig. 1. We specify the write-ahead logging transfer with the following triple:

Fig. 1. Specification of a simplified journaling file system.

$$\begin{aligned} S \vdash \begin{array}{@{}c@{}}\left\{ \begin{array}{@{}c@{}} \mathtt {from} = f \wedge \mathtt {to} = t \wedge \mathtt {amount} = a \wedge \mathsf {emp}\\ \hline \mathsf {Account}(f, v) * \mathsf {Account}(t, w) \end{array}\right\} \\ { \mathtt {transfer}(\mathtt {from}, \mathtt {to}, \mathtt {amount}) } \\ \left\{ \begin{array}{@{}c@{}} \mathtt {from} = f \wedge \mathtt {to} = t \wedge \mathtt {amount} = a \wedge \mathsf {emp}\\ \hline \mathsf {Account}(f, v - a) * \mathsf {Account}(t, w + a) \end{array}\right\} \end{array} \end{aligned}$$

where the fault-condition S describes all the possible durable states if a host failure occurs:

$$\begin{aligned} \begin{array}{@{}r@{\ }c@{\ }l@{}} S &{} = &{} (\mathsf {Account}(f, v) * \mathsf {Account}(t, w)) \\ &{} \vee &{} (\mathsf {Account}(f, v) * \mathsf {Account}(t, w) * \mathsf {file}(\mathtt {log}, [])) \\ &{} \vee &{} (\mathsf {Account}(f, v) * \mathsf {Account}(t, w) * \mathsf {file}(\mathtt {log}, [(f,t,v,w)])) \\ &{} \vee &{} (\mathsf {Account}(f, v - a) * \mathsf {Account}(t, w) * \mathsf {file}(\mathtt {log}, [(f,t,v,w)])) \\ &{} \vee &{} (\mathsf {Account}(f, v - a) * \mathsf {Account}(t, w + a) * \mathsf {file}(\mathtt {log}, [(f,t,v,w)])) \\ &{} \vee &{} (\mathsf {Account}(f, v - a) * \mathsf {Account}(t, w + a)) \end{array} \end{aligned}$$

A proof that the implementation satisfies the specification is shown in Fig. 2. If there is a host failure, the current specification of transfer only guarantees that the durable resource satisfies S. This includes the case where money is lost. This is undesirable. What we want is a guarantee that the operation is atomic. In order to add this guarantee, we must combine reasoning about the operation with reasoning about its recovery to establish that undesirable states are fixed after recovery. We formalise the combination of an operation and its recovery in order to provide robustness guarantees against host failures in the recovery abstraction rule:

$$\begin{aligned} \frac{ \begin{array}{c} {\mathbb {C}_{R}\ \mathbf {recovers}\ \mathbb {C}} \\ {S}\, \vdash \, \{\, P_{V}\ |\ P_{D}\, \} \, {\mathbb {C}} \, \{\, Q_{V}\ |\ Q_{D} \} \\ {S}\, \vdash \, \{\, \mathsf {emp}\ |\ S\, \} \, {\mathbb {C}_R} \, \{\, {\mathsf {true}}\ |\ R\} \end{array} }{ {R}\, \vdash \, \{\, P_{V}\ |\ P_{D}\, \} \, {\left[ \mathbb {C} \right] } \, \{\, Q_{V}\ |\ Q_{D}\} } \end{aligned}$$

When implementing a new operation, we use the recovery abstraction rule to establish the fault-condition R we wish to expose to the client. In the second premiss we must first derive what the durable resource S will be immediately after a host-failure. In the third premiss, we establish that given S, the associated recovery operation will change the durable resource to the desired R. Note that because the recovery \(\mathbb {C}_R\) runs immediately after the host failure, the volatile resource of its precondition is empty. Furthermore, we require the fault-condition of the recovery to be the same as the resource that is being recovered, since the recovery operation itself may fail due to a host-failure; i.e. recovery operations must be able to recover themselves.
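The behaviour of \([\mathbb {C}]\) described here can be sketched as a small Python harness: run the operation; on a host failure, run the recovery on reboot, restarting the recovery itself whenever it too is interrupted. The `Crash` exception, the toy operation, and the crash schedules are illustrative assumptions only.

```python
class Crash(Exception):
    """Simulated host failure."""

def run_with_recovery(op, recovery, store):
    # Models [C]: execute C; after a host failure, run the recovery C_R
    # on reboot, restarting C_R until it completes without a host failure.
    try:
        op(store)
    except Crash:
        while True:
            try:
                recovery(store)
                return
            except Crash:
                pass                   # recovery must recover itself

# A toy durable update that must appear atomic: x moves 0 -> 2.
def make_op(crashes):
    def op(store):
        store["x"] = 1                 # intermediate durable state
        if crashes.pop(0):
            raise Crash()
        store["x"] = 2
    return op

def make_recovery(crashes):
    def recovery(store):
        if crashes and crashes.pop(0):
            raise Crash()              # recovery can itself be interrupted
        if store["x"] == 1:            # undo the partial update
            store["x"] = 0
    return recovery
```

For example, if the operation crashes mid-update and the recovery crashes once before succeeding, the durable state still ends in the rolled-back value 0; with no crashes, it ends in the completed value 2.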

Fig. 2. Proof of transfer operation using write-ahead logging.

We allow recovery abstraction to derive any fault-condition that is established by the recovery operation. If that fault-condition is a disjunction of the durable pre- and postconditions, \(P_D \vee Q_D\), then the operation \(\left[ \mathbb {C} \right] \) appears to be atomic with respect to host failures: either the operation's (durable) resource is updated completely, or not at all. No intermediate states are visible to the client.

In order for transfer to be atomic, according to the recovery abstraction rule, transferRecovery must satisfy the following specification:

$$\begin{aligned} S \vdash \begin{array}{@{}c@{}}\left\{ \begin{array}{@{}c@{}} \mathsf {emp}\\ \hline S \end{array}\right\} \\ { \mathtt {transferRecovery}() } \\ \left\{ \begin{array}{@{}c@{}} \mathsf {true} \\ \hline (\mathsf {Account}(f, v) * \mathsf {Account}(t, w)) \vee (\mathsf {Account}(f, v - a) * \mathsf {Account}(t, w + a)) \end{array}\right\} \end{array} \end{aligned}$$

The proof that the implementation satisfies this specification is given in Fig. 3. By applying the abstraction recovery rule we get the following specification for transfer which guarantees atomicity in case of a host-failure:

$$\begin{aligned} R \vdash \begin{array}{@{}c@{}}\left\{ \begin{array}{@{}c@{}} \mathtt {from} = f \wedge \mathtt {to} = t \wedge \mathtt {amount} = a \wedge \mathsf {emp}\\ \hline \mathsf {Account}(f, v) * \mathsf {Account}(t, w) \end{array}\right\} \\ { \left[ \mathtt {transfer}(\mathtt {from}, \mathtt {to}, \mathtt {amount}) \right] } \\ \left\{ \begin{array}{@{}c@{}} \mathtt {from} = f \wedge \mathtt {to} = t \wedge \mathtt {amount} = a \wedge \mathsf {emp}\\ \hline \mathsf {Account}(f, v - a) * \mathsf {Account}(t, w + a) \end{array}\right\} \end{array} \end{aligned}$$

where the fault-condition R describes the recovered durable state:

$$ R = (\mathsf {Account}(f, v) * \mathsf {Account}(t, w)) \vee (\mathsf {Account}(f, v - a) * \mathsf {Account}(t, w + a)) $$

With this example, we have seen how to guarantee atomicity by logging the information required to undo operations. Advanced WAL protocols also store information allowing operations to be redone, and use concurrency control. We do not go into depth on how to enforce concurrency control in our examples other than the example shown in Sect. 3.1, which follows the common techniques used in concurrent separation logic. In Sect. 4, however, we present ARIES, an advanced algorithm that uses write-ahead logging. File systems use a different style of write-ahead logging, called journaling [14], which we discuss in the technical report [10].

Fig. 3. Proof that the transfer recovery operation guarantees atomicity.

3 Program Logic

Until now, we have only seen how to reason about sequential programs. For concurrent programs, we use resource invariants, in the style of concurrent separation logic [11], that are updated by primitive atomic operations. Here primitive atomic is used to mean that the operation takes effect at a single, discrete instant in time, and that it is atomic with respect to host failures.

The general judgement that enables us to reason about host failing concurrent programs is:

Here, \(P_V\ |\ P_D\) and \(Q_V\ |\ Q_D\) are pre- and postconditions as usual and describe the volatile and durable resource. S is a durable assertion, which we refer to as the fault-condition, describing the durable resource of the program \(\mathbb {C}\) after a host failure and possible recovery. The interpretation of these triples is partial resource fault avoiding and host failing. Starting from an initial state satisfying the precondition \(P_V\ |\ P_D\), it is safe to execute \(\mathbb {C}\) without causing a resource fault. If no host failure occurs and \(\mathbb {C}\) terminates, the resulting state will satisfy the postcondition \(Q_V\ |\ Q_D\). The shared resource invariant \(J_V\ |\ J_D\) is maintained throughout the execution of \(\mathbb {C}\). If a host failure occurs, all volatile resource is lost and the durable state will (after possible recoveries) satisfy \(S * J_D\).

We give an overview of the key proof rules of Fault-tolerant Concurrent Separation Logic (FTCSL) in Fig. 4. Here we do not formally define the syntax of our assertions, although we describe the semantics in Sect. 5. In general, volatile and durable assertions can be parameterised by any separation algebra.

Fig. 4. Selected proof rules of FTCSL.

The sequence rule allows us to combine two programs in sequence as long as they have the same fault-condition and resource invariant. Typically, when the fault-conditions differ, we can weaken them using the consequence rule, which adds fault-condition weakening to the standard consequence rule of Hoare logic. The frame rule, as in separation logic, allows us to extend the pre- and postconditions with the same unmodified resource \(R_V * R_D\). However, here the durable part, \(R_D\), is also added to the fault-condition.
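As Fig. 4 is not reproduced here, the frame rule just described might be rendered as follows; this is our approximation reconstructed from the text (the shared resource invariant is elided, as in the triples of Sect. 2), not the paper's figure:

$$ \frac{ {S}\, \vdash \, \{\, P_{V}\ |\ P_{D}\, \} \, {\mathbb {C}} \, \{\, Q_{V}\ |\ Q_{D}\, \} }{ {S * R_{D}}\, \vdash \, \{\, P_{V} * R_{V}\ |\ P_{D} * R_{D}\, \} \, {\mathbb {C}} \, \{\, Q_{V} * R_{V}\ |\ Q_{D} * R_{D}\, \} } $$

Note that the durable frame \(R_D\) is added to the fault-condition as well as to the pre- and postconditions, since framed durable resource also survives a host failure.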

The atomic rule allows us to use the resource invariant \(J_V\ |\ J_D\) within a primitive atomic operation. Since the operation executes at a single, discrete moment in time, we can think of it as temporarily owning the resources \(J_V\ |\ J_D\); however, they must be reestablished at the end. This guarantees that every primitive atomic operation maintains the resource invariant. Note that the rule enforces atomicity with respect to host failures. The share rule allows us to use local resources to extend the shared resource invariant.

The parallel rule, in terms of pre- and postconditions, is as in concurrent separation logic. However, the fault-condition describes the possible durable resources that may result from a host failure while running \(\mathbb {C}_1\) and \(\mathbb {C}_2\) in parallel. In particular, a host failure may occur while both \(\mathbb {C}_1\) and \(\mathbb {C}_2\) are running, in which case the fault-condition is \(S_1 * S_2\), or when either one of \(\mathbb {C}_1\), \(\mathbb {C}_2\) has finished, in which case the fault-condition is \(S_1 * {Q_D}_2\) or \(S_2 * {Q_D}_1\) respectively.
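Putting the three cases together, the parallel rule described above can be sketched as follows; the notation is our reconstruction from the text (the shared resource invariant is elided and \(Q_{Di}\) abbreviates \({Q_D}_i\)), not the paper's Fig. 4:

$$ \frac{ {S_1}\, \vdash \, \{\, P_{V1}\ |\ P_{D1}\, \} \, {\mathbb {C}_1} \, \{\, Q_{V1}\ |\ Q_{D1}\, \} \qquad {S_2}\, \vdash \, \{\, P_{V2}\ |\ P_{D2}\, \} \, {\mathbb {C}_2} \, \{\, Q_{V2}\ |\ Q_{D2}\, \} }{ {(S_1 * S_2) \vee (S_1 * Q_{D2}) \vee (S_2 * Q_{D1})}\, \vdash \, \{\, P_{V1} * P_{V2}\ |\ P_{D1} * P_{D2}\, \} \, {\mathbb {C}_1 \parallel \mathbb {C}_2} \, \{\, Q_{V1} * Q_{V2}\ |\ Q_{D1} * Q_{D2}\, \} } $$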

Finally, the recovery abstraction rule allows us to prove that a recovery operation \(\mathbb {C}_R\) establishes the fault-condition R we wish to expose to the client. The first premiss requires the operation \(\mathbb {C}_R\) to be the recovery of \(\mathbb {C}\), i.e. it is executed on reboot after a host failure during the execution of \(\mathbb {C}\). The second premiss guarantees that in that case the durable resources satisfy S and the shared resource invariant satisfies \(J_D\), while the volatile state is lost after the host failure. The third premiss takes the resource after the reboot and runs the recovery operation in order to establish R. Note that \(J_D\) is an invariant, as there can potentially be parallel recovery operations accessing it using primitive atomic operations. While the recovery operation \(\mathbb {C}_R\) is running, there can be any number of host failures, each of which restarts the recovery. This means that the recovery operation must be able to recover from itself. We allow recovery abstraction to derive any fault-condition that is established by the recovery operation. If the fault-condition is a disjunction of the durable pre- and postconditions, \(P_D \vee Q_D\), then the operation \(\left[ \mathbb {C} \right] \) appears to be atomic with respect to host failures.

3.1 Example: Concurrent Bank Transfer

Consider two threads that both perform a transfer operation from account f to account t as shown in Sect. 2. The parallel rule requires that each operation acts on disjoint resources in the precondition. Since both threads update the same accounts, we synchronise their use with atomic blocks, denoted \(\langle \_ \rangle \). A possible specification for the program is the following:

A sketch proof of this specification is given in Fig. 5. We first move the shared resources of the two transfer operations to the shared invariant (share rule). We then prove each thread independently, making use of the atomic rule to gain temporary access to the shared invariant within the atomic block, and reuse the specification given in Sect. 2.3. It is possible to get stronger postconditions that maintain exact information about the amounts in each bank account, using complementary approaches such as Owicki-Gries or other forms of resource ownership [18]. The sequential examples in this paper can be adapted to concurrent applications using these techniques.

Fig. 5. Sketch proof of two concurrent transfers over the same accounts.

4 Case Study: ARIES

In Sect. 2 we saw an example of a very simple transaction and its associated recovery operation employing write-ahead logging. Relational databases support concurrent execution of complex transactions following the established ACID (Atomicity, Consistency, Isolation and Durability) set of properties. ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) [8] is a collection of algorithms, involving concurrent execution, write-ahead logging and failure recovery of transactions, that is widely used to establish the ACID properties.

It is beyond the scope of this paper to verify that the full set of ARIES algorithms guarantees the ACID properties. Instead, we focus on a stylised version of the recovery algorithm of ARIES, proving that: (a) it is idempotent with respect to host failures; and (b) after recovery, all transactions recorded in the write-ahead log have either been completed or rolled back.

Transactions update database records stored in durable memory, which for the purposes of this discussion we assume to be a single file in a file system. To increase performance, the database file is divided into fixed-size blocks, called pages, each containing multiple records. Thus input/output to the database file is in terms of pages rather than records, and pages are also typically cached in volatile memory. A single transaction may update multiple pages. In the event of a host failure, there may be transactions that have not yet completed, or that have completed but whose updated pages have not yet been written back to the database file.

ARIES employs write-ahead logging for page updates performed by transactions. The log is stored on a durable, fault-tolerant medium. The recovery uses the logged information in a sequence of three phases. First, the analysis phase scans the log to determine the (volatile) state of any active transactions (committed or not) at the point of host failure. Next, the redo phase scans the log and redoes each logged page update, unless the associated page in the database file has already been updated. Finally, the undo phase scans the log and undoes each page update of each uncommitted transaction. To cope with a possible host failure during the ARIES recovery, each undo action is logged beforehand. Thus, in the event of a host failure the undo actions will be retried as part of the redo phase.

In Fig. 6, we define the log and database model and describe the predicates we use in our specifications and proofs. We refer the reader to our technical report [10] for the formal definitions. We model the database state, \(db\), as a set of pages, where each page comprises the page identifier, the log sequence number (defined later) of the last update performed on the page, and the page data. The log, \(lg\), is structured as a sequence of log records, ordered by a log sequence number, \(lsn \in \mathbb {N}\), each of which records a particular action performed by a transaction. The ordering follows the order in which transaction actions are performed on the database. The logged action \(U[tid, pid, op]\) records that the transaction with identifier \(tid\) performs the update \(op : \textsc {Data} \rightarrow \textsc {Data}\) on the page identified by \(pid\). We use \(op^{-1}\) to denote the operation undoing the update \(op\). \(B[tid]\) records the start of a new transaction with identifier \(tid\), and \(C[tid]\) records that the transaction with id \(tid\) is committed. The information from the above actions is used to construct two auxiliary structures used by the recovery to determine the state of transactions and pages at the point of a host failure. The transaction table (TT) records the status of all active transactions (e.g. updating, committed) and the latest log sequence number associated with each transaction. The dirty page table (DPT) records which pages are modified but not yet written to the database, together with the log sequence number of the action that first modified each page. To avoid the cost of scanning the entire log, implementations regularly log snapshots of the TT and DPT in checkpoints, \(CHK[tt, dpt]\). For simplicity, we assume the log contains exactly one checkpoint.

Fig. 6. Abstract model of the database and ARIES log, and predicates.

The high level overview of the recovery algorithm in terms of its analysis, redo and undo phases is given in Fig. 7. The analysis phase first finds the checkpoint and restores the TT and DPT. Then, it proceeds to scan the log forwards from the checkpoint, updating the TT and DPT. Any new transaction is added to the TT. For any commit log record we update the TT to record that the transaction is committed. For any update log record, we add an entry for the associated page to the DPT, also recording the log sequence number, unless an entry for the same page is already in it. We give the following specification for the analysis phase:

The specification states that given the database log, the TT and DPT in the log’s checkpoint are restored and updated according to the log records following the checkpoint. The analysis does not modify any durable state.

Fig. 7. ARIES recovery: high level structure.

The redo phase follows analysis and repeats the logged updates. Specifically, redo scans the log forward from the record with the lowest log sequence number in the DPT. This is the very first update that was logged but (potentially) not yet written to the database. Each update is redone unless the page associated with it is not present in the DPT, or a more recent update has already modified it. We give the following specification to redo:

The specification states that the database is updated according to the logged update records following the smallest log sequence number in the DPT. The fault-condition specifies that after a host failure, all, some or none of the redos have happened. Since redo does not log anything, the log is not affected.

The last phase is undo, which reverts the updates of any transaction that is not committed. In particular, undo scans the log backwards from the log record with the largest log sequence number in the TT. This is the log sequence number of the very last update. For each update record scanned, if the transaction exists in the TT and is not marked as committed, the update is reversed. However, each reverting update is logged beforehand. This ensures that the undos will happen even in the case of a host failure, since they will be redone in the redo phase of the subsequent recovery run. We give the following specification for the undo phase:

The specification states that the database is updated with actions reverting previous updates as obtained from the log. These undo actions are themselves logged. In the event of a host failure the fault-condition specifies that all, some, or none of the operations are undone and logged.
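To make the three phases concrete, the following Python sketch implements a toy version of the recovery. The record shapes (tuples), the integer page data with additive updates (so \(op^{-1}\) is subtraction), the elided checkpoint (we scan from the start of the log), and the omission of logged compensation records during undo are all simplifying assumptions, not the paper's definitions.

```python
def analysis(log):
    # Forward scan: rebuild the transaction table (TT) and dirty page
    # table (DPT). The checkpoint is elided: we scan from the start.
    tt, dpt = {}, {}
    for rec in log:
        if rec[0] == "B":                       # ("B", tid): begin
            tt[rec[1]] = ("updating", None)
        elif rec[0] == "C":                     # ("C", tid): commit
            tt[rec[1]] = ("committed", tt[rec[1]][1])
        elif rec[0] == "U":                     # ("U", lsn, tid, pid, delta)
            _, lsn, tid, pid, _ = rec
            tt[tid] = (tt[tid][0], lsn)         # latest lsn of this tid
            dpt.setdefault(pid, lsn)            # first lsn that dirtied pid
    return tt, dpt

def redo(log, db, dpt):
    # Forward scan: repeat logged updates not yet reflected in the pages.
    # A page maps pid -> (lsn of last applied update, data).
    for rec in log:
        if rec[0] == "U":
            _, lsn, _, pid, delta = rec
            page_lsn, data = db[pid]
            if pid in dpt and lsn >= dpt[pid] and page_lsn < lsn:
                db[pid] = (lsn, data + delta)

def undo(log, db, tt):
    # Backward scan: revert updates of uncommitted transactions.
    # (Real ARIES logs each compensating update first, so that a crash
    # during undo is handled by the redo phase of the next run; omitted.)
    for rec in reversed(log):
        if rec[0] == "U":
            _, lsn, tid, pid, delta = rec
            if tt.get(tid, ("committed", None))[0] != "committed":
                page_lsn, data = db[pid]
                db[pid] = (page_lsn, data - delta)

def recover(log, db):
    tt, dpt = analysis(log)
    redo(log, db, dpt)
    undo(log, db, tt)
```

On a log where transaction t1 committed and t2 did not, recovery leaves t1's update in place and rolls t2's back, matching property (b) above.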

Using the specification for each phase and using our logic we can derive the following specification for this ARIES recovery algorithm:

The proof that the high level structure of the ARIES algorithm satisfies this specification is given in Fig. 8. For the implementations of each phase and proofs they meet their specifications we refer the reader to our technical report [10]. The key property of the ARIES recovery specification is that the durable precondition is the same as the fault-condition. This guarantees that the recovery is idempotent with respect to host failures. This is crucial for any recovery operation, as witnessed in the recovery abstraction rule, guaranteeing that the recovery itself is robust against crashes. Furthermore, the specification states that any transaction logged as committed at the time of host failure, is committed after recovery. Otherwise transactions are rolled back.

Fig. 8. Proof of the high level structure of ARIES recovery.

5 Semantics and Soundness

We give a brief overview of the semantics of our reasoning and the intuitions behind its soundness. A detailed account is given in the technical report [10].

5.1 Fault-Tolerant Views

We define a general fault-tolerant reasoning framework using Hoare triples with fault-conditions in the style of the Views framework [3]. Pre- and postcondition assertions are modelled as pairs of volatile and durable views (commutative monoids). Fault-condition assertions are modelled as durable views. Volatile and durable views provide partial knowledge reified to concrete volatile and durable program states respectively. Concrete volatile states include a distinguished host-failed state. The semantic interpretation of a primitive operation is given as a state transformer function from concrete states to sets of concrete states.

To prove soundness, we encode our Fault-Tolerant Views (FTV) framework into Views [3]. A judgement \({s} \, \vdash \{(p_{v},p_{d})\} \, {\mathbb {C}} \, \{(q_{v},q_{d})\}\), where \(s,p_d,q_d\) are durable views and \(p_v,q_v\) are volatile views, is encoded as a Views judgement in which volatile views are extended to include the host-failed state and \(\vee \) is the disjunction of views. For the general recovery abstraction rule, we encode \(\left[ \mathbb {C} \right] \) as a program that can test for host failures: it begins with \(\mathbb {C}\), followed by as many iterations of the recovery \(\mathbb {C}_R\) as required in case of host failures.
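The shape of this encoding can be rendered as a toy interpreter. The names and state representation below are purely illustrative (they are not the semantic definition from the paper): the encoded program runs \(\mathbb {C}\) and then, for as long as the state records a host failure, wipes the volatile state and runs the recovery again.

```python
def run_with_recovery(c, c_r, state):
    """Run c; after each host failure, reset volatile state and run c_r."""
    state = c(state)
    while state.get("host_failed"):
        # A host failure wipes the volatile state; only durable state survives.
        state = {"durable": state["durable"], "host_failed": False}
        state = c_r(state)  # the recovery may itself crash and be retried
    return state
```

Because the loop re-enters whenever a failure is recorded, the recovery `c_r` may run any number of times, matching the "as many iterations as required" reading of \(\left[ \mathbb {C} \right] \).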

We require the following properties for a sound instance of the framework:

Host failure: For each primitive operation, its interpretation function must map every non-host-failed state to a set of states that includes a host-failed state. This guarantees that each operation can be abruptly interrupted by a host failure.

Host failure propagation: For each primitive operation, its interpretation function must leave all host-failed states intact. That is, once a state records a host failure, it remains host-failed.

Axiom soundness: Each axiom must satisfy the axiom soundness property (property [G] of Views [3]).

The first two properties are required to justify the general FTV rules, while the final property establishes the soundness of the Views encoding itself. Once all the parameters are instantiated and the above properties are established, the instantiation of the framework is sound.
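The first two properties can be checked on a toy instantiation. Below, states are tuples of (name, value) pairs plus a distinguished host-failed state; the primitive and all names are hypothetical, serving only to show what the two conditions require of an interpretation function.

```python
HOST_FAILED = "HOST_FAILED"  # the distinguished host-failed state

def interp_write(state):
    """Interpretation of a toy primitive: set variable x to 1."""
    if state == HOST_FAILED:
        # Host failure propagation: host-failed states are left intact.
        return {HOST_FAILED}
    updated = tuple(sorted(dict(state, x=1).items()))
    # Host failure: every step may be cut short by a crash, so the
    # result set always includes the host-failed state.
    return {updated, HOST_FAILED}
```

Any interpretation function with this shape satisfies both conditions: it can always be interrupted by a crash, and a crash, once recorded, persists.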

5.2 Fault-Tolerant Concurrent Separation Logic

We justify the soundness of FTCSL by an encoding into the Fault-Tolerant Views framework discussed above. The encoding is similar to the encoding of concurrent separation logic into Views. We instantiate volatile and durable views as pairs of local views and shared invariants.

The FTCSL judgement is encoded as:

$$\begin{aligned} {s} \, \vdash \, \{((p_{v}, j_{v}), (p_{d}, j_{d}))\} \, {\mathbb {C}} \, \{((q_{v}, j_{v}), (q_{d}, j_{d}))\} \end{aligned}$$

The proof rules in Fig. 4 are justified by soundness of the encoding and simple application of FTV proof rules. Soundness of the encoding is established by proving the properties stated in Sect. 5.1.

Theorem 1

(FTCSL Soundness). If the judgement is derivable in the program logic, then running the program \(\mathbb {C}\) from a state satisfying \(P_V * J_V\ |\ P_D * J_D\) either does not terminate, or terminates in a state satisfying \(Q_V * J_V\ |\ Q_D * J_D\), or a host failure occurs, destroying all volatile state, such that the remaining durable state (after any recoveries) satisfies \(S * J_D\). The resource invariant \(J_V\ |\ J_D\) holds throughout the execution of \(\mathbb {C}\).

6 Related Work

There has been a significant amount of work in critical systems, such as file systems and databases, to develop defensive methods against the types of failures covered in this paper [1, 8, 14, 19]. The verification of these techniques has mainly been through testing [6, 13] and model checking [21]. However, these techniques are based on building models specific to the particular application and recovery strategy, which are difficult to reuse.

Program logics based on separation logic have been successful in reasoning about file systems [5, 9] and concurrent indexes [16], on which databases and file systems depend. However, as is typical with Hoare logics, their specifications avoid host failures, assuming that if a precondition holds then the associated operations will not fail. Faulty Logic [7] by Meola and Walker is an exception. Faulty Logic is designed to reason about transient faults, such as random bit flips due to background radiation, which are different in nature from host failures.

Zengin and Vafeiadis propose a purely functional programming language with an operational semantics providing tolerance against processor failures in parallel programs [22]. Computations are check-pointed to durable storage before execution and, upon detection of a failure, the failed computations are restarted. In general, this approach does not work for concurrent imperative programs which mutate the durable store.

In independent work, Chen et al. introduced Crash Hoare Logic (CHL) to reason about host failures and applied it to a substantial sequential journaling file system (FSCQ) written in Coq [2]. CHL extends Hoare triples with fault-conditions and provides highly automated reasoning about host failures. FSCQ performs physical journaling, meaning it uses a write-ahead log for both data and metadata, so that the recovery can guarantee atomicity with respect to host failures. The authors use CHL to prove that this property is indeed true. The resource stored in the disk is treated as durable. Since FSCQ is implemented in the functional language of Coq, which lacks the traditional process heap, the volatile state is stored in immutable variables.

The aim of FSCQ and CHL is to provide a verified implementation of a sequential file system which tolerates host failures. In contrast, our aim is to provide a general methodology for fault-tolerant resource reasoning about concurrent programs. We extend the Views framework [3] to provide a general concurrent framework for reasoning about host failure and recovery. Like CHL, we extend Hoare triples with fault-conditions. We instantiate our framework to concurrent separation logic, and demonstrate that an ARIES recovery algorithm uses the write-ahead log correctly to guarantee the atomicity of transactions. In the technical report [10], we explore the differences in the specifications of fault-tolerance guarantees in physical and logical journaling file systems.

As we are defining a framework, our reasoning of the durable and volatile state (given by arbitrary view monoids) is general. In contrast, CHL reasoning is specific to the durable state on the disk and the volatile state in the immutable variable store. CHL is able to reason modularly about different layers of abstraction of a file-system implementation, using logical address spaces which give a systematic pattern of use for standard predicates. We do not explore modular reasoning about layers of abstractions in this paper, since it is orthogonal to reasoning about host failures, and examples have already been studied in instances of the Views framework and other separation logic literature [4, 12, 17, 18, 20].

We can certainly benefit from the practical CHL approach to mechanisation and proof automation. We also believe that future work on CHL, especially on extending the reasoning to heap-manipulating concurrent programs, can benefit from our general approach.

7 Conclusions and Future Work

We have developed fault-tolerant resource reasoning, extending the Views framework [3] to reason about programs in the presence of host failures. We have proved a general soundness result. For this paper, we have focused on fault-tolerant concurrent separation logic, a particular instance of the framework. We have demonstrated our reasoning by studying an ARIES recovery algorithm, showing that it is idempotent and that it guarantees atomicity of database transactions in the event of a host failure.

There has been recent work on concurrent program logics with the ability to reason about abstract atomicity [17]. This involves proving that, even though the implementation of an operation takes multiple steps, from the client's point of view it can be seen as a single step. Currently, this is enforced by syntactic primitive atomic blocks (\(\langle \_\rangle \)) in the programming language. In future work, we aim to combine abstract atomicity from concurrency with host-failure atomicity.

Another direction for future work involves extending existing specifications for file systems [5, 9] with our framework. This will allow both the verification of interesting client programs, such as fault-tolerant software installers or persistent message queues, and the verification of fault-tolerant databases and file systems.