
1 Introduction

Linearizability [13] is the de facto correctness condition for the implementation of concurrent data structures. In a nutshell, linearizability establishes an observational equivalence between a multi-step fine-grained implementation, and an atomic coarse-grained specification of the data structure [7]. Thus, linearizability implies that each operation of a data structure implementation can be considered as executing atomically with respect to other concurrent operations.

While the definition of linearizability is intuitively simple, its proofs tend to be complex, and often depend on intricate simulation relations between the abstraction and the implementation (e.g., [3, 18]). A common strategy for defining linearizability simulations is to identify program points in the implementation of the data structure – known as linearization points (LP) – upon whose execution an operation can be considered to have happened atomically [21]. Technically, a linearization point identifies, in a weak simulation argument, the unique atomic step of the implementation's execution at which the specification and the implementation match their behaviors.

In this paper, we present a lightweight technique which reduces this complex task to a property checking problem. In our approach, the abstract specification of a data structure operation is defined using recursive functions on heaplets (assertions describing a portion of the heap). We model these assertions as sets of memory locations following [17], enabling the use of SMT solvers to discharge our proof obligations. For example, at a linearization point (LP) of a stack push method, the set of locations comprising the stack after the execution of the LP should equal the set of locations comprising the stack before the push, plus one additional location containing the newly pushed value. The validity of the formulae relating these two states (over sets) is decidable, because they can be translated to quantifier-free formulas over integers and functions, which can be solved using SMT solvers.
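Concretely, for a push of a value v on the Treiber stack of Sect. 2, the obligation at the LP takes roughly the following form (a sketch; the precise symbolic encoding, and the order-preserving variant using concatenation, are given in Sects. 3 and 4):

$$ \mathtt {vals}_{}({\sigma '},\mathtt{TOP->first }) = \{v\} \cup \mathtt {vals}_{}({\sigma },\mathtt{TOP->first }) \iff \mathtt{push}({ v}) $$

where \(\sigma \) and \(\sigma '\) are the states immediately before and after the step at the LP.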

Tools like Cave [22] are successful at automatically proving linearizability for data structures where linearization points can be affixed to static program locations within each operation's implementation. Unfortunately, a large class of lock-free linearizable data structures resists proofs based on identifying such linearization points [3, 11, 16]. This should not be surprising, since the definition of linearizability only requires the existence of a linearization for each invocation, which is intrinsically a semantic property of the implementation. Such a linearization need not correspond to the execution of predefined instructions within the code of the operation's implementation. Our work extends previous efforts on the automation of linearizability proofs leveraging internal linearization points [22] by identifying data structure implementations whose linearization points may reside in code belonging to a concurrent operation.

We do so by identifying two common patterns occurring in fine-grained concurrent algorithms that cannot be verified using linearization points. Our ideas extend the state of the art in automatic linearizability verification, embodied by Cave [22], beyond linearization points: Poling adds the following three notable features.

  • Firstly, Poling converges faster than Cave on all the benchmarks that both tools can handle. This can be attributed to the fact that Poling reasons about separation logic (SL) verification conditions by interpreting them as sets of memory locations, following [17], and discharging the resulting obligations with efficient SMT solvers.

  • Secondly, Poling can verify linearizability for implementations that use helping [14], a mechanism that allows an operation performed by one thread to be completed by a concurrent operation of a different thread.

  • Finally, Poling is sensitive to hindsight [16], a specific pattern in which the linearization of operations that do not update the state (called pure) can only be established a posteriori of their execution, and may depend on other threads' operations.

2 Overview

In the tradition of Cave, we focus on the linearizability of concurrent data structures implemented using linked lists; examples include stacks, queues and sets (see [12]). In our work we assume a garbage-collected environment. Our approach is defined with respect to the most general client (MGC) program, in which an unbounded number of threads interact through a single shared instance of the data structure [21], hence generalizing any possible client.

Fig. 1. Treiber stack.

To introduce Poling, consider the implementation of a lock-free stack due to Treiber [19] shown in Fig. 1. The stack is organized as a linked list of nodes, each containing a payload (val) and a next pointer to the following node in the list. The head of the list is stored in the field first of a shared global variable TOP, which serves as the synchronization point for competing threads. Thus, TOP->first points to the last-in node when there is one, and contains 0 if the stack is empty. The operations push and pop are presented. As customary, we group blocks that should be executed atomically with “\(\langle \)” and “\(\rangle \)” (in this case they represent an inlined cas instruction). The operation push creates a new node: it reads the head of the linked list, and then tries to atomically update it. If some other thread modifies TOP->first between the read and the write, the update is aborted and the process is retried. A similar argument applies to pop.
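Fig. 1 is not reproduced here; the following is a minimal C11 sketch in the spirit of the Treiber stack just described. The names, the EMPTY sentinel, and the use of C11 atomics are our own illustrative assumptions; the CAS loops stand for the atomic blocks of the figure.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define EMPTY 0  /* assumed sentinel for pop on an empty stack */

typedef struct node { int val; struct node *next; } node;
struct { _Atomic(node *) first; } TOP;  /* shared synchronization point */

void push(int v) {
    node *x = malloc(sizeof *x);
    x->val = v;
    node *y;
    do {                                  /* retry on interference     */
        y = atomic_load(&TOP.first);      /* read the current head     */
        x->next = y;
    } while (!atomic_compare_exchange_weak(&TOP.first, &y, x)); /* LP  */
}

int pop(void) {
    node *y, *z;
    do {
        y = atomic_load(&TOP.first);
        if (y == NULL) return EMPTY;      /* L0: pure LP, empty stack  */
        z = y->next;
    } while (!atomic_compare_exchange_weak(&TOP.first, &y, z)); /* L: LP */
    return y->val;  /* no free(): a garbage-collected setting is assumed */
}
```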

Since linearizability relates the implementation to a specification, the programmer must provide the specification. In Cave this is done by means of a simple language with primitives for the interpretation of lists and sets, which the programmer uses to specify the data structure operationally. In Poling, on the other hand, the specification is given declaratively using recursive functions. The specification of pop is:

$$ {\small \begin{array}{rcll} \mathtt {vals}_{}({\sigma },\mathtt{TOP->first }) &{} = &{} r \mathop {{:}{:}}\mathtt {vals}_{}({\sigma '},\mathtt{TOP->first }) &{} \iff \mathtt{pop}() = { r}\\ \mathtt {vals}_{}({\sigma },\mathtt{TOP->first }) &{} = &{} [\ ] &{} \iff \mathtt{pop}() = \mathtt{Empty} \end{array}} $$

In these equations, TOP and first are program variables. The keyword vals is a primitive used to abstractly refer to the contents of a data structure, where the argument \(\sigma \) represents the current state, and the argument TOP->first is a program expression pointing to the first element of the list. The state resulting from executing pop is represented by the symbol \(\sigma '\). Finally, the return value of pop is denoted by r. Then, the specification of pop establishes the relation between the abstract state of the list before and after its execution. In this example, Poling interprets vals(\(\sigma \), TOP->first) as a recursive function [17], defining the values stored in the locations reachable from TOP->first in \(\sigma \). Recursive functions will be formalized in Sect. 3.

In Fig. 1, the program point marked with L0 corresponds to the linearization point of pop when the list is empty. The specification for the non-empty case is satisfied at the program point L, because the set of values of locations reachable from TOP->first in a state \(\sigma \) (before the execution of the statement at L) equals the set of values reachable in the resulting state (\(\sigma '\)), plus the element pointed to by y, which is pop's return value. Importantly, no other point in pop changes the values of the reachable locations. Using the set of memory locations at L, Poling calls an SMT solver to automatically check whether the update to these locations before and after L respects the specification of pop. We present verification details in Sect. 4.

Fig. 2. Elimination-based stack. The formulae in square brackets are assumes, and each node denotes an atomic block. Struct desc is an annotated descriptor.

The example shown in Fig. 2 (adapted from [6]) is a simplification of the HSY stack [10]. The stack uses an underlying Treiber implementation, but improves its performance in high-contention scenarios through an elimination layer, an occurrence of the helping mechanism. The example allows pairs of overlapping instances of pop and push to exchange their values without accessing the list. Each operation attempts an iteration of the Treiber implementation, and if it fails it applies the helping mechanism; the process is iterated until the operation succeeds. Helping is implemented by storing a descriptor of a pending pop in the shared state variable popdesc. In the descriptor, if the value of g is W(aiting) at location a\(_1\), there is an invocation of pop ready to be helped. If the value of g is C(ollided), then a pair of push and pop invocations is in the process of exchanging the value, through g, at location b. The helping completes or gives up when pop sets the value of g back to F(ree), at locations a\(_2\) and a\(_3\), so that another instance of helping can happen. Importantly, the linearization of pop can happen in the code of a concurrent push, at location b; and at location a\(_2\) the pop can witness that it has been helped, because the transition from a\(_1\) to a\(_2\) can only be caused by an action that happened at location b of a push thread, where the pop was linearized.

We exploit this witnessing strategy in our linearization proofs. When an operation is linearized by another operation that helps it, we record the state at which the operation was linearized. Then, in the verification of the helped operation, we can use the recorded state as a witness to check that it was indeed helped by a concurrent operation. We observe that the witnessing is carried out through the descriptors (popdesc). To verify how a push can find and then help a pending pop, we allow programmers to use JML-style markers to annotate the descriptor data structure, indicating an operation's name, parameters and return value (denoted in Fig. 2 via @operation, @parameter and @return respectively), as sketched below. We cover the details of the verification in Sect. 4.
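For concreteness, a hypothetical rendering of the annotated descriptor of Fig. 2 follows; the field names follow the prose, while the exact annotation syntax accepted by Poling may differ.

```c
/* Status of the elimination slot: F(ree), W(aiting), C(ollided). */
typedef enum { F, W, C } gstate;

typedef struct desc {
    /* @operation pop      -- name of the pending operation          */
    gstate g;  /* synchronization state of the helping protocol      */
    int d;     /* @return  -- value deposited by the helping push    */
} desc;

desc *popdesc;  /* shared descriptor of a pending pop (Fig. 2) */
```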

3 Formal Model

We define a data structure \(\mathtt{D}\) as a pair \((\mathtt{D}_{\varSigma }, \mathtt{D}_{\mathtt {M}})\) consisting of \(\mathtt{D}_{\varSigma }\), the domain of the data structure, and \(\mathtt{D}_{\mathtt {M}}\), a set of method names, or primitive operations in the terminology of [13]. We call an invocation of a method \(\mathtt{m}\) simply an operation, and use the metavariable \(op_{\mathtt{m}}\) to range over the set \(\mathcal {O}ps\) of all possible invocations. In turn, an operation can be decomposed into a tuple \(op_{\mathtt{m}} = (\mathtt{m},t,v, r)\) containing a thread identifier \(t\in \mathcal {T}id\), a vector of arguments \(v\), and, if the operation is completed, a return value \(r\).

Program Model. We omit a description of the program state and operational semantics of the programming language, assumed to be a standard first-order, shared-memory, concurrent, imperative language. We assume a set of states \(\sigma \in \mathtt{D}_{\varSigma }\), and an operational semantics with execution steps between states labeled by events \(ev\in Evs\), following the judgment \(\sigma \xrightarrow {t,ev} \sigma '\). Events capture data and control-flow actions (loads, stores, conditionals, etc.), with the addition of invocation and response events of an operation \(op_{\mathtt{m}}\), denoted \(\mathsf {inv}(op_{\mathtt{m}})\) and \(\mathsf {res}(op_{\mathtt{m}})\) respectively. These latter two kinds of events serve to delimit the “duration” of operation invocations. We assume the obvious extension of this step judgment to traces of events, ranged over by \({\mathsf {tr}}\in Evs^*\) and denoted \(\sigma \xrightarrow {{\mathsf {tr}}} \sigma '\), with the obvious inductive definition. Where unnecessary, we omit the intermediate states.

Linearizability. Following [21], and without loss of generality, we assume a thread makes at most one invocation to an operation of the data structure. We then overload the notations for invocations and responses as \(\mathsf {inv}(t)\) and \(\mathsf {res}(t)\), for a thread \(t\). We define \(\mathsf {history}({\mathsf {tr}})\) to be the projection of invocation and response events from trace \({\mathsf {tr}}\). Traces induce a partial order between operations: we say that an operation \(t\) precedes an operation \(t'\) in \({\mathsf {tr}}\), written \(t \prec _{\mathsf{tr}} t'\), as defined below:

$$\begin{aligned} t \prec _{\mathsf{tr}} t' \iff \ \exists \ {\mathsf {tr}}_0\,\, {\mathsf {tr}}_1\,\, {\mathsf {tr}}_2,\ {\mathsf {tr}}= {\mathsf {tr}}_0 \cdot \mathsf {res}(t) \cdot {\mathsf {tr}}_{1} \cdot \mathsf {inv}(t') \cdot {\mathsf {tr}}_2 \end{aligned}$$

We say that a history is sequential if each invocation event \(\mathsf {inv}(t)\) is immediately followed by its corresponding response event \(\mathsf {res}(t)\).
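As a simple illustration (ours, not from the paper's figures), consider the histories

$$ h = \mathsf {inv}(t) \cdot \mathsf {inv}(t') \cdot \mathsf {res}(t') \cdot \mathsf {res}(t) \qquad h' = \mathsf {inv}(t) \cdot \mathsf {res}(t) \cdot \mathsf {inv}(t') \cdot \mathsf {res}(t') $$

In \(h\) the two operations overlap, so neither \(t \prec _h t'\) nor \(t' \prec _h t\) holds, and a linearization may order them either way; \(h'\) is sequential and forces \(t \prec _{h'} t'\), an order any linearization must preserve.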

Definition 1

(Linearizable History). A history \(h\) is linearizable w.r.t. the specification of a data structure \(\mathtt{D}\) if, and only if, there exists a sequential history \(h_{s}\) of \(\mathtt{D}\) such that \(\prec _h \subseteq \prec _{h_s}\), and the sets of operations of \(h\) and \(h_{s}\) coincide.

This notion is trivially lifted to implementations by requiring every implementation trace to have a linearizable history.

Set Specification. As hinted in Sect. 2, we abstract the data structure through an abstraction function \(\mathtt{vals}(\sigma , \mathtt{x})\), which for a given state \(\sigma \) represents the set of values stored in the data structure pointed to by the program variable x. This allows us to express the abstract data structure specification as an equation relating the initial state \(\sigma \) and the final state \(\sigma '\). Then, for \(Set\) data structures implemented as an ordered list, where we assume that the global variable \(\mathsf {head}(Set)\) points to the beginning of the list, we write:

$$\begin{aligned} \begin{array}{rclcl} \mathtt {vals}_{}({\sigma '},\mathsf {head}(Set))\ &{} = &{} \{v\} \cup \mathtt {vals}_{}({\sigma },\mathsf {head}(Set)) &{} \iff &{} \mathtt{add}({ v}) = \text {true}\\ \mathtt {vals}_{}({\sigma '},\mathsf {head}(Set)) \cup \{v\}\ &{} = &{} \mathtt {vals}_{}({\sigma },\mathsf {head}(Set)) &{} \iff &{} \mathtt{contains}({ v}) = \text {true}\\ v &{}\notin &{} \mathtt {vals}_{}({\sigma },\mathsf {head}(Set))\ &{} \iff &{} \mathtt{contains}({ v}) = \text {false} \end{array} \end{aligned}$$

Order Preserving Specifications. Using the concatenation operator (\(\mathop {{:}{:}}\)) instead of union (\(\cup \)) as above, we can capture the behavior of data structures that impose a temporal ordering on their elements. For example, a stack pop should always return the “last” value pushed. Assuming that \(\mathsf {head}(\mathtt{D})\) is the global variable pointing to the first element of the list, we can specify pop and dequeue (omitting all other methods) as:

$$ {\small \begin{array}{rclcl} \small \mathtt {vals}_{}({\sigma },\mathsf {head}(Stk)) &{} = &{} r \mathop {{:}{:}}\mathtt {vals}_{}({\sigma '},\mathsf {head}(Stk)) &{} \iff &{} \mathtt{pop}() = { r}\\ \mathtt {vals}_{}({\sigma },\mathsf {head}(Queue)) &{} = &{} \mathtt {vals}_{}({\sigma '},\mathsf {head}(Queue)) \mathop {{:}{:}}r &{} \iff &{} \mathtt{dequeue}() = { r} \end{array} } $$

State Abstraction. Since we use RGSep [21], we abstract the program state \(\sigma \) using separation logic formulae, denoted by the metavariable \(\psi \). The syntax of these formulae is given by the following grammar, where \(\approx \) ranges over binary relations over expressions.

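The grammar itself is not reproduced here; based on the description below, it plausibly takes the following shape (a reconstruction, not the original figure):

$$ \begin{array}{rcl} \psi &::=& P_1 * \boxed{P'_1}\ \vee \ \cdots \ \vee \ P_n * \boxed{P'_n} \qquad P ::= \varPi \wedge \Gamma \\ \varPi &::=& \mathsf{true} \mid E \approx E' \mid \varPi \wedge \varPi \qquad \Gamma ::= \mathsf{emp} \mid E_{tl} \mapsto \rho \mid \mathtt{lseg}_{tl,\rho }(E, E') \mid \Gamma * \Gamma \end{array} $$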

We briefly describe these assertions (details can be found in [21]): 1. the formulae are given in disjunctive normal form, representing the different possible states reachable through different paths; 2. each disjunct has two parts: the local state \(P\), predicating over the state local to the thread, and a shared state \(\boxed {P'}\) – demarcated by a box [21] – which predicates over the state accessible to all threads; 3. further, each of these states can be separated into a pure part \(\varPi \), only concerned with stack-allocated variables, and a spatial part \(\Gamma \), which describes the heap; 4. finally, heap assertions include the standard separation logic operators, where \(E_{tl} \mapsto \rho \) denotes that the location E contains a record \(\rho \) (a mapping from field names to values) whose special field tl points to the next node in a linked list, and \(\mathtt{lseg}_{tl,\rho }(E, E')\) denotes a linked list segment starting at location E and ending at \(E'\), in which tl plays the same role and all nodes share the same field-value mapping described by \(\rho \).

Data Structure Abstraction. We use the method of [17] to discharge proof obligations about the state using an SMT solver. In [17], SL assertions are encoded as predicates on sets of memory locations. To that end, we define in Fig. 3 a data structure abstraction function that takes an RGSep assertion and transforms it into a set of values. This is the function \(\widehat{\mathtt{vals}}\), the symbolic counterpart of the function \(\mathtt {vals}_{}(\sigma ,\mathtt{x})\) used in specifications. This recursive abstraction definition represents the set of elements that inhabit the data structure. The function \(\mathsf {Find}(\Gamma ,\mathsf {E})\) simply finds the syntactic atomic occurrence of a node or a list starting from \(\mathsf {E}\) in the spatial formula \(\Gamma \); we omit its trivial recursive definition over RGSep formulae. Here, the capitalized expressions VALS are uninterpreted functions in the logic of the underlying theorem provers we use [17]. Finally, notice that if we substitute the concatenation operator (\(\mathop {{:}{:}}\)) for all occurrences of the union operator (\(\cup \)) in Fig. 3, we obtain a recursive definition for lists instead of sets. A refined definition of our data structure abstraction, which also considers reachable locations that are logically detached, is given in [25].

Fig. 3. Recursive abstraction definition.
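Fig. 3 is not reproduced here; from the surrounding prose, the definition plausibly takes the following shape (a reconstruction, with VAL and VALS the uninterpreted functions mentioned above):

$$ \begin{array}{rcll} \widehat{\mathtt{vals}}(\Gamma ,\mathtt{0}) &=& \emptyset & \\ \widehat{\mathtt{vals}}(\Gamma ,\mathsf {E}) &=& \{\mathtt{VAL}(\mathsf {E})\} \cup \widehat{\mathtt{vals}}(\Gamma ,\rho (tl)) & \text {if } \mathsf {Find}(\Gamma ,\mathsf {E}) = \mathsf {E}_{tl} \mapsto \rho \\ \widehat{\mathtt{vals}}(\Gamma ,\mathsf {E}) &=& \mathtt{VALS}(\mathsf {E},\mathsf {E}') \cup \widehat{\mathtt{vals}}(\Gamma ,\mathsf {E}') & \text {if } \mathsf {Find}(\Gamma ,\mathsf {E}) = \mathtt{lseg}_{tl,\rho }(\mathsf {E},\mathsf {E}') \end{array} $$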

4 Verification

Our verification begins after a pass of the frontend of Cave [22], which, given the implementation of a data structure \(\mathtt{D}\), uses symbolic execution and shape analysis to produce \(\mathtt{D}\)'s data structure invariant [4] and RGSep [21] rely-guarantee actions. To aid the verification of the helping mechanism, Poling requires the programmer to instrument descriptors (as exemplified in Fig. 2). A simple analysis could be implemented to instrument the descriptors automatically, or an interface could be provided to indicate the thread descriptors, but we omit this unrelated step to simplify our development.

Central to our development are LP (linearization point) functions.

Definition 2

(Valid LP). Assume a trace \({\mathsf {tr}}: \sigma _{i} \xrightarrow {{\mathsf {tr}}} \sigma _{f}\) of the data structure \(\mathtt{D}\), and a function \(\varrho : \mathcal {O}ps\rightarrow \mathtt{D}_{\varSigma }\), mapping each operation \(op\) of \({\mathsf {tr}}\) to the state in \({\mathsf {tr}}\) exactly prior to the linearization of \(op\). We say that \(\varrho \) is a valid linearization point function for \({\mathsf {tr}}\) with respect to an abstract specification \(\varphi \) if:

  1.

    every operation \(op\in {\mathsf {tr}}\) has an LP state (i.e., \(\varrho (op)\)) strictly between its invocation (\(\mathsf {inv}(op)\)) and its response (\(\mathsf {res}(op)\)). Formally:

    $$ \exists \ {\mathsf {tr}}_{(1:4)},\ \sigma _{i} \xrightarrow {{\mathsf {tr}}} \sigma _f = \sigma _{i} \xrightarrow {{\mathsf {tr}}_1} \cdot \xrightarrow {\mathsf {inv}(op)} \cdot \xrightarrow {{\mathsf {tr}}_2} \varrho (op) \xrightarrow {{\mathsf {tr}}_3} \cdot \xrightarrow {\mathsf {res}(op)} \cdot \xrightarrow {{\mathsf {tr}}_4} \sigma _f $$
  2.

    only the states that linearize operations can affect the abstract data structure:

    $$\begin{aligned} \sigma _1&\notin \{\varrho (op)\ |\ op\in {{\mathsf {tr}}}\}\ \text {and}\ \sigma _{i} \xrightarrow {{\mathsf {tr}}} \sigma _f = \sigma _{i} \xrightarrow {{\mathsf {tr}}_1} \sigma _1 \xrightarrow {ev} \sigma _2 \xrightarrow {{\mathsf {tr}}_2} \sigma _f \\&\Rightarrow \mathtt {vals}_{}(\sigma _1,\mathsf {head}(\mathtt{D})) = \mathtt {vals}_{}(\sigma _2,\mathsf {head}(\mathtt{D})) \end{aligned}$$
  3.

    for each operation \(op\in {\mathsf {tr}}\), the LP state and its subsequent state are related by the data structure specification \(\varphi \). Formally, if the abstract specification of \(op= (\mathtt{m},t,v,r)\) is \(\varphi \), for a trace of \(op\): \(\sigma _{i} \xrightarrow {{\mathsf {tr}}} \sigma _f = \sigma _{i} \xrightarrow {{\mathsf {tr}}_1} \varrho (op) \xrightarrow {ev} \hat{\sigma } \xrightarrow {{\mathsf {tr}}_2} \cdot \xrightarrow {\mathsf {res}(op)} \sigma _r \xrightarrow {{\mathsf {tr}}_3} \sigma _f\) where \(\mathtt{m}(v) = \sigma _r(r)\): (1) if \(op\) is the only operation linearized at \(\varrho (op)\) (i.e., there does not exist another \(op'\) such that \(op' \ne op\wedge \varrho (op') = \varrho (op)\)), then

    $$ [\varrho (op)/\sigma ][\hat{\sigma }/\sigma ']\varphi $$

    (2) if there does exist an \(op'\) whose abstract specification is \(\varphi '\), such that \(op' \ne op\wedge \varrho (op') = \varrho (op)\), then, for any (ghost) state \(\sigma _g\),

    $$ ([\varrho (op)/\sigma ][\sigma _{g}/\sigma ']\varphi \Rightarrow [\sigma _{g}/\sigma ][\hat{\sigma }/\sigma ']\varphi ') \vee ([\varrho (op)/\sigma ][\sigma _{g}/\sigma ']\varphi ' \Rightarrow [\sigma _{g}/\sigma ][\hat{\sigma }/\sigma ']\varphi ) $$

In condition 3, to verify whether an operation \(op\) can be linearized when \(op\) is the single operation linearized at state \(\varrho (op)\), we prove that the specification \([\varrho (op)/\sigma ][\hat{\sigma }/\sigma ']\varphi \) is respected by the execution step from the LP state \(\varrho (op)\). In the substitution, the parametric pre state \(\sigma \) and post state \(\sigma '\) of the specification \(\varphi \) are replaced with the LP state and its post state \(\hat{\sigma }\). However, in the case of helping, many operations can be linearized in a single step (e.g., a push and a pop, as exemplified in Fig. 2). We handle this case by introducing ghost states. For example, if an event \(\varrho (op) \xrightarrow {ev} \hat{\sigma }\) linearizes two operations, say \(\varrho (op) = \varrho (op')\), we check that there exists an intermediate state \(\sigma _{g}\) such that \(\varrho (op)\) and \(\sigma _{g}\) satisfy the specification of \(op\), and \(\sigma _{g}\) and \(\hat{\sigma }\) satisfy the specification of \(op'\), or vice versa. Intuitively, this mediates and orders the linearization of the two operations.

Theorem 1

A data structure implementation \(\mathtt{D}\) is linearizable with respect to an abstract specification \(\varphi \) if, for every trace \({\mathsf {tr}}\) of the implementation, there exists a valid \(\mathtt{D}\)-LP function with respect to the specification \(\varphi \).

Witness States. Definition 2, paired with Theorem 1, provides us with a way of checking linearizability by constructing valid LP functions. However, since LP functions map operations to states, the LP state of an operation may immediately precede an event executed by a thread other than the one whose operation is linearized. The same applies when multiple operations are linearized in a single step. This is the case for the helping mechanism of Fig. 2. We therefore define witness states as states at which a thread can make sure that it has been linearized, i.e., that a prior LP state (\(\varrho (op)\)) exists. In the simple case where the linearization point of an operation can be identified with one of its own program statements, the witness state is exactly the state before executing that statement. We prove linearizability by identifying witness states. In our approach, we distinguish effectful witness states, where the abstract data structure is altered (like L in Fig. 1), from pure (or effect-less) witness states, which leave the abstract data structure intact (like L0 in Fig. 1).

Algorithm. We present our overall verification strategy in Fig. 4. We first use heuristics derived from Cave to identify a set of states as candidate witness states (\(\psi _{\sigma }\)), paired with the event (\(ev\)) to be executed next; such events include all memory reads and writes. The abstract states (\(\psi _{\sigma }\)) are obtained through the symbolic execution of Cave. The function Check verifies whether, in a symbolic state \(\psi _{\sigma }\), the linearization of the operation can be witnessed w.r.t. its abstract specification \(\varphi \). Programmers provide \(\varphi \) through the definition of \(\mathtt{vals}(\sigma , \mathsf {head}(\mathtt{D}))\), which is translated into its symbolic version \(\widehat{\mathtt{vals}}\).

Consider the pseudo-code of Check (given in Fig. 4). In line 2 we symbolically execute the event \(ev\) from the state \(\psi _{\sigma }\). To check whether the abstract specification \(\varphi \) is fulfilled, in line 3 we replace the initial and final states with the ones obtained by symbolic execution, unroll the definitions of \(\widehat{\mathtt{vals}}\) mentioned in the specification of the method, and encode them in first-order logic (FOL) with set theories following [17]. We feed the unrolled formulae to an SMT solver (the arguments v and return values r in the specification are also replaced with the proper program variables, not shown here). Notice that satisfiability of quantifier-free formulas over sets/multisets with set union (\(\cup \)) is decidable; concatenation (\(\mathop {{:}{:}}\)) is treated as uninterpreted. If the formula is provable, we have identified a witness state.

This strategy only applies to the case when linearization can be syntactically associated with instructions of the operation's own code, i.e., LPs. Lines 4-6 deal with the cases when the linearization point might reside in operations of concurrent threads, covered in the subsequent sections. If we are not able to prove \(\varphi \) in \(\psi _{\sigma }\) after these checks, then at line 7, before reporting that this state is not a candidate witness, we check that the abstract data structure did not change. Recall that we assume that only linearization events can modify the abstract state of the data structure. Hence, if the state did change, we abort the process at line 9 and report that the implementation could not be proved linearizable. After all the witness states are validated, following the strategy in [22], we use a simple data-flow analysis to verify that each program path has either exactly one witness state or at least one pure witness.

Fig. 4. General framework (the full pseudo-code is provided in [25]).
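The pseudo-code of Fig. 4 is not reproduced here; the following C-flavoured sketch reconstructs Check from the prose above. All helper names (symexec, smt_valid_spec, abstract_unchanged, helps, helped, hindsight) are our own stand-ins for Poling's internals, declared as stubs; the two-pass use of the Spec Pool described later in this section is elided.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct state State;  /* symbolic (RGSep) state            */
typedef struct event Event;  /* event to be executed next         */
typedef struct spec  Spec;   /* abstract specification phi        */

/* Assumed engine primitives, corresponding to the steps in the text. */
extern State *symexec(const State *psi, const Event *ev);
extern bool smt_valid_spec(const Spec *phi, const State *pre, const State *post);
extern bool helps(const State *psi, const Event *ev);      /* obligation (a) */
extern bool helped(const State *psi, const Event *ev);     /* obligation (b) */
extern bool hindsight(const State *psi, const Event *ev);  /* hindsight check */
extern bool abstract_unchanged(const State *pre, const State *post);

/* Is psi a witness state for an operation with specification phi? */
bool Check(const State *psi, const Event *ev, const Spec *phi) {
    State *post = symexec(psi, ev);            /* symbolically execute ev    */
    if (smt_valid_spec(phi, psi, post))        /* spec relates pre and post  */
        return true;                           /* internal LP found          */
    if (helps(psi, ev) || helped(psi, ev) || hindsight(psi, ev))
        return true;                           /* external-LP patterns       */
    if (abstract_unchanged(psi, post))
        return false;                          /* safe: not a witness        */
    exit(EXIT_FAILURE);  /* abstract state changed at a non-witness:
                            report that linearizability could not be proved */
}
```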

Example 1

Consider the pop method of the stack implementation of Fig. 1. With the method described above, we obtain a symbolic state before the program point L (we only show the shared state):

$$ \small \mathtt{TOP} \mapsto (\mathtt{first}:\mathtt{y})\ *\ \mathtt{y} \mapsto \mathtt{z}\ *\ \mathsf{lseg}_{\mathtt{next}}(\mathtt{z}, \mathtt{0}) $$

This state results from the successful test in pop (\(\mathtt{TOP->first} = \mathtt{y}\)). We consider this state to be the witness state of pop (i.e., \(\psi _{\sigma }\)); the assignment of \(\mathtt{z}\) to TOP->first would then be performed. To verify whether this implementation is faithful to the stack specification, we first symbolically execute the instruction at L to obtain the post state after L (i.e., \(\psi _{\sigma '}\)):

$$ \small \mathtt{TOP} \mapsto (\mathtt{first}:\mathtt{z})\ *\ \mathsf{lseg}_{\mathtt{next}}(\mathtt{z}, \mathtt{0}) \ *\ \mathtt{y} \mapsto \mathtt{z} $$

According to the abstract specification of pop (Sect. 3), we have to prove:

$$ \small \widehat{\mathtt{vals}}(\psi _{\sigma }, \mathtt{TOP->first}) = r \mathop {{:}{:}} \widehat{\mathtt{vals}}(\psi _{\sigma '}, \mathtt{TOP->first}) $$

After unfolding, and substituting the special return symbol r with the actual return value, we obtain:

$$ \small \mathtt{VAL}(\mathtt{y}) {:}{:} \mathtt{VALS}(\mathtt{z}, \mathtt{0}) = \mathtt{VAL}(\mathtt{y}){:}{:}\mathtt{VALS}(\mathtt{z}, \mathtt{0}) $$

which is clearly provable. Moreover,

$$ \small \widehat{\mathtt{vals}}(\psi _{\sigma }, \mathtt{TOP->first}) = \widehat{\mathtt{vals}}(\psi _{\sigma '}, \mathtt{TOP->first}) $$

holds for all the other states of pop. After all these verification steps, our method concludes that the program state before L is a valid effectful witness state.

Helping Verification. Consider a trace \(\sigma _{i} \xrightarrow {{\mathsf {tr}}_1} \varrho (op') \xrightarrow {t, ev} \sigma \xrightarrow {{\mathsf {tr}}_2} \sigma _f\) corresponding to an execution of a data structure where \(op'\) is specified as \((\mathtt{m},t',v,r)\). This trace is typical of algorithms implementing the helping mechanism. Here, the event (\(ev\)) that linearizes thread \(t'\) (the thread executing \(op'\)) is taken by a concurrent thread \(t\). A key ingredient of this pattern are the descriptors, which keep information about ongoing invocations performed by different threads (cf. popdesc in Fig. 2). A thread advertises its ongoing operation through its descriptor, which concurrent threads use to complete the helping. In our proof, a thread under verification can retrieve the specifications of the other concurrent operations through their descriptors.

To exploit such descriptors, we add to the symbolic state a set that represents helped operations. We call this set the Spec Pool, and use it to keep track of the synchronization entailed by the descriptors. Operations that perform helping are assumed to affect the Spec Pool. In the Spec Pool, each helped operation is equipped with (1) the condition that must hold upon its linearization, and (2) the rely action (used by the helper thread) that linearizes it; a sketch of an entry follows.
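A Spec Pool entry \((op', \alpha (\psi _{\sigma }), RG)\) might be represented as follows (an illustrative sketch; Formula and RGAction stand for Poling's FOL and rely-guarantee representations, which the paper leaves abstract):

```c
typedef struct formula  Formula;   /* FOL abstraction of an SL state */
typedef struct rgaction RGAction;  /* rely-guarantee action          */

/* One entry per helped operation. */
typedef struct pool_item {
    const char *op;    /* helped operation, e.g. "pop"                */
    Formula    *cond;  /* alpha(psi_sigma): holds upon linearization  */
    RGAction   *rg;    /* rely action that performed the helping      */
} pool_item;
```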

Fig. 5. Spec pool.

We provide a pictorial description of the process in Fig. 5. Here we consider an event \(\varrho (op') \xrightarrow {t, ev} \sigma \) from thread t which helps a concurrent operation \(op'\). This event modifies the Spec Pool by inserting a tuple (\(op'\), \(\alpha (\psi _{\sigma })\), RG), indicating that \(op'\) has been helped at state \(\sigma \); RG is the rely-guarantee action extracted from this step [23], and \(\alpha \) is a function that encodes an SL formula \(\psi _{\sigma }\) into a FOL formula. A verification step from \(op'\) (i.e., \(t'\)) can observe the effects of t at a state \(\sigma '\) (by checking a first-order logical implication between \(\alpha (\psi _{\sigma })\) and \(\alpha (\psi _{\sigma '})\)). When verifying \(op'\) we also need to check that \(\sigma '\) can only be reached with the help of the rely action RG, absent which \(\sigma '\) would be unreachable for \(t'\). If the check is successful, \(\sigma '\) is considered the witness state for \(op'\). Figure 6 presents the definition of \(\alpha (\psi )\). Simply stated, we keep the pure part of the SL formula and forget the list segments. We encode the field-value mapping of a memory location (\(\mathsf {E}_{\mathsf {tls}} \mapsto \rho \)) into a conjunction of equations, each encoding the value \(\rho (\mathsf {pf})\) of a field \(\mathsf {pf}\) of the location \(\mathsf {E}\). We encode \(\mathsf {E}\) only if it is marked as a descriptor (local variables are implicitly existentially quantified).
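From this description, \(\alpha \) plausibly behaves as follows on spatial conjuncts (a reconstruction; Fig. 6 is not reproduced here), distributing over the boolean connectives and \(*\), and leaving the pure part \(\varPi \) unchanged:

$$ \alpha (\mathsf {E}_{\mathsf {tls}} \mapsto \rho ) = \bigwedge _{\mathsf {pf}} \mathsf {E}.\mathsf {pf} = \rho (\mathsf {pf}) \ \ (\text {if } \mathsf {E} \text { is a descriptor}) \qquad \alpha (\mathtt{lseg}_{tl,\rho }(\mathsf {E},\mathsf {E}')) = \mathsf {true} $$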

Our approach hence reduces the problem of verifying linearizability to the following proof obligations: (a) we must check how an operation can be helped at the valid LP state in Fig. 5 (i.e., linearized by another thread); this corresponds to the helps function in Fig. 4; (b) for the thread that is helped, we must check the code that detects whether the operation has been helped at the valid witness state in Fig. 5; this corresponds to the helped function in Fig. 4; and (c) we must check that the helped operation is linearized exactly once.

Fig. 6. Data abstraction from SL formula to FOL formula.

For (a), we prove whether a given execution step of thread \(t\) can linearize another thread \(t'\) (with \(t' \ne t\)), directly following Definition 2. Let us consider how this proof works for the push and pop of Fig. 2. At statement b of push we detect the descriptor popdesc, representing a concurrent pop thread. Assuming \(\mathsf {head}(Stk) = \mathtt {TOP}\), we check:

$$ [\psi _{\sigma }/\sigma ][\psi _{gst}/\sigma ']\varphi _{\mathtt{push}} \Rightarrow [\psi _{gst}/\sigma ][\psi _{\sigma '}/\sigma ']\varphi _{\mathtt{pop}} $$

According to Definition 2, \(\psi _{gst}\) is a necessary intermediate state of the abstract data structure between the push and pop operations (it does not exist in the actual execution). The antecedent is obtained from push's specification, with the argument substituted by the formal parameter v. The consequent is pop's specification, with the return value substituted by popdesc.d, known from the descriptor. Since the stack is not updated by instruction b, after unrolling we can prove the above formula. Both operations are linearized in this step. After verifying that pop is helped, we create a Spec Pool item \((op', \psi ', RG)\), representing the result of helping at statement b:

$$ \small \big (\mathtt {pop},\ \exists \mathtt{v}.\ \mathtt{popdesc.g} = \mathtt{C} \wedge \mathtt{popdesc.d = v},\ \mathtt{popdesc} \mapsto (\mathtt{g:W}) \leadsto \mathtt{popdesc} \mapsto (\mathtt{g:C})\big ) $$

As stated, \(\psi '\) is the data abstraction of the state reached by the helping step, while RG here only shows the key rely-guarantee action [21], i.e., g is changed from W(aiting) to C(ollided).

We prove (b) by showing that if a thread \(t'\) is linearized by another thread t, this fact is manifest in the Spec Pool. To prove that \(t'\) has been helped in a state \(\psi _{\sigma '}\), we need to find a pool element \((op', \psi ', RG)\) such that the operation of \(t'\) has the same method name as \(op'\), and prove with an SMT solver that:

$$\begin{aligned} \small (\psi ' \Rightarrow \alpha (\psi _{\sigma '})) \wedge \lnot (\psi ' \wedge \alpha (\psi _{\sigma '\setminus RG})) \end{aligned}$$

The first conjunct implies that \(op'\) may have been linearized by another thread, and the second ensures that this could only result from other threads' interference RG. To check that this linearization could not have happened without the interference of another thread's help, we compute the state \(\psi _{\sigma '\setminus RG}\) by symbolically executing the method (using Cave) up to the code location of \(\sigma '\) while dropping the rely action RG. The conditions recorded in the Spec Pool (which hold upon helping) must then contradict \(\psi _{\sigma '\setminus RG}\). Consider the path reaching statement a\(_2\) of \(\mathtt {pop}\) in Fig. 2. The condition in the Spec Pool for pop (\(\psi '\)) entails the data abstraction of the state at a\(_2\) (abstracted as \(\mathtt{popdesc.g} = \mathtt{C}\)). The only possible way to satisfy this assertion is via the rely RG, since originally we had \(\mathtt{popdesc.g} = \mathtt{W}\) at a\(_1\). We conclude that the pop was linearized by RG, the action made by a concurrent push, as the instantiated check below illustrates. We also need to ensure that a program path that witnesses helping returns the value of the return-field instrumented in the descriptor (e.g., return \(\mathtt{popdesc.d}\) at a\(_2\)).
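Instantiating the check above with the pool item created at b (our own worked instance of the formula from the text), and abstracting the state at a\(_2\) as \(\mathtt{popdesc.g} = \mathtt{C}\) and the rely-free state \(\psi _{\sigma '\setminus RG}\) as \(\mathtt{popdesc.g} = \mathtt{W}\):

$$ \big (\exists \mathtt{v}.\ \mathtt{popdesc.g} = \mathtt{C} \wedge \mathtt{popdesc.d} = \mathtt{v}\big ) \Rightarrow \mathtt{popdesc.g} = \mathtt{C} \quad \wedge \quad \lnot \big (\mathtt{popdesc.g} = \mathtt{C} \wedge \mathtt{popdesc.g} = \mathtt{W}\big ) $$

Both conjuncts are valid: the implication is immediate, and g cannot be both C and W, so the state at a\(_2\) is unreachable without the rely action RG.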

We prove (c) by checking that an operation can only be helped once (e.g., helping for thread \(t'\) should be prohibited from state \(\sigma ''\) in the trace in Fig. 5). We leave the details to [25].

Our verification procedure maintains the Spec Pool as part of the abstract state, and calls the function Check of Fig. 4 twice. In the first pass, we construct the Spec Pool by identifying helping scenarios; in the second pass, we exploit the Spec Pool to identify helped operations. Specifically, in Fig. 4, when a candidate state fails to fulfill the specification at line 3, we attempt to prove (a) by calling the function helps at line 4, which identifies a set of descriptors in the state that enable helping. If successful, the corresponding pool items are created (in the first pass). Otherwise, at line 5, by calling the function helped, we attempt proof obligation (b) to check whether the operation has been helped (in the second pass).

Verification with Hindsight. This pattern is based on the Hindsight Lemma of [16]. In the interest of space we do not present a full example like Lazy_set [9] (see [25]), which implements a set with an optimistic lock-free contains operation using a linked list. As in the picture below, each node contains three fields: a value, a mark bit representing whether the item has been removed from the list (marked in grey), and a link (denoted n) to the following node.

(Figure: a list in which nodes 7 and 15 are reachable from TOP, while marked nodes 1, 5 and 10 have been removed but still hold their next links.)

The fundamental invariants of this algorithm are: 1. the elements in the set are ordered, for fast lookups through the lock-free contains method; 2. the elements in the set are all reachable from the TOP pointer and are not marked; 3. removed elements are marked before being unlinked; and 4. the next pointer of a removed node never changes, hence it might still point to a node in the data structure until that node is in turn removed. In the figure, the set contains the elements 7 and 15, but from the removed nodes we know that it contained the elements 1, 5 and 10 at some point in the past. A concurrent contains operation that started before the elements were removed may assume 1, 5 and 10 are still contained. Following [16], we call nodes that are reachable from TOP (including those that are marked) backbone nodes (e.g., 7 and 15). Conversely, nodes that cannot be reached from TOP are called exterior nodes (e.g., 1, 5 and 10).
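A node of such a set might be laid out as follows (an illustrative C sketch matching the three fields described above; the names are our own):

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct snode {
    int val;                    /* element; list kept sorted (inv. 1)      */
    _Atomic bool mark;          /* set before unlinking (inv. 3)           */
    _Atomic(struct snode *) n;  /* next link; an exterior node's n never
                                   changes again (inv. 4)                  */
} snode;
```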

Lemma 1

(Hindsight [16]). Let \({\mathsf {tr}}\) be an execution of the set data structure presented above satisfying:

  1.

    An exterior node never becomes a backbone node.

  2.

    The successor of an exterior node will never change.

  3.

    If the successor of a backbone node changes, the node remains a backbone node in the immediate following state.

Then, for any states \(\sigma _i\) = \({\mathsf {tr}}(i)\), \(\sigma _k\) = \({\mathsf {tr}}(k)\) such that \(0 \le i \le k < |{\mathsf {tr}}|\) and for any nodes u, v, w such that \(u.n \mapsto v\) is a backbone link in \(\sigma _i\), and \(v.n \mapsto w\) is a link (not necessarily in the backbone) in \(\sigma _k\), there exists a state \(\sigma _j = {\mathsf {tr}}(j)\) such that \(i \le j \le k\) and \(v.n \mapsto w\) is a backbone link in \(\sigma _j\).

Lemma 1 allows us to use exterior nodes and links in the current state to infer that there existed a past state in which those nodes were in the backbone. Using this information we attempt to linearize the contains method, even if the found node is, in the current state, an exterior node. However, Lemma 1 cannot be used directly: although an exterior link \(v.n \mapsto w\) might be found in the current state, its premise – that a link \(u.n \mapsto v\) was present in the backbone in a previous state – cannot be immediately established by looking at the current state of the symbolic execution. To resolve this problem we propose Theorem 2, which we exploit to automate the application of Lemma 1 in Poling.

Theorem 2

If there is an exterior link \(v.n \mapsto w\) in \(\sigma \), then there exists a past state of \(\sigma \) in which the link is a backbone link, provided the following conditions hold:

  1.

    The premises of Lemma 1 hold, and

  2.

    Reach\((\mathsf {head}(\mathtt{D}), v)\) can be proved in the sequential state \(\sigma ^{seq}\).

The sequential counterpart \(\sigma ^{seq}\) of a state \(\sigma \) in a trace of an operation \(op\), \(\sigma _{i} \xrightarrow {\mathsf {inv}(op)} \cdot \xrightarrow {{\mathsf {tr}}_2} \sigma \xrightarrow {{\mathsf {tr}}_3} \cdot \xrightarrow {\mathsf {res}(op)} \sigma _f \), is obtained by executing from \(\sigma _i\) to \(\sigma \) while dropping all the steps of \(op\)'s concurrent operations (i.e., executing \(op\) sequentially).

Note that the second condition ensures a temporal traversal to v (see [16]) and hence guarantees that \(v.n \mapsto w\) was once a backbone link. The verification of this pattern (e.g., for the contains operation) is implemented in the hindsight function at line 6 of Check (Fig. 4). In this function, exploiting Theorem 2, if an exterior link \(v.n \mapsto w\) is found in a candidate state \(\sigma \) and \(\mathtt{Reach}(\mathsf {head}(\mathtt{D}), v)\) holds in \(\sigma ^{seq}\) (we compute \({\psi _{\sigma ^{seq}}}\) by symbolic execution with an empty set of rely-guarantee actions), we construct a past state \(\sigma _p\) and substitute it for \(\sigma \) when verifying the method's specification. If the verification succeeds, \(\sigma \) is a pure witness state for the verifying thread's linearization; that is, we can deduce the existence of the LP state (\(\sigma _p\)) from the witness state \(\sigma \). We also customize the symbolic execution engine to verify all three premises of Lemma 1: for each execution step \(\sigma \xrightarrow {t, ev} \sigma '\), we collect the exterior nodes (symbolically) in \(\sigma \) (\(\psi _\sigma \)), and verify that the step \(ev\) neither changes their successors nor makes them reachable from head(D); we also collect backbone nodes and check that, if their successors are changed by \(ev\), they remain reachable from head(D) in \(\sigma '\) (\(\psi _{\sigma '}\)). If any of these checks fails, the hindsight function (Fig. 4) returns false.

Fig. 7. Hindsight application rule.

To reconstruct a past state \(\sigma _p\) as above, we introduce the Hindsight Lemma application rule of Fig. 7. The rule is an adaptation of May-subtraction [23]. Intuitively, May-Subtract(P, Q) considers the ways in which an RGSep assertion Q can be removed from another assertion P. Our application rule works as May-Subtract(\(\psi _{Inv}^{\mathtt{D}}\), \(v.n \mapsto w\)): it subtracts an exterior link from the data structure invariant, and returns the remaining state with the link added back. The auxiliary function exp (expose) considers all the ways in which v can be matched to a node or linked-list assertion. Notice that since the only hypothesis assumed by the rule is the data structure shape invariant \(\psi _{Inv}^{\mathtt{D}}\) (derived by Cave), the resulting symbolic state is an abstraction of an actual past state. The correctness of this rule is guaranteed by the proof of Theorem 2.

Limitations. Although Poling can automatically handle concurrent data structures with non-internal linearization points, it cannot verify the class of concurrent data structures whose linearization points depend on future behaviors [11]. We expect to extend Poling to support this class of programs in the future.

5 Experimental Results

We evaluated Poling on 11 examples, divided into the 3 categories shown in the tables of Fig. 8. The first table presents algorithms provable using internal linearization points. We compare the times that Cave (version 2.1) and Poling take to verify the algorithms, and note that for all these programs Poling outperforms Cave. This can be attributed to our usage of SMT solvers, following [17], to efficiently discharge linearizability proof obligations.

The second table presents algorithms falling under the hindsight pattern. We considered set implementations that perform an optimistic \(\mathtt {contains}\) (or \(\mathtt {lookup}\)) operation. Optimistic_set [16] traverses the list optimistically (without acquiring any locks or synchronizing with other threads) to find a node. In contrast, Lazy_set [9] and its variant Vechev_CAS_set [24] use a bit for marking nodes before deletion.

The last table includes programs that implement the helping mechanism. Conditional compare-and-swap (CCAS) [20] is a simplified version of the well-known RDCSS algorithm [8]. If a CCAS method finds a (thread) descriptor in its target shared memory location, it attempts to help complete the operation in that descriptor before performing its own. Finally, HSY_stack is the full HSY stack implementation [10]. Our running time on this complex example is comparable to that of the rewriting technique illustrated in [6]. As expected, Cave cannot prove the programs in the second and third categories.

Fig. 8. Experimental results.

6 Related Work and Conclusion

Related Work. Most techniques for linearizability verification (e.g., [1, 2]) are based on forward simulation arguments, and typically only work for methods whose linearization points are internal to their own code. To deal with external linearization points, [3] proposed a technique limited to the case where only read-only operations may have external linearization points.

Complete backward simulation strategies have been proposed in [18]; however, they are often difficult to automate. Other methods combine forward and backward simulations, using history and/or prophecy variables [21], instrumenting the program with auxiliary state [14], or using logical relations to construct relational proofs [20]. A general and modular proof strategy is proposed in [14] that, along with lightweight instrumentation, leverages rely-guarantee reasoning to manually verify algorithms with external linearization points. In contrast, our method exploits witness states to infer a proof automatically. The helping mechanism is also considered in [6] (which does not deal with the hindsight pattern) by rewriting the implementation so that all operations have their linearization points within their rewritten code. Our technique does not rely on rewriting, because the relevant witness is found within the Spec Pool.

Our technique can be considered an adaptation of [17], which verifies sequential data structures using recursive definitions on heaplets. Similar to Poling, the automata-based approach of [1] is also a property-checking algorithm: it formalizes linearizability specifications as automata, and checks the cross-product of a symbolic encoding of the program with the specification automata for safety. The main difference between Poling and [1] resides in the verification of implementations with external linearization points.

Conclusion. We described a procedure and a tool, Poling, that automatically checks the linearizability of fine-grained concurrent data structures. Poling abstracts concurrent data structures into sets of locations following [17], and treats linearizability verification as a property checking problem, efficiently discharged with an SMT solver. Poling extends prior art by incorporating important concurrent programming patterns: algorithms using helping, and algorithms that can be proved using the hindsight lemma [16]. Our experimental results provide evidence of the effectiveness of our tool.