Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Data structures that can be accessed concurrently by many parallel threads are a central component of many software applications, and are implemented in several widely used libraries (e.g., java.util.concurrent). Linearizability [17] is the standard correctness criterion for such concurrent data structure implementations. It states that each operation on the data structure can be considered as being performed atomically at some point, called the linearization point (LP), between its invocation and return. This allows client threads to understand the data structure in terms of atomic actions, without considering the complications of concurrency.

Linearizable concurrent data structures typically employ fine-grained synchronization, replacing locks by atomic operations such as compare-and-swap, and are therefore notoriously difficult to get correct, witnessed, e.g., by a number of bugs in published algorithms [8, 20]. It is therefore important to develop efficient techniques for automatically verifying their correctness. This requires overcoming several challenges.

One challenge is that the criterion of linearizability is harder to establish than standard correctness criteria, such as control state reachability; in fact, proving linearizability with respect to a given data structure specification is undecidable, even in frameworks where verification of temporal safety properties is decidable [5]. This has lead to verification techniques that establish some form of simulation between concurrent and sequential executions, and whose mechanization requires an interactive theorem prover (e.g., [6, 7, 25, 26]). Automation has been successful only under simplifying assumptions. A natural one is that LPs are fixed, i.e., can be affixed to particular statements in method implementations [1, 3, 30]. However, for a large class of linearizable implementations, the LPs are not fixed in the code of their methods, but depend on actions of other threads in each particular execution. This happens, e.g., for algorithms that employ various forms of helping mechanisms, in which the execution of a particular statement in one thread defines the LP for one or several other threads [13, 14, 34, 35].

Another challenge is that verification techniques must be able to reason about fine-grained concurrent programs that are infinite-state in many dimensions: they consist of an unbounded number of concurrent threads, which operate on an unbounded domain of data values, and use unbounded dynamically allocated memory. This challenge has been addressed by bounding the number of accessing threads [2, 32, 33], restricting the class of algorithms that can be verified [15, 31], or requiring auxiliary lemmas [24, 36].

Contributions. In this paper, we present a novel uniform framework for automatically verifying linearizability of concurrent data structure implementations, which handles the above challenges. In our framework, the user provides (i) a C-like description of the data structure implementation, (ii) a specification of sequential data structure semantics, using the simple technique of observers [1], and (iii) a linearization policy, which describes how LP placement in different concurrent threads affect each other; such linearization policies are often provided informally together with descriptions of new data structure implementations. Our framework then automatically checks that the requirement of linearizability (wrp. to the data structure specification) is satisfied when LPs are placed according to the given linearization policy. Ours is the first framework that can automatically verify all linearizable singly-linked list-based implementations of sets, stacks, and queues that have appeared in the verification literature, requiring only a small amount of user annotation (of linearization policies). Our framework relies on a number of advancements over the state-of-the-art.

  1. 1.

    We handle non-fixed LPs by a novel formalism for specifying linearization policies, by means of so-called controllers. Linearization policies capture inter-thread constraints on LP placement. They are often described informally by the algorithm designers together with each new data structure implementation when explaining why it is linearizable. Our controllers offer a way to express such policies in a simple and uniform manner. They can express complex patterns for linearization that are much more general than fixed LPs, including all the ones that are known to us from the literature, such as helping and flat-combining [1214, 23, 24, 34, 35]. Each method is equipped with a controller, whose task is to announce the occurrence of potential linearization points for that method. The controller is defined by a few simple rules that are triggered by its thread, and may also interact with controllers of other threads, in order to properly announce LPs. We specify the data structure semantics by adapting the technique of observers [1], that check correctness of the sequence of LP announcements. We extend previous usage of observers by allowing a controller to announce an LP several times. This extension allow us to handle implementations where the occurrence of a (non-fixed) LP is conditional on some predicted future condition; a false prediction can then (under some restrictions) be corrected by a renewed LP announcement. In previous approaches, such situations were handled by prophecy variables or backward simulation (e.g., [25]), that make automated verification significantly more difficult. The use of observers and controllers reduces verification to establishing that each method invocation generates a sequence of LP announcements, the last of which may change the state of the observer and conforms to the method’s call and return parameters, such that that the total sequence of LP announcements is not rejected by the observer. The establishment of these conditions can be reduced to control state reachability using standard techniques; our framework accomplishes this by automatically generated monitors.

  2. 2.

    We handle the challenge of an unbounded number of threads by extending the successful thread-modular approach which verifies a concurrent program by generating an invariant that correlates the global state with the local state of an arbitrary thread [3]. We must extend it to handle global transitions where several threads synchronize, due to interaction between controllers. We show that the number of synchronizing threads that need to be considered in the abstract postcondition computation is bounded by \(3\cdot \left( \# {\mathcal O}\right) +2\) where \(\# {\mathcal O}\) is the diameter of the observer. Furthermore, we define a condition, which we call stuttering, that allows to reduce the number of synchronizing threads to only two. The stuttering condition can be verified through a simple syntactic check and is indeed satisfied by all the examples that we have considered.

  3. 3.

    In order to reason about concurrent programs that operate on an unbounded data domain via dynamically allocated memory, we present a novel symbolic representation of singly-linked heap structures.

For stacks and queues Chakraborty et al. [15], and later Boujjani et al. [4], showed that linearizability can be checked by observing only the sequence of method calls and returns, without considering potential LP placement. This technique cannot check linearizabilitiy of sets. We can adapt their technique to our framework. By using observers as defined in [4, 15] and adapting controllers accordingly, we can use our symbolic verification technique to automatically verify stacks and queues without considering potential LP placement.

We have implemented our technique in a tool, and applied it to specify and automatically verify linearizability of all the implementations of concurrent set, queue, and stack algorithms known to us in the literature, as well as some algorithms for implementing atomic memory read/write operations. To use the tool, the user needs to provide the code of the algorithm together with the controllers that specify linearization policies. To our knowledge, this is the first time all these examples are verified fully automatically in the same framework.

Related Work. Much previous work has been devoted to the manual verification of linearizability for concurrent programs. Examples include [18, 28]. In [24], O’Hearn et al. define a hindsight lemma that provides a non-constructive evidence for linearizability. The lemma is used to prove linearizability of an optimistic variant of the lazy set algorithm. Vafeiadis [29] uses forward and backward simulation relations together with history or prophecy variables to prove linearizability. These approaches are manual, and without tool implementations. Mechanical proofs of linearizability, using interactive theorem provers, have been reported in [6, 7, 25, 26]. For instance, Colvin et al. [6] verify the lazy set algorithm in PVS, using a combination of forward and backward simulations.

There are several works on automatic verification of linearizability. In [31], Vafeiadis develops an automatic tool for proving linearizability that employs instrumentation to verify logically pure executions. However, this work can handle non-fixed LPs only for read-only methods, i.e., methods that do not modify the heap. This means that the method cannot handle algorithms like the Elimination queue [23], HSY stack [14], CCAS [12], RDCSS [12] and HM set [16] that we consider in this paper. In addition, their shape abstraction is not powerful enough to handle algorithms like Harris set [11] and Michael set [22] that are also handled by our method. Chakraborty et al. [15] describe an “aspect-oriented” method for modular verification of concurrent queues that they use to prove linearizability of the Herlihy/Wing queue. Bouajjani et al. [4] extended this work to show that verifying linearizability for certain fixed abstract data types, including queues and stacks, is reducible to control-state reachability. We can incorporate this technique into our framework by a suitable construction of observers. The method can not be applied to sets. The most recent work of Zhu et al. [36] describe a tool that is applied for specific set, queue, and stack algorithms. For queue algorithms, their technique can handle queues with helping mechanism except for HW queue [17] which is handled by our paper. For set algorithms, the authors can only handle those that perform an optimistic contains (or lookup) operation by applying the hindsight lemma from [24]. Hindsight-based proofs provide only non-constructive evidence of linearizability. Furthermore, some algorithms (e.g., the unordered list algorithm considered in Sect. 8 of this paper) do not contain the code patterns required by the hindsight method. Algorithms with non-optimistic contains (or lookup) operation like HM [16], Harris [11] and Michael [22] sets cannot be verified by their technique. Vechev et al. [33] check linearizability with user-specified non-fixed LPs, using a tool for finite-state verification. Their method assumes a bounded number of threads, and they report state space explosion when having more than two threads. Dragoi et al. [10] describe a method for proving linearizability that is applicable to algorithms with non-fixed LPs. However, their method needs to rewrite the implementation so that all operations have linearization points within the rewritten code. Černý et al. [32] show decidability of a class of programs with a bounded number of threads operating on concurrent data structures. Finally, the works [1, 3, 30] all require fixed linearization points.

We have not found any report in the literature of a verification method that is sufficiently powerful to automatically verify the class of concurrent set implementations based on sorted and non-sorted singly-linked lists having non-optimistic contains (or lookup) operations we consider. For instance the lock-free sets of HM [16], Harris [11], or Michael [22], or unordered set of [35].

2 Data Structures, Observers, and Linearizability

Data Structure Semantics. A data structure \(\mathtt{DS}\) is a pair \(\left\langle {\mathbb {D}},{\mathbb {M}}\right\rangle \) where \({\mathbb {D}}\) is the data domain and \({\mathbb {M}}\) is the alphabet of method names. An operation \({ op}\) is of the form \(\mathtt{m}(d^{ in},d^{ out})\) where \(\mathtt{m}\in {\mathbb {M}}\) is a method name and \(d^{ in},d^{ out}\in {\mathbb {D}}\) are the input resp. output data values. A trace of \(\mathtt{DS}\) is a sequence of operations. The behavior \([\![{\mathtt{DS}}]\!]\) of \(\mathtt{DS}\) is the set of traces. We often identify \(\mathtt{DS}\) with its behavior \([\![{\mathtt{DS}}]\!]\). For example, in the Set data structure, the set of method names is given by \(\left\{ \mathtt{add}, \mathtt{rmv}, \mathtt{ctn}\right\} \), and the data domain is the union \({\mathbb {Z}}\cup \mathbb {B}\) of the sets of integers and Booleans. Input and output data values are in \({\mathbb {Z}}\) and \(\mathbb {B}\) respectively. For instance, the operation \(\mathtt{add}(3,\mathtt{tt})\) successfully adds the value 3, while \(\mathtt{ctn}(2,\mathtt{ff})\) is a failed search for 2 in the set.

Fig. 1.
figure 1

Set observer.

Observers. We specify traces of data structures by observers, as introduced in [1]. Observers are finite automata extended with a finite set of registers that assume values in \({\mathbb {Z}}\). At initialization, the registers are nondeterministically assigned arbitrary values, which never change during a run of the observer. Formally, an observer \({\mathcal O}\) is a tuple \(\left\langle S^{\mathcal O},s^{\mathcal O}_\mathtt{init},\mathtt{X}^{{\mathcal O}},\varDelta ^{\mathcal O},s^{\mathcal O}_\mathtt{acc}\right\rangle \) where \(S^{\mathcal O}\) is a finite set of observer states including the initial state \(s^{\mathcal O}_\mathtt{init}\) and the accepting state \(s^{\mathcal O}_\mathtt{acc}\), a finite set \(\mathtt{X}^{{\mathcal O}}\) of registers, and \(\varDelta ^{\mathcal O}\) is a finite set of transitions. Transitions are of the form \(\left\langle s_1,\mathtt{m}(x^{ in},x^{ out}),s_2\right\rangle \) where \(x^{ in}\) and \(x^{ out}\) are either registers or constants, i.e., transitions are labeled by operations whose input or output data may be parameterized on registers. The observer accepts a trace if it can be processed in such a way that an accepting state is reached. Observers can be used to give exact specifications of the behaviors of data structures such as sets, queues, and stacks. The observer is defined in such a way that it accepts precisely those traces that do not belong to the behavior of the data structure. This is best illustrated by an example. Figure 1 depicts an observer that accepts the sequences of operations that are not in \([\![\mathtt{Set}]\!]\).

Linearizability. An operation \(\mathtt{m}(d^{ in},d^{ out})\) gives rise to two actions, namely a call action \(\left( \mathtt{eid},\mathtt{m},\mathtt{d}^{ in}\right) \downarrow \) and a return action \(\left( \mathtt{eid},\mathtt{m},\mathtt{d}^{ out}\right) \uparrow \), where \(\mathtt{eid}\in \mathbb {N}\) is an action identifier. A history \(h\) is a sequence of actions such that for each return action \(a_2\) in \(h\) there is a unique matching call action \(a_1\) where the action identifiers in \(a_1\) and \(a_2\) are identical and different from the action identifiers of all other actions in \(h\) and such that \(a_1\) occurs before \(a_2\) in \(h\). A call action carries always the same method name as its matching return action. A call action which does not match any return action in \(h\) is said to be pending. A history without pending call actions is said to be complete. A completed extension of \(h\) is a complete history \(h'\) obtained from \(h\) by appending (at the end) zero or more return actions that are matched by pending call actions in \(h\), and thereafter removing pending call actions. For action identifiers \(\mathtt{eid}_1,\mathtt{eid}_2\), we write \(\mathtt{eid}_1\preceq _{\mathtt{h}}\mathtt{eid}_2\) to denote that the return action with identifier \(\mathtt{eid}_1\) occurs before the call action with identifier \(\mathtt{eid}_2\) in \(h\). We say that a history is sequential if it is of the form \(a_1a'_1a_2a'_2\cdots a_na'_n\) where \(a'_i\) is the matching action of \(a_i\) for all \(i:1\le i\le n\), i.e., each call action is immediately followed by the matching return action. We identify a sequential history of the above form with the corresponding trace \({ op}_1{ op}_2\cdots { op}_n\) where \({ op}_i=\mathtt{m}(d^{ in}_i,d^{ out}_i)\), \(a_i=\left( \mathtt{eid}_\mathtt{i},\mathtt{m},\mathtt{d}^{ in}_{\mathtt{i}}\right) \downarrow \), and \(a_i=\left( \mathtt{eid}_\mathtt{i},\mathtt{m},\mathtt{d}^{ out}_{\mathtt{i}}\right) \uparrow \), i.e., we merge each call action together with the matching return action into one operation. A complete history \(h'\) is a linearization of \(h\) if (i) \(h'\) is a permutation of \(h\), (ii) \(h'\) is sequential, and (iii) \(\mathtt{eid}_1\preceq _{\mathtt{h}'}\mathtt{eid}_2\) if \(\mathtt{eid}_1\preceq _{\mathtt{h}}\mathtt{eid}_2\) for each pair of action identifiers \(\mathtt{eid}_1\) and \(\mathtt{eid}_2\). We say that a sequential history \(h'\) is valid wrt. \(\mathtt{DS}\) if the corresponding trace is in \([\![{\mathtt{DS}}]\!]\). We say that \(h\) is linearizable wrt. \(\mathtt{DS}\) if there is a completed extension of \(h\), which has a linearization that is valid wrt. \(\mathtt{DS}\). A set \(H\) of histories is linearizable wrt. \(\mathtt{DS}\) if all members of \(H\) are linearizable wrt. \(\mathtt{DS}\).

3 Programs

In this section, we introduce programs that consist of arbitrary numbers of concurrently executing threads that access a concurrent data structure. Each thread executes a method that performs an operation on the data structure. Each method declares local variables and a method body. Variables are either pointer variables (to heap cells), or data variables, assuming values from an infinite (ordered) domain, or from some finite set \(\mathbb {F}\) that includes the Boolean values \(\mathbb {B}\). We assume w.l.o.g. that the infinite set is given by the set \({\mathbb {Z}}\) of integers. The body is built in the standard way from atomic commands, using standard control flow constructs (sequential composition, selection, and loop constructs). Each statement is equipped with a unique label. We assume that the set of local variables include the parameter of the method in addition to the program counter pc whose value is the label of the next statement to be executed. Method execution is terminated by executing a return command, which may return a value. The global variables can be accessed by all threads, whereas local variables can be accessed only by the thread which is invoking the corresponding method. We assume that all global variables are pointer variables, and that they are initialized, together with the heap, by an initialization method, which is executed once at the beginning of program execution.

Heap cells have a fixed set \(\mathcal {F}\) of fields, namely data fields that assume values in \({\mathbb {Z}}\) or \(\mathbb {F}\), and lock fields. Furthermore, each cell has one pointer field, denoted next, and hence heap cells are organized into singly-linked lists. We use the term \({\mathbb {Z}}\)-field for a data field that assumes values in \({\mathbb {Z}}\), and the terms \(\mathbb {F}\)-field and lock field with analogous meaning. Atomic commands include assignments between data variables, pointer variables, or fields of cells pointed to by a pointer variable. The command new Node() allocates a new structure of type Node on the heap, and returns a reference to it. The compare-and-swap command CAS(&a,b,c) atomically compares the values of a and b. If equal, it assigns the value of c to a and returns true, otherwise, it leaves a unchanged and returns false. We assume a memory management mechanism, which automatically collects garbage, and also ensures that a new cell is fresh, i.e., has not been used before by the program; this avoids the so-called ABA problem (e.g., [21]).

Figure 2 depicts a program Lazy Set [13] that implements a concurrent set containing elements from \({\mathbb {Z}}\). The set is implemented as an ordered singly linked list. The program contains three methods, add, rmv, and ctn, corresponding to operations that add, remove, and check the existence of an element in the set, respectively. Each method takes an element as argument, and returns a value which indicates whether or not the operation has been successful. For instance, the operation \(\mathtt{add(e)}\) returns the value true if e was not already a member of the set. In such a case a new cell with data value e is added to its appropriate position in the list. If e is already present, then the list is not changed and the value false is returned. The program also contains the subroutine locate that is called by add and rmv methods. A cell in the list has two data fields: \(\mathtt{mark}\!:\!\mathbb {F}\), where \(\mathbb {F}= \left\{ 0,1\right\} \), and \(\mathtt{val}\!:\!{\mathbb {Z}}\), one lock field \(\mathtt{lock}\), and one pointer field \(\mathtt{next}\). The rmv method first logically removes the node from the list by setting the mark field, before physically removing the node. The ctn method is wait-free and traverses the list ignoring the locks inside the cells.

Fig. 2.
figure 2

Lazy Set.

Fig. 3.
figure 3

Reaction Rules for Controllers of Lazy Set.

4 Linearization Policies

In this section, we introduce our formalism of controllers for expressing linearization policies. A linearization policy is expressed by defining for each method an associated controller, which is responsible for generating operations announcing the occurrence of LPs during each method invocation. The controller is occasionally activated, either by its thread or by another controller, and mediates the interaction of the thread with the observer as well as with other threads.

To add controllers, we first declare some statements in each method to be triggering: these are marked by the symbol , as in Fig. 2. We specify the behavior of the controller, belonging to a method \(\mathtt{m}\), by a set \(R^{\mathtt{m}}\) of reaction rules. To define these rules, we first define different types of events that are used to specify their behaviors. Recall that an operation is of the form \(\mathtt{m}(d^{ in},d^{ out})\) where \(\mathtt{m}\) is a method name and \(d^{ in},d^{ out}\in {\mathbb {D}}\) are data values. Operations are emitted by the controller to the observer to notify that the thread executing the method performs a linearization of the corresponding method with the given input and output values. Next, we fix a set \(\varSigma \) of broadcast messages, each with a fixed arity, which are used for synchronization between controllers. A message is formed by supplying data values as parameters. In reaction rules, these data values are denoted by expressions over the variables of the method, which are evaluated in the current state when the rule is invoked. In an operation, the first parameter, denoting input, must be either a constant or the parameter of the method call.

  • A triggered rule, of form , specifies that whenever the method executes a triggering statement and the condition \({ cnd}\) evaluates to true, then the controller performs a reaction in which it emits the operation obtained by evaluating \({ op}\) to the observer, and broadcasts the message obtained by evaluating \({ se}\) to the controllers of other threads. The broadcast message \({ se}\) is optional.

  • A receiving rule, of form \( \text{ when } \left\langle { re},\mathtt{ord}\right\rangle \text{ provided } { cnd} \text{ emit } { op}\), specifies that whenever the observer of some other thread broadcasts the message obtained by evaluating \({ re}\), and \({ cnd}\) evaluates to true, then the controller performs a reaction where it emits the operation obtained by evaluating \({ op}\) to the observer. Note that no further broadcasting is performed. The interaction of the thread with the observer may occur either before or after the sender thread, according to the flag \(\mathtt{ord}\).

A controller may also use a finite set of states, which restrict the possible sequences of reactions by a controller in the standard way. Whenever such states are used, the rule includes source and target states using kewords from and goto. In Fig. 3, the rule \(\rho _7\) changes the state from \(q_0\) to \(q_1\), meaning that no further applications of rules \(\rho _6\) or \(\rho _7\) are possible, since they both start from state \(q_0\). Rules that do not mention states can be applied regardless of the controller state and leave it unchanged.

Let us illustrate how the reaction rules for controllers in Fig. 3 specify LPs for the algorithm in Fig. 2. Here, a successful rmv method has its LP at line 3, and an unsuccessful rmv has its LP at line 2 when the test c.val = e evaluates to false. Therefore, both these statements are marked as triggering. The controller has a reaction rule for each of these cases: in Fig. 3: rule \(\rho _3\) corresponds to a successful rmv, whereas rule \(\rho _4\) corresponds to an unsuccessful rmv. Rule \(\rho _4\) states that whenever the rmv method executes a triggering statement, from a state where pc=2 and c.val \(<>\) e, then the operation rmv(e,false) will be emitted to the observer.

A successful add method has its LP at line 4. Therefore, the controller for add has the triggered rule \(\rho _1\) which emits the operation add(e,true) to the observer. In addition, the controller also broadcasts the message add(e), which is received by any controller for a ctn method which has not yet passed line 4, thereby linearizing an unsuccessful ctn(e) method by emitting ctn(e,false) to the observer. The keyword before denotes that the operation ctn(e,false) will be presented before add(e,true) to the observer. Since the reception of add(e) is performed in the same atomic step as the triggering statement at line 4 of the add method, this describes a linearization pattern, where a ctn method, which has not yet reached line 4, linearizes an unsuccessful ctn-invocation just before some other thread linearizes a successful add of the same element.

To see why unsuccessful ctn method invocations may not have fixed LPs, note that the naive attempt of defining the LP at line 4 provided that the test at line 5 fails will not work. Namely, the ctn method may traverse the list and arrive at line 4 in a situation where the element e is not in the list (either e is not in any cell, or the cell containing e is marked). However, before executing the command at line 4, another thread performs an add operation, inserting a new cell containing the element e into the list. The problem is now that the ctn method cannot “see” the new cell, since it is unreachable from the cell currently pointed to by the variable b of the ctn method. If the ctn method would now try to linearize an unsuccessful ctn(e), this would violate the semantics of a set, since the add method just linearized a successful insertion of e.

A solution to this problem, following [13], is to let an unsuccessful ctn(e) method linearize at line 4 only if no successful add(e) method linearized since its invocation. If some other thread linearizes a successful add(e) before the ctn(e) method executes line 4, then the ctn(e) method should linearize immediately before the add(e) method. This solution is expressed by the rules in Fig. 3. Note, however, that it is now possible for an invocation of a successful ctn(e) to emit several operations: It can first emit ctn(e,false) together with the linearization of a successful add(e), and thereafter emit ctn(e,true) when it executes the statement at line 4, if it then finds the element e in the list (by rule \(\rho _5\)). In such a case, both operations are fine with the observer, and the last one is the event that causes the method to return true, i.e., conforms with the parameters and returns values.

Verifying Linearization Policies. By using an observer to specify the sequential semantics of the data structure, and defining controllers that specify the linearization policy, the verification of linearizability is reduced to establishing four conditions: (i) each method invocation generates a non-empty sequence of operations, (ii) the last operation of a method conforms to its parameters and return value, (iii) only the last operation of a method may change the state of the observer, and (iv) the sequence of all operations cannot drive the observer to an accepting state. Our verification framework automatically reduces the establishment of these conditions to a problem of checking control state reachability. This is done by augmenting the observer by a monitor. The monitor is automatically generated. It keeps track of the state of the observer, and records the sequence of operations and call and return actions generated by the threads. For each thread, it keeps track of whether it has linearized, whether it has caused a state change in the observer, and the parameters used in its last linearization. Using this information, it goes to an error state whenever any of the above four conditions is violated. The correctness of this approach is established in the next section, as Theorem 1.

5 Semantics

In this section, we formalize the semantics of programs, controllers, observers, and monitors, as introduced in the preceding sections. We do this in several steps. First, we define the state of the heap and the transition relation induced by a single thread. From this we derive the semantics of the program and use it to define its history set. We then define the semantics of observers, and its augmentation by the monitor; the observer is embedded in the monitor. Finally, we define the product of the augmented program and the monitor. On this basis, we can formally state and prove the correctness of our approach, as Theorem 1.

For a function \(f:A\mapsto B\) from a set A to a set B, we use \(f[a_1\leftarrow b_1,\ldots ,a_n\leftarrow b_n]\) to denote the function \(f'\) such that \(f'(a_i)=b_i\) and \(f'(a)=f(a)\) if \(a\not \in \left\{ a_1,\ldots ,a_n\right\} \).

Below, we assume a program \({\mathcal P}\) with a set \(\mathtt{X}^\mathtt{gl}\) of global variables, a data structure \(\mathtt{DS}\) specified as an observer \({\mathcal O}\). We assume that each thread \(\mathtt{th}\) executes one method denoted \(\mathtt{Method}\left( \mathtt{th}\right) \).

Heaps. A heap (state) is a tuple \({\mathcal H}=\left\langle {\mathbb C},\mathtt{succ},\mathtt{Val}^\mathtt{gl},\mathtt{Val}^{{\mathbb C}}\right\rangle \), where (i) \({\mathbb C}\) is a finite set of cells, including the two special cells \(\mathtt{null}\) and \(\bot \) (dangling); we define \({\mathbb C}^{-}={\mathbb C}-\left\{ \mathtt{null},\bot \right\} \), (ii) \(\mathtt{succ}:{\mathbb C}^{-}\rightarrow {\mathbb C}\) is a total function that defines the next-pointer relation, i.e., \(\mathtt{succ}\left( {\mathbbm {c}}\right) ={\mathbbm {c}}'\) means that cell \({\mathbbm {c}}\) points to \({\mathbbm {c}}'\), (iii) \(\mathtt{Val}^\mathtt{gl}:\mathtt{X}^\mathtt{gl}\rightarrow {\mathbb C}\) maps the global (pointer) variables to their values, and (iv) \(\mathtt{Val}^{{\mathbb C}}:{\mathbb C}\times \mathcal {F}\rightarrow \mathbb {F}\cup {\mathbb {Z}}\) maps data and lock fields of each cell to their values. We assume that the heap is initialized by an initialization thread, and let \({\mathcal H}_\mathtt{init}\) denote the initial heap.

Threads. A local state \(\mathtt{loc}\) of a thread \(\mathtt{th}\) wrt. a heap \({\mathcal H}\) defines the values of its local variables, including the program counter pc and the input parameter for the method executed by \(\mathtt{th}\). In addition, there is the special initial state \(\mathtt{idle}\), and terminated state \(\mathtt{term}\). A view \(\mathtt{view}\) is a pair \(\left\langle \mathtt{loc},{\mathcal H}\right\rangle \), where \(\mathtt{loc}\) is a local state wrt. the heap \({\mathcal H}\). A thread \(\mathtt{th}\) induces a transition relation \({\xrightarrow {}{}}_{\!\!\mathtt{th}}\) on views. Initially, \(\mathtt{th}\) is in the state \(\mathtt{idle}\). It becomes active by executing a transition \(\left\langle \mathtt{idle},{\mathcal H}\right\rangle {\xrightarrow {{\left( \mathtt{th},\mathtt{m},d^{ in}\right) \downarrow }}{}}_{\!\!\mathtt{th}} \left\langle \mathtt{loc}_\mathtt{init}^{\mathtt{th}},{\mathcal H}\right\rangle \), labeled by a call action with \(\mathtt{th}\) as the event identifier and \(\mathtt{m}=\mathtt{Method}\left( \mathtt{th}\right) \). It moves to an initial local state \(\mathtt{loc}_\mathtt{init}^{\mathtt{th}}\) where \(d^{ in}\) is the value of its input parameter, the value of pc is the label of the first statement of the method, and the other local variables are undefined. Thereafter, \(\mathtt{th}\) executes statements one after one. We write \(\mathtt{view}{\xrightarrow {\lambda }{}}_{\!\!\mathtt{th}}\mathtt{view}'\) to denote that the statement labeled by \(\lambda \) can be executed from \(\mathtt{view}\), yielding \(\mathtt{view}'\). Note that the next move of \(\mathtt{th}\) is uniquely determined by \(\mathtt{view}\), since \(\mathtt{th}\) cannot access the local states of other threads. Finally, \(\mathtt{th}\) terminates when executing its return command giving rise to a transition \(\left\langle \mathtt{loc},{\mathcal H}\right\rangle {\xrightarrow {\left( \mathtt{th},\mathtt{m},d^{ out}\right) \uparrow }{}}_{\!\!\mathtt{th}} \left\langle \mathtt{term},{\mathcal H}\right\rangle \), labeled by a return action with \(\mathtt{th}\) as event identifier, \(\mathtt{m}=\mathtt{Method}\left( \mathtt{th}\right) \), and \(d^{ out}\) as the returned value.

Programs. A configuration of a program \({\mathcal P}\) is a tuple \(\left\langle \mathtt{T},\mathtt{LOC},{\mathcal H}\right\rangle \) where \(\mathtt{T}\) is a set of threads, \({\mathcal H}\) is a heap, and \(\mathtt{LOC}\) is a local state mapping over \(\mathtt{T}\) wrt. \({\mathcal H}\) that maps each thread \(\mathtt{th}\in \mathtt{T}\) to its local state \(\mathtt{LOC}\left( \mathtt{th}\right) \) wrt. \({\mathcal H}\). The initial configuration \(c_\mathtt{init}^{{\mathcal P}}\) is the pair \(\left\langle \mathtt{LOC}_\mathtt{init},{\mathcal H}_\mathtt{init}\right\rangle \), where \(\mathtt{LOC}_\mathtt{init}\left( \mathtt{th}\right) =\mathtt{idle}\) for each \(\mathtt{th}\in \mathtt{T}\), i.e., \({\mathcal P}\) starts its execution from a configuration with an initial heap, and with each thread in its initial local state. A program \({\mathcal P}\) induces a transition relation \({\xrightarrow {}{}}_{\!\!{\mathcal P}}\) where each step corresponds to one move of a single thread. This is captured by the rules P1, P2, and P3 of Fig. 4 (here \(\varepsilon \) denotes the empty word.) Note that the only visible transitions are those corresponding to call and return actions. A history of \({\mathcal P}\) is a sequence of form \(a_1a_2\cdots a_n\), such that there is a sequence \( c_0{\xrightarrow {a_1}{}}_{\!\!{\mathcal P}} c_1{\xrightarrow {a_2}{}}_{\!\!{\mathcal P}} \cdots c_{n-1}{\xrightarrow {a_n}{}}_{\!\!{\mathcal P}}c_n\) of transitions from the initial configuration \(c_0=c_\mathtt{init}^{{\mathcal P}}\). We define \({\mathcal H}\left( {\mathcal P}\right) \) to be the set of histories of \({\mathcal P}\). We say that \({\mathcal P}\) is linearizable wrt. \(\mathtt{DS}\) if the set \({\mathcal H}\left( {\mathcal P}\right) \) is linearizable wrt. \(\mathtt{DS}\).

Controllers. The controller of a thread \(\mathtt{th}\) induces a transition relation \({\xrightarrow {}{}}_{\!\!\mathtt{cntrl}\left( \mathtt{th}\right) }\), which depends on the current view \(\mathtt{view}\) of \(\mathtt{th}\). For an expression \({ expr}\) denoting an operation, broadcast message, or condition, let \(\mathtt{view}({ expr})\) denote the result of evaluating \({ expr}\) in the state \(\mathtt{view}\). For a predicate \({ cnd}\), we let \(\mathtt{view}\models { cnd}\) denote that \(\mathtt{view}({ cnd})\) is true. Transitions are defined by the set \(R^{\mathtt{Method}\left( \mathtt{th}\right) }\) of rules (\(R^{\mathtt{th}}\) for short) associated with the method of \(\mathtt{th}\). Rule C1 describes transitions induced by a triggered rule, and states that if \({ cnd}\) is true under \(\mathtt{view}\), then the controller emits the operation obtained by instantiating \({ op}\) and broadcasts the message obtained by instantiating \({ se}\), according to \(\mathtt{view}\). Rule C2 describes transitions induced by receiving a message \(e'\), and states that if \({ re}\) equals the value of \(e'\) in \(\mathtt{view}\), and \({ cnd}\) evaluates to true in \(\mathtt{view}\), then the operation \(\mathtt{view}({ op})\) is emitted. In both cases, the controller also emits to the observer the identity \(\mathtt{th}\) of the thread. This identity will be used for bookkeeping by the monitor (see below.)

Fig. 4.
figure 4

Inference rules of the semantics.

Augmented Threads. An augmented view \(\left\langle q,\mathtt{loc},{\mathcal H}\right\rangle \) extends a view \(\left\langle \mathtt{loc},{\mathcal H}\right\rangle \) with a state \(q\) of the controller. We define a transition relation \({\xrightarrow {}{}}_{\!\!\mathtt{aug}\left( \mathtt{th}\right) }\) over augmented views that describes the behavior of a thread \(\mathtt{th}\) when augmented with its controller. For a label \(\lambda \) of a statement we write to denote that the corresponding statement is triggered. Rule TC1 states that a non-triggered statement by the thread does not affect the state of the controller, and does not emit or broadcast anything. Rule TC2 states a return action by the thread. Rule TC3 states that when the thread \(\mathtt{th}\) performs a triggered statement, then the controller will emit an operation to the observer and broadcast messages to the other threads. The reception of broadcast messages will be described in Rule R.

Augmented Programs. An augmented program \({\mathcal Q}\) is obtained from a program \({\mathcal P}\) by augmenting each thread with its controller. A configuration \(c\) of \({\mathcal Q}\) is a tuple \(\left\langle \mathtt{T},\mathtt{Q},\mathtt{LOC},{\mathcal H}\right\rangle \), which extends a configuration of \({\mathcal P}\) by a controller state mapping, \(\mathtt{Q}\), which maps each thread \(\mathtt{th}\in \mathtt{T}\) to the state of its controller. We define the size \(\left| c\right| =\left| \mathtt{T}\right| \) and define \(\mathtt{ThreadsOf}\left( c\right) =\mathtt{T}\). We use \({\mathcal C}^{{\mathcal Q}}\) to denote the set of configurations of \({\mathcal Q}\). Transitions of \({\mathcal Q}\) are performed by the threads in \(\mathtt{T}\). The rule Q1 describes return action or a non-triggered statement in which only its local state and the heap may change. Rule Q2 describes the case where a thread \(\mathtt{th}_s\) executes a triggered statement and hence also broadcasts a message to the other threads. Before describing this rule, we describe the effect of message reception. Consider a set \(\mathtt{T}\) of threads, a heap \({\mathcal H}\), a controller state mapping \(\mathtt{Q}\) over \(\mathtt{T}\), and a local state mapping \(\mathtt{LOC}\) over \(\mathtt{T}\) wrt. \({\mathcal H}\). We will define a relation that extracts the set of threads in \(\mathtt{T}\) that can receive \(e\) from \(\mathtt{th}_s\). For \(\mathtt{ord}\in \left\{ \texttt {before},\texttt {after}\right\} \), define \(\mathtt{enabled}\left( \mathtt{T},\mathtt{Q},\mathtt{LOC},{\mathcal H},e,\mathtt{ord},\mathtt{th}_s\right) \) to be the set of threads \(\mathtt{th}\in \mathtt{T}\) such that (i) \(\mathtt{th}\ne \mathtt{th}_s\), i.e., a receiver thread should be different from the sender thread, and (ii) \(\left\langle \mathtt{LOC}(\mathtt{th}),{\mathcal H}\right\rangle : \mathtt{Q}(\mathtt{th}){\xrightarrow {\left\langle \mathtt{th},{ op}\right\rangle \mid {\left\langle e,\mathtt{ord}\right\rangle }?}{}}_{\!\!\mathtt{cntrl}\left( \mathtt{th}\right) }q'\) for some \(q'\) and \({ op}\), i.e., the controller of \(\mathtt{th}\) has an enabled receiving rule that can receive \(e\), with the ordering flag given by \(\mathtt{ord}\). The rule R describes the effect of receiving a message that has been broadcast by a sender thread \(\mathtt{th}_s\). Each thread that can receive the message will do so, and the thread potentially will all change their controller state and emit an operation. Notice that the receiver threads are collected in a set and therefore the rule allows any ordering of the receiving threads. We are now ready to explain the rule Q2. The sender thread \(\mathtt{th}_s\) broadcast a message \(e_2\) that will be received by other threads according to rule R. The sender thread and the receiver threads, all emit operations (\(e_1\) and \(w\) respectively). Depending on the order specified in the specification of the controller, a receiving thread may linearize before or after \(\mathtt{th}_s\). Notice that the receiver threads only change the local states of their controllers and the state of the observer. An initial configuration of \({\mathcal Q}\in {\mathcal C}^{{\mathcal Q}}\) is of the form \(\left\langle \mathtt{T},\mathtt{Q}_\mathtt{init},\mathtt{LOC}_\mathtt{init},{\mathcal H}_\mathtt{init}\right\rangle \) where, for each \(\mathtt{th}\in \mathtt{T}\), \(\mathtt{LOC}_\mathtt{init}\left( \mathtt{th}\right) \) is the initial state of the controller of \(\mathtt{th}\), and \(\mathtt{LOC}_\mathtt{init}\left( \mathtt{th}\right) =\mathtt{idle}\).

Observers. Let \({\mathcal O}=\left\langle S^{\mathcal O},s^{\mathcal O}_\mathtt{init},\mathtt{X}^{{\mathcal O}},\varDelta ^{\mathcal O},s^{\mathcal O}_\mathtt{acc}\right\rangle \). A configuration \(c\) of \({\mathcal O}\) is a pair \(\left\langle s,\mathtt{Val}\right\rangle \) where \(s\in S^{\mathcal O}\) is an observer state and \(\mathtt{Val}\) assigns a value to each register in \({\mathcal O}\). We use \({\mathcal C}^{{\mathcal O}}\) to denote the set of configurations of \({\mathcal O}\). The transition relation \({\xrightarrow {}{}}_{\!\!{\mathcal O}}\) of the observer is described by the rules O1 and O2 of Fig. 4. Rule O1 states that if there is an enabled transition that is labeled by a given action then such a transition may be performed. Rule O2 states that if there is no such a transition, the observer remains in its current state. Notice that the rules imply that the register values never change during a run of the observer. An initial configuration of \({\mathcal O}\in {\mathcal C}^{{\mathcal O}}\) is of the form \(\left\langle s^{\mathcal O}_\mathtt{init},\mathtt{Val}\right\rangle \).

Monitors. The monitor \({\mathcal M}\) augments the observer. It keeps track of the observer state, and the sequence of operations and call and return actions generated by the threads of the augmented program. It reports an error if one of the following happens: (i) a thread terminates without having linearized, (ii) the parameters of call and return actions are different from those of the corresponding emitted operation, (iii) a thread linearizes although it has previously changed the state of the observer, or (iv) \({\mathcal Q}\) generates a trace which violates the specification of the data structure \(\mathtt{DS}\). The monitor keeps track of these conditions as follows. A configuration of \({\mathcal M}\) is either (i) a tuple \(\left\langle \mathtt{T},c,\mathtt{Lin},\mathtt{RV}\right\rangle \) where \(\mathtt{T}\) is a set of threads, \(c\) is an observer configuration, \(\mathtt{Lin}:\mathtt{T}\mapsto \left\{ 0,1,2\right\} \), and \(\mathtt{RV}:\mathtt{T}\mapsto {\mathbb {Z}}\cup \mathbb {F}\); or (ii) the special state error. For a thread \(\mathtt{th}\in \mathtt{T}\), the values 0, 1, 2 of \(\mathtt{Lin}\left( \mathtt{th}\right) \) have the following interpretations: 0 means that \(\mathtt{th}\) has not linearized yet, 1 means that \(\mathtt{th}\) has linearized but not changed the state of the observer, and 2 means that \(\mathtt{th}\) has both linearized and changed the state of the observer. Furthermore, \(\mathtt{RV}\left( \mathtt{th}\right) \) stores the value returned by \(\mathtt{th}\) the latest time it performed a linearization. In case \(c\) is of the first form, we define \(\mathtt{ThreadsOf}\left( c\right) =\mathtt{T}\). We use \({\mathcal C}^{{\mathcal M}}\) to denote the set of configurations of \({\mathcal M}\). The rules M1 through M8 describe the transition relation \({\xrightarrow {}{}}_{\!\!{\mathcal M}}\) induced by \({\mathcal M}\). Rule M1 describes the scenario when the monitor detects an operation \(\mathtt{m}(d^{ in},d^{ out})\) performed by a thread \(\mathtt{th}\) such that \({\mathcal O}\) does not change its state. In such a case, the flag \(\mathtt{Lin}\) for \(\mathtt{th}\) is updated to 1 (the thread has linearized but not changed the state of the observer), and the latest value \(d^{ out}\) returned by \(\mathtt{th}\) is stored \(\mathtt{RV}\). Rule M2 is similar except that the observer changes state and hence the flag \(\mathtt{Lin}\) is updated to 2 for \(\mathtt{th}\). Notice that in both rules, a premise is that \(\mathtt{th}\) has not changed the observer state previously (the flag \(\mathtt{Lin}\) for \(\mathtt{th}\) is different from 2.) Rules M3, M4, M5, and M6 describe the conditions (i), (ii), (iii), and (iv) respectively described above that lead to the error state. The rule M7 describes the fact that the error state is a sink, i.e., once \({\mathcal M}\) enters that state then it will never leave it. Finally, rule M8 describes the reflexive transitive closure of the relation \({\xrightarrow {}{}}_{\!\!{\mathcal M}}\). An initial configuration of \({\mathcal M}\in {\mathcal C}^{{\mathcal M}}\) is of the form \(\left\langle \mathtt{T},c_\mathtt{init},\mathtt{Lin}_\mathtt{init},\mathtt{RV}_\mathtt{init}\right\rangle \) where \(c_\mathtt{init}\in c_\mathtt{init}^{{\mathcal O}}\) is an initial configuration of \({\mathcal O}\), \(\mathtt{Lin}_\mathtt{init}\left( \mathtt{th}\right) =0\) and \(\mathtt{RV}_\mathtt{init}\left( \mathtt{th}\right) \) is undefined for every thread \(\mathtt{th}\in \mathtt{T}\).

Product. We use \({\mathcal S}={\mathcal Q}\,{\otimes }\,{\mathcal M}\) to denote the system obtained by running \({\mathcal Q}\) and \({\mathcal M}\) together. A (initial) configuration of \({\mathcal S}\) is of the form \(\left\langle c_1,c_2\right\rangle \) where \(c_1\) and \(c_2\) are (initial) configurations of \({\mathcal Q}\) and \({\mathcal M}\) respectively such that \(\mathtt{ThreadsOf}\left( c_1\right) =\mathtt{ThreadsOf}\left( c_2\right) \). We use \({\mathcal C}^{{\mathcal S}}\) to denote the set of configurations of \({\mathcal S}\). The induced transition relation \({\xrightarrow {}{}}_{\!\!{\mathcal S}}\) is described by rule QM of Fig. 4. Intuitively, in the composed systems, the augmented program and the monitor synchronize through the actions they produce. For a configuration \(c\in {\mathcal C}^{{\mathcal S}}\), we define \(\mathtt{Post}\left( c\right) =\left\{ c'\ | \ c{\xrightarrow {}{}}_{\!\!{\mathcal S}}c'\right\} \). For a set C of system configurations, we define \(\mathtt{Post}\left( C\right) = \bigcup _{c\in C} \mathtt{Post}\left( c\right) \). We say that that error is reachable in \({\mathcal S}\) if for an initial configuration \(c\) of \({\mathcal S}\) and configuration \(c'\) of \({\mathcal Q}\).

Verifying Linearizability. The correctness of this approach is established in the following theorem.

Theorem 1

If error is not reachable in \({\mathcal S}\) then \({\mathcal P}\) is linearizable wrt. \(\mathtt{DS}\).

Proof

To prove \({\mathcal P}\) is linearizable wrt. \(\mathtt{DS}\). We must establish that for any history h of \({\mathcal P}\), there exists a history \(h'\) such that \(h'\) is linearization of h and \(h'\) is valid wrt. \(\mathtt{DS}\). So, consider a history h of \({\mathcal P}\). From Condition (i) in the paragraph about monitors in Sect. 5, h is a complete history. Then from Conditions (ii) and (iii), each action in h has a LP and whose call and return parameters are consistent with the corresponding emitted operation by the controller. Therefore, there exists a complete sequential history \(h'\) which is a permutation of h whose actions are ordered according to the order of their LPs in h. Therefore \(h'\) is linearization of h. From Condition (iv), \(h'\) is valid wrt. \(\mathtt{DS}\). Therefore, h is linearizable wrt. \(\mathtt{DS}\).    \(\square \)

6 Thread Abstraction

By Theorem 1, linearizability can be verified by establishing a reachability property for the product \({\mathcal S}={\mathcal Q}{\otimes }{\mathcal M}\). This verification must handle the challenges of an unbounded number of threads, an unbounded heap, and unbounded data domain. In this section, we describe how we handle an unbounded number of threads by adapting the thread-modular approach [3], and extending it to handle global transitions where several threads synchronize, due to interaction between controllers.

Let a heap cell \({\mathbbm {c}}\) be accessible from a thread \(\mathtt{th}\) if it is reachable (directly or via sequence of next-pointers) from a global variable or local variable of \(\mathtt{th}_1\). A thread abstracted configuration of \({\mathcal S}\) is a pair \(\left\langle c^{ta},\mathtt{private}\right\rangle \), where \(c^{ta}\in {\mathcal C}^{{\mathcal S}}\) with \(\mathtt{ThreadsOf}\left( c^{ta}\right) = \left\{ \mathtt{th}_1\right\} \), such that each cell in the heap of \(c^{ta}\) is accessible from \(\mathtt{th}_1\) (i.e., the heap has no garbage), and where \(\mathtt{private}\) is a predicate over the cells in the heap of \(c^{ta}\). For a system configuration \(c^{\mathcal S}\), let \(\alpha _\mathtt{thread}\left( c^{\mathcal S}\right) \) be the set of thread abstracted configurations \(\left\langle c^{ta},\mathtt{private}\right\rangle \), such that (i) \(c^{ta}\) can be obtained by removing all threads from \(c^{\mathcal S}\) except one, renaming that thread to \(\mathtt{th}_1\), and thereafter removing all heap cells that are not accessible from \(\mathtt{th}_1\), (ii) \(\mathtt{private}({\mathbbm {c}})\) is true iff in \(c^{\mathcal S}\), the heap cell \({\mathbbm {c}}\) is accessible only from the thread that is not removed. For a set \(C\) of system configurations, define \(\alpha _\mathtt{thread}\left( C\right) = \mathop \bigcup _{c^{\mathcal S}\in C}\alpha _\mathtt{thread}\left( c^{\mathcal S}\right) \). Conversely, for a set \(C\) of thread abstracted configurations, define \(\gamma _\mathtt{thread}\left( C\right) = \left\{ c^{\mathcal S}\ | \ \alpha _\mathtt{thread}(\{C\})\subseteq C\right\} \). Define the postcondition operation \(\mathtt{Post}^{ta}\) on sets of thread abstracted configurations in the standard way by \( \mathtt{Post}^{ta}\left( C\right) = \alpha _\mathtt{thread}(\mathtt{Post}\left( \gamma _\mathtt{thread}(C)\right) ) \). By standard arguments, \(\mathtt{Post}^{ta}\) then soundly overapproximates \(\mathtt{Post}\).

Fig. 5.
figure 5

Example of (a) heap state of Lazy Set algorithm, (b) its thread-abstracted version, and (c) symbolic version. The observer register x has value 9.

In Fig. 5 (a) is a possible heap state of the Lazy Set algorithm of Fig. 2. Each cell contains the values of val, mark, and lock from top to bottom, where ✔ denotes true, and ✗ denotes false (or free for lock). There are three threads, 1, 2, and 3 with pointer variable p[i] of thread i labeling the cell that it points to. The observer register x has value 9. Figure 5 (b) shows its thread abstraction onto thread 1.

We observe that the concretization of a set of abstract configurations is a set of configurations with arbitrary sizes. However, as we argue below, it is sufficient to only consider such sizes up to \(3\cdot (\# {\mathcal O})+2\). The reason is that the thread abstraction encodes, for an arbitrary thread \(\mathtt{th}\), its view, i.e., its local state, the state of the heap, and the observer. In order to preserve soundness, we need not only to consider the transitions of \(\mathtt{th}\), but also how the transitions of other threads may change the states of the heap and the observer. Suppose that a different thread \(\mathtt{th}'\) performs a transition. We consider two cases depending on whether \(\mathtt{th}'\) sends a broadcast message or not. If not, it is sufficient for \(\mathtt{th}\) to see how \(\mathtt{th}'\) alone changes the heap and the observer. If \(\mathtt{th}'\) sends a message an arbitrary number of threads may receive it. Now note that only \(\mathtt{th}'\) may change the heap, while the receivers only change the state of the observer (Sect. 5). Also note that the values of the observer registers never change. The total effect is that \(\mathtt{th}'\) may change the state of the heap, while the other threads only change the state of the observer. Any such effect can be accomplished by only considering at most \(3\cdot (\# {\mathcal O})+2\) threads. The diameter \(\# {\mathcal O}\) of \({\mathcal O}\) is the length of the longest simple path in \({\mathcal O}\). Formally, define \(\gamma _\mathtt{thread}^{k}\left( C\right) = \left\{ c\ | \ \left( c\in \gamma _\mathtt{thread}\left( C\right) \right) \cap \left| c\right| \le k\right\} \), i.e., it is the set of configurations of in the concretization of \(C\) with at most \(k\) threads.

Lemma 1

For a set of abstract configurations \(C\), we have \(\mathtt{Post}^{ta}{(C)} = \alpha _\mathtt{thread}\left( \mathtt{Post}\left( \gamma _\mathtt{thread}^{k}\left( C\right) \right) \right) \) where \(k = 3\cdot (\# {\mathcal O})+2\).

As we mentioned above, when we perform helping transitions, the only effect of receiver threads is to change the state of the observer. When the receivers do not change the observer to a non-accepting state, we say that the transition is stuttering wrt. the observer. The system \({\mathcal S}\) is stuttering if all helping transitions of \({\mathcal S}\) are stuttering. This can be verified by a simple syntactic check of the observer and controller. In all the examples we have considered, the system turns out to be stuttering. For instance, the receiver threads in the Lazy Set algorithm are all performing stuttering ctn operations. Hence, the concretization needs only consider the sender thread, together with the thread whose view we are considering, reducing the number of threads to two.

Lemma 2

For a set abstract configurations \(C\) of \({\mathcal S}\), if \({\mathcal S}\) is stuttering, then \(\mathtt{Post}^{ta}{(C)} = \alpha _\mathtt{thread}\left( \mathtt{Post}\left( \gamma _\mathtt{thread}^{2}\left( C\right) \right) \right) \).

7 Symbolic Shape and Data Abstraction

This section describes how we handle an unbounded heap and data domain in thread-abstracted configurations. We assume that each heap cell contains exactly one \({\mathbb {Z}}\)-field.

The abstraction of thread-abstracted configurations we use is a variation of a well-known abstraction of singly-linked lists [19]. We explicitly represent only a finite subset of the heap cells, called the relevant cells. The remaining cells are summarized into linked-list segments of arbitrary length. Thus, we intuitively abstract the heap into a graph-like structure, whose nodes are relevant cells, and whose edges are the connecting segments. More precisely, a cell \({\mathbbm {c}}\) (in a thread-abstracted configuration) is relevant if it is either (i) pointed to by a (global or local) pointer variable, or (ii) the value of \({{\mathbbm {c}}_1.\mathtt{next}}\) and \({{\mathbbm {c}}_2.\mathtt{next}}\) for two different cells \({\mathbbm {c}}_1\), \({\mathbbm {c}}_2 \in {\mathbb C}\), one of which is reachable from some global variable, or (iii) the value of \({{\mathbbm {c}}'.\mathtt{next}}\) for some cell \({\mathbbm {c}}' \in {\mathbb C}\) such that \(\mathtt{private}({\mathbbm {c}}')\) but \(\lnot \mathtt{private}({\mathbbm {c}})\). Each relevant cell is connected, by a linked-list segment, to a uniquely defined next relevant cell. For instance, the thread abstracted heap in Fig. 5(b) has 5 relevant cells: 4 cells that are pointed to by variables (Head, Tail, p[1], and c[1]) and the cell where the lower list segment meets the upper one (the second cell in the top row). The corresponding symbolic representation, shown in Fig. 5(c), contains only the relevant cells, and connects each relevant cell to its successor relevant cell. Note that the cell containing 4 is not relevant, since it is not globally reachable. Consequently, we do not represent the precise branching structure among cells that are not globally reachable. This is an optimization to limit the size of our symbolic representation.

Our symbolic representation must now represent (A) data variables and data fields of relevant cells, and (B) data fields of list segments. For (A) we do as follows.

  1. 1.

    Each data and lock variable, and each data and lock field of each relevant cell, is mapped to a domain which depends on the set of values it can assume: (i) variables and fields that assume values in \(\mathbb {F}\) are mapped to \(\mathbb {F}\), (ii) lock variables and fields are mapped to \(\left\{ \mathtt{th}_1, other , free \right\} \), representing whether the lock was last acquired by \(\mathtt{th}_1\), by another thread, or is free, (iii) variables and fields that assume values in \({\mathbb {Z}}\) are mapped to \([\mathtt{X}^{{\mathcal O}}\mapsto {\left\{ =,\ne \right\} }]\), representing for each observer register \(x \in \mathtt{X}^{{\mathcal O}}\), whether or not the value of the variable or field equals the value of x.

  2. 2.

    Each pair of variables or fields that assume values in \({\mathbb {Z}}\) is mapped to a subset of \({\left\{ <,=,>\right\} }\), representing the set of possible relations between them.=

For (B), we define a domain of path expressions, which capture the following information about a segment.

  • The possible relations between adjacent \({\mathbb {Z}}\)-fields in the heap segment, represented as a subset of \({\left\{ <,=,>\right\} }\), e.g., to represent that the segment is sorted.

  • The set of observer registers whose value appears in some \({\mathbb {Z}}\)-field of the segment. For these registers, the path expression provides (i) the order in which the first occurrence of their values appear in the segment; this allows, e.g., to check ordering properties between data values for stacks and queues, and (ii) whether there is exactly one, or more than one occurrence of their value.

  • For each data and lock field, the set of possible values of that field in the heap segment, represented as a subset of the domain defined in case 1 above. This subset is provided both for the cells whose \({\mathbb {Z}}\)-field is not equal to any observer register, and for the cells whose \({\mathbb {Z}}\)-field is equal to each observer register.

To illustrate, in Fig. 5(c), each heap segment is labeled by a representation of its path expression. The first component of each path expression is \(\left\{ <\right\} \), i.e., the segment is sorted. The path expression at the top right has as its second component the sequence \(x^1\), expressing that there is exactly one cell whose \({\mathbb {Z}}\)-field has the same value as the observer register x. The row expresses that the cells whose \({\mathbb {Z}}\)-value is equal to that of x, have their mark field true, and their lock field free. The row expresses that the other cells have their mark field false, and their lock field free. The two lower path expressions express that no cell has a value equal to that of x, and summarize the possible values of fields.

The symbolic representation combines the above described shape and data representations, representing thread-abstracted configurations by symbolic configurations. These are tuples of form \(\Phi = \left\langle \mathtt{I}, \mathtt{Val}^\mathtt{v}, \mathtt{nextrel}, \pi , \mathtt{Val}^\mathtt{d}, \mathtt{Val}^\mathtt{r},\mathtt{private}\right\rangle \), where

  • \(\mathtt{I}\) is a finite set of indices, denoting the relevant cells of the heap,

  • \(\mathtt{Val}^\mathtt{v}\) maps each (global or local) pointer variable to an index in \(\mathtt{I}\),

  • \(\mathtt{nextrel}\) maps each index \(i\) to \(\mathtt{I}\cup \left\{ \mathtt{null},\bot \right\} \),

  • \(\pi \) maps each index \(i\) to a path expression; intuitively, if the index \(i\) represents the relevant cell \({\mathbbm {c}}\), then \(\mathtt{nextrel}(i)\) represents the next relevant cell, and \(\pi (i)\) summarizes the heap segment between \({\mathbbm {c}}\) and the cell represented by \(\mathtt{nextrel}(i)\),

  • \(\mathtt{Val}^\mathtt{d}\) maps data variables and fields of relevant cells to appropriate domains, and \(\mathtt{Val}^\mathtt{r}\) maps pair of those that assume values in \({\mathbb {Z}}\) to a subset of \(\left\{ <, =, >\right\} \).

  • \(\mathtt{private}\) is a predicate on indices.

We define a satisfaction relation between thread-abstracted and symbolic configurations. A symbolic representation \(\Psi \) is a set of symbolic configurations. A set of thread-abstracted configurations then satisfies a symbolic representation \(\Psi \) if each of its thread-abstracted configuration satisfies some symbolic configuration in \(\Psi \).

Symbolic Postcondition Computation. It remains to define a symbolic post operation \(\mathtt{Post}^{symb}\) on symbolic representations that reflects \(\mathtt{Post}^{ta}\) on sets of thread-abstracted configurations. Given a set \(\Psi \) of symbolic configurations representing possible views of single threads, it first generates all ways of merging them to symbolic representations of combined views of k threads; thereafter computing their postconditions wrp. to the next statement in each thread; and finally projecting the results onto each of the k participating threads. As follows from Lemmas 1 and 2, it is sufficient to consider only bounded values of k, and indeed \(k=2\) is sufficient for all examples that we considered.

Fig. 6.
figure 6

Experimental Results for verifying concurrent programs

8 Experimental Results

Based on our framework, we have implemented a tool in OCaml, and used it for verifying 19 concurrent algorithms (both lock-based and lock-free) including two stacks, five queues, nine ordered sets, one unordered set and two CAS algorithms. The user needs to provide the program code and the set of controllers. The experiments were performed on a desktop 2.8 GHz processor with 8 GB memory. The results are presented in Fig. 6, where running times are given in seconds. Figure 6(a) provides the verification results for all algorithms except for HW queue with provided linearization policies, using the technique of controllers introduced in the paper. Figure 6(b) provides the verification results for queues and stack without LP placement, using the technique adapted from [4, 15]. All experiments start from the initial heap, and end either when the analysis reaches the fixed point or when a violation of linearizability is detected.

Helping. The algorithms marked by \(\checkmark \) use helping. We run some of the algorithms with two different helping patterns (such as the ones described in Sect. 3), and report the execution time for each.

Arrays. Our tool does not currently support arrays, and hence we have transformed arrays to singly-linked lists in the algorithms that use the former.

Safety Properties. Our tool is also capable of verifying memory related safety properties such as the absence of null pointer dereferencing, dangling pointers, double-freeing, cycles, and dereferencing of freed nodes, as well as sortedness. In fact, for each algorithm, the time reported in Fig. 6 is the sum of the times taken to show linearizability and all the properties mentioned above.

Running Times. As can be seen from the table, the running times vary in the different examples. This is due to the types of shapes that are produced during the analysis. For instance, the CCAS, RDCSS, stack and queue algorithms without an elimination produce simple shape patterns and hence they have shorter running times. Several features may make shapes more complex, such as insertion/removal of elements in the middle of a list in the ordered set algorithms, and having two linked lists (instead of one) in the elimination queue and stack algorithms. Also, the unordered set algorithm generates complex shapes since it allows physically removed elements to re-appear in the set.

Error Detection. In addition to establishing correctness of the original versions of the benchmark algorithms, we tested our tool with intentionally inserted bugs. For example, we emitted broadcast messages in the controllers or inserted bugs into the codes of algorithms. In all cases, the tool, as expected, successfully detected and reported the bug. In the Lazy Set algorithm, when we emitted rule \(\rho _7\) in Fig. 3, the tool reported an error. As another example, when we removed the statement in line 4 of the add method in the Lazy Set algorithm, the tool also reported an error.

9 Conclusions

We have presented a uniform framework for automatically verifying linearizability of singly-linked list-based concurrent implementations of sets, stacks, and queues, annotated with linearization policies. Our contributions include a novel formalism for specifying linearization policies with non-fixed LPs, an extension of thread-modular reasoning, and an extension of existing symbolic representations of unbounded singly-linked-list structures containing data from an unbounded domain. We have verified all linearizable singly-linked list-based implementations known to us in the literature. In the future, we intend to extend the framework to more complex data structures.