1 Introduction

This paper presents PrideMM, an efficient model checker for second-order (SO) logic. PrideMM is used to automatically evaluate tests under the intricate memory specifications of aggressively optimised concurrent languages, where no automated solution currently exists, and it is compared to existing tools over a simpler class of memory specifications.

We argue that SO logic is a sweet spot: restrictive enough to enable efficient solving, yet expressive enough to extend automation to a new class of memory specifications that seek to solve open problems in concurrent language design. PrideMM enables a modify-execute-evaluate pattern of memory-specification development, where changes are quickly implemented and automatically tested.

Memory specifications define what values may be read in a concurrent system. Current evaluators rely on ad hoc algorithms  [3, 6, 14] or satisfiability (SAT) solvers  [40]. However, flaws in existing language memory specifications  [5]—where one must account for executions introduced through aggressive optimisation—have led to a new class of memory specifications  [20, 22] that cannot be practically solved using existing ad hoc or SAT techniques.

Many memory specifications are definable in \(\exists \)SO in a natural way, and one can simulate them using SAT solvers. We demonstrate this facility of PrideMM for a realistic C++ memory specification  [24], reproducing previous results [39, 40]. But some memory specifications are naturally formulated in higher-order logic. For example, the Jeffrey-Riely specification (J+R) comes with a formalisation, in the proof assistant Agda  [11], that clearly uses higher-order features  [20]. We observed that the problem of checking whether a program execution is allowed by J+R can be reduced to the model checking problem for SO. From a program execution, one obtains an SO structure \(\mathfrak {A}\) on a universe of size n, and then one asks whether \(\mathfrak {A}\models \mathsf {JR}_n\), where

figure a

We will define these formulae precisely later (Sect. 5.4). For now, observe that the formula \(\mathsf {JR}_n\) is in \(\exists \forall \exists \)SO. In practice, this means that it is not possible to use SAT solvers, as that would involve an exponential explosion. That motivates our development of an SO model checker. It is known that SO captures the polynomial hierarchy  [27, Corollary 9.9], and the canonical problem for the polynomial hierarchy is quantified satisfiability. Hence, we built our SO model checker on top of a quantified satisfiability solver (QBF solver), QFUN  [17].

The contributions of our work are as follows:

  1. we present a model checker for SO, built on top of QBF solvers;

  2. we reproduce known simulation results for traditional memory specifications;

  3. we simulate a memory specification (J+R) that is a representative of a class of memory specifications that are out of the reach of traditional simulation techniques.

2 Overview

Figure 1 shows the architecture of our memory-specification simulator. The input is a litmus test written in the LISA language, and the output is a boolean result. LISA is a programming language that was designed for studying memory specifications  [1]. We use LISA for its compatibility with the state-of-the-art memory-specification checker Herd7  [3]. We transform the input program into an event structure  [41]. The memory-specification generator (MSG) produces an SO formula. We have a few interchangeable MSGs (Sect. 5). For some memory specifications (Sect. 5.1, Sect. 5.2, Sect. 5.3), which Herd7 can handle as well, the formula is in fact fixed and does not depend at all on the event structure. For other memory specifications (such as Sect. 5.4), the MSG might need to look at certain characteristics of the structure (such as its size). Finally, both the second-order structure and the second-order formula are fed into a solver, giving a verdict for the litmus test.

Fig. 1.
figure 1

From a LISA test case to a Y/N answer, given by the SO solver.

We are able to do so because of a key insight: relational second-order logic represents a sweet spot in the design space. On the one hand, it is expressive enough that encoding memory specifications is natural. On the other hand, it is simple enough that it can be solved efficiently, using emerging QBF technology.

2.1 Memory Specifications

A memory specification describes the executions allowed by a shared-memory concurrent system; for example, under sequential consistency (SC)  [25] memory accesses from all threads are interleaved and reads take their value from the most recent write of the same variable. Processor speculation, memory-subsystem reordering and compiler optimisations lead mainstream languages and processors to violate SC, and we say such systems exhibit relaxed concurrency. Relaxed concurrency is commonly described in an axiomatic specification (e.g. SC, ARM, Power, x86, C++ specifications  [3]), where each program execution is represented as a graph with memory accesses as vertices, and edges representing program structure and dynamic memory behaviour. A set of axioms permit some execution graphs and forbid others.

Figure 2 presents a litmus test—a succinct pseudocode program designed to probe for a particular relaxed behaviour—together with an execution graph and an axiom. We shall discuss each in turn.

The test, called LB+ctrl, starts with \(\texttt {x}\) and \(\texttt {y}\) initialised to 0, then two threads concurrently read and conditionally write 1 back to their respective variables. The outcome \(r_1 = 1 \wedge r_2 = 1\) (1/1) is unintuitive, and it cannot result from SC: there is no interleaving that agrees with the program order and places the writes of 1 before the reads for both \(\texttt {x}\) and \(\texttt {y}\).
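To make the interleaving argument concrete, the impossibility of 1/1 under SC can be checked by brute force. The sketch below is our own illustration (not part of PrideMM): it enumerates every interleaving of the two two-step threads of LB+ctrl and collects the final register values.

```python
from itertools import permutations

def run(schedule):
    """Execute LB+ctrl under one interleaving. Each thread has two steps:
    a load, then a conditional store of 1. Returns (r1, r2)."""
    mem = {"x": 0, "y": 0}
    regs = {"r1": None, "r2": None}
    def step(tid, pc):
        if tid == 0:
            if pc == 0:
                regs["r1"] = mem["x"]          # r1 = x
            elif regs["r1"] == 1:
                mem["y"] = 1                   # if (r1 == 1) y = 1
        else:
            if pc == 0:
                regs["r2"] = mem["y"]          # r2 = y
            elif regs["r2"] == 1:
                mem["x"] = 1                   # if (r2 == 1) x = 1
    pcs = [0, 0]
    for tid in schedule:
        step(tid, pcs[tid])
        pcs[tid] += 1
    return regs["r1"], regs["r2"]

# Every interleaving of the two two-step threads.
outcomes = {run(s) for s in set(permutations([0, 0, 1, 1]))}
```

Only (0, 0) is reachable: a store in either thread requires a prior load of 1, which in turn requires the other thread's store to have already happened, so 1/1 never appears under SC.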

In an axiomatic specification, the outcome specified by the test corresponds to the execution graph shown in Fig. 2. Initialisation is elided, but the read and write of each thread is shown with \({ po }\) edges reflecting program order and edges linking writes to reads that read from them. The axiom of Fig. 2 forbids the outcome 1/1 as the corresponding execution contains a cycle in . The SC, x86, Power and ARM specifications each include a variant of this axiom, all forbidding 1/1, whereas the C++ standard omits it  [6] and allows 1/1.

MemSAT  [39] and Herd7  [3] automatically solve litmus tests for axiomatic specifications using a SAT solver and ad hoc solving respectively, but not all memory specifications fit the axiomatic paradigm.

Fig. 2.
figure 2

LB+ctrl, an axiomatic execution of it, and an axiom that forbids it.

Axiomatic Specifications Do Not Fit Optimised Languages. Languages like C++ and Java perform dependency-removing optimisations that complicate their memory specifications. For example, the second thread of the LB+false-dep test in Fig. 3 can be optimised using common subexpression elimination to r2 = y; x = 1;. On ARM and Power, this optimised code may be reordered, permitting the relaxed outcome 1/1, whereas the syntactic control dependency of the original would make 1/1 forbidden. It is common practice to use syntactic dependencies to enforce ordering on hardware, but at the language level the optimiser removes these false dependencies.

The memory specification of the C++ standard  [15] is flawed because its axiomatic specification cannot draw a distinction between the executions leading to outcome 1/1 in LB+ctrl and LB+false-dep: to see that the dependency is false, one must consider more than one execution path, but axiomatic specifications judge single executions only  [5].

Fig. 3.
figure 3

LB+false-dep and the corresponding event structure.

Event Structures Capture the Necessary Information. A new class of specifications aims to fix this by ordering only real dependencies  [12, 20, 22, 31]. With a notable exception  [22], these specifications are based on event structures, where all paths of control flow are represented in a single graph. Figure 3 presents the event structure for LB+false-dep. Program order is represented by arrows ( ). Conflict ( ) links events where only one can occur in an execution (the same holds for their program-order successors). For example, on the left-hand thread, the load of x can result in a read of value 0 (event a) or a read of value 1 (event b), but not both. Conversely, two subgraphs unrelated by program-order or conflict, e.g. \(\{a,b,c\}\) and \(\{d,e,f,g\}\), represent two threads in parallel execution.

It should be clear from the event structure in Fig. 3 that regardless of the value read from y in the right-hand thread, there is a write to x of value 1; that is, the apparent dependency from the load of y is false and could be optimised away. Memory specifications built above event structures can recognise this pattern and permit relaxed execution.
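As a small illustration of how conflict propagates along program order and how the false dependency can be detected, here is a sketch of the right-hand thread's event structure. The event labels d, e, f, g follow Fig. 3, but the encoding itself is our own and hypothetical.

```python
# Right-hand thread of LB+false-dep (assumed labels, after Fig. 3):
# d: R y=0, e: R y=1 (conflicting reads); f/g: W x=1 following d/e.
po = {("d", "f"), ("e", "g")}
conflict = {frozenset({"d", "e"})}
events = {"d": ("R", "y", 0), "e": ("R", "y", 1),
          "f": ("W", "x", 1), "g": ("W", "x", 1)}

def conflicts(x, y):
    """Conflict holds directly, or is inherited by program-order successors."""
    if frozenset({x, y}) in conflict:
        return True
    preds_x = {p for (p, q) in po if q == x} | {x}
    preds_y = {p for (p, q) in po if q == y} | {y}
    return any(frozenset({p, q}) in conflict
               for p in preds_x for q in preds_y if p != q)

# One maximal configuration per branch of the read of y.
maximal_configs = [{"d", "f"}, {"e", "g"}]
dependency_is_false = all(
    any(events[ev][:2] == ("W", "x") for ev in cfg)
    for cfg in maximal_configs)
```

Because a write to x of value 1 occurs in every maximal configuration, the apparent dependency on the value read from y is false, exactly as the text argues.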

The Jeffrey and Riely Specification. J+R is built above event structures and correctly identifies false dependencies  [20]. Conceptually, the specification is related to the Java memory specification  [29]: in both, one constructs an execution stepwise, adding only memory events that can be justified from the previous steps. The sequence captures a causal order that prevents cycles with real dependencies. While Java is too strong, J+R allows writes that have false dependencies on a read to be justified before that read. To do this, the specification recognises confluence in the program structure: regardless of the execution path, the write will always be made. This search across execution paths involves an alternation of quantification that current ad hoc and SAT-based tools cannot efficiently simulate. However, the problem is amenable to QBF solvers.

2.2 Developing SC in SO Logic

The SC memory specification can be expressed as an axiomatic model  [3] using coherence order, a per-variable total order of write events. An execution is allowed if there exists a reads-from relation  and a coherence order  such that the transitive closure of is acyclic. Here, \({ po }\) is the (fixed) program-order relation, and it is understood that  and  satisfy certain further axioms. In our setting, we describe the sequentially consistent specification as follows. We represent  and  by existentially-quantified SO arity-2 variables  and , respectively. For example, to say , we use the formula . The program order \({ po }\) is represented by an interpreted arity-2 symbol <. Then, the SO formula that represents is

figure d

The definition above should be interpreted as a macro expansion rule: the left-hand side \(\mathsf {R}(y,z)\) is a combinator that expands to the formula on the right-hand side. To require that the transitive closure of \(\mathsf {R}\) is acyclic, we assert that there exists a relation that includes \(\mathsf {R}\), is transitive, and is irreflexive:

$$\begin{aligned} \exists Z\, \bigl ( \mathsf {sub}^2(\mathsf {R}, Z) \wedge \mathsf {trans}(Z) \wedge \mathsf {irrefl}(Z) \bigr ) \end{aligned}$$

The combinators \(\mathsf {sub}^2\), \(\mathsf {trans}\), \(\mathsf {irrefl}\) are defined as one would expect. For example, \(\mathsf {sub}^2(P,Q)\), which says that the arity-2 relation P is included in the arity-2 relation Q, is \(\forall xy\,\bigl (P(x,y)\rightarrow Q(x,y)\bigr )\). In short, the translation from the usual formulation of memory specifications into the SO logic encoding that we propose is natural and almost automatic.
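The acyclicity encoding can be checked directly on small universes. The following sketch is illustrative only (PrideMM's actual combinator API differs): it implements \(\mathsf {sub}^2\), \(\mathsf {trans}\) and \(\mathsf {irrefl}\) as semantic checks over concrete relations, and realises acyclicity as the existential search for a witness Z.

```python
from itertools import product

def all_relations(universe):
    """Every binary relation over the universe (2^(n^2) of them)."""
    pairs = list(product(universe, repeat=2))
    for bits in product([False, True], repeat=len(pairs)):
        yield {p for p, b in zip(pairs, bits) if b}

def sub2(P, Q):   # forall x y (P(x,y) -> Q(x,y))
    return P <= Q

def trans(Z):     # forall x y z (Z(x,y) and Z(y,z) -> Z(x,z))
    return all((x, z) in Z for x, y in Z for y2, z in Z if y == y2)

def irrefl(Z):    # forall x, not Z(x,x)
    return all(x != y for x, y in Z)

def acyclic(R, universe):  # exists Z (sub2(R,Z) and trans(Z) and irrefl(Z))
    return any(sub2(R, Z) and trans(Z) and irrefl(Z)
               for Z in all_relations(universe))
```

The brute-force search over Z is exponential, which is precisely why PrideMM delegates such quantification to a QBF solver rather than enumerating relations.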

To represent programs and their behaviours uniformly for all memory specifications in Sect. 5, we use event structures. These have the ability to represent an overlay of potential executions. Some memory specifications require reasoning about several executions at the same time: this is a salient feature of the J+R memory specification.

Once we have the program and its behaviour represented as a logic structure \(\mathfrak {A}\) and the memory specification represented as a logic formula \(\phi \), we ask whether the structure satisfies the formula, written \(\mathfrak {A}\models \phi \). In other words, we have to solve a model-checking problem for second-order logic, which reduces to QBF solving because the structure \(\mathfrak {A}\) is finite.

3 Preliminaries

To introduce the necessary notation, we recall some standard definitions  [27]. A (finite, relational) vocabulary \(\sigma \) is a finite collection of constant symbols (\(\mathtt {1},\ldots ,\mathtt {n}\)) together with a finite collection of relation symbols (\(\mathtt {q},\mathtt {r},\ldots \)). A (finite, relational) structure \(\mathfrak {A}\) over vocabulary \(\sigma \) is a tuple \(\langle A,Q,R,\dots \rangle \) where \(A=\{1,\ldots ,n\}\) is a finite set called the universe with several distinguished relations \(Q,R,\ldots \) We assume a countable set of first-order variables (\(x, y, \ldots \)), and a countable set of second-order variables (\(X, Y, \ldots \)). A variable \(\alpha \) is a first-order variable or a second-order variable; a term t is a first-order variable or a constant symbol; a predicate P is a second-order variable or a relation symbol. A (second-order) formula \(\phi \) is defined inductively: (a) if P is a predicate and \(t_1,\ldots ,t_k\) are terms, then \(P(t_1,\ldots ,t_k)\) is a formula; (b) if \(\phi _1\) and \(\phi _2\) are formulae, then \(\phi _1 \circ \phi _2\) is a formula, where \(\circ \) is a boolean connective; and (c) if \(\alpha \) is a variable and \(\phi \) is a formula, then \(\exists \alpha \,\phi \) and \(\forall \alpha \,\phi \) are formulae. We assume the standard satisfaction relation \(\models \) between structures and formulae.

The logic defined so far is known as relational SO. If we require that all quantifiers over second-order variables are existentials, we obtain a fragment known as \(\exists \)SO. For example, the SC specification of Sect. 2.2 is in \(\exists \)SO.

The Model Checking Problem. Given a structure \(\mathfrak {A}\) and a formula \(\phi \), determine if \(\mathfrak {A}\models \phi \). We assume that the relations of \(\mathfrak {A}\) are given by explicitly listing their elements. The formula \(\phi \) uses the syntax defined above.
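A naive decision procedure for this problem is easy to sketch: evaluate the formula recursively, enumerating elements of the universe for first-order quantifiers and relations for second-order ones. The nested-tuple AST encoding below is hypothetical and purely for illustration; it is not PrideMM's representation.

```python
from itertools import product

def relations(universe, arity):
    """All relations of a given arity over the universe, as frozensets of tuples."""
    tuples = list(product(universe, repeat=arity))
    for bits in product([False, True], repeat=len(tuples)):
        yield frozenset(t for t, b in zip(tuples, bits) if b)

def holds(phi, structure, env):
    """Evaluate a formula given as a nested-tuple AST (hypothetical encoding)."""
    op = phi[0]
    if op == "atom":                 # ("atom", predicate, term, ...)
        rel = env.get(phi[1], structure["relations"].get(phi[1]))
        args = tuple(env.get(t, t) for t in phi[2:])
        return args in rel
    if op == "not":
        return not holds(phi[1], structure, env)
    if op == "and":
        return all(holds(p, structure, env) for p in phi[1:])
    if op == "or":
        return any(holds(p, structure, env) for p in phi[1:])
    if op in ("exists1", "forall1"): # ("exists1", var, body): element quantifier
        q = any if op == "exists1" else all
        return q(holds(phi[2], structure, {**env, phi[1]: e})
                 for e in structure["universe"])
    if op in ("exists2", "forall2"): # ("exists2", var, arity, body)
        q = any if op == "exists2" else all
        return q(holds(phi[3], structure, {**env, phi[1]: rel})
                 for rel in relations(structure["universe"], phi[2]))
    raise ValueError(op)
```

This evaluator is faithful to the semantics but exponential in the worst case; PrideMM instead translates the problem to QBF, as described in Sect. 4.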

Combinators. We will build formulae using the combinators defined below. This simplifies the presentation, and directly corresponds to an API for building formulae within PrideMM.

figure e
figure f

By convention, all quantifiers that occur on the right-hand side of the definitions above are over fresh variables. Above, \(P^k\) and \(Q^k\) are arity-k predicates, x and y are first-order variables, and \(\mathsf {R}\) and \(\mathsf {S}\) are combinators.

Let us discuss two of the more interesting combinators: \(\mathsf {acyclic}\) and \(\mathsf {TC}\). A relation P is acyclic if it is included in a relation that is transitive and irreflexive. We remark that the definition of \(\mathsf {acyclic}\) is carefully chosen: even slight variations can have a strong influence on the runtime of solvers  [18]. The combinator \(\mathsf {TC}\) for bounded transitive closure is interesting for another reason: it is higher-order, applying an argument relation (\(\mathsf {R}\)) in each step of its expansion. By way of example, let us illustrate its application to the subset combinator \(\mathsf {sub}^1\).

$$\begin{aligned}&\mathsf {TC}_1(\mathsf {sub}^1)(P,Q) \\ {}&\qquad = \mathsf {eq}^1(P,Q) \vee \exists X\, \bigl ( \mathsf {sub}^1(P,X) \wedge \mathsf {TC}_0(\mathsf {sub}^1)(X,Q) \bigr ) \\ {}&\qquad = \left\{ \begin{aligned}&\forall x_1\, \bigl (P(x_1)\leftrightarrow Q(x_1)\bigr ) \vee \\&\quad \exists X\, \bigl ( \forall x_2\, \bigl ( P(x_2) \rightarrow X(x_2) \bigr ) \wedge \mathsf {eq}^1(X,Q) \bigr ) \end{aligned} \right. \\ {}&\qquad = \left\{ \begin{aligned}&\forall x_1\, \bigl (P(x_1)\leftrightarrow Q(x_1)\bigr ) \vee \\&\quad \exists X\, \bigl ( \forall x_2\, \bigl ( P(x_2) \rightarrow X(x_2) \bigr ) \wedge \forall x_3\, \bigl ( X(x_3) \leftrightarrow Q(x_3) \bigr ) \bigr ) \end{aligned} \right. \end{aligned}$$

In the calculation above, P, Q and X have arity 1.
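The bounded-closure combinator can also be read semantically rather than as a syntactic macro. The sketch below (our own illustration, under the assumption that the argument combinator is evaluated over concrete sets) computes \(\mathsf {TC}_k(\mathsf {sub}^1)(P,Q)\) over a finite universe by enumerating the intermediate sets X.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of the universe, as frozensets."""
    s = list(universe)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def eq1(P, Q):   # forall x (P(x) <-> Q(x))
    return P == Q

def sub1(P, Q):  # forall x (P(x) -> Q(x))
    return P <= Q

def TC(k, R, universe):
    """TC_k(R)(P,Q) = eq1(P,Q) or exists X (R(P,X) and TC_{k-1}(R)(X,Q)),
    with TC_0(R) = eq1. Evaluated by enumerating the witness sets X."""
    def phi(P, Q):
        if eq1(P, Q):
            return True
        if k == 0:
            return False
        return any(R(P, X) and TC(k - 1, R, universe)(X, Q)
                   for X in powerset(universe))
    return phi
```

One unfolding already suffices for the example in the text: \(\mathsf {TC}_1(\mathsf {sub}^1)(\{1\},\{1,2\})\) holds via the witness \(X=\{1,2\}\), while \(\mathsf {TC}_0\) does not.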

4 SO Solving Through QBF

From a reasoning perspective, SO model checking is a highly non-trivial task due to quantifiers, in particular quantifiers over relations, where the size of the search space alone is daunting. For a universe of size n there are \(2^{n^2}\) possible binary relations, and there are \(2^{n^k}\) possible k-ary relations.

A relation is uniquely characterised by a vector of Boolean values, each determining whether a certain tuple is in the relation or not. This insight lets us formulate the search for a relation as a SAT problem, where a fresh Boolean variable is introduced for each potential tuple in the relation. Even though the translation is exponential, it is a popular method in finite-model finding for first-order logic formulae  [13, 33, 38].
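A minimal sketch of this encoding, with a brute-force search standing in for a real SAT backend: each of the \(n^2\) tuples gets one Boolean, and a relation is decoded from an assignment of those Booleans. The constraint shown (a strict total order) is our own example, not taken from the paper.

```python
from itertools import product

universe = [1, 2, 3]
tuples = [(x, y) for x in universe for y in universe]  # one Boolean per tuple

def search(constraint):
    """Brute-force stand-in for a SAT solver: try all 2^(n^2) Boolean
    vectors, decoding each into a binary relation."""
    for bits in product([False, True], repeat=len(tuples)):
        R = {t for t, b in zip(tuples, bits) if b}
        if constraint(R):
            return R
    return None

def strict_total_order(R):
    """Example constraint: R is a strict total order on the universe."""
    irrefl = all((x, x) not in R for x in universe)
    total = all((x, y) in R or (y, x) in R
                for x in universe for y in universe if x != y)
    trans = all((x, z) in R for (x, y) in R for (y2, z) in R if y == y2)
    return irrefl and total and trans
```

A real SAT solver explores the same \(2^{n^2}\) space, but guided by clause learning rather than enumeration; the decode-and-check structure is the same.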

However, in the setting of SO, a SAT solver is insufficient since the input formula may contain alternating quantifiers. We tackle this issue by translating to quantified Boolean formulae (QBF), rather than to plain SAT. The translation is carried out in three stages.

  1. each interpreted relation is in-lined as a disjunction of conjunctions over the tuples where the relation holds;

  2. first-order quantifiers are expanded into Boolean connectives over the elements of the universe, i.e. \(\forall x\phi \) leads to one conjunct for each element of the universe and \(\exists x\phi \) leads to one disjunct for each element of the universe;

  3. all atoms are now ground, and each atom is replaced by a fresh Boolean variable, which is inserted under the same type of quantifier as the atom.

For illustration, consider the formula \(\exists X\forall Y\forall z\, \bigl (Y(z)\rightarrow X(z)\bigr )\) and the universe \(A =\{1, 2\}\). The formula requires a set X that is a superset of all sets. Inevitably, X has to be the whole domain. The QBF formulation is \(\exists \mathsf {x}_1 \mathsf {x}_2\forall \mathsf {y}_1 \mathsf {y}_2\, \bigl ((\mathsf {y}_1\rightarrow \mathsf {x}_1\bigr ) \wedge \bigl (\mathsf {y}_2\rightarrow \mathsf {x}_2)\bigr )\). Intuitively, rather than talking about a set, we focus on each element separately, which is enabled by the finiteness of the universe. Using QBF enables us to arbitrarily quantify over the sets’ elements.
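The example can be replayed mechanically. The sketch below (our own illustration) evaluates both readings of the formula over the universe \(\{1,2\}\): the second-order reading enumerates sets, the QBF reading enumerates the Boolean variables \(\mathsf {x}_i \equiv i \in X\) and \(\mathsf {y}_i \equiv i \in Y\), and the two agree.

```python
from itertools import product

universe = [1, 2]
all_subsets = [frozenset(s) for s in [(), (1,), (2,), (1, 2)]]

# Second-order reading: exists X, forall Y, forall z (Y(z) -> X(z)).
so_truth = any(all(all(z not in Y or z in X for z in universe)
                   for Y in all_subsets)
               for X in all_subsets)

# QBF reading: exists x1 x2, forall y1 y2 ((y1 -> x1) and (y2 -> x2)),
# where xi stands for "i in X" and yi for "i in Y".
qbf_truth = any(all((not y1 or x1) and (not y2 or x2)
                    for y1, y2 in product([False, True], repeat=2))
                for x1, x2 in product([False, True], repeat=2))
```

Both evaluate to true, witnessed by \(X\) being the whole domain (\(\mathsf {x}_1 = \mathsf {x}_2 = \mathrm {true}\)).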

PrideMM enables exporting the QBF formulation into the QCIR format  [21], which is supported by a bevy of QBF solvers. However, since most solvers support only prenex form, PrideMM also prenexes the formula, attempting to heuristically minimise the number of quantifier levels.

The experimental evaluation showed that the QFUN solver  [17] performs best on the considered instances (see Sect. 6). While the solver performs very well on the J+R litmus tests, a couple of instances were left unsolved. Encouraged by the success of QFUN, we built a dedicated solver that integrates the translation to QBF and the solving itself. The solver represents the formula in dedicated hash-consed data structures (the formulae grow considerably in size). The expansion of first-order variables is done directly on these data structures, while also simplifying the formula on the fly. The solver also directly supports the non-prenex formulation (see  [19] for non-prenex QBF solving). Before expanding the first-order variables, the solver applies several preprocessing techniques, such as elimination of relations that appear only in positive or only in negative positions in the formula.

5 Memory Specification Encodings

In this section, we show that many memory specifications can be expressed conveniently in second-order logic. We represent programs and their behaviours with event structures: this supports the expression of axiomatic specifications such as C++, but also the higher-order specification of J+R. For a given program, its event structure is constructed in a straightforward way: loads give rise to mutually conflicting read events and writes to write events  [20]. We express the constraints over event structures with the following vocabulary, shared across all specifications.

Vocabulary. A memory specification decides if a program is allowed to have a certain behaviour. We pose this as a model checking problem, \(\mathfrak {A}\models \phi \), where \(\mathfrak {A}\) captures program behaviour and \(\phi \) the memory specification. The vocabulary of \(\mathfrak {A}\) consists of the following symbols:

  • arity 1:\(\mathtt {read}\), \(\mathtt {write}\), \(\mathtt {final}\)

  • arity 2: \(\le \), \(\mathtt {conflict}\), \(\mathtt {justifies}\), \(\mathtt {sloc}\), \(=\)

Sets \(\mathtt {read}\) and \(\mathtt {write}\) classify read and write events. The symbol \(\mathtt {final}\), another set of events, identifies the executions that exhibit final register states matching the outcome specified by the litmus test.

Events x and y are in program order, written \(x \le y\), if event x arises from an earlier statement than y in the program text. We have \(\mathtt {conflict}(x,y)\) between events that cannot belong to the same execution; for example, a load statement gives rise to an event for each value it might read, but an execution chooses one particular value, and contains only the corresponding event. We write \(\mathtt {justifies}(x,y)\) when x is a read and y is a write to the same memory location of the same value. We have \(\mathtt {sloc}(x,y)\) when x and y access the same memory location. Identity on events, \(\{\,(x,x)\mid x\in A\,\}\), is denoted by \(=\).

Configurations and Executions. We distinguish two types of sets of events. A configuration is a set of events that contains no conflict and is downward closed with respect to \(\le \); that is, X is a configuration when \(\mathsf {V}(X)\) holds, where the \(\mathsf {V}\) combinator is defined by

figure g

We say that a configuration X is an execution of interest when every final event is either in X or in conflict with an event in X; that is, X is an execution of interest when \(\mathsf {F}(X)\) holds, where the \(\mathsf {F}\) combinator is defined by

figure h

Intuitively, we shall put in \(\mathtt {final}\) all the maximal events (according to \(\le \)) for which registers have the desired values.
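The \(\mathsf {V}\) and \(\mathsf {F}\) combinators translate directly into checks over a concrete event structure. The sketch below uses a small assumed structure (the right-hand thread of Fig. 3, with conflict inheritance precomputed); labels and shape are illustrative, not PrideMM's internal representation.

```python
# Assumed event structure: d: R y=0, e: R y=1, f/g: W x=1 after d/e.
le = {("d", "f"), ("e", "g")}
conflict = {frozenset(p) for p in
            [("d", "e"), ("d", "g"), ("e", "f"), ("f", "g")]}
final = {"f", "g"}

def V(X):
    """Configuration: conflict-free and downward closed w.r.t. program order."""
    no_conflict = all(frozenset({x, y}) not in conflict
                      for x in X for y in X if x != y)
    down_closed = all(p in X for (p, q) in le if q in X)
    return no_conflict and down_closed

def F(X):
    """Execution of interest: every final event is in X or in conflict
    with an event of X."""
    return all(f in X or any(frozenset({f, x}) in conflict for x in X)
               for f in final)
```

For instance, \(\{d, f\}\) is a configuration and an execution of interest: it contains the final event f, and the other final event g conflicts with d.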

Notations. In the formulae below, X will stand for a configuration, which may be the execution of interest. Variables , , and so on are used to represent the relations that are typically denoted by , , , ... Thus, X has arity 1, while , , , ... have arity 2.

In what follows, we present four memory specifications: sequential consistency (Sect. 5.1), release–acquire (Sect. 5.2), C++ (Sect. 5.3), and J+R (Sect. 5.4). The first three can be expressed in \(\exists \)SO (and in first-order logic). The last one uses both universal and existential quantification over sets. For each memory specification, we shall see their encoding in second-order logic.

5.1 Sequential Consistency

The SC specification allows all interleavings of threads, and nothing else. It is described by the following SO sentence:

figure i

Intuitively, we say that there exists a coherence order relation  and a reads-from relation  which, when combined in a certain way, result in an acyclic relation . The formula says that  satisfies the usual axioms of a coherence order with respect to the execution X; and the formula says that  satisfies the usual axioms of a reads-from relation with respect to the execution X. Moreover, the formula \(\mathsf {F}(X)\) asks that X is an execution of interest, which results in registers having certain values.

figure j

When X is a potential execution and  is a potential coherence-order relation, the formula requires that the writes in X for the same location include some total order. Because of the later condition that is acyclic,  is in fact required to be a total order per location. When X is a potential execution and  is a potential reads-from relation, the formula requires that  is injective, is a subset of \(\mathtt {justifies}\), and relates all the reads in X to some write in X.

The auxiliary relation is the union of strict program-order (<), reads-from (), coherence-order (), and the from-reads relation:

figure k

5.2 Release–Acquire

Release–Acquire is a simple relaxed memory specification, which is represented straightforwardly in SO logic. It is captured by the formula \(\mathsf {RA}\) using the vocabulary established in the definition of SC:

figure l

The existential SO variable over-approximates a relation traditionally called happens-before.

5.3 C++

To capture the C++ specification in SO logic, we follow the Herd7 specification of Lahav et al.  [24]. Their work introduces necessary patches to the specification of the standard  [6] but also includes fixes and adjustments from prior work  [4, 23]. The specification is more nuanced than the SC and RA specifications and requires additions to the vocabulary of \(\mathfrak {A}\) together with a reformulation for efficiency, but the key difference is more fundamental. C++ has a catch-fire semantics: programs that exhibit even a single execution with a data race are allowed to do anything, satisfying every expected outcome. This difference is neatly expressed in SO logic:

figure m

The formula reuses , and \(\mathsf {F}(X)\) and includes three new combinators: , and . constrains a new over-approximation, \( Y _{\alpha \beta }\), used for building a transitive relation. captures the conditions imposed on a valid C++ execution, and is the analogue of the conditions applied in \(\mathsf {SC}\) and \(\mathsf {RA}\). holds if there is a race in the execution X. Note that the expected outcome is allowed if \(\mathsf {F}(X)\) is satisfied or if there is a race and is true, matching the catch-fire semantics.

New Vocabulary. C++ read-modify-write operations load and store from memory in a single atomic step: a new relation links the corresponding reads and writes. C++ fence operations introduce new events, and the set \({\mathtt {fences}}\) identifies them. The programmer annotates each memory access and fence with a memory order parameter that sets the force of inter-thread synchronisation created by the access. For each choice, we add a new set: \({\mathtt {na}}\), \({\mathtt {rlx}}\), \({\mathtt {acq}}\), \({\mathtt {rel}}\), acq-rel, and \({\mathtt {sc}}\).

Over-Approximation in Happens Before. The validity condition, , and races , hinge on a relation called happens-before. We over-approximate transitive closures in the SO logic for efficiency, but Lahav et al.  [24] define happens-before with nested closures that do not perform well. Instead we over-approximate a reformulation of happens-before that flattens the nested closures into a single one (see Appendix A).

We define a combinator for happens-before, , that is used in and . It takes as argument an over-approximation of the closure internal to the reformed definition of happens-before, \( Y _{\alpha \beta }\). constrains \( Y _{\alpha \beta }\), requiring it to be transitive and to include the conjuncts of the closure, \(\alpha \) and \(\beta \) below.

figure n

5.4 Jeffrey–Riely

The J+R memory specification is captured by a sentence \(\mathsf {JR}_n\), parametrised by an integer n. Unlike the formulae we saw before, \(\mathsf {JR}_n\) makes use of three levels of quantifiers (\(\exists \forall \exists \)), putting it on the third level of the polynomial hierarchy. We begin by lifting \(\mathtt {justifies}\) from events to sets of events P and Q:

figure o

We read \(\mathsf {J}\) as ‘justifies’, and \(\mathsf {AJ}\) as ‘always justifies’. Next, we define what Jeffrey and Riely call ‘always eventually justify’

figure p

The size of the formula \(\mathsf {TC}_n(\mathsf {AeJ}_m)(P,Q)\) defined above is \(\varTheta (mn)\). In particular, it is bounded. Finally, we let

figure q

and solve the model checking problem \(\mathfrak {A}\models \mathsf {JR}_n\). Since the formulae above are in MSO, it is sufficient to pick . Since all bounded transitive closures include the subset relation, they are monotonic, and it suffices, in fact, to pick . For actual solving, we will use this observation.

6 Evaluation

We evaluate our tool in the context of Herd7  [3], which is a standard tool among memory specification researchers for building axiomatic memory specifications. No similar tool exists for higher-order event structure based memory specifications.

Fig. 4.
figure 4

Comparison of PrideMM’s performance in contrast to Herd7  [3].

6.1 Comparison to Existing Techniques

In Fig. 4 we compare the performance and capabilities of PrideMM to Herd7, the de facto standard tool for building axiomatic memory specifications. Herd7 and PrideMM were both executed on a machine equipped with an Intel i5-5250u CPU and 16 GB of memory. We choose not to compare our tool to MemSAT  [39], as there are more memory specifications implemented for Herd7 in the CAT language  [2] than there are for MemSAT.

Performance. Notably, Herd7’s performance is very favourable in contrast to that of PrideMM, although there are some caveats. The performance of PrideMM is largely adequate, with most of the standard litmus tests taking less than 2 s to execute; \(y \le 1s\) is highlighted on the chart. We find that our QBF technique scales better than Herd7 on large programs. This is demonstrated by the SB-16 test, a variant of the “store buffering” litmus test with 16 threads. The large number of combinations for the existentially quantified relations, which Herd7 explores naïvely, causes it to take a long time to complete. In contrast, smarter SAT techniques handle these larger problems handily.

Expressiveness. We split the chart in Fig. 4 into two sections. The left-hand side displays a representative subset of common litmus tests, showing PrideMM’s strengths and weaknesses; these litmus tests are evaluated under the C++ memory specification. Note that they include tests whose behaviour is expected to be observable and tests where it is not, hence the two MP bars. The C++ memory specification is within the domain of memory specifications that Herd7 can solve, as it requires only existentially quantified relations.

The right-hand half of the chart shows the first 10 Java causality test cases run under the J+R specification, which are not expressible in Herd7. PrideMM solves these in reasonable time, with most tests solved in less than 10 min. Our J+R tests replicate the results of the original paper, but where the authors used laborious manual proof in the Agda proof assistant, PrideMM validates the results automatically (Fig. 5).

6.2 QBF vs SO Solver Performance

PrideMM can emit the SO logic formulae and structures directly for the SO solver, or convert them to a QBF query (see Sect. 4). This allows us to use our SO solver as well as off-the-shelf QBF solvers. We find that the SO solver affords us a performance advantage over the QBF solvers in most of the Java causality test cases, where performance optimisations for alternating quantification are applicable.

Fig. 5.
figure 5

Solver approaches for PrideMM on Java causality test cases. \(\bot \) represents timeout or mem-out.

We include the performance of the QBF solvers CAQE and QFUN, the respective winners of the CNF and non-CNF tracks at 2017’s QBFEVAL competition  [32]. Our QBF benchmarks were first produced in the circuit-like format QCIR  [21], natively supported by QFUN. The inputs to CAQE were produced by converting to CNF through standard means, followed by a preprocessing step with Bloqqer  [7].

We can also emit the structures and formulae as an Isabelle/HOL file, which can then be conveniently loaded into Nitpick  [8]. We found that Nitpick cannot handle the C++ specification or the J+R specification, timing out after 1 hour on all the litmus tests.

7 Related Work

We build on prior work from two different areas—relaxed memory specifications, and SAT/QBF solving: the LISA frontend comes from the Herd7 memory-specification simulator  [3], the MSGs implement memory specifications that have been previously proposed  [20, 24], and the SO solver is based on a state-of-the-art QBF solver  [17].

There is a large body of work on finite relational model finding in the context of memory specifications using Alloy  [16]. Alloy has been used to compare memory specifications and find litmus tests which can distinguish two specifications  [40], and has been used to synthesise comprehensive sets of tests for a specific memory specification  [28]. Applying SAT technology in the domain of evaluating memory specifications has been tried before, too. MemSAT  [39] uses Kodkod  [38], the same tool that Alloy relies on to do relational model finding. MemSynth  [10] uses Ocelot  [9] to embed relational logic into the Rosette  [37] language. Our results are consistent with the findings of MemSAT and MemSynth: SAT appears to be a scalable and fast way to evaluate large memory specification questions. Despite this, SAT does not widen the class of specifications that can be efficiently simulated beyond ad hoc techniques.

There is work to produce a version of Alloy that can model higher-order constructions, called Alloy*  [30]; however, it is limited in that each higher-order set requires a new signature in the universe to represent it. Exponential expansion of the sets quantified in the J+R specification leaves model finding for J+R executions intractable in Alloy* too.

While Nitpick  [8] can model higher-order constructions, we found it could not generate counterexamples in a reasonable length of time for the specifications we built. There is work to build a successor to Nitpick called Nunchaku  [34]; however, at present Nunchaku does not support higher-order quantification. Once Nunchaku is more complete, we intend to output to Nunchaku and evaluate its performance in comparison to our SO solver.

There is a bevy of work on finite model finding in various domains. SAT is a popular method for finite model finding in first-order logic formulae  [13, 33]. There are constraint satisfaction-based model finders, e.g. the SEM model finder  [42], relying on dedicated symmetry and propagation. Reynolds et al. propose solutions for finite model finding in the context of SMT  [35, 36] (CVC4 is in fact used as backend to Nunchaku).

8 Conclusion

This paper presents PrideMM, a case study in the application of new solving techniques to an area of active research. PrideMM allows memory specification researchers to build a new class of memory specifications with richer quantification, and still automatically evaluate these specifications over programs. In this sense we provide a Herd7-style modify-execute-evaluate pattern of development for higher-order memory specifications that were previously unsuitable for mechanised model finding.