1 Introduction

Distributed Algorithms. Traditionally [17], distributed algorithms are described by means of pseudo code for their local processes: sequences of statements that manipulate local variables or trigger the exchange of messages with other participating processes. The following code [9, 23] describes the intended behavior of a single so-called participant i (one out of n), which is meant to solve the problem of Distributed Consensus [17] in a system where processes may fail.

figure a

An understanding of such a distributed algorithm requires precisely fixing the underlying assumptions of the system model, e.g., the meaning of send (broadcast) and receive (input) actions in the context of failures. In the above algorithm, an essential ingredient is the \(\text {alive}\)-test, whose passing is subject to subtle guarantees. In the following, we explain the intuition behind \(\text {alive}\)-tests in the context of fault tolerance and the correctness of Distributed Consensus in more detail.

Fault Tolerance. In the so-called fail-stop model of distributed systems, processes may fail; and when they do so, they do not recover from this state. A failed process no longer contributes to the system evolution, i.e., it can neither send nor receive messages. A process that does not fail in a run is called correct (in that run). Failure detection provides processes with the permission to suspect that other processes have failed and, thus, to no longer wait for their messages to arrive. Perfect (i.e., always reliable) failure detection is not implementable in purely asynchronous systems, since it is impossible to distinguish processes that have failed from those that are just slow. Therefore, Chandra and Toueg [6] proposed the concept of unreliable failure detection, whose degree of reliability is expressed by means of temporal constraints on runs. For example, for the above Consensus algorithm, a property called Weak Accuracy suffices: “Some correct process is never suspected by any (correct) process”.

Correctness. Specifications for distributed algorithms typically consist of properties of some temporal-logic flavor that capture the intended safety and liveness guarantees. For Distributed Consensus, all participants shall agree on the decision for some value, while every participant starts with a private input value as its proposal. In the above algorithm, this input value is initially assigned to the local variable \(x_i\), which may then be updated due to knowledge acquired by learning, via communication, about the values kept by other participants. Three temporal properties then capture to what extent an algorithm works correctly.

  • Validity: Every decision must be for some initial proposal.

  • Agreement: No two correct processes decide differently.

  • Termination: Every correct process eventually decides.

The verification of these properties is dominated by state-based reasoning techniques, often referring to global state invariants about the values memorized in the respective local variables \(x_i\) of every (alive) participant.

In the above example algorithm, each participant gets its turn to propose a value in the role of the coordinator of “its” round. In every other round, each participant is to adopt the value proposed by that round’s coordinator, unless it cannot detect that this coordinator is still “alive”. The algorithm satisfies Termination, as it runs a for-loop and never deadlocks. It satisfies Validity, as values are never invented, but only passed on. It satisfies Agreement, as (at least) in the round of the process that—by Weak Accuracy—is never suspected, every other process will have to adopt this coordinator’s proposal. Afterwards, there is no way to decide otherwise.
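To make this intuition concrete, the following is a minimal Python sketch of the rotating-coordinator structure (not the pseudo code of the figure above): failures and message delays are abstracted into a boolean predicate `alive(r, i)`, and Weak Accuracy is modeled by never letting anyone suspect the participant `trim`. All names are illustrative assumptions.

```python
import random

def consensus(inputs, alive):
    n = len(inputs)
    x = list(inputs)                   # x[i] is participant i's current proposal
    for r in range(n):                 # round r is coordinated by participant r
        proposal = x[r]                # the coordinator "broadcasts" its current value
        for i in range(n):
            if i == r or alive(r, i):  # adopt, unless i suspects the coordinator
                x[i] = proposal
    return x                           # participant i decides x[i]

def make_alive(trim):
    # Weak Accuracy: coordinator `trim` is never suspected; all others may be
    # suspected arbitrarily (modeled here by a coin flip).
    return lambda r, i: r == trim or random.random() < 0.5

if __name__ == "__main__":
    decisions = consensus([10, 20, 30, 40], make_alive(trim=2))
    print(decisions)                   # all four decisions coincide
```

In the round coordinated by `trim`, every participant adopts the same value, and from then on no later coordinator can propose anything else; this is exactly the Agreement argument sketched above.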

Using Process Calculi. Process calculi provide a wealth of proof methods, and their syntactic nature allows for concise formal models that are nevertheless close to executable code in programming languages. A great variety of process calculi have been developed in the past, most of them for general purposes, some of them rather domain-specific. In the above case, the domain prompts two choices: (i) It is natural to employ distributed process calculi [12], where so-called locations represent units of distribution, possibly subject to failure. (ii) As message-passing models are prevalent for distributed systems, it is natural to also use message-passing calculi, as opposed to distributed process calculi that are based on migration among and within so-called ambients [5].

Most of the existing process calculi (often descendants of CSP [13], CCS [19], ACP [3], or the \(\pi \)-calculus family [20]), however, are based on notions of action, thus essentially supporting just action-based reasoning. The main observations in the above-mentioned attempts to use process calculi to verify distributed algorithms are that (i) even if action-based reasoning—often using bisimulation techniques—is employed on the outside, it still heavily relies on state-based reasoning inside to construct the required bisimulations [9], and (ii) classical process calculi do not support state-based reasoning at all. This was also the main problem in [15, 22], where the respective authors applied process calculus machinery to the specification and verification of fault-tolerant algorithms that solve Distributed Consensus. In [24], the authors propose a method to systematically (re)construct state information for the reachable global states of an example Distributed Consensus algorithm (which was formalized in a tailor-made process calculus) and to capture this information within a dedicated data structure outside of the calculus. The lesson learned from [24] was that this method is too tedious and highly error-prone; it simply did not scale. This motivates us to make state information explicit from the outset, instead of reconstructing it implicitly, and to provide linguistic support and structure within the process calculus itself. In this paper, we report on some of our recent results in this endeavor.

Our Approach. We use a reasonably standard and widespread notion of memory: mappings from variables to values. In our calculus, processes are threads that are associated with their own local memories. Threads may declare variables and assign the values of complex expressions to them, resulting in updates to their own memory. Threads can be defined recursively, and they may run concurrently. In a fault-tolerant scenario, locations are “named processes”, such that failures can be named. Parallel processes, together with “messages in transit”, form networks.

In the operational semantics of our calculi, we let transitions operate between structural equivalence classes (equipped with some convenient congruence properties) of states. In a fault-tolerant scenario, global configurations keep track of failures and their detection. Executions of failure-aware networks, and their reachable configurations, can be analyzed via induction on transition sequences.

Related Work. Next to the above-mentioned work [9, 15, 22] using process calculi, we also used a state-machine approach [14], which suffers from the fact that its global view on algorithms slightly obfuscates the locality of behaviors.

There are only a few other related approaches using process calculi. In several contexts, process calculi have been equipped with notions of location or locality [4, 5, 12], but there they have a different meaning; in particular, locations were not equipped with memories. In calculi with reversibility (e.g., [7]), process-local memories are used to store back-tracking information, i.e., a history of the steps of a process that led it to the current state, which can be exploited to undo these steps in a causally consistent manner. Closest to our approach is the work of Garavel [10] on LNT, which is a programming language in the spirit of LOTOS that was developed to be easier to use for engineers [11]. Our treatment of write-many variables, which is uncommon for most process calculi, was partly inspired by this work. However, the context of LOTOS/LNT is different from our distributed world, as it mainly addresses concurrent algorithms without support for fault tolerance. More detailed comparisons are found later on in Sect. 3.

Structure and Contributions. In Sect. 2, we introduce memories and expression evaluation. In Sect. 3, we define syntax and semantics of a novel calculus of distributed processes that are equipped with local memories. We provide a reasonably simple operational semantics for this calculus and discuss the impact of \(\alpha \)-conversion arising from the role of memories as binders for variables. In Sect. 4, we equip the calculus with awareness of locations and failures, which allows for completely new ways to model messages in transit and to deal with failure suspicions. In Sect. 5, we demonstrate the use of the calculus in a case study, where the advantage of direct access to the local memories of processes is apparent. In Sect. 6, we summarize our contributions and conclude with a glimpse on future work.

2 Memories

We employ the widely-used idea that states, in the simplest possible way, are just variable assignments, which are often also called memories. This follows the tradition of research on state-based reasoning (see the ABZ conference series [2]).

We assume a set \(\mathbb {V} \) of values \(\textsf{v}\), for example, booleans or natural numbers. We also assume a countably infinite set \(\mathcal {X} \) of variables x. A memory is modeled as a total function \(M: \mathcal {X} \rightarrow \mathbb {V} \cup \left\{ \top , \bot \right\} \), by which variables may be associated with values or otherwise have the status of being just initialized (\(\top \)) or undefined (\(\bot \)). The set \(\textrm{dom}{(M)}\triangleq \left\{ x \in \mathcal {X} \mid M(x) \not =\bot \right\} \) denotes all variables defined in M. Accordingly, \(M_\bot \) denotes an initial memory, thus without any defined variables.

By their mutable nature, memories may be updated, which can be defined as follows: a memory \(M\langle x \mapsto \textsf{w}\rangle \), where x is updated to map to \(\textsf{w}\in \mathbb {V} \cup \left\{ \top \right\} \), behaves just like memory M unless we access the entry of the updated variable x:

$$\begin{aligned} M\langle x \mapsto \textsf{w}\rangle (y)\triangleq {\left\{ \begin{array}{ll} \textsf{w} &{} \text {if}\,\, x = y\\ M(y) &{} \text {if}\, x \ne y\\ \end{array}\right. } \end{aligned}$$

Note that also the cases with \(M(y)\in \left\{ \top ,\bot \right\} \) are properly covered.
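The following is a minimal Python sketch of memories as total maps with the markers \(\top \) (just initialized) and \(\bot \) (undefined), following the definitions above; the class and method names are illustrative, not part of the calculus.

```python
# Memories as total maps with the markers ⊤ (just initialized) and ⊥ (undefined).
TOP, BOT = "⊤", "⊥"

class Memory:
    def __init__(self, entries=None):
        self._entries = dict(entries or {})        # variables absent here map to ⊥

    def __call__(self, x):                         # M(x)
        return self._entries.get(x, BOT)

    def dom(self):                                 # dom(M) = { x | M(x) ≠ ⊥ }
        return {x for x, v in self._entries.items() if v != BOT}

    def update(self, x, w):                        # M⟨x ↦ w⟩ with w ∈ V ∪ {⊤}
        assert w != BOT, "an update may not reset a variable to ⊥"
        new = Memory(self._entries)
        new._entries[x] = w
        return new

M_bot = Memory()              # the initial memory: no defined variables
M = M_bot.update("x", TOP)    # declare x (just initialized)
M = M.update("x", 42)         # assign a value to x
assert M("x") == 42 and M("y") == BOT and M.dom() == {"x"}
```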

We assume a set \(\mathcal {E} \) of expressions e with \(\mathbb {V} \cup \mathcal {X} \subseteq \mathcal {E} \). One may consider arbitrarily complex expressions with vectors and function symbols, as given by:

$$\begin{aligned} \begin{array}{rcl} e & \;\mathop {::=}\;& \textsf{v} \; \big | \;x \; \big | \;(e,\ldots ,e) \; \big | \;\textrm{f}(e) \end{array} \end{aligned}$$

The intended application will decide the respective range of allowed expressions.

We define the set \(\textrm{fv}(e)\) of (free) variables of e inductively by \(\textrm{fv}(\textsf{v})\triangleq \emptyset \), \(\textrm{fv}(x)\triangleq \left\{ x \right\} \), \(\textrm{fv}((e_1,\ldots ,e_n))\triangleq \bigcup \nolimits _{i\in \left\{ 1,\ldots ,n \right\} }\textrm{fv}(e_i)\), and \(\textrm{fv}(\textrm{f}(e))\triangleq \textrm{fv}(e)\).

We assume that expressions can be “reduced” to values by terminating computations. As expressions \(e\in \mathcal {E} \) may contain variables, we evaluate them within the context of a memory M with \(\textrm{fv}(e)\subseteq \textrm{dom}{(M)}\). We let the function \(\textrm{fetch}_{M} : \mathcal {E} \rightarrow \mathcal {E} \cup \left\{ \bot \right\} \) for memory M replace the variables in \(\textrm{fv}(e)\) with their M-values; if a variable is only initialized, the result is undefined (see Definition 3 in the Appendix). To model the evaluation of expressions that include function symbols \(\textrm{f}\), we assume a homomorphic function \(\textrm{eval}{(\cdot )}: \mathcal {E} \cup \left\{ \bot \right\} \rightarrow \mathbb {V} \cup \left\{ \bot \right\} \) to be employed after \(\textrm{fetch}_M{(e)}\) has fetched from M—if possible—the current values of the variables contained in e. The obvious idea then is that \(\textrm{eval}\) applies the semantics of each application of the function symbol \(\textrm{f}\). Thus, we define \(\textrm{eval}_M{(e)}\triangleq \textrm{eval}{(\textrm{fetch}_M{(e)})}\).
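The following sketch implements \(\textrm{fetch}_M\) and \(\textrm{eval}\) under an explicit tagged encoding of expressions ("val", "var", "tup", "app"); the function table `FUN`, the tag names, and the reuse of `Memory`/`TOP`/`BOT` from the previous sketch are illustrative assumptions.

```python
FUN = {"+": lambda v: v[0] + v[1], "<=": lambda v: v[0] <= v[1]}

def fv(e):                                  # free variables of an expression
    tag = e[0]
    if tag == "val": return set()
    if tag == "var": return {e[1]}
    if tag == "tup": return set().union(set(), *(fv(ei) for ei in e[1]))
    if tag == "app": return fv(e[2])

def fetch(M, e):                            # replace variables by their M-values
    tag = e[0]
    if tag == "val": return e
    if tag == "var":
        v = M(e[1])
        return BOT if v in (TOP, BOT) else ("val", v)   # only-initialized yields ⊥
    if tag == "tup":
        parts = [fetch(M, ei) for ei in e[1]]
        return BOT if BOT in parts else ("tup", parts)
    if tag == "app":
        inner = fetch(M, e[2])
        return BOT if inner == BOT else ("app", e[1], inner)

def eval_closed(e):                         # homomorphic evaluation, eval(⊥) = ⊥
    if e == BOT: return BOT
    tag = e[0]
    if tag == "val": return e[1]
    if tag == "tup": return tuple(eval_closed(ei) for ei in e[1])
    if tag == "app": return FUN[e[1]](eval_closed(e[2]))

def eval_M(M, e):                           # eval_M(e) = eval(fetch_M(e))
    return eval_closed(fetch(M, e))

M = Memory().update("x", 40)
assert eval_M(M, ("app", "+", ("tup", [("var", "x"), ("val", 2)]))) == 42
assert eval_M(M, ("var", "y")) == BOT       # y is undefined in M
```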

3 A Distributed Process Calculus with Local Memories

As we intend to use this calculus in the context of distributed systems, we have to rely on a concept of distributable units. We propose to use threads that have their own private memory, which we call processes, as the units of distribution. In physically distributed systems, messages take time to travel from one process to another. Therefore, the asynchronous variant of message passing is to be preferred, in which send and receive actions are decoupled, as they cannot happen at the same time. Evidently, a send action must always occur strictly before its corresponding receive action, which we model via a representation of “messages in travel”. All local memory states together with all messages in travel then provide us with the global state of a system.

In this section, we fix all of these concepts as a calculus with a two-level syntax for threads and (networks of distributed) processes.

In our calculus, the standard issues of variable bindings as well as the notion of \(\alpha \)-conversion inevitably pop up and get proper treatment. Note in advance that this treatment is necessary only in order to provide a sound operational semantics for the calculus. When it comes to the use of the calculus for verification, it is better to avoid the need for \(\alpha \)-conversion during executions.

Syntax. We assume the set \(\mathcal {X} \) of variables, the set \(\mathbb {V} \) of values, and the set \(\mathcal {E} \) of expressions with \(\mathcal {X} \cup \mathbb {V} \subseteq \mathcal {E} \). Let \(\mathbb {B} =\left\{ \textsf{t},\textsf{f} \right\} \) be the set of booleans with \(\mathbb {B} \subseteq \mathbb {V} \). Let \(\mathbb {C} \subseteq \mathbb {V} \) denote the set of available channels where \(c\in \mathcal {E} \) is a metavariable for an expression that has to be evaluated to a channel \(\textsf{c}\in \mathbb {C} \).

We use to denote multisets/bags and \(\uplus \) to denote their disjoint union.

The following figure defines the syntax of our calculus with local memories. The right column represents designators for the respective syntactic categories.

figure c

The syntax defines two layers, threads (\(\mathcal {T}\)) and networks (\(\mathcal {N}\)).

We first explain threads T, which assemble guards G, which in turn perform actions \(\mu \). Action \(\textrm{var}\,x\) declares variable x, action \(\langle x \mathop {:=}e \rangle \) assigns (the value of) expression e to variable x, and action \(c(x)\) receives a value over a channel c and stores it in variable x; messages \(\overline{c}\langle e\rangle \), each sending some payload e over some channel c, are collected in action O, which resembles a multicast operation sending a multiset of messages in one go. A guard G can be \(\textbf{0}\), which does nothing, an action prefix \(\mu .T\), or an (external) choice \(G+G\). Threads can be guards G, conditionals \({\textsf{if}} \;e\;{\textsf{then}}\;T\;{\textsf{else}}\;T\), or references to thread identifiers \(I\in \mathcal {I} \) that are equipped with a list of variables \({x_1,\ldots ,x_n}\) to which they need access. We require a defining equation with \(\textrm{fv}(G)\subseteq \left\{ x_1,\ldots ,x_n \right\} \) (see Definition 1) for every used thread identifier. Threads may also run in parallel \(T\mid T\).

A process \([{} {M} {}\triangleleft {} {T} {}]\) associates a memory M (introduced in Sect. 2) with a thread T. Multisets  collect messages \(\overline{\textsf{c}}\langle \textsf{v}\rangle \) in travel, where \(\textsf{c}\) and \(\textsf{v}\in \mathbb {V} \) are concrete channels and values, respectively, as determined by expression evaluation. A network N is composed of parallel processes together with the message aether.
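As a schematic rendering of this two-level syntax, the following Python data constructors mirror the grammar; all constructor and field names are ours, not the notation of the calculus.

```python
from collections import namedtuple

# actions μ
Var    = namedtuple("Var", "x")            # var x           declare x
Assign = namedtuple("Assign", "x e")       # ⟨x := e⟩        assign e to x
Recv   = namedtuple("Recv", "c x")         # c(x)            receive into x
Out    = namedtuple("Out", "msgs")         # multiset of c̄⟨e⟩ (multicast)

# guards G and threads T
Nil    = namedtuple("Nil", "")             # 0
Prefix = namedtuple("Prefix", "mu T")      # μ.T
Choice = namedtuple("Choice", "G1 G2")     # G + G
If     = namedtuple("If", "e then els")    # if e then T else T
Ident  = namedtuple("Ident", "I xs")       # thread identifier with variables x1,...,xn
Par    = namedtuple("Par", "T1 T2")        # T | T

# processes and networks N
Proc   = namedtuple("Proc", "M T")         # [M ◁ T]
Net    = namedtuple("Net", "procs aether") # parallel processes ∥ messages in travel

# e.g. the thread  var x.⟨x := 1⟩.0
example = Prefix(Var("x"), Prefix(Assign("x", ("val", 1)), Nil()))
```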

Our calculus allows for concurrent threads within processes. This is often required, because concurrent activities support a natural modeling principle for node-local code of distributed algorithms. Unless restrictions are imposed, the memory M is shared. For example, in process \([{\,} {M} {\,}\triangleleft {\,} {T_1 \mid T_2} {\,}]\), both \(T_1\) and \(T_2\) have access to the memory M and can manipulate its variables, i.e., both threads can declare new variables and assign values to them. Thus, we get the usual and well-known problems of potentially competing reads and writes, which we do not intend to repeat in this paper. We also do not intend to discuss potential solutions to race conditions. We do, however, intend to be precise about the semantic implications of such an extension concerning variable bindings.

Binders. Our calculus contains two binders for variables. (i) The thread \(\textrm{var}\,x.T\) acts as a binder for x with scope T. (ii) The process \([{} {M} {}\triangleleft {} {T} {}]\) acts as a binder for the variables in \(\textrm{dom}{(M)}\) with scope T. As usual, we must carefully deal with free and bound variables. This can be done in a mostly straightforward way.

Definition 1 (Bound and Free Variables)

We define the functions \(\textrm{bv}/\textrm{fv}\) on actions, threads and processes as follows. For actions:

figure f

For threads, the full version can be consulted as Definition 4 in the Appendix. Here, we just point out the case for identifiers:

figure g

The other cases are defined homomorphically.

For processes, the interpretation of memories as binders yields:

$$\begin{aligned} \begin{array}{rcl} \textrm{bv}{([{} {M} {}\triangleleft {} {T} {}])}&{}\triangleq &{} \textrm{bv}{(T)} \cup \textrm{dom}{(M)}\\ \textrm{fv}([{} {M} {}\triangleleft {} {T} {}]) &{}\triangleq &{} \textrm{fv}(T) \setminus \textrm{dom}{(M)} \end{array} \end{aligned}$$

A variable x is called fresh w.r.t. process P if \(x\notin \textrm{bv}{(P)}\cup \textrm{fv}(P)\). An occurrence of a variable is bound if it occurs within the scope of a binder for it. (Note that \(\textrm{var}\,x.T\) is a binder for x, so the x in “\(\textrm{var}\,x\)” itself does not qualify as an occurrence of x.) An occurrence of a variable is free if it is not bound.

For example, in thread \( (\; \textrm{var}\,x.T_1 \!\mid \! \langle x \mathop {:=}e \rangle .T_2 \;) \), variable x is both free and bound, as \(x\in \textrm{bv}{(\textrm{var}\,x.T_1)}\) and \(x\in \textrm{fv}(\langle x \mathop {:=}e \rangle .T_2)\).
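The following sketch computes \(\textrm{fv}\) over the constructors introduced above; the identifier case follows the hint given in the text, expression-level \(\textrm{fv}\) is reused from the earlier sketch, and the whole encoding remains an illustrative assumption (the authoritative definition is Definition 4 in the Appendix).

```python
def fv_thread(T):
    if isinstance(T, Nil):
        return set()
    if isinstance(T, Prefix):
        mu, cont = T.mu, T.T
        if isinstance(mu, Var):                       # var x binds x in the continuation
            return fv_thread(cont) - {mu.x}
        if isinstance(mu, Assign):
            return {mu.x} | fv(mu.e) | fv_thread(cont)
        if isinstance(mu, Recv):                      # imperative input: x is not bound here
            return fv(mu.c) | {mu.x} | fv_thread(cont)
        if isinstance(mu, Out):
            sent = set().union(set(), *(fv(c) | fv(e) for c, e in mu.msgs))
            return sent | fv_thread(cont)
    if isinstance(T, Choice):
        return fv_thread(T.G1) | fv_thread(T.G2)
    if isinstance(T, If):
        return fv(T.e) | fv_thread(T.then) | fv_thread(T.els)
    if isinstance(T, Ident):
        return set(T.xs)
    if isinstance(T, Par):
        return fv_thread(T.T1) | fv_thread(T.T2)

def fv_proc(P):                                       # fv([M ◁ T]) = fv(T) \ dom(M)
    return fv_thread(P.T) - P.M.dom()

# the example from the text: (var x.T1 | ⟨x := e⟩.T2) has x both bound and free
T = Par(Prefix(Var("x"), Nil()), Prefix(Assign("x", ("val", 0)), Nil()))
assert "x" in fv_thread(T)              # free in the right component
assert "x" not in fv_thread(T.T1)       # bound in the left component
```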

As usual, we may employ the concept of \(\alpha \)-conversion to identify processes that only differ in the concrete naming of variables. Likewise, we may rename bound variables, when needed, by consistently replacing all bound occurrences together with the respective binders with appropriately fresh variables. We write \({T_1}=_\alpha {T_2}\), if \(T_1\) and \(T_2\) differ only in consistent renamings of \(\textrm{var}\)-bound variables.

Here, we also apply this principle to processes \([{} {M} {}\triangleleft {} {T} {}]\). We may rename variables in T that are bound by M with fresh variables: We do so by consistently replacing them in M—i.e., in \(\textrm{dom}{(M)}\), as the values associated by M do not contain variables—together with all of the respective bound occurrences in T. Formally, replacing a binding for x in M (i.e., with \(x\in \textrm{dom}{(M)}\)) by a binding for a sufficiently fresh y to the M-value of x, can be defined as

$$\begin{aligned} {\left\{ \nicefrac {y}{x} \right\} }{M} \triangleq ( {M}\!\!\restriction _{\textrm{dom}{(M)}\setminus \left\{ x \right\} } ) {\langle y \mapsto M(x)\rangle } \end{aligned}$$

by first removing the binding for x (\({M}\!\!\restriction _{\textrm{dom}{(M)}\setminus \left\{ x \right\} }\)), then updating \({\langle y \mapsto M(x)\rangle }\). Let \({\left\{ \nicefrac {y}{x} \right\} }{T}\) denote the standard substitution of free occurrences of x in T with y. Assuming \(x\in \textrm{dom}{(M)}\) and y fresh for \([{} {M} {}\triangleleft {} {T} {}]\), we then define:

$$\begin{aligned}{}[{} {M} {}\triangleleft {} {T} {}] =_\alpha [{\,} {{\left\{ \nicefrac {y}{x} \right\} }{M}} {\,}\triangleleft {\,} {{\left\{ \nicefrac {y}{x} \right\} }{T}} {\,}] \end{aligned}$$

The reflexive, symmetric and transitive closure of \(=_{\alpha }\) is of course an equivalence. As it just involves consistent in-place renamings of variables, it also satisfies congruence properties. For example, we define \( [{} {M} {}\triangleleft {} {T} {}] =_\alpha [{} {M} {}\triangleleft {} {T'} {}] \) if \(T =_\alpha T'\).
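A minimal sketch of the memory renaming \({\left\{ \nicefrac {y}{x} \right\} }{M}\) defined above, assuming the `Memory`, `TOP`, and `BOT` helpers from the earlier sketch:

```python
def rename_memory(M, x, y):
    assert x in M.dom(), "x must be bound by M"
    restricted = Memory({z: M(z) for z in M.dom() if z != x})   # M restricted to dom(M)\{x}
    return restricted.update(y, M(x))                           # then ⟨y ↦ M(x)⟩

M = Memory().update("x", 42).update("z", 7)
M2 = rename_memory(M, "x", "y")
assert M2("y") == 42 and M2("x") == BOT and M2("z") == 7
# For [M ◁ T], α-conversion additionally applies the standard substitution {y/x}
# to the free occurrences of x in T, where y must be fresh for the whole process.
```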

Sanity Conventions. Processes shall provide sufficient knowledge about their local variables. Therefore, a process P is called closed if \(\textrm{fv}(P)=\emptyset \). It is practically useful to always require closedness, as the intuitive meaning of an “open” process referring to free variables would be rather dubious: What should such variables, not bound within their process, refer to? We generalize closedness of processes to networks by stating: A network N is called legal if all its processes are closed.

For verification purposes, we use memories with the intention to access specifically-named local variables. Allowing the application of \(\alpha \)-conversion during the course of execution obviously defeats this purpose.Footnote 1 Thus, we require that, in any given application, variable names will be chosen such that there is no need to refer to \(\alpha \)-conversion when declaring new variables. One ingredient in this respect is that we only permit defining equations with \(\textrm{bv}{(G)}=\emptyset \).

Fig. 1. Structural Equivalences

Fig. 2. Structure I

Definition 2 (Structural Equivalence)

We define the equivalence \(\equiv \) for threads (\(\mathcal {T}\)), processes (\(\mathcal {P}\)) and networks (\(\mathcal {N}\)) by the rules in Fig. 1.Footnote 2

For threads, we assume that both \((\mathcal {G},{+},\textbf{0})\) and \((\mathcal {T},{\mid },\textbf{0})\) are commutative monoids. In addition, we include \(\alpha \)-conversion by rule T-Alpha, while rule T-Out gets rid of empty outgoing bags.

For processes, we also include \(\alpha \)-conversion by rule P-Alpha, while rule P-Mem simply embeds thread congruence.

For networks, we assume that \((\mathcal {N},{\parallel },\mathbf {\emptyset })\) is a commutative monoid. Moreover, rule N-Chem allows us to combine and separate multisets of traveling messages.

Let \(\equiv _{\not \alpha }\) denote structural equivalence in which rules T-Alpha and P-Alpha are not allowed.

Note that the equivalence \(\equiv \) preserves the set of free variables and satisfies some useful congruence properties, due to the inclusion of the rules T-Par and N-Par. Note further that we will only consider closed processes in spite of rule P-Mem leaving this aspect open.

Operational Semantics. We define the notion of execution of networks as an unlabeled transition relation on \(\mathcal {N} \). As usual, we exploit the structural equivalence relation \(\equiv \) via the rule Str in Fig. 2. Rule Par allows us to focus on the actions of individual processes: these are captured by the rules in Figs. 3 and 4. Rules Decl, Assign, and Rcv are memory-changing. Rules Snd, True, False, and Ident are not memory-changing. Rules Snd and Rcv are global-state-changing.

Fig. 3. Local Memory-Changing Steps

Rule Decl declares a new variable for memory M, so \(x\not \in \textrm{dom}{(M)}\) is clear. We also require \(x\not \in \textrm{fv}(\widehat{T})\), as \(M\langle x \mapsto \top \rangle \) is a binder for x. Note that for closed processes, \(x\notin \textrm{dom}{(M)}\) implies \(x\not \in \textrm{fv}(\widehat{T})\). Rule Assign evaluates expression e and updates variable x in memory M, but only if it is already defined in M. It is not allowed to reset the variable to \(\bot \) (undefined) or \(\top \) (initialized).Footnote 3

Rule Rcv defines the reception of message \({\textsf{c}}\langle \textsf{v}\rangle \). Just like assignment, the received value \(\textsf{v}\) updates variable x in memory M; we only need to further check whether expression c evaluates to channel \(\textsf{c}\). Note, however, that reception may overwrite previous values; this imperative style [10] distinguishes our approach from the “classical” functional style of input, as in CCS [19].

Rule Snd selects one of the messages \(\overline{c}\langle e\rangle \) in the outgoing bag O; it then evaluates both c and e and checks whether they fit the requirement of resulting in a channel \(\textsf{c}\) and a value \(\textsf{v}\). In case of success, the message is removed from O (where \(\setminus \) denotes multiset removal), and its evaluated counterpart \(\overline{\textsf{c}}\langle \textsf{v}\rangle \) is placed into the network as “message in travel”.Footnote 4 Rule Ident describes the insertion of threads via identifiers. The premises ensure that the variables \(x_1,\ldots ,x_n \) of I are captured—as with dynamic scoping—by the associated memory M and that no other variables are accessed from within the defined body G. The rules True and False for evaluating conditionals are standard.
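The following sketch implements the rules Decl, Assign, Snd, and Rcv for a single sequential prefix, using the encodings from the previous sketches; the aether is a plain list of (channel, value) pairs standing in for the multiset of messages in travel, and the whole function is an illustrative reading of Figs. 3 and 4, not the calculus itself.

```python
def step(proc, aether):
    """Return a successor (proc', aether') if one of the four rules applies to the
    leftmost prefix of the thread, and None otherwise (illustrative only)."""
    M, T = proc.M, proc.T
    if not isinstance(T, Prefix):
        return None
    mu, cont = T.mu, T.T
    if isinstance(mu, Var) and mu.x not in M.dom():               # rule Decl
        # (the side condition x ∉ fv(T̂) for a parallel remainder T̂ is omitted here)
        return Proc(M.update(mu.x, TOP), cont), aether
    if isinstance(mu, Assign) and mu.x in M.dom():                # rule Assign
        v = eval_M(M, mu.e)
        if v not in (TOP, BOT):                                   # may not reset to ⊥/⊤
            return Proc(M.update(mu.x, v), cont), aether
    if isinstance(mu, Out) and mu.msgs:                           # rule Snd: pick one message
        (c, e), rest = mu.msgs[0], mu.msgs[1:]
        ch, v = eval_M(M, c), eval_M(M, e)
        if ch != BOT and v != BOT:
            cont2 = Prefix(Out(rest), cont) if rest else cont     # T-Out drops the empty bag
            return Proc(M, cont2), aether + [(ch, v)]
    if isinstance(mu, Recv) and mu.x in M.dom():                  # rule Rcv
        ch = eval_M(M, mu.c)
        for i, (c0, v0) in enumerate(aether):
            if c0 == ch:                                          # consume a matching message
                return Proc(M.update(mu.x, v0), cont), aether[:i] + aether[i + 1:]
    return None

# e.g. declaring and assigning: [M_⊥ ◁ var x.⟨x := 1⟩.0]
P = Proc(Memory(), Prefix(Var("x"), Prefix(Assign("x", ("val", 1)), Nil())))
P, bag = step(P, [])          # Decl
P, bag = step(P, bag)         # Assign
assert P.M("x") == 1 and bag == []
```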

Example. As we require processes to be closed, let \(x\in \textrm{dom}{(M)}\) for:

$$\begin{aligned}{}[{\;} {M} {\;}\triangleleft {\;} {\textrm{var}\,x.T_1 \mid \langle x \mathop {:=}e \rangle .T_2} {\;}] \end{aligned}$$

With \(x\in \textrm{bv}{(\textrm{var}\,x.T_1)}\) and \(x\in \textrm{fv}(\langle x \mathop {:=}e \rangle .T_2)\), the occurrence of x in \(\langle x \mathop {:=}e \rangle .T_2\) is bound by M, whereas \(\textrm{var}\,x.T_1\) declares x as a “private” variable with scope \(T_1\).Footnote 5 Note how the premise of rule Decl ensures that an \(\alpha \)-conversion is required before the variable can actually be declared.Footnote 6

Fig. 4. Local Non-Memory-Changing Steps

The following lemma states that transitions preserve legality; in particular, \(\alpha \)-conversion and transitions get along well.

Lemma 1 (Preservation)

Let N be a legal network.

If \(N \xrightarrow {{}} N'\), then \(N'\) is legal.

Proof (Sketch)

Variables are never removed from memories M. They can only be changed via \(\alpha \)-conversions, but then their bound occurrences in the associated process will be changed accordingly. Otherwise, memories can only grow.

Variables that are bound within the scope of a declaration will remain bound when rule (Decl) is applied, but then by the associated memory M.

Fig. 5. Failures

4 Location Failures and Their Detection

Syntax. We follow the approach of [21] and introduce a set \(\mathbb {L} \subseteq \mathbb {V} \) of location names. In our calculus, we then let the processes \([{\,\!} {M} {\,\!}\triangleleft {\,\!} {T} {\,\!}]\) of Sect. 3 evolve into locations \(\ell [{} {M} {}\triangleleft {} {T} {}]\), where \(\ell \in \mathbb {L} \), so locations are simply located processes [12]. Conveniently, locations may also serve as a natural unit of failure.

We adapt the communication actions of Sect. 3 to become “location-aware”:

  • Output \(\overline{c}@l\langle e\rangle \) adds the name of the intended target;

  • Input \(c@l(x)\) adds the name of the intended source;

where l represents an expression that is expected to be evaluated to a location name, so we should require that \(\textrm{eval}_M{(l)}\in \mathbb {L} \). This has two concrete advantages: (i) Location-aware send actions fit the intended application domain. (ii) Location-aware receive actions conveniently support suspicions. Messages in travel, the elements of bags, now take the form \({\textsf{c}}_{\textsf{src}\rightarrow \textsf{trg}}\langle \textsf{v}\rangle \), with \(\textsf{src},\textsf{trg}\in \mathbb {L} \) indicating the source and target of the message.

Structural Equivalence. We adapt the rules of Definition 2 to the extended syntax. The changes from processes to locations and the location-aware forms of communication actions and the messages in transit are orthogonal to the rules.

Operational Semantics. In order to track the failures of locations, we again follow [21] and identify a so-called trusted-immortal location \(\textsf{trim}\) that cannot fail and will never be suspected. With this abstraction, it is almost trivial to model systems that satisfy Weak Accuracy (see Sect. 1). We use global configurations of the form \(F \blacktriangleright _{\textsf{trim}} N\), in which (i) \(F\subseteq \mathbb {L} \) indicates which locations have failed (so far); (ii) \(\textsf{trim}\) is the dynamically determined trusted immortal; (iii) N is a network running in the context of (i) and (ii). For Weak Accuracy, it is required [21] that the very first transition of an execution randomly chooses the trusted immortal from the set of available location names. Rule TrIm in Fig. 5 shows how we represent this behavior starting out from an initial configuration. Rule Fail then allows any location to fail at any time, unless it has already failed or is immortal. Note that, in case of a location failure, we allow the associated memory to still be inspected, even though the location no longer contributes.

Figure 6 embeds the steps of the (adapted) semantics of the location-free calculus into the location-aware setting. Assuming that those steps now carry a label \(@\ell \) (see Figs. 8 and 9), rule N-Step allows such steps only if their responsible location \(\ell \) has not (yet) failed. Rule N-Susp relies on the label \(\textrm{susp}(k)@\ell \) to govern suspicions: it indicates that (a thread at) location \(\ell \) would like to suspect that location k has failed. This is generously permitted, unless the suspected location is the trusted \(\textsf{trim}\) or the suspecting location has itself failed. As a consequence, every run generated with these rules satisfies Weak Accuracy.
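A minimal sketch of this failure layer: a configuration tracks the failed locations F and the trusted immortal; the rule names in the comments refer to Figs. 5 and 6, while the Python class itself is an illustrative assumption.

```python
import random

class Config:
    def __init__(self, locations):
        self.locations = set(locations)
        self.F = set()                       # failed locations (so far)
        self.trim = None                     # trusted immortal, fixed by the first step

    def choose_trim(self):                   # rule TrIm: very first transition
        if self.trim is None:
            self.trim = random.choice(sorted(self.locations))

    def fail(self, loc):                     # rule Fail: any location may fail,
        if loc != self.trim and loc not in self.F:   # unless immortal or already failed
            self.F.add(loc)

    def may_step(self, loc):                 # rule N-Step: only non-failed locations act
        return loc not in self.F

    def may_suspect(self, suspector, k):     # rule N-Susp: trim is never suspected,
        return suspector not in self.F and k != self.trim   # failed locations never suspect

cfg = Config({1, 2, 3, 4})
cfg.choose_trim()
victim = next(l for l in sorted(cfg.locations) if l != cfg.trim)
cfg.fail(victim)
assert not cfg.may_step(victim)
assert not cfg.may_suspect(1, cfg.trim)      # Weak Accuracy: trim is never suspected
```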

Fig. 6. Located Steps

Figure 7 is the counterpart to Fig. 2, but now adapted to deal with location-aware labels \(\eta \in \left\{ @\ell , \textrm{susp}(k)@\ell \mid \ell ,k\in \mathbb {L} \right\} \).

Fig. 7. Structure II

Figure 8 contains the location-aware variants of the rules in Fig. 3. Rules L-Decl and L-Assign now take place in locations as opposed to just processes. However, rule L-Rcv will now only allow a thread inside location \(\ell \) to receive a message \({\textsf{c}}_{\textsf{src}\rightarrow \mathsf {\ell }}\langle \textsf{v}\rangle \) if two conditions are satisfied: the message must be explicitly addressed to \(\ell \), and it must also originate from the expected source location \(\textsf{src}\).

Fig. 8. Located Memory-Changing Steps

Fig. 9. Located Non-Memory-Changing Steps

Figure 9 contains the location-aware variants of the rules in Fig. 4. In addition, rule L-Susp allows a thread to ignore a reception by launching a suspicion request for the intended source location of the sender. Rule L-Snd differs from rule Snd of Fig. 4 mainly in the formation of the message in travel: now, message \({\textsf{c}}_{\mathsf {\ell }\rightarrow \textsf{trg}}\langle \textsf{v}\rangle \) explicitly mentions its source \(\ell \) and target \(\textsf{trg}\). Rules L-Ident, L-True, and L-False now take place in locations as opposed to just processes.

Normalized Derivations. Due to the design of our semantic rules, every derivation of a transition on configurations \(F \blacktriangleright _{\textsf{trim}} N\) can be normalized. Either the root of the derivation tree is generated by one of the rules in Fig. 5; then nothing else needs to be considered, as the premises only depend on F and \(\textsf{trim}\). Or the root is derived by one of the rules in Fig. 6. Then, the transition premise can always be derived with an application of rule L-Str of Fig. 7. Its purpose is to rearrange the structure of N as well as the internals of its locations such that rule L-Par can be applied (possibly multiple times). The goal is to identify a single location \(\ell [{\,} {M} {\,}\triangleleft {\,} {T\mid \widehat{T}} {\,}]\), possibly together with a suitable singleton “travel bag”, in order to enable the application of one of the rules in Figs. 8 and 9. An application of rule L-Str can support this by shifting the identified location to the left, if needed (by L-Rcv) together with a suitable message, and also by shifting the intended thread T to the left inside this location.

5 Case Study: Distributed Consensus

In this section, we formalize the algorithm that we presented in the Introduction within our distributed process calculus and prove that it correctly solves Distributed Consensus, i.e., that it satisfies Validity, Agreement and Termination.

Fig. 10. Algorithm

As the algorithm uses booleans and natural numbers, we define our sets of expressions and values accordingly: \(\mathbb {B} \cup \mathbb {N} \subseteq \mathbb {V} \). We also need operations on numbers and comparisons among them, so \(\mathcal {E} \) shall include \(e_1+e_2\), \(e_1=e_2\), and \(e_1\le e_2\) for \(e_1,e_2\in \mathcal {E} \). We assume that the evaluation function \(\textrm{eval}\) (see Sect. 2) takes care of ill-formed and ill-typed cases by yielding \(\bot \).

In addition, we use a single channel \(\textsf{c}\) for the message exchanges; as our calculus fixes source and target location names in communication actions, it will always be unambiguous for which round a message is intended, by simply identifying the sender as the respective coordinator of a round: \(\mathbb {C} \triangleq \left\{ \textsf{c} \right\} \subseteq \mathbb {V} \). Likewise, we let \(\mathbb {L} \triangleq \left\{ 1,\ldots ,n \right\} \subseteq \mathbb {N} \), as this is the convention provided by the algorithm. One may (and should!) criticize the abuse of natural numbers for this purpose, which intentionally confuses location names and round numbers, but in order to remain as close to the pseudo code as possible, we follow this convention.

For the vector \((\textsf{input}_1,\ldots , \textsf{input}_n)\) of initial proposals for the n participants, the code in Fig. 10 represents the algorithm, as formalized in our calculus. We instrument the code with tags to refer to positions in the code. (Tag  is used several times, but always with the same thread identifier).

\(\textsf{Consensus}_{(\textsf{input}_1,\ldots , \textsf{input}_n)}\) defines a network of locations, one for each participant \(\ell \in \left\{ 1,\ldots ,n \right\} \). Each location is equipped with an initial memory, where we directly set the four variables \(\textrm{chan}, \textrm{x}, \textrm{r}, \textrm{output}\) to their initial values. Note that all participants hold the same value for \(\textrm{chan}\). Note also that their initial memories potentially differ only in their initial proposals. We could also use dedicated \(\textrm{var}\,\)-declarations and assignment steps; the effect would be the same, but at the expense of \(4*2*n\) additional execution steps. Note that all locations are closed, so the defined network \(\textsf{Consensus}_{(\textsf{input}_1,\ldots , \textsf{input}_n)}\) is legal.

At each location, the same code runs, as represented by the thread definition for \(\textsf{L}_{\ell }^{\textrm{chan}, \textrm{x}, \textrm{r}, \textrm{output}}\). Note that the code does not include any variable declarations, so no \(\alpha \)-conversion will ever be needed during execution. The increment of the round number ( ) together with the break condition ( ) simulate the for-loop of the pseudo code. Apart from this deviation, the code essentially differs from its pseudo-code variant only in that we do not need to check for “\(\text {alive}(p_r)\)”, as in our calculus suspicion is, except for the trusted immortal, always allowed ( ). Note that the command \(\text {broadcast~} x_i\) is explicit in our code as a multiple output ( ). Here, we deviate from the pseudo code in that a coordinator does not send a message to itself and then wait for its reception; therefore, we use a single if-then-else instead of the two if-then constructs of the pseudo code. Finally, note that the thread is completely sequential; there are no parallel threads.
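For orientation, the following is a plain-Python paraphrase of the per-location behavior just described; it is not the calculus term of Fig. 10. The memory layout \(\textrm{chan}, \textrm{x}, \textrm{r}, \textrm{output}\), the simulated for-loop, and the coordinator branch follow the prose above, while `receive_or_suspect` and `broadcast` are assumed callbacks of a surrounding test harness.

```python
def initial_memory(channel, proposal):
    return {"chan": channel, "x": proposal, "r": 0, "output": None}

def round_step(loc, mem, n, receive_or_suspect, broadcast):
    """One iteration of the simulated for-loop at location loc; returns False
    once the participant has decided."""
    mem["r"] += 1                                   # increment of the round number
    if mem["r"] > n:                                # break condition of the simulated loop
        mem["output"] = mem["x"]                    # decide on the current proposal
        return False
    coord = mem["r"]                                # round r is coordinated by location r
    if loc == coord:                                # coordinator: multiple output to all others
        broadcast(loc, mem["x"], [j for j in range(1, n + 1) if j != loc])
    else:                                           # non-coordinator: receive from coord,
        v = receive_or_suspect(loc, coord)          # or suspect it (None models suspicion)
        if v is not None:
            mem["x"] = v
    return True
```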

The execution of the algorithm then starts from \( \emptyset \blacktriangleright _{} \mathsf {Consensus_{(\textsf{input}_1,\ldots , \textsf{input}_n)}} \) with no failed processes, and with a trusted immortal yet to be determined. By the design of our semantics, every reachable configuration can be represented in a standard form, up to structural congruence \(\equiv _{\not \alpha }\), as follows:

figure p

where we use \(\textstyle \prod \nolimits _{\ell \in {L}}\ell [{} {M_\ell } {}\triangleleft {} {T_\ell } {}]\) for \({L}\subseteq \mathbb {L} \) as abbreviation for the parallel composition of locations \(\ell [{} {M_\ell } {}\triangleleft {} {T_\ell } {}]\) modulo associativity and commutativity.

Therefore, for every reachable configuration, we can now simply inspect (i) the messages in transit ( ), (ii) the individual local states \(M_\ell \) of all participants, and (iii) the “program counters” (to be understood as a location-specific metavariable) for all participants. Using this direct access, we can now state an informative (global state) invariant. On the one hand, it is very close to the intuitive reasoning that we sketched in Sect. 1. On the other hand, it is formal and can be checked with precise reference to our operational semantics.

Lemma 2 (Invariant)

Let \((\textsf{input}_1,\ldots , \textsf{input}_n)\) be a valid vector of proposals. Let \(\textrm{Undecided}\triangleq \left\{ \textsf{input}_1,\ldots , \textsf{input}_n \right\} \).

figure s

Note that we use the convention of TLA+ on the use of conjunction lists [16], in which the enlisted conjuncts internally have stronger operator precedence.

The invariant of Lemma 2 points out that, for every participant \(\ell \), depending on its respective round \(M_\ell (\textrm{r})\), the content of its current proposal \(M_\ell (\textrm{x})\) can be constrained. Before round \(\textsf{trim}\), not much can be guaranteed (conjunct 1), as suspicions may be applied at will. However, within round \(\textsf{trim}\), it is precisely the passage of all non-coordinators from  to  that changes the situation (conjuncts 2 and 3), as none of them may suspect the coordinator \(\textsf{trim}\) to have failed. Afterwards, this value will be uniformly proposed by all later coordinators (conjuncts 4 and 5). In the invariant, the statements on messages in  just strengthen the invariant in order to make the induction go through, as the information about the decision value is passed on from locations to messages in transit, from where it will be received by the target locations. Finally, note that if the conditions of the constraints are met in a configuration, then the respective participant has remained non-failing long enough to reach this state.

Proof (Sketch)

We proceed by induction on the length of the sequence \(\longmapsto ^{+}\) (note that this induction starts after the first step to determine \(\textsf{trim}\)). The invariant is initially trivially satisfied, as all processes are in round 0 and .

The induction step addresses

figure x

for which we check all possibilities of deriving such a transition. Note that every derivation that results from an application of the rules Fail (only changes F to \(F'\)), L-True, L-False, L-Ident, L-Susp (only change  to ) will keep the invariant valid, as they neither change  nor any of the \(M_\ell \). The changes to the position must be checked, but are harmless (e.g., to to  , or  to  ). Rule L-Decl will never be applied, as there are no variable declarations in the code. Otherwise, we only have to deal with applications of the rules L-Assign, L-Snd and L-Rcv, appearing in the following cases (in which \(\ell \not \in {F}\)):

  1. participant \(\ell \) moving from to : rule L-Assign

  2. participant \(\ell \) moving from to (either again or) : rule L-Snd

  3. participant \(\ell \) moving from to : rule L-Rcv

  4. participant \(\ell \) moving from to : rule L-Assign

As an example of case 2, consider \(\ell =\textsf{trim}\) with \(M_\textsf{trim}(\textrm{r})=\textsf{trim}\) for the first time in position  . If the induction step applies L-Snd, then the message \({\textsf{c}}_{\mathsf {\textsf{trim}}\rightarrow \textsf{j}}\langle \textsf{v}\rangle \) appearing in  is the first one originating from \(\textsf{trim}\), so conjunct 2 must be checked after the step. It is satisfied, as rule L-Snd uses \(\textrm{eval}_{M_{\textsf{trim}}}(\textrm{x})=M_{\textsf{trim}}(\textrm{x})\) as the payload \(\textsf{v}\) for this message.

As another example, consider a non-coordinator \(\ell \) (\(\not =\textsf{trim}\)) in exactly the round \(M_\ell (\textrm{r}) = \textsf{trim}\) in position . By induction (conjunct 3, subconjunct 1), \(M_\ell (\textrm{x})\in \textrm{Undecided}\). If the induction step applies L-Rcv, then the message must be available. After the step, participant \(\ell \) will be in position  , so the second subconjunct of conjunct 3 must be satisfied. This holds because conjunct 2 is true in the hypothesis and rule L-Rcv updates the memory with the received \(\textsf{v}\).

Note how this proof makes heavy use of the direct access to the values of local variables \(M_\ell (\textrm{r})\) for the various \(\ell \in \mathbb {L} \) at each reachable configuration of an execution. After all, it is this possibility of access to local variables in our calculus that makes the proof doable and thus satisfies the title of this paper.

Theorem 1

The algorithm of Fig. 10 solves Distributed Consensus.

Proof (Sketch)

Termination holds, as the only potentially blocking operation is in input position ( ). Participants may always suspect, though, unless they wait for \(\textsf{trim}\). In this case, there will eventually be a message, as \(\textsf{trim}\) cannot fail.

Validity holds, as the invariant never uses a value that is not in \(\textrm{Undecided}\).

Agreement follows from conjunct 6 of the invariant together with Termination.

6 Conclusion

We provide linguistic support for state-based reasoning in distributed process calculi. We do so by equipping located processes, the units of distribution in such calculi, with local memories. We develop syntax and operational semantics for this calculus in two steps, starting with a fault-free version. We demonstrate the applicability of our calculus on the formalization of a fault-tolerant algorithm that solves Distributed Consensus. The correctness proof highlights the proximity of our formalization to the widely used intuitive correctness arguments.

We conjecture that our calculus (or slight extensions of it) is applicable to the large class of fault-tolerant distributed algorithms that are described by typical pseudo code with global asynchronous message passing, references to local variables, and simple control structures like loops and conditionals. It is rare in this domain that (channel) name passing, as known from the (Applied) Pi Calculus [1, 20], is needed in algorithms. We see, however, no problem at all in also including a restriction operator in our calculus to govern the scope of channel names.

Further work consists of applying our approach to other Distributed Consensus algorithms and mechanizing the reasoning about the invariant and the correctness proof. In [14], we used state machines and checked their correctness in Isabelle. We plan to develop a similar formalization of our calculus.