Eliminating Message Counters in Synchronous Threshold Automata

Stoilkovska, Ilina; Konnov, Igor; Widder, Josef; Zuleger, Florian

doi:10.1007/978-3-030-67067-2_10

Ilina Stoilkovska^11,12,
Igor Konnov¹²,
Josef Widder¹² &
…
Florian Zuleger¹¹

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 12597))

Included in the following conference series:

International Conference on Verification, Model Checking, and Abstract Interpretation

781 Accesses
1 Citations

Abstract

In previous work, we introduced synchronous threshold automata for the verification of synchronous fault-tolerant distributed algorithms, and presented a verification method based on bounded model checking. Modeling a distributed algorithm by a threshold automaton requires to correctly deal with the semantics for sending and receiving messages based on the fault assumption. This step was done manually so far, and required human ingenuity. Motivated by similar results for asynchronous threshold automata, in this paper we show that one can start from a faithful model of the distributed algorithm that includes the sending and receiving of messages, and then automatically obtain a threshold automaton by applying quantifier elimination on the receive message counters. In this way, we obtain a fully automated verification pipeline. We present an experimental evaluation, discovering a bug in our previous manual encoding. Interestingly, while quantifier elimination in general produces larger threshold automata than the manual encoding, the verification times are comparable and even faster in several cases, allowing us to verify benchmarks that could not be handled before.

Partially supported by: Interchain Foundation, Switzerland; Austrian Science Fund (FWF) via doctoral college LogiCS W1255.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Eliminating Message Counters in Threshold Automata

Verifying safety of synchronous fault-tolerant algorithms by bounded model checking

Article 30 August 2021

Verifying Safety of Synchronous Fault-Tolerant Algorithms by Bounded Model Checking

1 Introduction

Formal modeling and automated verification of fault-tolerant distributed algorithms [2, 28] received considerable attention recently, e.g., [8, 20, 29, 32, 38]. In the more classic approach towards distributed algorithms’ correctness, algorithms are described in pseudo code, using send and receive operations whose semantics are typically not formalized, but given in English. As a result, this may lead to ambiguities that are an obstacle both for implementing distributed algorithms faithfully, as well as for computer-aided verification. Threshold automata were introduced as a formalization of fault-tolerant distributed algorithms with precise semantics [5, 23, 26], and effective automated verification methods have been introduced both for the asynchronous [22] and for the synchronous [36] case. While they are a concise model that allows to capture precisely the non-determinism distributed systems exhibit due to the communication model and partial faults, threshold automata in fact constitute a manual abstraction: a threshold automaton has to capture two major ingredients of a distributed system: (i) the local program control flow that is based on received messages and (ii) the semantics of send and receive operations in a fault-prone environment. For many classical distributed algorithms, this manual abstraction is quite immediate, but as has been observed in [37], more involved distributed algorithms are harder to abstract manually. This manual process consists in understanding how a fault assumption—that typically is well-understood but not formalized—changes the semantics of sending and receiving messages, which is a formalization step that typically requires human ingenuity. The more desirable approach is to have a precise and formal description of (i) and (ii), and to construct the abstraction automatically. This also allows to reuse (ii), that is, the formalization of given distributed computing model for new benchmarks. Indeed, in [37], for asynchronous algorithms, we introduced a method that takes as input formalizations of (i) and (ii) and automatically constructs threshold automata. By this, we have reduced the required expertise of the user, increased the degree of automation on the verification process, and indeed found some bugs in manual abstractions of asynchronous algorithms. However, the approach in [37] focuses on (asynchronous) interleaving semantics, and asynchronous message passing, which pose different challenges than the synchronous setting.

While distributed algorithms are mostly designed for asynchronous systems, there exists a considerable amount of literature that focuses on synchronous distributed algorithms. The synchronous computation model is relevant, both theoretically and practically: (a) a well-known impossibility result [18] reveals a class of problems for which a solution in the asynchronous model does not exist, but which can be solved in the synchronous model, (b) some real-time systems are actually built on top of synchronous distributed algorithms [24], and (c) several verification approaches reduce the asynchronous to the synchronous setting [4, 12, 13, 15, 19, 25], enabling the transfer of verification techniques. For these reasons, verification in the synchronous setting received significant interest recently [1, 17, 29]. Applying verification techniques discovered a bug in an already published synchronous consensus algorithm, as reported in [27].

In [36], we proposed a synchronous variant of threshold automata along with an automated verification method based on bounded model checking. We experimentally evaluated our approach on a large number of benchmarks coming from the distributed systems literature. However, the framework in [36] is based on the manual abstraction described above.

Our Contributions. In this paper, we bring the automatic generation of threshold automata to the synchronous setting. We propose a synchronous threshold automata (STA) framework that allows us to:

1.
model a given algorithm with an STA, whose guards are linear integer arithmetic expressions over the number of received messages, such that the obtained STA is in one-to-one correspondence with the pseudo code,
2.
model the implicit assumptions imposed by the computation and fault models explicitly, using a so-called environment assumption, which is specific to the respective fault model and can be reused for different algorithms,
3.
automatically translate the guards over the local receive variables into guards over the number of globally sent messages, using quantifier elimination,
4.
pass the output of the translation as input to the verification tool proposed in [36], which implements a semi-decision procedure for computing the diameter, and performs bounded model checking.

In [36], the STA given as input to the verification tool was produced manually, that is, the steps 1–3 above were done by the user. By automating these steps, we reduce the ingenuity required by the user. We encoded the control flow and the environment assumptions of several synchronous algorithms in our framework and compared the resulting STA with the existing manual encodings from [34]. We confirm that manual abstraction is error-prone, as we discovered glitches in previous manually encoded STA. For all benchmarks, the automatically generated STA are comparable with the manual encodings. For some, the automatically generated STA could be verified faster. Thus in addition to increasing the degree of automation, we also gained in performance.

2 Our Approach at a Glance

Synchronous Distributed Algorithms. A distributed algorithm is a collection of n processes that perform a common task and exchange messages. At most t of the n processes can be faulty, and f processes are actually faulty. The numbers n, t, f are parameters, where n and t are “known”, that is, they appear in the code (see Fig. 1), while f may differ according to the individual executions. In the synchronous computation model, the actions that a process takes locally depend on the messages that the process has received in the current round by other processes. Often, a process checks whether a quorum has been obtained (e.g., majority, two-thirds, etc.) by counting the number of messages it has received. Obtaining a quorum means that the number of received messages has to pass a given threshold, which should guarantee that it is safe for a correct process to take an action, and move to a new local state.

The threshold automata framework [23] is based on the observation that from the viewpoint of enabled transitions in a transition system, we may substitute the check whether a quorum of messages has been received with a check whether enough messages have been sent. For some algorithms, this substitution is straightforward, but others have more complicated guard expressions over the number of received messages. Consider, for example, the pseudo code of the algorithm PhaseQueen [6, 9], presented in Fig. 1. The algorithm operates in phases, with two rounds per phase (lines 3–8 and 9–11). In round 1, all processes broadcast their value stored in the variable $\mathtt {v}$ (line 3), receive messages from other processes (line 4), and count the number of messages with value 0 (line 5) and value 1 (line 6). If a process received more than 2t messages with value 1, then it sets its value to 1 (line 7), otherwise it sets its value to 0 (line 8). In round 2, a process $i$ acts as a queen, if the number of the current phase is equal to $i$ (line 9), and it is the only process that broadcasts (line 9). Each process receives the queen’s value $\mathtt {v}_q$ (line 10), and checks if in round 1, it received less than $n-t$ messages with value equal to its own value $\mathtt {v}$. If this is the case, the process sets its value to the value $\mathtt {v}_q$ received from the queen (line 11). This algorithm satisfies the property agreement: it ensures that after phase $t+1$, i.e., after the loop on line 2 terminates, all correct processes have the same value $\mathtt {v}$.

Receive Synchronous Threshold Automata. In Sect. 3, we propose a new variant of synchronous threshold automata, rSTA, with guards expressed over receive variables. Figure 2 shows the rSTA of the algorithm PhaseQueen. It corresponds to the control flow of the pseudo code in Fig. 1 as follows. The following locations capture local states of correct processes that are currently not a queen:

$\textsc {v}i$ encodes that a process has the value $i \in \{0, 1\}$,
$\textsc {r1v}i$ encodes that after the first round a process sets its value to $i \in \{0, 1\}$, and that it has received at least $n-t$ messages that have its value (i.e., the condition from line 11 evaluates to false),
$\textsc {r1v}i\textsc {q}$ encodes that after the first round a process sets its value to $i \in \{0, 1\}$, and that it has received less than $n-t$ messages that have its value. Such a process will use the queen’s message to update its value at the end of the second round (that is, the condition in line 11 evaluates to true),
$\textsc {r2v}i$ encodes that after the second round a process sets its value to $i \in \{0, 1\}$.

From the location $\textsc {r2v}i$, we have outgoing rules that bring the process back to the beginning of the next phase, i.e., to $\textsc {v}i$, for $i \in \{0, 1\}$. Additionally, a process might move from the location $\textsc {r2v}i$ to $\textsc {qv}i$, for $i \in \{0, 1\}$, and thus become a queen in the next phase. The locations $\textsc {qv}i, \textsc {r1qv}i, \textsc {r2qv}i$, for $i \in \{0, 1\}$, capture the behavior of a correct process acting as a queen in the current phase. The Byzantine processes can act arbitrary, and their behavior is not explicitly modeled in the automaton. However, in some phase, the queen may be Byzantine. To capture this, we introduce locations, populated by a single Byzantine process, namely the locations $F = \{\textsc {f}, \dots , \textsc {r2qf}\}$. The queen is Byzantine in some phase, if the single Byzantine process moves from the location $\textsc {r2f}$ to the location $\textsc {qf}$.

Processes in locations $\textsc {v}i, \textsc {qv}i$ send messages of type $m_i$, that is, messages containing the value $i \in \{0, 1\}$. The message types $m_{qi}$ are used to encode that the queen in the current phase sent a message with value $i \in \{0, 1\}$. When the queen process is Byzantine, it can send messages of type $m_{q0}$ or $m_{q1}$. We write $\mathsf {sent}(m)$ to denote the set of locations where processes send a message of type $m$, and $\#{\mathsf {sent}(m)}$ for the number of sent messages of type $m$.

The receive guards $\varphi _1, \dots , \varphi _8$ express conditions over the number of received messages of some message type, and capture expressions which appear in the pseudo code. We denote by $\mathsf {nr}(m_i)$ and $\mathsf {nr}(m_{qi})$ the number of messages containing the value $i \in \{0, 1\}$ that a process received from all processes in the first round of the phase (i.e., the value $\mathtt {C}[i]$ in the pseudo code, lines 5, 6) and by the queen in the second round of the phase, respectively. For example, the receive guard $\varphi _1$, occurring on rules that move processes to the location $\textsc {r1v0}$, checks if a process received at most 2t messages of type $m_1$ (the branch is taken in line 8), and at least $n-t$ messages of type $m_0$ (the condition in line 11 is false).

We explicitly encode the relationship between the number of received and sent messages using an environment assumption $\textsf {Env}$, which bounds the number of received messages: (i) from below by the number of messages sent by the correct processes, and (ii) from above by the number of messages sent by both the correct and faulty processes. The bound (i) captures the assumptions of the synchronous communication, which requires that all messages sent by correct processes in a round are received in the same round, and the bound (ii) captures the non-determinism introduced by the faulty processes. E.g., in the algorithm PhaseQueen, we have f Byzantine processes, which may send messages of arbitrary types. For the receive variable $\mathsf {nr}(m_i)$, we have the constraint $\#{\mathsf {sent}(m_i)} \le \mathsf {nr}(m_i) \le \#{\mathsf {sent}(m_i)} + f$ in the environment assumption $\textsf {Env}$.

The agreement property stated above is a safety property. To check if it holds, it suffices to check that after $t+1$ phases, either all processes are in locations $\textsc {v0}, \textsc {qv0}$, or in locations $\textsc {v1}, \textsc {qv1}$. The precise formalization of the properties we are interested in verifying can be found in [36].

Our Approach. In Sect. 6, we eliminate the receive variables in an rSTA using quantifier elimination for Presburger arithmetic [14, 30, 31]. We strengthen the receive guards by the environment assumption $\textsf {Env}$ that imposes bounds on the values of the receive variables, which are existentially quantified. As a result, a quantifier-free guard expression over the number of sent messages is obtained. For example, the result of applying quantifier elimination to the guard $\varphi _1$ over the receive variables from Fig. 2, strengthened by the upper and lower bounds in the environment assumption $\textsf {Env}$, is the guard $\widehat{\varphi }_1$ with no receive variables:

$$\begin{aligned} \widehat{\varphi }_1&\equiv \#{\mathsf {sent}(m_1)} \le 2t \wedge \#{\mathsf {sent}(m_0)} + f \ge n - t \wedge \widehat{\textsf {Env}} \end{aligned}$$

where $\widehat{\textsf {Env}}$ are the residual constraints from eliminating the receive variables from the environment assumption $\textsf {Env}$. The condition $\mathsf {nr}(m_1) \le 2t$ in the guard $\varphi _1$ is translated to $\#{\mathsf {sent}(m_1)} \le 2t$, and the condition $\mathsf {nr}(m_0) \ge n-t$ to $\#{\mathsf {sent}(m_0)} + f \ge n - t$. That is, when translating the guards, the number of the faulty processes f is used in guards that check if the number of sent messages passes a threshold, whereas f is not used in guards that check if the number of sent messages is below a threshold. (Byzantine processes send messages arbitrarily.)

The STA where all guards over the receive variables are replaced by the automatically generated guards over the number of sent messages constitutes a valid input to the bounded model checking technique for STA from [36], which we use to verify their safety properties. We show that this method is sound and complete by showing the existence of a bisimulation between the composition of n copies of rSTA and the composition of n copies of the produced STA. Thus, eliminating the receive message counters preserves temporal properties. We implemented this technique and used it to automatically generate STA for a set of benchmarks, and compared them to the existing manually encoded STA for the same benchmarks. We discuss our the experimental results in Sect. 7.

3 Synchronous Threshold Automata

We recall synchronous threshold automata from [36] and extend them with receive variables below. A synchronous threshold automaton () is the tuple $\textsf {STA}{} = (\mathcal {L}_{}, {\mathcal {I}}_{}, {\mathcal {R}}_{}, \varPi _{}, RC_{}, \textsf {Env}_{})$, whose locations $\mathcal {L}$, initial locations ${\mathcal {I}}$, rules ${\mathcal {R}}$, parameters $\varPi $, and resilience condition $RC$ are defined below. We define the environment assumption $\textsf {Env}$ in Sect. 3.2.

Parameters $\varPi $, Resilience Condition $RC$. We assume that the set $\varPi $ of parameters contains at least the parameter n, denoting the total number of processes. The resilience condition $RC$ is a linear arithmetic expression over the parameters from $\varPi $. We call the vector $\varvec{\pi }= \langle \pi _1, \dots , \pi _{|\varPi |}\rangle $ the parameter vector, and the vector $\mathbf {p}= \langle \mathsf {p}_1, \dots , \mathsf {p}_{|\varPi |} \rangle \in \mathbb {N}^{|\varPi |}$ a valution of $\varvec{\pi }$. The set $\mathbf {P}_{RC}= \{\mathbf {p}\in \mathbb {N}^{|\varPi |} \mid \mathbf {p} \text{ is } \text{ a } \text{ valuation } \text{ of } \varvec{\pi } \text{ and } \mathbf {p} \text{ satisfies } RC\}$ contains the admissible valuations of $\varvec{\pi }$. The mapping $N: \mathbf {P}_{RC}\rightarrow \mathbb {N}$ maps an admissible valuation $\mathbf {p}\in \mathbf {P}_{RC}$ to the number $N(\mathbf {p}) \in \mathbb {N}$ of participating processes, i.e., the number of processes whose behavior is modeled using the STA. We denote by $N(\varvec{\pi })$ the linear combination of parameters that defines the number of participating processes.

Locations $\mathcal {L}, {\mathcal {I}}$. The locations $\ell \in \mathcal {L}$ encode the current value of the local variables of a process, together with information about the program counter. We assume that each local variable and the program counter ranges over a finite set of values, that is, we assume that the set $\mathcal {L}$ of locations is a finite set. The initial locations in ${\mathcal {I}}\subseteq \mathcal {L}$ encode the initial values of the local variables.

Message Types $\mathcal {M}$. Let $\mathcal {M}$ denote the set of message types. To encode sending messages in the STA, we define a mapping $\mathsf {sent}: \mathcal {M}\rightarrow 2^{\mathcal {L}}$, that maps a message type $m\in \mathcal {M}$ to a set $\mathsf {sent}(m) \subseteq \mathcal {L}$ of locations, such that $\mathsf {sent}(m) = \{\ell \in \mathcal {L}\mid \text{ a } \text{ process } \text{ in } \ell \text{ sends } \text{ message } \text{ of } \text{ type } m\}$.

Let $L\subseteq \mathcal {L}$ denote a set of locations, and let $\#L$ denote the number of processes in locations from the set $L$. To define guards over the sent messages and express temporal properties, we define c-propositions:

$$\begin{aligned} \#L\ge \varvec{a}\cdot \varvec{\pi }+ b \text{ for } L\subseteq \mathcal {L}\text{, } \varvec{a}\in \mathbb {Z}^{|\varPi |} \text{, } \text{ and } b \in \mathbb {Z}\end{aligned}$$

We denote by $\mathrm {CP}$ the set of $c$-propositions. If the set $L$ of locations in the $c$-proposition is equal to the set $\mathsf {sent}(m)$, for some $m\in \mathcal {M}$, the $c$-proposition is used to check whether the number of messages of type $m\in \mathcal {M}$ is greater than or equal to a linear combination of the parameters, also called a threshold. Formally, the $c$-propositions are evaluated in tuples $(\varvec{\kappa }, \mathbf {p})$, where $\varvec{\kappa }\in \mathbb {N}^{|\mathcal {L}|}$ is an $|\mathcal {L}|$-dimensional vector of counters, and $\mathbf {p}\in \mathbf {P}_{RC}$ is an admissible valuation:

$$\begin{aligned} (\varvec{\kappa }, \mathbf {p}) \models \#L\ge \varvec{a}\cdot \varvec{\pi }+ b\quad \text{ iff } \quad \sum _{\ell \in L} \varvec{\kappa }[\ell ] \ge \varvec{a}\cdot {\mathbf {p}} + b \end{aligned}$$

(1)

Rules ${\mathcal {R}}$. A rule $r\in {\mathcal {R}}$ is a tuple $({ from }, { to }, \varphi )$, where: ${ from }, { to }\in \mathcal {L}$ are locations, and $\varphi $ is a guard, i.e., a Boolean combination of $c$-propositions. The guards $r.\varphi $, for $r\in {\mathcal {R}}$, analogously to (1), are evaluated in tuples $(\varvec{\kappa }, \mathbf {p})$, and the semantics of the Boolean connectives is standard.

3.1 Receive Synchronous Threshold Automata

A receive is the tuple $\textsf {rSTA}{} = (\mathcal {L}, {\mathcal {I}}, {\mathcal {R}}^{\varDelta }, \varDelta , \varPi , RC, \textsf {Env}^{\varDelta }){}$, whose locations $\mathcal {L}$, initial locations ${\mathcal {I}}$, parameters $\varPi $, and resilience condition $RC$ are defined as for STA. We define the receive variables $\varDelta $ and rules ${\mathcal {R}}^{\varDelta }$ below, and the environment assumption $\textsf {Env}^{\varDelta }$ in Sect. 3.2.

Receive Variables $\varDelta $. The set $\varDelta $ contains receive variables $\mathsf {nr}(m)$ that store the number of messages of type $m\in \mathcal {M}$ that were received by a process. Thus, $|\varDelta | = |\mathcal {M}|$, as in $\varDelta $ there is exactly one receive variable $\mathsf {nr}(m)$ per message type $m\in \mathcal {M}$. The values of the receive variables depend on the number of messages sent in a given round (discussed in more detail in Sect. 3.2).

Let $M\subseteq \mathcal {M}$ denote a set of message types, and let $\#{M}$ denote the total number of messages of types $m\in M$, received by some process. Observe that the notation $\#{M}$ is a shorthand for $\sum _{m\in M} \mathsf {nr}(m)$. We will use these two notations interchangeably. Further, when $M$ is a singleton set, that is, when $M= \{m\}$, we will simply use the notation $\mathsf {nr}(m)$ to denote $\#\{m\}$. For the purpose of expressing guards over the receive variables $\mathsf {nr}(m)$, for $m\in \mathcal {M}$, we define r-propositions:

$$\begin{aligned} \#{M} \ge \varvec{a}\cdot \varvec{\pi }+ b \text{, } \text{ such } \text{ that } M\subseteq \mathcal {M}, \varvec{a}\in {\mathbb {Z}}^{|\varPi |}, b \in {\mathbb {Z}}\end{aligned}$$

We denote by $\mathrm {RP}$ the set of $r$-propositions. The intended meaning of the $r$-propositions is to check whether the total number of messages of types $m\in M$ received by some process $i$ passes some threshold. Formally, they are evaluated in tuples $(\mathbf {d}, \mathbf {p})$, where $\mathbf {d}\in \mathbb {N}^{|\mathcal {M}|}$ is a vector of values assigned to each receive variable $\mathsf {nr}(m)$, for $m\in \mathcal {M}$, and $\mathbf {p}\in \mathbf {P}_{RC}$. We define:

$$\begin{aligned} (\mathbf {d}, \mathbf {p}) \models \#{M} \ge \varvec{a}\cdot \varvec{\pi }+ b\quad \text{ iff } \quad \sum _{m\in M} \mathbf {d}[m] \ge \varvec{a}\cdot \mathbf {p}+ b \end{aligned}$$

(2)

Rules ${\mathcal {R}}^{\varDelta }$. Similarly to the way we defined rules of STA above, the rules $r^{\varDelta }\in {\mathcal {R}}^{\varDelta }$ in rSTA are tuples $r^{\varDelta }= ({ from }, { to }, \varphi )$, where $r^{\varDelta }.{ from }, r^{\varDelta }.{ to }\in \mathcal {L}$ are locations, and $r^{\varDelta }.\varphi $ is a receive guard, which is a Boolean combination of $c$-propositions and $r$-propositions. The receive guards $r^{\varDelta }.\varphi $, for $r^{\varDelta }\in {\mathcal {R}}^{\varDelta }$, are evaluated in tuples $(\mathbf {d}, \varvec{\kappa }, \mathbf {p})$. Given a tuple $(\mathbf {d},\varvec{\kappa },\mathbf {p})$, where $\mathbf {d}\in \mathbb {N}^{|\mathcal {M}|}$ is a vector of valuations of the receive variables $\mathsf {nr}(m)$, for $m\in \mathcal {M}$, $\varvec{\kappa }\in \mathbb {N}^{|\mathcal {L}|}$ is an $|\mathcal {L}|$-dimensional vector of counters, and $\mathbf {p}\in \mathbf {P}_{RC}$ is an admissible valuation, we evaluate $c$-propositions and $r$-propositions (the semantics of the Boolean connectives is standard):

$$\begin{aligned} (\mathbf {d},\varvec{\kappa },\mathbf {p}) \models \#L\ge \varvec{a}\cdot \varvec{\pi }+ b~ \quad&\text{ iff } \quad (\varvec{\kappa }, \mathbf {p}) \models \#L\ge \varvec{a}\cdot \varvec{\pi }+ b~&\text{(cf. } \text{(1)) } \\ (\mathbf {d},\varvec{\kappa },\mathbf {p}) \models \#{M} \ge \varvec{a}\cdot \varvec{\pi }+ b\quad&\text{ iff } \quad (\mathbf {d}, \mathbf {p}) \models \#{M} \ge \varvec{a}\cdot \varvec{\pi }+ b&\text{(cf. } \text{(2)) } \end{aligned}$$

3.2 Environment Assumption and Modeling Faults

Depending on the fault model, when constructing a (receive) STA that models the behavior of a process running a given algorithm, we typically need to introduce additional locations or rules that are used to capture the behavior of the faulty processes. Additionally, to faithfully model the faulty environment, we will introduce constraints on the number of processes in given locations in both STA and rSTA, expressed using $c$-propositions, as well as constraints on the values of the receive variables of the rSTA, expressed using e-propositions:

$$\begin{aligned} \#{M} \ge \#L+ \varvec{a}\cdot \varvec{\pi }+ b \text{, } \text{ such } \text{ that } M\subseteq \mathcal {M}, L\subseteq \mathcal {L}, \varvec{a}\in {\mathbb {Z}}^{|\varPi |}, b \in {\mathbb {Z}}\end{aligned}$$

We denote by $\mathrm {EP}$ the set of $e$-propositions. The $e$-propositions are evaluated in tuples $(\mathbf {d}, \varvec{\kappa }, \mathbf {p})$ where $\mathbf {d}\in \mathbb {N}^{|\mathcal {M}|}$ is a vector of valuations of the receive variables, $\varvec{\kappa }\in \mathbb {N}^{|\mathcal {L}|}$ is an $|\mathcal {L}|$-dimensional vector of counters, and $\mathbf {p}\in \mathbf {P}_{RC}$. We say that:

$$\begin{aligned} (\mathbf {d},\varvec{\kappa },\mathbf {p}) \models \#{M} \ge \#L+ \varvec{a}\cdot \varvec{\pi }+ b\quad \text{ iff } \quad \sum _{m\in M} \mathbf {d}[m] \ge \sum _{\ell \in L} \varvec{\kappa }[\ell ] + \varvec{a}\cdot \mathbf {p}+ b \end{aligned}$$

The $e$-propositions will be used to express that the number of received messages is in the range from the number of messages sent by correct processes to the total number of sent messages (sent by both correct and faulty processes).

For STA, the environment assumption $\textsf {Env}$ is a conjunction of $c$-propositions and their negations. For rSTA, the environment assumption $\textsf {Env}^{\varDelta }$ is a conjunction of $c$-propositions, $e$-propositions and their negations. The $c$-propositions restrict the number of processes in certain locations, while the $e$-propositions restrict the values of the receive variables by relating them to the number of sent messages of the same type. We define the environment assumptions $\textsf {Env}$ and $\textsf {Env}^{\varDelta }$ of the STA and rSTA, respectively, as $\textsf {Env}\equiv \textsf {Env}_{\mathrm {CP}}$ and $\textsf {Env}^{\varDelta }\equiv \textsf {Env}_{\mathrm {CP}}\wedge \textsf {Env}_{\mathrm {EP}}$, where $\textsf {Env}_{\mathrm {CP}}$ and $\textsf {Env}_{\mathrm {EP}}$ are conjunctions of $c$-propositions and $e$-propositions and their negations, respectively, such that:

$$\begin{aligned} \textsf {Env}_{\mathrm {CP}}\equiv \textsf {C1} \wedge \textsf {C2} \wedge \textsf {Env}_{\mathrm {CP}, *}\quad \text{ and } \quad \textsf {Env}_{\mathrm {EP}}\equiv \textsf {E1} \wedge \textsf {Env}_{\mathrm {EP}, *}\end{aligned}$$

where, irrespective of the fault model, we have the following constraints:

(C1):: $\bigwedge _{\ell \in \mathcal {L}} \#\{\ell \} \ge 0$, i.e., the number of processes in a location $\ell $ is non-negative,
(C2):: $\#{\mathcal {L}} = N(\varvec{\pi })$, i.e., the number of processes in all locations $\mathcal {L}$ is equal to the number of participating processes,
(E1):: $\bigwedge _{m\in \mathcal {M}} \#{\mathsf {sent}(m)} \le \mathsf {nr}(m)$, i.e., the number $\mathsf {nr}(m)$ of received messages of each message type $m\in \mathcal {M}$ is bounded from below by the number $\#{\mathsf {sent}(m)}$ of messages of type $m$, sent by correct processes.

The formulas $\textsf {Env}_{\mathrm {CP}, *}$ and $\textsf {Env}_{\mathrm {EP}, *}$ for $*\in \{\textsf {cr}, \textsf {so}, \textsf {byz}\}$, depend on the fault model, i.e., on whether we model crash, send omission, or Byzantine faults.

Crash Faults. Crash-faulty processes stop executing the algorithm prematurely and cannot restart. To model the behavior of the crash-faulty processes, the set $\mathcal {L}$ of locations of the (receive) STA is the set: $\mathcal {L}= \mathcal {L}_{\mathsf {corr}}\cup \mathcal {L}_{\mathsf {cr}}\cup \{\ell _{\mathsf {fld}}\}$, where $\mathcal {L}_{\mathsf {corr}}$ is a set of correct locations, $\mathcal {L}_{\mathsf {cr}}= \{\ell _{\mathsf {cr}}\mid \ell _{\mathsf {cr}} \text{ is } \text{ a } \text{ fresh } \text{ copy } \text{ of } \ell \in \mathcal {L}_{\mathsf {corr}}\}$ is a set of crash locations, and $\ell _{\mathsf {fld}}$ is a failed location. The crash locations $\ell _{\mathsf {cr}}\in \mathcal {L}_{\mathsf {cr}}$ model the same values of the local variables and program counter as their correct counterpart $\ell \in \mathcal {L}_{\mathsf {corr}}$. The difference is that processes in the crash locations $\ell _{\mathsf {cr}}\in \mathcal {L}_{\mathsf {cr}}$ are flagged by the environment to crash in the current round. After crashing, they move to the failed location $\ell _{\mathsf {fld}}$, where they remain forever. This models that the crashed processes cannot restart.

A crash-faulty process may send a message to a subset of the other processes in the round in which it crashes. To model this, we introduce the mapping $\mathsf {sent}_{\mathsf {cr}}: \mathcal {M}\rightarrow 2^{\mathcal {L}_{\mathsf {cr}}}$, which defines, for each $m\in \mathcal {M}$, the set of crash locations $\mathsf {sent}_{\mathsf {cr}}(m) \subseteq \mathcal {L}_{\mathsf {cr}}$ where processes send a message of type $m$. Then, $\#{(\mathsf {sent}(m) \cup \mathsf {sent}_{\mathsf {cr}}(m))}$ denotes the number of messages sent by correct and crash-faulty processes. In addition to the new locations, we add the following new rules:

(cr1):: for every rule $r\in {\mathcal {R}}$, if $r.{ from }\in \mathcal {L}_{\mathsf {corr}}$ and $r.{ to }\in \mathcal {L}_{\mathsf {corr}}$, then we add the rule $(r.{ from }, \ell _{\mathsf {cr}}, r.\varphi )$, where $\ell _{\mathsf {cr}}\in \mathcal {L}_{\mathsf {cr}}$ is the crash location corresponding to $r.{ to }$,
(cr2):: for every crash location $\ell _{\mathsf {cr}}\in \mathcal {L}_{\mathsf {cr}}$, we add the rule $(\ell _{\mathsf {cr}}, \ell _{\mathsf {fld}}, \top )$,
(cr3):: for the failed location $\ell _{\mathsf {fld}}$, we add the rule $(\ell _{\mathsf {fld}}, \ell _{\mathsf {fld}}, \top )$.

The rules (cr1) move processes from the correct to the crash locations, in rounds where the environment flags them as crashed. The rules (cr2) move processes from the crashed locations to the failed location, where they can only apply the self-loop rule (cr3), which keeps them in the failed location.

We model the behavior of crash-faulty processes explicitly, that is, we have $N(\varvec{\pi }) = n$. The constraints $\textsf {Env}_{\mathrm {CP}, \textsf {cr}}$ and $\textsf {Env}_{\mathrm {EP}, \textsf {cr}}$ for the crash fault model are:

$$\begin{aligned}&\;\textsf {Env}_{\mathrm {CP}, \textsf {cr}}= \#{(\mathcal {L}_{\mathsf {cr}}\cup \{\ell _{\mathsf {fld}}\})} \le f \\ \textsf {Env}_{\mathrm {EP}, \textsf {cr}}&\equiv \bigwedge _{m\in \mathcal {M}} \mathsf {nr}(m) \le \#{(\mathsf {sent}(m) \cup \mathsf {sent}_{\mathsf {cr}}(m))} \end{aligned}$$

The formula $\textsf {Env}_{\mathrm {CP}, \textsf {cr}}$ ensures that there are no more than f faults. The formula $\textsf {Env}_{\mathrm {EP}, \textsf {cr}}$ restricts the values of the receive variables by ensuring that the number of received messages of type $m\in \mathcal {M}$ for each process is a value, bounded from above by the number $\#{(\mathsf {sent}(m) \cup \mathsf {sent}_{\mathsf {cr}}(m))}$ of messages of type $m$, sent by the correct processes and the processes flagged as crashed in the current round.

Figure 3 depicts the pseudo code and the rSTA of the crash-tolerant k-set agreement algorithm FloodMin, for $k=1$ [28]. We identify the sets $\mathcal {L}_{\mathsf {corr}}= \{\textsc {v0}, \textsc {v1}\}$ of correct locations, $\mathcal {L}_{\mathsf {cr}}= \{\textsc {cr0}, \textsc {cr1}\}$ of crash locations, $\mathcal {M}= \{m_0, m_1\}$ of message types. The location $\textsc {v}i$ encodes that a correct process has its variable $\mathtt {best}$ set to $i \in \{0, 1\}$, the location $\textsc {cr}i$ encodes that the value of $\mathtt {best}$ of a crashed process is $i \in \{0, 1\}$, and the message type $m_i$ encodes a message containing the value $i \in \{0, 1\}$. The failed location is $\ell _{\mathsf {fld}}$. We define $\mathsf {sent}(m_i) = \{\textsc {v}i\}$ and $\mathsf {sent}_{\mathsf {cr}}(m_i) = \{\textsc {cr}i\}$, for $i \in \{0, 1\}$. The two receive guards $\varphi _1 \equiv \mathsf {nr}(m_0) \ge 1$ and $\varphi _2 \equiv \mathsf {nr}(m_0) < 1$ check if a process received at least one message of type $m_0$ (i.e., if the minimal value 0 has been received in line 5 of the pseudo code) and no message of type $m_0$, respectively. The constraint $\textsf {Env}_{\mathrm {CP}, \textsf {cr}}$ ensures that there are not more than f processes in the locations $\textsc {cr0}, \textsc {cr1}$, and $\ell _{\mathsf {fld}}$ together. The constraint $\textsf {Env}_{\mathrm {EP}, \textsf {cr}}$ bounds the values of the receive variables $\mathsf {nr}(m_i)$ from above by the number of processes in locations $\textsc {v}i, \textsc {cr}i$, for $i \in \{0,1\}$.

Send Omission Faults. A send-omission-faulty process may omit to send a message, but acts as a correct process on the receiving side. We model algorithms tolerating send omission faults similarly to crash faults: the set $\mathcal {L}$ of locations is $\mathcal {L}= \mathcal {L}_{\mathsf {corr}}\cup \mathcal {L}_{\mathsf {so}}$, where $\mathcal {L}_{\mathsf {corr}}$ is a set of correct locations and $\mathcal {L}_{\mathsf {so}}= \{\ell _{\mathsf {so}}\mid \ell _{\mathsf {so}} \text{ is } \text{ a } \text{ fresh } \text{ copy } \text{ of } \ell \in \mathcal {L}_{\mathsf {corr}}\}$ is a set of send-omission locations. For every rule $r\in {\mathcal {R}}$ connecting two locations $\ell , \ell ' \in \mathcal {L}_{\mathsf {corr}}$, there exists a rule $(\ell _{\mathsf {so}}, \ell _{\mathsf {so}}', r.\varphi ) \in {\mathcal {R}}$, connecting their two corresponding send-omission locations $\ell _{\mathsf {so}}, \ell _{\mathsf {so}}' \in \mathcal {L}_{\mathsf {so}}$. We introduce the mapping $\mathsf {sent}_{\mathsf {so}}: \mathcal {M}\rightarrow 2^{\mathcal {L}_{\mathsf {so}}}$, which defines the set of send-omission locations where processes send a message of type $m\in \mathcal {M}$.

As there are no rules that connect the locations from $\mathcal {L}_{\mathsf {corr}}$ to the locations from $\mathcal {L}_{\mathsf {so}}$, the automaton consists of two parts: one used by the correct processes, and one used by the send-omission-faulty processes. The behavior of the send-omission-faulty processes is encoded explicitly, using locations and rules in the automaton, hence, we define $N(\varvec{\pi }) = n$. The constraint $\textsf {Env}_{\mathrm {CP}, \textsf {so}}$ ensures that the number of processes populating the correct locations is $n-f$, and the number of processes populating the send-omission locations is f. The constraint $\textsf {Env}_{\mathrm {EP}, \textsf {so}}$ ensures that the number of received messages of type $m\in \mathcal {M}$ for each process is bounded from above by the number $\#{(\mathsf {sent}(m) \cup \mathsf {sent}_{\mathsf {so}}(m))}$ of messages of type $m$, sent by the correct and the send-omission-faulty processes. Formally:

$$\begin{aligned}&\qquad \textsf {Env}_{\mathrm {CP}, \textsf {so}}= \#{\mathcal {L}_{\mathsf {corr}}} = n - f \wedge \#{\mathcal {L}_{\mathsf {so}}} = f \\&\textsf {Env}_{\mathrm {EP}, \textsf {so}}\equiv \bigwedge _{m\in \mathcal {M}} \mathsf {nr}(m) \le \#{(\mathsf {sent}(m) \cup \mathsf {sent}_{\mathsf {so}}(m))} \end{aligned}$$

Figure 4 depicts the rSTA for the k-set agreement algorithm FMinOmit, for $k=1$, which is a variant of the algorithm FloodMin (Fig. 3) that tolerates send omission faults. We identify the sets $\mathcal {L}_{\mathsf {corr}}= \{\textsc {v0}, \textsc {v1}\}$ of correct locations, $\mathcal {L}_{\mathsf {so}}= \{\textsc {so0}, \textsc {so1}\}$ of send-omission locations, and $\mathcal {M}= \{m_0, m_1\}$ of message types. We define $\mathsf {sent}(m_i) = \{\textsc {v}i\}$ and $\mathsf {sent}_{\mathsf {so}}(m_i) = \{\textsc {so}i\}$, for $i \in \{0, 1\}$. The constraint $\textsf {Env}_{\mathrm {CP}, \textsf {so}}$ ensures that there are exactly $n-f$ processes in the correct locations $\textsc {v0}, \textsc {v1}$, and exactly f processes in the send-omission locations $\textsc {so0}, \textsc {so1}$. The receive guards $\varphi _1$ and $\varphi _2$ are the syntactically same as in the rSTA for the crash-tolerant version of the algorithm FloodMin, for $k=1$. However, the environment constraint $\textsf {Env}_{\mathrm {EP}, \textsf {so}}$ differs from $\textsf {Env}_{\mathrm {EP}, \textsf {cr}}$: it restricts the number $\mathsf {nr}(m_i)$ of received messages of type $m_i$ to a value which is less than or equal to the number of processes in locations $\textsc {v}i, \textsc {so}i$, for $i \in \{0, 1\}$.

Byzantine Faults. To model the behavior of the Byzantine-faulty processes, which can act arbitrary, no new locations and rules are introduced in the (receive) STA. Instead, the (receive) STA is used to model the behavior of the correct processes, and the effect that the Byzantine-faulty processes have on the correct ones is captured in the guards (and environment assumption). The number of messages sent by Byzantine-faulty processes is overapproximated by the parameter f, which denotes the number of faults. That is, for a message type $m\in \mathcal {M}$, the number $\#{\mathsf {sent}(m)} + f$ is the upper bound on the number of messages sent by correct and Byzantine-faulty processes.

The (receive) STA for Byzantine faults is used to model the behavior of the correct processes, hence $N(\varvec{\pi }) = n - f$. As we do not introduce new locations or rules, we have $\textsf {Env}_{\mathrm {CP}, \textsf {byz}}\equiv \top $. The constraint $\textsf {Env}_{\mathrm {EP}, \textsf {byz}}$ encodes the effect that the Byzantine-faulty processes have on the correct processes, by bounding the receive variables $\mathsf {nr}(m)$ by $\mathsf {sent}(m) + f$ from above, for $m\in \mathcal {M}$:

$$\begin{aligned} \textsf {Env}_{\mathrm {EP}, \textsf {byz}}\equiv \bigwedge _{m\in \mathcal {M}} \mathsf {nr}(m) \le \mathsf {sent}(m) + f \end{aligned}$$

Figure 5 shows the pseudo code of the Byzantine reliable broadcast algorithm RB [21]. The locations $\mathcal {L}= \{\textsc {v0}, \textsc {v1}, \textsc {se}, \textsc {ac}\}$ model the behavior of the correct processes. The location $\textsc {v}i$ encodes that a process has value $i \in \{0, 1\}$, the location $\textsc {se}$ that a process has sent an ECHO message, and the location $\textsc {ac}$ that a process sets its value to 1 in line 12. There is a single message type, $m_{\textsf {E}}$, which encodes a message containing the value $\textsf {ECHO}$. There are four receive guards, $\varphi _1, \dots , \varphi _4$. The guard $\varphi _2$, for example, checks that at least $t+1$ ECHO messages are received, capturing line 8 of the pseudo code. The set of processes that send an ECHO message is $\mathsf {sent}(m_{\textsf {E}}) = \{\textsc {v1}, \textsc {se}, \textsc {ac}\}$. The constraint $\textsf {Env}_{\mathrm {EP}, \textsf {byz}}$ ensures that there are not more than $\#\{\textsc {v1}, \textsc {se}, \textsc {ac}\} + f$ received messages of type $m_{\textsf {E}}$.

Remark on Algorithms with a Coordinator. When modeling Byzantine-tolerant algorithms where a process acts as a coordinator (such as, e.g., the algorithm PhaseQueen in Fig. 1), we need to take into account that at some point, the coordinator will be Byzantine. Thus, we add locations $\mathcal {L}_{\textsf {byz}}\subseteq \mathcal {L}$ for a single Byzantine process, disjoint from the locations that are used by the correct processes. The new locations do not encode any values of the local variables; they ensure that the Byzantine process (which may become a coordinator) moves synchronously with the other processes. In the rSTA for the algorithm PhaseQueen (Fig. 2), we defined $\mathcal {L}_{\textsf {byz}}= F = \{\textsc {f}, \dots , \textsc {r2qf}\}$. As we model the behavior of a single Byzantine process explicitly, we have $N(\varvec{\pi }) = n-f+1$.

In this case, we define the constraints $\textsf {Env}_{\mathrm {CP}, \textsf {co}}$, which restrict the number of processes in given locations. We also identify locations $\mathcal {L}_{\textsf {co}}\subseteq \mathcal {L}$, which only a (correct or Byzantine) coordinator is allowed to populate. The environment constraint $\textsf {Env}_{\mathrm {CP}, \textsf {co}}$ for Byzantine-tolerant algorithms with a coordinator is:

$$\begin{aligned} \textsf {Env}_{\mathrm {CP}, \textsf {co}}\equiv \#{\mathcal {L}_{\textsf {co}}} = 1 \wedge \#{\mathcal {L}_{\textsf {byz}}} = 1 \end{aligned}$$

where $\#{\mathcal {L}_{\textsf {co}}} = 1$ (resp. $\#{\mathcal {L}_{\textsf {byz}}} = 1$) ensures that there is exactly one process in the coordinator locations $\mathcal {L}_{\textsf {co}}$ (resp. in the Byzantine locations $\mathcal {L}_{\textsf {byz}}$).

Additionally, we have message types $m_{\textsf {co}}\in \mathcal {M}$ that model the coordinator messages, and denote by $\ell _F$ the location where the Byzantine process performs the coordinator broadcast. The constraint $\textsf {Env}_{\mathrm {EP}, \textsf {co}}$ states that the number of received coordinator messages of type $m_{\textsf {co}}$ does not exceed the total number of coordinator messages of type $m_{\textsf {co}}$ sent by the correct and Byzantine coordinators:

$$\begin{aligned} \textsf {Env}_{\mathrm {EP}, \textsf {co}}\equiv \textsf {Env}_{\mathrm {EP}, \textsf {byz}}\wedge \bigwedge _{m_{\textsf {co}}\in \mathcal {M}} \mathsf {nr}(m_{\textsf {co}}) \le \#{(\mathsf {sent}(m_{\textsf {co}}) \cup \{\ell _F\}) } \end{aligned}$$

Thus, for the algorithm PhaseQueen, whose rSTA we depicted in Fig. 2:

$$\begin{aligned}&\quad \; \textsf {Env}_{\mathrm {CP}, \textsf {co}}\equiv \#\{\textsc {qv0}, \dots , \textsc {r2qv1}, \textsc {qf}, \dots , \textsc {r2qf}\} = 1 \wedge \#\{ \textsc {f}, \dots , \textsc {r2qf}\} = 1 \\&\textsf {Env}_{\mathrm {EP}, \textsf {co}}\equiv \bigwedge _{i \in \{0, 1\}} \left( \mathsf {nr}(m_i) \le \#{\mathsf {sent}(m_i)} + f \wedge \mathsf {nr}(m_{qi}) \le \#{(\mathsf {sent}(m_{qi}) \cup \{\textsc {r1qf}\}}) \right) \end{aligned}$$

4 Counter Systems

For an $\textsf {STA}= (\mathcal {L}_{}, {\mathcal {I}}_{}, {\mathcal {R}}_{}, \varPi _{}, RC_{}, \textsf {Env}_{})$ and an admissible valuation $\mathbf {p}\in \mathbf {P}_{RC}$, we recall the definition of a counter system from [36]. A counter system w.r.t. an admissible valuation $\mathbf {p}\in \mathbf {P}_{RC}$ and an $\textsf {STA}= (\mathcal {L}_{}, {\mathcal {I}}_{}, {\mathcal {R}}_{}, \varPi _{}, RC_{}, \textsf {Env}_{})$ is the tuple $\mathsf {CS}(\textsf {STA}, \mathbf {p}) = (\varSigma (\mathbf {p}), I(\mathbf {p}), R(\mathbf {p}))$, representing a system of $N(\mathbf {p})$ processes whose behavior is modeled using the $\textsf {STA}$, where $\varSigma (\mathbf {p})$ is the set of configurations, $I(\mathbf {p})$ is the set of initial configurations, and $R(\mathbf {p})$ is the transition relation.

A configuration $\sigma \in \varSigma (\mathbf {p})$ is a tuple $(\varvec{\kappa }, \mathbf {p})$, where $\mathbf {p}\in \mathbf {P}_{RC}$ is an admissible valuation, and $\varvec{\kappa }\in \mathbb {N}^{|\mathcal {L}|}$ is an $|\mathcal {L}|$-dimensional vector of counters, such that $\sigma \models \textsf {Env}$. For every $\sigma \in \varSigma (\mathbf {p})$, we have $\sum _{\ell \in \mathcal {L}} \sigma .\varvec{\kappa }[\ell ] = N(\mathbf {p})$. This follows from $\sigma \models \textsf {Env}$, in particular from $\sigma \models \#{\mathcal {L}} = N(\varvec{\pi })$, the definition of $N(\mathbf {p})$, and the semantics of the $c$-propositions. A configuration $\sigma \in \varSigma (\mathbf {p})$ is initial, i.e., $\sigma \in I(\mathbf {p})\subseteq \varSigma (\mathbf {p})$, iff $\sigma .\varvec{\kappa }[\ell ] = 0$, for every $\ell \in \mathcal {L}\setminus {\mathcal {I}}$. That is, the value $\sigma .\varvec{\kappa }[\ell ]$ of the counter for each non-initial location $\ell \in \mathcal {L}\setminus {\mathcal {I}}$ is set to 0 in $\sigma \in {\mathcal {I}}$.

To define the transition relation $R(\mathbf {p})$, we first define the notion of a transition. A transition is a function $tr: {\mathcal {R}}\rightarrow \mathbb {N}$ that maps each rule $r\in \mathcal {R}$ to a factor $tr(r) \in \mathbb {N}$. Given a valuation $\mathbf {p}$ of $\varvec{\pi }$, the set $ Tr (\mathbf {p})= \{tr\mid \sum _{r\in {\mathcal {R}}} tr(r) = N(\mathbf {p})\}$ contains transitions whose factors sum up to $N(\mathbf {p})$. For a transition $tr$ and a rule $r\in {\mathcal {R}}$, the factor $tr(r)$ denotes the number of processes that act upon this rule. By restricting the set $ Tr (\mathbf {p})$ to contain transitions whose factors sum up to $N(\mathbf {p})$, we ensure that in a transition, every process takes a step. This captures the semantics of synchronous computation. A transition $tr\in Tr (\mathbf {p})$ is enabled in a tuple $(\varvec{\kappa }, \mathbf {p})$, where $\varvec{\kappa }$ is an $|\mathcal {L}|$- dimensional vector of counters and $\mathbf {p}\in \mathbf {P}_{RC}$ an admissible valuation, iff for every $r\in {\mathcal {R}}$, such that $tr(r) > 0$, it holds that $(\varvec{\kappa }, \mathbf {p}) \models r.\varphi $, and for every $\ell \in \mathcal {L}$, we have $\varvec{\kappa }[\ell ] = \sum _{r\in {\mathcal {R}}\wedge r.{ from }= \ell } tr(r)$. The former condition ensures that processes only use rules whose guards are satisfied, and the latter that every process moves in an enabled transition.

Given a transition $tr\in Tr (\mathbf {p})$, we define the origin $o(tr) = (\varvec{\kappa }, \mathbf {p})$ of $tr$, where for every location $\ell \in \mathcal {L}$, we have $\varvec{\kappa }[\ell ] = \sum _{r\in {\mathcal {R}}\wedge r.{ from }= \ell } tr(r)$, and the goal $g(tr) = (\varvec{\kappa }', \mathbf {p})$ of $tr$, where for every location $\ell \in \mathcal {L}$, we have $\varvec{\kappa }'[\ell ] = \sum _{r\in {\mathcal {R}}\wedge r.{ to }= \ell } tr(r)$. The origin $o(tr)$ is the unique tuple $(\varvec{\kappa }, \mathbf {p})$ where the transition $tr$ is enabled, while its goal $g(tr)$ is the unique tuple $(\varvec{\kappa }', \mathbf {p})$ that is obtained by applying the transition $tr$ to its origin $o(tr)$. The transition relation $R(\mathbf {p})$ is the relation $R(\mathbf {p})\subseteq \varSigma (\mathbf {p})\times Tr (\mathbf {p})\times \varSigma (\mathbf {p})$, such that $\langle \sigma , tr, \sigma ' \rangle \in R(\mathbf {p})$ iff $\sigma = o(tr)$ is the origin and $\sigma ' = g(tr)$ is the goal of the transition $tr$.

5 Synchronous Transition Systems

Let $\textsf {rSTA}= (\mathcal {L}, {\mathcal {I}}, {\mathcal {R}}^{\varDelta }, \varDelta , \varPi , RC, \textsf {Env}^{\varDelta }){}$ be a receive STA, and $\mathbf {p}\in \mathbf {P}_{RC}$ an admissible valuation of the parameter vector $\varvec{\pi }$. A synchronous transition system (or system), w.r.t. an admissible valuation $\mathbf {p}\in \mathbf {P}_{RC}$ and an $\textsf {rSTA}$ is the triple $\mathsf {STS}(\textsf {rSTA}, \mathbf {p})= \langle S(\mathbf {p}), S_0(\mathbf {p}), T(\mathbf {p})\rangle $, representing a system of $N(\mathbf {p})$ processes whose behavior is modeled using the $\textsf {rSTA}$, where $S(\mathbf {p})$ is the set of states, $S_0(\mathbf {p})$ is the set of initial states, and $T(\mathbf {p})$ is the transition relation.

Recall that the environment assumption $\textsf {Env}^{\varDelta }$ of the $\textsf {rSTA}$ is the conjunction $\textsf {Env}^{\varDelta }\equiv \textsf {Env}_{\mathrm {CP}}\wedge \textsf {Env}_{\mathrm {EP}}$. A state $s\in S(\mathbf {p})$ is a tuple $s= \langle \varvec{\ell }, \mathbf {nr}_{1}, \dots , \mathbf {nr}_{N(\mathbf {p})}, \mathbf {p}\rangle $, where $\varvec{\ell }\in \mathcal {L}^{N(\mathbf {p})}$ is an $N(\mathbf {p})$-dimensional vector of locations, and $\mathbf {nr}_{i} \in \mathbb {N}^{|\mathcal {M}|}$, for $1 \le i\le N(\mathbf {p})$, is a vector of valuations of the receive variables $\mathsf {nr}(m)$, with $m\in \mathcal {M}$, for each process $i$, such that $s\models \textsf {Env}_{\mathrm {CP}}$. In a state $s\in S(\mathbf {p})$, the vector $\varvec{\ell }$ of locations is used to store the current location $s.\varvec{\ell }[i] \in \mathcal {L}$ for each process $i$, while the vector $\mathbf {nr}_{i} \in \mathbb {N}^{|\mathcal {M}|}$ stores the values of the receive variables for each process $i$, with $1 \le i\le N(\mathbf {p})$. Further, each state $s\in S(\mathbf {p})$ satisfies $\textsf {Env}_{\mathrm {CP}}$.

To formally define that a state $s\in S(\mathbf {p})$ satisfies the environment constraint $\textsf {Env}_{\mathrm {CP}}$, we define the semantics of $c$-propositions w.r.t. states $s\in S(\mathbf {p})$. Let $\mathsf {counters}_{\mathbf {p}}: S(\mathbf {p})\times \mathcal {L}\rightarrow \mathbb {N}$ denote a mapping that maps a state $s\in S(\mathbf {p})$ and a location $\ell \in \mathcal {L}$ to the number of processes that are in location $\ell $ in the state $s$, that is, $\mathsf {counters}_{\mathbf {p}}(s, \ell ) = |\{i\mid 1 \le i\le N(\mathbf {p}) \wedge s.\varvec{\ell }[i] = \ell \}|$. Further, let $\varvec{\kappa }(s) \in \mathbb {N}^{|\mathcal {L}|}$ denote the $|\mathcal {L}|$-dimensional vector of counters w.r.t. the state $s\in S(\mathbf {p})$, where for every location $\ell \in \mathcal {L}$, we have that $\varvec{\kappa }(s)[\ell ]$ stores the number of processes that are in location $\ell $ in the state $s$, that is, $\varvec{\kappa }(s)[\ell ] = \mathsf {counters}_{\mathbf {p}}(s,\ell )$. We say that $s\models \#L\ge \varvec{a}\cdot \varvec{\pi }+ b$ iff $(\varvec{\kappa }(s), s.\mathbf {p}) \models \#L\ge \varvec{a}\cdot \varvec{\pi }+ b$. A state $s\in S(\mathbf {p})$ satisfies the environment constraints $\textsf {Env}_{\mathrm {CP}}$, that is, $s\models \textsf {Env}_{\mathrm {CP}}$ iff $(\varvec{\kappa }(s), s.\mathbf {p}) \models \textsf {Env}_{\mathrm {CP}}$.

In an initial state $s_0 \in S_0(\mathbf {p})$, the vector $\varvec{\ell }$ of locations stores only initial locations, i.e., $\varvec{\ell }[i] \in {\mathcal {I}}$, for $1 \le i\le N(\mathbf {p})$, and all receive variables of all processes are initialized to 0. Formally, a state $s_0 = \langle \varvec{\ell }, \mathbf {nr}_{1}, \dots , \mathbf {nr}_{N(\mathbf {p})}, \mathbf {p}\rangle $ is initial, i.e., $s_0 \in S_0(\mathbf {p})$, if $s_0.\varvec{\ell }\in {\mathcal {I}}^{N(\mathbf {p})}$ and $s_0.\mathbf {nr}_{i}[m] = 0$, for $1 \le i\le N(\mathbf {p})$ and $m\in \mathcal {M}$.

We now define the transition relation $T(\mathbf {p})\subseteq S(\mathbf {p})\times S(\mathbf {p})$, where we will use the environment constraint $\textsf {Env}_{\mathrm {EP}}$ to restrict the values of the receive variables. A transition $(s, s') \in T(\mathbf {p})$ encodes one round in the execution of the distributed algorithm. In a round, the processes send and receive messages, and update their variables based on the received messages. Further, all the messages sent in the current round are received in the same round. The process variable updates are captured by moving processes from one location to another, based on the values of the receive variables. The transition relation $T(\mathbf {p})$ is a binary relation $T(\mathbf {p})\subseteq S(\mathbf {p})\times S(\mathbf {p})$, where $(s, s') \in T(\mathbf {p})$ iff for every process $i$, with $1 \le i\le N(\mathbf {p})$:

1.
$0 \le s'.\mathbf {nr}_{i}[m] \le N(\mathbf {p})$, such that $(s'.\mathbf {nr}_{i}, \varvec{\kappa }(s), s.\mathbf {p}) \models \textsf {Env}_{\mathrm {EP}}$, for $m\in \mathcal {M}$,
2.
there exists $r^{\varDelta }\in {\mathcal {R}}^{\varDelta }$ such that:
- $s.\varvec{\ell }[i] = r^{\varDelta }.{ from }$,
- $(s'.\mathbf {nr}_{i}, \varvec{\kappa }(s), s.\mathbf {p}) \models r^{\varDelta }.\varphi $,
- $s'.\varvec{\ell }[i] = r^{\varDelta }.{ to }$.
3.
$s'.\mathbf {p}= s.\mathbf {p}$ and $s' \models \textsf {Env}_{\mathrm {CP}}$.

In a transition $(s, s') \in T(\mathbf {p})$, the receive variables and locations of each process are updated. That is, the value $s'.\mathbf {nr}_{i}[m]$ of the receive variable $\mathsf {nr}(m)$ of process $i$ is assigned a value in the range from 0 to $N(\mathbf {p})$ non-deterministically, such that the environment constraint $\textsf {Env}_{\mathrm {EP}}$ is satisfied. This ensures that the number of received messages of type $m$ is non-negative, that it does not exceed the number of participating processes, and that the receive variables of each process are assigned values that satisfy the constraints of the environment assumption. In the case of the synchronous computation model, this captures that all messages sent by correct processes in a round are received in the same round, and that the number of messages of type $m$, received by process $i$, is bounded by above by the total number of messages of type $m$, sent by both correct and faulty processes. To update the locations, each process $i$ picks a rule $r^{\varDelta }\in {\mathcal {R}}^{\varDelta }$ that it applies to update its location, if the process $i$ is in location $r^{\varDelta }.{ from }$ in the state $s$, and if the newly assigned values of the receive variables of process $i$ in the state $s'$ satisfy the receive guard $r^{\varDelta }.\varphi $. If this is the case, the process $i$ updates its location to $r^{\varDelta }.{ to }$ in the state $s'$. The parameter values remain unchanged, and we require that the state $s'$ satisfies $\textsf {Env}_{\mathrm {CP}}$, i.e., it is a valid state.

6 Abstracting rSTA to STA

Given an rSTA, our goal is to construct an STA, which differs from the rSTA only in the guards on its rules and the environment assumption. For each rule $r^{\varDelta }\in {\mathcal {R}}^{\varDelta }$ in the rSTA, whose guard $r^{\varDelta }.\varphi $ is a receive guard, we will construct a rule $r\in {\mathcal {R}}$ in the STA, such that the guard $r.\varphi $ is a Boolean combination of $c$-propositions. We will perform the abstraction in two steps: (i) we will strengthen each receive guard $r^{\varDelta }.\varphi $, occurring on the rules $r^{\varDelta }\in {\mathcal {R}}^{\varDelta }$ of the rSTA, with the constraints imposed by the faulty environment and the synchronous computation model, encoded in the environment assumption $\textsf {Env}^{\varDelta }$, and (ii) we will eliminate the receive variables from the receive guards and environment assumptions of rSTA to obtain the guards and environment assumption of STA.

6.1 Guard Strengthening

Let $\textsf {rSTA}= (\mathcal {L}, {\mathcal {I}}, {\mathcal {R}}^{\varDelta }, \varDelta , \varPi , RC, \textsf {Env}^{\varDelta })$ be a receive STA, where the rules $r^{\varDelta }\in {\mathcal {R}}^{\varDelta }$ have guards containing expressions over the receive variables $\mathsf {nr}(m) \in \varDelta $, and where the environment assumption $\textsf {Env}^{\varDelta }\equiv \textsf {Env}_{\mathrm {CP}}\wedge \textsf {Env}_{\mathrm {EP}}$ is a conjunction of two environment constraints, $\textsf {Env}_{\mathrm {CP}}$ and $\textsf {Env}_{\mathrm {EP}}$, where the latter restricts the values of the receive variables. Recall that in Sect. 3.2, we defined different environment constraints $\textsf {Env}_{\mathrm {EP}}$ for the different fault models. In general, these constraints express that for each message type $m\in \mathcal {M}$, the receive variable $\mathsf {nr}(m)$ is assigned a value which is greater or equal to the number of messages of type $m$ sent by correct processes, and which is smaller or equal to the total number of messages of type $m$, sent by both correct and faulty processes (e.g., $\#{\mathsf {sent}(m)} \le \mathsf {nr}(m) \le \#{\mathsf {sent}(m)} + \#{\mathsf {sent}_{\mathsf {cr}}(m)}$ for crash faults). As a first step towards eliminating the receive variables from the receive guards, we strengthen the rules from the set ${\mathcal {R}}^{\varDelta }$, such that we add the environment constraints $\textsf {Env}_{\mathrm {EP}}$ to their guards in order to bound the values of the receive variables.

Definition 1

Given $r^{\varDelta }\in {\mathcal {R}}^{\varDelta }$, its strengthened rule is $\widehat{r}^{\varDelta }= \mathsf {strengthen}(r^{\varDelta })$, such that: $\widehat{r}^{\varDelta }.{ from }= r^{\varDelta }.{ from }$, $\widehat{r}^{\varDelta }.{ to }= r^{\varDelta }.{ to }$, $\widehat{r}^{\varDelta }.\varphi = r^{\varDelta }.\varphi \wedge \textsf {Env}_{\mathrm {EP}}$.

We denote by $\widehat{{\mathcal {R}}}^{\varDelta }= \{\mathsf {strengthen}(r^{\varDelta }) \mid r^{\varDelta }\in {\mathcal {R}}^{\varDelta }\}$ the set of strengthened rules in $\textsf {rSTA}= (\mathcal {L}, {\mathcal {I}}, {\mathcal {R}}^{\varDelta }, \varDelta , \varPi , RC, \textsf {Env}^{\varDelta })$, where $\textsf {Env}^{\varDelta }\equiv \textsf {Env}_{\mathrm {CP}}\wedge \textsf {Env}_{\mathrm {EP}}$.

6.2 Eliminating the Receive Variables

Let $\textsf {rSTA}= (\mathcal {L}, {\mathcal {I}}, {\mathcal {R}}^{\varDelta }, \varDelta , \varPi , RC, \textsf {Env}^{\varDelta })$ be a receive STA, and let $\widehat{{\mathcal {R}}}^{\varDelta }$ be the set of strengthened rules (Definition 1). We define an $\textsf {STA}= (\mathcal {L}_{}, {\mathcal {I}}_{}, {\mathcal {R}}_{}, \varPi _{}, RC_{}, \textsf {Env}_{})$ whose locations, initial locations, and parameters are the same as in $\textsf {rSTA}$, while we construct the rules ${\mathcal {R}}$ and the environment assumption $\textsf {Env}$ of the STA below.

Recall that $\textsf {Env}^{\varDelta }\equiv \textsf {Env}_{\mathrm {CP}}\wedge \textsf {Env}_{\mathrm {EP}}$. To define the environment assumption $\textsf {Env}$ of the constructed STA, we set $\textsf {Env}\equiv \textsf {Env}_{\mathrm {CP}}$. Before we define the rules of the constructed STA, we define the mapping $\mathsf {eliminate}$.

Definition 2

Let $\phi $ be a propositional formula over r-, c-, and $e$-propositions. Let $\varvec{\delta }= \langle \mathsf {nr}(m_1), \dots , \mathsf {nr}(m_{|\mathcal {M}|}) \rangle $ denote the $|\mathcal {M}|$-dimensional receive variables vector, and $\mathsf {QE}$ denote the quantifier elimination procedure for Presburger arithmetic. The formula $ \mathsf {eliminate}(\phi ) = \mathsf {QE}(\exists \varvec{\delta }\; \phi ) $ is a quantifier-free formula, with no occurrence of receive variables $\mathsf {nr}(m) \in \varDelta $, which is logically equivalent to $\exists \varvec{\delta }\; \phi $.

To construct a rule $r\in {\mathcal {R}}$ of an STA, given a rule $r^{\varDelta }\in {\mathcal {R}}^{\varDelta }$ of an rSTA, we will apply the mapping $\mathsf {eliminate}$ to each guard of the strengthened rule $\widehat{r}^{\varDelta }\in \widehat{{\mathcal {R}}}^{\varDelta }$, where $\widehat{r}^{\varDelta }= \mathsf {strengthen}(r^{\varDelta })$. The result of quantifier elimination is a quantifier-free formula over $c$-propositions, which is logically equivalent to $\exists \varvec{\delta }\; \widehat{r}^{\varDelta }.\varphi $.

Definition 3

Given $r^{\varDelta }\in {\mathcal {R}}^{\varDelta }$, its constructed rule is $r= \mathsf {construct}(r^{\varDelta }) \in {\mathcal {R}}$, such that: $r.{ from }= r^{\varDelta }.{ from }$, $r.{ to }= r^{\varDelta }.{ to }$, $r.\varphi = \mathsf {eliminate}(\widehat{r}^{\varDelta }.\varphi )$, where $\widehat{r}^{\varDelta }= \mathsf {strengthen}(r^{\varDelta })$.

Proposition 1

For every strengthened rule $\widehat{r}^{\varDelta }\in \widehat{{\mathcal {R}}}^{\varDelta }$ and every tuple $(\mathbf {d},\varvec{\kappa },\mathbf {p})$, where $\mathbf {d}\in \mathbb {N}^{|\mathcal {M}|}$, $\varvec{\kappa }\in \mathbb {N}^{|\mathcal {L}|}$, and $\mathbf {p}\in \mathbf {P}_{RC}$, we have:

$$\begin{aligned} (\mathbf {d},\varvec{\kappa },\mathbf {p})&\models \widehat{r}^{\varDelta }.\varphi \,\quad \text{ implies } \quad (\varvec{\kappa },\mathbf {p}) \models \mathsf {eliminate}(\widehat{r}^{\varDelta }.\varphi ) \end{aligned}$$

Proposition 1 is a consequence of quantifier elimination. Note that the converse of this proposition does not hold in general. That is, $(\varvec{\kappa },\mathbf {p}) \models \mathsf {eliminate}(\widehat{r}^{\varDelta }.\varphi )$ does not imply that $(\mathbf {d},\varvec{\kappa },\mathbf {p}) \models \widehat{r}^{\varDelta }.\varphi $, for every $\mathbf {d}\in \mathbb {N}^{|\mathcal {M}|}$. However, by quantifier elimination, we have that $(\varvec{\kappa },\mathbf {p}) \models \mathsf {eliminate}(\widehat{r}^{\varDelta }.\varphi )$ implies $(\varvec{\kappa },\mathbf {p})\models \exists \varvec{\delta }{}\; \widehat{r}^{\varDelta }.\varphi $.

6.3 Soundness and Completeness

This construction of an STA is sound and complete. That is, given a rSTA and an admissible valuation $\mathbf {p}\in \mathbf {P}_{RC}$, we show that there exists a bisimulation relation between the system $\mathsf {STS}(\textsf {rSTA}, \mathbf {p})$, induced by rSTA and $\mathbf {p}$, and a counter system $\mathsf {CS}(\textsf {STA}, \mathbf {p})$, induced by the constructed STA and $\mathbf {p}$. The existence of a bisimulation implies that $\mathsf {STS}(\textsf {rSTA}, \mathbf {p})$ and $\mathsf {CS}(\textsf {STA}, \mathbf {p})$ satisfy the same CTL $^*$ formulas [3]. To express temporal formulas, as atomic propositions we use the $c$-propositions from the set $\mathrm {CP}$. We define two labeling functions, $\lambda _{S(\mathbf {p})}$ and $\lambda _{\varSigma (\mathbf {p})}$, where $\lambda _{S(\mathbf {p})}: S(\mathbf {p})\rightarrow 2^{\mathrm {CP}}$ assigns to a state $s\in S(\mathbf {p})$ the set of $c$-propositions that hold in it (the function $\lambda _{\varSigma (\mathbf {p})} : \varSigma (\mathbf {p})\rightarrow 2^{\mathrm {CP}}$ is defined analogously).

We introduce an abstraction mapping $\alpha _{\mathbf {p}}: S(\mathbf {p})\rightarrow \varSigma (\mathbf {p})$ that maps states $s\in S(\mathbf {p})$ of $\mathsf {STS}(\textsf {rSTA}, \mathbf {p})$ to configurations $\sigma \in \varSigma (\mathbf {p})$ of $\mathsf {CS}(\textsf {STA}, \mathbf {p})$, such that $\sigma = \alpha _{\mathbf {p}}(s)$ iff $\sigma = (\varvec{\kappa }(s), s.\mathbf {p})$. By the definition of the abstraction mapping $\alpha _{\mathbf {p}}$ and the semantics of $c$-propositions, we have that a state and its abstraction satisfy the same $c$-propositions. Further, given a configuration $\sigma \in \varSigma (\mathbf {p})$, we can construct a state $s\in S(\mathbf {p})$, such that $\sigma = \alpha _{\mathbf {p}}(s)$. While this is always possible, the constructed state $s$ might not be reachable in any execution of the system $\mathsf {STS}(\textsf {rSTA}, \mathbf {p})$. However, we can use the constraint $\textsf {Env}_{\mathrm {EP}}$ to restrict the value of the receive variables in the constructed state $s$, such that it is a valid state in the system $\mathsf {STS}(\textsf {rSTA}, \mathbf {p})$. The main result of this section is stated below. The detailed proof of this result can be found in the first author’s PhD thesis.

Theorem 1

The binary relation $ B (\mathbf {p})= \{(s, \sigma ) \mid s\in S(\mathbf {p}), \sigma \in \varSigma (\mathbf {p}), \sigma = \alpha _{\mathbf {p}}(s)\}$ is a bisimulation relation.

7 Experimental Evaluation

To show the usefulness of translating rSTA to STA, we: (i) encoded synchronous fault-tolerant distributed algorithms using rSTA, (ii) implemented the method from Sect. 6 in a prototype, (iii) compared the output to the existing manual encodings from [34], some of which are artifacts of the experimental evaluation from [36] and were given as examples throughout this paper, and (iv) verified the properties of the generated STA using the technique from [36].

Encoding Algorithms as . We extended the STA encoding from [36], to support (i) declarations of receive variables and (ii) constraints given by the environment assumption. The algorithms we encoded are listed in Table 1, and their rSTA can be found in [35]. For each of them, there already existed a manually produced STA [34]. The manually produced rSTA and STA have the same structure w.r.t. locations and rules, and differ only in the guards that occur on the rules: in the rSTA, we have receive guards, which are Boolean combinations of $r$-propositions and $c$-propositions, while in the manually encoded STA, the guards are Boolean combinations of $c$-propositions.

Applying Quantifier Elimination. We implemented a script that parses the input rSTA and creates an STA whose rules have guards that are Boolean combinations of $c$-propositions, according to the abstraction from Sect. 6. To automate the quantifier elimination step, we applied Z3 [16] tactics for quantifier elimination [10, 11], to formulas of the form $\exists \varvec{\delta }{} \;\widehat{r}^{\varDelta }.\varphi $, where $\widehat{r}^{\varDelta }.\varphi \equiv r^{\varDelta }.\varphi \wedge \textsf {Env}_{\mathrm {EP}}$ is the strengthened guard of the receive guard $r^{\varDelta }.\varphi $, for $r^{\varDelta }\in {\mathcal {R}}^{\varDelta }$. For all our benchmarks, the STA is generated within seconds, as reported in Table 1.

Analyzing the Automatically Generated . We compared the guards of the automatically generated STA (auto STA) to the manually encoded STA (man STA). Syntactically, the guards of auto STA are larger in general, as they contain additional constraints that result from quantifier elimination. Semantically, we check whether the guards for the auto STA imply the guards of the man STA. For each automatically generated guard $\varphi _{\textsf {auto}}$, we check whether its corresponding guard $\varphi _{\textsf {man}}$ from the manual encoding is implied by $\varphi _{\textsf {auto}}$, for all values of the parameters and number of sent messages by checking the validity of the formula:

$$\begin{aligned} \forall \mathbf {p}\in \mathbf {P}_{RC}\; \forall L_1\dots \forall L_{|\mathcal {M}|} \; \varphi _{\textsf {auto}}(L_1, \dots , L_{|\mathcal {M}|}) \, \rightarrow \, \varphi _{\textsf {man}}(L_1, \dots , L_{|\mathcal {M}|}) \end{aligned}$$

(3)

where $L_j = \mathsf {sent}(m_j)$, for $m_j \in \mathcal {M}$ and $1 \le j \le |\mathcal {M}|$, denotes the set of locations where processes send messages of type $m_j$. We automate the validity check of (3) using an SMT solver, such as Z3, to check the unsatisfiability of its negation. With this check we are able to either verify that the earlier man STA faithfully model the benchmark algorithms, or detect discrepancies, which we investigated further. Our translation technique produces the strongest possible guards, due to the soundness and completeness result. Hence, we expected that the implication holds for all the guards of all the benchmarks we considered. This is however not the case for the algorithms HybridKing and HybridQueen which are designed to tolerate hybrid faults, in particular, send omissions and Byzantine faults. There, we found that one automatically generated guard does not imply its corresponding manual guard, and concluded that this is due to a flaw in the manual encoding by manual inspection. We found a similar problem with a missing rule in the (purely) Byzantine versions of these algorithms, namely ByzKing and ByzQueen. By adding these rules and correcting the appropriate manual guards, we were able to establish the validity of (3) for all guards.

Table 1. The algorithms we encoded as rSTA and the results of applying the verification technique from [36]. The column $\mathsf {QE}$ states the time needed to produce an auto STA from an rSTA. The column $\Rightarrow $ states if (3) is valid all, some, or none of guards. We report on the time it took the solvers Z3 and CVC4 to (i) check the guard implications (only Z3), (ii) compute the diameter for the auto STA, and (iii) check the safety properties of the auto STA, (iv) compute the diameter for the man STA, (v) check the safety properties of the man STA, using the SMT-based procedure from [36].

Full size table

Model Checking of Safety Properties. We gave the STA we obtained as output of our translation procedure as input to the bounded model checking tool from [36], which computes a diameter of a counter system and performs bounded model checking for safety properties. The experiments were run on a machine with 2.8 GHz Quad-Core Intel(R) Core(TM) i7 CPU and 16GB. The results of applying the SMT-based procedure from [36] to the auto STA, as well as to the extended set [34] of man STA from [36], are presented in Table 1. The timeout, denoted by t.o. in the table, was set to 24 h. For all algorithms, we note that bounded model checking with both Z3 and CVC4 performs similarly for both auto STA and man STA. For computing the diameter, we observe that for the algorithms: RB [21] (Fig. 5), HybridRB, OmitRB [9], FairCons [33], FloodMin, for $k=1$ (Fig. 3) and $k=2$ [28], FMinOmit, for $k=1$ (Fig. 4) and $k=2$ [28], kSetOmit, for $k=2$ [33], FloodSet [28], PhaseKing [7], and PhaseQueen [6] (Fig. 1), we obtain comparable results on both the auto STA and man STA. For the other algorithms, we found:

computing the diameter for the auto STA of kSetOmit, with $k=1$ [33], is slightly slower with Z3 and slightly faster with CVC4 than for the man STA;
Z3 performs better when computing the diameter for the auto STA than for the man STA of both ByzKing and ByzQueen [9], while CVC4 performs worse. Note that in Table 1 we report the times for the man STA of ByzKing and ByzQueen that have missing rules. After adding the rules to the man STA, computing the diameter on the auto STA is still faster with both solvers;
Z3 and CVC4 compute the diameter for the auto STA of HybridKing and HybridQueen [9] within seconds, in contrast to both timing out for the man STA;
computing the diameter with Z3 is significantly faster for the auto STA than for the man STA of OmitKing [9]. CVC4 computes the diameter for auto STA of OmitKing, while for man STA it times out. The computed diameter $\mathsf {d}=4$ for auto STA is smaller than the diameter 8, computed for man STA;
Z3 and CVC4 compute the diameter for the auto STA of OmitQueen [9] faster than for man STA.

8 Conclusions

We established a fully automated pipeline that for a synchronous distributed algorithm: (1) starts from a formal model that captures its pseudo code, (2) produces a formal model suitable for verification, and (3) automatically verifies its safety properties. Our technique thus closes the gap between the original description of an algorithm (using received messages) and the synchronous threshold automaton of the algorithm given as an input to a verification tool.

There are two major differences to the asynchronous case considered in [37]. First, the asynchronous model uses interleaving semantics, while in the synchronous model all processes take a step in a transition. Second, in the asynchronous model, there are no limitations when a message will be delivered. The lower bound on the number of received messages, given in the synchronous model by the number of sent messages by correct processes, is only eventually satisfied in the asynchronous model, and thus is not used in the process of eliminating the receive variables from the receive guards.

We did extensive experimental evaluation of our method. We attribute the better performance of the bounded model checking technique from [36] on the automatically generated STA to the fact that the automatically generated guards contain more additional constraints, coming from the environment assumption, which help guide the SMT solvers. Moreover, not only do we obtain the diameter bounds faster, we also obtain better bounds for the automatically generated STA of some benchmarks. These findings confirm the conjecture that manual encoding of distributed algorithms is a tedious and error-prone task and suggest that there is a real benefit of producing guards automatically.

References

Aminof, B., Rubin, S., Stoilkovska, I., Widder, J., Zuleger, F.: Parameterized model checking of synchronous distributed algorithms by abstraction. VMCAI 2018. LNCS, vol. 10747, pp. 1–24. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73721-8_1
Chapter Google Scholar
Attiya, H., Welch, J.: Distributed Computing, 2nd edn. Wiley, Hoboken (2004)
Book Google Scholar
Baier, C., Katoen, J.P.: Principles of Model Checking. MITP, United States (2008)
Google Scholar
Bakst, A., von Gleissenthall, K., Kici, R.G., Jhala, R.: Verifying distributed programs via canonical sequentialization. PACMPL 1(OOPSLA), 1–27 (2017)
Google Scholar
Balasubramanian, A.R., Esparza, J., Lazić, M.: Complexity of verification and synthesis of threshold automata. In: ATVA (2020)
Google Scholar
Berman, P., Garay, J.A., Perry, K.J.: Asymptotically Optimal Distributed Consensus. Technical report, Bell Labs (1989). http://plan9.bell-labs.co/who/garay/asopt.ps
Berman, P., Garay, J.A., Perry, K.J.: Towards optimal distributed consensus (Extended Abstract). In: FOCS, pp. 410–415 (1989)
Google Scholar
Bertrand, N., Konnov, I., Lazić, M., Widder, J.: Verification of randomized consensus algorithms under round-rigid adversaries. In: CONCUR, pp. 1–15 (2019)
Google Scholar
Biely, M., Schmid, U., Weiss, B.: Synchronous consensus under hybrid process and link failures. Theor. Comput. Sci. 412(40), 5602–5630 (2011)
Article MathSciNet Google Scholar
Bjørner, N.: Linear quantifier elimination as an abstract decision procedure. In: Giesl, J., Hähnle, R. (eds.) IJCAR 2010. LNCS (LNAI), vol. 6173, pp. 316–330. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14203-1_27
Chapter Google Scholar
Bjørner, N., Janota, M.: Playing with quantified satisfaction. LPAR 35, 15–27 (2015)
Google Scholar
Bouajjani, A., Enea, C., Ji, K., Qadeer, S.: On the completeness of verifying message passing programs under bounded asynchrony. In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018. LNCS, vol. 10982, pp. 372–391. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96142-2_23
Chapter Google Scholar
Chaouch-Saad, M., Charron-Bost, B., Merz, S.: A reduction theorem for the verification of round-based distributed algorithms. In: Bournez, O., Potapov, I. (eds.) RP 2009. LNCS, vol. 5797, pp. 93–106. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04420-5_10
Chapter Google Scholar
Cooper, D.C.: Theorem proving in arithmetic without multiplication. Mach. Intell. 7(91–99), 300 (1972)
Google Scholar
Damian, A., Drăgoi, C., Militaru, A., Widder, J.: Communication-closed asynchronous protocols. In: Dillig, I., Tasiran, S. (eds.) CAV 2019. LNCS, vol. 11562, pp. 344–363. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25543-5_20
Chapter Google Scholar
de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24
Chapter Google Scholar
Drăgoi, C., Henzinger, T.A., Veith, H., Widder, J., Zufferey, D.: A logic-based framework for verifying consensus algorithms. In: McMillan, K.L., Rival, X. (eds.) VMCAI 2014. LNCS, vol. 8318, pp. 161–181. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54013-4_10
Chapter Google Scholar
Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985)
Article MathSciNet Google Scholar
Gleissenthall, K.V., Gökhan Kici, R., Bakst, A., Stefan, D., Jhala, R.: Pretend synchrony. In: POPL (2019)
Google Scholar
Hawblitzel, C., et al.: Ironfleet: proving safety and liveness of practical distributed systemsp. Commun. ACM 60(7), 83–92 (2017)
Article Google Scholar
Srikanth, T.K., Toueg, S.: Optimal clock synchronization. J. ACM 34(3), 626–645 (1987)
Article MathSciNet Google Scholar
Konnov, I., Lazić, M., Veith, H., Widder, J.: A short counterexample property for safety and liveness verification of fault-tolerant distributed algorithms. In: POPL, pp. 719–734 (2017)
Google Scholar
Konnov, I., Veith, H., Widder, J.: On the completeness of bounded model checking for threshold-based distributed algorithms: reachability. Inf. Comput. 252, 95–109 (2017). https://doi.org/10.1016/j.ic.2016.03.006
Article MathSciNet Google Scholar
Kopetz, H., Grünsteidl, G.: TTP - a protocol for fault-tolerant real-time systems. IEEE Comput. 27(1), 14–23 (1994). https://doi.org/10.1109/2.248873
Article Google Scholar
Kragl, B., Qadeer, S., Henzinger, T.A.: Synchronizing the asynchronous. In: CONCUR, pp. 1–17 (2018)
Google Scholar
Kukovec, J., Konnov, I., Widder, J.: Reachability in parameterized systems: all flavors of threshold automata. In: CONCUR. LIPIcs, vol. 118, pp. 1–17 (2018)
Google Scholar
Lincoln, P., Rushby, J.: A formally verified algorithm for interactive consistency under a hybrid fault model. In: FTCS, pp. 402–411 (1993)
Google Scholar
Lynch, N.: Distributed Algorithms. Morgan Kaufman (1996)
Google Scholar
Marić, O., Sprenger, C., Basin, D.: Cutoff bounds for consensus algorithms. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp. 217–237. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9_12
Chapter Google Scholar
Presburger, M.: Über die vollständigkeit eines gewissen systems der arithmetik ganzer zahlen, in welchem die addition als einzige operation hervortritt. Comptes Rendus du I congres de Mathématiciens des Pays Slaves, pp. 92–101 (1929)
Google Scholar
Pugh, W.: A practical algorithm for exact array dependence analysis. Commun. ACM 35(8), 102–114 (1992)
Article Google Scholar
Rahli, V., Guaspari, D., Bickford, M., Constable, R.L.: Formal specification, verification, and implementation of fault-tolerant systems using EventML. ECEASST 72 (2015)
Google Scholar
Raynal, M.: Fault-tolerant agreement in synchronous message-passing systems. Synth. Lect. Distrib. Comput. Theory 1(1), 1–189 (2010)
Google Scholar
Stoilkovska, I.: Manually Encoded Synchronous Threshold Automata. https://github.com/istoilkovska/syncTA/algorithms. Accessed Oct 2020
Stoilkovska, I.: Receive Synchronous Threshold Automata. https://github.com/istoilkovska/syncTA/receiveSTA. Accessed Oct 2020
Stoilkovska, I., Konnov, I., Widder, J., Zuleger, F.: Verifying safety of synchronous fault-tolerant algorithms by bounded model checking. In: Vojnar, T., Zhang, L. (eds.) TACAS 2019. LNCS, vol. 11428, pp. 357–374. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17465-1_20
Chapter Google Scholar
Stoilkovska, I., Konnov, I., Widder, J., Zuleger, F.: Eliminating message counters in threshold automata. In: Hung, D.V., Sokolsky, O. (eds.) ATVA 2020. LNCS, vol. 12302, pp. 196–212. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59152-6_11
Chapter Google Scholar
Wilcox, J.R., et al.: Verdi: a framework for implementing and formally verifying distributed systems. In: PLDI, pp. 357–368 (2015)
Google Scholar

Download references

Author information

Authors and Affiliations

TU Wien, Vienna, Austria
Ilina Stoilkovska & Florian Zuleger
Informal Systems, Vienna, Austria
Ilina Stoilkovska, Igor Konnov & Josef Widder

Authors

Ilina Stoilkovska
View author publications
You can also search for this author in PubMed Google Scholar
Igor Konnov
View author publications
You can also search for this author in PubMed Google Scholar
Josef Widder
View author publications
You can also search for this author in PubMed Google Scholar
Florian Zuleger
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ilina Stoilkovska .

Editor information

Editors and Affiliations

University of Copenhagen, Copenhagen, Denmark
Fritz Henglein
Tel Aviv University, Tel Aviv, Israel
Sharon Shoham
Technion, Haifa, Israel
Yakir Vizel

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Stoilkovska, I., Konnov, I., Widder, J., Zuleger, F. (2021). Eliminating Message Counters in Synchronous Threshold Automata. In: Henglein, F., Shoham, S., Vizel, Y. (eds) Verification, Model Checking, and Abstract Interpretation. VMCAI 2021. Lecture Notes in Computer Science(), vol 12597. Springer, Cham. https://doi.org/10.1007/978-3-030-67067-2_10

Download citation

DOI: https://doi.org/10.1007/978-3-030-67067-2_10
Published: 12 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67066-5
Online ISBN: 978-3-030-67067-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Eliminating Message Counters in Synchronous Threshold Automata

Abstract

Similar content being viewed by others

Eliminating Message Counters in Threshold Automata

Verifying safety of synchronous fault-tolerant algorithms by bounded model checking

Verifying Safety of Synchronous Fault-Tolerant Algorithms by Bounded Model Checking

1 Introduction

2 Our Approach at a Glance

3 Synchronous Threshold Automata

3.1 Receive Synchronous Threshold Automata

3.2 Environment Assumption and Modeling Faults

4 Counter Systems

5 Synchronous Transition Systems

6 Abstracting rSTA to STA

6.1 Guard Strengthening

Definition 1

6.2 Eliminating the Receive Variables

Definition 2

Definition 3

Proposition 1

6.3 Soundness and Completeness

Theorem 1

7 Experimental Evaluation

8 Conclusions

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Eliminating Message Counters in Synchronous Threshold Automata

Abstract

Similar content being viewed by others

Eliminating Message Counters in Threshold Automata

Verifying safety of synchronous fault-tolerant algorithms by bounded model checking

Verifying Safety of Synchronous Fault-Tolerant Algorithms by Bounded Model Checking

1 Introduction

2 Our Approach at a Glance

3 Synchronous Threshold Automata

3.1 Receive Synchronous Threshold Automata

3.2 Environment Assumption and Modeling Faults

4 Counter Systems

5 Synchronous Transition Systems

6 Abstracting rSTA to STA

6.1 Guard Strengthening

Definition 1

6.2 Eliminating the Receive Variables

Definition 2

Definition 3

Proposition 1

6.3 Soundness and Completeness

Theorem 1

7 Experimental Evaluation

8 Conclusions

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation