Accuracy of Message Counting Abstraction in Fault-Tolerant Distributed Algorithms

Konnov, Igor; Widder, Josef; Spegni, Francesco; Spalazzi, Luca

doi:10.1007/978-3-319-52234-0_19

Igor Konnov¹⁵,
Josef Widder¹⁵,
Francesco Spegni¹⁶ &
…
Luca Spalazzi¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10145))

Included in the following conference series:

International Conference on Verification, Model Checking, and Abstract Interpretation

938 Accesses
2 Citations

Abstract

Fault-tolerant distributed algorithms are a vital part of mission-critical distributed systems. In principle, automatic verification can be used to ensure the absence of bugs in such algorithms. In practice however, model checking tools will only establish the correctness of distributed algorithms if message passing is encoded efficiently. In this paper, we consider abstractions suitable for many fault-tolerant distributed algorithms that count messages for comparison against thresholds, e.g., the size of a majority of processes. Our experience shows that storing only the numbers of sent and received messages in the global state is more efficient than explicitly modeling message buffers or sets of messages. Storing only the numbers is called message-counting abstraction. Intuitively, this abstraction should maintain all necessary information. In this paper, we confirm this intuition for asynchronous systems by showing that the abstract system is bisimilar to the concrete system. Surprisingly, if there are real-time constraints on message delivery (as assumed in fault-tolerant clock synchronization algorithms), then there exist neither timed bisimulation, nor time-abstracting bisimulation. Still, we prove this abstraction useful for model checking: it preserves ATCTL properties, as the abstract and the concrete models simulate each other.

Supported by: the Austrian Science Fund (FWF) through the National Research Network RiSE (S11403 and S11405), and project PRAVDA (P27722); and by the Vienna Science and Technology Fund (WWTF) through project APALACHE (ICT15-103).

Access provided by CONRICYT-eBooks. Download conference paper PDF

Modular Model-Checking of a Byzantine Fault-Tolerant Protocol

Parameterized Model Checking of Synchronous Distributed Algorithms by Abstraction

Eliminating Message Counters in Synchronous Threshold Automata

1 Introduction

The following algorithmic idea is pervasive in fault-tolerant distributed computing [13, 21, 30, 33, 36, 39]: each correct process counts messages received from distinct peers. Then, given the total number of processes n and the maximum number of faulty processes t, a process performs certain actions only if the message counter reaches a threshold such as $n-t$ (this number ensures that faulty processes alone cannot prevent progress in the computation). A list of benchmark algorithms that use such thresholds can be found in [27]. On the left of Fig. 1, we give an example pseudo code [36]. This algorithm works in a timed environment [35] (with a time bound $\tau ^+$ on message delays) in the presence of Byzantine faults ($n > 3t$) and provides safety and liveness guarantees such as:

(a):: If a correct process accepts (that is, executes Line 19) at time T, then all correct processes accept by time $T+2\tau ^+$.
(b):: If all correct processes start with $ myval _i = 0$, then no correct process ever accepts.
(c):: If all correct processes start with $ myval _i = 1$, then at least one correct process eventually accepts.

As is typical for the distributed algorithms literature, the pseudo code from Fig. 1 omits “unnecessary book-keeping” details of message passing. That is, neither the local data structures that store the received messages nor the message buffers are explicitly described. Hence, if we want to automatically verify such an algorithm design, it is up to a verification expert to find adequate modeling and proper abstractions of message passing.

The authors of [23] suggested to model message passing using message counters instead of keeping track of individual messages. This modeling was shown experimentally to be efficient for fixed size systems, and later a series of parameterized model checking techniques was based upon it [22, 23, 25,26,27]. The encoding on the right of Fig. 1 is obtained by adding a global integer variable nsntEcho. Incrementing this variable (Line 36) encodes that a correct process executes Line 15 of the original pseudo code. The ith process keeps the number of received messages in a local integer variable $\texttt {rcvdEcho}_i$ that can be increased, as long as the invariant $\texttt {rcvdEcho}_i \le \texttt {nsntEcho} + f$ is preserved, where f is the actual number of Byzantine faulty processes in the run. (This models that correct processes can receive up to f messages sent by faulty processes.) In fact, this modeling can be seen as a message-counting abstraction of a distributed system that uses message buffers.

The broadcast primitive in Fig. 1 is also used in the seminal clock synchronization algorithm from [35]. For clock synchronization, the precision of the clocks depends on the timing behavior^{Footnote 1} of the message system that the processes use to re-synchronize; e.g., in [35] it is required that each message sent at an instant T by a correct process must be delivered by a correct recipient process in the time interval $[T+\tau ^-, T+\tau ^+]$ for some bounds $\tau ^-$ and $\tau ^+$ fixed in each run.

The standard theory of timed automata [7] does not account for message passing directly. To incorporate messages, one specifies a message passing system as a network of timed automata, i.e., a collection of timed automata that are scheduled with respect to interleaving semantics and interact via rendezvous, synchronous broadcast, or shared variables [12]. In this case, there are two typical ways to encode message passing: (i) for each pair of processes, introduce a separate timed automaton that models a channel between the processes, or (ii) introduce a single timed automaton that stores messages from timed automata (modeling the processes) and delivers the messages later by respecting the timing constraints. The same applies to Timed I/O automata [24]. Both solutions maintain much more details than required for automated verification of distributed algorithms such as [35]: First, processes do not compare process identifiers when making transitions, and thus are symmetric. Second, processes do not compare identifiers in the received messages, but only count messages.

For automated verification purposes, it appears natural to model such algorithms with timed automata that use a message-counting abstraction. However, the central question for practical verification is: how precise is the message-counting abstraction? In other words, given an algorithm A, what is the strongest equivalence between the model $M_S(A)$ using message sets and the model $M_C(A)$ using message counting. If the message counting abstraction is too coarse, then this may lead to spurious counterexamples, which may result in many refinement steps [17], or even may make the verification procedure incomplete.

Contributions. We introduce timed and untimed models suitable for the verification of threshold-based distributed algorithms, and establish relations between these models. An overview of the following contributions is depicted in Fig. 2:

We define a model of processes that count messages. We then compose them into asynchronous systems (interleaving semantics). We give two variants: message passing, where the messages are stored in sets, and message counting, where only the number of sent messages is stored in shared variables.
We then show that in the asynchronous case, the message passing and the message counting variants are bisimilar. This proves the intuition that underlies the verification results from [22, 23, 25, 27]. It explains why no spurious counterexamples due to message-counting abstraction were experienced in the experimental evaluation of the verification techniques from [22].
We obtain timed models by adding timing constraints on message delays that restrict the message reception time depending on the sending times.
We prove the surprising result that, in general, there is neither timed bisimulation nor time-abstracting bisimulation between the message passing and the message counting variants.
Finally, we prove that there is timed simulation equivalence between the message passing and the message counting variants. This paves a way for abstraction-based model checking of timed distributed algorithms [35].

In the following section, we briefly recall the classic definitions of transition systems, timed automata, and simulations [7, 16]. However, the timed automata defined there do not provide standard means to express processes that communicate via asynchronous message passing, as required for distributed algorithms. As we are interested in timed automata that capture this structure, we first define asynchronous message passing in Sect. 3 and then add timing constraints in Sect. 4 via message sets and message counting.

2 Preliminaries

We recall the classic definitions to the extent necessary for our work, and add two non-standard notions: First, our definition of a timed automaton assumes partitioning of the set of clocks into two disjoint sets: the message clocks (used to express the timing constraints of the message system underlying the distributed algorithm) and the specification clocks (used to express the specifications). Second, we assume that clocks are “not ticking” before they are started (more precisely, they are initialized to $-\infty $).

We will use the following sets: the set of Boolean values ${\mathbb {B}}=\{\mathsf {F}, \mathsf {T}\}$, the set of natural numbers $\mathbb {N}= \{1, 2, \dots \}$, the set ${\mathbb {N}_0}= \mathbb {N}\cup \{ 0 \}$, the set of non-negative reals ${\mathbb {R}_{\ge 0}}$, and the set of time instants $\mathbb {T}:= {\mathbb {R}_{\ge 0} \cup \{-\infty \}}$.

Transition Systems. Given a finite set $\mathsf {AP}$ of atomic propositions, a transition system is a tuple $\textit{TS}=(S, S^0, R, L)$ where $S$ is a set of states, $S^0 \subseteq S$ are the initial states, $R\subseteq S\times S$ is a transition relation, and $L: S\rightarrow 2^{\mathsf {AP}}$ is a labeling function.

Clocks. A clock is a variable that ranges over the set $\mathbb {T}$. We call a clock that has the value $-\infty $ uninitialized. For a set X of clocks, a clock valuation is a function $\nu : X \rightarrow \mathbb {T}$. Given a clock valuation $\nu $ and a $\delta \in {\mathbb {R}_{\ge 0}}$, we define $\nu + \delta $ to be the valuation $\nu '$ such that $\nu '(c) = \nu (c) + \delta $ for $c \in X$ (Note that $-\infty + \delta = -\infty $). For a set $Y \subseteq X$ and a clock valuation $\nu : X \rightarrow \mathbb {T}$, we denote by $\nu [Y:=0]$ the valuation $\nu '$ such that $\nu '(c) = 0$ for $c \in Y \cap X$ and $\nu '(c) = \nu (c)$ for $c \in X \setminus Y$. Given a set of clocks Z, the set of clock constraints $\varPsi (Z)$ is defined to contain all expressions generated by the following grammar:

$$\begin{aligned} \zeta := c \le a ~ | ~ c \ge a ~ | ~ c < a ~ | ~ c > a ~ | ~ \zeta \wedge \zeta \qquad \text { for } c \in Z, a \in {\mathbb {N}_0}\end{aligned}$$

Timed Automata. Given a set of atomic propositions $\mathsf {AP}$ and a finite transition system $(S, S^0, R, L)$ over $\mathsf {AP}$, which models discrete control of a system, we model the system’s real-time behavior with a timed automaton, i.e., a tuple $\textit{TA}=(S, S^0, R, L, X\cup U, I, E)$ with the following properties:

The set $X\cup U$ is the disjoint union of the sets of message clocks $X$ and specification clocks $U$.
The function $I: S\rightarrow \varPsi (X\cup U)$ is a state invariant, which assigns to each discrete state a clock constraint over $X\, \cup \, U$, which must hold in that state. We denote by that the clock valuations $\mu $ and $\nu $ satisfy the constraints of $I(s)$.
$E: R \rightarrow \varPsi (X\cup U) \times 2^{(X\cup U)}$ is a state switch relation that assigns to each transition a guard on clock values and a (possibly empty) set of clocks that must be reset to zero, when the transition takes place.

We assume that $\mathsf {AP}$ is disjoint from $\varPsi (X\, \cup \, U)$. Thus, the discrete behavior does not interfere with propositions on time. The semantics of a timed automaton $\textit{TA}= (S, S^0, R, L, X\cup U, I, E)$ is an infinite transition system $\textit{TS}(\textit{TA})= (Q, Q^0, \varDelta , \lambda )$ over propositions $\mathsf {AP}\cup \varPsi (U)$ with the following properties [6]:

1.
The set $Q$ of states consists of triples $(s, \mu , \nu )$, where $s \in S$ is the discrete component of the state, whereas $\mu : X\rightarrow \mathbb {T}$ and $\nu : U\rightarrow \mathbb {T}$ are valuations of the message and specification clocks respectively such that .
2.
The set $Q^0 \subseteq Q$ of initial states comprises triples $(s_0, \mu _0, \nu _0)$ with $s_0 \in S_0$, and clocks are set to $-\infty $, i.e., $\forall c \in X.\ \mu _0(c) = -\infty $ and $\forall c \in U.\ \nu _0(c) = -\infty $.
3.
The transition relation $\varDelta $ contains pairs $((s, \mu , \nu ), (s', \mu ', \nu '))$ of two kinds of transitions:
1. (a)
  A time step: $s' = s$ and $\mu ' = \mu + \delta $, $\nu ' = \nu + \delta $, for $\delta >0$, provided that for all $\delta ':\ 0 \le \delta ' \le \delta $ the invariant is preserved, i.e., .
2. (b)
  A discrete step: there is a transition $(s, s') \in R$ with $(\varphi , Y) = E((s, s'))$ whose guard $\varphi $ is enabled, i.e., , and the clocks from Y are reset, i.e., $\mu ' = \mu [Y \cap X := 0]$, $\nu ' = \nu [Y \cap U := 0]$, provided that .
Given a transition $(q,q') \in \varDelta $, we write $q \xrightarrow {\delta }_\varDelta q'$ for a time step with delay $\delta \in {\mathbb {R}_{\ge 0}}$, or $q \rightarrow _\varDelta q'$ for a discrete step.
4.
The labeling function $\lambda : Q \rightarrow 2^{\mathsf {AP}\cup \varPsi (U)}$ is defined as follows. For any state $q = (s, \mu , \nu )$, the labeling .

Comparing System Behaviors. For transition systems $\textit{TS}_i=(S_i, S_i^0, R_i, L_i)$ for $i\in \{ 1,2 \}$, a relation $H \subseteq S_1 \times S_2$ is a simulation, if (i) for each $(s_1, s_2) \in H$ the labels coincide $L_1(s_1) = L_2(s_2)$, and (ii) for each transition $(s_1, t_1) \in R_1$, there is a transition $(s_2, t_2) \in R_2$ such that $(t_1, t_2) \in H$. If, in addition, the set $H^{-1} = \{ (s_2, s_1) :(s_1, s_2) \in H \}$ is also a simulation, then H is called bisimulation.

Further, if $\textit{TA}_1$ and $\textit{TA}_2$ are timed automata with $\textit{TS}(\textit{TA}_i)= (Q_i, Q_i^0, \varDelta _i, \lambda _i)$ for $i\in \{ 1,2 \}$, then a simulation $H \subseteq Q_1 \times Q_2$ is called timed simulation [29], and a bisimulation $B \subseteq Q_1 \times Q_2$ is called timed bisimulation [15].

For transition systems $\textit{TS}_i=(S_i, S_i^0, R_i, L_i)$ for $i\in \{1,2\}$, we say that a simulation $H \subseteq S_1 \times S_2$ is initial, if $\forall s \in S_1^0\,\ \exists t \in S_2^0.\ (s, t) \in H$. A bisimulation $B \subseteq S_1 \times S_2$ is initial, if the simulations B and $B^{-1}$ are initial. The same applies to timed (bi-)simulations. Then, for $i\in \{1,2\}$, we recall the standard preorders and equivalences on a pair of transition systems $\textit{TS}_i=(S_i, S_i^0, R_i, L_i)$, and on a pair of timed automata $\textit{TA}_i$, where $\textit{TS}(\textit{TA}_i)= (Q_i, Q_i^0, \varDelta _i, \lambda _i)$:

1.
$\textit{TS}_1 \approx \textit{TS}_2$ (bisimilar), if there is an initial bisimulation $B \subseteq S_1 \times S_2$.
2.
$\textit{TA}_1 \preceq ^{t}\textit{TA}_2$ ($\textit{TA}_2$ time-simulates $\textit{TA}_1$), if there is an initial timed simulation $H \subseteq Q_1 \times Q_2$.
3.
$\textit{TA}_1 \approx ^{t}\textit{TA}_2$ (time-bisimilar), if there is an initial timed bisimulation $B \subseteq Q_1 \times Q_2$.
4.
$\textit{TA}_1 \simeq ^{t}\textit{TA}_2$ (time-simulation equivalent), if $\textit{TA}_1 \preceq ^{t}\textit{TA}_2$ and $\textit{TA}_2 \preceq ^{t}\textit{TA}_1$.

Timed bisimulation forces time steps to advance clocks by the same amount of time. A coarser relation — called time-abstracting bisimulation [37] — allows two transition systems to advance clocks at “different speeds”. Given two timed automata $\textit{TA}_i$, for $i\in \{1,2\}$ and the respective transition systems $\textit{TS}(\textit{TA}_i) = (Q_i, Q_i^0, \varDelta _i, \lambda _i)$, a binary relation $B \subseteq Q_1 \times Q_2$ is a time-abstracting bisimulation [37], if the following holds for every pair $(q_1,q_2) \in B$:

1.
The labels coincide: $\lambda _1(q_1) = \lambda _2(q_2)$;
2.
For all j and k such that $\{j,k\} = \{1,2\}$, and each discrete step $q_j \rightarrow _{\varDelta _j} r_j$, there is a discrete step $q_k \rightarrow _{\varDelta _k} r_k$ and $(r_j, r_k) \in B$;
3.
For all j and k such that $\{j,k\} = \{1,2\}$, a delay $\delta \in {\mathbb {R}_{\ge 0}}$, and a time step $q_j \xrightarrow {\delta }_{\varDelta _j} r_j$, there is a delay $\delta ' \in {\mathbb {R}_{\ge 0}}$ and a time step $q_k \xrightarrow {\delta '}_{\varDelta _k} r_k$ such that $(r_j, r_k) \in B$.

By substituting $\delta '$ with $\delta $, one obtains the definition of timed bisimulation.

3 Asynchronous Message Passing Systems

Timed automata as defined above neither capture processes nor communication via messages, as would be required to model distributed algorithms. Hence we now introduce these notions and then construct an asynchronous system using processes and message passing (or message counting). We assume that at every step a process receives and sends at most one message [19]. In Sect. 4, we add time to this modeling in order to obtain a timed automaton.

Single Correct Process. We assume a (possibly infinite) set of control states $\mathcal{{L}}$ and a subset $\mathcal{{L}}_0 \subseteq \mathcal{{L}}$ of initial control states. We fix a finite set $\mathsf {MT}$ of message types. We assume that the control states in $\mathcal{{L}}$ keep track of the messages sent by a process. Thus, $\mathcal{{L}}$ comes with a predicate $\mathsf {is\_sent}:\mathcal{{L}}\times \mathsf {MT}\rightarrow {\mathbb {B}}$, where $\mathsf {is\_sent}(\ell , m)$ evaluates to true if and only if a message of type m has been sent according to the control state $\ell $. Finally, we introduce a set $\varPi $ of parameters and store the parameter values in a vector $\mathbf {p}\in {\mathbb {N}_0}^{{|\varPi |}}$. As noted in [22], parameter values are typically restricted with a resilience condition such as $n > 3t$ (less than a third of the processes are faulty), so we will assume that there is a set of all admissible combinations of parameter values $\mathbf {P}_{RC}\subseteq {\mathbb {N}_0}^{{|\varPi |}}$.

The behavior of a single process is defined as a process transition relation $\mathcal{{T}}\subseteq \mathcal{{L}}\times {\mathbb {N}_0}^{|\varPi |} \times {\mathbb {N}_0}^{|\mathsf {MT}|} \times \mathcal{{L}}$ encoding transitions guarded by conditions on message counters that range over ${\mathbb {N}_0}^{|\mathsf {MT}|}$: when $(\ell , \mathbf {p}, \mathbf {c}, \ell ') \in \mathcal{{T}}$, a process can make a transition from the control state $\ell $ to the control state $\ell '$, provided that, for every $m \in \mathsf {MT}$, the number of received messages of type m is greater than or equal to $\mathbf {c}(m)$ in a configuration with parameter values $\mathbf {p}$.

Example 3.1

The process shown in Fig. 1 can be written in our definitions as follows. The algorithm is using only one message type, and thus $\mathsf {MT}= \{\textsf {ECHO}\}$. We assume a set of control states $\mathcal{{L}}= \{\textsf {V0}, \textsf {V1}, \textsf {SE}, \textsf {AC}\}$: $\textsf {V0}$ and $\textsf {V1}$ encode the initial states where $ myval =0$ and $ myval =1$ respectively, ${\mathsf {pc}}=\textsf {SE}$ encodes the status “$\textsf {ECHO}$ sent before”, and ${\mathsf {pc}}=\textsf {AC}$ encodes the status “accept”. The initial control states are: $\mathcal{{L}}_0 = \{\textsf {V0}, \textsf {V1}\}$. The transition relation contains four types of transitions: $t_1^\mathbf {p}= (\textsf {V0}, \mathbf {p}, \mathbf {c}_1, \textsf {SE})$, $t_2^\mathbf {p}= (\textsf {V0}, \mathbf {p}, \mathbf {c}_2, \textsf {AC})$, $t_3^\mathbf {p}= (\textsf {V1}, \mathbf {p}, \mathbf {c}_3, \textsf {SE})$, and $t_4^\mathbf {p}= (\textsf {SE}, \mathbf {p}, \mathbf {c}_2, \textsf {AC})$, for any $\mathbf {p}\in {\mathbb {N}_0}^{|\varPi |}$ and $\mathbf {c}_1,\mathbf {c}_2,\mathbf {c}_3$ satisfying the following: $\mathbf {c}_1(\textsf {ECHO}) \ge \mathbf {p}(t) + 1$, $\mathbf {c}_2(\textsf {ECHO}) \ge \mathbf {p}(n) - \mathbf {p}(t)$, and $\mathbf {c}_3(\textsf {ECHO}) \ge 0$. Finally, $\mathsf {is\_sent}(\ell , \textsf {ECHO})$ iff $\ell \in \{\textsf {SE}, \textsf {AC}\}$. A concise graphical representation of the transition relation is given in Fig. 3. There, each edge represents multiple transitions of the same type. Let us observe that while the action of sending a message can be inferred by simply checking all the transitions going from a state s to a state t such that $\lnot \mathsf {is\_sent}(s)$ and $\mathsf {is\_sent}(t)$, the action of receiving an individual message is not part of the process description at this level. However, if a guarded transition is taken, this implies that a threshold has been reached, e.g., in case of $\mathbf {c}_1$, at least $t+1$ messages were received. $\lhd $

We make two assumptions typical for distributed algorithms [19, 35]:

A1:: Processes do not forget that they have sent messages: If $(\ell , \mathbf {p}, \mathbf {c}, \ell ') \in \mathcal{{T}}$, then $\mathsf {is\_sent}(\ell , m) \rightarrow \mathsf {is\_sent}(\ell ', m)$ for every $m \in \mathsf {MT}$.
A2:: At each step a process sends at most one message to all: If $(\ell , \mathbf {p}, \mathbf {c}, \ell ') \in \mathcal{{T}}$ and $\lnot \mathsf {is\_sent}(\ell , m) \wedge \mathsf {is\_sent}(\ell ', m) \wedge \lnot \mathsf {is\_sent}(\ell , m') \wedge \mathsf {is\_sent}(\ell ', m')$ then $m=m'$.

Then, we call $(\mathsf {MT}, \mathcal{{L}}, \mathcal{{L}}_0, \mathcal{{T}})$ a process template.

Table 1. The message-passing and message-counting interpretations

Full size table

Asynchronous Message Passing and Counting in Presence of Byzantine Faults. In this section we introduce two ways of modeling message passing: by storing messages in sets, and by counting messages. As in [23], we do not explicitly model Byzantine processes [32], but capture their effect on the correct processes in the form of spurious messages. Although we do not discuss other kinds of faults (e.g., crashes, symmetric faults, omission faults), it is not hard to model other faults by following the modeling in [23].

We fix a set of processes $\mathsf {Proc}$, which is typically defined as $\{1, \dots , n\}$ for $n \ge 1$. Further, assume that there are two disjoint sets: the set $\mathsf {Corr}\subseteq \mathsf {Proc}$ of correct processes, and $\mathsf {Byz}\subseteq \mathsf {Proc}$ of Byzantine processes (possibly empty), with $\mathsf {Byz}\ \cup \ \mathsf {Corr}= \mathsf {Proc}$. Given a process template $(\mathsf {MT}, \mathcal{{L}}, \mathcal{{L}}_0, \mathcal{{T}})$, we refer to $(\mathsf {MT},\mathcal{{L}}, \mathcal{{L}}_0, \mathcal{{T}}, \mathsf {Corr}, \mathsf {Byz})$ as a design. Note that a design does not capture how processes interact with messages. To do so, in Table 1, we define message passing (MP) and message counting (MC) models as interpretations of the signature $( Msg , MsgSets , init , card , \textit{add}, inTransit )$, with the following informal meaning:

$ Msg $: the set of all messages that can be exchanged by the processes,
$ MsgSets $: collections of messages,
$ init $: the empty collection of messages,
$ card $: a function that counts messages of the given type,
$\textit{add}$: a function that adds a message to a collection of messages,
$ inTransit $: a function that checks whether a message is in transit and thus can be received.

Transition Systems. Fix interpretations $( Msg _I, MsgSets _I, init _I, card _I, \textit{add}_I, inTransit _I)$ for $I \in \{ MP , MC \}$. Then, we define a transition system $\textit{TS}^I = (S^{I}, S_0^{I}, R^{I}, L^{I})$ of processes from $\mathsf {Proc}$ that communicate with respect to interpretation I. We call message-passing system the transition system obtained using the interpretation MP, and message-counting system the transition system obtained using the interpretation MC.

The set $S^{I}$ contains configurations, i.e., tuples $(\mathbf {p}, {\mathsf {pc}}, \mathsf {rcvd}, \mathsf {sent})$ having the following properties: (a) $\mathbf {p}\in {\mathbb {N}_0}^{{|\varPi |}}$, (b) ${\mathsf {pc}}: \mathsf {Corr}\rightarrow \mathcal{{L}}$, (c) $\mathsf {rcvd}: \mathsf {Corr}\rightarrow MsgSets _I$, and (d) $\mathsf {sent}\in MsgSets _I$. In a configuration, for every process $p \in \mathsf {Corr}$, the values ${\mathsf {pc}}(p)$ and $\mathsf {rcvd}(p)$ comprise the local view of the process p, while the components $\mathsf {sent}$ and $\mathbf {p}$ comprise the shared state of the distributed system. A configuration $\sigma \in S^{I}$ belongs to the set $S^{I}_0$ of initial configurations, if for each process $p \in \mathsf {Corr}$, it holds that: (a) $\sigma .{\mathsf {pc}}(p) \in \mathcal{{L}}_0$, (b) $\sigma .\mathsf {rcvd}(p) = init _{I}$, (c) $\sigma .\mathsf {sent}= init _{I}$, and (d) $\sigma .\mathbf {p}\in \mathbf {P}_{RC}$.

Definition 3.2

The transition relation $R^{I}$ contains a pair of configurations $(\sigma , \sigma ') \in S^{I} \times S^{I}$, if there is a correct process $p \in \mathsf {Corr}$ that satisfies:

1.
There exists a local transition $(\ell , \mathbf {p}, \mathbf {c}, \ell ') \in \mathcal{{T}}$ satisfying $\sigma .{\mathsf {pc}}(p) = \ell $ and $\sigma '.{\mathsf {pc}}(p) = \ell '$ and for all m in $\mathsf {MT}$, $\mathbf {c}(m) = card _I(m, \sigma '.\mathsf {rcvd}(p))$. Also, it is required that $\sigma .\mathbf {p}= \sigma '.\mathbf {p}= \mathbf {p}$.
2.
Messages are received and sent according to the signature:
1. (a)
  Process p receives no message: $\sigma '.\mathsf {rcvd}(p) = \sigma .\mathsf {rcvd}(p)$, or there is a message in transit in $\sigma $ that is received in $\sigma '$, i.e., there is a message $ msg \in Msg _I$ satisfying:
  
  $ inTransit _I( msg , \sigma .\mathsf {rcvd}(p), \sigma .\mathsf {sent}) \; \wedge \; \sigma '.\mathsf {rcvd}(p) = \textit{add}_I( msg , \sigma .\mathsf {rcvd}(p)).$
2. (b)
  The shared variable $\mathsf {sent}$ is changed iff process p sends a message, that is, $\sigma '.\mathsf {sent}= \textit{add}_I( msg , \sigma .\mathsf {sent})$, if and only if $\lnot \mathsf {is\_sent}(\sigma .{\mathsf {pc}}(p), m)$ and $\mathsf {is\_sent}(\sigma '.{\mathsf {pc}}(p), m)$, for every $m \in \mathsf {MT}$ and $ msg \in Msg _I$ of type m.
3.
The processes different from p do not change their local states: $\sigma '.{\mathsf {pc}}(q) = \sigma .{\mathsf {pc}}(q)$ and $\sigma '.\mathsf {rcvd}(q) = \sigma .\mathsf {rcvd}(q)$ for $q \in \mathsf {Corr}\setminus \{p \}$.

The labeling function $L^{I}: S^{I} \rightarrow \mathcal{{L}}^{|\mathsf {Corr}|} \times \left( {\mathbb {N}_0}^{|\mathsf {MT}|} \right) ^ { |\mathsf {Corr}|}$ labels each configuration $\sigma \in S^{I}$ with the vector of control states and message counters, i.e., $L^I(\sigma ) = ((\ell _1, \dots , \ell _{|\mathsf {Corr}|}), (\mathbf {c}_1,\dots ,\mathbf {c}_{|\mathsf {Corr}|}))$ such that $\ell _p = \sigma .{\mathsf {pc}}(p)$ and $\mathbf {c}_p(m) = card _I(m, \sigma .\mathsf {rcvd}(p))$ for $p \in \mathsf {Corr}$, $m \in \mathsf {MT}$. (For simplicity we use the convention that $\mathsf {Corr}= \{1, \dots j \}$, for some $j \in \mathbb {N}$.) Note that $L^I$ labels a configuration with the process control states and the number of messages received by each process.

The message-passing transition systems have the following features. The messages sent by correct processes are stored in the shared set $\mathsf {sent}$. In this modeling, the messages from Byzantine processes are not stored in $\mathsf {sent}$ explicitly, but can be received at any step. Each correct process $p \in \mathsf {Corr}$ stores received messages in its local set $\mathsf {rcvd}(p)$, whose elements originate from the messages stored in the set $\mathsf {sent}$ or from Byzantine processes.

The message-counting transition systems have the following features. Messages are not stored explicitly, but are only counted. We maintain two vectors of counters: (i) representing the number of messages that originate from correct processes (these messages have the tag $\mathsf {C}$), and (ii) representing the number of messages that originate from faulty processes (these messages have the tag $\mathsf {F}$). Each correct process $p \in \mathsf {Corr}$ keeps two such vectors of counters $\mathbf {c}_\mathsf {C}$ and $\mathbf {c}_\mathsf {F}$ in its local variable $\mathsf {rcvd}(p)$. In the following, we refer to $\mathbf {c}_\mathsf {C}$ and $\mathbf {c}_\mathsf {F}$ using the notation $[{\mathsf {rcvd}(p)} ]_{\mathsf {C}}$ and $[{\mathsf {rcvd}(p)} ]_{\mathsf {F}}$. The number of sent messages is also stored as a pair of vectors $[{\mathsf {sent}} ]_{\mathsf {C}}$ and $[{\mathsf {sent}} ]_{\mathsf {F}}$. By the definition of the transition relation $R^{\textsf {MC}}$, the vector $[{\mathsf {sent}} ]_{\mathsf {F}}$ is always equal to the zero vector, whereas the correct process p can increment its counter $[{\mathsf {rcvd}(p)} ]_{\mathsf {F}}$, if $[{\mathsf {rcvd}(p)} ]_{\mathsf {F}}(m) < |\mathsf {Byz}|$, for every $m \in \mathsf {MT}$.

To prove bisimulation between a message-passing system and a message-counting system — built from the same design — we introduce the following relation on the configurations of both systems:

Definition 3.3

Let $H^{\#}\subseteq S^\textsf {MP}\times S^\textsf {MC}$ such that $(\sigma ,\sigma ^\#) \in H^{\#}$ if for all processes $p\in \mathsf {Corr}$ and message types $m\in \mathsf {MT}$:

1.
$\sigma ^\#.{\mathsf {pc}}(p) = \sigma .{\mathsf {pc}}(p)$
2.
$\sigma ^\#.[{\mathsf {rcvd}(p)} ]_{\mathsf {C}}(m) = |\{ q \in \mathsf {Corr}:\left\langle m, q \right\rangle \in \sigma .\mathsf {rcvd}(p) \}|$
3.
$\sigma ^\#.[{\mathsf {rcvd}(p)} ]_{\mathsf {F}}(m) = |\{ q \in \mathsf {Byz}:\left\langle m, q \right\rangle \in \sigma .\mathsf {rcvd}(p) \}|$
4.
$\sigma ^\#.[{\mathsf {sent}} ]_{\mathsf {C}}(m) = |\{ q \in \mathsf {Corr}:\left\langle m, q\right\rangle \in \sigma .\mathsf {sent}\}|$
5.
$\sigma ^\#.[{\mathsf {sent}} ]_{\mathsf {F}}(m) = 0$
6.
$\{ q \in \mathsf {Proc}:\left\langle m, q \right\rangle \in \sigma .\mathsf {sent}\} \subseteq \mathsf {Corr}$
7.
$ \sigma .\mathsf {rcvd}(p) \subseteq \sigma .\mathsf {sent}\cup \{\left\langle m, q\right\rangle :m \in \mathsf {MT}, q\in \mathsf {Byz}\}$
8.
$\mathsf {is\_sent}(\sigma .{\mathsf {pc}}(p), m) \leftrightarrow \left\langle m, p\right\rangle \in \sigma .\mathsf {sent}$

Theorem 3.4

For a message-passing system $\textit{TS}^{\,\textsf {MP}}$ and a message-counting system $\textit{TS}^{\,\textsf {MC}}$ defined over the same design, $H^{\#}$ is a bisimulation.

The key argument to prove the Theorem 3.4 is that given a message counting state $\sigma ^\#$, if a step increases a counter $\mathsf {rcvd}(p)$, in the message passing system this transition can be mirrored by receiving an arbitrary message in transit. In fact, in both systems, once a message is sent it can be received at any future step. We will see that in the timed version this argument does not work anymore, due to the restricted time interval in which a message must be received.

4 Messages with Time Constraints

We now add time constraints to both, message-passing systems and message-counting systems. Following the definitions from distributed algorithms [35, 40], we assume that every message is delivered within a predefined time bound, that is, not earlier than $\tau ^-$ time units and not later than $\tau ^+$ times units since the instant it was sent, with $ 0 \le \tau ^-\le \tau ^+$. We use naturals for $\tau ^-$ and $\tau ^+$ for consistency with the literature on timed automata.

As can be seen from Sect. 2, to define a timed automaton, one has to provide an invariant and a switch relation. In the following, we fix the invariants and switch relations with respect to the timing constraints $\tau ^-$ and $\tau ^+$ on messages. However, the specifications of distributed algorithms may refer to time, e.g., “If a correct process accepts the message (round k) at time t, then every correct process does so by time $t+t_{\mathrm {del}}$” [35]. Therefore, we assume that a specification invariant (or user invariant) $I_U: 2^\mathsf {AP}\rightarrow \varPsi (U)$ and a specification switch relation (or user switch relation) $E_U: 2^\mathsf {AP}\times 2^\mathsf {AP}\rightarrow \varPsi (U) \times 2^U$ are given as input. Then, we will refer to the tuple $(\mathcal{{L}}, \mathcal{{L}}_0, \mathcal{{T}}, \mathsf {Proc}, I_U, E_U)$ as a timed design and we will assume that a timed design is fixed in the following.

Using a timed design, we will use message-passing and message-counting systems to derive two timed automata. For a message of type m sent by a correct process p, the message-passing system uses a clock $c\left\langle m,p \right\rangle $ to store the delay since the message $\left\langle m, p\right\rangle $ was sent. The message-counting system stores the delay since the ith message of type m was sent, for all i and m. Both timed automata specify an invariant to constrain the time required to deliver a message.

Definition 4.1

(Message-passing timed automaton). Given a message-passing system $\textit{TS}^{\,\textsf {MP}}=(S^{\textsf {MP}}, S^{\textsf {MP}}_0, R^{\textsf {MP}}, L^{\textsf {MP}})$ defined over a timed system design $(\mathcal{{L}}, \mathcal{{L}}_0, \mathcal{{T}}, \mathsf {Proc}, I_U, E_U)$, we say that a timed automaton $\textit{TA}^\textsf {MP}=(S^{\textsf {MP}}, S^{\textsf {MP}}_0, R^{\textsf {MP}}, L^{\textsf {MP}}, U\cup X^{\textsf {MP}}, I^{\textsf {MP}}, E^{\textsf {MP}})$ is a message-passing timed automaton, if it has the following properties:

1.
There is one clock per message that can be sent by a correct process: $X^{\textsf {MP}}= \{c\left\langle m,p \right\rangle :m \in \mathsf {MT}, p \in \mathsf {Corr}\}$.
2.
For each discrete transition $(\sigma , \sigma ') \in R^{\textsf {MP}}$, the state switch relation $E^{\textsf {MP}}(\sigma , \sigma ')$ ensures the specification invariant and resets the given specification clocks and the clocks corresponding to the message sent in transition $(\sigma , \sigma ')$. That is, if $(\varphi _U, Y_U)$ is the guard, and specification clocks are in $E_U(L^{\textsf {MP}}(\sigma ), L^{\textsf {MP}}(\sigma '))$, then $E^{\textsf {MP}}(\sigma , \sigma ') = (\varphi _U, Y_U \cup \{c\left\langle m,p \right\rangle :\left\langle m,p\right\rangle \in \sigma '.\mathsf {sent}\setminus \sigma .\mathsf {sent}\})$.
3.
Each state $\sigma \in S^{\textsf {MP}}$ has the invariant $I^{\textsf {MP}}(\sigma ) = I_U(L^{\textsf {MP}}(\sigma )) \wedge \varphi ^-_\textsf {MP}\wedge \varphi ^+_\textsf {MP}$ composed of:
1. (a)
  the specification invariant $I_U(L^{\textsf {MP}}(\sigma ))$;
2. (b)
  the lower bound on the age of received messages: $\varphi ^-_\textsf {MP}= \bigwedge _{\left\langle m,p\right\rangle \in M} c\left\langle m,p \right\rangle \ge \tau ^-$ for $M= \{\left\langle m,p\right\rangle \in \mathsf {MT}\times \mathsf {Corr}:\exists q \in \mathsf {Corr}.\ \left\langle m,p\right\rangle \in \sigma .\mathsf {rcvd}(q) \}$; and
3. (c)
  the upper bound on the age of messages that are in transit: $\varphi ^+_\textsf {MP}= \bigwedge _{(m,p) \in M} 0 \le c\left\langle m,p \right\rangle \le \tau ^+$ for $M=\{\left\langle m, p\right\rangle \in \mathsf {MT}\times \mathsf {Corr}:\left\langle m, p\right\rangle \in \sigma .\mathsf {sent}\setminus \bigcap _{q \in \mathsf {Corr}} \sigma .\mathsf {rcvd}(q)\}$.

Definition 4.2

(Message-counting timed automaton). Given a message-counting system $\textit{TS}_\textsf {MC}=(S^{\textsf {MC}}, S^{\textsf {MC}}_0, R^{\textsf {MC}}, L^{\textsf {MC}})$ defined over a timed design $(\mathcal{{L}}, \mathcal{{L}}_0, \mathcal{{T}}, \mathsf {Proc}, I_U, E_U)$, we say that a timed automaton $\textit{TA}_\textsf {MC}=(S^{\textsf {MC}}, S^{\textsf {MC}}_0, R^{\textsf {MC}}$, $L^{\textsf {MC}}, U\cup X^{\textsf {MC}}, I^{\textsf {MC}}, E^{\textsf {MC}})$ is a message-counting timed automaton, if it has the following properties:

1.
There is one clock per message type and number of messages sent. That is, $X^{\textsf {MC}}= \{c\left\langle m, i\right\rangle :m \in \mathsf {MT}, 1 \le i \le |\mathsf {Corr}| \}$.
2.
For each discrete transition $(\sigma , \sigma ') \in R^{\textsf {MC}}$, the state switch relation $E^{\textsf {MC}}(\sigma , \sigma ')$ ensures the specification invariant and resets the given specification clocks and the clocks corresponding to message counters updated by $(\sigma , \sigma ')$. That is, if $(\varphi _U, Y_U) = E_U(L^{\textsf {MC}}(\sigma ), L^{\textsf {MC}}(\sigma '))$, then the switch relation $E^{\textsf {MC}}(\sigma , \sigma ')$ is $(\varphi _U, Y_U \cup \{ c\left\langle m, k\right\rangle :m \in \mathsf {MT}, k = \sigma '.\mathsf {sent}(m) = \sigma .\mathsf {sent}(m) + 1\})$.
3.
Each state $\sigma \in S^{\textsf {MC}}$ has the invariant $I^{\textsf {MC}}(\sigma ) = I_U(L^{\textsf {MC}}(\sigma )) \wedge \varphi ^-_\textsf {MC}\wedge \varphi ^+_\textsf {MC}$ composed of:
1. (a)
  the specification invariant $I_U(L^{\textsf {MC}}(\sigma ))$;
2. (b)
  $\varphi ^-_\textsf {MC}= \bigwedge _{m \in \mathsf {MT}} a(m) > 0 \rightarrow c\left\langle m, a(m) \right\rangle \ge \tau ^-$ for the numbers $a(m) = \max _{p \in \mathsf {Corr}} [{\sigma .\mathsf {rcvd}(p)(m)} ]_{\mathsf {C}}$. If a correct process has received a(m) messages of type m from correct processes, then the a(m)-th message of type m, for every $m \in \mathsf {MT}$, was sent at least $\tau ^-$ time units earlier.
3. (c)
  $\varphi ^+_\textsf {MC}= \bigwedge _{m\in \mathsf {MT}} \bigwedge _{b(m) < j \le \sigma .\mathsf {sent}(m)} 0 \le c\left\langle m, j\right\rangle \le \tau ^+$ for the numbers $b(m) = \min _{p \in \mathsf {Corr}} [{\sigma .\mathsf {rcvd}(p)(m)} ]_{\mathsf {C}}$. If there is a correct process that has received b(m) messages of type m from correct processes, then for every number of messages $j > b(m)$, the respective clock is bounded by $\tau ^+$.

While the number of employed clocks is the same, the latter model is “more abstract”: by forgetting the identity of the sender, indeed, several configurations of the message-passing timed automaton can be mapped on the same configuration of the message-counting timed automaton.

5 Precision of Message Counting with Time Constraints

While Theorem 3.4 establishes a strong equivalence — that is, a bisimulation relation — between message-passing transition systems, we will show in Theorem 5.1 that message-passing timed automata and message-counting timed automata are not necessarily equivalent in the sense of timed bisimulation. Remarkably, such automata are also not necessarily equivalent in the sense of time-abstracting bisimulation. These results show an upper bound on the degree of precision achievable by model checking of timed properties of FTDAs by counting messages. Nevertheless, we show that such automata simulate each other, and thus they satisfy the same $\mathsf {ATCTL}$ formulas (Corollarys 5.10 and 6.2).

Theorem 5.1

There exists a timed design whose message-passing timed automaton$ TA ^{\textsf {MP}}$ and message-counting timed automaton $ TA ^{\textsf {MC}}$ satisfy:

1.
There is no initial timed bisimulation between $ TA ^{\textsf {MP}}$ and $ TA ^{\textsf {MC}}$.
2.
There is no initial time-abstracting bisimulation between $ TA ^{\textsf {MP}}$ and $ TA ^{\textsf {MC}}$.

Proof

(sketch). We give an example of a timed design proving Point 2. Since timed bisimulation is a special case of time-abstracting bisimulation, this example also proves Point 1.

We use the process template shown in Fig. 4 on page 7. Formally, this template is defined as follows: there is one parameter, i.e., $\varPi =\{n\}$, one message type, i.e., $\mathsf {MT}=\{M\}$, and two control states, i.e., $\mathcal{{L}}=\{\ell _0, \ell _1\}$. There are two types of transitions: $t_1^\mathbf {p}= (\ell _0, \mathbf {p}, \mathbf {c}_4, \ell _1)$ and $t_2^\mathbf {p}= (\ell _1, \mathbf {p}, \mathbf {c}_5, \ell _1)$. The conditions $\mathbf {c}_4$ and $\mathbf {c}_5$ require that $\mathbf {c}_4(M) = 0$ and $\mathbf {c}_5(M) \ge 0$ respectively. Every process sends a message of type M when going from $\ell _0$ to $\ell _1$, i.e., $\mathsf {is\_sent}(\ell , M) = \mathsf {T}$ iff $\ell = \ell _1$. Then the processes self-loop in the control state $\ell _1$ (by doing so, they can receive messages from the other processes).

Consider the system of two correct processes and no Byzantine processes, that is, $\mathsf {Corr}=\{1,2\}$ and $\mathsf {Byz}=\emptyset $. We fix the upper bound on message delays to be $\tau ^+= 2\tau ^-> 0$. For the sake of this proof, we set $U=\emptyset $, and thus $I_U$ and $E_U$ are defined trivially. Together, these constraints define a timed design.

Figure 5 illustrates two runs of a $ TA ^{\textsf {MP}}$ and a run of $ TA ^{\textsf {MC}}$ that should be matched by a time-abstracting bisimulation, if one exists. We show by contradiction that no such relation exists. Note that the message $\left\langle M,1\right\rangle $ has been received by all processes at the timed state $q_7$ and has not been received by the first process at the timed state $q''_7$. Thus the timed state $q_7$ admits a time step, while the timed state $q''_7$ does not. Indeed, on one hand, the timed automaton $ TA ^{\textsf {MP}}$ can advance the clocks by at most $\tau ^+- \tau ^-= \tau ^-$ time units in $q_7$ before the clock attached to the message $\left\langle M,2 \right\rangle $ expires; on the other hand, in $q''_7$, the timed automaton $ TA ^{\textsf {MP}}$ cannot advance the clocks before the clock attached to the message $\left\langle M,1 \right\rangle $ expires. However, both states must be time-abstract related to the state $r_7$ of $ TA ^{\textsf {MC}}$, because they both received the same number of messages of type M and thus their labels coincide, from which we derive the required contradiction. Hence, proving that there is no time-abstracting bisimulation. $\square $

From Theorem 5.1, it follows that message counting abstraction is not precise enough to preserve an equivalence relation as strong as bisimulation. However, for abstraction-based model checking a coarser relation, namely, timed-simulation equivalence, would be sufficient. In one direction, timed-simulation is easy: a discrete configuration of a message-passing timed automaton can be mapped to the configuration of the message-counting timed automaton by just counting the messages for each message type, while the clocks assignments are kept the same. The other direction is harder: A first approach would be to map a configuration of a message-counting timed automaton to all the configurations of the message-passing timed automaton, where the message counters are equal to the cardinalities of the sets of received messages. This mapping is problematic because of the interplay of message re-ordering and timing constraints:

Example 5.2

Figure 6 exemplifies a problematic behavior that originates from the interplay of message re-ordering and timing constraints on message delays. In the figure we see the space-time diagram of two timed message passing runs, where first process 1 sends $\left\langle M,1 \right\rangle $ at instant $t_1$, and then process 2 sends $\left\langle M,2 \right\rangle $ at a later time $t_2>t_1$. In the run on the left, process 3 receives $\left\langle M,2 \right\rangle $ at instant $t_3$ and has not received $\left\langle M,1 \right\rangle $ before. In the run on the right process 3 receives $\left\langle M,1 \right\rangle $ at instant $t_3$. Hence, at $t_3$ on the left $\left\langle M,1 \right\rangle $ is in transit, while on the right $\left\langle M,2 \right\rangle $ is in transit, which has been sent after $\left\langle M,1 \right\rangle $. As indicated by the $\tau ^+$ intervals, due to the invariants from Definition 4.1[3c], the left run is more restricted: On the left within one time step the clocks can be advanced by $\tau ^+- (t_3 - t_1)$ while on the right the clocks can advance further, namely, by $ \tau ^+- (t_3 - t_2) > \tau ^+- (t_3 - t_1)$. Message counting timed automata abstract away the origin of the messages, and intuitively, relate the sending of the ith message to the reception of i messages, which correspond to runs where messages are received “in order”, like in the run on the right. We shall formalize this below. $\lhd $

In the following, we exclude from the simulation relation those states where an in-transit message has been sent before a received one, and only consider so-called well-formed states where the messages are received in the chronological order of the sending (according to the clocks of timed automata). Indeed, we use the fact that the timing constraints of well-formed states in the message-passing system match the timing constraints in the message-counting system.

Definition 5.3

(Well-formed state). For a message-passing timed automaton $ TA ^{\textsf {MP}}$ with $\textit{TS}( TA ^{\textsf {MP}})= (Q, Q_0, \varDelta , \lambda )$, a state $(s, \mu , \nu ) \in Q$ is well-formed, if for each message type $m \in \mathsf {MT}$, each process $p \in \mathsf {Corr}$ that has received a message $ \left\langle m, p' \right\rangle $ has also received all messages of type m sent earlier than $ \left\langle m, p' \right\rangle $:

$$\begin{aligned} \left\langle m, p' \right\rangle \in s.\mathsf {rcvd}(p) \wedge \mu (c\left\langle m, p'' \right\rangle )&> \mu (c\left\langle m, p' \right\rangle )\\ \nonumber&\rightarrow \left\langle m, p''\right\rangle \in s.\mathsf {rcvd}(p) \text{ for } p', p'' \in \mathsf {Corr}\end{aligned}$$

(1)

Observe that because messages can be sent at precisely the same time, there can be different well-formed states s and $s'$ with $s.\mathsf {rcvd}(p) \ne s'.\mathsf {rcvd}(p)$. Also, considering only well-formed states does not imply that the messages are received according to the sending order in a run (which would correspond to FIFO).

We will use a mapping $\mathsf {WF}$ to abstract arbitrary states of any message passing timed automaton to sets of well-formed states in the same automaton.

Definition 5.4

Given a message-passing timed automaton $ TA ^{\textsf {MP}}$ with the transition system $\textit{TS}( TA ^{\textsf {MP}})= (Q, Q_0, \varDelta , \lambda )$, we define a mapping $\mathsf {WF}: Q\rightarrow 2^{Q}$ that maps an automaton state $(s, \mu , \nu ) \in Q$ into a set of well-formed states with each $(s', \mu ', \nu ') \in \mathsf {WF}((s, \mu , \nu ))$ having the following properties:

1.
$\mu ' = \mu $, $\nu ' = \nu $, $s'.\mathsf {sent}= s.\mathsf {sent}$, and $s.{\mathsf {pc}}(p) = s'.{\mathsf {pc}}(p)$ for $p \in \mathsf {Corr}$, and
2.
$|\{q :\left\langle m, q\right\rangle \in s'.\mathsf {rcvd}(p)\}| = |\{q :\left\langle m, q\right\rangle \in s.\mathsf {rcvd}(p)\}|$ for $m \in \mathsf {MT}, p \in \mathsf {Corr}$.

One can show that every timed state $q \in Q$ has at least one state in $\mathsf {WF}(q)$:

Proposition 5.5

Let $ TA ^{\textsf {MP}}$ be a message-passing timed automaton, and

$\textit{TS}( TA ^{\textsf {MP}})=(Q, Q_0, \varDelta , \lambda )$. For every state $q \in Q$, the set $\mathsf {WF}(q)$ is not empty.

Using Proposition 5.5, one can show that the well-defined states simulate all the timed states of a message-passing timed automaton:

Theorem 5.6

If $ TA ^{\textsf {MP}}$ is a message-passing timed automaton, and if $\textit{TS}( TA ^{\textsf {MP}})= (Q, Q_0, \varDelta , \lambda )$, then $\{(q, r) :q \in Q, r \in \mathsf {WF}(q)\}$ is an initial timed simulation.

Theorem 5.6 suggests that timed automata restricted to well-formed states might help us in avoiding the negative result of Theorem 5.1. To this end, we introduce a well-formed message-passing timed automaton. Before that, we note that Eq. (1) of Definition 5.3 can be transformed to a state invariant. We denote such a state invariant as $I^{\textsf {WF}}$.

Definition 5.7

(Well-formed MPTA). Given a message-passing timed automaton $\textit{TA}^\textsf {MP}=(S, S_0, R, L, U\cup X, I, E)$, its well-formed restriction $ TA ^{\textsf {MP}}_{\textsf {WF}}$ is the timed automaton $(S, S_0, R, L, U\cup X, I\wedge I^{\textsf {WF}}, E)$.

Since the well-formed states are included in the set of timed states, and the well-formed states simulate timed states (Theorem 5.6), we obtain the following:

Corollary 5.8

Let $ TA ^{\textsf {MP}}$ be a message-passing timed automaton and $ TA ^{\textsf {MP}}_{\textsf {WF}}$ be its well-formed restriction. These timed automata are timed-simulation equivalent: $ TA ^{\textsf {MP}}\simeq ^{t} TA ^{\textsf {MP}}_{\textsf {WF}}$.

As a consequence of Theorems 3.4, 5.6, and Corollary 5.8, one obtains that there is a timed bisimulation equivalence between a well-formed message-passing timed automaton and the corresponding message-counting timed automaton, which is obtained by forgetting the sender of the messages and just counting the sent and delivered messages.

Theorem 5.9

Let $ TA ^{\textsf {MP}}$ be a message-passing timed automaton and $ TA ^{\textsf {MC}}$ be a message-counting timed automaton defined over the same timed system design. Further, let $ TA ^{\textsf {MP}}_{\textsf {WF}}$ be the well-formed restriction of $ TA ^{\textsf {MP}}$. There exists an initial timed bisimulation: $ TA ^{\textsf {MP}}_{\textsf {WF}}\approx ^{t} TA ^{\textsf {MC}}$.

By collecting Theorem 5.9 and Corollary 5.8 we conclude that there is a timed simulation equivalence between MPTA and MCTA:

Corollary 5.10

Let $ TA ^{\textsf {MP}}$ be a message-passing timed automaton and $ TA ^{\textsf {MC}}$ be a message-counting timed automaton defined over the same timed system design. $ TA ^{\textsf {MP}}$ and $ TA ^{\textsf {MC}}$ are timed-simulation equivalent: $ TA ^{\textsf {MP}}\simeq ^{t} TA ^{\textsf {MC}}$.

Figure 7 uses arrows to depict the timed simulations presented in this work.

6 Conclusions

Asynchronous Systems. For systems considered in Sect. 3, we conclude from Theorem 3.4 that message-counting systems are detailed enough for model checking of properties written in $\mathsf {CTL}^{*}$:

Corollary 6.1

For a $\mathsf {CTL^\star }$ formula $\varphi $, a message-passing system $\textit{TS}^{\,\textsf {MP}}$ and a message-counting system $\textit{TS}^{\,\textsf {MC}}$ defined over the same design, .

The corollary implies that the message counting abstraction does not introduce spurious behavior. In contrast, data and counter abstractions introduced in [22] may lead to spurious behavior as only simulation relations have been shown for these abstractions.

Timed Systems. For systems considered in Sect. 4, we consider specifications in the temporal logic $\mathsf {ATCTL}$ [14], which restricts $\mathsf {TCTL}$ [6] as follows: first, negations only appear next to propositions $p \in \mathsf {AP}\ \cup \ \varPsi (U)$, and second, the temporal operators are restricted to $\mathsf {A}\,{\mathsf {F}}\,_{\sim c}$, $\mathsf {A}\,{\mathsf {G}}\,_{\sim c}$, and $\mathsf {A}\,\,{\mathsf {U}}\,_{\sim c}$.

To derive that message-counting timed automata are sufficiently precise for model checking of $\mathsf {ATCTL}$ formulas (in the following corollary), we combine the following results: (i) Simulation-equivalent systems satisfy the same formulas of $\mathsf {ACTL}$, e.g. see [11, Theorem 7.76]; (ii) Reduction of $\mathsf {TCTL}$ model checking to $\mathsf {CTL}$ model checking by clock embedding [11, p. 706]; (iii) Corollary 5.10.

Corollary 6.2

For a message-passing timed automaton $ TA ^{\textsf {MP}}$ and a message-counting timed automaton $ TA ^{\textsf {MC}}$ defined over the same timed design and an $\mathsf {ATCTL}$-formula $\varphi $, the following holds: if and only if .

Future Work. Most of the timed specifications of interest for FTDAs (e.g., fault-tolerant clock synchronization algorithms [35, 39, 40]) are examples of time-bounded specifications, thus belonging to the class of timed safety specifications. These algorithms can be encoded as message-passing timed automata (Definition 4.1). In this paper, we have shown that model checking of these algorithms can also be done at the level of message-counting timed automata (Definition 4.2). Based on this it appears natural to apply the abstraction-based parameterized model checking technique from [22]. However, we are still facing the challenge of having a parameterized number of clocks in Definition 4.2. We are currently working on another abstraction that addresses this issue. This will eventually allow us to do parameterized model checking of timed fault-tolerant distributed algorithms using UPPAAL [12] as back-end model checker.

Related Work. As discussed in [23], while modeling message passing is natural for fault-tolerant distributed algorithms (FTDAs), message counting scales better for asynchronous systems, and also builds a basis for efficient parameterized model checking techniques [22, 28]. We are interested in corresponding results for timed systems, that is, our long-term research goal is to build a framework for the automatic verification of timed properties of FTDAs. Such kind of properties are particularly relevant for the analysis of distributed clock synchronization protocols [35, 39, 40]. This investigation combines two research areas: (i) verification of FTDAs and (ii) parameterized model checking (PMC) of timed systems.

To the best of our knowledge, most of the existing literature on (i) can model only the discrete behaviors of the algorithms themselves [4, 5, 18, 20, 22, 28, 38]. Consequently they can neither reason about nor verify their timed properties. This motivated us to extend existing techniques for modeling and abstracting FTDAs, such as message passing and message counting systems together with the message counting abstraction, to timed systems.

Most of the results about PMC of timed systems [1,2,3, 8,9,10, 31, 34] are restricted to systems whose interprocess communication primitives have other systems in mind than FTDAs. For instance, the local state space is fixed and finite and independent of the parameters, while message counting in FTDAs requires that the local state space depends on the parameters. This motivated us to introduce the notions of message passing timed automata and message counting timed automata. Besides, the literature typically focuses on decidability, e.g., [1, 3, 9, 34] analyze decidability for different variants of the parameterized model checking problem (e.g., integer vs. continuous time, safety vs. liveness, presence vs. absence of controller). Our work focuses on establishing relations between different timed models, with the goal of using these relations for abstraction-based model checking.

Notes

1.
As we deal with distributed algorithms and timed automata, the notion of a clock appears in two different contexts in this paper, which should not be confused: The problem of clock synchronization is to compute adjustment for the hardware clocks (oscillators). In the context of timed automata, clocks are special variables used to model the timing behavior of a system.

References

Abdulla, P.A., Deneux, J., Mahata, P.: Multi-clock timed networks. In: LICS, pp. 345–354 (2004)
Google Scholar
Abdulla, P.A., Haziza, F., Holík, L.: All for the price of few. In: Giacobazzi, R., Berdine, J., Mastroeni, I. (eds.) VMCAI 2013. LNCS, vol. 7737, pp. 476–495. Springer, Heidelberg (2013). doi:10.1007/978-3-642-35873-9_28
Chapter Google Scholar
Abdulla, P.A., Jonsson, B.: Model checking of systems with many identical timed processes. Theor. Comput. Sci. 290(1), 241–264 (2003)
Article MathSciNet MATH Google Scholar
Alberti, F., Ghilardi, S., Orsini, A., Pagani, E.: Counter abstractions in model checking of distributed broadcast algorithms: some case studies. In: CILC, pp. 102–117 (2016)
Google Scholar
Alberti, F., Ghilardi, S., Pagani, E.: Counting constraints in flat array fragments. In: Olivetti, N., Tiwari, A. (eds.) IJCAR 2016. LNCS (LNAI), vol. 9706, pp. 65–81. Springer, Heidelberg (2016). doi:10.1007/978-3-319-40229-1_6
Chapter Google Scholar
Alur, R., Courcoubetis, C., Dill, D.: Model-checking for real-time systems. In: LICS, pp. 414–425 (1990)
Google Scholar
Alur, R., Dill, D.L.: A theory of timed automata. Theor. Comput. Sci. 126(2), 183–235 (1994)
Article MathSciNet MATH Google Scholar
Aminof, B., Kotek, T., Rubin, S., Spegni, F., Veith, H.: Parameterized model checking of rendezvous systems. In: Baldan, P., Gorla, D. (eds.) CONCUR 2014. LNCS, vol. 8704, pp. 109–124. Springer, Heidelberg (2014). doi:10.1007/978-3-662-44584-6_9
Google Scholar
Aminof, B., Rubin, S., Zuleger, F., Spegni, F.: Liveness of parameterized timed networks. In: Halldórsson, M.M., Iwama, K., Kobayashi, N., Speckmann, B. (eds.) ICALP 2015. LNCS, vol. 9135, pp. 375–387. Springer, Heidelberg (2015). doi:10.1007/978-3-662-47666-6_30
Google Scholar
Außerlechner, S., Jacobs, S., Khalimov, A.: Tight cutoffs for guarded protocols with fairness. In: Jobstmann, B., Leino, K.R.M. (eds.) VMCAI 2016. LNCS, vol. 9583, pp. 476–494. Springer, Heidelberg (2016). doi:10.1007/978-3-662-49122-5_23
Chapter Google Scholar
Baier, C., Katoen, J.-P.: Principles of Model Checking. MIT Press, Massachusetts (2008)
MATH Google Scholar
Behrmann, G., David, A., Larsen, K.G., Hakansson, J., Pettersson, P., Yi, W., Hendriks, M.: UPPAAL 4.0. In: QEST, pp. 125–126 (2006)
Google Scholar
Bracha, G., Toueg, S.: Asynchronous consensus and broadcast protocols. J. ACM 32(4), 824–840 (1985)
Article MathSciNet MATH Google Scholar
Bulychev, P., Chatain, T., David, A., Larsen, K.G.: Efficient on-the-fly algorithm for checking alternating timed simulation. In: Ouaknine, J., Vaandrager, F.W. (eds.) FORMATS 2009. LNCS, vol. 5813, pp. 73–87. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04368-0_8
Chapter Google Scholar
Čerāns, K.: Decidability of bisimulation equivalences for parallel timer processes. In: Bochmann, G., Probst, D.K. (eds.) CAV 1992. LNCS, vol. 663, pp. 302–315. Springer, Heidelberg (1993). doi:10.1007/3-540-56496-9_24
Chapter Google Scholar
Clarke, E., Grumberg, O., Peled, D.: Model Checking. MIT Press, Massachusetts (1999)
Google Scholar
Clarke, E., Grumberg, O., Jha, S., Lu, Y., Veith, H.: Counterexample-guided abstraction refinement for symbolic model checking. J. ACM 50(5), 752–794 (2003)
Article MathSciNet MATH Google Scholar
Drăgoi, C., Henzinger, T.A., Veith, H., Widder, J., Zufferey, D.: A logic-based framework for verifying consensus algorithms. In: McMillan, K.L., Rival, X. (eds.) VMCAI 2014. LNCS, vol. 8318, pp. 161–181. Springer, Heidelberg (2014). doi:10.1007/978-3-642-54013-4_10
Chapter Google Scholar
Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985)
Article MathSciNet MATH Google Scholar
Fisman, D., Kupferman, O., Lustig, Y.: On verifying fault tolerance of distributed protocols. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 315–331. Springer, Heidelberg (2008). doi:10.1007/978-3-540-78800-3_22
Chapter Google Scholar
Függer, M., Schmid, U.: Reconciling fault-tolerant distributed computing and systems-on-chip. Distrib. Comput. 24(6), 323–355 (2012)
Article MATH Google Scholar
John, A., Konnov, I., Schmid, U., Veith, H., Widder, J.: Parameterized model checking of fault-tolerant distributed algorithms by abstraction. In: FMCAD, pp. 201–209 (2013)
Google Scholar
John, A., Konnov, I., Schmid, U., Veith, H., Widder, J.: Towards modeling and model checking fault-tolerant distributed algorithms. In: Bartocci, E., Ramakrishnan, C.R. (eds.) SPIN 2013. LNCS, vol. 7976, pp. 209–226. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39176-7_14
Chapter Google Scholar
Kaynar, D.K., Lynch, N.A., Segala, R., Vaandrager, F.W.: The Theory of Timed I/O Automata. Morgan & Claypool Publishers, San Rafael (2006)
MATH Google Scholar
Konnov, I., Lazić, M., Veith, H., Widder, J.: A short counterexample property for safety and liveness verification of fault-tolerant distributed algorithms. In: POPL 2017. (to appear, preliminary version at arXiv:1608.05327)
Konnov, I., Veith, H., Widder, J.: On the completeness of bounded model checking for threshold-based distributed algorithms: reachability. In: Baldan, P., Gorla, D. (eds.) CONCUR 2014. LNCS, vol. 8704, pp. 125–140. Springer, Heidelberg (2014). doi:10.1007/978-3-662-44584-6_10
Google Scholar
Konnov, I., Veith, H., Widder, J.: SMT and POR beat counter abstraction: parameterized model checking of threshold-based distributed algorithms. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 85–102. Springer, Heidelberg (2015). doi:10.1007/978-3-319-21690-4_6
Chapter Google Scholar
Konnov, I., Veith, H., Widder, J.: What you always wanted to know about model checking of fault-tolerant distributed algorithms. In: Mazzara, M., Voronkov, A. (eds.) PSI 2015. LNCS, vol. 9609, pp. 6–21. Springer, Heidelberg (2016). doi:10.1007/978-3-319-41579-6_2
Chapter Google Scholar
Lynch, N., Vaandrager, F.: Forward and backward simulations for timing-based systems. In: Bakker, J.W., Huizing, C., Roever, W.P., Rozenberg, G. (eds.) REX 1991. LNCS, vol. 600, pp. 397–446. Springer, Heidelberg (1992). doi:10.1007/BFb0032002
Chapter Google Scholar
Mostéfaoui, A., Mourgaya, E., Parvédy, P.R., Raynal, M.: Evaluating the condition-based approach to solve consensus. In: DSN, pp. 541–550 (2003)
Google Scholar
Namjoshi, K.S., Trefler, R.J.: Uncovering symmetries in irregular process networks. In: Giacobazzi, R., Berdine, J., Mastroeni, I. (eds.) VMCAI 2013. LNCS, vol. 7737, pp. 496–514. Springer, Heidelberg (2013). doi:10.1007/978-3-642-35873-9_29
Chapter Google Scholar
Pease, M., Shostak, R., Lamport, L.: Reaching agreement in the presence of faults. J. ACM 27(2), 228–234 (1980)
Article MathSciNet MATH Google Scholar
Song, Y.J., Renesse, R.: Bosco: one-step byzantine asynchronous consensus. In: Taubenfeld, G. (ed.) DISC 2008. LNCS, vol. 5218, pp. 438–450. Springer, Heidelberg (2008). doi:10.1007/978-3-540-87779-0_30
Chapter Google Scholar
Spalazzi, L., Spegni, F.: Parameterized model-checking of timed systems with conjunctive guards. In: Giannakopoulou, D., Kroening, D. (eds.) VSTTE 2014. LNCS, vol. 8471, pp. 235–251. Springer, Heidelberg (2014). doi:10.1007/978-3-319-12154-3_15
Google Scholar
Srikanth, T.K., Toueg, S.: Optimal clock synchronization. J. ACM 34(3), 626–645 (1987)
Article MathSciNet Google Scholar
Srikanth, T.K., Toueg, S.: Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Distrib. Comput. 2, 80–94 (1987)
Article Google Scholar
Tripakis, S., Yovine, S.: Analysis of timed systems using time-abstracting bisimulations. FMSD 18, 25–68 (2001)
MATH Google Scholar
Tsuchiya, T., Schiper, A.: Verification of consensus algorithms using satisfiability solving. Distrib. Comput. 23(5–6), 341–358 (2011)
Article MATH Google Scholar
Widder, J., Schmid, U.: Booting clock synchronization in partially synchronous systems with hybrid process and link failures. Distrib. Comput. 20(2), 115–140 (2007)
Article MATH Google Scholar
Widder, J., Schmid, U.: The theta-model: achieving synchrony without clocks. Distrib. Comput. 22(1), 29–47 (2009)
Article MATH Google Scholar

Download references

Author information

Authors and Affiliations

TU Wien (Vienna University of Technology), Vienna, Austria
Igor Konnov & Josef Widder
UnivPM, Ancona, Italy
Francesco Spegni & Luca Spalazzi

Authors

Igor Konnov
View author publications
You can also search for this author in PubMed Google Scholar
Josef Widder
View author publications
You can also search for this author in PubMed Google Scholar
Francesco Spegni
View author publications
You can also search for this author in PubMed Google Scholar
Luca Spalazzi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Igor Konnov .

Editor information

Editors and Affiliations

IRIF, Université Paris Diderot, Paris, France
Ahmed Bouajjani
VERIMAG, CNRS & Université Grenoble Alpes, Grenoble, France
David Monniaux

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Konnov, I., Widder, J., Spegni, F., Spalazzi, L. (2017). Accuracy of Message Counting Abstraction in Fault-Tolerant Distributed Algorithms. In: Bouajjani, A., Monniaux, D. (eds) Verification, Model Checking, and Abstract Interpretation. VMCAI 2017. Lecture Notes in Computer Science(), vol 10145. Springer, Cham. https://doi.org/10.1007/978-3-319-52234-0_19

Download citation

DOI: https://doi.org/10.1007/978-3-319-52234-0_19
Published: 12 January 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52233-3
Online ISBN: 978-3-319-52234-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics