Keywords

1 Introduction

Nowadays, many high-reliability systems are distributed and parameterized in some manner, e.g., by the number of participants or the size of message buffers. Since the number and the cost of failures of these systems increase [2], industry has applied many automated techniques to reason about their correctness at the design and implementation levels, such as model checking [6, 17, 24, 28] and testing [19]. While these methods report positive results in analyzing individual system configurations with fixed parameter values, the real goal is to verify all configurations, i.e., infinitely many vectors of parameter values.

Unfortunately, the parameterized verification problem is typically undecidable, even if every participant follows the same code [1, 3, 27]. This negative result has naturally led to two approaches to algorithm analysis: (a) semi-automated methods based on user-guided invariants and proof assistants, and (b) automatic techniques for restricted classes of algorithms and properties. A particularly fascinating case is the cutoff property, which guarantees that analyzing a few small instances is sufficient to reason about the correctness of all instances [8, 15]. In a nutshell, given a property \(\xi \) and a system that has a parameter \(\mathbf {m}\), there exists a number \(B \ge 1\) such that if all instances that assign a value of at most B to the parameter \(\mathbf {m}\) satisfy \(\xi \), then all instances, with an arbitrary value of \(\mathbf {m}\), satisfy \(\xi \). Hence, algorithms that enjoy the cutoff property can be verified by model checking a few finite instances.
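In symbols, writing \(\mathcal {S}(\mathbf {m})\) for the instance obtained by fixing the parameter \(\mathbf {m}\) (a notation used only for this illustration), the cutoff property reads:

$$\big (\forall \mathbf {m} \le B:\mathcal {S}(\mathbf {m}) \models \xi \big ) \; \Rightarrow \; \big (\forall \mathbf {m} \ge 1:\mathcal {S}(\mathbf {m}) \models \xi \big ).$$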

In this paper, we introduce the class of symmetric point-to-point algorithms that enjoys the cutoff property. Informally, an instance in this class contains N processes that follow the same algorithm, and communicate with each other by sending and receiving messages through point-to-point communication channels. At each process, the local memory can be partitioned into regions such that each region corresponds one-to-one to a remote process, e.g., the array element \(\mathop { timeout }\left[ p , q \right] \) at a process p stores the maximum time that p waits for a process q. The failure detector [5] is one example of this class. Let \(1..N\) be a set of indexes. We show two cutoffs for these algorithms:

  1. 1.

    Let i be an index, and \(\omega _{\{i\}}\) be an LTL\(\backslash \)X (the stuttering-insensitive linear temporal logic) formula in which every predicate takes one of the forms: \(P_1(i)\) or \(P_2(i, i)\). Properties of the form \(\bigwedge _{i \in 1..N} \omega _{\{i\}}\) have a cutoff of 1.

  2. 2.

    Let i and j be different indexes, and \(\psi _{\{i, j\}}\) be an LTL\(\backslash \)X formula in which every predicate takes one of the (syntactic) forms: \(Q_1(i)\), or \(Q_2(j)\), or \(Q_3(i, j)\), or \(Q_4(j, i)\). Properties of the form \(\bigwedge _{i, j \in 1..N}^{i \ne j} \psi _{\{i, j\}}\) have a cutoff of 2.

For instance, by the second cutoff result, we can verify the following property, called strong completeness, of the failure detector in [5] by model checking an instance of size 2.

$$ \mathbf{F} \mathbf{G} (\forall i, j \in 1..N:( Correct (i) \wedge \lnot Correct (j)) \Rightarrow Suspected (i, j))$$

This formula means that every crashed process is eventually permanently suspected by every correct process. We write \(\mathbf{F} \) and \(\mathbf{G} \) to denote the “eventually” and “globally” operators of linear temporal logic (LTL); see [9]. We demonstrate the feasibility of our approach by specifying Chandra and Toueg’s failure detectors [5] in the language \(\textsc {TLA}^+\) [22], and model checking the specification with two model checkers: TLC [28] and APALACHE [20].
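For concreteness, the two correctness properties of the failure detector can be written as temporal formulas in \(\textsc {TLA}^+\). The following is a minimal sketch, assuming a constant set Proc of process indexes and state variables correct (the set of processes that have not crashed) and suspected (where suspected[i][j] holds when process i suspects process j); these names are ours and need not match the specification discussed in Sect. 5.

```tla
------------------------------ MODULE Properties ------------------------------
(* A minimal sketch of the two correctness properties; the variable names    *)
(* `correct' and `suspected' are our assumptions.                            *)
CONSTANT Proc
VARIABLES correct, suspected

(* Every crashed process is eventually permanently suspected by every
   correct process. *)
StrongCompleteness ==
  <>[] (\A i, j \in Proc :
          (i \in correct /\ j \notin correct) => suspected[i][j])

(* There is a time after which no correct process suspects a correct
   process. *)
EventualStrongAccuracy ==
  <>[] (\A i, j \in Proc :
          (i \in correct /\ j \in correct) => ~suspected[i][j])
================================================================================
```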

Related work. Our work is inspired by the cutoff results for various models of computation: ring-based message-passing systems [14, 15], purely disjunctive guards and conjunctive guards [12, 13], token-based communication [8], and quorum-based algorithms [23]. Additionally, there are semi-decision procedures based on invariants, induction, and abstraction that are successful in many interesting cases [4, 7, 18, 21, 25]. Interactive verification methods with proof assistants [10, 16, 26] have produced positive results in proving distributed algorithms.

The paper is organized as follows. Section 2 presents our motivating example, Chandra and Toueg’s failure detector [5], and the challenges in verifying such algorithms. Section 3 defines the model of computation as a transition system. Section 4 shows our main contributions: two cutoff results for the class of symmetric point-to-point distributed algorithms. Section 5 presents our \(\textsc {TLA}^+\) encoding of the model of computation and of the failure detector of [5], together with the model checking results. Section 6 concludes the paper with a discussion of future extensions.

2 Motivating Example

This section starts with a description of our motivating example – Chandra and Toueg’s failure detector [5]. Then, we present the challenges in verifying the failure detector and discuss state-of-the-art verification techniques.

Algorithm 1 presents the pseudo-code of the failure detector of [5]. A system instance has N processes that communicate with each other by sending-to-all and receiving messages through \(N^2\) point-to-point communication channels. A process performs local computation based on these messages (we assume that a process also receives the messages that it sends to itself). In one system step, all processes may take up to one step. Some processes may crash, i.e., stop operating. Correct processes follow Algorithm 1 to detect crashes in the system. Initially, every correct process sets a default timeout for every other process, i.e., how long it should wait for that process, and assumes that no process has crashed (Line 4). Every correct process p has three tasks: (i) it repeatedly sends an “alive” message to all (Line 6), (ii) it repeatedly produces predictions about crashes of other processes based on timeouts (Line 8), and (iii) it increases the timeout for a process q if p has learned that its suspicion of q is wrong (Line 12). Notice that a process p raises suspicion on the operation of a process q (Line 8) by considering only information related to q: \(\mathop { timeout }\left[ p , q \right] , suspected \left[ p, q \right] \), and messages that p has received from q recently. In other words, its suspicions about different processes evolve independently.
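To make the three tasks concrete, the following sketch encodes them in \(\textsc {TLA}^+\) for one process self. It only illustrates the textual description above: the variable names (timeout, suspected, elapsed, buf) and the explicit counter elapsed, which counts the ticks since the last “alive” message, are our assumptions and are not taken from Algorithm 1 or from the paper's specification.

```tla
------------------------------ MODULE TaskSketch ------------------------------
(* timeout[p][q]   : how long p waits for q before suspecting q              *)
(* suspected[p][q] : whether p currently suspects q                          *)
(* elapsed[p][q]   : ticks since p last received "alive" from q (assumed)    *)
(* buf[p][q]       : in-transit messages from p to q                         *)
EXTENDS Naturals
CONSTANT Proc
VARIABLES timeout, suspected, elapsed, buf

(* Task 1: self sends "alive" to all processes, including itself. *)
SendAlive(self) ==
  /\ buf' = [buf EXCEPT ![self] = [q \in Proc |-> buf[self][q] \cup {"alive"}]]
  /\ UNCHANGED <<timeout, suspected, elapsed>>

(* Task 2: self suspects every q from which no "alive" arrived within
   timeout[self][q] ticks; existing suspicions are kept. *)
Suspect(self) ==
  /\ suspected' = [suspected EXCEPT ![self] =
       [q \in Proc |->
          suspected[self][q] \/ (elapsed[self][q] > timeout[self][q])]]
  /\ UNCHANGED <<timeout, elapsed, buf>>

(* Task 3: if self learns that suspecting q was wrong (an "alive" from q is
   pending), it unsuspects q and increases timeout[self][q]. *)
Revise(self, q) ==
  /\ suspected[self][q]
  /\ "alive" \in buf[q][self]
  /\ suspected' = [suspected EXCEPT ![self][q] = FALSE]
  /\ timeout'   = [timeout EXCEPT ![self][q] = @ + 1]
  /\ UNCHANGED <<elapsed, buf>>
================================================================================
```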

Let \( Correct (p)\) be a predicate stating whether a process p is correct. (However, p can crash later. A crashed process \(p_1\) satisfies \(\lnot Correct (p_1)\).) Let \( Suspected (p, q)\) be a predicate stating whether a process p suspects a process q. The failure detector should guarantee the following properties [5]:

  • Strong completeness: Every crashed process is eventually permanently suspected by every correct process.

    $$ \begin{aligned} \mathbf{F} \mathbf{G} (\forall p, q \in 1..N:( Correct (p) \wedge \lnot Correct (q)) \Rightarrow Suspected (p, q)) \end{aligned} $$
  • Eventual strong accuracy: There is a time after which correct processes are not suspected by any correct processes.

    $$ \begin{aligned} \mathbf{F} \mathbf{G} (\forall p, q \in 1..N:( Correct (p) \wedge Correct (q)) \Rightarrow \lnot Suspected (p, q)) \end{aligned} $$

In the asynchronous model, Algorithm 1 does not satisfy eventual strong accuracy since there exists no bound on message delay, and messages sent by correct processes might always arrive after the timeout expires. The correctness of failure detectors is based on two implicit timing constraints: upper bounds on (1) the transmission delay of messages and (2) the relative speeds of different processes [5]. Even if these upper bounds exist but are unknown, failure detectors can satisfy both strong completeness and eventual strong accuracy.

(Algorithm 1: pseudo-code of the failure detector of [5].)

Note that symmetry is present both in the failure detector of [5] and in the above correctness properties. First, every process is isomorphic under renaming. A correct process p always sends a message to all and raises suspicion on a process q by considering only information related to q. Second, there are only point-to-point communication channels. Third, the content of in-transit messages is identical: they are merely “keep-alive” messages that may arrive at different times. Finally, all variables in the strong completeness and eventual strong accuracy properties range over process indices and are bound by universal quantifiers. The symmetry is captured by our model of computation and is the key point in our proofs of the cutoff results.

In summary, the verification of failure detectors faces the following challenges:

  1. 1.

    Algorithms are parameterized by the number of processes. Hence, we need to verify infinitely many instances of algorithms.

  2. 2.

    The model of computation lies between synchrony and asynchrony, since multiple processes can take a step in one global transition.

  3. 3.

    The algorithm relies on a global clock and local clocks. A straightforward encoding of a clock with an integer would produce an infinite state space.

  4. 4.

    The algorithm is parameterized by the upper bounds on the transmission time of messages and on the relative speeds of different processes. These upper bounds are called \(\varDelta \) and \(\varPhi \), respectively.

In this paper, we focus on Challenges 1–2: our model of computation in Sect. 3 does not restrict the number of processes that simultaneously take a step, and we show cutoffs on the number of processes in Sect. 4. Our cutoff results apply to checking LTL\(\backslash \)X formulas of the forms \(\bigwedge _{i \in 1..N} \omega _{\{i\}}\) and \(\bigwedge _{i, j \in 1..N}^{i \ne j} \psi _{\{i, j\}}\). Hence, we can verify the failure detector of [5] by model checking a few of its instances. We demonstrate the feasibility of our approach by specifying and model checking the failure detector in the synchronous case. Our specification contains optimizations for Challenge 3, which allow us to efficiently encode global and local clocks. In the synchronous case, we can skip Challenge 4, which we leave for future work.

3 Model of Computation

In this section, we formalize a distributed system as a transition system. This formalization captures the semantics of the theoretical model of [5, 11], but does not consider the restrictions on the execution space given by \(\varDelta \) and \(\varPhi \). A global system is a composition of N processes, \(N^2\) point-to-point outgoing message buffers, and N control components that capture which processes can take a step. Every process is identified by a unique index in \(1..N\), and follows the same deterministic algorithm. Moreover, a global system allows: (i) multiple processes to take (at most) one step in one global transition, and (ii) some processes to crash. Every process may execute three kinds of transitions: internal, round, and stuttering. Notice that in one global transition, some processes may send a message to all, and some may receive messages and do computation. Hence, we need to decide which processes move, and what happens to the message buffers. We introduce four sub-rounds: Schedule, Send, Receive, and Computation. The transitions for these sub-rounds are called internal ones. A global round transition is a composition of four internal transitions. We formalize sub-rounds and global transitions later. As a result of this modeling, there exist sequences of global configurations that are not admitted in [5, 11]. We therefore define so-called admissible sequences of global configurations, which are those admitted in [5, 11]. We encoded our formalization in \(\textsc {TLA}^+\); our specification is presented in Sect. 5.

Since every process follows the same algorithm, we first define a process template that captures the process behavior. This formalization focuses on symmetric point-to-point algorithms parameterized by N. Every process is an instance of the process template. Then, we present the formalization of a global system.

3.1 The Process Template

We fix the set of process indexes as \(1..N\). Moreover, we assume that a message's content does not contain the indexes of its receiver and sender. We let \(\texttt {Msg}\) denote the set of potential messages, and \(\texttt {Set}(\texttt {Msg})\) denote the set of sets of messages.

We model a process template as a transition system \(\mathcal {U}_{N} = (Q_{N}, Tr _{N}, Rel _{N}, q _{N}^0)\) where \(Q_{N} = Loc \times \texttt {Set}(\texttt {Msg})^N \times \mathcal {D}^N\) is a set of template states, \( Tr _{N}\) is a set of template transitions, \( Rel _{N} \subseteq Q_{N} \times Tr _{N} \times Q_{N}\) is a template transition relation, and \( q _{N}^0 \in Q_{N}\) is an initial state. These components of \(\mathcal {U}_{N}\) are defined as follows.

States. A template state \(\rho _{}\) is a tuple \(\left( \ell , S_1, \ldots , S_N, d_1, \ldots , d_N \right) \) where:

  • \(\ell \in Loc \) refers to the location of the program counter, and \( Loc \) is a set of locations. We assume that \( Loc = Loc _{ snd }\cup Loc _{ rcv }\cup Loc _{ comp }\cup \{\ell _{ crash }\}\), that the three sets \( Loc _{ snd }\), \( Loc _{ rcv }\), \( Loc _{ comp }\) are pairwise disjoint, and that \(\ell _{ crash }\) is a special location denoting a crash. To access the program counter, we use a function \( pc :Q_{N} \rightarrow Loc \) that takes a template state at its input, and produces its program counter as the output. Let \(\rho _{}(k)\) denote the \(k^{th}\) component of a template state \(\rho _{}\). For every \(\rho _{} \in Q_{N}\), we have \( pc (\rho _{}) = \rho _{}(1)\).

  • \(S_i \in \texttt {Set}(\texttt {Msg})\) refers to a set of messages. It stores the messages received from a process \(p_i\), for every \(i \in 1..N\). To access the set of messages received from a particular process whose index is in \(1..N\), we use a function \( rcvd :Q_{N} \times 1..N\rightarrow \texttt {Set}(\texttt {Msg})\) that takes a template state \(\rho _{}\) and a process index i at its input, and produces the \((i + 1)^{th}\) component of \(\rho _{}\) at the output, i.e. for every \(\rho _{} \in Q_{N}\), we have \( rcvd (\rho _{}, i) = \rho _{}(1 + i)\).

  • \(d_i \in \mathcal {D}\) refers to a local variable related to a process \(p_i\) for every \(i \in 1..N\). To access the local variable related to a particular process whose index is in \(1..N\), we use a function \( lvar :Q_{N} \times 1..N\rightarrow \mathcal {D}\) that takes a template state \(\rho _{}\) and a process index i at its input, and produces the \((1 + N + i)^{th}\) component of \(\rho _{}\) as the output, i.e. \( lvar (\rho _{}, i) = \rho _{}(1 + N + i)\) for every \(\rho _{} \in Q_{N}\).

Initial State. The initial state \( q _{N}^0\) is a tuple \( q _{N}^0 = (\ell _0, \emptyset , \ldots , \emptyset , d_0, \ldots , d_0) \) where \(\ell _0\) is a location, every box for received messages is empty, and every local variable is assigned a constant \(d_0 \in \mathcal {D}\).
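A template state and the initial state can be encoded as records, for instance as in the following sketch; the record-based encoding, with the sets \( Loc \), \(\texttt {Msg}\), \(\mathcal {D}\) and the constants \(\ell _0\), \(d_0\) declared abstractly, is our assumption and not the authors' specification.

```tla
----------------------------- MODULE TemplateState -----------------------------
(* Q_N = Loc x Set(Msg)^N x D^N, encoded as a record with fields pc, rcvd,    *)
(* and lvar; the record encoding is our assumption.                           *)
EXTENDS Naturals
CONSTANTS N, Loc, Msg, D, l0, d0
ASSUME l0 \in Loc /\ d0 \in D

ProcSet == 1..N

TemplateStates ==
  [ pc   : Loc,                       \* location of the program counter
    rcvd : [ProcSet -> SUBSET Msg],   \* rcvd[i]: messages received from process i
    lvar : [ProcSet -> D] ]           \* lvar[i]: local data related to process i

(* q_N^0: initial location, empty received sets, default local data. *)
InitState ==
  [ pc   |-> l0,
    rcvd |-> [i \in ProcSet |-> {}],
    lvar |-> [i \in ProcSet |-> d0] ]
================================================================================
```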

Transitions. We define \( Tr _{N} = CSnd \cup CRcv \cup \{ comp , crash , stutter \}\) where

  • \( CSnd \) is a set of transitions. Every transition in \( CSnd \) refers to a task that does some internal computation, and sends a message to all. For example, in task 1 in Algorithm 1, a process increases its local clock, and performs an instruction to send “alive” to all. We let \( csnd (m)\) denote a transition referring to a task with an action to send a message \(m \in \texttt {Msg}\) to all.

  • \( CRcv \) is a set of transitions. Every transition in \( CRcv \) refers to a task that receives N sets of messages, and does some internal computation. For example, in task 2 in Algorithm 1, a process increases its local clock, receives messages, and removes false-negative predictions. We let \( crcv (S_1, \ldots , S_N)\) denote a transition referring to a task with an action to receive sets \(S_1, \ldots , S_N\) of messages. These sets \(S_1, \ldots , S_N\) are delivered by the global system.

  • \( comp \) is a transition which refers to a task with purely local computation. In other words, this task has neither send actions nor receive actions.

  • \( crash \) is a transition for crashes.

  • \( stutter \) is a transition for stuttering steps.

Transition Relation. For two states \(\rho _{}, \rho ^{\prime }_{} \in Q_{N}\) and a transition \( tr \in Tr _{N}\), we write \(\rho _{} \xrightarrow { tr _{}} \rho ^{\prime }_{}\) instead of \((\rho _{}, tr _{}, \rho ^{\prime }_{}) \in Rel _{N}\). In the model of [5, 11], each process follows the same deterministic algorithm. Hence, we assume that for every \(\rho _{0} \xrightarrow { tr _{0}} \rho ^{\prime }_{0}\) and \(\rho _{1} \xrightarrow { tr _{1}} \rho ^{\prime }_{1}\), if \(\rho _{0} = \rho _{1}\) and \( tr _0 = tr _1\), then it follows that \(\rho ^{\prime }_{0} = \rho ^{\prime }_{1}\). Moreover, we assume that there exist the following functions, which are used to define constraints on the template transition relation:

  • A function \( nextLoc : Loc \rightarrow Loc \) takes a location at its input and produces the next location as the output.

  • A function \( genMsg : Loc \rightarrow \texttt {Set}(\texttt {Msg})\) takes a location at its input, and produces a singleton set that contains the message sent to all processes in the current task. The output can also be the empty set; for example, if a process is performing a receive task, the output of \( genMsg \) is the empty set.

  • A function \( nextVar : Loc \times \texttt {Set}(\texttt {Msg}) \times \mathcal {D}\rightarrow \mathcal {D}\) takes a location, a set of messages, and a local variable’s value, and produces a new value of a local variable as the output.

Let us fix functions \( nextLoc , genMsg \) and \( nextVar \). We define the template transitions as follows.

  1. 1.

    For every \(m \in \texttt {Msg}\), for every pair of states \(\rho _{} , \rho ^{\prime }_{} \in Q_{N}\), we have \(\rho _{} \xrightarrow { csnd _{}(m)} \rho ^{\prime }_{}\) if and only if

    1. (a)

      \( pc (\rho ^{\prime }_{}) = nextLoc ( pc (\rho _{})) \wedge \{m\} = genMsg ( pc (\rho _{}))\)

    2. (b)

      \(\forall i \in 1..N: rcvd (\rho _{}, i) = rcvd (\rho ^{\prime }_{}, i)\)

    3. (c)

      \(\forall i \in 1..N: lvar (\rho ^{\prime }_{}, i) = nextVar \big ( pc (\rho _{}), rcvd (\rho ^{\prime }_{}, i), lvar (\rho _{}, i) \big ) \)

    Constraint (a) implies that the update of the program counter and the construction of the sent message m depend only on the current value of the program counter, and that the process wants to send only m to all in this step. Constraint (b) states that no message is delivered. Constraint (c) implies that the value of \( lvar (\rho ^{\prime }_{}, i)\) depends only on the current location, the set of messages that have been delivered and stored in \( rcvd (\rho ^{\prime }_{}, i)\), and the value of \( lvar (\rho _{}, i)\). (A \(\textsc {TLA}^+\) sketch of this transition is given after the list.)

  2. 2.

    For every \(S_1, \ldots , S_N \subseteq \texttt {Msg}\), for every pair of states \(\rho _{} , \rho ^{\prime }_{} \in Q_{N}\), we have \(\rho _{} \xrightarrow { crcv _{}(S_1, \ldots , S_N)} \rho ^{\prime }_{}\) if and only if the following constraints hold:

    1. (a)

      \( pc (\rho ^{\prime }_{}) = nextLoc ( pc (\rho _{})) \wedge \emptyset = genMsg ( pc (\rho _{}))\)

    2. (b)

      \(\forall i \in 1..N: rcvd (\rho ^{\prime }_{}, i) = rcvd (\rho _{}, i) \cup S_i\)

    3. (c)

      \(\forall i \in 1..N: lvar (\rho ^{\prime }_{}, i) = nextVar \big ( pc (\rho _{}), rcvd (\rho ^{\prime }_{}, i), lvar (\rho _{}, i)\big )\)

    Constraint (a) in \( crcv \) is similar to constraint (a) in \( csnd \), except that no message is sent in this sub-round. Constraint (b) states that the messages in the set \(S_i\) come from the process with index i and have been delivered in this step. Constraint (c) in \( crcv \) is similar to constraint (c) in \( csnd \).

  3. 3.

    For every pair of states \(\rho _{} , \rho ^{\prime }_{} \in Q_{N}\), we have \(\rho _{} \xrightarrow { comp _{}} \rho ^{\prime }_{}\) if and only if the following constraints hold:

    1. (a)

      \( pc (\rho ^{\prime }_{}) = nextLoc ( pc (\rho _{})) \wedge \emptyset = genMsg ( pc (\rho _{}))\)

    2. (b)

      \(\forall i \in 1..N: rcvd (\rho ^{\prime }_{}, i) = rcvd (\rho _{}, i)\)

    3. (c)

      \(\forall i \in 1..N: lvar (\rho ^{\prime }_{}, i) = nextVar \big ( pc (\rho _{}), rcvd (\rho ^{\prime }_{}, i), lvar (\rho _{}, i) \big )\)

    Hence, this step has only local computation. No message is sent or delivered.

  4. 4.

    For every pair of states \(\rho _{} , \rho ^{\prime }_{} \in Q_{N}\), we have \(\rho _{} \xrightarrow { crash _{}} \rho ^{\prime }_{} \) if and only if the following constraints hold:

    1. (a)

      \( pc (\rho _{}) \ne \ell _{ crash }\wedge pc (\rho ^{\prime }_{}) = \ell _{ crash }\)

    2. (b)

      \(\forall i \in 1..N: rcvd (\rho _{}, i) = rcvd (\rho ^{\prime }_{}, i) \wedge lvar (\rho _{}, i) = lvar (\rho ^{\prime }_{}, i)\)

    Only the program counter is updated by switching to \(\ell _{ crash }\).

  5. 5.

    For every pair of states \(\rho _{} , \rho ^{\prime }_{} \in Q_{N}\), we have \(\rho _{} \xrightarrow { stutter } \rho ^{\prime }_{}\) if and only if \(\rho _{} = \rho ^{\prime }_{}\).
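As announced in the discussion of item 1, the \( csnd (m)\) case can be sketched in \(\textsc {TLA}^+\) over the record encoding of template states introduced earlier. The operators NextLoc, GenMsg, and NextVar stand for the abstract functions \( nextLoc \), \( genMsg \), and \( nextVar \); the encoding itself is our assumption.

```tla
----------------------------- MODULE SendTransition ----------------------------
EXTENDS Naturals
CONSTANTS N,
          NextLoc(_),        \* nextLoc : Loc -> Loc
          GenMsg(_),         \* genMsg  : Loc -> Set(Msg)
          NextVar(_, _, _)   \* nextVar : Loc x Set(Msg) x D -> D

ProcSet == 1..N

(* rho --csnd(m)--> rho2, following constraints (a)-(c) of item 1. *)
CSnd(rho, m, rho2) ==
  /\ rho2.pc = NextLoc(rho.pc) /\ {m} = GenMsg(rho.pc)                \* (a)
  /\ \A i \in ProcSet : rho2.rcvd[i] = rho.rcvd[i]                    \* (b)
  /\ \A i \in ProcSet :                                               \* (c)
       rho2.lvar[i] = NextVar(rho.pc, rho2.rcvd[i], rho.lvar[i])
================================================================================
```

The other template transitions can be encoded analogously, following the constraints of their respective items.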

3.2 Modeling the Distributed System

Given N processes which are instantiated from the same process template \(\mathcal {U}_{N} = (Q_{N}, Tr _{N}, Rel _{N}, q _{N}^0)\), the global system is a composition of (i) these processes, (ii) \(N^2\) point-to-point buffers for in-transit messages, and (iii) N control components that capture which processes can take a step. We formalize the global system as a transition system \(\mathcal {G}_{N} = \left( \mathcal {C}_{N}, T_{N}, R_{N}, g _{N}^0 \right) \) where \(\mathcal {C}_{N} = (Q_{N})^N \times \texttt {Set}(\texttt {Msg})^{N \cdot N} \times \texttt {Bool}^N\) is a set of global configurations, \(T_{N}\) is a set of global internal, round, and stuttering transitions, \(R_{N} \subseteq \mathcal {C}_{N} \times T_{N} \times \mathcal {C}_{N}\) is a global transition relation, and \( g _{N}^0\) is an initial configuration. These components are defined as follows.

Configurations. A global configuration \(\kappa _{}\) is defined as the following tuple \( \kappa _{} = \left( q_1, \ldots , q_N, S^1_1, S^2_1 \ldots , S^r_s, \ldots S^N_N, act_1, \ldots , act_N \right) \) where:

  • \(q_i \in Q_{N}\): This component is a state of a process \(p_i\) for every \(i \in 1..N\). To access a local state of a particular process, we use a function \( lstate :\mathcal {C}_{N} \times 1..N\rightarrow Q_{N} \) that takes input as a global configuration \(\kappa _{}\) and a process index i, and produces output as the \(i^{th}\) component of \(\kappa _{}\) which is a state of a process \(p_i\). Let \(\kappa _{}(i)\) denote the \(i^{th}\) component of a global configuration \(\kappa _{}\). For every \(i \in 1..N\), we have \( lstate (\kappa _{}, i) = \kappa _{}(i) = q_i\).

  • \(S^r_s \in \texttt {Set}(\texttt {Msg})\): This component is a set of in-transit messages from a process \(p_s\) to a process \(p_r\) for every \(s, r \in 1..N\). To access the set of in-transit messages between two processes, we use a function \( buf :\mathcal {C}_{N} \times 1..N\times 1..N\rightarrow \texttt {Set}(\texttt {Msg})\) that takes as input a global configuration \(\kappa _{}\) and two process indexes s and r, and produces as output the \((s \cdot N + r)^{th}\) component of \(\kappa _{}\), which is the message buffer from a process \(p_s\) (sender) to a process \(p_r\) (receiver). Formally, we have \( buf (\kappa _{}, s, r) = \kappa _{}(s \cdot N + r) = S^r_s\) for every \(s, r \in 1..N\).

  • \(act_i \in \texttt {Bool}\): This component says whether a process \(p_i\) can take one step in a global transition, for every \(i \in 1..N\). To access a control component, we use a function \( active :\mathcal {C}_{N} \times 1..N\rightarrow \texttt {Bool}\) that takes as input a configuration \(\kappa _{}\) and a process index i, and produces as output the \(((N + 1) \cdot N + i)^{th}\) component of \(\kappa _{}\), which refers to whether a process \(p_i\) can take a step. Formally, we have \( active (\kappa _{}, i) = \kappa _{}((N + 1) \cdot N + i) = act_i\) for every \(i \in 1..N\). The environment sets the values of \(act_1, \ldots , act_N\) in the sub-round Schedule defined later.

We will write \(\kappa _{} \in (Q_{N})^N \times \texttt {Set}(\texttt {Msg})^{N \cdot N} \times \texttt {Bool}^N\) or \(\kappa _{} \in \mathcal {C}_{N}\).

Initial Configuration. The global system \(\mathcal {G}_{N}\) has one initial configuration \( g _{N}^0\), and it must satisfy the following constraints:

  1. 1.

    \(\forall i \in 1..N:\lnot active ( g _{N}^0, i) \wedge lstate ( g _{N}^0, i) = q _{N}^0\)

  2. 2.

    \(\forall s, r \in 1..N: buf ( g _{N}^0, s, r) = \emptyset \)
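The global configurations and the initial configuration can be sketched in \(\textsc {TLA}^+\) in the same record style; again, this encoding is our assumption rather than the authors' specification.

```tla
------------------------------ MODULE GlobalConfig -----------------------------
EXTENDS Naturals
CONSTANTS N, Msg, TemplateStates, InitState
  \* TemplateStates and InitState are assumed to be defined as in the
  \* template-state sketch above.

ProcSet == 1..N

Configs ==
  [ lstate : [ProcSet -> TemplateStates],          \* state of each process
    buf    : [ProcSet \X ProcSet -> SUBSET Msg],   \* buf[s, r]: messages from s to r
    active : [ProcSet -> BOOLEAN] ]                \* may process i take a step?

(* g_N^0: every process inactive and in the initial template state,
   every buffer empty. *)
InitConfig ==
  [ lstate |-> [i \in ProcSet |-> InitState],
    buf    |-> [sr \in ProcSet \X ProcSet |-> {}],
    active |-> [i \in ProcSet |-> FALSE] ]
================================================================================
```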

Global Round Transitions. Intuitively, every round transition is a sequence of a \(\xrightarrow {\texttt {Sched}}\) transition, a \(\xrightarrow {\texttt {Snd}}\) transition, a \(\xrightarrow {\texttt {Rcv}}\) transition, and a \(\xrightarrow {\texttt {Comp}}\) transition defined below. We let \(\xrightarrow {\texttt {Round}}\) denote round transitions. The semantics of round transitions is defined as follows: for every pair of global configurations \( \kappa _{0}, \kappa _{4} \in \mathcal {C}_{N}\), we say \(\kappa _{0} \xrightarrow {\texttt {Round}} \kappa _{4}\) if there exist three global configurations \(\kappa _{1}, \kappa _{2}, \kappa _{3} \in \mathcal {C}_{N}\) such that \( \kappa _{0} \xrightarrow {\texttt {Sched}}\kappa _{1} \xrightarrow {\texttt {Snd}}\kappa _{2} \xrightarrow {\texttt {Rcv}}\kappa _{3} \xrightarrow {\texttt {Comp}}\kappa _{4} \). Notice that every correct process can make at most one global internal transition in every global round transition. Moreover, round transitions allow some processes to crash only in the sub-round Schedule. We call these faults clean crashes.

Global Stuttering Transition. In the proof of Lemma 5, presented in Sect. 4, we use projections. Therefore, we extend the relation \(\xrightarrow {\texttt {Round}}\) with stuttering: for every configuration \(\kappa _{}\), we allow \(\kappa _{} \xrightarrow {\texttt {Round}} \kappa _{}\).

Admissible Sequences. An infinite sequence \(\pi = \kappa _{0} \kappa _{1} \ldots \) of global configurations in \(\mathcal {G}_{N}\) is admissible if the following constraints hold:

  1. 1.

    \(\kappa _{0}\) is the initial state, i.e. \(\kappa _{0} = g _{N}^0\), and

  2. 2.

    \(\pi \) is stuttering equivalent with an infinite sequence \(\pi ^{\prime } = \kappa ^{\prime }_{0} \kappa ^{\prime }_{1} \ldots \) such that \( \kappa ^{\prime }_{4k} \xrightarrow {\texttt {Sched}}\kappa ^{\prime }_{4k+1} \xrightarrow {\texttt {Snd}}\kappa ^{\prime }_{4k+2} \xrightarrow {\texttt {Rcv}}\kappa ^{\prime }_{4k+3} \xrightarrow {\texttt {Comp}}\kappa ^{\prime }_{4k+4} \) for every \(k \ge 0\).

Notice that it immediately follows from this definition that if \(\pi = \kappa _{0} \kappa _{1} \ldots \) is an admissible sequence of configurations in \(\mathcal {G}_{N}\), then every pair \(\kappa _{k}, \kappa _{k+1}\) with \(k \ge 0\) is related by a transition of \(\mathcal {G}_{N}\). From now on, we only consider admissible sequences of global configurations.

Global Internal Transitions. In the model of [5], many processes can take a step in a global transition. We assume that a computation of the distributed system is organized in rounds, i.e., global ticks, and that every round is organized as four sub-rounds called Schedule, Send, Receive, and Computation. To model that as a transition system, for every sub-round we define a corresponding transition: \(\xrightarrow {\texttt {Sched}}\) for the sub-round Schedule, \(\xrightarrow {\texttt {Snd}}\) for the sub-round Send, \(\xrightarrow {\texttt {Rcv}}\) for the sub-round Receive, and \(\xrightarrow {\texttt {Comp}}\) for the sub-round Computation. These transitions are called global internal transitions. We define the semantics of these sub-rounds as follows.

  1. 1.

    Sub-round Schedule. The environment starts with a global configuration where every process is inactive, and moves to another one by non-deterministically deciding which processes crash and which processes take a step in the current global transition. Every correct process takes a stuttering step, and every faulty process is inactive. If a process p crashes in this sub-round, every incoming message buffer of p is set to the empty set. Formally, for \( \kappa _{}, \kappa ^{\prime }_{} \in \mathcal {C}_{N}\), we have \(\kappa _{} \xrightarrow {\texttt {Sched}}\kappa ^{\prime }_{}\) if the following constraints hold:

    1. (a)

      \(\forall i \in 1..N:\lnot active (\kappa _{}, i)\)

    2. (b)

      \(\forall i \in 1..N: lstate (\kappa _{}, i) \xrightarrow { stutter } lstate (\kappa ^{\prime }_{}, i) \vee lstate (\kappa _{}, i) \xrightarrow { crash _{}} lstate (\kappa ^{\prime }_{}, i)\)

    3. (c)

      \(\forall i \in 1..N: pc ( lstate (\kappa ^{\prime }_{}, i)) = \ell _{ crash }\Rightarrow \lnot active (\kappa ^{\prime }_{}, i)\)

    4. (d)

      \(\forall s, r \in 1..N: pc ( lstate (\kappa ^{\prime }_{}, r) ) \ne \ell _{ crash }\Rightarrow buf (\kappa _{}, s, r) = buf (\kappa ^{\prime }_{}, s, r) \)

    5. (e)

      \(\forall r \in 1..N: pc ( lstate (\kappa ^{\prime }_{}, r) ) = \ell _{ crash }\Rightarrow (\forall s \in 1..N: buf (\kappa ^{\prime }_{}, s, r) = \emptyset )\)

  2. 2.

    Sub-round Send. Only processes that perform send actions can take a step in this sub-round. Such processes become inactive at the end of this sub-round. Freshly sent messages are added to the corresponding message buffers. To define the semantics of the sub-round Send, we use the predicates \( Enabled \), \( Frozen _S\), and \( Sending \).

    Formally, for \( \kappa _{}, \kappa ^{\prime }_{} \in \mathcal {C}_{N}\), we have \(\kappa _{} \xrightarrow {\texttt {Snd}}\kappa ^{\prime }_{}\) if the following constraints hold:

    1. (a)

      \(\forall i \in 1..N:\lnot Enabled (\kappa _{}, i, Loc _{ snd }) \Leftrightarrow Frozen _S(\kappa _{}, \kappa ^{\prime }_{}, i)\)

    2. (b)

      \(\forall i \in 1..N: Enabled (\kappa _{}, i, Loc _{ snd }) \Leftrightarrow \exists m \in \texttt {Msg}: Sending (\kappa _{}, \kappa ^{\prime }_{}, i, m)\)

    3. (c)

      \(\forall i \in 1..N: Enabled (\kappa _{}, i, Loc _{ snd }) \Rightarrow \lnot active (\kappa ^{\prime }_{}, i)\)

  3. 3.

    Sub-round Receive. Only processes that perform receive actions can take a step in this sub-round. Such processes become inactive at the end of this sub-round. Delivered messages are removed from corresponding message buffers. To define the semantics of this sub-round, we use the following predicates:

    (Definitions of the predicates used below, including \( Enabled \) and \( Frozen _R\).)

    Formally, for \( \kappa _{}, \kappa ^{\prime }_{} \in \mathcal {C}_{N}\), we have \(\kappa _{} \xrightarrow {\texttt {Rcv}}\kappa ^{\prime }_{}\) if the following constraints hold:

    1. (a)

      \(\forall i \in 1..N:\lnot Enabled (\kappa _{}, i, Loc _{ rcv }) \Leftrightarrow Frozen _R(\kappa _{}, \kappa ^{\prime }_{}, i)\)

    2. (b)
    3. (c)

      \(\forall i \in 1..N: Enabled (\kappa _{}, i, Loc _{ rcv }) \Rightarrow \lnot active (\kappa ^{\prime }_{}, i)\)

  4. 4.

    Sub-round Computation. Only processes that perform internal computation actions can take a step in this sub-round. Such processes become inactive at the end of this sub-round. Every message buffer is unchanged (a \(\textsc {TLA}^+\) sketch of this sub-round is given after the list). Formally, for \( \kappa _{}, \kappa ^{\prime }_{} \in \mathcal {C}_{N}\), we have \(\kappa _{} \xrightarrow {\texttt {Comp}}\kappa ^{\prime }_{}\) if the following constraints hold:

    1. (a)

      \(\forall i \in 1..N: Enabled (\kappa _{}, i, Loc _{ comp }) \Leftrightarrow lstate (\kappa _{}, i) \xrightarrow { comp _{}} lstate (\kappa ^{\prime }_{}, i)\)

    2. (b)

      \(\forall i \in 1..N:\lnot Enabled (\kappa _{}, i, Loc _{ comp }) \Leftrightarrow lstate (\kappa _{}, i) \xrightarrow { stutter } lstate (\kappa ^{\prime }_{}, i)\)

    3. (c)

      \(\forall s, r \in 1..N: buf (\kappa _{}, s, r) = buf (\kappa ^{\prime }_{}, s, r)\)

    4. (d)

      \(\forall i \in 1..N: Enabled (\kappa _{}, i, Loc _{ comp }) \Rightarrow \lnot active (\kappa ^{\prime }_{}, i)\)
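As referenced in item 4, the sub-round Computation can be sketched in \(\textsc {TLA}^+\) over the record encoding of configurations. The predicates Enabled, Comp, and Stutter are declared abstractly, since \( Enabled \) is defined in a figure and the template transitions \( comp \) and \( stutter \) are defined in Sect. 3.1; the encoding itself is our assumption.

```tla
-------------------------- MODULE ComputationSubRound --------------------------
EXTENDS Naturals
CONSTANTS N, LocComp,
          Enabled(_, _, _),   \* Enabled(k, i, LocComp): may process i compute?
          Comp(_, _),         \* Comp(rho, rho2): template transition comp
          Stutter(_, _)       \* Stutter(rho, rho2): rho = rho2

ProcSet == 1..N

(* k --Comp--> k2, following constraints (a)-(d) of item 4. *)
CompSubRound(k, k2) ==
  /\ \A i \in ProcSet :                                                  \* (a)
       Enabled(k, i, LocComp) <=> Comp(k.lstate[i], k2.lstate[i])
  /\ \A i \in ProcSet :                                                  \* (b)
       ~Enabled(k, i, LocComp) <=> Stutter(k.lstate[i], k2.lstate[i])
  /\ \A s, r \in ProcSet : k.buf[s, r] = k2.buf[s, r]                    \* (c)
  /\ \A i \in ProcSet :                                                  \* (d)
       Enabled(k, i, LocComp) => ~k2.active[i]
================================================================================
```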

Remark 1

Observe that the definitions of \(\kappa _{} \xrightarrow {\texttt {Snd}}\kappa ^{\prime }_{}\), \(\kappa _{} \xrightarrow {\texttt {Rcv}}\kappa ^{\prime }_{}\), and \(\kappa _{} \xrightarrow {\texttt {Comp}}\kappa ^{\prime }_{}\) allow \(\kappa _{} = \kappa ^{\prime }_{}\), that is, stuttering. This captures, e.g., global transitions in [5, 11] in which no process sends a message.

4 Cutoff Results

Let \(\mathcal {A}\) be a symmetric point–to–point algorithm. In this section, we show two cutoff results for the number of processes in the algorithm \(\mathcal {A}\). With these cutoff results, one can verify the strong completeness and eventual strong accuracy of the failure detector of [5] by model checking two instances of sizes 1 and 2.

Theorem 1

Let \(\mathcal {A}\) be a symmetric point–to–point algorithm. Let \(\mathcal {G}_{1}\) and \(\mathcal {G}_{N}\) be instances of 1 and N processes, respectively, for some \(N \ge 1\). Let \( Path _1\) and \( Path _N\) be the sets of all admissible sequences of configurations in \(\mathcal {G}_{1}\) and in \(\mathcal {G}_{N}\), respectively. Let \(\omega _{\{i\}}\) be an LTL\(\backslash \)X formula in which every predicate takes one of the forms: \(P_1(i)\) or \(P_2(i, i)\), where i is an index in \(1..N\). Then,

$$\Big (\forall \pi _N \in Path _N :\mathcal {G}_{N}, \pi _N \models \bigwedge _{i \in 1..N} \omega _{\{i\}} \Big ) \; \Leftrightarrow \; \Big (\forall \pi _1 \in Path _1 :\mathcal {G}_{1}, \pi _1 \models \omega _{\{1\}}\Big ).$$

Theorem 2

Let \(\mathcal {A}\) be a symmetric point–to–point algorithm. Let \(\mathcal {G}_{2}\) and \(\mathcal {G}_{N}\) be instances of 2 and N processes respectively for some \(N \ge 2\). Let \( Path _2\) and \( Path _N\) be sets of all admissible sequences of configurations in \(\mathcal {G}_{2}\) and in \(\mathcal {G}_{N}\), respectively. Let \(\psi _{\{i, j\}}\) be an LTL\(\backslash \)X formula in which every predicate takes one of the forms: \(Q_1(i)\), or \(Q_2(j)\), or \(Q_3(i, j)\), or \(Q_4(j, i)\) where i and j are different indexes in \(1..N\). It follows that:

$$\big ( \forall \pi _N \in Path _N :\mathcal {G}_{N}, \pi _N \models \bigwedge ^{i \ne j}_{i, j \in 1..N} \psi _{\{i, j\}} \big ) \Leftrightarrow \big ( \forall \pi _2 \in Path _2 :\mathcal {G}_{2}, \pi _2 \models \psi _{\{1, 2 \}} \big ) .$$
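As an illustration, the strong completeness property of Sect. 2 has exactly this shape. Since \(1..N\) is finite, \(\mathbf{F} \mathbf{G} \) distributes over the finite conjunction obtained by unfolding the universal quantifier, and the conjuncts with \(i = j\) hold trivially because \( Correct (i) \wedge \lnot Correct (i)\) is unsatisfiable:

$$\mathbf{F} \mathbf{G} \big (\forall i, j \in 1..N:\phi (i, j)\big ) \;\equiv \; \bigwedge ^{i \ne j}_{i, j \in 1..N} \mathbf{F} \mathbf{G} \, \phi (i, j), \quad \phi (i, j) = ( Correct (i) \wedge \lnot Correct (j)) \Rightarrow Suspected (i, j).$$

Hence each \(\psi _{\{i, j\}} = \mathbf{F} \mathbf{G} \, \phi (i, j)\) is an LTL\(\backslash \)X formula over predicates of the allowed forms \(Q_1(i)\), \(Q_2(j)\), and \(Q_3(i, j)\), and Theorem 2 applies with a cutoff of 2.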

Since the proof of Theorem 1 is similar to that of Theorem 2, we focus on Theorem 2 here. Its proof is based on the symmetry of the system model and of the correctness properties, and on the following lemmas.

  • Lemma 1 says that every transposition on a set of process indexes \(1..N\) preserves the structure of the process template \(\mathcal {U}_{N}\).

  • Lemma 2 says that every transposition on a set of process indexes \(1..N\) preserves the structure of the global transition system \(\mathcal {G}_{N}\) for every \(N \ge 1\).

  • Lemma 5 says that \(\mathcal {G}_{2}\) and \(\mathcal {G}_{N}\) are trace equivalent under a set \(AP_{\{1, 2\}}\) of predicates that take one of the forms: \(Q_1(i)\), or \(Q_2(j)\), or \(Q_3(i, j)\), or \(Q_4(j, i)\).

In the following, we present definitions and constructions to prove these lemmas. We end this section with the proof sketch of Theorem 2.

4.1 Index Transpositions and Symmetric Point-to-Point Systems

We first recall the definition of transposition. Given a set \(1..N\) of indexes, we call a bijection \(\alpha :1..N\rightarrow 1..N\) a transposition between two indexes \(i, j \in 1..N\) if the following properties hold: \(\alpha (i) = j\), and \(\alpha (j) = i\), and \(\forall k \in 1..N:(k \ne i \wedge k \ne j) \Rightarrow \alpha (k) = k\). We let \((i \leftrightarrow j)\) denote a transposition between two indexes i and j.

The application of a transposition to a template state is given in Definition 1. Informally, applying a transposition \(\alpha =(i \leftrightarrow j)\) to a template state \(\rho _{}\) generates a new template state by switching only the evaluation of \( rcvd \) and \( lvar \) at indexes i and j. The application of a transposition to a global configuration is provided in Definition 2. In addition to process states, we also need to permute message buffers and control components. We override notation by writing \(\alpha _S(\rho _{})\) and \(\alpha _{C}(\kappa _{})\) to refer to the application of a transposition \(\alpha \) to a state \(\rho _{}\) and to a configuration \(\kappa _{}\), respectively. These functions \(\alpha _S\) and \(\alpha _C\) are called a local transposition and a global transposition, respectively.

Definition 1 (Local Transposition)

Let \(\mathcal {U}_{N}\) be a process template with process indexes \(1..N\), and \(\rho _{} = \left( \ell , S_1, \ldots , S_N, d_1, \ldots , d_N \right) \) be a state in \(\mathcal {U}_{N}\). Let \(\alpha =(i \leftrightarrow j)\) be a transposition on \(1..N\). The application of \(\alpha \) to \(\rho _{}\), denoted as \(\alpha _S(\rho _{})\), generates a tuple \( \left( \ell ^{\prime }, S^{\prime }_1, \ldots , S^{\prime }_N, d^{\prime }_1, \ldots , d^{\prime }_N \right) \) such that

  1. 1.

    \(\ell = \ell ^{\prime }\), and \(S_i = S^{\prime }_j\), and \(S_j = S^{\prime }_i\), and \(d_i = d^{\prime }_j\) and \(d_j = d^{\prime }_i\), and

  2. 2.

    \(\forall k \in 1..N:(k \ne i \wedge k \ne j) \Rightarrow (S_k = S^{\prime }_k \wedge d_k = d^{\prime }_k)\)

Definition 2 (Global Transposition)

Let \(\mathcal {G}_{N}\) be a global system with process indexes \(1..N\), and \(\kappa _{}\) be a configuration in \(\mathcal {G}_{N}\). Let \(\alpha =(i \leftrightarrow j)\) be a transposition on \(1..N\). The application of \(\alpha \) to \(\kappa _{}\), denoted as \(\alpha _{C}(\kappa _{})\), generates a configuration in \(\mathcal {G}_{N}\) which satisfies following properties:

  1. 1.

    \(\forall k \in 1..N: lstate (\alpha _{C}(\kappa _{}), \alpha (k)) = \alpha _S( lstate (\kappa _{}, k))\).

  2. 2.

    \(\forall s, r \in 1..N: buf (\alpha _{C}(\kappa _{}), \alpha (s), \alpha (r)) = buf (\kappa _{}, s, r) \)

  3. 3.

    \(\forall k \in 1..N: active (\alpha _{C}(\kappa _{}), \alpha (k)) = active (\kappa _{}, k)\)

Since the content of a message in \(\texttt {Msg}\) does not contain the indexes of the receiver and the sender, no transposition affects messages. We define the application of a transposition to a send, compute, crash, or stutter template transition to return the same transition. We extend the application of a transposition to a receive template transition as in Definition 3.

Definition 3 (Receive-transition Transposition)

Let \(\mathcal {U}_{N}\) be a process template with process indexes \(1..N\), and \(\alpha =(i \leftrightarrow j)\) be a transposition on \(1..N\). Let \( crcv (S_1, \ldots , S_N)\) be a transition in \(\mathcal {U}_{N}\) which refers to a task with a receive action. We let \(\alpha _R( crcv (S_1, \ldots , S_N))\) denote the application of \(\alpha \) to \( crcv (S_1, \ldots , S_N)\), and this application returns a new transition \( crcv (S^{\prime }_1, \ldots , S^{\prime }_N)\) in \(\mathcal {U}_{N}\) such that:

  1. 1.

    \(S_i = S^{\prime }_j\), and \(S_j = S^{\prime }_i\), and

  2. 2.

      \(\forall k \in 1..N:(k \ne i \wedge k \ne j) \Rightarrow S_k = S^{\prime }_k\)

We let \(\alpha _{U}(\mathcal {U}_{N})\) and \(\alpha _{G}(\mathcal {G}_{N})\) denote the application of a transposition \(\alpha \) to a process template \(\mathcal {U}_{N}\) and to a global transition system \(\mathcal {G}_{N}\), respectively. Since these definitions are straightforward, we omit them in this paper. We prove later that \(\alpha _{U}(\mathcal {U}_{N}) = \mathcal {U}_{N}\) and \(\alpha _{G}(\mathcal {G}_{N}) = \mathcal {G}_{N}\) (see Lemmas 1 and 2).
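Definitions 1 and 2 can be sketched as \(\textsc {TLA}^+\) operators over the record encodings used in the earlier sketches; this encoding is our assumption.

```tla
------------------------------ MODULE Transpositions ---------------------------
EXTENDS Naturals
CONSTANTS N, i, j
ASSUME i \in 1..N /\ j \in 1..N

ProcSet == 1..N

(* The transposition alpha = (i <-> j) on 1..N. *)
Alpha(k) == IF k = i THEN j ELSE IF k = j THEN i ELSE k

(* Definition 1: swap the rcvd and lvar entries at indexes i and j. *)
AlphaS(rho) ==
  [ pc   |-> rho.pc,
    rcvd |-> [k \in ProcSet |-> rho.rcvd[Alpha(k)]],
    lvar |-> [k \in ProcSet |-> rho.lvar[Alpha(k)]] ]

(* Definition 2: permute process states, message buffers, and control bits. *)
AlphaC(kappa) ==
  [ lstate |-> [k \in ProcSet |-> AlphaS(kappa.lstate[Alpha(k)])],
    buf    |-> [sr \in ProcSet \X ProcSet |->
                  kappa.buf[Alpha(sr[1]), Alpha(sr[2])]],
    active |-> [k \in ProcSet |-> kappa.active[Alpha(k)]] ]
================================================================================
```

Since Alpha is an involution, one can check, e.g., that \( lstate (\alpha _{C}(\kappa _{}), \alpha (k)) = \alpha _S( lstate (\kappa _{}, k))\) holds for every \(k \in 1..N\), as required by Definition 2.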

Lemma 1 (Symmetric Process Template)

Let \(\mathcal {U}_{N} = (Q_{N}, Tr _{N}, Rel _{N}, q _{N}^0)\) be a process template with indexes \(1..N\). Let \(\alpha =(i \leftrightarrow j)\) be a transposition on \(1..N\), and \(\alpha _S\) be a local transposition based on \(\alpha \) (from Definition 1). The following properties hold:

  1. 1.

    \(\alpha _S\) is a bijection from \(Q_{N}\) to itself.

  2. 2.

    The initial state is preserved under \(\alpha _S\), i.e. \(\alpha _S( q _{N}^0) = q _{N}^0\).

  3. 3.

    Let \(\rho _{}, \rho ^{\prime }_{} \in \mathcal {U}_{N}\) be states such that \(\rho _{} \xrightarrow { crcv _{}(S_1, \ldots , S_N)} \rho ^{\prime }_{}\) for some sets of messages \(S_1, \ldots , S_N \in \texttt {Set}(\texttt {Msg})\). It follows \(\alpha _S(\rho _{}) \xrightarrow {\alpha _R( crcv (S_1, \ldots , S_N))} \alpha _S(\rho ^{\prime }_{}) \).

  4. 4.

    Let \(\rho _{}, \rho ^{\prime }_{}\) be states in \(\mathcal {U}_{N}\), and \( tr \in Tr _{N}\) be one of send, local computation, crash and stutter transitions such that \(\rho _{} \xrightarrow { tr _{}} \rho ^{\prime }_{}\). Then, \(\alpha _S(\rho _{}) \xrightarrow { tr _{}} \alpha _S(\rho ^{\prime }_{}) \).

Lemma 2 (Symmetric Global System)

Let \(\mathcal {U}_{N} = (Q_{N}, Tr _{N}, Rel _{N}, q _{N}^0)\) be a process template, and \(\mathcal {G}_{N} = \left( \mathcal {C}_{N}, T_{N}, R_{N}, g _{N}^0 \right) \) be a global transition system such that the set of process indexes is \(1..N\), and every process is instantiated with \(\mathcal {U}_{N}\). Let \(\alpha \) be a transposition on \(1..N\), and \(\alpha _C\) be a global transposition based on \(\alpha \) (from Definition 2). The following properties hold:

  1. 1.

    \(\alpha _C\) is a bijection from \(\mathcal {C}_{N}\) to itself.

  2. 2.

    The initial configuration is preserved under \(\alpha _C\), i.e. \(\alpha _{C}( g _{N}^0) = g _{N}^0\).

  3. 3.

    Let \(\kappa _{}\) and \(\kappa ^{\prime }_{}\) be configurations in \(\mathcal {G}_{N}\), and \( tr \in T_{N}\) be an internal transition such that \(\kappa _{} \xrightarrow { tr } \kappa ^{\prime }_{}\). It follows \(\alpha _{C}(\kappa _{}) \xrightarrow { tr } \alpha _{C}(\kappa ^{\prime }_{}) \).

  4. 4.

    Let \(\kappa _{}\) and \(\kappa ^{\prime }_{}\) be configurations in \(\mathcal {G}_{N}\). If \(\kappa _{} \xrightarrow {\texttt {Round}} \kappa ^{\prime }_{}\), then \(\alpha _{C}(\kappa _{}) \xrightarrow {\texttt {Round}} \alpha _{C}(\kappa ^{\prime }_{})\).

4.2 Trace Equivalence of \(\mathcal {G}_{2}\) and \(\mathcal {G}_{N}\) Under \(AP_{\{1, 2\}}\)

Let \(\mathcal {G}_{2}\) and \(\mathcal {G}_{N}\) be two global transition systems whose processes follow the same symmetric point–to–point algorithm. In the following, our goal is to prove Lemma 5, which says that \(\mathcal {G}_{2}\) and \(\mathcal {G}_{N}\) are trace equivalent under a set \(AP_{\{1, 2\}}\) of predicates which take one of the forms: \(Q_1(1), Q_2(2), Q_3(1, 2)\), or \(Q_4(2, 1)\). To do that, we first present two construction techniques: Construction 1 to construct a state in \(\mathcal {U}_{2}\) from a state in \(\mathcal {U}_{N}\), and Construction 2 to construct a global configuration in \(\mathcal {G}_{2}\) from a given global configuration in \(\mathcal {G}_{N}\). Then, we present Lemmas 3 and 4, which are required in the proof of Lemma 5.

To keep the presentation simple, when the context is clear, we simply write \(\mathcal {U}_{N}\), instead of fully \(\mathcal {U}_{N} = \left( Q_{N}, Tr _{N}, Rel _{N}, q _{N}^0\right) \). We also write \(\mathcal {G}_{N}\), instead of fully \(\mathcal {G}_{N} = \left( \mathcal {C}_{N}, T_{N}, R_{N}, g _{N}^0 \right) \).

Construction 1 (State Projection)

Let \(\mathcal {A}\) be an arbitrary symmetric point–to–point algorithm. Let \(\mathcal {U}_{N}\) be a process template of \(\mathcal {A}\) for some \(N \ge 2\), and \(\rho _{}^N\) be a template state of \(\mathcal {U}_{N}\). We construct a tuple \(\rho _{}^2 = (pc_1, rcvd_1, rcvd_2, v_1, v_2)\) based on \(\rho _{}^N\) and the set \(\{1, 2\}\) of process indexes in the following way:

  1. 1.

    \(pc_1 = pc (\rho _{}^N)\).

  2. 2.

    For every \(i \in \{1, 2\}\), it follows \(rcvd_i = rcvd (\rho _{}^N, i)\).

  3. 3.

    For every \(i \in \{1, 2\}\), it follows \(v_i = lvar (\rho _{}^N, i)\).

Construction 2 (Configuration Projection)

Let \(\mathcal {A}\) be a symmetric point–to–point algorithm. Let \(\mathcal {G}_{2}\) and \(\mathcal {G}_{N}\) be two global transition systems of two instances of \(\mathcal {A}\) for some \(N \ge 2\), and \(\kappa _{}^N \in \mathcal {C}_{N}\) be a global configuration in \(\mathcal {G}_{N}\). We construct a tuple \( \kappa _{}^2 = (s_1, s_2, buf^1_1, buf^2_1, buf^1_2, buf^2_2, act_1, act_2) \) based on the configuration \(\kappa _{}^N\) and the set \(\{1, 2\}\) of indexes in the following way:

  1. 1.

    For every \(i \in \{1, 2\}\), a component \(s_i\) is constructed from \( lstate (\kappa _{}^N, i)\) with Construction 1 and indexes \(\{1, 2\}\).

  2. 2.

    For every \(s, r \in \{1, 2\}\), it follows \(buf^r_s = buf (\kappa _{}^N, s, r)\).

  3. 3.

    For every process \(i \in \{1, 2\}\), it follows \(act_i = active (\kappa _{}^N, i) \).

Note that a tuple \(\rho _{}^2\) constructed with Construction 1 is a state in \(\mathcal {U}_{2}\), and a tuple \(\kappa _{}^2\) constructed with Construction 2 is a configuration in \(\mathcal {G}_{2}\). We call \(\rho _{}^2\) (and \(\kappa _{}^2\)) the index projection of \(\rho _{}^N\) (and \(\kappa _{}^N\)) on indexes \(\{1, 2\}\). The following Lemma 3 says that Construction 2 allows us to construct an admissible sequence of global configurations in \(\mathcal {G}_{2}\) based on a given admissible sequence in \(\mathcal {G}_{N}\).
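Constructions 1 and 2 amount to keeping only the components indexed by 1 and 2. The following \(\textsc {TLA}^+\) sketch uses the record encoding assumed in the earlier sketches.

```tla
------------------------------- MODULE Projection ------------------------------
EXTENDS Naturals

Two == {1, 2}

(* Construction 1: keep the program counter and the entries for indexes 1, 2. *)
ProjState(rhoN) ==
  [ pc   |-> rhoN.pc,
    rcvd |-> [k \in Two |-> rhoN.rcvd[k]],
    lvar |-> [k \in Two |-> rhoN.lvar[k]] ]

(* Construction 2: project the first two process states, the four buffers
   among processes 1 and 2, and their control bits. *)
ProjConfig(kappaN) ==
  [ lstate |-> [k \in Two |-> ProjState(kappaN.lstate[k])],
    buf    |-> [sr \in Two \X Two |-> kappaN.buf[sr[1], sr[2]]],
    active |-> [k \in Two |-> kappaN.active[k]] ]
================================================================================
```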

Lemma 3

Let \(\mathcal {A}\) be a symmetric point–to–point algorithm. Let \(\mathcal {G}_{2}\) and \(\mathcal {G}_{N}\) be two transition systems such that all processes in \(\mathcal {G}_{2}\) and \(\mathcal {G}_{N}\) follow \(\mathcal {A}\), and \(N \ge 2\). Let \(\pi ^N = \kappa _{0}^N\kappa _{1}^N \ldots \) be an admissible sequence of configurations in \(\mathcal {G}_{N}\). Let \(\pi ^2 = \kappa _{0}^2\kappa _{1}^2 \ldots \) be a sequence of configurations in \(\mathcal {G}_{2}\) such that \(\kappa _{k}^2\) is the index projection of \(\kappa _{k}^N\) on indexes \(\{1, 2\}\) for every \(k \ge 0\). Then, \(\pi ^2\) is admissible in \(\mathcal {G}_{2}\).

The proof of Lemma 3 is based on the following observations:

  1. 1.

    The application of Construction 1 to an initial template state of \(\mathcal {U}_{N}\) constructs an initial template state of \(\mathcal {U}_{2}\).

  2. 2.

    Construction 1 preserves the template transition relation.

  3. 3.

    The application of Construction 2 to an initial global configuration of \(\mathcal {G}_{N}\) constructs an initial global configuration of \(\mathcal {G}_{2}\).

  4. 4.

    Construction 2 preserves the global transition relation.

Moreover, Lemma 4 says that given an admissible sequence \(\pi ^2 = \kappa _{0}^2 \kappa _{1}^2 \ldots \) in \(\mathcal {G}_{2}\), there exists an admissible sequence \(\pi ^N = \kappa _{0}^N \kappa _{1}^N \ldots \) in \(\mathcal {G}_{N}\) such that \(\kappa _{i}^2\) is the index projection of \(\kappa _{i}^N\) on indexes \(\{1, 2\}\) for every \(i \ge 0\).

Lemma 4

Let \(\mathcal {A}\) be an arbitrary symmetric point–to–point algorithm. Let \(\mathcal {G}_{2}\) and \(\mathcal {G}_{N}\) be global transition systems of \(\mathcal {A}\) for some \(N \ge 2\). Let \(\pi ^2 = \kappa _{0}^2 \kappa _{1}^2 \ldots \) be an admissible sequence of configurations in \(\mathcal {G}_{2}\). There exists an admissible sequence \(\pi ^N = \kappa _{0}^N \kappa _{1}^N \ldots \) of configurations in \(\mathcal {G}_{N}\) such that \(\kappa _{i}^2\) is the index projection of \(\kappa _{i}^N\) on indexes \(\{1, 2\}\) for every \(i \ge 0\).

Lemma 5

Let \(\mathcal {A}\) be a symmetric point–to–point algorithm. Let \(\mathcal {G}_{2}\) and \(\mathcal {G}_{N}\) be its instances for some \(N \ge 2\). Let \(AP_{\{1, 2\}}\) be a set of predicates that take one of the forms: \(Q_1(1)\), \(Q_2(2)\), \(Q_3(1, 2)\) or \(Q_4(2, 1)\). It follows that \(\mathcal {G}_{2}\) and \(\mathcal {G}_{N}\) are trace equivalent under \(AP_{\{1, 2\}}\).

4.3 Cutoff Results of Symmetric Point-to-Point Algorithms

In the following, we prove the cutoff result stated in Theorem 2. The proof of the other cutoff result, Theorem 1, is similar.

Proof sketch of Theorem 2. We have

$$\Big ( \forall \pi _N \in Path _N :\mathcal {G}_{N}, \pi _N \models \bigwedge ^{i \ne j}_{i, j \in 1..N} \psi _{\{i, j\}} \Big ) \;\Leftrightarrow \;\Big ( \bigwedge ^{i \ne j}_{i, j \in 1..N} \big ( \forall \pi _N \in Path _N :\mathcal {G}_{N}, \pi _N \models \psi _{\{i, j\}} \big ) \Big ).$$

Let i and j be two process indexes in the set \(1..N\) such that \(i \ne j\). It follows that \(\alpha ^1 = (i \leftrightarrow 1)\) and \(\alpha ^2 = (j \leftrightarrow 2)\) are transpositions on \(1..N\) (*). By Lemma 2, we have: (i) \(\psi _{\{\alpha ^1(i), \alpha ^2(j)\}} = \psi _{\{1, 2\}}\), and (ii) \(\alpha ^2_{G} ( \alpha ^1_{G} ( \mathcal {G}_{N}) ) = \alpha ^2_{G} ( \mathcal {G}_{N} ) = \mathcal {G}_{N}\), and (iii) \(\alpha ^2_{C} ( \alpha ^1_{C} ( g _{N}^0) ) = \alpha ^2_{C} ( g _{N}^0 ) = g _{N}^0\).

Since \(\psi _{\{i, j\}}\) is an LTL\(\backslash \)X formula, \(\mathbf{A} \psi _{\{i, j\}}\) is a CTL*\(\backslash \)X formula, where \(\mathbf{A} \) is the universal path operator of CTL*\(\backslash \)X (see [9]). By the semantics of the operator \(\mathbf{A} \), we have \(\forall \pi _N \in Path _N :\mathcal {G}_{N}, \pi _N \models \psi _{\{i, j\}}\) if and only if \(\mathcal {G}_{N}, g _{N}^0 \models \mathbf{A} \psi _{\{i, j\}}\). By point (*), it follows that \(\mathcal {G}_{N}, g _{N}^0 \models \mathbf{A} \psi _{\{i, j\}}\) if and only if \(\mathcal {G}_{N}, g _{N}^0 \models \mathbf{A} \psi _{\{1, 2\}}\). Since both i and j are arbitrary and different, we have that \( \mathcal {G}_{N}, g _{N}^0 \models \bigwedge ^{i \ne j}_{i, j \in 1..N} \mathbf{A} \psi _{\{i, j\}}\) if and only if \(\mathcal {G}_{N}, g _{N}^0 \models \mathbf{A} \psi _{\{1, 2\}} \). By the semantics of the operator \(\mathbf{A} \), we have \(\mathcal {G}_{N}, g _{N}^0 \models \mathbf{A} \psi _{\{1, 2\}}\) if and only if \(\forall \pi _N \in Path _N :\mathcal {G}_{N}, \pi _N \models \psi _{\{1, 2\}}\). By Lemma 5, \(\forall \pi _N \in Path _N :\mathcal {G}_{N}, \pi _N \models \psi _{\{1, 2\}}\) if and only if \(\forall \pi _2 \in Path _2 :\mathcal {G}_{2}, \pi _2 \models \psi _{\{1, 2\}}\). Hence, Theorem 2 holds.    \(\square \)

5 Experiments

To demonstrate the feasibility of our approach, we specified the failure detector [5] in \(\textsc {TLA}^+\) [22]. Our specification follows the model of computation in Section 3. It is close to the pseudo-code in Algorithm 1, except that the tasks are organized in a loop: task 1, then task 2, then task 3. Moreover, our encoding contains the upper bounds on the transmission time of messages and on the relative speeds of different processes, called \(\varDelta \) and \(\varPhi \), respectively. The user can verify our specification with different values of \(\varDelta \) and \(\varPhi \) by running the model checkers TLC [28] and APALACHE [20]. Our experiments were set up in the synchronous case, where \(\varDelta = 0\) and \(\varPhi = 1\). To reduce the state space, we apply abstractions to the global clock, the local clocks, and the received messages. Our abstractions are explained in detail in our \(\textsc {TLA}^+\) specification.
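For illustration of Challenge 3, one way to keep clock-related state finite is to track only the age of an in-transit message, saturated at \(\varDelta + 1\), instead of absolute send times; the following sketch is a simplification and does not reproduce the full abstraction used in our specification.

```tla
------------------------------- MODULE BoundedAge ------------------------------
(* A simplified illustration of bounding clock-related state: in-transit      *)
(* messages carry an age that saturates at Delta + 1, so no unbounded         *)
(* counters are needed.                                                       *)
EXTENDS Naturals
CONSTANT Delta
ASSUME Delta \in Nat

CappedAge == 0..(Delta + 1)

(* Ages advance by one global tick, saturating at Delta + 1. *)
Tick(age) == IF age >= Delta + 1 THEN Delta + 1 ELSE age + 1

(* A message is overdue once its age exceeds Delta; in the synchronous case
   Delta = 0, a message is either fresh (age 0) or overdue. *)
Overdue(age) == age > Delta
================================================================================
```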

Table 1. Checking the failure detector [5] in the synchronous case

We ran the following experiments on a laptop with a Core i7-6600U CPU and 16 GB of DDR4 memory. Table 1 presents the results of model checking the failure detector [5] in the synchronous model. From the theoretical viewpoint, an instance with \(N = 1\) is necessary, but we show only the interesting cases with \(N \ge 2\) in Table 1. (We did check an instance with \(N = 1\), and there are no errors in this instance.) The strong accuracy property is the following safety property: \( \mathbf{G} (\forall p, q \in 1..N:( Correct (p) \wedge Correct (q)) \Rightarrow \lnot Suspected (p, q))\). The column “depth” shows the maximum execution length used by our tool as well as the maximum depth reached by TLC while running breadth-first search. For the second and fourth benchmarks, we used the diameter bound that was reported by TLC, which does exhaustive state exploration. Hence, the verification results with APALACHE are complete. The abbreviation “TO” means a timeout of 10 hours. The inductive invariant is defined over the round transition \(\xrightarrow {\texttt {Round}}\), and contains type invariants, constraints on the age of in-transit messages, and constraints on when a process executes a task.

6 Conclusion

We have introduced the class of symmetric point-to-point algorithms, which captures some well-known algorithms, e.g., failure detectors. Symmetric point-to-point algorithms enjoy the cutoff property. We have shown that checking properties of the form \(\bigwedge _{i \in 1..N} \omega _{\{i\}}\) has a cutoff of 1, and that checking properties of the form \(\bigwedge _{i, j \in 1..N}^{i \ne j} \psi _{\{i, j\}}\) has a cutoff of 2, where \(\omega _{\{i\}}\) is an LTL\(\backslash \)X formula whose predicates inspect only variables indexed by a process i, and \(\psi _{\{i, j\}}\) is an LTL\(\backslash \)X formula whose predicates inspect only variables indexed by two different processes \(i \ne j\). We demonstrated the feasibility of our approach by specifying and model checking the failure detector of Chandra and Toueg under synchrony with the two model checkers TLC and APALACHE.

We see two directions for future work. The first is to find new cutoffs for checking other properties of symmetric point-to-point algorithms. For example, given a correctness property with k universal quantifiers over process index variables, we conjecture that checking the instances of size at most k is sufficient to reason about the correctness of all instances. The second is to extend our results to the model of computation under partial synchrony. This model has additional time constraints on the message delay \(\varDelta \) and the relative process speed \(\varPhi \). Algorithms under partial synchrony are parameterized by \(\varDelta \) and \(\varPhi \). We plan to explore techniques to deal with these parameters.