Formal Verification of Multi-Paxos for Distributed Consensus

Chand, Saksham; Liu, Yanhong A.; Stoller, Scott D.

doi:10.1007/978-3-319-48989-6_8

Saksham Chand¹⁷,
Yanhong A. Liu¹⁷ &
Scott D. Stoller¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 9995))

Included in the following conference series:

International Symposium on Formal Methods

1863 Accesses
20 Citations

Abstract

This paper describes formal specification and verification of Lamport’s Multi-Paxos algorithm for distributed consensus. The specification is written in TLA+, Lamport’s Temporal Logic of Actions. The proof is written and checked using TLAPS, a proof system for TLA+. Building on Lamport, Merz, and Doligez’s specification and proof for Basic Paxos, we aim to facilitate the understanding of Multi-Paxos and its proof by minimizing the difference from those for Basic Paxos, and to demonstrate a general way of proving other variants of Paxos and other sophisticated distributed algorithms. We also discuss our general strategies for proving properties about sets and tuples that helped the proof check succeed in significantly reduced time.

This work was supported in part by NSF grants CCF-1414078, CCF-1248184, and CNS-1421893, ONR grant N000141512208, and AFOSR grant FA9550-14-1-0261. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of these agencies.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Simpler Specifications and Easier Proofs of Distributed Algorithms Using History Variables

Paxos Consensus, Deconstructed and Abstracted

Eventual consensus in Synod: verification using a failure-aware actor model

Article 25 July 2022

1 Introduction

Distributed consensus is a fundamental problem in distributed computing. It requires that a set of processes agree on some value or values. Consensus is essential when distributed services are replicated for fault-tolerance, because non-faulty replicas must agree. Unfortunately, consensus is difficult when the processes or communication channels may fail.

Paxos [16] is an important algorithm, developed by Lamport, for solving distributed consensus. Basic Paxos is for agreeing on a one-shot value, such as whether to commit a database transaction. Multi-Paxos is for agreeing on an infinite sequence of values, for example, a stream of commands to execute. Multi-Paxos has been used in many important distributed services, e.g., Google’s Chubby [1, 3] and Microsoft’s Autopilot [13]. There are other Paxos variants, e.g., that reduce a message delay [19] or add preemption [17], but Multi-Paxos is the most important in making Paxos practical for distributed services that must perform a continual sequence of operations.

Paxos handles processes that run concurrently without shared memory, where processes may crash and may later recover, and messages may be delayed indefinitely or lost. In Basic Paxos, each process may repeatedly attempt to be the leader and propose some value, and wait for appropriate replies from appropriate subsets of the processes while also replying appropriately to other processes; consensus is reached eventually if enough processes and channels are non-faulty to elect a leader. In Multi-Paxos, many more different attempts, proposals, and replies may happen in overlapping fashions to reach consensus on values in different slots in the continual sequence.

Paxos has often been difficult to understand, even though it was created almost three decades ago [21]. Lamport later wrote a much simpler description of the phases of the algorithm but only for Basic Paxos [17]. Lamport, Merz, and Doligez [22] wrote a formal specification and proof of Basic Paxos in TLA+ [18] and TLAPS [26]. Many efforts, especially in recent years, have been spent on formal specification and verification of Multi-Paxos, but they use more restricted or less direct language models, some mixed in large systems with many unrelated functionalities, or handle other variants of Paxos than Multi-Paxos, as discussed in Sect. 7. What is lacking is formal specification and proof of the exact phases of Multi-Paxos, in a most direct and general language like TLA+ [18], with a complete proof that is mechanically checked, and a general method for doing such specifications and proofs in a more feasible way.

This paper addresses this challenge. We describe a formal specification of Multi-Paxos written in TLA+, and a complete proof written and automatically checked using TLAPS. Building on Lamport, Merz, and Doligez’s specification and proof for Basic Paxos, we aim to facilitate the understanding of multi-Paxos and its proof by minimizing the difference from those for Basic Paxos. The key change in the specification is to replace operations involving two numbers with those involving a set of 3-tuples, for each of a set of processes, exactly capturing the minimum conceptual difference between Basic Paxos and Multi-Paxos. However, the proof becomes significantly more difficult because of the handling of sets and tuples in place of two numbers.

This work also aims to show the minimum-change approach as a general way of specifying and verifying other variants of Paxos, and more generally of specifying and verifying other sophisticated algorithms by starting from the basics. We demonstrate this by further showing the extension of the specification and proof of Multi-Paxos to add preemption—letting processes abandon proposals that are already preempted by other proposals [17, 29]. We also extended the specification and proof of Basic Paxos with preemption, which is even easier.

Finally, we discuss a general method we attempted to follow to tackle tedious and difficult proof obligations involving sets and tuples, a well-known significant complication in general. For difficult properties involving sets, we use induction and direct the prover to focus on the changes in the set values. For properties involving tuples, we change the ways of accessing and testing the elements to yield significantly reduced proof-checking time. Overall, we were able to keep the specification minimally changed, and keep the proof-checking time to about 2 min or less while the prover checks the proofs for over 900 obligations for both Multi-Paoxs and Multi-Paxos with Preemption.

Our full TLA+ specification and TLAPS-checked proof of Multi-Paxos with Preemption are included in the Appendix of the full version [2].

2 Distributed Consensus and Paxos

A system is a set of processes that can process values individually and communicate with each other by sending and receiving messages. The processes may crash and may later recover. The messages may be delayed indefinitely or lost.

Distributed consensus. The basic consensus problem, called single-value consensus or single-decree consensus, is to ensure that a single value is chosen from among the values proposed by the processes. The safety requirements for basic consensus are [17]:

Only a value that has been proposed may be chosen.
Only a single value is chosen.
A process never learns that a value has been chosen unless it actually has been chosen.

Formally this is defined as

(1)

where \(\mathcal {V}\) is the set of possible proposed values, and \(\phi \) is a predicate that given a value v evaluates to true iff v was chosen by the algorithm. The specification of \(\phi \) is part of the algorithm.

The more general consensus problem, called multi-value consensus or multi-decree consensus, is to choose a sequence of values, instead of a single value. Here we have

(2)

where \(\mathcal {V}\) is as above, \(\mathcal {S}\) is a set of slots used to index the sequence of decisions, and \(\phi \) is a predicate that given a value v and a slot s evaluates to true iff v was chosen for s by the algorithm.

Basic Paxos and Multi-Paxos. Paxos solves the problem of consensus. Two main roles of the algorithm are performed by two kinds of processes:

\(\mathcal {P}\) is the set of proposers. These processes propose values that can be chosen.
\(\mathcal {A}\) is the set of acceptors. These processes vote for proposed values. A value is chosen when there are enough votes for it.

A set \(\mathcal {Q}\) of subsets of the acceptors, i.e., \(\mathcal {Q} \subseteq 2^{\mathcal {A}}\), is used as a quorum system. It must satisfy the following properties:

\(\mathcal {Q}\) is a set cover for \(\mathcal {A}\), i.e., \(\bigcup _{Q \in \mathcal {Q}} Q = \mathcal {A}\).
Any two quorums overlap, i.e., \(\forall Q_1, Q_2 \in \mathcal {Q} : Q_1 \cap Q_2 \ne \emptyset \).

The most commonly used quorum system \(\mathcal {Q}\) takes any majority of acceptors as an element in \(\mathcal {Q}\).

Basic Paxos solves single-value consensus. It defines predicate \(\phi \) as

(3)

where \(\mathcal {B}\) is the set of proposal numbers, also called ballots, which is any set that can be strictly totally ordered. sent(“2b”, b, v, a) means that a message of type “2b” with ballot b and value v was sent by acceptor a (to some set of processes). An acceptor votes by sending such a message.

Multi-Paxos solves the problem of multi-value consensus. It trivially extends predicate \(\phi \) to decide a value for each slot s in \(\mathcal {S}\):

(4)

To satisfy the safety requirements, \(\mathcal {S}\) need not have any relations defined on it. In practice, \(\mathcal {S}\) is usually the natural numbers.

Figure 1 shows Lamport’s description of Basic Paxos [17]. It uses any majority of acceptors as a quorum. In Phase 2a, it instructs the accept request be sent to each acceptor that replied with the proposer’s ballot n, but it is sufficient for safety to send accept to any subset of \(\mathcal {A}\). However, because the proposer is waiting for a quorum, the set of receivers should contain at least one quorum, which again is allowed to be different from the quorum that responded to n.

Multi-Paxos can be built from Basic Paxos by carefully adding slots. In Basic Paxos, acceptors cache the value they have accepted with the highest ballot. With slots, we have a sequence of these values indexed by slot. Therefore,

In Phase 1b, the acceptor now replies with a mapping in \(\mathcal {S} \rightarrow \mathcal {B} \times \mathcal {V}\) as opposed to just one pair in \(\mathcal {B} \times \mathcal {V}\).
The same change is needed in Phase 2b.
Upon receiving such a mapping as a reply, in Phase 2a, a proposer proposes a mapping in \(\mathcal {S} \rightarrow \mathcal {V}\) instead of just one value in \(\mathcal {V}\). In the same way that v was chosen in Basic Paxos, by picking the value backed by the highest received ballot, in Multi-Paxos, the proposer does this calculation for each slot in the received mapping.
Phase 1a is unchanged.
Learning, as described in the last part of Fig. 1, is also unchanged, except to consider different slots separately—a process learns that a value is chosen for a slot if a quorum of acceptors accepted it for that slot.

Note that the size of messages replied by the acceptors grows as \(\mathcal {S}\) increases, which is a common abstraction before applying optimizations [16, 29].

3 Specification of Multi-Paxos

We give a formal specification of Multi-Paxos by minimally extending that of Basic Paxos by Lamport et al. [22].

Variables. The specification of Multi-Paxos has four global variables.

msgs—the set of messages that have been sent. Processes read from or add to this set. This is the same as in the specification of Basic Paxos.
accVoted—per acceptor, a set of triples in \(\mathcal {B} \times \mathcal {S} \times \mathcal {V}\), capturing a mapping in \(\mathcal {S} \rightarrow \mathcal {B} \times \mathcal {V}\), that the acceptor has voted for. This contrasts two numbers per acceptor, in two variables, maxVBal and maxVal, in Basic Paxos.
accMaxBal—per acceptor, the highest ballot seen by the acceptor. This is named maxBal in the specification of Basic Paxos.
proBallot—per proposer, the ballot of the current ballot being run by the proposer. This is not in the specification of Basic Paxos; it is added to support preemption and is only updated during preemption.

Note that in accVoted, we maintain a set of pairs in \(\mathcal {B} \times \mathcal {V}\), not just the pair with the maximum ballot. This is an abstraction that simplifies the specification and allows possible generalization of Paxos [29].

Algorithm Steps. The algorithm consists of repeatedly executing two phases.

Phase 1a. Figure 2 shows the specifications of Phase 1a for Basic Paxos and Multi-Paxos, which are in essence the same. Parameter ballot b, in Basic Paxos is replaced with proposer p executing this phase in Multi-Paxos, to allow extensions such as preemption that need to know the proposer of a ballot; uses of b are changed to proBallot[p]; and \(from \mapsto p\) is added in Send. Send is a macro that adds its argument to msgs, i.e., . In this specification, 1a messages do not have a receiver, making them accessible to all processes. However, this is not required. It is enough to send this message to any subset of \(\mathcal {A}\) that contains a quorum.
Phase 1b. Figure 3 shows the specifications of Phase 1b. Parameter acceptor a executes this phase. The only key difference between the specifications is the set accVoted[a] of triples in Send of Multi-Paxos vs. the two numbers maxVBal[a] and maxVal[a] in Basic Paxos.
Phase 2a. Figure 4 shows Phase 2a. The key difference is, in Send, the bloating of a single value v in \(\mathcal {V}\) in Basic Paxos to a set of pairs given by ProposeDecrees capturing a mapping in \(\mathcal {S} \rightarrow \mathcal {V}\) in Multi-Paxos. The operation of finding the value with the highest ballot in Basic Paxos is performed for each slot by Bmax in Multi-Paxos; Bmax takes a set T of triples capturing a mapping in \(\mathcal {S} \rightarrow \mathcal {B} \times \mathcal {V}\) and returns a set of pairs capturing a mapping in \(\mathcal {S} \rightarrow \mathcal {V}\). NewProposals generates a set of pairs capturing a mapping in \(\mathcal {S} \rightarrow \mathcal {V}\) where values are proposed for slots not in Bmax. Note that this is significantly more sophisticated than running Basic Paxos for each slot, because the ballots are shared and changing for all slots, and slots are paired with values dynamically where slots that failed to reach consensus values earlier are also detected and reused.
Phase 2b. Figure 5 shows Phase 2b. In Basic Paxos, the acceptor updates its voted pair maxVBal[a] and maxVal[a] upon receipt of a 2a message of the highest ballot. In Multi-Paxos, this is performed for each slot. The acceptor updates accVoted to have all decrees in the received 2a message and all previous values in accVoted for slots not mentioned in that message.

Complete Algorithm Specification. To complete the algorithm specification, we define vars, Init, Next, and Spec, typical TLA+ macro names for the set of variables, the initial state, possible actions leading to the next state, and the system specification, respectively:

(5)

4 Verification of Multi-Paxos

We first define the auxiliary predicates and invariants used, by extending those for the proof of Basic Paxos with slots, and then describe our proof strategy which proves Consistency of Multi-Paxos.

Auxiliary Predicates. These predicates are used throughout the proof. We define the predicate \(\phi \) in (4) by \(\phi (v, s) \equiv Chosen(v, s)\), where:

(6)

Predicate returns the highest ballot among all pairs in set D with slot s.

(7)

Type Invariants. Type invariants are captured by TypeOK.

(8)

Invariants About Messages. The following invariant is for 1b messages. The first conjunct establishes that the ballot is at most the highest ballot seen by the sending acceptor. The second conjunct states that the decrees contained within the message body have been voted for by the sending acceptor. The last conjunct asserts that for each slot, relative to the timeline established by ballots, since the last time this acceptor voted in the slot to the time this message was sent, no voting occurred on the slot by this acceptor.

(9)

Proof Strategy. The proof is developed following a standard hierarchical structure and uses proof by induction and contradiction.

(10)

where AccInv is an invariant about acceptors, and MsgInv2a and MsgInv2b are invariants for 2a and 2b messages, respectively, and these three invariants are defined in the Appendix of the full version [2].

The main theorem to prove is Consistent as defined in Eq. (10). For this, we define Inv and first prove . Then, we prove which by temporal logic, concludes . To prove , we employ a systematic proof strategy that works very well for algorithms described in the event driven paradigm, including message-passing distributed algorithms. We demonstrate the strategy for some invariants in Inv.

First, consider invariant TypeOK. The goal is . Recall . The induction basis, , is trivial, and TLAPS handles it automatically. Next, we want to prove , where the left side is the induction hypothesis, and right side is the goal to be proved. \([Next]_{vars}\) is a disjunction of phases, as for any algorithm, and \(TypeOK'\) is a conjunction of smaller invariants, as for many invariants. Now, the basis can be stripped down to each disjunct separately, and each smaller goal needs to be proved from all smaller disjuncts. This process is mechanical, and TLAPS provides a feature for precisely this expansion into smaller proof obligations. This breakdown is the first step in our proof strategy. For TypeOK, this expands to 5 smaller assertions; with 5 phases in Next, we obtain 25 small proofs done by the prover with no manual intervention.

MsgInv and AccInv are more involved. We proceed like we did for TypeOK and create a proof tree, each branch of which aims to prove an invariant for some disjunct in Next. To explain the rest of our strategy, we show one combination: MsgInv and Phase1b. Equation (11) gives the skeleton of the proof; the full proof is in the Appendix of the full version [2]. Goal for the prover is step \(\langle 4\rangle 2\) which states that \(MsgInv'\) holds if an acceptor, a, executes Phase1b. m is any message in the new set of messages, \(msgs'\). Substeps \(\langle 5\rangle 1,2,3\) focus on MsgInv1b, MsgInv2a, MsgInv2b, respectively.

Phase1b generates a 1b message. \(\langle 5\rangle 3\) is easy for the prover as it argues about 2b messages. Intuitively, \(\langle 5\rangle 2\) should be easy for the prover too since, like \(\langle 5\rangle 3\), it involves a message type that is not what Phase1b generates. However, this is not the case because of predicate SafeAt, which is used in MsgInv2a and expresses whether it is safe to accept a given value for a given ballot for a given slot (the formal definition is in the Appendix of the full version [2]). At this point the prover needs a continuity lemma.

We define a continuity lemma as a lemma which asserts that a predicate continues to hold (or not hold) as the system goes from one state to the next in a single step. For example, the continuity lemma for SafeAt states that SafeAt continues to hold for any disjunct in Next, which includes Phase1b(a). The characteristic property of such lemmas is their reuse. In our proof of Multi-Paxos, we defined 5 continuity lemmas which are asserted in 24 places.

Lastly, we need to prove \(\langle 5\rangle 1\). Since \(\langle 5\rangle 1\) asserts about 1b messages and Phase1b generates such messages, the proof is more complicated and the prover needs manual intervention. Here we split the set of messages in the new state into two: \(\langle 6\rangle 1\) for the old messages, and \(\langle 6\rangle 2\) for the increment created in this step. For the old messages, we need continuity lemmas. The most challenging is the increment. To deal with the increment, we focus on the cause of the increment—the definition of Phase1b—and treat each goal conjunct separately in \(\langle 7\rangle 1,2,3\). The prover proves \(\langle 7\rangle 1\) by just the definition of Phase1b and the fact that it is the increment. For \(\langle 7\rangle 2\), along with the definition of Phase1b, the prover also needs a continuity lemma for VotedForIn. \(\langle 7\rangle 3\) required, along with the definition of Phase1b and continuity lemmas, some problem-specific manual intervention. In this case, we helped the prover understand the change in limits of the set \(MaxVotedBallotInSlot(m.voted, s)+1..m.bal-1\).

(11)

Induction for properties over sets, and ways of accessing elements of tuples. After developing the proof using the above strategy, we were still faced with certain assertions which were difficult to prove. One of the main difficulties lay in proving properties about tuples and sets of tuples for each of a set of processes in Multi-Paxos, as opposed to scalars for each of a set of processes in Basic Paxos. It may appear that, in many places, this requires simply adding an extra parameter for the slot, but the proof became significantly more difficult: even in places where an explicit inductive proof is not needed, auxiliary facts had to be added to help TLAPS succeed or proceed faster.

For example, adding slots to the proof of theorem Consistent for Basic Paxos caused the prover to take about 90 s to check it. To aid the proof, we added \(\exists \, a \in \mathcal {A} : VotedForIn(a, v_1, b_1, s) \wedge VotedForIn(a, v_2, b_1, s)\) as an intermediary fact derivable from \(ChosenIn(v_1, b_1, s) \wedge ChosenIn(v_2, b_2, s) \wedge b_1 = b_2\). Following this, the prover asserted the conclusion \(v_1 = v_2\) in a few milliseconds.

Tuples have only a fixed number of components and therefore do not require separate inductive proofs, but they often turn out to be tricky and require special care in choosing the ways to access and test their elements, to reduce TLAPS’s proof-checking time. For example, consider the definition of VotedForIn in Eq. (6). Originally a test \([slot \mapsto s,val \mapsto v] \in m.decrees\) was written, because it was natural, but it had to be changed to \(\exists \, d \in m.decrees : d.slot = s \wedge d.val = v\), because the prover found the latter more helpful. With the original version, the proof did not carry through after 1 or 2 min. After the change, the proof proceeded quickly. One minute of waiting for such simple, small tests felt very long, making it uncertain whether the proof would carry through, even if it would in a longer time. With dozens of places like this, one also cannot afford to wait for this long at each such place.

5 Multi-Paxos with Preemption

Preemption is described informally in Lamport’s description of Basic Paxos in Fig. 1, in the paragraph about abandoning a proposal. Preemption has an acceptor reply to a proposer, in both Phases 1b and 2b, if the proposer’s ballot is stale i.e., the acceptor has seen a higher ballot than the one just received from the proposer. This reply is a hint to the proposer to increase its ballot.

To specify preemption, each of Phases 1b and 2b adds a new case for when the acceptor receives a lower ballot than some ballot it has seen before. We also define predicate Preempt that specifies how proposers update proBallot upon receiving a preemption message. Figure 6 shows Phase 1b with and without the modifications to add preemption. Modifications to Phase 2b are similar and are omitted for brevity.

Preemption adds a new phase in the variable Next, modifies definitions of existing phases, and adds a new type of message. This meant increasing the width of the proof tree for the new phase. This new branch of the proof was proven by asserting continuity lemmas already established earlier. The whole task of adding the new specification and proof took less than an hour.

6 Results of TLAPS-checked Proof

Figure 7 summarizes the results from our specification and proof.

The specification size grew by only 18 lines (16 %), from 115 lines for Basic Paxos to 133 lines for Multi-Paxos; another 23 lines are added for Preemption.
The proof size increased significantly by 763 lines (180 %), from 423 for Basic Paxos to 1106 for Multi-Paxos, due to the complex interaction between slots and ballots; only 30 more lines are added for Preemption, thanks to the reuse of all lemmas, especially continuity lemmas.
The maximum level of proof tree nodes increased from 7 to 11 going from Basic Paxos to Multi-Paxos but remained 11 after adding Preemption; this contrast is even stronger for the maximum degree of proof tree nodes, consistent with challenge of going to Multi-Paxos.
The increase in number of lemmas is due to the change from Maximum in Basic Paxos to MaxVotedBallotInSlot in Multi-Paxos, defined in Eq. (7). Five lemmas were needed for this predicate alone to aid the prover, as we moved from scalars to a set of tuples for each acceptor.
No proof by induction on set increment is used for Basic Paxos. Four such proofs are used for Multi-Paxos and for Multi-Paxos with Preemption.
Proof by contradiction is used once in the proof of Basic Paxos, and we extended it with slots in the proof of Multi-Paxos and Multi-Paxos with Premption.
The number of proof obligations to the prover increased most significantly, by 679 (284 %), from 239 for Basic Paxos to 918 for Multi-Paxos. Only another 41 proof obligations were added for Multi-Paxos with Preemption.
The proof-checking time increased significantly, by 104 s, from 24 for Basic Paxos to 128 for Multi-Paxos, despite our continuous efforts to help the prover reduce it, because of the greatly increased size and complexity of the inductions used, leading to significantly more obligations to the prover. Going to Multi-Paxos with Preemption, however, the proof-checking time decreased by about 25 %. This was initially surprising, but our understanding of Paxos and experience with proofs help support it: (1) adding the preemption cases to the original Phases 1b and 2b helps make the obligations in these cases more specialized and the remaining steps for proving consistency (which carry on longer in these cases before) easier; (2) adding preemption with Phases 1a and 2a increases the number of proof obligations, but the new obligations are easy, because they let the proposer start over (and thus there are no remaining steps in these cases). We are investigating further to confirm these.

7 Related Work and Conclusion

We discuss closest related results on verification of Paxos, categorized by the verification technique.

Model checking. Model checking automatically explores the state space of systems [6]. Lamport wrote TLA+ specifications for Basic Paxos and its variants, e.g., Fast Paxos [19], and checked them using the TLA+ model checker TLC [25], but he has not done this for Multi-Paxos or its variants; a number of MS students at our university have also done this in course projects, including for Multi-Paxos. Delano et al. [8] modeled Basic Paxos in Promela and checked it using the Spin model checker [31]. To reduce the state space, they use counting guards to track majority, reset local variables after state operations, and use sorted send instead of FIFO send (with random receive, to model non-FIFO channels). They checked Basic Paxos for pairs of numbers of proposers and acceptors up to (2,8), (3,5), (4,4), (5,3), and (8,2). Yabandeh et al. [35] checked a C++ implementation of Basic Paxos using CrystalBall, a tool built on Mace [15], which includes a model checker. Yang et al. [36] used their model checker MoDist to check a Multi-Paxos-based service system developed by a Microsoft product team [24]. With dynamic partial-order reduction [10], they found 13 bugs including 2 bugs in the Paxos implementation, with as few as 3 replicas and a few slots. In all cases, existing work in model checking either does not check Multi-Paxos or can check it for only a very small number of slots and processes.

Deductive verification. Kellomaki [14] formally specified and verified Basic Paxos using PVS [32]. Charron-Bost and Schiper [5] expressed Basic Paxos in the Heard-Of model, and Charron-Bost and Merz [4] verified it formally using Isabelle/HOL [33]. Drăgoi et al. [9] specified and verified a version of Basic Paxos in PSync, which is based on the Heard-Of model, so the specification and proof are similar to [4, 5]. Lamport et al. [22] give a formal specification of Basic Paxos in TLA+ and a TLAPS-checked proof of its correctness. Lamport [20] wrote a TLA+ specification of Byzantine Paxos, a variant of Basic Paxos that tolerates arbitrary failures, and a TLAPS-checked proof that it refines Basic Paxos. With IronFleet, Hawblitzel et al. [11] verified a state machine replication system that uses Multi-Paxos at its core. Their specification mimics TLA+ models but is written in Dafny [23], which has no direct concurrency support but has more automated proof support than TLAPS. This work is superior to its peers by proving not only safety but also liveness properties. Schiper et al. [30] used EventML [28] to specify Multi-Paxos and used NuPRL [7] to verify safety. Using the Verdi framework, Wilcox et al. [34] expressed Raft [27], an algorithm similar to Multi-Paxos, in OCAML and verified it using Coq [12]. All these works either do not handle Multi-Paxos or handle it using more restricted or less direct language models than TLA+, some mixed in large systems, making the essence of the algorithm’s proof harder to find and understand.

In contrast, our work is the first to specify the exact phases of Multi-Paxos in a most direct and general language model, TLA+, with a complete correctness proof automatically checked using TLAPS. Building on Lamport, Merz, and Doligez’s specification and proof for Basic Paxos [22], we aim to facilitate the understanding of Multi-Paxos and its proof by minimizing the difference from those for basic Paxos. We also show this as a general way for specifying and proving variants of Multi-Paxos, by doing so for Multi-Paxos extended with preemption. We also discuss the significantly more complex but necessary subproofs by induction. Future work may automate inductive proofs and support the verification of variants that improve and extend Multi-Paxos, by extending specifications of variants of Paxos, e.g., Fast Paxos [19] and Byzantine Paxos [20], to Multi-Paxos and verifying these variants of Multi-Paxos as well as Raft [27].

References

Burrows, M.: The Chubby lock service for loosely-coupled distributed systems. In: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, pp. 335–350. USENIX Association (2006)
Google Scholar
Chand, S., Liu, Y.A., Stoller, S.D.: Formal Verification of Multi-Paxos for Distributed Consensus. arXiv preprint arXiv:1606.01387 (2016)
Chandra, T.D., Griesemer, R., Redstone, J.: Paxos made live–An engineering perspective. In: Proceedings of the 26th Annual ACM Symposium on Principles of Distributed Computing, pp. 398–407 (2007)
Google Scholar
Charron-Bost, B., Merz, S.: Formal verification of a consensus algorithm in the Heard-Of model. Int. J. Softw. Inform. 3(2–3), 273–303 (2009)
Google Scholar
Charron-Bost, B., Schiper, A.: The Heard-Of model: computing in distributed systems with benign faults. Distrib. Comput. 22(1), 49–71 (2009)
Article MATH Google Scholar
Clarke Jr., E.M., Grumberg, O., Peled, D.A.: Model Checking. MIT Press, Cambridge (1999)
Google Scholar
Constable, R.L., Allen, S.F., Bromley, H.M., Cleaveland, W.R., Cremer, J.F., Harper, R.W., Howe, D.J., Knoblock, T.B., Mendler, N.P., Panangaden, P., Sasaki, J.T., Smith, S.F.: Implementing Mathematics with the Nuprl Proof Development System. Prentice-Hall, Upper Saddle River (1986)
Google Scholar
Delzanno, G., Tatarek, M., Traverso, R.: Model checking Paxos in Spin. In: Proceedings of the 5th International Symposium on Games, Automata, Logics and Formal Verification, pp. 131–146 (2014)
Google Scholar
Drăgoi, C., Henzinger, T.A., Zufferey, D.: Psync: A partially synchronous language for fault-tolerant distributed algorithms. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 400–415 (2016)
Google Scholar
Flanagan, C., Godefroid, P.: Dynamic partial-order reduction for model checking software. In: Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 110–121 (2005)
Google Scholar
Hawblitzel, C., Howell, J., Kapritsos, M., Lorch, J.R., Parno, B., Roberts, M.L., Setty, S., Zill, B.: IronFleet: proving practical distributed systems correct. In: Proceedings of the 25th Symposium on Operating Systems Principles, pp. 1–17 (2015)
Google Scholar
INRIA: The Coq Proof Assistant. http://coq.inria.fr/. (Last released January 2016)
Isard, M.: Autopilot: Automatic data center management. ACM SIGOPS Oper. Syst. Rev. 41(2), 60–67 (2007)
Article Google Scholar
Kellomäki, P.: An annotated specification of the consensus protocol of Paxos using superposition in PVS. Report 36, Institute of Software Systems, Tampere University of Technology (2004)
Google Scholar
Killian, C.E., Anderson, J.W., Braud, R., Jhala, R., Vahdat, A.M.: Mace: language support for building distributed systems. In: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 179–188 (2007)
Google Scholar
Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133–169 (1998)
Article Google Scholar
Lamport, L.: Paxos made simple. SIGACT News (Distrib. Comput. Column) 32(4), 51–58 (2001)
Google Scholar
Lamport, L.: Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley, Amsterdam (2002)
Google Scholar
Lamport, L.: Fast Paxos. Distrib. Comput. 19(2), 79–103 (2006). http://research.microsoft.com/pubs/64624/tr-2005-112.pdf
Article MATH Google Scholar
Lamport, L.: Byzantizing paxos by refinement. In: Peleg, D. (ed.) DISC 2011. LNCS, vol. 6950, pp. 211–224. Springer, Heidelberg (2011). doi:10.1007/978-3-642-24100-0_22
Chapter Google Scholar
Lamport, L.: My writings. http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#lamport-paxos. Accessed 24 Jan 2016. Lamport’s history of paper [16]
Lamport, L., Merz, S., Doligez, D.: A TLA spefication of the Paxos Consensus algorithm described in Paxos Made Simple and a TLAPS-checked proof of its correctness. file /tlapm/examples/paxos/Paxos.tla in TLAPS distribution, November 2012. http://tla.msr-inria.inria.fr/tlaps/dist/current/tlaps-1.4.3.tar.gz. Accessed 28 Nov 2014
Leino, K.R.M.: Dafny: an automatic program verifier for functional correctness. In: Clarke, E.M., Voronkov, A. (eds.) LPAR 2010. LNCS (LNAI), vol. 6355, pp. 348–370. Springer, Heidelberg (2010). doi:10.1007/978-3-642-17511-4_20
Chapter Google Scholar
Liu, X., Guo, Z., Wang, X., Chen, F., Lian, X., Tang, J., Wu, M., Kaashoek, M.F., Zhang, Z.: D3S: debugging deployed distributed systems. In: Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation, pp. 423–437. USENIX Association (2008)
Google Scholar
Microsoft Research: The TLA Toolbox. http://research.microsoft.com/en-us/um/people/lamport/tla/toolbox.html. Accessed 4 Jan 2016
Microsoft Research-Inria Joint Center: TLA+ Proof System (TLAPS). http://tla.msr-inria.inria.fr/tlaps/. (Last released June 2015)
Ongaro, D., Ousterhout, J.: In search of an understandable consensus algorithm. In: 2014 USENIX Annual Technical Conference (USENIX ATC 14), pp. 305–319. USENIX Association (2014). http://www.usenix.org/conference/atc14/technical-sessions/presentation/ongaro
PRL Project: EventML. http://www.nuprl.org/software/#WhatisEventML. Accessed 21 Sep 2012
van Renesse, R., Altinbuken, D.: Paxos made moderately complex. ACM Comput. Surv. 47(3), 1–36 (2015)
Article Google Scholar
Schiper, N., Rahli, V., van Renesse, R., Bickford, M., Constable, R.L.: Developing correctly replicated databases using formal tools. In: Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp. 395–406. IEEE CS Press (2014)
Google Scholar
Spin Community: Verifying Multi-threaded Software with Spin. http://spinroot.com/spin/whatispin.html. (Last released January 1, 2016)
SRI: PVS Specification and Verification System. http://pvs.csl.sri.com/. (Last released February 11, 2013)
University of Cambridge: Isabelle (a generic proof assistant). http://isabelle.in.tum.de/. (Last released May 25, 2015)
Wilcox, J.R., Woos, D., Panchekha, P., Tatlock, Z., Wang, X., Ernst, M.D., Anderson, T.: Verdi: A framework for implementing and formally verifying distributed systems. In: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 357–368 (2015)
Google Scholar
Yabandeh, M., Knezevic, N., Kostic, D., Kuncak, V.: CrystalBall: predicting and preventing inconsistencies in deployed distributed systems. In: Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation, pp. 229–244. USENIX Association (2009)
Google Scholar
Yang, J., Chen, T., Wu, M., Xu, Z., Liu, X., Lin, H., Yang, M., Long, F., Zhang, L., Zhou, L.: MoDist: transparent model checking of unmodified distributed systems. In: Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation, pp. 213–228. USENIX Association (2009)
Google Scholar

Download references

Author information

Authors and Affiliations

Computer Science Department, Stony Brook University, Stony Brook, New York, 11794, USA
Saksham Chand, Yanhong A. Liu & Scott D. Stoller

Authors

Saksham Chand
View author publications
You can also search for this author in PubMed Google Scholar
Yanhong A. Liu
View author publications
You can also search for this author in PubMed Google Scholar
Scott D. Stoller
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Saksham Chand .

Editor information

Editors and Affiliations

Newcastle University, Newcastle upon Tyne, United Kingdom
John Fitzgerald
US Naval Research Laboratory, Washington, District of Columbia, USA
Constance Heitmeyer
ISTI-CNR, Pisa, Italy
Stefania Gnesi
University of Cyprus, Nicosia, Cyprus
Anna Philippou

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Chand, S., Liu, Y.A., Stoller, S.D. (2016). Formal Verification of Multi-Paxos for Distributed Consensus. In: Fitzgerald, J., Heitmeyer, C., Gnesi, S., Philippou, A. (eds) FM 2016: Formal Methods. FM 2016. Lecture Notes in Computer Science(), vol 9995. Springer, Cham. https://doi.org/10.1007/978-3-319-48989-6_8

Download citation

DOI: https://doi.org/10.1007/978-3-319-48989-6_8
Published: 08 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48988-9
Online ISBN: 978-3-319-48989-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Formal Verification of Multi-Paxos for Distributed Consensus

Abstract

Similar content being viewed by others

Simpler Specifications and Easier Proofs of Distributed Algorithms Using History Variables

Paxos Consensus, Deconstructed and Abstracted

Eventual consensus in Synod: verification using a failure-aware actor model

1 Introduction

2 Distributed Consensus and Paxos

3 Specification of Multi-Paxos

4 Verification of Multi-Paxos

5 Multi-Paxos with Preemption

6 Results of TLAPS-checked Proof

7 Related Work and Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Formal Verification of Multi-Paxos for Distributed Consensus

Abstract

Similar content being viewed by others

Simpler Specifications and Easier Proofs of Distributed Algorithms Using History Variables

Paxos Consensus, Deconstructed and Abstracted

Eventual consensus in Synod: verification using a failure-aware actor model

1 Introduction

2 Distributed Consensus and Paxos

3 Specification of Multi-Paxos

4 Verification of Multi-Paxos

5 Multi-Paxos with Preemption

6 Results of TLAPS-checked Proof

7 Related Work and Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation