Abstract
It was recently proved that the causality or the happens before relation between events in an asynchronous distributed system cannot be detected in the presence of Byzantine processes [Misra and Kshemkalyani, NCA 2022]. This result holds for the multicast, unicast, and broadcast modes of communication. This prompts us to examine whether the causality detection problem can be solved in synchronous systems in the presence of Byzantine processes. We answer this in the affirmative by outlining two approaches. The first approach uses Replicated State Machines (RSM) and vector clocks. Another approach is based on a transformation from Byzantine failures to crash failures for synchronous systems.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
The “happens before” or the causality relation, denoted \(\rightarrow \), between events in a distributed system was defined by Lamport [6]. Given two events e and \(e'\), the causality detection problem asks to determine whether \(e\rightarrow e'\).
There is a rich literature on solutions for solving the causality detection problem between events. See [4, 5, 9, 15, 17] for an overview of some approaches such as tracking causality graphs, scalar clocks, vector clocks [3, 8], and variants of logical clocks such as hierarchical clocks, plausible clocks, dotted version vectors, Bloom clocks, interval tree clocks and resettable encoded vector clocks. Some of these variants track causality accurately while others introduce approximations as trade-offs to save on the space and/or time and/or message overheads. Schwarz and Mattern [17] stated that the quest for the holy grail of the ideal causality tracking mechanism is on. This literature above assumed that processes are correct (non-faulty). The causality detection problem for a system with Byzantine processes was recently introduced and studied in [11].
A related problem is the causal ordering of messages. Under the Byzantine failure model, causal ordering has recently been studied in [10, 12, 13].
Contributions. It was recently proved that the problem of detecting causality between a pair of events cannot be solved in an asynchronous system in the presence of Byzantine processes, irrespective of whether the communication is via unicasts, multicasts, or broadcasts [11]. In the multicast mode of communication, each send event sends a message to a group consisting of a subset of the set of processes in the system. Different send events can send to different subsets of processes. Communicating by unicasts and communicating by broadcasts are special cases of multicasting. It was shown in [11] that in asynchronous systems with even a single Byzantine process, the unicast and multicast modes of communication are susceptible to false positives and false negatives, whereas the broadcast mode of communication is susceptible to false negatives but no false positives. A false positive means that \(e\not \rightarrow e'\) whereas \(e\rightarrow e'\) is perceived/detected. A false negative means than \(e\rightarrow e'\) whereas \(e\not \rightarrow e'\) is perceived/detected. The impossibility result for asynchronous systems prompts us to examine whether the causality detection problem can be solved in synchronous systems in the presence of Byzantine processes. We answer in the affirmative for unicasts, multicasts, and broadcasts by outlining two approaches in this brief announcement. The results are summarized in Table 1.
2 System Model
The distributed system is modelled as an undirected complete graph \(G = (P,C)\). Here P is the set of processes communicating in the distributed system. Let \(|P|=n\). C is the set of (logical) FIFO communication links over which processes communicate by message passing. The processes may be Byzantine [7, 14]. The distributed system is assumed to be synchronous [2].
Let \(e^x_i\), where \(x \ge 1\), denote the x-th event executed by process \(p_i\). An event may be an internal event, a message send event, or a message receive event. Let the state of \(p_i\) after \(e^x_i\) be denoted \(s^x_i\), where \(x\ge 1\), and let \(s^0_i\) be the initial state. The execution at \(p_i\) is the sequence of alternating events and resulting states, as \(\langle s^0_i,e^1_i,s^1_i,e^2_i, s^2_i\ldots \rangle \). The sequence of events \(\langle e^1_i,e^2_i,\ldots \rangle \) is called the execution history at \(p_i\) and denoted \(E_i\). Let \(E=\bigcup _i\{E_i\}\) and let T(E) denote the set of all events in (the set of sequences) E. The happens before [6] relation, denoted \(\rightarrow \), is an irreflexive, asymmetric, and transitive partial order defined over events in a distributed execution that is used to define causality. The causal past of an event e is denoted as CP(e) and defined as the set of events \(\{e'\in T(E)\,|\, e'\rightarrow e\}\).
3 Problem Formulation and a Brief Overview of Solutions
The problem formulation is analogous to that in [11]. An algorithm to solve the causality detection problem collects the execution history of each process in the system and derives causal relations from it. \(E_i\) is the actual execution history at \(p_i\). For any causality detection algorithm, let \(F_i\) be the execution history at \(p_i\) as perceived and collected by the algorithm and let \(F=\bigcup _i\{F_i\}\). F thus denotes the execution history of the system as perceived and collected by the algorithm. Analogous to T(E), let T(F) denote the set of all events in F. Analogous to the definition of \(\rightarrow \) on T(E) [6], the happens before relation can be defined on T(F) instead of on T(E).
Let \(e1\rightarrow e2|_E\) and \(e1\rightarrow e2|_F\) be the evaluation (1 or 0) of \(e1\rightarrow e2\) using E and F, respectively. Byzantine processes may corrupt the collection of F to make it different from E. We assume that a correct process \(p_i\) needs to detect whether \(e^x_h \rightarrow e^*_i\) holds and \(e^*_i\) is an event in T(E). If \(e^x_h \not \in T(E)\) then \(e^x_h\rightarrow e^*_i|_E\) evaluates to false. If \(e^x_h \not \in T(F)\) (or \(e^*_i \not \in T(F)\)) then \(e^x_h\rightarrow e^*_i|_F\) evaluates to false. We assume an oracle that is used for determining correctness of the causality detection algorithm; this oracle has access to E which can be any execution history such that \(T(E)\supseteq CP(e^*_i)\).
Byzantine processes may collude as follows.
-
1.
To delete \(e^x_h\) from \(F_h\) or in general, record F as any alteration of E such that \(e^x_h\rightarrow e^*_i|_F = 0\), while \(e^x_h\rightarrow e^*_i|_E = 1\), or
-
2.
To add a fake event \(e^x_h\) in \(F_h\) or in general, record F as any alteration of E such that \(e^x_h\rightarrow e^*_i|_F = 1\), while \(e^x_h\rightarrow e^*_i|_E = 0\).
Without loss of generality, we have that \(e^x_h \in T(E)\cup T(F)\). Note that \(e^x_h\) belongs to \(T(F)\setminus T(E)\) when it is a fake event in F.
Definition 1
The causality detection problem \(CD(E,F,e^*_i)\) for any event \(e^*_i\in T(E)\) at a correct process \(p_i\) is to devise an algorithm to collect the execution history E as F at \(p_i\) such that \(valid(F)=1\), where
When 1 is returned, the algorithm output matches the actual (God’s) truth and solves CD correctly. Thus, returning 1 indicates that the problem has been solved correctly by the algorithm using F. 0 is returned if either
-
\(\exists e^x_h\) such that \(e^x_h\rightarrow e^*_i|_E = 1 \wedge e^x_h\rightarrow e^*_i|_F = 0\) (denoting a false negative), or
-
\(\exists e^x_h\) such that \(e^x_h\rightarrow e^*_i|_E = 0 \wedge e^x_h\rightarrow e^*_i|_F = 1\) (denoting a false positive).
In our first solution, we use the Replicated State Machine (RSM) approach [16] and vector clocks in the algorithm for causality detection. We can show that F at a correct process can be made to exactly match E, hence there is no possibility of a false positive or of a false negative. The RSM approach works only in synchronous systems. In a system with n application processes, the RSM-based solution uses \(3t+1\) process replicas per application process, where t is the maximum number of Byzantine processes that can be tolerated in a RSM. Thus, there can be at most nt Byzantine processes among a total of \((3t+1)n\) processes partitioned into n RSMs of \(3t+1\) processes each, with each RSM having up to t Byzantine processes. By using \((3t+1)n\) processes and the RSM approach to represent n application processes, the malicious effects of Byzantine process behaviors are neutralized.
Another approach is as follows. A generic transformation from Byzantine failures to crash failures for synchronous systems can be applied [1], this requires \(t< n/3\). The possibility of correct Byzantine-tolerant causality detection would be implied by the possibility of correct crash-tolerant causality detection.
References
Bazzi, R.A., Neiger, G.: Simplifying fault-tolerance: providing the abstraction of crash failures. J. ACM 48(3), 499–554 (2001). https://doi.org/10.1145/382780.382784
Dwork, C., Lynch, N.A., Stockmeyer, L.J.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288–323 (1988). https://doi.org/10.1145/42282.42283
Fidge, C.J.: Logical time in distributed computing systems. IEEE Comput. 24(8), 28–33 (1991). https://doi.org/10.1109/2.84874
Kshemkalyani, A.D.: The power of logical clock abstractions. Distrib. Comput. 17(2), 131–150 (2004). https://doi.org/10.1007/s00446-003-0105-9
Kshemkalyani, A.D., Shen, M., Voleti, B.: Prime clock: encoded vector clock to characterize causality in distributed systems. J. Parallel Distrib. Comput. 140, 37–51 (2020). https://doi.org/10.1016/j.jpdc.2020.02.008
Lamport, L.: Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21(7), 558–565 (1978)
Lamport, L., Shostak, R.E., Pease, M.C.: The Byzantine generals problem. ACM Trans. Program. Lang. Syst. 4(3), 382–401 (1982)
Mattern, F.: Virtual time and global states of distributed systems. In: Parallel and Distributed Algorithms, pp. 215–226. North-Holland (1988)
Misra, A., Kshemkalyani, A.D.: The bloom clock for causality testing. In: Goswami, D., Hoang, T.A. (eds.) ICDCIT 2021. LNCS, vol. 12582, pp. 3–23. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65621-8_1
Misra, A., Kshemkalyani, A.D.: Causal ordering in the presence of Byzantine processes. In: 28th IEEE International Conference on Parallel and Distributed Systems (ICPADS), pp. 130–138. IEEE (2022). https://doi.org/10.1109/ICPADS56603.2022.00025
Misra, A., Kshemkalyani, A.D.: Detecting causality in the presence of Byzantine processes: there is no holy grail. In: 21st IEEE International Symposium on Network Computing and Applications (NCA), pp. 73–80 (2022). https://doi.org/10.1109/NCA57778.2022.10013644
Misra, A., Kshemkalyani, A.D.: Solvability of Byzantine fault-tolerant causal ordering problems. In: Koulali, M.A., Mezini, M. (eds.) NETYS 2022. LNCS, vol. 13464, pp. 87–103. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-17436-0_7
Misra, A., Kshemkalyani, A.D.: Byzantine fault-tolerant causal ordering. In: 24th International Conference on Distributed Computing and Networking (ICDCN), pp. 100–109. ACM (2023). https://doi.org/10.1145/3571306.3571395
Pease, M.C., Shostak, R.E., Lamport, L.: Reaching agreement in the presence of faults. J. ACM 27(2), 228–234 (1980). https://doi.org/10.1145/322186.322188
Pozzetti, T., Kshemkalyani, A.D.: Resettable encoded vector clock for causality analysis with an application to dynamic race detection. IEEE Trans. Parallel Distrib. Syst. 32(4), 772–785 (2021). https://doi.org/10.1109/TPDS.2020.3032293
Schneider, F.B.: Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Comput. Surv. 22(4), 299–319 (1990)
Schwarz, R., Mattern, F.: Detecting causal relationships in distributed computations: in search of the holy grail. Distrib. Comput. 7(3), 149–174 (1994)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Misra, A., Kshemkalyani, A.D. (2023). Brief Announcement: Byzantine-Tolerant Detection of Causality in Synchronous Systems. In: Dolev, S., Schieber, B. (eds) Stabilization, Safety, and Security of Distributed Systems. SSS 2023. Lecture Notes in Computer Science, vol 14310. Springer, Cham. https://doi.org/10.1007/978-3-031-44274-2_5
Download citation
DOI: https://doi.org/10.1007/978-3-031-44274-2_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44273-5
Online ISBN: 978-3-031-44274-2
eBook Packages: Computer ScienceComputer Science (R0)