Gossip based fault tolerant protocol in distributed transactional memory using quorum based replication system

Chatterjee, Moumita; Mitra, Anirban; Roy, Sudipta; Setua, Sanjit Kumar

doi:10.1007/s10586-019-02973-7

Gossip based fault tolerant protocol in distributed transactional memory using quorum based replication system

Published: 03 September 2019

Volume 23, pages 1103–1124, (2020)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Cluster Computing Aims and scope Submit manuscript

Gossip based fault tolerant protocol in distributed transactional memory using quorum based replication system

Download PDF

Moumita Chatterjee¹,
Anirban Mitra²,
Sudipta Roy ORCID: orcid.org/0000-0001-5161-9311³ &
…
Sanjit Kumar Setua¹

324 Accesses
4 Citations
Explore all metrics

Abstract

Single copy Distributed Software Transactional Memory protocol maintains only one replica of each object in the system and is therefore prone to failures in large scale dynamically changing network. In this paper we propose a replication model using quorum system for transactional memory protocol where communication among the nodes takes place using gossip. The previous protocols demand a static structure over the network. Maintenance of a static structure for a dynamic network requires a significant overhead. Our method executes on an unstructured network which does not require adaption in case of node joining and node leaving. The algorithm maintains the coherence of the objects and aims to achieve low communication cost while reducing execution time of the transactions. The algorithm achieves a message complexity of $ {\text{O}}\left( {\sqrt n } \right) $ and time complexity of $ {\text{O}}\left( {\log \sqrt n } \right) $ which is an improvement over previous replication protocols for distributed transactional memory. Simulation results shows that the method exhibits better fault tolerance and requires less number of messages than existing approaches.

A Distributed Transactional Memory Protocol for Dynamic Networks

A Hybrid Quorum Protocol for Improved Availability, Capacity, Load and Reduced Overhead

Article 21 June 2016

CaLibRe: A Better Consistency-Latency Tradeoff for Quorum Based Replication Systems

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Software Transactional memory is a concurrent synchronization model for reducing the drawbacks of lock-based synchronization mechanisms. Transactional memory is composed of modules, called transactions, consisting of a series of steps to execute shared memory reads and writes atomically. Transactions can be either abort by detecting conflict while accessing shared variables or commit when no conflict is detected. On detecting a conflict, a conflict detection module is used for solving the conflict. Transactional memory support for both software and hardware levels are the focus of several recent research efforts. The complications of synchronization in distributed lock based systems have instigated the distributed version of software transactional memory (D-STM).

Distributed Software Transactional Memory is classified into two types depending on the system design: cache coherent D-STM and cluster D-STM. In the cache coherent D-STM, a metric space network is assumed where the nodes communicate using message passing, whereas a cluster D-STM comprises of a set of clusters where each cluster has several processors that works together functioning like a single processor.

In this work we consider distributed cache coherence D-STM where the transactions execute at a particular node (immobile) and objects are forwarded to the nodes which require them [1]. In this model only one copy of the object is placed in the network. A transaction running at a node generates a read (write) request for accessing an object. The D-STM proxy of that node checks if the requested object is present at the node or not. If the node contains the object, D-STM proxy grants the request by giving the object to the transaction otherwise it uses the D-STM protocol to find the object. After a remote node receives the request, its D-STM proxy checks if the object is currently being used by any local transaction or is free. In the first case, D-STM proxy may allow the local transaction a possibility to commit by delaying or aborting the remote transaction or it may abort the local transaction and grants the object to the requesting transaction. Otherwise, D-STM protocol grants the request of the remote transaction. Consequently, some coherence protocols are required for maintaining the consistency of shared objects. Such protocols are mostly responsible for finding and moving objects in distributed network. Examples of such protocols include Combine [2], Spiral [3], Relay [4], Arrow [5].

In the single copy D-STM protocol described above, only one copy of each object is placed in the network. In large scale dynamically changing system, the set of processors which are available for computation may change or become unavailable because of failures. During a computation, if node failure occurs, the D-STM messages locating an object may be lost and the transactions that are participating in the computation at that time would not commit. Thus the single copy model of D-STM protocol cannot tolerate node failures. So, it is required to maintain multiple replicas of each object to attain high availability in the event of node failures.

In this work, we propose QS-TRAIL a quorum based replication model for the cache coherence D-STM model that provides fault tolerance in a large scale dynamic network with node failures. A transaction contacts a read (write) quorum for performing a read (write) operation. The intersection between a read quorum and a write quorum must be nonempty. This intersection property ensures that a write operation in one quorum followed by a read in another quorum is observed by at least one node and that node can provide the latest replica of the object to the requesting transaction. Thus the reliability of the system can be preserved by using the QS-TRAIL our proposed model for read and write operation. The quorum system in QS-TRAIL our proposed protocol is not dependent on any fixed structure. Use of unstructured overlay has many advantages. It is not required to re adapt the structure at each dynamic event which makes it flexible. There is also no need for failure detection.

The rest of the paper is organized as follows: Sect. 2 describes the related work. The contribution and objectives of the work is presented in Sect. 3. Section 4 describes the system model and scope of work. The preliminary definitions are given in Sect. 5. Section 6 describes the proposed work. Section 7 presents the proof of correctness, complexity analysis and performance analysis. Section 8 concludes the paper.

2 Related work

We review some of the distributed transactional memory protocols along with our contribution in Table 1 and compare our analysis with them.

Table 1 Comparison of different DTM protocols

Full size table

A large number of research works [1,2,3,4,5,6, 8, 12,13,14] have focused on implementing transactional memory in large scale distributed network systems. The distributed implementation of transactional memory is divided into three categories—data flow, control flow, hybrid flow. In data flow approach [1,2,3,4,5,6, 8, 12, 13] objects are mobile, that is they move from one node to another while the transactions are immobile. The movement of the objects is governed by the transactions that are executing at different nodes in the network. In control flow approach [6, 7, 14, 10] objects resides on the nodes and the transactions move to the nodes containing the required objects. This method is invoked as RPC on that node. Control flow approaches are useful when objects are called infrequently or objects may be immobile due to its size, dependencies etc. Hybrid flow [12, 14] combines the data flow and control flow approach. The movement of objects in the network affects the completion time of the transactions executing on the nodes in the network. In [15, 16] the authors studied scheduling algorithms for minimizing the execution time and the communication cost of the transactions. Communication cost minimization have also been studied in many papers [1, 3, 4] for distributed transaction memory problem involving one shared object.

Single copy data flow model is intrinsically vulnerable in networks where the set of nodes available for computation and the links may change or become unavailable due to failures. When node failures occur, the objects placed at the nodes are no longer accessible and all transactions requesting that particular objects will not commit. Single copy D-STM protocols such as Ballistic [1] and Relay [4] uses a tree structure to create a logical fifo link between the nodes. These logical links are assumed to be reliable against link failures. Combine [2] protocol can endure fractional link failures and non fifo arrival of messages by maintaining an overlay tree between the nodes. However Combine cannot tolerate network partitioning in which some objects may become unreachable from transactions outside the partition. TRAIL [8] protocol can permit node failures by using gossip for communication among the nodes. However TRAIL cannot tolerate failure of the node containing the single copy of the object. Thus single copy D-STM model is inappropriate for a network with link and node failures. Single copy control flow model like Snake Protocol [6] and RMI-DSTM [7] maintain a single copy of each object in the network and hence cannot tolerate node or link failures.

Replication is an extremely promising technique to provide fault tolerance and reliability in D-STM systems where multiple replicas of the objects are present. Recent works on transactional memory focuses on implementing transactional memory through replication and multi versioning [9, 10, 12, 14, 17,18,19,20,21]. These solutions require message passing as means of communication for maintaining consistency among the different replicas of the objects and presume the communication cost among the nodes to be uniform. In [22] the authors provided a partial replication approach that provides scalability gain while reducing the amount of data that is to be stored at each node [9, 21]. Can maintain multiple copies of an object in the network while maintaining communication cost which increases linearly with the number of objects.

Zhang et al. [11] proposed the QR model, quorum based replication protocol for D-STM model, where multiple replicas of each object are distributed over several nodes. QR model uses the FLOODING protocol to build a quorum over the nodes. QR model ensures fault tolerance property in a failure prone network. The model allows transactions to perform concurrent reads and ensures that the consistencies among the replicas of the objects are maintained at the time of commit of the transactions. In [10], Hirve et al. proposed the Hiper TM model where object copy is replicated over all the nodes in the network to avoid remote access. The model allows concurrent reads and maintains consistency during write operations. It uses a strict quorum system to maintain consistency among the replicas.

In this work, we propose the QS-TRAIL (Quorum System-TRAIL) protocol which provides fault tolerance property to the D-STM protocol using replication. QS-TRAIL builds on TRAIL [8]. Some of the aspects of QS-TRAIL protocol are same as that of TRAIL. Similar to TRAIL [8], the QS-TRAIL protocol executes on an unstructured network. Similar to [11], QS-TRAIL protocol uses a quorum based replication system however the quorum system used in our protocol does not follow a logical structure. The communication cost of QS-TRAIL protocol is not dependent on the layout of the underlying network and quorum system and this makes the protocol flexible. Our protocol also allows multiple transactions to read an object concurrently. Thus QS-TRAIL protocol is an efficient solution to the support single copy D-STM protocol in a network with node failures.

3 Contribution of the work

Our objective in this work is to improve the effectiveness of the solutions for the single copy D-STM protocol TRAIL [8] in a network with node failures and to prove the usefulness of this new method. The previous protocol [11] demands a static tree structure over the network. Maintenance of a static structure for a dynamic network requires a significant overhead. QS-TRAIL protocol executes on an unstructured network. We do not have to readapt an unstructured network for dynamic events like node leaving and joining. Our proposed method also uses gossip as an unstructured communication mode for communicating among the nodes.

The major contributions of our work are as follows:

1.
We present a new algorithm using gossip for providing fault tolerance in distributed transactional memory.
2.
Our algorithm uses probabilistic quorum system to ensure uniformity among the different replicas of the object.
3.
Our algorithm has a message complexity of $ {\text{O}}\left( {\sqrt n } \right) $ and time complexity of $ {\text{O}}\left( {\log \sqrt n } \right) $ which improves on the previous best known algorithm [11] where n is the number of nodes.
4.
We demonstrate that our algorithm has better fault tolerance property than the previous best known solutions.

4 System model and scope of work

4.1 System model

The system consists of n distinct nodes communicating among themselves through message passing links. The system is dynamic in the sense that at every unit of time, some nodes enters and leaves the system. We consider a failure prone network where any node can fail and recover independently and the communication links among the nodes may also fail while delivering messages. We also presume network partitioning where nodes and link may fail concurrently and nodes in different partitions may become unreachable to each other.

4.2 Distributed transactions

Each node can execute a transaction. Transactions are composed of a series of steps consisting of read and write requests for an object. Transaction can either abort or commit depending on whether conflict is detected or not. A transaction goes through three states: active, committed or aborted. A transaction that is executing and accessing shared variables is considered to be active. Let us consider Ti and Tj to be two active transactions that are requesting the same object O and one of Ti and Tj is a write operation. Then Ti and Tj conflict on object O. Transactions conflict in three cases:

(i)
Ti is a read and Tj is a write.
(ii)
Ti is a write and Tj is a read.
(iii)
Both Ti and Tj are writes.

If conflicts occur, the contention manager resolves the conflict by delaying or aborting one of the conflicting transactions. The contention manager considered in this work assigns higher priorities to transactions with earlier timestamps [23].

4.3 TRAIL protocol

In [8] we proposed the TRAIL protocol which is a D-STM protocol which realizes a distributed transaction memory using gossip for shared objects on a dynamic network. TRAIL is the first DTM protocol to run on an unstructured network. A DTM proxy module is present in each node for providing interfaces to local transactions and the DTM proxy modules of other nodes. A transaction issuing a read or write operation, requests the DTM proxy to grant the shared object. The DTM proxy verifies if the object is present locally at that node. If the object is present, it is granted to the transaction otherwise the lookup (move) operation of TRAIL protocol is used for locating the object.

4.4 Limitations of TRAIL protocol

Although TRAIL protocol allows several read-only replicas of an object to be present in the system, the protocol does not provide any mechanism to preserve consistency over all the replicas (both read-only and writable). Consider two nodes N_x and N_y that issues transactions T_x and T_y at same time t where T_x consists of operations read (O) and T_y consists of operations write (O). Clearly, T_x and T_y conflict on O. If T_x and T_y operate on different replicas of the objects then the contention managers at N_x (N_y) are unable to detect the conflicts. To update the objects about the transactions that are accessing it, T_x (T_y) must forward messages to all replicas of the objects. Unfortunately, in TRAIL protocol such mechanism requires high communication overhead. In addition, the contention manager of the TRAIL protocol may still make a erroneous decision if a conflict is detected between the duration when T_x (T_y) terminated and the conflicting replica gets the information due to high communication delay. Due to these difficulties, TRAIL protocol supports limited concurrency in read operations and each read/write operation are considered as write operations where move operation is used to obtain the objects.

TRAIL protocol grants requests in accordance to the timestamps of request generation. When two remote transactions are requesting an object simultaneously, the object will be sent first to the transaction that is closer to it and afterward to the farther transaction. In this way TRAIL protocol maintains locality by routing requests efficiently. However transactions do not normally initiate concurrently and it is probable for a farther transaction to request the object at an earlier timestamp. In such cases, TRAIL protocol is not able to route the requests efficiently as TRAIL overlook locality in trying to monitor the single replica of the object. QS-TRAIL protocol uses locality to minimize communication cost and enhance performance of TRAIL protocol.

5 Preliminary definitions

Definition 1

(Quorum system) A quorum system over a set of nodes is defined as a subset of nodes or processes.

A quorum is defined formally as follows:

Let A = {a0, a1,…an} (n ≥ 1) be a set. A set system Q_A is a quorum system over A iff.

$$ \left( {\text{Intersection}} \right)\,\forall\,Q_{1} \cap Q_{2 } \in Q_{A } : Q_{1} \cap Q_{2} \ne \emptyset . $$

(1)

Definition 2

(Probabilistic quorum system (PQS)) Let Q be a quorum and let W(n) is the access strategy of Q to access n nodes and let 0 < ε < 1 is given. A tuple $ \left( {Q,W\left( n \right)} \right) $ is a PQS if for any quorums $ Q_{a} $, $ Q_{b} \in Q $ which are accessed with strategy $ {\text{W}}(Q_{a} ) $ and $ {\text{W}}(Q_{b} ) $ we have:

$$ \forall Q_{a} ,Q_{b} \in Q \Pr\,\left[ {\left( {Q_{a} \cap Q_{b} } \right) \ne \emptyset } \right] \ge 1 - \varepsilon . $$

(2)

The probabilistic quorums used in our algorithm are sets of size $ {\rm{k}}\sqrt n $. The constant k is selected so as to make the probability of intersection of two random quorums sufficiently high. Let us chose two quorums Q_a and Q_b uniformly at random, each of size $ {\rm{k}}\sqrt n $. Then probability of non intersection is $ \Pr [Q_{a} \cap Q_{b} = \emptyset ] < e^{{ - k^{2} }} $ [24]. Quorums having size $ {\rm{k}}\sqrt n $ ensure intersection with high probability—for two quorums the most probable and expected size of intersection is k², thus making k adequately high, we can decrease the probability of non intersection of two quorums to empty.

Definition 3

(Access strategy of PQS) An access strategy w specifies an algorithm in which a transaction T trying to contact a probabilistic quorum communicates its requests. According to Malkhi et al. [24], w for a set Q specifies a probability distribution on the elements of Q, i.e., w: Q → {0, 1} satisfies

$$ \mathop \sum \limits_{{Q \in Q_{A} }} w\left( Q \right) = 1. $$

(3)

The access strategy defines which nodes are to be contacted during a read/write operation to access a quorum set. In our model the access strategy is probabilistic and the participants of the quorum are chosen at random.

To access the participants of the quorum randomly a membership protocol is modelled. The protocol uses gossip for membership management. A middleware called the peer sampling service creates a local partial view of each node which is stored in its peer list. Through this peer sampling service each node periodically updates its peer list and computes its set of neighbors. Each node maintains an entry for each neighbor containing the identifier of the neighbor and age of the entry. During each peer list update, each node selects the oldest neighbor and communicates with it to update its peer list. The age of the neighbor denotes the last time a gossip message was received from the neighbor for peer sampling; this is used for detection of failed node from the peer list. During algorithm execution, a node selects its neighbors randomly from the peer list before sending the gossip message to it.

6 The proposed algorithm

6.1 The main idea

The QS-TRAIL protocol is a quorum based protocol for supporting replication in TRAIL protocol where several replicas of each object are distributed over several nodes. A node initiating a transaction (initiator node) issues a read (write) operation on an object and contacts the read (write) quorum to access the object. On completion of the operations, the QS-TRAIL protocol verifies the coherence of the replicas of the objects before it allows the transaction to commit. If a conflict is detected between two transactions that are accessing the same object, then one of the transactions is aborted, otherwise the transaction is permitted to commit.

Figure 1 illustrates our proposed protocol on a data flow DTM with five transactions executing on five transactions and object O whose replicas are present at node n2 and n4. Node n2 has an older version of O and n4 has a newer version of O. In Fig. 1a T1 and T5 sends request for O along its edges. n2 receives request from T5 first followed by T1 and stores the requests in that order. After receiving responses from n2 and n4, T1 and T5 accept newer version of O. When the transactions sends commit request node n4 detects conflicts among T1 and T5. It aborts T1 and sends a commit message to T5. This scenario is depicted in Fig. 1b.

QS-TRAIL protocol has the following advantages. The algorithm must achieve scalability and low message overhead. The algorithm should be efficient such that it provides maximum concurrency, low waiting time and high locality for transactions. To realize this, the probabilistic quorum system is used which does not rely on any fixed structure which makes our algorithm flexible. QS-TRAIL protocol communicates with the quorum members using gossip protocol which is robust against link and node failures, scalable and fault tolerant and do not require any error recovery mechanism.

6.2 Object model used in the algorithm

Let O be an object with an initial value. Each node maintains its own replica of the object.
Each replica consists of three fields:
val The value field called representing the value of the object (Initialized to v).
tag A tag field representing the version number of its corresponding value val (Initialized to 0.)
status Status field for recording the status of object copy. The status field is set to protected whenever a transaction is trying to update O.(Initialized to false).
Initially q nodes maintain the default replica with value v and the object values in the other nodes have the val field set to NULL.
Each replica maintains the following data structures:
RR(O) (read request list) maintains the records of all transactions that requests that particular replica of O for reading.
WR(O) (write request list) maintains a list of all transactions that requests that particular replica of O for writing.
Any node (including nodes belonging to a quorum) can access O by using read or write operations (to modify the value of the object).

6.3 Local data structures used by each transaction T_i

Each transaction T_i maintains the following local variables:

Request queue List of transactions that are trying to read the object requested by T_i.This list is sent to T_i by the nodes in the quorum along with the GRANT message.
Beneficiary list List of transactions that are presently using the quorum of T_i.
Quorum membership (Q) The quorum to which T_i currently belongs.
Readset Set of objects T_i has presently read.
Writeset Set of objects presently modified by T_i at its local node.

6.4 Data structures at each node N_i

Each node maintains the following data structures:

Peerlist Set of neighbors of node N_i chosen by the gossip protocol.
CT list (conflicting transactions list) List of transactions in N_i’s RR and WR list that conflicts with the transaction requesting to commit.
AT list (aborted transaction list) List of transactions that are to be aborted.

6.5 Additional parameters at initiator node

initiatorid Identifier of the node initiating the request.
sessionid Identifier of the present gossip session.
transid Identifier of the transaction initiating the request.
ttl specifies the time to live field. It is initialized to h where h is the disseminating bound indicating depth of propagation of gossip messages. ttl is decremented at each node participating in the propagation of gossip messages. If ttl = 0, the dissemination is complete.
s The parameter s is used to represent the number of neighboring nodes that each intermediate nodes contact during dissemination.
q q represents the minimum number of nodes to be contacted during a request propagation. The two parameters s and h together represent the total number of nodes contacted during the gossip propagation which is approximating $ \frac{{s^{h + 1} - 1}}{s - 1} $ and represents a tree of depth h and degree s + 1. These two parameters are selected so as to make the number of nodes contacted during propagation to be greater than q.
neighborsent Set of neighbors to which the read or write request message is sent.(Initialized to NULL)
receivedfromnodes Set of nodes which have responded to the request message.
alreadyreceived Flag value to denote if a node has already received message in the current session.
numk count of the number of messages received in the current session.

6.6 Structure of messages

(senderid represents the identifier of the intermediary node that forwards the gossip message.)

READ (intiatorid, transid, senderid, ttl, sessionid, O)
WRITE (intiatorid, transid, senderid, ttl, sessionid, O)
CALL(initiatorid, transid, sessionid, O)
WITHDRAW(initiatorid, transid, ttl, sessionid, O)
JOIN(nodeid): nodeid is the identifier of the node sending a request to join the quorum
QUIT(nodeid)
COMMIT_REQUEST(initiatorid, transid, ttl, sessionid, O)
COMMIT(initiatorid, transid, ttl, sessionid, O)
ABORT (initiatorid, transid, ttl, sessionid)
The different types of Response messages are:
(O_k represents object at node N_k)
RESPONSE(O_k, RR(O_k))
RESPONSE(O_k, WR(O_k))
RESPONSE(COMMIT,T,CT(T))
RESPONSE(ABORT,T)

6.7 Description of the algorithm

6.7.1 Read operation

A transaction T at node N requesting a read operation on an object forwards a READ message to $ {\text{q}} = {\text{O}}\left( {{\rm{k}}\sqrt n } \right) $ nodes. It sets the ttl field equal to h and forwards the READ message to s of its neighboring nodes using gossip. Each node forwards the READ message until at least q nodes receive the message. On getting the READ message, each node N_q in the quorum ensures that it possesses a replica of O. If a replica is present, it adds the transaction to RR list of the object and grants the object to N by forwarding a GRANT message to it. While sending the GRANT message, N_q also piggybacks all the requests presently in the RR list to N. A transaction T on node N upon receiving a GRANT message stores all the received requests in its request queue. T waits until all the members of the read quorum have sent responses. T then opts for the replica that has the highest value in the tag field as the most recent replica of the object.

One approach in supporting parallel read operations and to ensure scalability is to use the following approach. Once a node has successfully received responses from its quorum, it invites other read transactions to join the quorum and access the object for reading. Therefore a transaction can read the object either by receiving responses from its quorum or by getting a join invitation from other transactions. The transaction in the first case is known as the benefactor and the transactions in the second case are known as beneficiary. To inform the benefactor about other read requests a quorum member sends its read request queue to the benefactor together with the GRANT message. T forwards a CALL message to all the transactions in its request queue. A transaction T_c on node N_c upon receiving a CALL message from T for its present request forwards a WITHDRAW message to all the members of its quorum and enters the quorum. A node N_q in the read quorum upon receiving a WITHDRAW message from a transaction removes the request from its RR list. Once a beneficiary exits the quorum, it forwards a QUIT message to its benefactor. The benefactor waits until all the beneficiaries have quit the quorum before sending QUIT message to its quorum.

Since a beneficiary can join the quorum for reading without receiving responses from all the members, a fulfilled read request may exist for some instance in the system. As a result a node may receive old GRANT, CALL and WITHDRAW requests. But since each request has a timestamp associated with it, so a transaction can decide whether the messages received are old. A transaction upon receiving an old CALL message from another transaction sends a withdraw message to it. Old GRANT and WITHDRAW messages may be ignored.

6.7.2 Write operation

A transaction T on node N requesting a write operation on an object sends a WRITE message to $ {\text{q}} = {\text{O}}\left( {{\rm{k}}\sqrt n } \right) $ nodes. It sets the ttl field equal to h and sends the WRITE message to s of its neighboring nodes using gossip. Each node forwards the WRITE message until at least q nodes receive the message. Upon receiving the WRITE message each node N_q in the quorum adds the transaction T to its WR list and grants the object to N by forwarding a GRANT message to it.

After receiving response from all the members of the quorum, T chooses the replica with the highest tag field as the most recent replica of the object. T then generates a temporary copy at its local node and updates the temporary copy with its proposed value. It also increments the tag field of the temporary copy by 1. If no other transactions abort T during its operations and T succeeds to commit, it forwards an updated replica to all the nodes in its selected quorum.

6.7.3 Request to commit operation

A transaction T at node N requesting to commit creates a message containing all the information about the objects in its read and write set and sends the message to its neighbors using gossip. The neighbors forward the message until at least q nodes receive the message. When a node Nq receives the COMMIT_REQUEST message it eliminates T from its RR and WR lists for all objects replicas and constructs a list of transactions CT(T) that conflict with T. Nq determines the set of transactions conflicting with T by the following method [11]:

(i)
If the status bit of the replica is protected, then T should be aborted as the replica is expecting an update.
(ii)
If the local replica at Nq has a higher tag value than the replica at N, then T should be aborted.
(iii)
If the local replica at Nq has a tag value equal to the replica OT at N where T reads replica OT of O, then T conflicts with all the transactions in write request list of O (WR(O)) at Nq.
(iv)
If the local replica at Nq has a tag value equal to the replica OT at N where T writes replica OT of O, then T conflicts with all the transactions in the read request list RR(O) and write request list WR(O) of O.

The contention manager of Nq compares the priorities of T with the list of conflicting transactions. If T’s priority is higher than any transactions in CT(T) at Nq then the contention manager allows T to commit. Nq forwards a RESPONSE(COMMIT,T,CT(T)) to N. It then updates status of all the objects in T’s writeset as protected. If any transactions in CT(T) has higher priority than T, then the contention manager aborts T. Nq forwards a message RESPONSE(ABORT,T) to N and clears CT(T).

T accumulates the responses received from the nodes in the quorum until the total number of responses is greater than q. If all the responses are commits, then T proceeds to commit. T saves all the conflicting transactions in a list known as aborted transactions AT(T). Otherwise if at least one response is abort, then the contention manager aborts T.

6.7.4 Commit operation

A transaction T sets the ttl field equal to h and forwards a COMMIT message to s of its neighboring nodes using gossip. Each node forwards the COMMIT message until at least q nodes receive the message. It also sends an ABORT message to all transactions Ta in AT(T). On receiving COMMIT, a node updates its replica of O with the latest value and tag and puts the status field to false. When a transaction at node N″ receives the ABORT message, it aborts instantly.

6.7.5 Abort operation

A transaction aborts in one of the two situations: (a) After the Commit_Request operation or (b) after receiving abort message. On aborting, T cancels its operations on the local objects and sends an ABORT message to s of its neighboring nodes using gossip. Each node forwards the ABORT message until at least q nodes receive the message. On receiving an abort message each node eliminates T from all its RR(O) and WR(O) list and updates the status bit of the objects in T’s writeset to false.

7 Analysis of the algorithm

7.1 Proof of correctness

In this section we first study the characteristics of the quorum system used in QS-TRAIL protocol and then evaluate its performance. Initially, let us assume that there are at least Q nodes that hold the initial replica of the object. We further assume that for all the nodes our underlying gossip protocol provides a local partial view of the neighbors that are selected randomly from the set of all nodes. We further assume that quorum replication occur during read operations by using gossip.

Theorem 1

Let O be an object replicated across Q nodes with PQS. If we consider only crash failures occurs and a read operation by any transactions on O is not executed in parallel with any write operations then the probability of the read operation returning the value modified by the preceding write operation is at least 1 − ε.

Proof

Let us consider the most recent write operation preceding a read operation. A transaction T requesting to commit after writing to an object selects a write quorum Q_w by gossip access strategy and forwards a commit request to all the nodes in the quorum. Each node n′ in the quorum on receiving the request checks whether T has the highest replica of the object and T has higher priorities(earlier timestamp) over all the other transactions conflicting with T. After receiving commit responses from each node in Q_w, T proceeds to commit. A read operation following the write operation by T selects a read quorum Q_r by gossip access strategy. By definition of PQS, Q_w chosen by the preceding write operation and Q_r chosen by the read operation intersects with probability of at least 1 − ∈ i.e., $ \left[ { { \Pr }\left( {Q_{w} \cap Q_{r} } \right) \ne \emptyset } \right] \ge 1 - \varepsilon $. So with probability of at least 1-ε the read operation reads the correct version of the object.□

Theorem 2

QS-TRAIL protocol implements Probabilistic Quorum System.

Proof

We know from Theorem 1, that if no transaction is performing commit on object O at time t, then all the transactions that access replicas of O at time t gets the same replica of O with probability of at least 1 − ε. For any transaction writing to O before time t then there must exist a quorum Q such that the tag of O in the nodes in Q is greater than the tag of O in Q′ (not belonging to Q).

If any transaction wants to access O at time t then it must chose a quorum Q″ to read the latest value of O. By definition of PQS, Pr(Q ∩ Q″ ≠ ∅) ≥ 1 − ε. So there must exists some node n with probability of at least 1 − ε such that n ∈(Q ∩ Q″). Thus n must have the latest version of the object that was written during the previous read operation. So with probability of at least 1-ε, transaction T that access O at time t gets the correct version of the object. Therefore QS-TRAIL protocol implements probabilistic quorum system.□

7.1.1 Safety property

We demonstrate the safety property using the notations described in Table 2. We use the notations in the figure to state some properties of our protocol that can be easily verified.

Table 2 Notations used in safety property proof

Full size table

Property 1

If a request by transaction T is being satisfied at time t, then no other transactions can commit at time t. Formally,

$$ \begin{aligned} commit\left( {T,t} \right) = > & (\exists T^{\prime} : T^{\prime} \in R_{t} : \left\langle {\forall n:n \in quorum\left( {T,t} \right)\;\varLambda \; commit - request\left( {n,T,t} \right)} \right\rangle \;\varLambda \\ & \left\langle {\forall n^{\prime}:n^{\prime} \in quorum\left( {T^{\prime},t} \right)\;\varLambda \; commit - request\left( {n^{\prime},T^{\prime},t} \right) } \right\rangle \;\varLambda \\ & \left\langle {\exists x :x \in quorum\left( {T,t} \right) \cap quorum\left( {T^{\prime},t} \right) \;\varLambda \; response - commit\left( {x,T} \right) \;\varLambda \; \sim response - commit\left( {x,T^{\prime}} \right)} \right\rangle \\ \end{aligned} $$

(4)

Property 2

Two transactions cannot receive commit response from all members of the quorum at the same time t. Formally,

$$ \forall n: response - commit\left( {n,T,t} \right) \;\varLambda \; response - commit\left( {n,T^{\prime},t} \right) = > \left( {T = T^{\prime}} \right) $$

(5)

We now state our safety property and prove that our protocol is safe.

Theorem 3

(Safety property) QS-TRAIL protocol satisfies the safety property. i.e., read and write requests cannot be satisfied concurrently. Formally we can write:

$$ commit\left( {T,t} \right) \varLambda commit\left( {T^{\prime},t} \right) = > T = T^{\prime} $$

(6)

Proof

Let us assume that $ commit\left( {T,t} \right) $ and $ commit\left( {T^{\prime},t} \right) $ hold. As $ commit\left( {T,t} \right) $ holds, therefore by using (4), there is a transaction $ {\text{X}} \in R_{t} $ such that

$$ \begin{aligned} (\exists X : X \in R_{t} : & \left\langle {\forall n:n \in quorum\left( {T,t} \right)\;\varLambda \;commit - request\left( {n,T,t} \right)} \right\rangle \varLambda \\ & \left\langle {\forall n^{\prime}:n^{\prime} \in quorum\left( {X,t} \right)\;\varLambda \; commit - request\left( {n^{\prime},X,t} \right) } \right\rangle \varLambda \\ & \left\langle {\exists x:x \in quorum\left( {T,t} \right) \cap \in quorum\left( {X,t} \right)\varLambda \;response - commit\left( {x,T} \right) \;\varLambda \; \sim response - commit\left( {x,X} \right)} \right\rangle \\ \end{aligned} $$

(7)

Similarly since $ commit\left( {T^{\prime},t} \right) $ holds then by using (4), there is a transaction $ {\text{Y}} \in R_{t} $ such that

$$ \begin{aligned} (\exists Y : Y \in R_{t} : & \left\langle {\forall n:n \in quorum\left( {T,t} \right)\;\varLambda \; commit - request\left( {n,T,t} \right)} \right\rangle \varLambda \\ & \left\langle {\forall n^{\prime}:n^{\prime} \in quorum\left( {Y,t} \right)\;\varLambda \; commit - request\left( {n^{\prime},Y,t} \right) } \right\rangle \varLambda \\ & \left\langle {\exists x :x \in quorum\left( {T,t} \right) \cap quorum\left( {Y,t} \right) \varLambda response - commit\left( {x,T} \right) \;\varLambda \; \sim response - commit\left( {x,Y} \right)} \right\rangle \\ \end{aligned} $$

(8)

We know from the intersection property of the quorum system that there is at least one node z in quorum(X) ∩ quorum(Y) such that z has received commit request from both X and Y at time t. However from property 2, we know that

$$ \forall n: response - commit\left( {n,X,t} \right)\; \varLambda \; response - commit\left( {n,Y,t} \right) = > \left( {X = Y } \right) $$

(9)

A transaction commits if it receives responses from all the nodes at time t. So X and Y are actually the same requests. From (7), (8) and (9) we can say that T = T′ = X=Y.□

7.1.2 One copy equivalence property

We prove that QS-TRAIL protocol satisfies one-copy equivalence property by using the notations described in Table 3. A transaction T updates an object O as a result of commit operation. Formally we can write this as follows:

Table 3 Additional notations used in one copy equivalence proof

Full size table

$$ update\left( {T,O,t} \right) = > commit\left( {T,t} \right) \varLambda \left\langle {\forall n:n \in quorum\left( {T,t} \right)\varLambda \left( {status\left( {n,O,t} \right) = true} \right)\varLambda \left( {sendnewvalue\left( {n,T,O,t} \right)} \right)} \right\rangle $$

(10)

To prove one copy equivalence property, we need to show that only a single transaction can update the object at any time and all other transactions trying to access the object will not commit. The following two lemmas show that our algorithm satisfies one copy equivalence property.

Lemma 1

If no transaction is updating object O at time t (as a result of commit operation), all transactions requesting O at time t receive the same replica of O. Formally we can write as follows:

$$ \sim update\left( {T,O,t} \right)\;\varLambda \;accessrequest\left( {T^{\prime},t} \right) = > \left\langle {\forall T^{\prime} : T^{\prime} \in R_{t} :\forall n:n \in quorum\left( {T^{\prime}} \right) :tag\left( {O,n} \right) = tag\left( {O,T^{\prime}} \right)} \right\rangle $$

(11)

Proof

Let us consider the case that a transaction T has propagated changes to object O using quorum Q before time t and another transaction T′ is trying to read O at time t. Then there exist a quorum Q′ from which T′ is trying to read the object. From Theorem 2, it follows that Pr(Q ∩ Q′ ≠ ∅) ≥ 1 − ε. So there must exists some node n with probability at least 1 − ∈ such that n∈(Q ∩ Q′). This node n has the replica of the object having the highest tag. We know that all read and write operations always selects the object with the highest tag as the most recent replica of the object. Hence T′ will choose the object with the highest tag as the most recent replica of the object.□

Lemma 2

If a transaction T is updating the value of O at time t and another transaction T′ is trying to access O for reading through a quorum Q′ before T has updated O in its quorum Q, then T′ will not commit. Formally we can write it as:

$$ update\left( {T,O,t} \right)\;\varLambda \; accessrequest\left( {T^{\prime},t} \right) = > \sim commit\left( {T^{\prime},t} \right) $$

(12)

Proof

When T′ issues a read request at time t′, it receives an old copy of O. When T′ issues commit, then two cases may occur:

1.
T’s update has not reached all the nodes of Q and so O. sta tus = true. So T′ will not receive commit-response from all the nodes in the quorum, so from (4)

$$ \begin{aligned} commit\left( {T^{\prime},t} \right) = > & (\exists X : X \in R_{t} : \left\langle {\forall n:n \in quorum\left( {T^{\prime},t} \right)\;\varLambda \; commit - request\left( {n,T^{\prime},t} \right)} \right\rangle \;\varLambda \\ & \left\langle {\forall n^{\prime}:n^{\prime} \in quorum\left( {X,t} \right)\;\varLambda \; commit - request\left( {n^{\prime},X,t} \right) } \right\rangle \;\varLambda \; \\ & \left\langle {\exists x :x \in quorum\left( {T^{\prime},t} \right) \cap quorum\left( {X,t} \right)\; \varLambda \; \sim response - commit\left( {x,T^{\prime}} \right) \;\varLambda \; \sim response - commit\left( {x,X} \right)} \right\rangle \\ \end{aligned} $$
(13)

So T′ will not commit.
2.
T’s update of O has reached all the nodes in the quorum. As T′ has read an old value of O so there exists a node n such that O.tag > O_T′. tag i.e.,

$$ \left\langle {\exists n :n \in quorum\left( {T,t} \right) \cap quorum\left( {T^{\prime},t} \right) \;\varLambda \; \left( {tag\left( {O,n} \right) > tag\left( {O,T^{\prime}} \right)} \right)} \right\rangle $$
(14)

So T′ will be aborted during the commit operation.

As a result both read and write operations on an object cannot proceed simultaneously.

Theorem 4 follows from Lemmas 1 and 2.□

Theorem 4

(One copy Equivalence Property) QS-TRAIL protocol satisfies one-copy equivalence property.

$$ update\left( {T,O,t} \right) = > \left\langle {{\nexists }T^{\prime}: T^{\prime} \in R_{t } : update\left( {T^{\prime},O,t} \right)} \right\rangle $$

(15)

7.1.3 Liveness property (starvation freedom)

A transaction T′ cannot complete its execution if (i) the timestamp of T′ is smaller than the timestamp of the committing transaction T. (ii) T′ has read an old replica of O. The above cases are used to decide the case when the T′ will be aborted (Table 4).

Table 4 Additional notations used in liveness proof

Full size table

Formally we can write:

$$ Abort\left( {T,T^{\prime},t} \right)\, \triangleq \,< \exists n:n \in quorum\left( T \right) \cap quourm\left( {T^{\prime}} \right)\;\varLambda \; (tag\left( {O,T^{\prime}} \right) < tag\left( {O,T} \right)) > V ts\left( T \right) < ts\left( {T^{\prime}} \right) $$

(16)

We can define the abort-set of a transaction T at time t denoted by $ abort - set\left( {T,t} \right) $ as the set of transactions that T will abort at time t i.e., $ abort - set\left( {T,t} \right) = \{ T^{\prime}| Abort\left( {T,T^{\prime},t} \right)\} $. The abort-set of a transaction T that has completed is empty. We can say that when all the nodes of the quorum chosen by T have received its request then the abort-set of T cannot grow. We can write this formally as follows:

$$ \forall n:n \in quorum\left( T \right):\left\langle {\forall x,y:x,y} \right\rangle t:x \le y = > \left\langle {abort - set\left( {T,x} \right) \supseteq abort - set\left( {T,y} \right)} \right\rangle $$

(17)

We can observe that a transaction can be involved in one of the two types of conflict relationships: (1) T′ may be in the CT(T) with a smaller timestamp than T or (2) T′ may be in CT(T) with a larger timestamp than T. In the first case the contention manager will abort the transaction with the smaller timestamp and allow T to commit. In the second case T will be aborted.

We prove that our protocol satisfies the liveness properties using the following properties:

Property 3

After a transaction T sends commit requests to its quorum members, all the members of its quorum removes T from its read and write sets. We can formally write this as follows:

$$ \forall n:n \in quorum\left( {T,t} \right): commit - request\left( {n,T,t} \right) = > \left\langle { \exists j:j \ge t: T \notin list\left( {RR\left( {n,j} \right) \;\varLambda \; WR\left( {n,j} \right)} \right)} \right\rangle $$

(18)

Lemma 3

Every transaction requesting an object will eventually receive either an ABORT message or will complete its execution. We can formally write this as follows:

$$ commit - request\left( {T,t} \right)\;\varLambda \; commit - request\left( {T^{\prime},t} \right)\;\varLambda \; ts\left( T \right)\left\langle {ts\left( {T^{\prime}} \right) = } \right\rangle \left( {T^{\prime} \in abort - set\left( {T,t} \right)} \right)\;\varLambda \; \left( {commit\left( {T,t} \right)} \right) $$

(19)

Proof

Let us consider a node n belonging to the intersection of quorum(T) and quorum(T′) i.e. $ n \in quorum\left( T \right) \cap quorum\left( {T^{\prime}} \right) $ such that n receives the read/write request of transaction T′ after it has granted the object to transaction T. In this case n adds T′ to its RR and WR Queue after T. When T and T′ both decides to commit, they both send COMMIT_REQUEST message to the members of the quorum. The contention manager at n decides which transaction will commit depending on the priority of the transactions. If the contention manager decides T will commit, it eliminates T from its RR and WR lists of all objects and sends a commit message to T. It also sends an ABORT message to T′ and removes T′ from its abort list. Once T′ receives ABORT message, it aborts immediately. Formally we can write,

$$ (abortcount\left( {T^{\prime},t} \right) > 0)) = > \left\langle {\exists j:j \ge t:T^{\prime} \notin abort - set\left( {T,j} \right)} \right\rangle $$

(20)

□

Property 4

When a transaction T receives an ABORT message, it restarts from the beginning with a new timestamp.

$$ (abortcount\left( {T,t} \right) > 0)) = > < \exists j:j \ge t: accessrequest\left( {T,j} \right) $$

(21)

Lemma 4

Every transaction in the system will eventually be satisfied.

Proof

Let us prove the theorem using contradiction. Let us presume that a transaction T is never satisfied. We have to demonstrate that T achieves higher priority than other transactions at time t that is: no node issues a transaction with higher priority than T at time t.□

Let us consider a node N. Now we consider two instances:

1.
After time t, N issues an infinite number of transactions.
2.
After time t, N issues a finite number of transactions.

Let us consider the first case. We know that the logical clock at node N strictly increases each time X issues a transaction. As a result, every transaction issued by N has a higher timestamp than T and therefore T achieves higher priority than any other transactions issued by N. In the second case, T will assume higher priority than all other transactions of N after N has issued its last transaction.

Theorem 5 follows from Lemma 3 and 4.

Theorem 5

(Liveness property) Every transaction is eventually satisfied. Formally

$$ Abort\left( {T,T^{\prime},t} \right) = > \left\langle {\exists j:j \ge t:Commit\left( {T^{\prime},j} \right)} \right\rangle $$

(22)

7.2 Complexity analysis

In this section we analyze the performance of QS-TRAIL protocol with respect to the following performance metrics: communication time, message complexity and availability. As defined earlier, q is used to denote the maximum size of a quorum.

Theorem 6

The communication time for read/write operation starting at a node X is between $ 2*(\log \;{\rm{k}}\sqrt n ) $ ) and $ 2*f*(\log \;{\rm{k}}\sqrt n ) $ ) where f is the number of nodes that have failed in the network.

Proof

A transaction requesting a read (write) operation on an object sends a read/write request to its neighbors until a quorum is created. The request messages have to be sent to at least $ {\text{q}} = {\rm{k}}\sqrt n $ distinct nodes to get a quorum. The Gossip protocol generates a distributed spanning tree in the network. The nodes are contacted in parallel along a tree of depth h and degree s + 1. So the communication time needed to contact all the nodes in the tree and to receive responses is $ 2*(\log_{s} q) $. i.e. $ 2*(\log_{s} {\rm{k}}\sqrt n ) $.

If some of the nodes in the quorum q have failed, then the Gossip protocol has to probe more nodes for creating a quorum. Sending a request and receiving a reply requires $ 2*(\log_{s} k\surd n) $) steps so the time required for a node X to start probing new nodes is at most $ 2*(\log_{s} k\surd n) $. In the worst case, on failure of f nodes, at most f rounds are required to find a read or a write quorum. So for f rounds the time required for probing is at most $ 2*f*\left( {\log_{s} k\surd n} \right) $.□

Theorem 7

The total number of messages required for forming a quorum is between $ 2*k\surd n $ to $ 2*f*k\surd n $ where f is the number of nodes that have failed in the network.

Proof

A transaction requesting a read (write) operation on an object sends a read (write) request to its neighbors until a quorum is found. The number of individual nodes that have to be contacted to get a quorum is $ {\text{q}} = {\rm{k}}\sqrt n $.Sending a request and receiving a reply requires $ 2*{\rm{k}}\sqrt n $ messages. If some of the nodes in the quorum q have failed, then the Gossip protocol has to probe more nodes for forming a quorum. If f nodes have failed in the worst case, then at most f rounds are required to find a quorum. Therefore, for f rounds the time required for probing is at most $ 2*f*{\rm{k}}\sqrt n . $□

Theorem 8

The availability (fault tolerance) of the QS-TRAIL protocol under the probabilistic quorum system is Ω (n).

Proof

The availability (fault tolerance) of a quorum system Q in the QS-TRAIL protocol is defined as the minimum number of nodes that intersect all the quorums in Q. It is the minimum number of nodes whose crash will disable every quorum in the system. Malkhi et al. [24] proved that the fault tolerance of a probabilistic quorum system of size $ 1\sqrt n $ is $ n - l\sqrt n + 1 = \varOmega \left( n \right) $. Thus the availability of the QS-TRAIL protocol under the probabilistic quorum systems is Ω(n).□

7.3 Performance analysis

We performed simulations to analyze the performance of QS-TRAIL protocol. The simulation settings are given in Table 5. We developed a discrete event simulator in C language to perform the simulation of our proposed method. The environment for our implementation is as follows:

Table 5 Simulation parameters

Full size table

Intel(R) Core TM i5-3210 M CPU (2.5-GHz clock) and 4 GB RAM.
Ubuntu 14.04.
C compiler: GCC version 4.8.2.

In our simulation, each node performs peer sampling to select its neighbors. We made some assumptions to simplify the simulation.

1:
The links between the nodes do not fail.
2:
Node can fail at any time.
3:
Peer samplings are performed successfully so the nodes know about their neighbors.

Since our proposed algorithm uses gossip as a mode of communication as well as for peer sampling among the nodes, our proposed method is valid for both node and link failures. However, we are only assuming node failures as failure of a node results in the loss of an object copy which is required to demonstrate the effectiveness of our proposed method.

7.3.1 Time complexity analysis

Figure 2 gives the number of cycles that is required to find a quorum of specific size for different values of fan out. A large fan out requires a fewer gossip cycles to find a quorum. With the increase in fan out the number of messages also increases proportionally, but the number of cycles decreases. Figure 2 illustrates that the size of the quorum increases exponentially for linear increase in gossip cycles. This correlates with our theoretical results about the time complexity as stated in Theorem 6. For fan out value of 20 we can observe that the time complexity required by the proposed approach is roughly log ($ \sqrt n $).This result demonstrates that our algorithm is scalable with regard to the quorum size.

7.3.2 Average path length

Scalability is a major issue that is to be considered when information dissemination occurs. It is crucial that the cost and the time needed to reach a node are small. To evaluate scalability the average path length is used as a parameter. The minimum number of edges between any two nodes is the shortest path between them. The average path length is the average of this shortest path over all pairs of nodes in the network. Our motivation for considering this property is that we can estimate the time and the number of messages needed for contacting a node. Figure 3 demonstrates that for a specified number of nodes the average path length reduces with increasing fan out. This suggests that the time needed for contacting a node decreases with increasing fan out.

7.3.3 Performance of QS-TRAIL in dynamic and static scenarios

For large-scale experiments the following situations are considered:

i.
The gossiping environment is static for consecutive requests.
ii.
The gossiping environment is dynamic for consecutive requests with node joining.
iii.
The gossiping environment is dynamic for consecutive requests with node leaving.

In Figs. 4, 5, and 6 the results for the simulation scenarios 1–3 are shown respectively. The nodes issuing transactions that request the object are selected at random. The requisite number of cycles to find a quorum of specific size differs for each generated request because of the uncertain characteristics of the gossip algorithm. From the above figures, we can observe that the variation of the requisite number of cycles for a given quorum size is relatively small.

In scenario 2, the requisite number of cycles needed to find a quorum of specific size grows for each generated request as some nodes are added through the initial gossip cycles. Thus as the number of nodes increases, the cycles required to find quorums of specific sizes also increases, but still QS-TRAIL protocol works properly satisfying safety and one-copy equivalence properties.

Scenario 3 considers removal of nodes from the network due to failures. During the initial gossip cycles, some nodes become nonexistent. As a result, the message passes through a fewer number of nodes and less cycles are needed. The requisite number of cycles in this case varies largely as some of the nodes may attempt to send request to some of the non existing nodes prior to peer sampling completion.

7.3.4 Comparisons with existing multiple copies DTM approaches

We performed simulations for comparing the performance of QS-TRAIL protocol using probabilistic quorum system against the data flow approach FLOODING [11] and the control flow approach Hiper TM protocol [10] for particular system sizes. We have considered FLOODING and Hiper TM as comparison benchmarks as these two protocols have used different replication, concurrency control and fault tolerance strategies compared to our proposed approach. The measures of comparisons are mainly quorum size, fault tolerance and the number of messages needed for selecting quorums of specific sizes.

Our simulation study uses the following parameters: FLOODING uses the Tree protocol [25]. In the FLOODING protocol a tree of n nodes and height h is considered. The degree of each node is taken as 2d + 1. The value of d is considered in the range of [11, 26] and h is taken as [3, 4]. The number of nodes and the quorum size are calculated using the formulas given in [18]. Hiper TM uses a strict quorum system in which the intersection between the quorums has at least one non faulty node. The quorum sizes for the different number of nodes are calculated using the formulas given in [10]. For the probabilistic quorums the size of the quorums is taken as $ 1\sqrt n $ according to definition given in Malkhi et al. [24]. The value of l is taken in the range of [2.45, 6].

7.3.4.1 Quorum size analysis

Figure 7 depicts the size of the quorum selected in QS-TRAIL protocol, FLOODING [11] and Hiper TM protocol [10] for different network sizes. As demonstrated in the figure, the size of the quorum chosen by the QS-TRAIL protocol is much smaller than by using the FLOODING protocol and the Hiper TM protocol. The size of the quorum selected is a considerable measure as a smaller size ensures low message and communication cost in accessing the quorum.

7.3.4.2 Fault tolerance analysis

Another measure considered for performance comparison is fault tolerance. In Fig. 8 we compare the fault tolerance of QS-TRAIL, FLOODING protocol and the Hiper TM protocol for different network sizes. From the figure it is clear that QS-TRAIL protocol using probabilistic quorum system have much better fault tolerance than FLOODING protocol and the Hiper TM protocol even for smaller quorum sizes (thus providing better efficiency).The smallest number of available nodes in QS-TRAIL protocol in case of failures is much greater than the other two protocols thus ensuring that quorums always intersect and the most recent copy of the object is available for transaction execution. The improvements become more evident as the network grows. The fault tolerance of the QS-TRAIL protocol justifies our theoretical results as proved in Theorem 8.We can observe that the fault tolerance increases linearly (order of n) with the number of nodes in case of QS-TRAIL.

7.3.4.3 Message complexity analysis

Figure 9 demonstrates the number of messages used in all the three protocols under network failures for different network sizes. We proved in Theorem 7 that the number of messages required in QS-TRAIL protocol is $ {\text{O}}\left( {\surd n} \right) $.We can observe from the figure that our simulation results justifies our theoretical results. As the figure shows, when the size of the network is small, the number of messages used in the proposed method and the FLOODING protocols are nearly the same. However, the message complexity of Hiper TM protocol increases as the size of the quorum used in the protocol is almost equal to the number of nodes in the network. As network size increases, the number of messages used in QS-TRAIL protocol is much less than the other two protocols. Specifically the message complexity of QS-TRAIL protocol is less than by almost 56% for large network sizes compared to FLOODING protocol.

8 Conclusion

In this paper we propose an efficient quorum based replication algorithm for solving the single copy DTM protocol in a network with node failures. QS-TRAIL protocol executes on an unstructured network and utilizes gossip protocol for delivering messages to the nodes. So, unlike existing protocols, QS-TRAIL does not require reconfiguration of the system in the event of failures. Our experimental results shows that our proposed method has much better fault tolerance than other existing multi copy protocols for D-STM. The protocol maintains a low message complexity and communication cost as well as smaller quorum sizes compared to other quorum based replication protocols in a failure prone network. These properties are highly desirable for a distributed system with network failures.

References

Herlihy, M., Sun, Y.: Distributed transactional memory for metric-space networks. Distrib. Comput. 20(3), 195–208 (2007)
Article Google Scholar
Attiya, H., Gramoli, V., Milani, A.: Combine: an improved directory-based consistency protocol. In: Technical, LPD-2010-002, EPFL (2010)
Sharma, G., Busch, C., Srinivasagopalan, S.: Distributed transactional memory for general networks. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS 2012) (2012)
Zhang, B., Ravindran, B.: BA: relay: a cache-coherence protocol for distributed transactional memory. In: Proceedings of the 13th international conference on Principles of Distributed Systems (OPODIS), pp. 48–53 (2009)
Herlihy, M., Kuhn, F., Tirthapura, S., Wattenhofer, R.: Dynamic Analysis of the Arrow Distributed Protocol Theory of Computing Systems. Springer, New York (2006)
MATH Google Scholar
Saad, M.M., Ravindran, B.: Snake: control flow distributed software transactional memory. In: SSS, pp. 238–252 (2011)
Saad, M.M., Ravindran, B.: RMI-DSTM: control flow distributed software transactional memory: technical report. ECE Dept., VirginiaTech, Blacksburg, VA (2011)
Google Scholar
Chatterjee M., Setua S.K.: A distributed transactional memory protocol for dynamic networks. In: Mandal J., Dutta P., Mukhopadhyay S. (eds.) Computational Intelligence, Communications, and Business Analytics. CICBA 2017. Communications in Computer and Information Science, vol. 775. Springer, Singapore
Kim, J., Ravindran, B.: Scheduling transactions in replicated distributed software transactional memory. In: CCGrid, pp. 227–234 (2013)
Hirve, S., Palmieri, R., Ravindran, B.: HiperTM: high performance, fault-tolerant transactional memory. In: Chatterjee, M., Cao, J., Kothapalli, K., Rajsbaum, S. (eds.) Distributed Computing and Networking. ICDCN 2014. Lecture Notes in Computer Science, vol. 8314. Springer, Berlin (2014)
Google Scholar
Zhang, B., Ravindran, B.: A quorum-based replication framework for distributed software transactional memory. In: OPODIS, pp. 18–33 (2011)
Peluso, S., Ruivo, P., Romano, P., Quaglia, F., Rodrigues, L.: When scalability meets consistency: genuine multiversion update-serializable partial data replication. In: ICDCS, pp. 455–465 (2012)
Attiya, H., Gramoli, V., Milani, A.: Directory protocols for distributed transactional memory. In: Guerraoui, R., Romano, P. (eds.) Transactional Memory. Lecture Notes in Computer Science, vol. 8913, pp. 367–391. Springer, New York (2015)
Google Scholar
Palmieri, R., Peluso, S., Ravindran, B.: Transaction execution models in partially replicated transactional memory: the case for dataflow and control flow. In: Guerraoui, R., Romano, P. (eds.) Transactional Memory. Lecture Notes in Computer Science, vol. 8913. Springer, New York (2015)
Google Scholar
Busch, C., Herlihy, M., Popovic, M., Sharma, G.: Impossibility results for distributed transactional memory. In: PODC’15, July 21–23, 2015, Donostia-San Sebastián, Spain (2015)
Busch, C., Herlihy, M., Popovic, M., Sharma, G.: Fast scheduling in distributed transactional memory. In: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architecture-SPAA’17
Hirve, S., Palmieri, R., Ravindran, B.: Archie: a speculative replicated transactional system. In: Middleware, pp. 265–276 (2014)
Keidar, I., Perelman, D.: Multi-versioning in transactional memory. In: Guerraoui, R., Romano, P. (eds.) Transactional Memory Lecture Notes in Computer Science, vol. 8913, pp. 150–165. Springer, New York (2015)
MATH Google Scholar
Hendler, D., Naiman, A., Peluso, S., Quaglia, F., Romano, P., Suissa, A.: Exploiting locality in lease-based replicated transactional memory via task migration. In: DISC, pp. 121–133 (2013)
Romano, P., Palmieri, R., Quaglia, F., Carvalho, N., Rodrigues, L.: On speculative replication of transactional systems. J. Comput. Syst. Sci. 80(1), 257–276 (2014)
Article MathSciNet Google Scholar
Kim, J., Ravindran, B.: On transactional scheduling in distributed transactional memory systems. In: SSS, pp. 347–361 (2010)
Lima, D., Miranda, H., Jaiani, F.: Simulation of partial replication in distributed transactional memory. In: IEEE Wireless Days Conference, pp. 54–59 (2017)
Guerraoui, R., Herlihy, M., Pochon, B.: Towards a theory of transactional contention managers. In: PODC’05: Proceedings of the Twenty-Fourth Annual ACM Symposium on Principles of Distributed Computing, New York, pp. 258–264, ACM (2005)
Malkhi, D., Reiter, M., Wright, R.: Probabilistic quorum systems. In: Proceedings of the 16th ACM Symposium on Principles Distributed Computing, pp. 267–273 (1997)
Agarwal, D., El Abbabi, A.: The tree quorum protocol: an efficient approach for managing replicated data. In: Proceedings of the 16th VLDB Conference, Brisbane (1990)
Malkhi, D., Reiter, M.K., Wool, A.: Probabilistic quorum systems. Inf. Comput. 170, 184–206 (2001). https://doi.org/10.1006/inco.2001.3054
Article MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Calcutta University Technology Campus, JD-2, Sector-III, Salt Lake, Kolkata, 70 0 098, India
Moumita Chatterjee & Sanjit Kumar Setua
Department of Computer Science and Engineering, Academy of Technology, Adisaptagram, West Bengal, 712121, India
Anirban Mitra
RCIL, Washington University in St. Louis, 510 South Kings Highway Boule-vard, Saint Louis, MO, 63110, USA
Sudipta Roy

Authors

Moumita Chatterjee
View author publications
You can also search for this author in PubMed Google Scholar
Anirban Mitra
View author publications
You can also search for this author in PubMed Google Scholar
Sudipta Roy
View author publications
You can also search for this author in PubMed Google Scholar
Sanjit Kumar Setua
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sudipta Roy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chatterjee, M., Mitra, A., Roy, S. et al. Gossip based fault tolerant protocol in distributed transactional memory using quorum based replication system. Cluster Comput 23, 1103–1124 (2020). https://doi.org/10.1007/s10586-019-02973-7

Download citation

Received: 28 September 2018
Revised: 14 April 2019
Accepted: 14 August 2019
Published: 03 September 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10586-019-02973-7

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Gossip based fault tolerant protocol in distributed transactional memory using quorum based replication system

Abstract

Similar content being viewed by others

A Distributed Transactional Memory Protocol for Dynamic Networks

A Hybrid Quorum Protocol for Improved Availability, Capacity, Load and Reduced Overhead

CaLibRe: A Better Consistency-Latency Tradeoff for Quorum Based Replication Systems

1 Introduction

2 Related work

3 Contribution of the work

4 System model and scope of work

4.1 System model

4.2 Distributed transactions

4.3 TRAIL protocol

4.4 Limitations of TRAIL protocol

5 Preliminary definitions

Definition 1

Definition 2

Definition 3

6 The proposed algorithm

6.1 The main idea

6.2 Object model used in the algorithm

6.3 Local data structures used by each transaction Ti

6.4 Data structures at each node Ni

6.5 Additional parameters at initiator node

6.6 Structure of messages

6.7 Description of the algorithm

6.7.1 Read operation

6.7.2 Write operation

6.7.3 Request to commit operation

6.7.4 Commit operation

6.7.5 Abort operation

7 Analysis of the algorithm

7.1 Proof of correctness

Theorem 1

Proof

Theorem 2

Proof

7.1.1 Safety property

Property 1

Property 2

Theorem 3

Proof

7.1.2 One copy equivalence property

Lemma 1

Proof

Lemma 2

Proof

Theorem 4

7.1.3 Liveness property (starvation freedom)

Property 3

Lemma 3

Proof

Property 4

Lemma 4

Proof

Theorem 5

7.2 Complexity analysis

Theorem 6

Proof

Theorem 7

Proof

Theorem 8

Proof

7.3 Performance analysis

7.3.1 Time complexity analysis

7.3.2 Average path length

7.3.3 Performance of QS-TRAIL in dynamic and static scenarios

7.3.4 Comparisons with existing multiple copies DTM approaches

7.3.4.1 Quorum size analysis

7.3.4.2 Fault tolerance analysis

7.3.4.3 Message complexity analysis

8 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

6.3 Local data structures used by each transaction T_i

6.4 Data structures at each node N_i