Protecting against malicious and selective forwarding attacks for P2P search & retrieval system

Chuang, Yung-Ting

doi:10.1007/s12083-016-0500-1

Protecting against malicious and selective forwarding attacks for P2P search & retrieval system

Published: 24 August 2016

Volume 10, pages 1079–1100, (2017)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Peer-to-Peer Networking and Applications Aims and scope Submit manuscript

Protecting against malicious and selective forwarding attacks for P2P search & retrieval system

Download PDF

Yung-Ting Chuang¹

229 Accesses
4 Citations
Explore all metrics

Abstract

With recent advances in networking technology, emerging networks continue to play an increasing role in the lives of most users. The Internet search and retrieval system is so powerful that it helps us to share information and perspectives from across the world. However, the threat of censorship exists on some centralized search engines, since all of their information is currently controlled by these sites administrators. The restriction and control of information are pervasive enough within governments and organizations to censor or intrude on even the most free and uncontrolled communication media. For this reason, the Peer-to-Peer (P2P) search and retrieval system is designed to resist censorship over the network. Nevertheless, its decentralized nature makes it very difficult to infer information that cannot be measured directly, such as the proportion of subverted and selfish nodes. Moreover, the situation is even more challenging when the network becomes extremely large. Hence, I propose a dynamic adaptive algorithm that can: 1) tackle the censorship and security issues; 2) determine the proportion of subverted and selfish nodes; 3) defend against malicious and selective forwarding attacks by appropriately adjusting the number of requests to ensure high match probability; 4) guarantee robustness and scalability even with different random networks and varied network sizes. In several experiments, I demonstrate that my algorithm can effectively and accurately estimate these metrics and manage the system, even when the network has a large proportion of malicious nodes, a large proportion of selfish nodes, or a mere partial view of network membership.

An Efficient, Secure and User Privacy-Preserving Search Protocol for Peer-to-Peer Networks

Next level peer-to-peer overlay networks under high churns: a survey

Article 21 December 2019

TMV: Trust-Matrix-Value Based Neighbor Peer Selection for Secure Query Forwarding in P2P Networks

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Traditionally, network technology was developed to enable the transfer of information between devices, and intentionally transferred information from the server to the clients. With recent advancements in networking technology, emerging networks continue to play an increasing role in the lives of most users. The Internet, which is designed to be uncontrolled and unbiased, has become the most popular information source in the post-industrial world. The Internet is so powerful that it helps us to share information and perspectives from across the world. Similarly, search engines are a major driver of this revolution, which aims for unbiased and uncontrolled information retrieval.

However, the threat of censorship appears to be one of the problems on these centralized search engines, since the information is controlled by these sites administrators. Currently, we all trust those administrators to remain benign and allow us to access, publish, and retrieve the information. However, history has shown that restriction and control of information are pervasive enough within governments and organizations to censor or intrude on even the most free and uncontrolled communication media. For example, internet censorship is prevalent in China and its social media [9]. According to Chomhaill et al., even though censorship is contrary to the very idea of freedom of speech [4], it seems to be continuing to grow [9] in China.

Thus, I had previously presented the design of the trustworthy information publication and retrieval system [11], which is an unstructured P2P search system that takes a probabilistic approach and avoids information censorship over the network. However, the decentralized nature of this system makes it very difficult to deduce the information that cannot be measured directly, such as the proportion of subverted nodes that do not reply when they have matches. When we considered the question of malicious nodes, we specifically considered an attack scenario in which the network has a large number of malicious or Sybil nodes that behave normally most of the time, but do not match requests and return responses. For example, BitTorrent [59] might suffer a Sybil attack when a few malicious nodes stand in for a larger number of nodes and refuse to respond when they have matches, thus affecting overall download time and retrieval rate. These Sybil attacks also exist in Instant Messaging (IM) applications, which use Distributed Hash Tables (DHT) routing, where a few malicious nodes might insert a large number of fake nodes and act as normal nodes in the network, but drop queries or refuse to reply to queries even when they have matches. As a result, I had previously introduced a novel algorithm [10] in which every node maintains a full view of the membership, and uses direct communications to protect against the subverted nodes in a small network. With [10], I demonstrated that the network can maintain the same high probability of a match when some of the nodes are subverted as when all of the nodes are operational.

Traditionally, when a source node needs to distribute its message to multiple nodes, it multicasts its messages directly to all of these nodes. However, this way might not be a feasible method of communication, especially when in a large network. As a result, most systems prefer an alternative strategy called probabilistic forwarding, such that the source node first transmits the message to a set of nodes, and has each of these nodes forward the messages to another set of nodes, and so on. Probabilistic forwarding is an important solution to multicasting and gossiping, and has been largely applied to sensors in wireless sensor networks (WSNs), mobile ad-hoc networks (MANETs), and vehicular ad-hoc networks (VANETs) [7, 26, 29, 53], where there are enormous numbers of mobile nodes distributed over a large area. For example, WSNs are widely applied to military sensing, habitat monitoring, and military tracking, where their sensors are gathered together to sense and share information with other nodes. Similarly, VANETs are applied to traffic information systems [52], where sensors collect and process current traffic data in cooperation with others. The advantage of such a message-forwarding strategy is that it allows nodes to share the load of processing, buffering, and communication costs with others, rather than placing the entire load on the source node. Moser and Melliar-Smith [46, 47] hence perform a probabilistic analysis that finds the probability density function (pdf) for the number of distinct nodes reached in terms of forwarding fanouts, forwarding levels, and forwarding probability.

Nevertheless, the above works [10, 46, 47] would not be applicable to a scenario that contains all of the following cases: 1) when the network size is extremely large; 2) when the network uses a message-forwarding strategy; 3) when there exist subverted nodes that do not respond when they have matches; or 4) when there exist selfish nodes that do not forward messages. When the network size is extremely large, such as in the case of file-sharing networks like Gnutella [49] or BitTorrent [13], the situation is even more challenging, since many of the nodes cannot afford to maintain a full view of the membership due to limitations on memory space and bandwidth. In addition, when considering networks where nodes are allowed only for short-range communication (i.e., wireless sensor networks (WSNs), mobile ad-hoc networks (MANETs), and vehicular ad-hoc networks (VANETs)), direct communication would not be feasible for data distribution and retrieval; thus, the source or requesting nodes would need to first forward their messages to a set of nodes, and then have each of these nodes forward the messages to another set of nodes. Moreover, when the network suffers Sybil attacks [27, 59], malicious nodes refuse to respond when they have matches. Furthermore, when the network has selected forwarding attacks [1, 63], selfish nodes do not forward any metadata or request messages. The fact that there is no work applicable to a scenario that contains all four of the cases mentioned above, and thus, this is why I have developed a dynamic adaptive message forwarding algorithm in this paper.

My algorithm can accurately estimate the proportion of subverted and selfish nodes, adaptively adjust the number of requests according to various network sizes, and achieve high match probability even when the network has a large proportion of malicious and selfish nodes. The following summarizes the contributions of this paper:

1.
My algorithm tackles censorship and security issues, such that it can assure Internet users that a small number of administrators cannot prevent them from distributing their messages and information, or from retrieving information from others.
2.
My algorithm employs the probability density function (pdf), Exponential Weighted Moving Average (EWMA) method, and modified chi-squared test to determine the proportion of subverted and selfish nodes.
3.
My algorithm defends against malicious and selective forwarding attacks by appropriately adjusting the number of requests to ensure high match probability, even when the network has large proportions of subverted and selfish nodes.
4.
My algorithm guarantees robustness and scalability in different random networks (e.g., Erdos-Renyi, Watts-Strogatz, and Barabasi-Abert networks) and across varied network sizes.

We know that it is important to develop such an algorithm for P2P search and retrieval systems before it is needed, to ensure that it is available when it is needed. As with other questions of trust, we must create our defenses before a problem arises.

Section 2 of this paper discusses related work, and Section 3 describes how my system works. Section 4 describes the architecture of the P2P search and retrieval system, and Section 5 provides foundations for the system. Section 6 describes the message forwarding algorithm, and Section 7 presents an experimental evaluation. Lastly, Section 8 presents the conclusion and future work.

2 Related work

2.1 Peer-to-peer network

Mischke and Stiller [44], Risson and Moors [50], along with Tsoumakos and Roussopoulos [57], provide comparisons of distributed search methods for peer-to-peer networks. The structured approach [5, 28] requires the nodes to be organized in an overlay network based on distributed hash tables, trees, or rings, which is efficient, but is vulnerable to control by its administrators. The unstructured approach [12, 22, 35, 49, 54, 55, 61] is typically a gossip-based approach, which uses randomization, and requires nodes to find each other by exchanging messages over existing links. My system, therefore, uses the unstructured approach, which is less vulnerable to manipulation.

Gnutella [49], one of the first unstructured networks, uses flooding of requests to find information. Freenet [12] is more sophisticated and efficient than Gnutella because it learns from previous requests. In Freenet, nodes that successfully respond to requests receive more metadata and more requests; thus, it is easy for a group of untrustworthy nodes to gather together and collect the most searches into their group, making Freenet vulnerable to subversion.

Ferreira et al. [22] used random walks to replicate both queries and data in an unstructured network. BubbleStorm [54], a probabilistic system for unstructured peer-to-peer search, replicates queries and data, and combines random walks with flooding. Pub-2-Sub [55] is a publish-and-subscribe service for unstructured P2P networks of cooperative nodes, as it uses directed routing to distribute subscription and publication messages to other nodes.

Quasar [61] and OneSwarm [35] are concerned with trust like my system. Quasar is a probabilistic publish-or-subscribe system that uses local gradients of aggregated vectors (e.g. gravity wells), and then applies a rendezvous-less event routing infrastructure to route messages directly to the nearby group members. The authors state that privacy and scalability concerns make centralized systems undesirable to the users of social networks. On the other hand, OneSwarm [35] is a P2P data-sharing system that allows data to be shared either publicly or anonymously, using a combination of trusted and untrusted nodes. My system does not use a structured overlay, nor use the combination of trusted and untrusted nodes, and has a different trust objective than Quasar and OneSwarm.

2.2 Exponential weighted moving average and chi-squared statistics

Several network security researchers use the EWMA algorithm and the chi-squared statistics. [25] applies the chi-squared test to detect intrusions, and [51, 58] discusses how to appropriately determine the smoothing factor for detecting network anomalies based on a known window size. Similarly, [64, 66] presents anomaly-detection techniques for intrusion detection based on the EWMA method and the chi-squared test. My dynamic adaptive message forwarding algorithm does not fix a window size, but rather, I consider all of the results starting from the time that a node joins to the network until the time it leaves the network. In addition, my algorithm uses the EWMA method and the modified chi-squared test to determine the proportion of malicious nodes in a node’s view of the membership. Furthermore, I make the smoothing factor tunable, so that the user can freely set it for his/her particular network environment.

Press et al. [48] use a modified chi-squared statistic to balance the weights of the buckets, in favor of comparing two datasets. Similarly, Belen [3] and Heckert [32] apply modified chi-squared statistics to determine the similarities between attribute couples of a dataset and a projected subset. Similarly, in this paper, my algorithm employs the modified chi-squared statistic for achieving high accuracy in estimating the proportion of non-malicious and non-selfish nodes in the network.

2.3 Message forwarding and probability density function

Probabilistic forwarding, an alternative and important solution to multi-cast routing and gossiping, has been largely applied for sensors and ad-hoc networks [7, 26, 29, 53]. Hedetniemi [33] presents an overview of gossiping and broadcasting in communication networks. Similarly, Farley [20] presents algorithms that require nodes to broadcast with a minimum number of links, and determines upper and lower bounds of broadcast messages [19]. Likewise, [8, 18] uses forwarding in delay-tolerant networks to reduce overall performance cost. In addition, Deering et al. [15] present extensions to two routing algorithms, distance-vector routing and link-state routing, to achieve greater efficiency when messages are forwarded in both inter-networks and extended LANs.

Moser and Melliar-Smith [46, 47] present a paper that finds: 1) the probability density function (pdf) for the number of distinct nodes reached, 2) the expected number of distinct nodes to which a message is forwarded within a fixed and small size of the network, 3) the probability density function (pdf) for the number of new nodes at a given forwarding level, and 4) the expected number of new nodes at a given forwarding level. My scalable and adaptive forwarding algorithm thus applies the probability density function (pdf) like [46, 47] to estimate the expected number of distinct nodes that would receive the metadata and requests. However, rather than just estimating the expected number of requests and metadata, my algorithm calculates the match probabilities in terms of the forwarding fanout, forwarding level, forwarding probability, number of nodes that report matches, proportion of subverted nodes, and proportion of selfish nodes. In addition, my algorithm can estimate the proportion of subverted and selfish nodes, and can appropriately determine the appropriate number of requests to achieve high match probabilities.

2.4 Detecting and defending against malicious attacks

Some prior work has been focused on trustworthiness, in particular for detecting and protecting against malicious attacks. Jesi et al. [36] identify malicious nodes in an overlay network based on gossiping, and place such nodes on a blacklist. They focus on hub attacks in which malicious nodes collude and divide the network by spreading false rumors. [38] proposes a techniques to cope with malicious behavior, as well as coping with nodes that do not respond in order to reduce their workloads [31]. Some approaches use a prediction model obtained by training decision trees over a sequence of normal data [40], or use neutral networks to obtain a model [24] to detect novel attacks. [34] uses a statistical method for ranking each sequence and determining how often the sequence should occur in normal traces or in intrusions. Nevertheless, my algorithm does not use gossiping, training decision trees, or neutral networks to detect novel attacks, but rather, my algorithm detects malicious attacks based on the probability density function (pdf), EWMA method, modified chi-squared test, and the number of matches received for a node’s request.

Condie et al. [14] present a protocol for finding adaptive P2P topologies to protect against malicious peers that upload corrupt, inauthentic, or misnamed content. Peers improve the trustworthiness of the network by forming connections based on its past trust scores, disconnecting malicious peers, and moving them to the edge of the network. Morselli et al. [45] describe an adaptive replication protocol that uses random walks, and have a feedback mechanism that adjusts the number of replicas according to the mean search length, for determining sufficient replications. However, my algorithm does not disconnect malicious nodes, nor does it use random walks; rather, my algorithm increases the number of nodes to distribute requests in order to maintain high probability, even when the network contains a large proportion of malicious nodes.

2.5 Detecting and defending against selective forwarding attacks

There are various techniques to detect and defend against selective forwarding attacks. According to [63], malicious nodes might refuse to forward certain messages, or drop the messages passing through them, resulting in a failure to propagate the messages any further. As noted in [1], selective forwarding attacks are even harder to detect in WSN, since sensor nodes all have limited signals.

Deng et al. [16] propose a Secure Data Transmission (SDT) that uses watermark technology to detect selective forwarding attacks. [30] uses Lightweight Detection to generate the alert packet and detects selective forwarding attacks. Alajmi and Elleithy [1] demonstrate that their system obtains safe data transmission between nodes and effectively detect the selective forwarding attacks, and achieves reasonable design issues. Similarly, Yu and Xiao [62, 65] present a Lightweight Security Scheme (LWSS) to detect selective forwarding attacks in WSN, which uses multi-hop acknowledgement techniques to launch alarms by obtaining responses from intermediate nodes. Yu and Xiao further claim that their scheme is reliable because they randomly select some nodes as the checkpoint to send acknowledgement in order to detect the adversary nodes. However, my algorithm detects selective forwarding attacks by applying the probability density function (pdf), EWMA method, and modified chi-squared test, using the number of matches received for a node’s request to estimate the proportion of non-selfish nodes in the network, and thus is quite different from the techniques as discussed above.

Mathur et al. [41] propose an algorithm that detects black-hole and selective forwarding attacks, defends against these attacks by employing cryptographic hashes for black-hole attacks, and applies a neighborhood watch and threshold-based analysis to correct selective forwarding attacks. In [23], the authors uses multipath routing scheme to defend against selective forwarding attacks in WSN, such that when a node detects the packet drop during the routing process, it will resend the packet to the alternative route. Karlof et al. [37] propose multi-path routing to defend against the selective forwarding attacks, in which they basically allow every packet to choose its next hop probability from a set of possible nodes, thus reducing the chances of having adversary nodes gain complete control of data flow. Likewise, Lee and Cho [39] propose a fuzzy-based reliable data-delivery method to defend against selective forwarding attacks, where the number of paths for packet delivery is based on the fuzzy method. However, for protecting against selective forwarding attacks, my algorithm increases the number of nodes that forward the requests to maintain high match probability, rather than using threshold-based analysis, multipath routing, or a fuzzy-based scheme.

2.6 Random networks

Some prior work has focused on random networks that might contain the following properties: 1) the network has a small-world effect; 2) the network follows a power-law distribution; 3) the network follows a binomial distribution. The first property means that most nodes are reachable from other nodes within a small number of hops, and this property can also apply to the existing social networks [43]. The second property means that the probabilities of the node degrees vary as a power-law. The third property indicates that degree of any node follows a binomial distribution.

In my experimental evaluation section, I have considered three different random networks, namely, Erdos-Renyi (ER) network, Watts-Strogatz (WS) network, and Barabasi-Albert (BA) network, in the context of my algorithm for a P2P search and retrieval system. Therefore, I briefly introduce these random networks in the below subsections.

2.6.1 Erdos-Renyi (ER) network

Erdos-Renyi (ER) network [17], a classic random network that contains small-world effect, forms a connection between two nodes with random probabilities. The ER network has the following properties: 1) every link I _{i
j} has a probability of p∈(0,1) of being selected, 2) every link I _{i
j} is independent from other links, 3) the total number of edges is $\frac {p \times n(n-1)}{2}$, 4) the degree of any node forms a binomial distribution, 5) the presence of the link I _{i
j} is based on:

$$ I_{ij}= \left\{\begin{array}{ll} 1, & \text{with probability}\ p \\ 0, & \text{with porbability}\ 1-p \end{array}\right. $$

(1)

2.6.2 Watts-Strogatz (WS) network

The Watts-Strogatz (WS) network [60], another classic random network with small-world effects, is formed after the Erdos-Renyi (ER) network. The author created this network to address the limitations of ER networks, where the aim is to avoid a low clustering coefficient. Therefore, the WS network is claimed to solve the low-clustering issue, while still maintaining the short average path lengths of the ER network. The WS network is constructed with the following rules: 1) network initially places nodes to one or multi-dimensional lattices (e.g., circle or grid), 2) network connects each node with its m a x N of nearest neighbors, 3) nodes add a few long-range links based on the re-wiring probability β, such that the overall network has a short average path lengths [56], 4) the re-wiring probability β that connects the random pair of nodes for each link I _{i
j} in the network is based on:

$$ I_{ij}= \left\{\begin{array}{ll} 1, & \text{with probability}\ \beta \\ 0, & \text{with porbability}\ 1-\beta \end{array}\right. $$

(2)

If the re-wiring probability β is high, the network might have more chances to establish long-range connections between two random nodes. On the other hand, if the re-wiring probability β is low, the network would look the same as the original lattices.

2.6.3 Barabasi-Albert (BA) network

Barabasi-Albert (BA) network [2], a scale-free network with preferential attaching to high-degree nodes, follows a power-law distribution. As a result, the BA network tends to form a rich-get-richer process. The BA network contains the following properties: 1) network begins with an initial connected network of s m a l l N nodes, 2) when a new node joins the network, it forms a connection to the existing s m a l l N nodes with a connecting probability p _i, which is proportional to the summation of links from these existing s m a l l N nodes, 3) the connection probability p _i that a new node is connected to node i is based on:

$$ p_{i} = \frac{k_{i}}{{\sum}_{j}k_{j}} $$

(3)

where k _i is the degree of node i, and j is the number of existing nodes. From Eq. 3, we can see that the high-degree nodes tend to accumulate more links, while low-degree nodes acquire hardly any new links.

3 System model

The steps of my system are given below:

1.
A new node initially called g e t B o o t V i e w() (Section 4.1) to obtains some members from the bootstrapping nodes selected at random, and joins the network. Once that node has successfully joined to the system, it can become a source node or a requesting node at any time.
2.
When a node wishes to distribute its metadata, it becomes a source node. The source node first calls a l D i s t r i b u t i o n() (Section 5.6) to determine the appropriate forwarding fanouts a and forwarding levels l, based on the expected number of nodes to which the metadata should be distributed m. Finally, the source node calls the d i s t r i b u t e() (Section 4.2) to distribute and forward the metadata message to randomly chosen nodes in its view.
3.
Similarly, when a node wishes to distribute its request, it becomes a requesting node. The requesting node first calls the E W M A() (Section 5.3) to calculate the current non-normalized observed probability array O, and then applies x f G e t() (Section 5.7) (the method that contains n o r m() (Section 5.4) and m o d C h i S q() (Section 5.5)) to estimate the proportion of subverted and selfish nodes in the network, based on its previous request results. Next, the requesting node calls the a l G e t() (Section 5.8) (the method that contains p d f() (Section 5.1) and f i n d M a t c h P r o b a b i l i t y() (Section 5.2)) to calculate the appropriate value of a forwarding fanouts and the l levels of message forwarding to which a request is distributed. Finally, the requesting node calls r e q u e s t() (Section 4.3) to distribute and forward the request to randomly chosen nodes in its view.

Figure 1 shows the flow diagram of my system, including all the steps mentioned above.

4 Design of P2P search and retrieval system

The P2P Search and Retrieval system is a fully distributed system with no centralized mechanisms and no centralized control. The nodes that participate in the network constitute the membership of the network. The source nodes produce metadata that describes its information, and distribute this metadata, along with the URL of the information, to a subset of the participating nodes selected at random. The requesting nodes generate requests that contain search keywords, and distribute the requests to a subset of the participating nodes selected at random. Nodes that receive such a request compare the keywords from the request with the metadata they hold. If there is a match, the matching node returns the URL of the associated information to the requesting node. The requesting node can then use this URL to further retrieve the information from the source node. In my system, nodes do not aim to maintain an agreed membership, but rather are allowed have their own and limited local view of the membership. The following subsections describe the detailed design of my system.

4.1 Joining the membership, getBootView()

The steps of joining the membership are given below, and are shown in Fig. 2.

1.
A node joining the membership first senses all of the nodes near itself. Next, it randomly selects a number of nodes to be its bootstrapping nodes, to which it needs to distribute membership requests, and asks for all of their members. Some of these bootstrapping nodes might forward these membership requests to other nodes in the network, based on the chosen forwarding probability.
2.
The bootstrapping nodes return all of its members to the joining node.
3.
The joining node then randomly chooses up to m a x N nodes, and adds them to its membership view.

4.2 Distribution metadata, distribute()

The steps involved when a source node distributes its metadata are given below, and are illustrated in Fig. 3

1.
The source node first randomly chooses some nodes to which it needs to distribute its metadata. The source node then distributes its metadata together with the URL to the selected nodes.
2.
Based on the chosen forwarding probability, some nodes might further forward this metadata to other nodes in the network.

4.3 Distribution requests, request()

The steps involved when a requesting node distributes requests are given below, and are illustrated in Figs. 4 and 5.

1.
A requesting node randomly chooses some nodes to which it needs to distribute its requests, and distributes its request to the selected nodes. Based on the chosen forwarding probability, some nodes might further forward this request to other nodes in the network.
2.
A node that receives a request compares the keywords in the request with the metadata it holds. If it finds a match, the node responds to the requesting node with the URL.
3.
The requesting node contacts and retrieves the actual sources from the source node.

5 Foundations

In this section, I describe the foundations of my proposed algorithm. The algorithm detects two kinds of attacks, where one attack is such that nodes do not match requests or metadata (called subverted nodes), and the other attack is such that nodes do not forward messages (called selfish nodes).

5.1 Probabilistic Analysis with probability density function, pdf()

In order to calculate the expected number of distinct nodes to which n ₀’s message is forwarded, I employ the following equation, as described in [6, 21, 46], to first calculate the probabilities of intersection between two subsets in order to eliminate duplicates. The parameters for the equation are described as follows:

N: The set having cardinality n
Y: A subset of N having cardinality y
Z: A subset of N having cardinality z, where y≤z.

The probability density function (pdf) v(k) for 0≤k≤y, that $Y \cap Z$ has cardinality k is then given by:

$$ v(k) = \binom{y}{k} \prod\limits_{j=0}^{min(z,k-1)}\frac{z-j}{n-j} \prod\limits_{j=0}^{min(n-z,y-k-1)}\frac{n-z-j}{n-k-j} $$

(4)

for k−1≥0, n−z≥0, and y−k−1≥0. If any of these conditions is not satisfied, then v(k)=1.

Next, I employ the algorithm described in [46], to calculate the probability density function (pdf) [6, 21] for the number of nodes that receive the messages in terms of forwarding level, forwarding fanout, and forwarding probability. The algorithm calculates the probability density function (pdf) for the number of nodes at the level of message forwarding to which n ₀’s message is forwarded from levels 0 through L, and the probability density function (pdf) for the number of nodes to which n ₀’s message is forwarded from levels 0 through L. In addition, the algorithm calculates the expected number of nodes at level l, and the expected number of nodes from level 0 through L.

5.2 Match probabilities with message forwarding, findMatchProbability()

The primary parameters for the match probabilities with message forwarding are:

a: The upper bound on the number of forwarding fanouts that a node should forward the message
l: The upper bound on the number of forwarding levels that a node should forward the message
m: The upper bound on the number of nodes in the network that have received n ₀’s metadata with fanout of a from level 0 to l, and is given by:
$$ m = 1 + a + a^{2} + {\dots} + a^{l} = \left\{\begin{array}{ll} \frac{a^{l+1}-1}{a-1} & \text{if } a>l\\ l+1 & \text{if } a=1 \end{array}\right. $$
(5)
r: The upper bound on the number of nodes in the network that have received n ₀’s requests with fanout of a from level 0 to l, and is given by:
$$ r = 1 + a + a^{2} + {\dots} + a^{l} = \left\{\begin{array}{ll} \frac{a^{l+1}-1}{a-1} & \text{if } a>l\\ l+1 & \text{if } a=1\\ \end{array}\right. $$
(6)
n: The number of participating nodes in the network.
X: The number of participating nodes that are operational.
k: The number of nodes that report matches to a requesting node.
F: The number of participating nodes that are non-selfish and forward messages.

The analytical model is based on the hypergeometric distribution [21], which describes the number of successes in a sequence of random draws from a finite population without replacement. The analytical model that finds match probabilities of k matches is given by:

$$\begin{array}{@{}rcl@{}} P(k) \!&=&\! \frac{\binom{mx}{k}\binom{n-mx}{r-k}}{\binom{n}{r}}\\ \!&=&\! \frac{(\frac{mx}{k} \frac{mx-1}{k-1} \!\ldots\! \frac{mx-k+1}{1}) (\frac{n-mx}{r-k} \frac{n-mx-1}{r-k-1} \!\ldots\! \frac{n-mx-r+k+1}{1})} {(\frac{n}{r} \frac{n-1}{r-1} \!\ldots\! \frac{n-r+1}{1})}\\ \end{array} $$

(7)

for m x + r≤n and $k \leq \min \{mx, r\}$. If either of those two conditions is not satisfied, then P(k)=0.

From Eq. 7, the match probabilities P(k≥1) of one or more matches is given by:

$$\begin{array}{@{}rcl@{}} P(k \geq 1) &&\!= 1 - P(0) \\ &&\!= 1- \frac{n-mx}{n} \frac{n-mx-1}{n-1} {\ldots} \frac{n-mx+1-r}{n-r+1} \end{array} $$

(8)

where m x + r≤n.

The calculation for finding match probabilities of k matches with message forwarding is given in below, which combines both hypergeometric distribution and pdf() function [6, 21, 46, 47]:

$$\begin{array}{@{}rcl@{}} MP= {\sum}^{n}_{m=a+1} {\sum}^{n}_{r=a+1} P(k)\!\!&&\times pdf (n, a, l, F).get(m) \\&&\times pdf(n, a, l, F).get(r) \end{array} $$

(9)

Basically, the method calculates the probability of k match P(k), as described in Eq. 7, and probabilistic analysis with probability density function (pdf), as described in Section 5.1 for both p d f(n,a,l,F).g e t(m) and p d f(n,a,l,F).g e t(r), for the number of nodes reached when forwarding to m and to r nodes. The match probability is the summation of the product of these match probabilities for all values of m and r.

5.3 Exponential Weighted Moving Average Algorithm, EWMA()

My dynamic adaptive forwarding algorithm uses the exponential weighted moving average (EWMA) algorithm to smooth the observations and average the non-normalized observations for k matches over a sequence of requests. In this way, the overall result would decrease the noise that is inherited in the individual samples.

The primary parameters for the EWMA algorithm are:

c: The smoothing factor, 0≤c≤1
s _t: The output of the EWMA algorithm at time t.
v _t: The current non-normalized observed probabilities O(k) for k matches

The EWMA algorithm is defined by:

$$\begin{array}{@{}rcl@{}} s_{1} &=& v_{1}\\ s_{t} &=& c \times v_{t} + (1-c) \times s_{t-1} \quad \text{if} \; t > 1 \; \end{array} $$

(10)

In addition, I offer the user the opportunity to choose the appropriate value of c for his/her network environment. Different users, operating in different network environments with different objectives, might choose different values of c, and my system allows them to do so.

5.4 Normalization, norm()

In real-world deployments and in my experiments, it’s impossible to use a request to estimate the proportion of subverted nodes and selfish nodes when all of the responses indicate no match. This is because such responses can arise in the following situations: 1) when the metadata and the requests are distributed to disjoint subsets of nodes, 2) when there exist subverted nodes that have a match and do not report it, 3) when there exist selfish nodes that have received a message but did not forward this message, and 4) when there exists no metadata that match a request.

Thus, I exclude requests for which all responses indicate no match. Moreover, for large values of k, the probability of k matches is negligibly small, so I further exclude those requests as well. That is, I first determine a value k M a x, exclude requests that return k responses for k>k M a x, and normalize the probabilities P(k),1≤k≤k M a x. To find the normalized probabilities Q(k), 1≤k≤k M a x, I perform the following calculation:

$$ Q(k) = \frac{P(k)}{{\sum}_{i=1}^{K}P(i)} $$

(11)

5.5 Modified chi-squared method, modChiSq()

My algorithm first uses the modified chi-squared goodness-of-fit test [48] to first compare the normalized observed probabilities with the normalized analytical probabilities for different values of X. Next, my algorithm determines which of the analytical curves best matches the observed probabilities, in order to estimate the proportion of non-subverted nodes in its membership view. The modified chi-squared statistic is given as follows:

$$ \chi^{2} = \sum\limits_{k=1}^{kMax}\frac{(o_{k}-e_{k})^{2}}{o_{k}+e_{k}} $$

(12)

where:

o _k: The actual number of observations that fall into the kth bucket
e _k: The expected number of observations for the kth bucket
k M a x: The number of buckets into which the observations fall

5.6 Calculate appropriate a and l for metadata distribution, alDistribution()

According to [42], Melliar-Smith et al. have shown that the probability of one or more matches satisfied:

$$ P(k \geq 1) > 1 - \exp^{\frac{mrx}{n}} $$

(13)

If the source node distributes its metadata to $m=2\sqrt {n}$ of nodes, and the requesting node distributes its requests to $r=2\sqrt {n}$ of nodes, with x=1 proportional of nodes that are operational, then

$$ P(k \geq 1) > 1 - \exp^{\frac{2\sqrt{n}2\sqrt{n}}{n}} \geq 1 - \exp^{-4} > 0.9817 $$

(14)

Therefore, in order to obtain a high probability of one or more matches, I choose $m=2\sqrt {n}$ for the total number of nodes to distribute metadata. However, since my algorithm follows a message forwarding approach, the source node would need to first determine an appropriate number of fanouts (a) and forwarding levels (l). Therefore, my algorithm applies the algorithm, as shown in Fig. 6, that determines the appropriate number of a and l, such that the overall number of distributions would still be appropriate to achieve $m=2\sqrt {n}$ of nodes.

The parameters and variables for a l D i s t r i b u t i o n() that determine the appropriate value of a and l to which a source node distributes its metadata are described as follows:

n: The number of nodes in a node’s view of the membership.
m L: The upper bound on the number of nodes in the network that should receive the source node’s metadata.
l: The upper bound on the number of forwarding levels that a node should forward the source node’s metadata
a: The upper bound on the number of forwarding fanouts that a node should forward the source node’s metadata
m: The expected number of nodes to which the metadata should be distributed

5.7 Detecting subverted and selfish nodes, xfGet()

If the requesting node determines that there exist nodes that receive both metadata and the requests, but did not report the match, those nodes are all considered subverted nodes. In addition, if the requesting node discovers that there exist nodes that did not forward the message, those nodes are considered selfish nodes. My algorithm for detecting subverted and selfish nodes estimates the proportion of non-subverted and non-selfish nodes. For given values of n, a, l, k M a x, x f G a p, and x f M i n, my algorithm first computes the analytical probabilities for the number k of matches for various values of X and F. These values of X and F enable the algorithm to find the estimated proportion x of non-subverted nodes and the proportion of f of non-selfish nodes, which yields significant changes in the number of nodes to which to distribute requests.

My algorithm first collects data on the number of responses that a requesting node receives for its request using the probability density function (pdf), EWMA method, and modified chi-squared test. Next, it calculates the empirical probabilities from the data. The algorithm then applies the normalization method to exclude zero matches, since it cannot distinguish a case in which no metadata exists from a case in which no node holds both the metadata and the request, or in which nodes do not forward messages. Using the probability density function (pdf), EWMA method, and modified chi-squared test, my algorithm compares the normalized observed probabilities O(k) and the normalized analytical probabilities P[F,X](k), for k=1,2,...,k M a x with the various combinations of curves [ F,X].

The total combinations of curves [ F,X] depend on the value of x f G a p and x f M i n, and my system allows users to set these values for their particular network environments. When the value of x f G a p is low (e.g. x f G a p=0.1), it provides a faster reaction to change the values of x and f, as well as having a higher risk of making mistakes. On the other hand, when the value of x f G a p is high (e.g. x f G a p=0.5), it gives slower reactions to change the values of x and f, and has a lower risk of making wrong estimates. Consequently, for different network environments with different objectives, users might choose different value of x f G a p, and my system allows them to do so. For example, if an user sets x f G a p=0.3, x f M i n=0.4, it will create the following 9 combinations of curves [ F,X]: 1) [ F=1,X=1], 2) [ F=1,X=0.7], 3) [ F=1,X=0.4], 4) [ F=0.7,X=1], 5) [ F=0.7,X=0.7], 6) [ F=0.7,X=0.4], 7) [ F=0.4,X=1], 8) [ F=0.4,X=0.7], and 9) [ F=0.4,X=0.4].

After the system has finished creating the combination of curves [ F,X], my algorithm then chooses as the observed proportion of non-subverted and non-selfish nodes, the value of x and f for which the modified chi-squared value is the smallest. This value of f and x is the algorithm’s best estimate of the proportion of non-subverted and non-selfish nodes, and corresponds to the curve with the best fit.

The parameters for detection method x f G e t() that determines the estimated proportion x of non-subverted nodes and f of non-selfish nodes are described as follows:

O _i: The array of unnormalized observed probabilities at time i.
n: The number of nodes in a node’s view of the membership.
k M a x: The upper bound on the number k of responses to a request.
P[F,X]: The array of analytical probabilities for a particular value of F and X.
x: The estimated proportion of non-subverted nodes in a node’s view
f: The estimated proportion of non-selfish nodes in a network that forward messages.
l: The upper bound on the number of forwarding levels that a node should forward the requesting node’s requests
a: The upper bound on the number of forwarding fanouts that a node should forward the requesting node’s requests
x f G a p: The gap between each curve [ F,X]
x f M i n: The minimum value for F and X curve.

Pseudocode for detection algorithm is given in Fig. 7.

5.8 Protecting against subverted and selfish nodes, alGet()

If the detection algorithm estimates that the proportion x of non-subverted nodes is less than x=1.0, or the proportion f of non-selfish nodes is less than f=1.0, my algorithm makes an adjustment in which it first determines the value of y ₀ for the point on the X=1.0 and F=1.0 curve, and then computes the probability of one or more matches to obtain y ₀ = P(k≥1). Finally, the algorithm increases the number of l levels and a fanouts to which the requests are distributed, to achieve the same probability of a match as when X=1.0 and F=1.0.

The parameters and variables for the defensive method a l G e t() that determines the number a fanout and l levels to which a requesting node distributes its request are described as follows:

n: The number of nodes in a node’s view of the membership.
y ₀: The probability y ₀ = P(k≥1) of one or more matches when $r = m = 2\sqrt {n}$ and X=1.0.
x: The estimated value of x returned by the detection method x f G e t().
f: The estimated value of f returned by the detection method x f G e t().
a: The number of forwarding fanouts
l: The number of forwarding levels

Pseudocode for the defensive method is given in Fig. 8.

6 Message forwarding algorithm

6.1 Dynamic adaptive message forwarding algorithm, dynamicAdaptive()

In this section, I describe my dynamic adaptive message-forwarding algorithm, dynamicAdaptive(), which deals with attacks by subverted and selfish nodes. In addition, my algorithm addresses situations where every node cannot have knowledge of all the nodes in its membership, especially on a very large-scale network. In summary, the aims of my algorithm are to: 1) utilize the probability density function (pdf), EWMA method, and modified chi-squared test to detect subverted and selfish nodes; 2) defend against subverted and selfish nodes by appropriately adjusting the number of requests to achieve high match probability; and 3) ensure network robustness and scalability with different random networks and varied network sizes. The basic idea of my algorithm is described as follows, and the pseudocode for the dynamic adaptive algorithm is given in Fig. 9.

1.
A newly joining node calls g e t B o o t V i e w() to get its initial view of the membership from a bootstrapping node, randomly chooses up to m a x N nodes, and adds them into its initial view.
2.
A source node waits until the current time reaches n e x t T i m e, or the time to send its next metadata message. It then calls the a l D i s t r i b u t i o n() method to determine the appropriate a and l for the metadata distribution, and then calls the d i s t r i b u t e() method to distribute the metadata message to randomly chosen nodes in its view. Some nodes may forward the metadata they receive to other nodes in the network.
3.
Similarly, a requesting node waits until the current time reaches n e x t T i m e, or the time to send its next request message. The requesting node first checks whether its a and l have been previously determined. If not, it sets X=1 and F=1, then calls a l G e t() to calculate the value of a and l.
4.
After that, the requesting node calls the r e q u e s t() method to distribute the request to randomly chosen nodes in its view. Some nodes may forward the requests they receive to other nodes in the network.
5.
The requesting node further applies the EWMA() method to calculate the current non-normalized observed probability array O.
6.
Next, the requesting node waits until it has collected up to d requests, and then calls the x f G e t() method to estimate the proportion of subverted and selfish nodes in the network.
7.
The requesting node checks the number of successive estimates hair by a modified chi-squared test that indicates the same changes in the value of x and f, determines whether it should accept the change and allows the algorithm to take action to change the value of a and l. If the number of successive estimates is greater than hair, the requesting node calls the a l G e t() method to adjust the number a forwarding fanouts and the l levels of message forwarding to which a request is distributed. Otherwise, the requesting node simply sets x and f to its previous determined values (x = x P r e v, f = f P r e v).
8.
The requesting node sets the the previous x and previous f to its current values (x P r e v = x,f P r e v = f)
9.
The algorithm then goes back to Line 3 and repeats these steps indefinitely.

6.2 Non-adaptive message forwarding algorithm, nonAdaptive()

In this section, I describe my non-adaptive message-forwarding algorithm that deals with attacks by subverted and selfish nodes, nonAdaptive(). Basically, the non-adaptive algorithm is exactly the same as the dynamicAdaptive() algorithm, described in Section 6.1, except that it does not call the a l G e t() method to adjust the value of a and l. In other words, the non-adaptive algorithm only focuses on detecting the proportion of subverted and selfish nodes in the network, but does not defend against these malicious and selfish nodes. The pseudocode for the non-adaptive algorithm is nearly the same as Fig. 9, except that lines 19, and 20 are omitted. The basic ideas of the non-adaptive algorithm and pseudocode are omitted due to space constraints.

7 Experimental evaluation

In this section, I performed a simulation to evaluate my non-adaptive and dynamic adaptive message-forwarding algorithms. My system operates over the Internet, and is implemented using HTTP, which operates over TCP. As such, communication is “reliable”. However, nodes may become malicious and selfish at any time. That is, nodes in the network are assumed to follow the algorithms, except that:

A node is allowed to have a certain number of members in its local view, in which case it only maintains a partial view of the actual network.
A malicious node responds to a request but, if it has a match between the keywords in a request and the metadata it holds, it fails to report that match.
A selfish node that sends metadata responds to requests, but does not forward messages.

In the simulation program, I first initialized the numbers n, m a x N, k M a x, c, d, a, l, and hair. Next, I constructed a random network (e.g., an Erdos-Renyi network, a Watts-Strogatz network, or a Barabasi-Albert network), and added up to m a x N of nodes to each node’s view. Then, at each time step, some nodes might distribute metadata messages, distribute request messages, or forward messages. Some nodes might behave maliciously by not responding to a request with a match when they do have a match. Similarly, some nodes might behave selfishly by not forwarding to a message. Then, I evaluated the network robustness and scalability for the following scenarios: 1) various network sizes, 2) various values of m a x N, 3) various random networks, 4) various values of X, and 5) various values of F. Lastly, in order to reflect a realistic network, I designed an extended scenario such that X and F increases or decreases its value slowly, and tested the effectiveness and scalability of my adaptive algorithm with this extended scenario.

7.1 Controlled variables

Malicious nodes can disrupt the behavior of the system and network by hiding or censoring information. Thus, my adaptive algorithm must adapt to circumvent such malicious nodes. A node can’t know or control proportion of the malicious nodes in the actual membership, but it can adjust the number of request messages in order to obtain more responses, and obtain high match probability. To do so, nodes make use of the following quantity:

X: The proportion of non-malicious nodes in the actual membership at a particular point in time.

When the overall network size is extremely large (e.g., N>>m a x N), it is unrealistic and difficult to have all nodes maintaining a full view of the membership due to limitations on memory space and bandwidth. In other words, when a node’s maximum view m a x N is much smaller than the actual network size N, it would contain too many undiscovered nodes from the network. As a result, my adaptive algorithm must adapt to circumvent such a scenario, and still allow nodes to obtain high match probabilities even when m a x N is much lower than N. For my algorithm, a node cannot directly detect the number of nodes that it has not discovered, but it can adjust the number of nodes to which it distributes its requests in order to obtain more responses to its requests. In my system, each node has the following quantity:

m a x N: This is the maximum number of nodes that a node is allowed to have in its current view.

Similarly, selfish nodes can disrupt the behavior of the system and network by ceasing the information flow to other nodes. As a result, my adaptive algorithm must adapt to circumstances when selfish nodes send metadata or respond to requests, but do not forward messages. A node cannot control the proportion of selfish nodes in the network, but it can adjust the number of forwarding levels and the number of forwarding messages to obtain more responses to its requests and ensure high match probability. In order to do so, nodes make the use of the following quantity:

F: The probability of message forwarding in the network at a particular point in time, where 0<F≤1.0.

7.2 Performance metrics

The performance metrics for the scalable and dynamic message-forwarding algorithm are explained as follows:

MP: The match probability of one or more responses for a request, averaged over all requesting nodes.
MC: The message cost per node per time unit.
x: The estimated proportion of non-malicious nodes in the actual network at a particular point in time.
f: The estimated probability of non-selfish nodes in the actual network at a particular point of time.
MA: The accuracy of my algorithm for estimating the correct values of f and x.

7.3 Varying n and a with fixed m a x N on Erdos-Renyi network

I first considered the scenario in which I applied a non-adaptive algorithm and varied n and a with fixed m a x N on an Erdos-Renyi network. In the simulation, I set m a x N=1000, n=1000,5000,10000, a=8,12,14, k M a x=9, c=0.999, d=50, l=2, h a i r=6, x f G a p=0.3, and x f M i n=0.4. I performed the experiment on the following nine cases:

Case 1: F=1.0, X=1.0
Case 2: F=1.0, X=0.7
Case 3: F=1.0, X=0.4
Case 4: F=0.7, X=1.0
Case 5: F=0.7, X=0.7
Case 6: F=0.7, X=0.4
Case 7: F=0.4, X=1.0
Case 8: F=0.4, X=0.7
Case 9: F=0.4, X=0.4

Figure 10 shows the MP curves of: 1) n=1000, a=8; 2) n=5000, a=12; and 3) n=10000, a=14, where all the curves were generated with a non-adaptive algorithm on an Erdos-Renyi network. Each node is allowed to have its current view of the membership up to m a x N=1000, regardless of actual network size n. In Fig. 10, I first noticed that when X and F decreases, the MP also decreases for n=1000, n=5000, and n=10000. Similarly, when both X and F decrease, the MP dramatically drops to about M P=0.301818 for n=1000, n=5000, and n=10000. In addition, for n=1000, n=5000, and n=10000, the mean value of MC is M C=72 for n=1000, M C=156 for n=5000, and M C=210 for n=10000. Lastly, I noticed that the match probabilities MP for n=1000 are all very close to n=5000 and n=10000. This shows the effectiveness of my algorithm, because a node can still obtains similar match probabilities, even with different values of n. Subsequently, I chose n=10000, a=14, for the following experiment.

7.4 Varying m a x N with fixed n on Erdos-Renyi network

I considered a scenario in which I applied a non-adaptive algorithm and varied m a x N with fixed n on an Erdos-Renyi network. In the simulation, I set n=10000, m a x N=1000,5000,10000, a=14, k M a x=9, c=0.999, d=50, l=2, h a i r=6, x f G a p=0.3, and x f M i n=0.4. I performed the experiment on the following nine cases:

Case 1: F=1.0, X=1.0
Case 2: F=1.0, X=0.7
Case 3: F=1.0, X=0.4
Case 4: F=0.7, X=1.0
Case 5: F=0.7, X=0.7
Case 6: F=0.7, X=0.4
Case 7: F=0.4, X=1.0
Case 8: F=0.4, X=0.7
Case 9: F=0.4, X=0.4

Figure 11 shows the MP curves of m a x N=1000, m a x N=5000, and m a x N=10000, where all the curves were generated with a non-adaptive algorithm on an Erdos-Renyi network. In Fig. 11, I first noticed that when X and F decrease, the MP also decreases for m a x N=1000, m a x N=5000, and m a x N=10000. Similarly, when both X and F decrease, the MP dramatically drops to about M P=0.294318 for m a x N=1000, m a x N=5000, and m a x N=10000. In addition, for m a x N=1000, m a x N=5000, and m a x N=10000, the overall mean values of MC are all M C=210. Moreover, I noticed that the match probability for m a x N=1000 is close to m a x N=5000 and m a x N=10000. Lastly, by comparing Figs. 10 and 11, I concluded that even when the network has different values of m a x N or n, my algorithm still produces a similar MP. Consequently, I chose m a x N=1000 for my following experiment.

7.5 Varying different random networks

In this section, I considered a scenario in which I applied a non-adaptive algorithm and set n=10000, a=14, l=2, m a x N=1000, k M a x=9, c=0.999, d=50, h a i r=6, x f G a p=0.3, and x f M i n=0.4 on an ER network, a WS network, and a BA network. For the WS and BA networks, the source node distributes its metadata evenly across the overall network in order to obtain a fair amount of responses to its request messages. In addition, by having metadata distribute evenly to the network, some popular nodes would not be more vulnerable to attacks. I performed the experiment on the following nine cases:

Case 1: F=1.0, X=1.0
Case 2: F=1.0, X=0.7
Case 3: F=1.0, X=0.4
Case 4: F=0.7, X=1.0
Case 5: F=0.7, X=0.7
Case 6: F=0.7, X=0.4
Case 7: F=0.4, X=1.0
Case 8: F=0.4, X=0.7
Case 9: F=0.4, X=0.4

Figure 12 shows the MP curves with non-adaptive algorithms on the ER network, WS network, and BA network. In Fig. 12, I first noticed that when X and F decrease, the MP also decreases for all three random network. Similarly, when both X and F decrease, the MP dramatically decreases to about M P=0.292045 for all three random networks. In addition, for all three random networks, the overall mean values of MC are all M C=210. Moreover, I noticed that the match probability for the ER network remains quite close to that for the WS network and BA network.

Figure 13 shows the MA curves with a non-adaptive algorithm on the ER network, WS network, and BA network. In Fig. 13, I noticed that the accuracy MA for detecting the value of X and F remains high on all three random networks. As a result, these figures demonstrate the robustness and scalability of my algorithm, because a node can still obtain similar match probabilities with different random networks.

7.6 Non-adaptive vs. dynamic adaptive algorithm

In this section, I considered a scenario where I compared the non-adaptive and dynamic adaptive message-forwarding algorithm on the ER network, WS network, and BA network. In the simulation, I set n=10000, a=14, l=2, m a x N=1000, k M a x=9, c=0.999, d=50, h a i r=6, x f G a p=0.3, x f M i n=0.4, and performed the experiment using the following nine cases:

Case 1: F=1.0, X=1.0
Case 2: F=1.0, X=0.7
Case 3: F=1.0, X=0.4
Case 4: F=0.7, X=1.0
Case 5: F=0.7, X=0.7
Case 6: F=0.7, X=0.4
Case 7: F=0.4, X=1.0
Case 8: F=0.4, X=0.7
Case 9: F=0.4, X=0.4

Tables 1, 2, and 3 show the message costs MC for non-adaptive and adaptive algorithms on the ER network, WS network, and BA network. First of all, the MC of the non-adaptive algorithm remains M C=210 for all three networks. Next, as X and F decrease, MC from the adaptive algorithm increases. When the values of X and F are both low, such as X=0.4 and F=0.4, the MC increases moderately from M C=210 to M C=1145.06. The increased value of MC is still within a reasonable range because when X and F are both low (e.g. X=0.4 and F=0.4), the requesting node only increases its requests to about 10 % of the total nodes in the network. Lastly, I noticed that the message costs of non-adaptive and adaptive algorithms for the ER network are quite close to those of the WS network and BA network.

Table 1 MC for ER network

Full size table

Table 2 MC for WS network

Full size table

Table 3 MC for BA network

Full size table

Figures 14, 15, and 16 show the MP for non-adaptive and adaptive algorithms on the ER network, WS network, and BA network. From these three figures, I noticed that when X and F decrease, the MP of the non-adaptive algorithm decreases for all three random networks. However, the MP of my adaptive algorithm increases significantly for all three random networks. In addition, these figures demonstrated the robustness and effectiveness of my dynamic adaptive algorithm, because when the network only has relatively low values of X and F, my algorithm can still adjust the number of requests to ensure high MP for all three random networks. For example, for x=0.4 and f=0.4, the MP is still about M P=0.81 for all three random networks. Moreover, I observed that the MP of non-adaptive and adaptive algorithms for the ER network remains very close to the WS network and BA network.

7.7 Effectiveness of dynamic adaptive message forwarding algorithm

In this section, I considered an extended scenario in which I applied the dynamic adaptive algorithm and varied X and F with n=10000, a=14, l=2, m a x N=1000, k M a x=9, c=0.999, d=50, h a i r=6, x f G a p=0.3, x f M i n=0.4 on the ER network, WS network, and BA network. In order to reflect a realistic network, I designed my extended scenario such that X and F decrease or increase their value slowly, which comprises the following scenarios:

Scenario 1: X=1, F=1, time 0 ∼ 500
Scenario 2: X=0.95, F=1, time 500 ∼ 600
Scenario 3: X=0.9, F=1, time 600 ∼700
Scenario 4: X=0.85, F=1, time 700 ∼ 800
Scenario 5: X=0.8, F=1, time 800 ∼ 900
Scenario 6: X=0.75, F=1, time 900 ∼ 1000
Scenario 7: X=0.7, F=1, time 1000 ∼ 1500
Scenario 8: X=0.65, F=1, time 1500 ∼ 1600
Scenario 9: X=0.6, F=1, time 1600 ∼ 1700
Scenario 10: X=0.55, F=1, time 1700 ∼ 1800
Scenario 11: X=0.5, F=1, time 1800 ∼ 1900
Scenario 12: X=0.45, F=1, time 1900 ∼ 2000
Scenario 13: X=0.4, F=1, time 2000 ∼ 2500
Scenario 14: X=0.4, F=0.95, time 2500 ∼ 2600
Scenario 15: X=0.4, F=0.9, time 2600 ∼ 2700
Scenario 16: X=0.4, F=0.85, time 2700 ∼ 2800
Scenario 17: X=0.4, F=0.8, time 2800 ∼ 2900
Scenario 18: X=0.4, F=0.75, time 2900 ∼ 3000
Scenario 19: X=0.4, F=0.7, time 3000 ∼ 3500
Scenario 20: X=0.4, F=0.65, time 3500 ∼ 3600
Scenario 21: X=0.4, F=0.6, time 3600 ∼ 3700
Scenario 22: X=0.4, F=0.55, time 3700 ∼ 3800
Scenario 23: X=0.4, F=0.5, time 3800 ∼ 3900
Scenario 24: X=0.4, F=0.45, time 3900 ∼ 4000
Scenario 25: X=0.4, F=0.4, time 4000 ∼ 4500
Scenario 26: X=0.45, F=0.4, time 4500 ∼ 4600
Scenario 27: X=0.5, F=0.4, time 4600 ∼ 4700
Scenario 28: X=0.55, F=0.4, time 4700 ∼ 4800
Scenario 29: X=0.6, F=0.4, time 4800 ∼ 4900
Scenario 30: X=0.65, F=0.4, time 4900 ∼ 5000
Scenario 31: X=0.7, F=0.4, time 5000 ∼ 5500
Scenario 32: X=0.75, F=0.4, time 5500 ∼ 5600
Scenario 33: X=0.8, F=0.4, time 5600 ∼ 5700
Scenario 34: X=0.85, F=0.4, time 5700 ∼ 5800
Scenario 35: X=0.9, F=0.4, time 5800 ∼ 5900
Scenario 36: X=0.95, F=0.4, time 5900 ∼ 6000
Scenario 37: X=1.0, F=0.4, time 6000 ∼ 6500
Scenario 38: X=1.0, F=0.45, time 6500 ∼ 6600
Scenario 39: X=1.0, F=0.5, time 6600 ∼ 6700
Scenario 40: X=1.0, F=0.55, time 6700 ∼ 6800
Scenario 41: X=1.0, F=0.6, time 6800 ∼ 6900
Scenario 42: X=1.0, F=0.65, time 6900 ∼ 7000
Scenario 43: X=1.0, F=0.7, time 7000 ∼ 7500
Scenario 44: X=0.95, F=0.7, time 7500 ∼ 7600
Scenario 45: X=0.9, F=0.7, time 7600 ∼ 7700
Scenario 46: X=0.85, F=0.7, time 7700 ∼ 7800
Scenario 47: X=0.8, F=0.7, time 7800 ∼ 7900
Scenario 48: X=0.75, F=0.7, time 7900 ∼ 8000
Scenario 49: X=0.7, F=0.7, time 8000 ∼ 850

Figure 17 shows the graphs of X, F, and MP for the ER network, WS network, and BA network. In the extended scenario, X decreases from 1 to 0.7, to 0.4, then increases to 0.7, to 1, and finally decreases to 0.7. Similarly, F first decreases from 1 to 0.7, then drops to 0.4, and finally increases back to 0.7.

First of all, from Fig. 17, when X and F are low, there are few nodes that report a match, resulting in a low match probability. Thus, the values of MP of non-adaptive algorithm on the ER network change from M P=1 at time 1, to M P=0.9820 at time 1000, to M P=0.9545 at time 2000, to M P=0.8877 at time 3000, to M P=0.7898 at time 4000, to M P=0.6969 at time 5000, to M P=0.6592 at time 6000, to M P=0.6602 at time 7000, and finally to M P=0.6859 at time 8000. Similarly, for the WS network, the value of MP of non-adaptive algorithm changes from M P=1 at time 1, to M P=0.9740 at time 1000, to M P=0.9505 at time 2000, to M P=0.8927 at time 3000, to M P=0.7998 at time 4000, to M P=0.7013 at time 5000, to M P=0.6627 at time 6000, to M P=0.6595 at time 7000, and finally to M P=0.6845 at time 8000. Likewise, for the BA network, the value of MP of the non-adaptive algorithm changes from M P=1 at time 1, to M P=0.9780 at time 1000, to M P=0.9530 at time 2000, to M P=0.8920 at time 3000, to M P=0.8008 at time 4000, to M P=0.7081 at time 5000, to M P=0.6741 at time 6000, to M P=0.6736 at time 7000, and finally to M P=0.6980 at time 8000.

Next, when the values of X and F are low, the requesting node receives fewer matches, and the consequently affects the modified chi-squared to estimate low values of X and F. Hence, my dynamic adaptive algorithm increases a and l in order to maintain high match probability MP. Subsequently, the overall value of MP of the dynamic adaptive algorithm on the ER network changes from M P=1 at time 1, to M P=0.9780 at time 1000, to M P=0.9615 at time 2000, to M P=0.9583 at time 3000, to M P=0.9333 at time 4000, to M P=0.8872 at time 5000, to M P=0.8825 at time 6000, to M P=0.8919 at time 7000, and finally to M P=0.8996 at time 8000. Similarly, the value of MP of dynamic adaptive algorithm on WS network changes from M P=1 at time 1, to M P=0.9780 at time 1000, to M P=0.9685 at time 2000, to M P=0.9620 at time 3000, to M P=0.9348 at time 4000, to M P=0.8910 at time 5000, to M P=0.8877 at time 6000, to M P=0.8943 at time 7000, and finally to M P=0.9023 at time 8000. Likewise, the value of MP of dynamic adaptive algorithm on BA network changes from M P=1 at time 1, to M P=0.9810 at time 1000, to M P=0.9740 at time 2000, to M P=0.9650 at time 3000, to M P=0.9378 at time 4000, to M P=0.9016 at time 5000, to M P=0.9022 at time 6000, to M P=0.9116 at time 7000, and finally to M P=0.9171 at time 8000.

From Fig. 17, I notice that the curve of MP on the ER network is quite similar to the curve of MP for both the WS and the BA network. In addition, it is clear to see that the match probability MP for the ER network, WS network, and BA network remains high even when X and F are low, and that showed the effectiveness and robustness of my dynamic adaptive algorithm.

Tables 4, 5, and 6 show the mean values of MP and MC for the entire extended scenario on the ER network, WS network, and BA network. First of all, the value of MC of the non-adaptive algorithm for all three random networks remains M C=210 from the beginning to the end. Next, it is obvious to see that the mean values of MP and MC on the ER network are similar to the values of MP on both the WS and the BA network. In addition, I noticed that by applying my dynamic adaptive algorithm to the system, the mean match probability MP for each random network increases from M P=0.6862 to about M P=0.9046, which is quite effective. Thus, these tables confirmed the effectiveness and scalability of my dynamic adaptive algorithm, and have demonstrated that a node can still find a match even when the network has low values of X and F. Furthermore, it is clear to notice that my algorithm can appropriately adjust MC to ensure high value of MP. Finally, the overall mean value of MC for all three random networks is about M C=544.633, which is considered a reasonable cost.

Table 4 Mean value of MC and MP on ER network

Full size table

Table 5 Mean value of MC and MP on WS network

Full size table

Table 6 Mean value of MC and MP on BA network

Full size table

Overall, these experiments have demonstrated the scalability and effectiveness of my algorithm in estimating the proportion of non-malicious and non-selfish nodes, and that false alarms rarely occur. When the actual proportion X of non-malicious nodes increases or decreases, my algorithm quickly estimates the changes of x of non-malicious nodes. Similarly, when the actual proportion F of non-selfish nodes increases or decreases, my algorithm quickly determines the changes of f of non-selfish nodes. In addition, I have showed that my dynamic adaptive algorithm can effectively adjust the value of a and l for achieving high match probability MP. Moreover, I have verified that when the allowable number of nodes m a x N in a node’s view is much smaller than the actual network size n, my algorithm can still accurately estimate the value of x and f, and appropriately adjusts the number of a and l to achieve high MP. Furthermore, I considered the ER network, WS network, and BA networks in the context of my algorithm, and have showed the robustness of my algorithm on several experiments with varied network sizes. Lastly, I have constructed an extended scenario in which I varied X and F with various random networks, and have demonstrated the effectiveness and scalability of my algorithm, such that it can accurately estimate the value of x and f, appropriately adjust a and l to ensure high match probability MP, and still maintain the reasonable message cost MC.

8 Conclusion

In this paper, I have presented a dynamic adaptive message-forwarding algorithm for P2P search and retrieval system, where the main goal is to: 1) tackle the censorship and security issues; 2) employ the probability density function (pdf), Exponential Weighted Moving Average (EWMA) method, and modified chi-squared test to determine the proportion of subverted and selfish nodes; 3) defend against malicious and selective forwarding attacks by appropriately adjusting the number of requests to ensure high match probability, even when the network has large proportions of subverted and selfish nodes; and 4) guarantee robustness and scalability of my algorithm in different random networks (e.g., Erdos-Renyi, Watts-Strogatz, and Barabasi-Abert networks) and varied network sizes.

I have first explained the theory behind my dynamic adaptive message-forwarding algorithm, which uses random sampling and statistical inferences to accurately estimate the metrics that cannot be observed (e.g. malicious and selfish nodes), and thereby appropriately changes the number of requests in order to maintain a high match probability. Extensive experimental evaluations demonstrated the scalability and robustness of my algorithm, which can work well under a range of network conditions: 1) when the network has a large proportions of malicious nodes, 2) when the network has a large proportions of selfish nodes, and 3) when nodes only allow to have a mere partial view of network membership.

In the future, I plan to continue investigating and expanding my adaptive algorithms, incorporating scenarios when the network has a high rate of membership churn. In addition, I also plan to extend my work and test my algorithm when the network contains a very large membership, such as containing millions of nodes. When the network contains millions of nodes, the communication and computational costs will be larger. Therefore, I plan to investigate an algorithm that forwards messages, deals with a large number of nodes, and protects against malicious and selective forwarding attacks, but still maintains reasonable communication and computation costs.

References

Alajmi N, Elleithy K (2015) Multi-layer approach for the detection of selective forwarding attacks. Sensors 15(11):29332–29345
Article Google Scholar
Barabási A, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Article MathSciNet MATH Google Scholar
Belen R (2009) Detecting disguised missing data. PhD thesis, Middle East Technical University
Bennett I Media censorship in china. http://www.cfr.org/china/media-censorship-china/p11515. Accessed: 2015-02-08
Bianchi S, Felber P, Gradinariu M (2007) Content-based Publish/subscribe using distributed r-trees. In: Proceedings of Euro-Par. Rennes, France, pp 537–548
Brownlee KA, Brownlee KA (1965) Statistical theory and methodology in science and engineering, vol 150. Wiley, New York
Busse M, Haenselmann T (2006) Wolfgang Effelsberg. Energy-efficient forwarding schemes for wireless sensor networks. In: Proceedings of the International Symposium on on World of Wireless, Mobile and Multimedia Networks, p 2006
Chen X, Shen J, Groves T, Wu J (2009) Probability delegation forwarding in delay tolerant networks. In: Computer Communications and Networks, 2009. ICCCN Proceedings of 18th Internatonal Conference on, p 2009
Chomhaill TN, McKelvey N, Curran K, Subaginy N (2015) Internet censorship in China. In: Mehdi K-P (ed) Encyclopedia of Information Science and Technology, pages pp. 1447–1451. hershey, PA: Information Science Reference third edition edition
Chuang YT, Melliar-Smith PM, Moser LE, Michel Lombera I (2016) Maintaining censorship resistance in the iTrust network for publication, search and retrieval. Peer-to-Peer Networking and Applications 9(2):266–283
Chuang YT, Lombera IM, Moser LE, Melliar-Smith PM (2011) Trustworthy distributed search and retrieval over the Internet. In: Proceedings of the International Conference on Internet Computing, pages 169–175. NV, Las Vegas
Clarke I, Sandberg O, Wiley B, Freenet TH (2001) An distributed anonymous information storage and retrieval system. In: Proceedings of the Workshop on Design Issues in Anonymity and Unobservability, pages 46–66, Berkeley, CA
Cohen B (2008) The bittorrent protocol specification
T. Condie, S. D. Kamvar, and H. Garcia-Molina. Adaptive peer-to-peer topologies. Inproceedings of the 4th IEEE International Conference on Peer-to-Peer Computing, pages 53–62 (August 2004) Zurich Switzerland
Stephen E (1990) Deering and David R Cheriton. Multicast routing in datagram internetworks and extended lans. ACM Transactions on Computer Systems (TOCS) 8(2):85–110
Article Google Scholar
Deng H, Sun X, Wang B, Cao Y (2009) Selective forwarding attack detection using watermark in wsns. In: Computing, Communication, Control, and Management, 2009. CCCM ISECS International Colloquium on, vol 3, p 2009
Erd6s Paul, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hungar Acad Sci 5:17–61
Erramilli V, Crovella M, Chaintreau A, Diot C (2008) Delegation forwarding. In: Proceedings of the 9th ACM international symposium on Mobile ad hoc networking and computing, pp 251–260
Farley AM (1980) Broadcast time in communication networks. SIAM J Appl Math 39(2):385–390
Article MathSciNet MATH Google Scholar
Farley AM (1979) Minimal broadcast networks. Networks 9(4):313–332
Article MathSciNet MATH Google Scholar
Feller W (1968) An Introduction to Probability Theory and Its Applications, volume I John Wiley & Sons
Ferreira RA, Ramanathan MK, Awan A, Grama A, Jagannathan S (2005) Search with probabilistic guarantees in unstructured peer-to-peer networks. In: Proceedings of 5th IEEE International Conference on Peer-to-Peer Computing. Konstanz Germany, pp 165–172
Geethu PC, Mohammed AR (2013) Defense mechanism against selective forwarding attack in wireless sensor networks. In: Computing, Communications and Networking Technologies (ICCCNT) Fourth International Conference on, pages 1–4, p 2013
Ghosh AK, Schwartzbard A (1999) A study in using neural networks for anomaly and misuse detection. In: USENIX Security
Goonatilake R, Herath A, Herath S, Herath J (2007) Intrusion detection using the chi-square goodness-of-fit test for information assurance, network, forensics and software security. J Comput Sci Colleges 23(1):255–263
Google Scholar
Yu G, He T (2007) Data forwarding in extremely low duty-cycle sensor networks with unreliable communication links. In: Proceedings of the 5th international conference on Embedded networked sensor systems, pages 321–334 ACM
Gunturu R (2015) Survey of sybil attacks in social networks. arXiv:1504.05522
Gupta A, Sahin O, Agrawal D, Meghdoot A, Abbadi E (2004) Content-based publish/subscribe over P2P networks. In: Proceedings of the 5th ACM/IFIP/USENIX International Conference on Middleware, pages 254–273, Toronto, Canada
Haas ZJ, Halpern JY, Li L (2006) Gossip-based ad hoc routing. IEEE/ACM Transac Netw (ToN) 14 (3):479–491
Article Google Scholar
Hai Tran Hoang, Huh E-N (2008) Detecting selective forwarding attacks in wireless sensor networks using two-hops neighbor knowledge. In: Network Computing and Applications NCA’08. Seventh, IEEE International Symposium on, pages 325–331. IEEE, p 2008
Hales D (2004) From selfish nodes to cooperative networks - Emergent link-based incentives in peer-to-peer networks. In: Proceedings of the Fourth International Conference on Peer-to-Peer Computing, pages 151–158, Bologna, Italy
Heckert A (2006) Chi-square two sample tests, September. http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm
Hedetniemi SM, Hedetniemi ST, Liestman AL (1988) A survey of gossiping and broadcasting in communication networks. Networks 18(4):319–349
Article MathSciNet MATH Google Scholar
Helman P, Bhangoo J (1997) A statistically based system for prioritizing information exploration under uncertainty. Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on 27(4):449–466
Article Google Scholar
Isdal T, Piatek M, Krishnamurthy A, Anderson T (2010) Privacy preserving P2P data sharing with OneSwarm. In: Proceedings of the ACM SIGCOMM Conference, pages 111–122, New Delhi, India
Jesi GP, Hales D, Van Steen M (2007) Identifying Malicious peers before A decentralized secure peer sampling service. In: Proceedings of the 1st International Conference on Self-Adaptive and Self-Organizing Systems, pages 237–246, Boston, MA
Karlof C, Wagner D (2003) Secure routing in wireless sensor networks Attacks and countermeasures. Ad Hoc Netw 1(2):293–315
Article Google Scholar
Lazarevic A, Ertoz L, Kumar V, Ozgur A, Srivastava J (2003) A comparative study of anomaly detection schemes in network intrusion detection. In: Proceedings of the Third SIAM International Conference on Data Mining, pages 25–36, San francisco CA
Lee HY, Cho TH (2007) Fuzzy-based reliable data delivery for countering selective forwarding in sensor networks. In: Ubiquitous Intelligence and Computing, pages 535–544. Springer
Lee W, Stolfo SJ, et al. (1998) Data mining approaches for intrusion detection. In: Usenix security
Mathur Avijit, Newe Thomas, Rao Muzaffar (2016) Defence against black hole and selective forwarding attacks for medical wsns in the iot. Sensors 16(1):118
Article Google Scholar
Melliar-Smith PM, Moser LE, Lombera IM, Chuang YT (2012) iTrust: Trustworthy information publication, search and retrieval. In: Proceedings of the 13th International Conference on Distributed Computing and Networking, pages 351–366, Hong Kong, China
Milgram S (1967) The small world problem. Psychology today 2(1):60–67
Google Scholar
Mischke J, Stiller B (2004) A methodology for the design of distributed search in P2P middleware. IEEE Netw 18(1):30–37
Article Google Scholar
Morselli R, Bhattacharjee B, Srinivasan A, Marsh MA (2005) Efficient lookup on unstructured topologies. In: Proceedings of the 24th ACM Symposium on Principles of Distributed Computing, pages 77–86, Las Vegas, NV
Moser LE, Melliar-Smith PM (2013) Probabilistic analysis of message forwarding. In: Proceedings of the IEEE International Conference on Computer Communications and Networks, Nassau, Bahamas
Moser L. E., Melliar-Smith P. M. (2014) Analysis of the match probabilities for the itrust information network with message forwarding. In: Proceedings of the International Conference on Information Networking, Phuket, Thailand, pp 340–345
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical Recipes in Fortran: The Art of Scientific Computing. Cambridge University Press. Cambridge , United Kingdom
MATH Google Scholar
Ripeanu M (2001) Peer-to-peer architecture case study: Gnutella network. In: Peer-to-Peer Computing, 2001. Proceedings. First International Conference on, pages 99–100. IEEE
Risson J, Moors T (2006) Survey of research towards robust peer-to-peer networks Search methods. Comput Netw: Int J Comput Telecommun Netw 50(17):3485–3521
Article MATH Google Scholar
Roberts SW (1959) Control chart tests based on geometric moving averages. Technometrics 1(3):239–250
Article Google Scholar
Saeed Y, Lodhi SA, Ahmed K (2013) Obstacle management in vanet using game theory and fuzzy logic control. Int J Commun 4(1):9
Google Scholar
Sasson Y, Cavin D, Schiper A (2003) Probabilistic broadcast for flooding in wireless mobile ad hoc networks conference=Wireless Communications and Networking, 2003. WCNC 2003 IEEE, volume 2, pages 1124–1130, p 2003
Terpstra WW, Kangasharju J, Leng C, Buchmann AP (2007) Bubblestorm: REsilient, probabilistic, and exhaustive peer-to-peer search. In: Proceedings of the ACM Conference on Applications, Technologies, Architectures and Protocols for Computer Communications, pages 49–60, Kyoto, Japan
Tran DA, Pham C (2010) Enabling content-based publish/subscribe services in cooperative P2P networks. Comput Netw: Int J Comput Telecommun Netw 52(11):1739–1749
Article MATH Google Scholar
Travers Jeffrey, Milgram Stanley (1969) An experimental study of the small world problem. Sociometry, 425–443
Tsoumakos D, Roussopoulos N (2003) A comparison of peer-to-peer search methods. In: Proceedings of the Sixth International Workshop on the Web and Databases, pages 61–66, San Diego CA
Viinikka J, Debar H (2004) Monitoring IDS background noise using EWMA control charts and alert information. In: Proceedings of the 7th International Symposium on Recent Advances in Intrusion Detection, pages 166–187, French Riviera, France
Wang L, Kangasharju J (2012) Real-world sybil attacks in bittorrent mainline dht. In: Global Communications Conference (GLOBECOM), 2012 IEEE, pages 826–832, IEEE
Watts DJ, Strogatz SH (1998) Collective dynamics of small-worldnetworks. Nature 393(6684):440–442
Article Google Scholar
Wong B, Quasar SG (2008) A probabilistic publish-subscribe system for social networks. In: Proceedings of the 7th International Workshop on Peer-to-Peer Systems, Tampa Bay FL
Xiao B, Bo Y, Chemas CG (2007) Identify suspect nodes in selective forwarding attacks. J Parallel Distrib Comput 67(11):1218–1230
Article MATH Google Scholar
Xin-Sheng W, Yong-Zhao Z, Shu-Ming X, Liang-Min W (2009) Lightweight defense scheme against selective forwarding attacks in wireless sensor networks. In: Cyber-Enabled Distributed Computing and Knowledge Discovery CyberC’09. International Conference on, pages 226–232, IEEE, p 2009
Ye N, Chen Q (2001) An anomaly detection technique based on a chi-square statistic for detecting intrusions into information systems. Quality and Reliability Eng Int J 17(2):105– 112
Article MathSciNet Google Scholar
Bo Y, Xiao B (2006) Detecting selective forwarding attacks in wireless sensor networks. In: Parallel and Distributed Processing Symposium IPDPS 2006. 20th International, pages 8–pp, IEEE, p 2006
Zhou B, Shi Q, Merabti M (2006) Intrusion detection in pervasive networks based on a chi-square statistic test. In: Proceedings of the 30th International Computer Software and Applications Conference, pages 203–208, Chicago, IL

Download references

Acknowledgments

This research was supported in part by MOST 103-2410-H-194-064 and MOST 104-2410-H-194-090-MY2 of Ministry of Science and Technology, Taiwan.

Author information

Authors and Affiliations

Department of Information Management, National Chung Cheng University, 168 University Rd., Minhsiung Township, Chia-Yi, 62100, Taiwan
Yung-Ting Chuang

Authors

Yung-Ting Chuang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yung-Ting Chuang.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chuang, YT. Protecting against malicious and selective forwarding attacks for P2P search & retrieval system. Peer-to-Peer Netw. Appl. 10, 1079–1100 (2017). https://doi.org/10.1007/s12083-016-0500-1

Download citation

Received: 01 March 2016
Accepted: 31 July 2016
Published: 24 August 2016
Issue Date: July 2017
DOI: https://doi.org/10.1007/s12083-016-0500-1

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Protecting against malicious and selective forwarding attacks for P2P search & retrieval system

Abstract

Similar content being viewed by others

An Efficient, Secure and User Privacy-Preserving Search Protocol for Peer-to-Peer Networks

Next level peer-to-peer overlay networks under high churns: a survey

TMV: Trust-Matrix-Value Based Neighbor Peer Selection for Secure Query Forwarding in P2P Networks

1 Introduction

2 Related work

2.1 Peer-to-peer network

2.2 Exponential weighted moving average and chi-squared statistics

2.3 Message forwarding and probability density function

2.4 Detecting and defending against malicious attacks

2.5 Detecting and defending against selective forwarding attacks

2.6 Random networks

2.6.1 Erdos-Renyi (ER) network

2.6.2 Watts-Strogatz (WS) network

2.6.3 Barabasi-Albert (BA) network

3 System model

4 Design of P2P search and retrieval system

4.1 Joining the membership, getBootView()

4.2 Distribution metadata, distribute()

4.3 Distribution requests, request()

5 Foundations

5.1 Probabilistic Analysis with probability density function, pdf()

5.2 Match probabilities with message forwarding, findMatchProbability()

5.3 Exponential Weighted Moving Average Algorithm, EWMA()

5.4 Normalization, norm()

5.5 Modified chi-squared method, modChiSq()

5.6 Calculate appropriate a and l for metadata distribution, alDistribution()

5.7 Detecting subverted and selfish nodes, xfGet()

5.8 Protecting against subverted and selfish nodes, alGet()

6 Message forwarding algorithm

6.1 Dynamic adaptive message forwarding algorithm, dynamicAdaptive()

6.2 Non-adaptive message forwarding algorithm, nonAdaptive()

7 Experimental evaluation

7.1 Controlled variables

7.2 Performance metrics

7.3 Varying n and a with fixed m a x N on Erdos-Renyi network

7.4 Varying m a x N with fixed n on Erdos-Renyi network

7.5 Varying different random networks

7.6 Non-adaptive vs. dynamic adaptive algorithm

7.7 Effectiveness of dynamic adaptive message forwarding algorithm

8 Conclusion

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation