
1 Introduction

Participatory sensing is a process of collecting and interpreting data about an event by feeding interactive data via the web or social media [28, 31]. A participatory sensor network consists of nodes, or participants, that collect data for a common project goal within its framework [1, 3]. The nodes use their personal mobile phones to sense various activities in their surrounding environment and submit the sensed data through a mobile network or social networking sites [3, 24, 25].

However, finding reliable sources in a participatory sensor network is a very challenging task due to the large, continuous volume of sensing and communication data generated by the participant nodes and the ubiquitous, real-time data sharing opportunities among nodes [2, 12, 16, 17, 20]. One conventional way to collect reliable data is to conduct self-reported surveys; however, conducting a survey is a time-consuming procedure. Data collection at scale can instead be achieved using mobile devices such as smartphones and wearable sensing devices, or through social networks [4, 10, 23, 26]. In a participatory network, the users are considered participatory sensors, and an event can be reported or detected by the users [21, 22]. The major challenge in participatory sensing is to ascertain the truthfulness of the data and the sources. The reliability of the sources is questionable because the data collection is open to a very large population [36]. The reliability of a participant (or source) denotes the probability that the participant reports correct observations. Reliability may be impaired by a lack of human attention to the task or by a deliberate intention to deceive. Without knowing the reliability of the sources, it is difficult to determine whether the reported observations or events are true [32]. Openness in data collection also raises numerous questions about the quality, credibility, integrity, and trustworthiness of the collected information [5, 9, 14, 15]. It is very challenging to determine whether an end user is correct, truthful, and trustworthy. If the nodes are reliable, the credibility of the whole system increases. Therefore, it is very important to find reliable sensing sources to detect events.

In this paper, we address the challenge of estimating node reliability in a participatory sensing system. The paper is organized as follows. Section 2 provides the background study. Section 3 describes the problem domain in detail. Section 4 presents the experimental results. Finally, Sect. 5 concludes this research work.

2 Related Work

In this section, we discuss research on estimating the reliability of nodes in a participatory sensing network. For specific kinds of data such as location data, a variety of methods have been used to verify the truthfulness of a mobile device's location [19]. The key idea is that time-stamped location certificates signed by the wireless infrastructure are issued to co-located mobile devices. A user can collect certificates and later provide them to a remote party as verifiable proof of his or her location at a specific time. The major drawback is that the applicability of such infrastructure-based approaches to mobile sensing is limited, as cooperating infrastructure may not be present in remote or hostile environments.

In the context of participatory sensing, where raw sensor data is collected and transmitted, a basic approach for ensuring the integrity of the content has been proposed in [14]; it detects whether the data produced by a sensor has been maliciously altered by the users. Trusted Platform Module (TPM) hardware [14] can be leveraged to provide this assurance. However, this method is expensive and impractical, since each user must have a predefined hardware framework.

The problem of trustworthiness has been studied for resolving multiple, conflicting pieces of information on the web [36]. The earliest work in this regard was proposed in [7, 8]. A number of recent methods [3, 18, 32, 33] also address this issue by constructing a consistency model to measure the trust in user responses in a participatory sensing environment. The key idea is that untrustworthy responses from users are more likely to differ from one another, whereas truthful responses are more likely to be consistent with one another. This broad principle is used to model the likelihood of participant reliability in social sensing with a Bayesian approach [32]. A system called Apollo [18] has been proposed in this context to find the truth from noisy social data streams. However, these methods are somewhat time consuming and do not ensure source reliability; in the case of a collaborative attack, they may fail.

In [18], the authors present a fuzzy approach that can quantify uncertain and imprecise information, such as trust, which is normally expressed by linguistic terms rather than numerical values. However, linguistic terms can produce vague results. In [28, 34], the authors present a streaming approach to the truth estimation problem in crowdsourcing applications, which is essentially a reliability-finding problem. Fact-finding algorithms solve this problem by iteratively assessing the credibility of sources and their claims in the absence of reputation scores. However, such methods operate on the entire dataset of reported observations in a batch fashion, which makes them less suited to applications where new observations arrive continuously. The problem has also been modelled as an Expectation Maximization (EM) problem to determine the odds of correctness of different observations [3]. The problem of accessing online information from various data sources is discussed in [37]. However, all of these methods suffer from the difficulty of collecting ground truth data under unavoidable circumstances. Therefore, finding the credible nodes of a detected event remains a very challenging research problem.

3 Problem Domain

In this section, we define the system model, formulate the problem, and give the details of our methodology. First, we discuss some preliminaries relevant to our research problem. Consider a participatory sensing model where a group of M participants, \(S_1, \ldots, S_M\), make individual observations about a set of N events \(C_1, \ldots, C_N\). The probability that participant \(S_i\) reports an event as true when the event is actually true is \(a_i\), and the probability that \(S_i\) reports an event as true when the event is actually false is \(b_i\). We write \(\theta = (a_i, b_i)\) for the pair of parameters of participant \(S_i\).

Fig. 1.
figure 1

Population based system model.
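As a concrete illustration of this model, the following Python sketch (our own illustration, not part of the original system; all names are ours) generates a synthetic SC matrix from hidden event truths and the reliability parameters \(a_i\), \(b_i\) and bias d:

```python
import random

def simulate_sc_matrix(M, N, a, b, d, seed=0):
    """Generate a synthetic Source-Claim (SC) matrix.

    z[j] is the hidden truth of event C_j, true with probability d.
    Participant S_i reports C_j as true with probability a[i] when
    the event is true, and with probability b[i] when it is false.
    """
    rng = random.Random(seed)
    z = [1 if rng.random() < d else 0 for _ in range(N)]
    sc = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            p = a[i] if z[j] == 1 else b[i]
            sc[i][j] = 1 if rng.random() < p else 0
    return z, sc
```

This is also how the synthetic data sets used in Sect. 4 can be produced.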

To handle this challenge, we apply a Genetic Algorithm (GA), a population-based method in which we keep a sample of candidate solutions rather than a single candidate solution in order to find the solution quickly [11]. A GA generates solutions to optimization problems using operators inspired by natural evolution, such as inheritance, mutation, selection, and crossover [11]. We call our method Population Based Reliability Estimation (PBRE); it uses a set of reliabilities for the population instead of a single reliability. We call this set of reliabilities, i.e., the set of \(\theta \), P, and we use the Genetic Algorithm to estimate the most reliable participants. \(z_j\) is the probability that the event or claim \(C_j\) is indeed authentic.

3.1 Population Based Method

In this section, we provide an outline of this method as follows.

Step 1: We initialize and build the population as follows:

1. We initialize M and N.

2. We take as input the SC matrix, or \(Source-Claim\) matrix. Each entry of the matrix is either 0 or 1: when participant \(S_i\) reports an event \(C_j\) as false, \(S_iC_j=0\), and when \(S_i\) reports \(C_j\) as true, \(S_iC_j=1\). We assume that each observation and each source in the matrix is independent of the others.

3. We initialize d, the overall bias toward an event being true (its value may range from 0 to 1).

4. Finally, we set P, the set of \(\theta \), to any values between 0 and 1.
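The initialization step can be sketched as follows; this is our own minimal illustration, where pop_size (a name we introduce) denotes the number of candidate reliability pairs kept per participant:

```python
import random

def init_population(M, pop_size, seed=0):
    """Build P: for each of the M participants, a list of pop_size
    candidate (a_i, b_i) pairs, each drawn uniformly from (0, 1)."""
    rng = random.Random(seed)
    return [[(rng.random(), rng.random()) for _ in range(pop_size)]
            for _ in range(M)]
```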

Step 2: We calculate \(z_j\), the conditional probability that the \(j^{th}\) event is true given the corresponding portion \(X_j\) of the SC matrix and the current estimate of \(\theta \):

\(z_j = P(C_j \text { is true} \mid X_j, \theta )\)
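The closed form of this conditional probability is not spelled out above; under the stated independence assumption, a standard Bayesian computation (our own sketch) is:

```python
def estimate_z(sc, a, b, d):
    """z[j] = P(C_j true | X_j, theta), assuming independent reports:
    a Bayes update of the prior d with each participant's likelihood."""
    M, N = len(sc), len(sc[0])
    z = []
    for j in range(N):
        like_true = like_false = 1.0
        for i in range(M):
            if sc[i][j] == 1:
                like_true *= a[i]
                like_false *= b[i]
            else:
                like_true *= (1 - a[i])
                like_false *= (1 - b[i])
        num = d * like_true
        den = num + (1 - d) * like_false
        z.append(num / den if den > 0 else d)
    return z
```

For instance, a single participant with \(a_1=0.9\), \(b_1=0.1\) reporting one event as true (with d = 0.5) yields \(z_1 = 0.9\).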

Step 3: We compute fitness as follows. We assess the fitness of P, the set of reliabilities, by comparing P with the best reliability. The target reliability, \(target\_a_i\), is computed as follows.

\(target\_a_i = \sum \limits _{j=1}^N \frac{S_iC_j}{N}\)

For example, in the ideal case, the probability of every event being true is \(z_j=1\). Consider 2 events and 3 participants, as illustrated in Fig. 2. Participant \(S_1\) reports event \(C_1\) as true and event \(C_2\) as false. Therefore, \(target\_a_1 = \frac{S_1 C_1 + S_1 C_2}{2} = \frac{1+0}{2} = 0.5\).
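As the worked example shows, \(target\_a_i\) is simply the row mean of the SC matrix; in Python:

```python
def target_a(sc):
    """target_a_i = (1/N) * sum_j S_iC_j: the fraction of events
    that participant S_i reported as true."""
    N = len(sc[0])
    return [sum(row) / N for row in sc]
```

For the 3-participant, 2-event example, `target_a([[1, 0], [1, 1], [1, 0]])` gives (0.5, 1.0, 0.5).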

Now, the objective is to select the best fit, or fittest, \(a_i\) from P that helps \(z_j\) converge. We take the fittest value from the initial set of values of \(a_i\) using the fitness function. We call this fittest value the fit reliability, \(fit\_a_i\).

Fig. 2.
figure 2

Calculating the most reliable \(target\_a_i\)

Now, we define two types of fitness functions Fit_Parent and Replace_Parent.

Type 1 : Fit_Parent- Fit_Parent selects \(fit\_a_i\) from the set of \(a_i\) of \(S_i\). Here, \(fit\_a_i\) is the closest value to \(target\_a_i\). We describe the computation in Fig. 3.

Fig. 3.
figure 3

Fit_Parent computation

For example, we initialize three values of \(a_1\) for participant \(S_1\), i.e., 0.3, 0.1 and 0.8. Figure 3 is an illustrative example of the Fit_Parent computation. We see that \(target\_a_1\) is 0.5; therefore, the closest \(a_1\), i.e., \(fit\_a_1\), is 0.3. Similarly, we calculate the fitness for participants \(S_2\) and \(S_3\), obtaining \(a_2=0.8\) and \(a_3=0.6\) respectively.
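Fit_Parent reduces to a nearest-value selection; a minimal sketch reproducing the Fig. 3 example:

```python
def fit_parent(candidates, target):
    """Select fit_a_i: the candidate a_i value closest to target_a_i."""
    return min(candidates, key=lambda a: abs(a - target))
```

With candidates (0.3, 0.1, 0.8) and target 0.5, this returns 0.3, as in the example above.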

Fig. 4.
figure 4

Replace_Parent computation

Type 2 : Replace_Parent- Here, instead of selecting one \(fit\_a_i\) from each participant \(S_i\)'s P, we select the full set of \(a_i\) values that is closest to the set of \(target\_a_i\) values. We give an illustrative example of Replace_Parent in Fig. 4.

For example, we initialize three values of \(a_i\) for each participant: for \(S_1\), (\(a_{11},a_{12},a_{13}\)) = (0.3, 0.1, 0.8); for \(S_2\), (\(a_{21},a_{22},a_{23}\)) = (0.8, 0.4, 0.5); and for \(S_3\), (\(a_{31},a_{32},a_{33}\)) = (0.8, 0.5, 0.9). We then form candidate sets by taking the k-th value from each \(S_i\): (\(a_{11}, a_{21}, a_{31}\)) = (0.3, 0.8, 0.8), (\(a_{12}, a_{22}, a_{32}\)) = (0.1, 0.4, 0.5), and (\(a_{13}, a_{23}, a_{33}\)) = (0.8, 0.5, 0.9). Our \(target\_a_i\) = (0.5, 1, 0.5). We find two \(fit\_a_i\)s in the first set, one in the second, and none in the third. Therefore, we take the first set as the set of \(fit\_a_i\).
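Replace_Parent can be sketched as follows. The exact closeness criterion for counting a \(fit\_a_i\) is not stated above, so we assume a tolerance threshold tol (a parameter of our own) and pick the column-wise set with the most matches:

```python
def replace_parent(population, targets, tol=0.25):
    """Select the full candidate set (the k-th a_i of every participant)
    with the most entries within tol of the corresponding target_a_i."""
    M, pop_size = len(population), len(population[0])
    best_set, best_hits = None, -1
    for k in range(pop_size):
        cand = [population[i][k] for i in range(M)]
        hits = sum(1 for a, t in zip(cand, targets) if abs(a - t) <= tol)
        if hits > best_hits:
            best_set, best_hits = cand, hits
    return best_set
```

On the example above, with targets (0.5, 1, 0.5), the first set (0.3, 0.8, 0.8) wins with two matches.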

Step 4: Breeding

Now, the objective is to generate a new \(child_{\theta }\) from \(parent_{\theta }\). We choose a recombination technique [11] as the breeding technique. The new values are two children, \(anew_i\) and \(bnew_i\), where

\(anew_i= \alpha a_i + (1 - \alpha )b_i\),

\(bnew_i= \beta b_i + (1 - \beta )a_i\),

where \(\alpha \) is a random value between 0 and 1, and

\(\beta \) is a random value between 0 and 1.
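The recombination step in code (a sketch; \(\alpha \) and \(\beta \) are drawn fresh for each breeding):

```python
import random

def breed(a_i, b_i, rng=None):
    """Intermediate recombination: produce two children from the
    parent pair (a_i, b_i) using random blend factors alpha, beta."""
    rng = rng or random.Random()
    alpha, beta = rng.random(), rng.random()
    anew = alpha * a_i + (1 - alpha) * b_i
    bnew = beta * b_i + (1 - beta) * a_i
    return anew, bnew
```

Both children are convex combinations of the parents, so they always lie between \(a_i\) and \(b_i\).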

Step 5: Joining

We form the next-generation parents from the new children. The joining equations are given as follows.

\(a_i = anew_i\)

\(b_i = bnew_i\)

Step 6: Error Percentage of Participant Reliability

We calculate the error percentage of participant reliability by dividing the number of reliable nodes that converged by the total number of reliable nodes.
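Putting Steps 1–5 together, a compact, self-contained sketch of one PBRE run (our own simplification, using the Fit_Parent fitness and joining children back into the population) might look like this:

```python
import random

def pbre(sc, d=0.5, pop_size=4, iters=20, seed=0):
    """Minimal PBRE loop sketch: initialize a population of (a_i, b_i)
    candidates per participant, repeatedly select the fittest pair
    (closest a_i to target_a_i), breed two children by recombination,
    and join them into the next generation's population."""
    rng = random.Random(seed)
    M, N = len(sc), len(sc[0])
    target = [sum(row) / N for row in sc]            # Step 3 target
    pop = [[(rng.random(), rng.random()) for _ in range(pop_size)]
           for _ in range(M)]                        # Step 1
    a = [0.0] * M
    for _ in range(iters):
        for i in range(M):
            # Step 3 (Fit_Parent): candidate closest to target_a_i
            a[i], b_i = min(pop[i], key=lambda ab: abs(ab[0] - target[i]))
            # Step 4 (Breeding): intermediate recombination
            alpha, beta = rng.random(), rng.random()
            anew = alpha * a[i] + (1 - alpha) * b_i
            bnew = beta * b_i + (1 - beta) * a[i]
            # Step 5 (Joining): children enter the next generation
            pop[i].append((anew, bnew))
    return a
```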

The flow chart in Fig. 5 shows the summary of the procedure.

Fig. 5.
figure 5

The summary of the procedure.

Now, we provide the formal algorithms of PBRE from Algorithms 1 to 7.

figure a
figure b
figure c
figure d
figure e
figure f
figure g

4 Experimental Results

In this section, we present experimental results showing the effectiveness of the PBRE method. We also compare our findings with those of a relevant algorithm, Expectation Maximization [35]. The simulation of PBRE runs on a 1.58 GHz Intel Core 2 Duo processor with 2 GB of memory. The simulation uses synthetic data sets in which the SC matrix is generated randomly. The SC matrix contains entries of 1 and 0; a real data set would carry the same property. The performance metrics used to evaluate the methods are described as follows:

1. The Error Percentage of Participant's Reliability denotes the error in estimating a participant's reliability with respect to a converged event z.

2. The Convergence Rate denotes how quickly a participant can report the correct event. It is computed as the participant's reliability divided by the total number of iterations needed to converge.

Table 1 lists the simulation parameters used for testing.

Table 1. Simulation parameters

We carry out simulation experiments to evaluate the performance of the proposed PBRE scheme, in terms of the accuracy of estimating the probability that a participant is right or that a measured variable is true, against the existing reference Expectation Maximization (EM) method. We take the average of ten simulation runs; the variance we found is negligible. We consider two types of scenarios: a sparse network, where the number of participant nodes M is low, and a dense network, where M is high.
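For the EM baseline, we follow the standard truth-finding formulation (e.g., [35]); the sketch below is our own reconstruction, not the original code:

```python
def em_step(sc, a, b, d):
    """One EM iteration for truth finding.  E-step: z[j] = P(C_j true
    | X_j, theta).  M-step: re-estimate each (a_i, b_i) from the soft
    event labels."""
    M, N = len(sc), len(sc[0])
    z = []
    for j in range(N):
        lt = lf = 1.0
        for i in range(M):
            lt *= a[i] if sc[i][j] else 1 - a[i]
            lf *= b[i] if sc[i][j] else 1 - b[i]
        num = d * lt
        z.append(num / (num + (1 - d) * lf))
    zt = sum(z)      # expected number of true events
    zf = N - zt      # expected number of false events
    for i in range(M):
        if zt > 0:
            a[i] = sum(z[j] for j in range(N) if sc[i][j]) / zt
        if zf > 0:
            b[i] = sum(1 - z[j] for j in range(N) if sc[i][j]) / zf
    return z, a, b
```

Iterating `em_step` to convergence yields the EM estimates that PBRE is compared against below.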

We calculated the error percentage while varying different parameter values. The details are given as follows.

4.1 For Variable Number of Participants

We compare the estimation accuracy of PBRE (Fit_Parent and Replace_Parent) and the Expectation Maximization (EM) scheme by varying the number of participants in the system.

Fig. 6.
figure 6

Error estimation for participant number

In Fig. 6(a), the number of participants is varied from 30 to 90, with two events and two sets of reliability per person. We observe that PBRE has a lower estimation error in participant reliability than the EM scheme. Of the two PBRE schemes, Fit_Parent has the much lower estimation error; this is because Fit_Parent takes only the fit values, whereas Replace_Parent takes the fit set of values.

We then run experiments with the number of participants increased from 300 to 900, with 15 reliabilities in the set per person and the same number of events as before, i.e., two. In Fig. 6(b), we observe that the error percentage for Fit_Parent decreases to 1 %, compared to 10–15 % in Fig. 6(a) for participants with 4 sets of reliability per person. The reason for this decline is the increased number of reliabilities in the set.

4.2 For Variable Number of Events

Here, we compare the results by varying the number of events from 2 to 10 for two cases.

Fig. 7.
figure 7

Error estimation for the events

In Fig. 7(a), experiments are run for a sparse network of 50 participants, 2–10 events, and 4 sets of reliability per person. Here also, PBRE shows better results than EM: when the number of events increases, \(target\_a_i\) decreases (Line 6, Procedure PBRE), so there are more matches of \(a_i\) as \(fit\_a_i\) to \(target\_a_i\). We repeat the experiments with the number of participants increased to 600 in Fig. 7(b) and observe the same trend, for the same reason.

4.3 Convergence Rate

We study convergence versus estimation accuracy for the PBRE and EM schemes by varying the number of participants from 30 to 80, with the number of events fixed at 2 and the set of reliabilities per person at 4. In Fig. 8, we observe that the convergence rate for PBRE is somewhat lower than that of EM: since PBRE has a lower error percentage of reliability than EM, it iterates more than EM to converge. Here, the convergence rates for Fit_Parent, Replace_Parent and EM are 0–2, 3.5–4.5 and 8–10 respectively.

Fig. 8.
figure 8

Convergence rate.

We then examine the results by varying the number of events from 2 to 10, with the number of participants fixed at 50 and the set of reliabilities per person at 4. In Fig. 8(b), we find that the convergence rate for PBRE is lower than that of EM, for the same reason as in Fig. 8(a). Here, the convergence rates for Fit_Parent, Replace_Parent and EM are 0–1, 2–4 and 7.5–10 respectively. We also observe that the rate in Fig. 8(a) is lower than the rate in Fig. 8(b) for an increased number of events, which is natural: the more events there are, the more time it takes to converge.

5 Conclusion

In this paper, we study the challenge of finding node reliability in a participatory sensor network. We propose Population Based Reliability Estimation (PBRE), in which we compute the conditional probability of an event being true given a set of reliabilities and use a Genetic Algorithm to estimate the reliability by iterating fitness assessment, breeding, and joining. We vary the number of participants and the number of events. The metrics for performance measurement are the error percentage of participants' reliable reports and the convergence rate. We compare our results with those of another relevant and popular truth-finding method, Expectation Maximization, and find that our approach provides better results.

In future work, we would like to develop a hybrid approach to obtain better results. In addition, we assume a bias that the ground truth is true with more than 50 % probability; we would like to explore the impact of this uncertainty on our method.