1 Introduction

Recently, the integration of sensors and embedded computing devices has triggered a novel sensing paradigm, namely mobile crowdsensing, which allows individuals to acquire sensory data from their surroundings. Mobile sensing is increasingly applied to collecting information on road conditions, environmental pollution, and commodity price supervision [1,2,3]. Likewise, the onboard units (OBUs) installed in vehicles offer vehicular crowdsensing services in intelligent transportation. With sensing devices, vehicles can send basic driving information, collect traffic and road conditions, and upload them to a server for data aggregation and publishing. Traffic management departments use these data to provide traffic and road condition information and route planning services [4,5,6,7]. This model of acquiring raw data through vehicular crowdsensing lowers various economic costs. Despite the significant convenience offered by vehicular crowdsensing, the paradigm still faces two key challenges, the trustworthiness of sensing data and privacy preservation, because crowdsensing is an “open” system in which any vehicle can join the sensing activities [8,9,10].

In a classic vehicular crowdsensing application, a large amount of information, such as the real-time locations of vehicles and event-reports, is collected and analyzed; however, the utility of traffic reports depends on their correctness. Malicious vehicles may generate event-reports that conflict with the actual situation and upload them to the local roadside unit (RSU) or broadcast them to other vehicles. If these false reports cannot be evaluated and filtered in time, the crowdsensing system is compromised [8], as illustrated in Fig. 1.

Figure 1. True event-report vs. false event-report in vehicular crowdsensing.

As shown in Fig. 1, a vehicle detects an accident ahead with its sensors and reports it to the local RSU. The green symbol in the figure represents the real event-report generated by honest vehicles, and the red one represents the false event-report generated by malicious vehicles. If the trustworthiness of event-reports cannot be properly evaluated, the RSU or the traffic management department may be misled into publishing wrong road condition information, so that the traffic system is effectively hijacked by malicious vehicles or false event-reports, which may cause serious traffic jams. Further, if these data are uploaded to the cloud server for evaluation, the best time for guiding the traffic flow may be missed. In essence, accurate and timely assessment of event-reports is the foundation for the security of a vehicular crowdsensing system. Second, when a vehicle shares sensing reports with space-time attributes, it may leak a large amount of the driver’s private information, e.g., the user’s identity, trajectory, health status, and behavior patterns. Therefore, privacy protection in the vehicular crowdsensing scenario is also a research focus [11, 12]. However, if the system provides complete anonymity, the credibility of reports is difficult to guarantee at the same time. Finding a solution that achieves both data trust and privacy protection has therefore become a key challenge. In the field of location-based services (LBS), a large body of research on privacy-preserving data collection has been conducted [13,14,15]. This paper focuses on evaluating the truth-value of event-reports rather than analyzing how vehicles form them. Vehicles gather data about the surrounding environment through embedded sensors (such as the OBU and cameras) and then make the corresponding decisions. For example, an individual vehicle can determine whether there are potholes on the road surface from changes in speed and vibration.

Furthermore, the abundance of sensing data places a heavy load on data transmission and processing. It is worth mentioning that vehicle-generated data are locally relevant in vehicular crowdsensing applications, that is, the sensing data exhibit spatial-temporal features. Information about a traffic jam may be valid for only half an hour and useful only to nearby vehicles. Therefore, sending all reports to a remote cloud server for processing wastes network bandwidth and introduces response delay. With the emergence of fog computing, network edge devices are employed to perform a large share of storage and communication. Hence, it is no longer necessary to upload all relevant data to the cloud server: the sensing data can be collected, stored, analyzed, and processed locally on fog nodes to support local services. In brief, fog nodes not only save unnecessary communication bandwidth but also support location-aware data management [8, 16]. To reduce the burden of data transmission and computation, fog networking is introduced into the TPSense framework. Fog networking is an architecture that provides storage, communication, and other functions between terminal devices and the network. Computing functions are pushed to the edge of the network to reduce the delay of data transmission. In a fog network, large numbers of mobile terminals communicate and collaborate in a self-organized manner through fog nodes, and data storage and computation are completed on the fog nodes near the terminals rather than in a large data center. Therefore, RSUs are used as fog nodes: they act as intermediaries between the vehicles and the server, aggregate and analyze the data sensed by vehicles, and provide the corresponding services to vehicles.

To address the challenges mentioned above, we design a novel fog-assisted vehicular crowdsensing framework with event-report trustworthiness evaluation and privacy preservation (TPSense), which treats each RSU as a fog node. The RSU has powerful computing and storage capabilities. It collects the event-reports generated by vehicles, assesses their trustworthiness, aggregates the data, broadcasts the results to surrounding vehicles, and uploads them to the cloud server. Meanwhile, TPSense uses blind signatures to protect vehicle privacy. The major contributions of the paper are as follows.

1) We mathematically formulate the event-report trustworthiness evaluation problem (TEP) and convert it into a maximum likelihood estimation problem. A TE-EM algorithm based on Expectation-Maximization then solves the TEP efficiently.

2) We propose a novel fog-assisted vehicular crowdsensing framework with two goals: event-report trustworthiness evaluation and vehicle privacy preservation.

3) We run a set of simulation experiments on synthetic data and real data to evaluate the effectiveness of our proposals and compare their performance with baseline algorithms.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the TPSense framework, including the system model and the threat model. Section 4 presents the event-information trustworthiness scheme. Section 5 reports the experiments and analyzes the results, and Section 6 concludes the paper and discusses future work.

2 Related Work

2.1 Data Trustworthiness and Privacy Preservation in VANETs

As mentioned, data trustworthiness and privacy preservation are two important challenges in VANETs. Researchers have proposed different data trustworthiness evaluation schemes for both Vehicle-to-Vehicle and Vehicle-to-Infrastructure communication [17]. Most of these solutions are based on the reputation model proposed in [18]. An event report with the vehicle’s reputation score is sent to a receiving entity (a vehicle, an RSU, or a cloud server), and the latter confirms the truth value of the report. The reputation of the vehicle is continuously updated. Li et al. [19] devised a reputation-based announcement scheme to assess the reliability of information, in which the behaviors of vehicles are collected and the accumulated feedback is used to establish the reliability of a vehicle. Similarly, a long-term reputation system based on vehicles’ daily behavior patterns is proposed in [20]. Chen et al. [17] advocate a beacon-based trust management system to guard against malicious messages sent by internal malicious nodes; the system also improves location privacy in VANETs. However, Raya et al. [21] argue that data-oriented trust may better suit VANETs. In their scheme, the trust of a node is regarded as only one parameter of data trustworthiness, and data trustworthiness varies with the environment. This scheme, however, does not explore privacy preservation. Tamper et al. [22] put forward a dynamic approach to constructing groups of trusted vehicles.

Many solutions in VANETs require a trusted third party (TTP) to update the certificate revocation list (CRL) frequently and distribute it to the co-located RSUs [23], so that false messages are not circulated in VANETs. However, CRL-based authentication becomes increasingly demanding as the list grows and takes longer to process. In addition, encryption and signature algorithms are combined in various schemes to achieve privacy preservation. Nevertheless, more effort is needed to meet the requirement of efficient computation.

2.2 Data Trustworthiness and Privacy Preservation in Participatory Sensing

To enhance the trustworthiness of sensing data in the participatory sensing (PS) paradigm, researchers have attached great importance to evaluating sensing data and managing the reputation of participants. Huang et al. [24] and Yu et al. [25] proposed reputation systems that use the Gompertz function to calculate a participant’s reputation score. The Gompertz function model maps participants’ past cooperation levels to reputation scores, where the cooperation level is set by a module that runs an outlier detection algorithm. The main drawback of these models is that they do not consider the uncertainty factor in trust assessment [26].

A fuzzy inference-based reputation model has been used to calculate trust scores from the acquired evidence. Each participant is assumed to belong to a social network, and a social graph describes the relationships among participants.

The trustworthiness of a participant’s contributed data, and hence the trust placed in the participant, is derived from factors such as expertise, location, and social ties. However, participants’ privacy in the social network (such as location and friendship relations) is no longer protected, which participants do not accept. In addition, several works [27, 28] have employed feedback-based reputation models to calculate participants’ trust scores in PS. Some well-known crowdsensing applications, such as Foursquare and Waze, also use a rating feedback mechanism that allows service consumers to give positive, negative, or neutral evaluations of products. The advantages of this mechanism are that it is simple, fast, and inexpensive, which is the essence of the PS paradigm.

The schemes above give little consideration to the privacy protection of participants. Huang [29] further considers the privacy needs of participants on the basis of [24]: each participant is assigned several pseudonyms, and a trusted third party is relied upon to change pseudonyms. Similarly, Christin et al. [30] used blind signatures and cloaking techniques to protect privacy. Wang et al. [31] proposed ARTSense to tackle the issue of trust without identity. To address time delay, Ma et al. [32] put forward two privacy-preserving reputation maintenance schemes.

Unlike the data trustworthiness evaluation schemes based on node reputation models described above, this work assumes no prior knowledge of the reliability of the vehicles participating in sensing, and it requires neither a stable network topology nor reputation values for the vehicles.

3 TPSense Framework

Our framework differs from previous research in three respects. First, we focus on estimating the binary value of sensing reports. Second, the report trustworthiness evaluation algorithm is executed on the RSU and does not need to manage and continuously update the reputation scores of all vehicles. Third, pseudonyms replace the real identities of vehicles to protect their privacy. This section describes the system model, the threat model, and the corresponding assumptions of the framework.

3.1 System Model

TPSense is composed of five entities as given in Fig. 2.

Figure 2. System architecture.

Trusted Authority (TA): To ensure system security and vehicles’ privacy, we take the TA to be an authoritative government traffic management department. The initial parameters and cryptographic keys set by the TA help vehicles and RSUs achieve stronger privacy protection.

Service providers (SP) and cloud servers (CS) are merged to provide end users with various services, including data storage, data processing, and data publishing (e.g., answering traffic queries from different vehicles).

Roadside Units (RSUs) are operated by the service providers and placed along the roadside. Each RSU is viewed as a fog node with computation capabilities. Equipped with wireless devices, RSUs collect driving reports from crowd vehicles, authenticate vehicles’ identities, verify data trustworthiness, filter the reports, and then upload local traffic conditions to the service providers and cloud servers. Processing data on the RSU shortens the data collection process, so RSUs can feed traffic conditions back to vehicles faster, and it is not necessary to upload all sensing reports to the cloud server for processing. In addition, an RSU can respond to drivers’ queries about road conditions or broadcast the correct road condition information. RSUs, which have no counterpart in general mobile crowdsensing, are a major component of the TPSense framework and carry out data trustworthiness evaluation and privacy protection.

Vehicle nodes (VNs) are equipped with onboard units (OBUs), which enable direct communication with other vehicles and RSUs through DSRC or 5G. A vehicle may periodically broadcast its driving information or occasionally send a traffic report to the local RSUs.

3.2 Threat Model

Both external and internal attackers may threaten the security of a vehicular crowdsensing system. In this paper we focus on internal attacks on data trustworthiness. Specifically, malicious nodes in the system may generate forged data and submit them to the RSU or the server for their own benefit (for example, to gain credits for contributing to a crowdsensing task). Internal attacks mainly come from malicious nodes that behave in conflict with the real situation: the attacker may intercept normal data transmissions and tamper with, counterfeit, or modify data so that they conflict with the real traffic scene.

The paper makes two assumptions.

1) The RSU, SP, and cloud server are trusted and cannot be compromised. The reports generated by vehicles do not need to be transmitted to the cloud server; in other words, the filtering of traffic event reports is done on the RSU.

2) Most vehicles are honest and generate event-reports faithfully. The following are outside the scope of this paper: the method by which a vehicle generates a data report, and collusion attacks carried out by several malicious nodes.

Privacy threats in this system cannot be ignored. The RSUs and CS are honest-but-curious: they follow the protocol, but they may learn drivers’ information, including identities, trajectories, and driving behavior patterns. Therefore, while ensuring the trustworthiness of vehicles’ reports, users’ privacy protection should also be considered. That is, the identity of a vehicle should remain anonymous to other vehicles and to RSUs.

4 Enabling Event-Information Trustworthiness Scheme for Vehicle Crowdsensing

Recently, several trust schemes for crowdsensing [31] have also been devised. Most of these schemes evaluate the trust of nodes and sensing data based on nodes’ continuously updated reputation scores. This section focuses on the implementation of the sensing report trustworthiness evaluation module (SrTEM).

4.1 Sensing Report Trustworthiness Evaluation Module

As a core part of the whole vehicular sensing system, SrTEM provides the basis for reaching our final objective. The goal of SrTEM is to judge the trustworthiness of event-reports. The content of the sensing data in a report depends on the application; it may be traffic conditions, air quality, noise, and so on. Taking a vehicular crowdsensing traffic application as an example, the RSU collects the traffic and road condition information (such as vehicle collision or road surface condition reports) uploaded by the vehicles within its communication range during a specified period. A trustworthiness evaluation algorithm then identifies the false reports from selfish or malicious vehicles and filters them out on the RSU.

A. Problem Definition and Formalization

We take an event \( e_j \) (\( e_j \in E \)) as the object being monitored, and a vehicle \( v_i \) (\( v_i \in V \)) generates a sensing report \( R_j \) about the event in the vehicular crowdsensing system. The event can be a traffic event, potholes on a specific road, etc.

Report \( R_j \) is presumed to be binary in this paper: \( R_j \in \{T, F\} \), where T stands for True (e.g., “There is a traffic jam ahead at the specific location”) and F stands for False (e.g., “There is no traffic jam at the specific location”). We are concerned only with such binary statements in vehicular crowdsensing.

A matrix VR, named the vehicle-report matrix, records the reports from all vehicles V about the events E. The element \( VR_{i,j} = v \) indicates that vehicle \( V_i \) declares that the value of \( R_j \) is v. A vehicle may also generate no report about an event, in which case the corresponding element of VR is assigned the value “Unknown” (U for short). Hence, each element \( VR_{i,j} \) of the vehicle-report matrix may take the value T, F, or U.
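As an illustration, the vehicle-report matrix can be held as a small integer array. The following is a minimal sketch rather than the implementation used in the paper; the integer encoding (1 for T, 2 for F, 0 for U), the function name `build_vr`, and the `(vehicle, event, value)` tuple format are assumptions made here for the example.

```python
import numpy as np

# Assumed encoding (not from the paper): 0 = U (no report), 1 = T, 2 = F.
U, T, F = 0, 1, 2

def build_vr(reports, num_vehicles, num_events):
    """Build the vehicle-report matrix VR from (vehicle, event, value) tuples."""
    vr = np.full((num_vehicles, num_events), U, dtype=int)
    for i, j, value in reports:
        vr[i, j] = value  # vehicle i declares that report R_j has this value
    return vr

# Three vehicles, two events; vehicles 1 and 2 say nothing about event 1.
reports = [(0, 0, T), (1, 0, T), (2, 0, F), (0, 1, F)]
vr = build_vr(reports, num_vehicles=3, num_events=2)
```

Each row of `vr` is one vehicle and each column is one event, so a column collects all claims about the same event.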

First, some definitions and notations are introduced. \( X^v \) denotes that X has value v. The reliability of vehicle \( V_i \) is \( t_i \), the probability that a report is true given that vehicle \( V_i \) reports it. \( t_i \) is defined as:

$$ {t}_i=P\left({R}_j^v|{VR}_{i,j}^v\right) $$
(1)

We further define \( {T}_i^v \) as the probability that vehicle \( V_i \) reports the value of \( R_j \) correctly. Similarly, \( {F}_i^v \) denotes the probability of an incorrect report by \( V_i \), that is, the probability that \( V_i \) reports that \( R_j \) has value \( \overline{v} \) when its value is v. Here \( \overline{v} \) is the complement of v. Formally, \( {T}_i^v \) and \( {F}_i^v \) are defined as follows:

$$ {T}_i^v=P\left({\mathrm{VR}}_{i,j}^v|{R}_j^v\right),\kern0.5em {F}_i^v=P\left({\mathrm{VR}}_{i,j}^{\overline{v}}|{R}_j^v\right) $$
(2)

Note that a vehicle may not assert a report at all, so \( {T}_i^v+{F}_i^v\le 1 \). Therefore, we get:

$$ 1-{T}_i^v-{F}_i^v=P\left({\mathrm{VR}}_{i,j}^U|{R}_j^v\right) $$
(3)

Let \( {p}_i^v \) be the probability that vehicle \( V_i \) generates a report with value v, and let \( {p}_i^{\overline{v}} \) be the probability that \( V_i \) generates a report with value \( \overline{v} \) instead of v. Let \( d^v \) represent the prior probability that \( R_j \) has value v.

Substituting \( t_i \) into the definitions of \( {T}_i^v \) and \( {F}_i^v \) and applying Bayes’ theorem, with \( 1-t_i=P\left({R}_j^v|{VR}_{i,j}^{\overline{v}}\right) \) for binary reports, gives the relations between the terms:

$$ {T}_i^v=\frac{t_i\times {p}_i^v}{d^v},\kern0.5em {F}_i^v=\frac{\left(1-{t}_i\right)\times {p}_i^{\overline{v}}}{d^v} $$
(4)

Table 1 summarizes the introduced notations.

Table 1 The set of notations.

The trustworthiness evaluation of reports is therefore treated as a maximum likelihood estimation problem: given the vehicle-report matrix VR, how can the reliability of each vehicle and the trustworthiness of every report be calculated efficiently?

B. Trustworthiness Evaluation by Maximum Likelihood Estimation

In this part, the Expectation-Maximization (EM) algorithm is employed to solve the maximum likelihood estimation problem posed in the preceding section. For simplicity, we assume that all vehicles generate reports independently in the vehicular crowdsensing scenario. The proposed algorithm is named TE-EM.

EM is a common machine-learning algorithm for obtaining maximum likelihood estimates of parameters [33]. The most difficult part of using EM is formulating the problem mathematically. First, the likelihood function L(θ; X, Y) is defined, in which θ is the vector of unknown parameters, X is the observed data set, and Y is the vector of latent variables. EM then obtains the maximum likelihood estimate of θ, together with the posterior distribution of Y, by iterating the E-step and the M-step:

$$ \text{E-step}:\quad Q\left(\theta |{\theta}^{(n)}\right)={E}_{Y\mid X,{\theta}^{(n)}}\left[\log L\left(\theta; X,Y\right)\right] $$
(5)
$$ \text{M-step}:\quad {\theta}^{\left(n+1\right)}=\arg {\max}_{\theta }Q\left(\theta |{\theta}^{(n)}\right) $$
(6)

The EM model applies to the given crowdsensing problem. A latent variable \( y_j \) is introduced for each report to denote the true value of the corresponding event. The vehicle-report matrix VR is the observed data X, and the parameter vector θ is:

$$ \theta =\left\{\left({T}_i^v,{F}_i^v,{d}_v\right)\ \middle|\ \forall i\in V,\ v\in \left\{T,F\right\}\right\} $$

Then the likelihood function L(θ; X, Y) is acquired as follows:

$$ L\left(\theta; X,Y\right)=P\left(X,Y|\theta \right)=\prod \limits_{j=1}^NP\left({y}_j\right)\times P\left({X}_j|{y}_j,\theta \right) $$
(7)

where N = |R| is the number of event variables, and \( X_j \) denotes all the reports from the vehicles about the j-th event.

Therefore, we can deduce the E-step as:

$$ \begin{aligned} Q\left(\theta |{\theta}^{(n)}\right)=\sum \limits_{j=1}^N\Big\{ & {Y}_1\left(n,j\right)\times \left[\sum \limits_{i=1}^M\left({VR}_{i,j}^1\log {T}_i^1+{VR}_{i,j}^2\log {F}_i^1+\left(1-{VR}_{i,j}^1-{VR}_{i,j}^2\right)\log \left(1-{T}_i^1-{F}_i^1\right)\right)+\log {d}_1\right] \\ & +\left(1-{Y}_1\left(n,j\right)\right)\times \left[\sum \limits_{i=1}^M\left({VR}_{i,j}^2\log {T}_i^2+{VR}_{i,j}^1\log {F}_i^2+\left(1-{VR}_{i,j}^1-{VR}_{i,j}^2\right)\log \left(1-{T}_i^2-{F}_i^2\right)\right)+\log \left(1-{d}_1\right)\right]\Big\} \end{aligned} $$
(8)

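where \( {Y}_1\left(n,j\right)=p\left({y}_j=1|{X}_j,{\theta}^{(n)}\right) \) is the conditional probability that \( R_j \) has value k = 1, given the j-th column \( X_j \) of the VR matrix (the reports about the j-th event) and the current estimate \( {\theta}^{(n)} \). Here the values T and F are indexed as 1 and 2, \( {VR}_{i,j}^k \) equals 1 if vehicle \( V_i \) reports value k for the j-th event and 0 otherwise, and \( d_k \) is the prior probability that a report has value k. Note that \( {Y}_2\left(n,j\right)=1-{Y}_1\left(n,j\right) \) and \( d_2=1-d_1 \).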
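For reference, under the independence assumption and the likelihood in Eq. (7), this posterior follows from Bayes’ theorem. One way to write it is sketched below, where \( A_j \) and \( B_j \) are shorthand introduced here (they are not part of the original notation) for the likelihood of the j-th column of VR when the event has value 1 and value 2, respectively, all parameters are taken at their current estimates \( {\theta}^{(n)} \), and \( {VR}_{i,j}^k \) is read as 1 if vehicle \( V_i \) reports value k for the j-th event and 0 otherwise:

$$ \begin{aligned} A_j &=\prod \limits_{i=1}^M{\left({T}_i^1\right)}^{{VR}_{i,j}^1}{\left({F}_i^1\right)}^{{VR}_{i,j}^2}{\left(1-{T}_i^1-{F}_i^1\right)}^{1-{VR}_{i,j}^1-{VR}_{i,j}^2} \\ B_j &=\prod \limits_{i=1}^M{\left({T}_i^2\right)}^{{VR}_{i,j}^2}{\left({F}_i^2\right)}^{{VR}_{i,j}^1}{\left(1-{T}_i^2-{F}_i^2\right)}^{1-{VR}_{i,j}^1-{VR}_{i,j}^2} \\ {Y}_1\left(n,j\right) &=\frac{d_1{A}_j}{d_1{A}_j+\left(1-{d}_1\right){B}_j} \end{aligned} $$

In practice the products are usually evaluated as sums of logarithms to avoid numerical underflow when many vehicles report on the same event.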

For the M-step, to obtain the θ that maximizes \( Q\left(\theta |{\theta}^{(n)}\right) \), we set the partial derivatives \( \frac{\partial Q}{\partial {T}_i^k}=0 \), \( \frac{\partial Q}{\partial {F}_i^k}=0 \) and \( \frac{\partial Q}{\partial {d}_k}=0 \), which yields the optimal \( {T_i^k}^{\ast } \), \( {F_i^k}^{\ast } \) and \( {d_k}^{\ast } \):

$$ \begin{aligned} {T_i^1}^{\left(n+1\right)}={T_i^1}^{\ast } &=\frac{\sum_{j\in {SJ}_i^1}{Y}_1\left(n,j\right)}{\sum_{j=1}^N{Y}_1\left(n,j\right)} \\ {F_i^1}^{\left(n+1\right)}={F_i^1}^{\ast } &=\frac{\sum_{j\in {SJ}_i^2}{Y}_1\left(n,j\right)}{\sum_{j=1}^N{Y}_1\left(n,j\right)} \\ {T_i^2}^{\left(n+1\right)}={T_i^2}^{\ast } &=\frac{K_i^2-{\sum}_{j\in {SJ}_i^2}{Y}_1\left(n,j\right)}{N-{\sum}_{j=1}^N{Y}_1\left(n,j\right)} \\ {F_i^2}^{\left(n+1\right)}={F_i^2}^{\ast } &=\frac{K_i^1-{\sum}_{j\in {SJ}_i^1}{Y}_1\left(n,j\right)}{N-{\sum}_{j=1}^N{Y}_1\left(n,j\right)} \\ {d_1}^{\left(n+1\right)}={d_1}^{\ast } &=\frac{\sum_{j=1}^N{Y}_1\left(n,j\right)}{N} \end{aligned} $$
(9)

where \( {SJ}_i^1 \) and \( {SJ}_i^2 \) are the sets of events for which vehicle \( V_i \) reports value 1 (true) and value 2 (false), respectively, and \( {K}_i^1 \) and \( {K}_i^2 \) are the sizes of these two sets.
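The iteration can be illustrated with a short NumPy sketch. This is not the authors’ implementation: it is a minimal example whose updates are obtained by maximizing the Q function of Eq. (8) under the definitions in Eq. (2). The function name `te_em`, the integer encoding of VR (1 for T, 2 for F, 0 for U), the smoothed-vote initialization, and the synthetic demo data are all assumptions made here.

```python
import numpy as np

def te_em(vr, n_iter=100, eps=1e-9):
    """EM sketch on a vehicle-report matrix vr (M x N): 1 = T, 2 = F, 0 = U."""
    r1 = (vr == 1).astype(float)   # indicator: vehicle i reports T for event j
    r2 = (vr == 2).astype(float)   # indicator: vehicle i reports F for event j
    ru = 1.0 - r1 - r2             # indicator: vehicle i is silent on event j

    # Assumed initialization: a smoothed vote per event.
    y1 = (r1.sum(axis=0) + 1.0) / (r1.sum(axis=0) + r2.sum(axis=0) + 2.0)

    for _ in range(n_iter):
        # M-step: expected report counts divided by expected event counts.
        s1 = y1.sum() + eps
        s2 = (1.0 - y1).sum() + eps
        t1 = np.clip(r1 @ y1 / s1, eps, 1 - eps)         # P(report T | event T)
        f1 = np.clip(r2 @ y1 / s1, eps, 1 - eps)         # P(report F | event T)
        t2 = np.clip(r2 @ (1 - y1) / s2, eps, 1 - eps)   # P(report F | event F)
        f2 = np.clip(r1 @ (1 - y1) / s2, eps, 1 - eps)   # P(report T | event F)
        u1 = np.clip(1 - t1 - f1, eps, 1.0)              # P(silent | event T)
        u2 = np.clip(1 - t2 - f2, eps, 1.0)              # P(silent | event F)
        d1 = np.clip(y1.mean(), eps, 1 - eps)            # prior that an event is T

        # E-step: posterior that each event is T, computed in log space.
        log_a = np.log(d1) + np.log(t1) @ r1 + np.log(f1) @ r2 + np.log(u1) @ ru
        log_b = np.log(1 - d1) + np.log(t2) @ r2 + np.log(f2) @ r1 + np.log(u2) @ ru
        y1 = 1.0 / (1.0 + np.exp(np.clip(log_b - log_a, -500, 500)))

    return y1, (t1, f1, t2, f2, d1)

# Synthetic demo (assumed setup): 24 mostly honest and 6 mostly dishonest vehicles.
rng = np.random.default_rng(0)
m, n = 30, 200
truth = rng.random(n) < 0.5
reliability = np.where(np.arange(m) < 24, 0.85, 0.3)
speaks = rng.random((m, n)) < 0.5
correct = rng.random((m, n)) < reliability[:, None]
claims_true = np.where(correct, truth[None, :], ~truth[None, :])
vr = np.where(speaks, np.where(claims_true, 1, 2), 0)

y1, params = te_em(vr)
accuracy = np.mean((y1 > 0.5) == truth)
```

A report is then accepted as true when its posterior `y1[j]` exceeds 0.5, and the per-vehicle parameters in `params` indicate how often each vehicle agrees with the inferred truth.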

4.2 User’s Privacy Preservation

The TPSense framework uses anonymity to protect users’ privacy. Specifically, a vehicle’s real identity should not appear in the sensing data report. In this way, neither the RSUs nor the cloud server can link a report to a particular vehicle. It is also necessary to change a vehicle’s pseudonym each time the vehicle submits a report; otherwise, the real identity of the vehicle may still be revealed by analyzing its trajectory. To solve these problems, our scheme uses blind signatures [34] to generate a Blinded ID (BID), similar to a pseudonym, for each vehicle user. The process of using a blind signature to generate pseudonyms for vehicles is as follows:

1) Message blinding: A vehicle randomly generates a blinding factor and uses the signer’s public key and the blinding factor to process the message M. The vehicle thus obtains the blinded message M’ and sends it to the signer.

2) Blind message signing: When the signer receives the blinded message M’ from the vehicle node, it signs the message without learning its content. The signer applies its private key to M’ to obtain the signature SIG(M’) and sends it back to the vehicle node.

3) Signature recovery: After receiving the blindly signed message SIG(M’), the vehicle removes the blinding factor and obtains the signature SIG(M) from SIG(M’).

4) Pseudonym generation: The vehicle generates a pseudonym by combining SIG(M) with the temporary public key.
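The four steps above can be sketched with a textbook RSA blind signature. This is only an illustration under several assumptions: the paper does not specify the signature scheme beyond citing [34], so RSA is assumed here; the key is built from two small Mersenne primes and is not secure; and hashing SIG(M) together with the temporary public key is a hypothetical way of “combining” them. All function names are introduced for this example.

```python
import hashlib
import math
import secrets

# Toy signer key: textbook RSA over two known Mersenne primes (insecure, demo only).
P, Q = 2**31 - 1, 2**61 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # signer's private exponent (needs Python 3.8+)

def blind(m, e, n):
    """Step 1: hide message m with a random blinding factor r."""
    while True:
        r = secrets.randbelow(n - 2) + 2
        if math.gcd(r, n) == 1:
            return (m * pow(r, e, n)) % n, r

def sign_blinded(m_blind, d, n):
    """Step 2: the signer signs M' without learning M."""
    return pow(m_blind, d, n)

def unblind(sig_blind, r, n):
    """Step 3: remove the blinding factor to recover SIG(M)."""
    return (sig_blind * pow(r, -1, n)) % n

def make_pseudonym(sig, temp_pub_key):
    """Step 4 (assumed construction): hash SIG(M) with the temporary public key."""
    sig_bytes = sig.to_bytes((sig.bit_length() + 7) // 8 or 1, "big")
    return hashlib.sha256(sig_bytes + temp_pub_key).hexdigest()

# The message is a hash of some vehicle-chosen token, reduced modulo N.
m = int.from_bytes(hashlib.sha256(b"vehicle registration token").digest(), "big") % N
m_blind, r = blind(m, E, N)
sig = unblind(sign_blinded(m_blind, D, N), r, N)
is_valid = pow(sig, E, N) == m  # anyone can verify with the signer's public key
bid = make_pseudonym(sig, b"hypothetical-temporary-public-key")
```

Because the signer only ever sees `m_blind`, it cannot later link the unblinded pair `(m, sig)` to the signing request, which is the property the scheme relies on for anonymity.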

5 Performance Evaluation

The effectiveness of the TPSense framework is evaluated by the estimation error of vehicle reliability and by the accuracy of event-report evaluation, measured by the false positive and false negative rates.

In addition to providing anonymity for vehicles, the main function of the TPSense framework is to distinguish true event-reports from false ones, which prevents attacks by false information from malicious nodes inside the system. We developed a customized emulator in Python to assess the efficacy of the framework. The performance of TPSense is evaluated on both synthetic and real datasets. The data contain the movement trajectories of a large number of vehicles and randomly generated traffic events.

5.1 Evaluation of TPSense with Synthetic Data

The random waypoint model is employed to simulate the paths of vehicles. The fundamental simulation and system parameters are shown in Table 2. For the simulation dataset, several RSUs with a communication radius of 0.5 km are deployed randomly in the simulation area, and their coordinates are saved to a file. A vehicle’s report is considered only if its location attributes meet our requirements. The distribution function proposed by Barnwal et al. [35] is used to generate random events in the various simulated areas.
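A location check of this kind might look like the following sketch. It assumes that the requirement is that the reporting vehicle lies within the 0.5 km communication radius of at least one RSU, uses planar coordinates in kilometers, and represents a report as a dictionary with a hypothetical `pos` field; none of these details are given in the paper.

```python
import math

RSU_RADIUS_KM = 0.5  # communication radius stated for the simulation

def in_rsu_range(position, rsus, radius_km=RSU_RADIUS_KM):
    """Return True if a planar (x, y) position in km is within range of any RSU."""
    return any(math.dist(position, rsu) <= radius_km for rsu in rsus)

def filter_reports(reports, rsus):
    """Keep only the reports whose 'pos' field is covered by at least one RSU."""
    return [r for r in reports if in_rsu_range(r["pos"], rsus)]

rsus = [(0.0, 0.0), (2.0, 2.0)]
reports = [
    {"vehicle": "v1", "pos": (0.3, 0.3), "value": "T"},  # about 0.42 km from the first RSU
    {"vehicle": "v2", "pos": (1.0, 1.0), "value": "T"},  # about 1.41 km from both RSUs
    {"vehicle": "v3", "pos": (2.0, 1.6), "value": "F"},  # 0.4 km from the second RSU
]
kept = filter_reports(reports, rsus)
```

Here the reports from `v1` and `v3` are kept, and the report from `v2` is discarded because no RSU covers its position.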

Table 2 Fundamental simulation parameters.

Each experiment uses 50–200 vehicles, and random effects are minimized by averaging the results. We assume two types of false reports: reports submitted by vehicles outside the communication range of every RSU, and reports that vehicles inside the communication range submit, with a given probability, in place of the real event-reports.

Unlike the trustworthiness evaluation based on participants’ reputation values in previous schemes, the TE-EM method in TPSense does not require the reputation values of participants to be continuously updated. A direct comparison with existing reputation-based trust models is therefore not suitable. Instead, we use classic truth-discovery algorithms from the field of data mining, which are commonly used for data fusion under conflicting information, as baselines. We consider the following four algorithms: Regular-EM [33], TruthFinder [36], Sums [37], and Voting. Since the existing literature has shown that simple Voting is less effective than the other three algorithms, Voting is not listed in the experiments. The widely accepted false positive rate, false negative rate, and estimation error are used as metrics.
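The paper does not spell out formulas for these metrics, so the sketch below uses conventional definitions as an assumption: the false positive rate is the share of actually false reports that are judged true, the false negative rate is the share of actually true reports that are judged false, and the estimation error is taken as the mean absolute difference between estimated and actual vehicle reliability. The function names are introduced here for illustration.

```python
import numpy as np

def report_metrics(predicted_true, actual_true):
    """False positive and false negative rates for binary report decisions."""
    predicted_true = np.asarray(predicted_true, dtype=bool)
    actual_true = np.asarray(actual_true, dtype=bool)
    false_pos = np.sum(predicted_true & ~actual_true)   # false reports accepted
    false_neg = np.sum(~predicted_true & actual_true)   # true reports rejected
    fp_rate = false_pos / max(np.sum(~actual_true), 1)
    fn_rate = false_neg / max(np.sum(actual_true), 1)
    return float(fp_rate), float(fn_rate)

def reliability_error(estimated, actual):
    """Mean absolute difference between estimated and actual reliability."""
    return float(np.mean(np.abs(np.asarray(estimated) - np.asarray(actual))))

# Five reports: decisions of an algorithm versus the ground truth.
fp, fn = report_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
err = reliability_error([0.8, 0.4], [0.9, 0.5])
```

In this toy case one of the two false reports is accepted and one of the three true reports is rejected.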

Two experiments are devised to assess the performance of TE-EM. A random number of vehicles and event-reports are produced by a simulator written in Python. Each vehicle is given a random probability that represents its reliability, and each vehicle generates a number of event-reports. Event-reports are assumed to be binary. Note that we adapted Regular-EM to our setting: after the computation ends, of the two conflicting versions of a given event, the one with the higher probability is taken.

A. Impact of Number of Vehicles on Metrics

The first experiment compares the estimation accuracy of our algorithm and the baselines for different numbers of vehicles in the crowdsensing scenario. The number of event-reports was set to 4000, of which 2000 were correct and 2000 were misreported. Each vehicle generated 50 event-reports on average, and the number of vehicles ranged from 50 to 110. The results are presented in Fig. 3. TE-EM performs best among the four algorithms on the predetermined metrics. In addition, all the algorithms perform better as the number of participants increases.

Figure 3. Impact of number of vehicles on metrics (using synthetic data).

B. Impact of Offset o on Metrics

As mentioned above, \( d_v \) stands for the prior probability that the value of \( R_j \) is v. For example, \( d_1 \) is the probability that a randomly chosen report is true. \( d_1 \) is set to 0.5 in the initial phase of the algorithm, and an offset o denotes the disparity between this initial \( d_1 \) and the ground truth. In this experiment, the value of o ranges from 0 to 0.45, and the results are shown in Fig. 4.

Figure 4. Impact of offset o on metrics (using synthetic data).

Figure 4 indicates that all the algorithms produce reliable results as o changes. TE-EM outperforms the other three algorithms in estimation error, false positive rate, and false negative rate. Moreover, the initial value of \( d_v \) makes little difference to the performance of TE-EM.

5.2 Evaluation of TPSense with Real Data

In this experiment, we use outdoor temperature sensing data from CRAWDAD. The data set includes about 5000 sensing items collected opportunistically from about 300 taxis in Rome within one day. Temperature data with a timestamp, an identity, and coordinates are uploaded to the server. The data set comes from previous research on participatory sensing [38].

The city area is divided into 9 sensing regions, each 56 km² in area, and a base station is deployed at the center of each region. A day is divided into 4 time spans for temperature sensing, each lasting 6 h. We assign a temperature value to each taxi in a time span based on a Gaussian distribution.

The ground-truth temperature is taken to be the mean temperature. We allow a temperature range with a 10% offset, and if a produced value falls into this range, we regard the report as true. In addition, if a taxi is not located in the area but uploads sensing data within the normal temperature range together with fake location data, we consider its contribution to be false. We varied the percentage of honest vehicles from 0.6 to 0.9; in other words, a different proportion of taxis is designated as malicious nodes in each run. The experimental evaluation uses the same metrics as before.
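The labeling rule can be sketched as follows. The sketch assumes that the “10% offset” means a band of ±10% around the ground-truth mean temperature, and that a boolean flag records whether the taxi really is inside the sensing region; the function name `label_report` and the sample readings are hypothetical.

```python
def label_report(value, ground_truth, in_region, tolerance=0.10):
    """Return True if a temperature report counts as a true report."""
    # Assumed reading of the rule: within +/-10% of the ground-truth mean.
    within_band = abs(value - ground_truth) <= tolerance * abs(ground_truth)
    # A plausible temperature uploaded with a fake location is still false.
    return in_region and within_band

# (reported temperature, taxi really inside the region?)
readings = [(21.0, True), (25.5, True), (20.5, False)]
labels = [label_report(value, 20.0, inside) for value, inside in readings]
```

With a ground truth of 20.0, the first reading is labeled true, the second is false because it lies outside the band, and the third is false because the taxi is not in the region.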

Figure 5 shows how the performance of the four algorithms changes with the percentage of honest vehicles. As the percentage of honest vehicles increases, the performance of all four algorithms improves to different degrees, and TE-EM exhibits the highest accuracy. These results are consistent with those obtained on synthetic data.

Figure 5. Impact of ratio of honest vehicles on metrics (using real data).

6 Conclusion

In this paper, we propose TPSense, a lightweight fog-assisted vehicular crowdsensing framework that addresses both the evaluation of event-report trustworthiness and the protection of users’ privacy. To solve the problem of false event-reports generated by malicious nodes in the crowdsensing system, we convert it into a maximum likelihood estimation problem and solve it with the expectation-maximization algorithm. In this way, we evaluate the trustworthiness of event-reports and the reliability of vehicles, and filter false event-reports on local RSUs. Meanwhile, blind signatures are used to generate a pseudonym that replaces the vehicle’s real identity when the vehicle uploads event-reports, which ensures the anonymity of the vehicle and protects users’ privacy. We have assessed TPSense on synthetic and real data in vehicular crowdsensing. The results show that TPSense outperforms the baseline algorithms in terms of information reliability. In ongoing research, we will strengthen location privacy, data privacy, and identity privacy by using technologies such as homomorphic encryption, spatio-temporal cloaking, and differential privacy.