Keywords

1 Introduction

With the rapid development of wireless communication technology and positioning technology, it has promoted the wide application of location-based services (LBS) [1,2,3]. In order to get services in LBS, the user has to share its current location information, such as querying the nearest hospital. However, the location information can be stolen by the attackers, then more private information maybe leaked by the mining methods [4]. Therefore, the personal privacy information in LBS should be protected.

The location privacy protection method for road network is mainly based on K-anonymity and L-diversity [5,6,7,8,9], which is an anonymous region contains both K users and L road segments. However, this method doesn’t consider the influence of semantic information of the location. Based on this, Li et al. [10] divided the city map based on the Voronoi diagram, and the anonymous region is constructed according to the semantic location popularity of the Voronoi cell. Xu et al. [11] proposed an incremental query optimization method based on local optimization and global optimization, using the deficient popularity to build an anonymous region. Li et al. [12] converted the road network into road segment clustering map, balancing quality of service and privacy requirements by constructing the anonymous region with different semantic location types. Chen et al. [13] introduces the regional popularity and combines with the user-defined sensitivity to calculate the privacy of adjacent road segments, a privacy protection algorithm based on location semantic is designed. Lv et al. [14] obtained the adjacent candidate road segments efficiently until the privacy requirements are satisfied. Chen et al. [15] considered the semantic information of the dummy location and enhanced the privacy protection by constructing a location semantic tree. Xu et al. [16] proposed a sensitivity-fade algorithm, which uses the semantic location influence vector to select the road segment to construct the anonymous region and improve privacy protection.

However, none of these methods consider most the user’s privacy preference for location privacy protection and quality of service. Therefore, an adjustable semantic location privacy protection scheme for road network is proposed in this paper.

2 Preliminaries

2.1 Related Definition

In this paper, the road network is denoted as an undirected connectivity diagram G=(V, E), \( E = \{ e_{1} ,e_{2} , \ldots ,e_{m} \} \) denotes the road segments in the road network, each road segment \( e_{i} = \{ eid,v_{s} ,v_{e} \} \) is an edge in the road network, with eid is the road segment number, vs and ve respectively denote the starting and the end point of the road segment. \( V = \{ v_{1} ,v_{2} , \ldots ,v_{n} \} \) denotes the intersection of road segment. Anonymous region CR is comprised of multiple adjacent road segments \( Edges = \{ e_{1} ,e_{2} , \ldots ,e_{i} \} \), multiple users \( Users = \{ u_{1} ,u_{2} , \ldots ,u_{j} \} \) and multiple semantic locations \( Locs = \{ loc_{1} ,loc_{2} , \ldots ,loc_{k} \} \) on the road segments, in which the number of road segments Edges and users Users should satisfy the personalized privacy requirement of users.

Definition 1 (Semantic location).

\( loc = \{ lid,eid,(x,y),tp\} \) denotes the semantic location in the road network, with lid is the number of the semantic location, eid is the number of the road where the semantic location is located, (x, y) is the coordinate of the semantic location, and tp is the type of the semantic location. The type of semantic location is divided into n types in total, and \( Type = \{ tp_{1} ,tp_{2} , \ldots ,tp_{n} \} \) is the set of n semantic location types.

Definition 2 (Semantic location popularity).

It is used to describe the popularity of a semantic location type in the road network. For each semantic location type \( tp_{i} \in Type \) set a popularity \( p_{{tp_{i} }} \). The set \( Pop = \{ p_{{tp_{1} }} ,p_{{tp_{2} }} , \ldots ,p_{{tp_{n} }} \} \) indicates the popularity of all semantic location.

Definition 3 (Semantic location sensitivity).

It is used to describe the sensitivity of a semantic location type in the road network. Each user sets a sensitivity \( s_{{tp_{i} }} \) for each semantic location type \( tp_{i} \in Type \) according to their own circumstances. The set \( S_{u} = \{ s_{{tp_{1} }} ,s_{{tp_{2} }} , \ldots ,s_{{tp_{n} }} \} \) is the set of sensitivity of all semantic location types relative to user u.

Based on definition 2 and definition 3, the popularity and the sensitivity of the CR can be defined as follow:

Definition 4 (Regional popularity).

The popularity PopularCR of the CR,

$$ Popular_{CR} = \sum\limits_{i = 1}^{|Type|} {\frac{{|CR.Locs.tp = tp_{i} |}}{|CR.Locs|}} p_{{tp_{i} }} $$
(1)

Definition 5 (Regional sensitivity).

The sensitivity SensCR of the CR,

$$ Sens_{CR} = \sum\limits_{i = 1}^{|Type|} {\frac{{|CR.Locs.tp = tp_{i} |}}{|CR.Locs|}} s_{{tp_{i} }} $$
(2)

|Type| in formulas (1) and (2) is the total number of semantic location types contained in the CR; and |CR.Locs| is the number of semantic locations contained in the CR.

Definition 6 (Regional privacy).

The regional privacy RPCR of the CR,

$$ RP_{CR} = \frac{{Popular_{CR} }}{{Sens_{CR} }} $$
(3)

Definition 7 (Privacy tolerance).

It is used to describe the importance of the user’s privacy information. Assuming that all adjacent road segments of the CR are the set \( NearEdges = \{ e_{1} ,e_{2} , \ldots ,e_{n} \} \), and the regional privacy is formed by adding road segments in NearEdges to the CR one by one, which is recorded as the set RPset.

$$ \forall rp_{i} \in RP_{set} ,\delta_{i} = \frac{{rp_{\hbox{max} } - rp_{i} }}{{rp_{\hbox{max} } - rp_{\hbox{min} } }},\delta_{i} \in [0,1] $$
(4)

\( rp_{\hbox{max} } \) is the maximum of \( RP_{set} \), \( rp_{\hbox{min} } \) is the minimum of \( RP_{set} \). Privacy tolerance reflects the user’s choice of location privacy protection and the quality of service based on subjective intent, which is in a range of values [0,1]. The more attention is paid to the location privacy protection, the privacy tolerance is lower; otherwise, the quality of service is more important.

Definition 8 (Privacy requirement).

For a user u that makes a query, his privacy requirements are expressed in PR(K,L,δ,S). In this case, K denotes the user-defined lowest number of anonymous users; L denotes the user-defined lowest number of road segments; δ denotes the user-defined highest value of privacy tolerance; and S is user-defined sensitivity of a group of different semantic location types.

Definition 9 (Deficient number).

For a user u, it is used to describing how many anonymous users are still missing from the CR to satisfy u.PR.K,

$$ dn = u.PR.K - |CR.Users| $$
(5)

|CR.Users| is the number of anonymous users in the CR.

2.2 System Model

This paper is based on the central server architecture (Fig. 1), a trusted third anonymous server that exists in the client and the location server. Users send their locations, inquiry contents, and privacy requirements to anonymous servers. The anonymous server sends the users’ locations after the privacy preference selection module and the anonymous module to the LBS server. The location server queries the candidate results and returns it to the anonymous server, the anonymous server analyzes the query candidate set, and returns the screening valid results to the requester. Besides, the anonymous server needs to store the city map information and the semantic location information.

Fig. 1.
figure 1

Central Server Architecture

3 Adjustable Semantic Location Privacy Protection Scheme

This paper assumes that third-party servers(anonymous servers) are trustworthy to users. In order to achieve user’s privacy preference for location privacy protection and the quality of service in the road network, this paper designs an adjustable semantic location privacy protection scheme. The algorithm flow as shown in Fig. 2.

Fig. 2.
figure 2

Algorithm Flow

Algorithm 1 is an adjustable semantic location privacy protection scheme(ASLPP). The algorithm’s input parameters include user u and privacy requirements PR. The concrete steps are as follows:

  1. (1)

    Initialize the input parameters (line 1);

  2. (2)

    Add the road segment where the user is located to the CR.Edges (line 2–3), and update CR.Users and CR.Locs (line 4);

  3. (3)

    Determine whether the CR meets the user privacy requirements (line 5), If the requirements are satisfied, return CR (line 13); otherwise, execute step 4;

  4. (4)

    Execute the algorithm OARSS, add the result to the CR and execute step 3 (line 5–12).

The pseudo-code of the algorithm is as follows:

figure a

Algorithm 2 is the optimal road segment selection algorithm(OARSS). The input parameters of the algorithm are the anonymous region CRS, the deficient number dn, the privacy tolerance δ, the set of road segments NearEdges and the set of sensitivity SS. The concrete steps are as follows:

  1. (1)

    Initialize the input parameters (line 1–4);

  2. (2)

    Calculate RP after adding the set of road segments and add it to the set RPset (line 5–8), find the maximum and minimum of RPset (line 9–10).

  3. (3)

    Calculate the privacy tolerance of each road segment in the set NearEdge, and take the road segment of less than or equal to δ as the set Cadedgesset (line 11–17).

  4. (4)

    Add a road segment greater than or equal to dn in the Cadedgesset to the set DPEdge1set, and add a road segment smaller than dn to the set DPEdge2set (line 18–24).

  5. (5)

    If DPEdge1set is not empty, return to the road segment of the minimum of anonymous users. Otherwise, return to the road segment of DPEdge2set with the maximum of anonymous users (line 25–30).

The pseudo-code of the algorithm is as follows:

figure b

4 Experiment and Analysis

4.1 Experiment Data Sets and Parameter Settings

The environment of the experiment is Intel Core(TM) 2 CPU @ 2.83 GHz; 2 GB RAM; the operating system is Microsoft Windows 7 Professional, and the algorithm is written in Java based on MyEclipse environment.

The experimental data is based on the Oldenburg map of Germany, which includes 6105 vertices and 7035 edges. There are 1 0000 uniform distribution users obtained from Brinkhoff based network mobile object generator [17] by introducing the highway network of Germany Oldenburg city into Brinkhoff generator. The users are distributed on the road segments, and a specific semantic information definition is labeled on the attribute of location type in location data generated by Brinkhoff generator, including 4 types of semantic location (hospital, bar, shopping mall, and school). All experimental parameters in the experiment set are shown in Table 1.

Table 1. Parameter settings

The experiment randomly selects 1000 users who request service to simulate experiments. For the convenience of calculation, the hypothetical popularity for the type of semantic locations is as following: {hospital:0.3, bar:0, shopping mall:0.4, school:0.3}. Considering the time complexity and the quality of service, the maximum number of road segments Lmax is set in the experiment.

4.2 Analysis of Experimental Results

The experiment compares and evaluates ASLPP with LSBASC proposed in literature [13] and Enhance-LSBASC proposed in literature [14] from the aspects of anonymous success rate, average anonymous execution time, average number of semantic locations and regional privacy.

(1) Influence of δ

Figure 3 depicts the influence of δ on the algorithm ASLPP. Since the algorithm LSBASC and Enhance-LSBASC don’t consider privacy tolerance δ, only the algorithm ASLPP is experimentally verified when K = 25, L = 6, Lmax= 20 and δ changes from 0.1 to 1. It can be seen from Fig. 3 that the anonymous success rate of the algorithm ASLPP is increasing, but the number of semantic locations, average anonymous execution time, and regional privacy are decreasing, especially after δ is greater than 0.7, the trend of change is more obvious. This is because when the δ increases, more road segments meet the privacy tolerance δ. The algorithm ASLPP selects the road segment according to the deficient number, reduces the number of road segments that need to be added. Therefore, the average anonymous execution time and the number of semantic locations are reduced, the anonymous success rate is improved. However, the larger of δ, the road segment selected by the algorithm ASLPP contains more sensitive semantic locations of the user, which makes the regional privacy decrease. It can be seen that the algorithm ASLPP can effectively implement the privacy preference selection of the user’s location privacy protection and the quality of service by adjusting δ.

Fig. 3.
figure 3

Anonymous success rate, average number of semantic locations, average anonymous execution time, regional privacy and δ

(2) Influence of K

Figure 4 depicts the influence of K on the algorithm ASLPP, LSBASC, and Enhance-LSBASC when L = 6, δ  = 0.7, Lmax = 20 and K changes from 15 to 35. In Fig. 4(a), the anonymous success rate of the three algorithms is decreasing and the algorithm ASLPP is higher than that of the algorithm LSBASC and Enhance-LSBASC. This is because the algorithm LSBASC chooses the best one to join the anonymous set each time, while the algorithm Enhance-LSBASC selects the optimal road segment set to join the current anonymous set each time. When the number of added road segments reaches the upper limit of the road segment tolerance Lmax, the number of anonymous users can’t meet the privacy requirements, resulting in anonymous failure. The algorithm ASLPP selects the optimal road segment by the deficient number. In Fig. 4(b), the average anonymous execution time of the three algorithms is increasing, but the algorithm ASLPP is lower than that of the algorithm LSBASC and higher than that of Enhance-LSBASC. This is because when the K increases, more road segments need to be added to meet the user’s privacy requirements.

Fig. 4.
figure 4

Change of K

In Fig. 4(c), the number of semantic locations of the three algorithms increases, but algorithm ASLPP is lower than that of the algorithm LSBASC and Enhance-LSBASC. This is because algorithm ASLPP selects adjacent road segment based on the deficient number, and reduces the number of road segments that need to be added. In Fig. 4(d), the regional privacy of the three algorithms is decreasing. This is because the anonymous region contains more relatively sensitive semantic locations of the user, which makes the regional privacy decrease. The algorithm ASLPP is lower than that of the algorithm LSBASC and higher than the algorithm Enhance-LSBASC. However, the algorithm ASLPP only reduces regional privacy by about 1% compared to the algorithm LSBASC.

(3) Influence of L

Figure 5 depicts the influence of L on the algorithm ASLPP, LSBASC, and Enhance-LSBASC when K = 25, δ = 0.7, Lmax= 20, and L changes from 3 to 15. In Fig. 5(a), the anonymous success rate of the three algorithms is not affected by L, and the algorithm ASLPP is higher than that of the algorithm LSBASC and Enhance-LSBASC. In Fig. 5(b), the average anonymous execution time of the three algorithms is increasing, but the algorithm ASLPP is lower than that of the algorithm LSBASC and higher than that of the algorithm Enhance-LSBASC. This is because when the L increases, the number of road segments in the anonymous region must reach a certain number. The algorithm ASLPP and LSBASC choose the best one each time, while the algorithm Enhance-LSBASC selects the optimal road segments set each time.

Fig. 5.
figure 5

Change of L

In Fig. 5(c), the number of semantic locations of the three algorithms is increasing, but the algorithm ASLPP is lower than that of the algorithm LSBASC and Enhance-LSBASC. This is because the algorithm ASLPP selects the road segment by the deficient number, reducing the number of road segments so that the number of semantic locations is decreasing. In Fig. 5(d), the regional privacy of the three algorithms is increasing, but the algorithm ASLPP is lower than that of the algorithm LSBASC and higher than that of the algorithm Enhance-LSBASC. This is because when the L increases, the number of road segments in the anonymous region is increasing, making the number of semantic locations are less sensitive to the increasing number of users.

5 Conclusion

In this paper, based on the existing semantic location privacy protection methods without the consideration of the user’s privacy preference, we propose an adjustable semantic location privacy protection scheme for road network. The scheme selects the adjacent road segment through privacy tolerance and deficient number. Under the premise of fully satisfying personalized privacy requirements, the privacy preference of location privacy protection and service quality is realized. Finally, simulation experiments show that the proposed scheme supports the user’s privacy preference.