Keywords

1 Introduction

The prevalence of GPS equipped mobile devices and the fast development of wireless communication technologies stimulate the emergence of location-based services (LBS). A LBS receives a users’ location and provides that user with information or services tailored to that location. For example, with the help of LBS, a user can get answers to various location-based queries, such as the nearest ATM, restaurant, or retail store to his/her current geographical position. LBS has a wide range of promising applications [3, 11, 13, 20, 21, 25, 26], however, it also poses significant privacy risks, as the location information collected from users can reveal far more than a user’s latitude and longitude. Knowing where a user is plus some background knowledge can infer many sensitive information about individuals, such as home address, health condition, lifestyle habits, and political attitude. To boost further development of LBS, user privacy must be protected. On the other hand, a LBS provider typically holds a Points of Interest (POI) database, based on which the location-based queries can be processed. The POI database is a great asset to the LBS provider, as building such a database generally requires many resources and is by no means a trivial task. It is therefore expected that the privacy of a LBS provider, that is, its POI database, should also be kept secret. That is, a LBS should only send necessary POI data to authorized users as the answers to their location-based queries.

A variety of approaches can be used to provide a certain degree of privacy preserving for location-based queries. Methods such as access control [1, 22], mix zone [2, 6], k-anonymity [4, 5, 14] and dummy location [9, 10, 23] can prevent LBS provider from learning the exact position of a user. Specifically, the first three methods introduce a trusted third party who maintains all users’ location. When a user wants to do a query, a cloaked area, instead of the exact location of the user, is generated by the third party and sent to the LBS provider for subsequent processing. Clearly, this kind of methods is vulnerable to misbehavior of the third party. To overcome this shortcoming, the method of dummy location abandons the trusted third party and fills a user’s query with both real and fake POIs. To achieve a high level of security, however, a lot of extra POIs need to be send to the LBS provider, resulting in heavy unnecessary communication cost. Besides, all these approaches ignore the privacy requirement of the LBS provider.

Private Information Retrieval (PIR) is another popular privacy-preserving technique for location-based queries. It allows a user to retrieve the data he/she wants from a database, without disclosing the index of the data to the database server. When simply applying PIR to location-based queries, the LBS provider cannot know the location of the user, but the user will get more POIs than the answer to his/her query. To deal with this, Yi et al. [7] present a solution based on two encryption schemes, Paillier and Rabin, which only allows the user to get the exact POIs to his/her query. Paulet et al. [18] also achieve this goal through a two-stage operation of Oblivious Transfer (OT) and PIR. However, it is remarkable that, these approaches have a linear computation cost with the number of POIs, resulting in a prohibitive long query processing time when the POI database is large.

Following the model presented in [18], we propose an efficient approach for location-based queries with mutual privacy protection. Our approach has a significant improvement in online performance by executing two rounds of OT extension on two small key sets. To reduce online query processing cost, the LBS provider constructs a private grid and a public grid index for its POI database. Every cell in both grids is encrypted by a different symmetric key. During the online phase of query processing, the user obtains the key of the cell of the public grid where he/she is located. With this target cell, the user can get the key of the cell of the private grid where he/she is located, and thus retrieves the required POIs eventually. Notice that in this procedure, the user can only obtain one specific key from the LBS provider, so he can only decrypt the necessary POIs. Hence, our approach guarantees data privacy for both the user and the LBS provider.

The rest of the paper is organized as follows. Section 2 discusses some recent achievements in privacy preserving LBS. Section 3 presents our approach for location-based queries processing with mutual privacy protection. Section 4 analyses the computation cost and the communication cost of our approach. We report experimental results in Sect. 5 and conclude our work in Sect. 6.

2 Related Work

The public concern over privacy stimulated lots of research efforts in privacy preserving LBS. And the privacy-preserving computation of trajectory [12] is an extension of LBS. An early step in protecting user’s location privacy is notifying and requesting user for the usage of their location. Information access control [1, 22] is a technique used to protect location information gathered by location tracking systems. It requires the location of user are gathered and relies on LBS provider to restrict access to stored location information through rule-based polices. But it’s vulnerable when the third party who maintains all user locations misbehaves.

Mix zone [2, 6], k-anonymity [4, 5, 14] and dummy location [9, 10, 23] solve this problem by hiding user’s location into some bigger zone or more records so that LBS provider can not locate the exact position of user. Both mix zone and k-anonymity use a trusted third party which is in charge of user’s location and assign user with cloaked query by which user can query LBS provider without revealing the exact location of them. In this situation, k-anonymity is effected heavily on the distribution and density of the users, which, are out of control and the balance of privacy and precision is another difficult problem need to be solved. Dummy location is completed by sending many random locations along with user’s query, though dummy location does not rely on any third party, the LBS provider still can restrict the user in a small space of the total domain, which leads to a weak privacy.

Private information retrieval (PIR) [7, 17, 18] based LBS is a new scheme which can provide a stronger cryptographic guarantees than former techniques. Both access control, mix zone and k-anonymity are vulnerable for the employ of third party who maintains all user locations. Also, k-anonymity is effected heavily on the distribution and density of the users, which, are out of control. LBS queries based on dummy locations in [23] incurs both computation and communication overhead for client. However, most of issues overhead can be solved by the introduction of PIR-based LBS queries. PIR is a technique which allows a user to retrieve data from a database, without disclosing the index to the database server. A PIR-based LBS queries usually require two stages. In the first stage, the mobile user retrieves the index of his location from the LBS provider. In the second stage, the mobile user retrieves the POIs according to the index from the LBS provider.

In [7], Ghinita et al. used Computational PIR [8] which is based on the Quadratic Residuosity Assumption which states that it is computationally hard to find the quadratic residues in modulo arithmetic of large composite number n where the factorization of n is unknown. The proposed method for nearest neighbors consists of two stages just as discussed before: determining the index of a public grid and retrieving the value of target cell with PIR. It is a secure and efficient approach for privacy preserving LBS queries if we only care about the privacy of user, but it is not appropriate when we not only care about LBS’s not learning anything about user’s query, but also want user not to learn anything about other data in LBS’s database other than the one she queried. Since with PIR scheme, client can also infer the data which in the same column with the target item.

One way to deal with the problem of data leakage is to encrypt all the data needed to be transferred. [24] proposed by Yi et al. accomplishes this objective by the combination of Paillier Homomorphic encryption scheme [16] and the Rabin encryption scheme [19]. Initially, LBS divides the location based database into cells with the same size, and collects k nearest POIs in each cell.After that, user just need to retrieval the data of target grid without compromising the privacy of both user and LBS. At first, user generates an encrypted query with Paillier and Rabin and sends it to LBS. LBS generates response by linearly computation on all data and sends the encrypted response back. Then, user can and only can get the target item decrypted with his private key. However, this method has a big computation cost for the usage of two encryption scheme and is not practical.

Another way realized this goal is introduced by Paulet et al. in [18], same to [7], this protocol is also organized according to two stages. In the first stage, the user determines his/her location within a public grid, using oblivious transfer (OT)[15]. This data contains both the ID and associated symmetric key for the block of data in the private grid. In the second stage, the user executes a communicational efficient PIR, to retrieve the appropriate block in the private grid which can be decrypted with the symmetric key obtained in the first stage. The property of oblivious transfer and PIR preserved the privacy of user. The data privacy of LBS can also be preserved, for all data of LBS are encrypted and user only has the key for target block.

3 Privacy-Preserving Location-Based Query Processing

3.1 System Model

In this paper, we simply adopt the system model proposed in [18] as the foundation of our work. As shown in Fig. 1, there are three types of entities in the model: a set of users, a mobile service provider (SP), and a location server (LS). LS holds a POI database and provides some location-based services that are consumed by the users. The POI database consists of \(\rho \) POI records, each of which gives the coordinate (i.e., longitude and latitude) of a position and a description about what is at the position. SP is responsible for establishing and maintaining the communication between LS and the users. We also take the reasonable assumption made in [18] where SP is considered to be a passive entity that is not allowed to collude with LS. This is because, if a user wants to consume some LBS, he/she has to endow SP with the authority to collect the whereabouts of his/her mobile device. In this case, any method for user privacy protection will be definitely overthrew provided that SP is allowed to collude with LS.

Fig. 1.
figure 1

System model

By excluding the possibility of collusion between SP and LS, we consider only two possible adversaries. When LS is an adversary, he/she tries to get the position of the users based on their location-based queries. When a user is an adversary, he/she tries to obtain more POI data than the necessary answer to his/her location-based queries. Hence, the objective of our work presented in this paper is to design a method that can protect mutual privacy (i.e., the user’s location and LS’s POI data) when processing location-based queries.

Fig. 2.
figure 2

Overview of privacy preserving location-based query processing

3.2 Approach Overview

Figure 2 gives an overview of our approach to the problem of location-based query processing with mutual privacy protection. Our approach consists of two steps: data preparation and query processing. Data preparation is conducted by LS itself and is totally an offline computation, while query processing involves some online computation and multiple rounds of communication between LS and the user. Table 1 summarizes the notations in our paper.

Data preparation. LS first constructs a quadtree as the spatial index of his/her POI database, based on which the quadtree subdivision of the whole region, denoted as grid Q, can be made directly. Every leaf of the quadtree has at most d POIs, which means every cell in Q has also at most d POIs. For these cells having fewer than d POIs, LS adds dummy POIs as placeholders. Besides the private grid Q, LS also generates a public grid P by dividing the whole region into \(m \times m\) cells where \(m=2^{h-1}\) and h is the height of the quadtree. Clearly, every cell \(P_i\) in grid P must be located in exactly one cell \(Q_j\) in grid Q. Based on this observation, every cell \(P_i\) in P stores the index j of its corresponding cell \(Q_j\) in Q. To keep his/her POI data secret, LS encrypts both the private grid Q and the public grid P. In particular, every cell \(Q_j\) (i.e., the d POIs in that cell) is encrypted by a key \(QK_j\) and every cell \(P_i\) (i.e., the index of its corresponding cell in grid Q) is encrypted by a key \(PK_i\). More details about data encryption will be given in the next subsection. The two encrypted grids E(P) and E(Q), together with m, are sent to all users who want to consume location-based services.

Query Processing. According to the value of m and his/her coordinates (note that the coordinates are just the input of location-based queries), it is quite easy for a user u to determine \(P_i\), the cell of the public grid P where he/she is located in. Holding the index of cell \(P_i\), the user is able to obtain the encryption key \(PK_i\) for \(P_i\) by running an oblivious key transfer protocol with LS. Note that, this key is also the decryption key for \(P_i\) (more details will be discussed later), so the user can immediately perform decryption locally and get the index of \(Q_j\), the cell of the private grid Q where he/she is located in. Once the index of cell \(Q_j\) is decided, the user performs the oblivious key transfer protocol again to get the encryption key \(QK_j\) for cell \(Q_j\). By decrypting \(E(Q_j)\) with the key \(QK_j\), the user can obtain the POIs contained in cell \(Q_j\), which is just the answer to his/her location-based query. Note that, the oblivious key transfer protocol ensures LS cannot learn the index of the cell where the user is located in, while the user cannot obtain more keys than the one for the cell where he/she is located in, thus protecting mutual privacy.

Table 1. Summary of notations

3.3 Protocol

Algorithm 1 shows our protocol of location-based query processing with mutual privacy protection. Recall that grid P has \(M=m*m\) cells, each of which contains an \(\alpha \)-bits integer \(P_I\), indicating the index of the corresponding cell in the private grid Q. The number of cells in grid Q depends on the POI database and is assumed to be N. Each cell in grid Q has d POIs (including dummy POIs), and their information such as the coordinates is represented as a string of bits, denoted as an \(\beta \)-bits integer \(Q_J\). To encrypt \(P_I\) for \(1 \le I \le M\), LS generates l random pairs of keys \((PK_1^0,PK_1^1),(PK_2^0,PK_2^1),\cdots , (PK_l^0,PK_l^1)\) where \(l = \log _2M\). For a given I, its binary representation \((i_1,i_2,\cdots , i_l)\) is used as the selection bits to select l keys from the l pairs of keys. The exclusive-OR of the l selected keys, \(\bigotimes _{k=1}^l PK_k^{i_k}\), is given to a pseudorandom generator H (which is typically implemented as secure hash function such as SHA-1) to produce an \(\alpha \)-bits pseudorandom string. The encryption of \(P_I\) is completed by computing the exclusive-OR of \(P_I\) and the pseudorandom string. \(Q_J\) can be likewise encrypted, as seen from lines 6-7. The encrypted grids \(E(P_I)\) and \(E(Q_J)\) are sent to u when u has a stable and fast network such as Wi-Fi, which completes the offline stage of our protocol.

In the online stage, u can determine the cell in the public grid P where he/she is located in based on his/her current coordinates. Based on the index of the cell, u can obtain the l keys that are used to encrypt the cell by running an obvious key transfer protocol (see Algorithm 2), which is built on the oblivious transfer extension protocol presented in [15]. The pseudorandom string used for encryption can be reproduced by the same H function provided that the same seed is given to H, so the decryption can be easily done after u gets these l keys. Once the index of the cell where u is located in the private grid Q is known by u, he/she can run the key transfer protocol again to get the keys for that cell, based on which \(Q_J\) can be decrypted to retrieve the POIs in that grid, which is just the answer to u’s query.

figure a
figure b

3.4 Correctness and Security Analysis

We first prove the utility of our proposed protocol which means the POIs user obtained are exactly the nearest POIs to user. Then, we prove that our protocol is secure for both user and LBS. LBS can not infer any information about user, the same, user can get no more information about LBS than the POIs he/she has requested.

Theorem 1 (Correctness). Assume that user and LBS follow Alogrithm 2 correctly. LBS holds l pairs of keys \((K_1^0,K_1^1),(K_2^0,K_2^1),\cdots ,(K_l^0,K_l^1)\) and user u holds an index I whose binary representation is \((i_1,i_2,\cdots , i_l)\), we will prove that the result \((K_1^{i_1},K_2^{i_2},\cdots ,K_l^{i_l})\) which is obtained by user u according to protocol is equal to \((QK_1^{i_1},QK_2^{i_2},\cdots ,QK_l^{i_l})\) which is exactly the value user u wants.

Proof. Firstly, according to line 9 of key transfer,

$$ K_k^{i_k} = H((g^\mu )^{\gamma _k},\beta ) \otimes E_k^{i_k} $$

for the value of \(i_k (1 \le k \le l)\), the binary representation of I, is either 0 or 1, and we will discuss the process with two cases:

1:when \(i_k = 0\), the value user u wants is \(QK_l^{i_k} = K_k^0\). Next, our job is to prove that the value \(K_k^{i_k}\) which user obtained according to the protocol is \(K_k^0\) too. Firstly, user u sets the value of \(\mathbf K _k^{0}\) and \(\mathbf K _k^{1}\) as (line 3):

$$ \mathbf K _k^{0}=g^{\gamma _k} , \mathbf K _k^{1}=e/\mathbf {K}_k^{0} $$

then u sends \(\mathbf K _k^{0}=g^{\gamma _k}\) to LS (line 4), so we have \(H((g^\mu )^{\gamma _k},\beta ) = H((\mathbf K ^0_k)^{\mu }),\beta )\), furthermore,

$$ H((g^\mu )^{\gamma _k},\beta ) \otimes E_k^{i_k} = H((g^\mu )^{\gamma _k},\beta ) \otimes H((\mathbf K ^0_k)^{\mu }),\beta ) \otimes K_k^0 $$

according to the property of exclusive,

$$ H((g^\mu )^{\gamma _k},\beta ) \otimes H((\mathbf K ^0_k)^{\mu }),\beta ) \otimes K_k^0 = K_k^0 $$

eventually, we have \(K_k^{i_k} = K_k^0 = QK_k^{i_k}\).

2:when \(i_k = 1\), the proof is just a copy to ahead, so we give the analytical procedure directly:

$$ K_k^{i_k} = H((g^\mu )^{\gamma _k},\beta ) \otimes E_k^{i_k} = H((g^\mu )^{\gamma _k},\beta ) \otimes H((\mathbf K ^1_k)^{\mu }),\beta ) \otimes K_k^1 = K_k^1 = QK_k^{i_k} $$

It is proved that for the k-th query of I, \(K_{k}^{i_{k}}\) is the right response for either \(i_k = 0\) or \(i_k = 1\). So, for all \(1\le k\le l\), \(K_{k}^{i_{k}} = QK_{k}^{i_{k}}\).

Based on the correctness of key transfer, it’s easy to verify the correctness of our proposed privacy preserving query processing. According to Algorithm 1, LBS sends both encrypted grids P and Q to u. If u gets the right key from LS to decrypt the target grid within P, then he gets the index of Q. The same, right key for \(Q_j\), guarantees that u can decrypt the right POI records, and get the right POIs eventually.

Security Model. We assume that LS and client are both semi-honest, also known as “honest but curious”. They run the protocol exactly as specified (no deviations, malicious or otherwise), but may try to learn as much as possible about the input of the other party from their views of the protocol.

It should be noted that though secure protocols against malicious adversaries exist, they are far too inefficient to implement and be used in practice. Secure protocols against semi-honest adversaries, however, are not only useful in practice but also the foundation of designing secure protocols against malicious adversaries.As we mentioned ahead, by excluding the possibility of collusion between SP and LS, we consider only two possible adversaries.

Security of u. When LS is an adversary, he/she tries to get the position of the users based on their location-based queries. However, the privacy of u is preserved, since the value that he/she sends to LS is out of distinguishing whether it was chosen directly at random or as \(e/\mathbf K _j^0\) for random \(\mathbf K _j^1\). LS gets no more information than random value \(\mathbf K _j^0\) .

Security of LS. In the preliminaries, we have the requirement that u cannot know the discrete logarithms of both \(\mathbf K _j^0\) and \(\mathbf K _j^1\), since this would reveal to u the discrete logarithm of e. Therefore, the Diffie-Hellman assumption implies that u cannot predict any value of \((\mathbf K _j^0)^\mu \) and \((\mathbf K _j^1)^\mu \) too. Based on the random oracle assumption, it ensures that u can not distinguish \(H((\mathbf K _j^0)^\mu )\) either \(H((\mathbf K _j^1)^\mu )\) from random, and so is the encrypted message \(E_j^0\) and \(E_j^1\). Thus u can get no more POI data except the one he/she has required. The privacy of LS can be preserved.

4 Performance Analysis

In this chapter, we analyze the computation and communication cost of our approach. As the most expensive operations in our approach are modular exponentiation (EXP) and hash functions (HASH), we focus on the number of times they are required. Table 2 summarizes the performance of our approach where \(\phi \) equals to \(\log _2M + \log _2N\).

4.1 Computation Cost

According to the protocol, our privacy preserving location-based query processing consists of two steps, data preparation and query processing. In the step of data preparation, the main computation cost comes from the encryption of two grids, P and Q. Recall that P and Q have M and N cells respectively, and each cell is encrypted via a secure hash function. Therefore, the computation cost of data preparation is \(M+N\) hashes.

The step of query processing requires two rounds of key transfer. To simplify discussion, we first assume the query of user u can be represented by 1-bit integer i. As seen from line 3 of Alogrithm 2, either \(\mathbf K ^{0}\) or \(\mathbf K ^{1}\) is set to \(g^{\gamma }\) by one exponentiation. After that, u sends the \(\mathbf K ^0\) to LS, and LS reconstructs the key \((\mathbf K ^0)^{\mu }\) by one exponentiation (see line 6 of Alogrithm 2). Next, 2 hashes are used to encrypt original messages \((\mathbf K ^0,\mathbf K ^1)\) by LS (see line 7 of Alogrithm 2). To extract message \(\mathbf K ^i\) from \(E^i\) (see line 9 of Alogrithm 2), one exponentiation is needed for the computation of \((g^\mu )^{\gamma }\) and one hash for key.

Recall that each grid of P and Q is represented by an \(\alpha \)-bits and \(\beta \)-bits integer, respectively. Consequently, the number of key pairs, l, of P is \(\log _2M\) and \(\log _2N\) for Q. Hence, to transfer \(\log _2M\) keys for P and \(\log _2N \) keys for Q, LS needs 4 exponentiations offline and \(\phi \) exponentiations adds \(2\phi \) hashes online, while u needs \(2\phi \) exponentiations and \(\phi \) hashes, where \(\phi =\log _2M + \log _2N\).

Table 2. Performance analysis of our protocol

4.2 Communication Cost

At the end of data preparation, LS sends the encrypted grids E(P) and E(Q) to u (see line 8 in Algorithm 1). Since P has M grids with \(\alpha \) bits for each grid, and Q has N grids with \(\beta \) bits of each grid, the size of encrypted grids P and Q are \(M\alpha \) and \(N\beta \) respectively. By taking advantage of hash function, the length of encrypted grids E(P) and E(Q) remains \(M\alpha \) and \(N\beta \). Therefore, the communication cost of data preparation is \(M\alpha + N\beta \).

Before discussing the communication cost of query processing, it is important to distinguish between the length of the input element, and the length of group element v (which is typically 1024 bits long). With l pairs of keys as the input, the data from user to LS is composed of l group elements \(\mathbf K _j^0 (1\le j \le l)\) (see line 4 in Alogrithm 2), and the data from LS to user is composed of a group element \(g^\mu \) and 2l encrypted messages \(E=(E_j^0,E_j^1)\) in the size of input (see line 8 in Alogrithm 2).

In the two rounds of key transfer, the length of key which is used to encrypt grid P and Q is set to w (which is typically 80 bits), which means the length of input element is w bits. As the number of key pairs, l, during key transfer is \(\log _2M\) and \(\log _2N\), the communication cost is \(v\phi \) bits from user to LS and \(2v+2w\phi \) bits from LS to user.

5 Experimental Evaluation

In this section, we present experimental evaluations of our approach on a synthetic dataset. We implement the methods proposed in [18, 24], and compare them with our approach. All experiments are performed on a PC with 3.2 GHz CPU, 8 GB RAM, JDK 7 and Win 7. The synthetic dataset is a 1000*1000 region with uniformly distributed POIs, each of which is represented by a 64-bits integer. The key size to pseudo-random function H is set to 80. We have two variables in the experiment, the number of POIs, N, and the leaf capacity of quadtree d. d is set to 15 while N varies from 20k to 100k with a step of 20k. N is set to 50k when d varies from 5 to 25 with a step of 5.

Fig. 3.
figure 3

Online response time of three approaches

It is clear from Fig. 3 that our approach has the best performance in terms of online response time. Using our approach, queries on all datasets can be responded in less than 0.1 s, even for the biggest dataset with 100k POIs. In contrast, though the method of Paulet et al. can return answers within 1 s on the dataset with 20k POIs, it takes more time when N becomes bigger or d becomes smaller. For example, when \(N=50\)k and \(d=5\), it takes about 18 s which is unacceptable in practice. The method proposed by Yi et al. needs 5 s even on the smallest dataset with 20k POIs.

Clearly, with the increasing number of POIs or decreasing number of leaf capacity, both grids Q and P have more cells, which results in the increase of both computation and communication cost. However, our approach in the phase of online computation works only on \((log_2M + log_2N)\) pairs of keys, so the increase of POIs can be largely overlooked which means our approach has a good scalability. Besides, LS needs to send users the encrypted E(P) and E(Q) whose size is linear with \(M+N\). Hence, the increase of POIs or the decrease of leaf capacity will both lead to a linear increase of communication cost.

Fig. 4.
figure 4

Communication cost of three approaches

Figure 4 shows the communication cost of three approaches. Clearly, the communication cost of our approach is not the best, but it is the cost for a significant improvement in online response time. Further, the increase of communication cost is totally acceptable in practice. For example, for the biggest dataset with 100k POIs, the communication cost of our approach is only about 12 MB, which is affordable in a stable and fast network such as Wi-Fi.

We also notice that our approach and the method of Paulet et al. both need to encrypt some data beforehand to protect the data privacy of LS. Figure 5 shows the time of data preparation. Clearly, our approach needs less time than the method of Paulet et al. In particular, our approach can be finished in 10 s for a dataset with 100k POIs. That is, even with the time of data preparation, a new user can get response in seconds using our approach.

Fig. 5.
figure 5

Data preparation cost

In conclusion, the empirical study shows that our approach based on two rounds of OT transfer has a great improvement in online response time. Specifically, our approach can provide mutual privacy protection for location-based queries in less than 0.1 s with a communication cost less than 12 MB even for a dataset of 100k POIs. Therefore, our protocol is both computational efficient and communicational efficient.

6 Conclusion

In this paper, we propose an efficient approach to protecting mutual privacy in location-based queries. We achieve the goal by performing two rounds of OT extension on two small key sets. We theoretically prove the security and analyze the complexity of our approach. Compared with state-of-the-art work, our approach has several orders of magnitude improvement in online response time, only at the expense of little and acceptable communication cost. Empirical study shows that our approach is both computational efficient and communicational efficient.