
1 Introduction

Motivation: In some data outsourcing scenarios, the data owner (DO) may store its real-time data, periodically submitted by its subordinate data collectors (DCs), on a cloud service provider (CSP) in encrypted form. As a typical example, Fig. 1 shows a radiation detection scenario in which DCs managing sensing devices periodically upload the data they collect to the CSP; the collected data have three dimensions: the time, the geographic location of the road (denoted by GPS coordinates) and the local radiation level. The Environmental Protection Agency (the DO) may occasionally query the CSP for the radiation levels of roads. There are many other similar scenarios, such as a traffic accident monitoring system in which the Traffic Management Agency often requests pictures of accidents stored on the CSP. In these scenarios, the privacy of the data that DCs upload to the CSP should be protected. We note that the CSP in these scenarios is semi-trusted and curious: it correctly implements the prescribed operations, but it is curious about the data it stores and may omit some query results.

Fig. 1. Radiation detection scenario

We aim to realize privacy-preserving multidimensional range queries on real-time data in the above scenarios. Generally, it is necessary to store these sensitive data in encrypted form and to keep the keys secret from the CSP at all times. In the trivial approach, DCs first upload data to the DO in real time. The DO then encrypts the data, generates indexes itself and finally uploads them to the CSP. The DO may also query the data stored on the CSP from time to time. Note that the trivial approach uses a symmetric cryptosystem, such as AES, and the key remains unchanged.

Since the data that DCs upload to the DO are real-time, the trivial approach is very inefficient: the DO needs to frequently upload newly collected data to the CSP and frequently update the index structure on the CSP, which leads to very high cost at the DO.

For the scenario mentioned above, we propose a scheme that realizes privacy-preserving multidimensional range queries on real-time data for cloud storage, based on a bucketization scheme. Since we are dealing with real-time data, we divide time into \( N \) epochs and adopt an asymmetric key-insulated method to automatically update the key at each epoch. Thus the DO only needs to issue keys to the DCs every \( N \) epochs and to query the CSP when necessary. DCs directly upload the collected real-time data and indexing information to the CSP. Furthermore, since the CSP is semi-trusted and may submit forged or incomplete query results to the DO, it is necessary to guarantee the integrity of query results.

Our Contributions: To summarize, the contributions of this paper are: (1) We divide time into \( N \) epochs and apply key-insulated technology, which is based on public-key cryptography and supports periodic key updates, to the bucketization method. Compared with the trivial approach, there are two advantages: (i) DCs undertake most of the work, including collecting data, generating indexes and uploading data, which radically reduces the DO's cost relative to the trivial approach. (ii) Key distribution is simple, because the keys of each epoch can be calculated from the keys of the previous epoch and the key distribution cycle is \( N \) epochs, where \( N \) is a parameter of the key-insulated technology. (2) Our scheme supports integrity verification of query results: it can detect whether the semi-trusted CSP omits some query results, thereby avoiding query-result incompleteness.

Organization: Section 2 introduces related work. Our scheme is presented in Sect. 3 and its security analysis in Sect. 4. Experiment results are provided in Sect. 5. Section 6 concludes the paper and Sect. 7 contains the acknowledgements.

2 Related Work

2.1 Solutions for Multidimensional Range Query

So far, there are mainly three types of solutions for multidimensional range queries:

(1) Methods that use specialized data structures [1–3]: The performance of these methods often degrades as the dimensionality of the data increases. Shi et al. [1] propose a binary interval tree to represent ranges efficiently and apply it to multidimensional range queries, but their scheme cannot avoid unwanted information leakage because it adapts one-dimensional search techniques to the multidimensional case. A binary string-based encoding of ranges is proposed by Li, J. et al. in [3]; however, since the server has to check each record during query processing, the access mechanism of their scheme is inefficient. (2) Order-preserving encryption (OPE)-based techniques [4, 5]: OPE preserves the order of plaintext data in the ciphertext domain, but it is susceptible to statistical attacks because the encryption is deterministic (i.e., a given plaintext always encrypts to the same ciphertext, so the frequencies of distinct values in the dataset are often revealed). (3) Bucketization-based techniques [6–8]: Since bucketization partitions and indexes data by simple distributional properties, it enables efficient queries while keeping information disclosure to a minimum. Hacigumus et al. [6] first proposed the bucketization-based representation for query processing, and many researchers have since adopted bucketization-based techniques for multidimensional range queries [7–10].

2.2 Key-Insulated Technology

Key-insulated technology [9–13] is based on public-key cryptography. A scheme is called (\( m, \, N \))-key-insulated [10] if it satisfies the following notion: assuming that exposures of the secret keys used for decrypting data are inevitable and that up to \( m < N \) epochs can be compromised, the secret key (which decrypts data encrypted under the public key) is refreshed at discrete time epochs via interaction with a secure device, while public key updates can be done independently. The security objective is to minimize the effect of compromised epochs: the only thing an adversary can do is break the security of the data in the epochs that are compromised.

3 Privacy-Preserving Multidimensional Range Query

In this section, we present a scheme for privacy-preserving multidimensional range query which can be applied in the case where data are submitted in real time.

We describe real-time data as a tuple of attribute values \( \left\{ {A_{1} ,A_{2} , \cdots ,A_{d} } \right\} \), where \( d \ge 1 \) denotes the number of attributes; such data are referred to as multidimensional data. We consider the following type of multidimensional range query:

$$ \left( {epoch = t} \right) \wedge \left( {l_{a} \le A_{j} \le l_{b} } \right),j \in \left[ {1,d} \right], $$

where \( t \) denotes the epoch of interest (\( t \in \left[ {1,N} \right] \)) and \( \left[ {l_{a} ,l_{b} } \right] \) denotes the queried range of attribute \( A_{j} \). Moreover, this type of range query can be extended to multiple range queries that involve several epochs and several attributes.

3.1 Our System Model

Figure 2 shows our system model. We assume that the DO, the DCs and the key-update device are trusted, while the CSP is semi-trusted and curious. The key-update device is physically secure but computationally limited; it is used only for key updates, not for the actual cryptographic computations.

Fig. 2. System model

To set up the system, the DO generates an initial decryption key, a master encryption key and a master key for each DC. It then distributes the master encryption keys to the corresponding DCs and the master keys to the key-update device. Detailed system setup is provided in Sect. 3.2.

At the end of each epoch, each DC generates an encryption key from its master encryption key and uses it to encrypt the data it collected in that epoch. In addition to partitioning the data into buckets by the bucketization-based method, each DC also generates verifying numbers for empty buckets (buckets that contain no data), which are used in our integrity verification of query results. Finally, at each epoch, each DC uploads to the CSP the bucket IDs, the encrypted data and the verifying numbers. Detailed data processing in each epoch is provided in Sect. 3.3.

The DO sometimes needs to query the data stored on the CSP. The DO first translates plaintext queries into bucket IDs and sends them to the CSP. The CSP then returns a set of encrypted results together with a proof generated from the corresponding verifying numbers. Once the DO receives the results, it first verifies query-result integrity using the proof. If verification succeeds, the DO updates the decryption key(s) by interacting with the key-update device, decrypts the encrypted results and filters them to obtain the wanted query results. Note that in our scheme, the initial decryption keys are used only once, when the DO generates the decryption keys of the first epoch (\( t = 1 \)); afterwards the DO can update to the decryption keys of any desired epoch in one shot. Detailed query processing is shown in Sect. 3.4.

3.2 System Setup

For each \( DC_{i} \), the DO generates a master encryption key, an initial decryption key and a master key by the key generation algorithm \( {\mathcal{G}}\left( {1^{k} ,m,N} \right) \to \left( {PK_{i}^{*} ,SK_{i}^{ * } ,SK_{i,0} } \right) \).

In detail, given a security parameter \( 1^{k} \), the maximum number \( m \) of epochs whose exposure is tolerated and the number of epochs \( N \), the DO performs the following steps for \( DC_{i} \):

Step 1. It randomly chooses a prime \( q \) (\( \left| q \right| = k \)) such that \( p = 2q + 1 \) is also prime. Let \( {\mathbb{G}} \subset {\mathbb{Z}}_{p}^{*} \) be the unique subgroup of order \( q \), in which the DDH assumption is assumed to hold. Then it chooses \( g,h \in {\mathbb{G}} \) at random.

Step 2. It selects \( x_{0}^{*} ,y_{0}^{*} , \cdots ,x_{m}^{*} ,y_{m}^{*} \) at random from \( {\mathbb{Z}}_{q} \);

Step 3. It computes \( z_{0}^{*} = g^{{x_{0}^{*} }} h^{{y_{0}^{*} }} , \cdots ,z_{m}^{*} = g^{{x_{m}^{*} }} h^{{y_{m}^{*} }} \);

Step 4. Finally, it outputs:

the master encryption key, \( PK_{i}^{*} = \left( {g,h,z_{0}^{*} , \cdots ,z_{m}^{*} } \right) \);

the initial decryption key, \( SK_{i,0} = (x_{0}^{*} ,y_{0}^{*} ) \);

the master key, \( SK_{i}^{*} = \left( {x_{1}^{*} ,y_{1}^{*} , \cdots ,x_{m}^{*} ,y_{m}^{*} } \right) \).

Step 5. It distributes the master encryption key \( PK_{i}^{*} \) to \( DC_{i} \), and the master key \( SK_{i}^{*} \) to the key-update device.
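To make the setup concrete, below is a minimal Python sketch of \( {\mathcal{G}}\left( {1^{k} ,m,N} \right) \) under the assumptions above. It uses sympy for the safe-prime search, and the toy parameter size and all function names are illustrative, not part of the paper.

```python
import secrets
from sympy import isprime, randprime  # assumed helper library for prime search

def keygen(k_bits, m):
    """Sketch of G(1^k, m, N) -> (PK_i*, SK_i*, SK_{i,0}) for one DC."""
    # Step 1: find a prime q with |q| = k such that p = 2q + 1 is also prime,
    # so the squares mod p form the unique subgroup G of order q.
    while True:
        q = randprime(2 ** (k_bits - 1), 2 ** k_bits)
        p = 2 * q + 1
        if isprime(p):
            break
    g = pow(secrets.randbelow(p - 3) + 2, 2, p)  # random element of G
    h = pow(secrets.randbelow(p - 3) + 2, 2, p)  # random element of G
    # Step 2: sample x_0*, y_0*, ..., x_m*, y_m* from Z_q.
    xs = [secrets.randbelow(q) for _ in range(m + 1)]
    ys = [secrets.randbelow(q) for _ in range(m + 1)]
    # Step 3: z_l* = g^{x_l*} h^{y_l*} mod p.
    zs = [pow(g, xs[l], p) * pow(h, ys[l], p) % p for l in range(m + 1)]
    # Step 4: master encryption key (to DC_i), initial decryption key
    # (kept by the DO) and master key (to the key-update device).
    PK_star = (p, q, g, h, zs)
    SK_0 = (xs[0], ys[0])
    SK_star = list(zip(xs[1:], ys[1:]))
    return PK_star, SK_star, SK_0

# Toy usage: 32-bit q is far too small for security, but fast to run.
PK_star, SK_star, SK_0 = keygen(32, m=5)
```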

3.3 Data Processing in Each Epoch

Each DC partitions the data it collects into buckets by the bucketization-based method [14]. Note that in our scheme the bucket IDs serve as the index. Due to space constraints, we do not describe the bucketization-based method in detail. For convenience, we make the following assumptions:

  • We denote by \( \varvec\Omega \) the set of all buckets, both non-empty and empty, that \( DC_{i} \) generates in epoch \( t \).

  • We denote by \( Y_{i,t} \) the number of non-empty buckets \( DC_{i} \) generates in epoch \( t \); these non-empty buckets are denoted by \( {\mathbf{B}}_{i,t} = \left\{ {b_{1} ,b_{2} , \cdots ,b_{{Y_{i,t} }} } \right\} \subseteq {\varvec{\Omega}} \), with \( j \in \left[ {1,Y_{i,t} } \right] \) indexing them.

At the end of each epoch, each DC generates the encryption key for that epoch from its master encryption key and uses it to encrypt all non-empty buckets. Let's consider the encryption work that \( DC_{i} \) does at the end of epoch \( t \). Given the master encryption key \( PK_{i}^{*} = \left( {g,h,z_{0}^{*} , \cdots ,z_{m}^{*} } \right) \) and the epoch value \( t \), \( DC_{i} \) proceeds as follows:

Step 1. It computes \( PK_{i,t} = \prod\nolimits_{l = 0}^{m} {\left( {z_{l}^{*} } \right)^{{t^{l} }} } \);

Step 2. It selects \( r \in {\mathbb{Z}}_{q} \) at random and computes the ciphertext of the data \( D_{j} \) falling into bucket \( b_{j} \): \( \left\{ {D_{j} } \right\}_{{PK_{i,t} }} = \left( {g^{r} ,h^{r} ,\left( {PK_{i,t} } \right)^{r} \cdot D_{j} } \right) \).
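As a concrete sketch, the two steps above can be written as follows, reusing the key tuple from the key generation sketch; the payload \( D_{j} \) is modeled as an integer modulo \( p \), and all names are illustrative.

```python
import secrets

def epoch_pk(PK_star, t):
    """Step 1: PK_{i,t} = prod_{l=0}^{m} (z_l*)^(t^l) mod p."""
    p, q, g, h, zs = PK_star
    pk = 1
    for l, z in enumerate(zs):
        pk = pk * pow(z, pow(t, l, q), p) % p  # exponents reduced mod q (group order)
    return pk

def encrypt(PK_star, t, D):
    """Step 2: {D}_{PK_{i,t}} = (g^r, h^r, (PK_{i,t})^r * D) for random r in Z_q."""
    p, q, g, h, zs = PK_star
    r = secrets.randbelow(q)
    pk = epoch_pk(PK_star, t)
    return (pow(g, r, p), pow(h, r, p), pow(pk, r, p) * D % p)
```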

Although the above encryption method ensures data confidentiality (as the CSP does not know the decryption key), the CSP may still omit some data that satisfy the query, leading to query-result incompleteness. Our solution is that \( DC_{i} \) generates a verifying number \( num\left( {b_{k} ,i,t} \right) \) for each empty bucket \( b_{k} \in \varvec\Omega \backslash {\mathbf{B}}_{i,t} \) at the end of epoch \( t \): \( num\left( {b_{k} ,i,t} \right) = h_{a} \left( {i\parallel t\parallel b_{k} \parallel PK_{i,t} } \right) \), where \( h_{a} (\cdot) \) denotes a hash function with \( a \)-bit output. Finally, \( DC_{i} \) uploads to the CSP all encrypted non-empty buckets and verifying numbers with their respective bucket IDs as follows:

$$ i,t,\left\{ {b_{j} ,\left\{ {D_{j} } \right\}_{{PK_{i,t} }} |b_{j} \in {\mathbf{B}}_{i,t} } \right\},\left\{ {b_{k} ,num\left( {b_{k} ,i,t} \right)|b_{k} \in \varvec\Omega \backslash {\mathbf{B}}_{i,t} } \right\}. $$
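One possible instantiation of the verifying numbers, assuming SHA-256 truncated to \( a \) bits and a simple field separator (both illustrative choices not fixed by the paper):

```python
import hashlib

def verifying_number(i, t, bucket_id, pk_it, a_bits=32):
    """num(b_k, i, t) = h_a(i || t || b_k || PK_{i,t}), truncated to a bits."""
    msg = f"{i}|{t}|{bucket_id}|{pk_it}".encode()
    digest = int.from_bytes(hashlib.sha256(msg).digest(), "big")
    return digest >> (256 - a_bits)  # keep the a most significant bits
```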

3.4 Query Processing

3.4.1 Decryption Key Update

The key update here is semantically secure, which can be proved under the DDH assumption. We note that the key-update device is unnecessary in a scenario where the DO's storage is secure (decryption keys cannot be exposed). Here, we assume that exposures of decryption keys are inevitable and that up to \( m < N \) epochs can be compromised (where \( m \) is a parameter). An adversary cannot derive any decryption key even if it compromises the key-update device alone.

Let's consider the update from the decryption key \( SK_{i,t} \), which can decrypt the data uploaded by \( DC_{i} \) in epoch \( t \), to the decryption key \( SK_{{i,t^{\prime}}} \) (\( t,t^{\prime} \in \left[ {1,N} \right],t < t^{\prime} \)).

First, the key-update device generates a partial decryption key \( SK_{{i,t^{\prime}}}^{\prime } \) of epoch \( t^{\prime} \) by the device key-update algorithm \( {\mathcal{D}\mathcal{K}\mathcal{U}}\left( {t^{\prime},SK_{i}^{*} } \right) \to SK_{{i,t^{\prime}}}^{\prime } \), which takes the epoch value \( t^{\prime} \) and the master key \( SK_{i}^{*} = \left( {x_{1}^{*} ,y_{1}^{*} , \cdots ,x_{m}^{*} ,y_{m}^{*} } \right) \) and outputs the partial decryption key \( SK_{{i,t^{\prime}}}^{\prime } = \left( {x_{{t^{\prime}}}^{\prime } ,y_{{t^{\prime}}}^{\prime } } \right) \) by computing \( x_{{t^{\prime}}}^{\prime } \equiv \sum\nolimits_{l = 1}^{m} {x_{l}^{*} \left( {\left( {t^{\prime}} \right)^{l} - t^{l} } \right)} \) and \( y_{{t^{\prime}}}^{\prime } \equiv \sum\nolimits_{l = 1}^{m} {y_{l}^{*} \left( {\left( {t^{\prime}} \right)^{l} - t^{l} } \right)} \) (mod \( q \)). We assume that exposure of a partial decryption key is less likely than exposure of a decryption key.

Second, the DO uses the partial decryption key \( SK_{{i,t^{\prime}}}^{\prime } \) to generate the decryption key \( SK_{{i,t^{\prime}}} \) by the DO key-update algorithm \( {\mathcal{D}\mathcal{O}\mathcal{K}\mathcal{U}}\left( {t^{\prime},SK_{i,t} ,SK_{{i,t^{\prime}}}^{\prime } } \right) \to SK_{{i,t^{\prime}}} \), which takes the epoch value \( t^{\prime} \), the decryption key \( SK_{i,t} = \left( {x_{t} ,y_{t} } \right) \) of epoch \( t \) and the partial decryption key \( SK_{{i,t^{\prime}}}^{\prime } = \left( {x_{{t^{\prime}}}^{\prime } ,y_{{t^{\prime}}}^{\prime } } \right) \), and outputs the decryption key \( SK_{{i,t^{\prime}}} = \left( {x_{{t^{\prime}}} ,y_{{t^{\prime}}} } \right) \) by computing \( x_{{t^{\prime}}} = x_{t} + x_{{t^{\prime}}}^{\prime } \) and \( y_{{t^{\prime}}} = y_{t} + y_{{t^{\prime}}}^{\prime } \).
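A sketch of both update algorithms, with all arithmetic over \( {\mathbb{Z}}_{q} \); `SK_star` is the list \( \left( {x_{l}^{*} ,y_{l}^{*} } \right)_{l = 1}^{m} \) from the key generation sketch, and the function names are illustrative.

```python
def device_key_update(SK_star, t, t_new, q):
    """DKU: partial key (x', y') with x' = sum x_l*((t')^l - t^l) mod q, likewise y'."""
    dx = sum(x * (t_new ** l - t ** l) for l, (x, _) in enumerate(SK_star, 1)) % q
    dy = sum(y * (t_new ** l - t ** l) for l, (_, y) in enumerate(SK_star, 1)) % q
    return dx, dy

def do_key_update(SK_t, partial, q):
    """DOKU: SK_{i,t'} = SK_{i,t} + SK'_{i,t'} componentwise mod q."""
    (xt, yt), (dx, dy) = SK_t, partial
    return (xt + dx) % q, (yt + dy) % q
```

Since \( x_{t} = \sum\nolimits_{l = 0}^{m} {x_{l}^{*} t^{l} } \), adding the partial key indeed yields \( x_{{t^{\prime}}} = \sum\nolimits_{l = 0}^{m} {x_{l}^{*} \left( {t^{\prime}} \right)^{l} } \), matching \( PK_{{i,t^{\prime}}} = g^{{x_{{t^{\prime}}} }} h^{{y_{{t^{\prime}}} }} \).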

3.4.2 Query and Verification

Here, we make the following assumptions for convenience: (1) We denote by \( \alpha \) the average number of non-empty buckets generated by all DCs in each epoch; (2) \( \left\langle {t,Q_{t} } \right\rangle \) denotes a query, where \( Q_{t} \) represents the set of queried bucket IDs.

When the DO wants to query some data, it first translates the plaintext query into \( \left\langle {t,Q_{t} } \right\rangle \) by the bucketization-based partition method. After receiving \( \left\langle {t,Q_{t} } \right\rangle \), the CSP searches the data uploaded by the DCs in epoch \( t \) for all buckets satisfying \( Q_{t} \), and then computes a hash over the concatenated verifying numbers, from all DCs, whose empty-bucket IDs intersect \( Q_{t} \):

$$ NUM_{{Q_{t} }} = h_{b} \left( {\mathop {||}\limits_{{b_{k} \in Q_{t} \cap \varvec\Omega \backslash {\mathbf{B}}_{i,t} ,i \in \left[ {1,n} \right],t \in \left[ {1,N} \right]}} num\left( {b_{k} ,i,t} \right)} \right), $$

where \( h_{b} (\cdot) \) is a hash function with \( b \)-bit output. Afterwards, the CSP returns the query results as follows:

$$ i,t,\left\{ {b_{j} ,\left\{ {D_{j} } \right\}_{{PK_{i,t} }} |b_{j} \in Q_{t} \cap {\mathbf{B}}_{i,t} } \right\},\left\{ {b_{k} ,NUM_{{Q_{t} }} |b_{k} \in Q_{t} \cap \varvec\Omega \backslash {\mathbf{B}}_{i,t} ,i \in \left[ {1,n} \right],t \in \left[ {1,N} \right]} \right\}. $$

Since the DO knows \( PK_{i}^{*} \), it can compute all the corresponding verifying numbers and then compute its own \( NUM_{{Q_{t} }}^{\prime } \). If \( NUM_{{Q_{t} }}^{\prime } = NUM_{{Q_{t} }} \), the DO concludes that the CSP did not omit query results (otherwise it did) and uses the corresponding decryption key(s) to decrypt the results by computing \( D_{j} = \left\{ {D_{j} } \right\}_{{PK_{i,t} }} / \left( {g^{{rx_{t} }} h^{{ry_{t} }} } \right) \). We note that the DO can query the data uploaded by several DCs in several epochs.
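A DO-side sketch combining verification and decryption, reusing the helpers above; the concatenation order over claimed-empty buckets and the use of full SHA-256 for \( h_{b} \) are illustrative assumptions:

```python
import hashlib

def decrypt(ct, SK_t, p):
    """D = C / (g^{r x_t} h^{r y_t}) via a modular inverse (Python 3.8+)."""
    c1, c2, c3 = ct
    xt, yt = SK_t
    denom = pow(c1, xt, p) * pow(c2, yt, p) % p
    return c3 * pow(denom, -1, p) % p

def verify_and_decrypt(results, empty_ids, proof, PK_star, SK_t, i, t):
    """Recompute NUM'_{Q_t} from the claimed-empty bucket IDs and compare
    with the CSP's proof before decrypting the returned buckets."""
    p = PK_star[0]
    pk_it = epoch_pk(PK_star, t)
    concat = b"".join(str(verifying_number(i, t, b, pk_it)).encode()
                      for b in sorted(empty_ids))
    if hashlib.sha256(concat).digest() != proof:
        raise ValueError("integrity check failed: CSP may have omitted results")
    return {b: decrypt(ct, SK_t, p) for b, ct in results.items()}
```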

However, the query results always contain some superfluous data items (false positives) that the DO does not really want, as the queried ranges may not exactly span full buckets. Using finer buckets can reduce such false positives, but it allows an adversary to estimate the data distribution more accurately and thus increases the risk of information disclosure inherent in bucketization. To address this, we can apply the maximum-entropy principle to each bucket so that the distribution of sensitive attributes within each bucket is uniform, thereby minimizing the risk of disclosure. We can also draw on [15, 16] for optimal bucketing strategies that achieve a good balance between reducing false positives and reducing the risk of information disclosure.

4 Security Analysis

4.1 Analysis on Integrity Verification

As discussed above, the CSP may omit some query results, leading to query incompleteness. Let's consider the probability that this misbehavior of the CSP can be detected. We assume that each of the \( \alpha \) non-empty buckets is queried with probability \( \gamma \) and omitted with probability \( \delta \), so the expected number of omitted buckets across all DCs is \( \alpha \gamma \delta \). To escape detection by the DO, the CSP must return a correct \( NUM^{\prime}_{{Q_{t} }} \) corresponding to the incomplete query results, and the probability of guessing a correct \( NUM^{\prime}_{{Q_{t} }} \) is \( 2^{ - a} \). So the probability that the misbehavior of the CSP is detected is:

$$ P_{\det } = 1 - 2^{ - a\alpha \gamma \delta } . $$

The misbehavior of the CSP is more likely to be detected as \( a\alpha \gamma \delta \) grows.
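For instance, plugging illustrative values into the formula (these numbers are ours, not the paper's):

```python
# With a = 32-bit verifying numbers, alpha = 1000 non-empty buckets,
# query probability gamma = 0.3 and omission probability delta = 0.01,
# the exponent a*alpha*gamma*delta = 96, so detection is essentially certain.
a, alpha, gamma, delta = 32, 1000, 0.3, 0.01
P_det = 1 - 2 ** -(a * alpha * gamma * delta)  # = 1 - 2^{-96}, rounds to 1.0 in floats
```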

4.2 Analysis on Impact of Compromised DCs

In this section, we analyze the case where an adversary additionally compromises some DCs to help the CSP escape our query-result integrity verification. With the master encryption keys revealed by the compromised DCs, the CSP can derive all of their verifying numbers and omit the buckets uploaded by those DCs without being detected. Note that the behavior of compromised DCs cannot affect non-compromised DCs, so the performance of the integrity verification method is unaffected as long as non-compromised DCs remain the majority.

Specifically, we assume that each DC is compromised with probability \( p_{c} \ll {1 \mathord{\left/ {\vphantom {1 2}} \right. \kern-0pt} 2} \). Consider the following case: the CSP omits some buckets generated by the \( n^{\prime} = \left( {1 - p_{c} } \right)n \) non-compromised DCs, which together generate \( \alpha^{\prime} = \alpha \left( {1 - p_{c} } \right) \) non-empty buckets. The analysis of integrity verification still holds after replacing \( n \) with \( n^{\prime} \) and \( \alpha \) with \( \alpha^{\prime} \).

4.3 Attacks on the (\( m, \, N \))-Key-Insulated Method

Let's analyze the impact of a compromised DO with insecure storage and of a compromised key-update device. To model key exposure attacks, it is assumed that the adversary has access to a key exposure oracle (which takes \( t \) and returns the temporary decryption key), a left-or-right encryption oracle [17] (which takes an epoch value and plaintext data and returns ciphertext) and a decryption oracle (which takes an epoch value and ciphertext and returns plaintext).

Under the DDH assumption, the (\( m, \, N \))-key-insulated method is semantically secure. The authors of [10] proved that the (\( m, \, N \))-key-insulated method withstands the following three types of key exposure; we omit the proof due to space constraints: (1) Ordinary key exposure: occurs when the adversary compromises the insecure storage of the DO (i.e., \( SK_{i,t} \) is leaked); (2) Key-update exposure: occurs when the adversary breaks into the insecure storage of the DO during the key-update step (i.e., between epoch \( t - 1 \) and epoch \( t \)) and obtains \( SK_{i,t - 1} \) and \( SK_{i,t}^{\prime } \), from which it can generate \( SK_{i,t} \); (3) Master key exposure: occurs when the adversary compromises the key-update device (i.e., \( SK_{i}^{*} \) is leaked).

5 Experiment

To evaluate the performance of our scheme, we compare it with the trivial approach described in Sect. 1. We assume that the DO in the trivial approach adopts the AES-128 algorithm and the same bucketization-based method as our scheme. In both approaches, time is divided into epochs. The cost of generating bucket IDs can be ignored in both schemes due to its simplicity. Let \( k \) be the total number of buckets in each epoch and \( \lambda \) the percentage of non-empty buckets among all buckets. We denote by \( \gamma \) the probability of each bucket being queried. Since each epoch is short and the data collected per epoch and per bucket is small, we suppose for simplicity that each non-empty bucket holds 4096 bits of data per epoch.

Primitive Operations: The main (primitive) operations used in our scheme include:

(1) Modular exponentiation;

(2) Modular multiplication;

(3) SHA-256 computation of 1024-bit integers.

The time needed for these operations was benchmarked in [18] as follows:

(1) The average time to raise a 1024-bit number to a 270-bit exponent and reduce it modulo the 1024-bit modulus was \( t_{1} = 1.5 \) ms;

(2) The time for a modular multiplication of 270-bit numbers was \( t_{2} = 0.00016 \) ms;

(3) The average time to compute the 256-bit digest of a 1024-bit number was \( t_{3} = 0.01 \) ms.

For efficiency, our scheme can adopt the AES-128 algorithm to encrypt the data in buckets and then use the keys generated by the (\( m, \, N \))-key-insulated method to encrypt the symmetric keys. Since AES-128 is efficient and the amount of data encrypted with it is the same in the trivial approach and in our scheme, we do not count the cost of AES-128 when comparing the performance of the two.

Next, we compare the cost of the trivial approach and our scheme, covering system setup, data processing in each epoch and query processing.

In the following, \( t^{\prime\prime} \) denotes the number of queried epochs.

System Setup. The DO in our scheme generates an initial decryption key, a master encryption key and a master key for each DC, while the DO in the trivial approach only generates a key for the AES-128 algorithm. In addition, the DO in our scheme distributes the master encryption key and the master key to the corresponding DC and to the key-update device, respectively. Ignoring transmission time, the system setup time for each DC is \( 2\left( {m + 1} \right)t_{1} + \left( {m + 1} \right)t_{2} \). This cost is acceptable for two reasons: (i) it is linear in \( m \), which can be very small even for high security requirements; (ii) the DO only needs to generate these keys once every \( N \) epochs.
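Plugging in the benchmarked primitive times with an illustrative \( m = 5 \) (our choice, matching the figures below):

```python
m, t1, t2 = 5, 1.5, 0.00016  # t1, t2 in ms, from [18]
setup_ms = 2 * (m + 1) * t1 + (m + 1) * t2
# ~18.0 ms per DC, incurred only once every N epochs
```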

Data Processing in Each Epoch. The DO in the trivial approach needs to encrypt, and generate bucket IDs for, the data uploaded by all DCs, and it also needs to frequently upload these encrypted data and bucket IDs to the CSP; the communication cost at the DO would be high for real-time data. In our scheme, however, each DC undertakes this work for its own data and the DO does not need to do anything for data processing. Each DC in our scheme additionally generates its encryption key for the current epoch; the key generation time is \( 2\left( {m + 1} \right)t_{1} + mt_{2} \), which is acceptable for small \( m \). Each DC also generates verifying numbers for empty buckets, which improves security against the semi-trusted CSP that the trivial approach does not consider. The average time for each DC to encrypt \( {{\lambda k} \mathord{\left/ {\vphantom {{\lambda k} n}} \right. \kern-0pt} n} \) non-empty buckets and generate verifying numbers for \( {{(1 - \lambda )k} \mathord{\left/ {\vphantom {{(1 - \lambda )k} n}} \right. \kern-0pt} n} \) empty buckets in each epoch is \( {{\left( {3t_{1} + t_{2} } \right)\lambda k} \mathord{\left/ {\vphantom {{\left( {3t_{1} + t_{2} } \right)\lambda k} n}} \right. \kern-0pt} n} + {{(1 - \lambda )kt_{3} } \mathord{\left/ {\vphantom {{(1 - \lambda )kt_{3} } n}} \right. \kern-0pt} n} \). The additional time DCs spend on verifying-number generation and encryption-key update is worthwhile because it improves the security of the system. The total data processing time in each epoch for each DC is as follows:

$$ 2\left( {m + 1} \right)t_{1} + mt_{2} + {{\left( {3t_{1} + t_{2} } \right)\lambda k} \mathord{\left/ {\vphantom {{\left( {3t_{1} + t_{2} } \right)\lambda k} n}} \right. \kern-0pt} n} + {{(1 - \lambda )kt_{3} } \mathord{\left/ {\vphantom {{(1 - \lambda )kt_{3} } n}} \right. \kern-0pt} n} $$
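Evaluating this formula with the benchmarked times and illustrative parameters (our choices, in the range of Fig. 3):

```python
m, n, k, lam = 5, 100, 2000, 0.5      # lam is the non-empty fraction lambda
t1, t2, t3 = 1.5, 0.00016, 0.01       # ms, from [18]
epoch_ms = (2 * (m + 1) * t1 + m * t2        # per-epoch encryption-key derivation
            + (3 * t1 + t2) * lam * k / n    # encrypting lam*k/n non-empty buckets
            + (1 - lam) * k * t3 / n)        # hashing (1-lam)*k/n empty buckets
# ~63.1 ms per DC per epoch for these values
```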

Results are shown in Fig. 3. The four lines denote the time for data processing with \( k = 1000 \), \( k = 2000 \), \( k = 3000 \) and \( k = 4000 \), respectively. The time increases as \( \lambda \) increases. The experiment results show that the cost of data processing in each epoch is acceptable for each DC.

Fig. 3. The time for data processing work in each epoch for each DC with different numbers of buckets (\( n = 100 \), \( m = 5 \)).

Query Processing. In addition to transforming queries and decrypting query results, the DO in our scheme needs to update decryption keys and verify the integrity of query results, but these costs arise only when the DO actually queries data. Let's first consider the case where the DO queries the data uploaded by one DC in one epoch. The query results contain \( {{k\lambda \gamma } \mathord{\left/ {\vphantom {{k\lambda \gamma } n}} \right. \kern-0pt} n} \) non-empty buckets, so the time to decrypt them is \( {{2\lambda \gamma k\left( {t_{1} + t_{2} } \right)} \mathord{\left/ {\vphantom {{2\lambda \gamma k\left( {t_{1} + t_{2} } \right)} n}} \right. \kern-0pt} n} \). Moreover, the time for key update is \( 4m\left( {t_{1} + t_{2} } \right) \) and the time for computing \( NUM_{{Q_{t} }} \) is \( {{\left( {1 - \lambda } \right)\gamma kt_{3} } \mathord{\left/ {\vphantom {{\left( {1 - \lambda } \right)\gamma kt_{3} } n}} \right. \kern-0pt} n} \). So the total time is:

$$ 4m\left( {t_{1} + t_{2} } \right) + {{2\lambda \gamma k\left( {t_{1} + t_{2} } \right)} \mathord{\left/ {\vphantom {{2\lambda \gamma k\left( {t_{1} + t_{2} } \right)} n}} \right. \kern-0pt} n} + {{\left( {1 - \lambda } \right)\gamma kt_{3} } \mathord{\left/ {\vphantom {{\left( {1 - \lambda } \right)\gamma kt_{3} } n}} \right. \kern-0pt} n}. $$
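With the same benchmarks and the parameters of Fig. 4 (again illustrative values):

```python
m, n, k, lam, gamma = 5, 100, 2000, 0.5, 0.3
t1, t2, t3 = 1.5, 0.00016, 0.01       # ms, from [18]
query_ms = (4 * m * (t1 + t2)                      # decryption-key update
            + 2 * lam * gamma * k * (t1 + t2) / n  # decrypting lam*gamma*k/n buckets
            + (1 - lam) * gamma * k * t3 / n)      # recomputing NUM'_{Q_t}
# ~39.0 ms for one DC and one epoch
```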

Results are shown in Fig. 4. The four lines denote the time for querying the data submitted by \( DC_{i} \) in one epoch with \( k = 1000 \), \( k = 2000 \), \( k = 3000 \) and \( k = 4000 \), respectively. As expected, the query time increases with the number of buckets. In actual scenarios such as radiation detection and traffic monitoring, the DO typically queries data collected by sensing devices in a specific area over a limited number of epochs, so the cost of query processing in our scheme is acceptable.

Fig. 4. The time for query processing with different numbers of buckets (\( n = 100 \), \( m = 5 \), \( \gamma = 0.3 \)).

Table 1 compares our scheme with the trivial approach. Note that the value of \( m \) is small in actual applications. The cost at the DO in our scheme is radically reduced because the data processing work is shared among the \( n \) DCs. Additionally, since we adopt the (\( m, \, N \))-key-insulated method to update keys in each epoch, the data in our scheme are more secure than in the trivial approach, where data are encrypted under an unchanged key for all \( N \) epochs. Furthermore, our integrity verification method can determine whether the semi-trusted CSP has omitted query results. The additional costs of key update and integrity verification are acceptable.

Table 1. The comparison between the trivial approach and our scheme in each epoch

6 Conclusion

In this paper, we are the first to construct a scheme for realizing privacy-preserving multidimensional range queries on real-time data. We apply the (\( m, \, N \))-key-insulated method to the bucketization method and radically reduce the cost of the DO. In our scheme, the DO does not need to do anything during data processing in each epoch and only executes queries when it wants to. Each DC undertakes the work of updating encryption keys, encrypting its data and generating verifying numbers in each epoch, and experiments show that the cost for each DC to do so is acceptable in practice. By using the (\( m, \, N \))-key-insulated method, which is semantically secure under the DDH assumption, we improve the security of the data (an adversary who compromises at most \( m \) epochs can only break the security of the data in the compromised epochs) and also simplify the DO's key distribution. Furthermore, we can verify whether the semi-trusted CSP omits query results, thereby ensuring query-result integrity. Because of space constraints, we leave further research on optimal methods for reducing false positives and the risk of information disclosure for future work.