1 Introduction

In recent years, P2P networks and applications have developed rapidly, but the limited local storage space of users has become a bottleneck for their development. With the unique advantages of CSS, more and more users store their data on cloud servers. This reduces local storage and computational overhead and has also enabled the rapid development of searchable encryption (SE) technology. The efficiency of an SE scheme is affected by factors such as the index structure, the trapdoors and the encryption methods. Since Song et al. [21] first proposed symmetric searchable encryption (SSE), SE technology has made tremendous progress in improving search efficiency. At the same time, the cloud servers in most current SE schemes are assumed to be honest-but-curious, whereas in reality a malicious server may perform illegal operations on the ciphertexts it stores. Researchers have therefore proposed verifiable searchable encryption (VSE) schemes. However, most existing VSE schemes only support static databases or rely on specific SE structures. Meanwhile, for most existing searchable encryption schemes that support dynamic updates, the update operations are performed on the cloud server, so forward and backward security is essential for dynamic searchable encryption (DSE) schemes. They are defined as follows:

  1. Forward privacy: If we search for a keyword and later add a new document containing that keyword, the cloud server does not learn that the new document contains a keyword we searched for in the past.

  2. Backward privacy: New queries cannot be executed over deleted documents.

In this paper, we propose a dynamic and verifiable multi-keyword ranked search scheme with forward and backward security for the P2P networking environment (DVMRS). The contributions of DVMRS can be summarized as follows:

  1. DVMRS realizes forward and backward security on top of the traditional inner product and secure kNN algorithms [16, 17, 28], which has been a challenging issue for past DSE schemes.

  2. DVMRS combines a Merkle tree with a time-stamp chain to verify the integrity and freshness of search results over dynamic databases.

We organize this paper as follows. In Section 2, we review related work on SE. In Section 3, we introduce the preliminaries, and the design goals, system model and security model are described in Section 4. We present our scheme in detail in Section 5. We then carry out the performance and security analysis in Section 6 and Section 7, respectively. Finally, we conclude and discuss future work in Section 8.

2 Related work

Song et al. [21] first proposed a searchable symmetric encryption scheme using sequential scanning for single-keyword search on encrypted data, which was proven secure. Then Boneh et al. [4] proposed a public-key searchable encryption scheme. Based on these two schemes, research on searchable encryption has made great progress in recent years.

Multi-keyword search

After a large amount of research, the schemes in [2, 5, 10] realized conjunctive multi-keyword search. Boneh et al. [5] proposed a public-key conjunctive query scheme that also supports subset and range queries. Wang et al. [2] designed a public-key scheme based on an inverted index; this approach uses private set intersection to support conjunctive multi-keyword queries. Wu et al. [27] constructed an efficient multi-keyword public-key searchable encryption scheme based on an inverted encrypted index structure and homomorphic encryption.

Ranked search

Ranked search sorts the search results according to some measure of relevance, so that search users can find the most relevant results faster. Wang et al. [24] used inverted indexes and constructed an order-preserving symmetric encryption scheme, but it only supports single-keyword search. Cao et al. [19] first constructed a searchable encryption scheme (MRSE) that supports multi-keyword ranked search using secure inner product computation with low computing and communication overhead; however, this scheme ignored the different importance of keywords. Fu et al. [8] used a stemming algorithm, LSH and Bloom filters to build a scheme that supports ranked and fuzzy search at the same time. Sun et al. [22] extended the index structure to a multi-dimensional binary (MDB) tree based on Fu et al. [8]. Dai et al. [3] proposed a privacy-preserving multi-keyword ranked search scheme for encrypted data in a hybrid cloud, based on a binary tree and a keyword partition algorithm built on equally divided k-means clustering. Zhang et al. [33] modified the computation of the term frequency (TF) value by incorporating the word positions and the file length.

Dynamic search

In practice, the files stored on a cloud server are not immutable, so file updates should be considered. In Goh et al. [9], file update is implemented through a Bloom filter. Kamara et al. [14] constructed a new dynamic encrypted index to implement dynamic updates. The scheme proposed by Wang et al. [25] implements efficient dynamic indexing under homomorphic encryption and pseudo-random padding. Wan and Deng [23] utilized a bilinear-map accumulation tree to achieve an updatable scheme. Du et al. [6] put forward a dynamic multi-client searchable encryption scheme that supports Boolean queries by incorporating client authorization information into the query token and index.

Verifiable search

Kamara et al. [13] first proposed a VSSE scheme for static databases. Wan and Deng [23] proposed a verification approach based on homomorphic MAC. Wang et al. [26] proposed an efficient verifiable keyword searchable encryption scheme utilizing aggregate keys to save computational resources. Zhang et al. [33] combined a binary tree and a Merkle tree based on inner product computation to implement data integrity verification.

3 Preliminaries

  • Incremental hash (IH). Incremental hash was first proposed by Bellare et al. [1] and was used by numerous existing SE schemes. An IH function is a collision-resistant function:

    $$ IH:{\{ 0,1\}^{*}} \to {\{ 0,1\}^{l}} $$
    (1)

    such that adding or subtracting the IH values of two random strings will not produce a collision.

  • Merkle tree. The Merkle tree was first proposed by Kamara et al. [18]. The Merkle tree in DVMRS is built on top of the tree index. First, the IH value corresponding to the file at each leaf node of the tree is calculated. Then the IH value of each non-leaf node is calculated by hashing the concatenation (denoted ||) of the IH values of its left and right child nodes. Repeating this process yields the IH value of the root node rt. In Fig. 1, we demonstrate how to build a Merkle tree.

  • Time-stamp chain. The time-stamp chain is a chain consisting of fixed update time-stamps upi, data update time-stamps TP(i,j), and query time-stamps Ti, where a fixed update time-stamp is the point in time after a fixed update period, a data update time-stamp is the point in time when data are updated, and a query time-stamp is the point in time when a user issues a query. In Fig. 2, we depict the composition of the time-stamp chain in DVMRS.

  • Minimum hash sub-tree (MHS). The MHS is the part of the Merkle tree that we utilize to help compute the root IH value for the search results. Take Fig. 1 as an example and suppose the requested document is f5: the MHS returned consists of the hash22 node and its two child nodes. When a search user delivers a search query to the CSP, the CSP sends back the final search results along with the MHS and the corresponding auxiliary information, which are introduced in Section 5. A minimal sketch of these three structures is given after this list.
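The following is a minimal, self-contained Python sketch of how these building blocks could fit together, assuming an AdHash-style incremental hash; the helper names (ih, merkle_levels, mhs_proof, recompute_root) are illustrative and are not the paper's notation.

```python
# A minimal, illustrative sketch of the three structures above: an AdHash-style
# incremental hash (IH), a Merkle tree whose leaves are file IH values, and the
# minimum hash sub-tree (MHS) used to recompute the root for one requested file.
import hashlib

MOD = 1 << 256                                    # IH range {0,1}^l with l = 256

def h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def ih(blocks) -> int:
    """AdHash-style IH: sum of per-block hashes mod 2^l, so adding or removing
    a block is a single modular addition or subtraction."""
    acc = 0
    for i, b in enumerate(blocks):
        acc = (acc + h(i.to_bytes(8, "big") + b)) % MOD
    return acc

def node_hash(left: int, right: int) -> int:
    """IH value of a non-leaf node: hash of the concatenation (||) of its children."""
    return h(left.to_bytes(32, "big") + right.to_bytes(32, "big"))

def merkle_levels(leaves):
    """All levels of the Merkle tree, bottom-up (level 0 = leaf IH values)."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:                          # duplicate the last node if odd
            cur = cur + [cur[-1]]
        levels.append([node_hash(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def mhs_proof(levels, idx):
    """Sibling IH values needed to recompute the root for leaf `idx`
    (the minimum hash sub-tree for a single requested file)."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sib = idx ^ 1
        proof.append((level[sib], sib < idx))     # (sibling value, sibling-on-the-left?)
        idx //= 2
    return proof

def recompute_root(leaf, proof):
    cur = leaf
    for sib, sib_is_left in proof:
        cur = node_hash(sib, cur) if sib_is_left else node_hash(cur, sib)
    return cur

# Example with eight files f1..f8; the requested document is f5, as in Fig. 1.
files = [[b"file-%d-block-%d" % (j, k) for k in range(3)] for j in range(1, 9)]
leaves = [ih(f) for f in files]
levels = merkle_levels(leaves)
rt = levels[-1][0]                                # root IH value rt
proof = mhs_proof(levels, 4)                      # MHS for f5 (index 4)
assert recompute_root(leaves[4], proof) == rt     # data user re-derives the root
```

Because the IH of a file is a modular sum of per-block hashes, updating one block of a file only changes its leaf by one modular addition and subtraction, after which the path to the root is recomputed as above.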

Fig. 1 Merkle tree

Fig. 2 Time-stamp chain

4 System design

4.1 Design goals

In order to realize the privacy protection, efficient search, multi-keyword ranked search and search results verification proposed in our scheme, we put forward the following design goals:

  • Search efficiency: The index structure of an SE scheme has a significant effect on search efficiency. In our scheme, a tree-based index is an effective way to achieve efficient search.

  • Dynamic update: Our scheme should support dynamic updating of data. When the data owner adds, deletes or updates local data, the same operations need to be executed on the data stored on the cloud server to keep the data consistent between the CSP and the data owner.

  • Privacy: In this scheme, we need to realize data privacy, index and query privacy, keyword privacy, trapdoor unlinkability, and forward and backward privacy. (1) Data Privacy. The CSP cannot deduce the corresponding plaintext by analyzing the stored ciphertext. (2) Index and Query Privacy. Both the query and the index are represented by vectors. The index vector contains information such as TF values, while the query vector contains information such as inverted document frequency (IDF) values, so their privacy must be protected. (3) Keyword Privacy. The CSP cannot distinguish the specific keywords. (4) Trapdoor Unlinkability. Trapdoors generated for the same query should be different, so that the CSP cannot link them. (5) Forward and Backward Privacy. In dynamic schemes, forward and backward privacy are indispensable requirements.

  • Results verification: In DVMRS, we need to verify the completeness, freshness and correctness of the search results. (1) Completeness. The CSP must return all matching search results. (2) Freshness. The results returned to the user should be up to date. (3) Correctness. The CSP cannot return search results containing irrelevant data to the users.

  • Multi-keyword search: Compared with single-keyword search, multi-keyword search better meets the search requirements of data users and achieves higher search efficiency.

4.2 System model

As shown in Fig. 3, DVMRS consists of three entities.

Fig. 3 System model

  • Data owner: The data owner uses a symmetric encryption algorithm to encrypt the local data and outsources it to the cloud service provider (CSP). Meanwhile, the data owner uses the data dictionary to build a tree-based index structure, which is transmitted to the CSP after encryption. The data owner sends the key that generates the search trapdoors and the symmetric key that encrypts the data to authorized users through a secret channel.

  • Cloud service provider (CSP): The CSP is an intermediate entity that stores the encrypted data and the corresponding indexes obtained from the data owner, and provides data access and search services to authorized data users. When an authorized user sends a trapdoor to the CSP, it returns the set of ciphertexts matching the trapdoor, together with the authenticator under the query time-stamp, to the data user for result verification.

  • Data users: An authorized user obtains the symmetric key and the secret key from the data owner. When a data user wants to search on the CSP, they first generate a search keyword set, then use the secret key to generate the corresponding trapdoor and send it to the CSP. The data user verifies the search results using the auxiliary information obtained from the CSP.

4.3 Security model

In the DVMRS model, the data users are authorized by the data owner to conduct searches, so we do not consider leakage from the data users. Meanwhile, communication between the data owner and the data users is conducted through a secret channel, so we do not need to consider leakage in the communication between them.

However, the CSP is considered curious and dishonest in this scheme: it may not follow the protocols, and it is curious about the data content, keywords and other additional information. DVMRS mainly considers two threat models:

  • Known Ciphertext Model: In this model, the CSP is only aware of encrypted information, which includes encrypted data, encrypted indexes and searching trapdoors.

  • Known Background Model: In this model, the CSP knows information beyond the encrypted information, such as document frequency and keyword frequency. This information can be used in statistical attacks to infer the keywords in query requests.

5 Scheme construction

5.1 Notations

Table 1 lists the symbols that need to be used in this scheme and the corresponding instructions.

Table 1 Notations summary

5.2 Algorithm construction

5.2.1 Data owner

The following algorithms are executed by the data owner:

  • KGen(1k) →{K1,K2,K3,ssk,spk}: This is a probabilistic algorithm run by the data owner. It takes a security parameter as input and outputs the secret keys and a random signing key pair (ssk,spk). K1 is used to encrypt the tree-based index, K2 is used to encrypt documents and K3 is used to generate the authenticator together with ssk. The secret key K1 = (S,M1,M2), where S is an (m + d + 1)-bit vector and M1 and M2 are two (m + d + 1) × (m + d + 1) invertible matrices. The vector S is a splitting indicator used to split the document vector into two random vectors, which hides the values of the document vector; M1 and M2 are used to encrypt the two split vectors.

  • Enc(F,K2) →{C}: The data owner uses secret key K2 to generate the ciphertexts of document set.

  • GenIndex(F,K1,K3,ssk) →{I,π}: This algorithm is divided into two sub-algorithms, BuildTree(F,K1) →{I} and GenMerkleTree(I,K3,ssk) →{π}. The data owner generates an m-bit vector Pj for each document Fj(j = 1,2,⋯,N), where each bit Pj[i] indicates the weight of keyword ti in Fj, i.e. Pj[i] = Value(ti,Fj). Then the data owner extends the vector Pj to an (m + d + 1)-bit vector \(P_{j}^{\prime } = {P_{j}} \mid \mid {V_{j}}\), where Vj is a (d + 1)-bit vector. To give the document collection a certain scalability, d should be set to the possible maximum size of the outsourced document collection. Assume the possible maximum similarity score between a search keyword set and a document is \({\max \limits _{s}}\); the data owner then chooses a random parameter g with \(g > {\max \limits _{s}}\). The vector Vj(j = 1,2,⋯,N) can be generated as shown in Algorithm 1, where g* represents an arbitrary integer multiple of g. Thus, Vj has the following form:

    $$ \begin{array}{@{}rcl@{}} {V_{1}} &=& ({g^{*}},{g^{*}}, {\cdots} ,{a_{1}},{g^{*}}, {\cdots} ,{g^{*}}, - {a_{1}})\\ {V_{2}} &=& ({g^{*}},{g^{*}},{a_{2}}, {\cdots} ,{g^{*}}, {\cdots} ,{g^{*}}, - {a_{2}})\\ {V_{3}} &=& ({g^{*}},{g^{*}}, {\cdots} ,{g^{*}},{a_{3}}, {\cdots} ,{g^{*}}, - {a_{3}}). \end{array} $$
    (2)
    Algorithm 1

    The vector \(P_{j}^{\prime }\) is encrypted with the secure kNN algorithm: the data owner uses the vector S to split \(P_{j}^{\prime }\) into two (m + d + 1)-dimensional vectors (pa,pb), where S functions as a splitting indicator. Namely, if S[i] = 0(i = 1,2,⋯,m + d + 1), pa[i] and pb[i] are both set to \({P_{j}^{\prime }}[i]\); if S[i] = 1(i = 1,2,⋯,m + d + 1), the value of \({P_{j}^{\prime }}[i]\) is randomly split into pa[i] and pb[i], i.e. \({P_{j}^{\prime }}[i]={p_{a}}[i]+{p_{b}}[i]\). Then the index of the encrypted document Cj is calculated as \(I = ({M_{1}^{T}}{p_{a}},{M_{2}^{T}}{p_{b}})\). Moreover, following the Merkle tree construction method in Section 3, the data owner builds the corresponding Merkle tree MI over the index tree. Finally, a key tuple \(({K_{1}},{K_{2}},\bar \sigma )\) is sent to the authenticated search users through a secure broadcast channel, where K2 is the symmetric key used to encrypt the documents outsourced to the CSP. The data owner publishes g and stores σ in her own storage space. Meanwhile, the data owner outputs the authenticator π, stores the tree index and the original authenticator locally, and then sends them to the CSP. A simplified sketch of the Vj padding appears after this list of algorithms.

  • PreUpdate(K1,K2,K3,ssk,f) →{τu,πi,j}: This algorithm is run by the data owner. It takes as input the symmetric keys K1, K2, K3, the secret signing key ssk and a file f to be updated, and outputs the update token τu and the new authenticator πi,j. The data owner sends τu and πi,j to the CSP.

  • \(Update(I,{\tau _{u}},\sigma ,\bar \sigma ) \to \{ {I^{\prime }},{\sigma ^{\prime }},{\bar \sigma ^{\prime }}\}\): This algorithm applies the update token. It takes the tree index, the update token τu, and the previous number sets σ and \(\bar \sigma \) as input, and outputs the new tree index and the updated number sets \({\sigma ^{\prime }}\) and \({\bar \sigma ^{\prime }}\). The Merkle tree is updated simultaneously. After this algorithm has been executed, the data owner delivers the new index and number sets to the CSP.
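As a concrete illustration of the padding step in GenIndex, the sketch below generates vectors of the form (2) under stated assumptions: the slot-assignment policy (slot j mod d), the ranges of a_j and of the g* multiples, and the function names are placeholders, not the paper's Algorithm 1 (whose listing appears only as a figure).

```python
# Illustrative sketch of one plausible way to generate the padding vectors V_j
# of (2): each document j is assigned a slot recorded in the number set sigma;
# V_j carries a random a_j in that slot, -a_j in the last position (d+1), and
# arbitrary multiples of g (g*) elsewhere.
import random

def gen_padding(num_docs: int, d: int, g: int, seed: int = 0):
    rng = random.Random(seed)
    sigma = {}                      # document id -> slot index holding a_j
    V = []
    for j in range(num_docs):
        s_j = j % d                 # slot for document j (placeholder policy)
        a_j = rng.randrange(1, g)   # random a_j in [1, g-1], never a multiple of g
        v = [g * rng.randrange(1, 10) for _ in range(d + 1)]   # g* entries
        v[s_j] = a_j                # a_j sits in the document's slot
        v[d] = -a_j                 # the (d+1)-th position holds -a_j
        sigma[j] = s_j
        V.append(v)
    return sigma, V

sigma, V = gen_padding(num_docs=3, d=5, g=1000)
# Under a query vector V' that selects exactly the slots in sigma-bar plus the
# (d+1)-th position (cf. GenTrapdoor), a_j and -a_j cancel and only multiples
# of g remain, so the score in (3) reduces to P_j . Q modulo g.
```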

5.2.2 CSP

The following algorithm is executed by the CSP. \(Search(I,TD) \to \{ R,{{\pi }_{q}^{t}},{\pi _{i,j}}\}\): This algorithm uses the index and the trapdoor to calculate similarity scores and obtain the top-k search results. After receiving the trapdoor TD from a data user, the CSP calculates the similarity score between TD and the index vector stored in each node to get the top-k relevant results. The similarity score is calculated as:

$$ \begin{array}{@{}rcl@{}} Score(\bar W,{F_{j}}) &=& I \cdot TD\\ &=& ({P}_{j}^{\prime} \cdot {Q_{\bar{W}}^{\prime}}) mod g\\ &=& ({p_{a}} \cdot {q_{a}} + {p_{b}} \cdot {q_{b}})mod g\\ &=& ({{M}_{1}^{T}}{p_{a}},{{M}_{2}^{T}}{p_{b}}) \!\cdot\! ({M}_{1}^{- 1}{q_{a}},{M}_{2}^{- 1}{q_{b}})mod g\\ &=& ({P_{j}} \cdot {Q_{\bar W}})mod g \end{array} $$
(3)

A larger score indicates that the corresponding document Fj is more relevant to the search keyword set \(\bar W\); the documents with the top scores are returned to the data user. Meanwhile, when the data user sends a request to the server, the CSP returns the authenticator \({\pi _{q}^{t}}\) for the query time together with the authenticator πi,j of the checkpoint.

Moreover, it should be noted that the similarity scores do not have to be calculated on every node one by one. When the weight of the corresponding keyword position on a non-leaf node is zero, the search on the child nodes of this node is abandoned directly, which decreases the communication and computational overhead.
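The following small numerical sketch illustrates the inner-product preservation behind (3), assuming the standard secure kNN construction. It omits the Vj/V′ padding and the mod-g reduction so that I · TD = Pj · Q is easy to check, and all names (split, is_trapdoor, ...) are illustrative.

```python
# A small numerical check of the inner-product preservation behind (3),
# assuming the usual secure kNN construction with key K1 = (S, M1, M2).
import numpy as np

rng = np.random.default_rng(1)
m = 6                                            # dictionary size (illustrative)

# Splitting indicator and two invertible matrices.
S = rng.integers(0, 2, size=m)
M1, M2 = rng.random((m, m)), rng.random((m, m))
while np.isclose(np.linalg.det(M1), 0) or np.isclose(np.linalg.det(M2), 0):
    M1, M2 = rng.random((m, m)), rng.random((m, m))

def split(v, is_trapdoor):
    """Split v into (va, vb) under S: index vectors are split where S[i] = 1,
    trapdoor vectors where S[i] = 0, as described in Section 5.2."""
    va = v.astype(float)
    vb = v.astype(float)
    for i in range(m):
        if (S[i] == 1) != is_trapdoor:           # split at this position
            r = rng.random()
            va[i], vb[i] = r, v[i] - r
    return va, vb

# Data owner: index I for a document vector P of keyword weights.
P = rng.integers(0, 5, size=m)
pa, pb = split(P, is_trapdoor=False)
I = (M1.T @ pa, M2.T @ pb)

# Data user: trapdoor TD for a 0/1 query vector Q.
Q = rng.integers(0, 2, size=m)
qa, qb = split(Q, is_trapdoor=True)
TD = (np.linalg.inv(M1) @ qa, np.linalg.inv(M2) @ qb)

# CSP: similarity score, computed without ever seeing P or Q in the clear.
score = I[0] @ TD[0] + I[1] @ TD[1]
assert np.isclose(score, P @ Q)                  # equals the plaintext inner product
```

Because the matrices cancel only inside the combined inner product, the CSP can rank documents by score without learning Pj or Q themselves.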

5.2.3 Data user

The following algorithms are executed by the data user:

  • \(GenTrapdoor(\bar W,{K_{1}},\bar \sigma ) \to \{ TD\}\): The data user generates the keyword set \(\bar W\) for searching and creates an m-bit vector \({{Q}_{\bar W}}\) according to \(\bar W\), where \({{Q}_{\bar W}}[i]\) indicates whether the i-th keyword wi of the dictionary is in \(\bar W\), i.e. \({{Q}_{\bar W}}[i] = 1\) means yes and \({{Q}_{\bar W}}[i] = 0\) means no. The data user then extends \({{Q}_{\bar W}}\) to an (m + d + 1)-bit vector \({{Q}_{\bar W}^{\prime }} = {{Q}_{\bar W}}||{V^{\prime }}\), where \({V^{\prime }}\) is a (d + 1)-bit vector generated as follows: \({V^{\prime }}[i] = 1\) for \(i \in \bar \sigma \cup \{ d + 1\}\) and \({V^{\prime }}[i] = 0\) otherwise. The data user splits \({Q}_{\bar W}^{\prime }\) into two (m + d + 1)-bit vectors (qa,qb): if S[i] = 0(i = 1,2,⋯,m + d + 1), the value of \({Q}_{\bar W}^{\prime }[i]\) is randomly split into qa[i] and qb[i]; if S[i] = 1(i = 1,2,⋯,m + d + 1), qa[i] and qb[i] are both set to \({{Q}_{\bar W}^{\prime }}[i]\). Thus the trapdoor is generated as \(TD = (M_{1}^{- 1}{q_{a}},M_{2}^{- 1}{q_{b}})\).

  • Dec(R,K2) →{PR}: The data user uses the secret key K2, transmitted from the data owner over a secure broadcast channel, to decrypt the encrypted results R and obtain the plaintext results PR.

  • \(Check({K_{3}},spk,{{\pi }_{q}^{t}},{\pi _{i,j}}) \to \{ b_{1}\}\): The data user runs this algorithm to check the correctness of the authenticator obtained at query time t. It takes the symmetric key K3, the public key spk of the signing key pair, and the authenticators \({{\pi }_{q}^{t}}\) and πi,j as input, and outputs a bit b1 that represents acceptance or rejection. The detailed process of this algorithm is shown in Algorithm 2:

    Algorithm 2
  • \(Verify({K_{1}},{K_{3}},spk,R,{\pi _{q}^{t}},aux) \to \{ b_{2}\}\): The data user runs this algorithm, which takes as input the symmetric keys K1 and K3, the public key spk of the signing key pair, the search results R, the authenticator \({\pi _{q}^{t}}\) and the corresponding auxiliary information aux returned by the CSP. It outputs a bit b2 that indicates whether verification passed. The verification process for the search results is shown in Algorithm 3:

Algorithm 3

5.3 Detailed process

5.3.1 Authenticator construction

We utilize the Merkle tree and the time-stamp chain to design the authenticator. There are two challenges in verifying search results.

The first is how to design an efficient generic proof index that supports not only data integrity verification but also data updates. In DVMRS we build and maintain such a proof index by leveraging the full Merkle tree and IH values.

The second is how to ensure data freshness by preventing the root from being replayed by a malicious server in the context of data updates. To solve this problem, we add a time-stamp to the authenticator; the time-stamp lies on the time-stamp chain known to all three parties, which ensures that each authenticator has a time identity that cannot be tampered with.

Combined with the solutions to the above two problems, we design the authenticator used in the scheme as follows:

$$ \left\{ \begin{array}{ll} {\pi_{i,0}} = ({\alpha_{i,0}},Si{g_{ssk}}({\alpha_{i,0}})),(u{p_{i}} < t{p_{i,0}} \le u{p_{i + 1}})\\ {\alpha_{i,0}} = En{c_{{K_{3}}}}(r{t_{i,0}}||t{p_{i,0}})\\ {\cdots} \\ {\pi_{i,j}} = ({\alpha_{i,j}},Si{g_{ssk}}({\alpha_{i,j}})),(t{p_{i,j - 1}} < t{p_{i,j}} \le u{p_{i + 1}})\\ {\alpha_{i,j}} = En{c_{{K_{3}}}}(r{t_{i,j}}||t{p_{i,j}}||{\alpha_{i,j - 1}})\\ {\cdots} \\ {\pi_{i,n}} = ({\alpha_{i,n}},Si{g_{ssk}}({\alpha_{i,n}})),(t{p_{i,n}} = u{p_{i + 1}})\\ {\alpha_{i,n}} = En{c_{{K_{3}}}}(r{t_{i,n}}||t{p_{i,n}}||{\alpha_{i,n - 1}}) \end{array} \right. $$
(4)
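A minimal sketch of the chain in (4) is given below. The real scheme uses a symmetric cipher EncK3 and a signature Sigssk verified with spk; here both are replaced by HMAC-based stand-ins purely to show how each authenticator binds the current root IH value rti,j, its time-stamp tpi,j and the previous αi,j−1. Keys and helper names are placeholders.

```python
# A toy sketch of the authenticator chain in (4); Enc_K3 and Sig_ssk are
# replaced by HMAC-based stand-ins, not the paper's instantiation.
import hmac, hashlib, time

K3 = b"symmetric-key-K3"                # placeholder for K3
SSK = b"signing-key-ssk"                # placeholder for ssk

def seal(key, msg):                     # stand-in for Enc_K3(.)
    return hmac.new(key, msg, hashlib.sha256).digest() + msg

def sign(key, msg):                     # stand-in for Sig_ssk(.)
    return hmac.new(key, msg, hashlib.sha256).digest()

def make_authenticator(rt, tp, prev_alpha=None):
    """pi_{i,j} = (alpha_{i,j}, Sig(alpha_{i,j})), where alpha_{i,j} binds the
    root IH value rt_{i,j}, the time-stamp tp_{i,j} and alpha_{i,j-1}."""
    payload = rt + b"|" + repr(tp).encode()
    if prev_alpha is not None:          # chain to the previous authenticator
        payload += b"|" + prev_alpha
    alpha = seal(K3, payload)
    return alpha, sign(SSK, alpha)

# Data owner: authenticator at tp_{i,0}, then a new one after a data update.
alpha0, sig0 = make_authenticator(b"root-ih-0", time.time())
alpha1, sig1 = make_authenticator(b"root-ih-1", time.time(), prev_alpha=alpha0)

# Data user: check the signature, then read the sealed root/time-stamp payload
# (with a real Enc_K3 the payload would be decrypted rather than sliced).
assert hmac.compare_digest(sign(SSK, alpha1), sig1)
payload = alpha1[32:]                   # b"root-ih-1|<tp>|<alpha_{i,0}>"
```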

The detailed process of constructing the authenticator is as follows:

  1. The data owner generates the IH values of the files.

  2. The data owner builds the Merkle tree from the IH values of the files.

  3. The data owner determines a time-stamp chain, which contains the fixed update time points of the authenticator and the length of the update interval.

  4. The data owner generates an original authenticator and the time-stamp chain, broadcasts them to the data users over the broadcast channel, and uploads them to the CSP. Note that we use the Network Time Protocol running on the CSP to synchronize the clocks of the data owner and the data users during their interactions with the servers.

In DVMRS, the update of the authenticator is divided into two cases:

The first case is that the Merkle tree is not updated during an update interval; the data owner then only needs to update the time-stamp at the next fixed time point.

The second case is that the document set of the data owner is modified within an update interval, which means the root of the Merkle tree has been updated. The data owner then calculates a new authenticator using the latest root IH value and the current time-stamp, and the CSP updates its authenticator simultaneously.

5.3.2 Results verification

The process by which data users verify the search results is divided into two stages:

  1. To prevent a malicious CSP from returning an incorrect authenticator to the user, we design the Check algorithm to check the freshness and correctness of the authenticator; it is executed by the data user and verifies whether the authenticator has been replayed.

  2. We use the Verify algorithm to verify the search results by leveraging the root hash value extracted from the authenticator and the minimum hash sub-tree extracted from the Merkle tree together with the search results.

To prevent the CSP from replaying previous authenticators and to ensure the freshness of the root hash value, we apply the time-stamp chain: data users can trace authenticators along the chain and identify whether the root hash value is fresh.

In this setting, the CSP needs to provide an authenticator at the query time and, in addition, an authenticator at the checkpoint, where the checkpoint refers to the next fixed update time point after the query time t.

When data users search for files with keywords from the keyword dictionary, there are three cases at different query times on the time-stamp chain with the same keywords:

  1. The first case is that the query occurs at t1; the malicious CSP can only send πi−1,n to the data user.

  2. The second case is that the query occurs at t2 after the data update at tpi,0, and the authenticator that the malicious CSP sends to the user is πi,0.

  3. The last case is that the query is generated at t2, but the authenticator sent by the malicious CSP is πi−1,n. In this case, a data freshness attack occurs, but it will be detected at the checkpoint upi+1. The data user obtains the latest authenticator πi,1 from the malicious CSP at the checkpoint to verify whether the data obtained at the query time has been replayed.

Fig. 4 Precision test

6 Performance analysis

DVMRS improves the security of secure kNN-based searchable encryption. By changing the way the index tree is constructed and the trapdoor vector is generated, the scheme adds a number set to mark the data and thereby achieves forward and backward security. Verification of data integrity, freshness and correctness is achieved by combining the time-stamp chain with the Merkle tree.

6.1 Precision

In the search accuracy analysis, we use the following formula to define the precision:

$$ Precision = \frac{{{k^{\prime}}}}{k} $$
(5)

where \(k^{\prime }\) is the number of real top-k documents contained in the returned results and k is the number of returned documents.

We compare DVMRS with the schemes of Xia et al. [16] and Zhang et al. [17]. The accuracy of DVMRS is slightly higher than that of Xia et al. [16], while the accuracy of Zhang et al. [17] is slightly higher than both DVMRS and Xia et al. [16]; this is due to its modified TF × IDF inner product computation, which assigns different weights to keywords at different positions in the data and thereby achieves higher search accuracy. However, DVMRS realizes the forward and backward security that the other schemes do not, and additionally introduces the time-stamp chain combined with the Merkle tree to verify the search results. The precision comparison of the three schemes is shown in Fig. 4.

To ensure the fairness of the comparison, the same parameters and the same data set are used in the simulations.

The data set used in the simulation contains 3026 documents and 1789 keywords extracted from the documents. It can be seen from the simulation results that the search accuracy of the three schemes shows no obvious fluctuation as the number of query keywords varies from 50 to 500. As can be seen from Fig. 4, DVMRS maintains high search accuracy while achieving forward and backward security.

Fig. 5 Build index (a)

6.2 Index construction

In DVMRS and the two schemes used for comparison, index construction is divided into two stages, building the tree structure and encryption, and the TF × IDF model and the secure kNN algorithm are used in both Zhang et al. [17] and Xia et al. [16]. In the encryption stage, the two comparison schemes extend the n-bit original vector to (n + U + 1) bits, where U is a random number. Note, however, that U is a random number within a certain range: when a file contains more keywords, the range of U should be increased accordingly, otherwise its randomization provides little security.

In DVMRS, to achieve forward and backward security, we modify the vector padding method and extend the original n-bit vector to (n + d + 1) bits, where d is the maximum size of the outsourced document set. In this way, the larger the document set, the more bits the vector occupies; although this adds a small amount of computational overhead, the security is stronger. It should also be noted that, to ensure keyword security, the two comparison schemes should likewise adjust the range of U according to the quantity of stored data.

In Fig. 5, the number of keywords in the dictionary is fixed to n = 750 and the number of documents varies from 500 to 3000. In Fig. 6, the number of documents is fixed to N = 1500 and the number of keywords in the dictionary varies from 250 to 1500.

Fig. 6 Build index (b)

6.3 Trapdoor generation

In DVMRS and the comparison schemes, the complexity of trapdoor generation is determined by the splitting vector and the secret matrices used in encryption, and it is also related to the total number of keywords extracted from the document set. When the total number of keywords in the dictionary is n, the complexity of generating a trapdoor is O(n^2). Meanwhile, the padding of the vector used in encryption is modified in DVMRS, so the time complexity of trapdoor generation is also affected by the number of documents, but it remains within an acceptable range.

We have simulated trapdoor generation many times, as shown in Figs. 7 and 8. In a practical CSP environment with a large amount of data, the gap in trapdoor generation time between the schemes therefore remains small.

In Fig. 7, the number of keywords in each query is fixed at 10 and the number of keywords in the dictionary varies from 500 to 2000. In Fig. 8, the number of keywords in the dictionary is fixed at n = 500 and the number of keywords in each query varies from 5 to 30.

Fig. 7 Trapdoor generation (a)

Fig. 8 Trapdoor generation (b)

6.4 Search efficiency

The search process of this scheme is as follows: (1) calculate the inner product of the relevant node vectors and the query vector in the tree; (2) return the documents with the highest similarity scores according to the number of documents to be returned. Therefore, the search time complexity is mainly affected by the number of nodes in the tree, i.e. it mainly depends on the number of documents. When the number of stored documents is N, the time complexity is O(log_2 N). It is also affected by the number of keywords in the dictionary. We compare DVMRS with the schemes of Zhang et al. [17] and Xia et al. [16], simulating different numbers of documents and different numbers of keywords. The simulation results are shown in Figs. 9 and 10.

Fig. 9 Search efficiency (a)

Fig. 10 Search efficiency (b)

In Fig. 9, the number of keywords in the dictionary is fixed to n = 500, the size of the search keyword set is m = 10, the number of returned documents is k = 30, and the number of documents varies from 500 to 3000. In Fig. 10, the number of documents is fixed to N = 1000, the size of the search keyword set is m = 10, and the number of returned documents is k = 30. The number of keywords in the dictionary varies from 250 to 1500.

7 Security analysis

7.1 Privacy protection

  1. Data privacy. The document set is encrypted locally by the data owner using symmetric encryption, and the ciphertexts are uploaded directly to the CSP after encryption. The data owner transfers the symmetric key to authorized users through the secret channel. The symmetric encryption used on the document set is proven to be semantically secure.

  2. Index and trapdoor privacy. In DVMRS, the security of the index and the trapdoor is based on the secure kNN algorithm. Even if the keyword sets of two files, or two search keyword sets, are the same, the resulting indexes or trapdoors are not, because the secure kNN algorithm is non-deterministic: the splitting vector S used to split the trapdoors and indexes and the two matrices M1, M2 are randomly generated. As long as the security of the encryption key K1 is ensured, the CSP cannot recover the underlying plaintext vectors from the trapdoors and indexes. This is proven secure under the Known Ciphertext Model.

  3. Trapdoor unlinkability. When constructing the vectors Vj(j = 1,2,⋯,N), DVMRS first selects a random parameter g (g > the maximum similarity score), and then selects g*, an arbitrary integer multiple of g, to construct Vj. The trapdoor vector is randomly split into two vectors and then encrypted, which protects the search pattern and makes different trapdoors, and even trapdoors for the same query, indistinguishable. Therefore, the similarity scores differ for each query and the CSP cannot distinguish them.

  4. Keyword privacy. A random number is used to randomize the similarity scores when the data owner encrypts the indexes, so keyword privacy is protected under the Known Background Model.

  5. Forward and backward security. Since the CSP is malicious in the context of DVMRS, it may keep historical information such as previous queries and deleted documents in its local storage.

    In this scenario, our goal is that newly inserted documents cannot be matched by previous queries and new queries cannot be executed on deleted documents. In DVMRS, a legitimate query computes similarity scores as follows:

    $$ \begin{array}{@{}rcl@{}} Score(\bar W,{F_{j}}) &=& ({P_{j}^{\prime}} \cdot {Q^{\prime}}) \bmod g\\ &=& ({P_{j}} \cdot Q + \sum\limits_{i = 1}^{d + 1} {{V_{j}}[i] \cdot {V^{\prime}}[i]} ) \bmod g\\ &=& ({P_{j}} \cdot Q + {a_{j}} + {g^{*}} - {a_{j}}) \bmod g\\ &=& {P_{j}} \cdot Q \end{array} $$
    (6)

    However, when previous queries (or new queries) “touch” new files (or deleted files), there will be:

    $$ \begin{array}{@{}rcl@{}} Score(\bar W,{F_{j}}) &=& ({P_{j}^{\prime}} \cdot {Q^{\prime}}) \bmod g\\ &=& ({P_{j}} \cdot Q + \sum\limits_{i = 1}^{d + 1} {{V_{j}}[i] \cdot {V^{\prime}}[i]} ) \bmod g\\ &=& ({P_{j}} \cdot Q + {g^{*}} - {a_{j}}) \bmod g\\ &=& ({P_{j}} \cdot Q - {a_{j}}) \bmod g \end{array} $$
    (7)

    Because the parameter aj is randomly generated, the correct similarity score cannot be calculated. Therefore, this scheme has forward and backward security. A numerical sketch of this effect is given below.
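The cancellation in (6) and the residual term in (7) can be checked numerically. The sketch below evaluates the padded vectors in the clear (the kNN encryption preserves these inner products); the parameters and the slot-assignment policy are illustrative placeholders.

```python
# Numerical sketch of (6) and (7): a legitimate query cancels a_j and -a_j
# modulo g, while a query that "touches" a deleted/new document does not.
import random

rng = random.Random(7)
g, d = 10_000, 4

def padding(slot, a):
    """V_j with a at `slot`, -a at the last position (d+1), and g* elsewhere."""
    v = [g * rng.randrange(1, 5) for _ in range(d + 1)]
    v[slot], v[d] = a, -a
    return v

def selector(valid_slots):
    """V' selects the slots of currently valid documents plus position d+1."""
    return [1 if (i in valid_slots or i == d) else 0 for i in range(d + 1)]

def score(P, V_j, Q, V_prime):
    inner = sum(p * q for p, q in zip(P, Q)) + sum(v * w for v, w in zip(V_j, V_prime))
    return inner % g

P, Q = [3, 0, 2, 5], [1, 0, 1, 0]                 # keyword weights and query, P.Q = 5
a_j = rng.randrange(1, g)

V_valid = padding(slot=1, a=a_j)                  # document whose slot is in sigma-bar
V_stale = padding(slot=2, a=a_j)                  # deleted/new document: slot not selected

print(score(P, V_valid, Q, selector({1})))        # 5, the true P.Q, as in (6)
print(score(P, V_stale, Q, selector({1})))        # (5 - a_j) mod g: garbage, as in (7)
```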

7.2 Security model

Known Ciphertext Model. In DVMRS, the CSP only knows the encrypted information, specifically the encrypted document set C, the encrypted index tree I, and the trapdoor TD. The adversary can only try to distinguish two files through their generated indexes and encrypted files. The vector representing a file has (m + d + 1) bits: the first m bits represent the weights of the keywords, and the last (d + 1) bits represent the dimensions of the extension vector V.

In the index generation stage, we first expand the vector Pj corresponding to each file to \({P}_{j}^{*}\), and then use the splitting vector to split \({P}_{j}^{*}\). When S[i] = 1, \({{P}_{j}^{*}}[i]\) is randomly divided into two values Pa[i] and Pb[i]. Assuming that the number of "1"s in the first m bits is μ1 and each dimension of the file vector is ηf bits, there are \({({2^{{\eta _{f}}}})^{{\mu _{1}}}} \cdot {({2^{{\eta _{f}}}})^{d}}\) possible combinations, while the two vectors are encrypted by matrices of (m + d + 1) × (m + d + 1) dimensions. Assuming that each element of the matrices is ηM bits, the two matrices have \({({2^{{\eta _{M}}}})^{{{(m + d + 1)}^{2}} \times 2}}\) possible values. Therefore, the probability that two files have the same index is calculated as follows:

$$ \begin{array}{@{}rcl@{}} {P_{d}} &=& \frac{1}{{{{({2^{{\eta_{f}}}})}^{{\mu_{1}}}} \cdot {{({2^{{\eta_{f}}}})}^{d}} \cdot {{({2^{{\eta_{M}}}})}^{{{(m + d + 1)}^{2}} \times 2}}}}\\ &=& \frac{1}{{{2^{{\mu_{1}}{\eta_{f}} + {\eta_{f}}d + 2{\eta_{M}}{{(m + d + 1)}^{2}}}}}} \end{array} $$
(8)

It can be seen from the above equation that the larger the parameters μ1, d, ηf and ηM are, the more difficult it is to distinguish two indexes, so the encrypted index is indistinguishable.

Known Background Model. In DVMRS, the CSP may obtain information other than the encrypted information, such as the file search frequency and the keyword search frequency. This information can be used in statistical attacks to infer the keywords in a query.

In DVMRS, the trapdoor is an (m + d + 1)-bit vector: the first m bits indicate whether the keyword at the corresponding position exists in the query, and the remaining (d + 1)-bit extension vector satisfies \({V^{\prime }}[i] = 1\) for \(i \in \bar \sigma \cup \{ d + 1\}\) and \({V^{\prime }}[i] = 0\) otherwise. Firstly, a random number r of ηr bits is used to extend the vector, which has \({2^{{\eta _{r}}}}\) possible values. Then the vector is divided into two vectors using the (m + d + 1)-bit splitting vector, in which there are μ0 "0"s. Assuming that each dimension of the first (m + d) bits of the vector is ηq bits, there are \({({2^{{\eta _{q}}}})^{{\mu _{0}}}}\) possibilities in total. Finally, the random matrices are used to encrypt the two query vectors. The probability that two trapdoors are the same is calculated as follows:

$$ {P_{d}} = \frac{1}{{{2^{{\eta_{r}}}} \cdot {{({2^{{\eta_{q}}}})}^{{\mu_{0}}}}}} $$
(9)

As can be seen from the above formula, when ηr, ηq and μ0 are set to larger values, the query vectors become indistinguishable.

8 Conclusion and future work

8.1 Conclusion

On the basis of a multi-keyword ranked search scheme built on the traditional inner product, we propose a searchable encryption scheme with forward and backward security and search result verification.

First, we realize forward and backward security by changing the construction of the document vector and introducing two number sets.

Moreover, to make the verification of the returned results more secure and effective, we combine the Merkle tree with the time-stamp chain to ensure data freshness and prevent a malicious server from returning a wrong authenticator. DVMRS is also extended to support dynamic updates.

Finally, we analyze the performance and security of the proposed scheme. In the security analysis, we prove security under the two threat models proposed in the paper (Table 2).

Table 2 Scheme comparison

The work in this article still leaves room for improvement. In the DVMRS model, the data owner sends the authenticator to the users through a secure broadcast channel, and also informs all users through this channel whenever the data is updated.

8.2 Future work

Above all, with the rapid development of machine learning (ML) and federated learning (FL), the security and privacy of the data used in them also need to be guaranteed [11, 30,31,32]. The combination of SE technology with ML and FL will be a research focus in the future, and it is a problem that we will continue to study.

Moreover, a variety of novel computing modes such as edge computing and fog computing have been proposed in recent years. These computing modes have been used in the design of SE schemes [7, 15, 20, 29] due to their high efficiency and low latency. DVMRS could be combined with these computing modes to improve query efficiency and reduce communication latency.

Finally, blockchain technology has matured, and its applications in the context of the Internet of Things are increasing. Many desirable properties of the blockchain, such as decentralization, cryptographic protection and unchangeable transaction records, give it great value in privacy protection, authentication and key authorization for SE [12]. In subsequent work, we will also try to combine DVMRS with the blockchain to achieve fine-grained management of multi-user authority.