1 Introduction

Cloud computing is an innovative Internet-based computing paradigm that enables cloud users to move their data and applications to a remote cloud in order to deploy scalable and elastic services on demand without having to provision a data center. However, despite these advantages, cloud computing has not been widely adopted. According to a survey launched by TwinStrata in 2015, only 38 % of organizations were willing to put their inactive data in a public cloud, and only about 24 % of users were using cloud storage for data backup, archiving and disaster recovery. This shows that data security [1, 2] is one of the major obstacles to the adoption of cloud storage. Since the user's data is outsourced to distributed cloud servers, the service provider can easily access the data.

To prevent data from being maliciously accessed by cloud providers, data owners tend to encrypt their private data before outsourcing it to the cloud, and they share the decryption key only with other authorized users. Although this method protects the privacy of the data, it makes data retrieval difficult. This limitation has motivated much research on advanced searchable encryption schemes that enable searching over encrypted data while protecting the confidentiality of the data and the queries.

Searchable encryption, first proposed by Song et al. [3], provides a way to perform efficient keyword searches over encrypted data. Prompted by Song's pioneering work, many efforts have been devoted to constructing more efficient searchable symmetric encryption (SSE) schemes, such as [4–8] and [9]. An SSE scheme allows users to encrypt their data using symmetric encryption, and then uses the files and keywords to create an encrypted index for later searches. When the user wants to retrieve some files, he chooses a keyword and uses it to generate a search request. The server then uses this request to search over its internal data structure, finds all the files related to that keyword, and returns the file collection to the user. Besides enabling such searches, the privacy guarantee of SSE ensures that, given the encrypted files, the encrypted indexes and a series of search requests, the server cannot learn any useful information about the files or the keywords.

The solutions above are single-keyword oriented, which is inefficient in practice since a search may return a very large number of files, for example when searching a remotely stored email archive. The works in [10–12] and [13] extend the search primitive to multi-keyword conjunctive search, which avoids this limitation and is more practical for real-world scenarios.

To the best of our knowledge, few works consider searchable encryption and search authentication together. Kamara et al. [14] presented a cryptographic cloud storage system that combines an adaptively secure searchable symmetric encryption scheme with a search authentication mechanism to allow the user to verify the integrity of the search result. They used a simple Merkle tree structure [15] and a pre-computed basis to authenticate the given dataset. Kurosawa and Ohtaki [16] introduced the definition of UC-security and proposed a verifiable SSE scheme that allows the user to check the integrity of the search result.

Our Contribution.

In this paper, we present a dynamic integrity-preserving multi-keyword searchable encryption scheme, enabling search authentication in multi-keyword searchable encryption to fulfill practical needs. We reduce the multi-keyword search (MSE) problem to the single-keyword case by performing a search for each individual keyword and intersecting the resulting file sets to obtain the final result. To lower the communication overhead during a search, the intersection of the per-keyword search results is computed on the server side. The only thing the user needs to do is receive the final result and verify its integrity.
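As a plain illustration of this reduction (all names are ours, the data is in the clear, and no encryption or proof generation is shown), the server-side logic can be sketched as:

```python
# Plaintext sketch of the reduction: run a single-keyword search per
# keyword, then intersect the resulting file-identifier sets on the
# server so that only the final set is returned to the user.

def single_keyword_search(index, keyword):
    """Return the set of file identifiers associated with one keyword."""
    return set(index.get(keyword, ()))

def conjunctive_search(index, keywords):
    """Server-side multi-keyword search: intersect per-keyword results."""
    if not keywords:
        return set()
    results = [single_keyword_search(index, w) for w in keywords]
    return set.intersection(*results)

# Example inverted index: keyword -> identifiers of files containing it.
index = {"urgent": {1, 2, 3}, "invoice": {2, 3, 4}, "2015": {3, 5}}
print(conjunctive_search(index, ["urgent", "invoice", "2015"]))  # {3}
```

In the actual scheme the per-keyword sets never leave the server; only the intersection and its proof do.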

Thus, our approach should meet the following requirements: (1) the server is able to take multiple keywords as input and return the final result directly; (2) a server that honestly executes the search algorithm can form a valid proof that passes verification, while no one can generate a valid proof for a maliciously modified search result. The theoretical basis of the proposed solution is inspired by the authenticated data structure in [17] for verifying set operations on outsourced sets.

We use dynamic SSE to realize the single-keyword search and a Merkle tree as the base data structure to prove the correctness of the intersection. Built on these, our scheme maintains adaptive chosen-keyword security and is unforgeable against adaptive adversaries.

2 Definition and Security Model

2.1 Definitions

We consider a scenario that consists of two types of entities. One is the user, who owns the data, and the other is the cloud storage provider, also known as the server, which provides storage services to the user. The dynamic MSE scheme allows a user to encrypt his data and outsource the encrypted data to the server. After uploading the encrypted data, the user only needs to store a secret key and an authenticated data state, regardless of the number and size of the files, i.e., the user's storage overhead is constant. The user can later generate search requests using single or multiple keywords and submit them to the server. Given a search request, the server searches over the encrypted data and returns the set of encrypted files and a corresponding proof. The correctness of the search result can be verified by the user using this result and proof. The user can also dynamically update the file set on demand after the first upload. The main system architecture is shown in Fig. 1.

Fig. 1. Integrity preserving search over encrypted data

When multiple keywords are used in a search, we define the search result to be the intersection of the sets generated by searching for each individual keyword. Concretely, the problem we discuss in this paper is conjunctive keyword search. We use "token" to describe a request sent by the user. Since our scheme is dynamic, there are two additional tokens, the add token and the delete token. The formal definition of our scheme is as follows.

Definition 1.

A dynamic MSE scheme is a tuple of eight polynomial-time algorithms and protocols \( {\text{MSE}} = ({\text{Gen}},{\text{Setup}},{\text{SrchToken}},{\text{Search}},{\text{Verify}},{\text{Dec}},{\text{Add/Update}},{\text{Del/Update}}) \) such that:

\( K \leftarrow {\text{Gen}}(1^{k} ) \) : is a probabilistic algorithm run by the user that takes a security parameter \( 1^{k} \) as input, outputs a secret key \( K \).

\( (\gamma ,{\mathbf{c}},st,\alpha ) \leftarrow {\text{Setup}}(K,\delta ,{\mathbf{f}}) \) : is a probabilistic algorithm run by the user that takes the secret key \( K \) , an index \( \delta \) and a set of files \( {\mathbf{f}} \) as input, outputs an encrypted index \( \gamma \) , a set of ciphertexts \( {\mathbf{c}} \) , a data state \( st \) and an authenticated structure \( \alpha \).

\( \tau_{s} \leftarrow {\text{SrchToken}}(K,W) \) : is a deterministic algorithm run by the user that takes as input the secret key \( K \) and a set of words \( W \) , outputs search token \( \tau_{s} \).

\( ({\mathbf{I}}_{W} ,\pi ) \leftarrow {\text{Search}}(\alpha ,\gamma ,{\mathbf{c}},\tau_{s} ) \) : is a deterministic algorithm run by the server that takes as input the authenticated structure \( \alpha \) , the encrypted index \( \gamma \) , the set of ciphertexts \( {\mathbf{c}} \) and the search token \( \tau_{s} \) , outputs a set of file identifiers \( {\mathbf{I}}_{W} \) , and a proof \( \pi \).

\( b \leftarrow {\text{Verify}}(K,st,\tau_{s} ,{\mathbf{I^{\prime}}},\pi ) \) : is a deterministic algorithm run by the user that takes as input the secret key \( K \) , the data state \( st \) , a search token \( \tau_{s} \) , a set of file identifiers \( {\mathbf{I^{\prime}}} \) and a proof \( \pi \) , outputs 1 as accept or 0 as reject.

\( f \leftarrow {\text{Dec}}(K,c) \) : is a deterministic algorithm run by the user to decrypt a ciphertext \( c \) , outputs a plaintext file \( f \).

\( (U :st^{\prime};\;S:\alpha^{\prime},\gamma^{\prime},{\mathbf{c}}^{\prime}) \leftarrow {\text{Add/Update(}}U:K,\delta_{f} ,f,st;\;S:\alpha ,\gamma ,{\mathbf{c}}) \) : is an interactive protocol run between the user \( U \) and the server \( S \) to add a file to the file set.

\( (U :st^{\prime};\;S:\alpha^{\prime},\gamma^{\prime},{\mathbf{c}}^{\prime}) \leftarrow {\text{Del/Update(}}U:K,\delta_{f} ,f,st;\;S:\alpha ,\gamma ,{\mathbf{c}}) \) : is an interactive protocol run between the user \( U \) and the server \( S \) to delete a file from the file set.

2.2 Security Model

We consider the server to be an untrusted entity that may deliberately steal or sabotage the user's data, or omit some files from the search result. Intuitively, an integrity-preserving searchable encryption scheme should meet the following security requirements: (1) the encrypted files and data structures on the server side should not leak any information about the files to the server; (2) the search requests generated by the user should not leak any information about the keywords he uses; (3) for a fallacious result, the server cannot produce a valid proof that passes the user's verification.

Dynamic CKA2-Security.

This security requirement characterizes the property that the scheme leaks no information to the adversary except what is defined in the leakage functions. The security definition is parameterized by four leakage functions \( \mathcal{L}_{1} \sim \mathcal{L}_{4} \). The adversary is allowed to be adaptive, i.e., its queries may depend on previous results. Let \( \mathcal{A} \) be a stateful adversary that executes the server-side algorithm, let "game" denote the interaction between \( \mathcal{A} \) and the user or simulator, and let "view" denote all the information that \( \mathcal{A} \) can collect during the game. We assume that \( \mathcal{A} \) can choose the encrypted message and then generate queries by interacting with the user adaptively. Therefore, in our security definition, the "view" of \( \mathcal{A} \) should contain only the information specified by \( \mathcal{L}_{1} \), \( \mathcal{L}_{2} \), \( \mathcal{L}_{3} \) and \( \mathcal{L}_{4} \) in a simulated way.

Definition 2.

Given the dynamic MSE scheme described in Definition 1, let \( \mathcal{A} \) be a stateful adversary, \( \mathcal{S} \) a stateful simulator, and \( \mathcal{L}_{1} \), \( \mathcal{L}_{2} \), \( \mathcal{L}_{3} \), \( \mathcal{L}_{4} \) stateful leakage functions. Consider the following games:

The dynamic MSE scheme is \( (\mathcal{L}_{1} ,\mathcal{L}_{2} ,\mathcal{L}_{3} ,\mathcal{L}_{4} ) \)-secure against adaptive dynamic chosen-keyword attacks if for all PPT adversaries \( \mathcal{A} \) there exists a probabilistic polynomial-time simulator \( \mathcal{S} \) such that:

$$ \left| {{\mathbf{Pr}}\left[ {{\text{Real}}_{\mathcal{\mathcal{A}}} (1^{k} ) = 1} \right] - {\mathbf{Pr}}\left[ {{\text{Ideal}}_{{\mathcal{\mathcal{A}},\mathcal{\mathcal{S}}}} (1^{k} ) = 1} \right]} \right| \le {\mathbf{negl}}(1^{k} ), $$

where \( {\mathbf{negl}}(1^{k} ) \) is a negligible function with input \( 1^{k} \).

Unforgeability.

We use the game \( {\text{Forge}}_{\mathcal{A}} (1^{k} ) \) to describe our scheme's unforgeability. In this game, the adversary interacts with a user that honestly executes the scheme. The user initializes his data structures using the data provided by the adversary. After making polynomially many queries, the adversary produces a set of keywords, a wrong search result and a proof for this result. If these outputs pass the user's verification algorithm, the game outputs 1; otherwise it outputs 0. Unforgeability requires that every PPT adversary has at most negligible probability of making the game output 1. We give the formal definition as follows.

Definition 3.

Given the dynamic MSE scheme described in Definition 1, for a stateful adversary \( \mathcal{A} \), consider the following game:

where the set \( {\mathbf{I^{\prime}}} \ne {\mathbf{I}}_{W} \). We say the dynamic MSE scheme is unforgeable if for all PPT adversaries \( \mathcal{A} \): \( {\mathbf{Pr}} [ {\text{Forge}}_{\mathcal{A}} (1^{k} ) = 1] \le {\mathbf{negl}}(1^{k} ) \), where \( {\mathbf{negl}}(1^{k} ) \) is a negligible function with input \( 1^{k} \).

3 Integrity Preserving Multi-keyword Searchable Encryption Scheme

In this section, we first construct a multi-keyword searchable encryption scheme, and then add a search authentication mechanism to make the integrity of the search result verifiable.

In our construction, the set of files \( {\mathbf{f}} \) together with the inverted index \( \delta \) is the initial input. In contrast to a file index, an inverted index is a set of lists led by keywords, where each keyword is followed by the set of files containing that keyword. The keywords of each file are pre-selected and can be considered the output of some other algorithm, which is not discussed here.

3.1 Dynamic Searchable Encryption

In the literature, most searchable encryption schemes use symmetric encryption to improve performance. We follow the prior constructions and build our scheme upon a CPA-secure private-key encryption scheme [18].

Figure 2 shows our dynamic searchable encryption structure, which is built on the inverted index. Generally speaking, the lookup table contains all the keywords in the system, and each keyword in the table leads a list stored in the search array. For example, the list of keyword \( w_{2} \) starts at address 4 in the array; the node at address 4 has a pointer to address 7, which in turn points to address 8. By traversing this list, all files that contain the keyword \( w_{2} \) can be retrieved. All nodes are stored at random locations in the search array. To support efficient file updates, there are also a deletion table and a deletion array. They work the same way, except that those lists are led by files.

Fig. 2. The schematic search structure

To prevent the server from learning the data, all table entries, all pointers in the tables, and all nodes in the arrays are encrypted. During a search, given the encrypted keywords, the server first decrypts the pointers in the lookup table and then uses them to find the corresponding file identifiers in the search array. The keywords remain encrypted throughout the search. Even after the server has searched all those keywords, it only learns the relationship between the encrypted keywords and the related file identifiers, but cannot obtain any useful knowledge about the keywords themselves. This prevents a curious server from learning the files and keywords. A more detailed analysis of this security model is presented in Sect. 4.
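A plaintext toy of the Fig. 2 layout (the encryption layer stripped away, all names ours) may help fix the pointer mechanics: a lookup table maps each keyword to the head address of a linked list whose nodes are scattered over random slots of the search array.

```python
import random

NULL = 0  # address 0 is reserved as the null pointer

def store_list(search_array, lookup, keyword, file_ids, free_slots):
    """Scatter the keyword's postings list over random slots of the array."""
    addrs = [free_slots.pop() for _ in file_ids]
    for i, (fid, addr) in enumerate(zip(file_ids, addrs)):
        nxt = addrs[i + 1] if i + 1 < len(addrs) else NULL
        search_array[addr] = (fid, nxt)      # node = (file id, next address)
    lookup[keyword] = addrs[0] if addrs else NULL

def search(search_array, lookup, keyword):
    """Follow the pointers from the lookup-table head until NULL."""
    ids, addr = [], lookup.get(keyword, NULL)
    while addr != NULL:
        fid, addr = search_array[addr]
        ids.append(fid)
    return ids

array = [None] * 16
free = list(range(1, 16))
random.shuffle(free)                          # nodes land at random slots
table = {}
store_list(array, table, "w2", [5, 7, 8], free)
print(search(array, table, "w2"))  # [5, 7, 8]
```

In the real scheme every table entry and node above is additionally XOR-masked so that only a holder of the search token can follow the pointers.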

3.2 Making Result Verifiable

In the following, we discuss how to make the search result verifiable. This method allows a server to prove to a client that it answered a multi-keyword search query correctly.

The method proposed in [14] is a Merkle tree based solution that computes an accumulated value for each word \( w \) and uses these values as leaves to construct the tree. In a search, the server returns a file set \( S \) and a Merkle tree proof for this set. The user can compute his own accumulated value from the files in \( S \) and use it to perform the Merkle tree verification. If the newly computed root equals the original one, the result is correct and can be accepted by the user.
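A minimal sketch of such a Merkle tree (SHA-256 standing in for the hash, byte strings for the accumulated leaf values; names and layout are our assumptions, not the construction of [14]):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _next_level(level):
    if len(level) % 2:                 # duplicate the last node on odd levels
        level = level + [level[-1]]
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]

def merkle_proof(leaves, idx):
    """Sibling hashes (with side flags) from leaf `idx` up to the root."""
    level, proof = [h(x) for x in leaves], []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sib = idx ^ 1
        proof.append((level[sib], sib < idx))   # (hash, sibling-on-left?)
        level = _next_level(level)
        idx //= 2
    return proof

def merkle_verify(leaf, proof, root):
    node = h(leaf)
    for sib, is_left in proof:
        node = h(sib + node) if is_left else h(node + sib)
    return node == root

leaves = [b"acc(w1)", b"acc(w2)", b"acc(w3)", b"acc(w4)"]
root = merkle_root(leaves)
assert merkle_verify(b"acc(w3)", merkle_proof(leaves, 2), root)
```

A proof is logarithmic in the number of leaves, which is why only the root needs to live on the user side.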

However, in the multiple-keyword setting this solution is inadequate for proving the correctness of the intersection of the results. The server can only generate a proof for each set separately. These sets and proofs must be transferred to the user to be verified, after which the user computes the intersection himself. Obviously, the communication complexity is linear in the sizes of the sets and may cause performance problems when the sets are very large.

A reasonable way to address this problem is to let the server compute the intersection and give the user the final result directly. In this case, the correctness of the intersection operation must be proved. We use the bilinear-map accumulator to realize this functionality. The bilinear-map accumulator [19] is an efficient tool for providing proofs of membership for elements that belong to a set. Let \( s \in {\mathbf{\mathbb{Z}}}_{p}^{*} \) be a randomly chosen trapdoor. The accumulator accumulates elements of \( {\mathbf{\mathbb{Z}}}_{p} \) and outputs an element of \( \varvec{\mathbb{G}} \). For a set of elements \( \mathcal{X} \) in \( {\mathbf{\mathbb{Z}}}_{p} \), the accumulation value \( {\text{acc}}(\mathcal{X}) \) is defined as:

$$ {\text{acc}}(\mathcal{\mathcal{X}}) = g^{{\prod\nolimits_{{x \in \mathcal{\mathcal{X}}}} {(x + s)} }} $$
(1)

Without knowing the trapdoor \( s \), the value \( {\text{acc}}(\mathcal{\mathcal{X}}) \) can also be constructed using \( \mathcal{\mathcal{X}} \) and the pre-computed \( (g,g^{s} , \ldots ,g^{{s^{q} }} ) \), where \( q \ge \# \mathcal{\mathcal{X}} \). The proof of subset containment of a set \( \mathcal{\mathcal{S}} \subseteq \mathcal{\mathcal{X}} \) is the witness \( (\mathcal{\mathcal{S}},\mathcal{\mathcal{W}}_{{\mathcal{\mathcal{S}},\mathcal{\mathcal{X}}}} ) \) where:

$$ \mathcal{\mathcal{W}}_{{\mathcal{\mathcal{S}},\mathcal{\mathcal{X}}}} = g^{{\prod\nolimits_{{x \in \mathcal{\mathcal{X}} - \mathcal{\mathcal{S}}}} {(x + s)} }} $$
(2)

Subset containment of \( \mathcal{\mathcal{S}} \) in \( \mathcal{\mathcal{X}} \) can be verified by checking:

$$ e\left( {\mathcal{\mathcal{W}}_{{\mathcal{\mathcal{S}},\mathcal{\mathcal{X}}}} ,g^{{\prod\nolimits_{{x \in \mathcal{\mathcal{S}}}} {(x + s)} }} } \right) = e\left( {{\text{acc}}(\mathcal{\mathcal{X}}),g} \right) $$
(3)

The security of the bilinear-map accumulator relies on the bilinear q-Strong Diffie-Hellman assumption.
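The algebra behind Eqs. (1)–(3) can be checked in the exponent, with a small multiplicative group standing in for the pairing group. This illustrates only the arithmetic, none of the security; the modulus, generator and trapdoor below are arbitrary toy constants of ours.

```python
from functools import reduce

q = 1000003        # toy group modulus (assumed prime)
g = 5              # toy generator
s = 1234           # the trapdoor

def char_prod(xs):
    """prod_{x in xs} (x + s), computed over the integers."""
    return reduce(lambda acc, x: acc * (x + s), xs, 1)

X = {10, 20, 30, 40}
S = {20, 40}                                  # S is a subset of X

acc_X   = pow(g, char_prod(X), q)             # acc(X), Eq. (1)
witness = pow(g, char_prod(X - S), q)         # W_{S,X}, Eq. (2)

# Eq. (3) checks e(W, g^{prod_S}) = e(acc(X), g); in the exponent this
# is prod_{X-S}(x+s) * prod_S(x+s) = prod_X(x+s):
assert char_prod(X - S) * char_prod(S) == char_prod(X)
assert pow(witness, char_prod(S), q) == acc_X
```

The pairing lets the verifier perform this exponent comparison without ever seeing the exponents, which is what makes the witness publicly checkable.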

Intuitively, the correctness of the intersection could be defined as follows: given a set \( {\mathbf{I}} \) and a series of sets \( S_{1} , \ldots ,S_{n} \), \( {\mathbf{I}} \) is the correct intersection of \( S_{1} , \ldots ,S_{n} \) if and only if the following conditions hold:

  1. 1.

    The subset condition: \( ({\mathbf{I}} \subseteq S_{1} ) \wedge \cdots \wedge ({\mathbf{I}} \subseteq S_{n} ) \).

  2. 2.

    The completeness condition: \( (S_{1} - {\mathbf{I}}) \cap \cdots \cap (S_{n} - {\mathbf{I}}) = \emptyset \).

The subset condition is easy to understand: as the intersection, the set \( {\mathbf{I}} \) must be included in each set \( S_{i} \). We use a Merkle tree to authenticate the values \( {\text{acc}}(S_{i} ) \). For all \( w \in {\mathbf{w}} \), the values \( {\text{acc}}({\mathbf{f}}_{w} ) \) are computed according to (1), and the tree is constructed using these values as leaves.

Since the user does not store the accumulated values, the server first generates a Merkle tree proof for each \( {\text{acc}}(S_{i} ) \). It is then straightforward to produce the subset witness \( ({\mathbf{I}},\mathcal{W}_{{{\kern 1pt} {\mathbf{I}},S_{i} }} ) \) in (2) for each set \( S_{i} \). Given \( {\text{acc}}(S_{i} ) \) and the witness \( ({\mathbf{I}},\mathcal{W}_{{{\kern 1pt} {\mathbf{I}},S_{i} }} ) \), the validity of \( {\text{acc}}(S_{i} ) \) is first verified using the Merkle tree proof, and then the subset containment relationship is checked by performing the verification in (3).

The completeness condition is also necessary since the set \( {\mathbf{I}} \) must contain all the common elements. To construct the completeness proof, we define the polynomial:

$$ P_{i} (s) = \prod\nolimits_{{f \in S_{i} - {\mathbf{I}}}} {\left( {s + id(f)} \right)} $$

The following result is based on the extended Euclidean algorithm over polynomials and provides verification for checking the completeness of set intersection.

Lemma 1.

The set \( {\mathbf{I}} \) is complete if and only if there exist polynomials \( q_{1} (s), \ldots ,q_{n} (s) \) such that \( q_{1} (s)P_{1} (s) + \cdots + q_{n} (s)P_{n} (s) = 1 \), where \( P_{i} (s) \) is defined above. Suppose \( {\mathbf{I}} \) is not complete; then there exists at least one common factor of \( P_{1} (s), \ldots ,P_{n} (s) \), and hence no polynomials \( q_{1} (s), \ldots ,q_{n} (s) \) satisfy \( q_{1} (s)P_{1} (s) + \cdots + q_{n} (s)P_{n} (s) = 1 \).
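A toy version of this test, with two sets for brevity: the residual polynomials are built as in the definition of \( P_{i}(s) \), and the extended Euclidean algorithm over polynomials finds the Bézout coefficients exactly when the claimed intersection is complete. Helper names are ours, and rational coefficients via `fractions.Fraction` stand in for arithmetic in \( {\mathbf{\mathbb{Z}}}_{p} \).

```python
from fractions import Fraction

# Polynomials are coefficient lists, lowest degree first.
def trim(p):
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def is_zero(p):
    return p == [0]

def add(a, b):
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return trim([x + y for x, y in zip(a, b)])

def neg(a):
    return [-x for x in a]

def mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(a, b):
    r, quo = trim(a), [Fraction(0)] * max(1, len(a) - len(b) + 1)
    while len(r) >= len(b) and not is_zero(r):
        d, c = len(r) - len(b), r[-1] / b[-1]
        quo[d] += c
        r = add(r, neg(mul([Fraction(0)] * d + [c], b)))
    return trim(quo), r

def ext_euclid(a, b):
    """Return (gcd, x, y) with x*a + y*b = gcd, gcd made monic."""
    r0, x0, y0 = trim(a), [Fraction(1)], [Fraction(0)]
    r1, x1, y1 = trim(b), [Fraction(0)], [Fraction(1)]
    while not is_zero(r1):
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        x0, x1 = x1, add(x0, neg(mul(q, x1)))
        y0, y1 = y1, add(y0, neg(mul(q, y1)))
    c = r0[-1]
    return [v / c for v in r0], [v / c for v in x0], [v / c for v in y0]

def residual(S, I):
    """P(s) = prod_{f in S - I} (s + id(f))."""
    p = [Fraction(1)]
    for fid in S - I:
        p = mul(p, [Fraction(fid), Fraction(1)])
    return p

# Complete intersection: the residual polynomials are coprime.
S1, S2, I = {1, 2, 3}, {2, 3, 4}, {2, 3}
P1, P2 = residual(S1, I), residual(S2, I)
gcd, q1, q2 = ext_euclid(P1, P2)
assert gcd == [1] and add(mul(q1, P1), mul(q2, P2)) == [1]

# Incomplete intersection: the common factor (s + 3) blocks the identity.
gcd_bad, _, _ = ext_euclid(residual(S1, {2}), residual(S2, {2}))
assert len(gcd_bad) > 1
```

The server publishes only \( g^{q_{i}} \), so the verifier checks the Bézout identity through the pairing product in the Verify algorithm rather than on the polynomials themselves.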

The formal analysis will be given in Sect. 4.

3.3 Explicit Construction

Let \( \Uppi = ({\text{Gen}},{\text{Enc}},{\text{Dec}}) \) be a private-key encryption system. \( F:\{ 0,1\}^{k} \times \{ 0,1\}^{*} \to \{ 0,1\}^{k} \), \( G :\{ 0,1\}^{k} \times \{ 0,1\}^{*} \to \{ 0,1\}^{*} \), \( P: \) \( \{ 0,1\}^{k} \times \{ 0,1\}^{*} \to \{ 0,1\}^{k} \) be pseudo-random functions. Let \( H_{1} :\{ 0,1\}^{*} \to \{ 0,1\}^{*} \), \( H_{2} :\{ 0,1\}^{*} \to \{ 0,1\}^{*} \) and \( H_{3} :\{ 0,1\}^{*} \to \{ 0,1\}^{k} \) be collision-resistant hash functions. Let \( z \in \varvec{\mathbb{N}} \) be the initial size of the free list, and \( {\mathbf{0}} \) be a series of 0’s. Choose bilinear pairing parameters \( (p,\varvec{\mathbb{G}},\mathcal{\mathcal{G}},e,g) \).

\( {\mathbf{Gen}}\text{(1}^{\text{k}} \text{)}: \) Randomly choose three \( k{\text{ - bit}} \) strings \( K_{1} ,K_{2} ,K_{3} \) and generate \( K_{4} \leftarrow \Uppi . {\text{Gen}}(1^{k} ) \). Choose \( s \in \varvec{\mathbb{Z}}_{p}^{*} \) at random and output \( K = (K_{1} ,K_{2} ,K_{3} ,K_{4} ,s) \) as the private keys. Compute \( (g,g^{s} ,g^{{s^{2} }} , \ldots ,g^{{s^{q} }} ) \) as public parameters where \( q \) should be large enough, i.e., should at least satisfy \( q \ge \hbox{max} \{ \# {\mathbf{f}}_{w} \}_{{w \in {\mathbf{w}}}} \).

\( {\mathbf{Setup}}{(K,\delta ,f):} \)

  1. 1.

Let \( {\text{A}}_{s} \) and \( {\text{A}}_{d} \) be arrays of size \( |{\mathbf{c}}|/8 + z \) and let \( {\text{T}}_{s} \) and \( {\text{T}}_{d} \) be dictionaries of size \( \# {\mathbf{w}} \) and \( \# {\mathbf{f}} \), respectively. Use "free" to represent a \( k{\text{ - length}} \) word not in \( {\mathbf{w}} \). The following steps 2 and 3 should be performed synchronously to set up \( {\text{A}}_{s} \) and \( {\text{A}}_{d} \) at the same time.

  2. 2.

    For every keyword \( w \in {\mathbf{w}} \),

    • Generate a list \( {\text{L}}_{w} \) of \( \# {\mathbf{f}}_{w} \) nodes \( ({\text{N}}_{1} , \ldots ,{\text{N}}_{{\# {\mathbf{f}}_{w} }} ) \) randomly stored in \( {\text{A}}_{s} \), set \( {\text{N}}_{i} = \left( {\left\langle {id_{i} ,{\text{addr}}_{s} ({\text{N}}_{i - 1} ),{\text{addr}}_{s} ({\text{N}}_{i + 1} )} \right\rangle \oplus H_{1} (P_{{K_{3} }} (w),r_{i} ),r_{i} } \right) \), where \( id_{i} \) is the identity of the \( i{\text{th}} \) file in \( {\mathbf{f}}_{w} \), \( r_{i} \) is a \( k{\text{ - bit}} \) random string, and \( {\text{addr}}_{s} ({\text{N}}_{{\# {\mathbf{f}}_{w} + 1}} ) = \) \( {\text{addr}}_{s} ({\text{N}}_{0} ) = {\mathbf{0}}^{{\log \# {\text{A}}_{s} }} \)

    • Set \( {\text{T}}_{s} \left[ {F_{{K_{1} }} (w)} \right] = \left\langle {{\text{addr}}_{s} ({\text{N}}_{1} ),{\text{addr}}_{d} ({\text{N}}^{\prime}_{1} )} \right\rangle \oplus G_{{K_{2} }} (w) \), where \( {\text{N}}^{\prime}_{1} \) is the dual of \( {\text{N}}_{1} \), which has the same \( (f_{1} ,w) \) pair as node \( {\text{N}}_{1} \).

  3. 3.

    For each file \( f \) in \( {\mathbf{f}} \),

    • Create a list \( {\text{L}}_{f} \) of \( \# f \) dual nodes \( ({\text{D}}_{1} , \ldots ,{\text{D}}_{\# f} ) \) randomly stored in the deletion array \( {\text{A}}_{d} \). Each node \( {\text{D}}_{i} \) is associated with a word \( w \), and a corresponding node \( {\text{N}} \) in \( {\text{L}}_{w} \). Let \( {\text{N}}_{ + 1} \) be the node after \( {\text{N}} \) in \( {\text{L}}_{w} \), and \( {\text{N}}_{ - 1} \) be the node before \( {\text{N}} \) in \( {\text{L}}_{w} \). Define \( {\text{D}}_{i} \) as:

$$ {\text{D}}_{i} = \left( {\left\langle {\begin{array}{*{20}c} {{\text{addr}}_{d} ({\text{D}}_{i + 1} ),{\text{addr}}_{d} ({\text{N}}^{\prime}_{ - 1} ),{\text{addr}}_{d} ({\text{N}}^{\prime}_{ + 1} ),} \\ {{\text{addr}}_{s} ({\text{N}}),{\text{addr}}_{s} ({\text{N}}_{ - 1} ),{\text{addr}}_{s} ({\text{N}}_{ + 1} )} \\ \end{array} } \right\rangle \oplus H_{2} (P_{{K_{3} }} (f),r^{\prime}_{i} ),r^{\prime}_{i} } \right), $$

where \( r^{\prime}_{i} \) is a random \( k{\text{ - bit}} \) string, \( {\text{addr}}_{d} ({\text{D}}_{\# f + 1} ) = {\mathbf{0}}^{{\log \# {\text{A}}_{d} }} \)

  • Store a pointer to the first node of \( {\text{L}}_{f} \) in the deletion table by setting:

$$ {\text{T}}_{d} \left[ {F_{{K_{1} }} (f)} \right] = {\text{addr}}_{d} ({\text{D}}_{1} ) \oplus G_{{K_{2} }} (f) $$
  4. 4.

    Generate the free list \( {\text{L}}_{\text{free}} \) by choosing \( z \) unused cells at random in \( {\text{A}}_{s} \) and in \( {\text{A}}_{d} \). Let \( ({\text{F}}_{1} , \ldots ,{\text{F}}_{z} ) \) and \( ({\text{F}}^{\prime}_{1} , \ldots ,{\text{F}}^{\prime}_{z} ) \) be the free nodes in \( {\text{A}}_{s} \) and \( {\text{A}}_{d} \), respectively. Set: \( {\text{T}}_{s} \left[ {\text{free}} \right] = \left\langle {{\text{addr}}_{s} ({\text{F}}_{1} ),{\mathbf{0}}^{{\log \# {\text{A}}_{s} }} } \right\rangle \), and for \( 1 \le i \le z \), set \( {\text{A}}_{s} \left[ {{\text{addr}}_{s} ({\text{F}}_{i} )} \right] = \) \( \left\langle {{\mathbf{0}}^{{\log \# {\mathbf{f}}}} ,{\text{addr}}_{s} ({\text{F}}_{i + 1} ),{\text{addr}}_{d} ({\text{F}}^{\prime}_{i} ),{\mathbf{0}}^{k} } \right\rangle \), where \( {\text{addr}}_{s} ({\text{F}}_{z + 1} ) = {\mathbf{0}}^{{\log \# {\text{A}}_{s} }} \).

  5. 5.

    Fill the remaining entries of \( {\text{A}}_{s} \) and \( {\text{A}}_{d} \) with random strings.

  6. 6.

    For \( 1 \le i \le \# {\mathbf{f}} \), let \( c_{i} \leftarrow \Uppi . {\text{Enc}}_{{K_{4} }} (f_{i} ) \).

  7. 7.

    For all \( w \in {\mathbf{w}} \), form the leaf node by letting \( \theta_{w} = \left\langle {F_{{K_{1} }} (w),g^{{\prod\nolimits_{{f \in {\mathbf{f}}_{w} }} {\left( {s + id(f)} \right)} }} } \right\rangle \).

    Construct a Merkle tree using \( H_{3} \) with leaves \( \mathcal{\mathcal{L}} = \{ \theta_{w} \}_{{w \in {\mathbf{w}}}} \) permuted in a random order.

  8. 8.

    Output \( (\gamma ,{\mathbf{c}},st,\alpha ) \), where \( \gamma = ({\text{A}}_{s} ,{\text{T}}_{s} ,{\text{A}}_{d} ,{\text{T}}_{d} ) \), \( {\mathbf{c}} = (c_{1} , \ldots , \) \( c_{{\# {\mathbf{f}}}} ) \), \( st \) is the root of the tree, and \( \alpha \) is the tree itself.
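The node masking in step 2 of Setup can be sketched as follows, with HMAC-SHA256 standing in for the PRF \( P_{K_{3}} \) and SHA-256 for \( H_{1} \); the byte layout of the payload \( \left\langle id, {\text{addr}}_{s}({\text{N}}_{i-1}), {\text{addr}}_{s}({\text{N}}_{i+1}) \right\rangle \) and all names are our assumptions.

```python
import hashlib
import hmac
import os

def prf(key: bytes, msg: bytes) -> bytes:          # stands in for P_{K3}
    return hmac.new(key, msg, hashlib.sha256).digest()

def mask(seed: bytes, r: bytes, n: int) -> bytes:  # stands in for H1
    return hashlib.sha256(seed + r).digest()[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def enc_node(k3: bytes, w: bytes, payload: bytes):
    """Mask <id, addr_prev, addr_next> under the keyword's PRF value."""
    r = os.urandom(16)                             # fresh per-node randomness
    return (xor(payload, mask(prf(k3, w), r, len(payload))), r)

def dec_node(token: bytes, node):
    """A holder of P_{K3}(w) (part of the search token) can unmask."""
    v, r = node
    return xor(v, mask(token, r, len(v)))

k3 = os.urandom(32)
# toy payload: 4-byte file id 42, prev address 0, next address 7
payload = b"".join(n.to_bytes(4, "big") for n in (42, 0, 7))
node = enc_node(k3, b"w2", payload)
assert dec_node(prf(k3, b"w2"), node) == payload   # correct token unmasks
assert dec_node(prf(k3, b"w9"), node) != payload   # wrong token fails
```

Because the random string \( r_{i} \) is stored beside each masked node, two nodes of the same list are masked with independent pads.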

\( {\mathbf{SrchToken}}(K,W) \): For \( W = (w_{1} , \ldots ,w_{n} ) \), compute each \( \tau_{i} = (F_{{K_{1} }} (w_{i} ),G_{{K_{2} }} (w_{i} ), \) \( P_{{K_{3} }} (w_{i} )) \), then output \( \tau_{s} = \left( {\tau_{1} , \ldots \tau_{n} } \right) \).

\( {\mathbf{Search}}(\alpha ,\gamma ,{\mathbf{c}},\tau_{s} ): \)

  1. 1.

    For each \( \tau_{i} \) in \( \tau_{s} \), parse \( \tau_{i} \) as \( (\tau_{i,1} ,\tau_{i,2} ,\tau_{i,3} ) \),

    • Recover a pointer to the first node of the list by computing \( (\alpha_{1} ,\alpha^{\prime}_{1} ) = {\text{T}}_{s} [\tau_{i,1} ] \oplus \tau_{i,2} \).

    • Look up the node \( {\text{N}}_{1} = {\text{A}}_{s} [\alpha_{1} ] \) and decrypt it using \( \tau_{i,3} \), i.e., parse \( {\text{N}}_{1} \) as \( (v_{1} ,r_{1} ) \) and compute \( (id_{1} ,{\mathbf{0}},{\text{addr}}_{s} ({\text{N}}_{2} )) = v_{1} \oplus H_{1} (\tau_{i,3} ,r_{1} ) \). Let \( \alpha_{2} = {\text{addr}}_{s} ({\text{N}}_{2} ) \).

    • For \( j \ge 2 \), decrypt node \( {\text{N}}_{j} \) as above until \( \alpha_{j + 1} = {\mathbf{0}} \).

    • Let \( S_{i} = \{ id_{1} , \ldots ,id_{t} \} \) be the file identifiers revealed in the previous steps.

  2. 2.

    For the sets \( S_{1} , \cdots ,S_{n} \) generated in step 1, let \( {\mathbf{I}}_{W} = \{ id_{1} , \ldots ,id_{m} \} \) be the intersection, i.e., \( {\mathbf{I}}_{W} = S_{1} \cap S_{2} \cap \ldots \cap S_{n} \). Compute the proofs in the following steps:

    • For \( 1 \le i \le n \), find the leaf \( \theta_{i} \) in \( \alpha \) whose first element is \( \tau_{i,1} \) and generate the proof \( t_{i} \), which includes \( \theta_{i} \) and all the sibling nodes on the path from the leaf \( \theta_{i} \) to the root. Let \( {\mathcal{T}} = \{ t_{1} , \ldots ,t_{n} \} \).

    • For \( 1 \le i \le n \), form the polynomial: \( P_{i} = \prod\nolimits_{{f \in S_{i} - {\mathbf{I}}_{W} }} {\left( {s + id(f)} \right)} \),

      then use the public parameters \( (g,g^{s} ,g^{{s^{2} }} , \ldots ,g^{{s^{q} }} ) \) to compute the value \( g^{{P_{i} }} \). Let \( {\mathcal{S}} = \{ g^{{P_{1} }} , \ldots ,g^{{P_{n} }} \} \) be the subset witness.

    • Given the polynomials \( \{ P_{1} , \ldots ,P_{n} \} \) generated in step 2, find polynomials \( \{ q_{1} , \ldots ,q_{n} \} \) satisfying \( q_{1} P_{1} + q_{2} P_{2} + \cdots + q_{n} P_{n} = 1 \). This can be done using the extended Euclidean algorithm over polynomials. Let \( {\mathcal{C}} = \{ g^{{q_{1} }} , \ldots ,g^{{q_{n} }} \} \) be the completeness witness.

  3. 3.

    Output the result \( {\mathbf{I}}_{W} \) and the proof \( \pi = \{ {\mathcal{T}},{\mathcal{S}},{\mathcal{C}}\} \).

\( {\mathbf{Verify}}(K,st,\tau_{s} ,{\mathbf{I^{\prime}}} ,\pi ): \)

  1. 1.

    Parse \( \pi \) as \( \{ {\mathcal{T}},{\mathcal{S}},{\mathcal{C}}\} \) and verify these proofs in the following steps:

    • For each proof \( t_{i} \) in \( {\mathcal{T}} \), let \( \theta_{i} \) be the corresponding leaf node in \( t_{i} \). Parse \( \theta_{i} \) as \( (\theta_{i,1} ,\theta_{i,2} ) \), i.e., \( \theta_{i,1} = F_{{K_{1} }} (w_{i} ) \) and \( \theta_{i,2} = g^{{\prod\nolimits_{{f \in {\mathbf{f}}_{{w_{i} }} }} {\left( {s + id(f)} \right)} }} \). Verify that the value \( \theta_{i,1} \) equals \( \tau_{i,1} \), where \( \tau_{i,1} \) is the first element of \( \tau_{i} \) in \( \tau_{s} \). Then verify the proof \( t_{i} \) using the root \( st \).

    • For \( 1 \le i \le n \), parse the leaf node \( \theta_{i} \) as \( (\theta_{i,1} ,\theta_{i,2} ) \), then perform the subset condition verification by checking: \( e(g^{{\prod\nolimits_{k = 1}^{m} {(s + id_{k} )} }} ,g^{{P_{i} }} )\mathop = \limits^{?} e(\theta_{i,2} ,g) \), where \( (id_{1} , \ldots ,id_{m} ) \) are from \( {\mathbf{I^{\prime}}} \) and \( g^{{P_{i} }} \) is the corresponding element of \( \mathcal{S} \).

    • Verify the completeness condition by checking: \( \prod\nolimits_{i = 1}^{n} {e(g^{{P_{i} }} ,g^{{q_{i} }} )} \mathop = \limits^{?} e(g,g) \), where \( g^{{P_{i} }} \) is the element of \( \mathcal{S} \) and \( g^{{q_{i} }} \) is the corresponding element of \( {\mathcal{C}} \).

  2. 2.

    If all the verifications succeed, then output 1, otherwise output 0.

\( {\mathbf{Dec}}(K,c) \): Output \( f = \Uppi . {\text{Dec}}_{{K_{4} }} (c) \).

\( {\mathbf{Add/Update}} (U:K,\delta_{f} ,f,st;\;S:\alpha ,\gamma ,{\mathbf{c}}) \):

User:

Recover the unique sequence of words \( (w_{1} , \ldots ,w_{\# f} ) \) from \( \delta_{f} \), compute the set \( \{ F_{{K_{1} }} (w_{i} )\}_{1 \le i \le \# f} \) and send it to the server.

Server:

  1.

    For \( 1 \le i \le \# f \), traverse the Merkle tree \( \alpha \) and:

    • Find the leaf \( \theta_{i} \) in \( \alpha \) whose first element is \( F_{{K_{1} }} (w_{i} ) \).

    • Let \( t_{i} \) be the proof in \( \alpha \) from \( \theta_{i} \) to the root. The proof includes the leaf \( \theta_{i} \), and all the sibling nodes from \( \theta_{i} \) to the root.

  2.

    Let \( \rho = (t_{1} , \ldots ,t_{\# f} ) \) and send it to the user.

User:

  1.

    Verify the proofs in \( (t_{1} , \ldots ,t_{\# f} ) \) using \( st \); if any verification fails, output \( \bot \) and terminate.

  2.

    For \( 1 \le i \le \# f \),

    • Let \( \theta_{i} \) be the leaf in \( t_{i} \), parse \( \theta_{i} \) as \( (\theta_{i,1} ,\theta_{i,2} ) \).

    • Compute the new leaf node \( \theta^{\prime}_{i} = (\theta_{i,1} ,(\theta_{i,2} )^{s + id(f)} ) \).

  3.

    Update the root hash \( st \) using \( (\theta^{\prime}_{1} , \ldots ,\theta^{\prime}_{\# f} ) \) and the information in \( (t_{1} , \ldots ,t_{\# f} ) \).

  4.

    Compute \( \tau_{a} = (F_{{K_{1} }} (f),G_{{K_{2} }} (f),\lambda_{1} , \ldots \lambda_{\# f} ) \), where for all \( 1 \le i \le \# f \):

$$ \lambda_{i} = \left( {\begin{array}{*{20}l} {\theta_{i,1}^{\prime } ,\theta_{i,2}^{\prime } ,G_{{K_{2} }} \left( {w_{i} } \right),\left\langle {id\left( f \right),{\mathbf{0}},{\mathbf{0}}} \right\rangle \oplus H_{1} \left( {P_{{K_{3} }} \left( {w_{i} } \right),r_{i} } \right),} \hfill \\ {r_{i} ,\left\langle {{\mathbf{0}},{\mathbf{0}},{\mathbf{0}},{\mathbf{0}},{\mathbf{0}},{\mathbf{0}}} \right\rangle \oplus H_{2} \left( {P_{{K_{3} }} \left( f \right),r_{i}^{\prime } } \right),r_{i}^{\prime } } \hfill \\ \end{array} } \right), $$

where \( r_{i} \) and \( r^{\prime}_{i} \) are random \( k \)-bit strings.

  5.

    Let \( c_{f} \leftarrow \Pi . {\text{Enc}}_{{K_{4} }} (f) \) and send \( (\tau_{a} ,c_{f} ) \) to the server, then output the new root \( st^{\prime} \).
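The leaf update in the user's step 2 raises the accumulator component \( \theta_{i,2} \) to \( s + id(f) \); the matching deletion in the Del/Update protocol raises it to the inverse of \( s + id(f) \) modulo the group order. The sketch below replays this exponent arithmetic in a small prime-order subgroup of \( \mathbb{Z}_p^* \); all parameters are illustrative toys, not the scheme's bilinear group.

```python
# Toy accumulator exponent arithmetic in a prime-order subgroup of Z_p*.
# p, q, g, s and the ids are illustrative parameters, not the scheme's.
p, q, g = 23, 11, 2          # g = 2 has multiplicative order q = 11 mod 23
s = 7                        # accumulator trapdoor (never known to the server)
ids = [3, 5]                 # file identifiers currently indexed under w

# Leaf component theta_{i,2} = g^{prod (s + id(f))}, exponent taken mod q.
exp = 1
for i in ids:
    exp = exp * (s + i) % q
theta2 = pow(g, exp, p)

# Add/Update: the user raises the leaf to (s + id(f)) for a new file, id 2.
theta2_added = pow(theta2, (s + 2) % q, p)

# Del/Update: raising to the inverse of (s + id(f)) mod q undoes the add.
inv = pow((s + 2) % q, -1, q)
assert pow(theta2_added, inv, p) == theta2
```

Because the exponent is a product over the current file set, add and delete commute and the leaf always equals \( g^{\prod_{f \in \mathbf{f}_w}(s + id(f))} \) regardless of update order.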

Server:

  1.

    Parse \( \tau_{a} \) as \( (\tau_{1} ,\tau_{2} ,\lambda_{1} , \ldots ,\lambda_{\# f} ) \) and return \( \bot \) if \( \tau_{1} \) is already in \( {\text{T}}_{d} \).

  2.

    For \( 1 \le i \le \# f \),

    • Find the first free location \( \varphi \) in \( {\text{A}}_{s} \), the second free location \( \varphi_{ + 1} \) in \( {\text{A}}_{s} \), the first free location \( \varphi^{\prime} \) in \( {\text{A}}_{d} \), and the second free location \( \varphi^{\prime}_{ + 1} \) in \( {\text{A}}_{d} \), by computing \( (\varphi ,{\mathbf{0}}) = {\text{T}}_{s} [{\text{free}}] \), \( ({\mathbf{0}},\varphi_{ + 1} ,\varphi^{\prime}) = {\text{A}}_{s} [\varphi ] \) and \( ({\mathbf{0}},\varphi_{ + 2} ,\varphi^{\prime}_{ + 1} ) = {\text{A}}_{s} [\varphi_{ + 1} ] \).

    • Update the search table by setting \( {\text{T}}_{s} [{\text{free}}] = (\varphi_{ + 1} ,{\mathbf{0}}) \).

    • Recover \( {\text{N}}_{1} \)’s address \( \alpha_{1} \) by computing \( (\alpha_{1} ,\alpha^{\prime}_{1} ) = {\text{T}}_{s} [\lambda_{i} [1]] \oplus \lambda_{i} [3] \).

    • Parse \( {\text{N}}_{1} = {\text{A}}_{s} [\alpha_{1} ] \) as \( (v_{1} ,r_{1} ) \), then update \( {\text{N}}_{1} \)’s back pointer by setting: \( {\text{A}}_{s} [\alpha_{1} ] = \left( {v_{1} \oplus \langle {\mathbf{0}},\varphi ,{\mathbf{0}}\rangle ,r_{1} } \right) \).

    • Store the new node at location \( \varphi \) and modify its forward pointer to \( {\text{N}}_{1} \) by setting:\( {\text{A}}_{s} [\varphi ] = \left( {\lambda_{i} [4] \oplus \langle {\mathbf{0}},{\mathbf{0}},\alpha_{1} \rangle ,\lambda_{i} [5]} \right) \) .

    • Update the search table by setting:\( {\text{T}}_{s} \left[ {\lambda_{i} [1]} \right] = (\varphi ,\varphi^{\prime}) \oplus \lambda_{i} [3] \) .

    • Parse \( {\text{D}}_{1} = {\text{A}}_{d} [\alpha^{\prime}_{1} ] \) as \( (v^{\prime}_{1} ,r^{\prime}_{1} ) \), set \( {\text{A}}_{d} [\alpha^{\prime}_{1} ] = (v^{\prime}_{1} \oplus \langle {\mathbf{0}},\varphi^{\prime},{\mathbf{0}},{\mathbf{0}},\varphi ,{\mathbf{0}}\rangle ,r^{\prime}_{1} ) \).

    • If \( i < \# f \), set \( {\text{A}}_{d} [\varphi^{\prime}] = \left( {\lambda_{i} [6] \oplus \langle \varphi^{\prime}_{ + 1} ,{\mathbf{0}},\alpha^{\prime}_{1} ,\varphi ,{\mathbf{0}},\alpha_{1} \rangle ,\lambda_{i} [7]} \right) \).

    • If \( i = \# f \), set \( {\text{A}}_{d} [\varphi^{\prime}] = \left( {\lambda_{i} [6] \oplus \langle {\mathbf{0}},{\mathbf{0}},\alpha^{\prime}_{1} ,\varphi ,{\mathbf{0}},\alpha_{1} \rangle ,\lambda_{i} [7]} \right) \).

    • If \( i = 1 \), then update the deletion table by setting \( {\text{T}}_{d} [\tau_{1} ] = \varphi^{\prime} \oplus \tau_{2} \).

  3.

    Update the ciphertexts by adding \( c_{f} \) to \( {\mathbf{c}} \).

  4.

    Let \( \theta^{\prime}_{i} = (\lambda_{i} [1],\lambda_{i} [2]) \), update the tree \( \alpha \) by replacing the leaves \( (\theta_{1} , \ldots ,\theta_{\# f} ) \) with \( (\theta^{\prime}_{1} , \ldots ,\theta^{\prime}_{\# f} ) \).

  5.

    Output \( (\alpha^{\prime},\gamma^{\prime},{\mathbf{c^{\prime}}}) \), where \( \alpha^{\prime} \) is the updated tree, \( \gamma^{\prime} \) is the updated index and \( {\mathbf{c^{\prime}}} \) is the updated ciphertext set.
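The node format manipulated in the server steps above, \( (\langle id,{\text{addr}}_{prev},{\text{addr}}_{next}\rangle \oplus H_{1}(P_{K_{3}}(w),r),\;r) \), is one-time-pad style masking: XOR with a hash-derived pad hides the fields, and XORing a correction into the stored value updates a single pointer field without unmasking the cell. A minimal sketch, with SHA-256 as a stand-in for \( H_{1} \) and a made-up one-byte-per-field encoding:

```python
import hashlib
import os

def h1(key: bytes, r: bytes, n: int) -> bytes:
    # Stand-in for the random oracle H1, truncated to n bytes.
    return hashlib.sha256(key + r).digest()[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def seal(fields: bytes, word_key: bytes) -> tuple:
    """Mask <id, prev, next> as fields XOR H1(P_K3(w), r); return (masked, r)."""
    r = os.urandom(16)
    return xor(fields, h1(word_key, r, len(fields))), r

def unseal(masked: bytes, r: bytes, word_key: bytes) -> bytes:
    return xor(masked, h1(word_key, r, len(masked)))

word_key = b"output of P_K3(w) (hypothetical)"
fields = bytes([7, 0, 2, 0, 5])      # toy packing of <id, prev, next>
masked, r = seal(fields, word_key)
assert unseal(masked, r, word_key) == fields

# The server can splice a pointer without the key: XORing (old ^ new) into
# the masked cell rewrites that field, as in the A_s[alpha_1] updates above.
patched = xor(masked, bytes([0, 0, 2 ^ 9, 0, 0]))
assert unseal(patched, r, word_key) == bytes([7, 0, 9, 0, 5])
```

This is why the server-side steps can repair the doubly linked lists by XORing address corrections into cells it cannot read.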

\( {\mathbf{Del/Update}} (U:K,\delta_{f} ,f,st;\;S:\alpha ,\gamma ,{\mathbf{c}}): \)

User:

Recover the unique sequence of words \( (w_{1} , \ldots ,w_{\# f} ) \) from \( \delta_{f} \), compute the set \( \{ F_{{K_{1} }} (w_{i} )\}_{1 \le i \le \# f} \) and send it to the server.

Server:

  1.

    For \( 1 \le i \le \# f \), traverse the Merkle tree \( \alpha \) and:

    • Find the leaf \( \theta_{i} \) in \( \alpha \) whose first element is \( F_{{K_{1} }} (w_{i} ) \).

    • Let \( t_{i} \) be the proof in \( \alpha \) from \( \theta_{i} \) to the root. The proof includes the leaf \( \theta_{i} \), and all the sibling nodes from \( \theta_{i} \) to the root.

  2.

    Let \( \rho = (t_{1} , \ldots ,t_{\# f} ) \) and send it to the user.

User:

  1.

    Verify the proofs in \( (t_{1} , \ldots ,t_{\# f} ) \) using \( st \); if any verification fails, output \( \bot \) and terminate.

  2.

    For \( 1 \le i \le \# f \),

    • Let \( \theta_{i} \) be the leaf in \( t_{i} \), parse \( \theta_{i} \) as \( (\theta_{i,1} ,\theta_{i,2} ) \).

    • Compute the new leaf node \( \theta^{\prime}_{i} = (\theta_{i,1} ,(\theta_{i,2} )^{1/(s + id(f))} ) \).

  3.

    Update the root hash \( st \) using \( (\theta^{\prime}_{1} , \ldots ,\theta^{\prime}_{\# f} ) \) and the information in \( (t_{1} , \ldots ,t_{\# f} ) \).

  4.

    Compute \( \tau_{d} = (F_{{K_{1} }} (f),G_{{K_{2} }} (f),P_{{K_{3} }} (f),id(f),\theta^{\prime}_{1} , \ldots ,\theta^{\prime}_{\# f} ) \).

  5.

    Send \( \tau_{d} \) to the server, then output the new root \( st^{\prime} \).

Server:

  1.

    Parse \( \tau_{d} \) as \( (\tau_{1} ,\tau_{2} ,\tau_{3} ,id,\theta^{\prime}_{1} , \ldots ,\theta^{\prime}_{\# f} ) \).

  2.

    Find the first node of \( {\text{L}}_{f} \) by computing \( \alpha^{\prime}_{i} = {\text{T}}_{d} [\tau_{1} ] \oplus \tau_{2} \).

  3.

    While \( \alpha^{\prime}_{i} \ne {\mathbf{0}} \),

    • Parse \( {\text{D}}_{i} = {\text{A}}_{d} [\alpha^{\prime}_{i} ] \) as \( (v^{\prime}_{i} ,r^{\prime}_{i} ) \), decrypt \( {\text{D}}_{i} \) by computing \( (\alpha_{1} , \ldots ,\alpha_{6} ) = v^{\prime}_{i} \oplus \) \( H_{2} (\tau_{3} ,r^{\prime}_{i} ) \).

    • Delete \( {\text{D}}_{i} \) by setting \( {\text{A}}_{d} [\alpha^{\prime}_{i} ] \) to a random string.

    • Find the address of the first free node by computing \( (\varphi ,{\mathbf{0}}) = {\text{T}}_{s} [{\text{free}}] \).

    • Update the head of the free list in \( {\text{T}}_{s} \) to point to \( {\text{D}}_{i} \)’s dual by setting \( {\text{T}}_{s} [{\text{free}}] = (\alpha_{4} ,{\mathbf{0}}) \).

    • Free \( {\text{D}}_{i} \)’s dual by setting \( {\text{A}}_{s} [\alpha_{4} ] = ({\mathbf{0}},\varphi ,\alpha^{\prime}_{i} ) \).

    • Let \( {\text{N}}_{ - 1} \) be the node before \( {\text{D}}_{i} \)’s dual. Update \( {\text{N}}_{ - 1} \)’s next pointer by setting \( {\text{A}}_{s} [\alpha_{5} ] = (\beta_{1} ,\beta_{2} ,\beta_{3} \oplus \alpha_{4} \oplus \alpha_{6} ,r_{ - 1} ) \), where \( (\beta_{1} ,\beta_{2} ,\beta_{3} ,r_{ - 1} ) = {\text{A}}_{s} [\alpha_{5} ] \). Also, update the pointers of \( {\text{N}}_{ - 1} \)’s dual by setting:\( {\text{A}}_{d} \left[ {\alpha_{2} } \right] = (\beta_{1} ,\beta_{2} ,\beta_{3} \oplus \alpha^{\prime}_{i} \oplus \alpha_{3} ,\beta_{4} ,\beta_{5} ,\beta_{6} \) \( \oplus \alpha_{4} \oplus \alpha_{6} ,r^{\prime}_{ - 1} ) \),where \( (\beta_{1} , \ldots ,\beta_{6} ,r^{\prime}_{ - 1} ) = {\text{A}}_{d} [\alpha_{2} ] \).

    • Let \( {\text{N}}_{ + 1} \) be the node after \( {\text{D}}_{i} \)’s dual. Update \( {\text{N}}_{ + 1} \)’s previous pointer by setting \( {\text{A}}_{s} [\alpha_{6} ] = (\beta_{1} ,\beta_{2} \oplus \alpha_{4} \oplus \alpha_{5} ,\beta_{3} ,r_{ + 1} ) \), where \( (\beta_{1} ,\beta_{2} ,\beta_{3} ,r_{ + 1} ) = {\text{A}}_{s} [\alpha_{6} ] \). Also, update \( {\text{N}}_{ + 1} \)’s dual’s pointers by setting: \( {\text{A}}_{d} \left[ {\alpha_{3} } \right] = (\beta_{1} ,\beta_{2} \oplus \alpha^{\prime}_{i} \oplus \alpha_{2} ,\beta_{3} ,\beta_{4} ,\beta_{5} \oplus \alpha_{4} \oplus \alpha_{5} , \) \( \beta_{6} ,r^{\prime}_{ + 1} ) \), where \( (\beta_{1} , \ldots ,\beta_{6} ,r^{\prime}_{ + 1} ) = {\text{A}}_{d} [\alpha_{3} ] \).

    • Set \( \alpha^{\prime}_{i} = \alpha_{1} \).

  4.

    Remove the ciphertext corresponding to \( id \) from \( {\mathbf{c}} \).

  5.

    Remove \( \tau_{1} \) from \( {\text{T}}_{d} \).

  6.

    Update the tree \( \alpha \) by replacing the leaves \( (\theta_{1} , \ldots ,\theta_{\# f} ) \) with \( (\theta^{\prime}_{1} , \ldots ,\theta^{\prime}_{\# f} ) \).

  7.

    Output \( (\alpha^{\prime},\gamma^{\prime},{\mathbf{c^{\prime}}}) \), where \( \alpha^{\prime} \) is the updated tree, \( \gamma^{\prime} \) is the updated index and \( {\mathbf{c^{\prime}}} \) is the updated ciphertext set.
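Step 2 of the server's deletion recovers the head of \( {\text{L}}_{f} \) by unmasking \( {\text{T}}_{d}[\tau_{1}] \) with \( \tau_{2} \). A small sketch of this masked-table lookup, using HMAC-SHA256 as a stand-in for the PRFs \( F \) and \( G \) (the keys, file name and 32-byte address encoding are illustrative):

```python
import hashlib
import hmac

def prf(key: bytes, msg: bytes) -> bytes:
    # HMAC-SHA256 as a stand-in for the PRFs F_K1 and G_K2.
    return hmac.new(key, msg, hashlib.sha256).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

K1, K2 = b"K1 (toy)", b"K2 (toy)"
f = b"file-17"
head = (42).to_bytes(32, "big")       # address of L_f's first node in A_d (toy)

# Setup stores T_d[F_K1(f)] = head XOR G_K2(f).
Td = {prf(K1, f): xor(head, prf(K2, f))}

# The delete token carries tau_1 = F_K1(f) and tau_2 = G_K2(f);
# the server unmasks the head address and learns nothing about other files.
tau1, tau2 = prf(K1, f), prf(K2, f)
recovered = xor(Td[tau1], tau2)
assert int.from_bytes(recovered, "big") == 42
```

Without \( \tau_{2} \), the table entry is indistinguishable from random, which is exactly the property the security proof in the next section relies on.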

4 Security Analysis

4.1 Dynamic CKA2 Security

In the following, we analyze our dynamic MSE scheme and examine what information is leaked during the execution of its algorithms and protocols. The formal definition is given afterwards.

In our scheme, for each word \( w_{i} \), the value \( F_{{K_{1} }} (w_{i} ) \) can be treated as a unique identifier, and we denote it by \( id(w_{i} ) \). For each file \( f_{i} \), there are two identifiers, the \( id(f_{i} ) \) in the array \( {\text{A}}_{s} \) and the \( F_{{K_{1} }} (f_{i} ) \) in the table \( {\text{T}}_{d} \). Both of them can uniquely represent a file, so for convenience, we do not distinguish between them.

Given the encrypted index \( \gamma = ({\text{T}}_{s} ,{\text{A}}_{s} ,{\text{T}}_{d} ,{\text{A}}_{d} ) \), the Merkle tree \( \alpha \) and the ciphertexts \( {\mathbf{c}} \), the server can learn the size of \( {\text{A}}_{s} \), the set \( [id(w)]_{{w \in {\mathbf{w}}}} \) from \( {\text{T}}_{s} \), the set \( [id(f)]_{{f \in {\mathbf{f}}}} \) and the length of each file \( [|f|]_{{f \in {\mathbf{f}}}} \). We denote these by \( {\mathcal{L}}_{1} \), i.e.,

$$ \mathcal{\mathcal{L}}_{1} (\delta ,{\mathbf{f}}) = \left( {\# {\text{A}}_{s} ,\left[ {id(w)} \right]_{{w \in {\mathbf{w}}}} ,\left[ {id(f)} \right]_{{f \in {\mathbf{f}}}} ,\left[ {\left| f \right|} \right]_{{f \in {\mathbf{f}}}} } \right). $$

The search operation reveals to the server \( id(w) \) for all \( w \in W \), and the relationship between \( id(w) \) and the identifiers of all files that contain \( w \). We denote these by \( {\mathcal{L}}_{2} \), i.e.,

$$ \mathcal{\mathcal{L}}_{2} (\delta ,{\mathbf{f}},W) = \left( {[id(f)]_{{f \in {\mathbf{f}}_{w} }} ,id(w)} \right)_{{{\text{for all }}w \in W}}. $$

In the add protocol, the server can learn the identifier of the file to be added, the length of the file, and the identifiers of the words that belong to the file. In addition, it can tell whether a word \( w \) contained in the file is new by checking the table \( {\text{T}}_{s} \). We denote these by \( {\mathcal{L}}_{3} \), i.e.,

$$ \mathcal{\mathcal{L}}_{3} (\delta ,{\mathbf{f}},f) = \left( {id(f),\left[ {id(w),{\text{apprs}}(w)} \right]_{{w \in {\mathbf{w}}_{f} }} ,\left| f \right|} \right), $$

where \( {\text{apprs}}(w) \) is a one-bit flag set to 1 if the word \( w \) exists in the index before the file \( f \) is added; otherwise, it is set to 0.

Similarly, in the delete protocol, the server can learn the identifier of the file to be deleted, and the relationship between \( id(f) \) and those word identifiers. In addition, for each \( w \in {\mathbf{w}}_{f} \), by removing the word pair \( (f,w) \) from the list \( {\text{L}}_{w} \), the server learns the locations of the pair’s neighbors in \( {\text{L}}_{w} \). We denote these by \( {\mathcal{L}}_{4} \), i.e.,

$$ \mathcal{\mathcal{L}}_{4} (\delta ,{\mathbf{f}},f) = \left( {id(f),\left[ {id(w),{\text{prev}}(f,w),{\text{next}}(f,w)} \right]_{{w \in {\mathbf{w}}_{f} }} } \right), $$

where \( {\text{prev}}(f,w) \) and \( {\text{next}}(f,w) \) are the file identifiers of the files before and after \( f \) in the word list \( {\text{L}}_{w} \). For the head and the tail of the list, the corresponding value is set to \( \bot \) to indicate that there are no more nodes before or after this one.

Now we use the following theorem to claim that the construction in Sect. 4 is dynamic CKA2-secure in the random oracle model with the leakage functions described above.

Theorem 1.

If the private-key encryption system \( \varPi \) is CPA-secure, and \( F \), \( G \) and \( P \) are pseudo-random functions, then the dynamic MSE scheme is \( ({\mathcal{L}}_{1} ,{\mathcal{L}}_{2} ,{\mathcal{L}}_{3} ,{\mathcal{L}}_{4} ) \)-secure against adaptive chosen-keyword attacks in the random oracle model.

Proof:

The goal of the proof is to construct a PPT simulator \( {\mathcal{S}} \) that can generate the simulated values in the ideal game using only the information given by the leakage functions. Those simulated values should be indistinguishable from those in the real game to any PPT adversary.

Given the information received from \( {\mathcal{L}}_{1} \), the simulator can determine the length and the structure of the encrypted index \( \gamma \), the ciphertexts \( {\mathbf{c}} \) and the tree \( \alpha \). It can then use randomly chosen strings to construct these structures and output them as the simulated tuple \( (\tilde{\gamma },{\tilde{\mathbf{c}}},\tilde{\alpha }) \). If a PPT adversary can distinguish the tuple \( (\tilde{\gamma },{\tilde{\mathbf{c}}},\tilde{\alpha }) \) from \( (\gamma ,{\mathbf{c}},\alpha ) \) with non-negligible probability, then it can break at least one of the following with non-negligible probability: the CPA security of the encryption scheme, the pseudo-randomness of the PRFs, or the elliptic curve discrete logarithm assumption.

Given the information received from \( {\mathcal{L}}_{2} \), \( {\mathcal{L}}_{3} \) and \( {\mathcal{L}}_{4} \), the simulator must respond with simulated search, add and delete tokens during the adversary’s queries. These steps are more complex because the simulator needs to track the dependencies between the information revealed by the queries, to ensure consistency among the simulated tokens. We define additional assisting structures \( {\text{iA}}_{s} \), \( {\text{iA}}_{d} \), \( {\text{iT}}_{s} \) and \( {\text{iT}}_{d} \) on the simulator’s side to maintain consistency during updates. The simulator uses these assisting structures to record the dependencies revealed by \( {\mathcal{L}}_{2} \), \( {\mathcal{L}}_{3} \) and \( {\mathcal{L}}_{4} \) in the queries, and builds the internal relationships in \( {\text{iA}}_{s} \), \( {\text{iA}}_{d} \), \( {\text{iT}}_{s} \) and \( {\text{iT}}_{d} \), while the values in \( \tilde{\gamma } = ({\tilde{\text{A}}}_{s} ,{\tilde{\text{T}}}_{s} ,{\tilde{\text{A}}}_{d} ,{\tilde{\text{T}}}_{d} ) \) remain random. This gives the simulator the ability to respond to the adversary’s queries like a real user, except using the simulated values.

Analysis.

  1.

    If the pseudo-randomness of \( F \), \( G \) and \( P \) holds, then for all PPT adversaries \( {\mathcal{A}} \), there exists a negligible value \( \varepsilon_{1} \) such that:

    $$ \left| {{\mathbf{Pr}}\left[ {1 \leftarrow \mathcal{\mathcal{A}}\left( {\delta ,{\mathbf{f}},{\text{A}}_{s} ,{\text{T}}_{s} ,{\text{A}}_{d} ,{\text{T}}_{d} } \right)} \right] - {\mathbf{Pr}}\left[ {1 \leftarrow \mathcal{\mathcal{A}}\left( {\delta ,{\mathbf{f}},{\tilde{\text{A}}}_{s} ,{\tilde{\text{T}}}_{s} ,{\tilde{\text{A}}}_{d} ,{\tilde{\text{T}}}_{d} } \right)} \right]} \right| \le \varepsilon_{1} . $$

Each cell in \( {\tilde{\text{A}}}_{s} \) has the form \( \langle {\tilde{\text{N}}},\tilde{r}\rangle \), where \( \left| {{\tilde{\text{N}}}} \right| = 2\log \# {\text{A}}_{s} + \log \# {\mathbf{f}} \) and \( \left| {\tilde{r}} \right| = k \). The corresponding cell in \( {\text{A}}_{s} \) is \( {\text{N}}_{i} = \left( {\left\langle {id_{i} ,{\text{addr}}_{s} ({\text{N}}_{i - 1} ),{\text{addr}}_{s} ({\text{N}}_{i + 1} )} \right\rangle \oplus H_{1} (P_{{K_{3} }} (w),r_{i} ),r_{i} } \right) \). Due to the pseudo-randomness of \( P \) and the random oracle \( H_{1} \), no PPT adversary \( {\mathcal{A}} \) can distinguish \( {\tilde{\text{A}}}_{s} \) from \( {\text{A}}_{s} \). Similarly, it cannot distinguish \( {\text{T}}_{s} ,{\text{A}}_{d} ,{\text{T}}_{d} \) from \( {\tilde{\text{T}}}_{s} ,{\tilde{\text{A}}}_{d} ,{\tilde{\text{T}}}_{d} \) if the pseudo-randomness of \( F \), \( G \) and \( P \) holds. Hence the adversary cannot distinguish the real index \( {\text{A}}_{s} ,{\text{T}}_{s} ,{\text{A}}_{d} ,{\text{T}}_{d} \) from the simulated index \( {\tilde{\text{A}}}_{s} ,{\tilde{\text{T}}}_{s} ,{\tilde{\text{A}}}_{d} ,{\tilde{\text{T}}}_{d} \) except with negligible probability, so \( \varepsilon_{1} \) is negligible.

  2.

    Based on the elliptic curve discrete logarithm assumption and the pseudo-randomness of \( F \), no PPT adversary \( {\mathcal{A}} \) can distinguish the real leaf nodes \( {\mathcal{L}} \) from the simulated ones \( \tilde{{\mathcal{L}}} \), and therefore cannot distinguish the tree \( \tilde{\alpha } \) from \( \alpha \), since both trees are built from their leaves. That is, there exists a negligible value \( \varepsilon_{2} \) such that:

    $$ \left| {{\mathbf{Pr}}\left[ {1 \leftarrow \mathcal{\mathcal{A}}\left( {\delta ,{\mathbf{f}},\alpha } \right)} \right] - {\mathbf{Pr}}\left[ {1 \leftarrow \mathcal{\mathcal{A}}\left( {\delta ,{\mathbf{f}},\tilde{\alpha }} \right)} \right]} \right| \le \varepsilon_{2}. $$

Because the pseudo-randomness of \( F \) holds, any PPT adversary cannot distinguish random bits from the output of the PRF \( F \), so it cannot distinguish the random bits \( \gamma_{s} \left( {id(w_{i} )} \right) \) from \( F_{{K_{1} }} (w_{i} ) \). In addition, due to the discrete logarithm assumption, any PPT adversary cannot distinguish \( g^{{\omega_{i} }} \) from \( g^{{\prod\nolimits_{{f \in {\mathbf{f}}_{{w_{i} }} }} {\left( {s + id(f)} \right)} }} \). The simulated leaf nodes \( \tilde{{\mathcal{L}}} \) and the real ones \( {\mathcal{L}} \) are: \( \tilde{{\mathcal{L}}} = \{ \tilde{\theta }_{w} \}_{{w \in {\mathbf{w}}}} = \{ (\tilde{\theta }_{w,1} ,\tilde{\theta }_{w,2} )\}_{{w \in {\mathbf{w}}}} = \) \( (\gamma_{s} \left( {id(w_{i} )} \right),g^{{\omega_{i} }} ) \) and \( \mathcal{L} = \{ \theta_{w} \}_{{w \in {\mathbf{w}}}} = \{ (\theta_{w,1} ,\theta_{w,2} )\}_{{w \in {\mathbf{w}}}} = (F_{{K_{1} }} (w_{i} ),g^{{\prod\nolimits_{{f \in {\mathbf{f}}_{{w_{i} }} }} {\left( {s + id(f)} \right)} }} ) \).

So \( {\mathcal{A}} \) cannot distinguish \( {\mathcal{L}} \) from \( \tilde{{\mathcal{L}}} \). The trees \( \tilde{\alpha } \) and \( \alpha \) are built by applying the collision-resistant hash function \( H_{3} \) to the leaf nodes, so the adversary cannot distinguish \( \tilde{\alpha } \) from \( \alpha \), and \( \varepsilon_{2} \) is negligible.

  3.

    If the private-key encryption system \( \Pi \) is CPA-secure, then for all PPT adversaries \( {\mathcal{A}} \), there exists a negligible value \( \varepsilon_{3} \) such that:

    $$ \left| {{\mathbf{Pr}}\left[ {1 \leftarrow \mathcal{\mathcal{A}}\left( {{\mathbf{f}},{\mathbf{c}}} \right)} \right] - {\mathbf{Pr}}\left[ {1 \leftarrow \mathcal{\mathcal{A}}\left( {{\mathbf{f}},{\tilde{\mathbf{c}}}} \right)} \right]} \right| \le \varepsilon_{3}. $$

Because the private-key encryption system \( \Pi \) is CPA-secure, the ciphertexts it produces do not reveal any partial information about the plaintext, so any PPT adversary \( {\mathcal{A}} \) cannot distinguish ciphertexts generated from two different inputs. Here \( {\mathbf{c}} \) and \( {\tilde{\mathbf{c}}} \) are: \( {\mathbf{c}} = (c_{1} , \ldots ,c_{{\# {\mathbf{f}}}} ) \), in which \( c_{i} =\Pi . {\text{Enc}}_{{K_{4} }} (f_{i} ) \); \( {\tilde{\mathbf{c}}} = (\tilde{c}_{1} , \ldots ,\tilde{c}_{{\# {\mathbf{f}}}} ) \), in which \( \tilde{c}_{i} =\Pi . {\text{Enc}}_{{K_{4} }} ({\mathbf{0}}^{\left| f \right|} ) \). So \( {\mathcal{A}} \) cannot distinguish \( {\tilde{\mathbf{c}}} \) from \( {\mathbf{c}} \), and \( \varepsilon_{3} \) is negligible.

In addition, there exists a negligible value \( \varepsilon_{4} \) such that:

$$ \left| {{\mathbf{Pr}}\left[ {1 \leftarrow \mathcal{\mathcal{A}}\left( {\delta ,{\mathbf{f}},\tau_{s} } \right)} \right] - {\mathbf{Pr}}\left[ {1 \leftarrow \mathcal{\mathcal{A}}\left( {\delta ,{\mathbf{f}},\tilde{\tau }_{s} } \right)} \right]} \right| \le \varepsilon_{4} $$

Because the pseudo-randomness of \( F,G,P \) holds, any PPT adversary cannot distinguish random bits from the outputs of the PRFs \( F,G,P \). So it cannot distinguish \( \tilde{\tau }_{i} = \left( {\gamma_{s} \left( {id(w_{i} )} \right),{\tilde{\text{T}}}^{\prime}_{s} \left[ {\gamma_{s} \left( {id(w_{i} )} \right)} \right] \oplus {\text{iT}}_{s} \left( {id(w_{i} )} \right),K_{{id(w_{i} )}} } \right) \) from \( \tau_{i} = (F_{{K_{1} }} (w_{i} ),G_{{K_{2} }} (w_{i} ), \) \( P_{{K_{3} }} (w_{i} )) \), and likewise cannot distinguish \( \tau_{s} = \left( {\tau_{1} , \ldots \tau_{n} } \right) \) from \( \tilde{\tau }_{s} = \left( {\tilde{\tau }_{1} , \ldots \tilde{\tau }_{n} } \right) \). Therefore, \( \varepsilon_{4} \) is negligible.

In the same way, based on the elliptic curve discrete logarithm assumption and the pseudo-randomness of \( F,G,P \), we have:

$$ \left| {{\mathbf{Pr}}\left[ {1 \leftarrow \mathcal{\mathcal{A}}\left( {\delta ,{\mathbf{f}},\tau_{a} ,c_{f} } \right)} \right] - {\mathbf{Pr}}\left[ {1 \leftarrow \mathcal{\mathcal{A}}\left( {\delta ,{\mathbf{f}},\tilde{\tau }_{a} ,\tilde{c}_{f} } \right)} \right]} \right| \le \varepsilon_{5} , $$
$$ \left| {{\mathbf{Pr}}\left[ {1 \leftarrow \mathcal{\mathcal{A}}\left( {\delta ,{\mathbf{f}},\tau_{d} } \right)} \right] - {\mathbf{Pr}}\left[ {1 \leftarrow \mathcal{\mathcal{A}}\left( {\delta ,{\mathbf{f}},\tilde{\tau }_{d} } \right)} \right]} \right| \le \varepsilon_{6} , $$

where \( \varepsilon_{5} \) and \( \varepsilon_{6} \) are negligible values.

To sum up, we conclude that for all PPT adversaries \( {\mathcal{A}} \), the outputs of \( {\text{Real}}_{{\mathcal{A}}} (1^{k} ) \) and \( {\text{Ideal}}_{{{\mathcal{A}},{\mathcal{S}}}} (1^{k} ) \) are indistinguishable except with negligible probability \( {\mathbf{negl}}(1^{k} ) \), i.e.:

$$ \left| {{\mathbf{Pr}}\left[ {{\text{Real}}_{\mathcal{\mathcal{A}}} (1^{k} ) = 1} \right] - {\mathbf{Pr}}\left[ {{\text{Ideal}}_{{\mathcal{\mathcal{A}},\mathcal{\mathcal{S}}}} (1^{k} ) = 1} \right]} \right| \le {\mathbf{negl}}(1^{k} ). $$

Therefore our dynamic MSE scheme is \( ({\mathcal{L}}_{1} ,{\mathcal{L}}_{2} ,{\mathcal{L}}_{3} ,{\mathcal{L}}_{4} ) \)-secure against adaptive chosen-keyword attacks in the random oracle model. □

4.2 Unforgeability

Theorem 2.

If \( H_{3} \) is a collision-resistant hash function and the bilinear q-SDH assumption holds, then the dynamic MSE scheme is unforgeable.

Proof:

The main idea of the proof is that, if there exists a PPT adversary \( {\mathcal{A}} \) such that \( {\text{Forge}}_{{\mathcal{A}}} (1^{k} ) = 1 \), then there exists a PPT simulator \( {\mathcal{S}} \) that breaks at least one of the following assumptions: the collision-resistance of \( H_{3} \) or the bilinear q-SDH assumption.

During the game, the simulator \( {\mathcal{S}} \) interacts with \( {\mathcal{A}} \) using the real algorithms. Assume that after \( q \) queries, \( {\mathcal{A}} \) outputs a set of file identifiers \( {\mathbf{I^{\prime}}} \ne {\mathbf{I}}_{W} \) and a valid proof \( \pi \). This means the proof \( \pi = \left\{ {{\mathcal{T}},{\mathcal{S}},{\mathcal{C}}} \right\} \) it produces under query \( W \) passes all three steps of the verification phase. We categorize the forgery into three types:

Type I forgery: For some word \( w_{i} \in W \), the adversary outputs a different leaf value \( {\mathop {\theta} \limits^{\frown }}_{{w_{i} }} \) in the Merkle tree proof \( {\mathop t\limits^{\frown }}_{i} \) and passes verification step 1.

Type II forgery: For some word \( w_{i} \in W \), \( {\mathbf{I^{\prime}}}\not \subset S_{i} \). The adversary gives the simulator the real accumulation value in the proof \( t_{i} \), and outputs a subset witness \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}^{{P_{i} }} \) that passes the verification step 2.

Type III forgery: The set \( {\mathbf{I^{\prime}}} \) is a proper subset of \( {\mathbf{I}}_{W} \). The adversary gives the simulator the real \( {\mathcal{S}} = \{ g^{{P_{1} }} , \ldots ,g^{{P_{n} }} \} \), and outputs a completeness witness \( \mathop {\mathcal{C}}\limits^{\frown } \) which passes the verification step 3.

It is clear that if \( {\mathbf{I^{\prime}}} \ne {\mathbf{I}}_{W} \) and proof \( \pi \) is valid then one of the above mentioned forgeries must occur. Next we show that the simulator \( {\mathcal{S}} \) can use type I forgery to break the collision-resistance property of \( H_{3} \), and use type II or III forgeries to break the bilinear q-SDH assumption.

1. The collision-resistance property of \( H_{3} \)

The hash function \( H_{3} \) is collision-resistant if it is difficult for all PPT adversaries to find two different messages \( m_{1} \) and \( m_{2} \) such that \( H_{3} (m_{1} ) = H_{3} (m_{2} ) \).

First, given the hash function \( H_{3} \), the simulator \( {\mathcal{S}} \) interacts with the adversary \( {\mathcal{A}} \) according to the game \( {\text{Forge}}_{{\mathcal{A}}} (1^{k} ) \). If \( {\mathcal{A}} \) wins the game and a Type I forgery occurs, then for some word \( w_{i} \in W \), \( {\mathcal{A}} \) outputs a different leaf value \( {\mathop \theta \limits^{\frown }}_{{w_{i} }} \) in the Merkle tree proof \( { \mathop t\limits^{\frown }}_{i} \). The simulator \( {\mathcal{S}} \) then verifies the value \( {\mathcal{A}} \) outputs, which passes verification step 1. Let \( {\mathop \theta \limits^{\frown }}_{{w_{i} }} = ({\mathop {\theta} \limits^{\frown }}_{{w_{i} ,1}} ,{\mathop {\theta} \limits^{\frown } }_{{w_{i} ,2}} ) \). Passing verification step 1 means the following two conditions hold:

  • The search key \( F_{{K_{1} }} (w_{i} ) = {\mathop \theta \limits^{\frown }}_{{w_{i} ,1}} \).

  • The Merkle tree verification using the leaf \( {\mathop \theta \limits^{\frown }}_{{w_{i} }} \) succeeds.

According to verification step 1, the adversary \( {\mathcal{A}} \) can only forge \( {\mathop \theta \limits^{\frown }}_{{w_{i} ,2}} \). Passing the Merkle tree verification then implies that the adversary is able to find a collision of \( H_{3} \), because it can generate the same root with the modified leaf.
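The binding argument can be made concrete on a two-leaf toy tree: replacing a leaf while keeping the root fixed requires a collision of the hash. A sketch with SHA-256 in place of \( H_{3} \) (the leaf contents are illustrative placeholders, not the scheme's encoding):

```python
import hashlib

def h(b: bytes) -> bytes:
    # SHA-256 in place of the collision-resistant hash H3.
    return hashlib.sha256(b).digest()

# Two-leaf toy tree: root = H3(H3(leaf0) || H3(leaf1)).
leaf0 = b"(F_K1(w), acc_w)"          # placeholder leaf contents
leaf1 = b"(F_K1(w'), acc_w')"
root = h(h(leaf0) + h(leaf1))

# A Type I forgery keeps F_K1(w) but swaps the accumulation value.
forged = b"(F_K1(w), forged_acc)"

# Accepting the forged leaf under the unchanged root would require
# H3(forged) == H3(leaf0), i.e. a collision of H3.
assert h(h(forged) + h(leaf1)) != root
```

So any adversary producing a Type I forgery hands the simulator two distinct inputs hashing to the same node, breaking collision resistance.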

2. Bilinear q-SDH assumption

The simulator \( {\mathcal{S}} \) is given an instance of the bilinear q-SDH problem: \( (p,{\mathbb{G}},{\mathcal{G}},e,g) \) and a \( (q + 1) \)-tuple \( (g,g^{s} , \ldots ,g^{{s^{q} }} ) \). \( {\mathcal{S}} \) interacts with the adversary \( {\mathcal{A}} \) in the following way.

First, since \( {\mathcal{S}} \) does not know the value \( s \) in the given bilinear q-SDH instance, it needs to reconstruct the following algorithms, which are related to \( s \), in the game \( {\text{Forge}}_{{\mathcal{A}}} (1^{k} ) \):

In the algorithm Gen, the simulator \( {\mathcal{S}} \) directly uses \( (g,g^{s} , \ldots ,g^{{s^{q} }} ) \) as the public parameters without knowing \( s \), and sends them to the adversary.

And in the algorithm Setup, for the leaf nodes \( \theta_{w} = \left( {F_{{K_{1} }} (w),g^{{\prod\nolimits_{{f \in {\mathbf{f}}_{w} }} {\left( {s + id(f)} \right)} }} } \right) \), the simulator \( {\mathcal{S}} \) computes the accumulation value \( g^{{\prod\nolimits_{{f \in {\mathbf{f}}_{w} }} {\left( {s + id(f)} \right)} }} \) from \( (g,g^{s} , \ldots ,g^{{s^{q} }} ) \).

It is worth mentioning that the simulator \( {\mathcal{S}} \) needs to maintain an extra auxiliary data structure \( {\mathcal{N}} \). It stores, for each leaf node \( \theta_{w} \), the polynomial \( n_{w} = \prod\nolimits_{{f \in {\mathbf{f}}_{w} }} {(s + id(f))} \), which is used to form the add/delete tokens later.
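The key trick is that \( g^{n_{w}(s)} \) can be evaluated from the public powers \( (g, g^{s}, \ldots, g^{s^{q}}) \) alone: expand \( n_{w} \) into coefficients \( c_{0}, \ldots, c_{d} \) and compute \( \prod_{k}(g^{s^{k}})^{c_{k}} \). A toy sketch in a small prime-order subgroup (all parameters are illustrative; the trapdoor \( s \) appears only to cross-check the result):

```python
# Evaluating g^{n_w(s)} from public powers (g, g^s, ..., g^{s^q}) alone.
# Toy subgroup: g = 2 has prime order 11 in Z_23*; parameters illustrative.
p, q_ord, g = 23, 11, 2
s = 7                                           # trapdoor, used only to check
powers = [pow(g, pow(s, k, q_ord), p) for k in range(4)]   # g^{s^k}

def poly_from_roots(ids, mod):
    """Coefficients of prod_i (X + id_i) over Z_mod, lowest degree first."""
    coeffs = [1]
    for i in ids:
        coeffs = [0] + coeffs                   # multiply by X
        for k in range(len(coeffs) - 1):
            coeffs[k] = (coeffs[k] + i * coeffs[k + 1]) % mod
    return coeffs

ids = [3, 5]                                    # files containing the word w
c = poly_from_roots(ids, q_ord)                 # n_w(X) = (X+3)(X+5)
acc = 1
for k, ck in enumerate(c):
    acc = acc * pow(powers[k], ck, p) % p       # prod_k (g^{s^k})^{c_k}

# Cross-check against direct evaluation using the trapdoor.
exp = 1
for i in ids:
    exp = exp * (s + i) % q_ord
assert acc == pow(g, exp, p)
```

This is why knowing the polynomial \( n_{w} \) (stored in \( \mathcal{N} \)) suffices for the simulator to recompute any leaf, add token or delete token without ever learning \( s \).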

The simulator \( {\mathcal{S}} \) cannot directly compute the value of \( \tau_{a} \) in the \( {\text{Add/Update}} \) protocol. In the protocol’s user step 2, when computing the value of the new leaf node \( \theta_{i}^{\prime } \) in \( \tau_{a} \), it first looks up in \( {\mathcal{N}} \) the polynomial \( n_{i} \) that corresponds to \( \theta_{i,2} \), then computes the value \( g^{{n_{i} \cdot (s + id(f))}} \) using \( (g,g^{s} , \ldots ,g^{{s^{q} }} ) \). The value \( \theta^{\prime}_{i} = (\theta_{i,1} ,g^{{n_{i} \cdot (s + id(f))}} ) \) is the updated leaf node.

Similarly, in the \( {\text{Del/Update}} \) protocol, \( {\mathcal{S}} \) finds the \( n_{i} \) in \( {\mathcal{N}} \) and removes the factor \( s + id(f) \) from \( n_{i} \) and then computes the value \( g^{{n_{i} /(s + id(f))}} \) using \( (g,g^{s} , \ldots ,g^{{s^{q} }} ) \) and gets a new leaf node \( \theta^{\prime}_{i} = (\theta_{i,1} ,g^{{n_{i} /(s + id(f))}} ) \).

In this modified game, the values related to \( s \) are computed in a new way. However, it produces the same outputs as the original algorithms. So in the adversary \( {\mathcal{A}} \)’s view these values are still valid, and it cannot distinguish this game from the original one.

If \( {\mathcal{A}} \) wins the game, and the following two types of forgeries occur, then the simulator \( {\mathcal{S}} \) may solve the given bilinear q-SDH instance.

  • If the Type II forgery occurs, i.e., the adversary \( {\mathcal{A}} \) outputs a set \( {\mathbf{I^{\prime}}} = \{ x_{1} , \ldots ,x_{m} \} \) and a witness \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}^{{P_{i} }} \) for some \( w_{i} \in W \). Let \( S_{i} = \{ id_{1} , \ldots ,id_{q} \} \) be the file identifier set related to the word \( w_{i} \), then \( g^{{(s + id_{1} )(s + id_{2} ) \cdots (s + id_{q} )}} \) is the corresponding accumulation value. Since \( {\mathbf{I^{\prime}}}\not \subset S_{i} \), there exists some \( 1 \le j \le m \), such that \( x_{j} \notin S_{i} \). This means in the equation: \( e(g^{{\prod\nolimits_{{x \in {\mathbf{I^{\prime}}}}} {(s + x)} }} ,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}^{{P_{i} }} ) = e(g^{{(s + id_{1} )(s + id_{2} ) \cdots (s + id_{q} )}} ,g) \), \( (s + x_{j} ) \) does not divide \( (s + id_{1} )(s + id_{2} ) \cdots (s + id_{q} ) \). Therefore there exists polynomial \( Q(s) \) of degree \( q - 1 \) and constant \( c \ne 0 \), such that:\( (s + id_{1} )(s + id_{2} ) \cdots (s + id_{q} ) = Q(s)(s + x_{j} ) + c \).

Then the simulator \( {\mathcal{S}} \) has \( e(g,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}^{{P_{i} }} )^{{(s + x_{j} )\prod\nolimits_{{x \in {\mathbf{I^{\prime}}} \wedge x \ne x_{j} }} {(s + x)} }} = e(g,g)^{{Q(s)(s + x_{j} ) + c}} \).

Raising both sides to the power \( 1/(s + x_{j} ) \), multiplying by \( e(g,g)^{ - Q(s)} \), and taking the \( c \)-th root, it finally obtains \( e(g,g)^{{1/(s + x_{j} )}} = (e(g,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}^{{P_{i} }} )^{{\prod\nolimits_{{x \in {\mathbf{I^{\prime}}} \wedge x \ne x_{j} }} {(s + x)} }} \cdot e(g,g)^{ - Q(s)} )^{1/c} \). This means the simulator \( {\mathcal{S}} \) can solve the instance of the bilinear q-SDH problem in polynomial time.
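The extraction step above rests on a purely polynomial fact: dividing the accumulated polynomial \( (s + id_{1} ) \cdots (s + id_{q} ) \) by \( (s + x_{j} ) \) leaves a nonzero constant remainder \( c \) exactly when \( x_{j} \notin S_{i} \). A hedged toy sketch (coefficient arithmetic mod a small stand-in prime, not a pairing group; helper names are illustrative):

```python
# Type II intuition: a(s) = Q(s)*(s + x_j) + c, with c != 0 iff x_j is not
# one of the accumulated identifiers.  Coefficient lists are lowest degree
# first, arithmetic mod a toy prime P.

P = 2**61 - 1  # toy prime standing in for the group order

def accumulate(ids):
    """Coefficients of prod_{id in ids} (s + id) mod P."""
    poly = [1]
    for fid in ids:
        out = [0] * (len(poly) + 1)
        for k, coeff in enumerate(poly):
            out[k] = (out[k] + coeff * fid) % P      # constant-term part
            out[k + 1] = (out[k + 1] + coeff) % P    # s-term part
        poly = out
    return poly

def divide_by_linear(a, x_j):
    """Synthetic division: a(s) = Q(s)*(s + x_j) + c; return (Q, c)."""
    q = [0] * (len(a) - 1)
    rem = a[-1]
    for i in range(len(a) - 2, -1, -1):
        q[i] = rem
        rem = (a[i] - x_j * rem) % P
    return q, rem
```

For instance, with \( S_{i} = \{3, 8, 21\} \), dividing by \( (s + 5) \) leaves \( c \ne 0 \) (since \( 5 \notin S_{i} \)), while dividing by \( (s + 8) \) leaves remainder zero with quotient \( (s+3)(s+21) \).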

  • If the Type III forgery occurs, i.e., the adversary \( {\mathcal{A}} \) outputs a set \( {\mathbf{I^{\prime}}} = \{ x_{1} , \ldots ,x_{m} \} \) and the completeness witness \( \mathop {\mathcal{C}}\limits^{\frown } \). Since the set \( {\mathbf{I^{\prime}}} \) is a proper subset of \( {\mathbf{I}}_{W} \), there exists at least one common factor of the polynomials \( P_{1} , \ldots ,P_{n} \); we denote this factor by \( (s + x) \), where \( x \notin {\mathbf{I^{\prime}}} \). Since these values pass verification step 3, the following holds: \( \prod\nolimits_{i = 1}^{n} {e(g^{{P_{i} }} ,g^{{q_{i} }} )} = e(g,g) \). Extracting \( (s + x) \) from each \( P_{i} \) by computing \( g^{{P^{\prime}_{i} }} = (g^{{P_{i} }} )^{1/(s + x)} \), we obtain \( \prod\nolimits_{i = 1}^{n} {e(g^{{P_{i} }} ,g^{{q_{i} }} )} = \) \( (\prod\nolimits_{i = 1}^{n} {e(g^{{P^{\prime}_{i} }} ,g^{{q_{i} }} )} )^{s + x} = e(g,g). \)

Thus the simulator \( {\mathcal{S}} \) can easily form the solution of the bilinear q-SDH instance by computing \( e(g,g)^{1/(s + x)} = \prod\nolimits_{i = 1}^{n} {e(g^{{P^{\prime}_{i}}},g^{{q_{i}}})} \). This means the simulator can also solve the instance of the bilinear q-SDH problem in polynomial time.
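The "common factor" observation in the Type III case can also be checked concretely: if the claimed intersection \( {\mathbf{I^{\prime}}} \) omits an element \( x \) of the true intersection \( {\mathbf{I}}_{W} \), then \( (s + x) \) divides every subset polynomial \( P_{i} \). A toy sketch (plain modular arithmetic; the sets and the helper `subset_poly` are illustrative assumptions, not the paper's construction):

```python
# Type III intuition: P_i = prod_{id in S_i \ I'} (s + id).  If x is in the
# true intersection but missing from the claimed I', then (s + x) divides
# every P_i, i.e. each P_i vanishes at s = -x.  Arithmetic mod a toy prime.

P = 2**61 - 1

def subset_poly(s_i, claimed):
    """Coefficients (lowest degree first) of prod over id in s_i \\ claimed of (s + id)."""
    poly = [1]
    for fid in s_i:
        if fid in claimed:
            continue
        out = [0] * (len(poly) + 1)
        for k, coeff in enumerate(poly):
            out[k] = (out[k] + coeff * fid) % P
            out[k + 1] = (out[k + 1] + coeff) % P
        poly = out
    return poly

def eval_poly(poly, point):
    """Evaluate the polynomial at a point mod P."""
    return sum(coeff * pow(point, k, P) for k, coeff in enumerate(poly)) % P

sets = [{1, 4, 9}, {4, 9, 12}, {2, 4, 9}]   # example S_1..S_n; true I_W = {4, 9}
claimed = {4}                                # claimed I' omits x = 9
polys = [subset_poly(s, claimed) for s in sets]
# (s + 9) is a common factor: every P_i vanishes at s = -9
assert all(eval_poly(p, (-9) % P) == 0 for p in polys)
```

This is exactly why the simulator can pull the \( (s + x) \)-th root out of the product in the verification equation.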

The above analysis shows that if the adversary \( {\mathcal{A}} \) could successfully forge a proof, it would have the ability to break at least one of the assumptions above. Therefore we conclude that for all PPT adversaries \( {\mathcal{A}} \), the probability

$$ {\mathbf{Pr}} [ {\text{Forge}}_{{\mathcal{A}}} (1^{k}) = 1] \le {\mathbf{negl}}(1^{k}), $$

where \( {\mathbf{negl}}(1^{k}) \) is a negligible function with input \( 1^{k} \). Thus our dynamic MSE scheme is unforgeable. □

5 Conclusion

Searchable encryption is an important cryptographic primitive for the cloud storage environment, well motivated by the popularity of cloud storage services. At the same time, the authentication methods used to verify search results are a significant supplement that makes search more reliable, and they would greatly promote the development of cloud storage services.

In this paper, we described a searchable encryption scheme that supports multi-keyword search and authentication. The scheme can greatly reduce the communication cost during search. We demonstrated that, taking into account the security challenges in cloud storage, our scheme can withstand chosen-keyword attacks carried out by adaptive adversaries. The proposed scheme also prevents the results from being maliciously altered by such adversaries.

In the future, we will perform a more detailed analysis of the security aspects of this paper and investigate the feasibility of designing an improved security model to enhance the scheme's security features. Moreover, we will consider more efficient authentication techniques to meet practical needs.