
1 Introduction

With data volumes growing rapidly, an increasing number of users upload and store their data on cloud servers to relieve local storage pressure. Unlike in the earlier C/S (Client/Server) mode, cloud servers are usually deemed “semi-trusted” because the users’ data may be compromised. Therefore, encryption is often used to protect the users’ data privacy. Traditional encryption algorithms, such as DES [1], AES [2], and RSA [3], do not support search over encrypted data. When the data volume is huge, storage and bandwidth constraints prevent cloud users from retrieving all their stored data and decrypting it to extract the required parts. It is therefore necessary to design schemes that support search over ciphertexts. Song et al. [4] first discussed security properties such as controlled searching, provable secrecy, query isolation, and hidden queries, and then constructed corresponding searchable encryption schemes. Searchable encryption has since attracted wide academic interest. Depending on whether index generation and queries use the same key, searchable encryption can be categorized into symmetric searchable encryption and asymmetric searchable encryption [5]. This paper is concerned only with the latter.

Taking email systems as an example, Boneh et al. [6] constructed the PEKS (Public Key Encryption with Keyword Search) model, designed schemes based on the BDH (Bilinear Diffie-Hellman) assumption and on trapdoor permutations, respectively, and also proposed a construction using Jacobi symbols. Abdalla et al. [7] analyzed the consistency of PEKS in detail and proved that the schemes proposed in [6] are computationally consistent. They also built a different, statistically consistent scheme, providing an approach to converting anonymous IBE (Identity Based Encryption) schemes into secure PEKS ones. Khader [8] constructed a PEKS scheme based on K-resilient IBE, which was proved IND-CKA (Semantic Security against Adaptive Chosen Keyword Attack) secure in the standard model. Crescenzo et al. [9] proposed a PEKS scheme based on the QRP (Quadratic Residuosity Problem). Hwang et al. [10] constructed the PECK (Public key Encryption with Conjunctive Keyword search) model and designed a scheme based on the DLDH (Decisional Linear Diffie-Hellman) assumption. Compared with previous schemes, the ciphertext and private key of the new scheme were shown to be the shortest. The model and scheme were then extended to the multi-user setting. Baek et al. [11] proposed an improved scheme addressing three issues, namely “refreshing keywords”, “removing secure channel”, and “processing multiple keywords”, which the schemes proposed in [6] left unaddressed. Rhee et al. [12] pointed out that the capability of the adversary in the scheme proposed in [11] is too limited, and instead constructed a strengthened security model and a corresponding improved scheme. Zhao et al. [13] proposed trapdoor-indistinguishable PEKS. Yang et al. [14] proposed a variant that supplies the computational consistency missing from the scheme proposed in [8] and improves efficiency dramatically. Luo et al. [15] proposed a PEKS scheme to tackle the IF (Integer Factorization) problem.
Since the set of users changes frequently in mobile cloud storage, Xia et al. [16] designed a PEKS scheme capable of data sharing and ciphertext modification. Shao et al. [17] designed a PEKS scheme for the “uni-sender multi-receiver” setting in the medical care area, resolving the problem of overly long ciphertexts in previous schemes.

Using the Elgamal algorithm, Liu et al. [18] proposed a PEKS scheme that supports verification of retrieved data in asymmetric searchable encryption. Our analysis, however, finds several flaws in this scheme. This paper therefore attempts to fix these flaws one by one to obtain an improved scheme.

2 Verifiable Public Key Searchable Encryption Scheme

The scheme proposed in [18] involves 3 principals, namely the data uploader Alice, the data retriever Bob, and the cloud server. It consists of 5 algorithms, namely parameter generation, generation of the keyword and encrypted files, trapdoor generation, check, and verification, described as follows.

(1) Parameter generation

Alice selects a big prime \( p_{1} \) and a generator \( g_{1} \) in \( Z_{{p_{1} }}^{*} \). Alice selects a random integer \( x_{1} \) (\( 0 \le x_{1} \le p_{1} - 2 \)), calculates \( y_{1} = g_{1}^{{x_{1} }} \bmod p_{1} \), and sets her public key to (\( p_{1} ,g_{1} ,y_{1} \)) and her private key to \( x_{1} \). Similarly, Alice selects a generator \( g_{2} \) in \( Z_{{p_{2} }}^{*} \) and a generator \( g \) in \( Z_{p}^{*} \), sets Bob’s public key to (\( p_{2} ,g_{2} ,y_{2} \)) and his private key to \( x_{2} \), and sets the server’s public key to (\( p,g,y \)) and its private key to \( x \). Alice selects a hash function \( H:\{ 0,1\}^{*} \to Z_{p}^{*} \) and sets the encryption algorithm and signature algorithm to Elgamal.

(2) Generation of encrypted files

Suppose Alice wants to send a file \( M \) including keyword \( W \) to Bob. Alice calculates \( H(W) \), selects \( r_{1} ,r_{2} \in_{R} Z_{p}^{*} \), computes \( S_{1} = r_{1} r_{2}^{H(W)} \bmod p \), \( S_{2} = r_{1}^{{H(W)^{ - 1} }} r_{2} \bmod p \), sets \( S = \; < S_{1} ,S_{2} > \).

Alice encrypts \( M \) with Bob’s public key to obtain ciphertext \( C_{1} = \; < y_{11} ,y_{12} > \).

Alice signs \( H(W) \) with \( x_{1} \) to obtain \( sig = \; < H(W),r,s > \).

Alice encrypts her identity \( ID_{A} \) with Bob’s public key to acquire ciphertext \( C_{2} = \; < y_{21} ,y_{22} > \).

Alice uploads \( S = \; < S_{1} ,S_{2} > \) and \( D = \; < C_{1} ,sig,C_{2} > \) to the server.

(3) Trapdoor generation

When Bob wants to retrieve files including keyword \( W_{1} \), he computes \( H(W_{1} ) \), which is encrypted with the server’s public key to acquire \( T_{w} = \; < y_{31} ,y_{32} > \). Then, Bob sends \( T_{w} \) to the server.

(4) Check

After receiving \( T_{w} \), the server decrypts it to obtain \( H(W_{1} ) \). The server then checks each stored \( S = \; < S_{1} ,S_{2} > \) one by one. Whenever \( S_{1} = S_{2}^{{H(W_{1} )}} \) holds, the server sends the corresponding \( D = \; < C_{1} ,sig,C_{2} > \) to Bob.

(5) Verification

After receiving \( D \), Bob first decrypts \( C_{2} \) to acquire the identity of the sender. Then, Bob verifies \( sig \) with the public key of the sender. If the verification succeeds, Bob decrypts \( C_{1} \). Otherwise, Bob discards \( D \).
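The keyword tag of step (2) and the server-side test of step (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy prime `P`, the SHA-256 based `hash_keyword`, and the function names are hypothetical and do not come from [18]. In particular, the sketch assumes that \( H(W) \) is invertible modulo \( p - 1 \), which the exponent \( H(W)^{-1} \) implicitly requires.

```python
import hashlib
import secrets
from math import gcd

# Toy modulus for illustration only (assumption); far too small for real use.
P = 2_147_483_647  # the prime 2**31 - 1


def hash_keyword(word: str) -> int:
    """Hash a keyword to an exponent that is invertible modulo P - 1.

    Invertibility is an added assumption: without it, H(W)^(-1) in the
    definition of S2 would not exist.
    """
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    h = int.from_bytes(digest, "big") % (P - 1)
    while h < 2 or gcd(h, P - 1) != 1:
        h = (h + 1) % (P - 1)
    return h


def make_tag(word: str, r1: int, r2: int):
    """Alice's side: S1 = r1 * r2^H(W), S2 = r1^(H(W)^-1) * r2 (mod P)."""
    h = hash_keyword(word)
    h_inv = pow(h, -1, P - 1)  # inverse of the exponent; needs Python 3.8+
    s1 = (r1 * pow(r2, h, P)) % P
    s2 = (pow(r1, h_inv, P) * r2) % P
    return s1, s2


def server_check(tag, h_w1: int) -> bool:
    """Server's side: accept when S1 == S2^H(W1) (mod P)."""
    s1, s2 = tag
    return s1 == pow(s2, h_w1, P)


if __name__ == "__main__":
    r1 = secrets.randbelow(P - 1) + 1  # random element of Z_P^*
    r2 = secrets.randbelow(P - 1) + 1
    tag = make_tag("invoice", r1, r2)
    print(server_check(tag, hash_keyword("invoice")))
```

The check succeeds for the matching keyword because \( S_{2}^{H(W)} = r_{1}^{H(W)^{-1} H(W)} r_{2}^{H(W)} = r_{1} r_{2}^{H(W)} = S_{1} \).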

Our analysis reveals several flaws in the scheme, as follows.

(1) Trapdoors can be generated by anybody. As the description above shows, the trapdoor generation algorithm uses only \( H(.) \) and the public key of the server. Both are known to every principal, which implies that anybody can generate a trapdoor. In both symmetric and asymmetric searchable encryption, trapdoor generation should normally be the exclusive ability of the search requester. Otherwise, any principal could launch a search, which would greatly increase the burden on the server. Thus, the trapdoor generation algorithm should use the private key of the search requester.

(2) The adversary could replace \( C_{1} \) with a ciphertext of any \( M' \) other than \( M \) without being noticed. For example, he/she could replace \( C_{1} = \; < y_{11} ,y_{12} > \) with \( C_{1}^{'} = \; < y_{11}^{'} ,y_{12}^{'} > \), in which \( C_{1}^{'} \) is the outcome of encrypting \( M' \) rather than \( M \) with Bob’s public key. Thus, the adversary could easily cause misunderstanding between Alice and Bob. To prevent this, \( C_{1} \) should be protected against tampering, for example by using hash functions or digital signatures.

(3) All key pairs are generated by Alice. Usually, both the security knowledge and the computing power of users are rather limited. As a result, the generated key pairs may have various flaws, such as weak pseudorandomness, short length, and apparent semantics. For this reason, key pairs in cryptographic schemes are usually generated by a Key Generator. Moreover, each principal computes under its own field, which makes the scheme harder to implement. In most cryptographic schemes, all computations can be done under just one field.

(4) Identities are encrypted. Generally speaking, all the identities are public. For an adversary capable of monitoring the whole communication, the sender and the receiver of the message could be known easily. Hence, it is unnecessary to encrypt identities.

(5) \( S \) is redundant. In the Check phase, after obtaining \( H(W_{1} ) \), the server does not need to check \( S_{1} = S_{2}^{{H(W_{1} )}} \). It could just compare \( H(W_{1} ) \) with \( H(W) \) in \( sig \). If \( H(W_{1} ) = H(W) \), it sends the corresponding \( D \) to Bob.
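Flaws (1) and (5) can be made concrete with a toy Elgamal sketch. Everything named here is an assumption made for illustration: the small prime `P`, the base `G`, the SHA-256 based `H`, and the helper names are hypothetical and are not taken from [18]. The first helper builds a valid trapdoor from public values only; the second lets the server match a keyword using \( sig \) alone, without touching \( S \).

```python
import hashlib
import secrets

P = 2_147_483_647  # toy prime 2**31 - 1 (assumption; not secure)
G = 7              # a primitive root modulo P


def H(word: str) -> int:
    """Hash a keyword into Z_P^* (SHA-256 reduction is an assumption)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1  # value in [1, P - 1]


def elgamal_encrypt(m: int, y: int):
    """Textbook Elgamal encryption of m under public key y = G^x mod P."""
    k = secrets.randbelow(P - 2) + 1  # ephemeral exponent in [1, P - 2]
    return pow(G, k, P), (m * pow(y, k, P)) % P


def elgamal_decrypt(c, x: int) -> int:
    """Recover m = c2 * c1^(-x) mod P, using c1^(P-1-x) as the inverse."""
    c1, c2 = c
    return (c2 * pow(c1, P - 1 - x, P)) % P


def anyone_makes_trapdoor(word: str, server_pk: int):
    """Flaw (1): only H and the server's public key are needed."""
    return elgamal_encrypt(H(word), server_pk)


def server_match_without_s(trapdoor, server_sk: int, sig) -> bool:
    """Flaw (5): compare the decrypted H(W1) with H(W) carried in sig."""
    return elgamal_decrypt(trapdoor, server_sk) == sig[0]


if __name__ == "__main__":
    server_sk = secrets.randbelow(P - 2) + 1
    server_pk = pow(G, server_sk, P)
    # No private key of Bob appears anywhere in this call.
    t_w = anyone_makes_trapdoor("invoice", server_pk)
    sig = (H("invoice"), 0, 0)  # r and s are irrelevant to the match
    print(server_match_without_s(t_w, server_sk, sig))
```

Note that `anyone_makes_trapdoor` never receives a secret of the caller, which is exactly why the server cannot tell Bob’s queries from those of any other party.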

3 The Proposed Scheme

The proposed scheme contains 3 principals and 5 algorithms as well, shown as follows.

(1) Parameter generation

KG (Key Generator) selects a big prime \( p \) and a generator \( g \) of \( Z_{p}^{*} \) (unless otherwise specified, all computations are done under this field). KG selects a hash function \( H:\{ 0,1\}^{*} \to Z_{p}^{*} \). KG generates key pairs (\( sk_{A} ,pk_{A} = g^{{sk_{A} }} \)), (\( sk_{B} ,pk_{B} = g^{{sk_{B} }} \)), (\( sk_{C} ,pk_{C} = g^{{sk_{C} }} \)) for Alice, Bob and the server, respectively. KG sets the algorithm for encryption and signature to Elgamal.

(2) File sending

Alice computes \( H(W) \) and \( S = pk_{B} \cdot g^{H(W)} \).

Alice computes \( C = Enc_{{pk_{B} }} (M) = \; < y_{1} ,y_{2} > \).

Alice computes \( sig = Sig_{{sk_{A} }} (H(M)) = \; < H(M),r,s > \).

Alice sends \( S \) and \( < C,sig,ID_{A} > \) to the server.

(3) Trapdoor generation

Bob sends \( s = sk_{B} + H(W_{1} ) \) to the server.

(4) File retrieving

The server computes \( S' = g^{s} \), compares each \( S \) with \( S' \), and sends all \( < C,sig,ID_{A} > \) where \( S = S' \) to Bob.

(5) Verification

Bob fetches the public key \( pk_{A} \) corresponding to \( ID_{A} \) and uses it to verify \( sig \). If the verification succeeds, Bob decrypts \( C \) with \( sk_{B} \) to obtain \( M' \). Bob then computes \( H(M') \) and discards \( M' \) if \( H(M') \ne H(M) \), where \( H(M) \) is the first element of \( sig \).
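Keyword matching in steps (2) to (4) rests on the identity \( S' = g^{s} = g^{sk_{B} + H(W_{1} )} = pk_{B} \cdot g^{H(W_{1} )} \), so \( S' = S \) whenever \( H(W_{1} ) = H(W) \). A minimal Python sketch of this matching follows. The toy prime `P`, the base `G`, the SHA-256 based `H`, the helper names, and the reduction of the trapdoor modulo \( p - 1 \) are all assumptions made for illustration; the scheme itself does not fix them.

```python
import hashlib
import secrets

P = 2_147_483_647  # toy prime 2**31 - 1 (assumption; not secure)
G = 7              # a primitive root modulo P


def H(word: str) -> int:
    """Hash a keyword to an exponent (SHA-256 reduction is an assumption)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1)


def keygen():
    """Stand-in for KG: return (sk, pk) with pk = G^sk mod P."""
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)


def make_tag(pk_b: int, h_w: int) -> int:
    """Alice: S = pk_B * g^H(W) mod P."""
    return (pk_b * pow(G, h_w, P)) % P


def make_trapdoor(sk_b: int, h_w1: int) -> int:
    """Bob: s = sk_B + H(W1); reduced mod P - 1 since it is only an exponent."""
    return (sk_b + h_w1) % (P - 1)


def server_match(tag: int, trapdoor: int) -> bool:
    """Server: compute S' = g^s mod P and compare it with the stored S."""
    return tag == pow(G, trapdoor, P)


if __name__ == "__main__":
    sk_b, pk_b = keygen()
    stored = make_tag(pk_b, H("invoice"))
    print(server_match(stored, make_trapdoor(sk_b, H("invoice"))))
    print(server_match(stored, make_trapdoor(sk_b, H("payroll"))))
```

Unlike the trapdoor in [18], `make_trapdoor` takes `sk_b` as input, so a party without Bob’s private key cannot produce the value the server expects.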

4 Analyses of the Proposed Scheme

The proposed scheme addresses the five flaws identified above, as follows.

(1) Only the search requester can generate the trapdoor. The trapdoor generation phase requires the private key of Bob, namely \( sk_{B} \), which is possessed only by Bob himself. Therefore, only the search requester is capable of generating the trapdoor.

(2) The adversary cannot replace \( C \) with the ciphertext of \( M' \) other than \( M \) without being noticed. Suppose the adversary replaces \( C \) with \( C' = Enc_{{pk_{B} }} (M') = \; < y_{1}^{'} ,y_{2}^{'} > \). We discuss two cases below:

1) Suppose the adversary replaces the 1st element in \( sig \) with \( H(M') \). Since he/she has no knowledge of the private key of Alice, namely \( sk_{A} \), the unforgeability of the Elgamal signature prevents him/her from replacing the 2nd and 3rd elements with Alice’s legal signature \( r',s' \) on \( H(M') \). Thus, in the Verification phase, the verification of \( sig \) with Alice’s public key \( pk_{A} \) fails and Bob discards.

2) If the adversary does not modify \( sig \) at all, then, in the Verification phase, the verification of \( sig \) succeeds. After decrypting \( C' \) with \( sk_{B} \), Bob obtains \( M' \). Due to the collision resistance of \( H(.) \), the probability of \( H(M') = H(M) \) is negligible, which implies that Bob will discard.

In short, the adversary can no longer cause any misunderstanding between Alice and Bob, which means the semantics of both principals is maintained.

(3) All key pairs are generated by KG, which avoids the weak keys that user-side generation may produce. All the computations are under the field \( Z_{p}^{*} \), which simplifies implementation.

(4) Encryption of identities is removed. The identities of all principals are transmitted in the plaintext form, which avoids the overhead of encryption.

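(5) There is a clear purpose for each component of the proposed scheme. In particular, \( S \) is no longer redundant: \( sig \) now carries \( H(M) \) rather than \( H(W) \), so \( S \) is the only value the server can use for keyword matching.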
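The two tampering cases discussed under item (2) can be exercised with a toy Elgamal signature. The sketch below is illustrative only: the prime `P`, the base `G`, and the helpers `sign`, `verify`, and `bob_accepts` are hypothetical names, hash values are passed in as plain integers, and decryption of \( C \) is assumed to have already produced the value whose hash is given to `bob_accepts`.

```python
import secrets
from math import gcd

P = 2_147_483_647  # toy prime 2**31 - 1 (assumption; not secure)
G = 7              # a primitive root modulo P


def sign(h: int, sk: int):
    """Textbook Elgamal signature <h, r, s> on the hash value h."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # nonce in [2, P - 2]
        if gcd(k, P - 1) == 1:            # k must be invertible mod P - 1
            break
    r = pow(G, k, P)
    s = ((h - sk * r) * pow(k, -1, P - 1)) % (P - 1)
    return h, r, s


def verify(sig, pk: int) -> bool:
    """Accept when g^h == pk^r * r^s (mod P) and r lies in (0, P)."""
    h, r, s = sig
    if not 0 < r < P:
        return False
    return pow(G, h, P) == (pow(pk, r, P) * pow(r, s, P)) % P


def bob_accepts(sig, h_decrypted: int, pk_a: int) -> bool:
    """Verification phase: sig must verify and H(M') must equal H(M)."""
    return verify(sig, pk_a) and h_decrypted == sig[0]


if __name__ == "__main__":
    sk_a = secrets.randbelow(P - 2) + 1
    pk_a = pow(G, sk_a, P)
    h_m, h_m_prime = 10, 11              # stand-ins for H(M) and H(M')
    sig = sign(h_m, sk_a)

    # Case 1: the adversary swaps H(M) for H(M') but cannot re-sign.
    forged = (h_m_prime, sig[1], sig[2])
    print(verify(forged, pk_a))

    # Case 2: sig is untouched, but C was replaced, so Bob decrypts M'.
    print(bob_accepts(sig, h_m_prime, pk_a))

    # Honest run: untouched sig and untouched ciphertext.
    print(bob_accepts(sig, h_m, pk_a))
```

In case 1 the signature equation fails because \( r \) and \( s \) were computed for \( H(M) \); in case 2 the signature verifies but the hash comparison rejects \( M' \).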

5 Conclusion

This paper proposes an improved scheme that fixes 5 flaws in the scheme proposed in [18]. The analysis shows that the proposed scheme remedies the deficiencies of the previous scheme while remaining practical. In the future, we plan to design searchable encryption algorithms for more complicated application settings, studying issues including dynamic user groups, single sender and multiple receivers, and untrusted servers.