1 Introduction

1.1 Background

Nowadays, cloud services such as data storage on remote third-party providers offer high data availability and reduce a company’s IT infrastructure costs. From a security viewpoint, a company’s sensitive data, such as secret information or customers’ private data, should be encrypted before being stored on the cloud so that it stays hidden from people outside the company. On the other hand, from a usability viewpoint, it is indispensable to be able to search the stored data. However, data encryption and keyword search are incompatible in general, since keyword search over encrypted data is intractable. A naive approach is to decrypt the encrypted data on the cloud and then search it, but this is insufficient because malicious administrators or software on the cloud could steal the plaintext data or the decryption keys while the decryption process is performed. Searchable encryption has been proposed as a solution to these problems.

After the first searchable encryption scheme was proposed in [42], many concrete schemes have been constructed. Roughly speaking, searchable encryption schemes are typically classified into two types: symmetric-key type (e.g. [1, 2, 5, 7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49]) and public-key type (e.g. [3, 6]). This paper focuses on the former searchable encryption.

Searchable encryption of symmetric-key type is called searchable symmetric encryption or SSE. SSE consists of a document storing process and a keyword searching process, and these processes are performed by the same user since a single secret key is used in typical SSE. In the document storing process, the user encrypts documents and generates an encrypted index from the secret key, and the server stores the pair of the encrypted documents and the encrypted index. In the keyword searching process, the user generates an encrypted query (called a trapdoor) from the secret key and a keyword, and the server searches by applying the trapdoor to the encrypted index. Although the keyword searching cost in SSE is much lower than that in the public-key type, this cost becomes critical even in SSE as the number of stored documents increases. In order to reduce this cost, SSE schemes with search-friendly indexes such as an inverted index structure or a Bloom filter have been constructed.

Security models for SSE have also been studied. Curtmola et al. [15, 16] carefully extracted the unavoidable information leaked by the document storing process and the keyword searching process of a typical SSE scheme, and formalized it as acceptable leakage information. Then, they defined an SSE scheme to be secure if the information revealed by the processes of the scheme is at most the acceptable leakage information. Their security model and its variants (e.g. [13]) are used in many SSE schemes. In particular, the adaptive security definitions proposed in [13, 15, 16] are considered one of the security goals in the SSE literature.

1.2 Motivation

The SSE schemes (called SSE-1 and SSE-2) proposed by Curtmola et al. have search-friendly encrypted indexes such as inverted index structure [15, 16]. Their schemes have had a big impact on constructing efficient SSE schemes. Especially, SSE-2 is constructed only from pseudo-random functions and achieves the adaptive security. Furthermore, the keyword searching process of SSE-2 is based on the binary searching operation, and therefore performed efficiently. However, there is a problem that the trapdoor size of SSE-2 depends on the number of stored documents.

Chase and Kamara formalized structured encryption which is a generalization of SSE, and its concrete schemes were proposed [13]. An efficient SSE scheme (hereafter, Chase-Kamara scheme) which has a very simple structure is obtained by simplifying the concrete schemes. It is very easy to show that the Chase-Kamara scheme achieves the adaptive security, thanks to simplicity of its encrypted index structure. In the Chase-Kamara scheme, a search result for a keyword is represented as a bit string in which the i-th bit is 1 when the i-th document contains the keyword, and the encrypted index is built by directly masking the search result with each bit of the output of a pseudo-random function. Therefore, the Chase-Kamara scheme requires pseudo-random functions whose output lengths are longer than the number of documents that the user would like to store. As a result, the trapdoor size of the Chase-Kamara scheme depends on the number of stored documents. This trapdoor size becomes critical as the number of stored documents increases. For example, the trapdoor size is about 120MB when the number of stored documents is one billion. Thus, the Chase-Kamara scheme has the same trapdoor size problem as SSE-2.

Recently, Miyoshi et al. proposed an SSE scheme with a small encrypted index [36]. Their scheme is constructed from hierarchical Bloom filters and achieves adaptive security. However, in their scheme, the trapdoor size also depends on the number of stored documents, and two rounds of communication between the user and the server are needed. Therefore, their keyword searching process is inefficient although the encrypted index size is reasonable.

1.3 Our Contributions

In this paper, we focus on the trapdoor size problem of the Chase-Kamara scheme, and propose a modified scheme whose trapdoor size does not depend on the number of stored documents. The modified scheme is constructed by using our multiple hashing technique, which can transform a short trapdoor into a long one without any secret information. With this technique, the trapdoor size of the modified scheme depends only on the output length of the hash function used (e.g. 512 bits if SHA-256 is used) even if the number of stored documents is one billion. We show that the modified scheme is adaptively secure in the random oracle model.

A key point of our modified scheme is to securely divide the trapdoor generation process of the Chase-Kamara scheme by using our multiple hashing technique. According to this modification of the trapdoor generation process, the encrypted index of the Chase-Kamara scheme is also slightly modified. Informally, in the Chase-Kamara scheme, the user generates a trapdoor of long length and the server searches the encrypted index by directly using the trapdoor. On the other hand, in our modified scheme, the user generates a trapdoor of short length, and the server transforms the trapdoor to a meaningful value of long length, which consists of hash values and corresponds to the trapdoor of the Chase-Kamara scheme. This transformation uses only the trapdoor sent by the user, but not any secret information. After that, the server searches the encrypted index using the trapdoor and the meaningful value, similarly to the Chase-Kamara scheme.

We give a comparison result among the adaptively secure SSE schemes [13, 15, 36] and our modified scheme in Table 1, where \(\ell \) and \(\lambda \) are the output lengths of a pseudo-random function and a hash function, respectively, \(n_D\) is the number of stored documents, \(n_w\) is the number of used keywords, \(n_{\mathbf{D}(w)}\) is the number of documents containing the keyword w (i.e. the cardinality of the search result of w), \(\varSigma _{\mathbf{D}(w)} = \sum _{i=1}^{n_w} n_{\mathbf{D}(w_i)}\), \(m_{\mathbf{D}(w)} = \max _w (n_{\mathbf{D}(w)})\), and \(\mathtt {PRF}\) and \(\mathtt {HF}\) are the computation costs of a pseudo-random function and a hash function, respectively. Here, we assume that \(\lambda < n_D\) and the binary complete-matching cost for N words is \(\log {N}\). Note that these assumptions are reasonable in practical situations.

Table 1. Comparisons among related works [13, 15, 36] and our work.

1.4 Related Works

Curtmola et al. proposed the SSE schemes (SSE-1 and SSE-2) whose encrypted indexes have search-friendly structures such as inverted index [15]. Their schemes have had a big impact on constructing efficient SSE schemes. Although SSE-2 achieves the adaptive security, the trapdoor size of SSE-2 depends on the number of stored documents.

The Chase-Kamara scheme [13] can build an encrypted index of a very simple structure, and therefore the keyword searching process is conducted efficiently. However, the trapdoor size depends on the number of stored documents.

The Miyoshi et al. scheme [36] can construct a small encrypted index by using hierarchical Bloom filters. However, the trapdoor size also depends on the number of stored documents, and two rounds of communication between the user and the server are needed.

While this paper focuses on constructing efficient SSE schemes, other useful functionalities for SSE have been studied, in addition to basic functionalities such as document storing and keyword searching: for example, document adding/deleting/updating functionalities (a.k.a. dynamic SSE) [9, 15, 20, 23, 26, 28, 29, 37, 38, 43, 45, 47,48,49], flexible search functionalities [5, 7, 10, 14, 18, 21, 27, 30, 31, 34, 35, 37, 41, 46], localities [2, 11, 17], forward security [8], UC-security [32, 33, 40], multi-user settings [1, 15, 16, 19, 24, 48], etc.

1.5 Organization

The rest of this paper is organized as follows. In Sect. 2, we recall cryptographic primitives and SSE definitions which are used throughout the paper. The Chase-Kamara scheme is given in Sect. 3, and its modified scheme is proposed in Sect. 4. We conclude in Sect. 5.

2 Preliminaries

In this section, we recall cryptographic primitives and SSE definitions which are used throughout the paper.

2.1 Notations and Basic Cryptographic Primitives

We denote the set of positive real numbers by \(\mathbb {R}^+\). We say that a function \(\mathtt {negl} : \mathbb {N}\rightarrow \mathbb {R}^+\) is negligible if for any (positive) polynomial p, there exists \(n_0 \in \mathbb {N}\) such that for all \(n \ge n_0\), it holds \(\mathtt {negl}(n) < 1/p(n)\). If A is a probabilistic algorithm, \(y \leftarrow A(x)\) denotes running A on input x with a uniformly-chosen random tape and assigning the output to y. \(A^\mathcal {O}\) denotes an algorithm with oracle access to \(\mathcal {O}\). If S is a finite set, \(s \xleftarrow {u}S\) denotes that s is uniformly chosen from S. We denote the bit length of S by |S|, and the cardinality of S by \(\# S\). For strings a and b, a||b denotes the concatenation of a and b.

We recall the definition of pseudo-random functions. A function \(f : \{0, 1\}^\lambda \times \{0, 1\}^k \rightarrow \{0, 1\}^\ell \) is pseudo-random if f is polynomial-time computable in \(\lambda \), and for any probabilistic polynomial-time (PPT) algorithm \(\mathcal {A}\), it holds

$$\begin{aligned} |\Pr [1 \leftarrow \mathcal {A}^{f_K(\cdot )}(1^\lambda ) \mid K \xleftarrow {u}\{0, 1\}^\lambda ] - \Pr [ 1 \leftarrow \mathcal {A}^{g(\cdot )}(1^\lambda ) \mid g \xleftarrow {u}\mathtt {F}[k, \ell ]]| \le \mathtt {negl}(\lambda ), \end{aligned}$$

where \(\mathtt {F}[k, \ell ]\) is the set of functions mapping \(\{0, 1\}^k\) to \(\{0, 1\}^\ell \).

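We recall the definition of left-or-right indistinguishability against the chosen plaintext attack (LOR-CPA) for symmetric-key encryption [4]. A symmetric-key encryption scheme is secure in the sense of LOR-CPA if for any PPT adversary \(\mathcal {A}\), it holds

$$\begin{aligned} |\Pr [1 \leftarrow \mathcal {A}^{\mathtt {Enc}_K(\mathcal {LR}(\cdot ,\cdot , 1))}(1^\lambda ) \mid K \leftarrow \mathtt {Gen}(1^\lambda )] - \Pr [ 1 \leftarrow \mathcal {A}^{\mathtt {Enc}_K(\mathcal {LR}(\cdot ,\cdot , 0))}(1^\lambda ) \mid K \leftarrow \mathtt {Gen}(1^\lambda )]| \le \mathtt {negl}(\lambda ), \end{aligned}$$

where \(\mathtt {Gen}\) is the key generation algorithm of the encryption scheme, and \(\mathtt {Enc}_K(\mathcal {LR}(\cdot ,\cdot , b))\) is the left-or-right oracle that takes an input \((x_0, x_1)\) and outputs \(C_0 \leftarrow \mathtt {Enc}_K(x_0)\) if \(b=0\) and \(C_1 \leftarrow \mathtt {Enc}_K(x_1)\) if \(b=1\).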

2.2 Definitions of SSE

We recall the definitions of SSE, formalized in [15]. Firstly, we give notions used in SSE literature.

  • Let \(D \in \{0, 1\}^*\) be a document, and \(\mathbf{D}= (D_1, \ldots , D_n)\) be a document collection. Let \(\mathbf{C}= (C_1, \ldots , C_n)\) be a ciphertext collection of \(\mathbf{D}\), where \(C_i\) is a ciphertext of \(D_i\) for \(1 \le i \le n\). We assume that \(D_i\) and \(C_i\) contain the same unique identifier \(id_i\).

  • Let \(w \in \{0, 1\}^k\) be a keyword, and \(\varDelta \subseteq \{0, 1\}^k\) be a set of possible keywords. Let \(\varDelta (\mathbf{D}) \subseteq \varDelta \) be a set of keywords which are contained in some of \(D_1, \ldots , D_n\). Throughout this paper, we assume that \(\# \varDelta \) is polynomially bounded in a security parameter \(\lambda \).

  • For \(\mathbf{D}= (D_1, \ldots , D_n)\) and \(w \in \varDelta \), let \(\mathbf{D}(w)\) be a set of identifiers of documents that contain w. Namely, \(\mathbf{D}(w) = \{id_{i_1}, \ldots , id_{i_m}\}\) for \(w \in \varDelta (\mathbf{D})\) or \(\emptyset \) for \(w \not \in \varDelta (\mathbf{D})\). For a searching sequence \(\mathbf{w}= (w_1, \ldots , w_q)\), let \(\mathbf{D}(\mathbf{w}) = (\mathbf{D}(w_1), \ldots , \mathbf{D}(w_q))\).
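The notation above can be made concrete with a small sketch. The toy collection, the keyword set, and the helper name `keyword_index` are illustrative assumptions, and each document is reduced to the set of keywords it contains.

```python
from typing import Dict, List, Set

def keyword_index(docs: Dict[str, Set[str]], keywords: Set[str]) -> Dict[str, Set[str]]:
    """Map each keyword w in Delta to D(w), the identifiers of documents containing w."""
    return {w: {doc_id for doc_id, words in docs.items() if w in words}
            for w in keywords}

# Toy collection D = (D_1, D_2, D_3), each document given by its keyword set.
docs = {"id1": {"cloud", "index"}, "id2": {"cloud"}, "id3": {"hash", "index"}}
delta = {"cloud", "index", "hash", "bloom"}          # possible keywords (Delta)

d_of_w = keyword_index(docs, delta)
delta_of_D = {w for w, ids in d_of_w.items() if ids}  # Delta(D): keywords that occur
search_sequence: List[str] = ["cloud", "bloom", "cloud"]
d_of_seq = [d_of_w[w] for w in search_sequence]       # D(w) for a searching sequence
```

Here the keyword `bloom` belongs to \(\varDelta \) but not to \(\varDelta (\mathbf{D})\), so its search result is the empty set.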

An SSE scheme over \(\varDelta \), \(\mathtt {SSE} = (\mathtt {Gen}, \mathtt {Enc}, \mathtt {Trpdr}, \mathtt {Search}, \mathtt {Dec})\), is defined as follows.

  • \(K \leftarrow \mathtt {Gen}(1^\lambda )\): \(\mathtt {Gen}\) is a probabilistic algorithm which takes a parameter \(1^\lambda \) as an input and outputs a secret key K, where \(\lambda \) is a security parameter.

  • \((\mathcal {I}, \mathbf{C}) \leftarrow \mathtt {Enc}(K, \mathbf{D})\): \(\mathtt {Enc}\) is a probabilistic algorithm which takes a secret key K and a document collection \(\mathbf{D}\) as input and outputs an encrypted index \(\mathcal {I}\) and a ciphertext collection \(\mathbf{C}= (C_1, \ldots , C_n )\).

  • \(T \leftarrow \mathtt {Trpdr}(K, w)\): \(\mathtt {Trpdr}\) is a deterministic algorithm which takes a secret key K and a keyword w as input and outputs a trapdoor T.

  • \(S \leftarrow \mathtt {Search}(\mathcal {I}, T)\): \(\mathtt {Search}\) is a deterministic algorithm which takes an encrypted index \(\mathcal {I}\) and a trapdoor T as input and outputs an identifier set S.

  • \(D \leftarrow \mathtt {Dec}(K, C)\): \(\mathtt {Dec}\) is a deterministic algorithm which takes a secret key K and a ciphertext C as input and outputs a plaintext D of C.

An SSE scheme is correct if for all \(\lambda \in \mathbb {N}\), all \(\mathbf{D}\), all \(w \in \varDelta (\mathbf{D})\), all K output by \(\mathtt {Gen}(1^\lambda )\), and all \((\mathcal {I}, \mathbf{C})\) output by \(\mathtt {Enc}(K, \mathbf{D})\), it holds \(\mathtt {Search}(\mathcal {I}, \mathtt {Trpdr}(K, w)) = \mathbf{D}(w)\) and \(\mathtt {Dec}(K, C_i) = D_i\) for \(1 \le i \le n\).

We give security notions, history, access pattern, search pattern, trace, and non-singular [15].

  • For a document collection \(\mathbf{D}= (D_1, \ldots , D_{n})\) and a searching sequence \(\mathbf{w}= (w_1, \ldots , w_{q})\), \(H = (\mathbf{D}, \mathbf{w})\) is called history. This information is sensitive in SSE.

  • \(\alpha (H) = (\mathbf{D}(w_1), \ldots , \mathbf{D}(w_q))\) is called access pattern for a history \(H = (\mathbf{D}, \mathbf{w})\). This information is revealed by performing the keyword searching processes.

  • The following binary symmetric matrix \(\sigma (H) = (\sigma _{i, j})\) is called search pattern for a history \(H = (\mathbf{D}, \mathbf{w})\): for \(1 \le i \le j \le q\), \(\sigma _{i, j} = 1\) if \(w_i = w_j\), and \(\sigma _{i, j} = 0\) otherwise. This information is revealed by performing the keyword searching processes because trapdoors are deterministically generated in SSE.

  • \(\tau (H) = (|D_1|, \ldots , |D_n|, \alpha (H), \sigma (H))\) is called trace for a history \(H = (\mathbf{D}, \mathbf{w})\). This information is leaked while performing SSE protocols, and therefore considered as acceptable leakage information in SSE.

  • H is called non-singular if (1) there exists a history \(H' \ne H\) such that \(\tau (H) = \tau (H')\), and (2) \(H'\) is computed from a given trace \(\tau (H)\), efficiently. We assume that any history is non-singular throughout the paper.
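As an illustration of these notions, the following sketch computes the trace of a toy history. The function name `trace`, the byte-string documents, and the precomputed map `contains` (from keyword to \(\mathbf{D}(w)\)) are assumptions made only for this example.

```python
from typing import Dict, List, Set, Tuple

def trace(docs: Dict[str, bytes], contains: Dict[str, Set[str]],
          queries: List[str]) -> Tuple[List[int], List[Set[str]], List[List[int]]]:
    """Return tau(H) = (|D_1|, ..., |D_n|, alpha(H), sigma(H)) for H = (D, w)."""
    lengths = [8 * len(body) for body in docs.values()]   # bit lengths |D_i|
    alpha = [contains.get(w, set()) for w in queries]     # access pattern
    q = len(queries)
    # Search pattern: sigma[i][j] = 1 exactly when queries i and j are the same keyword.
    sigma = [[1 if queries[i] == queries[j] else 0 for j in range(q)]
             for i in range(q)]
    return lengths, alpha, sigma

docs = {"id1": b"cloud index", "id2": b"cloud"}
contains = {"cloud": {"id1", "id2"}, "index": {"id1"}}
lengths, alpha, sigma = trace(docs, contains, ["cloud", "index", "cloud"])
```

The repeated query for `cloud` shows up as the off-diagonal ones in `sigma`, which is exactly the repetition a server learns from deterministic trapdoors.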

Then, we give the adaptive security definition proposed in [15] (a.k.a. IND-CKA2), which is widely used in SSE literature.

Definition 1

([15]). Let \(\mathtt {SSE} = (\mathtt {Gen, Enc, Trpdr, Search, Dec})\), \(\lambda \) be a security parameter, \(q \in \mathbb {N}\cup \{ 0 \}\), and \(\mathcal {A} = (\mathcal {A}_0, \ldots , \mathcal {A}_q)\) and \(\mathcal {S} = (\mathcal {S}_0, \ldots , \mathcal {S}_q)\) be probabilistic polynomial-time (PPT) algorithms. Here, we consider the following experiments Real and Sim:

We define that SSE is adaptively secure if for any \(\lambda \), any q of polynomial size, and any \(\mathcal {A}= (\mathcal {A}_0, \ldots , \mathcal {A}_q)\), there exists the following PPT algorithm \(\mathcal {S} = (\mathcal {S}_0, \ldots , \mathcal {S}_q)\): For any PPT distinguisher \(\mathcal {D}\), it holds
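For orientation, the two experiments and the closing condition can be sketched as follows. This is a reconstruction along the lines of the usual formulation of adaptive security in [15], not a verbatim copy of the definition; the state variables \(st_{\mathcal {A}}\) and \(st_{\mathcal {S}}\), the output name v, and the shorthand \(\mathbf{w}_i = (w_1, \ldots , w_i)\) are notational assumptions.

$$ \begin{array}{l} \mathbf {Real}_{\mathtt {SSE}, \mathcal {A}}(\lambda ): \\ \quad K \leftarrow \mathtt {Gen}(1^\lambda ) \\ \quad (\mathbf{D}, st_{\mathcal {A}}) \leftarrow \mathcal {A}_0(1^\lambda ) \\ \quad (\mathcal {I}, \mathbf{C}) \leftarrow \mathtt {Enc}(K, \mathbf{D}) \\ \quad \text {for } 1 \le i \le q: \\ \quad \quad (w_i, st_{\mathcal {A}}) \leftarrow \mathcal {A}_i(st_{\mathcal {A}}, \mathcal {I}, \mathbf{C}, T_1, \ldots , T_{i-1}) \\ \quad \quad T_i \leftarrow \mathtt {Trpdr}(K, w_i) \\ \quad \text {output } v = (\mathcal {I}, \mathbf{C}, T_1, \ldots , T_q) \text { and } st_{\mathcal {A}} \end{array} $$

$$ \begin{array}{l} \mathbf {Sim}_{\mathtt {SSE}, \mathcal {A}, \mathcal {S}}(\lambda ): \\ \quad (\mathbf{D}, st_{\mathcal {A}}) \leftarrow \mathcal {A}_0(1^\lambda ) \\ \quad (\mathcal {I}, \mathbf{C}, st_{\mathcal {S}}) \leftarrow \mathcal {S}_0(|D_1|, \ldots , |D_n|) \\ \quad \text {for } 1 \le i \le q: \\ \quad \quad (w_i, st_{\mathcal {A}}) \leftarrow \mathcal {A}_i(st_{\mathcal {A}}, \mathcal {I}, \mathbf{C}, T_1, \ldots , T_{i-1}) \\ \quad \quad (T_i, st_{\mathcal {S}}) \leftarrow \mathcal {S}_i(st_{\mathcal {S}}, \tau (\mathbf{D}, \mathbf{w}_i)) \\ \quad \text {output } v = (\mathcal {I}, \mathbf{C}, T_1, \ldots , T_q) \text { and } st_{\mathcal {A}} \end{array} $$

Under this notation, the condition required of the distinguisher \(\mathcal {D}\) would read

$$\begin{aligned} |\Pr [\mathcal {D}(v, st_{\mathcal {A}}) = 1 \mid (v, st_{\mathcal {A}}) \leftarrow \mathbf {Real}_{\mathtt {SSE}, \mathcal {A}}(\lambda )] - \Pr [\mathcal {D}(v, st_{\mathcal {A}}) = 1 \mid (v, st_{\mathcal {A}}) \leftarrow \mathbf {Sim}_{\mathtt {SSE}, \mathcal {A}, \mathcal {S}}(\lambda )]| \le \mathtt {negl}(\lambda ). \end{aligned}$$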

3 The Chase-Kamara Scheme

In this section, we give the Chase-Kamara scheme which is directly obtained by simplifying the structured encryption schemes (especially, the associative structured encryption scheme for labeled data) proposed in [13].

Let \(F : \{0, 1\}^\lambda \times \{0, 1\}^k \rightarrow \{0, 1\}^\ell \) be a pseudo-random function, and \(\mathtt {SKE}\) be a symmetric-key encryption scheme. Let n be the number of stored documents, that is, \(\mathbf{D}= (D_1, \ldots , D_n)\). In the Chase-Kamara scheme, we require that \(\ell \ge n\). Here, we use the following bit string \(b_1 || \cdots || b_n || b_{n+1} || \cdots || b_{\ell }\) as another representation of \(\mathbf{D}(w)\): \(b_i = 1\) if \(id_i \in \mathbf{D}(w)\), and \(b_i = 0\) otherwise (in particular, the padding bits \(b_{n+1}, \ldots , b_{\ell }\) are all 0). For example, if \(n = 3\), \(\ell = 5\), and \(\mathbf{D}(w) = \{id_1, id_3\}\), then we also regard \(\mathbf{D}(w)\) as 10100. The encrypted index \(\mathcal {I}\) built in the Chase-Kamara scheme consists of pairs \(\{ (key, val) \}\). Let the notation \(\mathcal {I}[x]\) be y if there exists a pair (x, y) in \(\mathcal {I}\), or \(\bot \) otherwise. Then, the Chase-Kamara scheme is given as follows:

  • \(\mathtt {Gen}(1^\lambda )\):

    1. Choose \(K_1, K_2 \xleftarrow {u}\{0, 1\}^\lambda \) and \(K_3 \leftarrow \mathtt {SKE.Gen}(1^\lambda )\).

    2. Output \(K = (K_1, K_2, K_3)\).

  • \(\mathtt {Enc}(K, \mathbf{D})\):

    1. Let \(\mathcal {I}= \emptyset \).

    2. For \(w \in \varDelta \),

      (a) Compute \(key = F(K_1, w)\) and \(val = \mathbf{D}(w) \oplus F(K_2, w)\).

      (b) Append (key, val) to \(\mathcal {I}\).

    3. For \(1 \le i \le n\), compute \(C_i \leftarrow \mathtt {SKE.Enc}(K_3, D_i)\).

    4. Output \(\mathcal {I}\) and \(\mathbf{C}= (C_1, \ldots , C_n)\).

  • \(\mathtt {Trpdr}(K, w)\):

    1. Compute \(T_1 = F(K_1, w)\) and \(T_2 = F(K_2, w)\).

    2. Output \(T = (T_1, T_2)\).

  • \(\mathtt {Search}(\mathcal {I}, T)\):

    1. Parse \(T = (T_1, T_2)\).

    2. Let \(S = \emptyset \).

    3. If \(\mathcal {I}[T_1] = \bot \) then output \(\emptyset \).

    4. Compute \(v = \mathcal {I}[T_1] \oplus T_2\).

    5. Parse \(v = v_1 || \cdots || v_n || v_{n+1} || \cdots || v_\ell \), where \(v_i \in \{0, 1\}\) for \(1 \le i \le \ell \).

    6. For \(1 \le i \le n\), append \(id_i\) to S if \(v_i = 1\).

    7. Output S.

  • \(\mathtt {Dec}(K, C)\):

    1. Compute \(D \leftarrow \mathtt {SKE.Dec}(K_3, C)\).

    2. Output D.

The Chase-Kamara scheme is adaptively secure if \(\mathtt {SKE}\) is LOR-CPA secure and F is a pseudo-random function. This security proof is very simple and straightforward (see [13]).
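The index-related algorithms above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of [13]: the pseudo-random function F is stood in for by HMAC-SHA256 in counter mode, keys of the index are truncated to 32 bytes, \(\ell \) is rounded up to whole bytes, identifiers are the integers 1 to n, and document encryption with \(\mathtt {SKE}\) is omitted.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Set, Tuple

def prf(key: bytes, w: bytes, out_bytes: int) -> bytes:
    """Stand-in for F: HMAC-SHA256 in counter mode, truncated to out_bytes (an assumption)."""
    out = b""
    counter = 0
    while len(out) < out_bytes:
        out += hmac.new(key, counter.to_bytes(4, "big") + w, hashlib.sha256).digest()
        counter += 1
    return out[:out_bytes]

def to_bits(ids: Set[int], n_bytes: int) -> bytes:
    """Encode D(w) as b_1 || b_2 || ... with b_i = 1 iff id_i is in D(w) (ids are 1-based)."""
    buf = bytearray(n_bytes)
    for i in ids:
        buf[(i - 1) // 8] |= 0x80 >> ((i - 1) % 8)
    return bytes(buf)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def enc_index(k1: bytes, k2: bytes, d_of_w: Dict[bytes, Set[int]], n: int) -> Dict[bytes, bytes]:
    ell = (n + 7) // 8  # output length l >= n, rounded up to whole bytes
    # key = F(K_1, w) (truncated to 32 bytes here), val = D(w) xor F(K_2, w)
    return {prf(k1, w, 32): xor(to_bits(ids, ell), prf(k2, w, ell))
            for w, ids in d_of_w.items()}

def trapdoor(k1: bytes, k2: bytes, w: bytes, n: int) -> Tuple[bytes, bytes]:
    return prf(k1, w, 32), prf(k2, w, (n + 7) // 8)  # |T_2| grows linearly with n

def search(index: Dict[bytes, bytes], t: Tuple[bytes, bytes], n: int) -> Set[int]:
    t1, t2 = t
    if t1 not in index:
        return set()
    v = xor(index[t1], t2)  # unmask the stored bit string
    return {i for i in range(1, n + 1) if v[(i - 1) // 8] & (0x80 >> ((i - 1) % 8))}

k1, k2 = secrets.token_bytes(32), secrets.token_bytes(32)
n = 3
index = enc_index(k1, k2, {b"cloud": {1, 3}, b"hash": set()}, n)
result = search(index, trapdoor(k1, k2, b"cloud", n), n)  # the identifiers 1 and 3
```

Because every block of the mask depends on the secret key \(K_2\), the server cannot extend a short value on its own, so the user has to send all \(\ell \) bits of \(T_2\).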

We observe that the Chase-Kamara scheme can perform the keyword searching process, efficiently, thanks to very simple structures of the encrypted index \(\mathcal {I}\) and the trapdoor T. On the other hand, the trapdoor size, especially \(|T_2|\), depends on the number of stored documents (that is, n). The trapdoor size becomes critical as n is increased. For example, \(|T_2|\) is of about one billion bits (approximately, 120MB) when n is one billion.
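The size quoted above follows from a one-line calculation, sketched here under the assumption that \(\ell = n\), the smallest output length the scheme allows. The helper name `t2_size_bytes` is hypothetical.

```python
def t2_size_bytes(n: int) -> int:
    """Bytes needed for T_2 = F(K_2, w) when l = n bits."""
    return (n + 7) // 8

n = 10**9                   # one billion stored documents
size = t2_size_bytes(n)     # 125,000,000 bytes
size_mib = size / 2**20     # roughly 119.2 MiB, matching the "about 120MB" figure
```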

4 The Proposed Scheme

In this section, we tackle the trapdoor size problem of the Chase-Kamara scheme, and propose a modified scheme by using our multiple hashing technique, which can transform a short trapdoor into a long one. Our modified scheme removes the restriction \(\ell \ge n\), where n is the number of stored documents and \(\ell \) is the output length of the pseudo-random function F used in the scheme.

4.1 Our Strategy

A key point of our modified scheme is to securely divide the trapdoor generation process of the Chase-Kamara scheme by using our multiple hashing technique. In accordance with this modification of the trapdoor generation process, the encrypted index of the Chase-Kamara scheme is also slightly modified. In the keyword searching process of the Chase-Kamara scheme, the user generates a long trapdoor \(T = (T_1, T_2)\) (in particular, \(T_2 = F(K_2, w)\)) and the server searches the encrypted index \(\mathcal {I}\) by directly using the trapdoor T. In order to address the trapdoor size problem, we modify this process as follows. The user generates a short trapdoor, and the server transforms the trapdoor into a meaningful long value, which consists of multiple hash values and corresponds to the trapdoor of the Chase-Kamara scheme. Then, the server searches the encrypted index using the trapdoor and the hash values. This process can be achieved by using our multiple hashing technique. A technical overview is as follows.

As shown in Sect. 3, the encrypted index \(\mathcal {I}\) of the Chase-Kamara scheme is constructed by

$$\begin{aligned} \{ (key, val) \}_w = \{ (F(K_1, w), \mathbf{D}(w) \oplus F(K_2, w)) \}_{w \in \varDelta }, \end{aligned}$$

where w is a keyword, \(F : \{0, 1\}^\lambda \times \{0, 1\}^k \rightarrow \{0, 1\}^\ell \) is a pseudo-random function, \(K_1\) and \(K_2\) are secret keys of F, and \(\mathbf{D}(w)\) is a plain search result for a keyword w and represented as the special bit string form described in Sect. 3. A trapdoor T for a keyword w is computed by \(T = (T_1, T_2) = (F(K_1, w), F(K_2, w))\), where \(|T_2| = \ell \ge n\).

In order to address the above trapdoor size problem, we modify the encrypted index of the Chase-Kamara scheme by using the following multiple hashing technique. For a hash function \(H : \{0, 1\}^* \rightarrow \{0, 1\}^\lambda \), we modify \(\mathcal {I}\) as

$$ \{(key, val)\} = \{(F(K_1, w), \mathbf{D}(w) \oplus (h_{w, 1} || \cdots || h_{w, N}) )\}_{w \in \varDelta }, $$

where \(N = \lceil n / \lambda \rceil \) and

$$ h_{w, 1} = H( H(K_2 || w) || 1), \ldots , h_{w, N} = H( H(K_2 || w) || N). $$

In addition to the above modification of the encrypted index, we further modify the trapdoor \(T = (T_1, T_2)\) as \((F(K_1, w), H(K_2 || w))\).

Then, the keyword searching process in this modification is conducted as follows. For a trapdoor \(T = (F(K_1, w), H(K_2 || w))\), the server computes hash values \(h_{w, 1}, \ldots , h_{w, N}\) from \(T_2 = H(K_2 || w)\), and then obtains the search result by computing \(\mathcal {I}[T_1] \oplus (h_{w, 1} || \cdots || h_{w, N}) (= \mathbf{D}(w))\), similarly to the keyword searching process of the Chase-Kamara scheme. Thus, this modification dramatically reduces the trapdoor size from O(n) to \(O(\lambda )\). For example, the trapdoor size is 512 bits when SHA-256 is used, as in the construction of Sect. 4.2, where both \(T_1\) and \(T_2\) are 256-bit hash values. We also observe the advantage that the server can generate arbitrarily long values corresponding to \(T_2\) (i.e. the keyword w) with no secret information. We believe that our multiple hashing technique could be applied to other SSE schemes which have the trapdoor size problem, due to its generality and simplicity. Our multiple hashing technique is summarized in Fig. 1.

Fig. 1. Summary of our multiple hashing technique.

From a security viewpoint, the one-wayness of multiple hashing ensures that the server can infer neither the hidden keywords from trapdoors nor, until it receives the trapdoors, any information on relationships among elements of our encrypted index. As a result, we can also show adaptive security with a strategy similar to the security proof of the Chase-Kamara scheme, but in the random oracle model, since our proof strategy essentially requires randomness of hash functions.
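The server-side expansion at the heart of the technique can be sketched as follows. This is an illustration only: H is instantiated with SHA-256 (the example hash named in the text), and the counter i is encoded as a 4-byte big-endian integer, which is an assumption since the text does not fix an encoding for \(H(\cdot || i)\). The function name `expand` is hypothetical.

```python
import hashlib
from math import ceil

LAMBDA = 256  # output length of H in bits (SHA-256)

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def expand(t2: bytes, n: int) -> bytes:
    """Compute h_1 || ... || h_N with h_i = H(t2 || i) and N = ceil(n / lambda)."""
    N = ceil(n / LAMBDA)
    return b"".join(H(t2 + i.to_bytes(4, "big")) for i in range(1, N + 1))

# Placeholder secret and keyword; only the 32-byte value t2 is ever sent to the server.
t2 = H(b"example-secret-key" + b"keyword")
mask = expand(t2, 1000)   # N = 4 blocks, i.e. 128 bytes covering n = 1000 documents
```

The input to `expand` stays 32 bytes regardless of n, while its output grows with n, and no secret key is needed to compute it.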

4.2 Construction

Let \(H : \{0, 1\}^* \rightarrow \{0, 1\}^\lambda \) be a hash function. Let \(N = \lceil \frac{n}{\lambda } \rceil \) and \(\mathbf{D}(w) = b_1 || \cdots || b_n || b_{n+1} || \cdots || b_{\lambda N}\), where \(b_1, \ldots , b_n\) are represented as the special bit form described in Sect. 3 and \(b_{n+1} = \dots = b_{\lambda N} = 0\). The modified scheme is proposed as follows:

  • \(\mathtt {Gen}(1^\lambda )\):

    1. Choose \(K_1 \xleftarrow {u}\{0, 1\}^\lambda \) and \(K_2 \leftarrow \mathtt {SKE.Gen}(1^\lambda )\).

    2. Output \(K = (K_1, K_2)\).

  • \(\mathtt {Enc}(K, \mathbf{D})\):

    1. Let \(\mathcal {I}= \emptyset \).

    2. Compute \(N = \lceil \frac{n}{\lambda } \rceil \).

    3. For \(w \in \varDelta \),

      (a) Compute \(key = H(K_1 || 0 || w)\).

      (b) Compute \(h_w = H(K_1 || 1 || w)\) and \(h_{w, i} = H( h_w || i)\) for \(1 \le i \le N\).

      (c) Compute \(val = \mathbf{D}(w) \oplus (h_{w, 1} || \cdots || h_{w, N})\).

      (d) Append (key, val) to \(\mathcal {I}\).

    4. For \(1 \le i \le n\), compute \(C_i \leftarrow \mathtt {SKE.Enc}(K_2, D_i)\).

    5. Output \(\mathcal {I}\) and \(\mathbf{C}= (C_1, \ldots , C_n)\).

  • \(\mathtt {Trpdr}(K, w)\):

    1. Compute \(T_1 = H(K_1 || 0 || w)\) and \(T_2 = H(K_1 || 1 || w)\).

    2. Output \(T = (T_1, T_2)\).

  • \(\mathtt {Search}(\mathcal {I}, T)\):

    1. Parse \(T = (T_1, T_2)\).

    2. Let \(S = \emptyset \).

    3. If \(\mathcal {I}[T_1] = \bot \) then output \(\emptyset \).

    4. Compute \(N = \lceil \frac{n}{\lambda } \rceil \).

    5. Compute \(h_1' = H( T_2 || 1), \ldots , h_N' = H(T_2 || N)\).

    6. Compute \(v = \mathcal {I}[T_1] \oplus (h_1' || \cdots || h_N')\).

    7. Let \(v = v_1 || \cdots || v_n || v_{n+1} || \cdots || v_{\lambda N}\), where \(v_i \in \{0, 1\}\) for \(1 \le i \le \lambda N\).

    8. For \(1 \le i \le n\), add \(id_i\) into S if \(v_i = 1\).

    9. Output S.

  • \(\mathtt {Dec}(K, C)\):

    1. Compute \(D \leftarrow \mathtt {SKE.Dec}(K_2, C)\).

    2. Output D.

In the Chase-Kamara scheme, the user generates a trapdoor \((T_1', T_2') \) for a keyword \(w'\), and the server searches the encrypted index \(\mathcal {I}'\) by computing \(\mathcal {I}'[T_1'] \oplus T_2'\). In our modified scheme, on the other hand, the user generates a trapdoor \((T_1, T_2)\) for a keyword w, and the server transforms \(T_2\) into the value \((h_{w, 1} || \cdots || h_{w, N})\) and then searches the encrypted index \(\mathcal {I}\) by computing \(\mathcal {I}[T_1] \oplus (h_{w, 1} || \cdots || h_{w, N})\).
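The index-related algorithms of the modified scheme can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: H is instantiated with SHA-256, each part of a concatenation is length-prefixed so that it parses unambiguously, the counter is a 4-byte big-endian integer, identifiers are the integers 1 to n, and document encryption with \(\mathtt {SKE}\) is omitted.

```python
import hashlib
import secrets
from math import ceil
from typing import Dict, Set, Tuple

LAM = 32  # hash output length in bytes (lambda = 256 bits)

def H(*parts: bytes) -> bytes:
    """H over a concatenation; the length prefix on each part is an added assumption."""
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()

def mask(t2: bytes, n: int) -> bytes:
    """h_1 || ... || h_N with h_i = H(t2 || i); needs no secret key."""
    N = ceil(n / (8 * LAM))
    return b"".join(H(t2, i.to_bytes(4, "big")) for i in range(1, N + 1))

def to_bits(ids: Set[int], n_bytes: int) -> bytes:
    buf = bytearray(n_bytes)
    for i in ids:  # bit i (1-based) is set iff id_i is in D(w)
        buf[(i - 1) // 8] |= 0x80 >> ((i - 1) % 8)
    return bytes(buf)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def enc_index(k1: bytes, d_of_w: Dict[bytes, Set[int]], n: int) -> Dict[bytes, bytes]:
    N = ceil(n / (8 * LAM))
    index = {}
    for w, ids in d_of_w.items():
        key = H(k1, b"\x00", w)          # key = H(K_1 || 0 || w)
        h_w = H(k1, b"\x01", w)          # h_w = H(K_1 || 1 || w)
        index[key] = xor(to_bits(ids, N * LAM), mask(h_w, n))
    return index

def trapdoor(k1: bytes, w: bytes) -> Tuple[bytes, bytes]:
    return H(k1, b"\x00", w), H(k1, b"\x01", w)  # always 2 * 32 bytes, independent of n

def search(index: Dict[bytes, bytes], t: Tuple[bytes, bytes], n: int) -> Set[int]:
    t1, t2 = t
    if t1 not in index:
        return set()
    v = xor(index[t1], mask(t2, n))      # server expands T_2 itself
    return {i for i in range(1, n + 1) if v[(i - 1) // 8] & (0x80 >> ((i - 1) % 8))}

k1 = secrets.token_bytes(32)
n = 600
index = enc_index(k1, {b"cloud": {1, 300, 600}, b"hash": set()}, n)
t = trapdoor(k1, b"cloud")
result = search(index, t, n)
```

In contrast to the Chase-Kamara sketch, `trapdoor` does not take n as an argument; only the server-side `mask` depends on it.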

Then, we can show the following security of the modified scheme.

Theorem 1

The modified scheme is adaptively secure in the random oracle model if SKE is LOR-CPA secure.

Before proving the security of our modified scheme, we give our proof strategy. Our security proof is straightforward, similarly to that of the Chase-Kamara scheme.

  • Simulation of \(\mathcal {I}\): From the leakage information \((|D_1|, \ldots , |D_n|)\) obtained by querying on \(\mathbf{D}\), \(\mathcal {S}\) chooses \(k_i, r_{i, 1}, \ldots , r_{i, N} \xleftarrow {u}\{0, 1\}^\lambda \), and sets \(\mathcal {I}= \{ (k_i, r_{i, 1} || \cdots || r_{i, N})\}_{1 \le i \le \# \varDelta }\). With this simulation, \(\mathcal {S}\) makes \(\mathcal {A}\) believe that \(\mathcal {I}= \{(k_i, r_{i, 1} || \cdots || r_{i, N})\}\) was generated in the real experiment.

  • Simulation of T: If \(\mathcal {A}\) queries on \(w_i\), then for some j, \(\mathcal {S}\) regards \(r_{j, 1} || \cdots || r_{j, N}\) as

    $$ \begin{aligned} r_{j, 1} || \cdots || r_{j, N} &= \mathbf{D}(w_i) \oplus ( r_{j, 1}' || \cdots || r_{j, N}' ) \\ &= \mathbf{D}(w_i) \oplus ( H(r_j || 1) || \cdots || H(r_j || N) ) \end{aligned} $$

    by assigning some value \(r_j \in \{0, 1\}^\lambda \), and further regards \(r_j\) as \(H(K_1 || 1 || w_i)\). With this simulation, \(\mathcal {S}\) cheats \(\mathcal {A}\) as if \(r_j\) is obtained from \(H(K_1 || 1 || w_i)\) and \(T = (k_j, r_j)\) is generated in the real experiment. In order to simulate the above completely, \(\mathcal {S}\) computes \(r_{j, 1}', \ldots , r_{j, N}'\) from \(val_j = r_{j, 1} || \cdots || r_{j, N}\) and the leakage information \(\mathbf{D}(w_i)\) obtained by querying on \(w_i\), chooses \(r_j \xleftarrow {u}\{0, 1\}^\lambda \), and appends

    $$ \begin{array}{c|c} \hline ~ \text {Input} ~ &{} ~ \text {Output} ~ \\ \hline r_j || 1 &{} r_{j, 1}' \\ \hline \vdots &{} \vdots \\ \hline r_j || N &{} r_{j, N}' \\ \hline \end{array} $$

    into a random oracle hash table \(\mathcal {H}\).

Our formal proof with the above simulation is given as follows.

Proof

Let \(\mathcal {H} = \{ (input, output) \}\) be a random oracle hash table which is set to \(\emptyset \), initially. A PPT simulator \(\mathcal {S} = (\mathcal {S}_0, \ldots , \mathcal {S}_q)\) is constructed as follows.

\(\mathcal {S}_0\)’s simulation. For the leakage information \((|D_1|, \ldots , |D_n|)\) obtained from \(\mathcal {A}\)’s output \(\mathbf{D}= (D_1, \ldots , D_n)\), \(\mathcal {S}_0\) computes \(N = \lceil \frac{n}{\lambda } \rceil \), and chooses random numbers \(r_{1, 1}, \ldots , r_{1, N}, \ldots , r_{\delta , 1}, \ldots , r_{\delta , N} \xleftarrow {u}\{0, 1\}^\lambda \), where \(\delta = \# \varDelta \). Let

$$\begin{aligned} R_1 &= r_{1, 1} || \cdots || r_{1, N}, \\ &\ \, \vdots \\ R_\delta &= r_{\delta , 1} || \cdots || r_{\delta , N}. \end{aligned}$$

\(\mathcal {S}_0\) also chooses random numbers \(k_{1}, \ldots , k_{\delta } \xleftarrow {u}\{0, 1\}^\lambda \), and sets \(\mathcal {I}= \{ (k_i, R_i) \}_{1 \le i \le \delta }\). Further, \(\mathcal {S}_0\) runs \(SK \leftarrow \mathtt {SKE.Gen}(1^\lambda )\) and \(C_i \leftarrow \mathtt {SKE.Enc}(SK, 0^{|D_i|})\) for \(1 \le i \le n\). Then, \(\mathcal {S}_0\) sends \(\mathcal {I}\) and \(\mathbf{C}= \{C_1, \ldots , C_n\}\) to \(\mathcal {A}\).

\(\mathcal {S}_i\)’s simulation \((1 \le i \le q)\). For the leakage information \(\alpha (\mathbf{D}, \mathbf{w}_i)\) and \(\sigma (\mathbf{D}, \mathbf{w}_i)\) obtained from \(\mathcal {A}\)’s output \(w_i\), \(\mathcal {S}_i\) regards \(\mathbf{D}(w_i)\) as \(b_{i, 1} || \cdots || b_{i, n} ||b_{i, n+1} (=0) || \cdots || b_{i, \lambda N} (=0)\), where \(b_{i, j} = 1\) if \(id_j \in \mathbf{D}(w_i)\) and \(b_{i, j} = 0\) otherwise. After that, \(\mathcal {S}_i\) checks whether \(w_i \ne w_{i'}\) for any \(w_{i'} \ (1 \le i' < i)\). We note that this check can be efficiently done from the leakage information \(\sigma (\mathbf{D}, \mathbf{w}_i)\).

If \(w_i \ne w_{i'}\) for \(1 \le i' < i\), \(\mathcal {S}_i\) chooses \(1 \le j \le \delta \) which has not been chosen yet, and computes \(r_{j, 1}' || \cdots || r_{j, N}' = \mathbf{D}(w_i) \oplus R_j\). Then, \(\mathcal {S}_i\) chooses a random number \(r_j \xleftarrow {u}\{0, 1\}^\lambda \), appends

$$ (r_j || 1, r_{j, 1}'), \ldots , (r_j || N, r_{j, N}'), $$

into \(\mathcal {H}\), and sends \(T_i = (k_j, r_j)\) as a trapdoor of \(w_i\) to \(\mathcal {A}\).

If there exists \(i' < i\) such that \(w_i = w_{i'}\), \(\mathcal {S}_i\) merely re-sends \(T_{i'} = (k_j, r_j)\), which has been already chosen in the \(i'\)-th simulation, to \(\mathcal {A}\).
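The simulator's programming of the random oracle can be illustrated with a short sketch. It is an assumption-laden toy, not part of the proof: the class name `Simulator`, the lazily sampled table, the 4-byte counter encoding, and integer identifiers are all choices made for this example, and ciphertext simulation is omitted.

```python
import secrets
from math import ceil
from typing import Callable, Dict, List, Optional, Set, Tuple

LAM = 32  # lambda in bytes

def to_bits(ids: Set[int], n_bytes: int) -> bytes:
    buf = bytearray(n_bytes)
    for i in ids:
        buf[(i - 1) // 8] |= 0x80 >> ((i - 1) % 8)
    return bytes(buf)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class Simulator:
    def __init__(self, n: int, delta: int):
        self.N = ceil(n / (8 * LAM))
        # S_0: the index is just delta random pairs (k_j, R_j).
        self.entries = [(secrets.token_bytes(LAM), secrets.token_bytes(self.N * LAM))
                        for _ in range(delta)]
        self.unused = list(range(delta))           # entries not yet tied to a keyword
        self.oracle: Dict[bytes, bytes] = {}       # random oracle table
        self.issued: List[Tuple[bytes, bytes]] = []

    def index(self) -> Dict[bytes, bytes]:
        return dict(self.entries)

    def trapdoor(self, d_of_w: Set[int], same_as: Optional[int]) -> Tuple[bytes, bytes]:
        """S_i: d_of_w comes from the access pattern, same_as from the search pattern."""
        if same_as is not None:
            t = self.issued[same_as]               # repeated keyword: re-send T_{i'}
        else:
            k_j, R_j = self.entries[self.unused.pop()]
            r_prime = xor(to_bits(d_of_w, self.N * LAM), R_j)
            r_j = secrets.token_bytes(LAM)
            for i in range(1, self.N + 1):         # program H(r_j || i) = r'_{j,i}
                self.oracle[r_j + i.to_bytes(4, "big")] = r_prime[(i - 1) * LAM:i * LAM]
            t = (k_j, r_j)
        self.issued.append(t)
        return t

    def H(self, data: bytes) -> bytes:
        if data not in self.oracle:                # unprogrammed points get fresh values
            self.oracle[data] = secrets.token_bytes(LAM)
        return self.oracle[data]

def server_search(index: Dict[bytes, bytes], t: Tuple[bytes, bytes], n: int,
                  H: Callable[[bytes], bytes]) -> Set[int]:
    t1, t2 = t
    N = ceil(n / (8 * LAM))
    v = xor(index[t1], b"".join(H(t2 + i.to_bytes(4, "big")) for i in range(1, N + 1)))
    return {i for i in range(1, n + 1) if v[(i - 1) // 8] & (0x80 >> ((i - 1) % 8))}

sim = Simulator(n=600, delta=4)
t_first = sim.trapdoor({2, 599}, None)
found = server_search(sim.index(), t_first, 600, sim.H)
```

The random index entry \(R_j\) is fixed before the query arrives, yet the search returns the leaked result, because the oracle outputs are chosen after the fact to make the XOR come out right.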

Analysis of \(\mathcal {S}\)’s simulation.

  • \(\mathcal {I}\) and \((T_1, \ldots , T_q)\) output by \(\mathcal {S}\) work correctly, similarly to \(\mathbf {Real}\).

  • For any \(1 \le i \le n\), \(\mathcal {A}\) cannot distinguish \(C_i\) output by \(\mathcal {S}_0\) from \(C_i\) output by Real since \(\mathtt {SKE}\) is LOR-CPA secure.

  • The probability that for any \(1 \le i \le q\), \(\mathcal {A}\) can query \(K_1 || 0 || w_i\) to the random oracle (i.e. \(\mathcal {H}\)) a priori (in other words, the probability that \(\mathcal {A}\) can obtain its corresponding hash value \(k_j\) a priori), is negligible since \(\mathcal {A}\) has no secret key and cannot infer it without querying on \(w_i\).

  • The probability that for any \(1 \le i \le q\), \(\mathcal {A}\) can query \(K_1 || 1 || w_i\) to \(\mathcal {H}\) a priori (in other words, the probability that \(\mathcal {A}\) can obtain its corresponding hash value \(r_j\) a priori), is negligible since \(\mathcal {A}\) has no secret key and cannot infer it without querying on \(w_i\).

  • The probability that for any \(1 \le j \le \delta \) and any \(1 \le i \le N\), \(\mathcal {A}\) can query \(r_j || i\) to \(\mathcal {H}\) a priori (in other words, the probability that \(\mathcal {A}\) can obtain its corresponding hash value \(r_{j, i}'\)), is negligible since \(\mathcal {A}\) cannot have \(r_j\) a priori for any \(1 \le j \le \delta \) without querying on \(w_i\).

  • The probability that for any \(1 \le j \le \delta \) and any \(1 \le i \le N\), \(\mathcal {A}\) can infer \(r_{j, i}'\) from \(R_j\), is negligible since \(\mathcal {A}\) cannot have \(\mathbf{D}(w_i)\) a priori for any \(1 \le i \le q\) without querying on \(w_i\).

From the above analysis, neither \(\mathcal {A}\) nor any distinguisher \(\mathcal {D}\) can distinguish \((k_i, R_i)\) output by \(\mathcal {S}\) from \((key_i, val_i)\) output by \(\mathbf {Real}\) for any \(1 \le i \le \delta \). Thus, the modified scheme is adaptively secure in the random oracle model.    \(\square \)

5 Conclusion

In this paper, we have presented the Chase-Kamara scheme, which is obtained by simplifying the structured encryption schemes of [13]. We have focused on the trapdoor size problem of the Chase-Kamara scheme, and proposed a modified scheme whose trapdoor size does not depend on the number of stored documents. The modified scheme is based on our multiple hashing technique, which can transform a short trapdoor into a long one. We have shown that the modified scheme is adaptively secure in the random oracle model.

Future work is to show that our modified scheme is adaptively secure under standard assumptions. We note that our modified scheme satisfies non-adaptive security if pseudo-random functions are employed instead of hash functions.