Keywords

1 Introduction

A searchable encryption scheme (SSE) allows a party to encrypt a message, index the ciphertext, and at any point in time to efficiently look for the plaintext by issuing a search token encoding a search criterion. In addition, a searchable symmetric encryption scheme is dynamic, if it supports updates of the encrypted database. SSE is an ideal tool in settings where a party would like to outsource some data while it still wishes to maintain some privacy guarantees.

The best-possible notion of privacy hides the memory access during searches and updates. In particular, a server should not be able to tell whether a client retrieves a document which it already obtained from a previous query. One calls this property privacy of the access pattern. Satisfying access pattern privacy in its full generality requires oblivious RAMs introduced by Goldreich and Ostrovsky [6]. Unfortunately, as it is often the case, general-purpose solutions are inefficient and rather of theoretical interest (although significant steps have been made towards practical ORAMs [11]). A weaker notion, known as privacy of the search pattern, hides any information about the encoded keywords. In particular, a server should not be able to tell whether a client already has searched for the same word. Dynamic SSE schemes additionally ask for update privacy. The system shall leak no information about the documents and keywords through the execution of update protocols.

Apart from privacy, relevant design criteria for SSE systems include support of comprehensive search queries and scalability aspects measured in the communication and search complexity. Consider an encrypted version of a SQL-database. Search queries are typically formulated in a regular language, allowing for comprehensive expressions over multiple keywords. In file systems, such as Google’s Drive and Amazon’s S3, millions of users push, pull and delete files. Here, the desiderata is a low communication complexity for searches and updates. Ideally, the operation shall require a single round of communication, as most clients connect through high latency networks.

Previous Work. The research community has proposed different SSE schemes, each one addressing different trade-offs between security, functionality and efficiency. While prior SSE schemes achieve sublinear search time that scales only with the number of documents matching the query [24, 7, 8], most of them lack of important properties usually required in practice like the ability to update the EDB, expressiveness in search queries and the capacity to run a search parallel searching. In Table 1 we give a comparison of the relevant schemes.

The exception is the ground-breaking work of Cash et al. [2, 3]. In [3] they provide an SSE scheme with truly practical search capability and support of conjunctive search queries. Using techniques from multi-party computation they construct a scheme with search complexity \(\mathcal {O}(r)\) where r is the number of documents matching a search word. They reduce the communication complexity with a clever pre-processing technique. Later Cash et al. [2] propose efficient SSE schemes that support updates to the encrypted database, i.e. dynamics. The protocols resemble previous techniques from [3].

Table 1. Comparison of SSE schemes for a single keyword [2, Fig. 1]. Notation: In security Ad means adaptive security in the standard model, Ad(ROM) means adaptive security in the random oracle model, NonAd means non-adaptive security; Leakage is leakage from encrypted database only; Search time is the complexity of the search algorithm while Comm. is the number of communication rounds; Update measures the communication complexity; Formula asks for support of Boolean expression; n = \(\#\) documents, \(m=|\mathcal {W}|\), \(r=|{\mathbf{DB }}(w)|\), \(N=\sum _w |{\mathbf{DB }}(w)|\), \(M\,=\,max_w |{\mathbf{DB }}(w)|\), p = \(\#\) processors, \(|\mathcal {W}_{id}|\) = \(\#\) keyword changes in an update, \(d_w\) = \(\#\) times the searched-for-keyword has been added/deleted.

Our Results. In this work we construct a searchable symmetric encryption system for searching on encrypted that is

  1. 1.

    Search efficient. Searching the encrypted database for all files containing a single keyword is independent of the number of files. It is bounded by the size of the word dictionary \(\mathcal {W}\), i.e. \(\mathcal {O}(\log |\mathcal {W}|)\) where the size of dictionary is fixed in advance.

  2. 2.

    Communication efficient. Communicating a search and update requires a single round of communication. A search queries is succinct. It contains \(2 \log |\mathcal {W}|\) elements of a pairing-friendly group \(\mathbb {G}.\)

  3. 3.

    Private. Searching for all files containing a word leaks no information about the searched word. Search tokens are probabilistic and indistinguishable from prior search queries.

  4. 4.

    Dynamic. Addition and deletion takes as much time and communication as searching for the file.

  5. 5.

    Functional. Searching for all files containing multiple keywords expressed as a Boolean formula of size \(|\mathcal {W}_{id}|\) takes search time \(\mathcal {O}(|\mathcal {W}_{id}|\cdot \log (|\mathcal {W}|)).\)

Our scheme achieves the results by expressing the data structure as an encrypted binary tree (while prior work followed a linked list approach [24, 7, 8]). To traverse the encrypted tree, we construct a function-private secret-key functional encryption scheme for the inner product functionality. The system supports any arbitrary polynomial number of key queries and message queries with the property that the functional keys decrypt specific ciphertexts only. Our construction makes use of symmetric bilinear maps. The security notion we prove for our construction is a natural indistinguishability-based notion, and we establish it under the Subgroup Decision Assumption. To obtain correctness for our scheme, we assume that inner products will be contained in a polynomially-sized range. This assumption is sufficient for our application, as the tree is binary.

2 Preliminaries

2.1 Bilinear Groups

We recall some facts about bilinear groups and vector spaces.

Definition 1

(Bilinear Group). A bilinear group is generated by a probabilistic algorithm \(\mathcal {G}\) that takes as input a security parameter \({\lambda }\) and outputs three abelian groups \(\mathbb {G}, \mathbb {G}_1, \mathbb {G}_T\) with \(\mathbb {G}_1 \subset \mathbb {G}.\) The algorithm also computes an efficiently computable map \(e:\mathbb {G}\times \mathbb {G}\rightarrow \mathbb {G}_T\) that is:

  • bilinear: For all \(g,h\in \mathbb {G}\), \(x,y\in \mathbb {F}_p\) we have \(e(g^x,h^y)=e(g,h)^{xy}\)

  • non-degenerate: For all \(g,h\in \mathbb {G}\), we have \(e(g,h)\not = 1.\)

We assume that group operations and random sampling in each group is efficiently computable and denote the output of \(\mathcal {G}\) as \((\mathbb {G}, \mathbb {G}_1, \mathbb {G}_T, e).\)

In additive notation, a prime order bilinear group is closed under addition and scalar multiplication. It gives raise to a vector space over \(\mathbb {G}.\) To this end, we introduce some additional notation. Let \(\mathbf {x}=(x_1,\ldots , x_n)\in \mathbb {F}_p^n\) be an exponent vector. We write \(g^{\mathbf {x}}=(g^{x_1},\ldots ,g^{x_n})\) for the n-dimensional group vector in \(\mathbb {G}.\) For any “scalar” \(\alpha \in \mathbb {F}_p\) we use the notation \((g^{\mathbf{x }})^\alpha \) to denote the scalar product \((g^{\alpha x_1},\ldots ,g^{\alpha x_n}).\) If the context is clear, we use the term vector interchangeably for group elements and exponents.

Looking at bilinear groups as vector spaces allows us to express linear mappings between such spaces. One such linear mapping is the “dot” product (sometimes referred to as inner product in Euclidean spaces), defined as the sum of the products of the corresponding entries of two vectors. Geometrically, it is the product of the Euclidean magnitudes of the two vectors and the cosine of the angle between them. We define the analog of the dot product between two n-dimensional vectors in bilinear groups.

Definition 2

(Dot Product Group). A dot product group is a bilinear group generated by the group generator \(\mathcal {G}_*.\) The generator also outputs an efficiently computable algorithm \(d:\mathbb {G}^n\times \mathbb {G}^n\rightarrow \mathbb {G}_T.\) The algorithm computes the “dot product” between two vectors \(g^{\mathbf {x}}, h^{\mathbf {y}}\), written \(g^{\mathbf {x}} * h^{\mathbf {y}}\), as

$$\begin{aligned} d(g^{\mathbf {x}}, h^{\mathbf {y}})=\prod ^n_{i=1} e(g,h)^{x_i y_i}=e(g,h)^{\mathbf {x}*\mathbf {y}} \end{aligned}$$

The dot product fulfils the following properties if \(\mathbf {x}, \mathbf {y}\) and \(\mathbf {z}\) are “vectors” in \(\mathbb {F}_p^n\) and \(\alpha ,\beta \) are “scalars” in \(\mathbb {F}_p\):

  • commutative: \(g^{\mathbf {x}} * h^{\mathbf {y}}= h^{\mathbf {y}}*g^{\mathbf {x}}\)

  • distributive (over multiplication): \(g^{\mathbf {x}} * (h^{\mathbf {y}} \ h^{\mathbf {z}}) = (g^{\mathbf {x}} * h^{\mathbf {y}}) \ (g^{\mathbf {x}}* h^{\mathbf {z}}) \)

  • scalar multiplication: \( g^{\alpha \mathbf {x}} * h^{\beta \mathbf {y}}= (g^{\mathbf {x}})^\alpha * (h^{\mathbf {y}})^\beta = (g^{\mathbf {x}} * h^{\mathbf {y}})^{\alpha \beta } \)

  • bilinear: \(g^{\mathbf{x }} * (h^{\alpha \mathbf{y }} \ h^{{\mathbf {z}}} ) = (g^{\mathbf{x }} * h^{ \mathbf{y }})^\alpha \ (g^{\mathbf{x }} * h^{{\mathbf {z}}} ).\)

2.2 Dual Spaces

We will employ the concept of dual pairing vector spaces from [10]. We choose two random sets of vectors: \(\mathbb {B}:= \{{\mathbf{b }}_{1}, \ldots , {\mathbf{b }}_{m}\}\) and \(\mathbb {B}^*:= \{{\mathbf{b }}^*, \ldots , {\mathbf{b }}^*_m\}\) subject to the constraint that they are “dual orthonormal” in the following sense:

$$\begin{aligned}&\langle {\mathbf{b }}_i, {\mathbf{b }}^*_i \rangle = 1\;{\text {mod}}\;p\;\text {for all } i\\&\langle {\mathbf{b }}_i, {\mathbf{b }}^*_j \rangle = 0\;{\text {mod}}\;p\; \text {for all } i\not = j \end{aligned}$$

where \(\langle \cdot , \cdot \rangle \) denotes the dot product.

We note that choosing sets \((\mathbb {B},\mathbb {B}^*)\) at random from sets satisfying these dual orthonormality constraints can be realized by choosing a set of n vectors \(\mathbb {B}\) uniformly at random from \(\mathbb {F}^n_p\) (these vectors will be linearly independent with high probability), then determining each vector of \(\mathbb {B}^*\) from its orthonormality constraints (these vectors will be close to the uniform distribution with high probability). We will denote choosing random dual orthonormal sets this way as: \((\mathbb {B}, \mathbb {B}^*)\leftarrow {\mathcal {D}ual}(\mathbb {F}^n_p).\)

2.3 Subset Membership Problem

The problem of deciding membership in a subset appears in various forms in cryptography. One canonical example is the decisional Diffie-Helman (DDH) problem in a group \(\mathbb {G}\) of prime order p generated by g: Given \((g, g^x, g^y, T_b)\in \mathbb {G}\) the decisional Diffie-Hellman problem asks to decide if \(T_0=g^{xy}\) or \(T_1=g^z\) for random \(x,y,z{\leftarrow _R}\mathbb {F}_p.\) If we define the group G to be generated by (gg) and the subgroup \(G_1\) generated by \((g, g^x)\), then the DDH problem asks to decide if \((g^y, T_b)\) is a random member in G or \({G_{1}}\).

We recall Freeman’s definition of the subgroup decision problem in the setting of symmetric pairing-friendly groups [5]. The assumption states that it is infeasible to distinguish a random sample from group G and a random sample from the subgroup \(G_1\subset G.\) It has been used to prove security of the Boneh-Goh-Nissim encryption system and many other applications [1, 5].

Definition 3

(Subgroup Decision Assumption). Let \(\mathcal {G}_*\) be a symmetric bilinear dot product group generator. We define the following distribution

$$\begin{aligned} {\begin{array}{c} {{{\textsf {\textit{param}}}}}:=(\mathbb {G}, \mathbb {G}_1, \mathbb {G}_T ,e,d)\leftarrow \mathcal {G}(1^{\lambda })\\ G= \mathbb {G}\ \ G_1=\mathbb {G}_1 \\ T_{0} {\leftarrow _R}G_1 \ \ T_{1}{\leftarrow _R}G \end{array} }\end{aligned}$$

We define the advantage of an algorithm \({\mathcal {A}}\) in solving the subgroup decision problem to be

$$\begin{aligned} {\mathsf{Adv}_{\tiny {\mathsf{SDP}}}^{{\mathcal {A}}}} ({{\lambda }})= \big | Pr \left[ \begin{array}{c} {\mathcal {A}}({{{\textsf {\textit{param}}}}}, T_{0})=1 \end{array} \right] - Pr\left[ \begin{array}{c} {\mathcal {A}}({{{\textsf {\textit{param}}}}}, T_{1})=1 \end{array} \right] \big | \end{aligned}$$

We say that \(\mathcal {G}\) satisfies the subgroup decision assumption, if \({\mathsf{Adv}_{\tiny {\mathsf{SDP}}}^{{\mathcal {A}}}} ({{\lambda }})\) is a negligible function of \({\lambda }\) for any polynomial-time adversary \({\mathcal {A}}.\)

Note, if the subgroup decision problem is infeasible in \(\mathbb {G}\), then it is in \(\mathbb {G}_T\) as well.

2.4 Cryptographic Building Blocks

Our searchable encryption will make use of pseudo-random objects. These can be efficiently generated with pseudo-random functions.

Definition 4

(Pseudo-random Function). Let \({\mathsf {PRF}}: {\{0,1\}}^* \times {\{0,1\}}^* \rightarrow {\{0,1\}}^*\) be an efficient, length-preserving, keyed function. We define the advantage of distinguisher \({\mathcal {A}}\) as

$$\begin{aligned} {\mathsf{Adv}_{\tiny {\mathsf{PRF}}}^{{\mathcal {A}}}} ({{\lambda }})= \left| {\text {Pr}\big [{{\mathcal {A}}^{{\mathsf {PRF}}_s(\cdot )}(1^{\lambda })=1}\big ]} - {\text {Pr}\big [{{\mathcal {A}}^{f(\cdot )}(1^{\lambda })=1}\big ]} \right| \end{aligned}$$

where the seed s is chosen at random from \({\{0,1\}}^*\) and f is uniformly chosen at random from the set of functions mapping \({\lambda }\) strings to \({\lambda }\) strings. We say that \({\mathsf {PRF}}\) is a pseudorandom function, if \({\mathsf{Adv}_{\tiny {\mathsf{PRF}}}^{{\mathcal {A}}}} ({{\lambda }})\) is a negligible function of \({\lambda }\) for all probabilistic polynomial-time distinguishers \({\mathcal {A}}.\)

3 Constrained Functional Encryption over the Message Plaintext

3.1 Syntax

We will consider a specialization of the general definition of functional encryption to the particular functionality of computing dot products of n-length message plaintext vectors over a finite field \(\mathbb {F}_p\) with one caveat. Whereas the functional encryption paradigm supports the generation of keys for the decryption of a particular function for any ciphertext, our notion additionally constraints the decryptability to a particular ciphertext.

To make the difference to functional encryption clear, we will refer to the scheme as constrained functional encryption. A private key functional encryption scheme for this functionality will have the following algorithms:

Definition 5

(Constrained Functional Encryption). A constrained functional encryption system \({\mathsf {CFE}}\) consists of four algorithms \(({\mathsf {Setup}}, {\mathsf {KeyGen}}, {\mathsf {Enc}},{} {\mathsf {Dec}})\), such that

  • The \({\mathsf {Setup}}\) algorithm will take in the security parameter \({\lambda }\) and the vector length a parameter n (a positive integer that is polynomial in \(\lambda \)). It will produce a master secret key \({{MSK}}.\)

  • The encryption algorithm \({\mathsf {Enc}}\) will take in the master secret key \({{MSK}}\), and a vector \(\mathbf{x }\in \mathbb {F}_p^n.\) It produces a ciphertext \({\mathbf{CT }}_{\mathbf{x },q}\) and an internal state \(st_q\) for the q-th ciphertext. (We will use counter q to point to a ciphertext.)

  • The key generation algorithm \({\mathsf {KeyGen}}\) will take in the master secret key \({{MSK}}\), a vector \(\mathbf{y }\in \mathbb {F}_p^n\) and the internal state \(st_q.\) It produces a secret key \({{\mathbf{SK }}}_{\mathbf{y },q}.\)

  • The decryption algorithm \({\mathsf {Dec}}\) will take in a secret key \({{\mathbf{SK }}}_{\mathbf{y },q}\) and a ciphertext \({\mathbf{CT }}_{\mathbf{x },q}.\) It will output a value \(z \in \mathbb {F}_p.\)

For correctness, we require that for all \(\mathbf{x },\mathbf{y }\in \mathbb {F}_p^n\), all \({{MSK}}\) in the support of \({\mathsf {Setup}}(1^{\lambda }, n)\), all pairs \(({\mathbf{CT }}_{\mathbf{x },q}, st_q)\) result of calling \({\mathsf {Enc}}({{MSK}}, \mathbf{x })\), all decryption keys \({{\mathbf{SK }}}_{\mathbf{y },q}\) result of calling \({\mathsf {KeyGen}}(MSK, \mathbf{y }, st_q)\), we have

$$\begin{aligned} {\mathsf {Dec}}({{\mathbf{SK }}}_{\mathbf{y },q}, {\mathbf{CT }}_{\mathbf{x },q})= \langle \mathbf{x }, \mathbf{y }\rangle \end{aligned}$$

3.2 Security

We will consider an indistinguishability-based security notion defined by a game between a challenger and an attacker. At the beginning of the game the challenger calls \({\mathsf {Setup}}(1^{\lambda }, n)\) to produce the master secret \({{MSK}}.\) The challenger also selects a random bit b. Throughout the game, the attacker can (adaptively) interact with two oracles.

  • To make a ciphertext query, the attacker submits two vectors \(\mathbf{x }^0, \mathbf{x }^1 \in \mathbb {F}_p^n\) to the challenger, who then runs \({\mathsf {Enc}}(MSK, \mathbf{x }^b)\) and returns the resulting ciphertext \({\mathbf{CT }}_{\mathbf{x }^b,q}\) to the attacker. The challenger stores the state information \(st_q\) for the q-th query.

  • To make a key query, it submits two vectors \(\mathbf {y}^0, \mathbf {y}^1 \in \mathbb {F}_p^n\) along a pointer to the q-th ciphertext query to the challenger, who then runs \({\mathsf {KeyGen}}({{MSK}}, \mathbf{y }^b, st_q)\) and returns the resulting \({{\mathbf{SK }}}_{\mathbf{y }^b}\) to the attacker.

The attacker can make any polynomial number of key and ciphertext queries throughout the game. Note, the result of each ciphertext query is the generation of a ciphertext plus some internal state. We denote by \(st_q\) the state information related to the q-th ciphertext. The challenger uses \(st_q\) to answer key queries linked to the q-th ciphertext. In other words, the decryption key only decrypts the inner product when applied to the q-th ciphertext. This captures the idea of constrained decryptability of ciphertexts. At the end of the game, the attacker must submit a guess \(b'\) for the bit b. We require that for all ciphertext queries \(\mathbf {x}^0, \mathbf {x}^1\) and key queries \(\mathbf {y}^0, \mathbf {y}^1\), it must hold that

$$\begin{aligned} \langle \mathbf {x}^0, \mathbf {y}^0 \rangle = \langle \mathbf {x}^1, \mathbf {y}^1 \rangle \end{aligned}$$

The attacker’s advantage is defined to be the \(\left| \Pr [b=b']-\frac{1}{2}\right| .\)

Definition 6

We say a private key functional encryption scheme for dot products over \(\mathbb {F}_p^n\) has indistinguishable ciphertexts in presence of constrained decryption keys (or simply, is deemed secure), if any PPT attacker’s advantage in the above game is negligible as a function of the security parameter \({\lambda }.\)

3.3 Construction

We now present our construction in symmetric bilinear groups. We will choose random dual orthonormal bases \(({\mathbf{b }}_1,{\mathbf{b }}_2)\in \mathbb {B}\) and \(({\mathbf{b }}_1^*, {\mathbf{b }}_2^*)\in \mathbb {B}^*\) that will be used in the exponent to encode the message and one-time key vectors respectively. Vectors will be encoded twice to create space for a hybrid security proof, resulting in a ciphertext \((A_i, B_i)_{i=1}^n.\) A bit more concrete, the first bases \(({\mathbf{b }}_1, {\mathbf{b }}_1^{*})\) encode the message vector \(\mathbf{x }\) and \(\mathbf{x }\), whereas the second bases \(({\mathbf{b }}_2, {\mathbf{b }}_2{^*})\) encode the key vector \({\mathbf {s}}\) and \({\mathbf {u}}.\)

We view it as a core feature of our construction that the structure of messages and keys in our scheme is perfectly symmetric, just on different sides of dual orthonormal bases. This gives raise to a scheme that is both homomorphic to the message plaintext and key with respect to addition and multiplication.

To generate a decryption key for an inner product function, we encrypt the vector \(\mathbf{y }\) under the key vector \({\mathbf {v}}\) and \(\mathbf{t }\), obtaining the “ciphertext” \((C_i, D_i)_{i=1}^n\), and add a cancellation term \(E=e(g_2,h_2)^{\langle {\mathbf {s}}, \mathbf{t }\rangle }\) plus the base \(F=e(g_1, h_1)\) for the discrete log computation. Decryption first “homomorphically” evaluates the inner product over the A and D elements. (The B and C elements are used in the proof.) The result is a ciphertext encoding the inner products of the message and key vectors. Next decryption just cancels out the key component E and computes the discrete log to the base F.

We will only require decryptions of \(\langle \mathbf {x}, \mathbf {y} \rangle \) from a fixed polynomial range of values inside \(\mathbb {F}_p\), as this will allow a decryption algorithm to compute it as a discrete log in a group where discrete log is generally hard. Hence, we expect the range of \(\langle \mathbf {x}, \mathbf {y} \rangle \) to be small, say an integer in the set \(\{0, \ldots , T\}.\) Using Pollard’s lambda method the computation of the discrete log takes expected time \(\mathcal {\tilde{O}}(\sqrt{T})\) or alternatively space \(\mathcal {O}(T)\) by storing a look up table for the T entries.

  • \({{\mathsf {Setup}}(1^{\lambda },n)}\): On input the security parameter \(1^{\lambda }\) and the dimension n, compute a symmetric bilinear dot product group \((\mathbb {G}, \mathbb {G}_1, \mathbb {G}_T, e, d)\leftarrow \mathcal {G}_*(1^{\lambda })\) with \(|\mathbb {G}|=p.\) Choose generators \(g_1,h_1\in \mathbb {G}\) and \(g_2,h_2\in \mathbb {G}_1.\) Sample at random orthonormal base \(\mathbb {B}, \mathbb {B}^*\leftarrow {\mathcal {D}ual}(\mathbb {F}_p^2).\) The algorithm outputs the master secret MSK as \(\mathbb {B}, \mathbb {B}^*, g_1, g_2, h_1, h_2, p, n.\)

  • \({\mathsf {Enc}}({{MSK}}, \mathbf{x })\): To encrypt a message \(\mathbf{x }\in \mathbb {F}_p^n\) under secret key \({{MSK}}\), choose random vectors \({\mathbf {s}}_q,{\mathbf {u}}_q\in \mathbb {F}_p^n.\) Output ciphertext

    $$\begin{aligned} \{ A_i=(g_1^{{\mathbf{b }}_1})^{x_i} (g_2^{{\mathbf{b }}_2})^{s_i} ,\ B_i= (h_1^{{\mathbf{b }}^*_1})^{x_i} (h_2^{{\mathbf{b }}^*_2})^{u_i} \}_{i=1}^n \end{aligned}$$

    and store the random vectors \(st_q \leftarrow ({\mathbf {s}}_q,{\mathbf {u}}_q).\)

  • \({\mathsf {KeyGen}}({{MSK}}, \mathbf{y }, st_q)\): To generate a decryption key for the q-th ciphertext under master secret \({{MSK}}\) for vector \(\mathbf{y }\in \mathbb {F}_p^n\), the algorithm chooses random vectors \(\mathbf{t }, {\mathbf {v}}\in \mathbb {F}_p^n\) and sets the secret key \({{SK}}_{\mathbf{y },q}\) as

    $$\begin{aligned} \{ C_i=(g_1^{{\mathbf{b }}_1})^{y_i} (g_2^{{\mathbf{b }}_2})^{v_i} ,\ D_i= (h_1^{{\mathbf{b }}^*_1})^{y_i} (h_2^{{\mathbf{b }}^*_2})^{t_i} \}_{i=1}^n, \ E=e(g_2, h_2)^{{\mathbf {s}}_q \mathbf{t }}, \ F=e(g_1,h_1) \end{aligned}$$
  • \({\mathsf {Dec}}(SK_{\mathbf{y },q}, CT)\): To decrypt a ciphertext \({\mathbf{CT }}_{\mathbf{x },q}=(A,B)\) with secret key \({{SK}}_{\mathbf{y },q}=(C,D,E,F)\), compute

    $$\begin{aligned} \prod _{i=1}^n \frac{d(A_i,D_i)}{E} \end{aligned}$$

    and return the discrete log to the base \(F=e(g_1,h_1).\)

We would like to comment on the scheme:

  • The above construction is \(\textit{stateful}.\) It requires the encryptor to store the key vectors \(st_q=({\mathbf {s}}_q, {\mathbf {u}}_q)\) for every ciphertext. For efficient realizations of \({\mathsf {KeyGen}}\) one may reduce the storage complexity and re-compute the state using standard key derivation techniques. That is, instead of sampling vectors \({\mathbf {s}}_q,{\mathbf {u}}_q\) uniformly at random from \(\mathbb {F}_p\), run a pseudorandom function \({\mathsf {PRF}}(k_i,q)\) for \(i\in \{s,u\}\) where the \(k_i\)’s are part of the master secret \({{MSK}}\) and q is a pointer to the ciphertext.

  • Some emerging applications ask to compute a predicate \(P_\mathbf{x }:\mathbb {F}_p^n\rightarrow {\{0,1\}}\) from the class of predicates \(\mathcal {P}=\{P_\mathbf{x }| \mathbf{x }\in \mathbb {F}_p^n \}\) where \(P_\mathbf{x }(\mathbf{y })=1\) if \(\langle \mathbf{x }, \mathbf{y }\rangle =0\) and \(P_\mathbf{x }(\mathbf{y })=0\) otherwise. It has been shown that this way one can evaluate degree n polynomials and 2-CNF/DNF formuals [9]. Our scheme supports efficient predicate tests without computing the discrete log by comparing the output of the decryption to \(F^0=e(g_1,h_1)^0.\)

We prove the following main theorem in the full version.

Theorem 1

Assume the SDA assumption holds in \(\mathcal {G}\), then the above scheme is secure.

4 Dynamic Searchable Symmetric Encryption

A searchable encryption allows a client to encrypt data in such a way that it can later generate search tokens to send as queries to a storage server. Given a search token, the server can search over the encrypted data and return the appropriate encrypted files. Symmetric searchable encryption systems typically follow a blue print (at least when the system tolerates leakage of access patterns): One first encrypts the data with a scheme supporting pseudorandom ciphertextsFootnote 1 and tags ciphertexts with words. Next, one builds up a “cryptographic” data structure with word-identifier pairs. Each identifier points to a ciphertext (or set thereof). Then building a searchable encryption system boils down to designing search mechanisms for the data structure. Throughout the remainder of the paper, we implement the idea and define searchable encryption with respect to searching for identifiers in a data structure.

4.1 Syntax

We follow the notation of Cash et al. A database \({\mathbf{DB }}=((id_i,\{w_j\}_{j\le n})_{i\le m})\) is represented as a list of identifier/word tuples where every (file) identifier \(id_i\in \mathcal {I}\) taken form the index set \(\mathcal {I}\) is associated with j words \(\{w_j\}_{j\le n}\) taken from a word dictionary \(\mathcal {W}.\) A search query \({\psi (\mathbf{w })}=(\psi , \mathbf{w })\) is specified by a tuple of words \(\mathbf{w }\subseteq \mathcal {W}\) and a boolean formula \(\psi \) on \(\mathbf{w }.\) We denote by \(|\psi |\) the arity of the formula. We write \({\mathbf{DB }}(w_j)\) (resp. \({\mathbf{DB }}({\psi (\mathbf{w })})\)) for the set of identifiers associated with the word \(w_j\) (resp. matching \({\psi (\mathbf{w })}\)). An update query \({\phi (\mathbf{u })}\) is parameterized with an update operation \(\mathbf{u }.\) Updates of the form \((add,w,{\mathbf{id }})\), \((del,w,{\mathbf{id }})\) add or remove identifiers \({\mathbf{id }}\) assigned with word w; update operations of the form \((add,\mathbf{w }, id)\), \((del,\mathbf{w },id)\) add or remove a list of words \(\mathbf{w }\) from identifier id. We write \({\mathbf{EDB }}({\phi (\mathbf{u })})\) for the set of identifiers satisfying the update \({\phi (\mathbf{u })}.\)

Definition 7

(Searchable Encryption). A dynamic searchable symmetric encryption scheme \({\mathsf {DSSE}}\) consists of three interactive algorithms \(({\mathsf {Setup}}, {\mathsf {Search}}, {\mathsf {Update}})\) executed between the client and the server, such that

  • \({\mathsf {Setup}}(1^{\lambda }, {\mathbf{DB }}).\) On input a security parameter \({\lambda }\) and a data base \({\mathbf{DB }}\), the protocol outputs a secret key \({{MSK}}\) and an encrypted database \({\mathbf{EDB }}.\) The client stores the secret key \({{MSK}}\), whereas the server holds the encrypted database \({\mathbf{EDB }}.\)

  • \({\mathsf {Search}}({{MSK}}, {\psi (\mathbf{w })}, {\mathbf{EDB }}).\) The protocol is between the client and server, where the client takes as input a secret key \({{MSK}}\) and a search query \({\psi (\mathbf{w })}\) on words \(\mathbf{w }\), and the server takes as input the encrypted database \({\mathbf{EDB }}.\) The server outputs a set of identifiers \(ID\subseteq \mathcal {I}\), the client has no output.

  • \({\mathsf {Update}}({{MSK}},{\phi (\mathbf{u })}, {\mathbf{EDB }}).\) The protocol runs between the client and server, where the client input is a secret key \({{MSK}}\) and an update query \({\phi (\mathbf{u })}\) on operation \(\mathbf{u }\), and the server takes as input the encrypted database \({\mathbf{EDB }}.\) At the end of the interaction, the client terminates with an updated state \({{MSK}}'\) and the server with a modified database \({\mathbf{EDB }}'\!\!{.}\)

We say a \({\mathsf {DSSE}}\) system is non-interactive if \({\mathsf {Search}}\) and \({\mathsf {Update}}\) are two-round protocols.

Definition 8

(Correctness). A dynamic symmetric searchable encryption \({\mathsf {DSSE}}\) system is correct, if for all databases \({\mathbf{DB }}\), all search queries \({\psi (\mathbf{w })}\), all update queries \({\phi (\mathbf{u })}\), and all \(({{MSK}}, {\mathbf{EDB }})\leftarrow {\mathsf {Setup}}(1^{\lambda }, {\mathbf{DB }})\), it holds

  • Search correctness: There exists a negligible function \(\varepsilon _{s}\), s.t.

    $$\begin{aligned} {\text {Pr}\big [{{\mathsf {Search}}({{MSK}}, {\psi (\mathbf{w })}, {\mathbf{EDB }})\not ={\mathbf{DB }}({\psi (\mathbf{w })})}\big ]}={\mathsf {\varepsilon }_{s}({\lambda })} \end{aligned}$$
  • Update correctness: There exists a negligible function \(\varepsilon _{u}\), s.t.

    $$\begin{aligned} {\text {Pr}\big [{{\mathsf {Update}}({{MSK}}, {\phi (\mathbf{u })}, {\mathbf{EDB }})\not ={\mathbf{EDB }}({\phi (\mathbf{u })})}\big ]}={\mathsf {\varepsilon }_{u}({\lambda })} \end{aligned}$$

4.2 Security

Our aim is to provide a strong notion of query privacy. In our model the server shall not tell apart search and update queries even if the same queries have been issued before. We allow the adversary to learn from the interaction with the system is the result of search and update queriers in terms of the associated identifiers. (Note, the server will learn the access pattern when asked to retrieve the ciphertexts as a consequence of the search and update.) To this end, we devise an experiment between the challenger and the adversary \({\mathcal {A}}.\) The adversary chooses two databases \({\mathbf{DB }}_0,{\mathbf{DB }}_1\) and sends two queries \(q_0,q_1\) (be it a search or be it an update query) to the challenger emulates the effect of the query \(q_b\) for a randomly chosen bit b on either of the two encrypted databases \({\mathbf{DB }}_b.\) The adversary \({\mathcal {A}}\) wins the experiment, if he guesses the database he interacts with. To avoid a trivial game, we must restrict the type of adversarial queries. Clearly, if the adversary defines a pair of queries which differ in their response, the adversary wins the experiment with overwhelming probability. Hence, for a meaning security notion, we require that for all search queries \({\psi (\mathbf{w })}=(\psi , \mathbf{w })\), it holds that \({\mathbf{DB }}_0({\psi (\mathbf{w })}_0)={\mathbf{DB }}_1({\psi (\mathbf{w })}_1)\); and for all update queries \({\phi (\mathbf{u })}=(\phi , {\mathbf {u}})\), it holds that \({\mathbf{EDB }}_0({\phi (\mathbf{u })}_0)={\mathbf{EDB }}_1({\phi (\mathbf{u })}_1).\) We summarize the above discussion in the following experiment:

Setup: Adversary \({\mathcal {A}}\) chooses two databases \({\mathbf{DB }}_0, {\mathbf{DB }}_1.\) The challenger flips a bit \(b\in {\{0,1\}}\) and runs \({\mathsf {Setup}}(1^{\lambda }, {\mathbf{DB }}_b).\) It keeps the master secret \({{MSK}}\) to itself and gives the encrypted database \({\mathbf{EDB }}_b\) to \({\mathcal {A}}.\)

Challenge: Adversary \({\mathcal {A}}\) may additively send queries to oracles \({\mathsf {Search}}\) and \({\mathsf {Update}}\):

  • \({\mathsf {Search}}(\cdot ,\cdot )\): This oracle implements the search protocol. It expects two equally-sized search queries \(({\psi (\mathbf{w })}_b=(\psi _b, \mathbf{w }_b)\) subject to the restriction that

    $$\begin{aligned} {\mathbf{DB }}_0({\psi (\mathbf{w })}_0)={\mathbf{DB }}_1({\psi (\mathbf{w })}_1) \end{aligned}$$

    The purpose of the oracle is to emulate a client running the \({\mathsf {Search}}\) algorithm on input \(({{MSK}}, {\psi (\mathbf{w })}_b, {\mathbf{EDB }}_b).\)

  • \({\mathsf {Update}}(\cdot ,\cdot )\): This oracle expects as input two equally-sized update queries \({\phi (\mathbf{u })}_0, {\phi (\mathbf{u })}_1\) subject to the restriction that

    $$\begin{aligned} {\mathbf{EDB }}_0({\phi (\mathbf{u })}_0)={\mathbf{EDB }}_1({\phi (\mathbf{u })}_1) \end{aligned}$$

    It emulates a client running the \({\mathsf {Update}}\) algorithm on input \(({{MSK}},{\phi (\mathbf{u })}_b, {\mathbf{EDB }}_b).\)

Guess: At some point, the adversary \({\mathcal {A}}\) outputs a guess \(b'.\)

The advantage of an adversary \({\mathcal {A}}\) in this experiment is defined as \({\text {Pr}\big [{b'=b}\big ]}-\frac{1}{2}.\)

Definition 9

(Full Security). A dynamic symmetric searchable encryption system is fully secure, if all polynomial-time adversaries \({\mathcal {A}}\) have at most a negligible advantage in the above experiment.

The above notion gives strong search query privacy guarantees in the sense that an adversary does not only learn the search words \(\mathbf{w }\), but it neither learns the formula \(\psi .\) We also consider a relaxed version, where the scheme hides search words only. The experiment is identical to the above one except that we require \((\psi , \mathbf{w }_0)=(\psi , \mathbf{w }_1)\) to hold for all adversarial search queries \({\psi (\mathbf{w })}=(\psi , \mathbf{w })\) and \((\phi , \mathbf{w }_0)=(\phi , \mathbf{w }_1)\) to hold for all adversarial update queries \({\phi (\mathbf{u })}=(\phi , \mathbf{w }).\)

Definition 10

(Weak Security). A dynamic searchable symmetric encryption scheme is weakly secure, if all polynomial-time adversaries \({\mathcal {A}}\) have at most a negligible advantage in the modified experiment subject to the restriction that for all search and update queries, it holds that \((\psi , \mathbf{w }_0)=(\psi , \mathbf{w }_1)\) and \((\phi , \mathbf{w }_0)=(\phi , \mathbf{w }_1).\)

4.3 A Note on the Blue-Print

The beginning of the section describes a blue print for searchable symmetric encryption systems. The fact that an index search scheme has indistinguishable search queriers allows us to construct a searchable symmetric encryption system against outsider attackers analyzing the frequency of (popular) search words given the client’s and server’s trace as follows: Before the server sends out the ciphertexts for the identified indexes, it re-randomises the ciphertexts.

4.4 High-Level Idea

For ease of explanation, we explain the main ideas behind searching for a single word. While a common ground of previous constructions has been a linked list data structure [24], our scheme implements a perfect binary tree data structure. The depth \(d=\log |\mathcal {W}|\) of the binary tree is logarithmic in the total number of words (and a parameter fixed in advance).

We denote the k-th node at level l as \(N_{k,l}.\) The root is \(N_{0,0}.\) Every node has two children. Each edge connecting a parent node \(N_{k,l}\) and a child \(N_{k',l+1}\) is associated with a bit \(b_l\in {\{0,1\}}.\) Every leaf is randomly associated with a bucket \(b_j\) containing all indices \(id_i\in \mathcal {ID}\) matching word \(w_j\in \mathcal {W}.\) Searching for a word \(w=(w_0, \ldots , w_{d-1})\) means to traverse a path from the root \(N_{0,0}\) to a leaf \(N_{k,d-1}\) and read all identifiers in the bucket.

It remains to show how to implement a private decision mechanism for efficiently selecting the nodes. Here is where the constrained functional encryption comes into the game (for dimension \(n=1\)). We generate a cryptographic binary tree where an encryption \(({{CT}}_{k,l},q_{k,l})\leftarrow {\mathsf {Enc}}({{MSK}}, 1)\) represents node \(N_{k,l}.\)

To search for word \(w=(w_{0},\ldots ,w_{d-1})\), we first compute the random encoding \(b_j=(b_0,\ldots , b_{d-1})\) of word \(w_j\) with a pseudorandom function and next generate decryption keys \({{SK}}_{k,l}\leftarrow {\mathsf {KeyGen}}({{MSK}}, b_l, q_{k,l})\) constrained to the set of nodes on the path from the root to the target leaf. Decrypting the node \(N_{k,l}\) gives a hint to choose the child node \(N_{k', l+1}\)

$$\begin{aligned} {\mathsf {Dec}}({{SK}}_{k,l}, CT_{k,l})= {\left\{ \begin{array}{ll} 0 &{} \text {if}\; b_l=0\\ 1 &{} \text {if}\; b_l=1 \end{array}\right. } \end{aligned}$$

Applying this technique for all sequential nodes enables us to traverse the tree efficiently in \(\mathcal {O}(\log |\mathcal {W}|).\) To search a formula over multiple words \({\psi (\mathbf{w })}\), one first searches for the buckets matching every word and then applies the formula over the indices of the buckets. Searching a compound expression of arity \(|\psi |\) takes \(\mathcal {O}(|\psi |\cdot \log |\mathcal {W}|).\) Updating the words in the encrypted database essentially requires to search for the bucket matching the word. One then adds a new index to the bucket, or deletes the bucket. The operation takes time \(\mathcal {O}(\log |\mathcal {W}|).\)

Discussions and Generalizations. The use of the functional encryption scheme \({\mathsf {CFE}}\) has several advantages. First, it randomizes every search query. Without the probabilistic scheme the searchable encryption system would satisfy a weaker privacy notion, where the (outsider) adversary recognizes search queries previously sent. The scheme also makes sure that the server must traverse the tree before identifying the bucket and thereby does not deviate from the path. An adversary applying the search tokens to nodes other than the eligible ones or combining them with previous tokens receives random decryptions. These properties are the crux why the scheme leaks no more information other than the pattern to access the buckets. Third, the constrained functional encryption scheme grows in value, when one expresses a more comprehensive traversal policy. Recall, the \({\mathsf {CFE}}\) supports decryption of functions \(f_\mathbf{x }:\mathbb {F}_p^n\rightarrow \mathbb {F}_p\) from the class of inner products \(\mathcal {F}=\{f_\mathbf{x }| \mathbf{x }\in \mathbb {F}_p^n \}\) where \(f_\mathbf{x }(\mathbf{y })=\langle \mathbf{x }, \mathbf{y }\rangle \) is from a fixed polynomial range.

One may generalize the scheme to search on a directed acyclic graph in the following way:

  • For each node \(N_{k,l}\), encrypt a vector \(\mathbf{x }_{k,l}\in \mathbb {F}_p^n.\)

  • For each edge connecting a parent node \(N_{k,l}\) with a child \(N_{k',l+1}\) assign a label \(f_{\mathbf{x }_{k,l}}(\mathbf{y }_{k',l+1}).\)

  • To traverse from node \(N_{k,l}\) to child \(N_{k',l+1}\), decrypt node \(N_{k,l}\) with a key for \(\mathbf{y }_{k',l+1}\in \mathbb {F}_p^n.\)

This way, the search conditions extend to inner product functions. These are particularly useful functions enabling the computation of conjunctions, disjunctions, CNF/DNF formulas, thresholds, Hadamard weights, and low range statistics. We leave details of the general scheme to search on directed acyclic graphs and other data structures to the full version. In the forthcoming section, we describe a searchable encryption scheme for the special case, where at node \(N_{k,l}\) the inner product function computes a validity check by xor-ing \(w_l\oplus 1.\)

4.5 Description of the Construction

Let \({\mathsf {CFE}}=({\mathsf {Setup}},{\mathsf {KeyGen}}, {\mathsf {Enc}}, {\mathsf {Dec}})\) be constrained functional encryption scheme. Wlog, suppose \(|\mathcal {W}|=2^d\) is a power of 2. Let \({\mathsf {PRF}}:\{0,1\}^{\lambda }\times \{0,1\}^d\rightarrow \{0,1\}^d\) be a pseudorandom function. Define a dynamic symmetric searchable encryption system \({\mathsf {DSSE}}=({\mathsf {Setup}}, {\mathsf {Search}}, {\mathsf {Update}})\) as follows:

  • \(Setup(1^{\lambda }, {\mathbf{DB }})\): On input a security parameter \({\lambda }\) and database \({\mathbf{DB }}=((id_i, \{w_j\}_{j\le 2^d})_{i\le m})\), build up an encrypted data structure as follows:

    1. 1.

      Sample a random seed \(s{\leftarrow _R}{\{0,1\}}^{\lambda }\) and generate a master secret key \({{msk}}\leftarrow {\mathsf {CFE}}.{\mathsf {Setup}}(1^{\lambda })\) for the constrained functional encryption scheme for dimension \(n=1.\)

    2. 2.

      For every \(w_j\in {\mathbf{DB }}\), add \({\mathbf{DB }}(w_j)\) to bucket \(b_j\leftarrow {\mathsf {PRF}}(s, w_j).\)

    3. 3.

      Create the encrypted data structure by computing a set of \(2^d-1\) ciphertexts \(CT_{k,l}, q_{k,l}\leftarrow {\mathsf {Enc}}({{msk}}, 1)\) and assign each state \(q_{k,l}\) with node \(N_{k,l}.\) Define M to be the set of all (kl) pairs identifying the ciphertexts.

    4. 4.

      Return the master secret \({{MSK}}=(s, q_{k,l})_{\forall (k,l)\in M}\) and the encrypted data structure \({\mathbf{EDB }}=( CT_{k,l})_{\forall (k,l) \in M}.\)

  • \(Search({{MSK}}, {\psi (\mathbf{w })})\): To generate a search token \({{TK}}_{{\psi (\mathbf{w })}}\) for the query \({\psi (\mathbf{w })}=(\psi , w_0, \ldots ,w_{|\psi |})\), the client generates for every word \(w_j=(w_0, \ldots , w_{d-1}\) a decryption key \({{\mathbf{SK }}}_j=({{SK}}_{j,0}, \ldots , {{SK}}_{j,d-1})\) as follows:

    1. 1.

      Recover the bucket \(b_j=(b_{j,0},\ldots , b_{j,d-1})\) for word \(w_j\) by computing \({\mathsf {PRF}}(s,w_j)\)

    2. 2.

      Compute for the \(k^{th}\) node \(N_{k,l}\) on the path to the bucket a decryption key \({{SK}}_{j,k}\leftarrow {\mathsf {KeyGen}}({{MSK}}, b_{j,l}, q_{k,l}).\)

    The client sends the search token \({{TK}}_{{\psi (\mathbf{w })}}=(\phi , {{\mathbf{SK }}}_1, \ldots {{\mathbf{SK }}}_{|\psi |})\) to the server. Upon receiving the token, the server searches for every decryption key \({{\mathbf{SK }}}_j=({{SK}}_{j,0}, \ldots , {{SK}}_{j,d-1})\) the bucket \(b_j\) as follows:

    1. 1.

      Decrypt for \(0\le l \le d-1\) the bit \(b_{j,l}\leftarrow {\mathsf {Dec}}({{SK}}_{j,l}, CT_{k,l})\)

    2. 2.

      Traverse to the node at level \(l+1\) in the tree whose edge is associated with bit \(b_{j,l}.\)

    Once the server identified all buckets \(b_j\), it applies the formula \(\phi \) to retrieve the ciphertexts matching the identifiers \({\psi (\mathbf{w })}.\)

  • \({\mathsf {Update}}({{MSK}},{\phi (\mathbf{u })}, {\mathbf{EDB }}).\) To add files to the data structure, one needs to search for the bucket matching the word and store the file index in the bucket. Deletion of files matching a word requires to delete the bucket associated with the word. Deletion of a single file requires the client to decrypt the files and ask the server to delete the index associated with the corresponding ciphertext.

A careful inspection of our data structure reveals that buckets leak the number of stored words. The server may conduct a statistical analysis based on the sizes of buckets and use the extra information to break privacy. We note that prior work is susceptible the analysis as well. To prevent the server from learning words from the number of indices stored in a bucket, one may apply standard masking techniques to bias the size. One essentially adds “dummy” identifiers to normalize the bucket sizes.

4.6 Security Analysis

We are now ready to analyze the scheme.

Theorem 2

Assume \({\mathsf {PRF}}\) is a secure pseudo-random function and \({\mathsf {CFE}}\) is a secure constrained functional encryption scheme. Then the above dynamic searchable encryption system is weakly secure.

Proof

(Sketch). The proof is trivial and therefor sketched. An adversary \({\mathcal {A}}\) breaking the security of the searchable encryption scheme can be used to construct a reduction against the pseudo-random function or constrained functional encryption scheme. To attack the pseudo-randomness, the reduction flips a bit b and simulates the searchable encryption scheme except that every invocation of \({\mathsf {PRF}}\) is forwarded to the pseudo-random oracle, implementing the \({\mathsf {PRF}}\) (\(b=0\)) or a random function with identical output range (\(b=1\)). When adversary \({\mathcal {A}}\) outputs a guess \(b'=b\), then the reduction conjectures to deal with an oracle implementing the pseudo-random function \({\mathsf {PRF}}\); otherwise, the reduction conjectures to interact with a random function. To attack the security of the constrained functional encryption scheme, the reduction emulates the generation of cryptographic binary tree by forwarding encryption requests to the challenge oracle. It simulates search and update queries by relaying the requests to its key generation oracle. When adversary \({\mathcal {A}}\) outputs a guess \(b'=0\), the reduction claims to interact with \({\mathbf{EDB }}_{0}\) searching for words \({\psi (\mathbf{w })}_{0}\) and making updates \({\phi (\mathbf{u })}_0\); otherwise, it claims to interact with \({\mathbf{EDB }}_{1}\) searching for words \({\psi (\mathbf{w })}_{1}\) and making updates \({\phi (\mathbf{u })}_1.\)