Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

It is now common for data owners to outsource their data to public servers providing storage on a pay-as-you-go basis. This can reduce the costs of data storage compared with that of running a private data center (e.g. hardware, construction, air conditioning and security costs), making this a cost effective solution. If the server is not fully trusted and the data is of a sensitive nature, the data owner may wish to encrypt it to ensure confidentiality. This, however, prevents the efficient retrieval of specific portions of the data as the server is unable to identify the relevant information.

Searchable Encryption (SE) [11, 16, 19, 21, 22, 24, 26, 32] addresses this issue by indexing the encrypted data in such a way as to allow a server to execute a search query (formed by the data owner or an authorised data user) over the encrypted data and return the identifiers of any file that satisfies the query.

To preserve confidentiality of the data, the server must not learn anything about the underlying data from the encrypted data and the data indexes; namely ciphertext indistinguishability and index indistinguishability. In the presence of a search query the only information leaked to the server is the search results. Query indistinguishability is also a desirable property although, due to the offline keyword guessing attack [12], this is not always easy to achieve in the public key setting (where indexes are generated using the data owner’s public key).

The majority of existing work on SE focusses on efficiently preserving confidentiality in the presence of an honest-but-curious server. This means that the server is trusted to follow the search protocol honestly but may try to infer information about data or search queries that it is unauthorised to know.

Verifiable Searchable Encryption (VSE) [13, 25, 30, 35, 37] assumes a stronger semi-honest-but-curious adversarial model in which the server might execute only a fraction of the search, or return a fraction of the search results in order to preserve its resources. To ensure the completeness and correctness of search results in this scenario, it is required that the server is able to prove to the querier that the search was computed honestly.

The current approaches to VSE in the literature do not support a wide range of expressive search queries. We address this issue by extending the types of queries that can be executed and verified by a VSE scheme to include more expressive search queries, as well as some computations. Most VSE schemes in the literature also require that the verification of query results be performed by the entity that issued the query whereas eVSE is publicly and blindly verifiable.

1.1 Our Contributions

We adapt and apply new techniques from the area of Publicly Verifiable Outsourced Computation to VSE in a novel way to enable a wider family of queries, and some types of computations, to be performed over outsourced encrypted data with verifiable query results. In summary, our contributions are:

  • More expressive queries: Our scheme supports queries such as boolean formulae involving conjunctions, disjunctions and negations, threshold operations, polynomials, arbitrary CNF and DNF formulae, and fuzzy searchFootnote 1.

  • Evaluation of computations: Our scheme supports the evaluation of some computations over the encrypted data, such as averaging and counting operations. As well as assigning keywords to label data, we propose to also assign keywords representing certain data values that may be computed over (either in the form of single keywords or as a string of keywords encoding binary data, see Sect. 3.3).

  • Blind public verifiability of query results: Any entity is able to verify the correctness and completeness of query results without any knowledge of either the underlying query or the results themselves.

The remainder of this paper is organised as follows. Section 2 gives some background information on SE and verifiable computation. Section 3 formally defines eVSE and its security model, Sect. 4 gives an instantiation of eVSE and Sect. 5 concludes the paper, highlighting possible avenues of future research. The Appendix provides more details on the security models and gives a security proof sketch as well as a discussion comparing our scheme with the ones in the literature. Additional details can be found in the full version [3].

2 Background

Searchable Encryption (SE) allows data to be outsourced in encrypted form and for keyword search queries to be performed remotely. Methods based on oblivious RAM [20] provide a high level of security (hiding both the access and search patterns) at the expense of slow search times and high communication costs. Song et al. [28] achieve a scheme with fewer rounds of communication, but which leaks the access pattern and requires each word of a document to be encrypted separately, so compression is not possible. Goh [19] introduced meta data (indexes) describing the content of each document, and enabled constant time searches using Bloom filters over the index only. Curtmola et al. [16] extended the system model to allow multiple users to query the data, using broadcast encryption to manage user access privileges. SE schemes that allow many users to upload data can be built using public key encryption, however the data can only be searched by the holder of the corresponding secret key (or a derivative thereof) [11]. Most SE schemes assume an honest-but-curious server model.

Verifiable Searchable Encryption (VSE) schemes assume a semi-honest-but-curious server model. The first VSE scheme was presented by Chai et al. [13], where they extend the paradigm of searchable symmetric encryption (SSE) [16] to create a verifiable SSE (VSSE) scheme that allows verification of search results from a single keyword equality query. Another approach by [25] extends a public key encryption with keyword search scheme [11] to support verification of search results from a single keyword equality query, where the indexes are created using a public key. Sun et al. [30] and Wang et al. [34] detail VSE schemes with enhanced functionality; verifiable multi-keyword ranked search and verifiable fuzzy keyword search, respectively.

Verifiable Computation (VC) allows a client with limited resources to efficiently outsource a computation to a more powerful server, and to verify the correctness of results. Gennaro et al. [18] considered the use of garbled circuits, whilst Parno et al. [27] introduced publicly verifiable computation (PVC) built from key policy attribute based encryption (KP-ABE), where a single client computes an evaluation key for the server and publishes information enabling other clients to outsource computation to the server. Any client may verify the correctness of a result. Alderman et al. [2] considered an alternative system model that used ciphertext policy attribute based encryption (CP-ABE) to allow clients to query computations on data held by the server (or initially outsourced by a client) called Verifiable Delegable Computation (VDC). This can naturally be applied to problems like querying on remote data, as well as MapReduce. Data remains statically stored on the server and may be embedded in a server’s secret key, whilst the computation of many different functions can be requested by creating ciphertexts using only public information. Other notable approaches in the realm of querying remote data can be found in [46, 810, 15].

3 Extended Verifiable Searchable Encryption

3.1 System Model

We consider a system comprising a data owner, a remote storage server, and a set of authorised data users. The data owner sets up the system to generate a master secret and holds a set of data \(D\) (e.g. a database) that they wish to encrypt and outsource to the remote server. The data owner controls which additional users are able to query their encrypted data. Queries may be formulated over these keywords (e.g. to identify records associated with a given set of keywords) as usual in SE, but we also allow computational queries of functions in the class \(NC^1\), which consists of Boolean functions computable by circuits of depth \(\mathcal O(\log n)\) where each gate has a fan-in of two, over encoded data values.

For example consider workgroups within an organisation. The manager or system administrator acts as the data owner for the organisation and outsources a shared database to a remote server. Authorisation is granted by issuing a secret key to each user, which is required when creating a query token \(QT_{Q}\) for a particular query Q. The token is sent to the server who performs the query on the encoded index to generate a result \(R\). We allow any entity to verify the correctness and completeness of the resultFootnote 2, but we restrict the ability to read the value of the result to only authorised data users (holding a retrieval key).

Throughout this work, we assume a strict separation between queriers (the data owner and users) and the remote server – the server may not issue queries itself, else it will trivially be able to learn the encoding of the index and queries (legitimate queriers must know this encoding to gain meaningful results).

3.2 Formal Definition

We now formally define a scheme for eVSE. We use the following notation. Data to be outsourced is denoted D and is considered to be a collection of n documents. Prior to outsourcing, the data owner specifies a pre-index for D, denoted \(\delta (D)\), which assigns a set of descriptive labels to each document e.g. keywords contained in the document or specific data values that may be computed upon. The encoded form of the data, including the descriptive labels, is referred to as the index of D, denoted \(\mathcal {I}_D\), and is stored by the server. Queries for functions in the class \(NC^1\) are denoted by Q and to make such a query, a data user creates a query token \(QT_Q\) for Q, a verification key \(VK_Q\) which allows any entity to blindly verify the result, R, of the query, and a retrieval key \(RK_Q\) which is issued to authorised data users to enable the query result to be learnt.

Definition 1

An Extended Verifiable Searchable Encryption (eVSE) scheme comprises the following algorithms:

  • : Run by the data owner and takes as input the security parameter and a universe of attributes (keywords and data values). It outputs the data owner’s master secret key \(\mathrm {MK}{}\) that is used for further administrative tasks and public parameters \(\mathrm {PP}\), both of which are provided to the remaining algorithms where required.

  • : Run by the data owner and takes as input the pre-index of the data \(\delta (D)\) and the set G of authorised users, and outputs a searchable index \(\mathcal {I}_{D}\) for the data D, as well as a server and data owner state.

  • : Run by the data owner to authorise a user \(\mathrm {ID}\) to perform queries by issuing them a secret key \(SK_\mathrm{{ID}}\) and outputs an updated server state.

  • : Run by a data user using its secret key and both states to generate a query token \(QT_{Q}\) for a query Q, a verification key \(VK_{Q}\) and an output retrieval key \(RK_{Q}\).

  • : Run by the server to execute a query given in the query token \(QT_{Q}\) on the index \(\mathcal {I}_{D}\). It generates a result \(R\) which can be returned to the querying user or published.

  • \(r\leftarrow \mathsf {Verify}(R, VK_{Q}, RT_{Q}, RK_{Q}, \mathrm {PP})\): Verification consists of two steps:

    1. 1.

      \(RT_{Q} \leftarrow \mathsf {BVerif}(R, VK_{Q}, \mathrm {PP})\): Run by any party to verify the correctness and completeness of the result \(R\). It takes the verification key \(VK_{Q}\) and, if the result is accepted, it outputs a retrieval token \(RT_{Q}\) which can be used to learn the result. Otherwise a distinguished failure symbol \(RT_{Q} = \perp \) is returned.

    2. 2.

      \(r\leftarrow \mathsf {Retrieve}(VK_{Q}, RT_{Q}, RK_{Q}, \mathrm {PP})\): Run by a data user to read the value of the result. It takes as input the retrieval token \(RT_{Q}\), the retrieval key \(RK_{Q}\) and the user’s secret key. If the user holds a valid retrieval key for Q and the computation was performed correctly, then it returns the actual result \(r= Q(\mathcal {I}_{D})\), otherwise it returns \(r=\ \perp \).

  • : Run by the data owner using its master secret key to revoke a user’s authorisation to make queries and read results. It does so by updating the server and data owner state.

An eVSE is correct if there is a negligible probability that verification does not suceed when all algorithms are run honestly. A formal definition can be found in [3].

3.3 Types of Query

We consider a broader range of verifiable queries than many prior schemes. In particular, we consider two main types:

  • Keyword matching queries: Queries of this type have formed the basis of most prior work in SE. Suppose there exists a universe (dictionary) of keywords. Each encrypted data item is associated with an index of one or more keywords to describe the contents. Queries are formed over the same universe of keywords. In this work, we permit Boolean formulae over sets of keywords (e.g. \(((\mathtt a \wedge b) \vee c)\) where \(\mathtt{a,b,c}\) are keywords). We return an identifier for each file whose associated keywords in the index satisfy this formula. Thus we can perform very expressive search queries over keywords.

  • Computational queries: Queries of this type are similar to the operations commonly discussed in the context of outsourced computation. We allow statistical queries over keywords (e.g. counting the number of data items that satisfy a keyword matching query), as well as operations over selected data values that have been encoded using additional portions of the keyword universe. It is possible to encode the entire database in such a way as to enable computations over all data fields, but it would usually be more efficient to select a (small) subset of fields that are most useful or most frequently queried. Clearly, keyword matching queries can be seen as a special case of computational queries where the function operator is equality testing.

  • Mixed queries: Queries of this type combine both the functionalities of the aforementioned query types (e.g. finding the average of data values contained in all documents associated with a particular keyword).

All types of query are performed in a verifiable manner to ensure that results are correct and complete.

3.4 Security Model

We now formalise several notions of security as a series of cryptographic games. The adversary against each notion is modelled as a probabilistic polynomial time (PPT) algorithm \(\mathcal {A}\) run by a challenger, with input parameters chosen to represent the knowledge of a real attacker as well as the security parameter \(\kappa \). The adversary algorithm may maintain state and be multi-stage; we refer to each stage as \(\mathcal {A}\) for ease of notation. The notation \(\mathcal {A}^\mathcal {O}\) denotes the adversary being provided with oracle access to the following algorithms: \(\mathsf {BuildIndex}(\cdot , \cdot , \mathrm {MK}{}, \mathrm {PP})\), \(\mathsf {AddUser}(\cdot , \cdot , \mathrm {MK}{}, \mathrm {PP})\), \(\mathsf {Query}(\cdot , \cdot , \cdot , \cdot , \mathrm {PP})\) and \(\mathsf {Search}(\cdot , \cdot , \cdot , \cdot , \mathrm {PP})\). We assume that oracle queries are performed in a logical order such that all required information is generated from previous queries. For each game, we define the advantage and security of \(\mathcal {A}\) as:

Definition 2

The advantage of a PPT adversary \(\mathcal {A}\) is defined as follows, where \(\mathbf{X} \in \{{ PubVerif}, { IndPriv}, { QueryPriv}\}\):

$$\begin{aligned} Adv_{\mathcal {A}}^\mathbf{X}(e \mathcal {VSE}, 1^\kappa ) = Pr [\mathbf Exp _{\mathcal {A}}^\mathbf{X}[e \mathcal {VSE}, 1^\kappa ] =1]. \end{aligned}$$

An eVSE scheme is secure against Game X if for all PPT adversaries \(\mathcal {A}\), Adv\(_{\mathcal {A}}^\mathbf{X}\)(\(e \mathcal {VSE}\), \(1^\kappa \)) \(\le \) negl\((\kappa )\) where \(negl \) is a negligible function.

figure a

Public Verifiability. In Game 1, we capture the notion of public verifiability such that a server may not cheat by returning an incorrect result without being detected. This is a selective notion of security where, at the beginning of the game, the adversary chooses the challenge query and pre-index. The challenger then initialises the system, runs \(\mathsf {AddUser}\) for a randomly chosen \(\mathrm {ID}\) from the userspace, runs \(\mathsf {BuildIndex}\) for the challenge pre-index to create the index, and finally runs \(\mathsf {Query}\). The adversary is given the resulting parameters, as well as access to the above specified oracle queries, and outputs \(R^{\star }\), which it believes to be an incorrect result that will, nevertheless, be accepted by the verifier. The challenger runs the verification steps on this output. The adversary wins if verification succeeds, yet the result is not \(Q(\mathcal {I}_{D^\star })\).

Index Privacy and Query Privacy. In Appendix A, we provide notions of index indistinguishability against a selective chosen keyword attack and query privacy, which ensure that no information regarding the keywords is leaked from the index or query tokens respectively.

4 Construction

4.1 Overview

We base our instantiation on a CP-ABE scheme. As shown by Alderman et al. [2], CP-ABE can be used to verifiably request computations to be performed on data held by a server, referred to as VDC. In VDC, a trusted Key Distribution Center (KDC) initialises the system and issues a CP-ABE decryption key to the server pertaining to the data it holds. We use a similar technique, but have the data owner act as the KDC (so the data need not be revealed to an external KDC, as in VDC). The index for a set of data is a CP-ABE decryption key for a set of attributes encoding the pre-index, and is sent to the server. The method of encoding is described in Sect. 4.2.

We consider the family \(\mathcal {B}\) of Boolean functions closed under complement – that is, if \(F \in \mathcal {B}\) then \(\overline{F}\), where \( \overline{F}(x)= F(x) \oplus 1\), is also in \(\mathcal {B}\). A function \(F : \{0,1\}^n \rightarrow \{0,1\}\) is monotonic if \(x \leqslant y\) implies \(F(x) \leqslant F(y)\), where \(x = (x_1,\dots ,x_n)\le y = (y_1,\dots ,y_n)\) if and only if \(x_i \leqslant y_i\) for all i. For a monotonic function F, the set \(\mathbb {A}_F = \{x: F(x) = 1\}\) defines a monotonic access structure.

A query Q is represented as a Boolean function of keywords and computational data points. If a monotonic CP-ABE scheme is used then queries can be comprised of AND and OR gates (and negation can inefficiently be handled by including both a positively and negatively labelled attribute in the universe and requiring the presence of exactly one of them). A non-monotonic CP-ABE scheme enables queries formed from AND, OR and NOT gates, which is a universal set of gates, and fuzzy CP-ABE enables fuzzy keyword search. We can achieve all functions in the class \(NC^1\), which includes common arithmetic and comparison operators useful in queries. An n-bit result can be formed by performing n Boolean queries, each of which returns the \(i^{th}\) bit of the output.

The query token for a Boolean function \(Q \in \mathcal {B}\) comprises two CP-ABE ciphertexts for access structures representing Q and \(\overline{Q}\in \mathcal {B}\) respectively. To perform the search, the server attempts to decrypt each ciphertext under the secret key (associated with the pre-index) and outputs the result. Each decryption succeeds if and only if the query evaluates to True on the index. Any entity may perform the blind verification operation using the verification key to learn only whether the operation was performed correctly or not. Only entities holding the retrieval token can read the value of the result.

4.2 Data Encoding

Defining the Index. Suppose the data D to be outsourced comprises n documents. We now discuss how to form a pre-index \(\delta (D)\), which represents the keywords and data fields that may be queried over.

Let \(\mathcal {D}\) be a dictionary of keywords that describe the documents. \(\mathcal {D}\) alone suffices for keyword matching queries but for computational queries, we also need to be able to encode data values such that they can be input to queries represented as access structures encoding Boolean functions.

For each data field x that may be input to a computational query, let the maximum size of the data value be \(m_x\) bits. We define \(m_x\) additional attributes \(A_{x,1}, A_{x,2}, \dots , A_{x,{m_x}}\), and define the universe \(\mathcal {C} = \bigcup _{x \in D} \cup _{i=1}^{m_x} A_{x, i}\) to be the union of these attributes over all data fields. Let y be a value stored in the data field x and let the binary representation of y be \(y_1, \dots , y_{m_x}\). We view y as a characteristic tuple of an attribute set \(A_y \subseteq \mathcal {C}\), where \(A_y = \{A_{x,i} : y_i =1\}\) – we include an attribute for position i in the set if and only if the \(i^{th}\) bit of y is 1.

Finally, to enable the index for all n documents to be encoded within a single CP-ABE key (and hence for computations to be performed simultaneously on all documents), and to ensure that the correct index data is used for each query, we must encode a labelling of the document that each attribute pertains to. We define our attribute universe \(\mathcal {U}\) for the CP-ABE scheme to be \(\mathcal {U} = \{\mathcal {D} \cup \mathcal {C}\} \times [n]\). Thats is, we take n copies of \(\mathcal {D}\) and \(\mathcal {C}\). Each element of \(\{\mathcal {D} \cup \mathcal {C}\}\) describes a particular keyword or data value, and each copy relates to a different document in D - if we index each copy of an attribute \(w \in \{\mathcal {D} \cup \mathcal {C}\}\) as \(\{w_i\}_{i=1}^n\), then \(w_i\) denotes the presence of w in document i. In practice, it may be desirable to use a ‘large universe’ CP-ABE scheme, wherein arbitrary textual strings are mapped to attributes (group elements), e.g. using a hash function \(\mathsf {H}\). Thus, for a keyword or data value w in document i, the attribute could be defined as \(\mathsf {H}(w||i)\).Footnote 3

The pre-index of the data D is a set of attributes \(\delta (D)\subseteq \mathcal {U}\). The index that is outsourced will be a CP-ABE key generated over this attribute set.

Hiding the Index. In general, CP-ABE schemes do not hide the attributes within the decryption key. This is usually expected behaviour since CP-ABE is often used to cryptographically enforce access control policies and it is natural to assume that an entity is aware of their access rights.

However, in this setting we are using CP-ABE not to protect objects from unauthorised access, but instead to prove the outcome of a function evaluation. The keys in our setting are formed over attributes encoding the index of outsourced data, as opposed to encoding access rights. Since the server should not learn any information about the data, including the index, we must implement a mechanism by which the decryption key hides the associated attributes.

In many CP-ABE schemes, the public parameters comprise an ordered set of group elements [36], each associated with an attribute from the universe; that is, \(\forall i \in \mathcal {U}\), choose , then form the encoded attribute set \(\{g^{t_i}\}_{i \in \mathcal {U}}\). Thus, given a key (or ciphertext) that comprises \(g^{t_i}\), it is possible, based on the ordering of this set, to determine the attribute \(i\in \mathcal {U}\) it relates to. In addition, the attributes may be listed in the clear, and attached to keys and ciphertexts to indicate which group elements should be applied at each point. Clearly, this is unsuitable for our requirement for a hidden index.

To this end, we first apply a random permutation to \(\mathcal {U}\) such that the position of the group elements within the ordered set does not reveal the attribute string (unless the permutation is known). We then use a symmetric encryption scheme to encrypt each attribute \(x \in \mathcal {U}\) under a key k, and then instantiate the CP-ABE scheme on this universe of encrypted attributes. Thus, without knowledge of the key k, the server should be unable to determine the attribute string x. We assume that only the keywords or data items being computed over are considered sensitive, and not the logical makeup of the Boolean function (in terms of gates).

4.3 Formal Details

The data owner initialises the system and encodes the data as an index which is pushed to the server. Each (authorised) user will be issued with a personalised secret key enabling them to form queries. To make a query Q, a user chooses a random message from the message space \(\mathcal {M}\) to act as a verification token, and encrypt this using the CP-ABE scheme under the access structure encoding Q. The server attempts to decrypt the ciphertext and recovers the chosen message if and only if \(Q(\mathcal {I}_{D}) = 1\). By the indistinguishability security of the CP-ABE scheme, the server learns nothing about the message if \(Q(\mathcal {I}_{D})=0\) since this corresponds to an access structure not being satisfied. Thus, if a server returns the correct message, the user is assured that the query evaluated to 1 on the data. If, however, \(Q(\mathcal {I}_{D}) = 0\), then decryption will return \(\perp \). This is insufficient for verification purposes since the server can return \(\perp \) to convince a user of a false negative search result. Thus, the user must, in fact, produce two CP-ABE ciphertexts. As above, one corresponds to the function Q, whilst the other corresponds to \(\overline{Q}\), the complement query of Q. Hence, the server’s key will decrypt exactly one ciphertext and the returned message will distinguish whether Q or \(\overline{Q}\) was satisfied, and therefore the value of \(Q(\mathcal {I}_{D})\). A well-formed response \((d_0, d_1)\) from a server, therefore, satisfies the following:

$$\begin{aligned} (d_0, d_{1}) ={\left\{ \begin{array}{ll} (m_0, \perp ), &{} \text {if}\ Q(\mathcal {I}_{D}) = 1 \\ (\perp , m_{1}), &{} \text {if}\ Q(\mathcal {I}_{D}) = 0. \end{array}\right. } \end{aligned}$$
(1)

Public Verifiability is achieved by publishing a token comprising a one-way function g applied to both plaintexts. Any entity can apply g to the server’s response and compare with this token to check correctness. To achieve blind verification, a random bit b permutes the order of the ciphertexts. Thus, verifiers that do not know b cannot determine whether a plaintext is associated with Q or \(\overline{Q}\).

Our adversarial model allows the adversary (and hence servers in our system) to hold more than one key (for multiple datasets); we must ensure that a key cannot produce a valid looking response to a query on a different index. We achieve this by labelling each pre-index with a label \(l({\delta (D)})\) and define an attribute for each label. Then, for a pre-index \(\delta (D)\), the decryption key is formed over the attribute set \((\delta (D)\cup l({\delta (D)}))\). Recall that encoded data stored on the server’s side is a collection of n documents, which we label \(D_1, \dots , D_n\). When making a query \(Q(\mathcal {I}_{D})\), a sub-query \(Q_i\) may be formed for each document (e.g. to check if a given keyword is contained in each document). In this case, the encryption algorithm takes the access structure encoding of the conjunction \((D_i \wedge l({\delta (D)}))\) for \(i \in [n]\). A valid result can only be formed by applying the sub-query to the specified document, which is also labelled by \(D_i \in \mathcal {D}\) – decryption succeeds if and only if the function is satisfied and the label \(l({\delta (D)})\) is matched in the key and ciphertext. Note that a key for a different pre-index will not include the correct label. Inputs to the \(\mathsf {Query}\) algorithm are assumed to be in this form.

Let \(\mathcal {CPABE} =\) \((\mathsf {ABE.Setup}\), \(\mathsf {ABE.KeyGen}\), \(\mathsf {ABE.Encrypt}\), \(\mathsf {ABE.Decrypt})\) define a CP-ABE encryption scheme over the universe \(\mathcal {U}\). Let \(\mathcal {SE}\) \(= (\) \(\mathsf {SE}.\mathsf {KeyGen}\), \(\mathsf {SE}.\mathsf {Encrypt}\), \(\mathsf {SE}.\mathsf {Decrypt}\)) be an authenticated symmetric encryption scheme secure [7] in the sense of \(\textsc {IND-CPA}\). Let \(\mathcal {BE}= (\) \(\mathsf {BE}\).\(\mathsf {KeyGen}\), \(\mathsf {BE}\).\(\mathsf {Encrypt}\), \(\mathsf {BE}\).\(\mathsf {Add}\), \(\mathsf {BE}\).\(\mathsf {Decrypt}\)) be a broadcast encryption scheme that retains \(\textsc {IND-CPA}\) security against a coalition of revoked users. Finally, let g be a one-way function and let \(\varPi \) and \(\phi \) be pseudo-random permutations (PRPs) (which pad their inputs if required). Then Algorithms 1–8 define an eVSE scheme for a class of queries \(\mathcal {Q}\).

figure b
figure c
figure d
figure e
figure f
figure g
figure h
figure i

Theorem 1

Given a selective IND-CPA secure CP-ABE scheme, an authenticated symmetric encryption scheme and a broadcast encryption scheme, both secure in the sense of \(\textsc {IND-CPA}\), pseudo-random permutations \(\varPi \) and \(\phi \), and a one-way function g. Let \(e \mathcal {VSE}\) be the extended verifiable searchable encryption scheme defined in Algorithms 1–8. Then \(e \mathcal {VSE}\) is secure in the sense of Public Verifiability, Index Privacy and Query Privacy.

In Appendix A.1 we provide a proof sketch. Full proofs can be found in [3]. In Appendix B we discuss the trade-off between efficiency and functionality of our scheme. Note that we can add additional contextual access control following Alderman et al. [1] by replacing \(\phi \) with a key assignment scheme.

5 Conclusion

With this work we have begun to consider the application of VC techniques in the setting of searchable encryption. On the searchable encryption side, this enables additional functionality in the form of computational queries (e.g. computing the average of outsourced data fields that are linked to a specific set of keywords), whilst on the VC side, this introduces additional privacy concerns regarding the outsourced data and computations. The choice of using VC techniques based on ABE stems from the natural correspondence between attributes and keywords in an index. However, future work should investigate other forms of VC to achieve different classes of functionality and (especially) improve efficiency.

In future work, we would like to consider a model whereby multiple data owners can store data on a server without each having to initialise their own scheme. In practice, this could result in the Key Distribution Center from VDC [2] setting up the system and publishing public parameters that any data owner can use, but enabling each data owner to generate their own CP-ABE decryption keys for the data they hold.