1 Introduction

External storage servers allow users to retrieve outsourced data selectively. For instance, a database located in a cloud storage server can be queried for the records satisfying a certain condition. However, since such servers are managed by untrusted third parties, users are usually reluctant to outsource their sensitive data in the clear.

Encrypting the data to be outsourced is a good approach to overcome this security concern. Nevertheless, traditional encryption schemes fail to provide selective retrieval in an efficient and secure way. Searchable encryption deals with this problem by allowing data owners to issue queries for encrypted outsourced data. Much like traditional encryption schemes, searchable encryption schemes generally come in two distinct types, serving different purposes: symmetric-key searchable encryption (SSE) [15] and public-key searchable encryption.

Public-key searchable encryption (also named public-key encryption with keyword search), or PEKS, was first proposed by Boneh et al. [9]. Since their pioneering work, several PEKS schemes have appeared in the literature [3, 4, 11, 14, 21, 23, 31, 32, 33, 40, 42], improving on the scheme in [9] in terms of efficiency, security or functionality.

The keyword search protocol considered in public-key searchable encryption involves the following entities:

  • a set of data suppliers, who provide and encrypt the data to be outsourced,

  • a storage server (e.g., an e-mail gateway or a database), which stores the outsourced data, and

  • a client of the storage server, who retains the ability to generate queries for the encrypted data.

In the protocol, the client first sets up the scheme by generating some public parameters, which it shares with the server and the data suppliers. Then, it generates a public/private key pair and shares the public key with the data suppliers. Afterwards, the data suppliers may wish to share a collection of documents with the client, where each document is indexed by a set of keywords. To do so, they encrypt this collection and upload the resulting ciphertexts to the storage server. In turn, the client may wish to receive from the storage server the stored documents indexed by all keywords in a chosen list. To let the storage server know which encrypted documents it should forward, the client generates a query and sends it to the storage server. The storage server then uses the received query to select the encrypted stored documents satisfying the conditions in the query and returns those documents to the client. Note that this is done without direct interaction between the client and the data suppliers.

As for query expressiveness, the scheme in [9] supports single-keyword queries, i.e., queries matched by documents indexed by a single chosen keyword. Single-keyword PEKS schemes enable data providers to generate an encryption of a keyword w using the public key of the client and upload it to the storage server. We call this an encrypted index, and we denote it by \(\mathbf {I}(w)\). The client, holding the secret key, can build a trapdoor \(\mathbf {T}(w')\) corresponding to some keyword \(w'\). By sending \(\mathbf {T}(w')\) to the storage server, the client enables the storage server to learn, for any encrypted index \(\mathbf {I}(w)\), whether \(w=w'\), while no other information about \(\mathbf {I}(w)\) is revealed in this process.

One of the most common enhancements of PEKS is conjunctive PEKS [11, 21, 30, 31], which enables conjunctive field keyword queries. Typically, in conjunctive PEKS, data providers encrypt a tuple (that is, an ordered set) of keywords \((w_{1},\ldots ,w_{m})\) by using the public key of the client, generating an encrypted index \(\mathbf {I}(w_{1},\ldots ,w_{m})\). The client can produce a trapdoor associated with a tuple of keywords \((w'_{1},\ldots ,w'_{l})\), along with a set of keyword fields (or positions) \(\{j_{1},\ldots ,j_{l}\}\subseteq \{1,\ldots ,m\}\). On receiving this trapdoor, the storage server can check whether the predicate \((w_{j_{1}}=w'_{1})\wedge \cdots \wedge (w_{j_{l}}=w'_{l})\) holds by using the index \(\mathbf {I}(w_{1},\ldots ,w_{m})\) and the trapdoor. This usage of keyword fields is standard in the PEKS literature and in many tools and applications related to encrypted search, such as relational DBMSs (in the form of table fields), the metadata of network packets and e-mails, and some of the proposed applications of PEKS [9, 13, 30, 36, 41].

Another enhancement of PEKS is subset PEKS, first defined in [11], which enables subset queries. In subset PEKS, data providers encrypt a tuple of keywords \((w_{1},\ldots ,w_{m})\) by using the public key of the client, generating an encrypted index \(\mathbf {I}(w_{1},\ldots ,w_{m})\). The client can produce a trapdoor associated with m arbitrary sets of keywords \((A_{1},\ldots ,A_{m})\). On receiving this trapdoor, the storage server can check whether the conjunctive subset query predicate \((w_{1}\in A_{1})\wedge \cdots \wedge (w_{m}\in A_{m})\) holds by using the index \(\mathbf {I}(w_{1},\ldots ,w_{m})\) and the trapdoor.
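The two predicates above can be stated over plaintext documents as follows. This sketch only captures the predicate that the server is meant to evaluate; no encryption is involved. Positions are 1-based to match the text, and the function names are our own.

```python
from typing import Sequence, Set


def conjunctive_match(doc: Sequence[str], query: Sequence[str],
                      fields: Sequence[int]) -> bool:
    """Conjunctive field predicate: w_{j_i} == w'_i for every i.

    `fields` lists the 1-based positions j_1, ..., j_l in the same order
    as the query keywords w'_1, ..., w'_l.
    """
    if len(query) != len(fields):
        raise ValueError("query and fields must have the same length")
    return all(doc[j - 1] == w for w, j in zip(query, fields))


def subset_match(doc: Sequence[str], sets: Sequence[Set[str]]) -> bool:
    """Conjunctive subset predicate: w_i in A_i for every i in [m]."""
    if len(doc) != len(sets):
        raise ValueError("one keyword set per field is required")
    return all(w in a for w, a in zip(doc, sets))


doc = ("sender:Bob", "tag:urgent")
print(conjunctive_match(doc, ["tag:urgent", "sender:Bob"], [2, 1]))  # True
print(subset_match(doc, [{"sender:Bob", "sender:Carol"}, {"tag:urgent"}]))  # True
```

Note that a conjunctive field query is the special case of a subset query in which every queried field gets a singleton set and every unqueried field gets the whole keyword space.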

In most of the proposed applications for single-keyword PEKS, the exchanged ciphertexts take the form

$$\begin{aligned} \mathrm {Enc}_{\mathrm {pk}}(D)\Vert \mathbf {I}(w_{1})\Vert \cdots \Vert \mathbf {I}(w_{m}), \end{aligned}$$

where \(\mathrm {pk}\) is the public key of the client, \(\mathrm {Enc}\) is some public-key encryption scheme, document D is indexed by keywords \(w_{1},\ldots ,w_{m}\), and \(\mathbf {I}(w_{1}),\ldots , \mathbf {I}(w_{m})\) are the corresponding encrypted indexes. Such ciphertexts are uploaded to the server by the data suppliers. The client can recover the documents that are indexed by the keyword w in position \(i\in \{1,\ldots ,m\}\) by sending the trapdoor \(\mathbf {T}(w)\) and the position i to the server. The basic security property of PEKS schemes is that the server learns no information about the encrypted indexes unless it knows a matching trapdoor.

Note that one could achieve conjunctive queries with single-keyword PEKS schemes simply by querying for each keyword separately as described above and computing the intersection of the results, either locally or at the storage server. When doing so, the server learns which documents are indexed by each of the keywords. The benefits of conjunctive PEKS over single-keyword PEKS used in this way lie mainly in efficiency and security, since trapdoors are usually much shorter, the intersection predicate is embedded in the trapdoor and the intersection is computed at the storage server.
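The leakage of this workaround can be seen in a plaintext simulation: the server ends up with one match set per keyword rather than only their intersection. The sketch below is illustrative only; the document identifiers, keywords and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def single_keyword_matches(docs: Dict[str, Sequence[str]], query: Sequence[str],
                           fields: Sequence[int]) -> List[Set[str]]:
    """One result set per (keyword, field) pair, as a single-keyword server sees it."""
    return [{doc_id for doc_id, doc in docs.items() if doc[j - 1] == w}
            for w, j in zip(query, fields)]


def conjunctive_by_intersection(docs: Dict[str, Sequence[str]], query: Sequence[str],
                                fields: Sequence[int]) -> Tuple[Set[str], List[Set[str]]]:
    """Emulate a conjunctive query; also return the per-keyword leakage."""
    per_keyword = single_keyword_matches(docs, query, fields)
    result = set.intersection(*per_keyword) if per_keyword else set(docs)
    return result, per_keyword


docs = {
    "d1": ("sender:Bob", "tag:urgent"),
    "d2": ("sender:Bob", "tag:none"),
    "d3": ("sender:Eve", "tag:urgent"),
}
result, leaked = conjunctive_by_intersection(docs, ["sender:Bob", "tag:urgent"], [1, 2])
print(sorted(result))                # ['d1']
print([sorted(s) for s in leaked])   # [['d1', 'd2'], ['d1', 'd3']]
```

Only d1 is returned, yet the server also learns that d2 was sent by Bob and that d3 is tagged urgent, which is exactly the information a conjunctive trapdoor avoids exposing per keyword.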

The earliest application scenario for PEKS, as suggested by Boneh et al. [9], is e-mail gateways. In this scenario, a user Alice wishes to read her e-mails, which are stored in encrypted form in an untrusted e-mail gateway. When retrieving her e-mails, she may want the e-mail gateway to forward her only the e-mails satisfying certain conditions, e.g., those containing the keyword “urgent” or having a particular sender “Bob”. To enable the e-mail gateway to do so, she sends the trapdoors corresponding to these keywords to the gateway, e.g., \(\mathbf {T}\)(“tag:urgent”) and \(\mathbf {T}\)(“sender:Bob”).

Now, suppose user Bob wishes to send Alice an e-mail D. He may encrypt D using Alice’s public key \(\mathrm {pk}\) and attach the sender information and the “urgent” tag as the encrypted indexes \(\mathbf {I}\)(“sender:Bob”) and \(\mathbf {I}\)(“tag:urgent”). Thus, Alice’s gateway would receive the message

$$\begin{aligned} \mathrm {Enc}_{\mathrm {pk}}(D)\Vert \mathbf {I}(\text {``sender:Bob''}) \Vert \mathbf {I}(\text {``tag:urgent''}). \end{aligned}$$

By checking which stored trapdoors match which attached encrypted indexes, the e-mail gateway knows which e-mails should be forwarded to Alice. However, it learns nothing else about the e-mails in the process. This example illustrates an application covered by PEKS that seems, a priori, hard to cover by using exclusively symmetric-key mechanisms such as SSE.

Another natural application of PEKS schemes, devised by Waters et al. [41], is related to secure audit logs. Audit logs are stored in an untrusted storage server, encrypted using an Identity-Based Encryption (IBE) scheme (e.g., [8, 10, 24, 25, 39]), with PEKS encrypted indexes attached. Attributes for IBE and keywords for PEKS are related to the audit record at hand, e.g., date and time. An investigator Bob may wish to be granted access to audit logs recorded, for instance, at a particular date and time. To do so, Bob asks the key escrow agent, say Alice, for the trapdoors and decryption keys corresponding to this date and time. If Alice authorizes Bob to issue this particular search, she gives Bob the requested IBE decryption keys and PEKS trapdoors, and Bob is then able to retrieve the audit logs of interest from the untrusted storage server.

Other applications include secure cloud storage [13], decryption key delegation systems [30] and context-based forwarding [36]. Although SSE represents a much more efficient solution than PEKS for cloud storage in the symmetric setting, PEKS schemes can be useful in applications involving asymmetric architectures, such as when sending or sharing outsourced data.

The first symmetric-key searchable encryption scheme was proposed in 2000 by Song et al. [37]. In 2004, Boneh et al. presented the earliest PEKS scheme in [9]. This was immediately followed by the first conjunctive searchable encryption scheme in the symmetric setting by Golle et al. [20] and by an extension of PEKS to conjunctive PEKS by Park et al. [31]. Many authors then presented alternative PEKS schemes that allow decryption of indexes [11, 30, 32], multi-dimensional range queries [35], reduction of communication and storage costs [14, 42], extension to multi-user systems [21] or improvements of security in various ways [3, 13, 17, 28, 33, 40]. Among them, one of the most relevant is the work by Boneh and Waters [11], in which they define the general notion of a Hidden-Vector Encryption (HVE) scheme, which enhances query expressiveness, allowing for conjunctive keyword search as well as decryption of indexes. We do not review existing work on symmetric-key searchable encryption, since it lies outside the scope of this article.

Moreover, the relationship between PEKS and Anonymous Identity-Based Encryption (AIBE) was first established in [9]. Most AIBE schemes (e.g., see [8, 10, 24, 25, 39]) can be easily translated into PEKS schemes and vice versa via a generic black-box transformation [1, 9].

We propose two PEKS schemes. The first one achieves conjunctive field keyword search. Under the proposed security definition, it does not provide any security enhancement over using a single-keyword PEKS scheme to issue conjunctive field keyword queries. However, as we show in Sect. 7, it improves on all previous related schemes in terms of efficiency in the most critical operations. Moreover, the trapdoors generated by this first scheme consist of just one group element, and its index size improves on that of all previous PEKS schemes.

The second proposed scheme enables a class of generalized subset queries, which includes subset queries as defined in [11]. The proposed security definition guarantees that nothing is leaked from encrypted indexes apart from the output of the search process. To the best of our knowledge, apart from the scheme in [11], no other subset PEKS schemes have been proposed in the literature. The proposed scheme improves on [11] in terms of efficiency and expressiveness, and it does not assume that keywords are taken from a finite keyword space.

The security of the two proposed schemes relies on the intractability of the asymmetric DBDH problem and of the p-BDHI problem, respectively.

The remainder of this paper is structured as follows. In Sect. 2, we outline the preliminaries needed in this work. Our constructions for conjunctive and subset PEKS are described in Sects. 3 and 5, respectively. Sections 4 and 6 feature the consistency and security proofs for our conjunctive and subset PEKS schemes, respectively. In Sect. 7, we analyze the efficiency of our schemes. We conclude the article in Sect. 8 with some final remarks and future work directions.

2 Preliminaries

We start this section by giving some general notation and by stating the general model for the proposed PEKS schemes. We then give the consistency and security definitions, providing the hardness assumptions on which we base the security of our schemes. We finally give some implementation remarks.

2.1 Notation

We start by giving some standard notation and definitions used in searchable encryption. In this work, a keyword denotes a binary string \(w\in \{0,1\}^{*}\). We define a document as a tuple of keywords \(\mathbf {D}=(w_{1},\ldots ,w_{m})\), and we say that keyword \(w_{i}\) is in keyword field (or position) i of \(\mathbf {D}\). Note that, in this definition of document, we drop the data items (e.g., files, e-mails...) and consider only the indexing keywords. We make this choice since PEKS works exclusively over the indexing keywords, and data items may be protected by other cryptographic means (as explained in Sect. 1 and as studied in [17]).

If \(\mathbf {D}_{0}, \mathbf {D}_{1}\) are two documents, we denote by \(\mathbf {D}_{0}\varDelta \mathbf {D}_{1}\) the set of keywords appearing in either \(\mathbf {D}_{0}\) or \(\mathbf {D}_{1}\), but not in both. For instance, if \(\mathbf {D}_{0}=(\text {``a''},\text {``b''},\text {``c''})\) and \(\mathbf {D}_{1}=(\text {``a''},\text {``d''},\text {``e''})\), then \(\mathbf {D}_{0}\varDelta \mathbf {D}_{1}=\{\text {``b''},\text {``c''},\text {``d''},\text {``e''}\}\). We also call \(\mathbf {D}_{0}\varDelta \mathbf {D}_{1}\) the set of keywords distinguishing \(\mathbf {D}_{0}\) and \(\mathbf {D}_{1}\).
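The definition treats each document as the set of its keywords, so positions are ignored and the operation is the ordinary set symmetric difference. A minimal Python sketch reproducing the example above (the function name is our own):

```python
from typing import Sequence, Set


def distinguishing_keywords(d0: Sequence[str], d1: Sequence[str]) -> Set[str]:
    """Keywords appearing in exactly one of the two documents (D0 Δ D1)."""
    return set(d0) ^ set(d1)


print(sorted(distinguishing_keywords(("a", "b", "c"), ("a", "d", "e"))))
# ['b', 'c', 'd', 'e']
```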

Given a positive integer m, we denote by [m] the set \(\{1,\ldots ,m\}\). Given a function \(f:\mathbb {N}\rightarrow \mathbb {R}\), we say that \(f(\lambda )\) is negligible in \(\lambda \) if for every positive polynomial g there exists an integer \(\lambda _0\) such that \(|f(\lambda )|<1/g(\lambda )\) for all \(\lambda >\lambda _0\), that is, if \(|f|\) decreases faster than the inverse of any positive polynomial.

See [12] for an extensive and recent survey on the subject of searchable encryption.

2.2 Model for \(\mathrm {PEKS}\) scheme

We now give the general model for the proposed public-key searchable encryption schemes. Although not stated, every algorithm apart from Setup takes the public parameters as input.

Definition 1

We define a \(\mathrm {PEKS}\) scheme \(\mathscr {S}\) as consisting of five polynomial-time algorithms:

\(\mathscr {S}.\mathrm {Setup}(\lambda )\)::

Probabilistic algorithm run by the client that, given a security parameter \(\lambda \), returns the public parameters \(\mathrm {params}\) of the scheme.

\(\mathscr {S}.\mathrm {KeyGen}()\)::

Probabilistic algorithm run by the client that derives a private key \(\mathrm {sk}\) and a public key \(\mathrm {pk}\) from the public parameters \(\mathrm {params}\).

\(\mathscr {S}.\mathrm {BuildIndex}_{\mathrm {pk}}(\mathbf {D})\)::

Probabilistic algorithm, to be run by data providers. It takes as input a document \(\mathbf {D}\) and returns a corresponding encrypted index \(\mathbf {I}\).

\(\mathscr {S}.\mathrm {Trapdoor}_{\mathrm {sk}}(\mathbf {L},J)\)::

Algorithm run by the client that takes as input a tuple \(\mathbf {L}\) of keywords and a set J of positions. It returns a corresponding trapdoor \(\mathbf {T}\).

\(\mathscr {S}.\mathrm {Search}(\mathbf {I},\mathbf {T})\)::

Deterministic algorithm run by the server and taking as input an encrypted index \(\mathbf {I}\) and a trapdoor \(\mathbf {T}\). It returns either 1 or 0.
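For readers who prefer an API view, the five algorithms of Definition 1 can be sketched as an abstract Python interface. The class and method names are hypothetical, group elements and keys are left as opaque `Any` values, and J is passed as an ordered sequence of 1-based positions because each position \(j_i\) is paired with the keyword in the same place of \(\mathbf {L}\).

```python
from abc import ABC, abstractmethod
from typing import Any, Sequence, Tuple


class PEKSScheme(ABC):
    """Hypothetical interface mirroring the five algorithms of Definition 1.

    Implementations are expected to keep the public parameters returned by
    setup() and use them implicitly in the other four methods.
    """

    @abstractmethod
    def setup(self, security_param: int) -> Any:
        """Probabilistic, run by the client; returns the public parameters."""

    @abstractmethod
    def keygen(self) -> Tuple[Any, Any]:
        """Probabilistic, run by the client; returns (sk, pk)."""

    @abstractmethod
    def build_index(self, pk: Any, document: Sequence[str]) -> Any:
        """Probabilistic, run by data providers on D = (w_1, ..., w_m)."""

    @abstractmethod
    def trapdoor(self, sk: Any, keywords: Sequence[str],
                 positions: Sequence[int]) -> Any:
        """Run by the client on a keyword tuple L and positions J."""

    @abstractmethod
    def search(self, index: Any, trapdoor: Any) -> int:
        """Deterministic, run by the server; returns 1 (match) or 0."""
```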

2.3 Consistency definition

The consistency property relates to the correctness of the scheme, in the sense that an encrypted index and a trapdoor should match in the search process exactly when the underlying document and query also match. If a document and a query match, then, by construction of our schemes, the corresponding encrypted index and trapdoor match in the search process. However, the converse does not necessarily hold: the use of hash functions in the proposed schemes can produce false positives in the search process. Therefore, we must analyze the extent to which false positives can occur, and we resort to a notion of consistency defined by Abdalla et al. [1].

The consistency notions defined by Abdalla et al. [1] are, in increasing strength order, computational, statistical and perfect. We prove consistency under the random oracle model and under an adaptation of the weakest definition of consistency in [1], namely computational consistency. Informally, their definition states that the advantage of any polynomial-time adversary in finding a matching encrypted index and trapdoor coming from a non-matching document and query is negligible in the security parameter, where the adversary has access to the public parameters and to the public key.

Let \(\mathscr {S}\) be a PEKS scheme. Given a security parameter \(\lambda \), we introduce a consistency game in the following three phases:

  • Setup. The challenger runs \(\mathscr {S}.\mathrm {Setup}\) on input \(\lambda \) and then hands over the public parameters to the adversary. It also runs \(\mathscr {S}.\mathrm {KeyGen}\), keeps the private key \(\mathrm {sk}\) secret and hands over the public key \(\mathrm {pk}\) to the adversary.

  • Guess. The adversary outputs a document of the form \(\mathbf {D}=(w_{1},\ldots ,w_{m})\) and a tuple of keywords of the form \(\mathbf {L}=(w'_{1},\ldots ,w'_{l})\) together with a set of positions \(J=\{j_{1},\ldots ,j_{l}\}\subseteq [m]\).

  • Output. The challenger hands over to the adversary the trapdoor \(\mathbf {T}=\mathscr {S}.\mathrm {Trapdoor}_{\mathrm {sk}} (\mathbf {L},J)\), and then the adversary computes \(\mathbf {I}=\mathscr {S}.\mathrm {BuildIndex}_{\mathrm {pk}}(\mathbf {D})\). If \(\mathscr {S}.\mathrm {Search}(\mathbf {I},\mathbf {T})=1\) and there exists a \(j_{i}\in J\) such that \(w_{j_i}\ne w'_{i}\), the game outputs the bit \(b=1\). Otherwise, it outputs \(b=0\).
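The three phases can be sketched as a single Python function that runs one execution of the game and returns the bit b. This is an illustration under assumptions: the scheme is any object with hypothetical methods `setup`, `keygen`, `build_index`, `trapdoor` and `search` (in the roles of Definition 1), the adversary is a callable returning \((\mathbf {D},\mathbf {L},J)\) with 1-based positions, and the index is computed honestly with `build_index`, as in the Output phase.

```python
from typing import Any, Callable, Sequence, Tuple

# (D, L, J): document, query keywords, and 1-based positions paired with L
Guess = Tuple[Sequence[str], Sequence[str], Sequence[int]]


def consistency_game(scheme: Any, security_param: int,
                     adversary: Callable[[Any, Any], Guess]) -> int:
    """One run of the consistency game; returns b (1 means a false positive)."""
    # Setup: params and pk go to the adversary; sk stays with the challenger.
    params = scheme.setup(security_param)
    sk, pk = scheme.keygen()
    # Guess: the adversary chooses D, L and J.
    doc, keywords, positions = adversary(params, pk)
    # Output: the challenger's trapdoor and an honestly built index.
    trap = scheme.trapdoor(sk, keywords, positions)
    index = scheme.build_index(pk, doc)
    mismatch = any(doc[j - 1] != w for w, j in zip(keywords, positions))
    return 1 if scheme.search(index, trap) == 1 and mismatch else 0
```

Computational consistency then asks that, for every PPT adversary, the probability that this function returns 1 be negligible in \(\lambda \).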

Definition 2

(Computational consistency of PEKS [1]) A \(\mathrm {PEKS}\) scheme \(\mathscr {S}\) is computationally consistent if the advantage of every probabilistic polynomial-time (PPT) adversary \(\mathscr {A}\) in the above game

$$\begin{aligned} \mathrm {Adv}_{\mathscr {A}}(\lambda )=\mathrm {Pr}(b=1) \end{aligned}$$

is negligible in \(\lambda \).

2.4 Security definition

In this section, we provide the hardness assumptions and the security definitions used in the security analysis of the proposed schemes. All proofs in this work are set in the random oracle model (see [7]).

2.4.1 Hardness assumptions

We now define symmetric and asymmetric bilinear groups. In this article, group operations are always written multiplicatively.

Definition 3

(Bilinear Groups) Let \((\mathbb {G}_{1},\cdot ),(\mathbb {G}_{2},\cdot )\) be two cyclic groups of prime order q with generators \(g,h\), respectively (usually denoted by \(\mathbb {G}_{1}=\langle g\rangle \), \(\mathbb {G}_{2}=\langle h\rangle \)), and suppose that there exists a cyclic group \(\mathbb {G}_{T}\) of order q and a non-degenerate bilinear map \(e:\mathbb {G}_{1}\times \mathbb {G}_{2}\rightarrow \mathbb {G}_{T}\).

We say that \(\mathbb {G}_{1}\) is a symmetric bilinear group if there exists an efficiently computable isomorphism between \(\mathbb {G}_{1}\) and \(\mathbb {G}_{2}\). In this case, we identify \(\mathbb {G}_{1}\) and \(\mathbb {G}_{2}\) via such an isomorphism and denote both by \(\mathbb {G}\).

Similarly, we say that \(\mathbb {G}_{1},\mathbb {G}_{2}\) are asymmetric bilinear groups if there exist no non-trivial efficiently computable homomorphisms from \(\mathbb {G}_{2}\) to \(\mathbb {G}_{1}\).

The bilinear groups \(\mathbb {G},\mathbb {G}_{1},\mathbb {G}_{2}\) are taken to be subgroups of the group of points of an elliptic curve, and \(\mathbb {G}_{T}\) is a subgroup of the multiplicative group of a finite field [29]. The definition of symmetric and asymmetric bilinear groups corresponds to Type 1 and Type 3 pairings in the paper by Galbraith et al. [19]. The term pairing refers to the non-degenerate bilinear map in the definition of bilinear group. We refer the reader to their article for properties of particular instantiations and to [5, 34] for techniques to speed up pairing computation.

The first scheme we propose is proved secure under the asymmetric Decisional Bilinear Diffie–Hellman \((\text {asymmetric } \mathrm {DBDH})\) assumption. This assumption was proposed by Boneh and Boyen [8] as a generalization of the DBDH assumption (see [22]) to the asymmetric setting. The DBDH assumption is easily seen to imply the \(\mathrm {DDH}\) assumption in the target group \(\mathbb {G}_{T}\).

Definition 4

(Asymmetric DBDH Assumption) Let \(\mathbb {G}_{1}=\langle g \rangle \), \(\mathbb {G}_{2}=\langle h \rangle \) be asymmetric bilinear groups deterministically generated according to a security parameter \(\lambda \). We say the asymmetric DBDH assumption holds in \(\mathbb {G}_{1}\) and \(\mathbb {G}_{2}\) if for every PPT algorithm \(\mathscr {B}\),

$$\begin{aligned} \mathrm {Adv}_{\mathscr {B}}(\lambda )&=\left| \mathrm {Pr} \left( \mathscr {B}(g,g^a,g^b,h,h^a,h^c,e\left( g,h\right) ^{abc}) =1\right) \right. \\&\left. \quad -\mathrm {Pr}\left( \mathscr {B}(g,g^a,g^b,h,h^a,h^c,e \left( g,h\right) ^{r})=1\right) \right| \end{aligned}$$

is negligible in \(\lambda \), where the probabilities are taken over \(a,b,c,r\) uniformly distributed in \(\mathbb {F}_{q}\) and over the random bits of \(\mathscr {B}\).
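To make the shape of the two distributions concrete, the sketch below samples an instance of Definition 4 in a toy "exponent" representation in which each of \(g^x\), \(h^x\) and \(e(g,h)^x\) is replaced by its exponent x mod q. In this representation the problem is trivial, so the code only shows which values the distinguisher \(\mathscr {B}\) receives; the function name and the small test modulus are our own assumptions.

```python
import secrets
from typing import Tuple


def dbdh_instance(q: int, real: bool) -> Tuple[int, ...]:
    """Toy asymmetric DBDH tuple (g, g^a, g^b, h, h^a, h^c, Z) as exponents mod q.

    Insecure by design: every exponent is in the clear. Z is e(g,h)^{abc}
    when `real` is True and e(g,h)^r for an independent r otherwise.
    """
    a, b, c, r = (secrets.randbelow(q) for _ in range(4))
    target = (a * b * c) % q if real else r
    # g and h are represented by the exponent 1
    return (1, a, b, 1, a, c, target)
```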

The second scheme we propose is proved secure under the Bilinear Diffie–Hellman Inversion assumption \((p\text {-}\mathrm {BDHI})\). This assumption was also proposed in [8]. According to [18], the best known attack on p-BDHI consists in solving the Discrete Logarithm Problem \((\mathrm {DLP})\) in \(\mathbb {G}\).

Definition 5

(p-BDHI Assumption) Let \(\mathbb {G}=\langle g \rangle \) denote a symmetric bilinear group deterministically generated according to a security parameter \(\lambda \), and let p be a positive integer. We say the p-BDHI assumption holds in \(\mathbb {G}\) if for every PPT algorithm \(\mathscr {B}\),

$$\begin{aligned} \mathrm {Adv}_{\mathscr {B}}(\lambda )=\mathrm {Pr} \left( \mathscr {B}(g,g^a,g^{a^{2}},\ldots ,g^{a^p}) =e\left( g,g\right) ^{1/a}\right) \end{aligned}$$

is negligible in \(\lambda \), where the probability is taken over uniformly distributed \(a\in \mathbb {F}_{q}^{*}\) and over the random bits of \(\mathscr {B}\).
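In the same toy exponent representation (insecure, for illustration only), a p-BDHI instance is the list of powers \(a^{0},\ldots ,a^{p}\) mod q, and the solution \(e(g,g)^{1/a}\) corresponds to the modular inverse of a. The sketch samples a from \(\mathbb {F}_{q}^{*}\) so that the inverse exists; the function name is hypothetical, and the three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
import secrets
from typing import List, Tuple


def pbdhi_instance(q: int, p: int) -> Tuple[List[int], int]:
    """Toy p-BDHI instance and its solution, with every element as an exponent.

    g^(a^i) is represented by a^i mod q and e(g, g)^(1/a) by a^(-1) mod q.
    """
    a = 1 + secrets.randbelow(q - 1)                   # a in F_q^*, so 1/a exists
    instance = [pow(a, i, q) for i in range(p + 1)]    # g, g^a, ..., g^(a^p)
    solution = pow(a, -1, q)                           # exponent of e(g, g)^(1/a)
    return instance, solution
```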

In the proposed schemes and in the definitions above, bilinear groups are generated according to the security parameter \(\lambda \). We choose bilinear groups to have exponential order in \(\lambda \). See the security and consistency proofs for more details about this choice.

2.4.2 Security definition

We now introduce the security definition used in this article. We adapt the security definition introduced by Boneh et al. [9] to the conjunctive and subset case of Definition 1.

The security definition we use is a semantic-security-style definition that guarantees indistinguishability of encrypted indexes against an adversary with access to the public key and to trapdoors that contain no keyword distinguishing the challenge candidate documents. Hence, in the proposed security definition, the adversary is not allowed to obtain trapdoors associated with any keyword that appears in one of the challenge candidate documents but not in both.

Let \(\mathscr {S}\) be a PEKS scheme. Given a security parameter \(\lambda \), we introduce a security game in the following five phases:

  • Setup. The challenger runs \(\mathscr {S}.\mathrm {Setup}\) on input \(\lambda \) and hands over the public parameters to the adversary. It also runs \(\mathscr {S}.\mathrm {KeyGen}\), keeps the private key \(\mathrm {sk}\) secret and hands over the public key \(\mathrm {pk}\) to the adversary.

  • Query Phase 1. The adversary adaptively requests from the challenger \(q_{T}\) trapdoors of its choice, where \(q_{T}\) is polynomial in the security parameter \(\lambda \). We denote the set of all keywords queried in this phase by \(\mathscr {W}_{1}\).

  • Challenge. The adversary outputs two challenge candidate documents \(\mathbf {D}_{0},\mathbf {D}_{1}\) subject to the restriction that no keyword in \(\mathbf {D}_{0}\varDelta \mathbf {D}_{1}\) has been queried in Query Phase 1, that is, \((\mathbf {D}_{0}\varDelta \mathbf {D}_{1})\cap \mathscr {W}_{1} =\emptyset \). The challenger flips a fair coin \(b\in \{0,1\}\) and outputs the encrypted index \(\mathscr {S}.\mathrm {BuildIndex}(\mathbf {D}_{b})\) corresponding to \(\mathbf {D}_{b}\).

  • Query Phase 2. The adversary proceeds just as in Query Phase 1, but it is not allowed to ask for trapdoors containing keywords in \(\mathbf {D}_{0}\varDelta \mathbf {D}_{1}\). That is, if the set of all keywords queried in this phase is \(\mathscr {W}_{2}\), we impose \((\mathbf {D}_{0}\varDelta \mathbf {D}_{1})\cap \mathscr {W}_{2}=\emptyset \).

  • Guess. The adversary outputs a guess \(b'\in \{0,1\}\) for b.
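The restriction shared by the Challenge phase and Query Phase 2 amounts to a simple set test on the keywords the adversary has queried. A minimal sketch, with hypothetical function names and trapdoor queries represented by their keyword tuples:

```python
from typing import Iterable, Sequence, Set


def distinguishing_keywords(d0: Sequence[str], d1: Sequence[str]) -> Set[str]:
    """D0 Δ D1: keywords appearing in exactly one of the two documents."""
    return set(d0) ^ set(d1)


def challenge_is_admissible(d0: Sequence[str], d1: Sequence[str],
                            queried: Iterable[Sequence[str]]) -> bool:
    """True iff no trapdoor query contains a keyword of D0 Δ D1.

    `queried` collects the keyword tuples of all trapdoor requests, so the
    union of its entries plays the role of W_1 (or W_2).
    """
    asked: Set[str] = set()
    for keywords in queried:
        asked.update(keywords)
    return not (distinguishing_keywords(d0, d1) & asked)
```

A query on a keyword shared by both candidate documents, such as "a" below, remains allowed, while any query touching "b" or "c" is forbidden.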

Definition 6

(Semantic security against adaptive chosen keyword attacks) We say that a \(\mathrm {PEKS}\) scheme \(\mathscr {S}\) is semantically secure against adaptive chosen keyword attacks if the advantage of every PPT adversary \(\mathscr {A}\) in distinguishing b in the above game

$$\begin{aligned} \mathrm {Adv}_{\mathscr {A}}(\lambda )&=|\mathrm {Pr}(b'=b)-1/2|\\&=\left| \mathrm {Pr}(\mathscr {A}(X)=b | X=b)\right. \\&\left. \quad -\mathrm {Pr}(\mathscr {A}(X)=b | X=1-b)\right| \end{aligned}$$

is negligible in \(\lambda \).

For conjunctive PEKS, the security definition we consider is slightly weaker than in other related works, in the following sense. Works such as [4, 11, 14, 16, 20, 21] impose the natural restriction of serving the adversary only trapdoors coming from queries with the same search outcome on both challenge candidate documents. For conjunctive queries, the restriction we impose is stronger, since served trapdoors cannot contain any keyword distinguishing the challenge candidate documents. Consequently, trapdoors could leak which encrypted indexes contain some of the keywords in the underlying query, even when there is no match.

In addition, the considered security definition does not provide trapdoor unlinkability or remove the need for a secure channel for trapdoors, as studied, for instance, in [3, 13, 32, 40].

2.4.3 Implementation remarks

We refer the reader to [2] for remarks and references on the following statements and for a recent review on the state of the art of pairing computation.

The implementation of asymmetric bilinear groups for elliptic curve cryptography is often based on BN, BLS, KSS or MNT elliptic curves. In turn, symmetric bilinear groups are implemented in practice on supersingular elliptic curves.

Supersingular elliptic curves are well known to require large prime order groups for the \(\mathrm {DLP}\) to be intractable (since they have a small MOV exponent), and this would enlarge the size of exchanged information in the proposed schemes. Moreover, recent results on the discrete logarithm problem [6] have rendered symmetric bilinear groups effectively obsolete for cryptographic purposes. Nevertheless, as in [26], we note that the second scheme we propose can be implemented in asymmetric bilinear groups as well, thus reducing the group order and increasing efficiency and security. In this context, symmetric bilinear groups are used just to facilitate the construction of the formal security proof.

We should note that asymmetric bilinear groups guarantee that we can securely and efficiently hash onto \(\mathbb {G}_{1}\). In particular, it is possible to efficiently and uniformly sample from \(\mathbb {G}_{1}\) without computing multiples of the generator g. The fact that we prove security under the random oracle model forces the use of such hash functions in the proposed schemes. See [38] for an explicit solution on secure hashing for BN curves.

3 Conjunctive \(\mathrm {PEKS}\) scheme

The proposed scheme can be seen as an analog of Boneh et al.’s scheme [9] in which the symmetric computational-type hardness assumption is replaced with an asymmetric decisional-type one. This replacement by a stronger assumption allows us to take advantage of the bilinearity of pairings and build a conjunctive PEKS scheme with small trapdoors and indexes and an efficient search process.

Following [14, 20, 21, 30, 31], we assume that the documents to be encrypted satisfy that

  1. two different keyword fields never hold the same keyword, and

  2. every keyword field is defined.

As noted in the literature [31], this can be effectively achieved by attaching a keyword field identifier to every keyword. For instance, when encrypting a document of the form \((w_{1},\ldots ,w_{m})\), one can assume that \(w_{i}=i\Vert w'_{i}\) for some keyword \(w'_{i}\) (which could be NULL or \(\bot \)) and for all \(i\in [m]\). We implicitly assume keywords in documents and trapdoors to be of this form.
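A minimal sketch of this encoding, under our own assumptions: the field index is written in decimal and separated from the keyword by "|" so that it can be parsed back unambiguously (the index contains no "|", so splitting at the first "|" recovers it), and `None` marks an undefined field, rendered as the symbol ⊥. The function name and separator are hypothetical choices, not part of the scheme.

```python
from typing import Optional, Sequence, Tuple

NULL = "\u22a5"  # "⊥", our placeholder for an undefined field


def with_field_ids(keywords: Sequence[Optional[str]]) -> Tuple[str, ...]:
    """Map (w'_1, ..., w'_m) to (1||w'_1, ..., m||w'_m).

    Identical keywords in different fields become distinct, and every field
    is defined, which gives properties 1 and 2 above.
    """
    return tuple(f"{i}|{NULL if w is None else w}"
                 for i, w in enumerate(keywords, start=1))


print(with_field_ids(["Bob", "Bob", None]))  # ('1|Bob', '2|Bob', '3|⊥')
```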

Although not stated, every algorithm apart from Setup takes the public parameters as input.

Definition 7

We define a public-key encryption with conjunctive keyword search scheme \(\mathscr {S}_{1}\) by means of the following five polynomial-time algorithms:

\(\mathscr {S}_{1}.\mathrm {Setup}(\lambda )\)::

Given a security parameter \(\lambda \in \mathbb {Z}\), fix two asymmetric bilinear groups \(\mathbb {G}_{1},\mathbb {G}_{2}\) of prime order \(q\ge 2^{\lambda }\) and denote the corresponding pairing by \(e:\mathbb {G}_{1}\times \mathbb {G}_{2}\rightarrow \mathbb {G}_{T}\). Let \(g,h\) be random generators of \(\mathbb {G}_{1}, \mathbb {G}_{2}\), respectively. Let \(H:\{0,1\}^{*}\rightarrow \mathbb {G}_{1}\) be a collision-resistant hash function. Define \(m\in \mathbb {Z}\) as the fixed number of keywords in every document, which we assume to be constant in \(\lambda \) and to satisfy \(m\le (1+\log q)/2\). Output the public parameters \(\mathrm {params}=\{\mathbb {G}_{1}, \mathbb {G}_{2}, \mathbb {G}_{T}, q, e, g, h, H, m\}\).

\(\mathscr {S}_{1}.\mathrm {KeyGen}()\)::

Choose \(\beta \in \mathbb {F}_{q}\) uniformly at random.

Output the private key \(\beta \) and the public key \(\alpha =h^\beta \).

\(\mathscr {S}_{1}.\mathrm {BuildIndex}_{\alpha }(\mathbf {D})\)::

Denote by \(\mathbf {D}=(w_{1},\ldots ,w_{m})\) the input document consisting of a tuple of m keywords \(w_{i}\in \{0,1\}^{*}\). Then, uniformly generate a random nonce \(r\in \mathbb {F}_{q}\) and set

$$\begin{aligned} I_{0}&= h^{r},\\ I_{i}&= e\left( H(w_{i}), \alpha ^{r}\right) \quad \text{ for } i\in [m]. \end{aligned}$$

Output the index \(\mathbf {I}=(I_{0},I_{1},\ldots ,I_{m})\).

\(\mathscr {S}_{1}.\mathrm {Trapdoor}_{\beta }(\mathbf {L},J)\)::

Denote by \(\mathbf {L}=(w_{1},\ldots ,w_{l})\) the input tuple of keywords (where \(l\le m\) and \(w_{i}\in \{0,1\}^{*}\)) and by \(J=\{j_{1},\ldots ,j_{l}\}\subseteq [m]\) the input set of positions. Set

$$\begin{aligned} T_{0} = \left( \prod _{i=1}^{l}H(w_{i})\right) ^{\beta }. \end{aligned}$$

Output the trapdoor \(\mathbf {T}=(T_{0},J)\), consisting of \(T_{0}\) along with the set J of fields to be queried.

\(\mathscr {S}_{1}.\mathrm {Search}(\mathbf {I},\mathbf {T})\)::

Given the index \(\mathbf {I}=(I_{0},I_{1},\ldots ,I_{m})\) and the trapdoor \(\mathbf {T}=(T_{0},J)\), where \(J=\{j_{1},\ldots ,j_{l}\}\), output 1 if

$$\begin{aligned} e\left( T_{0},I_{0}\right) = \prod _{i=1}^{l} I_{j_i}. \end{aligned}$$

Otherwise output 0.
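To make the algebra of \(\mathscr {S}_{1}\) concrete, the following Python sketch checks that Search accepts when the queried keywords match the indexed ones and rejects otherwise. It is a toy, insecure model chosen only for this illustration, not part of the scheme: every element of \(\mathbb {G}_{1},\mathbb {G}_{2},\mathbb {G}_{T}\) is stored as its discrete logarithm modulo a prime, so the pairing becomes multiplication of exponents, and the hypothetical helper `hash_to_g1` stands in for H using SHA-256. All function names are illustrative.

```python
import hashlib
import secrets

# Toy, INSECURE model: an element of G1, G2 or GT is stored as its discrete log
# modulo the prime Q (w.r.t. g, h and e(g, h)). Multiplying elements adds logs,
# exponentiation multiplies logs, and the pairing e(g^x, h^y) = e(g, h)^(x*y)
# multiplies logs.
Q = 2**61 - 1  # a Mersenne prime standing in for the group order q


def hash_to_g1(word: str) -> int:
    """Hypothetical stand-in for H: {0,1}* -> G1 (returns the log of H(word))."""
    return int.from_bytes(hashlib.sha256(word.encode()).digest(), "big") % Q


def pair(x_log: int, y_log: int) -> int:
    """e(g^x, h^y) = e(g, h)^(x*y), returned as a log in GT."""
    return (x_log * y_log) % Q


def keygen() -> tuple[int, int]:
    beta = secrets.randbelow(Q)  # private key beta
    alpha = beta                 # public key alpha = h^beta, whose log is beta
    return beta, alpha


def build_index(alpha: int, doc: list[str]) -> list[int]:
    r = secrets.randbelow(Q)                 # random nonce r
    alpha_r = (alpha * r) % Q                # log of alpha^r in G2
    index = [r]                              # I_0 = h^r
    index += [pair(hash_to_g1(w), alpha_r) for w in doc]  # I_i = e(H(w_i), alpha^r)
    return index


def trapdoor(beta: int, words: list[str], positions: list[int]) -> tuple[int, list[int]]:
    prod_log = sum(hash_to_g1(w) for w in words) % Q  # log of prod_i H(w_i)
    return (prod_log * beta) % Q, list(positions)     # T_0 = (prod_i H(w_i))^beta, and J


def search(index: list[int], trap: tuple[int, list[int]]) -> int:
    t0, positions = trap
    lhs = pair(t0, index[0])                    # e(T_0, I_0)
    rhs = sum(index[j] for j in positions) % Q  # prod_i I_{j_i}; positions are 1-indexed
    return 1 if lhs == rhs else 0


beta, alpha = keygen()
idx = build_index(alpha, ["urgent", "work", "meeting"])
print(search(idx, trapdoor(beta, ["work", "meeting"], [2, 3])))  # matching query
print(search(idx, trapdoor(beta, ["work", "social"], [2, 3])))   # non-matching query
```

Because all exponents are known in the clear, this model has no security whatsoever; it only confirms the identity \(e(T_{0},I_{0})=\prod _{i=1}^{l}e(H(w_{i}),h)^{\beta r}\) on which Search relies.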

We next give the consistency and security theorems for our scheme. The proofs are deferred to Sects. 4.1 and 4.2, respectively.

Theorem 1

The proposed conjunctive PEKS scheme \(\mathscr {S}_{1}\) is computationally consistent under the random oracle model.

Theorem 2

Assume the \(\mathrm{DBDH}\) assumption holds. Then, the proposed conjunctive PEKS scheme \(\mathscr {S}_{1}\) is semantically secure against adaptive chosen keyword attacks under the random oracle model.

4 Consistency and security proofs for the conjunctive \(\mathrm {PEKS}\) scheme \(\mathscr {S}_{1}\)

In this section, we present the consistency and the security proofs of our conjunctive PEKS scheme \(\mathscr {S}_{1}\).

4.1 Consistency proof for the conjunctive \(\mathrm {PEKS}\) scheme \(\mathscr {S}_{1}\)

We dedicate this section to the proof of Theorem 1. Proceeding similarly to the proof by Abdalla et al. [1], we prove consistency of the scheme \(\mathscr {S}_{1}\) in the random oracle model.

Let \(\mathscr {A}\) be a PPT adversary in the consistency game defined in Sect. 2.3, having access to the public parameters, to the public key \(\mathrm {pk}\) and to the hash oracle H modeled as a random oracle. Let \(\mathrm {WSet}\) be the set of keywords queried to the hash oracle H throughout the game, whose size \(q_{H}\) is polynomial in \(\lambda \). Let \(\mathbf {D}=(w_{1},\ldots ,w_{m})\), \(\mathbf {L}=(w'_{1},\ldots ,w'_{l})\) and \(J=\{j_{1},\ldots ,j_{l}\}\subseteq [m]\) denote the guess of \(\mathscr {A}\) in the Guess phase, where keywords are taken from \(\mathrm {WSet}\), and let \(\tilde{J}\) be the set of positions \(j_i\in J\) such that \(w_{j_{i}}\ne w'_{i}\). Without loss of generality, we rule out adversaries choosing \(\tilde{J}=\emptyset \) in the Guess phase. Let \(r\in \mathbb {F}_{q}\) denote the random nonce generated by \(\mathscr {A}\) in the encrypted index generation of the Output phase.

Denote \(X=e\left( \prod _{i\in J}H(w_{i}),h^{\beta r}\right) \) and also denote \(X'=e\left( \prod _{i=1}^{l}H(w'_{i}),h^{\beta r}\right) \). Now note that the output of \(\mathscr {A}\) in the consistency game is 1 if and only if \(X=X'\). We proceed to bound the probability of this event, which is \(\mathrm {Adv}_{\mathscr {A}}(\lambda )\) by definition.

Let E be the event that there exist \(\mathbf {D}=(w_{1},\ldots ,w_{m})\), \(\mathbf {L}=(w'_{1},\ldots ,w'_{l})\) and \(J=\{j_{1},\ldots ,j_{l}\}\subseteq [m]\), among all possible guesses taking words in \(\mathrm {WSet}\), in such a way that the equality \(\prod _{i=1}^{l}H(w_{j_i})=\prod _{i=1}^{l}H(w'_{i})\) is satisfied. If \(r\beta =0\), then \(\mathscr {A}\) always outputs 1. Otherwise, notice that \(X=X'\) happens only when E happens. Therefore,

$$\begin{aligned} \mathrm {Adv}_{\mathscr {A}}(\lambda )\le \frac{(q-1)^{2}}{q^{2}}\mathrm {Pr}\left( E\right) +\frac{2}{q} \end{aligned}$$
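For completeness, the two terms of this bound can be made explicit, assuming (as the bound does) that r and \(\beta \) are independent, uniform in \(\mathbb {F}_{q}\), and independent of the hash oracle:

$$\begin{aligned} \mathrm {Pr}(r\beta = 0)&= 1-\mathrm {Pr}(r\ne 0)\,\mathrm {Pr}(\beta \ne 0) = 1-\frac{(q-1)^{2}}{q^{2}} = \frac{2q-1}{q^{2}} \le \frac{2}{q},\\ \mathrm {Adv}_{\mathscr {A}}(\lambda )&\le \mathrm {Pr}(r\beta \ne 0)\,\mathrm {Pr}(X=X' \mid r\beta \ne 0) + \mathrm {Pr}(r\beta = 0) \le \frac{(q-1)^{2}}{q^{2}}\mathrm {Pr}(E)+\frac{2}{q}. \end{aligned}$$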

Since \(q\ge 2^\lambda \), it suffices to see that \(\mathrm {Pr}\left( E\right) \) is negligible in \(\lambda \).

Since H is modeled as a random oracle and since inversion permutes group elements, by using Lemma 1 we see that \(\mathrm {Pr}(E)\le q_{H}^{2m}\frac{m2^{2m}}{q}\). This bound is negligible in \(\lambda \), since \(q\ge 2^{\lambda }\) and \(m,q_{H}\) are assumed to be constant and polynomial in \(\lambda \), respectively.

As a consequence of this result, we conclude the proof of Theorem 1. We next state the lemma used above.

Lemma 1

Let \(\mathbb {G}\) be a finite group of order q and neutral element 1. Let m, n be positive integers with \(m\le (1+\log q)/2\), and let \(X_{1},\ldots ,X_{n}\) be independent and identically distributed uniform random variables with support \(\mathbb {G}\).

Let \(A_{n,2m}\) denote the event that there exists a nonempty subset \(S\subseteq [n]\) with \(|S|\le 2m\) such that \(\prod _{i\in S}X_{i}=1\). Then, we have

$$\begin{aligned} \mathrm {Pr}\left( A_{n,2m}\right) \le n^{2m}\frac{m2^{2m}}{q}. \end{aligned}$$

Proof

To simplify notation, set \(t=2m\) and write \(A_{t}\) for \(A_{t,t}\). Notice that \(A_{n,t}\) happens for \(X_{1},\ldots ,X_{n}\) exactly when it happens for some subset of \(X_{1},\ldots ,X_{n}\) with \(\min (n,t)\) terms. Therefore, by the union bound

$$\begin{aligned} \mathrm {Pr}(A_{n,t})\le \left( {\begin{array}{c}n\\ t\end{array}}\right) \mathrm {Pr}(A_{t})\le n^{t}\mathrm {Pr}(A_{t}). \end{aligned}$$

We now lower bound the probability of the complementary event \(A_{t}^{c}\).

We first prove \(\mathrm {Pr}(A_{t}^{c})\ge \prod _{i=0}^{t-1} \frac{q-2^{i}}{q}\) by induction on t over positive integers. For \(t=1\) we have \(\mathrm {Pr}(A_{1}^{c})=\frac{q-1}{q}\). For \(t>1\) note that, for \(A_{t}^{c}\) to happen with \(X_{1},\ldots ,X_{t}\), the event \(A_{t-1}^{c}\) must happen with \(X_{1},\ldots ,X_{t-1}\), and \(X_{t}\) cannot take as a value any of the inverses of the \(2^{t-1}\) subproducts of \(X_{1},\ldots ,X_{t-1}\) (including the empty product 1). Therefore, conditioned on \(A_{t-1}^{c}\), there are at least \(q-2^{t-1}\) possible values for \(X_{t}\) such that \(A_{t}^{c}\) happens, and we get \(\mathrm {Pr}(A_{t}^{c})\ge \frac{q-2^{t-1}}{q}\mathrm {Pr}(A_{t-1}^{c})\), as claimed.

Now we have

$$\begin{aligned} \mathrm {Pr}(A_{t})\le 1-\prod _{i=0}^{t-1}\frac{q-2^{i}}{q}\le 1-\left( 1-\frac{2^{t-1}}{q}\right) ^{t}. \end{aligned}$$

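Since \(t\le 1+\log q\), we have \(0\le 2^{t-1}/q\le 1\), so the binomial (Bernoulli) inequality \((1-x)^{t}\ge 1-tx\) applies and gives \(\mathrm {Pr}(A_{t})\le \frac{t2^{t-1}}{q}\). Combining this with the union bound above and substituting \(t=2m\) yields \(\mathrm {Pr}(A_{n,2m})\le n^{2m}\frac{m2^{2m}}{q}\), which proves the result. \(\square \)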
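Since every group of prime order q is cyclic and hence isomorphic to \((\mathbb {Z}_{q},+)\), the key inequality in the proof of Lemma 1 can be checked exhaustively for small parameters (a product equal to 1 becomes a sum equal to 0). The Python sketch below is an illustration we add, with hypothetical function names: it counts the tuples in \(\mathbb {Z}_{q}^{t}\) with no nonempty zero-sum subset and compares that exact probability of \(A_{t}^{c}\) with the lower bound \(\prod _{i=0}^{t-1}(q-2^{i})/q\), and \(\mathrm {Pr}(A_{t})\) with \(t2^{t-1}/q\). The small primes are chosen only so that the enumeration finishes quickly.

```python
from fractions import Fraction


def count_zero_sum_free(q: int, t: int) -> int:
    """Count tuples (x_1, ..., x_t) in Z_q^t in which no nonempty subset sums to
    0 mod q, i.e. tuples for which the complement of the event A_t occurs."""

    def extend(sums: frozenset, depth: int) -> int:
        if depth == t:
            return 1
        total = 0
        for x in range(q):
            # Nonempty subsets containing x: {x} alone, or x added to an earlier subset.
            new_sums = {x} | {(s + x) % q for s in sums}
            if 0 in new_sums:
                continue  # some nonempty subset now sums to the identity
            total += extend(sums | new_sums, depth + 1)
        return total

    return extend(frozenset(), 0)


def lemma1_quantities(q: int, t: int) -> tuple[Fraction, Fraction, Fraction]:
    """Return (exact Pr(A_t^c), product lower bound, final bound t*2^(t-1)/q)."""
    exact = Fraction(count_zero_sum_free(q, t), q ** t)
    lower = Fraction(1)
    for i in range(t):
        lower *= Fraction(q - 2 ** i, q)
    return exact, lower, Fraction(t * 2 ** (t - 1), q)


# Each pair satisfies t <= 1 + log2(q), as the lemma requires.
for q, t in [(101, 2), (31, 3), (17, 4)]:
    exact, lower, final = lemma1_quantities(q, t)
    print(q, t, float(exact), float(lower), float(1 - exact), float(final))
```

For \(t=2\) the two quantities coincide, since the only forbidden values for \(X_{2}\) are 0 and \(-X_{1}\); for larger t the product bound is strictly conservative because some forbidden values may coincide.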

4.2 Security proof for the conjunctive \(\mathrm {PEKS}\) scheme \(\mathscr {S}_{1}\)

We dedicate this section to the proof of Theorem 2. As in [11], we prove security in the random oracle model by means of a sequence of hybrid games.

Given two documents \(\mathbf {D}_{0}=(w_{0,1}, \ldots ,w_{0,m})\) and \(\mathbf {D}_{1}=(w_{1,1},\ldots ,w_{1,m})\), let \(\varDelta \subseteq [m]\) denote the set of positions corresponding to keywords in \(\mathbf {D}_{0}\varDelta \mathbf {D}_{1}\). For \(j\in [m]\) let \(\varDelta _{j}\) denote the first \(\min (j,|\varDelta |)\) elements of \(\varDelta \).

Let \(G_{0}\) be the security game defined in Sect. 2.4.2. Given \(j\in [m]\), we define a hybrid game \(G_{j}\), differing from \(G_{0}\) only in that the entries of the challenge index at positions in \(\varDelta _{j}\) are replaced by uniformly random elements chosen by the challenger.

Specifically, we introduce the security game \(G_{j}\) for \(j\in [m]\), consisting of the following five phases:

  • Setup. The challenger runs Setup and hands over the public parameters to the adversary. It also runs \(\mathrm {KeyGen}\), keeps the private key \(\mathrm {sk}\) secret and hands over the public key \(\mathrm {pk}\) to the adversary.

  • Query Phase 1. The adversary adaptively asks the challenger for \(q_{T}\) trapdoors of its own choice, where \(q_{T}\) is polynomial in the security parameter \(\lambda \). We denote the set of all keywords queried in this phase by \(\mathscr {W}_{1}\).

  • Challenge. The adversary outputs two challenge candidate documents \(\mathbf {D}_{0}\), \(\mathbf {D}_{1}\), subject to the restriction that keywords appearing in \(\mathbf {D}_{0}\varDelta \mathbf {D}_{1}\) have not been queried in Query Phase 1. That is, \((\mathbf {D}_{0}\varDelta \mathbf {D}_{1})\cap \mathscr {W}_{1} =\emptyset \). The challenger flips a fair coin \(b\in \{0,1\}\) and computes the index \(\mathbf {I}=(I_{0},I_{1},\ldots ,I_{m})\) corresponding to \(\mathbf {D}_{b}\). Then, for every \(i\in \varDelta _{j}\), the challenger replaces \(I_{i}\) with a uniformly sampled element of \(\mathbb {G}_{T}\) and hands over this modified index to the adversary as the challenge.

  • Query Phase 2. The adversary proceeds just as in Query Phase 1, but it is not allowed to ask for trapdoors containing keywords in \(\mathbf {D}_{0}\varDelta \mathbf {D}_{1}\). That is, if the set of all keywords queried in this phase is \(\mathscr {W}_{2}\), we impose \((\mathbf {D}_{0}\varDelta \mathbf {D}_{1})\cap \mathscr {W}_{2} =\emptyset \).

  • Guess. The adversary outputs a guess \(b'\in \{0,1\}\) for b.
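The sets \(\varDelta \) and \(\varDelta _{j}\), and the modified challenge index of the hybrid \(G_{j}\), can be sketched in Python as follows. We assume, as an interpretation not spelled out above, that \(\varDelta \) consists of the positions i at which the two documents carry different keywords, listed in increasing order; elements of \(\mathbb {G}_{T}\) are represented by toy exponents modulo a prime, and the function names are hypothetical.

```python
import secrets


def delta_positions(d0: list[str], d1: list[str]) -> list[int]:
    """Delta: 1-indexed positions where D_0 and D_1 differ (assumed interpretation)."""
    return [i for i, (w0, w1) in enumerate(zip(d0, d1), start=1) if w0 != w1]


def delta_j(d0: list[str], d1: list[str], j: int) -> list[int]:
    """Delta_j: the first min(j, |Delta|) elements of Delta."""
    return delta_positions(d0, d1)[:j]


def hybrid_challenge(index: list[int], d0: list[str], d1: list[str], j: int, q: int) -> list[int]:
    """Challenge index of G_j: entries I_i with i in Delta_j are replaced by fresh
    uniform elements of G_T (toy representation: exponents modulo q)."""
    out = list(index)  # index[0] is I_0, index[i] is I_i
    for i in delta_j(d0, d1, j):
        out[i] = secrets.randbelow(q)
    return out
```

With \(j=0\) the index is returned unchanged, matching the fact that \(G_{0}\) is the original game, while any \(j\ge |\varDelta |\) randomizes every position where the documents differ.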

Let \(\mathrm {Adv}_{\mathscr {A},G_{j}}(\lambda )\) denote the advantage of the PPT adversary \(\mathscr {A}\) in guessing b in the game \(G_{j}\). It is clear that \(\mathrm {Adv}_{\mathscr {A},G_{m}} (\lambda )\) is negligible in \(\lambda \) for every PPT adversary \(\mathscr {A}\): since \(\varDelta _{m}=\varDelta \), in \(G_{m}\) every index entry at a position where \(\mathbf {D}_{0}\) and \(\mathbf {D}_{1}\) differ is uniformly random, so the challenge index carries no information distinguishing \(\mathbf {D}_{0}\) from \(\mathbf {D}_{1}\).

Note that \(G_{0}\) is identical to the security game defined in Sect. 2.4.2, since \(\varDelta _{0}=\emptyset \). We prove through Proposition 1 that the proposed conjunctive PEKS scheme \(\mathscr {S}_{1}\) is semantically secure against adaptive chosen keyword attacks provided the DBDH assumption holds.
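The passage from Proposition 1 to Theorem 2 is the standard hybrid argument. Since advantages are nonnegative, the triangle inequality gives

$$\begin{aligned} \mathrm {Adv}_{\mathscr {A},G_{0}}(\lambda ) \le \sum _{j=0}^{m-1}\left| \mathrm {Adv}_{\mathscr {A},G_{j}}(\lambda )-\mathrm {Adv}_{\mathscr {A},G_{j+1}}(\lambda )\right| + \mathrm {Adv}_{\mathscr {A},G_{m}}(\lambda ), \end{aligned}$$

a sum of \(m+1\) terms that are negligible in \(\lambda \); because m is constant in \(\lambda \), the left-hand side is negligible as well.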

Proposition 1

Assume that the \(\mathrm {DBDH}\) assumption holds. For any \(j\in \{0,\ldots ,m-1\}\) and for any PPT adversary \(\mathscr {A}\), the advantages of \(\mathscr {A}\) in the games \(G_{j}\) and \(G_{j+1}\), when using the scheme \(\mathscr {S}_{1}\), are negligibly close in \(\lambda \). That is,

$$\begin{aligned} |\mathrm {Adv}_{\mathscr {A},G_{j}}(\lambda ) -\mathrm {Adv}_{\mathscr {A},G_{j+1}}(\lambda )| \end{aligned}$$

is negligible in \(\lambda \).

Proof

Let \(\mathscr {A}\) be a PPT adversary. For every \(j\in \{0, \ldots , m-1\}\), we build a PPT DBDH distinguisher \(\mathscr {B}_{j}\) taking a DBDH challenge tuple \((g,g^a,g^b,h,h^a,h^c,v)\) as input and interacting with \(\mathscr {A}\) as the challenger in the security game of the scheme.

The distinguisher \(\mathscr {B}_{j}\) is built in such a way that, for tuples with \(v=e\left( g,h\right) ^{abc}\), \(\mathscr {A}\) is playing the game \(G_{j}\), and for tuples with v random \(\mathscr {A}\) is playing the game \(G_{j+1}\). The output of the DBDH distinguisher \(\mathscr {B}_{j}\) depends on the output of \(\mathscr {A}\).

  • Setup. The challenger \(\mathscr {B}_{j}\) runs \(\mathscr {S}_{1}.\mathrm {Setup}(\lambda )\) to obtain \(\mathrm {params}=\{\mathbb {G}_{1}, \mathbb {G}_{2}, \mathbb {G}_{T}, q, e, g, h, H, m\}\), the public parameters of the scheme, where H is the hash oracle described below. \(\mathscr {B}_{j}\) hands over the public parameters to \(\mathscr {A}\).

  • Keygen. The challenger \(\mathscr {B}_{j}\) hands over the public key \(h^{a}\) to \(\mathscr {A}\).

  • Hash Oracle. The hash oracle H is operated by \(\mathscr {B}_{j}\), and it maintains a list of tuples of the form \(\langle w, s, c\rangle \) with \(w\in \{0,1\}^{*}\), \(s\in \mathbb {F}_{q}\) and \(c\in \{0,1\}\). The list is initially empty. On input a keyword \(w\in \{0,1\}^{*}\), the oracle H operates as follows:

    1. If there is an item in the list whose first element is keyword w, denote it by \(\langle w, s, c\rangle \). Then:

      (a) If \(c=0\), the oracle returns \(g^s\).

      (b) If \(c=1\), the oracle returns \(\left( g^b\right) ^s\).

    2. If there is no item in the list whose first element is keyword w, then the oracle flips a coin \(c\in \{0,1\}\) with \(\mathrm {Pr}(c=1)=1/(2q_{T}m+1)\), samples \(s\in \mathbb {F}_{q}\) uniformly at random and inserts \(\langle w, s, c\rangle \) into the list. Then, it proceeds to give an output as in the previous point.

  • Query Phase 1. When \(\mathscr {A}\) requests a trapdoor for keywords \(\mathbf {L}=(w_{1},\ldots ,w_{l})\) in the set of keyword fields \(J=\{j_{1},\ldots ,j_{l}\}\), the algorithm \(\mathscr {B}_{j}\) first calls the oracle on input each keyword \(w_{i}\) and retrieves the associated oracle list tuples \(\langle w_{i},s_{i},c_{i}\rangle \). Then, if some coin flip \(c_{i}=1\), \(\mathscr {B}_{j}\) halts. Otherwise, \(\mathscr {B}_{j}\) hands over to \(\mathscr {A}\) the trapdoor \(\mathbf {T}\) consisting of \(T_{0}=\prod _{i=1}^{l}\left( g^a\right) ^{s_{i}}\) and J.

  • Challenge. In this phase, the adversary \(\mathscr {A}\) outputs two documents \(\mathbf {D}_{0}=(w_{0,1},\ldots , w_{0,m})\), \(\mathbf {D}_{1}=(w_{1,1},\ldots , w_{1,m})\) with the restrictions stated in the security game defined in Sect. 2.4.2 and above, and \(\mathscr {B}_{j}\) flips a fair coin \(b\in \{0,1\}\).

    Then, \(\mathscr {B}_{j}\) calls the hash oracle on every keyword \(w_{b,i}\) to fill the H-list with tuples \(\langle w_{b,i},s_{b,i},c_{b,i}\rangle \). The algorithm \(\mathscr {B}_{j}\) halts if:

    • For some \(i\in [m]\backslash \varDelta _{j+1}\) we have \(c_{b,i}=1\), or

    • \(c_{b,t}=0\), where \(\{t\}=\varDelta _{j+1}\backslash \varDelta _{j}\) (this condition is only checked when \(\varDelta _{j+1}\backslash \varDelta _{j}\ne \emptyset \)).

    Then \(\mathscr {B}_{j}\) samples a value \(r\in \mathbb {F}_{q}\) uniformly at random and computes the challenge \(\mathbf {I}=(I_{0},I_{1},\ldots ,I_{m})\) in the following way

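    $$\begin{aligned} I_{0}&= (h^{c})^{r},\\ I_{i}&= \left\{ \begin{array}{ll} \text{uniformly sampled from } \mathbb {G}_{T} & \text{if } i\in \varDelta _{j},\\ v^{rs_{b,i}} & \text{if } i\in \varDelta _{j+1}\backslash \varDelta _{j},\\ e\left( (g^a)^{r},(h^c)^{s_{b,i}}\right) & \text{if } i\in [m]\backslash \varDelta _{j+1}, \end{array}\right. \end{aligned}$$

    and hands over \(\mathbf {I}\) to \(\mathscr {A}\). With this choice, the entries outside \(\varDelta _{j}\) are distributed as in an honestly generated index with nonce \(cr\): for instance, if \(H(w_{b,i})=g^{s_{b,i}}\), then \(e\left( H(w_{b,i}),(h^{a})^{cr}\right) =e\left( (g^a)^{r},(h^c)^{s_{b,i}}\right) \), and if \(H(w_{b,i})=(g^{b})^{s_{b,i}}\) and \(v=e(g,h)^{abc}\), then \(e\left( H(w_{b,i}),(h^{a})^{cr}\right) =v^{rs_{b,i}}\).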

  • Query Phase 2. \(\mathscr {B}_{j}\) proceeds as in Query Phase 1.

  • Guess. The adversary \(\mathscr {A}\) outputs a guess \(b'\in \{0,1\}\) for b. If \(b=b'\), \(\mathscr {B}_{j}\) outputs 1, and if \(b\ne b'\), \(\mathscr {B}_{j}\) outputs 0.
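The programmable hash oracle used by \(\mathscr {B}_{j}\) follows a lazy-sampling pattern: each new keyword receives a fresh exponent s and a biased coin c, and a trapdoor can be answered without the private key exactly when every coin involved is 0. A minimal Python sketch, assuming a symbolic representation of group elements (a base label paired with an exponent) and hypothetical class and method names, might look like this.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class SimulatedHashOracle:
    """Sketch of the H-list kept by B_j. Outputs are symbolic: ("g", s) means g^s
    and ("g^b", s) means (g^b)^s, since b itself is unknown to B_j."""
    q: int      # group order
    q_t: int    # number of trapdoor queries allowed per query phase
    m: int      # number of keywords per document
    table: dict = field(default_factory=dict)  # keyword -> (s, c)

    def _biased_coin(self) -> int:
        # Pr(c = 1) = 1 / (2 q_T m + 1), drawn exactly with an integer sample
        return 1 if secrets.randbelow(2 * self.q_t * self.m + 1) == 0 else 0

    def query(self, w: str) -> tuple[str, int]:
        if w not in self.table:  # new keyword: sample s and c once, then store them
            self.table[w] = (secrets.randbelow(self.q), self._biased_coin())
        s, c = self.table[w]
        return ("g", s) if c == 0 else ("g^b", s)

    def trapdoor_exponent(self, words: list[str]):
        """Return sum of s_i, so that T_0 = (g^a)^(sum s_i), or None if B_j must halt."""
        for w in words:
            self.query(w)
        if any(self.table[w][1] == 1 for w in words):
            return None  # some H(w_i) = (g^b)^{s_i}: B_j cannot answer and halts
        return sum(self.table[w][0] for w in words) % self.q
```

Drawing the coin as `randbelow(2 * q_t * m + 1) == 0` realizes the probability \(1/(2q_{T}m+1)\) exactly, without floating-point rounding.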

Since the DBDH assumption holds, \(\mathrm {Adv}_{\mathscr {B}_{j}}(\lambda )\) must be negligible in \(\lambda \). Writing \(X=1\) if the input tuple is a real DBDH tuple and \(X=0\) if v is uniformly random, we have

$$\begin{aligned} \mathrm {Adv}_{\mathscr {B}_{j}}(\lambda )&=\left| \mathrm {Pr}(\mathscr {B}_{j}(X)=1 \mid X=1)-\mathrm {Pr}(\mathscr {B}_{j}(X)=1 \mid X=0)\right| \\&=\mathrm {Pr}(\mathscr {B}_{j} \text{ does not halt}) \cdot \left| \mathrm {Adv}_{\mathscr {A},G_{j}}(\lambda ) -\mathrm {Adv}_{\mathscr {A},G_{j+1}}(\lambda )\right| . \end{aligned}$$

By Lemma 2, \(\mathrm {Pr}(\mathscr {B}_{j} \text{ does } \text{ not } \text{ halt })\) is non-negligible in \(\lambda \), and the result is proved. \(\square \)

As a consequence of this result, we conclude the proof of Theorem 2. We next state and prove the lemma referenced in the proof of Proposition 1, which is an adaptation of a result in [9].

Lemma 2

([9]) The probability that algorithm \(\mathscr {B}_{j}\) does not halt is non-negligible in the security parameter \(\lambda \).

Proof

We split the calculations between the query phases and the challenge phase.

In each of the two query phases, \(\mathscr {A}\) may ask for \(q_{T}\) trapdoor queries, where \(q_{T}\) is polynomial in \(\lambda \), and each query involves at most m keywords. Across both phases, this amounts to flipping at most \(2mq_{T}\) coins c with \(\mathrm {Pr}(c=1)=1/(2q_{T}m+1)\). Since \(\mathscr {B}_{j}\) does not halt in the query phases exactly when all of these coin flips come out 0, we have

$$\begin{aligned}&\mathrm {Pr}(\mathscr {B}_{j} \text{ does } \text{ not } \text{ halt } \text{ in } \text{ query } \text{ phases })\\&\quad \ge \left( 1-\frac{1}{2mq_{T}+1}\right) ^{2mq_{T}}\ge 1/e, \end{aligned}$$

which is non-negligible in \(\lambda \).

For the challenge phase, \(\mathscr {B}_{j}\) does not halt exactly when the coin flip corresponding to the keyword in position \(\varDelta _{j+1}\backslash \varDelta _{j}\) (if nonempty) of the chosen challenge document is 1 and the coin flips corresponding to the keywords in positions in \([m]\backslash \varDelta _{j+1}\) of the chosen challenge document are all 0. Since \(|[m]\backslash \varDelta _{j+1}|\le m-1\) whenever \(\mathbf {D}_{0}\ne \mathbf {D}_{1}\), we have:

$$\begin{aligned}&\mathrm {Pr}(\mathscr {B}_{j} \text{ does } \text{ not } \text{ halt } \text{ in } \text{ the } \text{ challenge } \text{ phase })\\&\quad \ge \left( 1-\frac{1}{2mq_{T}+1}\right) ^{m-1}\frac{1}{2mq_{T}+1} \ge \frac{1}{e}\frac{1}{2mq_{T}+1}, \end{aligned}$$

which is non-negligible in \(\lambda \) since m is constant in \(\lambda \) and \(q_{T}\) is polynomial in \(\lambda \), and we get the stated lemma. \(\square \)
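The two bounds in the proof of Lemma 2 are easy to evaluate numerically. The Python sketch below (our illustration; the parameter grid is arbitrary and the function names are hypothetical) computes the non-halting bounds for the query phases and for the challenge phase and checks them against \(1/e\) and \(\frac{1}{e}\frac{1}{2mq_{T}+1}\); `math.log1p` keeps \((1-p)^{n}\) accurate when p is tiny.

```python
import math


def coin_bias(m: int, q_t: int) -> float:
    """Pr(c = 1) = 1 / (2 m q_T + 1)."""
    return 1.0 / (2 * m * q_t + 1)


def query_phase_bound(m: int, q_t: int) -> float:
    """(1 - p)^(2 m q_T): all of the at most 2 m q_T trapdoor-related coins are 0."""
    p = coin_bias(m, q_t)
    return math.exp(2 * m * q_t * math.log1p(-p))


def challenge_phase_bound(m: int, q_t: int) -> float:
    """(1 - p)^(m - 1) * p: the coin at the new hybrid position is 1, the others 0."""
    p = coin_bias(m, q_t)
    return math.exp((m - 1) * math.log1p(-p)) * p


for m in (1, 4, 8):
    for q_t in (1, 10, 1000):
        print(m, q_t, query_phase_bound(m, q_t), challenge_phase_bound(m, q_t))
```

For instance, with \(m=4\) and \(q_{T}=1000\) the challenge-phase bound is at least \(1/(8001e)\): small, but only polynomially so, which is what the argument needs.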

5 Subset \(\mathrm {PEKS}\) scheme

The second PEKS scheme we propose enables a class of subset queries. This class includes subset queries as defined in [11].

Subset queries, as understood by [11], are specified by an ordered tuple of m sets of keywords \((A_{1},\ldots ,A_{m})\). Then, a document \(\mathbf {D}=(w_{1},\ldots ,w_{m})\) satisfies such a query if and only if the predicate \((w_{1}\in A_{1})\wedge \cdots \wedge (w_{m}\in A_{m})\) holds. The scheme we propose considers subsets in a partition of \(\mathbf {D}\) instead of keywords \(w_{i}\) in this last predicate.

More concretely, in the setup algorithm we fix a partition \(J_{1},\ldots ,J_{m}\) of [m]. Given a document \(\mathbf {D}=(w_{1},\ldots ,w_{m})\), write \(B_{i}=\{w_{j}\}_{j\in J_{i}}\) for every \(i\in [m]\). Given a query \(\mathbf {L}=(w'_{1},\ldots ,w'_{l})\), \(J=\{j_{1},\ldots ,j_{l}\}\), where J is written in increasing order, consider \(A_{i}=\{w'_{k}\}_{j_{k}\in J_{i}}\) for every \(i\in [m]\). Then, the document \(\mathbf {D}\) satisfies the query \(\mathbf {L},J\) if and only if the predicate \((B_1\subseteq A_1)\wedge \cdots \wedge (B_m\subseteq A_m)\) holds. Note that we also admit empty keyword fields in documents, which are denoted by the symbol \(\bot \).

For the sake of clarity, before formally stating the proposed construction, we give a brief example illustrating the internal workings of the scheme.

The \(\mathrm {Setup}\) algorithm of the scheme fixes a tuple of possibly repeated field identifiers \((f_{1},\ldots ,f_{m})\). We take \(m=8\) and \((f_{1},\ldots ,f_{8})=(1,1,1,2,2,2,3,3)\) as an example.

When encrypting documents in \(\mathrm {BuildIndex}\), the documents \(\mathbf {D}=(w_{1},\ldots ,w_{m})\) can be thought of as a collection of sets of keywords, where keywords in positions having the same field identifier belong to the same set. Also, the keyword \(\bot \) is allowed, and it stands for a null entry. For instance, following the example above, the document \(\mathbf {D}=(w_{1},w_{2},w_{3},w_{4}, w_{5},\bot ,w_{7},\bot )\) can be thought of as the following collection of sets \((\{w_{1},w_{2},w_{3}\},\{w_{4},w_{5}\}, \{w_{7}\})\).

When generating trapdoors, we input a query consisting of a tuple of keywords \(\mathbf {L}=(w_{1},\ldots ,w_{l})\) and a set of positions \(J=\{j_{1},\ldots ,j_{l}\}\) written in increasing order. As above, keywords in positions having the same field identifier are thought to belong to the same set. Thus, in the example above, the query for words \(\mathbf {L}=(w'_{1},w'_{2},w'_{3},w'_{4},w'_{5},w'_{6}, w'_{7})\) at positions \(J=\{1,2,3,4,5,6,8\}\) can be thought of as the collection of sets \((\{w'_{1},w'_{2},w'_{3}\},\{w'_{4},w'_{5}, w'_{6}\},\{w'_{7}\})\).

Now, a document matches a query in the \(\mathrm {Search}\) algorithm exactly when the sets of keywords defined by the document are contained in the sets of keywords defined by the query, in a sequential way. That is, following the example above, a match happens exactly when

$$\begin{aligned} (\{w_{1},w_{2},w_{3}\}\subseteq \{w'_{1},w'_{2},w'_{3}\}) \wedge (\{w_{4},w_{5}\}\subseteq \{w'_{4},w'_{5},w'_{6}\}) \wedge (\{w_{7}\}\subseteq \{w'_{7}\}). \end{aligned}$$
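The subset predicate can be stated directly on plaintext keywords, which is handy for checking worked examples such as the one above. The Python sketch below is our illustration (the function names are hypothetical, and `None` stands for \(\bot \)); it groups keywords by the field identifier of their position and tests \(B_{t}\subseteq A_{t}\) for every field t.

```python
from collections.abc import Iterable

BOT = None  # stands for the empty keyword field, written ⊥ in the text


def field_sets(words: Iterable, positions: Iterable[int], fields: tuple[int, ...]) -> dict:
    """Map each field identifier f_j to the set of non-⊥ keywords at positions j."""
    sets: dict = {}
    for w, j in zip(words, positions):
        if w is not BOT:
            sets.setdefault(fields[j - 1], set()).add(w)  # positions are 1-indexed
    return sets


def subset_match(doc: list, fields: tuple[int, ...], query: list, positions: list[int]) -> bool:
    """True iff, for every field t, the document's set B_t is a subset of the query's A_t."""
    b = field_sets(doc, range(1, len(doc) + 1), fields)
    a = field_sets(query, positions, fields)
    return all(words <= a.get(t, set()) for t, words in b.items())


fields = (1, 1, 1, 2, 2, 2, 3, 3)
doc = ["w1", "w2", "w3", "w4", "w5", BOT, "w7", BOT]
print(subset_match(doc, fields, ["w1", "w2", "w3", "w4", "w5", "w6", "w7"], [1, 2, 3, 4, 5, 6, 8]))
```

A field in which the document has only \(\bot \) entries imposes no condition, whereas a nonempty document field with no query keyword makes the predicate fail.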

We now describe the proposed subset PEKS scheme. Although we do not write it explicitly, every algorithm other than Setup takes the public parameters as input.

Definition 8

We define a public-key encryption with subset keyword search scheme \(\mathscr {S}_{2}\) by means of the following five polynomial-time algorithms:

\(\mathscr {S}_{2}.\mathrm {Setup}(\lambda )\)::

Given a security parameter \(\lambda \in \mathbb {Z}\), fix a symmetric bilinear group \(\mathbb {G}\) of prime order \(q\ge 2^{\lambda }\) and denote the corresponding pairing by \(e:\mathbb {G}\times \mathbb {G}\rightarrow \mathbb {G}_{T}\). Let g be a random generator of \(\mathbb {G}\). Let \(H:\{0,1\}^{*}\rightarrow \mathbb {G}\) and \(H_{1}:\mathbb {G}_{T}\rightarrow \{0,1\}^{*}\) be collision-free hash functions. Let \(m\in \mathbb {Z}\) be the maximum number of keywords in every document, which we assume to be constant in \(\lambda \) and to satisfy \(m\le (1+\log q)/2\). Define a tuple \((f_{1},\ldots ,f_{m})\) of possibly repeated field identifiers, where each \(f_{i}\in [m]\) indicates the field to which the i-th keyword of a document belongs. Output \(\mathrm {params}=\{\mathbb {G}, q, e, g, H, H_{1}, m, (f_{1},\ldots , f_{m})\}\) as the public parameters of the scheme.

\(\mathscr {S}_{2}.\mathrm {KeyGen}()\)::

Choose \(a\in \mathbb {F}_{q}\backslash \{0\}\) uniformly at random.

Output the private key \(\beta =(\beta _{i})_{i=1}^{m}\) and the public key \(\alpha =(\alpha _{i})_{i=1}^{m}\), where \(\beta _{i}=a^{-i}\) and \(\alpha _{i}=g^{a^{i}}\).

\(\mathscr {S}_{2}.\mathrm {BuildIndex}_{\alpha }(\mathbf {D})\)::

Given as input the document \(\mathbf {D}=(w_{1},\ldots ,w_{m})\), consisting of a tuple of m keywords \(w_{i}\) in the domain \(\{0,1\}^{*}\cup \{\bot \}\), generate uniformly random nonces \(r_{1},\ldots ,r_{m}\in \mathbb {F}_{q}\) such that \(f_{i}=f_{j}\) implies \(r_{i}=r_{j}\); that is, one nonce is drawn per field identifier and shared by all positions of that field.

Set

$$\begin{aligned} I_{0}&= H_{1}\left( e\left( \prod _{i\in [m]:w_{i}\ne \bot } H(w_{i})^{r_{i}},g\right) \right) ,\\ I_{i}&= \alpha _{i}^{r_{i}}\quad \text{ for } i\in [m]. \end{aligned}$$

Output the index \(\mathbf {I}=(I_{0},I_{1},\ldots ,I_{m})\).

\(\mathscr {S}_{2}.\mathrm {Trapdoor}_{\beta }(\mathbf {L},J)\)::

Given \(\mathbf {L}=(w_{1},\ldots ,w_{l})\) the input tuple of keywords with \(l\le m\), and the set of keyword fields \(J=\{j_{1},\ldots , j_{l}\}\subseteq [m]\) written in increasing order, set

$$\begin{aligned} T_{i} = H(w_{i})^{\beta _{j_{i}}}\quad \text{ for } i\in \{1,\ldots ,l\}.\\ \end{aligned}$$

Output the trapdoor \(\mathbf {T}\), consisting of \(T_{1},\ldots ,T_{l}\) along with the fields J to be queried.

\(\mathscr {S}_{2}.\mathrm {Search}(\mathbf {I},\mathbf {T})\)::

Denote the index by \(\mathbf {I}=(I_{0},I_{1},\ldots ,I_{m})\) and the trapdoor by \(\mathbf {T}=(T_{1},\ldots ,T_{l},\{j_{1},\ldots ,j_{l}\})\). For every \(t\in [m]\), let \(J_{t}\) denote the set of elements \(i\in [l]\) such that \(f_{j_{i}}=t\). For every \(i\in [l]\), compute \(v_{i}=e\left( T_{i},I_{j_{i}}\right) \).

Output 1 if there exist subsets \(J'_{t}\subseteq J_{t}\) for \(t\in [m]\) such that

$$\begin{aligned} I_{0}=H_{1}\left( \prod _{t=1}^{m}\prod _{i\in J'_{t}} v_{i}\right) . \end{aligned}$$

Otherwise output 0.
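The following Python sketch exercises \(\mathscr {S}_{2}\) end to end in a toy, insecure model chosen for this illustration only: elements of \(\mathbb {G}\) and \(\mathbb {G}_{T}\) are represented by their discrete logarithms modulo a prime, so that \(e(g^{x},g^{y})\) corresponds to \(xy\), and SHA-256 stands in for H and \(H_{1}\) (the helper names are hypothetical). It reflects the identity \(e(T_{i},I_{j_{i}})=e(H(w_{i}),g)^{a^{-j_{i}}a^{j_{i}}r_{j_{i}}}=e(H(w_{i}),g)^{r_{j_{i}}}\) on which Search relies. Since every query position lies in exactly one \(J_{t}\), choosing subsets \(J'_{t}\subseteq J_{t}\) for all t amounts to choosing one subset of [l], which is what the loop enumerates.

```python
import hashlib
import secrets
from itertools import combinations

# Toy, INSECURE model of a symmetric pairing group: elements of G and GT are stored
# as discrete logs modulo the prime Q, and e(g^x, g^y) corresponds to x*y mod Q.
Q = 2**61 - 1  # a Mersenne prime standing in for the group order q


def h_log(word: str) -> int:
    """Hypothetical stand-in for H: {0,1}* -> G (log of H(word))."""
    return int.from_bytes(hashlib.sha256(word.encode()).digest(), "big") % Q


def h1(gt_log: int) -> bytes:
    """Hypothetical stand-in for H_1: GT -> {0,1}^256."""
    return hashlib.sha256(gt_log.to_bytes(8, "big")).digest()


def keygen(m: int) -> tuple[list[int], list[int]]:
    a = secrets.randbelow(Q - 1) + 1                 # a in F_q \ {0}
    alpha = [pow(a, i, Q) for i in range(1, m + 1)]  # alpha_i = g^(a^i), as logs
    beta = [pow(a, -i, Q) for i in range(1, m + 1)]  # beta_i = a^(-i) mod q
    return alpha, beta


def build_index(alpha: list[int], fields: list[int], doc: list) -> list:
    nonce = {f: secrets.randbelow(Q) for f in set(fields)}  # one nonce per field
    r = [nonce[f] for f in fields]                          # f_i = f_j implies r_i = r_j
    # log of e(prod_{w_i != ⊥} H(w_i)^{r_i}, g); None plays the role of ⊥
    acc = sum(h_log(w) * ri for w, ri in zip(doc, r) if w is not None) % Q
    return [h1(acc)] + [(al * ri) % Q for al, ri in zip(alpha, r)]  # I_0, then I_i


def trapdoor(beta: list[int], words: list[str], positions: list[int]):
    t = [(h_log(w) * beta[j - 1]) % Q for w, j in zip(words, positions)]  # T_i = H(w_i)^{beta_{j_i}}
    return t, list(positions)


def search(index: list, trap) -> int:
    t, positions = trap
    v = [(ti * index[j]) % Q for ti, j in zip(t, positions)]  # v_i = e(T_i, I_{j_i})
    for size in range(len(v) + 1):  # enumerate all 2^l subsets of [l]
        for subset in combinations(v, size):
            if h1(sum(subset) % Q) == index[0]:
                return 1
    return 0
```

The enumeration costs \(2^{l}\) evaluations of \(H_{1}\), which stays bounded because \(l\le m\) and m is constant in \(\lambda \).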

In the following example, we describe an application of our subset PEKS scheme. We follow the e-mail gateway scenario mentioned in Sect. 1 and proposed by Boneh et al. [9].

Example 1

Consider a user Alice who reads her e-mail on various devices (such as a laptop, a smartphone and a desktop computer). Suppose that each message is tagged with a sequence of at most four keywords to aid classification. Further suppose that the first tag defines the priority of the message (e.g., “urgent” or “low_priority”) and that the last three describe its category (e.g., “social”, “advertising” or “work”), so e-mails have the structure

$$\begin{aligned} \text {message} \Vert \text {priority}\_\text {tag} \Vert \text {cat}\_\text {tag}\_1 \Vert \text {cat}\_\text {tag}\_2 \Vert \text {cat}\_\text {tag}\_3. \end{aligned}$$

Alice receives her e-mail through a gateway, which distributes all messages to her devices according to the attached tags. For privacy reasons, Alice wants her e-mail gateway neither to read her e-mail messages nor to learn anything about the attached tags. However, she still wants her e-mail gateway to classify and distribute messages to her devices correctly. Hence, she sets up a public-key encryption scheme \((\mathrm {Gen},\mathrm {Enc},\mathrm {Dec})\) and disseminates her public-key material, so that senders can send her messages in encrypted form.

In this context, our subset PEKS scheme can be used to encrypt the keyword tags.

To set up our subset PEKS scheme in the described setting, Alice would first execute \(\mathscr {S}_{2}.\mathrm {Setup}(\lambda )\) with \(m=4\) and \((f_1,\ldots ,f_4)=(1,2,2,2)\). Then, she would generate the public key \(\alpha \) and the private key \(\beta \) by calling \(\mathscr {S}_{2}.\mathrm {KeyGen}()\) and disseminate the public key.

Suppose that a user Bob wants to send Alice an e-mail message M with low priority to schedule a work meeting. He thus chooses the priority tag “low_priority” and the category tags “work” and “meeting”. He can then generate the index

$$\begin{aligned} \mathbf {I}=\mathscr {S}_{2}.\mathrm {BuildIndex}_{\alpha } ((\mathrm {low\_priority},\mathrm {work},\mathrm {meeting},\bot )) \end{aligned}$$

and send \(\mathrm {Enc}(M) \Vert \mathbf {I}\) to Alice’s e-mail gateway.

Now, suppose that Alice wants to restrict the e-mails she receives on her smartphone to personal e-mails, urgent work e-mails and work e-mails for scheduling meetings. To do so, she can send the following trapdoors to her e-mail gateway

$$\begin{aligned}&\mathbf {T}=\mathscr {S}_{2}.\mathrm {Trapdoor}_{\beta } ((\mathrm {personal}),\{2\})\\&\mathbf {T}'=\mathscr {S}_{2}.\mathrm {Trapdoor}_{\beta } ((\mathrm {urgent},\mathrm {work}),\{1,2\})\\&\mathbf {T}''=\mathscr {S}_{2}.\mathrm {Trapdoor}_{\beta } ((\mathrm {work},\mathrm {meeting}),\{2,3\}). \end{aligned}$$

When receiving an encrypted e-mail of the form \(\mathrm {Enc}(M) \Vert \mathbf {I}\), Alice’s e-mail gateway is trusted to forward it to her smartphone exactly when at least one of the evaluations \(\mathscr {S}_{2}.\mathrm {Search}(\mathbf {I},\mathbf {T})\), \(\mathscr {S}_{2}.\mathrm {Search}(\mathbf {I},\mathbf {T}')\) and \(\mathscr {S}_{2}.\mathrm {Search}(\mathbf {I},\mathbf {T}'')\) returns 1.

We next give the consistency and security theorems for our scheme. The proofs are deferred to Sects. 6.1 and 6.2, respectively.

Theorem 3

The proposed subset PEKS scheme \(\mathscr {S}_{2}\) is computationally consistent under the random oracle model.

Theorem 4

Assume that the \((m+1)-\mathrm {BDHI}\) assumption holds. Then, the proposed subset PEKS scheme \(\mathscr {S}_{2}\) is semantically secure against adaptive chosen keyword attacks under the random oracle model.

6 Consistency and security proofs for the subset \(\mathrm {PEKS}\) scheme \(\mathscr {S}_{2}\)

In this section, we give the consistency and security proofs for the subset PEKS scheme \(\mathscr {S}_{2}\).

6.1 Consistency proof for the subset \(\mathrm {PEKS}\) scheme \(\mathscr {S}_{2}\)

We dedicate this section to the proof of Theorem 3.

We now prove consistency of the scheme \(\mathscr {S}_{2}\) in the random oracle model, similarly to the proof by Abdalla et al. [1].

Let \(\mathscr {A}\) be a PPT adversary in the consistency game defined in Sect. 2.3, which has access to the public parameters, to the public key \(\mathrm {pk}\) and to the hash oracles \(H,H_{1}\) modeled as random oracles. Let \(\mathrm {WSet}\) and \(\mathrm {TSet}\) be the sets of inputs queried to the hash oracles H and \(H_{1}\) throughout the game, respectively, whose sizes \(q_{H}\) and \(q_{H_{1}}\) are polynomial in \(\lambda \). Write \((f_{1},\ldots ,f_{m})\) the tuple of field identifiers in the public parameters. Let \(\mathbf {D}=(w_{1},\ldots ,w_{m})\) and \(\mathbf {L}=(w'_{1},\ldots ,w'_{l})\) and \(J=\{j_{1},\ldots ,j_{l}\}\subseteq [m]\) (written in increasing order) denote the guess of \(\mathscr {A}\) in the Guess phase.

For \(j\in [m]\), let \(\mathbf {D}_{f_{j}}\) denote the set of keywords in \(\mathbf {D}\) at positions having field identifier \(f_{j}\). Let \(\tilde{J}\) be the set of positions \(j_i\in J\) such that \(w'_{i}\notin \mathbf {D}_{f_{j_i}}\). Without loss of generality, we rule out adversaries choosing \(\tilde{J}=\emptyset \) in the Guess phase. Let \(r_{1},\ldots ,r_{m}\in \mathbb {F}_{q}\) denote the random nonces generated by \(\mathscr {A}\) in the encrypted index generation of the Output phase.

Denote \(X=e\left( \prod _{i\in [m]:w_{i}\ne \bot }H(w_{i})^{r_{i}}, g\right) \) and, given a subset \(J'\subset J\), denote \(X'=e \left( \prod _{j_{i}\in J'}H(w'_{i})^{r_{j_i}},g\right) \). Now note that the output of \(\mathscr {A}\) in the consistency game is 1 if and only if \(X=X'\) or \(H_{1}(X)=H_{1}(X')\) for some \(J'\subseteq J\) with \(J'\cap \tilde{J}\ne \emptyset \).

Let E denote the event that there exist \(\mathbf {D}=(w_{1}, \ldots ,w_{m})\), \(\mathbf {L}=(w'_{1},\ldots ,w'_{l})\) and \(J=\{j_{1},\ldots ,j_{l}\}\subseteq [m]\), among all possible guesses taking words in \(\mathrm {WSet}\), such that \(\prod _{i\in [m]:w_{i}\ne \bot }H(w_{i})^{r_{i}} =\prod _{i\in [l]}H(w'_{i})^{r_{j_i}}\). Likewise, let \(E_{1}\) be the event that there exist distinct \(T,T'\in \mathrm {TSet}\) such that \(H_{1}(T)=H_{1}(T')\).

If all \(r_{1},\ldots ,r_{m}\ne 0\), then \(X=X'\) has nonzero probability of happening only when E happens (note that by ranging over all possible J we remove the need to include the \(J'\) above in the argument). Likewise \(H_{1}(X)=H_{1}(X')\) has nonzero probability of happening only when \(E_{1}\) happens. Therefore,

$$\begin{aligned} \mathrm {Adv}_{\mathscr {A}}(\lambda )\le \left( 1-\frac{m}{q}\right) (\mathrm {Pr}(E) +\mathrm {Pr}(E_{1}))+\frac{m}{q}. \end{aligned}$$

Since \(q\ge 2^{\lambda }\) and m is constant in \(\lambda \), it suffices to prove that \(\mathrm {Pr}(E)\) and \(\mathrm {Pr}(E_{1})\) are negligible in \(\lambda \).

By computing the probability of the complementary event and using the binomial inequality, we see that \(\mathrm {Pr}(E_{1})\le q_{H_{1}}^{2}/q\). Now, since H is modeled as a random oracle and since inversion permutes group elements, by using Lemma 1 we see that \(\mathrm {Pr}(E)\le q_{H}^{2m}\frac{m2^{2m}}{q}\). The obtained bounds are indeed negligible in \(\lambda \), since \(q\ge 2^{\lambda }\), and \(m,q_{H}\) are assumed to be constant and polynomial in \(\lambda \), respectively.
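The bound on \(\mathrm {Pr}(E_{1})\) is a birthday-type estimate. Writing \(n=q_{H_{1}}\) and N for the size of the range of \(H_{1}\), and assuming \(N\ge q\) and \(n\le N\) (which the stated bound implicitly requires), the n distinct queries receive independent uniform answers, so

$$\begin{aligned} \mathrm {Pr}(E_{1}^{c}) = \prod _{k=0}^{n-1}\left( 1-\frac{k}{N}\right) \ge \left( 1-\frac{n}{N}\right) ^{n} \ge 1-\frac{n^{2}}{N}, \end{aligned}$$

and therefore \(\mathrm {Pr}(E_{1})\le n^{2}/N\le q_{H_{1}}^{2}/q\).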

As a consequence of this result, we conclude the proof of Theorem 3.

6.2 Security proof for the subset \(\mathrm {PEKS}\) scheme \(\mathscr {S}_{2}\)

We dedicate this section to the proof of Theorem 4.

We prove security in the random oracle model by following a technique similar to that of [9].

Suppose that there exists a PPT adversary \(\mathscr {A}\) winning the security game defined in Sect. 2.4.2 with advantage non-negligible in \(\lambda \). We then build a PPT algorithm \(\mathscr {B}\) that takes an \((m+1)-\mathrm {BDHI}\) challenge tuple \((g, g^{a},\ldots ,g^{a^{m+1}})\) as input and, by interacting with \(\mathscr {A}\) as the challenger in the security game defined in Sect. 2.4.2, computes \(e\left( g,g\right) ^{1/a}\) with probability non-negligible in \(\lambda \).

  • Setup. The challenger \(\mathscr {B}\) runs \(\mathrm {Setup}(\lambda )\) to generate the public parameters of the scheme

    $$\begin{aligned} \mathrm {params}=\{\mathbb {G}, q, e, g, H, H_{1}, m, (f_{1},\ldots , f_{m})\}, \end{aligned}$$

    where \(H,H_{1}\) are handles to the hash oracles described below. \(\mathscr {B}\) hands over the public parameters to \(\mathscr {A}\).

  • Keygen. The challenger \(\mathscr {B}\) hands over the public key \((g^{a},\ldots ,g^{a^m})\) to \(\mathscr {A}\).

  • Hash Oracle H. The oracle is operated by \(\mathscr {B}\), which maintains a list of tuples of the form \(\langle w, s, c\rangle \) with \(w\in \{0,1\}^{*}\), \(s\in \mathbb {F}_{q}\) and \(c\in \{0,1\}\). The list is initially empty. On input a keyword \(w\in \{0,1\}^{*}\), the oracle H operates as follows:

    1. If there is an item in the list whose first element is keyword w, denote it by \(\langle w, s, c\rangle \). Then:

      (a) If \(c=0\), the oracle returns \(g^s\).

      (b) If \(c=1\), the oracle returns \(\left( g^{a^{m+1}}\right) ^s\).

    2. If there is no item in the list whose first element is keyword w, then the oracle flips a coin \(c\in \{0,1\}\) with \(\mathrm {Pr}(c=1)=1/(2q_{T}m+1)\), samples \(s\in \mathbb {F}_{q}\) uniformly at random and inserts \(\langle w, s, c\rangle \) into the list. Then, it proceeds to give an output as in the previous point.

  • Hash Oracle \(H_{1}\). The oracle is operated by \(\mathscr {B}\), which maintains a list of tuples of the form \(\langle t, V\rangle \) with \(t\in \mathbb {G}_{T}\) and \(V\in \{0,1\}^{*}\). The list is initially empty. On input an element \(t\in \mathbb {G}_{T}\), the oracle \(H_{1}\) operates as follows:

    1. If there is an item in the list whose first element is t, denote it by \(\langle t, V\rangle \). The oracle returns V.

    2. If there is no item in the list whose first element is t, then the oracle samples \(V\in \mathbb {G}_{T}\) uniformly at random and inserts \(\langle t, V\rangle \) into the list. Then, it proceeds to give an output as in the previous point.

  • Query Phase 1. When \(\mathscr {A}\) requests a trapdoor for keywords \(\mathbf {L}=(w_{1},\ldots ,w_{l})\) in positions \(J=\{j_{1},\ldots ,j_{l}\}\) written in increasing order, the algorithm \(\mathscr {B}\) first calls the H oracle on input each keyword \(w_{i}\) and retrieves the associated oracle list tuples \(\langle w_{i},s_{i},c_{i}\rangle \). Then, if some coin flip \(c_{i}=1\), \(\mathscr {B}\) halts. Otherwise, \(\mathscr {B}\) hands over to \(\mathscr {A}\) the trapdoor \(\mathbf {T}\) consisting of \(T_{i}=(g^{a^{m-j_{i}+1}})^{s_{i}}\) for \(i\in \{1,\ldots ,l\}\), and J.

  • Challenge. The adversary outputs two documents \(\mathbf {D}_{0}=(w_{0,1},\ldots ,w_{0,m})\), \(\mathbf {D}_{1}=(w_{1,1},\ldots ,w_{1,m})\) with the restrictions stated in the security game defined in Sect. 2.4.2, and \(\mathscr {B}\) flips a fair coin \(b\in \{0,1\}\).

    Then, \(\mathscr {B}\) calls the hash oracle H on every keyword \(w_{b,i}\) to fill the H-list with tuples \(\langle w_{b,i},s_{b,i},c_{b,i}\rangle \). The algorithm \(\mathscr {B}\) halts if some \(c_{b,i}=1\).

    Then \(\mathscr {B}\) uniformly chooses \(J\in \mathbb {G}_{T}\) and random nonces \(r_{1},\ldots , r_{m}\in \mathbb {F}_{q}\) in such a way that if \(f_{i}=f_{j}\) then \(r_{i}=r_{j}\), and it computes the challenge \(\mathbf {I}=(I_{0},I_{1},\ldots ,I_{m})\) in the following way

    $$\begin{aligned} I_{0} = J, \quad I_{i} = \left( g^{a^{i-1}}\right) ^{r_{i}} \end{aligned}$$

    and hands over \(\mathbf {I}\) to \(\mathscr {A}\). In addition, \(\mathscr {B}\) halts if \(\sum _{i} s_{b,i}r_{i} \equiv 0 \;(\bmod \; q)\), and if not, it stores \(C=(\sum _{i} s_{b,i}r_{i})^{-1} \;(\bmod \; q)\).

  • Query Phase 2. \(\mathscr {B}\) proceeds as in Query Phase 1.

  • Guess. The adversary \(\mathscr {A}\) outputs a guess \(b'\in \{0,1\}\) for b. Then, \(\mathscr {B}\) picks a random element \(\langle t, V\rangle \) from the list in \(H_{1}\), and returns \(t^{C}\).
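The simulation of H and of the trapdoors in the game above can be sketched in Python using a deliberately insecure toy representation, assumed here purely to check the index bookkeeping: each group element \(g^x\) is stored as its exponent x modulo a prime Q, so that exponentiation becomes multiplication mod Q. The names ReductionH and simulate_trapdoor, the choice \(Q=2^{61}-1\) and the table layout are hypothetical. The sketch highlights that \(\mathscr {B}\) only uses entries of the challenge tuple: \(g^{a^{m+1}}\) for hash outputs with \(c=1\), and \(g^{a^{1}},\ldots ,g^{a^{m}}\) for trapdoor components, since \(1\le m-j_{i}+1\le m\).

```python
import random

Q = 2**61 - 1  # toy prime group order; g^x is represented by its exponent x mod Q (NOT secure)


class ReductionH:
    """Random oracle H as programmed by B: a lazily filled list of tuples <w, s, c>."""

    def __init__(self, tuple_exps, m, q_t, rng):
        self.tuple_exps = tuple_exps          # tuple_exps[k] stands for g^{a^k}, k = 0..m+1
        self.m = m
        self.p_one = 1 / (2 * q_t * m + 1)    # Pr(c = 1)
        self.rng = rng
        self.table = {}                       # w -> (s, c)

    def entry(self, w):
        """Return the list tuple (s, c) for w, creating it on first use."""
        if w not in self.table:
            c = 1 if self.rng.random() < self.p_one else 0
            self.table[w] = (self.rng.randrange(Q), c)  # s uniform in F_q
        return self.table[w]

    def __call__(self, w):
        s, c = self.entry(w)
        if c == 0:
            return s                                   # g^s
        return self.tuple_exps[self.m + 1] * s % Q     # (g^{a^{m+1}})^s


def simulate_trapdoor(H, keywords, positions):
    """Query Phase: T_i = (g^{a^{m - j_i + 1}})^{s_i}; returns None when B halts (some c_i = 1)."""
    entries = [H.entry(w) for w in keywords]
    if any(c == 1 for _, c in entries):
        return None
    T = [H.tuple_exps[H.m - j + 1] * s % Q for j, (s, _) in zip(positions, entries)]
    return T, list(positions)
```

Because the table is keyed by the keyword, repeated queries return the same value, as required of a random oracle, and since s is uniform, both branches output a uniformly distributed element.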

Note that the challenge is well-formed and that it implicitly imposes the equality \(J=H_{1}\left( e\left( g^{\sum _{i} s_{b,i}r_{i}},g\right) ^{1/a}\right) \). If \(\mathscr {B}\) does not halt, then it perfectly simulates a real attack game up until the moment when \(\mathscr {A}\) issues an \(H_{1}\) oracle query for \(t_{0}=e\left( g^{\sum _{i} s_{0,i}r_{i}},g\right) ^{1/a}\) or \(t_{1}=e\left( g^{\sum _{i} s_{1,i}r_{i}},g\right) ^{1/a}\).
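The challenge and the Guess phase can be checked in the same kind of toy exponent representation (insecure, and assumed only for illustration), where both \(g^x\) and \(e(g,g)^x\) are stored as x mod Q, so that \(e(g^x,g^y)\) corresponds to xy mod Q. The sketch draws one nonce per distinct field identifier, builds \(\mathbf {I}\), stores C, and confirms that if \(\mathscr {B}\) picks \(t_{b}\) from the \(H_{1}\) list, then \(t_{b}^{C}=e\left( g,g\right) ^{1/a}\). All function names are hypothetical, and the adversary's query \(t_{b}\) is computed directly from a because the toy does not model \(\mathscr {A}\).

```python
import random

Q = 2**61 - 1  # toy prime order; g^x and e(g,g)^x are both represented by x mod Q (NOT secure)


def nonces_per_field(fields, rng):
    """Draw one nonce per distinct field identifier, so that f_i = f_j implies r_i = r_j."""
    chosen = {}
    r = []
    for f in fields:
        if f not in chosen:
            chosen[f] = rng.randrange(Q)
        r.append(chosen[f])
    return r


def simulate_challenge(tuple_exps, fields, s_b, rng):
    """I_0 = J (uniform in G_T), I_i = (g^{a^{i-1}})^{r_i} for i = 1..m.
    Returns (I, C, r) with C = (sum_i s_{b,i} r_i)^{-1} mod q, or None if B halts."""
    r = nonces_per_field(fields, rng)
    J = rng.randrange(Q)
    I = [J] + [tuple_exps[i] * r[i] % Q for i in range(len(fields))]  # tuple_exps[i] ~ g^{a^i}
    total = sum(s * ri for s, ri in zip(s_b, r)) % Q
    if total == 0:
        return None  # B halts
    return I, pow(total, -1, Q), r


def guess_phase(h1_inputs, C, rng):
    """Guess: pick a uniformly random input t from the H_1 list and output t^C."""
    t = rng.choice(h1_inputs)
    return t * C % Q  # exponentiation in G_T is multiplication of exponents


rng = random.Random(2024)
m = 3
a = rng.randrange(1, Q)
tuple_exps = [pow(a, k, Q) for k in range(m + 2)]  # g, g^a, ..., g^{a^{m+1}}
s_b = [rng.randrange(Q) for _ in range(m)]         # s_{b,i} from the H-list
I, C, r = simulate_challenge(tuple_exps, ["f1", "f2", "f1"], s_b, rng)
t_b = sum(s * ri for s, ri in zip(s_b, r)) * pow(a, -1, Q) % Q  # e(g^{sum s r}, g)^{1/a}
print(guess_phase([t_b], C, rng) == pow(a, -1, Q))  # prints True
```

Any other list entry, such as \(t_{1-b}\) or an unrelated query, is in general mapped to a different value, which is where the factor \(1/q_{H_{1}}\) in the analysis comes from.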

Let \(\mathscr {E}\) denote the event that \(\mathscr {A}\) issues a query for \(t_{0}\) or \(t_{1}\) in a real attack game. We now lower bound the probability of \(\mathscr {E}\). Under the random oracle model, if \(\mathscr {E}\) does not happen, then \(\mathscr {B}\) does not reveal any information about b. Therefore,

$$\begin{aligned} \mathrm {Pr}(b'=b)&=\mathrm {Pr}(\mathscr {E})\mathrm {Pr}(b'=b|\mathscr {E}) +\frac{1}{2}\mathrm {Pr}(\lnot \mathscr {E})\\&=\frac{1}{2}+\mathrm {Pr}(\mathscr {E})\left( \mathrm {Pr}(b'=b|\mathscr {E}) -\frac{1}{2}\right) \end{aligned}$$

so we can express the advantage of \(\mathscr {A}\) by

$$\begin{aligned} \mathrm {Adv}_{\mathscr {A}}(\lambda )&=\left| \mathrm {Pr}(b'=b)-\frac{1}{2}\right| \\&=\mathrm {Pr}(\mathscr {E})\cdot \left| \mathrm {Pr}(b'=b|\mathscr {E}) -\frac{1}{2} \right| \le \frac{1}{2}\mathrm {Pr}(\mathscr {E}) \end{aligned}$$

and \(\mathrm {Pr}(\mathscr {E})\ge 2\mathrm {Adv}_{\mathscr {A}}(\lambda )\).
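The two-step derivation above can be double-checked with exact rational arithmetic. The following Python sketch (function names hypothetical) encodes \(\mathrm {Pr}(b'=b)\) as a function of \(\mathrm {Pr}(\mathscr {E})\) and \(\mathrm {Pr}(b'=b|\mathscr {E})\), under the stated assumption that \(\mathscr {A}\) guesses correctly with probability exactly 1/2 when \(\mathscr {E}\) does not occur, and verifies \(\mathrm {Adv}_{\mathscr {A}}(\lambda )\le \mathrm {Pr}(\mathscr {E})/2\) on a grid of probabilities.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def win_probability(pr_e: Fraction, pr_win_given_e: Fraction) -> Fraction:
    """Pr(b' = b), assuming A is correct with probability exactly 1/2 when E does not occur."""
    return pr_e * pr_win_given_e + HALF * (1 - pr_e)


def advantage(pr_e: Fraction, pr_win_given_e: Fraction) -> Fraction:
    """Adv_A = |Pr(b' = b) - 1/2|."""
    return abs(win_probability(pr_e, pr_win_given_e) - HALF)


# Check both lines of the derivation and the bound Adv_A <= Pr(E)/2 on a grid of probabilities.
grid = [Fraction(k, 20) for k in range(21)]
for pe in grid:
    for pw in grid:
        assert win_probability(pe, pw) == HALF + pe * (pw - HALF)
        assert advantage(pe, pw) <= pe / 2

print(advantage(Fraction(3, 10), Fraction(9, 10)))
```

For instance, \(\mathrm {Pr}(\mathscr {E})=3/10\) and \(\mathrm {Pr}(b'=b|\mathscr {E})=9/10\) give \(\mathrm {Pr}(b'=b)=31/50\) and an advantage of 3/25, below the bound 3/20.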

Now, suppose that

  1. \(\mathscr {B}\) does not abort,

  2. \(\mathscr {A}\) eventually issues an \(H_{1}\) oracle query for either \(t_{0}\) or \(t_{1}\), i.e., \(\mathscr {E}\) happens, and

  3. \(\mathscr {B}\) chooses b such that \(\mathscr {A}\) queries the \(H_{1}\) oracle for \(t_{b}\) (this is well defined, since \(\mathscr {A}\) does not receive any information about b until \(\mathscr {E}\) happens).

Then, if \(\mathscr {B}\) picks the tuple with first element \(t_{b}\) from the \(H_{1}\) list in the Guess phase, its output \(t_{b}^{C}\) equals \(e\left( g,g\right) ^{1/a}\), since \(t_{b}=e\left( g,g\right) ^{\left( \sum _{i} s_{b,i}r_{i}\right) /a}\) and \(C=(\sum _{i} s_{b,i}r_{i})^{-1}\). Since \(\mathscr {B}\) samples its Guess-phase output uniformly from all inputs processed by the hash oracle \(H_{1}\), the probability of \(\mathscr {B}\) breaking the \((m+1)-\mathrm {BDHI}\) assumption in the above situation is at least \(1/q_{H_{1}}\), where \(q_{H_{1}}\) is the (polynomially bounded) number of queries issued to the \(H_{1}\) oracle. This implies

$$\begin{aligned} \mathrm {Adv}_{\mathscr {B}}(\lambda )&=\mathrm {Pr}\left( \mathscr {B} \left( g,\ldots ,g^{a^{m+1}}\right) =e\left( g,g\right) ^{1/a}\right) \\&\ge \frac{1}{q_{H_{1}}}\mathrm {Pr}(\mathscr {B} \text{ does } \text{ not } \text{ abort })\mathrm {Pr}(\mathscr {E})\frac{1}{2}\\&\ge \frac{1}{q_{H_{1}}}\mathrm {Pr}(\mathscr {B} \text{ does } \text{ not } \text{ abort }) \mathrm {Adv}_{\mathscr {A}}(\lambda ). \end{aligned}$$

By following the same argument as in Lemma 2, we see that \(\mathrm {Pr}(\mathscr {B} \text{ does } \text{ not } \text{ abort })\) is non-negligible in \(\lambda \). Since \(\mathscr {A}\) breaks the security game defined in Sect. 2.4.2 with non-negligible advantage and \(q_{H_{1}}\) is polynomial in \(\lambda \), we conclude that \(\mathrm {Adv}_{\mathscr {B}}(\lambda )\) is non-negligible as well.
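To make the loss in the reduction concrete, the following Python sketch evaluates the bound on \(\mathrm {Adv}_{\mathscr {B}}(\lambda )\) with exact fractions. The non-abort probability and the other inputs in the example are arbitrary illustrative values, not quantities derived in Lemma 2, and the function name is hypothetical.

```python
from fractions import Fraction


def adv_b_lower_bound(q_h1: int, pr_no_abort: Fraction, pr_e: Fraction) -> Fraction:
    """(1/q_H1) * Pr(B does not abort) * Pr(E) * 1/2, where the factor 1/2
    corresponds to condition 3 (B's coin b selects the t_b queried by A)."""
    return Fraction(1, q_h1) * pr_no_abort * pr_e / 2


adv_a = Fraction(1, 2**10)
# Using Pr(E) >= 2 * Adv_A, the bound becomes (1/q_H1) * Pr(no abort) * Adv_A.
print(adv_b_lower_bound(2**20, Fraction(1, 4), 2 * adv_a))
```

With \(q_{H_{1}}=2^{20}\), a non-abort probability of 1/4 and \(\mathrm {Adv}_{\mathscr {A}}(\lambda )=2^{-10}\), the bound is \(2^{-32}\): the reduction loses only a polynomial factor whenever \(q_{H_{1}}\) is polynomial and the non-abort probability is non-negligible.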

As a consequence of this result, we conclude the proof of Theorem 4.

7 Efficiency analysis

We next lay out the efficiency measures of the proposed schemes and of other similar searchable encryption schemes. We state the size of an encrypted index and of a trapdoor, the time needed to generate each of them, and the time taken to perform a search operation. We omit multiplication time, hash evaluation time, key setup time, field identifier size and key storage size from the efficiency analysis. The search time refers to a search operation over a single encrypted index, which matches the application examples described in Sect. 1; the search time over a set of encrypted indexes therefore scales linearly in the number of encrypted indexes.

To analyze performance, we implement our schemes \(\mathscr {S}_{1}\) and \(\mathscr {S}_{2}\) and the schemes in [31] using the PBC library [27], and we provide the estimated running times for each algorithm. All simulations were run on an Intel® Core™ i7-4510U CPU at 2.00 GHz with 8 GB of memory under Ubuntu 16.04.1 LTS.

For the sake of comparison, we use symmetric bilinear groups (type A pairings in the PBC library documentation) in all implementations. Also, as suggested in the PBC library documentation, we fix a 512-bit base field order and a 160-bit group order to instantiate the bilinear group. In our implementations, we did not use preprocessing or any of the functions that the PBC library provides in order to speed up computations.
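The text does not specify how the running-time estimates were obtained beyond the library and hardware. As a hedged illustration only, an averaging harness of the following kind could produce per-call estimates; it is written in Python with a placeholder workload, whereas the actual implementations use the PBC library, a C library whose API is not reproduced here. The function name estimate_ms is hypothetical.

```python
import time
from statistics import mean


def estimate_ms(fn, *args, repeats: int = 100) -> float:
    """Average wall-clock time of fn(*args) in milliseconds over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


# Placeholder workload; a real benchmark would time index generation, trapdoor generation and search.
print(f"{estimate_ms(sum, range(100_000), repeats=20):.3f} ms")
```

Averaging over many repetitions smooths out timer resolution and scheduling noise, which matters for operations that take only a few milliseconds.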

7.1 Conjunctive PEKS scheme

Since we restrict the analysis to conjunctive PEKS schemes, the schemes [3, 16, 32, 33, 35] lie outside the scope of the analysis. However, for the sake of comparison, we include the single-keyword PEKS scheme [9] by Boneh et al., extending it to the conjunctive case by concatenating indexes and trapdoors and evaluating the Search algorithm sequentially. Note that, in the single-keyword case \(m=l=1\), our scheme and [9] have similar efficiency marks.
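The extension of a single-keyword scheme to the conjunctive case described above (concatenating indexes and trapdoors and evaluating Search sequentially) can be sketched generically as follows. The single-keyword interface (encrypt, trapdoor, test) and all names are assumptions for illustration; to keep the example runnable, the stand-in ToySingleKeyword is a symmetric HMAC-based construction, which is neither public-key nor the scheme of [9], and it is used only to exercise the wrapper.

```python
import hashlib
import hmac
import os
from typing import List, Sequence, Tuple


class ToySingleKeyword:
    """Hypothetical stand-in with an encrypt/trapdoor/test interface.
    Symmetric HMAC toy: NOT a public-key PEKS and NOT the scheme of [9]."""

    def __init__(self, key: bytes) -> None:
        self.key = key

    def trapdoor(self, w: bytes) -> bytes:
        return hmac.new(self.key, w, hashlib.sha256).digest()

    def encrypt(self, w: bytes) -> Tuple[bytes, bytes]:
        nonce = os.urandom(16)  # randomized: the same keyword yields different ciphertexts
        return nonce, hmac.new(self.trapdoor(w), nonce, hashlib.sha256).digest()

    def test(self, ct: Tuple[bytes, bytes], t: bytes) -> bool:
        nonce, tag = ct
        return hmac.compare_digest(hmac.new(t, nonce, hashlib.sha256).digest(), tag)


def conj_index(scheme, document: Sequence[bytes]) -> List[Tuple[bytes, bytes]]:
    """Concatenate one single-keyword ciphertext per field (m ciphertexts)."""
    return [scheme.encrypt(w) for w in document]


def conj_trapdoor(scheme, positions: Sequence[int], keywords: Sequence[bytes]) -> List[Tuple[int, bytes]]:
    """Concatenate one single-keyword trapdoor per queried field (l trapdoors, 1-based positions)."""
    return [(j, scheme.trapdoor(w)) for j, w in zip(positions, keywords)]


def conj_search(scheme, index, trapdoor) -> bool:
    """Run the single-keyword test sequentially; the index matches iff all l tests succeed."""
    return all(scheme.test(index[j - 1], t) for j, t in trapdoor)


s = ToySingleKeyword(os.urandom(32))
idx = conj_index(s, [b"alice", b"urgent", b"2016"])
print(conj_search(s, idx, conj_trapdoor(s, [1, 2], [b"alice", b"urgent"])))
```

In this baseline the index holds m single-keyword ciphertexts, the trapdoor holds l single-keyword trapdoors, and a search costs l invocations of the single-keyword test, so its costs grow linearly in m and l.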

We also restrict the analysis to the public-key setting, so we leave out schemes such as [4, 13, 14]. Other schemes such as [20, 42] are omitted due to security considerations.

The size and time efficiency measures can be found in Tables 1 and 2, respectively.

Table 1 Size efficiency comparison of conjunctive PEKS schemes
Table 2 Time efficiency comparison of conjunctive PEKS schemes

Note that the proposed scheme attains the lowest cost in index size, trapdoor size, trapdoor generation time and search time. This is not the case for the index generation time, which depends on the computing power of the senders. However, in many applications the search time and the index and trapdoor sizes are far more critical: they are constrained by the network throughput and by the computing power of the storage server, and search operations may be executed more than once per encrypted index.

In Table 3, we give the estimated running times in milliseconds corresponding to Table 2, arbitrarily fixing \(m=8\) and \(l=8\) and using 48-bit keywords. We do not give a performance analysis of the schemes in [21, 30] for implementation reasons, mainly because they require the evaluation of multiple independent hash functions. We also omit the analysis of the conjunctive PEKS scheme in [11] for efficiency reasons.

Table 3 Performance analysis of conjunctive PEKS schemes

We observe that \(\mathscr {S}_{1}\) achieves the best index computation and search time in the studied case. Also, the second scheme in [31] gives the best trapdoor computation time. This is achieved in [31] by removing the need for an admissible encoding scheme, thus replacing products in the bilinear group by sums in the underlying finite field.

7.2 Subset PEKS scheme

We now give the efficiency measures for the proposed subset PEKS scheme and the related subset PEKS scheme by Boneh and Waters [11]. For the sake of comparison, we assume that queries are always the ones supported by [11]. See the beginning of Sect. 5 for more details.

One of the main differences between the two schemes is that the proposed scheme supports an arbitrary, exponentially large keyword space \(\{0,1\}^{*}\), while in [11] keywords are taken from a finite keyword space of polynomial size. We denote by n the size of this keyword space in the efficiency analysis. Another difference is that, in the scheme we propose, the number of keywords in a query is bounded when running \(\mathscr {S}_{2}.\mathrm {Setup}\). We denote by L the maximum number of keywords in a trapdoor. The size and time efficiency measures can be found in Tables 4 and 5, respectively.

Table 4 Size efficiency comparison between subset PEKS schemes
Table 5 Time efficiency comparison between subset PEKS schemes

We now give the estimated running times in milliseconds for the \(\mathscr {S}_{2}\) scheme. Since the performance of the subset PEKS scheme in [11] depends strongly on the size of the keyword space, it is difficult to choose parameters allowing a sensible comparison. Therefore, we omit the performance analysis of [11].

We arbitrarily fix \(m=8\), \(l=8\) and use 48-bit keywords. We also instantiate the index and the trapdoor so that the search operation takes as long as possible. In the proposed scheme, computing an index and a trapdoor takes an estimated \(39.5\text {ms}\) and \(36.6\text {ms}\), respectively, and the search time is approximately \(7.57\text {ms}\).
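Since the search time over a set of encrypted indexes scales linearly in the number of indexes, the worst-case search figure above can be extrapolated to a whole database. The following Python sketch does this under simplifying assumptions of our own: a single thread, no parallelism, no I/O or network cost, and every stored index tested against the trapdoor. The constant and function names are hypothetical.

```python
SEARCH_MS_PER_INDEX = 7.57  # worst-case S_2 search estimate above (m = l = 8, 48-bit keywords)


def estimated_search_seconds(num_indexes: int, ms_per_index: float = SEARCH_MS_PER_INDEX) -> float:
    """Linear extrapolation: the server runs one search operation per stored encrypted index."""
    return num_indexes * ms_per_index / 1000.0


for n in (1_000, 100_000):
    print(f"{n} indexes: about {estimated_search_seconds(n):.1f} s")
```

Under these assumptions, searching 1,000 worst-case indexes would take roughly 7.6 s; the extrapolation says nothing about average-case queries, which may terminate faster.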

8 Conclusion and future work

Public-Key Encryption with Keyword Search (PEKS) schemes enable public key holders to encrypt documents, while the secret key holder is able to generate queries for the encrypted data. In this article, we have presented two PEKS schemes enabling conjunctive and subset queries, respectively. We have proposed a security notion for PEKS, and we have proved the proposed schemes secure under the asymmetric DBDH problem and the p-BDHI problem, respectively. We have also proved the computational consistency of our constructions. The main strength of our schemes lies in their efficiency: as shown in the efficiency analysis, they improve on all previous related schemes in some of the most critical operations.

The proposed schemes may admit various extensions. For example, we believe our subset PEKS scheme can be extended to allow decryption of encrypted indexes by embedding messages in the target group, as done in works such as [11]. Such an extension would allow messages to be retrieved during the search process.

In [1], Abdalla et al. prove computational consistency for the PEKS scheme of Boneh et al. [9] and give a modified scheme achieving the stronger notion of statistical consistency. Since the conjunctive PEKS scheme proposed here can be seen as a natural extension of the scheme in [9] to the conjunctive case, it would be interesting to find similar modifications that improve its consistency.

It would also be interesting to maintain a good efficiency and security trade-off while improving the security notion, for example, by providing tight security proofs in the standard model, or by removing the need for a secure channel for trapdoors.