Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

In recent years, many storage services have become available with which clients can store documents or files on the service provider’s server. By using such services, clients can access their information at any time and from anywhere and any device. If the number of stored files increases, a keyword search is desirable to find particular files. On the other hand, the client can encrypt files to avoid leaking confidential information to the service provider. Searchable symmetric encryption (SSE) enables the client to search a large number of encrypted files with encrypted keywords.

The concept for SSE was introduced by Song et al. [21], and many SSE schemes have since been proposed [2, 9, 11, 12, 14, 18]. Most SSE schemes offer privacy; e.g., the server cannot learn anything about the stored files or keywords. However, such schemes do not have any mechanism to verify search results; that is, it is assumed that the server is honest and that it always follows protocol. Thus, Kurosawa and Ohtaki [18] introduced enhanced security notions, reliability and universally composable security (UC-security) for SSE schemes. Reliability ensures the validity of search results even if the server is malicious. Kurosawa and Ohtaki showed that an SSE scheme that has both privacy and reliability has UC-securityFootnote 1, and proposed a concrete SSE scheme which has UC-security.

In this paper, we evaluate the scheme proposed by Kurosawa and Ohtaki [18], and show that it is inefficient under some scenarios. We then propose a modified scheme for which privacy is somewhat weaker than the original Kurosawa-Ohtaki scheme. Therefore, our scheme provides UC-security with slightly weaker privacy. More precisely, the additional information our scheme leaks is merely the size of a set of keywords. Yet, the index size for our scheme is much smaller than the original scheme when the set of keywords is a very sparse subset of l-bit strings for some l. The index size is reduced by eliminating dummy elements from the original index and introducing a new tag that proves that the eliminated elements do not exist in the reduced index.

Related works. Various SSE schemes have been proposed. Some support additional search functions, such as multi-keyword searching [1, 13], ranked searching [7], and fuzzy searching [3, 22]. Others support adding and removing documents [15–17, 19].

In [10], a verifiable SSE scheme is proposed. The scheme has two modes: β€œprivacy preferred" and β€œefficiency preferred." However, the former mode, which is relatively more secure in terms of privacy, requires a very large index. Moreover, no formal security proof is provided for this scheme.

[19] proposed another verifiable SSE scheme. However, their scheme does not assume that a client will query a keyword that is not contained in the set of keywords used to build the index. Assume that a client generates an (encrypted) index based on a certain keyword set \(\mathcal{W}\), forgets \(\mathcal{W}\), and then searches for \(w \not \in \mathcal{W}\). In this case, the server has no choice but to return β€œno document hits" without any proof. This means that the server can forge the search results by answering β€œno document hits"at any time.

2 Verifiable Searchable Symmetric Encryption

In this section, we define a (verifiable) SSE scheme and its security. Basically, we follow the notation used in [8, 18, 20].

  • Let \(\mathcal {D}=\{D_0,\ldots ,D_{N-1}\}\) be a set of documents.

  • Let \(\mathcal{W}\) be a set of keywords.

  • For a keyword \(w \in \mathcal {W}\), let \(\mathcal {D}(w)\) denote the set of documents that contain w.

We consider a system model that has two components: a client and a server. Roughly speaking, in SSE schemes, clients encrypt all documents in \(\mathcal {D}\) before storing them on a server. Clients can then search through these documents using a keyword \(w\in \mathcal{W}\) from the set \(\mathcal {D}\), the output for which is derived as follows:

$$\begin{aligned} \mathcal{C}(w)=\{C_i \mid C_i \text { is a ciphertext of }D_i \in \mathcal {D}(w)\}. \end{aligned}$$
(1)

In response to a search query, the server returns \(\mathcal{C}(w)\). If there is a mechanism to verify the validity of the response, the scheme is a verifiable SSE (vSSE).

Hereafter, |X| denotes the bit length of X for a bit string X, and |X| denotes the cardinality of X for a set X. Furthermore, β€œPPT" refers to the probabilistic polynomial time.

2.1 System Model

Formally, a vSSE scheme has two phases: the store phase (executed only once) and the search phase (executed a polynomial number of times). Such a scheme consists of the following six polynomial-time algorithms:Footnote 2

$$ \mathtt{vSSE}=(\mathtt{Gen},\mathtt{Enc},\mathtt{Trpdr},\mathtt{Search},\mathtt{Dec},\mathtt{Verify}) $$

such that

  • \(K \leftarrow \mathtt{Gen}(1^\lambda )\): a probabilistic algorithm that generates a key K, where \(\lambda \) is a security parameter. This algorithm is run by the client during the store phase, and K is kept secret.

  • \((\mathcal{I}, \mathcal{C}) \leftarrow \mathtt{Enc}(K, \mathcal {D}, \mathcal{W})\): a probabilistic encryption algorithm that outputs an encrypted index \(\mathcal{I}\) and \(\mathcal{C}=\{C_0,\ldots ,C_{N-1}\}\), where \(C_i\) is the ciphertext for \(D_i\). This algorithm is run by the client during the store phase, and \((\mathcal{I}, \mathcal{C})\) are sent to the server.

  • \(t(w) \leftarrow \mathtt{Trpdr}(K,w)\): an algorithm that outputs a trapdoor t(w) for a keyword w. This is run by the client during the search phase, and t(w) is sent to the server.

  • \((\tilde{\mathcal{C}}(w), { Proof}) \leftarrow \mathtt{Search}(\mathcal{I}, \mathcal{C}, t(w))\): a deterministic search algorithm, where \(\tilde{\mathcal{C}}(w)\) is the search result and \({ Proof}\) is its proof. This algorithm is run by the server during the search phase, and \((\tilde{\mathcal{C}}(w), { Proof})\) is sent to the client.

  • \(\mathtt{accept/reject}\leftarrow \mathtt{Verify}(K,t(w),\tilde{\mathcal{C}}(w),{ Proof})\): a deterministic verification algorithm that determines the validity of \(\tilde{\mathcal{C}}(w)\) based on \({ Proof}\). This algorithm is run by the client.

  • \(D \leftarrow \mathtt{Dec}(K,C)\): a deterministic decryption algorithm. The client uses this algorithm for all \(C \in \tilde{\mathcal{C}}(w)\), when \(\mathtt{Verify}(K,t(w),\tilde{\mathcal{C}}(w),{ Proof}) = \mathtt{accept}\).

Correctness entails the following from the scheme for the set of documents \(\mathcal {D}\) and a keyword \(w \in \mathcal{W}\):

  • \(D_i=\mathtt{Dec}(K,C_i)\) if \(\mathcal{C}= \{C_0,\ldots ,C_{N-1}\}\) is the output of \(\mathtt{Enc}(K, \mathcal {D},\mathcal{W})\).

  • \(\mathtt{Verify}(K,t(w),\tilde{\mathcal{C}}(w),{ Proof})=\mathtt{accept}\), if \((\mathcal{I},\mathcal{C})\) is outputted by \(\mathtt{Enc}(K,\mathcal {D},\mathcal{W})\), t(w) is outputted by \(\mathtt{Trpdr}(K,w)\), and \((\tilde{\mathcal{C}}(w), { Proof})\) is outputted by \(\mathtt{Search}(\mathcal{I},\mathcal{C},t(w))\).

2.2 Security Definition

We next define some security conditions that should be satisfied by a vSSE scheme.

Privacy. In a vSSE, the server should learn as little information as possible regarding \(\mathcal {D}, \mathcal{W}\), and the queried keyword w. Let \(L=(L_1,L_2)\) be a pair of leakage functions, such that \(L_1(\mathcal {D}, \mathcal{W})\) (and respectively, \(L_2(\mathcal {D}, \mathcal{W}, \mathbf{w}, w)\)) denote the information the user permits the server to learn during the store phase (and respectively, the search phase). Here, \(\mathbf{w}=(w_1,w_2, \ldots )\) is the list of keywords queried in past searches, and w is the keyword queried now. The client’s privacy is defined by using two games: a real game \(\mathtt{{Game}}_{real}\), and a simulation game \(\mathtt{{Game}}_{sim}^L\), as shown in Figs.Β 1 and 2, respectively. \(\mathtt{{Game}}_{real}\) is played by a challenger C and an adversary A, and \(\mathtt{{Game}}_{sim}^L\) is played by C, AΒ  and a simulator S.

Fig. 1.
figure 1

Real game \(\mathtt{{Game}}_{real}\)

Fig. 2.
figure 2

Simulation game \(\mathtt{{Game}}_{sim}^{L}\)

Definition 1

( L -privacy). We say that a vSSE scheme has L-privacy, if there exists a PPT simulator S such that

$$\begin{aligned} |\Pr (\mathbf{A}\text { outputs } b=1 \text { in } \mathtt{{Game}}_{real})-\Pr (\mathbf{A}\text { outputs } b=1 \text { in } \mathtt{{Game}}_{sim}^{L})| \end{aligned}$$
(2)

is negligible for any PPT adversary A.

In most existing SSE schemes, \(L_1(\mathcal {D},\mathcal{W})\) includes \( (|D_0|,\ldots , |D_{N-1}| )\) and some information about \(\mathcal{W}\), such as \(|\mathcal{W}|\) or the length of the keywords. On the other hand, \(L_2(\mathcal {D},\mathcal{W},\mathbf{w},w)\) consists of

$$ \mathtt{List}(w)=\{j \mid D_j\in \mathcal {D}\text { contains } w\} $$

and the search pattern

$$ \mathtt{SPattern}((w_1,\ldots ,w_{q-1}),w) = (sp_1,\ldots , sp_{q-1}), ~~~ sp_j = \left\{ \begin{array}{ll} 1 &{} \text{ if } w_j = w \\ 0 &{} \text{ if } w_j \ne w \end{array}\right. $$

that reveals the past queries that are the same as w.

Reliability. In an SSE scheme, a malicious server should not cheat a client by returning a false result \(\tilde{\mathcal{C}}(w)^* (\ne \mathcal{C}(w))\) during the search phase. We generally call this notion (weak) reliability. In [20], strong reliability was also defined, and a relation between strong reliability and universal composability was discussed. Strong reliability is formulated by considering the game \(\mathtt{{Game}}_{reli}\) shown in Fig.Β 3. This game is played by an adversary \(\mathbf{A}=(\mathbf{A}_1,\mathbf{A}_2)\) (malicious server) and a challenger C. We assume that \(\mathbf{A}_1\) and \(\mathbf{A}_2\) can communicate freely.

Fig. 3.
figure 3

\(\mathtt{{Game}}_{reli}\)

Definition 2

((Strong) Reliability). We say that a vSSE scheme satisfies (strong) reliability if for any PPT adversary A, \(\Pr (\mathbf{A}\text { wins})\) is negligible for any \((\mathcal {D},\mathcal{W})\) and any search queries \(w_0,\ldots ,w_{q-1}\).

From now on, we will say just Reliability for what we mean Strong Reliability.

Universally Composable Security. It is known that if protocol \(\varSigma \) is secure in the universally composable (UC) security framework, then the security of \(\varSigma \) is maintained even if it is combined with other protocols. The security in the UC framework is defined by associating it with a given ideal functionality \(\mathcal {F}\). Refer to [4–6] for the formal definition of the UC framework. Kurosawa and Ohtaki introduced an ideal functionality of vSSE [18, 20]. Here, we generalize the definition in order to handle the general leakage functions \(L=(L_1,L_2)\) as shown in Fig.Β 4. Note that the server does not interact with \(\mathcal {F}_{vSSE}^L\), because it does not have its own input and output.

Fig. 4.
figure 4

Ideal functionality \(\mathcal {F}_{vSSE}^L\)

Definition 3

(UC-Security with Leakage L ). We say that vSSE scheme has universally composable (UC) security with leakage L, if it realizesFootnote 3 the ideal functionality \(\mathcal {F}_{vSSE}^L\).

The following theorem can be proved in the same way as the theorem in [20].

Theorem 1

vSSE has UC security with leakage L against non-adaptive adversaries if the vSSE scheme satisfies L-privacy and reliability.

2.3 Kurosawa-Ohtaki Scheme (KO-Scheme)

We next review the UC-secure SSE scheme proposed by Kurosawa and Ohtaki, KO-scheme [18]. In this scheme, the set of searchable keywords is \(\mathcal{W}\subseteq \{0,1\}^l\) for some l.

Let \(\mathtt{SKE}=(G,E,E^{-1})\) be a symmetric encryption scheme, where G is a key-generation algorithm, E is an encryption algorithm, and \(E^{-1}\) is the corresponding decryption algorithm. For a security parameter \(\lambda \), let \(\pi : \{0,1\}^{\lambda } \times \{0,1\}^{l+1+\log N} \rightarrow \{0,1\}^{l+1+\log N}\) be a pseudorandom permutation, where N denotes the number of documents in \(\mathcal {D}\), and let \(\mathtt{MAC}: \{0,1\}^{\lambda } \times \{0,1\}^* \rightarrow \{0,1\}^n\) be a tag-generation function. For simplicity, we write \(y=\pi (x)\) rather than \(y=\pi (K,x)\), and \(\mathtt{MAC}(m)\) rather than \(\mathtt{MAC}(K,m)\), where K is a key.

The KO-scheme proceeds as follows:

  • \(\mathtt{Gen}(1^\lambda )\): Run G to generate a key \(K_0\) for SKE. Randomly choose a key \(K_1 \in \{0,1\}^\lambda \) for \(\pi \) and a key \(K_2 \in \{0,1\}^\lambda \) for MAC. Output \(K=(K_0, K_1, K_2)\).

  • \(\mathtt{Enc}(K,\mathcal {D},\mathcal{W})\): First, compute \(C_i=E(K_0,D_i)\) for each \(D_i \in \mathcal {D}\), and let \(\mathcal{C}=\{C_0,\ldots ,C_{N-1}\}\). Let \(\mathcal{I}\) be an array of size \(2 \times 2^lN\). We write \(\mathcal{I}[i]\) for the i-th element of \(\mathcal{I}\).

    1. 1.

      Let

      $$\begin{aligned} \mathcal{I}[i] \leftarrow (\mathtt{dummy}, \mathtt{MAC}(i\Vert \mathtt{dummy})) \end{aligned}$$

      for all \(i=0,\ldots ,2\times 2^lN-1\).

    2. 2.

      For each \(w \in \{0,1\}^l\), suppose that \(\mathcal {D}(w)=(D_{s_1},\ldots ,D_{s_m})\). Then for \(j=1,\ldots ,m\), let

      $$\begin{aligned}&addr = \pi (0,w,j)\\&tag_{w,j}=\mathtt{MAC}(addr \Vert C_{s_j})\\&\mathcal{I}[addr] \leftarrow (s_j, tag_{w,j}). \end{aligned}$$
    3. 3.

      For each \(D_k \in \mathcal {D}\), suppose that document number k appears \(N_k\) times in \(\mathcal{I}\). Then for \(j=1,\ldots ,2^l-N_k\), let

      $$\begin{aligned}&addr=\pi (1,j,k)\\&tag_{j,k}=\mathtt{MAC}(addr \Vert C_k)\\&\mathcal{I}[addr] \leftarrow (k, tag_{j,k}). \end{aligned}$$

    Finally, output \((\mathcal{I},\mathcal{C})\).

  • \(\mathtt{Trpdr}(K,w)\): Output

    $$ t(w)=(\pi (0,w,0),\ldots ,\pi (0,w,N-1)) $$
  • \(\mathtt{Search}(\mathcal{I},\mathcal{C},t(w))\): Parse t(w) as \(t(w)=(addr_0,\ldots ,addr_{N-1})\). Suppose that

    $$ \mathcal{I}[addr_i] = (s_i, tag_i) $$

    for \(i=0,\ldots ,N-1\). First, set \(\tilde{\mathcal{C}}(w) \leftarrow \mathtt{empty}\). Then, for \(i=0,\ldots ,N-1\), add \(C_{s_i}\) to \(\tilde{\mathcal{C}}(w)\), if \(s_i \ne \mathtt{{dummy}} \). Set \({ Proof}= (tag_0, \ldots ,tag_{N-1})\). Finally, output \((\tilde{\mathcal{C}}(w), { Proof})\).

  • \(\mathtt{Verify}(K,t(w),\tilde{\mathcal{C}}(w),{ Proof})\): Parse \(t(w), \tilde{\mathcal{C}}(w)\), and \({ Proof}\) as

    $$\begin{aligned} t(w)= & {} (addr_0,\ldots ,addr_{N-1}) \\ \tilde{\mathcal{C}}(w)= & {} (\tilde{C}_0,\ldots ,\tilde{C}_{m-1}) \\ { Proof}= & {} (tag_0,\ldots ,tag_{N-1}). \end{aligned}$$

    Then, verify the validity of the result with the following steps.

    1. 1.

      Let \(X_i \leftarrow \tilde{C}_i\) for \(i=0,\ldots ,m-1\).

    2. 2.

      Let \(X_i \leftarrow \mathtt{dummy }\) for all \(i=m,\ldots ,N-1\).

    3. 3.

      If \(tag_i = \mathtt{MAC}(addr_{i}\Vert X_i)\) for all \(i=0,\ldots ,N-1\), then output accept. Otherwise output reject.

  • \(\mathtt{Dec}(K,C)\): Output a document \(D=E^{-1}_{K_0}(C)\) for a ciphertext C.

Proposition 1

For \(\mathcal{W}\subseteq \{0,1\}^l\), let

$$\begin{aligned} L^{KO}= & {} (L_1^{KO},L_2^{KO})\\ L_1^{KO}(\mathcal {D},\mathcal{W})= & {} (|D_0|,\ldots ,|D_{N-1}|, l) \\ L_2^{KO}(\mathcal {D},\mathcal{W},\mathbf{w},w)= & {} (\mathtt{List}(\mathcal {D},w), \mathtt{SPattern}(\mathbf{w},w)) \end{aligned}$$

The above scheme has \(L^{KO}\)-privacy and reliability. Therefore, it also has UC security with leakage \(L^{KO}\).

Remark 1

In [18], Kurosawa and Ohtaki claimed that only \(\mathtt{List}(\mathcal {D},w) \) is leaked during the search phase. However, it is obvious that the trapdoor leaks a search pattern, since Trpdr is deterministic.

2.4 Inefficiency of KO-Scheme

With the KO-scheme, the index size is \(\mathcal{O}(2^lN)\), where l denotes the bit-length of the keywords. Here we consider a case where \(\mathcal{W}\) is a set of English words that includes a long word, namely β€œindistinguishability.” The bit-length of this set of keywords is \(l=8\times 20\). With such a keyword set, then, the index will become very large.

Let l be the maximum length of the keywords expressed as bit strings, and \(\mathcal{W}_0=\{0,1\}^l\). Then, \(\mathcal{W}\subseteq \mathcal{W}_0\) holds. In general, the KO-scheme is inefficient whenever \(|\mathcal{W}| \ll |\mathcal{W}_0|\), and it is not uncommon.

An easy solution to this problem is to transform each word into a short bit string as follows: Let \(l' = \lceil \log _2 |\mathcal{W}| \rceil \). First, the client numbers the keywords in \(\mathcal{W}\) from 0 to \(|\mathcal{W}|-1\). That is, \(\mathcal{W}=\{w_0,\ldots ,w_{|\mathcal{W}|-1} \}\). For the Enc algorithm, the client does not use the keyword \(w_i \in \mathcal{W}\), but rather its index \(i \in [0,2^{l'}-1]\). Then, the index size is \(\mathcal{O}(2^{l'}N) \approx \mathcal{O}(|\mathcal{W}|N)\), even if \(|\mathcal{W}| \ll 2^l\).

For this solution, however, the client must keep \(\mathcal{W}\) on hand in order to translate the keyword w into its index i when searching w.

In the next section, we provide another solution to reduce the size of the index.

3 Improvement of KO-Scheme

In this section, we propose a new vSSE scheme.

The idea of reducing the index size is elimination of dummy elements from the index, and introduction of a new tag that proves that the eliminated elements do not exist in the constructed index.

Let \(M=|\mathcal{W}|\) be the number of keywords. The index of our scheme is much smaller than the index of the KO-scheme, if \(M \ll 2^l\). This means that the computation cost to generate an index and to search it is also reduced.

3.1 Concrete Description of Our Scheme

Here, \(\mathtt{SKE}=(G,E,E^{-1}),\mathcal {D},\pi \), and \(\mathtt{MAC}\) follow the denotations from Sect.Β 2.3. Our vSSE scheme is as follows:

  • \(\mathtt{Gen}(1^\lambda )\): Run G to generate a key \(K_0\) for SKE. Randomly choose a key \(K_1 \in \{0,1\}^\lambda \) for \(\pi \) and a key \(K_2 \in \{0,1\}^\lambda \) for MAC. Output \(K=(K_0, K_1, K_2)\).

  • \(\mathtt{Enc}(K,\mathcal {D},\mathcal{W})\): First, compute \(C_i=E(K_0,D_i)\) for each \(D_i \in \mathcal {D}\) and let \(\mathcal{C}=\{C_0,\ldots ,C_{N-1}\}\). Our index \(\mathcal{I}\) is an array of size \(MN+1\), and each element of \(\mathcal{I}\) has four fields:

    $$ (\mathtt{\!addr},\mathtt{ID},{\!\mathtt {tag}},\!\mathtt{Ntag}). $$

    Hereafter, \(\mathcal{I}[i]\) denotes the i-th element in \(\mathcal{I}\), and \(\mathcal{I}[i].\mathtt{\!addr}\) denotes the addr field of the i-th element in \(\mathcal{I}\). Furthermore, we will use the same notation for the other three fields. \(\mathcal{I}\) is constructed as follows.

    1. 1.

      Set

      $$\begin{aligned}&\mathcal{I}[MN].\mathtt{\!addr}\leftarrow 2^{l+1+\log N}\\&\mathcal{I}[MN].\mathtt{ID}\leftarrow \mathtt{dummy }\\&\mathcal{I}[MN].{\!\mathtt {tag}}\leftarrow \mathtt{dummy }. \end{aligned}$$
    2. 2.

      Let

      $$\begin{aligned} p_{i,j}={\left\{ \begin{array}{ll} \pi (0,w_i,j) &{}\text{ if } D_j \in \mathcal {D}(w_i)\\ \pi (1,w_i,j) &{}\text{ if } D_j \notin \mathcal {D}(w_i) \end{array}\right. } \end{aligned}$$

      for \(w_i \in \mathcal{W}\) and \(D_j \in \mathcal {D}\), and set

      $$\begin{aligned}&\mathcal{I}[Ni+j].\mathtt{\!addr}\leftarrow p_{i,j}\\&\mathcal{I}[Ni+j].\mathtt{ID}\leftarrow j\\&\mathcal{I}[Ni+j].{\!\mathtt {tag}}\leftarrow \mathtt{MAC}(0\Vert p_{i,j}\Vert C_j) \end{aligned}$$

      for \(i=0,\ldots ,M-1\) and \(j=0,\ldots ,N-1\). At this time, each element of \(\mathcal{I}\) is

      $$ \begin{array}{lclclc} \mathcal{I}[0] &{}=&{} (p_{0,0},&{}0,&{}\mathtt{MAC}(0\Vert p_{0,0}\Vert C_0),&{}\!\mathtt{undefined})\\ \mathcal{I}[1] &{}=&{} (p_{0,1},&{}1,&{}\mathtt{MAC}(0\Vert p_{0,1}\Vert C_1),&{}\!\mathtt{undefined})\\ &{}\vdots &{}&{}&{}\\ \mathcal{I}[N-1] &{}=&{} (p_{0,N-1},&{}N-1,&{}\mathtt{MAC}(0\Vert p_{0,N-1}\Vert C_{N-1}),&{}\!\mathtt{undefined})\\ \mathcal{I}[N] &{}=&{}(p_{1,0},&{}0,&{}\mathtt{MAC}(0\Vert p_{1,0}\Vert C_0),&{}\!\mathtt{undefined})\\ &{}\vdots &{}&{}&{}\\ \mathcal{I}[2N-1] &{}=&{} (p_{1,N-1},&{}N-1,&{}\mathtt{MAC}(0\Vert p_{1,N-1}\Vert C_{N-1}),&{}\!\mathtt{undefined})\\ &{}\vdots &{}&{}&{}\\ \mathcal{I}[MN-1] &{}=&{} (p_{M-1,N-1},&{}N-1,&{}\mathtt{MAC}(0\Vert p_{M-1,N-1}\Vert C_{N-1}),&{}\!\mathtt{undefined}). \end{array} $$

      Note that all values in the addr field are distinct, because \(\pi \) is a permutation.

    3. 3.

      Sort \(\mathcal{I}[0],\ldots ,\mathcal{I}[MN]\) based on the addr field, such that

      $$ \mathcal{I}[0.\mathtt{\!addr}<\mathcal{I}[1].\mathtt{\!addr}<\cdots <\mathcal{I}[MN].\mathtt{\!addr}. $$
    4. 4.

      For \(r=0,\ldots ,MN\), compute

      $$\begin{aligned} Ntag_r={\left\{ \begin{array}{ll} \mathtt{MAC}(1\Vert 0\Vert \mathcal{I}[r].\mathtt{\!addr}) &{}\text{ if } r = 0\\ \mathtt{MAC}(1\Vert \mathcal{I}[r-1].\mathtt{\!addr}\Vert \mathcal{I}[r].\mathtt{\!addr}) &{}\text{ if } r \ne 0 \end{array}\right. } \end{aligned}$$

      and set

      $$ \mathcal{I}[r].\!\mathtt{Ntag}\leftarrow Ntag_r. $$

    Finally, output \((\mathcal{I},\mathcal{C})\).

  • \(\mathtt{Trpdr}(K,w)\): Output

    $$ t(w) = (\pi (0,w,0),\ldots ,\pi (0,w,N-1)). $$
  • \(\mathtt{Search}(\mathcal{I},\mathcal{C},t(w))\): First set \(\tilde{\mathcal{C}}(w) \leftarrow \mathtt{empty}\). Parse t(w) as \(t(w)=(addr_0,\ldots ,addr_{N-1})\). For \(i=0,\ldots ,N-1\), search \(addr_i\) from the addr field in \(\mathcal{I}\), and follow the steps below:

    • If \(addr_i =\mathcal{I}[r_i].\mathtt{\!addr}\) for some \(r_i\in [0,MN-1]\), then set

      $$\begin{aligned} \tilde{\mathcal{C}}(w)\leftarrow & {} \tilde{\mathcal{C}}(w) \cup C_{\mathcal{I}[r_i].\mathtt{ID}}\\ pr_i\leftarrow & {} \mathcal{I}[r_i].{\!\mathtt {tag}}. \end{aligned}$$
    • If \(addr_i < \mathcal{I}[0].\mathtt{\!addr}\), set

      $$\begin{aligned} pr_i \leftarrow (0 \Vert \mathcal{I}[0].\mathtt{\!addr},\ \mathcal{I}[0].\!\mathtt{Ntag}). \end{aligned}$$
    • If \(\mathcal{I}[r_i-1].\mathtt{\!addr}< addr_i < \mathcal{I}[r_i].\mathtt{\!addr}\) for some \(r_i\in [1,MN]\), then set

      $$\begin{aligned} pr_i \leftarrow (\mathcal{I}[r_i-1].\mathtt{\!addr}\Vert \mathcal{I}[r_i].\mathtt{\!addr},\ \mathcal{I}[r_i].\!\mathtt{Ntag}). \end{aligned}$$

    Set \({ Proof}=(pr_0,\ldots ,pr_{N-1})\). Finally, output \((\tilde{\mathcal{C}}(w), { Proof})\).

  • \(\mathtt{Verify}(K,t(w),\tilde{\mathcal{C}}(w),{ Proof}\)): Parse \(\tilde{\mathcal{C}}(w), { Proof}\), and t(w) as

    $$\begin{aligned} \tilde{\mathcal{C}}(w)= & {} (\tilde{C}_0,\ldots ,\tilde{C}_{m-1})\\ { Proof}= & {} (pr_0,\ldots ,pr_{N-1})\\ t(w)= & {} (addr_0,\ldots ,addr_{N-1}) \end{aligned}$$

    and follow the steps below to verify the validity of the search result.

    1. 1.

      If m is not equal to the number of \(pr_j\)s that are tagsβ€”meaning that they do not consist of a pair of addr and Ntagβ€”then output reject.

    2. 2.

      For each \(pr_{j}\) that is a tag, if there does not exist a distinct \(i\in \{0,\ldots ,m-1\}\), such that

      $$\begin{aligned} \mathtt{MAC}(0\Vert addr_{j}\Vert \tilde{C}_i) =pr_{j}, \end{aligned}$$
      (3)

      then output reject.

    3. 3.

      For a \(pr_j\) that is not a tag but rather a pair of addr and Ntag, assume that \(pr_j=(addr'_{j,1}\Vert addr'_{j,2},Ntag_j)\). If the following two statements are true for all such j, then output accept.

      $$\begin{aligned}&addr'_{j,1}<addr_j<addr'_{j,2}&\end{aligned}$$
      (4)
      $$\begin{aligned}&\mathtt{MAC}(1\Vert addr'_{j,1}\Vert addr'_{j,2})=Ntag_j&\end{aligned}$$
      (5)

      Otherwise, output reject.

  • \(\mathtt{Dec}(K,C)\): Output document \(D=E^{-1}_{K_0}(C)\) for a ciphertext C.

3.2 Security

Let

$$\begin{aligned} L^{new}= & {} (L_1^{new},L_2^{new})\\ L_1^{new}(\mathcal {D},\mathcal{W})= & {} L_1^{KO}(\mathcal {D},\mathcal{W})\cup {|\mathcal{W}|} = (|D_0|,\ldots ,|D_{N-1}|, l, |\mathcal{W}|) \\ L_2^{new}(\mathcal {D},\mathcal{W},\mathbf{w},w)= & {} L_2^{KO}(\mathcal {D},\mathcal{W},\mathbf{w},w) =(\mathtt{List}(\mathcal {D},w),\mathtt{SPattern}(\mathbf{w},w)) \end{aligned}$$

We can prove that the above scheme satisfies \(L^{new}\)-privacy and reliability, and therefore UC-security with leakage \(L^{new}\).

Theorem 2

If the symmetric encryption scheme \(\mathtt{SKE}=(G,E,E^{-1})\) is secure in terms of indistinguishability against chosen-plaintext attacks (IND-CPA), and if \(\pi \) is a pseudorandom permutation, then the proposed scheme satisfies \(L^{new}\)-privacy.

Proof

We construct the simulator S as follows. First, S receives \(L_1(\mathcal {D},\mathcal{W}) =(|D_0|,\ldots ,|D_{N-1}|,l, |\mathcal{W}|)\).

  1. 1.

    S runs \(\mathtt{Gen}(1^\lambda )\) to generate key \(K=(K_0,K_1,K_2)\).

  2. 2.

    Let \(C'_j=E(K_0, 0^{|D_j|})\) for \(j=0,\ldots ,N-1\), and let \(\mathcal{C}'=\{C'_0,\ldots ,C'_{N-1}\}\).

  3. 3.

    Choose \(w'_0,\ldots ,w'_{M-1}\) from \(\{0,1\}^l\) randomly, and set \(\mathcal{W}' = \{w'_0,\ldots , w'_{M-1}\}\). Compute the index \(\mathcal{I}'\) as if all of the documents \(D_0,\ldots ,D_{N-1} \in \mathcal {D}\) include all of the keywords in \(\mathcal{W}'\). That is, for \(i = 0,\ldots ,M-1\) and \( j=0,\ldots ,N-1\),

    $$\begin{aligned} \mathcal{I}'[r'_{i,j}]= & {} (\pi (0,w'_i,j),j,tag'_{i,j},\!\mathtt{undefined}) \\ \mathcal{I}'[MN]= & {} (2^{l+1+\log N},\mathtt{dummy },\mathtt{MAC}(0\Vert \mathtt{dummy }),\!\mathtt{undefined}) \nonumber \end{aligned}$$
    (6)

    where

    $$\begin{aligned} r'_{i,j}= & {} Ni+j\\ tag'_{i,j}= & {} \mathtt{MAC}(0||\pi (0,w'_i,j)||C'_j). \end{aligned}$$

    Next, sort the elements based on the addr field, and set

    $$ \mathcal{I}'[r].\!\mathtt{Ntag}\leftarrow \mathtt{MAC}(1\Vert \mathcal{I}'[r-1].\mathtt{\!addr}\Vert \mathcal{I}'[r].\mathtt{\!addr}). $$
  4. 4.

    Return \((\mathcal{I}',\mathcal{C}')\).

During the i-th search iteration, S is given

$$ \mathtt{List}(w_i)=\{s_0,\ldots ,s_{m-1}\} $$

and

$$ \mathtt{SPattern}(\mathbf{w},w_i) = (sp_1,\ldots ,sp_{i-1}) $$

(but neither \(w_i\) nor \(\mathbf{w}\)). S simulates the trapdoor as follows.

  1. 1.

    If \(sp_j=1\) for some \(j <i\), then S sets \(t_i' = t_j'\) and returns \(t'_i\).

  2. 2.

    If \(\mathtt{List}(w_i) = \emptyset \), then S randomly chooses \(w' \in \{0,1\}^l {\setminus } (\mathcal{W}' \cup \mathcal{W}_\mathrm{used})\) Footnote 4, otherwise, S randomly chooses \(w' \in \mathcal{W}' {\setminus } \mathcal{W}_\mathrm{used}\), where \(\mathcal{W}_\mathrm{used}\) is initially empty. Then, S sets

    $$\begin{aligned} \mathcal{W}_\mathrm{used}= & {} \mathcal{W}_\mathrm{used} \cup \{w'\} \\ addr'_j= & {} {\left\{ \begin{array}{ll} \pi (0,w',j) &{}\text{ if } j \in \mathtt{List}(w_i)\\ \pi (1,w',j) &{}\text{ if } j \notin \mathtt{List}(w_i) \end{array}\right. } \end{aligned}$$

    for \(j=0,\ldots ,N-1\), and returns

    $$ t_i'=(addr'_0,\ldots ,addr'_{N-1}). $$

We will prove that there is no adversary A who can distinguish the games \(\mathtt{{Game}}_{real}\) and \(\mathtt{{Game}}_{sim}\) by using six games \(\mathtt{{Game}}_0,\ldots ,\mathtt{{Game}}_5\). Let \(\mathtt{{Game}}_0=\mathtt{{Game}}_{real}\). Hereafter, we write

$$ P_i = \Pr (\mathbf{A}\, \mathrm{outputs}\, b=1 \mathrm{\, in}\,\mathtt{{Game}}_i) $$

for simplicity.

  • \(\mathtt{{Game}}_1\) is equivalent to \(\mathtt{{Game}}_0\), except that each \(C_j = E(K_0, D_j)\) is replaced with \(C'_j= E(K_0, 0^{|D_j|})\) for \(j=0,\ldots ,N-1\). From the assumption for SKE, \(|P_0-P_1|\) is negligible.

  • \(\mathtt{{Game}}_2\) uses a real random permutation \(\pi _2\) for computing addrs rather than pseudorandom permutation \(\pi \) as with \(\mathtt{{Game}}_1\). Then, \(|P_1-P_2|\) is negligible, owing to the pseudorandomness of \(\pi \).

  • \(\mathtt{{Game}}_3\) is equivalent to \(\mathtt{{Game}}_2\), except that the set of keywords is changed from \(\mathcal{W}\) to \(\mathcal{W}'(|\mathcal{W}'|=M)\), and the random permutation is changed to \(\pi _3\), whose output for a keyword \(w_i \in \mathcal{W}\) is the same as the output of \(\pi _2\) for input \(w'_i \in \mathcal{W}'\), and the output for \(w'_i \in \mathcal{W}\) is the same as the output of \(\pi _2\) for \(w_i \in \mathcal{W}\) for all i. Then, \(\pi _3\) is a random permutation, as is \(\pi _2\), and the constructed indexes for \(\mathtt{{Game}}_2\) and \(\mathtt{{Game}}_3\) are identical. Hence, \(|P_2-P_3|\) is negligible.

  • \(\mathtt{{Game}}_4\) is equivalent to \(\mathtt{{Game}}_3\), except that the \(\mathtt{List}(w')\) in \(\mathtt{{Game}}_3\) is replaced by \(\mathtt{List}'(w')=\{0,\ldots ,M-1\}\) for all \(w' \in \mathcal{W}'\), and \(\pi _3\) is replaced by \(\pi _4\), which satisfies

    $$\begin{aligned} \pi _4(0\Vert w'\Vert j)= & {} {\left\{ \begin{array}{ll} \pi _3(0\Vert w'\Vert j) &{}\text{ if } j \in \mathtt{List}(w')\\ \pi _3(1\Vert w'\Vert j) &{}\text{ if } j \notin \mathtt{List}(w') \end{array}\right. }\\ \pi _4(1\Vert w'\Vert j)= & {} {\left\{ \begin{array}{ll} \pi _3(1\Vert w'\Vert j) &{}\text{ if } j \in \mathtt{List}(w')\\ \pi _3(0\Vert w'\Vert j) &{}\text{ if } j \notin \mathtt{List}(w') \end{array}\right. } \end{aligned}$$

    for all \(j=0,\ldots ,N-1\) and all \(w'\). Then, \(\pi _4\) is also a random permutation, and the constructed indexes for \(\mathtt{{Game}}_3\) and \(\mathtt{{Game}}_4\) are identical. Hence, \(|P_3-P_4|\) is negligible.

  • In \(\mathtt{{Game}}_5\), we use pseudorandom permutation \(\pi \), rather than \(\pi _4\), and this is the only difference between \(\mathtt{{Game}}_5\) and \(\mathtt{{Game}}_4\). Because \(\pi _4\) is a random permutation, \(|P_4-P_5|\) is negligible, owing to the pseudorandomness of \(\pi \).

From the above, \(|P_0-P_5|\) is negligible. Since it is obvious that \(\mathtt{{Game}}_5=\mathtt{{Game}}_{sim}\), \(\mathtt{{Game}}_{real}\) and \(\mathtt{{Game}}_{sim}\) are indistinguishable for any adversary A. Β Β Β \(\square \)

Theorem 3

If MAC is existentially unforgeable against chosen-message attacks, our scheme satisfies reliability.

Proof

Suppose that for \((\mathcal {D},\mathcal{W})\) and search queries \(w_0,\ldots ,w_{q-1}\), there exists an adversary \(\mathbf{A}=(\mathbf{A}_1,\mathbf{A}_2)\) who can break the reliability. We show that a forger B against MAC can be constructed using A.

B behaves like a client. When B receives \((\mathcal {D},\mathcal{W})\) from \(\mathbf{A}_1\) during the store phase, it creates \(\mathcal{I}\) and \(\mathcal{C}\) ordinarily, except that B does not choose the key for MAC, but rather uses its own MAC oracle to compute \(\mathcal{I}\). Here, B will send queries to its MAC oracle only when constructing \(\mathcal{I}\).then B sends \((\mathcal{I}, \mathcal{C})\) to \(\mathbf{A}_2\). We note that B will send queries to its MAC oracle only when constructing \(\mathcal{I}\).

In the search phase, \(\mathbf{A}_1\) sends \(w_i\) to B for q times. B calculates a trapdoor \(t(w_i)\) for \(w_i\) normally and sends it to \(\mathbf{A}_2\). \(\mathbf{A}_2\) outputs \((\tilde{\mathcal{C}}(w_i)^*,{ Proof}_i^*)\) and sends it back to B. While this step, B also runs the Search algorithm and gets \((\mathcal{C}(w_i), Proof_i)\) for its own.

For each i, \(\mathbf{A}_2\)’s output \((\tilde{\mathcal{C}}(w_i)^*,{ Proof}_i^*)\) is either of the following three types:

  • Type 1 \((\tilde{\mathcal{C}}(w_i)^*,{ Proof}_i^*)=(\mathcal{C}(w_i),{ Proof}_i)\).

  • Type 2 \((\tilde{\mathcal{C}}(w_i)^*,{ Proof}_i^*) \ne (\mathcal{C}(w_i),{ Proof}_i)\) and:

    • Type 2-1 the Verify algorithm outputs reject.

    • Type 2-2 the Verify algorithm outputs accept.

For each output of Type 1, B returns \(\mathcal {D}(w_i)\) to \(\mathbf{A}_1\).

For each output of Type 2, B has to return reject if it is Type 2-1, and a plaintext \(\tilde{\mathcal {D}}(w_i)^*\) of \(\tilde{\mathcal{C}}(w_i)^*\) if it is Type 2-2. However, B cannot distinguish Type 2-1 and Type 2-2, since B does not have the key for the MAC itself.

For this problem, B randomly chooses J from [1,Β q] at the beginning of the search phase. This J is the prediction by B of i such that i-th output is the first Type 2-2 output. Based on this J, B performs as follows.

For outputs of \(\mathbf{A}_2\) before the J-th output in the search phase, B considers all Type 2 outputs to be Type 2-1, and returns reject to \(\mathbf{A}_1\).

If the J-th output of \(\mathbf{A}_2\) is not Type 2, B fails to forge a MAC and aborts. Otherwise, B considers it as Type 2-2, and determines its output as below.

  • Case a: \(\mathcal{C}(w_J) \ne \tilde{\mathcal{C}}(w_J)^*\) Suppose that

    $$\begin{aligned} t(w_J)= & {} (addr_0,\ldots ,addr_{N-1})\\ \mathcal{C}(w_J)= & {} (C_{q_0},\ldots ,C_{q_{m-1}})\\ \tilde{\mathcal{C}}(w_J)^*= & {} (C^*_0,\ldots ,C^*_{m^*-1})\\ { Proof}_J^*= & {} (pr^*_0,\ldots ,pr^*_{N-1}). \end{aligned}$$

    Note that \(addr_j = \pi (0, w_J, j)\), and that B knows all of the above values. Because \(\tilde{\mathcal{C}}(w_J)^*\ne \mathcal{C}(w_J)\), we need to consider only the following three cases.

    • Case a-1: \(m^*=m\), and there exists \(C^*_i\) in \(\tilde{\mathcal{C}}(w_J)^*\) but not in \(\mathcal{C}(w_J)\)

    • Case a-2: \(m^*>m\)

    • Case a-3: \(m^*<m\).

    In both Cases a-1 and a-2, there exists \(C^*_i\) in \(\tilde{\mathcal{C}}(w_J)^*\) but not in \(\mathcal{C}(w_J)\). Then, B randomly chooses \(pr_j^* \in { Proof}_J^*\) that is a tag, and outputs a forged message-tag pair \(((0\Vert addr_j\Vert C^*_i), pr_j^*)\). Here we assume that A wins in \(\mathtt{{Game}}_{reli}\) and that B successfully predicts J, that is,

    $$\begin{aligned} \mathtt{Verify}(K,t(w_J),\tilde{\mathcal{C}}(w_J)^*,{ Proof}_J^*) = \mathtt{accept} \end{aligned}$$
    (7)

    holds with a non-negligible probability. We show that B’s output shown above is a valid forgery against MAC with a non-negligible probability. This means that \(pr^*_j=\mathtt{MAC}(0\Vert addr_j\Vert C^*_{i})\), and that B did not send the query \((0\Vert addr_j\Vert C^*_i)\) to its own MAC oracle. First, from Eqs. (3) and (7), there exists \(pr_{j'}^*\) such that \(\mathtt{MAC}(0\Vert addr_{j'}\Vert C^*_i) =pr_{j'}^*\), in \({ Proof}_J^*\). \(j=j'\) holds with at least probability 1Β /Β m. Next, we can see that B has never queried \((0\Vert addr_{j''} \Vert C^*_i)\) for any \(j''\) to its own MAC oracle when computing \(\mathcal{I}\), because \(C^*_i\not \in \mathcal{C}(w_J)\). Therefore, B has succeeded in forging a valid tag with non-negligible probability. In Case a-3, there exists \(C_{q_i}\) in \(\mathcal{C}(w_J)\), but not in \(\tilde{\mathcal{C}}(w_J)^*\). Then, \(pr_{q_i}^*\) consists of a pair of addrs and Ntag. Let \(pr_{q_i}^* = (addr_1^* \Vert addr_2^*, Ntag^*)\). B outputs \(((1 \Vert addr_1^* \Vert addr_2^*), Ntag^*)\) as a forgery. Assume that A wins in \(\mathtt{{Game}}_{reli}\). From Eqs.(4) and (5),

    $$\begin{aligned}&addr_{1}^*< addr_{q_i} < addr_{2}^*&\\&\mathtt{MAC}(1\Vert addr_{1}^* \Vert addr_{2}^*) = Ntag^*&\end{aligned}$$

    hold. Further, \(C_{q_i} \in \mathcal{C}(w_J)\) implies that \(addr_{q_i}\) appears in the addr field of \(\mathcal{I}\). Therefore, B did not query \((1\Vert addr'_{1} \Vert addr'_{2})\) to the MAC oracle for any \((addr'_{1}, addr'_{2})\) such that \(addr'_{1}< addr_{q_i} < addr'_{2}\). This means that if A wins in \(\mathtt{{Game}}_{reli}\) with non-negligible probability, then B succeeds with a non-negligible probability. This contradicts the assumption about the security of MAC.

  • Case b: \(\mathcal{C}(w_J)=\tilde{\mathcal{C}}(w_J)^*\) From \((\mathcal{C}(w_J),{ Proof}_J)\ne (\tilde{\mathcal{C}}(w_J)^*, { Proof}_J^*)\), it is obvious that \({ Proof}_J \ne { Proof}_J^*\). Suppose that

    $$\begin{aligned} { Proof}_J= & {} (pr_1,\ldots ,pr_N),\\ { Proof}_J^*= & {} (pr_1^*,\ldots ,pr_N^*) \end{aligned}$$

    Then, there exists an \(i\ s.t.\ pr_i \ne pr_i^*\). Since \(pr_i\) and \(pr_i^*\) are either a tag or a pair \((addr_1 \Vert addr_2,Ntag)\) of addrs and Ntag, every case will be either of the following four cases.

    • Case b-1: both \(pr_i\) and \(pr_i^*\) are tags

    • Case b-2: both \(pr_i\) and \(pr_i^*\) are pairs \((addr_1 \Vert addr_2,Ntag)\)

    • Case b-3: \(pr_i\) is a tag, and \(pr_i^*\) is a pair \((addr_1 \Vert addr_2, Ntag)\)

    • Case b-4: \(pr_i\) is a pair \((addr_1 \Vert addr_2, Ntag)\), and \(pr_i^*\) is a tag.

    In Case b-1, B knows the \(C_j\) which satisfies \(pr_i = \mathtt{MAC}(0 \Vert addr_i \Vert C_j)\). So it chooses another \(C_{j'}\) from \(\mathcal{C}(w_J)\) randomly, and outputs \((0 \Vert addr_i\Vert C_{j'})\) and \(pr_i^*\). In Cases b-2 and b-3, B outputs \(((1 \Vert addr_1^* \Vert addr_2^*), Ntag^*)\), where \(pr_i^*=(addr_1^* \Vert addr_2^*, Ntag^*)\). In Case b-4, if there exists an \(i' (\ne i)\) where Case b-3 is occurring, then B applies exactly the same method as in Case b-3 to \(pr_{i'}^*\) instead of \(pr_{i}^*\) Footnote 5. If A succeeds in breaking the reliability, B successfully predicts J in probability 1Β /Β q. When A wins and B successfully predicts J, then B successfully forges a MAC with probability at least 1Β /Β N. Therefore, we obtain

    $$ \Pr ({\mathbf{B}} \, \mathrm{succeeds}) \ge \Pr ( \mathbf{A}\mathrm{wins \, in} \,\mathtt{{Game}}_{reli}) \times \frac{1}{qN}. $$

    Note that q and N are polynomials of security parameter \(\lambda \).

As a result, our scheme satisfies reliability if MAC is unforgeable against chosen message attack. Β Β Β \(\square \)

From Theorems 1, 2, and 3, our scheme is UC-secure.

Corollary 1

If the symmetric encryption scheme \(\mathtt{SKE}=(G,E,E^{-1})\) is IND-CPA secure, and if \(\pi \) is a pseudorandom permutation, and if MAC is existentially unforgeable against chosen-message attacks, then the above scheme is UC-secure with leakage \(L^{new}\).

The leaked information under our scheme is slightly more than the leakage from the KO-scheme. In particular, our scheme leaks the number of keywords to the server.

Should the client prefer to avoid leaking the exact number of keywords, dummy keywords can be added to \(\mathcal{W}\). Of course, the more dummy keywords that are added, the larger the index grows. This is constitutes a trade-off between security and computational costs.

Nevertheless, the proposed scheme can modify the maximum length of a keyword l by replacing the permutation \(\pi \) with a collision resistant hash function.

3.3 Comparison

TableΒ 1 shows the index size and the computational cost for each algorithm, comparing the KO-scheme with the proposed scheme. In the estimation on the cost of Enc algorithm, we ignored the cost of sorting, which we see it as negligible compared to MAC. We eliminated the rows for algorithms Gen, Trpdr, Dec, inasmuch as they are exactly the same in both schemes.

We can see that our Enc algorithm is much more efficient and that the index size is much smaller than the KO-scheme, when \(|\mathcal{W}| \ll 2^l\). However, our Search and Verify algorithms are less efficient than those in the KO-scheme, because our scheme requires an extra step to search the addrs from the addr field in Search, and to search the \(C_i\)s corresponding to tags in Verify.

Table 1. Comparison of the efficiency of proposed scheme with the KO-scheme

dummy

4 Conclusion

In this paper, we provided generalized definitions for the privacy and UC-security of SSE schemes, and we proposed a vSSE scheme as a modified version of the KO-scheme. Whereas the privacy of the proposed scheme is slightly weaker than the original, the index size is much smaller when the set of keywords is a very sparse subset of l-bit strings for some l.

Importantly, the idea of Ntag to prove β€œnon-existence” can be applied widely. For example, the weak point of the (dynamic) vSSE scheme [19] can be overcome by adding an Ntag after sorting the elements in \(\mathcal{I}\) based on \(\mathtt{label}\). Similarly, most SSE schemes might be converted to vSSE schemes by including a tag and an Ntag.