1 Introduction

Using a remote database like a cloud storage service is the most common method for managing huge data. When data are stored in a remote database, keeping the confidentiality of the stored data becomes the most important issue. To protect stored data, cloud storage managers have adopted various means such as user authentication and access control. However, these are not fundamental solutions, as the cloud storage manager can easily access the stored data. Another way to maintain the privacy of stored data is applying encryption systems. Well-developed encryption systems guarantee the secrecy of the encrypted data from everyone other than the data owner. Indeed, applying the usual encryption systems to cloud storages can be excessive in the sense that encryption systems prevent not only information leakage, but also useful operations such as search and computation. It causes a great deal of inconvenience. When a data owner wants to search a specific document within a cloud storage, the cloud server cannot carry out the search process by itself. Therefore, the entire set of encrypted documents has to be sent to the data owner, the data owner runs the search process after decryption. Searchable encryption is proposed to solve this problem, providing a method of searching a specific encrypted document without decryption.

Since Song et al. introduced the concept of encrypted data search [1], searchable encryption is formalized with many following researches [28]. In searchable encryption, it is assumed that each document consists of several keywords, and a query is determined by keywords that the user wants to search. Because ciphertexts of encryption systems are designed to reveal no information about the corresponding plaintext, searchable encryption provides clues for searching, which is called ‘index’. The index contains information about relations between keywords and encrypted documents. However, an index is also locked to keep the secrecy, and a special key, called a ‘trapdoor, is needed to open it. Since search processes are carried out by the server, the user has to generate and hand over a trapdoor to the server.

Usually, a searchable encryption system consists of four algorithms: key generation, build index, trapdoor generation, and search. In the key generation procedure, the user prepares system parameters including encryption and decryption keys. A set of documents and system parameters are given to the build index algorithm as an input. The build index algorithm outputs encrypted documents and indexes. Outputs of the build index algorithm are sent to the server and stored. If the user wants to search for documents, then the user runs the trapdoor generation algorithm. The keywords that the user wants to search and the user’s secret key are input into the trapdoor generation algorithm. After the trapdoor is given to the server, the server runs the search algorithm. Trapdoor and indexes are input for the search algorithm, and the result of the search algorithm is a set of documents corresponding to the query. Finally, the results of the search algorithm are given to the user.

We can roughly classify searchable encryptions into two categories: public key-based systems and symmetric key-based systems. These two types have different characteristics and different areas of application. The main advantage of public key-based systems is that anyone can provide encrypted documents. Public key-based systems are also capable of adding various abilities easily. On the other hand, symmetric key-based systems have relatively small computational cost for encryption, decryption, or searching. Therefore, symmetric key-based systems are more proper for managing a huge data set than public key-based systems. A privacy-preserving e-mail server can be a typical example of public key-based searchable encryption, while a remote storage system can be a typical application of searchable symmetric encryption.

Basic searchable encryption provides a search algorithm that can find documents corresponding to just one specific keyword. However, this basic search algorithm is very limited and cannot satisfy various demands that naturally arise. Therefore, designing a searchable encryption with useful additional functions is an important goal in searchable encryption. Conjunctive keyword search, range query, ordering, size comparison, and arbitrary search are additional functions which are frequently remarked upon. In this paper, we concentrate on the range query problem.

Formally, the result of a range query is a set of documents of which corresponding keywords are included within a queried interval, a set of successive keywords. To achieve a range query regarding an interval [ab] using ordinal searchable encryption, the user has to do simple keyword searches \(b-a+1\) times repeatedly. This is a very inefficient and insecure method. Because the server, which actually runs the search algorithm, trivially obtains information regarding the size of the queried range and can also divide the result into \(b-a+1\) subsets—each subset corresponds to a single keyword. Up to now, a few protocols have been proposed to solve the range query problem. One was proposed by [9], in which they adapted tree structure and developed a multi-dimensional range query system. Another was proposed by [10], in which the searchable encryption could support many additional functions: conjunctive keyword search, subset query, and range query. The main building block of their work is a bilinear map defined over elliptic curves. Both proposals are based on public key-based techniques.

On the other hand, order-preserving encryption (OPE) is also capable of dealing with range query problem on encrypted data [1113]. OPE is an encryption system which preserves order relations of plaintexts. In other words, \(m_1 > m_2\) implies \(\mathrm{OPE}_k(m_1) > \mathrm{OPE}_k(m_2)\) for any encryption key k, where \(\mathrm{OPE}_k(m_i)\) means a ciphertext of \(m_i\) which is encrypted using k in an OPE manner. In OPE systems, order relations of plaintexts are simply open to anyone, so that utilizing encrypted data—including range query, sorting, min, max, binary search—is easy. However, it is well known that security provided by OPE systems is very limited and is not adequate for data which requires high confidentiality. Moreover, applicable area of OPE is also limited—each document in OPE systems is assumed to have only one (numerical) keyword. For these reasons, we focus our interest on searchable encryption systems here.

In this paper, we propose a new searchable encryption protocol that provides efficient range query, by utilizing only symmetric key encryption systems. Moreover, it has another advantage, the search time depends on the number of documents that correspond to the query outcome rather than to the size of the entire database.

The organization of this paper is as follows. In Sect. 2, we introduce designing concepts and a formal construction for the new protocol. The security proof is presented in Sect. 3. In Sect. 4, we provide comparison of efficiency. Finally Sect. 5 concludes this paper.

Fig. 1
figure 1

Linked chain structure

2 New protocol

2.1 Basic idea

The intrinsic property of range query deals with consecutive keywords simultaneously. Up to now, tree structure is mainly considered to manage neighboring keywords efficiently. However, the tree structure is very limited to manage arbitrary intervals. Therefore, in searchable encryption protocols based on tree structure, the user and the server have to deal with several nodes separately. To mitigate this shortcoming, we present a new approach for range query with linked chain structure.

Curtmola et al. proposed a symmetric searchable encryption (SSE) in 2006 [6]. In this work, they adapted linked chain structure for constructing SSE. This was a new approach for searchable encryption problem. In the classical searchable encryption, an index is computed for each document and attached to an encrypted document. The search algorithm requires comparing every index to a given trapdoor. This means that the running time of the search process cannot be reduced below O(N), where N represents the number of stored documents. However, in Curtmola’s approach, documents that are related to a specific keyword consist of one linked chain. Therefore, the search process simply follows the linked chain, gathering identifiers of documents. To maintain secrecy, every node of the linked chain is positioned randomly in an array and encrypted. A link, which consists of the node’s position and decryption key, is stored at the ancestor node. The trapdoor contains the first node’s position and decryption key. This approach has a great advantage: the running time of the search algorithm depends on the number of matched documents, instead of on the number of total documents (Fig. 1).

If we insert a link from the ‘ending point’ of a chain corresponding to keyword ‘1’ to the ‘starting point’ of a chain corresponding to keyword ‘2’, then the two chains are connected, and searching with the trapdoor of keyword ‘1’ gives documents that correspond to keywords ‘1’ or ‘2’, i.e., an interval from 1 to 2. Similarly, if we connect the ending point of a chain corresponding to keyword ‘i’ to the starting point of a chain corresponding to keyword ‘\(i+1\)’ for \(1 \le i < R\), then we can obtain a trivial solution for a range query with an interval [aR], where \(1 \le a \le R\). Note that a trapdoor for the keyword ‘a’ returns documents corresponding to an interval [aR] (Fig. 2). However, this trivial solution is not satisfactory. There is a problem: the ending point is fixed so that it is hard to manage a general interval [ab], for \(1 \le a\), \(b \le R\).

Fig. 2
figure 2

Connection between linked chains

To solve the problem, we introduce multi-layered linked chain structure. Assume that every keyword belongs to [1, R], where \(R = 2^\ell \) and \(\ell \) is an integer, for simplicity. We divide [1, R] into two sub-intervals \([1,E_{0,0}]\) and \([E_{0,0}+1, R]\), and connect chains in \([1, E_{0,0}]\) and \([E_{0,0}+1, R]\) separately. In \([1, E_{0,0}]\), each external link connects from the chain corresponding to keyword ‘i’ to the chain corresponding to keyword ‘\(i+1\)’, i.e., in increasing direction. Oppositely, in \([E_{0,0}+1, R]\), chains are connected in decreasing direction. Then, we can manage every interval [ab] with \(1 \le a \le E_{0,0} \le b \le R\) easily. A trapdoor consists of the starting points and encryption keys for chain ‘a’ and chain ‘b’. By searching two linked chains, all documents related to the interval [ab] are searched.

If [ab] does not satisfy the condition \(1 \le a \le E_{0,0} \le b \le R\), it means that [ab] completely belongs to \([1, E_{0,0}]\) or \([E_{0,0}+1, R]\). In that case, we can manage the interval by dividing \([1, E_{0,0}]\) and \([E_{0,0}+1, R]\) into two sub-intervals and constructing other linked chains, respectively. To manage a general interval [ab], we repeat the above process until each interval contains only one keyword. By the assumption that \(R = 2^\ell \), total \(\ell \) linked chains are generated for each keyword, if every interval is divided into two sub-intervals with the same size. With this multi-layered linked chain structure, we can manage a general interval [ab] easily and efficiently. For all cases, at most two starting points and encryption keys are required as a trapdoor (Fig. 3).

2.2 Formal construction

In this section, we will introduce a formal construction of the proposed protocol. We assume followings: there are a total of N documents in a data set and each document contains one keyword that is an integer between 1 and R. For convenience, we assume that \(R = 2^\ell \) for an integer \(\ell \). Each document has a unique identifier.

Fig. 3
figure 3

Multi-layered linked chain structure

2.2.1 Key generation

The user prepares a symmetric encryption system E and selects a \({\uplambda }\)-bit secret key k. From now on, \(E_k(m)\) represents a ciphertext of a message m encrypted using a secret key k. The user also selects two cryptographically secure one-way functions, \(f : \{1, \ldots , \ell \} \times \{0, 1\}^\ell \rightarrow \{0, 1\}^\mu \) and \(h : \{1, \ldots , \ell \} \times \{0, 1\}^\ell \rightarrow \{0, 1\}^{\uplambda }\), where \(\mu = \lceil \log _2(\ell \times (N + R)) \rceil \). The secret key k and one-way functions f and h are kept secret.

2.2.2 BuildIndex

  1. 1.

    Preparation

    1. (a)

      The user constructs an array A, which has \(2^\mu \) elements, where each element \(A[i] = (ID_i, \langle P_i, K_i \rangle )\). Initially, every component of A[i] has a special value, ‘EMPTY’. (Later, \(ID_i\) will store a document identifier, and \(\langle P_i, K_i \rangle \) will store a link to the next element of a linked chain.)

    2. (b)

      The user constructs an array B, which is a temporary storage of encryption keys for each element of A. B has \(2^\mu \) elements and each element is \({\uplambda }\)-bits in size.

    3. (c)

      Let S be a set of all documents. The user divides S into R subsets \(S_1, S_2, \ldots , S_R,\) so that \(S_i\) is a set containing every document that corresponds to keyword ‘i’.

  2. 2.

    Marking ‘starting points’

    1. (a)

      For each t and i, with \(1 \le t \le \ell \), \(1 \le i \le R\), compute f(ti) and reserve A[f(ti)] for the ‘starting point’ of the chain with the t-th layer and the keyword i.

    2. (b)

      Store h(ti) at B[f(ti)].

  3. 3.

    Constructing a linked chain for a single keyword.

    1. (a)

      For each t and i, with \(1 \le t \le \ell \), \(1 \le i \le R\), repeat followings.

      1. (i)

        If \(S_i\) is an empty set, then \(ID_{f(t, i)}\) is filled with a random bit string which is distinguishable from proper document identifiers only by the user. (For example, an integer which is greater than the maximal identifier, or simply all zero bits. Note that each node will be encrypted using a unique key; therefore, storing the same bit string several times does not harm the security.) We skip all the following procedures and move to the next t and i.

      2. (ii)

        Choose \(|S_i| - 1\) empty nodes from A randomly. For each chosen node, a random \({\uplambda }\)-bit encryption key is selected and stored at the corresponding element in B.

      3. (iii)

        Construct a linked chain structure over \(|S_i|\) nodes. The starting node is A[f(ti)] and the other \(|S_i| - 1\) nodes are selected at the previous step. Note that if \(|S_i|\) = 1, then the chain is constructed with only one starting node.

      4. (iv)

        Store \(|S_i|\) identifiers at each node in the chain, respectively.

    2. (b)

      For convenience, we will denote chains constructed by the above procedure as \(T_{t,i}\) with \(1 \le t \le \ell \), \(1 \le i \le R\).

  4. 4.

    Computing ‘ending points’ and separating intervals.

    1. (a)

      Set the initial interval \(R_{0,0} = [1, R]\).

    2. (b)

      Repeat the following for all t from \(t = 1\) to \(t = \ell \), and \(d = 0, ..., 2^{t-1}-1\).

      1. (i)

        For an interval, \(R_{t-1,d} = [a,b]\), compute \(E_{t-1,d} = \lfloor (a+b)/2 \rfloor \), where \(\lfloor x \rfloor \) means the largest integer which is smaller than or equal to x.

      2. (ii)

        Separate \(R_{t-1,d}\) into two sub-intervals, \(R_{t,2d} = [a, E_{t-1,d}]\) and \(R_{t,2d+1} = [E_{t-1,d}+1, b]\).

  5. 5.

    Connecting external links

    1. (a)

      For each \(R_{t,d} = [a,b]\), where \(1 \le t \le \ell \) and \(0 \le d \le 2^{t-1}\), the user repeats the following.

      1. (i)

        If d is an even number, connect linked chains \(T_{t,a}, T_{t,a+1}, \ldots , T_{t,b}\) in increasing order, i.e., store an external link, \(\langle f(t,i+1), h(t,i+1) \rangle \), at the last node of \(T_{t,i}\).

      2. (ii)

        Otherwise, connect linked chains \(T_{t,a},\) \(T_{t,a+1},\) \(\dots , T_{t,b}\) in decreasing order, i.e., store an external link, \(\langle f(t,i-1), h(t,i-1) \rangle \), at the last node of \(T_{t,i}\).

  6. 6.

    Encrypting array A

    1. (a)

      Each element A[j] of A is replaced by encrypted value, i.e., \(E_{B[j]}(A[j])\).

    2. (b)

      If A[i] is empty then the user generates a random string with the same length instead of encryption.

After the above processes are completed, each document \(D_j\) is encrypted using the secret key k. The encrypted documents and the index, array A, are sent to the server and stored.

Remark In the above description, every \(E_{t,d}\) is the middle point of a given interval. Actually, there is no need to fix \(E_{t,d}\) as the middle point. It is easy to see that randomly chosen \(E_{t,d}\) does not affect the correctness of the protocol. With the modification, the number of layers is changed and the user must keep memorizing where the intervals are divided.

2.2.3 Trapdoor

The trapdoor function outputs a trapdoor when a given interval [ab] satisfies \(a = E_{t,d}+1\) or \(b = E_{t,d}\) for some t and d.

  1. 1.

    If \(b = E_{t,d}\) for some t and d, then

    $$\begin{aligned} \text{ Trapdoor } = \langle f(t+1, a), h(t+1, a) \rangle . \end{aligned}$$
  2. 2.

    If \(a = E_{t,d} + 1\) for some t and d, then

    $$\begin{aligned} \text{ Trapdoor } = \langle f(t+1, b), h(t+1, b) \rangle . \end{aligned}$$

If a given interval [ab] does not satisfy the above conditions, then the user separates [ab] into two sub-intervals \([a, E_{t,d}]\) and \([E_{t,d}+1, b]\). Note that there is a pair of t and d which satisfy \([a, b] \subseteq R_{t,d}\) and \(a \le E_{t,d} < b\) for any interval [ab]. Then, the user makes two trapdoors separately and queries twice.

2.2.4 Search

Assume that the server receives a trapdoor \(\langle v, k \rangle \) from the user.

  1. 1.

    The server prepares an empty set \(\mathcal {I}\).

  2. 2.

    The server repeats the following search process.

    1. (a)

      Find A[v] and decrypt A[v] using the secret key k. Assume that the server obtains \((ID_v, \langle P_v, K_v \rangle )\), after decrypting A[v].

    2. (b)

      Put \(ID_v\) in \(\mathcal {I}\).

    3. (c)

      If \(\langle P_v, K_v \rangle \) is not empty, then repeat the above process using \(\langle P_v, K_v \rangle \).

  3. 3.

    The server gives the final \(\mathcal {I}\) to the user as a result.

3 Security

3.1 Notions

Since the proposed protocol is based on SSE-1 proposed by [6], the security proof is also very similar. We introduce security notions and non-adaptive indistinguishability security for symmetric searchable encryptions given by Curtmola et al.

A ‘history’ is interaction between a user and a server. If we assume that a user made q queries, then history \(H^{(q)}\) can be uniquely determined by the collection of documents stored and q queries. Thus, we can define a history as \(H^{(q)} = (S, w_1, \dots , w_q)\), where S is a collection of all documents and each \(w_i\) is a queried keyword. Moreover, we can also define a partial history \(H^{(q:\tau )}\) as \((S, w_1, \dots , w_\tau )\), where \(\tau \le q\).

Intuitionally, a history is what the user wants to hide from the server. However, the server can obtain some information such as the index, the number of documents, trapdoors, and encrypted documents. A set of these, i.e., what the adversary actually gets to “see” during interactions, is called a ‘view’ of the given history. Formally, we can define a view of history \(H^{(q)}, V_K(H^{(q)})\) as \((id(D_1), \dots , id(D_N), E(D_1), \dots , E(D_N), ~ Index, ~ T_1,\) \(\dots , T_q)\), where E(D) is a ciphertext of D, \(T_i\)s are trapdoors, and K is a set of secret keys. Similarly, the partial view, \(V_K^{(\tau )}(H^{(q)})\), is \((id(D_i), \dots , id(D_N), E(D_1), \dots , E(D_N), ~ Index, ~ T_1,\) \(\dots , T_\tau )\), where \(\tau \le q\). Trivially, the view should not reveal any information about the history.

With a notion of ‘trace’, we mean the information we are willing to leak about the history. The outcome of each search, i.e., identifiers of documents that contain a queried keyword, is leaked to the server. The pattern of searches, the information of correspondence between two trapdoors, is also revealed to the server. Moreover, since the encrypted documents are stored on the server, we can assume that encrypted documents and identifiers are also leaked. Formally, the trace, \(Tr(H^{(q)})\) can be defined as \(Tr(H^{(q)}) = (id(D_1), ~ \dots , ~ id(D_N),\) \(|D_1|, ~ \dots , ~ |D_N|, ~ S(w_1), ~ \dots ,\) \(S(w_q), {\varPi }_q)\), where \(S(w_i)\) is a set of identifiers of documents that correspond to the query \(w_i\) and \({\varPi }_q\) means search pattern, \({\varPi }_q[i,j] = 1,\) if \(w_i = w_j\) and \({\varPi }_q[i,j] = 0\) otherwise, for \(1 \le i, j \le q\).

Substituting a single keyword \(w_i\) to an interval \([a_i, b_i]\) which satisfies the condition, \(a_i = E_{t,d}+1\) or \(b_i = E_{t,d}\) for some t and d, we can adopt these notions to range query protocols. The pattern of search \({\varPi }_q\) is also changed to \({\varPi }_q [i, j] = 1\) if \([a_i, b_i] \subseteq [a_j, b_j]\), and \({\varPi }_q [i,j] = 0\) otherwise, for \(1 \le i, j \le q\).

Definition

(Non-adaptive indistinguishability security for SSE) An SSE protocol is secure in the sense of non-adaptive indistinguishability if for all q, all (non-uniform) probabilistic polynomial time adversaries \(\mathcal {A} = (\mathcal {A}_1, \mathcal {A}_2)\), and all polynomials p and sufficiently large \({\uplambda }\),

$$\begin{aligned}&\Big | Pr[b'=b | K \leftarrow {Keygen}(1^{\uplambda }); (H_0, H_1, state) \leftarrow \mathcal {A}_1; \\&\quad b \leftarrow _R \{0,1\}; b' \leftarrow \mathcal {A}_2(V_K(H_b), state)] - \frac{1}{2} \Big | < 1/p({\uplambda }), \end{aligned}$$

where \((H_0, H_1)\) are histories of q queries such that \(Tr(H_0) = Tr(H_1)\), where the state is a polynomially bounded string that captures \(\mathcal {A}_1\)’s state, and the probability is taken considering the internal coins of Keygen, \(\mathcal {A}_1\), and the underlying Buildindex algorithm.

3.2 Security proof

Theorem

The proposed protocol satisfies non-adaptive indistinguishability security.

Proof For the proof, we define a simulator \(\mathcal {S}\): with the input Tr(H), a probabilistic polynomial-time algorithm \(\mathcal {S}\) generates \(V'(H)\), which is indistinguishable from \(V_K(H)\). We claim that constructing such a simulator is sufficient to prove the above theorem. If there exists such a simulator \(\mathcal {S}\), then from two histories \(H_1\) and \(H_2\) with \(Tr(H_1) = Tr(H_2)\), there is no polynomial time distinguisher that can distinguish \(V_K(H_1)\) from \(V_K(H_2)\). Note that if \(Tr(H_1) = Tr(H_2)\), then \(\mathcal {S}(Tr(H_1)) = \mathcal {S}(Tr(H_2))\). By the assumption, \(\mathcal {S}(Tr(H_1))\) is indistinguishable from \(V_K(H_1)\), and \(\mathcal {S}(Tr(H_2))\) is indistinguishable from \(V_K(H_2)\).

Now, we start with the construction of \(\mathcal {S}(Tr(H^{(0)}))\) for the case of \(q = 0\). For \(q = 0\), \(\mathcal {S}\) generates \(V'(H^{(0)}) = (id(D_1), \dots , id(D_N), E'(D_1), \dots , E'(D_N),\) \(Index' = A'),\) such that \(id(D_i)\) is copied from \(Tr(H^{(0)})\) and \(E'(D_i)\) is a randomly selected \(|D_i|\)-bit string for \(1 \le i \le N\). Finally, \(\mathcal {S}\) constructs \(Index' = A'\) with randomly chosen strings. In other words, \(\mathcal {S}\) makes \(A'\) to be an array with \(\ell \times (N+R)\) elements, where each element is filled with random string (with the same number and same length as ordinary A).

We claim that no probabilistic polynomial time adversary \(\mathcal {A}_2\) can distinguish \(V'(H^{(0)})\) from \(V_K(H^{(0)}) = (id(D_1), \dots , id(D_N), E(D_1), \dots , E(D_N), A)\) for any \(H^{(0)}\). By a standard hybrid argument, if there exists an adversary \(\mathcal {A}_2\) that can distinguish \(V'(H^{(0)})\) from \(V_K(H^{(0)})\), then we can easily deduce that \(\mathcal {A}_2\) can distinguish between at least one of the elements of \(V'(H^{(0)})\) and its corresponding element in \(V_K(H^{(0)})\). Note that the view of history consists of elements of three types: identifiers of documents, encrypted documents, and the array A. Since the identifiers of documents in \(V'(H^{(0)})\) are exactly the same as those in \(V_K(H^{(0)})\), it is not possible to distinguish the identifiers. From the security of an embedded encryption system, a ciphertext \(E(D_i)\) is indistinguishable from a random string \(E'_i\). Finally, note that each element of the array A is also encrypted using a semantically secure encryption system. Therefore, it cannot be distinguishable from a random string with the same length.

For \(q > 0\), \(\mathcal {S}\) constructs \(V'(H^{(q)}) = (id(D_1), \dots ,\) \(id(D_N), \, E'_1, \, \dots , \, E'_N,\) \(T'_1, \, \dots , \,\) \(T'_q, \, \dots , \, A')\), such that \(id(D_i)\) is copied from \(Tr(H^{(q)})\) and \(E'(D_i)\) is randomly chosen string with the same length to \(D_i\), for \(1 \le i \le N\). Recall that \(Tr(H^{(q)})\) \( = (id(D_1), \, \dots , \, id(D_N), \, |D_1|, \, \dots , \, |D_N|, \, S([a_1, b_1]), \, \dots ,~\) \(S([a_q, b_q]), {\varPi }_q)\), where \(S([a_i,b_i])\) is the set of documents that correspond to the query. To build \(A'\), \(\mathcal {S}\) chooses a symmetric key encryption system \(E'\), two pseudorandom functions \(f'\) and \(h'\), randomly. Also, \(\mathcal {S}\) defines \(S'_0\), a set of dummy documents, which satisfy the following: no document in \(S'_0\) is matched to any \([a_i, b_i]\), and \(|S([a_1, b_1]) \cup S([a_2, b_2]) \cup \dots \cup S([a_q, b_q]) \cup S'_0 | = N\). Let \(S' = S([a_1, b_1]) \cup S([a_2, b_2]) \cup \dots \cup S([a_q, b_q]) \cup S'_0\). Here, \(\mathcal {S}\) runs BuildIndex algorithm over \(S'\) using \(f', g', E'\) and obtains the array \(A'\). Finally, \(\mathcal {S}\) computes trapdoors ordinarily using \(f', g'\), and \(A'\).

We again claim that no probabilistic polynomial time adversary \(\mathcal {A}_2\) can distinguish \(V'(H^{(q)})\) from \(V_K(H^{(q)}) = (id(D_1), \dots , id(D_N), E(D_1), \dots , E(D_N),\) \(T_1, \dots ,\) \(T_q, A)\). By a standard hybrid argument, if there exists an adversary \(\mathcal {A}_2\) that can distinguish \(V'(H^{(q)})\) from \(V_K(H^{(q)})\), then we can easily deduce that \(\mathcal {A}_2\) can distinguish between at least one of the elements of \(V'(H^{(q)})\) and its corresponding element in \(V_K(H^{(q)})\). The identifiers are exactly the same as those of \(V_K(H^{(q)})\), and the secrecy of the embedded encryption algorithm guarantees the indistinguishability of encrypted documents, so we can easily notice that it is impossible to distinguish. Every element in \(A'\) is encrypted using a semantically secure encryption system. Therefore, from the security of the encryption system, we can deduce that no polynomial time adversary can distinguish these nodes from the nodes in A. For trapdoors, we should check whether the linked chain structure in \(A'\) is correct. In other words, using given trapdoors, we must be able to obtain the correct search results, which are the similar to those of \(H^{(q)}\). Note that each \(S([a_i, b_i])\) constitutes a proper linked chain and each trapdoor represents the locations and encryption keys for the starting point of a corresponding linked chain. Thus, we can obtain the same results though the regular search process using the given trapdoors. This means that no adversary can distinguish \(A'\) from A.

The proposed protocol is secure in the sense of non-adaptive indistinguishability if it satisfies the following equation for all positive integers q, all (non-uniform) probabilistic polynomial time adversaries \(\mathcal {A} = (\mathcal {A}_1, \mathcal {A}_2)\), and all polynomials p and sufficiently large \({\uplambda }\):

$$\begin{aligned}&\Big | Pr[ b'=b | K \leftarrow {Keygen}(1^{\uplambda }) ; (H_0, H_1, state) \leftarrow \mathcal {A}_1 ; \\&\quad b \leftarrow \{0,1\}; b' \leftarrow \mathcal {A}_2(V(H_b), state)] - \frac{1}{2} \Big | < 1/p({\uplambda }), \end{aligned}$$

where \(H_0\) and \(H_1\) mean histories over q queries such that \(Tr(H_0) = Tr(H_1)\), state is a polynomially bounded string that captures the internal state of \(\mathcal {A}_1\), and the probability is taken over the internal coins of Keygen, \(\mathcal {A}\), and the underlying BuildIndex algorithm.

Assume that there exists \(\mathcal {A}_2\) that can find \(b'\) with a probability significantly greater than 1/2. This means that \(\mathcal {A}_2\) can distinguish \(V_K(H_0)\) from \(V_K(H_1)\) with non-negligible probability. However, from the claim presented above, we already show that there exists a polynomial time algorithm \(\mathcal {S}\) that can simulate the view from the trace. From the fact that \(Tr(H_1) = Tr(H_2)\), we obtain \(\mathcal {S}(Tr(H_1)) = \mathcal {S}(Tr(H_2))\). From the claim, we can deduce that \(V_K(H_1)\) is indistinguishable from \(\mathcal {S}(Tr(H_1))\), and \(V_K(H_2)\) is also indistinguishable from \(\mathcal {S}(Tr(H_2))\). Therefore, there is no polynomial time distinguisher that can distinguish \(V_K(H_1)\) from \(V_K(H_2)\) and that contradicts to the assumption that \(\mathcal {A}_2\) can find \(b'\) with a probability significantly greater than 1/2. This is the end of the proof.

3.3 Other security concern

In the proposed scheme, someone may worry that in the search process documents are ordered by keyword. Note that the similar problem arises in every range query protocols due to the intrinsic property of range query. For example, when two intervals \(Q_1 = [1,4]\) and \(Q_2 = [2,4]\) are queried to the server and \(D_1\) is the result for \(Q_1\) and \(D_2\) is the result for \(Q_2\) (there is no need to be simultaneous, since the server can view and store every query), then trivially the server can guess that \(Q_1 \supset Q_2\) and additionally the server can know that the documents in \(D_1 - D_2\) are related to the smallest (or the largest) keyword with a great probability. Therefore, information leakage about relation among keywords cannot be prevented completely in range query protocols.

However, the information leakage can be reduced using linked tree structure [8]. In the linked tree structure, document identifiers are collected in a probabilistic manner and not sequential. Therefore, the information leakage about relation among keywords is reduced.

4 Analysis

4.1 Efficiency

4.1.1 Index size

We set the number of elements in A as \(\ell \times (N + R)\) in the previous sections. Here, we will show the reason for this. Assume that there are a total of N documents and R keywords. Recall the BuildIndex procedure. For each \(t = 1, 2, \dots , \ell \), each document is stored just once. Thus, for the best case, N nodes are required to store the documents. Additionally, we need to maintain R linked chains. It is essential to organize a linked chain structure. This means that even if \(S_i\) is empty, we need to construct a linked chain with one element. Actually, for one empty \(S_i\), the number of required nodes increases by one. For the worst case, at most \(N+R-1\) (simply \(N+R\)) elements are required. Since the number of elements required is not changed by the value of t, a total of \(\ell \times (N + R)\) elements are needed. Recall that the size of each element is \(\log _2 N + (\log _2 (\ell \times (N+R)) + {\uplambda })\) bits, where \(\ell \) is \(\log _2 R\) and \({\uplambda }\) is a security parameter.

Up to now, we have not considered a collision of f, an event in which f outputs the same value for different inputs. A collision disturbs the construction of linked chain structures. The more an array A is filled, the more serious the collision effect becomes. In the proposed protocol, however, only the starting points of linked chains are influenced by the collision. The other elements are randomly chosen among empty elements instead of the output of f. Compared to the size of A, the number of starting points is very small, thus starting points are very sparsely positioned in A. Since there are many well-developed methods to manage a sparse hash table efficiently, we can solve the collision problem (usually, at a cost of slightly increased storage).

4.1.2 Trapdoor size

The trapdoor consists of only two links independent of the size of the queried interval. One link is a pair of position information and secret key, which requires \(\log _2 (\ell \times (N+R)) + {\uplambda }\) bits.

Table 1 Efficiency comparison

4.1.3 Search time

Let m be the number of documents that are matched to the query. In the proposed protocol, there is no need to check the entire database to obtain the results for a given trapdoor. The server explores at most R linked chains being guided by links. The exploration only visits \(m+R\) elements at most. Moreover, searching one element requires only one symmetric key decryption. In the literature, the proposed protocol is the first searchable encryption protocol of which the range query requires a searching time below O(N).

4.2 Comparison

Only a few searchable encryption protocols with range query ability have been proposed. Among the previous protocols, MRQED, proposed by [9], and HVE, proposed by [10], are the most remarkable. Thus, we will compare the proposed protocol to HVE and MRQED. Indeed, a naive comparison can be somewhat meaningless because HVE and MRQED deal with wider problems than a simple range query. HVE provides a general method for conjunctive queries, subset queries, and range queries at the same time. In MRQED, a multi-dimensional range query is of main concern. However, there is no better counterpart for comparison. In the comparison, we focus on only a one-dimensional range query problem. We assume that there are N documents and each document has one keyword that is an integer chosen in [1, R]. In the table, m means the number of documents that correspond to the query.

The table shows that the proposed protocol is very efficient in comparing the trapdoor size and search time. Moreover, since the proposed protocol employs no public key-based computations, the real searching speed might be quicker (Table 1).

5 Conclusion

In this paper, we present a new symmetric searchable encryption protocol providing efficient range query. To our knowledge, it is the first symmetric searchable encryption that allows a range query. The multi-layered chain structure is the main idea of its construction. The proposed protocol inherits the advantages of symmetric key encryptions, so that it requires a small searching time. Especially, it is the first searchable encryption in which the search time of range query is independent of the size of the entire database. It also satisfies non-adaptive indistinguishability security. Future works must focus on improving the proposed scheme to provide easy ways for adding new documents and deleting documents from linked lists.