
6.1 Introduction

The drawback of most privacy-enabled retrieval techniques is that they adopt either information-theoretic privacy, which relies on replicated databases, or computationally bounded privacy, which relies on cryptographic assumptions, and therefore assure only partial privacy. The user, however, assumes that his privacy is always guaranteed by the other party. This monopoly position of the server leads to several problems.

Scenario-1: Consider that all patients with a particular disease in some region have uploaded their health cards, in plain or encoded form, to a server that offers policy-driven privacy assurance.

The server has several possible moves. What if the server shares the stored patient information with healthcare industries that are eager to know the disease count in order to manufacture more products? What if the server shares the information with other regional bodies with which it already has food and nutrition exchange agreements?

What is the negative consequence? The healthcare industries may pressure the government body to exchange food that causes malnutrition, or to set a customized health threshold so that more people fall within it. In turn, analytics-enabled pharmaceutical industries may advertise their products so that physicians prescribe those same products. If this business cycle continues, then at some point in time all people will become patients.

Scenario-2: Let a public database maintain all the information particular to its domain (e.g., search engines, social media, multimedia, patents). Let an authenticated user search or retrieve the subset of the database to which he is subscribed. What happens when an analytics-enabled server tracks all the search or retrieval sequences of a particular user, or of a group of users, related to a particular domain? What if the server shares its analytical results with the user's business competitor?

Scenario-3: Consider a severe war between two rivals on the battlefield. What if the global positioning system (GPS) server tracks one opponent's live-stream information and shares it with others?

Scenario-4: Suppose two peers, such as two military commanders, want to share secret information securely and privately over an insecure communication channel or through a mediator, or two end devices want to communicate with end-to-end encryption, as in a “private chat.” What if the mediator or third party reveals the communicated information?

Perfect Privacy Solution:

To overcome the above problems, the only solution is to shift from a policy-driven privacy architecture to a protocol-driven privacy architecture. Therefore, we introduce a new protocol-driven (i.e., scheme-level) perfect privacy-preserving information retrieval scheme based on private information retrieval (PIR).

Private information retrieval (PIR) [7] is a way of privately reading a bit from another party's database, and private block retrieval (PBR), a realistic extension of PIR, is a way of privately reading a single block from the database. We construct a new single-database PBR scheme that is neither information-theoretic (it uses no replicated database) nor computationally bounded (it relies on no privacy assumption). The proposed scheme fully supports “perfect privacy,” i.e., all queries are mutually independent and give no information, not even partial information, about the user's interest. Unless stated otherwise, the term “PIR” denotes PBR throughout, “privacy” refers to user privacy, and “perfect privacy” refers to zero privacy leakage.

The construction uses quadratic residuosity as the underlying data privacy operation. This property is used only to protect data from intermediate adversaries on the communication channel; it plays no role in hiding the user's interest. Even if the server identifies the quadratic residuosity of the numbers sent in the query, it gains no information about which block the user wants. On this basis, we claim that the construction supports perfect privacy and that the success probability of identifying the user's interest equals that of random guessing.

Finally, we have achieved the following results.

  • We overcome the trivial database-download requirement stated in [7] by achieving an overall communication cost of o(n), where n is the database size, which is non-trivial communication in the sense of [9].

  • The protocol is generic and can be adopted by both client-server and peer-to-peer privacy-critical applications.

Related work:

Extensive work has been carried out on PIR by various researchers to balance the trade-off between communication and computation overheads, to preserve user as well as server privacy, and to handle fault tolerance and integrity. PIR schemes fall mainly into two categories: one relies on replication across non-colluding servers, and the other relies on a single database whose server has limited computational power.

Information-Theoretic PIR (itPIR) To provide protocol-driven privacy, Chor et al. [7] introduced the concept of private reading from k replicated databases and further improved the communication cost in [8] using XOR operations. Several other improvements in communication and computation overheads in the itPIR setting were introduced by [1, 2]. Gertner et al. [13] address the data privacy of the server along with the user privacy using the concept of conditional disclosure of secrets.

Computationally Bounded PIR (cPIR) The first PIR scheme based on the quadratic residuosity assumption was introduced by [16] in a single-database setting with sub-polynomial communication cost. Chor and Gilboa [6] presented a PIR scheme based on one-way functions that uses minimal database replication and achieves only computationally bounded privacy. Cachin et al. [4] presented a ϕ-hiding-based scheme with polylogarithmic communication cost. Ishai et al. [15] introduced an efficient cPIR scheme using anonymity techniques. Aguilar-Melchor and Gaborit [19] introduced a fast cPIR scheme based on coding theory and lattice assumptions. Groth et al. [14] proposed a multi-query cPIR scheme with constant communication rate. Jonathan and Andy [24] improved the computational complexity of existing cPIR using trapdoor groups. Kushilevitz and Ostrovsky [17] presented a cPIR scheme based on one-way trapdoor permutations. Chang [5] presented a computationally bounded PIR scheme with logarithmic communication using the Paillier cryptosystem as the underlying intractability assumption. Gentry and Ramzan [11] presented a PBR scheme with log-squared communication based on a decision subgroup problem called the ϕ-hiding assumption. To protect both user and server privacy, several oblivious transfer (OT) schemes [9, 18, 20, 21] have also been introduced in the single-database setting. Keyword-based PIR search was first introduced in [3] to apply PIR to existing server data structures.

Perfect Privacy The term “perfect privacy,” as defined in [7], requires that the queries for any two indices, treated as random variables, be identically distributed. The first information-theoretic single-database PIR scheme was introduced in [12], and another was introduced recently in [22]. To preserve user privacy in a multiuser setting, Toledo et al. [23] presented input anonymity through a secret sharing technique.

Organization:

The rest of the paper is organized as follows. The required notations and preliminaries are described in Sect. 6.2; the preliminary modules, the proposed PIR scheme, and the performance analysis along with the required security proofs are all described in Sect. 6.3; and, finally, the open problems are listed along with the conclusion in Sect. 6.4.

6.2 Notations and Preliminaries

Let [u] = {1, 2, …, u}, and let [1, u] likewise denote all the integers from 1 to u; \(\mathcal{D}\mathcal{B}^{b_{v}}_{u}\) denotes a database of u blocks of v bits each. Let N = pq, with \(N\xleftarrow{R}\{0,1\}^{k}\) for the security parameter k, be an RSA composite modulus; let \(\mathbb{Z}^{+1}_{N}\) and \(\mathbb{Z}^{-1}_{N}\) denote the elements of \(\mathbb{Z}^{*}_{N}\) with Jacobi symbol +1 and −1, respectively; and let \(\mathcal{S}_{QR},\mathcal{S}_{QNR}\subseteq\mathbb{Z}^{+1}_{N}\) be the quadratic residue and non-residue subsets, respectively. Let \(\mathcal{J}\mathcal{S}\) and \(\mathcal{L}\mathcal{S}\) be the Jacobi and Legendre symbols, respectively. Let c be the total number of l-bit groups in a database block. Let \(\mathcal{U}\) be the end user (the client, or intended service seeker) and \(\mathcal{S}\) the server (the intended service provider).

Quadratic residuosity:

For every \(x\in\mathbb{Z}^{+1}_{N}\): if \(x\equiv y^{2} \pmod N\) for some \(y\in\mathbb{Z}^{*}_{N}\), then \(x\in\mathbb{Z}^{+1}_{N}\setminus\mathcal{S}_{QNR}\), i.e., \(x\in\mathcal{S}_{QR}\); otherwise \(x\in\mathbb{Z}^{+1}_{N}\setminus\mathcal{S}_{QR}\), i.e., \(x\in\mathcal{S}_{QNR}\).
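As a concrete illustration of these sets, the Python sketch below classifies the units of a toy Blum modulus. The primes 7 and 11 are an assumption chosen for readability (a real N has k bits); residuosity is decided with Euler's criterion modulo each prime factor, which is possible only for whoever knows p and q.

```python
from math import gcd

# Toy Blum modulus: P ≡ Q ≡ 3 (mod 4). Illustration only; far too small to be secure.
P, Q = 7, 11
N = P * Q

def legendre(x: int, p: int) -> int:
    """Legendre symbol (x/p) via Euler's criterion, p an odd prime."""
    r = pow(x % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def jacobi(x: int) -> int:
    """Jacobi symbol (x/N), here computed from the known factorization."""
    return legendre(x, P) * legendre(x, Q)

def is_qr(x: int) -> bool:
    """x is a quadratic residue mod N iff it is one mod P and mod Q."""
    return legendre(x, P) == 1 and legendre(x, Q) == 1

units = [x for x in range(1, N) if gcd(x, N) == 1]   # Z*_N
z_plus = [x for x in units if jacobi(x) == 1]         # Z^{+1}_N
S_QR = [x for x in z_plus if is_qr(x)]                # quadratic residues
S_QNR = [x for x in z_plus if not is_qr(x)]           # non-residues with Jacobi +1

# Every residue really is y^2 for some unit y, and nothing else is.
assert set(S_QR) == {y * y % N for y in units}
print(len(units), len(z_plus), len(S_QR), len(S_QNR))
```

With N = 77 it prints `60 30 15 15`: half of \(\mathbb{Z}^{*}_{N}\) has Jacobi symbol +1, split evenly between \(\mathcal{S}_{QR}\) and \(\mathcal{S}_{QNR}\), and N − 1 lies in \(\mathcal{S}_{QNR}\) because p ≡ q ≡ 3 (mod 4).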

Definition 1 (Trapdoor Function of [10]): For \(x\in\mathbb{Z}^{*}_{N}\), \(r\in\mathbb{Z}^{-1}_{N}\), \(s\in\mathcal{S}_{QNR}\) and bits \(j_{x},h_{x}\in\{0,1\}\), the function is \(\mathcal{T}(x,r,s)=x^{2}\cdot r^{j_{x}}\cdot s^{h_{x}} \bmod N=z\), where \(j_{x}=1\) if \(\mathcal{J}\mathcal{S}_{N}(x)=-1\) and \(j_{x}=0\) otherwise, and \(h_{x}=1\) if \(x>\frac{N}{2}\) and \(h_{x}=0\) otherwise. The inverse function is \(\mathcal{T}^{-1}(z)=\sqrt{z\cdot r^{-j_{x}}\cdot s^{-h_{x}}}=x\): the holder of p and q reads \(j_{x}\) from the Jacobi symbol of z and \(h_{x}\) from the residuosity of \(z\cdot r^{-j_{x}}\), and takes the square root whose Jacobi symbol and size match \(j_{x}\) and \(h_{x}\). For l inputs, \(\mathcal{T}:(x_{1},\ldots,x_{l},r,s)\rightarrow(z_{1},\ldots,z_{l})\), where \(\mathcal{T}(x_{1},\ldots,x_{l},r,s)=(\mathcal{T}_{1},\ldots,\mathcal{T}_{l})\) and \(\mathcal{T}_{i}(x_{i},r,s)=x_{i}^{2}\cdot r^{j_{x_{i}}}\cdot s^{h_{x_{i}}}=z_{i}\), \(i\in[1,l]\).
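The trapdoor function of Definition 1 can be sketched in Python on a toy Blum modulus N = 7 · 11. This is a minimal illustration, assuming a Blum modulus (as the scheme later requires) and picking r and s as the first suitable units; square roots use the p ≡ 3 (mod 4) shortcut \(a^{(p+1)/4} \bmod p\) combined by the Chinese remainder theorem. Python 3.8+ is assumed for modular inverses via `pow(x, -1, m)`.

```python
from math import gcd

# Toy Blum modulus (P ≡ Q ≡ 3 mod 4); illustration only.
P, Q = 7, 11
N = P * Q

def legendre(x, p):
    r = pow(x % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def jacobi(x):
    return legendre(x, P) * legendre(x, Q)

def is_qr(x):
    return legendre(x, P) == 1 and legendre(x, Q) == 1

def square_roots(a):
    """The four square roots of a residue a mod N (P, Q ≡ 3 mod 4, combined by CRT)."""
    rp, rq = pow(a, (P + 1) // 4, P), pow(a, (Q + 1) // 4, Q)
    cp, cq = Q * pow(Q, -1, P), P * pow(P, -1, Q)   # CRT basis: cp ≡ 1 (mod P), 0 (mod Q)
    return {(sp * cp + sq * cq) % N for sp in (rp, P - rp) for sq in (rq, Q - rq)}

units = [x for x in range(1, N) if gcd(x, N) == 1]
r = next(x for x in units if jacobi(x) == -1)                  # r in Z^{-1}_N
s = next(x for x in units if jacobi(x) == 1 and not is_qr(x))  # s in S_QNR

def T(x):
    """Definition 1: T(x, r, s) = x^2 * r^jx * s^hx mod N."""
    jx = 1 if jacobi(x) == -1 else 0
    hx = 1 if x > N // 2 else 0
    return x * x * pow(r, jx, N) * pow(s, hx, N) % N

def T_inv(z):
    """Invert T with the trapdoor (P, Q): read jx, hx off z, then pick the matching root."""
    jx = 1 if jacobi(z) == -1 else 0        # only r^jx can make the Jacobi symbol -1
    w = z * pow(r, -jx, N) % N              # w = x^2 * s^hx
    hx = 0 if is_qr(w) else 1               # s^hx turns the square into a non-residue
    a = w * pow(s, -hx, N) % N              # a = x^2
    for x in square_roots(a):
        if (jacobi(x) == -1) == bool(jx) and (x > N // 2) == bool(hx):
            return x
    raise ValueError("z is not in the image of T")
```

On this modulus, `all(T_inv(T(x)) == x for x in units)` is `True`: the four square roots of \(x^{2}\) differ in Jacobi symbol or in size, so \((j_{x},h_{x})\) singles out x, and the factors suffice to invert \(\mathcal{T}\).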

We use a slightly modified version of this trapdoor function in the proposed PIR scheme: \(\mathcal{M}\mathcal{T}:x\rightarrow(z,t)\), \(t\in\{0,1\}\), where \(z=x^{2} \bmod N\) and t is the \(h_{x}\) value of the input x. For l inputs, \(\mathcal{M}\mathcal{T}:(x_{1},\ldots,x_{l})\rightarrow((z_{1},\ldots,z_{l}),(t_{1},\ldots,t_{l}))\), where \(\mathcal{M}\mathcal{T}(x_{1},\ldots,x_{l})=(\mathcal{M}\mathcal{T}_{1},\ldots,\mathcal{M}\mathcal{T}_{l})\) and \(\mathcal{M}\mathcal{T}_{i}(x_{i})=(z_{i},t_{i})\) with \(z_{i}=x_{i}^{2} \bmod N\), \(i\in[1,l]\).
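For \(\mathcal{M}\mathcal{T}\), a single trapdoor bit suffices when the input has Jacobi symbol +1: its two Jacobi +1 square-root candidates are x and N − x, and \(h_{x}\) tells them apart. This holds for the chained inputs \(\alpha_{i}\), which are products of a square and a key in \(\mathbb{Z}^{+1}_{N}\). The Python sketch below assumes a toy modulus N = 7 · 11; `MT_inv` is a hypothetical helper, since the text does not spell out the inverse.

```python
from math import gcd

P, Q = 7, 11  # toy Blum primes (P ≡ Q ≡ 3 mod 4); illustration only
N = P * Q

def legendre(x, p):
    r = pow(x % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def jacobi(x):
    return legendre(x, P) * legendre(x, Q)

def square_roots(a):
    """The four square roots of a residue a mod N (P, Q ≡ 3 mod 4, combined by CRT)."""
    rp, rq = pow(a, (P + 1) // 4, P), pow(a, (Q + 1) // 4, Q)
    cp, cq = Q * pow(Q, -1, P), P * pow(P, -1, Q)
    return {(sp * cp + sq * cq) % N for sp in (rp, P - rp) for sq in (rq, Q - rq)}

def MT(x):
    """Modified trapdoor: x -> (z, t) with z = x^2 mod N and t = h_x = [x > N/2]."""
    return x * x % N, int(x > N // 2)

def MT_inv(z, t):
    """Hypothetical inverse: the Jacobi +1 root of z whose size matches t (needs P, Q)."""
    for x in square_roots(z):
        if jacobi(x) == 1 and int(x > N // 2) == t:
            return x
    raise ValueError("no Jacobi +1 root with this trapdoor bit")

z_plus = [x for x in range(1, N) if gcd(x, N) == 1 and jacobi(x) == 1]
assert all(MT_inv(*MT(x)) == x for x in z_plus)
```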

Definition 2 (Perfect Privacy PIR Query) :

Two randomly selected PIR queries \(Q_{i}\) and \(Q_{j}\) are independent of the block reference (or block index) if \(Pr[Q_{i}\xleftarrow{R}\mathcal{Q}\mathcal{F}(1^{k}):A(n,Q_{i},1^{k})=1]=Pr[Q_{j}\xleftarrow{R}\mathcal{Q}\mathcal{F}(1^{k}):A(n,Q_{j},1^{k})=1]\), where A is a distinguishing server, \(\mathcal{Q}\mathcal{F}\) is the query-generating function, and Pr denotes probability. In that case, the mutual information between them is \(I(Q_{i},Q_{j})=0\), i.e., the queries reveal nothing about the user's interest and exhibit perfect privacy.

Definition 3 (Perfect Privacy Single Database PIR (perfectPIR))

It is a 5-tuple protocol (\(\mathcal{U}\), \(\mathcal{S}\), \(\mathcal{Q}\mathcal{F}\), \(\mathcal{R}\mathcal{C}\), \(\mathcal{I}\mathcal{E}\)), where \(\mathcal{U}\) is the customer, \(\mathcal{S}\) is the service provider, \(\mathcal{Q}\mathcal{F}\) is the query formulation algorithm run by \(\mathcal{U}\), \(\mathcal{R}\mathcal{C}\) is the response creation algorithm run by \(\mathcal{S}\), and \(\mathcal{I}\mathcal{E}\) is the interest extraction algorithm run by \(\mathcal{U}\). Let the n-bit database \(\mathcal{D}\mathcal{B}^{b_{v}}_{u}\) be a two-dimensional matrix of u rows and v columns. For any block of interest \(\mathcal{D}\mathcal{B}_{i}\), \(i\in[u]\), the user \(\mathcal{U}\) generates a PIR query as described in Definition 2 to achieve user privacy and sends it to the database server \(\mathcal{S}\); all queries sent over the insecure communication channel are protected under the quadratic residuosity assumption (QRA) to achieve data privacy. The database server \(\mathcal{S}\) replies with a block-specific response ciphertext set \(R_{j}\) and a trapdoor bit set for every block \(\mathcal{D}\mathcal{B}_{j}\), \(j\in[1,u]\); all ciphertexts generated by \(\mathcal{S}\) are likewise protected under QRA to achieve data privacy. In turn, \(\mathcal{U}\) reads the required block \(\mathcal{D}\mathcal{B}_{i}\) using the block-specific response ciphertext set \(R_{i}\) and its corresponding trapdoor bit set.

6.3 Perfect Privacy PIR Scheme

Let the database \(\mathcal{D}\mathcal{B}^{b_{v}}_{u}\) be viewed as a two-dimensional matrix of u rows and v columns, where n = uv. The database of size n = uv consists of u blocks \(\mathcal{D}\mathcal{B}_{i}=(b_{1},b_{2},\ldots,b_{v})\), \(i\in[u]\), each of v bits. Consider a sufficiently large RSA composite modulus N = pq, where \(p\equiv q\equiv 3 \pmod 4\). Assume that both parties (user and server) have exchanged some prior information: the database size n, the parameters c and l (each database block is divided into c groups of l bits), and the public key combination table (Table 6.1).

Table 6.1 Public key combinations for 2-bit and 3-bit encoding, where Q\(\in\mathbb{Z}^{+1}_{N}\setminus\mathcal{S}_{QNR}\) (a quadratic residue) and N\(\in\mathbb{Z}^{+1}_{N}\setminus\mathcal{S}_{QR}\) (a quadratic non-residue, not to be confused with the modulus N)

At a high level, the proposed 5-tuple protocol of Definition 3 is a way of retrieving, or reading, information from the service provider privately. The customer \(\mathcal{U}\) wishes to retrieve some information from the service provider \(\mathcal{S}\) privately using user-centric public key cryptography. To read a block privately, \(\mathcal{U}\) generates a perfect-privacy PIR query \(\mathcal{Q}\) (selected as described in Definition 2) using the initialization, or query formulation, algorithm \(\mathcal{Q}\mathcal{F}\) and sends it to \(\mathcal{S}\). \(\mathcal{S}\) generates a response involving all the database blocks using the reply, or response creation, algorithm \(\mathcal{R}\mathcal{C}\) and sends it back. Finally, \(\mathcal{U}\) retrieves the required block privately using the reading, or interest extraction, algorithm \(\mathcal{I}\mathcal{E}\).

l-Bit Input v/s l-Output Property Combination:

The public key combination selection for a particular bit or a group of bits is described as follows. Let us consider an encoding function \( {f}:(b\xleftarrow {R}\{0,1\},x\xleftarrow {R}\mathbb {Z}^{*}_{N},\mathcal {P}\mathcal {K}\xleftarrow {R}\mathbb {Z}^{+1}_{N})\rightarrow (z\in \mathbb {Z}^{+1}_{N})\) using the encoding bit b, random input x, and public key \(\mathcal {P}\mathcal {K}\) as

$$\displaystyle f(b,x,\mathcal{P}\mathcal{K})=\begin{cases} x^{2}\cdot\mathcal{P}\mathcal{K}\equiv z \pmod{N},\ \ \mathcal{P}\mathcal{K}\xleftarrow{R}\mathcal{S}_{QNR},\ z\in\mathcal{S}_{QNR} & \text{if } b=1\\ x^{2}\cdot\mathcal{P}\mathcal{K}\equiv z \pmod{N},\ \ \mathcal{P}\mathcal{K}\xleftarrow{R}\mathcal{S}_{QR},\ z\in\mathcal{S}_{QR} & \text{if } b=0 \end{cases} $$
(6.1)

If the encoding bit b = 1, the public key \(\mathcal{P}\mathcal{K}\) is always selected from \(\mathcal{S}_{QNR}\), so that the output ciphertext z always lies in \(\mathcal{S}_{QNR}\). Similarly, if b = 0, \(\mathcal{P}\mathcal{K}\) is always selected from \(\mathcal{S}_{QR}\), so that z always lies in \(\mathcal{S}_{QR}\). If an l-bit input is encoded by l functions \(f_{1},\ldots,f_{l}\), each drawn from (6.1) and each encoding one bit into one output ciphertext, then the l-bit input is encoded by a combination of l public keys, one of \(2^{l}\) possible combinations. For instance, for a 2-bit input there are two encoding functions \(f_{1},f_{2}\), two public keys \(\mathcal{P}\mathcal{K}_{1}\xleftarrow{R}\mathcal{S}_{QR}\), \(\mathcal{P}\mathcal{K}_{2}\xleftarrow{R}\mathcal{S}_{QNR}\), and four public key combinations, namely \((\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{1})\), \((\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2})\), \((\mathcal{P}\mathcal{K}_{2},\mathcal{P}\mathcal{K}_{1})\) and \((\mathcal{P}\mathcal{K}_{2},\mathcal{P}\mathcal{K}_{2})\), as shown in Table 6.1.
Similarly, for a 3-bit input there are three encoding functions \(f_{1},f_{2},f_{3}\), the same two public keys \(\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2}\), and eight public key combinations, namely \((\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{1})\), \((\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2})\), \((\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2},\mathcal{P}\mathcal{K}_{1})\), \((\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2},\mathcal{P}\mathcal{K}_{2})\), \((\mathcal{P}\mathcal{K}_{2},\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{1})\), \((\mathcal{P}\mathcal{K}_{2},\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2})\), \((\mathcal{P}\mathcal{K}_{2},\mathcal{P}\mathcal{K}_{2},\mathcal{P}\mathcal{K}_{1})\) and \((\mathcal{P}\mathcal{K}_{2},\mathcal{P}\mathcal{K}_{2},\mathcal{P}\mathcal{K}_{2})\), as shown in Table 6.1. If an input bit is 0, \(\mathcal{P}\mathcal{K}_{1}\) is selected in the encoding function; otherwise \(\mathcal{P}\mathcal{K}_{2}\) is selected. Consequently, each of the \(2^{l}\) l-bit inputs yields a unique combination of quadratic residuosity properties over its l output ciphertexts (Table 6.1).
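The encoding function (6.1) and its l-bit key combinations can be sketched in Python as follows. Assumptions: a toy modulus N = 7 · 11, \(\mathcal{P}\mathcal{K}_{1}\in\mathcal{S}_{QR}\) encoding 0 and \(\mathcal{P}\mathcal{K}_{2}\in\mathcal{S}_{QNR}\) encoding 1, and hypothetical helper names `encode_group` and `decode_group`. Decoding tests each ciphertext's residuosity with the factors, so every one of the \(2^{l}\) key combinations is recovered.

```python
import random
from itertools import product
from math import gcd

P, Q = 7, 11  # toy Blum primes; illustration only
N = P * Q

def legendre(x, p):
    r = pow(x % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def is_qr(x):
    return legendre(x, P) == 1 and legendre(x, Q) == 1

rng = random.Random(1)
units = [x for x in range(1, N) if gcd(x, N) == 1]
PK1 = rng.choice([x for x in units if is_qr(x)])                                # S_QR, encodes 0
PK2 = rng.choice([x for x in units if legendre(x, P) == legendre(x, Q) == -1])  # S_QNR, encodes 1

def f(b, x):
    """Encoding function (6.1): x^2 * PK mod N, with the key chosen by the bit b."""
    return x * x * (PK2 if b else PK1) % N

def encode_group(bits, ys):
    """l parallel applications of f, one per bit (one key combination of Table 6.1)."""
    return [f(b, y) for b, y in zip(bits, ys)]

def decode_group(cts):
    """Holder of P, Q reads each bit off the residuosity of its ciphertext."""
    return [0 if is_qr(z) else 1 for z in cts]

for l in (2, 3):
    for bits in product((0, 1), repeat=l):        # all 2^l key combinations
        ys = [rng.choice(units) for _ in range(l)]
        assert decode_group(encode_group(bits, ys)) == list(bits)
```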

Bit Group Encoding:

Let us view the database block \(\mathcal{D}\mathcal{B}_{i}=(b_{1},b_{2},\ldots,b_{v})\), \(i\in[u]\), as a set of l-bit groups \(\{G_{1}=(b_{1},\ldots,b_{l}),G_{2}=(b_{l+1},\ldots,b_{2l}),G_{3}=(b_{2l+1},\ldots,b_{3l}),\ldots,G_{c}=(b_{(c-1)l+1},\ldots,b_{v})\}\), where v = lc for some integer constant c > 0, so that \(G_{\sigma}=(b_{(\sigma-1)l+1},\ldots,b_{\sigma l})\) for \(\sigma\in[1,c]\). To perform the PIR operation on a block \(\mathcal{D}\mathcal{B}_{i}\), the PIR encoding function for an l-bit group \(G_{\sigma}\) (\(l\in\mathbb{N}\)), random inputs \(y_{1},\ldots,y_{l}\xleftarrow{R}\mathbb{Z}^{*}_{N}\) and public keys \(\mathcal{P}\mathcal{K}_{1}\xleftarrow{R}\mathcal{S}_{QR}\), \(\mathcal{P}\mathcal{K}_{2}\xleftarrow{R}\mathcal{S}_{QNR}\) is \(\mathcal{E}:(G_{\sigma},N,y_{1},\ldots,y_{l},\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2})\rightarrow(\alpha_{1},\ldots,\alpha_{l})\), where \(\alpha_{1},\ldots,\alpha_{l}\) are the corresponding ciphertext outputs. The encoding function \(\mathcal{E}\) is defined as follows.

$$\displaystyle \mathcal{E}_{\sigma}(G_{\sigma},N,y_{1},\ldots,y_{l},\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2})=\begin{cases} f_{1}: & (y_{1})^{2}\cdot\mathcal{P}\mathcal{K}_{j}\equiv\alpha_{1} \pmod{N}\\ & \ \ \vdots\\ f_{l}: & (y_{l})^{2}\cdot\mathcal{P}\mathcal{K}_{j'}\equiv\alpha_{l} \pmod{N} \end{cases} $$
(6.2)

where \(j,j'\in[2]\), and each \(f_{i}\) in (6.2), which encodes one bit of \(G_{\sigma}\), is drawn from (6.1).

Connecting Two Encoding Functions Using [10]:

For any two consecutive PIR encoding functions \(\mathcal{E}_{\sigma}\) and \(\mathcal{E}_{\sigma+1}\) of (6.2), where \(1\le\sigma\le c-1\), the connecting function \(\mathcal{C}\) is defined as follows. Let \(\mathcal{E}_{\sigma}:(G_{\sigma},N,y_{1},\ldots,y_{l},\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2})\rightarrow\{\alpha_{1},\alpha_{2},\ldots,\alpha_{l}\}\) and \(\mathcal{E}_{\sigma+1}:(G_{\sigma+1},N,\alpha_{1},\ldots,\alpha_{l},\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2})\rightarrow\{\alpha'_{1},\alpha'_{2},\ldots,\alpha'_{l}\}\). Then \(\mathcal{C}:(\mathcal{E}_{\sigma},\mathcal{E}_{\sigma+1})\rightarrow(\{\alpha'_{1},\alpha'_{2},\ldots,\alpha'_{l}\},\{t_{1},t_{2},\ldots,t_{l}\})\), where each trapdoor bit \(t_{i}\), \(i\in[l]\), is generated by the modified trapdoor function \(\mathcal{M}\mathcal{T}\) in (6.3) and equals the \(h_{x}\) value of the trapdoor function \(\mathcal{T}\) of Definition 1. Each connecting function is in turn chained to the next one.

$$\displaystyle \mathcal{C}(\mathcal{E}_{\sigma},\mathcal{E}_{\sigma+1})=\begin{cases} \text{apply } \mathcal{E}_{\sigma} \text{ first, then}\\ \mathcal{E}_{\sigma+1}=\begin{cases} (\mathcal{M}\mathcal{T}(\alpha_{1}))^{2}\cdot\mathcal{P}\mathcal{K}_{j}\equiv\alpha'_{1} \pmod{N}\\ \quad\vdots\\ (\mathcal{M}\mathcal{T}(\alpha_{l}))^{2}\cdot\mathcal{P}\mathcal{K}_{j'}\equiv\alpha'_{l} \pmod{N} \end{cases} \end{cases} $$
(6.3)

Note that only \(\mathcal{E}_{1}\) takes the random inputs \(y_{1},\ldots,y_{l}\) from \(\mathbb{Z}^{*}_{N}\); every other \(\mathcal{E}_{i}\), \(i\in[2,c]\), takes as input the previous ciphertexts \(\alpha_{1},\ldots,\alpha_{l}\), which lie in \(\mathbb{Z}^{+1}_{N}\).
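A single bit position can be followed through the c groups of a block, from \(\mathcal{E}_{1}\) through the connecting functions. This Python sketch assumes the signature of \(\mathcal{E}_{\sigma+1}\) given in the text, i.e., \(\alpha'=\alpha^{2}\cdot\mathcal{P}\mathcal{K}_{j} \bmod N\), with \(\mathcal{M}\mathcal{T}\) recording \(t=h_{\alpha}\) before each step; the toy modulus N = 7 · 11, the convention that \(\mathcal{P}\mathcal{K}_{1}\in\mathcal{S}_{QR}\) encodes 0, and the names `chain_encode` and `chain_decode` are illustrative. The decoder walks backwards: residuosity yields a bit, and the trapdoor bit selects the previous \(\alpha\) among the square roots.

```python
import random
from math import gcd

P, Q = 7, 11  # toy Blum primes (P ≡ Q ≡ 3 mod 4); illustration only
N = P * Q

def legendre(x, p):
    r = pow(x % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def jacobi(x):
    return legendre(x, P) * legendre(x, Q)

def is_qr(x):
    return legendre(x, P) == 1 and legendre(x, Q) == 1

def square_roots(a):
    rp, rq = pow(a, (P + 1) // 4, P), pow(a, (Q + 1) // 4, Q)
    cp, cq = Q * pow(Q, -1, P), P * pow(P, -1, Q)
    return {(sp * cp + sq * cq) % N for sp in (rp, P - rp) for sq in (rq, Q - rq)}

rng = random.Random(7)
units = [x for x in range(1, N) if gcd(x, N) == 1]
PK1 = rng.choice([x for x in units if is_qr(x)])                                # encodes 0
PK2 = rng.choice([x for x in units if legendre(x, P) == legendre(x, Q) == -1])  # encodes 1

def chain_encode(bits, y):
    """E_1 on y, then c-1 connecting steps alpha' = alpha^2 * PK, recording t = [alpha > N/2]."""
    alpha = y * y * (PK2 if bits[0] else PK1) % N
    trapdoor = []
    for b in bits[1:]:
        trapdoor.append(int(alpha > N // 2))      # MT keeps only h_alpha
        alpha = alpha * alpha * (PK2 if b else PK1) % N
    return alpha, trapdoor

def chain_decode(alpha, trapdoor):
    """Walk back with P, Q: residuosity gives a bit, the trapdoor bit picks the previous alpha."""
    bits = []
    for t in reversed(trapdoor):
        b = 0 if is_qr(alpha) else 1
        bits.append(b)
        prev_sq = alpha * pow(PK2 if b else PK1, -1, N) % N   # previous alpha, squared
        alpha = next(x for x in square_roots(prev_sq)
                     if jacobi(x) == 1 and int(x > N // 2) == t)
    bits.append(0 if is_qr(alpha) else 1)                     # bit of the first group
    return bits[::-1]
```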

6.3.1 Generic l-Bit Perfect Privacy PIR Scheme

By combining the above modules, we construct a perfect privacy-preserving single-database PIR scheme as follows. All the algorithms described below are those of Definition 3.

  • Initializing (\(\mathcal{Q}\mathcal{F}\)): User \(\mathcal{U}\) sends a single, block-independent PIR query \(\mathcal{Q}=(N,\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2},y_{1},\ldots,y_{l})\) to the server, where \(\mathcal{P}\mathcal{K}_{1}\xleftarrow{R}\mathcal{S}_{QR}\), \(\mathcal{P}\mathcal{K}_{2}\xleftarrow{R}\mathcal{S}_{QNR}\) and \(y_{1},\ldots,y_{l}\xleftarrow{R}\mathbb{Z}^{*}_{N}\).

  • Reply (\(\mathcal{R}\mathcal{C}\)): The server generates the block-specific response \(R_{i}\), \(i\in[1,u]\), as follows. For each block \(\mathcal{D}\mathcal{B}_{i}\), the response consists of a set of l ciphertexts and a set of trapdoor bits:

    $$\displaystyle \begin{aligned} \mathrm{Block~PIR~encryption}&=\mathcal{E}_{i}(G_{i},N,\mathcal{M}\mathcal{T}(\mathcal{E}_{i-1}),\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2})\\ &=\big(\beta^{\alpha_{l}}_{i}=(\alpha_{1},\ldots,\alpha_{l}),\ \rho^{t_{l(c-1)}}_{i}=(t_{1},\ldots,t_{l(c-1)})\big)\\ &=R_{i} \end{aligned} $$
    (6.4)

    where, in the first line, the encoder index i runs from c down to 2, the chain starting from \(\mathcal{E}_{1}(G_{1},N,y_{1},\ldots,y_{l},\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2})\); \(\beta^{\alpha_{l}}_{i}\) is the set of l ciphertexts generated for block i, and \(\rho^{t_{l(c-1)}}_{i}\) is the set of l(c − 1) trapdoor bits generated for block i. Note that each pair of consecutive PIR encoding functions \(\mathcal{E}_{\sigma}\) and \(\mathcal{E}_{\sigma-1}\), \(2\le\sigma\le c\), written \(\mathcal{E}_{\sigma}(G_{\sigma},N,\mathcal{M}\mathcal{T}(\mathcal{E}_{\sigma-1}),\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2})\) in (6.4), is equivalent to the connecting function \(\mathcal{C}(\mathcal{E}_{\sigma'},\mathcal{E}_{\sigma'+1})\), \(1\le\sigma'\le c-1\). The overall response from all the blocks of \(\mathcal{D}\mathcal{B}^{b_{v}}_{u}\) is \(R=R_{1}\|R_{2}\|\cdots\|R_{u}\), which is sent back to the user.

  • Reading (\(\mathcal{I}\mathcal{E}\)): Using the block-specific response \(R_{i}=(\beta^{\alpha_{l}}_{i},\rho^{t_{l(c-1)}}_{i})\), the user privately reads the required bits of the block of interest \(\mathcal{D}\mathcal{B}_{i}\) as follows.

    $$\displaystyle \begin{aligned} &\mathcal{D}\big(p,q,\mathcal{E}_{i}(\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2},\mathcal{M}\mathcal{T}^{-1}(\mathcal{D}(p,q,\mathcal{E}_{i+1})))\big)=(b_{1},b_{2},\ldots,b_{v})=\mathcal{D}\mathcal{B}_{i}\\ &\text{or}\\ &\mathcal{D}\big(p,q,\beta^{\alpha_{l}}_{i},\rho^{t_{l(c-1)}}_{i}\big)=(b_{1},b_{2},\ldots,b_{v})=\mathcal{D}\mathcal{B}_{i} \end{aligned} $$
    (6.5)

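    where \(\mathcal{D}\) denotes decoding with the secret factors p and q, and, in the first form, the encoder index i runs over [1, c − 1], starting from the last encoding \(\mathcal{E}_{c}(\mathcal{P}\mathcal{K}_{j},\mathcal{P}\mathcal{K}_{j'},\alpha_{1},\ldots,\alpha_{l},t_{l(c-1)})\).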
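Putting \(\mathcal{Q}\mathcal{F}\), \(\mathcal{R}\mathcal{C}\) and \(\mathcal{I}\mathcal{E}\) together gives the toy end-to-end Python sketch below. It assumes a 7 · 11 modulus, \(\mathcal{P}\mathcal{K}_{1}\in\mathcal{S}_{QR}\) encoding 0 and \(\mathcal{P}\mathcal{K}_{2}\in\mathcal{S}_{QNR}\) encoding 1, the chaining \(\alpha'=\alpha^{2}\cdot\mathcal{P}\mathcal{K}_{j}\), and that \(\rho_{i}\) stores the l trapdoor bits of each connecting step in order. Only `IE` uses the factors. It checks that every \(R_{i}\) decodes to \(\mathcal{D}\mathcal{B}_{i}\); it makes no statement about security at realistic sizes.

```python
import random
from math import gcd

# User's secret toy Blum primes (P ≡ Q ≡ 3 mod 4); illustration only, far too small.
P, Q = 7, 11
N = P * Q

def legendre(x, p):
    r = pow(x % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def jacobi(x):
    return legendre(x, P) * legendre(x, Q)

def is_qr(x):
    return legendre(x, P) == 1 and legendre(x, Q) == 1

def square_roots(a):
    """Four square roots of a residue a mod N (P, Q ≡ 3 mod 4, combined by CRT)."""
    rp, rq = pow(a, (P + 1) // 4, P), pow(a, (Q + 1) // 4, Q)
    cp, cq = Q * pow(Q, -1, P), P * pow(P, -1, Q)
    return {(sp * cp + sq * cq) % N for sp in (rp, P - rp) for sq in (rq, Q - rq)}

def QF(l, rng):
    """User: query (N, PK1, PK2, y_1..y_l). The wanted block index is not an input."""
    units = [x for x in range(1, N) if gcd(x, N) == 1]
    pk1 = rng.choice([x for x in units if is_qr(x)])                                # encodes 0
    pk2 = rng.choice([x for x in units if legendre(x, P) == legendre(x, Q) == -1])  # encodes 1
    return N, pk1, pk2, [rng.choice(units) for _ in range(l)]

def RC(db, query, l):
    """Server: R = [R_1, ..., R_u] with R_i = (beta_i, rho_i); uses no secret."""
    n_mod, pk1, pk2, ys = query
    response = []
    for block in db:
        groups = [block[s:s + l] for s in range(0, len(block), l)]          # G_1..G_c
        alphas = [y * y * (pk2 if b else pk1) % n_mod for y, b in zip(ys, groups[0])]
        rho = []
        for g in groups[1:]:                                               # connecting steps
            rho.extend(int(a > n_mod // 2) for a in alphas)                 # MT trapdoor bits
            alphas = [a * a * (pk2 if b else pk1) % n_mod for a, b in zip(alphas, g)]
        response.append((alphas, rho))
    return response

def IE(r_i, query, l):
    """User: rebuild DB_i from R_i by walking the chains backwards with P and Q."""
    _, pk1, pk2, _ = query
    alphas, rho = r_i
    c = len(rho) // l + 1
    groups = []
    for step in range(c - 1, -1, -1):                   # G_c down to G_1 (0-based index)
        bits = [0 if is_qr(a) else 1 for a in alphas]   # residuosity gives the bits
        groups.append(bits)
        if step == 0:
            break
        ts = rho[(step - 1) * l: step * l]              # h-bits of the previous alphas
        alphas = [next(x for x in square_roots(a * pow(pk2 if b else pk1, -1, N) % N)
                       if jacobi(x) == 1 and int(x > N // 2) == t)
                  for a, b, t in zip(alphas, bits, ts)]
    return [b for g in reversed(groups) for b in g]

# Usage: 4 blocks of v = l*c = 6 bits; every block decodes from its own R_i.
rng = random.Random(42)
l, c = 2, 3
db = [[rng.randint(0, 1) for _ in range(l * c)] for _ in range(4)]
query = QF(l, rng)
R = RC(db, query, l)
assert all(IE(R[i], query, l) == db[i] for i in range(len(db)))
```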

Theorem 1

If any two randomly selected PIR queries are independent of each other, then they exhibit perfect privacy in the PIR setting. In other words, for all quadratic residuosity-based perfect privacy PIR protocols, the probability distributions of any two randomly selected queries are identical and mutually independent, and hence the mutual information between the two queries is zero.

Proof (Sketch)

Consider any PIR query \(\mathcal{Q}=(N,\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2},y_{1},\ldots,y_{l})\) constructed by the user in the \(\mathcal{Q}\mathcal{F}\) algorithm, where \(\mathcal{P}\mathcal{K}_{1}\xleftarrow{R}\mathcal{S}_{QR}\), \(\mathcal{P}\mathcal{K}_{2}\xleftarrow{R}\mathcal{S}_{QNR}\) and \(y_{1},\ldots,y_{l}\xleftarrow{R}\mathbb{Z}^{*}_{N}\), and let \(y'_{1},\ldots,y'_{l}\xleftarrow{R}\mathbb{Z}^{*}_{N}\) be the inputs of a second query. Since the query inputs are always drawn from \(\mathbb{Z}^{*}_{N}\) and do not depend on the block index, \(Pr[\mathcal{Q}_{i}=(N,\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2},y_{1},\ldots,y_{l})\xleftarrow{R}\mathcal{Q}\mathcal{F}(1^{k}):A(n,\mathcal{Q}_{i},1^{k})=1]\) equals \(Pr[\mathcal{Q}_{j}=(N,\mathcal{P}\mathcal{K}_{1},\mathcal{P}\mathcal{K}_{2},y'_{1},\ldots,y'_{l})\xleftarrow{R}\mathcal{Q}\mathcal{F}(1^{k}):A(n,\mathcal{Q}_{j},1^{k})=1]\). Therefore, the randomly selected queries, viewed as random variables \(X=\mathcal{Q}_{i}\) and \(Y=\mathcal{Q}_{j}\), are independent of each other, i.e., \(Pr(XY)=Pr(X,Y)=Pr(X)\cdot Pr(Y)\). For Pr(Y) > 0, the conditional distribution of X given Y and the mutual information between X and Y are

$$\displaystyle \begin{aligned} \begin{aligned} \boldsymbol{Pr}(\boldsymbol{X}\mid\boldsymbol{Y})&=\frac{Pr(XY)}{Pr(Y)}=\frac{Pr(X)\cdot Pr(Y)}{Pr(Y)}=Pr(X)\\ \end{aligned} \end{aligned} $$
(6.6)
$$\displaystyle \begin{aligned} \boldsymbol{I(X,Y)}&={\sum}_{X}{\sum}_{Y}~Pr(X,Y)~\log\frac{Pr(X,Y)}{Pr(X)\cdot Pr(Y)}={\sum}_{X}{\sum}_{Y}~Pr(X,Y)~\log\frac{Pr(X)\cdot Pr(Y)}{Pr(X)\cdot Pr(Y)}\\ &={\sum}_{X}{\sum}_{Y}~Pr(X,Y)~\log 1={\sum}_{X}{\sum}_{Y}~Pr(X,Y)\cdot 0=0 \end{aligned} $$
(6.7)

Hence, all the PIR queries are mutually independent, and a query reveals nothing about the user's interest. Therefore, all the PIR queries exhibit perfect privacy, i.e., by analyzing the query the server gains no knowledge about the block that the user wishes to retrieve, even with unlimited computing power.
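The mutual-information step in (6.7) can be checked numerically. The Python sketch below computes \(I(X;Y)\) in bits from a finite joint distribution; as an illustrative assumption, X is the block index the user wants and Y the query value, with 4 indices and 12 query values. When the query is drawn without looking at the index, the joint distribution factorizes and I is 0 up to floating-point rounding, whereas a query value that encodes the index leaks \(\log_{2}u=2\) bits.

```python
import math
from collections import Counter
from itertools import product

def mutual_information(joint):
    """I(X;Y) in bits for a finite joint distribution given as {(x, y): Pr(x, y)}."""
    px, py = Counter(), Counter()
    for (x, y), p in joint.items():      # marginals Pr(X), Pr(Y)
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

u, query_values = 4, range(1, 13)   # toy: 4 block indices, 12 possible query values

# Query drawn uniformly without looking at the index: Pr(X, Y) = Pr(X) Pr(Y).
independent = {(i, y): (1 / u) * (1 / len(query_values))
               for i, y in product(range(u), query_values)}

# Contrast: a query value that encodes the index.
leaky = {(i, i + 1): 1 / u for i in range(u)}

print(mutual_information(independent), mutual_information(leaky))
```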

Theorem 2

For every perfectPIR protocol, there is a communication cost o(n) that is less than the trivial database-download cost O(n).

Proof (Sketch)

After each PIR encoding function \(\mathcal{E}_{i}\), \(i\in[1,c-1]\), the modified trapdoor function \(\mathcal{M}\mathcal{T}\) generates l trapdoor bits. Since a block \(\mathcal{D}\mathcal{B}_{j}\), \(j\in[u]\), consists of c groups of l bits, lc intermediate ciphertexts are generated in it, and exactly l(c − 1) = v − l trapdoor bits are generated per block. In total, the entire database generates ul(c − 1) = u(v − l) trapdoor bits, which is less than the database size uv. Therefore, the communication cost with respect to the database size is o(n), which is an acceptable communication cost for perfect privacy in the PIR setting.

Performance:

The user generates a PIR query \(\mathcal{Q}\) of k(3 + l) bits and sends it to the server. The server generates l(k + (c − 1)) communication bits per block, and hence u[l(k + (c − 1))] bits for the entire database, and sends them back to the user. The server performs 2ulc modular multiplications during PIR invocation, and the user performs only 2lc modular multiplications during block retrieval.
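The counts in the proof of Theorem 2 and in the Performance paragraph can be tallied directly. The function name and the example parameters (u = 1000 blocks, l = 2, c = 4, k = 1024) are our assumptions; each entry is one of the formulas stated in the text.

```python
def perfectpir_costs(u, l, c, k):
    """Counts from Theorem 2 and the Performance paragraph (v = l*c bits per block)."""
    v = l * c
    return {
        "database_bits_n": u * v,
        "query_bits": k * (3 + l),               # N, PK1, PK2 and y_1..y_l, k bits each
        "trapdoor_bits": u * l * (c - 1),        # = u(v - l)
        "ciphertext_bits": u * l * k,            # l final ciphertexts of k bits per block
        "response_bits": u * l * (k + (c - 1)),  # trapdoor + ciphertext bits
        "server_mults": 2 * u * l * c,
        "user_mults": 2 * l * c,
    }

costs = perfectpir_costs(u=1000, l=2, c=4, k=1024)
for name, value in costs.items():
    print(f"{name:>16}: {value}")
```

For these parameters, the trapdoor bits number u(v − l) = 6000 against n = 8000, as in the proof of Theorem 2; the l ciphertexts of k bits per block add a further ulk = 2,048,000 bits, giving the u[l(k + (c − 1))] = 2,054,000 response bits of the Performance paragraph.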

6.4 Conclusion with Open Problems

We have constructed a new perfect privacy-preserving information retrieval protocol with o(n) communication cost. The proposed scheme adopts quadratic residuosity-based public key cryptography as the underlying primitive. In future work, it is essential to reduce the communication cost so that bandwidth-limited applications can also adopt the scheme. Constructing a perfect privacy PIR scheme with efficient communication cost therefore remains an open problem.