1 Introduction

Private Set Intersection (PSI) is a primitive of secure multiparty computation that enables two parties—a client and a server—to compute the intersection of their respective sets without disclosing anything about their inputs. The client will learn the intersection of the two sets, and the server will learn nothing [1]. PSI is also known as Private Matching (PM) [2].

There are many practical applications of PSI (or PM) for protecting the privacy of the parties, such as National Security and Law Enforcement [3], Genomic Sequences Query [4], Joint Market Investigation [5], Privacy-Preserving Data Mining [6], Matching the Data Outsourced to Cloud Storage Services [7], Location-Based Sharing Services [8], and other online services [9].

Because of its importance and wide applicability, many schemes for PSI and its variants have been proposed [13, 1015]. Among these schemes, the most efficient PSI scheme requires \(O\left( {n+m} \right) \) costs in communication complexity, which increases linearly with both the client’s set size, n, and the server’s set size, m. Generally speaking, the size of the server’s set is larger than that of the client’s set. In order to reduce the communication complexity further, Wu et al. [2] presented an efficient PM scheme for very large m, which requires \(O\left( {n\mathrm{log}^{2}m} \right) \) costs in communication complexity.

As we know, quantum cryptography has the advantage of higher security than classical cryptography. However, to the best of our knowledge, there is no any quantum scheme for PSI. In this paper, we first proposed a novel quantum scheme for PSI. The proposed scheme only requires \(O\left( n \right) \) communication complexity, completely irrespective of the server set size, m. Therefore, it is especially suitable for large-scale client–server networks or Cloud service models.

2 Proposed scheme

We informally give a definition of Private Set Intersection first and then present our quantum scheme for Private Set Intersection.

Definition 1

Private Set Intersection (PSI)—There are two parties, a client and a server. The client inputs a private set C, and the server inputs a private set S. After running a PSI procedure, the client outputs the intersection of their respective sets, i.e., \(C\cap S\), but the server gets nothing except the client’s set size. In addition, PSI should meet the following requirements.

  • Correctness The client finally outputs the exact (possibly empty) intersection of their respective sets.

  • Client Privacy The server gets no information about the client’s set elements, except his/her set size.

  • Server Privacy The client learns no information about the subset \(S-C\cap S\) (i.e., the subset of elements on the server that are NOT in the intersection of their respective sets), except knowing the subset \(C\cap S\).

Suppose the client’s private set \(C=\left\{ {c_1 ,c_2 ,\ldots ,c_n } \right\} \) and the server’s private set \(S=\left\{ {s_1 ,s_2 ,\ldots s_m } \right\} \), and all elements of the sets C and S lie in \({\mathbb {Z}}_N^*\), where \({\mathbb {Z}}_N^*=\left\{ {1,2,\ldots ,N-1} \right\} \) and N is a natural number, which is far larger than n and m (i.e., \(N\gg n,m\)). In the following scheme, we only consider the honest-but-curious parties, who follow the protocol (honesty), but record everything they see and try to extract a secret (curiosity). The proposed scheme consists of four steps, which are described in detail as follows:

  • Step 1 The client prepares n encoded states, \(\left| {\psi _i } \right. \rangle =\frac{\left| 0 \right. \rangle +|c_i \rangle }{\sqrt{2}}\) for \(i=1\) to n, where \(c_i\) is his/her ith private element in C (i.e., \(c_i \in C\)). Furthermore, the client sends all encoded states to the server by an authenticated quantum channel [16, 17, 24, 25].

  • Step 2 After receiving all encoded states sent from the client, the server applies a quantum operator G on each received state and then sends them back to the client, where \(G=-U_0 U_S\). Here, \(U_0\) and \(U_S \) are unitary operators [18], defined as follows:

$$\begin{aligned} U_0 \left| x \right. \rangle =\left\{ {{\begin{array}{ll} \left| x \right. \rangle &{} \mathrm{if} \; x\ne 0 \\ -\left| 0 \right. \rangle &{} \mathrm{if} \; x=0 \\ \end{array} }} \right. , \end{aligned}$$
(1)
$$\begin{aligned} U_S \left| x \right. \rangle =\left\{ {{\begin{array}{ll} -\left| x \right. \rangle &{} \mathrm{if} \;x\in S \\ \left| x \right. \rangle &{} \mathrm{if}\; x\notin S \\ \end{array} }} \right. , \end{aligned}$$
(2)

where \(\left| x \right. \rangle \) is any basis state in N-dimensional Hilbert space, that is, \(U_0\) maps \(\left| 0 \right. \rangle \) to \(-\left| 0 \right. \rangle \) and leaves the remaining \(\left| x \right. \rangle \) alone, and \(U_S\) maps \(\left| x \right. \rangle \) to \(-\left| x \right. \rangle \) if \(x\in S\) and \(\left| x \right. \rangle \) otherwise. Then, we get

$$\begin{aligned} \left| {{\xi }_i } \right. \rangle= & {} G|\psi _i \rangle \nonumber \\= & {} G\frac{\left| 0 \right. \rangle +|c_i \rangle }{\sqrt{2}}\nonumber \\= & {} -U_0 U_S \frac{\left| 0 \right. \rangle +|c_i \rangle }{\sqrt{2}}\nonumber \\= & {} -\frac{U_0 U_S \left| 0 \right. \rangle +U_0 U_S |c_i \rangle }{\sqrt{2}}\nonumber \\= & {} \left\{ {{\begin{array}{ll} -\frac{U_0 \left| 0 \right. \rangle -U_0 |c_i \rangle }{\sqrt{2}} &{} \mathrm{if} \; c_i \in S \\ -\frac{U_0 \left| 0 \right. \rangle +U_0 |c_i \rangle }{\sqrt{2}} &{} \mathrm{if} \; c_i \notin S \\ \end{array} }} \right. \nonumber \\= & {} \left\{ {{\begin{array}{ll} \frac{\left| 0 \right. \rangle +|c_i \rangle }{\sqrt{2}}&{} \mathrm{if} \; c_i \in S \\ \frac{\left| 0 \right. \rangle -|c_i \rangle }{\sqrt{2}}&{} \mathrm{if} \; c_i \notin S \\ \end{array} }} \right. . \end{aligned}$$
(3)
  • Step 3 For each state returned from the server, the client performs an honest test, that is, he/she check whether the superposition in the corresponding encoded state is preserved as follows:\(\frac{\left| 0 \right. \rangle +|c_i \rangle }{\sqrt{2}}\) or \(\frac{\left| 0 \right. \rangle -|c_i \rangle }{\sqrt{2}}\). Since the two possible states are obviously orthogonal and further the client knows the value of \(c_i\), therefore, he/she is able to completely distinguish them by a von Neumann measurement. If the client finds a cheat of the server (i.e., the measured result is not \(\frac{\left| 0 \right. \rangle +|c_i \rangle }{\sqrt{2}}\) or \(\frac{\left| 0 \right. \rangle -|c_i \rangle }{\sqrt{2}})\), he/she will terminate this protocol, otherwise continue to the next step.

  • Step 4 The client gets the phase information \(p\left( i \right) \) of each returned state by distinguishing it between \(\frac{\left| 0 \right. \rangle +|c_i \rangle }{\sqrt{2}}\) and \(\frac{\left| 0 \right. \rangle -|c_i \rangle }{\sqrt{2}}\), i.e., \(p\left( i \right) =1\) if it is in the state \(\frac{\left| 0 \right. \rangle +|c_i \rangle }{\sqrt{2}}\), and \(p\left( i \right) =-1\) otherwise. Furthermore, if \(p\left( i \right) =1\), then \(c_i \in C\cap S\); otherwise \(c_i \notin C\cap S\). Finally, the client outputs all elements that the phase information is equal to one (i.e., \(\{c_i |c_i \in C\wedge p\left( i \right) =1\hbox { for }i=1\hbox { to }n\})\). However, the server gets nothing except the size of the set C of the client.

3 Analysis

Correctness The scheme proposed above clearly and rightly works when the client and the server honestly execute the protocol. If \(c_i \in S\), it can easily see that \(p\left( i \right) =1\) by Eq. (3). Therefore, the client rightly outputs the intersection of their respective sets by n von Neumann measurements, that is, the proposed PSI scheme is correct.

Client Privacy mainly depends on the server’s impossibility of discriminating the encoded quantum state sent from the client, thanks to two basic laws of quantum theory: No-Cloning Theorem which forbids the creation of identical copies of an arbitrary unknown quantum state, and Heisenberg Uncertainty Principle which implies that it is impossible to measure the state of any system without disturbing that system.

In order to extract the secret information about \(c_i\) from the encoded state \(\left| {\psi _i } \right. \rangle =\frac{\left| 0 \right. \rangle +|c_i \rangle }{\sqrt{2}}\), the server must measure the state \(\left| {\psi _i } \right. \rangle \). However, he cannot perform the equivalent measurement which the client does, because he does not know \(c_i\). Therefore, if the server measures the encoded state, he will certainly disturb it. In the following section, we will analyze two measure-based attacks by a dishonest server in detail.

First, if the server directly measures the encoded state \(\frac{\left| 0 \right. \rangle +|c_i \rangle }{\sqrt{2}}\) by a simple projective measurement (intercept), the measured result can be either \(\left| 0 \right. \rangle \) or \(\left| {c_i } \right. \rangle \) with the probabilities \(\frac{1}{2}\) and \(\frac{1}{2}\), respectively. If he gets \(\left| {c_i } \right. \rangle \), he can successfully pass the honest test by repreparing and returning a new quantum system in the state \(\left| {\psi _i } \right. \rangle =\frac{\left| 0 \right. \rangle +|c_i \rangle }{\sqrt{2}}\) (resend). However, if he gets \(\left| 0 \right. \rangle \), he cannot pass the honest test. In short, this intercept–resend attack will be discovered in the honest test with the probability of \(\frac{1}{2}\), that is, our scheme is cheat-sensitive [1921]. In order to further resist this intercept–resend attack, in principle the client can replace the encoded state \(\left| {\psi _i } \right. \rangle =\frac{\left| 0 \right. \rangle +|c_i \rangle }{\sqrt{2}}\) with states of the form \(\frac{e^{i \theta }\left| 0 \right. \rangle +|c_i \rangle }{\sqrt{2}}\) [19], where the phase \(\theta \) is a parameter randomly and privately selected by the client. Since the server does not know the value of \(\theta \), it will be clearly impossible for him to reprepare the correct reply state after his measurement.

Second, we further discuss a more complicated entangle-measure attack by a dishonest server that he/she is able to prepare an ancillary system and entangle the ancillary system with the quantum system carried the encoded states by his local unitary operations, and afterward he can measure the ancillary system to get the partial information about the client’s private inputs. For simplicity, we only consider the client’s quantum system in the general state of \(\frac{\left| {0\rangle +} \right| k\rangle }{\sqrt{2}}\), where k is a private element of the client’s set C. Suppose that the initial state of the ancillary system is \(|0\rangle _S\) and the server’s dishonest action when he receives the client’s encoded states can be described by a unitary operator \({\tilde{U}}_\mathrm{cs}\), which acts on the client’s quantum system c and the server’s ancillary system s. We can describe it as follows:

$$\begin{aligned}&{\tilde{U}}_\mathrm{cs} \left| {0\rangle _c } \right| 0\rangle _s =\sqrt{\eta _0 }\left| {0\rangle _c } \right| \phi _0 \rangle _s +\sqrt{1-\eta _0 }|V_0 \rangle _\mathrm{cs}, \end{aligned}$$
(4)
$$\begin{aligned}&{\tilde{U}}_\mathrm{cs} \left| {k\rangle _c } \right| 0\rangle _s =\sqrt{\eta _k }\left| {k\rangle _c } \right| \phi _k \rangle _s +\sqrt{1-\eta _k }|V_k \rangle _\mathrm{cs}, \end{aligned}$$
(5)
$$\begin{aligned}&{\tilde{U}}_\mathrm{cs} \left( {\frac{| 0\rangle +|k\rangle }{\sqrt{2}}} \right) _c | {0\rangle _s =\sqrt{\eta _{+k} }\left( {\frac{| 0 \rangle +|k\rangle }{\sqrt{2}}} \right) _c } |\phi _{+k} \rangle _s +\sqrt{1-\eta _{+k} }|V_{+k} \rangle _\mathrm{cs} , \end{aligned}$$
(6)

where \(|V_0 \rangle _\mathrm{cs}\), \(|V_k \rangle _\mathrm{cs}\) and \(|V_{+k} \rangle _\mathrm{cs}\) are a vector orthogonal to \(\left| {0\rangle _c } \right| \phi _0 \rangle _s \), \(\left| {k\rangle _c } \right| \phi _k \rangle _s \) and \(|+k\rangle _c |\phi _{+k} \rangle _s\) \((\left| {+k} \right. \rangle =\frac{\left| 0 \right. \rangle +|k\rangle }{\sqrt{2}})\), respectively, i.e.,

$$\begin{aligned}&{}_c \langle 0|_s \langle \phi _0 |V_0 \rangle _\mathrm{cs} =0, \end{aligned}$$
(7)
$$\begin{aligned}&_c \langle k|_s \langle \phi _k |V_k \rangle _\mathrm{cs} =0, \end{aligned}$$
(8)
$$\begin{aligned}&_c \langle +k|_s \langle \phi _{+k} |V_{+k} \rangle _\mathrm{cs} =0. \end{aligned}$$
(9)

In order to completely pass the honest test (i.e., the returned system c must be \(\frac{\left| {0\rangle +} \right| k\rangle }{\sqrt{2}}\) without consideration of the phase transformation), we can easily deduce that the following condition holds in Eq. (6):

$$\begin{aligned} \eta _{+k} =1, \end{aligned}$$
(10)

that is,

$$\begin{aligned} {\tilde{U}}_\mathrm{cs} \left( {\frac{\left| 0 \right. \rangle +|k\rangle }{\sqrt{2}}} \right) _c | {0\rangle _s =\left( {\frac{\left| 0 \right. \rangle +|k\rangle }{\sqrt{2}}} \right) _c } |\phi _{+k} \rangle _s . \end{aligned}$$
(11)

In addition, obviously the returned states cannot contain other vectors except the vectors of \(|0\rangle _c\) and \(|k\rangle _c\). Thus, in order to fully pass the honest test, Eqs. (4) and (5) must be restrained as the following expressions, accordingly:

$$\begin{aligned} {\tilde{U}}_\mathrm{cs} \left| {0\rangle _c } \right| 0\rangle _s =\sqrt{\eta _0 }\left| {0\rangle _c } \right| \phi _0 \rangle _s +\sqrt{1-\eta _0 }\left| {k\rangle _c } \right| \phi _k \rangle _s , \end{aligned}$$
(12)
$$\begin{aligned} {\tilde{U}}_\mathrm{cs} \left| {k\rangle _c } \right| 0\rangle _s =\sqrt{\eta _k }\left| {k\rangle _c } \right| \phi _k \rangle _s +\sqrt{1-\eta _k }\left| {0\rangle _c } \right| \phi _0 \rangle _s . \end{aligned}$$
(13)

Given from Eq. (11), when \(k=0\), it further gets

$$\begin{aligned} {\tilde{U}}_\mathrm{cs} \left| {0\rangle _c } \right| 0\rangle _s =\left| {0\rangle _c } \right| \phi _0 \rangle _s . \end{aligned}$$
(14)

It implies

$$\begin{aligned} \eta _0 =1. \end{aligned}$$
(15)

Then, we get

$$\begin{aligned} {\tilde{U}}_\mathrm{cs} \left| {k\rangle _c } \right| 0\rangle _s= & {} {\tilde{U}}_\mathrm{cs} \left[ {\sqrt{2}\left( {\frac{\left| k \right. \rangle +\left| 0 \right. \rangle -|0\rangle }{\sqrt{2}}} \right) } \right] _c |0\rangle _s\nonumber \\= & {} {\tilde{U}}_\mathrm{cs} \left[ {\sqrt{2}\left| {+k} \right. \rangle -|0\rangle } \right] _c |0\rangle _s\nonumber \\= & {} \sqrt{2}{\tilde{U}}_\mathrm{cs} \left| {+k\rangle _c } \right| 0\rangle _s -{\tilde{U}}_\mathrm{cs} \left| {0\rangle _c } \right| 0\rangle _s\nonumber \\= & {} \sqrt{2}\left| {+k\rangle _c } \right| \phi _{+k} \rangle _s -\left| {0\rangle _c } \right| \phi _0 \rangle _s \quad \hbox {(by Eq. 11)}\nonumber \\= & {} \sqrt{2}\left( \frac{\left| 0 \right. \rangle +\left| k \right. \rangle }{\sqrt{2}}\right) _c \left| {\phi _{+k} \rangle _s -} \right| 0\rangle _c |\phi _0 \rangle _s\nonumber \\= & {} \left| {0\rangle _c } \right| \phi _{+k} \rangle _s +\left| {k\rangle _c } \right| \phi _{+k} \rangle _s -\left| {0\rangle _c } \right| \phi _0 \rangle _s. \end{aligned}$$
(16)

If we compute the scalar product between Eqs. (13) and (16), then we will obtain the identity

$$\begin{aligned} 1= & {} \sqrt{1-\eta _k} { }_s \langle \phi _0 \left| {\phi _{+k} \rangle _s +\sqrt{\eta _k}\; { }_s \langle \phi _k } \right| \phi _{+k} \rangle _s\nonumber \\&-\sqrt{1-\eta _k}\; { }_s \langle \phi _0 |\phi _0 \rangle _s\nonumber \\= & {} \sqrt{1-\eta _k}\; { }_s \langle \phi _0 \left| {\phi _{+k} \rangle _s +\sqrt{\eta _k} \; { }_s \langle \phi _k } \right| \phi _{+k} \rangle _s\nonumber \\&-\sqrt{1-\eta _k }. \end{aligned}$$
(17)

Since \({ }_s \langle \phi _0 |\phi _{+k} \rangle _s \le 1\) and \({ }_s \langle \phi _k |\phi _{+k} \rangle _s \le 1\), so we get

$$\begin{aligned} 1\le \sqrt{1-\eta _k }+\sqrt{\eta _k }-\sqrt{1-\eta _k }, \end{aligned}$$
(18)

that is,

$$\begin{aligned} 1\le \sqrt{\eta _k }, \end{aligned}$$
(19)

which implies

$$\begin{aligned} \eta _k =1. \end{aligned}$$
(20)

Thus, we can obtain the following expanded expression

$$\begin{aligned} {\tilde{U}}_\mathrm{cs} \left( {\frac{\left| 0 \right. \rangle +|k\rangle }{\sqrt{2}}} \right) _c |0\rangle _s= & {} \frac{{\tilde{U}} _\mathrm{cs} \left| {0\rangle _c } \right| 0\rangle _s +{\tilde{U}}_\mathrm{cs} \left| {k\rangle _c } \right| 0\rangle _s}{\sqrt{2}}\nonumber \\&=\frac{\left| {0\rangle _c } \right| \phi _0 \rangle _s +\left| {k\rangle _c } \right| \phi _k \rangle _s}{\sqrt{2}}. \end{aligned}$$
(21)

Similarly, if we compute the scalar product between Eqs. (11) and (21), then we will obtain

$$\begin{aligned} 1=\frac{1}{2}{ }_s \langle \phi _0 | {\phi _{+k} \rangle _s +\frac{1}{2}{ }_s \langle \phi _k} |\phi _{+k} \rangle _s . \end{aligned}$$
(22)

By Eq. (22), it gives

$$\begin{aligned}&{ }_s \langle \phi _0 |\phi _{+k} \rangle _s =1, \end{aligned}$$
(23)
$$\begin{aligned}&{ }_s \langle \phi _k |\phi _{+k} \rangle _s =1. \end{aligned}$$
(24)

From Eqs. (23) and (24), it shows that if the server wants to be sure that he passes the honest test, then the final states of the ancillary system s for any choice of k will coincide with \(|\phi _0 \rangle _s\), that is, the states of the ancillary system s are independent of the secret k. Therefore, even though the server performs an entangle-measure attack, he cannot yet obtain any secret information about the secret k from the encoded state \(\frac{\left| 0 \right. \rangle +|k\rangle }{\sqrt{2}}\).

Server Privacy If the client honestly executes the protocol, he/she cannot obtain any secret information about the server’s private set, except knowing \(C\cap S\). If the client is dishonest, it is possible for him/her to perform a cheating strategy as follows: he/she sends a false state \(\frac{\left| j \right. \rangle +|k\rangle }{\sqrt{2}}\) to the server, instead of the true state \(\frac{\left| 0 \right. \rangle +|k\rangle }{\sqrt{2}}\). Accordingly, the returned state from the server must be in either \(\frac{\left| j \right. \rangle +|k\rangle }{\sqrt{2}}\) or \(\frac{\left| j \right. \rangle -|k\rangle }{\sqrt{2}}\). However, from the states of \(\frac{\left| j \right. \rangle +|k\rangle }{\sqrt{2}}\) or \(\frac{\left| j \right. \rangle -|k\rangle }{\sqrt{2}}\), the client cannot get the right phase information \(p\left( j \right) \) or \(p\left( k \right) \), while he/she can infer that \(p\left( j \right) =p\left( k \right) \) or \(p\left( j \right) \ne p\left( k \right) \), but not deduce whether j or k belongs to the server’s private set.

We have analyzed the security of proposed protocols in ideal settings. However, in practical settings, there may be some faults (e.g., noise and error) in the quantum channel and measurement. In order to ensure its security in practical settings, we can use the fault-tolerant technologies, such as decoherence-free states and error-correcting code, which were introduced in Refs. [22, 23]. In addition, please note that we only consider the honest-but-curious parties in our protocols, which is similar to the semi-honesty model in the classical settings. In classical settings, any secure protocol in semi-honesty model can be correspondingly translated into a secure protocol in malicious model. However, it still needs to further study how to translate a protocol from semi-honesty model to malicious model in quantum settings. It is also our future work (especially the definition of malicious model in quantum settings).

In addition, the authenticated quantum channel can ensure the security of quantum communications. Like most existing secure multiparty quantum computations, our scheme needs there is an authenticated quantum channel. This is the only assumption we need to have for the scheme to work. In principle, we may use a quantum authentication scheme (QAS) [16] based on Clifford operators introduced in [17] to implement it. We may also use quantum encryptions combined with classical authenticated keys [24, 25]. In addition, we may still ensure the authentication by sharing the entangled quantum resources in advance or using the detecting (or decoy) particle technologies.

Finally, we analyze the communication costs of the proposed scheme. We can easily see that the client sends and receives n encoded states, respectively. So the communication complexity is \(O\left( n \right) \), irrespective of the size of the server’s set. Compared to the classical PSI schemes with \(O\left( {n+m} \right) \) communication complexity, our proposed scheme has a very significant reduction in the communication complexity due to \(O\left( n \right) \) communication complexity.

4 Conclusion

In this paper, we first presented a quantum method to solve PSI problem. In the proposed PSI scheme, the client first prepares n encoded states and then sends them to the server. After applying n quantum operators, the server sends these encoded states back to the client. Finally, the client performs n von Neumann measurements to privately choose out all elements of the intersection of their respective sets. During this process, two parties only require to exchange n quantum states. Obviously, both computation and communication complexities of the proposed scheme are \(O\left( n \right) \), which are independent of the server’s set size m. Therefore, it is very suitable for big data services in Cloud or large-scale client–server networks