Abstract
Similarity measurement is a fundamental problem that arise both on its own and as a key subroutine in more complex tasks, such as machine learning. However, in classical algorithms, the time used to similarity measurement usually increases exponentially as the amount of data and the number of data dimensions increase. In this paper, we presented three quantum algorithms based on Euclidean distance to measure the similarity between data sets. In the proposed algorithms, some special unitary operations are utilized to construct imperative quantum states from quantum random access memory. Then, a badly needed result for estimating the similarity between data sets, can be got by performing projective measurements. Furthermore, it is shown that these algorithms can achieve the exponential acceleration of the classical algorithm in the quantity or the dimension of data.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
1 Introduction
Dating from the 80’s of last century, it is shown that quantum computing allows for more efficient solutions of some problems than classical computing. The prime-factorization problem is one of this. In 1994, by exploiting quantum mechanical properties, Shor proposed the famous Shor algorithm [1], which is exponentially faster than the best-known classical factoring algorithm. The other one is Grover algorithm for solving unsorted database search problem [2], which makes quadratic speedup over the classical search algorithm. Inspired by these algorithms, people try to exploit quantum computing to solve other problems in the field of information technology [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], e.g., machine learning. The most representative one is HHL algorithm [3]. In this algorithm, Harrow et al. utilized Hamiltonian simulation [4] and phase estimation technology to solve linear equations. Furthermore, it has exponential speedup compared with classical counterpart, and can be widely applied in machine learning numerical calculation or other scenarios. After that, more and more attention has been paid to this new multi-knowledge crossed research field, quantum machine learning. As for now, some subtle quantum machine learning algorithms have been proposed for various machine learning tasks, such as quantum principal component analysis [5], quantum supervised learning [6,7,8,9,10], quantum unsupervised learning [7, 11], quantum regression [12,13,14], quantum search engine ranking [15,16,17,18], quantum neural network [19], and so on. More excitingly, all these algorithms achieve desirable speedup effect.
Similarity measurement is a primitive machine learning task, which plays an indispensable role in data clustering [21, 22] and classification [23, 24]. It is a method to evaluate the degree of similarity between individuals. There exist many ways to depict similarity, the most common of which is to use Euclidean distance. That is, the closer the Euclidean distance between two data points (data vectors) is, the more similar they are; and vice versa. With the advent of the era of big data, the number of the data set and the dimension of the data points grow larger. As noted by Aaronson [25], sampling and estimating distances and inner products between post-processed vectors on a classical computer is apparently exponentially hard. Therefore, how to effectively similarity measurement has become a new challenge. Quantum computing may be a good solution to this problem. In 2014, Lloyd exploited quantum parallelism to accelerate the estimation of Euclidean distance from a training data vector to a sample cluster [7]. However, there are many other similarity measurements based on Euclidean distance in classical machine learning algorithms, such as group average distance [26]. Hence, in this work, we examine the issue further and propose three quantum algorithms based on Euclidean distance similarity measurement. In these algorithms, we use the characteristic of quantum random access memory (QRAM) to access data in parallel [27] and the amplitude estimation technology [28] to estimate the Euclidean distance between data sets. And the analysis shows that these algorithms are more efficient than the corresponding classical algorithms.
The remainder of this paper is organized as follows. In Section 2, we briefly introduce the essential preliminaries. Then, the quantum algorithms of three distance measurement methods respectively are described in Section 3. A brief analysis of three proposed quantum algorithms is presented in Section 4. Finally, a short conclusion is provided in Section 5.
2 Preliminaries
2.1 Similarity Measurement Based on Euclidean Distance
As the key of clustering algorithm, similarity measurement can be expressed in many ways. The most common way is Euclidean distance. Given two N-dimensional data vectors u and v, the similarity of them expressed in squared Euclidean distance is \(d^{2}(\mathbf {u},\mathbf {v})={{\sum }_{k=1}^{N}(u_{k}-v_{k})^{2}}\). However, we need not only to estimate the similarity between the points, but also to calculate the distance of two sets in practice. For example, there are three different Euclidean distance measurements in hierarchical clustering, which will lead to different results. Therefore, different methods are needed for different types of data sets. It makes sense to design quantum algorithms for these distances respectively. Before describing quantum algorithms, we review three classical methods, briefly.
Considering the task of similarity measurement for two data sets U and V. And p data points \(\overrightarrow {u_{i}}\in U\) and q data points \(\overrightarrow {v_{j}}\in V\) are given.
-
i
Average Linkage (average distance). As shown in Fig. 1a, the squared Euclidean distance between each data point in one set and that in the other set is calculated firstly. Then, by taking the mean as the distance between the two data sets. The average distance between U and V is obtained, which can be written as:
$$ D(U,V)=\frac{{\sum}_{\overrightarrow{u_{i}}\in U,\overrightarrow{v_{j}}\in V}d^{2}(\overrightarrow{u_{i}},\overrightarrow{v_{j}})}{pq}. $$(1) -
ii
Complete Linkage (furthest distance). As shown in Fig. 1b, this method is to take the distance between the two furthest data points in the two data sets as the distance between two sets. The furthest distance between U and V can be written as:
$$ D^{\prime}(U,V)=\underset{\overrightarrow{u_{i}}\in U,\overrightarrow{v_{j}}\in V}{\max}{d^{2}(\overrightarrow{u_{i}},\overrightarrow{v_{j}})}. $$(2) -
iii
Single Linkage (nearest distance). As shown in Fig. 1b, the single linkage, contrary to the complete linkage, takes the distance between the nearest data points in the two data sets as the distance between the sets. The nearest distance between U and V can be written as:
$$ D^{\prime\prime}(U,V)=\underset{\overrightarrow{u_{i}}\in U,\overrightarrow{v_{j}}\in V}{\min}{d^{2}(\overrightarrow{u_{i}},\overrightarrow{v_{j}})}. $$(3)
These methods complement each other, satisfying common data types. For example, the average linkage is suitable for ellipsoid data sets, and the single linkage is better for strip data sets [29]. Hence, the three methods could be widely applied to classification and clustering of data sets.
2.2 Data Preparation and Pre-processing
In quantum machine learning, we generally assume that data sets consist of arrays of vectors stored in QRAM. Given a N = 2n dimensional complex vector \(\overrightarrow {u}\), we can express it as the composition of N components of \(\overrightarrow {u}=(u_{1},u_{2},\cdots , u_{N})\), where \(\{{u_{k}}=|u_{k}|e^{\texttt {i}\phi _{k}},\ \texttt {i}=\sqrt {-1}\}\). Moreover, we assume that {|uk|,ϕk} are stored as floating point numbers in quantum random access memory, and stored as “binary trees” data structure that is shown in Fig. 2. The details can be found in its initial proposal for designing quantum recommendation system [30], as well as its successful application in designing the quantum linear system algorithm for dense matrices [31]. In short, N dimensional vector \(\overrightarrow {u}\) can be described by a quantum state \(|u\rangle =|\overrightarrow {u}|^{-1} \overrightarrow {u}\) of n qubits in time \(O\left (\log _{2} N\right )\). Once the exponentially compressed quantum versions of the vectors have been created, we can apply these vectors to the proposed quantum algorithms.
3 Quantum Algorithms
In this section, we present quantum algorithms for similarity measurement. It consists of three ways to compute similarity of two data sets based on Euclidean distance: quantum algorithms for estimating the average linkage, the complete linkage and the single linkage.
Suppose that two N dimensional data sets \(U:\left \{\overrightarrow {u_{i}} | i=1,2, \cdots , p\right \}\) and \(V:\left \{\overrightarrow {v_{j}} | j=1,2, \cdots , q\right \}\) are given. From Section 2.2, the corresponding normalization \(\left |\overrightarrow {u_{i}}\right |\) and \(\left |\overrightarrow {v_{j}}\right |\) can be respectively obtained from quantum random access memory. Then, we can read data from QRAM efficiently and build their corresponding form of quantum state \(\left |u_{i}\right \rangle \) and \(\left |v_{j}\right \rangle \) through the following two unitary operations
Here, \(A={\sum }_{i=1}^{p}\left |\overrightarrow {u}_{i}\right |^{2}\), \(B={\sum }_{j=1}^{q}\left |\overrightarrow {v_{j}}\right |^{2}\). That is, the operations \(U_{\mathcal {F}}\), \(U_{\mathcal {T}}\) allow preparation of quantum state encoding an original data point of data sets \(\left \{\overrightarrow {u_{i}}\right \}\) and \(\left \{\overrightarrow {v_{j}}\right \}\) respectively. Moreover, based on the conclusions of Ref. [32], two controlled operators \(C(U_{\mathcal {F}})\), \(C(U_{\mathcal {T}})\) may be constructed with the circuits, if the operators \(U_{\mathcal {F}}\) and \(U_{\mathcal {T}}\) are implemented efficiently. In the following, the implementations of quantum algorithms for three computing methods are put forward.
3.1 Algorithm 1: a Quantum Algorithm for Estimating the Average Linkage
From (1), the average distance between two data sets can be expressed as quantum form
Obviously, we are able to use Lloyd algorithm [7] to accomplish this task. Namely, by Lloyd algorithm, the distances between all data points in one data set and the other data set are got, then the average distance is directly derived from the average value of these distances. However, it needs time \(O\left (pq\log N\right )\) and is not efficient. To achieve the internal quantum efficiency, we reexamine this task and propose a new quantum algorithm, which is described as follows.
-
(A1.1)
Let us start with constructing a quantum system that consists of four registers, in the initial state, |ϕ〉 = |0〉1|0〉2|0〉3|0〉4, where the subscript indicates the location of register.
-
(A1.2)
An operation \(H_{1}\otimes {{F}_{2}^{p}} \otimes I_{3} \otimes {{F}_{4}^{q}}\) is applied on these four registers. Here, \(H=\frac {1}{\sqrt 2} (|0\rangle \langle 0|+|0\rangle \langle 1|+|1\rangle \langle 0|-|1\rangle \langle 1|)\) is a Hadamard operator, and Fp (Fq) is a p(q)-level Fourier transform operator depicted as follows,
$$ F^{p} = \sum\limits_{i=1}^{p}{\omega}^{il}|i\rangle \langle l|, $$(7)where, \(\omega =e^{\frac {2\pi \texttt {i}}{p}}\). After this operation, the whole system is in the state,
$$ \begin{array}{ll} \left|\varphi\right\rangle &=H_{1}\otimes {{F}_{2}^{p}} \otimes I_{3} \otimes {{F}_{4}^{q}}|\phi\rangle_{1234}\\ &=\frac{1}{\sqrt{2 p q}} {\sum}_{i=1}^{p} {\sum}_{j=1}^{q}(|0\rangle|i\rangle|0\rangle|j\rangle+|1\rangle|i\rangle|0\rangle|j\rangle)_{1234}, \end{array} $$(8) -
(A1.3)
In this step, a controlled operator \(C(U_{\mathcal {F}})_{123}\) (\(C(U_{\mathcal {T}})_{134}\)) is performed on \(\left |\varphi \right \rangle \), where the first register is the control qubit, and the second and third (second and fourth) registers are the target. That is, if the control qubit is in state |0〉 then apply \(U_{\mathcal {F}}\) to the target registers, otherwise apply \(U_{\mathcal {T}}\), as shown in Fig. 3. After performing the operations \(C(U_{\mathcal {F}})_{123}\) and \(C(U_{\mathcal {T}})_{134}\), the desired state can be obtained,
$$ \begin{array}{ll} |\psi\rangle &= C(U_{\mathcal{F}})_{123} C(U_{\mathcal{T}})_{134}|\varphi\rangle_{1234}\\ &=\frac{1}{\sqrt{q A+p B}} {\sum}_{i=1}^{p} {\sum}_{j=1}^{q}\left( |0\rangle|i\rangle\left|\overrightarrow{u_{i}}\right|\left|u_{i}\right\rangle|j\rangle+|1\rangle|i\rangle\left|\overrightarrow{v_{j}} \| v_{j}\right\rangle|j\rangle\right)_{1234}, \end{array} $$(9)where, \(A={\sum }_{i=1}^{p}\left |\overrightarrow {u_{i}}\right |^{2}\) and \(B={\sum }_{j=1}^{q}\left |\overrightarrow {v_{j}}\right |^{2}\).
-
(A1.4)
A projective measurement is performed on the first register of state \(\left |\psi \right \rangle \) in the basis \(\{|+\rangle ,|-\rangle \ |\ |\pm \rangle =\frac {1}{\sqrt 2}(|0\rangle \pm |1\rangle )\}\). A direct and simple calculation shows that
$$ Pr(-)=\frac{\left( q {\sum}_{i=1}^{p}\left|\overrightarrow{u_{i}}\right|^{2}+p {\sum}_{j=1}^{q}\left|\overrightarrow{v_{j}}\right|^{2}-2 {\sum}_{i=1}^{p} {\sum}_{j=1}^{q}\left|\overrightarrow{u_{i}}\right|\left|\overrightarrow{v_{j}}\right|\left\langle u_{i} | v_{j}\right\rangle\right)}{2(q A+p B)}. $$(10)which is the probability of that the measurement outcome is |−〉.
-
(A1.5)
Now, we can straightforward to calculate that square of desired distance,
$$ \sum\limits_{i=1}^{p} \sum\limits_{j=1}^{q}\left|\overrightarrow{u_{i}}-\overrightarrow{v_{j}}\right|^{2}=2(q A+p B)\times Pr(-). $$(11)After that, the similarity can be easy to deduce, is equal to \(\frac {{2(q A+p B)\times Pr(-)}}{pq}\).
Through the above steps, we can obtain the average Euclidean distance between two data sets, which can be used to describe the similarity of these. The schematic quantum circuit of Algorithm 1 is shown in Fig. 4.
3.2 Algorithm 2: a Quantum Algorithm for Estimating the Complete Linkage
In classical machine learning, because of the complexity of the average linkage calculation, the complete linkage is used as the distance between data sets sometimes. From (2), it is evident that the complete linkage is equal to the task of finding the maximum, from the distances between all data points in set \(U=\left \{\overrightarrow {u_{i}}\right \}\) and each data point of set \(V=\left \{\overrightarrow {v_{j}}\right \}\), i.e., \(D^{\prime }(U,V)=\max \limits _{i}{{{d}_{i}^{2}}}=\max \limits _{i}{(\max \limits _{j}{d^{2}(\overrightarrow {u_{i}},\overrightarrow {v_{j}})})}\).
Next, the second quantum algorithm for estimating the complete linkage, Algorithm 2, is presented. In this algorithm, we take the same assumption as that in Section 2, the data points are stored as quantum states in QRAM. So, the quantum counterpart of the distance between two points \(\overrightarrow {u_{i}}\) and \(\overrightarrow {v_{j}}\) is represented as
Besides, without loss of generality, we can assume p ≤ q. In this case, we reckon the maximum distance between the sample vector \( \overrightarrow {u_{i}}\) (i = 1,2,⋯ ,p) and each data point of data set \(\left \{ \overrightarrow {v_{j}} \right \}\) in turn, and get set \(\left \{{{d}_{1}^{2}},{{d}_{2}^{2}}, \cdots , {{d}_{p}^{2}}\right \}\) (\({{d}_{i}^{2}}=\max \limits _{j}{d^{2}(\overrightarrow {u_{i}},\overrightarrow {v_{j}})}\), i = 1,2,⋯ ,p). Then, the desired distance can be find based on it. The details of the second algorithm are described in the following steps.
-
(A2.1)
In a similar way to Algorithm 1, p initial states \(|{\phi }_{i}^{\prime }\rangle \) (i = 1,2,⋯ ,p) are prepared firstly. Each state \(|{\phi }_{i}^{\prime }\rangle \) consists of four registers and in the initial state, \(|{\phi }_{i}^{\prime }\rangle = |0\rangle _{1}|i\rangle _{2}|0\rangle _{3}|0\rangle _{4}\). It will be utilized to calculate distance between the ith data point \( \overrightarrow {u_{i}}\) in data set \(\left \{ \overrightarrow {u_{i}} \right \}\) and each point of data set \(\left \{ \overrightarrow {v_{j}} \right \}\).
-
(A2.2)
An operation \(H_{1}\otimes I_{2} \otimes I_{3} \otimes {{F}_{4}^{q}}\) is applied on the state \(|\phi _{i}^{\prime }\rangle \). After this operation, it is in the state,
$$ \begin{array}{ll} \left|{\varphi}_{i}^{\prime}\right\rangle &=H_{1}\otimes I_{2} \otimes I_{3} \otimes {F^{q}_{4}}|\phi_{i}^{\prime}\rangle_{1234}\\ &=\frac { 1 } { \sqrt { 2q } } {\sum}_{ j=1 }^{ q } (| 0 \rangle | i \rangle | 0 \rangle | j \rangle + | 1 \rangle | i \rangle | 0 \rangle | j \rangle)_{1234}. \end{array} $$(13) -
(A2.3)
Two controlled operations \(C(U_{\mathcal {F}})_{123}\), \(C(U_{\mathcal {T}})_{134}\) are performed on the state \(\left |{\varphi }_{i}^{\prime }\right \rangle \). And the details are the same as that in step (A1.3) of Algorithm 1. Then, we obtain the state
$$ \begin{array}{ll} \left|{\psi}_{i}^{\prime}\right\rangle&=C(U_{\mathcal{F}})_{123}C(U_{\mathcal{T}})_{134}|{\varphi}_{i}^{\prime}\rangle_{1234}\\ &=\frac{{\sum}_{j=1}^{q}\left( |0\rangle|i\rangle\left|\overrightarrow{u_{i}}\right|\left|u_{i}\right\rangle|j\rangle+|1\rangle|i\rangle\left|\overrightarrow{v_{j}}\right|\left|v_{j}\right\rangle|j\rangle\right)_{1234}}{\sqrt{q\left|\overrightarrow{u_{i}}\right|^{2}+{\Sigma}_{j}\left|\overrightarrow{v_{j}}\right|^{2}}}. \end{array} $$(14) -
(A2.4)
In this step, we measure these q desired states \(\left |{\psi }_{i}^{\prime }\right \rangle \). Concretely, one projective measurement is performed on the first register of this state in the {|+〉,|−〉} basis. At the same time, the other projective measurement is on the fourth register. This projective measurement is described by an observable, M = Σj|j〉〈j|, a Hermitian operator on the state space of the system being observed. After that, these two registers will project into |−〉〈−| and |j〉〈j| space respectively, with probability
$$ Pr_{i j}=\frac{\left|\overrightarrow{u_{i}}\right|^{2}+\left|\overrightarrow{v_{j}}\right|^{2}-2\left|\overrightarrow{u_{i}}\right|\left|\overrightarrow{v_{j}}\right|\left\langle u_{i} | v_{j}\right\rangle}{2\left( q\left|\overrightarrow{u_{i}}\right|^{2}+{\sum}_{j=1}^{q}\left|\overrightarrow{v_{j}}\right|^{2}\right)}. $$(15)Thus, according to (12) and (15), the auxiliary qubit projects to the space spanned by the index \(\left |j\right \rangle \) of set \(\left \{\left |v_{j}\right \rangle \right \}\), that is the farthest from the ith data point in set \(\{\overrightarrow {u_{i}}\}\), with maximum probability mpi. Finally, it is straightforward to calculate the square of maximum distance between the ith sample vector in the data set \(\{\overrightarrow {u_{i}}\}\) and the data point each of data set \(\{\overrightarrow {v_{j}}\}\),
$$ {{d}_{i}^{2}}=mp_{i} \times 2\left( q\left|\overrightarrow{u_{i}}\right|^{2}+\sum\limits_{j=1}^{q}\left|\overrightarrow{v_{j}}\right|^{2}\right). $$(16) -
(A2.5)
Repeating steps (A2.2)-(A2.4) for each \(|\phi _{i}^{\prime }\rangle \) (i = 1,2,⋯ ,p) in turn. It allows us to get the distance set \({\varOmega }=\left \{{{d}_{1}^{2}},{{d}_{2}^{2}}, \cdots , {{d}_{p}^{2}}\right \}\). Then, it costs time O(p) to check every \({{d}_{i}^{2}}\) and find the maximum by using classical methods. Hence, we retrieve the farthest distance \({d}_{\max \limits }^{2}\) of data point in the data sets \(\{\overrightarrow {u_{i}}\}\) and \(\{\overrightarrow {v_{j}}\}\).
Through the above steps, we can obtain the squared Euclidean distance between the furthest data vectors in two sets. Its execution operations are similar to Algorithm 1, so the circuit diagram can refer to Fig. 4.
3.3 Algorithm 3: a Quantum Algorithm for Estimating the Single Linkage
The single linkage method is more suitable for estimating the similarity of s-shaped and strip-shaped data sets in the clustering algorithm [29]. As shown in Section 2, the single linkage is measuring distance of the nearest pair of data vector in data sets \(\{\overrightarrow {u_{i}}\}\) and \(\{\overrightarrow {v_{j}}\}\), i.e., \(D^{\prime \prime }(U,V)=\min \limits {{{d}_{i}^{2}}}=\min \limits _{i}{(\min \limits _{j}{d^{2}(\overrightarrow {u_{i}},\overrightarrow {v_{j}})})}\). Obviously, the method of estimating the complete linkage can be generalized to that of the single linkage. Next, we will describe the third quantum algorithm for estimating the single linkage. Before describing the steps of it, we first assume that the data sets have been processed which makes the different sets are in different spatial quadrants. Then, we introduce two quantum oracles.
Suppose that the following two quantum oracles Ou, Ov exist to obtain the reciprocal of each component of \(\overrightarrow {u_{i}}\) and \(\overrightarrow {v_{j}}\) respectively,
Here, the data vectors’ components are stored as floating point numbers in quantum random access memory, and the states \(\left |\tilde {u}_{i k}^{\prime }\right \rangle \) and \(|\tilde {v}_{j k}^{\prime }\rangle \) representing vectors may not be normalized. Meanwhile, the two oracles can be used to efficiently prepare the vector states \(\left |\tilde {u}_{i}^{\prime }\right \rangle \), \(\left |\tilde {v}_{j}^{\prime }\right \rangle \) in time \(O\left (\log N\right )\).
The essential steps of this algorithm is depicted as follows.
-
(A3.1)
At first, we construct p initial states \(|{\phi }_{i}^{\prime \prime }\rangle = |0\rangle _{1}|i\rangle _{2}|0\rangle _{3}|0\rangle _{4}\), and each one consists of four registers. As similar to Algorithm 2, the state \(|{\phi }_{i}^{\prime \prime }\rangle \) means we estimate the distance between the ith vector \( \overrightarrow {{u}_{i}}\) and each point of data set \(\left \{ \overrightarrow {{v}_{j}} \right \}\), in this time.
-
(A3.2)
We apply an operation \(H_{1}\otimes I_{2} \otimes {{F}_{3}^{N}} \otimes {{F}_{4}^{q}}\) on the state \(|{\phi }_{i}^{\prime \prime }\rangle \). Here, FN (Fq) is a N(q)-level Fourier transform. In this way, we can obtain state
$$ \begin{array}{ll} \left|{\varphi}_{i}^{\prime\prime}\right\rangle &=H_{1}\otimes I_{2} \otimes {{F}_{3}^{N}} \otimes {{F}_{4}^{q}}|{\phi}_{i}^{\prime\prime}\rangle_{1234}\\ &=\frac{1}{\sqrt{2q}} {\sum}_{j=1}^{q} {\sum}_{k=1}^{N}(|0\rangle|i\rangle|k\rangle|j\rangle+|1\rangle|i\rangle|k\rangle|j\rangle)_{1234}. \end{array} $$(21) -
(A3.3)
Two controlled operations are performed on the state \(\left |{\varphi }_{i}^{\prime \prime }\right \rangle \) based on the oracles. That is, the oracle Ou is applied on the second and the third register if the first qubit of \(\left |{\varphi }_{i}^{\prime \prime }\right \rangle \) is in the state \(\left |0\right \rangle \); otherwise oracle Ov is applied on the third and fourth register. In this way, we will have the specific state
$$ \begin{array}{ll} \left|{\psi}_{i}^{\prime\prime}\right\rangle&=\frac{1}{\sqrt{2q}} {\sum}_{j=1}^{q} {\sum}_{k=1}^{N}\left( |0\rangle O_{u}|i\rangle|k\rangle|j\rangle+|1\rangle|i\rangle O_{v}|k\rangle|j\rangle\right)_{1234}\\ &=\frac{{\sum}_{j=1}^{q}\left( |0\rangle|i\rangle\left|\overrightarrow{{u}_{i}}^{\prime}\right|\left|{u}_{i}^{\prime}\right\rangle|j\rangle+|1\rangle|i\rangle\left|\overrightarrow{{v}_{j}}^{\prime}\right|\left|{v}_{j}^{\prime}\right\rangle|j\rangle\right)_{1234}}{\sqrt{q\left|\overrightarrow{{u}_{i}}^{\prime}\right|^{2}+{\sum}_{j=1}^{q}\left|\overrightarrow{{v}_{j}}^{\prime}\right|^{2}}}. \end{array} $$(22) -
(A3.4)
In this step, the state \(\left |{\psi }_{i}^{\prime \prime }\right \rangle \) instead of state \(\left |{\psi }_{i}^{\prime }\right \rangle \) in the second quantum algorithm. Then, we repeat steps (A2.2)-(A2.4) of Algorithm 2, it can easily adapts to compute the maximum \(d_{i}^{\prime 2}\).
-
(A3.5)
We get the distance set \({\varOmega }^{\prime }=\left \{{d}_{1}^{\prime 2},{d}_{2}^{\prime 2}, \cdots , {d}_{p}^{\prime 2}\right \}\) after we repeat steps (A3.2)-(A3.4) for each state \(|{\phi }_{i}^{\prime \prime }\rangle \) (i = 1,2,⋯ ,p). Hence, we retrieve a distance between the farthest pair of data sets (these correspond to the nearest pair of sets in the original data sets) in the way of step (A2.5) from Algorithm 2.
Although the nearest distance between the original data sets is not calculated through the above steps, our algorithm can map the Euclidean distance between the same closest data point pairs. Then, we can depict the similarity between the two sets by it. This result has the same effect on other machine learning tasks, such as hierarchical clustering.
4 Runtime Analysis
In this paper, we proposed three algorithms’ main ideas are similarly, that they all utilize special measurement results to estimate the Euclidean distance between data sets. However, their implementation details are different, so there are some differences in running time. In this section, the performance of them is analyzed briefly.
From Section 3.1, it is known that the time cost is mainly for two main steps in Algorithm 1, i.e., steps (A1.3) and (A1.4). Evidently, what step (A1.3) generates the state of (9) takes time \(O(\log (pq N^{2}))\) [31] based on two controlled operations \(C(U_{\mathcal {F}})\) and \(C(U_{\mathcal {T}})\). In step (A1.4), the time cost is evidently dominated by the amplitude estimation. According to [28], taking \(O\left (\sqrt {\frac {1-Pr(-)}{Pr(-)}} \cdot \frac {1}{\epsilon }\right )= O\left (\sqrt {\frac {1}{Pr(-)}} \cdot \frac {1}{\epsilon }\right )\) times to perform projective measurement ensures that desired result Pr(−) being within relative error 𝜖. Finally, we can be easy to calculate the average distance by the special measure result. Hence, the time cost of the total Algorithm 1 is \(O\left (\frac {1}{\epsilon \cdot \sqrt {Pr(-)}} \log \left (pq N^{2}\right )\right )\). Compared with time O(ploy(pqN)) for the best-known classical algorithm, in this case the exponential speedup can be achieved.
As for Algorithm 2, it is shown that there are two main differences with the last algorithm. One is preparation of state \(\left |{\psi }_{i}^{\prime }\right \rangle \) as shown in step (A2.3). It takes time \(O(\log (N)+\log (q N))\). Another is that Algorithm 2 gets the maximum distance set \({\varOmega }=\left \{{d_{1}^{2}},{d_{2}^{2}}, \cdots , {d_{p}^{2}}\right \}\). It is described in step (A2.5), and each \({{d}_{i}^{2}}\) for i = 1,2,⋯ ,p costs \(O\left (\frac {1-mp_{i}}{mp_{i}} \cdot \frac {1}{\epsilon _{d}}\right )= O\left (\frac {1}{{mp_{i}\cdot {\epsilon _{d}}}} \right )\) times within relative error 𝜖d of mpi. Therefore, getting the distance set Ω takes time \(O\left ({\sum }_{i=1}^{p} \frac {1}{{{mp_{i}} \cdot \epsilon _{d}}} \log (q N^{2})\right )\). Then, we find the maximum of set \(\left \{{{d}_{1}^{2}},{{d}_{2}^{2}}, \cdots , {{d}_{p}^{2}}\right \}\) by using the simple classical method in time O(p). Through above brief analysis, the total time of Algorithm 2 is \(O\left ({\sum }_{i=1}^{p} \frac {1}{{{mp_{i}} \cdot \epsilon _{d}}} \log (q N^{2})+p\right )\). It means that when the number of data space and the dimension of data are large, Algorithm 2 can achieve exponential speedup over the classical similarity measurement algorithms.
In Algorithm 3, the method adopted is the generalization of the second algorithm. The only difference is that in step (A3.3) of the third quantum algorithm, we preprocess the original data and generate a specific state \(\left |\psi _{i}^{\prime \prime }\right \rangle \) with our quantum oracles Ou and Ov. Thus, the nearest distance is converted to the farthest distance to solve the problem. Due to the parallel processing of the quantum algorithm, the time complexity of Algorithm 3 is similar to that of Algorithm 2. Therefore, it is also exponentially accelerated relative to the steps that produce the same effect on the classical algorithm.
Through the brief analysis, we presented algorithms are accelerated to a certain extent compared with the classical algorithm. It is worth emphasizing that the quantum algorithm implementation of the proposed three programs allow us to call different subroutines according to different needs, just like the classical clustering algorithm.
5 Conclusion
In this paper, we made a further study on the quantum similarity measurement. Instead of date points, the distance between data sets is considered, which usually acts as a basic component of some machine learning algorithms. First, we discussed three common kinds of Euclidean distance methods. Then, the characteristics of parallel access data of quantum random access memory and amplitude estimation technology are utilized to propose the corresponding three quantum algorithms. At last, it is shown that the proposed algorithms can achieve exponential speedup over the classical counterparts for data sets with a mass of high-dimensional vector, via a brief complexity analysis. These results demonstrate that measuring the similarity between data sets can implement other interesting tasks with significantly less quantum resource, e.g., document clustering [33, 34], medical analysis [35, 36], and so on.
References
Shor, P.W.: Algorithms for quantum computation: discrete logarithms and factoring[C]. In: Proceedings 35th Annual Symposium on Foundations of Computer Science, pp 124–134. IEEE Press, New York (1994)
Grover, L.K.: Quantum mechanics helps in searching for a needle in a haystack[J]. Phys. Rev. Lett. 79, 325 (1997)
Harrow, A.W., Hassidim, A., Lloyd, S.: Quantum algorithm for linear systems of equations[J]. Phys. Rev. Lett. 103, 150502 (2009)
Berry, D.W., Ahokas, G., Cleve, R., et al.: Efficient quantum algorithms for simulating sparse Hamiltonians[J]. Commun. Math. Phys. 270, 359–371 (2007)
Lloyd, S., Mohseni, M., Rebentrost, P.: Quantum principal component analysis[J]. Nat. Phys. 10, 631–633 (2014)
Anguita, D., Ridella, S., Rivieccio, F., et al.: Quantum optimization for training support vector machines[J]. Neural Netw. 16, 763–770 (2003)
Lloyd, S., Mohseni, M., Rebentrost, P.: Quantum algorithms for supervised and unsupervised machine learning[J]. arXiv:1307.0411 (2013)
Rebentrost, P., Mohseni, M., Lloyd, S.: Quantum support vector machine for big data classification[J]. Phys. Rev. Lett. 113, 130503 (2014)
Cong, I., Duan, L.: Quantum discriminant analysis for dimensionality reduction and classification[J]. New J. Phys. 18, 073011 (2016)
Ruan, Y., Xue, X., Liu, H., et al.: Quantum algorithm for k-nearest neighbors classification based on the metric of hamming distance[J]. Int. J. Theor. Phys. 56, 3496–3507 (2017)
Aïmeur, E., Brassard, G., Gambs, S.: Quantum speed-up for unsupervised learning[J]. Mach. Learn. 90, 261–287 (2013)
Wiebe, N., Braun, D., Lloyd, S.: Quantum algorithm for data fitting[J]. Phys. Rev. Lett. 109, 050505 (2012)
Schuld, M., Sinayskiy, I., Petruccione, F.: Prediction by linear regression on a quantum computer[J]. Phys. Rev. A 94, 022342 (2016)
Yu, C.H., Gao, F., Wen, Q.Y.: Quantum algorithm for ridge regression[J]. arXiv:1707.09524 (2017)
Garnerone, S., Zanardi, P., Lidar, D.A.: Adiabatic quantum algorithm for search engine ranking[J]. Phys. Rev. Lett. 108, 230506 (2012)
Paparo, G.D., Martin-Delgado, M.A.: Google in a quantum network[J]. Sci. Rep. 2, 444 (2012)
Sánchez-Burillo, E., Duch, J., Gómez-Gardenes, J., et al.: Quantum navigation and ranking in complex networks[J]. Sci. Rep. 2, 605 (2012)
Paparo, G.D., Müller, M., Comellas, F., et al.: Quantum google in a complex network[J]. Sci. Rep. 3, 2773 (2013)
Schuld, M., Sinayskiy, I., Petruccione, F.: The quest for a quantum neural network[J]. Quantum Inf. Process 13, 2567–2586 (2014)
Jiang, D.H., Wang, J., Liang, X.Q., et al.: Quantum voting scheme based on locally indistinguishable orthogonal product states[J]. Int. J. Theor. Phys. 59, 436–444 (2020)
Cohen-Addad, V., Klein, P.N., Mathieu, C.: Local search yields approximation schemes for k-means and k-median in euclidean and minor-free metrics[J]. SIAM J. Comput. 48, 644–667 (2019)
Ahmadian, S., Norouzi-Fard, A., Svensson, O., et al.: Better guarantees for k-means and euclidean k-median by primal-dual algorithms[J]. SIAM Journal on Computing, 2019, FOCS17-97 (2019)
Soucy, P., Mineau, G.W.: A simple KNN algorithm for text categorization[C]. In: Proceedings 2001 IEEE International Conference on Data Mining, pp 647–648. IEEE, California (2001)
Xing, W., Bei, Y.: Medical health big data classification based on KNN classification algorithm[J] IEEE access (2019)
Aaronson, S.: BQP And the polynomial hierarchy[C]. In: Proceedings of the Forty-second ACM Symposium on Theory Of Computing, 141-150, California (2010)
Seifoddini, H.K.: Single linkage versus average linkage clustering in machine cells formation applications[J]. Comput. Ind. Eng. 16, 419–426 (1989)
Giovannetti, V., Lloyd, S., Maccone, L.: Quantum random access memory[J]. Phys. Rev. Lett. 100, 160501 (2008)
Brassard, G., Hoyer, P., Mosca, M., et al.: Quantum amplitude amplification and estimation[J]. Contemp. Math. 305, 53–74 (2002)
Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation[J]. ACM Trans. Knowl. Discov. Data 1, 4–es (2007)
Kerenidis, I., Prakash, A.: Quantum recommendation systems[J]. arXiv:1603.08675 (2016)
Wossnig, L., Zhao, Z., Prakash, A.: Quantum linear system algorithm for dense matrices[J]. Phys. Rev. Lett. 120, 050502 (2018)
Araújo, M., Feix, A., Costa, F., et al.: Quantum circuits cannot control unknown operations[J]. New J. Phys. 16, 093026 (2014)
Sardar, T.H., Ansari, Z.: An analysis of MapReduce efficiency in document clustering using parallel K-means algorithm[J]. Future Comput. Inform. J. 3, 200–209 (2018)
Yang, R., Qu, D., Qian, Y., et al.: An online log template extraction method based on hierarchical clustering[J]. EURASIP J. Wirel. Commun. Netw. 2019, 1–12 (2019)
Lange, J.K., DiSegna, S.T., Yang, W, et al.: Using cluster analysis to identify patient factors linked to differential functional gains after total knee Arthroplasty[J]. J. Arthroplasty 35, 121–126 (2020)
Zhang, Y., Liu, Y., Li, Y., et al.: Hierarchical and complex system entropy clustering analysis based validation for traditional Chinese medicine syndrome patterns of chronic atrophic gastritis[J]. J. Altern. Complement. Med. 25, 1130–1139 (2019)
Acknowledgements
This work was supported by National Natural Science Foundation of China (Grants No. 61976053 and No. 61772134), Fujian Province Natural Science Foundation (Grant No. 2018J01776), and Program for New Century Excellent Talents in Fujian Province University.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Yu, K., Guo, GD., Li, J. et al. Quantum Algorithms for Similarity Measurement Based on Euclidean Distance. Int J Theor Phys 59, 3134–3144 (2020). https://doi.org/10.1007/s10773-020-04567-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10773-020-04567-1