Quantum computing methods for supervised learning

Kulkarni, Viraj; Kulkarni, Milind; Pant, Aniruddha

doi:10.1007/s42484-021-00050-0

Quantum computing methods for supervised learning

Review Article
Published: 06 September 2021

Volume 3, article number 23, (2021)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Quantum Machine Intelligence Aims and scope Submit manuscript

Quantum computing methods for supervised learning

Download PDF

1128 Accesses
23 Citations
Explore all metrics

Abstract

The last two decades have seen an explosive growth in the theory and practice of both quantum computing and machine learning. Modern machine learning systems process huge volumes of data and demand massive computational power. As silicon semiconductor miniaturization approaches its physics limits, quantum computing is increasingly being considered to cater to these computational needs in the future. Small-scale quantum computers and quantum annealers have been built and are already being sold commercially. Quantum computers can benefit machine learning research and application across all science and engineering domains. However, owing to its roots in quantum mechanics, research in this field has so far been confined within the purview of the physics community, and most work is not easily accessible to researchers from other disciplines. In this paper, we provide a background and summarize key results of quantum computing before exploring its application to supervised machine learning problems. By eschewing results from physics that have little bearing on quantum computation, we hope to make this introduction accessible to data scientists, machine learning practitioners, and researchers from across disciplines.

Quantum Machine Learning: A Review and Current Status

Quantum Computing for Machine Learning

Quantum machine learning

Article 14 September 2017

Discover the latest articles, news and stories from top researchers in related subjects.

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Supervised learning is the most commonly applied form of machine learning. It works in two stages. During the training stage, the algorithm extracts patterns from the training dataset that contains pairs of samples and labels and converts these patterns into a mathematical representation called a model. During the inference stage, this model is used to make predictions about unseen samples. Machine learning algorithms, in general, are data hungry; their performance depends heavily on the size of the datasets used for training. However, this training process is computationally expensive, and working with large datasets requires huge amounts of computational horsepower. As the number and volume of datasets available for research and commercial purposes continues to grow exponentially, new technologies such as NVIDIA CUDA (Nickolls et al. 2008) and Google TPUs (Jouppi et al. 2017) have emerged to enable faster processing of data. Computational speeds have been increasing rapidly over the last several decades in accordance with the observation that the number of transistors in a dense integrated circuit (IC) doubles about every 2 years—this is informally known as Moore’s Law (Friedman 2015). The semiconductor manufacturing process has shrunk from 10 μm in 1970 to about 5 nm in 2020. It is believed, however, that we are nearing the limits of the processing power that classical computers can offer us (Leiserson et al. 2020). At scales smaller than this, quantum mechanical effects come into play (Sperling 2018), and they impose physical limitations on how small electronic components can get.

Quantum computing, on the other hand, proposes to leverage these quantum mechanical effects to carry out computation. In contrast to classical computers that operate on bits that can exist in only one out of two states at a time, quantum computers exploit the fact that quantum bits (qubits) can exist in any one of the infinite possible linear superpositions of these two states. This allows quantum computers to execute multiple paths of computation simultaneously. Quantum computers can efficiently solve computational problems which are believed to be intractable for classical computers (Nielsen and Chuang 2002). Owing to its roots in theoretical physics, most research articles on the topic are written for physicists; this makes them difficult to access for researchers from other fields. The purpose of this paper is to give a broad overview of the synergies between quantum computing and machine learning. We briefly outline the history of quantum physics, describe the preliminaries of quantum computation, and then review the latest research on applying the principles of quantum computation to supervised machine learning. A large-scale general-purpose quantum computer does not yet exist, but restricted quantum machines capable of solving optimization problems are already being sold commercially (Castelvecchi 2017; Johnston 2013). By eschewing results from physics that have little bearing on quantum computation and by providing additional background that may benefit the unfamiliar reader, we hope to make this introduction accessible to data scientists, machine learning practitioners, and researchers from other fields.

Quantum mechanics arose through a number of discoveries in the early twentieth century. Einstein (1905) explained the photoelectric effect in 1905 by postulating that light and all electromagnetic radiation is made up of discrete particles that later came to be called photons. Previously in 1865, Maxwell (1865) had demonstrated that electric and magnetic fields travel through space as waves. de Broglie (1924) proposed in 1923 that particles can exhibit wave-like properties and waves can behave like particles. Building on his approach, Heisenberg (Edwards 1979) developed matrix mechanics, and Schrödinger (1926) developed wave mechanics, both of which were later found to be equivalent. These developments laid the foundation of quantum mechanics. The equations of quantum mechanics have since been extensively tested in innumerable experiments, but even after almost a century of debate, physicists strongly disagree over how those equations should be interpreted and mapped to reality (Schlosshauer et al. 2013).

Benioff (1980) in 1980 and Feynman (1999) in 1982 observed that simulating the evolution of certain quantum systems may be an intractable problem that cannot be solved efficiently by computers, and yet, these quantum systems solved the problem by merely evolving thus suggesting that the evolution of quantum systems could be used as a method of computation. In 1985, Deutsch (1985) designed a universal quantum computer as a quantum-counterpart to the Universal Turing Machine. Deutsch and Jozsa (1992) proposed the Deutsch-Jozsa problem for which the deterministic quantum algorithm is exponentially faster than any deterministic classical solution. Shor’s algorithm (1999) for factoring large integers was the first quantum algorithm that could solve a problem of practical significance faster than any classical algorithm. Grover’s algorithm (1996) showed quadratic improvement in unordered search problems. These results laid the foundation of quantum computation. Since then, quantum algorithms have been proposed for numerous areas including cryptography, search and optimisation, simulation of quantum systems, solving large systems of linear equations, and machine learning (Montanaro 2016).

The volume of training data available to us as well as the sizes of machine learning models in terms of trainable parameters are both growing at a rapid pace. On the other hand, the computational power that classical computing can afford us is fast approaching its limits. In this scenario, quantum computing is increasingly being looked upon as a front-running candidate to cater to the computational demands of machine learning and artificial intelligence in the future. Machine learning algorithms often involve repeated execution of linear algebra routines on large matrices. Quantum solutions can offer exponential speed-up for these routines (Harrow et al. 2009; Wiebe et al. 2012). The benefits quantum computing can bring to machine learning, however, go beyond speed-up in execution. Many tasks in machine learning such as maximum likelihood estimation using hidden variables, principal component analysis, and training of neural networks. require optimization of a non-convex objective function. Optimizing non-convex functions is an NP-hard problem. Classical optimization methods such as gradient descent can get stuck at local minimum or saddle points and may never find the global minimum. Adiabatic quantum computers use quantum annealing to solve non-convex optimization problems by finding low-energy configurations of an appropriate energy function by exploiting tunneling effects to escape local minima (Santoro and Tosatti 2006). Methods based on Grover’s search can find the global minimum in a discrete unordered search space. Under certain conditions, there is a provable separation between quantum learnability and classical learnability, and there are classes of functions that can be learned in polynomial time using quantum methods but not using classical methods (Ciliberto et al. 2018; Servedio and Gortler 2004). In other words, quantum computers may not only find solutions faster than classical computers, but they may even find solutions that are better than those found classically.

The term quantum machine learning is generally used to denote analysis of classical data on quantum computers. This is known as quantum-enhanced machine learning. There are however other ways in which the fields of quantum computing and machine learning overlap. Classical machine learning can be applied to data emanating from quantum systems to solve problems in physics. Another stream of research deals with generalizing classical machine learning to work with quantum data where the input and output are quantum states. Recently, Tang (2019) developed a classical algorithm for recommendation systems that was inspired by quantum computing creating a new category referred to as quantum-inspired algorithms. These are classical algorithms that can be run on conventional computers which borrow ideas from quantum computing to achieve significant theoretical speed-ups over the best prevailing classical algorithms. This paper limits itself to quantum-enhanced machine learning and presents a selection of quantum approaches for implementing supervised machine learning algorithms on quantum computers. Far from being a comprehensive review of the field, it aims to offer the reader a background on the multitude of approaches proposed over the years with enough detail to set the stage for a more detailed exploration. For additional information, we recommend the following excellent surveys and reviews: (Schuld et al. 2015b; Wittek 2014; Biamonte et al. 2017; Adcock et al. 2015; Arunachalam and de Wolf 2017; Kopczyk 2018; Dunjko and Briegel 2018; Dunjko and Wittek 2020; Montanaro 2016).

2 Background of quantum computation

Quantum mechanics is based on four fundamental postulates (Dunjko and Briegel 2018; Nielsen and Chuang 2002): (1) the pure state of a quantum system is given by a unit vector |ψ〉 in a complex Hilbert space; (2) the evolution of a pure state in a closed system is governed by a Hamiltonian H as specified by Schrödinger’s equation $H |{\psi }\rangle = i \hbar \frac {\partial }{\partial t} |{\psi }\rangle $; (3) the quantum state of a composite system is given by the tensor product of the individual systems; (4) projective measurements (observables) are specified by Hermitian operators, and the process of measurement changes the observed system from |ψ〉 to an eigenstate |ϕ〉 with probability given by the Born rule p(ϕ) = |〈ψ|ϕ〉|². In this section, we briefly set up the background of quantum computation based on the above postulates.

2.1 Single qubit

A classical bit can exist in one of two states denoted as 0 and 1. A quantum bit or qubit can exist not only in these two discrete states but in all possible linear superpositions of them. Mathematically, the quantum state of a qubit is represented as a state vector in a two-dimensional Hilbert space. In the Dirac notation, the state vector of a qubit ψ is called a ket and is written as:

$$ \begin{array}{@{}rcl@{}} |{\psi}\rangle = \alpha |0\rangle + \beta |1\rangle \end{array} $$

(1)

where α and β are complex numbers and |α|² + |β|² = 1. The Born’s rule tells us that if this qubit is measured, we will get |0〉 with probability |α|² and |1〉 with probability |β|². Quantum measurements are non-deterministic, and the act of measurement changes the quantum state irreversibly. Before measurement, the qubit exists in a quantum superposition of the states |0〉 and |1〉. The outcome of the measurement, however, is not quantum but classical, i.e. you get either a |0〉 or a |1〉 but not a superposition of the two. During the measurement, the quantum state collapses^{Footnote 1} to the classical state it gets observed in, and all subsequent measurements deterministically result in this same outcome with a probability equal to 1.

The choice of basis vectors |0〉 and |1〉 is arbitrary. We can represent the system using a different set of orthogonal basis vectors such as |+〉 and |−〉 (called the Hadamard or sign basis). Once the computational basis is decided, kets can be represented as column vectors:

$$ \begin{array}{@{}rcl@{}} |0\rangle = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, |1\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \end{array} $$

(2)

$$ \begin{array}{@{}rcl@{}} |{\psi}\rangle = \alpha \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \beta \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \end{array} $$

(3)

The two representations for |ψ〉 given in Eqs. 1 and 3 are equivalent.

2.2 Multiple qubits

The quantum state of a system consisting of more than one unentangled qubits can be represented as the tensor product of the quantum states of the individual qubits. The state of a two-qubit system comprising of qubits represented by |ψ₁〉 and |ψ₂〉 can be written as |Ψ〉 = |ψ₁〉⊗|ψ₂〉. In general, the state of n qubits $\{|{\psi _{1}}\rangle , |{\psi _{2}}\rangle , \dots , |{\psi _{n}}\rangle \}$ can be represented as:

$$ \begin{array}{@{}rcl@{}} |{{\varPsi}}\rangle=|{\psi_{1}}\rangle \otimes |{\psi_{2}}\rangle \otimes {\dots} \otimes |\psi_{k}\rangle=|{\psi_{1}\psi_{2}\dots\psi_{k}}\rangle . \end{array} $$

(4)

However, not all multi-qubit states can be represented as a tensor product of individual states. Consider the state below, one of the Bell states:

$$ \begin{array}{@{}rcl@{}} |{{\varPhi}^{+}}\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle) \end{array} $$

(5)

Suppose it could be decomposed into the tensor product of two states as below:

$$ \begin{array}{@{}rcl@{}} |{{\varPhi}^{+}}\rangle &=& (a_{1}|0\rangle + b_{1}|1\rangle) \otimes (a_{2}|0\rangle + b_{2}|1\rangle) \\ &=& a_{1}a_{2}|00\rangle + a_{1}b_{2}|01\rangle + a_{2}b_{1}|10\rangle + b_{1}b_{2}|11\rangle \end{array} $$

(6)

From Eqs. 5 and 6, we know that a₁b₂ = a₂b₁ = 0. Therefore, either a₁a₂ = 0 or b₁b₂ = 0. But, from Eq. 5, both a₁a₂≠ 0 and b₁b₂≠ 0. This proves that the Bell state |Φ⁺〉 cannot be decomposed into the tensor product of two single-qubit states. In such cases, we say that the two qubits are entangled. Given an entangled pair of qubits, measurement on one qubit instantaneously affects the other qubit. Entanglement plays a central role in many quantum algorithms especially in the field of quantum cryptography. There is no counterpart to quantum entanglement in classical physics.

2.3 Quantum gates

Classical computers manipulate information stored in bits using logic gates such as AND, OR, NOT, NAND, and XOR. Likewise, quantum computers manipulate qubits using quantum gates. Transformations on quantum states are represented as rotation of the Hilbert space. Rotation is linear and reversible. Consequently, all transformations on quantum states must be linear and reversible. Quantum gates essentially transform the system from one state to another state. These transformations can be represented as matrices.

The simplest quantum gate is the NOT gate. The NOT gate transforms |ψ₁〉 = α|0〉 + β|1〉 to |ψ₂〉 = α|1〉 + β|0〉 and can be represented as:

$$ \begin{array}{@{}rcl@{}} \textit{NOT}=\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \ \end{array} $$

(7)

The Hadamard gate acts on a single qubit. It is often used to map a qubit from one of its basis states into an equal superposition of all basis states. It transforms |0〉 to $\frac {1}{\sqrt {2}}(|0\rangle +|1\rangle )$ and |1〉 to $\frac {1}{\sqrt {2}}(|0\rangle -|1\rangle )$ and is given by:

$$ \begin{array}{@{}rcl@{}} H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \end{array} $$

(8)

In general, an n-qubit Hadamard gate is used to initialize an n-qubit system into an equal superposition of all basis states:

(9)

where x ∈{0,1}ⁿ denotes all strings of length n consisting of 0 and 1.

The CNOT (controlled-NOT) gate acts on two qubits where the first qubit acts as a control signal that decides whether the NOT operation should be performed on the second qubit. If the control qubit is |1〉, the NOT operation is applied; if it is |0〉, it is not applied. The CNOT gate leaves the states |00〉 and |01〉 unchanged, while it maps |10〉 to |11〉 and |11〉 to |10〉. It is represented as:

$$ \begin{array}{@{}rcl@{}} CNOT = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \end{array} $$

(10)

The SWAP gate swaps the states of two qubits transforming |ψ,ϕ〉 to |ϕ,ψ〉. The CSWAP (controlled-SWAP) gate acts on three qubits and swaps the state of the second and third qubit if the first qubit is |1〉. The Toffoli gate (CCNOT) acts on three qubits and performs the computation:

$$ \begin{array}{@{}rcl@{}} |a, b, c\rangle \rightarrow |a, b, c \oplus ab\rangle \end{array} $$

(11)

2.4 Quantum parallelism

While classical computers can execute only one computational path at a time, quantum computers can leverage the ability of quantum states to exist in superpositions to simultaneously execute multiple computational paths. For example, consider the classical function $f(x): \{0,1\}^{2} \rightarrow \{0,1\}$. The function takes two bits as input and outputs a single bit. To evaluate f on all two-bit permutations using classical computation, we need to call f four times: f(0,0), f(0,1), f(1,0), and f(1,1). Quantum superposition allows us to evaluate all four inputs in a single call to f.

Since quantum transformations must be reversible and f is not reversible, we define a reversible quantum function:

$$ \begin{array}{@{}rcl@{}} U_{f}|x\rangle|y\rangle\rightarrow|x\rangle|y \oplus f(x)\rangle \end{array} $$

(12)

The input |ϕ〉 is set up in a superposition of states by initializing two qubits to |0〉 and applying the Hadamard transform:

$$ \begin{array}{@{}rcl@{}} |\phi\rangle = (H \otimes H)|00\rangle &=& \frac{|0\rangle+|1\rangle}{\sqrt{2}} \otimes \frac{|0\rangle+|1\rangle}{\sqrt{2}} \\&=& \frac{1}{2}(|00\rangle+|01\rangle+|10\rangle+|11\rangle) \end{array} $$

(13)

Setting |y〉 = |0〉, we apply U_f as follows:

$$ \begin{array}{@{}rcl@{}} U_{f}|\phi\rangle|0\rangle&=&\frac{1}{2}U_{f}(|00\rangle+|01\rangle+|10\rangle+|11\rangle)\otimes|0\rangle \\ &=& \frac{1}{2}U_{f}(|00,0\rangle+|01,0\rangle+|10,0\rangle+|11,0\rangle) \\ &=& \frac{1}{2}(|00,f(00)\rangle+|01,f(01)\rangle\\ &&+|10,f(10)\rangle+|11,f(11)\rangle) \end{array} $$

(14)

Thus, with a single application of f, we simultaneously evaluate four inputs. Using the Hadamard transform, the set-up in the input in an equal superposition of all basis vectors is a useful starting point for many quantum algorithms.

2.5 No cloning theorem

An important result that has profound implications is the no cloning theorem (Wootters and Zurek 1982) which states that it is not possible to create a copy of an unknown quantum state. Since measurement irreversibly changes the quantum state, given a single copy of the state |ψ〉 = α|0〉 + β|1〉, the values of the amplitudes α and β cannot be exactly determined. Although quantum parallelism can be leveraged to simultaneously execute multiple computational paths, the no cloning theorem places restrictions on the amount of information one can extract from the final quantum state^{Footnote 2}.

2.6 Adiabatic quantum computation

Numerous models have been proposed for quantum computing such as the quantum Turing machine, quantum circuit model, adiabatic quantum computing, measurement-based quantum computing, blind quantum computing, and topological quantum computing (Dunjko and Briegel 2018). All these models are computationally equivalent, but they are implemented very differently. An approach that has shown promise in solving optimization problems is adiabatic^{Footnote 3} quantum computing (Farhi et al. 2000) and is of particular interest because building restricted quantum computers to perform quantum annealing (Section 3.2) based on adiabatic quantum computing is simpler than building universal quantum computers. In adiabatic quantum computing, the optimization problem to be solved is encoded as a boolean satisfiability problem such that the ground state of its Hamiltonian^{Footnote 4} represents the desired solution. The quantum system is initially set up with a simple Hamiltonian that is easy to construct. The system is then evolved from the initial state to a final state. The adiabatic theorem states that if the system is evolved slowly enough, it will remain in the ground state of the instantaneous Hamiltonian throughout the evolution. The final system configuration then represents the solution to the optimization problem.

3 Quantum machine learning

Quantum computing methods for machine learning can be divided into two broad classes: (1) methods designed to run on a universal quantum computer that involve preparation, storage, and processing of quantum states and the retrieval of the classical solution from these states; (2) methods designed to run on quantum annealers that solve optimization problems through the physical evolution of quantum systems according to the principles of adiabatic quantum computing. In Section 3.1, we describe common subroutines designed for circuit quantum computers that can be applied to machine learning problems. In Section 3.2, we present quantum annealing that can solve quadratic unconstrained binary optimization (QUBO) problems. Finally, in Section 3.3, we present quantum versions of selected classical machine learning algorithms.

3.1 Important subroutines of quantum algorithms

A straightforward approach to achieving speed-ups over classical machine learning algorithms is to identify their computationally expensive and frequently executed subroutines and develop quantum alternatives for them. In this section, we describe some common subroutines that form a part of many quantum learning algorithms.

3.1.1 Quantum encoding

Qubits are a scarce and expensive resource. The restricted physical implementations of quantum computers available today have very few qubits^{Footnote 5}. Furthermore, the procedure to encode classical data into quantum states is far from trivial and, in many cases, contributes significantly to the overall complexity of the quantum algorithm. An important question therefore that has implications for performance and feasibility of quantum algorithms is how to represent classical data in quantum states.

Basis encoding

Suppose we have a dataset D having M instances: D = {x⁽¹⁾,...,x^(m),...,x^(M)}. Suppose for each instance x^(m) ∈ D, x^(m) consists of N binary features so that $x^{(m)}=\{b^{(m)}_{1},...,b^{(m)}_{n},...,b^{(m)}_{N}\}$. In basis encoding, each instance x^(m) is encoded as the basis state |x^(m)〉. The dataset D can thus be represented as a superposition of all computational basis states:

$$ \begin{array}{@{}rcl@{}} |{\psi}\rangle = \frac{1}{\sqrt{M}}\sum\limits_{i=1}^{M}|{x^{(i)}}\rangle \end{array} $$

(15)

For example, if D = {010,011}, it can be represented using basis encoding as:

$$ \begin{array}{@{}rcl@{}} |D\rangle = \frac{1}{\sqrt{2}}|010\rangle + \frac{1}{\sqrt{2}}|011\rangle \end{array} $$

(16)

In general, preparing a quantum state representing a dataset D of M instances consisting of N binary features each has a runtime of $\mathcal {O}(MN)$ and requires N qubits as described by Ventura and Martinez (2000). Since M ≪ 2^N in most cases, basis encoding tends to be sparse, i.e. most basis states have zero amplitude.

Amplitude encoding

In amplitude encoding, data is encoded in the amplitudes of quantum states. A single data instance x with N normalized real-valued features can be encoded as:

$$ \begin{array}{@{}rcl@{}} |{\psi_{x}}\rangle = \sum\limits_{i=1}^{N}x_{i}|i\rangle \end{array} $$

(17)

For example, x = {2,3} can be encoded in the amplitudes of the two basis states |0〉 and |1〉 as:

$$ \begin{array}{@{}rcl@{}} |x\rangle = \frac{1}{\sqrt{2^{2}+3^{2}}}(2|0\rangle + 3|1\rangle) \end{array} $$

(18)

Consider a dataset D of M such instances D = {x⁽¹⁾,...,x^(m),...,x^(M)}. All individual instances x^(m) can be concatenated into a single MN-dimensional vector $D^{\prime } = \{x^{(1)}_{1},...,x^{(1)}_{N},...,x^{(M)}_{1},...,x^{(M)}_{N}\}$ and then further encoded into the amplitudes of a quantum state as described in Eq. 17. This encoding requires log(MN) qubits and assumes an efficient procedure to prepare a quantum state with arbitrary amplitudes. Although some quantum states can be prepared efficiently, there exist states in the Hilbert space that cannot be prepared efficiently (Kliesch et al. 2011). Schuld and Petruccione (2018b) present a list of methods to prepare amplitude encoded states.

Besides basis and amplitude encoding, many other methods of encoding exist such as Qsample encoding, dynamic encoding, squeezing embedding, displacement embedding, and Hamiltonian encoding (Schuld and Killoran 2019; Lloyd et al. 2020).

3.1.2 Quantum random access memory

Classical computers store the input data required for the computation in random access memory (RAM). A RAM consists of an address register, an output register, and memory cells. It takes an n bit address as input in the address register and returns the data stored in one of N = 2ⁿ memory cells in the output register. This operation incurs a runtime of $\mathcal {O}(N)$ in the typical implementations used today. The quantum analog of RAM, referred to as Quantum Random Access Memory (QRAM), performs the same operation using quantum registers. It takes as input a superposition of addresses ${\sum }_{a}c_{a}|a\rangle $ and returns a superposition of the data at those addresses ${\sum }_{a}c_{a}|a\rangle |{D_{a}}\rangle $ in $\mathcal {O}(\log N)$ time:

(19)

Many quantum algorithms proceed by assuming the existence of a QRAM that can encode classical data in a superposition and later retrieve it efficiently. Giovannetti et al. (2008) show that conventional RAM architectures are not suitable for such a purpose and propose a bucket-brigade architecture for the QRAM where the N data points are stored in the leaves of a tree-like structure. Such a QRAM can encode Nd −dimensional vectors in $\mathcal {O}(\log Nd)$ using $\log (Nd)$ qubits.

Implementing such a QRAM system in hardware is not trivial, and it has so far not been experimentally realized. It also remains to be seen whether the theoretical speedup that QRAM offers to quantum algorithms will be nullified by practical physical considerations. Although the runtime of the QRAM scales logarithmically, the physical resources it requires scale as $\mathcal {O}(Nd)$. In the presence of noise, active error correction will be necessary to compensate for the errors it causes, and this may nullify the speedup QRAM offers (Arunachalam et al. 2015; Adcock et al. 2015). Loading data into the QRAM is possible in logarithmic time only when the data exhibits a certain level of uniformity. In the absence of this uniformity, the QRAM loses its efficiency and needs $\mathcal {O}(\sqrt {N})$ time. In addition to these caveats, the above analyses do not consider the speed of communication and memory latency. These factors become prominent only when the amount of data is very large. However, since quantum computing’s appeal for machine learning is based on the premise that it will enable us to train models on data volumes larger than what classical computers can support, the limitations imposed by latency need to be studied further. Hardware considerations will ultimately play an important role in determining the feasibility of the QRAM.

Due to its general nature, QRAM is used by many quantum algorithms as a black-box oracle, and it plays a central role in achieving speedup over classical alternatives. However, it is worth noting that not all quantum algorithms depend on the QRAM, and there exist alternative methods to encode classical data into quantum states and retrieve it efficiently (Grover and Rudolph 2002; Aaronson 2015).

3.1.3 Grover’s algorithm and amplitude amplification

Grover’s algorithm (1996) is a quantum search algorithm that offers a quadratic speed-up over classical algorithms when performing a search over an unstructured search space. Suppose we are given a set of N elements X = {x₁,...,x_i,...,x_N} where $x_{i} \in \{0,1\}^{m}$ and a boolean function $f:\{0,1\}^{m} \rightarrow \{0,1\}$ such that:

$$ \begin{array}{@{}rcl@{}} f(x)= \begin{cases} 1 & x = x^{*}\\ 0 & x \neq x^{*} \end{cases} \end{array} $$

(20)

Any classical algorithm that performs a search for x^∗ in X is $\mathcal {O}(N)$ in time. Grover’s algorithm can perform such a search in $\mathcal {O}(\sqrt {N})$. The algorithm has three steps to it.

In the first step, a quantum state is set up in an equal superposition of basis states using the Hadamard transform. As an example, consider N = 8. We set up the state using 3 qubits as:

$$ \begin{array}{@{}rcl@{}} |{\psi_{1}}\rangle&=&(H \otimes H \otimes H)|000\rangle \\ &=&\frac{1}{2\sqrt{2}} (|000\rangle+|001\rangle+|010\rangle+|011\rangle\\&&+|100\rangle+|101\rangle+|110\rangle+|111\rangle) \\ &=&\begin{bmatrix} \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}}\end{bmatrix}^{T} \end{array} $$

(21)

The second step referred to as phase inversion deals with flipping the amplitude of each |x〉 if f(x) = 1 and leaving it unchanged if f(x) = 0. To do this, we define a unitary quantum oracle O|x〉 = (− 1)^f(x)|x〉. Suppose, in our example, x^∗ is present at the fourth position. Applying gate O on |ψ₁〉 gives us:

$$ \begin{array}{@{}rcl@{}} |{\psi_{2}}\rangle=\begin{bmatrix} \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{-1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}}\end{bmatrix}^{T} \end{array} $$

(22)

The third step referred to as inversion around the mean involves flipping all amplitudes around their collective mean $\mu =\frac {1}{N}{\sum }_{x}{\alpha _{x}}$. This is performed by the Grover diffusion operator G:

(23)

Applying G to |ψ₂〉 gives us:

$$ \begin{array}{@{}rcl@{}} |{\psi_{3}}\rangle=\begin{bmatrix} \frac{1}{4\sqrt{2}} & \frac{1}{4\sqrt{2}} & \frac{1}{4\sqrt{2}} & \frac{5}{4\sqrt{2}} & \frac{1}{4\sqrt{2}} & \frac{1}{4\sqrt{2}} & \frac{1}{4\sqrt{2}} & \frac{1}{4\sqrt{2}}\end{bmatrix}^{T} \end{array} $$

(24)

These two operations of phase inversion and inversion around the mean constitute one round of the algorithm. Each successive round increases the amplitude of the target element x^∗ and reduces that of all other elements. If we were to measure the system at the end of the first round, we would get the target element as outcome with a probability of $(\frac {5}{4\sqrt {2}})^{2}=78\%$.

After application of the second round (i.e. performing phase inversion and inversion around the mean once again), we get the below state which will find the target element with a probability of 95%:

$$ \begin{array}{@{}rcl@{}} |{\psi_{4}}\rangle=\begin{bmatrix} \frac{-1}{8\sqrt{2}} & \frac{-1}{8\sqrt{2}} & \frac{-1}{8\sqrt{2}} & \frac{11}{8\sqrt{2}} & \frac{-1}{8\sqrt{2}} & \frac{-1}{8\sqrt{2}} & \frac{-1}{8\sqrt{2}} & \frac{-1}{8\sqrt{2}}\end{bmatrix}^{T} \end{array} $$

(25)

For a list of N inputs, $\sqrt {N}$ such rounds are usually performed. The same algorithm can also find k matching entries instead of a single entry. Several modifications have been proposed that extend this work. Durr and Hoyer (1996) propose a quantum algorithm to find the index of the minimum value from a list of N values in time $\mathcal {O}(c\sqrt {N})$ with a probability of at least $(1-\frac {1}{2^{c}})$. These methods generalize Grover’s search and are collectively referred to as amplitude amplification techniques (Brassard and Hoyer 1997).

3.1.4 Calculating inner products using swap test

The swap test (Buhrman et al. 2001) is a simple subroutine used to compute the overlap between two quantum states |ϕ〉 and |ψ〉. Quantum procedures can be easily described using circuit diagrams. The circuit diagram of the swap test is shown in Fig. 1.

The system is initially prepared in the state |0,ϕ,ψ〉. The Hadamard gate applied on the first ancilla qubit^{Footnote 6} transforms the state to $\frac {1}{\sqrt {2}}(|0,\phi ,\psi \rangle +|1,\phi ,\psi \rangle )$. The CSWAP further transforms it to $\frac {1}{\sqrt {2}}(|0,\phi ,\psi \rangle +|1,\psi ,\phi \rangle )$. After the application of the second Hadamard gate to the first qubit, the state can be written as $\frac {1}{2}|0\rangle (|{\phi ,\psi }\rangle +|{\psi ,\phi }\rangle )+\frac {1}{2}|1\rangle (|{\phi ,\psi }\rangle -|{\psi ,\phi }\rangle )$. The probability of measuring the first qubit as |0〉 is given by $p=\frac {1}{2}+\frac {1}{2}{|\langle {\psi |\phi }\rangle |}^{2}$. If |ϕ〉 and |ψ〉 are equal, |〈ψ|ϕ〉|² = 1, and the observed value of p is 1. If |ϕ〉 and |ψ〉 are orthogonal, |〈ψ|ϕ〉|² = 0, and the observed value of p is $\frac {1}{2}$. The degree of overlap given by the inner product of the two states can be estimated with this method to precision 𝜖 using $\mathcal {O}(1/\epsilon )$ copies (Dunjko and Briegel 2018).

3.1.5 Solving systems of linear equations (HHL)

Solving a system of linear equations is an important problem ubiquitous throughout science and engineering. A seminal result in quantum computation is the HHL algorithm (Harrow et al. 2009) that solves the following problem: given a Hermitian matrix $A \in \mathbb {R}^{N \times N}$ and a unit vector $\overrightarrow {b} \in \mathbb {R}^{N}$, find a solution vector $\overrightarrow {x} \in \mathbb {R}^{N}$ that satisfies the equation $A\overrightarrow {x}=\overrightarrow {b}$.

We present here a condensed outline of the algorithm. The solution we are interested in is |x〉 = A^− 1|b〉. Let {v₁,...,v_N} be the eigenvectors of A with corresponding eigenvalues {λ₁,...,λ_N}. The vector $\overrightarrow {b}$ is encoded using amplitude encoding (Section 3.1.1) as $|b\rangle ={\sum }_{i=1}^{N}{\beta _{i}|{v_{i}}\rangle }$. Hamiltonian simulation is used to transform the matrix A into a unitary operator, and quantum phase estimation^{Footnote 7} is used to carry out eigendecomposition to get the state ${\sum }_{i=1}^{N}{\beta _{i}|{v_{i}}\rangle |{\lambda _{i}}\rangle }$. An ancilla qubit is added, rotation conditioned on |λ_i〉 is carried out, and the eigenvalue register |λ_i〉 is uncomputed to yield a state proportional to ${\sum }_{i=1}^{N}{\beta _{i} \lambda _{i}^{-1}|{v_{i}}\rangle } = A^{-1}|b\rangle = |x\rangle $.

It is important to note that while a classical algorithm finds all coefficients x_i for x, the HHL algorithm finds the quantum state $|x\rangle ={\sum }_{i=1}^{N}{x_{i}|i\rangle }$. Obtaining values for all x_i takes $\mathcal {O}(N)$ repetitions; this observation nullifies the speed-up the quantum algorithm has over the classical counterparts. Hence, the HHL algorithm is most useful when used as a subroutine carrying out an intermediate step in a larger process where the quantum state |x〉 is consumed by the next subroutine in the process.

3.2 Quantum annealing

Quantum annealing is a metaheuristic optimization algorithm that leverages quantum effects to solve quadratic unconstrained binary optimization (QUBO) problems that deal with optimizing functions of the form:

$$ \begin{array}{@{}rcl@{}} C(x_{1},x_{2},...,x_{n}) = \sum\limits_{i}a_{i}x_{i} + \sum\limits_{i,j}b_{i,j}x_{i}x_{j} \end{array} $$

(26)

where $a_{i} \in \mathbb {R}$, $b_{i,j} \in \mathbb {R}$, x_i ∈{0,1}. A wide range of problems can be mapped to QUBO and then solved by quantum annealers which are special-purpose quantum computers specifically built to perform quantum annealing.

The Ising model is used in physics to represent a large variety of systems. It was originally proposed to model magnetic materials where every molecule has a spin that can align or anti-align with an applied magnetic field (Bian et al. 2010). The Hamiltonian of the system representing its energy is given by:

$$ \begin{array}{@{}rcl@{}} H = \sum\limits_{i}h_{i}s_{i} + \sum\limits_{i,j}J_{i,j}s_{i}s_{j} \end{array} $$

(27)

where s_i ∈{− 1,+ 1} is the spin of the i th molecule, h_i is the strength of the magnetic field at the i th molecule, and J_i,j is the strength of the interaction between neighboring spins i and j. From Eqs. 26 and 27, it can be seen that the QUBO problem convenient maps onto the Ising Hamiltonian with the mapping s_i = 2x_i − 1.

Quantum annealing works as follows. An initial Hamiltonian H₀ that is easy to construct is chosen. The system is evolved under a time-dependent Hamiltonian given by:

$$ \begin{array}{@{}rcl@{}} H(t) = (1-t)H_{0} + tH_{f} \end{array} $$

(28)

where t is gradually changed from 0 to 1 and the final Hamiltonian H_f is the same as in Eq. 27. At t = 0, the system starts in the ground state of H₀. According to the quantum adiabatic theorem, the system remains in the ground state of the instantaneous Hamiltonian H(t) throughout its evolution provided it is changed sufficiently slowly (Ambainis and Regev 2004). At t = 1, the final Hamiltonian of the system will encode the solution to the problem.

Quantum annealing should not be conflated with the more general adiabatic quantum computing. Quantum annealing specifically solves optimization problems; adiabatic quantum computing is a model of quantum computing that is equivalent to a universal quantum computer. The Hamiltonians used in quantum annealing are classical Hamiltonians, while adiabatic quantum computing uses quantum Hamiltonians that have no classical counterparts (Biswas et al. 2017). For a more comprehensive treatment of adiabatic quantum computing, we refer the reader to (Albash and Lidar 2018).

3.3 Quantum algorithms for machine learning

In this section, we explore how the background and subroutines described in the previous sections can be applied to solve machine learning problems. The common supervised machine learning setting is as follows. The training set consists of M instances {x₁,...,x_M} where $x_{i} \in \mathbb {R}^{N}$ with their corresponding labels {y₁,...,y_M}. Each label y_i can be either a real value (for regression problems) or a discrete class label (for classification problems). In the training phase, the algorithm extracts patterns from the dataset and learns a model. In the inference phase, this model is used to process unseen instances and predict their corresponding labels.

3.3.1 k-Nearest neighbors

The k-nearest neighbors (KNN) is one of the simplest supervised learning algorithms. To predict the label of a new unseen instance x_test, the algorithm looks at k instances in the training set that are closest to x_test and chooses the class that appears most often in the labels of these k-nearest neighbors as the predicted label (for regression, the algorithm assigns the mean value of the k-nearest neighbors as the label). An advantage of KNN is that, unlike many other supervised algorithms, it is non-parametric and makes no assumptions about the data distribution. However, since all computation is deferred, inference can become prohibitively expensive for large training sets. During inference, the distance of the test instance from all other training instances is calculated; this is the most computationally intensive step in the process. Hence, quantum versions of KNN focus on faster evaluation of the distance between two instances.

Aïmeur et al. (2006) propose using the overlap |〈a|b〉| as computed by the swap test (Section 3.1.3) as a measure of similarity between |a〉 and |b〉. Llyod et al. (Lloyd et al. 2013) develop a technique based on the swap test to compute the distance between $\overrightarrow {a}$ and $\overrightarrow {b}$. A state $|{\psi }\rangle =\frac {1}{\sqrt {2}}(|0,a\rangle +|1,b\rangle )$ is constructed by setting up an ancilla. A second state given by $|\phi \rangle =\frac {1}{\sqrt {Z}}(|\overrightarrow {a}||0\rangle -|\overrightarrow {b}||1\rangle )$ is constructed where $Z=|\overrightarrow {a}|^{2}+|\overrightarrow {b}|^{2}$. Using $\langle x|x \rangle =|\overrightarrow {x}|^{-1}|\overrightarrow {x}|$, the authors make the observation that $|\overrightarrow {a}-\overrightarrow {b}|^{2}=Z|\langle \phi |\psi \rangle |^{2}$. With this, the distance between $\overrightarrow {a}$ and $\overrightarrow {b}$ can be retrieved using a swap test (Schuld et al. 2015b). The authors use the above technique to implement the nearest-centroid classification algorithm in which the centroids of all training instances belonging to each class are precomputed; during inference, a class label is assigned to the test instance by calculating its distance from all centroids. Given a training set of M instances, this procedure solves the problem of classifying an N-dimensional test vector into one of several classes in $\mathcal {O}(log(MN))$ compared to $\mathcal {O}(MN)$ required by classical algorithms.

Wiebe et al. (2014a, 2014b) argue that the nearest-centroid classification presented above can perform poorly in practice since training instances are often embedded in complicated manifolds, and the centroids may lie outside these manifolds. They propose two fast methods for computing the distance between vectors based on an alternative representation of classical information in quantum states, amplitude amplification, and Durr-Hoyer minimum finding (Durr and Hoyer 1996).

3.3.2 Support vector machines

Support vector machines (SVM) (Cortes and Vapnik 1995) is a popular classification algorithm that determines the optimal hyperplane that separates instances of two classes in the training data and classifies test instances based on which side of the separating hyperplane they lie on. Given a set of training instances with their labels {(x₁,y₁),...,(x_i,y_i),...,(x_M,y_M)} where $x_{i} \in \mathbb {R}^{N}$ and y_i ∈{− 1,+ 1}, the algorithm learns an N-dimensional hyperplane given by w = [β₁,β₂,...,β_N]^T that separates the instances of the two classes with maximum margin. Classification of a test instance x_t is performed as:

$$ \begin{array}{@{}rcl@{}} y_{t} = sign(w^{T}x_{t} + b) \end{array} $$

(29)

Since the instances may not be strictly separable, slack variables are introduced that provide a soft margin by allowing some data points to violate the margin criterion. This formulation is known as the maximum margin classifier or support vector classifier; this however still requires the data points to be linearly separable. SVMs overcome this limitation of linear separability by what is known as the kernel trick which transforms the feature space into a new, higher-dimensional feature space. Instances that were not linearly separable in the original feature space may be linearly separable in the new feature space. Mathematically, the kernel trick generalizes the dot product between two feature vectors 〈x_i,x_j〉 by a kernel function k(x_i,x_j). Different kernels can be chosen depending on the data distribution. Training an SVM involves solving the quadratic programming problem (Press et al. 2007):

$$ \begin{array}{@{}rcl@{}} minimize && E=\frac{1}{2}\sum\limits_{nm}\alpha_{n}\alpha_{m}t_{n}t_{m}k(x_{n},x_{m})-\sum\limits_{n}\alpha_{n}, \\ subject to && 0 \leq \alpha_{n} \leq C, \\ and{\kern25.5pt} && \sum\limits_{n}\alpha_{n}t_{n}=0 \end{array} $$

(30)

where $\alpha _{n} \in \mathbb {R}$, C is the regularization parameter, and k(⋅,⋅) is the kernel function.

Anguita et al. (2003) observe that training support vector machines may be a hard problem and propose a quantum variant that uses Durr and Hoyer’s minimum finding (Durr and Hoyer 1996) based on Grover’s algorithm to solve the optimization problem. Rebentrost et al. (2012) suggest computing inner products using a quantum method based on an approach similar to the one discussed in Section 3.3.1 which leads to an exponential speed-up with respect to the dimension of the feature vector N. They also describe a least-squares reformulation of the SVM algorithm with slack variables e_j that converts the quadratic optimization problem into a problem of solving a system of linear equations which leads to an additional exponential speed-up in terms of the number of training instances M:

$$ \begin{array}{@{}rcl@{}} y_{j}(\overrightarrow{w} \cdot \overrightarrow{x_{j}}+b) \geq 1 \rightarrow (\overrightarrow{w} \cdot \overrightarrow{x_{j}}+b)=y_{j}-y_{j}e_{j} \end{array} $$

(31)

As shown in Section 3.2, quantum annealing is particularly well-suited for solving optimization problems. Willsch et al. (2020) demonstrate a practical implementation of training SVMs on the commercially available D-Wave DW2000Q quantum annealer by formulating it as a QUBO problem. Mengoni and Di Pierro (2019) review approaches for designing quantum computing techniques for kernel methods in machine learning.

3.3.3 Neural networks

Artificial neural networks or simply neural networks were originally inspired from biological neural networks that model the activity of neurons in human brains. The basic building block of a neural network is the neuron, also called node or perceptron, that maps the input $x \in \mathbb {R}^{N}$ to the output $y \in \mathbb {R}$ as follows^{Footnote 8}:

$$ \begin{array}{@{}rcl@{}} y = g\left( \sum\limits_{i=1}^{N}{w_{i}x_{i}} + b\right) \end{array} $$

(32)

where $w_{i} \in \mathbb {R}$, $b \in \mathbb {R}$, and $g: \mathbb {R} \rightarrow \mathbb {R}$ is the activation function. The outputs of some neurons can be fed as inputs to other neurons thus creating layers within the network.

Even though neural networks were amongst the first machine learning algorithms to be proposed, their research stagnated for several decades from 1940s to 1980s due to the inherent difficulty and large computational power required to train them. They returned to popularity after the introduction of backpropagation (Rumelhart et al. 1986) which eased these problems by offering a faster method for training. In the last decade, with GPUs and cloud computing providing cheaper access to massive computational power, neural networks have dwarfed other learning algorithms^{Footnote 9} to become one of the biggest success stories of modern computers finding applications in various industries including healthcare, manufacturing, finance, and analytics to solve problems in image processing, computer vision, natural language processing, predictive modeling, and many other areas. However, even today, significant resources are required to train neural networks, and training times for research and industrial problems can run into weeks or even months.

An obvious difficulty arises in considering quantum computation as a means for implementing neural networks. Quantum computation (and indeed quantum mechanics itself) is a theory fundamentally based on linear transformations, while an important practical advantage neural networks enjoy over many other learning algorithms is that they can model non-linear data distributions. Bringing non-linearity into quantum algorithms is a non-trivial task (Cao et al. 2017). However, classical neural networks do make heavy use of linear algebra, and the inherent randomness of quantum mechanical effects can be leveraged to automatically introduce noise in the training process to improve model robustness—something that needs to be done purposefully in classical training (Allcock et al. 2018). Numerous neural network architectures have emerged in recent times to tackle problems belonging to a wide range of supervised, unsupervised, and reinforcement learning tasks (Goodfellow et al. 2016). Research in this field has been scattered with different proposals addressing narrow problems in piecemeal style. We present here a select subset of these proposals.

Most early work on quantizing^{Footnote 10} neural networks focussed on Hopfield networks (1982) which differ from the neural networks presently used in practice. In a Hopfield network, all neurons have undirected connections with all other neurons as opposed to feed-forward networks that are organized as layers; also, each neuron outputs a binary 0 or 1 instead of a real number. Hopfield networks are used to model associative memories which allow retrieval of data based on content rather than addresses. Kak (1995) introduces the idea of quantum neural computation and Peruš (2000) describes a quantum associative network by drawing analogies between Hopfield networks and quantum information processing. Behrman et al. (2000) present a quantum realization of neural networks by showing that a single quantum dot^{Footnote 11} molecule can act as a recurrent quantum neural network. More recently, Rebentrost et al. (2018) present a technique based on quantum Hebbian learning and fast quantum matrix inversion to train Hopfield networks.

Boltzmann machines (Ackley et al. 1985), closely related to Hopfield networks, are stochastic generative networks that can learn a probability distribution over a set of inputs. They are trained by adjusting the interconnection weights between the neurons so that the thermal statistics of the system as described by the Boltzmann-Gibbs distribution reproduces the statistics of the data (Biamonte et al. 2017). Boltzmann machines can be conveniently represented by an Ising model whose spins encode features and interactions encode statistical dependencies between the features. In a restricted Boltzmann machine, connections exist only between neurons belonging to different layers; this makes them easier to train than fully-connected Boltzmann machines. Restricted Boltzmann machines can be stacked together to form deep belief networks that can be used to learn internal representations or can be trained under supervision to perform classification. Training Boltzmann machines is exponentially hard and is performed using approximation techniques like contrastive divergence (Hinton 2002; Salakhutdinov and Hinton 2009) that rely on Gibbs sampling. Wiebe et al. (2014a, 2014b) propose quantum methods to efficiently train full Boltzmann machines by preparing a coherent analog of the Gibbs state from which samples can be drawn. Adachi et al. (2015) investigate an alternative approach of performing the sampling on a D-Wave quantum annealer instead of classical Gibbs sampling.

A different line of research involves developing quantum analogs for classical perceptrons. Schuld et al. (2015a) introduce a quantum perceptron model with a step activation function that can be used to develop superposition-based learning schemes in which a superposition of training vectors can be processed in parallel. Kapoor et al. (2016) develop two quantum techniques for modeling perceptrons; the first provides quadratic speed-up with respect to the number of training instances, and the second provides a quadratic reduction in the scaling of the training time with the margin between the two classes.

Feedforward networks are one of the simplest neural network architectures in which the connections between neurons do not form any loops or cycles. They are usually trained using backpropagation, and the optimization is performed using some variant of gradient descent. Most machine learning architectures used in practice today are based on feedforward networks or their derivatives such as convolutional neural networks or recurrent neural networks (Goodfellow et al. 2016). Allcock et al. (2018) define an efficient quantum subroutine for robust inner product estimation using QRAM (Giovannetti et al. 2008) and use it to demonstrate quadratic speed-ups in the size of the network over classical counterparts; they additionally claim that the proposed quantum method naturally imitates regularization techniques like drop-out leading to more robust networks. Farhi et al. (2000) present a general framework for binary classification specifically designed for near-term quantum processors in which the input strings are mapped to computational basis states, and the predicted label is given by the outcome of a Pauli operator measured on a readout qubit. This framework extends to a full quantum neural network that can classify both classical and quantum data. Convolutional neural networks (CNNs) (Lecun et al. 1998) have achieved great success (Krizhevsky et al. 2012) in image classification tasks in recent times. They, however, suffer from the fact that the operation of convolution is computationally expensive. Kerenidis et al. (2019) design a quantum CNN based on quantum computation of the convolution product between two tensors. They also propose a quantum tomography^{Footnote 12} sampling approach to recover classical information from the network output and a quantum backpropagation algorithm for efficient training of the quantum CNN.

3.3.4 Variational quantum classifiers

As discussed in the preceding sections, a trove of quantum techniques have been proposed to solve problems in machine learning. Most of them, however, require computational resources that are far beyond what quantum computers provide today. A new research direction has emerged in recent times that focuses on developing techniques that can be practically implemented on the noisy, intermediate-scale quantum (NISQ) computers that we expect to have in the next few years. These techniques need fewer than 100 qubits, are robust against errors, and are generally aimed at solving problems that have immediate useful applications.

A quantum circuit is composed of a series of unitary transformations that act on the given input state. Such a circuit can be trained to solve a classification problem by tuning its parameters based on how a cost function responds to the training data. Low-depth quantum circuits are suitable for error-correction (Li and Benjamin 2017; Temme et al. 2017), and this makes them a natural candidate for quantum machine learning on NISQ processors. A quantum classifier consists of a circuit U(𝜃) parameterized by the set of parameters 𝜃 that is initialized in a fixed initial state (usually the zero state). An observable $\hat {B}$ is defined to represent the output of the circuit. The cost function of such a classifier can be defined as $f(\theta )=\langle {0}|U^{\dag }(\theta )\hat {B}U(\theta )|0\rangle $. The training process involves finding optimum candidates for the parameters 𝜃 to minimize the cost function f(𝜃). This optimization is carried out using classical methods, most often some variation of gradient descent.

A general theory of variational algorithms and quantum circuit learning is presented in McClean et al. (2016) and Mitarai et al. (2018). Havlíček et al. (2019) propose using a variational quantum circuit as a classifier by representing the feature space as quantum states. Schuld et al. (2020) present a circuit-based supervised classification algorithm for amplitude-encoded data that can be trained using a hybrid quantum-classical scheme of gradient descent. Farhi and Neven (2018) introduce a quantum neural network based on variational circuits for near-term quantum computers that learn boolean functions by tuning the circuit parameters.

Ensemble methods have achieved state-of-art performance on many machine learning challenges. In an ensemble, multiple models are trained on the same task, and their predictions are weighed or combined together to arrive at the final decision. Schuld and Petruccione (2018a) introduce the quantum ensemble of quantum classifiers where the parameters of the classifier are held in superposition allowing a quantum computer to train an exponentially large number of classifiers in parallel. Abbas et al. (2020) show that this quantum ensemble framework contains a classically intractable computation that can be solved efficiently on a quantum computer by reducing it to the Deutsch Jozsa algorithm. Chen et al. (2021) extend circuit learning to quantum data and propose using a shallow parameterized quantum circuit trained using classical methods to classify quantum states.

4 Discussion

Quantum computation has made great strides in the last two decades in both theory and practice. A significant corpus of research has emerged in applying the principles of quantum computation to problems across many fields of science and engineering. At the same time, several approaches for physical realization of quantum computers based on superconducting quantum bits, trapped ions, optical lattices, photonic computing, nuclear magnetic resonance, etc. have shown promise. However, fundamental challenges remain unresolved on both fronts. While developing quantum algorithms, we must consider the input problem and the output problem (Biamonte et al. 2017). The input problem refers to the fact that the cost of reading classical data and encoding it in quantum states can sometimes dominate performance and render the further downstream speed-up irrelevant. The output problem refers to the reverse process of decoding the full classical solution from quantum states. Some important hardware challenges to constructing, operating, and maintaining large-scale quantum computers include achieving longer coherence, greater circuit depth, higher qubit quality, and higher control over qubits. Quantum error correction plays an important role, and it will likely span across hardware and software in the future.

It is important to distinguish between quantum-enhanced machine learning that focuses on techniques to implement learning on classical data using quantum computers from quantum-generalisation of machine learning algorithms that deals with developing fully quantum algorithms that work with quantum data. This is especially true for neural networks for which active research is underway on both fronts. We restrict this paper to quantum-enhanced techniques and refer the reader to Cao et al. (2017), Beer et al. (2020), Amin et al. (2018), Wan et al. (2017), and Cong et al. (2019) for work on quantum generalisations. The principles of quantum computing have inspired development of new classical randomized algorithms that show exponential speed-ups over conventional algorithms (Tang 2019). With these quantum-inspired algorithms, the gap between certain classical and quantum algorithms is no longer exponential but polynomial. Arrazola et al. (2019) provide a study of these algorithms and observe that they work well only under stringent conditions which occur rarely in practice. We do not cover these in this paper.

Owing to its roots in quantum physics, research in quantum computing has so far been confined within the purview of the physics community. Although realization of quantum computers in the form of hardware will remain a problem for physicists, we believe this need not be the case when it comes to applying quantum computing to solve machine learning problems. Classical computing and machine learning, like physics and many other fields, serve as prime examples of disciplines where theoretical results were obtained far before technological progress made possible their experimental realizations. Small-scale quantum computers with less than 100 qubits and quantum annealers with around 2000 qubits have been developed and are already being sold commercially (Castelvecchi 2017; Johnston 2013). We hope this article serves its purpose as introductory material for interested machine learning researchers and practitioners from various disciplines.

Notes

Different interpretations exist regarding the collapse of the quantum state (Schlosshauer et al. 2013). The popular Copenhagen Interpretation suggests that the wave function of a quantum system collapses on observation. The alternative Many Worlds Interpretation suggests that there is no collapse of the wave function; instead, the act of observation results in the observer getting entangled with the observed system.
Although perfect cloning is impossible, Bužek and Hillery (1996) proposed a universal cloning machine that can make imperfect copies of unknown quantum states with high fidelity.
In the context of quantum computing, an adiabatic process is a process which changes the state of a system so gradually that the state can adapt its configuration at each point.
The Hamiltonian operator represents the total energy of the system.
The recent quantum supremacy experiment conducted by Google used only 54 qubits (Arute et al. 2019).
Input qubits that do not hold any input data but are added to satisfy other conditions (most often reversibility of the transformation) are called auxiliary or ancillary qubits.
The quantum phase estimation algorithm can estimate the phase (or eigenvalue) of an eigenvector of a unitary operator.
This is the general form used in modern feedforward neural networks. The original perceptron used a step activation function and produced only binary outputs 0 and 1.
Many informal texts now relegate all other learning algorithms to the category conventional machine learning.
Developing a quantum alternative to a classical computation technique is often referred to as quantizing it although we prefer this term is used sparingly.
Quantum dots are nanometre-scale semiconductor particles.
Quantum tomography is the process by which a quantum state is reconstructed using measurements on an ensemble of identical quantum states.

References

Aaronson S (2015) Read the fine print. Nat Phys 11(4):291
Google Scholar
Abbas A, Schuld M, Petruccione F (2020) On quantum ensembles of quantum classifiers. Quantum Mach Intell 2(1):1
Google Scholar
Ackley DH, Hinton G, Sejnowski TJ (1985) A learning algorithm for Boltzmann machines. Cogn Sci 9(1):147
Google Scholar
Adachi SH, Henderson MP (2015) Application of quantum annealing to training of deep neural networks. arXiv:1510.06356
Adcock J, Allen E, Day M, Frick S, Hinchliff J, Johnson M, Morley-Short S, Pallister S, Price A, Stanisic S (2015) Advances in quantum machine learning. arXiv:1512.02900
Aïmeur E, Brassard G, Gambs S (2006) Machine learning in a quantum world. In: Conference of the Canadian society for computational studies of intelligence. Springer, pp 431–442
Albash T, Lidar DA (2018) Adiabatic quantum computation. Rev Modern Phys 90(1):015002
MathSciNet Google Scholar
Allcock J, Hsieh CY, Kerenidis I, Zhang S (2018) Quantum algorithms for feedforward neural networks. arXiv:1812.03089
Ambainis A, Regev O (2004) An elementary proof of the quantum adiabatic theorem. arXiv:quant-ph/0411152
Amin MH, Andriyash E, Rolfe J, Kulchytskyy B, Melko R (2018) Quantum boltzmann machine. Phys Rev X 8(2):021050
Google Scholar
Anguita D, Ridella S, Rivieccio F, Zunino R (2003) Quantum optimization for training support vector machines. Neural Netw 16(5-6):763
MATH Google Scholar
Arrazola JM, Delgado A, Bardhan BR, Lloyd S (2019) Quantum-inspired algorithms in practice. arXiv:1905.10415
Arunachalam S, Gheorghiu V, Jochym-O’Connor T, Mosca M, Srinivasan PV (2015) On the robustness of bucket brigade quantum RAM. New J Phys 17(12):123010
MATH Google Scholar
Arunachalam S, de Wolf R (2017) Guest column: a survey of quantum learning theory. ACM SIGACT News 48(2):41
MathSciNet Google Scholar
Arute F, Arya K, Babbush R, Bacon D, Bardin JC, Barends R, Biswas R, Boixo S, Brandao FG, Buell DA, et al. (2019) Quantum supremacy using a programmable superconducting processor. Nature 574(7779):505
Google Scholar
Beer K, Bondarenko D, Farrelly T, Osborne TJ, Salzmann R, Scheiermann D, Wolf R (2020) Training deep quantum neural networks. Nat Commun 11(1):1
Google Scholar
Behrman EC, Nash L, Steck JE, Chandrashekar V, Skinner SR (2000) Simulations of quantum neural networks. Inform Sci 128(3-4):257
MathSciNet MATH Google Scholar
Benioff P (1980) The computer as a physical system: a microscopic quantum mechanical Hamiltonian model of computers as represented by Turing machines. J Stat Phys 22(5):563
MathSciNet MATH Google Scholar
Biamonte J, Wittek P, Pancotti N, Rebentrost P, Wiebe N, Lloyd S (2017) Quantum machine learning. Nature 549(7671):195
Google Scholar
Bian Z, Chudak F, Macready WG, Rose G (2010) The Ising model: teaching an old problem new tricks. D-wave systems 2
Biswas R, Jiang Z, Kechezhi K, Knysh S, Mandrà S., O’Gorman B, Perdomo-Ortiz A, Petukhov A, Realpe-Gómez J., Rieffel E, et al. (2017) A NASA perspective on quantum computing: Opportunities and challenges. Parallel Comput 64:81
MathSciNet Google Scholar
Brassard G, Hoyer P (1997) An exact quantum polynomial-time algorithm for Simon’s problem. In: Proceedings of the fifth Israeli symposium on theory of computing and systems. IEEE, pp 12–23
de Broglie L (1924) XXXV. A tentative theory of light quanta. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 47(278):446
Google Scholar
Buhrman H, Cleve R, Watrous J, De Wolf R (2001) Quantum fingerprinting. Phys Rev Lett 87(16):167902
Google Scholar
Bužek V, Hillery M (1996) Quantum copying: Beyond the no-cloning theorem. Phys Rev A 54(3):1844
MathSciNet Google Scholar
Cao Y, Guerreschi GG, Aspuru-Guzik A (2017) Quantum Neuron: an elementary building block for machine learning on quantum computers. arXiv:1711.11240
Castelvecchi D (2017) IBM’s quantum cloud computer goes commercial. Nature News 543 (7644):159
Google Scholar
Chen H, Wossnig L, Severini S, Neven H, Mohseni M (2021) Universal discriminative quantum neural networks. Quantum Mach Intell 3(1):1
Google Scholar
Ciliberto C, Herbster M, Ialongo AD, Pontil M, Rocchetto A, Severini S, Wossnig L (2018) Quantum machine learning: a classical perspective. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 474(2209):20170551
MathSciNet MATH Google Scholar
Cong I, Choi S, Lukin MD (2019) Quantum convolutional neural networks. Nat Phys 15 (12):1273–1278. https://doi.org/10.1038/s41567-019-0648-8
Article Google Scholar
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273
MATH Google Scholar
Deutsch D (1985) Quantum theory, the Church–Turing principle and the universal quantum computer. Proceedings of the Royal Society of London A Mathematical and Physical Sciences 400(1818):97
MathSciNet MATH Google Scholar
Deutsch D, Jozsa R (1992) Rapid solution of problems by quantum computation. Proceedings of the Royal Society of London Series A: Mathematical and Physical Sciences 439(1907):553
MathSciNet MATH Google Scholar
Dunjko V, Briegel HJ (2018) Machine learning & artificial intelligence in the quantum domain: a review of recent progress. Reports on Progress in Physics 81(7):074001
MathSciNet Google Scholar
Dunjko V, Wittek P (2020) A non-review of quantum machine learning: trends and explorations. Quantum Views 4:32
Google Scholar
Durr C, Hoyer P (1996) A quantum algorithm for finding the minimum. arXiv:quant-ph/9607014
Edwards DA (1979) The mathematical foundations of quantum mechanics. Synthese 42(1):1
MathSciNet MATH Google Scholar
Einstein A (1905) On a heuristic viewpoint concerning the production and transformation of light. Annalen der Physik
Farhi E, Goldstone J, Gutmann S, Sipser M (2000) Quantum computation by adiabatic evolution. arXiv:quant-ph/0001106
Farhi E, Neven H (2018) Classification with quantum neural networks on near term processors. arXiv:1802.06002
Feynman RP (1999) Simulating physics with computers. 1982, reprinted in: Feynman and Computation
Friedman TL (2015) Moore’s law turns 50. The New York Times 13
Giovannetti V, Lloyd S, Maccone L (2008) Quantum random access memory. Phys Rev Lett 100(16):160501
MathSciNet MATH Google Scholar
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge
MATH Google Scholar
Grover LK (1996) A fast quantum mechanical algorithm for database search. In: Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pp 212–219
Grover L, Rudolph T (2002) Creating superpositions that correspond to efficiently integrable probability distributions. arXiv:quant-ph/0208112 quant-ph/0208112
Harrow AW, Hassidim A, Lloyd S (2009) Quantum algorithm for linear systems of equations. Phys Rev Lett 103(15):150502
MathSciNet Google Scholar
Havlíček V, Córcoles AD, Temme K, Harrow AW, Kandala A, Chow JM, Gambetta JM (2019) Supervised learning with quantum-enhanced feature spaces. Nature 567(7747):209
Google Scholar
Hinton G (2002) Training products of experts by minimizing contrastive divergence. Neural computation 14(8):1771
MATH Google Scholar
Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences 79(8):2554
MathSciNet MATH Google Scholar
Johnston H (2013) D-Wave sells second quantum computer-this time to NASA. Phys World 26 (07):9
Google Scholar
Jouppi NP, Young C, Patil N, Patterson D, Agrawal G, Bajwa R, Bates S, Bhatia S, Boden N, Borchers A et al (2017) In-datacenter performance analysis of a tensor processing unit. In: Proceedings of the 44th annual international symposium on computer architecture, pp 1–12
Kak SC (1995) Quantum neural computing. In: Advances in imaging and electron physics, vol 94. Elsevier, pp 259–313
Kapoor A, Wiebe N, Svore K (2016) Quantum perceptron models. In: Advances in neural information processing systems, pp 3999–4007
Kerenidis I, Landman J, Prakash A (2019) Quantum algorithms for deep convolutional neural networks. arXiv:1911.01117
Kliesch M, Barthel T, Gogolin C, Kastoryano M, Eisert J (2011) Dissipative quantum church-turing theorem. Phys Rev Lett 107(12):120501
Google Scholar
Kopczyk D (2018) Quantum machine learning for data scientists. arXiv:1804.10068
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278
Google Scholar
Leiserson CE, Thompson NC, Emer JS, Kuszmaul BC, Lampson BW, Sanchez D, Schardl TB (2020) There’s plenty of room at the Top: What will drive computer performance after Moore’s law? Science 368(6495)
Li Y, Benjamin SC (2017) Efficient variational quantum simulator incorporating active error minimization. Phys Rev X 7(2):021050
Google Scholar
Lloyd S, Mohseni M, Rebentrost P (2013) Quantum algorithms for supervised and unsupervised machine learning. arXiv:1307.0411
Lloyd S, Schuld M, Ijaz A, Izaac J, Killoran N (2020) Quantum embeddings for machine learning
Maxwell JC (1865) VIII. A dynamical theory of the electromagnetic field. Philosophical Transactions of the Royal Society of London 155:459. https://royalsocietypublishing.org/doi/10.1098/rstl.1865.0008
Google Scholar
McClean JR, Romero J, Babbush R, Aspuru-Guzik A (2016) The theory of variational hybrid quantum-classical algorithms. New J Phys 18(2):023023
MATH Google Scholar
Mengoni R, Di Pierro A (2019) Kernel methods in quantum machine learning. Quantum Mach Intell 1(3):65
Google Scholar
Mitarai K, Negoro M, Kitagawa M, Fujii K (2018) Quantum circuit learning. Phys Rev A 98(3):032309
Google Scholar
Montanaro A (2016) Quantum algorithms: an overview. npj Quantum Information 2(1):1
Google Scholar
Nickolls J, Buck I, Garland M, Skadron K (2008) Scalable parallel programming with CUDA. Queue 6(2):40
Google Scholar
Nielsen MA, Chuang I (2002) Quantum computation and quantum information
Peruš M (2000) Neural networks as a basis for quantum associative networks. Neural Netw World 10(6):1001
Google Scholar
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical recipes 3rd edition: The art of scientific computing. Cambridge University Press, Cambridge
MATH Google Scholar
Rebentrost P, Bromley TR, Weedbrook C, Lloyd S (2018) Quantum Hopfield neural network. Phys Rev A 98(4):042308
Google Scholar
Rebentrost P, Mohseni M, Lloyd S (2012) Quantum support vector machine for big feature and big data classifica-tion. arXiv:1307.0471 2014
Rumelhart DE, Hinton G, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533
MATH Google Scholar
Salakhutdinov R, Hinton G (2009) Deep boltzmann machines. In: Artificial intelligence and statistics, pp 448–455
Santoro GE, Tosatti E (2006) Optimization using quantum mechanics: quantum annealing through adiabatic evolution. J Phys A Math Gen 39(36):R393
MathSciNet MATH Google Scholar
Schlosshauer M, Kofler J, Zeilinger A (2013) A snapshot of foundational attitudes toward quantum mechanics. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics 44(3):222
MATH Google Scholar
Schrödinger E (1926) An undulatory theory of the mechanics of atoms and molecules. Phys Rev 28(6):1049
Google Scholar
Schuld M, Bocharov A, Svore KM, Wiebe N (2020) Circuit-centric quantum classifiers. Phys Rev A 101(3):032308
MathSciNet Google Scholar
Schuld M, Killoran N (2019) Quantum machine learning in feature Hilbert spaces. Phys Rev Lett 122(4):040504
Google Scholar
Schuld M, Petruccione F (2018) Quantum ensembles of quantum classifiers. Scientific Reports 8(1):1
MATH Google Scholar
Schuld M, Petruccione F (2018) Supervised learning with quantum computers. Springer, Berlin
MATH Google Scholar
Schuld M, Sinayskiy I, Petruccione F (2015) Simulating a perceptron on a quantum computer. Phys Lett A 379(7):660
MATH Google Scholar
Schuld M, Sinayskiy I, Petruccione F (2015) An introduction to quantum machine learning. Contemp Phys 56(2):172
MATH Google Scholar
Servedio RA, Gortler SJ (2004) Equivalences and separations between quantum and classical learnability. SIAM J Comput 33(5):1067
MathSciNet MATH Google Scholar
Shor PW (1999) Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM review 41(2):303
MathSciNet MATH Google Scholar
Sperling E (2018) Quantum effects at 7/5nm and beyond. Semiconductor Egineering
Tang E (2019) A quantum-inspired classical algorithm for recommendation systems. In: Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pp 217–228
Temme K, Bravyi S, Gambetta JM (2017) Error mitigation for short-depth quantum circuits. Phys Rev Lett 119(18):180509
MathSciNet Google Scholar
Ventura D, Martinez T (2000) Quantum associative memory. Inform Sci 124(1-4):273
MathSciNet Google Scholar
Wan KH, Dahlsten O, Kristjánsson H, Gardner R, Kim M (2017) Quantum generalisation of feedforward neural networks. npj Quantum Information 3(1):1
Google Scholar
Wiebe N, Braun D, Lloyd S (2012) Quantum algorithm for data fitting. Phys Rev Lett 109(5):050505
Google Scholar
Wiebe N, Kapoor A, Svore K (2014a) Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning. arXiv:1401.2142
Wiebe N, Kapoor A, Svore K (2014b) Quantum deep learning. arXiv:1412.3489
Willsch D, Willsch M, De Raedt H, Michielsen K (2020) Support vector machines on the D-Wave quantum annealer. Comput Phys Commun 248:107006
MathSciNet Google Scholar
Wittek P (2014) Quantum machine learning: what quantum computing means to data mining. Academic Press, Cambridge
MATH Google Scholar
Wootters WK, Zurek WH (1982) A single quantum cannot be cloned. Nature 299(5886):802
MATH Google Scholar

Download references

Author information

Authors and Affiliations

Vishwakarma University, Pune, India
Viraj Kulkarni & Milind Kulkarni
DeepTek Inc, Pune, India
Aniruddha Pant

Authors

Viraj Kulkarni
View author publications
You can also search for this author in PubMed Google Scholar
Milind Kulkarni
View author publications
You can also search for this author in PubMed Google Scholar
Aniruddha Pant
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Viraj Kulkarni.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kulkarni, V., Kulkarni, M. & Pant, A. Quantum computing methods for supervised learning. Quantum Mach. Intell. 3, 23 (2021). https://doi.org/10.1007/s42484-021-00050-0

Download citation

Received: 29 June 2020
Accepted: 03 August 2021
Published: 06 September 2021
DOI: https://doi.org/10.1007/s42484-021-00050-0

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Quantum computing methods for supervised learning

Abstract

Similar content being viewed by others

Quantum Machine Learning: A Review and Current Status

Quantum Computing for Machine Learning

Quantum machine learning

Explore related subjects

1 Introduction

2 Background of quantum computation

2.1 Single qubit

2.2 Multiple qubits

2.3 Quantum gates

2.4 Quantum parallelism

2.5 No cloning theorem

2.6 Adiabatic quantum computation

3 Quantum machine learning

3.1 Important subroutines of quantum algorithms

3.1.1 Quantum encoding

Basis encoding

Amplitude encoding

3.1.2 Quantum random access memory

3.1.3 Grover’s algorithm and amplitude amplification

3.1.4 Calculating inner products using swap test

3.1.5 Solving systems of linear equations (HHL)

3.2 Quantum annealing

3.3 Quantum algorithms for machine learning

3.3.1 k-Nearest neighbors

3.3.2 Support vector machines

3.3.3 Neural networks

3.3.4 Variational quantum classifiers

4 Discussion

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation