1 Introduction

Computational fluid dynamics (CFD) solves partial differential equations (PDEs) to obtain the physical properties of fluids and plays a vital role in industrial design. One of the most significant equations, the Navier–Stokes equation, models the evolution of flow fields in terms of density, momentum, and energy [1]. Solving such PDEs is a complicated task; thus, numerical methods are often applied to provide approximate results.

Using quantum computing to achieve faster numerical solutions is promising. Quantum computing is a new computing paradigm that can offer exponential acceleration over classical approaches for certain problems. Many quantum algorithms, including quantum factorization [2], quantum simulation [3,4,5,6], and linear system solvers [7,8,9], have already been proposed to support this idea. Quantum computing hardware is also developing rapidly: devices with tens of qubits have been realized in superconducting systems [10, 11] and ion traps [12]. Quantum error correction and topological quantum computing theories have also been developed toward building a fault-tolerant quantum computer [13,14,15,16].

The numerical solution of fluid flow involves two kinds of problems: transient problems, which require the system state at any given time, and steady problems, which require the steady state reached after the system has evolved for an infinitely long time. In this paper, we focus on the quantum speedup for steady problems. A no-go theorem has shown that a nonlinear PDE cannot be solved efficiently, even with quantum computing approaches, when time precision is required and the nonlinearity is large. This theorem prohibits a fast quantum algorithm for a general range of fluid flows, where nonlinear behavior is common. Steady problems, however, only require outputting a steady solution of the fluid flow. Thus, time precision becomes unimportant in steady problems, and a quantum speedup can be expected.

For steady problems, first-order time-stepping schemes are usually applied. Such schemes start from an arbitrary initial state and evolve in pseudo-time, aiming to converge the system efficiently to the steady-state solution. In a first-order scheme, one can evaluate the variable at either time \(t^n\) or \(t^{n+1}\), which yields the “explicit” or the “implicit” method, respectively. The explicit method computes the next step’s variable directly from the current step, whereas the implicit method must solve a linear equation to do so. In practice, the explicit method is more suitable for parallel computing, and the implicit method converges faster [17].
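
To make the distinction concrete, for a semi-discrete system \(\mathrm{d}\varvec{U}/\mathrm{d}t=\varvec{R}(\varvec{U})\) the two first-order schemes read as follows (a standard illustration; sign conventions for the residual vary between solvers, and the linearized implicit form anticipates Eq. (4) below):

$$\begin{aligned} \text {explicit:}&\quad \varvec{U}^{n+1}=\varvec{U}^{n}+\Delta t\,\varvec{R}(\varvec{U}^{n}),\\ \text {implicit:}&\quad \varvec{U}^{n+1}=\varvec{U}^{n}+\Delta t\,\varvec{R}(\varvec{U}^{n+1}) \;\Longrightarrow \; \left( \frac{\varvec{I}}{\Delta t}-\frac{\partial \varvec{R}}{\partial \varvec{U}}\right) \Delta \varvec{U}^{n}=\varvec{R}(\varvec{U}^{n}), \end{aligned}$$

so each implicit step reduces to a sparse linear solve, which is the step we later accelerate quantumly.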

The finite volume method (FVM) is a typical numerical method in classical CFD [18, 19]. In the FVM, the computational domain is split into small cells, and, starting from an initial state, one performs discrete time-stepping to evolve the system. The time complexity of the FVM is dominated by the cost of the time-stepping, which involves solving large-scale linear algebra problems. For sparse linear systems, Krylov subspace methods are widely used in practice, and their complexity can be linear in the problem size (the number of grid cells) [20]. When the problem size is large, the computing resources (both hardware and computing time) become expensive, for example when implementing large-scale CFD problems on supercomputers and CPU/GPU clusters [21,22,23].

The intuition is that applying a quantum linear solver (QLS) in the implicit method exponentially accelerates the FVM (Footnote 1). However, applying the QLS to practical problems is not straightforward. It has been pointed out that the conversion between classical and quantum data can become a bottleneck [26], which would prevent any demonstration of quantum advantage.

This paper studies the use of the implicit method to solve steady problems. We propose a full recipe for accelerating the FVM with a quantum approach. Assuming only that the input and output are classical data, we still obtain an exponential speedup for steady problems. To achieve this, we apply quantum random access memory (QRAM) [27, 28] and design a quantum memory layout. At the input stage, the memory layout helps implement the quantum inputs required by the QLS. At the output stage, we sample the output state and sparsely update the memory. We show that these two processes, which act as the interface between classical and quantum data, can both run in polylogarithmic time. As a result, they enable us to integrate the quantum linear solver as a submodule into the classical FVM to achieve exponential speedup.

The QLS together with the \(l_\infty \) tomography method [24] creates a sparse update vector that enables efficient time-stepping. Although such an update differs significantly from the classical algorithm, convergence stability is maintained if an appropriate quantum error tolerance is chosen. To support this, we analyze the error model resulting from the quantum process and perform numerical experiments on various test cases. The results show that this method has broad applicability to general types of CFD problems while the quantum advantage is kept.

There have been previous works on solving differential equations, including partial differential equations, with quantum computers [29,30,31,32,33,34,35,36]. The comparison and relation between this paper and these previous works are discussed in Appendix section 1.

2 Motivation: quantum algorithm with classical input and output

When we use a quantum computer to cope with a practical problem, we should always and only expect classical inputs and outputs. Many quantum algorithms have been proposed and claimed to be faster (exponentially or polynomially) than their classical counterparts. However, a large portion of them only beat classical algorithms under certain theoretical assumptions. A famous example is the quantum linear system (QLS) algorithm, the Harrow–Hassidim–Lloyd (HHL) algorithm [7], which prepares the state \(|x\rangle \) encoding the solution of the linear equation \(\varvec{A}\varvec{x}=\varvec{b}\). This algorithm uses \(\mathcal {O}(\log N)\) calls to linear equation oracles, whereas the classical counterpart has to perform at least \(\mathcal {O}(N)\) calls. Based on this work, many quantum machine learning algorithms were proposed and claimed to have exponential speedup over their classical counterparts. However, most of these algorithms did not explain how to realize such oracles from real-world data, nor how to output the classical solution vector. In [26], the authors raised a series of obstacles to applying the QLS in quantum machine learning algorithms with real-world data. The main problems are inputting the classical data into the quantum computer and extracting information from the output state given by the QLS. If we hope to preserve the quantum speedup, two operations are forbidden. One is to prepare the input state \(|b\rangle \) with an encoding quantum circuit, because even reading all data entries requires \(\mathcal {O}(N)\) time. The other is to sample the output state and extract it into a classical vector with \(\mathcal {O}(N)\) measurements.

We believe the obstacles that appear in QLS-based quantum machine learning algorithms are also challenging if we want to accelerate the FVM for CFD problems quantumly. The physical variable must be updated at every time step, so the time complexity becomes the product of \(T_x\) (the time complexity of preparing \(|x\rangle \)) and M (the number of copies required for sampling). M must be sublinear in the problem size N; otherwise, we cannot achieve quantum speedup. Our proposal takes these obstacles into account, assuming only that the input and output of the algorithm are classical data.

3 Methods

3.1 Designated quantum data structure for the FVM

We store the internal data in the QRAM to allow quantum access. Quantum random access memory (QRAM) is a dedicated storage device used by a quantum computer: it stores classical data that can be retrieved in quantum superposition. Detailed descriptions of the QRAM and our assumptions are given in Appendix section 1.

Here we design a quantum data structure for integrating quantum approaches with the FVM, as shown in Fig. 1. The data structure includes the geometry definition, the physical variable \(\varvec{U}\), and the residual vector \(\varvec{R}\), which are identical to what a classical FVM solver stores. All data in the QRAM are stored contiguously, which allows any entry to be located with simple address computations. In the FVM, each cell is represented by a C-sized tuple of physical properties, such as the five-tuple \((\rho , \rho u, \rho v, \rho w, \rho E)\) in compressible three-dimensional flow. (Detailed information about the FVM is given in Appendix 1.) Thus, we locate each element with an address pair (i, k): the first index is the cell’s number, and the second locates the position within the tuple. For example, as shown in Fig. 1c, \(U_{i,k}\) represents the \(k^\mathrm{th}\) variable of the \(i^\mathrm{th}\) cell.

Besides those, we design a vital component for the quantum input and output: the residual sum tree, depicted in Fig. 1d. We precompute the squared \(l_2\) norms of the residual vector and of its subvectors and store them in the sum tree. Every node of the binary tree stores the sum of its two children, except that the second-bottom layer stores the sum of the squares of its two children (the leaves). This structure is maintained throughout the whole FVM computation: whenever elements of the physical variable change, the affected parts of the residual sum tree are re-computed. The detailed method for updating the sum tree is introduced in Sect. 3.3, and a classical sketch of the construction is given below.
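
For concreteness, the following is a minimal classical sketch of how such a tree could be laid out and built, assuming an array-based (1-indexed heap) layout; the function and variable names are illustrative and not part of the paper’s specification:

```python
import numpy as np

def build_residual_sum_tree(R):
    """Array-based residual sum tree (1-indexed heap layout).

    Leaves (indices n..2n-1) hold the residual entries in flat order;
    the layer above the leaves holds squared pair sums; every higher node
    holds the sum of its two children; the root (index 1) equals ||R||^2.
    """
    n = 1 << max(1, int(np.ceil(np.log2(len(R)))))   # pad leaf count to a power of two
    tree = np.zeros(2 * n)
    tree[n:n + len(R)] = R                            # leaves: raw residual entries
    tree[n // 2:n] = tree[n:2 * n:2] ** 2 + tree[n + 1:2 * n:2] ** 2  # sums of squares
    level = n // 2
    while level >= 2:                                 # remaining levels: plain sums
        upper = level // 2
        tree[upper:level] = tree[level:2 * level:2] + tree[level + 1:2 * level:2]
        level = upper
    return tree

# the root stores ||R||^2 and can be used as the convergence monitor
R = np.array([0.3, -0.1, 0.2, 0.05])
tree = build_residual_sum_tree(R)
assert np.isclose(tree[1], np.sum(R ** 2))
```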

Fig. 1

Schematic of the quantum data structure. a Three memory areas of the QRAM: the geometry definition area, which holds the input of the problem; the physical variable area, which holds \(\varvec{U}^n\); and the residual sum tree. b The linear structure of the geometry definition area. This area consists of N blocks; the \(i^\mathrm{th}\) block holds the s related indices, where each \(i'=i_k\) (\(0\leqslant k<s\)) satisfies \(C_{i,i'}=1\). c The linear structure of the physical variable area. Each block holds the physical variables \(\varvec{U}_{i}\) of cell i. d The binary tree structure of the residual sum tree. The tree’s leaves are the components of the residual vector \(\varvec{R}^n\); every two nodes are combined level by level (squares are summed at the lowest level). The tree root is \(\Vert \varvec{R}\Vert ^2\).

Now we introduce the quantum functions enabled by this data structure. The physical variable and the residual vector are ordinary vectors; they can be queried by supplying the addresses (or indices) in quantum superposition, denoted by \(\mathcal {P}_U\):

$$\begin{aligned} \mathcal {P}_U|i,k\rangle |0\rangle =|i,k\rangle |U_{i,k}\rangle . \end{aligned}$$
(1)

All tree nodes are also stored contiguously. We define each node’s address as \(a_r(p)\), starting from the top of the sum tree, where p is a binary string in which every digit represents the left/right branch with 0/1. For example, \(a_r(0)\) is the address of the left child of the root, and \(a_r(0,1)\) is the right child of the node at \(a_r(0)\). Specially, we directly use \(a_r\) to represent the root’s address. The binary tree has a fixed size, so any address \(a_r(p)\) can be computed efficiently. The data stored at address \(a_r(p)\) are denoted by \(S_R(p)\). We can then perform the unitary transform:

$$\begin{aligned} \mathcal {P}_R|a_r(p)\rangle |0\rangle =|a_r(p)\rangle |S_R(p)\rangle . \end{aligned}$$
(2)
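
With an array-based layout such as the one sketched above, the address map \(a_r(p)\) reduces to simple index arithmetic. The following convention is illustrative, not the paper’s prescribed one:

```python
def a_r(path, base=0):
    """Map a left/right path (tuple of 0/1 bits, root = empty path) to the
    QRAM address of the corresponding sum-tree node, assuming the 1-indexed
    heap layout sketched earlier and a contiguous block starting at `base`."""
    index = 1                      # root of the tree
    for bit in path:
        index = 2 * index + bit    # descend left (0) or right (1)
    return base + index

# a_r((0,)) is the left child of the root; a_r((0, 1)) is its right child
assert a_r((0,)) == 2 and a_r((0, 1)) == 5
```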

The geometry definition area stores the geometry input of the CFD problem. From it, we can query the position and the connectivity of all cells in quantum superposition. These data are constant during the calculation and are used for computing the elements of the linear equation. This part enables the unitary

$$\begin{aligned} \mathcal {P}_s|i\rangle |j\rangle =|i\rangle |i_j\rangle , \end{aligned}$$
(3)

which is essential for the quantum linear solver.

To summarize, this data structure enables the following processes:

  1. Initialization: initializing the residual sum tree requires \(\mathcal {O}(N)\) classical accesses to the QRAM;

  2. Quantum input: with access to the QRAM, one can prepare \(|R\rangle \) in \(\mathcal {O}(\log ^2 N)\) time;

  3. Quantum output: the quantum solver outputs a classical solution that sparsely updates the physical variable \(\varvec{U}\) with no more than \(\mathcal {O}(\epsilon ^{-2}\log N)\) writes;

  4. Result: time-stepping from \(\varvec{U}^n\) to \(\varvec{U}^{n+1}\) costs \(\tilde{\mathcal {O}}((s^3+\log N)s\kappa \epsilon ^{-2}\log ^3 N)\) time.

In the following text, we will introduce how quantum input and output are realized. Then, we analyze the time complexity of all the above processes in the “run-time analysis” section to prove these results.

3.2 Quantum input: constructing quantum subprocedures from the QRAM

The time-stepping of \(\varvec{U}\) is realized by solving linear equations when an implicit Euler time-stepping scheme is used. The linearization scheme of the finite volume method is introduced in Appendix Sect. 1, where Eq. (18) gives the typical form. In general, the implicit Euler method yields

$$\begin{aligned} \varvec{A}\Delta \varvec{U}=\varvec{R}. \end{aligned}$$
(4)

In this equation, \(\varvec{A}\) and \(\varvec{R}\) are the coefficient matrix and the right-hand-side vector, which depend on \(\varvec{U}\) at the current step. Solving this equation, we obtain \(\Delta \varvec{U}\), the update vector for \(\varvec{U}\). The Euler method is a first-order time-stepping scheme; thus,

$$\begin{aligned} \varvec{U}^{n+1} = \varvec{U}^{n}+\Delta \varvec{U}^{n} \end{aligned}$$
(5)

performs a stepping from n to \(n+1\). Here we use the superscript n to denote the \(n^\mathrm{th}\) iteration step. The main idea is to use the quantum approach to accelerate the linear solver to achieve exponential speedup.
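
In outline, the classical iteration that the quantum solver is meant to replace looks like the sketch below. Here `assemble_system` stands in for the problem-specific FVM linearization of Eq. (4) and is an assumed helper, not something defined in this paper; the Krylov solve is a stand-in for the classical linear solver:

```python
import numpy as np
import scipy.sparse.linalg as spla

def implicit_euler_steps(U, assemble_system, n_steps, tol=1e-12):
    """First-order implicit pseudo-time stepping: at each step solve
    A(U^n) dU = R(U^n) and update U^{n+1} = U^n + dU (Eqs. (4)-(5)).
    `assemble_system(U)` must return the sparse matrix A and residual R."""
    for _ in range(n_steps):
        A, R = assemble_system(U)        # linearization around the current state
        if np.linalg.norm(R) < tol:      # steady state reached
            break
        dU, _ = spla.gmres(A, R)         # classical sparse solve, replaced by the QLS
        U = U + dU
    return U
```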

We introduce the quantum linear solver in Appendix Sect. 1. The input of the quantum linear solver consists of three quantum subprocedures. For a general sparse linear equation such as Eq. (4), we have the following three quantum processes, which encode the information of the linear equation:

$$\begin{aligned} \mathcal {P}_A|i,k,i',k'\rangle = |i,k,i',k'\rangle |A_{i,k}^{i',k'}\rangle , \end{aligned}$$
(6)

which encodes the matrix’s element, and

$$\begin{aligned} \mathcal {P}_R|i,k\rangle = |i,k\rangle |R_{i,k}\rangle , \end{aligned}$$
(7)

which encodes the vector’s element, and

$$\begin{aligned} \mathcal {P}_l|i,p\rangle = |i,C_i(p)\rangle , \end{aligned}$$
(8)

which encodes the \(p^\mathrm{th}\) related cell in the difference scheme.

In the original proposal of the quantum linear solver, these three processes are viewed as “black boxes”: the authors did not evaluate the time complexity of solving a particular linear equation but instead evaluated the query complexity to these black boxes, namely how many times the quantum linear solver calls these three processes. Our task is to construct the inputs of the quantum linear solver by utilizing the quantum data structure introduced above.

Using the data stored in the QRAM to construct \(\mathcal {P}_A\) and \(\mathcal {P}_l\) is straightforward. Constructing \(\mathcal {P}_l\) is equivalent to querying the geometry definition data, which have been prepared beforehand. To construct \(\mathcal {P}_A\), we need to compute the value of a matrix element with its position (row and column) as the input. Constructing \(\mathcal {P}_A\) requires at most O(s) queries: first query \(\mathcal {P}_l\) to find all related cells, then query the physical variables of those cells, of which only O(s) are related. Computing the matrix element is also gate-efficient because the same computation is efficient classically.
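
Classically, the same access pattern can be sketched as follows; `block_element` stands for the problem-specific flux-Jacobian formula, and `geometry` and `U` denote the QRAM areas of Fig. 1, so all names here are illustrative assumptions rather than the paper’s exact procedure:

```python
def matrix_element(i, k, ip, kp, geometry, U, block_element):
    """Compute A_{(i,k),(i',k')} on the fly, mirroring the O(s) query pattern
    of P_A: find the related cells of i, read their physical variables, and
    evaluate the (k, k') entry of the corresponding C x C block."""
    neighbours = geometry[i]                 # the s related indices stored per cell (P_l / P_s)
    if ip != i and ip not in neighbours:
        return 0.0                           # outside the stencil: the matrix is s-sparse
    local_state = {j: U[j] for j in (i, *neighbours)}    # O(s) reads of physical variables
    return block_element(i, ip, k, kp, local_state)      # problem-specific flux Jacobian entry
```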

Constructing \(\mathcal {P}_b\), the procedure that prepares \(|R\rangle \), requires the data in the residual sum tree. Inspired by [37], if we store all the precomputed partial sums of a real-valued vector in a sum tree in the QRAM, then with the method introduced in [38] we can prepare the corresponding state efficiently, with time complexity \(O(\log N)\). Using this method, \(\mathcal {P}_b\) can be constructed with \(O(\log N)\) queries to the QRAM. A detailed implementation is given in Appendix Sect. 1.
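
Classically, the role of the sum tree in this state preparation can be seen by computing the branching probabilities that the conditional rotations implement. The sketch below follows the standard amplitude-encoding recursion under the array layout assumed earlier; it computes the rotation angles only and is not the quantum circuit itself:

```python
import numpy as np

def branching_angles(tree, n_leaves):
    """Rotation angles of the amplitude-encoding recursion: at internal node
    `node`, cos^2(theta) = (squared weight of the left subtree) / (weight of node).
    For the second-bottom layer the children are raw residual entries, so their
    squares are taken; leaf signs would be applied by a final conditional step."""
    angles = {}
    for node in range(1, n_leaves):
        left, parent = tree[2 * node], tree[node]
        if node >= n_leaves // 2:        # children are leaves: square the raw entry
            left = left ** 2
        if parent > 0:
            angles[node] = np.arccos(np.sqrt(min(1.0, left / parent)))
    return angles
```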

Using this approach, we can also apply a preconditioner to the linear equation. Both the classical Krylov subspace methods and the quantum linear solver suffer from the condition number problem: their complexities depend linearly on the condition number. One can use a preconditioner, denoted by \(\varvec{P}\), to construct a new equation \(\varvec{PAx}=\varvec{Pb}\). If \(\varvec{PA}\) has a smaller condition number than \(\varvec{A}\) and the preconditioning can be implemented efficiently, then the time complexity can be reduced. Preconditioners are widely used in classical CFD solvers, and several preconditioned quantum linear solvers have also been proposed [39,40,41]. Here, we integrate the Jacobi preconditioner into our method. The time complexities of constructing \(\mathcal {P}_A'\) and \(\mathcal {P}_R'\), the preconditioned versions corresponding to \(\varvec{A}'=\varvec{P}\varvec{A}\) and \(\varvec{R}'=\varvec{P}\varvec{R}\), acquire an additional \(O(\mathrm {poly}\, s)\) factor. The data stored in the residual sum tree are modified to their preconditioned version. A detailed implementation of the quantum Jacobi preconditioner is given in Appendix section 1.
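
As a purely classical illustration of the algebra involved, the following sketch applies a Jacobi (block-diagonal) preconditioner under the assumption that the blocks are the per-cell diagonal blocks of \(\varvec{A}\); the appendix describes the quantum implementation, and a dense matrix is used here only for brevity:

```python
import numpy as np

def block_jacobi_precondition(A_dense, R, block_size):
    """Form A' = P A and R' = P R with P = blockdiag(D_0^{-1}, D_1^{-1}, ...),
    where D_i is the i-th diagonal block of A (n assumed to be a multiple of
    block_size). Each block inverse is the O(block_size^3) cost that appears
    in the run-time analysis."""
    n = A_dense.shape[0]
    A_prec, R_prec = A_dense.copy(), R.copy()
    for start in range(0, n, block_size):
        stop = start + block_size
        D_inv = np.linalg.inv(A_dense[start:stop, start:stop])
        A_prec[start:stop, :] = D_inv @ A_dense[start:stop, :]
        R_prec[start:stop] = D_inv @ R[start:stop]
    return A_prec, R_prec
```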

3.3 Quantum output: sampling the solution state and updating the QRAM

With these quantum inputs, the QLS outputs a solution \(|u\rangle =|\frac{\Delta \varvec{U}}{\Vert \Delta \varvec{U}\Vert }\rangle \), the normalized solution of the linear equation, where \(\Vert \cdot \Vert \) denotes the \(l_2\) norm of a vector throughout this paper. We name the quantum linear solver process \(\mathcal {A}_u\); it can prepare \(|u\rangle \) in sublinear time. Our task is to call \(\mathcal {A}_u\) a sublinear number of times to update the physical variable and move to the next iteration step.

Using the \(l_\infty \) tomography algorithm [42], we can convert the quantum state into a classical vector in sublinear time. Taking \(\mathcal {A}_u\) as input and running \(\mathcal {A}_u\) and its controlled version \(\mathcal {O}(\epsilon ^{-2}\log N)\) times, we obtain a classical vector \(\tilde{\varvec{u}}\) that is \(\epsilon \)-close to the quantum solution \(\varvec{u}\). We introduce the \(l_\infty \) tomography algorithm in Appendix Sect. 1. As a result, obtaining the classical vector \(\tilde{\varvec{u}}\) requires only sublinear time.
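
The classical content of this step can be emulated as in the sketch below, which mirrors the sampling model we also use for the numerical experiments in Sect. 6; the constant, the sign handling, and the function name are our own simplifying assumptions rather than the exact tomography procedure of [42]:

```python
import numpy as np

def linf_tomography_emulated(u, eps, C=32, rng=None):
    """Emulate l_infinity tomography of a normalized real vector u: draw
    M = C * log2(N) / eps^2 samples from the distribution |u_i|^2 and return
    sign(u_i) * sqrt(count_i / M). In the quantum algorithm the signs come
    from an extra round of controlled measurements; here they are copied."""
    rng = rng or np.random.default_rng()
    N = len(u)
    M = max(1, int(C * np.log2(N) / eps ** 2))
    p = np.abs(u) ** 2
    counts = rng.multinomial(M, p / p.sum())
    return np.sign(u) * np.sqrt(counts / M)   # sparse: at most min(M, N) nonzeros
```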

The tomography algorithm only produces a normalized vector \(\tilde{\varvec{u}}\); we must also obtain the normalization factors to recover the actual update vector of \(\varvec{U}\). The QLS produces two factors. The first is \(c_R\), which is generated when preparing \(|R\rangle \) and can be read from the residual sum tree described above. The second is \(c_l\), which arises from the non-unitarity of the matrix inversion and the normalization of the solution. With amplitude estimation [43] (also introduced in Appendix Sect. 1), we can estimate the probability \(p_l\) and then obtain the factor as \(c_l = \alpha \sqrt{p_l}\), where \(\alpha \) is a constant of the QLS. Obtaining the normalization factors does not affect the asymptotic time complexity of the algorithm. Combining the two factors as \(c=c_Rc_l\), we obtain the norm of the solution, \(\Vert \Delta \varvec{U}\Vert \), which gives the magnitude of the update applied to the target vector \(\varvec{U}\) in the CFD solver.

To perform the time-stepping, we update the physical variable \(\varvec{U}\) with \(\Delta \varvec{U}\). Updating the QRAM from the sampled vector \(\tilde{\varvec{u}}\) is also efficient: the \(l_\infty \) tomography algorithm produces a sparse classical vector with no more than \(\mathcal {O}(\epsilon ^{-2}\log N)\) nonzero elements, so no more than \(\mathcal {O}(\epsilon ^{-2}\log N)\) write operations are needed to update the physical variable.

As with computing \(\varvec{R}\) and its sums from \(\varvec{U}\), when any element of the physical variable is updated, only the residuals of the related cells change. Starting from the tree leaves, we update all residuals \(R_{i',k'}\) related to the updated \(U_{i,k}\), i.e., those with \(C_{i,i'}=1\). After these residual entries change, we recompute the sum tree along the affected paths from the leaves to the root. The number of updated nodes does not exceed the product of the number of affected residual entries, \(\mathcal {O}(s)\), and the number of layers of the sum tree, \(\mathcal {O}(\log N)\).

As a result, the cost of updating one entry of \(\varvec{U}\) is less than \(\mathcal {O}(s\log N)\). Thus, both the sampling and the updating processes run in sublinear time. The propagation along a single leaf-to-root path is sketched below.
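
This is a minimal sketch of that propagation over the array layout assumed earlier; recomputing the affected residual values themselves is problem-specific and therefore omitted:

```python
def update_residual_leaf(tree, n_leaves, leaf_index, new_value):
    """Write one changed residual entry into the sum tree and repair the
    O(log N) ancestors on its path to the root, so that the root again
    equals ||R||^2 after the update."""
    node = n_leaves + leaf_index
    tree[node] = new_value                               # leaf: raw residual entry
    parent, sibling = node // 2, node ^ 1
    tree[parent] = tree[node] ** 2 + tree[sibling] ** 2  # second-bottom layer: squares
    node = parent
    while node > 1:                                      # remaining layers: plain sums
        parent = node // 2
        tree[parent] = tree[2 * parent] + tree[2 * parent + 1]
        node = parent
    return tree
```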

For a steady problem, the computation stops in one of two cases: when the residual norm, which can be read from the root of the tree, is smaller than the convergence threshold, or when the maximum number of iteration steps is reached. After stopping, the output of the algorithm is the classical vector stored in the physical variable area.

4 Run-time analysis

The time cost for the quantum approach has two main contributions. One is the cost of initializing the data structures (initialization cost); the other is the time complexity between two iteration steps (time-stepping cost).

4.1 Initialization cost

The initialization process fills the QRAM following the memory layout, which is an entirely classical process. The first step is to write in the initial physical variable \(\varvec{U}\) and fill the tree with the required partial sums. Along with these data, we also fix the memory layout so that the memory address of every data entry can be obtained in constant time.

There may be concerns about whether this \(\mathcal {O}(N)\) preparation time will eliminate the algorithm’s speedup. However, even after accounting for the initialization time, the quantum speedup is preserved. To support this, we analyze the three contributions to the initialization time.

The first is the calculation of the residual vector \(\varvec{R}\). This calculation is performed at every step of the classical FVM for a CFD problem and, even classically, it is not the time bottleneck. Our algorithm calculates the residual only once, at initialization, which costs much less than the repeated evaluations of a classical algorithm.

The second is filling the sum tree. Filling the sum tree only requires repeatedly summing the squares of the residual entries and then the partial sums, so it is natural to expect this process to take less time than calculating the residual vector.

The third is the cost of accessing the QRAM classically. As mentioned in Appendix Sect. 1, we assume the QRAM has capabilities close to those of classical RAM, which allows each classical access to be performed in constant time.

As a result, under this assumption about the QRAM’s capability, the initialization would not cost much more than the preprocessing stage of the classical FVM. Moreover, the quantum memory is initialized only once, so we believe this cost will not become the time bottleneck.

4.2 Complexity of time-stepping

The evolution cost is the time complexity of every evolution step. In [9], the authors provided a linear solver algorithm with logarithmic dependence on the precision. They showed that the query complexity of this algorithm to \(O_A\), \(O_l\) and \(O_b\) is \(O\left( s\kappa \mathrm {polylog}(\frac{s\kappa }{\epsilon })\right) \). We now analyze the time complexity of constructing these subprocedures from the initial problem settings.

According to the results of the previous sections, the number of queries to the QRAM for implementing \(O_A\), \(O_l\), and \(O_b\) is \(\mathcal {O}(s)\), \(\mathcal {O}(1)\), and \(\mathcal {O}(\log N)\), respectively. The time complexity of the preconditioned \(O_A'\) carries an extra multiplier of \(\mathcal {O}(s^3)\) contributed by computing the inverses of the diagonal blocks of \(\varvec{A}\). The preconditioned \(O_b'\) has the same complexity as \(O_b\).

Now consider the time cost of sampling and updating. We run the QLS \(\mathcal {O}(\epsilon ^{-2}\log N)\) times to obtain an \(l_\infty \)-close classical vector. This becomes another multiplier in the time complexity of the quantum procedure.

The last multiplier is the cost of querying the QRAM: as assumed, the QRAM uses \(\mathcal {O}(\log N)\) time to perform one query. Composing these results, the time complexity of the quantum procedure is

$$\begin{aligned} \mathcal {O}\left( (s^3+\log N)s\kappa \log ^3 N\epsilon ^{-2}\mathrm {polylog}(s\kappa /\epsilon ) \right) . \end{aligned}$$
(9)

The final step is to update the sum tree. Updating a preconditioned residual tree has two steps. One is to compute the preconditioned residual, where each term involves another inversion of a diagonal block of \(\varvec{A}\), costing \(\mathcal {O}(s^3)\); the other is to update the tree from the bottom to the top, which takes \(\mathcal {O}(\log N)\) operations for each change at the bottom of the tree. Since at most \(\mathcal {O}(\epsilon ^{-2}\log N)\) entries of \(\varvec{U}\) change, the time cost of updating the tree is \(\mathcal {O}(s^3 \log ^2 N\epsilon ^{-2})\). The total time complexity is the sum of the quantum and the classical procedures. Because the quantum procedure’s complexity is asymptotically larger than the classical one’s, we conclude that the evolution cost has the time complexity shown in Eq. (9).

The classical counterpart’s time complexity is \(\mathcal {O}(Ns\kappa \log 1/\epsilon )\) when the conjugate gradient (CG) method is used as the linear solver. Our algorithm outperforms the classical algorithm in its dependence on the problem size but performs worse when the problem requires high precision. When the problem size N and the precision requirement \(\epsilon \) satisfy \(N\gg \epsilon ^{-2}\), the quantum algorithm will potentially run faster.

5 Error analysis

The time complexity of the quantum approach has better dependence on the number of grid cells N but worse dependence on the precision \(\epsilon \), which implies that the problem size should be large enough to show the quantum advantage. On the other hand, the numerical experiments show that the precision parameter must be sufficiently small; otherwise, the time integration will not converge. This raises a concern: if the precision requirement depends on the problem size, the quantum acceleration will decrease or even vanish. In this section, we provide evidence that the error threshold does not grow with the problem size.

First, we calculate the total error generated by the quantum sampling for a specified error bound \(\epsilon \). Consider one step with physical variable \(\varvec{U}\) and update \(\Delta \varvec{U}\). In our proposal, the quantum process outputs a quantum state \(|u\rangle \) whose amplitudes are proportional to \(\Delta \varvec{U}\),

$$\begin{aligned} \Delta \varvec{U} = u\Vert \Delta \varvec{U}\Vert . \end{aligned}$$
(10)

The \(l_\infty \) tomography outputs a classical vector \(\tilde{u}\) which is \(\epsilon \)-close to u. At any index i, we have

$$\begin{aligned} u_i = \tilde{u}_i + e_i, \end{aligned}$$
(11)

where the error term \(|e_i|<\epsilon \).

Now we consider the magnitude of \(e_i\). When performing \(l_\infty \) tomography, the output vector is obtained from a multinomial sample with sampling number \(M = C\log N/\epsilon ^2\) and probability distribution (\(|u_0|^2\), \(|u_1|^2\), ..., \(|u_{N-1}|^2\)). For any entry, the standard error of such a sample is \(\sigma _i = \sqrt{M|u_i|^2(1-|u_i|^2)}\). When \(|u_i|\) is small enough, we have \(\sigma _i\sim |u_i|\sqrt{M}\). We now assume the error \(e_i\) is approximately proportional to the standard error \(\sigma _i\), and thus we have

$$\begin{aligned} e_i = \mathcal {O}(\sigma _i) \sim \mathcal {O}(|u_i|\sqrt{M}). \end{aligned}$$
(12)

The update vector output by the quantum linear solver should be multiplied by \(\Vert \Delta \varvec{U}\Vert \). As a result, the total error will be amplified by this coefficient.

$$\begin{aligned} E_i = \Vert \Delta U\Vert e_i. \end{aligned}$$
(13)

Compare two discretizations of the same problem, one with N cells and the other with kN cells (variables of the latter are marked with a prime, e.g., \(u'\)). We can assume the distributions of \(\Delta \varvec{U}\) and \(\Delta \varvec{U}'\) are the same because the physical characteristics do not change. From this, we have

$$\begin{aligned} \Vert \Delta U'\Vert ^2 = k\Vert \Delta U\Vert ^2, \end{aligned}$$
(14)

because only the length of the vector increases by a factor of k. From the definition of u (Eq. (10)), this results in a decrease of the amplitudes of u, i.e.,

$$\begin{aligned} u_i' = \frac{1}{\sqrt{k}}u_i. \end{aligned}$$
(15)

Combining Eqs. (12), (13), and (15), we find that \(E_i\) is unchanged when the problem is refined from N to kN cells (up to a logarithmic factor coming from M). This implies that the total error generated by the quantum sampling does not grow with the problem size N.

6 Numerical experiment

6.1 Numerical methods

The open-source classical CFD software SU2 [44,45,46] is selected as the base CFD solver. SU2 mainly implements finite volume solvers for unstructured grids and is highly configurable, supporting various governing equations and algorithms.

Concentrating on the implicit Euler solver, we inserted code to simulate the quantum error. This code is wrapped around the original linear solver to emulate the errors introduced by the quantum processes.

Here we ignore the error from the QLS itself and focus on simulating the error generated by the \(l_\infty \) tomography. As introduced in Appendix Sect. 1, the QLS time complexity grows only polylogarithmically with the inverse precision, which implies that we can set a sufficiently high solver precision without significantly slowing down this process. In fact, the quadratic dependence on the quantum error in the overall complexity (see the run-time analysis section) comes from the sampling process, namely the error setting of the \(l_\infty \) tomography.

After the linear equation is generated, we classically solve it to obtain the unnormalized solution. We then divide this vector by its \(l_2\) norm to emulate the output of the quantum linear solver and feed the normalized vector into a classical sampling algorithm. The algorithm samples the probability distribution \(C\epsilon ^{-2}\log N\) times for vector size N, error \(\epsilon \), and \(C=32\) (consistent with the original \(l_\infty \) algorithm). The sampled vector is finally multiplied by the above \(l_2\) norm to obtain the quantum-version solution. The rest of the computation uses this quantum-version solution instead of the classical one, while all other processes remain unchanged.
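
Schematically, the error-injection wrapper does the following. This is our own illustrative Python reimplementation of the procedure just described, not the actual code inserted into SU2 (which is C++), and `A_solve` is an assumed handle to the classical linear solver:

```python
import numpy as np

def emulate_quantum_solution(A_solve, R, eps, C=32, rng=None):
    """Replace the exact solve dU = A^{-1} R by an emulated quantum solution:
    normalize the classical solution, sample it C*log2(N)/eps^2 times, and
    rescale the sampled vector by the original l2 norm."""
    rng = rng or np.random.default_rng()
    dU = A_solve(R)                        # exact classical solve (stand-in for the QLS)
    norm = np.linalg.norm(dU)
    u = dU / norm
    M = max(1, int(C * np.log2(len(u)) / eps ** 2))
    p = u ** 2
    counts = rng.multinomial(M, p / p.sum())
    u_sampled = np.sign(u) * np.sqrt(counts / M)
    return norm * u_sampled                # quantum-version update used for time-stepping
```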

We chose a series of test cases, applied different quantum error settings, and recorded the convergence history and the flow field. By comparing the quantum-error-biased results with those of the classical solver, we study whether the quantum error damages the validity of the FVM.

Fig. 2

Results of the numerical experiment. The test case is the three-dimensional inviscid flow around the Onera M6 airfoil. a Pressure coefficient around the airfoil, calculated by the original classical solver. b The simulation result of the quantum approach with the error set to 1e-2; the quantum solver also solves the flow field correctly. c Convergence history of the test case for the quantum solver with different error settings, compared to the classical solver as the baseline. The error is set from 5e-2 to 1e-4; except for \(\epsilon \)=5e-2, all cases converge correctly, so the maximum stable error lies between 5e-2 and 1e-2. d Solution along the cross line at \(y=0.5d\) output by the classical solver and by the quantum solver with error 1e-2.

6.2 Inviscid transonic flow

We select the inviscid flow around the Onera M6 airfoil as the test case. The Onera M6 airfoil is a standard benchmark for CFD solvers. The problem is three-dimensional with 108396 grid points.

We focus on the comparison between the classical result and the quantum results with different error settings, from 5e-2 to 1e-4. The definition of the error settings is described above. A general view of the convergence history is shown in Fig. 2c. We find that the computation converges correctly for errors of 1e-2 and smaller and diverges quickly at 5e-2.

The comparison of the flow fields is shown in Fig. 2a and b, displayed as the pressure coefficient of the surface flow on the airfoil. The “lambda”-shaped shock wave on the airfoil is computed correctly by both the classical solver and the quantum-accelerated solver with a certain amount of error. In Fig. 2d, we pick a horizontal cross line at half-span (\(y=0.5d\), where d is the length of the wing), showing that the result with error 1e-2 matches the classical result correctly. We conclude that the result is correct as long as the computation converges, even with a quantum error close to the convergence threshold.

The convergence histories of the test cases imply an effective band of quantum error settings. One end of the band is the maximum stable error, i.e., the largest error setting at which the computation still converges. The other end is given by \(\epsilon ^{-2}<N\), the smallest error for which the quantum advantage is preserved. The quantum approach will outperform the classical one when the error is set within this band.

In this case, the maximum stable error is between 5e-2 and 1e-2, satisfying \(N\gg \epsilon ^{-2}\), which suggests the existence of quantum advantage in cases like this.

6.3 Turbulent viscous flow

The turbulent flow, in contrast to laminar flow, is a complicated type of flow motion. Due to its chaotic behavior, turbulence is hard to compute directly from the NS equations. Instead, Reynolds-averaged Navier–Stokes (RANS) equations are used to describe the turbulent flow, which provides approximate time-averaged solutions to the Navier–Stokes equations [47].

The finite volume method with the implicit Euler time-stepping scheme is also applied to the RANS equation. Similarly, the system evolves by solving a linear equation. SU2 code also supports setting RANS as the governing equation. We performed experiments again on the Onera M6 airfoil problem where the Reynolds number is set to 11.72E6.

Fig. 3

Results of the turbulent viscous flow test case (RANS, Onera M6 airfoil). a Convergence history with the quantum error set from 1e-2 to 1e-4 and the classical case. These cases terminate after reaching the convergence conditions or after 1000 steps. b The pressure coefficient on the cross line at \(y=0.5d\) of the airfoil, where d is the length of the wing. The green triangles and blue circles correspond to the 1e-2 and 5e-3 cases, respectively. The 5e-3 case matches the classical result correctly; the 1e-2 case matches at most places but deviates slightly at \(x>0.8\).

The convergence histories shown in Fig. 3a look different from those of the previous case. Here, the residual stops decreasing at some point, and a larger error leads to a larger stalled residual. We compare the converged results for 1e-2 and 5e-3, which correspond to the largest and second-largest stalled residuals among our results, as shown in Fig. 3b. The 5e-3 case, although it stalls at a much higher residual than the classical one, closely matches the classical solution. The 1e-2 case also matches well, except for \(x>0.8\).

These results show that our approach also applies to the RANS equations. We believe that, regardless of the type of governing equation, the quantum acceleration of the FVM has broad applicability: because the classical algorithms share a similar equation linearization and implicit time-stepping scheme, our approach can achieve the quantum speedup as long as the linear equation can be constructed efficiently.

Fig. 4

Results of the supersonic wedge. a Convergence history of quantum error set from 1e-5 to 1e-7 and the classical case. b The pressure coefficient of this case after convergence, where a shock wave appears at an angle over the wedge (i.e., the sudden change of pressure from the right bottom to left top). c The flow field with error 1e-6 after 100 iteration steps. d The flow field of the classical case after 100 iteration steps.

6.4 Inviscid supersonic flow

A supersonic compressible flow generates a shock wave characterized by a discontinuous change in pressure, density, and temperature. This phenomenon is captured by solving the NS equations with the FVM. As shown in Fig. 4, we calculate the two-dimensional supersonic flow over a wedge. The incoming flow is horizontal with Mach number 2.0, and the wedge angle is 10 degrees. After convergence to the steady state, the NS equations predict a shock at a certain angle above the wedge.

We compare the classical result with quantum results for errors of 1e-5, 1e-6 and 1e-7; the convergence histories are displayed in Fig. 4a. In Fig. 4b, we show the pressure coefficient over the domain with the quantum error set to 1e-6. The bottom-right area is the wedge. The discontinuous change from blue (low pressure) to orange (high pressure) forms an interface that is consistent with the theoretical prediction.

Fig. 5

Result of supersonic wedge with error set to 1e-5 after 100 iteration steps

In this case, we also evaluate the physical effect of the quantum error. Figures 4c and d show the flow fields after 100 steps for \(\epsilon \)=1e-6 and for the classical case, respectively. At this early stage of the computation, a small quantum error does not affect the solver’s ability to capture the properties of the fluid: the discontinuous interface still appears even when the error is close to the threshold (between 1e-5 and 1e-6). Instead, the quantum error generates random fluctuations in the otherwise uniform region away from the wedge, such as the noise pattern in the upper-left of the domain, and this becomes the main reason why the quantum error slows down convergence. We also plot the field after 100 steps of the 1e-5 case in Fig. 5; an error above the threshold hinders the capture of the shock wave. These results suggest possible optimizations of our proposal, including the following two strategies.

First, use different quantum errors at different stages. As mentioned in the error analysis section, the actual error carries an extra multiplicative factor proportional to the \(l_2\) norm of the residual vector, due to the normalization of the quantum state. At the beginning of the computation the residual is large, so a small quantum error should be applied; the system can then successfully capture the character of the flow. When the system is converging and the residual decreases, a larger quantum error can be set to perform the time-stepping rapidly. This strategy could potentially provide higher stability and faster convergence.

Second, treat different areas with different errors. The empty regions contribute strongly to the global error; however, they are not crucial for the correctness and stability of the computation. For example, if we are interested in a small area rather than the whole domain (as in the airfoil test cases), the quantum error can be set higher in the less important regions and lower in the area of interest.

7 Conclusion

This paper developed a quantum approach for accelerating the classical FVM and achieved exponential speedup in the time-stepping. The initial intuition is to use the quantum linear solver to replace the classical one. The main problem is that the quantum linear solver only accepts quantum inputs and outputs a quantum state; the conversion between classical and quantum data is the bottleneck, and this is what our paper mainly studied. We apply the QRAM to store the internal data and specially design a data structure for this problem. The most critical part is that we always maintain a residual sum tree corresponding to the physical variable. On the one hand, this tree enables efficient preparation of the quantum input; on the other hand, it can be efficiently updated throughout the time-stepping of the FVM. As a result, the time complexity of performing one first-order implicit Euler time step is \(\mathcal {O}\left( \frac{(s^3+\log N)s\kappa \log ^3 N}{\epsilon ^2}\mathrm {polylog}(s\kappa /\epsilon )\right) \). The time complexity depends logarithmically on the number of grid cells N, which is exponentially faster than the best classical dependence, O(N).

The quantum error is an additional parameter introduced by our approach, and the complexity has a quadratic dependence on it. We analyzed the effect of this error and concluded that a sufficiently small quantum error will not affect the correctness of the quantum-accelerated solver. Numerical experiments were conducted on various test cases to check the correctness of our approach. The error analysis and the numerical results imply that our approach can be practically applied to large-scale CFD problems and can demonstrate a quantum advantage.

Our future work will focus on optimizing the error setting for different stages and different computational regions. We believe that quantum computers will show their advantage in solving more complex CFD problems in the near future.