1 Introduction

For high-dimensional problems, data-sparse representation schemes make it possible to overcome the so-called curse of dimensionality and perform computations efficiently. Among many low-parametric formats, tensor decomposition methods appear to be the most promising for high-dimensional data; see the reviews [3, 32, 38] and the monograph [16]. The recently proposed tensor train (TT) format [39, 42, 44], also known in quantum chemistry as matrix product states (MPS) [54], combines the advantages of both the canonical and Tucker formats: the number of representation parameters grows only linearly with the dimension, and the approximation problem is stable and can be solved by algorithms based on the singular value decomposition (SVD). An additional advantage of the tensor train format can be gained if dimensions with large mode size are substituted by a larger number of dimensions with small mode sizes. If the small mode size equals two, it becomes ‘indivisible’, i.e., cannot be reduced further. We can say that such a mode represents a quant, or bit, of information and call this representation of a vector the quantized tensor train (QTT) format, following [31]. Similarly, in quantum computations a large data vector is represented by the entangled quantum state of several qubits (quantum bits), which are systems with two quantum states.

The QTT format is rank-structured and the storage size is governed by the QTT-ranks. The impressive approximation properties of the QTT format were discovered in [31] for a class of functions discretized on uniform grids. In particular, it was proven that the QTT-ranks of \(\exp(\alpha x)\), \(\sin(\alpha x)\), \(\cos(\alpha x)\), \(x^{p}\) are uniformly bounded with respect to the grid size. For the functions \(e^{-\alpha x^{2}}\), \(x^{\alpha}\), \(\frac{\sin x}{x}\), \(\frac{1}{x}\), etc., similar properties were found experimentally.

In this paper we propose algorithms for the discrete Fourier transform of one- and high-dimensional data represented or approximated in the QTT format. Our approach is based on the radix-2 recursion formula which reduces the Fourier transform to one of half size and lies behind the well-known Cooley-Tukey FFT algorithm [2, 9]. Each step of the proposed QTT-FFT algorithm includes an approximation to reduce the storage size of the intermediate vectors adaptively to the prescribed level of accuracy. The complexity of the m-dimensional n×⋯×n transform with \(n=2^d\) is bounded by \(\mathcal{O}(m d^{2} R^{3})\), where R is the maximum QTT-rank of the input, all intermediate vectors and the output of the QTT-FFT algorithm. For vectors with moderate R, the QTT-FFT algorithm has square-logarithmic scaling w.r.t. the total number of array entries and outperforms the Cooley-Tukey FFT algorithm, which has \(\mathcal {O}(n^{m}\log n)\) complexity. Given an arbitrary input vector, it is not possible to predict the value of R and to state whether the QTT-FFT algorithm is efficient until the computation is done. This problem is solved partially in [48], where the class of vectors with R=1 is fully described and it is also shown by numerical experiments that many vectors can be approximated by ones with moderate R. The QTT-ranks depend on the desired accuracy level, and transforms with lower accuracy are computed faster by the approximate QTT-FFT algorithm, in contrast to the exact FFT algorithm.

The QTT-FFT algorithm can be compared with the quantum Fourier transform (QFT) algorithm, widely utilized in quantum computations, such as eigenvalue estimation, order-finding, integer factorization, etc. A single operation in the quantum algorithm has an exponential performance, since it changes all \(2^d\) components of the vector describing the entangled state of a quantum system. Remarkably, the superfast QFT algorithm [6] requires \(\mathcal{O}(d^{2})\) quantum operations. The QTT-FFT algorithm has asymptotically the same complexity for fixed R, which explains the word superfast in the title of this paper.

The QTT-FFT can also be compared with the Sparse Fourier transform. In the proposed method we use the data-sparse rank-structured QTT format for data representation, instead of the pointwise sparsity of the Fourier image exploited in the Sparse Fourier transform. The QTT-FFT algorithm requires the QTT representation of the input vector, which can be computed from a small number of vector elements (samples) using the TT-ACA algorithm [49]. Our method is deterministic, while Sparse Fourier transform algorithms are usually heuristic, with certain randomness involved in the selection of samples and with a probabilistic estimation of the accuracy.

This paper is organized as follows. We start with an overview of the TT and QTT formats in Sect. 2. In Sect. 3 we present the Fourier transform algorithm for vectors in the QTT format. In Sect. 4 we explain how to keep the QTT-ranks moderate during this procedure, resulting in the QTT-FFT algorithm for the approximate computation of the Fourier transform in the QTT format. In Sect. 5 we consider real-valued transforms (convolution, cosine transform) and explain how to compute them using the QTT-FFT algorithm. In Sect. 6 we develop the multi-dimensional Fourier transform algorithm in the QTT format. In Sect. 7 we give numerical examples illustrating that the use of the QTT format relaxes the grid size constraints and allows high-resolution computations of Fourier images in higher dimensions without the ‘curse of dimensionality’. In particular, our approach allows us to compute one-dimensional and multi-dimensional Fourier images using \(n=2^{60}\) in 1D and \(n=2^{20}\) in 3D on a standard workstation, see Sects. 7.1 and 7.2. In Sect. 7.3 we compute the convolution transforms of data with strong cusps or singularities which occur in particular in quantum chemistry and require very fine grids. In Sect. 8 we compare our method with Sparse Fourier transform algorithms for exactly Fourier-sparse signals and signals with limited bandwidth.

2 Tensor Train Format

A tensor is an array with d indices (or modes)

$$\mathbf{X}= \bigl[x(k_1,\ldots,k_d)\bigr], \quad k_p = 0,\ldots,n_p-1, \quad p=1,\ldots,d. $$

The tensor train (TT) format [39, 42] for the tensor X reads

$$ x(k_1,k_2,\ldots,k_d) = X^{(1)}_{k_1} X^{(2)}_{k_2} \cdots X^{(d)}_{k_d}, $$
(1)

where each \(X^{(p)}_{k_{p}}\) is an \(r_{p-1}\times r_p\) matrix. Usually the border conditions \(r_0=r_d=1\) are imposed to make every entry \(x(k_1,\ldots,k_d)\) a scalar. However, larger \(r_0\) and \(r_d\) can be considered, and then every entry of a tensor \(\mathbf{X}=[x(k_1,\ldots,k_d)]\) becomes an \(r_0\times r_d\) matrix. The values \(r_0,\ldots,r_{d-1}\) are referred to as TT-ranks and characterize the separation properties of the tensor X. The three-dimensional arrays \(X^{(p)} = [X^{(p)}_{k_{p}}]\) are referred to as TT-cars.

Definition 1

[39]

The p-th unfolding of an \(n_1\times n_2\times\cdots\times n_d\) tensor \(\mathbf{X}=[x(k_1,\ldots,k_d)]\) is the \(n_1\cdots n_p \times n_{p+1}\cdots n_d\) matrix \(X^{\{p\}}=[x^{\{p\}}(s,t)]\) with the following elements

$$x^{\{p\}}(s,t) = x(k_1,\ldots,k_d), \quad s = k_1\cdots k_p, \quad t = k_{p+1}\cdots k_d. $$

We will write \(X^{\{p\}}=[x(k_1\cdots k_p, k_{p+1}\cdots k_d)]\), assuming that the comma separates the row and column multi-indices. Obviously, if (1) holds, then \(r_{p} \geq\operatorname{{rank}}X^{\{p\}}\). In fact, the minimal possible values are \(r_{p} = \operatorname{{rank}}X^{\{p\}}\).

Statement 1

[39, 42]

Each tensor \(\mathbf{X}=[x(k_1,\ldots,k_d)]\) can be represented by the TT format with TT-ranks equal to the ranks of its unfoldings,

$$ r_p = \operatorname{{rank}}X^{\{p\}} = \operatorname{{rank}}\bigl[x(k_1 \cdots k_p, k_{p+1} \cdots k_d)\bigr]. $$
(2)
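As a small illustration of Definition 1 and Eq. (2), the ranks of the unfoldings can be computed directly in NumPy. This is a sketch under the assumption that the tensor entries are linearized with \(k_1\) as the fastest-varying index (Fortran order); tt_ranks is a hypothetical helper, not part of any TT library.

    import numpy as np

    def tt_ranks(X):
        # TT-ranks by Statement 1: matrix ranks of the p-th unfoldings X^{p},
        # obtained by reshaping with k_1 as the fastest index
        dims = X.shape
        return [np.linalg.matrix_rank(
                    X.reshape(int(np.prod(dims[:p])), -1, order='F'))
                for p in range(1, len(dims))]

    # a separable (rank-one) 3-tensor has all TT-ranks equal to one
    x = np.einsum('i,j,k->ijk', np.arange(1, 3.), np.ones(3), np.arange(1, 5.))
    print(tt_ranks(x))  # [1, 1, 1]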

For many tensors, the unfoldings have large ranks, but can be approximated by low-rank matrices as follows

$$\big\| X^{\{p\}} - \tilde{X}^{\{p\}} \big\|_F \leq \varepsilon_p, \quad \operatorname{{rank}}\tilde{X}^{\{p\}} = r_p. $$

The minimum \(r_p\) which satisfies this condition is referred to as the \(\varepsilon_p\)-rank of \(X^{\{p\}}\).

Statement 2

[39, 42]

If the unfoldings \(X^{\{p\}}\) of a tensor \(\mathbf{X}\) have \(\varepsilon_p\)-ranks \(r_p\), then \(\mathbf{X}\) can be approximated by a tensor \(\tilde{\mathbf{X}}\) with TT-ranks \(r_p\) and the following accuracy

$$\|\mathbf{X}- \tilde{\mathbf{X}}\|_F \leq\sqrt{\sum_{p=1}^{d-1} \varepsilon_p^2}. $$

The TT format for \(\tilde{\mathbf{X}}\), i.e., the approximation of a given tensor X in the TT format with a prescribed accuracy ε, can be computed by a constructive SVD-based algorithm [42]. Here the Frobenius norm of a tensor is defined as follows

$$\|\mathbf{X}\|_F^2 \mathrel{\stackrel{\mathrm{def}}{=}}\sum_{k_1\cdots k_d} \big|x(k_1, \ldots,k_d)\big|^2. $$

In the following we will omit the subscript and write \(\|\cdot\|=\|\cdot\|_{F}\) for all vectors, matrices and tensors.

To apply the TT compression to low-dimensional data, the idea of quantization was proposed [31, 40]. We will explain the idea for a one-dimensional vector \(x = [x(k)]_{k=0}^{n-1}\), restricting the discussion to \(n=2^d\). Define the binary notation of the index k as follows

$$ k=\overline{k_1\cdots k_d} \mathrel{ \stackrel{\mathrm{def}}{=}}\sum_{p=1}^d k_p 2^{p-1}, \quad k_p=0,1. $$
(3)

The isomorphic mapping \(k\leftrightarrow(k_1,\ldots,k_d)\) allows us to reshape a vector \(x=[x(k)]\) into the d-tensor \(\dot{\mathbf{X}} =[\dot{x}(k_{1},\ldots,k_{d})]\). The TT format (1) for the latter is called the QTT format and reads

$$ x(k) = x(\overline{k_1 \cdots k_d}) = \dot{x}(k_1,\ldots,k_d) = X^{(1)}_{k_1} \cdots X^{(d)}_{k_d}. $$
(4)

This idea appears in [40] in the context of matrix approximation. In [31] the TT format applied after the quantization of indices was called the QTT format. The impressive properties of the QTT approximation motivate the development of vector and tensor transforms in the QTT format.

By (2), the QTT-rank \(r_p\) of a vector x of size \(n=2^d\) is not larger than the number of columns/rows in \(X^{\{p\}}\), i.e., \(1\leq r_p \leq2^{\min(p,d-p)}\).

Definition 2

We will call the vectors with QTT-ranks one the rank-one vectors and the vectors with QTT-ranks \(r_p=2^{\min(p,d-p)}\) the full-rank vectors.

A random tensor, like a random matrix, has full TT-ranks with probability one. In general, the QTT-ranks grow exponentially in d, making the QTT algorithms completely inefficient. Therefore, QTT methods are naturally limited to the class of problems where all data have moderate QTT-ranks. It is not possible yet to describe the whole class of such problems and vectors explicitly, but it is possible to justify the concept by a number of convincing examples. Some function-related examples have already been mentioned; more examples of (piecewise) smooth functions and functions with singularities that have a low-rank QTT representation can be found in [14, 36, 41]. The exponential function, discretized on a uniform grid, deserves special interest in this paper. It has the rank-one QTT representation [31]

$$ \exp(\alpha k) = \exp(\alpha \overline{k_1k_2 \cdots k_d}) = \exp (\alpha k_1) \exp(2 \alpha k_2) \cdots\exp\bigl(2^{d-1} \alpha k_d\bigr), $$
(5)

which plays a key role for the efficient Fourier transform algorithm in the QTT format.
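A quick numerical check of (5) in NumPy (a minimal sketch; the bit extraction follows the little-endian notation (3)):

    import numpy as np

    # check the rank-one QTT representation (5) of the exponential vector:
    # exp(alpha*k) factorizes over the bits k_p of k = sum_p k_p 2^{p-1}
    d, alpha = 10, 0.03
    k = np.arange(2 ** d)
    bits = (k[:, None] >> np.arange(d)) & 1            # bit k_p in column p-1
    cars = np.exp(alpha * bits * 2.0 ** np.arange(d))  # 1x1 cars exp(2^{p-1} alpha k_p)
    assert np.allclose(np.prod(cars, axis=1), np.exp(alpha * k))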

The QTT format can be applied not only to vectors, but also to matrices (see the TTM format in [40]). Consider a \(2^d\times2^d\) matrix \(A = [a(j,k)]_{j,k=0}^{2^{d}-1}\), use the binary notation (3) for the indices \(j=\overline {j_{1}\cdots j_{d}}\) and \(k=\overline{k_{1} \cdots k_{d}}\), then permute the binary indices and reshape A to the tensor \(\dot{\mathbf {A}}=[\dot{a}(j_{1}k_{1},j_{2}k_{2},\ldots,j_{d}k_{d})]\). Finally, apply the TT format to \(\dot{\mathbf {A}}\) as follows

$$ a(j,k) = a(\overline{j_1 \cdots j_d}, \overline{k_1 \cdots k_d}) = \dot{a}(j_1k_1, j_2k_2, \ldots, j_dk_d) = A_{j_1 k_1}^{(1)} A_{j_2 k_2}^{(2)} \cdots A_{j_d k_d}^{(d)}, $$
(6)

where each \(A^{(p)}_{j_{p}k_{p}}\) is an \(r_{p-1}\times r_p\) matrix. Merging the indices \(j_p\) and \(k_p\) into the pair-index \(i_p\), we obtain the d-tensor \(\dot{\mathbf {A}}=[\dot{a}(i_{1},\ldots,i_{d})]\) and can use its unfoldings to find the TT-ranks of (6). In Sect. 3.1 we show that the Fourier matrix is ‘not compressible’ using the QTT format, i.e., its QTT-ranks grow exponentially with d. Therefore, the efficient Fourier transform appears to be a nontrivial problem. In contrast, recent results on the explicit representation of the inverse Laplacian and related matrices [24] and the efficient convolution in the QTT format [25] are based on low-rank QTT decompositions of certain matrices and tensors. There are more examples of high-dimensional problems which were efficiently solved using the QTT approximation [4, 36, 37] as well as the H-Tucker format [15].

The TT/QTT algorithms are based on basic linear algebra procedures like summation, multiplication and rank truncation (tensor rounding) maintaining the compressed format, i.e., the full data array is never computed. A comprehensive list of basic operations is given in [42]. We will need the Hadamard (elementwise) product,

$$z = x \odot y, \quad\mbox{i.e.,} \quad z(k)=x(k)y(k), \quad k=0, \ldots,2^d-1. $$

If x and y are given in the QTT format (4), the QTT format for z is written as follows

$$ z(k) = z(\overline{k_1\cdots k_d}) = Z^{(1)}_{k_1} \cdots Z^{(d)}_{k_d}, \quad \quad Z^{(p)}_{k_p} = X^{(p)}_{k_p} \otimes Y^{(p)}_{k_p}, \quad p=1,\ldots,d, $$
(7)

where \(X \otimes Y\) denotes the Kronecker (tensor) product of matrices X and Y. The QTT-ranks of the Hadamard product z are the products of the corresponding QTT-ranks of x and y.
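In code, (7) amounts to one Kronecker product per car and per bit value. The sketch below uses the hypothetical helpers hadamard_qtt and qtt_entry, with cars stored as arrays of shape \((r_{p-1},2,r_p)\), and verifies the rank-one exponential case (5):

    import numpy as np

    def hadamard_qtt(x_cars, y_cars):
        # cars of z = x .* y are bitwise Kronecker products (7); ranks multiply
        return [np.stack([np.kron(X[:, k, :], Y[:, k, :]) for k in (0, 1)], axis=1)
                for X, Y in zip(x_cars, y_cars)]

    def qtt_entry(cars, k):
        # evaluate x(k) = X^(1)_{k_1} ... X^(d)_{k_d}, k_p = p-th bit of k
        v = np.eye(1)
        for p, X in enumerate(cars):
            v = v @ X[:, (k >> p) & 1, :]
        return v[0, 0]

    d = 8
    def exp_cars(a):
        # rank-one QTT of exp(a*k), cf. (5): car p holds exp(a * 2^p * k_p)
        return [np.exp(a * 2 ** p * np.arange(2)).reshape(1, 2, 1) for p in range(d)]

    z = hadamard_qtt(exp_cars(0.01), exp_cars(0.02))
    assert np.isclose(qtt_entry(z, 77), np.exp(0.03 * 77))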

3 Discrete Fourier Transform in One Dimension

For n=2d, the normalized discrete Fourier transform (DFT) reads

$$ y(j) = \frac{1}{2^{\frac{d}{2}}}\sum_{k=0}^{2^d-1} x(k) \omega_d^{j k}, \quad\omega_d = \exp \biggl(-\frac{2\pi\mathtt{i}}{2^d} \biggr), \quad\mathtt{i}^2=-1, $$
(8)

where \(F_{d} = \frac{1}{2^{\frac{d}{2}}} [\omega_{d}^{jk} ]_{j,k=0}^{2^{d}-1}\) is the unitary Fourier matrix. The inverse Fourier transform is written in the same way, with \(\omega_{d} = \exp (\frac{2\pi\mathtt{i}}{2^{d}} )\).

3.1 QTT Decomposition of the Fourier Matrix Has Full Ranks

Unlike many matrices used in scientific computing (see, e.g., [40]), the Fourier matrix cannot be compressed in the QTT format (6), i.e., its QTT-ranks grow exponentially with d. The proof is based on the following properties of the unfoldings of the Fourier matrix.

Lemma 1

For \(p\leq{\frac{d}{2}}\) the unfoldings \(F_{d}^{\{p\}}\) have orthogonal rows, for \(p\geq{\frac{d}{2}}\) they have orthogonal columns.

Proof

If \(p\leq{\frac{d}{2}}\), elements of the unfolding \(F^{\{p\}}_{d}\) are the following

$$f_d^{\{p\}}\bigl(j'k',j''k'' \bigr) = \frac{1}{2^{\frac{d}{2}}}\omega_d^{jk} = \frac {1}{2^{\frac{d}{2}}} \omega_d^{j'k'} \omega_{d-p}^{j'k''} \omega_{d-p}^{k'j''} \omega_{d-2p}^{j''k''}. $$

In the matrix notation,

$$F^{\{p\}}_d = 2^{\frac{d}{2}-p}\, \varOmega' (\varPhi\otimes\varPhi ) \varPi\, \varOmega'', $$

where Ω′ and Ω″ are unitary diagonal matrices collecting the factors \(\omega_d^{j'k'}\) and \(\omega_{d-2p}^{j''k''}\), Π is a permutation matrix interchanging the column indices, and Φ is the \(2^p\times2^{d-p}\) top submatrix of \(F_{d-p}\). Since \(F_{d-p}\) is unitary, the matrix Φ also has orthonormal rows, \(\varPhi\varPhi^* = I\), and the Gram matrix of \(F^{\{p\}}_{d}\) is written as follows

$$F^{\{p\}}_d \bigl(F^{\{p\}}_d \bigr)^* = 4^{\frac{d}{2}-p}\, \varOmega' \bigl(\varPhi\varPhi^* \otimes\varPhi\varPhi^* \bigr) \bigl(\varOmega' \bigr)^* = 2^{d-2p}\, I. $$

The statement for the case \(p\geq{\frac{d}{2}}\) is proved in the same way. □

The following theorem proves that a low-rank approximation of the Fourier matrix in the QTT format with reasonable accuracy is not possible.

Theorem 1

If the Fourier matrix \(F_d\) is approximated by a matrix A with relative accuracy \(\|F_d - A\| \leq\varepsilon \|F_d\|\), and A is given in the QTT format (6) with QTT-ranks \(r_1,\ldots,r_{d-1}\), then

$$r_p \geq\bigl(1 - \varepsilon^2 \bigr)\, 4^{\min(p,d-p)}, \quad p=1,\ldots,d-1. $$

Proof

From the assumption of the theorem it follows that for every p the unfolding matrix \(F^{\{p\}}_{d}\) is approximated by the rank-\(r_p\) matrix \(A^{\{p\}}\) as follows,

$$\big\|F^{\{p\}}_d - A^{\{p\}}\big\| \leq\varepsilon \big\|F^{\{p\}}_d \big\|, \quad A^{\{p\}} = \bigl[a\bigl(j'k', j''k''\bigr)\bigr], $$

where \(j'=\overline{j_{1}\cdots j_{p}}\), \(k'=\overline{k_{1}\cdots k_{p}}\), \(j''=\overline{j_{p+1}\cdots j_{d}}\), \(k''=\overline{k_{p+1}\cdots k_{d}}\). Denote by \(\tilde{F}^{\{p\}}_{d}\) the best rank-r p approximation of \(F^{\{p\}}_{d}\), then

$$\big\|F^{\{p\}}_d - \tilde{F}^{\{p\}}_d\big\| \leq \big\|F^{\{p\}}_d - A^{\{p\}}\big\|. $$

By Lemma 1, the \(4^p\times4^{d-p}\) unfolding \(F^{\{p\}}_{d}\) has orthogonal rows/columns and the accuracy of the best rank-\(r_p\) approximation is exactly the following

$$\big\|F^{\{p\}}_d - \tilde{F}^{\{p\}}_d\big\| = \biggl( 1- \frac{r_p}{4^{\min (p,d-p)}} \biggr)^{\frac{1}{2}} \big\|F^{\{p\}}_d \big\|. $$

We have \(1- \frac{r_{p}}{4^{\min(p,d-p)}} \leq\varepsilon^2 \), which completes the proof. □

Corollary 1

The QTT-ranks of the exact QTT-decomposition of the Fourier matrix are \(r_p=4^{\min(p,d-p)}\) and the storage size grows exponentially with d.

Remark 1

In contrast to the Fourier transform, the Hadamard transform matrix has QTT-ranks one, which follows directly from the definition,

$$H_d = \bigl[h(\overline{j_1\cdots j_d}, \overline{k_1\cdots k_d})\bigr], \quad h(j,k) = \prod_{p=1}^d \frac{(-1)^{j_pk_p}}{\sqrt{2}}, \quad H^{(p)}_{j_pk_p} = \frac{(-1)^{j_pk_p}}{\sqrt{2}}. $$

This transform, also known as the Walsh transform, arises in various applications, including quantum computing. The Hadamard transform can be easily applied to a vector given in the QTT format and does not change the QTT-ranks. The Fourier transform, in contrast, can arbitrarily increase the QTT-ranks of a vector.

3.2 Radix-2 Recursion Formula in the QTT Format

Since the Fourier matrix is not compressible in the QTT format, we cannot compute \(y=F_d x\) using the matrix-vector multiplication algorithm in the TT format [42]. Nevertheless, we can apply the Fourier transform to the QTT vector efficiently. We define

$$ y = F_d x, \quad y(j) = \frac{1}{2^{\frac{d}{2}}}\sum_{k=0}^{2^d-1} x(k) \omega_d^{jk}, $$

(9)

and split odd and even values of the result,

$$ y(2j') = \frac{1}{2^{\frac{d-1}{2}}}\sum_{k'=0}^{2^{d-1}-1} \frac{x(k')+x(k'+2^{d-1})}{\sqrt{2}}\, \omega_{d-1}^{j'k'}, \quad\quad y(2j'+1) = \frac{1}{2^{\frac{d-1}{2}}}\sum_{k'=0}^{2^{d-1}-1} \frac{x(k')-x(k'+2^{d-1})}{\sqrt{2}}\, \omega_d^{k'} \omega_{d-1}^{j'k'}. $$

We come to the well-known radix-2 recursion formula, the simplest and most common case of the Cooley-Tukey fast Fourier transform (FFT) algorithm [2, 9]. It reduces the full-size Fourier transform to the half-sized transforms as follows

$$ P_d F_d = \left [ \begin{array}{c@{\quad}c}F_{d-1} & \\[4pt] & F_{d-1} \end{array} \right ] \left [ \begin{array}{c@{\quad}c}I & \\[4pt] & \varOmega_{d-1} \end{array} \right ] \frac{1}{\sqrt{2}} \left [ \begin{array}{c@{\quad}c}I & I \\[4pt] I & -I \end{array} \right ]. $$
(10)

Here \(P_d\) is the bit-shift permutation which agglomerates even and odd elements of a vector, and \(\varOmega_{d-1} = \operatorname{{diag}}\{\omega_{d}^{k'}\}_{k'=0}^{2^{d-1}-1}\) is the matrix of twiddle factors. We will need the following general definitions later,

$$ (P_D y) (j') = y(2j'), \quad\quad (P_D y) \bigl(2^{D-1}+j'\bigr) = y(2j'+1), \quad\quad \varOmega_{D-1} = \operatorname{{diag}}\bigl\{\omega_{D}^{k'}\bigr\}_{k'=0}^{2^{D-1}-1}, $$

(11)

where \(j'=0,\ldots,2^{D-1}-1\) and \(D=1,\ldots,d\).

Our goal is to compute \(y=F_d x\) for the vector x given in the QTT format (4). First, we note that the “top” and “bottom” half-vectors of x are the following

$$ x'(\overline{k_1\cdots k_{d-1}}) = X^{(1)}_{k_1} \cdots X^{(d-1)}_{k_{d-1}} X^{(d)}_{0}, \quad\quad x''(\overline{k_1\cdots k_{d-1}}) = X^{(1)}_{k_1} \cdots X^{(d-1)}_{k_{d-1}} X^{(d)}_{1}, $$

(12)

and their summation/subtraction affects only the last car.

The multiplication by the diagonal matrix is written as the Hadamard multiplication

$$ \left [ \begin{array}{c@{\quad}c}I & \\[4pt] & \varOmega_{d-1} \end{array} \right ] \hat{x} = w_d \odot\hat{x}, \quad\quad \hat{x} \mathrel{\stackrel{\mathrm{def}}{=}}\frac{1}{\sqrt{2}} \left [ \begin{array}{c@{\quad}c}I & I \\[4pt] I & -I \end{array} \right ] x, $$

(13)

where the vector \(w_d=[w_d(k)]\) has the following rank-two QTT decomposition

$$w_d(k) = w_d(\overline{k_1 \cdots k_d}) = \left [ \begin{array}{c@{\quad}c} 1 & \omega_d^{k_1} \end{array} \right ] \left [ \begin{array}{c@{\quad}c} 1 & \\[4pt] & \omega_d^{2k_2} \end{array} \right ] \cdots \left [ \begin{array}{c@{\quad}c} 1 & \\[4pt] & \omega_d^{2^{d-2}k_{d-1}} \end{array} \right ] \left [ \begin{array}{c} 1-k_d \\[4pt] k_d \end{array} \right ]. $$

Therefore, the vector \(z = \operatorname{{diag}}(w_{d}) \hat{x} = w_{d} \odot\hat{x}\) has the following QTT decomposition

$$ z(\overline{k_1\cdots k_d}) = Z^{(1)}_{k_1} \cdots Z^{(d)}_{k_d}, \quad\quad Z^{(p)}_{k_p} = \hat{X}^{(p)}_{k_p} \otimes W^{(p)}_{k_p}, \quad p=1,\ldots,d, $$

(14)

where \(\hat{X}^{(p)}\) are the cars of \(\hat{x}\) and \(W^{(p)}\) are the rank-two cars of \(w_d\) written above.

Note that the multiplication by the twiddle factors doubles the QTT-ranks.

The last step to implement (10) is to apply the half-size Fourier transform to the “top” and “bottom” parts of the vector z. We consider the following \({\frac{n}{2}}\times 2r_{d-1}\) matrix

$$Z'=\bigl[z'(\overline{k_1\cdots k_{d-1}},\alpha)\bigr], \quad\mbox{where } z'( \overline{k_1\cdots k_{d-1}}, :) = Z^{(1)}_{k_1} \cdots Z^{(d-1)}_{k_{d-1}}, $$

and compute \(F_{d-1} Z'\) in the QTT format, applying the radix-2 recursion subsequently; the last car \(Z^{(d)}\) maps the columns of \(F_{d-1}Z'\) to the “top” and “bottom” halves of the result.

Each radix-2 step applies the bit-shift permutation to the result. It is easy to notice that \(P_1 P_2 \cdots P_d = R_d\), which is a bit-reverse permutation \((R_{d} y)(\overline{j_{d} j_{d-1} \cdots j_{1}}) = y(\overline{j_{1} \cdots j_{d}})\). It can be nicely implemented in the QTT format without any computations by reversing the order of cars in the tensor train,

$$ (R_d y) (\overline{j_1\cdots j_d}) = y(\overline{j_d \cdots j_1}) = \bigl(Y^{(d)}_{j_1}\bigr)^T \bigl(Y^{(d-1)}_{j_2}\bigr)^T \cdots\bigl(Y^{(1)}_{j_d}\bigr)^T, $$

(15)

where \(Y^{(p)}\) are the cars of y.

We add the bit-reverse step and summarize all the above in the QTT-FFT Algorithm 1.

Algorithm 1 QTT-FFT, exact computation
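Since the listing of Algorithm 1 is given as a figure, we also sketch the exact QTT-FFT in NumPy, assembled from Eqs. (10)–(15). This is a hypothetical reference implementation for small d only; as discussed in Sect. 4, the ranks double at every level because no truncation is performed.

    import numpy as np

    def qtt_to_full(cars):
        # contract the train to the full vector; bit k_1 is least significant (3)
        v = np.ones((1, 1))
        for X in cars:
            v = np.einsum('ka,abc->bkc', v, X).reshape(-1, X.shape[2])
        return v.ravel()

    def qtt_fft_exact(cars):
        # Algorithm 1: radix-2 recursion (10) acting on the tensor train
        cars, d = [X.astype(complex) for X in cars], len(cars)
        for D in range(d, 0, -1):
            X0, X1 = cars[D-1][:, 0, :], cars[D-1][:, 1, :]
            # butterfly on the last active car: halves (12) are summed/subtracted
            cars[D-1] = np.stack([X0 + X1, X0 - X1], axis=1) / np.sqrt(2)
            if D >= 2:                          # twiddle factors (13)-(14)
                w = np.exp(-2j * np.pi / 2 ** D)              # omega_D
                for p in range(1, D + 1):       # rank-two cars W^(p) of w_D
                    if p == 1:
                        Wp = np.array([[[1, 1], [1, w]]])                 # (1,2,2)
                    elif p < D:
                        Wp = np.array([np.eye(2),
                                       np.diag([1, w ** 2 ** (p - 1)])]
                                      ).transpose(1, 0, 2)                # (2,2,2)
                    else:
                        Wp = np.array([[[1.], [0.]], [[0.], [1.]]])       # (2,2,1)
                    cars[p-1] = np.stack([np.kron(cars[p-1][:, k, :], Wp[:, k, :])
                                          for k in (0, 1)], axis=1)
        # bit-reverse (15): reverse car order and transpose the rank indices
        return [X.transpose(2, 1, 0) for X in reversed(cars)]

    d, rng = 6, np.random.default_rng(0)
    cars = ([rng.standard_normal((1, 2, 2))]
            + [rng.standard_normal((2, 2, 2)) for _ in range(d - 2)]
            + [rng.standard_normal((2, 2, 1))])
    x, y = qtt_to_full(cars), qtt_to_full(qtt_fft_exact(cars))
    assert np.allclose(y, np.fft.fft(x) / np.sqrt(2 ** d))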

4 Approximate Fourier Transform

We do not benefit from the use of the QTT format if the QTT-ranks are too large. Unfortunately, in Algorithm 1 the QTT-ranks grow by a factor of two each time we multiply by the twiddle factors on Line 4. After d steps they grow by a factor of \(2^d=n\), which makes the QTT representation completely ineffective. To perform computations efficiently, it is necessary to truncate the QTT-ranks, i.e., to approximate the result by the QTT format with smaller values of ranks.

4.1 TT-Rounding and TT-Orthogonalization

We recall the most important properties of the TT-rounding (also rank truncation or recompression) algorithm proposed in [42]. We will use it to approximate a vector x given in the QTT format (4) by vector \(\tilde{x}\) in the following QTT form,

$$ \tilde{x}(k) = \tilde{x}(\overline{k_1 \cdots k_d}) = \tilde{X}_{k_1}^{(1)} \cdots\tilde{X}_{k_d}^{(d)}, \quad\|\tilde{x}\| = \|x\|, \quad\|x - \tilde{x} \| \leq\varepsilon \| x\|, $$
(16)

where each \(\tilde{X}^{(p)}_{k_{p}}\) is an \(\tilde{r}_{p-1} \times\tilde{r}_{p}\) matrix, \(\tilde{r}_{0}=r_{0}\), \(\tilde{r}_{d}=r_{d}\), \(\tilde{r}_{p} \leq r_{p}\), \(p=1,\ldots,d-1\). The approximation can be done up to some desired accuracy level or by bounding the values of the QTT-ranks \(\tilde{r}_{p}\) from above. If we want to prescribe the relative accuracy ε in (16), the values of \(\tilde{r}_{p}\) are determined only in the course of the TT-rounding procedure and in general cannot be predicted. If the \(\tilde{r}_{p}\) are prescribed, then the approximation is quasi-optimal [42, 45], i.e.,

$$\|x-\tilde{x}\| \leq\sqrt{d-1} \big\|x - \tilde{x}^{\mathrm{best}}\big\|, $$

where \(\tilde{x}^{\mathrm{best}}\) is the best approximation of x with TT-ranks \(\tilde{r}_{p}\). However, the error \(\|x - \tilde{x}\|\) is generally not known in advance. The TT-rounding algorithm is based on QR and SVD algorithms for matrices, which are well established and included in many linear algebra packages (e.g. LAPACK). We illustrate the TT-rounding procedure in Fig. 1.

Fig. 1 The TT-rounding algorithm: (top) input; (bottom) output. The rectangles represent the QTT-cars, the sides of rectangles are proportional to the values of QTT-ranks

The key part of the TT-rounding algorithm is the TT-orthogonalization.

Definition 3

The TT-car X (p) is called left- or right-orthogonal if, respectively,

$$\sum_{k_p} \bigl(X^{(p)}_{k_p} \bigr)^* X^{(p)}_{k_p} = I \quad \mbox{or} \quad \sum _{k_p} X^{(p)}_{k_p} \bigl(X^{(p)}_{k_p} \bigr)^* = I. $$

If the TT-car \(X^{(p)}\) is written as an \(r_{p-1}\times n_p\times r_p\) tensor, the left-orthogonality implies that the \(r_{p-1}n_p\times r_p\) unfolding has orthogonal columns, and the right-orthogonality means that the \(r_{p-1}\times n_p r_p\) unfolding has orthogonal rows. A product of two left-orthogonal TT-cars is also a left-orthogonal array, as explained by the following statement.

Statement 3

[42]

Consider \(p\times q\) matrices \(A_i\), \(i=0,\ldots,m-1\), and \(q\times r\) matrices \(B_j\), \(j=0,\ldots,n-1\). If the 3-tensors \(A=[A_i]\) and \(B=[B_j]\) are left-orthogonal, i.e., \(\sum_{i} A_{i}^{*} A_{i} = I\) and \(\sum_{j} B_{j}^{*} B_{j} = I\), then the \(p\times mn\times r\) tensor \(C=[C_{ij}]\), \(C_{ij}=A_i B_j\), is also left-orthogonal,

$$\sum_{ij} C_{ij}^* C_{ij} = \sum_{ij} (A_i B_j)^* (A_i B_j) = \sum_j B_j^* \biggl( \sum_i A_i^* A_i \biggr) B_j = \sum _j B_j^* B_j = I. $$

The same statement holds for right-orthogonal cars. It can be also generalized to a larger number of subsequent TT-cars.

Definition 4

A sequence of TT-cars X (p),X (p+1),…,X (q−1),X (q) will be referred to as a subtrain.

Statement 4

[42] If all TT-cars of the subtrain \(X^{(1)},\ldots,X^{(p)}\) are left-orthogonal and \(r_0=1\), then the following \((n_1\cdots n_p)\times r_p\) matrix \(X'\) has orthogonal columns,

$$X'=\bigl[x'(\overline{k_1\cdots k_{p}},\alpha)\bigr], \quad\mbox{where } x'( \overline{k_1\cdots k_{p}}, :) = X^{(1)}_{k_1} \cdots X^{(p)}_{k_p}. $$

The left- or right-orthogonality of a particular subtrain can be achieved straightforwardly, using a matrix orthogonalization procedure such as QR. The algorithm is called TT-orthogonalization and is given in [42]. The ‘structured orthogonality’ of subtrains makes it possible to perturb the other cars and control the accuracy of the whole tensor. We illustrate this in Fig. 2. Given a vector x with QTT representation (4), we apply a left TT-orthogonalization to the subtrain \(X^{(1)}\cdots X^{(p)}\) and a right TT-orthogonalization to \(X^{(s)}\cdots X^{(d)}\). As a result, these cars are modified and we get another QTT representation for the same vector x, where the cars \(\hat{X}^{(1)},\ldots,\hat{X}^{(p-1)}\) are left-orthogonal and \(X^{(s+1)},\ldots,X^{(d)}\) are right-orthogonal. In the figures, left- and right-orthogonal cars are depicted by distinct symbols; a similar notation is used in [21].

Fig. 2 TT-orthogonalization. (a) Input QTT format for x; (b) TT-orthogonalization is applied to interfaces \(X^{(1)}\cdots X^{(p)}\) (left) and \(X^{(s)}\cdots X^{(d)}\) (right); (c) TT-rounding is applied to the subtrain \(\hat{X}^{(p)}_{k_{p}} X^{(p+1)}_{k_{p+1}} \cdots X^{(s-1)}_{k_{s-1}} \hat{X}^{(s)}_{k_{s}}\), giving \(\tilde{X}^{(p)}_{k_{p}} \tilde{X}^{(p+1)}_{k_{p+1}} \cdots\tilde{X}^{(s-1)}_{k_{s-1}} \tilde{X}^{(s)}_{k_{s}}\)

After the TT-orthogonalization is done, we apply the TT-truncation to the subtrain \(X^{(p)}\cdots X^{(s)}\) and reduce the TT-ranks \(r_p,\ldots,r_{s-1}\) to \(\tilde{r}_{p},\ldots,\tilde{r}_{s-1}\), introducing an approximation error or perturbation to the subtrain. The orthogonality of the left and right interfaces \(X^{(1)}\cdots X^{(p-1)}\) and \(X^{(s+1)}\cdots X^{(d)}\) guarantees that the same perturbation is introduced to the whole vector x.

Both the TT-rounding and TT-orthogonalization procedures for the QTT format have \(\mathcal{O}(dR^{3})\) complexity, where d is the length of the tensor train and \(R=\max r_p\) is the maximum TT-rank. Note that the TT-rounding procedure can generate an approximation with orthogonal cars. In [42] the TT-rounding generates a left-orthogonal tensor train, but we will assume that TT-rounding generates a right-orthogonal QTT approximation, as shown in Fig. 1.
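The structure of the TT-rounding procedure can be sketched in a few lines of NumPy (a simplified illustration of [42], not the authors' Fortran implementation; the threshold ε is distributed uniformly over the d−1 truncated cuts):

    import numpy as np

    def tt_round(cars, eps):
        # cars[p] has shape (r_{p-1}, n_p, r_p); returns a train with smaller ranks
        d = len(cars)
        cars = [X.copy() for X in cars]
        # right-to-left sweep: make cars 2..d right-orthogonal via QR of transposes
        for p in range(d - 1, 0, -1):
            r0, n, r1 = cars[p].shape
            Q, R = np.linalg.qr(cars[p].reshape(r0, n * r1).conj().T)
            cars[p] = Q.conj().T.reshape(-1, n, r1)
            cars[p - 1] = np.einsum('anb,bc->anc', cars[p - 1], R.conj().T)
        # the norm of the whole tensor now sits in the first car
        delta = eps * np.linalg.norm(cars[0]) / np.sqrt(max(d - 1, 1))
        # left-to-right sweep: truncated SVD per car, absorbing factors forward
        for p in range(d - 1):
            r0, n, r1 = cars[p].shape
            U, s, Vh = np.linalg.svd(cars[p].reshape(r0 * n, r1),
                                     full_matrices=False)
            err = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]   # err[r] = |s[r:]|
            keep = np.nonzero(err <= delta)[0]
            r = max(1, int(keep[0]) if keep.size else len(s))
            cars[p] = U[:, :r].reshape(r0, n, r)
            cars[p + 1] = np.einsum('ab,bnc->anc', s[:r, None] * Vh[:r],
                                    cars[p + 1])
        return cars

A left-orthogonal variant is obtained by mirroring the two sweeps; the cost is dominated by the QR and SVD calls, which gives the \(\mathcal{O}(dR^{3})\) complexity quoted above.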

4.2 Accuracy of the QTT-FFT Algorithm with Approximation

We include the TT-rounding step in Algorithm 1 and obtain the version of QTT-FFT with approximation, which we will simply call QTT-FFT in the following. It is given by Algorithm 2 and visualized in Fig. 3. Each step of Algorithm 2 includes an approximation which introduces an error to the active subtrain. In the following theorem we estimate the accuracy of the result returned by QTT-FFT.

Algorithm 2 QTT-FFT, approximation

Fig. 3 Visualization of the QTT-FFT Algorithm 2

Theorem 2

If the accuracy-based criterion ε is used on each approximation step in Algorithm 2, the accuracy of the result is the following

$$\|y - F_d x\| \leq d \varepsilon \|y\|. $$

Proof

In the first step, \(D=d\), of the algorithm it holds that \(y=F_d x_d\), where \(x_d=x\) is the input vector, and

$$P_d y = (I_1 \otimes F_{d-1} ) z_d, $$

where \(I_p\) denotes the \(2^p\times2^p\) identity matrix and \(z_d\) is defined by (14). The approximation step gives \(\|z_{d} - \tilde{z}_{d}\| \leq\varepsilon \| z_{d}\|\) and since \(P_d\) and \(I_1\otimes F_{d-1}\) are unitary matrices we have

$$ y = \tilde{y}_d + \mu_d, \quad \tilde{y}_d \mathrel{\stackrel{\mathrm{def}}{=}}P_d^T ( I_1 \otimes F_{d-1} ) \tilde{z}_d, \quad\| \mu_d\| \leq\varepsilon \|y\|, $$
(17)

where \(\tilde{y}_{d}\) is the approximation of the result and \(\mu_d\) is the error at step \(D=d\). At the next step the Fourier transform is applied to the \({\frac{n}{2}}\times r_{d-1}(\tilde{z}_{d})\) matrix \(x_{d-1}\), where \(r_{d-1}(\tilde{z}_{d})\) is the corresponding QTT-rank of \(\tilde{z}_{d}\). The matrix \(x_{d-1}\) has the following QTT decomposition

$$ x_{d-1}(\overline{k_1\cdots k_{d-1}}, :) = \tilde{Z}^{(1)}_{d,k_1} \cdots\tilde{Z}^{(d-1)}_{d,k_{d-1}}, $$

where \(\tilde{Z}^{(p)}_{d}\) denote the cars of \(\tilde{z}_d\).

If \(y_{d-1}=F_{d-1} x_{d-1}\) is computed, the approximation \(\tilde{y}_{d}\) is written in elementwise notation as follows

$$ \tilde{y}_d(\overline{j_1 \cdots j_d}) = y_{d-1}(\overline{j_2 \cdots j_d}) \tilde{Z}^{(d)}_{d,j_1}. $$
(18)

Note that the TT-car \(\tilde{Z}^{(d)}_{d}\) is right-orthogonal after the TT-truncation, see Fig. 3. This means that any perturbation introduced to \(y_{d-1}\) results in an error of the same norm introduced to \(\tilde{y}_{d}\). We will now estimate this error.

In the step \(D=d-1\) of the algorithm we compute the Fourier transform of \(x_{d-1}\) using the same radix-2 step. We define \(z_{d-1}\) by Lines 3 and 4 of Algorithm 1 and have

$$P_{d-1}y_{d-1} = (I_1 \otimes F_{d-2} ) z_{d-1}, $$

which is similar to what we had before. Approximation gives \(\|\tilde{z}_{d-1} - z_{d-1}\| \leq\varepsilon \| z_{d-1}\|\). This leads to

$$y_{d-1} = \tilde{y}_{d-1} + \mu_{d-1}, \quad \tilde{y}_{d-1} \mathrel{\stackrel{\mathrm{def}}{=}}P_{d-1}^T ( I_1 \otimes F_{d-2} ) \tilde{z}_{d-1}, \quad\| \mu_{d-1}\| \leq\varepsilon \|y\|. $$

Substituting this equation to (18) and (17), we have, in elementwise form,

$$ y(\overline{j_1\cdots j_d}) = \tilde{y}_{d-1}(\overline{j_2\cdots j_d})\, \tilde{Z}^{(d)}_{d,j_1} + \mu_{d-1}(\overline{j_2\cdots j_d})\, \tilde{Z}^{(d)}_{d,j_1} + \mu_d(\overline{j_1\cdots j_d}). $$

Since the TT-car \(\tilde{Z}^{(d)}_{d}\) is right-orthogonal, it does not change the Frobenius norm, and both error terms have a norm bounded by \(\varepsilon\|y\|\). It is easy to notice that every approximation step D introduces a perturbation \(\|\mu_D\| \leq\varepsilon\|y\|\), and since all TT-cars in the right subtrain \(\tilde{Z}^{(D+1)}_{D+1} \cdots\tilde{Z}^{(d)}_{d}\) are right-orthogonal, the perturbation of the result has the same norm. After d approximation steps of the algorithm, the total error is not larger than \(d\varepsilon\|y\|\). This completes the proof. □

Remark 2

The proof of Theorem 2 actually shows that the norm of the error of Algorithm 2 is not larger than the sum of the norms of errors introduced on each approximation step. If for a certain input vector all TT-truncation steps are exact, the result returned by Algorithm 2 is precise. Examples of such vectors are given in [48].

Theorem 2 holds for any input vector x. However, the QTT-ranks of the intermediate vectors \(x_D\), \(D=d,\ldots,1\), depend both on the accuracy level ε and on the properties of the vector x. It is not easy to describe explicitly the class of vectors x for which all \(x_D\) in Algorithm 2 will have QTT-ranks bounded by the desired value R for accuracy level ε. This problem is solved for \(R=1\), \(\varepsilon =0\) (QTT-rank-one vectors with QTT-rank-one Fourier images) in [48]. The numerical experiments provided in [48] show that the class of such vectors for larger R and non-zero ε can be ‘not so small’, even if ε is close to the machine precision. For instance, the Fourier transform of a random rank-one vector typically (in a large set of numerical tests) is approximated by a vector with moderate QTT-ranks.

4.3 Complexity Analysis

Now we estimate the asymptotic complexity of Algorithm 2 assuming that the QTT-ranks of the vectors \(x_D\) on all steps of the algorithm are bounded by R, as well as the QTT-ranks of the input and output.

Theorem 3

If on levels \(D=d,\ldots,1\) of Algorithm 2 the QTT-ranks of the intermediate vectors \(x_D\) are not larger than \(R_D\), the complexity is estimated as follows

$$ \mathop{\mathtt{work}}\leq\sum_{D=1}^d \mathcal{O}\bigl(D R_D^3\bigr) \leq \mathcal{O} \bigl(d^2 R^3\bigr), \quad\mbox {\textit{where} } R=\max R_D. $$
(19)

Proof

In step \(D=d,\ldots,1\) we do the following:

  1. alter the last block \(\hat{X}^{(D)}_{0,1}=\frac{1}{\sqrt{2}} (X^{(D)}_{0} \pm X^{(D)}_{1} )\) in \(\mathcal{O}(R_{D}^{2})\) operations;

  2. multiply the cars by the twiddle factors, see Eq. (14), in \(\mathcal{O}(R_{D}^{2})\) operations;

  3. compress the subtrain \(z_{D}(k) = Z^{(1)}_{k_{1}} \cdots Z^{(D)}_{k_{D}}\) using the TT-truncation algorithm [42] of \(\mathcal{O}(D R_{D}^{3})\) complexity.

By summation we come to (19). □

The complexity of the QTT-FFT algorithm is bounded by \(\mathcal {O}(d^{2} R^{3})\), where R depends on the input vector x and the accuracy ε of the approximation. It can be compared with the complexity of the superfast quantum Fourier transform [6], which requires \(\mathcal{O}(d^{2})\) quantum operations. In general (for arbitrary input vector x) the value of R can grow exponentially with d. If we use the ‘brute-force’ TT-truncation with the maximum rank criterion, the value of R does not grow and the complexity is \(\mathcal{O} (d^{2})\) for all input vectors, but the accuracy of the result can be arbitrarily bad. If the accuracy-based criterion is used, Theorem 2 guarantees the accuracy of the result, but the value of R and the complexity can be arbitrarily large. Therefore, for a class of all possible input vectors the accuracy and the complexity of the QTT-FFT are related (informally speaking) by the uncertainty principle: both cannot be small.

It is interesting (both theoretically and practically) to establish a special subclass of vectors x for which the QTT-FFT algorithm has both the square-logarithmic complexity and reasonable accuracy. This work has been started in [48]. More examples are provided by the numerical experiments in Sect. 7.

5 Real-Valued Transforms Using QTT-FFT Algorithm

Many problems are formulated in real-valued arithmetic but can be efficiently solved using the Fourier transform. For example, consider the discrete convolution of two real-valued vectors, defined as follows

$$f(j) = \sum_{k=0}^{n-1} h(j-k) g(k), \quad j=0,\ldots,n-1. $$

In matrix-vector form, \(f=Tg\), where \(T=[t(j,k)]_{j,k=0}^{n-1}\) is the Toeplitz matrix with elements \(t(j,k)=h(j-k)\). Each \(n\times n\) Toeplitz matrix is the leading submatrix of some \(2n\times2n\) circulant matrix

$$C = \bigl[c(j,k)\bigr]_{j,k=0}^{2n-1}, \quad\quad c(j,k) = \hat{c}\bigl((j-k) \bmod 2n\bigr). $$
To correspond with the matrix T, the elements of C are defined by

$$\left . \begin{array}{l@{\quad}l} \hat{c}(k) = h(k), & k=0,\ldots,n-1, \\[4pt] \hat{c}(n) & \mbox{is arbitrary}, \\[4pt] \hat{c}(k) = h(k-2n), & k=n+1,\ldots,2n-1. \end{array} \right . $$

Circulant matrices are diagonalized by the unitary Fourier matrix [10]:

$$C = F^* \varLambda F, \quad\varLambda= \sqrt{2n} \operatorname{{diag}}(F \hat{c}). $$

The multiplication by the Toeplitz matrix T can be performed as follows

$$ \left [\begin{array}{c} f \\[4pt] * \end{array} \right ] = F^* \varLambda F \left [\begin{array}{c} g \\[4pt] 0 \end{array} \right ] , $$
(20)

which involves three Fourier transforms of size 2n and can be done ‘at FFT speed’. The result is a complex-valued vector representing the real-valued convolution. In standard computations we can easily take the real or imaginary part of a vector. For vectors or tensors represented in the QTT format such an operation is not straightforward. The algorithm is given by the constructive proof of the following theorem.
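Before turning to the QTT version, here is a full-size NumPy illustration of the circulant embedding (20) on a toy example (sizes and data are hypothetical; in the QTT setting the three FFTs are replaced by Algorithm 2):

    import numpy as np

    # multiply an n x n Toeplitz matrix T, t(j,k) = h(j-k), by g via a
    # 2n-point FFT of the circulant embedding (20)
    n = 8
    rng = np.random.default_rng(1)
    h = rng.standard_normal(2 * n - 1)       # h(-(n-1)) ... h(n-1)
    g = rng.standard_normal(n)
    # first column c of the circulant: c(k)=h(k) for k<n, c(n) arbitrary,
    # c(k)=h(k-2n) for k>n
    c = np.concatenate([h[n - 1:], [0.0], h[:n - 1]])
    f = np.fft.ifft(np.fft.fft(c)
                    * np.fft.fft(np.concatenate([g, np.zeros(n)])))[:n]
    T = np.array([[h[j - k + n - 1] for k in range(n)] for j in range(n)])
    assert np.allclose(f.real, T @ g)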

Theorem 4

If the complex-valued tensor \(\mathbf{X}=[x(k_1,\ldots,k_d)]\) is represented by the tensor train format (1) with ranks \(r_0,r_1,\ldots,r_{d-1},r_d\), it can be represented by the following tensor train with TT-ranks \(r_0,2r_1,\ldots,2r_{d-1},r_d\), where all cars but one are real-valued,

$$ x(k_1,\ldots,k_d) = \hat{X}^{(1)}_{k_1} \hat{X}^{(2)}_{k_2} \cdots\hat{X}^{(d)}_{k_d}, \quad \hat{X}^{(1)}_{k_1} = \left [\begin{array}{c@{\quad}c}B_{k_1}^{(1)} & C_{k_1}^{(1)} \end{array} \right ], \quad \hat{X}^{(p)}_{k_p} = \left [\begin{array}{c@{\quad}c}B_{k_p}^{(p)} & C_{k_p}^{(p)} \\[4pt] -C_{k_p}^{(p)} & B_{k_p}^{(p)} \end{array} \right ], \quad \hat{X}^{(d)}_{k_d} = \left [\begin{array}{c}B_{k_d}^{(d)} + \mathtt{i}C_{k_d}^{(d)} \\[4pt] \mathtt{i}B_{k_d}^{(d)} - C_{k_d}^{(d)} \end{array} \right ], $$

(21)

where \(p=2,\ldots,d-1\) and \(B_{k_q}^{(q)}=\Re X_{k_q}^{(q)}\), \(C_{k_q}^{(q)} = \Im X_{k_q}^{(q)}\) for \(q=1,\ldots,d\).

Proof

Start from

$$X^{(1)}_{k_1} = \left [\begin{array}{c@{\quad}c}B_{k_1}^{(1)} & C_{k_1}^{(1)} \end{array} \right ] \left [\begin{array}{c}I \\[4pt] \mathtt{i}I \end{array} \right ] , $$

where \(I\) denotes the \(r_1\times r_1\) identity matrix. Define the real-valued car \(\hat{X}^{(1)}_{k_{1}} = [B_{k_{1}}^{(1)} \ \ C_{k_{1}}^{(1)} ] \) and multiply \([I \ \ \mathtt{i}I ]^T\) by the second car of the tensor train as follows

$$\left [\begin{array}{c}I \\[4pt] \mathtt{i}I \end{array} \right ] X^{(2)}_{k_2} = \left [\begin{array}{c@{\quad}c}B_{k_2}^{(2)} & C_{k_2}^{(2)} \\[4pt] -C_{k_2}^{(2)} & B_{k_2}^{(2)} \end{array} \right ] \left [\begin{array}{c}I \\[4pt] \mathtt{i}I \end{array} \right ] = \hat{X}^{(2)}_{k_2} \left [\begin{array}{c}I \\[4pt] \mathtt{i}I \end{array} \right ], $$

where in the right-hand side \(\hat{X}^{(2)}_{k_{2}}\) is the new real-valued car and \(I\) is the \(r_2\times r_2\) identity matrix. Continue the process and establish (21). □

Corollary 2

The representation (21) makes it possible to compute the real and imaginary parts of a tensor train (1) as follows

$$\Re x(k_1,\ldots,k_d) = \Biggl(\prod _{p=1}^{d-1}\hat{X}^{(p)}_{k_p} \Biggr) \Re\hat{X}^{(d)}_{k_d}, \quad\quad \Im x(k_1,\ldots,k_d) = \Biggl(\prod _{p=1}^{d-1}\hat{X}^{(p)}_{k_p} \Biggr) \Im\hat{X}^{(d)}_{k_d}. $$
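A sketch of the construction behind Theorem 4 and Corollary 2 (realify_tt and entry are hypothetical helpers; cars are stored as \((r_{p-1}, n_p, r_p)\) arrays and \(d\geq2\) is assumed):

    import numpy as np

    def realify_tt(cars):
        # rewrite a complex train so that all cars except the last are real (21)
        d, out = len(cars), []
        for p, X in enumerate(cars):
            B, C = X.real, X.imag
            if p == 0:
                out.append(np.concatenate([B, C], axis=2))           # [B  C]
            elif p < d - 1:
                out.append(np.concatenate(
                    [np.concatenate([B, C], axis=2),                 # [[ B, C],
                     np.concatenate([-C, B], axis=2)], axis=0))      #  [-C, B]]
            else:
                out.append(np.concatenate([B + 1j * C, 1j * B - C], axis=0))
        return out

    def entry(tt, idx):
        v = np.eye(1)
        for X, k in zip(tt, idx):
            v = v @ X[:, k, :]
        return v[0, 0]

    rng = np.random.default_rng(1)
    cars = [rng.standard_normal(s) + 1j * rng.standard_normal(s)
            for s in [(1, 2, 3), (3, 2, 3), (3, 2, 1)]]
    rt = realify_tt(cars)
    for idx in [(0, 1, 0), (1, 0, 1)]:
        assert np.isclose(entry(rt, idx), entry(cars, idx))   # Corollary 2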

Example 1

Obtain the QTT representation of the cosine and sine functions, discretized on a uniform grid, i.e., the vectors \([\cos\alpha k ]_{k=0}^{2^{d}-1}\) and \([\sin\alpha k ]_{k=0}^{2^{d}-1}\). They are the real and imaginary parts of the vector (5), which has the rank-one QTT representation with the following cars

$$X^{(p)}_{k_p} = \exp\bigl(\mathtt{i}2^{p-1}\alpha k_p\bigr) = \cos 2^{p-1}\alpha k_p + \mathtt{i} \sin2^{p-1}\alpha k_p. $$

We use (21) and form the real-valued cars \(\hat{X}^{(1)}_{k_{1}} = [ \cos\alpha k_{1} \ \ \sin\alpha k_{1} ] \) and

$$\hat{X}^{(p)}_{k_p} = \left [\begin{array}{c@{\quad}c} \cos2^{p-1} \alpha k_p & \sin2^{p-1} \alpha k_p \\[4pt] -\sin 2^{p-1} \alpha k_p & \cos2^{p-1} \alpha k_p \end{array} \right ] , \quad p=2,\ldots,d-1. $$

We result in

$$ \bigl[\cos\alpha k\bigr] = \hat{X}^{(1)}_{k_1} \cdots\hat{X}^{(d-1)}_{k_{d-1}} \left [\begin{array}{c} \cos2^{d-1}\alpha k_d \\[4pt] -\sin2^{d-1}\alpha k_d \end{array} \right ], \quad\quad \bigl[\sin\alpha k\bigr] = \hat{X}^{(1)}_{k_1} \cdots\hat{X}^{(d-1)}_{k_{d-1}} \left [\begin{array}{c} \sin2^{d-1}\alpha k_d \\[4pt] \cos2^{d-1}\alpha k_d \end{array} \right ]. $$

(22)

This is the simultaneous representation of the two considered vectors in a common QTT form. The QTT-ranks are two, which follows from the existence of the rank-one QTT representation of the exponential function and the simultaneous representation of the real and imaginary parts of a complex tensor train, ensured by Theorem 4.

Example 2

Given a vector x, compute the orthogonal DCT-II (discrete cosine even type-2) transform, defined as follows

$$y(j) = \sum_{k=0}^{n-1} x(k) \cos \biggl( \frac{\pi}{n}j\biggl(k+{\frac{1}{2}} \biggr) \biggr), \quad j=0,\ldots,n-1. $$

The history of this and related cosine transforms, including the applications to signal and image processing, can be found in a nice review [53]. The DCT-II can be computed through the Fourier transform of size 2n applied to the vector x expanded by zeros,

$$y(j) = \Re \Biggl[ \exp \biggl(-\frac{\pi\mathtt{i}}{2n}j \biggr) \sum _{k=0}^{2n-1} \hat{x}(k) \exp \biggl(- \frac{2\pi\mathtt {i}}{2n}jk \biggr) \Biggr], $$

where \(\hat{x}(0:n-1)=x\) and \(\hat{x}(n:2n-1)=0\). In computational practice, double-sized vectors and transforms are usually avoided for the sake of efficiency. However, in the QTT approach a double-sized vector/transform does not require double storage/cost. If \(n=2^d\) and x is given in the QTT format (4), the vector \(\hat{x}\) has the following QTT representation,

$$\hat{x}(k) = \hat{x}(\overline{k_1 \cdots k_d k_{d+1}}) = X^{(1)}_{k_1} \cdots X^{(d)}_{k_d} (1-k_{d+1}), $$

which has the same QTT-ranks as x. To compute y, we need to apply the QTT-FFT Algorithm 2 to find \(\hat{y} = F_{d+1} \hat{x}\), which requires \(\mathcal{O}((d+1)^{2} R^{3}) \) operations, asymptotically equal to the complexity of the transform of size \(2^d\). Then we need to consider the “top” half-vector of \(\hat{y}\) as explained by (12) and multiply the result pointwise by the exponential vector. These operations do not change the QTT-ranks. Finally, we apply Theorem 4 and obtain the real-valued cosine transform with QTT-ranks twice as large as those of the Fourier transform.
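The zero-padding recipe itself can be checked with a dense FFT (a small sketch; in the QTT setting np.fft.fft is replaced by Algorithm 2 applied to the padded train):

    import numpy as np

    # DCT-II via a 2n-point DFT of the zero-padded vector plus a twiddle factor
    n = 16
    x = np.random.default_rng(2).standard_normal(n)
    xh = np.concatenate([x, np.zeros(n)])
    y = (np.exp(-1j * np.pi * np.arange(n) / (2 * n))
         * np.fft.fft(xh)[:n]).real
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
    y_ref = (x * np.cos(np.pi / n * j * (k + 0.5))).sum(axis=1)
    assert np.allclose(y, y_ref)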

Finally, we should mention that the proposed approach to the computation of convolution and cosine transform can be generalized to other types of trigonometric transforms as well as to higher dimensions.

6 Discrete Fourier Transform in Higher-Dimensional Case

The m-dimensional normalized Fourier transform is defined as follows,

$$ y(j_1,\ldots,j_m) = \prod_{q=1}^{m} \frac{1}{2^{\frac{d_q}{2}}} \sum_{k_1=0}^{n_1-1}\cdots\sum_{k_m=0}^{n_m-1} x(k_1,\ldots,k_m)\, \omega_{d_1}^{j_1k_1}\cdots\omega_{d_m}^{j_mk_m}, \quad n_q = 2^{d_q}, $$

(23)

where \(j_q=0,\ldots,n_q-1\) for \(q=1,\ldots,m\). If \(x=[x(k_1,\ldots,k_m)]\) and \(y=[y(j_1,\ldots,j_m)]\) are considered as \(n_1\cdots n_m\) vectors, the m-dimensional Fourier transform is the Kronecker product of m one-dimensional transforms, i.e., \(F_{d_{1},\ldots,d_{m}} = F_{d_{m}} \otimes F_{d_{m-1}} \otimes \cdots\otimes F_{d_{2}} \otimes F_{d_{1}}\). Using the tensor train representation (1), we have

$$ y(j_1,\ldots,j_m) = \bigl(F_{d_1} X^{(1)}\bigr)_{j_1} \bigl(F_{d_2} X^{(2)}\bigr)_{j_2} \cdots\bigl(F_{d_m} X^{(m)}\bigr)_{j_m}, \quad\quad \bigl(F_{d_q} X^{(q)}\bigr)_{j_q} \mathrel{\stackrel{\mathrm{def}}{=}}\frac{1}{2^{\frac{d_q}{2}}} \sum_{k_q=0}^{n_q-1} X^{(q)}_{k_q}\, \omega_{d_q}^{j_qk_q}. $$

(24)

We see that due to the perfect separation of variables, the Fourier transform is applied independently to all TT-cars. The multidimensional Fourier transform does not change the TT-ranks. Therefore, Eq. (24) gives a straightforward algorithm to compute the m-dimensional Fourier transform of an n×n×⋯×n vector with \(\mathcal {O}(m R^{2} n\log n)\) complexity, where R is the maximal TT-rank of the vector.
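In code, (24) is a single line per car (a sketch; tt_fourier is a hypothetical helper acting on cars of shape \((r_{q-1}, n_q, r_q)\)):

    import numpy as np

    def tt_fourier(cars):
        # normalized DFT along each mode index; TT-ranks are unchanged (24)
        return [np.fft.fft(X, axis=1) / np.sqrt(X.shape[1]) for X in cars]

    # rank-one 2D check against the full transform
    a, b = np.random.rand(4), np.random.rand(8)
    Y = tt_fourier([a.reshape(1, -1, 1), b.reshape(1, -1, 1)])
    full = np.outer(Y[0][0, :, 0], Y[1][0, :, 0])
    assert np.allclose(full, np.fft.fft2(np.outer(a, b)) / np.sqrt(4 * 8))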

To reach square-logarithmic complexity (of course, for a special type of input data), we assume \(n_{q}=2^{d_{q}}\) and use the binary notation (3) for all mode indices. The input array is represented as follows,

$$ x\bigl(\overline{k^{(1)}_1\cdots k^{(1)}_{d_1}}, \ldots, \overline{k^{(m)}_1\cdots k^{(m)}_{d_m}}\bigr) = X^{(1,1)}_{k^{(1)}_1}\cdots X^{(1,d_1)}_{k^{(1)}_{d_1}}\; X^{(2,1)}_{k^{(2)}_1}\cdots X^{(m,d_m)}_{k^{(m)}_{d_m}}, $$

(25)

where \(k^{(q)}_p\) denotes the p-th bit of the q-th mode index and \(X^{(q,p)}\) is the corresponding QTT-car.

Then the QTT-FFT Algorithm 2 can be used to compute the unimodular Fourier transforms successively. The QTT-FFT introduces an error, controlled by Theorem 2. The appropriate TT-orthogonalization should be used for the interfaces to guarantee that this error will not be amplified in the whole m-dimensional vector. The multi-dimensional version of QTT-FFT is summarized by Algorithm 3 and visualized in Fig. 4. The accuracy and complexity are estimated similarly to the one-dimensional case.

Algorithm 3 Multi-dimensional QTT-FFT, approximation

Fig. 4 Visualization of the QTT-FFT Algorithm 3

Theorem 5

If the accuracy-based criterion ε is used on each approximation step in Algorithm  3, the accuracy of the result is the following

$$\|y - F_{d_1,\ldots,d_m} x\| \leq\sum_{q=1}^m d_q \varepsilon \|y\|. $$

Theorem 6

The complexity of Algorithm 3 for the m-dimensional transform of an n×n×⋯×n vector with \(n=2^d\) is not larger than \(\mathcal{O}(m d^{2} R^{3})\), where R is the maximum QTT-rank of all intermediate vectors at all steps of Algorithm 3 and all internal calls to Algorithm 2.

Proof

In Line 1 of Algorithm 3 the TT-orthogonalization of the tensor train with md cars is performed, which requires not more than \(\mathcal{O}(mdR^{3})\) operations. Then Algorithm 2 and the TT-orthogonalization are applied to m subtrains with d cars each, which costs not more than \(m\mathcal{O}(d^{2}R^{3})+m\mathcal{O}(d R^{3})\). The last step does not require any operations. Therefore, we obtain the \(\mathcal{O}(md^{2}R^{3})\) estimate, which completes the proof. □

Note that due to the bit-reverse permutation issues, the order of mode indices in the tensor train on output is reversed as follows,

$$ y\bigl(\overline{j^{(1)}_1\cdots j^{(1)}_{d_1}}, \ldots, \overline{j^{(m)}_1\cdots j^{(m)}_{d_m}}\bigr) = Y^{(m,1)}_{j^{(m)}_1}\cdots Y^{(m,d_m)}_{j^{(m)}_{d_m}}\; \cdots\; Y^{(1,1)}_{j^{(1)}_1}\cdots Y^{(1,d_1)}_{j^{(1)}_{d_1}}. $$

(26)

The reason is that every unimodular QTT transform is performed by Algorithm 2 with the bit-reverse permutation of the result. In the 1D case, the bit-reverse permutation is made without any computations by simple reordering of the transposed QTT-cars. If we apply this reordering to each unimodular subtrain individually, the TT-ranks (ranks separating the physical variables) will not be consistent. Instead, we reverse the tensor train globally and obtain the result where the order of bits in the QTT representation is correct, but the order of frequency modes is reversed compared with the original ordering of space modes, as shown in Fig. 4. Fortunately, a wide class of applications (e.g. convolution, solution of discretized PDEs) requires two successive Fourier transforms. In this case the second transform will recover the original ordering.

This drawback of the multi-dimensional QTT-FFT can be removed using a one-dimensional QTT-FFT algorithm based on the self-sorting FFT [20].

7 Numerical Examples

In this section we compare the accuracy and timings of the trigonometric transforms for one-, two- and three-dimensional functions with moderate QTT-ranks. From the proof of Theorem 3 we see that the complexity of the QTT-FFT depends on all QTT-ranks. The maximum QTT-rank used in (19) sometimes gives an incorrect impression of the “structure complexity” of a particular data array. To account for the distribution of all QTT-ranks \(r_0,\ldots,r_d\), we define the effective QTT-rank of a vector as the positive solution r of the quadratic equation \(\mathop{\mathtt{mem}}(r_0,r_1,r_2,\ldots,r_{d-1},r_d)= \mathop{\mathtt{mem}}(r_0,r,r,\ldots,r,r_d)\), where \(\mathop{\mathtt{mem}}\) measures the storage for the QTT format as follows

$$\mathop{\mathtt{mem}}(r_0,r_1,r_2, \ldots,r_{d-1},r_d) = r_0 r_1 + r_1 r_2 + \cdots+ r_{d-1} r_d. $$

The effective QTT-rank is generally non-integer.
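For reference, a small helper computing this quantity (a sketch; it assumes \(d\geq3\), so that the quadratic term is present):

    import numpy as np

    def effective_rank(ranks):
        # positive root of mem(r0, r, ..., r, rd) = mem(r0, r1, ..., rd),
        # where mem(r0,...,rd) = r0*r1 + r1*r2 + ... + r_{d-1}*r_d
        r0, rd, d = ranks[0], ranks[-1], len(ranks) - 1
        mem = sum(a * b for a, b in zip(ranks[:-1], ranks[1:]))
        a, b = d - 2, r0 + rd            # a*r^2 + b*r - mem = 0
        return (-b + np.sqrt(b * b + 4 * a * mem)) / (2 * a)

    print(effective_rank([1, 2, 4, 8, 4, 2, 1]))   # about 4.3, non-integer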

For the experiments we use one Xeon E5504 CPU at 2.0 GHz with 72 GB of memory at the Institute of Numerical Mathematics, Russian Academy of Sciences, Russia. The TT subroutines and QTT-FFT algorithms are implemented in Fortran 90 by the third author. We use Intel Fortran Composer XE version 12.0 (64 bit) and the BLAS/LAPACK and FFTW packages provided with the MKL library.

7.1 Fourier Images in 1D

For an integrable function f(t), the Fourier transform, or image, \(\hat{f}(\xi)\) is defined for every real ξ as follows

$$ \hat{f}(\xi) = \int_{-\infty}^{+\infty} f(t) \exp(-2\pi\mathtt {i}t\xi) dt. $$
(27)

We consider the rectangle pulse function, for which the Fourier transform is known,

$$ \varPi(t) = \left \{ \begin{array}{l@{\quad}l} 0, & \mbox{if } |t|>{\frac{1}{2}}\\[4pt] {\frac{1}{2}}, & \mbox{if } |t|={\frac{1}{2}}, \\[4pt] 1, & \mbox{if } |t|<{\frac{1}{2}}, \end{array} \right . \quad \hat{\varPi}(\xi) = \operatorname{{sinc}}(\xi) \mathrel{\stackrel { \mathrm{def}}{=}}\frac{\sin\pi\xi}{\pi\xi}. $$
(28)

The Fourier integral is approximated by the rectangle rule. Since f(t)=Π(t) is real and even, we write

$$ \hat{f}(\xi_j) = 2 \Re\int_0^{+\infty} f(t) \exp(-2\pi\mathtt {i}t\xi_j) dt \approx2 \Re\sum _{k=0}^{n-1} f(t_k) \exp(-2\pi \mathtt{i}t_k \xi_j) h_t, $$
(29)

where \(t_{k} = (k+{\frac{1}{2}}) h_{t}\), \(\xi_{j} = (j+{\frac{1}{2}}) h_{\xi}\) and \(k,j=0,\ldots,n-1\). If \(h_{t} h_{\xi}= {\frac{1}{n}}\) and \(n=2^d\), the sum in the right-hand side can be computed by the DFT as follows

$$ \hat{f} \approx\tilde{f} = 2^{{\frac{d}{2}}+1} h_t \Re ( \omega_{d+2} \varOmega_d F_d \varOmega_d f ), \quad \hat{f} = \bigl[\hat{f}(\xi_j) \bigr]_{j=0}^{2^d-1}, \quad f = \bigl[f(t_k) \bigr]_{k=0}^{2^d-1}. $$
(30)
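For moderate d, the whole formula (30) can be verified against the exact image \(\operatorname{{sinc}}(\xi)\) with a dense FFT (a sketch; for large d the factor \(F_d\) is replaced by the QTT-FFT as described below):

    import numpy as np

    d = 16
    n = 2 ** d
    h = 2.0 ** (-d / 2)                     # h_t = h_xi = 1/sqrt(n)
    t = (np.arange(n) + 0.5) * h
    xi = (np.arange(n) + 0.5) * h
    f = (t < 0.5).astype(float)             # Pi(t_k) on the positive half-line
    Om = np.exp(-2j * np.pi * np.arange(n) / (2 * n))   # Omega_d
    w = np.exp(-2j * np.pi / (4 * n))                   # omega_{d+2}
    fhat = 2 * h * (w * Om * np.fft.fft(Om * f)).real   # (30); fft = 2^{d/2} F_d
    err = np.abs(fhat - np.sinc(xi))[xi <= 16].max()
    print(err)   # decreases as O(h^2) with the grid refinement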

For \(h_{t}=h_{\xi}= \frac{1}{2^{\frac{d}{2}}}\) and d even, the QTT representation of the rectangular pulse has QTT-ranks one, i.e.,

$$\varPi(t_k) = \varPi\biggl(\frac{h}{2} + \overline{k_1 \cdots k_{{\frac{d}{2}}-1}} h + \overline{k_{{\frac{d}{2}}}\cdots k_d}/2 \biggr) = (1-k_{{\frac{d}{2}}}) \cdots(1-k_d). $$

We can use the QTT-FFT Algorithm 2 to compute the DFT in (30). According to Theorem 2, the relative accuracy of the result is ε if the tolerance parameter is set to \(\frac {\varepsilon }{d}\). The QTT-ranks and timings grow for smaller ε. In Table 1 we compare the runtime of Algorithm 2 for different ε with the runtime of the out-of-place FFT algorithm from the FFTW library. The timings are rather small and were obtained by averaging the computation time over a large number of executions. We see that for the considered example the QTT-FFT algorithm outperforms the FFT algorithm from the FFTW library for \(n\geq2^{20}\), even for very high accuracy. We also see that the use of moderate or low accuracy for QTT-FFT significantly reduces the QTT-ranks of the result and speeds up the computation. That is not the case for the standard FFT algorithm.

Table 1 Time for QTT-FFT (in milliseconds) w.r.t. size \(n=2^d\) and accuracy ε. Here \(\mathop{\mathtt{time}}_{\mathrm{QTT}}\) is the runtime of Algorithm 2, \(\mathop{\mathtt{time}}_{\mathrm{FFTW}}\) is the runtime of the FFT algorithm from the FFTW library, and \(\operatorname{{rank}} \hat{f}\) is the effective QTT-rank of the Fourier image

The accuracy of the rectangle rule (30) increases for smaller step size h. For a fixed frequency ξ we can expect \(\mathcal{O}(h^{2}) = \mathcal {O}(n^{-1})\) convergence, but for very large frequencies \(\xi_j\), \(j\gtrsim{\frac{n}{2}}\), Eq. (29) is not accurate since the period of the exponent \(\exp(-2\pi\mathtt{i} t\xi_j)\) is less than 2h. Therefore, we check the accuracy of (30) for subvectors corresponding to \(0\leq\xi_j\leq16\). The results are shown in Table 2, where

$$\mathop{\mathtt{acc}}= \| \hat{f} - \tilde{f} \|_{[0,16]} / \| \hat{f} \|_{[0,16]}, \quad \mbox{where } \| f \|^2_{[0,16]} = \sum_{j: 0 \leq\xi_j \leq16} \big|f(\xi_j)\big|^2. $$

The computations for \(n\leq2^{28}\) can be performed by FFTW or QTT-FFT. For \(n>2^{28}\) the one-processor subroutine from FFTW cannot be used due to storage limitations, and the data-sparse QTT format is necessary to increase the accuracy of the computed Fourier image. For \(d\leq28\) the accuracy of QTT-FFT was verified by comparison with FFTW, and the QTT-FFT error was always ‘under control’, i.e., less than the desired accuracy.

Table 2 Accuracy verification for the QTT-FFT Algorithm 2 and f=Π(t). The desired accuracy is \(\varepsilon =10^{-13}\), grid size \(h=2^{-{\frac{d}{2}}}\), and \(\mathop{\mathtt{acc}}\) denotes the relative accuracy in Frobenius norm of (30) for subvectors corresponding to \(0\leq\xi_j\leq16\)

7.2 Fourier Images in 3D

In higher dimensions the one-dimensional size of the array that can be processed by the standard FFT algorithm is severely restricted by computing resources. Therefore, the discretization accuracy obtained on simple workstations is often low, and multiprocessor parallel platforms have to be used for larger grids. As in the 1D case, the use of the data-structured QTT-FFT algorithm relaxes/removes the grid size restrictions and allows one to reach higher accuracy of the computation using one CPU. As an example, we compute the following m-dimensional Fourier image from [52],

$$ \hat{f}(\xi) = \int_{\|x\|\leq1} \bigl(1-\|x\|^2\bigr)^{\delta} \exp(-2\pi\mathtt{i}\, x\cdot\xi)\, dx = \frac{\Gamma(\delta+1)}{\pi^{\delta}}\, \frac{J_{\frac{m}{2}+\delta}(2\pi\|\xi\|)}{\|\xi\|^{\frac{m}{2}+\delta}}, $$

(31)

where Γ is the Gamma function, δ is an arbitrary real parameter, \(J_\alpha\) is the Bessel function of the first kind of order α, and \(x\cdot\xi\) denotes the scalar product. Introducing uniform tensor product n×n×⋯×n grids in the space and frequency domains with step sizes \(h_{x} = h_{\xi}= \frac {1}{\sqrt{n}}\) and \(n=2^d\), we approximate (31) by the rectangle quadrature rule. The result can be computed by the m-dimensional DFT.

We perform the experiments for the case m=3, where the DFT can be computed both by FFTW and QTT-FFT. The accuracy of Algorithm 3 is verified by comparison with the analytical answer (31). Since the comparison at all \(2^{dm}\) points is unfeasible, the accuracy \(\mathop{\mathtt{acc}}\) is measured in the relative Frobenius norm over a subset of the grid points.

The results are shown in Table 3. For d=8 the accuracy of QTT-FFT is verified also by comparison with the FFTW algorithm.

Table 3 Relative accuracy  acc of 3D Fourier image (31)

7.3 Convolution in 3D

In scientific computing the Fourier transform is often used for the computation of convolutions. As a motivating example, consider the evaluation of the Newton potential,

$$ V(x)=\frac{1}{4\pi}\int_{\mathbb{R}^3} \frac{f(y)}{\|x-y\|} dy, $$
(32)

which plays an important role in the electronic structure calculations based on the Hartree-Fock equation. In many chemical modelling programs, e.g., PC GAMESS and MOLPRO, the electron density function f(y) is represented as a sum of polynomially weighted 3D Gaussians, for which (32) is evaluated analytically. For larger molecules, the proper choice of the Gaussian-type orbitals (GTO) basis requires significant effort, and sometimes computations become infeasible due to the huge number of basis elements involved. A possible alternative is the discretization on n×n×n tensor grids using standard Galerkin/collocation schemes with piecewise constant or linear basis elements. The electron density function has strong cusps at the positions of the nuclei, which motivates the use of very fine grids. The tensor-structured methods lead to almost linear or even logarithmic complexity w.r.t. n, which makes the use of very fine grids and high accuracy of computations possible without a special choice of basis. In this way, fast tensor-product convolution with the Newton kernel based on the canonical or/and Tucker tensor decompositions has been developed in [30, 33, 50] and applied to the fully discrete grid-based solution of the Hartree-Fock equation in [26, 35]. These algorithms reduce the three-dimensional convolution to a number of one-dimensional ones, which are computed by FFT in \(\mathcal {O}(n\log n)\) operations. The QTT representation of the canonical vectors reduces the complexity to the logarithmic scale w.r.t. n, see [28]. The multidimensional convolution of QTT-structured data can also be computed directly using the analytical QTT decomposition of a multilevel Toeplitz matrix [25]. We refer also to [8, 12, 27, 29, 34, 43, 47, 51] for detailed discussions of recent progress in this field.

To discretize (32), we reduce the integration interval to \(B\mathrel{\stackrel{\mathrm{def}}{=}}[-{\frac{1}{2}},{\frac{1}{2}}]^{3}\), assuming that f(y) vanishes outside this cube, and introduce the uniform n×n×n tensor-product grid in B. We consider two discretization methods: collocation and Nyström-type. The collocation method approximates f(y) by a piecewise constant function and leads to

$$ V(x) \approx V_c(x) \mathrel{\stackrel{ \mathrm{def}}{=}}\frac {1}{4\pi} \sum_k f(y_k) \int_{B_k} \frac{dy}{\|x-y\|}, \quad\quad V(x_j) \approx v_c(j) \mathrel{\stackrel{ \mathrm{def}}{=}}V_c(x_j), $$
(33)

where \(x_{j} = (j+{\frac{1}{2}})h\), \(y_{k} = (k+{\frac{1}{2}})h\) (applied componentwise), \(B_k\) is the h×h×h cube centered at \(y_k\), \(j=(j_1,j_2,j_3)\), \(k=(k_1,k_2,k_3)\) and \(j_p,k_p=0,\ldots,n-1\) for p=1,2,3. As shown in [30], the pointwise accuracy of this approximation is

$$\big|V(x_j) - v_c(j)\big| \leq C h^2. $$

The Nyström-type convolution appears if we write (32) at the points \(x=x_j\) and then approximate the integral by the rectangle quadrature rule over B as follows

$$ V(x_j) \approx v_n(j) \mathrel{ \stackrel{\mathrm{def}}{=}}\frac {1}{4\pi} \sum_k f\bigl(y'_k\bigr) \frac {h^3}{\|x_j-y'_k\|}, $$
(34)

where \(y'_{k} = kh\) and j,k are defined as earlier. Since \(y'_{k} \neq x_{j}\) for all j and k, the right-hand side in this equation is well defined. The standard accuracy estimate of the rectangle rule is not applicable to singular functions, and we did not find an accuracy estimate of this method in the literature. Surprisingly, it can be proven that the accuracy of the Nyström-type convolution for the 3D Newton potential of smooth functions is estimated as

$$\big|V(x_j) - v_n(j)\big| \leq C h^2 |\log h|, $$

which is almost the same order of convergence as for the collocation method. We will report the proof elsewhere.

Note that the Nyström-type method ‘shifts’ the result, i.e., grids for f(y) and V(x) do not coincide. This can be a drawback for some applications. However, the coefficients of the convolution core in (34) are easier to compute than the ones in (33).

Both (33) and (34) represent a three-dimensional discrete convolution. By (20), it can be computed by three three-dimensional DFTs of size \((2n)^3\). This operation is highly restricted by the value of n, and for \(n\gtrsim2^{9}\) it is not feasible for full-sized vectors using a standard FFT algorithm on one processor. If f(y) has moderate QTT-ranks, the Newton potential can be efficiently computed using the QTT-FFT Algorithm 3. To do this, we need the QTT representation of the first column of the Newton potential matrix, which is different for the collocation and Nyström-type methods. A similar problem was discussed for the representation of the Newton/Yukawa potential in the canonical format [1], and was solved using the sinc-quadrature from [17], which expresses \(r^{-1}\) as a sum of Gaussian functions, i.e.,

$$ \frac{1}{r} \approx\sum_{q=1}^{n_q} c_q \exp\bigl(-t_q^2 r^2\bigr), $$

with suitable quadrature weights \(c_q\) and nodes \(t_q\). Gaussians are perfectly separable in the physical modes and the QTT representation of each term can be constructed as the Kronecker product of one-dimensional Gaussians, for which we can use the full-to-TT algorithm [40]. The QTT-ranks of a one-dimensional Gaussian are bounded by \(\mathcal {O}(\log^{{\frac{1}{2}}}\varepsilon^{-1})\) [5].
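The flavour of such quadratures can be reproduced in a few lines (a simple trapezoidal rule after an exponential substitution; this is an illustrative variant, not the exact rule [17, Eq. (5.3)]):

    import numpy as np

    # 1/r = (2/sqrt(pi)) * int exp(-r^2 e^{2s} + s) ds, discretized with step hs
    hs, M = 0.1, 300
    s = hs * np.arange(-M, M + 1)
    c = 2.0 / np.sqrt(np.pi) * hs * np.exp(s)   # weights c_q
    t2 = np.exp(2 * s)                          # exponents t_q^2
    r = np.logspace(-3, 1, 50)
    approx = (c[:, None] * np.exp(-t2[:, None] * r[None, :] ** 2)).sum(axis=0)
    print(np.abs(approx * r - 1).max())         # uniform relative accuracy on [1e-3, 10]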

The pointwise relative accuracy of the sinc-quadrature for different r is shown in Fig. 5(left). We see that for \(\frac{r_{\mathrm{max}}}{r_{\mathrm{min}}} \approx n \lesssim10^{4}\), the quadrature with \(n_q=500\) terms provides almost the machine precision. For \(n\leq10^{6}\), the quadrature with \(n_q=3600\) terms has the pointwise relative accuracy \(\varepsilon =10^{-13}\), which is enough for our purposes. Since the potential should be constructed only once (there is no adaptation of the grid, etc.), the large \(n_q\) is not restrictive. In Fig. 5(right) we show the QTT-ranks of the convolution cores (33) and (34) w.r.t. grid size n and relative accuracy level ε.

Fig. 5 (left) Pointwise relative accuracy of the sinc-quadrature [17, Eq. (5.3)] with \(n_q\) terms for \(r^{-1}\). (right) QTT-ranks of the first column of the Newton potential matrix for grid size n and relative accuracy ε. Solid lines—collocation method (33), dashed lines—Nyström-type method (34)

To check the accuracy and speed of the proposed method, we compute the Hartree potential (32) for the Gaussian charge density \(f=g_{0,\sigma}\) with different σ and compare the result with the analytical answer,

$$ g_{a,\sigma}(y) \mathrel{\stackrel{\mathrm{def}}{=}}\exp \biggl(-\frac{\|y-a\|^2}{2\sigma^2} \biggr), \quad\quad V(x) = \frac{(2\pi\sigma^2)^{\frac{3}{2}}}{4\pi}\, \frac{\operatorname{{erf}} \bigl(\frac{\|x-a\|}{\sqrt{2}\sigma} \bigr)}{\|x-a\|}. $$

(35)

Here a and σ are the center and width of the Gaussian and \(\operatorname{{erf}} (x)\mathrel{\stackrel{\mathrm{def}}{=}}\frac{2}{\sqrt{\pi}}\int_{0}^{x} \exp(-t^{2})dt\) is the error function. Accuracy and timings of the evaluation of (35) using the 3D convolution are shown in Fig. 6. The relative accuracy in the Frobenius norm is measured over one line of grid points, \(\{x_{j}\} \times \{x_{{\frac{n}{2}}}\} \times \{x_{{\frac{n}{2}}}\}\), \(j=0,\ldots,n-1\).

Fig. 6 Accuracy (left) and computation time (right) for the Newton potential of the Gaussian charge density (35) with width σ on the uniform tensor-product n×n×n grid. Solid lines—collocation method (33), dashed lines—Nyström-type method (34). Approximation accuracy for the QTT-FFT is \(\varepsilon =10^{-10}\)

It is interesting that in Fig. 6(left) there are three error plateaus which appear for different reasons. The right plateau for \(\sigma=10^{-1}\) appears at the level \(\mathop{\mathtt{acc}}\approx10^{-6}\) due to the error introduced by the truncation of the integral away from the cube B, since on the boundary the Gaussian does not vanish to machine zero, i.e., \(\exp (- \frac{x^{2}}{2 \sigma^{2}} ) |_{x={\frac{1}{2}}} \approx10^{-6}\). Gaussian charges with smaller σ represent stronger cusps and do not have this error visible. The right plateau for the Gaussian with \(\sigma=10^{-2}\) appears due to the approximation error in QTT-FFT, which is set to \(\varepsilon =10^{-10}\). Finally, the left plateaus for \(\sigma=10^{-3}\), \(\sigma=10^{-4}\) and \(\sigma=10^{-5}\) appear since these cusps cannot be properly described on coarse grids. Therefore, a precise evaluation of the Newton potential of such functions on uniform tensor product grids requires larger grid sizes and is feasible using the QTT approximation.

Timings for the evaluation of the 3D convolution using the QTT-FFT algorithm are very moderate and scale quadratically w.r.t. d, i.e., square-logarithmically w.r.t. the grid size, see Fig. 6(right).

7.4 Signals with Sparse Fourier Images

Consider a sum of R plane waves in m-dimensional space, defined as follows

$$ f(k) = \sum_{p=1}^R a_p \exp \biggl( \frac{2\pi\mathtt{i}}{n} f_p \cdot k \biggr), \quad f_p \in\mathbb{R}^m, \quad k \in \{0,\ldots,n-1 \}^m, \quad n=2^d. $$
(36)

Each plane wave has QTT-ranks one (5), which do not depend on the accuracy and vector size. The QTT-ranks of f = [f(k)] are therefore not larger than R. If the frequencies are integers, f_p ∈ ℤ^m, all intermediate vectors of Algorithm 3 also have QTT-ranks one [48]. Therefore, using plane waves we can construct a signal with prescribed QTT-ranks and prescribed complexity of the QTT-FFT algorithm, which allows us to compare the speed of the QTT-FFT and the standard FFT algorithm for different d and R. The results are given in Fig. 7.
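The rank-one QTT structure of a one-dimensional plane wave can be verified directly: with k = Σ_t k_t 2^t, the exponential factors over the bits, so each QTT core is a 1×2×1 block. A minimal numpy sketch (variable names are ours, for illustration):

```python
import numpy as np

d, f = 10, 37.0                     # grid exponent and frequency (may be real)
n = 2**d

# rank-one QTT cores of exp(2*pi*i*f*k/n): core t holds [1, exp(2*pi*i*f*2^t/n)]
cores = [np.array([1.0, np.exp(2j*np.pi*f*(2**t)/n)]) for t in range(d)]

# reassemble the full vector by Kronecker products, most significant bit first
v = np.ones(1)
for t in reversed(range(d)):
    v = np.kron(v, cores[t])

k = np.arange(n)
assert np.allclose(v, np.exp(2j*np.pi*f*k/n))   # exact rank-one reconstruction
```

A sum of R such rank-one terms has QTT-ranks at most R by subadditivity of the rank, which is the bound used above.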

Fig. 7
QTT-FFT and FFTW time for R plane waves in m dimensions. Graphs 1–3 show the runtime w.r.t. the problem size N = n^m for different R and m. Graph 4 shows the runtime w.r.t. R for m=1 (thick solid lines), m=2 (dashed lines), m=3 (thin gray solid lines)

We see that the QTT-FFT is asymptotically faster than the full-size FFT for all moderate R that we have tested. However, the crossover point, i.e., the minimum size of the transform for which the QTT-FFT outperforms the standard method, depends on R and m. For example, for R=8 the crossover point is N ≈ 2^{20} for m=1, N ≈ 2^{19} for m=2 and N ≈ 2^{18} for m=3. Therefore, the QTT-FFT algorithm performs better for high-dimensional problems than the standard FFT algorithm and is not restricted by the problem size. However, the QTT-FFT is not efficient if R is too large.

By Theorem 6, the complexity of the multi-dimensional QTT-FFT is not larger than \(\mathcal{O}(m d^{2} R^{3})\). Using signals (36) with different R, we can measure the actual runtime w.r.t. R. The results for signals (36) with a_p = 1 and random integer frequencies f_p ∈ ℤ^m are given by the fourth graph in Fig. 7. Surprisingly, the computational time grows quadratically, not cubically, w.r.t. the parameter R for large-size transforms. The time of smaller-size transforms (see N = 2^{12}) tends to a constant for large R, since the QTT-ranks of the vector (36) cannot exceed the QTT-ranks of a generic full-rank vector. This constant level is, however, significantly larger than the time of the corresponding transform of a dense vector, which shows that our current implementation of the QTT-FFT is by far not as optimized as the algorithms from the FFTW library. We also see that for transforms of the same size N = n^m, i.e., with the same log_2 N = m log_2 n = md, the transforms with larger m are faster, since the complexity is linear w.r.t. m and quadratic w.r.t. d.

Since the complexity of the QTT-FFT is defined by the parameters d, m and R of the input vector, the numerical results obtained for plane waves generalize to all vectors whose QTT-ranks are bounded by R at all steps of the Cooley-Tukey algorithm. Therefore, the runtimes given in Fig. 7 can be considered as 'typical' runtimes of the QTT-FFT algorithm for signals characterized by m, n and R.

7.5 QTT-FFT and Sparse Fourier Transform

The QTT representation is an example of a data-sparse representation, i.e., one where the number of representation parameters is asymptotically smaller than the total number of array elements. The sum of plane waves (36) is a particular example of a signal with a data-sparse QTT representation: the QTT-ranks of the sum of R plane waves do not exceed R. If f_p ∈ ℤ^m, this signal has no more than R non-zero Fourier components and is both QTT-sparse and Fourier-sparse.

The sparse Fourier transform problem is the following: given a signal of length n, estimate the R largest-in-magnitude coefficients of its Fourier transform. If we use all n elements of the given vector and the standard FFT algorithm, the answer can be computed trivially in \(\mathcal{O}(n \log n)\) operations, which is the complexity of both the FFT and the sorting algorithm. To reduce the complexity, heuristic algorithms have been developed which estimate the Fourier-sparse representation with a given number of terms R using a subset of samples (vector elements), usually selected with a certain randomization. Sparse Fourier transform algorithms are most efficient for vectors with o(n) non-zero Fourier components. Note that this class of vectors is not described explicitly, just like the class of 'suitable' vectors for the QTT-FFT algorithm. The Fourier-sparsity of a general input vector cannot be predicted before the Fourier transform is computed.
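The trivial dense baseline, for reference (numpy only; argpartition returns the unordered top-R set):

```python
import numpy as np

def top_R_fourier(signal, R):
    """O(n log n) baseline for the sparse Fourier transform problem:
    full FFT, then select the R largest coefficients in magnitude."""
    F = np.fft.fft(signal)
    idx = np.argpartition(np.abs(F), -R)[-R:]   # unordered indices of top R
    return idx, F[idx]
```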

To compare the QTT-FFT with the Sparse Fourier transform, we should first explain that the QTT-FFT actually solves a different problem: given a signal in the data-sparse QTT form, compute its Fourier transform in the same data-sparse format. The similarity that allows us to compare the QTT-FFT approach with the Sparse Fourier transform is the following: the class of QTT-sparse vectors includes all Fourier-sparse vectors, since each signal with exactly R non-zero Fourier coefficients has QTT-ranks bounded by R (but not vice versa). The differences are the following: the QTT-FFT algorithm requires the input to be given already in the QTT format, and the output is again in the QTT format, which is data-sparse, but not sparse. The two missing components are an algorithm that computes the QTT representation of a given signal from a set of samples (TT-cross), and an algorithm that converts the QTT format into a sparse vector (TT-sparse). This is illustrated by the scheme in Fig. 8.
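The stated rank bound is easy to check numerically. The sketch below (our illustrative code, not one of the paper's algorithms) builds a signal with exactly R non-zero Fourier coefficients and measures its QTT-ranks by successive SVDs, in the style of TT-SVD:

```python
import numpy as np

def qtt_ranks(v, d, tol=1e-10):
    """Numerical QTT-ranks of a vector of length 2^d via successive SVDs."""
    ranks, mat = [], v.reshape(1, -1)
    for _ in range(d - 1):
        mat = mat.reshape(mat.shape[0]*2, -1)      # split off one binary mode
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = int((s > tol*s[0]).sum())
        ranks.append(r)
        mat = s[:r, None] * vt[:r]                 # pass the remainder on
    return ranks

d, R = 12, 5
rng = np.random.default_rng(0)
F = np.zeros(2**d, dtype=complex)
F[rng.choice(2**d, size=R, replace=False)] = 1.0   # exactly R Fourier terms
v = np.fft.ifft(F)
print(max(qtt_ranks(v, d)))                        # bounded by R
```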

Fig. 8
Sparse Fourier transform and the QTT-FFT approach

Algorithms that reconstruct the tensor train format from samples of a given array are proposed in [45, 49]. They are based on the maximum-volume principle [11, 13], which selects proper sets of indices in the array for the interpolation procedure. The maximum-volume algorithm is heuristic, adaptive to the signal and originally non-random, but a certain randomness can be included to check the accuracy and/or improve the robustness. The extension of the maximum-volume concept to higher dimensions is still at an early stage, and “TT-cross” algorithms should be further improved, as well as the background theory.

The “TT-sparse” algorithms have not been considered previously, mostly because the data-sparse TT/QTT representations are “good enough” for numerical work and we usually do not need to convert them into the pointwise-sparse format. As a proof of concept, we can propose a very naive algorithm which sparsifies the TT-cores of (1) independently, \(A^{(p)}_{k_{p}} \to\tilde{A}^{(p)}_{k_{p}}\), where

$$ \tilde{A}^{(p)}_{k_p}(i,j) = \begin{cases} A^{(p)}_{k_p}(i,j), & \text{if } |A^{(p)}_{k_p}(i,j)| \geq \mu \max_{k_p} \max_{i,j} |A^{(p)}_{k_p}(i,j)|, \\ 0, & \text{otherwise}, \end{cases} $$
(37)

where μ is the threshold parameter. The sparsified TT format can be easily “decompressed” into a sparse vector.
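A direct implementation of (37) is a few lines; the sketch below assumes the cores are stored as three-dimensional arrays of shape (r_{p-1}, n_p, r_p):

```python
import numpy as np

def tt_sparsify(cores, mu):
    """Naive TT-core sparsification (37): zero every entry whose modulus is
    below mu times the largest entry (over k_p, i, j) of the same core."""
    out = []
    for A in cores:                   # A[i, k_p, j] = A^(p)_{k_p}(i, j)
        thr = mu * np.abs(A).max()
        out.append(np.where(np.abs(A) >= thr, A, 0.0))
    return out
```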

We show how the “TT-cross → QTT-FFT → TT-sparse” scheme works for a particular two-dimensional signal, a sum of five plane waves (36) with unit amplitudes and randomly taken frequencies. Since the frequencies are not integer, the discrete Fourier image of this signal (see Fig. 9(left)) is not sparse, but has strong cusps near the positions of f_p. We apply the TT-ACA algorithm [49] (a DMRG-type TT-cross algorithm) to reconstruct the QTT representation of the signal from a few samples. Then we apply Algorithm 3 and sparsify the DFT image by (37) with μ = 0.4. The result is shown in Fig. 9(right) and gives the correct positions of all five plane wave frequencies. This example shows that the QTT-FFT approach (the three-step scheme) can be helpful for problems where Sparse Fourier transform algorithms are applied.

Fig. 9
(Left) DFT image of a 2D data set with R=5 plane waves on a 2^d×2^d grid, d=9. The logarithm of the amplitude is shown in 256 levels of gray (black = maximum, white = minimum). (Right) The image after sparsification in the QTT format

8 Comparison with Sparse Fourier Transform

8.1 Comparison with Sparse Fourier Transform Algorithms

The history of the Sparse Fourier transform framework can be found in [18]. Since the Sparse Fourier transform is not a central topic of this paper, we will compare our approach only with recent algorithms developed in this field.

8.1.1 RAlSFA

The randomized heuristic algorithm RAlSFA [55] finds a near-optimal R-term Fourier-sparse representation of a signal of length n with probability 1−δ and quasi-optimality parameter α in time \(\operatorname{poly}(R) \log n \log(\frac{1}{\delta})/\alpha^{2}\), where the polynomial \(\operatorname{poly}(R)\) is empirically found to be quadratic. The complexity of the QTT-FFT for such an array is \(\mathcal{O}(R^{3} \log^{2} n)\), which is larger than the RAlSFA complexity by a factor of R log n. The program code is not available in the public domain, and we can compare the running times only using the published information about crossover points with FFTW. The crossover point between the RAlSFA and FFTW algorithms for R=8, some δ and α, and dimensions m=1,2,3 is reported as N ≃ 70000 for 1D transforms, N ≃ 900^2 for 2D transforms and (estimated) N ≃ 210^3 for 3D transforms. The crossover points of the QTT-FFT and FFTW for R=8 and m=1,2,3 are N ≃ 2^{20} for 1D problems, N ≃ 2^{19} for 2D problems and N ≃ 2^{18} for 3D problems, see Fig. 7. If we consider the runtime of the TT-ACA algorithm together with the QTT-FFT, the crossover point for 1D transforms moves to N = 2^{21}, see Fig. 10(left). For m=2,3 the runtime of TT-ACA also does not change the crossover point significantly.

Fig. 10
Runtimes of the TT-ACA + QTT-FFT algorithms (solid lines), the AAFFT algorithm (dashed lines) and FFTW (gray lines) for the plane waves example (36), m=1. (Left) Integer frequencies f_p ∈ ℤ, (right) real frequencies f_p ∈ ℝ, relative accuracy ε = 10^{-2}

We can conclude that our approach is slower than RAlSFA for 1D problems, has almost the same speed for 2D problems and outperforms it in higher dimensions. Another advantage of the QTT-FFT method is that it does not require any choice of parameters, while the parameter tuning in RAlSFA is quite tricky.

8.1.2 sFFT

A nearly optimal Sparse Fourier transform algorithm was recently proposed by the MIT group [18, 19]. The complexity is reported to be \(\mathcal{O}(R \log n \log(n/R))\), which is smaller than the complexity of the QTT-FFT by a factor of about R^2 for n ≫ R. The sFFT algorithm is faster than FFTW for n = 2^{22} and R ≤ 2^{17}. From Fig. 10(left) we see that the “TT-ACA + QTT-FFT” approach is competitive with FFTW for n = 2^{22} only for R ≲ 2^3, i.e., our approach is far less efficient for this problem. The program code for sFFT is currently not available in the public domain.

8.1.3 AAFFT

The combinatorial sublinear Fourier transform algorithm [23] provides the approximation with high probability and has \(\mathcal{O}(R \log^{5} n)\) complexity; the deterministic version requires \(\mathcal{O}(R^{2} \log^{4} n)\) operations. The complexity of the QTT-FFT is \(\mathcal{O}(R^{3} \log^{2}n)\), and the TT-ACA algorithm [49] requires \(\mathcal{O}(R^{3} \log n)\) operations per iteration/sweep. Therefore, we can expect AAFFT to be faster than our approach for larger R.

The program code of the AAFFT algorithm (written in C++) is available in the public domain [22] for the case m=1, which allows us to perform a numerical comparison. Note that the measured runtime includes the time of evaluating the signal at the selected positions (samples), i.e., the signal is not precomputed before the experiment. This allows us to use finer grids with n ≤ 2^{30}, but increases the runtime of the “sampling” part (TT-ACA and the corresponding part of AAFFT) by a factor of R. The results are reported in Fig. 10(left) in terms of CPU times. We see that our approach outperforms AAFFT for small n or very small R. For larger n and R, the AAFFT algorithm is faster than the “TT-ACA + QTT-FFT” scheme for the signal that is a sum of R one-dimensional plane waves (36) with integer frequencies f_p ∈ ℤ. A comparison in terms of the number of samples gives a similar result.

8.2 Comparison for Limited Bandwidth Signals

We have considered the signals (36) with integer frequencies, i.e., the exactly Fourier-sparse case. Such signals are the ‘best examples’ for Sparse Fourier transform algorithms. However, since our approach is applicable to a wider class of signals, it is interesting to provide a comparison also for signals which are not exactly Fourier-sparse. For instance, we consider the same plane waves example (36) with arbitrary real frequencies f_p ∈ ℝ^m. The Fourier images of such signals are not exactly sparse, see, for example, Fig. 9(left). The QTT-ranks of (36) are bounded by the number of terms R, but the QTT-ranks of the Fourier image (as well as the complexity of the QTT-FFT) are larger and depend on the selected accuracy level ε if f_p ∉ ℤ^m.

The number of Fourier terms required to represent the Fourier image of (36) in the sparse format with accuracy ε is o(n) but grows as ε decreases. We found that for signals (36) with random frequencies and unit amplitudes, a Sparse Fourier representation with about 32R terms provides the accuracy ε = 10^{-1}, and 1024R terms provide the accuracy ε = 10^{-2}. The corresponding runtimes for the accuracy ε = 10^{-2} are shown in Fig. 10(right). We see that the QTT-ranks and the runtime of our method grow very slowly as ε decreases, while the runtime of AAFFT is proportional to the number of Fourier terms required for the representation. Therefore, the sum of plane waves (36) with real frequencies is an example of a signal for which our approach computes the approximate Fourier image faster and more accurately than the AAFFT Sparse Fourier algorithm.

Another example is the signal consisting of three components with limited bandwidths,

$$ \hat{f}(\xi) = \exp \biggl(-\frac{\xi^2}{2\sigma^2} \biggr) + \frac{1}{10}\exp \biggl(-\frac{(\xi-\xi_*)^2}{2\sigma^2} \biggr) + \frac{1}{10}\exp \biggl(-\frac{(\xi+\xi_*)^2}{2\sigma^2} \biggr), $$
(38)

where σ = 0.02 and ξ_∗ = 0.4. The Fourier image is shown in Fig. 11(left). This signal models an AM (amplitude modulation) broadcast, where the amplitude of the central carrier signal is one order of magnitude larger than the amplitudes of the sidebands. It is known that the spectrum of the signal belongs to the interval [−1,1], and we can measure the signal in the time domain at the points t_k = hk on a uniform grid of size n = 2^d. Suppose that our goal is to locate the positions of the sidebands in frequency space using a small number of samples of the signal and/or the minimum runtime.
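For reproducibility, the time-domain samples can be generated analytically; under the transform convention f(t) = ∫ f̂(ξ) e^{2πiξt} dξ (an assumption on our side), the signal is a Gaussian envelope with a weak AM modulation:

```python
import numpy as np

SIGMA, XI_STAR = 0.02, 0.4

def f_time(t):
    """Time-domain form of (38) under the assumed convention
    f(t) = int fhat(xi) * exp(2*pi*i*xi*t) dxi."""
    envelope = np.sqrt(2.0*np.pi) * SIGMA * np.exp(-2.0*(np.pi*SIGMA*t)**2)
    return envelope * (1.0 + 0.2*np.cos(2.0*np.pi*XI_STAR*t))  # carrier + sidebands

# samples t_k = h*k on a uniform grid of size n = 2^d; h <= 1/2 suffices
# by Nyquist since the spectrum lies in [-1, 1]
d, h = 20, 0.5
samples = f_time(h * np.arange(2**d))
```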

Fig. 11
Example of a limited bandwidth signal. Left—Fourier image, right—positions of the frequencies of the R-term Fourier-sparse representation computed by AAFFT w.r.t. R

Since the signal has many Fourier components which are almost zero, the problem can naturally be considered within the Sparse Fourier transform framework, i.e., compute the R-term Fourier-sparse representation of the given signal and look at the positions of the computed frequencies. However, the number of non-vanishing Fourier components of such a signal is \(\mathcal{O}(n)\): it is n times the ratio of the bandwidths of the signal components to the whole interval on which the spectrum is considered. This makes the problem particularly difficult for Sparse Fourier transform algorithms and can increase their complexity to \(\mathcal{O}(n)\).

We try to locate the sidebands using the AAFFT algorithm with different R. Since the amplitude of the carrier signal is larger than the amplitudes of the sidebands, for R < R_∗ all components of the obtained R-term representation belong to the carrier signal. In other words, even the presence of the sidebands cannot be detected until we set R = R_∗ for a certain critical number of terms R_∗. This is shown in Fig. 11(right). Note that R_∗ grows with the signal size, since the frequency-domain grid step decreases and more Fourier-sparse components belong to the support of each bandwidth-limited component.

Our approach represents the signal in the QTT format (4), and a large number of non-vanishing components does not imply large QTT-ranks or a large complexity of the proposed method. In the experiment we observe that the QTT-ranks do not grow significantly w.r.t. n, and grow only moderately w.r.t. the accuracy parameter ε. This leads to the square-logarithmic complexity of TT-ACA and QTT-FFT, i.e., \(\mathcal{O}(R^{3} d^{2})\), where R is very moderate. The results are shown in Table 4. The sparse format with R = R_∗ Fourier terms approximates the given signal with relative accuracy ε ≈ 10^{-1} in the Frobenius norm. A much more accurate QTT approximation is possible and can be obtained in sublinear time.

Table 4 Comparison of AAFFT and ‘TT-ACA + QTT-FFT’ for the limited bandwidth signal (38) of length n = 2^d. ‘AAFFT’ columns: R_∗ is the minimal number of Fourier terms that allows detection of the sidebands, ‘time’ and ‘nval’ are the corresponding runtime (in milliseconds) and the number of samples, ε is the relative accuracy of the Fourier-sparse representation with R_∗ terms. ‘TT-ACA + QTT-FFT’ columns: ε is the desired accuracy, R is the maximum effective QTT-rank seen in the computation, ‘time’ and ‘nval’ are the corresponding runtime (in milliseconds) and the number of samples

The limitation considered here is a general feature of the Sparse Fourier transform framework and does not (in our opinion) concern the AAFFT algorithm specifically. We can conclude that for the considered limited bandwidth signal our method outperforms the Sparse Fourier transform algorithms, even for small n.

9 Conclusion and Future Work

We propose algorithms for the approximate Fourier transform of one- and multi-dimensional vectors in the QTT format. The m-dimensional Fourier transform of an n×⋯×n array with n = 2^d has \(\mathcal{O}(m d^{2} R^{3})\) complexity, where R is the maximum QTT-rank of the input vector, the result, and all intermediate vectors of the Cooley-Tukey algorithm. The storage size is bounded by \(\mathcal{O}(m d R^{2})\). The proposed approach is efficient for vectors with moderate R; for such vectors with large n and m it outperforms the \(\mathcal{O}(n^{m} \log n)\) FFT algorithm. The complexity w.r.t. d corresponds to the complexity of the superfast quantum Fourier transform, which requires \(\mathcal{O}(d^{2})\) quantum operations.

Using the QTT format, the 1D FFT of size n = 2^d for vectors with QTT-ranks R ≲ 10 can be computed for n ≤ 2^{60} in less than a second. The 3D FFT for an n×n×n array with about the same QTT-ranks can be computed for n ≤ 2^{20} in one second. This allows the use of very fine grids and thus an accurate computation of the Hartree potential of Gaussian functions with strong cusps and widths in the range from σ = 10^{-1} to 10^{-5} by 3D convolution. We measure the runtime of the algorithm applied to multidimensional functions (m = 1,2,3) with a small number of Fourier coefficients, which can be used to estimate the runtime for vectors with similar m, d and R. Notice that for m > 3 the computation of high-resolution FFT and convolution transforms is practically infeasible without the use of a certain approximation.

Combining the QTT-FFT algorithm with a method that computes the QTT representation from a few elements (samples) of a given vector, we compare it with Sparse Fourier transform algorithms. By numerical examples we show that our approach can be competitive with existing methods for Fourier-sparse signals with randomly distributed components, especially in higher dimensions. Our approach is particularly advantageous for limited bandwidth signals, which are not exactly Fourier-sparse. A more detailed comparison with modern Sparse Fourier transform algorithms, and links between the fast Fourier transform on hyperbolic cross points [7] and the QTT-FFT or QTT cross interpolation schemes, are postponed to future work.

The proposed method can be applied in various scientific computing solvers working with multidimensional data in data-sparse tensor formats. The discrete sine/cosine and other trigonometric transforms can be implemented using the corresponding recursion formulae, following the structure of the QTT-FFT algorithm. We hope that such algorithms can find applications in image and video processing, directly or with the use of the WTT version of the format, following the ideas of [46]. We also hope that the proposed method can help to construct classical models of several quantum algorithms based on the QFT, at least for a certain class of input data.