1 Introduction

For high-dimensional problems, data-sparse representation schemes make it possible to overcome the so-called curse of dimensionality and perform computations efficiently. Among many low-parametric formats, tensor decomposition methods appear to be the most promising for high-dimensional data; see the reviews [3, 32, 38] and the monograph [16]. The recently proposed tensor train (TT) format [39, 42, 44], also known in quantum chemistry as matrix product states (MPS) [54], combines the advantages of both the canonical and Tucker formats: the number of representation parameters grows only linearly with the dimension, and the approximation problem is stable and can be solved by algorithms based on the singular value decomposition (SVD). An additional advantage of the tensor train format can be gained if dimensions with large mode size are substituted by a larger number of dimensions with small mode sizes. If the small mode size equals two, it becomes ‘indivisible’, i.e., cannot be reduced further. We can say that such a mode represents a quant, or bit, of information and call this representation of a vector the quantized tensor train (QTT) format, following [31]. Similarly, in quantum computations a large data vector is represented by the entangled quantum state of several qubits (quantum bits), which are systems with two quantum states.

The QTT format is rank-structured and the storage size is governed by the QTT-ranks. The impressive approximation properties of the QTT format were discovered in [31] for a class of functions discretized on uniform grids. In particular, it was proven that the QTT-ranks of \(\exp(\alpha x)\), \(\sin(\alpha x)\), \(\cos(\alpha x)\), \(x^{p}\) are uniformly bounded with respect to the grid size. For the functions \(e^{-\alpha x^{2}}\), \(x^{\alpha}\), \(\frac{\sin x}{x}\), \(\frac{1}{x}\), etc., similar properties were found experimentally.

In this paper we propose algorithms for the discrete Fourier transform of one- and high-dimensional data represented or approximated in the QTT format. Our approach is based on the radix-2 recursion formula which reduces the Fourier transform to one of half size and lies behind the well-known Cooley-Tukey FFT algorithm [2, 9]. Each step of the proposed QTT-FFT algorithm includes an approximation to reduce the storage size of the intermediate vectors adaptively to the prescribed level of accuracy. The complexity of the m-dimensional n×⋯×n transform with \(n=2^d\) is bounded by \(\mathcal{O}(m d^{2} R^{3})\), where R is the maximum QTT-rank of the input, all intermediate vectors and the output of the QTT-FFT algorithm. For vectors with moderate R, the QTT-FFT algorithm has square-logarithmic scaling w.r.t. the total number of array entries and outperforms the Cooley-Tukey FFT algorithm, which has \(\mathcal {O}(n^{m}\log n)\) complexity. Given an arbitrary input vector, it is not possible to predict the value of R and to state whether the QTT-FFT algorithm is efficient until the computation is done. This problem is solved partially in [48], where the class of vectors with R=1 is fully described and it is also shown by numerical experiments that many vectors can be approximated by ones with moderate R. The QTT-ranks depend on the desired accuracy level, and transforms with lower accuracy are computed faster by the approximate QTT-FFT algorithm, in contrast to the exact FFT algorithm.

The QTT-FFT algorithm can be compared with the quantum Fourier transform (QFT) algorithm, widely utilized in quantum computations, such as eigenvalue estimation, order-finding, integer factorization, etc. A single operation in the quantum algorithm has an exponential performance, since it changes all \(2^d\) components of the vector describing the entangled state of a quantum system. Remarkably, the superfast QFT algorithm [6] requires \(\mathcal{O}(d^{2})\) quantum operations. The QTT-FFT algorithm has asymptotically the same complexity for fixed R, which explains the word superfast in the title of this paper.

The QTT-FFT can also be compared with the Sparse Fourier transform. In the proposed method we use the data-sparse rank-structured QTT format for data representation, instead of the pointwise sparsity of the Fourier image exploited in the Sparse Fourier transform. The QTT-FFT algorithm requires the QTT representation of the input vector, which can be computed from a small number of vector elements (samples) using the TT-ACA algorithm [49]. Our method is deterministic, while Sparse Fourier transform algorithms are usually heuristic, with certain randomness involved in the selection of samples and with a probabilistic estimation of the accuracy.

This paper is organized as follows. We start with an overview of the TT and QTT formats in Sect. 2. In Sect. 3 we present the Fourier transform algorithm for vectors in the QTT format. In Sect. 4 we explain how to keep the QTT-ranks moderate during this procedure, resulting in the QTT-FFT algorithm for the approximate computation of the Fourier transform in the QTT format. In Sect. 5 we consider real-valued transforms (convolution, cosine transform) and explain how to compute them using the QTT-FFT algorithm. In Sect. 6 we develop the multi-dimensional Fourier transform algorithm in the QTT format. In Sect. 7 we give numerical examples illustrating that the use of the QTT format relaxes the grid size constraints and allows high-resolution computations of Fourier images in higher dimensions without the ‘curse of dimensionality’. In particular, our approach allows us to compute one-dimensional and multi-dimensional Fourier images using \(n=2^{60}\) in 1D and \(n=2^{20}\) in 3D on a standard workstation, see Sects. 7.1 and 7.2. In Sect. 7.3 we compute the convolution transforms of data with strong cusps or singularities which occur in particular in quantum chemistry and require very fine grids. In Sect. 8 we compare our method with Sparse Fourier transform algorithms for exactly Fourier-sparse signals and signals with limited bandwidth.

2 Tensor Train Format

A tensor is an array with d indices (or modes)

$$\mathbf{X}= \bigl[x(k_1,\ldots,k_d)\bigr], \quad k_p = 0,\ldots,n_p-1, \quad p=1,\ldots,d. $$

The tensor train (TT) format [39, 42] for the tensor X reads

$$ x(k_1,k_2,\ldots,k_d) = X^{(1)}_{k_1} X^{(2)}_{k_2} \cdots X^{(d)}_{k_d}, $$
(1)

where each \(X^{(p)}_{k_{p}}\) is an \(r_{p-1}\times r_p\) matrix. Usually the border conditions \(r_0=r_d=1\) are imposed to make every entry \(x(k_1,\ldots,k_d)\) a scalar. However, larger \(r_0\) and \(r_d\) can be considered, and then every entry of a tensor \(\mathbf{X}=[x(k_1,\ldots,k_d)]\) becomes an \(r_0\times r_d\) matrix. The values \(r_0,\ldots,r_{d-1}\) are referred to as TT-ranks and characterize the separation properties of the tensor X. The three-dimensional arrays \(X^{(p)} = [X^{(p)}_{k_{p}}]\) are referred to as TT-cars.

Definition 1

[39]

The p-th unfolding of an \(n_1\times n_2\times\cdots\times n_d\) tensor \(\mathbf{X}=[x(k_1,\ldots,k_d)]\) is the \(n_1\cdots n_p \times n_{p+1}\cdots n_d\) matrix \(X^{\{p\}}=[x^{\{p\}}(s,t)]\) with the following elements

$$x^{\{p\}}(s,t) = x(k_1,\ldots,k_d), \quad s = k_1\cdots k_p, \quad t = k_{p+1}\cdots k_d. $$

We will write \(X^{\{p\}}=[x(k_1\cdots k_p, k_{p+1}\cdots k_d)]\), assuming that the comma separates the row and column multi-indices. Obviously, if (1) holds, then \(r_{p} \geq\operatorname{{rank}}X^{\{p\}}\). In fact, the minimal possible values are \(r_{p} = \operatorname{{rank}}X^{\{p\}}\).

Statement 1

[39, 42]

Each tensor \(\mathbf{X}=[x(k_1,\ldots,k_d)]\) can be represented by the TT format with TT-ranks equal to the ranks of its unfoldings,

$$ r_p = \operatorname{{rank}}X^{\{p\}} = \operatorname{{rank}}\bigl[x(k_1 \cdots k_p, k_{p+1} \cdots k_d)\bigr]. $$
(2)
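As a small illustration of Definition 1 and Eq. (2), the ranks of the unfoldings can be computed directly in NumPy. This is a sketch under the assumption that the tensor entries are linearized with \(k_1\) as the fastest-varying index (Fortran order); tt_ranks is a hypothetical helper, not part of any TT library.

    import numpy as np

    def tt_ranks(X):
        # TT-ranks by Statement 1: matrix ranks of the p-th unfoldings X^{p},
        # obtained by reshaping with k_1 as the fastest index
        dims = X.shape
        return [np.linalg.matrix_rank(
                    X.reshape(int(np.prod(dims[:p])), -1, order='F'))
                for p in range(1, len(dims))]

    # a separable (rank-one) 3-tensor has all TT-ranks equal to one
    x = np.einsum('i,j,k->ijk', np.arange(1, 3.), np.ones(3), np.arange(1, 5.))
    print(tt_ranks(x))  # [1, 1, 1]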

For many tensors, the unfoldings have large ranks, but can be approximated by low-rank matrices as follows

$$\big\| X^{\{p\}} - \tilde{X}^{\{p\}} \big\|_F \leq \varepsilon_p, \quad \operatorname{{rank}}\tilde{X}^{\{p\}} = r_p. $$

The minimum \(r_p\) which satisfies this condition is referred to as the \(\varepsilon_p\)-rank of \(X^{\{p\}}\).

Statement 2

[39, 42]

If the unfoldings \(X^{\{p\}}\) of a tensor \(\mathbf{X}\) have \(\varepsilon_p\)-ranks \(r_p\), then \(\mathbf{X}\) can be approximated by a tensor \(\tilde{\mathbf{X}}\) with TT-ranks \(r_p\) and the following accuracy

$$\|\mathbf{X}- \tilde{\mathbf{X}}\|_F \leq\sqrt{\sum_{p=1}^{d-1} \varepsilon_p^2}. $$

The TT format for \(\tilde{\mathbf{X}}\), i.e., the approximation of a given tensor X in the TT format with a prescribed accuracy ε, can be computed by a constructive SVD-based algorithm [42]. Here the Frobenius norm of a tensor is defined as follows

$$\|\mathbf{X}\|_F^2 \mathrel{\stackrel{\mathrm{def}}{=}}\sum_{k_1\cdots k_d} \big|x(k_1, \ldots,k_d)\big|^2. $$

In the following we will omit the subscript and write \(\|\cdot\|=\|\cdot\|_{F}\) for all vectors, matrices and tensors.

To apply the TT compression to low-dimensional data, the idea of quantization was proposed [31, 40]. We will explain the idea for a one-dimensional vector \(x = [x(k)]_{k=0}^{n-1}\), restricting the discussion to \(n=2^d\). Define the binary notation of the index k as follows

$$ k=\overline{k_1\cdots k_d} \mathrel{ \stackrel{\mathrm{def}}{=}}\sum_{p=1}^d k_p 2^{p-1}, \quad k_p=0,1. $$
(3)

The isomorphic mapping \(k\leftrightarrow(k_1,\ldots,k_d)\) allows us to reshape a vector \(x=[x(k)]\) into the d-tensor \(\dot{\mathbf{X}} =[\dot{x}(k_{1},\ldots,k_{d})]\). The TT format (1) for the latter is called the QTT format and reads

$$ x(k) = x(\overline{k_1 \cdots k_d}) = \dot{x}(k_1,\ldots,k_d) = X^{(1)}_{k_1} \cdots X^{(d)}_{k_d}. $$
(4)

This idea appears in [40] in the context of matrix approximation. In [31] the TT format applied after the quantization of indices was called the QTT format. The impressive properties of the QTT approximation motivate the development of vector and tensor transforms in the QTT format.

By (2), the QTT-rank \(r_p\) of a vector x of size \(n=2^d\) is not larger than the number of columns/rows in \(X^{\{p\}}\), i.e., \(1\leq r_p \leq2^{\min(p,d-p)}\).

Definition 2

We will call the vectors with QTT-ranks one the rank-one vectors and the vectors with QTT-ranks \(r_p=2^{\min(p,d-p)}\) the full-rank vectors.

A random tensor, like a random matrix, has full TT-ranks with probability one. In general, the QTT-ranks grow exponentially in d, making the QTT algorithms completely inefficient. Therefore, QTT methods are naturally limited to the class of problems where all data have moderate QTT-ranks. It is not possible yet to describe the whole class of such problems and vectors explicitly, but it is possible to justify the concept by a number of convincing examples. Some function-related examples have already been mentioned; more examples of (piecewise) smooth functions and functions with singularities that have a low-rank QTT representation can be found in [14, 36, 41]. The exponential function, discretized on a uniform grid, deserves special interest in this paper. It has the rank-one QTT representation [31]

$$ \exp(\alpha k) = \exp(\alpha \overline{k_1k_2 \cdots k_d}) = \exp (\alpha k_1) \exp(2 \alpha k_2) \cdots\exp\bigl(2^{d-1} \alpha k_d\bigr), $$
(5)

which plays a key role for the efficient Fourier transform algorithm in the QTT format.
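A quick numerical check of (5) in NumPy (a minimal sketch; the bit extraction follows the little-endian notation (3)):

    import numpy as np

    # check the rank-one QTT representation (5) of the exponential vector:
    # exp(alpha*k) factorizes over the bits k_p of k = sum_p k_p 2^{p-1}
    d, alpha = 10, 0.03
    k = np.arange(2 ** d)
    bits = (k[:, None] >> np.arange(d)) & 1            # bit k_p in column p-1
    cars = np.exp(alpha * bits * 2.0 ** np.arange(d))  # 1x1 cars exp(2^{p-1} alpha k_p)
    assert np.allclose(np.prod(cars, axis=1), np.exp(alpha * k))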

The QTT format can be applied not only to vectors, but also to matrices (see the TTM format in [40]). Consider a \(2^d\times2^d\) matrix \(A = [a(j,k)]_{j,k=0}^{2^{d}-1}\), use the binary notation (3) for the indices \(j=\overline {j_{1}\cdots j_{d}}\) and \(k=\overline{k_{1} \cdots k_{d}}\), then permute the binary indices and reshape A to the tensor \(\dot{\mathbf {A}}=[\dot{a}(j_{1}k_{1},j_{2}k_{2},\ldots,j_{d}k_{d})]\). Finally, apply the TT format to \(\dot{\mathbf {A}}\) as follows

$$ a(j,k) = a(\overline{j_1 \cdots j_d}, \overline{k_1 \cdots k_d}) = \dot{a}(j_1k_1, j_2k_2, \ldots, j_dk_d) = A_{j_1 k_1}^{(1)} A_{j_2 k_2}^{(2)} \cdots A_{j_d k_d}^{(d)}, $$
(6)

where each \(A^{(p)}_{j_{p}k_{p}}\) is an \(r_{p-1}\times r_p\) matrix. Merging the indices \(j_p\) and \(k_p\) into the pair-index \(i_p\), we obtain the d-tensor \(\dot{\mathbf {A}}=[\dot{a}(i_{1},\ldots,i_{d})]\) and can use its unfoldings to find the TT-ranks of (6). In Sect. 3.1 we show that the Fourier matrix is ‘not compressible’ using the QTT format, i.e., its QTT-ranks grow exponentially with d. Therefore, the efficient Fourier transform appears to be a nontrivial problem. In contrast, recent results on the explicit representation of the inverse Laplacian and related matrices [24] and the efficient convolution in the QTT format [25] are based on low-rank QTT decompositions of certain matrices and tensors. There are more examples of high-dimensional problems which were efficiently solved using the QTT approximation [4, 36, 37] as well as the H-Tucker format [15].

The TT/QTT algorithms are based on basic linear algebra procedures like summation, multiplication and rank truncation (tensor rounding) maintaining the compressed format, i.e., the full data array is never computed. A comprehensive list of basic operations is given in [42]. We will need the Hadamard (elementwise) product,

$$z = x \odot y, \quad\mbox{i.e.,} \quad z(k)=x(k)y(k), \quad k=0, \ldots,2^d-1. $$

If x and y are given in the QTT format (4), the QTT format for z is written as follows

$$ z(k) = z(\overline{k_1\cdots k_d}) = Z^{(1)}_{k_1} \cdots Z^{(d)}_{k_d}, \quad \quad Z^{(p)}_{k_p} = X^{(p)}_{k_p} \otimes Y^{(p)}_{k_p}, \quad p=1,\ldots,d, $$
(7)

where \(X \otimes Y\) denotes the Kronecker (tensor) product of matrices X and Y. The QTT-ranks of the Hadamard product z are the products of the corresponding QTT-ranks of x and y.
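In code, (7) amounts to one Kronecker product per car and per bit value. The sketch below uses the hypothetical helpers hadamard_qtt and qtt_entry, with cars stored as arrays of shape \((r_{p-1},2,r_p)\), and verifies the rank-one exponential case (5):

    import numpy as np

    def hadamard_qtt(x_cars, y_cars):
        # cars of z = x .* y are bitwise Kronecker products (7); ranks multiply
        return [np.stack([np.kron(X[:, k, :], Y[:, k, :]) for k in (0, 1)], axis=1)
                for X, Y in zip(x_cars, y_cars)]

    def qtt_entry(cars, k):
        # evaluate x(k) = X^(1)_{k_1} ... X^(d)_{k_d}, k_p = p-th bit of k
        v = np.eye(1)
        for p, X in enumerate(cars):
            v = v @ X[:, (k >> p) & 1, :]
        return v[0, 0]

    d = 8
    def exp_cars(a):
        # rank-one QTT of exp(a*k), cf. (5): car p holds exp(a * 2^p * k_p)
        return [np.exp(a * 2 ** p * np.arange(2)).reshape(1, 2, 1) for p in range(d)]

    z = hadamard_qtt(exp_cars(0.01), exp_cars(0.02))
    assert np.isclose(qtt_entry(z, 77), np.exp(0.03 * 77))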

3 Discrete Fourier Transform in One Dimension

For n=2d, the normalized discrete Fourier transform (DFT) reads

$$ y(j) = \frac{1}{2^{\frac{d}{2}}}\sum_{k=0}^{2^d-1} x(k) \omega_d^{j k}, \quad\omega_d = \exp \biggl(-\frac{2\pi\mathtt{i}}{2^d} \biggr), \quad\mathtt{i}^2=-1, $$
(8)

where \(F_{d} = \frac{1}{2^{\frac{d}{2}}} [\omega_{d}^{jk} ]_{j,k=0}^{2^{d}-1}\) is the unitary Fourier matrix. The inverse Fourier transform is written in the same way, with \(\omega_{d} = \exp (\frac{2\pi\mathtt{i}}{2^{d}} )\).

3.1 QTT Decomposition of the Fourier Matrix Has Full Ranks

Unlike many matrices used in scientific computing (see, e.g., [40]), the Fourier matrix cannot be compressed in the QTT format (6), i.e., its QTT-ranks grow exponentially with d. The proof is based on the following properties of the unfoldings of the Fourier matrix.

Lemma 1

For \(p\leq{\frac{d}{2}}\) the unfoldings \(F_{d}^{\{p\}}\) have orthogonal rows, for \(p\geq{\frac{d}{2}}\) they have orthogonal columns.

Proof

If \(p\leq{\frac{d}{2}}\), elements of the unfolding \(F^{\{p\}}_{d}\) are the following

$$f_d^{\{p\}}\bigl(j'k',j''k'' \bigr) = \frac{1}{2^{\frac{d}{2}}}\omega_d^{jk} = \frac {1}{2^{\frac{d}{2}}} \omega_d^{j'k'} \omega_{d-p}^{j'k''} \omega_{d-p}^{k'j''} \omega_{d-2p}^{j''k''}. $$

In the matrix notation,

$$F^{\{p\}}_d = 2^{\frac{d}{2}-p}\, \varOmega' (\varPhi\otimes\varPhi ) \varPi\, \varOmega'', $$

where Ω′ and Ω″ are unitary diagonal matrices collecting the factors \(\omega_d^{j'k'}\) and \(\omega_{d-2p}^{j''k''}\), Π is a permutation matrix interchanging the column indices, and Φ is the \(2^p\times2^{d-p}\) top submatrix of \(F_{d-p}\). Since \(F_{d-p}\) is unitary, the matrix Φ also has orthonormal rows, \(\varPhi\varPhi^* = I\), and the Gram matrix of \(F^{\{p\}}_{d}\) is written as follows

$$F^{\{p\}}_d \bigl(F^{\{p\}}_d \bigr)^* = 4^{\frac{d}{2}-p}\, \varOmega' \bigl(\varPhi\varPhi^* \otimes\varPhi\varPhi^* \bigr) \bigl(\varOmega' \bigr)^* = 2^{d-2p}\, I. $$

The statement for the case \(p\geq{\frac{d}{2}}\) is proved in the same way. □

The following theorem proves that a low-rank approximation of the Fourier matrix in the QTT format with reasonable accuracy is not possible.

Theorem 1

If the Fourier matrix \(F_d\) is approximated by a matrix A with relative accuracy \(\|F_d - A\| \leq\varepsilon \|F_d\|\), and A is given in the QTT format (6) with QTT-ranks \(r_1,\ldots,r_{d-1}\), then

$$r_p \geq\bigl(1 - \varepsilon^2 \bigr)\, 4^{\min(p,d-p)}, \quad p=1,\ldots,d-1. $$

Proof

From the assumption of the theorem it follows that for every p the unfolding matrix \(F^{\{p\}}_{d}\) is approximated by the rank-\(r_p\) matrix \(A^{\{p\}}\) as follows,

$$\big\|F^{\{p\}}_d - A^{\{p\}}\big\| \leq\varepsilon \big\|F^{\{p\}}_d \big\|, \quad A^{\{p\}} = \bigl[a\bigl(j'k', j''k''\bigr)\bigr], $$

where \(j'=\overline{j_{1}\cdots j_{p}}\), \(k'=\overline{k_{1}\cdots k_{p}}\), \(j''=\overline{j_{p+1}\cdots j_{d}}\), \(k''=\overline{k_{p+1}\cdots k_{d}}\). Denote by \(\tilde{F}^{\{p\}}_{d}\) the best rank-r p approximation of \(F^{\{p\}}_{d}\), then

$$\big\|F^{\{p\}}_d - \tilde{F}^{\{p\}}_d\big\| \leq \big\|F^{\{p\}}_d - A^{\{p\}}\big\|. $$

By Lemma 1, the \(4^p\times4^{d-p}\) unfolding \(F^{\{p\}}_{d}\) has orthogonal rows/columns and the accuracy of the best rank-\(r_p\) approximation is exactly the following

$$\big\|F^{\{p\}}_d - \tilde{F}^{\{p\}}_d\big\| = \biggl( 1- \frac{r_p}{4^{\min (p,d-p)}} \biggr)^{\frac{1}{2}} \big\|F^{\{p\}}_d \big\|. $$

We have \(1- \frac{r_{p}}{4^{\min(p,d-p)}} \leq\varepsilon^2 \), which completes the proof. □

Corollary 1

The QTT-ranks of the exact QTT-decomposition of the Fourier matrix are \(r_p=4^{\min(p,d-p)}\) and the storage size grows exponentially with d.

Remark 1

In contrast to the Fourier transform, the Hadamard transform matrix has QTT-ranks one, which follows directly from the definition,

$$H_d = \bigl[h(\overline{j_1\cdots j_d}, \overline{k_1\cdots k_d})\bigr], \quad h(j,k) = \prod_{p=1}^d \frac{(-1)^{j_pk_p}}{\sqrt{2}}, \quad H^{(p)}_{j_pk_p} = \frac{(-1)^{j_pk_p}}{\sqrt{2}}. $$

This transform, also known as the Walsh transform, arises in various applications, including quantum computing. The Hadamard transform can be easily applied to a vector given in the QTT format and does not change the QTT-ranks. The Fourier transform, in contrast, can arbitrarily increase the QTT-ranks of a vector.

3.2 Radix-2 Recursion Formula in the QTT Format

Since the Fourier matrix is not compressible in the QTT format, we cannot compute \(y=F_d x\) using the matrix-vector multiplication algorithm in the TT format [42]. Nevertheless, we can apply the Fourier transform to the QTT vector efficiently. We define

$$ y = F_d x, \quad y(j) = \frac{1}{2^{\frac{d}{2}}}\sum_{k=0}^{2^d-1} x(k) \omega_d^{jk}, $$

(9)

and split odd and even values of the result,

$$ y(2j') = \frac{1}{2^{\frac{d-1}{2}}}\sum_{k'=0}^{2^{d-1}-1} \frac{x(k')+x(k'+2^{d-1})}{\sqrt{2}}\, \omega_{d-1}^{j'k'}, \quad\quad y(2j'+1) = \frac{1}{2^{\frac{d-1}{2}}}\sum_{k'=0}^{2^{d-1}-1} \frac{x(k')-x(k'+2^{d-1})}{\sqrt{2}}\, \omega_d^{k'} \omega_{d-1}^{j'k'}. $$

We come to the well-known radix-2 recursion formula, the simplest and most common case of the Cooley-Tukey fast Fourier transform (FFT) algorithm [2, 9]. It reduces the full-size Fourier transform to the half-sized transforms as follows

$$ P_d F_d = \left [ \begin{array}{c@{\quad}c}F_{d-1} & \\[4pt] & F_{d-1} \end{array} \right ] \left [ \begin{array}{c@{\quad}c}I & \\[4pt] & \varOmega_{d-1} \end{array} \right ] \frac{1}{\sqrt{2}} \left [ \begin{array}{c@{\quad}c}I & I \\[4pt] I & -I \end{array} \right ]. $$
(10)

Here \(P_d\) is the bit-shift permutation which agglomerates even and odd elements of a vector, and \(\varOmega_{d-1} = \operatorname{{diag}}\{\omega_{d}^{k'}\}_{k'=0}^{2^{d-1}-1}\) is the matrix of twiddle factors. We will need the following general definitions later,

$$ (P_D y) (j') = y(2j'), \quad\quad (P_D y) \bigl(2^{D-1}+j'\bigr) = y(2j'+1), \quad\quad \varOmega_{D-1} = \operatorname{{diag}}\bigl\{\omega_{D}^{k'}\bigr\}_{k'=0}^{2^{D-1}-1}, $$

(11)

where \(j'=0,\ldots,2^{D-1}-1\) and \(D=1,\ldots,d\).

Our goal is to compute \(y=F_d x\) for the vector x given in the QTT format (4). First, we note that the “top” and “bottom” half-vectors of x are the following

$$ x'(\overline{k_1\cdots k_{d-1}}) = X^{(1)}_{k_1} \cdots X^{(d-1)}_{k_{d-1}} X^{(d)}_{0}, \quad\quad x''(\overline{k_1\cdots k_{d-1}}) = X^{(1)}_{k_1} \cdots X^{(d-1)}_{k_{d-1}} X^{(d)}_{1}, $$

(12)

and their summation/subtraction affects only the last car.

The multiplication by the diagonal matrix is written as the Hadamard multiplication

$$ \left [ \begin{array}{c@{\quad}c}I & \\[4pt] & \varOmega_{d-1} \end{array} \right ] \hat{x} = w_d \odot\hat{x}, \quad\quad \hat{x} \mathrel{\stackrel{\mathrm{def}}{=}}\frac{1}{\sqrt{2}} \left [ \begin{array}{c@{\quad}c}I & I \\[4pt] I & -I \end{array} \right ] x, $$

(13)

where the vector \(w_d=[w_d(k)]\) has the following rank-two QTT decomposition

$$w_d(k) = w_d(\overline{k_1 \cdots k_d}) = \left [ \begin{array}{c@{\quad}c} 1 & \omega_d^{k_1} \end{array} \right ] \left [ \begin{array}{c@{\quad}c} 1 & \\[4pt] & \omega_d^{2k_2} \end{array} \right ] \cdots \left [ \begin{array}{c@{\quad}c} 1 & \\[4pt] & \omega_d^{2^{d-2}k_{d-1}} \end{array} \right ] \left [ \begin{array}{c} 1-k_d \\[4pt] k_d \end{array} \right ]. $$

Therefore, the vector \(z = \operatorname{{diag}}(w_{d}) \hat{x} = w_{d} \odot\hat{x}\) has the following QTT decomposition

$$ z(\overline{k_1\cdots k_d}) = Z^{(1)}_{k_1} \cdots Z^{(d)}_{k_d}, \quad\quad Z^{(p)}_{k_p} = \hat{X}^{(p)}_{k_p} \otimes W^{(p)}_{k_p}, \quad p=1,\ldots,d, $$

(14)

where \(\hat{X}^{(p)}\) are the cars of \(\hat{x}\) and \(W^{(p)}\) are the rank-two cars of \(w_d\) written above.

Note that the multiplication by the twiddle factors doubles the QTT-ranks.

The last step to implement (10) is to apply the half-size Fourier transform to the “top” and “bottom” parts of the vector z. We consider the following \({\frac{n}{2}}\times 2r_{d-1}\) matrix

$$Z'=\bigl[z'(\overline{k_1\cdots k_{d-1}},\alpha)\bigr], \quad\mbox{where } z'( \overline{k_1\cdots k_{d-1}}, :) = Z^{(1)}_{k_1} \cdots Z^{(d-1)}_{k_{d-1}}, $$

and compute \(F_{d-1} Z'\) in the QTT format, applying the radix-2 recursion subsequently; the last car \(Z^{(d)}\) maps the columns of \(F_{d-1}Z'\) to the “top” and “bottom” halves of the result.

Each radix-2 step applies the bit-shift permutation to the result. It is easy to notice that \(P_1 P_2 \cdots P_d = R_d\), which is a bit-reverse permutation \((R_{d} y)(\overline{j_{d} j_{d-1} \cdots j_{1}}) = y(\overline{j_{1} \cdots j_{d}})\). It can be nicely implemented in the QTT format without any computations by reversing the order of cars in the tensor train,

$$ (R_d y) (\overline{j_1\cdots j_d}) = y(\overline{j_d \cdots j_1}) = \bigl(Y^{(d)}_{j_1}\bigr)^T \bigl(Y^{(d-1)}_{j_2}\bigr)^T \cdots\bigl(Y^{(1)}_{j_d}\bigr)^T, $$

(15)

where \(Y^{(p)}\) are the cars of y.

We add the bit-reverse step and summarize all the above in the QTT-FFT Algorithm 1.

Algorithm 1 QTT-FFT, exact computation
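Since the listing of Algorithm 1 is given as a figure, we also sketch the exact QTT-FFT in NumPy, assembled from Eqs. (10)–(15). This is a hypothetical reference implementation for small d only; as discussed in Sect. 4, the ranks double at every level because no truncation is performed.

    import numpy as np

    def qtt_to_full(cars):
        # contract the train to the full vector; bit k_1 is least significant (3)
        v = np.ones((1, 1))
        for X in cars:
            v = np.einsum('ka,abc->bkc', v, X).reshape(-1, X.shape[2])
        return v.ravel()

    def qtt_fft_exact(cars):
        # Algorithm 1: radix-2 recursion (10) acting on the tensor train
        cars, d = [X.astype(complex) for X in cars], len(cars)
        for D in range(d, 0, -1):
            X0, X1 = cars[D-1][:, 0, :], cars[D-1][:, 1, :]
            # butterfly on the last active car: halves (12) are summed/subtracted
            cars[D-1] = np.stack([X0 + X1, X0 - X1], axis=1) / np.sqrt(2)
            if D >= 2:                          # twiddle factors (13)-(14)
                w = np.exp(-2j * np.pi / 2 ** D)              # omega_D
                for p in range(1, D + 1):       # rank-two cars W^(p) of w_D
                    if p == 1:
                        Wp = np.array([[[1, 1], [1, w]]])                 # (1,2,2)
                    elif p < D:
                        Wp = np.array([np.eye(2),
                                       np.diag([1, w ** 2 ** (p - 1)])]
                                      ).transpose(1, 0, 2)                # (2,2,2)
                    else:
                        Wp = np.array([[[1.], [0.]], [[0.], [1.]]])       # (2,2,1)
                    cars[p-1] = np.stack([np.kron(cars[p-1][:, k, :], Wp[:, k, :])
                                          for k in (0, 1)], axis=1)
        # bit-reverse (15): reverse car order and transpose the rank indices
        return [X.transpose(2, 1, 0) for X in reversed(cars)]

    d, rng = 6, np.random.default_rng(0)
    cars = ([rng.standard_normal((1, 2, 2))]
            + [rng.standard_normal((2, 2, 2)) for _ in range(d - 2)]
            + [rng.standard_normal((2, 2, 1))])
    x, y = qtt_to_full(cars), qtt_to_full(qtt_fft_exact(cars))
    assert np.allclose(y, np.fft.fft(x) / np.sqrt(2 ** d))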

4 Approximate Fourier Transform

We do not benefit from the use of the QTT format if the QTT-ranks are too large. Unfortunately, in Algorithm 1 the QTT-ranks grow by a factor of two each time we multiply by the twiddle factors on Line 4. After d steps they grow by a factor of \(2^d=n\), which makes the QTT representation completely ineffective. To perform computations efficiently, it is necessary to truncate the QTT-ranks, i.e., to approximate the result by the QTT format with smaller values of ranks.

4.1 TT-Rounding and TT-Orthogonalization

We recall the most important properties of the TT-rounding (also rank truncation or recompression) algorithm proposed in [42]. We will use it to approximate a vector x given in the QTT format (4) by vector \(\tilde{x}\) in the following QTT form,

$$ \tilde{x}(k) = \tilde{x}(\overline{k_1 \cdots k_d}) = \tilde{X}_{k_1}^{(1)} \cdots\tilde{X}_{k_d}^{(d)}, \quad\|\tilde{x}\| = \|x\|, \quad\|x - \tilde{x} \| \leq\varepsilon \| x\|, $$
(16)

where each \(\tilde{X}^{(p)}_{k_{p}}\) is an \(\tilde{r}_{p-1} \times\tilde{r}_{p}\) matrix, \(\tilde{r}_{0}=r_{0}\), \(\tilde{r}_{d}=r_{d}\), \(\tilde{r}_{p} \leq r_{p}\), \(p=1,\ldots,d-1\). The approximation can be done up to some desired accuracy level or by bounding the values of the QTT-ranks \(\tilde{r}_{p}\) from above. If we want to prescribe the relative accuracy ε in (16), the values of \(\tilde{r}_{p}\) are determined only in the course of the TT-rounding procedure and in general cannot be predicted. If the \(\tilde{r}_{p}\) are prescribed, then the approximation is quasi-optimal [42, 45], i.e.,

$$\|x-\tilde{x}\| \leq\sqrt{d-1} \big\|x - \tilde{x}^{\mathrm{best}}\big\|, $$

where \(\tilde{x}^{\mathrm{best}}\) is the best approximation of x with TT-ranks \(\tilde{r}_{p}\). However, the error \(\|x - \tilde{x}\|\) is generally not known in advance. The TT-rounding algorithm is based on QR and SVD algorithms for matrices, which are well established and included in many linear algebra packages (e.g. LAPACK). We illustrate the TT-rounding procedure in Fig. 1.

Fig. 1 The TT-rounding algorithm: (top) input; (bottom) output. The rectangles represent the QTT-cars, the sides of rectangles are proportional to the values of QTT-ranks

The key part of the TT-rounding algorithm is the TT-orthogonalization.

Definition 3

The TT-car X (p) is called left- or right-orthogonal if, respectively,

$$\sum_{k_p} \bigl(X^{(p)}_{k_p} \bigr)^* X^{(p)}_{k_p} = I \quad \mbox{or} \quad \sum _{k_p} X^{(p)}_{k_p} \bigl(X^{(p)}_{k_p} \bigr)^* = I. $$

If the TT-car \(X^{(p)}\) is written as an \(r_{p-1}\times n_p\times r_p\) tensor, the left-orthogonality implies that the \(r_{p-1}n_p\times r_p\) unfolding has orthogonal columns, and the right-orthogonality means that the \(r_{p-1}\times n_p r_p\) unfolding has orthogonal rows. A product of two left-orthogonal TT-cars is also a left-orthogonal array, as explained by the following statement.

Statement 3

[42]

Consider \(p\times q\) matrices \(A_i\), \(i=0,\ldots,m-1\), and \(q\times r\) matrices \(B_j\), \(j=0,\ldots,n-1\). If the 3-tensors \(A=[A_i]\) and \(B=[B_j]\) are left-orthogonal, i.e., \(\sum_{i} A_{i}^{*} A_{i} = I\) and \(\sum_{j} B_{j}^{*} B_{j} = I\), then the \(p\times mn\times r\) tensor \(C=[C_{ij}]\), \(C_{ij}=A_i B_j\), is also left-orthogonal,

$$\sum_{ij} C_{ij}^* C_{ij} = \sum_{ij} (A_i B_j)^* (A_i B_j) = \sum_j B_j^* \biggl( \sum_i A_i^* A_i \biggr) B_j = \sum _j B_j^* B_j = I. $$

The same statement holds for right-orthogonal cars. It can be also generalized to a larger number of subsequent TT-cars.

Definition 4

A sequence of TT-cars X (p),X (p+1),…,X (q−1),X (q) will be referred to as a subtrain.

Statement 4

[42] If all TT-cars of the subtrain \(X^{(1)},\ldots,X^{(p)}\) are left-orthogonal and \(r_0=1\), then the following \((n_1\cdots n_p)\times r_p\) matrix \(X'\) has orthogonal columns,

$$X'=\bigl[x'(\overline{k_1\cdots k_{p}},\alpha)\bigr], \quad\mbox{where } x'( \overline{k_1\cdots k_{p}}, :) = X^{(1)}_{k_1} \cdots X^{(p)}_{k_p}. $$

The left- or right-orthogonality of a particular subtrain can be achieved straightforwardly, using a matrix orthogonalization procedure such as QR. The algorithm is called TT-orthogonalization and is given in [42]. The ‘structured orthogonality’ of subtrains makes it possible to perturb the other cars and control the accuracy of the whole tensor. We illustrate this in Fig. 2. Given a vector x with QTT representation (4), we apply a left TT-orthogonalization to the subtrain \(X^{(1)}\cdots X^{(p)}\) and a right TT-orthogonalization to \(X^{(s)}\cdots X^{(d)}\). As a result, these cars are modified and we get another QTT representation for the same vector x, where the cars \(\hat{X}^{(1)},\ldots,\hat{X}^{(p-1)}\) are left-orthogonal and \(X^{(s+1)},\ldots,X^{(d)}\) are right-orthogonal. In the figures, left- and right-orthogonal cars are depicted by distinct symbols; a similar notation is used in [21].

Fig. 2 TT-orthogonalization. (a) Input QTT format for x; (b) TT-orthogonalization is applied to interfaces \(X^{(1)}\cdots X^{(p)}\) (left) and \(X^{(s)}\cdots X^{(d)}\) (right); (c) TT-rounding is applied to the subtrain \(\hat{X}^{(p)}_{k_{p}} X^{(p+1)}_{k_{p+1}} \cdots X^{(s-1)}_{k_{s-1}} \hat{X}^{(s)}_{k_{s}}\), giving \(\tilde{X}^{(p)}_{k_{p}} \tilde{X}^{(p+1)}_{k_{p+1}} \cdots\tilde{X}^{(s-1)}_{k_{s-1}} \tilde{X}^{(s)}_{k_{s}}\)

After the TT-orthogonalization is done, we apply the TT-truncation to the subtrain \(X^{(p)}\cdots X^{(s)}\) and reduce the TT-ranks \(r_p,\ldots,r_{s-1}\) to \(\tilde{r}_{p},\ldots,\tilde{r}_{s-1}\), introducing an approximation error or perturbation to the subtrain. The orthogonality of the left and right interfaces \(X^{(1)}\cdots X^{(p-1)}\) and \(X^{(s+1)}\cdots X^{(d)}\) guarantees that the same perturbation is introduced to the whole vector x.

Both the TT-rounding and TT-orthogonalization procedures for the QTT format have \(\mathcal{O}(dR^{3})\) complexity, where d is the length of the tensor train and \(R=\max r_p\) is the maximum TT-rank. Note that the TT-rounding procedure can generate an approximation with orthogonal cars. In [42] the TT-rounding generates a left-orthogonal tensor train, but we will assume that TT-rounding generates a right-orthogonal QTT approximation, as shown in Fig. 1.
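The structure of the TT-rounding procedure can be sketched in a few lines of NumPy (a simplified illustration of [42], not the authors' Fortran implementation; the threshold ε is distributed uniformly over the d−1 truncated cuts):

    import numpy as np

    def tt_round(cars, eps):
        # cars[p] has shape (r_{p-1}, n_p, r_p); returns a train with smaller ranks
        d = len(cars)
        cars = [X.copy() for X in cars]
        # right-to-left sweep: make cars 2..d right-orthogonal via QR of transposes
        for p in range(d - 1, 0, -1):
            r0, n, r1 = cars[p].shape
            Q, R = np.linalg.qr(cars[p].reshape(r0, n * r1).conj().T)
            cars[p] = Q.conj().T.reshape(-1, n, r1)
            cars[p - 1] = np.einsum('anb,bc->anc', cars[p - 1], R.conj().T)
        # the norm of the whole tensor now sits in the first car
        delta = eps * np.linalg.norm(cars[0]) / np.sqrt(max(d - 1, 1))
        # left-to-right sweep: truncated SVD per car, absorbing factors forward
        for p in range(d - 1):
            r0, n, r1 = cars[p].shape
            U, s, Vh = np.linalg.svd(cars[p].reshape(r0 * n, r1),
                                     full_matrices=False)
            err = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]   # err[r] = |s[r:]|
            keep = np.nonzero(err <= delta)[0]
            r = max(1, int(keep[0]) if keep.size else len(s))
            cars[p] = U[:, :r].reshape(r0, n, r)
            cars[p + 1] = np.einsum('ab,bnc->anc', s[:r, None] * Vh[:r],
                                    cars[p + 1])
        return cars

A left-orthogonal variant is obtained by mirroring the two sweeps; the cost is dominated by the QR and SVD calls, which gives the \(\mathcal{O}(dR^{3})\) complexity quoted above.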

4.2 Accuracy of the QTT-FFT Algorithm with Approximation

We include the TT-rounding step in Algorithm 1 and obtain the version of QTT-FFT with approximation, which we will simply call QTT-FFT in the following. It is given by Algorithm 2 and visualized in Fig. 3. Each step of Algorithm 2 includes an approximation which introduces an error to the active subtrain. In the following theorem we estimate the accuracy of the result returned by QTT-FFT.

Algorithm 2 QTT-FFT, approximation

Fig. 3 Visualization of the QTT-FFT Algorithm 2

Theorem 2

If the accuracy-based criterion ε is used on each approximation step in Algorithm 2, the accuracy of the result is the following

$$\|y - F_d x\| \leq d \varepsilon \|y\|. $$

Proof

In the first step, \(D=d\), of the algorithm it holds that \(y=F_d x_d\), where \(x_d=x\) is the input vector, and

$$P_d y = (I_1 \otimes F_{d-1} ) z_d, $$

where \(I_p\) denotes the \(2^p\times2^p\) identity matrix and \(z_d\) is defined by (14). The approximation step gives \(\|z_{d} - \tilde{z}_{d}\| \leq\varepsilon \| z_{d}\|\) and since \(P_d\) and \(I_1\otimes F_{d-1}\) are unitary matrices we have

$$ y = \tilde{y}_d + \mu_d, \quad \tilde{y}_d \mathrel{\stackrel{\mathrm{def}}{=}}P_d^T ( I_1 \otimes F_{d-1} ) \tilde{z}_d, \quad\| \mu_d\| \leq\varepsilon \|y\|, $$
(17)

where \(\tilde{y}_{d}\) is the approximation of the result and \(\mu_d\) is the error at step \(D=d\). At the next step the Fourier transform is applied to the \({\frac{n}{2}}\times r_{d-1}(\tilde{z}_{d})\) matrix \(x_{d-1}\), where \(r_{d-1}(\tilde{z}_{d})\) is the corresponding QTT-rank of \(\tilde{z}_{d}\). The matrix \(x_{d-1}\) has the following QTT decomposition

$$ x_{d-1}(\overline{k_1\cdots k_{d-1}}, :) = \tilde{Z}^{(1)}_{d,k_1} \cdots\tilde{Z}^{(d-1)}_{d,k_{d-1}}, $$

where \(\tilde{Z}^{(p)}_{d}\) denote the cars of \(\tilde{z}_d\).

If \(y_{d-1}=F_{d-1} x_{d-1}\) is computed, the approximation \(\tilde{y}_{d}\) is written in elementwise notation as follows

$$ \tilde{y}_d(\overline{j_1 \cdots j_d}) = y_{d-1}(\overline{j_2 \cdots j_d}) \tilde{Z}^{(d)}_{d,j_1}. $$
(18)

Note that the TT-car \(\tilde{Z}^{(d)}_{d}\) is right-orthogonal after the TT-truncation, see Fig. 3. This means that any perturbation introduced to \(y_{d-1}\) results in an error of the same norm introduced to \(\tilde{y}_{d}\). We will now estimate this error.

In the step \(D=d-1\) of the algorithm we compute the Fourier transform of \(x_{d-1}\) using the same radix-2 step. We define \(z_{d-1}\) by Lines 3 and 4 of Algorithm 1 and have

$$P_{d-1}y_{d-1} = (I_1 \otimes F_{d-2} ) z_{d-1}, $$

which is similar to what we had before. Approximation gives \(\|\tilde{z}_{d-1} - z_{d-1}\| \leq\varepsilon \| z_{d-1}\|\). This leads to

$$y_{d-1} = \tilde{y}_{d-1} + \mu_{d-1}, \quad \tilde{y}_{d-1} \mathrel{\stackrel{\mathrm{def}}{=}}P_{d-1}^T ( I_1 \otimes F_{d-2} ) \tilde{z}_{d-1}, \quad\| \mu_{d-1}\| \leq\varepsilon \|y\|. $$

Substituting this equation to (18) and (17), we have, in elementwise form,

$$ y(\overline{j_1\cdots j_d}) = \tilde{y}_{d-1}(\overline{j_2\cdots j_d})\, \tilde{Z}^{(d)}_{d,j_1} + \mu_{d-1}(\overline{j_2\cdots j_d})\, \tilde{Z}^{(d)}_{d,j_1} + \mu_d(\overline{j_1\cdots j_d}). $$

Since the TT-car \(\tilde{Z}^{(d)}_{d}\) is right-orthogonal, it does not change the Frobenius norm, and both error terms have a norm bounded by \(\varepsilon\|y\|\). It is easy to notice that every approximation step D introduces a perturbation \(\|\mu_D\| \leq\varepsilon\|y\|\), and since all TT-cars in the right subtrain \(\tilde{Z}^{(D+1)}_{D+1} \cdots\tilde{Z}^{(d)}_{d}\) are right-orthogonal, the perturbation of the result has the same norm. After d approximation steps of the algorithm, the total error is not larger than \(d\varepsilon\|y\|\). This completes the proof. □

Remark 2

The proof of Theorem 2 actually shows that the norm of the error of Algorithm 2 is not larger than the sum of the norms of errors introduced on each approximation step. If for a certain input vector all TT-truncation steps are exact, the result returned by Algorithm 2 is precise. Examples of such vectors are given in [48].

Theorem 2 holds for any input vector x. However, the QTT-ranks of the intermediate vectors \(x_D\), \(D=d,\ldots,1\), depend both on the accuracy level ε and on the properties of the vector x. It is not easy to describe explicitly the class of vectors x for which all \(x_D\) in Algorithm 2 will have QTT-ranks bounded by the desired value R for accuracy level ε. This problem is solved for \(R=1\), \(\varepsilon =0\) (QTT-rank-one vectors with QTT-rank-one Fourier images) in [48]. The numerical experiments provided in [48] show that the class of such vectors for larger R and non-zero ε can be ‘not so small’, even if ε is close to the machine precision. For instance, the Fourier transform of a random rank-one vector typically (in a large set of numerical tests) is approximated by a vector with moderate QTT-ranks.

4.3 Complexity Analysis

Now we estimate the asymptotic complexity of Algorithm 2 assuming that the QTT-ranks of the vectors \(x_D\) on all steps of the algorithm are bounded by R, as well as the QTT-ranks of the input and output.

Theorem 3

If on levels \(D=d,\ldots,1\) of Algorithm 2 the QTT-ranks of the intermediate vectors \(x_D\) are not larger than \(R_D\), the complexity is estimated as follows

$$ \mathop{\mathtt{work}}\leq\sum_{D=1}^d \mathcal{O}\bigl(D R_D^3\bigr) \leq \mathcal{O} \bigl(d^2 R^3\bigr), \quad\mbox {\textit{where} } R=\max R_D. $$
(19)

Proof

In step \(D=d,\ldots,1\) we do the following:

  1. alter the last block \(\hat{X}^{(D)}_{0,1}=\frac{1}{\sqrt{2}} (X^{(D)}_{0} \pm X^{(D)}_{1} )\) in \(\mathcal{O}(R_{D}^{2})\) operations;

  2. multiply the cars by the twiddle factors, see Eq. (14), in \(\mathcal{O}(R_{D}^{2})\) operations;

  3. compress the subtrain \(z_{D}(k) = Z^{(1)}_{k_{1}} \cdots Z^{(D)}_{k_{D}}\) using the TT-truncation algorithm [42] of \(\mathcal{O}(D R_{D}^{3})\) complexity.

By summation we come to (19). □

The complexity of the QTT-FFT algorithm is bounded by \(\mathcal {O}(d^{2} R^{3})\), where R depends on the input vector x and the accuracy ε of the approximation. It can be compared with the complexity of the superfast quantum Fourier transform [6], which requires \(\mathcal{O}(d^{2})\) quantum operations. In general (for arbitrary input vector x) the value of R can grow exponentially with d. If we use the ‘brute-force’ TT-truncation with the maximum rank criterion, the value of R does not grow and the complexity is \(\mathcal{O} (d^{2})\) for all input vectors, but the accuracy of the result can be arbitrarily bad. If the accuracy-based criterion is used, Theorem 2 guarantees the accuracy of the result, but the value of R and the complexity can be arbitrarily large. Therefore, for a class of all possible input vectors the accuracy and the complexity of the QTT-FFT are related (informally speaking) by the uncertainty principle: both cannot be small.

It is interesting (both theoretically and practically) to establish a special subclass of vectors x for which the QTT-FFT algorithm has both the square-logarithmic complexity and reasonable accuracy. This work has been started in [48]. More examples are provided by the numerical experiments in Sect. 7.

5 Real-Valued Transforms Using QTT-FFT Algorithm

Many problems are formulated in real-valued arithmetic but can be efficiently solved using the Fourier transform. For example, consider the discrete convolution of two real-valued vectors, defined as follows

$$f(j) = \sum_{k=0}^{n-1} h(j-k) g(k), \quad j=0,\ldots,n-1. $$

In matrix-vector form, \(f=Tg\), where \(T=[t(j,k)]_{j,k=0}^{n-1}\) is the Toeplitz matrix with elements \(t(j,k)=h(j-k)\). Each \(n\times n\) Toeplitz matrix is the leading submatrix of some \(2n\times2n\) circulant matrix

$$C = \bigl[c(j,k)\bigr]_{j,k=0}^{2n-1}, \quad\quad c(j,k) = \hat{c}\bigl((j-k) \bmod 2n\bigr). $$
To correspond with the matrix T, the elements of C are defined by

$$\left . \begin{array}{l@{\quad}l} \hat{c}(k) = h(k), & k=0,\ldots,n-1, \\[4pt] \hat{c}(n) & \mbox{is arbitrary}, \\[4pt] \hat{c}(k) = h(k-2n), & k=n+1,\ldots,2n-1. \end{array} \right . $$

Circulant matrices are diagonalized by the unitary Fourier matrix [10]:

$$C = F^* \varLambda F, \quad\varLambda= \sqrt{2n} \operatorname{{diag}}(F \hat{c}). $$

The multiplication by the Toeplitz matrix T can be performed as follows

$$ \left [\begin{array}{c} f \\[4pt] * \end{array} \right ] = F^* \varLambda F \left [\begin{array}{c} g \\[4pt] 0 \end{array} \right ] , $$
(20)

which involves three Fourier transforms of size 2n and can be done ‘at FFT speed’. The result is a complex-valued vector representing the real-valued convolution. In standard computations we can easily take the real or imaginary part of a vector. For vectors or tensors represented in the QTT format such an operation is not straightforward. The algorithm is given by the constructive proof of the following theorem.
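Before turning to the QTT version, here is a full-size NumPy illustration of the circulant embedding (20) on a toy example (sizes and data are hypothetical; in the QTT setting the three FFTs are replaced by Algorithm 2):

    import numpy as np

    # multiply an n x n Toeplitz matrix T, t(j,k) = h(j-k), by g via a
    # 2n-point FFT of the circulant embedding (20)
    n = 8
    rng = np.random.default_rng(1)
    h = rng.standard_normal(2 * n - 1)       # h(-(n-1)) ... h(n-1)
    g = rng.standard_normal(n)
    # first column c of the circulant: c(k)=h(k) for k<n, c(n) arbitrary,
    # c(k)=h(k-2n) for k>n
    c = np.concatenate([h[n - 1:], [0.0], h[:n - 1]])
    f = np.fft.ifft(np.fft.fft(c)
                    * np.fft.fft(np.concatenate([g, np.zeros(n)])))[:n]
    T = np.array([[h[j - k + n - 1] for k in range(n)] for j in range(n)])
    assert np.allclose(f.real, T @ g)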

Theorem 4

If the complex-valued tensor \(\mathbf{X}=[x(k_1,\ldots,k_d)]\) is represented by the tensor train format (1) with ranks \(r_0,r_1,\ldots,r_{d-1},r_d\), it can be represented by the following tensor train with TT-ranks \(r_0,2r_1,\ldots,2r_{d-1},r_d\), where all cars but one are real-valued,

$$ x(k_1,\ldots,k_d) = \hat{X}^{(1)}_{k_1} \hat{X}^{(2)}_{k_2} \cdots\hat{X}^{(d)}_{k_d}, \quad \hat{X}^{(1)}_{k_1} = \left [\begin{array}{c@{\quad}c}B_{k_1}^{(1)} & C_{k_1}^{(1)} \end{array} \right ], \quad \hat{X}^{(p)}_{k_p} = \left [\begin{array}{c@{\quad}c}B_{k_p}^{(p)} & C_{k_p}^{(p)} \\[4pt] -C_{k_p}^{(p)} & B_{k_p}^{(p)} \end{array} \right ], \quad \hat{X}^{(d)}_{k_d} = \left [\begin{array}{c}B_{k_d}^{(d)} + \mathtt{i}C_{k_d}^{(d)} \\[4pt] \mathtt{i}B_{k_d}^{(d)} - C_{k_d}^{(d)} \end{array} \right ], $$

(21)

where \(p=2,\ldots,d-1\) and \(B_{k_q}^{(q)}=\Re X_{k_q}^{(q)}\), \(C_{k_q}^{(q)} = \Im X_{k_q}^{(q)}\) for \(q=1,\ldots,d\).

Proof

Start from

$$X^{(1)}_{k_1} = \left [\begin{array}{c@{\quad}c}B_{k_1}^{(1)} & C_{k_1}^{(1)} \end{array} \right ] \left [\begin{array}{c}I \\[4pt] \mathtt{i}I \end{array} \right ] , $$

where \(I\) denotes the \(r_1\times r_1\) identity matrix. Define the real-valued car \(\hat{X}^{(1)}_{k_{1}} = [B_{k_{1}}^{(1)} \ \ C_{k_{1}}^{(1)} ] \) and multiply \([I \ \ \mathtt{i}I ]^T\) by the second car of the tensor train as follows

$$\left [\begin{array}{c}I \\[4pt] \mathtt{i}I \end{array} \right ] X^{(2)}_{k_2} = \left [\begin{array}{c@{\quad}c}B_{k_2}^{(2)} & C_{k_2}^{(2)} \\[4pt] -C_{k_2}^{(2)} & B_{k_2}^{(2)} \end{array} \right ] \left [\begin{array}{c}I \\[4pt] \mathtt{i}I \end{array} \right ] = \hat{X}^{(2)}_{k_2} \left [\begin{array}{c}I \\[4pt] \mathtt{i}I \end{array} \right ], $$

where in the right-hand side \(\hat{X}^{(2)}_{k_{2}}\) is the new real-valued car and \(I\) is the \(r_2\times r_2\) identity matrix. Continue the process and establish (21). □

Corollary 2

The representation (21) makes it possible to compute the real and imaginary parts of a tensor train (1) as follows

$$\Re x(k_1,\ldots,k_d) = \Biggl(\prod _{p=1}^{d-1}\hat{X}^{(p)}_{k_p} \Biggr) \Re\hat{X}^{(d)}_{k_d}, \quad\quad \Im x(k_1,\ldots,k_d) = \Biggl(\prod _{p=1}^{d-1}\hat{X}^{(p)}_{k_p} \Biggr) \Im\hat{X}^{(d)}_{k_d}. $$
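A sketch of the construction behind Theorem 4 and Corollary 2 (realify_tt and entry are hypothetical helpers; cars are stored as \((r_{p-1}, n_p, r_p)\) arrays and \(d\geq2\) is assumed):

    import numpy as np

    def realify_tt(cars):
        # rewrite a complex train so that all cars except the last are real (21)
        d, out = len(cars), []
        for p, X in enumerate(cars):
            B, C = X.real, X.imag
            if p == 0:
                out.append(np.concatenate([B, C], axis=2))           # [B  C]
            elif p < d - 1:
                out.append(np.concatenate(
                    [np.concatenate([B, C], axis=2),                 # [[ B, C],
                     np.concatenate([-C, B], axis=2)], axis=0))      #  [-C, B]]
            else:
                out.append(np.concatenate([B + 1j * C, 1j * B - C], axis=0))
        return out

    def entry(tt, idx):
        v = np.eye(1)
        for X, k in zip(tt, idx):
            v = v @ X[:, k, :]
        return v[0, 0]

    rng = np.random.default_rng(1)
    cars = [rng.standard_normal(s) + 1j * rng.standard_normal(s)
            for s in [(1, 2, 3), (3, 2, 3), (3, 2, 1)]]
    rt = realify_tt(cars)
    for idx in [(0, 1, 0), (1, 0, 1)]:
        assert np.isclose(entry(rt, idx), entry(cars, idx))   # Corollary 2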

Example 1

Obtain the QTT representation of the cosine and sine functions, discretized on a uniform grid, i.e., the vectors \([\cos\alpha k ]_{k=0}^{2^{d}-1}\) and \([\sin\alpha k ]_{k=0}^{2^{d}-1}\). They are the real and imaginary parts of the vector (5), which has the rank-one QTT representation with the following cars

$$X^{(p)}_{k_p} = \exp\bigl(\mathtt{i}2^{p-1}\alpha k_p\bigr) = \cos 2^{p-1}\alpha k_p + \mathtt{i} \sin2^{p-1}\alpha k_p. $$

We use (21) and form the real-valued cars \(\hat{X}^{(1)}_{k_{1}} = [ \cos\alpha k_{1} \ \ \sin\alpha k_{1} ] \) and

$$\hat{X}^{(p)}_{k_p} = \left [\begin{array}{c@{\quad}c} \cos2^{p-1} \alpha k_p & \sin2^{p-1} \alpha k_p \\[4pt] -\sin 2^{p-1} \alpha k_p & \cos2^{p-1} \alpha k_p \end{array} \right ] , \quad p=2,\ldots,d-1. $$

We result in

$$ \bigl[\cos\alpha k\bigr] = \hat{X}^{(1)}_{k_1} \cdots\hat{X}^{(d-1)}_{k_{d-1}} \left [\begin{array}{c} \cos2^{d-1}\alpha k_d \\[4pt] -\sin2^{d-1}\alpha k_d \end{array} \right ], \quad\quad \bigl[\sin\alpha k\bigr] = \hat{X}^{(1)}_{k_1} \cdots\hat{X}^{(d-1)}_{k_{d-1}} \left [\begin{array}{c} \sin2^{d-1}\alpha k_d \\[4pt] \cos2^{d-1}\alpha k_d \end{array} \right ]. $$

(22)

This is the simultaneous representation of the two considered vectors in a common QTT form. The QTT-ranks are two, which follows from the existence of the rank-one QTT representation of the exponential function and the simultaneous representation of the real and imaginary parts of a complex tensor train, ensured by Theorem 4.

Example 2

Given a vector x, compute the orthogonal DCT-II (discrete cosine even type-2) transform, defined as follows

$$y(j) = \sum_{k=0}^{n-1} x(k) \cos \biggl( \frac{\pi}{n}j\biggl(k+{\frac{1}{2}} \biggr) \biggr), \quad j=0,\ldots,n-1. $$

The history of this and related cosine transforms, including the applications to signal and image processing, can be found in a nice review [53]. The DCT-II can be computed through the Fourier transform of size 2n applied to the vector x expanded by zeros,

$$y(j) = \Re \Biggl[ \exp \biggl(-\frac{\pi\mathtt{i}}{2n}j \biggr) \sum _{k=0}^{2n-1} \hat{x}(k) \exp \biggl(- \frac{2\pi\mathtt {i}}{2n}jk \biggr) \Biggr], $$

where \(\hat{x}(0:n-1)=x\) and \(\hat{x}(n:2n-1)=0\). In computational practice, double-sized vectors and transforms are usually avoided for the sake of efficiency. However, in the QTT approach a double-sized vector/transform does not require double storage/cost. If \(n=2^d\) and x is given in the QTT format (4), the vector \(\hat{x}\) has the following QTT representation,

$$\hat{x}(k) = \hat{x}(\overline{k_1 \cdots k_d k_{d+1}}) = X^{(1)}_{k_1} \cdots X^{(d)}_{k_d} (1-k_{d+1}), $$

which has the same QTT-ranks as x. To compute y, we need to apply the QTT-FFT Algorithm 2 to find \(\hat{y} = F_{d+1} \hat{x}\), which requires \(\mathcal{O}((d+1)^{2} R^{3}) \) operations, asymptotically equal to the complexity of the transform of size \(2^d\). Then we need to consider the “top” half-vector of \(\hat{y}\) as explained by (12) and multiply the result pointwise by the exponential vector. These operations do not change the QTT-ranks. Finally, we apply Theorem 4 and obtain the real-valued cosine transform with QTT-ranks twice as large as those of the Fourier transform.
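The zero-padding recipe itself can be checked with a dense FFT (a small sketch; in the QTT setting np.fft.fft is replaced by Algorithm 2 applied to the padded train):

    import numpy as np

    # DCT-II via a 2n-point DFT of the zero-padded vector plus a twiddle factor
    n = 16
    x = np.random.default_rng(2).standard_normal(n)
    xh = np.concatenate([x, np.zeros(n)])
    y = (np.exp(-1j * np.pi * np.arange(n) / (2 * n))
         * np.fft.fft(xh)[:n]).real
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
    y_ref = (x * np.cos(np.pi / n * j * (k + 0.5))).sum(axis=1)
    assert np.allclose(y, y_ref)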

Finally, we should mention that the proposed approach to the computation of convolution and cosine transform can be generalized to other types of trigonometric transforms as well as to higher dimensions.

6 Discrete Fourier Transform in Higher-Dimensional Case

The m-dimensional normalized Fourier transform is defined as follows,

$$ y(j_1,\ldots,j_m) = \prod_{q=1}^{m} \frac{1}{2^{\frac{d_q}{2}}} \sum_{k_1=0}^{n_1-1}\cdots\sum_{k_m=0}^{n_m-1} x(k_1,\ldots,k_m)\, \omega_{d_1}^{j_1k_1}\cdots\omega_{d_m}^{j_mk_m}, \quad n_q = 2^{d_q}, $$

(23)

where \(j_q=0,\ldots,n_q-1\) for \(q=1,\ldots,m\). If \(x=[x(k_1,\ldots,k_m)]\) and \(y=[y(j_1,\ldots,j_m)]\) are considered as \(n_1\cdots n_m\) vectors, the m-dimensional Fourier transform is the Kronecker product of m one-dimensional transforms, i.e., \(F_{d_{1},\ldots,d_{m}} = F_{d_{m}} \otimes F_{d_{m-1}} \otimes \cdots\otimes F_{d_{2}} \otimes F_{d_{1}}\). Using the tensor train representation (1), we have

$$ y(j_1,\ldots,j_m) = \bigl(F_{d_1} X^{(1)}\bigr)_{j_1} \bigl(F_{d_2} X^{(2)}\bigr)_{j_2} \cdots\bigl(F_{d_m} X^{(m)}\bigr)_{j_m}, \quad\quad \bigl(F_{d_q} X^{(q)}\bigr)_{j_q} \mathrel{\stackrel{\mathrm{def}}{=}}\frac{1}{2^{\frac{d_q}{2}}} \sum_{k_q=0}^{n_q-1} X^{(q)}_{k_q}\, \omega_{d_q}^{j_qk_q}. $$

(24)

We see that due to the perfect separation of variables, the Fourier transform is applied independently to all TT-cars. The multidimensional Fourier transform does not change the TT-ranks. Therefore, Eq. (24) gives a straightforward algorithm to compute the m-dimensional Fourier transform of an n×n×⋯×n vector with \(\mathcal {O}(m R^{2} n\log n)\) complexity, where R is the maximal TT-rank of the vector.
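In code, (24) is a single line per car (a sketch; tt_fourier is a hypothetical helper acting on cars of shape \((r_{q-1}, n_q, r_q)\)):

    import numpy as np

    def tt_fourier(cars):
        # normalized DFT along each mode index; TT-ranks are unchanged (24)
        return [np.fft.fft(X, axis=1) / np.sqrt(X.shape[1]) for X in cars]

    # rank-one 2D check against the full transform
    a, b = np.random.rand(4), np.random.rand(8)
    Y = tt_fourier([a.reshape(1, -1, 1), b.reshape(1, -1, 1)])
    full = np.outer(Y[0][0, :, 0], Y[1][0, :, 0])
    assert np.allclose(full, np.fft.fft2(np.outer(a, b)) / np.sqrt(4 * 8))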

To reach square-logarithmic complexity (of course, for a special type of input data), we assume \(n_{q}=2^{d_{q}}\) and use the binary notation (3) for all mode indices. The input array is represented as follows,

$$ x\bigl(\overline{k^{(1)}_1\cdots k^{(1)}_{d_1}}, \ldots, \overline{k^{(m)}_1\cdots k^{(m)}_{d_m}}\bigr) = X^{(1,1)}_{k^{(1)}_1}\cdots X^{(1,d_1)}_{k^{(1)}_{d_1}}\; X^{(2,1)}_{k^{(2)}_1}\cdots X^{(m,d_m)}_{k^{(m)}_{d_m}}, $$

(25)

where \(k^{(q)}_p\) denotes the p-th bit of the q-th mode index and \(X^{(q,p)}\) is the corresponding QTT-car.

Then the QTT-FFT Algorithm 2 can be used to compute the unimodular Fourier transforms successively. The QTT-FFT introduces an error, controlled by Theorem 2. The appropriate TT-orthogonalization should be used for the interfaces to guarantee that this error will not be amplified in the whole m-dimensional vector. The multi-dimensional version of QTT-FFT is summarized by Algorithm 3 and visualized in Fig. 4. The accuracy and complexity are estimated similarly to the one-dimensional case.

Algorithm 3 Multi-dimensional QTT-FFT, approximation

Fig. 4 Visualization of the QTT-FFT Algorithm 3

Theorem 5

If the accuracy-based criterion ε is used on each approximation step in Algorithm  3, the accuracy of the result is the following

$$\|y - F_{d_1,\ldots,d_m} x\| \leq\sum_{q=1}^m d_q \varepsilon \|y\|. $$

Theorem 6

The complexity of Algorithm 3 for the m-dimensional transform of an n×n×⋯×n vector with \(n=2^d\) is not larger than \(\mathcal{O}(m d^{2} R^{3})\), where R is the maximum QTT-rank of all intermediate vectors at all steps of Algorithm 3 and all internal calls to Algorithm 2.

Proof

In Line 1 of Algorithm 3 the TT-orthogonalization of the tensor train with md cars is performed, which requires not more than \(\mathcal{O}(mdR^{3})\) operations. Then Algorithm 2 and the TT-orthogonalization are applied to m subtrains with d cars each, which costs not more than \(m\mathcal{O}(d^{2}R^{3})+m\mathcal{O}(d R^{3})\). The last step does not require any operations. Therefore, we obtain the \(\mathcal{O}(md^{2}R^{3})\) estimate, which completes the proof. □

Note that due to the bit-reverse permutation issues, the order of mode indices in the tensor train on output is reversed as follows,

$$ y\bigl(\overline{j^{(1)}_1\cdots j^{(1)}_{d_1}}, \ldots, \overline{j^{(m)}_1\cdots j^{(m)}_{d_m}}\bigr) = Y^{(m,1)}_{j^{(m)}_1}\cdots Y^{(m,d_m)}_{j^{(m)}_{d_m}}\; \cdots\; Y^{(1,1)}_{j^{(1)}_1}\cdots Y^{(1,d_1)}_{j^{(1)}_{d_1}}. $$

(26)

The reason is that every unimodular QTT transform is performed by Algorithm 2 with the bit-reverse permutation of the result. In the 1D case, the bit-reverse permutation is made without any computations by simple reordering of the transposed QTT-cars. If we apply this reordering to each unimodular subtrain individually, the TT-ranks (ranks separating the physical variables) will not be consistent. Instead, we reverse the tensor train globally and obtain the result where the order of bits in the QTT representation is correct, but the order of frequency modes is reversed compared with the original ordering of space modes, as shown in Fig. 4. Fortunately, a wide class of applications (e.g. convolution, solution of discretized PDEs) requires two successive Fourier transforms. In this case the second transform will recover the original ordering.

This drawback of the multi-dimensional QTT-FFT can be removed using a one-dimensional QTT-FFT algorithm based on the self-sorting FFT [20].

7 Numerical Examples

In this section we compare the accuracy and timings of the trigonometric transforms for one-, two- and three-dimensional functions with moderate QTT-ranks. From the proof of Theorem 3 we see that the complexity of the QTT-FFT depends on all QTT-ranks. The maximum QTT-rank used in (19) sometimes gives an incorrect impression of the “structure complexity” of a particular data array. To account for the distribution of all QTT-ranks \(r_0,\ldots,r_d\), we define the effective QTT-rank of a vector as the positive solution r of the quadratic equation \(\mathop{\mathtt{mem}}(r_0,r_1,r_2,\ldots,r_{d-1},r_d)= \mathop{\mathtt{mem}}(r_0,r,r,\ldots,r,r_d)\), where \(\mathop{\mathtt{mem}}\) measures the storage for the QTT format as follows

$$\mathop{\mathtt{mem}}(r_0,r_1,r_2, \ldots,r_{d-1},r_d) = r_0 r_1 + r_1 r_2 + \cdots+ r_{d-1} r_d. $$

The effective QTT-rank is generally non-integer.
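For reference, a small helper computing this quantity (a sketch; it assumes \(d\geq3\), so that the quadratic term is present):

    import numpy as np

    def effective_rank(ranks):
        # positive root of mem(r0, r, ..., r, rd) = mem(r0, r1, ..., rd),
        # where mem(r0,...,rd) = r0*r1 + r1*r2 + ... + r_{d-1}*r_d
        r0, rd, d = ranks[0], ranks[-1], len(ranks) - 1
        mem = sum(a * b for a, b in zip(ranks[:-1], ranks[1:]))
        a, b = d - 2, r0 + rd            # a*r^2 + b*r - mem = 0
        return (-b + np.sqrt(b * b + 4 * a * mem)) / (2 * a)

    print(effective_rank([1, 2, 4, 8, 4, 2, 1]))   # about 4.3, non-integer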

For the experiments we use one Xeon E5504 CPU at 2.0 GHz with 72 GB of memory at the Institute of Numerical Mathematics, Russian Academy of Sciences, Russia. The TT subroutines and QTT-FFT algorithms are implemented in Fortran 90 by the third author. We use Intel Fortran Composer XE version 12.0 (64 bit) and the BLAS/LAPACK and FFTW packages provided with the MKL library.

7.1 Fourier Images in 1D

For an integrable function f(t), the Fourier transform, or image, \(\hat{f}(\xi)\) is defined for every real ξ as follows

$$ \hat{f}(\xi) = \int_{-\infty}^{+\infty} f(t) \exp(-2\pi\mathtt {i}t\xi) dt. $$
(27)

We consider the rectangle pulse function, for which the Fourier transform is known,

$$ \varPi(t) = \left \{ \begin{array}{l@{\quad}l} 0, & \mbox{if } |t|>{\frac{1}{2}}\\[4pt] {\frac{1}{2}}, & \mbox{if } |t|={\frac{1}{2}}, \\[4pt] 1, & \mbox{if } |t|<{\frac{1}{2}}, \end{array} \right . \quad \hat{\varPi}(\xi) = \operatorname{{sinc}}(\xi) \mathrel{\stackrel { \mathrm{def}}{=}}\frac{\sin\pi\xi}{\pi\xi}. $$
(28)

The Fourier integral is approximated by the rectangle rule. Since f(t)=Π(t) is real and even, we write

$$ \hat{f}(\xi_j) = 2 \Re\int_0^{+\infty} f(t) \exp(-2\pi\mathtt {i}t\xi_j) dt \approx2 \Re\sum _{k=0}^{n-1} f(t_k) \exp(-2\pi \mathtt{i}t_k \xi_j) h_t, $$
(29)

where \(t_{k} = (k+{\frac{1}{2}}) h_{t}\), \(\xi_{j} = (j+{\frac{1}{2}}) h_{\xi}\) and \(k,j=0,\ldots,n-1\). If \(h_{t} h_{\xi}= {\frac{1}{n}}\) and \(n=2^d\), the sum in the right-hand side can be computed by the DFT as follows

$$ \hat{f} \approx\tilde{f} = 2^{{\frac{d}{2}}+1} h_t \Re ( \omega_{d+2} \varOmega_d F_d \varOmega_d f ), \quad \hat{f} = \bigl[\hat{f}(\xi_j) \bigr]_{j=0}^{2^d-1}, \quad f = \bigl[f(t_k) \bigr]_{k=0}^{2^d-1}. $$
(30)
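For moderate d, the whole formula (30) can be verified against the exact image \(\operatorname{{sinc}}(\xi)\) with a dense FFT (a sketch; for large d the factor \(F_d\) is replaced by the QTT-FFT as described below):

    import numpy as np

    d = 16
    n = 2 ** d
    h = 2.0 ** (-d / 2)                     # h_t = h_xi = 1/sqrt(n)
    t = (np.arange(n) + 0.5) * h
    xi = (np.arange(n) + 0.5) * h
    f = (t < 0.5).astype(float)             # Pi(t_k) on the positive half-line
    Om = np.exp(-2j * np.pi * np.arange(n) / (2 * n))   # Omega_d
    w = np.exp(-2j * np.pi / (4 * n))                   # omega_{d+2}
    fhat = 2 * h * (w * Om * np.fft.fft(Om * f)).real   # (30); fft = 2^{d/2} F_d
    err = np.abs(fhat - np.sinc(xi))[xi <= 16].max()
    print(err)   # decreases as O(h^2) with the grid refinement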

For \(h_{t}=h_{\xi}= \frac{1}{2^{\frac{d}{2}}}\) and d even, the QTT representation of the rectangular pulse has QTT-ranks one, i.e.,

$$\varPi(t_k) = \varPi\biggl(\frac{h}{2} + \overline{k_1 \cdots k_{{\frac{d}{2}}-1}} h + \overline{k_{{\frac{d}{2}}}\cdots k_d}/2 \biggr) = (1-k_{{\frac{d}{2}}}) \cdots(1-k_d). $$

We can use the QTT-FFT Algorithm 2 to compute the DFT in (30). According to Theorem 2, the relative accuracy of the result is ε if the tolerance parameter is set to \(\frac {\varepsilon }{d}\). The QTT-ranks and timings grow for smaller ε. In Table 1 we compare the runtime of Algorithm 2 for different ε with the runtime of the out-of-place FFT algorithm from the FFTW library. The timings are rather small and were obtained by averaging the computation time over a large number of executions. We see that for the considered example the QTT-FFT algorithm outperforms the FFT algorithm from the FFTW library for \(n\geq2^{20}\), even for very high accuracy. We also see that the use of moderate or low accuracy for QTT-FFT significantly reduces the QTT-ranks of the result and speeds up the computation. That is not the case for the standard FFT algorithm.

Table 1 Time for QTT-FFT (in milliseconds) w.r.t. size \(n=2^d\) and accuracy ε. Here \(\mathop{\mathtt{time}}_{\mathrm{QTT}}\) is the runtime of Algorithm 2, \(\mathop{\mathtt{time}}_{\mathrm{FFTW}}\) is the runtime of the FFT algorithm from the FFTW library, and \(\operatorname{{rank}} \hat{f}\) is the effective QTT-rank of the Fourier image

The accuracy of the rectangle rule (30) increases for smaller step size h. For a fixed frequency ξ we can expect \(\mathcal{O}(h^{2}) = \mathcal {O}(n^{-1})\) convergence, but for very large frequencies \(\xi_j\), \(j\gtrsim{\frac{n}{2}}\), Eq. (29) is not accurate since the period of the exponent \(\exp(-2\pi\mathtt{i} t\xi_j)\) is less than 2h. Therefore, we check the accuracy of (30) for subvectors corresponding to \(0\leq\xi_j\leq16\). The results are shown in Table 2, where

$$\mathop{\mathtt{acc}}= \| \hat{f} - \tilde{f} \|_{[0,16]} / \| \hat{f} \|_{[0,16]}, \quad \mbox{where } \| f \|^2_{[0,16]} = \sum_{j: 0 \leq\xi_j \leq16} \big|f(\xi_j)\big|^2. $$

The computations for \(n\leq2^{28}\) can be performed by FFTW or QTT-FFT. For \(n>2^{28}\) the one-processor subroutine from FFTW cannot be used due to storage limitations, and the data-sparse QTT format is necessary to increase the accuracy of the computed Fourier image. For \(d\leq28\) the accuracy of QTT-FFT was verified by comparison with FFTW, and the QTT-FFT error was always ‘under control’, i.e., less than the desired accuracy.

Table 2 Accuracy verification for the QTT-FFT Algorithm 2 and f=Π(t). The desired accuracy is \(\varepsilon =10^{-13}\), grid size \(h=2^{-{\frac{d}{2}}}\), and \(\mathop{\mathtt{acc}}\) denotes the relative accuracy in Frobenius norm of (30) for subvectors corresponding to \(0\leq\xi_j\leq16\)

7.2 Fourier Images in 3D

In higher dimensions the one-dimensional size of the array that can be processed by the standard FFT algorithm is severely restricted by computing resources. Therefore, the discretization accuracy obtained on simple workstations is often low, and multiprocessor parallel platforms have to be used for larger grids. As in the 1D case, the use of the data-structured QTT-FFT algorithm relaxes/removes the grid size restrictions and allows one to reach higher accuracy of the computation using one CPU. As an example, we compute the following m-dimensional Fourier image from [52],

$$ \hat{f}(\xi) = \int_{\|x\|\leq1} \bigl(1-\|x\|^2\bigr)^{\delta} \exp(-2\pi\mathtt{i}\, x\cdot\xi)\, dx = \frac{\Gamma(\delta+1)}{\pi^{\delta}}\, \frac{J_{\frac{m}{2}+\delta}(2\pi\|\xi\|)}{\|\xi\|^{\frac{m}{2}+\delta}}, $$

(31)

where Γ is the Gamma function, δ is an arbitrary real parameter, \(J_\alpha\) is the Bessel function of the first kind of order α, and \(x\cdot\xi\) denotes the scalar product. Introducing uniform tensor product n×n×⋯×n grids in the space and frequency domains with step sizes \(h_{x} = h_{\xi}= \frac {1}{\sqrt{n}}\) and \(n=2^d\), we approximate (31) by the rectangle quadrature rule. The result can be computed by the m-dimensional DFT.

We perform the experiments for the case m=3, where the DFT can be computed both by FFTW and QTT-FFT. The accuracy of Algorithm 3 is verified by comparison with the analytical answer (31). Since the comparison at all \(2^{dm}\) points is unfeasible, the accuracy \(\mathop{\mathtt{acc}}\) is measured in the relative Frobenius norm over a subset of the grid points.

The results are shown in Table 3. For d=8 the accuracy of QTT-FFT is verified also by comparison with the FFTW algorithm.

Table 3 Relative accuracy  acc of 3D Fourier image (31)

7.3 Convolution in 3D

In scientific computing the Fourier transform is often used for the computation of convolutions. As a motivating example, consider the evaluation of the Newton potential,

$$ V(x)=\frac{1}{4\pi}\int_{\mathbb{R}^3} \frac{f(y)}{\|x-y\|} dy, $$
(32)

which plays an important role in the electronic structure calculations based on the Hartree-Fock equation. In many chemical modelling programs, e.g., PC GAMESS and MOLPRO, the electron density function f(y) is represented as a sum of polynomially weighted 3D Gaussians, for which (32) is evaluated analytically. For larger molecules, the proper choice of the Gaussian-type orbitals (GTO) basis requires significant effort, and sometimes computations become infeasible due to the huge number of basis elements involved. A possible alternative is the discretization on n×n×n tensor grids using standard Galerkin/collocation schemes with piecewise constant or linear basis elements. The electron density function has strong cusps at the positions of the nuclei, which motivates the use of very fine grids. The tensor-structured methods lead to almost linear or even logarithmic complexity w.r.t. n, which makes the use of very fine grids and high accuracy of computations possible without a special choice of basis. In this way, fast tensor-product convolution with the Newton kernel based on the canonical or/and Tucker tensor decompositions has been developed in [30, 33, 50] and applied to the fully discrete grid-based solution of the Hartree-Fock equation in [26, 35]. These algorithms reduce the three-dimensional convolution to a number of one-dimensional ones, which are computed by FFT in \(\mathcal {O}(n\log n)\) operations. The QTT representation of the canonical vectors reduces the complexity to the logarithmic scale w.r.t. n, see [28]. The multidimensional convolution of QTT-structured data can also be computed directly using the analytical QTT decomposition of a multilevel Toeplitz matrix [25]. We refer also to [8, 12, 27, 29, 34, 43, 47, 51] for detailed discussions of recent progress in this field.

To discretize (32), we reduce the integration interval to \(B\mathrel{\stackrel{\mathrm{def}}{=}}[-{\frac{1}{2}},{\frac{1}{2}}]^{3}\), assuming that f(y) vanishes outside this cube, and introduce the uniform n×n×n tensor-product grid in B. We consider two discretization methods: collocation and Nyström-type. The collocation method approximates f(y) by a piecewise constant function and leads to

$$ V(x) \approx V_c(x) \mathrel{\stackrel{ \mathrm{def}}{=}}\frac {1}{4\pi} \sum_k f(y_k) \int_{B_k} \frac{dy}{\|x-y\|}, \quad\quad V(x_j) \approx v_c(j) \mathrel{\stackrel{ \mathrm{def}}{=}}V_c(x_j), $$
(33)

where \(x_{j} = (j+{\frac{1}{2}})h\), \(y_{k} = (k+{\frac{1}{2}})h\) (applied componentwise), \(B_k\) is the h×h×h cube centered at \(y_k\), \(j=(j_1,j_2,j_3)\), \(k=(k_1,k_2,k_3)\) and \(j_p,k_p=0,\ldots,n-1\) for p=1,2,3. As shown in [30], the pointwise accuracy of this approximation is

$$\big|V(x_j) - v_c(j)\big| \leq C h^2. $$

The Nyström-type convolution appears if we write (32) at the points \(x=x_j\) and then approximate the integral by the rectangle quadrature rule over B as follows

$$ V(x_j) \approx v_n(j) \mathrel{ \stackrel{\mathrm{def}}{=}}\frac {1}{4\pi} \sum_k f\bigl(y'_k\bigr) \frac {h^3}{\|x_j-y'_k\|}, $$
(34)

where \(y'_{k} = kh\) and j,k are defined as earlier. Since \(y'_{k} \neq x_{j}\) for all j and k, the right-hand side in this equation is well defined. The standard accuracy estimate of the rectangle rule is not applicable to singular functions, and we did not find an accuracy estimate of this method in the literature. Surprisingly, it can be proven that the accuracy of the Nyström-type convolution for the 3D Newton potential of smooth functions is estimated as

$$\big|V(x_j) - v_n(j)\big| \leq C h^2 |\log h|, $$

which is almost the same order of convergence as for the collocation method. We will report the proof elsewhere.

Note that the Nyström-type method ‘shifts’ the result, i.e., grids for f(y) and V(x) do not coincide. This can be a drawback for some applications. However, the coefficients of the convolution core in (34) are easier to compute than the ones in (33).

Both (33) and (34) represent a three-dimensional discrete convolution. By (20), it can be computed by three three-dimensional DFTs of size \((2n)^3\). This operation is highly restricted by the value of n, and for \(n\gtrsim2^{9}\) it is not feasible for full-sized vectors using a standard FFT algorithm on one processor. If f(y) has moderate QTT-ranks, the Newton potential can be efficiently computed using the QTT-FFT Algorithm 3. To do this, we need the QTT representation of the first column of the Newton potential matrix, which is different for the collocation and Nyström-type methods. A similar problem was discussed for the representation of the Newton/Yukawa potential in the canonical format [1], and was solved using the sinc-quadrature from [17], which expresses \(r^{-1}\) as a sum of Gaussian functions, i.e.,

$$ \frac{1}{r} \approx\sum_{q=1}^{n_q} c_q \exp\bigl(-t_q^2 r^2\bigr), $$

with suitable quadrature weights \(c_q\) and nodes \(t_q\). Gaussians are perfectly separable in the physical modes and the QTT representation of each term can be constructed as the Kronecker product of one-dimensional Gaussians, for which we can use the full-to-TT algorithm [40]. The QTT-ranks of a one-dimensional Gaussian are bounded by \(\mathcal {O}(\log^{{\frac{1}{2}}}\varepsilon^{-1})\) [5].
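The flavour of such quadratures can be reproduced in a few lines (a simple trapezoidal rule after an exponential substitution; this is an illustrative variant, not the exact rule [17, Eq. (5.3)]):

    import numpy as np

    # 1/r = (2/sqrt(pi)) * int exp(-r^2 e^{2s} + s) ds, discretized with step hs
    hs, M = 0.1, 300
    s = hs * np.arange(-M, M + 1)
    c = 2.0 / np.sqrt(np.pi) * hs * np.exp(s)   # weights c_q
    t2 = np.exp(2 * s)                          # exponents t_q^2
    r = np.logspace(-3, 1, 50)
    approx = (c[:, None] * np.exp(-t2[:, None] * r[None, :] ** 2)).sum(axis=0)
    print(np.abs(approx * r - 1).max())         # uniform relative accuracy on [1e-3, 10]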

The pointwise relative accuracy of the sinc-quadrature for different r is shown in Fig. 5(left). We see that for \(\frac{r_{\mathrm{max}}}{r_{\mathrm{min}}} \approx n \lesssim10^{4}\), the quadrature with \(n_q=500\) terms provides almost the machine precision. For \(n\leq10^{6}\), the quadrature with \(n_q=3600\) terms has the pointwise relative accuracy \(\varepsilon =10^{-13}\), which is enough for our purposes. Since the potential should be constructed only once (there is no adaptation of the grid, etc.), the large \(n_q\) is not restrictive. In Fig. 5(right) we show the QTT-ranks of the convolution cores (33) and (34) w.r.t. grid size n and relative accuracy level ε.

Fig. 5 (left) Pointwise relative accuracy of the sinc-quadrature [17, Eq. (5.3)] with \(n_q\) terms for \(r^{-1}\). (right) QTT-ranks of the first column of the Newton potential matrix for grid size n and relative accuracy ε. Solid lines—collocation method (33), dashed lines—Nyström-type method (34)

To check the accuracy and speed of the proposed method, we compute the Hartree potential (32) for the Gaussian charge density \(f=g_{0,\sigma}\) with different σ and compare the result with the analytical answer,

$$ g_{a,\sigma}(y) \mathrel{\stackrel{\mathrm{def}}{=}}\exp \biggl(-\frac{\|y-a\|^2}{2\sigma^2} \biggr), \quad\quad V(x) = \frac{(2\pi\sigma^2)^{\frac{3}{2}}}{4\pi}\, \frac{\operatorname{{erf}} \bigl(\frac{\|x-a\|}{\sqrt{2}\sigma} \bigr)}{\|x-a\|}. $$

(35)

Here a and σ are the center and width of the Gaussian and \(\operatorname{{erf}} (x)\mathrel{\stackrel{\mathrm{def}}{=}}\frac{2}{\sqrt{\pi}}\int_{0}^{x} \exp(-t^{2})dt\) is the error function. Accuracy and timings of the evaluation of (35) using the 3D convolution are shown in Fig. 6. The relative accuracy in the Frobenius norm is measured over one line of grid points, \(\{x_{j}\} \times \{x_{{\frac{n}{2}}}\} \times \{x_{{\frac{n}{2}}}\}\), \(j=0,\ldots,n-1\).

Fig. 6 Accuracy (left) and computation time (right) for the Newton potential of the Gaussian charge density (35) with width σ on the uniform tensor-product n×n×n grid. Solid lines—collocation method (33), dashed lines—Nyström-type method (34). Approximation accuracy for the QTT-FFT is \(\varepsilon =10^{-10}\)

It is interesting that in Fig. 6(left) there are three error plateaus which appear for different reasons. The right plateau for \(\sigma=10^{-1}\) appears at the level \(\mathop{\mathtt{acc}}\approx10^{-6}\) due to the error introduced by the truncation of the integral away from the cube B, since on the boundary the Gaussian does not vanish to machine zero, i.e., \(\exp (- \frac{x^{2}}{2 \sigma^{2}} ) |_{x={\frac{1}{2}}} \approx10^{-6}\). Gaussian charges with smaller σ represent stronger cusps and do not have this error visible. The right plateau for the Gaussian with \(\sigma=10^{-2}\) appears due to the approximation error in QTT-FFT, which is set to \(\varepsilon =10^{-10}\). Finally, the left plateaus for \(\sigma=10^{-3}\), \(\sigma=10^{-4}\) and \(\sigma=10^{-5}\) appear since these cusps cannot be properly described on coarse grids. Therefore, a precise evaluation of the Newton potential of such functions on uniform tensor product grids requires larger grid sizes and is feasible using the QTT approximation.

Timings for the evaluation of the 3D convolution using the QTT-FFT algorithm are very moderate and scale quadratically w.r.t. d, i.e., square-logarithmically w.r.t. the grid size, see Fig. 6(right).

7.4 Signals with Sparse Fourier Images

Consider a sum of R plane waves in m-dimensional space, defined as follows

$$ f(k) = \sum_{p=1}^R a_p \exp \biggl( \frac{2\pi\mathtt{i}}{n} f_p \cdot k \biggr), \quad f_p \in\mathbb{R}^m, \quad k \in \{0,\ldots,n-1 \}^m, \quad n=2^d. $$
(36)

Each plane wave has QTT-ranks one (5), which do not depend on the accuracy and vector size. The QTT-ranks of f = [f(k)] are therefore not larger than R. If the frequencies are integers, f_p ∈ ℤ^m, all intermediate vectors of Algorithm 3 also have QTT-ranks one [48]. Therefore, using plane waves we can construct a signal with prescribed QTT-ranks and prescribed complexity of the QTT-FFT algorithm, which allows us to compare the speed of the QTT-FFT and the standard FFT algorithm for different d and R. The results are given in Fig. 7.
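The rank-one QTT structure of a one-dimensional plane wave can be verified directly: with k = Σ_t k_t 2^t, the exponential factors over the bits, so each QTT core is a 1×2×1 block. A minimal numpy sketch (variable names are ours, for illustration):

```python
import numpy as np

d, f = 10, 37.0                     # grid exponent and frequency (may be real)
n = 2**d

# rank-one QTT cores of exp(2*pi*i*f*k/n): core t holds [1, exp(2*pi*i*f*2^t/n)]
cores = [np.array([1.0, np.exp(2j*np.pi*f*(2**t)/n)]) for t in range(d)]

# reassemble the full vector by Kronecker products, most significant bit first
v = np.ones(1)
for t in reversed(range(d)):
    v = np.kron(v, cores[t])

k = np.arange(n)
assert np.allclose(v, np.exp(2j*np.pi*f*k/n))   # exact rank-one reconstruction
```

A sum of R such rank-one terms has QTT-ranks at most R by subadditivity of the rank, which is the bound used above.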

Fig. 7
QTT-FFT and FFTW time for R plane waves in m dimensions. Graphs 1–3 show the runtime w.r.t. the problem size N = n^m for different R and m. Graph 4 shows the runtime w.r.t. R for m=1 (thick solid lines), m=2 (dashed lines), m=3 (thin gray solid lines)

We see that the QTT-FFT is asymptotically faster than the full-size FFT for all moderate R that we have tested. However, the crossover point, i.e., the minimum size of the transform for which the QTT-FFT outperforms the standard method, depends on R and m. For example, for R=8 the crossover point is N ≈ 2^{20} for m=1, N ≈ 2^{19} for m=2 and N ≈ 2^{18} for m=3. Therefore, the QTT-FFT algorithm performs better for high-dimensional problems than the standard FFT algorithm and is not restricted by the problem size. However, the QTT-FFT is not efficient if R is too large.

By Theorem 6, the complexity of the multi-dimensional QTT-FFT is not larger than \(\mathcal{O}(m d^{2} R^{3})\). Using signals (36) with different R, we can measure the actual runtime w.r.t. R. The results for signals (36) with a_p = 1 and random integer frequencies f_p ∈ ℤ^m are given by the fourth graph in Fig. 7. Surprisingly, the computational time grows quadratically, not cubically, w.r.t. the parameter R for large-size transforms. The time of smaller-size transforms (see N = 2^{12}) tends to a constant for large R, since the QTT-ranks of the vector (36) cannot exceed the QTT-ranks of a generic full-rank vector. This constant level is, however, significantly larger than the time of the corresponding transform of a dense vector, which shows that our current implementation of the QTT-FFT is by far not as optimized as the algorithms from the FFTW library. We also see that for transforms of the same size N = n^m, i.e., with the same log_2 N = m log_2 n = md, the transforms with larger m are faster, since the complexity is linear w.r.t. m and quadratic w.r.t. d.

Since the complexity of the QTT-FFT is defined by the parameters d, m and R of the input vector, the numerical results obtained for plane waves generalize to all vectors whose QTT-ranks are bounded by R at all steps of the Cooley-Tukey algorithm. Therefore, the runtimes given in Fig. 7 can be considered as 'typical' runtimes of the QTT-FFT algorithm for signals characterized by m, n and R.

7.5 QTT-FFT and Sparse Fourier Transform

The QTT representation is an example of a data-sparse representation, i.e., one where the number of representation parameters is asymptotically smaller than the total number of array elements. The sum of plane waves (36) is a particular example of a signal with a data-sparse QTT representation: the QTT-ranks of the sum of R plane waves do not exceed R. If f_p ∈ ℤ^m, this signal has no more than R non-zero Fourier components and is both QTT-sparse and Fourier-sparse.

The sparse Fourier transform problem is the following: given a signal of length n, estimate the R largest-in-magnitude coefficients of its Fourier transform. If we use all n elements of the given vector and the standard FFT algorithm, the answer can be computed trivially in \(\mathcal{O}(n \log n)\) operations, which is the complexity of both the FFT and the sorting algorithm. To reduce the complexity, heuristic algorithms have been developed which estimate the Fourier-sparse representation with a given number of terms R using a subset of samples (vector elements), usually selected with a certain randomization. Sparse Fourier transform algorithms are most efficient for vectors with o(n) non-zero Fourier components. Note that this class of vectors is not described explicitly, just like the class of 'suitable' vectors for the QTT-FFT algorithm. The Fourier-sparsity of a general input vector cannot be predicted before the Fourier transform is computed.
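The trivial dense baseline, for reference (numpy only; argpartition returns the unordered top-R set):

```python
import numpy as np

def top_R_fourier(signal, R):
    """O(n log n) baseline for the sparse Fourier transform problem:
    full FFT, then select the R largest coefficients in magnitude."""
    F = np.fft.fft(signal)
    idx = np.argpartition(np.abs(F), -R)[-R:]   # unordered indices of top R
    return idx, F[idx]
```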

To compare the QTT-FFT with the Sparse Fourier transform, we should first explain that the QTT-FFT actually solves a different problem: given a signal in the data-sparse QTT form, compute its Fourier transform in the same data-sparse format. The similarity that allows us to compare the QTT-FFT approach with the Sparse Fourier transform is the following: the class of QTT-sparse vectors includes all Fourier-sparse vectors, since each signal with exactly R non-zero Fourier coefficients has QTT-ranks bounded by R (but not vice versa). The differences are the following: the QTT-FFT algorithm requires the input to be given already in the QTT format, and the output is again in the QTT format, which is data-sparse, but not sparse. The two missing components are an algorithm that computes the QTT representation of a given signal from a set of samples (TT-cross), and an algorithm that converts the QTT format into a sparse vector (TT-sparse). This is illustrated by the scheme in Fig. 8.
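The stated rank bound is easy to check numerically. The sketch below (our illustrative code, not one of the paper's algorithms) builds a signal with exactly R non-zero Fourier coefficients and measures its QTT-ranks by successive SVDs, in the style of TT-SVD:

```python
import numpy as np

def qtt_ranks(v, d, tol=1e-10):
    """Numerical QTT-ranks of a vector of length 2^d via successive SVDs."""
    ranks, mat = [], v.reshape(1, -1)
    for _ in range(d - 1):
        mat = mat.reshape(mat.shape[0]*2, -1)      # split off one binary mode
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = int((s > tol*s[0]).sum())
        ranks.append(r)
        mat = s[:r, None] * vt[:r]                 # pass the remainder on
    return ranks

d, R = 12, 5
rng = np.random.default_rng(0)
F = np.zeros(2**d, dtype=complex)
F[rng.choice(2**d, size=R, replace=False)] = 1.0   # exactly R Fourier terms
v = np.fft.ifft(F)
print(max(qtt_ranks(v, d)))                        # bounded by R
```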

Fig. 8
Sparse Fourier transform and the QTT-FFT approach

Algorithms that reconstruct the tensor train format from samples of a given array are proposed in [45, 49]. They are based on the maximum-volume principle [11, 13], which selects proper sets of indices in the array for the interpolation procedure. The maximum-volume algorithm is heuristic, adaptive to the signal and originally non-random, but a certain randomness can be included to check the accuracy and/or improve the robustness. The extension of the maximum-volume concept to higher dimensions is still at an early stage, and “TT-cross” algorithms should be further improved, as well as the background theory.

The “TT-sparse” algorithms have not been considered previously, mostly because the data-sparse TT/QTT representations are “good enough” for numerical work and we usually do not need to convert them into the pointwise-sparse format. As a proof of concept, we can propose a very naive algorithm which sparsifies the TT-cores of (1) independently, \(A^{(p)}_{k_{p}} \to\tilde{A}^{(p)}_{k_{p}}\), where

$$ \tilde{A}^{(p)}_{k_p}(i,j) = \begin{cases} A^{(p)}_{k_p}(i,j), & \text{if } |A^{(p)}_{k_p}(i,j)| \geq \mu \max_{k_p} \max_{i,j} |A^{(p)}_{k_p}(i,j)|, \\ 0, & \text{otherwise}, \end{cases} $$
(37)

where μ is the threshold parameter. The sparsified TT format can be easily “decompressed” into a sparse vector.
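A direct implementation of (37) is a few lines; the sketch below assumes the cores are stored as three-dimensional arrays of shape (r_{p-1}, n_p, r_p):

```python
import numpy as np

def tt_sparsify(cores, mu):
    """Naive TT-core sparsification (37): zero every entry whose modulus is
    below mu times the largest entry (over k_p, i, j) of the same core."""
    out = []
    for A in cores:                   # A[i, k_p, j] = A^(p)_{k_p}(i, j)
        thr = mu * np.abs(A).max()
        out.append(np.where(np.abs(A) >= thr, A, 0.0))
    return out
```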

We show how the “TT-cross → QTT-FFT → TT-sparse” scheme works for a particular two-dimensional signal, a sum of five plane waves (36) with unit amplitudes and randomly taken frequencies. Since the frequencies are not integer, the discrete Fourier image of this signal (see Fig. 9(left)) is not sparse, but has strong cusps near the positions of f_p. We apply the TT-ACA algorithm [49] (a DMRG-type TT-cross algorithm) to reconstruct the QTT representation of the signal from a few samples. Then we apply Algorithm 3 and sparsify the DFT image by (37) with μ = 0.4. The result is shown in Fig. 9(right) and gives the correct positions of all five plane wave frequencies. This example shows that the QTT-FFT approach (the three-step scheme) can be helpful for problems where Sparse Fourier transform algorithms are applied.

Fig. 9
(Left) DFT image of a 2D data set with R=5 plane waves on a 2^d×2^d grid, d=9. The logarithm of the amplitude is shown in 256 levels of gray (black = maximum, white = minimum). (Right) The image after sparsification in the QTT format

8 Comparison with Sparse Fourier Transform

8.1 Comparison with Sparse Fourier Transform Algorithms

The history of the Sparse Fourier transform framework can be found in [18]. Since the Sparse Fourier transform is not a central topic of this paper, we will compare our approach only with recent algorithms developed in this field.

8.1.1 RAlSFA

The randomized heuristic algorithm RAlSFA [55] finds a near-optimal R-term Fourier-sparse representation of a signal of length n with probability 1−δ and quasi-optimality parameter α in time \(\operatorname{poly}(R) \log n \log(\frac{1}{\delta})/\alpha^{2}\), where the polynomial \(\operatorname{poly}(R)\) is empirically found to be quadratic. The complexity of the QTT-FFT for such an array is \(\mathcal{O}(R^{3} \log^{2} n)\), which is larger than the RAlSFA complexity by a factor of R log n. The program code is not available in the public domain, and we can compare the running times only using the published information about crossover points with FFTW. The crossover point between the RAlSFA and FFTW algorithms for R=8, some δ and α, and dimensions m=1,2,3 is reported as N ≃ 70000 for 1D transforms, N ≃ 900^2 for 2D transforms and (estimated) N ≃ 210^3 for 3D transforms. The crossover points of the QTT-FFT and FFTW for R=8 and m=1,2,3 are N ≃ 2^{20} for 1D problems, N ≃ 2^{19} for 2D problems and N ≃ 2^{18} for 3D problems, see Fig. 7. If we consider the runtime of the TT-ACA algorithm together with the QTT-FFT, the crossover point for 1D transforms moves to N = 2^{21}, see Fig. 10(left). For m=2,3 the runtime of TT-ACA also does not change the crossover point significantly.

Fig. 10
Runtimes of the TT-ACA + QTT-FFT algorithms (solid lines), the AAFFT algorithm (dashed lines) and FFTW (gray lines) for the plane waves example (36), m=1. (Left) Integer frequencies f_p ∈ ℤ, (right) real frequencies f_p ∈ ℝ, relative accuracy ε = 10^{-2}

We can conclude that our approach is slower than RAlSFA for 1D problems, has almost the same speed for 2D problems and outperforms it in higher dimensions. Another advantage of the QTT-FFT method is that it does not require any choice of parameters, while the parameter tuning in RAlSFA is quite tricky.

8.1.2 sFFT

A nearly optimal Sparse Fourier transform algorithm was recently proposed by the MIT group [18, 19]. The complexity is reported to be \(\mathcal{O}(R \log n \log(n/R))\), which is smaller than the complexity of the QTT-FFT by a factor of about R^2 for n ≫ R. The sFFT algorithm is faster than FFTW for n = 2^{22} and R ≤ 2^{17}. From Fig. 10(left) we see that the “TT-ACA + QTT-FFT” approach is competitive with FFTW for n = 2^{22} only for R ≲ 2^3, i.e., our approach is far less efficient for this problem. The program code for sFFT is currently not available in the public domain.

8.1.3 AAFFT

The combinatorial sublinear Fourier transform algorithm [23] provides the approximation with high probability and has \(\mathcal{O}(R \log^{5} n)\) complexity; the deterministic version requires \(\mathcal{O}(R^{2} \log^{4} n)\) operations. The complexity of the QTT-FFT is \(\mathcal{O}(R^{3} \log^{2}n)\), and the TT-ACA algorithm [49] requires \(\mathcal{O}(R^{3} \log n)\) operations per iteration/sweep. Therefore, we can expect AAFFT to be faster than our approach for larger R.

The program code of the AAFFT algorithm (written in C++) is available in the public domain [22] for the case m=1, which allows us to perform a numerical comparison. Note that the measured runtime includes the time of evaluating the signal at the selected positions (samples), i.e., the signal is not precomputed before the experiment. This allows us to use finer grids with n ≤ 2^{30}, but increases the runtime of the “sampling” part (TT-ACA and the corresponding part of AAFFT) by a factor of R. The results are reported in Fig. 10(left) in terms of CPU times. We see that our approach outperforms AAFFT for small n or very small R. For larger n and R, the AAFFT algorithm is faster than the “TT-ACA + QTT-FFT” scheme for the signal that is a sum of R one-dimensional plane waves (36) with integer frequencies f_p ∈ ℤ. A comparison in terms of the number of samples gives a similar result.

8.2 Comparison for Limited Bandwidth Signals

We have considered the signals (36) with integer frequencies, i.e., the exactly Fourier-sparse case. Such signals are the ‘best examples’ for Sparse Fourier transform algorithms. However, since our approach is applicable to a wider class of signals, it is interesting to provide a comparison also for signals which are not exactly Fourier-sparse. For instance, we consider the same plane waves example (36) with arbitrary real frequencies f_p ∈ ℝ^m. The Fourier images of such signals are not exactly sparse, see, for example, Fig. 9(left). The QTT-ranks of (36) are bounded by the number of terms R, but the QTT-ranks of the Fourier image (as well as the complexity of the QTT-FFT) are larger and depend on the selected accuracy level ε if f_p ∉ ℤ^m.

The number of Fourier terms required to represent the Fourier image of (36) in the sparse format with accuracy ε is o(n) but grows as ε decreases. We found that for signals (36) with random frequencies and unit amplitudes, a Sparse Fourier representation with about 32R terms provides the accuracy ε = 10^{-1}, and 1024R terms provide the accuracy ε = 10^{-2}. The corresponding runtimes for the accuracy ε = 10^{-2} are shown in Fig. 10(right). We see that the QTT-ranks and the runtime of our method grow very slowly as ε decreases, while the runtime of AAFFT is proportional to the number of Fourier terms required for the representation. Therefore, the sum of plane waves (36) with real frequencies is an example of a signal for which our approach computes the approximate Fourier image faster and more accurately than the AAFFT Sparse Fourier algorithm.

Another example is the signal consisting of three components with limited bandwidths,

$$ \hat{f}(\xi) = \exp \biggl(-\frac{\xi^2}{2\sigma^2} \biggr) + \frac{1}{10}\exp \biggl(-\frac{(\xi-\xi_*)^2}{2\sigma^2} \biggr) + \frac{1}{10}\exp \biggl(-\frac{(\xi+\xi_*)^2}{2\sigma^2} \biggr), $$
(38)

where σ = 0.02 and ξ_∗ = 0.4. The Fourier image is shown in Fig. 11(left). This signal models an AM (amplitude modulation) broadcast, where the amplitude of the central carrier signal is one order of magnitude larger than the amplitudes of the sidebands. It is known that the spectrum of the signal belongs to the interval [−1,1], and we can measure the signal in the time domain at the points t_k = hk on a uniform grid of size n = 2^d. Suppose that our goal is to locate the positions of the sidebands in frequency space using a small number of samples of the signal and/or the minimum runtime.
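For reproducibility, the time-domain samples can be generated analytically; under the transform convention f(t) = ∫ f̂(ξ) e^{2πiξt} dξ (an assumption on our side), the signal is a Gaussian envelope with a weak AM modulation:

```python
import numpy as np

SIGMA, XI_STAR = 0.02, 0.4

def f_time(t):
    """Time-domain form of (38) under the assumed convention
    f(t) = int fhat(xi) * exp(2*pi*i*xi*t) dxi."""
    envelope = np.sqrt(2.0*np.pi) * SIGMA * np.exp(-2.0*(np.pi*SIGMA*t)**2)
    return envelope * (1.0 + 0.2*np.cos(2.0*np.pi*XI_STAR*t))  # carrier + sidebands

# samples t_k = h*k on a uniform grid of size n = 2^d; h <= 1/2 suffices
# by Nyquist since the spectrum lies in [-1, 1]
d, h = 20, 0.5
samples = f_time(h * np.arange(2**d))
```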

Fig. 11
Example of a limited bandwidth signal. Left—Fourier image, right—positions of the frequencies of the R-term Fourier-sparse representation computed by AAFFT w.r.t. R

Since the signal has many Fourier components which are almost zero, the problem can naturally be considered within the Sparse Fourier transform framework, i.e., compute the R-term Fourier-sparse representation of the given signal and look at the positions of the computed frequencies. However, the number of non-vanishing Fourier components of such a signal is \(\mathcal{O}(n)\): it is n times the ratio of the bandwidths of the signal components to the whole interval on which the spectrum is considered. This makes the problem particularly difficult for Sparse Fourier transform algorithms and can increase their complexity to \(\mathcal{O}(n)\).

We try to locate the sidebands using the AAFFT algorithm with different R. Since the amplitude of the carrier signal is larger than the amplitudes of the sidebands, for R < R_∗ all components of the obtained R-term representation belong to the carrier signal. In other words, even the presence of the sidebands cannot be detected until we set R = R_∗ for a certain critical number of terms R_∗. This is shown in Fig. 11(right). Note that R_∗ grows with the signal size, since the frequency-domain grid step decreases and more Fourier-sparse components belong to the support of each bandwidth-limited component.

Our approach represents the signal in the QTT format (4), and a large number of non-vanishing components does not imply large QTT-ranks or a large complexity of the proposed method. In the experiment we observe that the QTT-ranks do not grow significantly w.r.t. n, and grow only moderately w.r.t. the accuracy parameter ε. This leads to the square-logarithmic complexity of TT-ACA and QTT-FFT, i.e., \(\mathcal{O}(R^{3} d^{2})\), where R is very moderate. The results are shown in Table 4. The sparse format with R = R_∗ Fourier terms approximates the given signal with relative accuracy ε ≈ 10^{-1} in the Frobenius norm. A much more accurate QTT approximation is possible and can be obtained in sublinear time.

Table 4 Comparison of AAFFT and ‘TT-ACA + QTT-FFT’ for the limited bandwidth signal (38) of length n = 2^d. ‘AAFFT’ columns: R_∗ is the minimal number of Fourier terms that allows detection of the sidebands, ‘time’ and ‘nval’ are the corresponding runtime (in milliseconds) and the number of samples, ε is the relative accuracy of the Fourier-sparse representation with R_∗ terms. ‘TT-ACA + QTT-FFT’ columns: ε is the desired accuracy, R is the maximum effective QTT-rank seen in the computation, ‘time’ and ‘nval’ are the corresponding runtime (in milliseconds) and the number of samples

The limitation considered here is a general feature of the Sparse Fourier transform framework and does not (in our opinion) concern the AAFFT algorithm specifically. We can conclude that for the considered limited bandwidth signal our method outperforms the Sparse Fourier transform algorithms, even for small n.

9 Conclusion and Future Work

We propose algorithms for the approximate Fourier transform of one- and multi-dimensional vectors in the QTT format. The m-dimensional Fourier transform of an n×⋯×n array with n = 2^d has \(\mathcal{O}(m d^{2} R^{3})\) complexity, where R is the maximum QTT-rank of the input vector, the result, and all intermediate vectors of the Cooley-Tukey algorithm. The storage size is bounded by \(\mathcal{O}(m d R^{2})\). The proposed approach is efficient for vectors with moderate R; for such vectors with large n and m it outperforms the \(\mathcal{O}(n^{m} \log n)\) FFT algorithm. The complexity w.r.t. d corresponds to the complexity of the superfast quantum Fourier transform, which requires \(\mathcal{O}(d^{2})\) quantum operations.

Using the QTT format, the 1D FFT of size n = 2^d for vectors with QTT-ranks R ≲ 10 can be computed for n ≤ 2^{60} in less than a second. The 3D FFT for an n×n×n array with about the same QTT-ranks can be computed for n ≤ 2^{20} in one second. This allows the use of very fine grids and thus an accurate computation of the Hartree potential of Gaussian functions with strong cusps and widths in the range from σ = 10^{-1} to 10^{-5} by 3D convolution. We measure the runtime of the algorithm applied to multidimensional functions (m = 1,2,3) with a small number of Fourier coefficients, which can be used to estimate the runtime for vectors with similar m, d and R. Notice that for m > 3 the computation of high-resolution FFT and convolution transforms is practically infeasible without the use of a certain approximation.

Combining the QTT-FFT algorithm with a method that computes the QTT representation from a few elements (samples) of a given vector, we compare it with Sparse Fourier transform algorithms. By numerical examples we show that our approach can be competitive with existing methods for Fourier-sparse signals with randomly distributed components, especially in higher dimensions. Our approach is particularly advantageous for limited bandwidth signals, which are not exactly Fourier-sparse. A more detailed comparison with modern Sparse Fourier transform algorithms, and links between the fast Fourier transform on hyperbolic cross points [7] and the QTT-FFT or QTT cross interpolation schemes, are postponed to future work.

The proposed method can be applied in various scientific computing solvers working with multidimensional data in data-sparse tensor formats. The discrete sine/cosine and other trigonometric transforms can be implemented using the corresponding recursion formulae, following the structure of the QTT-FFT algorithm. We hope that such algorithms can find applications in image and video processing, directly or with the use of the WTT version of the format, following the ideas of [46]. We also hope that the proposed method can help to construct classical models of several quantum algorithms based on the QFT, at least for a certain class of input data.