Abstract
In this paper we consider sparse Fourier transform (SFT) algorithms for approximately computing the best s-term approximation of the discrete Fourier transform (DFT) \({\hat{\mathbf {f}}}\in {{\mathbb {C}}}^N\) of any given input vector \(\mathbf {f}\in {{\mathbb {C}}}^N\) in just \(\left( s \log N\right) ^{{{\mathcal {O}}}(1)}\)-time using only a similarly small number of entries of \(\mathbf {f}\). In particular, we present a deterministic SFT algorithm which is guaranteed to always recover a near best s-term approximation of the DFT of any given input vector \(\mathbf {f}\in {{\mathbb {C}}}^N\) in \({{\mathcal {O}}} \left( s^2 \log ^{\frac{11}{2}} (N) \right) \)-time. Unlike previous deterministic results of this kind, our deterministic result holds for both arbitrary vectors \(\mathbf {f}\in {{\mathbb {C}}}^N\) and vector lengths N. In addition to these deterministic SFT results, we also develop several new publicly available randomized SFT implementations for approximately computing \({\hat{\mathbf {f}}}\) from \(\mathbf {f}\) using the same general techniques. The best of these new implementations is shown to outperform existing discrete sparse Fourier transform methods with respect to both runtime and noise robustness for large vector lengths N.
1 Introduction
Herein we are concerned with the rapid approximation of the discrete Fourier transform \({\hat{\mathbf {f}}}\in {\mathbb {C}}^N\) of a given vector \(\mathbf {f}\in {\mathbb {C}}^N\) for large values of N. Though standard Fast Fourier Transform (FFT) algorithms [5, 8, 28] can accomplish this task in \({\mathcal {O}} \left( N \log N \right) \)-time for arbitrary \(N \in {\mathbb {N}}\), this runtime complexity may still be unnecessarily computationally taxing when N is extremely large. This is particularly true when the vector \({\hat{\mathbf {f}}}\) is approximately s-sparse (i.e., contains only \(s \ll N\) nonzero entries) as in compressive sensing [11] and certain wideband signal processing applications (see, e.g. [24]). Such applications have therefore motivated the development of discrete sparse Fourier transform (DSFT) techniques [13, 14] which are capable of accurately approximating s-sparse DFT vectors \({\hat{\mathbf {f}}}\in {\mathbb {C}}^N\) in just \(s \cdot \log ^{{\mathcal {O}}(1)} N\)-time. When \(s \ll N\) these methods are significantly faster than standard \({\mathcal {O}} \left( N \log N \right) \)-time FFT methods, effectively achieving sublinear o(N) runtime complexities in such cases.
Currently, the most widely used \(s \cdot \log ^{{\mathcal {O}}(1)} N\)-time DSFT methods [12, 15, 22] are randomized algorithms which accurately compute \({\hat{\mathbf {f}}}\) with high probability when given sampling access to \(\mathbf {f}\). Many existing sparse Fourier transforms which are entirely deterministic [6, 19, 21, 25, 29], on the other hand, are perhaps best described as unequally spaced sparse Fourier transform (USSFT) methods in that they approximately compute \({\hat{\mathbf {f}}}\), with its entries \(\hat{f}_\omega \) indexed by the set \(B := \left( -\left\lceil \frac{N}{2}\right\rceil ,\left\lfloor \frac{N}{2}\right\rfloor \right] \cap {\mathbb {Z}}\), by sampling its associated trigonometric polynomial
at a collection of \(m \ll N\) specially constructed unequally spaced points \(x_1, \dots , x_m \in [-\pi , \pi ]\). These methods have no probability of failing to recover s-sparse \({\hat{\mathbf {f}}}\), but cannot accurately compute the DFT \({\hat{\mathbf {f}}}\) of an arbitrary given vector \(\mathbf {f}\in {\mathbb {C}}^N\) due to their need for unequally spaced function evaluations of f of the form \(\left\{ f(x_k) \right\} ^m_{k=1}\).
This state of affairs has left a gap in the theory of DSFT methods. Existing deterministic sparse Fourier transform algorithms can currently compute the s-sparse DFT \({\hat{\mathbf {f}}}\) of a given vector \(\mathbf {f}\in {\mathbb {C}}^N\) efficiently only if either (i) N is a power of a small prime [26], or else (ii) \(\hat{f}_\omega = 0\) for all \(\omega \in B\) with \(|\omega | > N/4\) [19, 20]. In this paper we fill this gap by developing a new, entirely deterministic DSFT algorithm which is always guaranteed to accurately approximate any (nearly) s-sparse \({\hat{\mathbf {f}}}\in {\mathbb {C}}^N\) of any length N when given access only to \(\mathbf {f}\in {\mathbb {C}}^N\). In addition, the method used to develop this new deterministic DSFT algorithm is general enough that it can be applied to any fast and noise robust USSFT method of the type mentioned above (be it deterministic or randomized) in order to yield a new fast and robust DSFT algorithm. As a result, we are also able to use the fastest of the currently existing USSFT methods [4, 6, 17, 19, 21, 25, 29] in order to create new publicly available DSFT implementations herein which are both faster and more robust to noise than currently existing noise robust DSFT methods for large N.
More generally, we emphasize that the techniques utilized below free developers of SFT methods to design more general USSFT methods which sample the trigonometric polynomial f above at any points \(\left\{ x_k \right\} ^m_{k=1} \subset [-\pi ,\pi ]\) of their choosing when attempting to create better DSFT algorithms in the future. Indeed, the techniques herein provide a relatively simple means of translating any future fast and robust USSFT algorithms into (still fast) DSFT algorithms.
1.1 Theoretical Results
Herein we focus on rapidly producing near best s-term approximations of \({\hat{\mathbf {f}}}\) of the type usually considered in compressive sensing [7]. Let \({\hat{\mathbf {f}}}_{s}^{\mathrm{opt}} \in {\mathbb {C}}^N\) denote an optimal s-term approximation to \({\hat{\mathbf {f}}}\in {\mathbb {C}}^N\). That is, let \({\hat{\mathbf {f}}}_{s}^{\mathrm{opt}}\) preserve s of the largest magnitude entries of \({\hat{\mathbf {f}}}\) while setting the rest of its \(N-s\) smallest magnitude entries to 0. The following DSFT theorem is proven below.
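Concretely, \({\hat{\mathbf {f}}}_{s}^{\mathrm{opt}}\) can be formed by sorting indices by entry magnitude. The following minimal Python sketch illustrates the definition (the function name and the smaller-index tie-breaking rule are our own illustrative choices, not the paper's):

```python
def best_s_term(fhat, s):
    """Optimal s-term approximation: keep the s largest-magnitude entries
    of fhat and set the remaining N - s entries to zero.  Ties are broken
    in favor of the smaller index."""
    order = sorted(range(len(fhat)), key=lambda i: (-abs(fhat[i]), i))
    keep = set(order[:s])
    return [fhat[i] if i in keep else 0 for i in range(len(fhat))]

# keep the two largest-magnitude entries of a length-5 vector
approx = best_s_term([0.1, -3.0, 0.5j, 2.0, 0.0], 2)
```

Any such vector minimizes \(\Vert {\hat{\mathbf {f}}} - \mathbf {v} \Vert \) over all s-sparse \(\mathbf {v}\), in any \(\ell ^p\) norm.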
Theorem 1
Let \(N\in {\mathbb {N}}\), \(s\in [2,N]\cap {\mathbb {N}}\), \(1\le r \le \frac{N}{36}\), and \(\mathbf {f}\in {\mathbb {C}}^N\). There exists an algorithm that will always deterministically return an s-sparse vector \(\mathbf {v} \in {\mathbb {C}}^{N}\) satisfying
in just \({\mathcal {O}} \left( \frac{ s^2\cdot r^{\frac{3}{2}} \cdot \log ^{\frac{11}{2}} (N)}{\log (s)} \right) \)-time when given access to \(\mathbf {f}\). If returning an s-sparse vector \(\mathbf {v}\in {\mathbb {C}}^{N}\) that satisfies (1) for each \(\mathbf {f}\) with probability at least \((1-p) \in [2/3,1)\) is sufficient, a Monte Carlo algorithm also exists which will do so in just \( {\mathcal {O}} \left( s\cdot r^{\frac{3}{2}} \cdot \log ^\frac{9}{2}(N)\cdot \log \left( \frac{N}{p}\right) \right) \)-time.
Note the quadratic-in-s runtime dependence of the deterministic algorithm mentioned by Theorem 1. It turns out that there is a close relationship between the sampling points \(\left\{ x_k \right\} ^m_{k=1}\) used by the deterministic USSFT methods [21] employed as part of the proof of Theorem 1 and the construction of explicit (deterministic) RIP matrices (see [1, 18] for details). As a result, reducing the quadratic dependence on s of the \(s^2 \log ^{{\mathcal {O}}(1)} N\)-runtime complexity of the deterministic DSFT algorithms referred to by Theorem 1 while still satisfying the error guarantee (1) is likely at least as difficult as constructing explicit deterministic RIP matrices with fewer than \(s^2 \log ^{{\mathcal {O}}(1)} N\) rows by subsampling rows from an \(N \times N\) DFT matrix. Unfortunately, explicitly constructing RIP matrices of this type is known to be a very difficult problem [11]. This means that constructing an entirely deterministic DSFT algorithm which is both guaranteed to always satisfy (1), and which also always runs in \(s \log ^{{\mathcal {O}}(1)} N\)-time, is also likely to be extremely difficult to achieve at present.
The remainder of this paper is organized as follows: In Sect. 2 we set up notation and establish necessary background results. Then, in Sect. 3, we describe our method for converting noise robust USSFT methods into DSFT methods. The resulting approach is summarized in Algorithm 1 therein. Next, Theorem 1 is proven in Sect. 4 using the intermediary results of Sects. 2 and 3. An empirical evaluation of several new DSFT algorithms resulting from our proposed approach is then performed in Sect. 5. The paper is finally concluded with a few additional comments in Sect. 6.
2 Notation and Setup
The Fourier series representation of a \(2\pi \hbox {-periodic}\) function \(f:\left[ -\pi ,\pi \right] \rightarrow {\mathbb {C}}\) will be denoted by
with its Fourier coefficients, \(\widehat{f}_{\omega }\), given by
We let \(\widehat{f}:=\left\{ \widehat{f}_{\omega }\right\} _{\omega \in {\mathbb {Z}}}\) represent the infinite sequence of all Fourier coefficients of f below. Given two \(2\pi \)-periodic functions f and g we define the convolution of f and g at \(x \in {\mathbb {R}}\) to be
This definition, coupled with the definition of the Fourier transform, yields the well-known equality
We may also write \(\widehat{f*g}=\widehat{f}\circ \widehat{g}\) where \(\circ \) denotes the Hadamard product.
For any \(N\in {\mathbb {N}}\), define the discrete Fourier transform (DFT) matrix \(F\in {\mathbb {C}}^{N\times N}\) by
and let \(B:=\left( -\left\lceil \frac{N}{2}\right\rceil ,\left\lfloor \frac{N}{2}\right\rfloor \right] \cap {\mathbb {Z}}\) be a set of N integer frequencies centered at 0. Furthermore, let \(\mathbf {f}\in {\mathbb {C}}^N\) denote the vector of equally spaced samples from f whose entries are given by
for \(j = 0, \dots , N-1\). One can now see that if
then
where \({\hat{\mathbf {f}}}\in {\mathbb {C}}^{N}\) denotes the restriction of \(\widehat{f}\) to the indices in B, collected in vector form. More generally, bolded lower case letters will always represent vectors in \({\mathbb {C}}^{N}\) below.
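For concreteness, the correspondence between the samples \(\mathbf {f}\) and the centered coefficient vector \({\hat{\mathbf {f}}}\) can be checked numerically. The normalization used below (coefficients computed as \(\frac{1}{N}\sum _j f(y_j)\, \mathbb {e}^{-\mathbb {i}\omega y_j}\)) is our assumption, chosen so that sampling a trigonometric polynomial with frequencies in B round-trips exactly; the paper's DFT matrix F may differ by a constant factor.

```python
import math, cmath

N = 9
M = (N - 1)//2
B = list(range(-M, M + 1))                           # centered frequency set (N odd)
y = [-math.pi + 2*math.pi*j/N for j in range(N)]     # equispaced sample points

# a trigonometric polynomial with a few known coefficients supported in B
fhat_true = {-3: 0.5j, 0: 1.0, 2: -2.0}
f = [sum(c*cmath.exp(1j*w*t) for w, c in fhat_true.items()) for t in y]

# centered DFT (assumed 1/N normalization) recovers the coefficients exactly
fhat = {w: sum(f[j]*cmath.exp(-1j*w*y[j]) for j in range(N))/N for w in B}
```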
As mentioned above, \(\widehat{f}:=\left\{ \widehat{f}_{\omega }\right\} _{\omega \in {\mathbb {Z}}}\) is the infinite sequence of all Fourier coefficients of f. For any subset \(S \subseteq {\mathbb {Z}}\) we let \(\widehat{f}\vert _{S}\in {\mathbb {C}}^{{\mathbb {Z}}}\) be the sequence \(\widehat{f}\) restricted to the subset S, so that \(\widehat{f}\vert _{S}\) has terms \(\left( \widehat{f}\vert _{S} \right) _\omega = \widehat{f}_{\omega }\) for all \(\omega \in S\), and \(\left( \widehat{f}\vert _{S} \right) _\omega = 0\) for all \(\omega \in S^{c}:={\mathbb {Z}}\setminus S\). Note that \({\hat{\mathbf {f}}}\) above is exactly \(\widehat{f}\vert _{B}\) excluding its zero terms for all \(\omega \notin B\). Thus, given any subset \(S\subseteq B\), we let \({\hat{\mathbf {f}}}\vert _{S}\in {\mathbb {C}}^{N}\) be the vector \({\hat{\mathbf {f}}}\) restricted to the set S in an analogous fashion. That is, for \(S \subseteq B\) we will have \(\left( {\hat{\mathbf {f}}}\vert _{S} \right) _\omega = {\hat{\mathbf {f}}}_\omega \) for all \(\omega \in S\), and \(\left( {\hat{\mathbf {f}}}\vert _{S} \right) _\omega = 0\) for all \(\omega \in B\setminus S\).
Given the sequence \(\widehat{f}\in {\mathbb {C}}^{{\mathbb {Z}}}\) and \(s\le N\), we denote by \(R_{s}^{\mathrm{opt}}\left( \widehat{f}\right) \) a subset of B containing s of the most energetic frequencies of f; that is
where the frequencies \(\omega _j \in B\) are ordered such that
Here, if desired, one may break ties by also requiring, e.g., that \(\omega _j < \omega _k\) for all \(j < k\) with \(\left| \widehat{f}_{\omega _{j}}\right| =\left| \widehat{f}_{\omega _{k}}\right| \). We will then define \(f_{s}^{\mathrm{opt}}:\left[ -\pi ,\pi \right] \rightarrow {\mathbb {C}}\) based on \(R_{s}^{\mathrm{opt}}\left( \widehat{f}\right) \) by
Any such \(2 \pi \)-periodic function \(f_{s}^{\mathrm{opt}}\) will be referred to as an optimal s-term approximation to f. Similarly, we also define both \(\widehat{f}_{s}^{\mathrm{opt}} \in {\mathbb {C}}^{{\mathbb {Z}}}\) and \({\hat{\mathbf {f}}}_{s}^{\mathrm{opt}} \in {\mathbb {C}}^{N}\) to be \(\widehat{f}\vert _{R_{s}^{\mathrm{opt}}\left( \widehat{f}\right) }\) and \({\hat{\mathbf {f}}}\vert _{R_{s}^{\mathrm{opt}}\left( \widehat{f}\right) }\), respectively.
2.1 Periodized Gaussians
In the sections that follow the \(2\pi \hbox {-periodic}\) Gaussian \(g:\left[ -\pi ,\pi \right] \rightarrow {\mathbb {R}}^{+}\) defined by
with \(c_1 \in {\mathbb {R}}^+\) will play a special role. The following lemmas recall several useful facts concerning both its decay, and its Fourier series coefficients.
Lemma 1
The \(2\pi \hbox {-periodic}\) Gaussian \(g:\left[ -\pi ,\pi \right] \rightarrow {\mathbb {R}}^{+}\) has
for all \(x \in \left[ -\pi ,\pi \right] \).
Lemma 2
The \(2\pi \hbox {-periodic}\) Gaussian \(g:\left[ -\pi ,\pi \right] \rightarrow {\mathbb {R}}^{+}\) has

$$\begin{aligned} \widehat{g}_{\omega }=\frac{1}{\sqrt{2\pi }}~\mathbb {e}^{-\frac{c_{1}^{2}\omega ^{2}}{2}} \end{aligned}$$

for all \(\omega \in {\mathbb {Z}}\). Thus, \(\widehat{g}=\left\{ \widehat{g}_{\omega }\right\} _{\omega \in {\mathbb {Z}}}\in \ell ^{2}\) decreases monotonically as \(|\omega |\) increases, and also has \(\Vert \widehat{g} \Vert _{\infty } = \frac{1}{\sqrt{2 \pi }}\).
Lemma 3
Choose any \(\tau \in \left( 0, \frac{1}{\sqrt{2\pi }} \right) \), \(\alpha \in \left[ 1, \frac{N}{\sqrt{\ln N}} \right] \), and \(\beta \in \left( 0 , \alpha \sqrt{\frac{\ln \left( 1/\tau \sqrt{2\pi } \right) }{2}} ~\right] \). Let \(c_1 = \frac{\beta \sqrt{\ln N}}{N}\) in the definition of the periodic Gaussian g from (3). Then \(\widehat{g}_{\omega } \in \left[ \tau , \frac{1}{\sqrt{2\pi }} \right] \) for all \(\omega \in {\mathbb {Z}}\) with \(|\omega | \le \Bigl \lceil \frac{N}{\alpha \sqrt{\ln N}}\Bigr \rceil \).
The proofs of Lemmas 1, 2, and 3 are included in Appendix B for the sake of completeness. Intuitively, we will utilize the periodic function g from (3) as a bandpass filter below. Looking at Lemma 3 in this context we can see that its parameter \(\tau \) will control the effect of \(\widehat{g}\) on the frequency passband defined by its parameter \(\alpha \). Deciding on the two parameters \(\tau , \alpha \) then constrains \(\beta \) which, in turn, fixes the periodic Gaussian g by determining its constant coefficient \(c_1\). As we shall see, the parameter \(\beta \) will also determine the speed and accuracy with which we can approximately sample (i.e., evaluate) the function \(f *g\). For this reason it will become important to properly balance these parameters against one another in subsequent sections.
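The parameter bookkeeping in Lemma 3 can be sanity checked numerically. The sketch below assumes the Gaussian-decay coefficient form \(\widehat{g}_{\omega } = \mathbb {e}^{-c_1^2 \omega ^2 / 2} / \sqrt{2\pi }\), which is consistent with the monotonicity and the value \(\Vert \widehat{g} \Vert _{\infty } = \frac{1}{\sqrt{2\pi }}\) stated in Lemma 2:

```python
import math

# choose tau and alpha, then take beta at its allowed maximum and set
# c1 = beta*sqrt(ln N)/N as in Lemma 3
tau, N, alpha = 0.01, 1024, 4.0
beta = alpha*math.sqrt(math.log(1.0/(tau*math.sqrt(2*math.pi)))/2.0)
c1 = beta*math.sqrt(math.log(N))/N

def ghat(w):
    # assumed closed form of the periodized Gaussian's Fourier coefficients
    return math.exp(-(c1*w)**2/2)/math.sqrt(2*math.pi)

# the passband claimed by Lemma 3: all |w| up to ceil(N/(alpha*sqrt(ln N)))
band = math.ceil(N/(alpha*math.sqrt(math.log(N))))
in_range = all(tau <= ghat(w) <= 1/math.sqrt(2*math.pi) for w in range(-band, band + 1))
```

Here a larger \(\alpha \) narrows the passband while permitting a larger \(\beta \), i.e., a narrower (and hence cheaper to truncate) Gaussian in space.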
2.2 On the Robustness of the SFTs Proposed in [21]
The sparse Fourier transforms presented in [21] include both deterministic and randomized methods for approximately computing the Fourier series coefficients of a given \(2 \pi \)-periodic function f from its evaluations at m-points \(\left\{ x_k \right\} ^m_{k=1} \subset [-\pi , \pi ]\). The following results describe how accurate these algorithms will be when they are only given approximate evaluations of f at these points instead. These results are necessary because we will want to execute the SFTs developed in [21] on convolutions of the form \(f *g\) below, but will only be able to approximately compute their values at each of the required points \(x_1, \dots , x_m \in [-\pi ,\pi ]\).
Lemma 4
Let \(s, \epsilon ^{-1} \in {\mathbb {N}} \setminus \{ 1 \}\) with \((s/\epsilon ) \ge 2\), and \(\mathbf {n}\in {\mathbb {C}}^m\) be an arbitrary noise vector. There exists a set of m points \(\left\{ x_k \right\} ^m_{k=1} \subset [-\pi , \pi ]\) such that Algorithm 3 on page 72 of [21], when given access to the corrupted samples \(\left\{ f(x_k) + n_k \right\} ^m_{k=1}\), will identify a subset \(S \subseteq B\) which is guaranteed to contain all \(\omega \in B\) with
Furthermore, every \(\omega \in S\) returned by Algorithm 3 will also have an associated Fourier series coefficient estimate \(z_{\omega } \in {\mathbb {C}}\) which is guaranteed to have
Both the number of required samples, m, and Algorithm 3’s operation count are
If succeeding with probability \((1-\delta ) \in [2/3,1)\) is sufficient, and \((s/\epsilon ) \ge 2\), the Monte Carlo variant of Algorithm 3 referred to by Corollary 4 on page 74 of [21] may be used. This Monte Carlo variant reads only a randomly chosen subset of the noisy samples utilized by the deterministic algorithm,
yet it still outputs a subset \(S \subseteq B\) which is guaranteed to simultaneously satisfy both of the following properties with probability at least \(1-\delta \):
-
(i)
S will contain all \(\omega \in B\) satisfying (4), and
-
(ii)
all \(\omega \in S\) will have an associated coefficient estimate \(z_{\omega } \in {\mathbb {C}}\) satisfying (5).
Finally, both this Monte Carlo variant’s number of required samples, \(\tilde{m}\), as well as its operation count will also always be
Using the preceding lemma one can easily prove the following noise robust variant of Theorem 7 (and Corollary 4) from §5 of [21]. The proofs of both results are outlined in Appendix C for the sake of completeness.
Theorem 2
Suppose \(f: [-\pi ,\pi ] \rightarrow {\mathbb {C}}\) has \(\widehat{f} \in \ell ^1 \cap \ell ^2\). Let \(s, \epsilon ^{-1} \in {\mathbb {N}} \setminus \{ 1 \}\) with \((s/\epsilon ) \ge 2\), and \(\mathbf {n}\in {\mathbb {C}}^m\) be an arbitrary noise vector. Then, there exists a set of m points \(\left\{ x_k \right\} ^m_{k=1} \subset [-\pi , \pi ]\) together with a simple deterministic algorithm \({\mathcal {A}}: {\mathbb {C}}^m \rightarrow {\mathbb {C}}^{4s}\) such that \({\mathcal {A}} \left( \left\{ f(x_k) + n_k \right\} ^m_{k=1} \right) \) is always guaranteed to output (the nonzero coefficients of) a degree \(\le N/2\) trigonometric polynomial \(y_s: [-\pi , \pi ] \rightarrow {\mathbb {C}}\) satisfying
Both the number of required samples, m, and the algorithm’s operation count are always
If succeeding with probability \((1-\delta ) \in [2/3,1)\) is sufficient, and \((s/\epsilon ) \ge 2\), a Monte Carlo variant of the deterministic algorithm may be used. This Monte Carlo variant reads only a randomly chosen subset of the noisy samples utilized by the deterministic algorithm,
yet it still outputs (the nonzero coefficients of) a degree \(\le N/2\) trigonometric polynomial, \(y_s: [-\pi , \pi ] \rightarrow {\mathbb {C}}\), that satisfies (8) with probability at least \(1-\delta \). Both its number of required samples, \(\tilde{m}\), as well as its operation count will always be
We now have the necessary prerequisites in order to discuss our general strategy for constructing several new fully discrete SFTs.
3 Description of the Proposed Approach
In this section we assume that we have access to an SFT algorithm \({\mathcal {A}}\) which requires m function evaluations of a \(2 \pi \)-periodic function \(f: [-\pi , \pi ] \rightarrow {\mathbb {C}}\) in order to produce an s-sparse approximation to \(\widehat{f}\). For any nonadaptive SFT algorithm \({\mathcal {A}}\) the m points \(\left\{ x_k \right\} ^m_{k=1} \subset [-\pi , \pi ]\) at which \({\mathcal {A}}\) needs to evaluate f can be determined before \({\mathcal {A}}\) is actually executed. As a result, the function evaluations \(\left\{ f(x_k) \right\} ^m_{k=1}\) required by \({\mathcal {A}}\) can also be computed before \({\mathcal {A}}\) is ever run. Indeed, if the SFT algorithm \({\mathcal {A}}\) is nonadaptive, stable, and robust to noise it suffices to approximate the function evaluations \(\left\{ f(x_k) \right\} ^m_{k=1}\) required by \({\mathcal {A}}\) before it is executed. These simple ideas form the basis for the proposed computational approach outlined in Algorithm 1.
The objective of Algorithm 1 is to use a nonadaptive and noise robust SFT algorithm \({\mathcal {A}}\) which requires off-grid function evaluations in order to approximately compute the DFT of a given vector \(\mathbf {f}\in {\mathbb {C}}^N\), \({\hat{\mathbf {f}}}= F \mathbf {f}\). Note that computing \({\hat{\mathbf {f}}}\) is equivalent to computing the Fourier series coefficients of the degree N trigonometric interpolant of \(\mathbf {f}\). Hereafter the \(2 \pi \)-periodic function \(f: [-\pi , \pi ] \rightarrow {\mathbb {C}}\) under consideration will always be this degree N trigonometric interpolant of \(\mathbf {f}\). Our objective then becomes to approximately compute \(\widehat{f}\) using \({\mathcal {A}}\). Unfortunately, our given input vector \(\mathbf {f}\) only contains equally spaced function evaluations of f, and so does not actually contain the function evaluations \(\left\{ f(x_k) \right\} ^m_{k=1}\) required by \({\mathcal {A}}\). As a consequence, we are forced to try to interpolate these required function evaluations \(\left\{ f(x_k) \right\} ^m_{k=1}\) from the available equally spaced function evaluations \(\mathbf {f}\).
Directly interpolating the required function evaluations \(\left\{ f(x_k) \right\} ^m_{k=1}\) from \(\mathbf {f}\) for an arbitrary degree N trigonometric polynomial f using classical techniques appears to be either too inaccurate, or else too slow to work well in our setting. As a result, Algorithm 1 follows the example of successful nonequispaced fast Fourier transform (NFFT) methods (see, e.g. [2, 9, 10, 23, 30]) and instead uses \(\mathbf {f}\) to rapidly approximate samples from the convolution of the unknown trigonometric polynomial f with (several modulations of) a known filter function g. Thankfully, all of the evaluations \(\left\{ (g*f)(x_k) \right\} ^m_{k=1}\) can be approximated very accurately using only the data in \(\mathbf {f}\) in just \({\mathcal {O}}(m \log N)\)-time when g is chosen carefully enough (see Sect. 3.1 below). The given SFT algorithm \({\mathcal {A}}\) is then used to approximate the Fourier coefficients of \(g*f\) for each modulation of g using these approximate evaluations. Finally, \({\hat{\mathbf {f}}}\) is approximated using the recovered sparse approximation for each \(\widehat{g*f}\) combined with our a priori knowledge of \(\widehat{g}\).
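The whole pipeline can be illustrated end-to-end on a toy example. In the sketch below the USSFT black box \({\mathcal {A}}\) is replaced, purely for illustration, by a dense DFT of the filtered samples (so nothing here is sublinear-time), and the filter formulas, quadrature weights, and normalizations are our own assumptions rather than the paper's exact choices:

```python
import math, cmath

N = 101; M = (N - 1)//2; s = 3
c1 = 6*math.sqrt(math.log(N))/N                       # assumed filter width
y = [-math.pi + 2*math.pi*j/N for j in range(N)]

def g(x):
    # assumed periodized Gaussian filter; three periodizations suffice here
    return sum(math.exp(-(x - 2*math.pi*n)**2/(2*c1**2)) for n in (-1, 0, 1))/c1

def ghat(w):
    # assumed Fourier coefficients of g (our a priori knowledge of the filter)
    return math.exp(-(c1*w)**2/2)/math.sqrt(2*math.pi)

true = {5: 2.0, -12: 1.0j, 20: -0.7}                  # sparse test spectrum
f = [sum(c*cmath.exp(1j*w*t) for w, c in true.items()) for t in y]

# Step 1: approximate (g*f) at the needed points from the equispaced data via
# a truncated convolution (weight 1/N assumes (f*g)(x) = (1/2pi) * integral)
kappa = 15
gf = []
for k in range(N):
    acc = 0j
    for off in range(-kappa, kappa + 1):
        j = (k + off) % N
        acc += f[j]*g(y[k] - y[j])
    gf.append(acc/N)

# Step 2: run the "SFT" on the filtered samples (dense DFT stand-in for A)
coeff = {w: sum(gf[j]*cmath.exp(-1j*w*y[j]) for j in range(N))/N
         for w in range(-M, M + 1)}

# Step 3: undo the filter on its passband and keep the s largest entries
est = {w: coeff[w]/ghat(w) for w in coeff if ghat(w) > 1e-3}
top = dict(sorted(est.items(), key=lambda kv: -abs(kv[1]))[:s])
```

In the actual algorithm, Step 1 is only carried out at the m sampling points requested by \({\mathcal {A}}\) and Step 2 runs \({\mathcal {A}}\) itself, so the total work stays sublinear in N.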
Next, in Sect. 3.1, explicit bounds are developed which characterize the runtime required to accurately approximate arbitrary samples of \(f*g\) using only a few entries of \(\mathbf {f}\). The attentive reader may notice that the main theorem of that section (Theorem 4) bears some resemblance to state-of-the-art NFFT error bounds (see, e.g., Steidl's Theorem 3.1 in [30]) in that it utilizes the properties of truncated convolutions with periodized Gaussians in order to obtain error bounds which decay exponentially in the number of truncated convolution terms used per function evaluation. It is important to note, however, that the SFT methods considered herein face several crucial complicating constraints which require such NFFT techniques to be substantially overhauled before they may be fruitfully employed in our setting. Chief among these complications is that \(\Omega (N)\)-time NFFT methods for evaluating trigonometric polynomials at nonequispaced points, along with their attendant error analysis, effectively assume that \({\hat{\mathbf {f}}}\) is already known (or, at least, that computing it in FFT-time is an acceptable computational cost). In the case of SFTs this is not true: our main objective is precisely to approximate \({\hat{\mathbf {f}}}\) much more quickly than an FFT can by reading only a tiny sublinear-in-N fraction of the entries of \(\mathbf {f}\). As a result, unlike NFFT methods, our analysis must focus on rapidly approximating values of \(f *g\) instead of f itself, using a Gaussian g whose Fourier transform \(\widehat{g}\) still permits the rapid and accurate application of SFT techniques in Sect. 4, where we prove the main result of the paper (Theorem 1).
3.1 Rapidly and Accurately Evaluating \(f*g\)
In this section we will carefully consider the approximation of \(\left( f*g\right) \left( x\right) \) by a severely truncated version of the semi-discrete convolution sum
for any given value of \(x \in [-\pi , \pi ]\). Our goal is to determine exactly how many terms of this finite sum we actually need in order to obtain an accurate approximation of \(f*g\) at an arbitrary x-value. More specifically, we aim to use as few terms from this sum as possible while still ensuring, e.g., an approximation error of size \({\mathcal {O}}(N^{-2})\).
Without loss of generality, let us assume that \(N=2M+1\) is odd; this allows us to express B, the set of N Fourier modes about zero, as
In the lemmas and theorems below the function \(f:\left[ -\pi ,\pi \right] \rightarrow {\mathbb {C}}\) will always denote a degree-N trigonometric polynomial of the form
Furthermore, g will always denote the periodic Gaussian as defined above in (3). Finally, we will also make use of the Dirichlet kernel \(D_{M}:{\mathbb {R}}\rightarrow {\mathbb {C}}\), defined by
The relationship between trigonometric polynomials such as f and the Dirichlet kernel \(D_{M}\) is the subject of the following lemma.
Lemma 5
Let \(h: [-\pi , \pi ] \rightarrow {\mathbb {C}}\) have \(\widehat{h}_{\omega } = 0\) for all \(\omega \notin B\), and define the set of points \(\left\{ y_{j}\right\} _{j=0}^{2M}=\left\{ -\pi +\frac{2\pi j}{N} \right\} _{j=0}^{2M}\). Then,
holds for all \(x\in \left[ -\pi ,\pi \right] \).
Proof
By the definition of \(D_{M}\), we trivially have \(2\pi \left( \widehat{D_{M}} \right) _{\omega }=\chi _{B}\left( \omega \right) \) for all \(\omega \in {\mathbb {Z}}\). Thus,
where, as before, \(\circ \) denotes the Hadamard product, and \(*\) denotes convolution. This yields \(h\left( x\right) =2\pi \left( h*D_{M}\right) \left( x\right) \) and so establishes the first equality above. To establish the second equality above, recall from (2) that for any \(\omega \in B\) we will have
since h is a trigonometric polynomial. Thus, given \(x\in \left[ -\pi ,\pi \right] \) one has
We now have the desired result. \(\square \)
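Lemma 5's interpolation identity is easy to verify numerically. The sketch below assumes the Dirichlet-kernel normalization \(D_M(x) = \frac{1}{2\pi }\sum _{|\omega | \le M} \mathbb {e}^{\mathbb {i}\omega x}\), chosen so that \(2\pi \left( \widehat{D_{M}} \right) _{\omega }=\chi _{B}\left( \omega \right) \) as used in the proof:

```python
import math, cmath

N = 7; M = (N - 1)//2
y = [-math.pi + 2*math.pi*j/N for j in range(N)]

def D(x):
    # Dirichlet kernel normalized so that 2*pi*(D_M)^hat_w = chi_B(w)
    return sum(cmath.exp(1j*w*x) for w in range(-M, M + 1)).real/(2*math.pi)

def h(x):
    # an arbitrary trigonometric polynomial bandlimited to B
    return 1.5 + cmath.exp(2j*x) - 0.25j*cmath.exp(-3j*x)

# h(x) = (2*pi/N) * sum_j h(y_j) * D_M(x - y_j) should hold for every x
x = 0.4567
interp = (2*math.pi/N)*sum(h(t)*D(x - t) for t in y)
```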
We can now write a formula for \(g*f\) which depends on only N evaluations of f in \([-\pi , \pi ]\).
Lemma 6
Given the set of equally spaced points \(\left\{ y_{j}\right\} ^{2M}_{j = 0}=\left\{ -\pi +\frac{2\pi j}{N} \right\} _{j=0}^{2M}\) one has that
for all \(x\in \left[ -\pi ,\pi \right] \).
Proof
By Lemma 5, we have
The last equality holds after a change of variables since g and \(D_{M}\) are both \(2\pi \hbox {-periodic}\). \(\square \)
The next two lemmas will help us bound the error produced by discretizing the integral weights present in the finite sum provided by Lemma 6 above. More specifically, they will ultimately allow us to approximate the sum in Lemma 6 by the sum in (11).
Lemma 7
Let \(x\in \left[ -\pi ,\pi \right] \) and \(y_{j} = -\pi +\frac{2\pi j}{N}\) for some \(j = 0, \dots , 2M\). Then,
Proof
Recalling that \(2\pi \left( \widehat{D_{M}} \right) _{\omega }=\chi _{B}\left( \omega \right) \) for all \(\omega \in {\mathbb {Z}}\) we have that
\(\square \)
Lemma 8
Denote \(I\left( a\right) :=\int _{-a}^{a}\mathbb {e}^{-x^{2}}dx\) for \(a>0\); then

$$\begin{aligned} \pi \left( 1-\mathbb {e}^{-a^{2}}\right) ~\le ~ I^{2}\left( a\right) ~\le ~\pi \left( 1-\mathbb {e}^{-2a^{2}}\right) . \end{aligned}$$
Proof
Let \(a>0\) and observe that
The first equality holds by Fubini's theorem, and the inequality follows simply by integrating a positive function over a disk of radius a as opposed to a square of side length 2a. A similar argument yields the upper bound. \(\square \)
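Lemma 8's geometric comparison (a disk of radius a inscribed in the square, and a disk of radius \(\sqrt{2}\,a\) circumscribing it) yields the two-sided bound \(\pi (1 - \mathbb {e}^{-a^2}) \le I^2(a) \le \pi (1 - \mathbb {e}^{-2a^2})\), which the following sketch checks by direct quadrature:

```python
import math

def I(a, n=20000):
    # composite trapezoid rule for the integral of exp(-x^2) over [-a, a]
    h = 2*a/n
    total = math.exp(-a*a)                       # the two half-weight endpoints
    total += sum(math.exp(-(-a + k*h)**2) for k in range(1, n))
    return total*h

checks = {a: (math.pi*(1 - math.exp(-a*a)), I(a)**2, math.pi*(1 - math.exp(-2*a*a)))
          for a in (0.5, 1.0, 2.0)}
```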
We are now ready to bound the difference between the integral weights present in the finite sum provided by Lemma 6, and the \(g\left( x - y_j \right) \)-weights present in the sum (11).
Lemma 9
Choose any \(\tau \in \left( 0,\frac{1}{\sqrt{2\pi }}\right) \), \(\alpha \in \left[ 1,\frac{N}{\sqrt{\ln N}}\right] \), and \(\beta \in \left( 0,\alpha \sqrt{\frac{\ln \left( 1/\tau \sqrt{2\pi }\right) }{2}}~\right] \). Let \(c_{1}=\frac{\beta \sqrt{\ln N}}{N}\) in the definition of the periodic Gaussian g so that
Then for all \(x\in \left[ -\pi ,\pi \right] \) and \(y_{j} = -\pi +\frac{2\pi j}{N}\),
Proof
Using Lemma 7 we calculate
Upon the change of variable \(v=\frac{\beta n\sqrt{\ln N}}{\sqrt{2}N}\), we get that
where the last inequality follows from Lemma 8. Noting now that
and that \(\frac{N}{M}=2+\frac{1}{M}\in \left( 2,3\right] \) for all \(M \in {\mathbb {Z}}^+\), we can further see that
also always holds. \(\square \)
With the lemmas above we can now prove that (11) can be used to approximate \(\left( g*f\right) \left( x\right) \) for all \(x \in [-\pi , \pi ]\) with controllable error.
Theorem 3
Let \(p \ge 1\). Using the same values of the parameters from Lemma 9 above, one has
for all \(x\in \left[ -\pi ,\pi \right] \).
Proof
Using Lemmas 6 and 9 followed by Hölder's inequality, we have
\(\square \)
To summarize, Theorem 3 tells us that \(\left( g*f\right) \left( x\right) \) can be approximately computed in \({\mathcal {O}}\left( N\right) \)-time for any \(x\in \left[ -\pi ,\pi \right] \) using (11). This linear runtime cost may be reduced significantly, however, if one is willing to accept an additional trade-off between accuracy and the number of terms needed in the sum (11). This trade-off is characterized in the next lemma.
Lemma 10
Let \(x\in \left[ -\pi ,\pi \right] \), \(p \ge 1\), \(\gamma \in {\mathbb {R}}^+\), and \(\kappa := \lceil \gamma \ln N \rceil + 1\). Set \(j' := \arg \min _j \left| x - y_j \right| \). Using the same values of the other parameters from Lemma 9 above, one has
for all \(\beta \ge 4\) and \(N \ge \beta ^2\).
Proof
Appealing to Lemma 1 and recalling that \(c_{1}=\frac{\beta \sqrt{\ln N}}{N}\) we can see that
Using this fact we have that
for all \(k \in {\mathbb {Z}}_N\). As a result, one can now bound
above by
where the \(y_j\)-indices are taken modulo N as appropriate.
Our goal is now to employ Hölder's inequality on (12). Toward that end, we will now bound the q-norm of the vector \(\mathbf{h} := \left\{ \mathbb {e}^{-\frac{\left( \kappa + \ell - \frac{1}{2}\right) ^{2} 2 \pi ^2}{\beta ^{2}\ln N}} \right\} ^{N- 2 \kappa - 1}_{\ell = 1}\). Letting \(a := q \left( \frac{4}{\beta ^{2}\ln N} \right) \) we have that
where we have used Lemma 8 once again. As a result we have that
for all \(q \ge 1\). Applying Hölder's inequality to (12) we can now see that (12) is bounded above by
The result now follows. \(\square \)
We may now finally combine the truncation and estimation errors in Theorem 3 and Lemma 10 above in order to bound the total error one incurs by approximating \(\left( g*f\right) (x)\) via a truncated portion of (11) for any given \(x \in [-\pi , \pi ]\).
Theorem 4
Fix \(x\in \left[ -\pi ,\pi \right] \), \(p\ge 1\) (or \(p = \infty \)), \(\frac{N}{36}\ge r \ge 1\), and \(g: [-\pi , \pi ] \rightarrow {\mathbb {R}}^{+}\) to be the \(2\pi \)-periodic Gaussian (3) with \(c_1 := \frac{6 \sqrt{\ln (N^r)}}{N}\). Set \(j' := \arg \min _j \left| x - y_j \right| \) where \(y_{j} = -\pi +\frac{2\pi j}{N}\) for all \(j = 0, \dots , 2M\). Then,
As a consequence, we can see that \(\left( g*f\right) (x)\) can always be computed to within \({\mathcal {O}} \left( \Vert \mathbf {f}\Vert _{\infty } N^{-r} \right) \)-error in just \({\mathcal {O}}\left( r \log N \right) \)-time for any given \(\mathbf {f}\in {\mathbb {C}}^N\) once the \(\big \{ g\left( x - y_{j}\right) \big \}^{j' + \left\lceil \frac{6r}{\sqrt{2} \pi } \ln N \right\rceil + 1}_{j = j' - \left\lceil \frac{6r}{\sqrt{2} \pi } \ln N \right\rceil - 1}\) have been precomputed.
Proof
Combining Theorem 3 and Lemma 10 we can see that
where \(\beta = 6 \sqrt{r} \ge 6\), \(N \ge 36 r = \beta ^2\), and \(\gamma = \frac{6r}{\sqrt{2} \pi } = \frac{\beta \sqrt{r}}{\sqrt{2} \pi }\). \(\square \)
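The truncated evaluation promised by Theorem 4 can be sanity checked numerically. The sketch below compares the truncated semi-discrete sum (with assumed per-term weight 1/N, corresponding to the convolution normalization \((f*g)(x)=\frac{1}{2\pi }\int _{-\pi }^{\pi } f(u)\,g(x-u)\,du\)) against the Fourier-side value \(\sum _{\omega } \widehat{f}_{\omega }\,\widehat{g}_{\omega }\,\mathbb {e}^{\mathbb {i}\omega x}\); the Gaussian coefficient form is likewise an assumption consistent with Lemma 2, and the truncation length is chosen generously:

```python
import math, cmath

N = 101; M = (N - 1)//2; r = 1
c1 = 6*math.sqrt(r*math.log(N))/N                  # c1 = 6*sqrt(ln(N^r))/N
y = [-math.pi + 2*math.pi*j/N for j in range(N)]

def g(x):
    # assumed periodized Gaussian; three periodizations suffice at this width
    return sum(math.exp(-(x - 2*math.pi*n)**2/(2*c1**2)) for n in (-1, 0, 1))/c1

fhat = {3: 1.0, -17: 0.5j, 40: -0.25}              # a sparse test polynomial
f = [sum(c*cmath.exp(1j*w*t) for w, c in fhat.items()) for t in y]

def conv_truncated(x, kappa):
    # truncated sum over the 2*kappa + 1 grid nodes nearest to x
    jp = min(range(N), key=lambda j: abs(x - y[j]))
    acc = 0j
    for off in range(-kappa, kappa + 1):
        j = (jp + off) % N
        acc += f[j]*g(x - y[j])
    return acc/N

def conv_exact(x):
    # Fourier side: (g*f)^hat_w = fhat_w * ghat_w with the assumed ghat
    return sum(c*math.exp(-(c1*w)**2/2)/math.sqrt(2*math.pi)*cmath.exp(1j*w*x)
               for w, c in fhat.items())

x = 0.734
err = abs(conv_truncated(x, kappa=15) - conv_exact(x))
```

Only \({\mathcal {O}}(\log N)\) of the N entries of \(\mathbf {f}\) are read per evaluation, in line with the \({\mathcal {O}}(r \log N)\)-time claim of Theorem 4.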
We are now prepared to bound the error of the proposed approach when utilizing the SFTs developed in [21].
4 An Error Guarantee for Algorithm 1 When Using the SFTs Proposed in [21]
Given the \(2\pi \hbox {-periodic}\) Gaussian \(g: [-\pi , \pi ] \rightarrow {\mathbb {R}}^{+}\) (3), consider the periodic modulation of g, \(\tilde{g}_{q}:\left[ -\pi ,\pi \right] \rightarrow {\mathbb {C}}\), for any \(q\in {\mathbb {Z}}\) defined by
One can see that
so that the Fourier series coefficients of \(\tilde{g}_{q}\) are those of g, shifted by q; that is, \(\left( \widehat{\tilde{g}_{q}}\right) _{\omega }=\widehat{g}_{\omega +q}\) for all \(\omega \in {\mathbb {Z}}\).
In line 9 of Algorithm 1, we provide the SFT Algorithm in [21] with the approximate evaluations of \(\left\{ \left( \tilde{g}_{q}*f\right) \left( x_{k}\right) \right\} _{k=1}^{m},\) namely, \(\left\{ \left( \tilde{g}_{q}*f\right) \left( x_{k}\right) +n_{k}\right\} _{k=1}^{m}\), where, by Theorem 4, the perturbations \(n_{k}\) are bounded, for instance, by
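The coefficient-shift identity behind this modulation trick can be checked discretely. In the sketch below an arbitrary sample vector stands in for g, and the discrete Fourier coefficient normalization \(\frac{1}{N}\sum _j \cdot \,\mathbb {e}^{-\mathbb {i}\omega x_j}\) is an illustrative assumption:

```python
import cmath

N, q = 16, 3
x = [-cmath.pi + 2*cmath.pi*j/N for j in range(N)]
g = [1.0/(1 + j) for j in range(N)]                      # arbitrary stand-in window samples
gq = [cmath.exp(-1j*q*x[j]) * g[j] for j in range(N)]    # samples of e^{-iqx} g(x)

def coeff(samples, w):
    # Discrete Fourier coefficient (1/N) sum_j samples_j e^{-i w x_j}.
    return sum(samples[j] * cmath.exp(-1j*w*x[j]) for j in range(N)) / N

# Modulating by e^{-iqx} shifts every coefficient index by q.
for w in range(-4, 5):
    assert abs(coeff(gq, w) - coeff(g, w + q)) < 1e-12
```

The identity is exact here (up to floating-point rounding) because the modulation multiplies each sample by precisely the extra phase that the shifted coefficient formula requires.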
With this in mind, let us apply Lemma 4 to the function \(\tilde{g}_{q}*f\). We have the following lemma.
Lemma 11
Let \(s\in [2,N]\cap {\mathbb {N}}\), and \(\mathbf {n}\in {\mathbb {C}}^{m}\) be the vector containing the total errors incurred by approximating \(\tilde{g}_{q}*f\) via a truncated version of (11), as per Theorem 4. There exists a set of m points \(\left\{ x_{k}\right\} _{k=1}^{m}\subset \left[ -\pi ,\pi \right] \) such that Algorithm 3 on page 72 of [21], when given access to the corrupted samples \(\left\{ \left( \tilde{g}_{q}*f\right) \left( x_{k}\right) +n_{k}\right\} _{k=1}^{m},\) will identify a subset \(S\subseteq B\) which is guaranteed to contain all \(\omega \in B\) with
Furthermore, every \(\omega \in S\) returned by Algorithm 3 will also have an associated Fourier series coefficient estimate \(z_{\omega }\in {\mathbb {C}}\) which is guaranteed to have
Next, we need to guarantee that the estimates of \(\widehat{\tilde{g}_{q}*f}\) returned by Algorithm 3 of [21] will yield good estimates of \(\widehat{f}\) itself. We have the following.
Lemma 12
Let \(s\in [2,N]\cap {\mathbb {N}}\). Given a \(2\pi \)-periodic function \(f:\left[ -\pi ,\pi \right] \rightarrow {\mathbb {C}}\), the periodic Gaussian g, and any of its modulations \(\tilde{g}_{q}\left( x\right) =\mathbb {e}^{-\mathbb {i}qx}g\left( x\right) \), one has
Proof
Recall the definition of \(R_{s}^{\mathrm{opt}}\left( \widehat{f}\right) \) as the subset of B containing the s most energetic frequencies of \(\widehat{f}\), and observe that
since, by Lemma 2, \(\widehat{g}_{\omega }<\frac{1}{2}\) for all \(\omega \), and consequently, \(\left( \widehat{\tilde{g}_{q}}\right) _{\omega }=\widehat{g}_{\omega +q}<\frac{1}{2}\) for all \(\omega \). Moreover,
Let us combine the guarantees above into the following lemma.
Lemma 13
Let \(s\in [2,N]\cap {\mathbb {N}}\), and \(\mathbf {n}\in {\mathbb {C}}^{m}\) be the vector containing the total errors incurred by approximating \(\tilde{g}_{q}*f\) via a truncated version of (11), as per Theorem 4. There exists a set of m points \(\left\{ x_{k}\right\} _{k=1}^{m}\subset \left[ -\pi ,\pi \right] \) such that Algorithm 3 on page 72 of [21], when given access to the corrupted samples \(\left\{ \left( \tilde{g}_{q}*f\right) \left( x_{k}\right) +n_{k}\right\} _{k=1}^{m},\) will identify a subset \(S\subseteq B\) which is guaranteed to contain all \(\omega \in B\) with
Furthermore, every \(\omega \in S\) returned by Algorithm 3 will also have an associated Fourier series coefficient estimate \(z_{\omega }\in {\mathbb {C}}\) which is guaranteed to have
The lemma above implies that for any choice of q in line 4 of Algorithm 1, we are guaranteed to find all \(\omega \in \left[ q-\left\lceil \frac{N}{\alpha \sqrt{\ln N}}\right\rceil ,q+\left\lceil \frac{N}{\alpha \sqrt{\ln N}}\right\rceil \right) \cap B\) with
where \(\alpha \) and \(\tau \) are as defined in Lemma 3. Moreover, the Fourier series coefficient estimates \(z_{\omega }\) returned by Algorithm 3 will satisfy
Following Theorem 3, which guarantees a decay of \(N^{-r}\) in the total approximation error, let us set \(\beta =6\sqrt{r}\) for \(1\le r\le \frac{N}{36}\). Recall from Lemma 3 the choice of \(\beta \in \left( 0,\alpha \sqrt{\frac{\ln \left( 1/\tau \sqrt{2\pi }\right) }{2}}\right] \), where \(\tau \) is to be chosen from \(\left( 0,\frac{1}{\sqrt{2\pi }}\right) \). Thus, we must choose \(\alpha \in \left[ 1,\frac{N}{\sqrt{\ln N}}\right] \) so that
We may remove the dependence on \(\tau \) simply by setting, e.g., \(\tau =\frac{1}{3}\). Then \(\alpha ={\mathcal {O}}\left( \sqrt{r}\right) \).
We are now ready to state the recovery guarantee of Algorithm 1 and its operation count.
Theorem 5
Let \(N\in {\mathbb {N}}\), \(s\in [2,N]\cap {\mathbb {N}}\), and \(1\le r \le \frac{N}{36}\) as in Theorem 4. If Algorithm 3 of [21] is used in Algorithm 1 then Algorithm 1 will always deterministically identify a subset \(S\subseteq B\) and a sparse vector \(\mathbf {v}\vert _{S}\in {\mathbb {C}}^{N}\) satisfying
Algorithm 1’s operation count is then
If returning a sparse vector \(\mathbf {v}\vert _{S}\in {\mathbb {C}}^{N}\) that satisfies (13) with probability at least \((1-p) \in [2/3,1)\) is sufficient, a Monte Carlo variant of the deterministic Algorithm 3 in [21] may be used in line 9 of Algorithm 1. In this case Algorithm 1’s operation count is
Proof
Redefine \(\delta \) in the proof of Theorem 7 in [21] as
and observe that any \(\omega \in B=\left( -\left\lceil \frac{N}{2}\right\rceil ,\left\lfloor \frac{N}{2}\right\rfloor \right] \cap {\mathbb {Z}}\) that is reconstructed by Algorithm 1 will have a Fourier series coefficient estimate \(v_{\omega }\) that satisfies
We can thus bound the approximation error by
In order to make additional progress on (14) we must consider the possible magnitudes of \(\mathbf {\widehat{f}}\) entries at indices in \(S\backslash R_{s}^{\mathrm{opt}}\left( \widehat{f}\right) \) and \(R_{s}^{\mathrm{opt}}\left( \widehat{f}\right) \backslash S\). Careful analysis (in line with the techniques employed in the proof of Theorem 7 of [21]) indicates that
Therefore, in the worst possible case equation (14) will remain bounded by
The error bound stated in (13) follows.
The runtimes follow by observing that \(c_2 = {\mathcal {O}} \left( \alpha \cdot \log ^{\frac{1}{2}} (N)\right) = {\mathcal {O}}\left( r^{\frac{1}{2}}\cdot \log ^{\frac{1}{2}} (N) \right) \) as chosen in line 2 of Algorithm 1, and that for every choice of q in line 4 of Algorithm 1, all of the evaluations \(\left\{ (\tilde{g}_q*f)(x_k) \right\} ^m_{k=1}\) can be approximated very accurately in just \({\mathcal {O}}(m r \log N)\)-time, where the number of samples m is of the order described in Theorem 2. \(\square \)
We are now ready to empirically evaluate Algorithm 1 with several different SFT algorithms \({\mathcal {A}}\) used in its line 9.
5 Numerical Evaluation
In this section we evaluate the performance of three new discrete SFT algorithms resulting from Algorithm 1: DMSFT-4, DMSFT-6 (see footnote 8), and CLW-DSFT (see footnote 9). All of them were developed by utilizing different SFT algorithms in line 9 of Algorithm 1. Here DMSFT stands for the Discrete Michigan State Fourier Transform algorithm. Both DMSFT-4 and DMSFT-6 are implementations of Algorithm 1 that use a randomized version of the SFT algorithm GFFT [29] in their line 9 (see footnote 10). The only difference between DMSFT-4 and DMSFT-6 is how accurately each one estimates the convolution in line 7 of Algorithm 1: for DMSFT-4 we use \(\kappa = 4\) in the partial discrete convolution in Lemma 10 when approximating \(\tilde{g}_q*f\) at each \(x_k\), while for DMSFT-6 we always use \(\kappa = 6\). CLW-DSFT stands for the Christlieb Lawlor Wang Discrete Sparse Fourier Transform algorithm. It is an implementation of Algorithm 1 that uses the SFT developed in [6] in its line 9, with \(\kappa \) varying between 12 and 20 for its line 7 convolution estimates (depending on each input vector’s Fourier sparsity, etc.). DMSFT-4, DMSFT-6, and CLW-DSFT were all implemented in C++ in order to empirically evaluate their runtime and noise robustness characteristics.
We also compare these new implementations’ runtime and robustness characteristics with those of FFTW 3.3.4 (see footnote 11) and sFFT 2.0 (see footnote 12). FFTW is a highly optimized FFT implementation which runs in \({\mathcal {O}}(N\log N)\)-time for input vectors of length N. All of the standard discrete Fourier transforms in the numerical experiments below are performed using FFTW 3.3.4 with the FFTW_MEASURE plan. sFFT 2.0 is a randomized discrete sparse Fourier transform algorithm written in C++ which is both stable and robust to noise. It was developed by Hassanieh et al. in [15]. Note that DMSFT-4, DMSFT-6, CLW-DSFT, and sFFT 2.0 are all randomized algorithms designed to approximate DFTs that are approximately s-sparse. This means that all of them take both the sparsity s and the size N of the DFT \({\hat{\mathbf {f}}}\in {\mathbb {C}}^N\) they aim to recover as input parameters. In contrast, FFTW cannot exploit existing sparsity to its advantage. Finally, all experiments were run on a Linux CentOS machine with a 2.50 GHz CPU and 16 GB of RAM.
5.1 Experiment Setup
For the execution time experiments each trial input vector \(\mathbf {f}\in {\mathbb {C}}^N\) was generated as follows: First s frequencies were independently selected uniformly at random from \([0, N)\cap {\mathbb {Z}}\), and then each of these frequencies was assigned a uniform random phase with magnitude 1 as its Fourier coefficient. The remaining frequencies’ Fourier coefficients were then set to zero to form \({\hat{\mathbf {f}}}\in {\mathbb {C}}^N\). Finally, the trial input vector \(\mathbf {f}\) was then formed via an inverse DFT.
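The trial-input construction above can be sketched as follows, with illustrative sizes, distinct frequencies for simplicity, and an assumed \(1/N\) inverse DFT normalization:

```python
import cmath
import random

N, s = 64, 4
random.seed(0)

# Select s distinct frequencies uniformly at random from [0, N).
freqs = random.sample(range(N), s)

# Assign each selected frequency a unit-magnitude coefficient with random phase.
fhat = [0j] * N
for w in freqs:
    fhat[w] = cmath.exp(1j * random.uniform(0, 2*cmath.pi))

# Form the trial input vector via an inverse DFT (direct O(N^2) sum here;
# an FFT would be used for large N in practice).
f = [sum(fhat[w] * cmath.exp(2j*cmath.pi*w*j/N) for w in range(N)) / N
     for j in range(N)]
```

A forward DFT of `f` then recovers `fhat` exactly (up to rounding), confirming that the generated input has an exactly s-sparse transform.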
For each pair of s and N the parameters in each randomized algorithm were chosen so that the probability of correctly recovering all s energetic frequencies was at least 0.9 per trial input. Every data point in a figure below corresponds to an average over 100 runs on 100 different trial input vectors of this kind. It is worth mentioning that the parameter tuning process for DMSFT-4 and DMSFT-6 requires significantly less effort than for both CLW-DSFT and sFFT 2.0 since the DMSFT variants only have two parameters (whose default values are generally near-optimal).
5.2 Runtime as Input Vector Size Varies
In Fig. 1 we fixed the sparsity to \(s=50\) and ran numerical experiments on 8 different input vector lengths N: \(2^{16}\), \(2^{18}\), \(\ldots \), \(2^{30}\). We then plotted the running time (averaged over 100 runs) for DMSFT-4, DMSFT-6, CLW-DSFT, sFFT 2.0, and FFTW.
As expected, the runtime slope of all the SFT algorithms (i.e., DMSFT-4, DMSFT-6, CLW-DSFT, and sFFT 2.0) is less than the slope of FFTW as N increases. Although FFTW is fastest for vectors of small size, it becomes the slowest algorithm once the vector size N exceeds \(2^{20}\). Among the randomized algorithms, sFFT 2.0 is the fastest when N is less than \(2^{22}\), but DMSFT-4, DMSFT-6, and CLW-DSFT all outperform sFFT 2.0 with respect to runtime once the input vector size is large enough. The CLW-DSFT implementation becomes faster than sFFT 2.0 when N is approximately \(2^{21}\), while DMSFT-4 and DMSFT-6 have better runtime performance than sFFT 2.0 when N is greater than \(2^{23}\).
5.3 Runtime as Sparsity Varies
In Fig. 2 we fix the input vector length to \(N = 2^{26}\) and run the numerical experiments for 7 different values of the sparsity s: 50, 100, 200, 400, 1000, 2000, and 4000. As expected, FFTW’s runtime is constant as we increase the sparsity. The runtimes of DMSFT-4, CLW-DSFT, and sFFT 2.0 are all essentially linear in s. Here DMSFT-6 has been excluded for ease of viewing (its runtimes lie directly above those of DMSFT-4 when included in the plot). Looking at Fig. 2 we can see that CLW-DSFT’s runtime increases more rapidly with s than that of DMSFT-4 and sFFT 2.0, and that CLW-DSFT becomes the slowest of these algorithms once the sparsity reaches roughly 1000. DMSFT-4 and sFFT 2.0 have approximately the same runtime slope as s increases, and both perform well when the sparsity is large. However, DMSFT-4 maintains consistently better runtime performance than sFFT 2.0 for all sparsity values, and is the only algorithm in the plot that is still faster than FFTW when the sparsity is 4000. Indeed, when the sparsity is 4000 the average runtime of DMSFT-4 is 2.68 s and the average runtime of DMSFT-6 is 2.9 s. Both remain faster than FFTW (3.47 s) and sFFT 2.0 (3.96 s) at this large sparsity (though only DMSFT-4 has been included in the plot).
5.4 Robustness to Noise
In our final set of experiments we test the noise robustness of DMSFT-4, DMSFT-6, CLW-DSFT, sFFT 2.0, and FFTW for different levels of Gaussian noise. Here the size of each input vector is \(N=2^{22}\) and the sparsity is fixed at \(s = 50\). The test signals are generated as before, except that Gaussian noise is added to \(\mathbf {f}\) after it is constructed. More specifically, we first generate \(\mathbf {f}\) and then set \(\mathbf {f}= \mathbf {f}+ \mathbf {n}\), where each entry of \(\mathbf {n}\), \(n_j\), is an i.i.d. mean-zero complex Gaussian random value. The noise vector \(\mathbf {n}\) is then rescaled to achieve each desired signal-to-noise ratio (SNR) considered in the experiments (see footnote 13).
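The rescaling step can be sketched as follows, assuming the dB convention \(20\log _{10}\) for the SNR from footnote 13; the helper name `rescale_to_snr` is hypothetical:

```python
import math
import random

def rescale_to_snr(f, n, snr_db):
    # Scale n so that SNR = 20 log10(||f||_2 / ||n||_2) equals snr_db exactly.
    norm_f = math.sqrt(sum(abs(v)**2 for v in f))
    norm_n = math.sqrt(sum(abs(v)**2 for v in n))
    scale = norm_f / (norm_n * 10**(snr_db / 20))
    return [scale * v for v in n]

random.seed(1)
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(256)]
n = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(256)]
n = rescale_to_snr(f, n, 40.0)   # noise now sits 40 dB below the signal
```

After rescaling, the realized SNR matches the target up to floating-point rounding, independent of the initial noise magnitude.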
Recall that the randomized algorithms compared herein (DMSFT-4, DMSFT-6, CLW-DSFT, and sFFT 2.0) are all tuned to guarantee exact recovery of s-sparse functions with probability at least 0.9 in all experiments. For our noise robustness experiments this ensures that the correct frequency support, S, is found for at least 90 of the 100 trial signals used to generate each point plotted in Fig. 3. We use average \(L_1\) error to measure the noise robustness of each algorithm over each of these at least 90 trial runs. The average \(L_1\) error is defined as
where S is the true frequency support of the input vector \(\mathbf {f}\), \(\hat{f}_{\omega }\) are the true input Fourier coefficients for all frequencies \(\omega \in S\), and \(z_{\omega }\) are their recovered approximations from each algorithm. Figure 3 plots this average \(L_1\) error, further averaged over the at least 90 trial signals for which each method correctly identified S.
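Assuming the average \(L_1\) error normalizes the summed coefficient errors by \(|S| = s\) (a plausible reading of the definition above; the paper's exact normalization is not shown here), the metric can be sketched as:

```python
def avg_l1_error(S, fhat, z):
    # Mean absolute coefficient error over the true support S.
    # fhat and z map frequencies to true and recovered coefficients, respectively.
    return sum(abs(fhat[w] - z[w]) for w in S) / len(S)

# Tiny illustrative example: two supported frequencies, each off by 0.1.
S = [2, 5]
fhat = {2: 1 + 0j, 5: 1j}
z = {2: 1.1 + 0j, 5: 0.9j}
err = avg_l1_error(S, fhat, z)
```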
It can be seen in Fig. 3 that DMSFT-4, DMSFT-6, sFFT 2.0, and FFTW are all robust to noise. As expected, FFTW has the best performance in this test. DMSFT-4 and DMSFT-6 are both more robust to noise than sFFT 2.0. As for CLW-DSFT, it cannot guarantee a 0.9 probability of correctly recovering S when the SNR is below 40, and so is not plotted for those SNR values. This is due to the base energetic frequency identification methods of [6, 25] being inherently ill conditioned, though the CLW-DSFT results look better when compared to the true \({\hat{\mathbf {f}}}\) with respect to, e.g., earth mover’s distance: frequencies are often estimated incorrectly by CLW-DSFT at higher noise levels, but when they are, the estimates are usually still close enough to the true frequencies to be informative.
6 Conclusion
Let \({\mathcal {A}}\) be a sublinear-time sparse FFT algorithm which utilizes unequally spaced samples from a given periodic function \(f: [-\pi , \pi ] \rightarrow {{\mathbb {C}}}\) in order to rapidly approximate its sequence of Fourier series coefficients \(\hat{f} \in \ell ^2\). In this paper we propose a generic method of transforming any such algorithm \({\mathcal {A}}\) into a sublinear-time sparse DFT algorithm which rapidly approximates \({\hat{\mathbf {f}}}\) from a given input vector \(\mathbf {f}\in {{\mathbb {C}}}^N\). As a result we are able to construct several new sublinear-time sparse DFT algorithms from existing sparse Fourier algorithms which utilize unequally spaced function samples [6, 21, 25, 29]. The best of these new algorithms is shown to outperform existing discrete sparse Fourier transform methods with respect to both runtime and noise robustness for large vector lengths N. In addition, we also present several new theoretical discrete sparse FFT robust recovery guarantees. These include the first known theoretical guarantees for entirely deterministic and discrete sparse DFT algorithms which hold for arbitrary input vectors \(\mathbf {f}\in {{\mathbb {C}}}^N\).
Notes
Note that methods which compute the DFT \({\hat{\mathbf {f}}}\) of a given vector \(\mathbf {f}\) implicitly assume that \(\mathbf {f}\) contains equally spaced samples from the trigonometric polynomial f above.
Note that \({\hat{\mathbf {f}}}_{s}^{\mathrm{opt}}\) may not be unique as there can be ties for the sth largest entry in magnitude of \(\mathbf {f}\). This trivial ambiguity turns out not to matter.
Of course deterministic algorithms with error guarantees of the type of (1) do exist for more restricted classes of periodic functions f. See, e.g. [3, 4, 27] for some examples. These include USSFT methods developed for periodic functions with structured Fourier support [3] which are of use for, among other things, the fast approximation of functions which exhibit sparsity with respect to other bounded orthonormal basis functions [16].
The interested reader may refer to Appendix A for the proof of (2).
We hasten to point out, moreover, that similar ideas can also be employed for adaptive and noise robust SFT algorithms in order to approximately evaluate f in an “on demand” fashion as well. We leave the details to the interested reader.
Each function evaluation \(f(x_k)\) needs to be accurately computed in just \({\mathcal {O}}(\log ^c N)\)-time in order to allow us to achieve our overall desired runtime for Algorithm 1.
The code for both DMSFT variants is available at https://sourceforge.net/projects/aafftannarborfa/.
The CLW-DSFT code is available at www.math.msu.edu/~markiwen/Code.html.
Code for GFFT is also available at www.math.msu.edu/~markiwen/Code.html.
This code is available at http://www.fftw.org/.
This code is available at https://groups.csail.mit.edu/netmit/sFFT/.
The SNR is defined as \(\mathrm {SNR} = 20\log _{10}\frac{\Vert \mathbf {f}\Vert _2}{\Vert \mathbf {n}\Vert _2}\), where \(\mathbf {f}\) is the length-N input vector and \(\mathbf {n}\) is the length-N noise vector.
References
Bailey, J., Iwen, M.A., Spencer, C.V.: On the design of deterministic matrices for fast recovery of Fourier compressible functions. SIAM J. Matrix Anal. Appl. 33(1), 263–289 (2012)
Beylkin, G.: On the fast Fourier transform of functions with singularities. Appl. Comput. Harmon. Anal. 2(4), 363–381 (1995)
Bittens, S.: Sparse FFT for Functions with Short Frequency Support. University of Göttingen, Göttingen (2016)
Bittens, S., Zhang, R., Iwen, M.A.: A deterministic sparse FFT for functions with structured Fourier sparsity. arXiv:1705.05256 (2017)
Bluestein, L.: A linear filtering approach to the computation of discrete Fourier transform. IEEE Trans. Audio Electroacoust. 18(4), 451–455 (1970)
Christlieb, A., Lawlor, D., Wang, Y.: A multiscale sub-linear time Fourier algorithm for noisy data. Appl. Comput. Harmon. Anal. 40, 553–574 (2016)
Cohen, A., Dahmen, W., DeVore, R.: Compressed sensing and best k-term approximation. J. Am. Math. Soc. 22(1), 211–231 (2009)
Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19(90), 297–301 (1965)
Dutt, A., Rokhlin, V.: Fast Fourier transforms for nonequispaced data. SIAM J. Sci. Comput. 14(6), 1368–1393 (1993)
Dutt, A., Rokhlin, V.: Fast Fourier transforms for nonequispaced data. ii. Appl. Comput. Harmon. Anal. 2(1), 85–100 (1995)
Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Birkhäuser, Basel (2013)
Gilbert, A.C., Muthukrishnan, S., Strauss, M.: Improved time bounds for near-optimal sparse Fourier representations. In: Proceedings of the Optics & Photonics 2005, pp. 59141A–59141A. International Society for Optics and Photonics (2005)
Gilbert, A.C., Strauss, M.J., Tropp, J.A.: A tutorial on fast Fourier sampling. IEEE Signal Process. Mag. 25(2), 57–66 (2008)
Gilbert, A.C., Indyk, P., Iwen, M., Schmidt, L.: Recent developments in the sparse Fourier transform: a compressed Fourier transform for big data. IEEE Signal Process. Mag. 31(5), 91–100 (2014)
Hassanieh, H., Indyk, P., Katabi, D., Price, E.: Simple and practical algorithm for sparse Fourier transform. In: Proceedings of the SODA (2012)
Hu, X., Iwen, M., Kim, H.: Rapidly computing sparse Legendre expansions via sparse Fourier transforms. Numer. Algorithms 74(4), 1029–1059 (2017)
Iwen, M.A.: A deterministic sub-linear time sparse Fourier algorithm via non-adaptive compressed sensing methods. In: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 20–29. Society for Industrial and Applied Mathematics (2008)
Iwen, M.A.: Simple deterministically constructible rip matrices with sublinear Fourier sampling requirements. In: Proceedings of the CISS, pp. 870–875 (2009)
Iwen, M.A.: Combinatorial sublinear-time Fourier algorithms. Found. Comput. Math. 10, 303–338 (2010)
Iwen, M.A.: Notes on lemma 6. Preprint at www.math.msu.edu/~markiwen/Papers/Lemma6_FOCM_10.pdf (2012)
Iwen, M.A.: Improved approximation guarantees for sublinear-time Fourier algorithms. Appl. Comput. Harmon. Anal. 34, 57–82 (2013)
Iwen, M., Gilbert, A., Strauss, M., et al.: Empirical evaluation of a sub-linear time sparse DFT algorithm. Commun. Math. Sci. 5(4), 981–998 (2007)
Keiner, J., Kunis, S., Potts, D.: Using NFFT 3–a software library for various nonequispaced fast Fourier transforms. ACM Trans. Math. Softw. 36(4), 19:1–19:30 (2009)
Laska, J., Kirolos, S., Massoud, Y., Baraniuk, R., Gilbert, A., Iwen, M., Strauss, M.: Random sampling for analog-to-information conversion of wideband signals. In: Proceedings of the 2006 IEEE Dallas/CAS Workshop on Design, Applications, Integration and Software, pp. 119–122. IEEE (2006)
Lawlor, D., Wang, Y., Christlieb, A.: Adaptive sub-linear time Fourier algorithms. Adv. Adapt. Data Anal. 5(01), 1350003 (2013)
Morotti, L.: Explicit universal sampling sets in finite vector spaces. Appl. Comput. Harmon. Anal. (2016). https://doi.org/10.1016/j.acha.2016.06.001
Plonka, G., Wannenwetsch, K.: A deterministic sparse FFT algorithm for vectors with small support. Numer. Algorithms 71(4), 889–905 (2016)
Rabiner, L., Schafer, R., Rader, C.: The chirp z-transform algorithm. IEEE Trans. Audio Electroacoust. 17(2), 86–92 (1969)
Segal, I., Iwen, M.: Improved sparse Fourier approximation results: Faster implementations and stronger guarantees. Numer. Algorithms 63, 239–263 (2013)
Steidl, G.: A note on fast Fourier transforms for nonequispaced grids. Adv. Comput. Math. 9, 337–353 (1998)
Acknowledgements
M.A. Iwen, R. Zhang, and S. Merhi were all supported in part by NSF DMS-1416752. The authors would like to thank Aditya Viswanathan for helpful comments and feedback on the first draft of the paper.
Communicated by Hans G. Feichtinger.
Appendices
Appendix A: Fourier Basics: Continuous versus Discrete Fourier Transforms for Trigonometric Polynomials
Our objective in this appendix is to provide additional details regarding (2) and its relationship to the continuous sparse Fourier transform methods for periodic functions that we employ herein. Our starting point will be to assume only that we have been provided with a vector of data \(\mathbf {f}\in {{\mathbb {C}}}^{N}\). Our goal is to rapidly approximate the matrix vector product \({\hat{\mathbf {f}}}= F \mathbf {f}\) where \(F\in {{\mathbb {C}}}^{N\times N}\) is the DFT matrix whose entries are given by
for \(\omega \in \left( -\left\lceil \frac{N}{2}\right\rceil ,\left\lfloor \frac{N}{2}\right\rfloor \right] \cap {\mathbb {Z}}\) and \(j = 0, \dots , N-1\).
Beginning from this starting point, one may choose to regard the given vector of data \(\mathbf {f}\) as having been generated by sampling a \(2 \pi \)-periodic trigonometric polynomial \(f: [-\pi , \pi ] \rightarrow {{\mathbb {C}}}\) of the form
where \(B:=\left( -\left\lceil \frac{N}{2}\right\rceil ,\left\lfloor \frac{N}{2}\right\rfloor \right] \cap {\mathbb {Z}}\). In particular, herein we will assume that \(\mathbf {f}\) has its jth entry generated by \(f_j := f\left( -\pi +\frac{2\pi j}{N}\right) \) for \(j = 0, \dots , N-1\). Note that there is exactly one such f for the given data \(\mathbf {f}\) since \(|B| = N\) (i.e., f is the unique interpolating polynomial for \(\mathbf {f}\) with \(\omega \in B\)).
Considering f just above, we can now see that for any \(k \in \mathbb {Z}\) the associated Fourier series coefficient of f is
Changing our focus now to the discrete Fourier transform of \(\mathbf {f}\) considered as being samples from f we can see that
Note that the line above establishes (2) where the vector \({\hat{\mathbf {f}}}\in {{\mathbb {C}}}^N\) exactly contains the nonzero Fourier series coefficients of f as its entries. As a result we can see that computing the Fourier series coefficients of f is equivalent to computing the matrix vector product \({\hat{\mathbf {f}}}= F \mathbf {f}\) for our given data \(\mathbf {f}\).
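This equivalence is easy to verify numerically for small N. The sketch below assumes the DFT normalization \(F_{\omega ,j} = \frac{1}{N}\mathbb {e}^{-\mathbb {i}\omega x_j}\) with \(x_j = -\pi + \frac{2\pi j}{N}\), which is consistent with the derivation above but is an assumption about the (unshown) matrix entries:

```python
import cmath

N = 8                                            # even N for this index set
B = list(range(-(N//2) + 1, N//2 + 1))           # B = (-ceil(N/2), floor(N/2)] for even N
c = {w: complex(w + 1, -w) for w in B}           # arbitrary Fourier coefficients c_w
x = [-cmath.pi + 2*cmath.pi*j/N for j in range(N)]

# Sample the unique interpolating trigonometric polynomial f at the grid points.
f = [sum(c[w] * cmath.exp(1j*w*x[j]) for w in B) for j in range(N)]

def dft(w):
    # Assumed DFT normalization: (F f)_w = (1/N) sum_j f_j e^{-i w x_j}.
    return sum(f[j] * cmath.exp(-1j*w*x[j]) for j in range(N)) / N

# The DFT of the samples recovers the Fourier series coefficients exactly.
for w in B:
    assert abs(dft(w) - c[w]) < 1e-9
```

The exactness here reflects that \(|B| = N\): with exactly N frequencies and N equispaced samples there is no aliasing, so sampling followed by the DFT is lossless.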
Appendix B: Proof of Lemmas 1, 2 and 3
We will restate each lemma before its proof for ease of reference.
Lemma 14
(Restatement of Lemma 1) The \(2\pi \hbox {-periodic}\) Gaussian \(g:\left[ -\pi ,\pi \right] \rightarrow {\mathbb {R}}^{+}\) has
for all \(x \in \left[ -\pi ,\pi \right] \).
Proof
Observe that
holds since the series above have monotonically decreasing positive terms, and \(x\in \left[ -\pi ,\pi \right] \).
Now, if \(x\in \left[ 0,\pi \right] \) and \(n\ge 1\), one has
which yields
Using Lemma 8 to bound the last integral we can now get that
Recalling now that g is even we can see that this inequality will also hold for all \(x \in [-\pi ,0]\) as well. \(\square \)
Lemma 15
(Restatement of Lemma 2) The \(2\pi \hbox {-periodic}\) Gaussian \(g:\left[ -\pi ,\pi \right] \rightarrow {\mathbb {R}}^{+}\) has
for all \(\omega \in {\mathbb {Z}}\). Thus, \(\widehat{g}=\left\{ \widehat{g}_{\omega }\right\} _{\omega \in {\mathbb {Z}}}\in \ell ^{2}\) decreases monotonically as \(|\omega |\) increases, and also has \(\Vert \widehat{g} \Vert _{\infty } = \frac{1}{\sqrt{2 \pi }}\).
Proof
Starting with the definition of the Fourier transform, we calculate
The last two assertions now follow easily. \(\square \)
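A quick numerical sanity check of the last two assertions is possible under two assumptions: that the periodic Gaussian (3) has the standard periodization form used below, and that the Fourier coefficient normalization is \(\widehat{g}_{\omega } = \frac{1}{\sqrt{2\pi }}\int _{-\pi }^{\pi } g(x)\,\mathbb {e}^{-\mathbb {i}\omega x}\,dx\) (consistent with \(\Vert \widehat{g}\Vert _{\infty } = \frac{1}{\sqrt{2\pi }}\)):

```python
import math

c1 = 0.4  # illustrative width parameter

def g(x, wrap=5):
    # Periodized Gaussian: sum of shifted copies of a normalized Gaussian.
    return sum(math.exp(-(x + 2*math.pi*n)**2 / (2*c1**2))
               for n in range(-wrap, wrap + 1)) / (c1 * math.sqrt(2*math.pi))

def ghat(w, K=2048):
    # Trapezoidal quadrature on the smooth periodic integrand; g is real and
    # even, so only the cosine part of e^{-iwx} survives.
    h = 2*math.pi/K
    return sum(g(-math.pi + h*k) * math.cos(w*(-math.pi + h*k))
               for k in range(K)) * h / math.sqrt(2*math.pi)

vals = [ghat(w) for w in range(6)]
# Monotone decay in |w|, with supremum 1/sqrt(2*pi) attained at w = 0.
assert all(vals[i] > vals[i + 1] for i in range(5))
assert abs(vals[0] - 1/math.sqrt(2*math.pi)) < 1e-6
```

The value at \(\omega = 0\) equals \(\frac{1}{\sqrt{2\pi }}\) because the periodized Gaussian integrates to 1 over one period, matching the supremum claimed in Lemma 2.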
Lemma 16
(Restatement of Lemma 3) Choose any \(\tau \in \left( 0, \frac{1}{\sqrt{2\pi }} \right) \), \(\alpha \in \left[ 1, \frac{N}{\sqrt{\ln N}} \right] \), and \(\beta \in \left( 0 , \alpha \sqrt{\frac{\ln \left( 1/\tau \sqrt{2\pi } \right) }{2}} ~\right] \). Let \(c_1 = \frac{\beta \sqrt{\ln N}}{N}\) in the definition of the periodic Gaussian g from (3). Then \(\widehat{g}_{\omega } \in \left[ \tau , \frac{1}{\sqrt{2\pi }} \right] \) for all \(\omega \in {\mathbb {Z}}\) with \(|\omega | \le \Bigl \lceil \frac{N}{\alpha \sqrt{\ln N}}\Bigr \rceil \).
Proof
By Lemma 2 above it suffices to show that
which holds if and only if
Thus, it is enough to have
or,
This, in turn, is guaranteed by our choice of \(\beta \). \(\square \)
Appendix C: Proof of Lemma 4 and Theorem 2
We will restate Lemma 4 before its proof for ease of reference.
Lemma 17
(Restatement of Lemma 4) Let \(s, \epsilon ^{-1} \in {\mathbb {N}} \setminus \{ 1 \}\) with \((s/\epsilon ) \ge 2\), and \(\mathbf {n}\in {\mathbb {C}}^m\) be an arbitrary noise vector. There exists a set of m points \(\left\{ x_k \right\} ^m_{k=1} \subset [-\pi , \pi ]\) such that Algorithm 3 on page 72 of [21], when given access to the corrupted samples \(\left\{ f(x_k) + n_k \right\} ^m_{k=1}\), will identify a subset \(S \subseteq B\) which is guaranteed to contain all \(\omega \in B\) with
Furthermore, every \(\omega \in S\) returned by Algorithm 3 will also have an associated Fourier series coefficient estimate \(z_{\omega } \in {\mathbb {C}}\) which is guaranteed to have
Both the number of required samples, m, and Algorithm 3’s operation count are
If succeeding with probability \((1-\delta ) \in [2/3,1)\) is sufficient, and \((s/\epsilon ) \ge 2\), the Monte Carlo variant of Algorithm 3 referred to by Corollary 4 on page 74 of [21] may be used. This Monte Carlo variant reads only a randomly chosen subset of the noisy samples utilized by the deterministic algorithm,
yet it still outputs a subset \(S \subseteq B\) which is guaranteed to simultaneously satisfy both of the following properties with probability at least \(1-\delta \):
(i) S will contain all \(\omega \in B\) satisfying (15), and

(ii) all \(\omega \in S\) will have an associated coefficient estimate \(z_{\omega } \in {\mathbb {C}}\) satisfying (16).
Finally, both this Monte Carlo variant’s number of required samples, \(\tilde{m}\), as well as its operation count will also always be
Proof
The proof of this lemma involves a somewhat tedious and uninspired series of minor modifications to various results from [21]. In what follows we will outline the portions of that paper which need to be changed in order to obtain the stated lemma. Algorithm 3 on page 72 of [21] will provide the basis of our discussion.
In the first paragraph of our lemma we are provided with m-contaminated evaluations of f, \(\left\{ f(x_k) + n_k \right\} ^m_{k=1}\), at the set of m points \(\left\{ x_k \right\} ^m_{k=1} \subset [-\pi , \pi ]\) required by line 4 of Algorithm 1 on page 67 of [21]. These contaminated evaluations of f will then be used to approximate the vector \(\mathcal {G}_{\lambda ,K} \tilde{\psi } \mathbf{A} \in {\mathbb {C}}^m\) in line 4 of Algorithm 3. More specifically, using (18) on page 67 of [21] one can see that each \(\left( {\mathcal {G}_{\lambda ,K} \tilde{\psi } \mathbf{A}} \right) _j \in {\mathbb {C}}\) is effectively computed via a DFT
for some integers \(0 \le h_j < s_j\). Note that we are guaranteed to have noisy evaluations of f at each of these points by assumption. That is, we have \(f \left( x_{j,k} \right) + n_{j,k}\) for all \(x_{j,k} := -\pi + \frac{2 \pi k}{s_j}\), \(k = 0, \dots , s_j - 1\).
We therefore approximate each \(\left( {\mathcal {G}_{\lambda ,K} \tilde{\psi } \mathbf{A}} \right) _j\) via an approximate DFT as per (19) by
One can now see that
holds for all j. Every entry of both \({\mathcal {E}_{s_1,K} \tilde{\psi } \mathbf{A}}\) and \({\mathcal {G}_{\lambda ,K} \tilde{\psi } \mathbf{A}}\) referred to in Algorithm 3 will therefore be effectively replaced by its corresponding \(E_j\) estimate. Thus, the lemma we seek to prove is essentially obtained by simply incorporating the additional error estimate (20) into the analysis of Algorithm 3 in [21] wherever an \({\mathcal {E}_{s_1,K} \tilde{\psi } \mathbf{A}}\) or \({\mathcal {G}_{\lambda ,K} \tilde{\psi } \mathbf{A}}\) currently appears.
To show that lines 6 – 14 of Algorithm 3 will identify all \(\omega \in B\) satisfying (15) we can adapt the proof of Lemma 6 on page 72 of [21]. Choose any \(\omega \in B\) you like. Lemmas 3 and 5 from [21] together with (20) above ensure that both
and
hold for more than half of the j and \(j'\)-indexes that Algorithm 3 uses to approximate \(\widehat{f}_{\omega }\). The rest of the proof of Lemma 6 now follows exactly as in [21] after the \(\delta \) at the top of page 73 is redefined to be \(\delta := \frac{\epsilon \cdot \left\| {\hat{\mathbf {f}}}- {\hat{\mathbf {f}}}^{\mathrm{opt}}_{(s/\epsilon )} \right\| _1}{s} + \left\| \widehat{f} - \widehat{f}\vert _{B} \right\| _1 + \Vert \mathbf {n}\Vert _\infty \), each \(\left( { \mathcal {G}_{\lambda ,K} \tilde{\psi } \mathbf{A}} \right) _j\) entry is replaced by \(E_j\), and each \(\left( {\mathcal {E}_{s_1,K} \tilde{\psi } \mathbf{A}} \right) _{j'}\) entry is replaced by \(E_{j'}\).
Similarly, to show that lines 15 – 18 of Algorithm 3 will produce an estimate \(z_{\omega } \in {\mathbb {C}}\) satisfying (16) for every \(\omega \in S\) one can simply modify the first few lines of the proof of Theorem 7 in Appendix F of [21]. In particular, one can redefine \(\delta \) as above, replace the appearance of each \(\left( { \mathcal {G}_{\lambda ,K} \tilde{\psi } \mathbf{A}} \right) _j\) entry by \(E_j\), and then use (21). The bounds on the runtime follow from the last paragraph of the proof of Theorem 7 in Appendix F of [21] with no required changes. To finish, we note that the second paragraph of the lemma above follows from a completely analogous modification of the proof of Corollary 4 in Appendix G of [21]. \(\square \)
1.1 Appendix C.1: Proof of Theorem 2
To get the first paragraph of Theorem 2 one can simply utilize the proof of Theorem 7 exactly as it is written in Appendix F of [21] after redefining \(\delta \) as above, and then replacing the appearance of each \(\left( \mathcal {G}_{\lambda ,K} \tilde{\psi } \mathbf{A} \right) _j\) entry with its approximation \(E_j\). Once this has been done, equation (42) in the proof of Theorem 7 can then be taken as a consequence of Lemma 4 above. In addition, all references to Lemma 6 of [21] in the proof can then also be replaced with appeals to Lemma 4 above. To finish, the proof of Corollary 4 in Appendix G of [21] can now be modified in a completely analogous fashion in order to prove the second paragraph of Theorem 2.
Merhi, S., Zhang, R., Iwen, M.A. et al. A New Class of Fully Discrete Sparse Fourier Transforms: Faster Stable Implementations with Guarantees. J Fourier Anal Appl 25, 751–784 (2019). https://doi.org/10.1007/s00041-018-9616-4
Keywords
- Fast Fourier transforms
- Discrete Fourier transforms
- Sparse Fourier transforms
- Nonequispaced Fourier transforms
- Compressive sensing
- Sparse approximation