1 Introduction

Uncertainty principles have maintained a significant role in both science and engineering for most of the past century. In 1927, the concept was introduced by Werner Heisenberg in the context of quantum mechanics [24], in which a particle’s position and momentum are represented by wavefunctions \(f,g\in L^2(\mathbb {R})\), and g happens to be the Fourier transform of f. Measuring the position or momentum of a particle amounts to drawing a random variable whose probability density function is a normalized version of \(|f|^2\) or \(|g|^2\), respectively. Heisenberg’s uncertainty principle postulates a fundamental limit on the precision with which one can measure both position and momentum; in particular, the variance of the position measurement is small only if the momentum measurement exhibits large variance. From a mathematical perspective, this physical principle can be viewed as an instance of a much broader meta-theorem in harmonic analysis:

A nonzero function and its Fourier transform cannot be simultaneously localized.

Heisenberg’s uncertainty principle provides a lower bound on the product of the variances of the probability density functions corresponding to f and \(\hat{f}\). In the time since, various methods have emerged for quantifying localization. For example, instead of variance, one might consider entropy [6], the size of the density’s support [2], or how rapidly it decays [23]. Furthermore, the tradeoff in localization need not be represented by a product—as we will see, it is sometimes more telling to consider a sum.

Beyond physics, the impossibility of simultaneous localization has had significant consequences in signal processing. For example, when working with the short-time Fourier transform, one is forced to choose between temporal and frequency resolution. More recently, the emergence of digital signal processing has prompted the investigation of uncertainty principles underlying the discrete Fourier transform, notably by Donoho and Stark [17], Tao [42], and Tropp [45]. Associated with this line of work is the uniform uncertainty principle of Candès and Tao [12], which played a key role in the development of compressed sensing. The present paper continues this investigation of discrete uncertainty principles with an eye on applications in sparse signal processing.

1.1 Background and Overview

For any finite abelian group G, let \(\ell (G)\) denote the set of functions \(x:G\rightarrow \mathbb {C}\), and \(\widehat{G}\subseteq \ell (G)\) the group of characters over G. Then taking inner products with these characters and normalizing leads to the (unitary) Fourier transform \(F:\ell (G)\rightarrow \ell (\widehat{G})\), namely

$$\begin{aligned} (Fx)[\chi ] :=\frac{1}{\sqrt{|G|}}\sum _{g\in G}x[g]\overline{\chi [g]} \qquad \forall \chi \in \widehat{G}. \end{aligned}$$

The reader who is unfamiliar with Fourier analysis over finite abelian groups is invited to learn more in [43]. In the case where \(G=\mathbb {Z}/n\mathbb {Z}\) (which we denote by \(\mathbb {Z}_n\) in the sequel), the above definition coincides with the familiar discrete Fourier transform after one identifies characters with their frequencies. The following theorem provides two uncertainty principles in terms of the so-called 0-norm \(\Vert \cdot \Vert _0\), defined to be the number of nonzero entries in the argument.
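For concreteness, the following minimal NumPy sketch (ours, purely for illustration; the helper names are not from the literature) instantiates the unitary transform above for \(G=\mathbb {Z}_n\) and evaluates the 0-norm up to a numerical tolerance; it can be used to spot-check the uncertainty principles below.

```python
import numpy as np

def dft_matrix(n):
    """Unitary discrete Fourier transform on Z_n, matching the definition above."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def l0(x, tol=1e-12):
    """Number of entries exceeding a numerical tolerance (a proxy for ||x||_0)."""
    return int(np.count_nonzero(np.abs(x) > tol))

n = 12
F = dft_matrix(n)
print(np.allclose(F.conj().T @ F, np.eye(n)))   # True: F is unitary
x = np.zeros(n); x[3] = 1.0                     # a 1-sparse signal
print(l0(x), l0(F @ x))                         # 1 and 12, so the product is exactly n
```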

Theorem 1

([17, Theorem 1], [42, Theorem 1.1]) Let G be a finite abelian group, and let \(F:\ell (G)\rightarrow \ell (\widehat{G})\) denote the corresponding Fourier transform. Then

$$\begin{aligned} \Vert x\Vert _0\Vert Fx\Vert _0\ge |G|\qquad \forall x\in \ell (G){\setminus }\{0\}. \end{aligned}$$
(1)

Furthermore, if |G| is prime, then

$$\begin{aligned} \Vert x\Vert _0+\Vert Fx\Vert _0\ge |G|+1\qquad \forall x\in \ell (G){\setminus }\{0\}. \end{aligned}$$
(2)

Proof

(Sketch) For (1), apply the fact that the \(\ell _1/\ell _\infty \)-induced norm of F is given by \(\Vert F\Vert _{1\rightarrow \infty }=1/\sqrt{|G|}\), along with Cauchy–Schwarz and Parseval’s identity:

$$\begin{aligned} \Vert Fx\Vert _{\infty } \le \frac{1}{\sqrt{|G|}}\Vert x\Vert _1 \le \sqrt{\frac{\Vert x\Vert _0}{|G|}}\Vert x\Vert _2 =\sqrt{\frac{\Vert x\Vert _0}{|G|}}\Vert Fx\Vert _2 \le \sqrt{\frac{\Vert x\Vert _0\Vert Fx\Vert _0}{|G|}}\Vert Fx\Vert _{\infty }, \end{aligned}$$

where the last step bounds a sum in terms of its largest summand. Rearranging gives the result.

For (2), suppose otherwise that there exists \(x\ne 0\) which violates the claimed inequality. Denote \(\mathcal {J}={\text {supp}}(x)\) and pick some \(\mathcal {I}\subseteq \widehat{G}{\setminus }{\text {supp}}(Fx)\) with \(|\mathcal {I}|=|\mathcal {J}|\). Then \(0=(Fx)_\mathcal {I}=F_{\mathcal {I}\mathcal {J}}x_\mathcal {J}\). Since the submatrix \(F_{\mathcal {I}\mathcal {J}}\) is necessarily invertible by a theorem of Chebotarëv [39], we conclude that \(x_\mathcal {J}=0\), a contradiction. \(\square \)

We note that the additive uncertainty principle above is much stronger than its multiplicative counterpart. Indeed, with the help of the arithmetic mean–geometric mean inequality, (1) immediately implies

$$\begin{aligned} \Vert x\Vert _0+\Vert Fx\Vert _0 \ge 2\sqrt{\Vert x\Vert _0\Vert Fx\Vert _0} \ge 2\sqrt{|G|} \qquad \forall x\in \ell (G){\setminus }\{0\}, \end{aligned}$$
(3)

which is sharp when \(G=\mathbb {Z}_n\) and n is a perfect square (simply take x to be a Dirac comb, specifically, the indicator function \(1_K\) of the subgroup K of size \(\sqrt{n}\)). More generally, if n is not prime, then \(n=ab\) with integers \(a,b\in [2,n/2]\), and so \(a+b\le n/2+2<n+1\); as such, taking x to be an indicator function of the subgroup of size a (whose Fourier transform necessarily has 0-norm b) will violate (2). Overall, the hypothesis that |G| is prime cannot be weakened. Still, something can be said if one slightly strengthens the hypothesis on x. For example, Theorem A in [44] gives that for every \(S\subseteq G\),

$$\begin{aligned} \Vert x\Vert _0+\Vert Fx\Vert _0 >\sqrt{|G|\Vert x\Vert _0} \end{aligned}$$

for almost every \(x\in \ell (G)\) supported on S. This suggests that extreme functions like the Dirac comb are atypical, i.e., (3) is “barely sharp”.
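The sharpness claim for (3) is easy to witness numerically; in the following sketch (ours, assuming NumPy, whose np.fft.fft with norm="ortho" agrees with the unitary F above), the Dirac comb on the subgroup of size \(\sqrt{n}\) attains \(\Vert x\Vert _0+\Vert Fx\Vert _0=2\sqrt{n}\).

```python
import numpy as np

n = 36                                        # a perfect square
comb = np.zeros(n)
comb[::int(np.sqrt(n))] = 1.0                 # indicator of the subgroup of size sqrt(n)
Fcomb = np.fft.fft(comb, norm="ortho")        # the unitary DFT; again a Dirac comb
l0 = lambda v: np.count_nonzero(np.abs(v) > 1e-12)
print(l0(comb) + l0(Fcomb), 2 * np.sqrt(n))   # 12 and 12.0: equality in (3)
```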

One could analogously argue that, in some sense, (2) is “barely true” when |G| is prime. For an illustration, Fig. 1 depicts a discrete version of the Gaussian function, which is constructed by first periodizing the function \(f(t)=e^{-n\pi t^2}\) over the real line in order to have unit period, and then sampling this periodized function at multiples of 1/n. As we verify in Sect. 3.2, the resulting function \(x\in \ell (\mathbb {Z}_n)\) satisfies \(Fx=x\), analogous to the fact that a Gaussian function in \(L^2(\mathbb {R})\) with the proper width is fixed by the Fourier transform. Given its resemblance to the fast-decaying Gaussian function over \(\mathbb {R}\), it comes as no surprise that many entries of this function are nearly zero. In the depicted case where \(n=211\) (which is prime), only 99 entries of this function manage to be larger than machine precision, and so from a numerical perspective, this function appears to contradict Theorem 1: \(99+99=198<212=211+1\).

Fig. 1

Discrete Gaussian function, obtained by periodizing the function \(f(t)=e^{-n\pi t^2}\) with period 1 before sampling at multiples of 1/n. The resulting function in \(\ell (\mathbb {Z}_n)\) is fixed by the \(n\times n\) discrete Fourier transform. In this figure, we take \(n=211\), and only 99 entries are larger than machine precision (i.e., \(2.22\times 10^{-16}\)). As such, an unsuspecting signal processor might think \(\Vert x\Vert _0\) and \(\Vert Fx\Vert _0\) are both 99 instead of 211. Since 211 is prime and \(99+99=198<212=211+1\), this illustrates a lack of numerical robustness in the additive uncertainty principle of Theorem 1. By contrast, our main result (Theorem 2) provides a robust alternative in terms of numerical sparsity, though the result is not valid for the discrete Fourier transform, but rather a random unitary matrix.
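The discrete Gaussian of Fig. 1 is straightforward to reproduce; the following sketch (ours, for illustration) truncates the periodization to a few shifts, since the omitted terms underflow to zero, and should report the 99 entries quoted in the caption.

```python
import numpy as np

n = 211
j = np.arange(n)[:, None]
shifts = np.arange(-20, 21)[None, :]          # omitted tail terms underflow to zero
x = np.exp(-n * np.pi * (j / n + shifts)**2).sum(axis=1)
Fx = np.fft.fft(x, norm="ortho")              # the unitary DFT used throughout
print(np.max(np.abs(Fx - x)))                 # numerically zero: x is fixed by F
print(np.count_nonzero(x > 2.22e-16))         # 99 entries exceed machine precision
```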

To help resolve this discrepancy, we consider a numerical version of traditional sparsity which is aptly named numerical sparsity:

$$\begin{aligned} {\text {ns}}(x) :=\frac{\Vert x\Vert _1^2}{\Vert x\Vert _2^2} \qquad \forall x\in \mathbb {C}^n{\setminus }\{0\}. \end{aligned}$$

See Fig. 2 for an illustration. This ratio appeared as early as 1978 in the context of geophysics [20]. More recently, it has been used as a proxy for sparsity in various signal processing applications [14, 25, 31, 36, 40]. The numerical rank of a matrix is analogously defined as the squared ratio of the nuclear norm to the Frobenius norm, and has been used, for example, in Alon’s work on extremal combinatorics [1]. We note that numerical sparsity is invariant under nonzero scaling, much like traditional sparsity. In addition, one bounds the other:

$$\begin{aligned} {\text {ns}}(x) \le \Vert x\Vert _0. \end{aligned}$$
(4)

To see this, apply Cauchy–Schwarz to get

$$\begin{aligned} \Vert x\Vert _1 =\langle |x|,1_{{\text {supp}}(x)}\rangle \le \Vert x\Vert _2\Vert 1_{{\text {supp}}(x)}\Vert _2 =\Vert x\Vert _2\sqrt{\Vert x\Vert _0}, \end{aligned}$$

where |x| denotes the entrywise absolute value of x. Rearranging then gives (4). For this paper, the most useful feature of numerical sparsity is its continuity, as this will prevent near-counterexamples like the one depicted in Fig. 1. What follows is our main result, which leverages numerical sparsity to provide uncertainty principles that are analogous to those in Theorem 1:

Fig. 2

Traditional sparsity \(\Vert x\Vert _0\) (left) and numerical sparsity \({\text {ns}}(x)\) (right) for all x in the unit circle in \(\mathbb {R}^2\). This illustrates how numerical sparsity is a continuous analog of traditional sparsity; we leverage this feature to provide robust alternatives to the uncertainty principles of Theorem 1. In this case, one may verify that \({\text {ns}}(x)\le \Vert x\Vert _0\) by visual inspection.
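In code, numerical sparsity is a one-liner; the toy comparison below (ours, for illustration) exhibits both the bound (4) and the continuity that the 0-norm lacks.

```python
import numpy as np

def ns(x):
    """Numerical sparsity ||x||_1^2 / ||x||_2^2 of a nonzero vector."""
    return np.linalg.norm(x, 1)**2 / np.linalg.norm(x, 2)**2

x = np.array([1.0, 1.0, 0.0, 0.0])
y = np.array([1.0, 1.0, 1e-8, 1e-8])          # a tiny perturbation of x
print(ns(x), np.count_nonzero(x))             # 2.0 and 2
print(ns(y), np.count_nonzero(y))             # ~2.0 and 4: ns moves continuously, the 0-norm jumps
```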

Theorem 2

(Main result) Let U be an \(n\times n\) unitary matrix. Then

$$\begin{aligned} {\text {ns}}(x){\text {ns}}(Ux)\ge \frac{1}{\Vert U\Vert _{1\rightarrow \infty }^2}\qquad \forall x\in \mathbb {C}^n{\setminus }\{0\}, \end{aligned}$$
(5)

where \(\Vert \cdot \Vert _{1\rightarrow \infty }\) denotes the induced matrix norm. Furthermore, there exists a universal constant \(c>0\) such that if U is drawn uniformly from the unitary group \({\text {U}}(n)\), then with probability \(1-e^{-\Omega (n)}\),

$$\begin{aligned} {\text {ns}}(x)+{\text {ns}}(Ux)\ge (c-o(1))n\qquad \forall x\in \mathbb {C}^n{\setminus }\{0\}. \end{aligned}$$
(6)

Perhaps the most glaring difference between Theorems 1 and 2 is our replacement of the Fourier transform with an arbitrary unitary matrix. Such generalizations have appeared in the quantum physics literature (for example, see [29]), as well as in the sparse signal processing literature [15, 16, 21, 40, 44, 45]. Our multiplicative uncertainty principle still applies when \(U=F\), in which case \(\Vert U\Vert _{1\rightarrow \infty }=1/\sqrt{n}\). Considering (4), the uncertainty principle in this case immediately implies the analogous principle in Theorem 1. Furthermore, the proof is rather straightforward: Apply Hölder’s inequality to get

$$\begin{aligned} {\text {ns}}(x){\text {ns}}(Ux)=\frac{\Vert x\Vert _1^2}{\Vert x\Vert _2^2}\cdot \frac{\Vert Ux\Vert _1^2}{\Vert Ux\Vert _2^2} \ge \frac{\Vert x\Vert _1^2}{\Vert x\Vert _2^2}\cdot \frac{\Vert Ux\Vert _2^2}{\Vert Ux\Vert _{\infty }^2} =\frac{\Vert x\Vert _1^2}{\Vert Ux\Vert _{\infty }^2} \ge \frac{1}{\Vert U\Vert _{1\rightarrow \infty }^2}. \end{aligned}$$
(7)

By contrast, the proof of our additive uncertainty principle is not straightforward, and it does not hold if we replace U with F. Indeed, as we show in Sect. 3.2, the discrete Gaussian function depicted in Fig. 1 has numerical sparsity \(O(\sqrt{n})\), thereby violating (6); recall that the same function is a near-counterexample of the analogous principle in Theorem 1. Interestingly, our uncertainty principle establishes that the Fourier transform is rare in that the vast majority of unitary matrices offer much more uncertainty in the worst case. This naturally leads to the following question:

Problem 3

For each n, what is the largest \(c=c(n)\) for which there exists a unitary matrix U that satisfies \({\text {ns}}(x)+{\text {ns}}(Ux)\ge cn\) for every \(x\in \mathbb {C}^n{\setminus }\{0\}\)?

Letting \(x=e_1\) gives \({\text {ns}}(x)+{\text {ns}}(Ux)\le 1+\Vert Ux\Vert _0\le n+1\), and so \(c(n)\le 1+o(1)\); a bit more work produces a strict inequality \(c(n)<1+1/n\) for \(n\ge 4\). Also, our proof of the uncertainty principle implies \(\liminf _{n\rightarrow \infty }c(n)\ge 1/540{,}000\).
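A quick experiment gives a feel for these quantities. The sketch below (ours; it draws a Haar-random unitary by QR-factoring a complex Gaussian matrix and correcting the phases of R’s diagonal, and the observed value near \(\pi n/4\) is an empirical remark rather than a claim from this paper) evaluates \({\text {ns}}(x)+{\text {ns}}(Ux)\) at \(x=e_1\), which indeed lands well below \(n+1\).

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-random element of U(n): QR of a complex Gaussian matrix, phases corrected."""
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(G)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))                # multiply column j by the phase of R[j, j]

ns = lambda x: np.linalg.norm(x, 1)**2 / np.linalg.norm(x, 2)**2
rng = np.random.default_rng(0)
n = 500
U = haar_unitary(n, rng)
x = np.zeros(n); x[0] = 1.0                   # x = e_1
print(ns(x) + ns(U @ x), n + 1)               # typically near pi*n/4, well below n + 1
```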

1.2 Outline

The primary focus of this paper is Theorem 2. Having already proved the multiplicative uncertainty principle in (7), it remains to prove the additive counterpart, which we do in the following section. Next, Sect. 3 considers functions which achieve either exact or near equality in (5) when U is the discrete Fourier transform. Surprisingly, exact equality occurs in (5) precisely when it occurs in (1). We also show that the discrete Gaussian depicted in Fig. 1 achieves near equality in (5). We conclude in Sect. 4 by studying a few applications, specifically, sparse signal demixing, compressed sensing with partial Fourier operators, and the fast detection of sparse signals.

2 Proof of Additive Uncertainty Principle

In this section, we prove the additive uncertainty principle in Theorem 2. The following provides a more explicit statement of the principle we prove:

Theorem 4

Draw U uniformly from the unitary group \({\text {U}}(n)\). Then with probability \(\ge 1-8e^{-n/4096}\),

$$\begin{aligned} {\text {ns}}(x)+{\text {ns}}(Ux)\ge \frac{1}{9}\left\lfloor \frac{n}{60{,}000}\right\rfloor \qquad \forall x\in \mathbb {C}^n{\setminus }\{0\}. \end{aligned}$$

For the record, we did not attempt to optimize the constants. Our proof of this theorem makes use of several ideas from the compressed sensing literature:

Definition 5

Take any \(m\times n\) matrix \(\Phi =[\varphi _1\cdots \varphi _n]\).

  1. (a)

    We say \(\Phi \) exhibits \((k,\theta )\)-restricted orthogonality if

    $$\begin{aligned} |\langle \Phi x,\Phi y\rangle |\le \theta \Vert x\Vert _2\Vert y\Vert _2 \end{aligned}$$

    for every \(x,y\in \mathbb {C}^n\) with \(\Vert x\Vert _0,\Vert y\Vert _0\le k\) and disjoint support.

  2. (b)

    We say \(\Phi \) satisfies the \((k,\delta )\)-restricted isometry property if

    $$\begin{aligned} (1-\delta )\Vert x\Vert _2^2\le \Vert \Phi x\Vert _2^2\le (1+\delta )\Vert x\Vert _2^2 \end{aligned}$$

    for every \(x\in \mathbb {C}^n\) with \(\Vert x\Vert _0\le k\).

  3. (c)

    We say \(\Phi \) satisfies the (k, c)-width property if

    $$\begin{aligned} \Vert x\Vert _2\le \frac{c}{\sqrt{k}}\Vert x\Vert _1 \end{aligned}$$

    for every x in the nullspace of \(\Phi \).

The restricted isometry property is a now-standard sufficient condition for uniformly stable and robust reconstruction from compressed sensing measurements (for example, see [11]). As the following statement reveals, restricted orthogonality implies the restricted isometry property:

Lemma 6

[4, Lemma 11] If a matrix satisfies \((k,\theta )\)-restricted orthogonality and its columns have unit norm, then it also satisfies the \((k,\delta )\)-restricted isometry property with \(\delta =2\theta \).

To prove Theorem 4, we will actually make use of the width property, which was introduced by Kashin and Temlyakov [27] to characterize uniformly stable \(\ell _1\) reconstruction for compressed sensing. Luckily, the restricted isometry property implies the width property:

Lemma 7

([9, Theorem 11], cf. [27]) If a matrix satisfies the \((k,\delta )\)-restricted isometry property for some positive integer k and \(\delta <1/3\), then it also satisfies the (k, 3)-width property.

What follows is a stepping-stone result that we will use to prove Theorem 4, but it is also of independent interest:

Theorem 8

Draw U uniformly from the unitary group \({\text {U}}(n)\). Then [I U] satisfies the \((k,\delta )\)-restricted isometry property with probability \(\ge 1-8e^{-\delta ^2n/256}\) provided \(\delta <1\) and

$$\begin{aligned} n\ge \frac{256}{\delta ^2}k\log \bigg (\frac{en}{k}\bigg ). \end{aligned}$$
(8)

This is perhaps not surprising, considering various choices of structured random matrices are known to form restricted isometries with high probability [3, 8, 12, 28, 34, 35, 37]. To prove Theorem 8, we show that the structured matrix enjoys restricted orthogonality with high probability, and then appeal to Lemma 6. Before proving this result, we first motivate it by proving the desired uncertainty principle:

Proof of Theorem 4

Take \(k=\lfloor n/60{,}000\rfloor \) and \(\delta =1/4\). We will show \({\text {ns}}(x)+{\text {ns}}(Ux)\ge k/9\) for every nonzero \(x\in \mathbb {C}^n\). If \(k=0\), the result is immediate, and so \(n\ge 60{,}000\) without loss of generality. In this regime, we have \(k\in [n/120{,}000,\,n/60{,}000]\), and so

$$\begin{aligned} \frac{256}{\delta ^2}k\log \bigg (\frac{en}{k}\bigg ) \le 4096\log (120{,}000e)\cdot k \le 60{,}000k \le n. \end{aligned}$$

Theorem 8 and Lemma 7 then give that [I U] satisfies the (k, 3)-width property with probability \(\ge 1-8e^{-n/4096}\). Observe that \(z=[Ux;-x]\) resides in the nullspace of [I U] regardless of \(x\in \mathbb {C}^n\). In the case where x (and therefore z) is nonzero, the width property and the arithmetic mean–geometric mean inequality together give

$$\begin{aligned} \frac{k}{9} \le \frac{\Vert z\Vert _1^2}{\Vert z\Vert _2^2} =\frac{(\Vert x\Vert _1+\Vert Ux\Vert _1)^2}{\Vert x\Vert _2^2+\Vert Ux\Vert _2^2} =\frac{\Vert x\Vert _1^2+2\Vert x\Vert _1\Vert Ux\Vert _1+\Vert Ux\Vert _1^2}{2\Vert x\Vert _2^2} \le {\text {ns}}(x)+{\text {ns}}(Ux). \end{aligned}$$

\(\square \)

Proof of Theorem 8

Take \([I~U]=[\varphi _1\cdots \varphi _{2n}]\), and let k be the largest integer satisfying (8). We will demonstrate that [I U] satisfies the \((k,\delta )\)-restricted isometry property, which will then imply the \((k',\delta )\)-restricted isometry property for all \(k'\le k\), and therefore for all \(k'\) satisfying (8). To this end, define the random quantities

$$\begin{aligned} \theta ^\star (U):=\max _{\begin{array}{c} x,y\in \mathbb {C}^{2n}\\ \Vert x\Vert _0,\Vert y\Vert _0\le k\\ {\text {supp}}(x)\cap {\text {supp}}(y)=\emptyset \end{array}}\frac{|\langle \Phi x,\Phi y\rangle |}{\Vert x\Vert _2\Vert y\Vert _2}, \qquad \theta (U):=\max _{\begin{array}{c} x,y\in \mathbb {C}^{2n}\\ \Vert x\Vert _0,\Vert y\Vert _0\le k\\ {\text {supp}}(x)\subseteq [n]\\ {\text {supp}}(y)\subseteq [n]^c \end{array}}\frac{|\langle \Phi x,\Phi y\rangle |}{\Vert x\Vert _2\Vert y\Vert _2}. \end{aligned}$$

We first claim that \(\theta ^\star (U)\le \theta (U)\). To see this, for any x, y satisfying the constraints in \(\theta ^\star (U)\), decompose \(x=x_1+x_2\) so that \(x_1\) and \(x_2\) are supported in [n] and \([n]^c\), respectively, and similarly \(y=y_1+y_2\). For notational convenience, let S denote the set of all 4-tuples \((a,b,c,d)\) of k-sparse vectors in \(\mathbb {C}^{2n}\) such that a and b are disjointly supported in [n], while c and d are disjointly supported in \([n]^c\). Then \((x_1,y_1,x_2,y_2)\in S\). Since \({\text {supp}}(x)\) and \({\text {supp}}(y)\) are disjoint, and since I and U each have orthonormal columns, we have

$$\begin{aligned} \langle \Phi x,\Phi y\rangle =\langle \Phi x_1,\Phi y_2\rangle +\langle \Phi x_2,\Phi y_1\rangle . \end{aligned}$$

As such, the triangle inequality gives

$$\begin{aligned} \theta ^\star (U)&=\max _{\begin{array}{c} x,y\in \mathbb {C}^{2n}\\ \Vert x\Vert _0,\Vert y\Vert _0\le k\\ {\text {supp}}(x)\cap {\text {supp}}(y)=\emptyset \end{array}}\frac{|\langle \Phi x_1,\Phi y_2\rangle +\langle \Phi x_2,\Phi y_1\rangle |}{\Vert x\Vert _2\Vert y\Vert _2}\\&\le \max _{(x_1,y_1,x_2,y_2)\in S}\frac{|\langle \Phi x_1,\Phi y_2\rangle |+|\langle \Phi x_2,\Phi y_1\rangle |}{\sqrt{\Vert x_1\Vert _2^2+\Vert x_2\Vert _2^2}\sqrt{\Vert y_1\Vert _2^2+\Vert y_2\Vert _2^2}}\\&\le \bigg (\max _{(x_1,y_1,x_2,y_2)\in S}\frac{\Vert x_1\Vert _2\Vert y_2\Vert _2+\Vert x_2\Vert _2\Vert y_1\Vert _2}{\sqrt{\Vert x_1\Vert _2^2+\Vert x_2\Vert _2^2}\sqrt{\Vert y_1\Vert _2^2+\Vert y_2\Vert _2^2}}\bigg )\theta (U)\\&\le \theta (U), \end{aligned}$$

where the last step follows from squaring and applying the arithmetic mean–geometric mean inequality:

$$\begin{aligned} \bigg (\frac{\sqrt{ad}+\sqrt{bc}}{\sqrt{(a+b)(c+d)}}\bigg )^2 =\frac{ad+bc+2\sqrt{acbd}}{(a+b)(c+d)} \le \frac{ad+bc+(ac+bd)}{(a+b)(c+d)} =1. \end{aligned}$$

At this point, we seek to bound the probability that \(\theta (U)\) is large. First, we observe an equivalent expression:

$$\begin{aligned} \theta (U)=\max _{\begin{array}{c} x,y\in \mathbb {C}^n\\ \Vert x\Vert _2=\Vert y\Vert _2=1\\ \Vert x\Vert _0,\Vert y\Vert _0\le k \end{array}}|\langle x,Uy\rangle |. \end{aligned}$$

To estimate the desired probability, we will pass to an \(\epsilon \)-net \(\mathcal {N}_\epsilon \) of k-sparse vectors with unit 2-norm. A standard volume-comparison argument gives that the unit sphere in \(\mathbb {R}^{m}\) enjoys an \(\epsilon \)-net of size \(\le (1+2/\epsilon )^m\) (see [47, Lemma 5.2]). As such, for each choice of k coordinates, we can cover the corresponding copy of the unit sphere in \(\mathbb {C}^k=\mathbb {R}^{2k}\) with \(\le (1+2/\epsilon )^{2k}\) points, and unioning these produces an \(\epsilon \)-net of size

$$\begin{aligned} |\mathcal {N}_\epsilon | \le \left( {\begin{array}{c}n\\ k\end{array}}\right) \bigg (1+\frac{2}{\epsilon }\bigg )^{2k}. \end{aligned}$$

To apply this \(\epsilon \)-net, we note that \(\Vert x-x'\Vert _2,\Vert y-y'\Vert _2\le \epsilon \) and \(\Vert x'\Vert _2=\Vert y'\Vert _2=1\) together imply

$$\begin{aligned} |\langle x,Uy\rangle |&=|\langle x'+x-x',U(y'+y-y')\rangle |\\&\le |\langle x',Uy'\rangle |+\Vert x-x'\Vert _2+\Vert y-y'\Vert _2+\Vert x-x'\Vert _2\Vert y-y'\Vert _2\\&\le |\langle x',Uy'\rangle |+3\epsilon , \end{aligned}$$

where the last step assumes \(\epsilon \le 1\). As such, the union bound gives

$$\begin{aligned}&{\text {Pr}}(\theta (U)>t) \nonumber \\&\quad ={\text {Pr}}\bigg (\exists x,y\in \mathbb {C}^n,~\Vert x\Vert _2=\Vert y\Vert _2=1,~\Vert x\Vert _0,\Vert y\Vert _0\le k~\text{ s.t. }~|\langle x,Uy\rangle |>t\bigg )\nonumber \\&\quad \le {\text {Pr}}\bigg (\exists x,y\in \mathcal {N}_\epsilon ~\text{ s.t. }~|\langle x,Uy\rangle |>t-3\epsilon \bigg )\nonumber \\&\quad \le \sum _{x,y\in \mathcal {N}_\epsilon }{\text {Pr}}\Big (|\langle x,Uy\rangle |>t-3\epsilon \Big )\nonumber \\&\quad \le \left( {\begin{array}{c}n\\ k\end{array}}\right) ^2\bigg (1+\frac{2}{\epsilon }\bigg )^{4k}{\text {Pr}}\Big (|\langle e_1,Ue_1\rangle |>t-3\epsilon \Big ), \end{aligned}$$
(9)

where the last step uses the fact that the distribution of U is invariant under left- and right-multiplication by any deterministic unitary matrix (e.g., unitary matrices that send \(e_1\) to x and y to \(e_1\), respectively). It remains to prove tail bounds on \(U_{11}:=\langle e_1,Ue_1\rangle \). First, we apply the union bound to get

$$\begin{aligned} {\text {Pr}}(|U_{11}|>u)&\le {\text {Pr}}\bigg (|{\text {Re}}(U_{11})|>\frac{u}{\sqrt{2}}\bigg )+{\text {Pr}}\bigg (|{\text {Im}}(U_{11})|>\frac{u}{\sqrt{2}}\bigg )\nonumber \\&=4{\text {Pr}}\bigg ({\text {Re}}(U_{11})>\frac{u}{\sqrt{2}}\bigg ), \end{aligned}$$
(10)

where the last step uses the fact that \({\text {Re}}(U_{11})\) and \({\text {Im}}(U_{11})\) are identically distributed and symmetric about zero. Next, we observe that \({\text {Re}}(U_{11})\) has the same distribution as \(g/\sqrt{h}\), where g has standard normal distribution and h has chi-squared distribution with 2n degrees of freedom. Indeed, this can be seen from one method of constructing the matrix U: Start with an \(n\times n\) matrix G with iid \(N(0,1)+iN(0,1)\) complex Gaussian entries and apply Gram–Schmidt to the columns; the first column of U is then the first column of G divided by its norm \(\sqrt{h}\). Let \(s>0\) be arbitrary (to be selected later). Then \(g/\sqrt{h}>u/\sqrt{2}\) implies that either \(g>\sqrt{s}u/\sqrt{2}\) or \(h<s\). As such, the union bound implies

$$\begin{aligned} {\text {Pr}}\bigg ({\text {Re}}(U_{11})>\frac{u}{\sqrt{2}}\bigg ) \le 2\max \bigg \{{\text {Pr}}\bigg (g>\sqrt{s}\frac{u}{\sqrt{2}}\bigg ),{\text {Pr}}(h<s)\bigg \}. \end{aligned}$$
(11)

For the first term, Proposition 7.5 in [19] gives

$$\begin{aligned} {\text {Pr}}\bigg (g>\sqrt{s}\frac{u}{\sqrt{2}}\bigg ) \le e^{-su^2/4}. \end{aligned}$$
(12)

For the second term, Lemma 1 in [30] gives \({\text {Pr}}(h<2n-\sqrt{8nx})\le e^{-x}\) for any \(x>0\). Picking \(x=(2n-s)^2/(8n)\) then gives

$$\begin{aligned} {\text {Pr}}(h<s)\le e^{-(2n-s)^2/(8n)}. \end{aligned}$$
(13)

We use the estimate \(\left( {\begin{array}{c}n\\ k\end{array}}\right) \le (en/k)^k\) when combining (9)–(13) to get

$$\begin{aligned} \log \Big ({\text {Pr}}(\theta (U)>t)\Big ) \le 2k\log \bigg (\frac{en}{k}\bigg )+4k\log \bigg (1+\frac{2}{\epsilon }\bigg )+\log 8-\min \bigg \{\frac{s(t-3\epsilon )^2}{4},\frac{(2n-s)^2}{8n}\bigg \}. \end{aligned}$$

Notice \(n/k\ge (256/\delta ^2)\log (en/k)\ge 256\) implies that taking \(\epsilon =\sqrt{(k/n)\log (en/k)}\) gives

$$\begin{aligned} \sqrt{\frac{en}{k}}-\frac{2}{\epsilon } =\bigg (1-\frac{2}{\sqrt{e\log (en/k)}}\bigg )\sqrt{\frac{en}{k}} \ge \bigg (1-\frac{2}{\sqrt{e\log (256)}}\bigg )\sqrt{256e} \ge 1, \end{aligned}$$

which can be rearranged to get

$$\begin{aligned} \log \bigg (1+\frac{2}{\epsilon }\bigg ) \le \frac{1}{2}\log \bigg (\frac{en}{k}\bigg ). \end{aligned}$$

As such, we also pick \(s=n\) and \(t=\sqrt{(64k/n)\log (en/k)}\) to get

$$\begin{aligned} \log \Big ({\text {Pr}}(\theta (U)>t)\Big ) \le 4k\log \bigg (\frac{en}{k}\bigg )+\log 8-\frac{25}{4}k\log \bigg (\frac{en}{k}\bigg ) \le \log 8-2k\log \bigg (\frac{en}{k}\bigg ). \end{aligned}$$

Since we chose k to be the largest integer satisfying (8), we therefore have \(\theta (U)\le \sqrt{(64k/n)\log (en/k)}\) with probability \(\ge 1-8e^{-\delta ^2n/256}\). Lemma 6 then gives the result. \(\square \)

3 Low Uncertainty with the Discrete Fourier Transform

In this section, we study functions which achieve either exact or near equality in our multiplicative uncertainty principle (5) in the case where the unitary matrix U is the discrete Fourier transform.

3.1 Exact Equality in the Multiplicative Uncertainty Principle

We seek to understand when equality is achieved in (5) in the special case of the discrete Fourier transform. For reference, the analogous result for (1) is already known:

Theorem 9

[17, Theorem 13] Suppose \(x\in \ell (\mathbb {Z}_n)\) satisfies \(\Vert x\Vert _0\Vert Fx\Vert _0=n\). Then x has the form \(x=cT^aM^b1_K\), where \(c\in \mathbb {C}\), K is a subgroup of \(\mathbb {Z}_n\), and \(T,M:\ell (\mathbb {Z}_n)\rightarrow \ell (\mathbb {Z}_n)\) are translation and modulation operators defined by

$$\begin{aligned} (Tx)[j]:=x[j-1], \qquad (Mx)[j]:=e^{2\pi ij/n}x[j] \qquad \forall j\in \mathbb {Z}_n. \end{aligned}$$

Here, i denotes the imaginary unit \(\sqrt{-1}\).

In words, equality is achieved in (1) by indicator functions of subgroups, namely, the so-called Dirac combs (as well as their scalar multiples, translations, and modulations). We seek an analogous characterization for our uncertainty principle (5). Surprisingly, the characterization is identical:

Theorem 10

Suppose \(x\in \ell (\mathbb {Z}_n)\). Then \({\text {ns}}(x){\text {ns}}(Fx)=n\) if and only if \(\Vert x\Vert _0\Vert Fx\Vert _0=n\).

Proof

(\(\Leftarrow \)) This follows directly from (4), along with Theorems 1 and 2.

(\(\Rightarrow \)) It suffices to show that \({\text {ns}}(x)=\Vert x\Vert _0\) and \({\text {ns}}(Fx)=\Vert Fx\Vert _0\). Note that both F and \(F^{-1}\) are unitary operators and \(\Vert F\Vert _{1\rightarrow \infty }^2=\Vert F^{-1}\Vert _{1\rightarrow \infty }^2=1/n\). By assumption, taking \(y:=Fx\) then gives

$$\begin{aligned} {\text {ns}}(F^{-1}y){\text {ns}}(y)={\text {ns}}(x){\text {ns}}(Fx)=n. \end{aligned}$$

We will use the fact that x and y each achieve equality in the first part of Theorem 2 with \(U=F\) and \(U=F^{-1}\), respectively. Notice from the chain of inequalities in (7) that equality occurs only if x and y satisfy equality in Hölder’s inequality, that is,

$$\begin{aligned} \Vert x\Vert _1\Vert x\Vert _{\infty }=\Vert x\Vert _2^2,\qquad \Vert y\Vert _1\Vert y\Vert _{\infty }=\Vert y\Vert _2^2. \end{aligned}$$
(14)

To achieve the first equality in (14),

$$\begin{aligned} \sum _{j\in \mathbb {Z}_n}|x[j]|^2 =\Vert x\Vert _2^2 =\Vert x\Vert _1\Vert x\Vert _{\infty } =\sum _{j\in \mathbb {Z}_n}|x[j]|\max _{k\in \mathbb {Z}_n}|x[k]|. \end{aligned}$$

This implies that \(|x[j]|=\max _k|x[k]|\) for every j with \(x[j]\ne 0\). Similarly, in order for the second equality in (14) to hold, \(|y[j]|=\max _k|y[k]|\) for every j with \(y[j]\ne 0\). As such, \(|x|=a1_A\) and \(|y|=b1_B\) for some \(a,b>0\) and \(A,B\subseteq \mathbb {Z}_n\). Then

$$\begin{aligned} {\text {ns}}(x) =\frac{\Vert x\Vert _1^2}{\Vert x\Vert _2^2} =\frac{(a|A|)^2}{a^2|A|} =|A| =\Vert x\Vert _0, \end{aligned}$$

and similarly, \({\text {ns}}(y)=\Vert y\Vert _0\). \(\square \)

3.2 Near Equality in the Multiplicative Uncertainty Principle

Having established that equality in the new multiplicative uncertainty principle (5) is equivalent to equality in the analogous principle (1), we wish to separate these principles by focusing on near equality. For example, in the case where n is prime, \(\mathbb {Z}_n\) has no nontrivial proper subgroups, and so by Theorem 9, equality is only possible with identity basis elements and complex exponentials. On the other hand, we expect the new principle to accommodate nearly sparse vectors, and so we appeal to the discrete Gaussian depicted in Fig. 1:

Theorem 11

Define \(x\in \ell (\mathbb {Z}_n)\) by

$$\begin{aligned} x[j] :=\sum _{j'\in \mathbb {Z}}e^{-n\pi (\frac{j}{n}+j')^2} \qquad \forall j\in \mathbb {Z}_n. \end{aligned}$$
(15)

Then \(Fx=x\) and \({\text {ns}}(x){\text {ns}}(Fx)\le (2+o(1))n\).

In words, the discrete Gaussian achieves near equality in the uncertainty principle (5). Moreover, numerical evidence suggests that \({\text {ns}}(x){\text {ns}}(Fx)=(2+o(1))n\), i.e., the 2 is optimal for the discrete Gaussian. Note that this does not depend on whether n is prime or a perfect square. Recall that a function \(f\in C^{\infty }(\mathbb {R})\) is Schwartz if \(\sup _{x\in \mathbb {R}}|x^{\alpha } f^{(\beta )}(x)|<\infty \) for every pair of nonnegative integers \(\alpha \) and \(\beta \). We use this to quickly prove a well-known lemma that will help us prove Theorem 11:
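The numerical evidence mentioned above is easy to reproduce; the sketch below (ours, truncating the periodization in (15) to negligible tail terms) prints \({\text {ns}}(x){\text {ns}}(Fx)/n\) for a few values of n, and the ratio hovers near 2.

```python
import numpy as np

def discrete_gaussian(n):
    """The function in (15), with the periodization truncated to negligible tails."""
    j = np.arange(n)[:, None]
    shifts = np.arange(-20, 21)[None, :]
    return np.exp(-n * np.pi * (j / n + shifts)**2).sum(axis=1)

ns = lambda v: np.linalg.norm(v, 1)**2 / np.linalg.norm(v, 2)**2
for n in (211, 1009, 10007):
    x = discrete_gaussian(n)
    Fx = np.fft.fft(x, norm="ortho")
    print(n, ns(x) * ns(Fx) / n)              # close to 2 for each n
```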

Lemma 12

Suppose \(f\in C^{\infty }(\mathbb {R})\) is Schwartz and construct a discrete function \(x\in \ell (\mathbb {Z}_n)\) by periodizing and sampling f as follows:

$$\begin{aligned} x[j]=\sum _{j'\in \mathbb {Z}}f\bigg (\frac{j}{n}+j'\bigg ) \qquad \forall j\in \mathbb {Z}_n. \end{aligned}$$
(16)

Then the discrete Fourier transform of x is determined by \(\hat{f}(\xi ):=\int _{-\infty }^\infty f(t)e^{-2\pi i \xi t}dt\):

$$\begin{aligned} (Fx)[k]=\sqrt{n}\sum _{k'\in \mathbb {Z}}\hat{f}(k+k'n) \qquad \forall k\in \mathbb {Z}_n. \end{aligned}$$

Proof

Since f is Schwartz, we may apply the Poisson summation formula:

$$\begin{aligned} x[j] =\sum _{j'\in \mathbb {Z}}f\bigg (\frac{j}{n}+j'\bigg ) =\sum _{l\in \mathbb {Z}}\hat{f}(l)e^{2\pi ijl/n}. \end{aligned}$$

Next, the geometric sum formula gives

$$\begin{aligned} (Fx)[k]&=\frac{1}{\sqrt{n}}\sum _{j\in \mathbb {Z}_n}\bigg (\sum _{l\in \mathbb {Z}}\hat{f}(l)e^{2\pi ijl/n}\bigg )e^{-2\pi ijk/n}\\&=\frac{1}{\sqrt{n}}\sum _{l\in \mathbb {Z}}\hat{f}(l)\sum _{j\in \mathbb {Z}_n}\Big (e^{2\pi i(l-k)/n}\Big )^j =\sqrt{n}\sum _{\begin{array}{c} l\in \mathbb {Z}\\ l\equiv k\bmod n \end{array}}\hat{f}(l). \end{aligned}$$

The result then follows from a change of variables. \(\square \)
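Lemma 12 is also easy to sanity-check numerically. The sketch below (ours, for illustration) uses the Schwartz function \(f(t)=e^{-t^2}\), whose Fourier transform is \(\hat{f}(\xi )=\sqrt{\pi }e^{-\pi ^2\xi ^2}\), and compares both sides of the lemma after truncating the infinite sums to negligible tails.

```python
import numpy as np

n = 32
j = np.arange(n)[:, None]
shifts = np.arange(-40, 41)[None, :]
f    = lambda t:  np.exp(-t**2)                            # a Schwartz function
fhat = lambda xi: np.sqrt(np.pi) * np.exp(-np.pi**2 * xi**2)

x = f(j / n + shifts).sum(axis=1)                          # periodize and sample as in (16)
lhs = np.fft.fft(x, norm="ortho")                          # discrete Fourier transform of x
rhs = np.sqrt(n) * fhat(j + shifts * n).sum(axis=1)        # right-hand side of Lemma 12
print(np.max(np.abs(lhs - rhs)))                           # agreement to machine precision
```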

Proof of Theorem 11

It is straightforward to verify that the function \(f(t)=e^{-n\pi t^2}\) is Schwartz. Note that defining x according to (16) then produces (15). Considering \(\hat{f}(\xi )=n^{-1/2}e^{-\pi \xi ^2/n}\), one may use Lemma 12 to quickly verify that \(Fx=x\). To prove Theorem 11, it then suffices to show that \({\text {ns}}(x)\le (\sqrt{2}+o(1))\sqrt{n}\). We accomplish this by bounding \(\Vert x\Vert _2\) and \(\Vert x\Vert _1\) separately.

To bound \(\Vert x\Vert _2\), we first expand a square to get

$$\begin{aligned} \Vert x\Vert _2^2 =\sum _{j\in \mathbb {Z}_n}\bigg (\sum _{j'\in \mathbb {Z}}e^{-n\pi (\frac{j}{n}+j')^2}\bigg )^2 =\sum _{j\in \mathbb {Z}_n}\sum _{j'\in \mathbb {Z}}\sum _{j''\in \mathbb {Z}}e^{-n\pi [(\frac{j}{n}+j')^2+(\frac{j}{n}+j'')^2]}. \end{aligned}$$

Since all of the terms in the sum are nonnegative, we may infer a lower bound by discarding the terms for which \(j''\ne j'\). This yields the following:

$$\begin{aligned} \Vert x\Vert _2^2 \ge \sum _{j\in \mathbb {Z}_n}\sum _{j'\in \mathbb {Z}}e^{-2n\pi (\frac{j}{n}+j')^2} =\sum _{k\in \mathbb {Z}}e^{-2\pi k^2/n} \ge \int _{-\infty }^{\infty }e^{-2\pi x^2/n}dx-1 =\sqrt{\frac{n}{2}}-1, \end{aligned}$$

where the last inequality follows from an integral comparison. Next, we bound \(\Vert x\Vert _1\) using a similar integral comparison:

$$\begin{aligned} \Vert x\Vert _1 =\sum _{j\in \mathbb {Z}_n}\sum _{j'\in \mathbb {Z}}e^{-n\pi (\frac{j}{n}+j')^2} =\sum _{k\in \mathbb {Z}}e^{-\pi k^2/n} \le \int _{-\infty }^\infty e^{-\pi x^2/n}dx+1 =\sqrt{n}+1. \end{aligned}$$

Overall, we have

$$\begin{aligned} {\text {ns}}(x) =\frac{\Vert x\Vert _1^2}{\Vert x\Vert _2^2} \le \frac{(\sqrt{n}+1)^2}{\sqrt{n/2}-1} =(\sqrt{2}+o(1))\sqrt{n}. \end{aligned}$$

\(\square \)

4 Applications

Having studied the new uncertainty principles in Theorem 2, we now take some time to identify certain consequences in various sparse signal processing applications. In particular, we report consequences in sparse signal demixing, in compressed sensing with partial Fourier operators, and in the fast detection of sparse signals.

4.1 Sparse Signal Demixing

Suppose a signal x is sparse in the Fourier domain and corrupted by noise \(\epsilon \) which is sparse in the time domain (such as speckle). The goal of demixing is to recover the original signal x given the corrupted signal \(z=x+\epsilon \); see [32] for a survey of various related demixing problems. Provided Fx and \(\epsilon \) are sufficiently sparse, it is known that this recovery can be accomplished by solving

$$\begin{aligned} v^{\star }\quad :=\quad {\text {argmin}}\quad \Vert v\Vert _1\quad \text {subject to}\quad [I~F]v=Fz, \end{aligned}$$
(17)

where, if successful, the solution \(v^{\star }\) is the column vector obtained by concatenating Fx and \(\epsilon \); see [38] for an early appearance of this sort of approach. To some extent, we know how sparse Fx and \(\epsilon \) must be for this \(\ell _1\) recovery method to succeed. Coherence-based guarantees in [15, 16, 21] show that it suffices for \(v^{\star }\) to be k-sparse with \(k=O(\sqrt{n})\), while restricted isometry–based guarantees [5, 11] allow for \(k=O(n)\) if [I F] is replaced with a random matrix. This disparity is known as the square-root bottleneck [46]. In particular, does [I F] perform similarly to a random matrix, or is the coherence-based sufficient condition on k also necessary?

In the case where n is a perfect square, it is well known that the coherence-based sufficient condition is also necessary. Indeed, let K denote the subgroup of \(\mathbb {Z}_n\) of size \(\sqrt{n}\) and suppose \(x=1_K\) and \(\epsilon =-1_K\). Then \([Fx;\epsilon ]\) is \(2\sqrt{n}\)-sparse, and yet \(z=0\), thereby forcing \(v^\star =0\). On the other hand, if n is prime, then the additive uncertainty principle of Theorem 1 implies that every member of the nullspace of [I F] has at least \(n+1\) nonzero entries, and so \(v^\star \ne 0\) in this setting. Still, considering Fig. 1, one might expect a problem from a stability perspective. In this section, we use numerical sparsity to show that \(\Phi =[I~F]\) cannot break the square-root bottleneck, even if n is prime. To do this, we will make use of the following theorem:

Theorem 13

(see [9, 27]) Denote \(\Delta (y):={\text {argmin}}\Vert x\Vert _1\) subject to \(\Phi x=y\). Then

$$\begin{aligned} \Vert \Delta (\Phi x)-x\Vert _2\le \frac{C}{\sqrt{k}}\Vert x-x_k\Vert _1\qquad \forall x\in \mathbb {R}^n \end{aligned}$$
(18)

if and only if \(\Phi \) satisfies the (k, c)-width property. Furthermore, \(C\asymp c\) in both directions of the equivalence.

Take x as defined in (15). Then \([x;-x]\) lies in the nullspace of [I F] and

$$\begin{aligned} {\text {ns}}\big ([x;-x]\big ) =\frac{(2\Vert x\Vert _1)^2}{2\Vert x\Vert _2^2} =2{\text {ns}}(x) \le (2\sqrt{2}+o(1))\sqrt{n}, \end{aligned}$$

where the last step follows from the proof of Theorem 11. As such, [I F] satisfies the (k, c)-width property for some c independent of n only if \(k=O(\sqrt{n})\). Furthermore, Theorem 13 implies that stable demixing by \(\ell _1\) reconstruction requires \(k=O(\sqrt{n})\), thereby proving the necessity of the square-root bottleneck in this case.
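For concreteness, the nullspace vector used in this argument can be examined directly; in the sketch below (ours, with the periodization in (15) truncated to negligible tails), \([x;-x]\) is numerically annihilated by [I F] and its numerical sparsity scales like \(2\sqrt{2n}\).

```python
import numpy as np

n = 10007
j = np.arange(n)[:, None]
x = np.exp(-n * np.pi * (j / n + np.arange(-20, 21)[None, :])**2).sum(axis=1)
z = np.concatenate([x, -x])                       # candidate nullspace vector of [I F]
residual = x + np.fft.fft(-x, norm="ortho")       # [I F][x; -x] = x - Fx
ns = lambda v: np.linalg.norm(v, 1)**2 / np.linalg.norm(v, 2)**2
print(np.linalg.norm(residual))                   # numerically zero
print(ns(z) / np.sqrt(n))                         # close to 2*sqrt(2), i.e., ns(z) = O(sqrt(n))
```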

It is worth mentioning that the restricted isometry property is a sufficient condition for (18) (see [11], for example), and so by Theorem 8, one can break the square-root bottleneck by replacing the F in [I F] with a random unitary matrix. This gives a uniform demixing guarantee which is similar to those provided by McCoy and Tropp [33], though the convex program they consider differs from (17).

4.2 Compressed Sensing with Partial Fourier Operators

Consider the random \(m\times n\) matrix obtained by drawing rows uniformly with replacement from the \(n\times n\) discrete Fourier transform matrix. If \(m=\Omega _\delta (k{\text {polylog}} n)\), then the resulting partial Fourier operator satisfies the restricted isometry property, and this fact has been dubbed the uniform uncertainty principle [12]. A fundamental problem in compressed sensing is determining the smallest number m of random rows necessary. To summarize the progress to date, Candès and Tao [12] first found that \(m=\Omega _\delta (k\log ^6n)\) rows suffice, then Rudelson and Vershynin [37] proved \(m=\Omega _\delta (k\log ^4n)\), and recently, Bourgain [8] achieved \(m=\Omega _\delta (k\log ^3n)\); Nelson, Price and Wootters [34] also achieved \(m=\Omega _\delta (k\log ^3n)\), but using a slightly different measurement matrix. In this subsection, we provide a lower bound: in particular, \(m=\Omega _\delta (k\log n)\) is necessary whenever k divides n. Our proof combines ideas from the multiplicative uncertainty principle and the classical problem of coupon collecting.

The coupon collector’s problem asks how long it takes to collect all k coupons in an urn if you repeatedly draw one coupon at a time randomly with replacement. It is a worthwhile exercise to prove that the expected number of trials scales like \(k\log k\). We will require even more information about the distribution of the random number of trials:
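A short simulation makes this scaling plausible; the sketch below (ours, for illustration) estimates \(\mathbb {E}T_k\) empirically and compares it with \(k\log k+\gamma k\).

```python
import numpy as np

def coupon_trials(k, rng):
    """Uniform draws with replacement needed to see all k coupons at least once."""
    seen, trials = set(), 0
    while len(seen) < k:
        seen.add(int(rng.integers(k)))
        trials += 1
    return trials

rng = np.random.default_rng(0)
k = 200
samples = [coupon_trials(k, rng) for _ in range(2000)]
print(np.mean(samples), k * np.log(k) + 0.5772 * k)   # both roughly 1175
```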

Theorem 14

(see [13, 18]) Let \(T_k\) denote the random number of trials it takes to collect k different coupons, where in each trial, a coupon is drawn uniformly from the k coupons with replacement.

  1. (a)

    For each \(a\in \mathbb {R}\),

    $$\begin{aligned} \lim _{k\rightarrow \infty }{\text {Pr}}\Big (T_k\le k\log k+ak\Big ) =e^{-e^{-(a+\gamma )}}, \end{aligned}$$

    where \(\gamma \approx 0.5772\) denotes the Euler–Mascheroni constant.

  2. (b)

    There exists \(c>0\) such that for each k,

    $$\begin{aligned} \sup _{a\in \mathbb {R}}\bigg |{\text {Pr}}\Big (T_k\le k\log k+ak\Big )-e^{-e^{-(a+\gamma )}}\bigg | \le \frac{c\log k}{k}. \end{aligned}$$

Lemma 15

Suppose k divides n, and draw m iid rows uniformly from the \(n\times n\) discrete Fourier transform matrix to form a random \(m\times n\) matrix \(\Phi \). If \(m< k\log k\), then the nullspace of \(\Phi \) contains a k-sparse vector with probability \(\ge 0.4-c(\log k)/k\), where c is the constant from Theorem 14(b).

Proof

Let K denote the subgroup of \(\mathbb {Z}_n\) of size k, and let \(1_K\) denote its indicator function. We claim that some modulation of \(1_K\) resides in the nullspace of \(\Phi \) with the probability reported in the lemma statement. Let H denote the subgroup of \(\mathbb {Z}_n\) of size n/k. Then the Fourier transform of each modulation of \(1_K\) is supported on some coset of H. Letting M denote the random row indices that are drawn uniformly from \(\mathbb {Z}_n\), a modulation of \(1_K\) resides in the nullspace of \(\Phi \) precisely when M fails to intersect the corresponding coset of H. As there are k cosets, each with probability 1/k, this amounts to a coupon-collecting problem (explicitly, each “coupon” is a coset, and we “collect” the cosets that M intersects). The result then follows immediately from Theorem 14(b):

$$\begin{aligned} {\text {Pr}}(T_k\le m) \le e^{-e^{-(m/k-\log k+\gamma )}}+\frac{c\log k}{k} \le e^{-e^{-\gamma }}+\frac{c\log k}{k} \le 0.6+\frac{c\log k}{k}. \end{aligned}$$

\(\square \)
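As a sanity check on Lemma 15 and on the coupon-collecting reduction in its proof, the following sketch (ours, with parameters chosen only for illustration) estimates, for m just below \(k\log k\), how often the sampled frequencies miss an entire coset of H, which is precisely the event that some modulated comb lies in the nullspace.

```python
import numpy as np

n, k = 1200, 60                       # k divides n
m = int(k * np.log(k)) - 1            # just below the k*log(k) threshold of Lemma 15
rng = np.random.default_rng(1)
trials, hits = 5000, 0
for _ in range(trials):
    rows = rng.integers(n, size=m)    # m iid rows drawn uniformly from the DFT
    # a modulated comb 1_K lies in the nullspace iff the sampled frequencies
    # miss one of the k cosets of H, i.e., miss some residue class mod k
    if len(np.unique(rows % k)) < k:
        hits += 1
print(hits / trials)                  # an estimate to compare with the 0.4 bound
```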

Presumably, one may remove the divisibility hypothesis in Lemma 15 at the price of weakening the conclusion. We suspect that the new conclusion would declare the existence of a vector x of numerical sparsity k such that \(\Vert \Phi x\Vert _2\ll \Vert x\Vert _2\). If so, then \(\Phi \) fails to satisfy the so-called robust width property, which is necessary and sufficient for stable and robust reconstruction by \(\ell _1\) minimization [9]. For the sake of simplicity, we decided not to approach this, but we suspect that modulations of the discrete Gaussian would adequately fill the role of the current proof’s modulated indicator functions.

What follows is the main result of this subsection:

Theorem 16

Let k be sufficiently large, suppose k divides n, and draw m iid rows uniformly from the \(n\times n\) discrete Fourier transform matrix to form a random \(m\times n\) matrix \(\Phi \). Take \(\delta <1/3\). Then \(\Phi \) satisfies the \((k,\delta )\)-restricted isometry property with probability \(\ge 2/3\) only if

$$\begin{aligned} m\ge C(\delta )k\log (en), \end{aligned}$$

where \(C(\delta )\) is some constant depending only on \(\delta \).

Proof

In the event that \(\Phi \) satisfies \((k,\delta )\)-RIP, we know that no k-sparse vector lies in the nullspace of \(\Phi \). Therefore, Lemma 15 implies

$$\begin{aligned} m\ge k\log k, \end{aligned}$$
(19)

since otherwise \(\Phi \) fails to be \((k,\delta )\)-RIP with probability \(\ge 0.4-c(\log k)/k>1/3\), where the last step uses the fact that k is sufficiently large. Next, we leverage standard techniques from compressed sensing: \((k,\delta )\)-RIP implies (18) with \(C=C_1(\delta )\) (see [10, Theorem 3.3]), which in turn implies

$$\begin{aligned} m\ge C_2(\delta )k\log \bigg (\frac{en}{k}\bigg ) \end{aligned}$$
(20)

by Theorem 11.7 in [19]. Since \(\Phi \) is \((k,\delta )\)-RIP with positive probability, we know there exists an \(m\times n\) matrix which is \((k,\delta )\)-RIP, and so m must satisfy (20). Combining with (19) then gives

$$\begin{aligned} m&\ge \max \bigg \{k\log k,C_2(\delta )k\log \bigg (\frac{en}{k}\bigg )\bigg \}. \end{aligned}$$

The result then follows from applying the bound \(\max \{a,b\}\ge (a+b)/2\) and then taking \(C(\delta ):=(1/2)\min \{1,C_2(\delta )\}\). \(\square \)

We note that the necessity of \(k\log n\) random measurements contrasts with the proportional-growth asymptotic adopted in [7] to study the restricted isometry property of Gaussian matrices. Indeed, it is common in compressed sensing to consider phase transitions in which k, m and n are taken to infinity with fixed ratios k/m and m/n. However, since random partial Fourier operators fail to be restricted isometries unless \(m=\Omega _\delta (k\log n)\), such a proportional-growth asymptotic fails to capture the so-called strong phase transition of these operators [7].

The proof of Theorem 16 relies on the fact that the measurements are drawn at random. By contrast, it is known that every \(m\times n\) partial Hadamard operator fails to satisfy \((k,\delta )\)-RIP unless \(m=\Omega _\delta (k\log n)\) [22, 41]. We leave the corresponding deterministic result in the Fourier case for future work.

4.3 Fast Detection of Sparse Signals

The previous subsection established fundamental limits on the number of Fourier measurements necessary to perform compressed sensing with a uniform guarantee. However, for some applications, signal reconstruction is unnecessary. In this subsection, we consider one such application, namely sparse signal detection, in which the goal is to test the following hypotheses:

$$\begin{aligned} H_0&:~x=0\\ H_1&:~\Vert x\Vert _2^2=\frac{n}{k},~\Vert x\Vert _0\le k. \end{aligned}$$

Here, we assume we know the 2-norm of the sparse vector we intend to detect, and we set it to be \(\sqrt{n/k}\) without loss of generality (this choice of scaling will help us interpret our results later). We will assume the data is accessed according to the following query–response model:

Definition 17

(Query–response model) If the ith query is \(j_i\in \mathbb {Z}_n\), then the ith response is \((Fx)[j_i]+\epsilon _i\), where the \(\epsilon _i\)’s are iid complex random variables with some distribution such that

$$\begin{aligned} \mathbb {E}|\epsilon _i|=\alpha , \qquad \mathbb {E}|\epsilon _i|^2=\beta ^2. \end{aligned}$$

The coefficient of variation v of \(|\epsilon _i|\) is defined as

$$\begin{aligned} v =\frac{\sqrt{{\text {Var}}|\epsilon _i|}}{\mathbb {E}|\epsilon _i|} =\frac{\sqrt{\beta ^2-\alpha ^2}}{\alpha }. \end{aligned}$$
(21)

Note that for any scalar \(c\ne 0\), the mean and variance of \(|c\epsilon _i|\) are \(|c|\alpha \) and \(|c|^2{\text {Var}}|\epsilon _i|\), respectively. As such, v is scale invariant and is simply a quantification of the “shape” of the distribution of \(|\epsilon _i|\). We will evaluate the responses to our queries with an \(\ell _1\) detector, defined below.

Definition 18

(\(\ell _1\) detector) Fix a threshold \(\tau \). Given responses \(\{y_i\}_{i=1}^m\) from the query–response model, if

$$\begin{aligned} \sum _{i=1}^m|y_i|>\tau , \end{aligned}$$

then reject \(H_0\).

The following is the main result of this section:

Theorem 19

Suppose \(\alpha \le 1/(8k)\). Randomly draw m indices uniformly from \(\mathbb {Z}_n\) with replacement, input them into the query–response model and apply the \(\ell _1\) detector with threshold \(\tau =2m\alpha \) to the responses. Then

$$\begin{aligned} {\text {Pr}}\bigg (\text {reject }H_0~\bigg |~H_0\bigg )\le p \end{aligned}$$
(22)

and

$$\begin{aligned} {\text {Pr}}\bigg (\text {fail to reject }H_0~\bigg |~H_1\bigg )\le p \end{aligned}$$
(23)

provided \(m\ge (8k+2v^2)/p\), where v is the coefficient of variation defined in (21).

In words, the probability that the \(\ell _1\) detector delivers a false positive is at most p, as is the probability that it delivers a false negative. These error probabilities can be estimated better given more information about the distribution of the random noise, and presumably, the threshold \(\tau \) can be modified to decrease one error probability at the price of increasing the other. Notice that we only use O(k) samples in the Fourier domain to detect a k-sparse signal. Since the sampled indices are random, it will take \(O(\log n)\) bits to communicate each query, leading to a total computational burden of \(O(k\log n)\) operations. This contrasts with the state-of-the-art sparse fast Fourier transform algorithms which require \(\Omega (k\log (n/k))\) samples and take \(O(k{\text {polylog}}n)\) time (see [26] and references therein). We suspect k-sparse signals cannot be detected with substantially fewer samples (in the Fourier domain or any domain).

We also note that the acceptable noise magnitude \(\alpha =O(1/k)\) is optimal in some sense. To see this, consider the case where k divides n and x is a properly scaled indicator function of the subgroup of size k. Then Fx is the indicator function of the subgroup of size n/k. (Thanks to our choice of scaling, each nonzero entry in the Fourier domain has unit magnitude.) Since a proportion of 1/k entries is nonzero in the Fourier domain, we can expect to require O(k) random samples in order to observe a nonzero entry, and the \(\ell _1\) detector will not distinguish the entry from accumulated noise unless \(\alpha =O(1/k)\).
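The detection procedure itself is only a few lines. The Monte Carlo sketch below (ours; the noise model, exponential magnitude with uniform phase so that \(v=1\), is one admissible choice rather than an assumption of the theorem) estimates both error probabilities, which come out far below the guaranteed level p for this particular noise.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, p = 4096, 16, 0.1
alpha = 1 / (8 * k)                           # admissible noise level
v = 1.0                                       # coefficient of variation of the noise below
m = int(np.ceil((8 * k + 2 * v**2) / p))      # number of queries from Theorem 19
tau = 2 * m * alpha

def noise(size):
    """Complex noise with E|eps| = alpha and E|eps|^2 = 2*alpha^2, so v = 1."""
    return alpha * rng.exponential(size=size) * np.exp(2j * np.pi * rng.random(size))

def reject_H0(x):
    idx = rng.integers(n, size=m)             # random Fourier-domain queries
    y = np.fft.fft(x, norm="ortho")[idx] + noise(m)
    return np.sum(np.abs(y)) > tau            # the l1 detector

x1 = np.zeros(n)                              # an H1 signal: k-sparse with ||x||_2^2 = n/k
x1[rng.choice(n, k, replace=False)] = np.sqrt(n) / k
print(np.mean([reject_H0(np.zeros(n)) for _ in range(200)]))   # false-positive rate
print(1 - np.mean([reject_H0(x1) for _ in range(200)]))        # false-negative rate
```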

Before proving Theorem 19, we first prove a couple of lemmas. We start by estimating the probability of a false positive:

Lemma 20

Take \(\epsilon _1,\ldots ,\epsilon _m\) to be iid complex random variables with \(\mathbb {E}|\epsilon _i|=\alpha \) and \(\mathbb {E}|\epsilon _i|^2=\beta ^2\). Then

$$\begin{aligned} {\text {Pr}}\bigg (\sum _{i=1}^m|\epsilon _i|> 2m\alpha \bigg )\le p \end{aligned}$$

provided \(m\ge v^2/p\), where v is the coefficient of variation of \(|\epsilon _i|\) defined in (21).

Proof

Denoting \(X:=\sum _{i=1}^m|\epsilon _i|\), we have \(\mathbb {E}X=m\alpha \) and \({\text {Var}}X=m(\beta ^2-\alpha ^2)\). Chebyshev’s inequality then gives

$$\begin{aligned} {\text {Pr}}\bigg (\sum _{i=1}^m|\epsilon _i|-m\alpha>t\bigg ) \le {\text {Pr}}(|X-\mathbb {E}X|>t) \le \frac{{\text {Var}}X}{t^2} =\frac{m(\beta ^2-\alpha ^2)}{t^2}. \end{aligned}$$

Finally, we take \(t=m\alpha \) to get

$$\begin{aligned} {\text {Pr}}\bigg (\sum _{i=1}^m|\epsilon _i|>2m\alpha \bigg ) \le m\frac{(\beta ^2-\alpha ^2)}{(m\alpha )^2} =\frac{\beta ^2-\alpha ^2}{m\alpha ^2} \le \frac{\beta ^2-\alpha ^2}{\alpha ^2}\cdot \frac{p}{v^2} =p. \end{aligned}$$

\(\square \)

Next, we leverage the multiplicative uncertainty principle in Theorem 2 to estimate moments of noiseless responses:

Lemma 21

Suppose \(\Vert x\Vert _0\le k\) and \(\Vert x\Vert _2^2=n/k\). Draw j uniformly from \(\mathbb {Z}_n\) and define \(Y:=|(Fx)[j]|\). Then

$$\begin{aligned} \mathbb {E}Y\ge \frac{1}{k}, \qquad \mathbb {E}Y^2=\frac{1}{k}. \end{aligned}$$

Proof

Recall that \({\text {ns}}(x)\le \Vert x\Vert _0\le k\). With this, Theorem 2 gives

$$\begin{aligned} n \le {\text {ns}}(x){\text {ns}}(Fx) \le k{\text {ns}}(Fx). \end{aligned}$$

We rearrange and apply the definition of numerical sparsity to get

$$\begin{aligned} \frac{n}{k} \le {\text {ns}}(Fx) =\frac{\Vert Fx\Vert _1^2}{\Vert Fx\Vert _2^2} =\frac{\Vert Fx\Vert _1^2}{\Vert x\Vert _2^2} =\frac{\Vert Fx\Vert _1^2}{n/k}, \end{aligned}$$

where the second to last equality is due to Parseval’s identity. Thus, \(\Vert Fx\Vert _1\ge n/k\). Finally,

$$\begin{aligned} \mathbb {E}Y =\frac{1}{n}\sum _{j\in \mathbb {Z}_n}|(Fx)[j]| =\frac{1}{n}\Vert Fx\Vert _1 \ge \frac{1}{k} \end{aligned}$$

and

$$\begin{aligned} \mathbb {E}Y^2 =\frac{1}{n}\sum _{j\in \mathbb {Z}_n}|(Fx)[j]|^2 =\frac{1}{n}\Vert Fx\Vert _2^2 =\frac{1}{k}. \end{aligned}$$

\(\square \)

Proof of Theorem 19

Lemma 20 gives (22), and so it remains to prove (23). Denoting \(Y_i:=|(Fx)[j_i]|\), we know that \(|y_i|\ge Y_i-|\epsilon _i|\), and so

$$\begin{aligned} {\text {Pr}}\bigg (\sum _{i=1}^m|y_i|\le 2m\alpha \bigg ) \le {\text {Pr}}\bigg (\sum _{i=1}^mY_i-\sum _{i=1}^m|\epsilon _i|\le 2m\alpha \bigg ). \end{aligned}$$
(24)

For notational convenience, put \(Z:=\sum _{i=1}^mY_i-\sum _{i=1}^m|\epsilon _i|\). We condition on the size of the noise and apply Lemma 20 with the fact that \(m\ge v^2/(p/2)\) to bound (24):

$$\begin{aligned} {\text {Pr}}(Z\le 2m\alpha )&={\text {Pr}}\bigg (Z\le 2m\alpha ~\bigg |~\sum _{i=1}^m|\epsilon _i|>2m\alpha \bigg ){\text {Pr}}\bigg (\sum _{i=1}^m|\epsilon _i|>2m\alpha \bigg )\nonumber \\&\qquad +{\text {Pr}}\bigg (Z\le 2m\alpha ~\bigg |~\sum _{i=1}^m|\epsilon _i|\le 2m\alpha \bigg ){\text {Pr}}\bigg (\sum _{i=1}^m|\epsilon _i|\le 2m\alpha \bigg )\nonumber \\&\le \frac{p}{2}+{\text {Pr}}\bigg (\sum _{i=1}^mY_i\le 4m\alpha \bigg ). \end{aligned}$$
(25)

Now we seek to bound the second term of (25). Taking \(X=\sum _{i=1}^mY_i\), Lemma 21 gives \(\mathbb {E}X\ge m/k\) and \({\text {Var}}X=m{\text {Var}}Y_i\le m\mathbb {E}Y_i^2=m/k\). As such, applying Chebyshev’s inequality gives

$$\begin{aligned} {\text {Pr}}\bigg (\sum _{i=1}^mY_i<\frac{m}{k}-t\bigg ) \le {\text {Pr}}(X\le \mathbb {E}X-t) \le {\text {Pr}}(|X-\mathbb {E}X|>t) \le \frac{{\text {Var}}(X)}{t^2} \le \frac{m}{kt^2}. \end{aligned}$$

Recalling that \(\alpha \le 1/(8k)\), we take \(t=m/(2k)\) to get

$$\begin{aligned} {\text {Pr}}\bigg (\sum _{i=1}^mY_i\le 4m\alpha \bigg ) \le {\text {Pr}}\bigg (\sum _{i=1}^mY_i\le \frac{m}{2k}\bigg ) ={\text {Pr}}\bigg (\sum _{i=1}^mY_i\le \frac{m}{k}-t\bigg ) \le \frac{m}{kt^2} =\frac{4k}{m} \le \frac{p}{2}, \end{aligned}$$
(26)

where the last step uses the fact that \(m\ge 8k/p\). Combining (24), (25), and (26) gives the result. \(\square \)