Abstract
A matrix \(A \in \mathbb{C}^{q\times N}\) satisfies the restricted isometry property of order k with constant ε if it preserves the \(\ell_2\) norm of all k-sparse vectors up to a factor of \(1 \pm \epsilon\). We prove that a matrix A obtained by randomly sampling \(q = O(k \cdot \log^2 k \cdot \log N)\) rows from an N × N Fourier matrix satisfies the restricted isometry property of order k with a fixed ε with high probability. This improves on Rudelson and Vershynin (Comm Pure Appl Math, 2008), its subsequent improvements, and Bourgain (GAFA Seminar Notes, 2014).
1 Introduction
A matrix \(A \in \mathbb{C}^{q\times N}\) satisfies the restricted isometry property of order k with constant ε > 0 if for every k-sparse vector \(x \in \mathbb{C}^{N}\) (i.e., a vector with at most k nonzero entries), it holds that
\[(1-\epsilon)\cdot \|x\|_2^2 \;\le\; \|Ax\|_2^2 \;\le\; (1+\epsilon)\cdot \|x\|_2^2. \tag{1}\]
Intuitively, this means that every k columns of A are nearly orthogonal. This notion, due to Candès and Tao [9], was intensively studied during the last decade and found various applications and connections to several areas of theoretical computer science, including sparse recovery [8, 20, 27], coding theory [14], norm embeddings [6, 22], and computational complexity [4, 25, 31].
The original motivation for the restricted isometry property comes from the area of compressed sensing. There, one wishes to compress a high-dimensional sparse vector \(x \in \mathbb{C}^{N}\) to a vector Ax, where \(A \in \mathbb{C}^{q\times N}\) is a measurement matrix that enables reconstruction of x from Ax. Typical goals in this context include minimizing the number of measurements q and the running time of the reconstruction algorithm. It is known that the restricted isometry property of A, for \(\epsilon <\sqrt{2} - 1\), is a sufficient condition for reconstruction. In fact, it was shown in [8, 9, 11, 12] that under this condition, reconstruction is equivalent to finding the vector of least \(\ell_1\) norm among all vectors that agree with the given measurements, a task that can be formulated as a linear program [13, 16], and thus can be solved efficiently.
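As a small illustration of the reconstruction step just described, the \(\ell_1\)-minimization task can be cast as a linear program and handed to an off-the-shelf solver. The sketch below is not from the paper: it uses a real-valued Gaussian measurement matrix and illustrative dimensions, with the standard reformulation of minimizing \(\sum_i t_i\) subject to \(-t \le x \le t\) and \(Ax = b\).

```python
# Sketch: sparse recovery by l1 minimization, written as a linear program
#   min 1^T t   s.t.   x - t <= 0,  -x - t <= 0,  Ax = b
# (real-valued for simplicity; a Gaussian matrix stands in for the Fourier one).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, q, k = 60, 30, 3                       # illustrative dimensions

A = rng.standard_normal((q, N)) / np.sqrt(q)            # measurement matrix
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)  # k-sparse signal
b = A @ x0                                              # the measurements

# Variables z = [x; t]; minimize the sum of the auxiliary bounds t.
c = np.concatenate([np.zeros(N), np.ones(N)])
I = np.eye(N)
A_ub = np.block([[I, -I], [-I, -I]])       # encodes -t <= x <= t
b_ub = np.zeros(2 * N)
A_eq = np.hstack([A, np.zeros((q, N))])    # encodes Ax = b
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=[(None, None)] * (2 * N))
x_hat = res.x[:N]                          # the l1-minimal solution
```

Since the true signal x0 is feasible for this program, the solution is guaranteed to have \(\ell_1\) norm at most \(\|x_0\|_1\) while matching the measurements.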
The above application leads to the challenge of finding matrices \(A \in \mathbb{C}^{q\times N}\) that satisfy the restricted isometry property and have a small number of rows q as a function of N and k. (For simplicity, we ignore for now the dependence on ε.) A general lower bound of \(q = \Omega (k \cdot \log (N/k))\) is known to follow from [18] (see also [17]). Fortunately, there are matrices that match this lower bound, e.g., random matrices whose entries are chosen independently according to the normal distribution [10]. However, in many applications the measurement matrix cannot be chosen arbitrarily but is instead given by a random sample of rows from a unitary matrix, typically the discrete Fourier transform. This includes, for instance, various tests and experiments in medicine and biology (e.g., MRI [28] and ultrasound imaging [21]) and applications in astronomy (e.g., radio telescopes [32]). An advantage of subsampled Fourier matrices is that they support fast matrix-vector multiplication, and as such, are useful for efficient compression as well as for efficient reconstruction based on iterative methods (see, e.g., [26]).
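To make the fast-multiplication point concrete, here is a small sketch (illustrative parameters, not from the paper) showing that a subsampled DFT measurement Ax can be computed with a single FFT instead of forming the q × N matrix explicitly:

```python
# Sketch: a subsampled Fourier measurement Ax computed in O(N log N) time via
# the FFT, versus the explicit q x N matrix product.
import numpy as np

rng = np.random.default_rng(1)
N, q = 256, 40
Q = rng.integers(0, N, size=q)             # q row indices, with replacement

# Explicit normalized subsampled DFT: A = sqrt(N/q) * F[Q, :], where F is the
# unitary DFT matrix F[j, l] = exp(-2*pi*1j*j*l/N) / sqrt(N).
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)
A = np.sqrt(N / q) * F[Q, :]

x = rng.standard_normal(N)
slow = A @ x                               # explicit matrix-vector product
fast = np.sqrt(N / q) * (np.fft.fft(x) / np.sqrt(N))[Q]  # same values via FFT
```

The FFT route never materializes A, which is what makes iterative reconstruction methods cheap for these matrices.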
In recent years, with motivation from both theory and practice, an intensive line of research has aimed to study the restricted isometry property of random sub-matrices of unitary matrices. Letting \(A \in \mathbb{C}^{q\times N}\) be a (normalized) matrix whose rows are chosen uniformly and independently from the rows of a unitary matrix \(M \in \mathbb{C}^{N\times N}\), the goal is to prove an upper bound on q for which A is guaranteed to satisfy the restricted isometry property with high probability. Note that the fact that the entries of every row of A are not independent makes this question much more difficult than in the case of random matrices with independent entries.
The first upper bound on the number of rows of a subsampled Fourier matrix that satisfies the restricted isometry property was \(O(k \cdot \log^6 N)\), which was proved by Candès and Tao [10]. This was then improved by Rudelson and Vershynin [30] to \(O(k \cdot \log^2 k \cdot \log(k\log N) \cdot \log N)\) (see also [15, 29] for a simplified analysis with better success probability). A modification of their analysis led to an improved bound of \(O(k \cdot \log^3 k \cdot \log N)\) by Cheraghchi, Guruswami, and Velingker [14], who related the problem to a question on the list-decoding rate of random linear codes over finite fields. Interestingly, replacing the \(\log(k\log N)\) term in the bound of [30] by \(\log k\) was crucial for their application. Recently, Bourgain [7] proved a bound of \(O(k \cdot \log k \cdot \log^2 N)\), which is incomparable to those of [14, 30] (and has a worse dependence on ε; see below). We finally mention that the best known lower bound on the number of rows is \(\Omega (k \cdot \log N)\) [5].
1.1 Our Contribution
In this work, we improve the previous bounds and prove the following.
Theorem 1.1 (Simplified)
Let \(M \in \mathbb{C}^{N\times N}\) be a unitary matrix with entries of absolute value \(O(1/\sqrt{N})\), and let ε > 0 be a fixed constant. For some \(q = O(k \cdot \log^2 k \cdot \log N)\), let \(A \in \mathbb{C}^{q\times N}\) be a matrix whose q rows are chosen uniformly and independently from the rows of M, multiplied by \(\sqrt{N/q}\). Then, with high probability, the matrix A satisfies the restricted isometry property of order k with constant ε.
The main idea in our proof is described in Sect. 1.3. We arrived at the proof from our recent work on list-decoding [19], where a baby version of the idea was used to bound the sample complexity of learning the class of Fourier-sparse Boolean functions. Like all previous work on this question, our proof can be seen as a careful union bound applied to a sequence of progressively finer nets, a technique sometimes known as chaining. However, unlike the work of Rudelson and Vershynin [30] and its improvements [14, 15], we avoid the use of Gaussian processes, the “symmetrization process,” and Dudley’s inequality. Instead, we follow and refine Bourgain’s proof [7], and apply the chaining argument directly to the problem at hand using only elementary arguments. It would be interesting to see if our proof can be cast in the Gaussian framework of Rudelson and Vershynin.
We remark that the bounds obtained in the previous works [14, 30] have a multiplicative \(O(\epsilon^{-2})\) term, whereas a much worse term of \(O(\epsilon^{-6})\) was obtained in [7]. In our proof of Theorem 1.1 we nearly obtain the best known dependence on ε. For simplicity of presentation, we first prove in Sect. 3 our bound with a weaker multiplicative term of \(O(\epsilon^{-4})\), and then, in Sect. 4, we modify the analysis and decrease the dependence on ε to \(O(\epsilon^{-2})\) up to logarithmic terms.
1.2 Related Literature
As mentioned before, one important advantage of using subsampled Fourier matrices in compressed sensing is that they support fast, in fact nearly linear time, matrix-vector multiplication. In certain scenarios, however, one is not restricted to using subsampled Fourier matrices as the measurement matrix. The question then is whether one can decrease the number of rows using another measurement matrix, while still keeping the near-linear multiplication time. For \(k < N^{1/2-\gamma}\), where γ > 0 is an arbitrary constant, the answer is yes: a construction with the optimal number \(O(k \cdot \log N)\) of rows follows from works by Ailon and Chazelle [1] and Ailon and Liberty [2] (see [6]). For general k, Nelson, Price, and Wootters [27] suggested taking subsampled Fourier matrices and “tweaking” them by bunching together rows with random signs. Using the Gaussian-process-based analysis of [14, 30] and introducing further techniques from [23], they showed that with this construction one can reduce the number of rows by a logarithmic factor to \(O(k \cdot \log^2(k\log N) \cdot \log N)\) while still keeping the nearly linear multiplication time. Our result shows that the same number of rows (in fact, a slightly smaller number) can be achieved already with the original subsampled Fourier matrices, without having to use the “tweak.” A natural open question is whether the “tweak” from [27] and their techniques can be combined with ours to further reduce the number of rows. An improvement in the regime of parameters of \(k =\omega (\sqrt{N})\) would lead to more efficient low-dimensional embeddings based on Johnson–Lindenstrauss matrices (see, e.g., [1–3, 22, 27]).
1.3 Proof Overview
Recall from Theorem 1.1 and from (1) that our goal is to prove that a matrix A given by a random sample Q of q rows of M satisfies with high probability that for all k-sparse x, \(\|Ax\|_2^2 \approx \|x\|_2^2\). Since M is unitary, the latter is equivalent to saying that \(\|Ax\|_2^2 \approx \|Mx\|_2^2\). Yet another way of expressing this condition is as
\[\mathop{\mathbb{E}}_{j\in Q}\left[\vert (Mx)_j\vert^2\right] \;\approx\; \mathop{\mathbb{E}}_{j\in [N]}\left[\vert (Mx)_j\vert^2\right],\]
i.e., that a sample Q ⊆ [N] of q coordinates of the vector \(\vert Mx\vert^2\) gives a good approximation to the average of all its coordinates. Here, \(\vert Mx\vert^2\) refers to the vector obtained by taking the squared absolute value of Mx coordinate-wise. For reasons that will become clear soon, it will be convenient to assume without loss of generality that \(\|x\|_1 = 1\). With this scaling, the sparsity assumption implies that \(\|Mx\|_2^2\) is not too small (namely at least \(1/k\)), and this will determine the amount of additive error we can afford in the approximation above. This is the only way we use the sparsity assumption.
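Concretely, the claimed lower bound on \(\|Mx\|_2^2\) is a one-line consequence of the Cauchy–Schwarz inequality applied to the at most k nonzero entries of x:

```latex
\[
  \|Mx\|_2^2 \;=\; \|x\|_2^2 \;\ge\; \frac{\|x\|_1^2}{k} \;=\; \frac{1}{k},
\]
```

where the first equality uses that M is unitary and the inequality is \(\|x\|_1 \le \sqrt{k}\cdot \|x\|_2\) for k-sparse x.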
At a high level, the proof proceeds by defining a finite set of vectors \(\mathcal{H}\) that forms a net, i.e., a set satisfying that any vector | Mx | 2 is close to one of the vectors in \(\mathcal{H}\). We then argue using the Chernoff-Hoeffding bound that for any fixed vector \(h \in \mathcal{H}\), a sample of q coordinates gives a good approximation to the average of h. Finally, we complete the proof by a union bound over all \(h \in \mathcal{H}\).
In order to define the set \(\mathcal{H}\) we notice that since ∥ x ∥ 1 = 1, Mx can be seen as a weighted average of the columns of M (possibly with signs). In other words, we can think of Mx as the expectation of a vector-valued random variable given by a certain probability distribution over the columns of M. Using the Chernoff-Hoeffding bound again, this implies that we can approximate Mx well by taking the average over a small number of samples from this distribution. We then let \(\mathcal{H}\) be the set of all possible such averages, and a bound on the cardinality of \(\mathcal{H}\) follows easily (basically N raised to the number of samples). This technique is sometimes referred to as Maurey’s empirical method.
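A toy numerical rendition of Maurey's empirical method (illustrative and not from the paper; a nonnegative real x is used so that x itself is the sampling distribution, whereas the general complex case splits each entry into nonnegative parts):

```python
# Sketch of Maurey's empirical method: approximate Mx, a weighted average of
# the columns of M, by averaging a few columns sampled with probabilities
# x_l / ||x||_1; coordinate-wise error decays like 1/sqrt(#samples).
import numpy as np

rng = np.random.default_rng(2)
N = 128
# DFT matrix with unit-modulus entries (unitary up to scaling).
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)

# A sparse nonnegative vector with ||x||_1 = 1, so x is itself a probability
# distribution over the columns of F.
x = np.zeros(N)
x[rng.choice(N, 8, replace=False)] = rng.random(8)
x /= x.sum()

target = F @ x                  # Mx = expectation of a randomly chosen column

def maurey(s):
    """Average of s columns of F sampled independently with probabilities x."""
    cols = rng.choice(N, size=s, p=x)
    return F[:, cols].mean(axis=1)

coarse = np.max(np.abs(maurey(20) - target))      # few samples: crude
fine = np.max(np.abs(maurey(20000) - target))     # many samples: accurate
```

By the Chernoff-Hoeffding bound, each coordinate of the empirical average concentrates around the corresponding coordinate of Mx, which is exactly the mechanism bounding the net sizes above.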
The argument above is actually oversimplified, and carrying it out leads to rather bad bounds on q. As a result, our proof in Sect. 3 is slightly more delicate. Namely, instead of just one set \(\mathcal{H}\), we have a sequence of sets, \(\mathcal{H}_{1},\mathcal{H}_{2},\ldots\), each being responsible for approximating a different scale of \(\vert Mx\vert^2\). The first set \(\mathcal{H}_{1}\) approximates \(\vert Mx\vert^2\) on coordinates on which its value is highest; since the value is high, we need fewer samples in order to approximate it well, as a result of which the set \(\mathcal{H}_{1}\) is small. The next set \(\mathcal{H}_{2}\) approximates \(\vert Mx\vert^2\) on coordinates on which its value is somewhat smaller, and is therefore a bigger set, and so on. The end result is that any vector \(\vert Mx\vert^2\) can be approximately decomposed into a sum \(\sum_i h^{(i)}\) with \(h^{(i)} \in \mathcal{H}_{i}\). To complete the proof, we argue that a random choice of q coordinates approximates all the vectors in all the \(\mathcal{H}_{i}\) well. The reason working with several \(\mathcal{H}_{i}\) leads to the better bound stated in Theorem 1.1 is this: even though the number of vectors in \(\mathcal{H}_{i}\) grows as i increases, the quality of approximation that we need the q coordinates to provide decreases, since the value of \(\vert Mx\vert^2\) there is small and so errors are less significant. It turns out that these two requirements on q balance each other perfectly, leading to the desired bound on q.
2 Preliminaries
Notation
The notation \(x \approx_{\epsilon,\alpha} y\) means that \(x \in [(1-\epsilon)y - \alpha, (1+\epsilon)y + \alpha]\). For a matrix M, we denote by \(M^{(\ell)}\) the ℓth column of M and define \(\|M\|_\infty = \max_{i,j}\vert M_{i,j}\vert\).
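For experimentation, the relation can be coded directly; this tiny helper is hypothetical and not part of the paper:

```python
def approx(x, y, eps, alpha):
    """x ~_{eps,alpha} y: x lies in [(1 - eps) * y - alpha, (1 + eps) * y + alpha].

    Matches the notation above for real x and nonnegative y, which is the only
    case in which the relation is used in this paper.
    """
    return (1 - eps) * y - alpha <= x <= (1 + eps) * y + alpha
```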
The Restricted Isometry Property The restricted isometry property is defined as follows.
Definition 2.1
We say that a matrix \(A \in \mathbb{C}^{q\times N}\) satisfies the restricted isometry property of order k with constant ε if for every k-sparse vector \(x \in \mathbb{C}^{N}\) it holds that
\[(1-\epsilon)\cdot \|x\|_2^2 \;\le\; \|Ax\|_2^2 \;\le\; (1+\epsilon)\cdot \|x\|_2^2.\]
Chernoff-Hoeffding Bounds We now state the Chernoff-Hoeffding bound (see, e.g., [24]) and derive several simple corollaries that will be used extensively later.
Theorem 2.2
Let \(X_1,\ldots,X_N\) be N identically distributed independent random variables in \([0,a]\) satisfying \(\mathop{\mathbb{E}}[X_{i}] =\mu\) for all i, and denote \(\overline{X} = \frac{1}{N}\cdot\sum_{i=1}^{N}X_{i}\). Then there exists a universal constant C such that for every \(0 <\epsilon \le 1/2\), the probability that \(\overline{X} \approx _{\epsilon,0}\mu\) is at least \(1 - 2e^{-C\cdot N\mu \epsilon ^{2}/a }\).
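A quick numerical sanity check of this bound (illustrative parameters, not from the paper): for Uniform[0, 1] variables we have a = 1 and μ = 1/2, and the sample mean of N = 1000 draws should essentially never deviate from μ by more than εμ for ε = 0.1.

```python
# Empirical check of Chernoff-Hoeffding concentration for the sample mean of
# bounded i.i.d. variables (Uniform[0, 1], so a = 1 and mu = 1/2).
import numpy as np

rng = np.random.default_rng(3)
N, trials, eps = 1000, 200, 0.1
mu = 0.5                                   # mean of Uniform[0, 1]
# Each row is one experiment: the average of N independent draws.
means = rng.random((trials, N)).mean(axis=1)
worst = np.max(np.abs(means - mu))         # worst deviation over all trials
```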
Corollary 2.3
Let \(X_1,\ldots,X_N\) be N identically distributed independent random variables in \([0,a]\) satisfying \(\mathop{\mathbb{E}}[X_{i}] =\mu\) for all i, and denote \(\overline{X} = \frac{1}{N}\cdot\sum_{i=1}^{N}X_{i}\). Then there exists a universal constant C such that for every \(0 <\epsilon \le 1/2\) and α > 0, the probability that \(\overline{X} \approx _{\epsilon,\alpha }\mu\) is at least \(1 - 2e^{-C\cdot N\alpha \epsilon /a}\).
Proof
If \(\mu \geq \frac{\alpha }{\epsilon }\), then by Theorem 2.2 the probability that \(\overline{X} \approx _{\epsilon,0}\mu\) is at least \(1 - 2e^{-C\cdot N\mu \epsilon ^{2}/a }\), which is at least \(1 - 2e^{-C\cdot N\alpha \epsilon /a}\). Otherwise, Theorem 2.2 applied with \(\tilde{\epsilon }= \frac{\alpha }{\mu } >\epsilon\) implies that the probability that \(\overline{X} \approx _{\tilde{\epsilon },0}\mu\), hence \(\overline{X} \approx _{0,\alpha }\mu\), is at least \(1 - 2e^{-C\cdot N\mu \tilde{\epsilon }^{2}/a }\), and the latter is at least \(1 - 2e^{-C\cdot N\alpha \epsilon /a}\). ■
Corollary 2.4
Let \(X_1,\ldots,X_N\) be N identically distributed independent random variables in \([-a,+a]\) satisfying \(\mathop{\mathbb{E}}[X_{i}] =\mu\) and \(\mathop{\mathbb{E}}[\vert X_{i}\vert ] =\tilde{\mu }\) for all i, and denote \(\overline{X} = \frac{1}{N}\cdot\sum_{i=1}^{N}X_{i}\). Then there exists a universal constant C such that for every \(0 <\epsilon' \le 1/2\) and α > 0, the probability that \(\overline{X} \approx _{0,\epsilon ^{{\prime}}\cdot \tilde{\mu }+\alpha }\mu\) is at least \(1 - 4e^{-C\cdot N\alpha \epsilon ^{{\prime}}/a }\).
Proof
The corollary follows by applying Corollary 2.3 to \(\max(X_i,0)\) and to \(-\min(X_i,0)\). ■
We end with the additive form of the bound, followed by an easy extension to the complex case.
Corollary 2.5
Let \(X_1,\ldots,X_N\) be N identically distributed independent random variables in \([-a,+a]\) satisfying \(\mathop{\mathbb{E}}[X_{i}] =\mu\) for all i, and denote \(\overline{X} = \frac{1}{N}\cdot\sum_{i=1}^{N}X_{i}\). Then there exists a universal constant C such that for every b > 0, the probability that \(\overline{X} \approx _{0,b}\mu\) is at least \(1 - 4e^{-C\cdot Nb^{2}/a^{2} }\).
Proof
We can assume that b ≤ 2a. The corollary follows by applying Corollary 2.4 with, say, \(\alpha = 3b/4\) and \(\epsilon' = b/(4a)\). ■
Corollary 2.6
Let \(X_1,\ldots,X_N\) be N identically distributed independent complex-valued random variables satisfying \(\vert X_i\vert \le a\) and \(\mathop{\mathbb{E}}[X_{i}] =\mu\) for all i, and denote \(\overline{X} = \frac{1}{N}\cdot\sum_{i=1}^{N}X_{i}\). Then there exists a universal constant C such that for every b > 0, the probability that \(\vert \overline{X}\vert \approx _{0,b}\vert \mu \vert\) is at least \(1 - 8e^{-C\cdot Nb^{2}/a^{2} }\).
Proof
By Corollary 2.5, applied to the real and imaginary parts of the random variables \(X_1,\ldots,X_N\), it follows that for a universal constant C, the probability that \(\mathsf{Re}(\overline{X}) \approx _{0,b/\sqrt{2}}\mathsf{Re}(\mu )\) and \(\mathsf{Im}(\overline{X}) \approx _{0,b/\sqrt{2}}\mathsf{Im}(\mu )\) is at least \(1 - 8e^{-C\cdot Nb^{2}/a^{2} }\). By the triangle inequality, it follows that with such probability we have \(\vert \overline{X}\vert \approx _{0,b}\vert \mu \vert\), as required. ■
3 The Simpler Analysis
In this section we prove our result with a multiplicative term of \(O(\epsilon^{-4})\) in the bound. This will be obtained in Theorem 3.7 as an easy corollary of the following theorem.
Theorem 3.1
For a sufficiently large N, a matrix \(M \in \mathbb{C}^{N\times N}\), and sufficiently small ε, η > 0, the following holds. For some \(q = O(\epsilon^{-3}\eta^{-1}\log N \cdot \log^2(1/\eta))\), let Q be a multiset of q uniform and independent random elements of [N]. Then, with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (1/\eta )) }\), it holds that for every \(x \in \mathbb{C}^{N}\),
\[\mathop{\mathbb{E}}_{j\in Q}\left[\vert (Mx)_j\vert^2\right] \;\approx_{\epsilon,\;\eta\cdot \|M\|_\infty^2\cdot \|x\|_1^2}\; \mathop{\mathbb{E}}_{j\in [N]}\left[\vert (Mx)_j\vert^2\right].\]
Throughout the proof we assume without loss of generality that the matrix \(M \in \mathbb{C}^{N\times N}\) satisfies \(\|M\|_\infty = 1\). For ε, η > 0, we denote \(t = \log_2(1/\eta)\), \(r = \log_2(1/\epsilon^2)\), and \(\gamma = \eta/(2t)\).
We now define the approximating vector sets \(\mathcal{H}_{i}\), i = 1, …, t, each responsible for coordinates of \(\vert Mx\vert^2\) of a different scale (the larger the i, the smaller the scale). We start by defining the “raw approximations” \(\mathcal{G}_{i}\), which are essentially vectors obtained by averaging a certain number of columns of M. We then define the vectors in \(\mathcal{H}_{i}\) by restricting the vectors in \(\mathcal{G}_{i}\) (actually \(\mathcal{G}_{i+r}\)) to the set of coordinates \(B_i\) where there is a clear “signal” and not just noise. This is necessary in order to make sure that the small coordinates of \(\vert Mx\vert^2\) are not flooded by noise from the coarse approximations. Details follow.
The Vector Sets \(\mathcal{G}_{i}\) For every 1 ≤ i ≤ t + r, let \(\mathcal{G}_{i}\) denote the set of all vectors \(g^{(i)} \in \mathbb{C}^{N}\) that can be represented as
for a multiset F of \(O(2^i \cdot \log(1/\gamma))\) pairs in \([N] \times \{0,1,2,3\}\). A trivial counting argument gives the following.
Claim 3.2
For every 1 ≤ i ≤ t + r, \(\vert \mathcal{G}_{i}\vert \leq N^{O(2^{i}\cdot \log (1/\gamma )) }.\)
The Vector Sets \(\mathcal{H}_{i}\) For a t-tuple of vectors \((g^{(1+r)},\ldots,g^{(t+r)}) \in \mathcal{G}_{1+r} \times \cdots \times \mathcal{G}_{t+r}\) and for 1 ≤ i ≤ t, let \(B_i\) be the set of all j ∈ [N] for which i is the smallest index satisfying \(\vert g_j^{(i+r)}\vert \ge 2\cdot 2^{-i/2}\). For such i, define the vector \(h^{(i)}\) by
\[h_j^{(i)} = \min\left\{\vert g_j^{(i+r)}\vert^2,\; 9\cdot 2^{-i}\right\} \text{ for } j \in B_i, \qquad h_j^{(i)} = 0 \text{ otherwise.} \tag{3}\]
Let \(\mathcal{H}_{i}\) be the set of all vectors h (i) that can be obtained in this way.
Claim 3.3
For every 1 ≤ i ≤ t, \(\vert \mathcal{H}_{i}\vert \leq N^{O(\epsilon ^{-2}\cdot 2^{i}\cdot \log (1/\gamma )) }.\)
Proof
Observe that every \(h^{(i)} \in \mathcal{H}_{i}\) is fully defined by some \((g^{(1+r)},\ldots,g^{(i+r)}) \in \mathcal{G}_{1+r} \times \cdots \times \mathcal{G}_{i+r}\). Hence
Using the definition of r, the claim follows.■
Lemma 3.4
For every \(\tilde{\eta }> 0\) and some \(q = O(\epsilon ^{-3}\tilde{\eta }^{-1}\log N \cdot \log (1/\gamma ))\), let Q be a multiset of q uniform and independent random elements of [N]. Then, with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (1/\gamma )) }\), it holds that for all 1 ≤ i ≤ t and \(h^{(i)} \in \mathcal{H}_{i}\),
\[\mathop{\mathbb{E}}_{j\in Q}\left[h_j^{(i)}\right] \;\approx_{\epsilon,\tilde{\eta}}\; \mathop{\mathbb{E}}_{j\in [N]}\left[h_j^{(i)}\right].\]
Proof
Fix 1 ≤ i ≤ t and a vector \(h^{(i)} \in \mathcal{H}_{i}\), and denote \(\mu =\mathop{ \mathbb{E}} _{j\in [N]}[h_{j}^{(i)}]\). By Corollary 2.3, applied with \(\alpha =\tilde{\eta }\) and \(a = 9\cdot 2^{-i}\) (recall that \(h_j^{(i)} \le a\) for every j), with probability \(1 - 2^{-\Omega (2^{i}\cdot q\epsilon \tilde{\eta }) }\), it holds that \(\mathop{\mathbb{E}} _{j\in Q}[h_{j}^{(i)}] \approx _{\epsilon,\tilde{\eta }}\mu\). Using Claim 3.3, the union bound over all the vectors in \(\mathcal{H}_{i}\) implies that the probability that some \(h^{(i)} \in \mathcal{H}_{i}\) does not satisfy \(\mathop{\mathbb{E}} _{j\in Q}[h_{j}^{(i)}] \approx _{\epsilon,\tilde{\eta }}\mu\) is at most
\[N^{O(\epsilon^{-2}\cdot 2^{i}\cdot \log (1/\gamma ))} \cdot 2^{-\Omega (2^{i}\cdot q\epsilon \tilde{\eta })} \;\le\; 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (1/\gamma ))},\]
where the inequality holds for the stated choice of q.
We complete the proof by a union bound over i. ■
Approximating the Vectors Mx
Lemma 3.5
For every vector \(x \in \mathbb{C}^{N}\) with ∥x∥ 1 = 1, every multiset Q ⊆ [N], and every 1 ≤ i ≤ t + r, there exists a vector \(g \in \mathcal{G}_{i}\) that satisfies \(\vert (Mx)_{j}\vert \approx _{0,2^{-i/2}}\vert g_{j}\vert\) for all but at most γ fraction of j ∈ [N] and for all but at most γ fraction of j ∈ Q.
Proof
Observe that for every ℓ ∈ [N] there exist \(p_{\ell,0}, p_{\ell,1}, p_{\ell,2}, p_{\ell,3} \ge 0\) that satisfy
Notice that the assumption \(\|x\|_1 = 1\) implies that the numbers \(p_{\ell,s}\) form a probability distribution. Thus, the vector Mx can be represented as
where D is the distribution that assigns probability \(p_{\ell,s}\) to the pair (ℓ, s).
Let F be a multiset of \(O(2^i \cdot \log(1/\gamma))\) independent random samples from D, and let \(g \in \mathcal{G}_{i}\) be the vector corresponding to F as in (2). By Corollary 2.6, applied with \(a = \sqrt{2}\) (recall that \(\|M\|_\infty = 1\)) and \(b = 2^{-i/2}\), for every j ∈ [N] the probability that
is at least 1 −γ∕4. It follows that the expected number of j ∈ [N] that do not satisfy (4) is at most γ N∕4, so by Markov’s inequality the probability that the number of j ∈ [N] that do not satisfy (4) is at most γ N is at least 3∕4. Similarly, the expected number of j ∈ Q that do not satisfy (4) is at most γ | Q | ∕4, so by Markov’s inequality, with probability at least 3∕4 it holds that the number of j ∈ Q that do not satisfy (4) is at most γ | Q | . It follows that there exists a vector \(g \in \mathcal{G}_{i}\) for which (4) holds for all but at most γ fraction of j ∈ [N] and for all but at most γ fraction of j ∈ Q, as required. ■
Lemma 3.6
For every multiset Q ⊆ [N] and every vector \(x \in \mathbb{C}^{N}\) with ∥x∥ 1 = 1 there exists a t-tuple of vectors \((h^{(1)},\ldots,h^{(t)}) \in \mathcal{H}_{1} \times \cdots \times \mathcal{H}_{t}\) for which
-
1.
\(\mathop{\mathbb{E}} _{j\in Q}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in Q}\left [\sum _{i=1}^{t}h_{j}^{(i)}\right ]\) and
-
2.
\(\mathop{\mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in [N]}\left [\sum _{i=1}^{t}h_{j}^{(i)}\right ]\) .
Proof
By Lemma 3.5, for every 1 ≤ i ≤ t there exists a vector \(g^{(i+r)} \in \mathcal{G}_{i+r}\) that satisfies
\[\vert (Mx)_j\vert \;\approx_{0,\,\epsilon\cdot 2^{-i/2}}\; \vert g_j^{(i+r)}\vert \tag{5}\]
for all but at most γ fraction of j ∈ [N] and for all but at most γ fraction of j ∈ Q. We say that j ∈ [N] is good if (5) holds for every 1 ≤ i ≤ t, and otherwise that it is bad. Notice that all but at most \(t\gamma\) fraction of j ∈ [N] are good and that all but at most \(t\gamma\) fraction of j ∈ Q are good. Let \((h^{(1)},\ldots,h^{(t)})\) and \((B_1,\ldots,B_t)\) be the vectors and sets associated with \((g^{(1+r)},\ldots,g^{(t+r)})\) as defined in (3). We claim that \(h^{(1)},\ldots,h^{(t)}\) satisfy the requirements of the lemma.
We first show that for every good j it holds that \(\vert (Mx)_j\vert^2 \approx_{3\epsilon,9\eta} \sum_{i=1}^{t} h_j^{(i)}\). To obtain it, we observe that if \(j \in B_i\) for some i, then
The lower bound follows simply from the definition of \(B_i\). For the upper bound, which trivially holds for i = 1, assume that i ≥ 2, and notice that the definition of \(B_i\) implies that \(\vert g_j^{(i+r-1)}\vert < 2\cdot 2^{-(i-1)/2}\). Using (5), and assuming that ε is sufficiently small, we obtain that
Hence, by the upper bound in (6), for a good \(j \in B_i\) we have \(h_j^{(i)} = \vert g_j^{(i+r)}\vert^2\) and \(h_{j}^{(i')} = 0\) for \(i' \neq i\). Observe that by the lower bound in (6),
and that this implies that \(\vert (Mx)_{j}\vert ^{2} \approx _{3\epsilon,0}\sum _{i=1}^{t}h_{j}^{(i)}\). On the other hand, in case that j is good but does not belong to any \(B_i\), recalling that \(t = \log_2(1/\eta)\), it follows that
and thus \(\vert (Mx)_{j}\vert ^{2} \approx _{0,9\eta }0 =\sum _{ i=1}^{t}h_{j}^{(i)}\).
Finally, for every bad j we have
Since at most \(t\gamma\) fraction of the elements in [N] and in Q are bad, their effect on the difference between the expectations in the lemma can be bounded by \(2t\gamma\). By our choice of γ, this is η, completing the proof of the lemma. ■
Finally, we are ready to prove Theorem 3.1.
Proof of Theorem 3.1
By Lemma 3.4, applied with \(\tilde{\eta }=\eta /(2t)\), a random multiset Q of size
\[q = O\left(\epsilon^{-3}\tilde{\eta}^{-1}\log N \cdot \log(1/\gamma)\right) = O\left(\epsilon^{-3}\eta^{-1}\log N \cdot \log^2(1/\eta)\right)\]
satisfies with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (1/\eta )) }\) that for all 1 ≤ i ≤ t and \(h^{(i)} \in \mathcal{H}_{i}\),
in which case we also have
We show that a Q with the above property satisfies the requirement of the theorem. Let \(x \in \mathbb{C}^{N}\) be a vector, and assume without loss of generality that \(\|x\|_1 = 1\). By Lemma 3.6, there exists a t-tuple of vectors \((h^{(1)},\ldots,h^{(t)}) \in \mathcal{H}_{1} \times \cdots \times \mathcal{H}_{t}\) satisfying Items 1 and 2 there. As a result,
and we are done. ■
3.1 The Restricted Isometry Property
Equipped with Theorem 3.1, it is easy to derive our result on the restricted isometry property (see Definition 2.1) of random sub-matrices of unitary matrices.
Theorem 3.7
For sufficiently large N and k, a unitary matrix \(M \in \mathbb{C}^{N\times N}\) satisfying \(\|M\|_{\infty }\leq O(1/\sqrt{N})\), and a sufficiently small ε > 0, the following holds. For some \(q = O(\epsilon^{-4}\cdot k\cdot \log^2(k/\epsilon)\cdot \log N)\), let \(A \in \mathbb{C}^{q\times N}\) be a matrix whose q rows are chosen uniformly and independently from the rows of M, multiplied by \(\sqrt{N/q}\). Then, with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (k/\epsilon )) }\), the matrix A satisfies the restricted isometry property of order k with constant ε.
Proof
Let Q be a multiset of q uniform and independent random elements of [N], defining a matrix A as above. Notice that by the Cauchy-Schwarz inequality, any k-sparse vector \(x \in \mathbb{C}^{N}\) with \(\|x\|_2 = 1\) satisfies \(\|x\|_{1} \leq \sqrt{k}\). Applying Theorem 3.1 with \(\epsilon/2\) and some \(\eta = \Omega (\epsilon /k)\), we get that with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (k/\epsilon )) }\), it holds that for every \(x \in \mathbb{C}^{N}\) with \(\|x\|_2 = 1\),
By homogeneity, it follows that every k-sparse vector \(x \in \mathbb{C}^{N}\) satisfies \(\|Ax\|_2^2 \approx_{\epsilon,0} \|x\|_2^2\), hence A satisfies the restricted isometry property of order k with constant ε. ■
4 The Improved Analysis
In this section we prove the following theorem, which improves the bound of Theorem 3.1 in terms of the dependence on ε.
Theorem 4.1
For a sufficiently large N, a matrix \(M \in \mathbb{C}^{N\times N}\), and sufficiently small ε, η > 0, the following holds. For some \(q = O(\log^2(1/\epsilon)\cdot \epsilon^{-1}\eta^{-1}\log N \cdot \log^2(1/\eta))\), let Q be a multiset of q uniform and independent random elements of [N]. Then, with probability \(1 - 2^{-\Omega (\log N\cdot \log (1/\eta ))}\), it holds that for every \(x \in \mathbb{C}^{N}\),
\[\mathop{\mathbb{E}}_{j\in Q}\left[\vert (Mx)_j\vert^2\right] \;\approx_{\epsilon,\;\eta\cdot \|M\|_\infty^2\cdot \|x\|_1^2}\; \mathop{\mathbb{E}}_{j\in [N]}\left[\vert (Mx)_j\vert^2\right]. \tag{7}\]
We can assume that ε ≥ η, as otherwise one can apply the theorem with parameters η∕2, η∕2 and derive (7) for ε, η as well (because the right-hand side is bounded from above by \(\|x\|_1^2\cdot \|M\|_\infty^2\)). As before, we assume without loss of generality that \(\|M\|_\infty = 1\). For ε ≥ η > 0, we define \(t = \log_2(1/\eta)\) and \(r = \log_2(1/\epsilon^2)\). For the analysis given in this section, we define \(\gamma = \eta/(60(t+r))\). Throughout the proof, we use the vector sets \(\mathcal{G}_{i}\) from Sect. 3 and Lemma 3.5 for this value of γ.
The Vector Sets \(\mathcal{D}_{i,m}\) For a (t + r)-tuple of vectors \((g^{(1)},\ldots,g^{(t+r)}) \in \mathcal{G}_{1} \times \cdots \times \mathcal{G}_{t+r}\) and for 1 ≤ i ≤ t, let \(C_i\) be the set of all j ∈ [N] for which i is the smallest index satisfying \(\vert g_j^{(i)}\vert \ge 2\cdot 2^{-i/2}\). For m = i, …, i + r define the vector \(h^{(i,m)}\) by
and for other values of m define \(h^{(i,m)} = 0\). Now, for every m, let \(\Delta ^{(i,m)}\) be the vector defined by
\[\Delta^{(i,m)} = h^{(i,m)} - h^{(i,m-1)}. \tag{9}\]
Note that the support of \(\Delta ^{(i,m)}\) is contained in C i . Let \(\mathcal{D}_{i,m}\) be the set of all vectors \(\Delta ^{(i,m)}\) that can be obtained in this way.
Claim 4.2
For every 1 ≤ i ≤ t and i ≤ m ≤ i + r, \(\vert \mathcal{D}_{i,m}\vert \leq N^{O(2^{m}\cdot \log (1/\gamma )) }.\)
Proof
Observe that every vector in \(\mathcal{D}_{i,m}\) is fully defined by some \((g^{(1)},\ldots,g^{(m)}) \in \mathcal{G}_{1} \times \cdots \times \mathcal{G}_{m}\). Hence
and the claim follows. ■
Lemma 4.3
For every \(\tilde{\varepsilon },\tilde{\eta }> 0\) and some \(q = O(\tilde{\varepsilon }^{-1}\tilde{\eta }^{-1}\log N \cdot \log (1/\gamma ))\), let Q be a multiset of q uniform and independent random elements of [N]. Then, with probability \(1 - 2^{-\Omega (\log N\cdot \log (1/\gamma ))}\), it holds that for every 1 ≤ i ≤ t, m, and a vector \(\Delta ^{(i,m)} \in \mathcal{D}_{i,m}\) associated with a set \(C_i\),
Proof
Fix i, m, and a vector \(\Delta ^{(i,m)} \in \mathcal{D}_{i,m}\) associated with a set C i as in (9). Notice that
By Corollary 2.4, applied with
we have that (10) holds with probability \(1 - 2^{-\Omega (2^{m}\cdot q\tilde{\varepsilon }\tilde{\eta }) }\). Using Claim 4.2, the union bound over all the vectors in \(\mathcal{D}_{i,m}\) implies that the probability that some \(\Delta ^{(i,m)} \in \mathcal{D}_{i,m}\) does not satisfy (10) is at most
The result follows by a union bound over i and m. ■
Approximating the Vectors Mx
Lemma 4.4
For every multiset Q ⊆ [N] and every vector \(x \in \mathbb{C}^{N}\) with ∥x∥ 1 = 1 there exist vector collections \((\Delta ^{(i,m)} \in \mathcal{D}_{i,m})_{m=i,\ldots,i+r}\) associated with sets C i (1 ≤ i ≤ t), for which
-
1.
\(\mathop{\mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ] \geq \sum _{i=1}^{t}2^{-i} \cdot \frac{\vert C_{i}\vert } {N} -\eta,\)
-
2.
\(\mathop{\mathbb{E}} _{j\in Q}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in Q}\left [\sum _{i=1}^{t}\sum _{m=i}^{i+r}\Delta _{j}^{(i,m)}\right ],\) and
-
3.
\(\mathop{\mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in [N]}\left [\sum _{i=1}^{t}\sum _{m=i}^{i+r}\Delta _{j}^{(i,m)}\right ].\)
Proof
By Lemma 3.5, for every 1 ≤ i ≤ t + r there exists a vector \(g^{(i)} \in \mathcal{G}_{i}\) that satisfies
\[\vert (Mx)_j\vert \;\approx_{0,\,2^{-i/2}}\; \vert g_j^{(i)}\vert \tag{11}\]
for all but at most γ fraction of j ∈ [N] and for all but at most γ fraction of j ∈ Q. We say that j ∈ [N] is good if (11) holds for every i, and otherwise that it is bad. Notice that all but at most \((t+r)\gamma\) fraction of j ∈ [N] are good and that all but at most \((t+r)\gamma\) fraction of j ∈ Q are good. Consider the sets \(C_i\) and vectors \(h^{(i,m)},\Delta ^{(i,m)}\) associated with \((g^{(1)},\ldots,g^{(t+r)})\) as defined in (8). We claim that the \(\Delta ^{(i,m)}\) satisfy the requirements of the lemma.
Fix some 1 ≤ i ≤ t. For every good \(j \in C_i\), the definition of \(C_i\) implies that \(\vert g_j^{(i)}\vert \ge 2\cdot 2^{-i/2}\), so using (11) it follows that
We also claim that \(\vert (Mx)_j\vert \le 3\cdot 2^{-(i-1)/2}\). This trivially holds for i = 1, so assume that i ≥ 2, and notice that the definition of \(C_i\) implies that \(\vert g_j^{(i-1)}\vert < 2\cdot 2^{-(i-1)/2}\), so using (11), it follows that
Since at most \((t+r)\gamma\) fraction of j ∈ [N] are bad, (12) yields that
as required for Item 1.
Next, we claim that every good j satisfies
\[\vert (Mx)_j\vert^2 \;\approx_{O(\epsilon),\,9\eta}\; \sum_{i=1}^{t} h_j^{(i,i+r)}. \tag{14}\]
For a good \(j \in C_i\) and m ≥ i,
where the first inequality follows from (11) and the second from (13). In particular, for m = i + r (recall that \(r = \log_2(1/\epsilon^2)\)), we have
and thus \(\vert (Mx)_{j}\vert ^{2} \approx _{O(\epsilon ),0}h_{j}^{(i,i+r)}\). Since every good j belongs to at most one of the sets C i , for every good \(j \in \bigcup C_{i}\) we have \(\vert (Mx)_{j}\vert ^{2} \approx _{O(\epsilon ),0}\sum _{i=1}^{t}h_{j}^{(i,i+r)}\). On the other hand, if j is good but does not belong to any C i , by our choice of t, it satisfies
and thus \(\vert (Mx)_{j}\vert ^{2} \approx _{0,9\eta }0 =\sum _{ i=1}^{t}h_{j}^{(i,i+r)}\). This establishes that (14) holds for every good j.
Next, we claim that for every good j,
This follows since for every 1 ≤ i ≤ t, the vector \(h^{(i,i+r)}\) can be written as the telescoping sum
\[h^{(i,i+r)} = \sum_{m=i}^{i+r}\left(h^{(i,m)} - h^{(i,m-1)}\right) = \sum_{m=i}^{i+r}\Delta^{(i,m)},\]
where we used that \(h^{(i,i-1)} = 0\). We claim that for every good j, these differences satisfy
thus establishing that (16) holds for every good j. Indeed, for \(m \geq i + 1\), (15) implies that
and for m = i it follows from (11) combined with (13).
Finally, for every bad j we have
Since at most a (t + r)γ fraction of the elements in [N] and in Q are bad, their effect on the difference between the expectations in Items 2 and 3 can be bounded by 60(t + r)γ. By our choice of γ, this is at most η, as required. ■
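The dyadic level-set decomposition underlying the proof above, which partitions the coordinates of Mx into buckets \(C_{i}\) by magnitude using thresholds of the form \(2 \cdot 2^{-i/2}\), can be illustrated numerically. The sketch below is not part of the original argument: the vector `y` stands in for Mx, and the helper `dyadic_buckets` is a hypothetical name mirroring the sets \(C_{i}\).

```python
import numpy as np

def dyadic_buckets(y, t):
    """Partition coordinates of y into dyadic level sets by magnitude.

    Bucket i (1-indexed) collects the coordinates j with
    2 * 2^(-i/2) <= |y_j| < 2 * 2^(-(i-1)/2), mirroring the sets C_i in
    the proof; coordinates below the smallest threshold are left out,
    just as the proof discards coordinates too small to matter.
    """
    buckets = {}
    for i in range(1, t + 1):
        lo = 2 * 2 ** (-i / 2)
        hi = 2 * 2 ** (-(i - 1) / 2) if i > 1 else np.inf
        buckets[i] = np.where((np.abs(y) >= lo) & (np.abs(y) < hi))[0]
    return buckets

rng = np.random.default_rng(0)
y = rng.standard_normal(1000) / 10  # toy vector playing the role of Mx
buckets = dyadic_buckets(y, t=20)

# Each coordinate lands in at most one bucket, and for large enough t the
# buckets capture essentially all of the squared mass of y.
all_idx = np.concatenate([buckets[i] for i in buckets])
assert len(all_idx) == len(set(all_idx.tolist()))
captured = sum((y[buckets[i]] ** 2).sum() for i in buckets)
residual = (y ** 2).sum() - captured
```

The leftover mass `residual` comes only from coordinates with \(\vert y_{j}\vert < 2 \cdot 2^{-t/2}\), which is why choosing t large enough makes the discarded contribution negligible, as in the proof.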
Finally, we are ready to prove Theorem 4.1.
Proof of Theorem 4.1
Recall that it can be assumed that ε ≥ η. By Lemma 4.3, applied with \(\tilde{\varepsilon }=\epsilon /r\) and \(\tilde{\eta }=\eta /(rt)\), a random multiset Q of size
satisfies, with probability \(1 - 2^{-\Omega (\log N\cdot \log (1/\eta ))}\), that for every \(1 \leq i \leq t\), every m, and every \(\Delta ^{(i,m)} \in \mathcal{D}_{i,m}\) associated with a set \(C_{i}\),
in which case we also have
We show that a Q with the above property satisfies the requirement of the theorem. Let \(x \in \mathbb{C}^{N}\) be a vector, and assume without loss of generality that \(\|x\|_{1} = 1\). By Lemma 4.4, there exist vector collections \((\Delta ^{(i,m)} \in \mathcal{D}_{i,m})_{m=i,\ldots,i+r}\) associated with sets \(C_{i}\) (\(1 \leq i \leq t\)), satisfying Items 1, 2, and 3 there. Combined with (18), this gives
and we are done. ■
4.1 The Restricted Isometry Property
It is now easy to derive the following theorem. The proof is essentially identical to that of Theorem 3.7, using Theorem 4.1 instead of Theorem 3.1.
Theorem 4.5
For sufficiently large N and k, a unitary matrix \(M \in \mathbb{C}^{N\times N}\) satisfying \(\|M\|_{\infty }\leq O(1/\sqrt{N})\), and a sufficiently small ε > 0, the following holds. For some \(q = O(\log ^{2}(1/\epsilon )\cdot \epsilon ^{-2}\cdot k\cdot \log ^{2}(k/\epsilon )\cdot \log N)\), let \(A \in \mathbb{C}^{q\times N}\) be a matrix whose q rows are chosen uniformly and independently from the rows of M, multiplied by \(\sqrt{N/q}\). Then, with probability \(1 - 2^{-\Omega (\log N\cdot \log (k/\epsilon ))}\), the matrix A satisfies the restricted isometry property of order k with constant ε.
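As a numerical illustration of the statement above (a sketch, not part of the original text), one can subsample rows of the unitary DFT matrix, which satisfies \(\|M\|_{\infty } = 1/\sqrt{N}\), rescale by \(\sqrt{N/q}\), and check that the ℓ2 norm of a random k-sparse vector is roughly preserved. The concrete values of N, k, and q below are illustrative only and are far from the theorem's asymptotic regime.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k, q = 512, 5, 200  # illustrative sizes, not the theorem's bound

# Unitary DFT matrix: entries have magnitude 1/sqrt(N).
F = np.fft.fft(np.eye(N)) / np.sqrt(N)

# Sample q rows uniformly and independently, rescaled by sqrt(N/q).
rows = rng.choice(N, size=q, replace=True)
A = np.sqrt(N / q) * F[rows]

# Random k-sparse unit vector.
x = np.zeros(N, dtype=complex)
support = rng.choice(N, size=k, replace=False)
x[support] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
x /= np.linalg.norm(x)

# For a matrix with the restricted isometry property, this ratio lies
# in [1 - eps, 1 + eps]; here we only check it for one random x.
ratio = np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2
```

Note that checking one random sparse x only tests the weaker "for each x" guarantee discussed in Note 3; the restricted isometry property requires the bound simultaneously for all k-sparse vectors.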
Notes
- 1.
A preliminary version appeared in Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2016, pages 288–297.
- 2.
- 3.
The result in [19] is weaker in two main respects. First, it is restricted to the case that Ax is in \(\{0,1\}^{q}\). This significantly simplifies the analysis and leads to a better bound on the number of rows of A. Second, the order of quantifiers is switched: it shows that for any fixed sparse x, a random subsampled A works with high probability, whereas for the restricted isometry property we need to show that a random A works for all sparse x simultaneously.
References
N. Ailon, B. Chazelle, The fast Johnson–Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput. 39 (1), 302–322 (2009). Preliminary version in STOC’06
N. Ailon, E. Liberty, Fast dimension reduction using Rademacher series on dual BCH codes. Discrete Comput. Geom. 42 (4), 615–630 (2009). Preliminary version in SODA’08
N. Ailon, E. Liberty, An almost optimal unrestricted fast Johnson–Lindenstrauss transform. ACM Trans. Algorithms 9 (3), 21 (2013). Preliminary version in SODA’11
A.S. Bandeira, E. Dobriban, D.G. Mixon, W.F. Sawin, Certifying the restricted isometry property is hard. IEEE Trans. Inform. Theory 59 (6), 3448–3450 (2013)
A.S. Bandeira, M.E. Lewis, D.G. Mixon, Discrete uncertainty principles and sparse signal processing. CoRR abs/1504.01014 (2015)
R. Baraniuk, M. Davenport, R. DeVore, M. Wakin, A simple proof of the restricted isometry property for random matrices. Constr. Approx. 28 (3), 253–263 (2008)
J. Bourgain, An improved estimate in the restricted isometry problem, in Geometric Aspects of Functional Analysis. Lecture Notes in Mathematics, vol. 2116, pp. 65–70 (Springer, Berlin, 2014)
E.J. Candès, The restricted isometry property and its implications for compressed sensing. C. R. Math. 346 (9–10), 589–592 (2008)
E.J. Candès, T. Tao, Decoding by linear programming. IEEE Trans. Inform. Theory 51 (12), 4203–4215 (2005)
E.J. Candès, T. Tao, Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inform. Theory 52 (12), 5406–5425 (2006)
E.J. Candès, M. Rudelson, T. Tao, R. Vershynin, Error correction via linear programming, in 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS, pp. 295–308 (2005)
E.J. Candès, J.K. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59 (8), 1207–1223 (2006)
S.S. Chen, D.L. Donoho, M.A. Saunders, Atomic decomposition by basis pursuit. SIAM J. Comput. 20 (1), 33–61 (1998)
M. Cheraghchi, V. Guruswami, A. Velingker, Restricted isometry of Fourier matrices and list decodability of random linear codes. SIAM J. Comput. 42 (5), 1888–1914 (2013). Preliminary version in SODA’13
S. Dirksen, Tail bounds via generic chaining. Electron. J. Prob. 20 (53), 1–29 (2015)
D.L. Donoho, M. Elad, V.N. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 (1), 6–18 (2006)
S. Foucart, A. Pajor, H. Rauhut, T. Ullrich, The Gelfand widths of ℓp-balls for 0 < p ≤ 1. J. Complex. 26 (6), 629–640 (2010)
A.Y. Garnaev, E.D. Gluskin, On the widths of Euclidean balls. Sov. Math. Dokl. 30, 200–203 (1984)
I. Haviv, O. Regev, The list-decoding size of Fourier-sparse boolean functions, in Proceedings of the 30th Conference on Computational Complexity, CCC, pp. 58–71 (2015)
P. Indyk, I. Razenshteyn, On model-based RIP-1 matrices, in Automata, Languages, and Programming - 40th International Colloquium, ICALP, pp. 564–575 (2013)
A.C. Kak, M. Slaney, Principles of Computerized Tomographic Imaging (Society of Industrial and Applied Mathematics, Philadelphia, 2001)
F. Krahmer, R. Ward, New and improved Johnson–Lindenstrauss embeddings via the restricted isometry property. SIAM J. Math. Anal. 43 (3), 1269–1281 (2011)
F. Krahmer, S. Mendelson, H. Rauhut, Suprema of chaos processes and the restricted isometry property. CoRR abs/1207.0235 (2012)
C. McDiarmid, Concentration, in Probabilistic Methods for Algorithmic Discrete Mathematics. Algorithms Combination, vol. 16 (Springer, Berlin, 1998), pp. 195–248
A. Natarajan, Y. Wu, Computational complexity of certifying restricted isometry property, in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX, pp. 371–380 (2014)
D. Needell, J.A. Tropp, CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Commun. ACM 53 (12), 93–100 (2010)
J. Nelson, E. Price, M. Wootters, New constructions of RIP matrices with fast multiplication and fewer rows, in Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp. 1515–1528 (2014)
D.G. Nishimura, Principles of Magnetic Resonance Imaging (Stanford University, Stanford, CA, 2010)
H. Rauhut, Compressive sensing and structured random matrices, in Theoretical Foundations and Numerical Methods for Sparse Recovery, vol. 9, ed. by M. Fornasier (De Gruyter, Berlin, 2010), pp. 1–92
M. Rudelson, R. Vershynin, On sparse reconstruction from Fourier and Gaussian measurements. Commun. Pure Appl. Math. 61 (8), 1025–1045 (2008). Preliminary version in CISS’06
A.M. Tillmann, M.E. Pfetsch, The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing. IEEE Trans. Inform. Theory 60 (2), 1248–1259 (2014)
S. Wenger, S. Darabi, P. Sen, K. Glassmeier, M.A. Magnor, Compressed sensing for aperture synthesis imaging, in Proceedings of the International Conference on Image Processing, ICIP, pp. 1381–1384 (2010)
M. Wootters, On the list decodability of random linear codes with large error rates, in Proceedings of the 45th Annual ACM Symposium on Theory of Computing, STOC, pp. 853–860 (2013)
Acknowledgements
We thank Afonso S. Bandeira, Mahdi Cheraghchi, Michael Kapralov, Jelani Nelson, and Eric Price for useful discussions, and anonymous reviewers for useful comments.
Oded Regev was supported by the Simons Collaboration on Algorithms and Geometry and by the National Science Foundation (NSF) under Grant No. CCF-1320188. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.
© 2017 Springer International Publishing AG
Haviv, I., Regev, O. (2017). The Restricted Isometry Property of Subsampled Fourier Matrices. In: Klartag, B., Milman, E. (eds) Geometric Aspects of Functional Analysis. Lecture Notes in Mathematics, vol 2169. Springer, Cham. https://doi.org/10.1007/978-3-319-45282-1_11
Print ISBN: 978-3-319-45281-4
Online ISBN: 978-3-319-45282-1