Abstract
A matrix \(A \in \mathbb{C}^{q\times N}\) satisfies the restricted isometry property of order k with constant ε if it preserves the \(\ell_2\) norm of all k-sparse vectors up to a factor of \(1 \pm \epsilon\). We prove that a matrix A obtained by randomly sampling \(q = O(k \cdot \log^2 k \cdot \log N)\) rows from an N × N Fourier matrix satisfies the restricted isometry property of order k with a fixed ε with high probability. This improves on Rudelson and Vershynin (Comm Pure Appl Math, 2008), its subsequent improvements, and Bourgain (GAFA Seminar Notes, 2014).
1 Introduction
A matrix \(A \in \mathbb{C}^{q\times N}\) satisfies the restricted isometry property of order k with constant ε > 0 if for every k-sparse vector \(x \in \mathbb{C}^{N}\) (i.e., a vector with at most k nonzero entries), it holds that
\[(1-\epsilon)\cdot \|x\|_2^2 \;\le\; \|Ax\|_2^2 \;\le\; (1+\epsilon)\cdot \|x\|_2^2. \tag{1}\]
Intuitively, this means that every k columns of A are nearly orthogonal. This notion, due to Candès and Tao [9], was intensively studied during the last decade and found various applications and connections to several areas of theoretical computer science, including sparse recovery [8, 20, 27], coding theory [14], norm embeddings [6, 22], and computational complexity [4, 25, 31].
The original motivation for the restricted isometry property comes from the area of compressed sensing. There, one wishes to compress a high-dimensional sparse vector \(x \in \mathbb{C}^{N}\) to a vector Ax, where \(A \in \mathbb{C}^{q\times N}\) is a measurement matrix that enables reconstruction of x from Ax. Typical goals in this context include minimizing the number of measurements q and the running time of the reconstruction algorithm. It is known that the restricted isometry property of A, for \(\epsilon <\sqrt{2} - 1\), is a sufficient condition for reconstruction. In fact, it was shown in [8, 9, 11, 12] that under this condition, reconstruction is equivalent to finding the vector of least \(\ell_1\) norm among all vectors that agree with the given measurements, a task that can be formulated as a linear program [13, 16], and thus can be solved efficiently.
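As a small illustration of the reconstruction step just described, the \(\ell_1\)-minimization task can be cast as a linear program and handed to an off-the-shelf solver. The sketch below is not from the paper: it uses a real-valued Gaussian measurement matrix and illustrative dimensions, with the standard reformulation of minimizing \(\sum_i t_i\) subject to \(-t \le x \le t\) and \(Ax = b\).

```python
# Sketch: sparse recovery by l1 minimization, written as a linear program
#   min 1^T t   s.t.   x - t <= 0,  -x - t <= 0,  Ax = b
# (real-valued for simplicity; a Gaussian matrix stands in for the Fourier one).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, q, k = 60, 30, 3                       # illustrative dimensions

A = rng.standard_normal((q, N)) / np.sqrt(q)            # measurement matrix
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)  # k-sparse signal
b = A @ x0                                              # the measurements

# Variables z = [x; t]; minimize the sum of the auxiliary bounds t.
c = np.concatenate([np.zeros(N), np.ones(N)])
I = np.eye(N)
A_ub = np.block([[I, -I], [-I, -I]])       # encodes -t <= x <= t
b_ub = np.zeros(2 * N)
A_eq = np.hstack([A, np.zeros((q, N))])    # encodes Ax = b
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=[(None, None)] * (2 * N))
x_hat = res.x[:N]                          # the l1-minimal solution
```

Since the true signal x0 is feasible for this program, the solution is guaranteed to have \(\ell_1\) norm at most \(\|x_0\|_1\) while matching the measurements.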
The above application leads to the challenge of finding matrices \(A \in \mathbb{C}^{q\times N}\) that satisfy the restricted isometry property and have a small number of rows q as a function of N and k. (For simplicity, we ignore for now the dependence on ε.) A general lower bound of \(q = \Omega (k \cdot \log (N/k))\) is known to follow from [18] (see also [17]). Fortunately, there are matrices that match this lower bound, e.g., random matrices whose entries are chosen independently according to the normal distribution [10]. However, in many applications the measurement matrix cannot be chosen arbitrarily but is instead given by a random sample of rows from a unitary matrix, typically the discrete Fourier transform. This includes, for instance, various tests and experiments in medicine and biology (e.g., MRI [28] and ultrasound imaging [21]) and applications in astronomy (e.g., radio telescopes [32]). An advantage of subsampled Fourier matrices is that they support fast matrix-vector multiplication, and as such, are useful for efficient compression as well as for efficient reconstruction based on iterative methods (see, e.g., [26]).
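To make the fast-multiplication point concrete, here is a small sketch (illustrative parameters, not from the paper) showing that a subsampled DFT measurement Ax can be computed with a single FFT instead of forming the q × N matrix explicitly:

```python
# Sketch: a subsampled Fourier measurement Ax computed in O(N log N) time via
# the FFT, versus the explicit q x N matrix product.
import numpy as np

rng = np.random.default_rng(1)
N, q = 256, 40
Q = rng.integers(0, N, size=q)             # q row indices, with replacement

# Explicit normalized subsampled DFT: A = sqrt(N/q) * F[Q, :], where F is the
# unitary DFT matrix F[j, l] = exp(-2*pi*1j*j*l/N) / sqrt(N).
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)
A = np.sqrt(N / q) * F[Q, :]

x = rng.standard_normal(N)
slow = A @ x                               # explicit matrix-vector product
fast = np.sqrt(N / q) * (np.fft.fft(x) / np.sqrt(N))[Q]  # same values via FFT
```

The FFT route never materializes A, which is what makes iterative reconstruction methods cheap for these matrices.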
In recent years, with motivation from both theory and practice, an intensive line of research has aimed to study the restricted isometry property of random sub-matrices of unitary matrices. Letting \(A \in \mathbb{C}^{q\times N}\) be a (normalized) matrix whose rows are chosen uniformly and independently from the rows of a unitary matrix \(M \in \mathbb{C}^{N\times N}\), the goal is to prove an upper bound on q for which A is guaranteed to satisfy the restricted isometry property with high probability. Note that the fact that the entries of every row of A are not independent makes this question much more difficult than in the case of random matrices with independent entries.
The first upper bound on the number of rows of a subsampled Fourier matrix that satisfies the restricted isometry property was \(O(k \cdot \log^6 N)\), which was proved by Candès and Tao [10]. This was then improved by Rudelson and Vershynin [30] to \(O(k \cdot \log^2 k \cdot \log(k\log N) \cdot \log N)\) (see also [15, 29] for a simplified analysis with better success probability). A modification of their analysis led to an improved bound of \(O(k \cdot \log^3 k \cdot \log N)\) by Cheraghchi, Guruswami, and Velingker [14], who related the problem to a question on the list-decoding rate of random linear codes over finite fields. Interestingly, replacing the \(\log(k\log N)\) term in the bound of [30] by \(\log k\) was crucial for their application. Recently, Bourgain [7] proved a bound of \(O(k \cdot \log k \cdot \log^2 N)\), which is incomparable to those of [14, 30] (and has a worse dependence on ε; see below). We finally mention that the best known lower bound on the number of rows is \(\Omega (k \cdot \log N)\) [5].
1.1 Our Contribution
In this work, we improve the previous bounds and prove the following.
Theorem 1.1 (Simplified)
Let \(M \in \mathbb{C}^{N\times N}\) be a unitary matrix with entries of absolute value \(O(1/\sqrt{N})\), and let ε > 0 be a fixed constant. For some \(q = O(k \cdot \log^2 k \cdot \log N)\), let \(A \in \mathbb{C}^{q\times N}\) be a matrix whose q rows are chosen uniformly and independently from the rows of M, multiplied by \(\sqrt{N/q}\). Then, with high probability, the matrix A satisfies the restricted isometry property of order k with constant ε.
The main idea in our proof is described in Sect. 1.3. We arrived at the proof from our recent work on list-decoding [19], where a baby version of the idea was used to bound the sample complexity of learning the class of Fourier-sparse Boolean functions. Like all previous work on this question, our proof can be seen as a careful union bound applied to a sequence of progressively finer nets, a technique sometimes known as chaining. However, unlike the work of Rudelson and Vershynin [30] and its improvements [14, 15], we avoid the use of Gaussian processes, the “symmetrization process,” and Dudley’s inequality. Instead, we follow and refine Bourgain’s proof [7], and apply the chaining argument directly to the problem at hand using only elementary arguments. It would be interesting to see if our proof can be cast in the Gaussian framework of Rudelson and Vershynin.
We remark that the bounds obtained in the previous works [14, 30] have a multiplicative \(O(\epsilon^{-2})\) term, whereas a much worse term of \(O(\epsilon^{-6})\) was obtained in [7]. In our proof of Theorem 1.1 we nearly obtain the best known dependence on ε. For simplicity of presentation, we first prove in Sect. 3 our bound with a weaker multiplicative term of \(O(\epsilon^{-4})\), and then, in Sect. 4, we modify the analysis and decrease the dependence on ε to \(O(\epsilon^{-2})\) up to logarithmic terms.
1.2 Related Literature
As mentioned before, one important advantage of using subsampled Fourier matrices in compressed sensing is that they support fast, in fact nearly linear time, matrix-vector multiplication. In certain scenarios, however, one is not restricted to using subsampled Fourier matrices as the measurement matrix. The question then is whether one can decrease the number of rows using another measurement matrix, while still keeping the near-linear multiplication time. For \(k < N^{1/2-\gamma}\), where γ > 0 is an arbitrary constant, the answer is yes: a construction with the optimal number \(O(k \cdot \log N)\) of rows follows from works by Ailon and Chazelle [1] and Ailon and Liberty [2] (see [6]). For general k, Nelson, Price, and Wootters [27] suggested taking subsampled Fourier matrices and “tweaking” them by bunching together rows with random signs. Using the Gaussian-process-based analysis of [14, 30] and introducing further techniques from [23], they showed that with this construction one can reduce the number of rows by a logarithmic factor to \(O(k \cdot \log^2(k\log N) \cdot \log N)\) while still keeping the nearly linear multiplication time. Our result shows that the same number of rows (in fact, a slightly smaller number) can be achieved already with the original subsampled Fourier matrices, without having to use the “tweak.” A natural open question is whether the “tweak” from [27] and their techniques can be combined with ours to further reduce the number of rows. An improvement in the regime of parameters of \(k =\omega (\sqrt{N})\) would lead to more efficient low-dimensional embeddings based on Johnson–Lindenstrauss matrices (see, e.g., [1–3, 22, 27]).
1.3 Proof Overview
Recall from Theorem 1.1 and from (1) that our goal is to prove that a matrix A given by a random sample Q of q rows of M satisfies with high probability that for all k-sparse x, \(\|Ax\|_2^2 \approx \|x\|_2^2\). Since M is unitary, the latter is equivalent to saying that \(\|Ax\|_2^2 \approx \|Mx\|_2^2\). Yet another way of expressing this condition is as
\[\mathop{\mathbb{E}}_{j\in Q}\left[\vert (Mx)_j\vert^2\right] \;\approx\; \mathop{\mathbb{E}}_{j\in [N]}\left[\vert (Mx)_j\vert^2\right],\]
i.e., that a sample Q ⊆ [N] of q coordinates of the vector \(\vert Mx\vert^2\) gives a good approximation to the average of all its coordinates. Here, \(\vert Mx\vert^2\) refers to the vector obtained by taking the squared absolute value of Mx coordinate-wise. For reasons that will become clear soon, it will be convenient to assume without loss of generality that \(\|x\|_1 = 1\). With this scaling, the sparsity assumption implies that \(\|Mx\|_2^2\) is not too small (namely at least \(1/k\)), and this will determine the amount of additive error we can afford in the approximation above. This is the only way we use the sparsity assumption.
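Concretely, the claimed lower bound on \(\|Mx\|_2^2\) is a one-line consequence of the Cauchy–Schwarz inequality applied to the at most k nonzero entries of x:

```latex
\[
  \|Mx\|_2^2 \;=\; \|x\|_2^2 \;\ge\; \frac{\|x\|_1^2}{k} \;=\; \frac{1}{k},
\]
```

where the first equality uses that M is unitary and the inequality is \(\|x\|_1 \le \sqrt{k}\cdot \|x\|_2\) for k-sparse x.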
At a high level, the proof proceeds by defining a finite set of vectors \(\mathcal{H}\) that forms a net, i.e., a set satisfying that any vector | Mx | 2 is close to one of the vectors in \(\mathcal{H}\). We then argue using the Chernoff-Hoeffding bound that for any fixed vector \(h \in \mathcal{H}\), a sample of q coordinates gives a good approximation to the average of h. Finally, we complete the proof by a union bound over all \(h \in \mathcal{H}\).
In order to define the set \(\mathcal{H}\) we notice that since ∥ x ∥ 1 = 1, Mx can be seen as a weighted average of the columns of M (possibly with signs). In other words, we can think of Mx as the expectation of a vector-valued random variable given by a certain probability distribution over the columns of M. Using the Chernoff-Hoeffding bound again, this implies that we can approximate Mx well by taking the average over a small number of samples from this distribution. We then let \(\mathcal{H}\) be the set of all possible such averages, and a bound on the cardinality of \(\mathcal{H}\) follows easily (basically N raised to the number of samples). This technique is sometimes referred to as Maurey’s empirical method.
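A toy numerical rendition of Maurey's empirical method (illustrative and not from the paper; a nonnegative real x is used so that x itself is the sampling distribution, whereas the general complex case splits each entry into nonnegative parts):

```python
# Sketch of Maurey's empirical method: approximate Mx, a weighted average of
# the columns of M, by averaging a few columns sampled with probabilities
# x_l / ||x||_1; coordinate-wise error decays like 1/sqrt(#samples).
import numpy as np

rng = np.random.default_rng(2)
N = 128
# DFT matrix with unit-modulus entries (unitary up to scaling).
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)

# A sparse nonnegative vector with ||x||_1 = 1, so x is itself a probability
# distribution over the columns of F.
x = np.zeros(N)
x[rng.choice(N, 8, replace=False)] = rng.random(8)
x /= x.sum()

target = F @ x                  # Mx = expectation of a randomly chosen column

def maurey(s):
    """Average of s columns of F sampled independently with probabilities x."""
    cols = rng.choice(N, size=s, p=x)
    return F[:, cols].mean(axis=1)

coarse = np.max(np.abs(maurey(20) - target))      # few samples: crude
fine = np.max(np.abs(maurey(20000) - target))     # many samples: accurate
```

By the Chernoff-Hoeffding bound, each coordinate of the empirical average concentrates around the corresponding coordinate of Mx, which is exactly the mechanism bounding the net sizes above.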
The argument above is actually oversimplified, and carrying it out leads to rather bad bounds on q. As a result, our proof in Sect. 3 is slightly more delicate. Namely, instead of just one set \(\mathcal{H}\), we have a sequence of sets, \(\mathcal{H}_{1},\mathcal{H}_{2},\ldots\), each being responsible for approximating a different scale of \(\vert Mx\vert^2\). The first set \(\mathcal{H}_{1}\) approximates \(\vert Mx\vert^2\) on coordinates on which its value is highest; since the value is high, we need fewer samples in order to approximate it well, as a result of which the set \(\mathcal{H}_{1}\) is small. The next set \(\mathcal{H}_{2}\) approximates \(\vert Mx\vert^2\) on coordinates on which its value is somewhat smaller, and is therefore a bigger set, and so on. The end result is that any vector \(\vert Mx\vert^2\) can be approximately decomposed into a sum \(\sum_i h^{(i)}\) with \(h^{(i)} \in \mathcal{H}_{i}\). To complete the proof, we argue that a random choice of q coordinates approximates all the vectors in all the \(\mathcal{H}_{i}\) well. The reason working with several \(\mathcal{H}_{i}\) leads to the better bound stated in Theorem 1.1 is this: even though the number of vectors in \(\mathcal{H}_{i}\) grows as i increases, the quality of approximation that we need the q coordinates to provide decreases, since the value of \(\vert Mx\vert^2\) there is small and so errors are less significant. It turns out that these two requirements on q balance each other perfectly, leading to the desired bound on q.
2 Preliminaries
Notation
The notation \(x \approx_{\epsilon,\alpha} y\) means that \(x \in [(1-\epsilon)y - \alpha, (1+\epsilon)y + \alpha]\). For a matrix M, we denote by \(M^{(\ell)}\) the ℓth column of M and define \(\|M\|_\infty = \max_{i,j}\vert M_{i,j}\vert\).
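For experimentation, the relation can be coded directly; this tiny helper is hypothetical and not part of the paper:

```python
def approx(x, y, eps, alpha):
    """x ~_{eps,alpha} y: x lies in [(1 - eps) * y - alpha, (1 + eps) * y + alpha].

    Matches the notation above for real x and nonnegative y, which is the only
    case in which the relation is used in this paper.
    """
    return (1 - eps) * y - alpha <= x <= (1 + eps) * y + alpha
```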
The Restricted Isometry Property The restricted isometry property is defined as follows.
Definition 2.1
We say that a matrix \(A \in \mathbb{C}^{q\times N}\) satisfies the restricted isometry property of order k with constant ε if for every k-sparse vector \(x \in \mathbb{C}^{N}\) it holds that
\[(1-\epsilon)\cdot \|x\|_2^2 \;\le\; \|Ax\|_2^2 \;\le\; (1+\epsilon)\cdot \|x\|_2^2.\]
Chernoff-Hoeffding Bounds We now state the Chernoff-Hoeffding bound (see, e.g., [24]) and derive several simple corollaries that will be used extensively later.
Theorem 2.2
Let \(X_1,\ldots,X_N\) be N identically distributed independent random variables in \([0,a]\) satisfying \(\mathop{\mathbb{E}}[X_{i}] =\mu\) for all i, and denote \(\overline{X} = \frac{1}{N}\cdot\sum_{i=1}^{N}X_{i}\). Then there exists a universal constant C such that for every \(0 <\epsilon \le 1/2\), the probability that \(\overline{X} \approx _{\epsilon,0}\mu\) is at least \(1 - 2e^{-C\cdot N\mu \epsilon ^{2}/a }\).
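A quick numerical sanity check of this bound (illustrative parameters, not from the paper): for Uniform[0, 1] variables we have a = 1 and μ = 1/2, and the sample mean of N = 1000 draws should essentially never deviate from μ by more than εμ for ε = 0.1.

```python
# Empirical check of Chernoff-Hoeffding concentration for the sample mean of
# bounded i.i.d. variables (Uniform[0, 1], so a = 1 and mu = 1/2).
import numpy as np

rng = np.random.default_rng(3)
N, trials, eps = 1000, 200, 0.1
mu = 0.5                                   # mean of Uniform[0, 1]
# Each row is one experiment: the average of N independent draws.
means = rng.random((trials, N)).mean(axis=1)
worst = np.max(np.abs(means - mu))         # worst deviation over all trials
```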
Corollary 2.3
Let \(X_1,\ldots,X_N\) be N identically distributed independent random variables in \([0,a]\) satisfying \(\mathop{\mathbb{E}}[X_{i}] =\mu\) for all i, and denote \(\overline{X} = \frac{1}{N}\cdot\sum_{i=1}^{N}X_{i}\). Then there exists a universal constant C such that for every \(0 <\epsilon \le 1/2\) and α > 0, the probability that \(\overline{X} \approx _{\epsilon,\alpha }\mu\) is at least \(1 - 2e^{-C\cdot N\alpha \epsilon /a}\).
Proof
If \(\mu \geq \frac{\alpha }{\epsilon }\), then by Theorem 2.2 the probability that \(\overline{X} \approx _{\epsilon,0}\mu\) is at least \(1 - 2e^{-C\cdot N\mu \epsilon ^{2}/a }\), which is at least \(1 - 2e^{-C\cdot N\alpha \epsilon /a}\). Otherwise, Theorem 2.2 applied with \(\tilde{\epsilon }= \frac{\alpha }{\mu } >\epsilon\) implies that the probability that \(\overline{X} \approx _{\tilde{\epsilon },0}\mu\), hence \(\overline{X} \approx _{0,\alpha }\mu\), is at least \(1 - 2e^{-C\cdot N\mu \tilde{\epsilon }^{2}/a }\), and the latter is at least \(1 - 2e^{-C\cdot N\alpha \epsilon /a}\). ■
Corollary 2.4
Let \(X_1,\ldots,X_N\) be N identically distributed independent random variables in \([-a,+a]\) satisfying \(\mathop{\mathbb{E}}[X_{i}] =\mu\) and \(\mathop{\mathbb{E}}[\vert X_{i}\vert ] =\tilde{\mu }\) for all i, and denote \(\overline{X} = \frac{1}{N}\cdot\sum_{i=1}^{N}X_{i}\). Then there exists a universal constant C such that for every \(0 <\epsilon' \le 1/2\) and α > 0, the probability that \(\overline{X} \approx _{0,\epsilon ^{{\prime}}\cdot \tilde{\mu }+\alpha }\mu\) is at least \(1 - 4e^{-C\cdot N\alpha \epsilon ^{{\prime}}/a }\).
Proof
The corollary follows by applying Corollary 2.3 to \(\max(X_i,0)\) and to \(-\min(X_i,0)\). ■
We end with the additive form of the bound, followed by an easy extension to the complex case.
Corollary 2.5
Let \(X_1,\ldots,X_N\) be N identically distributed independent random variables in \([-a,+a]\) satisfying \(\mathop{\mathbb{E}}[X_{i}] =\mu\) for all i, and denote \(\overline{X} = \frac{1}{N}\cdot\sum_{i=1}^{N}X_{i}\). Then there exists a universal constant C such that for every b > 0, the probability that \(\overline{X} \approx _{0,b}\mu\) is at least \(1 - 4e^{-C\cdot Nb^{2}/a^{2} }\).
Proof
We can assume that b ≤ 2a. The corollary follows by applying Corollary 2.4 with, say, \(\alpha = 3b/4\) and \(\epsilon' = b/(4a)\). ■
Corollary 2.6
Let \(X_1,\ldots,X_N\) be N identically distributed independent complex-valued random variables satisfying \(\vert X_i\vert \le a\) and \(\mathop{\mathbb{E}}[X_{i}] =\mu\) for all i, and denote \(\overline{X} = \frac{1}{N}\cdot\sum_{i=1}^{N}X_{i}\). Then there exists a universal constant C such that for every b > 0, the probability that \(\vert \overline{X}\vert \approx _{0,b}\vert \mu \vert\) is at least \(1 - 8e^{-C\cdot Nb^{2}/a^{2} }\).
Proof
By Corollary 2.5, applied to the real and imaginary parts of the random variables \(X_1,\ldots,X_N\), it follows that for a universal constant C, the probability that \(\mathsf{Re}(\overline{X}) \approx _{0,b/\sqrt{2}}\mathsf{Re}(\mu )\) and \(\mathsf{Im}(\overline{X}) \approx _{0,b/\sqrt{2}}\mathsf{Im}(\mu )\) is at least \(1 - 8e^{-C\cdot Nb^{2}/a^{2} }\). By the triangle inequality, it follows that with such probability we have \(\vert \overline{X}\vert \approx _{0,b}\vert \mu \vert\), as required. ■
3 The Simpler Analysis
In this section we prove our result with a multiplicative term of \(O(\epsilon^{-4})\) in the bound. This will be obtained in Theorem 3.7 as an easy corollary of the following theorem.
Theorem 3.1
For a sufficiently large N, a matrix \(M \in \mathbb{C}^{N\times N}\), and sufficiently small ε, η > 0, the following holds. For some \(q = O(\epsilon^{-3}\eta^{-1}\log N \cdot \log^2(1/\eta))\), let Q be a multiset of q uniform and independent random elements of [N]. Then, with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (1/\eta )) }\), it holds that for every \(x \in \mathbb{C}^{N}\),
\[\mathop{\mathbb{E}}_{j\in Q}\left[\vert (Mx)_j\vert^2\right] \;\approx_{\epsilon,\;\eta\cdot \|M\|_\infty^2\cdot \|x\|_1^2}\; \mathop{\mathbb{E}}_{j\in [N]}\left[\vert (Mx)_j\vert^2\right].\]
Throughout the proof we assume without loss of generality that the matrix \(M \in \mathbb{C}^{N\times N}\) satisfies \(\|M\|_\infty = 1\). For ε, η > 0, we denote \(t = \log_2(1/\eta)\), \(r = \log_2(1/\epsilon^2)\), and \(\gamma = \eta/(2t)\).
We now define the approximating vector sets \(\mathcal{H}_{i}\), i = 1, …, t, each responsible for coordinates of \(\vert Mx\vert^2\) of a different scale (the larger the i, the smaller the scale). We start by defining the “raw approximations” \(\mathcal{G}_{i}\), which are essentially vectors obtained by averaging a certain number of columns of M. We then define the vectors in \(\mathcal{H}_{i}\) by restricting the vectors in \(\mathcal{G}_{i}\) (actually \(\mathcal{G}_{i+r}\)) to the set of coordinates \(B_i\) where there is a clear “signal” and not just noise. This is necessary in order to make sure that the small coordinates of \(\vert Mx\vert^2\) are not flooded by noise from the coarse approximations. Details follow.
The Vector Sets \(\mathcal{G}_{i}\) For every 1 ≤ i ≤ t + r, let \(\mathcal{G}_{i}\) denote the set of all vectors \(g^{(i)} \in \mathbb{C}^{N}\) that can be represented as
for a multiset F of \(O(2^i \cdot \log(1/\gamma))\) pairs in \([N] \times \{0,1,2,3\}\). A trivial counting argument gives the following.
Claim 3.2
For every 1 ≤ i ≤ t + r, \(\vert \mathcal{G}_{i}\vert \leq N^{O(2^{i}\cdot \log (1/\gamma )) }.\)
The Vector Sets \(\mathcal{H}_{i}\) For a t-tuple of vectors \((g^{(1+r)},\ldots,g^{(t+r)}) \in \mathcal{G}_{1+r} \times \cdots \times \mathcal{G}_{t+r}\) and for 1 ≤ i ≤ t, let \(B_i\) be the set of all j ∈ [N] for which i is the smallest index satisfying \(\vert g_j^{(i+r)}\vert \ge 2\cdot 2^{-i/2}\). For such i, define the vector \(h^{(i)}\) by
\[h_j^{(i)} = \min\left\{\vert g_j^{(i+r)}\vert^2,\; 9\cdot 2^{-i}\right\} \text{ for } j \in B_i, \qquad h_j^{(i)} = 0 \text{ otherwise.} \tag{3}\]
Let \(\mathcal{H}_{i}\) be the set of all vectors h (i) that can be obtained in this way.
Claim 3.3
For every 1 ≤ i ≤ t, \(\vert \mathcal{H}_{i}\vert \leq N^{O(\epsilon ^{-2}\cdot 2^{i}\cdot \log (1/\gamma )) }.\)
Proof
Observe that every \(h^{(i)} \in \mathcal{H}_{i}\) is fully defined by some \((g^{(1+r)},\ldots,g^{(i+r)}) \in \mathcal{G}_{1+r} \times \cdots \times \mathcal{G}_{i+r}\). Hence
Using the definition of r, the claim follows.■
Lemma 3.4
For every \(\tilde{\eta }> 0\) and some \(q = O(\epsilon ^{-3}\tilde{\eta }^{-1}\log N \cdot \log (1/\gamma ))\), let Q be a multiset of q uniform and independent random elements of [N]. Then, with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (1/\gamma )) }\), it holds that for all 1 ≤ i ≤ t and \(h^{(i)} \in \mathcal{H}_{i}\),
\[\mathop{\mathbb{E}}_{j\in Q}\left[h_j^{(i)}\right] \;\approx_{\epsilon,\tilde{\eta}}\; \mathop{\mathbb{E}}_{j\in [N]}\left[h_j^{(i)}\right].\]
Proof
Fix 1 ≤ i ≤ t and a vector \(h^{(i)} \in \mathcal{H}_{i}\), and denote \(\mu =\mathop{ \mathbb{E}} _{j\in [N]}[h_{j}^{(i)}]\). By Corollary 2.3, applied with \(\alpha =\tilde{\eta }\) and \(a = 9\cdot 2^{-i}\) (recall that \(h_j^{(i)} \le a\) for every j), with probability \(1 - 2^{-\Omega (2^{i}\cdot q\epsilon \tilde{\eta }) }\), it holds that \(\mathop{\mathbb{E}} _{j\in Q}[h_{j}^{(i)}] \approx _{\epsilon,\tilde{\eta }}\mu\). Using Claim 3.3, the union bound over all the vectors in \(\mathcal{H}_{i}\) implies that the probability that some \(h^{(i)} \in \mathcal{H}_{i}\) does not satisfy \(\mathop{\mathbb{E}} _{j\in Q}[h_{j}^{(i)}] \approx _{\epsilon,\tilde{\eta }}\mu\) is at most
\[N^{O(\epsilon^{-2}\cdot 2^{i}\cdot \log (1/\gamma ))} \cdot 2^{-\Omega (2^{i}\cdot q\epsilon \tilde{\eta })} \;\le\; 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (1/\gamma ))},\]
where the inequality holds for the stated choice of q.
We complete the proof by a union bound over i. ■
Approximating the Vectors Mx
Lemma 3.5
For every vector \(x \in \mathbb{C}^{N}\) with ∥x∥ 1 = 1, every multiset Q ⊆ [N], and every 1 ≤ i ≤ t + r, there exists a vector \(g \in \mathcal{G}_{i}\) that satisfies \(\vert (Mx)_{j}\vert \approx _{0,2^{-i/2}}\vert g_{j}\vert\) for all but at most γ fraction of j ∈ [N] and for all but at most γ fraction of j ∈ Q.
Proof
Observe that for every ℓ ∈ [N] there exist \(p_{\ell,0}, p_{\ell,1}, p_{\ell,2}, p_{\ell,3} \ge 0\) that satisfy
Notice that the assumption \(\|x\|_1 = 1\) implies that the numbers \(p_{\ell,s}\) form a probability distribution. Thus, the vector Mx can be represented as
where D is the distribution that assigns probability \(p_{\ell,s}\) to the pair (ℓ, s).
Let F be a multiset of \(O(2^i \cdot \log(1/\gamma))\) independent random samples from D, and let \(g \in \mathcal{G}_{i}\) be the vector corresponding to F as in (2). By Corollary 2.6, applied with \(a = \sqrt{2}\) (recall that \(\|M\|_\infty = 1\)) and \(b = 2^{-i/2}\), for every j ∈ [N] the probability that
is at least 1 −γ∕4. It follows that the expected number of j ∈ [N] that do not satisfy (4) is at most γ N∕4, so by Markov’s inequality the probability that the number of j ∈ [N] that do not satisfy (4) is at most γ N is at least 3∕4. Similarly, the expected number of j ∈ Q that do not satisfy (4) is at most γ | Q | ∕4, so by Markov’s inequality, with probability at least 3∕4 it holds that the number of j ∈ Q that do not satisfy (4) is at most γ | Q | . It follows that there exists a vector \(g \in \mathcal{G}_{i}\) for which (4) holds for all but at most γ fraction of j ∈ [N] and for all but at most γ fraction of j ∈ Q, as required. ■
Lemma 3.6
For every multiset Q ⊆ [N] and every vector \(x \in \mathbb{C}^{N}\) with ∥x∥ 1 = 1 there exists a t-tuple of vectors \((h^{(1)},\ldots,h^{(t)}) \in \mathcal{H}_{1} \times \cdots \times \mathcal{H}_{t}\) for which
-
1.
\(\mathop{\mathbb{E}} _{j\in Q}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in Q}\left [\sum _{i=1}^{t}h_{j}^{(i)}\right ]\) and
-
2.
\(\mathop{\mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in [N]}\left [\sum _{i=1}^{t}h_{j}^{(i)}\right ]\) .
Proof
By Lemma 3.5, for every 1 ≤ i ≤ t there exists a vector \(g^{(i+r)} \in \mathcal{G}_{i+r}\) that satisfies
\[\vert (Mx)_j\vert \;\approx_{0,\,\epsilon\cdot 2^{-i/2}}\; \vert g_j^{(i+r)}\vert \tag{5}\]
for all but at most γ fraction of j ∈ [N] and for all but at most γ fraction of j ∈ Q. We say that j ∈ [N] is good if (5) holds for every 1 ≤ i ≤ t, and otherwise that it is bad. Notice that all but at most \(t\gamma\) fraction of j ∈ [N] are good and that all but at most \(t\gamma\) fraction of j ∈ Q are good. Let \((h^{(1)},\ldots,h^{(t)})\) and \((B_1,\ldots,B_t)\) be the vectors and sets associated with \((g^{(1+r)},\ldots,g^{(t+r)})\) as defined in (3). We claim that \(h^{(1)},\ldots,h^{(t)}\) satisfy the requirements of the lemma.
We first show that for every good j it holds that \(\vert (Mx)_j\vert^2 \approx_{3\epsilon,9\eta} \sum_{i=1}^{t} h_j^{(i)}\). To obtain it, we observe that if \(j \in B_i\) for some i, then
The lower bound follows simply from the definition of \(B_i\). For the upper bound, which trivially holds for i = 1, assume that i ≥ 2, and notice that the definition of \(B_i\) implies that \(\vert g_j^{(i+r-1)}\vert < 2\cdot 2^{-(i-1)/2}\). Using (5), and assuming that ε is sufficiently small, we obtain that
Hence, by the upper bound in (6), for a good \(j \in B_i\) we have \(h_j^{(i)} = \vert g_j^{(i+r)}\vert^2\) and \(h_{j}^{(i')} = 0\) for \(i' \neq i\). Observe that by the lower bound in (6),
and that this implies that \(\vert (Mx)_{j}\vert ^{2} \approx _{3\epsilon,0}\sum _{i=1}^{t}h_{j}^{(i)}\). On the other hand, in case that j is good but does not belong to any \(B_i\), recalling that \(t = \log_2(1/\eta)\), it follows that
and thus \(\vert (Mx)_{j}\vert ^{2} \approx _{0,9\eta }0 =\sum _{ i=1}^{t}h_{j}^{(i)}\).
Finally, for every bad j we have
Since at most \(t\gamma\) fraction of the elements in [N] and in Q are bad, their effect on the difference between the expectations in the lemma can be bounded by \(2t\gamma\). By our choice of γ, this is η, completing the proof of the lemma. ■
Finally, we are ready to prove Theorem 3.1.
Proof of Theorem 3.1
By Lemma 3.4, applied with \(\tilde{\eta }=\eta /(2t)\), a random multiset Q of size
\[q = O\left(\epsilon^{-3}\tilde{\eta}^{-1}\log N \cdot \log(1/\gamma)\right) = O\left(\epsilon^{-3}\eta^{-1}\log N \cdot \log^2(1/\eta)\right)\]
satisfies with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (1/\eta )) }\) that for all 1 ≤ i ≤ t and \(h^{(i)} \in \mathcal{H}_{i}\),
in which case we also have
We show that a Q with the above property satisfies the requirement of the theorem. Let \(x \in \mathbb{C}^{N}\) be a vector, and assume without loss of generality that \(\|x\|_1 = 1\). By Lemma 3.6, there exists a t-tuple of vectors \((h^{(1)},\ldots,h^{(t)}) \in \mathcal{H}_{1} \times \cdots \times \mathcal{H}_{t}\) satisfying Items 1 and 2 there. As a result,
and we are done. ■
3.1 The Restricted Isometry Property
Equipped with Theorem 3.1, it is easy to derive our result on the restricted isometry property (see Definition 2.1) of random sub-matrices of unitary matrices.
Theorem 3.7
For sufficiently large N and k, a unitary matrix \(M \in \mathbb{C}^{N\times N}\) satisfying \(\|M\|_{\infty }\leq O(1/\sqrt{N})\), and a sufficiently small ε > 0, the following holds. For some \(q = O(\epsilon^{-4}\cdot k\cdot \log^2(k/\epsilon)\cdot \log N)\), let \(A \in \mathbb{C}^{q\times N}\) be a matrix whose q rows are chosen uniformly and independently from the rows of M, multiplied by \(\sqrt{N/q}\). Then, with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (k/\epsilon )) }\), the matrix A satisfies the restricted isometry property of order k with constant ε.
Proof
Let Q be a multiset of q uniform and independent random elements of [N], defining a matrix A as above. Notice that by the Cauchy-Schwarz inequality, any k-sparse vector \(x \in \mathbb{C}^{N}\) with \(\|x\|_2 = 1\) satisfies \(\|x\|_{1} \leq \sqrt{k}\). Applying Theorem 3.1 with \(\epsilon/2\) and some \(\eta = \Omega (\epsilon /k)\), we get that with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (k/\epsilon )) }\), it holds that for every \(x \in \mathbb{C}^{N}\) with \(\|x\|_2 = 1\),
By homogeneity, it follows that every k-sparse vector \(x \in \mathbb{C}^{N}\) satisfies \(\|Ax\|_2^2 \approx_{\epsilon,0} \|x\|_2^2\), hence A satisfies the restricted isometry property of order k with constant ε. ■
4 The Improved Analysis
In this section we prove the following theorem, which improves the bound of Theorem 3.1 in terms of the dependence on ε.
Theorem 4.1
For a sufficiently large N, a matrix \(M \in \mathbb{C}^{N\times N}\), and sufficiently small ε, η > 0, the following holds. For some \(q = O(\log^2(1/\epsilon)\cdot \epsilon^{-1}\eta^{-1}\log N \cdot \log^2(1/\eta))\), let Q be a multiset of q uniform and independent random elements of [N]. Then, with probability \(1 - 2^{-\Omega (\log N\cdot \log (1/\eta ))}\), it holds that for every \(x \in \mathbb{C}^{N}\),
\[\mathop{\mathbb{E}}_{j\in Q}\left[\vert (Mx)_j\vert^2\right] \;\approx_{\epsilon,\;\eta\cdot \|M\|_\infty^2\cdot \|x\|_1^2}\; \mathop{\mathbb{E}}_{j\in [N]}\left[\vert (Mx)_j\vert^2\right]. \tag{7}\]
We can assume that ε ≥ η, as otherwise one can apply the theorem with parameters η∕2, η∕2 and derive (7) for ε, η as well (because the right-hand side is bounded from above by \(\|x\|_1^2\cdot \|M\|_\infty^2\)). As before, we assume without loss of generality that \(\|M\|_\infty = 1\). For ε ≥ η > 0, we define \(t = \log_2(1/\eta)\) and \(r = \log_2(1/\epsilon^2)\). For the analysis given in this section, we define \(\gamma = \eta/(60(t+r))\). Throughout the proof, we use the vector sets \(\mathcal{G}_{i}\) from Sect. 3 and Lemma 3.5 for this value of γ.
The Vector Sets \(\mathcal{D}_{i,m}\) For a (t + r)-tuple of vectors \((g^{(1)},\ldots,g^{(t+r)}) \in \mathcal{G}_{1} \times \cdots \times \mathcal{G}_{t+r}\) and for 1 ≤ i ≤ t, let \(C_i\) be the set of all j ∈ [N] for which i is the smallest index satisfying \(\vert g_j^{(i)}\vert \ge 2\cdot 2^{-i/2}\). For m = i, …, i + r define the vector \(h^{(i,m)}\) by
and for other values of m define \(h^{(i,m)} = 0\). Now, for every m, let \(\Delta ^{(i,m)}\) be the vector defined by
\[\Delta^{(i,m)} = h^{(i,m)} - h^{(i,m-1)}. \tag{9}\]
Note that the support of \(\Delta ^{(i,m)}\) is contained in C i . Let \(\mathcal{D}_{i,m}\) be the set of all vectors \(\Delta ^{(i,m)}\) that can be obtained in this way.
Claim 4.2
For every 1 ≤ i ≤ t and i ≤ m ≤ i + r, \(\vert \mathcal{D}_{i,m}\vert \leq N^{O(2^{m}\cdot \log (1/\gamma )) }.\)
Proof
Observe that every vector in \(\mathcal{D}_{i,m}\) is fully defined by some \((g^{(1)},\ldots,g^{(m)}) \in \mathcal{G}_{1} \times \cdots \times \mathcal{G}_{m}\). Hence
and the claim follows. ■
Lemma 4.3
For every \(\tilde{\varepsilon },\tilde{\eta }> 0\) and some \(q = O(\tilde{\varepsilon }^{-1}\tilde{\eta }^{-1}\log N \cdot \log (1/\gamma ))\), let Q be a multiset of q uniform and independent random elements of [N]. Then, with probability \(1 - 2^{-\Omega (\log N\cdot \log (1/\gamma ))}\), it holds that for every 1 ≤ i ≤ t, m, and a vector \(\Delta ^{(i,m)} \in \mathcal{D}_{i,m}\) associated with a set \(C_i\),
Proof
Fix i, m, and a vector \(\Delta ^{(i,m)} \in \mathcal{D}_{i,m}\) associated with a set C i as in (9). Notice that
By Corollary 2.4, applied with
we have that (10) holds with probability \(1 - 2^{-\Omega (2^{m}\cdot q\tilde{\varepsilon }\tilde{\eta }) }\). Using Claim 4.2, the union bound over all the vectors in \(\mathcal{D}_{i,m}\) implies that the probability that some \(\Delta ^{(i,m)} \in \mathcal{D}_{i,m}\) does not satisfy (10) is at most
The result follows by a union bound over i and m. ■
Approximating the Vectors Mx
Lemma 4.4
For every multiset Q ⊆ [N] and every vector \(x \in \mathbb{C}^{N}\) with ∥x∥ 1 = 1 there exist vector collections \((\Delta ^{(i,m)} \in \mathcal{D}_{i,m})_{m=i,\ldots,i+r}\) associated with sets C i (1 ≤ i ≤ t), for which
-
1.
\(\mathop{\mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ] \geq \sum _{i=1}^{t}2^{-i} \cdot \frac{\vert C_{i}\vert } {N} -\eta,\)
-
2.
\(\mathop{\mathbb{E}} _{j\in Q}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in Q}\left [\sum _{i=1}^{t}\sum _{m=i}^{i+r}\Delta _{j}^{(i,m)}\right ],\) and
-
3.
\(\mathop{\mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in [N]}\left [\sum _{i=1}^{t}\sum _{m=i}^{i+r}\Delta _{j}^{(i,m)}\right ].\)
Proof
By Lemma 3.5, for every 1 ≤ i ≤ t + r there exists a vector \(g^{(i)} \in \mathcal{G}_{i}\) that satisfies
\[\vert (Mx)_j\vert \;\approx_{0,\,2^{-i/2}}\; \vert g_j^{(i)}\vert \tag{11}\]
for all but at most γ fraction of j ∈ [N] and for all but at most γ fraction of j ∈ Q. We say that j ∈ [N] is good if (11) holds for every i, and otherwise that it is bad. Notice that all but at most \((t+r)\gamma\) fraction of j ∈ [N] are good and that all but at most \((t+r)\gamma\) fraction of j ∈ Q are good. Consider the sets \(C_i\) and vectors \(h^{(i,m)},\Delta ^{(i,m)}\) associated with \((g^{(1)},\ldots,g^{(t+r)})\) as defined in (8). We claim that the \(\Delta ^{(i,m)}\) satisfy the requirements of the lemma.
Fix some 1 ≤ i ≤ t. For every good \(j \in C_i\), the definition of \(C_i\) implies that \(\vert g_j^{(i)}\vert \ge 2\cdot 2^{-i/2}\), so using (11) it follows that
We also claim that \(\vert (Mx)_j\vert \le 3\cdot 2^{-(i-1)/2}\). This trivially holds for i = 1, so assume that i ≥ 2, and notice that the definition of \(C_i\) implies that \(\vert g_j^{(i-1)}\vert < 2\cdot 2^{-(i-1)/2}\), so using (11), it follows that
Since at most \((t+r)\gamma\) fraction of j ∈ [N] are bad, (12) yields that
as required for Item 1.
Next, we claim that every good j satisfies
\[\vert (Mx)_j\vert^2 \;\approx_{O(\epsilon),\,9\eta}\; \sum_{i=1}^{t} h_j^{(i,i+r)}. \tag{14}\]
For a good \(j \in C_i\) and m ≥ i,
where the first inequality follows from (11) and the second from (13). In particular, for m = i + r (recall that \(r = \log_2(1/\epsilon^2)\)), we have
and thus \(\vert (Mx)_{j}\vert ^{2} \approx _{O(\epsilon ),0}h_{j}^{(i,i+r)}\). Since every good j belongs to at most one of the sets C i , for every good \(j \in \bigcup C_{i}\) we have \(\vert (Mx)_{j}\vert ^{2} \approx _{O(\epsilon ),0}\sum _{i=1}^{t}h_{j}^{(i,i+r)}\). On the other hand, if j is good but does not belong to any C i , by our choice of t, it satisfies
and thus \(\vert (Mx)_{j}\vert ^{2} \approx _{0,9\eta }0 =\sum _{ i=1}^{t}h_{j}^{(i,i+r)}\). This establishes that (14) holds for every good j.
Next, we claim that for every good j,
This follows since for every 1 ≤ i ≤ t, the vector \(h^{(i,i+r)}\) can be written as the telescoping sum
\[h^{(i,i+r)} = \sum_{m=i}^{i+r}\left(h^{(i,m)} - h^{(i,m-1)}\right) = \sum_{m=i}^{i+r}\Delta^{(i,m)},\]
where we used that \(h^{(i,i-1)} = 0\). We claim that for every good j, these differences satisfy
thus establishing that (16) holds for every good j. Indeed, for \(m \geq i + 1\), (15) implies that
and for m = i it follows from (11) combined with (13).
Finally, for every bad j we have
Since at most a (t + r)γ fraction of the elements in [N] and in Q are bad, their effect on the difference between the expectations in Items 2 and 3 can be bounded by 60(t + r)γ. By our choice of γ, this is at most η, as required. ■
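The dyadic level-set decomposition underlying the proof above, which partitions the coordinates of Mx into buckets \(C_{i}\) by magnitude using thresholds of the form \(2 \cdot 2^{-i/2}\), can be illustrated numerically. The sketch below is not part of the original argument: the vector `y` stands in for Mx, and the helper `dyadic_buckets` is a hypothetical name mirroring the sets \(C_{i}\).

```python
import numpy as np

def dyadic_buckets(y, t):
    """Partition coordinates of y into dyadic level sets by magnitude.

    Bucket i (1-indexed) collects the coordinates j with
    2 * 2^(-i/2) <= |y_j| < 2 * 2^(-(i-1)/2), mirroring the sets C_i in
    the proof; coordinates below the smallest threshold are left out,
    just as the proof discards coordinates too small to matter.
    """
    buckets = {}
    for i in range(1, t + 1):
        lo = 2 * 2 ** (-i / 2)
        hi = 2 * 2 ** (-(i - 1) / 2) if i > 1 else np.inf
        buckets[i] = np.where((np.abs(y) >= lo) & (np.abs(y) < hi))[0]
    return buckets

rng = np.random.default_rng(0)
y = rng.standard_normal(1000) / 10  # toy vector playing the role of Mx
buckets = dyadic_buckets(y, t=20)

# Each coordinate lands in at most one bucket, and for large enough t the
# buckets capture essentially all of the squared mass of y.
all_idx = np.concatenate([buckets[i] for i in buckets])
assert len(all_idx) == len(set(all_idx.tolist()))
captured = sum((y[buckets[i]] ** 2).sum() for i in buckets)
residual = (y ** 2).sum() - captured
```

The leftover mass `residual` comes only from coordinates with \(\vert y_{j}\vert < 2 \cdot 2^{-t/2}\), which is why choosing t large enough makes the discarded contribution negligible, as in the proof.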
Finally, we are ready to prove Theorem 4.1.
Proof of Theorem 4.1
Recall that it can be assumed that ε ≥ η. By Lemma 4.3, applied with \(\tilde{\varepsilon }=\epsilon /r\) and \(\tilde{\eta }=\eta /(rt)\), a random multiset Q of size
satisfies, with probability \(1 - 2^{-\Omega (\log N\cdot \log (1/\eta ))}\), that for every \(1 \leq i \leq t\), every m, and every \(\Delta ^{(i,m)} \in \mathcal{D}_{i,m}\) associated with a set \(C_{i}\),
in which case we also have
We show that a Q with the above property satisfies the requirement of the theorem. Let \(x \in \mathbb{C}^{N}\) be a vector, and assume without loss of generality that \(\|x\|_{1} = 1\). By Lemma 4.4, there exist vector collections \((\Delta ^{(i,m)} \in \mathcal{D}_{i,m})_{m=i,\ldots,i+r}\) associated with sets \(C_{i}\) (\(1 \leq i \leq t\)), satisfying Items 1, 2, and 3 there. Combined with (18), this gives
and we are done. ■
4.1 The Restricted Isometry Property
It is now easy to derive the following theorem. The proof is essentially identical to that of Theorem 3.7, using Theorem 4.1 instead of Theorem 3.1.
Theorem 4.5
For sufficiently large N and k, a unitary matrix \(M \in \mathbb{C}^{N\times N}\) satisfying \(\|M\|_{\infty }\leq O(1/\sqrt{N})\), and a sufficiently small ε > 0, the following holds. For some \(q = O(\log ^{2}(1/\epsilon )\cdot \epsilon ^{-2}\cdot k\cdot \log ^{2}(k/\epsilon )\cdot \log N)\), let \(A \in \mathbb{C}^{q\times N}\) be a matrix whose q rows are chosen uniformly and independently from the rows of M, multiplied by \(\sqrt{N/q}\). Then, with probability \(1 - 2^{-\Omega (\log N\cdot \log (k/\epsilon ))}\), the matrix A satisfies the restricted isometry property of order k with constant ε.
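As a numerical illustration of the statement above (a sketch, not part of the original text), one can subsample rows of the unitary DFT matrix, which satisfies \(\|M\|_{\infty } = 1/\sqrt{N}\), rescale by \(\sqrt{N/q}\), and check that the ℓ2 norm of a random k-sparse vector is roughly preserved. The concrete values of N, k, and q below are illustrative only and are far from the theorem's asymptotic regime.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k, q = 512, 5, 200  # illustrative sizes, not the theorem's bound

# Unitary DFT matrix: entries have magnitude 1/sqrt(N).
F = np.fft.fft(np.eye(N)) / np.sqrt(N)

# Sample q rows uniformly and independently, rescaled by sqrt(N/q).
rows = rng.choice(N, size=q, replace=True)
A = np.sqrt(N / q) * F[rows]

# Random k-sparse unit vector.
x = np.zeros(N, dtype=complex)
support = rng.choice(N, size=k, replace=False)
x[support] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
x /= np.linalg.norm(x)

# For a matrix with the restricted isometry property, this ratio lies
# in [1 - eps, 1 + eps]; here we only check it for one random x.
ratio = np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2
```

Note that checking one random sparse x only tests the weaker "for each x" guarantee discussed in Note 3; the restricted isometry property requires the bound simultaneously for all k-sparse vectors.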
Notes
- 1.
A preliminary version appeared in Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2016, pages 288–297.
- 2.
- 3.
The result in [19] is weaker in two main respects. First, it is restricted to the case that Ax is in \(\{0,1\}^{q}\). This significantly simplifies the analysis and leads to a better bound on the number of rows of A. Second, the order of quantifiers is switched: it shows that for any fixed sparse x, a random subsampled A works with high probability, whereas for the restricted isometry property we need to show that a random A works for all sparse x simultaneously.
References
N. Ailon, B. Chazelle, The fast Johnson–Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput. 39 (1), 302–322 (2009). Preliminary version in STOC’06
N. Ailon, E. Liberty, Fast dimension reduction using Rademacher series on dual BCH codes. Discrete Comput. Geom. 42 (4), 615–630 (2009). Preliminary version in SODA’08
N. Ailon, E. Liberty, An almost optimal unrestricted fast Johnson–Lindenstrauss transform. ACM Trans. Algorithms 9 (3), 21 (2013). Preliminary version in SODA’11
A.S. Bandeira, E. Dobriban, D.G. Mixon, W.F. Sawin, Certifying the restricted isometry property is hard. IEEE Trans. Inform. Theory 59 (6), 3448–3450 (2013)
A.S. Bandeira, M.E. Lewis, D.G. Mixon, Discrete uncertainty principles and sparse signal processing. CoRR abs/1504.01014 (2015)
R. Baraniuk, M. Davenport, R. DeVore, M. Wakin, A simple proof of the restricted isometry property for random matrices. Constr. Approx. 28 (3), 253–263 (2008)
J. Bourgain, An improved estimate in the restricted isometry problem, in Geometric Aspects of Functional Analysis. Lecture Notes in Mathematics, vol. 2116, pp. 65–70 (Springer, Berlin, 2014)
E.J. Candès, The restricted isometry property and its implications for compressed sensing. C. R. Math. 346 (9–10), 589–592 (2008)
E.J. Candès, T. Tao, Decoding by linear programming. IEEE Trans. Inform. Theory 51 (12), 4203–4215 (2005)
E.J. Candès, T. Tao, Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inform. Theory 52 (12), 5406–5425 (2006)
E.J. Candès, M. Rudelson, T. Tao, R. Vershynin, Error correction via linear programming, in 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS, pp. 295–308 (2005)
E.J. Candès, J.K. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59 (8), 1207–1223 (2006)
S.S. Chen, D.L. Donoho, M.A. Saunders, Atomic decomposition by basis pursuit. SIAM J. Comput. 20 (1), 33–61 (1998)
M. Cheraghchi, V. Guruswami, A. Velingker, Restricted isometry of Fourier matrices and list decodability of random linear codes. SIAM J. Comput. 42 (5), 1888–1914 (2013). Preliminary version in SODA’13
S. Dirksen, Tail bounds via generic chaining. Electron. J. Prob. 20 (53), 1–29 (2015)
D.L. Donoho, M. Elad, V.N. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 (1), 6–18 (2006)
S. Foucart, A. Pajor, H. Rauhut, T. Ullrich, The Gelfand widths of ℓp-balls for 0 < p ≤ 1. J. Complex. 26 (6), 629–640 (2010)
A.Y. Garnaev, E.D. Gluskin, On the widths of Euclidean balls. Sov. Math. Dokl. 30, 200–203 (1984)
I. Haviv, O. Regev, The list-decoding size of Fourier-sparse boolean functions, in Proceedings of the 30th Conference on Computational Complexity, CCC, pp. 58–71 (2015)
P. Indyk, I. Razenshteyn, On model-based RIP-1 matrices, in Automata, Languages, and Programming - 40th International Colloquium, ICALP, pp. 564–575 (2013)
A.C. Kak, M. Slaney, Principles of Computerized Tomographic Imaging (Society of Industrial and Applied Mathematics, Philadelphia, 2001)
F. Krahmer, R. Ward, New and improved Johnson–Lindenstrauss embeddings via the restricted isometry property. SIAM J. Math. Anal. 43 (3), 1269–1281 (2011)
F. Krahmer, S. Mendelson, H. Rauhut, Suprema of chaos processes and the restricted isometry property. CoRR abs/1207.0235 (2012)
C. McDiarmid, Concentration, in Probabilistic Methods for Algorithmic Discrete Mathematics. Algorithms Combination, vol. 16 (Springer, Berlin, 1998), pp. 195–248
A. Natarajan, Y. Wu, Computational complexity of certifying restricted isometry property, in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX, pp. 371–380 (2014)
D. Needell, J.A. Tropp, CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Commun. ACM 53 (12), 93–100 (2010)
J. Nelson, E. Price, M. Wootters, New constructions of RIP matrices with fast multiplication and fewer rows, in Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp. 1515–1528 (2014)
D.G. Nishimura, Principles of Magnetic Resonance Imaging (Stanford University, Stanford, CA, 2010)
H. Rauhut, Compressive sensing and structured random matrices, in Theoretical Foundations and Numerical Methods for Sparse Recovery, vol. 9, ed. by M. Fornasier (De Gruyter, Berlin, 2010), pp. 1–92
M. Rudelson, R. Vershynin, On sparse reconstruction from Fourier and Gaussian measurements. Commun. Pure Appl. Math. 61 (8), 1025–1045 (2008). Preliminary version in CISS’06
A.M. Tillmann, M.E. Pfetsch, The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing. IEEE Trans. Inform. Theory 60 (2), 1248–1259 (2014)
S. Wenger, S. Darabi, P. Sen, K. Glassmeier, M.A. Magnor, Compressed sensing for aperture synthesis imaging, in Proceedings of the International Conference on Image Processing, ICIP, pp. 1381–1384 (2010)
M. Wootters, On the list decodability of random linear codes with large error rates, in Proceedings of the 45th Annual ACM Symposium on Theory of Computing, STOC, pp. 853–860 (2013)
Acknowledgements
We thank Afonso S. Bandeira, Mahdi Cheraghchi, Michael Kapralov, Jelani Nelson, and Eric Price for useful discussions, and anonymous reviewers for useful comments.
Oded Regev was supported by the Simons Collaboration on Algorithms and Geometry and by the National Science Foundation (NSF) under Grant No. CCF-1320188. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.
© 2017 Springer International Publishing AG
Haviv, I., Regev, O. (2017). The Restricted Isometry Property of Subsampled Fourier Matrices. In: Klartag, B., Milman, E. (eds) Geometric Aspects of Functional Analysis. Lecture Notes in Mathematics, vol 2169. Springer, Cham. https://doi.org/10.1007/978-3-319-45282-1_11
Print ISBN: 978-3-319-45281-4
Online ISBN: 978-3-319-45282-1