1 Introduction

A matrix \(A \in \mathbb{C}^{q\times N}\) satisfies the restricted isometry property of order k with constant ε > 0 if for every k-sparse vector \(x \in \mathbb{C}^{N}\) (i.e., a vector with at most k nonzero entries), it holds that

$$\displaystyle{ (1-\epsilon ) \cdot \| x\|_{2}^{2} \leq \| Ax\|_{ 2}^{2} \leq (1+\epsilon ) \cdot \| x\|_{ 2}^{2}\;. }$$
(1)

Intuitively, this means that every k columns of A are nearly orthogonal. This notion, due to Candès and Tao [9], was intensively studied during the last decade and found various applications and connections to several areas of theoretical computer science, including sparse recovery [8, 20, 27], coding theory [14], norm embeddings [6, 22], and computational complexity [4, 25, 31].

The original motivation for the restricted isometry property comes from the area of compressed sensing. There, one wishes to compress a high-dimensional sparse vector \(x \in \mathbb{C}^{N}\) to a vector Ax, where \(A \in \mathbb{C}^{q\times N}\) is a measurement matrix that enables reconstruction of x from Ax. Typical goals in this context include minimizing the number of measurements q and the running time of the reconstruction algorithm. It is known that the restricted isometry property of A, for \(\epsilon <\sqrt{2} - 1\), is a sufficient condition for reconstruction. In fact, it was shown in [8, 9, 11, 12] that under this condition, reconstruction is equivalent to finding the vector of least \(\ell_1\) norm among all vectors that agree with the given measurements, a task that can be formulated as a linear program [13, 16], and thus can be solved efficiently.
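
To make the reconstruction step concrete, here is a minimal numerical sketch (ours, not taken from the paper) of \(\ell_1\)-minimization, also known as basis pursuit, formulated as a linear program. For simplicity it uses a real-valued signal and a Gaussian measurement matrix; all names and parameters are illustrative only.

```python
# A minimal sketch (ours, not from the paper) of sparse recovery by l1
# minimization (basis pursuit): minimize ||x||_1 subject to Ax = b.
# For simplicity the signal and the measurement matrix are real-valued.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, q, k = 128, 40, 5

x_true = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
x_true[support] = rng.standard_normal(k)

A = rng.standard_normal((q, N)) / np.sqrt(q)    # measurement matrix
b = A @ x_true                                  # the q measurements

# Write x = x_pos - x_neg with x_pos, x_neg >= 0; then ||x||_1 is the sum of
# both parts, and Ax = b becomes a linear equality constraint in 2N variables.
c = np.ones(2 * N)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * N), method="highs")
x_hat = res.x[:N] - res.x[N:]

print("recovery error:", np.linalg.norm(x_hat - x_true))
```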

The above application leads to the challenge of finding matrices \(A \in \mathbb{C}^{q\times N}\) that satisfy the restricted isometry property and have a small number of rows q as a function of N and k. (For simplicity, we ignore for now the dependence on ε.) A general lower bound of \(q = \Omega (k \cdot \log (N/k))\) is known to follow from [18] (see also [17]). Fortunately, there are matrices that match this lower bound, e.g., random matrices whose entries are chosen independently according to the normal distribution [10]. However, in many applications the measurement matrix cannot be chosen arbitrarily but is instead given by a random sample of rows from a unitary matrix, typically the discrete Fourier transform. This includes, for instance, various tests and experiments in medicine and biology (e.g., MRI [28] and ultrasound imaging [21]) and applications in astronomy (e.g., radio telescopes [32]). An advantage of subsampled Fourier matrices is that they support fast matrix-vector multiplication, and as such, are useful for efficient compression as well as for efficient reconstruction based on iterative methods (see, e.g., [26]).
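
The following small sketch (ours; the names and parameters are illustrative) shows the point about fast multiplication: for a subsampled DFT one can compute Ax with a single FFT followed by a coordinate selection, without ever materializing the q × N matrix.

```python
# A small sketch (assumptions ours, not from the paper): measuring with a
# subsampled DFT matrix via one FFT instead of an explicit q x N matrix.
import numpy as np

rng = np.random.default_rng(1)
N, q = 1024, 64
rows = rng.integers(0, N, size=q)         # sampled row indices (with repetition)

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Unitary DFT: M = F / sqrt(N), and A = sqrt(N/q) * M[rows, :].
Mx = np.fft.fft(x) / np.sqrt(N)           # M x in O(N log N) time
Ax_fast = np.sqrt(N / q) * Mx[rows]       # A x without forming A

# Reference computation with the explicit matrix, for comparison only.
F = np.fft.fft(np.eye(N)) / np.sqrt(N)
Ax_dense = np.sqrt(N / q) * (F[rows, :] @ x)
print(np.allclose(Ax_fast, Ax_dense))     # True
```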

In recent years, with motivation from both theory and practice, an intensive line of research has aimed to study the restricted isometry property of random sub-matrices of unitary matrices. Letting \(A \in \mathbb{C}^{q\times N}\) be a (normalized) matrix whose rows are chosen uniformly and independently from the rows of a unitary matrix \(M \in \mathbb{C}^{N\times N}\), the goal is to prove an upper bound on q for which A is guaranteed to satisfy the restricted isometry property with high probability. Note that the fact that the entries of every row of A are not independent makes this question much more difficult than in the case of random matrices with independent entries.

The first upper bound on the number of rows of a subsampled Fourier matrix that satisfies the restricted isometry property was \(O(k \cdot \log^{6} N)\), which was proved by Candès and Tao [10]. This was then improved by Rudelson and Vershynin [30] to \(O(k \cdot \log^{2} k \cdot \log (k\log N) \cdot \log N)\) (see also [15, 29] for a simplified analysis with better success probability). A modification of their analysis led to an improved bound of \(O(k \cdot \log^{3} k \cdot \log N)\) by Cheraghchi, Guruswami, and Velingker [14], who related the problem to a question on the list-decoding rate of random linear codes over finite fields. Interestingly, replacing the \(\log (k\log N)\) term in the bound of [30] by \(\log k\) was crucial for their application. Recently, Bourgain [7] proved a bound of \(O(k \cdot \log k \cdot \log^{2} N)\), which is incomparable to those of [14, 30] (and has a worse dependence on ε; see below). We finally mention that the best known lower bound on the number of rows is \(\Omega (k \cdot \log N)\) [5].

1.1 Our Contribution

In this work, we improve the previous bounds and prove the following.

Theorem 1.1 (Simplified)

Let \(M \in \mathbb{C}^{N\times N}\) be a unitary matrix with entries of absolute value \(O(1/\sqrt{N})\) , and let ε > 0 be a fixed constant. For some \(q = O(k \cdot \log^{2} k \cdot \log N)\), let \(A \in \mathbb{C}^{q\times N}\) be a matrix whose q rows are chosen uniformly and independently from the rows of M, multiplied by \(\sqrt{N/q}\) . Then, with high probability, the matrix A satisfies the restricted isometry property of order k with constant ε.
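
As an illustration of the statement (ours, not part of the paper), one can build A from random rows of the unitary DFT, scaled by \(\sqrt{N/q}\), and check \(\|Ax\|_2^2/\|x\|_2^2\) on a few random k-sparse vectors; this is only a sanity check on sampled vectors, not a certificate of the restricted isometry property.

```python
# An illustrative sketch (ours): empirically checking ||Ax||_2^2 / ||x||_2^2
# on a few random k-sparse vectors, where A consists of q random rows of the
# unitary DFT scaled by sqrt(N/q) as in Theorem 1.1. This is only a sanity
# check on sampled vectors, not a certificate of the RIP over all k-sparse x.
import numpy as np

rng = np.random.default_rng(2)
N, k = 1 << 16, 10
q = int(2 * k * np.log2(k) ** 2 * np.log2(N))   # ~ k * log^2 k * log N rows
rows = rng.integers(0, N, size=q)

ratios = []
for _ in range(100):
    x = np.zeros(N, dtype=complex)
    support = rng.choice(N, size=k, replace=False)
    x[support] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
    Mx = np.fft.fft(x) / np.sqrt(N)             # unitary DFT applied to x
    Ax = np.sqrt(N / q) * Mx[rows]              # subsampled and rescaled rows
    ratios.append(np.linalg.norm(Ax) ** 2 / np.linalg.norm(x) ** 2)

print("min and max ratio:", min(ratios), max(ratios))   # both close to 1
```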

The main idea in our proof is described in Sect. 1.3. We arrived at the proof from our recent work on list-decoding [19], where a baby version of the idea was used to bound the sample complexity of learning the class of Fourier-sparse Boolean functions. Like all previous work on this question, our proof can be seen as a careful union bound applied to a sequence of progressively finer nets, a technique sometimes known as chaining. However, unlike the work of Rudelson and Vershynin [30] and its improvements [14, 15], we avoid the use of Gaussian processes, the “symmetrization process,” and Dudley’s inequality. Instead, we follow and refine Bourgain’s proof [7], and apply the chaining argument directly to the problem at hand using only elementary arguments. It would be interesting to see if our proof can be cast in the Gaussian framework of Rudelson and Vershynin.

We remark that the bounds obtained in the previous works [14, 30] have a multiplicative \(O(\epsilon^{-2})\) term, whereas a much worse term of \(O(\epsilon^{-6})\) was obtained in [7]. In our proof of Theorem 1.1 we nearly obtain the best known dependence on ε. For simplicity of presentation we first prove in Sect. 3 our bound with a weaker multiplicative term of \(O(\epsilon^{-4})\), and then, in Sect. 4, we modify the analysis and decrease the dependence on ε to \(O(\epsilon^{-2})\) up to logarithmic terms.

1.2 Related Literature

As mentioned before, one important advantage of using subsampled Fourier matrices in compressed sensing is that they support fast, in fact nearly linear time, matrix-vector multiplication. In certain scenarios, however, one is not restricted to using subsampled Fourier matrices as the measurement matrix. The question then is whether one can decrease the number of rows using another measurement matrix, while still keeping the near-linear multiplication time. For \(k < N^{1/2-\gamma}\) where γ > 0 is an arbitrary constant, the answer is yes: a construction with the optimal number \(O(k \cdot \log N)\) of rows follows from works by Ailon and Chazelle [1] and Ailon and Liberty [2] (see [6]). For general k, Nelson, Price, and Wootters [27] suggested taking subsampled Fourier matrices and “tweaking” them by bunching together rows with random signs. Using the Gaussian-process-based analysis of [14, 30] and introducing further techniques from [23], they showed that with this construction one can reduce the number of rows by a logarithmic factor to \(O(k \cdot \log^{2}(k\log N) \cdot \log N)\) while still keeping the nearly linear multiplication time. Our result shows that the same number of rows (in fact, a slightly smaller number) can be achieved already with the original subsampled Fourier matrices without having to use the “tweak.” A natural open question is whether the “tweak” from [27] and their techniques can be combined with ours to further reduce the number of rows. An improvement in the regime of parameters of \(k =\omega (\sqrt{N})\) would lead to more efficient low-dimensional embeddings based on Johnson–Lindenstrauss matrices (see, e.g., [13, 22, 27]).

1.3 Proof Overview

Recall from Theorem 1.1 and from (1) that our goal is to prove that a matrix A given by a random sample Q of q rows of M satisfies with high probability that for all k-sparse x, \(\|Ax\|_2^2 \approx \|x\|_2^2\). Since M is unitary, the latter is equivalent to saying that \(\|Ax\|_2^2 \approx \|Mx\|_2^2\). Yet another way of expressing this condition is as

$$\displaystyle{\mathop{\mathbb{E}} _{j\in Q}\left [(\vert Mx\vert ^{2})_{ j}\right ] \approx \mathop{\mathbb{E}} _{j\in [N]}\left [(\vert Mx\vert ^{2})_{ j}\right ]\;,}$$

i.e., that a sample Q ⊆ [N] of q coordinates of the vector \(|Mx|^2\) gives a good approximation to the average of all its coordinates. Here, \(|Mx|^2\) refers to the vector obtained by taking the squared absolute value of Mx coordinate-wise. For reasons that will become clear soon, it will be convenient to assume without loss of generality that \(\|x\|_1 = 1\). With this scaling, the sparsity assumption implies that \(\|Mx\|_2^2\) is not too small (namely at least 1∕k, since by the Cauchy-Schwarz inequality \(\|Mx\|_2^2 = \|x\|_2^2 \geq \|x\|_1^2/k = 1/k\)), and this will determine the amount of additive error we can afford in the approximation above. This is the only way we use the sparsity assumption.

At a high level, the proof proceeds by defining a finite set of vectors \(\mathcal{H}\) that forms a net, i.e., a set satisfying that any vector | Mx | 2 is close to one of the vectors in \(\mathcal{H}\). We then argue using the Chernoff-Hoeffding bound that for any fixed vector \(h \in \mathcal{H}\), a sample of q coordinates gives a good approximation to the average of h. Finally, we complete the proof by a union bound over all \(h \in \mathcal{H}\).

In order to define the set \(\mathcal{H}\) we notice that since \(\|x\|_1 = 1\), Mx can be seen as a weighted average of the columns of M (possibly with signs). In other words, we can think of Mx as the expectation of a vector-valued random variable given by a certain probability distribution over the columns of M. Using the Chernoff-Hoeffding bound again, this implies that we can approximate Mx well by taking the average over a small number of samples from this distribution. We then let \(\mathcal{H}\) be the set of all possible such averages, and a bound on the cardinality of \(\mathcal{H}\) follows easily (basically N raised to the number of samples). This technique is sometimes referred to as Maurey’s empirical method.

The argument above is actually oversimplified, and carrying it out leads to rather bad bounds on q. As a result, our proof in Sect. 3 is slightly more delicate. Namely, instead of just one set \(\mathcal{H}\), we have a sequence of sets, \(\mathcal{H}_{1},\mathcal{H}_{2},\ldots\), each being responsible for approximating a different scale of \(|Mx|^2\). The first set \(\mathcal{H}_{1}\) approximates \(|Mx|^2\) on coordinates on which its value is highest; since the value is high, we need fewer samples in order to approximate it well, as a result of which the set \(\mathcal{H}_{1}\) is small. The next set \(\mathcal{H}_{2}\) approximates \(|Mx|^2\) on coordinates on which its value is somewhat smaller, and is therefore a bigger set, and so on and so forth. The end result is that any vector \(|Mx|^2\) can be approximately decomposed into a sum \(\sum_i h^{(i)}\), with \(h^{(i)} \in \mathcal{H}_{i}\). To complete the proof, we argue that a random choice of q coordinates approximates all the vectors in all the \(\mathcal{H}_{i}\) well. The reason working with several \(\mathcal{H}_{i}\) leads to the better bound stated in Theorem 1.1 is this: even though as i increases the number of vectors in \(\mathcal{H}_{i}\) grows, the quality of approximation that we need the q coordinates to provide decreases, since the value of \(|Mx|^2\) there is small and so errors are less significant. It turns out that these two requirements on q balance each other perfectly, leading to the desired bound on q.

2 Preliminaries

Notation

The notation \(x \approx_{\epsilon,\alpha} y\) means that \(x \in [(1-\epsilon)y - \alpha, (1+\epsilon)y + \alpha]\). For a matrix M, we denote by \(M^{(\ell)}\) the \(\ell\)-th column of M and define \(\|M\|_{\infty} = \max_{i,j} |M_{i,j}|\).
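
For quick reference, the following tiny helper (ours, for illustration only) spells out the relation \(x \approx_{\epsilon,\alpha} y\) used throughout; it assumes y ≥ 0, as in all uses below.

```python
# Illustration only (not part of the paper): the relation x ~_{eps, alpha} y,
# i.e., x lies in [(1 - eps) * y - alpha, (1 + eps) * y + alpha]. We assume
# y >= 0, which is the case everywhere the notation is used below.
def approx(x: float, y: float, eps: float, alpha: float) -> bool:
    return (1 - eps) * y - alpha <= x <= (1 + eps) * y + alpha

assert approx(1.04, 1.0, 0.05, 0.0)    # inside the multiplicative window
assert approx(0.002, 0.0, 0.5, 0.01)   # the additive slack covers small values
assert not approx(2.0, 1.0, 0.1, 0.1)
```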

The Restricted Isometry Property The restricted isometry property is defined as follows.

Definition 2.1

We say that a matrix \(A \in \mathbb{C}^{q\times N}\) satisfies the restricted isometry property of order k with constant ε if for every k-sparse vector \(x \in \mathbb{C}^{N}\) it holds that

$$\displaystyle{(1-\epsilon ) \cdot \| x\|_{2}^{2} \leq \| Ax\|_{ 2}^{2} \leq (1+\epsilon ) \cdot \| x\|_{ 2}^{2}.}$$

Chernoff-Hoeffding Bounds We now state the Chernoff-Hoeffding bound (see, e.g., [24]) and derive several simple corollaries that will be used extensively later.

Theorem 2.2

Let \(X_1,\ldots,X_N\) be N identically distributed independent random variables in [0,a] satisfying \(\mathop{\mathbb{E}}[X_{i}] =\mu\) for all i, and denote \(\overline{X} = \frac{1} {N} \cdot \sum _{i=1}^{N}X_{ i}\) . Then there exists a universal constant C such that for every 0 < ε ≤ 1∕2, the probability that \(\overline{X} \approx _{\epsilon,0}\mu\) is at least \(1 - 2e^{-C\cdot N\mu \epsilon ^{2}/a }\) .
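
The following small simulation (ours) illustrates the flavor of Theorem 2.2: for bounded i.i.d. variables, the probability that the empirical mean leaves the window \((1 \pm \epsilon)\mu\) drops rapidly as the number of samples grows. The uniform distribution and all parameters below are arbitrary choices made only for the illustration.

```python
# A small simulation (ours) illustrating the flavor of Theorem 2.2: for i.i.d.
# variables in [0, a], the probability that the empirical mean leaves the
# window (1 +/- eps) * mu drops quickly as the number of samples N grows.
import numpy as np

rng = np.random.default_rng(3)
a, mu, eps, trials = 1.0, 0.3, 0.1, 2000

for N in (50, 200, 800, 3200):
    # values in [0, 2*mu], a subset of [0, a], with mean mu
    samples = rng.uniform(0.0, 2 * mu, size=(trials, N))
    means = samples.mean(axis=1)
    fail = np.mean((means < (1 - eps) * mu) | (means > (1 + eps) * mu))
    print(f"N = {N:5d}: empirical failure probability = {fail:.4f}")
```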

Corollary 2.3

Let \(X_1,\ldots,X_N\) be N identically distributed independent random variables in [0,a] satisfying \(\mathop{\mathbb{E}}[X_{i}] =\mu\) for all i, and denote \(\overline{X} = \frac{1} {N} \cdot \sum _{i=1}^{N}X_{ i}\) . Then there exists a universal constant C such that for every 0 < ε ≤ 1∕2 and α > 0, the probability that \(\overline{X} \approx _{\epsilon,\alpha }\mu\) is at least \(1 - 2e^{-C\cdot N\alpha \epsilon /a}\) .

Proof

If \(\mu \geq \frac{\alpha }{\varepsilon }\) then by Theorem 2.2 the probability that \(\overline{X} \approx _{\epsilon,0}\mu\) is at least \(1 - 2e^{-C\cdot N\mu \epsilon ^{2}/a }\), which is at least \(1 - 2e^{-C\cdot N\alpha \epsilon /a}\). Otherwise, Theorem 2.2 for \(\tilde{\varepsilon }= \frac{\alpha }{\mu }>\epsilon\) implies that the probability that \(\overline{X} \approx _{\tilde{\varepsilon },0}\mu\), hence \(\overline{X} \approx _{0,\alpha }\mu\), is at least \(1 - 2e^{-C\cdot N\mu \tilde{\varepsilon }^{2}/a }\), and the latter is at least \(1 - 2e^{-C\cdot N\alpha \epsilon /a}\).

Corollary 2.4

Let \(X_1,\ldots,X_N\) be N identically distributed independent random variables in [−a,+a] satisfying \(\mathop{\mathbb{E}}[X_{i}] =\mu\) and \(\mathop{\mathbb{E}}[\vert X_{i}\vert ] =\tilde{\mu }\) for all i, and denote \(\overline{X} = \frac{1} {N} \cdot \sum _{i=1}^{N}X_{ i}\) . Then there exists a universal constant C such that for every \(0 < \epsilon^{\prime} \leq 1/2\) and α > 0, the probability that \(\overline{X} \approx _{0,\epsilon ^{{\prime}}\cdot \tilde{\mu }+\alpha }\mu\) is at least \(1 - 4e^{-C\cdot N\alpha \epsilon ^{{\prime}}/a }\) .

Proof

The corollary follows by applying Corollary 2.3 to \(\max(X_i, 0)\) and to \(-\min(X_i, 0)\).

We end with the additive form of the bound, followed by an easy extension to the complex case.

Corollary 2.5

Let \(X_1,\ldots,X_N\) be N identically distributed independent random variables in [−a,+a] satisfying \(\mathop{\mathbb{E}}[X_{i}] =\mu\) for all i, and denote \(\overline{X} = \frac{1} {N} \cdot \sum _{i=1}^{N}X_{ i}\) . Then there exists a universal constant C such that for every b > 0, the probability that \(\overline{X} \approx _{0,b}\mu\) is at least \(1 - 4e^{-C\cdot Nb^{2}/a^{2} }\) .

Proof

We can assume that b ≤ 2a. The corollary follows by applying Corollary 2.4 to, say, α = 3b∕4 and \(\epsilon^{\prime} = b/(4a)\).

Corollary 2.6

Let \(X_1,\ldots,X_N\) be N identically distributed independent complex-valued random variables satisfying \(|X_i| \leq a\) and \(\mathop{\mathbb{E}}[X_{i}] =\mu\) for all i, and denote \(\overline{X} = \frac{1} {N} \cdot \sum _{i=1}^{N}X_{ i}\) . Then there exists a universal constant C such that for every b > 0, the probability that \(\vert \overline{X}\vert \approx _{0,b}\vert \mu \vert\) is at least \(1 - 8e^{-C\cdot Nb^{2}/a^{2} }\) .

Proof

By Corollary 2.5, applied to the real and imaginary parts of the random variables \(X_1,\ldots,X_N\), it follows that for a universal constant C, the probability that \(\mathsf{Re}(\overline{X}) \approx _{0,b/\sqrt{2}}\mathsf{Re}(\mu )\) and \(\mathsf{Im}(\overline{X}) \approx _{0,b/\sqrt{2}}\mathsf{Im}(\mu )\) is at least \(1 - 8e^{-C\cdot Nb^{2}/a^{2} }\). By the triangle inequality, it follows that with such probability we have \(\vert \overline{X}\vert \approx _{0,b}\vert \mu \vert\), as required.

3 The Simpler Analysis

In this section we prove our result with a multiplicative term of \(O(\epsilon^{-4})\) in the bound. This will be obtained in Theorem 3.7 as an easy corollary of the following theorem.

Theorem 3.1

For a sufficiently large N, a matrix \(M \in \mathbb{C}^{N\times N}\) , and sufficiently small ε,η > 0, the following holds. For some \(q = O(\epsilon^{-3}\eta^{-1}\log N \cdot \log^{2}(1/\eta))\), let Q be a multiset of q uniform and independent random elements of [N]. Then, with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (1/\eta )) }\) , it holds that for every \(x \in \mathbb{C}^{N}\) ,

$$\displaystyle{\mathop{\mathbb{E}} _{j\in Q}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{\epsilon,\eta \cdot \|x\|_{1}^{2}\cdot \|M\|_{\infty }^{2}}\mathop{ \mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ].}$$

Throughout the proof we assume without loss of generality that the matrix \(M \in \mathbb{C}^{N\times N}\) satisfies \(\|M\|_{\infty} = 1\). For ε, η > 0, we denote \(t = \log_{2}(1/\eta)\), \(r = \log_{2}(1/\epsilon^{2})\), and \(\gamma = \eta/(2t)\).

We now define the approximating vector sets \(\mathcal{H}_{i}\), \(i = 1,\ldots,t\), each responsible for coordinates of \(|Mx|^2\) of a different scale (the larger the i the smaller the scale). We start by defining the “raw approximations” \(\mathcal{G}_{i}\), which are essentially vectors obtained by averaging a certain number of columns of M. We then define the vectors in \(\mathcal{H}_{i}\) by restricting the vectors in \(\mathcal{G}_{i}\) (actually \(\mathcal{G}_{i+r}\)) to the set of coordinates \(B_i\) where there is a clear “signal” and not just noise. This is necessary in order to make sure that the small coordinates of \(|Mx|^2\) are not flooded by noise from the coarse approximations. Details follow.

The Vector Sets \(\mathcal{G}_{i}\) For every 1 ≤ i ≤ t + r, let \(\mathcal{G}_{i}\) denote the set of all vectors \(g^{(i)} \in \mathbb{C}^{N}\) that can be represented as

$$\displaystyle{ g^{(i)} = \frac{\sqrt{2}} {\vert F\vert }\cdot \sum _{(\ell,s)\in F}(-1)^{s/2} \cdot M^{(\ell)} }$$
(2)

for a multiset F of \(O(2^{i} \cdot \log(1/\gamma))\) pairs in [N] ×{ 0, 1, 2, 3}. A trivial counting argument gives the following.

Claim 3.2

For every 1 ≤ i ≤ t + r, \(\vert \mathcal{G}_{i}\vert \leq N^{O(2^{i}\cdot \log (1/\gamma )) }.\)

The Vector Sets \(\mathcal{H}_{i}\) For a t-tuple of vectors \((g^{(1+r)},\ldots,g^{(t+r)}) \in \mathcal{G}_{1+r} \times \cdots \times \mathcal{G}_{t+r}\) and for 1 ≤ i ≤ t, let B i be the set of all j ∈ [N] for which i is the smallest index satisfying | g j (i+r) | ≥ 2 ⋅ 2i∕2. For such i, define the vector h (i) by

$$\displaystyle\begin{array}{rcl} h_{j}^{(i)} =\min (\vert g_{ j}^{(i+r)}\vert ^{2} \cdot \mathbf{1}_{ j\in B_{i}},9 \cdot 2^{-i}).& &{}\end{array}$$
(3)

Let \(\mathcal{H}_{i}\) be the set of all vectors \(h^{(i)}\) that can be obtained in this way.
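
To make the definition concrete, the following sketch (ours) computes the sets \(B_i\) and the vectors \(h^{(i)}\) of (3) from a given tuple of vectors playing the role of \(g^{(1+r)},\ldots,g^{(t+r)}\); the input vectors below are made-up placeholders.

```python
# A concrete rendering (ours) of definition (3): given vectors standing for
# g^{(1+r)}, ..., g^{(t+r)}, compute the sets B_i and the clipped vectors h^{(i)}.
import numpy as np

def build_h(gs):
    """gs[i-1] plays the role of g^{(i+r)}; returns the list of vectors h^{(i)}."""
    t = len(gs)
    N = len(gs[0])
    assigned = np.zeros(N, dtype=bool)       # coordinates already claimed by some B_i
    hs = []
    for i in range(1, t + 1):
        g = np.abs(np.asarray(gs[i - 1]))
        in_B_i = (~assigned) & (g >= 2 * 2 ** (-i / 2))   # i is the smallest such index
        assigned |= in_B_i
        h = np.minimum(g ** 2 * in_B_i, 9 * 2.0 ** (-i))  # definition (3)
        hs.append(h)
    return hs

# Tiny example with made-up vectors (t = 2, N = 4).
gs = [np.array([1.6, 0.3, 1.5, 0.1]), np.array([1.5, 1.2, 1.4, 0.2])]
print(build_h(gs))
```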

Claim 3.3

For every 1 ≤ i ≤ t, \(\vert \mathcal{H}_{i}\vert \leq N^{O(\epsilon ^{-2}\cdot 2^{i}\cdot \log (1/\gamma )) }.\)

Proof

Observe that every \(h^{(i)} \in \mathcal{H}_{i}\) is fully defined by some \((g^{(1+r)},\ldots,g^{(i+r)}) \in \mathcal{G}_{1+r} \times \cdots \times \mathcal{G}_{i+r}\). Hence

$$\displaystyle\begin{array}{rcl} \vert \mathcal{H}_{i}\vert \leq \vert \mathcal{G}_{1+r}\vert \cdots \vert \mathcal{G}_{i+r}\vert \leq N^{O(\log (1/\gamma ))\cdot (2^{1+r}+2^{2+r}+\cdots +2^{i+r}) } \leq N^{O(\log (1/\gamma ))\cdot 2^{i+r+1} }\;.& & {}\\ \end{array}$$

Using the definition of r, the claim follows.

Lemma 3.4

For every \(\tilde{\eta }> 0\) and some \(q = O(\epsilon ^{-3}\tilde{\eta }^{-1}\log N \cdot \log (1/\gamma ))\) , let Q be a multiset of q uniform and independent random elements of [N]. Then, with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (1/\gamma )) }\) , it holds that for all 1 ≤ i ≤ t and \(h^{(i)} \in \mathcal{H}_{i}\) ,

$$\displaystyle{\mathop{\mathbb{E}} _{j\in Q}\left [h_{j}^{(i)}\right ] \approx _{\epsilon,\tilde{\eta }}\mathop{ \mathbb{E}} _{j\in [N]}\left [h_{j}^{(i)}\right ].}$$

Proof

Fix some 1 ≤ i ≤ t and a vector \(h^{(i)} \in \mathcal{H}_{i}\), and denote \(\mu =\mathop{ \mathbb{E}} _{j\in [N]}[h_{j}^{(i)}]\). By Corollary 2.3, applied with \(\alpha =\tilde{\eta }\) and \(a = 9 \cdot 2^{-i}\) (recall that \(h_j^{(i)} \leq a\) for every j), with probability \(1 - 2^{-\Omega (2^{i}\cdot q\epsilon \tilde{\eta }) }\), it holds that \(\mathop{\mathbb{E}} _{j\in Q}[h_{j}^{(i)}] \approx _{\epsilon,\tilde{\eta }}\mu\). Using Claim 3.3, the union bound over all the vectors in \(\mathcal{H}_{i}\) implies that the probability that some \(h^{(i)} \in \mathcal{H}_{i}\) does not satisfy \(\mathop{\mathbb{E}} _{j\in Q}[h_{j}^{(i)}] \approx _{\epsilon,\tilde{\eta }}\mu\) is at most

$$\displaystyle{N^{O(\epsilon ^{-2}\cdot 2^{i}\cdot \log (1/\gamma )) } \cdot 2^{-\Omega (2^{i}\cdot q\epsilon \tilde{\eta }) } \leq 2^{-\Omega (\epsilon ^{-2}\cdot 2^{i}\cdot \log N\cdot \log (1/\gamma )) }\;.}$$

We complete the proof by a union bound over i.

Approximating the Vectors Mx

Lemma 3.5

For every vector \(x \in \mathbb{C}^{N}\) with \(\|x\|_1 = 1\) , every multiset Q ⊆ [N], and every 1 ≤ i ≤ t + r, there exists a vector \(g \in \mathcal{G}_{i}\) that satisfies \(\vert (Mx)_{j}\vert \approx _{0,2^{-i/2}}\vert g_{j}\vert\) for all but at most γ fraction of j ∈ [N] and for all but at most γ fraction of j ∈ Q.

Proof

Observe that for every ℓ ∈ [N] there exist \(p_{\ell,0}, p_{\ell,1}, p_{\ell,2}, p_{\ell,3} \geq 0\) that satisfy

$$\displaystyle{\sum _{s=0}^{3}p_{\ell,s} = \vert x_{\ell}\vert \ \ \ \ \text{and}\ \ \ \ \sqrt{2} \cdot \sum _{s=0}^{3}p_{\ell,s} \cdot (-1)^{s/2} = x_{\ell}.}$$

Notice that the assumption \(\|x\|_1 = 1\) implies that the numbers \(p_{\ell,s}\) form a probability distribution. Thus, the vector Mx can be represented as

$$\displaystyle{Mx =\sum _{ \ell=1}^{N}x_{\ell} \cdot M^{(\ell)} = \sqrt{2}\cdot \sum _{\ell =1}^{N}\sum _{ s=0}^{3}p_{\ell,s} \cdot (-1)^{s/2} \cdot M^{(\ell)} =\mathop{ \mathbb{E}} _{ (\ell,s)\sim D}[\sqrt{2}\cdot (-1)^{s/2}\cdot M^{(\ell)}],}$$

where D is the distribution that assigns probability \(p_{\ell,s}\) to the pair (ℓ, s).

Let F be a multiset of \(O(2^{i} \cdot \log(1/\gamma))\) independent random samples from D, and let \(g \in \mathcal{G}_{i}\) be the vector corresponding to F as in (2). By Corollary 2.6, applied with \(a = \sqrt{2}\) (recall that \(\|M\|_{\infty} = 1\)) and \(b = 2^{-i/2}\), for every j ∈ [N] the probability that

$$\displaystyle\begin{array}{rcl} \vert (Mx)_{j}\vert \approx _{0,2^{-i/2}}\vert g_{j}\vert & &{}\end{array}$$
(4)

is at least 1 −γ∕4. It follows that the expected number of j ∈ [N] that do not satisfy (4) is at most γ N∕4, so by Markov’s inequality the probability that the number of j ∈ [N] that do not satisfy (4) is at most γ N is at least 3∕4. Similarly, the expected number of j ∈ Q that do not satisfy (4) is at most γ | Q | ∕4, so by Markov’s inequality, with probability at least 3∕4 it holds that the number of j ∈ Q that do not satisfy (4) is at most γ | Q | . It follows that there exists a vector \(g \in \mathcal{G}_{i}\) for which (4) holds for all but at most γ fraction of j ∈ [N] and for all but at most γ fraction of j ∈ Q, as required.
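
The proof above is a direct instance of Maurey's empirical method. The following sketch (ours) illustrates it numerically: it builds one valid choice of the weights \(p_{\ell,s}\) (our choice, not necessarily the one intended in the paper), samples pairs (ℓ, s) from the resulting distribution D, and averages the corresponding rotated columns of the unitary DFT to approximate Mx.

```python
# A sketch (ours) of Maurey's empirical method as used here: write Mx as an
# expectation over rotated columns of M and approximate it by an empirical
# average of a few samples. The concrete choice of the weights p_{l,s} below
# is one valid decomposition; it is our illustration, not taken verbatim from
# the paper. Note that (-1)^(s/2) = i^s for s = 0, 1, 2, 3.
import numpy as np

rng = np.random.default_rng(4)
N, k, num_samples = 256, 5, 400

M = np.fft.fft(np.eye(N)) / np.sqrt(N)            # unitary DFT as the matrix M
x = np.zeros(N, dtype=complex)
support = rng.choice(N, size=k, replace=False)
x[support] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
x /= np.abs(x).sum()                              # normalize so that ||x||_1 = 1

# Weights p_{l,s} >= 0 with sum_s p_{l,s} = |x_l| and sqrt(2) * sum_s p_{l,s} * i**s = x_l.
a, b = x.real, x.imag
slack = np.maximum(np.abs(x) - (np.abs(a) + np.abs(b)) / np.sqrt(2), 0.0)
p = np.stack([np.maximum(a, 0) / np.sqrt(2) + slack / 2,    # s = 0, coefficient  1
              np.maximum(b, 0) / np.sqrt(2),                # s = 1, coefficient  i
              np.maximum(-a, 0) / np.sqrt(2) + slack / 2,   # s = 2, coefficient -1
              np.maximum(-b, 0) / np.sqrt(2)], axis=1)      # s = 3, coefficient -i

weights = p.ravel() / p.sum()                     # the distribution D over pairs (l, s)
idx = rng.choice(4 * N, size=num_samples, p=weights)
cols, signs = idx // 4, idx % 4
g = np.sqrt(2) * np.mean(M[:, cols] * (1j ** signs), axis=1)

print("largest coordinate error:", np.max(np.abs(M @ x - g)))
```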

Lemma 3.6

For every multiset Q ⊆ [N] and every vector \(x \in \mathbb{C}^{N}\) with \(\|x\|_1 = 1\) there exists a t-tuple of vectors \((h^{(1)},\ldots,h^{(t)}) \in \mathcal{H}_{1} \times \cdots \times \mathcal{H}_{t}\) for which

  1. \(\mathop{\mathbb{E}} _{j\in Q}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in Q}\left [\sum _{i=1}^{t}h_{j}^{(i)}\right ]\) and

  2. \(\mathop{\mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in [N]}\left [\sum _{i=1}^{t}h_{j}^{(i)}\right ]\) .

Proof

By Lemma 3.5, for every 1 ≤ i ≤ t there exists a vector \(g^{(i+r)} \in \mathcal{G}_{i+r}\) that satisfies

$$\displaystyle\begin{array}{rcl} \vert (Mx)_{j}\vert \approx _{0,2^{-(i+r)/2}}\vert g_{j}^{(i+r)}\vert & &{}\end{array}$$
(5)

for all but at most γ fraction of j ∈ [N] and for all but at most γ fraction of j ∈ Q. We say that j ∈ [N] is good if (5) holds for every 1 ≤ i ≤ t, and otherwise that it is bad. Notice that all but at most tγ fraction of j ∈ [N] are good and that all but at most tγ fraction of j ∈ Q are good. Let \((h^{(1)},\ldots,h^{(t)})\) and \((B_1,\ldots,B_t)\) be the vectors and sets associated with \((g^{(1+r)},\ldots,g^{(t+r)})\) as defined in (3). We claim that \(h^{(1)},\ldots,h^{(t)}\) satisfy the requirements of the lemma.

We first show that for every good j it holds that \(\vert (Mx)_{j}\vert ^{2} \approx _{3\epsilon,9\eta }\sum _{i=1}^{t}h_{j}^{(i)}\). To see this, we observe that if \(j \in B_i\) for some i, then

$$\displaystyle\begin{array}{rcl} 2 \cdot 2^{-i/2} \leq \vert g_{ j}^{(i+r)}\vert \leq 3 \cdot 2^{-i/2}.& &{}\end{array}$$
(6)

The lower bound follows simply from the definition of \(B_i\). For the upper bound, which trivially holds for i = 1, assume that i ≥ 2, and notice that the definition of \(B_i\) implies that \(|g_j^{(i+r-1)}| < 2 \cdot 2^{-(i-1)/2}\). Using (5), and assuming that ε is sufficiently small, we obtain that

$$\displaystyle\begin{array}{rcl} \vert g_{j}^{(i+r)}\vert & \leq & \vert (Mx)_{ j}\vert + 2^{-(i+r)/2} \leq \vert g_{ j}^{(i+r-1)}\vert + 2^{-(i+r-1)/2} + 2^{-(i+r)/2} {}\\ & \leq & 2^{-i/2}(2^{3/2} + 2^{1/2} \cdot \epsilon +\epsilon ) \leq 3 \cdot 2^{-i/2}. {}\\ \end{array}$$

Hence, by the upper bound in (6), for a good \(j \in B_i\) we have \(h_j^{(i)} = |g_j^{(i+r)}|^2\) and \(h_{j}^{(i^{{\prime}}) } = 0\) for \(i^{\prime} \neq i\). Observe that by the lower bound in (6),

$$\displaystyle{\vert (Mx)_{j}\vert \in [\vert g_{j}^{(i+r)}\vert -2^{-(i+r)/2},\vert g_{ j}^{(i+r)}\vert +2^{-(i+r)/2}] \subseteq [(1-\epsilon )\cdot \vert g_{ j}^{(i+r)}\vert,(1+\epsilon )\cdot \vert g_{ j}^{(i+r)}\vert ],}$$

and that this implies that \(\vert (Mx)_{j}\vert ^{2} \approx _{3\epsilon,0}\sum _{i=1}^{t}h_{j}^{(i)}\). On the other hand, in case that j is good but does not belong to any \(B_i\), recalling that \(t = \log_{2}(1/\eta)\), it follows that

$$\displaystyle{\vert (Mx)_{j}\vert \leq \vert g_{j}^{(t+r)}\vert + 2^{-(t+r)/2} \leq 2 \cdot 2^{-t/2} + 2^{-(t+r)/2} \leq 3 \cdot 2^{-t/2} \leq 3\sqrt{\eta },}$$

and thus \(\vert (Mx)_{j}\vert ^{2} \approx _{0,9\eta }0 =\sum _{ i=1}^{t}h_{j}^{(i)}\).

Finally, for every bad j we have

$$\displaystyle{\left \vert \vert (Mx)_{j}\vert ^{2} -\sum _{ i=1}^{t}h_{ j}^{(i)}\right \vert \leq \max \Big (\vert (Mx)_{ j}\vert ^{2},\sum _{ i=1}^{t}h_{ j}^{(i)}\Big) \leq 2.}$$

Indeed, \(\vert (Mx)_{j}\vert \leq \|x\|_1 \cdot \|M\|_{\infty} = 1\), and since the sets \(B_i\) are disjoint, at most one term \(h_j^{(i)}\) is nonzero and it is at most \(|g_j^{(i+r)}|^2 \leq 2\). Since at most tγ fraction of the elements in [N] and in Q are bad, their effect on the difference between the expectations in the lemma can be bounded by 2tγ. By our choice of γ, this is η, completing the proof of the lemma.

Finally, we are ready to prove Theorem 3.1.

Proof of Theorem 3.1

By Lemma 3.4, applied with \(\tilde{\eta }=\eta /(2t)\), a random multiset Q of size

$$\displaystyle\begin{array}{rcl} q = O\Big(\epsilon ^{-3}\eta ^{-1} \cdot t \cdot \log N \cdot \log (1/\gamma )\Big) = O\Big(\epsilon ^{-3}\eta ^{-1}\log N \cdot \log ^{2}(1/\eta )\Big)& & {}\\ \end{array}$$

satisfies with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (1/\eta )) }\) that for all 1 ≤ i ≤ t and \(h^{(i)} \in \mathcal{H}_{i}\),

$$\displaystyle\begin{array}{rcl} \mathop{\mathbb{E}} _{j\in Q}\left [h_{j}^{(i)}\right ] \approx _{\epsilon,\eta /t}\mathop{ \mathbb{E}} _{j\in [N]}\left [h_{j}^{(i)}\right ]\;,& & {}\\ \end{array}$$

in which case we also have

$$\displaystyle{\mathop{\mathbb{E}} _{j\in Q}\left [\sum _{i=1}^{t}h_{ j}^{(i)}\right ] \approx _{\epsilon,\eta }\mathop{ \mathbb{E}} _{j\in [N]}\left [\sum _{i=1}^{t}h_{ j}^{(i)}\right ].}$$

We show that a Q with the above property satisfies the requirement of the theorem. Let \(x \in \mathbb{C}^{N}\) be a vector, and assume without loss of generality that \(\|x\|_1 = 1\). By Lemma 3.6, there exists a t-tuple of vectors \((h^{(1)},\ldots,h^{(t)}) \in \mathcal{H}_{1} \times \cdots \times \mathcal{H}_{t}\) satisfying Items 1 and 2 there. As a result,

$$\displaystyle{\mathop{\mathbb{E}} _{j\in Q}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{ O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ]\;,}$$

and we are done.

3.1 The Restricted Isometry Property

Equipped with Theorem 3.1, it is easy to derive our result on the restricted isometry property (see Definition 2.1) of random sub-matrices of unitary matrices.

Theorem 3.7

For sufficiently large N and k, a unitary matrix \(M \in \mathbb{C}^{N\times N}\) satisfying \(\|M\|_{\infty }\leq O(1/\sqrt{N})\) , and a sufficiently small ε > 0, the following holds. For some \(q = O(\epsilon^{-4} \cdot k \cdot \log^{2}(k/\epsilon) \cdot \log N)\), let \(A \in \mathbb{C}^{q\times N}\) be a matrix whose q rows are chosen uniformly and independently from the rows of M, multiplied by \(\sqrt{N/q}\) . Then, with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (k/\epsilon )) }\) , the matrix A satisfies the restricted isometry property of order k with constant ε.

Proof

Let Q be a multiset of q uniform and independent random elements of [N], defining a matrix A as above. Notice that by the Cauchy-Schwarz inequality, any k-sparse vector \(x \in \mathbb{C}^{N}\) with \(\|x\|_2 = 1\) satisfies \(\|x\|_{1} \leq \sqrt{k}\). Applying Theorem 3.1 with ε∕2 and some \(\eta = \Omega (\epsilon /k)\), we get that with probability \(1 - 2^{-\Omega (\epsilon ^{-2}\cdot \log N\cdot \log (k/\epsilon )) }\), it holds that for every k-sparse \(x \in \mathbb{C}^{N}\) with \(\|x\|_2 = 1\),

$$\displaystyle{\|Ax\|_{2}^{2} = N \cdot \mathop{\mathbb{E}} _{ j\in Q}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{\epsilon /2,\epsilon /2}N \cdot \mathop{\mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ] =\| Mx\|_{ 2}^{2} = 1\;.}$$

It follows that every k-sparse vector \(x \in \mathbb{C}^{N}\) satisfies \(\|Ax\|_2^2 \approx_{\epsilon,0} \|x\|_2^2\), hence A satisfies the restricted isometry property of order k with constant ε.

4 The Improved Analysis

In this section we prove the following theorem, which improves the bound of Theorem 3.1 in terms of the dependence on ε.

Theorem 4.1

For a sufficiently large N, a matrix \(M \in \mathbb{C}^{N\times N}\) , and sufficiently small ε,η > 0, the following holds. For some \(q = O(\log^{2}(1/\epsilon) \cdot \epsilon^{-1}\eta^{-1}\log N \cdot \log^{2}(1/\eta))\), let Q be a multiset of q uniform and independent random elements of [N]. Then, with probability \(1 - 2^{-\Omega (\log N\cdot \log (1/\eta ))}\) , it holds that for every \(x \in \mathbb{C}^{N}\) ,

$$\displaystyle\begin{array}{rcl} \mathop{\mathbb{E}} _{j\in Q}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{\epsilon,\eta \cdot \|x\|_{1}^{2}\cdot \|M\|_{\infty }^{2}}\mathop{ \mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ].& &{}\end{array}$$
(7)

We can assume that ε ≥ η, as otherwise, one can apply the theorem with parameters η∕2, η∕2 and derive (7) for ε, η as well (because the right-hand side is bounded from above by \(\|x\|_1^2 \cdot \|M\|_{\infty}^2\)). As before, we assume without loss of generality that \(\|M\|_{\infty} = 1\). For ε ≥ η > 0, we define \(t = \log_{2}(1/\eta)\) and \(r = \log_{2}(1/\epsilon^{2})\). For the analysis given in this section, we define \(\gamma = \eta/(60(t + r))\). Throughout the proof, we use the vector sets \(\mathcal{G}_{i}\) from Sect. 3 and Lemma 3.5 for this value of γ.

The Vector Sets \(\mathcal{D}_{i,m}\) For a (t + r)-tuple of vectors \((g^{(1)},\ldots,g^{(t+r)}) \in \mathcal{G}_{1} \times \cdots \times \mathcal{G}_{t+r}\) and for 1 ≤ i ≤ t, let \(C_i\) be the set of all j ∈ [N] for which i is the smallest index satisfying \(|g_j^{(i)}| \geq 2 \cdot 2^{-i/2}\). For \(m = i,\ldots,i + r\) define the vector \(h^{(i,m)}\) by

$$\displaystyle\begin{array}{rcl} h_{j}^{(i,m)} = \vert g_{ j}^{(m)}\vert ^{2} \cdot \mathbf{1}_{ j\in C_{i}},& &{}\end{array}$$
(8)

and for other values of m define h (i, m) = 0. Now, for every m, let \(\Delta ^{(i,m)}\) be the vector defined by

$$\displaystyle\begin{array}{rcl} \Delta _{j}^{(i,m)} = \left \{\begin{array}{ll} h_{j}^{(i,m)} - h_{j}^{(i,m-1)},&\text{if }\vert h_{j}^{(i,m)} - h_{j}^{(i,m-1)}\vert \leq 30 \cdot 2^{-(i+m)/2}; \\ 0, &\text{otherwise.} \end{array} \right.& &{}\end{array}$$
(9)

Note that the support of \(\Delta ^{(i,m)}\) is contained in \(C_i\). Let \(\mathcal{D}_{i,m}\) be the set of all vectors \(\Delta ^{(i,m)}\) that can be obtained in this way.
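
As with the sets \(\mathcal{H}_i\) in Sect. 3, the following sketch (ours) renders definitions (8) and (9) concrete, computing the sets \(C_i\) and the truncated difference vectors \(\Delta^{(i,m)}\) from placeholder vectors standing for \(g^{(1)},\ldots,g^{(t+r)}\).

```python
# A concrete rendering (ours) of definitions (8) and (9): given vectors
# standing for g^{(1)}, ..., g^{(t+r)}, compute the sets C_i and the
# truncated difference vectors Delta^{(i,m)}.
import numpy as np

def build_deltas(gs, t, r):
    """Returns a dict mapping (i, m) to the vector Delta^{(i,m)}."""
    N = len(gs[0])
    assigned = np.zeros(N, dtype=bool)
    deltas = {}
    for i in range(1, t + 1):
        g_i = np.abs(np.asarray(gs[i - 1]))
        in_C_i = (~assigned) & (g_i >= 2 * 2 ** (-i / 2))    # i is the smallest such index
        assigned |= in_C_i
        h_prev = np.zeros(N)                                 # h^{(i, i-1)} = 0
        for m in range(i, i + r + 1):
            h_m = np.abs(np.asarray(gs[m - 1])) ** 2 * in_C_i        # definition (8)
            diff = h_m - h_prev
            keep = np.abs(diff) <= 30 * 2 ** (-(i + m) / 2)          # truncation in (9)
            deltas[(i, m)] = diff * keep
            h_prev = h_m
    return deltas

# Tiny example with made-up vectors (t = 2, r = 1, N = 3).
gs = [np.array([1.6, 0.4, 1.5]), np.array([1.5, 1.1, 1.4]), np.array([1.4, 1.0, 1.3])]
print(build_deltas(gs, t=2, r=1))
```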

Claim 4.2

For every 1 ≤ i ≤ t and i ≤ m ≤ i + r, \(\vert \mathcal{D}_{i,m}\vert \leq N^{O(2^{m}\cdot \log (1/\gamma )) }.\)

Proof

Observe that every vector in \(\mathcal{D}_{i,m}\) is fully defined by some \((g^{(1)},\ldots,g^{(m)}) \in \mathcal{G}_{1} \times \cdots \times \mathcal{G}_{m}\). Hence

$$\displaystyle\begin{array}{rcl} \vert \mathcal{D}_{i,m}\vert \leq \vert \mathcal{G}_{1}\vert \cdots \vert \mathcal{G}_{m}\vert \leq N^{O(\log (1/\gamma ))\cdot (2^{1}+2^{2}+\cdots +2^{m}) } \leq N^{O(\log (1/\gamma ))\cdot 2^{m+1} }\;,& & {}\\ \end{array}$$

and the claim follows.

Lemma 4.3

For every \(\tilde{\varepsilon },\tilde{\eta }> 0\) and some \(q = O(\tilde{\varepsilon }^{-1}\tilde{\eta }^{-1}\log N \cdot \log (1/\gamma ))\) , let Q be a multiset of q uniform and independent random elements of [N]. Then, with probability \(1 - 2^{-\Omega (\log N\cdot \log (1/\gamma ))}\) , it holds that for every 1 ≤ i ≤ t, every i ≤ m ≤ i + r, and every vector \(\Delta ^{(i,m)} \in \mathcal{D}_{i,m}\) associated with a set \(C_i\) ,

$$\displaystyle{ \mathop{\mathbb{E}} _{j\in Q}\left [\Delta _{j}^{(i,m)}\right ] \approx _{ 0,b}\mathop{ \mathbb{E}} _{j\in [N]}\left [\Delta _{j}^{(i,m)}\right ] \ \ \text{for}\ \ b = O\Big(\tilde{\varepsilon }\cdot 2^{-i} \cdot \frac{\vert C_{i}\vert } {N} +\tilde{\eta }\Big)\;. }$$
(10)

Proof

Fix i, m, and a vector \(\Delta ^{(i,m)} \in \mathcal{D}_{i,m}\) associated with a set \(C_i\) as in (9). Notice that

$$\displaystyle{\mathop{\mathbb{E}} _{j\in [N]}[\vert \Delta _{j}^{(i,m)}\vert ] \leq 30 \cdot 2^{-(i+m)/2} \cdot \frac{\vert C_{i}\vert } {N} \;.}$$

By Corollary 2.4, applied with

$$\displaystyle{\epsilon ^{{\prime}} =\tilde{\varepsilon } \cdot 2^{(m-i)/2},\ \ \ \alpha =\tilde{\eta },\ \ \ \text{and}\ \ \ a = 30 \cdot 2^{-(i+m)/2},}$$

we have that (10) holds with probability \(1 - 2^{-\Omega (2^{m}\cdot q\tilde{\varepsilon }\tilde{\eta }) }\). Using Claim 4.2, the union bound over all the vectors in \(\mathcal{D}_{i,m}\) implies that the probability that some \(\Delta ^{(i,m)} \in \mathcal{D}_{i,m}\) does not satisfy (10) is at most

$$\displaystyle{N^{O(2^{m}\cdot \log (1/\gamma )) } \cdot 2^{-\Omega (2^{m}\cdot q\tilde{\varepsilon }\tilde{\eta }) } \leq 2^{-\Omega (2^{m}\cdot \log N\cdot \log (1/\gamma )) }\;.}$$

The result follows by a union bound over i and m.

Approximating the Vectors Mx

Lemma 4.4

For every multiset Q ⊆ [N] and every vector \(x \in \mathbb{C}^{N}\) with \(\|x\|_1 = 1\) there exist vector collections \((\Delta ^{(i,m)} \in \mathcal{D}_{i,m})_{m=i,\ldots,i+r}\) associated with sets \(C_i\) (1 ≤ i ≤ t), for which

  1. \(\mathop{\mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ] \geq \sum _{i=1}^{t}2^{-i} \cdot \frac{\vert C_{i}\vert } {N} -\eta,\)

  2. \(\mathop{\mathbb{E}} _{j\in Q}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in Q}\left [\sum _{i=1}^{t}\sum _{m=i}^{i+r}\Delta _{j}^{(i,m)}\right ],\) and

  3. \(\mathop{\mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in [N]}\left [\sum _{i=1}^{t}\sum _{m=i}^{i+r}\Delta _{j}^{(i,m)}\right ].\)

Proof

By Lemma 3.5, for every 1 ≤ i ≤ t + r there exists a vector \(g^{(i)} \in \mathcal{G}_{i}\) that satisfies

$$\displaystyle\begin{array}{rcl} \vert (Mx)_{j}\vert \approx _{0,2^{-i/2}}\vert g_{j}^{(i)}\vert & &{}\end{array}$$
(11)

for all but at most γ fraction of j ∈ [N] and for all but at most γ fraction of j ∈ Q. We say that j ∈ [N] is good if (11) holds for every i, and otherwise that it is bad. Notice that all but at most (t + r)γ fraction of j ∈ [N] are good and that all but at most (t + r)γ fraction of j ∈ Q are good. Consider the sets \(C_i\) and vectors \(h^{(i,m)},\Delta ^{(i,m)}\) associated with \((g^{(1)},\ldots,g^{(t+r)})\) as defined in (8) and (9). We claim that the vectors \(\Delta ^{(i,m)}\) satisfy the requirements of the lemma.

Fix some 1 ≤ i ≤ t. For every good \(j \in C_i\), the definition of \(C_i\) implies that \(|g_j^{(i)}| \geq 2 \cdot 2^{-i/2}\), so using (11) it follows that

$$\displaystyle\begin{array}{rcl} \vert (Mx)_{j}\vert \geq \vert g_{j}^{(i)}\vert - 2^{-i/2} \geq 2^{-i/2}.& &{}\end{array}$$
(12)

We also claim that \(\vert (Mx)_{j}\vert \leq 3 \cdot 2^{-(i-1)/2}\). This trivially holds for i = 1, so assume that i ≥ 2, and notice that the definition of \(C_i\) implies that \(|g_j^{(i-1)}| < 2 \cdot 2^{-(i-1)/2}\), so using (11), it follows that

$$\displaystyle\begin{array}{rcl} \vert (Mx)_{j}\vert \leq \vert g_{j}^{(i-1)}\vert + 2^{-(i-1)/2} \leq 3 \cdot 2^{-(i-1)/2}.& &{}\end{array}$$
(13)

Since at most (t + r)γ fraction of j ∈ [N] are bad, (12) yields that

$$\displaystyle\begin{array}{rcl} \mathop{\mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ] \geq \sum _{ i=1}^{t}2^{-i} \cdot \frac{\vert C_{i}\vert } {N} - (t + r)\gamma /2 \geq \sum _{i=1}^{t}2^{-i} \cdot \frac{\vert C_{i}\vert } {N} -\eta,& & {}\\ \end{array}$$

as required for Item 1.

Next, we claim that every good j satisfies

$$\displaystyle{ \vert (Mx)_{j}\vert ^{2} \approx _{ O(\epsilon ),O(\eta )}\sum _{i=1}^{t}h_{ j}^{(i,i+r)}\;. }$$
(14)

For a good \(j \in C_i\) and m ≥ i,

$$\displaystyle\begin{array}{rcl} \left \vert \vert (Mx)_{j}\vert ^{2} - h_{ j}^{(i,m)}\right \vert \leq 2 \cdot \vert (Mx)_{ j}\vert \cdot 2^{-m/2} + 2^{-m} \leq 10 \cdot 2^{-(i+m)/2},& &{}\end{array}$$
(15)

where the first inequality follows from (11) and the second from (13). In particular, for m = i + r (recall that r = log2(1∕ε 2)), we have

$$\displaystyle{\left \vert \vert (Mx)_{j}\vert ^{2} - h_{ j}^{(i,i+r)}\right \vert \leq 10 \cdot \epsilon \cdot 2^{-i} \leq 10 \cdot \epsilon \cdot \vert (Mx)_{ j}\vert ^{2}\;,}$$

and thus \(\vert (Mx)_{j}\vert ^{2} \approx _{O(\epsilon ),0}h_{j}^{(i,i+r)}\). Since every good j belongs to at most one of the sets \(C_i\), for every good \(j \in \bigcup C_{i}\) we have \(\vert (Mx)_{j}\vert ^{2} \approx _{O(\epsilon ),0}\sum _{i=1}^{t}h_{j}^{(i,i+r)}\). On the other hand, if j is good but does not belong to any \(C_i\), by our choice of t, it satisfies

$$\displaystyle{\vert (Mx)_{j}\vert \leq \vert g_{j}^{(t)}\vert + 2^{-t/2} \leq 3 \cdot 2^{-t/2} = 3\sqrt{\eta }\;,}$$

and thus \(\vert (Mx)_{j}\vert ^{2} \approx _{0,9\eta }0 =\sum _{ i=1}^{t}h_{j}^{(i,i+r)}\). This establishes that (14) holds for every good j.

Next, we claim that for every good j,

$$\displaystyle\begin{array}{rcl} \vert (Mx)_{j}\vert ^{2} \approx _{ O(\epsilon ),O(\eta )}\sum _{i=1}^{t}\sum _{ m=i}^{i+r}\Delta _{ j}^{(i,m)}\;.& &{}\end{array}$$
(16)

This follows since for every 1 ≤ i ≤ t, the vector h (i, i+r) can be written as the telescopic sum

$$\displaystyle{h^{(i,i+r)} =\sum _{ m=i}^{i+r}(h^{(i,m)} - h^{(i,m-1)})\;,}$$

where we used that h (i, i−1) = 0. We claim that for every good j, these differences satisfy

$$\displaystyle{\vert h_{j}^{(i,m)} - h_{ j}^{(i,m-1)}\vert \leq 30 \cdot 2^{-(i+m)/2},}$$

thus establishing that (16) holds for every good j. Indeed, for m ≥ i + 1, (15) implies that

$$\displaystyle\begin{array}{rcl} \vert h_{j}^{(i,m)} - h_{ j}^{(i,m-1)}\vert \leq 10 \cdot (2^{-(i+m)/2} + 2^{-(i+m-1)/2}) \leq 30 \cdot 2^{-(i+m)/2},& &{}\end{array}$$
(17)

and for m = i it follows from (11) combined with (13).

Finally, for every bad j we have

$$\displaystyle{\Big\vert \vert (Mx)_{j}\vert ^{2} -\sum _{ i=1}^{t}\sum _{ m=i}^{i+r}\Delta _{ j}^{(i,m)}\Big\vert \leq 1 + 30 \cdot \max _{ 1\leq i\leq t}\Big(\sum _{m=i}^{i+r}2^{-(i+m)/2}\Big) \leq 60\;.}$$

Since at most (t + r)γ fraction of the elements in [N] and in Q are bad, their effect on the difference between the expectations in Items 2 and 3 can be bounded by 60(t + r)γ. By our choice of γ this is η, as required.

Finally, we are ready to prove Theorem 4.1.

Proof of Theorem 4.1

Recall that it can be assumed that ε ≥ η. By Lemma 4.3, applied with \(\tilde{\varepsilon }=\epsilon /r\) and \(\tilde{\eta }=\eta /(rt)\), a random multiset Q of size

$$\displaystyle\begin{array}{rcl} q& =& O\Big(\epsilon ^{-1}\eta ^{-1} \cdot r^{2} \cdot t \cdot \log N \cdot \log (1/\gamma )\Big) {}\\ & =& O\Big(\log ^{2}(1/\epsilon ) \cdot \epsilon ^{-1}\eta ^{-1}\log N \cdot \log ^{2}(1/\eta )\Big) {}\\ \end{array}$$

satisfies with probability \(1 - 2^{-\Omega (\log N\cdot \log (1/\eta ))}\) that for every 1 ≤ i ≤ t, every i ≤ m ≤ i + r, and every \(\Delta ^{(i,m)} \in \mathcal{D}_{i,m}\) associated with a set \(C_i\) ,

$$\displaystyle\begin{array}{rcl} \mathop{\mathbb{E}} _{j\in Q}\left [\Delta _{j}^{(i,m)}\right ] \approx _{ 0,b_{i}}\mathop{ \mathbb{E}} _{j\in [N]}\left [\Delta _{j}^{(i,m)}\right ]\ \ \text{for}\ \ b_{ i} = O\Big( \frac{\varepsilon } {r} \cdot 2^{-i} \cdot \frac{\vert C_{i}\vert } {N} + \frac{\eta } {rt}\Big),& & {}\\ \end{array}$$

in which case we also have

$$\displaystyle{ \mathop{\mathbb{E}} _{j\in Q}\left [\sum _{i=1}^{t}\sum _{ m=i}^{i+r}\Delta _{ j}^{(i,m)}\right ] \approx _{ 0,b}\mathop{ \mathbb{E}} _{j\in [N]}\left [\sum _{i=1}^{t}\sum _{ m=i}^{i+r}\Delta _{ j}^{(i,m)}\right ]\ \ \text{for}\ \ b = O\Big(\epsilon \cdot \sum _{ i=1}^{t}2^{-i} \cdot \frac{\vert C_{i}\vert } {N} +\eta \Big)\;. }$$
(18)

We show that a Q with the above property satisfies the requirement of the theorem. Let \(x \in \mathbb{C}^{N}\) be a vector, and assume without loss of generality that \(\|x\|_1 = 1\). By Lemma 4.4, there exist vector collections \((\Delta ^{(i,m)} \in \mathcal{D}_{i,m})_{m=i,\ldots,i+r}\) associated with sets \(C_i\) (1 ≤ i ≤ t), satisfying Items 1, 2, and 3 there. Combined with (18), this gives

$$\displaystyle{\mathop{\mathbb{E}} _{j\in Q}\left [\vert (Mx)_{j}\vert ^{2}\right ] \approx _{ O(\epsilon ),O(\eta )}\mathop{ \mathbb{E}} _{j\in [N]}\left [\vert (Mx)_{j}\vert ^{2}\right ]\;,}$$

and we are done.

4.1 The Restricted Isometry Property

It is easy to derive now the following theorem. The proof is essentially identical to that of Theorem 3.7, using Theorem 4.1 instead of Theorem 3.1.

Theorem 4.5

For sufficiently large N and k, a unitary matrix \(M \in \mathbb{C}^{N\times N}\) satisfying \(\|M\|_{\infty }\leq O(1/\sqrt{N})\) , and a sufficiently small ε > 0, the following holds. For some \(q = O(\log^{2}(1/\epsilon) \cdot \epsilon^{-2} \cdot k \cdot \log^{2}(k/\epsilon) \cdot \log N)\), let \(A \in \mathbb{C}^{q\times N}\) be a matrix whose q rows are chosen uniformly and independently from the rows of M, multiplied by \(\sqrt{N/q}\) . Then, with probability \(1 - 2^{-\Omega (\log N\cdot \log (k/\epsilon ))}\) , the matrix A satisfies the restricted isometry property of order k with constant ε.