1 Introduction

1.1 Problem Setup

Affine phase retrieval for sparse signals aims to recover a k-sparse signal \({\varvec{x}}_0 \in {\mathbb F}^n\), \({\mathbb F}\in \{{\mathbb R},{\mathbb C}\}\), from the observed data

$$\begin{aligned} y_j = \vert \langle {\varvec{a}}_j, \varvec{x}_0 \rangle + b_j \vert + w_j, \quad j=1, \ldots , m, \end{aligned}$$

where \({\varvec{a}}_j \in {\mathbb F}^n, \, j=1, \ldots , m\) are given measurement vectors, \({\varvec{b}}:= (b_1, \ldots , b_m)^\textrm{T}\in {\mathbb F}^m\) is the given bias vector, and \({\varvec{w}}:= (w_1, \ldots , w_m)^\textrm{T}\in {\mathbb R}^m\) is the noise vector. Affine phase retrieval arises in several practical applications, such as holography [2, 20, 26, 27] and Fourier phase retrieval [3,4,5, 23], where some side information about the signal is known a priori, before the magnitude-only measurements are captured.

The aim of this paper is to study the following program to recover \({\varvec{x}}_0\) from \({ \varvec{ y}}:= (y_1, \ldots , y_m)^\textrm{T}\in {\mathbb R}^m\):

$$\begin{aligned} \min _{\varvec{x} \in {\mathbb F}^n} \Vert \varvec{x} \Vert _1 \quad \text {s.t.} \ \Vert \vert \varvec{A} {\varvec{x}}+ \varvec{b} \vert - \varvec{y} \Vert _2 \le \epsilon , \end{aligned}$$
(1)

where \(\varvec{A}:= [{\varvec{a}}_1, \ldots , {\varvec{a}}_m] ^* \in {\mathbb F}^{m\times n}\).
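
For concreteness, the following is a minimal numerical sketch of this measurement model in the real case; the dimensions, sparsity level, and noise scale are illustrative choices only, not the sampling regimes required by the theorems below.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 100, 60, 5  # illustrative sizes only

# k-sparse ground truth x0 (real case, F = R)
x0 = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x0[support] = rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)  # rows a_j^T with N(0, 1/m) entries
b = rng.standard_normal(m)                    # known bias vector
w = 1e-3 * rng.standard_normal(m)             # noise vector

y = np.abs(A @ x0 + b) + w                    # y_j = |<a_j, x0> + b_j| + w_j
```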

Particularly, we focus on the following questions:

  • Question 1: Assume that \({\varvec{a}}_j, \, j=1, \ldots , m\), are Gaussian random measurements with \(m = O(k\log ( en/k))\). In the absence of noise, i.e., \({\varvec{w}}=0, \, \epsilon =0\), is \({\varvec{x}}_0\) the unique solution to (1)?

  • Question 2: In the noisy scenario, is the program (1) stable under small perturbations?

For the case where \({\varvec{x}}_0\in {\mathbb C}^n\) is non-sparse, it was shown in [19] that \(m \ge 4n-1\) generic measurements suffice to guarantee the uniqueness of solutions, and several efficient algorithms with linear convergence rates were proposed in [25] to recover non-sparse signals \({\varvec{x}}_0\) from \({ \varvec{ y}}\) under \(m=O(n\log n)\) Gaussian random measurements. However, for the case where \({\varvec{x}}_0\) is sparse, to the best of our knowledge, no such results are available.

1.2 Related Works

1.2.1 Phase Retrieval

The noisy phase retrieval is the problem of recovering a signal \({\varvec{x}}_0 \in {\mathbb F}^n\), \({\mathbb F}\in \{{\mathbb R},{\mathbb C}\}\) from the magnitude-only measurements

$$\begin{aligned} y'_j = \left|\langle {\varvec{a}}_j,{\varvec{x}}_0 \rangle \right| + w_j, \quad j=1, \ldots , m, \end{aligned}$$

where \({\varvec{a}}_j \in {\mathbb F}^n\) are given measurement vectors and \(w_j \in {\mathbb R}\) are noises. It arises naturally in many areas such as X-ray crystallography [21, 22, 28], coherent diffractive imaging [30], and optics [14, 15, 32]. In these settings, optical detectors record only the intensity of a light wave while losing the phase information. Note that \(\left|\langle {\varvec{a}}_j,{\varvec{x}}_0 \rangle \right|^2=\left|\langle {\varvec{a}}_j, e^{ i\theta }{\varvec{x}}_0 \rangle \right|^2\) for any \(\theta \in {\mathbb R}\). Therefore the recovery of \({\varvec{x}}_0\) for the classical phase retrieval is up to a global phase. In the absence of noise, it has been proved that \(m\ge 2n-1\) generic measurements suffice to guarantee the uniqueness of solutions for the real case [1], and \(m \ge 4n-4\) for the complex case [6, 13, 38], respectively. Moreover, several efficient algorithms have been proposed to reconstruct \({\varvec{x}}_0\) from \({ \varvec{ y}}':=[y'_1,\ldots , y'_m]^\textrm{T}\), such as alternating minimization [29], truncated amplitude flow [37], smoothed amplitude flow [7], trust-region [33], and the Wirtinger flow (WF) variants [9, 10, 41].

1.2.2 Sparse Phase Retrieval

For several applications, the underlying signal is naturally sparse or admits a sparse representation after some linear transformation. This leads to the sparse phase retrieval:

$$\begin{aligned} \min _{\varvec{x} \in {\mathbb F}^n } \quad \Vert \varvec{x} \Vert _0 \quad \text {s.t.} \ \Vert \vert \varvec{A} {\varvec{x}}\vert - { \varvec{ y}}' \Vert _2 \le \epsilon , \end{aligned}$$
(2)

where \(\varvec{A}:=[{\varvec{a}}_1, \ldots , {\varvec{a}}_m]^*\). In the absence of noise, it has been established that \(m = 2k\) generic measurements are necessary and sufficient for the unique recovery of all k-sparse signals in the real case, and \(m\ge 4k-2\) are sufficient in the complex case [39]. In the noisy scenario, \(O(k \log ( en/k)) \) measurements suffice for stable sparse phase retrieval [12]. Due to the hardness of the \(\ell _0\)-norm in (2), a computationally tractable approach to recover \({\varvec{x}}_0\) is to solve the following \(\ell _1\) minimization:

$$\begin{aligned} \min _{\varvec{x} \in {\mathbb F}^n} \quad \Vert \varvec{x} \Vert _1 \quad \text {s.t.} \ \Vert \vert \varvec{A} {\varvec{x}}\vert - { \varvec{ y}}' \Vert _2 \le \epsilon . \end{aligned}$$
(3)

For the real case, based on the strong restricted isometry property (SRIP) established by Voroninski and Xu [34], the authors in [18] proved that, if \({\varvec{a}}_1,\ldots ,{\varvec{a}}_m \sim 1/\sqrt{m} \cdot \mathcal N (0, I_n)\) are i.i.d. Gaussian random vectors with \(m\ge O( k\log ( en/k))\), then the solution \({\widehat{\varvec{x}}}\in {\mathbb R}^n\) to (3) satisfies

$$\begin{aligned} \min \left\{ \Vert {\widehat{\varvec{x}}}- {\varvec{x}}_0 \Vert , \Vert {\widehat{\varvec{x}}}+ {\varvec{x}}_0 \Vert \right\} \lesssim \epsilon +\frac{\sigma _k({\varvec{x}}_0)_1}{\sqrt{k}}, \end{aligned}$$

where \(\sigma _k(\varvec{x}_0)_1:= \min _{\vert {\text {supp}}(\varvec{x})\vert \le k} \Vert \varvec{x} - \varvec{x}_0\Vert _1\). Later, this result was extended to the complex case by employing the “phaselift” technique in [36]. Specifically, the authors in [36] showed that, for any k-sparse signal \({\varvec{x}}_0 \in {\mathbb C}^n\), the solution \({\widehat{\varvec{x}}}\in {\mathbb C}^n\) to the program

$$\begin{aligned} \mathop {\textrm{argmin}}\limits _{\varvec{x} \in \mathbb C^n} \quad \Vert \varvec{x} \Vert _1 \quad \text {s.t.} \ \Vert \mathcal {A}({\varvec{x}}) - \mathcal {A}({\varvec{x}}_0) \Vert _2 \le \epsilon \end{aligned}$$

satisfies

$$\begin{aligned} \min _{\theta \in [0, 2\pi )}\Vert {\widehat{\varvec{x}}}- e^{ i\theta }\varvec{x}_0\Vert _2 \lesssim \frac{\epsilon }{\sqrt{m}\Vert \varvec{x}_0\Vert _2}, \end{aligned}$$

provided \({\varvec{a}}_1, \ldots , {\varvec{a}}_m \sim \mathcal N(0,I_n)\) are i.i.d. complex Gaussian random vectors and \(m\ge O( k\log ( en/k))\). Here, \(\mathcal {A}({\varvec{x}}):= (\left|{\varvec{a}}_1^* {\varvec{x}}\right|^2, \ldots , \left|{\varvec{a}}_m^* {\varvec{x}}\right|^2)\).

1.2.3 Affine Phase Retrieval

Affine phase retrieval aims to recover a signal \({\varvec{x}}_0 \in {\mathbb F}^n\), \({\mathbb F}\in \{{\mathbb R},{\mathbb C}\}\), from the measurements

$$\begin{aligned} y_j =\left| \langle {\varvec{a}}_j, \varvec{x}_0 \rangle + b_j\right| , \quad j=1, \ldots , m, \end{aligned}$$

where \({\varvec{a}}_j \in {\mathbb F}^n, \, j=1, \ldots , m\) are measurement vectors and \({\varvec{b}}:= (b_1, \ldots , b_m)^\textrm{T}\in {\mathbb F}^m\) is the bias vector. The problem can be regarded as classical phase retrieval with a priori information, and it arises in many areas, such as holographic phase retrieval [16, 17, 27] and Fourier phase retrieval [3,4,5, 23]. In such scenarios, one needs to employ some additional information about the desired signals to ensure the uniqueness of solutions. Specifically, in holographic optics, a reference signal \(\varvec{r} \in {\mathbb C}^k\), whose structure is a priori known, is included in the diffraction patterns alongside the signal of interest \({\varvec{x}}_0\in {\mathbb C}^n\) [2, 20, 26]. Set \(\varvec{x}_0' = (\varvec{x}_0^\textrm{T}, \varvec{r}^\textrm{T})^\textrm{T}\in \mathbb C^{n+k}\). Then the magnitude-only measurements become

$$\begin{aligned} y_j = \vert \langle {\varvec{a}}'_j, \varvec{x}_0' \rangle \vert = \vert \langle {\varvec{a}}_j, \varvec{x}_0 \rangle + \langle {\varvec{a}}_j'', \varvec{r} \rangle \vert = \vert \langle {\varvec{a}}_j, \varvec{x}_0 \rangle + b_j\vert , \quad j=1,\ldots ,m, \end{aligned}$$

where \({{\varvec{a}}}'_j = ({\varvec{a}}_j^\textrm{T}, {{\varvec{a}}_j''}^\textrm{T})^\textrm{T}\in {\mathbb C}^{n+k}\) are given measurement vectors and \(b_j = \langle {\varvec{a}}_j'', \varvec{r} \rangle \in {\mathbb C}\) are known. Therefore, the holographic phase retrieval can be viewed as the affine phase retrieval.
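A minimal sketch of this reduction follows (all sizes and random data are hypothetical): since the reference r is known, the bias \(b_j = \langle {\varvec{a}}_j'', \varvec{r} \rangle \) can be precomputed, and the holographic magnitudes of the stacked vector coincide with affine magnitudes of the signal block alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 50, 10, 40  # hypothetical sizes

r = rng.standard_normal(k) + 1j * rng.standard_normal(k)  # known reference
A_sig = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A_ref = rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))

x0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = A_ref.conj() @ r   # known bias b_j = <a''_j, r>

# magnitudes of the stacked signal x0' = (x0, r) ...
y_holo = np.abs(np.concatenate([A_sig, A_ref], axis=1).conj()
                @ np.concatenate([x0, r]))
# ... equal the affine magnitudes |<a_j, x0> + b_j|
y_affine = np.abs(A_sig.conj() @ x0 + b)
assert np.allclose(y_holo, y_affine)
```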

Another application of affine phase retrieval arises in the Fourier phase retrieval problem. The one-dimensional Fourier phase retrieval problem usually does not have a unique solution [35]. In fact, for a given signal of dimension n, besides the trivial ambiguities caused by shift, conjugate reflection, and rotation, there can be up to \(2^{n-2}\) nontrivial solutions. To enforce uniqueness of solutions, one approach is to use additionally known values of some entries [4], which can be recast as affine phase retrieval. More related work on the uniqueness of solutions for Fourier phase retrieval can be found in [11, 31].

1.3 Our Contributions

In this paper, we focus on the recovery of sparse signals from the magnitude of affine measurements. Specifically, we aim to recover a k-sparse signal \({\varvec{x}}_0\in \mathbb F^n\) (\(\mathbb F={\mathbb R}\) or \(\mathbb F={\mathbb C}\)) from the data

$$\begin{aligned} { \varvec{ y}}=\left|\varvec{Ax}_0 + \varvec{b}\right|+{\varvec{w}}, \end{aligned}$$

where \(\varvec{A}:= [{\varvec{a}}_1,\ldots ,{\varvec{a}}_m]^* \in \mathbb F^{m\times n}\) is the measurement matrix, \({\varvec{b}}\in {\mathbb F}^m\) is the bias vector, and \({\varvec{w}}\in {\mathbb R}^m\) is the noise vector. Our aim is to present the performance of the following \(\ell _1\) minimization program:

$$\begin{aligned} \mathop {\textrm{argmin}}\limits _{\varvec{x} \in \mathbb F^n} \Vert \varvec{x} \Vert _1 \quad \text {s.t.} \ \Vert \vert \varvec{A} {\varvec{x}}+ \varvec{b} \vert - \varvec{y} \Vert _2 \le \epsilon . \end{aligned}$$
(4)

We say a triple \((\varvec{A}, {\varvec{b}}, \Delta )\) is instance optimal of order \(k_0\) if

$$\begin{aligned} \Vert \Delta (\left|\varvec{A}{\varvec{x}}+{\varvec{b}}\right|)-{\varvec{x}}\Vert _p \le C\cdot \sigma _{k_0}{({\varvec{x}})}_q \end{aligned}$$
(5)

holds for all \({\varvec{x}}\in {\mathbb F}^n\). Here, \(\Delta : {\mathbb R}^m\rightarrow {\mathbb F}^n\) is a decoder for reconstructing \({\varvec{x}}\), \(\sigma _k(\varvec{x})_q:= \min _{\vert {\text {supp}}(\varvec{z})\vert \le k} \Vert \varvec{z} - \varvec{x}\Vert _q\), and \(C:= C_{k_0, p, q}\) is a constant depending on \(k_0\), p and q.
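
Since the minimizer defining \(\sigma _{k}(\cdot )_q\) simply keeps the k largest-magnitude entries, the best k-term approximation error is directly computable; a small sketch (example values are illustrative):

```python
import numpy as np

def sigma_k(x: np.ndarray, k: int, q: float = 1.0) -> float:
    """Best k-term approximation error sigma_k(x)_q.

    The optimal k-sparse z keeps the k largest-magnitude entries of x,
    so the error is the l_q norm of the n - k smallest entries.
    """
    if k >= x.size:
        return 0.0
    tail = np.sort(np.abs(x))[: x.size - k]
    return float(np.sum(tail ** q) ** (1.0 / q))

x = np.array([5.0, -0.1, 3.0, 0.02, -2.0])
print(sigma_k(x, k=2, q=1.0))  # 2.12: the mass outside the two largest entries
```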

Theorem 1

Assume that there exists a matrix \(\varvec{A} \in {\mathbb F}^{m\times n}\), a vector \({\varvec{b}}\in {\mathbb F}^m\), a decoder \(\Delta : {\mathbb R}^m \rightarrow {\mathbb F}^n\) and positive integers \(k_0, \, p, \, q\) such that (5) holds for all \({\varvec{x}}\in {\mathbb F}^n\). Then \({\varvec{b}}\notin \{\varvec{A}{\varvec{z}}: {\varvec{z}}\in {\mathbb F}^n\}\).

Proof

We assume that \({\varvec{b}}= \varvec{A} {\varvec{z}}_0\) where \({\varvec{z}}_0 \in {\mathbb F}^n\). We next show that there exists \({\varvec{x}}\in {\mathbb F}^n\) for which (5) fails. For the sake of contradiction, we assume that (5) holds. Since \(\sigma _{k_0}{(-{\varvec{x}})}_q=\sigma _{k_0}{({\varvec{x}})}_q\), we have

$$\begin{aligned} \Vert \Delta (\left|\varvec{A}{\varvec{x}}-{\varvec{b}}\right|)+{\varvec{x}}\Vert _p=\Vert \Delta (\left|\varvec{A}(-{\varvec{x}})+{\varvec{b}}\right|)-(-{\varvec{x}})\Vert _p\le C \sigma _{k_0}{({\varvec{x}})}_q. \end{aligned}$$
(6)

Assume that \({\varvec{x}}_0\in {\mathbb F}^n\) is \(k_0\)-sparse, i.e. \(\sigma _{k_0}({\varvec{x}}_0)_q=0\). According to (5) and (6), we obtain that

$$\begin{aligned} \Delta (\left|\varvec{A}{\varvec{x}}_0+{\varvec{b}}\right|)\,\,=\,\, {\varvec{x}}_0,\quad \Delta (\left|\varvec{A}{\varvec{x}}_0-{\varvec{b}}\right|)\,\,=\,\, -{\varvec{x}}_0. \end{aligned}$$
(7)

Taking \({\varvec{x}}=r{\varvec{x}}_0+2{\varvec{z}}_0\) in (6), we have

$$\begin{aligned} \Vert \Delta (\left|\varvec{A}(r{\varvec{x}}_0+2{\varvec{z}}_0)-{\varvec{b}}\right|)+r{\varvec{x}}_0+2{\varvec{z}}_0\Vert _p\le C \sigma _{k_0}{(r{\varvec{x}}_0+2{\varvec{z}}_0)}_q\le C\Vert 2{\varvec{z}}_0\Vert _q, \end{aligned}$$
(8)

where \(r>0\), and the last inequality follows by taking the \(k_0\)-sparse competitor \(r{\varvec{x}}_0\) in the definition of \(\sigma _{k_0}(\cdot )_q\). Observe that

$$\begin{aligned} \Delta (\left|\varvec{A}(r{\varvec{x}}_0+2{\varvec{z}}_0)-{\varvec{b}}\right|)=\Delta (\left|\varvec{A}(r{\varvec{x}}_0)+{\varvec{b}}\right|)=r{\varvec{x}}_0. \end{aligned}$$
(9)

Here, the first equality uses \({\varvec{b}}=\varvec{A}{\varvec{z}}_0\), and the second follows as in (7), since \(r{\varvec{x}}_0\) is \(k_0\)-sparse. Substituting (9) into (8), we obtain that

$$\begin{aligned} \Vert 2r{\varvec{x}}_0+2{\varvec{z}}_0\Vert _p\,\,\le \,\, C\Vert 2{\varvec{z}}_0\Vert _q \end{aligned}$$
(10)

holds for any \(r>0\). Choosing \({\varvec{x}}_0 \ne 0\), we have \(\lim _{r\rightarrow \infty }\Vert 2r{\varvec{x}}_0+2{\varvec{z}}_0\Vert _p=\infty \). Hence, (10) fails once r is large enough, which is a contradiction. \(\square \)

For the case where \(m\le n\) and \(\varvec{A}\) has full rank, we have \({\varvec{b}}\in \{\varvec{A}{\varvec{z}}: {\varvec{z}}\in {\mathbb F}^n\}\). According to Theorem 1, we know that it is impossible to build an instance-optimality result under this setting. This is quite different from standard phase retrieval, for which [18] established the instance-optimality

$$\begin{aligned} \min _{\left|c\right|=1} \Vert \Delta (\left|\varvec{A}{\varvec{x}}\right|)-c {\varvec{x}}\Vert _p \le C\cdot \sigma _{k_0}{({\varvec{x}})}_q, \text { for all } {\varvec{x}}\in {\mathbb F}^n. \end{aligned}$$
(11)


1.3.1 Real Case

Our first result gives an upper bound on the reconstruction error of (4) in the real case, under the assumption that \({\varvec{a}}_1, \ldots ,{\varvec{a}}_m \in {\mathbb R}^n\) are real Gaussian random vectors and \(m\ge O(k\log ( en/k))\). It shows that the \(\ell _1\)-minimization program is stable under small perturbations, even for approximately k-sparse signals. To begin with, we need the following definition of the strong RIP condition, introduced by Voroninski and Xu [34].

Definition 1

(Strong RIP in [34]) The matrix \(\varvec{A}\in {\mathbb R}^{m\times n}\) satisfies the Strong Restricted Isometry Property (SRIP) of order k and constants \(\theta _l,\, \theta _u >0 \) if the following inequality

$$\begin{aligned} \theta _{l}\Vert {\varvec{x}} \Vert ^2 \le \min _{I\subset [m],\left|I\right|\ge m/2} \Vert \varvec{A}_I {\varvec{x}} \Vert ^2\le \max _{I\subset [m],\left|I\right|\ge m/2} \Vert \varvec{A}_I {\varvec{x}} \Vert ^2 \le \theta _{u}\Vert {\varvec{x}} \Vert ^2 \end{aligned}$$

holds for all k-sparse signals \({\varvec{x}}\in {\mathbb R}^n\). Here, \(\varvec{A}_I\) denotes the sub-matrix of \(\varvec{A}\) whose rows with indices in I are kept, \([m]:=\{1,\ldots ,m\}\) and \(\left|I\right|\) denotes the cardinality of I.
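Verifying the SRIP exactly is combinatorial in both the subset I and the k-sparse vector, so it is not directly computable; the sketch below merely samples random pairs (I, x) to produce empirical, non-certified estimates of \(\theta _l\) and \(\theta _u\) for a Gaussian matrix (all sizes illustrative).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k, trials = 80, 200, 4, 2000  # illustrative sizes

A = rng.standard_normal((m, n)) / np.sqrt(m)  # entries N(0, 1/m)

lo, hi = np.inf, 0.0
for _ in range(trials):
    S = rng.choice(n, size=k, replace=False)       # random support
    x = np.zeros(n)
    x[S] = rng.standard_normal(k)
    x /= np.linalg.norm(x)                         # unit-norm k-sparse x
    I = rng.choice(m, size=m // 2, replace=False)  # random subset, |I| = m/2
    val = np.linalg.norm(A[I] @ x) ** 2
    lo, hi = min(lo, val), max(hi, val)

print(f"empirical theta_l ~ {lo:.3f}, theta_u ~ {hi:.3f}")
```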

The following result indicates that the matrix \(\begin{bmatrix} \varvec{A}&\varvec{b} \end{bmatrix} \in \mathbb R^{m \times (n+1)}\) satisfies the strong RIP condition with high probability under some mild conditions on \(\varvec{A}\in {\mathbb R}^{m\times n}\) and \({\varvec{b}}\in {\mathbb R}^m\).

Theorem 2

Let \(\varvec{A} \in \mathbb R^{m \times n}\) be a Gaussian random matrix with entries \(a_{k, j}\sim \mathcal N(0,1/m)\). Suppose that the vector \(\varvec{b} \in \mathbb R^m\) satisfies \(\alpha \le \Vert \varvec{b}_I \Vert _2 \le \beta \) for all \(I \subseteq [m]\) with \( \left|I\right| \ge m/2\), where \(\alpha \le \beta \) are two positive constants. Set \(\varvec{A}':= \begin{bmatrix} \varvec{A}&\varvec{b} \end{bmatrix} \in \mathbb R^{m \times (n+1)}\). If \(m \ge Ct(k+1)\log ( en/k)\) with \(t(k+1) \le n\) and \(1 < t \in \mathbb Z\), then there exist constants \(\theta _l',\, \theta _u'\), independent of t, such that the matrix \(\varvec{A}'\) satisfies the strong RIP of order \(tk+1\) and constants \(\theta _l',\, \theta _u'\) with probability at least \(1 - 4\exp (-c'm)\). Here, \(C,\, c'>0\) are constants depending only on \(\alpha \) and \(\beta \).

The following theorem shows that if we add some restrictions on the signal \({\varvec{x}}\), then the instance-optimality result can be established.

Theorem 3

Assume that \(\varvec{A}':= \begin{bmatrix} \varvec{A}&\varvec{b} \end{bmatrix} \in \mathbb R^{m \times (n+1)}\) satisfies the strong RIP of order \((a+1)(k+1)\) with constants \(\theta _u \ge \theta _l>0\). If \(a > \theta _u/\theta _l\), then the following holds: for any vector \({\varvec{x}}_0 \in {\mathbb R}^n\), the solution \({\widehat{\varvec{x}}}\) to (4) with \(\varvec{y} = \vert \varvec{A} \varvec{x}_0 + \varvec{b}\vert + \varvec{w}\) and \(\Vert \varvec{w} \Vert _2 \le \epsilon \) obeys

$$\begin{aligned} \Vert {\widehat{\varvec{x}}}- \varvec{x}_0 \Vert _2 \le K_1\epsilon + K_2 \frac{\sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}}, \end{aligned}$$

provided \(K_1\epsilon + K_2 \frac{\sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}} <2\). Here,

$$\begin{aligned} K_1 := \frac{2\left( 1+1/\sqrt{a}\right) }{\sqrt{\theta _l}-\sqrt{\theta _u}/\sqrt{a}} >0, \quad K_2 := \sqrt{\theta _u}K_1 + 2. \end{aligned}$$

From Theorem 2, we know that if \(\varvec{A} \in \mathbb R^{m \times n}\) is a Gaussian random matrix with entries \(a_{k, j}\sim \mathcal N(0,1/m)\) and the sampling complexity \(m\ge C (a+1)(k+2)\log ( en/k)\), then with high probability the matrix \(\varvec{A}':= \begin{bmatrix} \varvec{A}&\varvec{b} \end{bmatrix}\) satisfies the strong RIP condition of order \((a+1)(k+1)\) with constants \(\theta _l,\, \theta _u>0\) under some mild conditions on \({\varvec{b}}\). Here, the constants \(\theta _l,\, \theta _u\) are independent of a. Therefore, taking the constant \(a> \theta _u/\theta _l\), the conclusion of Theorem 3 holds with high probability.

In the absence of noise, i.e., \({\varvec{w}}=0, \, \epsilon =0\), Theorem 3 shows that if \({\varvec{a}}_1, \ldots , {\varvec{a}}_m \sim 1/\sqrt{m} \cdot \mathcal N(0, I_n)\) are real Gaussian random vectors and \(m\ge O(k\log ( en/k))\), then all k-sparse signals \({\varvec{x}}_0 \in {\mathbb R}^n\) can be reconstructed exactly by solving the program (4) under some mild conditions on \({\varvec{b}}\). We state this as the following corollary:

Corollary 1

Let \(\varvec{A} \in \mathbb {R}^{m \times n}\) be a Gaussian random matrix with entries \(a_{jk}\sim \mathcal N(0,1/m)\), and \(\varvec{b} \in \mathbb {R}^m\) be a vector satisfying \(\alpha \le \Vert \varvec{b}_I \Vert _2 \le \beta \) for all \(I \subseteq [m]\) with \( \left|I\right|\ge m/2 \), where \(\alpha \le \beta \) are two positive universal constants. If \(m \ge C k\log ( en/k)\), then with probability at least \(1 - 4\exp (-c m)\) it holds: for any k-sparse signal \(\varvec{x}_0 \in \mathbb {R}^n\), the \(\ell _1\) minimization

$$\begin{aligned} \mathop {\textrm{argmin}}\limits _{\varvec{x} \in \mathbb {R}^n} \Vert \varvec{x}\Vert _1 \quad \mathrm {s.t.} \quad \left|\varvec{A} {\varvec{x}}+ \varvec{b}\right| = \varvec{y} \end{aligned}$$

with \(\varvec{y} = \left|\varvec{A} {\varvec{x}}_0 + \varvec{b}\right|\) has a unique solution \(\varvec{x}_0\). Here \(C,\, c>0\) are constants depending only on \(\alpha \) and \(\beta \).

1.3.2 Complex Case

We next turn to consider the estimation performance of (4) for complex-valued signals. Let \(\mathbb {H}^{n \times n}\) be the set of Hermitian matrices in \({\mathbb C}^{n\times n}\), and let \(\Vert \varvec{H}\Vert _{0, 2}\) denote the number of non-zero rows of \(\varvec{H}\). Given \({\varvec{a}}_1, \ldots , {\varvec{a}}_m \in {\mathbb C}^n\) and \(b_1,\ldots ,b_m \in {\mathbb C}\), we define a linear map \(\mathcal A': \mathbb H^{(n+1)\times (n+1)}\rightarrow {\mathbb R}^m\) as follows:

$$\begin{aligned} \mathcal A'(\varvec{H}') = ({\varvec{a}}_1'^* \varvec{H}' {\varvec{a}}_1', \dots , {\varvec{a}}_m'^* \varvec{H}' {\varvec{a}}_m'), \end{aligned}$$
(12)

where \({\varvec{a}}_j':= \Bigl ( \begin{array}{l} {\varvec{a}}_j \\ b_j \end{array} \Bigr ) \in {\mathbb C}^{n+1}\).

Definition 2

We say the linear map \(\mathcal A'\) defined in (12) satisfies the restricted isometry property of order \((r, k)\) with constants \(c,\, C >0\) if the following holds

$$\begin{aligned} c \Vert \varvec{H}' \Vert _F \le \frac{1}{m}\Vert \mathcal A'(\varvec{H}') \Vert _1 \le C \Vert \varvec{H}' \Vert _F \end{aligned}$$
(13)

for all \(\varvec{H}':= \begin{bmatrix} \varvec{H} &{} {\varvec{h}}\\ {\varvec{h}}^* &{} 0 \end{bmatrix} \in \mathbb H^{(n+1)\times (n+1)}\) with \({\text {rank}}(\varvec{H}) \le r\), \(\Vert \varvec{H} \Vert _{0, 2} \le k\) and \(\Vert {\varvec{h}} \Vert _0 \le k\).
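As an illustration, the RIP quantity of Definition 2 can be evaluated numerically for a structured \(\varvec{H}'\) of the type appearing in the proofs below (rank-two, row-sparse \(\varvec{H}\) and sparse \({\varvec{h}}\) built from two sparse vectors); all sizes and random data in this sketch are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, k = 40, 300, 3  # hypothetical sizes

def complex_gaussian(*shape):
    # components ~ N(0, 1/2), matching a ~ (1/sqrt 2) N(0,I) + (i/sqrt 2) N(0,I)
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def sparse_vec():
    v = np.zeros(n, dtype=complex)
    S = rng.choice(n, size=k, replace=False)
    v[S] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
    return v

a = complex_gaussian(m, n)                         # rows a_j
b = rng.standard_normal(m) + 1j * rng.standard_normal(m)
a_prime = np.concatenate([a, b[:, None]], axis=1)  # a'_j = (a_j; b_j)

x_hat, x0 = sparse_vec(), sparse_vec()
H = np.outer(x_hat, x_hat.conj()) - np.outer(x0, x0.conj())  # rank <= 2, row-sparse
h = x_hat - x0                                               # 2k-sparse

H_prime = np.zeros((n + 1, n + 1), dtype=complex)
H_prime[:n, :n], H_prime[:n, n], H_prime[n, :n] = H, h, h.conj()

# entries a'_j^* H' a'_j of A'(H'), then the ratio (1/m)||A'(H')||_1 / ||H'||_F
vals = np.real(np.einsum('mi,ij,mj->m', a_prime.conj(), H_prime, a_prime))
print(np.abs(vals).sum() / m / np.linalg.norm(H_prime, 'fro'))
```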

The following theorem shows that the linear map \({\mathcal A}'\) satisfies the restricted isometry property over low-rank and sparse matrices, provided \({\varvec{a}}_1, \ldots , {\varvec{a}}_m \in {\mathbb C}^n\) are i.i.d. complex Gaussian random vectors and \({\varvec{b}}:= (b_1,\ldots ,b_m)^\textrm{T}\in {\mathbb C}^m\) satisfies some mild conditions.

Theorem 4

Suppose \({\varvec{a}}_1, \ldots , {\varvec{a}}_m \sim 1/\sqrt{2}\cdot \mathcal N(0,I_n) + i/\sqrt{2}\cdot \mathcal N(0,I_n)\) are i.i.d. complex Gaussian random vectors and \({\varvec{b}}\in {\mathbb C}^m\) is an independent sub-gaussian random vector (it may also be deterministic) with sub-gaussian norm \(\Vert {\varvec{b}} \Vert _{\psi _2} \le C\) and \({\mathbb E}\Vert {\varvec{b}} \Vert _1 \ge c_1 m\), \({\mathbb E}\Vert {\varvec{b}} \Vert _2 \le c_2 \sqrt{m} \), where \(C>0,\, c_2 \ge c_1 > 0\) are universal constants. If \(m\ge C' k\log ( en/k)\), then with probability at least \(1-5\exp (-c' m)\), the linear map \({\mathcal A}'\) defined in (12) obeys

$$\begin{aligned} \frac{\theta ^-}{12} \Vert \varvec{H}' \Vert _F \le \frac{1}{m}\Vert \mathcal A'(\varvec{H}') \Vert _1 \le 3\theta ^+ \Vert \varvec{H}' \Vert _F \end{aligned}$$

for all \(\varvec{H}':= \begin{bmatrix} \varvec{H} &{} {\varvec{h}}\\ {\varvec{h}}^* &{} 0 \end{bmatrix} \in \mathbb H^{(n+1)\times (n+1)}\) with \({\text {rank}}(\varvec{H}) \le 2\), \(\Vert \varvec{H} \Vert _{0, 2} \le k\) and \(\Vert {\varvec{h}} \Vert _0 \le k\). Here, \(\theta ^-:=\min (1, c_1/\sqrt{2})\), \(\theta ^+:=\max (\sqrt{6}, c_2)\), and \(C',\, c'>0\) are constants depending only on \(c_1,\, c_2\).

With a slight abuse of notation, we denote \(\mathcal A'({\varvec{x}}'):=\mathcal A'({\varvec{x}}'{\varvec{x}}'^*)\) for any vector \({\varvec{x}}'\in {\mathbb C}^{n+1}\). Then we have

Theorem 5

Assume that the linear map \(\mathcal A'(\cdot )\) satisfies the RIP condition (13) of order (2, 2ak) with constants \(c,\, C>0\). For any k-sparse signal \({\varvec{x}}_0 \in {\mathbb C}^n\), if

$$\begin{aligned} c - C\left( \frac{4}{\sqrt{a}}+\frac{1}{a}\right) >0, \end{aligned}$$

then the solution \({\widehat{\varvec{x}}}\in {\mathbb C}^n\) to

$$\begin{aligned} \mathop {\textrm{argmin}}\limits _{{\varvec{x}}\in {\mathbb C}^n} \quad \Vert {\varvec{x}} \Vert _1 \quad \text{ s.t. } \quad \Vert \mathcal A'({\varvec{x}}')-{\tilde{\varvec{y}}} \Vert \le \epsilon \quad \text{ and } \quad {\varvec{x}}'=({\varvec{x}}^\textrm{T}, 1)^\textrm{T}\end{aligned}$$

with \({\tilde{\varvec{y}}}=\mathcal A'({\varvec{x}}'_0)+{\varvec{w}}\), \(\Vert {\varvec{w}} \Vert \le \epsilon \) and \({\varvec{x}}'_0=({\varvec{x}}_0^\textrm{T}, 1)^\textrm{T}\) obeys

$$\begin{aligned} \min _{\theta \in {\mathbb R}} \left( \Vert {{\widehat{\varvec{x}}}-{e}^{ i\theta }{\varvec{x}}_0}\Vert _2 + \left|1-{e}^{ i\theta } \right|\right) \le \frac{C_0 \epsilon }{\left( \Vert {\varvec{x}}_0 \Vert +1\right) \sqrt{m}}, \end{aligned}$$

where

$$\begin{aligned} C_0:=2\sqrt{2} \cdot \frac{\frac{1}{a} +\frac{4}{\sqrt{a}}+1}{c - C\left( \frac{4}{\sqrt{a}}+\frac{1}{a}\right) }. \end{aligned}$$

Based on Theorem 4, if \({\varvec{a}}_1,\ldots ,{\varvec{a}}_m \in {\mathbb C}^n\) are i.i.d. complex Gaussian random vectors and \(m\ge C' ak\log ( en/ak)\), then with high probability the linear map \(\mathcal A'\) defined in (12) satisfies the RIP condition of order (2, 2ak) with constants \(c=\theta ^-/12\) and \(C=3\theta ^+\) under some mild conditions on \({\varvec{b}}\). For the noiseless case where \({\varvec{w}}=0,\,\epsilon =0\), taking the constant \(a> (8C/c)^2\) and combining with Theorem 5, we obtain the following result.

Corollary 2

Suppose \({\varvec{a}}_1, \ldots , {\varvec{a}}_m \sim 1/\sqrt{2}\cdot \mathcal N(0,I_n)+{i}/\sqrt{2}\cdot \mathcal N(0,I_n)\) are i.i.d. complex Gaussian random vectors and \({\varvec{b}}\in {\mathbb C}^m\) is an independent sub-gaussian random vector (it may also be deterministic) with sub-gaussian norm \(\Vert {\varvec{b}} \Vert _{\psi _2} \le C\) and \( {\mathbb E}\Vert {\varvec{b}} \Vert _1 \ge c_1 m\), \({\mathbb E}\Vert {\varvec{b}} \Vert _2 \le c_2 \sqrt{m}\), where \(C>0,\, c_2 \ge c_1 > 0\) are universal constants. If \(m \ge C'' k\log ( en/k)\), then with probability at least \(1-5\exp (-c'' m)\), the solution to

$$\begin{aligned} \mathop {\textrm{argmin}}\limits _{{\varvec{x}}\in {\mathbb C}^n} \quad \Vert {\varvec{x}} \Vert _1 \quad \text{ s.t. } \quad \left|\varvec{A}{\varvec{x}}+{\varvec{b}}\right|=\left|\varvec{A}{\varvec{x}}_0+{\varvec{b}}\right| \end{aligned}$$

is exactly \({\varvec{x}}_0\). Here, \(C'',\, c'' > 0\) are constants depending only on \(c_1,\,c_2\).

Remark 1

We give an upper bound for \(\min _{\theta \in {\mathbb R}} \left( \Vert {{\widehat{\varvec{x}}}-e^{{i}\theta }{\varvec{x}}_0}\Vert _2 + \left|1-e^{ i\theta }\right|\right) \) in Theorem 5. However, since affine phase retrieval can recover a signal exactly (not just up to a global phase), one may wonder: is there a stable recovery bound for \(\Vert {{\widehat{\varvec{x}}}-{\varvec{x}}_0}\Vert _2\)? We believe that the answer is no, especially in the case where the noise vector satisfies \(\Vert {{\varvec{w}}}\Vert _2 \gtrsim \sqrt{m}\). We leave a proof of this to future work.

1.4 Notations

Throughout the paper, we denote \({\varvec{x}}\sim \mathcal N(0,I_n)\) if \({\varvec{x}}\in {\mathbb R}^n\) is a standard Gaussian random vector. A vector \(\varvec{x}\) is k-sparse if it has at most k nonzero entries. For simplicity, we denote \([m]:= \{1, \dots , m\}\). For any subset \(I \subseteq [m]\), let \(\varvec{A}_I = \begin{bmatrix} {\varvec{a}}_j: j \in I \end{bmatrix}^*\) be the submatrix of \(\varvec{A} = \begin{bmatrix} {\varvec{a}}_1,\ldots , {\varvec{a}}_m \end{bmatrix}^*\) obtained by keeping the rows indexed by I. Denote \(\sigma _k(\varvec{x}_0)_p:= \min _{\left|{\text {supp}}(\varvec{x})\right| \le k} \Vert \varvec{x} - \varvec{x}_0\Vert _p\) as the best k-term approximation error of \(\varvec{x}_0\) with respect to the \(\ell _p\) norm. For a complex number b, we use \(b_{\Re }\) and \(b_{\Im }\) to denote the real and imaginary parts of b, respectively. For any \(A,B\in {\mathbb R}\), we use \( A \lesssim B\) to denote \(A\le C_0 B\), where \(C_0\in {\mathbb R}_+\) is an absolute constant. The notation \(\gtrsim \) is defined similarly. Throughout this paper, c, C and their subscripted (superscripted) forms denote constants whose values may vary with the context.

2 Proof of Theorem 2 and Theorem 3

In this section, we consider the estimation performance of the \(\ell _1\)-minimization program (4) for the real-valued signals. Before proceeding, we need the following lemma which shows that if \(\varvec{A} \in {\mathbb R}^{m\times n}\) is a real Gaussian random matrix with entries \(a_{k,j}\sim \mathcal N(0,1/m)\), then \(\varvec{A}\) satisfies the strong RIP with high probability.

Lemma 1

(Theorem 2.1 in [34]) Suppose that \(t>1\) is a constant and that \(\varvec{A}\in {\mathbb R}^{m\times n}\) is a Gaussian random matrix with entries \(a_{k,j}\sim \mathcal N(0,1/m)\). Let \(m=O(tk \log (e n/k))\), where \(k \in [1,n]\cap {\mathbb Z}\). Then there exist constants \(\theta _l, \theta _u\) with \(0<\theta _l< \theta _u<2\), independent of t, such that \(\varvec{A}\) satisfies the SRIP of order \(t\cdot k\) and constants \(\theta _l,\, \theta _u\) with probability at least \(1-\exp (-cm)\), where \(c>0\) is a universal constant.

2.1 Proof of Theorem 2

Proof

From the definition, it suffices to show there exist constants \(\theta '_l, \theta '_u >0 \) such that the following inequality

$$\begin{aligned} \theta '_{l}\Vert {\varvec{x}}' \Vert ^2 \le \min _{I\subset [m],\left|I\right|\ge m/2} \Vert \varvec{A}'_I {\varvec{x}}' \Vert ^2\le \max _{I\subset [m],\left|I\right|\ge m/2} \Vert \varvec{A}'_I {\varvec{x}}' \Vert ^2 \le \theta '_{u}\Vert {\varvec{x}}' \Vert ^2 \end{aligned}$$
(14)

holds for all \((tk+1)\)-sparse signals \({\varvec{x}}' \in {\mathbb R}^{n+1}\). To this end, we denote \(\varvec{x}' = (\varvec{x}^\textrm{T}, z)^\textrm{T}\), where \(\varvec{x} \in \mathbb R^n\) and \(z \in \mathbb R\). We first consider the case where \(z=0\). From Lemma 1, we know that if \(m \gtrsim t(k+1)\log ( en/(k+1))\) and \(t>1\), then there exist two positive constants \(\theta _l, \theta _u \in (0,2)\) such that

$$\begin{aligned} \theta _l \Vert \varvec{x}\Vert _2^2 \le \min _{I \subseteq [m], \left|I\right|\ge m/2} \Vert \varvec{A}_I \varvec{x}\Vert _2^2 \le \max _{I \subseteq [m], \left|I\right|\ge m/2}\Vert \varvec{A}_I \varvec{x}\Vert _2^2 \le \theta _u \Vert \varvec{x}\Vert _2^2 \end{aligned}$$
(15)

holds for all \((tk+1)\)-sparse vectors \(\varvec{x} \in \mathbb R^n\) with probability at least \(1 - \exp (-cm)\). Here, \(c>0\) is a universal constant. Note that \(\varvec{A}' \varvec{x}' = \varvec{A} \varvec{x}\). We immediately obtain (14) for the case where \(z=0\).

Next, we turn to the case where \(z \ne 0\). A simple calculation shows that

$$\begin{aligned} \Vert \varvec{A}'_I \varvec{x}'\Vert _2^2 = \Vert \varvec{A}_I \varvec{x} + z\varvec{b}_I\Vert _2^2 = \Vert \varvec{A}_I \varvec{x}\Vert _2^2 + 2z\langle \varvec{A}_I \varvec{x}, \varvec{b}_I \rangle + z^2\Vert \varvec{b}_I\Vert _2^2 \end{aligned}$$
(16)

for any \(I \subseteq [m]\). Denote \(\varvec{A} = \begin{bmatrix} {\varvec{a}}_1,\ldots , {\varvec{a}}_m \end{bmatrix}^\textrm{T}\). Note that \(\sqrt{m} {\varvec{a}}_j \sim \mathcal N(0, I_n)\). Taking \(\zeta = \frac{\min (\theta _l,\alpha ^2)}{200\beta }\) in Lemma 5, we obtain that there exists a constant \(C>0\) depending only on \(\theta _l,\, \alpha ,\, \beta \) such that when \(m\ge Ct(k+1)\log (en/k)\), with probability at least \(1-3\exp (-c_1 m)\), it holds

$$\begin{aligned} \left|\langle \varvec{A}_I \varvec{x}, \varvec{b}_I \rangle \right| = \left|\langle \varvec{A} {\varvec{x}}, \varvec{b}_I \rangle \right| \le \frac{\min \{\theta _l,\alpha ^2\}}{200\beta } \Vert \varvec{x}\Vert _2\Vert \varvec{b}\Vert _2 \end{aligned}$$
(17)

for all \((tk+1)\)-sparse vectors \(\varvec{x}\) and all \(I \subseteq [m]\). Here, we view \({\varvec{b}}_I={\varvec{b}}\mathbb {I}_I\in {\mathbb R}^m\) (\({\mathbb I}_I(j)=1\) if \(j\in { I}\) and 0 if \(j\notin { I}\)), and \(c_1 > 0\) is a constant depending only on \(\theta _l,\, \alpha ,\, \beta \). Note that the vector \(\varvec{b}\) satisfies

$$\begin{aligned} \alpha \le \Vert \varvec{b}_I \Vert _2 \le \beta \end{aligned}$$
(18)

for all \(I \subseteq [m]\) with \(\left|I\right|\ge m/2\). Putting (15), (17) and (18) into (16), we obtain that when \(m \ge Ct(k+1)\log (en/k)\), with probability at least \(1-4\exp (-cm)\), the following two inequalities

$$\begin{aligned} \Vert \varvec{A}'_I \varvec{x}'\Vert _2^2 \ge \theta _l \Vert \varvec{x}\Vert _2^2 - 2\left|z\right|\frac{\min \{\theta _l, \alpha ^2\}}{200\beta }\Vert \varvec{x}\Vert _2 \beta + \alpha ^2 z^2 \ge 0.99\min \{\theta _l, \alpha ^2\} \Vert \varvec{x}'\Vert _2^2, \end{aligned}$$

and

$$\begin{aligned} \Vert \varvec{A}'_I \varvec{x}'\Vert _2^2 \le \theta _u \Vert \varvec{x}\Vert _2^2 + 2\left|z\right|\frac{\min \{\theta _l, \alpha ^2\}}{200\beta }\Vert \varvec{x}\Vert _2 \beta + \beta ^2 z^2 \le 1.01 \max \{\theta _u, \beta ^2\} \Vert \varvec{x}'\Vert _2^2 \end{aligned}$$

hold for all \((tk+1)\)-sparse vectors \(\varvec{x}' \in \mathbb R^{n+1}\) and for all \(I \subseteq [m]\) with \(\left|I\right|\ge m/2\). Here, \(c > 0\) is a constant depending only on \(\theta _l,\, \alpha ,\, \beta \). In other words, we have

$$\begin{aligned} \theta '_l \Vert \varvec{x}' \Vert _2^2 \le \min _{I \subseteq [m], \left|I\right|\ge m/2} \Vert \varvec{A}'_I \varvec{x}'\Vert _2^2 \le \max _{I \subseteq [m], \left|I\right|\ge m/2} \Vert \varvec{A}'_I \varvec{x}' \Vert _2^2 \le \theta '_u \Vert \varvec{x}'\Vert _2^2 \end{aligned}$$

for all \((tk+1)\)-sparse vectors \(\varvec{x}'\) with probability at least \(1 - 4\exp (-cm)\). Here, \(\theta '_l = 0.99\min \{\theta _l, \alpha ^2\}\) and \(\theta '_u = 1.01 \max \{\theta _u, \beta ^2\}\). Combining the above two cases and noting that \(\theta _l\), \(\theta _u >0\) are universal constants, we complete the proof. \(\square \)

2.2 Proof of Theorem 3

Proof

Denote \(\varvec{A}' = \begin{bmatrix} \varvec{A}&\varvec{b} \end{bmatrix}\), \({\widehat{\varvec{x}}}' = ({\widehat{\varvec{x}}}^\textrm{T}, 1)^\textrm{T}\) and \({\varvec{x}}'_0 = (\varvec{x}_0^\textrm{T}, 1)^\textrm{T}\). Set

$$\begin{aligned} I:= \{j: (\langle {\varvec{a}}_j, {\widehat{\varvec{x}}} \rangle + b_j)(\langle {\varvec{a}}_j, \varvec{x}_0 \rangle + b_j) \ge 0\}. \end{aligned}$$
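
In words, I collects the measurements on which the affine quantities for \({\widehat{\varvec{x}}}\) and \({\varvec{x}}_0\) share a sign, so that on I the difference of magnitudes equals the magnitude of the difference. A small real-case sketch (all data random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 12, 5  # illustrative sizes

A = rng.standard_normal((m, n)) / np.sqrt(m)
b = rng.standard_normal(m)
x_hat, x0 = rng.standard_normal(n), rng.standard_normal(n)

u, v = A @ x_hat + b, A @ x0 + b
I = np.flatnonzero(u * v >= 0)  # sign-agreement set

# on I, | |u_j| - |v_j| | = |u_j - v_j|, the fact behind inequality (26) below
assert np.allclose(np.abs(np.abs(u[I]) - np.abs(v[I])), np.abs(u[I] - v[I]))
```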

We next divide the proof into the following two cases.

Case 1: \(\left|I\right| \ge m/2\). Set \({\varvec{h}}= {\widehat{\varvec{x}}}' - {\varvec{x}}'_0\). For any \(a>1\), we decompose \({\varvec{h}}\) into the sum of \({\varvec{h}}_{T_0}, {\varvec{h}}_{T_1}, \dots \), where \(T_0\) is the index set consisting of the indices of the \(k+1\) largest coordinates of \({\varvec{x}}'_0\) in magnitude, \(T_1\) is the index set corresponding to the \(a(k+1)\) largest coordinates of \({\varvec{h}}_{T_0^c}\) in magnitude, \(T_2\) is the index set corresponding to the \(a(k+1)\) largest coordinates of \({\varvec{h}}_{(T_0 \cup T_1)^c}\) in magnitude, and so on. For simplicity, we denote \(T_{jl}:= T_j \cup T_l\). To prove the theorem, we only need to give an upper bound for \(\Vert {{\varvec{h}}}\Vert _2\). Observe that

$$\begin{aligned} \Vert {{\varvec{h}}}\Vert _2 \le \Vert {{\varvec{h}}_{T_{01}}}\Vert _2 + \Vert {{\varvec{h}}-{\varvec{h}}_{T_{01}}}\Vert _2. \end{aligned}$$
(19)

We claim that the following holds:

$$\begin{aligned} \Vert {{\varvec{h}}-{\varvec{h}}_{T_{01}}}\Vert _2 \le \frac{1}{\sqrt{a}}\Vert {\varvec{h}}_{T_{01}} \Vert _2 + \frac{2\sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}} \end{aligned}$$
(20)

and

$$\begin{aligned} \Vert {{\varvec{h}}_{T_{01}}}\Vert _2 \le \frac{2}{\sqrt{\theta _l}- \sqrt{\theta _u}/\sqrt{a}} \cdot \left( \epsilon + \frac{\sqrt{\theta _u} \sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}}\right) . \end{aligned}$$
(21)

Here, \(\theta _l\) and \(\theta _u\) are the strong RIP constants of \(\varvec{A}'\). Putting (20) and (21) into (19), we obtain that

$$\begin{aligned} \Vert {{\varvec{h}}}\Vert _2 \le \frac{2\left( 1+1/\sqrt{a}\right) }{\sqrt{\theta _l}- \sqrt{\theta _u}/\sqrt{a}} \epsilon + \left( \frac{2(1+1/\sqrt{a})\sqrt{\theta _u}}{\sqrt{\theta _l}- \sqrt{\theta _u}/\sqrt{a}} +2\right) \frac{\sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}}. \end{aligned}$$

It remains to prove claims (20) and (21). Since \({\widehat{\varvec{x}}}\) is the solution to the \(\ell _1\) minimization program (4), we have

$$\begin{aligned} \Vert {\varvec{x}}'_0 \Vert _1 \ge \Vert {\widehat{\varvec{x}}}' \Vert _1&= \Vert {\varvec{x}}'_0 + {\varvec{h}} \Vert _1 = \Vert ({\varvec{x}}'_0 + {\varvec{h}})_{T_0} \Vert _1 + \Vert ({\varvec{x}}'_0 + {\varvec{h}})_{T_0^c} \Vert _1 \\ {}&\ge \Vert {\varvec{x}}'_{0,T_0} \Vert _1 - \Vert {\varvec{h}}_{T_0} \Vert _1 + \Vert {\varvec{h}}_{T_0^c} \Vert _1 - \Vert {\varvec{x}}'_{0,T_0^c} \Vert _1. \end{aligned}$$

Therefore,

$$\begin{aligned} \Vert {\varvec{h}}_{T_0^c}\Vert _1 \le \Vert {\varvec{h}}_{T_0} \Vert _1 + 2\Vert {\varvec{x}}'_{0,T_0^c} \Vert _1. \end{aligned}$$
(22)

From the definition of \(T_j\), we obtain that, for all \(j \ge 2\),

$$\begin{aligned} \Vert {\varvec{h}}_{T_j} \Vert _2 \le \sqrt{a(k+1)}\Vert {\varvec{h}}_{T_j} \Vert _\infty = \frac{a(k+1)}{\sqrt{a(k+1)}}\Vert {\varvec{h}}_{T_j} \Vert _\infty \le \frac{\Vert {\varvec{h}}_{T_{j-1}} \Vert _1}{\sqrt{a(k+1)}}. \end{aligned}$$

It then gives

$$\begin{aligned} \Vert {\varvec{h}}_{T_{01}^c} \Vert _2 \le \sum _{j \ge 2}\Vert {\varvec{h}}_{T_j} \Vert _2 \le \frac{1}{\sqrt{a(k+1)}}\sum _{j \ge 2}\Vert {\varvec{h}}_{T_{j-1}} \Vert _1 = \frac{1}{\sqrt{a(k+1)}}\Vert {\varvec{h}}_{T_0^c} \Vert _1. \end{aligned}$$
(23)

Putting (22) into (23), we obtain the conclusion of claim (20), namely,

$$\begin{aligned} \begin{aligned} \Vert {\varvec{h}}_{T_{01}^c} \Vert _2&\le \frac{1}{\sqrt{a(k+1)}}\Vert {\varvec{h}}_{T_0^c} \Vert _1 \le \frac{\Vert {\varvec{h}}_{T_0} \Vert _1 + 2\Vert {\varvec{x}}'_{0,T_0^c} \Vert _1}{\sqrt{a(k+1)}} \\ {}&\le \frac{1}{\sqrt{a}} \Vert {\varvec{h}}_{T_0} \Vert _2 + \frac{2\sigma _{k+1}({\varvec{x}}'_0)_1}{\sqrt{a(k+1)}} \le \frac{1}{\sqrt{a}}\Vert {\varvec{h}}_{T_{01}} \Vert _2 + \frac{2\sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}}, \end{aligned} \end{aligned}$$
(24)

where the third inequality follows from the Cauchy–Schwarz inequality, and the last one uses the fact \(\sigma _{k+1}({\varvec{x}}'_0)_1 \le \sigma _k({\varvec{x}}_0)_1\), which holds by the definitions of \({\varvec{x}}'_0\) and \(\sigma _k(\cdot )_1\).
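
The bookkeeping above only uses how the blocks \(T_0, T_1, T_2, \dots \) are formed; a small sketch of this decomposition (sizes and data hypothetical):

```python
import numpy as np

def blocks(x0p: np.ndarray, h: np.ndarray, k: int, a: int):
    """Index sets T_0, T_1, T_2, ... used in Case 1.

    T_0 holds the k+1 largest entries of x0' in magnitude; the complement
    is chopped into blocks of a*(k+1) indices, ordered by |h| decreasing.
    """
    T0 = np.argsort(-np.abs(x0p))[: k + 1]
    rest = np.setdiff1d(np.arange(x0p.size), T0)
    rest = rest[np.argsort(-np.abs(h[rest]))]
    step = a * (k + 1)
    return [T0] + [rest[i : i + step] for i in range(0, rest.size, step)]

rng = np.random.default_rng(5)
x0p, h = rng.standard_normal(21), rng.standard_normal(21)
for j, T in enumerate(blocks(x0p, h, k=2, a=2)):
    print(f"T_{j}:", np.sort(T))
```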

We next turn to prove the claim (21). Observe that

$$\begin{aligned} \Vert \varvec{A}'_I {\varvec{h}} \Vert _2 \ge \Vert \varvec{A}'_I {\varvec{h}}_{T_{01}} \Vert _2 - \Vert \varvec{A}'_I {\varvec{h}}_{T_{01}^c} \Vert _2. \end{aligned}$$
(25)

For the left hand side of (25), by the definition of I, we have

$$\begin{aligned} \begin{aligned} \Vert \varvec{A}'_I {\varvec{h}} \Vert _2&= \Vert \vert \varvec{A}'_I {\widehat{\varvec{x}}}'\vert - \vert \varvec{A}'_I {\varvec{x}}'_0\vert \Vert _2 \\ {}&\le \Vert \vert \varvec{A}' {\widehat{\varvec{x}}}'\vert - \vert \varvec{A}' {\varvec{x}}'_0\vert \Vert _2 \\ {}&\le \Vert \vert \varvec{A}' {\widehat{\varvec{x}}}'\vert - { \varvec{ y}}\Vert _2 +\Vert \vert \varvec{A}' {\varvec{x}}'_0\vert - { \varvec{ y}}\Vert _2 \\&\le 2\epsilon . \end{aligned} \end{aligned}$$
(26)

For the first term of the right hand side of (25), since the matrix \(\varvec{A}'\) satisfies strong RIP of order \((a+1)(k+1)\) with constants \(\theta _l,\theta _u >0\), we immediately have

$$\begin{aligned} \Vert \varvec{A}'_I {\varvec{h}}_{T_{01}} \Vert _2 \ge \sqrt{\theta _l} \Vert {{\varvec{h}}_{T_{01}}}\Vert _2. \end{aligned}$$
(27)

To give an upper bound for the term \( \Vert \varvec{A}'_I {\varvec{h}}_{T_{01}^c} \Vert _2\), note that \(\Vert {\varvec{h}}_{T_{01}^c} \Vert _\infty \le \Vert {\varvec{h}}_{T_1} \Vert _1/a(k+1)\). Let \(\theta :=\max \left( \Vert {\varvec{h}}_{T_1} \Vert _1/a(k+1), \Vert {\varvec{h}}_{T_{01}^c} \Vert _1/a(k+1) \right) \). Then by Lemma 2, we can decompose the vector \({\varvec{h}}_{T_{01}^c}\) into the following form:

$$\begin{aligned} {\varvec{h}}_{T_{01}^c} =\sum _{j=1}^N \lambda _j {\varvec{u}}_j, \quad \text {with} \quad 0\le \lambda _j\le 1, \quad \sum _{j=1}^N \lambda _j=1, \end{aligned}$$

where \({\varvec{u}}_j\) are \(a(k+1)\)-sparse vectors satisfying

$$\begin{aligned} \Vert {{\varvec{u}}_j}\Vert _1=\Vert { {\varvec{h}}_{T_{01}^c}}\Vert _1, \quad \Vert {\varvec{u}}_j \Vert _\infty \le \theta . \end{aligned}$$

Therefore, we have

$$\begin{aligned} \Vert {{\varvec{u}}_j}\Vert _2 \le \sqrt{\theta \Vert { {\varvec{h}}_{T_{01}^c}}\Vert _1}. \end{aligned}$$

We notice from (22) that

$$\begin{aligned} \Vert { {\varvec{h}}_{T_{01}^c}}\Vert _1 \le \Vert { {\varvec{h}}_{T_{0}^c}}\Vert _1 \le \Vert {\varvec{h}}_{T_0} \Vert _1 + 2\sigma _k({\varvec{x}}_0)_1. \end{aligned}$$

Thus, if \(\theta = \Vert {\varvec{h}}_{T_1} \Vert _1/a(k+1)\), then we have

$$\begin{aligned} \Vert {{\varvec{u}}_j}\Vert _2 \le \sqrt{\frac{\Vert {\varvec{h}}_{T_1} \Vert _1 \Vert {{\varvec{h}}_{T_{01}^c}}\Vert _1}{a(k+1)}} \le \sqrt{\frac{\Vert {\varvec{h}}_{T_0^c} \Vert _1 \Vert { {\varvec{h}}_{T_{01}^c}}\Vert _1}{a(k+1)}} \le \frac{\Vert {\varvec{h}}_{T_0} \Vert _1 + 2\sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}} \le \frac{\Vert {\varvec{h}}_{T_0} \Vert _2}{\sqrt{a}}+\frac{ 2\sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}}. \end{aligned}$$

If \(\theta =\Vert {\varvec{h}}_{T_{01}^c} \Vert _1/a(k+1) \), then

$$\begin{aligned} \Vert {{\varvec{u}}_j}\Vert _2 \le \frac{\Vert {\varvec{h}}_{T_{01}^c} \Vert _1}{\sqrt{a(k+1)}} \le \frac{\Vert {\varvec{h}}_{T_0} \Vert _2}{\sqrt{a}}+\frac{2\sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}}. \end{aligned}$$

Therefore, for the second term of the right hand side of (25), it follows from the definition of strong RIP that

$$\begin{aligned} \Vert \varvec{A}'_I {\varvec{h}}_{T_{01}^c} \Vert _2 =\Vert {\sum _{j=1}^N \lambda _j \varvec{A}'_I {\varvec{u}}_j }\Vert _2 \le \sqrt{\theta _u} \sum _{j=1}^N \lambda _j \Vert {{\varvec{u}}_j}\Vert _2 \le \sqrt{\theta _u} \left( \frac{\Vert {\varvec{h}}_{T_0} \Vert _2}{\sqrt{a}}+\frac{2\sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}}\right) . \end{aligned}$$
(28)

Putting (26), (27) and (28) into (25), we immediately obtain

$$\begin{aligned} 2\epsilon \ge \sqrt{\theta _l} \Vert {{\varvec{h}}_{T_{01}}}\Vert _2 - \sqrt{\theta _u}\left( \frac{\Vert {\varvec{h}}_{T_{01}} \Vert _2}{\sqrt{a}}+\frac{2\sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}}\right) , \end{aligned}$$

which gives

$$\begin{aligned} \Vert {{\varvec{h}}_{T_{01}}}\Vert _2 \le \frac{2}{\sqrt{\theta _l}-\sqrt{\theta _u}/\sqrt{a}} \cdot \left( \epsilon + \frac{\sqrt{\theta _u} \sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}}\right) . \end{aligned}$$

Case 2: \(\left|I\right| < m/2\). For this case, denote \({\varvec{h}}^+ = {\widehat{\varvec{x}}}' + {\varvec{x}}'_0\). Replacing \({\varvec{h}}\) and the subset I in Case 1 by \({\varvec{h}}^+\) and \(I^c\), respectively, and applying the same argument, we can obtain

$$\begin{aligned} \Vert {\varvec{h}}^+ \Vert _2 \le \frac{2\left( 1+1/\sqrt{a}\right) }{\sqrt{\theta _l}-\sqrt{\theta _u}/\sqrt{a}} \epsilon + \left( \frac{2(1+1/\sqrt{a})\sqrt{\theta _u}}{\sqrt{\theta _l}-\sqrt{\theta _u}/\sqrt{a}} +2\right) \frac{\sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}}. \end{aligned}$$
(29)

However, recall that \({\widehat{\varvec{x}}}' = ({\widehat{\varvec{x}}}^\textrm{T}, 1)^\textrm{T}\) and \({\varvec{x}}'_0 = (\varvec{x}_0^\textrm{T}, 1)^\textrm{T}\). This means \(\Vert {\varvec{h}}^+ \Vert _2 \ge 2\), which contradicts (29) by the assumption on \(\epsilon \) and \(\sigma _k({\varvec{x}}_0)_1\), i.e., \( K_1\epsilon + K_2 \frac{\sigma _k({\varvec{x}}_0)_1}{\sqrt{a(k+1)}} <2 \). Therefore, Case 2 cannot occur.

Combining the above two cases, we complete our proof. \(\square \)

3 Proof of Theorems 4 and 5

3.1 Proof of Theorem 4

Proof

Without loss of generality, we assume that \(\Vert {\varvec{H}'}\Vert _F=1\). Observe that

$$\begin{aligned} \frac{1}{m}\Vert {\mathcal A}'(\varvec{H}') \Vert _1=\frac{1}{m} \sum _{j=1}^m \left|{\varvec{a}}_j^* \varvec{H} {\varvec{a}}_j+ 2(b_j ( {\varvec{a}}_j^*{\varvec{h}}))_{\Re }\right|:= \frac{1}{m} \sum _{j=1}^m \xi _j. \end{aligned}$$

For any fixed \(\varvec{H} \in \mathbb H^{n\times n}\) and \({\varvec{h}}\in {\mathbb C}^n\), the terms \(\xi _j, j=1,\ldots ,m\) are independent sub-exponential random variables with the maximal sub-exponential norm

$$\begin{aligned} K:=\max _{1\le j\le m}C_1(\Vert {\varvec{H}}\Vert _F+ \Vert b_j \Vert _{\psi _2} \Vert {\varvec{h}} \Vert ) \le C_2 \end{aligned}$$

for some universal constants \(C_1,\, C_2>0\). Here, we use the fact \(\max \left( \Vert {\varvec{H}}\Vert _F,\Vert {\varvec{h}} \Vert \right) \le \Vert {\varvec{H}'}\Vert _F = 1\). For any \(0<\epsilon \le 1\), Bernstein's inequality gives

$$\begin{aligned} {\mathbb P}\Biggl (\Bigl \vert \frac{1}{m} \sum _{j=1}^m \left( \xi _j - {\mathbb E}\xi _j\right) \Bigr \vert \ge \epsilon \Biggr ) \le 2\exp \left( -c\epsilon ^2 m\right) , \end{aligned}$$

where \(c>0\) is a universal constant. According to Lemma 6, we obtain that

$$\begin{aligned} \frac{1}{3} {\mathbb E}\sqrt{ \Vert {\varvec{H}}\Vert _F^2+ \vert b_j\vert ^2 \Vert {\varvec{h}} \Vert ^2 } \le {\mathbb E}\xi _j \le 2 {\mathbb E}\sqrt{3 \Vert {\varvec{H}}\Vert _F^2+ \vert b_j \vert ^2 \Vert {\varvec{h}} \Vert ^2}. \end{aligned}$$

This gives

$$\begin{aligned} \frac{1}{m} \sum _{j=1}^m {\mathbb E}\xi _j \le \frac{2}{m} \sum _{j=1}^m {\mathbb E}\left( \sqrt{3} \Vert {\varvec{H}}\Vert _F+ \vert b_j\vert \Vert {\varvec{h}} \Vert \right) \le 2\sqrt{3} \Vert {\varvec{H}}\Vert _F+2c_2\Vert {\varvec{h}} \Vert \le 2\theta ^+, \end{aligned}$$

where \(\theta ^+:=\max (\sqrt{6}, c_2)\). Here, we use the facts \(\Vert {\varvec{H}'}\Vert _F^2=\Vert {\varvec{H}}\Vert _F^2+2\Vert {\varvec{h}} \Vert ^2=1\), \({\mathbb E}\Vert {\varvec{b}} \Vert _1\le \sqrt{m} {\mathbb E}\Vert {\varvec{b}} \Vert \le c_2m\), and \(\frac{a+b}{\sqrt{2}} \le \sqrt{a^2+b^2} \le a+b\) for any positive numbers \(a,b \in {\mathbb R}\). Similarly, we can obtain

$$\begin{aligned} \frac{1}{m} \sum _{j=1}^m {\mathbb E}\xi _j \ge \frac{1}{3\sqrt{2}}\cdot \frac{1}{m} \sum _{j=1}^m {\mathbb E}\left( \Vert {\varvec{H}}\Vert _F+\vert b_j \vert \Vert {\varvec{h}} \Vert \right) \ge \frac{1}{3\sqrt{2}} \left( \Vert {\varvec{H}}\Vert _F+c_1 \Vert {\varvec{h}} \Vert \right) \ge \frac{\theta ^-}{6}, \end{aligned}$$

where \(\theta ^-:=\min (1, c_1/\sqrt{2})\). Collecting the above estimates, we obtain that, with probability at least \(1-2\exp (-c\epsilon ^2 m)\), the following inequality

$$\begin{aligned} \frac{\theta ^-}{6} -\epsilon \le \frac{1}{m}\Vert {\mathcal A}'(\varvec{H}') \Vert _1\le 2\theta ^+ +\epsilon \end{aligned}$$
(30)

holds for a fixed \(\varvec{H}' \in \mathbb H^{(n+1)\times (n+1)}\). We next show that (30) holds for all \(\varvec{H}' \in \mathcal X\), where

$$\begin{aligned} \mathcal X \!:=\! \left\{ \varvec{H}'\!:=\!\begin{bmatrix} \varvec{H} &{} {\varvec{h}}\\ {\varvec{h}}^* &{} 0 \end{bmatrix} \in \mathbb H^{(n+1)\times (n+1)}: \Vert {\varvec{H}'}\Vert _F=1,\, {\text {rank}}(\varvec{H}) \le 2,\, \Vert \varvec{H} \Vert _{0, 2} \le k,\, \Vert {\varvec{h}} \Vert _0 \le k\right\} . \end{aligned}$$

To this end, we adopt a basic version of a \(\delta \)-net argument. Assume that \(\mathcal N_\delta \) is a \(\delta \)-net of \(\mathcal X\), i.e., for any \(\varvec{H}'=\begin{bmatrix} \varvec{H} &{} {\varvec{h}}\\ {\varvec{h}}^* &{} 0 \end{bmatrix} \in \mathcal X\) there exists an \(\varvec{H}_0':=\begin{bmatrix} \varvec{H}_0 &{} {\varvec{h}}_0 \\ {\varvec{h}}_0^* &{} 0 \end{bmatrix} \in \mathcal N_\delta \) such that \(\Vert {\varvec{H} - \varvec{H}_0}\Vert _F\le \delta \) and \(\Vert {\varvec{h}}-{\varvec{h}}_0 \Vert \le \delta \). Using the same idea as in Lemma 2.1 of [36], we obtain that the covering number of \(\mathcal X\) satisfies

$$\begin{aligned} \left|\mathcal N_\delta \right| \le \left( \frac{9\sqrt{2} en}{\delta k} \right) ^{4k+2}\cdot \left( \begin{array}{l} n \\ k \end{array}\right) \left( 1+\frac{2}{\delta }\right) ^{2k} \le \exp \left( C_3 k\log (en/ \delta k)\right) , \end{aligned}$$

where \(C_3>0\) is a universal constant. Note that \({\varvec{h}}-{\varvec{h}}_0\) has at most 2k nonzero entries. We obtain that if \(m\gtrsim k\log (en/k)\), then with probability at least \(1-3\exp (-c m)\), it holds

$$\begin{aligned} \left|\frac{1}{m}\Vert {\mathcal A}'(\varvec{H}') \Vert _1-\frac{1}{m}\Vert {\mathcal A}'(\varvec{H}_0') \Vert _1\right|&\le \frac{1}{m}\Vert {\mathcal A}'(\varvec{H}'-\varvec{H}'_0) \Vert _1 \\&\le \frac{1}{m}\Vert \mathcal A(\varvec{H}-\varvec{H}_0) \Vert _1 +\frac{2}{m} \sum _{j=1}^m \vert b_j\vert \vert {\varvec{a}}_j^* ({\varvec{h}}-{\varvec{h}}_0)\vert \\ {}&\le \frac{1}{m}\Vert \mathcal A(\varvec{H}-\varvec{H}_0) \Vert _1 \\&\quad + 2\sqrt{\frac{1}{m} \sum _{j=1}^m \vert b_j\vert ^2} \! \sqrt{\frac{1}{m} \sum _{j=1}^m \vert {\varvec{a}}_j^* ({\varvec{h}}-{\varvec{h}}_0)\vert ^2}\\ {}&\le 2.45 \Vert {\varvec{H}-\varvec{H}_0}\Vert _F + 3(c_2+1) \Vert {\varvec{h}}-{\varvec{h}}_0 \Vert \\ {}&\le 3\left( c_2+2\right) \delta , \end{aligned}$$

where the linear map \(\mathcal A(\cdot )\) is defined as \(\mathcal {A}(\varvec{H}):= ({\varvec{a}}_1^*\varvec{H} {\varvec{a}}_1, \dots , {\varvec{a}}_m^*\varvec{H} {\varvec{a}}_m)\), and the fourth inequality follows from the combination of Lemma 3, the fact that \(\frac{1}{m} \sum _{j=1}^m {\varvec{a}}_j {\varvec{a}}_j^* \preceq \frac{3}{2} I_n\) with probability at least \(1-\exp (-c m)\), and

$$\begin{aligned} \frac{1}{m} \sum _{j=1}^m \vert b_j \vert ^2 \le \frac{{\mathbb E}\Vert {\varvec{b}} \Vert ^2}{m}+1\le c_2 +1 \end{aligned}$$

with probability at least \(1-2\exp (-c m)\). Choosing \(\epsilon :=\frac{\theta ^-}{48}\), \(\delta :=\frac{\theta ^-}{48(c_2+2)}\), and taking the union bound, we obtain that the following inequality

$$\begin{aligned} \frac{\theta ^-}{12} \le \frac{1}{m}\Vert {\mathcal A}'(\varvec{H}') \Vert _1\le 3\theta ^+ \quad \text{ for } \text{ all } \quad \varvec{H}'\in \mathcal X \end{aligned}$$

holds with probability at least

$$\begin{aligned} 1-3\exp (-cm)-2 \exp \left( C_3 k \log (en/ \delta k)\right) \cdot \exp (-c \epsilon ^2 m) \ge 1-5 \exp (-c' m), \end{aligned}$$

provided \(m\ge C' k\log (en/k)\), where \(C',\, c'>0\) are constants depending only on \(c_1\) and \(c_2\). \(\square \)

3.2 Proof of Theorem 5

Proof

The proof of this theorem is adapted from that of Theorem 1.3 in [36]. Note that the \(\ell _1\)-minimization problem we consider is

$$\begin{aligned} \mathop {\textrm{argmin}}\limits _{{\varvec{x}}\in {\mathbb C}^n} \quad \Vert {\varvec{x}} \Vert _1 \quad \mathrm {s.t.} \quad \Vert \mathcal A'({\varvec{x}}')-{\tilde{\varvec{y}}} \Vert \le \epsilon \quad \text{ with } \quad {\varvec{x}}'= \begin{pmatrix} {\varvec{x}}\\ 1\end{pmatrix}. \end{aligned}$$
(31)

Here, with some abuse of notation, we set

$$\begin{aligned} \mathcal A'({\varvec{x}}'):=\mathcal A'({\varvec{x}}' {\varvec{x}}'^*) = \left( \left|{\varvec{a}}_1'^* {\varvec{x}}'\right|^2, \ldots , \left|{\varvec{a}}_m'^* {\varvec{x}}'\right|^2\right) \quad \text{ with } \quad {\varvec{a}}_j':=\begin{pmatrix} {\varvec{a}}_j\\ b_j\end{pmatrix},\quad j=1,\ldots ,m. \end{aligned}$$

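A quick numeric check of the lifting identity behind this notation: \({\varvec{a}}'^*({\varvec{x}}'{\varvec{x}}'^*){\varvec{a}}' = |{\varvec{a}}'^*{\varvec{x}}'|^2\), so the lifted measurements are exactly the squared affine magnitudes (with the paper's inner-product convention absorbing the conjugations). All data below are random and illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 30

a = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
b = rng.standard_normal() + 1j * rng.standard_normal()
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

a_prime = np.concatenate([a, [b]])    # a' = (a; b)
x_prime = np.concatenate([x, [1.0]])  # x' = (x; 1)

# quadratic form a'^* (x' x'^*) a' versus squared magnitude |a'^* x'|^2
lifted = np.real(a_prime.conj() @ np.outer(x_prime, x_prime.conj()) @ a_prime)
direct = np.abs(a_prime.conj() @ x_prime) ** 2
assert np.isclose(lifted, direct)
```
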
Let \({\widehat{\varvec{x}}}\in {\mathbb C}^n\) be a solution to (31). Without loss of generality, we assume \(\langle {\widehat{\varvec{x}}}',{\varvec{x}}_0' \rangle \ge 0\) (otherwise, we can replace \({\varvec{x}}_0'\) by \(e^{\textrm{i}\theta } {\varvec{x}}_0'\) for an appropriate \(\theta \)), where \({\widehat{\varvec{x}}}'= \begin{pmatrix} {\widehat{\varvec{x}}}\\ 1\end{pmatrix}\) and \({\varvec{x}}_0'= \begin{pmatrix} {\varvec{x}}_0 \\ 1\end{pmatrix}\). Set

$$\begin{aligned} \hat{\varvec{X}}':={\widehat{\varvec{x}}}'{\widehat{\varvec{x}}}'^*=\begin{pmatrix} {\widehat{\varvec{x}}}{\widehat{\varvec{x}}}^* &{} {\widehat{\varvec{x}}}\\ {\widehat{\varvec{x}}}^*&{} 1 \end{pmatrix} \end{aligned}$$

and

$$\begin{aligned} \varvec{H}':={\widehat{\varvec{x}}}'{\widehat{\varvec{x}}}'^*-{\varvec{x}}'_0{{\varvec{x}}'_0}^*=\begin{pmatrix} {\widehat{\varvec{x}}}{\widehat{\varvec{x}}}^* -{\varvec{x}}_0{\varvec{x}}_0^* &{} {\widehat{\varvec{x}}}-{\varvec{x}}_0 \\ {\widehat{\varvec{x}}}^* -{\varvec{x}}_0^*&{} 0 \end{pmatrix}:=\begin{pmatrix} \varvec{H} &{} {\varvec{h}}\\ {\varvec{h}}^*&{} 0\end{pmatrix}. \end{aligned}$$

Therefore, it suffices to give an upper bound for \(\Vert {\varvec{H}'}\Vert _F\). Denote \(T_0:={\text {supp}}({\varvec{x}}_0)\) and \(T'_0:=T_0 \cup \left\{ n+1\right\} \). Let \(T_1\) be the index set corresponding to the indices of the ak largest elements of \({\widehat{\varvec{x}}}_{T_0^c}\) in magnitude, let \(T_2\) contain the indices of the next ak largest elements, and so on. Set \(T_{01}:=T_0 \cup T_1\), \(T'_{01}:=T'_0 \cup T_1\), \( \bar{\varvec{h}}:={\varvec{h}}_{T_{01}}\), \( \bar{\varvec{H}}=\varvec{H}_{T_{01}, T_{01}}\), and \( \bar{\varvec{H}}':= \varvec{H}'_{T'_{01}, T'_{01}}\). Noting that

$$\begin{aligned} \Vert {\varvec{H}'}\Vert _F\le \Vert { \bar{\varvec{H}}'}\Vert _F +\Vert {\varvec{H}'- \bar{\varvec{H}}'}\Vert _F, \end{aligned}$$
(32)

we next consider the terms \(\Vert { \bar{\varvec{H}}'}\Vert _F\) and \(\Vert {\varvec{H}'- \bar{\varvec{H}}'}\Vert _F\). We claim that

$$\begin{aligned} \Vert {\varvec{H}'- \bar{\varvec{H}}'}\Vert _F \le \left( \frac{1}{a} +\frac{4}{\sqrt{a}}\right) \Vert { \bar{\varvec{H}}'}\Vert _F \end{aligned}$$
(33)

and

$$\begin{aligned} \Vert { \bar{\varvec{H}}'}\Vert _F \le \frac{1}{c - C\left( \frac{4}{\sqrt{a}}+\frac{1}{a}\right) } \cdot \frac{2\epsilon }{\sqrt{m}}. \end{aligned}$$
(34)

Combining (32), (33) and (34), we obtain that

$$\begin{aligned} \Vert {\varvec{H}'}\Vert _F\le \frac{\frac{1}{a} +\frac{4}{\sqrt{a}}+1}{c - C\left( \frac{4}{\sqrt{a}}+\frac{1}{a}\right) } \cdot \frac{2\epsilon }{\sqrt{m}}. \end{aligned}$$

According to Lemma 4, we immediately have

$$\begin{aligned} \min _{\theta \in {\mathbb R}} \Vert {{\widehat{\varvec{x}}}'-e^{\textrm{i}\theta } {\varvec{x}}'_0}\Vert _2 \le \frac{\sqrt{2}\Vert \varvec{H}' \Vert }{\Vert {\varvec{x}}_0 \Vert +1} \le \frac{\frac{1}{a} +\frac{4}{\sqrt{a}}+1}{c - C\left( \frac{4}{\sqrt{a}}+\frac{1}{a}\right) } \cdot \frac{2\sqrt{2} \epsilon }{\left( \Vert {\varvec{x}}_0 \Vert +1\right) \sqrt{m}}. \end{aligned}$$

By the definition of \({\widehat{\varvec{x}}}'\) and \({\varvec{x}}'_0\), we arrive at the conclusion.

It remains to prove the claims (33) and (34). Note that

$$\begin{aligned} \Vert {\varvec{H}'- \bar{\varvec{H}}'}\Vert _F \le \sum _{i\ge 2, j\ge 2} \Vert {\varvec{H}_{T_i, T_j}}\Vert _F + 2 \sum _{j\ge 2} \Vert {\varvec{H}'_{T'_0, T_j}}\Vert _F+2 \sum _{j\ge 2} \Vert {\varvec{H}'_{T_1, T_j}}\Vert _F . \end{aligned}$$
(35)

We first give an upper bound for the term \(\sum _{i\ge 2, j\ge 2} \Vert {\varvec{H}_{T_i, T_j}}\Vert _F\). Noting that \({\varvec{x}}_0\) is a k-sparse vector and \({\widehat{\varvec{x}}}\in {\mathbb C}^n\) is the solution to (31), we obtain that

$$\begin{aligned} \Vert {\varvec{x}}_0 \Vert _1 \ge \Vert {{\widehat{\varvec{x}}}}\Vert _1=\Vert {{\widehat{\varvec{x}}}_{T_0}}\Vert _1+\Vert {{\widehat{\varvec{x}}}_{T_0^c}}\Vert _1, \end{aligned}$$

which implies \(\Vert {{\widehat{\varvec{x}}}_{T_0^c}}\Vert _1 \le \Vert {{\widehat{\varvec{x}}}_{T_0}-{\varvec{x}}_0}\Vert _1\). Moreover, by the definition of \(T_j\), we know that for all \(j\ge 2\), it holds \( \Vert {\widehat{\varvec{x}}}_{T_j} \Vert _2 \le \frac{\Vert {\widehat{\varvec{x}}}_{T_{j-1}} \Vert _1}{\sqrt{ak}}\). It then implies

$$\begin{aligned} \sum _{j\ge 2} \Vert {{\widehat{\varvec{x}}}_{T_{j}}}\Vert _2 \le \frac{1}{\sqrt{ak}} \sum _{j\ge 2} \Vert {{\widehat{\varvec{x}}}_{T_{j-1}}}\Vert _1 \le \frac{1}{\sqrt{ak}} \Vert {\widehat{\varvec{x}}}_{T_0^c} \Vert _1 \le \frac{1}{\sqrt{a}} \Vert {{\widehat{\varvec{x}}}_{T_0}-{\varvec{x}}_0}\Vert _2. \end{aligned}$$
(36)

Therefore, the first term of (35) can be estimated as

$$\begin{aligned} \begin{aligned} \sum _{i\ge 2, j\ge 2} \Vert {\varvec{H}_{T_i, T_j}}\Vert _F= \sum _{i\ge 2, j\ge 2} \Vert {{\widehat{\varvec{x}}}_{T_i}}\Vert _2 \Vert {{\widehat{\varvec{x}}}_{T_j}}\Vert _2 = \left( \sum _{j\ge 2} \Vert {{\widehat{\varvec{x}}}_{T_{j}}}\Vert _2\right) ^2 \le \frac{1}{ak} \Vert {\widehat{\varvec{x}}}_{T_{0}^c} \Vert _1^2 \\ = \frac{1}{ak} \Vert \varvec{H}_{T_0^c, T_0^c} \Vert _1 \le \frac{1}{ak} \Vert \varvec{H}_{T_0, T_0} \Vert _1 \le \frac{1}{a} \Vert { \bar{\varvec{H}}'}\Vert _F, \end{aligned} \end{aligned}$$
(37)

where the second inequality follows from

$$\begin{aligned} \Vert {\varvec{H}-\varvec{H}_{T_0,T_0}}\Vert _1 =\Vert {{\widehat{\varvec{x}}}{\widehat{\varvec{x}}}^*-({\widehat{\varvec{x}}}{\widehat{\varvec{x}}}^*)_{T_0,T_0}}\Vert _1 \le \Vert {{\varvec{x}}_0{{\varvec{x}}_0}^*}\Vert _1-\Vert {({\widehat{\varvec{x}}}{\widehat{\varvec{x}}}^*)_{T_0,T_0}}\Vert _1 \le \Vert {\varvec{H}_{T_0, T_0}}\Vert _1. \end{aligned}$$

Here, the first inequality comes from \(\Vert {{\widehat{\varvec{x}}}}\Vert _1 \le \Vert {{\varvec{x}}_0}\Vert _1\).

For the second term and the third term of (35), we obtain that

$$\begin{aligned} \begin{aligned} \sum _{j\ge 2} \Vert {\varvec{H}'_{T'_0, T_j}}\Vert _F+\sum _{j\ge 2} \Vert {\varvec{H}'_{T_1, T_j}}\Vert _F&= \Vert {\widehat{\varvec{x}}}'_{T'_0} \Vert \sum _{j\ge 2} \Vert {\widehat{\varvec{x}}}'_{T_j} \Vert + \Vert {\widehat{\varvec{x}}}'_{T_1} \Vert \sum _{j\ge 2} \Vert {\widehat{\varvec{x}}}'_{T_j} \Vert \\ {}&\le \frac{1}{\sqrt{a}} \Vert {{\widehat{\varvec{x}}}'_{T'_0}-{\varvec{x}}'_0}\Vert _2 \left( \Vert {{\widehat{\varvec{x}}}'_{T'_0}}\Vert _2+\Vert {{\widehat{\varvec{x}}}'_{T_1}}\Vert _2\right) \\ {}&\le \frac{\sqrt{2}}{\sqrt{a}} \Vert {{\widehat{\varvec{x}}}'_{T'_{01}}-{\varvec{x}}'_0}\Vert _2 \Vert {{\widehat{\varvec{x}}}'_{T'_{01}}}\Vert _2 \\&\le \frac{2}{\sqrt{a}} \Vert { \bar{\varvec{H}}'}\Vert _F, \end{aligned} \end{aligned}$$
(38)

where the first inequality follows from (36) due to \({\widehat{\varvec{x}}}'_{T_j}={\widehat{\varvec{x}}}_{T_j}\) for all \(j\ge 1\), and the last inequality comes from Lemma 4. Putting (37) and (38) into (35), we obtain that

$$\begin{aligned} \Vert {\varvec{H}'- \bar{\varvec{H}}'}\Vert _F \le \left( \frac{1}{a} +\frac{4}{\sqrt{a}}\right) \Vert { \bar{\varvec{H}}'}\Vert _F. \end{aligned}$$

This proves the claim (33).

Finally, we turn to prove the claim (34). Note that \(\Vert \mathcal A'({\widehat{\varvec{x}}}')-{\tilde{\varvec{y}}} \Vert \le \epsilon \) and \({\tilde{\varvec{y}}}:=\mathcal A'({\varvec{x}}_0')+{\varvec{w}}\) with \(\Vert {\varvec{w}} \Vert \le \epsilon \), which implies

$$\begin{aligned} \Vert {\mathcal A'(\varvec{H}')}\Vert _2 \le \Vert {\mathcal A'({\widehat{\varvec{x}}}')-{\tilde{\varvec{y}}}}\Vert _2+ \Vert {\mathcal A'({\varvec{x}}_0')-{\tilde{\varvec{y}}}}\Vert _2 \le 2\epsilon . \end{aligned}$$

Thus, we have

$$\begin{aligned} \frac{2\epsilon }{\sqrt{m}} \ge \frac{1}{\sqrt{m}} \Vert {\mathcal A'(\varvec{H}')}\Vert _2 \ge \frac{1}{m} \Vert {\mathcal A'(\varvec{H}')}\Vert _1 \ge \frac{1}{m} \Vert {\mathcal A'( \bar{\varvec{H}}')}\Vert _1 - \frac{1}{m} \Vert {\mathcal A'(\varvec{H}'- \bar{\varvec{H}}')}\Vert _1. \end{aligned}$$
(39)

Recall that \( \bar{\varvec{H}}':=\begin{pmatrix} \bar{\varvec{H}}&{} \bar{\varvec{h}}\\ \bar{\varvec{h}}^*&{} 0\end{pmatrix}\) with \({\text {rank}}( \bar{\varvec{H}})\le 2\), \(\Vert \bar{\varvec{H}} \Vert _{0,2}\le (a+1)k\), and \(\Vert \bar{\varvec{h}} \Vert _0 \le (a+1)k\). It then follows from the RIP of \(\mathcal A'\) that

$$\begin{aligned} \Vert {\mathcal A'( \bar{\varvec{H}}')}\Vert _1 \ge c \Vert { \bar{\varvec{H}}'}\Vert _F. \end{aligned}$$
(40)

To prove (34), it suffices to give an upper bound for the term \(\frac{1}{m} \Vert {\mathcal A'(\varvec{H}'- \bar{\varvec{H}}')}\Vert _1\). Observe that

$$\begin{aligned} \varvec{H}'- \bar{\varvec{H}}' = (\varvec{H}'_{T'_0, {T'^c_{01}}}+\varvec{H}'_{{T'^c_{01}},T'_0}) + (\varvec{H}'_{T_1, T'^c_{01}}+\varvec{H}'_{ T'^c_{01},T_1})+\varvec{H}'_{T'^c_{01}, T'^c_{01}}. \end{aligned}$$
(41)

Since

$$\begin{aligned} \varvec{H}'_{T'_0, {T'^c_{01}}}+\varvec{H}'_{ {T'^c_{01}},T'_0} =\sum _{j\ge 2} (\varvec{H}'_{T'_0, T_{j}}+\varvec{H}'_{ T_{j},T'_0}) =\sum _{j\ge 2} \begin{pmatrix} {\widehat{\varvec{x}}}_{T_0}{\widehat{\varvec{x}}}_{T_j}^*+{\widehat{\varvec{x}}}_{T_j}{\widehat{\varvec{x}}}_{T_0}^* &{} {\widehat{\varvec{x}}}_{T_j} \\ {\widehat{\varvec{x}}}_{T_j}^* &{} 0\end{pmatrix}, \end{aligned}$$

then the RIP of \(\mathcal A'\) implies

$$\begin{aligned} \begin{aligned} \frac{1}{m} \Vert {\mathcal A'(\varvec{H}'_{T'_0, {T'^c_{01}}}+\varvec{H}'_{ {T'^c_{01}},T'_0} )}\Vert _1&\le C \sum _{j\ge 2} \left( \Vert {{\widehat{\varvec{x}}}_{T_0}{\widehat{\varvec{x}}}_{T_j}^*+{\widehat{\varvec{x}}}_{T_j}{\widehat{\varvec{x}}}_{T_0}^*}\Vert _F +2\Vert { {\widehat{\varvec{x}}}_{T_j}}\Vert _2\right) \\ {}&\le 2\sqrt{2} C \Vert {{\widehat{\varvec{x}}}'_{T'_0}}\Vert _2 \sum _{j\ge 2} \Vert {{\widehat{\varvec{x}}}_{T_{j}}}\Vert _2 \\ {}&\le \frac{2\sqrt{2}}{\sqrt{a}}C \Vert {{\widehat{\varvec{x}}}'_{T'_0}}\Vert _2 \Vert {{\widehat{\varvec{x}}}'_{T'_{01}}-{\varvec{x}}'_0}\Vert _2. \end{aligned} \end{aligned}$$
(42)

Similarly, we can obtain

$$\begin{aligned} \frac{1}{m} \Vert {\mathcal A'(\varvec{H}'_{T_1, T'^c_{01}}+\varvec{H}'_{ T'^c_{01},T_1})}\Vert _1 \le \frac{2\sqrt{2}}{\sqrt{a}}C \Vert {{\widehat{\varvec{x}}}'_{T_1}}\Vert _2 \Vert {{\widehat{\varvec{x}}}'_{T'_{01}}-{\varvec{x}}'_0}\Vert _2. \end{aligned}$$
(43)

Finally, observe that \(\frac{1}{m} \Vert {\mathcal A' (\varvec{H}'_{T'^c_{01}, T'^c_{01}})}\Vert _1 = \frac{1}{m} \Vert {\mathcal A (\varvec{H}_{T^c_{01}, T^c_{01}})}\Vert _1\). Using the same technique as in [36, Eq. (3.16)], we can obtain

$$\begin{aligned} \frac{1}{m} \Vert {\mathcal A'(\varvec{H}'_{T'^c_{01}, T'^c_{01}})}\Vert _1 \le \frac{C}{a} \Vert { \bar{\varvec{H}}'}\Vert _F. \end{aligned}$$
(44)

Putting (42), (43) and (44) into (41), we have

$$\begin{aligned} \frac{1}{m} \Vert {\mathcal A'(\varvec{H}'- \bar{\varvec{H}}')}\Vert _1 \le \frac{4}{\sqrt{a}}C \Vert {{\widehat{\varvec{x}}}'_{T'_{01}}}\Vert _2 \Vert {{\widehat{\varvec{x}}}'_{T'_{01}}-{\varvec{x}}'_0}\Vert _2+\frac{C}{a} \Vert { \bar{\varvec{H}}'}\Vert _F \le C\left( \frac{4}{\sqrt{a}}+\frac{1}{a}\right) \Vert { \bar{\varvec{H}}'}\Vert _F. \end{aligned}$$
(45)

Combining (39), (40) and (45), we immediately obtain

$$\begin{aligned} \left( c- C\left( \frac{4}{\sqrt{a}}+\frac{1}{a}\right) \right) \Vert { \bar{\varvec{H}}'}\Vert _F \le \frac{2\epsilon }{\sqrt{m}}, \end{aligned}$$

which implies

$$\begin{aligned} \Vert { \bar{\varvec{H}}'}\Vert _F \le \frac{1}{c- C\left( \frac{4}{\sqrt{a}}+\frac{1}{a}\right) } \cdot \frac{2\epsilon }{\sqrt{m}}. \end{aligned}$$

This completes the proof of claim (34). \(\square \)