1 Introduction

In this paper we consider phase retrieval of sparse signals from noisy measurements, a problem that arises in many different applications. Assume that

$$\begin{aligned} b_j:=|\left<a_j,x_0\right>|+e_j, \quad j=1,\ldots ,m \end{aligned}$$

where \(x_0\in {\mathbb R}^N\), \(a_j\in {\mathbb R}^N\) and \(e_j\in {\mathbb R}\) is the noise. Our goal is to recover \(x_0\), up to a unimodular scaling constant, from \(b:=(b_1,\ldots ,b_m)^\top \) under the assumption that \(x_0\) is approximately k-sparse. This problem is referred to as the compressive phase retrieval problem [9].
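To make the model concrete, the measurements can be simulated in a few lines. Everything below (the sizes, the noise level, and the NumPy setup) is an illustrative assumption rather than part of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, k = 40, 20, 3            # illustrative sizes, not from the paper

# A k-sparse signal x0 (the paper only assumes approximate k-sparsity).
x0 = np.zeros(N)
x0[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)

A = rng.standard_normal((m, N))      # rows are the measurement vectors a_j
e = 0.01 * rng.standard_normal(m)    # additive noise e_j
b = np.abs(A @ x0) + e               # b_j = |<a_j, x0>| + e_j

# The phases sign(<a_j, x0>) are lost, so x0 and -x0 produce identical
# noiseless data; recovery is only possible up to a global sign.
```

In the real case the unimodular constant is simply a global sign \(\pm 1\).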

This paper addresses two problems. First, we consider the stability of \(\ell _1\) minimization for the compressive phase retrieval problem where the signal \(x_0\) is approximately k-sparse, namely the following minimization problem:

$$\begin{aligned} \min \Vert x\Vert _1\quad \text{ subject } \text{ to } \quad \bigl \Vert |Ax|-|Ax_0|\bigr \Vert _2\le \epsilon , \end{aligned}$$
(1.1)

where \(A:=[a_1,\ldots ,a_m]^\top \) and \(|Ax_0|:=[|\left<a_1,x_0\right>|,\ldots ,|\left<a_m,x_0\right>|]^\top \). Second, we investigate instance optimality in the phase retrieval setting.

Note that in the classical compressive sensing setting the stable recovery of a k-sparse signal \(x_0\in {\mathbb C}^N\) can be achieved using \(m={\mathcal O}(k\log (N/k))\) measurements for several classes of measurement matrices A. A natural question is whether stable compressive phase retrieval can also be attained with \(m={\mathcal O}(k\log (N/k))\) measurements. This has indeed been proved to be the case in [6] when \(x_0\in {\mathbb R}^N\) and A is a real Gaussian random matrix. In [8] a two-stage algorithm for compressive phase retrieval is proposed, which allows for very fast recovery of a sparse signal if the matrix A can be written as a product of a random matrix and another matrix (such as a random matrix) that allows for efficient phase retrieval. The authors proved that stable compressive phase retrieval can be achieved with \(m={\mathcal O}(k\log (N/k))\) measurements for complex signals \(x_0\) as well. In [10], the strong RIP (S-RIP) is introduced and the authors show that one can use \(\ell _1\) minimization to recover sparse signals up to a global sign from the noiseless measurements \(|Ax_0|\) provided A satisfies the S-RIP. Naturally, one is interested in the performance of \(\ell _1\) minimization for compressive phase retrieval with noisy measurements. In this paper, we show that the \(\ell _1\) minimization scheme given in (1.1) recovers a k-sparse signal stably from \(m={\mathcal O}(k\log (N/k))\) measurements, provided that the measurement matrix A satisfies the S-RIP. This establishes an important parallel between compressive phase retrieval and classical compressive sensing. Note that such a parallel, in terms of the null space property, was already established in [11].

The notion of instance optimality was first introduced in [5]. We use \(\Vert x\Vert _0\) to denote the number of non-zero entries of x. Given a norm \(\Vert \cdot \Vert _X\), such as the \(\ell _1\)-norm, and \(x\in {\mathbb R}^N\), the best k-term approximation error is defined as

$$\begin{aligned} \sigma _k(x)_X\,\,:=\,\, \min _{z\in \Sigma _k}\Vert x-z\Vert _X, \end{aligned}$$

where

$$\begin{aligned} \Sigma _k:=\{x\in {\mathbb R}^N: \Vert x\Vert _0\le k\}. \end{aligned}$$
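Since the \(\ell _1\)-norm is coordinatewise, the minimizer in \(\sigma _k(x)_1\) simply keeps the k largest-magnitude entries of x. A short helper makes this explicit (the function name is ours, chosen for illustration):

```python
import numpy as np

def sigma_k(x, k):
    """Best k-term approximation error sigma_k(x)_1: the l1 norm of what is
    left after zeroing the k largest-magnitude entries of x."""
    tail = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1][k:]
    return tail.sum()

x = np.array([5.0, -3.0, 0.5, 0.2, 0.0])
print(sigma_k(x, 2))    # only the entries 0.5, 0.2, 0.0 remain, about 0.7
```

In particular \(\sigma _k(x)_1=0\) exactly when \(x\in \Sigma _k\), i.e., when x is k-sparse.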

We use \(\Delta : {\mathbb R}^m\mapsto {\mathbb R}^N\) to denote a decoder for reconstructing x. We say the pair \((A,\Delta )\) is instance optimal of order k with constant \(C_0\) if

$$\begin{aligned} \Vert x-\Delta (Ax)\Vert _X\le C_0\sigma _k(x)_X \end{aligned}$$
(1.2)

holds for all \(x\in {\mathbb R}^N\). To extend this notion to phase retrieval, we let the decoder take \(b=|Ax|\) as its input. A pair \((A,\Delta )\) is said to be phaseless instance optimal of order k with constant \(C_0\) if

$$\begin{aligned} \min \Bigl \{\Vert x-\Delta (|Ax|)\Vert _X, \Vert x+\Delta (|Ax|)\Vert _X\Bigr \}\le C_0\sigma _k(x)_X \end{aligned}$$
(1.3)

holds for all \(x\in {\mathbb R}^N\). We are interested in the following problem: Given \(\Vert \cdot \Vert _X\) and \(k<N\), what is the minimal value of m for which there exists \((A,\Delta )\) so that (1.3) holds?

The null space \(\mathcal {N}(A):=\{x\in {\mathbb R}^N:Ax=0\}\) of A plays an important role in the analysis of the original instance optimality (1.2) (see [5]). Here we present null space properties of \(\mathcal {N}(A)\) that are, respectively, sufficient and necessary for the existence of a decoder \(\Delta \) so that (1.3) holds. We apply the result to investigate instance optimality when X is the \(\ell _1\)-norm. Set

$$\begin{aligned} \Delta _1(|Ax|):\,= \,\mathop {\mathrm{argmin}}\limits _{z\in {\mathbb R}^N}\Bigl \{\Vert z\Vert _1: |Ax|=|Az|\Bigr \}. \end{aligned}$$

We show that the pair \((A,\Delta _1)\) satisfies (1.3) with X being the \(\ell _1\)-norm provided A satisfies the strong RIP property (see Definition 2.1). As shown in [10], a Gaussian random matrix \(A\in {\mathbb R}^{m\times N}\) satisfies the strong RIP of order k for \(m={\mathcal O}(k\log (N/k))\). Hence \(m={\mathcal O}(k\log (N/k))\) measurements suffice to ensure the phaseless instance optimality (1.3) for the \(\ell _1\)-norm, exactly as with the traditional instance optimality (1.2).

2 Auxiliary Results

In this section we provide some auxiliary results that will be used in later sections. For \( x\in {\mathbb R}^N \) we use \(\Vert x\Vert _p:=\Vert x\Vert _{\ell _p}\) to denote the p-norm of x for \(0<p \le \infty \). The measurement matrix is given by \(A:=[a_1,\ldots ,a_m]^\top \in \mathbb {R}^{m\times N}\) as before. Given an index set \(I\subset \{1,\ldots ,m\}\) we shall use \(A_I\) to denote the sub-matrix of A in which only the rows with indices in I are kept, i.e.,

$$\begin{aligned} A_I:=[a_j:j\in I]^\top . \end{aligned}$$

The matrix A satisfies the Restricted Isometry Property (RIP) of order k if there exists a constant \(\delta _k\in [0,1)\) such that for all k-sparse vectors \(z\in \Sigma _k\) we have

$$\begin{aligned} (1-\delta _k)\Vert z\Vert _2^2\le \Vert Az\Vert _2^2\le (1+\delta _k)\Vert z\Vert _2^2. \end{aligned}$$

It was shown in [2] that one can use \(\ell _1\)-minimization to recover k-sparse signals provided that A satisfies the RIP of order tk with \(\delta _{tk}<\sqrt{1-\frac{1}{t}}\) for some \(t>1\).
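For small dimensions the constant \(\delta _k\) can be computed exactly: on a fixed support S, the ratio \(\Vert Az\Vert _2^2/\Vert z\Vert _2^2\) ranges over the eigenvalues of the Gram matrix \(A_S^\top A_S\) of the selected columns. A sketch under assumed toy sizes (for realistic N and k this enumeration is of course infeasible):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
m, N, k = 60, 12, 2
A = rng.standard_normal((m, N)) / np.sqrt(m)   # columns normalized in expectation

# delta_k is the smallest delta with (1-delta) <= eig(A_S^T A_S) <= (1+delta)
# over all supports S of size k.
delta_k = 0.0
for S in combinations(range(N), k):
    eigs = np.linalg.eigvalsh(A[:, list(S)].T @ A[:, list(S)])
    delta_k = max(delta_k, abs(eigs[0] - 1.0), abs(eigs[-1] - 1.0))

print("exact delta_2 for this draw:", round(delta_k, 3))
```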

To investigate compressive phase retrieval, a stronger notion of RIP is given in [10]:

Definition 2.1

(S-RIP) We say the matrix \(A=[a_1,\ldots ,a_m]^\top \in \mathbb {R}^{m\times N}\) has the Strong Restricted Isometry Property of order k with bounds \(\theta _-,\ \theta _+\in (0, 2)\) if

$$\begin{aligned} \theta _-\Vert x\Vert _2^2\le \min _{ I\subseteq [m], |I|\ge m/2}\Vert A_Ix\Vert _2^2\le \max _{I\subseteq [m],|I|\ge m/2} \Vert A_Ix\Vert _2^2\le \theta _+\Vert x\Vert _2^2 \end{aligned}$$
(2.1)

holds for all k-sparse signals \(x\in \mathbb {R}^N\), where \([m]:=\{1,\ldots ,m\}\). We say A has the Strong Lower Restricted Isometry Property of order k with bound \(\theta _-\) if the lower bound in (2.1) holds. Similarly we say A has the Strong Upper Restricted Isometry Property of order k with bound \(\theta _+\) if the upper bound in (2.1) holds.
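Although (2.1) involves exponentially many subsets I, the extremes for a fixed x can be read off after sorting, because \(\Vert A_Ix\Vert _2^2\) is a sum of the nonnegative terms \(\langle a_j,x\rangle ^2\): the minimum keeps the \(\lceil m/2\rceil \) smallest terms, and the maximum is attained at \(I=[m]\). A sketch of this observation (the helper name is ours):

```python
import numpy as np
from math import ceil
from itertools import combinations

def srip_extremes(A, x):
    """min and max of ||A_I x||_2^2 over all I in [m] with |I| >= m/2.

    Each term <a_j, x>^2 is nonnegative, so the minimum keeps the ceil(m/2)
    smallest squared measurements and the maximum is attained at I = [m].
    """
    sq = np.sort((A @ x) ** 2)
    return sq[: ceil(A.shape[0] / 2)].sum(), sq.sum()

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 5))
x = rng.standard_normal(5)
lo, hi = srip_extremes(A, x)

# Brute-force check over all 163 admissible subsets of {0, ..., 7}.
vals = [np.sum((A[list(I)] @ x) ** 2)
        for r in range(4, 9) for I in combinations(range(8), r)]
assert np.isclose(lo, min(vals)) and np.isclose(hi, max(vals))
```

Sampling many k-sparse x and dividing by \(\Vert x\Vert _2^2\) then gives empirical estimates of the best possible \(\theta _-\) and \(\theta _+\).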

The authors of [10] proved that Gaussian matrices with \(m=\mathcal {O}(tk\log (N/k))\) satisfy S-RIP of order tk with high probability.

Theorem 2.1

([10]) Suppose that \(t>1\) and \( A=(a_{ij})\in \mathbb {R}^{m\times N} \) is a random Gaussian matrix with \(m=\mathcal {O}(tk\log (N/k))\) and \(a_{ij}\sim {\mathcal N}(0,\frac{1}{\sqrt{m}})\). Then there exist \(\theta _-, \theta _+ \in (0,2)\) such that with probability \(1-\exp (-cm/2)\) the matrix A satisfies the S-RIP of order tk with constants \(\theta _-\) and \(\theta _+\), where \(c>0\) is an absolute constant and \(\theta _-\), \(\theta _+\) are independent of t.

The following lemma will play a key role in this study.

Lemma 2.1

Let \( x_0\in \mathbb {R}^N\) and \( \rho \ge 0\). Suppose that \( A\in \mathbb {R}^{m\times N}\) is a measurement matrix satisfying the restricted isometry property with \( \delta _{tk}\le \sqrt{\frac{t-1}{t}} \) for some \( t>1 \). Then for any

$$\begin{aligned} \hat{x}\in \Bigl \{x\in {\mathbb R}^N : \Vert x\Vert _1\le \Vert x_0\Vert _1+\rho , \, \Vert Ax-Ax_0\Vert _2\le \epsilon \Bigr \} \end{aligned}$$

we have

$$\begin{aligned} \Vert \hat{x}-x_0\Vert _2\le c_1\epsilon +c_2\frac{2\sigma _k(x_0)_1}{\sqrt{k}}+c_2\cdot \frac{\rho }{\sqrt{k}}, \end{aligned}$$

where \( c_1=\frac{\sqrt{2(1+\delta )}}{1-\sqrt{t/(t-1)}\delta } \), \( c_2=\frac{\sqrt{2}\delta +\sqrt{(\sqrt{t(t-1)}-\delta t)\delta }}{\sqrt{t(t-1)}-\delta t}+1.\)

Remark 2.1

We build the proof of Lemma 2.1 following the ideas of Cai and Zhang [2]. The full proof is given in the Appendix for completeness. It is well known that an effective method for recovering an approximately sparse signal \(x_0\) in traditional compressive sensing is to solve

$$\begin{aligned} x^\#:=\mathop {\mathrm{argmin}}\limits _{x}\{\Vert x\Vert _1 : \Vert Ax-Ax_0\Vert _2\le \epsilon \}. \end{aligned}$$
(2.2)

The definition of \(x^\#\) shows that

$$\begin{aligned} \Vert x^\#\Vert _1\le \Vert x_0\Vert _1, \quad \Vert Ax^\#-Ax_0\Vert _2\le \epsilon , \end{aligned}$$

which implies that

$$\begin{aligned} \Vert x^\#-x_0\Vert _2\le C_1\epsilon +C_2\frac{\sigma _k(x_0)_1}{\sqrt{k}}, \end{aligned}$$

provided that A satisfies the RIP condition with \(\delta _{tk}\le \sqrt{1-1/t}\) for some \(t>1\) (see [2]). In practice, however, one often prefers fast algorithms that return only an approximate solution \(\hat{x}\) of (2.2), so it is possible that \(\Vert \hat{x}\Vert _1> \Vert x_0\Vert _1\). Lemma 2.1 gives an estimate of \(\Vert \hat{x}-x_0\Vert _2\) for the case where \(\Vert \hat{x}\Vert _1\le \Vert x_0\Vert _1+\rho \).

Remark 2.2

In [7], Han and Xu extend the definition of S-RIP by replacing the \(m/2\) in (2.1) by \(\beta m\), where \(0<\beta <1\). They also prove that, for any fixed \(\beta \in (0,1)\), an \(m\times N\) random Gaussian matrix satisfies the S-RIP of order k with high probability provided \(m=\mathcal {O}(k\log (N/k))\).

3 Stable Recovery of Real Phase Retrieval Problem

3.1 Stability Results

The following lemma shows that the map \(\phi _A(x):=|Ax|\) is stable on \(\Sigma _k\), modulo a unimodular constant, provided A satisfies the strong lower RIP of order 2k. Define the equivalence relation \(\sim \) on \({\mathbb R}^N\) and \({\mathbb C}^N\) as follows: \(x \sim y\) if and only if \(x= cy\) for some unimodular scalar c. For any subset Y of \({\mathbb R}^N\) or \({\mathbb C}^N\), the notation \(Y/\sim \) denotes the set of equivalence classes of elements of Y under this relation. Note that there is a natural metric \(D_\sim \) on \({\mathbb C}^N/\sim \) given by

$$\begin{aligned} D_\sim (x, y) = \min _{|c|=1} \Vert x-cy\Vert . \end{aligned}$$

Our primary focus in this paper will be on \({\mathbb R}^N\), and in this case \(D_\sim (x,y) = \min \{\Vert x-y\Vert _2, \Vert x+y\Vert _2\}\).
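In the real case the metric is a two-term minimum, which can be written as a one-line helper (a hypothetical utility for numerical experiments, not notation from the paper):

```python
import numpy as np

def d_sim(x, y):
    """D~(x, y) on R^N / ~ : Euclidean distance modulo a global sign."""
    return min(np.linalg.norm(x - y), np.linalg.norm(x + y))

x = np.array([1.0, -2.0, 0.0])
assert d_sim(x, -x) == 0.0     # x and -x are identified under ~
assert d_sim(x, x) == 0.0
```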

Lemma 3.1

Let \(A\in {\mathbb R}^{m\times N}\) satisfy the strong lower RIP of order 2k with constant \(\theta _-\). Then for any \(x, y \in \Sigma _k\) we have

$$\begin{aligned} \Vert |Ax|-|Ay|\Vert _2^2 \ge \theta _- \min (\Vert x-y\Vert _2^2, \Vert x+y\Vert _2^2). \end{aligned}$$

Proof

For any \(x,y\in \Sigma _k \) we divide \(\{1,\ldots ,m\}\) into two subsets:

$$\begin{aligned} T=\{j:~\mathrm{sign}(\langle {a_j,x} \rangle )=\mathrm{sign}(\langle {a_j,y} \rangle )\} \end{aligned}$$

and

$$\begin{aligned} T^c=\{j:~\mathrm{sign}(\langle {a_j,x} \rangle )=-\mathrm{sign}(\langle {a_j,y} \rangle )\}. \end{aligned}$$

Clearly one of T and \(T^c\) will have cardinality at least m/2. Without loss of generality we assume that T has cardinality no less than m/2. Then

$$\begin{aligned} \Vert |Ax|-|Ay|\Vert ^2_2&=\Vert A_Tx-A_T y\Vert _2^2 + \Vert A_{T^c}x+A_{T^c}y\Vert _2^2\\&\ge \Vert A_Tx-A_T y\Vert _2^2 \\&\ge \theta _- \Vert x-y\Vert _2^2 \\&\ge \theta _- \min (\Vert x-y\Vert _2^2, \Vert x+y\Vert _2^2). \end{aligned}$$

\(\square \)
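The first equality in the display above is an exact splitting of the measurements by sign agreement, which is easy to confirm numerically (random data, assumed sizes):

```python
import numpy as np

rng = np.random.default_rng(3)
m, N = 50, 8
A = rng.standard_normal((m, N))
x, y = rng.standard_normal(N), rng.standard_normal(N)

# T: indices where <a_j, x> and <a_j, y> share a sign (ties are harmless,
# since a zero inner product fits either bucket).
T = np.sign(A @ x) * np.sign(A @ y) >= 0

lhs = np.linalg.norm(np.abs(A @ x) - np.abs(A @ y)) ** 2
rhs = (np.linalg.norm(A[T] @ (x - y)) ** 2
       + np.linalg.norm(A[~T] @ (x + y)) ** 2)
assert np.isclose(lhs, rhs)
```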

Remark 3.1

Note that the combination of Lemma 3.1 and Theorem 2.1 shows that for an \(m\times N\) Gaussian matrix A with \(m=O(k\log (N/k))\) one can guarantee the stability of the map \(\phi _A(x):=|Ax|\) on \(\Sigma _k/\sim \).

3.2 The Main Theorem

In this part we consider how many measurements are needed for stable sparse phase retrieval by \(\ell _1\)-minimization, i.e., by solving the following model:

$$\begin{aligned} \min \Vert x\Vert _1\,\,\, \text{ subject } \text{ to } \,\,\, \Vert |Ax|-|Ax_0|\Vert _2^2\le \epsilon ^2, \end{aligned}$$
(3.1)

where A is our measurement matrix and \(x_0\in \mathbb {R}^N\) is the signal we wish to recover. The next theorem gives conditions under which the solution to (3.1) is stable.

Theorem 3.1

Assume that \(A\in \mathbb {R}^{m\times N}\) satisfies the S-RIP of order tk with bounds \(\theta _-, \theta _+ \in (0,2)\) such that

$$\begin{aligned} t\ge \max \Bigl \{\frac{1}{2\theta _--\theta _-^2},\frac{1}{2\theta _+-\theta _+^2}\Bigr \}. \end{aligned}$$

Then any solution \(\hat{x}\) for (3.1) satisfies

$$\begin{aligned} \min \{\Vert \hat{x}-x_0\Vert _2,\Vert \hat{x}+x_0\Vert _2\}\le c_1\epsilon +c_2\frac{2\sigma _k(x_0)_1}{\sqrt{k}}, \end{aligned}$$

where \(c_1\) and \( c_2\) are constants defined in Lemma 2.1.

Proof

Clearly any \(\hat{x}\in {\mathbb R}^N\) satisfying (3.1) must have

$$\begin{aligned} \Vert \hat{x}\Vert _1\le \Vert x_0\Vert _1 \end{aligned}$$
(3.2)

and

$$\begin{aligned} \Vert |A\hat{x}|-|Ax_0|\Vert _2^2\le \epsilon ^2. \end{aligned}$$
(3.3)

Now we divide the index set \(\{1, 2, \dots , m\}\) into two subsets

$$\begin{aligned} T&=\{j:~ \mathrm{sign}(\langle {a_j,\hat{x}} \rangle )=\mathrm{sign}(\langle {a_j,x_0} \rangle )\}, \\ T^c&=\{j:~ \mathrm{sign}(\langle {a_j,\hat{x}} \rangle )=-\mathrm{sign}(\langle {a_j,x_0} \rangle )\}. \end{aligned}$$

Then (3.3) implies that

$$\begin{aligned} \Vert A_T\hat{x}-A_Tx_0\Vert _2^2+\Vert A_{T^c}\hat{x}+A_{T^c}x_0\Vert _2^2\le \epsilon ^2. \end{aligned}$$
(3.4)

Here either \(|T|\ge m/2\) or \(|T^c|\ge m/2\). Without loss of generality we assume that \(|T|\ge m/2\). In particular, (3.4) gives

$$\begin{aligned} \Vert A_T\hat{x}-A_Tx_0\Vert _2^2\le \epsilon ^2. \end{aligned}$$
(3.5)

From (3.2) and (3.5) we obtain

$$\begin{aligned} \hat{x}\in \left\{ x\in {\mathbb R}^N: \Vert x\Vert _1\le \Vert x_0\Vert _1, \Vert A_Tx-A_Tx_0\Vert _2\le \epsilon \right\} . \end{aligned}$$
(3.6)

Recall that A satisfies the S-RIP of order tk with constants \(\theta _-, \ \theta _+\), where

$$\begin{aligned} t\ge \max \{\frac{1}{2\theta _--\theta _-^2},\frac{1}{2\theta _+-\theta _+^2}\}>1. \end{aligned}$$
(3.7)

The definition of S-RIP implies that \(A_T\) satisfies the RIP of order tk with constant

$$\begin{aligned} \delta _{tk} \le \max \{1-\theta _-,\ \theta _+-1\}\le \sqrt{\frac{t-1}{t}} \end{aligned}$$
(3.8)

where the second inequality follows from (3.7). The combination of (3.6), (3.8) and Lemma 2.1 now implies

$$\begin{aligned} \Vert \hat{x}-x_0\Vert _2\le c_1\epsilon +c_2\frac{2\sigma _k(x_0)_1}{\sqrt{k}}, \end{aligned}$$

where \(c_1\) and \(c_2\) are defined in Lemma 2.1. If \(|T^c|\ge \frac{m}{2}\), the same argument applied with \(-x_0\) in place of \(x_0\) gives the corresponding result

$$\begin{aligned} \Vert \hat{x}+x_0\Vert _2\le c_1\epsilon +c_2\frac{2\sigma _k(x_0)_1}{\sqrt{k}}. \end{aligned}$$

The theorem is now proved. \(\square \)

This theorem demonstrates that, if the measurement matrix has the S-RIP, the real compressive phase retrieval problem can be solved stably by \(\ell _1 \)-minimization.

4 Phase Retrieval and Best k-term Approximation

4.1 Instance Optimality from the Linear Measurements

We first recall some definitions and results from [5]. For a given encoder matrix \(A\in {\mathbb R}^{m\times N}\) and a decoder \(\Delta :{\mathbb R}^m\mapsto {\mathbb R}^N\), the pair \((A,\Delta )\) is said to have instance optimality of order k with constant \(C_0\) with respect to the norm X if

$$\begin{aligned} \Vert x-\Delta (Ax)\Vert _X\le C_0\sigma _k(x)_X \end{aligned}$$
(4.1)

holds for all \(x\in {\mathbb R}^N\). Set \({\mathcal N}(A):=\{\eta \in {\mathbb R}^N: A\eta =0\}\) to be the null space of A. The following theorem gives conditions under which (4.1) holds.

Theorem 4.1

([5]) Let \(A\in {\mathbb R}^{m\times N}\), \(1 \le k \le N\) and \(\Vert \cdot \Vert _X\) be a norm on \({\mathbb R}^N\). Then a sufficient condition for the existence of a decoder \(\Delta \) satisfying (4.1) is

$$\begin{aligned} \Vert \eta \Vert _X\le \frac{C_0}{2}\sigma _{2k}(\eta )_X, \quad \forall \eta \in {\mathcal N}(A). \end{aligned}$$
(4.2)

A necessary condition for the existence of a decoder \(\Delta \) satisfying (4.1) is

$$\begin{aligned} \Vert \eta \Vert _X\le C_0\sigma _{2k}(\eta )_X, \quad \forall \eta \in {\mathcal N}(A). \end{aligned}$$
(4.3)

For the norm \(X=\ell _1\) it was established in [5] that instance optimality of order k can indeed be achieved, e.g. for a Gaussian matrix A, with \(m=O(k\log (N/k))\). The authors also considered, more generally, taking different norms on the two sides of (4.1). Following [5], we say the pair \((A,\Delta )\) has (p, q)-instance optimality of order k with constant \(C_0\) if

$$\begin{aligned} \Vert x-\Delta (Ax)\Vert _{p}\le C_0 k^{\frac{1}{p}-\frac{1}{q}}\sigma _k(x)_{q}, \quad \forall x\in {\mathbb R}^N, \end{aligned}$$
(4.4)

with \(1\le q\le p \le 2\). It was shown in [5] that the (p, q)-instance optimality of order k can be achieved at the cost of \(m=\mathcal {O}(k(N/k)^{2-2/q}\log (N/k))\) measurements.

4.2 Phaseless Instance Optimality

A natural question here is whether an analogous result to Theorem 4.1 exists for phaseless instance optimality defined in (1.3). We answer the question by presenting such a result in the case of real phase retrieval.

Recall that a pair \((A,\Delta )\) is said to have the phaseless instance optimality of order k with constant \(C_0\) for the norm \(\Vert \cdot \Vert _X\) if

$$\begin{aligned} \min \Bigl \{\Vert x-\Delta (|Ax|)\Vert _X, \Vert x+\Delta (|Ax|)\Vert _X\Bigr \}\le C_0\sigma _k(x)_X \end{aligned}$$
(4.5)

holds for all \(x\in {\mathbb R}^N\).

Theorem 4.2

Let \(A\in {\mathbb R}^{m\times N}\), \(1 \le k \le N\) and \(\Vert \cdot \Vert _X\) be a norm. Then a sufficient condition for the existence of a decoder \(\Delta \) satisfying the phaseless instance optimality (4.5) is: For any \( I\subseteq \{1,\ldots ,m\}\) and \(\eta _1\in \mathcal {N}(A_I)\), \(\eta _2\in \mathcal {N}(A_{I^c})\) we have

$$\begin{aligned} \min \{\Vert \eta _1\Vert _X , \Vert \eta _2\Vert _X \}\le \frac{C_0}{4}\sigma _k(\eta _1-\eta _2)_X+\frac{C_0}{4}\sigma _k(\eta _1+\eta _2)_X. \end{aligned}$$
(4.6)

A necessary condition for the existence of a decoder \(\Delta \) satisfying (4.5) is: For any \(I\subseteq \{1,\ldots ,m\}\) and \(\eta _1\in \mathcal {N}(A_I)\), \(\eta _2\in \mathcal {N}(A_{I^c})\) we have

$$\begin{aligned} \min \{\Vert \eta _1\Vert _X , \Vert \eta _2\Vert _X \}\le \frac{C_0}{2}\sigma _k(\eta _1-\eta _2)_X+\frac{C_0}{2}\sigma _k(\eta _1+\eta _2)_X. \end{aligned}$$
(4.7)

Proof

We first assume (4.6) holds, and show that there exists a decoder \( \Delta \) satisfying the phaseless instance optimality (4.5). To this end, we define a decoder \( \Delta \) as follows:

$$\begin{aligned} \Delta (|Ax_0|)=\mathop {\hbox {argmin}}_{|Ax|=|Ax_0|}\sigma _k(x)_X. \end{aligned}$$

Set \( \hat{x}:=\Delta (|Ax_0|)\). Then \(|A\hat{x}|=|Ax_0|\) and \(\sigma _k(\hat{x})_X\le \sigma _k(x_0)_X\). Note that \(\langle {a_j,\hat{x}} \rangle =\pm \langle {a_j,x_0} \rangle \) for each j. Let \(I \subseteq \{1,\ldots ,m\}\) be defined by

$$\begin{aligned} I=\Bigl \{j:~\langle {a_j,\hat{x}} \rangle = \langle {a_j,x_0} \rangle \Bigr \}. \end{aligned}$$

Then

$$\begin{aligned} A_I(x_0-\hat{x})=0,\quad A_{I^c}(x_0+\hat{x})=0. \end{aligned}$$

Set

$$\begin{aligned} \eta _1&:=x_0-\hat{x}\in \mathcal {N}(A_I),\\ \eta _2&:=x_0+\hat{x}\in \mathcal {N}(A_{I^c}). \end{aligned}$$

A simple observation yields

$$\begin{aligned} \sigma _k(\eta _1-\eta _2)_X=2\sigma _k(\hat{x})_X\le 2\sigma _k(x_0)_X , \quad \sigma _k(\eta _1+\eta _2)_X=2\sigma _k(x_0)_X. \end{aligned}$$
(4.8)

Then (4.6) implies that

$$\begin{aligned} \min \{ \Vert \hat{x}-x_0\Vert _X, \Vert \hat{x}+x_0\Vert _X \}&= \min \{\Vert \eta _1\Vert _X , \Vert \eta _2\Vert _X \} \\&\le \frac{C_0}{4}\sigma _k(\eta _1-\eta _2)_X+\frac{C_0}{4}\sigma _k(\eta _1+\eta _2)_X \\&\le C_0\sigma _k(x_0)_X. \end{aligned}$$

Here the last inequality follows from (4.8). This proves the sufficient condition.

We next turn to the necessary condition. Let \(\Delta \) be a decoder for which the phaseless instance optimality (4.5) holds. Let \(I\subseteq \{1,\ldots ,m\}\). For any \(\eta _1\in \mathcal {N}(A_I)\) and \(\eta _2\in \mathcal {N}(A_{I^c})\) we have

$$\begin{aligned} |A(\eta _1+\eta _2)|=|A(\eta _1-\eta _2)|=|A(\eta _2-\eta _1)|. \end{aligned}$$
(4.9)

The instance optimality implies

$$\begin{aligned}&\min \Bigl \{\Vert \Delta (|A(\eta _1+\eta _2)|)+\eta _1+\eta _2\Vert _X,\Vert \Delta (|A(\eta _1+\eta _2)|)-(\eta _1+\eta _2)\Vert _X\Bigr \}\nonumber \\&\quad \le C_0\sigma _k(\eta _1+\eta _2)_X. \end{aligned}$$
(4.10)

Without loss of generality we may assume that

$$\begin{aligned} \Vert \Delta (|A(\eta _1+\eta _2)|)+\eta _1+\eta _2\Vert _X\,\,\le \,\,\Vert \Delta (|A(\eta _1+\eta _2)|)-(\eta _1+\eta _2)\Vert _X. \end{aligned}$$

Then (4.10) implies that

$$\begin{aligned} \Vert \Delta (|A(\eta _1+\eta _2)|)+\eta _1+\eta _2\Vert _X \le C_0\sigma _k(\eta _1+\eta _2)_X. \end{aligned}$$
(4.11)

By (4.9), we have

$$\begin{aligned} \Vert \Delta (|A(\eta _1+\eta _2)|)+\eta _1+\eta _2\Vert _X&=\Vert \Delta (|A(\eta _2-\eta _1)|)-(\eta _2-\eta _1)+2\eta _2\Vert _X \nonumber \\&\ge 2\Vert \eta _2\Vert _X-\Vert \Delta (|A(\eta _2-\eta _1)|)-(\eta _2-\eta _1)\Vert _X. \end{aligned}$$
(4.12)

Combining (4.11) and (4.12) yields

$$\begin{aligned} 2\Vert \eta _2\Vert _X\le C_0\sigma _k(\eta _1+\eta _2)_X+\Vert \Delta (|A(\eta _2-\eta _1)|)-(\eta _2-\eta _1)\Vert _X. \end{aligned}$$
(4.13)

At the same time, (4.9) also implies

$$\begin{aligned} \Vert \Delta (|A(\eta _1+\eta _2)|)+\eta _1+\eta _2\Vert _X&=\Vert \Delta (|A(\eta _2-\eta _1)|)+(\eta _2-\eta _1)+2\eta _1\Vert _X \nonumber \\&\ge 2\Vert \eta _1\Vert _X-\Vert \Delta (|A(\eta _2-\eta _1)|)+(\eta _2-\eta _1)\Vert _X. \end{aligned}$$
(4.14)

Putting (4.11) and (4.14) together, we obtain

$$\begin{aligned} 2\Vert \eta _1\Vert _X \le C_0\sigma _k(\eta _1+\eta _2)_X+\Vert \Delta (|A(\eta _2-\eta _1)|)+(\eta _2-\eta _1)\Vert _X. \end{aligned}$$
(4.15)

It follows from (4.13) and (4.15) that

$$\begin{aligned} \min \left\{ \Vert \eta _1\Vert _X, \Vert \eta _2\Vert _X\right\}&\le \frac{C_0}{2}\sigma _k(\eta _1+\eta _2)_X\\&\quad +\frac{1}{2}\min \Bigl \{ \Vert \Delta (|A(\eta _2-\eta _1)|)-(\eta _2-\eta _1)\Vert _X,\ \Vert \Delta (|A(\eta _2-\eta _1)|)+(\eta _2-\eta _1)\Vert _X\Bigr \}\\&\le \frac{C_0}{2}\sigma _k(\eta _1+\eta _2)_X+\frac{C_0}{2}\sigma _k(\eta _1-\eta _2)_X. \end{aligned}$$

Here the last inequality is obtained by the instance optimality of \( (A,\Delta ) \). For the case where

$$\begin{aligned} \Vert \Delta (|A(\eta _1+\eta _2)|)-(\eta _1+\eta _2)\Vert _X\,\,\le \,\,\Vert \Delta (|A(\eta _1+\eta _2)|)+\eta _1+\eta _2\Vert _X, \end{aligned}$$

we obtain

$$\begin{aligned} \min \{\Vert \eta _1\Vert _X,\Vert \eta _2\Vert _X\}\le \frac{C_0}{2}\sigma _k(\eta _1+\eta _2)_X+\frac{C_0}{2}\sigma _k(\eta _1-\eta _2)_X \end{aligned}$$

via the same argument. The theorem is now proved. \(\square \)

We next present a null space property for phaseless instance optimality, which allows us to establish parallel results for sparse phase retrieval.

Definition 4.1

We say a matrix \(A \in {\mathbb R}^{m\times N}\) satisfies the strong null space property (S-NSP) of order k with constant C, with respect to a norm \(\Vert \cdot \Vert _X\), if for any index set \(I\subseteq \{1,\ldots ,m\}\) with \(|I|\ge m/2\) and any \( \eta \in {\mathcal N}(A_I)\) we have

$$\begin{aligned} \Vert \eta \Vert _X\le C\cdot \sigma _{k}(\eta )_X. \end{aligned}$$
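For tiny instances the S-NSP constant in the \(\ell _1\)-norm can be computed exactly. Two observations (ours, but elementary) make this feasible: \(\mathcal {N}(A_J)\subseteq \mathcal {N}(A_I)\) whenever \(I\subseteq J\), so only subsets of size \(\lceil m/2\rceil \) need checking; and when such an \(A_I\) has a one-dimensional null space, the ratio \(\Vert \eta \Vert _1/\sigma _k(\eta )_1\) is scale- and sign-invariant, so a single generator suffices. A sketch with assumed sizes (here \(N=\lceil m/2\rceil +1\), so each null space is generically one-dimensional):

```python
import numpy as np
from itertools import combinations
from math import ceil

def sigma_k1(x, k):
    """sigma_k(x)_1: l1 norm of x after removing its k largest-magnitude entries."""
    return np.sort(np.abs(x))[::-1][k:].sum()

def snsp_constant(A, k):
    """Smallest C with ||eta||_1 <= C * sigma_k(eta)_1 for all eta in N(A_I),
    |I| >= m/2, assuming each minimal-size A_I has a 1-dim null space."""
    m = A.shape[0]
    C = 0.0
    for I in combinations(range(m), ceil(m / 2)):
        eta = np.linalg.svd(A[list(I)])[2][-1]   # null space generator of A_I
        C = max(C, np.abs(eta).sum() / sigma_k1(eta, k))
    return C

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
C = snsp_constant(A, 1)
print(round(C, 3))
```

By Theorem 4.3 below, if this constant is at most \(C_0/2\) at order 2k, the decoder minimizing \(\sigma _k(x)_X\) achieves phaseless instance optimality with constant \(C_0\).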

Theorem 4.3

Assume that a matrix \( A\in \mathbb {R}^{m\times N}\) has the strong null space property of order 2k with constant \( C_0/2 \). Then there must exist a decoder \(\Delta \) having the phaseless instance optimality (1.3) with constant \( C_0 \). In particular, one such decoder is

$$\begin{aligned} \Delta (|Ax_0|)=\mathop {\hbox {argmin}}_{|Ax|=|Ax_0|}\sigma _k(x)_X. \end{aligned}$$

Proof

Assume that \( I\subseteq \{1,\ldots ,m\}\) and let \(\eta _1\in \mathcal {N}(A_I)\), \(\eta _2\in \mathcal {N}(A_{I^c})\). Since at least one of I and \(I^c\) has cardinality at least m/2, the strong null space property implies that either \(\Vert \eta _1\Vert _X\le \frac{C_0}{2}\sigma _{2k}(\eta _1)_X\) or \(\Vert \eta _2\Vert _X\le \frac{C_0}{2}\sigma _{2k}(\eta _2)_X\). If \(\Vert \eta _1\Vert _X\le \frac{C_0}{2}\sigma _{2k}(\eta _1)_X\) then

$$\begin{aligned} \Vert \eta _1\Vert _X\le \frac{C_0}{2}\sigma _{2k}(\eta _1)_X\le \frac{C_0}{4}\sigma _k(\eta _1-\eta _2)_X+\frac{C_0}{4}\sigma _k(\eta _1+\eta _2)_X. \end{aligned}$$

Similarly if \(\Vert \eta _2\Vert _X\le \frac{C_0}{2}\sigma _{2k}(\eta _2)_X\) we will have

$$\begin{aligned} \Vert \eta _2\Vert _X\le \frac{C_0}{2}\sigma _{2k}(\eta _2)_X\le \frac{C_0}{4}\sigma _k(\eta _1-\eta _2)_X+\frac{C_0}{4}\sigma _k(\eta _1+\eta _2)_X. \end{aligned}$$

It follows that

$$\begin{aligned} \min \{\Vert \eta _1\Vert _X , \Vert \eta _2\Vert _X \} \le \frac{C_0}{4}\sigma _k(\eta _1-\eta _2)_X+\frac{C_0}{4}\sigma _k(\eta _1+\eta _2)_X. \end{aligned}$$
(4.16)

Theorem 4.2 now implies that the required decoder \(\Delta \) exists. Furthermore, by the proof of the sufficiency part of Theorem 4.2,

$$\begin{aligned} \Delta (|Ax_0|)=\mathop {\hbox {argmin}}_{|Ax|=|Ax_0|}\sigma _k(x)_X \end{aligned}$$

is one such decoder. \(\square \)

4.3 The Case \( X=\ell _1 \)

We will now apply Theorem 4.3 to the \(\ell _1\)-norm case. The following lemma establishes a relation between S-RIP and S-NSP for the \( \ell _1 \)-norm.

Lemma 4.1

Let k be a positive integer and let \(a, b>0\) be such that ak and bk are integers. Assume that \( A\in \mathbb {R}^{m\times N} \) satisfies the S-RIP of order \( (a+b)k \) with constants \( \theta _-, \ \theta _+\in (0, 2) \). Then A satisfies the S-NSP of order ak under the \(\ell _1\)-norm with constant

$$\begin{aligned} C_0=1+\sqrt{\frac{a(1+\delta )}{b(1-\delta )}}, \end{aligned}$$

where \( \delta \) is the restricted isometry constant and \(\delta :=\max \{1-\theta _-,\theta _+-1\}<1\).

We remark that the above lemma is analogous to the following lemma of [5], which provides a relationship between RIP and NSP:

Lemma 4.2

([5, Lemma 4.1]) Let \(a=l/k\), \(b=l'/k\) where \(l,l'\ge k\) are integers. Assume that \( A\in \mathbb {R}^{m\times N}\) satisfies the RIP of order \( (a+b)k \) with \( \delta =\delta _{(a+b)k}<1 \). Then A satisfies the null space property under the \( \ell _1 \)-norm of order ak with constant \(C_0=1+\frac{\sqrt{a(1+\delta )}}{\sqrt{b(1-\delta )}} \).

Proof

By the definition of S-RIP, for any index set \( I\subseteq \{1,\ldots ,m\} \) with \( |I|\ge m/2 \), the matrix \( A_I\in {\mathbb R}^{|I|\times N} \) satisfies the RIP of order \( (a+b)k \) with constant \(\delta _{(a+b)k}=\delta :=\max \{1-\theta _-,\theta _+-1\}< 1 \). It follows from Lemma 4.2 that

$$\begin{aligned} \Vert \eta \Vert _1\le \left( 1+\sqrt{\frac{a(1+\delta )}{b(1-\delta )}}\,\right) \sigma _{ak}(\eta )_1 \end{aligned}$$

for all \(\eta \, \in \mathcal {N}(A_I)\). This proves the lemma. \(\square \)

Setting \( a=2\) and \(b=1 \) in Lemma 4.1, we infer that if A satisfies the S-RIP of order 3k with constants \( \theta _-, \ \theta _+\in (0, 2)\), then A satisfies the S-NSP of order 2k under the \(\ell _1\)-norm with constant \(C_0=1+\sqrt{\frac{2(1+\delta )}{1-\delta }} \). Hence by Theorem 4.3, there must exist a decoder that has the phaseless instance optimality under the \( \ell _1 \)-norm with constant \( 2C_0 \). According to Theorem 2.1, by taking \(m=O(k\log (N/k))\) a Gaussian random matrix A satisfies the S-RIP of order 3k with high probability. Hence, there exists a decoder \(\Delta \) so that the pair \((A, \Delta )\) has the \(\ell _1\)-norm phaseless instance optimality at the cost of \(m=O(k\log (N/k))\) measurements, as with the traditional instance optimality.

We are now ready to prove the following theorem on phaseless instance optimality under the \(\ell _1\)-norm.

Theorem 4.4

Let \( A \in {\mathbb R}^{m\times N}\) satisfy the S-RIP of order tk with constants \( 0<\theta _-<1< \theta _+<2 \), where

$$\begin{aligned} t\ge \max \left\{ \frac{2}{\theta _-}, \frac{2}{2-\theta _+} \right\} >2. \end{aligned}$$

Let

$$\begin{aligned} \Delta (|Ax_0|)=\mathop {\hbox {argmin}}_{x\in {\mathbb R}^N}\left\{ \Vert x\Vert _1: |Ax|=|Ax_0|\right\} . \end{aligned}$$
(4.17)

Then \( (A,\Delta ) \) has the \(\ell _1\)-norm phaseless instance optimality with constant \( C=\frac{2C_0}{2-C_0} \), where \(C_0=1+\sqrt{\frac{1+\delta }{(t-1)(1-\delta )}} \) and as before

$$\begin{aligned} \delta :=\max \{1-\theta _-,\theta _+-1\}\le 1-\frac{2}{t}. \end{aligned}$$

Proof of Theorem 4.4

Let \(x_0 \in {\mathbb R}^N\) and set \(\hat{x} =\Delta (|Ax_0|)\). Then by definition

$$\begin{aligned} \Vert \hat{x}\Vert _1\le \Vert x_0\Vert _1\quad \text {and}\quad |A\hat{x}|=|Ax_0|. \end{aligned}$$

Denote by \(I\subseteq \{1,\ldots ,m\}\) the set of indices

$$\begin{aligned} I=\left\{ j: \langle {a_j,\hat{x}} \rangle =\langle {a_j,x_0} \rangle \right\} , \end{aligned}$$

and thus \(\langle {a_j,\hat{x}} \rangle =-\langle {a_j,x_0} \rangle \) for \(j \in I^c\). It follows that

$$\begin{aligned} A_I(\hat{x}-x_0)=0 \quad \text{ and } \quad A_{I^c}(\hat{x}+x_0)=0. \end{aligned}$$

Set

$$\begin{aligned} \eta :=\hat{x}-x_0\in \mathcal {N}(A_I). \end{aligned}$$

We know that A satisfies the S-RIP of order tk with constants \( \theta _-,\ \theta _+ \) where

$$\begin{aligned} t\ge \max \left\{ \frac{2}{\theta _-}, \frac{2}{2-\theta _+} \right\} >2. \end{aligned}$$

For the case \(|I|\ge m/2\), \(A_I\) satisfies the RIP of order tk with RIP constant

$$\begin{aligned} \delta =\delta _{tk}:=\max \{1-\theta _-, \theta _+-1\}\le 1-\frac{2}{t}< 1. \end{aligned}$$

Take \( a:=1,\ b:=t-1 \) in Lemma 4.1. Then A satisfies the \(\ell _1\)-norm S-NSP of order k with constant

$$\begin{aligned} C_0=1+\sqrt{\frac{1+\delta }{(t-1)(1-\delta )}}<2. \end{aligned}$$

This yields

$$\begin{aligned} \Vert \eta \Vert _1\le C_0\Vert \eta _{T^c}\Vert _1, \end{aligned}$$
(4.18)

where T is the index set for the k largest coefficients of \( x_0 \) in magnitude. Hence \( \Vert \eta _T\Vert _1\le (C_0-1)\Vert \eta _{T^c}\Vert _1 \). Since \( \Vert \hat{x}\Vert _1\le \Vert x_0\Vert _1 \) we have

$$\begin{aligned} \Vert x_0\Vert _1\ge \Vert \hat{x}\Vert _1&=\Vert x_0+\eta \Vert _1 =\Vert x_{0,T}+x_{0,T^c}+\eta _T+\eta _{T^c}\Vert _1\\&\ge \Vert x_{0,T}\Vert _1-\Vert x_{0,T^c}\Vert _1+\Vert \eta _{T^c}\Vert _1-\Vert \eta _T\Vert _1. \end{aligned}$$

It follows that

$$\begin{aligned} \Vert \eta _{T^c}\Vert _1\le \Vert \eta _T\Vert _1+2\sigma _k(x_0)_1\le (C_0-1)\Vert \eta _{T^c}\Vert _1+2\sigma _k(x_0)_1 \end{aligned}$$

and thus

$$\begin{aligned} \Vert \eta _{T^c}\Vert _1\le \frac{2}{2-C_0}\sigma _k(x_0)_1. \end{aligned}$$

Now (4.18) yields

$$\begin{aligned} \Vert \eta \Vert _1\le C_0\Vert \eta _{T^c}\Vert _1\le \frac{2C_0}{2-C_0}\sigma _k(x_0)_1, \end{aligned}$$

which implies

$$\begin{aligned} \Vert \hat{x}-x_0\Vert _1\le C_0\Vert \eta _{T^c}\Vert _1\le \frac{2C_0}{2-C_0}\sigma _k(x_0)_1. \end{aligned}$$

For the case \(|I^c|\ge m/2\), an identical argument with \(\eta :=\hat{x}+x_0\in \mathcal {N}(A_{I^c})\) yields

$$\begin{aligned} \Vert \hat{x}+x_0\Vert _1\le C_0\Vert \eta _{T^c}\Vert _1\le \frac{2C_0}{2-C_0}\sigma _k(x_0)_1. \end{aligned}$$

The theorem is now proved. \(\square \)

By Theorem 2.1, an \(m\times N\) random Gaussian matrix with \(m=\mathcal {O}(tk\log (N/k))\) satisfies the S-RIP of order tk with high probability. Since t depends only on \(\theta _-\) and \(\theta _+\), we conclude that the \(\ell _1\)-norm phaseless instance optimality of order k can be achieved at the cost of \(m=\mathcal {O}(k\log (N/k))\) measurements.
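To get a feel for the constants in Theorem 4.4, one can evaluate them for illustrative S-RIP bounds (the numbers below are assumed, not from the paper):

```python
import math

theta_minus, theta_plus = 0.8, 1.2                  # assumed S-RIP bounds
delta = max(1 - theta_minus, theta_plus - 1)        # restricted isometry constant
t_min = max(2 / theta_minus, 2 / (2 - theta_plus))  # smallest admissible t
t = 3.0                                             # any t >= t_min works
assert t >= t_min and delta <= 1 - 2 / t            # hypotheses of Theorem 4.4

C0 = 1 + math.sqrt((1 + delta) / ((t - 1) * (1 - delta)))  # S-NSP constant, < 2
C = 2 * C0 / (2 - C0)     # phaseless instance-optimality constant
print(round(C0, 3), round(C, 3))
```

Note that as \(\delta \) approaches \(1-2/t\) the constant \(C_0\) approaches 2 and C blows up, so stronger S-RIP bounds (larger t, smaller \(\delta \)) give a quantitatively better instance-optimality constant.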

4.4 Mixed-Norm Phaseless Instance Optimality

We now consider mixed-norm phaseless instance optimality. Let \( 1\le q\le p\le 2 \) and \(s=1/q-1/p \). We seek estimates of the form

$$\begin{aligned} \min \{\Vert x-\Delta (|Ax|)\Vert _{p}, \Vert x+\Delta (|Ax|)\Vert _{p}\} \le C_0 k^{-s}\sigma _k(x)_{q} \end{aligned}$$
(4.19)

for all \(x\in {\mathbb R}^N\). We shall prove both necessary and sufficient conditions for mixed-norm phaseless instance optimality.

Theorem 4.5

Let \(A\in {\mathbb R}^{ m\times N} \) and \(1\le q\le p\le 2\). Set \(s=1/q-1/p \). Then a decoder \( \Delta \) satisfying the mixed-norm phaseless instance optimality (4.19) with constant \( C_0 \) exists if the following holds: for any index set \( I\subseteq \{1,\ldots ,m\} \) and any \(\eta _1\in \mathcal {N}(A_I)\), \(\eta _2\in \mathcal {N}(A_{I^c})\) we have

$$\begin{aligned} \min \{\Vert \eta _1\Vert _p , \Vert \eta _2\Vert _p \} \le \frac{C_0}{4}k^{-s}\Bigl (\sigma _k(\eta _1-\eta _2)_q+\sigma _k(\eta _1+\eta _2)_q\Bigr ). \end{aligned}$$
(4.20)

Conversely, assume a decoder \(\Delta \) satisfying the mixed-norm phaseless instance optimality (4.19) exists. Then for any index set \( I\subseteq \{1,\ldots ,m\} \) and any \(\eta _1\in \mathcal {N}(A_I)\), \(\eta _2\in \mathcal {N}(A_{I^c})\) we have

$$\begin{aligned} \min \{\Vert \eta _1\Vert _p , \Vert \eta _2\Vert _p \} \le \frac{C_0}{2}k^{-s}\Bigl (\sigma _k(\eta _1-\eta _2)_q+\sigma _k(\eta _1+\eta _2)_q\Bigr ). \end{aligned}$$

Proof of Theorem 4.5

The proof is virtually identical to the proof of Theorem 4.2. We shall omit the details here in the interest of brevity. \(\square \)

Definition 4.2

(Mixed-Norm Strong Null Space Property) We say that A has the mixed strong null space property in norms \( (\ell _p,\ell _q) \) of order k with constant C if for any index set \( I\subseteq \{1,\ldots ,m\} \) with \( |I|\ge m/2 \) the matrix \( A_I\in {\mathbb R}^{|I|\times N} \) satisfies

$$\begin{aligned} \Vert \eta \Vert _p\le Ck^{-s}\sigma _k(\eta )_q \end{aligned}$$

for all \(\eta \in \mathcal {N}(A_I)\), where \(s = 1/q-1/p\).

The above is an extension of the standard definition of the mixed null space property of order k in norms \( (\ell _p,\ell _q) \) (see [5]) for a matrix A, which requires

$$\begin{aligned} \Vert \eta \Vert _p\le Ck^{-s}\sigma _k(\eta )_q \end{aligned}$$

for all \(\eta \in \mathcal {N}(A)\). We have the following straightforward generalization of Theorem 4.3.
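For concreteness, the best k-term approximation error \(\sigma _k(\cdot )_q\) appearing in these null space properties can be computed directly; a minimal sketch (the helper name `sigma_k` and the example vector are ours):

```python
import numpy as np

def sigma_k(x, k, q):
    """l_q norm of x with its k largest-magnitude entries removed,
    i.e. the best k-term approximation error sigma_k(x)_q."""
    idx = np.argsort(np.abs(x))[::-1]   # indices by decreasing magnitude
    tail = np.abs(x[idx[k:]])           # entries outside the k largest
    return float(np.sum(tail ** q) ** (1.0 / q))

x = np.array([5.0, 0.0, -3.0, 0.0, 0.0, 1.0])
print(sigma_k(x, 3, 1))   # x is exactly 3-sparse, so the error is 0.0
print(sigma_k(x, 2, 1))   # only the entry 1.0 survives in the tail: 1.0
```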

Theorem 4.6

Assume that \( A\in \mathbb {R}^{m\times N} \) has the mixed strong null space property of order 2k in norms \( (\ell _p,\ell _q) \) with constant \( C_0/2 \), where \( 1\le q\le p\le 2 \). Then there exists a decoder \(\Delta \) such that the mixed-norm phaseless instance optimality (4.19) holds with constant \(C_0 \).

We now establish the relationship between the mixed-norm strong null space property and the S-RIP. First we present the following lemma, which was proved in [5].

Lemma 4.3

([5, Lemma 8.2]) Let \(k\ge 1\) and \(\tilde{k} = \lceil {k(\frac{N}{k})^{2-2/q}}\rceil \). Assume that \(A\in {\mathbb R}^{m\times N}\) satisfies the RIP of order \( 2k+\tilde{k}\) with \( \delta :=\delta _{2k+\tilde{k}}<1 \). Then A satisfies the mixed null space property in norms \((\ell _p,\ell _q)\) of order 2k with constant \( C_0=2^{1/p+1/2}\sqrt{\frac{1+\delta }{1-\delta }}+2^{1/p-1/q}\).

Proposition 4.1

Let \(k\ge 1\) and \(\tilde{k} = \lceil {k(\frac{N}{k})^{2-2/q}}\rceil \). Assume that \(A\in {\mathbb R}^{m\times N}\) satisfies the S-RIP of order \( 2k+\tilde{k}\) with constants \( 0<\theta _- <1 <\theta _+<2\). Then A satisfies the mixed strong null space property in norms \( (\ell _p, \ell _q) \) of order 2k with constant \(C_0=2^{1/p+1/2}\sqrt{\frac{1+\delta }{1-\delta }}+2^{1/p-1/q} \), where \( \delta :=\delta _{2k+\tilde{k}}= \max \{1-\theta _-, \theta _+-1\}\) is the RIP constant.

Proof of Proposition 4.1

By definition, for any index set \( I\subseteq \{1,\ldots ,m\} \) with \( |I|\ge m/2 \), the matrix \( A_I\in {\mathbb R}^{|I|\times N} \) satisfies the RIP of order \( 2k+\tilde{k} \) with constant \( \delta :=\delta _{2k+\tilde{k}}= \max \{1-\theta _-, \theta _+-1\}\). By Lemma 4.3, \( A_I \) satisfies the mixed null space property in norms \( (\ell _p,\ell _q) \) of order 2k with constant \( C_0=2^{1/p+1/2}\sqrt{\frac{1+\delta }{1-\delta }}+2^{1/p-1/q} \); in other words, for any \(\eta \in \mathcal {N}(A_I)\),

$$\begin{aligned} \Vert \eta \Vert _p\le C_0(2k)^{-s}\sigma _{2k}(\eta )_q. \end{aligned}$$

So A satisfies the mixed strong null space property. \(\square \)

Corollary 4.1

Let \(k\ge 1\) and \(\tilde{k} = \lceil {k(\frac{N}{k})^{2-2/q}}\rceil \). Assume that \(A\in {\mathbb R}^{m\times N}\) satisfies the S-RIP of order \( 2k+\tilde{k}\) with constants \( 0<\theta _- <1 <\theta _+<2\). Let \(\delta :=\delta _{2k+\tilde{k}} =\max \{1-\theta _-, \theta _+-1\}<1 \). Define the decoder \( \Delta \) for A by

$$\begin{aligned} \Delta (|Ax_0|)=\mathop {\hbox {argmin}}_{|Ax|=|Ax_0|}\sigma _k(x)_q. \end{aligned}$$
(4.21)

Then (4.19) holds with constant \(2C_0\), where \(C_0=2^{1/p+1/2}\sqrt{\frac{1+\delta }{1-\delta }}+2^{1/p-1/q} \).

Proof of Corollary 4.1

By Proposition 4.1, the matrix A satisfies the mixed strong null space property in norms \( (\ell _p,\ell _q) \) of order 2k with constant \( C_0=2^{1/p+1/2}\sqrt{\frac{1+\delta }{1-\delta }}+2^{1/p-1/q} \). The corollary now follows immediately from Theorem 4.6. \(\square \)

Remark 4.1

Combining Theorem 2.1 and Corollary 4.1, the mixed phaseless instance optimality of order k in norms \( (\ell _p,\ell _q) \) can be achieved at the cost of \( \mathcal {O}(k(N/k)^{2-2/q}\log (N/k)) \) measurements, just as with the traditional mixed \((\ell _p,\ell _q)\)-norm instance optimality. Theorem 3.1 implies that the \(\ell _1\) decoder satisfies the \((p,q)=(2,1)\) mixed-norm phaseless instance optimality at the cost of \( \mathcal {O}(k\log (N/k)) \) measurements.

5 Appendix: Proof of Lemma 2.1

We will first need the following two lemmas to prove Lemma 2.1.

Lemma 5.1

(Sparse Representation of a Polytope [2, 12]) Let \(s\ge 1\) and \(\alpha >0\). Set

$$\begin{aligned} T(\alpha ,s):=\Bigl \{u\in \mathbb {R}^n: \Vert u\Vert _\infty \le \alpha ,\ \Vert u\Vert _1\le s\alpha \Bigr \}. \end{aligned}$$

For any \(v\in \mathbb {R}^n\) let

$$\begin{aligned} U (\alpha ,s,v):=\Bigl \{u\in \mathbb {R}^n:\hbox {supp}(u)\subseteq \hbox {supp}(v),\Vert u\Vert _0\le s,\Vert u\Vert _1=\Vert v\Vert _1,\Vert u\Vert _\infty \le \alpha \Bigr \}. \end{aligned}$$

Then \(v\in T(\alpha ,s)\) if and only if v is in the convex hull of \( U (\alpha ,s,v)\), i.e. v can be expressed as a convex combination of some \(u_1, \dots , u_N\) in \( U (\alpha ,s,v)\).

Lemma 5.2

([1, Lemma 5.3]) Assume that \( a_1\ge a_2\ge \cdots \ge a_m\ge 0 \). Let \(r \le m\) and \(\lambda \ge 0\) be such that \( \sum _{i=1}^{r}a_i + \lambda \ge \sum _{i=r+1}^{m}a_i \). Then for all \( \alpha \ge 1 \) we have

$$\begin{aligned} \sum _{j=r+1}^{m}a_j^\alpha \le r \left( \root \alpha \of {\frac{\sum _{i=1}^{r}a_i^\alpha }{r}}+\frac{\lambda }{r}\right) ^\alpha . \end{aligned}$$
(5.1)

In particular for \(\lambda =0\) we have

$$\begin{aligned} \sum _{j=r+1}^{m}a_j^\alpha \le \sum _{i=1}^{r}a_i^\alpha . \end{aligned}$$
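As a sanity check, inequality (5.1) can be verified numerically on a random nonincreasing sequence (an illustration, not a proof; the dimensions and seed are our choices, and \(\lambda \) below is taken as the smallest value satisfying the hypothesis):

```python
import numpy as np

# Numerical spot-check of inequality (5.1) from Lemma 5.2.
rng = np.random.default_rng(1)
m, r, alpha = 50, 7, 2.0
a = np.sort(rng.random(m))[::-1]            # a_1 >= ... >= a_m >= 0
# Smallest lambda with sum_{i<=r} a_i + lambda >= sum_{i>r} a_i:
lam = max(0.0, a[r:].sum() - a[:r].sum())

lhs = np.sum(a[r:] ** alpha)
rhs = r * ((np.sum(a[:r] ** alpha) / r) ** (1.0 / alpha) + lam / r) ** alpha
assert lhs <= rhs
```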

We are now ready to prove Lemma 2.1.

Proof

Set \(h:=\hat{x}-x_0\). Let \(T_0\) denote the index set of the k largest coefficients of \(x_0\) in magnitude. Then

$$\begin{aligned} \Vert x_0\Vert _1+\rho&\ge \Vert \hat{x}\Vert _1 =\Vert x_0+h\Vert _1 \\&=\Vert x_{0,T_0}+h_{T_0}+x_{0,T_0^c}+h_{T_0^c}\Vert _1\\&\ge \Vert x_{0,T_0}\Vert _1-\Vert h_{T_0}\Vert _1-\Vert x_{0,T_0^c}\Vert _1+\Vert h_{T_0^c}\Vert _1. \end{aligned}$$

It follows that

$$\begin{aligned} \Vert h_{T_0^c}\Vert _1&\le \Vert h_{T_0}\Vert _1+2\Vert x_{0,T_0^c}\Vert _1+\rho \\&=\Vert h_{T_0}\Vert _1+2\sigma _k(x_0)_1+\rho . \end{aligned}$$

Let \( S_0 \) denote the index set of the k largest entries of h in absolute value. Then

$$\begin{aligned} \Vert h_{S_0^c}\Vert _1\le \Vert h_{T_0^c}\Vert _1&\le \Vert h_{T_0}\Vert _1+2\sigma _k(x_0)_1+\rho \\&\le \Vert h_{S_0}\Vert _1+2\sigma _k(x_0)_1+\rho . \end{aligned}$$

Set

$$\begin{aligned} \alpha :=\frac{\Vert h_{S_0}\Vert _1+2\sigma _k(x_0)_1+\rho }{k}. \end{aligned}$$

We divide \( h_{S_0^c }\) into two parts \( h_{S_0^c }=h^{(1)}+h^{(2)} \), where

$$\begin{aligned} h^{(1)}:=h_{S_0^c}\cdot I_{\{i:\,|h_{S_0^c}(i)|>\alpha /(t-1)\}}, \quad h^{(2)}:=h_{S_0^c}\cdot I_{\{i:\,|h_{S_0^c}(i)|\le \alpha /(t-1)\}} . \end{aligned}$$

A simple observation is that \(\Vert h^{(1)}\Vert _1\le \Vert h_{S_0^c}\Vert _1\le \alpha k \). Set

$$\begin{aligned} \ell := |\mathrm{supp}(h^{(1)})|=\Vert h^{(1)}\Vert _0 . \end{aligned}$$

Since all non-zero entries of \( h^{(1)} \) have magnitude larger than \( \alpha /(t-1) \), we have

$$\begin{aligned} \alpha k\ge \Vert h^{(1)}\Vert _1=\sum _{i\in \mathrm{supp}(h^{(1)})}|h^{(1)}(i)| \ge \sum _{i\in \mathrm{supp}(h^{(1)})}\frac{\alpha }{t-1}=\frac{\alpha \ell }{t-1}, \end{aligned}$$

which implies \( \ell \le (t-1)k \). Thus we have:

$$\begin{aligned} \big \langle A(h_{S_0}+h^{(1)}), Ah\big \rangle \le \Vert A(h_{S_0}+h^{(1)})\Vert _2\cdot \Vert Ah\Vert _2 \le \sqrt{1+\delta }\cdot \Vert h_{S_0}+h^{(1)}\Vert _2\cdot \epsilon .\nonumber \\ \end{aligned}$$
(5.2)

Here we apply the facts that \(\Vert h_{S_0}+h^{(1)}\Vert _0\le \ell +k\le tk \), that A satisfies the RIP of order tk with \( \delta :=\delta _{tk}^A \), and that \(\Vert Ah\Vert _2\le \epsilon \). We shall first assume that tk is an integer. Note that \(\Vert h^{(2)}\Vert _\infty \le \frac{\alpha }{t-1}\) and

$$\begin{aligned} \Vert h^{(2)}\Vert _1 =\Vert h_{S_0^c}\Vert _1-\Vert h^{(1)}\Vert _1 \le k\alpha - \frac{ \alpha \ell }{t-1}=(k(t-1)-\ell )\frac{\alpha }{t-1}. \end{aligned}$$
(5.3)

We apply Lemma 5.1 with \( s:=k(t-1)-\ell \) and with the \(\alpha \) of that lemma taken to be \(\alpha /(t-1)\), and obtain that \( h^{(2)} \) is a convex combination

$$\begin{aligned} h^{(2)}=\sum _{i=1}^{N}\lambda _iu_i,\quad \quad 0\le \lambda _i\le 1, \quad \sum _{i=1}^N\lambda _i=1 \end{aligned}$$

where \( \Vert u_i\Vert _0\le k(t-1)-\ell , \Vert u_i\Vert _1=\Vert h^{(2)}\Vert _1 \), \(\Vert u_i\Vert _\infty \le \alpha /(t-1) \) and \(\hbox {supp}(u_i)\subseteq \hbox {supp}(h^{(2)}) \). Hence

$$\begin{aligned} \Vert u_i\Vert _2\le \sqrt{\Vert u_i\Vert _0}\cdot \Vert u_i\Vert _\infty&=\sqrt{k(t-1)-\ell }\cdot \Vert u_i\Vert _\infty \\&\le \sqrt{k(t-1)}\cdot \Vert u_i\Vert _\infty \\&\le \alpha \sqrt{k/(t-1)}. \end{aligned}$$

Now for \( 0\le \mu \le 1 \) and \( d\ge 0\), which will be chosen later, set

$$\begin{aligned} \beta _j:=h_{S_0}+h^{(1)}+\mu \cdot u_j, \quad j=1,\ldots ,N. \end{aligned}$$

Then for each fixed \(i\in \{1,\ldots ,N\}\),

$$\begin{aligned} \sum _{j=1}^{N}\lambda _j\beta _j-d\beta _i&=h_{S_0}+h^{(1)}+\mu \cdot h^{(2)}-d\beta _i\\&=(1-\mu -d)(h_{S_0}+h^{(1)})-d\mu u_i+\mu h. \end{aligned}$$

Recall that \(\alpha =\frac{\Vert h_{S_0}\Vert _1+2\sigma _k(x_0)_1+\rho }{k}\). Thus

$$\begin{aligned} \Vert u_i\Vert _2&\le \sqrt{k/(t-1)}\alpha \\\nonumber&\le \frac{\Vert h_{S_0}\Vert _2}{\sqrt{t-1}}+\frac{2\sigma _k(x_0)_1+\rho }{\sqrt{k(t-1)}}\\\nonumber&\le \frac{\Vert h_{S_0}+h^{(1)}\Vert _2}{\sqrt{t-1}}+\frac{2\sigma _k(x_0)_1+\rho }{\sqrt{k(t-1)}}\\\nonumber&=\frac{z+R}{\sqrt{t-1}},\nonumber \end{aligned}$$
(5.4)

where \( z:=\Vert h_{S_0}+h^{(1)}\Vert _2\) and \(R:=\frac{2\sigma _k(x_0)_1+\rho }{\sqrt{k}}\). It is easy to check the following identity:

$$\begin{aligned} (2d-1)&\sum _{1\le i<j\le N}\lambda _i\lambda _j\Vert A(\beta _i-\beta _j)\Vert _2^2 \nonumber \\&=\sum _{i=1}^{N}\lambda _i\Bigl \Vert A(\sum _{j=1}^{N}\lambda _j\beta _j-d\beta _i)\Bigr \Vert _2^2 - \sum _{i=1}^{N}\lambda _i(1-d)^2\Vert A\beta _i\Vert _2^2, \end{aligned}$$
(5.5)

provided that \(\sum _{i=1}^N\lambda _i=1\). Choosing \(d=1/2\) in (5.5), the left-hand side vanishes and we obtain

$$\begin{aligned} \sum _{i=1}^{N}\lambda _i\Bigl \Vert A\Bigl ( (\frac{1}{2}-\mu )(h_{S_0}+h^{(1)})-\frac{\mu }{2}u_i+\mu h\Bigr )\Bigr \Vert _2^2-\sum _{i=1}^{N}\frac{\lambda _i}{4}\Vert A\beta _i\Vert _2^2 = 0. \end{aligned}$$

Note that for \(d=1/2\),

$$\begin{aligned}&\Bigl \Vert A\Bigl ( (\frac{1}{2}-\mu )(h_{S_0}+h^{(1)})-\frac{\mu }{2}u_i+\mu h\Bigr )\Bigr \Vert _2^2\\&\quad = \Bigl \Vert A\Bigl ( (\frac{1}{2}-\mu )(h_{S_0}+h^{(1)})-\frac{\mu }{2}u_i\Bigr ) \Bigr \Vert _2^2\\&\quad \quad +2\Bigl \langle A\Bigl ((\frac{1}{2}-\mu )(h_{S_0}+h^{(1)})-\frac{\mu }{2}u_i\Bigr ), \mu Ah\Bigr \rangle +\mu ^2\Vert Ah\Vert _2^2. \end{aligned}$$

It follows from \(\sum _{i=1}^N \lambda _i =1\), \(h^{(2)}=\sum _{i=1}^{N}\lambda _iu_i\) and the identity above that

$$\begin{aligned} 0&=\sum _{i=1}^{N}\lambda _i\Bigl \Vert A\Bigl ( (\frac{1}{2}-\mu )(h_{S_0}+h^{(1)})-\frac{\mu }{2}u_i+\mu h\Bigr )\Bigr \Vert _2^2-\sum _{i=1}^{N}\frac{\lambda _i}{4}\Vert A\beta _i\Vert _2^2\nonumber \\&= \sum _{i=1}^{N}\lambda _i\Bigl \Vert A\Bigl ( (\frac{1}{2}-\mu )(h_{S_0}+h^{(1)})-\frac{\mu }{2}u_i\Bigr ) \Bigr \Vert _2^2\nonumber \\&\quad +2\Bigl \langle A\Bigl ((\frac{1}{2}-\mu )(h_{S_0}+h^{(1)})-\frac{\mu }{2}h^{(2)}\Bigr ), \mu Ah\Bigr \rangle +\mu ^2\Vert Ah\Vert _2^2-\sum _{i=1}^{N}\frac{\lambda _i}{4}\Vert A\beta _i\Vert _2^2 \nonumber \\&= \sum _{i=1}^{N}\lambda _i\Bigl \Vert A\Bigl ( (\frac{1}{2}-\mu )(h_{S_0}+h^{(1)})-\frac{\mu }{2}u_i\Bigr ) \Bigr \Vert _2^2\nonumber \\&\quad + \mu (1-\mu )\Bigl \langle A(h_{S_0}+h^{(1)}),Ah\Bigr \rangle -\sum _{i=1}^{N}\frac{\lambda _i}{4}\Vert A\beta _i\Vert _2^2, \end{aligned}$$
(5.6)

where the last equality uses \((\frac{1}{2}-\mu )(h_{S_0}+h^{(1)})-\frac{\mu }{2}h^{(2)}=\frac{1-\mu }{2}(h_{S_0}+h^{(1)})-\frac{\mu }{2}h\).

Set \( \mu =\sqrt{t(t-1)}-(t-1)\). We next estimate the three terms in (5.6). Noting that \(\Vert h_{S_0}\Vert _0\le k \), \( \Vert h^{(1)}\Vert _0\le \ell \) and \(\Vert u_i\Vert _0\le s =k(t-1)-\ell \), we obtain

$$\begin{aligned} \Vert \beta _i\Vert _0\le \Vert h_{S_0}\Vert _0 + \Vert h^{(1)}\Vert _0+ \Vert u_i\Vert _0\le t\cdot k \end{aligned}$$

and \(\Vert (\frac{1}{2}-\mu )(h_{S_0}+h^{(1)})-\frac{\mu }{2}u_i\Vert _0\le t\cdot k \). Since A satisfies the RIP of order \(t\cdot k\) with constant \(\delta \), we have

$$\begin{aligned}&\Bigl \Vert A\Bigl ( (\frac{1}{2}-\mu )(h_{S_0}+h^{(1)})-\frac{\mu }{2}u_i\Bigr ) \Bigr \Vert _2^2 \\&\qquad \le (1+\delta )\Vert (\frac{1}{2}-\mu )(h_{S_0}+h^{(1)})-\frac{\mu }{2}u_i\Vert _2^2\\&\quad = (1+\delta )\Bigl ((\frac{1}{2}-\mu )^2 \Vert (h_{S_0}+h^{(1)})\Vert _2^2+\frac{\mu ^2}{4}\Vert u_i\Vert _2^2\Bigr )\\&\quad = (1+\delta )\Bigl ((\frac{1}{2}-\mu )^2 z^2+\frac{\mu ^2}{4}\Vert u_i\Vert _2^2\Bigr ) \end{aligned}$$

and

$$\begin{aligned} \Vert A\beta _i\Vert _2^2\ge & {} (1-\delta )\Vert \beta _i\Vert _2^2 =(1-\delta )( \Vert h_{S_0}+h^{(1)}\Vert _2^2+\mu ^2\cdot \Vert u_i\Vert _2^2)\\= & {} (1-\delta )(z^2+\mu ^2\cdot \Vert u_i\Vert _2^2). \end{aligned}$$

Combining the result above with (5.2) and (5.4) we get

$$\begin{aligned}&0\le (1+\delta )\sum _{i=1}^{N}\lambda _i\Bigl ((\frac{1}{2}-\mu )^2z^2+\frac{\mu ^2}{4}\Vert u_i\Vert _2^2\Bigr ) +\mu (1-\mu )\sqrt{1+\delta }\cdot z\cdot \epsilon \\&\qquad -(1-\delta )\sum _{i=1}^{N}\frac{\lambda _i}{4}(z^2+\mu ^2\Vert u_i\Vert _2^2)\\&\quad =\sum _{i=1}^{N}\lambda _i\Bigl (\Bigl ((1+\delta )(\frac{1}{2}-\mu )^2-\frac{1-\delta }{4} \Bigr )z^2+\frac{\delta }{2}\mu ^2\Vert u_i\Vert _2^2 \Bigr )+\mu (1-\mu )\sqrt{1+\delta }\cdot z\cdot \epsilon \\&\quad \le \sum _{i=1}^{N}\lambda _i\Bigl (\Bigl ((1+\delta )(\frac{1}{2}-\mu )^2-\frac{1-\delta }{4} \Bigr )z^2+\frac{\delta }{2}\mu ^2\frac{(z+R)^2}{t-1} \Bigr )\\&\qquad +\mu (1-\mu )\sqrt{1+\delta }\cdot z\cdot \epsilon \\&\quad =\Bigl ((\mu ^2-\mu )+\delta \Bigl ( \frac{1}{2}-\mu +(1+\frac{1}{2(t-1)})\mu ^2\Bigr ) \Bigr )z^2\\&\qquad +\Bigl ( \mu (1-\mu )\sqrt{1+\delta }\cdot \epsilon +\frac{\delta \mu ^2R}{t-1}\Bigr )z+\frac{\delta \mu ^2R^2}{2(t-1)} \\&\quad =-t\Bigl ((2t-1)-2\sqrt{t(t-1)} \Bigr ) (\sqrt{\frac{t-1}{t}}-\delta )z^2\\&\qquad +\Bigl ( \mu ^2\sqrt{\frac{t}{t-1}}\sqrt{1+\delta }\cdot \epsilon +\frac{\delta \mu ^2R}{t-1}\Bigr )z+\frac{\delta \mu ^2R^2}{2(t-1)}\\&\quad =\frac{\mu ^2}{t-1}\Bigl (-t(\sqrt{\frac{t-1}{t}}-\delta )z^2+(\sqrt{t(t-1)(1+\delta )}\epsilon +\delta R)z+\frac{\delta R^2}{2} \Bigr ), \end{aligned}$$

which is a quadratic inequality in z. Since \(\delta <\sqrt{(t-1)/t}\), solving this inequality yields

$$\begin{aligned} z&\le \frac{(\sqrt{t(t-1)(1+\delta )}\epsilon +\delta R)+\left( (\sqrt{t(t-1)(1+\delta )}\epsilon +\delta R)^2+2t(\sqrt{(t-1)/t}-\delta )\delta R^2 \right) ^{1/2} }{2t(\sqrt{(t-1)/t}-\delta ) }\\&\le \frac{\sqrt{t(t-1)(1+\delta )}}{t(\sqrt{(t-1)/t}-\delta )}\epsilon +\frac{2\delta +\sqrt{2t(\sqrt{(t-1)/t}-\delta )\delta }}{2t(\sqrt{(t-1)/t}-\delta )}R. \end{aligned}$$
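The passage from the quadratic inequality to the bound on z uses the larger root of the quadratic; a quick numerical spot-check (with illustrative constants of ours, not the constants of the proof):

```python
import numpy as np

# If -A z^2 + B z + C >= 0 with A > 0, then z is bounded by the larger
# root (B + sqrt(B^2 + 4AC)) / (2A). Illustrative constants:
A_, B_, C_ = 1.3, 0.7, 0.4
z_max = (B_ + np.sqrt(B_ ** 2 + 4.0 * A_ * C_)) / (2.0 * A_)

z = np.linspace(0.0, 5.0, 1001)
feasible = z[-A_ * z ** 2 + B_ * z + C_ >= 0.0]   # grid points satisfying the inequality
assert feasible.max() <= z_max + 1e-9
```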

Finally, noting that \( \Vert h_{S_0^c}\Vert _1\le \Vert h_{S_0}\Vert _1+R\sqrt{k} \), we apply Lemma 5.2 with \( m=N \), \( r=k \), \( \lambda =R\sqrt{k}\ge 0 \) and exponent \( \alpha =2 \) (the \(\alpha \) of Lemma 5.2, not the quantity defined earlier in this proof) to obtain \(\Vert h_{S_0^c}\Vert _2\le \Vert h_{S_0}\Vert _2+R \). Hence

$$\begin{aligned} \Vert h\Vert _2&=\sqrt{\Vert h_{S_0}\Vert _2^2+\Vert h_{S_0^c}\Vert _2^2}\\&\le \sqrt{\Vert h_{S_0}\Vert _2^2+(\Vert h_{S_0}\Vert _2+R)^2}\\&\le \sqrt{2\Vert h_{S_0}\Vert _2^2}+R\le \sqrt{2}z+R\\&\le \frac{\sqrt{2(1+\delta )}}{1-\sqrt{t/(t-1)}\delta }\epsilon +\left( \frac{\sqrt{2}\delta +\sqrt{t(\sqrt{(t-1)/t}-\delta )\delta }}{t(\sqrt{(t-1)/t}-\delta )}+1 \right) R. \end{aligned}$$

Substituting \(R=\frac{2\sigma _k(x_0)_1+\rho }{\sqrt{k}}\) into this inequality completes the proof in this case.
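The elementary bound \(\sqrt{z^2+(z+R)^2}\le \sqrt{2}z+R\) used in the last display can be checked numerically on a grid (illustrative values of ours):

```python
import numpy as np

# Spot-check of sqrt(z^2 + (z + R)^2) <= sqrt(2) z + R for z, R >= 0;
# squaring both sides reduces it to 2 z R <= 2 sqrt(2) z R.
z = np.linspace(0.0, 10.0, 101)
for R in (0.0, 0.5, 2.0):
    lhs = np.sqrt(z ** 2 + (z + R) ** 2)
    rhs = np.sqrt(2.0) * z + R
    assert np.all(lhs <= rhs + 1e-12)
```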

For the case where \(t\cdot k\) is not an integer, set \(t^*:=\lceil tk\rceil / k\); then \(t^*>t\) and \(\delta _{t^*k}=\delta _{tk}<\sqrt{\frac{t-1}{t}}<\sqrt{\frac{t^*-1}{t^*}}\). The result then follows by applying the argument above with t replaced by \(t^*\). \(\square \)