1 Introduction

1.1 Phase Retrieval

Phase retrieval is the process of recovering signals from phaseless measurements. It is of fundamental importance in numerous areas of applied physics and engineering [11, 14]. In its general form, the phase retrieval problem is to estimate the original signal \( x_0\in \mathbb {H}^n \) (\(\mathbb {H}={\mathbb C}\) or \( {\mathbb R}\)) from

$$\begin{aligned} |Ax|=|Ax_0|+e, \end{aligned}$$
(1.1)

where \(A = [a_1,\ldots , a_m]^*\in {\mathbb H}^{m\times n}\) is the measurement matrix and \( e=[e_1,\ldots , e_m]\in {\mathbb H}^m \) is an error term. Since only the magnitude of \( Ax_0 \) is available, the setup naturally leads to ambiguous solutions: if \( \hat{x}\in {\mathbb H}^n \) is a solution to (1.1), then \( c\hat{x} \) is also a solution for any unimodular scalar \( c\in {\mathbb H}\) (\( |c|=1 \)). These global ambiguities are therefore considered acceptable, and throughout this paper recovering the signal \( x_0 \) means reconstructing \( x_0 \) up to a unimodular constant.

It is known that, when \( {\mathbb H}={\mathbb R}\), at least \( 2n-1 \) measurements are needed to recover a signal \( x\in {\mathbb R}^n\) [3]. For the complex case, the minimum number of measurements is proved to be at least \( 4n-4 \) when n has the form \( n=2^k+1, k\in \mathbb {Z_+} \) [9]; for a general dimension n, the question is still open. More details on the minimal number of observations can be found in [4, 20]. To reduce the number of measurements, prior information must be exploited, such as sparsity, meaning that only a few entries of the target signal \( x_0 \) are nonzero. For such sparse signals, phase retrieval is also known as compressive phase retrieval, which has many applications in data acquisition [15, 18]. The compressive phase retrieval problem is in fact the magnitude-only compressive sensing problem. For compressive phase retrieval, Wang and Xu explored the minimum number of measurements and extended the null space property of compressed sensing to compressive phase retrieval [20]. In [19], Voroninski and Xu introduced the strong restricted isometry property (Definition 2.2), under which many conclusions from compressed sensing, such as instance optimality [12], can be extended to compressive phase retrieval.

1.2 Phase Retrieval with Redundant Dictionary

The above conclusions in compressive phase retrieval hold only for signals which are sparse in the standard coordinate basis. However, there are many examples in which a signal of interest is not sparse in an orthonormal basis but is sparse in an overcomplete dictionary, such as radar images [13]. We refer to such signals as dictionary-sparse signals. In recent years, many researchers have focused on such dictionary-sparse signals in compressed sensing [1, 7, 16], but the phase retrieval literature is lacking on this subject. Motivated by the wide application of redundant dictionaries and frames in signal processing and data analysis, we aim to build a framework for the recovery of dictionary-sparse signals in phase retrieval, which we call phase retrieval with a redundant dictionary.

Suppose \( D\in {\mathbb H}^{n\times N} \) (\( n<N \)) is an overcomplete, or redundant, dictionary. When \( n\ll N \), we say the dictionary D is highly overcomplete or highly redundant. Suppose the signal \( x_0\in {\mathbb H}^n \) is sparse in D, i.e., there exists a sparse vector \( z_0\in {\mathbb H}^N \) such that \( x_0=Dz_0 \). Phase retrieval with a redundant dictionary can thus be interpreted as recovering the signal \( x_0=Dz_0 \) from the measurements \( |ADz_0| \), where \( z_0 \) is sparse. That is, to recover \( Dz_0 \) from

$$\begin{aligned} |ADz|=|ADz_0|. \end{aligned}$$
(1.2)
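To make the setup concrete, here is a minimal numerical sketch of (1.2); the dimensions, the random dictionary, and the Gaussian measurement matrix are illustrative assumptions, not requirements of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, m, k = 32, 64, 120, 3                   # illustrative sizes with n < N

D = rng.standard_normal((n, N))               # a generic redundant dictionary
z0 = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
z0[support] = rng.standard_normal(k)          # k-sparse coefficient vector
x0 = D @ z0                                   # dictionary-sparse signal x0 = D z0

A = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian measurement matrix
b = np.abs(A @ x0)                            # phaseless measurements |A D z0| as in (1.2)
```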

1.3 The \( \ell _1 \)-Analysis Model

Suppose the signal \( x_0\in {\mathbb H}^n \) can be expressed as \( x_0=Dz_0 \), where \( D\in {\mathbb H}^{n\times N } \) is a redundant dictionary and \( z_0\in {\mathbb H}^N \) is a sparse vector. When \( {\mathbb H}={\mathbb C}\), \( D^* \) denotes the conjugate transpose of D ; when \( {\mathbb H}= {\mathbb R}\), it denotes the transpose of D . In compressed sensing, to reconstruct the signal \( x_0 \), the most commonly used model is the \( \ell _1 \)-analysis model

$$\begin{aligned} \min \Vert D^*x\Vert _1\quad \text {subject to}\quad \Vert Ax-Ax_0\Vert _2^2\le \epsilon ^2, \end{aligned}$$
(1.3)

where \( \epsilon \) is an upper bound on the noise level. Since the unknown x lives in the lower-dimensional space \( {\mathbb H}^n \), the \( \ell _1 \)-analysis model leads to a simpler optimization problem, which is considerably easier to solve; this is why the \( \ell _1 \)-analysis model is widely used. We refer interested readers to [1, 7, 10] for further advantages of the \( \ell _1 \)-analysis model. In [7], Candès et al. proved that when D is a tight frame and \( D^*x_0 \) is almost k -sparse, the \( \ell _1 \)-analysis model (1.3) guarantees a stable recovery provided that the measurement matrix is a Gaussian random matrix with \( m=\mathcal {O}(k\log (n/k)) \).
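Unlike the phaseless model below, (1.3) is a convex program and can be solved directly. The following sketch uses the cvxpy package and continues the snippet from Sect. 1.2; the tolerance and the noiseless measurements are illustrative assumptions.

```python
import cvxpy as cp

# l1-analysis model (1.3): linear measurements *with* phase (real case).
eps = 1e-3
y = A @ x0                                    # noiseless measurements for illustration

x = cp.Variable(A.shape[1])
objective = cp.Minimize(cp.norm(D.T @ x, 1))  # analysis sparsity ||D^* x||_1
constraints = [cp.sum_squares(A @ x - y) <= eps**2]
cp.Problem(objective, constraints).solve()
x_hat = x.value                               # estimate of x0
```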

For the phase retrieval with redundant dictionary (1.2), we also consider the corresponding \( \ell _1 \)-analysis model

$$\begin{aligned} \min \Vert D^*x\Vert _1\quad \text {subject to}\quad \Vert |Ax|-|Ax_0|\Vert _2^2\le \epsilon ^2, \end{aligned}$$
(1.4)

where \( \epsilon \) is an upper bound on the noise level. In this paper, we explore the conditions under which the \( \ell _1 \)-analysis model (1.4) generates an exact or a stable solution to (1.2). First, for the noiseless case, we analyze the null space of the measurement matrix and give conditions for exact recovery. Then, for the noisy case, we introduce a new property of the measurement matrix and prove that this property guarantees a stable recovery.

Note that when \( D=I \), phase retrieval with a redundant dictionary reduces to traditional phase retrieval, and the \( \ell _1 \)-analysis model reduces to

$$\begin{aligned} \min \Vert x\Vert _1\quad \text {subject to}\quad \Vert |Ax|-|Ax_0|\Vert _2^2\le \epsilon ^2. \end{aligned}$$
(1.5)

For this case, when \( {\mathbb H}={\mathbb R}\), Gao et al. provided a detailed analysis of (1.5) in [12] and showed that a k -sparse signal can be stably recovered from \( \mathcal {O}(k\log (n/k)) \) Gaussian random measurements. A natural question is whether this conclusion still holds for a general frame D .

1.4 Organization

The rest of the paper is organized as follows. In Sect. 2, we fix notation and recall some previous results. In Sect. 3, for the noiseless case (\( \epsilon =0 \)), we analyze the null space of the measurement matrix and give necessary and sufficient conditions for (1.4) to achieve an exact solution, treating the real and complex cases separately. In general, it is hard to check whether a matrix satisfies the null space property. So in Sect. 4, we introduce a new property (S-DRIP) (Definition 4.1) on the measurement matrix, which is a natural generalization of the DRIP (see [7] for more details). Using this property, we prove that when the measurement matrix is a real Gaussian random matrix with \( m=\mathcal {O}(k\log (n/k)) \), the \( \ell _1 \)-analysis model (1.4) guarantees a stable recovery of real signals which are k -sparse in a redundant dictionary. In Sect. 5, we discuss the limitations of our results and point out some directions for future study. Lastly, some proofs are given in the Appendix.

2 Notations and Previous Results

We use the \( \ell _0 \)-norm to count the nonzero entries of a vector z . A signal z is called k -sparse if it has at most k nonzero entries, i.e., \( \Vert z\Vert _0\le k \). A set of vectors \(\{d_1,\ldots , d_N \} \) in \( {\mathbb H}^n \) is a frame of \( {\mathbb H}^n \) if there exist constants \( 0<s\le t<\infty \) such that for any \( f\in {\mathbb H}^n \),

$$\begin{aligned} s\Vert f\Vert _2^2\le \sum \limits _{j=1}^{N}|\langle f, d_j\rangle |^2\le t\Vert f\Vert _2^2. \end{aligned}$$

If \( s=t \), the frame is called a tight frame. We call \( D\in {\mathbb H}^{n\times N} \) a frame in the sense that the columns of D form a frame. Let

$$\begin{aligned} \Sigma _k^N :=\left\{ x\in {\mathbb H}^N: \Vert x\Vert _0\le k \right\} \end{aligned}$$

and

$$\begin{aligned} D\Sigma _k^N :=\left\{ x\in {\mathbb H}^{n}: \exists z\in \Sigma _k^N, x=Dz\right\} . \end{aligned}$$

Suppose the target signal \(x_0\) is in the set \( D\Sigma _k^N \), which means that \(x_0\) can be represented as \(x_0=Dz_0\), where \(z_0\in \Sigma _k^N\).
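Since \( \sum _{j}|\langle f, d_j\rangle |^2=\Vert D^*f\Vert _2^2 \), the optimal frame constants s and t are the extreme squared singular values of D . The sketch below (real case; sizes are illustrative) computes them and also builds a Parseval tight frame, for which \( s=t=1 \).

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 32, 64

# Optimal frame bounds: smallest and largest squared singular values of D.
D = rng.standard_normal((n, N))
svals = np.linalg.svd(D, compute_uv=False)
s, t = svals[-1] ** 2, svals[0] ** 2

# A Parseval tight frame (s = t = 1): transpose of a matrix with orthonormal columns.
Q, _ = np.linalg.qr(rng.standard_normal((N, n)))  # Q: N x n, orthonormal columns
D_tight = Q.T                                     # D_tight @ D_tight.T = I_n
assert np.allclose(D_tight @ D_tight.T, np.eye(n))
```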

The best k -term approximation error is defined as

$$\begin{aligned} \sigma _k(x)_1 := \min _{z\in \Sigma _k^N}\Vert x-z\Vert _1. \end{aligned}$$
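The minimum is attained by keeping the k largest-magnitude entries of x , so \( \sigma _k(x)_1 \) is the \( \ell _1 \)-norm of the remaining entries; a small helper (a sketch, not part of the paper):

```python
import numpy as np

def sigma_k(x, k):
    """Best k-term approximation error in l1: the l1-norm of x with its
    k largest-magnitude entries removed."""
    mags = np.sort(np.abs(x))            # ascending order
    return mags[:-k].sum() if k > 0 else mags.sum()
```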

For positive integers \( p, q \) with \( p\le q \), we use [p : q] to represent the set \( \{p,p+1,\ldots ,q-1,q \} \). Suppose \( T\subseteq [1:m] \) is a subset of [1 : m] . We use \( T^c \) to represent the complement of T in [1 : m] and |T| to denote the cardinality of T . Let \( A_T:=[a_j, j\in T]^* \) denote the submatrix of A in which only the rows with indices in T are kept. Denote by \( \mathcal {N}(A) \) the null space of A .

Definition 2.1

(DRIP) [7] Fix a dictionary \( D\in {\mathbb R}^{n\times N} \) and a matrix \( A\in {\mathbb R}^{m\times n} \). The matrix A satisfies the DRIP with parameters \( \delta \) and k if

$$\begin{aligned} (1-\delta )\Vert Dz\Vert _2^2\le \Vert ADz\Vert _2^2\le (1+\delta )\Vert Dz\Vert _2^2 \end{aligned}$$

holds for all k -sparse vectors \( z\in {\mathbb R}^N \).

The paper [7] has shown that Gaussian random matrices and other random compressed sensing matrices satisfy the DRIP of order k provided the number of measurements m is on the order of \( \mathcal {O}(k\log (n/k)) \).
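Verifying the DRIP exactly is infeasible, since it quantifies over all supports, but it can be probed empirically. The sketch below samples random k -sparse vectors and reports the observed range of \( \Vert ADz\Vert _2^2/\Vert Dz\Vert _2^2 \); this only yields a lower bound on the true DRIP constant.

```python
import numpy as np

def drip_ratio_range(A, D, k, trials=2000, seed=0):
    """Monte Carlo probe of the DRIP: sample random k-sparse z and return the
    observed range of ||ADz||^2 / ||Dz||^2 (compare with 1 - delta, 1 + delta)."""
    rng = np.random.default_rng(seed)
    N = D.shape[1]
    ratios = []
    for _ in range(trials):
        z = np.zeros(N)
        z[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
        Dz = D @ z
        ratios.append(np.linalg.norm(A @ Dz) ** 2 / np.linalg.norm(Dz) ** 2)
    return min(ratios), max(ratios)
```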

Definition 2.2

(SRIP) [19] We say the matrix \(A=[a_1,\ldots ,a_m]^{\top }\in \mathbb {R}^{m\times n}\) has the Strong Restricted Isometry Property of order k and constants \(\theta _-,\ \theta _+\in (0, 2)\) if

$$\begin{aligned} \theta _-\Vert x\Vert _2^2\le \min _{ I\subseteq [1:m], |I|\ge m/2}\Vert A_Ix\Vert _2^2\le \max _{I\subseteq [1:m],|I|\ge m/2} \Vert A_Ix\Vert _2^2\le \theta _+\Vert x\Vert _2^2 \end{aligned}$$

holds for all k -sparse signals \(x\in \mathbb {R}^n\).

This property was first introduced in [19], where Voroninski and Xu also proved that Gaussian random matrices satisfy the SRIP with high probability.

Theorem 2.1

[19] Suppose that \(t>1\) and \( A\in \mathbb {R}^{m\times n} \) is a Gaussian random matrix with \(m=\mathcal {O}(tk\log (n/k))\). Then there exist \(\theta _-\), \(\theta _+\), with \(0<\theta _-<\theta _+<2\), such that A satisfies SRIP of order tk and constants \(\theta _-\), \(\theta _+\), with probability \(1-\exp (-cm/2)\), where \(c>0\) is an absolute constant and \(\theta _-\), \(\theta _+\) are independent of t .

3 The Null Space Property

In this section, for any \( x_0\in D\Sigma _k^N \), we consider the noiseless case of (1.4),

$$\begin{aligned} \min \Vert D^*x\Vert _1\quad \text {subject to}\quad |Ax|=|Ax_0|. \end{aligned}$$
(3.6)

As in the traditional compressed sensing problem, we analyze the null space of the measurement matrix A to explore conditions under which (3.6) yields \( cx_0 \) (\( |c|=1 \)).

3.1 The Real Case

We first restrict the signals and measurements to the field of real numbers. The next theorem provides a necessary and sufficient condition for exact recovery via (3.6).

Theorem 3.1

For a given matrix \( A\in {\mathbb R}^{m\times n} \) and a dictionary \( D\in {\mathbb R}^{n\times N} \), the following two properties are equivalent.

  1. (A)

    For any \(x_0\in D\Sigma _k^N\),

    $$\begin{aligned} argmin _{x\in {\mathbb R}^n}\{\Vert D^*x\Vert _1:|Ax|=|Ax_0|\}=\{\pm x_0\}. \end{aligned}$$
  2. (B)

    For any \(T\subseteq [1:m]\), it holds

    $$\begin{aligned} \Vert D^*(u+v)\Vert _1<\Vert D^*(u-v)\Vert _1 \end{aligned}$$

    for all

    $$\begin{aligned} u\in \mathcal {N}(A_T)\backslash \{0\} ,\quad v\in \mathcal {N}(A_{T^c})\backslash \{0\} \end{aligned}$$

    satisfying

    $$\begin{aligned} u+v\in D\Sigma _k^N. \end{aligned}$$

Proof

(B)\(\Rightarrow \)(A). Assume (A) is false, namely, there exists a solution \( \hat{x}\ne \pm x_0 \) to (3.6). As \( \hat{x} \) is a solution, we have

$$\begin{aligned} |A\hat{x}|=|Ax_0| \end{aligned}$$
(3.7)

and

$$\begin{aligned} \Vert D^*{\hat{x}}\Vert _1\le \Vert D^*x_0\Vert _1. \end{aligned}$$
(3.8)

Denote \(a_j^{\top }, j=1,\ldots ,m\) as the rows of A . Then (3.7) implies that there exists a subset \(T\subseteq [1:m]\) satisfying

$$\begin{aligned} j\in T, \quad \langle a_j, x_0+\hat{x}\rangle =0, \end{aligned}$$
$$\begin{aligned} j\in T^c, \quad \langle a_j, x_0-\hat{x}\rangle =0, \end{aligned}$$

i.e.,

$$\begin{aligned} A_T(x_0+\hat{x})=0, \quad A_{T^c}(x_0-\hat{x})=0. \end{aligned}$$

Define

$$\begin{aligned} u:=x_0+\hat{x},\quad v:=x_0-\hat{x}. \end{aligned}$$

As \( \hat{x}\ne \pm x_0 \), we have \(u\in \mathcal {N}(A_T)\backslash \{0\}\), \(v\in \mathcal {N}(A_{T^c})\backslash \{0\}\) and \(u+v=2x_0\in D\Sigma _k^N\). Then from (B), we know

$$\begin{aligned} \Vert D^*x_0\Vert _1<\Vert D^*\hat{x}\Vert _1, \end{aligned}$$

which contradicts (3.8).

(A)\(\Rightarrow \)(B). Assume (B) is false, which means that there exists a subset \(T\subseteq [1:m]\),

$$\begin{aligned} u\in \mathcal {N}(A_T)\backslash \{0\},\quad v\in \mathcal {N}(A_{T^c})\backslash \{0\}, \end{aligned}$$
(3.9)

such that

$$\begin{aligned} u+v\in D\Sigma _k^N \end{aligned}$$

and

$$\begin{aligned} \Vert D^*(u+v)\Vert _1\ge \Vert D^*(u-v)\Vert _1. \end{aligned}$$
(3.10)

Let \( x_0:=u+v\in D\Sigma _k^N\) be the signal we want to recover and set \(\tilde{x} :=u-v\); since \( u\ne 0 \) and \( v\ne 0 \), we have \( \tilde{x}\ne \pm x_0 \). Then from (3.10) we have

$$\begin{aligned} \Vert D^*\tilde{x}\Vert _1\le \Vert D^*x_0\Vert _1. \end{aligned}$$
(3.11)

Let \(a_j^{\top }, j=1,\ldots ,m\) denote the rows of A . Then from the definition of \( x_0 \) and \( \tilde{x} \), we have

$$\begin{aligned} 2\langle a_j,u \rangle =\langle a_j,x_0+\tilde{x}\rangle ,\\ 2\langle a_j,v\rangle =\langle a_j,x_0-\tilde{x}\rangle . \end{aligned}$$

By (3.9), the subset T satisfies

$$\begin{aligned} j\in T,\quad \langle a_j,x_0\rangle =-\langle a_j,\tilde{x} \rangle \end{aligned}$$

and

$$\begin{aligned} j\in T^c, \quad \langle a_j,x_0\rangle =\langle a_j,\tilde{x} \rangle , \end{aligned}$$

which implies

$$\begin{aligned} |Ax_0|=|A\tilde{x}|. \end{aligned}$$
(3.12)

Putting (3.11) and (3.12) together, we know \( \tilde{x} \) is a solution to model (3.6). However, \(\tilde{x}\ne \pm x_0 \), which contradicts (A). \(\square \)
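The sign-partition construction in the proof is easy to reproduce numerically. The sketch below (illustrative sizes, with null spaces computed via the SVD; it requires \( |T|<n \) and \( |T^c|<n \)) builds \( u\in \mathcal {N}(A_T) \) and \( v\in \mathcal {N}(A_{T^c}) \), sets \( x_0=(u+v)/2 \) and \( \tilde{x}=(u-v)/2 \), and checks that the two signals share the same phaseless measurements.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 8
A = rng.standard_normal((m, n))
T, Tc = np.arange(4), np.arange(4, 8)  # |T| = |T^c| = 4 < n: both null spaces are nontrivial

def null_vector(M, rng):
    """Random nonzero vector in N(M) for a full-rank wide matrix M (via the SVD)."""
    _, _, Vt = np.linalg.svd(M)
    basis = Vt[M.shape[0]:]            # rows spanning the null space
    return basis.T @ rng.standard_normal(basis.shape[0])

u = null_vector(A[T], rng)             # u in N(A_T) \ {0}
v = null_vector(A[Tc], rng)            # v in N(A_{T^c}) \ {0}
x0, x_alt = (u + v) / 2, (u - v) / 2   # A_T(x0 + x_alt) = 0 and A_{T^c}(x0 - x_alt) = 0
assert np.allclose(np.abs(A @ x0), np.abs(A @ x_alt))  # same phaseless measurements
```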

3.2 The Complex Case

We now consider the same problem in the complex case, i.e., the signals and measurements all lie in the complex number field. Throughout, \( \mathbb {S}:=\{c\in {\mathbb C}: |c|=1\} \) denotes the set of unimodular scalars. Let \( \mathcal {S}=\{S_1,\ldots , S_p\} \) be a partition of [1 : m] . The next theorem is a generalization of Theorem 3.1.

Theorem 3.2

For a given matrix \( A\in {\mathbb C}^{m\times n} \) and a dictionary \( D\in {\mathbb C}^{n\times N} \), the following two properties are equivalent.

  1. (A)

    For any given \(x_0\in D\Sigma _k^N\),

    $$\begin{aligned} argmin _{x\in {\mathbb C}^n}\{\Vert D^*x\Vert _1:|Ax|=|Ax_0|\}=\{cx_0, c\in \mathbb {S}\}. \end{aligned}$$

  2. (B)

    Suppose \( \mathcal {S}=\{S_1,\ldots , S_p\}\) is a partition of [1 : m]. For any \(\eta _j\in \mathcal {N}(A_{S_j})\backslash \{0\}\), if

    $$\begin{aligned} \frac{\eta _1-\eta _l}{c_1-c_l}=\frac{\eta _1-\eta _j}{c_1-c_j} \in D\Sigma _k^N\backslash \{0\},\,\,j,l \in [2:p],\,\,j\ne l \end{aligned}$$
    (3.13)

    holds for some pairwise distinct \(c_1,\ldots ,c_p\in \mathbb {S}\), we have

    $$\begin{aligned} \Vert D^*(\eta _j-\eta _l)\Vert _1<\Vert D^*(c_l\eta _j-c_j\eta _l)\Vert _1. \end{aligned}$$

Proof

\((B)\Rightarrow (A)\). Suppose the statement (A) is false. That is to say, there exists a solution \( \hat{x}\notin \{cx_0, c\in \mathbb {S}\}\) to (3.6) which satisfies

$$\begin{aligned} \Vert D^*\hat{x}\Vert _1\le \Vert D^*x_0\Vert _1 \end{aligned}$$
(3.14)

and

$$\begin{aligned} |Ax_0|=|A\hat{x}|. \end{aligned}$$
(3.15)

Denote \(a_j^*, j=1,\ldots ,m\) as the rows of A . From (3.15), for each j there exists \( c_j\in \mathbb {S} \) such that

$$\begin{aligned} \langle a_j,c_jx_0\rangle =\langle a_j,\hat{x}\rangle , \end{aligned}$$

for \( j=1,\ldots , m \). Define an equivalence relation on [1 : m] by \(j\sim l\) when \(c_j=c_l\); it induces a partition \(\mathcal {S}=\{S_1,\ldots ,S_p\}\) of [1 : m]. Writing, with slight abuse of notation, \( c_j \) for the common unimodular scalar on \( S_j \), we have for any \(S_j\)

$$\begin{aligned} A_{S_j}(c_jx_0)=A_{S_j}\hat{x}. \end{aligned}$$

Set \(\eta _j:=c_jx_0-\hat{x}\). Since \( \hat{x}\notin \{cx_0, c\in \mathbb {S}\} \), we have \(\eta _j\in \mathcal {N}(A_{S_j})\backslash \{0\}\) and

$$\begin{aligned} \frac{\eta _1-\eta _l}{c_1-c_l}=\frac{\eta _1-\eta _j}{c_1-c_j}=x_0\in D\Sigma _k^N, \,\,\,\,\text {for all}\,\, j,l \in [2:p],\,\,j\ne l. \end{aligned}$$

According to the condition (B), we can get

$$\begin{aligned} \Vert D^*(\eta _j-\eta _l)\Vert _1<\Vert D^*(c_l\eta _j-c_j\eta _l)\Vert _1, \end{aligned}$$

i.e.,

$$\begin{aligned} \Vert D^*(c_j-c_l)x_0\Vert _1<\Vert D^*(c_j-c_l)\hat{x}\Vert _1. \end{aligned}$$

That is equivalent to

$$\begin{aligned} \Vert D^*x_0\Vert _1<\Vert D^*\hat{x}\Vert _1, \end{aligned}$$

which contradicts (3.14).

\((A)\Rightarrow (B)\). Assume (B) is false, namely, there exists a partition \(\mathcal {S}=\{S_1,\ldots ,S_p\}\) of [1 : m], \(\eta _j\in \mathcal {N}(A_{S_j})\backslash \{0\}\), \( j\in [1:p] \) and some pairwise distinct \( c_1,\ldots ,c_p\in \mathbb {S} \) satisfying (3.13) but

$$\begin{aligned} \Vert D^*(\eta _{j_0}-\eta _{l_0})\Vert _1\ge \Vert D^*(c_{l_0}\eta _{j_0}-c_{j_0}\eta _{l_0})\Vert _1 \end{aligned}$$

holds for some distinct \(j_0, l_0\in [1:p]\). Set

$$\begin{aligned} \tilde{x}:=c_{l_0}\eta _{j_0}-c_{j_0}\eta _{l_0}, \,\, c_{l_0}\ne c_{j_0},\\ x_0:=\eta _{j_0}-\eta _{l_0}\in D\Sigma _k^N. \end{aligned}$$

Then we have

$$\begin{aligned} \tilde{x}\notin \{cx_0, c\in \mathbb {S}\} \end{aligned}$$

and

$$\begin{aligned} \Vert D^*\tilde{x}\Vert _1\le \Vert D^*x_0\Vert _1. \end{aligned}$$
(3.16)

Let \(a_j^*, j=1,\ldots ,m\) denote the rows of A . From \( \eta _j\in \mathcal {N}(A_{S_j})\backslash \{0\} \), we obtain

$$\begin{aligned} \langle a_k,\eta _{j_0}\rangle =0\,\,\, \text {and}\,\, \, \langle a_k,\eta _{l_0}\rangle =0, \,\, k\in S_{l_0}\cup S_{j_0}. \end{aligned}$$

The definition of \( x_0 \) and \( \tilde{x} \) implies

$$\begin{aligned} |\langle a_k,x_0\rangle |=|\langle a_k,\tilde{x}\rangle |, \,\, k\in S_{l_0}\cup S_{j_0}. \end{aligned}$$
(3.17)

For \(k\notin S_{l_0}\cup S_{j_0} \), say \(k\in S_t \) with \( t\ne l_0, j_0\), we have \( \langle a_k, \eta _t\rangle = 0 \). From

$$\begin{aligned} \frac{\eta _1-\eta _l}{c_1-c_l}=\frac{\eta _1-\eta _j}{c_1-c_j}, \end{aligned}$$

we can obtain

$$\begin{aligned} \frac{\eta _j-\eta _l}{c_j-c_l}=\frac{\eta _m-\eta _n}{c_m-c_n}, \end{aligned}$$

where \( j, l, m, n \) are pairwise distinct indices (with a slight abuse of notation). Set

$$\begin{aligned} y_0:= \frac{\eta _{j_0}-\eta _t}{c_{j_0}-c_t}=\frac{\eta _{l_0}-\eta _t}{c_{l_0}-c_t}. \end{aligned}$$

Then we have

$$\begin{aligned} \eta _{j_0}=(c_{j_0}-c_t)y_0+\eta _t,\\ \eta _{l_0}=(c_{l_0}-c_t)y_0+\eta _t. \end{aligned}$$

So \( \tilde{x} \) and \( x_0 \) can be rewritten as

$$\begin{aligned} \tilde{x}=c_{l_0}\eta _{j_0}-c_{j_0}\eta _{l_0}=c_t(c_{j_0}-c_{l_0})y_0+(c_{l_0}-c_{j_0})\eta _t,\\ x_0=\eta _{j_0}-\eta _{l_0}=(c_{j_0}-c_{l_0})y_0. \end{aligned}$$

Then \(\langle a_k,\eta _t\rangle =0\) implies

$$\begin{aligned} |\langle a_k,\tilde{x}\rangle |=|\langle a_k,x_0\rangle |,\,\, k\in S_t. \end{aligned}$$

A similar argument proves the claim for the other subsets \( S_j \). So we have

$$\begin{aligned} |\langle a_k,\tilde{x}\rangle |=|\langle a_k,x_0\rangle |, \,\, \text {for all} \,\, k. \end{aligned}$$
(3.18)

Combining (3.16) and (3.18), we know \(\tilde{x}\) is also a solution to (3.6). However, \( \tilde{x}\notin \{cx_0, c\in \mathbb {S}\} \), which contradicts (A). \(\square \)

Remark 3.1

If we choose \( D=I \), the null space property in Theorems 3.1 and 3.2 is consistent with the null space property introduced in [20].

According to Theorems 3.1 and 3.2, if the measurement matrix satisfies the null space property, we can obtain an exact solution by solving model (3.6). But in general, condition (B) in Theorem 3.1 or 3.2 is difficult to check. So in Sect. 4, we provide another property (S-DRIP) of the measurement matrix which can also guarantee an exact recovery via model (3.6) in the noiseless case. In addition, we prove that this property is satisfied by Gaussian random matrices.

4 S-DRIP and Stable Recovery

In compressed sensing, for any tight frame D , [7] showed that a signal \( x_0\in D\Sigma _k^N \) can be approximately reconstructed by \( \ell _1 \)-analysis (1.3) provided the measurement matrix satisfies the DRIP and the best k -term approximation error of \( D^*x_0 \) is small. In phase retrieval, when \( {\mathbb H}={\mathbb R}\), Gao et al. proved that if the measurement matrix satisfies the SRIP, then \( \ell _1 \)-analysis (1.5) provides a stable solution to the traditional phase retrieval problem [12]. For phase retrieval with a redundant dictionary, we combine the above two results to explore the conditions under which the \( \ell _1 \)-analysis model (1.4) guarantees a stable recovery.

We first impose a natural property on the measurement matrix, which is a combination of DRIP and SRIP.

Definition 4.1

(S-DRIP) Let \( D\in {\mathbb R}^{n\times N} \) be a frame. We say the measurement matrix A obeys the S-DRIP of order k with constants \(\theta _-, \theta _+\in (0,2)\) if

$$\begin{aligned} \theta _-\Vert Dv\Vert _2^2\le \min _{ I\subseteq [1:m], |I|\ge m/2}\Vert A_IDv\Vert _2^2\le \max _{I\subseteq [1:m],|I|\ge m/2} \Vert A_IDv\Vert _2^2\le \theta _+\Vert Dv\Vert _2^2 \end{aligned}$$

holds for all k -sparse signals \(v\in {\mathbb R}^N\).

Thus a matrix \( A\in \mathbb {R}^{m\times n} \) satisfying the S-DRIP means that every \( m'\times n \) row-submatrix of A with \( m'\ge m/2 \) satisfies the DRIP with appropriate parameters.
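Like the DRIP, the S-DRIP cannot be certified by enumeration, but its constants can be probed by sampling both subsets and sparse vectors; a minimal sketch (sampled subsets and supports only, so it merely estimates \( \theta _- \) and \( \theta _+ \)):

```python
import numpy as np

def sdrip_probe(A, D, k, n_subsets=50, n_signals=200, seed=0):
    """Monte Carlo probe of the S-DRIP: for random subsets I with |I| >= m/2
    and random k-sparse v, record ||A_I D v||^2 / ||D v||^2."""
    rng = np.random.default_rng(seed)
    m, N = A.shape[0], D.shape[1]
    lo, hi = np.inf, -np.inf
    for _ in range(n_subsets):
        size = rng.integers((m + 1) // 2, m + 1)      # subset size |I| >= m/2
        I = rng.choice(m, size=size, replace=False)
        for _ in range(n_signals):
            v = np.zeros(N)
            v[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
            Dv = D @ v
            r = np.linalg.norm(A[I] @ Dv) ** 2 / np.linalg.norm(Dv) ** 2
            lo, hi = min(lo, r), max(hi, r)
    return lo, hi                                     # empirical theta_-, theta_+
```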

In fact any matrix \( A\in {\mathbb R}^{m\times n} \) obeying

$$\begin{aligned}&\mathbb {P}\bigg [c_-\Vert Dv\Vert _2^2\le \min _{ I\subseteq [1:m], |I|\ge m/2}\Vert A_IDv\Vert _2^2\le \max _{I\subseteq [1:m],|I|\ge m/2} \Vert A_IDv\Vert _2^2\le c_+\Vert Dv\Vert _2^2\bigg ]\nonumber \\&\quad \ge 1-2e^{-\gamma m} \end{aligned}$$
(4.19)

(where \( 0<c_-<c_+<2 \) and \( \gamma >0 \) is a constant) for each fixed \( Dv\in {\mathbb R}^n \) will satisfy the S-DRIP with high probability. This can be seen by a standard covering argument (see the proof of Theorem 2.1 in [19]). In [19, Lemma 4.4], Voroninski and Xu proved that Gaussian random matrices satisfy (4.19). So we have the following conclusion.

Corollary 4.1

For \( t>1 \), a Gaussian random matrix \( A\in {\mathbb R}^{m\times n} \) with \( m=\mathcal {O}(tk\log (n/k)) \) satisfies the S-DRIP of order tk and constants \(\theta _-, \theta _+\in (0,2)\) with probability \( 1-2e^{-\gamma m} \), where \( \gamma \) is an absolute positive constant and \( \theta _-, \theta _+ \) are independent of t .

For any \( x_0\in D\Sigma _k^N \), we now return to the model

$$\begin{aligned} \min \Vert D^*x\Vert _1\quad \text {subject to}\quad \Vert |Ax|-|Ax_0|\Vert _2^2\le \epsilon ^2, \end{aligned}$$
(4.20)

where \( \epsilon \) is the noise bound. Here the signals and matrices are all restricted to the real number field. The next theorem states under what conditions the solution to (4.20) is stable.

Theorem 4.1

Assume that \( D\in {\mathbb R}^{n\times N} \) is a tight frame, \( x_0\in D\Sigma _k^N \), and the matrix \(A\in \mathbb {R}^{m\times n}\) satisfies the S-DRIP of order tk and constants \( \theta _-, \theta _+ \in (0,2)\), with

$$\begin{aligned} t\ge \max \{\frac{1}{2\theta _--\theta _-^2},\,\frac{1}{2\theta _+-\theta _+^2}\}. \end{aligned}$$

Then the solution \( \hat{x} \) to (4.20) satisfies

$$\begin{aligned} \min \{\Vert \hat{x}-x_0\Vert _2,\Vert \hat{x}+x_0\Vert _2\}\le c_1\epsilon +c_2\frac{2\sigma _k(D^*x_0)_1}{\sqrt{k}}, \end{aligned}$$

where \( c_1=\frac{\sqrt{2(1+\delta )}}{1-\sqrt{t/(t-1)}\delta } \), \( c_2=\frac{\sqrt{2}\delta +\sqrt{t(\sqrt{(t-1)/t}-\delta )\delta }}{t(\sqrt{(t-1)/t}-\delta )}+1.\) Here \( \delta \) is a constant satisfying

$$\begin{aligned} \delta \le \max \{1-\theta _-, \theta _+-1 \}\le \sqrt{\frac{t-1}{t}}. \end{aligned}$$
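For concrete values of \( \theta _\pm \) and t , the constants in Theorem 4.1 are straightforward to evaluate. The helper below is a sketch that takes \( \delta =\max \{1-\theta _-,\theta _+-1\} \), the largest value consistent with (4.25), and checks the requirement \( \delta \le \sqrt{(t-1)/t} \).

```python
import math

def theorem41_constants(theta_minus, theta_plus, t):
    """Evaluate c1, c2 from Theorem 4.1 with delta = max{1 - theta_-, theta_+ - 1}."""
    delta = max(1 - theta_minus, theta_plus - 1)
    root = math.sqrt((t - 1) / t)
    assert t > 1 and delta <= root, "requires delta <= sqrt((t-1)/t)"
    c1 = math.sqrt(2 * (1 + delta)) / (1 - math.sqrt(t / (t - 1)) * delta)
    c2 = (math.sqrt(2) * delta + math.sqrt(t * (root - delta) * delta)) / (t * (root - delta)) + 1
    return c1, c2

# e.g. theta_- = 0.8, theta_+ = 1.2 (so delta = 0.2); t = 3 exceeds 1/(2*0.8 - 0.64)
print(theorem41_constants(0.8, 1.2, t=3.0))   # approx (2.05, 1.48)
```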

We first give a more general lemma, which is the key to proving Theorem 4.1.

Lemma 4.1

Let \( D\in \mathbb {R}^{n\times N} \) be an arbitrary tight frame, \( x_0\in D\Sigma _k^N\) and \( \rho \ge 0\). Suppose that \( A\in \mathbb {R}^{m\times n} \) is a measurement matrix satisfying the DRIP with \( \delta = \delta _{tk}^A\le \sqrt{\frac{t-1}{t}} \) for some \( t>1 \). Then for any

$$\begin{aligned} D^*\hat{x}\in \{D^*x\in \mathbb {R}^N : \Vert D^*x\Vert _1\le \Vert D^*x_0\Vert _1+\rho , \, \Vert Ax-Ax_0\Vert _2\le \epsilon \}, \end{aligned}$$

we have

$$\begin{aligned} \Vert \hat{x}-x_0\Vert _2\le c_1\epsilon +c_2\frac{2\sigma _k(D^*x_0)_1}{\sqrt{k}}+c_2\cdot \frac{\rho }{\sqrt{k}}, \end{aligned}$$

where \( c_1=\frac{\sqrt{2(1+\delta )}}{1-\sqrt{t/(t-1)}\delta } \), \( c_2=\frac{\sqrt{2}\delta +\sqrt{t(\sqrt{(t-1)/t}-\delta )\delta }}{t(\sqrt{(t-1)/t}-\delta )}+1.\)

The proof of this lemma is given in the Appendix.

Remark 4.1

When \( D=I \), which corresponds to the case of standard compressive phase retrieval, Theorem 4.1 and Lemma 4.1 are consistent with Theorem 3.1 and Lemma 2.1 in [12], respectively.

Remark 4.2

The DRIP constant in Lemma 4.1 is better than the DRIP constants given in [2] and [7]. In [7], Candès et al. proved that the \( \ell _1 \)-analysis model (1.3) can guarantee a stable recovery of signals which are k -sparse in the tight frame D provided the measurement matrix satisfies the DRIP with \( \delta _{2k} < 0.08 \). Baker then improved this result by relaxing the DRIP constant to \( \delta _{2k}<\frac{2}{3}\) in [2]. Here we extend Baker’s approach to obtain the better bound \( \delta _{tk}\le \sqrt{\frac{t-1}{t}} \) for \( t>1 \). As [6] shows, in the special case \( D=I \), for any \( t\ge 4/3 \), the condition \( \delta _{tk}\le \sqrt{\frac{t-1}{t}} \) is sharp for stable recovery in the noisy case. It follows that for any tight frame D , the condition \( \delta _{tk}\le \sqrt{\frac{t-1}{t}} \) is also sharp when \( t\ge 4/3 \).

Proof of Theorem 4.1

Since \(\hat{x}\) is a solution to (4.20), we have

$$\begin{aligned} \Vert D^*\hat{x}\Vert _1\le \Vert D^*x_0\Vert _1 \end{aligned}$$
(4.21)

and

$$\begin{aligned} \Vert |A\hat{x}|-|Ax_0|\Vert _2^2\le \epsilon ^2. \end{aligned}$$
(4.22)

Denote \( a_j^{\top }, j\in \{1,\ldots ,m\} \) as the rows of A and divide \(\{1,\ldots ,m\}\) into two groups:

$$\begin{aligned} T=\{j\mid \mathrm{sign}(\langle {a_j,\hat{x}} \rangle )=\mathrm{sign}(\langle {a_j,x_0} \rangle )\}, \end{aligned}$$
$$\begin{aligned} T^c=\{j\mid \mathrm{sign}(\langle {a_j,\hat{x}} \rangle )=-\mathrm{sign}(\langle {a_j,x_0} \rangle )\}. \end{aligned}$$

Then either \(|T|\ge m/2\) or \(|T^c|\ge m/2\). Without loss of generality, we suppose \(|T|\ge m/2\) .

Then (4.22) implies that

$$\begin{aligned} \Vert A_T\hat{x}-A_Tx_0\Vert _2^2\le \Vert A_T\hat{x}-A_Tx_0\Vert _2^2+\Vert A_{T^c}\hat{x}+A_{T^c}x_0\Vert _2^2\le \epsilon ^2. \end{aligned}$$
(4.23)

Combining (4.21) and (4.23), we have

$$\begin{aligned} D^*\hat{x}\in \{D^*x\in {\mathbb R}^N: \Vert D^*x\Vert _1\le \Vert D^*x_0\Vert _1, \Vert A_Tx-A_Tx_0\Vert _2\le \epsilon \}. \end{aligned}$$
(4.24)

Recall that A satisfies S-DRIP of order tk with constants \(\theta _-, \ \theta _+ \in (0,2)\). Here

$$\begin{aligned} t\ge \max \left\{ \frac{1}{2\theta _--\theta _-^2},\frac{1}{2\theta _+-\theta _+^2}\right\} >1. \end{aligned}$$

So \(A_T\) satisfies DRIP of order tk with

$$\begin{aligned} \delta _{tk}^{A_T}\le \max \{1-\theta _-,\ \theta _+-1\}\le \sqrt{\frac{t-1}{t}}. \end{aligned}$$
(4.25)

Combining (4.24), (4.25) and Lemma 4.1, we obtain

$$\begin{aligned} \Vert \hat{x}-x_0\Vert _2\le c_1\epsilon +c_2\frac{2\sigma _k(D^*x_0)_1}{\sqrt{k}}, \end{aligned}$$

where \(c_1\) and \(c_2\) are as defined in Theorem 4.1.

If \(|T^c|\ge \frac{m}{2}\), the same argument yields

$$\begin{aligned} \Vert \hat{x}+x_0\Vert _2\le c_1\epsilon +c_2\frac{2\sigma _k(D^*x_0)_1}{\sqrt{k}}. \end{aligned}$$

This completes the proof. \(\square \)

According to Theorem 4.1, when \( \epsilon =0 \) and \( D^*x_0 \) is k -sparse, the \( \ell _1 \)-analysis model (4.20) provides an exact recovery for phase retrieval with a redundant dictionary (1.2) provided the measurement matrix satisfies the S-DRIP. Moreover, combining Theorem 4.1 with Corollary 4.1, we conclude that the \( \ell _1 \)-analysis model (4.20) provides a stable solution to problem (1.2) from \( \mathcal {O}(k \log (n/k)) \) Gaussian random measurements.

5 Discussion

To solve phase retrieval with a redundant dictionary (1.2), we analyze the \( \ell _1 \)-analysis model and give two conditions on the measurement matrix, each of which guarantees an exact recovery in the noiseless case. Theorems 3.1 and 3.2 give the null space property as a necessary and sufficient condition for exact recovery. For the \( \ell _1 \)-synthesis model, the same analysis yields a corresponding null space property of the measurement matrix; a more detailed treatment of the \( \ell _1 \)-synthesis model is provided in [8]. Theorem 4.1 shows that the \( \ell _1 \)-analysis model is accurate when the measurement matrix satisfies the S-DRIP and \( \Vert D^*x_0\Vert _0\le k \). In theory, the \( \ell _1 \)-analysis model thus performs well for phase retrieval with a redundant dictionary (1.2). However, for phase retrieval the \( \ell _1 \)-analysis model is a non-convex optimization problem, because the feasible set is non-convex. When \( D=I \), algorithms for this model have been studied in [15, 17, 22]. These algorithms all demonstrate empirical success, but their convergence remains a difficult problem. Extending these algorithms to a redundant dictionary D , as sketched below, and giving a convergence analysis is one direction of our future research. Another key drawback of our results is that Theorem 4.1 only holds in the real number field. The difficulty is that the phase varies continuously, and there is no proper definition of the SRIP over the complex field. The extension of this result to the complex field is another direction of our future work.
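As a concrete starting point, one natural heuristic for the real case (an illustrative sketch only; it is not analyzed in this paper and carries no convergence guarantee) alternates between estimating the missing signs and solving a penalized convex \( \ell _1 \)-analysis step, here implemented with the cvxpy package:

```python
import numpy as np
import cvxpy as cp

def alternating_l1_analysis(A, D, b, lam=0.05, iters=20, seed=0):
    """Heuristic for the real case of (1.4): alternate between (i) fixing the
    signs s = sign(A x) and (ii) the penalized convex analysis step
    min ||A x - s*b||_2^2 + lam * ||D^T x||_1. No convergence guarantee."""
    rng = np.random.default_rng(seed)
    x_cur = rng.standard_normal(A.shape[1])   # random initialization
    for _ in range(iters):
        s = np.sign(A @ x_cur)                # current sign (phase) estimate
        x = cp.Variable(A.shape[1])
        cp.Problem(cp.Minimize(cp.sum_squares(A @ x - s * b)
                               + lam * cp.norm(D.T @ x, 1))).solve()
        if x.value is None:
            break
        x_cur = x.value
    return x_cur                              # candidate for x0 up to global sign
```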