Stable Optimizationless Recovery from Phaseless Linear Measurements

Demanet, Laurent; Hand, Paul

doi:10.1007/s00041-013-9305-2

Published: 14 November 2013

Volume 20, pages 199–221, (2014)

Journal of Fourier Analysis and Applications

Abstract

We address the problem of recovering an $n$-vector from $m$ linear measurements lacking sign or phase information. We show that lifting and semidefinite relaxation suffice by themselves for stable recovery in the setting of $m = O(n \log n)$ random sensing vectors, with high probability. The recovery method is optimizationless in the sense that trace minimization in the PhaseLift procedure is unnecessary. That is, PhaseLift reduces to a feasibility problem. The optimizationless perspective allows for a Douglas-Rachford numerical algorithm that is unavailable for PhaseLift. This method exhibits linear convergence with a favorable convergence rate and without any parameter tuning.

1 Introduction

We study the recovery of a vector $\mathbf {x}_{0} \in \mathbb {R}^{n}$ or $\mathbb {C}^{n}$ from the set of phaseless linear measurements

$$|\langle \mathbf {x}_0, \mathbf {z}_i\rangle |\quad \text{for } i = 1, \ldots, m, $$

where $\mathbf {z}_{i} \in \mathbb {R}^{n}$ or $\mathbb {C}^{n}$ are known random sensing vectors. Such amplitude-only measurements arise in a variety of imaging applications, such as X-ray crystallography [5, 15, 17], optics [23], and microscopy [16]. We seek stable and efficient methods for finding $\mathbf{x}_0$ using as few measurements as possible.

This recovery problem is difficult because the set of real or complex numbers with a given magnitude is nonconvex. In the real case, there are $2^m$ possible assignments of sign to the $m$ phaseless measurements. Hence, exhaustive searching is infeasible. In the complex case, the situation is even worse, as there is a continuum of phase assignments to consider. A method based on alternating projections avoids an exhaustive search but does not always converge toward a solution [11, 12, 14].

In [5, 7, 8], the authors convexify the problem by lifting it to the space of $n \times n$ matrices, where $\mathbf{x}\mathbf{x}^*$ is a proxy for the vector $\mathbf{x}$. A key motivation for this lifting is that the nonconvex measurements on vectors become linear measurements on matrices [1]. A rank-1 constraint is then relaxed to a trace minimization over the cone of positive semi-definite matrices, as is now standard in matrix completion [19]. This convex program is called PhaseLift in [7], where it is shown that $\mathbf{x}_0$ can be found robustly in the case of random $\mathbf{z}_i$, if $m = O(n \log n)$. The matrix minimizer is unique, which in turn determines $\mathbf{x}_0$ up to a global phase.

The contribution of the present paper is to show that trace minimization is unnecessary in this lifting framework for the phaseless recovery problem. The vector $\mathbf{x}_0$ can be recovered robustly by an optimizationless convex problem: one of finding a positive semi-definite matrix that is consistent with linear measurements. We prove there is only one such matrix, provided that there are $O(n \log n)$ measurements. In other words, the phase recovery problem can be solved by intersecting two convex sets, without minimizing an objective. We show empirically that two algorithms converge linearly (exponentially fast) toward the solution. We remark that these methods are simpler than methods for PhaseLift because they require little or no parameter tuning. A result subsequent to the posting of this paper has improved the number of required measurements to $O(n)$ by considering an alternative construction of the dual certificate that allows tighter probabilistic bounds [6].

In [2], the authors show that the complex phaseless recovery problem from random measurements is determined if $m \geq 4n - 2$ (with probability one). This means that the $\mathbf{x}$ satisfying $|\langle \mathbf{x}, \mathbf{z}_i \rangle| = |\langle \mathbf{x}_0, \mathbf{z}_i \rangle|$ is unique and equal to $\mathbf{x}_0$, regardless of the method used to find it. A corollary of the analysis in [7], and of the present paper, is that this property is stable under perturbations of the data, provided $m = O(n \log n)$. This determinacy is in contrast to compressed sensing and matrix completion, where a prior (sparsity, low-rank) is used to select a solution of an otherwise underdetermined system of equations. The relaxation of this prior ($\ell_1$ norm, nuclear norm) is then typically shown to determine the same solution. No such prior is needed here; the semi-definite relaxation helps find the solution, not determine it.

The determinacy of the recovery problem over $n \times n$ matrices may be unexpected because there are $n^2$ unknowns and only $O(n \log n)$ measurements. What compensates for the apparent lack of data is the fact that the matrix we seek has rank one and is thus on the edge of the cone of positive semi-definite matrices. Most perturbed matrices that are consistent with the measurements cease to remain positive semi-definite. In other words, the positive semi-definite cone $\mathbf{X} \succeq 0$ is “spiky” around a rank-1 matrix $\mathbf{X}_0$. That is, with high probability, particular random hyperplanes that contain $\mathbf{X}_0$ and have large enough codimension will have no other intersection with the cone.
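The "spiky" geometry can be illustrated with a small NumPy sketch. It is only an illustration under simplifying assumptions, not part of the paper's argument: it takes $\mathbf{X}_0 = \mathbf{e}_1\mathbf{e}_1^{*}$ in the real case and one hand-picked symmetric perturbation that mixes $\mathbf{e}_1$ with an orthogonal direction, and it ignores the measurement constraint entirely. The names `X0`, `H`, and `min_eig` are hypothetical.

```python
import numpy as np

n = 4
e1 = np.zeros(n); e1[0] = 1.0
e2 = np.zeros(n); e2[1] = 1.0
X0 = np.outer(e1, e1)                      # rank-1 PSD matrix e1 e1^T
H = np.outer(e1, e2) + np.outer(e2, e1)    # symmetric direction mixing e1 with e2

def min_eig(t):
    """Smallest eigenvalue of the perturbed matrix X0 + t*H."""
    return np.linalg.eigvalsh(X0 + t * H)[0]

# The smallest eigenvalue equals (1 - sqrt(1 + 4 t^2)) / 2, which is negative
# for every t != 0, so an arbitrarily small step in this direction leaves the cone.
for t in (1e-3, 1e-2, 1e-1):
    print(t, min_eig(t))
```

A perturbation supported only on the block orthogonal to $\mathbf{e}_1$ can stay inside the cone, which is why the analysis treats that block separately.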

The present paper does not advocate for fully abandoning trace minimization in the context of phase retrieval. The structure of the sensing matrices appears to affect the number of measurements required for recovery. Consider measurements of the form $\mathbf {x}_{0}^{*} \varPhi \mathbf {x}_{0}$, for some $\varPhi$. Numerical simulations (not shown) suggest that $O(n^2)$ measurements are needed if $\varPhi$ is a matrix with Gaussian i.i.d. entries. On the other hand, it was shown in [19] that minimization of the nuclear norm constrained by $\operatorname{Tr}(\mathbf {X}\varPhi) = \mathbf {x}_{0}^{*} \varPhi \mathbf {x}_{0}$ recovers $\mathbf {x}_{0} \mathbf {x}_{0}^{*}$ with high probability as soon as $m = O(n \log n)$. Other numerical observations (not shown) suggest that it is the symmetric, positive semi-definite character of $\varPhi$ that allows for optimizationless recovery.

The present paper owes much to [7], as our analysis is very similar to theirs. We also wish to reference the papers [20, 22], where phase recovery is cast as a synchronization problem and solved via a semi-definite relaxation of max-cut type over the complex torus (i.e., the magnitude information is first factored out). The idea of lifting and semi-definite relaxation was introduced very successfully for the max-cut problem in [13]. The paper [20] also introduces a fast and efficient method based on eigenvectors of the graph connection Laplacian for solving the angular synchronization problem. The performance of this latter method was further studied in [3].

1.1 Problem Statement and Main Result

Let $\mathbf{x}_{0} \in \mathbb {R}^{n}$ or $\mathbb {C}^{n}$ be a vector for which we have the $m$ measurements $| \langle \mathbf {x}_{0}, \mathbf {z}_{i} \rangle | = \sqrt{b_{i}}$, for independent sensing vectors $\mathbf{z}_i$ distributed uniformly on the unit sphere. We write the phaseless recovery problem for $\mathbf{x}_0$ as

$$\begin{aligned} \text{Find } \mathbf {x}\text{ such that } A(\mathbf {x}) = \mathbf {b}, \end{aligned}$$

(1)

where $A:\mathbb {R}^{n} \to \mathbb {R}^{m}$ is given by $A(\mathbf{x})_i = |\langle \mathbf{x}, \mathbf{z}_i \rangle|^2$, and $A(\mathbf{x}_0) = \mathbf{b}$.

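Problem (1) can be convexified by lifting it to a matrix recovery problem. Let $\mathcal {A}$ and its adjoint be the linear operators

$$\begin{aligned} \mathcal {A} &: \mathcal{H}^{n\times n} \to \mathbb {R}^{m}, & \mathcal {A}(\mathbf {X})_i &= \mathbf {z}_i^{*} \mathbf {X} \mathbf {z}_i, \\ \mathcal {A}^{*} &: \mathbb {R}^{m} \to \mathcal{H}^{n\times n}, & \mathcal {A}^{*}(\boldsymbol{\lambda}) &= \sum_{i=1}^m \lambda_i \mathbf {z}_i \mathbf {z}_i^{*}, \end{aligned}$$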

where $\mathcal{H}^{n\times n}$ is the space of $n \times n$ Hermitian matrices. Observe that $\mathcal {A}( \mathbf {x}\mathbf {x}^{*}) = A(\mathbf {x}) $ for all vectors $\mathbf{x}$. Letting $\mathbf {X}_{0} = \mathbf {x}_{0} \mathbf {x}_{0}^{*}$, we note that $\mathcal {A}(\mathbf {X}_{0}) = \mathbf{b}$. We emphasize that $\mathcal {A}$ is linear in $\mathbf{X}$ whereas $A$ is nonlinear in $\mathbf{x}$.
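The lifting identity can be checked numerically. The following is a minimal NumPy sketch for the real case, assuming the lifted operator acts as $\mathcal{A}(\mathbf{X})_i = \mathbf{z}_i^{*}\mathbf{X}\mathbf{z}_i$ and its adjoint as $\boldsymbol{\lambda} \mapsto \sum_i \lambda_i \mathbf{z}_i\mathbf{z}_i^{*}$, which is consistent with the identity $\mathcal{A}(\mathbf{x}\mathbf{x}^{*}) = A(\mathbf{x})$ stated above. The names `Z`, `A`, `lifted_A`, and `lifted_A_adjoint` are illustrative, and Gaussian rows are used for convenience rather than unit-sphere vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 12
Z = rng.standard_normal((m, n))            # rows play the role of the sensing vectors z_i

def A(x):
    """Nonlinear phaseless map, A(x)_i = |<x, z_i>|^2."""
    return np.abs(Z @ x) ** 2

def lifted_A(X):
    """Linear map on matrices, assumed to be A(X)_i = z_i^T X z_i."""
    return np.einsum("ij,jk,ik->i", Z, X, Z)

def lifted_A_adjoint(lam):
    """Adjoint map, lam -> sum_i lam_i z_i z_i^T."""
    return Z.T @ (lam[:, None] * Z)

x = rng.standard_normal(n)
X = np.outer(x, x)
lam = rng.standard_normal(m)

print(np.allclose(lifted_A(X), A(x)))      # lifting identity A(x x^T) = A(x)
print(np.allclose(A(x), A(-x)))            # the sign of x is invisible to A
# Adjoint relation <A(X), lam> = <X, A^*(lam)> in the Hilbert-Schmidt inner product.
print(np.isclose(lifted_A(X) @ lam, np.sum(X * lifted_A_adjoint(lam))))
```

The second check is the reason recovery is only possible up to a global sign (or phase, in the complex case).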

The matrix recovery problem we consider is

$$\begin{aligned} \text{Find } \mathbf {X}\succeq 0 \text{ such that } \mathcal {A}(\mathbf {X}) = \mathbf {b}. \end{aligned}$$

(2)

Without the positivity constraint, there would be multiple solutions whenever $m < \frac{(n+1)n}{2}$. We include the constraint in order to allow for recovery in this classically underdetermined regime.

Our main result is that the matrix recovery problem (2) has a unique solution when there are $O(n \log n)$ measurements.

Theorem 1

Let $\mathbf {x}_{0} \in \mathbb {R}^{n}$ or $\mathbb {C}^{n}$ and $\mathbf {X}_{0} = \mathbf {x}_{0} \mathbf {x}_{0}^{*}$. Let $m \geq c n \log n$ for a sufficiently large $c$. With high probability, $\mathbf{X} = \mathbf{X}_0$ is the unique solution to $\mathbf{X} \succeq 0$ and $\mathcal {A}(\mathbf {X}) = \mathbf{b}$. This probability is at least $1 - e^{-\gamma\frac{m}{n}}$, for some $\gamma > 0$.

As a result, the phaseless recovery problem has a unique solution, up to a global phase, with $O(n \log n)$ measurements. In the real-valued case, the problem is determined up to a minus sign.

Corollary 2

Let $\mathbf {x}_{0} \in \mathbb {R}^{n}$ or $\mathbb {C}^{n}$. Let $m \geq c n \log n$ for a sufficiently large $c$. With high probability, $\{e^{i\phi} \mathbf{x}_0\}$ are the only solutions to $A(\mathbf{x}) = \mathbf{b}$. This probability is at least $1 - e^{-\gamma\frac{m}{n}}$, for some $\gamma > 0$.

Theorem 1 suggests ways of recovering $\mathbf{x}_0$. If an $\mathbf {X}\in\{\mathbf {X}\succeq 0\} \cap\{\mathbf {X}\mid \mathcal {A}( \mathbf {X}) = \mathbf {b}\}$ can be found, $\mathbf{x}_0$ is given by the leading eigenvector of $\mathbf{X}$. See Sect. 6 for more details on how to find $\mathbf{X}$.
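The final extraction step can be sketched in NumPy as follows. This is an illustration only: it assumes a feasible matrix is already available (here the exact $\mathbf{X}_0$ is used as a stand-in), and the scaling of the eigenvector by the square root of the leading eigenvalue is an assumption made so that the output has the same norm as $\mathbf{x}_0$. The helper name `vector_from_lifted` is hypothetical.

```python
import numpy as np

def vector_from_lifted(X):
    """Return sqrt(lambda_max) * v_max for a Hermitian PSD matrix X."""
    evals, evecs = np.linalg.eigh(X)       # eigenvalues are returned in ascending order
    lam = max(evals[-1], 0.0)              # guard against tiny negative round-off
    return np.sqrt(lam) * evecs[:, -1]

rng = np.random.default_rng(1)
x0 = rng.standard_normal(6) + 1j * rng.standard_normal(6)
X0 = np.outer(x0, x0.conj())               # X0 = x0 x0^*
x_hat = vector_from_lifted(X0)

# x0 is only determined up to a factor e^{i phi}, so align the phase before comparing.
inner = np.vdot(x_hat, x0)                 # x_hat^* x0
phase = inner / abs(inner)
print(np.linalg.norm(x_hat * phase - x0))
```

When the recovered matrix is not exactly rank one, the same routine returns the best rank-one approximation in the leading eigen-direction.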

1.2 Stability Result

In practical applications, measurements are contaminated by noise. To show stability of optimizationless recovery, we consider the model

$$A(\mathbf {x}) + \boldsymbol{\nu}= \mathbf {b}, $$

where $\boldsymbol{\nu}$ corresponds to a noise term with bounded $\ell_2$ norm, $\|\boldsymbol{\nu}\|_2 \leq \varepsilon$. The corresponding noisy variant of (1) is

$$\begin{aligned} \text{Find } \mathbf {x}\text{ such that } \|A(\mathbf {x}) - \mathbf {b}\|_2 \leq \varepsilon \| \mathbf {x}_0\|_2^2. \end{aligned}$$

(3)

We note that all three terms in (3) scale quadratically in $\mathbf{x}$ or $\mathbf{x}_0$.

Problem (3) can be convexified by lifting it to the space of matrices. The noisy matrix recovery problem is

$$\begin{aligned} \text{Find } \mathbf {X}\succeq 0 \text{ such that } \| \mathcal {A}(\mathbf {X}) - \mathbf {b}\|_2 \leq \varepsilon \|\mathbf {X}_0\|_2. \end{aligned}$$

(4)

We show that all feasible $\mathbf{X}$ are within an $O(\varepsilon)$ ball of $\mathbf{X}_0$ provided there are $O(n \log n)$ measurements.

Theorem 3

Let $\mathbf {x}_{0} \in \mathbb {R}^{n}$ or $\mathbb {C}^{n}$ and $\mathbf {X}_{0} = \mathbf {x}_{0} \mathbf {x}_{0}^{*}$. Let $m \geq c n \log n$ for a sufficiently large $c$. With high probability,

$$\mathbf {X}\succeq 0 \quad\textit{and}\quad \| \mathcal {A}(\mathbf {X}) - \mathbf {b}\|_2 \leq \varepsilon \|\mathbf {X}_0\|_2 \quad\Longrightarrow\quad \| \mathbf {X}- \mathbf {X}_0\|_2 \leq C \varepsilon \|\mathbf {X}_0\|_2, $$

for some $C > 0$. This probability is at least $1 - e^{-\gamma\frac {m}{n}}$, for some $\gamma > 0$.

As a result, the phaseless recovery problem is stable with $O(n \log n)$ measurements.

Corollary 4

Let $\mathbf {x}_{0} \in \mathbb {R}^{n}$ or $\mathbb {C}^{n}$. Let $m \geq c n \log n$ for a sufficiently large $c$. With high probability,

$$\| A(\mathbf {x}) - \mathbf {b}\|_2 \leq \varepsilon \|\mathbf {x}_0\|_2^2 \quad\Longrightarrow\quad \| \mathbf {x}- e^{i \phi} \mathbf {x}_0 \|_2 \leq C \varepsilon \|\mathbf {x}_0 \|_2, $$

for some $\phi \in [0, 2\pi)$, and for some $C > 0$. This probability is at least $1 - e^{-\gamma\frac{m}{n}}$, for some $\gamma > 0$.

Theorem 3 ensures that numerical methods can be used to find $\mathbf{X}$. See Sect. 6 for ways of finding $\mathbf {X}\in\{\mathbf {X}\succeq 0\} \cap\{\mathcal {A}(\mathbf {X}) \approx \mathbf {b}\}$. As the recovered matrix may have large rank, we approximate $\mathbf{x}_0$ with the leading eigenvector of $\mathbf{X}$.

1.3 Organization of this Paper

In Sect. 2, we prove a lemma containing the central argument for the proof of Theorem 1. Its assumptions involve $\ell_1$-isometry properties and the existence of an inexact dual certificate. Section 2.3 provides the proof of Theorem 1 in the real-valued case. It cites [7] for the $\ell_1$-isometry properties and Sect. 3 for existence of an inexact dual certificate. In Sect. 3 we construct an inexact dual certificate and show that it satisfies the required properties in the real-valued case. In Sect. 4 we prove Theorem 3 on stability in the real-valued case. In Sect. 5, we discuss the modifications in the complex-valued case. In Sect. 6, we present computational methods for the optimizationless problem with comparisons to PhaseLift. We also simulate them to establish stability empirically.

1.4 Notation

We use boldface for variables representing vectors or matrices. We use normal typeface for scalar quantities. Let $z_{i,k}$ denote the $k$th entry of the vector $\mathbf{z}_i$. For two matrices, let $\langle \mathbf{X}, \mathbf{Y} \rangle = \operatorname{Tr}(\mathbf{Y}^{*} \mathbf{X})$ be the Hilbert-Schmidt inner product. Let $\sigma_i$ be the singular values of the matrix $\mathbf{X}$. We define the norms

$$\|\mathbf {X}\|_p = \left(\sum_i \sigma_i^p\right)^{1/p}. $$

In particular, we write the Frobenius norm of $\mathbf{X}$ as $\|\mathbf{X}\|_2$. We write the spectral norm of $\mathbf{X}$ as $\|\mathbf{X}\|$.

An $n$-vector $\mathbf{x}$ generates a decomposition of $\mathbb {R}^{n}$ or $\mathbb {C}^{n}$ into two subspaces. These subspaces are the span of $\mathbf{x}$ and the span of all vectors orthogonal to $\mathbf{x}$. Abusing notation, we write these subspaces as $\mathbf{x}$ and $\mathbf{x}^{\perp}$. The space of $n$-by-$n$ matrices is correspondingly partitioned into the four subspaces $\mathbf{x} \otimes \mathbf{x}$, $\mathbf{x} \otimes \mathbf{x}^{\perp}$, $\mathbf{x}^{\perp} \otimes \mathbf{x}$, and $\mathbf{x}^{\perp} \otimes \mathbf{x}^{\perp}$, where $\otimes$ denotes the outer product. We write $T_{\mathbf{x}}$ for the set of symmetric matrices which lie in the direct sum of the first three subspaces, namely $T_{\mathbf {x}}= \{ \mathbf {x}\mathbf {y}^{*} + \mathbf {y}\mathbf {x}^{*} \mid \mathbf {y}\in \mathbb {R}^{n} \text { or } \mathbb {C}^{n}\}$. Correspondingly, we write $T^{\perp}_{\mathbf {x}}$ for the set of symmetric matrices in the fourth subspace. We note that $T^{\perp}_{\mathbf {x}}$ is the orthogonal complement of $T_{\mathbf{x}}$ with respect to the Hilbert-Schmidt inner product. Let $\mathbf{e}_1$ be the first coordinate vector. For short, let $T = T_{\mathbf {e}_{1}}$ and $T^{\bot}= T^{\bot}_{\mathbf {e}_{1}}$. We denote the projection of $\mathbf{X}$ onto $T$ as either $\mathcal{P}_{T} \mathbf {X}$ or $\mathbf{X}_T$. We denote projections onto $T^{\perp}$ similarly.

We let $\mathbf{I}$ be the $n \times n$ identity matrix. We denote the range of $\mathcal {A}^{*}$ by $\mathcal{R}( \mathcal {A}^{*})$.
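For $T = T_{\mathbf{e}_1}$ the two projections have a simple coordinate description: $\mathbf{X}_T$ keeps the first row and first column of a symmetric matrix, and $\mathbf{X}_{T^{\perp}}$ keeps the remaining $(n-1) \times (n-1)$ block. A small NumPy sketch of this reading of the notation, for the real case, with hypothetical helper names `proj_T` and `proj_T_perp`:

```python
import numpy as np

def proj_T(X):
    """Projection onto T = T_{e1}: keep the first row and first column."""
    XT = np.zeros_like(X)
    XT[0, :] = X[0, :]
    XT[:, 0] = X[:, 0]
    return XT

def proj_T_perp(X):
    """Projection onto T^perp: keep the block with both indices >= 1."""
    XP = np.zeros_like(X)
    XP[1:, 1:] = X[1:, 1:]
    return XP

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
X = B + B.T                                 # a symmetric test matrix
XT, XP = proj_T(X), proj_T_perp(X)

print(np.allclose(XT + XP, X))              # the two pieces recompose X
print(np.isclose(np.sum(XT * XP), 0.0))     # orthogonal in the Hilbert-Schmidt inner product
print(np.linalg.matrix_rank(XT) <= 2)       # elements of T have rank at most 2
```

The rank bound in the last line is the fact used later to compare the nuclear and Frobenius norms on $T$.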

2 Proof of Main Result

Because of scaling and the property that the measurement vectors $\mathbf{z}_i$ come from a rotationally invariant distribution, we take $\mathbf{x}_0 = \mathbf{e}_1$ without loss of generality. Because all measurements scale with the length $\|\mathbf{z}_i\|_2$, it is equivalent to establish the result for independent unit normal sensing vectors $\mathbf{z}_i$. To prove Theorem 1, we use an argument based on inexact dual certificates and $\ell_1$-isometry properties of $\mathcal {A}$. This argument parallels that of [7]. We directly use the $\ell_1$-isometry properties they establish, but we require different properties on the inexact dual certificate.

2.1 About Dual Certificates

As motivation for the introduction of an inexact dual certificate in the next section, observe that if $\mathcal {A}$ is injective on $T$, and if there exists an (exact) dual certificate $\mathbf {Y}\in\mathcal{R}(\mathcal {A}^{*})$ such that

$$\mathbf {Y}_{T}= 0 \quad\text{ and } \quad \mathbf {Y}_{T^\perp }\succ 0, $$

then $\mathbf{X}_0$ is the only solution to $\mathcal {A}(\mathbf {X}) = \mathbf {b}$ with $\mathbf{X} \succeq 0$. This is because

$$0 = \langle \mathbf {X}- \mathbf {X}_0, \mathbf {Y}\rangle = \langle \mathbf {X}_{T^\perp }, \mathbf {Y}_{T^\perp }\rangle \quad\Rightarrow\quad \mathbf {X}_{T^\perp }= 0 \quad\Rightarrow\quad \mathbf {X}= \mathbf {X}_0, $$

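where the first equality is because $\mathbf {Y}\in\mathcal{R}(\mathcal {A}^{*})$ and $\mathcal {A}(\mathbf {X}) = \mathcal {A}(\mathbf {X}_{0})$, and the second is because $\mathbf{Y}_T = 0$ and $\mathbf{X}_0 \in T$. The middle implication holds because $\mathbf{X}_{T^\perp} \succeq 0$ whenever $\mathbf{X} \succeq 0$, while $\mathbf{Y}_{T^\perp} \succ 0$. The last implication follows from injectivity on $T$.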

Conceptually, $\mathbf{Y}$ arises as a Lagrange multiplier, dual to the constraint $\mathbf{X} \succeq 0$ in the feasibility problem

$$\min\; 0 \quad\text{such that} \quad \mathcal {A}(\mathbf {X}) = \mathbf {b}, \quad \mathbf {X}\succeq0. $$

Dual feasibility requires $\mathbf{Y} \succeq 0$. As visualized in Fig. 1a, $\mathbf{Y}$ acts as a vector normal to a codimension-1 hyperplane that separates the lower-dimensional space of solutions $\{\mathcal {A}(\mathbf {X}) = \mathbf{b}\}$ from the positive matrices not in $T$. The condition $\mathbf {Y}_{T^{\perp}} \succ 0$ is further needed to ensure that this hyperplane only intersects the cone along $T$, ensuring uniqueness of the solution.

The nullspace condition $\mathbf{Y}_T = 0$ is what makes the certificate exact. As $\mathbf {Y}\in\mathcal{R}(\mathcal {A}^{*})$, $\mathbf{Y}$ must be of the form $\sum_{i} \lambda _{i} \mathbf {z}_{i} \mathbf {z}_{i}^{*}$. The strict requirement that $\mathbf{Y}_T = 0$ would force the $\lambda_i$ to be complicated (at best algebraic) functions of all the $\mathbf{z}_j$, $j = 1, \ldots, m$. We follow [7] in constructing instead an inexact dual certificate, such that $\mathbf{Y}_T$ is close to but not equal to 0, and for which the $\lambda_i$ are more tractable (quadratic) polynomials in the $\mathbf{z}_i$. A careful inspection of the injectivity properties of $\mathcal {A}$, in the form of the RIP-like condition in [7], is what allows the relaxation of the nullspace condition on $\mathbf{Y}$.

2.2 Central Lemma on Inexact Dual Certificates

With further information about feasible $\mathbf{X}$, we can relax the property that $\mathbf{Y}_T$ is exactly zero. In [7], the authors show that all feasible $\mathbf{X}$ lie in a cone that is approximately $\{ \|\mathbf {X}_{T^{\perp}}\|_{1} \geq\|\mathbf {X}_{T}- \mathbf {X}_{0}\|\}$, provided there are $O(n)$ measurements. As visualized in Fig. 1b, $\bar {\mathbf {Y}}$ acts as a vector normal to a hyperplane that separates $\mathbf{X}_0$ from the rest of this cone. The proof of Theorem 1 hinges on the existence of such an inexact dual certificate, along with $\ell_1$-isometry properties that establish $\mathbf{X}$ is in this cone with high probability.

Lemma 1

Suppose that $\mathcal {A}$ satisfies

$$\begin{aligned} m^{-1} \| \mathcal {A}(\mathbf {X}) \|_1 \leq(1+ \delta) \|\mathbf {X}\|_1 &\quad\textit{ for all } \mathbf {X}\succeq 0 , \end{aligned}$$

(5)

$$\begin{aligned} m^{-1} \| \mathcal {A}(\mathbf {X}) \|_1 \geq0.94(1 - \delta) \|\mathbf {X}\| &\quad\textit{ for all } \mathbf {X}\in T, \end{aligned}$$

(6)

for some $\delta \leq 1/9$. Suppose that there exists $\bar {\mathbf {Y}}\in \mathcal{R} (\mathcal {A}^{*})$ satisfying

$$\begin{aligned} \| \bar {\mathbf {Y}}_{T}\|_1 \leq1/2 \quad\textit{and}\quad \bar {\mathbf {Y}}_{T^\perp }\succeq \mathbf {I}_{T^\perp }. \end{aligned}$$

(7)

Then, $\mathbf{X}_0$ is the unique solution to (2).

Proof of Lemma 1

Let $\mathbf{X}$ solve (2), and let $\mathbf{H} = \mathbf{X} - \mathbf{X}_0$. We start by showing, as in [7], that the $\ell_1$-isometry conditions (5)–(6) guarantee solutions lie on the cone

$$\begin{aligned} \| \mathbf {H}_{T^\perp }\|_1 \geq\frac{0.94 (1-\delta)}{1+\delta} \| \mathbf {H}_{T}\|. \end{aligned}$$

(8)

This is because

$$0.94 (1-\delta) \| \mathbf {H}_{T}\| \leq m^{-1} \| \mathcal {A}(\mathbf {H}_{T}) \|_1 = m^{-1} \| \mathcal {A}(\mathbf {H}_{T^\perp }) \|_1 \leq(1+\delta) \| \mathbf {H}_{T^\perp }\|_1, $$

where the equality comes from $0 = \mathcal {A}(\mathbf {H}) = \mathcal {A}(\mathbf {H}_{T}) + \mathcal {A}(\mathbf {H}_{T^{\perp}})$, and the two inequalities come from the $\ell_1$-isometry properties (5)–(6) and the fact that $\mathbf {H}_{T^{\perp}} \succeq 0$.

Because $\mathcal {A}(\mathbf {H}) = 0$ and $\bar {\mathbf {Y}}\in\mathcal{R}(\mathcal {A}^{*})$,

$$\begin{aligned} 0 & = \langle \mathbf {H}, \bar {\mathbf {Y}}\rangle \\ &= \langle \mathbf {H}_{T}, \bar {\mathbf {Y}}_{T}\rangle + \langle \mathbf {H}_{T^\perp }, \bar {\mathbf {Y}}_{T^\perp }\rangle \\ &\geq\|\mathbf {H}_{T^\perp }\|_1 - \frac{1}{2}\|\mathbf {H}_{T}\| \end{aligned}$$

(9)

$$\begin{aligned} &\geq\left( \frac{0.94 (1-\delta)}{1+\delta} - \frac{1}{2} \right) \| \mathbf {H}_{T}\|, \end{aligned}$$

(10)

where (9) and (10) follow from (7) and (8), respectively. Because the constant in (10) is positive, we conclude $\mathbf{H}_T = 0$. Then, (9) establishes $\mathbf {H}_{T^{\perp}}= 0$. □
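The two inequalities that close the proof can be unpacked as follows; this is a worked restatement using only the hypotheses of Lemma 1 and $0 \leq \delta \leq 1/9$. The first line uses the duality between the spectral and nuclear norms, the second uses $\bar{\mathbf{Y}}_{T^\perp} \succeq \mathbf{I}_{T^\perp}$ together with $\operatorname{Tr}(\mathbf{H}_{T^\perp}) = \|\mathbf{H}_{T^\perp}\|_1$ for $\mathbf{H}_{T^\perp} \succeq 0$, and the third evaluates the constant at the largest admissible $\delta$, where it is smallest:

$$\begin{aligned} \langle \mathbf {H}_{T}, \bar {\mathbf {Y}}_{T}\rangle &\geq - \|\mathbf {H}_{T}\| \, \|\bar {\mathbf {Y}}_{T}\|_1 \geq -\frac{1}{2} \|\mathbf {H}_{T}\|, \\ \langle \mathbf {H}_{T^\perp }, \bar {\mathbf {Y}}_{T^\perp }\rangle &\geq \langle \mathbf {H}_{T^\perp }, \mathbf {I}_{T^\perp }\rangle = \|\mathbf {H}_{T^\perp }\|_1, \\ \frac{0.94 (1-\delta)}{1+\delta} - \frac{1}{2} &\geq \frac{0.94 \cdot (8/9)}{10/9} - \frac{1}{2} = 0.752 - 0.5 = 0.252 > 0. \end{aligned}$$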

2.3 Proof of Theorem 1 and Corollary 2

We use Lemma 1 to prove Theorem 1 for real-valued signals.

Proof of Theorem 1

We need to show that (5)–(7) hold with high probability if $m > c n \log n$ for some $c$. Lemmas 3.1 and 3.2 in [7] show that (5) and (6) both hold with probability at least $1 - 3 e^{-\gamma_{1} m}$ provided $m > c_1 n$ for some $c_1$. In Sect. 3, we construct $\bar {\mathbf {Y}}\in\mathcal{R}({\mathcal {A}^{*}})$. As per Lemma 2, $\| \bar {\mathbf {Y}}_{T}\|_{1} \leq1/2$ with probability at least $1 - e^{-\gamma_{2} m / n}$ if $m > c_2 n$. As per Lemma 3, $\| \bar {\mathbf {Y}}_{T^{\perp}}- 2 \mathbf {I}_{T^{\perp}}\| \leq1$ with probability at least $1 - 2 e^{-\gamma_{2} m / \log n}$ if $m > c_3 n \log n$. Hence, $\bar {\mathbf {Y}}_{T^{\perp}} \succeq \mathbf {I}_{T^{\perp}}$ with at least the same probability. Hence, all of the conditions of Lemma 1 hold with probability at least $1 - e^{-\gamma m / n}$ if $m > c n \log n$ for some $c$ and $\gamma$. □

The proof of Corollary 2 is immediate because, with high probability, Theorem 1 implies

$$A(\mathbf {x}_1) = A(\mathbf {x}_0) \quad\Rightarrow\quad \mathbf {x}_1\mathbf {x}_1^* = \mathbf {x}_0\mathbf {x}_0^* \quad\Rightarrow\quad \mathbf {x}_1 = e^{i \phi} \mathbf {x}_0. $$

3 Existence of Inexact Dual Certificate

To use Lemma 1 in the proof of Theorem 1, we need to show that there exists an inexact dual certificate satisfying (7) with high probability. Our inexact dual certificate is different from that in [7], but we use identical tools for its construction and analysis. We also adopt similar notation.

We note that $\mathcal {A}^{*} \mathcal {A}( \mathbf {X}) = \sum_{i} \langle \mathbf {X}, \mathbf {z}_{i} \mathbf {z}_{i}^{*} \rangle \mathbf {z}_{i} \mathbf {z}_{i}^{*}$, which can alternatively be written as

$$\mathcal {A}^* \mathcal {A}= \sum_{i=1}^m \mathbf {z}_i \mathbf {z}_i^* \otimes \mathbf {z}_i \mathbf {z}_i^*. $$

We let $\mathcal {S}= \mathbb {E}[\mathbf {z}_{i} \mathbf {z}_{i}^{*} \otimes \mathbf {z}_{i} \mathbf {z}_{i}^{*}]$. The operator $\mathcal {S}$ is invertible. It and its inverse are given by

$$\begin{aligned} \mathcal {S}(\mathbf {X}) &= 2 \mathbf {X}+ \text {Tr}(\mathbf {X}) \mathbf {I}, \\ \mathcal {S}^{-1}(\mathbf {X}) &= \frac{1}{2} \left( \mathbf {X}- \frac{1}{n+2} \text {Tr}(\mathbf {X}) \mathbf {I}\right). \end{aligned}$$

(11)

We define the inexact dual certificate

$$\begin{aligned} \bar {\mathbf {Y}}= \frac{1}{m} \sum_{i=1}^m 1_{E_i} \mathbf {Y}_i, \end{aligned}$$

(12)

where

$$\begin{aligned} \mathbf {Y}_i &=\left[ \frac{3}{n+2} \|\mathbf {z}_i\|_2^2 - z_{i,1}^2 \right] \mathbf {z}_i \mathbf {z}_i^*, \end{aligned}$$

(13)

$$\begin{aligned} E_i &= \{ |z_{i,1}| \leq\sqrt{2 \beta\log n}\} \cap\{ \| \mathbf {z}_i \|_2 \leq\sqrt{3 n} \}. \end{aligned}$$

(14)

Alternatively, we can write the inexact dual certificate vector as

$$\begin{aligned} \bar {\mathbf {Y}}= \frac{1}{m} \mathcal {A}^* \left( \mathbf {1}_E\circ \mathcal {A}\mathcal {S}^{-1}2 (\mathbf {I}- \mathbf {e}_1 \mathbf {e}_1^*) \right ), \end{aligned}$$

(15)

where $(\mathbf {1}_{E})_{i} = 1_{E_{i}}$ and $\circ$ is the elementwise product of vectors. In our notation, truncated quantities have overbars. We subsequently omit the subscript $i$ in $\mathbf{z}_i$ when it is implied by context.
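The agreement between the sum form (12)–(13) and the operator form (15) can be checked numerically. The sketch below is a NumPy illustration for the real case; the sizes `n`, `m` and the value of `beta` are arbitrary choices for the demonstration, and the lifted operator is assumed to act as $\mathcal{A}(\mathbf{X})_i = \mathbf{z}_i^{*}\mathbf{X}\mathbf{z}_i$. The key point is that $\mathcal{S}^{-1}\, 2(\mathbf{I} - \mathbf{e}_1\mathbf{e}_1^{*}) = \frac{3}{n+2}\mathbf{I} - \mathbf{e}_1\mathbf{e}_1^{*}$, whose quadratic form in $\mathbf{z}_i$ is the bracket in (13).

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, beta = 8, 50, 2.0                     # illustrative sizes; beta is the truncation parameter
Z = rng.standard_normal((m, n))            # rows z_i ~ N(0, I), real case

# Indicator of the truncation event E_i from (14).
in_E = ((np.abs(Z[:, 0]) <= np.sqrt(2 * beta * np.log(n)))
        & (np.linalg.norm(Z, axis=1) <= np.sqrt(3 * n)))

# Sum form: average of the truncated Y_i = xi_i z_i z_i^T.
xi = 3.0 / (n + 2) * np.sum(Z**2, axis=1) - Z[:, 0] ** 2
Y_sum = (Z.T * (in_E * xi)) @ Z / m

# Operator form: (1/m) A^*( 1_E o A( S^{-1}[ 2 (I - e1 e1^T) ] ) ).
def S_inv(X):
    """Inverse of S(X) = 2 X + Tr(X) I, as given in (11)."""
    return 0.5 * (X - np.trace(X) / (n + 2) * np.eye(n))

e1 = np.zeros(n); e1[0] = 1.0
M = S_inv(2 * (np.eye(n) - np.outer(e1, e1)))
AM = np.einsum("ij,jk,ik->i", Z, M, Z)      # A(M)_i = z_i^T M z_i
Y_op = (Z.T * (in_E * AM)) @ Z / m

print(np.allclose(Y_sum, Y_op))             # the two constructions coincide
```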

3.1 Motivation for the Dual Certificate

For ease of understanding, we first consider a candidate dual certificate given by

$$\begin{aligned} \widetilde{\mathbf {Y}} = \frac{1}{m} \mathcal {A}^* \mathcal {A}\mathcal {S}^{-1}2 (\mathbf {I}- \mathbf {e}_1 \mathbf {e}_1^*). \end{aligned}$$

The motivation for this candidate is twofold: $\widetilde{\mathbf {Y}} \in \mathcal{R}(\mathcal {A}^{*})$, and $\widetilde{\mathbf {Y}} \approx2(\mathbf {I}- \mathbf {e}_{1} \mathbf {e}_{1}^{*})$ as $m \to \infty$ because $\mathbb {E}[ \mathcal {A}^{*} \mathcal {A}] = m \mathcal{S}$. In this limit, $\widetilde{\mathbf {Y}}$ becomes an exact dual certificate. For finite $m$, it should be close but inexact. We can write

$$\widetilde{\mathbf {Y}} = \frac{1}{m} \sum_i \mathbf {Y}_i, $$

where $\mathbf{Y}_i$ is an independent sample of the random matrix

$$\begin{aligned} \left[ \frac{3}{n+2} \|\mathbf {z}\|_2^2 - z_1^2 \right] \mathbf {z}\mathbf {z}^*, \end{aligned}$$

where $\mathbf {z}\sim\mathcal{N}(0, \mathbf {I})$. Because the vector Bernstein inequality requires bounded vectors, we truncate the dual certificate in the same manner as [7]. That is, we consider $1_{E_{i}} \mathbf {Y}_{i}$, completing the derivation of (12).

3.2 Bounds on $\bar {\mathbf {Y}}$

We define $\pi(\beta) = \mathbb {P}(E^{c})$, where E is the event given by

$$\begin{aligned} E &= \{ |z_{1}| \leq\sqrt{2 \beta\log n}\} \cap\{ \| \mathbf {z}\|_2 \leq\sqrt{3 n} \}, \end{aligned}$$

(16)

where $\mathbf {z}\sim\mathcal{N}(0, \mathbf {I})$. In [7], the authors provide the bound $\pi(\beta) \leq \mathbb {P}( |z_{1}| > \sqrt{2 \beta\log n}) + \mathbb {P}(\|\mathbf {z}\|_{2}^{2} > 3n) \leq n^{-\beta} + e^{-n/3}$, which holds if $2 \beta \log n \geq 1$.

We now present two lemmas that establish that $\bar {\mathbf {Y}}$ is approximately $2(\mathbf {I}- \mathbf {e}_{1} \mathbf {e}_{1}^{*})$, and is thus an inexact dual certificate satisfying (7).

Lemma 2

Let $\bar {\mathbf {Y}}$ be given by (12). There exist positive $\gamma$ and $c$ such that for sufficiently large $n$

$$\mathbb {P}\left( \left\| \bar {\mathbf {Y}}_{T}\right\|_1 \geq\frac{1}{2}\right) \leq \exp\left(-\gamma\frac{m}{n} \right) $$

if $m \geq c n$.

Lemma 3

Let $\bar {\mathbf {Y}}$ be given by (12). There exist positive $\gamma$ and $c$ such that for sufficiently large $n$

$$\mathbb {P}\left( \left\| \bar {\mathbf {Y}}_{T^\perp }- 2 \mathbf {I}_{T^\perp }\right\| \geq1 \right) \leq2 \exp\left(-\gamma\frac{m}{\log n} \right) $$

if $m \geq c n \log n$.

3.3 Proof of Lemma 2: $\bar {\mathbf {Y}}$ on T

We prove Lemma 2 in a way that parallels the corresponding proof in [7]. Observe that

$$\| \bar {\mathbf {Y}}_{T}\|_1 \leq\sqrt{2} \| \bar {\mathbf {Y}}_{T}\|_2 \leq2 \| \bar {\mathbf {Y}}_{T}\mathbf {e}_1\|_2, $$

where the first inequality follows because $\bar {\mathbf {Y}}_{T}$ has rank at most 2, and the second inequality follows because $\bar {\mathbf {Y}}_{T}$ can be nonzero only in its first row and column. We can write

$$\bar {\mathbf {Y}}_{T}\mathbf {e}_1 = \frac{1}{m} \sum_{i=1}^m \bar {\mathbf {y}}_i, $$

where $\bar {\mathbf {y}}_{i} = \mathbf {y}_{i} 1_{E_{i}}$, and $\mathbf{y}_i$ are independent samples of

$$\mathbf {y}= \left[ \frac{3}{n+2} \|\mathbf {z}\|_2^2 - z_1^2 \right] z_1 \mathbf {z}=: \xi z_1 \mathbf {z}. $$

To bound the $\ell_2$ norm of $\bar {\mathbf {Y}}_{T}\mathbf {e}_{1}$, we use the vector Bernstein inequality on $\bar {\mathbf {y}}_{i}$.

Theorem 5

(Vector Bernstein inequality)

Let $\mathbf{x}_i$ be a sequence of independent random vectors and set $V \geq\sum_{i} \mathbb {E}\|\mathbf {x}_{i}\|_{2}^{2}$. Then for all $t \leq V / \max \|\mathbf{x}_i\|_2$, we have

$$\mathbb {P}\left(\left\| \sum_i (\mathbf {x}_i - \mathbb {E}\mathbf {x}_i ) \right\|_2 \geq\sqrt {V} + t \right) \leq e^{-t^2/4V}. $$

In order to apply this inequality, we need to compute $\max \| \bar {\mathbf {y}}\|_{2}$, $\mathbb {E}\bar {\mathbf {y}}$, and $\mathbb {E}\| \bar {\mathbf {y}}\|_{2}^{2}$, where $\bar {\mathbf {y}}= \mathbf {y}1_{E}$.

First, we compute $\max\| \bar {\mathbf {y}}\|_{2}$. On the event $E$, $|z_{1}| \leq \sqrt{2 \beta\log n}$ and $\|\mathbf {z}\|_{2} \leq\sqrt{3 n}$. If $n$ is large enough that $2 \beta \log n \geq 9$, then $|\xi| \leq 2 \beta \log n$. Thus,

$$\| \bar {\mathbf {y}}\|_2 \leq\sqrt{24 n} (\beta\log n)^{3/2} $$

for sufficiently large n.

Second, we find an upper bound for $\| \mathbb {E}\bar {\mathbf {y}} \|_2$. Note that $\mathbb {E}y_{1} = 0$ because

$$\begin{aligned} \mathbb {E}[z_1^4] &=3,\\ \mathbb {E}[z_1^2 \|\mathbf {z}\|_2^2] &= n+2. \end{aligned}$$

By symmetry, every entry of $\bar {\mathbf {y}}$ has zero mean except the first. Hence,

$$\| \mathbb {E}\bar {\mathbf {y}}\|_2 = | \mathbb {E}\bar{y}_1 | = | \mathbb {E}(y_1 - y_1 1_{E^c} ) | = | \mathbb {E}y_1 1_{E^c} | \leq\sqrt{\mathbb {P}(E^c) } \sqrt{\mathbb {E}y_1^2} = \sqrt{\pi (\beta)} \sqrt{ \mathbb {E}y_1^2}. $$

Computing,

$$y_1^2 = (\xi z_1^2)^2 = z_1^8 - \frac{6}{n+2} z_1^6 \|\mathbf {z}\|_2^2 + \frac {9}{(n+2)^2} z_1^4 \|\mathbf {z}\|_2^4, $$

we find

$$\mathbb {E}y_1^2 \leq44, $$

where we have used

$$\begin{aligned} \mathbb {E}[ z_1^8] =& 105, \end{aligned}$$

(17)

$$\begin{aligned} \mathbb {E}[z_1^6 \| \mathbf {z}\|_2^2 ] =& 15n + 90, \end{aligned}$$

(18)

$$\begin{aligned} \mathbb {E}[z_1^4 \| \mathbf {z}\|_2^4 ] =& 3n^2 + 30 n + 72. \end{aligned}$$

(19)

Thus,

$$\begin{aligned} \| \mathbb {E}\bar {\mathbf {y}}\|_2 \leq\sqrt{44(n^{-\beta} + e^{-n/3})}. \end{aligned}$$

(20)

Third, we find an upper bound for $\mathbb {E}\| \bar {\mathbf {y}}\|_{2}^{2}$. Because $\| \bar {\mathbf {y}}\|_{2}^{2} \leq\| \mathbf {y}\|_{2}^{2}$, we write out

$$\begin{aligned} \| \mathbf {y}\|_2^2 &= \xi^2 z_1^2 \| \mathbf {z}\|_2^2 = z_1^6 \| \mathbf {z}\|_2^2 - \frac{6}{n+2} z_1^4 \|\mathbf {z}\|_2^4 + \frac {9}{(n+2)^2} z_1^2 \|\mathbf {z}\|_2^6. \end{aligned}$$

Hence,

$$\begin{aligned} \mathbb {E}[ \| \mathbf {y}\|_2^2 ] &= (15 n + 90) - \frac{6}{n+2} (3n^2 + 30 n + 72) + \frac{9}{(n+2)^2} (n+2)(n+4)(n+6) \end{aligned}$$

(21)

$$\begin{aligned} &\leq8n+16, \end{aligned}$$

(22)

where we have used (18), (19), and

$$ \mathbb {E}[z_1^2 \| \mathbf {z}\|_2^6 ] = (n+2)(n+4)(n+6). $$

(23)
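The Gaussian moment identities (17)–(19) and (23), and the bound $\mathbb{E}\|\mathbf{y}\|_2^2 \leq 8n + 16$, can be verified exactly with integer arithmetic. The sketch below uses only the Python standard library and the decomposition $\|\mathbf{z}\|_2^2 = z_1^2 + R$ with $R$ an independent chi-square variable with $n - 1$ degrees of freedom; the function names are illustrative and not from the paper.

```python
from math import comb
from fractions import Fraction

def double_factorial_odd(k):
    """E[g^(2k)] = (2k-1)!! for a standard normal g."""
    out = 1
    for j in range(1, 2 * k, 2):
        out *= j
    return out

def chi2_moment(d, j):
    """E[R^j] for R chi-square with d degrees of freedom: d (d+2) ... (d+2j-2)."""
    out = 1
    for i in range(j):
        out *= d + 2 * i
    return out

def mixed_moment(n, a, b):
    """E[z_1^(2a) ||z||_2^(2b)] for z ~ N(0, I_n), expanding (z_1^2 + R)^b."""
    return sum(comb(b, j) * double_factorial_odd(a + b - j) * chi2_moment(n - 1, j)
               for j in range(b + 1))

def expected_norm_y_sq(n):
    """Exact E||y||_2^2, following the three-term expansion in (21)."""
    c = Fraction(3, n + 2)
    return (mixed_moment(n, 3, 1) - 2 * c * mixed_moment(n, 2, 2)
            + c**2 * mixed_moment(n, 1, 3))

for n in (2, 10, 100):
    print(n, mixed_moment(n, 3, 1), mixed_moment(n, 2, 2), mixed_moment(n, 1, 3),
          expected_norm_y_sq(n) <= 8 * n + 16)
```

Simplifying the three terms by hand gives $\mathbb{E}\|\mathbf{y}\|_2^2 = 6(n+6)(n-1)/(n+2)$, which the code reproduces and which is below $8n + 16$ for every $n \geq 1$.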

Applying the vector Bernstein inequality with $V = m(8n+16)$, we have that for all $t \leq(8n+16) / [ \sqrt{24n}(\beta\log n)^{3/2}]$,

$$\mathbb {P}\left( \frac{1}{m} \left \| \sum_i \bar {\mathbf {y}}_i - \mathbb {E}\bar {\mathbf {y}}_i \right\| _2 \geq\sqrt{\frac{8n+16}{m}} + t \right) \leq\exp\left(- \frac {mt^2}{4(8n+16)} \right). $$

Using the triangle inequality and (20), we get

$$\mathbb {P}\left( \frac{1}{m} \left \| \sum_i \bar {\mathbf {y}}_i \right\|_2 \geq\sqrt {44 (n^{-\beta} + e^{-n/3})} + \sqrt{\frac{8n+16}{m}} + t \right) \leq \exp\left(- \frac{mt^2}{4(8n+16)} \right). $$

Lemma 2 follows by choosing $t$, $\beta$, and $m \geq c n$, where $n$ and $c$ are large enough that

$$\sqrt{44 (n^{-\beta} + e^{-n/3})} + \sqrt{\frac{8n+16}{m}} + t \leq \frac{1}{4}. $$

3.4 Proof of Lemma 3: $\bar {\mathbf {Y}}$ on $T^{\perp}$

We prove Lemma 3 in a way that parallels the corresponding proof in [7]. We write

$$\bar {\mathbf {Y}}_{T^\perp }- 2 \mathbf {I}_{T^\perp }= \frac{1}{m} \sum_i (\mathbf {W}_i 1_{E_i} -2 \mathbf {I}_{T^\perp }1_{E_i^c}), $$

where W _i are independent samples of

$$\begin{aligned} \mathbf {W}= \left[ \frac{3}{n+2} \|\mathbf {z}\|_2^2 - z_1^2 \right] \mathcal {P}_{T^\perp }( \mathbf {z}\mathbf {z}^*) - 2 \mathbf {I}_{T^\perp }. \end{aligned}$$

(24)

We decompose W into the three terms

$$\begin{aligned} \mathbf {W}&\phantom{:}= -\left[z_1^2 -1 \right] \mathcal {P}_{T^\perp }( \mathbf {z}\mathbf {z}^*) + 3 \left[ \frac {1}{n+2} \|\mathbf {z}\|_2^2 - 1 \right] \mathcal {P}_{T^\perp }( \mathbf {z}\mathbf {z}^*) + 2 \left(\mathcal {P}_{T^\perp }( \mathbf {z}\mathbf {z}^*) - \mathbf {I}_{T^\perp }\right) \end{aligned}$$

(25)

$$\begin{aligned} &:= \mathbf {W}^{(0)}+ \mathbf {W}^{(1)}+ \mathbf {W}^{(2)}. \end{aligned}$$

(26)
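As a check that the three terms in (25) recombine to (24), collect the scalar coefficients multiplying $\mathcal {P}_{T^\perp }( \mathbf {z}\mathbf {z}^*)$; the only other contribution is the $-2 \mathbf {I}_{T^\perp }$ from the last term:

$$\begin{aligned} -\left[z_1^2 - 1\right] + 3 \left[ \frac{1}{n+2} \|\mathbf {z}\|_2^2 - 1 \right] + 2 &= \frac{3}{n+2} \|\mathbf {z}\|_2^2 - z_1^2 + (1 - 3 + 2) \\ &= \frac{3}{n+2} \|\mathbf {z}\|_2^2 - z_1^2. \end{aligned}$$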

Letting $\bar {\mathbf {W}}^{(k)}_{i} = \mathbf {W}^{(k)}_{i} 1_{E_{i}}$, it suffices to show that with high probability

$$\begin{aligned} \frac{1}{m}\left\| \sum_i 2\mathbf {I}_{T^\perp }1_{E_i^c} \right\| \leq\frac{1}{4} \quad\text{and}\quad \frac{1}{m}\left\|\sum_i \bar{\mathbf {W}}_i^{(k)} \right\| \leq\frac{1}{4} \quad\text{for } k=0, 1, 2. \end{aligned}$$

(27)

3.4.1 Bound on $\mathbf {I}_{T^{\perp}}1_{E_{i}^{c}}$

We show that $m^{-1} \| \sum_{i} \mathbf {I}_{T^{\perp}}1_{E_{i}^{c}}\| = m^{-1} \sum_{i} 1_{E_{i}^{c}} $ is small with probability at least $1 - 2 e^{-\gamma m}$ for some constant $\gamma > 0$. To do this, we use the scalar Bernstein inequality.

Theorem 6

(Bernstein inequality)

Let {X_i} be a finite sequence of independent random variables. Suppose that there exist V and c₀ such that for all k≥3,

$$\sum_i \mathbb {E}|X_i|^k \leq\frac{1}{2} k! V c_0^{k-2}. $$

Then for all t≥0,

$$\begin{aligned} \mathbb {P}\left( \left| \sum_i X_i - \mathbb {E}X_i \right| \geq t \right) \leq2 \exp\left( - \frac{t^2}{2V + 2 c_0 t} \right). \end{aligned}$$

(28)

Observing that $\mathbb {E}| 1_{E_{i}^{c}}|^{k} = \mathbb {E}1_{E_{i}^{c}} = \pi(\beta)$, we apply the Bernstein inequality with V=π(β)m and c ₀=1/3. Thus,

$$\mathbb {P}\left( \left|\frac{1}{m} \sum_i 1_{E_i^c} - \pi(\beta) \right| \geq t \right) \leq2 \exp\left( - \frac{m t^2}{2 \pi(\beta) + 2 t /3}\right). $$

Using the triangle inequality and taking t and β such that π(β)+t≤1/8 for sufficiently large n, we get

$$\mathbb {P}\left( \left|\frac{1}{m} \sum_i 1_{E_i^c} \right| \geq\frac{1}{8} \right) \leq2 \exp\left( - \gamma m \right) $$

for a γ>0.
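The indicator bound can be sanity-checked numerically, since $\sum_i 1_{E_i^c}$ is a binomial variable whose tail can be summed exactly. The sketch below compares that exact tail with the right-hand side $2\exp(-mt^2/(2\pi(\beta)+2t/3))$. The values of `m`, `p` (standing in for π(β)) and `t` are illustrative assumptions, not values taken from the proof.

```python
import math

def bernstein_bound(m, p, t):
    """Right-hand side 2*exp(-m t^2 / (2p + 2t/3)), i.e. V = p*m and c0 = 1/3."""
    return 2.0 * math.exp(-m * t * t / (2.0 * p + 2.0 * t / 3.0))

def exact_tail(m, p, t):
    """Exact P(|S/m - p| >= t) for S ~ Binomial(m, p), summed term by term."""
    total = 0.0
    for s in range(m + 1):
        if abs(s / m - p) >= t:
            total += math.comb(m, s) * p**s * (1.0 - p) ** (m - s)
    return total

# Hypothetical values: p plays the role of pi(beta) = P(E_i^c).
m, p, t = 400, 0.05, 0.05
print("exact tail:", exact_tail(m, p, t))
print("Bernstein :", bernstein_bound(m, p, t))
```

The exact tail should sit below the Bernstein bound for any admissible choice of the three parameters; the gap between the two gives a feel for how conservative the constant γ is.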

3.4.2 Bound on $\bar {\mathbf {W}}^{(0)}$

We show that $m^{-1} \|\sum_{i} \bar {\mathbf {W}}^{(0)}_{i}\|$ is small with probability at least 1−2exp(−γm/log n). We write this norm as a supremum over all unit vectors perpendicular to e₁:

$$\begin{aligned} \left\|\sum_i \bar {\mathbf {W}}^{(0)}_i\right\| = \sup_{\mathbf {u}\perp \mathbf {e}_1, \|\mathbf {u}\| =1} \left| \sum_i \langle \mathbf {u}, \bar {\mathbf {W}}^{(0)}_i \mathbf {u}\rangle \right|. \end{aligned}$$

(29)

To control the supremum, we follow the same reasoning as in [7]. We bound $\sum_{i} \langle \mathbf {u}, \bar {\mathbf {W}}^{(0)}_{i} \mathbf {u}\rangle $ for fixed u and apply a covering argument over the sphere of u’s. We write

$$\sum_i \langle \mathbf {u}, \bar {\mathbf {W}}^{(0)}_i \mathbf {u}\rangle = \sum_i \eta_i 1_{E_i}, $$

where η _i are independent samples of

$$\eta= -\left[ z_1^2 -1 \right] \langle \mathbf {z}, \mathbf {u}\rangle ^2. $$

To apply the scalar Bernstein inequality, we compute $\mathbb {E}| \eta1_{E} |^{k}$. Because u⊥e ₁, z ₁ and 〈z,u〉 are independent. Hence,

$$\mathbb {E}| \eta1_E|^k \leq \mathbb {E}| (z_1^2 - 1) 1_{E }|^k \mathbb {E}| \langle \mathbf {z}, \mathbf {u}\rangle |^{2k}. $$

Bounding the first factor, we get

$$\begin{aligned} \mathbb {E}|(z_1^2 - 1) 1_E|^k =& \mathbb {E}|(z_1^2 - 1)^{k-2} 1_E (z_1^2 - 1)^{2} | \leq(2 \beta\log n)^{k-2} \mathbb {E}(z_1^2 -1)^2\\ =& 2 (2 \beta\log n)^{k-2}. \end{aligned}$$

Observing that 〈z,u〉² is a chi-squared variable with one degree of freedom, we have

$$\mathbb {E}| \langle \mathbf {z}, \mathbf {u}\rangle |^{2k} = 1 \times3 \times\cdots\times(2k-1) \leq2^k k! $$
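The moment bound can be checked directly for small k. The snippet below is a minimal sketch that computes the double factorial 1×3×⋯×(2k−1) and compares it with $2^k k!$; the function name is illustrative.

```python
import math

def gaussian_even_moment(k):
    """E|g|^(2k) for g ~ N(0, 1), which equals (2k-1)!! = 1*3*...*(2k-1)."""
    out = 1
    for j in range(1, 2 * k, 2):
        out *= j
    return out

# Compare (2k-1)!! with 2^k k! = 2*4*...*(2k) for the first few k.
for k in range(1, 11):
    print(k, gaussian_even_moment(k), 2**k * math.factorial(k))
```

The comparison holds factor by factor, since $2^k k! = 2 \times 4 \times \cdots \times 2k$ and each even factor dominates the odd factor preceding it.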

Applying the scalar Bernstein inequality with V=16m and c ₀=4βlogn, we get

$$\mathbb {P}\left( \frac{1}{m} \left|\sum_i \eta_i 1_{E_i} - \mathbb {E}[ \eta_i 1_{E_i}] \right| \geq t \right) \leq2 \exp\left( - \frac{m t^2}{2(16+4 \beta t \log n)} \right). $$
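One way to verify the constants V and c₀ used here is to combine the two factor bounds above:

$$\begin{aligned} \mathbb {E}| \eta 1_E|^k \leq 2 (2 \beta\log n)^{k-2} \cdot 2^k k! = 8\, k!\, (4 \beta\log n)^{k-2} = \frac{1}{2}\, k! \cdot 16 \cdot (4 \beta\log n)^{k-2}. \end{aligned}$$

Summing over the m independent terms gives the moment condition of Theorem 6 with V=16m and c₀=4β log n. Applying (28) at threshold mt then produces the exponent $-m^2 t^2 / (2 \cdot 16 m + 2 \cdot 4 \beta \log n \cdot m t) = -m t^2 / (2(16 + 4 \beta t \log n))$.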

Because $\mathbb {E}\eta_{i} = 0$, we get

$$| \mathbb {E}\eta_i 1_{E_i} | = | \mathbb {E}\eta_i 1_{E_i^c} | \leq\sqrt{\mathbb {P}(E_i^c)} \sqrt{ \mathbb {E}\eta_i^2} = \sqrt{6 \pi(\beta)}, $$

where we have used $\mathbb {E}(1-z_{1}^{2})^{2} = 2$ and $\mathbb {E}|\langle \mathbf {z}, \mathbf {u}\rangle |^{4} = 3$, so that $\mathbb {E}\eta_{i}^{2} = 6$ by independence. Hence,

$$\mathbb {P}\left( \frac{1}{m} \left|\sum_i \eta_i 1_{E_i} \right| \geq t + \sqrt{6 \pi(\beta)} \right) \leq2 \exp\left( - \frac{m t^2}{2(16+4 \beta t \log n)} \right). $$

Taking t, β, and m≥c₁n with n large enough so that $t + \sqrt{6 \pi(\beta)} \leq1/8$, we have

$$\mathbb {P}\left( \frac{1}{m} \left|\sum_i \eta_i 1_{E_i} \right| \geq1/8 \right) \leq2 \exp\left(- \gamma' \frac{m}{\log n} \right), $$

for some γ′>0. To complete the bound on (29), we use Lemma 4 in [21]:

$$\sup_\mathbf {u}\left| \langle \mathbf {u}, \bar {\mathbf {W}}^{(0)}\mathbf {u}\rangle \right| \leq2 \sup_{\mathbf {u}\in\mathcal{N}_{1/4}} \left| \langle \mathbf {u}, \bar {\mathbf {W}}^{(0)}\mathbf {u}\rangle \right|, $$

where $\mathcal{N}_{1/4}$ is a 1/4-net of the unit sphere of vectors u⊥e ₁. As $| \mathcal{N}_{1/4}| \leq9^{n}$, a union bound gives

$$\mathbb {P}\left( \sup_{\mathbf {u}\in\mathcal{N}_{1/4}} \frac{1}{m} \left|\sum_i \eta_i 1_{E_i} \right| \geq1/8 \right) \leq9^n \cdot2 \exp\left(- \gamma' \frac{m}{\log n} \right). $$

Hence,

$$\mathbb {P}\left( \frac{1}{m} \left\|\sum_i \bar {\mathbf {W}}^{(0)}_i\right\| \geq\frac {1}{4} \right) \leq2 \exp\left(- \gamma m /\log n \right) $$

for some γ>0.
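A short calculation, assuming m≥cn log n for a sufficiently large constant c, indicates how the factor 9^n from the union bound is absorbed into the exponent:

$$9^n \cdot 2 \exp\left(- \gamma' \frac{m}{\log n} \right) = 2 \exp\left( n \log 9 - \gamma' \frac{m}{\log n} \right) \leq 2 \exp\left( - \left( \gamma' - \frac{\log 9}{c} \right) \frac{m}{\log n} \right), $$

since n≤m/(c log n). Any c>(log 9)/γ′ then gives γ=γ′−(log 9)/c>0.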

3.4.3 Bounds on $\bar {\mathbf {W}}^{(1)}$ and $\bar {\mathbf {W}}^{(2)}$

The bound for the $\|\sum_{i} \bar {\mathbf {W}}^{(1)}\|$ term is similar. We write

$$\sum_i \langle \mathbf {u}, \bar {\mathbf {W}}^{(1)}_i \mathbf {u}\rangle = \sum_i \eta_i 1_{E_i}, $$

where η _i are independent samples of

$$\eta= 3 \left[ \frac{\|\mathbf {z}\|_2^2}{n+2} -1 \right] \langle \mathbf {z}, \mathbf {u}\rangle ^2. $$

We can bound $\mathbb {E}| \eta_{i} 1_{E} |^{k} \leq12^{k} k!$ because $\|\mathbf {z}\|_{2}^{2} \leq3n$ on E. Applying the scalar Bernstein inequality with c ₀=12 and V=288m gives

$$\mathbb {P}\left( \frac{1}{m} \left|\sum_i \eta_i 1_{E_i} - \mathbb {E}[ \eta_i 1_{E_i}] \right| \geq t \right) \leq2 \exp\left( - \frac{m t^2}{2(288+ 12 t)} \right). $$

The rest of the bound is similar to that of $\|\sum_{i} \bar {\mathbf {W}}^{(0)}_{i}\|$ above.

Finally, we also bound $\|\sum_{i} \bar {\mathbf {W}}^{(2)}\|$ similarly. We write

$$\sum_i \langle \mathbf {u}, \bar {\mathbf {W}}^{(2)}_i \mathbf {u}\rangle = \sum_i \eta_i 1_{E_i}, $$

where η _i are independent samples of

$$\eta= 2\langle \mathbf {z}, \mathbf {u}\rangle ^2 - 2. $$

Observing that

$$\mathbb {E}|\eta_i 1_E|^k \leq4^k k!, $$

we apply the scalar Bernstein inequality with c ₀=4 and V=32m, giving

$$\mathbb {P}\left( \frac{1}{m} \left|\sum_i \eta_i 1_{E_i} - \mathbb {E}[ \eta_i 1_{E_i}] \right| \geq t \right) \leq2 \exp\left( - \frac{m t^2}{2(32+ 4 t)} \right). $$

The rest of the bound is as above.

4 Stability

We now prove Theorem 3, establishing the stability of the matrix recovery problem (4). We also prove Corollary 4, establishing the stability of the vector recovery problem (3). As in the exact case, the proof of Theorem 3 hinges on the ℓ₁-isometry properties (7)–(8) and the existence of an inexact dual certificate satisfying (9). For stability, we use the additional property that $\mathbf {Y}= \mathcal {A}^{*} \lambda$ for a λ controlled in ℓ₁. It suffices to establish an analogue of Lemma 1 along with a bound on ∥λ∥₁.

Lemma 4

Suppose that $\mathcal {A}$ satisfies (7)–(8) and there exists $\mathbf {Y}= \mathcal {A}^{*} \lambda$ satisfying (9) and ∥λ∥₁≤5. Then,

$$\mathbf {X}\succeq 0 \quad\textit{and}\quad \| \mathcal {A}(\mathbf {X}) - \mathbf {b}\|_2 \leq \varepsilon \|\mathbf {X}_0\| _2 \quad\Longrightarrow\quad \| \mathbf {X}- \mathbf {X}_0\|_2 \leq C \varepsilon \|\mathbf {X}_0\|_2, $$

for some C>0.

Proof of Lemma 4

As before, we take x ₀=e ₁ and $\mathbf {X}_{0} = \mathbf {e}_{1} \mathbf {e}_{1}^{*}$ without loss of generality. Consider any X⪰0 such that $\|\mathcal {A}(\mathbf {X}) - \mathbf {b}\|_{2}\leq \varepsilon $, and let H=X−X ₀. Whereas $\mathcal {A}(\mathbf {H}) = 0$ in the noiseless case, it is now of order ε because

$$\begin{aligned} \|\mathcal {A}(\mathbf {H}) \|_2 \leq\| \mathcal {A}(\mathbf {X}) - \mathbf {b} \|_2 + \| \mathcal {A}(\mathbf {X}_0) - \mathbf {b}\|_2 \leq2 \varepsilon . \end{aligned}$$

(30)

Similarly, |〈H,Y〉| is also of order ε because

$$\begin{aligned} | \langle \mathbf {H}, \mathbf {Y}\rangle | &= | \langle \mathcal {A}(\mathbf {H}), \lambda \rangle | \leq\| \mathcal {A}(\mathbf {H}) \| _\infty\ \|\lambda\|_1 \leq\| \mathcal {A}(\mathbf {H}) \|_2 \ \|\lambda\|_1 \leq10 \varepsilon . \end{aligned}$$

Analogous to the proof of Lemma 1, we use (9) to compute that

$$\begin{aligned} 10 \varepsilon &\geq \langle \mathbf {H}, \mathbf {Y}\rangle \geq\| \mathbf {H}_{T^\perp }\|_1 - \frac{1}{2} \|\mathbf {H}_{T}\|. \end{aligned}$$

(31)

Using the ℓ ₁-isometry properties (7)–(8), we have

$$\begin{aligned} 0.94(1-\delta) \| \mathbf {H}_{T}\| \leq m^{-1} \| \mathcal {A}(\mathbf {H}_{T}) \|_1 &\leq m^{-1} \| \mathcal {A}(\mathbf {H}) \|_1 + m^{-1} \| \mathcal {A}(\mathbf {H}_{T^\perp }) \|_1 \\ &\leq m^{-1/2} \| \mathcal {A}(\mathbf {H}) \|_2 + (1+ \delta) \| \mathbf {H}_{T^\perp }\|_1 \\ &\leq2 \varepsilon m^{-1/2} + (1+\delta) \| \mathbf {H}_{T^\perp }\|_1 . \end{aligned}$$

(32)

Thus (31) becomes

$$\begin{aligned} \left( 10 + \frac{m^{-1/2}}{0.94(1-\delta)} \right) \varepsilon &\geq\left( 1 - \frac{1+\delta}{2 \cdot0.94 ( 1-\delta)} \right) \| \mathbf {H}_{T^\perp }\|_1, \end{aligned}$$

(33)

which, along with (32), implies

$$\begin{aligned} \|\mathbf {H}_{T^\perp }\|_1 &\leq C_0 \varepsilon \quad\text{and}\quad \|\mathbf {H}_{T}\| \leq C_1 \varepsilon \end{aligned}$$

(34)

for some C ₀,C ₁>0. Recalling that H _T has rank at most 2,

$$\|\mathbf {H}\|_2 \leq\|\mathbf {H}_{T}\|_2 + \| \mathbf {H}_{T^\perp }\|_2 \leq\sqrt{2} \|\mathbf {H}_{T}\| + \| \mathbf {H}_{T^\perp }\|_1 \leq(\sqrt{2} C_1 + C_0 ) \varepsilon \leq C \varepsilon . $$

□
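To get a feel for the size of the constant C, the chain of inequalities in the proof can be evaluated numerically. The sketch below reads C₀ off (33), C₁ off (32), and sets C=√2·C₁+C₀. The function name and the sample values δ=0.1 and m=100 are illustrative assumptions, not values fixed by the text.

```python
import math

def stability_constants(delta, m):
    """Constants C0, C1, C from the proof of Lemma 4 for given delta and m."""
    a = 0.94 * (1.0 - delta)                 # lower isometry constant on T
    denom = 1.0 - (1.0 + delta) / (2.0 * a)  # coefficient of ||H_{T^perp}||_1 in (33)
    if denom <= 0.0:
        raise ValueError("delta too large: the coefficient in (33) is not positive")
    c0 = (10.0 + m ** -0.5 / a) / denom              # ||H_{T^perp}||_1 <= c0 * eps
    c1 = (2.0 * m ** -0.5 + (1.0 + delta) * c0) / a  # ||H_T|| <= c1 * eps, from (32)
    return c0, c1, math.sqrt(2.0) * c1 + c0

# Hypothetical sample values for delta and m.
c0, c1, c = stability_constants(0.1, 100)
print(c0, c1, c)
```

The denominator in (33) shrinks as δ grows, so the constants deteriorate quickly when the isometry constants are loose; the function raises an error once the coefficient is no longer positive.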

4.1 Dual Certificate Property

It remains to show ∥λ∥₁≤5 for $\bar {\mathbf {Y}}= \mathcal {A}^{*} \lambda$. From (17), we identify $\lambda = m^{-1} (\mathbf {1}_{E} \circ \mathcal {A}\mathcal {S}^{-1}2(\mathbf {I}- \mathbf {e}_{1} \mathbf {e}_{1}^{*}))$. Computing,

$$\begin{aligned} \|\lambda\|_1 &= m^{-1} \| \mathbf {1}_E\circ \mathcal {A}\mathcal {S}^{-1}2(\mathbf {I}- \mathbf {e}_1 \mathbf {e}_1^*) \|_1 \\ &\leq m^{-1} \| \mathcal {A}\mathcal {S}^{-1}2(\mathbf {I}- \mathbf {e}_1 \mathbf {e}_1^*) \|_1 \\ &\leq m^{-1} \left\|\mathcal {A}\left( \frac{3}{n+2} \mathbf {I}\right) - \mathcal {A}\left( \mathbf {e}_1 \mathbf {e}_1^*\right) \right\|_1 \end{aligned}$$

(35)

$$\begin{aligned} &\leq(1 +\delta) \left(\left\| \frac{3}{n+2} \mathbf {I}\right\|_1 + \left\| \mathbf {e}_1 \mathbf {e}_1^*\right\|_1 \right) \\ &\leq4 (1+\delta), \end{aligned}$$

(36)

where (35) follows from (13), and (36) follows from the triangle inequality and the ℓ₁-isometry property (7). Hence ∥λ∥₁≤5.

4.2 Proof of Corollary 4

Now we prove Corollary 4, showing that stability of the lifted problem (4) implies stability of the unlifted problem (3). As before, we take x₀=e₁ without loss of generality. Hence ∥X₀∥₂=1. Lemma 4 establishes that ∥X−X₀∥≤C₀ε. Recall that $\mathbf {X}_{0} = \mathbf {x}_{0} \mathbf {x}_{0}^{*}$. Decompose $\mathbf {X} = \sum_{j} \lambda_{j} \mathbf {v}_{j} \mathbf {v}_{j}^{*}$ with unit-normalized eigenvectors v_j sorted by decreasing eigenvalue. By Weyl’s perturbation theorem,

$$\begin{aligned} \max\left\{ |1-\lambda_1 |, |\lambda_2|, \ldots, |\lambda_n| \right \} \leq C_0 \varepsilon . \end{aligned}$$

(37)

Writing

$$\begin{aligned} \mathbf {X}_0 - \mathbf {v}_1 \mathbf {v}_1^*= (\mathbf {X}_0 - \mathbf {X}) + \left( (\lambda_1 -1) \mathbf {v}_1 \mathbf {v}_1^*+ \sum_{j=2}^n \lambda_j \mathbf {v}_j \mathbf {v}_j^* \right), \end{aligned}$$

(38)

we use the triangle inequality to form the spectral bound

$$\| \mathbf {X}_0 - \mathbf {v}_1 \mathbf {v}_1^*\| \leq2 C_0 \varepsilon . $$

Noting that

$$1 -| \langle \mathbf {x}_0,\mathbf {v}_1\rangle |^2 = \frac{1}{2}\| \mathbf {X}_0 - \mathbf {v}_1 \mathbf {v}_1^*\|_2^2 \leq\| \mathbf {X}_0 - \mathbf {v}_1 \mathbf {v}_1^*\|^2 \leq4 C_0^2 \varepsilon ^2, $$

we conclude, choosing the sign of $\mathbf {v}_1$ so that $\langle \mathbf {x}_0, \mathbf {v}_1\rangle \geq 0$,

$$\| \mathbf {x}_0 - \mathbf {v}_1\|_2^2 = 2 - 2 \langle \mathbf {x}_0, \mathbf {v}_1\rangle \leq 2 - 2 \langle \mathbf {x}_0, \mathbf {v}_1\rangle ^2 \leq8 C_0^2 \varepsilon ^2. $$
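The passage from the lifted matrix to a vector estimate can be illustrated numerically. The sketch below takes the leading eigenvector of a slightly perturbed X₀ and checks the identity relating 1−⟨x₀,v₁⟩² to the Frobenius distance between the rank-one matrices. The perturbation size and the helper name are illustrative assumptions.

```python
import numpy as np

def top_eigvec_estimate(X, x_ref=None):
    """Leading eigenpair of symmetric X; the eigenvector is sign-aligned with x_ref if given."""
    w, V = np.linalg.eigh(X)      # eigenvalues in ascending order
    v1 = V[:, -1]
    if x_ref is not None and np.dot(x_ref, v1) < 0:
        v1 = -v1                  # resolve the global sign ambiguity
    return w[-1], v1

rng = np.random.default_rng(0)
n = 6
x0 = np.zeros(n)
x0[0] = 1.0                       # x0 = e1, as in the proof
X0 = np.outer(x0, x0)
E = rng.standard_normal((n, n))
E = 1e-3 * (E + E.T) / 2          # small symmetric perturbation (hypothetical)
X = X0 + E

lam1, v1 = top_eigvec_estimate(X, x0)
lhs = 1.0 - np.dot(x0, v1) ** 2
rhs = 0.5 * np.linalg.norm(X0 - np.outer(v1, v1), "fro") ** 2
print(lhs, rhs, np.linalg.norm(x0 - v1) ** 2)
```

The first two printed numbers should agree up to round-off, and the third should be no larger than twice the first, which is the step that requires the sign of v₁ to be fixed.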

5 Complex Case

The proofs of Theorems 1 and 3 in the complex-valued case are analogous to those in the real-valued case. There are a few minor differences, as outlined and proved in [7]. The sensing vectors are assumed to be of the form $\Re \mathbf {z}_{i} \sim\mathcal {N}(0, \mathbf {I})$ and $\Im \mathbf {z}_{i} \sim\mathcal{N}(0, \mathbf {I})$. The ℓ₁-isometry conditions for complex $\mathcal {A}$ have weaker constants. Lemma 1 becomes

Lemma 5

Suppose that $\mathcal {A}$ satisfies

$$\begin{array}{l@{\quad}l} m^{-1} \| \mathcal {A}(\mathbf {X}) \|_1 \leq(1+ \delta) \|\mathbf {X}\|_1 &\textit{ for all } \mathbf {X}\succeq 0, \\ m^{-1} \| \mathcal {A}(\mathbf {X}) \|_1 \geq0.828(1 - \delta) \|\mathbf {X}\| &\textit{ for all } \mathbf {X}\in T, \end{array} $$

for some δ≤3/13. Suppose that there exists $\bar {\mathbf {Y}}\in \mathcal{R} (\mathcal {A}^{*})$ satisfying

$$\begin{aligned} \| \bar {\mathbf {Y}}_{T}\|_1 \leq1/2 \quad\textit{ and } \quad \bar {\mathbf {Y}}_{T^\perp }\succeq \mathbf {I}_{T^\perp }. \end{aligned}$$

Then, X ₀ is the unique solution to (2).

The proof of this lemma is identical to the real-valued case. The conditions of the lemma are satisfied with high probability, as before.

The construction of the inexact dual certificate is slightly different because $\mathcal{S}(\mathbf {X}) = \mathbf {X}+ \text {Tr}(\mathbf {X}) \mathbf {I}$ and $\mathcal {S}^{-1}(\mathbf {X}) = \mathbf {X}- \frac {1}{n+1} \text {Tr}(\mathbf {X}) \mathbf {I}$. As a result

$$\mathbf {Y}_i =\left[ \frac{4}{n+1} \|\mathbf {z}_i\|_2^2 - 2 |z_{i,1}|^2 \right] \mathbf {z}_i \mathbf {z}_i^*. $$

The remaining modifications are identical to those in [7], and we refer interested readers there for details.

6 Numerical Simulations

In this section, we show that the optimizationless perspective allows for additional numerical algorithms that are unavailable for PhaseLift directly. These methods give rise to simpler algorithms with less or no parameter tuning. We demonstrate successful recovery under Douglas-Rachford and Nesterov algorithms, and we empirically show that the convergence of these algorithms is linear.

6.1 Optimization Framework

From the perspective of nonsmooth optimization, PhaseLift and the optimizationless feasibility problem can be viewed as a two-term minimization problem

$$\begin{aligned} \min_\mathbf {X}F(\mathbf {X}) + G(\mathbf {X}). \end{aligned}$$

(39)

See, for example, the introduction to [18]. Numerical methods based on this splitting include Forward-Backward, ISTA, FISTA, and Douglas-Rachford [4, 9, 10, 18]. If F is smooth, it enables a forward step based on a gradient descent. Nonsmooth terms admit backward steps involving proximal operators. We recall that the proximal operator for a function G is given by

$$\begin{aligned} \text{prox}_G(\mathbf {X}) &= \text{argmin}_\mathbf {Y}\frac{1}{2} \|\mathbf {X}- \mathbf {Y}\|^2 + G(\mathbf {Y}), \end{aligned}$$

(40)

and we note that the proximal operator for a convex indicator function is the projector onto the indicated set.
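For the positive semidefinite cone, the projector in the Frobenius norm is obtained by zeroing out negative eigenvalues. A minimal NumPy sketch follows; the function name `project_psd` is illustrative.

```python
import numpy as np

def project_psd(X):
    """Frobenius-norm projection of a symmetric matrix onto the PSD cone."""
    X = (X + X.T) / 2                        # guard against round-off asymmetry
    w, V = np.linalg.eigh(X)                 # X = V diag(w) V^T
    return (V * np.maximum(w, 0.0)) @ V.T    # clip negative eigenvalues to zero

M = np.array([[1.0, 2.0],
              [2.0, -3.0]])
P = project_psd(M)
print(P)
print(np.linalg.eigvalsh(P))                 # eigenvalues of the projection
```

Applying the function twice returns the same matrix, as expected of a projector.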

PhaseLift can be put in this two-term form by softly enforcing the data fit. That gives the minimization problem

$$\begin{aligned} \min_\mathbf {X}\underbrace{\frac{1}{2} \| \mathcal {A}(\mathbf {X}) - \mathbf {b}\|^2 + \lambda\ \text {tr}(\mathbf {X})}_F + \underbrace{\iota _{\mathbf {X}\succeq 0}(\mathbf {X})}_G \end{aligned}$$

(41)

where $\iota _{\mathbf {X}\succeq 0}$ is the indicator function that is zero on the positive semidefinite cone and infinite otherwise, and where λ is small and positive. If λ=0, (41) reduces to the optimizationless feasibility problem. The smoothness of F enables methods that are forward on F and backward on G. As a representative of this class of methods, we will consider a Nesterov iteration for our simulations below.

The optimizationless view suggests the splitting

$$\begin{aligned} \min_\mathbf {X}\underbrace{\iota _{\mathcal {A}(\mathbf {X})= \mathbf {b}}(\mathbf {X})}_F + \underbrace{\iota _{\mathbf {X}\succeq 0}(\mathbf {X})}_G, \end{aligned}$$

(42)

where the data fit term is enforced in a hard manner by the indicator function $\iota _{\mathcal {A}(\mathbf {X})= \mathbf {b}}$. Because of the lack of smoothness, we can only use the proximal operators for F and G. These operators are projectors onto the affine space $\mathcal {A}(\mathbf {X}) = \mathbf {b}$ and the cone X⪰0, which we denote by $\mathcal {P}_{\mathcal {A}(\mathbf {X}) = \mathbf {b}}$ and $\mathcal {P}_{\text{psd}}$, respectively.

The simplest method for (42) is Projection onto Convex Sets (POCS), which is given by the backward-backward iteration $\mathbf {X}_{n+1} = \mathcal {P}_{\text{psd}} \mathcal {P}_{\mathcal {A}(\mathbf {X}) = \mathbf {b}}\mathbf {X}_{n}$. Douglas-Rachford iteration often outperforms POCS, so we consider it as a representative of this class of backward-backward methods.

A strength of the optimizationless perspective is that it does not require as much parameter tuning as PhaseLift. For example, formulation (41) requires a numerical choice for λ. Nonzero λ will generally change the minimizer. It is possible to consider a sequence of problems with varying λ, or perhaps to create a schedule of λ within a problem, but these considerations are unnecessary because the optimizationless perspective says we can take λ=0. In particular, formulation (42) has the further strength of requiring no parameters at all.

We note that PhaseLift could alternatively give rise to the two-term splitting

$$\begin{aligned} \min\underbrace{\text{tr}(\mathbf {X})}_F + \underbrace{\iota _{\mathbf {X}\succeq 0}(\mathbf {X}) + \iota _{\mathcal {A}(\mathbf {X})= \mathbf {b}}(\mathbf {X})}_G, \end{aligned}$$

(43)

where the data fit term is enforced in a hard manner. An iterative approach with this splitting would have an inner loop which approximates the proximal operator of G. This inner iteration is equivalent to solving the optimizationless problem.

6.2 Numerical Results

First, we present a Douglas-Rachford [9] approach for finding $\mathbf {X}\in\{\mathbf {X}\succeq 0\} \cap\{ \mathcal {A}(\mathbf {X}) \approx \mathbf {b}\}$ by the splitting (42). It is given by the iteration

$$\begin{aligned} \mathbf {X}_0 &= \mathbf {Y}_0 = 0 \end{aligned}$$

(44)

$$\begin{aligned} \mathbf {Y}_n &= \mathcal {P}_{\mathcal {A}(\mathbf {X}) = \mathbf {b}}(2 \mathbf {X}_{n-1} - \mathbf {Y}_{n-1}) - \mathbf {X}_{n-1} + \mathbf {Y}_{n-1} \end{aligned}$$

(45)

$$\begin{aligned} \mathbf {X}_n &= \mathcal {P}_\text {psd}(\mathbf {Y}_n) \end{aligned}$$

(46)

where $\mathcal {P}_{\text{psd}}$ is the projector onto the positive semi-definite cone of matrices, and $\mathcal {P}_{\mathcal {A}(\mathbf {X}) = \mathbf {b}}$ is the projector onto the affine space of solutions to $\mathcal {A}(\mathbf {X}) = \mathbf {b}$. In the classically underdetermined case, $m < \frac{(n+1)n}{2}$, we can write

$$\mathcal {P}_{\mathcal {A}(\mathbf {X}) = \mathbf {b}}\mathbf {X}= \mathbf {X}- \mathcal {A}^* (\mathcal {A}\mathcal {A}^*)^{-1} \mathcal {A}(\mathbf {X}) + \mathcal {A}^*(\mathcal {A}\mathcal {A}^*)^{-1} \mathbf {b}. $$

In the case that $m \geq\frac{(n+1)n}{2}$, we interpret $\mathcal {P}_{\mathcal {A}(\mathbf {X}) = \mathbf {b}}$ as the least squares solution to $\mathcal {A}(\mathbf {X}) = \mathbf {b}$.
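The Douglas-Rachford iteration above can be sketched in NumPy for the classically underdetermined case. This is a minimal illustration under stated assumptions: $\mathcal {A}$ is stored as an m×n² matrix acting on the flattened X, the data are noiseless, and the problem size, seed, iteration count, and function names are hypothetical choices rather than the authors' implementation.

```python
import numpy as np

def make_problem(n, m, seed=0):
    """Random real-valued instance: rows of A are vec(z_i z_i^T), b is noiseless."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n)
    x0 /= np.linalg.norm(x0)
    Z = rng.standard_normal((m, n))
    A = np.stack([np.outer(z, z).ravel() for z in Z])  # A @ vec(X) = (z_i^T X z_i)_i
    b = A @ np.outer(x0, x0).ravel()
    return A, b, x0

def project_affine(X, A, b, gram):
    """X - A^*(A A^*)^{-1}(A(X) - b); requires gram = A A^* to be invertible."""
    r = A @ X.ravel() - b
    return X - (A.T @ np.linalg.solve(gram, r)).reshape(X.shape)

def project_psd(X):
    """Projection onto the PSD cone by clipping negative eigenvalues."""
    w, V = np.linalg.eigh((X + X.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def douglas_rachford(A, b, n, iters=500):
    """Iterate Y <- P_affine(2X - Y) - X + Y, then X <- P_psd(Y), from X = Y = 0."""
    gram = A @ A.T
    X = np.zeros((n, n))
    Y = np.zeros((n, n))
    for _ in range(iters):
        Y = project_affine(2 * X - Y, A, b, gram) - X + Y
        X = project_psd(Y)
    return X

n, m = 6, 12                      # m < n(n+1)/2 = 21, so A A^* is generically invertible
A, b, x0 = make_problem(n, m)
X = douglas_rachford(A, b, n)
X0 = np.outer(x0, x0)
rel_err = np.linalg.norm(X - X0) / np.linalg.norm(X0)
print("relative error:", rel_err)
```

No step size or regularization weight appears anywhere in the loop, which is the practical point of the splitting. Whether a particular small instance is recovered depends on m relative to n, so the printed error is best read as a diagnostic rather than a guarantee.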

Second, we present a Nesterov gradient-based method for solving the problem (41). Letting ${g(\mathbf {X}) = \frac{1}{2} \| \mathcal {A}(\mathbf {X}) - \mathbf {b}\|^{2} + \lambda\ \text {tr} (\mathbf {X})}$, we consider the following Nesterov iteration [5] with constant step size α:

$$\begin{aligned} \mathbf {X}_0 &= \mathbf {Y}_0 = 0 \end{aligned}$$

(47)

$$\begin{aligned} \mathbf {X}_n &= \mathcal {P}_\text {psd}(\mathbf {Y}_{n-1} - \alpha\nabla g(\mathbf {Y}_{n-1})) \end{aligned}$$

(48)

$$\begin{aligned} \theta_n &= 2 \left( 1 + \sqrt{1 + 4 / \theta^2_{n-1}} \right)^{-1} \end{aligned}$$

(49)

$$\begin{aligned} \beta_n &= \theta_n (\theta^{-1}_{n-1} - 1) \end{aligned}$$

(50)

$$\begin{aligned} \mathbf {Y}_n &= \mathbf {X}_n + \beta_n(\mathbf {X}_n - \mathbf {X}_{n-1}) \end{aligned}$$

(51)
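The Nesterov iteration can be sketched in the same way. The text does not state the initial value θ₀, so θ₀=1 is assumed here, and the step size 1/‖A‖² is a conservative illustrative choice rather than the value used in the simulations; the instance size and function names are likewise hypothetical.

```python
import numpy as np

def project_psd(X):
    """Projection onto the PSD cone by clipping negative eigenvalues."""
    w, V = np.linalg.eigh((X + X.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def objective(A, b, X, lam=0.0):
    """g(X) = 0.5 * ||A(X) - b||^2 + lam * tr(X)."""
    r = A @ X.ravel() - b
    return 0.5 * float(r @ r) + lam * float(np.trace(X))

def nesterov_psd(A, b, n, alpha, lam=0.0, iters=200, theta0=1.0):
    """Projected Nesterov iteration with constant step alpha (theta0 = 1 is an assumption)."""
    X_prev = np.zeros((n, n))
    Y = np.zeros((n, n))
    theta_prev = theta0
    for _ in range(iters):
        # Gradient of g at Y: A^*(A(Y) - b) + lam * I.
        grad = (A.T @ (A @ Y.ravel() - b)).reshape(n, n) + lam * np.eye(n)
        X = project_psd(Y - alpha * grad)
        theta = 2.0 / (1.0 + np.sqrt(1.0 + 4.0 / theta_prev**2))
        beta = theta * (1.0 / theta_prev - 1.0)
        Y = X + beta * (X - X_prev)
        X_prev, theta_prev = X, theta
    return X_prev

# Small hypothetical noiseless instance with real Gaussian sensing vectors.
rng = np.random.default_rng(1)
n, m = 5, 30
x0 = rng.standard_normal(n)
x0 /= np.linalg.norm(x0)
Z = rng.standard_normal((m, n))
A = np.stack([np.outer(z, z).ravel() for z in Z])
b = A @ np.outer(x0, x0).ravel()
alpha = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L for the smooth term (assumed choice)
X = nesterov_psd(A, b, n, alpha)
print(objective(A, b, X), np.linalg.norm(X - np.outer(x0, x0)))
```

Setting `lam=0.0` corresponds to the optimizationless case; a positive `lam` adds the trace penalty of PhaseLift and introduces a second parameter to tune alongside `alpha`.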

For our simulations, we consider $\mathbf {x}_{0} \in \mathbb {R}^{n}$ sampled uniformly at random from the unit sphere. We take independent, real-valued $\mathbf {z}_{i} \sim\mathcal{N}(0,\mathbf {I})$, and let the measurements b be subject to additive Gaussian noise corresponding to ε=1/10. We let n vary from 5 to 50 and let m vary from 10 to 250. We define the recovery error as ∥X−X ₀∥₂/∥X ₀∥₂.

Figure 2 shows the average recovery error for the optimizationless problem under the Douglas-Rachford method and the Nesterov method over a range of values of n and m. For the Nesterov method, we consider the optimizationless case of λ=0, and we let the step size parameter α=2×10⁻⁴. Each pair of values was independently sampled 10 times, and both methods were run for 1000 iterations. The plot shows that the number of measurements needed for recovery is approximately linear in n, significantly fewer than the number at which there are as many measurements as unknowns. The artifacts around the curve $m = \frac {n(n+1)}{2}$ appear because the problem is critically determined, and the only solution to the noisy $\mathcal {A}(\mathbf {X}) = \mathbf {b}$ is not necessarily positive in that case.

Figure 3 shows recovery error versus iteration number under the Douglas-Rachford method, the Nesterov method for λ=0 and the Nesterov method for λ=10⁻⁵. For the Nesterov methods, we let the step size parameter be α=10⁻⁴. For noisy data, convergence is initially linear until it tapers off around the noise level. For noiseless data, convergence for the feasibility problem is linear under both the Douglas-Rachford and Nesterov methods. The Nesterov implementation of PhaseLift shows initial linear convergence until it tapers off. Because any nonzero λ allows for some data misfit in exchange for a smaller trace, the computed minimum is not X₀ and the procedure converges to some nearby matrix. The convergence rates of the Nesterov method could probably be improved by tuning the step sizes in a more complicated way. Nonetheless, we observe that the Douglas-Rachford method exhibits a favorable convergence rate while requiring no parameter tuning.

We would like to remark that work subsequent to this paper shows that the number of measurements needed by the optimizationless feasibility problem is about the same as the number needed by PhaseLift [22]. That is, the phase transition in Fig. 2 occurs in about the same place for both problems.

References

[1] Balan, R., Bodmann, B., Casazza, P.G., Edidin, D.: Painless reconstruction from magnitudes of frame vectors. J. Fourier Anal. Appl. 15(4), 488–501 (2009)
[2] Balan, R., Casazza, P., Edidin, D.: On signal reconstruction without phase. Appl. Comput. Harmon. Anal. 20(3), 345–356 (2006)
[3] Bandeira, A.S., Singer, A., Spielman, D.A.: A Cheeger inequality for the graph connection Laplacian. To appear in SIAM J. Matrix Anal. Appl.
[4] Beck, A., Teboulle, M.: A fast iterative Shrinkage-Thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
[5] Candes, E.J., Eldar, Y.C., Strohmer, T., Voroninski, V.: Phase retrieval via matrix completion. SIAM J. Imaging Sci. 6(1), 199–225 (2011)
[6] Candes, E.J., Li, X.: Solving quadratic equations via phaselift when there are about as many equations as unknowns. To appear in Found. Comput. Math. (2012)
[7] Candes, E.J., Strohmer, T., Voroninski, V.: PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pure Appl. Math. 66(8), 1241–1274 (2013)
[8] Chai, A., Moscoso, M., Papanicolaou, G.: Array imaging using intensity-only measurements. Inverse Probl. 27(1), 015005 (2011)
[9] Combettes, P., Pesquet, J.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer, New York (2010)
[10] Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956)
[11] Fienup, J.R.: Phase retrieval algorithms: a comparison. Appl. Opt. 21(15), 2758–2769 (1982)
[12] Gerchberg, R., Saxton, W.: A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik 35, 237–246 (1972)
[13] Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semi-definite programming. J. ACM 42, 1115–1145 (1995)
[14] Griffin, D., Lim, J.: Signal estimation from modified short-time Fourier transform. IEEE Trans. Acoust. Speech Signal Process. 32(2), 236–243 (1984)
[15] Harrison, R.W.: Phase problem in crystallography. J. Opt. Soc. Am. A 10(5), 1045–1055 (1993)
[16] Miao, J., Ishikawa, T., Shen, Q., Earnest, T.: Extending X-ray crystallography to allow the imaging of non-crystalline materials, cells and single protein complexes. Annu. Rev. Phys. Chem. 59, 387–410 (2008)
[17] Millane, R.P.: Phase retrieval in crystallography and optics. J. Opt. Soc. Am. A 7, 394–411 (1990)
[18] Raguet, H., Fadili, J.M., Peyre, G.: A generalized forward-backward splitting. SIAM J. Imaging Sci. 6(3), 1199–1226 (2013)
[19] Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)
[20] Singer, A.: Angular synchronization by eigenvectors and semidefinite programming. Appl. Comput. Harmon. Anal. 30(1), 20–36 (2011)
[21] Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. In: Eldar, Y.C., Kutyniok, G. (eds.) Compressed Sensing: Theory and Applications. Camb. Univ Press, Cambridge (2010)
[22] Waldspurger, I., d’Aspremont, A., Mallat, S.: Phase recovery, Maxcut, and complex semi-definite programming. Preprint (2012). arXiv:1206.0102
[23] Walther, A.: The question of phase retrieval in optics. Opt. Acta 10, 41–49 (1963)

Acknowledgements

The authors acknowledge generous funding from the National Science Foundation, the Alfred P. Sloan Foundation, TOTAL S.A., and the Air Force Office of Scientific Research. The authors would also like to thank Xiangxiong Zhang for helpful discussions.

Author information

Authors and Affiliations

Department of Mathematics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA
Laurent Demanet & Paul Hand

Corresponding author

Correspondence to Paul Hand.

Additional information

Communicated by Peter Casazza.

About this article

Cite this article

Demanet, L., Hand, P. Stable Optimizationless Recovery from Phaseless Linear Measurements. J Fourier Anal Appl 20, 199–221 (2014). https://doi.org/10.1007/s00041-013-9305-2

Published: 14 November 2013
Issue Date: February 2014
DOI: https://doi.org/10.1007/s00041-013-9305-2
