Abstract
The largest eigenvalue of a matrix is always greater than or equal to its largest diagonal entry. We show that for a class of random Laplacian matrices with independent off-diagonal entries, this bound is essentially tight: the largest eigenvalue is, up to lower order terms, often the size of the largest diagonal entry. Besides being a simple tool to obtain precise estimates on the largest eigenvalue of a class of random Laplacian matrices, our main result settles a number of open problems related to the tightness of certain convex relaxation-based algorithms. It easily implies the optimality of the semidefinite relaxation approaches to problems such as \({{\mathbb {Z}}}_2\) Synchronization and stochastic block model recovery. Interestingly, this result readily implies the connectivity threshold for Erdős–Rényi graphs and suggests that these three phenomena are manifestations of the same underlying principle. The main tool is a recent estimate on the spectral norm of matrices with independent entries by van Handel and the author.
1 Introduction
Toward the end of the 1950s, Eugene Wigner [48] made the remarkable finding that the spectrum of a large class of random matrices is, in high dimension, distributed essentially the same way: under mild assumptions, the distribution of the spectrum converges to the so-called Wigner semicircle law. The study of spectral properties of random matrices has since been a fascinating research topic, with important implications in many areas. We refer the reader to the books [6, 43] for more on this subject.
The present paper addresses the problem of estimating the largest eigenvalue of a large class of Laplacian matrices. The investigation of such problems has strong motivations from algorithmic analysis. Indeed, the performance of many popular algorithms is tightly connected with the largest eigenvalue of some matrix that depends on its input, and so studying the performance of such algorithms over random inputs involves understanding the behavior of the largest eigenvalue of a random matrix. In fact, as we will see, the estimates derived here play a crucial role in understanding the typical performance of a natural semidefinite programming-based approach for solving certain computationally hard problems on graphs, such as community detection.
We use the term Laplacian matrix to refer to symmetric matrices whose rows and columns sum to zero. While oftentimes Laplacians are also thought of as being positive semidefinite, the matrices we will treat will not necessarily satisfy this property. Spectral graph theory inspires a useful way of thinking about these matrices [18]. Given a graph on n nodes with edge set E, its adjacency matrix \(A\in \mathbb {R}^{n\times n}\) is defined by \(A_{ij}=1\) if \((i,j)\in E\) and \(A_{ij}=0\) otherwise, and its degree matrix \(D_A\) is a diagonal matrix whose ith diagonal entry is equal to the degree of node i. The Laplacian of the graph is defined to be \(L_A = D_A - A\). The spectrum of the graph Laplacian matrix is known to contain important information about the graph [18] and has been studied for random graphs [14, 17, 22]. Analogously, we make the following definition.
Definition 1.1
Given a symmetric matrix \(X\in \mathbb {R}^{n\times n}\), we define the Laplacian \(L_X\) of X as
\(L_X = D_X - X,\)
where \(D_X\) is the diagonal matrix whose diagonal entries are given by
\(\left( D_X\right) _{ii} = \sum _{j\in [n]} X_{ij}.\)
We will refer to any such matrix \(L_X\) as a Laplacian matrix. Note that these are precisely the symmetric matrices L for which \(L \mathbf {1} =0\), where \(\mathbf {1}\in \mathbb {R}^n\) denotes the all-ones vector.
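To make Definition 1.1 concrete, here is a small numerical sketch (ours, not part of the paper; the helper name `laplacian` is hypothetical), placing row sums on the diagonal so that \(L_X \mathbf {1} = 0\) for any symmetric X:

```python
import numpy as np

# Hypothetical helper illustrating Definition 1.1: L_X = D_X - X,
# where (D_X)_ii is the i-th row sum of X.
def laplacian(X: np.ndarray) -> np.ndarray:
    return np.diag(X.sum(axis=1)) - X

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
X = X + X.T                      # symmetrize
L = laplacian(X)

# L is symmetric and annihilates the all-ones vector: L @ 1 = 0.
assert np.allclose(L, L.T)
assert np.allclose(L @ np.ones(5), 0)
```

Conversely, any symmetric L with \(L\mathbf {1}=0\) arises this way, matching the characterization above.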
This paper is concerned with a class of random Laplacian matrices \(L_X\) where the entries of the matrix X are independent centered (but not necessarily identically distributed) random variables. Our main result is that, under mild and easily verifiable conditions, the largest eigenvalue of \(L_X\) is, up to lower order terms, given by its largest diagonal entry. While we defer the formal statement of our main results to Sect. 2, we informally state them here.
Informal Statement of Theorem (2.1)
Let L be an \(n\times n\) symmetric random Laplacian matrix (i.e., satisfying \(L \mathbf {1}=0\)) with centered independent off-diagonal entries such that \(\sum _{j\in [n]\setminus i}\mathbb {E}L_{ij}^2\) is equal for every i, and
\(\max _{i\ne j}\Vert L_{ij}\Vert _{\infty }\sqrt{\log n} = o\left( \left( \sum _{j\in [n]\setminus i}\mathbb {E}L_{ij}^2\right) ^{1/2}\right) .\)
Then, with high probability,
\(\lambda _{\max }\left( L\right) = \left( 1+o(1)\right) \max _{i\in [n]} L_{ii}.\)
Not only does our main result provide an extremely simple tool to precisely estimate the largest eigenvalue of Laplacian matrices, but in the applications studied below, the largest diagonal value also enjoys an interpretation that is intimately tied to the underlying problem.
To illustrate the latter point, we turn back to graph theory. It is well known that the spectrum of the Laplacian of a graph dictates whether or not the graph is connected. On the other hand, its diagonal is simply given by the degrees of the nodes of the graph. A relation between the spectrum of the Laplacian and its diagonal could then translate into a relation between degrees of nodes of a graph and its connectivity. In fact, such a relation is already known to exist: The phase transition for connectivity of Erdős–Rényi graphs coincides with the one for the existence of isolated nodes. While it is true that any graph with an isolated node (a node with degree zero) cannot be connected, the converse is far from true, rendering this phenomenon particularly interesting. In Sect. 3.1, we will use our main result to provide a simple and illustrative proof of this phenomenon.
We will use our main result to give sharp guarantees for certain algorithms that solve the \({\mathbb {Z}}_2\) Synchronization problem and the community detection problem in the stochastic block model. The \({\mathbb {Z}}_2\) Synchronization problem consists of recovering binary labels \(x_i=\pm 1\) associated with nodes of a graph from noisy (pairwise) measurements of \(x_ix_j\) whenever (i, j) is an edge of the graph (see [42]). This problem is intimately related to correlation clustering [11]. Despite its hardness, spectral methods and semidefinite programming-based methods are known to perform well in both the worst-case [9] and average-case settings [1, 2, 19].
Community detection, or clustering, in a graph is a central problem in countless applications. Unfortunately, even the simplified version of partitioning a graph into two vertex sets of the same size that minimize the number of edges across the partition, referred to as minimum bisection, is known to be NP-hard. Nevertheless, certain heuristics are known to work well for typical realizations of random graph models that exhibit community structure [12, 25, 35]. In this setting, a particularly popular model is the stochastic block model with two communities.
Definition 1.2
(Stochastic block model with two communities) Given n even, and \(0\le p,q \le 1\), we say that a random graph G is drawn from \(\mathcal {G}(n,p,q)\), the stochastic block model with two communities, if G has n nodes, divided into two clusters of \(\frac{n}{2}\) nodes each, and for each pair of vertices i, j, (i, j) is an edge of G with probability p if i and j are in the same cluster and with probability q otherwise, independently of any other edge.
We will focus on the setting \(p>q\). The problem of recovering, from a realization \(G\sim \mathcal {G}(n,p,q)\), the original partition of the underlying vertices gained popularity when Decelle et al. [21] conjectured a fascinating phase transition in the constant average-degree regime. More precisely, if \(p = \frac{a}{n}\) and \(q = \frac{b}{n}\) with \(a>b\) constants, it was conjectured that as long as
\((a-b)^2 > 2(a+b),\)
it is possible to make an estimate of the original partition that correlates with the true partition and that below this threshold it is impossible to do so. This conjecture was later proven in a remarkable series of works by Mossel et al. [37, 38] and Massoulie [34]. Instead of settling for an estimate that correlates with the true partition, we will focus on exactly recovering the partition. A phase transition for this problem was established by Abbe et al. [3] and independently by Mossel et al. [36]. We will show that a certain semidefinite programming-based algorithm succeeds up to the information-theoretic threshold, thus settling a problem posed in [3]. We remark that, while the present paper was being written, it was brought to our attention that this problem was also solved independently by parallel research efforts of Hajek et al. [28].
The use of semidefinite relaxations in combinatorial optimization dates back to the late 1970s with the seminal work of László Lovász [32] on the so-called Lovász theta function, and this approach was made algorithmic shortly after in [27]. In the first half of the 1990s, interior point methods were adapted to solve semidefinite programs [5, 39], providing reasonably efficient methods to solve these problems. In 1995, Goemans and Williamson devised the first approximation algorithm based on semidefinite programming [26]. Their algorithm gave the best-known approximation ratio for the Max-Cut problem. Ever since, many approximation algorithms have been designed based on semidefinite programming. In fact, the algorithm we will analyze is greatly inspired by the semidefinite relaxation in [26]. Remarkably, an important conjecture of Khot [30] is known to imply that for a large class of problems including Max-Cut, this approach produces optimal approximation ratios [40].
An approximation ratio is a guarantee that, for any possible instance of the input, the algorithm outputs a solution whose performance is at least a certain fraction (the approximation ratio) of the optimal one. The worst-case nature of this type of guarantee is often pessimistic. A popular alternative is to equip the input with a distribution (for example the Stochastic Block Model) and give guarantees for most inputs. More precisely, we will be interested in understanding when it is the case that the semidefinite relaxation approach gives exactly the correct answer (for most inputs). The tendency for a large class of semidefinite relaxations to be tight has been observed and conjectured, for example, in [8]. One of the main insights of this paper is the fact that the phenomenon described by our main result provides a unifying principle for understanding the tightness of many convex relaxations.
1.1 Notation
We will make use of several standard matrix and probability notations. For M a matrix, we will denote its kth smallest eigenvalue by \(\lambda _{k}(M)\), largest eigenvalue by \(\lambda _{\max }(M)\), and its spectral norm by \(\Vert M\Vert \). \(\mathrm {diag}(M)\) will be used to refer to a vector with the diagonal elements of M as entries. For \(x\in \mathbb {R}^n\) a vector, \(\mathrm {diag}(x)\) will denote a diagonal matrix \(D\in \mathbb {R}^{n\times n}\) with \(D_{ii}=x_i\).
\(\mathbf {1}\) will denote the all-ones vector, whenever there is no risk of ambiguity for its dimension.
For a scalar random variable Y, we will write its p-norm as \(\Vert Y\Vert _p = \left( \mathbb {E}|Y|^p \right) ^{1/p}\) and infinity norm as \(\Vert Y\Vert _{\infty } = \inf \left\{ a :\ |Y| \le a \text { a.s.} \right\} \).
Given a graph, \(\deg (i)\) will be used to denote the degree of node i. In the case of the stochastic block model, \(\deg _\mathrm{in}(i)\) will be used for inner-cluster degree and \(\deg _\mathrm{out}(i)\) for outer-cluster degree.
We will say that an event \(\mathcal {E}\) happens with high probability when
\(\mathbb {P}\left[ \mathcal {E}\right] = 1 - o(1),\)
where n is an underlying parameter that is thought of going to infinity (such as the dimension of the matrices or the number of nodes in the graphs being studied).
2 Main Results
We use this section to formulate precise versions of, and briefly discuss, our main results.
Theorem 2.1
Let L be an \(n\times n\) symmetric random Laplacian matrix (i.e., satisfying \(L \mathbf {1}=0\)) with centered independent off-diagonal entries such that \(\sum _{j\in [n]\setminus i}\mathbb {E}L_{ij}^2\) is equal for every i.
Define \(\sigma \) and \(\sigma _{\infty }\) as
\(\sigma ^2 = \sum _{j\in [n]\setminus i}\mathbb {E}L_{ij}^2 \quad \text {and}\quad \sigma _{\infty } = \max _{i\ne j}\Vert L_{ij}\Vert _{\infty }\)
(note that \(\sigma \) is well defined since this sum does not depend on i).
If there exists \(c>0\) such that
\(\sigma \ge c\, \sigma _{\infty }\sqrt{\log n},\)
then there exist \(c_1, C_1, \beta _1\), all positive and depending only on c, such that
with probability at least \(1-c_1n^{-\beta _1}\).
As will become clear below, the proof of this theorem consists essentially of showing that, under suitable conditions, \(\max _{i}L_{ii} \gtrsim \sigma \sqrt{\log n}\) while \(\Vert X\Vert \lesssim \sigma \), where \(-X\) is the off-diagonal part of L.
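This dominance is easy to observe numerically. The sketch below (ours, not from the paper) uses Rademacher off-diagonal entries, so that \(\sigma ^2 = n-1\) and \(\sigma _\infty = 1\); the sandwich in the assertion uses only the deterministic facts \(\max _i L_{ii} \le \lambda _{\max }(L) \le \max _i L_{ii} + \Vert X\Vert \).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
# Symmetric matrix with independent Rademacher off-diagonal entries
# (zero diagonal): sigma^2 = n - 1 and sigma_infty = 1.
X = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)
X = X + X.T
L = np.diag(X.sum(axis=1)) - X          # Laplacian of X (Definition 1.1)

lam_max = np.linalg.eigvalsh(L)[-1]     # eigvalsh returns ascending order
max_diag = L.diagonal().max()           # ~ sigma * sqrt(2 log n)
spec = np.linalg.norm(X, 2)             # ~ 2 * sigma

# The largest eigenvalue dominates the largest diagonal entry and
# exceeds it by at most ||X|| = O(sigma), a lower order term.
assert max_diag <= lam_max <= max_diag + spec
```

At this moderate n the correction \(\Vert X\Vert \) is still visible; the theorem says it becomes negligible relative to \(\max _i L_{ii}\) as \(n\rightarrow \infty \).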
Even though we were not able to find a convincing application in which \(\frac{\sigma }{\sigma _\infty }\) grows asymptotically but slower than \(\sqrt{\log n}\), we include the theorem below for the sake of completeness.
Theorem 2.2
Let L be an \(n\times n\) symmetric random Laplacian matrix (i.e., satisfying \(L \mathbf {1}=0\)) with centered independent off-diagonal entries such that \(\sum _{j\in [n]\setminus i}\mathbb {E}L_{ij}^2\) is equal for every i.
Define \(\sigma \) and \(\sigma _{\infty }\) as
\(\sigma ^2 = \sum _{j\in [n]\setminus i}\mathbb {E}L_{ij}^2 \quad \text {and}\quad \sigma _{\infty } = \max _{i\ne j}\Vert L_{ij}\Vert _{\infty }.\)
If there exist c and \(\gamma >0\) such that
\(\sigma \ge c\, \sigma _{\infty }\left( \log n\right) ^{\gamma },\)
then there exist \(C_2, c_2, \epsilon \), and \(\beta _2\), all positive and depending only on c and \(\gamma \), such that
with probability at least \(1-c_2\exp \left[ - \left( \log n\right) ^{\beta _2} \right] \).
Remark 2.3
In the theorems above, the condition that \(\sum _{j\in [n]\setminus i}\mathbb {E}L_{ij}^2 \) is equal for every i can be relaxed to the requirement that
for all i. This requires only simple adaptations to the proofs of these theorems.
While we defer the proofs of these theorems to Sect. 4, we briefly describe the idea behind them. Lemma 4.1 (borrowed from [10]) gives the estimate
\(\Vert X\Vert \lesssim \sigma + \sigma _{\infty }\sqrt{\log n},\)
where \(-X\) is the off-diagonal part of L. On the other hand, \(L_{ii} = \sum _{j\in [n]\setminus i} X_{ij}\) has variance \(\sigma ^2\), and the central limit theorem would suggest that the \(L_{ii}\) behave like independent Gaussians of variance \(\sigma ^2\), which would mean that \(\max _{i}L_{ii} \sim \sigma \sqrt{\log n}\), rendering the contribution of the off-diagonal entries (to the largest eigenvalue) negligible. However, several difficulties arise: the diagonal entries are not independent (as each pair shares a summand), and one needs to make sure that the central limit theorem behavior sets in (this is, in a way, ensured by requirements (1) and (2)). The proofs in Sect. 4 make the many adaptations needed to render this argument rigorous.
3 Applications
We now turn our attention to applications of the main results. As a warm-up, we will start by understanding the connectivity of Erdős–Rényi graphs.
3.1 Connectivity of Erdős–Rényi Graphs
Recall that, for an integer n and an edge probability parameter \(0\le p\le 1\), the Erdős–Rényi graph model [24] \(\mathcal {G}(n,p)\) is a random graph on n nodes where each one of the \(\binom{n}{2}\) possible edges appears independently with probability p.
We are interested in understanding the probability that G, drawn according to \(\mathcal {G}(n,p)\), is a connected graph. We will restrict our attention to the setting \(p\le \frac{1}{2}\). Let L be the Laplacian of the random graph, given by \(D-A\), where A is its adjacency matrix and D a diagonal matrix containing the degree of each node. It is well known (see, e.g., [18]) that G being connected is equivalent to \(\lambda _2(L)>0\).
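As a quick sanity check (ours, not part of the paper), this equivalence between connectivity and \(\lambda _2(L)>0\) can be verified numerically by comparing \(\lambda _2\) against a component count obtained by graph search:

```python
import numpy as np

# Numerical check (ours) that G is connected iff lambda_2(L) > 0.
rng = np.random.default_rng(2)
n, p = 200, 0.05
A = np.triu((rng.random((n, n)) < p).astype(float), k=1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A

# Count connected components with a depth-first search.
seen = np.zeros(n, dtype=bool)
components = 0
for s in range(n):
    if not seen[s]:
        components += 1
        stack = [s]
        seen[s] = True
        while stack:
            v = stack.pop()
            for u in np.flatnonzero(A[v]):
                if not seen[u]:
                    seen[u] = True
                    stack.append(u)

lam2 = np.linalg.eigvalsh(L)[1]
assert (components == 1) == (lam2 > 1e-8)
```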
It is clear that if G has an isolated node, then it cannot be connected. It is also known that for there not to be isolated nodes one needs the average degree of each node to be at least logarithmic [24]. For this reason, we will focus on the regime
\(p = \frac{\rho \log n}{n}\)
for a constant \(\rho \). It is easy to establish a phase transition on the degrees of the nodes of graphs drawn from \(\mathcal {G}(n,p)\).
Lemma 3.1
Let n be a positive integer, \(\rho \) a constant, and \(p = \frac{\rho \log n}{n}\). Let G be a random graph drawn from \(\mathcal {G}(n,p)\); then for any constant \(\Delta >0\):
(1) If \(\rho > 1\), then, with high probability, \(\min _{i\in [n]}\deg (i) \ge \frac{ \Delta }{\sqrt{\log n}}\mathbb {E}\deg (i)\).
(2) If \(\rho < 1\), then, with high probability, \(\min _{i\in [n]}\deg (i) = 0\). That is, G has at least one isolated node, thus being disconnected.
Part (2) of the lemma is a classical result [24]; a particularly simple proof proceeds by applying the second moment method to the number of isolated nodes in G. For the sake of brevity, we will skip those details and focus on part (1). The main thing to note in part (1) of Lemma 3.1 is that the lower bound on the minimum degree is asymptotically smaller than the average degree \(\mathbb {E}\deg (i)\).
Proof
(of part (1) of Lemma 3.1)
Let \(p=\frac{\rho \log n}{n}\) and let i denote a node of the graph; note that \(\mathbb {E}\deg (i) = \frac{n-1}{n}\rho \log n\). We use a Chernoff bound (see, for example, Lemma 2.3.3 in [23]) to establish, for any \(0<t<1\),
\(\mathbb {P}\left[ \deg (i) < t\, \mathbb {E}\deg (i)\right] \le \left( \frac{e^{t-1}}{t^{t}}\right) ^{\mathbb {E}\deg (i)}.\)
Taking \(t = \frac{\Delta }{\sqrt{\log n}}\) gives, for n large enough (so that \(t\le 1\)), that the probability that \(\deg (i) < \frac{\Delta }{\sqrt{\log n}} \mathbb {E}\deg (i)\) is at most
\(\exp \left[ -\mathbb {E}\deg (i)\left( 1 - t + t\log t \right) \right] ,\)
which is easily seen to be \(\exp \left[ -\rho \log n + O(\sqrt{\log n}\, \log \log n) \right] \). A simple union bound over the n vertices of G gives
\(\mathbb {P}\left[ \min _{i\in [n]}\deg (i) < \frac{\Delta }{\sqrt{\log n}}\mathbb {E}\deg (i)\right] \le n\exp \left[ -\rho \log n + O(\sqrt{\log n}\,\log \log n)\right] = n^{1-\rho +o(1)} \rightarrow 0.\)
\(\square \)
Using Theorem 2.1, we will show that, with high probability, as long as every node in G has degree at least \(\frac{\Delta }{\sqrt{\log n}}\) times the average degree, for a suitable \(\Delta \), then G is connected. This is made precise in the following Lemma.
Lemma 3.2
Let \(n\ge 2\) be an integer and \(\varepsilon >0\). Suppose that \( \frac{\varepsilon \log n}{n}\le p \le \frac{1}{2}\) and let G be a random graph drawn from \(\mathcal {G}(n,p)\). There exists a constant \(\Delta \) (potentially depending on \(\varepsilon \)) such that, with high probability, the following holds:
If
\(\deg (i) \ge \frac{\Delta }{\sqrt{\log n}}\, np \quad \text {for every } i\in [n],\)
then G is a connected graph (note that the right-hand side does not depend on i).
Before proving this Lemma, we note that Lemmas 3.1 and 3.2 immediately imply the well-known phase transition phenomenon.
Theorem 3.3
Let n be a positive integer and \(p = \frac{\rho \log n}{n}\).
(1) If \(\rho > 1\), then, with high probability, a random graph drawn from \(\mathcal {G}(n,p)\) is connected.
(2) If \(\rho < 1\), then, with high probability, a random graph drawn from \(\mathcal {G}(n,p)\) has at least one isolated node, thus being disconnected.
While this phase transition is well understood, we find our proof through Lemmas 3.1 and 3.2 enlightening, as it provides a simple explanation of why the phase transition for disappearance of isolated nodes coincides with the one for connectivity. Moreover, it also emphasizes a connection with the optimality of the semidefinite relaxations in both \({\mathbb {Z}}_2\) Synchronization and the Stochastic Block Model that we will discuss in the sections to follow.
Proof
(of Lemma 3.2)
Let \(L_{\mathrm {ER}}\) be the graph Laplacian of G. Note that \(\mathbb {E}\left( L_{\mathrm {ER}}\right) = npI - p \mathbf {1} \mathbf {1}^T\), which means that
\(\lambda _2\left( \mathbb {E}L_{\mathrm {ER}}\right) = np.\)
Since \(L_{\mathrm {ER}} \mathbf {1} = 0\) and G is connected if and only if \(\lambda _2\left( L_{\mathrm {ER}}\right) >0\), it follows that G is connected if and only if
\(\lambda _{\max }\left( \mathbb {E}L_{\mathrm {ER}} - L_{\mathrm {ER}}\right) < np.\)
We proceed by using Theorem 2.1 for
\(L = \mathbb {E}L_{\mathrm {ER}} - L_{\mathrm {ER}}.\)
The hypotheses of the theorem are satisfied as the off-diagonal entries of L are independent and
\(\sum _{j\in [n]\setminus i}\mathbb {E}L_{ij}^2 = (n-1)p(1-p) \ge \frac{(n-1)p}{2} \ge \frac{\varepsilon \log n}{4},\quad \text {while}\quad \Vert L_{ij}\Vert _{\infty }\le 1.\)
This guarantees that there exists a constant \(C_1\) such that, with high probability,
where \(\deg (i) = \left( L_{\mathrm {ER}}\right) _{ii}\) is the degree of node i. Equivalently,
Together with (3), this implies that, as long as (4) holds, then
implies the connectivity of G. Straightforward manipulations show that this condition is equivalent to
which is implied by
The lemma follows by taking \(\Delta = 2C_1\). \(\square \)
3.2 A Simpler Problem: \({\mathbb {Z}}_2\) Synchronization with Gaussian Noise
Before presenting the applications of Theorem 2.1 to understanding the performance of a semidefinite relaxation for the problems of \({\mathbb {Z}}_2\) Synchronization and recovery in the Stochastic Block Model, we first motivate them with a simpler version of these problems: given a noise level \(\sigma \) and a vector \(z\in \{\pm 1\}^n\), suppose we are given noisy measurements
\(Y_{ij} = z_iz_j + \sigma W_{ij}\)
for each pair (i, j), where the \(W_{ij}\) are i.i.d. standard Gaussian random variables (with \(W_{ij}=W_{ji}\)). A version of this problem, over the complex numbers, is treated in [7]. Our objective is to devise an algorithm that recovers the correct z with high probability. By definition, the maximum a posteriori (MAP) estimator maximizes the probability of recovering the correct variable z. Given that we have no a priori information on z, we assume a uniform prior; in that case, the MAP estimator coincides with the maximum likelihood estimator (MLE) for z. The latter is the solution of
\(\max _{x \in \{\pm 1\}^n} x^TYx,\)
which is known to be NP-hard in general. In fact, (6) includes the Max-Cut problem by taking Y to be the Laplacian of a graph. In the spirit of the relaxation proposed in [26] for the Max-Cut problem, we take \(X = xx^T\) and rewrite (6) as
\(\max \ \mathrm {Tr}(YX) \quad \text {s.t. } X\succeq 0,\ X_{ii}=1 \text { for all } i,\ \mathrm {rank}(X)=1.\)
We now relax the nonconvex rank constraint and arrive at the following semidefinite program, which can be solved in polynomial time up to arbitrary precision [46]:
\(\max \ \mathrm {Tr}(YX) \quad \text {s.t. } X\succeq 0,\ X_{ii}=1 \text { for all } i.\)
As will become clear below, this relaxation is also used to solve \({\mathbb {Z}}_2\) Synchronization and recovery in the Stochastic Block Model, albeit with a different coefficient matrix Y.
Note that if \(X=xx^T\) is the unique solution to (8), then x must be the solution to (6), meaning that we are able to compute the MLE efficiently by solving (8). This motivates us to understand when it is the case that \(X = xx^T\) is the unique optimal solution of (8). A fruitful way of approaching this relies on duality. The dual of (8) is given by:
\(\min \ \mathrm {Tr}(D) \quad \text {s.t. } D \text { diagonal},\ D-Y\succeq 0.\)
Weak duality guarantees that if X and D are feasible solutions of, respectively, (8) and (9), then \(\mathrm {Tr}(YX)\le \mathrm {Tr}(D)\). Indeed, since X and \(D-Y\) are both positive semidefinite, we must have
\(0 \le \mathrm {Tr}\left[ (D-Y)X\right] = \mathrm {Tr}(DX) - \mathrm {Tr}(YX) = \mathrm {Tr}(D) - \mathrm {Tr}(YX),\)
where the last equality uses \(X_{ii}=1\).
This means that if we are able to find a so-called dual certificate, a matrix D feasible for (9) for which \(\mathrm {Tr}(D) = \mathrm {Tr}(Yxx^T)\), then it guarantees that \(X=xx^T\) is an optimal solution of (8). To guarantee uniqueness, it suffices to further ensure that \(\lambda _2(D-Y)>0\). In fact, if there existed another optimal solution \(X'\), by (10), one would have \(\mathrm {Tr}\left[ (D-Y)X' \right] =0\), which can be shown to imply (see, for example, Lemma 5.1. in [1]), together with the feasibility of \(X'\), that \(X'=xx^T\). This establishes the following Lemma.
Lemma 3.4
(Dual certificate) Let Y be a symmetric \(n\times n\) matrix and \(x\in \{\pm 1\}^n\). If there exists a diagonal matrix D, such that \(\mathrm {Tr}(D) = x^TYx\), \(D-Y\succeq 0\), and \(\lambda _2(D-Y)>0\), then \(X=xx^T\) is the unique optimal solution of (8).
We take a candidate dual certificate D whose diagonal elements are given by
\(D_{ii} = \sum _{j\in [n]} x_iY_{ij}x_j.\)
Note that \(D = D_{[\mathrm {diag}(x)Y\mathrm {diag}(x)]}\) as per Definition 1.1. It is easy to see that \(\mathrm {Tr}(D) = x^TYx\) and \((D-Y)x = 0\) which gives the following Lemma.
Lemma 3.5
Let Y be a symmetric \(n\times n\) matrix and \(x\in \{\pm 1\}^n\). Let D be the diagonal matrix defined as \(D = D_{[\mathrm {diag}(x)Y\mathrm {diag}(x)]}\). As long as
\(\lambda _2\left( D-Y\right) >0,\)
\(X=xx^T\) is the unique optimal solution of (8).
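The certificate construction is easy to check numerically. A minimal sketch (ours, not part of the paper), in the Wigner-noise model of this section and well below the noise threshold:

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 300, 1.0
z = rng.choice([-1.0, 1.0], size=n)
G = rng.standard_normal((n, n))
W = np.triu(G) + np.triu(G, 1).T          # symmetric, iid N(0,1) entries
Y = np.outer(z, z) + sigma * W

# D = D_[diag(z) Y diag(z)] as in Definition 1.1 (row sums on the diagonal).
M = np.diag(z) @ Y @ np.diag(z)
D = np.diag(M.sum(axis=1))

# The two identities from the text: Tr(D) = z^T Y z and (D - Y) z = 0.
assert np.isclose(np.trace(D), z @ Y @ z)
assert np.allclose((D - Y) @ z, 0)

# Lemma 3.5: lambda_2(D - Y) > 0 certifies X = z z^T as the unique optimum.
assert np.linalg.eigvalsh(D - Y)[1] > 0
```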
Let us return to the setting in which \(Y = zz^T + \sigma W\), where W is a standard Wigner matrix: a symmetric matrix with iid standard Gaussian entries. We want to determine for which values of \(\sigma \) one expects \(X = zz^T\) to be, with high probability, the solution of (8), as we are interested not only in computing the MLE but also in having it coincide with the planted vector z we want to recover. Since \(\mathrm {diag}(z)W\mathrm {diag}(z)\sim W\), we can, without loss of generality, take \(z=\mathbf {1}\). In that case, we are interested in understanding when
\(\lambda _2\left( D - Y\right) > 0.\)
Since
\(D - Y = L_{\left[ \mathbf {1}\mathbf {1}^T\right] } - \sigma L_{\left[ -W\right] }\)
and \(\mathbf {1}\) is always in the nullspace of any Laplacian matrix, we can rewrite (11) as \(\sigma \lambda _{\max }\left( L_{\left[ -W\right] } \right) < \lambda _2\left( L_{\mathbf {1}\mathbf {1}^T} \right) \), showing that it is equivalent to
\(\lambda _{\max }\left( L_{\left[ -W\right] } \right) < \frac{n}{\sigma },\)
since \(\lambda _2\left( L_{\mathbf {1}\mathbf {1}^T} \right) = n\).
The triangle inequality tells us that \(\lambda _{\max }\left( L_{\left[ -W\right] } \right) \le \lambda _{\max }\left( -D_{W} \right) + \Vert W\Vert \). It is well known that, for any \(\varepsilon >0\), \(\Vert W\Vert \le (2+\varepsilon )\sqrt{n}\) with high probability (see, for example, Theorem II.11 in [20]). On the other hand,
\(\lambda _{\max }\left( D_{[-W]} \right) = \max _{i\in [n]} \left( -\sum _{j\in [n]} W_{ij}\right) ,\)
which is the maximum of n Gaussian random variables, each with variance n. A simple union bound yields that, for any \(\varepsilon >0\), \(\lambda _{\max }\left( D_{[-W]} \right) < \sqrt{(2+\varepsilon )n\log n}\) with high probability. This readily implies an exact recovery guarantee for \({\mathbb {Z}}_2\) Synchronization with Gaussian noise.
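The separation between these two scales, \(\sqrt{2n\log n}\) for the largest diagonal entry versus \(2\sqrt{n}\) for the spectral norm, is already visible at moderate n; a quick numerical sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
G = rng.standard_normal((n, n))
W = np.triu(G) + np.triu(G, 1).T             # Wigner: symmetric, iid N(0,1)

diag_max = (-W).sum(axis=1).max()            # lambda_max(D_[-W]): max of n
                                             # Gaussians, ~ sqrt(2 n log n)
spec = np.abs(np.linalg.eigvalsh(W)).max()   # ||W|| ~ 2 sqrt(n)

# At n = 2000: sqrt(2 n log n) ~ 174 while 2 sqrt(n) ~ 89.
assert spec < diag_max
```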
Proposition 3.6
Let \(z\in \{\pm 1\}^n\) and \(Y = zz^T + \sigma W\) where W is a symmetric matrix with iid standard Gaussian entries. If there exists \(\varepsilon >0\) such that \(\sigma < \sqrt{\frac{n}{(2+\varepsilon )\log n}}\), then, with high probability, \(X = zz^T\) is the unique solution to the semidefinite program (8).
Let us investigate the optimality of this upper bound on \(\sigma \). If the diagonal elements of \(D_{[-W]}\) were independent, their distribution would be known to indeed concentrate around \(\sqrt{2n \log n}\), suggesting that
\(\Vert W\Vert = o\left( \lambda _{\max }\left( D_{[-W]}\right) \right) ,\)
which would imply
\(\lambda _{\max }\left( L_{[-W]}\right) = (1+o(1))\sqrt{2n\log n}.\)
Both of these statements can be rigorously shown to be true.
This suggests that, in rough terms, the success of the relaxation (8) depends mostly on whether \(\lambda _{\max }\left( D_{[-W]} \right) < \frac{n}{\sigma }\), which is equivalent to
\(\max _{i\in [n]}\left( -\sigma \sum _{j\in [n]} W_{ij}\right) < n,\)
which can be interpreted as a bound on the amount of noise per row of Y. We argue next that this type of upper bound is indeed necessary for any method to succeed at recovering z from Y.
Once again, let us consider \(z=\mathbf {1}\) without loss of generality. Consider an oracle version of the problem in which one is given the correct label of every single node except node i. It is easy to see that the maximum likelihood estimator for \(z_i\) in this oracle problem is given by
\(\hat{z}_i = \mathrm {sign}\left( \sum _{j \in [n]\setminus i} Y_{ij}\right) ,\)
which would give the correct answer if and only if \( -\sigma \sum _{j \in [n]\setminus i} W_{ij} < n -1\).
This means that if
\(\max _{i\in [n]}\left( -\sigma \sum _{j \in [n]\setminus i} W_{ij}\right) > n-1,\)
one does not expect the MLE to succeed (with high probability) at recovering z from \(Y = zz^T + \sigma W\). Consequently (with a uniform prior on z), no method is able to recover z with high probability. Note the similarity between (15) and (16). This strongly suggests the optimality of the semidefinite programming-based approach (8).
These optimality arguments can be made rigorous. In fact, in the sections to follow, we will establish precise optimality results of this type, for the applications we are interested in. The main ingredient (13) in the rough argument above was the realization that the spectral norm of W is, with high probability, asymptotically smaller than the largest diagonal entry of \(D_{[-W]}\). Indeed, Theorems 2.1 and 2.2 establish precisely this fact for a large class of matrices with independent off-diagonal entries. Empowered with this result, we will be able to establish optimality for the semidefinite programming approach to solve the problems of \({\mathbb {Z}}_2\) Synchronization and recovery in the stochastic block model, where the underlying random matrices have much less well understood distributions. Modulo the use of Theorem 2.1, the arguments used will be very reminiscent of the ones above.
It is pertinent to compare this approach with the one of using the noncommutative Khintchine inequality, or the related matrix concentration inequalities [44, 45], to estimate the spectral norms in question. Unfortunately, those general purpose methods are, in our case, not fine enough to give satisfactory results. One illustration of their known suboptimality is the fact that the upper bound they give for \(\Vert W\Vert \) is of order \(\sqrt{n\log n}\), which does not allow one to establish (13), a crucial step in the argument. In fact, the looseness of these bounds is reflected in the suboptimal guarantees obtained in [1,2,3]. Our results are able to establish a phenomenon of the type of (13) by relying on recent sharp estimates for the spectral norm of matrices with independent entries in [10].
3.3 Synchronization Over the Group of Two Elements
Recall the setting of \({\mathbb {Z}}_2\) Synchronization [1, 2]. Given an underlying graph G with n nodes, the task is to recover a binary vector \(z\in \{\pm 1\}^n\) from noisy measurements \(Y_{ij}\) of \(z_iz_j\). Following [1, 2], we will take the underlying graph G to be an Erdős–Rényi graph \(\mathcal {G}(n,p)\) and, for each edge \((i,j)\in G\), set \(Y_{ij} = z_iz_j\) with probability \(1-\varepsilon \) and \(Y_{ij} = -z_iz_j\) with probability \(\varepsilon \), independently across edges,
where \(\varepsilon <\frac{1}{2}\) represents the noise level. We are interested in understanding for which values of p and \(\varepsilon \) is it possible to exactly recover z. It is easy to see that, just like in the example in Sect. 3.2, the maximum likelihood estimator is given by (6). Similarly, we consider its semidefinite relaxation (8) and investigate when \(X = zz^T\) is the unique solution of (8).
It is easy to see that Y is given by
\(Y = \mathrm {diag}(z)\left( A_G - 2A_H\right) \mathrm {diag}(z),\)
where \(A_G\) is the adjacency matrix of the underlying graph and \(A_H\) is the adjacency matrix of the graph consisting of the corrupted edges. In this case, we want conditions on \(\varepsilon \) and p under which \(zz^T\) is the unique solution to:
\(\max \ \mathrm {Tr}(YX) \quad \text {s.t. } X\succeq 0,\ X_{ii}=1 \text { for all } i.\)
Lemma 3.5 states that \(zz^T\) is indeed the unique solution as long as the second smallest eigenvalue of
\(D_G - 2D_H - \mathrm {diag}(z)\left( A_G - 2A_H\right) \mathrm {diag}(z)\)
is strictly positive. As \(\mathrm {diag}(z)\left( D_G - 2D_H \right) \mathrm {diag}(z) = D_G - 2D_H\) and conjugating by \(\mathrm {diag}(z)\) does not alter the eigenvalues, the second smallest eigenvalue of (18) being strictly positive is equivalent to
\(\lambda _2\left( D_G - A_G - 2\left( D_H - A_H \right) \right) > 0.\)
Since \(D_G - A_G - 2\left( D_H - A_H \right) = L_G - 2L_H\), where \(L_G\) and \(L_H\) are the Laplacians of, respectively, G and H, we define \(L_{\mathrm {Synch}}\) and write the condition in terms of \(L_{\mathrm {Synch}}\).
Definition 3.7
(\(L_{{\mathrm {Synch}}}\)) In the setting described above,
\(L_{\mathrm {Synch}} = L_G - 2L_H,\)
where G is the graph of all measurements and H is the graph of wrong measurements.
Then, (19) is equivalent to \(\lambda _2\left( L_{\mathrm {Synch}}\right) >0\). The following Lemma readily follows by noting that \(\mathbb {E}\left[ L_{\mathrm {Synch}} \right] = np(1-2\varepsilon )I_{n\times n} - p(1-2\varepsilon )\mathbf {1}\mathbf {1}^T\).
Lemma 3.8
Consider the \({\mathbb {Z}}_2\) Synchronization problem defined above and \(L_{\mathrm {Synch}}\) defined in Definition 3.7. As long as
\(\lambda _{\max }\left( \mathbb {E}L_{\mathrm {Synch}} - L_{\mathrm {Synch}}\right) < np(1-2\varepsilon ),\)
the semidefinite program (17) achieves exact recovery.
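A quick simulation (ours, not part of the paper) of the certificate behind this, via the equivalent condition \(\lambda _2\left( L_{\mathrm {Synch}}\right) >0\), in an easy regime; the parameters n, p, eps below are our choices:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, eps = 300, 0.3, 0.05

upper = np.triu(rng.random((n, n)) < p, k=1)            # edges of G ~ G(n, p)
wrong = upper & np.triu(rng.random((n, n)) < eps, k=1)  # corrupted edges H
A_G = (upper | upper.T).astype(float)
A_H = (wrong | wrong.T).astype(float)

L_G = np.diag(A_G.sum(axis=1)) - A_G
L_H = np.diag(A_H.sum(axis=1)) - A_H
L_synch = L_G - 2.0 * L_H                               # Definition 3.7

# lambda_2(L_Synch) > 0 certifies exact recovery by the semidefinite program.
assert np.linalg.eigvalsh(L_synch)[1] > 0
```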
In [1, 2], this largest eigenvalue is estimated using general purpose matrix concentration inequalities (such as the ones in [44]), obtaining a suboptimal bound. In contrast, we will carry out this estimate using Theorem 2.1.
Let us define, for a node i, \(\deg _+(i)\) as the number of noncorrupted edges incident to i and \(\deg _-(i)\) as the number of corrupted edges incident to i. We start by obtaining the following theorem.
Theorem 3.9
As long as \(n>2\), \(p>\frac{\log n}{2n}\), and \(p(1-2\varepsilon )^2 \le \frac{1}{2}\), there exists \(\Delta >0\) such that, with high probability, the following holds: If
then the semidefinite program (17) achieves exact recovery.
Proof
(of Theorem 3.9)
The idea is to apply Theorem 2.1 to \(L = - L_{\mathrm {Synch}} + \mathbb {E}\left[ L_{\mathrm {Synch}} \right] \). Note that L has independent off-diagonal entries and
\(\sum _{j\in [n]\setminus i}\mathbb {E}L_{ij}^2 = (n-1)p\left( 1 - p(1-2\varepsilon )^2\right) \ge \frac{(n-1)p}{2} \ge \frac{\log n}{8},\quad \text {while}\quad \Vert L_{ij}\Vert _{\infty }\le 2.\)
Hence, there exists a constant \(\Delta '\) such that, with high probability,
We just need to show that there exists \(\Delta >0\) such that if (20) holds, then
Recall that \( (L_{\mathrm {Synch}})_{ii} = \deg _{+}(i) - \deg _{-}(i) \) and \(\mathbb {E}(L_{\mathrm {Synch}})_{ii} = (n-1)p(1-2\varepsilon )\). We can rewrite (21) as
Straightforward algebraic manipulations show that there exists a constant \(\Delta \) such that
proving the theorem. \(\square \)
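As a sanity check on the identity \((L_{\mathrm {Synch}})_{ii} = \deg _{+}(i) - \deg _{-}(i)\) used in the proof, the following sketch builds \(L_{\mathrm {Synch}} = L_G - 2L_H\) for a simulated instance (the parameter values and variable names are ours, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, eps = 200, 0.2, 0.1

# G: Erdos-Renyi measurement graph; H: subgraph of corrupted measurements
upper = np.triu(rng.random((n, n)) < p, k=1)
A_G = (upper + upper.T).astype(int)
corrupt = np.triu(rng.random((n, n)) < eps, k=1)
A_H = A_G * (corrupt + corrupt.T)

L_G = np.diag(A_G.sum(axis=1)) - A_G
L_H = np.diag(A_H.sum(axis=1)) - A_H
L_synch = L_G - 2 * L_H

deg_plus = (A_G - A_H).sum(axis=1)   # noncorrupted edges incident to each node
deg_minus = A_H.sum(axis=1)          # corrupted edges incident to each node
assert np.array_equal(np.diag(L_synch), deg_plus - deg_minus)
```

Note that averaging the diagonal entries over many samples would recover \(\mathbb {E}(L_{\mathrm {Synch}})_{ii} = (n-1)p(1-2\varepsilon )\), as used above.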
We note that if \(p \le \frac{\log n}{2n}\), then Theorem 3.3 implies that, with high probability, the underlying graph is disconnected, implying impossibility of exact recovery. We also note that if we do not have
then the maximum likelihood estimator does not match the ground truth, rendering exact recovery unrealistic (Footnote 9). The optimality of this analysis hinges upon the fact that the right-hand side of (20) is asymptotically smaller than the expectation of \(\deg _{+}(i) - \deg _{-}(i)\), suggesting that (20) and (22) have similar probabilities and the same phase transition.
The next theorem establishes the optimality of the semidefinite programming-based approach in a particular regime, solving a problem raised in [1, 2]. While it is clear that one can use Theorem 3.9 to establish similar results for many other regimes (for some, through estimates similar to the ones in Lemma 3.17), the main purpose of this paper is not to perform a detailed analysis of this problem but rather to illustrate the efficacy of these semidefinite relaxations and the fundamental connections between these different phenomena, through Theorem 2.1. The independent parallel research efforts of Hajek et al. [29] address other regimes for this particular problem; we refer the interested reader there.
Corollary 3.10
As long as \(\varepsilon < \frac{1}{2}\) and \(p(1-2\varepsilon )^2 \le \frac{1}{2}\), there exists a constant K for which the following holds: If there exists \(\delta >0\) such that
then the semidefinite program (17) achieves exact recovery with high probability.
Before proving this corollary, we emphasize how it solves the problem, raised in [1, 2], of whether the semidefinite programming approach for \({\mathbb {Z}}_2\) Synchronization is optimal in the low signal-to-noise regime. In fact, the results in [1, 2] ensure that the threshold in Corollary 3.10 is optimal for, at least, an interesting range of values of \(\varepsilon \). Empowered with Theorem 3.9, the proof of this corollary becomes rather elementary.
Proof
(of Corollary 3.10)
This corollary will be established with a simple use of Bernstein’s inequality.
Our goal is to show that, given \(\Delta \), there exist \(K\) and \(\delta \) such that, under the hypothesis of the corollary,
holds with high probability. This implies, via Theorem 3.9, that the semidefinite program (17) achieves exact recovery with high probability.
We will consider n to be large enough. We start by noting that it suffices to show that there exists \( \delta > 0 \) such that, for each \(i\in [n]\) separately,
Indeed, (24) together with a union bound over the n nodes of the graph would establish the corollary.
Throughout the rest of the proof, we will fix \(i\in [n]\) and use \(\deg _{+}\) and \(\deg _{-}\) to denote, respectively, \(\deg _{+}(i)\) and \(\deg _{-}(i)\). It is easy to see that
where \(x_j\) are i.i.d. centered random variables with distribution
For any \(t>0\), Bernstein’s inequality gives
Taking \(t = \left[ 1 - \frac{\Delta }{\sqrt{\log n}} \right] (n-1) p (1-2\varepsilon ) \) gives
Condition (23) (for a K to be determined later) guarantees that
meaning that we just need to show that there exists \(K>0\) for which
Note that \(\frac{1}{p}\mathbb {E}x_j^2 = 1 + p(1-2\varepsilon ) \le 1 + (1-2\varepsilon )\) and \(\Vert x_j\Vert _{\infty } = 1 + p(1-2\varepsilon ) \le 2\), implying that
Also, \(\left[ 1 - \frac{\Delta }{\sqrt{\log n}} \right] ^{2} \ge 1 - \frac{2\Delta }{\sqrt{\log n}}\). The corollary is then proved by noting that there exists \(K>0\) such that
\(\square \)
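The role Bernstein's inequality plays in the proof above can be observed numerically. In the sketch below, the distribution of \(x_j\) is reconstructed from the description in the proof (each measurement contributes \(+1\) if present and correct, \(-1\) if present and corrupted, 0 if absent, centered by its mean \(p(1-2\varepsilon )\)); treat this reconstruction, and the parameter values, as our assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, eps = 500, 0.1, 0.2

vals = np.array([1.0, -1.0, 0.0])
probs = np.array([p * (1 - eps), p * eps, 1 - p])
mean = vals @ probs                      # = p(1 - 2*eps)
var = (vals - mean) ** 2 @ probs         # variance of a single centered x_j
b = 1 + mean                             # almost-sure bound on |x_j|

trials = 20000
x = rng.choice(vals, size=(trials, m), p=probs) - mean
t = 0.5 * m * mean                       # deviation at half the expected degree gap
emp_tail = (x.sum(axis=1) <= -t).mean()  # empirical lower tail of sum_j x_j
bernstein = np.exp(-t**2 / (2 * (m * var + b * t / 3)))
assert emp_tail <= bernstein             # Bernstein's bound dominates the empirical tail
```

As expected, the empirical tail sits well below the Bernstein bound; the proof only needs the bound to decay faster than \(n^{-1}\).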
3.4 Stochastic Block Model with Two Communities
We shift our attention to the problem of exact recovery in the stochastic block model with two communities. Recall Definition 1.2: for n even and \(0\le q<p \le 1\), we say that a graph G with n nodes is drawn from the stochastic block model with two communities \(\mathcal {G}(n,p,q)\) if the nodes are divided into two sets of \(\frac{n}{2}\) nodes each and, for each pair of vertices i, j, (i, j) is an edge of G with probability p if i and j are in the same cluster and q otherwise, independently of all other edges. Let \(g\in \{\pm 1\}^{n}\) be the vector that is 1 in one of the clusters and \(-1\) in the other; our task is to recover g.
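A minimal sampler for this model reads as follows (a sketch; the function name and parameters are ours):

```python
import numpy as np

def sample_sbm(n, p, q, rng):
    """Draw (A, g) from the two-community stochastic block model G(n, p, q)."""
    g = np.ones(n)
    g[n // 2:] = -1                       # planted balanced partition
    same = np.outer(g, g) > 0             # True iff i and j are in the same cluster
    upper = np.triu(rng.random((n, n)) < np.where(same, p, q), k=1)
    A = (upper + upper.T).astype(int)     # symmetric 0/1 adjacency, zero diagonal
    return A, g

A, g = sample_sbm(200, 0.5, 0.1, np.random.default_rng(2))
```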
The maximum likelihood estimator for g is given by
where B is the signed adjacency matrix of G, meaning that \(B_{ij}=1\) if (i, j) is an edge of G and \(B_{ij}=-1\) otherwise. Note that \(B = 2A - \left( \mathbf {1}\mathbf {1}^T - I \right) \), where A is the adjacency matrix. We will drop the balance constraint \(\sum _{i=1}^n x_i = 0\), arriving at (6) for \(Y=B\). The intuitive justification is that there are enough \(-1\) entries in B to discourage unbalanced solutions. As in the problems considered above, we will consider the semidefinite relaxation (8).
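The relation \(B = 2A - (\mathbf {1}\mathbf {1}^T - I)\) can be checked directly on a small instance (the instance below is arbitrary, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
upper = np.triu(rng.integers(0, 2, (n, n)), k=1)
A = upper + upper.T                              # adjacency matrix
B = 2 * A - (np.ones((n, n)) - np.eye(n))        # signed adjacency

off_diag = ~np.eye(n, dtype=bool)
assert np.all(B[(A == 1) & off_diag] == 1)       # edges get +1
assert np.all(B[(A == 0) & off_diag] == -1)      # non-edges get -1
assert np.all(np.diag(B) == 0)
```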
We want to understand when it is that \(X=gg^T\) is the unique solution of (26). Lemma 3.5 shows that \(gg^T\) is indeed the unique solution of (26) as long as the second smallest eigenvalue of
is strictly positive.
Let us introduce a new matrix.
Definition 3.11
(\(\Gamma _{SBM}\)) Given a graph G drawn from the stochastic block model with two clusters,
where \(\mathcal {D}_{+}\) is a diagonal matrix of inner degrees, \(\mathcal {D}_{-}\) is a diagonal matrix of outer degrees and A is the adjacency matrix of the graph.
It is easy to see that \(D_{\left[ \mathrm {diag}(g)A\mathrm {diag}(g)\right] } = \mathcal {D}_+ - \mathcal {D}_-\). In fact,
which means that \(gg^T\) is the unique solution of (26) as long as \( \lambda _2\left( 2\Gamma _{\mathrm {SBM}} + \mathbf {1}\mathbf {1}^T \right) > 0. \)
Note that
If we suppose that \(p<\frac{1}{2}\), then \(1 - (p+q) > p-q\), and so the second smallest eigenvalue of \(\mathbb {E}\left[ 2\Gamma _{\mathrm {SBM}} + \mathbf {1}\mathbf {1}^T \right] \) is \(n \left( p-q \right) \). This establishes the following lemma.
Lemma 3.12
Let \(n\ge 4\) be even and let G be drawn from \(\mathcal {G}(n, p, q)\) with edge probabilities \(p<\frac{1}{2}\) and \(q<p\). As long as
the semidefinite program (26) for the stochastic block model problem achieves exact recovery, meaning that \(gg^T\) is its unique solution.
Estimating this largest eigenvalue using Theorem 2.1, we obtain the following theorem.
Theorem 3.13
Let \(n\ge 4\) be even and let G be drawn from \(\mathcal {G}(n, p, q)\). As long as \(\frac{\log n}{3n}<p<\frac{1}{2}\) and \(q<p\), there exists \(\Delta >0\) such that, with high probability, the following holds: If
then the semidefinite program (26) achieves exact recovery.
Proof
The idea is again to apply Theorem 2.1. One obstacle is that \(\Gamma _{\mathrm {SBM}}\) is not a Laplacian matrix. Let g denote the vector that is 1 in a cluster and \(-1\) in the other, and let \(\mathrm {diag}(g)\) denote a diagonal matrix with the entries of g on the diagonal. We define
Note that \(\Gamma '_{\mathrm {SBM}}\) is a Laplacian and both the eigenvalues and the diagonal elements of \( \mathbb {E}\left[ \Gamma '_{\mathrm {SBM}} \right] - \Gamma '_{\mathrm {SBM}}\) are the same as those of \( \mathbb {E}\left[ \Gamma _{\mathrm {SBM}} \right] - \Gamma _{\mathrm {SBM}} \).
We apply Theorem 2.1 to \(L = - \Gamma '_{\mathrm {SBM}} + \mathbb {E}\left[ \Gamma '_{\mathrm {SBM}} \right] \). Note that L has independent off-diagonal entries and
Hence, there exists a constant \(\Delta '\) such that, with high probability,
which is equivalent to
We just need to show that there exists \(\Delta >0\) such that if (28) holds, then
Note that \((\Gamma _{\mathrm {SBM}})_{ii} = \deg _\mathrm{in}(i) - \deg _\mathrm{out}(i)\) and
Condition (28) can thus be rewritten as
The theorem is then proven by noting that, for any \(\Delta '\), there exists \(\Delta \) such that
\(\square \)
As a corollary of this theorem, we can establish a sharp threshold for the semidefinite program (26) to achieve exact recovery in the stochastic block model with two clusters, solving a problem posed in [3]. We recall that this problem was simultaneously solved by the parallel research efforts of Hajek et al. [28].
We first show a lemma concerning \(\min _{i}\left( \deg _\mathrm{in}(i) - \deg _\mathrm{out}(i) \right) \), analogous to Lemma 3.1.
Lemma 3.14
Let G be a random graph with n nodes drawn according to the stochastic block model on two communities with edge probabilities p and q. Let \(p = \frac{\alpha \log n}{n}\) and \(q = \frac{\beta \log n}{n}\), where \(\alpha > \beta \) are constants. Then, for any constant \(\Delta > 0\):
(1) If
$$\begin{aligned} \sqrt{\alpha } - \sqrt{\beta } > \sqrt{2}, \end{aligned}$$
(31)
then, with high probability,
$$\begin{aligned} \min _{i}\left( \deg _\mathrm{in}(i) - \deg _\mathrm{out}(i) \right) \ge \frac{\Delta }{\sqrt{\log n}} \mathbb {E}\left[ \deg _\mathrm{in}(i) - \deg _\mathrm{out}(i) \right] . \end{aligned}$$
(2) On the other hand, if
$$\begin{aligned} \sqrt{\alpha } - \sqrt{\beta } < \sqrt{2}, \end{aligned}$$
(32)
then, with high probability,
$$\begin{aligned} \min _{i}\left( \deg _\mathrm{in}(i) - \deg _\mathrm{out}(i) \right) < 0, \end{aligned}$$
and exact recovery is impossible.
Part (2) is proven in [3], so we will focus on part (1). Before proving this lemma, we note that, together with Theorem 3.13, it immediately implies the following corollary.
Corollary 3.15
Let G be a random graph with n nodes drawn according to the stochastic block model on two communities with edge probabilities p and q. Let \(p = \frac{\alpha \log n}{n}\) and \(q = \frac{\beta \log n}{n}\), where \(\alpha > \beta \) are constants. Then, as long as
the solution of the semidefinite program (26) coincides with the true partition with high probability.
In order to establish Lemma 3.14, we will borrow an estimate from [3].
Definition 3.16
(Definition 3 in [3]) Let m be a natural number, \(p,q\in [0,1]\), and \(\delta \in \mathbb {R}\). We define
where \(W_1,\dots , W_m\) are i.i.d. \(\mathrm {Bernoulli}(p)\) and \(Z_1,\dots , Z_m\) are i.i.d. \(\mathrm {Bernoulli}(q)\), independent of \(W_1,\dots , W_m\).
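The displayed formula of the definition is omitted in the extraction above; the quantity can nonetheless be estimated by simulation. The sketch below assumes the event is of the form \(\sum _j Z_j - \sum _j W_j \ge \delta \); this is our assumption, and the precise event should be taken from Definition 3 in [3]:

```python
import numpy as np

def T_mc(m, p, q, delta, trials, rng):
    # Monte Carlo estimate of P(sum_j Z_j - sum_j W_j >= delta);
    # the exact event in Definition 3.16 is an assumption here
    W = rng.random((trials, m)) < p
    Z = rng.random((trials, m)) < q
    return np.mean(Z.sum(axis=1) - W.sum(axis=1) >= delta)

# with p > q the estimate decays quickly as delta grows
est = T_mc(100, 0.3, 0.1, 10, 50_000, np.random.default_rng(4))
```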
Lemma 3.17
Recall Definition 3.16. Let \(\alpha \), \(\beta \), and \(\Delta '\) be constants. Then,
with \(\displaystyle {\lim _{n \rightarrow \infty }\delta (n) = 0}\).
Proof
The proof of this lemma is obtained by straightforward adaptations to the proof of Lemma 8 in [3]. \(\square \)
We are now ready to prove Lemma 3.14.
Proof
(of Lemma 3.14)
Let \(\alpha > \beta \) be constants satisfying condition (31). Given \(\Delta > 0\), we want to show that, with high probability,
Let us fix i throughout the rest of the proof. It is clear that we can write
where \(W_1,\dots , W_m\) are i.i.d. \(\mathrm {Bernoulli}(p)\) and \(Z_1,\dots , Z_m\) are i.i.d. \(\mathrm {Bernoulli}(q)\), independent of \(W_1,\dots , W_m\). Hence, since
the probability of \( \deg _\mathrm{in}(i) - \deg _\mathrm{out}(i) < \frac{\Delta }{\sqrt{\log n}} \left( \frac{n}{2}(p-q)\right) \) is equal to
which is upper bounded by,
Take \(\Delta ' = \Delta \left( \frac{\alpha - \beta }{2}\right) + 1\) and recall Definition 3.16, then
where \(\lim _{n \rightarrow \infty }\delta (n)=0\), and the last inequality used Lemma 3.17.
Via a simple union bound, it is easy to see that,
which means that, as long as \(\frac{\alpha + \beta }{2} - \sqrt{\alpha \beta }>1\), (34) holds with high probability. Straightforward algebraic manipulations show that (31) implies this condition, concluding the proof of the lemma. \(\square \)
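The phase transition in Lemma 3.14 can be probed empirically. The sketch below computes \(\min _{i}\left( \deg _\mathrm{in}(i) - \deg _\mathrm{out}(i) \right) \) for one sample (a toy experiment with illustrative parameters; at moderate n the transition is blurry, so we do not assert its sign):

```python
import numpy as np

def min_degree_gap(n, alpha, beta, rng):
    """min_i (deg_in(i) - deg_out(i)) for one SBM sample with
    p = alpha*log(n)/n and q = beta*log(n)/n."""
    p, q = alpha * np.log(n) / n, beta * np.log(n) / n
    g = np.ones(n)
    g[n // 2:] = -1
    same = np.outer(g, g) > 0
    upper = np.triu(rng.random((n, n)) < np.where(same, p, q), k=1)
    A = (upper + upper.T).astype(int)
    deg_in = (A * same).sum(axis=1)      # edges inside own cluster
    deg_out = (A * ~same).sum(axis=1)    # edges across clusters
    return int((deg_in - deg_out).min())

# sqrt(9) - sqrt(1) = 2 > sqrt(2): above the threshold of part (1)
gap = min_degree_gap(1000, 9.0, 1.0, np.random.default_rng(5))
```

Below the threshold (e.g., \(\alpha =2\), \(\beta =1\), so \(\sqrt{\alpha }-\sqrt{\beta }<\sqrt{2}\)), the minimum is typically already negative at moderate n, in line with part (2).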
4 Proof of the Main Result
We will prove Theorems 2.1 and 2.2 through a few lemmas. Let us define X as the nondiagonal part of \(-L\) and \(y\in \mathbb {R}^n\) as \(y = \mathrm {diag}\left( D_X\right) \), meaning that \(y=\mathrm {diag}(L)\). Then, \(L = D_X - X\). We will separately lower bound \(\max _i{y_i}\) and upper bound \(\Vert X\Vert \). The upper bound on \(\Vert X\Vert \) is obtained by a direct application of a result in [10].
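In code, the decomposition \(L = D_X - X\) reads as follows (a sketch with our own naming; for a Laplacian, the row sums of X recover the diagonal y):

```python
import numpy as np

def split_laplacian(L):
    """Return (y, X): y = diag(L) and X the off-diagonal part of -L, so L = D_X - X."""
    X = -(L - np.diag(np.diag(L)))
    y = np.diag(L).copy()
    return y, X

# round-trip check on a random Laplacian
rng = np.random.default_rng(6)
W = np.triu(rng.standard_normal((10, 10)), k=1)
X0 = W + W.T                              # off-diagonal part, zero diagonal
L = np.diag(X0.sum(axis=1)) - X0          # L = D_X - X
y, X = split_laplacian(L)
assert np.allclose(X, X0) and np.allclose(X.sum(axis=1), y)
```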
Lemma 4.1
(Remark 3.13 in [10]). Let X be an \(n\times n\) symmetric matrix with independent centered entries. Then, there exists a universal constant \(c'\) such that, for every \(t\ge 0\),
where we have defined
Before continuing with the proof, let us recall the main idea: Lemma 4.1 gives that, with high probability,
where X is the off-diagonal part of \(-L\). On the other hand, \(L_{ii} = \sum _{j\in [n]\setminus i} X_{ij}\) has variance \(\sigma ^2\). The central limit theorem would thus suggest that \(L_{ii}\) behaves like a Gaussian of variance \(\sigma ^2\). Since different sums share only a single summand, they are “almost” independent, suggesting that \(\max _{i}L_{ii} \sim \sigma \sqrt{\log n}\); this would imply the theorems. The proof that follows makes this argument precise.
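The heuristic \(\max _i L_{ii} \sim \sigma \sqrt{\log n}\) is easy to observe numerically. A toy check with \(\pm 1\) off-diagonal entries (our choice of distribution, so that \(\sigma = \sqrt{n-1}\); the tolerance in the final check is deliberately loose):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
upper = np.triu(rng.choice([-1.0, 1.0], (n, n)), k=1)
X = upper + upper.T                  # independent +-1 off-diagonal entries
y = X.sum(axis=1)                    # y_i = L_ii for L = D_X - X
sigma = np.sqrt(n - 1)               # std of each row sum
ratio = y.max() / (sigma * np.sqrt(np.log(n)))
# nearly independent, nearly Gaussian row sums put the max at ~ sigma*sqrt(log n)
assert 0.5 < ratio < 2.5
```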
We turn our attention to a lower bound on \(\max _i{y_i}\). Recall that
More specifically, we are looking for an upper bound on
for a suitable value of t. We note that if the \(y_i\)’s were independent, then this could be easily done via lower bounds on the upper tail of each \(y_i\). Furthermore, if the random variable \(y_i\) were Gaussian, obtaining such lower bounds would be trivial. Unfortunately, the random variables in question are neither independent nor Gaussian, forcing major adaptations to this argument. In fact, we will actually start by lower bounding
We will obtain such a bound via a comparison (using Jensen’s inequality) with the maximum among certain independent random variables.
Lemma 4.2
Let \(\mathcal {I}\) and \(\mathcal {J}\) be disjoint subsets of [n]. For \(i\in \mathcal {I}\) define \(z_i\) as
Then
Proof
Since \(\mathcal {I}\cap \mathcal {J}=\emptyset \), \(\{X_{ij}\}_{i\in \mathcal {I},j\in \mathcal {J}}\) is independent from \(\{X_{ij}\}_{i\in \mathcal {I},j\notin \mathcal {J}}\), and so Jensen’s inequality gives
\(\square \)
The following lemma guarantees the existence of sets \(\mathcal {I}\) and \(\mathcal {J}\) with desired properties.
Lemma 4.3
There exist \(\mathcal {I}\) and \(\mathcal {J}\) disjoint subsets of [n] such that
and, for every \(i\in \mathcal {I}\),
where \(z_i\) is defined, as in (36), to be \(z_i = \sum _{j\in \mathcal {J}} X_{ij}\).
Proof
Given the matrix X, we start by constructing a weighted graph on n nodes with weights \(w_{ij} = \mathbb {E}X_{ij}^2\) (note that \(w_{ii}=0\) for all i). Let \((S,S^c)\) be a partition of the vertices of this graph, with \(|S|\ge \frac{n}{2}\), that maximizes the cut
It is easy to see that the maximum cut needs to be at least half of the total edge weight (Footnote 10). This readily implies
Consider \(z_i\), for \(i\in S\), defined as
We proceed by claiming that the set \(\mathcal {I}\subset S\) of indices \(i\in S\) for which
satisfies \(|\mathcal {I}| \ge \frac{1}{8} n\). Thus, taking \(\mathcal {J}= S^c\) would establish the lemma.
To justify the claim, note that
and
implying that \(\left( |\mathcal {I}|+ \frac{1}{8}n\right) \sigma ^2 \ge \frac{1}{4} n \sigma ^2\), and hence \(|\mathcal {I}| \ge \frac{1}{8} n\). \(\square \)
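The greedy construction behind the half-weight cut (Footnote 10) is worth making explicit; the sketch below is our own implementation of the standard argument:

```python
import numpy as np

def greedy_half_cut(W):
    """Place nodes one by one on the side that cuts more weight to already-placed
    nodes; the resulting cut carries at least half the total edge weight."""
    n = W.shape[0]
    side = np.zeros(n, dtype=int)
    for i in range(1, n):
        to_side0 = W[i, :i][side[:i] == 0].sum()
        to_side1 = W[i, :i][side[:i] == 1].sum()
        side[i] = 0 if to_side1 >= to_side0 else 1  # join the side it cuts more toward
    return side

rng = np.random.default_rng(8)
W = np.triu(rng.random((60, 60)), k=1)
W = W + W.T                                          # symmetric weights, zero diagonal
side = greedy_half_cut(W)
cut = W[np.ix_(side == 0, side == 1)].sum()
total = W[np.triu_indices(60, 1)].sum()
assert cut >= 0.5 * total
```

Since the cut value is symmetric in \((S, S^c)\), the requirement \(|S|\ge \frac{n}{2}\) in the proof is met by simply letting S be the larger of the two sides.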
We now proceed by obtaining a lower bound for \(\mathbb {E}\max _{i\in \mathcal {I}} z_i\), where \(\mathcal {I}\) and \(z_i\) are defined to satisfy the conditions in Lemma 4.3. We note that at this point the random variables \(z_i\) are independent and each is a sum of independent random variables. We use Lemma 8.1 of [31] (for a fixed constant \(\gamma = 1\)) to obtain a lower bound on the upper tail of each \(z_i\).
Lemma 4.4
(Lemma 8.1 of [31]) In the setting described above, there exist two universal positive constants K and \(\varepsilon \) such that for every t satisfying \(t\ge K \frac{\sigma }{8} \) and \(t \le \varepsilon \frac{\sigma ^2}{\sqrt{8}\sigma _\infty }\), we have (for every \(i\in \mathcal {I}\) separately)
We are now ready to establish a lower bound on \(\mathbb {E}\max _{i\in [n]} y_i\).
Lemma 4.5
In the setting described above, there exist two universal positive constants K and \(\varepsilon \) such that for every t satisfying \(t\ge K \frac{\sigma }{8} \) and \(t \le \varepsilon \frac{\sigma ^2}{\sqrt{8}\sigma _\infty }\), we have
Proof
Let K and \(\varepsilon \) be the universal constants in Lemma 4.4 and t such that \(K \frac{\sigma }{8} \le t \le \varepsilon \frac{\sigma ^2}{\sqrt{8}\sigma _\infty }\). Lemma 4.4 guarantees that, for any \(i\in \mathcal {I}\),
Due to the independence of the random variables \(z_i\), we have
where the second to last inequality follows from the fact that \(|\mathcal {I}|\ge \frac{1}{8} n\) and the last from the fact that \(\left( 1-\frac{1}{x}\right) ^x\le \exp (-1)\) for \(x>1\).
Since \(\Vert X_{ij}\Vert _{\infty } \le \sigma _{\infty }\), we have that, almost surely, \(z_i\ge -(n-1)\sigma _{\infty }\). Thus,
which establishes the lemma. \(\square \)
The last ingredient we need is a concentration result to control the lower tail of \(\max _{i\in [n]} y_i\) by controlling its fluctuations around \(\mathbb {E}\max _{i\in [n]} y_i\). We make use of a result in [33].
Lemma 4.6
In the setting described above, define v as
where \(X'\) is an independent identically distributed copy of X.
Then, for any \(x>0\):
Proof
This lemma is a direct consequence of Theorem 12 in [33] by taking the independent random variables to be \(Y_{(i,j)}\) such that \(Y_{(i,j),t}=X_{ij}\) if \(t=i\) and \(Y_{(i,j),t}=0\) otherwise. We note that there is a small typo (in the definition of the quantity v) in the theorem as stated in [33]. \(\square \)
At this point, we need an upper bound on the quantity v defined in (37). This is the purpose of the following lemma.
Lemma 4.7
In the setting above, let \(X'\) be an independent identically distributed copy of X, then
Proof
We apply a Rosenthal-type inequality from Theorem 8 of [13], for each \(i\in [n]\) separately, and get, for any integer p and \(0<\delta <1\),
It is easy to see that
Thus, taking \(p=\lceil \alpha \log n \rceil \) for some \(\alpha >0\) gives
Taking, for example, \(\delta = 0.5\) and \(\alpha = 1\) gives
\(\square \)
We now collect all our bounds in a master lemma.
Lemma 4.8
In the setting described above, there exist universal constants \(K>0\) and \(\varepsilon >0\) such that, for any t satisfying \( K \frac{\sigma }{8} \le t \le \varepsilon \frac{\sigma ^2}{\sqrt{8}\sigma _\infty } \), we have
Proof
Let \(t>0\) satisfy the hypothesis of the lemma, and \(x>0\).
Recall that Lemma 4.6 gives
On the other hand, Lemmas 4.5 and 4.7 control, respectively, \(\mathbb {E}\left[ \max _{i\in [n]} y_i \right] \) and v, giving
and
Combining all these bounds,
Taking \(x=t/2\) establishes the lemma. \(\square \)
At this point, the proofs of Theorems 2.1 and 2.2 will consist essentially of applying Lemma 4.8 for appropriate values of t.
Proof
(of Theorem 2.1)
Let \(\beta >0\) be a constant to be defined later. Taking \(t = \beta \sigma \sqrt{\log n}\) in Lemma 4.8 gives that, in the setting described above,
provided that \(K \frac{\sigma }{8} \le \beta \sigma \sqrt{\log n} \le \varepsilon \frac{\sigma ^2}{\sqrt{8}\sigma _\infty }\), where K and \(\varepsilon \) are the universal constants in Lemma 4.8.
We start by noting that if \(0<\beta < \frac{1}{\sqrt{8}}\) independent of n, then, for n large enough (not depending on \(\sigma \) or \(\sigma _{\infty }\)),
Thus, provided that \(\frac{K}{8\sqrt{\log n}} \le \beta \le \min \left\{ \varepsilon \frac{\sigma }{\sqrt{8\log n}\sigma _\infty }, \frac{1}{3} \right\} \),
Let c be the constant in the hypothesis of the theorem, then \(\sigma > c\sqrt{\log n}\sigma _\infty \).
Let \(\beta = \min \left\{ \frac{\varepsilon c}{\sqrt{8}}, \frac{1}{3} \right\} \). Clearly, for n large enough,
and
This implies that there exist constants \(c_1', C_1'\), and \(\beta _1'\) such that
Recall that Lemma 4.1 ensures that, for a universal constant \(c'\), and for every \(u\ge 0\), by taking \( t = u\sigma \),
It is easy to see that \(ne^{-u^2\sigma ^2/c'\sigma _{\infty }^2} \le ne^{-u^2(\log n)c/c'} = n^{1-u^2c/c'}\). Taking \(u = \sqrt{2c'/c}\) gives
This means that, with probability at least \(1 - c_1'n^{ -\beta _1' } - n^{-1}\), we have
which, together with the fact that \(\lambda _{\max }(L) \le \Vert X\Vert + \max _{i\in [n]}L_{ii}\), establishes the theorem. \(\square \)
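Theorem 2.1 is easy to observe in simulation. Below is a toy instance with \(\pm 1\) off-diagonal entries, so that \(\sigma _\infty = 1\) and \(\sigma = \sqrt{n-1} \gg \sqrt{\log n}\,\sigma _\infty \); the factor 2 in the final assertion is a loose sanity bound, not the theorem's \(1+o(1)\) rate:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 1500
upper = np.triu(rng.choice([-1.0, 1.0], (n, n)), k=1)
X = upper + upper.T
L = np.diag(X.sum(axis=1)) - X               # centered random Laplacian
lam_max = np.linalg.eigvalsh(L)[-1]          # eigvalsh returns ascending eigenvalues
max_diag = np.diag(L).max()
assert max_diag <= lam_max                   # always: largest eigenvalue >= largest diagonal entry
assert lam_max <= 2.0 * max_diag             # loose check of lam_max = (1 + o(1)) * max_diag
```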
Proof
(of Theorem 2.2)
If \(\sigma > \sqrt{\log n} \sigma _\infty \), then the result follows immediately from Theorem 2.1. For that reason, we restrict our attention to the instances with \(\sigma \le \sqrt{\log n} \sigma _\infty \). We start by setting
Recall that there exist c and \(\gamma >0\) such that \(\sigma \ge c\left( \log n\right) ^{\frac{1}{4} + \gamma }\sigma _{\infty }\), or equivalently
This guarantees that, for n large enough (not depending on \(\sigma \) or \(\sigma _{\infty }\)), the conditions in Lemma 4.8 are satisfied. In fact,
Hence, Lemma 4.8 gives, for t as in (40),
We proceed by noting that, for \(t=2\sigma \left( \frac{\sigma }{\sigma _\infty } \right) ^{\frac{1}{2}} (\log n)^{\frac{1}{8}}\) and n large enough (not depending on \(\sigma \) or \(\sigma _{\infty }\)),
In fact, since \(\sigma \le \sigma _\infty \sqrt{\log n}\),
decreases faster than any polynomial.
Hence, since \(t \ge 2\sigma \sqrt{c}\left( \log n\right) ^{\frac{1}{4} + \frac{\gamma }{2}}\),
We proceed by noting that
Since \(\frac{\sigma _\infty }{\sigma } \le \frac{1}{c}(\log n)^{-\frac{1}{4}-\gamma }\), we have that, for n large enough and a constant \(c''\)
At this point, we upper bound \(\Vert X\Vert \), as in the proof of Theorem 2.1. Recall, as in (39), for any \(u>0\),
Hence,
Taking \(u = (\log n)^{\frac{1}{4}}\) gives
The rest of the proof follows the final arguments in the proof of Theorem 2.1. \(\square \)
5 Conclusion and Future Directions
Theorems 2.1 and 2.2 are valid for matrices whose entries may be distributed in very different ways. This potentially allows one to use them in order to obtain strong guarantees for deterministically censored versions of the problems described, where the measurements are obtained only for edges of a deterministic graph (a similar model was studied, for example, in [1]).
The problem of recovery in the stochastic block model with multiple balanced clusters, also referred to as multisection, is a natural generalization of the one considered here and also admits a semidefinite relaxation. While the results here do not seem to be directly applicable in the analysis of that algorithm, in part because the construction of a dual certificate in that setting is considerably more involved, some of the ideas in the present paper can be adapted for the estimates needed there. These also provide interpretable and sharp guarantees. We refer the interested reader to [4].
Regarding directions for future investigation, on the random matrix side it would be interesting to understand what happens when \(\sigma \gg \sigma _{\infty }\) but \(\frac{\sigma }{\sigma _{\infty }} = o\left( (\log n)^{\frac{1}{4}}\right) \), as this setting is not captured by our results. It would also be particularly interesting to understand whether analogs of these results exist when, for instance, the off-diagonal entries of L are not independent (Footnote 11).
From the point of view of applications, a natural question is which other semidefinite relaxations have these optimality guarantees. A general understanding in that direction would be remarkable.
Notes
Our results will be of nonasymptotic nature (we refer the interested reader to [47] for a tutorial on nonasymptotic estimates in random matrix theory).
The Erdős–Rényi model for random graphs will be discussed in more detail in Sect. 3.1.
This will be done by considering a centered version of the Laplacian of the random graph, and relating also the spectral properties and diagonal of it to the ones of the original Laplacian.
When the optimal solution of a semidefinite relaxation is the optimal solution of the original problem, we say that the relaxation is tight.
The diagonal entries of \(D_W\) are not independent because each pair of sums shares a term \(W_{ij}\) as a summand.
Recall that, if we assume a uniform prior, the MLE is the method that maximizes the probability of exact recovery.
One can build such a cut by consecutively selecting memberships for each node in a greedy fashion so as to maximize the number of incident edges cut; see [41].
For the particular example of connectivity of an Erdős–Rényi graph, it is possible to use the matrix concentration approach [44, 45] to obtain a guarantee that, while being a factor away from optimal, appears to be adaptable to instances where edges have particular types of dependencies—we refer the reader to Sect. 5.3. in the monograph [45].
References
E. Abbe, A. S. Bandeira, A. Bracher, and A. Singer. Decoding binary node labels from censored edge measurements: Phase transition and efficient recovery. Network Science and Engineering, IEEE Transactions on, 1(1):10–22, Jan 2014.
E. Abbe, A. S. Bandeira, A. Bracher, and A. Singer. Linear inverse problems on Erdős-Rényi graphs: Information-theoretic limits and efficient recovery. IEEE International Symposium on Information Theory (ISIT2014), 2014.
E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. Available online at arXiv:1405.3267v4 [cs.SI], 2014.
N. Agarwal, A. S. Bandeira, K. Koiliaris, and A. Kolla. Multisection in the stochastic block model using semidefinite programming. Available online at arXiv:1507.02323 [cs.DS], 2015.
F. Alizadeh. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM Journal on Optimization, 5:13–51, 1993.
G. W. Anderson, A. Guionnet, and O. Zeitouni. An introduction to random matrices. Cambridge studies in advanced mathematics. Cambridge University Press, Cambridge, New York, Melbourne, 2010.
A. S. Bandeira, N. Boumal, and A. Singer. Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Available online at arXiv:1411.3272 [math.OC], 2014.
A. S. Bandeira, Y. Khoo, and A. Singer. Open problem: Tightness of maximum likelihood semidefinite relaxations. In Proceedings of the 27th Conference on Learning Theory, volume 35 of JMLR W&CP, pages 1265–1267, 2014.
A. S. Bandeira, A. Singer, and D. A. Spielman. A Cheeger inequality for the graph connection Laplacian. SIAM J. Matrix Anal. Appl., 34(4):1611–1630, 2013.
A. S. Bandeira and R. van Handel. Sharp nonasymptotic bounds on the norm of random matrices with independent entries. Annals of Probability, to appear, 2015.
N. Bansal, A. Blum, and S. Chawla. Correlation clustering. Machine Learning, 56(1-3):89–113, 2004.
R. B. Boppana. Eigenvalues and graph bisection: An average-case analysis. In 28th Annual Symposium on Foundations of Computer Science, pages 280–285, 1987.
S. Boucheron, O. Bousquet, G. Lugosi, and P. Massart. Moment inequalities for functions of independent random variables. Ann. Probab., 33(2):514–560, 2005.
W. Bryc, A. Dembo, and T. Jiang. Spectral measure of large random Hankel, Markov and Toeplitz matrices. The Annals of Probability, 34(1):1–38, 2006.
Y. Chen and A. J. Goldsmith. Information recovery from pairwise measurements. IEEE International Symposium on Information Theory (ISIT2014), 2014.
Y. Chen, C. Suh, and A. J. Goldsmith. Information recovery from pairwise measurements: A Shannon-theoretic approach. Available online at arXiv:1504.01369 [cs.IT], 2015.
F. Chung and L. Lu. Complex Graphs and Networks (Cbms Regional Conference Series in Mathematics). American Mathematical Society, Boston, MA, USA, 2006.
F. R. K. Chung. Spectral Graph Theory. AMS, 1997.
M. Cucuringu. Synchronization over Z2 and community detection in signed multiplex networks with constraints. Journal of Complex Networks, 2015.
K. Davidson and S. Szarek. Local operator theory, random matrices and Banach spaces. In Handbook on the Geometry of Banach spaces, volume 1, pages 317–366. Elsevier Science, 2001.
A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E, 84, December 2011.
X. Ding and T. Jiang. Spectral distribution of adjacency and Laplacian matrices of random graphs. The Annals of Applied Probability, 20(6):2086–2117, 2010.
R. Durrett. Random Graph Dynamics (Cambridge Series in Statistical and Probabilistic Mathematics). Cambridge University Press, New York, NY, USA, 2006.
P. Erdős and A. Rényi. On random graphs, I. Publicationes Mathematicae (Debrecen), 6:290–297, 1959.
U. Feige and J. Kilian. Heuristics for semirandom graph problems. Journal of Computer and System Sciences, 63(4):639 – 671, 2001.
M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the Association for Computing Machinery, 42:1115–1145, 1995.
M. Grötschel, L. Lovász, and A. Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.
B. Hajek, Y. Wu, and J. Xu. Achieving exact cluster recovery threshold via semidefinite programming. Available online at arXiv:1412.6156, 2014.
B. Hajek, Y. Wu, and J. Xu. Achieving exact cluster recovery threshold via semidefinite programming: Extensions. Available online at arXiv:1502.07738, 2015.
S. Khot. On the power of unique 2-prover 1-round games. In Thirty-fourth Annual ACM Symposium on Theory of Computing, 2002.
M. Ledoux and M. Talagrand. Probability in Banach spaces, volume 23 of Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)]. Springer-Verlag, Berlin, 1991.
L. Lovász. On the Shannon capacity of a graph. IEEE Trans. Inf. Theor., 25(1):1–7, 1979.
P. Massart. About the constants in Talagrand’s concentration inequalities for empirical processes. The Annals of Probability, 28(2), 2000.
L. Massoulié. Community detection thresholds and the weak Ramanujan property. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, STOC ’14, pages 694–703, New York, NY, USA, 2014. ACM.
F. McSherry. Spectral partitioning of random graphs. In 42nd IEEE Symposium on Foundations of Computer Science (FOCS 2001), 2001.
E. Mossel, J. Neeman, and A. Sly. Consistency thresholds for the planted bisection model. Available online at arXiv:1407.1591v2 [math.PR], July 2014.
E. Mossel, J. Neeman, and A. Sly. A proof of the block model threshold conjecture. Available online at arXiv:1311.4115 [math.PR], January 2014.
E. Mossel, J. Neeman, and A. Sly. Stochastic block models and reconstruction. Probability Theory and Related Fields (to appear), 2014.
Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. Society for Industrial and Applied Mathematics, 1994.
P. Raghavendra. Optimal algorithms and inapproximability results for every CSP? In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC ’08, pages 245–254. ACM, 2008.
S. Sahni and T. Gonzalez. P-complete approximation problems. J. ACM, 23(3):555–565, July 1976.
A. Singer. Angular synchronization by eigenvectors and semidefinite programming. Appl. Comput. Harmon. Anal., 30(1):20 – 36, 2011.
T. Tao. Topics in Random Matrix Theory. Graduate studies in mathematics. American Mathematical Soc., 2012.
J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, 2012.
J. A. Tropp. An introduction to matrix concentration inequalities. Found. Trends Mach. Learning, 8(1–2):1–230, 2015.
L. Vanderberghe and S. Boyd. Semidefinite programming. SIAM Review, 38:49–95, 1996.
R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. Chapter 5 of: Compressed Sensing, Theory and Applications. Edited by Y. Eldar and G. Kutyniok. Cambridge University Press, 2012.
E. P. Wigner. On the distribution of the roots of certain symmetric matrices. Annals of Mathematics, 67(2):325–327, 1958.
Acknowledgements
The author acknowledges Amit Singer, Emmanuel Abbe, Ramon van Handel, and Georgina Hall for many insightful discussions on the topic of this paper and, especially, for motivating the author to write this manuscript. Many thanks to Ramon van Handel for his crucial help locating the references for the strongest version of the theorems used in the proof of our main results. The author is also indebted to Amit Singer, Ramon van Handel, Dustin G. Mixon, Nicolas Boumal, and Joel Tropp for valuable comments on early drafts of this manuscript. The author presented most of these results in various seminars throughout the end of 2014 and beginning of 2015. Many questions and comments raised by the audience greatly improved the quality of this manuscript; a warm thanks to all of them. The author also thanks the anonymous referees for many helpful comments and suggestions.
Communicated by Emmanuel Jean Candès.
Most of this work was carried out while the author was with the Program in Applied and Computational Mathematics at Princeton University, supported by AFOSR Grant No. FA9550-12-1-0317. Part of the work was done while the author was with the Department of Mathematics at the Massachusetts Institute of Technology, supported by NSF Grant DMS-1317308.
Bandeira, A.S. Random Laplacian Matrices and Convex Relaxations. Found Comput Math 18, 345–379 (2018). https://doi.org/10.1007/s10208-016-9341-9