1 Introduction

Consider a system \(\mathcal {P}\) of independent random variables and a set \(\mathcal {A}\) of n bad events, where each \(A\in \mathcal {A}\) depends solely on some subset \({\text {vbl}}(A)\subseteq \mathcal {P}\). For example, in a hypergraph 2-coloring instance, \(\mathcal {P}\) represents the vertex colors and \(\mathcal {A}\) the events in which an edge is monochromatic. The dependency graph \(G_{\mathcal {A}} = (\mathcal {A}, \{(A,B) \;|\; {\text {vbl}}(A)\cap {\text {vbl}}(B)\ne \emptyset \})\) includes edges between events if and only if they depend on at least one common variable. Let \({\varGamma }(A)\) be A’s neighborhood in \(G_{\mathcal {A}}\) and \({\varGamma }^+(A) = {\varGamma }(A) \cup \{A\}\) be its inclusive neighborhood. The (general, asymmetric) LLL states [14, 41] that if there is a function \(x : \mathcal {A}\rightarrow (0,1)\) such that

$$\begin{aligned} \Pr (A) \le x(A)\cdot \prod _{B\in {\varGamma }(A)} (1-x(B)) \end{aligned}$$

then \(\Pr (\bigcap _{A\in \mathcal {A}} \overline{A}) > 0\), that is, there is a satisfying assignment to the underlying variables in which no bad event occurs. The symmetric LLL is a useful corollary of the general LLL. If p and d are such that \(\Pr (A) \le p\) and \(|{\varGamma }(A)| \le d\) for all A, and \(ep(d+1)<1\), then \(\Pr (\bigcap _{A\in \mathcal {A}} \overline{A}) > 0\). For example, consider a hypergraph in which each edge contains k vertices and intersects at most \(d < 2^{k-1}/e-1\) other edges. Under a uniformly random color assignment \(\mathcal {P}\rightarrow \{\)red, blue\(\}\) the probability that an edge is monochromatic is \(p=2^{-(k-1)}\), so \(ep(d+1)<1\). The symmetric LLL proves the existence of a satisfying color assignment but does not yield an efficient algorithm to find one. Beginning with Alon [1] and Beck [7], a long line of research has sought efficient (and ideally deterministic) algorithms for computing satisfying assignments [1, 7, 9, 11, 16–19, 22, 28, 31–34, 42]. Most of these results required a major weakening of the standard symmetric LLL constraint \(ep(d+1)<1\). In many applications we consider, the bad events state that a sum of \(d^{{\varTheta }(1)}\) random variables deviates from its expectation, so their probabilities are typically bounded by Chernoff-type tail bounds, e.g., \(\exp (-d^{{\varTheta }(1)})\).
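As a quick numeric sanity check of this example, the following sketch (the helper name `symmetric_lll_holds` is ours, for illustration only) verifies the symmetric criterion for a concrete choice of k:

```python
import math

def symmetric_lll_holds(p: float, d: int) -> bool:
    """Symmetric LLL criterion: e * p * (d + 1) < 1."""
    return math.e * p * (d + 1) < 1

# Hypergraph 2-coloring: k-vertex edges, each intersecting at most d others.
k = 10
p = 2 ** (-(k - 1))                          # Pr[an edge is monochromatic]
d = math.floor(2 ** (k - 1) / math.e - 1)    # any d < 2^(k-1)/e - 1 works
assert symmetric_lll_holds(p, d)             # hence a proper 2-coloring exists
```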

In a relatively recent breakthrough, Moser and Tardos [33] gave an algorithmic proof of the general asymmetric LLL, with no weakening of the parameters. Their algorithm is simple though the analysis is not trivial. At initialization the algorithm chooses a random assignment to the variables \(\mathcal {P}\). Call an event \(A\in \mathcal {A}\) violated if it occurs under the current assignment to the variables. Let \(\mathcal {F}\subseteq \mathcal {A}\) be the set of violated events. The algorithm repeatedly chooses some \(A\in \mathcal {F}\) and resamples the variables in \({\text {vbl}}(A)\), until \(\mathcal {F} = \emptyset \).
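A minimal sequential sketch of this resampling loop, assuming each bad event is supplied as a (variable set, violation predicate) pair; the data layout and helper names are ours, for illustration only:

```python
import random

def moser_tardos(variables, events, sample):
    """Sequential Moser-Tardos resampling.

    variables: dict var_name -> current value
    events:    list of (vbl, is_violated) pairs; vbl is the set of variables
               the event depends on, is_violated a predicate on `variables`
    sample:    dict var_name -> zero-argument sampler for that variable
    """
    for v in variables:                       # initial random assignment
        variables[v] = sample[v]()
    while True:
        violated = [vbl for vbl, bad in events if bad(variables)]
        if not violated:                      # F is empty: done
            return variables
        for v in random.choice(violated):     # resample one violated event
            variables[v] = sample[v]()
```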

The distributed LLL problem We consider Linial’s LOCAL model [35] of distributed computation in which the distributed network is identical to the dependency graph. In other words, each node \(A\in \mathcal {A}\) hosts a processor, which is aware of n, the degree bound d, and its neighborhood \({\varGamma }(A)\). Computation proceeds in synchronized rounds in which each node may send an unbounded message to its neighbors. Time is measured by the number of rounds; computation local to each node is free. Upon termination each node A must commit to an assignment to its variables \({\text {vbl}}(A)\) that is consistent with its neighbors, i.e., the nodes must collectively agree on a satisfying assignment to \(\mathcal {P}\) avoiding all bad events. We consider the LOCAL model because we will need to send the assignment of \({\text {vbl}}(A)\) in one message.

Moser and Tardos proposed a parallel version of their resampling algorithm (Algorithm 1), which can easily be implemented in the LOCAL model. Let \(G_{\mathcal {F}}\) be the graph induced by the violated events \(\mathcal {F}\) under the current variable assignment. They proved that \(O(\log _{1/ep(d+1)} n)\) iterations of Algorithm 1 suffice to avoid all bad events with probability \(1-1/{\text {poly}}(n)\), i.e., \(O(\log n)\) iterations suffice if \(ep(d+1)\) is bounded away from 1. (For the sake of a simpler presentation we state many results in the symmetric LLL language; our algorithms and the Moser–Tardos algorithm work for the asymmetric LLL as well.) Moser and Tardos suggested using Luby's randomized MIS algorithm [27], which runs in \({\varTheta }(\log n)\) rounds w.h.p. (a bound also achieved by [2]), for a total running time of \({\varTheta }(\log n \cdot \log _{1/ep(d+1)} n)\). This is, intuitively, a very wasteful LLL algorithm since nodes spend nearly all their time computing MISs rather than performing resampling steps. For certain values of d the running time can be improved by plugging in an MIS algorithm running in \(O(d + \log ^*n)\) time [5] or \(O(\log ^2 d) + \exp (O(\sqrt{\log \log n}))\) time w.h.p. [6]. However, it is not possible to find an MIS in constant time. Kuhn, Moscibroda, and Wattenhofer [23] gave an \({\varOmega }\big (\min \big \{\frac{\log d}{\log \log d} , \sqrt{ \frac{{\log n}}{\log \log n}} \big \}\big )\) lower bound on the complexity of MIS and other symmetry-breaking problems.

[Algorithm 1: the parallel Moser–Tardos resampling algorithm (pseudocode figure omitted)]
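A centralized simulation of one such round, with the MIS computation abstracted behind a caller-supplied `find_mis` routine (a sketch of the algorithm just described, not Moser and Tardos's own pseudocode; `nbrs[A]` stands for \({\varGamma }(A)\)):

```python
def algorithm1_round(variables, events, sample, nbrs, find_mis):
    """One round of the parallel Moser-Tardos algorithm: resample a maximal
    independent set of the violated subgraph G_F. Events in an independent
    set share no variables, so their resamplings do not conflict."""
    violated = {A for A, ev in events.items() if ev["bad"](variables)}
    if not violated:
        return False                          # no bad event occurs: done
    g_f = {A: nbrs[A] & violated for A in violated}
    for A in find_mis(g_f):                   # any MIS routine can be used
        for v in events[A]["vbl"]:
            variables[v] = sample[v]()
    return True
```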

New results We give a new distributed LLL algorithm in the Moser–Tardos resampling framework that avoids the computation of MISs altogether. Due to its simplicity we are happy to display the algorithm in its entirety. We assume that nodes possess unique IDs, which could be assigned in an adversarial manner. Let \({\varGamma }_{\mathcal {F}}(A)\) be A’s neighborhood in \(G_{\mathcal {F}}\).

[Algorithm 2: resampling events at local ID minima (pseudocode figure omitted)]

One can see that \(\mathcal {I}\) is computed in one round: each node A tells its neighbors whether \(A\in \mathcal {F}\) under the current variable assignment. Once A receives messages from all neighbors it can determine if \({\text {ID}}(A)\) is a local minimum in \(G_{\mathcal {F}}\). We prove that under the slightly stronger criterion \(epd^2 < 1\), this algorithm halts in \(O(\log _{1/epd^2} n)\) steps w.h.p. Most applications of the LLL satisfy the \(epd^2 < 1\) criterion, though not all. We give another distributed LLL algorithm in the resampling framework that finds a satisfying assignment in \(O(\log ^2 d \cdot \log _{1/ep(d+1)} n)\) time under the usual \(ep(d+1)<1\) criterion.
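In code, one round looks as follows (a centralized toy simulation of Algorithm 2; note that the selected local minima automatically form an independent set of \(G_{\mathcal {F}}\)):

```python
def algorithm2_round(variables, events, sample, nbrs, ids):
    """One round of Algorithm 2: every violated event whose ID is a local
    minimum in the violated subgraph G_F resamples its variables."""
    violated = {A for A, ev in events.items() if ev["bad"](variables)}
    if not violated:
        return False
    # If two adjacent violated events were both local minima, each would
    # need the smaller ID; hence the minima are pairwise non-adjacent.
    minima = [A for A in violated
              if all(ids[A] < ids[B] for B in nbrs[A] & violated)]
    for A in minima:
        for v in events[A]["vbl"]:
            variables[v] = sample[v]()
    return True
```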

We show that faster algorithms exist when the condition \(ep(d+1)<1\) is replaced by a stronger condition \(p\cdot f(d) < 1\), where f(d) is a faster growing function than \(e(d+1)\). However, it is not clear whether there exists f(d) so that the LLL can be solved in sublogarithmic time in n, independent of d. Moser and Tardos observed that any parallel algorithm in the resampling framework requires \({\varOmega }(\log _{1/p} n)\) resampling steps, even if the dependency graph has no edges. We combine the resampling framework with a locality approach to give an \(O(\log n / \log \log n)\) algorithm for an exponential function f(d). On the other hand, we prove that no constant time distributed LLL algorithm exists and that the LLL for any f(d) requires \({\varOmega }(\log ^* n)\) time.

New applications Existential results in graph coloring [29] (those taking the Rödl nibble approach) can often be phrased as distributed algorithms in which each step succeeds with some tiny but non-zero probability, as guaranteed by the LLL. By using our distributed LLL algorithms we are able to solve a number of graph coloring problems in \(O(\log n)\) time or faster. Some of these applications require minor changes to existing algorithms while others are quite involved. Below \({\varDelta }\) is the maximum degree and \(\epsilon >0\) is an arbitrarily small parameter.

  • Frugal coloring A k-frugal vertex coloring is one in which each color appears at most k times in the neighborhood of any vertex. Pemmaraju and Srinivasan [36] showed the existence of \(({\varDelta }+1)\)-colorings that are \(O(\log ^2 {\varDelta } / \log \log {\varDelta })\)-frugal, and proved that \((\log {\varDelta } \cdot \log n / \log \log n)\)-frugal colorings can be computed in \(O(\log n)\) time. With some modifications to their proof we show that an \(O(\log ^2{\varDelta }/\log \log {\varDelta })\)-frugal \(({\varDelta }+1)\)-coloring can be computed in \(O(\log n)\) time. Note that the best existential bound on the frugality of \(({\varDelta }+1)\)-coloring is \(O(\log {\varDelta } / \log \log {\varDelta })\), due to Molloy and Reed [30].

    Hind, Molloy, and Reed [21] showed there exist \(\beta \)-frugal, \(O\big ({\varDelta }^{1+\frac{1}{\beta }}\big )\)-colorings by using the asymmetric LLL. We show how to turn their proof into a distributed algorithm that runs in \(O(\log n \cdot \log ^2 {\varDelta })\) time.

  • Girth 4 and 5 In prior work [37] we proved that triangle-free graphs have \((4+\epsilon ){\varDelta } / \ln {\varDelta }\)-colorings and gave \(\log ^{1+o(1)} n\) time algorithms for \((4+\epsilon ){\varDelta } / \ln {\varDelta }\)-coloring triangle-free graphs and \((1+\epsilon ){\varDelta } / \ln {\varDelta }\)-coloring girth-5 graphs. Here we prove that both problems can be solved in \(O(\log n)\) time.

  • Edge coloring Dubhashi et al. [12] gave a \((1+\epsilon ){\varDelta }\) edge-coloring algorithm running in \(O(\log n)\) time, provided that \({\varDelta } = (\log n)^{1+{\varOmega }(1)}\), i.e., \({\varDelta }\) is sufficiently large relative to n. In [13], Elkin, Pettie, and Su applied our LLL algorithm to show that a \((1+\epsilon ){\varDelta }\) edge-coloring can be obtained in \(O(\log ^{*} {\varDelta } + \log n / {\varDelta }^{1-o(1)})\) rounds for \({\varDelta } \ge {\varDelta }_{\epsilon }\), where \({\varDelta }_{\epsilon }\) is a sufficiently large constant depending on \(\epsilon \).

  • List-coloring Suppose each vertex is issued a list of \((1+\epsilon )D > D_{\epsilon }\) colors such that each color appears in at most D lists in the neighborhood of any vertex, where \(D_\epsilon \) is a sufficiently large constant depending on \(\epsilon \). (D need not be close to the degree \({\varDelta }\).) Reed and Sudakov [39] proved that \((1+\epsilon )D\)-list-colorings exist. We show how to construct them in \(O(\log ^{*}D + \log n / D^{1-o(1)})\) time. Furthermore, for any D and any constant \(\epsilon >0\), we show that \((2e+\epsilon )D\)-list-coloring can be solved in \(O(\log n)\) time.

  • Defective coloring An f-defective coloring is one in which a vertex may share its color with up to f neighbors. Barenboim and Elkin [4], and implicitly Kuhn and Wattenhofer [24], gave an O(1) time procedure to compute an \(O(\log n)\)-defective \(O({\varDelta } / \log n)\)-coloring. We prove that for any \(f>0\), an f-defective \(O({\varDelta }/f)\)-coloring can be computed in \(O((\log n)/f)\) time.

2 Preliminaries

Let \({\varGamma }^r(A)\) be the r-neighborhood of A (the set of nodes at distance at most r from A, excluding A) and \({\varGamma }^{r+}(A)={\varGamma }^{r}(A)\cup \{A\}\) be its inclusive r-neighborhood. A node set in the subscript indicates a restriction of the neighborhood to that set, e.g., \({\varGamma }_\mathcal {F}^{2+}(A) = {\varGamma }^{2+}(A) \cap \mathcal {F}\).

Consider an execution of a Moser–Tardos-type resampling algorithm. Let \(C : \mathbb {N}\rightarrow \mathcal {A}\) be such that C(i) is the ith event selected by the algorithm for resampling; C is called the record of the execution. (If the algorithm selects events in independent batches then the events in each batch can be listed in arbitrary order.) A witness tree \(\tau = (T, \sigma _T)\) is a finite rooted tree together with a labeling \(\sigma _T: V(T) \rightarrow \mathcal {A}\) of its vertices such that the children of \(u \in V(T)\) receive labels from \({\varGamma }^{+}(\sigma _T(u))\). A 2-witness tree \(\tau = (T, \sigma _T)\) is defined in the same way except that the children of \(u \in V(T)\) may receive labels from \({\varGamma }^{2+}(\sigma _T(u))\). A witness tree (or 2-witness tree) is proper if the children of a vertex receive distinct labels.

Given a record C, the witness tree \(\tau _C(t)\) is constructed as follows. First, create a root node labelled C(t). Looking backward in time, for each \(i = t-1, t-2, \ldots , 1\), check whether an existing node is labeled with an event from \({\varGamma }^{+}(C(i))\). If so, let u be one of the deepest such nodes, create a new node v labeled C(i), and make it a child of u; otherwise, skip C(i). Given a witness tree \(\tau \), we say \(\tau \) occurs in C if there exists an index t such that \(\tau _C(t) = \tau \). Moser and Tardos proved the following lemma:

Lemma 1

Let \(\tau \) be a fixed witness tree and C be the record produced by the algorithm.

  1.

    If \(\tau \) occurs in C, then \(\tau \) is proper.

  2.

    The probability that \(\tau \) occurs in C is at most \(\prod _{v \in V(\tau )} \Pr (\sigma _T(v))\).
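To make the construction of \(\tau _C(t)\) concrete, here is a direct sketch (1-indexed record positions, following the text; `gamma_plus[A]` stands for \({\varGamma }^{+}(A)\)):

```python
def witness_tree(record, t, gamma_plus):
    """Build tau_C(t): scan the record backward from time t, attaching C(i)
    below one of the deepest nodes whose label lies in Gamma+(C(i)).

    record: list with record[i-1] = C(i)
    Returns nodes as (label, depth, parent_index); node 0 is the root.
    """
    nodes = [(record[t - 1], 0, None)]            # root labelled C(t)
    for i in range(t - 1, 0, -1):                 # i = t-1, ..., 1
        label = record[i - 1]
        candidates = [(depth, j) for j, (lab, depth, _) in enumerate(nodes)
                      if lab in gamma_plus[label]]
        if candidates:                            # otherwise C(i) is skipped
            depth, parent = max(candidates)       # a deepest eligible node
            nodes.append((label, depth + 1, parent))
    return nodes
```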

Similarly, for \(r\ge 2\), we can define an r-witness tree \(\tau _C^r(t)\) in the same way except that in each step we attach a node labelled C(i) to the deepest node among those whose labels are in \({\varGamma }^{r+}(C(i))\). Also, we say \(\tau \) r-occurs in C if there exists \(t \in \mathbb {N}\) such that \(\tau _C^r(t) = \tau \). Then Lemma 2 holds analogously:

Lemma 2

Let \(\tau \) be a fixed r-witness tree and C be the record produced by the algorithm.

  1.

    If \(\tau \) r-occurs in C, then \(\tau \) is proper.

  2.

    The probability that \(\tau \) r-occurs in C is at most \(\prod _{v \in V(\tau )} \Pr (\sigma _T(v))\).

3 Algorithms

Recall that the parallel/distributed Moser–Tardos algorithm iteratively selects maximal independent sets (MIS) of violated events for resampling. They proved that if there is some slack in the general LLL preconditions then the algorithm terminates in \(O(\log n)\) rounds of MIS.

Theorem 1

(Moser and Tardos) Let \(\mathcal {P}\) be a finite set of mutually independent random variables in a probability space. Let \(\mathcal {A}\) be a finite set of events determined by these variables. If there exists an assignment of reals \(x: \mathcal {A} \rightarrow (0,1)\) such that

$$\begin{aligned} \forall A \in \mathcal {A}: \Pr (A) \le (1-\epsilon ) x(A) \prod _{B \in {\varGamma }(A)} (1- x(B)), \end{aligned}$$

then the probability any bad event occurs after k resampling rounds of Algorithm 1 is at most \((1-\epsilon )^{k} \sum _{A \in \mathcal {A}} \frac{x(A)}{1 - x(A)}\).

In other words, if x(A) is bounded away from 1 then \(O(\log _{\frac{1}{1-\epsilon }} n)\) resampling rounds suffice, w.h.p. A distributed implementation of this algorithm takes \(O(\log _{\frac{1}{1-\epsilon }}n \cdot \mathrm {MIS}(n, d))\) time, where d is the maximum degree of \(G_{\mathcal {A}}\) and \(\mathrm {MIS}(n, d)\) is the time needed to find an MIS in an n-vertex degree-d graph. It is known that \(\mathrm {MIS}(n,d) = {\varOmega }\big (\min \big \{\frac{\log d}{\log \log d} , \sqrt{ \frac{{\log n}}{\log \log n}} \big \}\big )\) [23]. Our algorithms avoid the computation of MISs. In Sect. 3.1 we analyze the simple distributed LLL algorithm presented in the introduction, which requires slightly weakening the general LLL conditions. In Sect. 3.2 we present an algorithm that works under the standard LLL conditions but is slower by an \(O(\log ^2 d)\) factor.

3.1 A simple distributed algorithm

Recall that in each round of Algorithm 2, a violated event \(A\in \mathcal {F}\) is selected for resampling if \({\text {ID}}(A)\) is a local minimum in the violated subgraph \(G_{\mathcal {F}}\). In order to analyze this algorithm in the witness tree framework we must establish some connection between the depth of witness trees and the number of rounds of resampling. Lemma 3 will let us make such a connection.

Lemma 3

Suppose an event A is resampled in round \(j>1\) of Algorithm 2. There must exist some \(B \in {\varGamma }^{2+}(A)\) resampled in round \(j-1\).

Proof

Let \(\mathcal {F}'\) and \(\mathcal {F}\) be the violated event sets just before and after the resampling step of round \(j-1\). If A is not in \(\mathcal {F}'\) but is in \(\mathcal {F}\) then its variables \({\text {vbl}}(A)\) must have been changed in round \(j-1\), which could only occur if some \(B\in {\varGamma }(A)\) were resampled. Now suppose A is in both \(\mathcal {F}'\) and \(\mathcal {F}\). If A itself was resampled in round \(j-1\) then \(B = A \in {\varGamma }^{2+}(A)\) suffices, so assume A was not resampled in round \(j-1\) but was in round j, meaning \({\text {ID}}(A)\) is not a local minimum in \({\varGamma }_{\mathcal {F}'}(A)\) but is a local minimum in \({\varGamma }_{\mathcal {F}}(A)\). This implies that some neighbor \(B\in {\varGamma }(A)\) with \({\text {ID}}(B) < {\text {ID}}(A)\) is in \(\mathcal {F}'\) but not \(\mathcal {F}\), which could only occur if some \(C\in {\varGamma }^+(B) \subseteq {\varGamma }^{2+}(A)\) were resampled in round \(j-1\). \(\square \)

We can now proceed to bound the number of rounds of Algorithm 2 needed to find a satisfying assignment.

Theorem 2

(Asymmetric LLL) Let \(\mathcal {P}\) be a finite set of mutually independent random variables in a probability space. Let \(\mathcal {A}\) be a finite set of events determined by these variables. If there exists an assignment of reals \(x: \mathcal {A} \rightarrow (0,1)\) such that

$$\begin{aligned} \forall A \in \mathcal {A}: \Pr (A) \le (1-\epsilon ) x(A) \prod _{B \in {\varGamma }^2(A)} (1- x(B)), \end{aligned}$$

then the probability any bad event occurs after k resampling rounds of Algorithm 2 is at most \((1-\epsilon )^{k} \sum _{A \in \mathcal {A}} \frac{x(A)}{1 - x(A)}\).

Note that the only difference from Theorem 1 is that the product ranges over \(B \in {\varGamma }^{2}(A)\) rather than \(B \in {\varGamma }(A)\).

Corollary 1

(Symmetric LLL) Let \(\mathcal {P}\) be a finite set of mutually independent random variables in a probability space. Let \(\mathcal {A}\) be a finite set of events determined by these variables, such that for all \( A \in \mathcal {A}\)

  1.

    \(\Pr (A) \le p < 1\), and

  2.

    A shares variables with at most d of the other events.

If \(epd^2 < 1\), then w.h.p. none of the bad events occur after \(O(\log _{\frac{1}{epd^2}} n)\) rounds of Algorithm 2.

Proof

Setting \(x(A) = 1/d^2\) and \(\epsilon = 1- epd^2\) in Theorem 2, we have

$$\begin{aligned}&(1-\epsilon )x(A) \prod _{B \in {\varGamma }^2(A)}( 1- x(B)) \\&\quad \ge \frac{1-\epsilon }{d^2} \cdot \left( 1-\frac{1}{d^2}\right) ^{|{\varGamma }^2(A)|} \\&\quad \ge \frac{1-\epsilon }{d^2} \left( 1-\frac{1}{d^2}\right) ^{(d^2 - 1)} \\&\quad \ge \frac{1-\epsilon }{ed^2} \ge p \ge \Pr (A). \end{aligned}$$

Therefore, the probability a bad event occurs after k rounds of resampling is at most \((1-\epsilon )^k \sum _{A \in \mathcal {A}} \frac{x(A)}{1-x(A)} = (1-\epsilon )^k n / (d^2-1)\), which is \(1 / {\text {poly}}(n)\) if \(k = O(\log _{\frac{1}{1-\epsilon }} n) = O(\log _{\frac{1}{epd^2}} n)\). \(\square \)

Following Moser and Tardos [33] we analyze the following Galton-Watson process for generating an r-witness tree T. Fix an event \(A \in \mathcal {A}\). Begin by creating a root for T labelled A. To shorten the notation, we let \([v]:=\sigma _T(v)\). In each subsequent step, consider each vertex v created in the previous step. For each \(B \in {\varGamma }^{r+}([v])\), independently, attach a child labelled B with probability x(B) or skip it with probability \(1-x(B)\). Continue the process until no new vertices are born. We prove a lemma analogous to one in [33].
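For intuition, the process can be simulated directly (a toy sketch; `gamma_r_plus[A]` stands for \({\varGamma }^{r+}(A)\), and when the weights satisfy the condition of Lemma 5 the process is subcritical, so the loop terminates with probability 1):

```python
import random

def galton_watson(root, x, gamma_r_plus):
    """Grow an r-witness tree: for each vertex v created in the previous
    step and each B in Gamma^{r+}([v]), independently attach a child
    labelled B with probability x[B]."""
    tree = [(root, None)]                    # (label, parent_index) pairs
    frontier = [0]
    while frontier:
        new_frontier = []
        for idx in frontier:
            label = tree[idx][0]
            for B in gamma_r_plus[label]:
                if random.random() < x[B]:   # child labelled B is born
                    tree.append((B, idx))
                    new_frontier.append(len(tree) - 1)
        frontier = new_frontier
    return tree
```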

Lemma 4

Let \(\tau \) be a fixed proper r-witness tree with its root vertex labelled A. The probability \(p_{\tau }\) that the Galton-Watson process yields exactly the tree \(\tau \) is

$$\begin{aligned} p_{\tau } = \frac{1-x(A)}{x(A)} \prod _{v \in V(\tau )}x'([v]) \end{aligned}$$

where \(x'(B) = x(B) \cdot \prod _{C \in {\varGamma }^{r}(B) } (1 - x(C))\).

Proof

Let \(W_{v} \subseteq {\varGamma }^{r+}([v])\) denote the set of inclusive r-neighbors of [v] that do not occur as a label of some child node of v. Then,

$$\begin{aligned}&p_{\tau } = \frac{1}{x(A)} \cdot \prod _{v \in V(\tau )} \left( x([v]) \cdot \prod _{u \in W_v}(1 - x([u])) \right) \\&\quad = \frac{1 - x(A)}{x(A)}\cdot \prod _{v \in V(\tau )} \left( \frac{x([v])}{1-x([v])} \cdot \prod _{u \in {\varGamma }^{r+}([v])}(1 - x([u])) \right) \\&\quad = \frac{1 - x(A)}{x(A)}\cdot \prod _{v \in V(\tau )} \left( x([v]) \cdot \prod _{u \in {\varGamma }^{r}([v])}(1 - x([u])) \right) \\&\quad = \frac{1 - x(A)}{x(A)} \cdot \prod _{v \in V(\tau )} x'([v]) \end{aligned}$$

\(\square \)

Lemma 5

If for all \(A \in \mathcal {A}\), we have \(\Pr (A) \le (1-\epsilon )x(A) \cdot \prod _{B \in {\varGamma }^{r}(A)}(1-x(B))\), then the probability that any r-witness tree of size at least k occurs is at most \((1-\epsilon )^{k} \cdot \sum _{A \in \mathcal {A}} \frac{x(A)}{1-x(A)}\).

Proof

Let \(\mathcal {T}_A^r (k)\) denote the infinite set of r-witness trees having root labelled A and containing at least k vertices. By Lemma 2 and the union bound, the probability that some r-witness tree of size at least k occurs is at most

$$\begin{aligned}&\sum _{A \in \mathcal {A}} \sum _{\tau \in \mathcal {T}_A^r (k)} \Pr (\tau \,r\text{-occurs } \text{ in } C) \\&\quad \le \sum _{A \in \mathcal {A}} \, \sum _{\tau \in \mathcal {T}_A^r (k)} \, \prod _{v \in V(\tau )} \Pr ([v])&\text{ by } \text{ Lemma } \text{2 } \\&\quad \le \sum _{A \in \mathcal {A}} \, \sum _{\tau \in \mathcal {T}_A^r (k)} \, \prod _{v \in V(\tau )} (1-\epsilon )x'([v])\\&\quad \le (1-\epsilon )^{k} \sum _{A \in \mathcal {A}} \frac{x(A)}{1 - x(A)} \sum _{\tau \in \mathcal {T}_A^r (k)} p_{\tau }&\text{ by } \text{ Lemma } \text{4 } \\&\quad \le (1-\epsilon )^k \sum _{A \in \mathcal {A}} \frac{x(A)}{1 - x(A)} \end{aligned}$$

The last inequality follows since the Galton-Watson process grows exactly one tree. \(\square \)

Let C be the record of Algorithm 2 and \(S_j\) be the segment of the record corresponding to resamplings in round j. The following lemma relates the number of resampling rounds to the occurrence of 2-witness trees.

Lemma 6

If there is still a violated event after k resampling rounds in Algorithm 2 then some 2-witness tree of size at least k occurs in C.

Proof

Let \(A_k\) be any event in \(S_k\) and t be its position in the record C. By Lemma 3 there exist events \(A_{k-1},\ldots ,A_1\) in \(S_{k-1},\ldots ,S_1\) such that for all \(j<k\), \(A_j \in {\varGamma }^{2+}(A_{j+1})\). This implies that \(A_{k-1},\ldots ,A_1\) are mapped to distinct nodes in the 2-witness tree \(\tau _C^2(t)\), whose root is labeled \(A_k\); hence \(\tau _C^2(t)\) has at least k vertices. \(\square \)

Therefore, by Lemma 6, if there is a violated event after k resampling rounds, then a 2-witness tree of size at least k occurs. By Lemma 5, this happens with probability at most \((1-\epsilon )^k \cdot \sum _{A \in \mathcal {A}} \frac{x(A)}{1-x(A)}\). Thus, Theorem 2 holds. Note that if x(A) is bounded away from 1, then after \(O(\log _{\frac{1}{1 - \epsilon }} n)\) rounds, w.h.p. no bad event occurs.

3.2 Resampling by weak MIS

In this section we analyze the efficiency of Moser and Tardos's Algorithm 1 when a new weak MIS procedure (Algorithm 3) is used in lieu of an actual MIS. The Weak-MIS procedure produces, in \(O(\log ^2 d)\) time, an independent set S such that the probability that any given node is not in \({\varGamma }^+(S) = S\cup {\varGamma }(S)\) is \(1/{\text {poly}}(d)\). The procedure consists of \(O(\log d)\) iterations, in each of which a vertex not yet in \({\varGamma }^+(S)\) joins it with constant probability. Each iteration consists of \(\log d\) phases where, roughly speaking, the goal of phase i is to eliminate vertices of degree at least \(d/2^i\) with constant probability. Each phase is essentially one step of Luby's MIS algorithm, though applied only to a judiciously chosen subset of the vertices. See Algorithm 3.

Our main results are as follows.

Theorem 3

(Asymmetric LLL) Let \(\mathcal {P}\) be a finite set of mutually independent random variables in a probability space. Let \(\mathcal {A}\) be a finite set of events determined by these variables. If there exists an assignment of reals \(x: \mathcal {A} \rightarrow (0,1)\) such that

$$\begin{aligned} \forall A \in \mathcal {A}: \Pr (A) \le (1-\epsilon ) x(A) \prod _{B \in {\varGamma }(A)} (1- x(B)), \end{aligned}$$

then the probability any bad event occurs after k resampling rounds using the Weak-MIS algorithm is at most \(n(\frac{1}{d+1})^k + (1-\epsilon )^{k/2} \sum _{A \in \mathcal {A}} \frac{x(A)}{1 - x(A)}\).

Corollary 2

(Symmetric LLL) Let \(\mathcal {P}\) be a finite set of mutually independent random variables in a probability space. Let \(\mathcal {A}\) be a finite set of events determined by these variables, such that for all \(A \in \mathcal {A}\),

  1.

    \(\Pr (A) \le p < 1\), and

  2.

    A shares variables with at most d of the other events.

If \(ep(d+1) < 1\), then w.h.p. none of the bad events occur after \(O(\max (\log _{d+1} n, \log _{\frac{1}{ep(d+1)}}n))\) Weak-MIS resampling rounds.

Corollary 2 follows directly by plugging in \(x(A) = 1/(d+1)\) for all \(A \in \mathcal {A}\) and \(k = O\left( \max \left( \log _{d+1} n, \log _{\frac{1}{ep(d+1)}}n\right) \right) \). Notice that if \(\frac{1}{ep(d+1)} > d+1\), we can apply the faster simple distributed algorithm, so the running time in Corollary 2 will be dominated by \(O(\log _{\frac{1}{ep(d+1)}} n \cdot \log ^2 d)\).

[Algorithm 3: Weak-MIS (pseudocode figure omitted)]
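Since the figure is omitted here, the following centralized sketch reconstructs the procedure from the surrounding analysis. In particular, the sampling probability \(p_i = 2^{i-1}/(2d)\) is our assumption, chosen so that the bounds \(p_i(1-p_i)^{2d/2^{i-1}} \ge p_i e^{-2}\) and \((d/2^i)\, p_i e^{-2} \ge 1/(4e^2)\) used in Lemma 7 go through:

```python
import math, random

def weak_mis(adj, d):
    """Centralized sketch of Weak-MIS. adj: dict vertex -> set of neighbors
    in G_F (assume d >= 2). Returns an independent set S; by Lemma 8 each
    vertex of G_F avoids Gamma+(S) with probability at most 1/(2e(d+1)^4)."""
    S = set()
    iters = math.ceil(4 * math.e ** 2 * math.log(2 * math.e * (d + 1) ** 4))
    phases = max(1, math.ceil(math.log2(d)))       # log d phases per iteration
    for _ in range(iters):
        for i in range(1, phases + 1):
            covered = S | {u for s in S for u in adj[s]}
            gp = {v for v in adj if v not in covered}
            # phase i works on vertices of degree at most d / 2^(i-1)
            gp = {v for v in gp if len(adj[v] & gp) <= d / 2 ** (i - 1)}
            p_i = 2 ** (i - 1) / (2 * d)           # assumed sampling probability
            b = {v: random.random() < p_i for v in gp}   # one Luby-style step
            for v in gp:                           # selected, no selected nbr
                if b[v] and not any(b.get(u, False) for u in adj[v] & gp):
                    S.add(v)
        covered = S | {u for s in S for u in adj[s]}
        # vertices whose neighbors are all covered can safely join S
        S |= {v for v in adj if v not in covered and adj[v] <= covered}
    return S
```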

Consider the first iteration of the Weak-MIS algorithm. In each phase i, \(G'\) is the subgraph of \(G_{\mathcal {F}}\) induced by vertices of remaining degree at most \(d/2^{i-1}\) that are not adjacent to the independent set S, and \(V_i = \{ v \in G' \mid \deg _{G'}(v) \ge d/ 2^{i}\}\). Note that every vertex in \(G_{\mathcal {F}}\) must end up isolated in \(G'\) or in one of the \(V_i\)'s. Let (u,v) be an edge in \(G'\). Following Peleg's analysis [35], define \(\mathcal {E}(u,v)\) to be the event that at phase i, \(b(u) = 0\) and \(b(v) = 1\) and for all other neighbors x of u and v, \(b(x) = 0\) (here \(b(w)=1\) indicates that w is selected in this phase's Luby step, which happens independently with probability \(p_i\)). Define \(\mathcal {E}(u) = \bigcup _{v \in {\varGamma }_{G'}(u)} \mathcal {E}(u,v)\) to be the event that exactly one neighbor of u joins S in this phase. Since these events are disjoint, we have \(\Pr (\mathcal {E}(u)) = \sum _{v \in {\varGamma }_{G'}(u)} \Pr (\mathcal {E}(u,v))\).

Lemma 7

If \(u \in V_i\), then \(\Pr (\mathcal {E}(u)) \ge \frac{1}{4e^2}\).

Proof

\(\Pr (\mathcal {E}(u,v)) \ge p_i (1 - p_i)^{\deg _{G'}(u) + \deg _{G'}(v)} \ge p_i (1 - p_i)^{2d / 2^{i-1}} \ge p_i e^{-2}\). Since \(\deg _{G'}(u) \ge d / 2^i\), \(\Pr (\mathcal {E}(u)) \ge \frac{d}{2^i} p_i e^{-2} \ge \frac{1}{4e^2}\). \(\square \)

Therefore, if \(v \in G_\mathcal {F} \setminus {\varGamma }^+(S)\) at the beginning of iteration l, the probability that \(v\in {\varGamma }^+(S)\) at the end of iteration l is at least \(1/(4e^2)\). We say a vertex in \(G_{\mathcal {F}}\) fails if, after all \(t=4e^2\ln (2e(d+1)^4)\) iterations, it is still not in \({\varGamma }^+(S)\).

Lemma 8

Let S be an independent set selected by Weak-MIS. If \(v \in \mathcal {F}\) then \(\Pr ({\varGamma }^{+}(v) \cap S = \emptyset ) \le \frac{1}{2e(d + 1)^4}\).

Proof

By Lemma 7, the probability that v survives iteration \(\ell \) conditioned on it surviving iterations 1 through \(\ell -1\) is at most \(1-1/(4e^2)\). Over \(t = 4e^2\ln (2e(d+1)^4)\) iterations the probability of failure is at most \((1-1/(4e^2))^t \le e^{-\ln (2e(d+1)^4)} = \frac{1}{2e(d+1)^4}\). \(\square \)

The next step is to relate the number of rounds of Weak-MIS resampling with the size of witness trees.

Lemma 9

Suppose a bad event is violated after k rounds of Weak-MIS resampling and the maximum depth of the witness trees is t. Then there exists a sequence of not necessarily distinct vertices \(v_1, \ldots , v_k\) such that the following hold:

  (1)

    \(v_i \in G_i\), where \(G_i\) is the violated subgraph \(G_\mathcal {F}\) at the beginning of round i.

  (2)

    \(v_{i+1} \in {\varGamma }^{+}(v_i)\) for \(1 \le i \le k-1\).

  (3)

    For at least \(k-t\) indices \(1\le l \le k\), \(v_l\) failed in the call to Weak-MIS in round l.

Proof

For \(1 \le i \le k\), let \(S_i\) be the segment of the record C corresponding to events resampled at round i. Suppose that an event A is violated after k resampling rounds. Build a witness tree \(\tau \) with root labeled A, adding nodes in the usual fashion by scanning the record C in time-reversed order: for each j, in decreasing order, attach a node labelled C(j) to the deepest node in \(\tau \) whose label is in \({\varGamma }^{+}(C(j))\), if such a node exists. Let \(v_{k+1} = A\). We construct \(v_k, v_{k-1}, \ldots , v_1\) backward in time. For \(k \ge i \ge 1\), we claim there is an event \(v_i \in {\varGamma }^{+}(v_{i+1})\) such that either \(v_i \in S_{i}\), or \(v_i \in G_i\) and \(v_i\) failed at round i. If \(v_{i+1}\not \in G_i\), i.e., \(v_{i+1}\) is not violated at the beginning of round i, then some event \(v_i \in {\varGamma }^{+}(v_{i+1})\) must have been resampled at round i to cause \(v_{i+1} \in G_{i+1}\). On the other hand, if \(v_{i+1}\in G_i\), then either some \(v_i \in {\varGamma }^{+}(v_{i+1})\) was resampled at round i, or \(v_{i+1}\) failed at round i; in the latter case we let \(v_i = v_{i+1}\). Notice that \(\tau \) (excluding its artificial root labeled A) is a witness tree that occurred in C and thus has depth at most t. Since in each of the k rounds either the depth of our witness tree grows or a vertex fails, at least \(k-t\) vertices must have failed in their respective rounds. \(\square \)

Notice that the total number of sequences satisfying (2) in Lemma 9 is at most \(n(d + 1)^{k-1}\). Given a sequence of vertices \(P = (v_1,\ldots , v_k)\) satisfying (2), define \(X_{P}^{(i)}\) to be 1 if \(v_i \in G_i\) and \(v_i\) failed, and 0 otherwise. Let \(X_P = \sum _{i=1}^{k} X_{P}^{(i)}\). If a sequence satisfying (1–3) occurred, then there exists P such that \(X_P \ge k-t\). Since \(X^{(1)}_{P}, \ldots , X^{(i-1)}_{P}\) are determined by \(S_1,\ldots , S_{i-1}\) and \(G_1, \ldots , G_{i-1}\), \({\text {E}}(X^{(i)}_P \mid X^{(1)}_P,\ldots ,X^{(i-1)}_P) = {\text {E}}(X^{(i)}_P \mid S_1,\ldots , S_{i-1}, G_1, \ldots , G_{i-1}) \le q \mathop {=}\limits ^{\text {def}}\frac{1}{2e(d + 1)^4}\) by Lemma 8. Fixing \(t = k/2\), we have \(k-t = k/2 = e(d+1)^4\cdot kq\), while \({\text {E}}[X_{P}] \le kq\). By Lemma 19 (Conditional Chernoff Bound):

$$\begin{aligned} \Pr (X_P \ge k/2)&\le \left( \frac{e^{e(d + 1)^4 - 1}}{\left( e(d + 1)^4\right) ^{e(d + 1)^4}}\right) ^{\frac{k}{2e(d + 1)^4}} \\&\le \left( \frac{1}{(d+1)^2}\right) ^{k}. \end{aligned}$$

By the union bound over all possible P satisfying (2), the probability that any such sequence in Lemma 9 occurs is at most

$$\begin{aligned} n\left( d+1\right) ^{k-1} \cdot \left( \frac{1}{(d+1)^2} \right) ^{k} \le n \cdot \left( \frac{1}{d+1}\right) ^{k}. \end{aligned}$$

Moser and Tardos showed that the probability that any witness tree of size at least t occurs is at most \((1-\epsilon )^t\sum _{A \in \mathcal {A}} \frac{x(A)}{1-x(A)}\). Thus, either a witness tree of depth at least \(t = k/2\) occurs or there exists a sequence of vertices (as in Lemma 9) such that \(k - t = k/2\) of them failed. The probability that either of these occurs is at most \(n\cdot \left( \frac{1}{d+1}\right) ^{k} + (1-\epsilon )^{k/2}\sum _{A \in \mathcal {A}} \frac{x(A)}{1-x(A)}\) by the union bound.

3.3 A sublogarithmic algorithm

We have seen a faster algorithm for the LLL when the general condition \(ep(d+1) < 1\) is replaced by a stronger condition \(p\cdot f(d) < 1\), where f(d) grows faster than \(e(d+1)\). This raises the question of how much speedup a stronger condition can buy. In particular, does there exist an f(d) for which the LLL can be solved in time sublogarithmic in n, independent of d? We answer this affirmatively for an exponential function of d.

Inspired by [3], we take a two-stage approach. In the first stage, we run Algorithm 2 for k(n) rounds. Then we identify the dangerous events, namely those likely to become violated if some subset of their neighborhoods is resampled. We will show that there is a feasible solution obtained by re-assigning only the variables belonging to dangerous events. Moreover, we show the components induced by the dangerous events are likely to have weak diameter at most k(n), where the weak diameter of a component is the maximum distance, measured in the original graph, between any pair of vertices in the component. In the second stage, each component of dangerous events computes its part of the answer independently of the others, in time proportional to its weak diameter.

Theorem 4

(Asymmetric LLL) Let \(\Pr (A) \le P_2(A) \le 1\) and \(P_1(A) = 2^{d}\cdot \frac{\Pr (A)}{P_2(A)}\), where d is the maximum degree of the dependency graph. If there exist assignments of reals \(x_1, x_2 : \mathcal {A} \rightarrow (0,0.99]\) such that for all \(A \in \mathcal {A}\)

  1.

    \(P_1(A) \le (1-\epsilon ) x_1(A) \prod _{B \in {\varGamma }^3(A)} (1-x_1(B))\)

  2.

    \(P_2(A) \le x_2(A) \prod _{B \in {\varGamma }(A)}(1 - x_2(B))\)

then the LLL problem can be solved in \(O(\log _{1/(1-\epsilon )} n / \log \log _{1/(1-\epsilon )} n)\) rounds.

Proof Sketch of Theorem 4. Given an assignment of the variables, we classify the vertices into safe and dangerous ones. An event A is safe if, for every subset of its neighbors, the probability that A becomes violated when that subset is resampled is at most \(P_2(A)\). In contrast, A is dangerous if there exists a subset of neighbors whose resampling causes A to be violated with probability greater than \(P_2(A)\).

Using conditional probability, we can bound the probability that a vertex becomes dangerous after a random sampling of \({\text {vbl}}(A)\) by \(P_1(A) = 2^{d} \Pr (A) / P_2(A)\) (Lemma 10). Using Cond. 1 in Theorem 4, we show in Lemma 11 that after we resample dangerous vertices using the simple distributed algorithm for k rounds, if there exists a dangerous component whose weak diameter is at least k, then a 3-witness tree of size \({\varOmega }(k \log k)\) occurs. When \(k = {\varTheta }(\log n /\log \log n)\), this is a 3-witness tree of size \({\varOmega }(\log n)\), which occurs with probability at most \(1/{\text {poly}}(n)\). Therefore, with high probability, after \(O(\log n/ \log \log n)\) rounds of resampling, the weak diameters of the dangerous components are bounded by \(O(\log n /\log \log n)\). Finally, a feasible assignment for a dangerous component can be found in \(O(\log n/ \log \log n)\) rounds locally, independently of other dangerous components, which can be argued using Cond. 2 in Theorem 4 and the definition of dangerous vertices.

Proof of Theorem 4

Fix \(\emptyset \subseteq \mathcal {D} \subseteq {\varGamma }(A)\), and let \(T_{\mathcal {D}}\) denote the set of assignments b for \({\text {vbl}}(A) \setminus {\text {vbl}}(\mathcal {D})\) such that \(b \in T_{\mathcal {D}}\) iff, when the variables in \({\text {vbl}}(A) \setminus {\text {vbl}}(\mathcal {D})\) are fixed to b, the probability that A becomes violated after sampling the variables in \({\text {vbl}}(\mathcal {D})\) exceeds \(P_2(A)\), that is,

$$\begin{aligned} T_{\mathcal {D}} = \{b \mid \Pr (A \mid {\text {vbl}}(A) \setminus {\text {vbl}}(\mathcal {D}) = b) > P_2(A) \} \end{aligned}$$

Given an assignment of the variables of A, we call A “dangerous” if there exists \(\emptyset \subseteq \mathcal {D} \subseteq {\varGamma }(A)\) such that the current assignment to \({\text {vbl}}(A) \setminus {\text {vbl}}(\mathcal {D})\) lies in \(T_{\mathcal {D}}\). Otherwise, A is “safe”. Notice that if A is violated then A is also dangerous, as witnessed by \(\mathcal {D} = \emptyset \).
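A brute-force sketch of this classification; it enumerates all \(2^{|{\varGamma }(A)|}\) subsets, mirroring the union bound in Lemma 10 below. The helper `violation_prob`, which evaluates \(\Pr (A \mid \text {the frozen variables keep their values})\), is hypothetical, and computing it exactly may itself be expensive:

```python
from itertools import chain, combinations

def is_dangerous(A, gamma, vbl, assignment, p2, violation_prob):
    """Decide whether event A is dangerous under the current assignment.

    gamma[A]: Gamma(A); vbl[B]: variable set of event B
    violation_prob(A, frozen): Pr[A occurs] when the variables in the dict
    `frozen` keep their current values and all others are freshly sampled
    (hypothetical helper).
    """
    neighbors = list(gamma[A])
    all_subsets = chain.from_iterable(
        combinations(neighbors, r) for r in range(len(neighbors) + 1))
    for D in all_subsets:                    # every subset, including empty
        freed = set().union(*(vbl[B] for B in D)) if D else set()
        frozen = {v: assignment[v] for v in vbl[A] - freed}
        if violation_prob(A, frozen) > p2[A]:
            return True                      # resampling vbl(D) endangers A
    return False
```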


Lemma 10

The probability that A becomes dangerous after (re)sampling \({\text {vbl}}(A)\) is at most \(P_1(A)\).

Proof

By the union bound over each subset of neighbors, the probability that A becomes dangerous after sampling or resampling variables in \({\text {vbl}}(A)\) is at most

$$\begin{aligned}&\sum _{\emptyset \subseteq \mathcal {D} \subseteq {\varGamma }(A)} \Pr ({\text {vbl}}(A) \setminus {\text {vbl}}(\mathcal {D}) \in T_{\mathcal {D}}) \\&\quad =\sum _{\emptyset \subseteq \mathcal {D} \subseteq {\varGamma }(A)}\sum _{b \in T_{\mathcal {D}}} \Pr ({\text {vbl}}(A) \setminus {\text {vbl}}(\mathcal {D}) = b) \\&\quad =\sum _{\emptyset \subseteq \mathcal {D} \subseteq {\varGamma }(A)}\sum _{b \in T_{\mathcal {D}}} \frac{\Pr (A \cap ({\text {vbl}}(A) \setminus {\text {vbl}}(\mathcal {D}) = b))}{\Pr (A \mid {\text {vbl}}(A) \setminus {\text {vbl}}(\mathcal {D}) = b)} \\&\quad \le \sum _{\emptyset \subseteq \mathcal {D} \subseteq {\varGamma }(A)} \sum _{b \in T_{\mathcal {D}}} \frac{\Pr (A \cap ({\text {vbl}}(A) \setminus {\text {vbl}}(\mathcal {D}) = b))}{P_2(A)}\\&\quad \le \sum _{\emptyset \subseteq \mathcal {D} \subseteq {\varGamma }(A)} \frac{\Pr (A)}{P_2(A)} \\&\quad \le 2^d \cdot \frac{\Pr (A)}{P_2(A)} = P_1(A). \end{aligned}$$

\(\square \)

For each A, we define a new event \(A'\): that A becomes violated after resampling the variables of the dangerous events. Let \(\mathcal {A'}\) denote the set of all new events. If A is safe, then \(\Pr (A') \le P_2(A)\) by the definition of safe. If A is dangerous, then \(\Pr (A') = \Pr (A) \le P_2(A)\). By the second condition in Theorem 4, there exists \(x': \mathcal {A'} \rightarrow (0,0.99]\) such that \(\Pr (A') \le x'(A') \prod _{B' \in {\varGamma }(A')} (1 - x'(B'))\) for all \(A' \in \mathcal {A}'\). Therefore, by the standard asymmetric LLL, with non-zero probability no new event \(A' \in \mathcal {A}'\) occurs. This implies that a feasible solution exists in which only the variables of the dangerous events are reassigned.

Let \(E' \subseteq E\) be the edges having at least one dangerous endpoint, and let \(G'\) be the graph induced by \(E'\). Each component of \(G'\) can compute the feasible solution independently of the other components. (It is tempting to consider the components induced by only the dangerous vertices. However, when two such components \(C_1\) and \(C_2\) are both adjacent to a safe vertex u, we have to consider \(C_1\) and \(C_2\) simultaneously to find an assignment that does not cause the event at u to occur.)

Next we will show that the weak diameter of each component in \(G'\) is bounded. Note that if the weak diameter of each component in \(G'\) is at most D, then each component can find the feasible solution in O(D) time. Each vertex will first learn the topology up to distance D, which is possible in the LOCAL model. Then the leader in each component (say the vertex with the smallest ID) computes the feasible solution locally and then broadcasts the solution back to other vertices in the component.

Lemma 11

Suppose the conditions in Theorem 4 hold and there exists a component of \(G'\) with weak diameter at least k after running k rounds of the simple distributed algorithm. Then a 3-witness tree of size \({\varOmega }(k \log k)\) occurs.

Proof

Suppose there exist u, v in the same component of \(G'\) with \({\text {dist}}_G(u,v) = D \ge k\). Since u and v are connected in \(G'\), there exists a shortest u-v path \(P_{uv}\) in \(G'\), of length at least D. Notice that \(P_{uv}\) contains no two consecutive safe vertices, by the definition of \(G'\). Recall that \(S_{i}\) is the set of events resampled in round i. Let \(L_{k+1}\) be the set of dangerous vertices in \(P_{uv}\). Ideally, one would build \(|L_{k+1}|\) 2-witness trees of depth k, one rooted at each vertex in \(L_{k+1}\), and then glue them together into a 3-witness tree of size \(k \cdot |L_{k+1}|\). However, these 2-witness trees may overlap, so the resulting 3-witness tree may be much smaller. In the following, we lower bound, level by level, the size of the union of these 2-witness trees, and thereby lower bound the size of the final 3-witness tree.

For each dangerous vertex x in \(P_{uv}\) (i.e., \(x \in L_{k+1}\)), define \(L_{k+1}(x) = \{x \}\). For \(1 \le i \le k\), define \(L_{i}(x)\) inductively to be the set of events resampled during round i that are within distance 2 of some event in \(L_{i+1}(x)\). Define \(L_{i} = \bigcup _{x \in P_{uv}}L_{i}(x)\). For each \(1\le i \le k\), we will show that \(|L_{i}| \ge \frac{D-2}{4(k-i+1)+2}\).

Notice that \(L_i(x)\) must be non-empty because, by Lemma 3, for each \(k+1 \ge j > i\) and each vertex \(w_j\) in \(L_{j}(x)\), there exists a vertex \(w_{j-1} \in S_{j-1}\) such that \(w_{j-1} \in {\varGamma }^{2+}(w_{j})\). Also, for all \(w \in L_{i}(x)\), \({\text {dist}}_G(x,w) \le 2(k-i+1)\): by the definition of \(L_{i}(x)\), there exists a sequence of vertices \((x= v_{k+1}, v_{k}, \ldots , v_{i} = w)\) such that \(v_{i'} \in L_{i'}(x)\) for \(k+1 \ge i' \ge i\) and \({\text {dist}}_{G}(v_{i'+1}, v_{i'}) \le 2\) for \(k+1 > i'\ge i\).

Let \(P_{uv} = \{x_0,x_1,\ldots , x_{|P_{uv}|}\}\). Let \(j=0\) if \(x_0\) is dangerous; otherwise \(x_1\) must be dangerous and we let \(j=1\). Repeat the following procedure (see Fig. 1): Select any \(w \in L_i(x_j)\). Note that \(x_j\) must be dangerous, so \(L_i(x_j)\) is well-defined. Let \(x_{j'}\) be the rightmost vertex in \(P_{uv}\) such that \(w \in L_i(x_{j'})\) (it can be the case that \(j'=j\)). If \(x_{j'+1}\) is dangerous, set \(j \leftarrow j'+1\); otherwise \(x_{j'+2}\) must be dangerous, and we set \(j \leftarrow j'+2\). Repeat until \(j > |P_{uv}|\).

Fig. 1 (figure omitted) An illustration of an iteration of the procedure for lower bounding \(|L_i|\). The dashed lines are paths of length at most \(2(k-i+1)\). In this iteration, the difference \({\varDelta }\) between the new and old positions of j is 5. Therefore, if \(2\cdot 2(k-i+1) + 2 < 5\), then the detour from \(x_j\) to \(x_{j'}\) via \(L_i(x_j)\) would be shorter than the distance between \(x_j\) and \(x_{j'}\) on \(P_{uv}\)

Fig. 2 (figure omitted) An illustration showing that each resampled event in \(L_i\) is in the 3-witness tree rooted at \(y_s\). The vertices inside the boxes form the independent set I. The dashed line is a sequence of vertices in which adjacent vertices are within distance 2. The arrows link vertices that are within distance 3

\(|L_i|\) is at least the total number of iterations l of the procedure above. We will show that we cannot move too far in each iteration, for otherwise there would be a path connecting u and v shorter than \({\text {dist}}_G(u,v)\). Let \({\varDelta }_t\) be the difference between the values of j at the end and at the beginning of iteration t. The procedure terminates only if \(\sum _{t=1}^{l} {\varDelta }_t \ge |P_{uv}|-2\) (the minus 2 comes from the fact that the first and last vertices of \(P_{uv}\) may be safe). Consider iteration t: if \({\varDelta }_t > 4(k-i+1) + 2\), it must reduce the distance between u and v by at least \({\varDelta }_t - 4(k-i+1) - 2\). However, the total distance we can reduce is at most \(|P_{uv}| - D\), for otherwise we would have a path connecting u and v of length less than D, contradicting \({\text {dist}}_G(u,v)=D\). Therefore,

$$\begin{aligned} |P_{uv}| - D&\ge \sum _{t=1}^{l}\left( {\varDelta }_t - 4(k-i+1) - 2\right) \\&\ge \left( \sum _{t=1}^{l}{\varDelta }_t\right) - \left( 4(k-i+1) + 2\right) l \\&\ge |P_{uv}| - 2- \left( 4(k-i+1) + 2\right) l \end{aligned}$$

which implies

$$\begin{aligned} l \ge \frac{D-2}{4(k-i+1) + 2} \ge \frac{k-2}{4(k-i+1) + 2}. \end{aligned}$$

Next, we show that we can glue all the resampled events in \(L_1, \ldots , L_k\) into a single 3-witness tree (see Fig. 2). We select an independent set \(I = \{y_1,\ldots ,y_s\} \subseteq L_{k+1}\) by starting from the leftmost vertex in \(L_{k+1}\) and repeatedly selecting the first non-adjacent vertex in \(L_{k+1}\). Therefore, \(y_{j+1}\) is within distance 3 of \(y_{j}\) for \(1\le j < s\). Also, each \(x_j \in L_{k+1}\) is adjacent to at least one vertex in I. Since I is an independent set, we can append \(y_1,\ldots ,y_s\) to our record artificially. We claim that each node in \(L_i\) for \(1 \le i \le k\) corresponds to a node in the 3-witness tree rooted at \(y_s\). For every node w in \(L_i\), there must exist \(x \in L_{k+1}\) such that \(w \in L_i (x)\). Since x is adjacent to some \(y_j \in I\), w is in the 3-witness tree rooted at \(y_j\). Finally, since \(y_j\) is a node in the 3-witness tree rooted at \(y_s\), w must also be a node in the 3-witness tree rooted at \(y_s\). The 3-witness tree rooted at \(y_s\) must therefore have size at least \(\sum _{i=1}^{k} \frac{k-2}{4(k-i+1) + 2} = {\varOmega }(k \log k)\). \(\square \)

By choosing \(k = {\varOmega }\left( \frac{\log _{1/(1-\epsilon )} n}{\log \log _{1/(1-\epsilon )} n}\right) \), if there exists a component in \(G'\) with weak diameter at least k, then a 3-witness tree of size \({\varOmega }(\log _{1/(1-\epsilon )} n)\) occurs. However, by Condition 1 in Theorem 4 and by Lemma 5, the probability that such a 3-witness tree occurs is at most \(1/{\text {poly}}(n)\). Therefore, we conclude that after \(O\left( \frac{\log _{1/(1-\epsilon )} n}{\log \log _{1/(1-\epsilon )} n}\right) \) rounds, w.h.p. the weak diameter of each component in \(G'\) is at most \(O\left( \frac{\log _{1/(1-\epsilon )} n}{\log \log _{1/(1-\epsilon )} n}\right) \), and the solution can be found in time proportional to the weak diameter. This completes the proof of Theorem 4. \(\square \)

Corollary 3

(Symmetric LLL) Suppose that for all \(A \in \mathcal {A}\), \(\Pr (A) \le p\) and A shares variables with at most d other events in \(\mathcal {A}\). Let \(z = 4ep2^d d^4\). If \(z < 1\), then a satisfying assignment can be found in \(O(\log _{1/z} n / \log \log _{1/z} n)\) rounds.

Proof of Corollary 3

For each \(A \in \mathcal {A}\), let \(P_2(A) = \frac{1}{4d} \ge p \ge \Pr (A)\) and so \(P_1(A) = 2^d\cdot \frac{\Pr (A)}{P_2(A)} \le 4 p d 2^d\). Let \(x_1(A) = 1/d^3\), \(x_2(A) = 1/(2d)\) and \(1 - \epsilon = 4ep2^d d^4\). First, we check that condition 1 in Theorem 4 holds

$$\begin{aligned}&(1-\epsilon )x_1(A) \prod _{B \in {\varGamma }^{3}(A)} (1 - x_1(B)) \\&\quad = 4ep2^d d^4 \cdot \frac{1}{d^3} \cdot \left( 1 - \frac{1}{d^3} \right) ^{|{\varGamma }^{3}(A)|} \\&\quad \ge 4ep2^d d \left( 1 - \frac{1}{d^3}\right) ^{d^3 - 1} \\&\quad \ge 4p2^d d \ge P_1(A). \end{aligned}$$

Condition 2 also holds similarly,

$$\begin{aligned} x_2(A) \prod _{B \in {\varGamma }(A)}(1-x_2(B))&\ge \frac{1}{2d} \cdot \left( 1-\frac{1}{2d}\right) ^{d} \\&\ge \frac{1}{2d} \cdot \frac{1}{2} = \frac{1}{4d} = P_2(A). \quad \end{aligned}$$

\(\square \)

3.4 Lower bound

Linial [26] proved that in an n-vertex ring, any distributed \((\log ^{(k)} n)\)-coloring algorithm requires \({\varOmega }(k)\) rounds of communication, even if randomization is used. In particular, O(1)-coloring a ring requires \({\varOmega }(\log ^* n)\) time. We prove that Linial’s lower bound implies that even weak versions of the Lovász local lemma cannot be computed in constant time.

Theorem 5

Let \(\mathcal {P}\), \(\mathcal {A}\), and \(G_{\mathcal {A}}\) be defined as usual. Let d be the maximum degree of any vertex in \(G_{\mathcal {A}}\), \(p = \max _{A\in \mathcal {A}} \Pr (A)\) be the maximum probability of any bad event, and \(f : \mathbb {N}\rightarrow \mathbb {N}\) be an arbitrarily quickly growing function, where \(f(d) \ge e(d+1)\). If \(p\cdot f(d) < 1\) then \(\Pr (\bigcap _{A\in \mathcal {A}} \overline{A}) > 0\). However, \({\varOmega }(\log ^* |\mathcal {A}|)\) rounds of communication are required for the vertices of \(G_{\mathcal {A}}\) to agree on a point in \(\bigcap _{A\in \mathcal {A}} \overline{A}\).

The purpose of the function f is to show that our lower bound is insensitive to significant weakening of the standard criterion “\(ep(d+1)<1\).” We could just as easily substitute \(e^{e^{d}} p < 1\) or any similar criterion, for example.

Proof

Consider the following coloring procedure. Each vertex in an n-vertex ring selects a color from \(\{1,\ldots ,c\}\) uniformly at random. An edge is bad if it is monochromatic, an event that holds with probability \(p=1/c\). The dependency graph \(G_{\mathcal {A}}\) of these events has maximum degree \(d=2\); choose c to be (the constant) \(f(2)+1\), for any quickly growing function f. It follows from the LLL that a good c-coloring exists since \(p\cdot f(2) < 1\). However, by [26], the vertices of \(G_{\mathcal {A}}\) require \({\varOmega }(\log ^* n - \log ^* c) = {\varOmega }(\log ^* n)\) time to find a good c-coloring. \(\square \)

It is also possible to obtain conditional lower bounds on distributed versions of the LLL. For example, the best known randomized \(O({\varDelta })\)-coloring algorithm takes \(\exp (O(\sqrt{\log \log n}))\) time [6], though better bounds are possible if \({\varDelta } \gg \log n\) [40]. If the LLL could be solved in less than \(\exp (O(\sqrt{\log \log n}))\) time then we could improve on [6], as follows. Each vertex in G selects a color from a palette of size \(c \ge 2e{\varDelta }\) uniformly at random. As usual, an edge is bad if it is monochromatic. The dependency graph of these bad events corresponds to the line graph of G, which has maximum degree \(d=2{\varDelta }-2\). Since \(e(1/c)(d+1) < 1\), a valid coloring can be found with one invocation of an LLL algorithm. Therefore, if the result of [6] turns out to be tight, then there is an \(\exp (O(\sqrt{\log \log n}))\) time lower bound for the LLL.
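To make the reduction concrete, here is a small sketch (our own illustration of the argument above) that phrases \(O({\varDelta })\)-coloring as a single LLL instance:

```python
import math

def coloring_as_lll_instance(adj, delta):
    """Variables: vertex colors from a palette of size c >= 2e*Delta.
    Bad events: monochromatic edges. The instance satisfies the symmetric
    LLL criterion, so one LLL invocation yields a proper coloring."""
    c = math.ceil(2 * math.e * delta)            # palette size
    p = 1.0 / c                                  # Pr[an edge is monochromatic]
    d = 2 * delta - 2                            # max degree of the line graph
    assert math.e * p * (d + 1) < 1              # e(1/c)(d+1) < 1
    edges = {(u, v) for u in adj for v in adj[u] if u < v}
    return c, {e: set(e) for e in edges}         # event -> its two variables
```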

4 Applications

The Lovász local lemma has applications in many coloring problems, such as list coloring, frugal coloring, total coloring, and coloring triangle-free graphs [29]. We give a few examples of constructing these colorings distributively. In these applications, the existential bounds are usually achieved by the so-called “Rödl nibble” method or the semi-random method. The method consists of one or more iterations. Each iteration is a random process in which certain local properties are maintained in the graph; the properties depend on the randomness within a constant radius. Each property is associated with a bad event, namely the event that the property fails to hold. The Lovász local lemma can then be used to show that the probability none of the bad events holds is positive, though it may be exponentially small in the size of the graph. This probability can be amplified in a distributed fashion using a Moser–Tardos-type resampling algorithm. Notice that we will need to find an independent set (e.g., an MIS, a Weak-MIS, or the set of events with locally minimal IDs) in the dependency graph induced by the violated local properties. Since we assume the LOCAL model, the violated local properties can be identified in constant time and the algorithms for MIS/Weak-MIS can be simulated with a constant factor overhead, where each property is taken care of by a nearby processor (within constant distance). The important point here is that the dependency graph and the underlying distributed network are sufficiently similar that distributed algorithms on one topology can be simulated on the other with O(1) slowdown. For a simple example, see the defective coloring problem in the following subsection, where the dependency graph is \(G^2\) (i.e., nodes are adjacent in \(G^2\) iff they are within distance 2 in G).

Most applications of the LLL demand \(epd^2 < 1\) or even weaker bounds. In this case, the efficient simple distributed algorithm can be applied. (The local properties often state that certain quantities do not deviate too much from their expectations, so the failure probability of each local property is typically bounded via standard Chernoff-type concentration inequalities.)

4.1 Distributed defective coloring

We begin with a simple single-iteration application that uses the local lemma. Let \(\phi :V\rightarrow \{1,2,\ldots ,k\}\) be a k-coloring. Define \({\text {def}}_{\phi }(v)\) to be the number of neighbors \(w \in N(v)\) such that \(\phi (v) = \phi (w)\). The coloring \(\phi \) is said to be f-defective if \(\max _{v} {\text {def}}_{\phi }(v) \le f\). Barenboim and Elkin ([4], Open Problem 10.7) raised the problem of devising an efficient distributed algorithm for computing an f-defective \(O({\varDelta }/f)\)-coloring. Note that this problem is equivalent to partitioning the vertices into \(O({\varDelta }/f)\) sets such that each set induces a subgraph with maximum degree f.

To warm up, we give a simple procedure for obtaining an f-defective \(O({\varDelta }/ f)\)-coloring in \(O(\log n / f)\) time w.h.p., for \(f \ge 60\ln {\varDelta }\). Suppose each vertex colors itself with a color selected from \(\{1, 2, \ldots , \lceil 2{\varDelta } / f \rceil \}\) uniformly at random. For every \(v \in N(u)\), let \(X_v\) be 1 if v is colored the same as u, and 0 otherwise. Let \(X = \sum _{v \in N(u)} X_v\) denote the number of neighbors colored the same as u, and let \(A_u\) denote the bad event that \(X > f\) at u. Clearly, whether \(A_u\) occurs is locally checkable by u in one round. Moreover, the event \(A_u\) depends only on the random choices of u and its neighbors. If \(A_u\) occurred and is selected for resampling, the colors chosen by u and its neighbors are resampled. Since two events share variables only if they are within distance two, the dependency graph \(G_{\mathcal {A}}\) is \(G^2\), which has maximum degree \(d = {\varDelta }^2\). Now we calculate the probability that \(A_u\) occurs. If we expose the choice of u first, then \(\Pr (X_v = 1) \le f / (2{\varDelta })\), independently over \(v \in N(u)\). Letting \(M = f /2\), we have \({\text {E}}[X] \le f / 2 = M\), so by Lemma 18, \(\Pr (X > f) \le e^{-f / 6}\). Therefore, \(epd^2 \le e^{-(f/6 - 1 - 4 \ln {\varDelta } )} \le e^{-f/12} \), since \(f \ge 60\ln {\varDelta }\). Using the simple distributed algorithm, it takes \(O(\log _{1/ epd^2} n ) = O(\log n/ f)\) rounds to avoid the bad events w.h.p.
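A centralized sketch of this warm-up procedure, with Algorithm 2 run on the dependency graph \(G^2\) (a toy simulation of the distributed algorithm, using vertex IDs as event IDs):

```python
import math, random

def defective_coloring_warmup(adj, delta, f, ids):
    """f-defective ceil(2*delta/f)-coloring for f >= 60*ln(delta), by
    resampling bad events A_u at local ID minima of G_F (subgraph of G^2).
    adj: dict vertex -> set of neighbors."""
    palette = range(1, math.ceil(2 * delta / f) + 1)
    color = {v: random.choice(palette) for v in adj}
    dist2 = {u: {w for v in adj[u] for w in adj[v] | {v}} - {u} for u in adj}
    while True:
        # A_u is violated if more than f neighbors share u's color
        bad = {u for u in adj
               if sum(color[v] == color[u] for v in adj[u]) > f}
        if not bad:
            return color
        for u in bad:
            # local minima are pairwise at distance >= 3, so their
            # resampled variable sets vbl(A_u) = {u} + N(u) are disjoint
            if all(ids[u] < ids[w] for w in dist2[u] & bad):
                color[u] = random.choice(palette)
                for v in adj[u]:
                    color[v] = random.choice(palette)
```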

Next, we show that there is a constant \(C > 0\) such that for any \(f \ge C\), an f-defective \(O({\varDelta }/f)\)-coloring can be obtained in \(O(\log n / f)\) rounds. For \(f < C\), we can use \(({\varDelta }+1)\)-coloring algorithms to obtain a 0-defective (proper) \(({\varDelta }+1)\)-coloring in \(O(\log n)\) rounds. Let \({\varDelta }_0 = {\varDelta }\) and \({\varDelta }_i = \log ^{3}{\varDelta }_{i-1}\).

[Algorithm 4: defective-coloring\((G', i)\) (pseudocode figure omitted)]

An f-defective \(O({\varDelta } / f)\)-coloring of G can be obtained by calling defective-coloring(G, 1), described in Algorithm 4. The procedure defective-coloring(\(G'\), i) is recursive and halts when \(f \ge 60\log {\varDelta }_{i-1}\); when this condition holds, we use the procedure described above to obtain an f-defective \((2{\varDelta }_{i-1}/f)\)-coloring of \(G'\). Let l denote the total number of levels of the recursion. The final color of node v is a vector \((c_1,c_2, \ldots , c_l)\), where \(c_i\) denotes the color received by v at level i. Clearly, such a coloring is f-defective. The total number of colors used is:

$$\begin{aligned}&\left( \prod _{1\le i< l} \left( 1+6{\varDelta }^{-1/3}_i\right) \cdot \frac{{\varDelta }_{i-1}}{{\varDelta }_i} \right) \cdot \frac{2{\varDelta }_{l-1}}{f} \\&\quad = 2({\varDelta }/f) \cdot \prod _{1 \le i < l} \left( 1 + \frac{6}{\log \underbrace{\log ^{3} \ldots \log ^{3} {\varDelta }}_{i-1}} \right) \\&\quad = O({\varDelta } / f). \end{aligned}$$

Now we analyze the number of rounds needed at each level i. Suppose that each vertex colors itself with a color selected from \(\{1, 2, \ldots , \lceil (1+6{\varDelta }^{-1/3}_{i})\cdot \frac{{\varDelta }_{i-1}}{ {\varDelta }_{i}} \rceil \}\) uniformly at random. For every \(v \in N(u)\), let \(X_v\) be 1 if v is colored the same as u, and 0 otherwise. Let \(X = \sum _{v \in N_{G'}(u)} X_v\) denote the number of neighbors colored the same as u, and let \(A_u\) denote the bad event that \(X > {\varDelta }_i\) at u. The dependency graph \(G_\mathcal {A}\) has maximum degree \(d = {\varDelta }_{i-1}^2\), because two events share variables only if they are within distance two. If we expose the choice of u first, then \(\Pr (X_v = 1) \le \frac{{\varDelta }_i}{{\varDelta }_{i-1}} \cdot \frac{1}{1+6{\varDelta }^{-1/3}_i}\), independently over \(v \in N_{G'}(u)\). Since the maximum degree of \(G'\) is \({\varDelta }_{i-1}\), \({\text {E}}[X] \le {\varDelta }_i \cdot \frac{1}{1+6{\varDelta }^{-1/3}_{i}} \). By the Chernoff bound (Lemma 18),

$$\begin{aligned} \Pr (A_u)&= \Pr (X> {\varDelta }_i) \\&\le \Pr \left( X > \left( 1+6{\varDelta }^{-1/3}_i\right) \cdot {\text {E}}[X]\right) \\&\le e^{-6^2{\varDelta }_{i}^{-2/3}\cdot {\text {E}}[X]/3} \\&\le e^{-6{\varDelta }_i^{1/3}} = e^{-6\ln {{\varDelta }_{i-1}}}. \end{aligned}$$

Therefore, \(epd^2 \le e^{-\ln {\varDelta }_{i-1}}\) and so Algorithm 2 runs in \(O(\log n / \log {{\varDelta }_{i-1}})\) rounds. The total number of rounds over all levels is therefore

$$\begin{aligned}&O\left( \log n \cdot \left( \frac{1}{\log {\varDelta }} + \frac{1}{\log \log ^{3} {\varDelta }} + \cdots + \frac{1}{\log {{\varDelta }_{l-1}}} + \frac{1}{f} \right) \right) \\&\quad = O\left( \frac{\log n}{f}\right) . \end{aligned}$$

4.2 Distributed frugal coloring

A \(\beta \)-frugal coloring of a graph G is a proper vertex coloring of G in which no color appears more than \(\beta \) times in any neighborhood. Molloy and Reed [29] showed the following using an asymmetric version of the local lemma:

Theorem 6

For any constant integer \(\beta \ge 1\), if G has maximum degree \({\varDelta } \ge \beta ^{\beta }\) then G has a \(\beta \)-frugal proper vertex coloring using at most \(16{\varDelta }^{1+\frac{1}{\beta }}\) colors.

Here we outline their proof and show how to turn it into a distributed algorithm that finds such a coloring in \(O(\log n \cdot \log ^2 {\varDelta })\) rounds. If \(\beta = 1\), simply consider the square graph of G, obtained by adding edges between vertices at distance 2; a proper coloring of the square graph is a 1-frugal coloring of G. Since the square graph has maximum degree at most \({\varDelta }^2\), it can be \(({\varDelta }^2 + 1)\)-colored by simulating distributed algorithms for \(({\varDelta } + 1)\)-coloring.

For \(\beta \ge 2\), let \(k = 16 {\varDelta }^{1+\frac{1}{\beta }}\). Suppose that each vertex colors itself with one of the k colors uniformly at random. Consider two types of bad events. For each edge uv, the Type I event \(A_{u,v}\) denotes that u and v are colored the same. For each subset \(\{u_1,\ldots ,u_{\beta + 1} \}\) of the neighborhood of a vertex, the Type II event \(A_{u_1,\ldots , u_{\beta +1}}\) denotes that \(u_1, \ldots , u_{\beta +1}\) are all colored the same. If none of these events occur, then the random coloring is a \(\beta \)-frugal coloring. For each Type I event \(A_{u,v}\), \(\Pr (A_{u,v}) \le 1/k\). For each Type II event \(A_{u_1,\ldots , u_{\beta +1}}\), \(\Pr (A_{u_1,\ldots , u_{\beta +1}}) \le 1/k^{\beta }\). For each bad event A, let \(x(A) = 2\Pr (A)\). Since \(x(A) \le 1/2\), we have:

$$\begin{aligned}&x(A) \prod _{B \in {\varGamma }(A) } (1-x(B)) \\&\quad \ge x(A) \prod _{B \in {\varGamma }(A)} \exp \left( {-x(B) \cdot 2\ln 2 }\right) \\&\qquad \{ (1-x)\ge e^{-x \cdot 2\ln 2} \hbox { for } x \le 1/2\} \\&\quad = x(A)\cdot \exp \left( -2\ln 2 \cdot \sum _{B \in {\varGamma }(A)} 2\Pr (B) \right) \end{aligned}$$

Since A shares variables with at most \((\beta + 1){\varDelta }\) Type I events and \((\beta + 1){\varDelta } \left( {\begin{array}{c}{\varDelta }\\ \beta \end{array}}\right) \) Type II events,

$$\begin{aligned} \sum _{B \in {\varGamma }(A)} \Pr (B)&\le (\beta + 1){\varDelta } \cdot \frac{1}{k} + (\beta + 1){\varDelta } \left( {\begin{array}{c}{\varDelta }\\ \beta \end{array}}\right) \cdot \frac{1}{k^{\beta }} \\&< \frac{(\beta + 1){\varDelta }}{k} + \frac{(\beta + 1){\varDelta }^{\beta + 1}}{\beta ! k^{\beta }} \\&=\frac{\beta +1}{16{\varDelta }^{\frac{1}{\beta }}} + \frac{\beta +1}{\beta !(16)^{\beta }} \\&< 1/8 \\&\qquad \qquad \qquad \qquad \qquad \qquad \hbox {for }{\varDelta } \ge \beta ^{\beta }\hbox { and } \beta \ge 2 \end{aligned}$$

Therefore,

$$\begin{aligned} x(A) \prod _{B \in {\varGamma }(A) } (1-x(B))&\ge x(A) \exp \left( -\frac{\ln 2}{2} \right) \\&= \sqrt{2} \cdot \Pr (A). \end{aligned}$$

By letting \(1-\epsilon = 1/\sqrt{2}\) in Theorem 3, we need at most \(O(\log _{\sqrt{2}} n)\) rounds of weak MIS resampling. In each resampling round, we first have to identify the violated bad events. A Type I event \(A_{u,v}\) can be identified by either u or v in a constant number of rounds; ties are broken by letting the endpoint with the smaller ID check it. If \(\{u_1,\ldots ,u_{\beta +1} \}\) lies in the neighborhood of u, then the Type II event \(A_{u_1,\ldots ,u_{\beta +1}}\) is checked by u; if it lies in the neighborhood of multiple nodes, ties are broken by letting the one with the smallest ID check it. All Type II events in the neighborhood of u can be identified from the colors selected by u's neighbors. Next, we find a weak MIS in the subgraph of the dependency graph induced by the violated events. Each node simulates the weak MIS algorithm on the events it is responsible for checking, and each round of the weak MIS algorithm in the dependency graph can be simulated in a constant number of rounds. The maximum degree d of the dependency graph is \(O((\beta + 1){\varDelta } \left( {\begin{array}{c}{\varDelta }\\ \beta \end{array}}\right) )\). Therefore, we need at most \(O(\log n \cdot \log ^2 d) = O(\log n \cdot \log ^2 {\varDelta })\) rounds, since \(\beta \) is a constant and \((\beta + 1){\varDelta } \left( {\begin{array}{c}{\varDelta }\\ \beta \end{array}}\right) \le (\beta + 1){\varDelta }^{\beta + 1} = {\text {poly}}({\varDelta })\).
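For concreteness, the following centralized Python sketch samples one random coloring and enumerates the violated Type I and Type II events that the distributed implementation detects locally; the interface names are ours.

```python
import random
from itertools import combinations

def frugal_bad_events(adj, beta, k):
    """Sample one random k-coloring and list the violated bad events.
    adj maps a vertex ID to its neighbor list; k = 16 * Delta**(1+1/beta).
    Type I: a monochromatic edge.  Type II: beta+1 vertices of one color
    inside a single neighborhood (enumerating subsets is exponential in
    beta, which is acceptable since beta is a constant)."""
    color = {u: random.randrange(k) for u in adj}
    type1 = [(u, v) for u in adj for v in adj[u]
             if u < v and color[u] == color[v]]
    type2 = set()
    for u in adj:                      # u checks events in its own neighborhood
        by_color = {}
        for v in adj[u]:
            by_color.setdefault(color[v], []).append(v)
        for mono in by_color.values():
            for ev in combinations(sorted(mono), beta + 1):
                type2.add(ev)          # duplicates across neighborhoods collapse
    return color, type1, type2
```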

4.2.1 \(\beta \)-frugal, \((\varDelta + 1)\)-coloring

The frugal \(({\varDelta }+1)\)-coloring problem for general graphs was studied by Hind, Molloy, and Reed [21], Pemmaraju and Srinivasan [36], and Molloy and Reed [30]. In particular, the last of these gave an upper bound of \(O(\log {\varDelta } / \log \log {\varDelta })\) on the frugality of \(({\varDelta } + 1)\)-coloring. This is optimal up to a constant factor, because it matches the lower bound of \({\varOmega }(\log {\varDelta } / \log \log {\varDelta })\) given by Hind et al. [21]. However, it is not obvious whether their result can be implemented efficiently in a distributed fashion, because they used a structural decomposition computed by a sequential algorithm. Pemmaraju and Srinivasan [36] showed an existential upper bound of \(O(\log ^2 {\varDelta } / \log \log {\varDelta })\). Furthermore, they gave a distributed algorithm that computes an \(O\big (\log {\varDelta } \cdot \frac{\log n}{\log \log n}\big )\)-frugal, \(({\varDelta } + 1)\)-coloring in \(O(\log n)\) rounds. We show how to improve this to find an \(O(\log ^2 {\varDelta } / \log \log {\varDelta })\)-frugal, \(({\varDelta } + 1)\)-coloring, also in \(O(\log n)\) rounds.

Pemmaraju and Srinivasan proved the following theorem:

Theorem 7

Let G be a graph with maximum vertex degree \({\varDelta }\). Suppose that associated with each vertex \(v \in V\), there is a palette P(v) of colors, where \(|P(v)| \ge \deg (v) + 1\). Furthermore, suppose \(|P(v)| \ge {\varDelta } / 4\) for all vertices v in G. Then, for some subset \(C \subseteq V\), there is a list coloring of the vertices in C such that:

(a) G[C] is properly colored.

(b) For every vertex \(v \in V\) and every color x, there are at most \(9 \cdot \frac{\ln {\varDelta }}{\ln \ln {\varDelta }}\) neighbors of v colored x.

(c) For every vertex \(v \in V\), the number of neighbors of v not in C is at most \({\varDelta }(1-\frac{1}{e^5}) + 27 \sqrt{{\varDelta } \ln {\varDelta }}\).

(d) For every vertex \(v \in V\), the number of neighbors of v in C is at most \(\frac{{\varDelta }}{e^5} + 27 \sqrt{{\varDelta } \ln {\varDelta }}\).

The theorem was obtained by applying the LLL to the following random process. Suppose that each vertex v has a unique ID. Every vertex picks a color uniformly at random from its palette. If v picked a color that is not picked by any neighbor with a smaller ID, then v is colored with that color. Let \(q_v\) denote the probability that v becomes colored this way. Then, if v is colored, v uncolors itself with probability \(1-1/(e^5 q_v)\). This ensures that the probability that v ends up colored is exactly \(1/e^5\), provided that \(q_v \ge 1/e^{5}\), which they showed to be true.
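The following sketch simulates one application of this random process; the Monte-Carlo estimation of \(q_v\) is ours and purely illustrative (the analysis bounds \(q_v \ge 1/e^5\) analytically).

```python
import math, random

def estimate_q(adj, palette, v, trials=2000):
    """Monte-Carlo estimate of q_v = Pr(v keeps its tentative color
    against all smaller-ID neighbors); clamped at the e^-5 guarantee so
    the uncoloring step below is a valid probability."""
    hits = sum(
        all(not (w < v and random.choice(palette[w]) == c) for w in adj[v])
        for _ in range(trials)
        for c in [random.choice(palette[v])]
    )
    return max(hits / trials, math.exp(-5))

def theorem7_round(adj, palette):
    """One round of the random process behind Theorem 7: pick a color,
    keep it only if no smaller-ID neighbor picked the same color, then
    uncolor with probability 1 - 1/(e^5 * q_v), so every vertex ends up
    colored with probability exactly e^-5."""
    pick = {v: random.choice(palette[v]) for v in adj}
    colored = {}
    for v in adj:
        if any(w < v and pick[w] == pick[v] for w in adj[v]):
            continue                                   # blocked by a smaller ID
        if random.random() < 1 / (math.e ** 5 * estimate_q(adj, palette, v)):
            colored[v] = pick[v]                       # survives the uncoloring
    return colored
```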

They showed that by iteratively applying the theorem for \(O(\log {\varDelta })\) iterations, an \(O(\log ^2 {\varDelta }/ \log \log {\varDelta })\)-frugal, \(({\varDelta }+1)\)-coloring can be obtained. Let \(G_i\) be the graph obtained after iteration i by deleting the already-colored vertices, \({\varDelta }_i\) the maximum degree of \(G_i\), and \(p_i\) a lower bound on the palette size after iteration i. The palette P(u) of each vertex u contains the colors not yet used by its neighbors, so \(|P(v)| \ge \deg (v) + 1\) always holds. Notice that to apply Theorem 7, we also need the condition \(|P(v)| \ge {\varDelta } /4\). The worst-case behavior of \({\varDelta }_i\) and \(p_i\) is captured by the recurrences:

$$\begin{aligned} {\varDelta }_{i+1}&= {\varDelta }_i \left( 1 - \frac{1}{e^5}\right) + 27\sqrt{{\varDelta }_i \ln {\varDelta }_i} \nonumber \\ p_{i+1}&= p_i - \frac{{\varDelta }_i}{e^5} - 27\sqrt{{\varDelta }_i \ln {\varDelta }_i}. \end{aligned}$$
(1)

They showed the above recurrences can be solved to obtain the following bounds on \({\varDelta }_i\) and \(p_i\):

Lemma 12

Let \(\alpha = (1- 1/e^5)\). There is a constant C such that for all i for which \({\varDelta }_i \ge C\), \({\varDelta }_i \le 2{\varDelta }_0 \alpha ^i\) and \(p_i \ge \frac{{\varDelta }_0}{2} \alpha ^i\).

Therefore, \(|P(v)| \ge {\varDelta } /4\) always holds. The two assumptions of Theorem 7 are thus always satisfied, so it can be applied iteratively until \({\varDelta }_i < C\), which takes at most \(\log _{1/\alpha }\left( \frac{2{\varDelta }_0}{C} \right) = O(\log {\varDelta })\) iterations. Since each iteration introduces at most \(O(\log {\varDelta }/ \log \log {\varDelta })\) same-colored neighbors to each vertex, the frugality is at most \(O(\log ^2 {\varDelta }/ \log \log {\varDelta })\). In the end, when \({\varDelta }_i < C\), one can color the remaining graph in \(O({\varDelta }_i + \log ^{*} n)\) time using existing \(({\varDelta }_i + 1)\)-coloring algorithms [5]. This adds only O(1) copies of each color to any neighborhood, yielding an \(O(\log ^2 {\varDelta } / \log \log {\varDelta })\)-frugal, \(({\varDelta } + 1)\)-coloring. In order to make the process suitable for our simple distributed algorithm and achieve a running time of \(O(\log n)\), we relax criteria (b), (c), and (d) of Theorem 7:

(b') For every vertex \(v \in V\) and every color x, there are at most \(18 \cdot \frac{\ln {\varDelta }_0}{\ln \ln {\varDelta }_0}\) neighbors of v colored x.

(c') For every vertex \(v \in V\), the number of neighbors of v not in C is at most \({\varDelta }(1-\frac{1}{e^5}) + 40 \sqrt{{\varDelta }} \ln {\varDelta }\).

(d') For every vertex \(v \in V\), the number of neighbors of v in C is at most \(\frac{{\varDelta }}{e^5} + 40 \sqrt{{\varDelta }} \ln {\varDelta }\).

In (b'), \({\varDelta }\) is replaced by \({\varDelta }_0\), the maximum degree of the initial graph, and the constant 9 is replaced by 18. In (c') and (d'), the constant 27 is replaced by 40 and \(\sqrt{\ln {\varDelta }}\) is replaced by \(\ln {\varDelta }\). It is not hard to see that Lemma 12 still holds and that an \(O(\log ^2 {\varDelta } / \log \log {\varDelta })\)-frugal coloring is still obtainable. Originally, using the Chernoff bound and Azuma's inequality, they showed

$$\begin{aligned}&\Pr \left( \# \text{ neighbors } \text{ of } v \text{ colored } x \text{ exceeds } 9\cdot \frac{\ln {\varDelta }}{\ln \ln {\varDelta }} \right) \nonumber \\&\quad <\frac{1}{{\varDelta }^6} \end{aligned}$$
(2)

and

$$\begin{aligned} \Pr \left( \left| P_v - \frac{\deg (v)}{e^5} \right| > 27\sqrt{{\varDelta } \ln {\varDelta } } \right) < \frac{2}{{\varDelta }^{4.5}} \end{aligned}$$
(3)

where \(P_v\) is the number of colored neighbors of v. Theorem 7 can be derived from (2) and (3). The relaxed versions (b'), (c'), and (d') can be shown to fail with even lower probability:

$$\begin{aligned}&\Pr \left( \# \text{ neighbors } \text{ of } v \text{ colored } x \text{ exceeds } 18\cdot \frac{\ln {\varDelta }_0}{\ln \ln {\varDelta }_0} \right) \nonumber \\&\quad < \frac{1}{{\varDelta }_0^{12}} \end{aligned}$$
(4)

and

$$\begin{aligned} \Pr \left( \left| P_v - \frac{\deg (v)}{e^5} \right| > 40 \sqrt{{\varDelta }} \ln {\varDelta } \right) < \frac{2}{{\varDelta }^{9\ln {\varDelta }}} \end{aligned}$$
(5)

The bad event \(A_v\) occurs when, for some color x, the number of neighbors of v colored x exceeds \(18 \cdot \frac{\ln {\varDelta }_0}{\ln \ln {\varDelta }_0}\), or when \(|P_v - \frac{\deg (v)}{e^5}| > 40\sqrt{{\varDelta }}\ln {\varDelta }\). By (4), (5), and the union bound, \(\Pr (A_v) \le {({\varDelta }+1)}/{{\varDelta }_0^{12}} + {2}/{{\varDelta }^{9\ln {\varDelta }}}\). In their random process, \(A_v\) depends on variables up to distance two from v. Thus, the dependency graph \(G_{\mathcal {A}}\) has maximum degree \(d < {\varDelta }^4\). Note that

$$\begin{aligned} epd^2&= e{\varDelta }^8 \left( {({\varDelta } + 1)}/{{\varDelta }_0^{12}} + {2}/{{\varDelta }^{9\ln {\varDelta }}}\right) \\&\le 1 / (2{\varDelta }_0) + 1 / (2{\varDelta }^{\ln {\varDelta }}) \\&\le 2 \cdot \max (1 / (2{\varDelta }_0), 1 / (2{\varDelta }^{\ln {\varDelta }} ))\\&= \max (1 / {\varDelta }_0, 1 / {\varDelta }^{\ln {\varDelta }} ). \end{aligned}$$

The number of resampling rounds needed is at most \(O\Big (\log _{\frac{1}{epd^2}} n\Big )\), which is at most \(\frac{\ln n}{ \min \left( \ln {\varDelta }_0, \ln ^2 {\varDelta } \right) } \le \frac{\ln n}{\ln {\varDelta }_0} + \frac{\ln n}{\ln ^2 {\varDelta }}\). Therefore, the total number of rounds needed is at most:

$$\begin{aligned}&\sum _{i = 1}^{c \ln {\varDelta }_0} \left( \frac{\ln n}{\ln {\varDelta }_0} + \frac{\ln n}{\ln ^2 {\varDelta }_i}\right) \\&\quad \le \sum _{i = 1}^{c \ln {\varDelta }_0} \left( \frac{\ln n}{\ln {\varDelta }_0} + \frac{\ln n}{\ln ^2 (2{\varDelta }_0\alpha ^{i} )}\right) \\&\quad = c \ln {\varDelta }_0 \cdot \frac{\ln n}{\ln {\varDelta }_0} + \sum _{i=1}^{c\ln {\varDelta }_0} \frac{\ln n }{(\ln {\varDelta }_0 - i \ln \frac{1}{\alpha } + \ln 2 )^2} \\&\quad \le c \ln n + \ln n \cdot O\left( \sum _{i=1}^{\infty } \frac{1}{i^2} \right) = O(\log n) \end{aligned}$$

where \(c>0\) is some constant, and \(\alpha = (1- 1/e^5)\).

4.3 Distributed coloring of triangle-free graphs

Pettie and Su [37] gave a distributed algorithm for \(({\varDelta } / k)\)-coloring triangle-free graphs:

Theorem 8

Fix a constant \(\epsilon >0\). Let \({\varDelta }\) be the maximum degree of a triangle-free graph G, assumed to be at least some \({\varDelta }_{\epsilon }\) depending on \(\epsilon \). Let \(k\ge 1\) be a parameter such that \(2\epsilon \le 1-\frac{4k}{\ln {\varDelta }}\). Then G can be \(({\varDelta }/k)\)-colored, in time \(O(k + \log ^{*} {\varDelta })\) if \({\varDelta }^{1-\frac{4k}{\ln {\varDelta }} - \epsilon } = {\varOmega }(\ln n)\), and, for any \({\varDelta }\), in time on the order of

$$\begin{aligned} e^{O(\sqrt{\ln \ln n})}\cdot (k + \log ^* {\varDelta })\cdot \frac{\log n}{{\varDelta }^{1-\frac{4k}{\ln {\varDelta }}-\epsilon }} \; = \; \log ^{1+o(1)} n. \end{aligned}$$

The algorithm consists of \(O(k + \log ^{*} {\varDelta })\) iterations. For each iteration i, a property \(\mathcal {H}_i(u)\) is maintained at each vertex u. If \(\mathcal {H}_{i-1}(u)\) holds for all u in G, then after round i, \(\mathcal {H}_i(u)\) fails with probability at most \(p = \exp \left( -{\varDelta }^{1-\frac{4k}{\ln {\varDelta }}-\epsilon + {\varOmega }(\epsilon )} \right) \), which is at most \(\exp \left( -{\varDelta }^{1-\frac{4k}{\ln {\varDelta }}-\epsilon } \right) /e{\varDelta }^4\) if \({\varDelta } \ge {\varDelta }_\epsilon \), for some constant \({\varDelta }_\epsilon \). Note that if \({\varDelta }^{1-\frac{4k}{\ln {\varDelta }}-\epsilon } = {\varOmega }(\log n)\), then by the union bound, with high probability all \(\mathcal {H}_i(u)\) hold. Otherwise, they revert to the distributed constructive Lovász Local Lemma. Let \(G_i\) be the subgraph of G induced by the uncolored vertices. The event \(\mathcal {H}_{i}(u)\) depends on random variables within distance two of u in \(G_{i-1}\). The set of bad events \(\mathcal {A}\) consists of \(A_u = \overline{\mathcal {H}}_i (u)\) for \(u \in G_{i-1}\). Therefore, the dependency graph \(G_\mathcal {A}\) is \(G_{i-1}^{\le 4}\), in which u and v are adjacent if \({\text {dist}}_{G_{i-1}}(u,v) \le 4\); its maximum degree d is less than \({\varDelta }^4\). By the Lovász Local Lemma, since \(ep(d+1) < 1\), the probability that all \(\mathcal {H}_i(u)\) simultaneously hold is positive. To achieve this constructively, note that by Theorem 1, it requires \(O(\log _{\frac{1}{1-\epsilon }} n )\) resampling rounds, where \(1-\epsilon = ep(d+1) \le \exp \left( -{\varDelta }^{1-\frac{4k}{\ln {\varDelta }}-\epsilon } \right) \). Each resampling round involves finding an MIS. They showed that in the case \({\varDelta }^{1-\frac{4k}{\ln {\varDelta }}-\epsilon } = O(\log n)\), \({\varDelta }\) is at most \({\text {polylog}}(n)\), so faster MIS algorithms can be applied. Instead, we use the simple distributed algorithm presented in the previous section to resample without finding an MIS in each resampling round. First, notice that with a somewhat larger constant \({\varDelta }_{\epsilon }\), if \({\varDelta } \ge {\varDelta }_{\epsilon }\), the failure probability p is at most \(\exp \left( -{\varDelta }^{1-\frac{4k}{\ln {\varDelta }}-\epsilon } \right) /e{\varDelta }^{8}\). Since \(epd^2 \le \exp \left( -{\varDelta }^{1-\frac{4k}{\ln {\varDelta }}-\epsilon } \right) \), by Corollary 1, w.h.p. none of the bad events occur after \(O\big (\log _{\frac{1}{epd^2}} n\big ) = O\Big (\frac{\log n}{{\varDelta }^{1-\frac{4k}{\ln {\varDelta }}-\epsilon } }\Big )\) resampling rounds of the simple distributed algorithm, where each resampling round takes constant time. As a result, the number of rounds is reduced to \(O(\log n)\).

Theorem 9

Fix a constant \(\epsilon >0\). Let \({\varDelta }\) be the maximum degree of a triangle-free graph G, assumed to be at least some \({\varDelta }_{\epsilon }\) depending on \(\epsilon \). Let \(k\ge 1\) be a parameter such that \(2\epsilon \le 1-\frac{4k}{\ln {\varDelta }}\). Then G can be \(({\varDelta }/k)\)-colored, in time \(O(k + \log ^{*} {\varDelta })\) if \({\varDelta }^{1-\frac{4k}{\ln {\varDelta }} - \epsilon } = {\varOmega }(\ln n)\), and, for any \({\varDelta }\), in time on the order of

$$\begin{aligned} (k + \log ^* {\varDelta })\cdot \frac{\log n}{{\varDelta }^{1-\frac{4k}{\ln {\varDelta }}-\epsilon }} \; = \; O(\log n). \end{aligned}$$

Similarly, the \((1+o(1)){\varDelta } / \log {\varDelta }\)-coloring algorithm for girth-5 graphs in [37] can be obtained in \(O(\log n)\) rounds by replacing Moser and Tardos’ algorithm with the simple distributed algorithm.

4.4 Distributed list coloring

Given a graph G, each vertex v is associated with a list (or palette) of available colors P(v). Let \(\deg _c(v)\) denote the number of neighbors \(w \in N(v)\) such that \(c \in P(w)\), and suppose \(\deg _c(v)\) is upper bounded by D. The list coloring constant is the minimum K such that for any graph G and any palettes P(u) for \(u\in G\), if \(|P(u)| \ge K\cdot D\) and \(\deg _c(u) \le D\) for every \(u \in G\) and every \(c \in P(u)\), then a proper coloring can be obtained by assigning each vertex a color from its list. Reed [38] first showed the list coloring constant is at most 2e by a single application of the LLL. Haxell [20] showed that 2 suffices. Later, Reed and Sudakov [39] used an iterative Rödl Nibble method to show the list coloring constant is at most \(1+o(1)\), where o(1) is a function of D. Reed's upper bound of 2e can be made distributed and constructive with a slightly larger factor, say \(2e+\epsilon \) for any constant \(\epsilon > 0\). The LLL condition here is close to tight, so we need the weak MIS algorithm; the additional slack accounts for the \(\epsilon \)-slack required by the distributed LLL (\(ep(d+1) \le 1-\epsilon \)). The constructive algorithm can be easily extracted from their proof, which we outline here. Suppose \(|P(v)| \ge (2e+\epsilon )D\) for all v. Each vertex is assigned a color from its palette uniformly at random. They showed that with positive probability, a proper coloring is obtained. Let \(uv \in E\) and \(c \in P(u) \cap P(v)\). Define \(A_{uv,c}\) to be the bad event that both u and v are assigned c. Clearly, \(p=\Pr (A_{uv,c}) \le 1/((2e+\epsilon )D)^2\). Also, there are at most \((2e+\epsilon )D^2\) events that depend on the color u picks and at most \((2e+\epsilon )D^2\) events that depend on the color v picks. The dependency graph has maximum degree \(d = 2(2e+\epsilon )D^2-2\). Since \(ep(d+1) \le 2e/(2e+\epsilon )\) is upper bounded by a constant less than 1, we can construct the coloring in \(O(\log n \cdot \log ^2 D)\) rounds using the weak MIS algorithm.
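As a numeric sanity check of these parameters (an illustration only, mirroring the quantities in the text):

```python
import math

def reed_lll_margin(D, eps=0.1):
    """Plugs Reed's parameters into the symmetric LLL condition: with
    palettes of size (2e+eps)*D, a monochromatic-edge event A_{uv,c} has
    p <= 1/((2e+eps)*D)^2 and dependency degree d = 2*(2e+eps)*D^2 - 2,
    so e*p*(d+1) <= 2e/(2e+eps) < 1, a constant gap independent of D --
    exactly the epsilon-slack the distributed LLL needs."""
    two_e = 2 * math.e
    p = 1 / ((two_e + eps) * D) ** 2
    d = 2 * (two_e + eps) * D * D - 2
    return math.e * p * (d + 1)   # < 2e/(2e+eps) < 1 for every D

for D in (10, 100, 10000):
    print(D, reed_lll_margin(D))   # stays near 0.98 for eps = 0.1
```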

In the following, we shall show that for any constants \(\epsilon , \gamma > 0\), there exists \(D_{\epsilon , \gamma } > 0\) such that for any \(D \ge D_{\epsilon ,\gamma }\), any \((1+\epsilon )D\)-list coloring instance can be colored in \(O(\log ^{*}D \cdot \max (1, \log n/ D^{1-\gamma }) )\) rounds. The algorithm consists of multiple iterations. Let \(P_i(u)\) and \(\deg _{i,c}(u)\) be the palette and the c-degree of u at the end of iteration i. Also, at the end of iteration i, denote the neighborhood of u by \(N_i(u)\) and the c-neighborhood, the set of neighbors of u having c in their palette, by \(N_{i,c}(u)\). Suppose that each vertex u has a unique ID, \({\text {ID}}(u)\). Let \(N^{*}_{i,c}(u)\) denote the set of c-neighbors of u at the end of iteration i having smaller ID than u, and let \(\deg ^{*}_{i,c}(u) = |N^{*}_{i,c}(u)|\).

[Algorithm 5]
[Algorithm 6: selection of \(S_i(u)\) and \(K_i(u)\)]

In each iteration i, each vertex u selects a set of colors \(S_i(u) \subseteq P_{i-1}(u)\) and a set \(K_{i}(u) \subseteq P_{i-1}(u)\), obtained from Algorithm 6. If a color is in \(K_{i}(u)\) and is not in \(S_i(v)\) for any \(v \in N^{*}_{i-1}(u)\), then it remains in u's new palette \(P_{i}(u)\). Furthermore, if \(S_i(u)\) contains a color that is in \(P_i(u)\), then u colors itself with that color (if there are multiple such colors, ties are broken arbitrarily).

Given \(\pi _i\), the probability with which each vertex includes each color of its palette in \(S_{i}(u)\), the probability that \(c \notin S_{i}(N^{*}_{i-1,c}(u))\) is \((1-\pi _i)^{\deg ^{*}_{i-1,c}(u)}\). Define \(\beta _i = (1-\pi _i)^{t'_{i-1}}\), where \(t'_{i-1}\) is an upper bound on \(\deg _{i-1,c}(u)\) for each vertex u and each color c. Then \(r_c = \beta _i / (1-\pi _i)^{\deg ^{*}_{i-1,c}(u)}\) is always at most 1 and thus a valid probability. Therefore, the probability that a color \(c\in P_{i-1}(u)\) remains in \(P_{i}(u)\) is \((1-\pi _i)^{\deg ^{*}_{i-1,c}(u)} \cdot r_c = \beta _i\). As a result, the palette shrinks to a \(\beta _i\) fraction of its size in expectation.
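A minimal sketch of this palette-update rule, folding the \(K_i(u)\) subsampling into the coin \(r_c\); `smaller_id_c_neighbors` is an assumed accessor returning \(N^{*}_{i-1,c}(u)\):

```python
import random

def update_palette(P_prev, S, smaller_id_c_neighbors, pi, t_prev):
    """Keep color c only if no smaller-ID c-neighbor selected it
    (probability (1-pi)**deg*), then keep it with probability
    r_c = beta / (1-pi)**deg*, so every color survives with the same
    probability beta = (1-pi)**t_prev, independent of its c-degree.
    S maps a vertex to its selected color set S_i(v)."""
    beta = (1 - pi) ** t_prev
    P_new = []
    for c in P_prev:
        nbrs = smaller_id_c_neighbors(c)      # N*_{i-1,c}(u), size <= t_prev
        if any(c in S[v] for v in nbrs):
            continue                          # c grabbed by a smaller ID
        r_c = beta / (1 - pi) ** len(nbrs)    # <= 1 since len(nbrs) <= t_prev
        if random.random() < r_c:
            P_new.append(c)
    return P_new
```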

Suppose that \(p'_i\) is the lower bound on the palette size at the end of iteration i. Then the probability that u remains uncolored is upper bounded by the probability that none of the colors in \(P_i(u)\) was selected into \(S_i(u)\). This probability is roughly \((1-\pi _i)^{p'_i}\), which we define to be \(\alpha _i\). The slight inaccuracy comes from conditioning on the new palette size \(|P_i(u)|\) being at least \(p'_i\); however, we will show that this conditioning affects the probability only by a small amount.

Let \(p_0 = (1+\epsilon )\cdot D\) and \(t_0 = D\) be the initial palette size and the initial upper bound on the c-degree. In the following, \(p_i\) and \(t_i\) are the ideal lower bound on the palette size and the ideal upper bound on the c-degree at the end of iteration i, while \(p'_i\) and \(t'_i\) are approximations of \(p_i\) and \(t_i\) incorporating the errors from the concentration bounds. K is a constant in the selection probability that depends on \(\epsilon \). T is the threshold on the c-degree below which we switch to a different analysis, since the usual concentration bounds do not apply when the quantity is small. \(\delta = 1/\log D\) is the error-control parameter, set small enough that \((1 \pm \delta )^{i}\) is \(1\pm o(1)\) for every iteration i.

Intuitively, we would like to have \(t_i\) shrink faster than \(p_i\). To ensure this happens, we must have \(\alpha _1 \le \beta _1\), which holds under our setting of \(\pi _i\). As we will show, \(\alpha _i\) shrinks much faster than \(\beta _i\) as i becomes larger. Note that \(\beta _i\) is at least a constant, as

$$\begin{aligned} \beta _i&= (1- 1/ (Kt'_{i-1} + 1) )^{t'_{i-1}} \\&= (1- 1/ (Kt'_{i-1} + 1) )^{(K t'_{i-1}) \cdot (1/K)}\\&\ge (e^{-1})^{1/K} = e^{-1/K} \\&\qquad \hbox {since }(1-1/(x+1))^{x} \ge e^{-1}. \end{aligned}$$
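The following sketch iterates the idealized parameters, taking \(\pi _i = 1/(Kt'_{i-1}+1)\) as in the display above and ignoring the \((1 \pm \delta )\) error factors, to illustrate the two-stage behavior proved next:

```python
import math

def parameter_schedule(D, eps=0.5, max_iters=200):
    """Iterates the idealized parameters with pi_i = 1/(K*t_{i-1}+1),
    beta_i = (1-pi_i)**t_{i-1}, alpha_i = (1-pi_i)**p_i and
    K = 2(1+eps)/eps: the ratio t_i/p_i first decays geometrically, then
    alpha_i collapses tower-fast (Lemma 13).  Illustration only -- the
    proof tracks the primed quantities p'_i, t'_i with errors."""
    K = 2 * (1 + eps) / eps
    p, t = (1 + eps) * D, float(D)
    for i in range(1, max_iters + 1):
        pi = 1 / (K * t + 1)
        p *= (1 - pi) ** t            # palette keeps a beta_i fraction
        alpha = (1 - pi) ** p         # chance a vertex stays uncolored
        t *= alpha                    # c-degree keeps an alpha_i fraction
        if t < 1:
            return i                  # O(log* D) iterations, per Lemma 13
    return max_iters

print(parameter_schedule(10**4))
```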

Lemma 13

\(t_r = T\) after at most \(r = O(\log ^{*} D)\) iterations.

Proof

We divide the iterations into two stages, where the first stage consists of those iterations i for which \(t_{i-1}/p_{i-1} \ge 1 / (1.1e^{2/K} K)\). During the first stage, we show that the ratio \(t_{i}/p_{i}\) decreases by a factor of \(\exp \left( -(1-o(1))\frac{\epsilon ^2}{4(1+\epsilon )} \right) \) in every iteration.

$$\begin{aligned} \frac{t_i}{p_i}&= \frac{\alpha _i}{\beta _i}\frac{t_{i-1}}{p_{i-1}} \\&= (1 - \pi _i)^{p'_i - t'_{i-1} } \cdot \frac{t_{i-1}}{p_{i-1}} \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \hbox {defn. }\alpha _i, \beta _i \\&\le \exp \left( -\pi _i \cdot (p'_i - t'_{i-1} ) \right) \cdot \frac{t_{i-1}}{p_{i-1}} \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad 1 - x \le e^{-x} \\&\le \exp \left( -(1-o(1))\cdot \frac{1}{K}\left( \frac{p_i}{t_{i-1}} - 1\right) \right) \cdot \frac{t_{i-1}}{p_{i-1}} \\&\qquad \qquad \qquad \qquad \qquad \qquad \hbox {defn. }\pi _i,\, \frac{p'_i}{t'_{i-1}} = (1-o(1))\frac{p_i}{t_{i-1}} \\&\le \exp \left( -(1-o(1))\cdot \frac{1}{K}\left( \frac{\beta _i p_{i-1}}{t_{i-1}} - 1\right) \right) \cdot \frac{t_{i-1}}{p_{i-1}} \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \hbox {defn. }p_i \\&\le \exp \left( -(1-o(1))\cdot \frac{1}{K}\left( e^{-1/K}(1+\epsilon ) - 1 \right) \right) \\&\quad \quad \quad \,\,\quad \qquad \cdot \frac{t_{i-1}}{p_{i-1}} \qquad \qquad \qquad \qquad p_{i-1}/t_{i-1} \ge (1+\epsilon ) \\&\le \exp \left( -(1-o(1))\cdot \frac{((1-1/K)(1+\epsilon ) - 1)}{K} \right) \\&\quad \quad \quad \,\,\quad \qquad \cdot \frac{t_{i-1}}{p_{i-1}} \qquad \qquad \qquad \qquad e^{-x} \ge 1 - x\\&= \exp \left( -(1-o(1))\cdot \frac{\epsilon ^2}{4(1+\epsilon )} \right) \cdot \frac{t_{i-1}}{p_{i-1}} \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad K = 2(1+\epsilon )/\epsilon \end{aligned}$$

Therefore, the first stage ends after at most \((1+o(1))\frac{4(1+\epsilon )}{\epsilon ^2} \ln (1.1Ke^{2/K})\) iterations. Let j be the first iteration of the second stage. For \(i > j\), we show that \(1/\alpha _i\) grows as an exponential tower.

$$\begin{aligned}&\alpha _i = (1-\pi _i)^{p'_i} \\&\le \exp \left( -(1-o(1))\frac{1}{K}\cdot \frac{p_i}{t_{i-1}} \right) \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad 1-x\le e^{-x} \\&\le \exp \left( -(1-o(1))\frac{1}{K}\cdot \frac{\beta _i p_{i-1}}{t_{i-1}} \right) \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \hbox {defn. }p_i \\&\le \exp \left( -(1-o(1)) \frac{1}{K}\cdot \frac{\beta _{i-1}}{\alpha _{i-1}} \cdot \frac{\beta _i p_{i-2}}{t_{i-2}} \right) \\&\qquad \quad \quad \quad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \frac{p_{i-1}}{t_{i-1}} = \frac{\beta _{i-1}}{\alpha _{i-1}} \frac{p_{i-2}}{t_{i-2}} \\&\le \exp \left( -(1-o(1)) \frac{1}{K}\cdot \frac{e^{-2/K}}{\alpha _{i-1}} \cdot \frac{ p_{i-2}}{t_{i-2}} \right) \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \beta _i \ge e^{-1/K} \\&\le \exp \left( -1 / \alpha _{i-1} \right) \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \frac{t_{i-2}}{p_{i-2}} < \frac{1}{1.1Ke^{2/K}} \end{aligned}$$

Therefore, \(\frac{1}{\alpha _{j + \log ^{*}D + 1}} \ge \underbrace{e^{e^{\cdot ^{\cdot ^{\cdot ^{e}}}}}}_{\log ^{*}D} \ge D\), and so \(t_{j + \log ^{*}D + 1} \le \max (\alpha _{j + \log ^{*}D + 1} \cdot D,T) = T\). \(\square \)

On the other hand, we show the bound on the palette size remains large throughout the algorithm.

Lemma 14

\(p'_{i} = D^{1-o(1)}\) for \(i = O(\log ^{*} D)\).

Proof

\(p'_i = (1-\delta )^{i} p_i \ge (1-\delta )^{i} \prod _{j=1}^{i} \beta _j D \ge (1-\delta )^{i} e^{-i/K} D = (1-o(1)) D^{ -\frac{i}{K\log D} } \cdot D = D^{1-o(1)}\). \(\square \)

In the following we show how to ensure that for each iteration i, the palette sizes are lower bounded by \(p'_i\) and the c-degrees are upper bounded by \(t'_i\). For convenience, let \(H_i(u)\) denote the event that \(|P_i(u)| \ge p'_i\) and that \(\deg _{i,c}(u) \le t'_i\) for every \(c \in P_{i-1}(u)\). Let \(H_i\) denote the event that \(H_i(u)\) holds for every \(u \in G_{i}\).

Lemma 15

Suppose that \(H_{i-1}\) holds. Then \(\Pr ( |P_i(u)|< (1-\delta ) \beta _i |P_{i-1}(u)| ) < e^{-{\varOmega }(\delta ^2 p'_i)}\).

Proof

Consider a color \(c \in P_{i-1}(u)\). The probability that c remains in \(P_{i}(u)\) is exactly \(\beta _i\). Since the event that c remains in \(P_{i}(u)\) is independent across colors, by a Chernoff bound, \(\Pr (|P_i(u)|< (1-\delta )\beta _i |P_{i-1}(u)|) < e^{-{\varOmega }(\delta ^2 p_{i-1})} \). \(\square \)

Lemma 16

Suppose that \(H_{i-1}\) holds. Then \(\Pr (\deg _{i,c}(u) > (1+\delta ) \cdot \max (\alpha _i \cdot \deg _{i-1,c}(u), T)) < e^{-{\varOmega }(\delta ^2 T)} + D \cdot e^{-{\varOmega }(\delta ^2 p'_i) }\).

Proof

Let \(x_1, \ldots , x_k \in N_{i-1,c}(u)\) be the c-neighbors of u, ordered by their IDs. Let \(\mathcal {E}_j\) denote the event that \(|P_i(x_j)| \ge p'_i\), where \(\Pr (\overline{\mathcal {E}_j}) < e^{-{\varOmega }(\delta ^2 p'_i)}\) by Lemma 15.

Let \(X_j\) denote the event that \(x_j\) remains uncolored after iteration i, and let \(\mathbf {X}_j\) be shorthand for \((X_1, \ldots , X_j)\). We will show that for any realization of \(\mathbf {X}_{j-1}\), \(\Pr (X_{j} \mid \mathbf {X}_{j-1}, \mathcal {E}_1, \ldots , \mathcal {E}_j) \le \alpha _i\). Then we can apply Lemma 19, a variant of the Chernoff bound that works when conditioning on a sequence of likely events.

Let \(U_2 = N_{i-1}(N_{i-1,c}(u)) \setminus N_{i-1,c}(u)\) be the neighbors of the c-neighbors, excluding the c-neighbors themselves (\(u \in U_2\) unless \(\deg _{i-1,c}(u) = 0\)). First, notice that the events \(\mathbf {X}_{j-1}\) and \(\mathcal {E}_1, \ldots , \mathcal {E}_{j}\) are functions of \(S_i(U_2), S_i(x_1), \ldots , S_i(x_{j-1}), K_i(x_1), \ldots , K_i(x_{j})\). Therefore, it suffices to show that under any realization of \(S_i(U_2), S_i(x_1), \ldots , S_i(x_{j-1}), K_i(x_1), \ldots , K_i(x_{j})\) subject to \(\mathcal {E}_1, \ldots , \mathcal {E}_{j}\) holding, \(\Pr (X_j \mid S_i(U_2), S_i(x_1), \ldots , S_i(x_{j-1}), K_i(x_1), \ldots , K_i(x_{j})) \le \alpha _i\).

Obviously for any \(c' \in P_{i-1}(x_j)\),

$$\begin{aligned}&\Pr (c' \in S_{i}(x_j) \mid S_i(U_2), S_i(x_1), \ldots , S_i(x_{j-1}), \\&\quad K_i(x_1), \ldots , K_i(x_{j})) = \pi _i. \end{aligned}$$

Therefore,

$$\begin{aligned}&\Pr (X_j \mid S_i(U_2), S_i(x_1), \ldots , S_i(x_{j-1}), \\&\qquad K_i(x_1), \ldots , K_i(x_{j})) \\&\quad \le (1-\Pr (c' \in S_{i}(x_j) \mid S_i(U_2), S_i(x_1), \ldots , S_i(x_{j-1}), \\&\qquad K_i(x_1), \ldots , K_i(x_{j})))^{|P_{i}(x_j)|} \\&\quad \le (1-\pi _i)^{p'_{i}} = \alpha _i. \end{aligned}$$

Therefore, by Lemma 19, Corollary 5, and by the fact that \(\sum _{j} \Pr (\overline{\mathcal {E}}_j) \le D \cdot e^{-{\varOmega }(\delta ^2 p'_i)}\), we have \(\Pr (\deg _{i,c}(u) > (1+\delta ) \cdot \max (\alpha _i \cdot \deg _{i-1,c}(u), T) ) \le e^{-{\varOmega }(\delta ^2 T)} + D \cdot e^{-{\varOmega }(\delta ^2 p'_i)}\). \(\square \)

Corollary 4

Suppose that \(H_{i-1}\) holds. Then \(\Pr (\overline{H}_i(u)) \le D\cdot e^{-{\varOmega }(\delta ^2 T)} + 2D^2 \cdot e^{-{\varOmega }(\delta ^2 p'_i)}\).

Proof

By taking a union bound over the event of Lemma 15 and the events of Lemma 16 for each \(c \in P_{i-1}(u)\), we get the desired result. \(\square \)

Let r be the first iteration such that \(t_r = T\). If \(H_{r}\) holds, then \(\deg _{r,c}(u) \le t'_r \le (1+\delta )^{r}t_r \le (1+o(1))t_r \le 2T\) for all u and c. At this point we switch to the following analysis, which shows the algorithm terminates in a constant number of additional iterations. For \(i > r\), we define \(t'_i = t'_{i-1} \cdot \frac{T}{p'_i}\); the definitions of the remaining parameters stay the same. By Lemma 14, if D is large enough, we may assume that \(p'_i \ge D^{1-0.8\gamma }\) for all \(i \le r + \lceil 1/(0.1\gamma ) \rceil \), since \(r + \lceil 1/(0.1 \gamma ) \rceil = O(\log ^{*}D)\). Then, from the definition of \(t'_i\), it shrinks below one within \(\lceil \frac{1}{0.1 \gamma }\rceil \) iterations, since \(T/p'_i \le D^{-0.1\gamma }\) and so \(t'_{r + \lceil 1/(0.1\gamma ) \rceil }< (D^{-0.1\gamma })^{\lceil 1/(0.1\gamma ) \rceil } \cdot t'_r < 1\).
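A quick numeric check of this wind-down, assuming \(T = D^{1-0.9\gamma }\) (an assumption on our part, consistent with the ratio \(T/p'_i \le D^{-0.1\gamma }\) used above):

```python
def second_stage_iterations(D, gamma=0.5):
    """With T = D**(1-0.9*gamma) (assumed), p'_i >= D**(1-0.8*gamma) and
    t'_i = t'_{i-1} * T / p'_i, the c-degree bound drops below 1 within
    ceil(1/(0.1*gamma)) iterations, matching the text."""
    T, p_lower = D ** (1 - 0.9 * gamma), D ** (1 - 0.8 * gamma)
    t, i = 2 * T, 0                  # t'_r <= 2T when the switch happens
    while t >= 1:
        t *= T / p_lower             # shrink factor <= D**(-0.1*gamma)
        i += 1
    return i

print(second_stage_iterations(10**6))  # <= ceil(1/(0.1*0.5)) = 20
```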

Now we show that, under this new definition of \(t'_i\) for \(i > r\), \(H_i(u)\) is likely to hold provided that \(H_{i-1}\) holds.

Lemma 17

Suppose that \(H_{i-1}\) holds, where \(i > r\). Then \(\Pr (\deg _{i,c}(u) > t'_i) < e^{-{\varOmega }(T)} + D\cdot e^{-{\varOmega }(\delta ^2 p'_i)}\).

Proof

Let \(x_1, \ldots , x_{k} \in N_{i-1,c}(u)\) be the c-neighbors of u, ordered by increasing ID. Let \(\mathcal {E}_j\) denote the event that \(|P_i(x_j)| \ge p'_i\). Note that \(\Pr (\overline{\mathcal {E}}_j) \le e^{-{\varOmega }(\delta ^2 p'_i)}\). As shown in the proof of Lemma 16, \(\Pr (X_j \mid \mathbf {X}_{j-1}, \mathcal {E}_1, \ldots , \mathcal {E}_j) \le \alpha _i\). Therefore,

$$\begin{aligned}&\Pr (\deg _{i,c}(u)> t'_i) \\&\quad = \Pr \left( \deg _{i,c}(u) > \left( \frac{t'_{i}}{\alpha _i t'_{i-1}} \right) \cdot \alpha _i t'_{i-1} \right) \end{aligned}$$

Applying Lemma 19 and Corollary 5 with \(1+\delta = t'_i / (\alpha _i t'_{i-1})\), and noticing that \(\alpha _i \deg _{i-1,c}(u) \le \alpha _i t'_{i-1}\), the probability above is bounded by

$$\begin{aligned}&\le \exp \left( -\alpha _i t'_{i-1} \left( \frac{t'_i}{\alpha _i t'_{i-1}}\ln \frac{t'_i}{\alpha _i t'_{i-1}} - \left( \frac{t'_i}{\alpha _i t'_{i-1}} - 1 \right) \right) \right) \\&\quad + D e^{-{\varOmega }(\delta ^2 p'_i)}\\&\le \exp \left( -t'_{i} \left( \ln \frac{t'_i}{\alpha _i t'_{i-1}} - 1 \right) \right) + D e^{-{\varOmega }(\delta ^2 p'_i)} \\&= \exp \left( -t'_{i} \left( \ln \left( \frac{1}{\alpha _i}\right) - \ln \left( \frac{et'_{i-1}}{t'_{i}}\right) \right) \right) \\&\qquad + D e^{-{\varOmega }(\delta ^2 p'_i)}\\&\le \exp \left( -t'_i \left( \left( 1-o(1)\right) \frac{p'_i}{K t'_{i-1}} - \ln \left( \frac{et'_{i-1}}{t'_i}\right) \right) \right) \\&\qquad + D e^{-{\varOmega }(\delta ^2 p'_i)} \quad \ln \frac{1}{\alpha _i} = (1-o(1))\frac{p'_i}{Kt'_{i-1}} \\&\le \exp \left( -\left( (1-o(1))\frac{T}{K} - t'_i \ln (eD)\right) \right) \\&\qquad + D e^{-{\varOmega }(\delta ^2 p'_i)} \qquad \qquad \hbox {defn. }t'_i\hbox { and }t'_{i-1}/t'_i < D \\&\le \exp \left( -T \left( \frac{(1-o(1))}{K} - \frac{t'_{i-1}}{p'_i} \ln (eD) \right) \right) \\&\qquad + D e^{-{\varOmega }(\delta ^2 p'_i)} \\&\le \exp \left( -T \left( (1-o(1))\frac{1}{K} - \frac{2\ln (eD)}{D^{0.1\gamma }} \right) \right) \\&\qquad + D e^{-{\varOmega }(\delta ^2 p'_i)} \quad \frac{t'_{i-1}}{p'_i} \le \frac{2T}{p'_i} \le \frac{2}{D^{0.1\gamma }} \\&\le \exp \left( -{\varOmega }(T) \right) + D e^{-{\varOmega }(\delta ^2 p'_i)} \end{aligned}$$

\(\square \)

Suppose that \(H_{i-1}\) holds. By taking the union bound over the events \(|P_i(u)| < p'_i\) and \(\deg _{i,c}(u) > t'_i\) for all \(u \in G_{i-1}\) and all \(c \in P_{i-1}(u)\), we get \(\Pr (\overline{H}_i(u)) \le D \cdot e^{-{\varOmega }(T)} + 2D^2 \cdot e^{-{\varOmega }(\delta ^2 p'_i)}\).

Therefore, we conclude that for each iteration \(i\ge 1\), if \(H_{i-1}\) holds, then \(\Pr (\overline{H}_i(u)) \le D \cdot \exp (-{\varOmega }(\delta ^2 T)) + 2D^2 \cdot \exp (-{\varOmega }(\delta ^2 p'_i)) \le \exp (-D^{1-0.95\gamma })\) for large enough D. Now we want to ensure that \(H_i\) holds for every iteration i. If \(D^{1-\gamma } \ge \log n\), then each bad event occurs with probability at most \(1 / {\text {poly}}(n)\); since there are O(n) events, by the union bound \(H_i\) holds w.h.p. On the other hand, if \(D^{1-\gamma } \le \log n\), then we use the LLL algorithm to make \(H_i\) hold w.h.p. The probability of each failure event is bounded by \(p = \exp \left( -D^{1-0.95\gamma }\right) \). Each event depends on at most \(d = O({\varDelta }^2)\) other events, since each event depends only on the outcomes of the random variables in its neighborhood. Therefore, \(epd^2 \le \exp (-D^{1-\gamma })\) and we can apply the simple LLL algorithm to make all the events hold w.h.p. in \(O(\log _{1/epd^2} n) = O(\log n / D^{1-\gamma })\) rounds.

By Lemma 13 and the fact that \(t'_i\) shrinks below 1 in a constant number of iterations once \(i > r\), the algorithm uses \(O(\log ^{*} D)\) iterations. Each iteration takes \(O(\max (1, \log n / D^{1-\gamma }))\) rounds, so the total number of rounds is \(O(\log ^{*} D \cdot \max (1, \log n / D^{1-\gamma }))\).

5 Discussion

We gave distributed LLL algorithms under conditions of the form \(p\cdot f(d) < 1\) for various functions f(d). When \(f(d) = e(d+1)\), matching the general LLL condition, our weak MIS resampling algorithm runs in \(O(\log ^2 d \cdot \log _{1/ep(d+1)} n )\) time. The weak MIS algorithm was later applied in local computation algorithms for computing an MIS [25]. More recently, Ghaffari's MIS algorithm [15] computes a weak MIS in \(O(\log d)\) time, which improves the overall running time for the LLL to \(O(\log d \cdot \log _{1/ep(d+1)} n)\).

The lower bound we showed in this paper is \({\varOmega }(\log ^{*} n)\). Very recently, Brandt et al. [8] obtained an \({\varOmega }(\log \log n)\) lower bound for the LLL from the sinkless orientation and sinkless coloring problems in 3-regular graphs. Subsequently, Chang, Kopelowitz, and Pettie generalized [8] to show an \({\varOmega }(\log _{d} n)\) lower bound for deterministic LLL algorithms and an \({\varOmega }(\log _{d} \log n)\) lower bound for randomized LLL algorithms [10]. Note that their lower bounds require f(d) to be upper bounded by \(2^d\), while ours allows f(d) to grow without bound.