
1 Introduction

The Rado graph R is a natural limit of the set of all finite graphs (Fraissé limit, see Sect. 2.1). In Rado’s construction, the vertex set is \(\mathbb {N} = \{0,1,2,\ldots \}\). There is an undirected edge from i to j if i < j and the ith binary digit of j is a one (where the 0th digit is the first digit from the right). Thus, 0 is connected to all odd numbers, 1 is connected to 0 and all j which are 2 or 3 (mod 4), and so on. There are many alternative constructions. For p ∈ (0, 1), connecting i and j with probability p gives the Erdős–Rényi graph G(∞, p), which is (almost surely) isomorphic to R. Further constructions are in Sect. 2.1.
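The binary adjacency rule can be checked mechanically. The following is a minimal sketch (the helper name is ours), relying on Python’s arbitrary-precision integers:

```python
def rado_adjacent(i: int, j: int) -> bool:
    """Adjacency in Rado's binary model: for i < j, vertex i is joined to j
    exactly when the ith binary digit of j is a one (0th = least significant)."""
    if i == j:
        return False
    i, j = min(i, j), max(i, j)
    return (j >> i) & 1 == 1

# 0 is connected to all odd numbers; 1 is connected to 0 and to all
# j that are 2 or 3 (mod 4):
assert all(rado_adjacent(0, j) for j in (1, 3, 5, 7))
assert all(rado_adjacent(1, j) for j in (0, 2, 3, 6, 7))
```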

Let \((Q(j))_{0\leqslant j<\infty }\) be a positive probability on \(\mathbb {N}\) (so, Q(j) > 0 for all j, and \(\sum _{j=0}^\infty Q(j)=1\)). We study a ‘ball walk’ on R generated by Q: from \(i\in \mathbb {N}\), pick j ∈ N(i) with probability proportional to Q(j), where N(i) = {j : j ∼ i} is the set of neighbors of i in R. Thus, the probability of moving from i to j in one step is

$$\displaystyle \begin{aligned} K(i,j) = \begin{cases} Q(j)/Q(N(i)) &\text{ if } i\sim j,\\ 0 &\text{ otherwise.} \end{cases} \end{aligned} $$
(1)

As explained below, this walk is connected, aperiodic and reversible, with stationary distribution

$$\displaystyle \begin{aligned} \pi(i) = \frac{Q(i)Q(N(i))}{Z}, \end{aligned} $$
(2)

where Z is the normalizing constant.
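A single step of the ball walk can be sampled directly from (1). The sketch below is ours: it truncates the (infinite) neighbor set at a cutoff `cap` of our choosing; since \(Q(j)=2^{-(j+1)}\), the neglected neighbors carry total Q-mass below \(2^{-\mathtt{cap}}\):

```python
import random

def ball_walk_step(i: int, cap: int = 200) -> int:
    """One step of the ball walk (1) on the Rado graph with Q(j) = 2^-(j+1).
    Neighbors above `cap` are dropped; their total Q-mass is < 2^-cap."""
    adjacent = lambda a, b: a != b and (max(a, b) >> min(a, b)) & 1 == 1
    neighbors = [j for j in range(cap + 1) if adjacent(i, j)]
    weights = [2.0 ** (-(j + 1)) for j in neighbors]
    return random.choices(neighbors, weights=weights, k=1)[0]
```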

It is natural to study the mixing time—the rate of convergence to stationarity. The following result shows that convergence is extremely rapid. Starting at \(i\in \mathbb {N}\), order \(\log _2^* i\) steps suffice, and for infinitely many i, are needed.

Theorem 1

Let Q(j) = 2−(j+1), \(0\leqslant j<\infty \) . For K(i, j) and π defined at (1) and (2) on the Rado graph R,

  1.

    For universal A, B > 0, we have for all \(i\in \mathbb {N}\), \(\ell \geqslant 1\),

    $$\displaystyle \begin{aligned} \|K_i^\ell - \pi\| \leqslant A e^{\log_2^*i} e^{-B\ell}. \end{aligned}$$
  2.

    For universal C > 0, if \(2^{(k)} = 2^{2^{\cdot ^{\cdot ^{\cdot ^{2}}}}}\) is the tower of 2’s of height k,

    $$\displaystyle \begin{aligned} \|K^\ell_{2^{(k)}} - \pi\| \geqslant C \end{aligned}$$

    for all \(\ell \leqslant k\) . Here \(\|K_i^\ell - \pi \| = \frac {1}{2}\sum _{j=0}^\infty |K^\ell (i,j) - \pi (j)|\) is the total variation distance and \(\log _2^* i \) is the number of times log2 needs to be applied, starting from i, to get a result \(\leqslant 1\).

The proofs allow for some variation in the measure Q. They also work for the G(∞, p) model of R, though some modification is needed since then K and π are random.

Theorem 1 answers a question in Diaconis and Malliaris [8], who proved the lower bound. Most studies of Markov chains on countable graphs restrict attention to locally finite graphs [25]. For Cayley graphs, Bendikov and Saloff-Coste [1] begin the study of more general transitions and point out how few tools are available. See also [12, 20]. Studying the geometry of a space (here R) by studying the properties of the Laplacian (here I − K) is a classical pursuit (“Can you hear the shape of a drum?”)—see [16].

Section 2 gives background on the Rado graph, Markov chains, ball walks, and Hardy’s inequalities. Section 3 gives preliminaries on the behavior of the neighborhoods of the G(∞, p) model. The lower bound in Theorem 1 is proved in Sect. 4. Both Sects. 3 and 4 give insight into the geometry of R. The upper bound in Theorem 1 is proved by establishing that the Markov chain K has a spectral gap. Usually, a spectral gap alone does not give sharp rates of convergence. Here, for any start i, we show the chain is in a neighborhood of 0 after order \(\log _2^* i\) steps. Then the spectral gap shows convergence in a bounded number of further steps. This argument works for both models of R. It is given in Sect. 5.

The spectral gap for the G(, p) model is proved in Sect. 6 using a version of Cheeger’s inequality for trees. For Rado’s binary model, the spectral gap is proved by a novel version of Hardy’s inequality for trees in Sect. 7. This is the first probabilistic application of this technique, which we hope will be useful more generally. There are two appendices containing technical details for the needed versions of Cheeger’s and Hardy’s inequalities.

2 Background on R, Markov Chains, and Hardy’s Inequalities

2.1 The Rado Graph

A definitive survey on the Rado graph (with full proofs) is in Peter Cameron’s fine article [6]. We have also found the Wikipedia entry on the Rado graph and Cameron’s follow-up paper [7] useful.

In Rado’s model, the graph R has vertex set \(\mathbb {N} = \{0,1,2,\ldots \}\) and an undirected edge from i to j if i < j and the ith binary digit of j is a one. There are many other constructions. The vertex set can be taken as the prime numbers that are 1 (mod 4) with an edge from p to q if the Legendre symbol \((\frac {p}{q}) = 1\). In [8], the graph appears as an induced subgraph of the commuting graph of the group U(∞, q)—infinite upper-triangular matrices with ones on the diagonal and entries in \(\mathbb {F}_q\). The vertices are points of U(∞, q). There is an edge from x to y if and only if x and y commute, that is, the commutator \(x^{-1}y^{-1}xy\) is the identity. The infinite Erdős–Rényi graphs G(∞, p) are almost surely isomorphic to R for all p, 0 < p < 1.

The graph R has a host of fascinating properties:

  • It is stable in the sense that deleting any finite number of vertices or edges yields an isomorphic graph. So does taking the complement.

  • It contains all finite or countable graphs as induced subgraphs. Thus, the (countable) empty graph and complete graphs both appear as induced subgraphs.

  • The diameter of R is two—consider any \(i\ne j \in \mathbb {N}\) and let k be a binary number with ones in positions i and j and zeros elsewhere. Then i ∼ k ∼ j (see the sketch after this list).

  • Each vertex is connected to “half” of the other vertices: 0 is connected to all the odd vertices, 1 to 0 and all numbers congruent to 2 or 3 (mod 4), and so on.

  • R is highly symmetric: Any automorphism between two induced subgraphs can be extended to all of R (this is called homogeneity). The automorphism group has the cardinality of the continuum.

  • R is the “limit” of the collection of all finite graphs (Fraissé limit). Let us spell this out. A relational structure is a set with a finite collection of relations (we are working in first order logic without constants or functions). For example, \(\mathbb {Q}\) with x < y is a relational structure. A graph is a set with one symmetric relation. The idea of a “relational sub-structure” clearly makes sense. A class \(\mathcal {C}\) of structures has the amalgamation property if for any \(A, B_1, B_2\in \mathcal {C}\) with embeddings \(f_1: A\to B_1\) and \(f_2: A\to B_2\), there exists \(C\in \mathcal {C}\) and embeddings \(g_1: B_1\to C\) and \(g_2: B_2\to C\) such that g 1 f 1 = g 2 f 2. A countable relational structure M is homogeneous if any isomorphism between finite substructures can be extended to an automorphism of M. Graphs and \(\mathbb {Q}\) are homogeneous relational structures. A class \(\mathcal {C}\) has the joint embedding property if for any \(A,B \in \mathcal {C}\) there is a \(C \in \mathcal {C}\) so that A and B are embeddable in C.

    Theorem 2 (Fraissé)

    Let \(\mathcal {C}\) be a countable class of finite structures, closed under isomorphism and induced substructures, with the joint embedding and amalgamation properties. Then there exists a unique countable homogeneous \(\mathcal {M}\) whose finite induced substructures are exactly the members of \(\mathcal {C}\).

    The rationals \(\mathbb {Q}\) are the Fraissé limit of finite ordered sets. The Rado graph R is the Fraissé limit of finite graphs. We have (several times!) been told “for a model theorist, the Rado graph is just as interesting as the rationals”.

There are many further, fascinating properties of R; see [6].
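For instance, the diameter-two witness from the list above can be verified mechanically; the helper below is a hypothetical illustration of that construction, not part of the original text:

```python
def witness(i: int, j: int) -> int:
    """The number k with ones exactly in binary positions i and j,
    so that i ~ k ~ j and the diameter of R is two."""
    assert i != j
    return (1 << i) | (1 << j)

adjacent = lambda a, b: a != b and (max(a, b) >> min(a, b)) & 1 == 1
i, j = 2, 5
k = witness(i, j)                     # k = 36 = 100100 in binary
assert adjacent(i, k) and adjacent(j, k)
```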

2.2 Markov Chains

A transition matrix K(i, j), \(0\leqslant i,j<\infty \), \(K(i,j)\geqslant 0\), \(\sum _{j=0}^\infty K(i,j)=1\) for all i, \(0\leqslant i<\infty \), generates a Markov chain through its powers

$$\displaystyle \begin{aligned} K^\ell(i,j) = \sum_{k=0}^\infty K(i,k) K^{\ell-1}(k,j). \end{aligned}$$

A probability distribution π(i), \(0\leqslant i<\infty \), is reversible for K if

$$\displaystyle \begin{aligned} \pi(i) K(i,j) = \pi(j) K(j,i) \ \ \text{ for all } 0\leqslant i,j<\infty. \end{aligned} $$
(3)

Example

With definitions (1), (2) on the Rado graph, if i ∼ j,

$$\displaystyle \begin{aligned} \pi(i)K(i,j) = \frac{Q(i)Q(N(i))}{Z} \frac{Q(j)}{Q(N(i))} = \frac{Q(i)Q(j)}{Z} = \pi(j) K(j,i). \end{aligned}$$

(Both sides are zero if i ≁ j.)
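The detailed balance computation can also be confirmed numerically on a finite truncation of R (both sides reduce to Q(i)Q(j)∕Z); the truncation size below is an arbitrary choice of ours:

```python
N = 12  # truncate the Rado graph to the vertices 0, ..., N-1
adjacent = lambda a, b: a != b and (max(a, b) >> min(a, b)) & 1 == 1
Q = [2.0 ** (-(x + 1)) for x in range(N)]
QN = [sum(Q[j] for j in range(N) if adjacent(i, j)) for i in range(N)]
K = [[Q[j] / QN[i] if adjacent(i, j) else 0.0 for j in range(N)]
     for i in range(N)]
Z = sum(Q[i] * QN[i] for i in range(N))
pi = [Q[i] * QN[i] / Z for i in range(N)]
for i in range(N):
    for j in range(N):
        # reversibility (3): both sides equal Q(i)Q(j)/Z when i ~ j
        assert abs(pi[i] * K[i][j] - pi[j] * K[j][i]) < 1e-12
```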

In the above example, we think of K(i, j) as a ‘ball walk’: From i, pick a neighbor j with probability proportional to Q(j) and move to j. We initially found the neat reversible measure surprising. Indeed, we and a generation of others thought that ball walks would have Q as a stationary distribution. Yuval Peres points out that, given a probability Q(j) on the vertices, assigning symmetric weight Q(i)Q(j) to i ∼ j gives this K for the weighted local walk. A double ball walk—“from i, choose a neighbor j with probability proportional to Q(j), and from j, choose a neighbor k with probability proportional to Q(k)∕Q(N(k))”—results in a reversible Markov chain with Q as reversing measure. Note that these double ball walks don’t require knowledge of normalizing constants. All of this suggests ball walks as reasonable objects to study.

Reversibility (3) shows that π is a stationary distribution for K:

$$\displaystyle \begin{aligned} \sum_{i=0}^\infty\pi(i) K(i,j) = \sum_{i=0}^\infty \pi(j)K(j,i) = \pi(j)\sum_{i=0}^\infty K(j,i) = \pi(j). \end{aligned}$$

In our setting, since the Rado graph has diameter 2, the walk is connected. It is easy to see that it is aperiodic. Thus, the π in (2) is the unique stationary distribution. Now, the fundamental theorem of Markov chain theory shows, for every starting state i, \(K^\ell(i, j) \to \pi(j)\) as \(\ell\to\infty\), and indeed,

$$\displaystyle \begin{aligned} \lim_{\ell \to\infty} \|K^\ell_i - \pi\|=0. \end{aligned}$$

Reversible Markov chains have real spectrum. Say that (K, π) has a spectral gap if there is A > 0 such that for every \(f\in L^2(\pi)\),

$$\displaystyle \begin{aligned} \sum_i (f(i)-\overline{f})^2 \pi(i) &\leqslant A \sum_{i,j} (f(i)-f(j))^2 \pi(i) K(i,j), \end{aligned} $$
(4)

where \(\overline {f} = \sum _{i=0}^\infty f(i)\pi (i)\). (Then the gap is at least 1∕A.) For chains with a spectral gap, for any i,

$$\displaystyle \begin{aligned} 4\|K_i^\ell - \pi\|{}^2 \leqslant \frac{1}{\pi(i)}\biggl(1-\frac{1}{A}\biggr)^{2\ell}. \end{aligned} $$
(5)
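On a finite truncation, the gap in (4) and the decay rate in (5) can be inspected directly by symmetrizing K; a sketch with numpy, with an arbitrary truncation size:

```python
import numpy as np

N = 12
adjacent = lambda a, b: a != b and (max(a, b) >> min(a, b)) & 1 == 1
Q = np.array([2.0 ** (-(x + 1)) for x in range(N)])
A = np.array([[adjacent(i, j) for j in range(N)] for i in range(N)], float)
K = A * Q / (A @ Q)[:, None]          # ball-walk kernel (1), truncated
pi = Q * (A @ Q) / (Q * (A @ Q)).sum()
s = np.sqrt(pi)
S = s[:, None] * K / s[None, :]       # similar to K; symmetric by reversibility
eig = np.sort(np.linalg.eigvalsh(S))
print("spectral gap of the truncation:", 1 - eig[-2])
```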

Background on Markov chains, particularly rates of convergence, can be found in the readable book of Levin and Peres [19]. For the analytic part of the theory, particularly (4) and (5), and many refinements, we recommend [23].

There has been a healthy development in Markov chain circles around the theme ‘How does a Markov chain on a random graph behave?’. One motivation is the question ‘What does a typical convergence rate look like?’. The graphs can be restricted in various natural ways (Cayley graphs, regular graphs of fixed degree or fixed average degree, etc.). Hildebrand’s survey of ‘random-random walks’ [14] covers by now classical work. Recent work by Bordenave and coauthors can be found in [4, 5]. For sparse Erdős–Rényi graphs, there is remarkable work on the walk restricted to the giant component. See [22], [11] and [3].

It is worth contrasting these works with the present efforts. The above results pick a neighbor uniformly at random. In the present paper, the ball walk drives the walk back towards zero. The papers above are all on finite graphs. The Markov chain of Theorem 1 makes perfect sense on finite graphs. The statements and proofs go through (with small changes) to show that order \(\log _2^* i\) steps are necessary and sufficient. (For the uniform walk on G(n, 1∕2), a bounded number of steps suffice from most initial states, but there are states from which \(\log _2^*n \) steps are needed.)

2.3 Hardy’s Inequalities

A key part of the proof of Theorem 1 applies Hardy’s inequalities for trees to prove a Poincaré inequality (cf. (4)) and hence a bound on the spectral gap. Despite a large expository literature, Hardy’s inequalities remain little known among probabilists. Our application can be read without this expository section, but we hope that some readers find it useful. Extensive further references, trying to bridge the gap between probabilists and analysts, are in [17].

Start with a discrete form of Hardy’s original inequality [13, pp. 239–243]. This says that if \(a_n\geqslant 0\) and \(A_n = a_1 + \cdots + a_n\), then

$$\displaystyle \begin{aligned} \sum_{n=1}^\infty \frac{A_n^2}{n^2} \leqslant 4\sum_{n=1}^\infty a_n^2, \end{aligned}$$

and the constant 4 is sharp. Analysts say that “the Hardy operator taking \(\{a_n\}\) to \(\{A_n/n\}\) is bounded from \(\ell^2\) to \(\ell^2\)”. Later writers showed how to put weights in. If μ(n) and ν(n) are positive functions, one aims for

$$\displaystyle \begin{aligned} \sum_{n=1}^\infty A_n^2 \mu(n) \leqslant A \sum_{n=1}^\infty a_n^2\nu(n), \end{aligned}$$

for an explicit A depending on μ(n) and ν(n). If \(\mu(n) = 1/n^2\) and ν(n) = 1, this gives the original Hardy inequality. To make the transition to a probabilistic application, take \(a_n = g(n) - g(n-1)\) for g in \(\ell^2\). The inequality becomes

$$\displaystyle \begin{aligned} \sum_{n=1}^\infty g(n)^2 \mu(n) \leqslant A \sum_{n=1}^\infty(g(n)-g(n-1))^2 \nu(n). \end{aligned} $$
(6)
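A quick numerical experiment illustrates the discrete inequality and the sharpness of the constant 4; the test sequences below are our own choices:

```python
def hardy_ratio(a):
    """Ratio (sum of (A_n/n)^2) / (sum of a_n^2) for a finite
    nonnegative sequence a_1, a_2, ...; Hardy says it is < 4."""
    A = lhs = rhs = 0.0
    for n, a_n in enumerate(a, start=1):
        A += a_n
        lhs += (A / n) ** 2
        rhs += a_n ** 2
    return lhs / rhs

print(hardy_ratio([1.0] * 100))                             # 1.0, well below 4
# near-extremal: a_n = n^(-1/2) pushes the ratio slowly toward 4
print(hardy_ratio([n ** -0.5 for n in range(1, 10 ** 6)]))  # about 3.2
```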

Consider a ‘birth and death chain’ which transits from j to j + 1 with probability b(j) and from j to j − 1 with probability d(j). Suppose that this has stationary distribution μ(j) and that ∑j g(j)μ(j) = 0. Set ν(j) = μ(j)d(j). Then (6) becomes (following simple manipulations)

$$\displaystyle \begin{aligned} \mathrm{Var}(g) \leqslant A \sum_{j,k} (g(j)-g(k))^2 \mu(j)K(j,k) \end{aligned} $$
(7)

with K(j, k) the transition matrix of the birth and death chain. This gives a Poincaré inequality and spectral gap estimate. A crucial ingredient for applying this program is that the constant A must be explicit and manageable. For birth-death chains, this is indeed the case. See [21] or the applications in [9]. The transition from (6) to (7) leans on the one-dimensional setup of birth-death chains. While there is work on Hardy’s inequalities in higher dimensions, it is much more complex; in particular, useful forms of good constants A seem out of reach. In [21], Miclo showed how, for a general Markov transition matrix K(i, j), the analysis can be reduced to a spanning tree of the graph underlying K. There is a useful version of Hardy’s inequality for trees due to Evans, Harris and Pick [10]. This is the approach developed in Sect. 7 below, which gives further background and details.

Is approximation by trees good enough? There is some hope that the best tree is good enough (see [2]). In the present application, the tree chosen gives the needed result.

2.4 The \(\log ^*\) Function

Take any a > 1. The following is a careful definition of \(\log _a^*x\) for \(x\geqslant 0\). First, an easy verification shows that the map \(x\mapsto (\log x)/x\) on (0, ∞) is unimodal, with a unique maximum at x = e (where its value is 1∕e), and decaying to −∞ as x → 0 and to 0 as x → ∞. Thus, if \(a > e^{1/e}\), then for any x > 0,

$$\displaystyle \begin{aligned} \log_a x = \frac{\log x}{\log a} \leqslant \frac{x}{e\log a} < x. \end{aligned}$$

Since loga is a continuous map, this shows that if we start with any x > 0, iterative applications of loga will eventually lead to a point in (0, a) (because there are no fixed points of loga above that, by the above inequality), and then another application of loga will yield a negative number. This allows us to define \(\log _a^*x\) as the minimum number of applications of loga, starting from x, that gets us a nonpositive result.

If \(a \leqslant e^{1/e}\), the situation is a bit more complicated. Here, \(\log a \leqslant 1/e\), which is the maximum value of the unimodal map \(x\mapsto (\log x)/x\). This implies that there exist exactly two fixed points \(0 < y_a\leqslant x_a\) of loga (with \(y_a = x_a\) if \(a = e^{1/e}\)). Moreover, \(\log_a x < x\) if \(x\notin [y_a, x_a]\), and \(\log _a x\geqslant x\) if \(x \in [y_a, x_a]\). Thus, the previous definition does not work. Instead, we define \(\log _a^*x\) to be the minimum number of applications of loga, starting from x, that leads us to a result \(\leqslant x_a\). In both cases, defining \(\log _a^*0=0\) is consistent with the conventions. Note that \(\log _a^* x\geqslant 0\) for all \(x\geqslant 0\).
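In code, for \(a > e^{1/e}\) (which covers the base a = 2 used throughout), the definition reads as follows; this is a direct transcription of the first case above:

```python
import math

def log_star(x: float, a: float = 2.0) -> int:
    """log_a^* x for a > e^(1/e): the minimum number of applications of
    log_a, starting from x, giving a nonpositive result (log_a^* 0 = 0)."""
    assert a > math.e ** (1 / math.e)
    count = 0
    while x > 0:
        x = math.log(x, a)
        count += 1
    return count

# towers of 2 grow the value one unit at a time:
print([log_star(2.0 ** (2.0 ** k)) for k in range(4)])  # [2, 3, 4, 5]
```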

3 The Geometry of the Random Model

Throughout this section the graph is G(∞, 1∕2) — an Erdős–Rényi graph on \(\mathbb {N}=\{0,1,2,\ldots \}\) with probability 1∕2 for each possible edge. From here on, we will use the notation \(\mathbb {N}_+\) to denote the set {1, 2, …} of strictly positive integers. Let \(Q(x) = 2^{-(x+1)}\) for \(x \in \mathbb {N}\). The transition matrix

$$\displaystyle \begin{aligned} K(x,y) = \begin{cases} Q(y)/Q(N(x)) &\text{ if } x\sim y,\\ 0 &\text{ otherwise,} \end{cases} \end{aligned}$$

and its stationary distribution π(x) = Z −1 Q(x)Q(N(x)) are thus random variables. Note that N(x), the neighborhood of x, is random. The main result of this section shows that this graph, with vertices weighted by Q(x), has its geometry controlled by a tree rooted at 0. This tree will appear in both lower and upper bounds on the mixing time for the random model.

To describe things, let \(p(x)=\min N(x)\) (p is for ‘parent’, not to be confused with the edge probability p in G(∞, p)). We need some preliminaries about p(x).

Lemma 1

Let \(\mathcal {B}\) be the event that for all \(x\in \mathbb {N}_+\) , p(x) < x. Then we have that \(\mathbb {P}(\mathcal {B})\geqslant 1/4\).

Proof

Denote \(\overline E := \{\{x,y\}\,:\, x\ne y\in \mathbb {N}\}\), and for any \(e\in \overline E\), consider \(B_e := 1_{\{e\in E\}}\), where E is the set of edges in \(\mathcal {G}\), the G(∞, 1∕2) graph, so that \((B_e)_{e\in \overline E}\) is a family of independent Bernoulli variables of parameter 1∕2.

For \(x\in \mathbb {N}_+\), define A x the event that x is not linked in \(\mathcal {G}\) to a smaller vertex. Namely, we have formally

$$\displaystyle \begin{aligned} A_x := \bigcap_{y\in\{0,1,\ldots,x-1\}}\{B_{\{y,x\}}=0\}. \end{aligned}$$

Note that the family \((A_x)_{x\in \mathbb {N}_+}\) is independent, and in particular, its events are pairwise independent. We are thus in position to apply Kounias–Hunter–Worsley bounds [15, 18, 26] (see also the survey [24]), to see that for any \(n\in \mathbb {N}_+\),

$$\displaystyle \begin{aligned} \mathbb{P}\biggl(\bigcup_{x=1}^n A_x\biggr) \leqslant \sum_{x=1}^n \mathbb{P}(A_x) - \sum_{x=2}^n \mathbb{P}(A_1\cap A_x), \end{aligned}$$

where we used that \( \mathbb {P}(A_1)\geqslant \mathbb {P}(A_2)\geqslant \cdots \geqslant \mathbb {P}(A_n)\), which holds because

$$\displaystyle \begin{aligned} \forall\ x\in\mathbb{N}_+,\qquad \mathbb{P}(A_x) = 2^{-x}. \end{aligned}$$

We deduce that

$$\displaystyle \begin{aligned} \mathbb{P}\biggl(\bigcup_{x=1}^n A_x\biggr) \leqslant \sum_{x=1}^n 2^{-x} - \frac 12\sum_{x=2}^n 2^{-x} = \frac 34 - 2^{-(n+1)}. \end{aligned}$$

Letting n tend to infinity, we get \( \mathbb {P}\!\left (\bigcup _{x\in \mathbb {N}_+} A_x\right )\leqslant \frac 34\). To conclude, note that \( \mathcal {B}^{\mathrm {c}}=\bigcup _{x\in \mathbb {N}_+} A_x\). □

Remark 1

Assume that instead of 1∕2, the edges of \(\overline E\) belong to E with probability p ∈ (0, 1) (still independently); the corresponding notions then receive a subscript p. The above computations show \( \mathbb {P}_p(\mathcal {B})\geqslant 1- (2-3p +p^2)\wedge 1\), so that \(\mathbb {P}_p(\mathcal {B})\) goes to 1 as p goes to 1, but this bound provides no information for \(p\in (0,(3-\sqrt {5})/2]\).

In fact the above observation shows that the Kounias–Hunter–Worsley bound is not optimal, at least for small p > 0. So let us give another computation of \(\mathbb {P}_p(\mathcal {B})\):

Lemma 2

Consider the situation described in Remark 1 , with p ∈ (0, 1). We have

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathbb{P}_p(\mathcal{B})& =&\displaystyle \biggl(\sum_{n\in\mathbb{N}} p(n)(1-p)^n\biggr)^{-1}\end{array} \end{aligned} $$

where p(n) is the number of partitions of n. In particular \(\mathbb {P}_p(\mathcal {B})>0\) for all p ∈ (0, 1).

Proof

Indeed, we have \( \mathcal {B}=\bigcap _{x\in \mathbb {N}_+} A_x^{\mathrm {c}}\), with \(\mathbb {P}(A_x^{\mathrm {c}}) = 1-(1-p)^x\) for \(x\in \mathbb {N}_+\), so that by independence of the A x,

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathbb{P}_p(\mathcal{B})=\prod_{x\in\mathbb{N}_+} \mathbb{P}(A_x^{\mathrm{c}}) =\biggl(\prod_{x\in\mathbb{N}_+} \frac 1{1-(1-p)^x}\biggr)^{-1} =\biggl(\prod_{x\in\mathbb{N}_+} \sum_{n\in\mathbb{N}}(1-p)^{xn}\biggr)^{-1}\\ \end{array} \end{aligned} $$

Let \(\mathcal {N}\) be the set of sequences of integers \((n_l)_{l\in \mathbb {N}_+}\) with all but finitely many elements equal to zero. Applying the distributive law to the above expression, we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \mathbb{P}_p(\mathcal{B})=\biggl(\sum_{(n_l)_{l\in\mathbb{N}_+}\in\mathcal{N}} \prod_{x\in\mathbb{N}_+}(1-p)^{xn_x}\biggr)^{-1} =\biggl(\sum_{n\in\mathbb{N}} p(n)(1-p)^n\biggr)^{-1}\end{array} \end{aligned} $$

where p(n) is the number of ways to write n as \(\sum _{x\in \mathbb {N}_+} xn_x\), with \((n_l)_{l\in \mathbb {N}_+}\in \mathcal {N}\). □
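The last identity is Euler’s generating function for partitions, \(\prod_{x\geqslant 1}(1-q^x)^{-1} = \sum_n p(n)q^n\) with q = 1 − p. A numerical sanity check of Lemma 2, with truncation levels chosen ad hoc:

```python
def partitions_upto(N):
    """p(0), ..., p(N) by the standard coin-change recurrence."""
    p = [1] + [0] * N
    for part in range(1, N + 1):
        for n in range(part, N + 1):
            p[n] += p[n - part]
    return p

q = 0.5  # q = 1 - p for edge probability p = 1/2
series = sum(pn * q ** n for n, pn in enumerate(partitions_upto(60)))
product = 1.0
for x in range(1, 400):
    product /= 1.0 - q ** x
print(series, product)  # both about 3.4627
print(1.0 / product)    # P(B) about 0.2888, refining the 1/4 bound of Lemma 1
```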

Consider the set of edges

$$\displaystyle \begin{aligned} F := \{\{p(x),x\}\,:\, x\in\mathbb{N}_+\} \end{aligned}$$

and the corresponding graph \(\mathcal {T} := (\mathbb {N}, F)\). Under \(\mathcal {B}\), it is clear that \(\mathcal {T}\) is a tree. But this is always true:

Lemma 3

The graph \(\mathcal {T}\) is a tree.

Proof

The argument is by contradiction. Assume that \(\mathcal {T}\) contains a cycle, say \((x_l)_{l\in \mathbb {Z}_n}\) with \(n\geqslant 3\). Let us direct the a priori unoriented edges {x l, x l+1}, for \(l\in \mathbb {Z}_n\), by putting an arrow from x l to x l+1 (respectively from x l+1 to x l) if p(x l) = x l+1 (resp. p(x l+1) = x l). Note that we either have

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \forall\ l\in\mathbb{Z}_n,\, x_l\rightarrow x_{l+1}, & \mbox{ or }&\displaystyle \forall\ l\in\mathbb{Z}_n,\, x_{l+1}\rightarrow x_{l},\end{array} \end{aligned} $$
(8)

because otherwise there would exist \(l\in \mathbb {Z}_n\) with two arrows exiting from x l, a contradiction. Up to reindexing \((x_l)_{l\in \mathbb {Z}_n}\) as \((x_{-l})_{l\in \mathbb {Z}_n}\), we can assume that (8) holds.

Fix some \(l\in \mathbb {Z}_n\). Since p(x l) = x l+1, we have x l ∈ N(x l+1), so \(x_{l+2}=p(x_{l+1})\leqslant x_l\). Due to the fact that x l ≠ x l+2 (recall that \(n\geqslant 3\)), we get x l+2 < x l. Starting from x 0 and iterating this relation (in a minimal way, n∕2 times if n is even, or n times if n is odd), we obtain a contradiction: x 0 < x 0. Thus, \(\mathcal {T}\) must be a tree. □

Let us come back to the case where p = 1∕2. The following result gives an idea of how far p(x) is from x, for \(x\in \mathbb {N}_+\).

Lemma 4

Almost surely, there exist only finitely many \(x\in \mathbb {N}_+\) such that p(x) > 2log2(1 + x). In particular, a.s. there exists a (random) finite \(C\geqslant 2\) such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \forall\ x\in\mathbb{N}_+,\qquad p(x)& \leqslant &\displaystyle C\log_2(1+x).\end{array} \end{aligned} $$

Proof

The first assertion follows from the Borel–Cantelli lemma, as follows. For any \(x\in \mathbb {N}_+\), consider the event

$$\displaystyle \begin{aligned} A_x := \{p(x) > 2\log_2(1+x)\}. \end{aligned}$$

Denoting ⌊⋅⌋ the integer part, we compute

$$\displaystyle \begin{aligned} \begin{array}{rcl} \sum_{x\in\mathbb{N}_+} \mathbb{P}(A_x)& =&\displaystyle \sum_{x\in\mathbb{N}_+} \mathbb{P}(B_{\{0,x\}}=0, B_{\{1,x\}}=0, \ldots, B_{\{\lfloor 2\log_2(1+x)\rfloor,x\}}=0)\\ & =&\displaystyle \sum_{x\in\mathbb{N}_+} \frac 1{2^{1+\lfloor 2\log_2(1+x)\rfloor}} \leqslant \sum_{x\in\mathbb{N}_+} \frac 1{(1+x)^2} <+\infty.\end{array} \end{aligned} $$

Having shown that a.s. there exists only a finite number of integers \(x\in \mathbb {N}_+\) satisfying p(x) > 2log2(1 + x), denote these points as x 1, …, x N, with \(N\in \mathbb {N}\). To get the second assertion, it is sufficient to take \(C := 2\vee \max _{1\leqslant i\leqslant N} p(x_i)/\log _2(1+x_i)\), with the convention that \(C := 2\) if N = 0. □
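A small simulation illustrates Lemma 4. In G(∞, 1∕2), p(x) is the index of the first present edge when scanning y = 0, 1, 2, …; the sketch below resamples edges independently for each x (consistent with the independence used in the proof), and the range tested is an arbitrary choice:

```python
import math
import random

random.seed(1)

def min_neighbor(x: int) -> int:
    """Sample p(x) = min N(x) in G(infinity, 1/2): return the first
    y != x whose edge {y, x} is present (each present with prob. 1/2)."""
    y = 0
    while True:
        if y != x and random.random() < 0.5:
            return y
        y += 1

bad = [x for x in range(1, 100001) if min_neighbor(x) > 2 * math.log2(1 + x)]
print(bad)  # Borel-Cantelli: almost surely only finitely many such x
```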

4 The Lower Bound

The lower bound in Theorem 1, showing that order \(\log _2^*i\) steps are necessary for infinitely many i, is proved in [8] for the binary model of the Rado graph, and we refer there for the proof. A different argument is needed for the G(∞, 1∕2) model. This section gives the details (see Theorem 3 below).

Let μ be the stationary distribution of our random walk on G(, 1∕2) (with Q(j) = 2−(j+1), as in Theorem 1), given a realization of the graph. Note that μ is random. For each \(x\in \mathbb {N}\), let τ x be the mixing time of the walk starting from x, that is, the smallest n such that the law of the walk at time n, starting from x, has total variation distance \(\leqslant 1/4\) from μ. Note that the τ x’s are also random.

Theorem 3

Let τ x be as above. Then with probability one,

$$\displaystyle \begin{aligned} \limsup_{x\to\infty} \frac{\tau_x}{\log_{16}^* x} \geqslant 1. \end{aligned}$$

Equivalently, with probability one, given any ε > 0, \(\tau _{x} \geqslant (1-\varepsilon )\log _{16}^* x\) for infinitely many x.

We need the following lemma.

Lemma 5

With probability one, there is an infinite sequence \(x_0< x_1<x_2<\cdots \in \mathbb {N}\) such that:

  1.

    For each i, x i+1 is connected to x i by an edge, but not connected by an edge to any other number in {0, 1, …, 2x i − 1}.

  2.

    For each i, \(2^{3x_i}\leqslant x_{i+1} \leqslant 2^{3x_i+1}-1\).

Proof

Define a sequence y 0, y 1, y 2, … inductively as follows. Let y 0 be an arbitrary element of \(\mathbb {N}\). For each i, let y i+1 be the smallest element in \(\{2^{3y_i}, 2^{3y_i}+1,\ldots ,2^{3y_i+1}-1\}\) that has an edge to y i, but to no other number in {0, 1, …, 2y i − 1}. If there exists no such number, then the process stops. Let A i be the event that y i exists. Note that A 0 ⊇ A 1 ⊇ A 2 ⊇⋯.

Let \(F(x) := 2^{3x}\), \(G(x) := 2^{3x+1}-1\), a 0 = b 0 = y 0, and for each \(i\geqslant 1\), let

$$\displaystyle \begin{aligned} a_i := F(a_{i-1}), \qquad b_i := G(b_{i-1}). \end{aligned}$$

Since \(2^{3y_i} \leqslant y_{i+1}\leqslant 2^{3y_i+1}-1\) for each i, it follows by induction that \(a_i\leqslant y_i\leqslant b_i\) for each i (if y i exists). Now fix some \(i\geqslant 1\). Since the event A i−1 is determined by y 1, …, y i−1, and these random variables can take only finitely many values (by the above paragraph), we can write A i−1 as a finite union of events of the form {y 1 = c 1, …, y i−1 = c i−1}, where \(c_1< c_2 <\cdots < c_{i-1}\in \mathbb {N}\).

Now note that for any c 1 < ⋯ < c i−1, the event \(A_i\cap \{y_1=c_1,\ldots ,y_{i-1}=c_{i-1}\}\) happens if and only if {y 1 = c 1, …, y i−1 = c i−1} happens and there is some \(y\in \{2^{3c_{i-1}}, 2^{3c_{i-1}}+1,\ldots ,2^{3c_{i-1}+1}-1\}\) that has an edge to c i−1, but to no other number in {0, …, 2c i−1 − 1}. The event {y 1 = c 1, …, y i−1 = c i−1} is in \(\mathcal {F}_{c_{i-1}}\), where \(\mathcal {F}_x\) denotes the σ-algebra generated by the edges between all numbers in {0, …, x}. On the other hand, on the event {y 1 = c 1, …, y i−1 = c i−1}, it is not hard to see that

$$\displaystyle \begin{aligned} \mathbb{P}(A_i | \mathcal{F}_{c_{i-1}}) = 1 - (1- 2^{-2c_{i-1}})^{2^{3c_{i-1}}}. \end{aligned}$$

Thus,

$$\displaystyle \begin{aligned} \mathbb{P}(A_i\cap\{y_1=c_1,\ldots,y_{i-1}=c_{i-1}\}) \geqslant \bigl(1-e^{-2^{c_{i-1}}}\bigr)\,\mathbb{P}(y_1=c_1,\ldots,y_{i-1}=c_{i-1}), \end{aligned}$$

where in the last step we used the inequality \(0\leqslant 1-x\leqslant e^{-x}\) (which holds for all x ∈ [0, 1]). Note that the term inside the parentheses on the right side is an increasing function of c i−1, and the maximum possible value of y i−1 is b i−1. Thus, summing both sides over all values of c 1, …, c i−1 such that {y 1 = c 1, …, y i−1 = c i−1}⊆ A i−1, we get \(\mathbb {P}(A_i)\geqslant (1-e^{-2^{b_{i-1}}})\,\mathbb {P}(A_{i-1})\). Proceeding inductively, this gives

$$\displaystyle \begin{aligned} \mathbb{P}(A_i)\geqslant \prod_{k=0}^{i-1}\bigl(1-e^{-2^{b_k}}\bigr). \end{aligned}$$

Taking i →∞, we get \(\mathbb {P}(B)\geqslant \prod _{k=0}^\infty (1-e^{-2^{b_k}})\), where \(B := \bigcap _{k=1}^\infty A_k\). Now recall that the event B, as well as the numbers b 0, b 1, …, are dependent on our choice of y 0. To emphasize this dependence, let us write them as B(y 0) and b k(y 0). Then by the above inequality,

$$\displaystyle \begin{aligned} \sum_{y_0\in \mathbb{N}} \mathbb{P}(B(y_0)^c) \leqslant \sum_{y_0\in \mathbb{N}} \biggl(1-\prod_{k=0}^\infty (1-e^{-2^{b_k(y_0)}})\biggr), \end{aligned} $$

where B(y 0)c denotes the complement of B(y 0). Due to the extremely rapid growth of b k(y 0) as k →∞, and the fact that b 0(y 0) = y 0, it is not hard to see that the right side is finite. Therefore, by the Borel–Cantelli lemma, B(y 0)c happens for only finitely many y 0 with probability one. In particular, with probability one, B(y 0) happens for some y 0. This completes the proof. □

Proof (Of Theorem 3)

Fix a realization of G(∞, 1∕2). Let x be so large that μ([x, ∞)) < 1∕10, and \( \prod _{k=1}^\infty (1-2^{-a_k(x)+1})\geqslant {9}/{10}\).

Let x 0, x 1, x 2, … be a sequence having the properties listed in Lemma 5 (which exists with probability one, by the lemma). Discarding some initial values if necessary, let us assume that x 0 > x. By the listed properties, it is obvious that x i →∞ as i →∞. Thus, to prove Theorem 3, it suffices to prove that

$$\displaystyle \begin{aligned} \liminf_{i\to\infty} \frac{\tau_{x_i}}{\log_{16}^* x_i} \geqslant 1. \end{aligned} $$
(9)

We will now deduce this from the properties of the sequence.

Suppose that our random walk starts from x i for some \(i\geqslant 1\). Since x i connects to x i−1 by an edge, but not to any other number in {0, …, 2x i−1 − 1}, we see that the probability of the walk landing up at x i−1 in the next step is at least

$$\displaystyle \begin{aligned} 1 - \frac{1}{2^{-x_i}}\sum_{k= 2x_i}^\infty 2^{-k} = 1- 2^{-x_i + 1}. \end{aligned}$$

Proceeding by induction, this shows that the chance that the walk lands up at x 0 at step i is at least \( \prod _{k=1}^i(1- 2^{-x_k + 1}) \). Let μ i be the law of the walk at step i (starting from x i, and conditional on the fixed realization of our random graph). Then by the above deduction and the facts that x 0 > x and \(x_k \geqslant a_k(x_0)\geqslant a_k(x)\), we have

$$\displaystyle \begin{aligned} \mu_i([x,\infty)) &\geqslant \prod_{k=1}^i(1- 2^{-x_k + 1})\geqslant \prod_{k=1}^i (1-2^{-a_k(x)+1}). \end{aligned} $$

By our choice of x, the last expression is bounded below by 9∕10. But μ([x, )) < 1∕10. Thus, the total variation distance between μ i and μ is at least 8∕10. In particular, \(\tau _{x_i} > i\). Now, \(x_i \leqslant 2^{3x_{i-1} + 1}-1 \leqslant 16^{x_{i-1}}\), which shows that \(\log _{16}^*x_i \leqslant \log _{16}^* x_{i-1} + 1\). Proceeding inductively, we get \( \log _{16}^*x_i \leqslant i + \log _{16}^* x_0 \). Thus, \(\tau _{x_i} > \log _{16}^* x_i - \log _{16}^* x_0\). This proves (9). □

5 The Upper Bound (Assuming a Spectral Gap)

This section gives the upper bound for both the binary and random model of the Rado graph. Indeed, the proof works for a somewhat general class of graphs and more general base measures Q. The argument assumes that we have a spectral gap estimate. These are proved below in Sects. 6 and 7. We give this part of the argument first because, as with earlier sections, it gives a useful picture of the random graph.

Take any undirected graph on the nonnegative integers, with the property:

$$\displaystyle \begin{aligned} \text{there is a constant } C>0 \text{ such that every } x\geqslant 2 \text{ has a neighbor } y \text{ with } y\leqslant C\log x. \end{aligned} $$
(10)

Let \(\{X_n\}_{n\geqslant 0}\) be the Markov chain on this graph, which, starting at state i, jumps to a neighbor j with probability proportional to \(Q(j) = 2^{-(j+1)}\). The following is the main result of this section.

Theorem 4

Let K be the transition kernel of the Markov chain defined above. Suppose that K has a spectral gap. Let μ be the stationary distribution of the chain, and let \(a := e^{1/C}\), with C the constant in (10). Then for any \(i\in \mathbb {N}\) and any \(\ell \geqslant 1\),

$$\displaystyle \begin{aligned} \|K_i^\ell - \mu\|\leqslant C_1e^{\log_a^*i} e^{-C_2 \ell}, \end{aligned} $$

where C 1 and C 2 are positive constants that depend only on the properties of the chain (and not on i or ℓ).

By Lemma 4, G(∞, 1∕2) satisfies the property (10) with probability one, for some C that may depend on the realization of the graph. The Rado graph also satisfies property (10), with \(C = 1/\log 2\). Thus, the random walk starting from j mixes in time \(\log _2^* j\) on the Rado graph, provided that it has a spectral gap. For G(∞, 1∕2), assuming that the walk has a spectral gap, the mixing time starting from j is \(\log _a^* j\), where a depends on the realization of the graph. The spectral gap for G(∞, 1∕2) will be proved in Sect. 6, and the spectral gap for the Rado graph will be established in Sect. 7. Therefore, this proves Theorem 1 and also the analogous result for G(∞, 1∕2).
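Before the proof, here is a Monte Carlo sketch (ours, not the proof’s argument) of the phenomenon behind Theorem 4 on the binary model: started from a tower of 2’s, the walk needs roughly one step per level of the tower to come down to a fixed neighborhood of 0. The cutoff `cap` and threshold `j0` are illustrative choices; truncating neighbors above `cap` discards Q-mass below \(2^{-\mathtt{cap}}\):

```python
import random

random.seed(0)
adjacent = lambda a, b: a != b and (max(a, b) >> min(a, b)) & 1 == 1

def step(i: int, cap: int = 300) -> int:
    """One ball-walk step, Q(j) proportional to 2^-(j+1), truncated at cap."""
    nbrs = [j for j in range(cap + 1) if adjacent(i, j)]
    w = [2.0 ** (-(j + 1)) for j in nbrs]
    return random.choices(nbrs, weights=w, k=1)[0]

j0 = 4
for start in (2, 2 ** 2, 2 ** 16, 2 ** 255):  # rough towers of increasing height
    times = []
    for _ in range(50):
        x, t = start, 0
        while x > j0:
            x, t = step(x), t + 1
        times.append(t)
    print(start.bit_length() - 1, sum(times) / len(times))
```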

Proof (Of Theorem 4)

Note that a > 1. Let \(Z_n := \log _a^* X_n\). We claim that there is some j 0 sufficiently large, and some positive constant c, such that

$$\displaystyle \begin{aligned} \mathbb{E}(e^{Z_{n+1}}|\mathcal{F}_n) \leqslant e^{Z_n-c} \ \ \ \text{ if } X_n >j_0, \end{aligned} $$
(11)

where \(\mathcal {F}_n\) is the σ-algebra generated by X 0, …, X n. (The proof is given below.) This implies that if we define the stopping time \(S := \min \{n\geqslant 0: X_n\leqslant j_0\}\), then \(\{e^{Z_{S\wedge n} + c(S\wedge n)}\}_{n\geqslant 0}\) is a supermartingale with respect to the filtration \(\{\mathcal {F}_{n}\}_{n\geqslant 0}\) (see details below). Moreover, it is nonnegative. Thus, if we start from the deterministic initial condition X 0 = j, then for any n,

$$\displaystyle \begin{aligned} \mathbb{E}(e^{Z_{S\wedge n} + c (S\wedge n)}|X_0=j) \leqslant e^{Z_{S\wedge 0} + c (S\wedge 0)} = e^{\log_a^* j}. \end{aligned} $$

But \(Z_{S\wedge n}\geqslant 0\). Thus, \(\mathbb {E}(e^{c(S\wedge n)}|X_0=j)\leqslant e^{\log _a^* j}\). Taking n →∞ and applying the monotone convergence theorem, we get

$$\displaystyle \begin{aligned} \mathbb{E}(e^{cS}|X_0=j)\leqslant e^{\log_a^* j}. \end{aligned} $$
(12)

Now take any \(j\geqslant 1\) and \(n\geqslant 1\). Let μ be the stationary distribution, and let μ j,n be the law of X n when X 0 = j. Take any A ⊆{0, 1, …}. Then for any \(m\leqslant n\),

$$\displaystyle \begin{aligned} \mu_{j,n}(A) = \sum_{i=0}^m\sum_{l=0}^{j_0} \mathbb{P}(X_n\in A|S=i, \, X_i=l, \, X_0=j)\,\mathbb{P}(S=i, \, X_i=l|X_0=j) + \mathbb{P}(X_n\in A, \, S>m|X_0=j). \end{aligned} $$

But

$$\displaystyle \begin{aligned} \mathbb{P}(X_n\in A|S=i, \, X_i = l, \, X_0=j) &= \mathbb{P}(X_n \in A|X_i = l) = \mu_{l,n-i}(A), \end{aligned} $$

and

$$\displaystyle \begin{aligned} \mu(A) &= \sum_{i=0}^m\sum_{l=0}^{j_0}\mu(A)\mathbb{P}(S=i, \, X_i=l|X_0=j) + \mu(A)\mathbb{P}(S > m|X_0=j). \end{aligned} $$

Thus, |μ j,n(A) − μ(A)| can be bounded above by

$$\displaystyle \begin{aligned} \sum_{i=0}^m\sum_{l=0}^{j_0} |\mu_{l,n-i}(A) - \mu(A)|\mathbb{P}(S=i, \, X_i=l|X_0=j) +\mathbb{P}(S>m |X_0=j). \end{aligned} $$

Now, if our Markov chain has a spectral gap, there exist constants C 1 and C 2 depending only on j 0 and the spectral gap, such that

$$\displaystyle \begin{aligned} |\mu_{l,n-i}(A) - \mu(A)| \leqslant C_1 e^{-C_2(n-i)}\leqslant C_1 e^{-C_2(n-m)} \end{aligned}$$

for all \(0\leqslant i\leqslant m\) and \(0\leqslant l\leqslant j_0\). Using this bound and the bound (12) on \(\mathbb {E}(e^{cS}|X_0=j)\) obtained above, we get

$$\displaystyle \begin{aligned} |\mu_{j,n}(A)-\mu(A)| &\leqslant C_1 e^{-C_2(n-m)} + e^{\log_a^* j - cm}. \end{aligned} $$

Taking m = ⌈n∕2⌉, we get the desired result. □

Proof (Of inequality (11))

It suffices to take n = 0. Suppose that X 0 = j for some \(j\geqslant 1\). By assumption, there is a neighbor k of j such that \(k\leqslant C\log j = \log _a j\). Assuming that j is sufficiently large (depending on C), we have that for any \(l\leqslant k\),

$$\displaystyle \begin{aligned} \log_a^* l \leqslant \log_a^* k \leqslant \log_a^*(\log_a j) = \log_a^*j -1. \end{aligned}$$

Also, \(\log _a^*l \leqslant \log _a^* j\) for any \(l\leqslant j\). Thus,

$$\displaystyle \begin{aligned} \mathbb{E}(e^{Z_1 - Z_0}|X_0=j) &\leqslant e^{-1}\mathbb{P}(X_1 \leqslant k |X_0=j) + \mathbb{P}(k< X_1\leqslant j | X_0=j)\\ &\qquad + \sum_{l> j} e^{\log_a^* l - \log_a^*j}\mathbb{P}(X_1=l| X_0=j). \end{aligned} $$

Now for any \(l \geqslant k\),

$$\displaystyle \begin{aligned} \mathbb{P}(X_1=l | X_0=j) \leqslant \frac{\mathbb{P}(X_1=l | X_0=j)}{\mathbb{P}(X_1 = k | X_0=j)} = \frac{Q(l)}{Q(k)} = 2^{-(l-k)}. \end{aligned} $$

Thus,

$$\displaystyle \begin{aligned} \sum_{l>j} e^{\log_a^*l - \log_a^*j} \mathbb{P}(X_1=l | X_0=j)&\leqslant \sum_{l > j} e^{\log_a^*l - \log_a^*j} 2^{-(l-k)}, \end{aligned} $$

which is less than 1∕4 if j is sufficiently large (since \(k\leqslant \log _a j\)). Next, let L be the set of all l > k that are connected to j. Then

$$\displaystyle \begin{aligned} \mathbb{P}(X_1 > k | X_0=j) &\leqslant \frac{\mathbb{P}(X_1 > k | X_0=j) }{\mathbb{P}(X_1 \geqslant k | X_0=j) } = \frac{\sum_{l \in L} 2^{-l}}{2^{-k}+ \sum_{l\in L} 2^{-l}}. \end{aligned} $$

Since the map \(x\mapsto x/(2^{-k}+x)\) is increasing, this shows that

$$\displaystyle \begin{aligned} \mathbb{P}(X_1 > k | X_0=j) &\leqslant \frac{\sum_{l > k} 2^{-l}}{2^{-k}+ \sum_{l> k} 2^{-l}} = \frac{1}{2}. \end{aligned} $$

Combining, we get that for sufficiently large j,

$$\displaystyle \begin{aligned} \mathbb{E}(e^{Z_1 - Z_0}|X_0=j) &\leqslant e^{-1} \mathbb{P}(X_1\leqslant k|X_0=j) + \mathbb{P}(X_1 > k|X_0=j) + \frac{1}{4}\\ {} &= e^{-1} + (1-e^{-1}) \mathbb{P}(X_1 > k|X_0=j) + \frac{1}{4}\\ {} &\leqslant e^{-1} + \frac{1-e^{-1}}{2} + \frac{1}{4} = \frac{3+2e^{-1}}{4} < 1. \end{aligned} $$

Proof (Of the Supermartingale Property)

Note that

$$\displaystyle \begin{aligned} &{\mathbb{E}(e^{Z_{S\wedge (n+1)} + c(S\wedge (n+1))}|\mathcal{F}_{n})}\\ &= \sum_{i=0}^n \mathbb{E}(e^{Z_{S\wedge (n+1)} + c(S\wedge (n+1))} 1_{\{S=i\}}|\mathcal{F}_{n}) + \mathbb{E}(e^{Z_{S\wedge (n+1)} + c(S\wedge (n+1))} 1_{\{S> n\}}|\mathcal{F}_{ n})\\ &= \sum_{i=0}^n \mathbb{E}(e^{Z_{i} + c i} 1_{\{S=i\}}|\mathcal{F}_{n})+ \mathbb{E}(e^{Z_{n+1} + c(n+1)} 1_{\{S> n\}}|\mathcal{F}_{ n}). \end{aligned} $$

The events {S = i} are \(\mathcal {F}_{n}\)-measurable for all \(0\leqslant i\leqslant n\), and so is the event {S > n}. Moreover, Z 0, …, Z n are also \(\mathcal {F}_{n}\)-measurable. Thus, the above expression shows that

$$\displaystyle \begin{aligned} \mathbb{E}(e^{Z_{S\wedge (n+1)} + c(S\wedge (n+1))}|\mathcal{F}_{n}) \!=\!1_{\{S\leqslant n\}} e^{Z_{S\wedge n} + c(S\wedge n)}\!+\!1_{\{S> n\}}\mathbb{E}(e^{Z_{n+1} + c(n+1)} |\mathcal{F}_{ n}). \end{aligned} $$

But if S > n, then \(X_n > j_0\), and therefore by (11),

$$\displaystyle \begin{aligned} \mathbb{E}(e^{Z_{n+1} + c(n+1)} |\mathcal{F}_{ n})\leqslant e^{Z_n - c + c(n+1)} = e^{Z_n + cn}. \end{aligned}$$

Thus,

$$\displaystyle \begin{aligned} \mathbb{E}(e^{Z_{S\wedge (n+1)} + c(S\wedge (n+1))}|\mathcal{F}_{n}) &\leqslant 1_{\{S\leqslant n\}} e^{Z_{S\wedge n} + c(S\wedge n)} + 1_{\{S> n\}}e^{Z_n + cn}\\ &= e^{Z_{S\wedge n} + c(S\wedge n)}.\end{aligned} $$

6 Spectral Gap for the Random Model

Our next goal is to show that the random reversible couple (K, π) admits a spectral gap. The arguments make use of the ideas and notation of Sect. 3. In particular, recall the event \(\mathcal {B} = \{p(x) < x \ \ \forall \ x\in \mathbb {N}_+\}\) from Lemma 1 and the random tree \(\mathcal {T}\) with edge set F from Lemma 3. The argument uses a version of Cheeger’s inequality for trees which is further developed in Appendix 1.

Proposition 1

On \(\mathcal {B}\) , there exists a random constant Λ > 0 such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \forall\ f\in L^2(\pi),\qquad \Lambda\pi[(f-\pi[f])^2]& \leqslant &\displaystyle \mathcal{E}(f)\end{array} \end{aligned} $$

where in the r.h.s. \(\mathcal {E}\) is the Dirichlet form defined by

$$\displaystyle \begin{aligned} \forall\ f\in L^2(\pi),\qquad \mathcal{E}(f) := \frac 12\sum_{x,y\,\in\,\mathbb{N}}(f(y)-f(x))^2\, \pi(x)K(x,y). \end{aligned}$$

Taking into account that for any f ∈ L 2(π), the variance π[(f − π[f])2] of f with respect to π is bounded above by π[(f − f(0))2], the previous result is an immediate consequence of the following existence of a positive first Dirichlet eigenvalue under \(\mathcal {B}\).

Proposition 2

On \(\mathcal {B}\) , there exists a random constant Λ > 0 such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \forall\ f\in L^2(\pi),\qquad \Lambda\pi[(f-f(0))^2]& \leqslant &\displaystyle \mathcal{E}(f).\end{array} \end{aligned} $$
(13)

The proof of Proposition 2 is based on the pruning of \(\mathcal {G}\) into \(\mathcal {T}\) and then resorting to Cheeger’s inequalities for trees. More precisely, let us introduce the following notations. Define the Markov kernel \(K_{\mathcal {T}}\) as

$$\displaystyle \begin{aligned} \forall\ x,y\in\mathbb{N},\qquad K_{\mathcal{T}}(x,y) := \begin{cases} K(x,y) &\text{ if } \{x,y\}\in F,\\ 1-\sum_{z\,:\,\{x,z\}\in F}K(x,z) &\text{ if } y=x,\\ 0 &\text{ otherwise.} \end{cases} \end{aligned}$$

Note that this kernel is reversible with respect to π. The corresponding Dirichlet form is given, for any f ∈ L 2(π), by

$$\displaystyle \begin{aligned} \mathcal{E}_{\mathcal{T}}(f) := \frac 12\sum_{x,y\,\in\,\mathbb{N}}(f(y)-f(x))^2\, \pi(x)K_{\mathcal{T}}(x,y). \end{aligned}$$

It will be convenient to work with \(\widetilde {\mathcal {E}} := Z\mathcal {E}_{\mathcal {T}}\), where Z is the normalizing constant of π, as in equation (2). Define a nonnegative measure μ on \(\mathbb {N}_+\) as

$$\displaystyle \begin{aligned} \forall\ x\in\mathbb{N}_+,\qquad \mu(x) := Q(x)Q(p(x)). \end{aligned} $$
(14)

Proposition 3

On \(\mathcal {B}\) , there exists λ > 0 such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \forall\ f\in L^2(\mu),\qquad \lambda\mu[(f-f(0))^2]& \leqslant &\displaystyle \widetilde{\mathcal{E}}(f).\end{array} \end{aligned} $$
(15)

This result immediately implies Proposition 2. Indeed, since \(p(x)=\min N(x)\), we have on one hand the inclusion \(N(x)\subset \{p(x),p(x)+1,\ldots \}\), and on the other hand, due to the nature of Q, we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \forall\ x\in\mathbb{N}_+,\qquad Q(p(x))\ \leqslant \ Q(N(x))\ \leqslant \ 2Q(p(x)).\end{array} \end{aligned} $$
(16)

Thus for any f ∈ L 2(μ),

$$\displaystyle \begin{aligned} \begin{array}{rcl} \pi[(f-f(0))^2]& =&\displaystyle \frac 1Z\sum_{x\in\mathbb{N}_+}(f(x)-f(0))^2\,Q(x)Q(N(x)) \leqslant \frac 2Z\,\mu[(f-f(0))^2]\\ & \leqslant &\displaystyle \frac 2{\lambda Z}\,\widetilde{\mathcal{E}}(f) = \frac 2{\lambda}\,\mathcal{E}_{\mathcal{T}}(f) \leqslant \frac 2{\lambda}\,\mathcal{E}(f), \end{array} \end{aligned} $$

and thus, Proposition 2 holds with \(\Lambda := \lambda /2\).

The proof of Proposition 3 is based on a Dirichlet-variant of the Cheeger inequality (which is in fact slightly simpler than the classical one, see Appendix 1). For any \(A\subset \mathbb {N}_+\), define \(\partial A := \{\{x,y\}\in F\,:\, x\in A,\, y\notin A\}\). Endow \(\overline E\) with the measure ν induced by, for any \( \{x,y\}\in \overline E\),

$$\displaystyle \begin{aligned} \nu(\{x,y\}) := Q(x)Q(y). \end{aligned}$$

Define the Dirichlet–Cheeger constant

$$\displaystyle \begin{aligned} \iota := \inf_{A\in\mathcal{A}}\frac{\nu(\partial A)}{\mu(A)}, \end{aligned}$$

where \(\mathcal {A}\) is the collection of nonempty subsets of \(\mathbb {N}_+\). The proof of the traditional Markovian Cheeger’s inequality given in the lectures by Saloff-Coste [23] implies directly that the best constant λ in Proposition 3 satisfies \( \lambda \geqslant {\iota ^2}/{2}\). Thus it remains to check:

Proposition 4

On \(\mathcal {B}\) , we have \(\iota \geqslant 1/2\) and in particular ι > 0.

Proof

Take any nonempty \(A\in \mathcal {A}\) and decompose it into its connected components with respect to \(\mathcal {T}\): \( A=\bigsqcup _{i\in \mathcal {I}} A_i\), where the index set \(\mathcal {I}\) is at most denumerable. Note that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mu(A)=\sum_{i\in\mathcal{I}} \mu(A_i), \ \ \nu(A)=\sum_{i\in\mathcal{I}}\nu(A_i),\end{array} \end{aligned} $$

where the second identity holds because there are no edges in F connecting two different A i’s. Thus, it follows that \( \iota = \inf _{A\in \widetilde {\mathcal {A}}}{\nu (\partial A)}/{\mu (A)}\), where \(\widetilde {\mathcal {A}}\) is the set of subsets of \(\mathcal {A}\) which are \(\mathcal {T}\)-connected.

Consider \(A\in \widetilde {\mathcal {A}}\); it has a smallest element \(a\in \mathbb {N}_+\) (since 0∉A). Let T a be the subtree of descendants of a in \(\mathcal {T}\) (i.e., the set of vertices from \(\mathbb {N}_+\) whose non-self-intersecting path to 0 passes through a). We have A ⊂ T a and \(\partial A\supset \{\{p(a),a\}\} = \partial T_a\), and it follows that \( {\nu (\partial A)}/{\mu (A)}\geqslant {\nu (\partial T_a)}/{\mu (T_a)}\). We deduce that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \iota& \geqslant &\displaystyle \inf_{a\in\mathbb{N}_+}\frac{\nu(\partial T_a)}{\mu(T_a)} = \inf_{a\in\mathbb{N}_+}\frac{Q(a)Q(p(a))}{\mu(T_a)}.\end{array} \end{aligned} $$

On \(\mathcal {B}\), we have for any \(a\in \mathbb {N}_+\), on the one hand

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \forall\ x\in T_a,\qquad p(x)& \geqslant &\displaystyle p(a),\end{array} \end{aligned} $$
(17)

and on the other hand

$$\displaystyle \begin{aligned} \begin{array}{rcl} \forall\ x\in T_a,\qquad x& \geqslant &\displaystyle a.\end{array} \end{aligned} $$
(18)

We get that

$$\displaystyle \begin{aligned} \mu(T_a) = \sum_{x\in T_a} Q(x)Q(p(x)) \leqslant Q(p(a))\sum_{x\geqslant a}Q(x) = 2Q(a)Q(p(a)) = 2\nu(\partial T_a). \end{aligned}$$

It follows that \( \iota \geqslant 1/2\). □

Lemma 4 can now be used to see that the ball Markov chain on the random graph has a.s. a spectral gap. Indeed, we deduce from Lemma 4 that there exists a (random) vertex \(x_0\in \mathbb {N}\) such that for any x > x 0, p(x) < x. Consider

$$\displaystyle \begin{aligned} x_1 := x_0\vee\max\{p(y)\,:\, y\in\mathbb{N}_+,\ y\leqslant x_0\}. \end{aligned}$$

It follows that for any a > x 1, we have, for all x ∈ T a, p(x) < x. (To see this, take any path a 0, a 1, … in T a, starting at a 0 = a, so that p(a i) = a i−1 for each i. Let k be the first index such that \(a_{k} \geqslant a_{k+1}\), assuming that there exists such a k. Then \(a_{k+1}\leqslant x_0\), and so \(a_k = p(a_{k+1})\leqslant x_1\). But this is impossible, since \(a_0\leqslant a_k\) and a 0 > x 1.) In particular, we see that (17) and (18) hold for a > x 1. As a consequence,

$$\displaystyle \begin{aligned} \begin{array}{rcl} \inf_{a> x_1}\frac{\nu(\partial T_a)}{\mu(T_a)}& \geqslant &\displaystyle \frac 12.\end{array} \end{aligned} $$

By the finiteness of \(\{a\in \mathbb {N}_+\,:\, a\leqslant x_1\}\), we also have \(\inf _{a\leqslant x_1}{\nu (\partial T_a)}/{\mu (T_a)}>0\). So, finally,

$$\displaystyle \begin{aligned} \begin{array}{rcl} \iota& =&\displaystyle \inf_{a\in\mathbb{N}_+}\frac{\nu(\partial T_a)}{\mu(T_a)}\ >\ 0,\end{array} \end{aligned} $$

which shows that G(, 1∕2) has a spectral gap a.s.

7 Spectral Gap for the Rado Graph

This section proves the needed spectral gap for the Rado graph. Here the graph has vertex set \(\mathbb {N}\) and an edge from i to j if i is less than j and the ith binary digit of j is a one. We treat carefully the case of a more general base measure, \(Q(x) = (1-\delta )\delta ^x\). As δ tends to 1, sampling from this Q is a better surrogate for “pick a neighboring vertex uniformly”. Since the normalization doesn’t enter, throughout take \(Q(x) = \delta ^x\). The heart of the argument is a discrete version of Hardy’s inequality for trees. This is developed below with full details in Appendix 2.

Consider the transition kernel K reversible with respect to π and associated to the measure Q given by \(Q(x) := \delta ^x\) for all \(x\in \mathbb {N}\), where δ ∈ (0, 1) (instead of δ = 1∕2 as in the introduction, up to the normalization). Recall that

$$\displaystyle \begin{aligned} \forall\ x\in\mathbb{N},\qquad \pi(x) = \frac{Q(x)Q(N(x))}{Z}, \end{aligned}$$

where N(x) is the set of neighbors of x induced by K and where Z > 0 is the normalizing constant. Here is the equivalent of Proposition 3:

Proposition 5

We have

$$\displaystyle \begin{aligned} \begin{array}{rcl} \lambda& \geqslant &\displaystyle \frac{1-\delta}{16(2\vee\lceil \log_2\log_2(2/\log_2(1/\delta))\rceil)}.\end{array} \end{aligned} $$

This bound will be proved via Hardy’s inequalities. If we resort to Dirichlet–Cheeger, we rather get

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \lambda& \geqslant &\displaystyle \frac{(1-\delta)^2}2.\end{array} \end{aligned} $$
(19)

To see the advantage of Proposition 5, let δ come closer and closer to 1, namely, approach the problematic case of “pick a neighbor uniformly at random”. In this situation, the r.h.s. of the bound of Proposition 5 is of order

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{1-\delta}{16\lceil \log_2\log_2(1/(1-\delta))\rceil}\end{array} \end{aligned} $$

which is better than (19) as δ goes to 1 −.

Here we present the method of Hardy’s inequalities used to get Proposition 5 above. Our goal is to show that K admits a positive first Dirichlet eigenvalue:

Proposition 6

There exists Λ > 0 depending on δ ∈ (0, 1) such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \forall\ f\in L^2(\pi),\qquad \Lambda\pi[(f-f(0))^2]& \leqslant &\displaystyle \frac 12\sum_{x,y\,\in\,\mathbb{N}}(f(y)-f(x))^2\, \pi(x)K(x,y).\end{array} \end{aligned} $$

It follows that the reversible couple (K, π) admits a spectral gap bounded below by Λ given above. Indeed, this is an immediate consequence of the fact that for any f ∈ L 2(π), the variance of f with respect to π is bounded above by π[(f − f(0))2].

The proof of Proposition 6 is based on a pruning of K and Hardy’s inequalities for trees. Consider the set of unoriented edges induced by K: \(E := \{\{x,y\}\,:\, x\sim y\}\) (in particular, E does not contain the self-edges or singletons). For any \(x\in \mathbb {N}_+\), let p(x) be the smallest bit equal to 1 in the binary expansion of x, i.e.,

$$\displaystyle \begin{aligned} p(x) := \min\{n\in\mathbb{N}\,:\, x_n=1\},\qquad \text{where } x=\sum_{n\in\mathbb{N}}x_n2^n \text{ with all } x_n\in\{0,1\}. \end{aligned}$$

Define the subset F of E by

$$\displaystyle \begin{aligned} F := \{\{p(x),x\}\,:\, x\in\mathbb{N}_+\} \end{aligned}$$

and the function ν on F via

$$\displaystyle \begin{aligned} \forall\ x\in\mathbb{N}_+,\qquad \nu(\{p(x),x\}) := Q(x)Q(p(x)). \end{aligned}$$

To any f ∈ L 2(π), associate the function (df)2 on F given by

$$\displaystyle \begin{aligned} \forall\ x\in\mathbb{N}_+,\qquad (df)^2(\{p(x),x\}) := (f(x)-f(p(x)))^2. \end{aligned}$$

Finally, consider the (non-negative) measure μ defined on \(\mathbb {N}_+\) via

$$\displaystyle \begin{aligned} \forall\ x\in\mathbb{N}_+,\qquad \mu(x) := Q(x)Q(p(x)). \end{aligned} $$
(20)

Then we have:

Proposition 7

There exists λ > 0 depending on δ ∈ (0, 1) such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \forall\ f\in L^2(\mu),\qquad \lambda\mu[(f-f(0))^2]& \leqslant &\displaystyle \sum_{e\in F}(df)^2(e) \nu(e).\end{array} \end{aligned} $$

This result implies Proposition 6. Indeed, note that by the definition of Q,

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \forall\ x\in\mathbb{N}_+,\qquad Q(p(x))\ \leqslant \ Q(N(x))\ \leqslant \ \frac 1{1-\delta}Q(p(x)).\end{array} \end{aligned} $$
(21)

Thus, for any f ∈ L 2(μ),

$$\displaystyle \begin{aligned} \begin{array}{rcl} \lambda\pi[(f-f(0))^2]& = &\displaystyle \frac{\lambda }Z\sum_{x\in\mathbb{N}_+} (f(x)-f(0))^2 Q(x)Q(N(x))\\ & \leqslant &\displaystyle \frac{\lambda }{(1-\delta)Z}\sum_{x\in\mathbb{N}_+} (f(x)-f(0))^2 Q(x)Q(p(x))\\ & =&\displaystyle \frac{\lambda }{(1-\delta)Z}\mu[(f-f(0))^2] \leqslant \frac 1{(1-\delta)Z}\sum_{e\in F}(df)^2(e) \nu(e)\\ & \leqslant &\displaystyle \frac 1{2(1-\delta)}\sum_{x,y\,\in\,\mathbb{N}}(f(y)-f(x))^2\, \pi(x)K(x,y)\end{array} \end{aligned} $$

namely Proposition 6 holds with \(\Lambda := (1-\delta )\lambda \).

Note that \(\mathbb {N}\) endowed with the set of non-oriented edges F has the structure of a tree. We interpret 0 as its root, so that for any \(x\in \mathbb {N}_+\), p(x) is the parent of x. Note that for any \(x\in \mathbb {N}\), the children of x are exactly the numbers \(y2^x\), where y is an odd number. We will denote h(x) the height of x with respect to the root 0 (thus, the odd numbers are exactly the elements of \(\mathbb {N}\) whose height is equal to 1).
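Concretely, p(x) is the position of the lowest set bit of x, and the children of x are the odd multiples of \(2^x\); a short sketch (helper names ours):

```python
def parent(x: int) -> int:
    """p(x): position of the lowest set bit of x (the tree parent of x >= 1)."""
    return (x & -x).bit_length() - 1

def children(x: int, count: int = 5) -> list:
    """The first `count` children of x in the tree F: the numbers y * 2^x
    with y odd."""
    return [(2 * m + 1) << x for m in range(count)]

assert parent(12) == 2                            # 12 = 1100 in binary
assert all(parent(c) == 3 for c in children(3))   # children of 3: 8, 24, 40, ...
assert parent(7) == 0                             # odd numbers have height 1
```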

According to [21] (see also Evans, Harris and Pick [10]), the best constant λ in Proposition 7, say λ 0, can be estimated up to a factor 16 via Hardy’s inequalities for trees, see (23) below. To describe them we need several notations.

Let \(\mathcal {T}\) the set of subsets T of \(\mathbb {N}_+\) satisfying the following conditions

  • T is non-empty and connected (with respect to F),

  • T does not contain 0,

  • there exists \(M\geqslant 1\) such that \(h(x)\leqslant M\) for all x ∈ T,

  • if x ∈ T has a child in T, then all children of x belong to T.

Note that any \(T\in \mathcal {T}\) admits a closest element to 0, call it m(T). Note that m(T) ≠ 0. When T is not reduced to the singleton {m(T)}, then T ∖{m(T)} has a denumerable infinity of connected components which are indexed by the children of m(T). Since these children are exactly the \(y2^{m(T)}\), where \(y\in \mathcal {I}\), the set of odd numbers, call \(T_{y2^{m(T)}}\) the connected component of T ∖{m(T)} associated to \(y2^{m(T)}\). We extend ν as a functional on \(\mathcal {T}\), via the iteration

  • when T is the singleton {m(T)}, we take \(\nu (T) := \nu (\{p(m(T)),m(T)\})\),

  • when T is not a singleton, decompose T as \(\{m(T)\}\sqcup \bigsqcup _{y\in \mathcal {I}}T_{y2^{m(T)}}\), then ν is defined as

    $$\displaystyle \begin{aligned} \begin{array}{rcl} {} \frac 1{\nu(T)}& =&\displaystyle \frac 1{\nu(\{m(T)\})}+\frac 1{\sum_{y\in\mathcal{I}} \nu(T_{y2^{m(T)}})}.\end{array} \end{aligned} $$
    (22)

For \(x\in \mathbb {N}_+\), let S x be the set of vertices \(y\in \mathbb {N}_+\) whose path to 0 passes through x. For any \(T\in \mathcal {T}\) we associate the subset

$$\displaystyle \begin{aligned} T^* := \bigsqcup_{x\in L(T)} S_x, \end{aligned}$$

where L(T) is the set of leaves of T, namely the x ∈ T having no children in T. Equivalently, T ∗ is the set of all descendants of the leaves of T, themselves included.

Consider \(\mathcal {S}\subset \mathcal {T}\) the set of \(T\in \mathcal {T}\) which are such that m(T) is an odd number. Finally, define

$$\displaystyle \begin{aligned} A := \sup_{T\in\mathcal{S}}\frac{\mu(T^*)}{\nu(T)}. \end{aligned}$$

We are interested in this quantity because of the Hardy inequalities:
We are interested in this quantity because of the Hardy inequalities:

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} A\ \leqslant \ \frac 1{\lambda_0}\ \leqslant\ 16A,\end{array} \end{aligned} $$
(23)

where recall that λ 0 is the best constant in Proposition 7. (In [21], only finite trees were considered; the extension to infinite trees is given in Appendix 2.) So, to prove Proposition 7, it is sufficient to show that A is finite. To investigate A, we need some further definitions. For any \(x\in \mathbb {N}_+\), let

$$\displaystyle \begin{aligned} b(x) := \frac{Q(2^{x})}{Q(p(x))}. \end{aligned}$$

A finite path from 0 in the direction to infinity is a finite sequence \(z=(z_n)_{0\leqslant n\leqslant N}\) of elements of \(\mathbb {N}\) such that z 0 = 0 and p(z n) = z n−1 for any \(n\in \{1,\ldots ,N\}\). On such a path z, we define the quantity

$$\displaystyle \begin{aligned} B(z) := \sum_{n=1}^N b(z_n). \end{aligned}$$

The following technical result is crucial for our purpose of showing that A is finite.

Lemma 6

For any finite path from 0 in the direction to infinity \(z=(z_n)_{0\leqslant n\leqslant N}\), we have \(B(z)\leqslant C\) , where \(C := \sum _{n\in \mathbb {N}_+}\delta ^{2^{2^{n-1}}-(n-1)}\).

Proof

Note that for any \(n\in \{0,\ldots ,N\}\), h(z n) = n. Furthermore, for any \(x\in \mathbb {N}_+\), we have \(h(x)\leqslant x\) and we get h(p(z n)) = h(z n) − 1 = n − 1, so that \(p(z_n)\geqslant n-1\). Writing \(z_n=y_n2^{p(z_n)}\), for some odd number y n, it follows that

$$\displaystyle \begin{aligned} \begin{array}{rcl} b(z_n)& =&\displaystyle \frac{Q(2^{y_n2^{p(z_n)}})}{Q(p(z_n))} ={\delta^{2^{y_n2^{p(z_n)}}-p(z_n)}} \leqslant {\delta^{2^{2^{p(z_n)}}-p(z_n)}} \leqslant {\delta^{2^{2^{n-1}}-(n-1)}},\end{array} \end{aligned} $$

where the last step uses that \(t\mapsto 2^{2^t}-t\) is increasing and \(p(z_n)\geqslant n-1\).

The desired result follows at once. □

We need two ingredients about the ratios μ(T ∗)∕ν(T). Here is the first one.

Lemma 7

For any \(T\in \mathcal {T}\) which is a singleton, we have \( \frac {\mu (T^*)}{\nu (T)}\leqslant \frac {1}{1-\delta } \).

Proof

When T is the singleton {m(T)}, on the one hand we have

$$\displaystyle \begin{aligned} \nu(T)=\nu(\{p(m(T)),m(T)\})=\mu(m(T)). \end{aligned}$$

On the other hand, T ∗ is the subtree growing from m(T), namely the subtree containing all the descendants of m(T). Note two properties of T ∗:

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} T^*\subset\{y\in\mathbb{N}_+\,:\, y\geqslant m(T)\}& \mbox{ and }&\displaystyle \forall\ y\in T^*,\, p(y)\geqslant p(m(T)),\end{array} \end{aligned} $$
(24)

and we further have \(p(y)\geqslant m(T)\) for any y ∈ T ∗∖{m(T)}. It follows that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mu(T^*) & =&\displaystyle \!\!\!\sum_{y\in T^*}\!\!\! Q(y)Q(p(y)) \leqslant Q(p(m(T)))\!\!\!\sum_{y\geqslant m(T)}\!\!\! Q(y)=Q(p(m(T)))\!\!\!\sum_{y\geqslant m(T)}\!\!\! \delta^{y}\\ & =&\displaystyle Q(p(m(T))) \frac{Q(m(T))}{1-\delta} =\frac 1{1-\delta}\mu(m(T)).{}\end{array} \end{aligned} $$
(25)

Thus, we get \( {\mu (T^*)}/{\nu (T)}\leqslant \frac 1{1-\delta }\). □

For the second ingredient, we need some further definitions. The length ℓ(T) of \(T\in \mathcal {T}\) is given by \(\ell (T) := \max \{h(x)\,:\, x\in T\} - h(m(T))\), and for any \(l\in \mathbb {N}\), we define

$$\displaystyle \begin{aligned} \mathcal{T}_l := \{T\in\mathcal{T}\,:\, \ell(T)\leqslant l\}. \end{aligned}$$
Lemma 8

For any \(l\in \mathbb {N}\) , we have \( \sup _{T\in \mathcal {T}_l}\frac {\mu (T^*)}{\nu (T)}<+\infty \).

Proof

We will prove the finiteness by induction over \(l\in \mathbb {N}\). First, note that \(\mathcal {T}_0\) is the set of singletons, and so Lemma 7 implies that \( \sup _{T\in \mathcal {T}_0}\frac {\mu (T^*)}{\nu (T)}\leqslant \frac 1{1-\delta }\). Next, assume that the supremum is finite for some \(l\in \mathbb {N}\) and let us show that it is also finite for l + 1.

Consider \(T\in \mathcal {T}_{l+1}\), with ℓ(T) = l + 1; in particular, T is not a singleton. Decompose T as \(\{m(T)\}\sqcup \bigsqcup _{y\in \mathcal {I}}T_{y2^{m(T)}}\) and recall the relation (22). Since \( T^*=\bigsqcup _{y\in \mathcal {I}}T_{y2^{m(T)}}^*\), it follows that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{\mu(T^*)}{\nu(T)}& =&\displaystyle \sum_{y\in \mathcal{I}}\mu(T_{y2^{m(T)}}^*)\left(\frac 1{\nu(\{m(T)\})}+\frac 1{\sum_{y\in\mathcal{I}} \nu(T_{y2^{m(T)}})}\right)\\ & =&\displaystyle \frac{\sum_{y\in \mathcal{I}}\mu(T_{y2^{m(T)}}^*)}{\nu(\{m(T)\})}+\frac{\sum_{y\in \mathcal{I}}\mu(T_{y2^{m(T)}}^*)}{\sum_{y\in\mathcal{I}} \nu(T_{y2^{m(T)}})}\\ & \leqslant &\displaystyle \frac{\mu(\sqcup_{y\in \mathcal{I}}T_{y2^{m(T)}}^*)}{\mu(m(T))}+\sup\left\{\frac{\mu(T_{y2^{m(T)}}^*)}{\nu(T_{y2^{m(T)}})}\,:\, y\in\mathcal{I}\right\}.{} \end{array} \end{aligned} $$
(26)

Consider the first term on the right. Given \(y\in \mathcal {I}\), the smallest possible element of \(T_{y2^{m(T)}}^*\) is \(y2^{m(T)}\), and we have for any \(x\in T_{y2^{m(T)}}^*\),

$$\displaystyle \begin{aligned} p(x)\geqslant p(y2^{m(T)})=m(T). \end{aligned}$$

Thus we have the equivalent of (24):

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \bigsqcup_{y\in \mathcal{I}}T_{y2^{m(T)}}^*\subset\{y\in\mathbb{N}_+\,:\, y\geqslant 2^{m(T)}\},\ \forall\ x\in \bigsqcup_{y\in \mathcal{I}}T_{y2^{m(T)}}^*,\, p(x)\geqslant m(T).\quad \end{array} \end{aligned} $$
(27)

Following the computation (25), we get

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mu\left(\bigsqcup_{y\in \mathcal{I}}T_{y2^{m(T)}}^*\right) & < &\displaystyle \frac 1{1-\delta}Q(m(T))Q(2^{m(T)}), \end{array} \end{aligned} $$

where the inequality is strict, because in (27) we cannot have equality for all \(x\in \bigsqcup _{y\in \mathcal {I}}T_{y2^{m(T)}}^*\). It follows that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \frac{\sum_{y\in \mathcal{I}}\mu(T_{y2^{m(T)}}^*)}{\mu(m(T))}< \frac 1{1-\delta}\frac{Q(m(T)) Q(2^{m(T)})}{Q(m(T))Q(p(m(T)))} = \frac{b(m(T))}{1-\delta} \leqslant \frac{C}{1-\delta}\quad \;\; \end{array} \end{aligned} $$
(28)

where C is the constant introduced in Lemma 6. Since for any \(y\in \mathcal {I}\), we have \(T_{y2^{m(T)}}\in \mathcal {T}_l\), we deduce the desired result from the induction hypothesis. □

We are now ready to prove Proposition 7.

Proof (Of Proposition 7)

Fix some \(T\in \mathcal {S}\); we are going to show that \({\mu (T^*)}/{\nu (T)}\leqslant (1+C)/{(1-\delta )}\), where C is the constant introduced in Lemma 6. Due to Lemma 7, this bound is clear if T is a singleton. When T is not the singleton {m(T)}, decompose T as \(\{m(T)\}\sqcup \bigsqcup _{y\in \mathcal {I}}T_{y2^{m(T)}}\) and let us come back to (26). Denote \(z_1 := m(T)\) and

$$\displaystyle \begin{aligned} \epsilon := \frac{b(z_1)}{1-\delta}-\frac{\sum_{y\in \mathcal{I}}\mu(T_{y2^{m(T)}}^*)}{\mu(m(T))}, \end{aligned}$$

which is positive according to (28). Coming back to (26), we have shown

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{\mu(T^*)}{\nu(T)}& \leqslant &\displaystyle \frac{b(z_1)}{1-\delta}+\frac{\mu(T^*_{z_2})}{\nu(T_{z_2})}\end{array} \end{aligned} $$

where \(z_2\in \{y2^{m(T)}\,:\, y\in \mathcal {I}\}\) is such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \sup\left\{\frac{\mu(T_{y2^{m(T)}}^*)}{\nu(T_{y2^{m(T)}})}\,:\, y\in\mathcal{I}\right\}\leqslant \frac{\mu(T^*_{z_2})}{\nu(T_{z_2})}+\epsilon.\end{array} \end{aligned} $$

To get the existence of z 2, we used that the supremum is finite, as ensured by Lemma 8.

By iterating this procedure, define a finite path from 0 in the direction to infinity \(z=(z_n)_{0\leqslant n\leqslant N}\), such that for any \(n\in \{1,\ldots ,N-1\}\),

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{\mu(T^*_{z_n})}{\nu(T_{z_n})}& \leqslant &\displaystyle \frac{b(z_n)}{1-\delta}+\frac{\mu(T^*_{z_{n+1}})}{\nu(T_{z_{n+1}})} \end{array} \end{aligned} $$

and \(T_{z_N}\) is a singleton. We have \(N\leqslant \max \{h(x)\,:\, x\in T\}\). We deduce that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{\mu(T^*)}{\nu(T)}& \leqslant &\displaystyle \frac{B(z)}{1-\delta}+\frac{\mu(T^*_{z_N})}{\nu(T_{z_N})} \leqslant \frac{C+1}{1-\delta},\end{array} \end{aligned} $$

as desired. □

To get an explicit bound in terms of δ, it remains to investigate the quantity C.

Lemma 9

We have

$$\displaystyle \begin{aligned} \begin{array}{rcl} C& \leqslant &\displaystyle \left\{\begin{array}{ll} 2&\displaystyle \mathit{\mbox{if }}\delta\in (0,1/\sqrt{2}],\\ 1+\left\lceil\log_2\log_2\left(\frac 2{\log_2(1/\delta)}\right)\right\rceil& \mathit{\mbox{if }}\delta\in (1/\sqrt{2},1). \end{array}\right.\end{array} \end{aligned} $$

Proof

Consider \(l_0 := \min \{l\in \mathbb {N}_+\,:\, 2^{2^l}-l\geqslant 1/\log _2(1/\delta )\}\). Elementary computations show that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \forall\ l\geqslant 1,\qquad 2^{2^{l+1}}-l-1\geqslant 2(2^{2^l}-l),\end{array} \end{aligned} $$

so we get

$$\displaystyle \begin{aligned} \begin{array}{rcl} \sum_{l\geqslant l_0}\delta^{2^{2^l}-l}& \leqslant &\displaystyle \sum_{n\geqslant 0}\frac 1{2^{2^n}} \leqslant \sum_{n\geqslant 1}\frac 1{2^{n}} =1.\end{array} \end{aligned} $$

Since we have for any \( l\in \mathbb {N}\), \( 2^{2^l}-l\geqslant 0\), we deduce

$$\displaystyle \begin{aligned} \begin{array}{rcl} C=\sum_{l\in\mathbb{N}}\delta^{2^{2^l}-l}& \leqslant &\displaystyle l_0+\sum_{l\geqslant l_0}\delta^{2^{2^l}-l}\ \leqslant\ l_0+1.\end{array} \end{aligned} $$

It is not difficult to check that for any \( l\geqslant 1\), \(2^{2^l}-l\geqslant \frac 12 2^{2^l}\), so that

$$\displaystyle \begin{aligned} \begin{array}{rcl} l_0& =&\displaystyle \min\{l\in\mathbb{N}_+\,:\, 2^{2^l}-l\geqslant 1/\log_2(1/\delta)\} \leqslant \min\{l\in\mathbb{N}_+\,:\, 2^{2^l}\geqslant 2/\log_2(1/\delta)\}\\ & =&\displaystyle 1\vee\lceil \log_2\log_2(2/\log_2(1/\delta))\rceil. \end{array} \end{aligned} $$

The announced result follows from the fact

$$\displaystyle \begin{aligned} \begin{array}{rcl} \log_2\log_2(2/\log_2(1/\delta))\geqslant 1 & \Leftrightarrow&\displaystyle \delta\geqslant \frac 1{\sqrt{2}}.\end{array} \end{aligned} $$
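Numerically, the constant is tiny for small δ and grows only doubly logarithmically as δ ↑ 1. The sketch below uses the reconstruction \(C = \sum_{l\in\mathbb{N}}\delta^{2^{2^l}-l}\) adopted above (an assumption of this presentation) and compares it with the bound of Lemma 9:

```python
import math

def C(delta: float, terms: int = 8) -> float:
    """C = sum over l >= 0 of delta^(2^(2^l) - l); the exponents grow so
    fast that a few terms are already exact to double precision."""
    return sum(delta ** (2 ** (2 ** l) - l) for l in range(terms))

def lemma9_bound(delta: float) -> float:
    if delta <= 2 ** -0.5:
        return 2.0
    return 1 + math.ceil(math.log2(math.log2(2 / math.log2(1 / delta))))

for d in (0.3, 0.5, 0.7, 0.9, 0.99, 0.9999):
    print(d, round(C(d), 6), lemma9_bound(d))
```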

The following observations show that Q needs to decay at least exponentially for the Hardy inequality approach to work.

Remark 2

  (a)

    In view of the expression of π, it is natural to try to replace (20) by

    $$\displaystyle \begin{aligned} \forall\ x\in\mathbb{N}_+,\qquad \mu(x) := Q(x)Q(N(x)). \end{aligned}$$

    But then in Lemma 7, where we want the ratios μ(T ∗)∕ν(T) to be bounded above for singletons T, we end up with the fact that

    $$\displaystyle \begin{aligned} \frac{Q(N(m(T)))}{Q(p(m(T)))}=\frac{\mu(T)}{\nu(T)}\leqslant \frac{\mu(T^*)}{\nu(T)} \end{aligned}$$

    must be bounded above for singletons T. Namely an extension of (21) must hold: there exists a constant c > 0 such that

    $$\displaystyle \begin{aligned} \begin{array}{rcl} {} \forall\ x\in\mathbb{N}_+,\qquad Q(N(x))& \leqslant &\displaystyle c Q(p(x)).\end{array} \end{aligned} $$
    (29)

    Writing \(x = y2^p\), with \(y\in \mathcal {I}\) and \(p\in \mathbb {N}\), we must have \( Q(N(y2^p))\leqslant c Q(p)\). Take \(y = 1 + 2 + 4 + \cdots + 2^l\); then p, p + 1, …, p + l all belong to \(N(y2^p)\), so that \( Q(\{p, p+1, \ldots , p+l\})\leqslant c Q(p)\), and letting l go to infinity, it follows that \(Q(\{p,p+1,\ldots \})\leqslant cQ(p)\), namely, Q has exponential tails.

  (b)

    Other subtrees of the graph generated by K could have been considered. It amounts to choosing the parent of any \(x\in \mathbb {N}_+\). But among all possible choices of such a neighbor, the one with the most weight is p(x), at least if Q is decreasing. In view of the requirement (29), it looks like the best possible choice.

  (c)

    If one is only interested in Proposition 7 with μ defined by (20), then many more probability measures Q can be considered, in particular any polynomial probability of the form \(Q(x) := \frac 1{\zeta (l)(1+x)^{l}}\), for any \(x\in \mathbb {N}\), where ζ is the Riemann zeta function and l > 1.