1 Introduction

It is a classical fact [9, 11] that a uniformly chosen tree from the set of \(n^{n-2}\) trees on n vertices, viewed from an independently chosen uniform random vertex, is locally distributed, when n is large, as a \(\mathsf {Poisson}(1)\) Galton–Watson branching process conditioned to live forever. One can use this result to find the distribution of certain “local” structures. For instance, it follows that the degree distribution of a uniformly chosen vertex of a uniformly chosen tree on n vertices converges to the law of a \(\mathsf {Poisson}(1)+1\) random variable.
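The Prüfer-code computation behind this degree law can be made concrete. The following minimal sketch (our illustration, assuming the standard bijection between labelled trees on n vertices and Prüfer sequences of length \(n-2\); the function name is ours) tabulates the exact distribution of the degree of a fixed vertex, which is \(1+\mathsf {Bin}(n-2,\frac{1}{n})\) and hence converges to the \(\mathsf {Poisson}(1)+1\) law.

```python
from math import comb, exp, factorial

def degree_dist_uniform_tree(n):
    # Via Prüfer sequences: a fixed vertex of a uniform labelled tree on
    # n vertices appears deg - 1 times in the Prüfer sequence, whose
    # n - 2 entries are i.i.d. uniform on the n labels, so
    # deg = 1 + Bin(n - 2, 1/n).
    return {k: comb(n - 2, k - 1) * (n - 1) ** (n - 1 - k) / n ** (n - 2)
            for k in range(1, n)}

# exact for n = 4: the 16 labelled trees split 9 / 6 / 1 by the degree
# of vertex 1; as n grows, P(deg = k) -> e^{-1}/(k-1)!
print(degree_dist_uniform_tree(4))
```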

A uniformly chosen tree on n vertices is a uniform spanning tree (UST) of the complete graph on n vertices. Our goal in this paper is to explicitly describe the local structure of the UST of any dense graph or, equivalently, of a sequence of dense graphs converging to a given graphon. Let us first present this result.

Given a connected graph G we write \(\mathcal {T}\) for a uniformly drawn spanning tree of G and \(B_\mathcal {T}(v,r)\) for the graph-distance ball of radius r in \(\mathcal {T}\) around the vertex \(v\in V(G)\). Our goal is to describe the asymptotic distribution of \(B_\mathcal {T}(X,r)\), viewed up to graph isomorphism, where X is a uniformly chosen random vertex of G. To that aim, let \(\varOmega =[0,1]\) and \(\mu \) be the Lebesgue measure on \(\varOmega \). For a given graphon \(W:\varOmega ^2\rightarrow [0,1]\) (see Sect. 1.1 for a brief introduction to graphons) and \(\omega \in \varOmega \) we write \(\deg (\omega )=\int _{y \in \varOmega } W(\omega ,y)\mathrm {d}y\), and call this number the degree of \(\omega \). A graphon is called nondegenerate if for almost every \(\omega \in \varOmega \) we have \(\deg (\omega )>0\).

Let T be a fixed rooted tree with \(\ell \geqslant 2\) vertices and of height \(r\geqslant 1\). We denote by \(\mathrm {Stab}_T\) the set of graph automorphisms of T that preserve the root. In what follows we label the vertices of T by the numbers \(\{1,\ldots ,\ell \}\) so that the vertices \(p, \ldots , \ell \) are the vertices at height r of T, where \(p\in \{2,\ldots ,\ell \}\). Given a nondegenerate graphon W and a tree T as above we define

$$\begin{aligned} \mathrm {Freq}(T;W):= {1 \over |\mathrm {Stab}_T|} \int \limits _{\omega _1, \ldots , \omega _\ell } \exp \left( -\sum _{j=1}^{p-1} b_W(\omega _j) \right) \frac{\sum _{j=p}^\ell \deg (\omega _j) }{\prod _{j=1}^\ell \deg (\omega _j)} \prod _{(i,j) \in E(T)} W(\omega _i, \omega _j) \, \mathrm {d}\omega _1 \cdots \mathrm {d}\omega _\ell \, , \end{aligned}$$
(1)

where

$$\begin{aligned} b_W(\omega ) = \int _{y\in \varOmega } {W(\omega ,y) \over \deg (y)} \mathrm {d}\mu . \end{aligned}$$
(2)
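As a sanity check of (1), consider the constant graphon \(W\equiv 1\), the limit of the complete graphs \(K_n\):

```latex
% For W \equiv 1 we have \deg(\omega) = 1 and b_W(\omega) = 1 for every
% \omega, so the integrand in (1) is constant and
\mathrm{Freq}(T;1) = \frac{(\ell - p + 1)\, e^{-(p-1)}}{|\mathrm{Stab}_T|}.
% For T the star with k leaves (r = 1, \ell = k + 1, p = 2,
% |\mathrm{Stab}_T| = k!) this gives
\mathrm{Freq}(T;1) = \frac{k\, e^{-1}}{k!} = \frac{e^{-1}}{(k-1)!},
% which is exactly P(1 + \mathsf{Poisson}(1) = k), matching the degree
% distribution of a uniform vertex of a uniform tree mentioned above.
```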

Theorem 1.1

Let T be a fixed rooted tree as above, and \(W:\varOmega ^2 \rightarrow [0,1]\) be a nondegenerate graphon. Then for any \(\varepsilon >0\) there exists \(\xi =\xi (\varepsilon ,W,T)>0\) such that if G is a connected simple graph on at least \(\xi ^{-1}\) vertices that is \(\xi \)-close to W in the cut-distance, then with probability at least \(1-\varepsilon \) a uniformly chosen spanning tree \(\mathcal {T}\) of G satisfies

$$\begin{aligned} \big | \mathbb {P}( B_\mathcal {T}(X,r) \cong T ) - \mathrm {Freq}(T;W) \big | \leqslant \varepsilon \, , \end{aligned}$$

where X is an independently and uniformly chosen vertex of G, and by \(B_\mathcal {T}(X,r) \cong T\) we mean that between the two rooted trees there is a graph-isomorphism preserving the root.

Theorem 1.1 is a natural statement in the context of limits of graph sequences. It asserts that if \(\{G_n\}\) is a sequence of connected graphs converging to a graphon W, then the UST of \(G_n\) converges locally to a certain multi-type branching process that is defined in terms of W. This interpretation involves two graph limit procedures: a dense graph limit [7, 16] to describe the limit of the dense graph sequence \(G_n\), and a sparse graph limit to describe the random limit of the UST of \(G_n\), known as Benjamini–Schramm convergence [4]. We refer the reader to Sects. 1.1 and 1.2 for an introduction to these limiting procedures, and begin by describing the limiting branching process.

Definition 1.2

Given a nondegenerate graphon \(W:\varOmega ^2\rightarrow [0,1]\) we define a multi-type branching process \(\kappa _W\). The process has continuum many types that are either \((\mathsf {anc},\omega )\) or \((\mathsf {oth},\omega )\), where \(\omega \in \varOmega \). Here, “\(\mathsf {anc}\)” stands for “ancestral” and “\(\mathsf {oth}\)” stands for “other”.

  (1)

    The initial particle (i.e., the root) has type \((\mathsf {anc},\omega )\), where the distribution of \(\omega \) is \(\mu \).

  (2)

    If a particle has type \((\mathsf {oth},\omega )\), then its progeny is \(\{(\mathsf {oth},\omega _1),\ldots ,(\mathsf {oth},\omega _k)\}\), where \(\{\omega _1,\ldots ,\omega _k\}\) are the points of a Poisson point process on \(\varOmega \) with intensity \(\frac{W(\omega ,\omega ')}{\deg (\omega ')}\) at \(\omega '\in \varOmega \). In particular, k has distribution \(\mathsf {Poisson}\left( b_W(\omega )\right) \), where \(b_W(\cdot )\) is defined in (2).

  (3)

    If a particle has type \((\mathsf {anc},\omega )\), then its progeny is \(\{(\mathsf {anc},\omega _0),(\mathsf {oth},\omega _1),\ldots ,(\mathsf {oth},\omega _k)\}\), where \(\{\omega _1,\ldots ,\omega _k\}\) are the points of a Poisson point process on \(\varOmega \) with the same intensity as above, and \(\omega _0\) is an independent new point of \(\varOmega \) distributed according to the probability measure on \(\varOmega \) with density \({W(\omega ,\cdot ) \over \deg (\omega )}\).

We note that an ancestral particle (i.e., of type \(\mathsf {anc}\)) always has at least one child, and since the initial particle is ancestral, the process \(\kappa _W\) survives forever with probability 1. The following theorem is quickly deduced from Theorem 1.1 in Sect. 4.2.
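When W is a step graphon on k blocks of equal measure, \(\kappa _W\) can be sampled directly: only the block of a type matters, the offspring intensity integrates to \(W(i,j)/(k\deg (j))\) on block j, and the ancestral child lands in block j with probability proportional to \(W(i,j)\). The following is a minimal sketch (our illustration; the helper names `sample_kappa` and `poisson` are ours, and equal-measure blocks are assumed).

```python
import math
import random

def poisson(lam, rng):
    # Knuth's method; fine for the small intensities that occur here
    L, p, n = math.exp(-lam), 1.0, 0
    while True:
        p *= rng.random()
        if p <= L:
            return n
        n += 1

def sample_kappa(W, r, rng, ancestral=True, block=None):
    """Sample kappa_W down to depth r for a step graphon given as a
    k x k symmetric matrix of densities on k equal-measure blocks.
    A node is a triple (kind, block, children)."""
    k = len(W)
    deg = [sum(row) / k for row in W]        # deg(omega) on block i
    if block is None:                        # root block ~ mu (uniform)
        block = rng.randrange(k)
    children = []
    if r > 0:
        if ancestral:
            # the single ancestral child: block j w.p. W[i][j]/(k deg(i))
            j = rng.choices(range(k), weights=W[block])[0]
            children.append(sample_kappa(W, r - 1, rng, True, j))
        for j in range(k):
            # Poisson many 'oth' children with mean W[i][j]/(k deg(j))
            for _ in range(poisson(W[block][j] / (k * deg[j]), rng)):
                children.append(sample_kappa(W, r - 1, rng, False, j))
    return ('anc' if ancestral else 'oth', block, children)
```

For \(W\equiv 1\) (one block) the `oth` offspring counts are \(\mathsf {Poisson}(1)\) and one recovers the conditioned Galton–Watson process of the introduction.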

Theorem 1.3

Suppose that \(\{G_n\}\) is a sequence of simple connected graphs of growing orders that converges to a nondegenerate graphon W in the cut-distance, and let \(T_n\) be a UST of \(G_n\). Then the sequence \(\{T_n\}\) almost surely converges in the Benjamini–Schramm sense to \(\kappa _W\).

Since the \(L^1\)-norm (corresponding to the edit distance in graph theory) is finer than the cut-norm, we in particular obtain the following. Suppose that G is a large connected simple n-vertex graph with minimum degree \(\mathsf {\Omega }(n)\). Suppose that we add and/or delete \(\mathsf {o}(n^2)\) edges in a way that the resulting graph \(G'\) stays connected. Then the structure of a typical UST of G and of \(G'\) is very similar in the Benjamini–Schramm sense. Even this statement seems to be new.

When the \(G_n\) form a sequence of dense regular graphs, the function \(b_W:\varOmega \rightarrow (0,\infty )\) defined in (2) is identically 1 and we obtain the following.

Corollary 1.4

Suppose that \(\{G_n\}\) is a sequence of simple connected regular graphs of growing orders that converges to a nondegenerate graphon W in the cut-distance, and let \(T_n\) be a UST of \(G_n\). Then the sequence \(\{T_n\}\) almost surely converges in the Benjamini–Schramm sense to a \(\mathsf {Poisson}(1)\) Galton–Watson branching process conditioned to live forever.

Theorem 1.3 allows us to deduce several extremal properties of the UST on dense graphs. The number of vertices of degree \(k\geqslant 1\) in the UST of the complete graph \(K_n\) is \((e^{-1}/(k-1)!+\mathsf {o}(1))n\). In fact, using Prüfer codes one can establish that the degree of a fixed vertex in the UST of \(K_n\) has distribution \(1+\mathsf {Bin}(n-2,\frac{1}{n})\approx 1+\mathsf {Poisson}(1)\). Using Theorem 1.1 we are able to find the extremal values of the number of vertices of degree k in a general dense graph. In the following theorem we show that the complete graph (or any other regular dense graph) is the minimizer of the number of leaves and the maximizer of the number of vertices of degrees 2 and 3 among the class of dense graphs. Somewhat surprisingly, the maximizer for the number of vertices of degree \(k\geqslant 4\) is a different dense graph (see Sect. 5).

Theorem 1.5

For any \(k \geqslant 1\) we denote by \(L_k\) the random variable counting the number of vertices of degree k in a UST of a simple connected graph G. For every \(\varepsilon ,\delta >0\) there exist numbers \(n_0\in \mathbb {N}\) and \(\gamma >0\) such that the following holds: Whenever G is a graph on \(n\geqslant n_0\) vertices with at least \((1-\gamma )n\) vertices of degree at least \(\delta n\), then

$$\begin{aligned}&\mathbb {P}\big ( L_1 \leqslant (e^{-1}-\varepsilon )n \big ) \leqslant \varepsilon \, , \end{aligned}$$
(3)
$$\begin{aligned}&\mathbb {P}\big ( L_2 \geqslant (e^{-1}+\varepsilon )n \big ) \leqslant \varepsilon \, , \end{aligned}$$
(4)

and for any \(k \geqslant 3\) we have

$$\begin{aligned} \mathbb {P}\left( L_k \geqslant \left( {1 \over (k-1)!}\tfrac{(k-2)^{k-2}}{e^{k-2}}+\varepsilon \right) n\right) \leqslant \varepsilon . \end{aligned}$$
(5)

We derive Theorem 1.5 in Sect. 5. The rest of this section is organized as follows. We first give the formal definitions and background of dense and sparse graph limits necessary for the reader to parse the statements of Theorems 1.1 and 1.3. Next we discuss some aspects of the theorem, such as the necessity of its assumptions. We end this section with a discussion of related results in the literature.

1.1 Dense Graph Limits

Graphons were introduced by Borgs et al. [7, 16] as limit objects of sequences of dense graphs. Here we review basic facts, and we refer the reader to [15, Part II] for a thorough treatment of the subject. A graphon is a symmetric Lebesgue measurable function \(W:\varOmega ^2\rightarrow [0,1]\), where \(\varOmega \) is any standard atomless probability space. The underlying measure on \(\varOmega \) will always be denoted by \(\mu \).

Given a measurable function \(U:\varOmega ^2\rightarrow [-1,1]\) we define its cut-norm by

$$\begin{aligned} \Vert U\Vert _\square =\sup _{S,T}\left| \int _{x\in S}\int _{y\in T}U(x,y)\right| , \end{aligned}$$
(6)

where S and T range over all measurable subsets of \(\varOmega \). Now we can define the key notion of the cut-distance. For two graphons \(W_1,W_2:\varOmega ^2\rightarrow [0,1]\), we define

$$\begin{aligned} \delta _\square (W_1,W_2)=\inf _{\varphi } \Vert W_1^\varphi -W_2\Vert _\square \;, \end{aligned}$$
(7)

where \(\varphi :\varOmega \rightarrow \varOmega \) ranges over all measure preserving automorphisms of \(\varOmega \), and \(W_1^\varphi \) stands for the graphon defined by \(W_1^\varphi (x,y)=W_1(\varphi (x),\varphi (y))\). Note that the definition of \(\delta _\square (W_1,W_2)\) extends in a straightforward way if \(W_2\) lives on some other standard atomless probability space \(\varLambda \). In that case, \(\varphi \) ranges over all measure preserving bijections from \(\varLambda \) to \(\varOmega \).
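For step graphons, and in particular for the graphon representations of graphs introduced below, the supremum in (6) can be computed by brute force: the integral is linear in the indicator of each step, so it is maximized by sets S and T that are unions of steps. A minimal sketch (our illustration; `cut_norm_step` is a hypothetical name, and k equal-measure blocks are assumed):

```python
from itertools import chain, combinations

def cut_norm_step(U):
    """Cut norm (6) of a step function on k equal-measure blocks, given
    as a k x k matrix of values. The supremum over measurable S, T is
    attained on unions of blocks, so we enumerate all of them."""
    k = len(U)
    subsets = list(chain.from_iterable(
        combinations(range(k), s) for s in range(k + 1)))
    best = 0.0
    for S in subsets:
        for T in subsets:
            # each block-pair cell carries measure 1/k^2
            val = abs(sum(U[i][j] for i in S for j in T)) / k ** 2
            best = max(best, val)
    return best
```

For example, the difference of the two 2-block step graphons with values \(\bigl ({\begin{matrix}1&{}0\\0&{}1\end{matrix}}\bigr )\) and \(\bigl ({\begin{matrix}0.5&{}0.5\\0.5&{}0.5\end{matrix}}\bigr )\) has cut norm \(\frac{1}{8}\), witnessed by taking S and T to be the first block.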

Suppose that G is an n-vertex graph. Then we can consider a graphon representation of G. To this end, partition an atomless standard probability space \(\varOmega \) into n sets, each of measure \(\frac{1}{n}\), \(\varOmega =\bigsqcup _{v\in V(G)} \varOmega _v\). Then define a graphon \(W_G\) to be 1 on \(\varOmega _u\times \varOmega _v\) for each edge \(uv\in E(G)\), and 0 otherwise. Note that \(W_G\) is not unique as it depends on the choice of the partition \(\{\varOmega _v\}\). With the notion of a graphon representation, we can define the distance between a graph and a graphon. Namely, if \(W:\varOmega ^2\rightarrow [0,1]\) is a graphon and G is a graph, we define \(\delta _\square (W,G):=\delta _\square (W,W_G)\). This definition does not depend on the choice of the representation \(W_G\). We say that G is \(\alpha \)-close to W if \(\delta _\square (W,G)\leqslant \alpha \). Though through much of the paper we shall work with loopless multigraphs (introduced in Sect. 2.1), we always restrict ourselves to simple graphs when representing them as graphons or when referring to the cut-distance.

Throughout the paper, all sets and functions are tacitly assumed to be measurable. Likewise, all our constructions of auxiliary sets and functions are measurable, too.

We say that a sequence \((G_n)_n\) of simple graphs converges to a graphon W if and only if \(\delta _\square (W,G_n) \rightarrow 0\). Now, it is possible to understand the first sentence of Theorem 1.3. The following statement, proved first in [16], is the core of the theory of graphons.

Theorem 1.6

For every sequence of simple graphs of increasing orders there exists a subsequence which converges to a graphon.

1.2 Sparse Graph Limits

Let \(\mathcal {G}_\bullet \) denote the space of rooted locally finite graphs viewed up to root-preserving graph isomorphisms. That is, each element of \(\mathcal {G}_\bullet \) is a pair \((G,\varrho )\) where G is a graph and \(\varrho \) is a vertex of it, and two such elements \((G_1,\varrho _1)\) and \((G_2,\varrho _2)\) are considered equivalent if and only if there exists a graph isomorphism \(\varphi :G_1 \rightarrow G_2\) such that \(\varphi (\varrho _1)=\varrho _2\). Given a rooted graph \((G,\varrho )\) and an integer \(r \geqslant 1\) we write \(B_G(\varrho ,r)\) for the graph-distance ball of radius r around \(\varrho \) in G; that is, \(B_G(\varrho ,r)\in \mathcal {G}_\bullet \) is the finite graph rooted at \(\varrho \) on the set of vertices at graph distance at most r from \(\varrho \) in G, together with all the edges induced from G. There is a natural metric on \(\mathcal {G}_\bullet \): the distance between two elements \((G_1,\varrho _1), (G_2, \varrho _2) \in \mathcal {G}_\bullet \) is defined to be \(2^{-R}\), where \(R\geqslant 0\) is the largest number such that there is a root-preserving graph isomorphism between \(B_{G_1}(\varrho _1,R)\) and \(B_{G_2}(\varrho _2,R)\). Having defined the metric we can consider probability spaces on \(\mathcal {G}_\bullet \) with respect to the Borel \(\sigma \)-algebra.
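Since all the limits appearing in this paper are trees, root-preserving isomorphism can be tested with the classical AHU canonical encoding of rooted trees: two rooted trees are isomorphic if and only if their encodings agree. The following sketch (our illustration, not from the paper; trees are adjacency dicts, and R is capped so that fully isomorphic trees get distance \(2^{-r_{\max }}\) rather than 0) computes the metric just described.

```python
def ball_code(tree, root, r, parent=None):
    """Canonical (AHU-style) encoding of the ball B_T(root, r) in a tree:
    the sorted tuple of the children's encodings. Two balls are
    root-preservingly isomorphic iff their encodings are equal."""
    if r == 0:
        return ()
    return tuple(sorted(ball_code(tree, c, r - 1, root)
                        for c in tree[root] if c != parent))

def local_distance(t1, r1, t2, r2, r_max=32):
    """2^{-R}, with R the largest radius at which the rooted balls of
    (t1, r1) and (t2, r2) are isomorphic (capped at r_max)."""
    R = 0
    while R < r_max and ball_code(t1, r1, R + 1) == ball_code(t2, r2, R + 1):
        R += 1
    return 2.0 ** -R
```

For instance, a path on three vertices rooted at one endpoint versus the same path rooted at its centre gives distance 1 (the radius-1 balls already differ), while the two endpoint-rootings are isomorphic.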

We say that the law of a random element \((G,\varrho )\in \mathcal {G}_\bullet \) is the Benjamini–Schramm limit of a (possibly random) sequence of finite graphs \(G_n\), if and only if, for any fixed integer \(r \geqslant 1\) the random variable \(B_{G_n}(\varrho _n,r)\) converges in distribution to \(B_G(\varrho ,r)\) where \(\varrho _n\) is an independently chosen uniform random vertex of \(G_n\). In this case we say that the sequence \(G_n\) Benjamini–Schramm converges to \((G,\varrho )\). Note that by putting \(r=1\) in the definition we deduce that in this convergence the degree of the random root \(\varrho _n\) must converge to the degree of \(\varrho \) which explains why this limiting procedure is best suited for sparse graphs. See further discussion in [4].

This completes the terminology required to parse Theorems 1.1 and 1.3.

1.3 Necessity of the Assumptions

The assumptions of Theorem 1.3 are minimal. First, we obviously need the assumption that the \(G_n\) are connected in order for spanning trees to exist.

Next we claim that the local structure of a uniform spanning tree cannot be determined from a degenerate graphon W. Indeed, suppose that a graphon W is given, let \(\varOmega ^0\) be the set of elements of \(\varOmega \) that have zero degree in W, and let \(\varOmega ^+ = \varOmega {\setminus } \varOmega ^0\). Assume that W is degenerate, so that \(\mu (\varOmega ^0)=\delta >0\).

We can now construct two graph sequences that converge to W. We start with dense graphs \(G_n\) of size \((1-\delta )n\) that converge to \(W^+=W_{\mid \varOmega ^+}\); in the first sequence we attach to \(G_n\) a path of length \(\delta n\) at an arbitrary vertex, and in the second sequence we attach \(\delta n\) edges to an arbitrary vertex of \(G_n\), creating \(\delta n\) new vertices of degree 1. It is clear that both sequences converge to W. However, the USTs of the two sequences have different Benjamini–Schramm limits. Indeed, let \(p_1\) denote the probability that in \(\kappa _{W^+}\) the root is a leaf. Then the probability that a randomly chosen vertex is a leaf in the first sequence tends to \((1-\delta )p_1\), while in the second sequence this probability tends to \((1-\delta )p_1 + \delta \).

1.4 Discussion

Theorem 1.3 shows that the local structure of the UST is continuous on the space of dense graphs with the cut-metric (7) and describes this local structure explicitly. As mentioned earlier, the only instance in the literature of Theorem 1.3 that we are aware of is the case of the UST of the complete graph \(K_n\). In this case Grimmett [9] showed that the limiting object is an infinite path upon which we attach to each vertex an independent \(\mathsf {Poisson}(1)\) branching process—that is, a \(\mathsf {Poisson}(1)\) branching process conditioned to survive forever. This is precisely \(\kappa _W\) where the corresponding graphon is \(W\equiv 1\).

The analogous continuity result for sparse graphs is also true, though in this case it is typically harder to describe explicitly the limiting object.

Theorem 1.7

[1, Proposition 7.1] Suppose that \(\{G_n\}\) is a Benjamini–Schramm convergent sequence of connected graphs and let \(T_n\) be a UST of \(G_n\). Then there exists a random rooted tree \((T,\varrho )\) such that \(T_n\) Benjamini–Schramm converges to \((T,\varrho )\).

The limiting object in the above theorem \((T,\varrho )\) is the wired uniform spanning forest of the Benjamini–Schramm limit \((G,\varrho )\) of the graphs \(\{G_n\}\), see [18].

One can also ask whether the normalized number of spanning trees is continuous with respect to taking graph limits. Given a graph G, let t(G) be the number of spanning trees of G. In the bounded-degree model, Lyons [17, Theorem 3.2] proved that \(n^{-1} \log t(G)\) is continuous in the Benjamini–Schramm topology. In the dense model, the natural normalization of t(G) is \(n^{-1} t(G)^{1/n}\). For example, when \(G=K_n\) then by Cayley’s formula \(n^{-1} t(G)^{1/n}\) tends to 1, and when G is a typical Erdős–Rényi random graph \(\mathbb G(n,p)\) for fixed \(p\in (0,1)\), then \(n^{-1} t(G)^{1/n}\) tends to p. However, one cannot expect continuity of this parameter with respect to the cut-metric to hold without further assumptions. Indeed, let \(G_n\) be formed by a clique of order \(n-\frac{n}{\sqrt{\log n}}\) and a path of length \(\frac{n}{\sqrt{\log n}}\) attached to it at an arbitrary vertex. The sequence \(\{G_n\}\) converges to the complete graphon \(W\equiv 1\), but the number of spanning trees is substantially lower than for complete graphs; indeed, in this case it can easily be seen using Cayley’s formula that \(n^{-1} t(G_n)^{1/n}\) tends to 0.
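The quantity t(G) can be computed exactly on small examples via Kirchhoff's matrix-tree theorem. The following is a minimal sketch (our illustration, not from the paper; exact rational arithmetic via `fractions`), which in particular confirms Cayley's formula \(t(K_n)=n^{n-2}\).

```python
from fractions import Fraction

def spanning_tree_count(adj):
    """Number of spanning trees via Kirchhoff's matrix-tree theorem:
    the determinant of the Laplacian with one row and column deleted,
    computed exactly. `adj` is an adjacency matrix; entries > 1 would
    encode multi-edges."""
    n = len(adj)
    m = n - 1
    M = [[Fraction(sum(adj[i]) if i == j else -adj[i][j])
          for j in range(m)] for i in range(m)]
    det = Fraction(1)
    for i in range(m):                    # fraction-exact elimination
        piv = next((r for r in range(i, m) if M[r][i] != 0), None)
        if piv is None:
            return 0                      # graph is disconnected
        if piv != i:
            M[i], M[piv] = M[piv], M[i]
            det = -det
        det *= M[i][i]
        for r in range(i + 1, m):
            f = M[r][i] / M[i][i]
            for c in range(i, m):
                M[r][c] -= f * M[i][c]
    return int(det)

# Cayley's formula: t(K_5) = 5^3 = 125
K5 = [[0 if i == j else 1 for j in range(5)] for i in range(5)]
assert spanning_tree_count(K5) == 5 ** 3
```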

However, when we impose a minimum degree condition on the graphs, we can infer the asymptotic normalized number of spanning trees from the limit graphon.

Theorem 1.8

Let \(\delta >0\). Suppose that \(G_1,G_2,\ldots \) is a sequence of simple connected graphs that converge to a graphon W. Suppose that the order of \(G_n\) is n and the minimum degree is at least \(\delta n\). Then the number of spanning trees satisfies

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\root n \of {t(G_n)}}{n}=\exp \left( \int _x\log (\deg _W(x))\right) . \end{aligned}$$


Theorem 1.8 follows almost immediately from a result due to Kostochka [12] which states that if \(1<d_1\leqslant d_2 \leqslant \cdots \leqslant d_n\) is the degree sequence of a simple connected graph G then for some absolute constant \(C>0\) we have

$$\begin{aligned} \frac{\prod _i d_i}{d_1^{(Cn\log d_1)/d_1}}\leqslant t(G)\leqslant \frac{\prod _i d_i}{n-1}. \end{aligned}$$
(8)

To see that (8) yields Theorem 1.8, it is enough to recall that the degree distribution of a limit graphon is inherited from the degree distributions of the graphs that converge to it (see Lemma 5.1(1)). In the case of regular graphs, Theorem 1.8 can also be derived from [2, 8].
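A quick consistency check of Theorem 1.8 against the examples given above:

```latex
% For the constant graphon W \equiv p, the limit of G(n,p), we have
% \deg_W(x) = p for every x, so Theorem 1.8 gives
\lim_{n\to\infty} \frac{\sqrt[n]{t(G_n)}}{n}
  = \exp\!\left( \int_x \log p \right) = p,
% in agreement with the value stated for G(n,p) above. For p = 1 (the
% limit of K_n) the formula gives 1, consistent with Cayley's formula
% t(K_n) = n^{n-2}, since (n^{n-2})^{1/n}/n \to 1. Note that the
% clique-plus-path sequence above also converges to W \equiv 1 yet has
% normalized count tending to 0: it violates the minimum degree
% assumption, which is therefore essential.
```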

Lastly, let us mention a result of a similar flavor to ours in the context of percolation [5]. There, the authors show that the critical percolation probability of a dense graph is \({1 \over \lambda _n}\), where \(\lambda _n\) is the largest eigenvalue of the adjacency matrix. In particular, it follows that if two dense graphs are close in the cut-metric, then their thresholds are also close. They also describe the limiting local structure of bond percolation on dense graphs in terms of a branching process on the limiting graphon. While there is some resemblance to the branching process of Theorem 1.3, the two are quite different.

1.5 About the Proof of Theorem 1.1

Our proof combines, for the first time to the best of our knowledge, two seemingly unrelated mathematical areas, Kirchhoff’s electric network theory and Szemerédi’s regularity lemma-like graph partitioning techniques. These two are shown here to work seamlessly together.

Kirchhoff’s theory of electric networks [10] allows one to compute the probability that a given edge \(e=xy\) belongs to a UST of a connected graph. This probability is precisely the effective electric resistance between x and y, when we consider the graph as an electric network and let current flow from x to y; see Sect. 3.1.1 and (24). Since there is an edge connecting x to y, this quantity is always a number in [0, 1]. This is the starting point of our proof.
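This identity can be checked directly on small graphs. The sketch below (our illustration, not from the paper; unit conductances assumed) computes the effective resistance exactly by solving the grounded Laplacian system, and compares it against brute-force enumeration of the spanning trees of \(K_4\), where \(\mathbb {P}(e\in \mathcal {T})=\frac{1}{2}\).

```python
from fractions import Fraction
from itertools import combinations

def effective_resistance(edges, n, x, y):
    # Solve L v = e_x with vertex y grounded (v_y = 0); then R_eff = v_x.
    L = [[Fraction(0)] * n for _ in range(n)]
    for a, b in edges:
        L[a][a] += 1; L[b][b] += 1
        L[a][b] -= 1; L[b][a] -= 1
    idx = [i for i in range(n) if i != y]
    m = len(idx)
    A = [[L[i][j] for j in idx] + [Fraction(1 if i == x else 0)] for i in idx]
    for col in range(m):                      # exact Gauss-Jordan
        piv = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return {idx[i]: A[i][m] for i in range(m)}[x]

def spanning_trees(edges, n):
    # brute force: all (n-1)-edge subsets that connect the vertex set
    def connected(es):
        adj = {i: [] for i in range(n)}
        for a, b in es:
            adj[a].append(b); adj[b].append(a)
        reach, stack = {0}, [0]
        while stack:
            for w in adj[stack.pop()]:
                if w not in reach:
                    reach.add(w); stack.append(w)
        return len(reach) == n
    return [es for es in combinations(edges, n - 1) if connected(es)]

K4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
trees = spanning_trees(K4, 4)
frac = Fraction(sum((0, 1) in t for t in trees), len(trees))
assert effective_resistance(K4, 4, 0, 1) == frac == Fraction(1, 2)
```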

Next we use partition theory (Sect. 2) to decompose our graph G into a bounded number of dense expanders so that different expanders of the decomposition are connected by \(\mathsf {o}(n^2)\) edges. Heuristically, the UST of G is close to the union of independent USTs on each of these dense expanders.

Thus, it is natural to study electric network theory on dense expanders. It is intuitive (and easy to prove) that if x and y are two vertices in a graph, then the effective resistance between x and y is at least roughly \({1 \over \deg (x)}+{1 \over \deg (y)}\), since in the most efficient scenario the electric current splits equally from x to all its neighbors and arrives at y equally from all of y’s neighbors. Of course this lower bound is not sharp—the graph could be the union of two large stars around x and y together with an edge connecting x and y, in which case the resistance between x and y is 1.

However, when the graph is a dense expander, one can use random walk estimates, employing the fascinating and classical connection between random walks and electric networks, to deduce a corresponding approximate upper bound; see Corollary 3.3. The random walk estimate we prove (Lemma 3.2) states that if one starts a random walk on a dense expander from some vertex other than x and y, then the probability that x is visited before y is \({(1+\mathsf {o}(1))\deg (x) \over \deg (x) + \deg (y)}\).
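This estimate can be checked numerically on a small hypothetical example (our illustration, not from the paper): \(K_{60}\) with 20 of the edges at a vertex y removed, so that \(\deg (x)=59\) and \(\deg (y)=39\). The hitting probabilities are harmonic off \(\{x,y\}\), so instead of simulating walks we solve the linear system exactly (up to float precision).

```python
def hit_before(adj, x, y):
    """For each starting vertex, the probability that a simple random
    walk visits x before y, via the harmonic equations
    deg(v) h(v) = sum over neighbours w of h(w), h(x)=1, h(y)=0."""
    n = len(adj)
    others = [v for v in range(n) if v not in (x, y)]
    pos = {v: i for i, v in enumerate(others)}
    m = len(others)
    A = [[0.0] * (m + 1) for _ in range(m)]
    for v in others:
        i = pos[v]
        A[i][i] = float(len(adj[v]))
        for w in adj[v]:
            if w == x:
                A[i][m] += 1.0          # h(x) = 1 moves to the RHS
            elif w != y:                # h(y) = 0 drops out
                A[i][pos[w]] -= 1.0
    for col in range(m):                # Gaussian elimination, partial pivot
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= f * A[col][c]
    h = [0.0] * m
    for i in reversed(range(m)):
        h[i] = (A[i][m] - sum(A[i][j] * h[j]
                              for j in range(i + 1, m))) / A[i][i]
    return {v: h[pos[v]] for v in others}

# K_60 with the 20 edges from y = 1 to vertices 2..21 removed:
# deg(x) = 59, deg(y) = 39
n, x, y = 60, 0, 1
adj = {v: [w for w in range(n) if w != v] for v in range(n)}
for v in range(2, 22):
    adj[y].remove(v)
    adj[v].remove(y)
h = hit_before(adj, x, y)
pred = 59 / (59 + 39)                    # deg(x) / (deg(x) + deg(y))
```

All starting vertices give a probability within about 0.01 of the prediction \(\deg (x)/(\deg (x)+\deg (y))\approx 0.602\).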

It is now quite pleasant to observe that Rayleigh’s monotonicity (21), which states that the effective resistance can only decrease when a network is enlarged, shows that this upper bound on the resistance holds in each expander of the decomposition of G, and the matching lower bound holds for most edges of G since there are only \(\mathsf {o}(n^2)\) edges between components; see Lemma 3.3.

This explains why for most edges of G the probability that they appear in the UST is approximately the sum of the reciprocals of the degrees of their endpoints. An iterative argument is presented in Sect. 3.3, employing the spatial Markov property of the UST (Proposition 3.1), to control the probability of events such as \(B_\mathcal {T}(v,r) \cong T\). There are some delicate technicalities to overcome involving the “outside” effects of the decomposition. Once these are overcome, one reaches the discrete version \(\mathrm {Freq}(T;G)\) of the parameter \(\mathrm {Freq}(T;W)\) of Theorem 1.1, and we show that this parameter approximates the desired probability (Lemma 3.12). In Sect. 4 it is shown that \(\mathrm {Freq}(T;G)\) is close to its continuous counterpart \(\mathrm {Freq}(T;W)\) if G is close to W in the cut-distance, concluding the proof.

1.6 Organisation of the Paper

We have tried to write the paper so that it can be read by probabilists and graph theorists alike. For this reason we recall, in a pedestrian manner, even concepts that are relatively well known to one of the two communities. Also, in places we try to convey the idea of a proof even when this idea is standard in one of the two communities.

In Sect. 2 we introduce a suitable decomposition of dense graphs into so-called linear expanders. In Sect. 3 we prove the discrete estimate approximating the probability \(\mathbb {P}(B_\mathcal {T}(X,r)\cong T)\) by the discrete parameter \(\mathrm {Freq}(T;G)\). In Sect. 4 we show that \(\mathrm {Freq}(T;G)\) and \(\mathrm {Freq}(T;W)\) are close whenever G and W are close in the cut-metric. Lastly, in Sect. 5 we derive Theorem 1.5.

2 Decomposing Dense Graphs into Linear Expanders

2.1 Dense Expanders

Informally, a graph is a dense expander if whenever a vertex set and its complement are both of linear size (in the order of the graph), there are quadratically many edges between these two parts. So, a prime example of a dense graph that is not an expander is the disjoint union of two cliques of order \(\frac{n}{2}\) with a perfect matching connecting them.

We give our definition of expansion for loopless multigraphs. That is, self-loops are not allowed, and two vertices may be connected by several edges. The quantity \(e(A,B)\) counts all ordered pairs \((a,b)\) with \(a\in A\) and \(b\in B\) that form an edge, including multiplicities. Note that each edge with both endvertices in \(A\cap B\) contributes twice to \(e(A,B)\). For a vertex v and a vertex set A, we write \(\deg (v,A):=e(\{v\},A)\). We write \(\deg (v):=\deg (v,V)\), where V is the vertex set of the (multi)graph. Note that with these conventions, we have \(2e(G)=\sum _{v\in V} \deg (v)\) and \(\sum _{a\in A}\deg (a,B)=e(A,B)=\sum _{b\in B}\deg (b,A)\) for all vertex sets A and B.

Definition 2.1

We say that a loopless multigraph H is a \(\gamma \)-expander if for each \(U\subseteq V(H)\), we have \(e(U,V(H){\setminus } U)\geqslant \gamma |U|(v(H)-|U|)\).
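Definition 2.1 can be checked by brute force on small examples. The sketch below (our illustration; `expansion` is a hypothetical name) computes the largest \(\gamma \) for a given edge list; it confirms that complete graphs are 1-expanders, while the two-cliques-plus-matching example above is only a \(\frac{1}{4}\)-expander, witnessed by taking U to be one of the cliques.

```python
from itertools import combinations

def expansion(edges, n):
    """Largest gamma for which the multigraph is a gamma-expander
    (Definition 2.1): brute force over vertex subsets, using symmetry
    to restrict to |U| <= n/2. Repeated edges encode multi-edges."""
    best = float('inf')
    for size in range(1, n // 2 + 1):
        for U in combinations(range(n), size):
            S = set(U)
            cut = sum(1 for a, b in edges if (a in S) != (b in S))
            best = min(best, cut / (size * (n - size)))
    return best

K6 = [(a, b) for a in range(6) for b in range(a + 1, 6)]
assert expansion(K6, 6) == 1.0       # complete graphs are 1-expanders

# two K_4's joined by a perfect matching: dense, but a poor expander
clique = [(a, b) for a in range(4) for b in range(a + 1, 4)]
G = clique + [(a + 4, b + 4) for a, b in clique] + [(i, i + 4) for i in range(4)]
assert expansion(G, 8) == 0.25       # witnessed by U = one clique
```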

We will later use a simple observation. Removing edges from an expander can obviously ruin its expansion properties. However, if one removes only edges touching a single vertex while leaving the degree of that vertex high, the expansion properties are not damaged too much, as the following simple proposition states.

Proposition 2.2

Let G be a \(\gamma \)-expander loopless multigraph on m vertices and let \(v\in V(G)\) be a vertex. Assume that the maximal number of edges between any two vertices in \(V(G){\setminus }\{v\}\) is at most \(\ell \in \mathbb {N}\). Consider the graph \(G'\) obtained from G by erasing a set of at most \(\ell m\) edges emanating from v so that the degree of v in \(G'\) is at least \(\gamma m\). Assume also that \(8\ell ^2 \gamma ^{-2} \leqslant m\). Then \(G'\) is a \({\gamma \over 2}\)-expander.

Proof

We need to prove that for any \(U \subseteq V(G)\),

$$\begin{aligned} e_{G'}(U, V(G) {\setminus } U) \geqslant {\gamma \over 2} |U|(m-|U|) . \end{aligned}$$
(9)

By symmetry it is enough to prove it when \(|U|\leqslant m/2\). We proceed by considering two cases. In the first case we assume \(|U| \geqslant 4 \gamma ^{-1} \ell \). Then, since we erased at most \(\ell m\) edges, we have

$$\begin{aligned} e_{G'}(U, V(G){\setminus } U) \geqslant \gamma |U|(m-|U|) - \ell m \geqslant {\gamma \over 2} |U|(m-|U|) \, , \end{aligned}$$

since in this case \(\gamma |U|(m-|U|) \geqslant 2\ell m\).

In the second case we assume that \(|U| \leqslant 4 \gamma ^{-1} \ell \). If \(U=\{v\}\), then (9) follows since the degree of v in \(G'\) is at least \(\gamma m\). If \(v \not \in U\), then the maximum number of edges that could have been erased from \(e_G(U, V(G){\setminus } U)\) is at most \(\ell |U|\), hence

$$\begin{aligned} e_{G'}(U, V(G) {\setminus } U) \geqslant \gamma |U|(m-|U|) - 4 \ell ^2 \gamma ^{-1} \geqslant {\gamma \over 2} |U|(m-|U|) \, , \end{aligned}$$

since \(\gamma |U|(m-|U|) \geqslant \gamma m\) and \(8\ell ^2 \gamma ^{-2} \leqslant m\). Lastly, if \(v \in U\) and \(|U|>1\), then

$$\begin{aligned} e_{G'}(U, V(G){\setminus } U) \geqslant e_{G}\big (U {\setminus } \{v\}, (V(G) {\setminus } U) \cup \{v\}\big ) - \ell |U| \geqslant {\gamma \over 2} |U|(m-|U|) \, , \end{aligned}$$

by the same logic as above, concluding the proof. \(\square \)

2.2 Expander Decomposition of Dense Graphs

The main result of this section, Theorem 2.7, asserts that each graph that is close to a nondegenerate graphon can be decomposed into a bounded number of expanders that are almost isolated from each other. In Definition 2.6 below we describe the expander decomposition that we actually use. Let us recall that the need for such a decomposition (rather than a single expander) stems from examples such as the disjoint union of two cliques of order \(\frac{n}{2}\) with a perfect matching connecting them, mentioned at the beginning of Sect. 2.1. Passing to the limit we see that from the graphon perspective the perfect matching vanishes and we are left with two components. Therefore, we now introduce graphon counterparts to the notions of graph connectivity and components, and give their basic properties.

Definition 2.3

A graphon W on a ground space \((\varOmega ,\mu )\) is disconnected if either \(W=0\) a.e. or there exists a subset \(A\subseteq \varOmega \) with \(0<\mu (A)<1\) such that \(W=0\) a.e. on \(A\times A^c\); otherwise W is connected.

We shall require a result of Bollobás, Janson and Riordan [6, Lemma 5.17] which enables us to decompose a graphon into (at most) countably many connected components.

Lemma 2.4

Let \(W:\varOmega \times \varOmega \rightarrow [0,1]\) be a graphon. Then there exists a partition \(\varOmega =\bigcup _{i=0}^{N}\varOmega _i\) into measurable subsets with \(0\leqslant N\leqslant \infty \) such that \(\mu (\varOmega _i)>0\) for \(i\geqslant 1\), the restriction of W to \(\varOmega _i\times \varOmega _i\) is connected for each \(i\geqslant 1\), and \(W=0\) a.e. on \((\varOmega \times \varOmega ){\setminus }\bigcup _{i=1}^{N}(\varOmega _i\times \varOmega _i)\).

Bollobás et al. [5, Lemma 7] showed that connectivity implies an apparently stronger statement.

Lemma 2.5

Let \(W:\varOmega ^2\rightarrow [0,1]\) be a connected graphon, and let \(0<\alpha <\tfrac{1}{2}\) be given. There is some constant \(\beta =\beta (W,\alpha )>0\) such that \(\int _{A\times A^c}W\geqslant \beta \) for every measurable subset \(A\subseteq \varOmega \) with \(\alpha \leqslant \mu (A)\leqslant \tfrac{1}{2}\).

We can now give our definition of expander decomposition.

Definition 2.6

Suppose that G is a loopless multigraph of order n. We say that \(V(G)=V_0\sqcup V_1\sqcup \ldots \sqcup V_k\) is a \((\gamma ,\eta ,\varepsilon )\)-expander decomposition if

  (G1)

    \(|V_0|\leqslant \varepsilon n\),

  (G2)

    for each \(i\in [k]\) we have that \(e(V_i,V{\setminus } V_i)\leqslant \eta |V_i|n\),

  (G3)

    for each \(i\in [k]\) and each \(U\subseteq V_i\), we have \(e(U,V_i{\setminus } U) \geqslant \gamma |U||V_i{\setminus } U|\).

Theorem 2.7

Suppose that \(W:\varOmega ^2\rightarrow [0,1]\) is a nondegenerate graphon. Then for every \(\varepsilon ,\eta >0\) there exist positive constants \(\gamma =\gamma (W,\varepsilon ,\eta )\), \(\xi =\xi (W,\varepsilon ,\eta )\) and \(n_0=n_0(W,\varepsilon ,\eta )\) such that if G is a graph with \(v(G)>n_0\) and \(\delta _\square (G,W)<\xi \) then G admits a \((\gamma ,\eta ,\varepsilon )\)-expander decomposition.

For the proof of Theorem 2.7, we shall need the following result. While we were not able to find an explicit reference, we consider this result folklore. To state it, we need the notion of a subgraphon, which we introduce now. Suppose that \(W:\varOmega ^2\rightarrow [0,1]\) is a graphon on a probability space \((\varOmega ,\mu )\). Similarly to the graph case, for a set \(\varLambda \subseteq \varOmega \) we have the notion of the subgraphon of W induced by \(\varLambda \). This is the restricted function \(W[\varLambda ]:=W_{\upharpoonright \varLambda \times \varLambda }\). In order for \(W[\varLambda ]\) to be a graphon, we always consider it together with the renormalized probability space \((\varLambda ,\frac{\mu (\cdot )}{\mu (\varLambda )})\).

Lemma 2.8

Suppose that \(W:\varOmega ^2\rightarrow [0,1]\) is a nondegenerate graphon. Suppose that we have a partition \(\varOmega =\varOmega ^*\sqcup \bigsqcup _{i=1}^k \varOmega _i\) such that for each \(i\in [k]\) we have that W is zero almost everywhere on \(\varOmega _i\times (\varOmega {\setminus } \varOmega _i)\). Then for every \(\lambda >0\) there exists a number \(\xi >0\) so that we have the following. If G is an n-vertex graph with \(\delta _\square (G,W)<\xi \) then there exists a partition \(V(G)=\bigsqcup _{i=0}^k V_i\) such that for each \(i\in [k]\),

  1. (a)

    \(|V_0|=\mu (\varOmega ^*)n\pm \lambda n\), and \(|V_i|=\mu (\varOmega _i)n\pm \lambda n\),

  2. (b)

    \(e_G(V_i,V(G){\setminus } V_i)<\lambda n^2\),

  3. (c)

    \(\delta _\square (G[V_i],W[\varOmega _i])<\lambda \).

Proof of Theorem 2.7

Suppose that we are given a graphon \(W:\varOmega ^2\rightarrow [0,1]\) and two parameters \(\varepsilon ,\eta >0\). By Lemma 2.4, there exists a partition \(\varOmega =\bigsqcup _{i=1}^K\varOmega _i\) (with \(K\leqslant \infty \)) into components of W. Let \(k\in \mathbb N\) be such that \(\mu \left( \bigcup _{i=1}^k\varOmega _i\right) \geqslant 1-\frac{\varepsilon }{4}\). Set \(\varOmega ^*:=\bigcup _{i=k+1}^K\varOmega _i\). Let \(\alpha =\min \left\{ \frac{\varepsilon }{6k}, \min \limits _{i\in [k]}\frac{\mu (\varOmega _i)}{40}\right\} \). Lemma 2.5 shows the existence of a positive constant \(\beta =\beta (W,\alpha )\) such that

$$\begin{aligned} \iint \limits _{A\times (\varOmega _i{\setminus } A)}W \geqslant \beta \ \text {for all}\,\, i\in [k]\,\, \hbox {and all} \,\,A \subseteq \varOmega _i\,\, \hbox {with} \,\,\alpha \mu (\varOmega _i) \leqslant \mu (A)\leqslant \mu (\varOmega _i)/2.\qquad \end{aligned}$$
(10)

Let \(\gamma ,\xi \) and \(\lambda \) satisfy

$$\begin{aligned} \gamma =\min _{i\in [k]}\min \left\{ \frac{\mu (\varOmega _i)\eta }{48},\frac{\mu (\varOmega _i)\beta }{500}\right\} \quad \text{ and } \quad 0<\xi \ll \lambda \ll \min \left\{ \beta ,\min _{i\in [k]}\mu (\varOmega _i)\eta \right\} . \end{aligned}$$
(11)

Suppose that G is a graph as in the statement of the theorem.

Let \(V(G)=V'_0\sqcup V'_1\sqcup \ldots \sqcup V'_k\) be a partition satisfying properties of Lemma 2.8 for the graphon W and its partition \(\varOmega =\varOmega ^*\sqcup \bigsqcup _{i=1}^k\varOmega _i\), together with input error parameter \(\xi \) and output error parameter \(\lambda \). We will modify this partition to obtain a \((\gamma ,\eta ,\varepsilon )\)-expander decomposition of G.

Lemma 2.8(a) gives that

$$\begin{aligned} |V'_0|\leqslant \tfrac{1}{2} \varepsilon n. \end{aligned}$$
(12)

For each \(i\in [k]\), we perform the following cleaning procedure. Let \(U:\varOmega _i^2\rightarrow [0,1]\) be a graphon representation of \(G[V_i']\) on the (renormalized) probability space \(\varOmega _i\) such that we have \(\Vert W[\varOmega _i]-U\Vert _\square <\lambda \). Such a representation exists by Lemma 2.8(c).

Let \(P_i^0:=V_i'\) and \(Q_i^0:=\emptyset \). Now, for \(j=1,2,3,\ldots \) we proceed as follows. If there exists at least one set \(X_i^j\subseteq P_i^{j-1}\) of size at most \(\frac{3}{5}|P_i^{j-1}|\) with \(e(X_i^j,P_i^{j-1}{\setminus } X_i^j)< \gamma |X_i^j|n\), then we take such a set, let \(Q_i^j:=X_i^1\cup X_i^2\cup \ldots \cup X_i^j\) and \(P_i^j:=V_i'{\setminus } Q_i^j\), and proceed with \(j+1\). If no such set \(X_i^j\) exists, then we set \(j(i):=j-1\), \(V_i:=P_i^{j(i)}\), and terminate. Since the sets \(X_i^j\) (\(j=1,2,\ldots , j(i)\)) are nonempty, the procedure stops eventually.
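The cleaning procedure above can be sketched as follows; this is an illustrative brute-force rendering for tiny graphs only (the subset search is exponential), with all names our own.

```python
from itertools import combinations

def clean_part(adj, part, gamma, n):
    """One run of the cleaning procedure: repeatedly remove a set X of at most
    3/5 of the current part whose cut to the rest has fewer than gamma*|X|*n
    edges, until no such set exists."""
    e = lambda A, B: sum(1 for a in A for b in adj[a] if b in B)
    P, removed = set(part), set()
    while True:
        found = None
        for r in range(1, int(3 * len(P) / 5) + 1):
            for X in map(set, combinations(P, r)):
                if e(X, P - X) < gamma * len(X) * n:
                    found = X
                    break
            if found:
                break
        if not found:
            return P, removed   # P plays the role of V_i, removed of Q_i^{j(i)}
        removed |= found
        P -= found
```

On two triangles joined by a single edge, with \(\gamma =0.1\), the procedure strips one triangle (a sparse cut) and keeps the other.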

For every \(i\in [k]\) and every \(j\in [j(i)]\), we have

$$\begin{aligned} e(Q_i^{j},P_i^j)=\sum _{\ell =1}^je(X^{\ell }_i,P_i^j) \leqslant \sum _{\ell =1}^je(X^{\ell }_i,P^{\ell -1}_i{\setminus } X^{\ell }_i) \leqslant \gamma n \sum _{\ell =1}^j |X^{\ell }_i|=\gamma |Q_i^j|n. \end{aligned}$$
(13)

Claim 2.7.1

For each \(i\in [k]\),

$$\begin{aligned} |V_i'{\setminus } V_i|&\leqslant 3\alpha n. \end{aligned}$$
(14)

Proof

Suppose to the contrary that \(|V_i'{\setminus } V_i|> 3\alpha n\). Let \(j\in \{0,1,2,\ldots ,j(i)\}\) be the largest index for which

$$\begin{aligned} |P_i^{j}|\geqslant \tfrac{1}{4}|V_i'|. \end{aligned}$$
(15)

Now, there are two cases to consider. Either \(|P_i^{j+1}|<\frac{1}{4}|V_i'|\), in which case \(|P_i^j|\leqslant (\frac{1}{4}+\frac{3}{5})|V_i'|\) by the choice of the set \(X_i^{j+1}\). Otherwise \(j=j(i)\), that is, the procedure terminated at step j, and then by our assumption \(|V_i'{\setminus } P_i^j|=|V_i'{\setminus } V_i|\geqslant 3\alpha n\). Put together,

$$\begin{aligned} |Q^j_i|=|V_i'{\setminus } P_i^{j}|\geqslant \min \left\{ (1-\tfrac{1}{4}-\tfrac{3}{5})|V_i'|,3\alpha n\right\} = 3\alpha n, \end{aligned}$$
(16)

as \(|V'_i| \geqslant \tfrac{1}{2} \mu (\varOmega _i)n\) (by Lemma 2.8(a)) and \(\alpha \leqslant \tfrac{1}{40}\mu (\varOmega _i)\) (by the choice of \(\alpha \)).

We learn from (13) that

$$\begin{aligned} e(Q_i^{j},P_i^{j}) \leqslant \gamma |Q_i^{j}|n \leqslant \gamma n^2 \leqslant \frac{\mu (\varOmega _i)\beta }{500}\,n^2 . \end{aligned}$$
(17)

Let \(\varLambda \subseteq \varOmega _i\) represent the vertices of \(P_i^j\). We have \(\mu (\varLambda )\geqslant \frac{1}{4}\mu (\varOmega _i)-\lambda \geqslant \frac{1}{5}\mu (\varOmega _i)\), due to Lemma 2.8(a) and (15). Similarly, (16) gives \(\mu (\varOmega _i{\setminus } \varLambda )\geqslant \alpha \mu (\varOmega _i)\). Thus, (10) applies. We have

$$\begin{aligned} e(P_i^j,Q_i^j) = \frac{|V_i'|^2}{\mu (\varOmega _i)^2}\iint \limits _{\varLambda \times (\varOmega _i{\setminus } \varLambda )}U \geqslant \frac{|V_i'|^2}{\mu (\varOmega _i)^2}\Bigg (\,\iint \limits _{\varLambda \times (\varOmega _i{\setminus } \varLambda )}W-\lambda \mu (\varOmega _i)^2\Bigg ) \geqslant \frac{\beta -\lambda }{4}\,n^2 > \frac{\mu (\varOmega _i)\beta }{500}\,n^2 , \end{aligned}$$

which contradicts (17). \(\square \)

We have defined the sets \(V_1,V_2,\ldots , V_k\). Set \(V_0:=V_0'\cup \bigcup _{i=1}^k (V_i'{\setminus } V_i)\). Let us now check that \(V(G)=V_0\sqcup V_1\sqcup \ldots \sqcup V_k\) is indeed a desired expander decomposition.

As for property (G1), we have

$$\begin{aligned} |V_0| = |V_0'|+\sum _{i=1}^k|V_i'{\setminus } V_i| \leqslant \tfrac{1}{2}\varepsilon n+3k\alpha n \leqslant \varepsilon n , \end{aligned}$$

by (12), Claim 2.7.1, and the choice \(\alpha \leqslant \frac{\varepsilon }{6k}\).

For (G2), we first notice that

$$\begin{aligned} |V_i| \geqslant |V_i'|-3\alpha n \geqslant \tfrac{1}{2}\mu (\varOmega _i)n-3\alpha n \geqslant \tfrac{1}{4}\mu (\varOmega _i)n , \end{aligned}$$

as \(\alpha \leqslant \tfrac{1}{12}\mu (\varOmega _i)\). Thus we find, as required,

$$\begin{aligned} e(V_i,V{\setminus } V_i) \leqslant e(V_i',V{\setminus } V_i')+e(V_i,V_i'{\setminus } V_i) \leqslant \lambda n^2+\gamma n^2 \leqslant \eta |V_i|n , \end{aligned}$$

where the last inequality holds since \(|V_i|\geqslant \tfrac{1}{2}\mu (\varOmega _i)n\), \(\lambda \ll \mu (\varOmega _i)\eta \), and \(\gamma \leqslant \frac{\mu (\varOmega _i)\eta }{48}\).

Finally, property (G3) follows immediately from the stopping condition. This completes our proof of Theorem 2.7. \(\square \)

2.3 Properties of the Expander Decomposition

For the proof of Theorem 1.1 we argue that the majority of vertices do not “see” much beyond the component in the expander decomposition they belong to. This is formalized in the following definitions.

Definition 2.9

Suppose that G is a loopless multigraph of order n. Assume that \(V(G)=V_0\sqcup V_1\sqcup \ldots \sqcup V_k\) is some expander decomposition of G. For a vertex v we write i(v) for the unique \(i\in \{0,1,\ldots ,k\}\) such that \(v \in V_i\). Given \(\alpha >0\) and \(\varepsilon >0\) we say that \(v\in V{\setminus } V_0\) is \((\alpha ,\varepsilon )\)-good with respect to the decomposition if the following hold:

  1. (a)

    \(\deg (v) \geqslant \mathsf {\Omega }(\varepsilon n)\),

  2. (b)

    \(\deg (v; V_{i(v)}) \geqslant (1-\mathsf {O}(\varepsilon ^2)) \deg (v)\),

  3. (c)

    \(\displaystyle \sum _{u \in V_{i(v)}, u \sim v} \big ( {1 \over \deg (u; V_{i(v)})} - {1 \over \deg (u)} \big ) \leqslant \mathsf {O}(\alpha ^{1/2})\),

  4. (d)

    \(\displaystyle \sum _{u \in V_{i(v)}, u \sim v} {1 \over \deg (u; V_{i(v)})} \leqslant \mathsf {O}(\alpha ^{-1/4})\).

Definition 2.10

Suppose that G is a loopless multigraph of order n. Given numbers \(\beta , \alpha , \gamma >0\) and \(\varepsilon \in (0,\alpha )\), we say that G has a \((\beta , \alpha ,\gamma ,\varepsilon )\)-good-decomposition if

  1. (1)

    G admits a \((\gamma , \varepsilon ^5, \varepsilon ^5)\)-expander decomposition \(V(G)=V_0\sqcup V_1\sqcup \ldots \sqcup V_k\), and

  2. (2)

    At least \((1-\mathsf {O}(\alpha ^{1/4}))n\) vertices of G are \((\alpha ,\varepsilon )\)-good.

  3. (3)

    At least \((1-\mathsf {O}(\beta ))n\) vertices of G have degree at least \(\mathsf {\Omega }(\alpha ^{1/10} n)\).

Next we refine Theorem 2.7.

Lemma 2.11

For any \(\beta >0\) and any nondegenerate graphon \(W:\varOmega ^2\rightarrow [0,1]\), there exist \(\alpha ,\varepsilon , \gamma ,\xi >0\) with \(\beta \gg \alpha \gg \varepsilon \gg \gamma \gg \xi \) such that if G is a simple graph on \(n \geqslant \xi ^{-1}\) vertices with \(\delta _\square (G,W)\leqslant \xi \), then G has a \((\beta , \alpha , \gamma , \varepsilon )\)-good-decomposition.

Proof

Let \(\beta \) and W be given. Since W is nondegenerate, there exist \(\alpha >0\) and \(\xi _1>0\) sufficiently small such that any m-vertex graph (with m arbitrary) that is \(\xi _1\)-close to W has at least \((1-\beta )m\) vertices of degree at least \(\alpha ^{1/10} m\), so requirement (3) of Definition 2.10 holds. Similarly, we can find constants \(\varepsilon \in (0,\alpha ^{20})\) and \(\xi _2>0\) such that any m-vertex graph which is \(\xi _2\)-close to W has at most \(\alpha m\) vertices of degree at most \(\varepsilon m\). We apply Theorem 2.7 with input \(\varepsilon ^5\) in place of both \(\varepsilon \) and \(\eta \), and retrieve \(\gamma >0\) and \(\xi _3>0\). We set \(\xi :=\min (\xi _1,\xi _2,\xi _3)\). Suppose now that \(G=(V,E)\) is a graph satisfying the assumptions of the lemma. Theorem 2.7 readily gives item (1) of Definition 2.10.

To show item (2), we first note that by property (G2) of the expander decomposition we have that \(e(V_i, V {\setminus } V_i) \leqslant \varepsilon ^5 n |V_i|\) for all \(i\in [k]\). By summing over i we deduce that

$$\begin{aligned} \sum _{i=1}^k e(V_i,V {\setminus } V_i) \leqslant \varepsilon ^5 n^2 . \end{aligned}$$
(18)

Denote by S the set of vertices of \(V{\setminus } V_0\) violating (b) of Definition 2.9 with 1 as the implicit constant in the term \(\mathsf {O}(\varepsilon ^2)\), that is,

$$\begin{aligned} S = \big \{ v \in V{\setminus } V_0 : \deg (v; V{\setminus } V_{i(v)}) \geqslant \varepsilon ^2 \deg (v) \big \} . \end{aligned}$$

Then by (18) we have that

$$\begin{aligned} \varepsilon ^5 n^2 \geqslant \sum _{v \in S} \deg (v; V {\setminus } V_{i(v)}) \geqslant \varepsilon ^2 \sum _{v\in S} \deg (v) . \end{aligned}$$

Therefore \(\sum _{v \in S} \deg (v) \leqslant \varepsilon ^3 n^2\), from which we learn that

$$\begin{aligned} |S \cap \{v : \deg (v) \geqslant \varepsilon n \}| \leqslant \varepsilon ^2 n . \end{aligned}$$
(19)

We deduce that

$$\begin{aligned} \big | \big \{ v \in V {\setminus } V_0 : \deg (v) \leqslant \varepsilon n \mathrm {\ or\ }\deg (v; V{\setminus } V_{i(v)}) \geqslant \varepsilon ^2 \deg (v) \big \} \big | \leqslant (\alpha + \varepsilon ^2)n . \end{aligned}$$
(20)

Next, for \(i\in [k]\) we write

$$\begin{aligned} \sum _{v \in V_i} \sum _{u \in V_{i}, u \sim v} \Bigg ( {1\over \deg (u; V_{i})} - {1 \over \deg (u)} \Bigg ) = |V_{i}| - \sum _{u\in V_i}{\deg (u; V_i) \over \deg (u)} = \sum _{u \in V_i} {\deg (u; V{\setminus } V_i) \over \deg (u)} . \end{aligned}$$

We sum this over \(i\in [k]\) and get that

$$\begin{aligned}&\sum _{v \in V{\setminus } V_0} \sum _{u \in V_{i(v)}, u \sim v} \Bigg ( {1\over \deg (u; V_{i(v)})} - {1 \over \deg (u)} \Bigg )\\&\quad = \sum _{u \in V{\setminus } V_0} {\deg (u; V{\setminus } V_{i(u)}) \over \deg (u)} \leqslant (\alpha + \varepsilon ^2)n + \varepsilon ^2 n = \mathsf {O}(\alpha n) \, ,\end{aligned}$$

where we bounded the ratio by 1 for those vertices counted in (20), and by \(\varepsilon ^2\) for the vertices that were not. From the last inequality we deduce that there cannot be more than \(\mathsf {O}(\alpha ^{1/2} n)\) vertices \(v\in V{\setminus } V_0\) for which requirement (c) in the definition of \((\alpha ,\varepsilon )\)-good is not satisfied.

Lastly, to show (d), we have that

$$\begin{aligned} \sum _{i=1}^k \sum _{v\in V_i} \sum _{u\in V_i: u \sim v} {1 \over \deg (u; V_i)} = \sum _{i=1}^k |V_i| \leqslant n \, , \end{aligned}$$

therefore there cannot be more than \(\mathsf {O}(\alpha ^{1/4} n)\) vertices \(v\in V{\setminus } V_0\) such that (d) is violated. This concludes our proof. \(\square \)

Part (2) of Definition 2.10 asserts that there are many \((\alpha ,\varepsilon )\)-good vertices in the graph, yet there could still be components \(V_i\) of the decomposition in which the majority of the vertices are not \((\alpha ,\varepsilon )\)-good. The following proposition shows that such components cannot occupy too much of the mass. For \(i\in [k]\), we say that the set \(V_i\) is \((\alpha ,\varepsilon )\)-big if at least \((1-\mathsf {O}(\alpha ^{1/8}))|V_i|\) of its vertices are \((\alpha ,\varepsilon )\)-good, and \(e(G[V_i])\geqslant \mathsf {\Omega }(\alpha ^{1/9}|V_i|n)\).

Proposition 2.12

Suppose that G is a graph with n vertices that has a \((\beta , \alpha ,\gamma ,\varepsilon )\)-good-decomposition (as in Definition 2.10). Then

$$\begin{aligned} \sum _{i\in [k]: V_i \mathrm {\ is \ } (\alpha ,\varepsilon )\mathrm {-big}} |V_i| \geqslant (1-\mathsf {O}(\beta ^{1/8})) n . \end{aligned}$$

Proof

Let \(I_1\) be the set of indices \(i\in [k]\) such that \(V_i\) has \(\mathsf {\Omega }(\alpha ^{1/8} n)\) vertices that are not \((\alpha ,\varepsilon )\)-good. Since this is a \((\beta , \alpha ,\gamma ,\varepsilon )\)-good-decomposition, the total number of vertices that are not \((\alpha ,\varepsilon )\)-good is \(\mathsf {O}(\alpha ^{1/4} n)\). Hence,

$$\begin{aligned} \sum _{i\in I_1} |V_i| \leqslant \mathsf {O}(\alpha ^{1/8} n) = \mathsf {O}(\beta ^{1/8} n) . \end{aligned}$$

Next, let \(I_2\) be the set of indices \(i \in [k]\) such that \(e(G[V_i]) = \mathsf {O}(\alpha ^{1/9}|V_i|n)\). Put \(V' = \bigcup _{i\in I_2}V_i\). We have

$$\begin{aligned} \sum _{v \in V'} \deg (v)\leqslant & {} 2 \sum _{i\in I_2} e(G[V_i]) + \sum _{i \in I_2} e(V_i, V {\setminus } V_i) \leqslant O(\alpha ^{1/9}|V'| n) + \varepsilon ^5 |V'|n \\= & {} O(\alpha ^{1/9}|V'| n) . \end{aligned}$$

If \(|V'| = \mathsf {\Omega }(\beta ^{1/8} n)\), then by property (3) in Definition 2.10, we may bound \(\sum _{v \in V'} \deg (v)\) from below by \(\mathsf {\Omega }(\alpha ^{1/10} |V'| n)\), giving a contradiction to the last estimate. The proof is concluded since \(\sum _{i\in [k]} |V_i| \geqslant (1-\varepsilon ^5)n\). \(\square \)

3 Local Neighborhoods of the UST via Electric Networks

3.1 Preliminaries

Suppose that \(G=(V,E)\) is a loopless multigraph. We denote by \(\{X_t\}_{t \geqslant 0}\) the simple random walk starting from some (possibly random) vertex \(X_0\). That is, \(\{X_t\}_{t \geqslant 0}\) is a Markov chain with state space V and transition matrix \(p(x,y) = \frac{e(\{x\},\{y\})}{\deg (x)}\). We denote by \(\mathbb {P}_v\) the probability measure of the simple random walk started at a vertex v. We will frequently use two stopping times: the hitting time of a vertex v is the random variable \(\tau _v:=\min \{t \geqslant 0:X_t=v\}\) and the hitting time after zero of v is the random variable \(\tau _v^+:=\min \{t>0:X_t=v\}\). Clearly when \(X_0 \ne v\) these two random times are equal.

3.1.1 Effective Resistance

Our analysis relies on the relation between random walks, USTs and the theory of electrical networks. We briefly recall here the basic theory we will use and refer the reader to [18, Chapter 2] for a comprehensive study. Given a graph \(G=(V,E)\) we write \(\overrightarrow{E}\) for the set of directed edges of size 2|E| which contains each edge of E in both directions. Given two distinct vertices u and v we say that an antisymmetric function \(f:\overrightarrow{E}\rightarrow \mathbb {R}\) is a flow from u to v if for each vertex \(w\notin \{u,v\}\) the sum of f over edges outgoing from w is zero. A flow is called a unit flow if the sum of f over edges outgoing from u is 1. The effective resistance \(\mathcal {R}_{\mathrm {eff}}(u \leftrightarrow v; G)\) between u and v is defined as the minimum energy \(\mathcal {E}(f) = \sum _{e \in E} f(e)^2\) of any unit flow f from u to v. If u and v are not in the same connected component, we define the effective resistance between them to be \(\infty \). When it is clear what the underlying graph G is we simply write \(\mathcal {R}_{\mathrm {eff}}(u \leftrightarrow v)\). From this definition it is immediate that if \(G'\) is a subgraph of G, then

$$\begin{aligned} \mathcal {R}_{\mathrm {eff}}(u \leftrightarrow v; G) \leqslant \mathcal {R}_{\mathrm {eff}}(u \leftrightarrow v; G')\; . \end{aligned}$$
(21)

The latter inequality is also known as Rayleigh’s monotonicity law. The discrete Dirichlet’s principle gives a dual definition of the effective resistance in terms of functions on the vertices. It states that

$$\begin{aligned} \mathcal {R}_{\mathrm {eff}}(u\leftrightarrow v) ^{-1} = \inf \left\{ \sum _{(x,y) \in E} (h(x) - h(y))^2 : h:V\rightarrow \mathbb {R}\, , \, h(u)=0\, , \, h(v)=1 \right\} ,\nonumber \\ \end{aligned}$$
(22)

see [18, Exercise 2.13]. We will also use a basic probabilistic interpretation of the effective resistance, which can be found in [18, Chapter 2]:

$$\begin{aligned} \mathbb {P}_u\left[ \tau _v<\tau _u^+\right] = \frac{1}{\deg (u) \mathcal {R}_{\mathrm {eff}}(u \leftrightarrow v)} . \end{aligned}$$
(23)
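The identities (22) and (23) can be checked numerically on small networks. The following self-contained sketch (illustrative only; the helper names are ours) computes \(\mathcal {R}_{\mathrm {eff}}\) by grounding v and injecting a unit current at u, and the escape probability by solving the harmonic system.

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting, for the tiny systems below."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * x for a, x in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def effective_resistance(adj, u, v):
    """R_eff(u <-> v): ground v, inject a unit current at u, and read off the
    potential at u from the grounded Laplacian system."""
    verts = [w for w in adj if w != v]
    idx = {w: i for i, w in enumerate(verts)}
    L = [[0.0] * len(verts) for _ in verts]
    for w in verts:
        L[idx[w]][idx[w]] = float(len(adj[w]))
        for x in adj[w]:
            if x != v:
                L[idx[w]][idx[x]] -= 1.0
    b = [0.0] * len(verts)
    b[idx[u]] = 1.0
    return solve(L, b)[idx[u]]

def escape_prob(adj, u, v):
    """P_u[tau_v < tau_u^+]: solve for h(w) = P_w[tau_v < tau_u] with h(u)=0,
    h(v)=1 and h harmonic elsewhere, then average h over the neighbors of u."""
    inner = [w for w in adj if w not in (u, v)]
    idx = {w: i for i, w in enumerate(inner)}
    A = [[0.0] * len(inner) for _ in inner]
    b = [0.0] * len(inner)
    for w in inner:
        A[idx[w]][idx[w]] = float(len(adj[w]))
        for x in adj[w]:
            if x in idx:
                A[idx[w]][idx[x]] -= 1.0
            elif x == v:
                b[idx[w]] += 1.0
    sol = solve(A, b)
    h = {w: sol[idx[w]] for w in inner}
    h[u], h[v] = 0.0, 1.0
    return sum(h[x] for x in adj[u]) / len(adj[u])
```

On the path 0–1–2 this gives \(\mathcal {R}_{\mathrm {eff}}(0\leftrightarrow 2)=2\) and escape probability 1/2, in accordance with (23).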

3.1.2 Uniform Spanning Trees

There is a fundamental connection between the uniform spanning tree and electric networks due to Kirchhoff [10]. Let G be a connected loopless multigraph. As before we denote by \(\mathcal {T}\) a UST of G. Kirchhoff’s formula [10] (see also [18, Chapter 4]) states that for any edge \(e=(x,y)\) of G we have

$$\begin{aligned} \mathbb {P}( e \in \mathcal {T}) = \mathcal {R}_{\mathrm {eff}}(x \leftrightarrow y) . \end{aligned}$$
(24)

Let S be a subset of E(G). We would like to condition on events of the form \(S \subseteq \mathcal {T}\) or \(S \cap \mathcal {T}= \emptyset \). We denote by G/S the loopless multigraph obtained from G by contracting the edges of S and erasing any loops that have been formed, and by \(G-S\) the graph G with the edges of S erased. The following is an easy and classical observation, see [18, Chapter 4].

Proposition 3.1

Let G be a connected loopless multigraph and S a subset of edges of G.

  1. (1)

    If \(G-S\) is connected, then the UST \(\mathcal {T}\) of G conditioned on \(S \cap \mathcal {T}= \emptyset \) has the distribution of the UST on \(G - S\).

  2. (2)

    If S does not contain a cycle, then the UST \(\mathcal {T}\) of G conditioned on \(S \subseteq \mathcal {T}\) has the distribution of \(S \cup \mathcal {T}_{G/S}\), where \(\mathcal {T}_{G/S}\) is a UST of G/S.
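Kirchhoff’s formula (24) and Proposition 3.1 can likewise be verified by exhaustive enumeration on small graphs. In the sketch below (illustrative; names are ours), on the triangle each edge lies in 2 of the 3 spanning trees, matching \(\mathcal {R}_{\mathrm {eff}}=2/3\), and the trees avoiding an edge are exactly the spanning trees of the deleted graph, as in Proposition 3.1(1).

```python
from itertools import combinations

def spanning_trees(n, edges):
    """All spanning trees of a multigraph on vertices 0..n-1, as sets of edge
    indices: a set of n-1 edges spans iff it creates no cycle (union-find)."""
    def acyclic(sub):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for i in sub:
            a, b = find(edges[i][0]), find(edges[i][1])
            if a == b:
                return False
            parent[a] = b
        return True
    return [set(sub) for sub in combinations(range(len(edges)), n - 1)
            if acyclic(sub)]
```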

3.1.3 Mixing Time

Let \(G=(V,E)\) be a finite connected loopless multigraph and consider the lazy simple random walk on it, that is, the Markov chain on the vertex set V with transition probability \(p(x,y) = {e(\{x\},\{y\})\over 2\deg (x)}\) whenever \(x \ne y\) and \(p(x,x) = 1/2\) for any vertex x. Let \(\pi \) be the stationary distribution \(\pi (x) = \deg (x) / 2|E|\) and for any two disjoint subsets of vertices A and B we write

$$\begin{aligned} Q(A,B) = \sum _{x\in A, y\in B} \pi (x)p(x,y) = e(A,B)/4|E| . \end{aligned}$$

The Cheeger constant \(\varPhi _*\) is defined as

$$\begin{aligned} \varPhi _* = \min _ {S : \pi (S) \leqslant {1\over 2}} {Q(S,V{\setminus } S) \over \pi (S)} \, , \end{aligned}$$

where \(\pi (S) = \sum _{x\in S} \pi (x)\). This “bottleneck” ratio is frequently used to control the spectral gap of the lazy random walk from which we may bound its mixing time. Let \(1=\lambda _1 > \lambda _2 \geqslant \cdots \geqslant \lambda _n \geqslant 0\) be the eigenvalues of the transition matrix p (we have \(\lambda _1 > \lambda _2\) since G is connected, and \(\lambda _n \geqslant 0\) since the chain is lazy, see [14]). A result by Jerrum and Sinclair [20], Lawler and Sokal [13] and Alon and Milman [3] states that

$$\begin{aligned} \varPhi _*^2/2 \leqslant 1-\lambda _2 \leqslant 2\varPhi _*.\end{aligned}$$
(25)

Assume now that G is a \(\gamma \)-expander as in Definition 2.1 and that the number of parallel edges between any two vertices is at most \(f \geqslant 1\). Then the degree of each vertex is at most fn, hence \(\pi (S) \leqslant f|S|n/2|E|\) for any \(S\subseteq V\). Similarly, for any \(S \subseteq V\) we have that \(\pi (V{\setminus } S) \leqslant fn|V{\setminus } S|/2|E|\), so if \(\pi (V{\setminus } S) \geqslant 1/2\) we get that \(|V{\setminus } S| \geqslant |E|/fn\). Since the minimum degree in a \(\gamma \)-expander is at least \(\gamma (n-1)\) we get that if \(\pi (V{\setminus } S) \geqslant 1/2\), then \(|V{\setminus } S| \geqslant \gamma (n-1)/2f\). Putting all this together we get that if G is a \(\gamma \)-expander and \(n\geqslant 2\), then

$$\begin{aligned} \varPhi _* \geqslant {\gamma |S| |V{\setminus } S| \over 2|S|fn} \geqslant {\gamma ^2 \over 8 f^2} \, , \end{aligned}$$

from which we get by (25) that

$$\begin{aligned} 1-\lambda _2 \geqslant {c \gamma ^4 \over f^4} \, , \end{aligned}$$
(26)

where \(c=1/128\). Recall that the total variation distance \(\Vert \mu -\nu \Vert _{\mathrm {TV}}\) between two probability measures \(\mu \) and \(\nu \) on the same probability space is defined to be \(\sup _A |\mu (A)-\nu (A)|\), where the \(\sup \) is ranging over all events A. For \(\varepsilon >0\) the \(\varepsilon \)-mixing-time \(T_{\mathrm {mix}}(\varepsilon )\) of the chain is defined as

$$\begin{aligned} T_{\mathrm {mix}}(\varepsilon ) = \min \big \{ t : \Vert p^t(x,\cdot ) - \pi (\cdot )\Vert _{\mathrm {TV}} \leqslant \varepsilon \text { for all } x\in V \big \} . \end{aligned}$$

The mixing time and the spectral gap are related via the following statement, see [14, Theorem 12.4],

$$\begin{aligned} T_{\mathrm {mix}}(\varepsilon ) \leqslant {1 \over 1-\lambda _2} \Bigg [ {1 \over 2} \log {1 \over \min _{x \in V} \pi (x)} + \log (1/2\varepsilon ) \Bigg ] . \end{aligned}$$

Since \(f\geqslant 1\) bounds the maximal number of parallel edges between any two vertices, we get that \(|E|\leqslant fn^2\) and hence \(\pi (x) \geqslant \gamma (n-1)/2|E| \geqslant c\gamma /(fn)\) for some universal constant \(c>0\). We deduce from this, the above bound on \(T_{\mathrm {mix}}(\varepsilon )\) and (26) that

$$\begin{aligned} T_{\mathrm {mix}}(\varepsilon ) \leqslant C f^4 \gamma ^{-4} \big [ \log n + \log (f/\gamma ) + \log \varepsilon ^{-1} \big ] \, ,\end{aligned}$$
(27)

for some universal constant \(C>0\).
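The chain of estimates (25)–(27) can be observed concretely on a small regular graph, where the stationary distribution is uniform and the lazy transition matrix is symmetric. The following sketch (illustrative only, restricted to regular graphs; names are ours) computes \(\varPhi _*\) by brute force and the spectral gap \(1-\lambda _2\) by power iteration; on the 4-cycle one gets \(\varPhi _*=1/4\) and \(1-\lambda _2=1/2\), consistent with (25).

```python
from itertools import combinations

def cheeger_and_gap(adj):
    """For the lazy walk on a *regular* graph (uniform pi, symmetric P):
    the Cheeger constant by brute force over subsets, and the spectral gap
    1 - lambda_2 by power iteration on the deflated transition matrix."""
    n = len(adj)
    d = len(adj[0])
    m = n * d // 2                                     # |E|
    def Q(S):                                          # e(S, V \ S) / 4|E|
        return sum(1 for a in S for b in adj[a] if b not in S) / (4.0 * m)
    phi = min(Q(set(S)) / (len(S) * d / (2.0 * m))     # Q(S, S^c) / pi(S)
              for r in range(1, n // 2 + 1)
              for S in combinations(range(n), r))
    P = [[0.5 if i == j else adj[i].count(j) / (2.0 * d) for j in range(n)]
         for i in range(n)]
    v = [1.0 - 1.0 / n] + [-1.0 / n] * (n - 1)         # orthogonal to constants
    for _ in range(500):
        w = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]                      # keep deflated
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    lam2 = sum(v[i] * sum(P[i][j] * v[j] for j in range(n)) for i in range(n))
    return phi, 1.0 - lam2
```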

3.2 Random Walks on Dense Expanders

Lemma 3.2

Suppose that H is a loopless multigraph on n vertices that is a \(\gamma \)-expander and that the number of parallel edges between any two vertices is at most \(f \geqslant 1\). Then for any two distinct nodes u and v and a node \(w \ne v\) we have that

$$\begin{aligned} \mathbb {P}_w\left[ \tau _v<\tau _u^+\right] =\frac{\deg (v)}{\deg (u)+\deg (v)}+\mathsf {O}(\gamma ^{-7}f^7n^{-1}\log (n)). \end{aligned}$$

Proof

Let us assume that there are no edges between u and v—this can only matter for the assertion of the statement when \(w = u\), and in this case it affects the estimate by at most the probability that such an edge is traversed on the first step of the random walk; the latter probability is at most \(\mathsf {O}(\gamma ^{-1} fn^{-1})\), since \(\deg (u) \geqslant \gamma (n-1)\). This error is swallowed in the error estimate of the lemma.

We denote by \(\mathbb {P}^{\text {lazy}}\) the law of the lazy random walk on the graph, that is, the walk that at each step stays put with probability 1/2 and otherwise jumps to a uniformly chosen neighbor. It is clear that if w is a vertex with \(w \not \in \{u,v\}\), then \(\mathbb {P}^{\text {lazy}}_w(\tau _v<\tau _u^+) = \mathbb {P}_w(\tau _v<\tau _u^+)\). We will first show, via a coupling argument, that for any two vertices \(w_1, w_2 \not \in \{u,v\}\) we have

$$\begin{aligned} \mathbb {P}^{\text {lazy}}_{w_1}(\tau _v< \tau _u^+) = \mathbb {P}^{\text {lazy}}_{w_2}(\tau _v < \tau _u^+) + \mathsf {O}(\gamma ^{-6} f^6 n^{-1} \log (n)) .\end{aligned}$$
(28)

Indeed, let \(\{X_t\}_{t \geqslant 0}\) and \(\{Y_t\}_{t \geqslant 0}\) be two lazy simple random walks starting at \(w_1\) and \(w_2\), respectively. We put \(\varepsilon =n^{-1}\) in (27) and bound \(\log (f/\gamma )\) by \(f/\gamma \) to get that if \(T=C\gamma ^{-5}f^5\log (n)\), then \(\Vert X_T - \pi \Vert _{\text {TV}} \leqslant n^{-1}\) and the same estimate holds for \(Y_T\), hence \(\Vert X_T - Y_T\Vert _{\text {TV}} \leqslant 2n^{-1}\) (where by \(\Vert X_T - Y_T\Vert _{\text {TV}}\) we mean the total variation distance between the laws of \(X_T\) and \(Y_T\)).

By [14, Proposition 4.7] we deduce that we can couple the walks \(\{X_t\}\) and \(\{Y_t\}\) so that \(X_T=Y_T\) with probability at least \(1-2n^{-1}\). If this occurs we continue the coupling so that \(X_t = Y_t\) for all \(t \geqslant T\) by using the same random neighbor at each step of the walk. Thus, if both walks have not visited u or v between time 1 and T, then the event \(\{\tau _v < \tau _u^+\}\) occurs for \(\{X_t\}\) if and only if it occurs for \(\{Y_t\}\). Since the minimal degree of H is at least \(\gamma (n-1)\) and the maximal number of parallel edges between any two vertices is at most f we learn that the probability of visiting u or v between time 1 and T is at most \(Tf/\gamma (n-1) \leqslant 2C\gamma ^{-6}f^6n^{-1}\log (n)\), concluding the proof of (28).

We now continue the proof of the lemma for the case that \(w=u\). If the walker starts at u and \(\tau _v < \tau _u^+\), then it cannot be lazy in the first step. Hence

$$\begin{aligned} \mathbb {P}^{\text {lazy}}_u(\tau _v<\tau _u^+) = {1 \over 2} \mathbb {P}_u(\tau _v<\tau _u^+) .\end{aligned}$$
(29)

Consider now the Markov chain on the two states \(\{u,v\}\) with transition probabilities

$$\begin{aligned} p(u,v) = \mathbb {P}^{\text {lazy}}_u(\tau _v< \tau _u^+) \, , \qquad p(v,u) = \mathbb {P}^{\text {lazy}}_v(\tau _u < \tau _v^+)\, , \end{aligned}$$

with \(p(u,u) = 1-p(u,v)\) and \(p(v,v) = 1-p(v,u)\). This is the lazy random walk “watched” on the vertices u and v. By summing over paths it is immediate that

$$\begin{aligned} \deg (u) p(u,v) = \deg (v) p(v,u) . \end{aligned}$$
(30)

In order to visit v before returning to u, the lazy walker must walk to a random neighbor in the first step, so

$$\begin{aligned} p(u,v) = {1 \over 2} \mathbb {P}^{\text {lazy}}_{N(u)}(\tau _v < \tau _u^+) \, , \end{aligned}$$
(31)

where \(\mathbb {P}^{\text {lazy}}_{N(u)}\) indicates a uniform starting position from the set N(u) of neighbors of u. Similarly, when starting from v, in order to return to v before visiting u, the lazy walker can either stay put on the first step, or jump to a uniform neighbor of v and from there visit v before u, thus

$$\begin{aligned} p(v,v) = {1 \over 2} + {1\over 2} \mathbb {P}^{\text {lazy}}_{N(v)}(\tau _v < \tau _u^+) .\end{aligned}$$
(32)

Since we assumed there are no edges between u and v, by (28) we have that

$$\begin{aligned} \mathbb {P}^{\text {lazy}}_{N(v)}(\tau _v< \tau _u^+) = \mathbb {P}^{\text {lazy}}_{N(u)}(\tau _v < \tau _u^+) + \mathsf {O}(\gamma ^{-6}f^6n^{-1}\log (n)) . \end{aligned}$$

This together with (31) and (32) gives that \(p(u,v)+p(v,u)=1/2 + \mathsf {O}(\gamma ^{-6}f^6n^{-1}\log (n))\). Combined with (30) and the fact that all degrees are at least \(\gamma (n-1)\) and at most fn, this gives that

$$\begin{aligned} p(u,v) = {\deg (v) \over 2(\deg (u) + \deg (v))} + \mathsf {O}( \gamma ^{-7}f^7 n^{-1}\log (n) ) \, , \end{aligned}$$

concluding the proof of the lemma when \(w=u\) by (29). The proof for any \(w \not \in \{u,v\}\) can now be completed easily. Indeed, (28) and our assumption that there are no edges between u and v show that

$$\begin{aligned} \mathbb {P}^{\text {lazy}}_{w}(\tau _v< \tau _u^+) = \mathbb {P}^{\text {lazy}}_{N(u)}(\tau _v < \tau _u^+) + \mathsf {O}(\gamma ^{-6}f^6n^{-1}\log n ) \, , \end{aligned}$$

and so the lemma follows by (31). \(\square \)
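As a sanity check of Lemma 3.2 in the simplest dense expander, the complete graph \(K_n\): by symmetry the harmonic function \(h(w)=\mathbb {P}_w[\tau _v<\tau _u]\) is constant on \(V{\setminus }\{u,v\}\), and a first-step computation gives \(\mathbb {P}_u[\tau _v<\tau _u^+]=\frac{n}{2(n-1)}=\frac{1}{2}+\mathsf {O}(n^{-1})\), matching the lemma since here \(\deg (v)/(\deg (u)+\deg (v))=1/2\). The sketch below (illustrative; this computation is ours, not part of the proof) finds the fixed point by value iteration.

```python
def escape_probability_Kn(n, iters=10000):
    """P_u[tau_v < tau_u^+] on the complete graph K_n by value iteration on the
    harmonic function h(w) = P_w[tau_v < tau_u], which by symmetry is a single
    number on V \ {u, v} (its exact fixed point is 1/2)."""
    h = 0.0
    for _ in range(iters):
        # from a generic w: one of the n-1 neighbors is v (success), one is u
        # (failure), and the remaining n-3 are again generic
        h = (1.0 + (n - 3) * h) / (n - 1)
    return (1.0 + (n - 2) * h) / (n - 1)   # first step from u
```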

Corollary 3.3

Suppose that H is a loopless multigraph on n vertices that is a \(\gamma \)-expander and that the number of parallel edges between any two vertices is at most \(f \geqslant 1\). Then for any two vertices \(u \ne v\)

$$\begin{aligned} \mathcal {R}_{\mathrm {eff}}(u \leftrightarrow v) = (1 + \mathsf {O}( \gamma ^{-8} f^8 n^{-1} \log (n) ))\Bigg ( {1 \over \deg (u)} + {1 \over \deg (v)} \Bigg ) . \end{aligned}$$

Proof

Follows immediately by (23) and Lemma 3.2 together with the fact that all degrees are at least \(\gamma (n-1)\) and at most fn. \(\square \)
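For a concrete instance of Corollary 3.3, on \(K_n\) formula (23) together with \(\mathbb {P}_u[\tau _v<\tau _u^+]=\frac{n}{2(n-1)}\) (a standard symmetry computation, not taken from the text) gives \(\mathcal {R}_{\mathrm {eff}}(u\leftrightarrow v)=2/n\), which indeed equals \((1+\mathsf {O}(n^{-1}))\big ({1\over \deg (u)}+{1\over \deg (v)}\big )=(1+\mathsf {O}(n^{-1}))\cdot \frac{2}{n-1}\). A minimal sketch:

```python
def resistance_Kn(n):
    """R_eff(u <-> v) on K_n via (23): with P_u[tau_v < tau_u^+] = n/(2(n-1))
    (a symmetry computation) and deg(u) = n-1, formula (23) gives exactly 2/n."""
    p_escape = n / (2.0 * (n - 1))
    return 1.0 / ((n - 1) * p_escape)
```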

We now extend Corollary 3.3 to the setting of a general dense graph.

Lemma 3.4

Suppose that G is a loopless multigraph with n vertices given together with a \((\gamma ,\varepsilon ^5,\varepsilon ^5)\)-expander decomposition \(V(G)=V_0\sqcup V_1\sqcup \ldots \sqcup V_k\). Assume further that the maximal number of parallel edges between any pair of vertices is at most \(f\geqslant 1\), and that \(\gamma ^{-8} f^8 n^{-1}\log n \leqslant \varepsilon \). Let \(i \in [k]\) and let \(u \ne v\) be two distinct vertices of \(V_i\). Then

$$\begin{aligned} (1- \mathsf {O}(\varepsilon )) \Bigg ( {1 \over \deg (u)} + {1 \over \deg (v)} \Bigg ) \leqslant \mathcal {R}_{\mathrm {eff}}(u \leftrightarrow v) \leqslant (1+ \mathsf {O}(\varepsilon )) \Bigg ( {1 \over \deg (u; V_i)} + {1 \over \deg (v; V_i)} \Bigg ), \end{aligned}$$

and if in addition u and v are \((\alpha ,\varepsilon )\)-good, then

$$\begin{aligned} \mathcal {R}_{\mathrm {eff}}(u \leftrightarrow v) = (1 + \mathsf {O}(\varepsilon )) \Bigg ( {1 \over \deg (u)} + {1 \over \deg (v)} \Bigg ) . \end{aligned}$$

Proof

Since \(G[V_i]\) is a \(\gamma \)-expander on at least \(\gamma n\) vertices, by Corollary 3.3 together with Rayleigh’s monotonicity (21) we have

$$\begin{aligned} \mathcal {R}_{\mathrm {eff}}(u \leftrightarrow v)\leqslant & {} (1 + \mathsf {O}( \gamma ^{-8} f^8 n^{-1} \log (n) )) \Bigg ( {1 \over \deg (u; V_i)} + {1 \over \deg (v; V_i)} \Bigg ) , \end{aligned}$$

giving the upper bound of the first assertion of the lemma. The upper bound of the second assertion immediately follows using part (b) of Definition 2.9.

For the lower bound we will use Dirichlet’s principle (22) and let \(h:V(G) \rightarrow [0,1]\) be the function assigning \(h(v)=1\), \(h(u)=0\) and for any vertex \(x \not \in \{u,v\}\) we put \(h(x) = \deg (v)/(\deg (u) + \deg (v))\). By our assumption there are at most f edges (xy) such that \(x=u\) and \(y=v\) in which \(h(y)-h(x)=1\). Next, there are at most \(\deg (u)\) edges (xy) for which \(x=u\) and \(h(y)-h(x) = \deg (v)/(\deg (u)+\deg (v))\). Similarly, there are at most \(\deg (v)\) edges (xy) for which \(y=v\) and \(h(y)-h(x)=\deg (u)/(\deg (u)+\deg (v))\). All other edges (xy) of the graph have \(h(x)-h(y)=0\). Hence by (22) we have

$$\begin{aligned} \mathcal {R}_{\mathrm {eff}}(u \leftrightarrow v)^{-1}\leqslant & {} f + {\deg (u) \deg (v)^2 + \deg (v) \deg (u)^2 \over (\deg (u) + \deg (v))^2}\\\leqslant & {} {\deg (u)\deg (v) \over \deg (u) +\deg (v)} (1+ \gamma ^{-4} f^2n^{-1}) \, , \end{aligned}$$

where we used the fact that since each \(V_i\) is a \(\gamma \)-expander, its cardinality must be \(\mathsf {\Omega }(\gamma n)\) and hence \(\deg (u)\) and \(\deg (v)\) can be bounded below by \(\mathsf {\Omega }(\gamma ^2 n)\) and above by fn. This gives the required lower bound. \(\square \)

3.3 The Density of Fixed Trees in the UST

The main result of this section, Lemma 3.12, allows us to express the frequency of a given fixed rooted tree T in a graph using a discrete analogue of the parameter \(\mathrm {Freq}\), see Definition 3.6 below.

Let us introduce some definitions and a setting that will be used throughout this section. In what follows we are always given an arbitrary \(\beta >0\) and a nondegenerate graphon W. We then apply Lemma 2.11 and extract the corresponding \(\alpha ,\varepsilon ,\gamma \) and \(\xi \) so that if G is a connected graph on n vertices and n is sufficiently large (as a function of \(\alpha , \varepsilon \) and \(\gamma \)) and \(\delta _\square (G,W)\leqslant \xi \), then G has a \((\beta , \alpha ,\gamma ,\varepsilon )\)-good decomposition as in Definition 2.10. We denote the given expander decomposition of G by \(V(G)=V_0\sqcup V_1\sqcup \ldots \sqcup V_k\). In light of the quantification of Lemma 2.11 we may assume that

$$\begin{aligned} \beta \gg \alpha \gg \varepsilon \gg \gamma \gg \xi \gg n^{-1} . \end{aligned}$$

Next, let T be a finite rooted tree of height r with \(\ell \) vertices denoted by \(1, \ldots , \ell \), so that 1 is the root of T. Given a graph G we say that \(\ell \) distinct vertices \((v_1, \ldots , v_\ell )\) of G are compatible with T if the pairs

$$\begin{aligned} T(v_1,\ldots ,v_\ell ) := \big \{ (v_q, v_t) : (q,t) \in E(T) \big \} \, , \end{aligned}$$

are all edges of G. Without loss of generality we may assume that the numbering \(\{1,\ldots , \ell \}\) of the vertices of T is such that there exists some \(p\in \{2,\ldots , \ell \}\) such that the vertices of distance r from the root (which all must be leaves) are \(p,\ldots , \ell \).

Definition 3.5

Assume the setting as introduced at the beginning of Sect. 3.3. Given some fixed \(\ell \geqslant 1\) and \(i\in [k]\) we say that an \(\ell \)-tuple of distinct vertices of G is i-pure if each vertex in the \(\ell \)-tuple belongs to \(V_i\) and is \((\alpha ,\varepsilon )\)-good.

Next we define \(\mathrm {Freq}(T;G)\), the discrete analogue of \(\mathrm {Freq}(T;W)\). Note that the notation \(\mathrm {Freq}(T;G)\) does not reflect the fact that this parameter depends on the partition, and not just on the graph G.

Definition 3.6

Assume the setting as introduced at the beginning of Sect. 3.3. For each \(i\in [k]\) we define

$$\begin{aligned} \mathrm {Freq}(T;G, i):=|\mathrm {Stab}_T|^{-1}\sum _{\begin{array}{c} (v_1,\ldots ,v_\ell ) \\ i\text {-pure} \\ \text {compatible with } T \end{array}} |V_i|^{-1}\exp \left( -\sum _{j=1}^{p-1} b(v_j) \right) \frac{\sum _{j=p}^\ell \deg (v_j) }{\prod _{j=1}^\ell \deg (v_j)}\;, \end{aligned}$$

where

$$\begin{aligned} b(v) = \sum _{u \in V_{i(v)}, u \sim v} {1 \over \deg (u)}. \end{aligned}$$
(33)

Note that (33) is a graph counterpart to the graphon quantity defined in (2).

Finally, let

$$\begin{aligned} \mathrm {Freq}(T;G):= \sum _{\begin{array}{c} i\in [k]: V_i \mathrm {\ is \ } (\alpha ,\varepsilon )-\mathrm {big} \end{array}} {|V_i| \over n} \cdot \mathrm {Freq}(T;G, i) \, .\end{aligned}$$
(34)

We denote by \(\mathcal {T}\) a sample of the UST of G and by \(B_\mathcal {T}(v,r)\) the graph-distance ball in \(\mathcal {T}\) of radius r around \(v\in V(G)\), and we think of it as a subset of edges of \(\mathcal {T}\). We will begin our proof by estimating the probability that \(B_\mathcal {T}(v,r)\) is manifested on an i-pure \(\ell \)-tuple of vertices (as in Definition 3.5) that is compatible with T; later we will see that all other manifestations of \(B_\mathcal {T}(v,r)\) are negligible. For an i-pure \(\ell \)-tuple that is compatible with T we write \(B_\mathcal {T}(v_1,r){\mathop {\cong }\limits ^{\mathrm {pure}}}T(v_1,\ldots ,v_\ell )\) for the event

  • the edges \(T(v_1,\ldots ,v_\ell )\) are in \(\mathcal {T}\), and,

  • for each \(1 \leqslant j \leqslant p-1\), the edges emanating from \(v_j\) that are not in \(T(v_1,\ldots ,v_\ell )\) and have both endpoints in \(V_i\) are not in \(\mathcal {T}\).

Lemma 3.7

Assume the setting as introduced at the beginning of Sect. 3.3, and let \(\mathcal {T}\) be a UST of G. Let T be a fixed rooted tree with \(\ell \geqslant 2\) vertices \(1,\ldots ,\ell \) and height \(r \geqslant 1\) (as usual T is rooted at 1). Assume that the vertices at height r of T are \(\{p,\ldots , \ell \}\) for some \(2 \leqslant p \leqslant \ell \). Then for any \(i \in [k]\) and any i-pure \(\ell \)-tuple \((v_1,\ldots ,v_\ell )\) that is compatible with T we have that

$$\begin{aligned} \mathbb {P}\big ( B_\mathcal {T}(v_1,r){\mathop {\cong }\limits ^{\mathrm {pure}}}T(v_1,\ldots ,v_\ell ) \big ) = (1 + \mathsf {O}(\ell \alpha ^{1/4})) \exp \left( -\sum _{j=1}^{p-1} b(v_j) \right) \cdot \frac{\sum _{j=p}^\ell \deg (v_j) }{\prod _{j=1}^\ell \deg (v_j)} \, , \end{aligned}$$

where b(v) is defined in (33).

Proof

Assume that n is large enough as in Lemma 3.4. We first show by induction that the probability that \(T(v_1,\ldots ,v_\ell ) \subseteq E(\mathcal {T})\) is

$$\begin{aligned} (1 + \mathsf {O}(\ell \varepsilon )) {\sum _{j=1}^\ell \deg (v_j) \over \prod _{j=1}^\ell \deg (v_j)} \, .\end{aligned}$$
(35)

Indeed, when T contains only one edge this statement follows immediately from Kirchhoff’s formula (24) and Lemma 3.4. If T has more than one edge, assume without loss of generality that \(\ell \) is a leaf of T at distance r from the root and that \((q,\ell )\) is an edge of T. We then use the induction hypothesis on the tree \(T{\setminus } \{\ell \}\), which has \(\ell -1\) vertices. We condition on the event \(T(v_1,\ldots ,v_\ell ) {\setminus } \{(v_q,v_\ell )\} \subseteq E(\mathcal {T})\) and, using Proposition 3.1, contract the vertices \(v_1, \ldots , v_{\ell -1}\) to a single vertex of degree \(\deg (v_1) + \cdots + \deg (v_{\ell -1})\). Since the contracted multigraph is still a \(\gamma \)-expander with at most \(\ell \) parallel edges between any two vertices, and the vertices \((v_1,\ldots ,v_\ell )\) are all \((\alpha ,\varepsilon )\)-good, we may apply Lemma 3.4 and Kirchhoff’s formula (24) to get that the probability that the edge \((v_{q}, v_\ell )\) of G is in \(\mathcal {T}\) is

$$\begin{aligned} (1-\mathsf {O}(\varepsilon )) \Bigg ( {1 \over \deg (v_\ell )} + {1 \over \deg (v_1) + \cdots + \deg (v_{\ell -1})} \Bigg ) . \end{aligned}$$

By our induction hypothesis we get that the required probability is

$$\begin{aligned} (1-\mathsf {O}((\ell -1)\varepsilon ))(1-\mathsf {O}(\varepsilon )) {\sum _{q=1}^{\ell -1} \deg (v_q) \over \prod _{q=1}^{\ell -1} \deg (v_q)} \Bigg ( {1 \over \deg (v_\ell )} + {1 \over \deg (v_1) + \cdots + \deg (v_{\ell -1})} \Bigg ) , \end{aligned}$$

concluding the proof of (35).
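As a sanity check of (35), one can compute the containment probability exactly on the complete graph \(K_n\), where the UST is a uniform labeled tree. The sketch below (illustrative only; \(K_n\) is not the graph G of the lemma, and the helper names are ours) counts spanning trees via the Matrix-Tree theorem, using the standard fact that spanning trees containing a fixed tree on a vertex set S are in bijection with spanning trees of the graph with S contracted to a point.

```python
import numpy as np

def tree_count(adj):
    # Matrix-Tree theorem: the number of spanning trees equals any
    # cofactor of the graph Laplacian (valid for weighted multigraphs).
    L = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.det(L[1:, 1:])

def contract_set(adj, S):
    # Contract the vertex set S to a single vertex; self-loops are
    # discarded (they do not affect the spanning-tree count) and
    # parallel edges are kept as edge multiplicities.
    rest = [i for i in range(adj.shape[0]) if i not in S]
    m = len(rest) + 1
    new = np.zeros((m, m))
    new[1:, 1:] = adj[np.ix_(rest, rest)]
    for j, i in enumerate(rest, start=1):
        new[0, j] = new[j, 0] = adj[list(S), i].sum()
    return new

n, ell = 20, 3
adj = np.ones((n, n)) - np.eye(n)        # the complete graph K_n
# Spanning trees containing the fixed path 0-1-2 are exactly the
# spanning trees of the graph with {0,1,2} contracted.
p_exact = tree_count(contract_set(adj, {0, 1, 2})) / tree_count(adj)
p_formula = (ell * (n - 1)) / (n - 1) ** ell   # (sum deg)/(prod deg) of (35)

assert abs(p_exact - ell * n ** (1 - ell)) < 1e-9  # generalized Cayley: l*n^(1-l)
assert abs(p_formula / p_exact - 1) < 0.15         # agreement up to a 1+O(l/n) factor
```

For \(K_n\) the exact probability is \(\ell n^{1-\ell }\), while \((\sum _j \deg (v_j))/(\prod _j \deg (v_j)) = \ell (n-1)^{2-\ell }\); the two agree up to a \(1+\mathsf {O}(\ell /n)\) factor, in line with the error term of (35).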

We now condition on the event \(T(v_1,\ldots ,v_\ell ) \subseteq E(\mathcal {T})\) and turn to compute the probability that all other edges of G emanating from \(v_1,\ldots ,v_{p-1}\) which have both endpoints in \(V_i\) are not in \(\mathcal {T}\). That is, the event that \(E_{v_j} \cap \mathcal {T}= \emptyset \) for all \(1 \leqslant j \leqslant p-1\), where \(E_{v_j}\) are all the edges of G between \(v_j\) and \(V_i{\setminus } \{v_1,\ldots ,v_\ell \}\).

Let j be an integer, \(1 \leqslant j \leqslant p-1\), and condition additionally on the event \((E_{v_1} \cup \cdots \cup E_{v_{j-1}}) \cap \mathcal {T}= \emptyset \) (if \(j=1\) there is no further conditioning). We will prove that under this conditioning the probability that \(E_{v_j} \cap \mathcal {T}= \emptyset \) is

$$\begin{aligned} (1+ \mathsf {O}(\alpha ^{1/4})) \exp \left( -b(v_j)\right) \cdot { \sum _{q=j+1}^\ell \deg (v_q) \over \sum _{q=j}^\ell \deg (v_q) } \, .\end{aligned}$$
(36)

Thus, once (36) is proved, by multiplying it over \(j=1,\ldots ,p-1\) we get that, conditioned on \(T(v_1,\ldots ,v_\ell ) \subseteq E(\mathcal {T})\), the probability that all the required edges are not in \(\mathcal {T}\) is

$$\begin{aligned} (1+ \mathsf {O}(\ell \alpha ^{1/4})) \exp \left( -\sum _{j=1}^{p-1} b(v_j)\right) \cdot {\deg (v_p)+\cdots +\deg (v_\ell ) \over \deg (v_1)+\cdots +\deg (v_\ell )} . \end{aligned}$$

We multiply (35) by this and get the required assertion of the lemma.

We are left to prove (36). Enumerate the neighbors of \(v_j\) in \(V_i\) which are not in \(\{v_1,\ldots ,v_\ell \}\) by \(u_1,\ldots ,u_{\deg (v_j; V_i)}\). For each \(1 \leqslant m \leqslant \deg (v_j; V_i)\) we condition on the first \(m-1\) of these edges being closed (that is, not in \(\mathcal {T}\)); by Proposition 3.1 we may remove these edges from the graph, and after this removal the graph remains a \({\gamma \over 2}\)-expander by Proposition 2.2. Thus, by Lemma 3.4 we have that in this conditioned graph the effective resistance between the endpoints of the m-th edge is

$$\begin{aligned} (1+\mathsf {O}(\varepsilon )) \Bigg ( r_m + {1 \over \sum _{q=j}^\ell \deg (v_q) - (m-1) } \Bigg ) ,\end{aligned}$$
(37)

where \(r_m\) satisfies

$$\begin{aligned} {1 \over \deg (u_m)} \leqslant r_m \leqslant {1 \over \deg (u_m; V_i)} .\end{aligned}$$
(38)

We do not necessarily know whether \(u_m\) is \((\alpha ,\varepsilon )\)-good itself, which is why the two bounds on \(r_m\) need not be within a factor of \(1+\mathsf {O}(\varepsilon )\) of each other. However, we will use the fact that \(v_j\) is \((\alpha ,\varepsilon )\)-good and appeal to property (c) of that definition.

By (37), the probability that all other edges emanating from \(v_j\) are closed, conditioned on this already occurring for \(v_1,\ldots ,v_{j-1}\), is

$$\begin{aligned} \prod _{m=1}^{\deg (v_j; V_i)} \Bigg [ 1- (1+\mathsf {O}(\varepsilon )) \Bigg ( r_m + {1 \over \sum _{q=j}^\ell \deg (v_q) - (m-1) } \Bigg ) \Bigg ] , \end{aligned}$$

which equals

$$\begin{aligned} \exp \Big (-(1+\mathsf {O}(\varepsilon )) \sum _m r_m \Big ) \cdot \exp \Bigg (-(1+\mathsf {O}(\varepsilon )) \sum _m {1 \over \sum _{q=j}^\ell \deg (v_q) - (m-1) }\Bigg ).\qquad \end{aligned}$$
(39)

Since \(v_j\) is \((\alpha ,\varepsilon )\)-good, by property (c) of the definition we learn that

$$\begin{aligned} \sum _{m} r_m = \big (1+\mathsf {O}(\alpha ^{1/2})\big ) \sum _m {1 \over \deg (u_m)} = \big (1+\mathsf {O}(\alpha ^{1/2})\big ) b(v_j). \end{aligned}$$

Property (d) of the same definition asserts that \(b(v_j) \leqslant \alpha ^{-1/4}\), hence the first term in (39) is

$$\begin{aligned} \exp \Big (-(1+\mathsf {O}(\varepsilon )) \sum _m r_m \Big ) = \Big (1+\mathsf {O}(\alpha ^{1/4})\Big ) e^{-b(v_j)} . \end{aligned}$$

To handle the second term of (39) we note that

$$\begin{aligned} \sum _m {1 \over \sum _{q=j}^\ell \deg (v_q) - (m-1) } = \log \Bigg ( {\sum _{q=j}^\ell \deg (v_q) \over \sum _{q=j+1}^\ell \deg (v_q) } \Bigg ) + \mathsf {O}((\varepsilon n)^{-1}) . \end{aligned}$$

Since \(\{v_1,\ldots ,v_\ell \}\) are \((\alpha ,\varepsilon )\)-good we have that the ratio inside the logarithm is at most \(1+\ell \varepsilon ^{-1}\) and so the second term of (39) equals

$$\begin{aligned} \Big (1+\mathsf {O}(\varepsilon \log (\varepsilon ^{-1}))\Big ) { \sum _{q=j+1}^\ell \deg (v_q) \over \sum _{q=j}^\ell \deg (v_q) } . \end{aligned}$$

We multiply these two terms of (39) and get that (36) holds. \(\square \)
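The logarithmic estimate used for the second factor of (39) is a standard harmonic-sum computation; the following one-liner (with hypothetical degree totals of our choosing) confirms it numerically.

```python
import math

# Check of the telescoping estimate: sum_{m=1}^{M} 1/(D - (m-1)) equals
# log(D / (D - M)) up to an O(1/(D - M)) error. Here D plays the role of
# sum_{q >= j} deg(v_q) and D - M that of sum_{q >= j+1} deg(v_q);
# the numbers are hypothetical.
D, M = 10_000, 6_000
s = sum(1.0 / (D - (m - 1)) for m in range(1, M + 1))
assert abs(s - math.log(D / (D - M))) < 1.0 / (D - M)
```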

Using Lemma 3.7 we may estimate the probability that \(B_\mathcal {T}(X,r) \cong T\) “purely”, that is, that \(E(B_\mathcal {T}(X,r)) \cap E(G[V_i])\) is a rooted tree that is isomorphic to T. We denote this event by \(B_\mathcal {T}(X,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T\). Note that \(E(B_\mathcal {T}(X,r)) \cap E(G[V_i])\) need not even be a tree, so the events \(B_\mathcal {T}(X,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T\) and \(B_\mathcal {T}(X,r) \cong T\) can be very different.

Corollary 3.8

Assume the setting as introduced at the beginning of Sect. 3.3, and let \(\mathcal {T}\) be a UST of G. Fix \(i\in [k]\) and let X be a uniformly chosen random vertex of \(V_i\). Then

$$\begin{aligned} \mathbb {P}\Big (B_\mathcal {T}(X,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T\Big ) = (1+\mathsf {O}(\ell \alpha ^{1/4})) \mathrm {Freq}(T;G, i) \, , \end{aligned}$$

where \(\mathrm {Freq}(T;G, i)\) is defined in Definition 3.6.

Proof

This is immediate since the event \(E(B_\mathcal {T}(X,r)) \cap E(G[V_i])\cong T\) is the union over all i-pure tuples \((v_1,\ldots ,v_\ell )\) of the events \(B_\mathcal {T}(v_1,r){\mathop {\cong }\limits ^{\mathrm {pure}}}T(v_1,\ldots ,v_\ell )\). These events are not disjoint, since for each automorphism \(\tau \) of T that fixes the root, we can permute \((v_1,\ldots ,v_\ell )\) according to \(\tau \) and get the identical event. However, up to this invariance, the events are disjoint (which explains the factor of \(|\mathrm {Stab}_T|^{-1}\) in the definition of \(\mathrm {Freq}(T;G,i)\)) and so Lemma 3.7 gives the proof. \(\square \)

Corollary 3.8 is an annealed statement, that is, the probability space is the product of the UST probability measure and an independent uniform vertex X. A quenched statement follows by a second moment argument. We will first need the following assertion.

Lemma 3.9

Suppose that G is a connected graph on n vertices and \(S \subseteq V(G)\). Let \(\mathcal {T}\) be a spanning tree of G and let X be a uniformly chosen vertex of some set \(A\subseteq V(G)\). Then for any fixed rooted tree T of height r we have that

$$\begin{aligned} \mathbb {P}\left( B_\mathcal {T}(X,r) \cong T \mathrm {\ and \ }V(B_\mathcal {T}(X,r)) \cap S \ne \emptyset \right) \leqslant |T|^2|A|^{-1}|S| . \end{aligned}$$

Furthermore, if in addition G has some decomposition \(V(G)=V_0\sqcup V_1\sqcup \ldots \sqcup V_k\), then for each \(i\in [k]\),

$$\begin{aligned} \mathbb {P}\left( B_\mathcal {T}(X,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T \mathrm {\ and \ }V(B_\mathcal {T}(X,r)) \cap V_i \cap S \ne \emptyset \right) \leqslant |T|^2|A|^{-1}|S| . \end{aligned}$$

Proof

We prove only the first statement; the second follows by the same logic. Fix \(v \in S\). For each vertex q of T, if \(\mathcal {T}\) contains an induced copy of T in which v takes the place of q, then the probability that \(B_\mathcal {T}(X,r) \cong T\) with v in the place of q is at most \(\deg _T(q)/|A|\): once we choose which edge of \(\mathcal {T}\) touching v corresponds to the edge of T leading from q towards the root of T, the corresponding root in \(\mathcal {T}\) is determined, and X has to be chosen to be precisely that root. We bound this probability by |T| / |A| and use the union bound over the vertices q of T and v of S. \(\square \)

We are now ready to prove the quenched version of Corollary 3.8. Observe that without the assumption that \(V_i\) is \((\alpha ,\varepsilon )\)-big, \(\mathrm {Freq}(T;G,i)\) can be very small or even 0 (for instance, \(V_i\) could have size less than \(\varepsilon ^5 n\), and if all the vertices of \(V_0\) are connected to each vertex of \(V_i\), then \(V_i\) has no \((\alpha ,\varepsilon )\)-good vertices at all). Thus, we cannot hope to have the concentration required for the following quenched statement without this assumption.

Lemma 3.10

Assume the setting as introduced at the beginning of Sect. 3.3. Let \(\mathcal {T}\) be a UST of G. Fix \(i\in [k]\) and assume that \(V_i\) is \((\alpha ,\varepsilon )\)-big. Let X be a uniformly chosen vertex of \(V_i\). Then with probability at least \(1-\mathsf {O}(\ell \alpha ^{1/8})\) the random tree \(\mathcal {T}\) is such that

$$\begin{aligned} \mathbb {P}\left( B_\mathcal {T}(X,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T\right) = (1+\mathsf {O}(\alpha ^{1/16})) \mathrm {Freq}(T;G, i) . \end{aligned}$$

Proof

We denote by \(Z=Z(\mathcal {T})\) the \(\mathcal {T}\)-measurable random variable counting the number of vertices v of \(V_i\) such that \(B_\mathcal {T}(v,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T\). Corollary 3.8 is equivalent to the assertion that

$$\begin{aligned} \mathbb {E}Z = (1+\mathsf {O}(\ell \alpha ^{1/4})) \mathrm {Freq}(T;G, i) \cdot |V_i| .\end{aligned}$$
(40)

The second moment of Z is

$$\begin{aligned} \mathbb {E}Z^2 = \sum _{v, v' \in V_i} \mathbb {P}\left( B_\mathcal {T}(v,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T \mathrm {\ and \ }B_\mathcal {T}(v',r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T\right) .\end{aligned}$$
(41)

We will show that for any \(v\in V_i\),

$$\begin{aligned} \sum _{v' \in V_i} \mathbb {P}\left( B_\mathcal {T}(v', r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T \, \mid B_\mathcal {T}(v,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T \right) \leqslant (1+\mathsf {O}(\ell \alpha ^{1/4}))\mathrm {Freq}(T;G,i)|V_i|.\quad \end{aligned}$$
(42)

If we have this, then since Corollary 3.8 gives that

$$\begin{aligned} \sum _{v \in V_i} \mathbb {P}\left( B_\mathcal {T}(v,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T \right) = (1+\mathsf {O}(\ell \alpha ^{1/4})) \mathrm {Freq}(T;G,i) \cdot |V_i| \, , \end{aligned}$$

by putting both in (41) we get that

$$\begin{aligned} \mathbb {E}Z^2 = (1+\mathsf {O}(\ell \alpha ^{1/4})) \big [ \mathrm {Freq}(T;G, i) \cdot |V_i| \big ]^2 . \end{aligned}$$

Hence

$$\begin{aligned} \mathrm {Var}(Z) = \mathsf {O}(\ell \alpha ^{1/4}) \big [ \mathrm {Freq}(T;G, i) \cdot |V_i| \big ]^2 \, , \end{aligned}$$

and so by Chebyshev’s inequality we learn that

$$\begin{aligned} \mathbb {P}\left( |Z - \mathbb {E}Z| \geqslant \alpha ^{1/16} \mathrm {Freq}(T;G, i) \cdot |V_i| \right) \leqslant \mathsf {O}(\ell \alpha ^{1/8}) \, , \end{aligned}$$

concluding the proof.
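The concentration step above is plain Chebyshev arithmetic: a variance of order \(\alpha ^{1/4}\mu ^2\) (with \(\mu = \mathrm {Freq}(T;G,i)\cdot |V_i|\)) yields a deviation bound at scale \(\alpha ^{1/16}\mu \) with failure probability of order \(\alpha ^{1/4}/(\alpha ^{1/16})^2 = \alpha ^{1/8}\). A minimal numerical sketch, with hypothetical numbers and Gaussian noise standing in for Z:

```python
import random

random.seed(0)
# If Z has mean mu and Var(Z) = eps * mu^2, Chebyshev gives
# P(|Z - mu| >= delta * mu) <= eps / delta^2.
eps, mu = 1e-4, 1.0
delta = eps ** 0.25              # analogous to alpha^{1/16} vs alpha^{1/4}
samples = [random.gauss(mu, (eps ** 0.5) * mu) for _ in range(100_000)]
bad = sum(abs(z - mu) >= delta * mu for z in samples) / len(samples)
assert bad <= eps / delta ** 2   # Chebyshev bound: eps/delta^2 = sqrt(eps)
```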

To prove (42) we fix \(v \in V_i\) and condition on the event \(B_\mathcal {T}(v,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T\) and on the vertices \((v_1,\ldots ,v_\ell )\) which form \(B_\mathcal {T}(v,r)\). By Proposition 3.1 we contract the edges of \(\mathcal {T}\) in \(B_\mathcal {T}(v,r)\) and erase the edges that we conditioned on not being in \(\mathcal {T}\). Denote the resulting multigraph by \(G^*\) and by \(v^*\) the vertex into which the vertices \(v_1,\ldots ,v_\ell \) have been identified. We consider the loopless multigraph \(G^*\) with the partition

$$\begin{aligned} V(G^*) = V_0\sqcup V_1\sqcup \ldots \sqcup V_{i-1}\sqcup V_i^* \sqcup V_{i+1}\sqcup \ldots \sqcup V_k \, , \end{aligned}$$

where \(V_i^*\) is \(V_i\) with the vertices \(v_1,\ldots ,v_\ell \) replaced by \(v^*\). We now claim that this partition is a \((\beta , \alpha , \gamma /2, \varepsilon )\)-good-decomposition (as in Definition 2.10). First, by Proposition 2.2 the graph \(G^*[V_i^*]\) is still a \(\gamma /2\)-expander and so the partition \(V_0\sqcup V_1\sqcup \ldots \sqcup V_{i-1}\sqcup V_i^* \sqcup V_{i+1}\sqcup \ldots \sqcup V_k\) is a \((\gamma /2,\varepsilon ^5,\varepsilon ^5)\)-expander decomposition of \(G^*\). Next we verify that the number of good vertices has not dropped by too much. Indeed, from each vertex \(u \in V_i {\setminus } \{v_1,\ldots ,v_\ell \}\) at most \(\ell \) edges touching it were erased and hence it is immediate to check in Definition 2.9 that, as long as n is large enough (in terms of \(\varepsilon \) and \(\alpha \)), if u was \((\alpha ,\varepsilon )\)-good in G, then it is \((\alpha ,\varepsilon )\)-good in \(G^*\)—the constants may have changed, but they are swallowed in the \(\mathsf {O}(\cdot )\) and \(\mathsf {\Omega }(\cdot )\) notation of Definition 2.9.

Thus we may apply Corollary 3.8 and obtain that

$$\begin{aligned} \sum _{v' \in V_i^*} \mathbb {P}\left( B_\mathcal {T}(v',r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T \mid G^* \right) = (1+\mathsf {O}(\ell \alpha ^{1/4})) \mathrm {Freq}(T; G^*, i)\cdot |V_i^*| .\end{aligned}$$
(43)

We wish to bound the sum in (42) by the above sum; however, there are two important differences between the sums. The first is the difference in the input to the \(\mathrm {Freq}\) function.

Claim 3.10.1

If \(V_i\) is \((\alpha ,\varepsilon )\)-big, then

$$\begin{aligned} c\gamma ^{\ell +1} e^{-\ell /\gamma } \leqslant \mathrm {Freq}(T; G, i) \leqslant \ell \gamma ^{-\ell -1}\, , \end{aligned}$$

for some universal constant \(c>0\).

Proof

Since \(V_i\) is \((\alpha ,\varepsilon )\)-big we have at least \((1-\mathsf {O}(\alpha ^{1/8}))|V_i|\) vertices which are \((\alpha ,\varepsilon )\)-good, and these must span at least \(\mathsf {\Omega }(\alpha ^{1/9} |V_i|n)\) edges, since the number of edges of \(G[V_i]\) touching non-\((\alpha ,\varepsilon )\)-good vertices of \(V_i\) is at most \(\mathsf {O}(\alpha ^{1/8}|V_i|n)\) and we also have \(e(G[V_i])=\mathsf {\Omega }(\alpha ^{1/9}|V_i|n)\). Thus the average degree of this graph is \(d=\mathsf {\Omega }( \alpha ^{1/9} n)\), and, by greedily removing vertices of degree at most d / 4, we can obtain a subgraph of \(G[V_i]\) of minimum degree \(\mathsf {\Omega }( \alpha ^{1/9} n)=\mathsf {\Omega }(\gamma n)\) all of whose vertices are \((\alpha ,\varepsilon )\)-good. Thus it is immediate to find \(\mathsf {\Omega }(\gamma ^\ell n^\ell )\) copies of the tree T. We deduce that the number of \(\ell \)-tuples counted in \(\mathrm {Freq}(T; G, i)\) is at least \(\mathsf {\Omega }(\gamma ^\ell n^\ell )\) and at most \(n^\ell \). Since each vertex degree in \(V_i\) is at most n and at least \(\gamma n\), we learn that each \(\ell \)-tuple contributes to \(\mathrm {Freq}(T; G,i)\) at most \(\ell \gamma ^{-\ell -1}n^{-\ell }\) and at least \(\gamma e^{-\ell /\gamma } n^{-\ell }\), and the claim follows. \(\square \)
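The greedy removal in the claim is the standard argument that a graph of average degree d contains a nonempty subgraph of minimum degree greater than d/4 (each deletion removes few edges, so the process cannot exhaust the graph). A minimal sketch of the procedure, on a hypothetical adjacency list of our own:

```python
def min_degree_subgraph(adj, threshold):
    # Repeatedly delete vertices whose current degree is <= threshold;
    # the survivors induce a subgraph of minimum degree > threshold
    # (possibly empty if the threshold is too aggressive).
    alive = set(adj)
    deg = {x: len(adj[x]) for x in alive}
    changed = True
    while changed:
        changed = False
        for x in list(alive):
            if x in alive and deg[x] <= threshold:
                alive.discard(x)
                for y in adj[x]:
                    if y in alive:
                        deg[y] -= 1
                changed = True
    return alive

# Toy example: K_5 with a pendant path 5-6 attached at vertex 0.
adj = {i: [j for j in range(5) if j != i] for i in range(5)}
adj[0] = adj[0] + [5]
adj[5] = [0, 6]
adj[6] = [5]
assert min_degree_subgraph(adj, 2) == {0, 1, 2, 3, 4}  # the clique survives
```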

By this claim, since \(V_i\) and \(V_i^*\) differ by \(\ell -1\) vertices, and \(\ell \)-tuples of \(V_i\) in which one of the vertices is in \(\{v_1,\ldots ,v_\ell \}\) can contribute at most \(\mathsf {O}(\ell \gamma ^{-\ell -1} n^{-1})\) to \(\mathrm {Freq}(T;G, i)\), we learn that

$$\begin{aligned} \mathrm {Freq}(T; G, i) = (1+ \mathsf {O}(\ell \gamma ^{-\ell -1} n^{-1})) \mathrm {Freq}(T; G^*, i) \, ,\end{aligned}$$
(44)

and n can be taken large enough so that the error in the \(\mathsf {O}\)-notation above is \(\mathsf {O}(\ell \alpha ^{1/4})\). This handles the first difference.

The second difference is that \(B_\mathcal {T}(v',r)\) may be isomorphic to T in G by using some of the edges in \(T(v_1,\ldots ,v_\ell )\) that were contracted to \(v^*\) in \(G^*\). In this case, it does not necessarily hold that \(B_\mathcal {T}(v',r) \cong T\) in \(G^*\), so it is possible that this contribution to (42) is large and not accounted for in (43). We will show that this is not the case.

We bound the LHS of (42) as follows. The terms corresponding to \(v' \in \{v_1,\ldots ,v_\ell \}\) we bound by 1 and get a negligible contribution of \(\ell \). For any other \(v'\) we split the event \(B_\mathcal {T}(v',r){\mathop {\cong }\limits ^{\mathrm {pure}}}T\) according to whether \(V(B_\mathcal {T}(v',r)) \cap V(B_\mathcal {T}(v,r)) = \emptyset \). This intersection is empty if and only if \(v^* \not \in V(B_\mathcal {T}(v',r))\) in the graph \(G^*\), and if it is empty, then it holds that \(B_\mathcal {T}(v',r){\mathop {\cong }\limits ^{\mathrm {pure}}}T\) in \(G^*\). Thus the LHS of (42) is at most

$$\begin{aligned}&\ell + \sum _{v' \in V_i^*} \mathbb {P}\left( B_\mathcal {T}(v', r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T \mid G^*\right) \\&\quad + \sum _{v' \in V_i^*} \mathbb {P}\left( B_\mathcal {T}(v', r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T \mathrm {\ and \ }v^* \in V(B_\mathcal {T}(v',r)) \mid G^*\right) .\end{aligned}$$

The first term in the above is negligible, and the second we bound by the RHS of (42) using (43) and (44). The third term we bound using Lemma 3.9 applied to the graph \(G^*\) with \(S=\{v^*\}\) and \(A=V_i^*\), giving the bound

$$\begin{aligned} \sum _{v' \in V_i^*} \mathbb {P}\left( B_\mathcal {T}(v',r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T \mathrm {\ and \ }v^*\in V(B_\mathcal {T}(v',r)) \,\, \mid \,\, G^* \right) \leqslant \ell ^2 \, , \end{aligned}$$

which is also negligible. This completes the proof of (42) and concludes the proof of the lemma.\(\square \)

We now turn to estimating the probability of the event \(B_\mathcal {T}(X,r) \cong T\) rather than that of the “pure” version of this event. To that aim we define

$$\begin{aligned} \mathcal {O}= \big \{ (x,y) \in \mathcal {T}: i(x) \ne i(y) \mathrm {\ or \ } x\in V_0 \mathrm {\ or \ } y \in V_0 \big \} \, ,\end{aligned}$$
(45)

that is, \(\mathcal {O}\) is the set of edges of \(\mathcal {T}\) that go between distinct parts of the decomposition or touch \(V_0\). We first assert that this set cannot be too large.

Lemma 3.11

Assume the setting as introduced at the beginning of Sect. 3.3. Let \(\mathcal {T}\) be a UST of G and \(\mathcal {O}\subseteq \mathcal {T}\) be defined in (45). Then

$$\begin{aligned} \mathbb {E}|\mathcal {O}| = \mathsf {O}(\alpha ^{1/4} n) . \end{aligned}$$

Proof

Let us assume that n is large enough as in Lemma 3.4. By Lemma 3.4 and Kirchhoff’s formula (24) we have that

$$\begin{aligned} \mathbb {E}|\mathcal {T}\cap E(G[V_i])|\geqslant & {} (1-\mathsf {O}(\varepsilon )) {1 \over 2} \sum _{\begin{array}{c} u \in V_i \end{array}}\sum _{\begin{array}{c} v \in V_i : v\sim u \end{array}} \left( {1 \over \deg (u)} + {1 \over \deg (v)} \right) \\= & {} (1-\mathsf {O}(\varepsilon )) \sum _{u \in V_i} {\deg (u; V_i) \over \deg (u)} \, . \end{aligned}$$

Hence

$$\begin{aligned} \mathbb {E}\left| \mathcal {T}\cap \bigcup _{i=1}^k E(G[V_i])\right| \geqslant (1-\mathsf {O}(\varepsilon )) \sum _{u \in V{\setminus } V_0} {\deg (u; V_{i(u)}) \over \deg (u)} . \end{aligned}$$

For any u that is \((\alpha ,\varepsilon )\)-good we have by part (b) of Definition 2.9 that \(\deg (u; V_{i(u)})\geqslant (1-\varepsilon ^2) \deg (u)\). Since at least \((1-\mathsf {O}(\alpha ^{1/4}))n\) vertices are \((\alpha ,\varepsilon )\)-good we deduce that

$$\begin{aligned} \mathbb {E}\left| \mathcal {T}\cap \bigcup _{i=1}^k E(G[V_i])\right| \geqslant (1-\mathsf {O}(\alpha ^{1/4}))n . \end{aligned}$$

The proof is now completed since \(|\mathcal {T}|=n-1\) with probability 1. \(\square \)

We now reach our final destination.

Lemma 3.12

Assume the setting as introduced at the beginning of Sect. 3.3. Let \(\mathcal {T}\) be a UST of G and let X be a uniformly chosen vertex of G. Then with probability at least \(1-\mathsf {O}(\ell \alpha ^{1/8})\) for the random tree \(\mathcal {T}\) we have

$$\begin{aligned} \mathbb {P}\left( B_\mathcal {T}(X,r) \cong T\right) = (1+\mathsf {O}(\ell ^2 \alpha ^{1/16})) \mathrm {Freq}(T;G) + \mathsf {O}(\ell ^2 \beta ^{1/8}) . \end{aligned}$$

Proof

Let \(\mathcal {O}\) be defined in (45). We apply Lemma 3.10, together with Lemma 3.11 and Markov’s inequality, and get that with probability at least \(1-\mathsf {O}(\ell \alpha ^{1/8})\) the random tree \(\mathcal {T}\) satisfies

  1. \(|\mathcal {O}| \leqslant \alpha ^{1/8} n\), and

  2. for each \(i\in [k]\) for which \(V_i\) is \((\alpha ,\varepsilon )\)-big we have that

    $$\begin{aligned} \mathbb {P}\left( B_\mathcal {T}(X,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T \right) = (1+\mathsf {O}(\ell \alpha ^{1/16})) \mathrm {Freq}(T;G, i) \, , \end{aligned}$$

    where X is a uniformly chosen vertex of \(V_i\).

Let X be a uniformly chosen vertex of G. By Proposition 2.12 the probability that X is in \(V_0\) or in some \(V_i\) that is not \((\alpha ,\varepsilon )\)-big is at most \(\mathsf {O}(\beta ^{1/8})\). Conditioned on \(X \in V_i\), we have that X is a uniform random vertex of \(V_i\). Hence we may sum item (2) above over these i’s and get that with probability at least \(1-\mathsf {O}(\ell \alpha ^{1/8})\) the random tree \(\mathcal {T}\) satisfies

$$\begin{aligned} \mathbb {P}(B_\mathcal {T}(X,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T) = (1+\mathsf {O}(\ell \alpha ^{1/16})) \mathrm {Freq}(T;G) + \mathsf {O}(\beta ^{1/8}) \, , \end{aligned}$$

by definition of \(\mathrm {Freq}(T;G)\).

Assume that \(\mathcal {T}\) satisfies items (1) and (2) above. If \(B_\mathcal {T}(X,r) \cong T\) but not \(B_\mathcal {T}(X,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T\), then we must have that \(B_\mathcal {T}(X,r) \cong T\) and \(V(B_\mathcal {T}(X,r)) \cap V(\mathcal {O}) \ne \emptyset \). Similarly, if \(B_\mathcal {T}(X,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T\) but not \(B_\mathcal {T}(X,r) \cong T\), then we must have that \(B_\mathcal {T}(X,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T\) and \(V(B_\mathcal {T}(X,r)) \cap V(\mathcal {O}) \ne \emptyset \). Since \(|\mathbb {P}(A) - \mathbb {P}(B)|\leqslant \mathbb {P}(A{\setminus } B) + \mathbb {P}(B{\setminus } A)\) we have that

$$\begin{aligned} \left| \mathbb {P}\left( B_\mathcal {T}(X,r) \cong T\right) - \mathbb {P}\left( B_\mathcal {T}(X,r) {\mathop {\cong }\limits ^{\mathrm {pure}}}T\right) \right| = \mathsf {O}(\ell ^2 \beta ^{1/8}) \, ,\end{aligned}$$

where the last bound follows from Lemma 3.9 applied with \(A=V(G)\) and \(S=V(\mathcal {O})\): by item (1) we have \(|S| \leqslant 2|\mathcal {O}| \leqslant 2\alpha ^{1/8} n\), and \(\alpha \ll \beta \). \(\square \)

4 Proof of Theorems 1.1 and 1.3

4.1 Deriving Theorem 1.1 from Lemma 3.12 via Anatomies

In this section, we deduce Theorem 1.1 from Lemma 3.12. The main technical result of this section, Lemma 4.3, says that the quantity \(\mathrm {Freq}\) is continuous in a certain robust sense. To prove Lemma 4.3, we need to group the elements of \(\varOmega \) for a given graphon \(W:\varOmega ^2\rightarrow [0,1]\) into groups with similar degrees. Actually, we need a recursive refinement of this, as follows. We call the above partition of \(\varOmega \) according to the degrees the anatomy of depth 1. Now, having defined the anatomy of depth d, the anatomy of depth \(d+1\) is a decomposition of \(\varOmega \) into groups whose elements have approximately the same degrees into every individual cell of the anatomy of depth d. Let us make this precise.

Suppose that \(h\in \mathbb {N}\) and S is a finite set. Let \(\mathfrak C(h,S)\) be the partition of \([0,1]^S\) into Voronoi cells generated by the points \(p\in \{0,\frac{1}{h},\ldots ,\frac{h-1}{h},1\}^S\) (we assign each boundary point to one arbitrary neighboring cell to break ties). Note that these cells form \((h+1)^{|S|}\) cubes of the form \(\prod _{i\in S} \left[ \max \{0,\frac{2r_i-1}{2h}\},\min \{1,\frac{2r_i+1}{2h}\}\right] \), for some \(\big \{r_i\in \{0,\ldots ,h\}\big \}_{i\in S}\). In the degenerate case \(S=\emptyset \), we define \(\mathfrak C(h,S):=\{\emptyset \}\). We call the points p the centers of the cells. When \(C\in \mathfrak C(h,S)\) is a Voronoi cell with center \(p=(p_i)_{i\in S}\), for \(i\in S\) we write \(center_i(C):=p_i\).
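A quick illustration of \(\mathfrak C(h,S)\) (the function names below are ours, not the paper’s): each point of \([0,1]^S\) is assigned to the nearest grid point in \(\{0,\frac{1}{h},\ldots ,1\}^S\), so there are \((h+1)^{|S|}\) cells.

```python
from itertools import product

def cell_center(x, h):
    # Nearest multiple of 1/h in each coordinate; boundary ties are broken
    # by Python's rounding, mirroring the arbitrary tie-breaking in the text.
    return tuple(round(xi * h) / h for xi in x)

h, S_size = 4, 3
centers = {p for p in product([r / h for r in range(h + 1)], repeat=S_size)}
assert len(centers) == (h + 1) ** S_size            # (h+1)^{|S|} cells
assert cell_center((0.25, 0.52, 0.9), h) == (0.25, 0.5, 1.0)
```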

Let us write \(\texttt {b}\) for an (abstract) element. Below, the only purpose of \(\texttt {b}\) will be to refer to a coordinate that will have to do with the function \(b_W(\cdot )\).

Now, suppose that \(d\in \mathbb {N}\) and \(\mathbf {h}\in \mathbb {N}^d\). For \(t\in [d]\), we write \(\mathbf {h}_t\) for the t-coordinate of \(\mathbf {h}\), and \(\mathbf {h}\llbracket t\rrbracket \) for the t-dimensional vector obtained from \(\mathbf {h}\) by removing the last \(d-t\) components. For \(h\in \mathbb {N}\), let \(\mathfrak D_{h,0}:=\mathfrak C(h,\emptyset )\) and for \(d\in \mathbb {N}\) and \(\mathbf {h}\in \mathbb {N}^d\) let \(\mathfrak D_{\mathbf {h}}:=\mathfrak C\left( \mathbf {h}_d,\mathfrak D_{\mathbf {h}\llbracket d-1\rrbracket }\sqcup \{\texttt {b}\}\right) \). We have

$$\begin{aligned} \sum _{t=0}^d\left| \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }\right| = \hbar (\mathbf {h})\;, \end{aligned}$$
(46)

for a suitable tower-function \(\hbar (\cdot )\).
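Unwinding the recursion \(\mathfrak D_{\mathbf {h}}=\mathfrak C\left( \mathbf {h}_d,\mathfrak D_{\mathbf {h}\llbracket d-1\rrbracket }\sqcup \{\texttt {b}\}\right) \), the cell counts grow as a tower of exponentials; a small sketch (the helper name is ours):

```python
def anatomy_sizes(h):
    # |D_{h,0}| = 1, and each next level is a cell system over the previous
    # level's cells together with the extra b-coordinate:
    # |D_{h[[t]]}| = (h_t + 1) ** (|D_{h[[t-1]]}| + 1).
    sizes = [1]
    for h_t in h:
        sizes.append((h_t + 1) ** (sizes[-1] + 1))
    return sizes

sizes = anatomy_sizes([2, 3])
assert sizes == [1, 9, 4 ** 10]    # 3^2 = 9, then (3+1)^(9+1)
# hbar(h) of (46) is the total number of cells over all depths:
assert sum(sizes) == 1 + 9 + 4 ** 10
```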

Suppose \(W:\varOmega ^2\rightarrow [0,1]\) is a graphon and \(h\in \mathbb {N}\) is arbitrary. Let \(\mathcal A_\emptyset :=\varOmega \). We call \(\{\mathcal A_\emptyset \}=\{\mathcal A_C\}_{C\in \mathfrak D_{h,0}}\) anatomy of W of depth 0. Suppose that \(d\in \mathbb {N}\), \(\mathbf {h}\in \mathbb {N}^d\) and that we already know the anatomy \(\{\mathcal A_C\}_{C\in \mathfrak D_{\mathbf {h}\llbracket d-1\rrbracket }}\) of W of depth \(d-1\), which is a partition of \(\varOmega \). Now, for each \(\omega \in \varOmega \) we consider the \(|\mathfrak D_{\mathbf {h}\llbracket d-1\rrbracket }|\)-tuple of degrees

$$\begin{aligned} \mathbf {deg}_{d}(\omega ):=\left( \deg _W(\omega ,\mathcal A_C)\right) _{C\in \mathfrak D_{\mathbf {h}\llbracket d-1\rrbracket }}. \end{aligned}$$

Then for each \(F\in \mathfrak D_{\mathbf {h}}\) we define \(\mathcal A_F\) to be the set of all \(\omega \in \varOmega \) for which the point \(\left( \mathbf {deg}_{d}(\omega ),\exp (-b_W(\omega ))\right) \in [0,1]^{\mathfrak D_{\mathbf {h}\llbracket d-1\rrbracket }\sqcup \{\texttt {b}\}}\) lies in the cell F. In words, each cell \(\mathcal A_F\subseteq \varOmega \) has the property that for each \(C\in \mathfrak D_{\mathbf {h}\llbracket d-1\rrbracket }\) and for each \(\omega \in \mathcal A_F\) we have that

$$\begin{aligned} \deg _W(\omega ,\mathcal A_C)=center_C(F)\pm \tfrac{1}{2\mathbf {h}_d} \quad \text{ and }\quad \exp (-b_W(\omega ))=center_{\texttt {b}}(F)\pm \tfrac{1}{2\mathbf {h}_d} . \end{aligned}$$
(47)

We call \(\{\mathcal A_F\}_{F\in \mathfrak D_{\mathbf {h}}}\) the anatomy of W of depth d. Obviously, \(\{\mathcal A_F\}_{F\in \mathfrak D_{\mathbf {h}}}\) is a partition of \(\varOmega \). Consider an arbitrary \(\omega \in \mathcal A_F\). Summing the first part of (47) over all \(C\in \mathfrak D_{\mathbf {h}\llbracket d-1\rrbracket }\) (of which, by (46), there are at most \(\hbar (\mathbf {h}\llbracket d-1\rrbracket )\)), we get

$$\begin{aligned} \deg _W(\omega )=\sum _{C\in \mathfrak D_{\mathbf {h}\llbracket d-1\rrbracket }}center_C(F)\; \pm \tfrac{\hbar (\mathbf {h}\llbracket d-1\rrbracket )}{2\mathbf {h}_d}. \end{aligned}$$
(48)

For this reason, we shall call the number \(\sum _{C\in \mathfrak D_{\mathbf {h}\llbracket d-1\rrbracket }}center_C(F)\) the degree of \(\mathcal A_F\) (in W), and denote it by \(\deg ^{\mathrm {anat}}_W(\mathcal A_F)\).
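The recursion above is easy to simulate for a graphon sampled on a finite grid. The following sketch is our own illustration, not part of the construction: it bins points by \(\lfloor h\cdot value\rfloor \) rather than by distance to prescribed cell centers, which yields the same kind of partition up to a shift of the grid, and the function name and discretization are our assumptions.

```python
import numpy as np

def anatomy(W, hs):
    """Depth-d anatomy of a graphon sampled on an n-point grid.

    W  : (n, n) symmetric array with W[i, j] ~ W(x_i, x_j)
    hs : resolutions h_1, ..., h_d of the vector h

    Returns one labelling of the n grid points per depth 0, ..., d.
    """
    n = W.shape[0]
    deg = W.mean(axis=1)                     # deg_W(x) = int W(x, y) dy
    b = (W / deg[None, :]).mean(axis=1)      # b_W(x) = int W(x, y)/deg_W(y) dy
    cells = [np.zeros(n, dtype=int)]         # depth 0: the trivial partition {Omega}
    for h in hs:
        prev = cells[-1]
        coords = []
        for c in np.unique(prev):
            mask = (prev == c).astype(float)
            # one coordinate per depth-(d-1) cell: deg_W(x, A_C), binned at scale 1/h
            coords.append(np.floor(h * (W * mask[None, :]).mean(axis=1)))
        coords.append(np.floor(h * np.exp(-b)))  # the extra b-coordinate
        # points sharing all rounded coordinates form one cell of the next depth
        _, labels = np.unique(np.stack(coords), axis=1, return_inverse=True)
        cells.append(labels)
    return cells

# two-block step graphon sampled on four grid points
W = np.array([[0.8, 0.8, 0.2, 0.2],
              [0.8, 0.8, 0.2, 0.2],
              [0.2, 0.2, 0.6, 0.6],
              [0.2, 0.2, 0.6, 0.6]])
print(anatomy(W, [10, 10])[1])   # the depth-1 partition separates the two blocks
```

On this example the two blocks have degrees 0.5 and 0.4, so already the first degree coordinate distinguishes them.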

Suppose \(U:\varOmega ^2\rightarrow [0,1]\) is a graphon and \(X\subseteq \varOmega \). Suppose that \(d\in \mathbb {N}\), \(\mathbf {h}\in \mathbb {N}^d\), and \(\kappa \geqslant 0\) are given. Let \(\mathcal B_\emptyset :=\varOmega \) and for each \(t\in [d]\) let \(\{\mathcal B_C\}_{C\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }}\) be a partition of \(\varOmega \). Suppose that for each \(t\in [d]\), each \(F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }\), each \(C\in \mathfrak D_{\mathbf {h}\llbracket t-1\rrbracket }\) and each \(\omega \in \mathcal B_F{\setminus } X\) we have

$$\begin{aligned} \left| \deg _U(\omega ,\mathcal B_C)-center_C(F)\right| \leqslant \tfrac{1}{2\mathbf {h}_t}+\kappa \quad \text{ and }\quad \left| \exp (-b_U(\omega ))-center_{\texttt {b}}(F)\right| \leqslant \tfrac{1}{2\mathbf {h}_t}+\kappa .\nonumber \\ \end{aligned}$$
(49)

We then say that \(\left\{ \{\mathcal B_F\}_{F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }}\right\} _{t=0}^{d}\) are \(\kappa \)-approximate anatomies of U up to depth d with exceptional set X.

Note that taking \(\kappa =0\) and \(X=\emptyset \) recovers the notion of anatomies.

Our next lemma says that two graphons that are close in the cut-distance have similar anatomies.

Lemma 4.1

Suppose that \(W:\varOmega ^2\rightarrow [0,1]\) is a nondegenerate graphon and \(d\in \mathbb {N}\), \(\mathbf {h}\in \mathbb {N}^d\) are arbitrary. Let \(\left\{ \{\mathcal A_F\}_{F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }}\right\} _{t=0}^{d}\) be the anatomies of W up to depth d. For every \(\kappa >0\) there exists a number \(\delta >0\) such that the following holds for every graphon \(U:\varOmega ^2\rightarrow [0,1]\) with \(\Vert W-U\Vert _\square <\delta \). There exists a set \(X\subseteq \varOmega \) of measure at most \(\kappa \) so that \(\left\{ \{\mathcal A_F\}_{F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }}\right\} _{t=0}^{d}\) are \(\kappa \)-approximate anatomies of U up to depth d with exceptional set X.

Proof

Suppose that we are given W, d, \(\mathbf {h}\), and \(\kappa \) as above.

Since W is nondegenerate, there exists \(\beta \in (0,10^{-6})\) such that the measure of the set

$$\begin{aligned} S:=\left\{ \omega \in \varOmega :\deg (\omega )<16\root 4 \of {\beta }\right\} \end{aligned}$$

is less than \(\frac{\kappa ^2}{100}\). Let \(\delta :=\min \{\frac{\beta \kappa ^4}{400},\frac{\kappa ^2}{4\hbar (\mathbf {h})^2}\}\). Suppose that U is given with \(\Vert W-U\Vert _\square <\delta \). It is enough to prove that there exists a set \(X_{\texttt {b}}\subseteq \varOmega \) of measure at most \(\frac{\kappa }{4}\) such that for each \(t\in [d]\) and each \(F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }\), we have for each \(\omega \in \mathcal A_F{\setminus } X_{\texttt {b}}\) that

$$\begin{aligned} \left| \exp (-b_U(\omega ))-center_{\texttt {b}}(F)\right| \leqslant \tfrac{1}{2\mathbf {h}_d}+\kappa \;, \end{aligned}$$
(50)

and that (with t and F as above) for each \(C\in \mathfrak D_{\mathbf {h}\llbracket t-1\rrbracket }\), all but a set of measure at most \(\frac{\kappa }{4\hbar (\mathbf {h})^2}\) of the elements \(\omega \in \mathcal A_{F}\) satisfy

$$\begin{aligned} \deg _U(\omega ,\mathcal A_C)\geqslant center_C(F)-{1 \over 2\mathbf {h}_t}- \kappa \;, \end{aligned}$$
(51)

and all but a set of measure at most \(\frac{\kappa }{4\hbar (\mathbf {h})^2}\) of the elements \(\omega \in \mathcal A_{F}\) satisfy

$$\begin{aligned} \deg _U(\omega ,\mathcal A_C)\leqslant center_C(F) +{1 \over 2\mathbf {h}_t}+ \kappa . \end{aligned}$$
(52)

The lemma will then follow from (46) by taking X to be the union of \(X_{\texttt {b}}\) together with the exceptional sets from (51), (52) over all t, C, and F. \(\square \)

The following claim clearly implies (50).

Claim 4.1.1

There exists a set \(X_{\texttt {b}}\) of measure at most \(\frac{\kappa }{4}\) such that for all \(\omega \in \varOmega {\setminus } X_{\texttt {b}}\) we have \(|b_W(\omega )-b_U(\omega )|<\kappa \).

For the proof of Claim 4.1.1, we need Claims 4.1.2–4.1.4 below.

Claim 4.1.2

Suppose that \(\varGamma :\varOmega ^2\rightarrow [0,1]\) is a graphon and that \(A\subseteq \varOmega \). Then

$$\begin{aligned} \int _{x\in \varOmega }\int _{\omega \in A} \frac{\varGamma (x,\omega )}{\deg _\varGamma (\omega )}\leqslant \mu (A). \end{aligned}$$

Proof

By Fubini’s Theorem, we have

$$\begin{aligned} \int _{x\in \varOmega }\int _{\omega \in A} \frac{\varGamma (x,\omega )}{\deg _\varGamma (\omega )}&=\int _{\omega \in A}\frac{1}{\deg _\varGamma (\omega )}\int _{x\in \varOmega } \varGamma (x,\omega )=\int _{\omega \in A}\frac{\deg _\varGamma (\omega )}{\deg _\varGamma (\omega )}\leqslant \mu (A). \end{aligned}$$

\(\square \)
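The Fubini computation can be sanity-checked numerically on a grid; for a symmetric kernel the inequality in fact holds with equality, since the inner integral runs over all of \(\varOmega \). A sketch with a hypothetical random symmetric kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
M = rng.random((n, n))
Gamma = (M + M.T) / 2                 # hypothetical symmetric kernel Gamma
deg = Gamma.mean(axis=1)              # deg_Gamma(omega), positive here
A = np.arange(n) < 30                 # a set A with mu(A) = 0.3

# (1/n^2) * sum_x sum_{omega in A} Gamma(x, omega)/deg_Gamma(omega)
integral = (Gamma[:, A] / deg[A][None, :]).sum() / n**2
print(integral, "<=", A.mean())       # the bound of Claim 4.1.2
```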

Claim 4.1.3

Suppose \(\varGamma _1,\varGamma _2:\varOmega ^2\rightarrow [0,1]\) are two graphons and that \(f:\varOmega \rightarrow [0,C]\) is an arbitrary function. Then

$$\begin{aligned} \int _{x\in \varOmega }\left| \int _{\omega \in \varOmega }f(\omega )(\varGamma _1(x,\omega )-\varGamma _2(x,\omega ))\right| \leqslant 2C\cdot \Vert \varGamma _1-\varGamma _2\Vert _\square \;. \end{aligned}$$

Proof

By [15, (8.20)] we can alternatively express \(\Vert \varGamma _1-\varGamma _2\Vert _\square \) as

$$\begin{aligned} \Vert \varGamma _1-\varGamma _2\Vert _\square&=\sup _{F,G:\varOmega \rightarrow [0,1]}\left| \int _x\int _y G(x)F(y)(\varGamma _1(x,y)-\varGamma _2(x,y))\right| \\&=\frac{1}{C}\cdot \sup _{F:\varOmega \rightarrow [0,C],G:\varOmega \rightarrow [0,1]}\left| \int _x\int _y G(x)F(y)(\varGamma _1(x,y)-\varGamma _2(x,y))\right| \\&\geqslant \frac{1}{2C}\cdot \sup _{F:\varOmega \rightarrow [0,C],G:\varOmega \rightarrow [-1,1]}\left| \int _x\int _y G(x)F(y)(\varGamma _1(x,y)-\varGamma _2(x,y))\right| \;. \end{aligned}$$

The claim follows by considering in the latter supremum functions

$$\begin{aligned} F(\cdot ) := f(\cdot ) \quad \text{ and }\quad G(\cdot ) := \mathrm {sgn}\left( \int _y f(y) (\varGamma _1(\cdot ,y)-\varGamma _2(\cdot ,y)) \right) . \end{aligned}$$

\(\square \)

For the last auxiliary claim, we shall need a further notion. Below we shall work with not necessarily symmetric \(L^1\)-functions \(K:\varOmega ^2\rightarrow [0,+\infty )\). Note that the notions of cut-norm and cut-distance extend to this setting. The following claim is then obvious.

Claim 4.1.4

Suppose that \(\varepsilon >0\) and \(\varGamma ,\varGamma ':\varOmega ^2\rightarrow [0,1]\) are two \(L^1\)-functions such that for each \(x,y\in \varOmega \), \(\max (\varGamma (x,y),\varGamma '(x,y))<(1+\varepsilon )\min (\varGamma (x,y),\varGamma '(x,y))\). Then \(\Vert \varGamma -\varGamma '\Vert _\square \leqslant \varepsilon \). \(\square \)
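On a finite grid the cut-norm supremum reduces to a maximum over pairs of vertex subsets, so Claim 4.1.4 can be verified by brute force for toy sizes (the search is exponential in n; the random kernels below are our own hypothetical example):

```python
import numpy as np
from itertools import product

def cut_norm(D):
    """Brute-force cut-norm of an n x n kernel: sup over S, T of |avg over S x T|."""
    n = D.shape[0]
    best = 0.0
    for S in product([False, True], repeat=n):
        for T in product([False, True], repeat=n):
            s, t = np.array(S), np.array(T)
            best = max(best, abs(D[np.ix_(s, t)].sum()) / n**2)
    return best

rng = np.random.default_rng(1)
n, eps = 6, 0.1
G1 = rng.uniform(0.2, 0.9, (n, n))
G2 = G1 * rng.uniform(1.0, 1.0 + eps, (n, n))  # pointwise within a (1+eps) factor
assert cut_norm(G1 - G2) <= eps                # the conclusion of Claim 4.1.4
```

The assertion holds simply because the pointwise closeness forces \(|G_1-G_2|\leqslant \varepsilon \cdot \min (G_1,G_2)\leqslant \varepsilon \) everywhere, which dominates every rectangle average.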

Proof of Claim 4.1.1

Let

$$\begin{aligned} D:=\left\{ \omega \in \varOmega {\setminus } S:\deg _W(\omega )\ne \left( 1\pm \tfrac{\beta ^{0.3}\kappa ^2}{4}\right) \deg _U(\omega )\right\} . \end{aligned}$$

Since for each \(\omega \in D\) we have \(|\deg _W(\omega )-\deg _U(\omega )|\geqslant 3\beta ^{0.55}\kappa ^2\) we get from \(\Vert W-U\Vert _\square <\frac{\beta \kappa ^4}{400}\) that \(\mu (D)\leqslant \frac{\kappa ^2}{100}\).

Put \(A:=S\cup D\). We have \(\mu (A)\leqslant \frac{\kappa ^2}{50}\).

We have

$$\begin{aligned} \int _{x\in \varOmega } \left| b_W(x)-b_U(x)\right|\leqslant & {} \int _{x\in \varOmega } \left| \int _{\omega \in \varOmega {\setminus } A}\frac{W(x,\omega )}{\deg _W(\omega )}-\frac{U(x,\omega )}{\deg _U(\omega )}\right| \nonumber \\&+\int _{x\in \varOmega } \left| \int _{\omega \in A}\frac{W(x,\omega )}{\deg _W(\omega )}-\frac{U(x,\omega )}{\deg _U(\omega )}\right| \end{aligned}$$
(53)

The second term on the right-hand side of (53) is bounded, by Claim 4.1.2, by at most \(2\mu (A)\leqslant \kappa ^2/25\). Now, let us consider the \(L^1\)-function \(\tilde{U}\), defined by

$$\begin{aligned} \tilde{U}(x,y):={\left\{ \begin{array}{ll}\frac{\deg _W(y)}{\deg _U(y)} \cdot U(x,y) \quad \text{ if } \quad y\not \in A\\ U(x,y)\quad \text{ if } \quad y\in A \end{array}\right. }. \end{aligned}$$

Observe that U and \(\tilde{U}\) satisfy the assumptions of Claim 4.1.4 with error parameter \(\frac{\beta ^{0.3}\kappa ^2}{4}\). The first term on the right-hand side of (53) can then be rewritten as

$$\begin{aligned} \int _{x\in \varOmega }\left| \int _{\omega \in \varOmega } \frac{\mathbf {1}_{\varOmega {\setminus } A}(\omega )}{\deg _W(\omega )}\cdot \left( W(x,\omega )-\tilde{U}(x,\omega )\right) \right| . \end{aligned}$$
(54)

Observe that the function \(\frac{\mathbf {1}_{\varOmega {\setminus } A}(\cdot )}{\deg _W(\cdot )}\) is bounded from above by \(\frac{1}{16\root 4 \of {\beta }}\). Claim 4.1.3 tells us that (54) is at most

$$\begin{aligned} 2\cdot \tfrac{1}{16\root 4 \of {\beta }}\cdot \Vert W-\tilde{U}\Vert _\square \leqslant \tfrac{1}{8\root 4 \of {\beta }}\cdot \left( \Vert W-U\Vert _\square +\Vert U-\tilde{U}\Vert _\square \right) \leqslant \tfrac{1}{8\root 4 \of {\beta }}\cdot \left( \delta +\tfrac{\beta ^{0.3}\kappa ^2}{4}\right) \leqslant \tfrac{\kappa ^2}{5}. \end{aligned}$$

Plugging this back into (53) we obtain that \(\int _{x\in \varOmega }|b_W(x)-b_U(x)|<\frac{\kappa ^2}{4}\). The claim now follows from Markov's inequality.

\(\square \)

It now remains to prove (51) and (52). We will only prove (52) since the proof of (51) is verbatim. So, suppose that (52) fails, i.e., the set \(X:=\{\omega \in \mathcal A_F:\deg _U(\omega ,\mathcal A_C)> center_C(F) +{1 \over 2\mathbf {h}_t}+ \kappa \}\) satisfies \(\mu (X)>\frac{\kappa }{4\hbar (\mathbf {h})^2}\). By the definition of \(\mathcal A_F\) we have for every \(\omega \in X\) that \(\deg _W(\omega ,\mathcal A_C)\leqslant center_C(F) +{1 \over 2\mathbf {h}_t}\). Therefore,

$$\begin{aligned} \int _{X\times \mathcal A_C} W&\leqslant \left( center_C(F)+{1 \over 2\mathbf {h}_t}\right) \mu (X)\quad \text{ but }\\ \int _{X\times \mathcal A_C} U&> \left( center_C(F)+{1 \over 2\mathbf {h}_t}+\kappa \right) \mu (X)\geqslant \left( center_C(F)+{1 \over 2\mathbf {h}_t}\right) \mu (X)+\frac{\kappa ^2}{4\hbar (\mathbf {h})^2}\;. \end{aligned}$$

This is a contradiction to the fact that \(\Vert U-W\Vert _\square <\delta \). \(\square \)

For the proof of Lemma 4.3 we need the following result.

Proposition 4.2

Suppose that \(W:\varOmega ^2\rightarrow [0,1]\) is a nondegenerate graphon. Then

$$\begin{aligned} \mathbb {E}_{x\in \varOmega }[b_W(x)]=1. \end{aligned}$$

Proof

By Fubini’s Theorem,

$$\begin{aligned} \mathbb {E}_{x\in \varOmega }[b_W(x)]=\int _x\left( \int _y\frac{W(x,y)}{\deg (y)}\;\mathrm {d}y\right) \mathrm {d}x=\int _y\left( \int _x \frac{W(x,y)}{\deg (y)}\mathrm {d}x\right) \mathrm {d}y=1. \end{aligned}$$

\(\square \)
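The identity is again a direct Fubini swap and can be checked numerically: in the sketch below, the grid-sampled analogue of \(b_W\) averages to 1 up to floating-point error (the random symmetric kernel is a hypothetical stand-in for W):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
M = rng.random((n, n))
W = (M + M.T) / 2                      # hypothetical symmetric sample of a graphon
deg = W.mean(axis=1)                   # deg(y); positive, so W is "nondegenerate"
b = (W / deg[None, :]).mean(axis=1)    # b_W(x) = int W(x, y)/deg(y) dy
print(b.mean())                        # ~ 1.0, as Proposition 4.2 predicts
```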

Lemma 4.3 relates the quantities \(\mathrm {Freq}(T;W)\) and \(\mathrm {Freq}^-(T;G,V_0,E_0)\), where G is a graph, \(V_0\subseteq V(G)\), \(E_0\subseteq E(G)\), and \(\mathrm {Freq}^-(T;G,V_0,E_0)\) is defined as

$$\begin{aligned}&\mathrm {Freq}^-(T;G,V_0,E_0):=|\mathrm {Stab}_T|^{-1}\\&\quad \times \sum _{\begin{array}{c} v_1,\ldots ,v_\ell \in V(G) {\setminus } V_0\\ \forall ij\in E(T):v_iv_j\in E(G){\setminus } E_0 \end{array}} \frac{1}{v(G)}\exp \left( -\sum _{j=1}^{p-1} b_G(v_j) \right) \frac{\sum _{j=p}^\ell \deg _G(v_j) }{\prod _{j=1}^\ell \deg _G(v_j)}. \end{aligned}$$
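To make the formula concrete, the following sketch evaluates \(\mathrm {Freq}^-\) by brute force in the simplest case, where T is a single rooted edge (so \(\ell =p=2\) and \(\mathrm {Stab}_T\) is trivial). The graph analogue \(b_G(v)=\sum _{u:uv\in E(G)}1/\deg _G(u)\) of \(b_W\) and the function name are our assumptions:

```python
import numpy as np
from itertools import permutations

def freq_minus_edge(A, V0=frozenset(), E0=frozenset()):
    """Freq^-(T; G, V0, E0) for the single-edge rooted tree T (root 1, leaf 2):
    ell = p = 2, Stab_T is trivial, and the sum runs over ordered pairs of
    adjacent vertices avoiding V0 and E0."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    b = (A / deg[None, :]).sum(axis=1)   # assumed graph analogue of b_W
    total = 0.0
    for v1, v2 in permutations(range(n), 2):
        if v1 in V0 or v2 in V0 or A[v1, v2] == 0:
            continue
        if (v1, v2) in E0 or (v2, v1) in E0:
            continue
        # (1/v(G)) * exp(-b(v1)) * deg(v2) / (deg(v1) * deg(v2))
        total += np.exp(-b[v1]) * deg[v2] / (deg[v1] * deg[v2]) / n
    return total

# sanity check on K_4: deg = 3 and b_G(v) = 1 everywhere, so the sum collapses
A = np.ones((4, 4)) - np.eye(4)
print(freq_minus_edge(A))            # ~ 0.3679 (= e^{-1})
```

The value \(e^{-1}\) for a complete graph is consistent with the introduction: for the UST of a complete graph, the probability that a uniform vertex has degree 1 tends to \(\mathbb {P}(\mathsf {Poisson}(1)=0)=e^{-1}\).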

We are now ready to state Lemma 4.3. It says that the parameter \(\mathrm {Freq}(T;\cdot )\) is continuous in a certain sense. While it would be possible to prove continuity of \(\mathrm {Freq}(T;\cdot )\) for nondegenerate graphons with respect to the cut-distance, here we need to relate \(\mathrm {Freq}(T;\cdot )\) of a graphon to the parameter \(\mathrm {Freq}^-(T;\cdot )\) of a graph that is close to that graphon. Let us note that while our proof of continuity is long, the statement itself is natural. Indeed, the definition (1) is an integration involving products of values of the graphon over the edges of T (similar to the way the density of T in the graphon is defined), degrees in the graphon, and the function \(b(\cdot )\) from (2), which is also defined using degrees in the graphon. As subgraph densities are continuous with respect to the cut-metric, and so is the degree sequence (cf. Lemma 5.1), it is plausible that many graph(on) parameters obtained as combinations thereof are continuous as well.

Lemma 4.3

Suppose that \(W:\varOmega ^2\rightarrow [0,1]\) is a nondegenerate graphon and T is a fixed tree. For every \(\varepsilon >0\) there exists \(a>0\) such that for every \(\chi >0\) there exists \(n_0\in \mathbb {N}\) and \(\delta >0\) such that if G is a graph on n vertices and \(V_0\subseteq V(G)\), \(E_0\subseteq E(G)\) satisfy

  (a) \(n>n_0\),

  (b) \(\delta _\square (W,G)<\delta \),

  (c) \(|V_0|\leqslant an\), \(|E_0|\leqslant an^2\), and

  (d) for each \(v\in V(G){\setminus } V_0\) we have \(\deg _G(v)\geqslant \chi n\),

then we have

$$\begin{aligned} |\mathrm {Freq}(T;W)-\mathrm {Freq}^-(T;G,V_0,E_0)|<\varepsilon . \end{aligned}$$
(55)

Proof

Suppose that W, T, and \(\varepsilon \) are given.

Let L be the height of T. Suppose that the vertices of T are \([\ell ]\), that 1 is the root of T, that the height of vertex i is denoted by \(g_i\), and that the vertices of T are enumerated so that \(0=g_1<g_2\leqslant g_3\leqslant \cdots \leqslant g_{p-1}<g_p=g_{p+1}=\cdots =g_{\ell }=L\).

In the course of deriving Theorem 1.3 from Theorem 1.1 in Sect. 4.2 we prove that \(\mathrm {Freq}(T;W)\leqslant 1\). In particular, the function \(f_W:\varOmega ^\ell \rightarrow [0,+\infty )\),

$$\begin{aligned} f_W(\omega _1,\ldots ,\omega _\ell ):= \exp \left( -\sum _{j=1}^{p-1} b_W(\omega _j) \right) \cdot \frac{\sum _{j=p}^\ell \deg _W(\omega _j) }{\prod _{j=1}^\ell \deg _W(\omega _j)} \cdot \prod _{\begin{array}{c} (i,j) \in E(T) \end{array}} W(\omega _i, \omega _j)\nonumber \\ \end{aligned}$$
(56)

is integrable. Let \(\delta _0>0\) be such that for any set \(A\subseteq \varOmega ^\ell \) with \(\mu ^{\otimes \ell }(A)<\delta _0\) we have

$$\begin{aligned} \int _A f_W\leqslant \frac{\varepsilon }{10}. \end{aligned}$$
(57)

Since W is nondegenerate, there exists a number \(\varDelta \in (0,\frac{\delta _0}{4\ell })\) such that the set

$$\begin{aligned} \varOmega _{\mathrm {small}}:=\{\omega \in \varOmega :\deg _W(\omega )<2\varDelta \} \end{aligned}$$

has measure less than \(\frac{\delta _0}{2\ell }\).

Let \(a:=\min \{\frac{\delta _0}{10\ell },\frac{\varepsilon \varDelta ^\ell }{400\ell ^\ell }\}\). Now, suppose that \(\chi \) is given.

Let \(\mathbf {h}\in \mathbb {N}^{L+1}\) and \(d',d,\tau ,\kappa >0\) satisfy

$$\begin{aligned} \min \left( \varDelta ,\varepsilon ,\chi ,\delta _0\right) \gg d'\gg d\gg \tau \gg \frac{1}{\mathbf {h}_1}\gg \frac{1}{\mathbf {h}_2}\gg \ldots \gg \frac{1}{\mathbf {h}_L}\gg \frac{1}{\mathbf {h}_{L+1}}\gg \kappa > 0.\quad \end{aligned}$$
(58)

We note that the above dependencies are tower-type, i.e.,

$$\begin{aligned} \frac{1}{\mathbf {h}_{i}}\gg \frac{\hbar (\mathbf {h}\llbracket i\rrbracket )}{\mathbf {h}_{i+1}}. \end{aligned}$$
(59)

Let \(\left\{ \{\mathcal A_{F}\}_{F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }}\right\} _{t=0}^{L+1}\) be the anatomies of W up to depth \(L+1\). Let \(\delta \) be given by Lemma 4.1 for input graphon W and parameters \(\mathbf {h}\) and \(\kappa \).

Suppose now that the graph G is given. Take a graphon representation U of G such that \(\Vert W-U\Vert _\square <\delta \). Lemma 4.1 tells us that there exists a set \(X\subseteq \varOmega \) of measure at most \(\kappa \) such that \(\left\{ \{\mathcal A_{F}\}_{F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }}\right\} _{t=0}^{L+1}\) are \(\kappa \)-approximate anatomies of U up to depth \(L+1\) with exceptional set X.

Given a vertex \(v\in V(G)\), we write \(\varOmega _v\subseteq \varOmega \) for the set representing v in U. We write \(\varLambda _{\mathrm {vert}}:=\bigcup _{v\in V_0}\varOmega _v\), and \(\varLambda _{\mathrm {edge}}=\bigcup _{uv\in E_0}\left( \varOmega _u\times \varOmega _v\cup \varOmega _v\times \varOmega _u\right) \). By assumption (c) we have

$$\begin{aligned} \mu (\varLambda _{\mathrm {vert}})\leqslant a \quad \text{ and } \quad \mu ^{\otimes 2}(\varLambda _{\mathrm {edge}})\leqslant 2a. \end{aligned}$$
(60)

To express \(\mathrm {Freq}^-(T;G,V_0,E_0)\), we introduce a counterpart to (56) that reflects U,

$$\begin{aligned} f_U(\omega _1,\ldots ,\omega _\ell ):= \exp \left( -\sum _{j=1}^{p-1} b_U(\omega _j) \right) \cdot \frac{\sum _{j=p}^\ell \deg _U(\omega _j) }{\prod _{j=1}^\ell \deg _U(\omega _j)} \cdot \prod _{\begin{array}{c} (i,j) \in E(T) \end{array}} U(\omega _i, \omega _j)\; \end{aligned}$$

and a version \(f^-_U\), which takes into the account the “deleted” vertices \(V_0\) and edges \(E_0\),

$$\begin{aligned} f^-_U(\omega ):= {\left\{ \begin{array}{ll} 0 &{}\text{ if }\,\, \exists \;\;i\in [\ell ]: \omega _i\in \bigcup _{v\in V_0}\varOmega _v \hbox { or } \exists ij\in E(T): (\omega _i,\omega _j)\in \bigcup _{vw\in E_0}\varOmega _v\times \varOmega _w,\\ f_U(\omega ) &{}\text{ otherwise. } \end{array}\right. } \end{aligned}$$

Note that the assumption (d) implies that

$$\begin{aligned} f^-_U\leqslant \frac{\ell }{\chi ^\ell } \quad \text{ on } \text{ the } \text{ whole } \text{ domain } \varOmega ^\ell . \end{aligned}$$
(61)

We say that a map \(\pi :[\ell ]\rightarrow \left\{ \mathcal A_F\right\} _{t=1,\ldots ,L+1; F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }}\) is a depth preserving anatomy assignment if for each \(i\in [\ell ]\) we have \(\pi (i)=\mathcal A_{F}\) for some \(F\in \mathfrak D_{\mathbf {h}\llbracket L+1-g_i\rrbracket }\). This means that the root 1 is assigned an anatomy cell of depth \(L+1\) while vertices further from the root are assigned cells of smaller depths. Note that depth preserving anatomy assignments partition the set \(\varOmega ^\ell \) in the sense that

$$\begin{aligned} \varOmega ^\ell =\bigsqcup _{\begin{array}{c} \pi :[\ell ]\rightarrow \left\{ \mathcal A_F\right\} _{t=1,\ldots ,L+1; F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }}\\ \mathrm {depth~preserving~anatomy~assignment} \end{array}} \prod _{j=1}^{\ell }\pi (j). \end{aligned}$$
(62)

In particular,

$$\begin{aligned} 1&=\sum _{\begin{array}{c} \pi :[\ell ]\rightarrow \left\{ \mathcal A_F\right\} _{t=1,\ldots ,L+1; F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }}\\ \mathrm {depth~preserving~anatomy~assignment} \end{array}} \prod _{j=1}^{\ell }\mu \left( \pi (j)\right) \;\text{, } \text{ and } \end{aligned}$$
(63)
$$\begin{aligned} \mu (Y)&=\sum _{\begin{array}{c} \pi :[\ell ]\rightarrow \left\{ \mathcal A_F\right\} _{t=1,\ldots ,L+1; F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }}\\ \mathrm {depth~preserving~anatomy~assignment} \end{array}} \mu \left( \pi (j^*)\cap Y\right) \prod _{j\in [\ell ]{\setminus } \{j^*\}}\mu \left( \pi (j)\right) \;\text{ for } \text{ each } j^*\in [\ell ] \text{ and } \text{ measurable } Y\subseteq \varOmega . \end{aligned}$$
(64)

We need to define three additional classes of depth preserving anatomy assignments.

  • We say that a depth preserving anatomy assignment \(\pi \) is singular if there exists \(i\in [\ell ]\) such that a positive measure of the elements \(\omega \in \pi (i)\) satisfy \(\deg _W(\omega )\leqslant \varDelta \), or \(b_W(\omega )>\varDelta ^{-1}\).

  • We say that a depth preserving anatomy assignment \(\pi \) is dense if for every vertex \(j\in [\ell ]\) and every child \(j^*\) of j we have that \(center_{\pi (j^*)}(\pi (j))>d\cdot \mu (\pi (j^*))\). If this fails for some pair \(jj^*\), then we call \(j^*\) a witness. (Of course, several witnesses may exist.)

  • We say that a depth preserving anatomy assignment \(\pi \) is non-problematic if it is dense, and it is not singular.

For a depth preserving anatomy assignment \(\pi \), we write \({\varvec{\Omega }}_\pi =\prod _{i=1}^\ell (\pi (i){\setminus } X)\).

The next claim establishes some basic properties of singular depth preserving anatomy assignments.

Claim 4.3.1

Suppose that \(\pi \) is a singular depth preserving anatomy assignment, and that \(i\in [\ell ]\) is a witness of singularity. Then we have that \(\pi (i)\subseteq \varOmega _{\mathrm {small}}\) or that \(b_W(\omega ')>\frac{1}{2\varDelta }\) for all \(\omega '\in \pi (i)\).

Proof

The definition of singular depth preserving anatomy assignment says that there exists \(\omega \in \pi (i)\) such that \(\deg _W(\omega )\leqslant \varDelta \), or \(b_W(\omega )>\varDelta ^{-1}\).

Let us first deal with the former case. A double application of (48) gives that for each \(\omega '\in \pi (i)\) we have

$$\begin{aligned} \deg _W(\omega ')\leqslant \deg _W(\omega )+\tfrac{2\hbar (\mathbf {h}\llbracket L-g_i\rrbracket )}{2\mathbf {h}_{L+1-g_i}}\leqslant \varDelta +\tfrac{\hbar (\mathbf {h}\llbracket L-g_i\rrbracket )}{\mathbf {h}_{L+1-g_i}}<2\varDelta . \end{aligned}$$

Therefore, in this case, \(\pi (i)\subseteq \varOmega _{\mathrm {small}}\).

We deal with the latter case similarly, using (47) instead of (48). We have

$$\begin{aligned} b_W(\omega ')>\varDelta ^{-1}-\frac{4}{2\mathbf {h}_{L-g_i}}\geqslant \frac{1}{2\varDelta }\quad \text{ for } \text{ all } \omega '\in \pi (i). \end{aligned}$$

\(\square \)

The next key claim says that the integrals of \(f_W\) and \(f^-_U\) over sets corresponding to non-problematic depth preserving anatomy assignments are almost the same.

Claim 4.3.2

Suppose that \(\pi :[\ell ]\rightarrow \left\{ \mathcal A_{F}\right\} _{t=1,\ldots ,L+1; F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }}\) is a non-problematic depth preserving anatomy assignment. Let

$$\begin{aligned} e^\pi _{\mathrm {vert}}&:=\sum _{i=1}^\ell \mu \left( \pi (i)\cap \varLambda _{\mathrm {vert}}\right) \cdot \prod _{k\in V(T){\setminus } \{i\}}\mu (\pi (k))\quad \text{, } \text{ and }\\ e^\pi _{\mathrm {edge}}&:=\sum _{ij\in E(T)}\mu ^{\otimes 2}\left( \varLambda _{\mathrm {edge}}\cap \left( \pi (i)\times \pi (j)\right) \right) \cdot \prod _{k\in V(T){\setminus } \{i,j\}}\mu (\pi (k)). \end{aligned}$$

Then for the quantities

$$\begin{aligned} Q_1&:=\int \limits _{{\varvec{\omega }}\in {\varvec{\Omega }}_\pi } f_W({\varvec{\omega }})\;\text{ and } \\ Q_2&:=\int \limits _{{\varvec{\omega }}\in {\varvec{\Omega }}_\pi } f^-_U({\varvec{\omega }}) \end{aligned}$$

we have \(Q_1=Q_2\cdot (1\pm d)\pm \frac{(e^\pi _{\mathrm {vert}}+e^\pi _{\mathrm {edge}})\ell }{\varDelta ^\ell }\).

Proof

Suppose that \(j\in V(T)\). If \(j\ne 1\), then we write \(\mathrm {Par}(j)\) for the parent of j in T. Let us write \(F_j\) for the tree obtained from T by erasing the edge from j to \(\mathrm {Par}(j)\) and taking the component containing j. When \(j=1\), we take \(F_j:=T\). Also, for \(j=1\) we define \(center_{\pi (j)}\left( \pi (\mathrm {Par}(j))\right) :=1\).

For \(j\in V(T)\), write

$$\begin{aligned} R_j:=\prod _{k\in V(F_j){\setminus }\{j\}} center_{\pi (k)}\left( \pi (\mathrm {Par}(k))\right) . \end{aligned}$$

Here, recall the convention that a product over the empty set is 1. This in particular applies when j is a leaf and \(j\ne 1\).

We first want to show that the quantities

$$\begin{aligned} \int \limits _{(\omega _1, \ldots , \omega _\ell )\in {\varvec{\Omega }}_\pi }\prod _{\begin{array}{c} (i,j) \in E(T) \end{array}} W(\omega _i, \omega _j) \quad \text {and}\quad \int \limits _{(\omega _1,\ldots ,\omega _\ell )\in {\varvec{\Omega }}_\pi } \prod _{\begin{array}{c} (i,j) \in E(T) \end{array}} U(\omega _i, \omega _j) \end{aligned}$$

are very close. To this end, for \(j\in V(T)\) we define \(A_j(\omega _j)=B_j(\omega _j):=1\) if j is a leaf (different from the root) and otherwise we define

$$\begin{aligned} \begin{aligned} A_j(\omega _j)&:=\int \limits _{\{\omega _k\in \pi (k){\setminus } X\}_{k\in V(F_j){\setminus }\{j\}}}\prod _{(a,b) \in E(F_j)} W(\omega _a, \omega _b)\;,\\ B_j(\omega _j)&:=\int \limits _{\{\omega _k\in \pi (k){\setminus } X\}_{k\in V(F_j){\setminus }\{j\}}}\prod _{(a,b) \in E(F_j)} U(\omega _a, \omega _b). \end{aligned} \end{aligned}$$
(65)

Inductively for \(t=L+1,L,\ldots ,0\), assume that for each vertex \(j\in V(T)\) at height \(g_j=t\) and for each \(\omega _j\in \pi (j)\) we have

$$\begin{aligned} A_j(\omega _j)=R_j\cdot \exp \left( \pm \tfrac{v(F_j)}{\mathbf {h}_1}\right) \end{aligned}$$
(66)

and for every \(\omega _j\in \pi (j){\setminus } X\) we have

$$\begin{aligned} B_j(\omega _j)=R_j\cdot \exp \left( \pm \tfrac{v(F_j)}{\mathbf {h}_1}\right) . \end{aligned}$$
(67)

Now suppose that the statement is true for all vertices at height \(t+1\). Let j be an arbitrary vertex at height t, and let \(\omega _j\in \pi (j)\) be arbitrary. When j is a leaf then (66) and (67) hold trivially. So, suppose that j has children \(j_1,j_2,\ldots ,j_q\). Since these children are at height \(t+1\), applying the induction hypothesis, we disintegrate (65) with respect to these children, and get

$$\begin{aligned} A_j(\omega _j)&=\prod _{c=1}^q \left( \deg _W(\omega _j,\pi (j_c))\pm \kappa \right) \cdot \prod _{c=1}^q \left( R_{j_c}\exp \Big (\pm \tfrac{v(F_{j_c})}{\mathbf {h}_1}\Big ) \right) . \end{aligned}$$

Now, each of the factors in the first product equals \(center_{\pi (j_c)}(\pi (j))\pm \frac{1}{\mathbf {h}_{L+1-g_j}}\) by the property (47). Since \(\pi \) is a dense depth preserving anatomy assignment and since \(\mu (\pi (j_c))\) is a positive number which depends only on \(\mathbf {h}\llbracket L-g_j\rrbracket \), we have

$$\begin{aligned} \deg _W(\omega _j,\pi (j_c))\pm \kappa =center_{\pi (j_c)}(\pi (j))\cdot (1\pm \sqrt{\kappa }). \end{aligned}$$

Since \((1\pm \sqrt{\kappa })^q\subseteq \exp (\pm \tfrac{1}{\mathbf {h}_1})\), we have just verified (66) for j and \(\omega _j\).

We can carry out exactly the same calculation for \(B_j(\omega _j)\), where \(\omega _j\in \pi (j){\setminus } X\), and get

$$\begin{aligned} B_j(\omega _j)&=\prod _{c=1}^q (\deg _U(\omega _j,\pi (j_c))\pm \kappa )\cdot \prod _{c=1}^q \left( R_{j_c}\exp \Big (\pm \tfrac{v(F_{j_c})}{\mathbf {h}_1}\Big ) \right) \; \end{aligned}$$

Now, the fact that \(\omega _j\) is a non-exceptional element of the cell \(\pi (j)\) of the \(\kappa \)-approximate anatomy \(\{\mathcal A_F\}_{F\in \mathfrak D_{\mathbf {h}\llbracket L+1-g_j\rrbracket }}\) (for U) implies (67).

Clearly, the term \(\int _{\omega _1\in \pi (1){\setminus } X} A_1(\omega _1)\) corresponds to the term \(\int \prod _{\begin{array}{c} (i,j) \in E(T) \end{array}} W(\omega _i, \omega _j)\) in the definition of \(Q_1\). We shall now control all the remaining terms. The idea is that these all are almost constant since we are working inside one anatomy cell. Let us first deal with the terms in the definition of \(Q_1\).

As \(\pi (j)\) is non-singular for each \(j\in [\ell ]\), the denominator of \(\frac{\sum _{j=p}^\ell \deg _{W}(\omega _j) }{\prod _{j=1}^\ell \deg _{W}(\omega _j)}\) is at least \(\varDelta ^\ell \). Further, the fluctuations in the denominator are at most \(\sum _{j=1}^{\ell }\frac{\hbar (\mathbf {h}\llbracket L-g_j\rrbracket )}{\mathbf {h}_{L+1-g_j}}\leqslant \tau \) around \(C_1:=\prod _{j=1}^\ell \deg ^{\mathrm {anat}}_W(\pi (j))\) by (48). Similarly, the numerator is bounded from below by \(\varDelta \), and the fluctuations in the numerator are at most \(\sum _{j=p}^\ell \frac{\hbar (\mathbf {h}\llbracket L-g_j\rrbracket )}{\mathbf {h}_{L+1-g_j}}\leqslant \tau \) around \(C_2:=\sum _{j=p}^\ell \sum _{F\in \mathfrak D_{\mathbf {h}\llbracket L-g_j\rrbracket }}center_F(\pi (j))\). The fluctuations in the term \(\exp \left( -\sum _{j=1}^{p-1} b_{W}(\omega _j) \right) \) are at most \(\tau \) around \(C_3:=\prod _{j=1}^{p-1} center_{\texttt {b}}(\pi (j))\). Also, since \(\pi (j)\) is non-singular for each \(j\in [p-1]\), we have \(C_3>\exp (-\frac{\ell }{\varDelta })\). The upper and lower bounds we have derived from the non-singularity will be used below to transform an additive error of the type \(value\pm error\) into a multiplicative one, \(value\cdot (1\pm \frac{error}{lower~bound~on~the~value})\). Therefore,

$$\begin{aligned} Q_1=C_3\cdot \frac{C_2}{C_1}\cdot \mu (\pi (1){\setminus } X)\cdot R_1\cdot \left( 1\pm \tfrac{d}{3}\right) . \end{aligned}$$
(68)

We now show that this quantity is close to \(Q_2\). First, let us introduce a modified quantity in which even integration over \(V_0\) and \(E_0\) is considered, \(Q^+_2:=\int \limits _{{\varvec{\omega }}\in {\varvec{\Omega }}_\pi } f_U({\varvec{\omega }})\). We claim that we get exactly the same bound as in (68) even for \(Q^+_2\). Let us explain why there are no traps in doing so. First, we can use \(\int _{\omega _1\in \pi (1){\setminus } X} B_1(\omega _1)\) to control \(\int \prod _{\begin{array}{c} (i,j) \in E(T) \end{array}} U(\omega _i, \omega _j)\), exactly in the same way as we used \(\int _{\omega _1\in \pi (1){\setminus } X} A_1(\omega _1)\) to control \(\int \prod _{\begin{array}{c} (i,j) \in E(T) \end{array}} W(\omega _i, \omega _j)\), because (67) is a perfect counterpart to (66). Second, the remaining quantities appearing in the definition of \(Q_2\) are again centered around the constants \(C_1\), \(C_2\), and \(C_3\) as above, except that the fluctuations can be bigger by \(\kappa \) since \(\left\{ \{\mathcal A_{F}\}_{F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }}\right\} _{t=0}^{L+1}\) are only \(\kappa \)-approximate anatomies for U (as opposed to exact anatomies for W). However, \(\kappa \) is the smallest constant in (58) and thus this additional error can easily be swallowed by the other error terms. Third, the upper and lower bounds based on non-singularity that we used to transform additive errors into multiplicative ones are still valid when the additional approximation term \(\kappa \) is taken into account. Therefore, we conclude that \(Q_2^+= C_3\cdot \frac{C_2}{C_1}\cdot \mu (\pi (1){\setminus } X)\cdot R_1\cdot \left( 1\pm \tfrac{d}{3}\right) \).

Now, let us show that we have \(|Q^+_2-Q_2|<\frac{(e^\pi _{\mathrm {vert}}+e^\pi _{\mathrm {edge}})\ell }{\varDelta ^\ell }\). Clearly, we have \(Q^+_2\geqslant Q_2\). On the other hand, if for some \((\omega _1,\ldots ,\omega _\ell )\in {\varvec{\Omega }}_\pi \) we have \(f^-_U(\omega _1,\ldots ,\omega _\ell )=0\) but \(f_U(\omega _1,\ldots ,\omega _\ell )>0\), then for some \(i\in [\ell ]\) we have \(\omega _i\in \pi (i)\cap \varLambda _{\mathrm {vert}}\), or for some \(ij\in E(T)\) we have \((\omega _i,\omega _j)\in \left( \pi (i)\times \pi (j)\right) \cap \varLambda _{\mathrm {edge}}\). The total measure of such configurations is at most \(e^\pi _{\mathrm {vert}}+e^\pi _{\mathrm {edge}}\), which, combined with a pointwise bound on \(f_U\), gives the claimed bound. \(\square \)

Claim 4.3.3

Suppose that \(\pi :[\ell ]\rightarrow \left\{ \mathcal A_{F}\right\} _{t=1,\ldots ,L+1; F\in \mathfrak D_{\mathbf {h}\llbracket t\rrbracket }}\) is a depth preserving anatomy assignment that is not singular, and suppose that \(j^*\in [\ell ]\) is a witness that it is not dense. Then

$$\begin{aligned} \int _{\prod _{j=1}^{\ell } \pi (j)}f_W<\frac{d\ell }{\varDelta ^\ell }\cdot \prod _{j\in [\ell ]} \mu (\pi (j))+\frac{\ell }{\mathbf {h}_{L+1-g_{j^*}}\cdot \varDelta ^\ell }\cdot \prod _{j\in [\ell ]{\setminus } \{j^*\}} \mu (\pi (j)). \end{aligned}$$
(69)

Proof

Let us bound all the terms in (56). More precisely, let \(kj^*\in E(T)\) be any edge that witnesses that \(\pi \) is not a dense depth preserving anatomy assignment. To bound \(f_W\), we shall use that

$$\begin{aligned} f_W(\omega _1,\ldots ,\omega _\ell )\leqslant \exp \left( -\sum _{j=1}^{p-1}b_W(\omega _j)\right) \cdot \frac{\sum _{j=p}^{\ell }\deg _W(\omega _j)}{\varPi _{j=1}^{\ell }\deg _W(\omega _j)}\cdot W(\omega _k,\omega _{j^{*}}). \end{aligned}$$

The term \(\exp \left( -\sum _{j=1}^{p-1} b_W(\omega _j) \right) \) is trivially at most 1. The numerator of the term \(\frac{\sum _{j=p}^\ell \deg _W(\omega _j) }{\prod _{j=1}^\ell \deg _W(\omega _j)}\) is at most \(\ell \) while the denominator is at least \(\varDelta ^\ell \) by non-singularity. Finally, since \(kj^*\in E(T)\) witnesses that \(\pi \) is not a dense depth preserving anatomy assignment, for every choice of \(\omega _k\in \pi (k)\) we have that \(\int _{\omega _{j^*}\in \pi (j^*)} W(\omega _{k},\omega _{j^*})=\deg _W(\omega _k,\pi (j^*))\leqslant d\cdot \mu (\pi (j^*))+\tfrac{1}{2\mathbf {h}_{L+1-g_{j^*}}}\). The claim now follows by integrating over the remaining coordinates. \(\square \)

We are now ready to express \(\mathrm {Freq}(T;W)\) and \(\mathrm {Freq}^-(T;G,V_0,E_0)\) in a way that most terms will approximately cancel. To express \(\mathrm {Freq}(T;W)\), we work with the function \(f_W\) defined in (56). Also, we partition the integration over \(\varOmega ^\ell \) in (1) into terms corresponding to individual depth preserving anatomy assignments as in (62).

$$\begin{aligned} \begin{aligned} |\mathrm {Stab}_T|\cdot \mathrm {Freq}(T;W)=&\sum _{\pi \mathrm {~non}\text {-}\mathrm {problematic}} \int _{{\varvec{\Omega }}_\pi }f_W\\&+ \sum _{\pi \mathrm {~non}\text {-}\mathrm {problematic}}\int _{\prod _{j=1}^\ell \pi (j){\setminus }{\varvec{\Omega }}_\pi }f_W\\&+\sum _{\pi \mathrm {~not~dense~and~not~singular}} \int _{\prod _{j=1}^\ell \pi (j)}f_W\\&+\sum _{\pi \mathrm {~singular}} \int _{\prod _{j=1}^\ell \pi (j)}f_W. \end{aligned} \end{aligned}$$
(70)

Now, we have

$$\begin{aligned} \begin{aligned} |\mathrm {Stab}_T|\cdot \mathrm {Freq}^-(T;G,V_0,E_0)=&\sum _{\pi \mathrm {~non}\text {-}\mathrm {problematic}} \int _{{\varvec{\Omega }}_\pi }f^-_U\\&+ \sum _{\pi \mathrm {~non}\text {-}\mathrm {problematic}}\int _{\prod _{j=1}^\ell \pi (j){\setminus }{\varvec{\Omega }}_\pi }f^-_U\\&+\sum _{\pi \mathrm {~not~dense~and~not~singular}} \int _{\prod _{j=1}^\ell \pi (j)}f^-_U\\&+\sum _{\pi \mathrm {~singular}} \int _{\prod _{j=1}^\ell \pi (j)}f^-_U. \end{aligned} \end{aligned}$$
(71)

We will show that the first sums in (70) and (71) cancel almost perfectly, and that all the remaining terms are negligible.

Claim 4.3.2 tells us that for the first sums in (70) and (71) we have

$$\begin{aligned} \sum _{\pi \mathrm {~non-probl.}} \int _{{\varvec{\Omega }}_\pi }f_W=\sum _{\pi \mathrm {~non-probl.}} \int _{{\varvec{\Omega }}_\pi }f^-_U\cdot (1\pm d)\pm \sum _{\pi \mathrm {~non-probl.}}\frac{(e^\pi _{\mathrm {vert}}+e^\pi _{\mathrm {edge}})\ell }{\varDelta ^\ell }. \end{aligned}$$
(72)

We have

[display omitted in the source: bounds on the error terms \(e^\pi _{\mathrm {vert}}\) and \(e^\pi _{\mathrm {edge}}\)]

Recall that in the course of proving Theorem 1.3 from Theorem 1.1 we showed that \(\int _{{\varvec{\omega }}\in \varOmega ^\ell }f_W({\varvec{\omega }})\leqslant |\mathrm {Stab}_T| \leqslant \ell ^\ell \). In particular, using the above bounds, we can turn the multiplicative error in (72) into an additive error,

$$\begin{aligned} \begin{aligned} \sum _{\pi \mathrm {~non}\text {-}\mathrm {probl.}} \int _{{\varvec{\Omega }}_\pi }f_W&=\sum _{\pi \mathrm {~non}\text {-}\mathrm {probl.}} \int _{{\varvec{\Omega }}_\pi }f^-_U \;\pm 2d\ell ^\ell \pm \frac{3a\ell ^2}{\varDelta ^\ell }\\&=\sum _{\pi \mathrm {~non}\text {-}\mathrm {probl.}} \int _{{\varvec{\Omega }}_\pi }f^-_U \;\pm \frac{\varepsilon }{100}. \end{aligned} \end{aligned}$$
(73)
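The passage from the multiplicative error in (72) to the additive error in (73) rests on an elementary fact: if \(0\leqslant S\leqslant M\), then \(S(1\pm d)=S\pm dM\). A toy numerical check of this fact (the concrete numbers are hypothetical, not from the proof):

```python
# If 0 <= S <= M, a relative error of size d on S is absorbed
# into an additive error of size d*M.
M, d = 4 ** 4, 0.01  # stand-ins for ell^ell and the constant d
for S in [0.0, 1.0, 37.5, 255.9]:
    assert 0 <= S <= M
    for t in [-d, -d / 2, 0.0, d / 2, d]:
        assert abs(S * (1 + t) - S) <= d * M + 1e-12
```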

Let us now focus on the second sum in (70). Observe that the set \(\bigcup _{\pi \mathrm {~non}\text {-}\mathrm {problematic}} \prod _{j=1}^\ell \pi (j){\setminus }{\varvec{\Omega }}_\pi \) has measure at most \(\ell \cdot \mu (X)\leqslant \delta _0\). Therefore, (57) applies, and we get that

$$\begin{aligned} \sum _{\pi \mathrm {~non}\text {-}\mathrm {problematic}}\int _{\prod _{j=1}^\ell \pi (j){\setminus }{\varvec{\Omega }}_\pi }f_W\leqslant \frac{\varepsilon }{10}. \end{aligned}$$

Let us now turn to the third sum in (70). We write

$$\begin{aligned} \sum _{\pi \mathrm {~not~dense~and~not~singular}} \int _{\prod _{j=1}^\ell \pi (j)}f_W \leqslant \sum _{g=0}^{L}\sum _{\pi :\mathrm {witness~at~height}~g} \int _{\prod _{j=1}^\ell \pi (j)}f_W\;, \end{aligned}$$

where the last sum ranges over all non-singular depth preserving anatomy assignments \(\pi \) with a witness for not being dense at height g. By Claim 4.3.3 and by (63) and (64) for each \(g\in \{0,\ldots ,L\}\) we have

$$\begin{aligned} \sum _{\pi :\mathrm {witness~at~height}~g} \int _{\prod _{j=1}^\ell \pi (j)}f_W\leqslant \frac{d\ell }{\varDelta ^\ell } +\frac{\ell ^2}{\mathbf {h}_{L+1-g}\cdot \varDelta ^\ell } \leqslant \frac{d'}{L+1}. \end{aligned}$$

Therefore,

$$\begin{aligned} \sum _{\pi \mathrm {~not~dense~and~not~singular}} \int _{\prod _{j=1}^\ell \pi (j)}f_W<d'. \end{aligned}$$
(74)

The fourth sum of (70) can be bounded from above as follows. If \(\pi \) is singular then by Claim 4.3.1 there exists \(i\in [\ell ]\) such that either \(b_W(\omega )>1/(2\varDelta )\) for all \(\omega \in \pi (i)\), or \(\pi (i)\subseteq \varOmega _{\mathrm {small}}\). Since the average value of \(b_W\) is 1 by Proposition 4.2, Markov's inequality shows that the measure of the elements satisfying the former condition is at most \(2\varDelta \). Thus, the measure of the set \(\bigcup _{\pi }\prod _{j=1}^\ell \pi (j)\), where the union ranges over all depth preserving anatomy assignments \(\pi \) which are singular at coordinate i, is at most \(2\varDelta +\mu (\varOmega _{\mathrm {small}})<\frac{\delta _0}{\ell }\). We conclude that the measure of \(\bigcup _{\pi \mathrm {~singular}}\prod _{j=1}^\ell \pi (j)\) is less than \(\delta _0\). Therefore, (57) applies, and \(\sum _{\pi \mathrm {~singular}} \int _{\prod _{j=1}^\ell \pi (j)}f_W\leqslant \frac{\varepsilon }{10}\).

It now remains to bound the second, third, and fourth terms of (71). Recall that when bounding the corresponding terms in (70) above, we argued that the domains of integration of the second and the fourth term have measures at most \(\delta _0\) each. Combining this with the upper bound (61), we get that the second and the fourth term are at most \(\delta _0\cdot \frac{\ell }{\chi ^\ell }<\frac{\varepsilon }{100}\), using (58). So, the only term we have to control now is

$$\begin{aligned}&\sum _{\pi \mathrm {~not~dense~and~not~singular}} \int _{\prod _{j=1}^\ell \pi (j)}f^-_U\nonumber \\&\quad =\sum _{\pi \mathrm {~not~dense~and~not~singular}} \int _{{\varvec{\Omega }}_\pi }f^-_U+\sum _{\pi \mathrm {~not~dense~and~not~singular}}\int _{\prod _{j=1}^\ell \pi (j){\setminus }{\varvec{\Omega }}_\pi }f^-_U.\quad \end{aligned}$$
(75)

The first summand can be bounded by \(2d'\) in exactly the same way as we derived (74). To see this, recall that on \({\varvec{\Omega }}_\pi \) the degrees into anatomies are similar for W and for U, and thus we can make use of the “non-denseness” and “non-singularity” assumptions even for \(f_U^-\) (with slightly worse constants). The second summand of (75) represents integration over a set of measure at most \(\delta _0\). In other words, we can bound the second summand of (75) by \(\frac{\varepsilon }{100}\) in the same way we bounded the second and the fourth term of (71).

The lemma now follows by expanding \(\mathrm {Freq}(T;W)-\mathrm {Freq}^-(T;G,V_0,E_0)\) using (70) and (71) and using the bounds above. \(\square \)

Theorem 1.1 now follows in a straightforward way by combining Lemmas 3.12 and 4.3.

Proof of Theorem 1.1

Suppose that a graphon W, an \(\ell \)-vertex tree T and \(\varepsilon >0\) are given. Let \(a>0\) be given by Lemma 4.3 for W, T, and \(\frac{\varepsilon }{2}\). Lemma 2.11 with parameter \(\beta :=c_1\cdot \min \{a^8,\varepsilon ^{16}\}\) (for a sufficiently small constant \(c_1\)) yields positive constants \(\alpha ,\varepsilon ',\gamma \) and \(\xi '\) with \(\beta \gg \alpha \gg \varepsilon '\gg \gamma \gg \xi '\). Now, let \(n_0\) and \(\delta '\) be numbers given by Lemma 4.3 for input parameters as above together with \(\chi :=\gamma \). Set \(\xi :=\min (\xi ',\delta ',\frac{1}{n_0})\).

Suppose that G is a graph with \(d_\square (G,W)\leqslant \xi \). Owing to Lemma 2.11, we know that G has a \((\beta ,\alpha ,\varepsilon ',\gamma )\)-good decomposition \(V(G)=V_0\sqcup V_1\sqcup \ldots \sqcup V_k\). Lemma 3.12 now tells us that with probability at least \(1-\mathsf {O}(\ell \alpha ^{1/8})\) the uniform spanning tree \(\mathcal {T}\) of G satisfies

$$\begin{aligned} \mathbb {P}_X(B_\mathcal {T}(X,r) \cong T) = (1+\mathsf {O}(\ell ^2 \alpha ^{1/16})) \mathrm {Freq}(T;G) + \mathsf {O}( \ell ^2 \beta ^{1/8}) . \end{aligned}$$
(76)

Recall that the quantity \(\mathrm {Freq}(T;G)\) is defined in (34), and depends on the decomposition \(V(G)=V_0\sqcup V_1\sqcup \ldots \sqcup V_k\). Observe that \(\mathrm {Freq}(T;G)=\mathrm {Freq}^-(T;G,U_0,E_0)\), where \(U_0\) is the union of \(V_0\) and the sets \(V_i\) (\(i\geqslant 1\)) that are not \((\alpha ,\varepsilon ')\)-big, and \(E_0\) consists of all edges running across two distinct \((\alpha ,\varepsilon ')\)-big sets \(\{V_i\}_i\). Proposition 2.12 tells us that \(|U_0|= \mathsf {O}(\beta ^{1/8}n)<an\). The fact that \(V(G)=V_0\sqcup V_1\sqcup \ldots \sqcup V_k\) is a \((\gamma ,\varepsilon '^5,\varepsilon '^5)\)-expander decomposition (in particular, (G2) of Definition 2.6 applies) tells us that \(|E_0|\leqslant \varepsilon '^5 n^2<an^2\). Thus, Lemma 4.3 gives \(\mathrm {Freq}(T;G)=\mathrm {Freq}(T;W)\pm \frac{\varepsilon }{2}\). Plugging this back into (76) and using that the term \((1+\mathsf {O}(\ell ^2 \alpha ^{1/16}))\) can be bounded by \((1\pm \frac{\varepsilon }{4})\) and the term \(\mathsf {O}(\ell ^2 \beta ^{1/8})\) is smaller than \(\frac{\varepsilon }{4}\), we get the theorem. \(\square \)

4.2 Proof of Theorem 1.3 from Theorem 1.1

Let T be a fixed rooted tree with \(\ell \geqslant 2\) vertices and height r. We denote the vertices of T, as before, by \(\{1,\ldots ,\ell \}\) so that 1 is the root and the vertices of distance r from the root are \(\{p,\ldots ,\ell \}\) for some \(2 \leqslant p\leqslant \ell \). Our goal is to show that the probability that the first r generations of the branching process yield a tree which is root-isomorphic to T is \(\mathrm {Freq}(T;W)\).

Firstly, the factor of \(|\mathrm {Stab}_T|^{-1}\) comes from the \(|\mathrm {Stab}_T|\) different vertex labellings of T which yield the same event. Secondly, we may split this event into \(\ell -p+1\) disjoint events, indexed by the vertices \(q\in \{p,\ldots ,\ell \}\) at height r of T, each being the event that the path from 1 to q forms the first r steps of the ancestral path in the definition of \(\kappa _W\). We will show that the probability of each of these events is precisely

$$\begin{aligned} \int \limits _{\omega _1, \ldots , \omega _\ell } \exp \left( -\sum _{j=1}^{p-1} b_W(\omega _j) \right) \cdot \frac{\deg (\omega _q) }{\prod _{j=1}^\ell \deg (\omega _j)} \cdot \prod _{\begin{array}{c} (i,j) \in E(T) \end{array}} W(\omega _i, \omega _j) \mathrm {d}\omega _1 \cdots \mathrm {d}\omega _\ell ,\quad \end{aligned}$$
(77)

which is the q-th term in (1) when we expand the sum over j in the numerator. We denote the path from 1 to q in T by \(x_1,\ldots , x_r\) where \(x_1=1\) and \(x_r=q\). By definition of \(\kappa _W\), the density function of the first r ancestral particles \(\omega _{x_1},\ldots ,\omega _{x_r}\) is

$$\begin{aligned} \prod _{j=1}^{r-1} {W(\omega _{x_j}, \omega _{x_{j+1}}) \over \deg (\omega _{x_j})} . \end{aligned}$$

Thirdly, by independence of the branching process, conditioned on this path, the subtrees of the branching process emanating from the particles on this path are independent and all new particles (if there are any) are “other” particles. Given a particle of type \(\omega \) (either ancestral or other), the number of its other progeny is distributed as a \(\mathsf {Poisson}(b_W(\omega ))\) random variable, so the probability that it equals \(k \geqslant 0\) is \({e^{-b_W(\omega )} b_W(\omega )^k \over k!}\). Given that the number of its “other” progeny is k, these k points of \(\varOmega \) are distributed as k i.i.d. points drawn from the probability density function \({W(\omega ,\omega ') \over b_W(\omega ) \deg (\omega ')}\). Thus, conditioned on having k progeny, the density of this k-tuple \(\{\omega _1,\ldots ,\omega _k\}\) of points is

$$\begin{aligned} {k! \over b_W(\omega )^k} \prod _{i=1}^k { W(\omega , \omega _i) \over \deg (\omega _i)} . \end{aligned}$$

The \(k!/b_W(\omega )^k\) cancels with the \(b_W(\omega )^k/k!\) in the probability of having k points, and we are left with the last term times \(e^{-b_W(\omega )}\). We now apply this throughout all the branching points of the tree T (even for \(k=0\)) and this concludes the proof (note that we do not have a factor of \(e^{-b_W(\omega )}\) for tree vertices at level r since we do not care whether they have progeny or not). \(\square \)
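The cancellation used in this proof can be checked mechanically: multiplying the \(\mathsf {Poisson}(b)\) probability of k progeny by the \(k!/b^k\) factor from the k-tuple density leaves exactly \(e^{-b}\), independently of k. A small numerical sketch (illustrative only, not part of the proof):

```python
import math

def poisson_pmf(b, k):
    # P(Poisson(b) = k) = e^{-b} b^k / k!
    return math.exp(-b) * b ** k / math.factorial(k)

for b in (0.3, 1.0, 2.5):
    for k in range(1, 6):
        # the k!/b^k factor from the k-tuple density cancels b^k/k!
        residual = poisson_pmf(b, k) * math.factorial(k) / b ** k
        assert abs(residual - math.exp(-b)) < 1e-12
```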

5 Bounds on the Number of Vertices of a Given Degree in the UST

In this section we prove Theorem 1.5. Let us make two remarks beforehand. Firstly, we cannot hope for converse bounds to the theorem. Indeed, let us take a small \(\alpha >0\) and consider the complete bipartite graph \(K_{\alpha n,(1-\alpha )n}\) with color classes A and B, \(|A|=\alpha n\), \(|B|=(1-\alpha )n\). The handshaking lemma tells us that for any spanning tree T of \(K_{\alpha n,(1-\alpha )n}\) we have

$$\begin{aligned} n-1=\sum _{b\in B}\deg _T(b)=(1-\alpha )n+\sum _{b\in B,\deg _T(b)>1}(\deg _T(b)-1). \end{aligned}$$

In particular, the last sum can have at most \(\alpha n-1\) summands. We conclude that T has more than \((1-2\alpha )n\) leaves, and fewer than \(2\alpha n\) vertices of degree more than 1.
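One can confirm this leaf count experimentally by sampling a spanning tree of a small complete bipartite graph with the Aldous–Broder random-walk algorithm (a standard sampler, not otherwise used in this paper); the instance \(n=60\), \(\alpha =0.1\) below is an arbitrary illustration:

```python
import random
from collections import defaultdict

def aldous_broder_spanning_tree(adj, vertices):
    # Aldous-Broder: walk randomly; each first-entrance edge joins the tree.
    current = random.choice(vertices)
    visited = {current}
    tree_deg = defaultdict(int)
    while len(visited) < len(vertices):
        nxt = random.choice(adj[current])
        if nxt not in visited:
            visited.add(nxt)
            tree_deg[current] += 1
            tree_deg[nxt] += 1
        current = nxt
    return tree_deg

# K_{alpha*n, (1-alpha)*n} with n = 60 and alpha = 0.1
n, alpha = 60, 0.1
A = list(range(int(alpha * n)))                 # 6 vertices
B = list(range(int(alpha * n), n))              # 54 vertices
adj = {v: (B if v in A else A) for v in A + B}  # complete bipartite adjacency

random.seed(1)
deg = aldous_broder_spanning_tree(adj, A + B)
leaves = sum(1 for v in A + B if deg[v] == 1)
# Handshaking argument: every spanning tree has more than (1 - 2*alpha)*n leaves.
assert leaves > (1 - 2 * alpha) * n
```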

Secondly, as we mentioned in the beginning of this paper, the degree distribution of a UST in a complete graph is approximately \(\mathsf {Poisson}(1)+1\). Considering Theorem 1.5, we see that the complete graph is an asymptotic minimizer for the number of leaves (density \(e^{-1}\)) and an asymptotic maximizer for the number of vertices of degree 2 (density \(e^{-1}\)) and 3 (density \(e^{-1}/2\)). However, the density of vertices of degree \(k \geqslant 4\) in the UST of the complete graph is \(e^{-1}/(k-1)!\), which is smaller than the \((k-2)^{k-2}e^{-(k-2)}/(k-1)!\) given by Theorem 1.5. Examining the proof shows that the bounds in Theorem 1.5 for \(k \geqslant 4\) are optimal in the following sense. Let \(\alpha >0\) be arbitrarily small, and let \(G_{n,k,\alpha }\) be the graph obtained by taking a complete graph on \(n/(k-2)\) vertices and additional \(n(k-3)/(k-2)\) vertices such that each of the \(n^2(k-3)/(k-2)^2\) possible edges between the two parts is retained with probability \(\alpha \) and erased otherwise, independently of all other edges. It is a straightforward computation using Theorem 1.3 (see also the proof of Theorem 1.5 below) to show that for n large, with high probability the proportion of vertices of degree k in a UST will be arbitrarily close, as \(\alpha \rightarrow 0\), to \(\frac{1}{(k-1)!}((k-2)/e)^{k-2}\).
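Since the degree of a uniform vertex in the UST of the complete graph is distributed as \(\mathsf {Poisson}(1)+1\), the density of degree-k vertices there is \(\mathbb {P}[\mathsf {Poisson}(1)=k-1]=e^{-1}/(k-1)!\); comparing it with the bound of Theorem 1.5 for \(k\geqslant 4\) is a one-line computation, checked numerically below (illustrative only):

```python
import math

for k in range(4, 12):
    # complete graph: P[Poisson(1) = k-1] = e^{-1}/(k-1)!
    complete_graph_density = math.exp(-1) / math.factorial(k - 1)
    # Theorem 1.5 upper bound: (k-2)^{k-2} e^{-(k-2)} / (k-1)!
    theorem_bound = (k - 2) ** (k - 2) * math.exp(-(k - 2)) / math.factorial(k - 1)
    assert complete_graph_density < theorem_bound
```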

We are almost ready to prove the theorem. We will need the following folklore statement that provides a relation between degrees in a graphon and in finite graphs that converge to it. Lemma 5.1 is easy to prove directly, but let us note that it is a straightforward consequence of Lemma 4.1 for anatomies of depth 1.

Lemma 5.1

Suppose that \(G_1,G_2,\ldots \) are finite graphs converging to a graphon \(W:\varOmega ^2\rightarrow [0,1]\).

  1. (1)

    Suppose that \(0\leqslant a<b\leqslant 1\) are given, and let d be the measure of points of \(\varOmega \) whose degree is in the interval (a, b). Then for an arbitrary \(\varepsilon >0\) there exists an \(n_0\) such that for every \(n>n_0\) we have that in \(G_n\) there are at least \((d-\varepsilon ) v(G_n)\) many vertices with degrees in the interval \(( (a-\varepsilon )v(G_n) , (b+\varepsilon ) v(G_n) )\) and there are at most \((d+\varepsilon ) v(G_n)\) many vertices with degrees in the interval \(( (a+\varepsilon )v(G_n) , (b-\varepsilon ) v(G_n) )\).

  2. (2)

    Suppose that there exists \(\alpha >0\) such that in each \(G_n\), all but at most \(\mathsf {o}_n(1)v(G_n)\) vertices have degrees at least \(\alpha v(G_{n})\). Then W is nondegenerate.

Proof of Theorem 1.5

We start by proving (3). Assume by contradiction that there are \(\varepsilon _0, \delta _0>0\) such that there exists a sequence of graphs \(G_n\) (which we assume have n vertices) such that

  1. (A)

    at least \((1-\mathsf {o}(1))n\) vertices of \(G_n\) are of degree at least \(\delta _0 n\), and,

  2. (B)

    \(\mathbb {P}\big ( L_1(n) \leqslant (e^{-1}-\varepsilon _0)n \big ) > \varepsilon _0\),

where \(L_1(n)\) is the number of leaves in a UST of \(G_n\). By Theorem 1.6 we may assume without loss of generality that \(G_n\) is a converging sequence and let W be the corresponding limit graphon. By Lemma 5.1(2) and assumption (A) we deduce that W is nondegenerate. Let \(\mathcal {T}_n\) be a UST of \(G_n\). By Theorem 1.3 the sequence \(\mathcal {T}_n\) almost surely satisfies that

$$\begin{aligned} \lim _{n\rightarrow \infty } \mathbb {P}( \deg _{\mathcal {T}_n}(X) = 1 ) = \mathbb {E}_{x\in \varOmega }\left[ \mathbb {P}[1+\mathsf {Poisson}(b_W(x))=1]\right] =\mathbb {E}_{x\in \varOmega }\left[ e^{-b_W(x)}\right] , \end{aligned}$$

where X is a uniformly chosen vertex of \(G_n\). By Proposition 4.2 we have that \(\mathbb {E}_{x\in \varOmega }[b_W(x)]=1\). Hence Jensen’s inequality implies that

$$\begin{aligned} \lim _{n \rightarrow \infty } \mathbb {P}( \deg _{\mathcal {T}_n}(X) = 1 ) \geqslant e^{-1} . \end{aligned}$$

Since \(L_1(n) = n\mathbb {P}( \deg _{\mathcal {T}_n}(X) = 1 )\) we arrive at a contradiction to assumption (B), showing (3).
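The Jensen step above can be probed numerically: for the convex map \(y\mapsto e^{-y}\) and any nonnegative b with average 1, the average of \(e^{-b}\) is at least \(e^{-1}\). A quick randomized check (illustrative only, not part of the argument):

```python
import math
import random

random.seed(0)
for _ in range(100):
    raw = [random.random() for _ in range(50)]
    mean = sum(raw) / len(raw)
    b = [x / mean for x in raw]              # rescale so the average is exactly 1
    avg_exp = sum(math.exp(-x) for x in b) / len(b)
    assert avg_exp >= math.exp(-1) - 1e-12   # Jensen: E[e^{-b}] >= e^{-E[b]} = 1/e
```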

The proof of (4) and (5) proceeds similarly. We fix \(k \geqslant 2\) and make an analogous contradictory assumption to (A) and (B). We have that

$$\begin{aligned} \lim _{n\rightarrow \infty } \mathbb {P}( \deg _{\mathcal {T}_n}(X) = k )= & {} \mathbb {E}_{x\in \varOmega }\left[ \mathbb {P}[1+\mathsf {Poisson}(b_W(x))=k]\right] \\= & {} {1 \over (k-1)!}\mathbb {E}_{x\in \varOmega }\left[ e^{-b_W(x)} b_W(x)^{k-1} \right] . \end{aligned}$$

When \(k=2\), since \(ye^{-y}\leqslant e^{-1}\) for all \(y \geqslant 0\) we learn that the last quantity is at most \(e^{-1}\), proving (4) by contradiction. When \(k \geqslant 3\), it follows by Lemma 5.2 (provided immediately below) that the last quantity is at most \({1 \over (k-1)!}((k-2)/e)^{k-2} \), similarly proving (5). \(\square \)
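The inequality \(ye^{-y}\leqslant e^{-1}\) used for \(k=2\) holds because \(ye^{-y}\) is maximised at \(y=1\); a grid check (illustrative only):

```python
import math

# y * e^{-y} increases on [0, 1] and decreases on [1, inf); its maximum is e^{-1}
values = [(i / 100) * math.exp(-i / 100) for i in range(0, 2001)]
assert max(values) <= math.exp(-1) + 1e-12
assert abs(values[100] - math.exp(-1)) < 1e-12  # the maximum is attained at y = 1
```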

The last missing piece is to state and prove the optimization result that was key for our proof of Theorem 1.5 above.

Lemma 5.2

Let \(b:[0,1]\rightarrow [0,\infty )\) satisfy \(\int _{[0,1]} b(x)\mathrm {d}x =1\). Then for any \(k \geqslant 2\),

$$\begin{aligned} \int _{[0,1]} e^{-b(x)} b(x)^k \mathrm {d}x \leqslant \left( {k-1 \over e} \right) ^{k-1}. \end{aligned}$$

Proof

For ease of notation, let \(f(x)=e^{-x}x^k\). Fix \(\kappa >0\). Approximating b(x) by simple functions, we can find \(n\in \mathbb {N}\) and n non-negative numbers \(b_1,\ldots ,b_n\) such that \(\frac{1}{n}\sum _{i=1}^{n}b_i=1\), and \(\int _{[0,1]} f(b(x))\mathrm {d}x \leqslant \frac{1}{n}\sum _{i=1}^{n}f(b_i)+\kappa \). Hence

$$\begin{aligned} \int _{[0,1]} e^{-b(x)} b(x)^k \mathrm {d}x \leqslant \max _{(b_1,\ldots ,b_n)\in \mathcal {C}} \varPhi (b_1,\ldots ,b_n) + \kappa , \end{aligned}$$

where \(\varPhi (b_1,\ldots ,b_n):=\frac{1}{n}\sum _{i=1}^{n}f(b_i)\) and \(\mathcal {C}:=\{(b_1,\ldots ,b_n)\in [0,+\infty )^{n}:\frac{1}{n}\sum _{i=1}^{n}b_i \leqslant 1\}\). Since \(\kappa >0\) was arbitrary, to prove the lemma it suffices to show that \(\varPhi \vert _{\mathcal {C}} \leqslant \left( \frac{k-1}{e}\right) ^{k-1}\).

By abuse of notation, we use the same symbol \(\vec {b}=(b_1,\ldots ,b_n)\) to denote the maximizer of \(\varPhi \vert _{\mathcal {C}}\). As \(f'(x)=e^{-x}x^{k-1}(k-x)<0\) for every \(x>k\), f(x) is decreasing on \((k,\infty )\), and consequently

$$\begin{aligned} 0 \leqslant b_i \leqslant k \quad \text {for all }i\in [n]. \end{aligned}$$
(78)

There are two possibilities.

Case 1 There exists a non-negative number y so that \(b_i \in \{0,y\}\) for every \(i\in [n]\).

Let \(\lambda \in [0,1]\) denote the proportion of \(i\in [n]\) with \(b_i=y\). Then \(\lambda y=\frac{1}{n}\sum _{i=1}^{n}b_i \leqslant 1\). Hence

$$\begin{aligned} \varPhi (\vec {b})=\lambda e^{-y}y^k \leqslant e^{-y}y^{k-1} \leqslant \left( \frac{k-1}{e}\right) ^{k-1}, \end{aligned}$$

as required.
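The chain of inequalities in Case 1 (\(\lambda e^{-y}y^k \leqslant e^{-y}y^{k-1}\) whenever \(\lambda y\leqslant 1\), and \(e^{-y}y^{k-1}\leqslant ((k-1)/e)^{k-1}\)) can be verified on a grid; a short numerical sketch (illustrative only):

```python
import math

for k in (2, 4, 7):
    top = ((k - 1) / math.e) ** (k - 1)      # max of e^{-y} y^{k-1}, attained at y = k-1
    for i in range(1, 2000):
        y = i / 100                          # y ranges over (0, 20)
        lam = min(1.0, 1.0 / y)              # largest lambda with lam * y <= 1
        assert lam * math.exp(-y) * y ** k <= math.exp(-y) * y ** (k - 1) + 1e-9
        assert math.exp(-y) * y ** (k - 1) <= top + 1e-9
```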

Case 2 \(b_i\) receives at least two positive values.

Let i and j be two arbitrary indices such that \(1\leqslant i<j \leqslant n\) and \(\min \{b_i,b_j\}>0\). For every small \(\varepsilon >0\), two vectors \(\vec {b}':=(b_1,\ldots ,b_i+\varepsilon ,\ldots ,b_j-\varepsilon ,\ldots ,b_n)\) and \(\vec {b}'':=(b_1,\ldots ,b_i-\varepsilon ,\ldots ,b_j+\varepsilon ,\ldots ,b_n)\) clearly belong to the domain \(\mathcal {C}\). By Taylor’s formula and the optimality of \(\vec {b}\), we thus obtain

$$\begin{aligned} 0 \leqslant \varPhi (\vec {b})-\varPhi (\vec {b}')&=(f'(b_j)-f'(b_i))\varepsilon +O_{b_i,b_j}(\varepsilon ^2),\\ 0 \leqslant \varPhi (\vec {b})-\varPhi (\vec {b}'')&=-(f'(b_j)-f'(b_i))\varepsilon +O_{b_i,b_j}(\varepsilon ^2). \end{aligned}$$

This forces \(f'(b_i)=f'(b_j)\).

As \(f''(x)=e^{-x}x^{k-2}((x-k)^2-k)\), we see that \(f''(x)>0\) for \(x\in (0,k-\sqrt{k})\), and \(f''(x)<0\) for \(x\in (k-\sqrt{k},k)\). It follows that \(f'(x)\) is increasing in \([0,k-\sqrt{k}]\), and decreasing in \([k-\sqrt{k},k]\). Hence for each \(\alpha \in \mathbb {R}\), the equation \(f'(x)=\alpha \) has at most two solutions in [0, k].
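The closed form for \(f''\) driving this sign analysis can be double-checked by finite differences; the sketch below compares \(e^{-x}x^{k-2}((x-k)^2-k)\) against a numerical second derivative of \(f(x)=e^{-x}x^k\) (illustrative only):

```python
import math

def f(x, k):
    return math.exp(-x) * x ** k

def f2_closed(x, k):
    # claimed closed form: f''(x) = e^{-x} x^{k-2} ((x - k)^2 - k)
    return math.exp(-x) * x ** (k - 2) * ((x - k) ** 2 - k)

h = 1e-3
for k in (3, 5, 8):
    for x in (0.5, 1.7, k - math.sqrt(k), float(k), k + 2.0):
        # central second difference approximates f''(x) up to O(h^2)
        numeric = (f(x + h, k) - 2 * f(x, k) + f(x - h, k)) / h ** 2
        assert abs(numeric - f2_closed(x, k)) < 1e-3
```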

From the discussion above and (78), we learn that there are two numbers y and z satisfying the following properties:

  1. (i)

    \(b_i \in \{0,y,z\}\) for every \(i \in [n]\),

  2. (ii)

    \(f'(y)=f'(z)\), that is, \(e^{-y}y^{k-1}(k-y)=e^{-z}z^{k-1}(k-z)\),

  3. (iii)

    \(0<y<z \leqslant k\).

Let \(\lambda \) and \(\mu \) be the proportions of \(i \in [n]\) such that \(b_i=y\) and \(b_i=z\), respectively. Then \(\lambda +\mu \leqslant 1\), \(\lambda y+\mu z\overset{(i)}{=}\frac{1}{n}\sum _{i=1}^{n}b_i \leqslant 1\), and \(\varPhi (\vec {b})\overset{(i)}{=}\varPsi (\lambda ,\mu )\), where \(\varPsi (\lambda ,\mu ):=\lambda f(y)+\mu f(z)\).

We will view y and z as constants, and seek to maximize \(\varPsi (\lambda ,\mu )\) under the constraints: \(\lambda ,\mu \geqslant 0\), \(\lambda +\mu \leqslant 1\), and \(\lambda y+\mu z \leqslant 1\). Let \((\lambda _0,\mu _0)\) be a maximizer of \(\varPsi \). We claim that \(\min \{\lambda _0,\mu _0\}=0\). Before proving the claim, let us show how it implies the lemma. Indeed, if one of \(\lambda _0\) and \(\mu _0\) is zero, say \(\mu _0\), then

$$\begin{aligned} \varPhi (\vec {b})=\varPsi (\lambda ,\mu ) \leqslant \varPsi (\lambda _0,\mu _0)=\lambda _0 e^{-y}y^k \leqslant e^{-y}y^{k-1} \leqslant \left( \frac{k-1}{e}\right) ^{k-1}, \end{aligned}$$

as desired.

It remains to prove that \(\min \{\lambda _0,\mu _0\}=0\). Suppose to the contrary that \(\lambda _0,\mu _0>0\). Let \(\lambda _1=\lambda _0-\varepsilon \) and \(\mu _1=\mu _0+\varepsilon y/z\), where \(\varepsilon >0\) is a small constant. It is not difficult to verify that \(\lambda _1,\mu _1 \geqslant 0\), \(\lambda _1+\mu _1 \leqslant 1\), and \(\lambda _1 y + \mu _1 z \leqslant 1\). Thus, by the optimality of \((\lambda _0,\mu _0)\), we get \(\varPsi (\lambda _1,\mu _1) \leqslant \varPsi (\lambda _0,\mu _0)\), giving \(e^{-z}z^{k-1} \leqslant e^{-y}y^{k-1}\). Combined with (iii), we obtain \(e^{-z}z^{k-1}(k-z)<e^{-y}y^{k-1}(k-y)\), contradicting (ii). This completes our proof of the lemma. \(\square \)
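Lemma 5.2 itself can be probed with random inputs: rescale nonnegative samples so that their average is 1 and compare \(\frac{1}{n}\sum _i e^{-b_i}b_i^k\) with \(((k-1)/e)^{k-1}\). A randomized sanity check (illustrative only, not a proof):

```python
import math
import random

random.seed(42)
for k in (2, 3, 5):
    bound = ((k - 1) / math.e) ** (k - 1)
    for _ in range(200):
        raw = [random.expovariate(1.0) for _ in range(30)]
        mean = sum(raw) / len(raw)
        b = [x / mean for x in raw]          # average exactly 1, as in the lemma
        value = sum(math.exp(-x) * x ** k for x in b) / len(b)
        assert value <= bound + 1e-12
```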

Remark 5.3

Our proof of Theorem 1.5 crucially relied on our running assumption that almost all the degrees in the graph G are linear. It seems natural to investigate similar extremal questions with a weakened form of this assumption. The minimal meaningful assumption seems to be that G contains no vertices of degree 2; this is to avoid the case of paths. This general setting seems much more complicated. For example, the proportion of leaves in a uniform spanning tree on an \(n\times n\) torus is asymptotically almost surely \((1-2/\pi )\cdot \frac{8}{\pi ^2}+\mathsf {o}(1)\approx 0.294\) (see for example [18, p. 112]), which is less than the lower bound of \(e^{-1}\approx 0.368\) in Theorem 1.5.