1 Introduction

Bond percolation, or simply percolation, refers to the random graph obtained by independently keeping each edge of a graph with some fixed probability p (and deleting it with probability \(1-p\)). Percolation is a classical and important model in statistical physics and network science, as it serves as a canonical model for assessing the robustness of a network when the edges of the underlying network are randomly damaged, and also as a basic model of vaccination for the prevention of an epidemic on networks. A detailed account of many of these applications can be found in [7, 50]. From a theoretical perspective, percolation is one of the most elementary models that exhibits a phase transition, i.e., there exists a critical value \(p_c = p_c(n)\) such that, for any fixed \(\varepsilon >0\), the proportion of vertices in the largest connected component is bounded away from zero with high probability when \(p>p_c(1+\varepsilon )\), whereas this proportion becomes negligible when \(p<p_c(1-\varepsilon )\). Critical behavior is observed when \(p\approx p_c\), and fascinating behavior emerges for the percolation process around this critical value.

It turns out that there is a window of values of p where the component functionals show intermediate and unique behavior. For example, rescaled component functionals converge to non-degenerate scaling limits, in contrast with other values of p, for which they concentrate. Also, the large components in this window are structurally intermediate, in the sense that there is neither a giant component with a growing number of cycles, nor do the components look like trees. This regime is called the critical window of the percolation phase transition. Starting with the pioneering work of Aldous [4], deriving scaling limits for critical component functionals has given rise to an enormous literature, with several interesting scaling-limit results over the past decades [5, 6, 11, 12, 27, 28, 44, 48, 49, 53]. We refer the reader to [26, Chapter 1] and references therein for an elaborate discussion of the nature of this transition, and a literature overview.

In the literature, two fundamentally different types of behavior have been proved for the scaling limits and the critical exponents associated to the critical window and component sizes, depending on whether the asymptotic degree distribution satisfies a finite third-moment condition [11, 28] or an infinite third- but finite second-moment condition [12, 27]. However, the study of critical behavior in the infinite second-moment setting has remained an open question.

When the degree distribution is asymptotically a power law with exponent \(\tau \in (2,3)\), the finite second-moment condition fails. These networks are popularly known as scale-free networks [7] in the literature. Many real-world networks are observed to be scale-free [2, 30, 36, 50]. One of the well-known features of scale-free networks is that they are robust under random edge-deletion, i.e., for any sequence \((p_n)_{n\ge 1}\) satisfying \(\liminf _{n\rightarrow \infty } p_n > 0\), the graph obtained by performing percolation with probability \(p_n\) is supercritical. This feature has been studied experimentally in [3], using heuristic arguments in [23,24,25, 29] (see also [19, 20, 34] in the context of optimal paths in the strong disorder regime), and mathematically in [17]. Thus, in order to observe the percolation critical behavior, one needs to have \(p_n \rightarrow 0\) with the network size, despite the fact that the average degree of the network remains bounded.

In this paper, we initiate the study of critical behavior in the scale-free regime. As a canonical random graph model on which percolation acts, we take the multigraph generated by the configuration model. When the degree distribution satisfies a power-law with exponent \(\tau \in (2,3)\), it was heuristically argued in [24, 29] that the critical value is \(p_c \sim n^{-(3-\tau )/(\tau -1)}\), so that the critical window is given by the collection of values \(p_c = p_c(\lambda )=\lambda n^{-(3-\tau )/(\tau -1)}\), where \(\lambda >0\) indicates the location inside the critical window. We establish that the scaling exponents from [24, 29] are indeed correct, and discuss asymptotics of component functionals inside the critical window. We also show that \(p_c = p_c(\lambda )=\lambda n^{-(3-\tau )/(\tau -1)}\) with \(\lambda >0\) gives the right critical window, by showing that a giant component emerges at the end of the critical window (\(\lambda \rightarrow \infty \)), while components have a trivial star-like structure before the critical window (\(\lambda \rightarrow 0\)). The main contributions of this paper can be summarized as follows:

Critical window. At criticality, we obtain scaling limits for the largest component sizes and surplus edges in a strong topology. The result displays a completely new universality class of scaling limits of critical components. The scaling limits here are different from the general multiplicative coalescent framework in [5]. In particular, the limiting exploration process has bounded variation, so that the general tools from [5] cannot be applied. We also study the diameter of these components and show that the maximum diameter is of order \(\log n\).

Near-critical behavior. For \(p_n= \lambda _n n^{-(3-\tau )/(\tau -1)}\) with \(\lambda _n \rightarrow 0\), the graph is subcritical and we show that the largest component sizes, rescaled by \(n^\alpha p_n\), concentrate. On the other hand, when \(\lambda _n \rightarrow \infty \), the largest component size, rescaled by \(n p_n ^{1/(3-\tau )}\), concentrates, and this is the unique giant component in the sense that the size of the second largest component is much smaller than \(n p_n ^{1/(3-\tau )}\). The nature of the emergence of this giant component for \(p_n\gg p_c\) is markedly different compared to the universality classes in the \(\tau \in (3,4)\) and the \(\tau >4\) regimes, where the giant emerges when the percolation probability satisfies \((p_n-p_c(\lambda _1)) \gg (p_c(\lambda _2) - p_c(\lambda _1))\), for some strictly positive \(p_c\) and \(-\infty<\lambda _1<\lambda _2<\infty \) [38].

Methods. Technically, analyzing percolation on random graphs like the configuration model is challenging, because in order to make Aldous’s exploration process approach [4] work, one is required to keep track of many functionals of the unexplored part of the graph [48], resulting in a high-dimensional exploration process. This difficulty was circumvented in [27, 28] by using Janson’s algorithm [39]. Unfortunately, Janson’s algorithm does not work here, because the algorithm creates \(n-o(n)\) degree-one vertices. Instead, we sandwich the percolated graph in between two configuration models, which yield the same scaling limits for the component sizes. Also, in order to deduce scaling limits of the component sizes from that of the exploration process, we prove several properties of the limiting exploration process, which are of independent interest.

Remark 1

(Single-edge constraint). In a parallel work [10], Bhamidi and the first two authors consider critical percolation on simple random graphs, i.e., random graphs having no multiple edges, namely generalized random graphs. It turns out that the critical window there is \(p_c \sim n^{-(3-\tau )/2} \gg n^{-(3-\tau )/(\tau -1)}\). This is a distinctive feature in the infinite second-moment case that never surfaced in the other two universality classes of critical random graphs.

Organization of the paper. In Sect. 2, we state our results precisely. In Sect. 2.1, we give the precise definitions of the model and the scaling limits. Section 2.2 is devoted to comments about the heuristics, and some important special cases. In Sect. 3, we study excursions of the limiting exploration process. Section 4 contains the proofs of the results at criticality, and in Sect. 5, we analyze the near-critical regimes.

2 Main Results

2.1 The configuration model

2.1.1 Model description

The configuration model generates random multigraphs with any given degree sequence. Consider n vertices labeled by \([n]:=\{1,2,...,n\}\) and a non-increasing sequence of degrees \({\varvec{d}} =\varvec{d}_n = ( d_i )_{i \in [n]}\) such that \(\ell _n = \sum _{i \in [n]}d_i\) is even. The configuration model on n vertices having degree sequence \({\varvec{d}}\) is constructed as follows [8, 16]:

  • Equip vertex j with \(d_{j}\) stubs, or half-edges. Two half-edges create an edge once they are paired. Therefore, initially we have \(\ell _n=\sum _{i \in [n]}d_i\) half-edges. Pick any half-edge and pair it with a uniformly chosen half-edge from the remaining unpaired half-edges and remove both these half-edges from the set of unpaired half-edges. Keep repeating the above procedure until all half-edges are paired.
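The pairing procedure is easy to implement: a uniformly random shuffle of the list of half-edges, followed by pairing consecutive entries, produces the same uniform matching as the sequential procedure above. The following minimal Python sketch (our illustration, not code from the paper) generates the edge list of \(\mathrm {CM}_n({\varvec{d}})\); self-loops and multiple edges are kept.

```python
import random

def configuration_model(degrees, seed=None):
    """Return the edge list of CM_n(d): pair half-edges uniformly at random.

    Self-loops and multiple edges are kept. Assumes sum(degrees) is even.
    """
    rng = random.Random(seed)
    # One list entry per half-edge, labeled by the vertex carrying it.
    half_edges = [v for v, d in enumerate(degrees) for _ in range(d)]
    # A uniform shuffle followed by pairing consecutive entries is
    # equivalent to repeatedly matching a half-edge to a uniform partner.
    rng.shuffle(half_edges)
    return [(half_edges[2 * i], half_edges[2 * i + 1])
            for i in range(len(half_edges) // 2)]

# Example: a small degree sequence with one hub (vertex 0).
print(configuration_model([5, 3, 2, 2, 1, 1], seed=1))
```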

Let \(\mathrm {CM}_n({\varvec{d}})\) denote the graph constructed by the above procedure. Note that \(\mathrm {CM}_n({\varvec{d}})\) may contain self-loops and multiple edges. In fact, the probability that \(\mathrm {CM}_n({\varvec{d}})\) is a simple graph tends to zero in our setting with an infinite second-moment condition on the degree distribution [36, Proposition 7.12]. Before stating the main results about the configuration model, we set up some necessary notation.

2.1.2 Notions of convergence and the limiting objects

To describe the main results of this paper, we need some definitions and notations. We use the Bachmann-Landau asymptotic notation \(O(\cdot )\), \(o(\cdot )\), \(\varTheta (\cdot )\) for large-n asymptotics of real numbers. For \((a_n)_{n\ge 1}, (b_n)_{n\ge 1} \subset (0,\infty )\), we write \(a_n\ll b_n\), \(a_n \sim b_n\) and \(a_n\gg b_n\) as a shorthand for \(\lim _{n\rightarrow \infty }a_n/b_n = 0,1,\infty \), respectively. We often use C as a generic notation for a positive constant whose value can be different in different lines. We also use the standard notation of \(\xrightarrow {\scriptscriptstyle \mathbb {P}}\), and \(\xrightarrow {\scriptscriptstyle d}\) to denote convergence in probability and in distribution, respectively. The topology needed for the convergence in distribution will always be specified unless it is clear from the context. We say that a sequence of events \(({\mathscr {E}}_n)_{n\ge 1}\) occurs with high probability (whp) with respect to the probability measures \((\mathbb {P}_n)_{n\ge 1}\) when \(\mathbb {P}_n\big ( {\mathscr {E}}_n \big ) \rightarrow 1\). Define \(f_n = O_{\scriptscriptstyle \mathbb {P}}(g_n)\) when \( ( |f_n|/|g_n| )_{n \ge 1} \) is tight; \(f_n =o_{\scriptscriptstyle \mathbb {P}}(g_n)\) when \(f_n/g_n \xrightarrow {\scriptscriptstyle \mathbb {P}} 0 \); \(f_n =\varTheta _{\scriptscriptstyle \mathbb {P}}(g_n)\) if both \(f_n=O_{\scriptscriptstyle \mathbb {P}}(g_n) \) and \(g_n=O_{\scriptscriptstyle \mathbb {P}}(f_n)\). Denote

$$\begin{aligned} \ell ^p_{{\scriptscriptstyle \downarrow }}:= \Big \{ \mathbf {x}= (x_i)_{i=1}^\infty \subset [0,\infty ): \ x_{i+1}\le x_i\ \forall i,\text { and } \sum _{i=1}^{\infty } x_{i}^p < \infty \Big \} \end{aligned}$$

with the p-norm metric \(d(\mathbf {x}, \mathbf {y})= \big ( \sum _{i=1}^{\infty } |x_i-y_i|^p \big )^{1/p}\). Let \(\ell ^2_{{\scriptscriptstyle \downarrow }} \times \mathbb {N}^{\infty }\) denote the product space of \(\ell ^2_{{\scriptscriptstyle \downarrow }}\) and \(\mathbb {N}^{\infty }\), equipped with the product topology, where \(\mathbb {N}^{\infty }\) denotes the set of sequences on \(\mathbb {N}\), itself endowed with the product topology. Define also

$$\begin{aligned} {\mathbb {U}}_{{\scriptscriptstyle \downarrow }}:= \Big \{ ((x_i,y_i))_{i=1}^{\infty }\in \ell ^2_{{\scriptscriptstyle \downarrow }} \times \mathbb {N}^{\infty }: \sum _{i=1}^{\infty } x_iy_i < \infty \text { and } y_i=0 \text { whenever } x_i=0 \; \forall i \Big \}, \end{aligned}$$

endowed with the metric

$$\begin{aligned} \mathrm {d}_{{\mathbb {U}}}((\mathbf {x}_1, \mathbf {y}_1), (\mathbf {x}_2, \mathbf {y}_2)):= \bigg ( \sum _{i=1}^{\infty } (x_{1i}-x_{2i})^2 \bigg )^{1/2}+ \sum _{i=1}^{\infty } \big | x_{1i} y_{1i} - x_{2i}y_{2i}\big |. \end{aligned}$$
(2.1)

Further, let \({\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }} \subset {\mathbb {U}}_{{\scriptscriptstyle \downarrow }}\) be given by

$$\begin{aligned} {\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }}:= \big \{((x_i,y_i))_{i=1}^{\infty }\in {\mathbb {U}}_{{\scriptscriptstyle \downarrow }} : \text { if } x_k = x_m, k \le m,\text { then }y_k \ge y_m\big \}. \end{aligned}$$

Let \(({\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }})^k\) denote the k-fold product space of \({\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }}\).

Throughout, we write \({\mathbb {D}}[0,\infty )\) to denote the space of càdlàg functions \([0,\infty )\mapsto \mathbb {R}\) equipped with the Skorohod \(J_1\)-topology. Also, let \({\mathbb {D}}_+[0,\infty ) \subset {\mathbb {D}}[0,\infty )\) be the collection of functions with positive jumps only, and \({\mathbb {C}}[0,\infty )\subset {\mathbb {D}}[0,\infty )\) be the collection of continuous functions. For any fixed \(T>0\), \({\mathbb {D}}[0,T], {\mathbb {D}}_+[0,T], {\mathbb {C}}[0,T]\) are defined similarly for functions \([0,T]\mapsto \mathbb {R}\). For any function \(f\in {\mathbb {D}}[0,\infty )\), define \(\underline{f}(t)=\inf _{s\le t}f(s)\). Note that \(\underline{f}\) is non-increasing. Moreover,

$$\begin{aligned} \begin{aligned} \underline{f} \in {\mathbb {C}}[0,\infty ), \quad \text {whenever} \quad f\in {\mathbb {D}}_+[0,\infty ). \end{aligned} \end{aligned}$$
(2.2)

Indeed, if \(\underline{f}\) is discontinuous at some point t, then \(\underline{f}(t-) > \underline{f}(t)\), but that would mean that f has a negative jump of size \( \underline{f}(t-) - \underline{f}(t)\) at t. Thus (2.2) holds. Next, for any \(f\in {\mathbb {D}}_+[0,\infty )\), define the zero set of f by \({\mathscr {Z}}_f = \{t\ge 0: f(t) - \underline{f}(t)=0\}\), and let \(\mathrm {cl}({\mathscr {Z}}_f)\) denote the closure of \({\mathscr {Z}}_f\). An interval \((l,r)\) is called an excursion above the past minimum of f, or simply excursion of f (see [9, Sect. IV.2]) if

$$\begin{aligned} \begin{aligned} f(t) - \underline{f}(t)>0, \quad \forall t\in (l,r), \text { where } l\in \mathrm {cl}({\mathscr {Z}}_f)\text { and } r\in \mathrm {cl}({\mathscr {Z}}_f) \cup \{\infty \}. \end{aligned}\end{aligned}$$
(2.3)

For \(f\in {\mathbb {D}}_+[0,T]\), we consider \((l,r) \subset [0,T]\), and define an excursion similarly as in (2.3).

We often use boldface notation \(\mathbf {X}\) for the stochastic process \(( X(s) )_{s \ge 0}\), unless stated otherwise. Consider a decreasing sequence \( \varvec{\theta }=(\theta _1,\theta _2,\dots )\in \ell ^2_{{\scriptscriptstyle \downarrow }}{\setminus } \ell ^1_{{\scriptscriptstyle \downarrow }}\). Define \({\mathscr {I}}_i(s):=\mathbb {1}_{\left\{ \xi _i\le s \right\} }\), where \(\xi _i\sim \mathrm {Exp}(\theta _i/\mu )\) independently across i, and \(\mathrm {Exp}(r)\) denotes the exponential distribution with rate r. Consider the process

$$\begin{aligned} S_{\infty }^\lambda (t) = \frac{\lambda \mu }{\Vert \varvec{\theta }\Vert _2^2} \sum _{i=1}^{\infty } \theta _i{\mathscr {I}}_i(t)- t, \end{aligned}$$
(2.4)

for some \(\lambda , \mu >0\). Note that, for all \(t>0\), \(\mathbb {E}[S_{\infty }^\lambda (t)]<\infty \) since \(\sum _{i}\theta _i^2<\infty \), and consequently \(S_{\infty }^\lambda (t)<\infty \), almost surely. Also, for any \(u<t\),

$$\begin{aligned}\begin{aligned} \mathbb {E}\big [|S_{\infty }^\lambda (t) - S_{\infty }^\lambda (u)|\big ] \le \frac{\lambda \mu }{\Vert \varvec{\theta }\Vert _2^2} \sum _{i=1}^{\infty } \theta _i \mathrm {e}^{-\theta _i u/\mu }(1-\mathrm {e}^{-\theta _i (t-u)/\mu })+ |t-u| \le (\lambda +1)|t-u|, \end{aligned}\end{aligned}$$

so that \(\mathbf {S}_{\infty }^\lambda \) has bounded variation almost surely. However, since \(\sum _i\theta _i = \infty \), the process experiences infinitely many jumps in any bounded interval of time. Define the reflected version of \(S_{\infty }^\lambda (t)\) by

$$\begin{aligned} \mathrm {refl}( S_{\infty }^\lambda (t))= S_{\infty }^\lambda (t) - \min _{0 \le u \le t} S_{\infty }^\lambda (u). \end{aligned}$$

We will show that, for any \(\lambda >0\), the excursion lengths of the process \(\mathbf {S}_{\infty }^\lambda = (S_{\infty }^\lambda (t))_{t\ge 0}\) can be ordered almost surely as an element of \(\ell ^2_{{\scriptscriptstyle \downarrow }}\). We denote this ordered vector of excursion lengths by \((\gamma _i(\lambda ))_{i\ge 1}\). For \(v,t> 0\), define \(M_t(v) := \sum _{j: v\theta _j\le 1,\ t\theta _j\le 1 } \theta _j^3.\) We will assume that for any \(t>0\),

$$\begin{aligned} \begin{aligned} \int _{0}^\infty \mathrm {e}^{- tv^2M_t(v)} \mathrm {d}v<\infty . \end{aligned}\end{aligned}$$
(2.5)

The technical condition in (2.5) on top of \(\varvec{\theta }\in \ell ^2_{{\scriptscriptstyle \downarrow }}{\setminus }\ell ^1_{{\scriptscriptstyle \downarrow }} \) will be used to ensure that the distribution of \(S_{\infty }^\lambda (t)\) is non-atomic for all \(t>0\) (see Lemma 4 below), which in turn implies that we have strict ordering between excursion lengths, i.e., \(\gamma _{i+1}(\lambda ) <\gamma _i(\lambda )\) for all \(i\ge 1\) almost surely. The condition (2.5) is relatively weak, and is, for example, satisfied for \(\theta _j = j^{-\alpha }\) for \(\alpha \in (1/2,1)\). To see this, note that \(v^2M_t(v) \) is of the same order as \(v^{-1+1/\alpha }\). However, this also shows that (2.5) is not satisfied for the extreme case \(\alpha = 1\), i.e., \(\theta _j = j^{-1}\).
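For intuition, the process \(\mathbf {S}_{\infty }^\lambda \) is straightforward to simulate once \(\varvec{\theta }\) is truncated. The sketch below is an illustration under the assumptions \(\theta _j = j^{-\alpha }\), a finite truncation level J, and evaluation on a discrete grid, none of which come from the paper: it generates the jump times \(\xi _i\), evaluates \(S_{\infty }^\lambda \), and forms its reflected version.

```python
import numpy as np

# Simulation sketch of S_infty^lambda from (2.4), truncated at J terms.
rng = np.random.default_rng(0)
alpha, lam, mu, J, T = 0.7, 2.0, 1.0, 200_000, 10.0

theta = np.arange(1, J + 1, dtype=float) ** (-alpha)   # theta_j = j^(-alpha)
norm2 = (theta ** 2).sum()
xi = rng.exponential(mu / theta)                       # xi_i ~ Exp(rate theta_i/mu)

# Keep the jumps occurring before time T, sorted by jump time.
keep = xi <= T
order = np.argsort(xi[keep])
t_jump = xi[keep][order]
s_jump = (lam * mu / norm2) * theta[keep][order]       # jump sizes

grid = np.linspace(0.0, T, 5001)
cum = np.concatenate([[0.0], np.cumsum(s_jump)])
S = cum[np.searchsorted(t_jump, grid, side="right")] - grid   # S(t) on the grid
refl = S - np.minimum.accumulate(S)                    # reflected process

print("S(T) =", S[-1], "; max of reflected path =", refl.max())
```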

Also, define the counting process \(\mathbf {N}^\lambda = (N^\lambda (t))_{t\ge 0}\) to be the Poisson process that has intensity \((\lambda \mu ^2)^{-1}\Vert \varvec{\theta }\Vert _2^2 \times \mathrm {refl}( S_{\infty }^\lambda (t))\) at time t, conditionally on \(( S_{\infty }^\lambda (u) )_{u \le t}\). Formally, \(\mathbf {N}^\lambda \) is characterized as the counting process for which

$$\begin{aligned} N^\lambda (t) - \frac{\Vert \varvec{\theta }\Vert _2^2}{\lambda \mu ^2} \int \limits _{0}^{t}\mathrm {refl}( S_{\infty }^\lambda (u))\mathrm {d}u \end{aligned}$$
(2.6)

is a martingale. We use the notation \(N_i(\lambda )\) to denote the number of marks of \(\mathbf {N}^\lambda \) in the i-th largest excursion of \(\mathbf {S}_{\infty }^\lambda \). Define

$$\begin{aligned} \begin{aligned} \mathbf {Z}(\lambda ):= ((\gamma _i(\lambda ),N_i(\lambda )))_{i\ge 1}, \text { ordered as an element of }{\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }}. \end{aligned} \end{aligned}$$
(2.7)
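Conditionally on the path, \(\mathbf {N}^\lambda \) is an inhomogeneous Poisson process whose rate at time t is proportional to the reflected path, so on a discretized path it can be sampled cell by cell, as in the hedged sketch below. The toy reflected path here is a stand-in (the refl array from the previous sketch can be plugged in instead), and the constants \(\lambda ,\mu ,\Vert \varvec{\theta }\Vert _2^2\) are illustrative.

```python
import numpy as np

# Sketch: sample the mark process N^lambda of (2.6) on a grid. Given the
# (discretized) reflected path, the number of marks in [t, t+dt) is
# Poisson with mean c * refl(t) * dt, where c = ||theta||_2^2 / (lam * mu^2).
rng = np.random.default_rng(1)
lam, mu, norm2 = 2.0, 1.0, 1.5            # illustrative constants
c = norm2 / (lam * mu ** 2)

grid = np.linspace(0.0, 10.0, 10_001)
dt = grid[1] - grid[0]
refl = np.maximum(np.sin(grid), 0.0)      # toy nonnegative path; replace by
                                          # the refl array simulated above

counts = rng.poisson(c * refl[:-1] * dt)  # marks per grid cell
mark_times = np.repeat(grid[:-1], counts)
print("total marks:", counts.sum(), "| first few:", mark_times[:5])
```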

2.1.3 Results for the critical window

Fix \(\tau \in (2,3)\). Throughout this paper, we denote

$$\begin{aligned} \alpha = 1/(\tau -1),\qquad \rho =(\tau -2)/(\tau -1),\qquad \eta =(3-\tau )/(\tau -1). \end{aligned}$$

Also, let \(D_n\) be the degree of a vertex chosen uniformly at random from [n]. We start by stating our assumptions on the degree sequences:

Assumption 1

For each \(n\ge 1\), let \(\varvec{d}={\varvec{d}}_n=(d_1,\dots ,d_n)\) be a degree sequence satisfying \(d_1\ge d_2\ge \cdots \ge d_n\). We assume the following about \(({\varvec{d}}_n)_{n\ge 1}\) as \(n\rightarrow \infty \):

  1. (i)

    (High-degree vertices) For any \(i\ge 1\), \( n^{-\alpha }d_i\rightarrow \theta _i,\) where \(\varvec{\theta }:=(\theta _i)_{i\ge 1}\in \ell ^2_{{\scriptscriptstyle \downarrow }}{\setminus } \ell ^1_{{\scriptscriptstyle \downarrow }}\) is such that (2.5) holds.

  2. (ii)

    (Moment assumptions) \((D_n)_{n\ge 1}\) is uniformly integrable, \(\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i\in [n]}d_i= \mu \) for some \(\mu >0\), and

    $$\begin{aligned} \begin{aligned} \lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty }n^{-2\alpha } \sum _{i=K+1}^{n} d_i^2=0. \end{aligned}\end{aligned}$$
    (2.8)

In Sect. 2.2, we discuss the generality of Assumption 1 and show that power-law degrees satisfy these assumptions. For \(\mathrm {CM}_n({\varvec{d}})\), the criticality parameter \(\nu _n\) is defined as

$$\begin{aligned} \nu _n = \frac{\sum _{i\in [n]}d_i(d_i-1)}{\sum _{i\in [n]}d_i}. \end{aligned}$$

Molloy and Reed [46], and Janson and Luczak [41] showed that, under some regularity conditions, \(\mathrm {CM}_n({\varvec{d}})\) has a unique giant component (a component of size \(\varTheta (n)\)) with high probability precisely when \(\nu _n \rightarrow \nu >1\). Under Assumption 1, \(\nu _n\rightarrow \infty \) as \(n\rightarrow \infty \), since \(\sum _{i\in [n]} d_i^2 \ge d_1^2= \varTheta (n^{2\alpha })\gg n\), and \(\mathrm {CM}_n({\varvec{d}})\) always contains a giant component (see the remark below [37, Theorem 4.5] and consider \(\pi =1\)).
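The parameter \(\nu _n\) is immediate to compute from a degree sequence. As a quick numeric illustration (with Case (I)-style power-law degrees \(d_i = \lceil (n/i)^\alpha \rceil \) from Sect. 2.2, an assumption made here only for the example), \(\nu _n\) indeed grows like \(n^{2\alpha -1} = n^{\eta }\):

```python
import numpy as np

# Numeric illustration: nu_n grows like n^eta for power-law degrees.
tau = 2.5
alpha, eta = 1 / (tau - 1), (3 - tau) / (tau - 1)

for n in [10**4, 10**5, 10**6]:
    i = np.arange(1, n + 1)
    d = np.ceil((n / i) ** alpha)                 # Case (I)-style degrees
    nu_n = np.sum(d * (d - 1)) / np.sum(d)
    print(f"n = {n:>7}: nu_n = {nu_n:9.1f}, nu_n / n^eta = {nu_n / n**eta:5.2f}")
```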

We study percolation, which refers to deleting each edge of a graph independently with probability \(1-p\). In the case of percolation on random graphs, the deletion of edges is also independent of the underlying graph. The percolation probability is allowed to depend on the network size, i.e., \(p = p_n\). Let \(\mathrm {CM}_n(\varvec{d},p_n)\) denote the graph obtained from percolation with probability \(p_n\) on the graphs \(\mathrm {CM}_n({\varvec{d}})\). Fountoulakis [32] showed that \(\mathrm {CM}_n(\varvec{d},p_n)\) is distributed as \(\mathrm {CM}_n(\varvec{d}^p)\), where \(\varvec{d}^p\) is the degree sequence of the percolated graph. Note that the degrees in \(\varvec{d}^p\) could be correlated, so later Janson [39] gave an explicit construction which is simpler to analyze. This construction was used to identify the percolation phase transition in [39] and to study the critical window in [27, 28]. The interested reader is also referred to [28, Algorithm 4], where a construction of the whole percolation process \((\mathrm {CM}_n(\varvec{d},p))_{p\in [0,1]}\) is provided.

Now, under Assumption 1, if \(\liminf _{n\rightarrow \infty }p_n>0\), then \(\mathrm {CM}_n(\varvec{d},p_n)\) retains a giant component with high probability, i.e., \(\mathrm {CM}_n(\varvec{d},p_n)\) is always supercritical; see the remark below [37, Theorem 4.5]. Thus, in order to see the critical behavior, one must take \(p_n \rightarrow 0\), as \(n\rightarrow \infty \). For \(p_n\rightarrow 0\), the graph always contains \(n-o_{\scriptscriptstyle \mathbb {P}}(n)\) degree-zero or isolated vertices, which makes Janson’s construction inconvenient to work with.

For a sequence of finite graphs, the critical behavior is where we see intermediate behavior, in the sense that the graph inherits some features from the subcritical regime (such as the absence of the giant component) and some from the supercritical regime (the largest component is not a tree). The collection of such values of p is called the critical window. However, due to our lack of knowledge about the subcritical phase and the structural properties therein, it is not a priori evident here how to define the critical window. One way to define the subcritical regime and the critical window is to say that inside the critical window, the rescaled vector of ordered component sizes converges to some non-degenerate random vector, whereas the component sizes concentrate in the subcritical regime. This property has been observed quite universally for the percolation critical window. In this paper, we take this as our definition of the critical window. It is worthwhile to mention that there is a substantial literature on how to define the critical value. See [18, 35, 37, 43, 47] for different definitions of the critical probability and related discussions.

We will show that the critical window for percolation on \(\mathrm {CM}_n({\varvec{d}})\) is given by

$$\begin{aligned} p_c=p_c(\lambda ):= \frac{\lambda }{\nu _n}(1+o(1)), \quad \lambda \in (0,\infty ). \end{aligned}$$
(2.9)

Notice that, under Assumption 1, \( p_c \sim n^{-2\alpha +1} \sim n^{-\eta }\), where \(\eta = (3-\tau )/(\tau -1)>0\). The case where \(p\ll p_c\) will be called the barely subcritical regime and the case \(p_c\ll p \ll 1\) will be called the barely supercritical regime. We will show that a unique giant component emerges in the barely supercritical regime. We first state the results about the component sizes and the complexity in the critical window, and then discuss the barely sub-/supercritical regimes.

We will always write \({\mathscr {C}}_{\scriptscriptstyle (i)}(p)\) to denote the i-th largest component in the percolated graph. The random graph on which percolation acts will always be clear from the context. A vertex is called isolated if it has degree zero in the graph \(\mathrm {CM}_n({\varvec{d}},p_c(\lambda ))\). We define the component size corresponding to an isolated vertex to be zero (see Remark 2 below). For any component \({\mathscr {C}}\subset \mathrm {CM}_n({\varvec{d}},p_c(\lambda ))\), let \(\mathrm {SP}({\mathscr {C}})\) denote the number of surplus edges given by \(\# \{\text {edges in }\mathscr {C} \}-|\mathscr {C}|+1\). Finally, let

$$\begin{aligned}\begin{aligned} \mathbf {Z}_n(\lambda ) := \big (n^{-\rho }|{\mathscr {C}}_{\scriptscriptstyle (i)}(p_c(\lambda ))|,\mathrm {SP}({\mathscr {C}}_{\scriptscriptstyle (i)}(p_c(\lambda )))\big )_{i\ge 1}, \text { ordered as an element of }{\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }}. \end{aligned}\end{aligned}$$
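The vector \(\mathbf {Z}_n(\lambda )\) can be examined directly in simulation. The following hedged sketch (our illustration; the model and rescaling follow the definitions above, while the degree sequence, sample size and the use of networkx are choices made for the example) percolates \(\mathrm {CM}_n({\varvec{d}})\) at \(p_c(\lambda ) = \lambda /\nu _n\) and prints the rescaled sizes and surplus edges of the largest components.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
tau, lam, n = 2.5, 1.0, 100_000
alpha = 1 / (tau - 1)
rho = (tau - 2) / (tau - 1)

i = np.arange(1, n + 1)
d = np.ceil((n / i) ** alpha).astype(int)          # Case (I)-style degrees
if d.sum() % 2:
    d[0] += 1                                      # dummy half-edge, even total

G = nx.configuration_model(d.tolist(), seed=1)     # multigraph CM_n(d)
p_c = lam * d.sum() / np.sum(d * (d - 1.0))        # lambda / nu_n

# Percolation: delete each edge independently with probability 1 - p_c.
G.remove_edges_from([e for e in G.edges(keys=True) if rng.random() > p_c])

comps = sorted(nx.connected_components(G), key=len, reverse=True)
for C in comps[:5]:
    H = G.subgraph(C)
    surplus = H.number_of_edges() - len(C) + 1     # SP(C)
    print(f"|C| / n^rho = {len(C) / n**rho:6.3f}, surplus = {surplus}")
```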

The following theorem gives the asymptotics for the critical component sizes and the surplus edges of \(\mathrm {CM}_n({\varvec{d}},p_c(\lambda ))\):

Theorem 1

(Critical component sizes and surplus edges). Under Assumption 1, as \(n\rightarrow \infty \),

$$\begin{aligned} \mathbf {Z}_n(\lambda ) \xrightarrow {d}\mathbf {Z}(\lambda ) \end{aligned}$$

with respect to the \({\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }}\) topology, where \(\mathbf {Z}(\lambda )\) is defined in (2.7).

Remark 2

(Ignoring isolated components). Note that \(2\rho <1\) for \(\tau \in (2,3)\). When percolation is performed with probability \(p_c\), there are of the order n isolated vertices and thus \(n^{-2\rho }\) times the number of isolated vertices tends to infinity. This is the reason why we must ignore the contributions due to isolated vertices, when considering the convergence of the component sizes in the \(\ell ^2_{{\scriptscriptstyle \downarrow }}\)-topology. Note that an isolated vertex with self-loops does not create an isolated component.

For a connected graph G, \(\mathrm {diam}(G)\) denotes the diameter of the graph, i.e., the maximum graph distance between any pair of vertices. For an arbitrary graph G, \(\mathrm {diam}(G):=\max \mathrm {diam}(\mathscr {C})\), where the maximum is taken over all connected components. Our next result shows that the diameter of the largest connected components is of order \(\log n\):

Theorem 2

(Diameter of largest critical clusters). Under Assumption 1,

$$\begin{aligned}\begin{aligned} \mathrm {diam}(\mathrm {CM}_n(\varvec{d},p_c(\lambda ))) = O_{\scriptscriptstyle \mathbb {P}}(\log n). \end{aligned}\end{aligned}$$

Thus, the maximum diameter scales logarithmically in the \(\tau \in (2,3)\) regime, in contrast to the other universality classes in the \(\tau \in (3,4)\) and \(\tau >4\) regimes, where graph distances scale as a positive power of n [1, 13].

2.1.4 Behavior in the near-critical regimes

We now discuss asymptotic results for the component sizes in the barely subcritical (\(p_n\ll p_c(\lambda )\)) and barely supercritical (\(p_n\gg p_c(\lambda )\)) regimes. The next two theorems summarize the behavior outside the critical window:

Theorem 3

(Barely subcritical regime). For \(\mathrm {CM}_n(\varvec{d},p_n)\), suppose that \(n^{-\alpha }\ll p_n\ll p_c(\lambda )\) and that Assumption 1 holds. Then, as \(n\rightarrow \infty \),

$$\begin{aligned} \big ((n^\alpha p_n)^{-1}|{\mathscr {C}}_{\scriptscriptstyle (i)}(p_n)|\big )_{i\ge 1} \xrightarrow {\mathbb {P}}(\theta _i)_{i\ge 1}, \end{aligned}$$

in \(\ell ^{2}_{{\scriptscriptstyle \downarrow }}\) topology, and \(\mathbb {P}(\mathrm {SP}({\mathscr {C}}_{\scriptscriptstyle (i)}(p_n)) =0) \rightarrow 1\), for all \(i\ge 1\).

Remark 3

(Components and hubs). In the barely subcritical regime, we show that the i-th largest component is essentially the component containing the i-th largest degree vertex, or the i-th hub. Since the hubs have degree \(\varTheta (n^{\alpha })\), we need the assumption that \(p_n\gg n^{-\alpha }\) in Theorem 3, as otherwise the hubs become isolated, in which case components are likely to be extremely small.

For the result in the barely supercritical regime, let \(p_c(\lambda ) \ll p_n\ll 1\). The exact asymptotics of the high-degree vertices and the tail behavior in (2.8) will not be required. Below, we state the sufficient conditions for the concentration of the size of the giant component. In Sect. 2.2, we will see that these conditions are satisfied when the degrees are sampled from a power-law distribution:

Assumption 2

For each \(n\ge 1\), let \(\varvec{d}={\varvec{d}}_n=(d_1,\dots ,d_n)\) be a degree sequence satisfying \(d_1\ge d_2\ge \cdots \ge d_n\). We assume the following about \(({\varvec{d}}_n)_{n\ge 1}\):

  1. (i)

    \(d_1 = O(n^\alpha )\).

  2. (ii)

    \((D_n)_{n\ge 1}\) is uniformly integrable, and \(\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i\in [n]}d_i= \mu \) for some \(\mu >0\).

  3. (iii)

    Let \(D_n^\star \) denote the degree of a vertex chosen in a size-biased manner with the sizes being \((d_i/\ell _n)_{i\in [n]}\). Then, there exists a constant \(\kappa >0\) such that

    $$\begin{aligned} 1 - \mathbb {E}[\mathrm {e}^{-tp_n^{1/(3-\tau )}D_n^{\star }}] = \kappa p_n^{(\tau -2)/(3-\tau )}(t^{\tau -2}+o(1)). \end{aligned}$$
    (2.10)

Let \(\mathrm {E}(G)\) denote the number of edges in the graph G.

Theorem 4

(Barely supercritical regime). For \(\mathrm {CM}_n(\varvec{d},p_n)\), suppose that \(p_c(\lambda ) \ll p_n\ll 1\) and that Assumption 2 holds. Then, as \(n\rightarrow \infty \),

$$\begin{aligned} \frac{|{\mathscr {C}}_{\scriptscriptstyle (1)}(p_n)|}{np_n^{1/(3-\tau )}} \xrightarrow {\mathbb {P}}\mu \kappa ^{1/(3-\tau )}, \quad \frac{\mathrm {E}({\mathscr {C}}_{\scriptscriptstyle (1)}(p_n))}{np_n^{1/(3-\tau )}} \xrightarrow {\mathbb {P}}\mu \kappa ^{1/(3-\tau )}, \end{aligned}$$

and for all \(i\ge 2\), \(|{\mathscr {C}}_{\scriptscriptstyle (i)}(p_n)| = o_{\scriptscriptstyle \mathbb {P}}(np_n^{1/(3-\tau )})\), \(\mathrm {E}({\mathscr {C}}_{\scriptscriptstyle (i)}(p_n)) = o_{\scriptscriptstyle \mathbb {P}}(np_n^{1/(3-\tau )})\).

Remark 4

(Relation to Abel-Tauberian theorem). The infinite second-moment assumption is captured by (2.10). The identity (2.10) is basically a version of the celebrated Abel-Tauberian theorem [31, Chapter XIII.5] (see also [15, Chapter 1.7]). However, since both \(D_n^{\star }\) and \(p_n\) depend on n, the joint asymptotics needs to be stated as an assumption. In Sect. 2.2, we discuss how this assumption is satisfied when (i) \(d_i = (1-F)^{-1}(i/n)\), and (ii) \(d_i\) is the i-th order statistic of an i.i.d. sample, where F is a power-law distribution with \(\tau \in (2,3)\).

2.2 Discussion

Critical window: emergence of hub connectivity. The critical window is the regime in which hubs start getting connected. Hubs are the high-degree vertices, whose asymptotic degree is determined by Assumption 1(i). To make this precise, let us denote the probability that i and j are in the same component in the p-percolated graph by \(\pi (i,j,p)\). Then, for any fixed \(i, j\ge 1\),

$$\begin{aligned}&\limsup _{n\rightarrow \infty } \pi (i,j,p_n) = 0 \quad \text { for }p_n \ll p_c, \end{aligned}$$
(2.11)
$$\begin{aligned}&0< \liminf _{n\rightarrow \infty } \pi (i,j,p_n)\le \limsup _{n\rightarrow \infty } \pi (i,j,p_n) <1 \quad \text { for }p_n = \varTheta ( p_c), \end{aligned}$$
(2.12)
$$\begin{aligned}&\lim _{n\rightarrow \infty } \pi (i,j,p_n) = 1 \quad \text { for }p_n \gg p_c. \end{aligned}$$
(2.13)

Indeed, any two vertices i and j share \(p_n d_id_j/(\ell _n-1)\) edges in expectation. This expectation is o(1), \(\varTheta (1)\), or \(\omega (1)\) depending on whether \(p_n\ll p_c\), \(p_n\sim p_c\), or \(p_n\gg p_c\). In the subcritical regime, this observation and a simple union bound yield (2.11). For the critical case, a method-of-moments computation shows that the number of edges between hubs i and j converges in distribution to Poisson\((\lambda \theta _i \theta _j/\mu )\). We do not prove this here, but instead refer the reader to [36, Proposition 7.13], where similar Poisson approximation computations have been done for the configuration model. This shows (2.12). In the supercritical regime,

$$\begin{aligned}\begin{aligned} \mathbb {P}((i,j) \text { do not share an edge}) \le \prod _{l=1}^{d_j} \bigg (1-\frac{p_nd_i}{\ell _n-2l+1}\bigg ) \le \mathrm {e}^{-p_n d_i d_j/2\ell _n} \rightarrow 0, \end{aligned} \end{aligned}$$

so that \(1- \pi (i,j,p_n) \rightarrow 0\), which yields (2.13). Intuitively, in the barely subcritical regime, all the hubs are in different components. Hubs start getting connected to each other directly, forming the critical components, as p varies over the critical window. Finally, in the barely supercritical regime, the giant component, which contains all the hubs, is formed. The features (2.11), (2.12) and (2.13) are also observed in the \(\tau \in (3,4)\) case [12]. However, the key distinction between \(\tau \in (3,4)\) and \(\tau \in (2,3)\) is that for \(\tau \in (3,4)\) the paths between the hubs have lengths that grow as \(n^{(\tau -3)/(\tau -1)}\), whereas the hubs are directly connected in the \(\tau \in (2,3)\) regime.
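A back-of-the-envelope check of the three regimes, under the illustrative Case (I) scaling \(d_i \approx (n/i)^\alpha \) and \(\ell _n \approx n/(1-\alpha )\): the expected number of edges between hubs 1 and 2 stays bounded at \(p_n = n^{-\eta }\), vanishes slightly below this scale, and diverges slightly above it.

```python
import numpy as np

tau = 2.5
alpha, eta = 1 / (tau - 1), (3 - tau) / (tau - 1)

for n in [10.0**4, 10.0**6, 10.0**8]:
    l_n = n / (1 - alpha)                      # ~ sum_i (n/i)^alpha
    base = n**alpha * (n / 2) ** alpha / (l_n - 1)
    sub, crit, sup = (base * n**-eta * f for f in (n**-0.1, 1.0, n**0.1))
    print(f"n = {n:.0e}: sub = {sub:.4f}, crit = {crit:.4f}, sup = {sup:.4f}")
```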

Intuitive explanation for the exploration process. Suppose that we explore the critically percolated configuration model sequentially in a breadth-first manner. The reflected version of the stochastic process in (2.4) turns out to be the limit of the process that counts the number of unpaired half-edges incident to the discovered vertices. This limiting process can be intuitively understood as follows. When we explore hubs, the exploration process increases drastically, causing the jumps in the first term in (2.4). The negative linear drift is an accumulation of two effects. (1) Because we explore two vertices at each time step, we get a negative drift \(-2t\). (2) The exploration of the low-degree vertices cumulatively causes a linear positive drift \(+t\); the main contribution here comes from the degree-one vertices in the system. Thus, in total, we get a drift of \(-t\) in the exploration process (2.4).
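The following simplified sketch mimics this exploration on a concrete graph. It records a vertex-based analogue of the walk (one vertex explored per step, increment equal to the number of newly discovered neighbours minus one) rather than the paper's half-edge version; starting each new component at the highest-degree unexplored vertex imitates the early discovery of hubs. Components correspond to the excursions of the walk above its running minimum.

```python
from collections import deque

def exploration_walk(adj):
    """Breadth-first exploration walk of a graph given as an adjacency list.

    One vertex is explored per step, and the walk S increases by
    (#newly discovered neighbours) - 1; each component contributes an
    excursion of S above its running minimum.
    """
    seen = [False] * len(adj)
    S = [0]
    for s in sorted(range(len(adj)), key=lambda v: -len(adj[v])):  # hubs first
        if seen[s]:
            continue
        seen[s] = True
        queue = deque([s])
        while queue:
            v = queue.popleft()
            new = [w for w in adj[v] if not seen[w]]
            for w in new:
                seen[w] = True
                queue.append(w)
            S.append(S[-1] + len(new) - 1)
    return S

# Toy usage: one component of size 5 plus an isolated vertex.
adj = [[1, 2, 3], [0], [0], [0, 4], [3], []]
print(exploration_walk(adj))   # [0, 2, 1, 0, 0, -1, -2]
```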

Assumption on the degrees. Assumptions 1 and 2 hold for two interesting special cases of power-law degrees that have received special attention in the literature: Case (I) \(d_i = (1-F)^{-1}(i/n)\), Case (II) \(d_i\)’s are the order statistics of an i.i.d. sample from F. Here F is some distribution function supported on the non-negative integers with \((1-F)(x) = c_{\scriptscriptstyle F} k^{-(\tau -1)},\) for \(k \le x< k+1\), and we recall that the inverse of a bounded non-increasing function \(f:\mathbb {R}\mapsto \mathbb {R}\) is defined as

$$\begin{aligned}\begin{aligned} f^{-1}(x) := \inf \{y:f(y)\le x\}. \end{aligned}\end{aligned}$$

We add a dummy half-edge to vertex 1 if necessary to make \(\sum _{i\in [n]}d_i\) even. However, we ignore this contribution since this does not change any asymptotic calculation below. Recall that we use C as a generic notation for a constant whose value can be different between expressions, and \(a_n \sim b_n\) denotes \(\lim _{n\rightarrow \infty }a_n/b_n = 1\).
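For the step-function F above, the generalized inverse is explicit: \((1-F)^{-1}(u)\) is the smallest integer k with \(c_{\scriptscriptstyle F} k^{-(\tau -1)}\le u\), i.e., \(\lceil (c_{\scriptscriptstyle F}/u)^{\alpha }\rceil \). A short sketch of Case (I) follows (parameter values are illustrative):

```python
import numpy as np

tau, c_F, n = 2.5, 1.0, 10**5
alpha = 1 / (tau - 1)

i = np.arange(1, n + 1)
d = np.ceil((c_F * n / i) ** alpha).astype(int)   # d_i = (1-F)^{-1}(i/n)
if d.sum() % 2:
    d[0] += 1                                     # dummy half-edge, even total

# Assumption 1(i): n^{-alpha} d_i -> theta_i = c_F^alpha * i^{-alpha}.
print(d[:5], "; n^(-alpha) d_1 =", d[0] / n**alpha)
```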

For Case (I), \(d_i \sim (c_{\scriptscriptstyle F}n/i)^\alpha \) for all \(i = o(n)\) and \(d_i \le C(n/i)^\alpha \) for all \(i\in [n]\). Consequently, Assumption 1(i) is satisfied with \(\theta _i = c_{\scriptscriptstyle F}^\alpha i^{-\alpha }\). To see Assumption 1(ii), note that

$$\begin{aligned}\begin{aligned} \frac{1}{n} \sum _{i\in [n]} d_i \sim \int _0^1 (1-F)^{-1}(x) \mathrm {d}x = \mathbb {E}[D], \end{aligned}\end{aligned}$$

where D has distribution function F, and

$$\begin{aligned} \begin{aligned} n^{-2\alpha } \sum _{i>K} d_i^2 \le C \sum _{i>K} i^{-2\alpha } \sim CK^{1-2\alpha } \rightarrow 0 \quad \text { as } K\rightarrow \infty . \end{aligned}\end{aligned}$$
(2.14)

Also, \(D_n\xrightarrow {d}D\) and \(\mathbb {E}[D_n]\rightarrow \mathbb {E}[D]\) imply that \((D_n)_{n\ge 1}\) is uniformly integrable. Turning to Assumption 2, the above computations already verify all the conditions in Assumption 2(i), (ii). To verify Assumption 2(iii), we now show that, for \(t_n = tp_n^{1/(3-\tau )}\) with fixed \(t>0\),

$$\begin{aligned}\begin{aligned}&1 - \mathbb {E}[\mathrm {e}^{-t_nD_n^{\star }}] = \frac{1}{\ell _n} \sum _{k\in [n]} d_k \big (1-\mathrm {e}^{-t_nd_k}\big ) \sim t_n^{\tau - 2} \int _0^\infty c_{\scriptscriptstyle F} z^{-\alpha } (1-\mathrm {e}^{-c_{\scriptscriptstyle F}z^{-\alpha }}) \mathrm {d}z, \end{aligned}\end{aligned}$$

and thus (2.10) holds as well. Let us split the last sum into three parts by restricting to the sets \(\{k: d_k <\varepsilon (t_n)^{-1}\}\), \(\{k: d_k \in [\varepsilon (t_n)^{-1}, (\varepsilon t_n)^{-1}]\}\), and \(\{k: d_k >(\varepsilon t_n)^{-1}\}\), and denote the three parts by (I), (II) and (III), respectively. Using the fact that \(1-\mathrm {e}^{-x} \le x\), it follows that

$$\begin{aligned} \begin{aligned} \frac{(I)}{t_n^{\tau -2}}&\le \frac{t_n^{3-\tau }}{\ell _n} \sum _{k: d_k < \varepsilon (t_n)^{-1}} d_k^2 \sim Cn^{2\alpha - 1}t_n^{3-\tau } \sum _{k\ge Cn (t_n/\varepsilon )^{1/\alpha }}k^{-2\alpha } \\&\sim C n^{2\alpha - 1}t_n^{3-\tau } \int _{Cn (t_n/\varepsilon )^{\tau -1}}^\infty x^{-2\alpha } \mathrm {d}x \sim C n^{2\alpha - 1}t_n^{3-\tau } (Cn (t_n/\varepsilon )^{\tau -1})^{1-2\alpha } \sim C\varepsilon ^{3-\tau }, \end{aligned}\end{aligned}$$
(2.15)

and

$$\begin{aligned} \frac{(III)}{t_n^{\tau -2}} \le \frac{C}{t_n^{(\tau -2)}\ell _n} \sum _{k: d_k > (\varepsilon t_n)^{-1}} d_k \le \frac{C n^{\alpha -1}}{t_n^{\tau -2}} \int _{1}^{Cn (t_n \varepsilon )^{\tau -1}} \frac{\mathrm {d}x}{x^{\alpha }} \sim C \varepsilon ^{\tau -2}. \end{aligned}$$
(2.16)

Also, we compute (II) by

$$\begin{aligned}\begin{aligned} \frac{(II)}{t_n^{\tau -2}}&= \frac{1}{t_n^{\tau -2} \ell _n } \sum _{k: d_k \in [\varepsilon t_n^{-1}, (\varepsilon t_n)^{-1}]} d_k (1-\mathrm {e}^{-t_n d_k}) \\&\sim \frac{n^{\alpha - 1}}{\mu t_n^{\tau -2}} \sum _{k\in [c_0 n (t_n \varepsilon )^{\tau -1}, c_1 n(t_n/\varepsilon )^{\tau -1}]} c_{\scriptscriptstyle F} k^{-\alpha } \big (1-\mathrm {e}^{-t_n (c_{\scriptscriptstyle F}n/k)^{\alpha }}\big )\\&= \frac{1}{n t_n^{\tau -1}} \sum _{z\in [c_0\varepsilon ^{\tau -1}, c_1/\varepsilon ^{\tau -1}]} c_{\scriptscriptstyle F} z^{-\alpha } (1-\mathrm {e}^{-c_{\scriptscriptstyle F}z^{-\alpha }}), \end{aligned}\end{aligned}$$

where we have put \(k = n t_n^{\tau -1} z\), so that the z values increase by \(1/(n t_n^{\tau -1})\) in the final sum. Thus, in the iterated limit \(\lim _{\varepsilon \rightarrow 0} \limsup _{n\rightarrow \infty }\),

$$\begin{aligned}\begin{aligned} \frac{(II)}{t_n^{\tau -2}} \rightarrow \int _0^\infty c_{\scriptscriptstyle F} z^{-\alpha } (1-\mathrm {e}^{-c_{\scriptscriptstyle F}z^{-\alpha }}) \mathrm {d}z = \kappa , \end{aligned}\end{aligned}$$

which yields (2.10) by combining it with (2.15) and (2.16).
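The scaling in (2.10) is also easy to probe numerically in Case (I): for \(p_c \ll p_n \ll 1\), the ratio of \(1-\mathbb {E}[\mathrm {e}^{-t_nD_n^{\star }}]\) to \(t_n^{\tau -2}\) should be roughly constant in \(p_n\), identifying \(\kappa \). The parameter values below are illustrative; with \(n = 2\times 10^6\) and \(\tau = 2.5\), one has \(p_c \approx n^{-1/3} \approx 0.008\), so \(p_n\) is taken well above this scale.

```python
import numpy as np

tau, n, t = 2.5, 2_000_000, 1.0
alpha = 1 / (tau - 1)

i = np.arange(1, n + 1)
d = np.ceil((n / i) ** alpha)                # Case (I) with c_F = 1
l_n = d.sum()

for p in [0.2, 0.1, 0.05]:
    t_n = t * p ** (1 / (3 - tau))
    lhs = np.sum(d * (1 - np.exp(-t_n * d))) / l_n   # 1 - E[exp(-t_n D*)]
    print(f"p = {p:5.2f}: lhs / t_n^(tau-2) = {lhs / t_n ** (tau - 2):.3f}")
```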

Let us now consider Case (II), i.e., the i.i.d. degree setup. We have assumed that the degree sequence is ordered in a non-increasing manner, i.e., \(d_i\) is the i-th largest order statistic of the i.i.d. sample. We use the following construction from [21, Sect. 13.6]. Let \((E_1,E_2,\dots )\) be an i.i.d. sequence of unit-rate exponential random variables and let \(\varGamma _i:= \sum _{j=1}^iE_j\). Let

$$\begin{aligned} {\bar{d}}_i=(1-F)^{-1}(\varGamma _i/\varGamma _{n+1}). \end{aligned}$$
(2.17)

Then \((d_1,\dots ,d_n) {\mathop {=}\limits ^{d}} ({\bar{d}}_1,\dots ,{\bar{d}}_n)\). Now, \(\varGamma _i\) follows a Gamma distribution with shape parameter i and scale parameter 1. Note that, by the strong law of large numbers, \(\varGamma _{n+1}/n \xrightarrow {\scriptscriptstyle \mathrm {a.s.}}1\). Thus, for each fixed \(i\ge 1\), \(\varGamma _{n+1}/(n\varGamma _i)\xrightarrow {\scriptscriptstyle \mathrm {a.s.}}1/\varGamma _i\). Using (2.17), we see that \({\varvec{d}}\) satisfies Assumption 1(i) almost surely with \(\theta _i=(c_{\scriptscriptstyle F}/\varGamma _i)^{\alpha }\). To see that \((\theta _i)_{i\ge 1} \in \ell ^2_{{\scriptscriptstyle \downarrow }} {\setminus } \ell ^1_{{\scriptscriptstyle \downarrow }}\), observe that \(\varGamma _i/i\xrightarrow {\scriptscriptstyle \mathrm {a.s.}}1\), and \(\alpha \in (1/2,1)\). Next, the first condition in Assumption 1(ii) follows from the strong law of large numbers. To see the second condition, we note that \(\sum _i \varGamma _i^{-2\alpha } <\infty \) almost surely. Now, using the fact that \(\varGamma _{n+1}/n\xrightarrow {\scriptscriptstyle \mathrm {a.s.}}1\), we can use arguments identical to (2.14) to show that \(\lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty } n^{-2\alpha }\sum _{i>K}d_i^2=0\) on the event \(\{\sum _{i=1}^{\infty }\varGamma _i^{-2\alpha }<\infty \}\cap \{\varGamma _{n+1}/n\rightarrow 1\}\). Thus, we have shown that the third condition of Assumption 1(ii) holds almost surely. The verification of Assumption 2 is also identical to Case (I) if we do the computations conditionally on the Gamma random variables and use the above asymptotics.
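The construction (2.17) is also immediate to implement; the following sketch (illustrative parameters, with \(c_{\scriptscriptstyle F}=1\)) generates an i.i.d. power-law degree sequence already in non-increasing order and checks the hub scaling \(n^{-\alpha }{\bar{d}}_i \approx (c_{\scriptscriptstyle F}/\varGamma _i)^{\alpha }\).

```python
import numpy as np

rng = np.random.default_rng(2)
tau, c_F, n = 2.5, 1.0, 10**5
alpha = 1 / (tau - 1)

Gamma = np.cumsum(rng.exponential(1.0, size=n + 1))   # Gamma_1, ..., Gamma_{n+1}
u = Gamma[:n] / Gamma[n]                              # Gamma_i / Gamma_{n+1}
d_bar = np.ceil((c_F / u) ** alpha)                   # (1-F)^{-1}(u), non-increasing

ratio = (d_bar[:5] / n**alpha) / (c_F / Gamma[:5]) ** alpha
print(d_bar[:5], ratio)                               # ratios close to 1
```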

Extension to the Norros–Reittu model. A related model where one would expect the same behavior as for the configuration model is the multigraph version of the Norros–Reittu model, or the Poisson graph process [51]. Given a weight sequence \((w_i)_{i\in [n]}\), the Norros–Reittu multigraph is the multigraph generated by putting \(\mathrm {Poisson} (w_iw_j/L_n)\) many edges between vertices i and j, independently, where \(L_n = \sum _{i\in [n]} w_i\). If Assumptions 1 and 2 hold with \((d_i)_{i\in [n]}\) replaced by \((w_i)_{i\in [n]}\), then we expect the same results as described above for percolation on the Norros–Reittu multigraph in the critical and near-critical regimes. We do not pursue the Norros–Reittu multigraph here.

Open Problems. We next state some open problems:

Open Problem 1. Theorem 1 studies convergence of \(\mathbf {Z}_n(\lambda )\) for each fixed \(\lambda \). It will be interesting to study the distribution of \((\mathbf {Z}_n(\lambda ))_{\lambda >0}\) as a stochastic process, when the percolated graphs are coupled through the Harris coupling. In the \(\tau >4\) and \(\tau \in (3,4)\) regimes, such evolution of critical components is described by the so-called augmented multiplicative coalescent process. However, we do not expect the limit to be the augmented multiplicative coalescent here. This is clear from the fact that the scaling limit in (2.4) is not related to the general characterization of exploration processes that arise in relation to multiplicative coalescent in [5]. Heuristically, one would expect that if \(\sum _{i\in {\mathscr {C}}}d_i\mathbb {1}\{i\text { is hub}\}\) denotes the mass of a component, then the components would merge at rate proportional to their masses, but additionally, there are immigrating vertices of degree-one that keep on increasing the component sizes as well. The description of the process, and proving its Feller properties and entrance boundary conditions, are interesting open challenges.

Open Problem 2. Is it possible to prove that the metric structure of components converges in a suitable topology? This question is motivated by a strong notion of structural convergence of critical components that was first established in [1] (\(\tau >4\)) and [13] (\(\tau \in (3,4)\)). Since the components have small distances, it may be natural to consider the local-weak convergence framework. However, the hubs within components have unbounded degrees, which is not covered directly in the local-weak convergence framework.

3 Properties of the Excursions of the Limiting Process

In this section, we prove some good properties of the process (2.4) that allow us to conclude the convergence of the largest excursion lengths from the stochastic process convergence. In Sect. 3.1, we identify these good properties for functions in \({\mathbb {D}}_+[0,\infty )\) that ensure continuity of the largest excursion map. Then, we prove in Sect. 3.2 that \(\mathbf {S}_{\infty }^\lambda \) satisfies these good properties almost surely.

3.1 Continuity of the largest excursion map

Recall the definitions of excursions from (2.3). Also, recall from Sect. 2.1.2 that \(\underline{f}(t) = \inf _{u\le t} f(u)\) and \(\mathscr {Z}_f = \{t: f(t) = \underline{f}(t)\}\). Define the set of excursions of f as

$$\begin{aligned} {\mathscr {E}}_f:= \{(l,r): (l,r) \text { is an excursion of }f\}. \end{aligned}$$

We denote the set of excursion begin-points (or left-points) and end-points (or right-points) by \({\mathscr {L}}_f\) and \({\mathscr {R}}_f\) respectively, i.e.,

$$\begin{aligned} {\mathscr {L}}_f:= \{l\ge 0: (l,r)\in {\mathscr {E}}_f \text { for some }r\}\quad \text {and}\quad {\mathscr {R}}_f:= \{r\ge 0: (l,r)\in {\mathscr {E}}_f\text { for some }l\}. \end{aligned}$$

We will use the following elementary fact:

Fact 1

Let \(f\in {\mathbb {D}}_+[0,\infty )\). Then, for all \(r\in {\mathscr {R}}_f{\setminus } \{\infty \}\), f is continuous at r. Consequently, \(r\in \mathscr {Z}_f\).

Proof

Using the right-continuity of f, it suffices to show that \(f(r) = f(r-)\). Suppose that is not the case. Since f has positive jumps only, we must have that \(f(r-) < f(r)\). Since r is an excursion ending point, there exists \(\varepsilon >0\) such that \(f(t) - \underline{f}(t) > 0 \) for all \(t\in (r-\varepsilon ,r)\). On the other hand, using the right-continuity of f and the fact that \(f(r) >f(r-)\), we obtain that \(f(t) - \underline{f}(t) > 0 \) for all \(t\in [r,r+\varepsilon )\) for some \(\varepsilon >0\). Thus, there exists a sufficiently small \(\varepsilon >0\) such that \(f(t) - \underline{f}(t) > 0 \) for all \(t\in (r-\varepsilon ,r+\varepsilon )\). This contradicts the fact that \(r \in \mathrm {cl}(\mathscr {Z}_f){\setminus } \{\infty \}\). \(\quad \square \)

For \(f\in {\mathbb {D}}_+[0,\infty )\), let \(\phi _i(f)\) be the length of the i-th largest excursion of f. Also, let \({\mathscr {A}}_i(f)\) denote the area under the i-th largest excursion of f. We will show that if \(f_n\rightarrow f\) in \({\mathbb {D}}[0,\infty )\), then \(\phi _i\) and \({\mathscr {A}}_i\) converge when the limiting function has some good properties. Let us start by describing these good properties:

Definition 1

(Good functions). A function \(f\in {\mathbb {D}}_+[0,\infty )\) is said to be good if the following holds:

  1. (a)

    For all \(r\in {\mathscr {R}}_f{\setminus } \{\infty \}\), r is not a local minimum of f.

  2. (b)

    There does not exist any interval \((q_1,q_2)\) with \(q_1,q_2\in \mathbb {Q}_+\) such that \((q_1,q_2) \subset \mathscr {Z}_f\).

  3. (c)

    For all \((l,r) \in {\mathscr {E}}_f\) with \(r<\infty \), there exists \(\varepsilon _0=\varepsilon _0(l,r)>0\) such that the following holds for all \(\varepsilon \in (0,\varepsilon _0)\): There exists \(\delta = \delta (\varepsilon ,l,r)>0\) such that

    $$\begin{aligned} f(t)> f(r)+\delta \quad \forall t\in (l+\varepsilon ,r-\varepsilon ). \end{aligned}$$
    (3.1)
  4. (d)

    f does not have any infinite excursion, i.e., \(\phi _1(f)<\infty \).

  5. (e)

    For any \(\delta >0\), f has only finitely many excursions of length at least \(\delta \).

  6. (f)

    For all \(i\ge 1\), \(\phi _{i+1} (f)< \phi _i(f)\).
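On a discretized path, the maps \(\phi _i\) and \({\mathscr {A}}_i\) are simple to compute: track the running minimum, locate the maximal intervals on which \(f-\underline{f}\) is positive, and record their lengths and areas. A hedged sketch follows (the grid discretization and the tolerance are artifacts of the illustration, not part of the definitions above):

```python
import numpy as np

def largest_excursions(f, dt, tol=1e-9, k=3):
    """Ordered (length, area) pairs of the excursions of f above its
    running minimum, for a path f sampled on a grid of mesh dt."""
    run_min = np.minimum.accumulate(f)
    above = (f - run_min) > tol        # strictly inside an excursion
    exc, start = [], None
    for j, a in enumerate(above):
        if a and start is None:
            start = max(j - 1, 0)      # excursion begins at the previous zero
        elif not a and start is not None:
            exc.append(((j - start) * dt,
                        np.sum(f[start:j] - run_min[start:j]) * dt))
            start = None
    if start is not None:              # path ends mid-excursion
        exc.append(((len(f) - start) * dt,
                    np.sum(f[start:] - run_min[start:]) * dt))
    return sorted(exc, reverse=True)[:k]

# Toy usage on a deterministic path with a few excursions.
t = np.linspace(0.0, 10.0, 10001)
print(largest_excursions(np.sin(t) - 0.1 * t, t[1] - t[0]))
```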

Lemma 1

Suppose that \(f\in {\mathbb {D}}_+[0,\infty )\) is good. Further, let \((f_n)_{n\ge 1} \subset {\mathbb {D}}[0,\infty )\) be such that \(f_n\rightarrow f\) in \({\mathbb {D}}[0,\infty )\). Moreover, let \(\limsup _{n\rightarrow \infty } \phi _1(f_n) < \infty \), and if \(z_n(T)\) denotes the length of the largest excursion of \(f_n\) starting after T, then \(\lim _{T\rightarrow \infty }\limsup _{n\rightarrow \infty } z_n(T) = 0\). Then, for all \(m\ge 1\), as \(n\rightarrow \infty \),

$$\begin{aligned}\begin{aligned} (\phi _i(f_n))_{i\in [m]} \rightarrow (\phi _i(f))_{i\in [m]}, \quad \text {and} \quad ({\mathscr {A}}_i(f_n))_{i\in [m]} \rightarrow ({\mathscr {A}}_i(f))_{i\in [m]}. \end{aligned}\end{aligned}$$

Proof

The proof here is for \(m=1\); for \(m>1\), we can proceed inductively. Using Definition 1(d), (e), as well as the assumptions of the lemma, we can take \(T>0\) and \(n_0\ge 1\) large so that the largest excursions of \(f_n\) and f end before T for all \(n\ge n_0\). Let \(\mathfrak {L}\) denote the set of continuous functions \(\varLambda :[0,\infty )\rightarrow [0,\infty )\) that are strictly increasing and satisfy \(\varLambda (0)=0, \varLambda (T)=T\). Suppose \((l,r)\) is the longest excursion of f on [0, T], and thus \(\phi _1(f)=r-l\). We will first show that \(\lim _{n\rightarrow \infty }\phi _1(f_n)=\phi _1(f)\).

Fix \(\varepsilon ,\delta >0\) such that (3.1) holds. Let \(||\cdot ||_{\scriptscriptstyle T}\) denote the sup-norm on [0, T]. Recall the definition of the metric for Skorohod \(J_1\)-topology from [14, (12.13)]. Since \(f_n\rightarrow f\) in \({\mathbb {D}}[0,T]\), there exists \((\varLambda _n)_{n\ge 1} \subset \mathfrak {L}\), and \(n_1\ge n_0\) such that for all \(n\ge n_1\),

$$\begin{aligned} ||f_n\circ \varLambda _n-f||_{\scriptscriptstyle T}< \frac{\delta }{2}\quad \text {and}\quad ||\varLambda _n-I||_{\scriptscriptstyle T}< \varepsilon , \end{aligned}$$
(3.2)

where I is the identity function. Using (3.1) and (3.2), for all \(t\in (l+\varepsilon ,r-\varepsilon )\) and \(n\ge n_1\),

$$\begin{aligned}\begin{aligned} f_n\circ \varLambda _n (t)> f(t) - \frac{\delta }{2} > f(r) + \frac{\delta }{2} = \underline{f}(r) + \frac{\delta }{2}, \end{aligned}\end{aligned}$$

where the last equality is due to \(r\in \mathscr {Z}_f\) from Fact 1. Thus, using \(||\varLambda _n-I||_{\scriptscriptstyle T}< \varepsilon \) from (3.2),

$$\begin{aligned} f_n(t)> \underline{f}(r)+\frac{\delta }{2}\quad \forall t \in (l+2\varepsilon , r-2\varepsilon ). \end{aligned}$$
(3.3)

Next, note that the infimum operation is continuous in the Skorohod \(J_1\)-topology [56, Theorem 13.4.1], and thus \(\underline{f}_n \rightarrow \underline{f}\) in \({\mathbb {D}}[0,T]\). Moreover, using (2.2), \(\underline{f} \in {\mathbb {C}}[0,T]\), and therefore, there exists \(n_2\ge n_0\), such that for all \(n\ge n_2\)

$$\begin{aligned} \begin{aligned} \Vert \underline{f}_n - \underline{f}\Vert _{\scriptscriptstyle T} < \frac{\delta }{4}. \end{aligned}\end{aligned}$$
(3.4)

Using \(\underline{f}(t) = \underline{f}(r)\) for all \(t\in [l,r]\), this implies that, for all \(n\ge n_2\),

$$\begin{aligned} \underline{f}(r) = \underline{f}(t)> \underline{f}_n(t)-\frac{\delta }{4}\quad \forall t \in (l+2\varepsilon , r-2\varepsilon ), \end{aligned}$$

and consequently (3.3) yields that for all \(n\ge \max \{n_1,n_2\}\)

$$\begin{aligned} \begin{aligned} f_n(t)-\underline{f}_n(t) > \frac{\delta }{4}\quad \forall t \in (l+2\varepsilon , r-2\varepsilon ). \end{aligned}\end{aligned}$$
(3.5)

Thus,

$$\begin{aligned} \liminf _{n\rightarrow \infty }\phi _1(f_n)\ge r-l-4\varepsilon = \phi _1(f)-4\varepsilon , \end{aligned}$$
(3.6)

which provides the required lower bound. We now turn to a suitable upper bound on the quantity \(\limsup _{n\rightarrow \infty }\phi _1(f_n)\). We claim that, using Definition 1(b), one can find \(r_1,\dots ,r_k\in {\mathscr {R}}_f\) such that \(r_1\le \phi _1(f)+\varepsilon , T-r_k< \phi _1(f)+\varepsilon , \) and \(r_i-r_{i-1}\le \phi _1(f)+\varepsilon , \forall i=2, \dots ,k\). Indeed, since \(\phi _1(f)\) is the largest excursion length of f, if there is no excursion end-point in between 0 and \(\phi _1(f)+\varepsilon \), then there is no excursion begin-point in \([0,\varepsilon )\). The latter shows that the interval \([0,\varepsilon )\) is contained in \(\mathscr {Z}_f=\{t: f(t) = \underline{f}(t)\}\), which contradicts Definition 1(b). The existence of the points \(r_2,\dots ,r_k\) can be shown inductively by a similar argument. Let \(l_1,\dots ,l_k\) be the excursion begin-points corresponding to the endpoints \(r_1,\dots ,r_k\). We will show that, for all i, \(f_n\) will have an excursion begin-point in \((l_i-4\varepsilon ,l_i+2\varepsilon )\) and end-point in \((r_i-3\varepsilon ,r_i+2\varepsilon )\), so that the largest excursion of \(f_n\) is contained inside one of the intervals \((l_i-4\varepsilon ,r_i+2\varepsilon )\) for \(i\in [k]\).

Using Definition 1(a), \(r_i\) is not a local minimum, and thus for any \(\varepsilon >0\) (sufficiently small), there exists \(\delta >0\) and \(t_i\in (r_i,r_i+\varepsilon )\) such that \(f(r_i)-f(t_i)> \delta \). We also let \(\delta >0\) be sufficiently small such that (3.2) holds. Thus, using (3.2), for all \(n\ge n_1\),

$$\begin{aligned} f(r_i)-f_n(\varLambda _n(t_i))\ge f(r_i)-f(t_i) -\frac{\delta }{2} > \frac{\delta }{2}. \end{aligned}$$

Since \(t_i\in (r_i,r_i+\varepsilon )\), we have that \(t_i^n = \varLambda _n(t_i)\in (r_i-\varepsilon ,r_i+2\varepsilon )\). Thus, for all \(n\ge n_1\), there exists a point \(t_i^n\in (r_i - \varepsilon ,r_i+2\varepsilon )\) such that

$$\begin{aligned} f(r_i)-f_n(t_i^n)> \frac{\delta }{2}. \end{aligned}$$
(3.7)

Next, using (3.4),

$$\begin{aligned} \begin{aligned} \underline{f}_n(r_i-3\varepsilon )\rightarrow \underline{f}(r_i-3\varepsilon ) \ge \underline{f}(r_i)= f(r_i), \end{aligned}\end{aligned}$$
(3.8)

since \(r_i\in \mathscr {Z}_f\). Combining (3.7) and (3.8), we see that \(\underline{f}_n(r_i-3\varepsilon ) > \underline{f}_n(t_i^n)\), and by (3.5), we also have that \(f_n(r_i-3\varepsilon ) - \underline{f}_n(r_i-3\varepsilon )>0\), when \(\varepsilon >0\) is so small that \(r_i-3\varepsilon >l_i+2\varepsilon \). Thus \(f_n\) must have an excursion end-point in \((r_i-3\varepsilon ,r_i+2\varepsilon )\). Also, using Definition 1(b), f has an excursion end-point \(r^0_i\in (l_i-\varepsilon ,l_i)\). The previous argument shows that \(f_n\) has to have an excursion end-point \(r_{i}^{0n} \in (r^0_i-3\varepsilon , r^0_i+2\varepsilon )\), and thus by (3.5), \(f_n\) must have an excursion begin-point in \((r_{i}^{0n},l_i+2\varepsilon )\subset (l_i-4\varepsilon ,l_i+2\varepsilon )\). Therefore,

$$\begin{aligned} \limsup _{n\rightarrow \infty } \phi _1(f_n)\le \max _{i\in [k]} (r_i-l_i) + 6\varepsilon \le \max _{i\in [k]} (r_i-r_{i-1}) + 6\varepsilon \le \phi _1(f)+7\varepsilon . \end{aligned}$$
(3.9)

Hence, the convergence of the largest excursion length follows from (3.6) and (3.9).

Next, we show that \(\lim _{n\rightarrow \infty } {\mathscr {A}}_1(f_n) = {\mathscr {A}}_1(f)\). Let \(e= (l,r)\) be the largest excursion of f. Using (3.5), the interval \((l+2\varepsilon ,r-2\varepsilon )\) is part of some excursion of \(f_n\). Let us denote this excursion by \(e_n = (L_n(e),R_n(e))\). We will show that \(e_n\) is the largest excursion of \(f_n\) when n is large. Indeed, the arguments above already show that

$$\begin{aligned}\begin{aligned} l-4\varepsilon \le L_n (e)\le l+2\varepsilon , \quad \text {and} \quad r-3\varepsilon \le R_n(e) \le r+2\varepsilon , \end{aligned}\end{aligned}$$

and thus \(R_n(e) - L_n(e) \ge r-l-5\varepsilon \). Now, using Definition 1(f), we can take \(\varepsilon >0\) sufficiently small such that \(\phi _2(f_n) < r-l -5\varepsilon \) for all sufficiently large n. Thus, \(e_n = (L_n(e), R_n(e))\) must be the largest excursion of \(f_n\). The convergence of \({\mathscr {A}}_1(f_n)\) follows by using \(f_n\rightarrow f\) in \({\mathbb {D}}[0,\infty )\) together with \(L_n(e) \rightarrow l\) and \(R_n(e) \rightarrow r\) as \(n\rightarrow \infty \). \(\quad \square \)

Remark 5

We emphasize that the strict ordering between excursion lengths in Definition 1(f) is only used in the convergence of \({\mathscr {A}}_i(f_n)\). This ensures that the locations of the largest excursions of \(f_n\) and f approximately coincide, which is strictly stronger than requiring the convergence of excursion lengths.

Next, we define what it means for a stochastic process \(\mathbf {X} \in {\mathbb {D}}_+[0,\infty )\) to be good:

Definition 2

(Good stochastic process). A stochastic process \(\mathbf {X}\) with sample paths in \({\mathbb {D}}_+[0,\infty )\) is said to be good if the sample path satisfies all the conditions of Definition 1 almost surely.

The following is a direct consequence of Lemma 1:

Proposition 1

Consider a sequence of stochastic processes \((\mathbf {X}_n)_{n\ge 1}\) and a good stochastic process \(\mathbf {X}\) such that \(\mathbf {X}_n \xrightarrow {\scriptscriptstyle d} \mathbf {X}\). Also, let \((\phi _1(\mathbf {X}_n))_{n\ge 1}\) be tight, and if \(Z_n(T)\) denotes the length of the largest excursion of \(\mathbf {X}_n\) starting after time T, then assume that, for any \(\varepsilon >0\), \(\lim _{T\rightarrow \infty }\limsup _{n\rightarrow \infty } \mathbb {P}(Z_n(T) >\varepsilon )= 0\). Then, for all \(m\ge 1\),

$$\begin{aligned}\begin{aligned} \big (\phi _i(\mathbf {X}_n), {\mathscr {A}}_i(\mathbf {X}_n)\big )_{i\in [m]}\xrightarrow {d}\big (\phi _i(\mathbf {X}), {\mathscr {A}}_i(\mathbf {X})\big )_{i\in [m]}. \end{aligned}\end{aligned}$$
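To make these functionals concrete, the following minimal sketch (our own illustration, not part of the paper) extracts approximations of the excursion lengths \(\phi _i\) and the areas \({\mathscr {A}}_i\) from a path sampled on a uniform time grid; the function name and the discretization are ours.

```python
def excursion_stats(ts, xs):
    """Excursions of a discretized path above its running minimum.

    ts: uniformly spaced times, xs: path values. Returns (length, area)
    pairs sorted decreasingly, approximating (phi_i, A_i) of Definition 1.
    An excursion still open at the end of the grid is discarded.
    """
    dt = ts[1] - ts[0]
    run_min, start, area, stats = float("inf"), None, 0.0, []
    for t, x in zip(ts, xs):
        if x <= run_min:              # at (or below) the running minimum
            if start is not None:     # an excursion ends here
                stats.append((t - start, area * dt))
                start = None
            run_min = x
        else:
            if start is None:         # strictly above the minimum: begin
                start, area = t, 0.0
            area += x - run_min       # Riemann sum for the excursion area
    return sorted(stats, reverse=True)
```

On a path satisfying Definition 1, refining the grid recovers \((\phi _i, {\mathscr {A}}_i)\); the sorting step implicitly uses the strict ordering of excursion lengths guaranteed by Definition 1(f).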

3.2 The limiting process is good almost surely

In this section, we show that the sample paths of \(\mathbf {S}_{\infty }^\lambda \) are good almost surely. Throughout this section, we assume without loss of generality that \(\mu =1\) and \(\sum _i \theta _i^2 = 1\) to simplify notation. An identical proof works for general \(\mu \) and \(\varvec{\theta }\) after replacing \(\lambda \) with \(\lambda '=\lambda \mu /\sum _{i}\theta _i^2\). Consider the sigma-field \(\mathscr {F}_t = \sigma (\{\xi _i\le s\}:s\le t, i\ge 1)\), where \((\xi _i)_{i\ge 1}\) are the exploration random variables used in the definition of \(\mathbf {S}_{\infty }^\lambda \) in (2.4), and, for a collection of sets \({\mathscr {A}}\), \(\sigma ({\mathscr {A}})\) denotes the smallest sigma-algebra containing all the sets in \({\mathscr {A}}\). Then \((\mathscr {F}_t)_{t\ge 0}\) is a filtration and \(\mathbf {S}_{\infty }^\lambda \) is adapted to it. Our goal is stated formally in the following proposition:
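As an illustration (ours, not from the paper), the following sketch simulates a truncated sample path of \(\mathbf {S}_{\infty }^\lambda \) in the normalized form \(\lambda \sum _i \theta _i {\mathscr {I}}_i(t) - t\) used throughout this section (cf. (3.11)); keeping only finitely many \(\theta _i\) and the specific choice \(\theta _i = i^{-0.55}\) are our own simplifications.

```python
import random

def thinned_levy_path(theta, lam, t_max, n_grid=4000, seed=1):
    """Truncated sample path of S_infinity^lambda with mu = 1:
    S(t) = lam * sum_i theta_i * 1{xi_i <= t} - t,  xi_i ~ Exp(theta_i)."""
    rng = random.Random(seed)
    jumps = sorted((rng.expovariate(th), lam * th) for th in theta)
    ts, xs, s, j = [], [], 0.0, 0
    for k in range(n_grid + 1):
        t = k * t_max / n_grid
        while j < len(jumps) and jumps[j][0] <= t:
            s += jumps[j][1]          # an upward jump of size lam * theta_i
            j += 1
        ts.append(t)
        xs.append(s - t)              # unit downward drift between jumps
    return ts, xs

# theta in l2 but not in l1, mimicking the tau in (2,3) regime (illustrative):
theta = [i ** (-0.55) for i in range(1, 501)]
ts, xs = thinned_levy_path(theta, lam=2.0, t_max=15.0)
```

Feeding such a path to the excursion_stats sketch above yields approximations of the pairs \((\phi _i, {\mathscr {A}}_i)\) appearing in Proposition 1.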

Proposition 2

The sample paths of \(\mathbf {S}_{\infty }^\lambda \) satisfy the conditions of Definition 1 almost surely.

Proof of Proposition 2. The verification of each of the conditions in Definition 1 is given separately below.

Verification of Definition 1(a). Let \(q\in \mathbb {Q}_+\) and define the random time \(T_q = \inf \{t\ge q: S_{\infty }^\lambda (t) = \inf _{u\le q} S_{\infty }^\lambda (u)\}\). We will show that, almost surely,

$$\begin{aligned} \begin{aligned} \inf \{t>0: S_{\infty }^\lambda (T_q+t) - S_{\infty }^\lambda (T_q)<0\}=0, \quad \text { on }\{T_q<\infty \}, \text { for all } q\in \mathbb {Q}_+. \end{aligned}\end{aligned}$$
(3.10)

Note that if q lies in the interior of some finite-length excursion then \(T_q\in (q,\infty )\), and also \(T_q\) is the end-point of that excursion. Therefore, \(\{T_q:q\in \mathbb {Q}_+\text { and }T_q<\infty \}\) contains the set of excursion end-points of \(\mathbf {S}_{\infty }^\lambda \). Now, (3.10) ensures that \(T_q\) is not a local minimum because we can find u arbitrarily close to \(T_q\) such that \(S_{\infty }^\lambda (u) < S_{\infty }^\lambda (T_q)\). Hence, Definition 1(a) holds for \(\mathbf {S}_{\infty }^\lambda \) almost surely.

Thus it suffices to prove (3.10). Since \(\mathbb {Q}_+\) is countable, it is enough to prove (3.10) for each fixed \(q\in \mathbb {Q}_+\). Let \(V_q = \{i: {\mathscr {I}}_i (T_q) = 1\}\). Note that \(T_q\) is a stopping time. Moreover, conditionally on the sigma-field \(\mathscr {F}_{T_q}\), the process \((S_{\infty }^\lambda (T_q+t) - S_{\infty }^\lambda (T_q))_{t\ge 0}\) is distributed as \(\hat{\mathbf {S}}_{\infty }^\lambda \) given by

$$\begin{aligned} \begin{aligned} {\hat{S}}_{\infty }^\lambda (t) = \lambda \sum _{i\notin V_q} \theta _i{\mathscr {I}}_i(t)- t. \end{aligned}\end{aligned}$$
(3.11)

Define \(L(t) = \lambda \sum _{i=1}^{\infty } \theta _i {\mathscr {N}}_i(t) -t,\) where \(({\mathscr {N}}_i(t))_{t\ge 0}\) is a rate-\(\theta _i\) Poisson process, independently for different i. We assume that \(\hat{\mathbf {S}}_{\infty }^\lambda \) and \(\varvec{L}\) are coupled by taking \({\mathscr {I}}_i(s) = \mathbb {1}\{{\mathscr {N}}_i(s)\ge 1\}\), so that \({\hat{S}}_{\infty }^{\lambda }(t)\le L(t)\) for all \(t\ge 0\) almost surely. Thus, if \(R_0 = \inf \{t> 0: L(t)<0\}\), then it suffices to show that

$$\begin{aligned} \mathbb {P}(R_0 =0) = 1, \end{aligned}$$
(3.12)

and (3.10) follows. Fix \(\varepsilon >0\) and \(K\ge 1\). Then,

$$\begin{aligned} \begin{aligned} \mathbb {P}(R_0 \le \varepsilon )&\ge \mathbb {P}(L(\varepsilon )<0 ) \\&\ge \mathbb {P}\bigg (\lambda \sum _{i=K+1}^{\infty } \theta _i {\mathscr {N}}_i(\varepsilon )< \varepsilon , \text { and } {\mathscr {N}}_i(\varepsilon ) = 0, \ \forall i\in [K] \bigg )\\&=\prod _{i=1}^K\mathbb {P}({\mathscr {N}}_i(\varepsilon ) = 0) \times \mathbb {P}\bigg (\lambda \sum _{i=K+1}^{\infty } \theta _i {\mathscr {N}}_i(\varepsilon ) < \varepsilon \bigg ) \\&=\mathrm {e}^{-\varepsilon \sum _{i = 1}^K\theta _i } \bigg (1- \mathbb {P}\bigg (\lambda \sum _{i=K+1}^{\infty } \theta _i {\mathscr {N}}_i(\varepsilon ) \ge \varepsilon \bigg ) \bigg ) \\&\ge \mathrm {e}^{-\varepsilon \sum _{i = 1}^K\theta _i }\bigg (1-\frac{\lambda }{\varepsilon } \mathbb {E}\bigg [\sum _{i=K+1}^{\infty } \theta _i {\mathscr {N}}_i(\varepsilon )\bigg ]\bigg ) = \mathrm {e}^{-\varepsilon \sum _{i = 1}^K\theta _i }\bigg (1-\lambda \sum _{i=K+1}^{\infty } \theta _i^2 \bigg ), \end{aligned}\end{aligned}$$
(3.13)

where the one-but-last step follows from Markov’s inequality. Thus, using the fact that \(\{R_0 \le \varepsilon \}\searrow \{R_0 = 0\}\), as \(\varepsilon \searrow 0\),

$$\begin{aligned}\begin{aligned} \mathbb {P}(R_0 =0) = \lim _{\varepsilon \searrow 0}\mathbb {P}(R_0 \le \varepsilon ) \ge 1-\lambda \sum _{i=K+1}^{\infty } \theta _i^2, \end{aligned}\end{aligned}$$

and since the above holds for any \(K\ge 1\), and \(\sum _{i}\theta _i^2<\infty \), we have proved (3.12).

Verification of Definition 1(b). Next, we verify that Definition 1(b) holds almost surely for \(\mathbf {S}_{\infty }^\lambda \). Since \(\mathbb {Q}_+\) is countable, we may again work with fixed \(q_1,q_2\in \mathbb {Q}_+\), i.e., it suffices to prove that \((q_1,q_2) \not \subset \{t: S_{\infty }^\lambda (t) = \inf _{u\le t} S_{\infty }^\lambda (u)\}\) almost surely. By the description of our thinned Lévy process, it has positive jumps only, and if there is a jump of size \(\theta _i\) at time t, then \(S_{\infty }^\lambda (t+\theta _i/2) > \inf _{u\le t} S_{\infty }^\lambda (u)= \inf _{u\le t+\theta _i/2} S_{\infty }^\lambda (u)\). Therefore, if \((q_1,q_2) \subset \{t: S_{\infty }^\lambda (t) = \inf _{u\le t} S_{\infty }^\lambda (u)\}\), then there is no \(\xi _i\) such that \(\xi _i\in (q_1,q_2)\). We compute

$$\begin{aligned}\begin{aligned} \mathbb {P}(\forall i\ge 1: \xi _i \notin (q_1,q_2))&= \prod _{i=1}^\infty \mathbb {P}(\xi _i \notin (q_1,q_2))=\prod _{i=1}^\infty (1- \mathrm {e}^{-\theta _i q_1} + \mathrm {e}^{-\theta _i q_2}) \\&= \exp \bigg (\sum _{i=1}^\infty \log \Big (1- \mathrm {e}^{-\theta _i q_1} \big (1- \mathrm {e}^{-\theta _i (q_2-q_1)}\big )\Big ) \bigg ) \\&\le \exp \bigg ( - \mathrm {e}^{-\theta _1q_1}\sum _{i=1}^\infty (1- \mathrm {e}^{-\theta _i (q_2-q_1)}) \bigg ) = 0, \end{aligned}\end{aligned}$$

where the one-but-last step follows using \(\log (1-x) \le -x\) for all \(x\in (0,1)\) and \(\mathrm {e}^{-\theta _iq_1}\ge \mathrm {e}^{-\theta _1q_1}\) for all \(i\ge 1\), and the last step uses the fact that \(\sum _{i=1}^\infty (1- \mathrm {e}^{-\theta _i (q_2-q_1)}) = \infty \), which follows by applying the limit comparison test together with \((1- \mathrm {e}^{-\theta _i (q_2-q_1)})/\theta _i \rightarrow q_2-q_1\) as \(i\rightarrow \infty \), and \(\sum _{i=1}^\infty \theta _i = \infty \). Thus we have verified that Definition 1(b) holds almost surely for \(\mathbf {S}_{\infty }^\lambda \).

Verification of Definition 1(c). As above, for any \(q\in \mathbb {Q}_+\), define the stopping time \(T_q(\varepsilon ) = \inf \{t\ge q: S_{\infty }^\lambda (t) \le \inf _{u\le q} S_{\infty }^\lambda (u)+ \varepsilon \}\). Thus, \(T_q(\varepsilon )>q\) if \(S_{\infty }^\lambda (q) > \inf _{u\le q} S_{\infty }^\lambda (u)+ \varepsilon \), and \(T_q(\varepsilon ) = q\) otherwise. Observe that \(T_q(\varepsilon )<\infty \) almost surely, since there are no infinite excursions. We claim that it is sufficient to prove

$$\begin{aligned} \lim _{\varepsilon \searrow 0} \mathbb {P}(\mathbf {S}_{\infty }^\lambda \text { has an excursion end-point in }(T_q(\varepsilon ),T_q(\varepsilon )+2\varepsilon ), \text { and } T_q(\varepsilon )<\infty ) =1. \end{aligned}$$
(3.14)

Let \(T_q^-:=\inf \{t> q: S_{\infty }^\lambda (t-) = \inf _{u\le q} S_{\infty }^\lambda (u)\}\). Indeed, if q lies inside some excursion, i.e., \(S_{\infty }^\lambda (q) > \inf _{u\le q} S_{\infty }^\lambda (u)\), then \(T_q(\varepsilon ) \nearrow T_q^-\) as \(\varepsilon \searrow 0\), and (3.14) shows that \(T_q^{-}\) must be an excursion end-point with probability 1. Now, if \(\mathbf {S}_{\infty }^\lambda \) contains an excursion (l, r) having a point \(t\in (l,r)\) such that \(S_{\infty }^\lambda (t-) = \inf _{u\le l} S_{\infty }^\lambda (u) = \inf _{u\le t} S_{\infty }^\lambda (u)\), then there exists some \(q\in \mathbb {Q}_+\cap (l,r)\) such that \(T_q^-\) is not an excursion end-point. The latter has zero probability, as we showed above. This completes the verification of Definition 1(c).

It remains to prove (3.14). As before, let \(L(t) = \lambda \sum _{i=1}^{\infty } \theta _i {\mathscr {N}}_i(t) -t\), and let us also work under the coupling under which \(S_{\infty }^\lambda (T_q(\varepsilon ) +t) - S_{\infty }^\lambda (T_q(\varepsilon )) \le L(t)\) for all \(t\ge 0\) almost surely. We have \(S_{\infty }^\lambda (T_q(\varepsilon )) \le \inf _{u\le q}S_{\infty }^\lambda (u) + \varepsilon \) on \(\{T_q(\varepsilon )<\infty \}\), since the process has only positive jumps. Also, if \(L(2\varepsilon ) < -\varepsilon \), then \(S_{\infty }^\lambda (T_q(\varepsilon ) +2\varepsilon ) < S_{\infty }^\lambda (T_q(\varepsilon ))-\varepsilon \le \inf _{u\le q}S_{\infty }^\lambda (u) \), and consequently \(\inf _{u\le T_q(\varepsilon )+2\varepsilon } S_{\infty }^\lambda (u)< \inf _{u\le T_q(\varepsilon )} S_{\infty }^\lambda (u) = \inf _{u\le q} S_{\infty }^\lambda (u)\). Therefore, the event in (3.14) holds. Thus, using identical computations as (3.13), it follows that

$$\begin{aligned}\begin{aligned}&\mathbb {P}(\mathbf {S}_{\infty }^\lambda \text { has an excursion end-point in }(T_q(\varepsilon ),T_q(\varepsilon )+2\varepsilon ), \text { and } T_q(\varepsilon )<\infty ) \\&\ge \mathbb {P}(L(2\varepsilon )< -\varepsilon ) = \mathbb {P}\bigg (\lambda \sum _{i=1}^\infty \theta _i{\mathscr {N}}_i(2\varepsilon )<\varepsilon \bigg ) \ge \mathrm {e}^{-2\varepsilon \sum _{i = 1}^K\theta _i }\bigg (1-2\lambda \sum _{i=K+1}^{\infty } \theta _i^2 \bigg ), \end{aligned}\end{aligned}$$

and (3.14) follows by taking the iterated limit \(\lim _{K\rightarrow \infty }\lim _{\varepsilon \rightarrow 0}\), and using \(\sum _i\theta _i^2<\infty \).

Verification of Definition 1(d). We start by providing the martingale decomposition for \(\mathbf {S}_{\infty }^\lambda \):

Lemma 2

The process \(\mathbf {S}_{\infty }^\lambda \) admits the Doob-Meyer decomposition \(S_{\infty }^\lambda (t) = M(t)+A(t)\) with the drift term A(t) and the quadratic variation for the martingale term \(\langle M\rangle (t)\) given by

$$\begin{aligned} A(t) = \lambda \sum _{i=1}^\infty \theta _i^2\min \{\xi _i,t\}-t, \qquad \langle M\rangle (t) = \lambda ^2\sum _{i=1}^\infty \theta _i^3\min \{\xi _i,t\}. \end{aligned}$$

Proof

Define \(M_i(t) = \mathbb {1}_{\left\{ \xi _i\le t\right\} }-\theta _i\min \{\xi _i,t\}\). Then

$$\begin{aligned} \begin{aligned} (M_i(t))_{t\ge 0} \quad \text {is a martingale.} \end{aligned}\end{aligned}$$
(3.15)

Indeed, note that \(M_{i}(t+s) - M_i(t) = 0\) if \(\xi _i \le t\). Thus,

$$\begin{aligned} \begin{aligned} \mathbb {E}[M_i(t+s) - M_i(t) \mid \mathscr {F}_t]&= \mathbb {E}[\mathbb {1}_{\left\{ t<\xi _i\le t+s\right\} }-\theta _i(\min \{\xi _i,t+s\} - \min \{\xi _i,t\}) \mid \xi _i>t] \\&=\mathbb {E}[\mathbb {1}_{\left\{ t<\xi _i\le t+s\right\} }-\theta _i \min \{\xi _i-t,s\} \mid \xi _i>t] \\&= \mathbb {P}(0<\xi _i \le s) - \theta _i\mathbb {E}[\min \{\xi _i,s\}], \end{aligned}\end{aligned}$$
(3.16)

where the last step follows from the memoryless property of the exponential distributions. Now, using the fact that \(\int x\mathrm {e}^{-ax} \mathrm {d}x= -\mathrm {e}^{-ax}(ax+1)/a^2\), one can verify that \(\theta _i\mathbb {E}[\min \{\xi _i,s\}] = 1- \mathrm {e}^{-\theta _i s}\). Applying this to (3.16), we can conclude that \(\mathbb {E}[M_i(t+s) - M_i(t) \vert \mathscr {F}_t] =0\), thus verifying (3.15). Moreover, the quadratic variation of \((M_i(t))_{t\ge 0}\) is given by

$$\begin{aligned} \langle M_i\rangle (t) = \theta _i\min \{\xi _i,t\}. \end{aligned}$$
(3.17)

This follows from the characterization of unit-jump processes given in [52, Lemma 3.1], together with the fact that \(\theta _i\min \{\xi _i,t\}\), the compensator of \(\mathbb {1}_{\left\{ \xi _i\le t\right\} }\), is continuous in t. Then (3.15) and (3.17) complete the proof of Lemma 2. \(\quad \square \)
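For completeness, the elementary computation behind the identity \(\theta _i\mathbb {E}[\min \{\xi _i,s\}] = 1-\mathrm {e}^{-\theta _i s}\) used in the proof reads (recall that \(\xi _i\sim \mathrm {Exp}(\theta _i)\) under the normalization \(\mu =1\))

$$\begin{aligned} \mathbb {E}[\min \{\xi _i,s\}] = \int _0^s x\,\theta _i \mathrm {e}^{-\theta _i x}\,\mathrm {d}x + s\,\mathrm {e}^{-\theta _i s} = \frac{1-(\theta _i s+1)\mathrm {e}^{-\theta _i s}}{\theta _i} + s\,\mathrm {e}^{-\theta _i s} = \frac{1-\mathrm {e}^{-\theta _i s}}{\theta _i}. \end{aligned}$$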

We are now ready to verify Definition 1(d). In order to prove that \(\mathbf {S}_{\infty }^\lambda \) does not have an excursion of infinite length almost surely, it suffices to show that

$$\begin{aligned} \begin{aligned} \lim _{t\rightarrow \infty }S_{\infty }^\lambda (t) = -\infty \quad \text {almost surely. } \end{aligned}\end{aligned}$$
(3.18)

Fix \(K\ge 1\) such that \(\lambda \sum _{i>K} \theta _i^2 <1/2\). Such a choice of K is always possible as \(\varvec{\theta }\in \ell ^2_{{\scriptscriptstyle \downarrow }}\). Further define the stopping time \(T:= \inf \{t: \xi _i\le t,\ \forall i\in [K]\} = \max _{i\le K} \xi _i\). Thus, \(T<\infty \) almost surely. Note that \(\min \{\xi _i,t\} \le t\) and thus,

$$\begin{aligned}\begin{aligned} \frac{1}{t}\lambda \sum _{i>K}\theta _i^2 \min \{\xi _i,t\} < \frac{1}{2}, \quad \text {almost surely.} \end{aligned}\end{aligned}$$

Therefore, for any \(t>T\),

$$\begin{aligned}\begin{aligned} A(t) = \lambda \sum _{i\in [K]} \theta _i^2 \xi _i + \lambda \sum _{i>K}\theta _i^2 \min \{\xi _i,t\} -t < \lambda \sum _{i\in [K]} \theta _i^2 \xi _i -\frac{t}{2}, \quad \text {almost surely.} \end{aligned}\end{aligned}$$

We conclude that, for any \(r\in (0,1)\), \(t^{-r}A(t) \xrightarrow {\scriptscriptstyle \mathrm {a.s.}}-\infty .\) For the martingale part we will use the exponential concentration inequality [55, Inequality 1, p. 899], which is stated below:

Lemma 3

If M is any continuous time local martingale such that \(M(0) = 0\), and \(\sup _{t\in [0,\infty )} |M(t) - M(t-)| \le c\), almost surely, then for any \(t>0\), \(a>0\) and \(b>0\),

$$\begin{aligned}\begin{aligned} \mathbb {P}\Big (\sup _{s\in [0,t]} M(s) > a, \text { and } \langle M \rangle (t) \le b \Big ) \le \exp \bigg (-\frac{a^2}{2b} \psi \Big ( \frac{a c}{b}\Big ) \bigg ), \end{aligned}\end{aligned}$$

where \(\psi (x) = ( (1+x)\log (1+x) - x ) / x^2\).

In particular, \(\psi (x) \ge 1/(2 (1+x/3))\) (see [42, p. 27]). Note that \( \langle M\rangle (t)\le \lambda ^2 t \sum _{i=1}^\infty \theta _i^3.\) We apply Lemma 3 with \(a = \varepsilon t^{r}\), \(b = \lambda ^2 t \sum _{i=1}^\infty \theta _i^3\), and \(c= \lambda \theta _1\), since the jumps of M are those of \(\mathbf {S}_{\infty }^\lambda \) and are of size at most \(\lambda \theta _1\). Using Lemma 2, \(\langle M\rangle (t) \le b\) almost surely. Now, \(\psi (ac/b) \ge C/(1+t^{r-1})\), and thus, for any \(\varepsilon >0\) and \(r\in (1/2,1)\),

$$\begin{aligned}\begin{aligned} \mathbb {P}\Big (\sup _{s\in [0,t]} |M(s)| > \varepsilon t^{r} \Big ) \le 2 \exp (-C t^{2r-1}), \end{aligned}\end{aligned}$$

for some constant \(C>0\), where the bound on the absolute value of M follows from the fact that \(-M\) is also a martingale, so Lemma 3 applies to \(-M\) as well. Now an application of the Borel–Cantelli lemma proves that \( t^{-r}| M(t) |\xrightarrow {\scriptscriptstyle \mathrm {a.s.}}0,\) for any \(r\in (1/2,1)\). This fact, together with the asymptotics of the drift term, completes the proof of (3.18). \(\quad \square \)
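Explicitly, the exponent \(2r-1\) in the last display arises from the choices of a, b and c above: for \(t\ge 1\) and constants \(C,C'>0\),

$$\begin{aligned} \frac{a^2}{2b}\,\psi \Big (\frac{ac}{b}\Big ) \ge \frac{\varepsilon ^2 t^{2r-1}}{2\lambda ^2\sum _{i=1}^\infty \theta _i^3}\cdot \frac{C}{1+t^{r-1}} \ge C' t^{2r-1}, \end{aligned}$$

since \(t^{r-1}\le 1\) when \(r<1\) and \(t\ge 1\).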

Verification of Definition 1(e). Fix \(\delta >0\). Let \(t_k = k\delta /2\) for \(k\ge 0\) and define the event

$$\begin{aligned} \mathrm {C}_k^\delta :=\bigg \{\sup _{t\in (t_{k-1},t_k]}\big [S_{\infty }^\lambda (t_{k+1}) - S_{\infty }^\lambda (t)\big ]>0\bigg \}. \end{aligned}$$

Suppose that there is an excursion (lr) with \(r-l > \delta \) and \(l\in (t_{k-1},t_k]\) for some k. Since \(r> t_{k+1}\) and \(l\in (t_{k-1},t_k]\), we have that \(\inf _{u\le t_{k+1}}S_{\infty }^\lambda (u) = \inf _{u\in (t_{k-1},t_k]}S_{\infty }^\lambda (u) \). Consequently, \(S_{\infty }^\lambda (t_{k+1})> \inf _{t\in (t_{k-1},t_k]} S_{\infty }^\lambda (t)\), and therefore \(\mathrm {C}_k^\delta \) must occur. Therefore, if \(\mathbf {S}_{\infty }^\lambda \) has infinitely many excursions of length at least \(\delta \), then \(\mathrm {C}_k^\delta \) must occur infinitely often. Using the Borel–Cantelli lemma, the proof follows if we can show that

$$\begin{aligned} \sum _{k=1}^\infty \mathbb {P}(\mathrm {C}_k^\delta ) <\infty . \end{aligned}$$

As before, fix \(K\ge 1\) such that \(\lambda \sum _{i>K} \theta _i^2 <1/2\), and let \(T:= \inf \{t: \xi _i\le t,\ \forall i\in [K]\} = \max _{i\le K} \xi _i\). Notice that for each \(K\ge 1\),

$$\begin{aligned}\begin{aligned} \sum _{k=1}^\infty \mathbb {P}\left( T>t_{k-1}\right)&= \sum _{k=1}^\infty \mathbb {P}\left( \exists i\in [K]: \xi _i>t_{k-1}\right) \le \sum _{k=1}^\infty K \mathrm {e}^{-\theta _K(k-1)\delta /2}<\infty , \end{aligned}\end{aligned}$$

and therefore it is enough to show that

$$\begin{aligned} \sum _{k=1}^\infty \mathbb {P}(\mathrm {C}_k^\delta \cap \{T\le t_{k-1}\}) <\infty . \end{aligned}$$
(3.19)

Now,

$$\begin{aligned}\begin{aligned}&\sup _{t\in [t_{k-1},t_k]}\big [S_{\infty }^\lambda (t_{k+1})-S_{\infty }^\lambda (t)\big ] \le M(t_{k+1})+\sup _{t\in [t_{k-1},t_k]} - M(t) + \sup _{t\in [t_{k-1},t_k]} [A(t_{k+1}) - A(t)]\\&\quad \le M(t_{k+1}) - M(t_{k-1})+ \sup _{t\in [t_{k-1},t_k]} [M(t_{k-1})- M(t)] \\&\qquad + \sup _{t\in [t_{k-1},t_k]}\bigg [\lambda \sum _{i=1}^\infty \theta _i^2 (\min \{\xi _i,t_{k+1}\}-\min \{\xi _i,t\}) -(t_{k+1}-t)\bigg ]\\&\quad \le 2\sup _{t\in [t_{k-1},t_{k+1}]} |M(t)-M(t_{k-1})| \\&\qquad + \sup _{t\in [t_{k-1},t_k]}\bigg [\lambda \sum _{i=1}^\infty \theta _i^2 (\min \{\xi _i,t_{k+1}\}-\min \{\xi _i,t\}) -(t_{k+1}-t)\bigg ]. \end{aligned}\end{aligned}$$

On the event \(\{T\le t_{k-1}\}\), the second term inside the supremum above reduces to

$$\begin{aligned} \lambda \sum _{i>K} \theta _i^2 (\min \{\xi _i,t_{k+1}\}-\min \{\xi _i,t\}) -(t_{k+1}-t) \le (t_{k+1}-t)\lambda \sum _{i>K} \theta _i^2 - (t_{k+1}-t)< -\frac{\delta }{4}, \end{aligned}$$
(3.20)

using \(\lambda \sum _{i>K} \theta _i^2 <1/2\) and \(t_{k+1}-t\ge \delta /2\). Thus we only need to estimate

$$\begin{aligned} \mathbb {P}\bigg (\sup _{t\in [t_{k-1},t_{k+1}]} |M(t)-M(t_{k-1})| >\frac{\delta }{8}\bigg ). \end{aligned}$$

Note that \((M(t)-M(t_{k-1}))_{t\ge t_{k-1}}\) is a martingale with respect to the filtration \((\mathscr {F}_t)_{t\ge t_{k-1}}\), starting from zero. Moreover, an argument identical to that in the proof of Lemma 2 yields that the quadratic variation of \((M(t)-M(t_{k-1}))_{t\ge t_{k-1}}\) is given by

$$\begin{aligned} \lambda ^2\sum _{i=1}^\infty \theta _i^3\big (\min \{\xi _i,t\}-\min \{\xi _i,t_{k-1}\}\big ). \end{aligned}$$

Further, \(\mathbb {E}[\min \{\xi _i,t\}] = \theta _i^{-1}(1-\mathrm {e}^{-\theta _i t})\). Therefore, Doob’s martingale inequality [45, Theorem 1.9.1.3] implies

$$\begin{aligned}\begin{aligned}&\sum _{k=1}^\infty \mathbb {P}\bigg (\sup _{t\in [t_{k-1},t_{k+1}]} |M(t)-M(t_{k-1})| >\frac{\delta }{8}\bigg )\\&\le \sum _{k=1}^\infty \frac{64 \lambda ^2}{\delta ^2}\sum _{i=1}^\infty \theta _i^2 (\mathrm {e}^{-\theta _i t_{k-1}}-\mathrm {e}^{-\theta _i t_{k+1}}) = \frac{64\lambda ^2}{\delta ^2} \sum _{i=1}^\infty \theta _i^2(1-\mathrm {e}^{-\theta _i\delta }) \sum _{k=1}^\infty \mathrm {e}^{-\theta _i t_{k-1}}<\infty , \end{aligned}\end{aligned}$$

and the proof of (3.19) now follows using (3.20).

Verification of Definition 1(f). We first prove the following:

Lemma 4

The distribution of \(S_{\infty }^\lambda (t)\) has no atoms for all \(t>0\).

Proof

Let \(\phi _t (v) = \mathbb {E}[\mathrm {e}^{\mathrm {i}v S_{\infty }^\lambda (t)}]\) for \(v\in \mathbb {R}\). Using the sufficient condition for a random variable to have a non-atomic distribution stated in [33, p. 189], it suffices to prove that

$$\begin{aligned}\begin{aligned} \int _{-\infty }^\infty |\phi _t (v)| \mathrm {d}v<\infty . \end{aligned}\end{aligned}$$

Note that

$$\begin{aligned}\begin{aligned} \phi _t(v)&= \mathrm {e}^{- \mathrm {i}v t}\prod _{j=1}^{\infty } \mathbb {E}[\mathrm {e}^{\mathrm {i}v \lambda \theta _j\mathbb {1}\{\xi _j\le t\}}] = \mathrm {e}^{- \mathrm {i}v t}\prod _{j=1}^{\infty } (\mathrm {e}^{\mathrm {i}v \lambda \theta _j}(1-\mathrm {e}^{- \theta _jt}) + \mathrm {e}^{- \theta _jt})\\&=\mathrm {e}^{- \mathrm {i}v t}\prod _{j=1}^{\infty }\big ( (1-\mathrm {e}^{-t\theta _j }) \cos (v \lambda \theta _j ) + \mathrm {e}^{-t\theta _j }+ \mathrm {i}(1-\mathrm {e}^{-t\theta _j})\sin (v\lambda \theta _j)\big ). \end{aligned}\end{aligned}$$

Therefore,

$$\begin{aligned}\begin{aligned} |\phi _t (v)|^2&= \prod _{j=1}^\infty \Big (\big ((1-\mathrm {e}^{-t\theta _j }) \cos (v \lambda \theta _j ) + \mathrm {e}^{-t\theta _j }\big )^2+ (1-\mathrm {e}^{-t\theta _j})^2 \sin ^2(v\lambda \theta _j)\Big ) \\&= \prod _{j=1}^\infty \Big (\mathrm {e}^{-2t\theta _j} + 2\cos (v\lambda \theta _j) \mathrm {e}^{-t\theta _j}(1-\mathrm {e}^{-t\theta _j}) + (1-\mathrm {e}^{-t\theta _j})^2 \Big ) \\&= \prod _{j=1}^\infty \Big (1- 2 \mathrm {e}^{-t\theta _j}(1- \mathrm {e}^{-t\theta _j}) (1-\cos (v\lambda \theta _j))\Big )\\&\le \mathrm {e}^{-\sum _{j=1}^\infty 2 \mathrm {e}^{-t\theta _j}(1- \mathrm {e}^{-t\theta _j}) (1-\cos (v\lambda \theta _j))}, \end{aligned}\end{aligned}$$

where in the last step we have used the fact that \(1-x\le \mathrm {e}^{-x}\) for all \(x>0\). Recall (2.5). Let \(j_0\ge 1\) be such that \( \max \{(2|v|\lambda /\pi ) \theta _j,t\theta _j \} \le 1\) for all \(j\ge j_0 \). Now, for \(j\ge j_0\), we have that \(\mathrm {e}^{-t\theta _j} \ge \mathrm {e}^{-1}\), \((1-\mathrm {e}^{-t\theta _j}) \ge t\theta _j/2\) and \(1-\cos (v\lambda \theta _j) \ge \frac{\lambda ^2}{\pi } v^2\theta _j^2\). Thus, using (2.5),

$$\begin{aligned} \begin{aligned} \int _{-\infty }^\infty |\phi _t (v)| \mathrm {d}v \le \int _{-\infty }^\infty \mathrm {e}^{-\frac{\lambda ^2t}{\mathrm {e}\pi } v^2M_t(2|v|\lambda /\pi )} \mathrm {d}v<\infty , \end{aligned}\end{aligned}$$

and the proof now follows. \(\quad \square \)

In order to prove the strict ordering between excursion lengths, it is enough to show that no two excursions of \(\mathbf {S}_{\infty }^\lambda \) have the same length almost surely. For any \(q\in \mathbb {Q}_+\), if q is in some excursion of \(\mathbf {S}_{\infty }^\lambda \), i.e., if \(S_{\infty }^\lambda (q) > \inf _{t\le q} S_{\infty }^\lambda (t)\), then we let e(q) be the excursion containing q, and otherwise we let \(e(q) = \varnothing \). Thus it is enough to show that for any \(q_1,q_2 \in \mathbb {Q}_+\),

$$\begin{aligned}\begin{aligned} \mathbb {P}(e(q_1) \ne e(q_2), \text { but }|e(q_1)| = |e(q_2)|) =0. \end{aligned}\end{aligned}$$

Without loss of generality, let \(q_1<q_2\). Thus, if \(e(q_1)\ne e(q_2)\), then \(e(q_1) \) appears earlier than \(e(q_2)\). Let \(V_{q_2} = \{i: {\mathscr {I}}_i (q_2) = 1\}\). As before, conditionally on \(\mathscr {F}_{q_2}\), the process \((S_{\infty }^\lambda (q_2+t) - S_{\infty }^\lambda (q_2))_{t\ge 0}\) is distributed as \(\hat{\mathbf {S}}_{\infty }^\lambda \) given by

$$\begin{aligned} \begin{aligned} {\hat{S}}_{\infty }^\lambda (t) = \sum _{i\notin V_{q_2}} \theta _i\left( {\mathscr {I}}_i(t)- (\theta _i/\mu )t\right) +\lambda t. \end{aligned}\end{aligned}$$
(3.21)

Therefore, the process in (3.21) again has the form (2.4) (see (3.11)). Now, for any \(x>0\), the probability that \(|e(q_2)| = x\), conditionally on \(\mathscr {F}_{q_2}\) and \(|e(q_1)| = x\), is zero using Lemma 4 together with the fact that \(|V_{q_2}^c| = \infty \). This concludes the verification of Definition 1(f).

4 The Critical Window

In this section, we prove our results related to critical percolation on \(\mathrm {CM}_n({\varvec{d}})\). In Sect. 4.1, we start by describing a way to approximate percolation on a configuration model by a suitable alternative configuration model. In Sect. 4.2, we analyze the latter graph. The first step is to set up an exploration process that approximately encodes the component sizes in terms of excursion lengths above past minima. This exploration process is shown to converge to \(\mathbf {S}_{\infty }^\lambda \) (Sect. 4.2.1). We must also ensure that no large excursions appear beyond the relevant time scale of the exploration, which allows us to prove that the largest component sizes converge to the largest excursion lengths of \(\mathbf {S}_{\infty }^\lambda \) (Sect. 4.2.2). Next, we analyze the surplus edges (Sect. 4.2.3), and the proof of Theorem 1 is completed in Sect. 4.2.4. Finally, we analyze the diameter of the critical components in Sect. 4.3 and complete the proof of Theorem 2.

4.1 Sandwiching the percolated configuration model

Following the pioneering work of Aldous [4], the main tool to prove scaling limits of the component sizes is to set up an appropriate exploration process. The idea is to explore the graph sequentially, and the exploration process keeps track of some functional of vertices that have been discovered but their neighborhoods have not been explored. For percolation on the configuration model, this could be the number of unpaired half-edges of those vertices. Now, for random graphs with independent connection probabilities, the exploration process is usually Markovian, but not for the configuration model. Indeed, one has to keep track of the degree-profile outside the explored graph in order to know the distribution of the degree of a newly discovered vertex. For d-regular graphs, Nachmias and Peres [48] used the above approach, but this becomes difficult in the unbounded degree case. In earlier papers with Sen [27, 28], we have used a construction by Janson [39] which says that the percolated configuration model can be viewed as a configuration model satisfying some criticality condition, so that it is enough to analyze the behavior of these critical configuration models. However, in the \(\tau \in (2,3)\) regime, this construction does not work because it gives rise to \(n - o(n)\) many degree-one vertices. As a remedy to this problem, we use a result of Fountoulakis [32] to show that the critical configuration model can be sandwiched between two approximately equal configuration models, as stated in Proposition 3 below. We emphasize that Proposition 3 holds for percolation on the configuration model without any specific assumption on the degree distribution, as long as \(\ell _n p_n \gg \log (n)\), and this will be used in the proofs for the near-critical results as well. We start by describing the approximating configuration model below:

Algorithm 1

(S0) Keep each half-edge with probability \(p_n\), independently, and delete it otherwise. If the total number of retained half-edges is odd, then attach a dummy half-edge to vertex 1.

(S1) Perform a uniform perfect matching among the retained half-edges, i.e., sequentially pair each unpaired retained half-edge with a uniformly chosen unpaired retained half-edge until all half-edges are paired. The paired half-edges create the edges of the graph, and we call the resulting graph \({\mathscr {G}}_n(p_n)\).
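The following is a minimal sketch of Algorithm 1 (our own illustration; the function name and the explicit half-edge lists are not from the paper):

```python
import random

def percolated_cm(degrees, p, seed=0):
    """Sketch of Algorithm 1: keep each half-edge with probability p,
    attach a dummy half-edge (here to vertex 0) if the retained count
    is odd, then pair the retained half-edges by a uniform matching."""
    rng = random.Random(seed)
    half_edges = [v for v, d in enumerate(degrees) for _ in range(d)]
    kept = [v for v in half_edges if rng.random() < p]    # (S0): retention
    if len(kept) % 2 == 1:
        kept.append(0)                                    # dummy half-edge
    rng.shuffle(kept)                                     # (S1): a uniform shuffle,
    return list(zip(kept[::2], kept[1::2]))               # paired consecutively
```

Shuffling the retained half-edges and pairing consecutive entries is one standard way to realize a uniform perfect matching.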

The following proposition formally states that \({\mathscr {G}}_n(p_n)\) approximates \(\mathrm {CM}_n(\varvec{d},p_n)\):

Proposition 3

Let \(p_n\) be such that \(\ell _np_n \gg \log (n)\). There exists \((\varepsilon _n)_{n\ge 1} \subset (0,\infty )\) with \(\varepsilon _n\rightarrow 0 \), and a coupling such that, with high probability,

$$\begin{aligned} {\mathscr {G}}_n(p_n(1-\varepsilon _n))\subset \mathrm {CM}_n(\varvec{d},p_n)\subset {\mathscr {G}}_n(p_n(1+\varepsilon _n)). \end{aligned}$$

Proof

The proof relies on an exact construction of \(\mathrm {CM}_n(\varvec{d},p_n)\) by Fountoulakis [32] which goes as follows:

Algorithm 2

(S0) Draw \(X\sim \mathrm {Bin}(\ell _n/2,p_n)\) and choose 2X half-edges uniformly at random from the set of all half-edges.

(S1) Perform a uniform perfect matching of these 2X chosen half-edges. The resulting graph is distributed as \(\mathrm {CM}_n(\varvec{d},p_n)\).

Notice the similarity between Algorithm 1 (S1) and Algorithm 2 (S1). In both algorithms, given the number of retained half-edges, the choice of the half-edges can be performed sequentially, uniformly at random without replacement. Thus, given the numbers of half-edges in the two algorithms, we can couple the choice of the half-edges and their pairing (the restriction of a uniform matching to a subset of half-edges remains a uniform matching on that subset). Let \({\mathscr {H}}_1\), \({\mathscr {H}}_2^-\) and \({\mathscr {H}}_2^+\), respectively, denote the number of half-edges in \(\mathrm {CM}_n(\varvec{d},p_n)\), \({\mathscr {G}}_n(p_n(1-\varepsilon _n))\) and \({\mathscr {G}}_n(p_n(1+\varepsilon _n))\). From the above discussion, the proof is complete if we can show that, as \(n\rightarrow \infty \),

$$\begin{aligned} \mathbb {P}\big ({\mathscr {H}}_2^-\le {\mathscr {H}}_1\le {\mathscr {H}}_2^+\big ) \rightarrow 1. \end{aligned}$$

We ignore the contribution due to the possible addition of a single dummy half-edge in Algorithm 1 (S0), as it does not affect the asymptotic computations. Notice that \({\mathscr {H}}_1=2X\), where \(X\sim \mathrm {Bin}(\ell _n/2,p_n)\), and \({\mathscr {H}}_2^{+/-}\sim \mathrm {Bin} (\ell _n,p_n(1\pm \varepsilon _n))\). Using standard concentration inequalities [42, Corollary 2.3], it follows that

$$\begin{aligned} {\mathscr {H}}_1 = \ell _np_n +o_{\scriptscriptstyle \mathbb {P}}(\sqrt{\ell _np_n\log (n)}), \end{aligned}$$

and

$$\begin{aligned} {\mathscr {H}}_2^+ = \ell _np_n + \ell _np_n \varepsilon _n +o_{\scriptscriptstyle \mathbb {P}}(\sqrt{\ell _np_n\log (n)}). \end{aligned}$$

If we choose \(\varepsilon _n\) such that \(\varepsilon _n\gg (\log (n)/(\ell _np_n))^{1/2}\) and \(\varepsilon _n\rightarrow 0\), then, with high probability, \({\mathscr {H}}_1\le {\mathscr {H}}_2^+\). Similarly we can conclude that \({\mathscr {H}}_2^-\le {\mathscr {H}}_1\) with high probability, and the proof of Proposition 3 follows. \(\quad \square \)
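The binomial comparison at the heart of this proof can be sanity-checked numerically. The sketch below (ours; toy parameters, and independent draws rather than the coupled construction) estimates \(\mathbb {P}({\mathscr {H}}_2^-\le {\mathscr {H}}_1\le {\mathscr {H}}_2^+)\):

```python
import random

def sandwich_probability(ell, p, eps, reps=200, seed=0):
    """Estimate P(H2- <= H1 <= H2+), with H1 = 2 * Bin(ell/2, p) and
    H2^{+/-} = Bin(ell, p * (1 +/- eps)). The draws are independent,
    so this only illustrates the concentration, not the coupling."""
    rng = random.Random(seed)
    binom = lambda n, q: sum(rng.random() < q for _ in range(n))
    hits = 0
    for _ in range(reps):
        h1 = 2 * binom(ell // 2, p)
        if binom(ell, p * (1 - eps)) <= h1 <= binom(ell, p * (1 + eps)):
            hits += 1
    return hits / reps

# eps well above the concentration scale pushes the estimate towards 1:
print(sandwich_probability(ell=20000, p=0.05, eps=0.2))
```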

We conclude this section by stating some properties of the degree sequence of the graph \({\mathscr {G}}_n(p_n)\) that will be crucial in the analysis below. Let \(\tilde{\varvec{d}}=({\tilde{d}}_1,\dots ,{\tilde{d}}_n)\) be the degree sequence induced by Algorithm 1 (S1), and let \({\tilde{\ell }}_n = \sum _{i} {\tilde{d}}_i\) be the number of retained half-edges. Then the following result holds for \(\tilde{\varvec{d}}\):

Lemma 5

(Degrees of \({\mathscr {G}}_n(p_n)\)) Suppose that \(p_n \gg n^{-\alpha }\), and Assumption 1 holds. For each fixed \(i\ge 1\), \({\tilde{d}}_i = d_ip_n(1+o_{\scriptscriptstyle \mathbb {P}}(1))\), \({\tilde{\ell }}_n = \ell _np_n (1+o_{\scriptscriptstyle \mathbb {P}}(1))\), and \(\sum _{i\in [n]} {\tilde{d}}_i({\tilde{d}}_i-1) = p_n^2\sum _{i\in [n]} d_i(d_i-1) (1+o_{\scriptscriptstyle \mathbb {P}}(1))\). Consequently, for \(p_n\ll p_c(\lambda )\), \(\sum _{i\in [n]}{\tilde{d}}_i^2= {\tilde{\ell }}_n(1+o_{\scriptscriptstyle \mathbb {P}}(1))\), whereas for \(p_n= p_c(\lambda )\),

$$\begin{aligned} {\tilde{\nu }}_n = \frac{\sum _{i\in [n]}{\tilde{d}}_i({\tilde{d}}_i-1)}{\sum _{i\in [n]}{\tilde{d}}_i} = \lambda (1+o_{\scriptscriptstyle \mathbb {P}}(1)), \quad \text {and} \quad \lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty } \mathbb {P}\bigg (\sum _{i>K}{\tilde{d}}_i({\tilde{d}}_i-1)>\varepsilon {\tilde{\ell }}_n\bigg ) = 0, \end{aligned}$$
(4.2)

for any \(\varepsilon >0\).

Proof

Note that \(\tilde{d}_i \sim \mathrm {Bin}(d_i,p_n)\), independently for \(i\in [n]\). For each fixed \(i\ge 1\), \(d_i p_n \rightarrow \infty \), since \(p_n\gg n^{-\alpha }\). Thus the first fact follows using [42, Theorem 2.1]. Since \({\tilde{\ell }}_n\sim \mathrm {Bin}(\ell _n,p_n)\), the second fact also follows using the same bound. To see the asymptotics for \({\tilde{m}}_2:=\sum _{i\in [n]} {\tilde{d}}_i({\tilde{d}}_i-1)\), note that \(\mathbb {E}[{\tilde{m}}_2] = p_n^2 m_2\), where \(m_2 = \sum _{i\in [n]}d_i(d_i-1)\). Also, \(\mathrm {Var}({\tilde{d}}_i({\tilde{d}}_i-1)) = 2d_i(d_i-1)p_n^2(1-p_n) (1+(2d_i-3)p_n)\). Thus,

$$\begin{aligned}\begin{aligned} \frac{\mathrm {Var}\big (\sum _{i\in [n]}{\tilde{d}}_i({\tilde{d}}_i-1)\big ) }{\big (\mathbb {E}[\sum _{i\in [n]}{\tilde{d}}_i({\tilde{d}}_i-1)]\big )^2} \le \frac{4d_1 p_n^3 m_2}{p_n^4 m_2^2} = O\Big (\frac{1}{p_n n^\alpha }\Big ) = o(1), \end{aligned}\end{aligned}$$

where the penultimate step uses the fact that \(m_2 = \varTheta (n^{2\alpha })\), \(d_1 = \varTheta (n^{\alpha })\), and in the last step we have again used the fact that \(p_n \gg n^{-\alpha }\). Using Chebyshev’s inequality, it now follows that \({\tilde{m}}_2 = p_n^2 m_2(1+ o_{\scriptscriptstyle \mathbb {P}}(1))\). Thus,

$$\begin{aligned}\begin{aligned} {\tilde{\nu }}_n = (1+o_{\scriptscriptstyle \mathbb {P}}(1))p_n \frac{\sum _{i\in [n]}d_i(d_i-1)}{\sum _{i\in [n]}d_i} = (1+o_{\scriptscriptstyle \mathbb {P}}(1))p_n\nu _n. \end{aligned}\end{aligned}$$

For \(p_n\ll p_c(\lambda )\), \(p_n\nu _n =o(1)\). Thus, \(\sum _{i\in [n]}{\tilde{d}}_i^2= {\tilde{\ell }}_n(1+o_{\scriptscriptstyle \mathbb {P}}(1))\). For \(p_n = p_c(\lambda )\), the first equality in (4.2) follows using (2.9).

We now prove the second assertion in (4.2). For any \(\varepsilon >0\), the required probability is at most

$$\begin{aligned}\begin{aligned}&\mathbb {P}\bigg (\sum _{i>K}{\tilde{d}}_i({\tilde{d}}_i-1)>\varepsilon {\tilde{\ell }}_n, \frac{\ell _np_n}{2}\le {\tilde{\ell }}_n\le 2\ell _np_n \bigg ) +o(1)\\&\quad \le \mathbb {P}\bigg (\sum _{i>K}{\tilde{d}}_i({\tilde{d}}_i-1)> \frac{\varepsilon \ell _np_n}{2}\bigg )+o(1) \\&\quad \le \frac{4p_n^2\sum _{i>K}d_i(d_i-1)}{\varepsilon \ell _n p_n}+o(1) = \frac{4 p_n \sum _{i>K}d_i^2}{\varepsilon \ell _n} +o(1), \end{aligned}\end{aligned}$$

where the penultimate step follows from Markov’s inequality. The proof now follows using (2.8) and \(p_n = \varTheta (n^{1-2\alpha })\) for \(p_n = p_c(\lambda )\). \(\quad \square \)

4.2 Scaling limits of critical components

4.2.1 Convergence of the exploration process

Let \(\tilde{\varvec{d}}=({\tilde{d}}_1,\dots ,{\tilde{d}}_n)\) be the degree sequence induced by Algorithm 1 (S1) with \(p_n=p_c(\lambda )\), and consider \({\mathscr {G}}_n(p_c(\lambda ))\). Note that \({\mathscr {G}}_n(p_c(\lambda ))\) has the same distribution as \(\mathrm {CM}_{n}(\tilde{{\varvec{d}}})\). We start by describing how the connected components in the graph can be explored while generating the random graph simultaneously:

Algorithm 3

(Exploring the graph). The algorithm carries along vertices that can be alive, active, exploring and killed, and half-edges that can be alive, active or killed. Alive and killed half-edges correspond to unpaired and paired half-edges respectively, whereas active half-edges correspond to half-edges that have been found during the exploration, but have not been paired yet. Thus a half-edge can be alive and active simultaneously. Similarly, a vertex is killed when all its half-edges have been explored, otherwise the vertex is alive. An active vertex is an alive vertex that has been found already during the exploration, whereas an exploring vertex is currently being explored. We sequentially explore the graph as follows:

(S0) At stage \(i=0\), all the vertices and the half-edges are alive, but none of them are active. Also, there are no exploring vertices.

(S1) At each stage i, if there is no active half-edge, choose a vertex v proportionally to its degree among the alive (not yet killed) vertices, declare all its half-edges to be active, and declare v to be exploring. Proceed to stage \(i+1\).

(S2) At each stage i, if the set of active half-edges is non-empty, then take an active half-edge e of an exploring vertex v and pair it with a half-edge f chosen uniformly among the alive half-edges. Kill e and f. If f is incident to a vertex \(v'\) that has not been discovered before, then declare all the half-edges incident to \(v'\) active (if any), except f. If \(\mathrm {degree}(v')=1\) (i.e., the only half-edge incident to \(v'\) is f), then kill \(v'\). Otherwise, declare \(v'\) to be active and larger than all other active vertices. After killing e, if v does not have another active half-edge, then kill v also, and declare the smallest active vertex to be exploring.

(S3) Repeat from (S1) at stage \(i+1\) if not all half-edges are already killed.

Algorithm 3 gives a breadth-first exploration of the connected components of \(\mathrm {CM}_n(\tilde{{\varvec{d}}})\). Define the exploration process by

$$\begin{aligned} S_n(0)=0,\quad S_n(l)=S_n(l-1)+{\tilde{d}}_{(l)}J_l-2, \end{aligned}$$
(4.3)

where \(J_l\) is the indicator that a new vertex is discovered at time l and \({\tilde{d}}_{(l)}\) is the degree of the new vertex chosen at time l when \(J_l=1\). The \(-2\) in (4.3) accounts for the fact that two half-edges are killed whenever a pairing occurs. However, at the beginning of exploring a component, when Algorithm 3 (S1) is carried out, no half-edges are paired, but the exploration process still subtracts 2. For this reason, there is an additional \(-2\) in (4.3) at the beginning of exploring each component, and thus the first component has been fully explored when the exploration process hits \(-2\), the second when the process hits \(-4\), and so on. More formally, suppose that \(\mathscr {C}_{k}\) is the k-th connected component explored by the above exploration process and define \(\tau _{k}=\inf \big \{ i:S_{n}(i)=-2k \big \}\). Then \(\mathscr {C}_{k}\) is discovered between the times \(\tau _{k-1}+1\) and \(\tau _k\), and \(\tau _{k}-\tau _{k-1}-1\) gives the total number of edges in \(\mathscr {C}_k\). Call a vertex discovered if it is either active or killed. Let \(\mathscr {V}_l\) denote the set of vertices discovered up to time l and \({\mathscr {I}}_i^n(l):=\mathbb {1}_{\left\{ i\in \mathscr {V}_l\right\} }\). Note that

$$\begin{aligned} S_n(l)= \sum _{i\in [n]} {\tilde{d}}_i {\mathscr {I}}_i^n(l)-2l. \end{aligned}$$
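For concreteness, here is a self-contained sketch (ours, not the paper's code) of Algorithm 3 together with the walk (4.3); the breadth-first bookkeeping is simplified to a FIFO queue of active half-edges, and pairing half-edges uniformly on the fly makes the explored multigraph a realization of \(\mathrm {CM}_n(\tilde{\varvec{d}})\).

```python
import random

def exploration_walk(degrees, seed=0):
    """The walk S_n of (4.3) along Algorithm 3; sum(degrees) must be even."""
    assert sum(degrees) % 2 == 0
    rng = random.Random(seed)
    owner, he_of = [], []                 # half-edge -> vertex, vertex -> half-edges
    for v, d in enumerate(degrees):
        he_of.append(list(range(len(owner), len(owner) + d)))
        owner.extend([v] * d)
    alive = list(range(len(owner)))       # unpaired half-edges (swap-pop pool)
    pos = list(range(len(owner)))
    killed = [False] * len(owner)

    def kill(h):
        i, last = pos[h], alive[-1]
        alive[i], pos[last] = last, i
        alive.pop()
        killed[h] = True

    discovered = [False] * len(degrees)
    queue, S, walk, taus = [], 0, [0], []
    while alive:
        while queue and killed[queue[0]]:
            queue.pop(0)                  # skip half-edges already paired off
        if not queue:                     # (S1): size-biased start of a component
            v = owner[alive[rng.randrange(len(alive))]]
            discovered[v] = True
            queue.extend(he_of[v])
            S += degrees[v] - 2           # J_l = 1, plus the extra -2 of (4.3)
        else:                             # (S2): pair one active half-edge
            e = queue.pop(0)
            kill(e)
            f = alive[rng.randrange(len(alive))]
            kill(f)
            w = owner[f]
            if not discovered[w]:
                discovered[w] = True
                queue.extend(he_of[w])
                S += degrees[w] - 2       # a new vertex is found: J_l = 1
            else:
                S -= 2                    # back-edge (surplus): J_l = 0
        walk.append(S)
        if S == -2 * (len(taus) + 1):     # tau_k: the k-th component is finished
            taus.append(len(walk) - 1)
    return walk, taus
```

The entries of taus are the hitting times \(\tau _k\), so \(\tau _k-\tau _{k-1}-1\) counts the edges of \(\mathscr {C}_k\), as noted above; rescaling time and space by \(n^{\rho }\) as in (4.4) produces the process whose limit is identified in Theorem 5 below.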

In the rest of this section, we often use the asymptotics in Lemma 5 even if it is not stated explicitly. Recall that we write \(\mathscr {F}_l^n = \sigma ({\mathscr {I}}^n_i(l): i\in [n]) \). All the martingales and related computations will be done with respect to the filtration \((\mathscr {F}_l^n )_{l\ge 0}\).

Define the re-scaled version \(\bar{\mathbf {S}}_n\) of \(\mathbf {S}_n\) by \({\bar{S}}_n(t)= n^{-\rho }S_n(\lfloor tn^{\rho } \rfloor )\). Then,

$$\begin{aligned} {\bar{S}}_n(t)= n^{-\rho } \sum _{i\in [n]}({\tilde{d}}_i-1) {\mathscr {I}}_i^n(tn^{\rho })+n^{-\rho } \sum _{i\in [n]}{\mathscr {I}}_i^n(tn^{\rho }) -2t + o(1), \end{aligned}$$
(4.4)

where we have used the convention that \({\mathscr {I}}_i^n(tn^{\rho }) = {\mathscr {I}}_i^n(\left\lfloor tn^{\rho } \right\rfloor )\) when \(tn^{\rho }\) is not an integer. The following theorem describes the scaling limit of this rescaled process:

Theorem 5

Consider the process \(\bar{\mathbf {S}}_n:= ({\bar{S}}_n(t))_{t\ge 0}\) defined in (4.4) and recall the definition of \(\mathbf {S}_{\infty }^\lambda \) in (2.4). Then, under Assumption 1, as \(n\rightarrow \infty ,\)

$$\begin{aligned} \bar{\mathbf {S}}_n \xrightarrow {d}\mathbf {S}_{\infty }^\lambda \end{aligned}$$

with respect to the Skorohod \(J_1\)-topology.

To prove Theorem 5, we need the asymptotics of the first two terms in (4.4). The first term accounts for the contribution of vertices of degree other than one during the exploration. It is dominated by the contributions of the hubs, which allows us to use a truncation argument. The convergence of the truncated sum is given by the following lemma:

Lemma 6

Fix any \(K\ge 1\), and \({\mathscr {I}}_i(s):=\mathbb {1}_{\left\{ \xi _i\le s \right\} }\) where \(\xi _i\sim \mathrm {Exp}(\theta _i/\mu )\) independently for \(i\in [K]\). Under Assumption 1, as \(n\rightarrow \infty \),

$$\begin{aligned} \left( {\mathscr {I}}_i^n(tn^\rho ) \right) _{i\in [K],t\ge 0} \xrightarrow {d}\left( {\mathscr {I}}_i(t) \right) _{i\in [K],t\ge 0} \end{aligned}$$

with respect to the Skorohod \(J_1\)-topology.

The second term in (4.4) describes the proportion of time at which a new vertex is found. Since most steps of the exploration discover a new vertex of degree one, this term is shown to converge to the identity function \(t\mapsto t\), which is proved using martingale arguments. This is summarized in the next lemma:

Lemma 7

For any \(u>0\), as \(n\rightarrow \infty \), \(\sup _{u\le t}n^{-\rho } \big |\sum _{i\in [n]}{\mathscr {I}}_i^n(un^{\rho }) -un^{\rho }\big | \xrightarrow {\mathbb {P}}0.\)

We first prove Theorem 5 using Lemmas 6 and 7. The lemmas will be proved subsequently. Let \({\tilde{\ell }}_n(u)\) denote the number of unpaired half-edges at time \(\lfloor un^{\rho }\rfloor \). Thus, \({\tilde{\ell }}_n(u) = {\tilde{\ell }}_n -2(\left\lfloor un^{\rho } \right\rfloor - c_{\left\lfloor un^{\rho } \right\rfloor })+ 1\), where \(c_l\) is the number of components explored up to time l. Note that \({\tilde{\ell }}_n- 2un^{\rho }+1\le {\tilde{\ell }}_n(u) \le {\tilde{\ell }}_n\). Since \({\tilde{\ell }}_n = \varTheta _{\scriptscriptstyle \mathbb {P}} (n^{2\rho })\), we have \({\tilde{\ell }}_n(u) = {\tilde{\ell }}_n(1+o_{\scriptscriptstyle \mathbb {P}}(1))\) uniformly over \(u\le t\). Let \({\tilde{\mathbb {P}}}(\cdot )\) (respectively \({\tilde{\mathbb {E}}}[\cdot ]\)) denote the conditional probability (respectively expectation) conditionally on \(({\tilde{d}}_i)_{i\in [n]}\).

Proof of Theorem 5

Note that \({\mathscr {I}}_i^n(l) =0\) for all \(l\ge 1\) if \({\tilde{d}}_i=0\). Now, if \({\tilde{d}}_i\ge 1\), then for any \(t\ge 0\), uniformly over \(l\le tn^{\rho }\),

$$\begin{aligned} \tilde{\mathbb {E}}\left[ {\mathscr {I}}_i^n(l)\right] = \tilde{\mathbb {P}}\left( {\mathscr {I}}_i^n(l)=1\right) \le \frac{l{\tilde{d}}_i}{{\tilde{\ell }}_n - 2tn^{\rho }+1}. \end{aligned}$$
(4.5)

Let \(X_{n,K}(t):=n^{-\rho }\sup _{u\le t}\sum _{i>K}({\tilde{d}}_i-1){\mathscr {I}}_i^n(un^{\rho })\). Note that \({\mathscr {I}}_i^n(un^{\rho }) \le {\mathscr {I}}_i^n(tn^{\rho })\). Also, using \({\mathscr {I}}_i^n(un^{\rho }) = 0\) whenever \({\tilde{d}}_i=0\), it follows that \(({\tilde{d}}_i-1){\mathscr {I}}_i^n(un^{\rho }) \ge 0\) for all \(i\in [n]\) and \(u>0\). Thus,

$$\begin{aligned}\begin{aligned} {\tilde{\mathbb {E}}}[X_{n,K}(t)]&\le n^{-\rho }{\tilde{\mathbb {E}}}\bigg [\sum _{i>K}({\tilde{d}}_i-1){\mathscr {I}}_i^n(tn^{\rho })\bigg ]\le t \frac{\sum _{i>K}{\tilde{d}}_i({\tilde{d}}_i-1)}{{\tilde{\ell }}_n(t)} := \varepsilon _{n,K}(t), \end{aligned}\end{aligned}$$

where \(\lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty }\mathbb {P}(\varepsilon _{n,K}(t) >\delta ) = 0\) for any \(\delta >0\), due to Lemma 5. Therefore, for any \(\varepsilon ,\delta >0\), using Markov’s inequality,

$$\begin{aligned}\begin{aligned} \lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty }\mathbb {P}\Big ({\tilde{\mathbb {P}}}(X_{n,K}(t)> \varepsilon )> \delta \Big ) \le \lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty }\mathbb {P}\Big ({\tilde{\mathbb {E}}}[X_{n,K}(t)] > \delta \varepsilon \Big ) =0. \end{aligned}\end{aligned}$$

Let \({\mathscr {B}}_{n,K}:= \{{\tilde{\mathbb {P}}}(X_{n,K}(t)> \varepsilon ) > \delta \}\). It follows that

$$\begin{aligned}\begin{aligned} \mathbb {P}(X_{n,K}(t)> \varepsilon ) = \mathbb {E}\big [{\tilde{\mathbb {P}}}(X_{n,K} (t) > \varepsilon ) \big ] \le \mathbb {P}({\mathscr {B}}_{n,K}) + \delta . \end{aligned}\end{aligned}$$

Taking the iterated limit \(\lim _{\delta \rightarrow 0}\limsup _{K\rightarrow \infty }\limsup _{n\rightarrow \infty } \) yields, for any \(\varepsilon >0\),

$$\begin{aligned} \begin{aligned} \lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty }\mathbb {P}(X_{n,K}(t) > \varepsilon ) = 0. \end{aligned}\end{aligned}$$
(4.6)

Using (4.6) and Lemma 7, it is now enough to deduce the scaling limit, as \(n\rightarrow \infty \), for

$$\begin{aligned} {\bar{S}}_n^K(t) = n^{-\rho }\sum _{i=1}^K {\tilde{d}}_i {\mathscr {I}}_i^n(tn^{\rho })- t \end{aligned}$$

and then taking \(K\rightarrow \infty \). But for any fixed \(K\ge 1\), Lemma 6 yields the limit of \(\bar{\mathbf {S}}_n^K\), and the proof of Theorem 5 follows. \(\quad \square \)

Proof of Lemma 6

By noting that \(({\mathscr {I}}_i^n(tn^\rho ))_{t\ge 0}\) are indicator processes, for any \(m_1\le m_2\le m_3\), it follows that \(\min \{{\mathscr {I}}_i^n(m_2)- {\mathscr {I}}_i^n(m_1), {\mathscr {I}}_i^n(m_3) - {\mathscr {I}}_i^n(m_2)\} = 0\), and thus [14, Theorem 13.5] implies tightness of \(({\mathscr {I}}_i^n(tn^\rho ) )_{t\ge 0, n\ge 1}\) for each fixed \(i\ge 1\). Thus, it is enough to show that

$$\begin{aligned} \tilde{\mathbb {P}}\left( {\mathscr {I}}_i^n(t_in^{\rho })=0,\ \forall i\in [K]\right) \xrightarrow {\mathbb {P}}\mathbb {P}\left( {\mathscr {I}}_i(t_i)=0,\ \forall i\in [K]\right) = \exp \Big ( -\mu ^{-1}\sum _{i=1}^{K} \theta _it_i\Big ), \end{aligned}$$

for any \(t_1,\dots ,t_K\in [0,\infty )\). Now,

$$\begin{aligned} \tilde{\mathbb {P}}\left( {\mathscr {I}}_i^n(m_i)=0,\ \forall i\in [K]\right) =\prod _{l=1}^{\infty }\Big (1-\sum _{i\le K:l\le m_i}\frac{{\tilde{d}}_i}{{\tilde{\ell }}_n-\varTheta (l)} \Big ). \end{aligned}$$
(4.7)

Taking logarithms on both sides of (4.7), and using that only the terms with \(l\le \max _i m_i=\varTheta (n^{\rho })\) contribute, we get

$$\begin{aligned} \begin{aligned} \tilde{\mathbb {P}}\left( {\mathscr {I}}_i^n(m_i)=0\, \forall i\in [K]\right)&= \exp \Big ( - \sum _{l=1}^{\infty }\sum _{i\le K:l\le m_i} \frac{{\tilde{d}}_i}{{\tilde{\ell }}_n}+o(1) \Big )\\&= \exp \Big ( -\sum _{i\in [K]} \frac{{\tilde{d}}_im_i}{{\tilde{\ell }}_n} +o(1) \Big ). \end{aligned} \end{aligned}$$
(4.8)

Putting \(m_i=t_in^{\rho }\), Assumption 1 (i), (ii) give

$$\begin{aligned} \frac{m_i{\tilde{d}}_i}{{\tilde{\ell }}_n}= \frac{\theta _it_i}{\mu } (1+o_{\scriptscriptstyle \mathbb {P}}(1)). \end{aligned}$$
(4.9)

Hence (4.8) and (4.9) complete the proof of Lemma 6. \(\quad \square \)

Proof of Lemma 7

Define \(W_n(l) = \sum _{i\in [n]} {\mathscr {I}}_i^n(l) -l\). Recall that \({\mathscr {V}}_l\) denotes the set of vertices discovered up to time l, \(\tau _k\) is the time when the k-th component has been explored, and \(c_l\) is the number of components explored up to time l. Observe that

$$\begin{aligned} \begin{aligned} {\tilde{\mathbb {E}}}[W_n(l+1)-W_n(l) \mid \mathscr {F}_l]&= \sum _{i\in [n]} {\tilde{\mathbb {E}}}\big [{\mathscr {I}}^n_i(l+1)\mid \mathscr {F}_l\big ]\mathbb {1}_{\left\{ i\notin {\mathscr {V}}_l\right\} } -1 \\&= \sum _{i\notin {\mathscr {V}}_l} \frac{{\tilde{d}}_i}{{\tilde{\ell }}_n-2l+2c_l+1} - 1 = \frac{2l-1- \sum _{i\in {\mathscr {V}}_l} {\tilde{d}}_i - 2c_l}{{\tilde{\ell }}_n-2l+2c_l+1}. \end{aligned}\nonumber \\ \end{aligned}$$
(4.10)

To see that the final term in (4.10) is negative, note that if \(l= \tau _k\) for some k, then \(\sum _{i\in {\mathscr {V}}_{\tau _k}} {\tilde{d}}_i -2\tau _k= -2k \), and \(c_{\tau _k} = k\), so that

$$\begin{aligned} \begin{aligned} 2\tau _k - 1-\sum _{i\in {\mathscr {V}}_{\tau _k}} {\tilde{d}}_i - 2c_{\tau _k} = -1<0. \end{aligned}\end{aligned}$$
(4.11)

If \(\tau _{k}<l<\tau _{k+1}\), then \(\sum _{i\in {\mathscr {V}}_l{\setminus }{\mathscr {V}}_{\tau _k}} {\tilde{d}}_i -2(l-\tau _k)\ge -1\), and also \(c_l = c_{\tau _k}+1\). Therefore, using (4.11), we conclude that the final term in (4.10) is negative for all \(l\ge 1\), and consequently, \((W_n(l))_{l\ge 1}\) is a super-martingale. We will use the martingale inequality [54, Lemma 2.54.5], which states that for any sub- or super-martingale \((M(t))_{t\ge 0}\) with \(M(0)=0\),

$$\begin{aligned} \varepsilon \mathbb {P}\left( \sup _{s\le t}|M(s)|>3\varepsilon \right) \le 3\mathbb {E}\left[ |M(t)|\right] \le 3\left( |\mathbb {E}\left[ M(t)\right] |+\sqrt{\mathrm {Var}\left( M(t)\right) }\right) . \end{aligned}$$
(4.12)

Using Taylor expansion,

$$\begin{aligned}\begin{aligned} {\tilde{\mathbb {P}}}({\mathscr {I}}_i^n(l)=1) \ge 1- \Big (1-\frac{{\tilde{d}}_i}{{\tilde{\ell }}_n}\Big )^l \ge \Big (\frac{l{\tilde{d}}_i}{{\tilde{\ell }}_n} - \frac{l^2{\tilde{d}}_i^2}{{\tilde{\ell }}_n^2}\Big )\mathbb {1}_{\left\{ l{\tilde{d}}_i < {\tilde{\ell }}_n\right\} }, \end{aligned}\end{aligned}$$

and thus, using Lemma 5, and \(l = tn^{\rho }\),

$$\begin{aligned} \begin{aligned} n^{-\rho } |{\tilde{\mathbb {E}}}[W_n(tn^{\rho })]|&= t- n^{-\rho }\sum _{i\in [n]} {\tilde{\mathbb {P}}}({\mathscr {I}}_i^n(tn^{\rho })=1) \\&\le t\sum _{i\in [n]} \frac{{\tilde{d}}_i\mathbb {1}_{\left\{ {\tilde{d}}_i > {\tilde{\ell }}_n/tn^{\rho }\right\} }}{{\tilde{\ell }}_n} + \frac{t^2n^{\rho } \sum _{i\in [n]}{\tilde{d}}_i^2}{{\tilde{\ell }}_n^2}. \end{aligned}\end{aligned}$$
(4.13)

Let \({\mathscr {E}}_n\) denote the good event that \(\ell _n p_c(\lambda )/2\le {\tilde{\ell }}_n\le 2\ell _n p_c(\lambda )\) and \(p_c(\lambda ) d_i/2\le {\tilde{d}}_i\le 2p_c(\lambda ) d_i\) for all i such that \(d_i>C_0n^{\rho }\), for some sufficiently small constant \(C_0>0\). Using standard concentration inequalities for the binomial distribution [42, Theorem 2.1], \(\mathbb {P}({\mathscr {E}}_n^c) < 2\mathrm {e}^{-n^{\varepsilon }}\) for some \(\varepsilon >0\). On the event \({\mathscr {E}}_n\), \({\tilde{d}}_i> {\tilde{\ell }}_n/(tn^{\rho })\) implies \({\tilde{d}}_i > Cn^{\rho }\), and thus \(d_i> C n^{\rho }\). We can bound

$$\begin{aligned}\begin{aligned} \sum _{i\in [n]} \frac{{\tilde{d}}_i\mathbb {1}_{\left\{ {\tilde{d}}_i> {\tilde{\ell }}_n/tn^{\rho }\right\} }}{{\tilde{\ell }}_n} \le \frac{C_1}{\ell _n} \sum _{i\in [n]} d_i \mathbb {1}_{\left\{ d_i>C n^{\rho }\right\} } = o(1), \end{aligned}\end{aligned}$$

where the final step follows using the uniform integrability from Assumption 1. The second term in (4.13) is \(o_{\scriptscriptstyle \mathbb {P}}(1)\) using Lemma 5. Thus,

$$\begin{aligned} \begin{aligned} n^{-\rho } |{\tilde{\mathbb {E}}}[W_n(tn^{\rho })]|&= o_{\scriptscriptstyle \mathbb {P}}(1). \end{aligned}\end{aligned}$$
(4.14)

Next, note that for any sequence \((x_1,x_2,\dots )\) and \(a,b>0\) with \(a+b \le x_i\) for all i, one has \(\prod _{i=1}^R(1-a/x_i)(1-b/x_i)\ge \prod _{i=1}^R (1-(a+b)/x_i)\). Thus, for all \(l\ge 1\) and \(i\ne j\),

$$\begin{aligned} {\tilde{\mathbb {P}}}({\mathscr {I}}_i^n(l)=0, {\mathscr {I}}_j^n(l)=0)\le {\tilde{\mathbb {P}}}({\mathscr {I}}_i^n(l)=0){\tilde{\mathbb {P}}}({\mathscr {I}}_j^n(l)=0), \end{aligned}$$

and thus

$$\begin{aligned} \begin{aligned}&{\tilde{\mathbb {P}}}({\mathscr {I}}_i^n(l)=1, {\mathscr {I}}_j^n(l)=1) \\&= 1- {\tilde{\mathbb {P}}}({\mathscr {I}}_i^n(l)=0)-{\tilde{\mathbb {P}}}({\mathscr {I}}_j^n(l)=0)+ {\tilde{\mathbb {P}}}({\mathscr {I}}_i^n(l)=0, {\mathscr {I}}_j^n(l)=0) \\&\le 1- {\tilde{\mathbb {P}}}({\mathscr {I}}_i^n(l)=0)-{\tilde{\mathbb {P}}}({\mathscr {I}}_j^n(l)=0)+ {\tilde{\mathbb {P}}}({\mathscr {I}}_i^n(l)=0){\tilde{\mathbb {P}}}({\mathscr {I}}_j^n(l)=0)\\&={\tilde{\mathbb {P}}}({\mathscr {I}}_i^n(l)=1){\tilde{\mathbb {P}}}( {\mathscr {I}}_j^n(l)=1). \end{aligned}\end{aligned}$$
(4.15)

Therefore \({\mathscr {I}}_i^n(l)\) and \({\mathscr {I}}^n_j(l)\) are negatively correlated. Using (4.5), it follows that

$$\begin{aligned}\begin{aligned} \mathrm {Var}({\mathscr {I}}_i^n(l)\vert ({\tilde{d}}_i)_{i\in [n]})\le {\tilde{\mathbb {P}}}({\mathscr {I}}_i^n(l)=1) \le \frac{l{\tilde{d}}_i }{{\tilde{\ell }}_n(t)}, \end{aligned}\end{aligned}$$

uniformly over \(l\le tn^{\rho }\). Therefore, using the negative correlation in (4.15),

$$\begin{aligned} \begin{aligned} n^{-2\rho }\mathrm {Var}\big (W_n(tn^{\rho })\big \vert ({\tilde{d}}_i)_{i\in [n]}\big )&\le n^{-2\rho } \sum _{i\in [n]} \mathrm {Var}\left( {\mathscr {I}}_i^n(tn^{\rho })\vert ({\tilde{d}}_i)_{i\in [n]}\right) \\&\le n^{-2\rho } tn^{\rho } \frac{\sum _{i\in [n]}{\tilde{d}}_i}{{\tilde{\ell }}_n(t)}= \varTheta _{\scriptscriptstyle \mathbb {P}}(n^{-\rho }) =o_{\scriptscriptstyle \mathbb {P}}(1). \end{aligned} \end{aligned}$$
(4.16)

Using (4.14) and (4.16), the proof now follows by an application of (4.12). \(\quad \square \)

4.2.2 Large components are explored early

In this section, we prove two key results that allow us to deduce the convergence of the component sizes. Firstly, we show that the rescaled vector of component sizes is tight in \(\ell ^2_{{\scriptscriptstyle \downarrow }}\) (see Proposition 4). This result is then used to show that the largest components of \({\mathscr {G}}_n(p_c(\lambda ))\) are explored before time \(\varTheta (n^{\rho })\) (Proposition 5). The latter allows us to apply Proposition 1. Let \({\mathscr {C}}_{\scriptscriptstyle (i)}\) denote the i-th largest component of \({\mathscr {G}}_n(p_c(\lambda ))\). Recall that our convention is to take \(|{\mathscr {C}}| =0\) if the component consists of one vertex and no edges.

Proposition 4

Under Assumption 1, for any \(\varepsilon >0\),

$$\begin{aligned} \lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty }\mathbb {P}\bigg (\sum _{i>K}|{\mathscr {C}}_{\scriptscriptstyle (i)}|^2>\varepsilon n^{2\rho }\bigg ) = 0. \end{aligned}$$

Let \({\mathscr {G}}^{\scriptscriptstyle K}_n\) be the random graph obtained by removing all edges attached to vertices \(1,\dots ,K\) and let \({\varvec{d}}'\) be the obtained degree sequence. Further, let \({\mathscr {C}}^{\scriptscriptstyle K}(v)\) and \({\mathscr {C}}_{\scriptscriptstyle (i)}^{\scriptscriptstyle K}\) denote the connected component containing v and the i-th largest component respectively in \({\mathscr {G}}^{\scriptscriptstyle K}_n\). Let \(D^{\scriptscriptstyle K}(v) = \sum _{k\in {\mathscr {C}}^{\scriptscriptstyle K}(v)}{\tilde{d}}_k\) and \(D^{\scriptscriptstyle K}_i = \sum _{k\in {\mathscr {C}}_{\scriptscriptstyle (i)}^{\scriptscriptstyle K}}{\tilde{d}}_k\). Let \(V_n^{*,{\scriptscriptstyle K}}\) be chosen according to the following size-biased distribution:

$$\begin{aligned}\begin{aligned} \mathbb {P}(V_n^{*,{\scriptscriptstyle K}} = i) = \frac{{\tilde{d}}_i}{{\tilde{\ell }}_n - \sum _{j=1}^K {\tilde{d}}_j}, \quad \text {for }i\in [n]{\setminus }[K]. \end{aligned}\end{aligned}$$

Also, denote the criticality parameter of \({\mathscr {G}}^{\scriptscriptstyle K}_n\) by \(\nu _n^{\scriptscriptstyle K}\).

Lemma 8

Suppose that Assumption 1 holds. Then, for any \(\varepsilon >0\),

$$\begin{aligned}\begin{aligned} \lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty }\mathbb {P}\bigg ({\tilde{\mathbb {E}}}\bigg [\sum _{k\in {\mathscr {C}}^K(V_n^{*,{\scriptscriptstyle K}})}({\tilde{d}}_k-1)\bigg ] > \varepsilon \bigg ) = 0. \end{aligned}\end{aligned}$$

Proof

Note that the criticality parameter of \({\mathscr {G}}_n(p_c(\lambda ))\) is \({\tilde{\nu }}_n = \lambda (1+o_{\scriptscriptstyle \mathbb {P}}(1))\), by Lemma 5. Now, conditionally on the set of removed half-edges, \({\mathscr {G}}^{\scriptscriptstyle K}_n\) is still a configuration model with some degree sequence \({\varvec{d}}'\) with \(d_i'\le {\tilde{d}}_i\) for all \(i\in [n]{\setminus } [K]\) and \(d_i'=0\) for \(i\in [K]\). Further, the criticality parameter of \({\mathscr {G}}^{\scriptscriptstyle K}_n\) satisfies

$$\begin{aligned} \begin{aligned} \nu ^{\scriptscriptstyle K}_n&= \frac{\sum _{i\in [n]} d_i'(d'_i-1)}{\sum _{i\in [n]} d_i'}\le \frac{\sum _{i>K}{\tilde{d}}_i({\tilde{d}}_i-1)}{{\tilde{\ell }}_n-2\sum _{i=1}^K{\tilde{d}}_i} = \lambda \frac{\sum _{i>K}{\tilde{d}}_i({\tilde{d}}_i-1)}{\sum _{i\in [n]}{\tilde{d}}_i({\tilde{d}}_i-1)}(1+o_{\scriptscriptstyle \mathbb {P}}(1)), \end{aligned} \end{aligned}$$
(4.17)

where we have used \({\tilde{\nu }}_n = \lambda (1+o_{\scriptscriptstyle \mathbb {P}}(1))\) in the last step. Now, by Assumption 1 and Lemma 5, it is possible to choose \(K_0\) large such that for all \(K\ge K_0\)

$$\begin{aligned} \nu _n^{\scriptscriptstyle K}<1 \quad \text {with high probability}. \end{aligned}$$

This yields

$$\begin{aligned} \begin{aligned} {\tilde{\mathbb {E}}} \bigg [\sum _{k\in {\mathscr {C}}^K(V_n^{*,{\scriptscriptstyle K}})}({\tilde{d}}_k-1)\bigg ] \le {\tilde{\mathbb {E}}} [{\tilde{d}}_{V_n^{*,{\scriptscriptstyle K}}}-1]\Big (1+ \frac{{\tilde{\mathbb {E}}}[{\tilde{d}}_{\scriptscriptstyle V_n^{*,{\scriptscriptstyle K}}}]}{(1-\nu _n^{\scriptscriptstyle K})}+o_{\scriptscriptstyle \mathbb {P}}(1)\Big ), \end{aligned}\end{aligned}$$
(4.18)

where \({\tilde{d}}_{\scriptscriptstyle V_n^{*,{\scriptscriptstyle K}}}\) is the degree of the vertex \(V_n^{*,{\scriptscriptstyle K}}\) in \({\mathscr {G}}_n(p_c(\lambda ))\). The proof of (4.18) uses path-counting techniques for the configuration model [40]. Since the arguments are adaptations of [27], we move the proof to Appendix A.1. We now use Lemma 5 to compute the asymptotics of the different terms in (4.18). Note that \({\tilde{\mathbb {E}}}[{\tilde{d}}_{\scriptscriptstyle V_n^{*,{\scriptscriptstyle K}}}] \le (1+o_{\scriptscriptstyle \mathbb {P}}(1))\sum _{i>K}{\tilde{d}}_i^2/{\tilde{\ell }}_n = O_{\scriptscriptstyle \mathbb {P}}(1)\), and

$$\begin{aligned} {\tilde{\mathbb {E}}}[{\tilde{d}}_{\scriptscriptstyle V_n^{*,{\scriptscriptstyle K}}}-1] = \frac{\sum _{i>K}{\tilde{d}}_i({\tilde{d}}_i-1)}{{\tilde{\ell }}_n-\sum _{i=1}^K{\tilde{d}}_i} = (1+o_{\scriptscriptstyle \mathbb {P}}(1)) \frac{p_n\sum _{i>K}d_i(d_i-1)}{\sum _{i\in [n]}d_i} \xrightarrow {\mathbb {P}}0, \end{aligned}$$

in the iterated limit \(\lim _{K\rightarrow \infty } \limsup _{n\rightarrow \infty }\). This completes the proof of Lemma 8. \(\quad \square \)

Proof of Proposition 4

Recall that \({\mathscr {C}}_{\scriptscriptstyle (i)}^{\scriptscriptstyle K}\) denotes the i-th largest component in \({\mathscr {G}}^{\scriptscriptstyle K}_n\) and \(D^{\scriptscriptstyle K}_i = \sum _{k\in {\mathscr {C}}_{\scriptscriptstyle (i)}^{\scriptscriptstyle K}}{\tilde{d}}_k\). Denote by \(\mathscr {S}_K\) the sum of the squared component sizes after removing the components containing \(1,\dots ,K\). Note that

$$\begin{aligned} \begin{aligned} \sum _{i>K} |{\mathscr {C}}_{\scriptscriptstyle (i)}|^2&= \sum _{i\ge 1} |{\mathscr {C}}_{\scriptscriptstyle (i)}|^2-\sum _{i=1}^K |{\mathscr {C}}_{\scriptscriptstyle (i)}|^2 \le \mathscr {S}_K \le \sum _{i\ge 1} |{\mathscr {C}}_{\scriptscriptstyle (i)}^{\scriptscriptstyle K}|^2 \le 4\sum _{i\ge 1} D_i^{\scriptscriptstyle K}\sum _{k\in {\mathscr {C}}_{\scriptscriptstyle (i)}^{\scriptscriptstyle K}}({\tilde{d}}_k-1), \end{aligned}\nonumber \\ \end{aligned}$$
(4.19)

where the last step uses \(d_i'\le {\tilde{d}}_i\) and the fact that any connected component \({\mathscr {C}}\) with total degree D satisfies \(D-|{\mathscr {C}}|\ge |{\mathscr {C}}|/4\). To see the latter, note that a connected component has at least \(|{\mathscr {C}}|-1\) edges, so that for \(|{\mathscr {C}}|\ge 2\), \(D-|{\mathscr {C}}| \ge 2(|{\mathscr {C}}| -1) -|{\mathscr {C}}| = |{\mathscr {C}}| -2 \ge |{\mathscr {C}}|/4\), while for \(|{\mathscr {C}}| = 1\) and \(D\ge 2\), the claim is trivial. Note here that we do not consider components with \(|{\mathscr {C}}| = 1\) and \(D=0\); see Remark 2. Thus it is enough to bound the final term in (4.19). Now,

$$\begin{aligned} \begin{aligned} {\tilde{\mathbb {P}}}\bigg (\sum _{i\ge 1} D_i^{\scriptscriptstyle K}\sum _{k\in {\mathscr {C}}_{\scriptscriptstyle (i)}^{\scriptscriptstyle K}}({\tilde{d}}_k-1) >\varepsilon n^{2\rho }\bigg )&\le \frac{1}{\varepsilon n^{2\rho }}{\tilde{\mathbb {E}}}\bigg [\sum _{i\ge 1} D_i^{\scriptscriptstyle K}\sum _{k\in {\mathscr {C}}_{\scriptscriptstyle (i)}^{\scriptscriptstyle K}}({\tilde{d}}_k-1) \bigg ] \\&= \frac{{\tilde{\ell }}_n-\sum _{i\in [K]} {\tilde{d}}_i}{\varepsilon n^{2\rho } }{\tilde{\mathbb {E}}}\bigg [\sum _{k\in {\mathscr {C}}^K(V_n^{*,{\scriptscriptstyle K}})}({\tilde{d}}_k-1)\bigg ]. \end{aligned}\nonumber \\ \end{aligned}$$
(4.20)

Thus, the proof follows using Lemma 8, and the fact that \({\tilde{\ell }}_n-\sum _{i\in [K]} {\tilde{d}}_i \le {\tilde{\ell }}_n = O_{\scriptscriptstyle \mathbb {P}}(n^{2\rho })\). \(\quad \square \)

The next proposition shows that, in Algorithm 3, the large components are explored before time \(\varTheta (n^{\rho })\). Let \({\mathscr {C}}_{\max }^{\scriptscriptstyle \ge T}\) denote the largest component whose exploration is started by Algorithm 3 after time \(Tn^{\rho }\), and let \(D_{\max }^{\scriptscriptstyle \ge T} = \sum _{k\in {\mathscr {C}}_{\max }^{\scriptscriptstyle \ge T}}{\tilde{d}}_k\).

Proposition 5

Under Assumption 1, for any \(\varepsilon >0\),

$$\begin{aligned} \lim _{T\rightarrow \infty }\limsup _{n\rightarrow \infty }\mathbb {P}\big (|{\mathscr {C}}_{\max }^{\scriptscriptstyle \ge T}|>\varepsilon n^{\rho }\big ) = 0 \quad \text {and} \quad \lim _{T\rightarrow \infty }\limsup _{n\rightarrow \infty }\mathbb {P}\big (D_{\max }^{\scriptscriptstyle \ge T}>\varepsilon n^{\rho }\big ) = 0. \end{aligned}$$

Proof

Define \(\mathscr {A}_{\scriptscriptstyle K,T}^n:= \{\text {all the vertices of }[K] \text { are explored before time }Tn^{\rho }\}\). Recall that \(\mathscr {C}_{\scriptscriptstyle (i)}^{\scriptscriptstyle K}\) denotes the i-th largest component of \({\mathscr {G}}^{\scriptscriptstyle K}_n\), so that

$$\begin{aligned} \begin{aligned} \tilde{\mathbb {P}}\left( |\mathscr {C}_{\max }^{\scriptscriptstyle \ge T}|>\varepsilon n^\rho , \ \mathscr {A}_{\scriptscriptstyle K,T}^n\right)&\le {\tilde{\mathbb {P}}}\bigg (\sum _{i\ge 1}\big |\mathscr {C}_{\scriptscriptstyle (i)}^{\scriptscriptstyle K}\big |^2> \varepsilon ^2 n^{2\rho }\bigg ) \\&\le {\tilde{\mathbb {P}}}\bigg (\sum _{i\ge 1} D_i^{\scriptscriptstyle K}\sum _{k\in {\mathscr {C}}_{\scriptscriptstyle (i)}^{\scriptscriptstyle K}}({\tilde{d}}_k-1) >\frac{\varepsilon ^2 n^{2\rho }}{4}\bigg ). \end{aligned}\end{aligned}$$
(4.21)

The final term tends to zero in the iterated limit \(\lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty }\), as shown in (4.20). Next, using the fact that \({\tilde{d}}_jn^{\rho }=\varTheta ({\tilde{\ell }}_n)\), we get

$$\begin{aligned} \begin{aligned}&\tilde{\mathbb {P}}\left( (\mathscr {A}_{\scriptscriptstyle K,T}^{n})^c\right) =\tilde{\mathbb {P}}\left( \exists j\in [K]: j \text { is not explored before }Tn^{\rho }\right) \\&\le \sum _{j=1}^K \tilde{\mathbb {P}}\left( j \text { is not explored before }Tn^{\rho }\right) \le \sum _{j=1}^K\left( 1-\frac{{\tilde{d}}_j}{{\tilde{\ell }}_n-\varTheta (Tn^{\rho })} \right) ^{Tn^{\rho }}\le \sum _{j=1}^K \mathrm {e}^{-CT}, \end{aligned}\nonumber \\ \end{aligned}$$
(4.22)

where \(C>0\) is a constant that may depend on K, and the final step holds with high probability. Now, by (4.21),

$$\begin{aligned} \tilde{\mathbb {P}}\left( |\mathscr {C}_{\max }^{\scriptscriptstyle \ge T}|>\varepsilon n^{\rho }\right) \le {\tilde{\mathbb {P}}}\bigg (\sum _{i\ge 1}\big |\mathscr {C}_{\scriptscriptstyle (i)}^{\scriptscriptstyle K}\big |^2> \varepsilon ^2 n^{2\rho }\bigg ) + \tilde{\mathbb {P}}\left( (\mathscr {A}_{\scriptscriptstyle K,T}^{n})^c\right) . \end{aligned}$$

The proof for \(\mathbb {P}\big (|{\mathscr {C}}_{\max }^{\scriptscriptstyle \ge T}|>\varepsilon n^{\rho }\big )\) follows by taking the iterated limit \(\lim _{K\rightarrow \infty }\lim _{T\rightarrow \infty }\limsup _{n\rightarrow \infty }\).

For the upper bound on \(\tilde{\mathbb {P}}\left( D_{\max }^{\scriptscriptstyle \ge T}>\varepsilon n^\rho , \ \mathscr {A}_{\scriptscriptstyle K,T}^n\right) \), note that

$$\begin{aligned}\begin{aligned} \tilde{\mathbb {P}}\left( D_{\max }^{\scriptscriptstyle \ge T}>\varepsilon n^\rho , |\mathscr {C}_{\max }^{\scriptscriptstyle \ge T}|\le \varepsilon n^{\rho }/2, \ \mathscr {A}_{\scriptscriptstyle K,T}^n\right)&\le \tilde{\mathbb {P}}\left( D_{\max }^{\scriptscriptstyle \ge T}(D_{\max }^{\scriptscriptstyle \ge T}-|\mathscr {C}_{\max }^{\scriptscriptstyle \ge T}|)>\varepsilon ^2 n^{2\rho }/2, \ \mathscr {A}_{\scriptscriptstyle K,T}^n\right) \\&\le {\tilde{\mathbb {P}}}\bigg (\sum _{i\ge 1} D_i^{\scriptscriptstyle K}\sum _{k\in {\mathscr {C}}_{\scriptscriptstyle (i)}^{\scriptscriptstyle K}}({\tilde{d}}_k-1) >\frac{\varepsilon ^2 n^{2\rho }}{2}\bigg ). \end{aligned}\end{aligned}$$

Hence, the proof for \(\mathbb {P}\big (D_{\max }^{\scriptscriptstyle \ge T}>\varepsilon n^{\rho }\big )\) also follows. \(\quad \square \)

4.2.3 The counting process for surplus edges

Let \(N_n^\lambda (k)\) be the number of surplus edges discovered up to time k and \({\bar{N}}^\lambda _n(u) = N_n^\lambda (\lfloor un^\rho \rfloor )\). Below, we prove the asymptotics for the process \(\bar{\mathbf {N}}^\lambda _n\):

Lemma 9

Under Assumption 1, as \(n\rightarrow \infty \),

$$\begin{aligned} (\bar{\mathbf {S}}_n,\bar{\mathbf {N}}_n^\lambda )\xrightarrow {d}(\mathbf {S}_{\infty }^\lambda ,\mathbf {N}^\lambda ), \end{aligned}$$

where \(\mathbf {N}^\lambda \) is defined in (2.6).

Proof

We write \( N_n^{\lambda }(l)=\sum _{i=2}^l \xi _i\), where \(\xi _i=\mathbb {1}_{\left\{ \mathscr {V}_i=\mathscr {V}_{i-1}\right\} }\). Let \(A_i\) denote the number of active half-edges after stage i while implementing Algorithm 3. Note that

$$\begin{aligned} \tilde{\mathbb {P}}\left( \xi _i=1\mid \mathscr {F}_{i-1}\right) =\frac{A_{i-1}-1}{{\tilde{\ell }}_n-2i-1}= \frac{A_{i-1}}{{\tilde{\ell }}_n}(1+O(i/{\tilde{\ell }}_n))+O({\tilde{\ell }}_n^{-1}), \end{aligned}$$

uniformly for \(i\le Tn^{\rho }\) for any \(T>0\). By Lemma 5, \({\tilde{\ell }}_n = \ell _n p_c(\lambda ) (1+o_{\scriptscriptstyle \mathbb {P}}(1))= n^{2\rho } \lambda \mu ^2/ \sum _i \theta _i^2(1+o_{\scriptscriptstyle \mathbb {P}}(1))\). Therefore, the instantaneous rate of change of the rescaled process \(\bar{\mathbf {N}}^{\lambda }_n\) at time t, conditionally on the past, is

$$\begin{aligned} n^{\rho }\frac{A_{\left\lfloor tn^\rho \right\rfloor }}{n^{2\rho }\frac{\lambda \mu ^2}{\sum _{i\ge 1}\theta _i^2}}\left( 1+o_{\scriptscriptstyle \mathbb {P}}(1)\right) +o_{\scriptscriptstyle \mathbb {P}}(1)= \frac{\sum _{i\ge 1}\theta _i^2}{\lambda \mu ^2}\mathrm {refl}({\bar{S}}_n(t))\left( 1+o_{\scriptscriptstyle \mathbb {P}}(1)\right) +o_{\scriptscriptstyle \mathbb {P}}(1).\nonumber \\ \end{aligned}$$
(4.23)

Since the reflection of a process is continuous in the Skorohod \(J_1\)-topology (see [56, Lemma 13.5.1]), we can use Theorem 5 to conclude that \(\mathrm {refl}(\bar{\mathbf {S}}_n) \xrightarrow {\scriptscriptstyle d} \mathrm {refl} (\mathbf {S}^\lambda _\infty )\), so that the compensator of \(\bar{\mathbf {N}}_n^\lambda \) converges. Convergence of the compensators is often sufficient for the convergence of such counting processes. Indeed, for Erdős-Rényi random graphs [4] or rank-one inhomogeneous random graphs [11, 12], the convergence of compensators suffices by [22, Theorem 1]. This is because the surplus edges can be added independently after we have observed the whole exploration process. However, this is not true for the configuration model, because the surplus edges occur precisely at the places with jumps \(-2\). This difficulty was circumvented in [27] for the \(\tau \in (3,4)\) regime. In Appendix A.2, we adapt the arguments from [27] to the \(\tau \in (2,3)\) setting, which completes the proof of Lemma 9. \(\quad \square \)
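To make the surplus edges concrete, here is a small simulation sketch (ours, with toy heavy-tailed degrees; not part of the proof): it generates a uniform pairing of half-edges, so that a surplus edge arises whenever a pairing closes a cycle, and computes \((|\mathscr {C}|, \mathrm {SP}(\mathscr {C}))\) for each component, where the surplus \(\mathrm {SP}(\mathscr {C})\) counts the edges in excess of a spanning tree.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)

def components_with_surplus(degrees):
    """Uniformly pair half-edges (a configuration model) and return a list of
    (|C|, SP(C)) per component, where SP(C) = #edges(C) - |C| + 1 counts the
    edges in excess of a spanning tree (self-loops/multi-edges included)."""
    half = rng.permutation(np.repeat(np.arange(len(degrees)), degrees))
    if len(half) % 2:                       # drop one half-edge if total degree is odd
        half = half[:-1]
    edges = list(zip(half[0::2], half[1::2]))

    parent = list(range(len(degrees)))      # union-find over the vertices
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)

    size, ne = defaultdict(int), defaultdict(int)
    for v in range(len(degrees)):
        size[find(v)] += 1
    for u, v in edges:
        ne[find(u)] += 1
    return [(size[r], ne[r] - size[r] + 1) for r in size]

deg = rng.zipf(2.5, size=3000)              # toy heavy-tailed degree sequence
print(sorted(components_with_surplus(deg), reverse=True)[:5])
```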

4.2.4 Convergence of the component sizes and the surplus edges

We first show the asymptotics of the component sizes and surplus edges of \({\mathscr {G}}_n(p_c(\lambda ))\) generated by Algorithm 1. Recall that \(\mathrm {SP}(\mathscr {C})\) denotes the number of surplus edges of \(\mathscr {C}\). The following lemma states the tightness of the vector of component sizes and surplus edges of \({\mathscr {G}}_n(p_c(\lambda ))\) in the \({\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }}\)-topology:

Lemma 10

For any \(\varepsilon >0\),

$$\begin{aligned} \lim _{\delta \rightarrow 0}\limsup _{n\rightarrow \infty } \mathbb {P}\bigg (\sum _{i: |\mathscr {C}_{(i)}|\le \delta n^{\rho } }|\mathscr {C}_{\scriptscriptstyle (i)}|\times \mathrm {SP}(\mathscr {C}_{\scriptscriptstyle (i)})> \varepsilon n^{\rho }\bigg )=0. \end{aligned}$$

The proof of Lemma 10 is an adaptation of [27, Proposition 19] to this setting. We provide a proof of Lemma 10 in Appendix A.3. Next, let \(\mathbf {Z}_n'(\lambda )\) denote the vector \((n^{-\rho }|{\mathscr {C}}_{\scriptscriptstyle (i)}|,\mathrm {SP}({\mathscr {C}}_{\scriptscriptstyle (i)}))_{i\ge 1}\), ordered as an element of \({\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }}\). Below, we prove the scaling limit of \(\mathbf {Z}_n'(\lambda )\):

Proposition 6

Under Assumption 1, as \(n\rightarrow \infty \),

$$\begin{aligned} \mathbf {Z}_n'(\lambda ) \xrightarrow {d}\mathbf {Z}(\lambda ) \end{aligned}$$
(4.24)

with respect to the \({\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }}\) topology, where \(\mathbf {Z}(\lambda )\) is defined in (2.7).

Proof

Recall from Proposition 2 that the limiting process \(\mathbf {S}_{\infty }^\lambda \) is good in the sense that all the conditions in Definition 1 are satisfied. Also, Proposition 5 ensures that the additional restriction on the pre-limit process in Proposition 1 is satisfied. Thus, using Theorem 5, an application of Proposition 1 yields the finite-dimensional convergence in (4.24). Finally, the convergence in the \({\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }}\)-topology follows using the tightness in Lemma 10. \(\quad \square \)

We now provide a proof of Theorem 1:

Proof of Theorem 1

Throughout the proof, we suppress \(\lambda \) from previously defined notation to simplify the writing. We work under the coupling from Proposition 3, i.e., \({\mathscr {G}}_n(p_c(1-\varepsilon _n))\subset \mathrm {CM}_n(\varvec{d},p_c)\subset {\mathscr {G}}_n(p_c(1+\varepsilon _n))\), where \(\varepsilon _n\rightarrow 0\). We write \({\mathscr {C}}_{\scriptscriptstyle (i)}^-\), \({\mathscr {C}}_{\scriptscriptstyle (i)}\) and \({\mathscr {C}}_{\scriptscriptstyle (i)}^+\) to denote the i-th largest component of \({\mathscr {G}}_n(p_c(1-\varepsilon _n))\), \(\mathrm {CM}_n(\varvec{d},p_c)\) and \({\mathscr {G}}_n(p_c(1+\varepsilon _n))\) respectively, and let \(\mathbf {Z}_n^-\), \(\mathbf {Z}_n\) and \(\mathbf {Z}_n^+\) be the corresponding vectors, rearranged as elements of \({\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }}\). Then,

$$\begin{aligned}\begin{aligned} \mathbf {Z}_n^+ \text { and } \mathbf {Z}_n^- \text { have the same scaling limit as in Proposition}~6. \end{aligned}\end{aligned}$$

Let \(\mathrm {d}_{\scriptscriptstyle {\mathbb {U}}} \) denote the metric for the \({\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }}\) topology defined in (2.1). The proof is complete if we can show that, as \(n\rightarrow \infty \),

$$\begin{aligned} \begin{aligned} \mathrm {d}_{\scriptscriptstyle {\mathbb {U}}} (\mathbf {Z}_n^+,\mathbf {Z}_n) \xrightarrow {\mathbb {P}}0. \end{aligned}\end{aligned}$$
(4.25)

First, we prove that, for any \(K\ge 1\),

$$\begin{aligned} \begin{aligned} \lim _{n\rightarrow \infty } \mathbb {P}(\mathscr {C}_{\scriptscriptstyle (i)} ^{-} \subset \mathscr {C}_{\scriptscriptstyle (i)} ^{+}, \ \forall i\le K) = 1. \end{aligned}\end{aligned}$$
(4.26)

If \(\mathscr {C}_{\scriptscriptstyle (1)} ^{-} \) is not contained in \( \mathscr {C}_{\scriptscriptstyle (1)} ^{+}\), then \(|\mathscr {C}_{\scriptscriptstyle (1)} ^{-}|\le |\mathscr {C}_{\scriptscriptstyle (j)} ^{+}|\) for some \(j\ge 2\), which implies that \(|\mathscr {C}_{\scriptscriptstyle (1)} ^{-}|\le |\mathscr {C}_{\scriptscriptstyle (2)} ^{+}|\). Suppose that there is a subsequence \((n_{0k})_{k\ge 1} \subset \mathbb {N}\) along which

$$\begin{aligned} \begin{aligned} \lim _{n_{0k}\rightarrow \infty } \mathbb {P}(|\mathscr {C}_{\scriptscriptstyle (1)}^{-}| \le |\mathscr {C}_{\scriptscriptstyle (2)}^{+}|) >0. \end{aligned}\end{aligned}$$
(4.27)

We will show that (4.27) leads to a contradiction, which proves (4.26) for \(K=1\). To this end, first note that \((n^{-\rho }(|\mathscr {C}_{\scriptscriptstyle (i)}^{-}|, |\mathscr {C}_{\scriptscriptstyle (i)}^{+}|)_{i\ge 1})_{n\ge 1}\) is tight in \((\ell ^2_{{\scriptscriptstyle \downarrow }})^2\). Thus, taking a subsequence \((n_k)_{k\ge 1} \subset (n_{0k})_{k\ge 1}\) along which the random vector converges, it follows that

$$\begin{aligned}\begin{aligned} n_k^{-\rho }(|\mathscr {C}_{\scriptscriptstyle (i)}^{-}|, |\mathscr {C}_{\scriptscriptstyle (i)}^{+}|)_{i\ge 1} \xrightarrow {d}(\gamma _i,{\bar{\gamma }}_i)_{i\ge 1} \quad \text { in }(\ell ^2_{{\scriptscriptstyle \downarrow }})^2, \end{aligned}\end{aligned}$$

where \((\gamma _i)_{i\ge 1} {\mathop {=}\limits ^{\scriptscriptstyle d}}({\bar{\gamma }}_i)_{i\ge 1}\). Thus, along the subsequence \((n_k)_{k\ge 1}\),

$$\begin{aligned} \begin{aligned} \lim _{n_k\rightarrow \infty } \mathbb {P}(|\mathscr {C}_{\scriptscriptstyle (1)}^{-}| \le |\mathscr {C}_{\scriptscriptstyle (2)}^{+}|) = \mathbb {P}(\gamma _1\le {\bar{\gamma }}_2). \end{aligned}\end{aligned}$$
(4.28)


Fact 2

For all \(i\ge 1\), \(\gamma _i = {\bar{\gamma }}_i\) almost surely.

Proof

Under the coupling in Proposition 3, \(\sum _{j\le i}|\mathscr {C}_{\scriptscriptstyle (j)}^{-}|\le \sum _{j\le i}|\mathscr {C}^{+}_{\scriptscriptstyle (j)}|\) and therefore \(\mathbb {P}(\sum _{j\le i}\gamma _j\le \sum _{j\le i} {\bar{\gamma }}_j) =1\), for each fixed \(i\ge 1\). In particular, \(\gamma _1 \le {\bar{\gamma }}_1\) almost surely. But, since \(\gamma _1,{\bar{\gamma }}_1\) have the same distribution, it must be the case that \(\gamma _1 = {\bar{\gamma }}_1\) almost surely. Inductively, we can prove that \(\gamma _i = {\bar{\gamma }}_i\) almost surely. \(\quad \square \)

Thus, using Fact 2, (4.28) reduces to

$$\begin{aligned} \begin{aligned} \lim _{n_k\rightarrow \infty } \mathbb {P}(|\mathscr {C}_{\scriptscriptstyle (1)}^{-}| \le |\mathscr {C}^{+}_{\scriptscriptstyle (2)}|) = \mathbb {P}(\gamma _1\le \gamma _2) = \mathbb {P}(\gamma _1= \gamma _2) = 0, \end{aligned}\nonumber \\ \end{aligned}$$
(4.29)

where the last equality follows from Definition 1(f) and Proposition 2. Note that (4.29) contradicts (4.27), and thus (4.26) follows for \(K=1\). For \(K\ge 2\), a similar argument shows that, with high probability, \(\cup _{i\le K}\mathscr {C}_{\scriptscriptstyle (i)}^{-}\subset \cup _{i\le K}\mathscr {C}_{\scriptscriptstyle (i)}^{+}\). If both \(\mathscr {C}_{\scriptscriptstyle (1)}^{-}\) and \(\mathscr {C}_{\scriptscriptstyle (2)}^{-}\) were contained in \(\mathscr {C}^{+}_{\scriptscriptstyle (1)}\), then \(|\mathscr {C}^{+}_{\scriptscriptstyle (1)}| \ge |\mathscr {C}_{\scriptscriptstyle (1)}^{-}|+|\mathscr {C}_{\scriptscriptstyle (2)}^{-}|\), which occurs with probability tending to zero by Fact 2 and \(\mathbb {P}({\bar{\gamma }}_1\ge \gamma _1+\gamma _2)=\mathbb {P}(\gamma _1\ge \gamma _1+\gamma _2) = 0\). Thus, \(\mathscr {C}_{\scriptscriptstyle (2)}^{-} \subset \mathscr {C}^{+}_{\scriptscriptstyle (2)}\) with high probability, and iterating this argument yields (4.26) for all \(i\le K\).

Next, we show that, for any \(K\ge 1\),

$$\begin{aligned} \begin{aligned} \lim _{n\rightarrow \infty } \mathbb {P}\big (\mathscr {C}_{\scriptscriptstyle (i)} ^{-} \subset \mathscr {C}_{\scriptscriptstyle (i)} \subset \mathscr {C}_{\scriptscriptstyle (i)} ^{+}, \ \forall i\le K\big ) = 1. \end{aligned}\end{aligned}$$
(4.30)

If \(\mathscr {C}_{\scriptscriptstyle (1)} \) is not contained in \( \mathscr {C}_{\scriptscriptstyle (1)} ^{+}\), then \(|\mathscr {C}_{\scriptscriptstyle (1)}|\le |\mathscr {C}_{\scriptscriptstyle (2)} ^{+}|\). However, since \(|\mathscr {C}_{\scriptscriptstyle (1)}^{-}|\le |\mathscr {C}_{\scriptscriptstyle (1)}|\), it follows that \(|\mathscr {C}_{\scriptscriptstyle (1)}^{-}|\le |\mathscr {C}_{\scriptscriptstyle (2)} ^{+}|\). Now, one can repeat the argument leading to (4.26) to prove that \(\mathscr {C}_{\scriptscriptstyle (i)} \subset \mathscr {C}_{\scriptscriptstyle (i)} ^{+}\) for all \(i\le K\) with high probability. Moreover, since \({\mathscr {G}}_n(p_c(1-\varepsilon _n)) \subset \mathrm {CM}_n(\varvec{d},p_c) \) and \(\mathscr {C}_{\scriptscriptstyle (i)} ^{-} \subset \mathscr {C}_{\scriptscriptstyle (i)} ^{+}\) for all \(i\le K\) with high probability, it must also be the case that \(\mathscr {C}_{\scriptscriptstyle (i)} ^{-} \subset \mathscr {C}_{\scriptscriptstyle (i)} \subset \mathscr {C}_{\scriptscriptstyle (i)} ^{+}\) for all \(i\le K\) with high probability. This proves (4.30). Finally, since \(\mathbf {Z}_n^{-}\) and \(\mathbf {Z}_n^{+}\) have the same distributional limit, it follows using (4.26) that, for all \(i\le K\),

$$\begin{aligned}\begin{aligned} |\mathscr {C}_{\scriptscriptstyle (i)} ^{+}| - |\mathscr {C}_{\scriptscriptstyle (i)} ^{-}| = o_{\scriptscriptstyle \mathbb {P}}(n^\rho ) \quad \text {and}\quad \mathrm {SP}(\mathscr {C}_{\scriptscriptstyle (i)} ^{+})-\mathrm {SP}(\mathscr {C}_{\scriptscriptstyle (i)} ^{-}) \xrightarrow {\mathbb {P}}0. \end{aligned}\end{aligned}$$

Thus, (4.30) yields

$$\begin{aligned}\begin{aligned} \big | |\mathscr {C}_{\scriptscriptstyle (i)} ^{+}| - |\mathscr {C}_{\scriptscriptstyle (i)}| \big |= o_{\scriptscriptstyle \mathbb {P}}(n^{\rho }) \quad \text {and}\quad \big |\mathrm {SP}(\mathscr {C}_{\scriptscriptstyle (i)} ^+)-\mathrm {SP}(\mathscr {C}_{\scriptscriptstyle (i)} ) \big |\xrightarrow {\mathbb {P}}0. \end{aligned}\end{aligned}$$

Moreover, since both \((\mathbf {Z}_n^{-})_{n\ge 1}\) and \((\mathbf {Z}_n^{+})_{n\ge 1}\) are tight in \({\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }}\), it also follows that \((\mathbf {Z}_n)_{n\ge 1}\) is tight in \({\mathbb {U}}^0_{{\scriptscriptstyle \downarrow }}\). Thus (4.25) follows and the proof of Theorem 1 is now complete. \(\quad \square \)

4.3 Analysis of the diameter

In this section, we investigate the asymptotics of the diameter of \({\mathscr {G}}_n(p_c(\lambda ))\). As in the proof of Theorem 1, an application of Proposition 3 then transfers the result to \(\mathrm {CM}_n(\varvec{d},p_c(\lambda ))\) and completes the proof.

Proof of Theorem 2

First, let us fix \(\lambda <1\) and use path counting. Let \(P_l\) denote the number of paths of length l in \({\mathscr {G}}_n(p_c(\lambda ))\). Since \(\lambda <1\), we have \({\tilde{\nu }}_n = \lambda (1+o_{\scriptscriptstyle \mathbb {P}}(1)) < 1\) with high probability. Now, an application of [40, Lemma 5.1] yields that, for all \(l\ge 1\), \({\tilde{\mathbb {E}}}[P_l] \le {\tilde{\ell }}_n ({\tilde{\nu }}_n)^{l-1}\). Thus, on the event \(\{{\tilde{\nu }}_n<1\}\), for any \(K\ge 1\),

$$\begin{aligned} \begin{aligned} {\tilde{\mathbb {P}}}(\mathrm {diam}({\mathscr {G}}_n(p_c(\lambda )))>K) \le \sum _{l>K} {\tilde{\mathbb {E}}}[P_l]\le \frac{{\tilde{\ell }}_n({\tilde{\nu }}_n)^K}{1-{\tilde{\nu }}_n}. \end{aligned}\end{aligned}$$
(4.31)

Now, taking \(K= C\log n\) for some large constant \(C>0\) gives the desired \(\log n\) bound on the diameter of \({\mathscr {G}}_n(p_c(\lambda ))\) with high probability for \(\lambda <1\).
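For concreteness, the choice of C can be quantified as follows (a routine computation, using \({\tilde{\ell }}_n = O_{\scriptscriptstyle \mathbb {P}}(n^{2\rho })\) and \({\tilde{\nu }}_n = \lambda (1+o_{\scriptscriptstyle \mathbb {P}}(1))\)): for \(K = C\log n\),

$$\begin{aligned} \frac{{\tilde{\ell }}_n({\tilde{\nu }}_n)^{K}}{1-{\tilde{\nu }}_n} = O_{\scriptscriptstyle \mathbb {P}}\big (n^{2\rho }\mathrm {e}^{-C\log n\,\log (1/\lambda )(1+o_{\scriptscriptstyle \mathbb {P}}(1))}\big ) = O_{\scriptscriptstyle \mathbb {P}}\big (n^{2\rho -C\log (1/\lambda )(1+o_{\scriptscriptstyle \mathbb {P}}(1))}\big )\xrightarrow {\mathbb {P}}0, \end{aligned}$$

whenever \(C>2\rho /\log (1/\lambda )\).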

To extend this to the case \(\lambda \ge 1\), we delete the R highest-degree vertices to obtain a new graph \({\mathscr {G}}_n^{\scriptscriptstyle R}\). By (4.17), \({\mathscr {G}}_n^{\scriptscriptstyle R}\) is a configuration model whose criticality parameter satisfies \(\nu _n^{\scriptscriptstyle R} <1\) with high probability for R sufficiently large. Thus, the above result applies to \({\mathscr {G}}_n^{\scriptscriptstyle R}\). Putting back the R deleted vertices can increase the diameter by at most a factor of R. This implies the \(\log n\) bound on the diameter of \({\mathscr {G}}_n(p_c(\lambda ))\) with high probability for \(\lambda \ge 1\). Finally, as remarked at the beginning of this section, the proof of Theorem 2 follows by invoking Proposition 3. \(\quad \square \)

5 Near-Critical Behavior

Finally, in this section, we consider the near-critical behavior of \(\mathrm {CM}_n(\varvec{d},p)\). The barely-subcritical and barely-supercritical regimes are analyzed separately in Sects. 5.1 and 5.2, respectively.

5.1 Barely-subcritical regime

In this section, we analyze the barely-subcritical regime (\(p_n\ll p_c\)) for percolation and complete the proof of Theorem 3. Let \({\mathscr {C}}(j,p_n)\) denote the connected component of \({\mathscr {G}}_n(p_n)\) containing vertex j. We use the same notation for the quantities defined in Sect. 4.2.1, but the reader should keep in mind that we now deal with different values of \(p_n\); we do not include \(p_n\) in the notation for the sake of simplicity. Consider exploring the graph using Algorithm 3, but starting from vertex j. The exploration process \(\mathbf {S}_n^j\) is given by

$$\begin{aligned} \begin{aligned} S_n^j(0) = {\tilde{d}}_j, \quad S_n^j(l)= {\tilde{d}}_j+\sum _{i: i\ne j} {\tilde{d}}_i {\mathscr {I}}_i^n(l)-2l. \end{aligned} \end{aligned}$$

Thus, the exploration process now starts from \({\tilde{d}}_j\). For any \(t>0\), as \(n\rightarrow \infty \),

$$\begin{aligned}\begin{aligned} \sup _{u\le t}(n^{\alpha }p_n)^{-1} \Big |\sum _{i:i\ne j}\mathscr {I}_i^n(un^{\alpha }p_n) -un^{\alpha }p_n\Big | \xrightarrow {\mathbb {P}}0. \end{aligned}\end{aligned}$$

This follows from arguments identical to those in the proof of Lemma 7, and is therefore omitted. Consider the re-scaled process \(\bar{\mathbf {S}}^j_n\) defined as \({\bar{S}}^j_n(t)= (n^{\alpha }p_n)^{-1}S_n^j(\left\lfloor tn^{\alpha }p_n \right\rfloor )\). Then,

$$\begin{aligned}\begin{aligned} {\bar{S}}_n^j(t)&= (n^{\alpha }p_n)^{-1}{\tilde{d}}_j+(n^{\alpha }p_n)^{-1} \sum _{i:i\ne j}{\tilde{d}}_i \mathscr {I}_i^n(tn^{\alpha }p_n) - 2 t +o_{\scriptscriptstyle \mathbb {P}}(1) \\&= \theta _j +(n^{\alpha }p_n)^{-1} \sum _{i:i\ne j}({\tilde{d}}_i-1) \mathscr {I}_i^n(tn^{\alpha }p_n) - t +o_{\scriptscriptstyle \mathbb {P}}(1). \end{aligned}\end{aligned}$$

Recall that \({\tilde{\mathbb {E}}}\) denotes the conditional expectation given \(({\tilde{d}}_i)_{i\in [n]}\). Now, since the vertices are explored in a size-biased manner with the sizes being \(({\tilde{d}}_i/{\tilde{\ell }}_n)_{i\in [n]}\), for any \(t\ge 0\),

$$\begin{aligned}\begin{aligned} {\tilde{\mathbb {E}}}\bigg [\frac{1}{n^{\alpha }p_n}\sum _{i:i\ne j} (\tilde{d_i}-1){\mathscr {I}}_i^n\big (\lfloor tn^{\alpha }p_n\rfloor \big )\bigg ] \le \frac{tn^{\alpha }p_n}{n^{\alpha }p_n {\tilde{\ell }}_n}\sum _{i\in [n]}{\tilde{d}}_i({\tilde{d}}_i-1) = o_{\scriptscriptstyle \mathbb {P}}(1), \end{aligned} \end{aligned}$$

where the first inequality uses (4.5), and the final step follows from Lemma 5. Consequently, \(\bar{\mathbf {S}}_n^j\) converges in probability to the deterministic process \((\theta _j-t)_{t\in [0,\theta _j]}\). Thus

$$\begin{aligned} \begin{aligned} (n^{\alpha }p_n)^{-1}\big (\# \text { edges in }{\mathscr {C}}(j,p_n)\big ) \xrightarrow {\mathbb {P}}\theta _j. \end{aligned}\end{aligned}$$
(5.1)

Next, the proof above shows that \(\max _{l\le \theta _j n^{\alpha }p_n} S_n^j(l) \le 2\theta _j n^{\alpha }p_n\) with high probability. Thus, the probability of creating a surplus edge at each step is at most \(2\theta _j n^{\alpha }p_n/{\tilde{\ell }}_n\). This implies that the probability of creating at least one surplus edge before time \(\theta _j n^{\alpha } p_n\) is at most \(2\theta _j^2n^{2\alpha }p_n^2/{\tilde{\ell }}_n = O_{\scriptscriptstyle \mathbb {P}}(n^{2\alpha -1}p_n)= o_{\scriptscriptstyle \mathbb {P}}(1)\). Together with (5.1), this yields

$$\begin{aligned} (n^{\alpha }p_n)^{-1}|{\mathscr {C}}(j,p_n)| \xrightarrow {\mathbb {P}}\theta _j, \quad \text {and} \quad \mathbb {P}(\mathrm {SP}({\mathscr {C}}(j,p_n)) =0) \rightarrow 1. \end{aligned}$$
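This behavior can be illustrated by a short simulation sketch (ours; all parameters and the planted hub are toy choices, taken so that the scale \(n^{\alpha }p_n\) is large): it runs the walk \(\mathbf {S}_n^j\) with vertices revealed in size-biased order, ignoring surplus edges as justified above.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy simulation of the walk S_n^j: the size-biased discovery order is
# realized by first appearances in a uniform shuffle of the half-edges.
n, alpha, p_n, theta_j = 10**6, 2/3, 0.003, 3.0
scale = n**alpha * p_n                               # the scale n^alpha p_n
d_tilde = rng.binomial(rng.zipf(2.5, size=n), p_n)   # toy percolated degrees
d_tilde[0] = int(theta_j * scale)                    # hub j = 0 with tilde d_j ~ theta_j n^alpha p_n

half = rng.permutation(np.repeat(np.arange(1, n), d_tilde[1:]))
_, first = np.unique(half, return_index=True)
order = half[np.sort(first)]                         # vertices in size-biased order

S = d_tilde[0] + np.cumsum(d_tilde[order] - 2)       # S_n^j after each discovery
n_edges = int(np.argmax(S <= 0)) + 1                 # ~ # edges in C(j, p_n)
print(n_edges / scale, theta_j)                      # close, in line with (5.1)
```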

From (5.1), we can also conclude that \(\lim _{n\rightarrow \infty }\mathbb {P}(i\in \mathscr {C}(j,p_n)) =0\) for all \(i,j\ge 1\) with \(i\ne j\), since, if \(i\in \mathscr {C}(j,p_n)\), then the number of edges in \({\mathscr {C}}(j,p_n)\) is at least \({\tilde{d}}_{i}+{\tilde{d}}_j = n^{\alpha }p_n(\theta _i+\theta _j)(1+o_{\scriptscriptstyle \mathbb {P}}(1))\). Thus, \({\mathscr {C}}(i,p_n)\) and \({\mathscr {C}}(j,p_n)\) are disjoint with high probability.

To conclude Theorem 3, we show that the rescaled vector of ordered component sizes is tight in \(\ell ^2_{{\scriptscriptstyle \downarrow }}\). This tightness also yields that, for each fixed \(j\ge 1\),

$$\begin{aligned} |\mathscr {C}(j,p_n)| = |{\mathscr {C}}_{\scriptscriptstyle (j)}(p_n)|, \text { with high probability.} \end{aligned}$$

To show \(\ell ^2_{{\scriptscriptstyle \downarrow }}\)-tightness, it is enough to show that, for any \(\varepsilon >0\),

$$\begin{aligned} \lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty }\mathbb {P}\bigg (\sum _{i>K}|{\mathscr {C}}_{\scriptscriptstyle (i)}(p_n)|^2>\varepsilon n^{2\alpha }p_n^2\bigg ) = 0. \end{aligned}$$

This can be concluded using arguments identical to those in the proof of Proposition 4 above. The proof of Theorem 3 is now complete.

5.2 Barely-supercritical regime

In this section, we provide the proof of Theorem 4. Let \(p_n = \lambda _n n^{-\eta }\), where \(\lambda _n\rightarrow \infty \) since \(p_n \gg p_c(\lambda )\). Our main tool here is a general result [38, Theorem 5.4], which provides the asymptotics of the component sizes once certain properties of an associated exploration process are verified. Using Proposition 3, it is enough to prove Theorem 4 for the graph \({\mathscr {G}}_n(p_n)\) generated by Algorithm 1. Let \(\tilde{\varvec{d}}\) denote the degree sequence obtained after performing Algorithm 1 (S1). Thus, \({\mathscr {G}}_n(p_n)\) is distributed as \(\mathrm {CM}_n(\tilde{\varvec{d}})\). We will verify Assumptions (B1)–(B8) from [38] on the graph \({\mathscr {G}}_n(p_n)\), which allows us to conclude Theorem 4 from [38, Theorem 5.4]. We start by describing the following exploration process on \({\mathscr {G}}_n(p_n)\) from [38, Sect. 5.1]:

Algorithm 4

(S0) Associate an independent \(\mathrm {Exponential}(1)\) clock \(\xi _e\) to each half-edge e. Any half-edge can be in one of three states: sleeping, active, or dead. Initially, at time 0, all the half-edges are sleeping. Whenever the set of active half-edges is empty, select a sleeping half-edge e uniformly at random among all sleeping half-edges and declare it to be active. If e is incident to v, then declare all the other half-edges of v to be active as well. The process stops when there is no sleeping half-edge left; the remaining sleeping vertices are all isolated and we have explored all other components.

(S1) Pick an active half-edge (which one does not matter) and kill it, i.e., change its status to dead.

(S2) Wait until the next half-edge dies (spontaneously). This half-edge is paired to the one killed in the previous step (S1) to form an edge of the graph. If the vertex it belongs to is sleeping, then declare this vertex awake and all of its other half-edges active. Repeat from (S1) if there is any active half-edge; otherwise, repeat from (S0).

Denote the number of living half-edges at time t by \(L_n(t)\). Let \({\tilde{V}}_{n,k}(t)\) denote the number of sleeping vertices of degree k such that all the k associated exponential clocks ring after time t. Define

$$\begin{aligned} {\tilde{V}}_n(t) = \sum _{k=1}^\infty {\tilde{V}}_{n,k}(t), \quad {\tilde{S}}_n(t) = \sum _{k=1}^\infty k{\tilde{V}}_{n,k}(t),\quad {\tilde{A}}_n(t) = L_n(t)-{\tilde{S}}_n(t). \end{aligned}$$
(5.2)
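As a quick sanity check on these definitions (ours, with toy data; not part of the verification of (B1)–(B8)), one can simulate the clocks directly: a degree-k vertex has all of its k clocks ring after time t with probability \(\mathrm {e}^{-kt}\), which is precisely the identity behind the computations in (5.10) below.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy simulation of the Exp(1) clocks of Algorithm 4.  A degree-k vertex
# keeps all k clocks silent up to time t with probability e^{-kt}, so
# E[V_n(t)] = sum_i e^{-d_i t} and E[S_n(t)] = sum_i d_i e^{-d_i t}.
d = rng.zipf(2.5, size=5000)                         # hypothetical degrees
clocks = [rng.exponential(1.0, size=int(k)) for k in d]

t = 0.1
asleep = np.array([c.min() > t for c in clocks])
V_t, S_t = asleep.sum(), d[asleep].sum()
print(V_t, np.exp(-t * d).sum())          # V_n(t) vs its conditional mean
print(S_t, (d * np.exp(-t * d)).sum())    # S_n(t) vs its conditional mean
```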

We show that Assumptions (B1)–(B8) from [38] hold with

$$\begin{aligned}\begin{aligned} \zeta = \kappa ^{\frac{1}{3-\tau }}, \quad \gamma _n = \beta _n = p_n^{\frac{\tau -2}{3-\tau }}, \quad \psi (t) = \kappa t^{\tau -2} -t, \quad {\hat{g}}(t) = t, \quad {\hat{h}}(t) = \kappa t^{\tau -2}+t, \end{aligned}\end{aligned}$$

where we recall the definition of \(\kappa \) from (2.10). The \(\zeta \) in our notation corresponds to \(\tau \) in the notation of [38, Theorem 5.4]. We have used \(\zeta \) instead of \(\tau \), since in our paper \(\tau \) denotes the power-law exponent.

We first determine the number of non-isolated vertices in \(\mathscr {G}_n(p_n)\). Let \({\tilde{n}}:= \#\{i: {\tilde{d}}_i \ge 1 \}\). Recall that \(V_n\) is a vertex chosen uniformly at random from [n] and let \(D_n = d_{V_n}\) be the degree of \(V_n\) in \(\mathrm {CM}_n({\varvec{d}})\). Note that

$$\begin{aligned} \begin{aligned} \mathbb {E}[{\tilde{n}}]&= \mathbb {E}\bigg [\sum _{i\in [n]} \mathbb {1}_{\left\{ {\tilde{d}}_i \ge 1\right\} } \bigg ] = \sum _{i\in [n]} \big (1- (1-p_n)^{d_i}\big ) = n \mathbb {E}[1-(1-p_n)^{D_n}]. \end{aligned} \end{aligned}$$
(5.3)

Using that \(1-(1-x)^k \le kx\) for any \(k\ge 1\) and \(x\in (0,1)\), we have \(\mathbb {E}[{\tilde{n}}] \le n p_n \mathbb {E}[D_n]\). Also, using \(1-(1-x)^k \ge kx-k^2x^2/2\) for any \(kx< 1\), \(k\ge 1\) and \(x\in (0,1)\),

$$\begin{aligned}\begin{aligned} \mathbb {E}[1-(1-p_n)^{D_n}]&\ge \mathbb {E}[1-(1-p_n)^{D_n} \mathbb {1}_{\left\{ p_nD_n<1\right\} }] \\&\ge p_n \mathbb {E}[D_n \mathbb {1}_{\left\{ p_nD_n<1\right\} }] - \frac{p_n^2}{2} \mathbb {E}[D_n^2 \mathbb {1}_{\left\{ p_nD_n<1 \right\} }]\\&= p_n\mathbb {E}[D_n] -p_n \mathbb {E}[D_n \mathbb {1}_{\left\{ p_nD_n \ge 1\right\} }] - \frac{p_n^2}{2} \mathbb {E}[D_n^2 \mathbb {1}_{\left\{ p_nD_n <1\right\} }]. \end{aligned} \end{aligned}$$

By Assumption 2 (ii), \((D_n)_{n\ge 1}\) is uniformly integrable, and thus \(\mathbb {E}[D_n \mathbb {1}_{\left\{ p_nD_n \ge 1\right\} }] = o(1)\), where we have used that \(p_n \ll 1\). For the third term, the uniform integrability of \((D_n)_{n\ge 1}\) also implies tightness, so that \(p_nD_n \xrightarrow {\scriptscriptstyle \mathbb {P}} 0\). Using the uniform integrability of \((D_n)_{n\ge 1}\) again, together with \(p_nD_n \mathbb {1}_{\left\{ p_nD_n<1\right\} } \le 1\) and \(p_nD_n \xrightarrow {\scriptscriptstyle \mathbb {P}} 0\), we conclude that \(\mathbb {E}[D_n\times (p_nD_n \mathbb {1}_{\left\{ p_nD_n<1\right\} })] \rightarrow 0\). From (5.3) and Assumption 2 (ii), we now conclude that

$$\begin{aligned} \begin{aligned} \mathbb {E}[{\tilde{n}}] = np_n(\mu +o(1)). \end{aligned}\end{aligned}$$
(5.4)

Further, using standard concentration inequalities for sums of independent Bernoulli random variables [42, (2.9), Theorem 2.8], it follows that

$$\begin{aligned} \begin{aligned} \mathbb {P}(|{\tilde{n}} - \mathbb {E}[{\tilde{n}}] |> \log n \sqrt{np_n} )\le 2\mathrm {e}^{-C(\log n)^2}, \end{aligned}\end{aligned}$$
(5.5)

for some constant \(C>0\). In what follows, we will often use (5.4) and (5.5) to replace \({\tilde{n}}\) by \(np_n\mu \).
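A quick numerical illustration of (5.4) (ours; the degree sequence and the value of \(p_n\) are toy choices, not the exact regime of Theorem 4):

```python
import numpy as np

rng = np.random.default_rng(3)

# The number of non-isolated vertices after percolation concentrates
# around n * p_n * mu, cf. (5.4)-(5.5).
n, p_n = 200_000, 0.01
d = rng.zipf(2.5, size=n)                  # hypothetical degrees, mu ~ E[D_n]
d_tilde = rng.binomial(d, p_n)             # percolated degrees, Bin(d_i, p_n)
n_tilde = int((d_tilde >= 1).sum())
print(n_tilde, n * p_n * d.mean())         # close, up to the 2nd-order terms in (5.3)
```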

Conditions (B1)–(B4) of [38] are straightforward to verify. (B8) follows using \(\max _{i\in [n]} {\tilde{d}}_i = O_{\scriptscriptstyle \mathbb {P}}(n^{\alpha }p_n) = o_{\scriptscriptstyle \mathbb {P}}({\tilde{n}}\gamma _n)\). To verify Conditions (B5)–(B7), we first obtain below the asymptotics of the mean curves and then show that the processes \(\tilde{\varvec{S}}_n\), \(\tilde{\varvec{V}}_n\), \(\tilde{\varvec{A}}_n\) remain uniformly close to their expected curves. These are summarized in the following two propositions:

Proposition 7

For any fixed \(u> 0\), as \(n\rightarrow \infty \),

$$\begin{aligned} \sup _{t\le u}\bigg |\frac{1}{np_n\mu \beta _n}\big (\mathbb {E}[{\tilde{S}}_n(0)]-\mathbb {E}[{\tilde{S}}_n(\beta _n t)]\big ) - {\hat{h}}(t)\bigg |\rightarrow 0, \end{aligned}$$
(5.6)
$$\begin{aligned} \sup _{t\le u}\bigg |\frac{1}{np_n\mu \beta _n}\big (\mathbb {E}[{\tilde{V}}_n(0)]-\mathbb {E}[{\tilde{V}}_n(\beta _n t)]\big ) - {\hat{g}}(t)\bigg |\rightarrow 0, \nonumber \\ \sup _{t\le u}\bigg |\frac{1}{np_n\mu \gamma _n}\mathbb {E}[{\tilde{A}}_n(\beta _n t)] - \psi (t)\bigg | \rightarrow 0. \end{aligned}$$
(5.7)

Proposition 8

For any fixed \(u> 0\), as \(n\rightarrow \infty \), all the terms \(\sup _{t\le u}|{\tilde{S}}_n(\beta _n t)-\mathbb {E}[{\tilde{S}}_n(\beta _n t)] |\), \(\sup _{t\le u}|{\tilde{V}}_n(\beta _n t)-\mathbb {E}[{\tilde{V}}_n(\beta _n t)]|\), and \(\sup _{t\le u}|{\tilde{A}}_n(\beta _n t) - \mathbb {E}[{\tilde{A}}_n(\beta _n t)]| \) are \( o_{\scriptscriptstyle \mathbb {P}}(n p_n\beta _n) \) (and thus \(o_{\scriptscriptstyle \mathbb {P}}(n p_n\gamma _n)\)).

To prove Propositions 7 and 8, we make crucial use of the following lemma:

Lemma 11

For any \(t>0\), as \(n\rightarrow \infty \),

$$\begin{aligned} \mathbb {E}\bigg [\sum _{i\in [n]} \tilde{d_i} \mathrm {e}^{-t\beta _n{\tilde{d}}_i}\bigg ]&= (1+o(1)) p_n \mathrm {e}^{-t\beta _n} \sum _{i\in [n]} d_i \mathrm {e}^{-t\beta _np_n d_i}, \nonumber \\ \mathbb {E}\bigg [\sum _{i\in [n]} \mathrm {e}^{-t\beta _n{\tilde{d}}_i} \mathbb {1}_{\left\{ {\tilde{d}}_i\ge 1\right\} }\bigg ]&= (1+o(1)) \sum _{i\in [n]} \big ( \mathrm {e}^{-t\beta _np_n d_i} - (1-p_n)^{d_i} \big ). \end{aligned}$$
(5.8)

Proof

Note that if \(X\sim \mathrm {Bin}(m,p)\), then

$$\begin{aligned}\begin{aligned} \mathbb {E}\big [X\mathrm {e}^{-sX}\big ] = mp\mathrm {e}^{-s}(1-p+p\mathrm {e}^{-s})^{m-1}. \end{aligned}\end{aligned}$$

Putting \(m=d_i\), \(p=p_n\), and \(s = t\beta _n\), it follows that

$$\begin{aligned} \begin{aligned} \mathbb {E}\big [{\tilde{d}}_i \mathrm {e}^{-t\beta _n{\tilde{d}}_i}\big ]&= d_ip_n \mathrm {e}^{-t\beta _n} \Big (1-p_n\big (1-\mathrm {e}^{-t\beta _n}\big )\Big )^{d_i -1} = (1+o(1)) d_ip_n \mathrm {e}^{-t\beta _n} (1-p_nt\beta _n)^{d_i}\\&= (1+o(1)) d_ip_n \mathrm {e}^{-t\beta _n} \mathrm {e}^{-t\beta _np_n d_i}. \end{aligned}\nonumber \\ \end{aligned}$$
(5.9)

To prove the second identity in (5.8), note that \(\mathbb {E}[\mathrm {e}^{-sX} \mathbb {1}_{\left\{ X\ge 1\right\} }] = \mathbb {E}[\mathrm {e}^{-sX}] - \mathbb {P}(X=0)\); the claim then follows similarly. \(\quad \square \)
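A quick Monte Carlo check of the binomial identity used above (ours; the values of m, p, s are arbitrary test choices):

```python
import numpy as np

rng = np.random.default_rng(4)

# Check E[X e^{-sX}] = m p e^{-s} (1 - p + p e^{-s})^{m-1} for X ~ Bin(m, p),
# the identity behind (5.9).
m, p, s = 50, 0.1, 0.3
X = rng.binomial(m, p, size=10**6)
print(np.mean(X * np.exp(-s * X)),
      m * p * np.exp(-s) * (1 - p + p * np.exp(-s)) ** (m - 1))
```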

Proof of Proposition 7

Note that, by Lemma 11,

$$\begin{aligned} \begin{aligned} \mathbb {E}\big [{\tilde{S}}_n( \beta _n t)\big ]&= \mathbb {E}\bigg [\sum _{i\in [n]} \tilde{d}_i\mathrm {e}^{-t\beta _n \tilde{d}_i} \bigg ]= (1+o(1))\ell _n p_n \mathrm {e}^{-t\beta _n} \mathbb {E}\big [\mathrm {e}^{-t\beta _n p_n D_n^{\star }}\big ], \\ \mathbb {E}\big [{\tilde{V}}_n( \beta _n t)\big ]&= \mathbb {E}\bigg [\sum _{i\in [n]} \mathrm {e}^{-t\beta _n \tilde{d}_i}\mathbb {1}_{\left\{ {\tilde{d}}_i\ge 1\right\} } \bigg ] = (1+o(1))n\big ( \mathbb {E}\big [\mathrm {e}^{-t\beta _n p_n D_n} - (1-p_n)^{D_n} \big ]\big ), \end{aligned}\nonumber \\ \end{aligned}$$
(5.10)

where \(D_n^{\star }\) has a size-biased distribution with the sizes being \((d_i/\ell _n)_{i\in [n]}\), and \(D_n\) is the degree of a vertex chosen uniformly at random from [n]. By the convergence of \(\mathbb {E}[D_n]\) in Assumption 2 (ii),

$$\begin{aligned}\begin{aligned} \mathbb {E}[{\tilde{V}}_n(0)]-\mathbb {E}[{\tilde{V}}_n(\beta _n t)] =(1+o(1))n\mathbb {E}\big [1 - \mathrm {e}^{-t\beta _n p_n D_n}\big ] = (1+o(1)) tn\beta _np_n \mu , \end{aligned}\end{aligned}$$

where the asymptotics of \(n\mathbb {E}\big [1 - \mathrm {e}^{-t\beta _n p_n D_n}\big ]\) follows using arguments identical to those in (5.3). Further, by using (2.10),

$$\begin{aligned} \begin{aligned} \mathbb {E}[{\tilde{S}}_n(0)]-\mathbb {E}[{\tilde{S}}_n(\beta _n t)]&= (1+o(1))\ell _np_n\mathbb {E}\big [1- \mathrm {e}^{-t\beta _n}\mathrm {e}^{-t\beta _n p_n D_n^{\star }}\big ] \\&= (1+o(1))\ell _np_n\mathbb {E}[1- (1-t\beta _n+o(\beta _n))\mathrm {e}^{-t\beta _n p_n D_n^{\star }}]\\&= (1+o(1))\ell _np_n\big (\mathbb {E}[1- \mathrm {e}^{-t\beta _n p_n D_n^{\star }}]+t\beta _n +o(\beta _n) \big )\\&= (1+o(1))n\mu p_n \beta _n (\kappa t^{\tau -2}+t+o(1)). \end{aligned}\nonumber \\ \end{aligned}$$
(5.11)

Thus, (5.6) and (5.7) follow. Moreover, \(L_n(t)\) is a pure death process with \(L_n(0) = \sum _{i\in [n]} \tilde{d}_i\), in which jumps occur at rate \(L_n(t)\) and each jump decreases \(L_n(t)\) by 2. Therefore, \(\mathbb {E}[L_n(t)] = \mathbb {E}[L_n(0)]\mathrm {e}^{-2t}\) and consequently, by (5.2) and (5.9),

$$\begin{aligned}\begin{aligned} \mathbb {E}[{\tilde{A}}_n(\beta _n t)]&= \ell _np_n\big (\mathrm {e}^{-2\beta _nt} - \mathrm {e}^{-\beta _n t}\mathbb {E}\big [\mathrm {e}^{-tp_n\beta _nD_n^{\star }}\big ]\big )+o(n\beta _np_n)\\&= n\mu p_n \gamma _n(\kappa t^{\tau -2}-t)+o(n\beta _np_n). \end{aligned}\end{aligned}$$

Thus the proof follows. \(\quad \square \)

Proof of Proposition 8

Let us consider \({\tilde{S}}_n\) only; the other inequalities follow using identical arguments. We will show that

$$\begin{aligned} \begin{aligned} \mathbb {E}\Big [\sup _{t\le u\beta _n} |{\tilde{S}}_n(t) - \mathbb {E}[{\tilde{S}}_n(t)]|^2 \Big ] = o((np_n\beta _n)^2), \end{aligned}\end{aligned}$$
(5.12)

after which an application of Markov’s inequality completes the proof. To prove (5.12), we use [38, Lemma 5.15], which states that

$$\begin{aligned} \begin{aligned} \mathbb {E}\Big [\sup _{t\le u\beta _n} |{\tilde{S}}_n(t) - \mathbb {E}[{\tilde{S}}_n(t)]| ^2 \Big ] \le C \mathbb {E}\bigg [ \sum _{i\in [n]} \tilde{d}_i^2 \min \{{\tilde{d}}_iu\beta _n,1\}\bigg ]. \end{aligned}\end{aligned}$$
(5.13)

Although [38, Lemma 5.15] was stated under Assumptions (A1)–(A4) of that paper, its proof does not use these assumptions; it only uses [38, Lemma 4.2]. Indeed, the deductions in (5.62)–(5.65) of [38] do not require any assumption on the degrees. We therefore do not repeat the proof of (5.13) here. Using the fact that \(1-\mathrm {e}^{-x} \ge (1\wedge x)/3\) in (5.13), it follows that

$$\begin{aligned}\begin{aligned} \mathbb {E}\Big [\sup _{t\le u\beta _n} |{\tilde{S}}_n(t) - \mathbb {E}[{\tilde{S}}_n(t)]| ^2 \Big ] \le C \mathbb {E}\bigg [ \sum _{i\in [n]} \tilde{d}_i^2 \big (1-\mathrm {e}^{-u\beta _n \tilde{d}_i} \big )\bigg ]. \end{aligned}\end{aligned}$$

Now, using standard concentration inequalities for tails of binomial distributions [42, Theorem 2.1], for any \(i\in [n]\),

$$\begin{aligned}\begin{aligned} \mathbb {P}(\tilde{d}_i > 2d_1 p_n ) \le C \mathrm {e}^{-C d_1 p_n} = C\mathrm {e}^{-C n^{\rho } \lambda _n}, \end{aligned}\end{aligned}$$

where \(\lambda _n = p_nn^{\eta } \rightarrow \infty \). Therefore, \(\max _{i\in [n]}\tilde{d}_i \le 2d_1 p_n\) with high probability. Thus,

$$\begin{aligned} \begin{aligned} \frac{1}{(\ell _np_n\beta _n)^2}\mathbb {E}\Big [\sup _{t\le u\beta _n} |{\tilde{S}}_n(t) - \mathbb {E}[{\tilde{S}}_n(t)]|^2 \Big ]&\le \frac{C 2d_1p_n }{(\ell _np_n\beta _n)^2} \mathbb {E}\bigg [ \sum _{i\in [n]} \tilde{d}_i \big (1-\mathrm {e}^{-u\beta _n \tilde{d}_i} \big )\bigg ]\\&\le \frac{C 2d_1p_n \ell _np_n}{(\ell _np_n\beta _n)^2} \mathbb {E}\big [ 1- \mathrm {e}^{-u\beta _np_nD_n^\star }\big ], \end{aligned}\nonumber \\ \end{aligned}$$
(5.14)

where the last step follows using (5.10). The final term in (5.14) can be shown to be \(O(\beta _n)\) by computations identical to those in (5.11). Thus,

$$\begin{aligned}\begin{aligned} \frac{1}{(\ell _np_n\beta _n)^2}\mathbb {E}\Big [\sup _{t\le u\beta _n} |{\tilde{S}}_n(t) - \mathbb {E}[{\tilde{S}}_n(t)]|^2 \Big ] \le \frac{C 2d_1p_n \ell _np_n \beta _n}{(\ell _np_n\beta _n)^2} = O\big (d_1/(n\beta _n)\big ) = O\big (\lambda _n^{-\frac{\tau -2}{3-\tau }}\big ) = o(1), \end{aligned}\end{aligned}$$

since \(\lambda _n\rightarrow \infty \), as \(n\rightarrow \infty \). Thus the proof follows. \(\quad \square \)

Proof of Theorem 4

The proof follows by applying [38, Theorem 5.4]. Propositions 7 and 8 verify Conditions (B5)–(B7) in [38], and the remaining conditions are straightforward to verify. \(\quad \square \)