The Bethe Hessian and Information Theoretic Approaches for Online Change-Point Detection in Network Data

Hwang, Neil; Xu, Jiarui; Chatterjee, Shirshendu; Bhattacharyya, Sharmodeep

doi:10.1007/s13171-021-00248-1

The Bethe Hessian and Information Theoretic Approaches for Online Change-Point Detection in Network Data

Published: 01 July 2021

Volume 84, pages 283–320, (2022)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Sankhya A Aims and scope Submit manuscript

The Bethe Hessian and Information Theoretic Approaches for Online Change-Point Detection in Network Data

Download PDF

Neil Hwang¹,
Jiarui Xu²,
Shirshendu Chatterjee³ &
…
Sharmodeep Bhattacharyya ORCID: orcid.org/0000-0001-5011-4119²

693 Accesses
Explore all metrics

Abstract

Sequences of networks are currently a common form of network data sets. Identification of structural change-points in a network data sequence is a natural problem. The problem of change-point detection can be classified into two main types - offline change-point detection and online or sequential change-point detection. In this paper, we propose three different algorithms for online change-point detection based on certain cusum statistics for network data with community structures. For two of the proposed algorithms, we use information theoretic measures to construct the statistic for the estimation of a change-point. In the third algorithm, we use eigenvalues of the Bethe Hessian matrix to construct the statistic for the estimation of a change-point. We show the consistency property of the estimated change-point theoretically under networks generated from the multi-layer stochastic block model and the multi-layer degree-corrected block model. We also conduct an extensive simulation study to demonstrate the key properties of the algorithms as well as their efficacy.

Community-based time segmentation from network snapshots

Article Open access 28 May 2019

Change Point Detection in a Dynamic Stochastic Blockmodel

Multiple change points detection and clustering in dynamic networks

Article 26 September 2017

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Network data sets have grown vastly in size and complexity in recent decades with the rapid advances in data generation and collection technologies. Although statistical analysis of network data initially focused on single networks, the study of multiple network data sets has gained a larger audience with the advent of multilayer and temporal networks. A special form of multiple networks involves network sequences, both dependent and independent. Network sequence datasets have emerged in several fields of study, including time-series of social networks (Panisson et al. 2013; Stopczynski et al. 2014; Rocha et al. 2010; Van de Bunt et al. 1999; Mislove, 2009; Hogg and Lerman, 2012), epidemiological networks (Salathé et al. 2010; Rocha et al. 2011; Masuda and Holme, 2017), animal networks (Gates and Woolhouse, 2015; Lahiri and Berger-Wolf, 2007; Chen et al. 2015), mobile and online communication networks (Krings et al. 2012; Ferraz Costa et al. 2015; Jacobs et al. 2015; Viswanath et al. 2009; Omodei et al. 2015), economic networks (Popović et al. 2014; Zhang et al. 2014; Zhao et al. 2018), brain networks (Park and Friston, 2013; Sporns, 2013; Thompson et al., 2017), genetic networks (Rigbolt et al. 2011) and ecological networks (Blonder et al. 2012), to name a few. Analysis of network sequences in terms of modeling, summary statistical analysis, analysis of dynamics, community detection, and change-point detection has been investigated in several recent works (see Holme and Saramäki (2012), Holme (2015), Peixoto (2015), Sikdar et al. (2016), and Peixoto and Gauvin (2018) for some review of recent works). In this paper, we concentrate on the problem of change-point detection for network sequences.

The study of change-point detection has a long history in the statistics literature, starting from the early days of quality control (Page, 1954; 1957; Girshick and Rubin, 1952) to recent genomic studies (Siegmund, 2013). Applications of statistical methods for change-point detection are widespread. The disciplines where statistical analysis has been used for change-point detection include medical diagnostics (Yang et al. 2006; Staudacher et al. 2005; Bosc et al. 2003; Cribben et al. 2012), gene expression (Picard et al. 2005; Hocking et al. 2013; Bleakley and Vert, 2011), online activity (Lévy-Leduc et al. 2009), speech and image analysis (Harchaoui et al. 2009; Radke et al. 2005; Kasetkasem and Varshney, 2002; Celik, 2009; 2010), climate science (Reeves et al. 2007), economics (Bai and Perron, 1998) and finance (Lavielle and Teyssiere, 2007; Matteson and James, 2014). The study of the change-point detection problem started with Gaussian models with changes in the mean parameter (Page, 1954), but since then the models studied for structural change-point detection has varied widely, ranging from parametric multivariate data models (Chen and Gupta, 2011) and non-parametric models (Brodsky and Darkhovsky, 2013) to models for dependent and time-series data (Cho and Fryzlewicz, 2015; Aminikhanghahi and Cook, 2017).

The change-point detection problem can be broadly classified into two types.

1.
Offline change-point detection: The whole data sequence is available and the change-points are detected within the data sequence. This problem was studied in the beginning by Page (1954) and Girshick and Rubin (1952).
2.
Online or sequential change-point detection: The data is available sequentially and change-points are detected based on the available data. This problem in classical setting was initially studied by Kolmogorov (1950), Shiryaev (1963), Lorden et al. (1971) and others.

There is deep literature on both types of change-point detection problems and possible methods and theories. An excellent account can be found in the book (Brodsky and Darkhovsky, 2013). In this paper, we concentrate on the problem of online change-point detection for network sequence data.

The problem of change-point detection in sequences of network data has recently received some interest with the increase in the availability of multiple network data sets. However, most efforts have concentrated on the detection of offline change-points. For instance, Lévy-Leduc et al. (2009) was an early work on offline change-point detection in networks using hypothesis testing, Peel and Clauset (2015) used a hierarchical random graph model and a Bayesian procedure to detect change-points, Park et al. (2013) used local graph statistics for change-point and anomaly detection in dynamic networks, and Roy et al. (2017) used a Markov random field model for generating networks and estimated the change-point using a penalized pseudo-likelihood. Another point to note is that one approach to determining change-points is by comparing networks, so hypothesis tests for network comparisons (e.g. Bickel and Sarkar (2016), Wang et al. (2017), Cape et al. (2017), Gao and Lafferty (2017), and Jin et al. (2018)) can also be used for change-point detection in network data with some modification. For a survey of techniques used in the related problem of anomaly detection in graphs, see Ranshous et al. (2015). Some recent works (Wang et al. 2017; Bhamidi et al. 2018; Bao and Michailidis, 2018; Wang et al. 2018; Bhattacharjee et al. 2018; Wills and Meyer, 2019; Padilla et al. 2019; Zhao et al. 2019) propose methods for offline change-point detection in networks generated from block models and graphon models with some theoretical results on the consistency of the detection methods. Development and analysis of online change-point detection methods for network data are relatively rare with some possible exceptions such as (Chen et al. 2019).

In this paper, we focus on the problem of online change-point detection in network data with community structures. We consider the special case where the change-point occurs due to the change in the community structure of the networks. At the same time, the connection probabilities and the degree parameters remain the same. The main contributions of our work are as follows.

(a)
We propose two types of algorithms for online change-point detection based on network data. For the first type, we consider a statistic that captures the variation of information between estimated community structures (denoted $\hat {\mathbf {Z}}$) to construct two algorithms for online change-point detection. The two algorithms are based on two different cumulative sum measures (cusum) of the network adjacency matrices to estimate community structures. For the second type, we develop a statistic based on the eigenvalues of window-sums of the Bethe Hessian matrices obtained from the input networks. We construct an algorithm that uses this statistic to detect a change in estimated numbers of communities (denoted $\hat {K}$).
(b)
We provide the consistency results for the change-point estimators for all three algorithms under multilayer versions of the stochastic block model (MSBM). For the $\hat {\mathbf {Z}}$-based algorithms, we also provide the consistency results for the multilayer degree-corrected block models. For the $\hat {K}$-based algorithm, we prove the theoretical results only for the MSBM.
(c)
We provide extensive simulation results to demonstrate the three algorithms’ efficacy for detecting change-points in an online setup.

The paper is structured as follows. In Section 2, we introduce the data generative model and support algorithms for recovering community structure that will be used in our $\hat {\mathbf {Z}}$-based algorithms for online change-point detection. In Section 3, we propose two $\hat {\mathbf {Z}}$-based and one $\hat K$-based change-point detection algorithms for multilayer networks. In Section 4, we provide theoretical results for the estimators of change points. In Section 5, we present simulation studies to demonstrate the performance of the proposed algorithms and discuss their results. We provide full proofs for all of the theoretical results discussed herein and relevant preliminary material in the Appendices A and B.

2 Background and Preliminaries

2.1 Network Sequence Data

We consider that the data is given in the form of a sequence of N × N adjacency matrices (A⁽¹⁾, A⁽²⁾,…, A^(t),…) corresponding to the sequence of networks $\left ({G}_{N}^{(1)}, {G}_{N}^{(2)}, \ldots , {G}_{N}^{(t)}, \ldots \right )$ having the same set of N vertices V_N = {v₁,…, v_N} but with varying edge sets. $G_{N}^{(t)}$ is referred as the t-th network layer. Since we are considering the online change-point detection problem, we assume that the adjacency matrices are available to us sequentially. At any given point, T, of the sequence, the available set of adjacency matrices are (A⁽¹⁾, A⁽²⁾,…, A^(T)).

For the purpose of this paper, we only consider undirected and unweighted graphs, that is, $A_{i, j}^{(t)} \in \{0, 1\}$ for all i, j ∈{1,…, N} and and A^(t) = (A^(t))^T. However, the conclusions of the paper can be extended to positively weighted graphs with non-random weights in a quite straightforward way by considering weighted adjacency matrices.

We consider that each network $G_{N}^{(t)}$ has an assortative community structure with communities. Let us denote the N × K^(t) matrix Z^(t) to be the actual common community membership matrix of the nodes in each of the graphs ${G}_{N}^{(t)}$, where $\mathbf {Z}^{(t)}_{i, k} = 1$ if the i-th node belongs to the k-th community for all $G_{N}^{(t)}$ and zero otherwise.

2.2 Notations

Let [n] := {1,2,…, n} for and ${\mathscr{M}}_{m, n}$ be the set of all m × n matrices which have exactly one 1 and (n − 1) 0s in each row. 1_m (resp. 0_m) denotes the vector in consisting of all 1s (resp. 0s). I_N is the N × N identity matrix. λ₊(A) denotes the minimum positive eigenvalue, $\lambda _{\ell }^{\downarrow }(\mathbf {A})$ the ℓ − th largest eigenvalue, and ${\lambda }_{\ell }^{\uparrow }(\mathbf {A})$ the ℓ − th smallest eigenvalue of the matrix A. Tr(A) denotes the trace of the matrix A. ∥A∥_{p, q} denotes the L_{p, q} norm of the matrix A. |[m]| denotes the cardinality of the set [m]. For a given matrix M, let M_i∗ denote the i-th row of M and M_∗j its j-th column. Z_[m] denotes the sub-matrix of (Z)_n×K such that $(\mathbf {Z}_{[m]})_{i*}=(\mathbf {Z})_{[m]_{i}*}$, where i ∈ [m],[m] ⊂ [n],[m]_i is the index of the node i in [n]. $\mathbf {Z}_{\bar {[m]}}$ denotes the sub-matrix of (Z)_n×K such that $(\mathbf {Z}_{\bar {[m]}})_{j*}=(\mathbf {Z})_{\bar {[m]}_{j}*}$, where $j\in \bar {[m]},\bar {[m]}=[n]/[m]$, $\bar {[m]}_{j}$ is the index of the node j in [n]. P_ω denotes the matrix form of a label permutation $\omega :\{1,...,K\}\rightarrow \{1,...,K\}$ on a clustering. For , 〈A〉 denotes the matrix A with its diagonal zeroed out: 〈A〉_{i, j} = A_{i, j} if i≠j, i, j ∈ [n] and 〈A〉_{i, i} = 0 for i ∈ [n]. For any subset S of layers, let $\mathbf {A}^{S}:={\sum }_{s\in S}\mathbf {A}^{(s)}$ denote the sum of the corresponding adjacency matrices. For some constant C, if for all $n\geqslant C$, denote

$$ \begin{array}{@{}rcl@{}} f(n)\in& \left\{ \begin{array}{ll} O(g(n)) & \text{if } \lim\sup_{n\rightarrow \infty}\frac{f(n)}{g(n)}<\infty \\ o(g(n)) & \text{if } \lim\sup_{n\rightarrow \infty}\frac{f(n)}{g(n)} = 0 \\ {{\varOmega}}(g(n)) & \text{if } \lim\inf_{n\rightarrow \infty}\frac{f(n)}{g(n)}>0 \\ \omega(g(n)) & \text{if } \lim\inf_{n\rightarrow \infty}\frac{f(n)}{g(n)}= \infty \\ {{\varTheta}}(g(n)) &\text{if } \lim\sup_{n\rightarrow \infty}\frac{f(n)}{g(n)}=c \in (0,\infty). \end{array} \right. \end{array} $$

2.3 Data Generative Models

To set the context to investigate the theoretical properties of the change-point estimators, we consider a flexible data generative model that we call the multilayer degree-corrected block model with one change-point (MDBM). MDBM has six sets of parameters: (i) the change-point τ ∈ (1, T]; (ii) the number of communities K^(t), such that K^(t) = K for t ∈ [1, T] (i.e., the number of communities stays the same before and after τ); (iii) the N × 1 membership vectors z = (z₁,…, z_N) for layers t ∈ [1, τ) and $\tilde {\boldsymbol {z}}=(\tilde z_{1}, \ldots , \tilde z_{N})$ for layers t ∈ [τ, T], where each $z_{i}, \tilde z_{i} \in \{1, \ldots , K\}$; (iv) the K × K connectivity probability matrices $\mathbf {B}:=\left (\mathbf {B}^{(t)}: t \in [1,\tau )\right )$ and $\tilde {\mathbf {B}}:=\left (\tilde {\mathbf {B}}^{(t)}: t \in [\tau , T]\right )$; (v) the N × 1 degree parameters vector ψ = (ψ₁,…, ψ_N); and (vi) the K × 1 vector of probabilities of allocation for each community, π = (π₁,…, π_K) for layers t ∈ [1, τ) and $\tilde {\boldsymbol {\pi }} = (\tilde \pi _{1}, \ldots , \tilde \pi _{\tilde K})$ for layers t ∈ [τ, T]. For i > j, i, j ∈ [N], and t ∈ [1, τ) MDBM with the six sets of parameters is given by

$$ \begin{array}{@{}rcl@{}} && z_{1}, \ldots, z_{N}^{\underset{\sim}{iid}} \text{Mult}(1;(\pi_{1},\ldots,\pi_{K})), \end{array} $$

(2.1)

(2.2)

and for i > j, i, j ∈ [N], and t ∈ [τ, T]

$$ \begin{array}{@{}rcl@{}} \tilde z_{1}, \ldots, \tilde z_{N}^{\underset{\sim}{iid}} \text{Mult}(1;(\tilde \pi_{1},\ldots,\tilde \pi_{K})), \end{array} $$

(2.3)

(2.4)

The inclusion of ψ entails the obvious issue of identifiability. In order to avoid this issue we assume as in Lei et al. (2015) that

$$ \begin{array}{@{}rcl@{}} \underset{i: z_{i} = k}{\max} \psi_{i}=1 \text{ for all } k \in [K] . \end{array} $$

(2.5)

Suppose $\mathbf {Z} \in {\mathscr{M}}_{N,K}$ and $\tilde {\mathbf {Z}} \in {\mathscr{M}}_{N, K}$ denote the actual membership matrices before and after the change-point, respectively. Z and $\tilde {\mathbf {Z}}$ are unknown and we wish to estimate them along with τ. If for i ∈ [N] the corresponding community index is z_i ∈ [K] (or, $\tilde {\mathbf {z}}_{i} \in [ K]$), then clearly

$$ \mathbf{Z}_{ij} = \mathbf 1_{\{\mathbf{z}_{i}=j\}} \ \ (\text{or, } \tilde{\mathbf{Z}}_{ij} = \mathbf 1_{\{\tilde{\mathbf{z}}_{i}=j\}}). $$

In MDBM, each edge is drawn independently given the edge probability matrices $\mathbf {P}^{(t)}:=\left ({P}_{ij}^{(t)}\right )_{i,j\in [N]}$ (or, $\tilde {\mathbf {P}}^{(t)}:=\left (\tilde P^{(t)}_{ij}\right )_{i,j\in [N]}$). So, for i > j, i, j ∈ [N], and t ∈ [1, τ)

$$ {A}_{i,j}^{(t)} \sim \text{Bernoulli}\left( {P}_{i,j}^{(t)}\right), \text{ where } \mathbf{P}^{(t)} := \mathfrak{D}(\boldsymbol{\psi}) \mathbf{Z} \mathbf{B}^{(t)}\mathbf{Z}^T \mathfrak{D}(\boldsymbol{\psi}), $$

(2.6)

and for t ∈ [τ, T]

$$ {A}_{i,j}^{(t)} \sim \text{Bernoulli}\left( \tilde{P}_{i,j}^{(t)}\right), \text{ where } \tilde{\mathbf{P}}^{(t)} := \mathfrak{D}(\boldsymbol{\psi}) \tilde{\mathbf{Z}} \tilde{\mathbf{B}}^{(t)} \tilde{\mathbf{Z}}^T \mathfrak{D}(\boldsymbol{\psi}), $$

(2.7)

where, $\mathfrak {D}(\boldsymbol {\psi }) = \text {Diag}(\boldsymbol {\psi })$.

We also consider a special case of MDBM, which we refer to as the multilayer stochastic block model with one change-point (MSBM). MSBM makes the simplifying assumption that ψ_i = 1 for all i ∈ [N]; however, the number of communities is allowed to change after the change-point. In other words, K^(t) = K for t ∈ [1, τ) and $K^{(t)} = \tilde {K}$ for t ∈ [τ, T]. While $K=\tilde {K}$ in MDBM, it is not necessarily the case in MSBM. Hence, i > j, i, j ∈ [N], and t ∈ [1, τ) the parameter conditions of MSBM are as follows:

$$ \begin{array}{@{}rcl@{}} z_{1}, \ldots, z_{N}^{\underset{\sim}{iid}} \text{Mult}(1;(\pi_{1},\ldots,\pi_{K})), \end{array} $$

(2.8)

(2.9)

and for i > j, i, j ∈ [N], and t ∈ [τ, T]

$$ \begin{array}{@{}rcl@{}} \tilde z_{1}, \ldots, \tilde z_{N}^{\underset{\sim}{iid}} \text{Mult}(1;(\tilde \pi_{1},\ldots,\tilde \pi_{\tilde K})), \end{array} $$

(2.10)

(2.11)

As with MDBM, each edge is drawn independently as follows: For i > j, i, j ∈ [N] and t ∈ [1, τ),

$$ {A}_{i,j}^{(t)} \sim \text{Bernoulli}\left( {P}_{i,j}^{(t)}\right), \text{ where } \mathbf{P}^{(t)} := \mathbf{Z} \mathbf{B}^{(t)}\mathbf{Z}^T, $$

(2.12)

and for t ∈ [τ, T]

$$ A^{(t)}_{i,j} \sim \text{Bernoulli}\left( \tilde{P}_{i,j}^{(t)}\right), \text{ where } \tilde{\mathbf{P}}^{(t)} := \tilde{\mathbf{Z}} \tilde{\mathbf{B}}^{(t)} \tilde{\mathbf{Z}}^T. $$

(2.13)

2.4 Clustering Algorithms from Relevant Works

First, we make precise the notion of clustering and define a simple measure of distance between two clusterings as follows.

Definition 1

ξ is a clustering on the set of nodes V_n if ξ = {ξ₁,..., ξ_K} such that ξ_i ∩ ξ_j = ∅ for i≠j and ${\cup }_{k=1}^{K}\xi _{k} = [N]$. ξ can be represented by the matrix $\mathbf {Z} \in {\mathscr{M}}_{N, K}$, where Z_i∗ = e_k (e_k is k-th unit vector) if node i is assigned to cluster k in ξ.

One simple way to measure the distance between clusterings is to count the number of mismatched elements between two clusterings.

Definition 2

Let $\mathbf {\xi },\mathbf {\xi }^{\prime }$ be two clusterings on [n], Z and $\mathbf {n}^{\mathbf {\xi }}_{K}$ the matrix and the cluster size vector based on ξ ($\mathbf {Z^{\prime }} $ and $\mathbf {n}^{\mathbf {\xi }^{\prime }}_{K}$ resp. on $\mathbf {\xi }^{\prime }$). Let $[m]^{\mathbf {\xi },\mathbf {\xi }^{\prime }}\subset [n]$ be the set of node indices such that $\mathbf {Z}_{i*}=\mathbf {Z}_{j*}\Leftrightarrow \mathbf {Z^{\prime }}_{i*}=\mathbf {Z^{\prime }}_{j*}$ for all $i,j\in [m]^{\mathbf {\xi },\mathbf {\xi }^{\prime }}$. Denote the set of mismatched elements between two clusterings by $[\bar {m}]^{\mathbf {\xi },\mathbf {\xi }^{\prime }} = [n]\setminus [m]^{\mathbf {\xi },\mathbf {\xi }^{\prime }}$.

Algorithms for recovery of community structures will be adapted from Bhattacharyya and Chatterjee (2020b) and Bhattacharyya and Chatterjee (2020a) and are reproduced below for reference. The first is based on the sum of adjacency matrices and the second on the sum of squares of adjacency matrices. We denote these two algorithms as “Clustering Algorithms 1 and 2” or “Algorithms 1 and 2” for brevity.

Before reciting the consistency results for Algorithms 1 and 2, we recall below definitions of pertinent parameters involving ψ, ξ and B:

1.
Measures of heterogeneity of ψ, for all a ∈ [K]:
1. (a)
  $\tilde {N}_{a}:={\sum }_{i\in \xi _{a}}{\psi _{i}^{2}}$
2. (b)
  $\tilde {N}_{\max \limits }:=\max \limits _{a}\tilde {N}_{a}$ and $\tilde {N}_{\min \limits }:=\min \limits _{a}\tilde {N}_{a}$
3. (c)
  $\tilde {N}^{\prime }_{a}:={\sum }_{i\in \xi _{a}\cap \{k_{1},...,k_{N^{\prime }}\}}{\psi _{i}^{2}}$,
4. (d)
  $\tilde {N}^{\prime }_{\max \limits }:=\max \limits _{a}\tilde {N}^{\prime }_{a}$ and $\tilde {N}^{\prime }_{\min \limits }:=\min \limits _{a}\tilde {N}^{\prime }_{a}$
5. (e)
  $\tau _{a}:={\sum }_{i\in \xi _{a}}{\psi _{i}^{2}}{\sum }_{i\in \xi _{a}}\psi _{i}^{-2}$ is a parameter controlling the variation in heterogeneity within each community.
2.
$\psi _{\min \limits }:=\min \limits _{i\in [N]}{\psi _{i}}$
3.
$\lambda _{\text {A}}:=(T)^{-1}{\sum }_{t=1}^{T}\lambda _{K}\left (\frac {N}{d}\mathbf {B}^{(t)}\right )$, the minimum eigenvalue parameter based on sum of normalized connection probability matrices.
4.
$\lambda _{\text {B}}:=(T)^{-1}{\sum }_{t=1}^{T}\lambda _{K}\left (\left (\frac {N}{d}\mathbf {B}^{(t)}\right )^{2}\right )$, the minimum eigenvalue parameter based on sum of squares of normalized connection probability matrices
5.
$N_{\max \limits } := \max \limits \big ({\mathbf {Z}^{T}\mathbf {1}_{N}}\big )$ and $N_{\min \limits } = \min \limits \big ({\mathbf {Z}^{T}\mathbf {1}_{N}}\big )$ are the sizes of largest and smallest communities respectively.

In addition, we let $\hat \xi $ and $\tilde \xi $ denote the estimated community labels using Algorithms 1 and 2, respectively. The theoretical results for Algorithms 1 and 2 reproduced in Theorems 1 and 2 and are based on the following conditions:

(A)
The number of communities, K, is constant.
(B)
Community sizes are balanced, i.e., $N_{\max \limits }/N_{\min \limits } = O(1)$.
(C)
$\psi _{i}=\alpha _{i}/\max \limits \{\alpha _{j}: z_{i}=z_{j}\}$, where $(\alpha _{i})_{i=1}^{N}$ are i.i.d. positive weights such that .
(D)
$N\geqslant 3K$.
(E)
$\lambda _{B} \left (\frac {\tilde N_{\min \limits }}{N}\right )^{2} > \frac 7N$.

Theorem 1.

(Bhattacharyya and Chatterjee, 2020a) Under Conditions (A) through (D) above, for any 𝜖, η, δ > 0 and c ∈ (0,1), there are constants C₁ > 0 as function of 𝜖, c, δ and C₂ > 0 as function of c, δ, such that if $Td\geqslant C_{2}(K/\lambda _{\text {A}})^{1+\frac {1}{\delta }}$, then

$$ |\Bar{[m]}^{\xi,\hat \xi}|\leqslant \bar N_{pr}+C_{1}\bigg[\frac{K\tilde{N}^{\prime}_{\max}}{(\psi_{\min}\lambda_{A}\tilde{N}^{\prime}_{\min})^{2}}+\frac{N(Td)^{-0.5+\delta+\eta}(K{\sum}_{k\in [K]}\tau_{k})^{1/2}}{\lambda_{A}\tilde{N}^{\prime}_{\min}}\bigg] $$

(2.15)

with probability at least 1 − o(1) as $Td\lambda _{A}\rightarrow \infty $, where $\bar N_{pr}:=\frac {N}{e^{(1-c)Td}}$ is the upper bound on the number of pruned nodes in Step 2 of Algorithm 1.

Theorem 2.

(Bhattacharyya and Chatterjee, 2020b) Under Conditions (A), (B), (C), and (E), for any 𝜖 > 0 and Δ > 8, there are constants C > 0 as a function of 𝜖 and $C^{\prime }>0$ such that if

$$ N_{\min}>\frac{C(K\tilde{N}_{\max})^{3}\lambda_{B}^{-2}}{\psi_{\min}^{2}\tilde{N}_{\min}^{4}} + \frac{C{{\varDelta}}(K{\sum}_{a\in K}\tau_{a})^{1/2}}{(Td)^{1/4}\lambda_{B}(\tilde{N}_{\min}/N)^{2}} $$

then

$$ |\Bar{[m]}^{\xi,\tilde \xi}|\leqslant \bar N_{pr}+C\bigg[\frac{(K\tilde{N}_{\max})^{3}}{(\psi_{\min}\lambda_{B})^{2}(\tilde{N}_{\min})^{4}}+\frac{{{\varDelta}}(K{\sum}_{k\in [K]}\tau_{k})^{1/2}}{(Td)^{1/4}\lambda_{B}(\tilde{N}_{min}/N)^{2}}\bigg] $$

(2.16)

with probability at least 1 − o(1) as $(\mathit {Td})^{1/4}\lambda _{B}\!\rightarrow \! \infty $, where the upper bound on the number of pruned nodes in Step 5 of Algorithm 2 is $\bar N_{pr}:=\frac {N}{(Td)^{1/4}\lambda _{B}(\tilde {N}_{min}/N)^{2}}$.

2.5 Background on Information Theory

We define preliminary terms from information theory and conclude with a measure of the distance between clusterings.

Definition 3

Let $\mathbf {n}^{\mathbf {\xi }}_{K} = \mathbf {Z}^{T}\mathbf {1} = \left (n^{\mathbf {\xi }}_{1},...,n^{\mathbf {\xi }}_{K}\right )^{T}$ be the vector of cluster sizes of clustering ξ. Then the entropy of ξ is defined as:

$$ H(\mathbf{\xi}) := -\sum\limits_{k=1}^{K} \frac{{n}_{k}^{\mathbf{\xi}}}{N}\log\left( \frac{{n}_{k}^{\mathbf{\xi}}}{N}\right). $$

(2.17)

Definition 4

Let $\mathbf {\xi },\mathbf {\xi }^{\prime }$ be clusterings on [N], Z and $\mathbf {n}_{K}^{\mathbf {\xi }}$ be defined as in Definitions 1 and 3 for ξ, and $\mathbf {Z^{\prime }} $ and $\mathbf {n}^{\mathbf {\xi }^{\prime }}_{K}$ for $\mathbf {\xi }^{\prime }$. Let $\mathbf {n}^{\mathbf {\xi },\mathbf {\xi }^{\prime }} = \mathbf {Z}^{T}\mathbf {Z^{\prime }} = \left [n_{ij}^{\mathbf {\xi },\mathbf {\xi }^{\prime }}\right ]_{K\times K}$ be the contingency table of ξ and $\mathbf {\xi }^{\prime }$, where $n_{ij}^{\mathbf {\xi },\mathbf {\xi }^{\prime }}$ is the number of nodes in both ξ_i and ${\xi ^{\prime }}_{j}$ for i, j ∈ [K]. The mutual information between ξ and $\mathbf {\xi }^{\prime }$ is defined as:

$$ I(\mathbf{\xi},\mathbf{\xi}^{\prime}) := \sum\limits_{i=1}^{K} \sum\limits_{j=1}^{K} \frac{{n}_{ij}^{\mathbf{\xi},\mathbf{\xi}^{\prime}}}{N}\log\left( N \frac{{n}_{ij}^{\mathbf{\xi},\mathbf{\xi}^{\prime}}}{n^{\mathbf{\xi}}_{i} n^{\mathbf{\xi}^{\prime}}_{j}}\right). $$

(2.18)

The mutual information between two clusterings measures the information one clustering has about the other. The normalized mutual information has been widely used in the literature as an information-theoretic measure of similarity between clustering. Here, we use a measure of dissimilarity between clusterings called variation of information ($\mathcal {V}\mathcal {I}$), which is based on entropy and mutual information.

Definition 5

Let $\mathbf {\xi },\mathbf {\xi }^{\prime }, \mathbf {Z}, \mathbf {n}^{\mathbf {\xi }}_{K}, \mathbf {Z^{\prime }}$ and $\mathbf {n}^{\mathbf {\xi }^{\prime }}_{K}$ be defined as in Definition 4. The variation of information is defined as:

$$ \mathcal{V}\mathcal{I}(\mathbf{\xi},\mathbf{\xi}^{\prime}) := H(\mathbf{\xi}) + H(\mathbf{\xi}^{\prime}) - 2I(\mathbf{\xi},\mathbf{\xi}^{\prime}). $$

(2.19)

Properties of $\mathcal {V}\mathcal {I}$ are discussed in Meilă (2007) and its exhaustive list is given in Appendix A.1. The most relevant property for our purposes is as follows:

Property 1.

(Meilă, 2007) $\mathcal {V}\mathcal {I}$ is a metric.

Note that as a metric, $\mathcal {V}\mathcal {I}$ satisfies the following three axioms: positive definiteness, symmetry, and the triangle inequality.

2.6 The Bethe Hessian Matrix

Recently, a certain class of graph operators has received an increasing level of attention for their spectral properties that allow for a simple and efficient recovery of community structure. While some algorithms, such as those based on belief propagation, require a generative model with correct parameters as inputs, and those based on adjacency matrices and graph Laplacians are ineffective in the presence of degree fluctuations, non-parametric spectral clustering methods based on the non-backtracking and the Bethe Hessian matrices are robust to such challenges. As a result, they have become popular, reliable tools of choice when faced with network data that are sparse and heterogeneous (Krzakala et al. 2013; Saade et al. 2014a; Bordenave et al. 2015; Coste and Zhu, 2019; Gulikers et al. 2016; Bruna and Li, 2017; Saade et al. 2014b; Watanabe and Fukumizu, 2009; Dall’Amico and Couillet, 2019; Dall’Amico et al. 2020; Le and Levina, 2015). We start by defining these matrices on which some of the robust non-parametric methods are based.

2.6.1 Definitions

Definition 6

The non-backtracking operator, B, is a 2|E|× 2|E| matrix indexed by directed edges $i\rightarrow j$ and defined $\mathbf {B}_{i\rightarrow j,k\rightarrow l} = \delta _{jk}(1-\delta _{il})$, where |E| is the number of edges in the adjacency matrix A, i, j, k, l are vertices of A and δ is the Kronecker delta.

Definition 7

The Bethe Hessian matrix is defined as ${\mathbf {H}^{S}_{r}}:= (r^{2}-1)|S|\mathbf {I}_{N}+\mathbf {D}^{S}-r\mathbf {A}^{S}$ based on A^S with parameter r, where $\mathbf {D}^{S}_{N \times N}=\text {Diag}({{d}_{1}^{S}}, \ldots , {{d}_{N}^{S}})$ and ${d^{S}_{i}}:={\sum }_{j\in [N]} \mathbf {A}^{S}_{ij}$ for i ∈ [N]. When |S| = 1 and it is clear from context which A is used, we write $\mathbf {H}_{r}:= (r^{2}-1)\mathbf {I}_{N}+\mathbf {D}-r\mathbf {A}$ with D := Diag(A1_N).

Definition 8

The moving window sum is defined as $\textbf {A}^{[t_{0},t]} := {\sum }_{s\in [t_{0},t]} \mathbf {A}^{(s)}$ where $t_{0} = \max \limits \{1,t-\theta +1\}$ and the window .

2.6.2 Background

It was noted in Krzakala et al. (2013) that for networks containing K communities, while high degree variations suppress signal from informative eigenvalues of A and the Laplacian, the K largest eigenvalues of B are real-valued and well-separated from the circle of radius $\left \|\mathbf {B}\right \|^{1/2}$ where the bulk of the eigenvalues of B is contained. Further, Bordenave et al. (2015) showed that the largest K eigenvalues of B concentrate around the informative eigenvalues of $\mathbb {E}[\mathbf {A}]$.

However, B are non-symmetric, preventing the use of linear algebraic tools for symmetric matrices, and can be much larger than the adjacency matrices, presenting computational challenges. In response, Saade et al. (2014a) gave an intuitive argument rooted in statistical physics and numerical simulations to demonstrate that H_r is simpler given its symmetric nature and is at least as effective as B at detecting communities, while offering significant gains in computational efficiency. In statistical physics, H_r is an approximation of the Hessian matrix of the Bethe free energy at the system equilibria corresponding to the trivial fixed-points of the belief propagation equations, where every vertex belongs to each community with equal probability. Here, the parameter r denotes the temperature of the system (Yedidia et al. 2003). Then, the informative eigenvalues of H_r are the phase transitions in the Ising Hamiltonian model where new communities become visible (Bruna and Li, 2017).

An important property of H_r that bears connection to B is that when r is set to an eigenvalue of the latter, the determinant of H_r vanishes (Hashimoto, 1989; Angel et al. 2015).

On the topic of what specific values should be used for the parameter r, several proposals have been put forth. Krzakala et al. (2013) proposed $\left \|\mathbf {B}\right \|^{1/2}$, i.e., the radius of the circle containing the bulk, which is approximated by r_d below, while Saade et al. (2014a) proposed a square root of the mean degree r_a for SBM networks:

$$ \begin{array}{@{}rcl@{}} {{r}_{d}^{2}} =\left( \sum\limits_{i=1}^{N} d_{i}\right)^{-1}\left( \sum\limits_{i=1}^{N} {{d}_{i}^{2}}\right)-1 \qquad r_{a}=\sqrt{\frac{{\sum}_{i=1}^{N} d_{i}}{N}} \end{array} $$

where d_i denotes the degree of node i. Dall’Amico et al. (2019) presented an intuitive argument that for 2-community degree-corrected block model networks, $r=\frac {a+b}{a-b}$ is insensitive to degree heterogeneity and showed through simulations that it outperforms certain other choices for r in such networks. On the topic of estimating the number of communities K, Le and Levina (2015) proposed counting the number of negative eigenvalues in H_r, with the parameter choices r_d based on Krzakala et al. (2013) and r_a based on Saade et al. (2014a). Noting that simulations using r_d and r_a both underestimated K when network is unbalanced, Le and Levina (2015) suggested an additional method of estimating K whereby the number of eigenvalues that are separated from the bulk by a predetermined integer multiple of the bulk radius is used instead.

Despite the rich literature, however, a formal proof of the intuition that the number of communities in a network is given by the number of negative eigenvalues of H_r, and precisely what value of r enables such a detection and under what conditions, is still lacking.

3 Online Change-point Detection Methods

We propose the following two types of algorithms for online change-point detection. In both cases, we compute an estimate of the structural property of interest using aggregated networks and infer that a change-point has occurred when we detect a change in the estimates.

1.
$\hat {\mathbf {Z}}$-based algorithms detect a change using the estimated community structure.
2.
$\hat {K}$-based algorithm detects a change using the estimated number of communities.

These algorithms are discussed in detail below in Sections 3.1 and 3.2, followed by theoretical results in Section 4.

3.1 $\hat {\mathbf {Z}}$-based Change-Point Detection Algorithms

The community structure of a “recent” (to be made precise later) sequence of networks is estimated, and is compared to the estimated structure of an “old” sequence using $\mathcal {V}\mathcal {I}$. We denote an estimated community structure by $\hat {\mathbf {Z}}$. In constructing the two types of aggregated networks to compare, we employ two approaches. In one, we let the two sequences have a common set of layers; in the other, we keep the two sequences disjoint. These two approaches are encoded in Algorithms 3 and 4 discussed later in Section 3.1.2.

In the absence of a change-point, the latent community structures of a recent and an old sequence of networks are expected to be similar. In other words, the $\mathcal {V}\mathcal {I}$ between the two sequences should be small by Property 1. While latent community structures are unobserved, Theorem 4 below guarantees that as long as the number of mismatched nodes is bounded, the $\mathcal {V}\mathcal {I}$ between the estimated and the latent structures is also bounded. This forms the basis for using estimated community structures in Algorithms 1 and 2.

3.1.1 Upper Bound on $\mathcal {V}\mathcal {I}(\hat {\mathbf {Z}},\mathbf {Z})$

Recall that Theorems 1 and 2 provide upper bounds on the number of misclassified nodes of estimated community structures computed using Algorithms 1 and 2, respectively. Theorems 3 and 4 below state that this bound directly implies an upper bound on $\mathcal {V}\mathcal {I}(\hat {\mathbf {Z}},\mathbf {Z})$. The proofs of Theorems 3 and 4 are given in Appendices A.2 and A.3.

Theorem 3.

Given two clusterings ξ and $\mathbf {\xi }^{\prime }$ (alternatively, Z and $\mathbf {Z^{\prime }}$ in a matrix form), let $[m]^{\mathbf {\xi },\mathbf {\xi }^{\prime }}\subset [N]$ be the set of node indices such that $\mathbf {Z}_{i*}=\mathbf {Z}_{j*}\Leftrightarrow \mathbf {Z^{\prime }}_{i*}=\mathbf {Z^{\prime }}_{j*} $ for all $i,j\in [m]^{\mathbf {\xi },\mathbf {\xi }^{\prime }}$. Then there exists a K × K permutation matrix P_ω such that

$$ |[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|= \text{Tr}(\mathbf{Z}^{T}\mathbf{Z^{\prime}}_{\omega})=\text{Tr}(\mathbf{Z}^{T}\mathbf{Z^{\prime}} \mathbf{P_{\omega}}) \text{, where }\mathbf{Z^{\prime}}_{\omega} = \mathbf{Z^{\prime}} \mathbf{P_{\omega}}. $$

(3.1)

Theorem 4.

Let ξ, $\mathbf {\xi }^{\prime }$ be two clusterings. Let P_ω be the permutation matrix satisfying Eq. 3.1, and denote the contingency table by $\mathbf {Z}^{T}\mathbf {Z^{\prime }}_{\omega }=[n_{ij}]$. Let n₁,..., n_K be the cluster sizes with ξ and ${{n}_{1}^{\prime },...,n_{K}^{\prime }}$ of $\mathbf {\xi }^{\prime }$, respectively. Without loss of generality, assume n₁ < ... < n_K and $n^{\prime }_{1}<...<n^{\prime }_{K}$. Then $\mathcal {V}\mathcal {I}(\mathbf {\xi },\mathbf {\xi }^{\prime })$ given $|[m]^{\mathbf {\xi },\mathbf {\xi }^{\prime }}|$ is upper bounded by $M(\xi , \xi ^{\prime }, |[m]^{\xi , \xi ^{\prime }})$, which is defined as:

$$ \begin{aligned} M\left( \mathbf{\xi},\mathbf{\xi}^{\prime},|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|\right)= &M^{(a)}_{\mathbf{\xi},|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|}+M^{(b)}_{\mathbf{\xi},|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|}+M^{(c)}_{\mathbf{\xi},|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|}+M^{(d)}_{\mathbf{\xi},|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|}\\ &+M^{(a)}_{\mathbf{\xi}^{\prime},|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|}+M^{(b)}_{\mathbf{\xi}^{\prime},|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|}+M^{(c)}_{\mathbf{\xi}^{\prime},|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|}+M^{(d)}_{\mathbf{\xi}^{\prime},|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|} \end{aligned} $$

(3.2)

where,

$$ M^{(a)}_{\mathbf{\xi},|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|}=\sum\limits_{i=0}^{K^{(a)}}\frac{n_{i}}{N}\log\frac{N}{n_{i}}+\frac{n^{(a)}}{N}\left[\log(K-K^{(a)})+\log\frac{N}{n^{(a)}}\right],\text{while}\ n_{0}=0 $$

(3.3)

$$ \begin{array}{@{}rcl@{}} K^{(a)}\!\!\! &=&\!\!\! \min\{j|j\in 0,..,K-1\},\text{such that}\ |[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|-\sum\limits_{i=0}^{j}n_{i}\leqslant (K-j)n_{j+1} \\ n^{(a)} \!\!\!&=&\!\!\! |[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|-\sum\limits_{i=0}^{K^{(a)}}n_{i} \end{array} $$

$$ \begin{array}{@{}rcl@{}} M^{(b)}_{\mathbf{\xi},|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|}\!\!\!&=&\!\!\!-\bigg[\frac{K^{(b)}-1}{N}\log N + \frac{n^{(b)}}{N}\log\frac{N}{n^{(b)}}+\sum\limits_{i=K^{(b)}+1}^{K+1}\frac{n_{i}}{N}\log\frac{N}{n_{i}}\bigg],\\&& \textit{while}\ n_{K+1}=0 \end{array} $$

(3.4)

$$ \begin{array}{@{}rcl@{}} K^{(b)} \!\!\!&=&\!\!\! \max\{j|j\in 1,..,K\},\ \text{such that}\ |[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|-(j-1)-\sum\limits_{i=j+1}^{K+1}n_{i}\leqslant n_{j} \\ n^{(b)} \!\!\!&=&\!\!\!|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|-(K^{(b)}-1)-\sum\limits_{i=K^{(b)}+1}^{K+1}n_{i} \end{array} $$

$$ \begin{array}{@{}rcl@{}} M^{(c)}_{\mathbf{\xi},|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|}\!\!\!&=&\!\!\!\sum\limits_{i=0}^{K^{(c)}}\frac{n_{i}-1}{N}[\log(K-1)+\log\frac{N}{n_{i}-1}]\\&&+\frac{n^{(c)}}{N}[\log(K-K^{(c)})(K-1)+\log\frac{N}{n^{(c)}}], \end{array} $$

(3.5)

$$ \ \text{while}\ n_{0}=1 $$

$$ \begin{array}{@{}rcl@{}} K^{(c)} &=& \min\{j|j\in 0,..,K-1\},\ \text{such that}\ N-|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|-{\sum}_{i=0}^{j}(n_{i}-1)\\&\leqslant& (K-j)(n_{j+1}-1) \\ n^{(c)} &=& N-|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|-{\sum}_{i=0}^{K^{(c)}}(n_{i}-1) \end{array} $$

$$ \begin{array}{@{}rcl@{}} M^{(d)}_{\mathbf{\xi},|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|}\!\!\!&=&\!\!\!-\left[\frac{n^{(d)}}{N}\log\frac{N}{n^{(d)}}+\sum\limits_{i=K^{(d)}+1}^{K+1}\frac{n_{i}-1}{N}\log\frac{N}{n_{i}-1}\right],\\&& \text{while}\ n_{0}=n_{K+1}=1 \end{array} $$

(3.6)

$$ \begin{array}{@{}rcl@{}} K^{(d)} &=&\max\{j|j\in 1,..,K\},\ s.t.\ N-|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|-\sum\limits_{i=j+1}^{K+1}(n_{i}-1)\leqslant n_{j}-1 \\ n^{(d)} &=& \ N-|[m]^{\mathbf{\xi},\mathbf{\xi}^{\prime}}|-\sum\limits_{i=K^{(d)}+1}^{K+1}(n_{i}-1) \end{array} $$

$M^{(a)}_{\mathbf {\xi }^{\prime },|[m]^{\mathbf {\xi },\mathbf {\xi }^{\prime }}|},...,M^{(d)}_{\mathbf {\xi }^{\prime },|[m]^{\mathbf {\xi },\mathbf {\xi }^{\prime }}|}$ have bounds of the same form as $M^{(a)}_{\mathbf {\xi },|[m]^{\mathbf {\xi },\mathbf {\xi }^{\prime }}|},$..., $M^{(d)}_{\mathbf {\xi },|[m]^{\mathbf {\xi },\mathbf {\xi }^{\prime }}|}$, with $n^{\prime }_{i}$ replacing n_i.

3.1.2 Two Algorithms

In what follows, we formally introduce the two algorithms based on $\hat {\mathbf {Z}}$ below as Algorithms 3 and 4. Given a sequence of adjacency matrices A⁽¹⁾,..., A^(T), the goal is to detect a change-point τ based on $\mathcal {V}\mathcal {I}$ between recovered community structures for layers {1,..., τ − 1} and {τ,..., T}. This is done in two steps. First, for each layer t, the $\mathcal {V}\mathcal {I}$ between two sequences of layers differing by a segment containing t is checked whether it is larger than the upper bound in Theorem 4. If so, it is inferred that $t\geqslant \tau $. In addition, if for some , the $\mathcal {V}\mathcal {I}$ at t + υ_J through t + υ_J + υ_C − 1 progressively becomes larger, then it is concluded that t is the estimated change-point.

In addition to the adjacency matrices, inputs include the following: cushion κ, which denotes some sufficient number of layers needed for Algorithms 1 and 2 to recover community structure; the number of communities K; and window 𝜃, which denotes the most recent number of layers representing the difference in lengths of any two pairs of sequences of networks being compared.

Algorithms 3 and 4 differ only on the sequences of networks used for comparison. In Algorithm 3, the recent sequence consists of layers $\max \limits \{1,t-m\theta +1\}$ through t, while the old sequence consists of $\max \limits \{1,t-m\theta +1\}$ through t − 𝜃, where m denotes the number of windows and controls the proportion of old layers in the recent sequence. Hence the two sequences share at most (m − 1)𝜃 layers. In Algorithm 4, the recent sequence consists of layers t − 𝜃 + 1 through t, while the old sequence is comprised of layers 1 through t − 𝜃 and is thus disjoint from the recent sequence.

In lines 3 to 7 in Algorithm 3 and 3 to 6 in Algorithm 4, the community structure at each t is estimated and $\mathcal {V}\mathcal {I}$ between recent and old sequences are computed at layers t − 1 through t + υ_J + υ_C − 1. In lines 9 to 11 in Algorithm 3 and lines 8 to 10 in Algorithm 4, $\mathcal {V}\mathcal {I}$ are checked against the upper bound from Theorem 4 and whether they progressively increase to determine if t is a change-point.

Remark 1.

Calculating $\mathcal {V}\mathcal {I}$ in line 9 of Algorithm 3 involves computing upper bounds on $|\Bar {[m]}_{t_{1}:t_{2}}|$ using either (2.15) or (2.16) depending on whether Algorithm 1 or 2 is used. However, since ξ is unknown, letting N_pr be the number of pruned high-degree nodes, we can calculate an upper bound on $|\Bar {[m]}_{t_{1}:t_{2}}|$ using Eqs. 3.7 and 3.8 below (for Algorithms 1 and 2, respectively):

$$ |\Bar{[m]}_{t_{1}:t_{2}}|=N_{pr}+C\bigg[\frac{(K\tilde{N}_{\max})^{3}}{(\psi_{\min}\lambda_{B})^{2}(\tilde{N}_{\min})^{4}}+\frac{{{\varDelta}}(K{\sum}_{k\in [K]}\tau_{k})^{1/2}}{((t_{2}-t_{1})d)^{1/4}\lambda_{B}(\tilde{N}_{min}/N)^{2}}\bigg] $$

(3.7)

$$ |\Bar{[m]}_{t_{1}:t_{2}}|=N_{pr}+C_{1}\bigg[\frac{K\tilde{N}^{\prime}_{\max}}{(\psi_{\min}\lambda_{A}\tilde{N}^{\prime}_{\min})^{2}}+\frac{N((t_{2}-t_{1})d)^{-0.5+\delta+\eta}(K{\sum}_{k\in [K]}\tau_{k})^{1/2}}{\lambda_{A}\tilde{N}^{\prime}_{\min}}\bigg] $$

(3.8)

where C, Δ, C₁, δ and η are constants.

If no change-point is detected by layer t, the layers from $\max \limits \{1,t-m*\theta +1\}$ to t − 1 are assumed to have the same latent community structure and $\hat {\mathbf {\xi }}_{t-\theta }$ given by $\hat {\mathbf {Z}}$ is considered an estimate of ξ_t−𝜃. For π, ψ and B needed in the calculation of $|\Bar {[m]}_{t_{1}:t_{2}}|$, we use adjusted profile likelihood estimates from layers t₁ : t₂ − 𝜃. We derive the MDBM parameter estimates as follows:

$$\hat{\mathbf{\pi}}_{a}=\frac{n_{a}}{N},\ \hat{\boldsymbol{\psi}}_{i}=\frac{{\sum}_{t=t_{1}}^{t_{2}-\theta}d_{i}^{(t)}}{T},\ \hat{B}_{a,b}=\frac{n_{ab}}{n_{a} n_{b}},$$

where $d_{i}^{(t)}$ denotes the degree of node i ∈ [N] in layer t ∈ [t₁, t₂ − 𝜃], T := t₂ − t₁ − 𝜃, n_a denotes the number of nodes with community label a, and $n_{ab}^{(t)}$ denotes number of edges between communities with labels a and b in layer t, for a, b ∈ [K]. To satisfy the assumptions that (1) the connectivity probability matrix does not change with time, and that (2) the local maximal degree equals 1, we normalize and adjust the estimates as follows:

$$\hat{\psi}^{\prime}_{i}=\frac{\hat{\psi}_{i}}{\max_{j: \mathbf{Z}_{j*}=\mathbf{Z}_{i*}}\hat{\psi}_{j}},\ \hat{B}^{\prime}_{a,b}=\frac{{\sum}_{t}n_{ab}^{(t)}}{(t-\theta)n_{a}n_{b}}\max_{i: \mathbf{Z}_{i*}=\mathbf{e}_{a}}\hat{\psi}_{i}\max_{j: \mathbf{Z}_{j*}=\mathbf{e}_{b}}\hat{\psi}_{j}.$$

Remark 2.

The calculation for $|\Bar {[m]}_{t_{1}:t_{2}}|$ in Algorithm 4 is similar to that in Algorithm 3. The only difference is that for values of π, ψ and B that are needed for calculation of $|\Bar {[m]}_{t_{1}:t_{2}}|$, we use adjusted profile likelihood estimates with data from layers 1 to t − 𝜃.

Remark 3.

The time complexity of both Algorithms 3 and 4 is O(TKN²), driven by the partial eigen-decomposition in Steps 5-6 and Steps 4-5 in Algorithms 3 and 4, respectively.

Remark 4.

In both algorithms, tuning parameters consist of the cushion κ, the window , the number of windows , jump length υ_J ∈{0,..., 𝜃 − 1}, and check length . In our simulations, we let κ ∈{2,5}; 𝜃 ∈{2,6};υ_C, υ_J ∈{1,2}.

3.2 $\hat {K}$-based Change-Point Detection Algorithm

Given a sequence of adjacency matrices for networks, the aggregated adjacency matrices are represented by sum of the adjacency matrices. Once aggregated adjacency matrix is obtained from the sequence, it is transformed into the corresponding Bethe Hessian matrix ${\mathbf {H}_{r}^{S}}$, and the number of communities $\hat {K}$ is estimated for each ${\mathbf {H}_{r}^{S}}$. Then, change-points are detected when a change in $\hat {K}$ is observed. The details of the algorithm are given in Algorithm 5.

Remark 5.

The sample size parameter M would need to be set based on initial guesses for K informed by one’s domain knowledge such that M ≫ K. In our simulations with K ∈{3,5}, the choice M = 50 worked well.

Remark 6.

Tuning parameters of the algorithm consist of cushion κ and window 𝜃. κ controls how close the change-point is to the beginning of the sequence such that no change-point occurring before κ can be detected. κ was assigned a value of 2 and 5. 𝜃 controls the length of the interval for estimating change-point and was assigned values of 2, 4, and 6.

Remark 7.

Estimating r that yielded accurate results in the main procedure required that A^S be of sufficient density. In our simulations, that density ranged from 120 to 130, which equated to approximately 15 to 30 layers depending on the setting of the density parameter ρ. The top row in Fig. 1 shows that the first-order linear approximation of r in Steps 9 and 10 yielded an estimate quite close to the limiting value of r given the sparsity setting in the simulations. The bottom row in Fig. 1 shows that the mean density of A^S at which the estimated r is close to the limiting value is similar at different levels of the density parameter ρ. For other network data, the parameters in the linear approximation should first be estimated prior to applying the algorithm.

Remark 8.

The time complexity of the sub-procedure EstimateParams is O(N³) driven by the eigenvalue computation in Step 2, and O(TN³) for the main procedure where the eigenvalue computation for $\textbf {H}(\hat {r})$ in Step 10 is repeated T times.

4 Theoretical Results

In this section, we provide theoretical results of the change-point estimators proposed in Section 3. Recall that the data is given in the form of a sequence of N × N adjacency matrices (A⁽¹⁾, A⁽²⁾,…, A^(t),…). Since we are considering an online change-point detection problem, we assume that the adjacency matrices are available to us sequentially and at any given point, T, of the sequence, the available set of adjacency matrices are (A⁽¹⁾, A⁽²⁾,…, A^(T)). We consider that the data are generated from MDBM as defined in Section 2.3 and MSBM as its special case with ψ_i = 1 for all i ∈ [N] but with potentially different number of communities post-change-point. Our goal now becomes obtaining theoretical properties of the change-point estimators proposed in Section 3. Below, we state the main results for both classes of algorithms. Detailed proofs for all of the results are given in Appendices A and B.

4.1 Assumptions

4.1.1 Assumptions for Theoretical Results for the $\hat {\mathbf {Z}}$-based Algorithms

Although the change-point estimator works for any consistent community recovery algorithm, for the sake of convenience, without loss of generality, we consider that the community labels have been estimated by Algorithm 1 to prove theoretical results on consistency. Lemma 1 gives a bound on the misclassification error based on the $\mathcal {V}\mathcal {I}$ between the estimated community structures using Algorithm 1. Note that in order to use the bound in Eqs. 2.15, we need Conditions (A) through (D) in Section 2.4 on the parameters of MDBM.

With error bounds on the estimated community labels given in Eqs. 2.15 and 2.16, in Appendix A.3, we provide the relationship between the error bounds and the $\mathcal {V}\mathcal {I}$. In what follows, we prove the consistency of the change-point estimator, $\hat {\tau }$, output by Algorithms 3 and 4.

4.1.2 Assumptions for Theoretical Results for the $\hat {K}$-based Algorithm

We consider network data generated from MSBM as described in Section 2.1. In addition, we assume the following:

(F)
B^(t) = B for all t ≥ 1, and a special assortative structure $\mathbf {B}_{K\times K} :=\frac {a-b}{N}\mathbf {I}_{K}+\frac bN\mathbf {1}_{K}{\mathbf {1}_{K}^{T}}$. We note that this special structure is commonly used in the literature to show theoretical properties of SBMs.
(G)
Identical community sizes, i.e., n₁ = ⋯ = n_K = N/K for all t ∈ [1, T]. Hence, after the change-point, we assume that each community is allocated the same number of nodes.

4.2 Theoretical Results for the $\hat {\mathbf {Z}}$-based Algorithms

4.2.1 Theoretical Properties of $\hat {\tau }$ in Algorithms 3 and 4

Both Algorithms 3 and 4 output an estimate of the change-point, $\hat {\tau }$, if the true change-point τ exists within the sequence; otherwise, both return an empty set.

We start with providing a result on the accuracy of estimated community structure. Lemma 1 provides a high probability bound on the errors in estimated community structure.

Lemma 1.

Let $\hat {\mathbf {\xi }}_{1:T}$ be an estimate of ξ_1:T based on layers 1 : T in the network data $(\mathbf {A}_{N\times N}^{(s)})_{s=1}^{T}$. Let |[m]̄_1:T| be an upper bound on the total number of misclassified nodes in $\hat {\mathbf {\xi }}_{1:T}$ with respect to ξ_1:T as stated in Theorem 4.9 from Bhattacharyya and Chatterjee (2020b) under Conditions (A) through (D) in Section 2.4. If $|\Bar {[m]}_{1:T}| \geqslant \frac {K-1}{K}N$, then, $\mathcal {V}\mathcal {I}(\hat {\mathbf {\xi }}_{1:T}, \mathbf {\xi }_{1:T})\leqslant 2\log (K)$. If $|\Bar {[m]}_{1:T}| < \frac {K-1}{K}N$, then as $(Td)^{1/4}\lambda _{B} \rightarrow \infty $,

(4.1)

where $M\left (\mathbf {\xi }_{1:T}, \hat {\mathbf {\xi }}_{1:T}, N-|\Bar {[m]}_{1:T}|\right )$ is as defined in Eq. 3.2 of Theorem 4.

The main theoretical results for $\hat {\tau }$ estimated from Algorithm 3 are given in Theorems 5 and 6. Theorem 5 shows that in absence of a change-point, with high probability Algorithm 3 would return a null set. Theorem 6 shows that estimated change-point $\hat {\tau }$ is close to the true change-point τ with high probability. Denote $\hat {\mathbf {\xi }}_{1:t}$ as $\hat {\mathbf {\xi }}_{t}$, ξ_1:t as ξ_t, |[m]̄_1:t| as |[m]̄_t|. Let ξ₁ be the clustering of layers 1 : τ − 1 and ξ₂ be the clustering of layers $t\geqslant \tau $. Also, recall the quantities defined in terms of the parameters of MSBM and MDBM of Section 2.3 in Section 3.

Theorem 5.

There exists such that for any t < τ and $t \geqslant \tau +\delta _{0}+\theta $, if $|\Bar {[m]}_{t}| < \frac {K-1}{K}N$, as $((t-\theta -1)d)^{1/4}\lambda _{B} \rightarrow \infty $,

(4.2)

Theorem 6.

Let $\hat \tau $ be the estimated change-point from Algorithm 3. Then there exists such that if $|\Bar {[m]}_{t}| < \frac {K-1}{K}N$, as $((t-\theta -1)d)^{1/4}\lambda _{B} \rightarrow \infty $, .

The proof of Theorem 5 follows from Lemma 1 and 2, where as, proof of Theorem 6 follows from Lemmas 1, 2 and Theorem 7. The details of the proofs are given in the Appendices A.4 - A.8.

Lemma 2.

For any t:

$$ \begin{aligned} \mathcal{V}\mathcal{I}(\hat{\mathbf{\xi}}_{t},\hat{\mathbf{\xi}}_{t-\theta})&\leqslant \mathcal{V}\mathcal{I}(\mathbf{\xi}_{t},\mathbf{\xi}_{t-\theta}) + \mathcal{V}\mathcal{I}(\hat{\mathbf{\xi}}_{t},\mathbf{\xi}_{t}) + \mathcal{V}\mathcal{I}(\hat{\mathbf{\xi}}_{t-\theta},\mathbf{\xi}_{t-\theta})\\ \mathcal{V}\mathcal{I}(\hat{\mathbf{\xi}}_{t},\hat{\mathbf{\xi}}_{t-\theta})&\geqslant \mathcal{V}\mathcal{I}(\mathbf{\xi}_{t},\mathbf{\xi}_{t-\theta}) - \mathcal{V}\mathcal{I}(\hat{\mathbf{\xi}}_{t},\mathbf{\xi}_{t}) - \mathcal{V}\mathcal{I}(\hat{\mathbf{\xi}}_{t-\theta},\mathbf{\xi}_{t-\theta}). \end{aligned} $$

(4.3)

Theorem 7.

There exist $\theta \in \mathbb {Z}^{+}$, and such that for any $\tau + \delta _{1} \leqslant \ t \leqslant \tau + \delta _{2} < \tau -1+2\theta $, as $((t-\theta -1)d)^{1/4}\lambda _{B} \rightarrow \infty $,

(4.4)

Thus, according to Theorems 5 and 7, $\mathcal {V}\mathcal {I}(\hat {\mathbf {\xi }}_{t},\hat {\mathbf {\xi }}_{t-\theta })$ crossing its upper bound indicates a change-point and can be confirmed by a subsequent increase in $\mathcal {V}\mathcal {I}(\hat {\mathbf {\xi }}_{t},\hat {\mathbf {\xi }}_{t-\theta })$ in the interval $(\hat \tau +\delta _{1},\hat \tau +\delta _{2})$. Theorem 6 indicates that the estimated change-point is not far from the true change-point.

The following theoretical results provide further support to Algorithm 4. Theorem 8 shows that in absence of a change-point, with high probability Algorithm 3 would return a null set. Theorem 9 shows that estimated change-point $\hat {\tau }$ is close to the true change-point τ with high probability.

Theorem 8.

There exists such that for any t < τ and $t \geqslant \tau +\delta _{0}+\theta $, if $|\Bar {[m]}_{(t-\theta +1):t}| < \frac {K-1}{K}N$ and $|\Bar {[m]}_{t-\theta }| < \frac {K-1}{K}N$, as $(d*\theta )^{1/4}\lambda _{B} \rightarrow \infty $,

(4.5)

Theorem 9.

Let $\hat \tau $ be the estimated change-point from Algorithm 4. There exists such that if $|\Bar {[m]}_{(t-\theta +1):t}| < \frac {K-1}{K}N$ and $|\Bar {[m]}_{t-\theta }| < \frac {K-1}{K}N$, as $(d*\theta )^{1/4}\lambda _{B} \rightarrow \infty $, .

The proof of Theorem 8 follows from Lemmas 1 and 3, where as, proof of Theorem 9 follows from Lemmas 1, 3 and Theorem 10. The details of the proofs are given in the Appendices A.9- A.12.

Lemma 3.

For any t:

$$ \begin{aligned} \mathcal{V}\mathcal{I}(\tilde{\mathbf{\xi}}_{t},\hat{\mathbf{\xi}}_{t-\theta})\!&\leqslant \mathcal{V}\mathcal{I}(\mathbf{\xi}_{(t-\theta+1):t},\mathbf{\xi}_{t-\theta}) + \mathcal{V}\mathcal{I}(\hat{\mathbf{\xi}}_{(t-\theta+1):t},\mathbf{\xi}_{(t-\theta+1):t}) + \mathcal{V}\mathcal{I}(\hat{\mathbf{\xi}}_{t-\theta},\mathbf{\xi}_{t-\theta})\\ \mathcal{V}\mathcal{I}(\tilde{\mathbf{\xi}}_{t},\hat{\mathbf{\xi}}_{t-\theta})&\!\geqslant \mathcal{V}\mathcal{I}(\mathbf{\xi}_{(t-\theta+1):t},\mathbf{\xi}_{t-\theta}) - \mathcal{V}\mathcal{I}(\hat{\mathbf{\xi}}_{(t-\theta+1):t},\mathbf{\xi}_{(t-\theta+1):t}) - \mathcal{V}\mathcal{I}(\hat{\mathbf{\xi}}_{t-\theta},\mathbf{\xi}_{t-\theta}). \end{aligned} $$

(4.6)

Theorem 10.

There exists $\theta \in \mathbb {Z}^{+}\backslash \{1\}$, and such that for any $\tau + \delta _{1} \leqslant \ t \leqslant \tau + \delta _{2}$, as $(d*\theta )^{1/4}\lambda _{B} \rightarrow \infty $

(4.7)

Theorems 8 and 10 state that a change-point is inferred when (1) $\mathcal {V}\mathcal {I}(\tilde {\mathbf {\xi }}_{t},\hat {\mathbf {\xi }}_{t-\theta })$ crosses its upper bound in Theorem 8 and (2) there is a subsequent increase in $\mathcal {V}\mathcal {I}(\tilde {\mathbf {\xi }}_{t},\hat {\mathbf {\xi }}_{t-\theta })$ in $(\hat \tau +\delta _{1},\hat \tau +\delta _{2})$. Theorem 9 indicates that the estimated change-point is not far from the true change-point.

4.3 Theoretical Results for the $\hat K$-based Algorithm

We prove theoretical results for the change-point estimate $\hat {\tau }$ obtained from Algorithm 5 for the MSBM defined in Section 2.3.

4.3.1 Notation

Define d := (a + (K − 1)b)/K as the expected degree of each node in each layer, $\bar d^{S}:=\frac {1}{N|S|}{\mathbf {1}_{N}^{T}}\mathbf {A}^{S}\mathbf {1}_{N}$ as the average observed degree for a specific sequence, S, of layers, and ${{\varDelta }} := \sqrt {N/K}\mathbf {I}_{K}$. Define $\mathbf {W}:=\boldsymbol {{\varDelta }}^{-1}\mathbf {Z}^{T}\tilde {\mathbf {Z}}\boldsymbol {{\varDelta }}^{-1}=\frac {1}{K}\mathbf {1}_{K}{\mathbf {1}_{K}^{T}}$. Recall that $\mathbf {Z} \in {\mathscr{M}}_{N,K}$ (resp. $\tilde {\mathbf {Z}}$) denotes the membership matrix before (resp. after) the change-point τ ∈ [T].

4.3.2 Preliminary Results

We start with a few results on the eigenstructure of aggregated adjacency matrices.

Lemma 4.

Under the conditions (F) and (G) on the MSBM, if no change-point is present within the layers in [T], then and .

Lemma 5.

For network sequences generated from MSBM with conditions (F) and (G), if Z (resp. $\tilde {\mathbf {Z}}$) denotes the membership matrix before (resp. after) the change-point τ ∈ [T], then the spectral decomposition of $t\mathbf {Z}\mathbf {B}\mathbf {Z}^{T}+(T-t)\tilde {\mathbf {Z}}\mathbf {B}\tilde {\mathbf {Z}}^{T}$ is given by

$$ \begin{array}{@{}rcl@{}} &\mathbf{X} \begin{bmatrix} Td & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & t\frac{a-b}{K}\mathbf{I}_{K-1} & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & (T-t)\frac{a-b}{K}\mathbf{I}_{K-1} \end{bmatrix}\mathbf{X}^{T}, \text{ where } \\ & \mathbf{X}:=\begin{bmatrix} \frac{1}{\sqrt N}\mathbf{1}_{N} &\mathbf{Z}\boldsymbol{{\varDelta}}^{-1}\mathbf{U} & (\tilde{\mathbf{Z}}\boldsymbol{{\varDelta}}^{-1}-\mathbf{Z}\boldsymbol{{\varDelta}}^{-1}\mathbf{W})\mathbf{U} \end{bmatrix}, \end{array} $$

and is such that $\mathbf {U}^{T}\mathbf {U}=\mathbf {I}_{K-1}$ and

$$ \left[\mathbf{U}\quad \frac{1}{\sqrt{k}}\mathbf{1}_{K}\right] \text{ is an orthogonal matrix, so } \mathbf{U}\mathbf{U}^{T}=\mathbf{I}_{K}-\frac{1}{K}\mathbf{1}_{K}{\mathbf{1}_{K}^{T}}.$$

4.3.3 Theoretical Properties of the Change-Point Estimator $\hat {\tau }$ in Algorithm 5

Now using the results on the eigen-structure of the aggregated adjacency matrix A^[T], we show that Algorithm 5 estimates the change-point τ with high probability.

Theorem 11.

Suppose ψ_±(r) := r + (1 ± 1/4)Td/r. There are constants c, C > 0 such that if (a) $Td > C\log N$, (b) either $\psi _{+}(r)\leqslant \min \limits \{t, T-t\}\frac {a-b}{K}-T\frac aN$ or $\max \limits \{t, T-t\}\frac {a-b}{K}-T\frac aN<\psi _{+}(r)\leqslant T\frac {a-b}{K}-T\frac aN$, and (c) ψ₋(r) > Ta/N, then the values of $\hat K_{r}:= |\{\ell \in [N]: \lambda _{\ell }^{\uparrow }(H^{[T]}_{r}) < 0\}|$ in the two cases (i) no change-point within layers [T] and (ii) one change-point at layer t ∈ [T] are different with probability $\geqslant 1-cNe^{-2Td/C}$.

Theorem 11 indicates that if there is a change at a layer t of the number of negative eigenvalues, $\hat {K}_{r}$, of a Bethe Hessian matrix with r in a specific range, it implies the presence of a change-point at the instance t. So, with high probability t becomes an estimate of the change-point τ.

5 Simulation

5.1 Simulation Setup

We simulate network data under the frameworks of MSBM and MDBM with N = 1200 nodes, T = 40 layers and the numbers of communities K ∈{3,5}. For membership vectors, we set $\mathbf {Z}^{(t)}_{N\times K^{(t)}}=\mathbf {Z}$ for t < τ and $\mathbf {Z}^{(t)}_{N\times K^{(t)}}=\tilde {\mathbf {Z}}$ for $t\geqslant \tau $, where τ is the change-point. We generate Z_i∗ from $\text {Mult}\left (1;\left (\frac {1}{K},...,\frac {1}{K}\right )\right )$. We fix the community change ratio Ξ ∈ [0,1] and form $\tilde {\mathbf {Z}}$ by (1) randomly sampling nodes from each community with probability Ξ to obtain a node subset of size $N^{\prime }$ from [N] and (2) for each k ∈ [1, K] and each index i of the sampled nodes, change Z_i∗ from e_k to e_m, m≠k, where m is chosen from {[1, K], m≠k} with probability 1/(K − 1). We set the connectivity probability matrix as $\mathbf {B} := \rho [C (\mathbf {I}_{K}) + b(\mathbf {1}_{K}{\mathbf {1}_{K}^{T}})]$, where ρ controls the expected degree of each node at each layer, C controls the in/out ratio and thus determines the degree of assortativity, and b is the baseline value set to 0.1. For MDBM networks, we consider degree parameters $\psi _{i} \sim \text {Uniform}(0.6,1)$. To simulate networks with different levels of sparsity and the prominence of the community structure, we consider ρ ∈{0.0059,0.025,0.05} and C ∈{0.4,0.5,0.6}. For change-points, we let τ ∈{1,5,10,20} and for the community change ratio, we consider Ξ ∈ (0.25,0.5,0.75). Table 1 summarizes the model parameter settings used in the simulations. To examine the performance of Algorithms 3, 4, and 5 under different settings, we use three different windows 𝜃 ∈{2,4,6} when running each algorithm. For the clustering steps of Algorithms 3 and 4, we apply Algorithm 1. To calculate the quantities in Eqs. 3.7 and 3.8, we followed the suggestions in (Bhattacharyya and Chatterjee, 2020a) and (Bhattacharyya and Chatterjee, 2020b) and chose Δ = 9, δ = 0.01 and η = 0.01 for Algorithms 3 and 4. For Algorithm 3, we let C = 0.00005, C₁ = 0.03. For Algorithm 4, we chose C = 0.00025, C₁ = 1/6. Under these settings, $\mathcal {V}\mathcal {I}$ performed as expected according to Theorem 4.

Table 1 Model Parameter Settings for Simulations

Full size table

5.2 Evaluation

As indicators of the quality of input signals for change-point detection, we consider the in/out ratio defined as π := (C + 0.1)/0.1. We assess the performance of each algorithm based on the following three criteria:

1.
False positive (FP): $\hat {\tau }$ is treated as a FP if ∅ is not the output when there is no change-point, and if $\hat {\tau }\notin [\tau ,\tau +2\theta -2]$ when there is a change-point.
2.
False negative (FN): $\hat {\tau }$ is treated as a FN if ∅ is the output or if $\hat {\tau }\notin [\tau ,\tau +2\theta -2]$ when there is a change-point.
3.
Delay $:= \hat {\tau }-\tau $ when $\hat {\tau }\in [\tau ,\tau +2\theta -2]$.

5.3 Results

We evaluated Algorithms 3, 4, and 5 on synthetic networks generated under all four settings in Table 1. Below is a presentation based on networks generated under Setting D. Performance results for the other settings are presented in the Appendix C. We use the evaluation criteria in Section 5.2 and assess the performance by varying the following model parameters: (i) the community change ratio Ξ; (ii) the in/out ratio π; and (iii) the network sparsity parameter ρ. FP, FN, and delay of Algorithms 3, 4, and 5 with varying levels of Ξ are shown in Fig. 2, the impact of π on the performance is shown in Fig. 3, and ρ in Fig. 4. In Figs. 2 through 4, FP and FN are shown in Subfigure (A) while Delay are shown in Subfigure (B).

In Figs. 2 through 4, FP, FN, and delay of Algorithms 3, 4, and 5 versus the window 𝜃 are shown. Higher 𝜃 results in higher accuracy as measured by FP and FN while leading to an increase in delay, since the weight of each new layer gets relatively smaller with larger 𝜃. This can be confirmed in Subfigure (A) in Figs. 2, 3, and 4 where all algorithms show better performance with 𝜃 = 6 while Subfigure (B) show less delay with 𝜃 = 2 for all algorithms. Hence there is a accuracy-delay trade-off in the choice of 𝜃. Noticing that the maximal value of Delay is 𝜃, a general strategy is to choose 𝜃 that equals to the maximal acceptable Delay.

It can be seen from Subfigure (A) in Figs. 2, 3, and 4 that all algorithms tend to have better accuracy (as reflected by FP and FN) on networks with larger values of either Ξ, π or ρ, especially when 𝜃 is set to a large value. However, as can be confirmed in Subfigure (B) in Figs. 2, 3, and 4, there is no evidence of a decrease in Delay for networks with larger values of Ξ, π or ρ. There are multiple factors that can affect the Delay of the algorithms. Take Algorithm 3 as an example. The idea of the algorithm is to compare clustering estimation based on historical and updated sub-sequences of layers of networks. This means that when there is an increase in Ξ, π or ρ, clustering estimations become more precise on the one hand, while more layers of networks after the change-point are needed in the updated sub-sequence.

It is evident in Figs. 2 and 4 that while all three algorithms perform similarly in sparse networks (ρ = 0.006), the $\hat K$-based Algorithm (5) outperforms the $\hat {\mathbf {Z}}$-based Algorithms (3 and 4) for higher values of ρ. That $\hat K$-based algorithm performs better than $\hat {\mathbf {Z}}$-based algorithms makes sense since community structure estimation is generally harder than community number estimation.

6 Conclusion

In this paper, we proposed three algorithms based on two broad approaches for online change-point detection for network sequences with community structure. Two of the three algorithms are based on the change in estimated community structures and the third based on the change in the estimated numbers of communities in networks represented as the Bethe Hessian matrices. We proved theoretical results showing the efficacy of the change-point estimates asymptotically as aggregated degree of the networks grows to infinity. We proved the theoretical results for general cases of both the MSBM and MDBM for Algorithms 3 and 4. For Algorithm 5, we proved the theoretical results for the special case of the MSBM.

One of the obvious next steps would be to generalize the theoretical framework for Algorithm 5 and show that the change-point estimate can recover the change-point under a much general model set up. Tackling dependent sequence of networks will also be another promising set of future problems.

References

Aminikhanghahi, S. and Cook, D.J. (2017). A survey of methods for time series change point detection. Knowl. Inf. Syst. 51, 339–367.
Article Google Scholar
Angel, O., Friedman, J. and Hoory, S. (2015). The non-backtracking spectrum of the universal cover of a graph. Trans. Am. Math. Soc. 367, 4287–4318.
Article MathSciNet MATH Google Scholar
Bai, J. and Perron, P. (1998). Estimating and testing linear models with multiple structural changes. Econometrica 47–78.
Bao, W. and Michailidis, G. (2018). Core community structure recovery and phase transition detection in temporally evolving networks. Sci. Rep. 8, 1–16.
Article Google Scholar
Bhamidi, S., Jin, J., Nobel, A. et al. (2018). Change point detection in network models: Preferential attachment and long range dependence. Ann. Appl. Probab. 28, 35–78.
Article MathSciNet MATH Google Scholar
Bhattacharjee, M., Banerjee, M. and Michailidis, G. (2018). Change point estimation in a dynamic stochastic block model. arXiv:1812.03090.
Bhattacharyya, S. and Chatterjee, S. (2020). Consistent recovery of communities from sparse multi-relational networks: a scalable algorithm with optimal recovery conditions. Complex networks XI, pp. 92–103. Springer.
Bhattacharyya, S. and Chatterjee, S. (2020). General community detection with optimal recovery conditions for multi-relational sparse networks with dependent layers.
Bickel, P.J. and Sarkar, P. (2016). Hypothesis testing for automated community detection in networks. J. R. Stat. Soc. Ser. B Stat. Methodol. 78, 253–273.
Article MathSciNet MATH Google Scholar
Bleakley, K. and Vert, J.P. (2011).
Blonder, B., Wey, T.W., Dornhaus, A., James, R. and Sih, A. (2012). Temporal dynamics and network analysis. Methods Ecol. Evol. 3, 958–972.
Article Google Scholar
Bordenave, C., Lelarge, M. and Massoulié, L. (2015). Non-backtracking spectrum of random graphs: community detection and non-regular ramanujan graphs. In 2015 IEEE 56Th annual symposium on foundations of computer science, pp. 1347–1357. IEEE.
Bosc, M., Heitz, F., Armspach, J.P., Namer, I., Gounot, D. and Rumbach, L. (2003). Automatic change detection in multimodal serial mri: application to multiple sclerosis lesion evolution. Neuroimage 20, 643–656.
Article Google Scholar
Brodsky, E. and Darkhovsky, B.S. (2013). Nonparametric methods in change point problems, vol. 243 Springer Science & Business Media.
Bruna, J. and Li, X. (2017). Community detection with graph neural networks. Stat. 1050, 27.
Google Scholar
Van de Bunt, G.G., Van Duijn, M.A. and Snijders, T.A. (1999). Friendship networks through time: an actor-oriented dynamic statistical network model. Comput. Math. Organ. Theory 5, 167–192.
Article MATH Google Scholar
Cape, J., Tang, M. and Priebe, C.E. (2017). The kato–temple inequality and eigenvalue concentration with applications to graph inference. Electron. J. Stat. 11, 3954–3978.
Article MathSciNet MATH Google Scholar
Celik, T. (2009). Unsupervised change detection in satellite images using principal component analysis and k-means clustering. IEEE Geosci. Remote Sens. Lett. 6, 772–776.
Article Google Scholar
Celik, T. (2010). Image change detection using gaussian mixture model and genetic algorithm. J. Vis. Commun. Image Represen. 21, 965–974.
Article Google Scholar
Chen, H. et al. (2019). Sequential change-point detection based on nearest neighbors. Ann. Stat. 47, 1381–1407.
Article MathSciNet MATH Google Scholar
Chen, J. and Gupta, A.K. (2011). Parametric statistical change point analysis: with applications to genetics, medicine, and finance. Springer Science & Business Media.
Chen, S., Ilany, A., White, B.J., Sanderson, M.W. and Lanzas, C. (2015). Spatial-temporal dynamics of high-resolution animal networks: what can we learn from domestic animals? PloS one 10(6).
Cho, H. and Fryzlewicz, P. (2015). Multiple-change-point detection for high dimensional time series via sparsified binary segmentation. J. R. Stat. Soc. Ser. B Stat. Methodol. 77, 475–507.
Article MathSciNet MATH Google Scholar
Coste, S. and Zhu, Y. (2019). Eigenvalues of the non-backtracking operator detached from the bulk. arXiv:1907.05603.
Cribben, I., Haraldsdottir, R., Atlas, L.Y., Wager, T.D. and Lindquist, M.A. (2012). Dynamic connectivity regression: determining state-related changes in brain connectivity. Neuroimage 61, 907–920.
Article Google Scholar
Dall’Amico, L. and Couillet, R. (2019). Community detection in sparse realistic graphs: Improving the bethe hessian. In ICASSP 2019-2019 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp. 2942–2946. IEEE.
Dall’Amico, L., Couillet, R. and Tremblay, N. (2019). Revisiting the bethe-hessian: improved community detection in sparse heterogeneous graphs. In Advances in neural information processing systems, pp. 4039–4049.
Dall’Amico, L., Couillet, R. and Tremblay, N. (2020). Optimal laplacian regularization for sparse spectral community detection. In ICASSP 2020-2020 IEEE International conference on acoustics, speech and signal processing (ICASSP, pp. 3237–3241. IEEE.
Ferraz Costa, A., Yamaguchi, Y., Juci Machado Traina, A., Traina, Jr C. and Faloutsos, C. (2015). Rsc: Mining and modeling temporal activity in social media. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 269–278. ACM.
Gao, C. and Lafferty, J. (2017). Testing network structure using relations between small subgraph probabilities. arXiv:1704.06742.
Gates, M.C. and Woolhouse, M.E. (2015). Controlling infectious disease through the targeted manipulation of contact network structure. Epidemics 12, 11–19.
Article Google Scholar
Girshick, M.A. and Rubin, H. (1952). A bayes approach to a quality control model. Ann. Math. Stat., 114–125.
Gulikers, L., Lelarge, M. and Massoulié, L. (2016). Non-backtracking spectrum of degree-corrected stochastic block models. arXiv:1609.02487.
Harchaoui, Z., Vallet, F., Lung-Yut-Fong, A. and Cappé, O. (2009). A regularized kernel-based approach to unsupervised audio segmentation. In 2009 IEEE International conference on acoustics, speech and signal processing, pp. 1665–1668. IEEE.
Hashimoto, K.I. (1989). Zeta functions of finite graphs and representations of p-adic groups. In Automorphic forms and geometry of arithmetic varieties, pp. 211–280. Elsevier.
Hocking, T.D., Schleiermacher, G., Janoueix-Lerosey, I., Boeva, V., Cappo, J., Delattre, O., Bach, F. and Vert, J.P. (2013). Learning smoothing models of copy number profiles using breakpoint annotations. BMC Bioinform. 14, 164.
Article Google Scholar
Hogg, T. and Lerman, K. (2012). Social dynamics of digg. EPJ Data Sci. 1, 5.
Article Google Scholar
Holme, P. (2015). Modern temporal network theory: a colloquium. Eur. Phys. J. B 88, 234.
Article Google Scholar
Holme, P. and Saramäki, J. (2012). Temporal networks. Phys. Rep. 519, 97–125.
Article Google Scholar
Jacobs, A.Z., Way, S.F., Ugander, J. and Clauset, A. (2015). Assembling thefacebook: Using heterogeneity to understand online social network assembly. In Proceedings of the ACM Web Science Conference, pp. 1–10.
Jin, J., Ke, Z. and Luo, S. (2018). Network global testing by counting graphlets. In International conference on machine learning, pp. 2333–2341.
Kasetkasem, T. and Varshney, P.K. (2002). An image change detection algorithm based on markov random field models. IEEE Trans. Geosci. Remote Sens.40, 1815–1823.
Article Google Scholar
Kolmogorov, A.N. (1950). Unbiased estimates. Izvestiya Rossiiskoi Akademii Nauk. Seriya Matematicheskaya 14, 303–326.
Google Scholar
Krings, G., Karsai, M., Bernhardsson, S., Blondel, V.D. and Saramäki, J. (2012). Effects of time window size and placement on the structure of an aggregated communication network. EPJ Data Sci. 1, 4.
Article Google Scholar
Krzakala, F., Moore, C., Mossel, E., Neeman, J., Sly, A., Zdeborová, L. and Zhang, P. (2013). Spectral redemption in clustering sparse networks. Proc. Natl. Acad. Sci. 110, 20935–20940.
Article MathSciNet MATH Google Scholar
Lahiri, M. and Berger-Wolf, T.Y. (2007). Structure prediction in temporal networks using frequent subgraphs. In 2007 IEEE Symposium on computational intelligence and data mining, p. 35–42. IEEE.
Lavielle, M. and Teyssiere, G. (2007). Adaptive detection of multiple change-points in asset price volatility. In Long memory in economics, pp. 129–156. Springer.
Le, C.M. and Levina, E. (2015). Estimating the number of communities in networks by spectral methods. arXiv:1507.00827.
Lei, J., Rinaldo, A. et al. (2015). Consistency of spectral clustering in stochastic block models. Ann. Stat. 43, 215–237.
Article MathSciNet MATH Google Scholar
Lévy-Leduc, C., Roueff, F. et al. (2009). Detection and localization of change-points in high-dimensional network traffic data. Ann. Appl. Stat. 3, 637–662.
Article MathSciNet MATH Google Scholar
Lorden, G. et al. (1971). Procedures for reacting to a change in distribution. Ann. Math. Stat. 42, 1897–1908.
Article MathSciNet MATH Google Scholar
Masuda, N. and Holme, P. (2017). Temporal network epidemiology. Springer.
Matteson, D.S. and James, N.A. (2014). A nonparametric approach for multiple change point analysis of multivariate data. J. Am. Stat. Assoc. 109, 334–345.
Article MathSciNet MATH Google Scholar
Meilă, M. (2007). Comparing clusterings-an information based distance. J. Multi. Anal. 98, 873–895.
Article MathSciNet MATH Google Scholar
Mislove, A.E. (2009). Online social networks: measurement, analysis, and applications to distributed information systems. Ph.D thesis.
Omodei, E., De Domenico, M.D. and Arenas, A. (2015). Characterizing interactions in online social networks during exceptional events. Front. Phys.3, 59.
Article Google Scholar
Padilla, O.H.M., Yu, Y. and Priebe, C.E. (2019). Change point localization in dependent dynamic nonparametric random dot product graphs. arXiv:1911.07494.
Page, E.S. (1954). Continuous inspection schemes. Biometrika 41, 100–115.
Article MathSciNet MATH Google Scholar
Page, E.S. (1957). On problems in which a change in a parameter occurs at an unknown point. Biometrika 44, 248–252.
Article MATH Google Scholar
Panisson, A., Gauvin, L., Barrat, A. and Cattuto, C. (2013). Fingerprinting temporal networks of close-range human proximity. In 2013 IEEE International conference on pervasive computing and communications workshops (PERCOM workshops), pp. 261–266. IEEE.
Park, H.J. and Friston, K. (2013). Structural and functional brain networks: from connections to cognition. Science 342, 1238411.
Article Google Scholar
Park, Y., Priebe, C.E. and Youssef, A. (2013). Anomaly detection in time series of graphs using fusion of graph invariants. IEEE J. Select. Top. Signal Process. 7, 67–75.
Article Google Scholar
Peel, L. and Clauset, A. (2015). Detecting change points in the large-scale structure of evolving networks. In AAAI, pp. 2914–2920.
Peixoto, T.P. (2015). Inferring the mesoscale structure of layered, edge-valued, and time-varying networks, Vol. 92.
Peixoto, T.P. and Gauvin, L. (2018). Change points, memory and epidemic spreading in temporal networks. Scient. Rep. 8, 15511.
Article Google Scholar
Picard, F., Robin, S., Lavielle, M., Vaisse, C. and Daudin, J.J. (2005). A statistical approach for array cgh data analysis. BMC Bioinform. 6, 27.
Article Google Scholar
Popović, M., Štefančić, H., Sluban, B., Novak, P.K., Grčar, M., Mozetič, I., Puliga, M. and Zlatić, V. (2014). Extraction of temporal networks from term co-occurrences in online textual sources. PloS one 9, e99515.
Article Google Scholar
Radke, R.J., Andra, S., Al-Kofahi, O. and Roysam, B. (2005). Image change detection algorithms: a systematic survey. IEEE Trans. Image Process. 14, 294–307.
Article MathSciNet Google Scholar
Ranshous, S., Shen, S., Koutra, D., Harenberg, S., Faloutsos, C. and Samatova, N.F. (2015). Anomaly detection in dynamic networks: a survey. Wiley Interdiscip. Rev. Comput. Stat. 7, 223–247.
Article MathSciNet Google Scholar
Reeves, J., Chen, J., Wang, X.L., Lund, R. and Lu, Q.Q. (2007). A review and comparison of changepoint detection techniques for climate data. J. Appl. Meteorol. Climatol. 46, 900–915.
Article Google Scholar
Rigbolt, K.T., Prokhorova, T.A., Akimov, V., Henningsen, J., Johansen, P.T., Kratchmarova, I., Kassem, M., Mann, M., Olsen, J.V. and Blagoev, B. (2011). System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation. Sci. Signal. 4, rs3–rs3.
Article Google Scholar
Rocha, L.E., Liljeros, F. and Holme, P. (2010). Information dynamics shape the sexual networks of internet-mediated prostitution. Proc. Natl. Acad. Sci.107, 5706–5711.
Article MATH Google Scholar
Rocha, L.E., Liljeros, F. and Holme, P. (2011). Simulated epidemics in an empirical spatiotemporal network of 50,185 sexual contacts. PLos computational biology 7(3).
Roy, S., Atchadé, Y. and Michailidis, G. (2017). Change point estimation in high dimensional markov random-field models. J. R. Stat. Soc. Ser. B Stat. Methodol. 79, 1187–1206.
Article MathSciNet MATH Google Scholar
Saade, A., Krzakala, F. and Zdeborová, L. (2014). Spectral clustering of graphs with the bethe hessian. Advances in neural information processing systems, pp. 406–414.
Saade, A., Krzakala, F. and Zdeborová, L. (2014). Spectral density of the non-backtracking operator on random graphs. EPL Europhys. Lett. 107, 50005.
Article Google Scholar
Salathé, M., Kazandjieva, M., Lee, J.W., Levis, P., Feldman, M.W. and Jones, J.H. (2010). A high-resolution human contact network for infectious disease transmission. Proc. Natl. Acad. Sci. 107, 22020–22025.
Article Google Scholar
Shiryaev, A.N. (1963). On optimum methods in quickest detection problems. Theory Probab. App. 8, 22–46.
Article MATH Google Scholar
Siegmund, D. (2013). Change-points: from sequential detection to biology and back. Seq. Anal. 32, 2–14.
Article MathSciNet MATH Google Scholar
Sikdar, S., Ganguly, N. and Mukherjee, A. (2016). Time series analysis of temporal networks. Eur. Phys. J. B 89, 11.
Article Google Scholar
Sporns, O. (2013). Structure and function of complex brain networks. Dialogues Clin. Neurosci. 15, 247.
Article Google Scholar
Staudacher, M., Telser, S., Amann, A., Hinterhuber, H. and Ritsch-Marte, M. (2005). A new method for change-point detection developed for on-line analysis of the heart beat variability during sleep. Phys. A Stat. Mech. Appl. 349, 582–596.
Article Google Scholar
Stopczynski, A., Sekara, V., Sapiezynski, P., Cuttone, A., Madsen, M.M., Larsen, J.E. and Lehmann, S. (2014). Measuring large-scale social networks with high resolution. PloS one 9, e95978.
Article Google Scholar
Thompson, W.H., Brantefors, P. and Fransson, P. (2017). From static to temporal network theory: Applications to functional brain connectivity. Netw. Neurosci. 1, 69–99.
Article Google Scholar
Viswanath, B., Mislove, A., Cha, M. and Gummadi, K.P. (2009). On the evolution of user interaction in facebook. In Proceedings of the 2nd ACM workshop on Online social networks, pp. 37–42.
Wang, D., Yu, Y. and Rinaldo, A. (2018). Optimal change point detection and localization in sparse dynamic networks. arXiv:1809.09602.
Wang, Y., Chakrabarti, A., Sivakoff, D. and Parthasarathy, S. (2017). Fast change point detection on dynamic social networks. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 2992–2998. AAAI Press.
Wang, Y.R., Bickel, P.J. et al. (2017). Likelihood-based model selection for stochastic block models. Ann. Stat. 45, 500–528.
Article MathSciNet MATH Google Scholar
Watanabe, Y. and Fukumizu, K. (2009). Graph zeta function in the bethe free energy and loopy belief propagation. Advances in neural information processing systems, pp. 2017–2025.
Wills, P. and Meyer, F.G. (2019). Change point detection in a dynamic stochastic blockmodel. International conference on complex networks and their applications, pp. 211–222. Springer.
Yang, P., Dumont, G. and Ansermino, J.M. (2006). Adaptive change detection in heart rate trend monitoring in anesthetized children. IEEE Trans. Biomed. Eng. 53, 2211–2219.
Article Google Scholar
Yedidia, J.S., Freeman, W.T. and Weiss, Y. (2003). Understanding belief propagation and its generalizations. Explor. Artif. Intell. New Millennium8, 236–239.
Google Scholar
Zhang, X., Shao, S., Stanley, H.E. and Havlin, S. (2014). Dynamic motifs in socio-economic networks. EPL Europhys. Lett. 108, 58001.
Article Google Scholar
Zhao, L., Wang, G.J., Wang, M., Bao, W., Li, W. and Stanley, H.E. (2018). Stock market as temporal network. Physic. A Stat. Mech. Appl.506, 1104–1112.
Article Google Scholar
Zhao, Z., Chen, L. and Lin, L. (2019). Change-point detection in dynamic networks via graphon estimation. arXiv:1908.01823.

Download references

Author information

Authors and Affiliations

City University of New York - Bronx Community College, New York, NY, USA
Neil Hwang
Oregon State University, Corvallis, OR, USA
Jiarui Xu & Sharmodeep Bhattacharyya
City University of New York - City College and Graduate Center, New York, NY, USA
Shirshendu Chatterjee

Authors

Neil Hwang
View author publications
You can also search for this author in PubMed Google Scholar
Jiarui Xu
View author publications
You can also search for this author in PubMed Google Scholar
Shirshendu Chatterjee
View author publications
You can also search for this author in PubMed Google Scholar
Sharmodeep Bhattacharyya
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sharmodeep Bhattacharyya.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

N. Hwang was partially supported by the Rich Internship awarded by the Department of Mathematics, City College of New York, CUNY, in Summer 2019, and S. Chatterjee was partially supported by the PSC-CUNY Enhanced Research Award #62781-00 50

Electronic supplementary material

Below is the link to the electronic supplementary material.

(PDF 6.11 MB)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hwang, N., Xu, J., Chatterjee, S. et al. The Bethe Hessian and Information Theoretic Approaches for Online Change-Point Detection in Network Data. Sankhya A 84, 283–320 (2022). https://doi.org/10.1007/s13171-021-00248-1

Download citation

Received: 16 May 2020
Accepted: 16 March 2021
Published: 01 July 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s13171-021-00248-1

Keywords

AMS (2000) subject classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

The Bethe Hessian and Information Theoretic Approaches for Online Change-Point Detection in Network Data

Abstract

Similar content being viewed by others

Community-based time segmentation from network snapshots

Change Point Detection in a Dynamic Stochastic Blockmodel

Multiple change points detection and clustering in dynamic networks

1 Introduction

2 Background and Preliminaries

2.1 Network Sequence Data

2.2 Notations

2.3 Data Generative Models

2.4 Clustering Algorithms from Relevant Works

Definition 1

Definition 2

Theorem 1.

Theorem 2.

2.5 Background on Information Theory

Definition 3

Definition 4

Definition 5

Property 1.

2.6 The Bethe Hessian Matrix

2.6.1 Definitions

Definition 6

Definition 7

Definition 8

2.6.2 Background

3 Online Change-point Detection Methods

3.1 \(\hat {\mathbf {Z}}\)-based Change-Point Detection Algorithms

3.1.1 Upper Bound on \(\mathcal {V}\mathcal {I}(\hat {\mathbf {Z}},\mathbf {Z})\)

Theorem 3.

Theorem 4.

3.1.2 Two Algorithms

Remark 1.

Remark 2.

Remark 3.

Remark 4.

3.2 \(\hat {K}\)-based Change-Point Detection Algorithm

Remark 5.

Remark 6.

Remark 7.

Remark 8.

4 Theoretical Results

4.1 Assumptions

4.1.1 Assumptions for Theoretical Results for the \(\hat {\mathbf {Z}}\)-based Algorithms

4.1.2 Assumptions for Theoretical Results for the \(\hat {K}\)-based Algorithm

4.2 Theoretical Results for the \(\hat {\mathbf {Z}}\)-based Algorithms

4.2.1 Theoretical Properties of \(\hat {\tau }\) in Algorithms 3 and 4

Lemma 1.

Theorem 5.

Theorem 6.

Lemma 2.

Theorem 7.

Theorem 8.

Theorem 9.

Lemma 3.

Theorem 10.

4.3 Theoretical Results for the \(\hat K\)-based Algorithm

4.3.1 Notation

4.3.2 Preliminary Results

Lemma 4.

Lemma 5.

4.3.3 Theoretical Properties of the Change-Point Estimator \(\hat {\tau }\) in Algorithm 5

Theorem 11.

5 Simulation

5.1 Simulation Setup

5.2 Evaluation

5.3 Results

6 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Electronic supplementary material

(PDF 6.11 MB)

Rights and permissions

About this article

Cite this article