1 Introduction

The popularity of web, mobile phones, and other portable devices has propelled the growth of large-scale networks such as Facebook and Twitter, as well as a wide range of online content sharing and crowdsourcing services. These networks are dynamically evolving as users join and leave, and as their traffic of interactions varies. To characterize the strength or health of these large-scale networks, we need some measures to tell us their robustness.

Network robustness measures how well network structure is strong and healthy when it is under attack, such as vertices joining and leaving. The ability to measure the robustness of networks can benefit several useful applications. For example, in a phone call network, dense and frequent calls among users reduce the likelihood of churn. A similar comment can be made for online social networks. For another example, the robustness of IP networks affects service quality and security. Service providers therefore aim to monitor, manage, and optimize their networks to keep their networks robust. Network robustness is also studied in other applications, such as disease transmission [3, 12] and network security [17].

Evaluating the robustness of a weighted network is a natural problem because edges of the network may associate with attached information, such as times of interactions in Telcom network, # retweets between Twitterers, and # adoptions between a user and an item for online shopping web site. As today’s networks are usually of very large scale, efficiently measuring robustness of weighted networks is therefore a challenge. The naive way is to extend existing robustness measures to evaluate the robustness of weighted networks. They include node connectivity [11], edge connectivity [11], and algebraic connectivity [13]. Node (or edge) connectivity \(\upsilon (G)\) (or \(\varepsilon (G)\)) of a weighted network G may be defined by the weights of nodes (or edges) that may be removed to break the networks into multiple connected components. Large node and edge connectivity values suggest that a network is robust. Algebraic connectivity \(\lambda (G)\) may be defined by the second smallest eigenvalue of the Laplacian matrix (defined in Sect. 3) of the weighted network.

In combinatorics, an expander network is a connected and undirected network in which every small subset of the vertex set has a large boundary. The goodness (or robustness) of an expander network can be measured by Cheeger constant [29], vertex expansion [5] and edge expansion [18]. Let \(G = (V, E, W)\) be a connected, undirected, and weighted network. Cheeger constant h(G), vertex expansion \(h_v(G)\), and edge expansion \(h_e(G)\) may be defined in Eqs. (1), (2) and (3).

$$\begin{aligned} h(G)&= \min _{S\subset V}\frac{|\partial (S)|}{\min \{{\textit{vol}}(S), {\textit{vol}}({\overline{S}})\}}, \end{aligned}$$
(1)
$$\begin{aligned} h_v(G)&= \min _{S\subset V,0< |S|\le \frac{|V|}{2}}\frac{|\partial _{{\textit{out}}}(S)|}{|S|},\end{aligned}$$
(2)
$$\begin{aligned} h_e(G)&= \min _{S\subset V, 0< |S|\le \frac{|V|}{2}}\frac{|\partial (S)|}{|S|}, \end{aligned}$$
(3)

where the symbols can be found in Table 1. \({\textit{vol}}(S)\) (\({\textit{vol}}({\overline{S}})\)) is the total weighted degrees of vertices in S (complement of S), \(\partial (S)\) is the edge boundary of S (i.e., the set of edges with exactly one endpoint in S), and \(\partial _{{\textit{out}}}(S)\) is the outer vertex boundary of S (i.e., the set of vertices in \(V\backslash S\) with at least one neighbor in S).

Table 1 The description of symbols in Eqs. (1), (2) and (3)

The existing measures may have the following shortcomings:

  • They are only applicable to connected networks. Even though a highly robust giant component exists in a network with very few connected components, the network is considered not robust at all as all these measures return zero values.

  • They quantify robustness using specific (optimal) combinations of vertices (for node connectivity), specific combination of edges (for edge connectivity), and specific eigenvalue (for algebraic connectivity).

  • Even for connected network, they are difficult to scale for large networks of millions vertices. For algebraic connectivity, we need to compute the second smallest eigenvalue of the Laplacian matrix. For node connectivity, edge connectivity, Cheeger constant, vertex expansion, and edge expansion, we have to check all cuts of the weighted network. These are all time-consuming measurements.

In this paper, we aims to extend our proposed \({\mathcal {R}}\)-energy in [14] to measure robustness for the weighted networks. Comparing to the original work, the main contributions of this version can be summarized as follows:

  • We propose \({\mathcal {R}}\)-energy as an efficient measure for weighted network robustness. The new measure, defined based on the normalized Laplacian matrix, demonstrates several nice properties. It can also handle networks with multiple connected components and can be computed with good time complexity \(O(|V| + |E|)\), where V and E are sets of vertices and edges in a weighted network.

  • We find that R-energy can be used to monitor the robustness for dynamic networks. In Theorem 3, we have proved that we can incrementally and efficiently compute the R-energy for dynamic networks with vertex or edge modification, such as insert and delete.

  • We further apply \({\mathcal {R}}\)-energy to a dynamic Twitter community with about 130K users to detect events and regular trend patterns that affect the weighted network robustness. We empirically show that more events can be detected from the reply network in this extended version. This points to the positive effect of defining R-energy on a weighted network.

The remainder of the paper is organized as follows. We first review related work in Sect. 2. We then introduce some basic notations in Sect. 3, before presenting \({\mathcal {R}}\)-energy and its algorithm in Sect. 4. We illustrate some important properties of \({\mathcal {R}}\)-energy in Sect. 5 and demonstrate some observations and the performance of \({\mathcal {R}}\)-energy on both synthetic and real networks in Sect. 6. Before we conclude this paper in Sect. 7, we illustrate some patterns and events found using \({\mathcal {R}}\)-energy on a dynamic Twitter user community.

2 Related work

2.1 Robustness

The traditional network robustness measures, node connectivity, and edge connectivity were proposed by Dekker and Colbert [11]. Network expansion can also be used to measure network robustness. Different formulations of expander give rise to different measures of expander, e.g., edge expansion [18], vertex expansion [5], and spectral expansion [23]. Larger edge or vertex expansions indicate less bottleneck inside a network.

Jamakovic and Mieghem proposed to use the second smallest eigenvalue of the Laplacian matrix also known as algebraic connectivity to measure network robustness [13, 19]. Malliaros et al. [23] described the relationship between algebraic connectivity and node/edge connectivities. According to Cheeger’s inequality, Chung found that the expansion of a network is closely related to the spectral gap between the largest and the second largest eigenvalues of adjacency matrix. Malliaros et al. confirmed the findings of Chung in [23]. This measure is, however, costly to be computed and is sensitive to the network size. Hence, it is not appropriate for comparing networks of different sizes. Albert et al. [1] used diameter to measure robustness of networks, but the measure does not capture network connectivity which should be considered in robustness measures. As mentioned in Sect. 1, they have some drawbacks to measure robustness of weighted networks.

2.2 Graph energy

The energy of a network has always been defined to be some form of deviation of eigenvalues of some network matrix from the mean of eigenvalues. For example, Gutman defined network energy on an adjacency matrix as the absolute deviation of eigenvalues from the mean of eigenvalues which is zero for any adjacency matrix [16]. In [26, 30], Laplacian energy has been defined on the combinatorial Laplacian matrix. In [7], normalized Laplacian energy is defined on the normalized Laplacian matrix in a similar manner.

Day and So studied network energy changes with edge or vertex removals [9, 10]. There are some existing works which derive the lower and upper bounds for different energy definitions including Gutman’s graph energy [2], Laplacian energy [26, 30, 31], and normalized Laplacian energy [7]. They are not appropriate measures for network robustness as computing them would be time costly.

3 Definition of \({\mathcal {R}}\)-energy

In this section, based on the normalized Laplacian matrix of a weighted network, we address how the eigenvalues of the normalized Laplacian of the weighted network are related to the structure of the network and define the \({\mathcal {R}}\)-energy to measure the robustness of the weighted network.

3.1 Normalized Laplacian

Consider an undirected network \(G=(V,E)\) with vertex set V and edge set E (Let \(|V| = n\)). Let \(A_{G}\) denote the adjacency matrix representing G and be defined as:

$$\begin{aligned} A_{G}(i,j):=\left\{ \begin{array}{ll} 1, &{}\quad \hbox {if}\;(v_i,v_j)\in E; \\ 0, &{}\quad \hbox {otherwise.} \end{array} \right. \end{aligned}$$

Definition 1

A weighted network G is triple \((V, E, W_G)\), where V and E are sets of vertices and edges, and each edge \((v_i, v_j)\in E\) associates with weight \(w_{ij}\). As such, \(W_G\) is a weight matrix defined as

$$\begin{aligned}W_G(i,j) =\left\{ \begin{array}{ll} w_{ij}, &{}\quad \hbox {if}\; (v_i, v_j)\in E;\\ 0, &{} \quad \hbox {otherwise.} \end{array} \right. \end{aligned}$$

Formally, we define the neighbor of \(v_i\) to be a vertex set as \(N(v_i):=\{v_j | (v_i, v_j) \in E\}\), and degree of \(v_i\) to be the total weight of vertices in \(N(v_i)\), denoted as \(d(v_i):=\sum _{v_j\in N(v_i)} w_{ij}\). We then define degree matrix \(D_{G}\) as

$$\begin{aligned} D_{G}(i,j):=\left\{ \begin{array}{ll} d(v_i), &{}\quad \hbox {if}\;i=j\;\hbox { and }\;v_i\in V; \\ 0, &{} \quad \hbox {otherwise.} \end{array} \right. \end{aligned}$$

A network is an unweighted network if all edges have the identical weight.

Based on matrices \(W_{G}\) and \(D_{G}\), we define the normalized Laplacian matrix of a weighted network in Definition 2.

Definition 2

Normalized Laplacian matrix \(N_{G}\) of a weighted network G with nonnegative weight matrix \(W_{G}\) is given by

$$\begin{aligned} N_{G} := I-D^{-1/2}_{G}W_{G}D^{-1/2}_{G}. \end{aligned}$$

We denote \(\zeta _1\le \zeta _2\le \cdots \le \zeta _n\) as the sequence of eigenvalues of \(N_{G}\).

The eigenvalues of the normalized Laplacian matrix satisfy three important properties:

Theorem 1

Let G be a weighted network of n vertices. The eigenvalues of \(N_{G}\) have

  1. 1.

    \(0 = \zeta _1 \le \zeta _2\le \frac{n}{n-1}\le \zeta _n \le 2;\)

  2. 2.

    \(\zeta _2 = \cdots = \zeta _n = \frac{n}{n-1}\) if and only if G is a clique of equal edge weights;

  3. 3.

    G has at least i connected components if and only if \(\zeta _j = 0\), for \(j = 1, 2, \ldots , i\).

We prove the theorem in Appendix. Property (1) illustrates that the second smallest and the largest eigenvalues range from 0 to \(\frac{n}{n-1}\) and \(\frac{n}{n-1}\) to 2, respectively. As a special case, when all except the smallest eigenvalue equal \(\frac{n}{n-1}\), the network is a clique as shown in Property (2). Property (3) states that each additional connected component corresponds to a zero eigenvalue. \(\zeta _1\) is therefore 0 in any weighted networks.

3.2 Definition of \({\mathcal {R}}\)-energy

According to Theorem 1, for a weighted network G that is sparsely connected and is far from being a clique, its \(\zeta _2\) is small, but \(\zeta _n\) is large. In contrast, a network that is densely connected and similar to a clique will have \(\zeta _2\) not much smaller than \(\zeta _n\). As such, a robust weighted network should have a small degree of dispersion on eigenvalues.

To evaluate the degree of dispersion of eigenvalues, we define robustness energy as follows:

Definition 3

Let G be a weighted network, the robustness energy (short in \({\mathcal {R}}\)-energy) of G is defined as:

$$\begin{aligned} {\mathbb {E}}_{{\mathcal {R}}}(G):= \frac{1}{n-1}\sum _{i=2}^{n}(\zeta _i-\overline{\zeta })^2 \end{aligned}$$

where \(\overline{\zeta } = \frac{1}{n-1}\sum _{i=2}^{n}\zeta _i\).

The definition of \({\mathcal {R}}\)-energy does not consider \(\zeta _1\) since its value is always zero. A weighted network is more robust if its \({\mathcal {R}}\)-energy is smaller. This is due to the factor that smaller dispersion of \(\zeta _2, \zeta _3, \ldots , \zeta _n\) implies that the weighted network is closer to a clique of equal edge weights. Furthermore, \({\mathcal {R}}\)-energy can be applied for measuring the robustness of both connected and disconnected networks. For a disconnected network, we may observe a large \({\mathcal {R}}\)-energy (less robust) since it has multiple zero eigenvalues.

4 Computation of \({\mathcal {R}}\)-energy

To compute the \({\mathcal {R}}\)-energy, the naive approach is to compute all eigenvalues of the normalized Laplacian matrix. As we known, computing all eigenvalues is computationally expensive. Based on the following theorem, we propose a simple and efficient approach to compute \({\mathcal {R}}\)-energy in \(O(|V| + |E|)\).

Theorem 2

Given that a weighted network of n vertices and its eigenvalues of the normalized Laplacian matrix are \(\zeta _1, \zeta _2, \zeta _3, \ldots , \zeta _n\), we have

  1. 1.

    the mean of eigenvalues \(\zeta _2, \zeta _3, \ldots , \zeta _n\), denoted as \(\overline{\zeta }\), is \(\frac{n}{n-1};\)

  2. 2.

    the \({\mathcal {R}}\)-energy can be computed as

    $$\begin{aligned} {\mathbb {E}}_{{\mathcal {R}}}(G) = \frac{1}{n-1} \sum _{(v_i,v_j)\in E}\frac{w_{G}(i,j)^2}{d(v_i)d(v_j)} - \frac{n}{(n-1)^2}. \end{aligned}$$
    (4)

Proof

According to Definition 2, entry \(N_{G}(i,j)\) of \(N_{G}\) is:

$$\begin{aligned} N_{G}(i,j)=\left\{ \begin{array}{ll} 1, &{}\quad \text {if}\; i = j\;\text { and }\;d(v_i)\ne 0;\\ -\frac{w_{G}(i,j)}{\sqrt{d(v_i)d(v_j)}}, &{}\quad \text {if}\; A_{G}(i,j)\ne 0;\\ 0, &{}\quad \mathrm{otherwise}. \end{array} \right. \end{aligned}$$
(5)

Note that each diagonal element of \(N_{G}\) is 1 and \(\zeta _1 = 0\), we have,

$$\begin{aligned} \frac{1}{n-1} \sum _{i=2}^{n}\zeta _i = \frac{1}{n-1} \sum _{i=1}^{n}\zeta _i= \frac{1}{n-1}\cdot tr(N_{G}) = \frac{n}{n-1}. \end{aligned}$$

where \(tr(N_{G})\) denotes the trace of matrix \(N_{G}\).

In terms of the value of \(\overline{\zeta }\), we now compute the \({\mathcal {R}}\)-energy:

$$\begin{aligned} {\mathbb {E}}_{{\mathcal {R}}}(G)&=\frac{1}{n-1}\sum _{i=2}^{n}\left( \zeta _i - \frac{n}{n-1}\right) ^2 \\&= \frac{1}{n-1}\sum _{i=1}^{n}\zeta _i^2 - \frac{n^2}{(n-1)^2}, \end{aligned}$$

where we have \(\sum _{i=1}^{n}\zeta _i^2 = tr(N(G)^2)\). To further compute, the ith diagonal element of \(N(G)^2\) is

$$\begin{aligned} \sum _{j=1}^{n}N_{G}(i,j)N_{G}(j,i) = \sum _{j\ne i}^{n}\frac{w_{G}(i,j)^2}{d(v_i)d(v_j)} + 1. \end{aligned}$$

Thus, we have:

$$\begin{aligned} {\mathbb {E}}_{{\mathcal {R}}}(G)&= \frac{1}{n-1}\sum _{i=1}^{n}\sum _{j\ne i}^{n}\frac{w_{G}(i,j)^2}{d(v_i)d(v_j)} - \frac{n}{(n-1)^2}.\\&= \frac{1}{n-1} \sum _{(v_i,v_j)\in E}\frac{w_{G}(i,j)^2}{d(v_i)d(v_j)} - \frac{n}{(n-1)^2}. \end{aligned}$$

\(\square \)

Theorem 2 indicates that we can avoid to calculate all eigenvalues for computing \({\mathcal {R}}\)-energy. In terms of the theorem, Algorithm 1 depicts the steps to compute the \({\mathcal {R}}\)-energy for a weighted network. The algorithm consists of two main steps. One is to compute the degree of vertices (Lines 1–3). The other one is to aggregate the value of \(\frac{w_{G}(i, j)^2}{{\textit{deg}}(v_i){\textit{deg}}(v_j)}\) for each edge at Lines 4–6. Both the time and space complexities of the algorithm are \(O(|V| + |E|)\).

figure g

4.1 2-step commute probability

Given a weighted network, it can be considered as a random walk, where its transition probability matrix \(P=(p_{ij})_{1\le i,j\le n}\) can be defined as

$$\begin{aligned}p_{ij}:=\left\{ \begin{array}{ll} \frac{w_{G}(i,j)}{d(v_i)}, &{}\quad \hbox {if}\;(v_i,v_j)\in E; \\ 0, &{} \quad \hbox {otherwise.} \end{array} \right. \end{aligned}$$

where \(p_{ij}\) denotes the probability of reaching \(v_j\) from \(v_i\) in one step.

Let \(p_{ij}^{(t)}\) denote the probability of reaching \(v_j\) from \(v_i\) in exactly t step. Specially, \(p_{ii}^{(2)}\) means the probability of returning \(v_i\) from \(v_i\) in exactly 2 steps, namely 2-step commute probability, i.e.,

$$\begin{aligned} P_{ii}^{(2)}=\sum _{j=1}^{n} p_{ij}\cdot p_{ji}. \end{aligned}$$

The probability is very important because it computes the possibility of a random walk returning back to vertex \(v_i\) after 2 steps when a walker starts at \(v_i\). For a well-connected vertex \(v_i\), a random walk is unlikely to return back to it if the walker starts at \(v_i\), i.e., small 2-step commute probability.

In fact, \({\mathcal {R}}\)-energy is related to the average 2-step commute probability of all vertices in G. That is,

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n} p_{ii}^{(2)} =\frac{1}{n}\sum _{i=1}^{n}\sum _{j\ne i}^{n} p_{ij}\cdot p_{ji} =\frac{1}{n}\sum _{i=1}^{n}\sum _{j\ne i}^{n}\frac{w_{G}(i,j)^2}{d(v_i)d(v_j)}. \end{aligned}$$

We rewrite Eq. (4) into Eq. (6).

$$\begin{aligned} {\mathbb {E}}_{{\mathcal {R}}}(G) = \frac{n}{n-1}\left( \frac{1}{n}\sum _{(v_i,v_j)\in E}\frac{w_{G}(i,j)^2}{d(v_i)d(v_j)} - \frac{1}{n-1}\right) \end{aligned}$$
(6)

The factor \(\frac{n}{n-1}\) in Eq. (6) can be considered as a reward factor for the weighted network of n vertices. Larger graphs are therefore more robust due to monotonically decreasing \(\frac{n}{n-1}\) as n increases. This factor facilitates the comparison of \({\mathcal {R}}\)-energy for networks with different sizes. Note that the 2-step commute probability of each vertex in a clique of equal edge weights with n vertices is \(\frac{1}{n-1}\). The right side of Eq. (6) is thus the difference between the average 2-step commute probability and the average 2-step community probability of a clique with the same size. Hence, the \({\mathcal {R}}\)-energy of G combines the reward of network size with the difference between the average 2-step commute probability of G and a clique with the same size.

4.2 \({\mathcal {R}}\)-energy for disconnected network

\({\mathcal {R}}\)-energy can measure the robustness of both connected and disconnected weighted networks. Suppose that network G has N connected components, denoted as \(\{C_k\}_{k=1}^{N}\). In Eq. (7), the energy is derived by weighted sum of the average 2-step commute probability of vertices from each connected component.

$$\begin{aligned} {\mathbb {E}}_{{\mathcal {R}}}(G)= \frac{n}{n-1} \left( \sum _{k=1}^{N}\frac{n_k}{n}P_{C_k} - \frac{1}{n-1}\right) \end{aligned}$$
(7)

where \(P_{C_k}\) is the average 2-step commute probability of vertices from connected component \(C_k\) in Eq. (8).

$$\begin{aligned} P_{C_k} = \frac{1}{n_k}\sum _{(v_i,v_j)\in C_k}\frac{w_{G}(i,j)^2}{d(v_i)d(v_j)}, k=1,\ldots , N \end{aligned}$$
(8)

\({\mathcal {R}}\)-energy therefore considers a large disconnected network G to be robust if G contains a robust giant component. This conclusion is reasonable and is consistent with our intuition.

For an unweighted network G, \(w_{G}(i, j) = 1\) if \(A_{G}(i, j) = 1\), otherwise 0. The \({\mathcal {R}}\)-energy can be computed in Eq. (9).

$$\begin{aligned} {\mathbb {E}}_{{\mathcal {R}}}(G) = \frac{1}{n-1}\sum _{(v_i,v_j)\in E}\frac{A_{G}(i, j)}{d(v_i)d(v_j)} -\frac{n}{(n-1)^2}. \end{aligned}$$
(9)

The result is consistent with our published work [14].

4.3 Computing \({\mathcal {R}}\)-energy in an incremental manner

Based on following theorem, we compute \({\mathcal {R}}\)-energy in an incremental manner. It means that \({\mathcal {R}}\)-energy can be applied for measuring robustness of dynamic networks.

Theorem 3

Given a weighted network \(G=(V,E,W)\) of n vertices and its \({\mathcal {R}}\)-energy denoted as \({\mathbb {E}}_{{\mathcal {R}}}(G)\), we have

  1. 1.

    if \(v_0\) is a new coming vertex, which connects vertex \(v_k\) with weight w in network G. The \({\mathcal {R}}\)-energy of the new formed network can be computed as

    $$\begin{aligned} \frac{n-1}{n}{\mathbb {E}}_{{\mathcal {R}}}(G)+\frac{2}{n}\varDelta (v_k)- \frac{1}{n^2(n-1)}. \end{aligned}$$
    (10)
  2. 2.

    if a new coming edge e with weight w connects vertices \(v_{i_0}\) and \(v_{j_0}\) in network G. The \({\mathcal {R}}\)-energy of the new formed network can be computed as

    $$\begin{aligned} \frac{n-1}{n}{\mathbb {E}}_{{\mathcal {R}}}(G)+\frac{2}{n}\big (\nabla (v_{i_0})+\nabla (v_{j_0})\big )- \frac{1}{n^2(n-1)}. \end{aligned}$$
    (11)

    where \(\varDelta (v_k)= -\sum _{v_j\in N(v_k)}\frac{w_{G}(j,k)^2w}{(d(v_k)+w)d(v_k)d(v_j)} + \frac{w}{d(v_k)+w}\) and \(\nabla (v_k)= -\sum _{v_j\in N(v_k)}\frac{w_{G}(k,j)^2w}{(d(v_k)+w)d(v_j)d(v_k)}+\frac{w^2}{(d(v_{i_0})+w)(d(v_{j_0})+w)}\).

Proof

  1. 1.

    For vertex \(v_k\), its degree changes to \(d(v_k)+w\). Furthermore, the difference of 2-step commute probability for vertex \(v_k\) is

    $$\begin{aligned} \varDelta (v_k)&\doteq \sum _{v_j\in N(v_k)}\frac{w_{G}(j,k)^2}{(d(v_k)+w)d(v_j)}+ \frac{w^2}{(d(v_k)+w)d(v_0)}-\sum _{v_j\in N(v_k)}\frac{w_{G}(j,k)^2}{d(v_k)d(v_j)} \\&=-\sum _{v_j\in N(v_k)}\frac{w_{G}(j,k)^2w}{(d(v_k)+w)d(v_k)d(v_j)}+\frac{w}{d(v_k)+w}. \end{aligned}$$

    For all neighbors of vertex \(v_k\), the difference of their 2-step commute probabilities is

    $$\begin{aligned}&\sum _{v_j\in N(v_k)}\frac{w_{G}(k,j)^2}{(d(v_k)+w)d(v_j)}+ \frac{w^2}{(d(v_k)+w)d(v_0)}-\sum _{v_j\in N(v_k)}\frac{w_{G}(k,j)^2}{d(v_k)d(v_j)}\\&\quad =-\sum _{v_j\in N(v_k)}\frac{w_{G}(j,k)^2w}{(d(v_k)+w)d(v_k)d(v_j)} + \frac{w}{d(v_k)+w} = \varDelta (v_k). \end{aligned}$$

    According to Eq. (6), \({\mathcal {R}}\)-energy of the new formed network can be computed as

    $$\begin{aligned} {\mathbb {E}}_{{\mathcal {R}}}(G+v_0)&= \frac{1}{n}\left( \sum _{(v_i,v_j)\in E}\frac{w_{G}(i,j)^2}{d(v_i)d(v_j)} +2\varDelta (v_k)\right) - \frac{n+1}{n^2}\\&=\frac{1}{n}\left( (n-1){\mathbb {E}}_{{\mathcal {R}}}(G)+\frac{n}{n-1}+2\varDelta (v_k)\right) - \frac{n+1}{n^2}\\&=\frac{n-1}{n}{\mathbb {E}}_{{\mathcal {R}}}(G)+\frac{2}{n}\varDelta (v_k)- \frac{1}{n^2(n-1)} \end{aligned}$$
  2. 2.

    For new formed edge \(e=(v_{i_0},v_{j_0})\), the difference of 2-step commute probabilities for vertices in \(N(v_{i_0})\) is

    $$\begin{aligned} \nabla (v_{i_0})&\doteq \sum _{v_k\in N(v_{i_0})}\frac{w_{G}(k,i_0)^2}{(d(v_{i_0})+w)d(v_k)}+\frac{w^2}{(d(v_{i_0})+w)(d(v_{j_0})+w)}\\&\quad -\sum _{v_k\in N(v_{i_0})}\frac{w_{G}(k,i_0)^2}{d(v_k)d(v_{i_0})} \\&=-\sum _{v_k\in N(v_{i_0})}\frac{w_{G}(k,i_0)^2w}{(d(v_{i_0})+w)d(v_k)d(v_{i_0})}+\frac{w^2}{(d(v_{i_0})+w)(d(v_{j_0})+w)}. \end{aligned}$$

    For vertex \(v_{i_0}\), the difference of 2-step commute probability is

    $$\begin{aligned} -\sum _{v_k\in N(v_{i_0})}\frac{w_{G}(i_0,k)^2w}{(d(v_{i_0})+w)d(v_k)d(v_{i_0})}+\frac{w^2}{(d(v_{i_0})+w)(d(v_{j_0})+w)}=\nabla (v_{i_0}). \end{aligned}$$

    Similarly, we can compute the difference of 2-step commute probability related to vertex \(v_{j_0}\). According to Eq. (6), \({\mathcal {R}}\)-energy of the new formed network can be computed as

    $$\begin{aligned} {\mathbb {E}}_{{\mathcal {R}}}(G+e)&= \frac{1}{n-1}\left( \sum _{(v_i,v_j)\in E}\frac{w_{G}(i,j)^2}{d(v_i)d(v_j)} +2\nabla (v_{i_0})+2\nabla (v_{j_0})\right) - \frac{n}{(n-1)^2}\\&=\frac{1}{n}\left( (n-1){\mathbb {E}}_{{\mathcal {R}}}(G)+\frac{n}{n-1}+2\nabla (v_{i_0})+2\nabla (v_{j_0})\right) - \frac{n+1}{n^2}\\&=\frac{n-1}{n}{\mathbb {E}}_{{\mathcal {R}}}(G)+\frac{2}{n}\big (\nabla (v_{i_0})+\nabla (v_{j_0})\big )- \frac{1}{n^2(n-1)} \end{aligned}$$

\(\square \)

In terms of Theorem 3, \({\mathcal {R}}\)-energy can be efficiently updated when vertices are added or edges are modified. As a result, we can compute R-energy for dynamic networks in an incremental and efficient manner.

4.4 Some important properties of \({\mathcal {R}}\)-energy

In this section, we show some properties of \({\mathcal {R}}\)-energy of a weighted network.

4.4.1 \({\mathcal {R}}\)-energy for complete networks

Theorem 4

Given a weighted network \(G = (V, E, W)\) of size n,

  1. 1.

    If G is a clique of equal edge weight, then \({\mathbb {E}}_{{\mathcal {R}}}(G) = 0;\)

  2. 2.

    If G is a biclique of equal edge weight, then \({\mathbb {E}}_{{\mathcal {R}}}(G) = \frac{n-2}{(n-1)^2};\)

Proof

As G is a clique of equal edge weight w, degree of each vertex is therefore \((n-1)w\).

$$\begin{aligned} {\mathbb {E}}_{{\mathcal {R}}}(G) = \frac{1}{n-1}\sum _{i=1}^{n}\sum _{j\ne i}^{n}\frac{w^2}{(n-1)^2 w^2} -\frac{n}{(n-1)^2} = 0. \end{aligned}$$

If G is a biclique of equal edge weight, let \(|V_1| = p\), \(|V_2| = q\) and weight be w. Degree of each vertex \(v_1\in V_1\) is qw, and degree of each vertex \(v_2\in V_2\) is pw. Then, \({\mathbb {E}}_{{\mathcal {R}}}(G)\) can be computed as:

$$\begin{aligned}&\frac{1}{n-1}\left( \sum _{i=1}^{p}\sum _{j\ne i}^{n}\frac{w^2}{pq w^2} + \sum _{i=p+1}^{p+q}\sum _{j\ne i}^{n}\frac{w^2}{pq w^2}\right) -\frac{n}{(n-1)^2}\\&\quad = \frac{1}{n-1}\left( \sum _{i=1}^{p}\frac{1}{p} + \sum _{i=p+1}^{p+q}\frac{1}{q}\right) -\frac{n}{(n-1)^2}\\&\quad = \frac{2}{n-1} -\frac{n}{(n-1)^2} = \frac{n-2}{(n-1)^2}. \end{aligned}$$

\(\square \)

This theorem indicates that a clique with equal edge weight is equivalent to an unweighted clique. For example, Fig. 1 shows two cliques of 4 vertices with different weights. In terms of Theorem 4 and Eq. (6), \({\mathcal {R}}\)-energies of two cliques in Fig. 1 are \({\mathbb {E}}_{{\mathcal {R}}}(G_a) = 0\) and \({\mathbb {E}}_{{\mathcal {R}}}(G_b) = \frac{8}{225}\). Network \(G_a\) is therefore more robust than \(G_b\). In addition, only clique with equal weight can achieve zero \({\mathcal {R}}\)-energy. For a biclique, it can be very robust if it is large in size.

Fig. 1
figure 1

Two cliques with different weights

4.4.2 Bounds of \({\mathcal {R}}\)-energy

Theorem 5

Let LB and UB be \(\frac{n[w_{{\textit{min}}}(n-1)-d_{{\textit{max}}}]}{d_{{\textit{max}}}(n-1)^2}\) and \(\frac{n[w_{{\textit{max}}}(n-1)-d_{{\textit{min}}}]}{d_{{\textit{min}}}(n-1)^2}\), respectively. If G is a connected and weighted network of n vertices, then

$$\begin{aligned} \max \big \{0,LB \big \} \le {\mathbb {E}}_{{\mathcal {R}}}(G) < \min \big \{1, UB\big \}, \end{aligned}$$
(12)

where \(w_{{\textit{min}}}\) and \(w_{{\textit{max}}}\) are the minimum and maximum weights of network G, and \(d_{{\textit{min}}}\) and \(d_{{\textit{max}}}\) are the minimum and maximum vertex degrees of network G.

Proof

At first, we bound the value of \({\textit{var}}(G) := \sum _{i=1}^{n}\sum _{j\ne i}^{n}\frac{w_{G}^2(i,j)}{d(v_i)d(v_j)}.\)

$$\begin{aligned} var(G)&= \sum _{i=1}^{n}\frac{1}{d(v_i)}\sum _{v_j\in N(v_i)}\frac{w_{G}^2(i,j)}{d(v_j)}\\&\le \sum _{i=1}^{n}\frac{w_{{\textit{max}}}}{d(v_i)}\sum _{v_j\in N(v_i)}\frac{w_{G}(i,j)}{d_{{\textit{min}}}} = \frac{nw_{{\textit{max}}}}{d_{{\textit{min}}}} \end{aligned}$$

Similarly, we have

$$\begin{aligned} {\textit{var}}(G) \ge \sum _{i=1}^{n}\frac{w_{{\textit{min}}}}{d(v_i)}\sum _{v_j\in N(v_i)}\frac{w_{G}(i,j)}{d_{{\textit{max}}}} = \frac{nw_{{\textit{min}}}}{d_{{\textit{max}}}} \end{aligned}$$

Thus, we have

$$\begin{aligned} \frac{nw_{{\textit{min}}}}{d_{{\textit{max}}}}\le {\textit{var}}(G) \le \frac{nw_{{\textit{max}}}}{d_{{\textit{min}}}} \end{aligned}$$

According to Theorem 2, we have

$$\begin{aligned} {\mathbb {E}}_{{\mathcal {R}}}(G) = \frac{{\textit{var}}(G)}{n-1} - \frac{n}{(n-1)^2} \end{aligned}$$

It is easy to get the bound of \({\mathbb {E}}_{{\mathcal {R}}}(G)\) as

$$\begin{aligned} LB\le {\mathbb {E}}_{{\mathcal {R}}}(G)\le UB \end{aligned}$$

Note that \({\mathbb {E}}_{{\mathcal {R}}}(G)\ge 0\) because it is defined as variance of \(\zeta _i\) for \(i = 2, 3, \ldots , n\). Therefore, we have the left side of Eq. (12). Furthermore, because \(0\le \frac{w_{G}(i,j)}{d(v_i)} \le 1\) and \(0\le \frac{w_{G}(i,j)}{d(v_j)}\le 1\), we have

$$\begin{aligned} {\textit{var}}(G)&= \sum _{i=1}^{n}\sum _{v_j\in N(v_i)}\frac{w_{G}(i,j)}{d(v_i)}\frac{w_{G}(i,j)}{d(v_j)} \\&\le \frac{1}{2}\sum _{i=1}^{n}\sum _{v_j\in N(v_i)}\left( \frac{w_{G}(i,j)}{d(v_i)}+\frac{w_{G}(i,j)}{d(v_j)}\right) = n \end{aligned}$$

Then, we have \({\mathbb {E}}_{{\mathcal {R}}}(G) < 1\). Thus, we have the right side of Eq. (12). \(\square \)

This theorem indicates that \({\mathcal {R}}\)-energy ranges from 0 to 1. The left equality holds if weighted network G is a clique with equal weight.

4.4.3 Other topological measures

In this section, we analyze the other possible robustness measures on weighted networks.

The weighted algebraic connectivity, which is defined by the second smallest eigenvalue of Laplacian matrix of a weighted network, is applied to evaluate robustness of weighted airport transportation network [28] and is a measurement of the robustness for weighted networks [20].

The entropy of a weighted network is defined in Eq. (13):

$$\begin{aligned} {\textit{entropy}} = - \sum _{v\in V} \frac{d(v)}{2m}\log {\left( \frac{d(v)}{2m}\right) }, \end{aligned}$$
(13)

where d(v) denotes the weighted degree of vertex v, and m presents the total weighted degree of all vertices of the network. The entropy of a weighted network evaluates how biased weighted degrees of vertices of the network are. The entropy of a network is maximized, which is \({\textit{log}}(n - 1)\), if the network is a d-regular network with equal weights whatever the positive integer d is.

The disparity of vertex \(v_i\) is defined as below [4]:

$$\begin{aligned} \eta (v_i) = \sum _{v_j\in N(v_i)} \left( \frac{w_{G}(i, j)}{d(v_i)}\right) ^2 \end{aligned}$$
(14)

This measure distinguishes how biased weights of out-link edges of a vertices are. For a vertex of k neighbors, when all weights are of the same order, the quantity is closed to \(\frac{1}{k}\) (\(\ll 1\)). In contrast, where only a small number of connections dominate, the quantity is of order \(\frac{1}{n}\) (\(n \ll k\)). Based on disparities of all vertices in a weighted network, we define mean and variance disparities for the network as follows:

$$\begin{aligned} {\textit{m-disparity}}&= \frac{1}{n} \sum _{v\in V} \eta (v);\\ {\textit{v-disparity}}&= \frac{1}{n} \sum _{v\in V} (\eta (v) - {\textit{m-disparity}})^2. \end{aligned}$$
Table 2 Property summary for possible robustness measures for weighted networks

Existing measures may not be suitable to evaluate robustness of weighted networks. We summarize our analysis in Table 2. In detail, a reasonable robustness measure can evaluate how well following networks have:

  • Networks with isolated vertices: many vertices in a scale-free network have few neighbors. They are easy to be isolated vertices when the network is attacked. From Table 2, entropy, weighted algebraic connectivity, mean disparity, and variance disparity cannot evaluate robustness of network with isolated vertices because: (1) entropy, mean disparity, and variance disparity have no definition for a vertex of zero degree; (2) weighted algebraic connectivity is always zero.

  • Disconnected networks: networks always have many strongly connected components. Weighted algebraic connectivity is zero if the network has multiple strongly connected components. Even though the giant component is a representation of the network, some networks may not have giant component. For the case, weighted algebraic connectivity is invalid which is shown in Table 2.

  • d-regular networks: a d-regular network is closed to a clique when d is a large value. On contrast, the network is far away being a clique. Entropy and variance disparity of all d-regular networks are \({\textit{log}}(n - 1)\) and zero, respectively (note that all edges have the same weight in each d-regular network). Even though regular networks with different d values have different topological structures, entropy and variance disparity cannot distinguish them.

  • Weighted networks: edges of a network may associate with weight or attached information, such as times of interactions and # retweets between two users. However, binary version of \({\mathcal {R}}\)-energy ignores weights of edges in a network.

  • Large-scale networks: today’s networks are usually of very large scale. For example, Cit-Patents [22] has about millions vertices and ten millions edges. Weighted algebraic connectivity cannot be computed efficiently since the second smallest eigenvalue need to be computed. Therefore, efficient measurements of network robustness are required.

5 Robustness on static networks

In this section, we evaluate our proposed \({\mathcal {R}}\)-energy on static networks including synthetically created networks and some real-world networks. We design a set of experiments to compare: (1) the effectiveness and scalability of \({\mathcal {R}}\)-energy; (2) common patterns which are found based on \({\mathcal {R}}\)-energy. The experiments were implemented in Java. They were all conducted on a dual-core 64-bit processor with 3.06 GHz CPUs and 128 GB of RAM.

5.1 Networks

Synthetic networks \({\textit{Syn}}\_N\) is a synthetic graph with N vertices. The generator outputs a synthetic graph as shown in Algorithm 2. The algorithm initializes a graph with N vertices and empty adjacency list. According to the property of the scale-free network, it assigns a degree k to each vertex v in this graph such that \({\textit{Pr}}[{\textit{deg}}(v) = k] \approx k^{-\alpha }\) from Lines 5 to 8. Note that the value of \({\textit{totalDeg}}\) should be even. The steps from Lines 9 to 11 guarantee this condition. Next, we assign the neighbors of each vertex after sorting vertices by degree in decreasing order from Lines 13 to 25. Finally, we assign a weight to an undirected edge randomly from Lines 26 to 30.

figure h

Real networks We use six static real networks with different sizes from Mark Newman’s collectionFootnote 1, Kristian Skrede Gleditsch’s collectionFootnote 2, and Stanford Large Network Dataset Collection.Footnote 3

  • IT: it provides estimates of trade flows between independent states (1948–2000) [15].

  • CN: Neural network [27].

  • HT: it is a weighted network of coauthorships between scientists posting preprints on the High-Energy Theory E-Print Archive between January 1, 1995, and December 31, 1999 [25].

  • AP: it is a weighted network of coauthorships between scientists posting preprints on the Astrophysics E-Print Archive between January 1, 1995, and December 31, 1999 [25].

  • CM: it is a weighted network of coauthorships between scientists posting preprints on the Condensed Matter E-Print Archive between January 1, 1995, and March 31, 2005 [25].

  • SP: it is an undirected social network on Pokec [22].

  • WT: it is an undirected communication network on Wikipedia [27].

  • CP: it is a US patent dataset which spans 37 years (January 1, 1963, to December 30, 1999) and includes all the utility patents granted during that period [22].

The descriptive statistics of these networks are demonstrated in Table 3. In this work, we consider these networks weighted and undirected.

Table 3 Descriptive statistics of experimental networks

5.2 Efficiency and scalability of computing \({\mathcal {R}}\)-energy

In this task, the synthetic networks are generated with different sizes. We compute the values of \({\mathcal {R}}\)-energy for both synthetic and real networks. We illustrate the elapsed time and the values of \({\mathcal {R}}\)-energy in Fig. 2. Note that computing R-energy and algebraic connectivity is faster than computing node connectivity and edge connectivity since computing later metrics needs to check all cuts of a network, which is very expensive operation. In addition, the later metrics are only proposed for evaluating robustness of unweighted networks. As a result, we only compare the efficiency for computing R-energy and algebraic connectivity in part. In Fig. 2a, we demonstrate the elapsed time for computing R-energy and algebraic connectivity. As illustrated in Fig. 2a, c, we observe that the elapsed time for \({\mathcal {R}}\)-energy linearly scales with the number of edges. Furthermore, the elapsed time of computing \({\mathcal {R}}\)-energy is less than 400 ms for synthetic networks with more than 100,000 vertices, and less than 120 s for the largest network Cit-Patents [22] with 5.9M vertices and 33.0M edges. We hold that it is because the complexity of computing \({\mathcal {R}}\)-energy is only \(O(|V|+|E|)\). However, the elapsed time of computing algebraic connectivity is more than 1000 times than that of computing R-energy when there are \(4\times 10^4\) vertices in a synthetic network. This points to the advantage of using \({\mathcal {R}}\)-energy to measure robustness for large networks.

Fig. 2
figure 2

Performance of computing the \({\mathcal {R}}\)-energy

5.3 Impact of vertex removal to \({\mathcal {R}}\)-energy

Unweighted networks with heavy tail are known to be highly robust against random removal of vertices [8], but are hypersensitive to removal of high-degree vertices [1, 6]. We would like to check whether the same conclusion can be observed from weighted networks.

In this task, we experiment with three vertex removal options, namely (a) remove in decreasing degree order; (b) remove in increasing degree order; and (c) remove in random order. For each option, after removing x fraction of vertices from the networks, we compute \({\mathcal {R}}\)-energy to measure the new network robustness. Figure 3 illustrates the \({\mathcal {R}}\)-energy of resultant network for the three options compared with the \({\mathcal {R}}\)-energy of the original network. From the figure, we obtain three important observations as follows.

  • Networks become less robust sooner when vertices with the highest degrees are removed. As demonstrated in Fig. 3, compared with the original networks, the values of \({\mathcal {R}}\)-energy increase sooner when vertices with the highest degrees are removed than when vertices with small degrees are removed, or randomly removed. This is due to the fact that vertices with high degrees tend to have smaller 2-step commute probabilities. Removing them leads to an increase in average returning probability. Therefore, the network becomes less robust.

  • Networks remain robust or become slightly more robust when vertices with the smallest degrees are removed. As illustrated in Fig. 3, we find that the values of \({\mathcal {R}}\)-energy remain constant or decrease slightly when vertices with the smallest degrees are removed from the networks. Because vertices with the smallest degrees have larger 2-step commute probabilities, removing the vertices with the smallest degrees results in little decrease in the average of the returning probabilities.

  • Networks become less robust when vertices are randomly removed. However, the change is slower than that of removing vertices of the highest degrees. This observation can be attributed to the fact that each vertex has a certain chance to decrease its degree when we remove vertices at random. It indicates that the 2-step commute probability of each vertex increases with certain probability. However, vertices with smallest degrees are more likely to be removed in scale-free networks. Hence, vertices of large 2-step commute probabilities are more likely to be removed leading to a decrease in network energy.

The above three observations are also consistent with the results of the unweighted networks [14] and point out the rationality of proposed \({\mathcal {R}}\)-energy.

Fig. 3
figure 3

\({\mathcal {R}}\)-energies of static networks

6 Robustness on dynamic networks

Weighted networks evolve with time and so are their robustness. In this section, we apply \({\mathcal {R}}\)-energy on dynamic and time-evolving Twitter weighted network so as to evaluate robustness as a possible measure to detect events and trends. Unlike the previous event and trend detection research which considers time series of messages or news articles generated in social media, our approach utilizes dynamic changes to network structure. These are the changes that cause a network to become suddenly more robust or less robust than usual.

6.1 Data collection

Twitter is a popular microblogging site with users generating and sharing short message contents in real time [21]. In this experiment, we first selected a set of Twitter users \(U^{us}\) (\(U^{sg}\)) who are the followers and followees of a small set of seed user accounts that belong to US (Singapore) politicians and analysts. These are the users who are more likely to tweet about political topics. We crawled the Twitter data generated by \(U^{\cdot }\) from May 1, 2012, to July 29, 2012.

From \(U^{\cdot }\), we further selected users who write, reply, or retweet at least a tweet per month over three months. There are 129,056 and 48,339 such users from the USA and Singapore, and we keep them in the user set \(U^{\cdot }\) discarding the remaining users and their tweets. Each day, a subset of users in \(U^{\cdot }\) may reply or retweet one another. We therefore construct a weighted reply network and another weighted retweet network for day t and denote them by \(G^{\cdot }_{RE}(t)\) and \(G^{\cdot }_{RT}(t)\), respectively. An undirected edge (uv) is included in the reply network for day t if user u replies at least a tweet from user v, or user v replies at least a tweet from user u in day t. The weight of undirected edge (uv) or (vu) is the number of reply tweets between users u and v. The edges in retweet network on day t are created in a similar manner.

6.2 Event detection

We demonstrate the \({\mathcal {R}}\)-energies of \(G^{\cdot }_{RE}(t)\) and \(G^{\cdot }_{RT}(t)\) in Fig. 4a, b. To facilitate reading, we add vertical lines representing Sundays to the figures. From the figures, we aim to determine events that are characterized by bursts and drops of communication (replies or retweets) by many users. We call these the internal and external events as the former can be explained by the bursty content but not the latter. For example, a sport event may draw user attention away from tweeting about politics. In addition to event detection, we also want to explain internal events by searching the web.

Suppose \((e_1,e_2\ldots ,e_{90})\) is the sequence of \({\mathcal {R}}\)-energy values, where \(e_i\) is the value of \({\mathcal {R}}\)-energy for the i-th day. We calculate the absolute first-order difference of energy sequence, denoted as \((d_1,d_2\ldots ,d_{90})\), where \(d_1 = 0\) and \(d_{t+1} = |e_{t+1} - e_{t}|\) for \(1\le t \le 89\). Based on the mean and standard deviation of \(\{d_t\}\), we can detect an event at time \(t'\) statistically if \(|d_{t'} - {\textit{mean}}(\{d_t\})| > \gamma \cdot {\textit{stddev}}(\{d_t\})\) where \({\textit{mean}}(\{d_t\})\) and \({\textit{stddev}}(\{d_t\})\) denote the mean and standard deviation of \(\{d_t\}\), respectively. In other words, an event is found when the absolute first-order difference deviates from mean more than \(\gamma \) times the standard deviation. However, mean is known to be sensitive to anomalies. We therefore employ trimmed mean that is defined as the mean after discarding the smallest and largest \(\tau {\%} \cdot |\{d_t\}|\) values. In this work, we set \(\gamma =3\) and \(\tau = 5\) empirically.

Fig. 4
figure 4

\({\mathcal {G}}\)-energies of dynamic networks

To describe an event at day t, we need to extract relevant event description keywords from tweets (which can be replies or retweets) generated on the same day t. We denote the words extracted from reply tweets (or retweets) on day t by \(W^{RE}(t)\) (or \(W^{RT}(t)\)) and the frequency of word \(w\in W^{RE}(t)\) (or \(W^{RT}(t)\)) by \(f^{RE}(w,t)\) (or \(f^{RT}(w,t)\)). We define the first-order frequency difference of word w for day t as \(df^{*}(w,t)=f^{*}(w,t)-f^{*}(w,t-1)\).Footnote 4 From \(\{df^{*}(w,t)\}\), we derive the mean and standard deviation as \({\textit{mean}}^{*}(w)\) and \({\textit{stddev}}^{*}(w)\), respectively.

Table 4 Descriptive statistics of reply and retweet networks

Table 4 illustrates the means and standard deviations of absolute energy difference sequence and word frequency difference sequence of the dynamic reply and retweet networks.

Take the largest difference of energy from both \(G^{us}_{RE}\) and \(G^{us}_{RT}\) on June 28, 2012, as an example. The top three words from retweets with highest frequency difference are ‘tax,’ ‘Obamacar,’ and ‘scotu’ (Supreme Court of United States) after stopword removal and word stemming. By searching the web using these keywords, we verified that the Obamacare healthcare law was upheld by the Supreme Court of United States, and there were concerns about tax increase as its outcome. This event attracted a lot of replies and retweets on June 28. The word frequency difference of ‘Obamacar’ in retweets subsided quickly on June 29, 2012, as shown by a negative \(df^{RT}\)(‘Obamacar,’ June 29) value.

For each day t, we define the average frequency difference of the three words \(w_1\), \(w_2\) and \(w_3\) with highest \(df^{*}(\cdot ,t)\) as \( M^{*}(t) =\frac{1}{3} \sum _{i = 1}^{3} df^{*}(w_i, t)\). If \(M^{*}(t)\) deviates far away from the mean \({\textit{mean}}^{*}(w)\) w.r.t. the value \({\textit{stddev}}^{*}(w)\), an event is considered to happen on day t.

Formally, we define the normalized \(M^{*}(t)\) on day t as

$$\begin{aligned} N^{*}(t) = \frac{M^{*}(t) - {\textit{mean}}^{*}(w)}{{\textit{stddev}}^{*}(w)} \end{aligned}$$

The larger the \(N^{*}(t)\) is, the more likely the top words are able to explain some event on t. Empirically, we use the words with \(N^{*}(t)\ge 8\) to help us to explain internal events. On the other hand, an external event may prevent people from communicating in Twitter. In this case, \(N^{*}(t)\) may be small due to very few users generating tweets. We nevertheless tried to use the frequent words on day t to search the web to confirm if an event is external.

Fig. 5
figure 5

Normalized difference of word frequencies

Figure 5a illustrates the \(N^{*}(t)\) values of both \(G^{us}_{RE}\) and \(G^{us}_{RT}\). Table 5 lists eight events found from \(G^{us}_{RE}\) and \(G^{us}_{RT}\) using \({\mathcal {R}}\)-energy. The first column shows the location of event in Fig. 4. The second column shows the date of event and \(N^{*}(t)\) value. The third column shows the top three words derived by top frequency differences in \(G^{us}_{RE}\) or \(G^{us}_{RT}\) depending on which of the two networks are used to detect the event. The final column shows the description of events manually derived from the Google Search results of the top words.

Similarly, Fig. 5b illustrates the \(N^{*}(t)\) values of both \(G^{sg}_{RE}\) and \(G^{sg}_{RT}\). Table 6 lists four events found from \(G^{sg}_{RE}\) and \(G^{sg}_{RT}\) using \({\mathcal {R}}\)-energy.

Instead of using \({\mathcal {R}}\)-energy, we also experimented with time series of daily reply and retweet counts using a similar event detection method. Unlike the \({\mathcal {R}}\)-energy time series, we could detect only two events on June 28 and June 30 listed in Table 5 and one event on June 19 listed in Table 6. This is because reply and retweet counts fluctuate very much over time. We therefore detect fewer bursty events than that using \({\mathcal {R}}\)-energy. The results also show that \({\mathcal {R}}\)-energy can help detecting events that are different.

Table 5 Detected events from \(G_{RE}^{us}\) and \(G_{RT}^{us}\)

Compared the detected events in Table 5 with that of in Table 4 of the original conference version [14], we can observe that four more events are detected from the reply network. This is due to the factor that an unweighted edge in the reply network does not reflect the process of user interaction. Even though two users have a lively discussion about a hot topic, only an unweighted edge forms between them, while a weighted edge captures the interaction between users. However, we do not observe this phenomenon in the retweet network since users usually do not retweet a tweet many times. This points to the positive effect of defining R-energy on a weighted network.

Table 6 Detected events from \(G_{RE}^{sg}\) and \(G_{RT}^{sg}\)

6.3 Periodic trend pattern detection

Other than ad hoc events, Mann–Kendall trend test [24] indicates that a periodic pattern significantly exists in \(G^{\cdot }_{RE}\) and \(G^{\cdot }_{RT}\) of Fig. 4a, b. We also want to detect weekly trend patterns from the figure by examining the regularities in network energy changes. This weekly pattern can be even more distinct when the ad hoc events are removed.

In this section, we therefore focus on detecting weekly pattern. Based on a pre-defined threshold \(\theta \) (\(=0.1\times {\textit{mean}}(\{d_t\})\)), we first derive three kinds of energy changes from the previous day, namely (i) energy increase (‘\(+\)’), (ii) energy decrease (‘−’), and (iii) insignificant change (\({\textit{null}}\)). Given a day of a week x, e.g., Tuesday, we count the number of ‘\(+\)’s, ‘−’s, and \({\textit{null}}\)’s and denote them by p(x), n(x), and \({\textit{null}}(x)\), respectively. After ignoring the ad hoc events, we increment p(x) if the energy change is more than \(\theta \); increment n(x) if the energy change is smaller than \(-\theta \); or increment \({\textit{null}}(x)\) otherwise. The proportions of ‘\(+\)’s and ‘−’s on x across multiple weeks can be defined as:

$$\begin{aligned} {\textit{prop}}(\text {`+'},x)&= \frac{p(x)}{p(x) + n(x) + {\textit{null}}(x)}\\ {\textit{prop}}(`\!-\!\text {'},x)&= \frac{n(x)}{p(x) + n(x) + {\textit{null}}(x)}\\ {\textit{prop}}({\textit{null}},x)&= \frac{{\textit{null}}(x)}{p(x) + n(x) + {\textit{null}}(x)} \end{aligned}$$

Let \({\textit{max}}_{{\textit{prop}}}(x)\) be maximum value of \({\textit{prop}}(\text {`+'},x)\), \({\textit{prop}}(`\!-\!\text {'},x)\) and \({\textit{prop}}({\textit{null}},x)\). We assign a label l to day x as follows:

$$\begin{aligned} l=\left\{ \begin{array}{ll} \text {`+'}, &{}\quad \text {if}\; {\textit{prop}}(\text {`+'},x) \text { equal to } {\textit{max}}_{{\textit{prop}}}(x)\\ `\!-\!\text {'}, &{}\quad \text {if}\; {\textit{prop}}(`\!-\!\text {'},x) \text { equal to } {\textit{max}}_{{\textit{prop}}}(x)\\ {\textit{null}}, &{}\quad \mathrm{otherwise} \end{array} \right. \end{aligned}$$
(15)

In case of \({\textit{prop}}(\text {`+'},x) = {\textit{prop}}(`\!-\!\text {'},x) = {\textit{max}}_{{\textit{prop}}}(x)\), we assign a \({\textit{null}}\) label to the day x.

For example, suppose out of 13 weeks, there are 12 Mondays with ‘−’s, one with ‘\(+\)’ and zero with \({\textit{null}}\). The compositions of positive, negative, and null energy changes on Monday are 7.7%, 92.3%, and 0%, respectively. Monday therefore is assigned to ‘−’. By assembling the proportions of positive, negative, null energy changes for different days of week, we obtain the weekly trend pattern of \(G^{\cdot }_{RE}\) and \(G^{\cdot }_{RT}\).

Figure 6 illustrates the composition of weekly pattern for \(G^{us}_{RE}\) and \(G^{us}_{RT}\). According to label assignment rule, we obtain the weekly trend pattern ‘\(- - + + + + -\)’ for \(G^{us}_{RE}\), and another weekly trend pattern ‘\(- - - + - + -\)’ for \(G^{us}_{RT}\). Other than Friday, the two weekly trend patterns obtained from \(G^{us}_{RE}\) and \(G^{us}_{RT}\) are very similar.

Fig. 6
figure 6

Weekly pattern detecting for US users

Figure 7 illustrates the composition of weekly pattern for \(G^{sg}_{RE}\) and \(G^{sg}_{RT}\). According to label assignment rule, we obtain the weekly trend pattern ‘\(- + + - + + -\)’ for \(G^{sg}_{RE}\), and another weekly trend pattern ‘\(- + + + + + -\)’ for \(G^{sg}_{RT}\). Other than Friday, the two weekly trend patterns obtained from \(G^{sg}_{RE}\) and \(G^{sg}_{RT}\) are very similar. In addition, the weekly trend patterns for two countries are very similar.

From the weekly trend pattern, we can casually conclude that users are less likely to tweet on Saturdays but tweet a lot on Sundays as well as Mondays.

Fig. 7
figure 7

Weekly pattern detecting for Singapore users

7 Conclusion

In many applications, an obvious characteristic is tending to be modeled them with large-scale networks, such as protein–protein interaction networks, neural networks, the Internet, the World Wide Web, social networks, and scientific collaboration networks. To understand the robustness of network with millions or billions vertices is an important and challenged task both in theory and application. In this paper, based on the normalized Laplacian matrix, we define the \({\mathcal {R}}\)-energy for a weighted network to measure its robustness.

In theory, the \({\mathcal {R}}\)-energy is related to the average 2-step commute probability. Vertex has smaller 2-step commute probability if it has higher connectivity. It indicates that the high robustness network has smaller \({\mathcal {R}}\)-energy. Furthermore, the complexity of computing \({\mathcal {R}}\)-energy is \(O(|V| + |E|)\) since the algorithm just scans the entire weighted network once after obtaining the weighted degrees of all vertices. Therefore, it can be easily applied to large networks. Our empirical study illustrates that our algorithm takes less than 120 s for a network with millions vertices. In practice, we can find some patterns of robustness of static networks and some anomaly cases of dynamic networks.

In this work, we have only considered the dynamic information network offline, and thus, it may fail for online event detection. Since online event detection may be more helpful in real-world applications. To address this issue, we plan to extend our \({\mathcal {R}}\)-energy to online detect events on real social network platforms, such as Facebook and Twitter. In addition, we can construct the interaction network for a particular event. In terms of the connection of this information network, we can also apply our \({\mathcal {R}}\)-energy to predict its future trend. Thus, we plan to investigate how to accurately predict its burst.