Introduction

One of the most interesting properties of real networks is modularity, i.e., the tendency of nodes to partition themselves into communities (Girvan and Newman 2002; Newman 2006). Loosely speaking, a community is a group of nodes for which the density of links within a group is higher than across the groups. Those communities might represent groups of individuals with shared interests in online social networks, topic-specific research communities in co-authorship networks, and so on.

Much recent research has focused on methods for detecting and analyzing community structure in networks (for a recent review of existing approaches see Fortunato (2010) and references therein). However, the dynamical properties of modular and correlated networks have started to attract attention only recently (Arenas et al. 2006; Galstyan and Cohen 2007; Gleeson 2008; Melnik et al. 2012; Payne et al. 2009).

Understanding the impact of group structure on network dynamics is important for social computing applications. Consider, for instance, word-of-mouth (or viral) marketing of a new product. If different consumer groups have different rating criteria for the product, or react differently to marketing strategies, then one needs to model how influence propagates within and across communities to predict whether the product will be a hit or remain confined to a small subset of consumers. Similarly, understanding how a political message propagates within and across partisan constituencies could be very important for designing effective political campaigns.

Here we report our analysis of a simple dynamical process in networks with community structure. We consider a threshold-based dynamical process on networks (Watts 2004) where the nodes can be in one of two states, passive or active. The actual meaning of those states is application-dependent (e.g., in viral marketing, activation might correspond to purchasing a product). Starting from an initial configuration with only a handful of nodes in the active state, we consider discrete-time dynamics where, at each time step, a passive node becomes active if the number of its active neighbors meets or exceeds some predefined threshold. This process is iterated until no node changes its state.

We study the dynamical properties of the above model for networks composed of two loosely coupled communities. Our main observation is that if the initially active nodes (seeds) are contained in one of the communities, then under certain conditions the cascading process has a two-tiered structure, that is, the peaks of the activation dynamics in each community are well separated in time. Furthermore, depending on the link density within and across the groups, and on the fraction of seed nodes, the activation might either die out, spread to one of the groups, or spread to both groups. In particular, for a given network, there is a critical fraction of seed nodes such that below this critical threshold the activation process is contained, while above the threshold the activation spreads throughout the network. This critical behavior has implications for problems such as influence maximization, where one intends to select initial target nodes so that the size of the resulting cascade is maximal. In particular, we demonstrate that simple target selection strategies that neglect the network community structure can yield markedly sub-optimal results.

The rest of the paper is organized as follows: In the next section we formally introduce the cascade model and present its mean-field analysis for networks with structural heterogeneity, namely random graphs consisting of two loosely coupled sub-graphs (communities). We then elaborate on the implications of the analysis for the influence-maximization problem, and present experiments on synthetically generated networks. We conclude the paper by discussing our main results in the context of the existing literature and pointing out open research questions.

Mean Field Analysis of the Activation Dynamics

Cascade Model

There are a number of approaches for modeling activation cascades on networks (see Borge-Holthoefer et al. (2013) for a recent survey). In this paper we use the Linear Threshold Model (Granovetter 1978) (LTM), which, starting from a set of initially active nodes, propagates the activation through a threshold-based mechanism. Let \( {\mathcal{N}}_{i}\) be the set of active neighbors of node i. Then the node i is activated whenever

$$ \sum_{j\in \mathcal{N}_{i}} w_{ij} \ge \theta_{i} $$
(1)

Here \( w_{ij} \) is the normalized weight of the link between the nodes i and j, \( \sum_{j} w_{ij} = 1 \), and \( \theta_{i} \) is the activation threshold for the node i. Usually, the \( \theta_{i} \) are assumed to be random variables drawn from some distribution, reflecting the uncertainty about individuals.

To simplify the analysis, here we use a modified version of the linear threshold model, where the threshold condition is applied not to the fraction of active neighbors, but to their number. We stress, however, that our main results are valid for the fractional threshold model as well, provided that it demonstrates a phase-transition behavior.

Let us associate a binary state variable with each node, \( s_{i} \in \{0, 1\} \), where the states 0 and 1 correspond to passive and active states, respectively. Then the dynamics of the process is characterized by the following set of equations:

$$ {s}_{i}(t+1)=\Theta [{\displaystyle \sum _{j}{W}_{ij}}{s}_{j}(t)-{h}_{i}]$$
(2)

where Θ(x) is the step function, \( h_{i} \) is the activation threshold for the ith node, and W is the adjacency matrix of the network. For the sake of simplicity, we consider the case of an unweighted graph, so that the entries in the adjacency matrix are either 0 or 1. Equation 2 is iterated until a steady state is reached, that is, until no node changes its state upon further iteration.
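The iteration of Eq. 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the experiments reported here: the function names `erdos_renyi` and `run_cascade` are hypothetical, the step function is assumed to satisfy Θ(0) = 1 (a node activates once its number of active neighbors reaches its threshold), and seed nodes are assumed never to de-activate.

```python
import random


def erdos_renyi(n, p, rng):
    """Adjacency lists of a random graph: each of the n(n-1)/2 pairs is linked with probability p."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def run_cascade(adj, thresholds, seeds):
    """Apply the threshold rule synchronously until no node changes state.

    Assumption: a passive node i activates once it has at least thresholds[i]
    active neighbors. On a fixed graph the number of active neighbors never
    decreases, so active nodes (including seeds) simply stay active.
    Returns the number of active nodes after each time step.
    """
    active = [False] * len(adj)
    for s in seeds:
        active[s] = True
    history = [sum(active)]
    while True:
        # Collect all activations first so the update is synchronous.
        newly = [i for i in range(len(adj))
                 if not active[i]
                 and sum(active[j] for j in adj[i]) >= thresholds[i]]
        if not newly:
            return history
        for i in newly:
            active[i] = True
        history.append(sum(active))
```

For instance, `run_cascade(erdos_renyi(2000, 0.005, random.Random(0)), [3] * 2000, range(200))` runs one cascade on a graph with mean degree of about 10, uniform threshold 3, and a seed fraction of 0.1.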

We have previously developed a mean-field theory of activation dynamics on modular graphs (Galstyan and Cohen 2007) in the case when the thresholds were the same for all the nodes, \( h_{i} = H \). Here we generalize the framework to the case when nodes have different activation thresholds, drawn from a specified distribution \( P_{h} \).

Activation Cascades in Single-Community Networks

Let us first focus on a single-community network, and consider a graph composed of N nodes, where each of the \( N(N-1)/2\) edges is present with probability p. In the limit of large N, the resulting degree distribution of nodes in this network is the Poisson distribution with a mean z = p N.

Let \( \rho_{h}(t) \) be the fraction of active nodes with activation threshold h at time t. Initially, it equals the fraction of nodes that have been targeted, \( \rho_{h}(t=0) = \rho_{h,0} \). We assume that the probability for a node to be selected as a seed is independent of its activation threshold, so that \( \rho_{h,0} = \rho_{0} \). The total fraction of active nodes is \( \rho(t) = \sum_{h} \rho_{h}(t) P_{h} \). Further, let P(k; t) be the probability that a randomly chosen node is connected with exactly k active nodes at time t. At time t = 0, k follows a Poisson distribution with mean \( pN_{0} = z\rho_{0} \), where \( N_{0} = \rho_{0} N \) is the number of seed nodes. To study the dynamics of the process, we need to estimate these distributions for later times. To do so, we use the so-called annealed approximation, which has been used to study the dynamical properties of random Boolean networks (Derrida and Pomeau 1986; Derrida and Stauffer 1986; Rohlf and Bornholdt 2002). Within the annealed approximation, one averages over the disorder by “rewiring” the network after each iteration. Since all edges are equally likely during the rewiring process, P(k; t) is still given by a Poisson distribution; however, the mean now depends on the fraction of active nodes \( \rho(t) \):

$$ P(k;t)=e^{-z\rho(t)}\frac{[z\rho(t)]^{k}}{k!} $$
(3)

Consider all the nodes with an activation threshold h, and let \( N_{h} \) be their number. On the first step of the cascading process, a passive node among those activates with probability \( \sum_{k\ge h} P(k;t=0) \). In later iterations, the fraction of active nodes can be calculated as follows. There are \( N_{h}(1-\rho_{h}(t)) \) passive nodes at time t, and each of these nodes will activate with probability \( \sum_{k\ge h} P(k;t) \). Also, due to the rewiring, some of the \( N_{h}(\rho_{h}(t)-\rho_{h,0}) \) active non-seed nodes will switch to the passive state, each with probability \( \sum_{k<h} P(k;t) \). We note that the initially targeted nodes are not allowed to de-activate. Combining these together, and using the normalization condition \( \sum_{k=0}^{\infty} P(k;t)=1 \), we obtain the following set of equations

$$ \rho_{h}(t+1)=1-(1-\rho_{0})\,Q(h;z\rho(t)) $$
(4)

where \( Q(h;x)=\sum_{k<h} e^{-x}x^{k}/k! \) is the regularized gamma function.
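For readers who want the intermediate step, the balance of activations and de-activations described above can be written out explicitly. The following is a sketch that uses only the quantities already defined, with the shorthand \( Q \equiv Q(h;z\rho(t)) \), so that \( \sum_{k<h}P(k;t)=Q \) and \( \sum_{k\ge h}P(k;t)=1-Q \):

$$ \rho_{h}(t+1)=\rho_{h}(t)+[1-\rho_{h}(t)]\,(1-Q)-[\rho_{h}(t)-\rho_{0}]\,Q $$

Expanding the right-hand side, all terms proportional to \( \rho_{h}(t) \) cancel:

$$ \rho_{h}(t+1)=1-Q+\rho_{0}Q=1-(1-\rho_{0})\,Q(h;z\rho(t)) $$

which is Eq. 4. In particular, within the annealed approximation the updated value depends on the previous state only through the total active fraction \( \rho(t) \).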

To get the total fraction of activated nodes, we multiply Eq. 4 by \( P_{h} \) and sum over h, which yields

$$ \rho(t+1)=1-(1-\rho_{0})\sum_{h=0}^{\infty}P_{h}\,Q(h;z\rho(t)) $$
(5)
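Equation 5 is a one-dimensional map and can be iterated numerically. The sketch below is a minimal illustration under stated assumptions: the names `q`, `mean_field_step` and `mean_field_trajectory` are hypothetical, and the threshold distribution is passed as a dictionary mapping each threshold h to its probability.

```python
import math


def q(h, x):
    """Q(h; x) = sum over k < h of exp(-x) x^k / k!  (zero for h = 0)."""
    term, total = math.exp(-x), 0.0
    for k in range(h):
        total += term
        term *= x / (k + 1)  # turn the k-th Poisson term into the (k+1)-th
    return total


def mean_field_step(rho, rho0, z, p_h):
    """One application of Eq. 5; p_h maps threshold h -> probability P_h."""
    return 1.0 - (1.0 - rho0) * sum(p * q(h, z * rho) for h, p in p_h.items())


def mean_field_trajectory(rho0, z, p_h, tol=1e-12, max_steps=100_000):
    """Iterate Eq. 5 from rho(0) = rho0 until successive values differ by less than tol."""
    traj = [rho0]
    for _ in range(max_steps):
        traj.append(mean_field_step(traj[-1], rho0, z, p_h))
        if abs(traj[-1] - traj[-2]) < tol:
            break
    return traj
```

A heterogeneous example would be `mean_field_trajectory(0.2, 10.0, {h: 1 / 9 for h in range(2, 11)})`, i.e., thresholds drawn uniformly from 2 to 10; the last element of the returned list approximates the steady-state active fraction.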

Equation 5 describes the dynamics of the cascading process in the network. For a fixed connectivity z, the dynamics depends on the fraction of initially targeted nodes, \( \rho_{0} \), as well as on the threshold distribution function \( P_{h} \). Let us elaborate on the latter dependence in more detail. First of all, we assume that \( P_{0} = 0 \), i.e., there are no nodes that activate spontaneously, aside from the initially targeted nodes. Furthermore, simple inspection shows that the dynamical properties of the model depend on the fraction of nodes with threshold h = 1, \( P_{1} \). We call these nodes vulnerable since they activate whenever one of their neighbors is active. Clearly, if the fraction of vulnerable nodes is sufficiently large, a single node might trigger a global cascade throughout the network. Without going into the mathematical details, we simply observe that such a global cascade will happen whenever the vulnerable nodes form a giant connected component, which, for random Erdos–Renyi graphs, translates into \( P_{1} z \sim 1 \). In this paper we focus on the case when \( P_{1} \) is either zero or sufficiently small, \( P_{1} \ll 1/z \), so that for a network of size N, the number of seed nodes required to cause a global cascade must be of order O(N).

For the latter case, the analysis of Eq. 5 yields the following observation: For a given connectivity z, there is a critical fraction \( \rho_{c} \) such that for \( \rho_{0} < \rho_{c} \) the activation process is localized, while for \( \rho_{0} > \rho_{c} \) activation spreads to all the nodes in the network. This is schematically illustrated in Fig. 1a, where we plot \( \rho(t+1) \) against \( \rho(t) \). The intersections with the diagonal \( \rho(t+1)=\rho(t) \) characterize the steady state of the dynamics, or in other words, the fraction of activated nodes at the end of the cascading process. Note that there is always one intersection around \( \rho(t+1)=\rho(t)\approx 1 \). For smaller \( \rho_{0} \), however, there is another stable fixed point. One can calculate the critical density by requiring that the curve defined by the right-hand side of Eq. 5 be tangent to the diagonal, as indicated in the inset of Fig. 1a. This yields the following expression for the critical density:

Fig. 1

(a) Graphical representation of Eq. 5 for below-critical (red) and above-critical (blue) values of \( \rho_{0} \). The inset shows the equation (in the vicinity of the solution) for the critical value \( \rho_{0} = \rho_{c} \). (b) The critical connectivity plotted against the fraction of seed nodes for the threshold parameter H = 3. The solid line shows the phase boundary obtained analytically.

$$ \rho_{c}=1-\left[z e^{-x_{0}}\sum_{k=0}^{\infty}P_{k+1}\frac{x_{0}^{k}}{k!}\right]^{-1} $$
(6)

where \( x_{0} \) satisfies the following equation:

$$ 1-\frac{x_{0}}{z}=\frac{\sum_{k=0}^{\infty}(1-D_{k})\frac{x_{0}^{k}}{k!}}{z\sum_{k=0}^{\infty}P_{k+1}\frac{x_{0}^{k}}{k!}} $$
(7)

Here \( D_{k}=\sum_{i\le k}P_{i} \) is the cumulative distribution function for the activation thresholds. Equation 6 expresses the tangency condition and Eq. 7 the fixed-point condition of Eq. 5, both evaluated at \( x_{0}=z\rho \); they follow from \( \partial Q(h;x)/\partial x=-e^{-x}x^{h-1}/(h-1)! \) and \( \sum_{h}P_{h}Q(h;x)=e^{-x}\sum_{k}(1-D_{k})x^{k}/k! \).

In Fig. 1b we compare the analytical prediction with simulation results for the case when all the activation thresholds are set to h = 3. The simulations were done for a graph with \( 5\times 10^{4} \) nodes, with 50 random trials per parameter setting. Each pair \( (z,\rho_{0}) \) was considered to be above the critical line if a global cascade was observed in the majority of trials for those parameters. One can see that the agreement between the analytical prediction and the simulation results is excellent.
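The critical seed fraction can also be located numerically, without the closed-form expressions, by bisecting on \( \rho_{0} \) and iterating Eq. 5 to its steady state. The sketch below is illustrative only: the function names are hypothetical, and the rule that a steady-state fraction above 0.5 counts as a global cascade is an assumption chosen for convenience, not a criterion taken from the analysis above.

```python
import math


def q(h, x):
    """Q(h; x) = sum over k < h of exp(-x) x^k / k!."""
    term, total = math.exp(-x), 0.0
    for k in range(h):
        total += term
        term *= x / (k + 1)
    return total


def final_fraction(rho0, z, p_h, tol=1e-13, max_steps=100_000):
    """Steady-state active fraction obtained by iterating Eq. 5 from rho0."""
    rho = rho0
    for _ in range(max_steps):
        nxt = 1.0 - (1.0 - rho0) * sum(p * q(h, z * rho) for h, p in p_h.items())
        if abs(nxt - rho) < tol:
            return nxt
        rho = nxt
    return rho


def critical_seed_fraction(z, p_h, global_level=0.5, iters=20):
    """Bisect on rho0 for the smallest seed fraction that yields a global cascade.

    Assumption: a run counts as 'global' when its steady-state fraction
    exceeds global_level. The steady state is non-decreasing in rho0,
    so bisection on [0, 1] is well defined.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if final_fraction(mid, z, p_h) > global_level:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Calling `critical_seed_fraction(10.0, {3: 1.0})` corresponds to the uniform-threshold case h = 3 at connectivity z = 10; sweeping z traces out a mean-field phase boundary of the kind shown in Fig. 1b.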

Activation Cascades in Bi-community Networks

Now let us focus on heterogeneous networks where not all the links have the same probability. In particular, we focus on networks that are composed of a relatively small, tight community that is connected with a larger population of nodes, as schematically depicted in Fig. 2. We refer to the nodes in the first and the second community as A nodes and B nodes, respectively. Note that the group B itself might comprise a number of sub-communities. This is the case for the networks that we use in our experiments. From the analysis perspective, however, we assume that the links are homogeneously distributed within each community. In other words, we assume that each community is represented by a random Erdos–Renyi graph of \( N_{a} \) and \( N_{b} \) nodes, respectively, and the interaction between the two communities is introduced by linking each of the \( N_{a}N_{b} \) cross-community pairs with a uniform probability. Such a network is fully characterized by the within-group connectivities \( z_{a} \), \( z_{b} \), and the across-group connectivities \( z_{ab} \equiv (N_{b}/N_{a})z_{ba} \), where \( z_{ba} \) is the average number of A neighbors of a B node and \( z_{ab} \) is the average number of B neighbors of an A node.

For the sake of simplicity, let us assume that the cascading dynamics in group A is not affected by the nodes in group B. This is a reasonable assumption as long as there are not many active B nodes, which is usually the case at the beginning of the cascading process. Thus, the activation dynamics of A nodes is still governed by Eq. 5. For the B nodes, the activation dynamics is given by a similar equation, with the only difference that it is affected by active A nodes:

$$ \rho_{b}(t+1)=1-(1-\rho_{b,0})\sum_{h=0}^{\infty}P_{h}\,Q(h;z_{b}\rho_{b}(t)+z_{ba}\rho_{a}(t)) $$
(8)

The steady state fraction of active B nodes satisfies the following equation:

$$ \rho_{b}^{s}=1-(1-\rho_{b,0})\sum_{h=0}^{\infty}P_{h}\,Q(h;z_{b}\rho_{b}^{s}+z_{ba}\rho_{a}^{s}) $$
(9)

where \( \rho_{a}^{s} \) is the steady state fraction of active A nodes. Thus, the presence of active A nodes facilitates the activation of B nodes, and the effect depends on the across-group connectivity \( z_{ba} \). Specifically, if \( z_{ba} \) is very small, then the activation dynamics in group B can be described as in the previous section. Namely, there is a threshold fraction of seed nodes such that above the threshold all the B nodes will eventually be activated. However, even below the threshold, a global cascade in group B is possible if the across-group connectivity \( z_{ba} \) is sufficiently large. Indeed, our analysis has shown (Galstyan and Cohen 2007) that for a fixed within-group connectivity \( z_{b} \), there is a critical across-group connectivity \( z_{ba}^{c} \) such that for \( z_{ba} > z_{ba}^{c} \) the activation will propagate from group A to group B and cause a global cascade.

Fig. 2

Schematic illustration of a bi-community network.

Now let us look at the transient dynamics of the activation cascade; see Galstyan et al. (2009) for more details. In the continuous time limit, the dynamics can be written as

$$ \tau\frac{d\rho_{a,b}}{dt}=1-\rho_{a,b}-(1-\rho_{a,b}^{0})\sum_{h=0}^{\infty}P_{h}\,Q[h;z_{b}\rho_{b}(t)+z_{ba}\rho_{a}(t)] $$
(10)

Let \( \rho(t)=\alpha\rho_{a}(t)+(1-\alpha)\rho_{b}(t) \), \( \alpha=N_{a}/(N_{a}+N_{b}) \), be the fraction of active nodes in the whole network. In Fig. 3 we compare the solutions obtained from Eq. 10 with the results of simulations on randomly generated graphs for the same network parameters but two different values of the threshold parameter. The parameters of the network are \( N_{a}=5{,}000 \), \( N_{b}=15{,}000 \), \( z_{aa}=z_{bb}=15 \), \( z_{ab}=4 \). The fraction of seed nodes is \( \rho_{a}^{0}=0.1 \), and \( \tau^{-1}=0.1 \). The simulations are averaged over 100 random realizations.

Fig. 3

Analytical (solid lines) and simulation (circles) results for the activation dynamics. The upper panel shows the fraction of active nodes vs. time for threshold parameter H = 2 and H = 4. The lower panel shows the activation rate \( d\rho/dt \) vs. time for H = 4.

We note that the agreement between the analytical prediction and the results of the simulations is quite good. The network settles to the same steady state for both values of the threshold parameter H: that is, all of the nodes are activated at the end of the cascading process. However, the transient dynamics depend on H. For H = 2, activation spreads very quickly through both communities and after a short interval all of the nodes are active. For H = 4, on the other hand, the fraction of active nodes first appears to saturate; then, in later iterations, \( \rho(t) \) increases rapidly and eventually all the nodes become active. In the lower panel of Fig. 3 we plot the rate of the activation process \( d\rho/dt \) vs. time for H = 4. The peak rates of activation in the two communities are clearly separated in time. We call this phenomenon two-tiered dynamics. A similar multi-peak structure was previously observed by Gupta et al. (1989), who studied the impact of different mixing patterns on the spread of sexually transmitted infections.
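The two-tiered behavior can be reproduced qualitatively by integrating the continuous-time mean-field equations with a simple forward-Euler scheme. The sketch below rests on several assumptions that are ours rather than statements of the analysis above: the function names are hypothetical, the Euler step `dt` and the integration horizon are arbitrary choices, the argument of Q for group A is taken to be \( z_{a}\rho_{a}+z_{ab}\rho_{b} \) by symmetry with the B equation (setting `z_ab=0.0` recovers the simplification that A is unaffected by B), and \( z_{ba} \) is obtained from \( z_{ab} \) through \( z_{ab}=(N_{b}/N_{a})z_{ba} \).

```python
import math


def q(h, x):
    """Q(h; x) = sum over k < h of exp(-x) x^k / k!."""
    term, total = math.exp(-x), 0.0
    for k in range(h):
        total += term
        term *= x / (k + 1)
    return total


def integrate_two_communities(z_a, z_b, z_ab, z_ba, rho_a0, rho_b0, p_h,
                              tau=10.0, dt=0.1, t_max=400.0):
    """Forward-Euler integration of the two coupled mean-field equations.

    Assumption: group A sees z_a*rho_a + z_ab*rho_b active neighbors on
    average, group B sees z_b*rho_b + z_ba*rho_a.
    Returns (times, rho_a(t), rho_b(t)) as lists.
    """
    def f(x):
        return sum(p * q(h, x) for h, p in p_h.items())

    rho_a, rho_b = rho_a0, rho_b0
    times, traj_a, traj_b = [0.0], [rho_a], [rho_b]
    for step in range(1, int(round(t_max / dt)) + 1):
        da = (1.0 - rho_a - (1.0 - rho_a0) * f(z_a * rho_a + z_ab * rho_b)) / tau
        db = (1.0 - rho_b - (1.0 - rho_b0) * f(z_b * rho_b + z_ba * rho_a)) / tau
        rho_a += dt * da
        rho_b += dt * db
        times.append(step * dt)
        traj_a.append(rho_a)
        traj_b.append(rho_b)
    return times, traj_a, traj_b


def peak_time(times, traj):
    """Time at which the finite-difference activation rate is largest."""
    rates = [(traj[i + 1] - traj[i]) / (times[i + 1] - times[i])
             for i in range(len(traj) - 1)]
    return times[max(range(len(rates)), key=rates.__getitem__)]
```

With the parameters quoted in the text (H = 4, connectivities 15 within each group, \( z_{ab}=4 \), seed fraction 0.1 in A only, \( \tau=10 \)), comparing `peak_time` for the two returned trajectories gives a direct numerical handle on the separation of the two activation peaks.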

Influence Maximization

We now focus on influence maximization in modular networks. From the algorithmic standpoint, the influence maximization problem can be stated as follows (Domingos and Richardson 2001; Kempe et al. 2003): Given a social network, an influence model, and a set of seed nodes S, let σ(S) be the expected number of nodes that will be activated by the end of the cascading process. Then, for a given budget M, the influence maximization problem is concerned with finding the set S of size M that maximizes the return σ(S). While this problem is known to be NP-hard for many influence models, several approximate methods have been developed. An important result established in Kempe et al. (2003) states that for a class of models that obey the so-called diminishing returns property, a simple hill-climbing algorithm, which works by greedily selecting the next best candidate node, yields a solution that is guaranteed to achieve at least a fraction \( 1-1/e\approx 63\% \) of the optimal value. This result was further extended to more general models (Kempe et al. 2005; Mossel and Roch 2007).

It is quite safe to assume that the diminishing returns property is satisfied in saturated, or near-saturated, niche markets. However, those models might fail to capture the dynamics of emerging markets, where the condition of sub-modular growth can be violated. Indeed, many economic and social phenomena are better described in terms of critical phase transitions, where a huge growth is observed only after some threshold conditions are met. Here we are interested in this latter case. As we demonstrate below, in such critical systems, the structural properties of networks can play a significant role in the cascading dynamics. Consequently, selection strategies that discard the community structure might result in sub-optimal solutions to the influence maximization problem. The intuition is as follows: since the critical number of nodes necessary to cause a cascade for a given connectivity grows linearly with the network size, it might be beneficial to target the smaller group first and cause an activation cascade in that group. Afterwards, the activation will propagate through the larger network, provided that the density of links between the groups is sufficiently high.

To validate this observation, we performed experiments on synthetic random graphs as well as real-world citation networks, using both integer and fractional versions of the linear threshold model. We examined several different targeting strategies. The results presented below are for random selection (RS) and for greedy selection with two different tie-breaking mechanisms for the case when there is more than one candidate for selection: a random tie-break, where one of the candidates is chosen randomly, and a maximum-degree tie-break, where the candidate with the maximum number of links is selected. We denote the corresponding algorithms as \( G_{RS} \) and \( G_{MD} \). Furthermore, we complemented each of those strategies with a variant that works in exactly the same way, except that the candidate nodes are selected only from community A. The corresponding strategies are distinguished by a superscript A: \( RS^{A} \), \( G_{RS}^{A} \), and \( G_{MD}^{A} \).
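A greedy strategy with a maximum-degree tie-break, optionally restricted to a candidate pool such as community A, can be sketched as follows. This is a hedged illustration rather than the implementation used in the experiments: the names `cascade_size` and `greedy_seeds` are hypothetical, thresholds are assumed fixed and known so that the cascade size is deterministic, and any ties that remain after the degree comparison are broken by position in the candidate list.

```python
def cascade_size(adj, thresholds, seeds):
    """Final number of active nodes under the integer-threshold rule.

    Nodes are updated one at a time; because activation is monotone on a
    fixed graph, the final active set equals that of the synchronous update.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for i in range(len(adj)):
            if i not in active and sum(j in active for j in adj[i]) >= thresholds[i]:
                active.add(i)
                changed = True
    return len(active)


def greedy_seeds(adj, thresholds, budget, candidates=None):
    """Greedy hill-climbing seed selection with a maximum-degree tie-break.

    candidates: optional list of distinct node ids to choose from (e.g. the
    nodes of community A); by default every node is a candidate.
    """
    pool = list(range(len(adj))) if candidates is None else list(candidates)
    seeds = []
    for _ in range(min(budget, len(pool))):
        remaining = [v for v in pool if v not in seeds]
        # Rank by resulting cascade size first, then by degree.
        best = max(remaining,
                   key=lambda v: (cascade_size(adj, thresholds, seeds + [v]),
                                  len(adj[v])))
        seeds.append(best)
    return seeds
```

Passing `candidates=nodes_in_A` (a hypothetical list of the node ids in community A) gives the community-restricted variant; replacing the degree term in the sort key with a random number would give a random tie-break instead.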

We constructed synthetic networks using a generative model known as the stochastic block model (Holland et al. 1983). Namely, we assume that the network is composed of m groups, with \( N_{m} \) nodes in each. Each pair of nodes within the same group is linked with probability \( p_{in} \), while pairs across the groups are linked with probability \( p_{out} \). Thus, the corresponding connectivities within and across the groups are \( z_{in}=p_{in}N_{m} \) and \( z_{out}=p_{out}(N-N_{m}) \), respectively. In the experiments below we used m = 10 and \( N_{m}=100 \), so that the total network size is N = 1,000. We assume that one of those ten groups constitutes the group A, while the remaining nine communities form the group B.
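A generator for such block-structured graphs can be sketched as below. The function names `link_probabilities` and `stochastic_block_model` are hypothetical and the code is a minimal illustration of the construction just described, not the generator used for the reported experiments.

```python
import random


def link_probabilities(z_in, z_out, m, n_m):
    """Convert target connectivities into link probabilities.

    Uses z_in = p_in * N_m and z_out = p_out * (N - N_m), with N = m * N_m.
    """
    n = m * n_m
    return z_in / n_m, z_out / (n - n_m)


def stochastic_block_model(m, n_m, p_in, p_out, rng):
    """Random graph with m equal-sized groups of n_m nodes each.

    Pairs inside a group are linked with probability p_in, pairs in
    different groups with probability p_out.
    Returns (adjacency lists, group label of each node).
    """
    n = m * n_m
    group = [i // n_m for i in range(n)]
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if group[i] == group[j] else p_out
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj, group
```

For the setting quoted in the text one would call `link_probabilities(10.0, 10.0, 10, 100)` and pass the result to `stochastic_block_model(10, 100, p_in, p_out, random.Random(0))`; treating the nodes with group label 0 as community A and all others as community B is then a matter of convention.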

In Fig. 4 we plot the fraction of activated nodes against the number of targeted nodes \( N_{0} \) for the integer-threshold model and for different selection strategies. The connectivities are set to \( z_{in}=z_{out}=10 \). The integer thresholds were chosen uniformly at random from the interval [2, 10]. One can see that the selection strategies that explicitly target nodes from the smaller community are generally much more efficient than targeting from the general population of nodes. Namely, for small and large values of \( N_{0} \), both methods have a similar performance. However, there is a window \( [N_{1}^{c}, N_{2}^{c}] \) within which the selection of A nodes is clearly superior. Recalling the analysis from the previous section, it is clear that \( N_{1}^{c} \) corresponds to the critical threshold at which the activation spreads throughout group A and then spills into the rest of the network. If one targets nodes from the general population, on the other hand, this critical effect does not come into play until later, when a larger number of nodes, \( N_{2}^{c} \), have been selected. The difference \( N_{2}^{c}-N_{1}^{c} \) depends on the particular selection strategy (e.g., greedy, random selection, etc.), as well as the size of the network. For instance, for random selection strategies, the difference can be estimated as \( \rho_{c}(N_{b}-N_{a}) \), where \( \rho_{c} \) is the critical fraction of seed nodes required to cause a global cascade (see section “Activation Cascades in Single-Community Networks”).

Fig. 4

Results for the integer-threshold LTM.

Discussion

We have examined the linear threshold model of activation cascades in structured heterogeneous networks. We demonstrated that for models with critical behavior, the structural properties of the network, and specifically its community structure, can have a strong impact on the cascading process. For two-community networks, we demonstrated that by targeting nodes from the smaller community, one can achieve a cascade with fewer seed nodes. This effect is especially significant if the sizes of the two communities are vastly different.

We note that the networks considered here mimic scenarios where innovations are introduced through a small community of early adopters. In this respect, our work is related to the organizational viscosity model of Krackhardt (1997) and McGrath and Krackhardt (2003) that describes the diffusion of ideas in an organization. In their approach, the organization is modeled as a number of interacting sub-units, with closer social ties within each unit. When the organization has a more or less homogeneous structure, a newly introduced idea cannot survive unless it is initially adopted by a large number of individuals. However, if the network describing the interaction of sub-units meets certain structural conditions, then the idea might take over the whole population even starting from a small number of initial adopters.

While the analysis shown here was for Erdos–Renyi networks, a similar behavior is also observed for communities with power-law degree distributions; see Galstyan et al. (2009). One important implication of the heavy tail is that it might affect a network's dynamical properties and, in some cases, suppress critical behavior. Finally, we note that the binary-state, single-stage model considered here might be too naive to capture certain dynamical processes on real-world networks. A number of authors have started examining multi-stage models that allow for a more fine-grained notion of influence (Bruyn and Lilien 2008; Melnik et al. 2013). Another important extension is endowing nodes with more elaborate temporal dynamics, where the activity patterns can be sustained and reinforced over time (Piedrahita et al. 2013). Understanding the impact of network modularity on more elaborate dynamical models is an interesting future problem.