1 Introduction

Understanding the way in which a disease or a piece of information spreads from person to person is of obvious practical relevance. If we were able to comprehend the mechanisms that dominate such spreading processes, we would be able to enhance the spread of valuable information through a community or hinder an outbreak of an infectious disease. The similarities between these two processes, epidemics and information diffusion (also referred to as rumor spreading [11–13, 16, 19]), have long been recognized, and the two fields have evolved in parallel, freely borrowing ideas and concepts from each other [9, 10].

In epidemics it has become clear that some individuals, the so-called “superspreaders”, play a dominant role in the course of an epidemic [20]. Intuitively, one expects similarly influential individuals to be present in information diffusion as well, and recent years have witnessed a growing interest in identifying them [1, 3, 6, 24, 26]. Successful approaches have focused on studying the effect that different network-based centrality measures have on rumor spreading. In particular, one recent seminal approach [17] identified the k-core as the best predictor of influence, outperforming degree centrality or betweenness in the context of an epidemic spreading process. This insight has prompted many subsequent works, which mainly discuss under which circumstances the k-core actually predicts a node’s disease spreading capabilities [7] or propose alternative measures of influence [8, 18].

Following the original proposal of Kitsak et al. [17], Borge-Holthoefer and Moreno [4] studied rumor spreading dynamics to learn whether the k-core could also predict spreading influence in that setting. Surprisingly, their results indicate that a rumor’s success (measured as the number of individuals that learn about the rumor by the end of the spreading dynamics) is topology-independent: no matter who in a network triggers the rumor, the final number of nodes who learn about it will be the same (given the same spreading parameters). Additionally, central nodes (those at the highest core levels) behave as firewalls, short-circuiting the capacity of the rumor to spread further. This theoretical prediction is clearly at odds with empirical evidence and points to a shortcoming of theoretical models that must be overcome.

The development of the Web 2.0 and the growing popularity of online social networks have not only had a tremendous impact on our daily lives, but have also had the beneficial consequence of generating detailed data on social communication patterns, which can ultimately inspire and guide the development of more realistic models. In this paper we try to fill the gap between observations from real systems and theoretical predictions by introducing some simple modifications to previously proposed models [21, 22]. The resulting models better approximate the behavior of users as observed in online social networks, in particular the existence of influential nodes with larger diffusion capacities, an important feature not accounted for by current rumor spreading models.

Our analysis starts from the simple empirical observation [2, 5, 23] that individuals display complex activity patterns both online and offline and, in particular, are not active around the clock. This fact has two possible interpretations. On the one hand, users who are actually spreading the rumor are active only at specific times, and only then are they able to participate in the diffusion process. On the other hand, an individual’s choice to become active and participate in a specific information cascade can be seen as a demonstration of interest in the topic and of his/her willingness to spread it. Inspired by these two interpretations, we derive two different rumor diffusion models.

The first model incorporates the differences in the activity of the individuals responsible for spreading the rumor. Each spreader is assigned a randomly chosen probability of being active at a given time. In this context, we study the effects of heterogeneity [23] in the activation probability by drawing values from three different probability distributions, ranging from a uniform to a long-tailed one. In a more realistic version, following the idea that more active users usually also have a central role in the topology of the network, we relate the activity of each individual to its degree.

The second model takes into account the fact that an individual could learn the rumor without actually spreading it further. This is, for example, what happens in most online social networks, in which followers receive pieces of information from those they follow but do not always (indeed, rarely) transmit the news further. We therefore introduce the possibility that a person who comes into contact with the rumor does not spread it. This approach is complementary to the previous one, as here it is the ignorants, and not the spreaders, who can be inactive.

In the remainder of the paper, we show that even though these alternatives introduce only small and intuitive changes, they shed light on the complex social mechanisms at work in real social systems and, at least qualitatively, reproduce the heterogeneities observed in Twitter data. The rest of the manuscript is organized as follows: in Sect. 2 we present a general framework for rumor spreading on networks, while Sects. 2.1 and 2.2 present the two modified models and the results of numerical simulations. Finally, we draw our conclusions in Sect. 3.

2 General Modeling Framework

In classical rumor spreading models on networks, each of the N nodes of a network can be in one of three possible states. A node holding a rumor and willing to transmit it is called a spreader. Nodes that are unaware of the rumor are referred to as ignorants, while those that already know it but are not willing to spread it further are called stiflers. We denote the density of ignorants, spreaders, and stiflers at time t as i(t), ρ(t) and r(t), respectively, with i(t)+ρ(t)+r(t)=1, ∀t. The spreading process takes place along the links connecting spreaders and ignorants. At each time step, spreaders contact all of their neighboring nodes. In the simplest case, whenever a spreader j contacts a node n that is ignorant, the latter becomes a spreader with a fixed probability λ. Otherwise, if n is already a spreader or a stifler, node j turns into a stifler with probability α. Schematically, the general model can be represented by the transitions:

$$ \rho_{j}+i_{n}\xrightarrow{\lambda}\rho_{j}+\rho_{n}, \qquad \rho_{j}+\rho_{n}\xrightarrow{\alpha}r_{j}+\rho_{n}, \qquad \rho_{j}+r_{n}\xrightarrow{\alpha}r_{j}+r_{n} $$

where the initial conditions are set such that i(0)=1−1/N, ρ(0)=1/N and r(0)=0. In addition, and without loss of generality, we set λ=1 unless other values are explicitly stated.
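As an illustration of the update rule just described, the following is a minimal Python sketch of a single run of the classical model on an arbitrary network. It assumes an adjacency dictionary `adj` (node to list of neighbors), a random visiting order of spreaders within each time step, and that a spreader stops making contacts in the step in which it turns into a stifler; none of these implementation details is fixed by the text, and the function name `run_rumor` is hypothetical.

```python
import random

IGNORANT, SPREADER, STIFLER = "i", "s", "r"


def run_rumor(adj, seed, lam=1.0, alpha=0.5, rng=None, max_steps=100_000):
    """One run of the classical rumor model; returns the final stifler density."""
    rng = rng or random.Random()
    state = {node: IGNORANT for node in adj}
    state[seed] = SPREADER
    spreaders = {seed}
    steps = 0
    # max_steps guards against spreaders that can never stop (e.g. isolated nodes).
    while spreaders and steps < max_steps:
        steps += 1
        # Only nodes that were spreaders at the start of the step act in it.
        order = list(spreaders)
        rng.shuffle(order)
        for j in order:
            for n in adj[j]:  # a spreader contacts all of its neighbors
                if state[n] == IGNORANT:
                    if rng.random() < lam:
                        state[n] = SPREADER
                        spreaders.add(n)
                elif rng.random() < alpha:
                    # n already knows the rumor: j loses interest and stops here
                    state[j] = STIFLER
                    spreaders.discard(j)
                    break
    return sum(s == STIFLER for s in state.values()) / len(adj)
```

For example, `run_rumor({0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}, seed=0)` returns 1.0: with λ=1 the hub converts every leaf in the first step, and every node that has heard the rumor eventually becomes a stifler.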

For each alternative model presented, extensive numerical simulations of rumor propagation have been carried out on top of a real-world Twitter following/follower network [15]. Starting from an initial configuration in which all nodes except the seed are ignorants, we perform S=10 simulations. This is repeated for each node, i.e. every vertex of a network of N nodes acts as the initial seed S times, to obtain statistically significant results. In this way, for each node i we obtain the average final density of stiflers in the network, \(r^{i}_{\infty}\). This quantity measures the spreading capacity of node i, that is, how deeply the rumor penetrated the network when node i was the initial seed:

$$ r^{i}_{\infty} = \frac{1}{S}\sum _{m=1}^{S}r^{i,m}_{\infty} $$
(1)

where \(r^{i,m}_{\infty}\) represents the final density of stiflers for a particular run m with origin at node i. With this information at hand for all nodes, we coarse-grain the individual \(r^{i}_{\infty}\)’s into classes of nodes according to their core number. Thus, \(r_{\infty}(k_{S})\) represents the average stifler density over all runs seeded at a node with core index \(k_{S}\):

$$ r_{\infty} (k_{S} ) = \sum_{i \in \varUpsilon_{k_{S}}} \frac{r^{i}_{\infty}}{N_{k_{S}}} $$
(2)

where \(\varUpsilon_{k_{S}}\) is the set of all \(N_{k_{S}}\) nodes with core index \(k_{S}\).
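The averaging in Eqs. (1) and (2) requires the core index \(k_{S}\) of every node. The Python sketch below assumes the network is stored as an adjacency dictionary and that the S final stifler densities of each seed have already been collected in a dictionary; the helper names `core_numbers` and `spreading_by_core` are hypothetical. The core decomposition uses the standard peeling procedure (repeatedly removing nodes of degree at most k); graph libraries such as NetworkX offer an equivalent `core_number` function.

```python
from collections import defaultdict
from statistics import mean


def core_numbers(adj):
    """k-core index of every node, obtained by peeling nodes of degree <= k."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Raise k to the smallest degree left; it never decreases while peeling.
        k = max(k, min(degree[v] for v in remaining))
        to_peel = [v for v in remaining if degree[v] <= k]
        while to_peel:
            v = to_peel.pop()
            if v not in remaining:
                continue  # already peeled through another neighbor
            remaining.remove(v)
            core[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        to_peel.append(u)
    return core


def spreading_by_core(runs_per_seed, core):
    """Eq. (1) per seed, then Eq. (2): mean of r_inf^i over nodes with equal k_S."""
    r_node = {i: mean(runs) for i, runs in runs_per_seed.items()}
    classes = defaultdict(list)
    for i, r_i in r_node.items():
        classes[core[i]].append(r_i)
    return {k_s: mean(values) for k_s, values in sorted(classes.items())}
```

On a triangle with one pendant node attached, the three triangle nodes have core index 2 and the pendant node has core index 1, so the result has one entry per core class.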

Figure 1 compares the values of \(r_{\infty}(k_{S})\) obtained from numerical simulations of the above rumor spreading model with the observed fraction of users \(N_{c}/N\) reached by cascades originating at nodes with core index \(k_{S}\), obtained by analyzing Twitter usage data from the Spanish Indignados movement (see [4, 15] for details on how the data were extracted and analyzed). The differences observed in this plot are striking. Even though the model was run on exactly the same network, the theoretical prediction is completely insensitive to the core index of the originating node, while the empirical data show a clear correlation between belonging to higher cores and reaching larger numbers of nodes. This difference in behavior clearly shows that something fundamental is lacking in the theoretical model.

Fig. 1

(Color online) Density of stiflers at the end of the rumor spreading, \(r_{\infty}(k_{S})\), for rumors originating at a node in the \(k_{S}\) k-core, for the general model of rumor spreading (squares), and the empirical fraction \(N_{c}/N\) of users reached by cascades originating at nodes with k-core \(k_{S}\) (circles), as extracted from real Twitter data (for details see [4, 15]). Numerical simulations were run on the same empirical Twitter follower network.

2.1 Model I: Human Activity and Temporal Patterns

We next consider the possibility that nodes are not always available to take part in a given communication exchange. Each individual is active with a certain probability, \(a_{i}\), which affects his/her behavior as a spreader. Thus, on top of the constraints of the basic framework presented above, we assume that a spreader only attempts to spread the rumor when it is active. As a consequence, the transition from the class of ignorants to the class of spreaders happens less often.
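The Model I rule amounts to one extra check in a plain network run: in every time step a spreader j first draws whether it is active, with probability \(a_{j}\), and only an active spreader contacts its neighbors. This is a minimal Python sketch under assumptions not fixed by the text: activities are passed as a dictionary `activity` of values in (0, 1], spreaders are visited in random order, and the function name `run_model_one` is hypothetical. Returning the number of steps as well makes the slowdown caused by inactivity visible.

```python
import random


def run_model_one(adj, seed, activity, lam=1.0, alpha=0.5, rng=None, max_steps=100_000):
    """One Model I run; returns (final stifler density, number of time steps)."""
    rng = rng or random.Random()
    state = {v: "i" for v in adj}
    state[seed] = "s"
    spreaders = {seed}
    steps = 0
    # A spreader with activity 0 would never stop, hence the max_steps guard.
    while spreaders and steps < max_steps:
        steps += 1
        order = list(spreaders)
        rng.shuffle(order)
        for j in order:
            if rng.random() >= activity[j]:
                continue  # Model I: an inactive spreader skips this step entirely
            for n in adj[j]:
                if state[n] == "i":
                    if rng.random() < lam:
                        state[n] = "s"
                        spreaders.add(n)
                elif rng.random() < alpha:
                    state[j] = "r"
                    spreaders.discard(j)
                    break
    return sum(s == "r" for s in state.values()) / len(adj), steps
```

Only the spreaders' activity matters here: contacted ignorants are converted regardless of their own activity, which matches the asynchronous setting discussed below.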

It is worth mentioning that our approach is rooted in the observation that human activity patterns are mostly heterogeneous, and therefore individuals are not always active [23], nor is their activity distributed randomly over time [2, 14, 25]. However, we assume that nodes in the network still have memory of who their potential neighbors are: although not all the links of a given node are concurrently active, the set of available neighbors is predefined by the underlying static (aggregated) topology. A more accurate description would require considering that the topology is shaped by the activity of the nodes, so that the resulting time-varying networks are activity-driven [23]. In the latter case, the interactions between the different classes of nodes in the system would still be activity-driven, but no memory of the static topology would be present, as the interaction structure is redefined at each time step. Whether or not both mechanisms lead to similar behavior is a matter that deserves further investigation.

On the other hand, note that being active or not has no effect on the rumor’s recipients (ignorants). This mechanism is specific to asynchronous communication systems such as Twitter, FedEx, email or SMS, where information can be sent without requiring the collaboration of the recipient. By contrast, for synchronous systems such as phone calls or instant messaging, which require both the source and the target of a message to be active at the same time, such a scheme would not suffice.

Here we explore three possibilities for the activity distribution: (a) uniform, P(a)∼c; (b) exponential, \(P (a )\sim e^{-a/a_{c}}\); and (c) power-law, \(P(a)\sim a^{-\gamma}\). Interestingly, these distributions yield completely different results, as illustrated in Fig. 2. Increasing the heterogeneity of activity patterns moves the distribution of outbreak sizes, \(r_{\infty}(k_{S})\), closer to the empirical results, highlighting the fact that heterogeneity is a fundamental factor in real information spreading processes. A uniform activity distribution (lower panel) completely flattens the spreading capabilities, regardless of whether nodes are in a topologically relevant region or not. This is in good agreement with [4], the only difference being the time the system needs to reach a final state (that is, the activation probabilities delay the process significantly). An exponential distribution introduces some asymmetry in the activity distribution, which slightly affects the spreading results (middle panel). Finally, a power-law probability distribution introduces heterogeneity in the spreading success: the higher \(k_{S}\), the higher the spreading capacity, just as found empirically [5, 15].
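To experiment with the three activity distributions, activities must be drawn from densities restricted to values that are valid probabilities. The Python sketch below uses inverse-transform sampling on an assumed support [a_min, 1] with a_min = 10^-3 (the text does not state a lower cutoff), with \(a_{c}=0.1\) and γ=2.5 as in Fig. 2; the function name `sample_activity` and the cutoff are assumptions for illustration.

```python
import math
import random


def sample_activity(kind, n, a_min=1e-3, a_c=0.1, gamma=2.5, rng=None):
    """Draw n activities on [a_min, 1] by inverting each truncated CDF."""
    rng = rng or random.Random()
    out = []
    for _ in range(n):
        u = rng.random()
        if kind == "uniform":            # P(a) ~ const
            a = a_min + u * (1.0 - a_min)
        elif kind == "exponential":      # P(a) ~ exp(-a / a_c)
            lo, hi = math.exp(-a_min / a_c), math.exp(-1.0 / a_c)
            a = -a_c * math.log(lo - u * (lo - hi))
        elif kind == "power-law":        # P(a) ~ a^(-gamma), gamma != 1
            e = 1.0 - gamma
            a = (a_min**e + u * (1.0 - a_min**e)) ** (1.0 / e)
        else:
            raise ValueError(f"unknown distribution: {kind}")
        out.append(a)
    return out
```

With these parameters the sample means are roughly 0.5 (uniform), 0.1 (exponential) and 0.003 (power law), so the power-law case combines many nearly inactive nodes with a few very active ones.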

Fig. 2

(Color online) Density of stiflers at the end of the rumor spreading, \(r_{\infty}(k_{S})\), for rumors originating at a node of k-core \(k_{S}\) in the activation probability model. Three different probability distribution functions are used: a power-law with exponent γ=2.5 (upper panel), an exponential with \(a_{c}=0.1\) (middle panel) and a uniform distribution (lower panel).

The importance of the spreader-to-stifler rate is revealed in the heterogeneous scenario, where α sets the relevant timescale. For high values of α, spreaders quickly become stiflers and the rumor does not have the chance to reach a significant fraction of the population, while lower values of α easily allow for successful dissemination. Furthermore, it should be noted that we are assigning activity levels entirely at random, without any relation between topological features and activity probabilities. This means that a poorly connected node is just as likely to be highly active as a node with high degree. However, it has been shown previously [23] that activity distributions are correlated with the observed degree distribution. The simplest way of implementing this correlation is to assign to node i the activity probability \(a_{i}=k_{i}/k_{\max}\).
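The degree-correlated assignment is a one-line rule; a Python sketch over an adjacency dictionary, with the hypothetical helper name `degree_activity`, could look like this. Unlike the random assignments, it is deterministic once the network is fixed.

```python
def degree_activity(adj):
    """Activity a_i = k_i / k_max, so the highest-degree node is always active."""
    k_max = max(len(nbrs) for nbrs in adj.values())
    # Nodes with no links would get a_i = 0 and could never spread.
    return {v: len(nbrs) / k_max for v, nbrs in adj.items()}
```

On a star with four leaves, the hub gets activity 1.0 and each leaf gets 0.25; when degrees are broadly distributed, most nodes receive activities far below 1 under this rule.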

Figure 3 illustrates the results of this scenario. The strong heterogeneity of the degree distribution is clearly reflected. Rumors triggered from low-degree nodes (which necessarily have low \(k_{S}\)) die out soon, because the nodes they reach are almost never active. On the contrary, high-degree nodes (which are more likely to belong to a high k-core) persistently forward messages, turning any rumor into system-wide knowledge. Note that spreading is almost identical regardless of α, in stark contrast to the upper panel of Fig. 2 (where α determines the shape of the spreading curve), possibly indicating that higher-order correlations also play an important role.

Fig. 3

(Color online) Fraction of stiflers at the end of the rumor spreading, \(r_{\infty}(k_{S})\), for rumors originating at a node of k-core \(k_{S}\) when activation probabilities are proportional to node degree. Three different values of α are used; the underlying network is the empirical Twitter follower network from Ref. [15].

Figure 4 compares topology-dependent and random assignments of the activation probabilities. In this case, all the curves have been obtained with a power-law activity distribution. However, in one of them (blue diamonds) the activity of each node is proportional to its degree (\(a_{i}=k_{i}/k_{\max}\)), whereas in the other curves activation probabilities are assigned at random and thus independently of the topological features of the nodes. As in Fig. 2, in the randomly assigned case spreading is strongly affected by α, whereas with degree-dependent activation probabilities the results are largely independent of α.

Fig. 4

(Color online) Fraction of stiflers at the end of the rumor spreading, \(r_{\infty}(k_{S})\), for rumors originating at a node of k-core \(k_{S}\) when the activation probability is proportional to node degree (diamonds) or randomly distributed (squares, triangles and circles). Network topology and the other parameters are the same as in Fig. 3.

2.2 Model II: Apathy

The analysis of real data from online social networks shows that most of the time users do not react to received messages [5]. One possible interpretation is that they have been informed of a rumor but chose not to spread it. This interpretation suggests another ingredient that might be missing from classical rumor spreading models and that might help bring them closer to reality: the possibility that an ignorant is apathetic, moves directly to the stifler state and takes no further part in the spreading dynamics. As noted before, this kind of behavior is common in online social networks like Twitter, in which one receives messages that are rarely spread further. We incorporate this new element by introducing the probability p that an ignorant is interested in the topic and decides to diffuse it. In this scenario, when a spreader contacts an ignorant, the latter turns into a spreader with probability λp and into a stifler with probability (1−p)λ. The transitions allowed by our model are then:

$$ \rho_{j}+i_{n}\xrightarrow{\lambda p}\rho_{j}+\rho_{n}, \qquad \rho_{j}+i_{n}\xrightarrow{\lambda(1-p)}\rho_{j}+r_{n}, \qquad \rho_{j}+\rho_{n}\xrightarrow{\alpha}r_{j}+\rho_{n}, \qquad \rho_{j}+r_{n}\xrightarrow{\alpha}r_{j}+r_{n} $$

It should be noted that Model II is a natural counterpart of Model I presented in Sect. 2.1, in the sense that it also assigns activity probabilities to each node. The main difference is that this uniform probability p is assigned to ignorant individuals and determines whether or not they choose to participate in the spreading process. A parallel can also be drawn with epidemic spreading, where a person may become immune to a disease upon coming into contact with a pathogen, before developing symptoms or spreading it further.
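The Model II contact rule can be sketched as a single run on a network in Python. A contacted ignorant draws one uniform number u: u < λp makes it a spreader, λp ≤ u < λ makes it a stifler directly (probability λ(1−p)), and otherwise it stays ignorant. The adjacency-dictionary input, the random visiting order of spreaders, and the name `run_model_two` are assumptions for illustration; setting p = 1 recovers the classical rule.

```python
import random


def run_model_two(adj, seed, p, lam=1.0, alpha=0.5, rng=None, max_steps=100_000):
    """One run of the apathy model (Model II) on a network; returns r_inf."""
    rng = rng or random.Random()
    state = {v: "i" for v in adj}
    state[seed] = "s"
    spreaders = {seed}
    steps = 0
    while spreaders and steps < max_steps:
        steps += 1
        order = list(spreaders)
        rng.shuffle(order)
        for j in order:
            for n in adj[j]:
                if state[n] == "i":
                    u = rng.random()
                    if u < lam * p:          # interested: becomes a spreader
                        state[n] = "s"
                        spreaders.add(n)
                    elif u < lam:            # apathetic: hears it, never passes it on
                        state[n] = "r"
                    # otherwise (probability 1 - lam) n stays ignorant
                elif rng.random() < alpha:   # n already knows: j loses interest
                    state[j] = "r"
                    spreaders.discard(j)
                    break
    return sum(s == "r" for s in state.values()) / len(adj)
```

In the extreme case p = 0 the rumor never travels beyond the seed's direct neighbors, since every contacted ignorant becomes a stifler immediately.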

The behavior of the system can be better understood analytically by writing the mean-field rate equations governing its time evolution in the homogeneous mixing approximation:

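$$ \frac{di(t)}{dt} = -\lambda p\bar{k}\,i(t)\rho(t) - \lambda(1-p)\bar{k}\,i(t)\rho(t) $$
(3)
$$ \frac{d\rho(t)}{dt} = \lambda p\bar{k}\,i(t)\rho(t) - \alpha\bar{k}\,\rho(t)\left[\rho(t)+r(t)\right] $$
(4)
$$ \frac{dr(t)}{dt} = \alpha\bar{k}\,\rho(t)\left[\rho(t)+r(t)\right] + \lambda(1-p)\bar{k}\,i(t)\rho(t) $$
(5)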

with the initial conditions i(0)=1−1/N, ρ(0)=1/N, r(0)=0, and where \(\bar{k}\) represents the number of contacts each spreader makes per unit time. The first term on the right-hand side of Eq. (3) accounts for the density of ignorants that turn into spreaders after an interaction, whereas the second term models the ignorant-to-stifler transition with probability (1−p)λ.
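The rate equations can also be integrated numerically, which provides an independent check on the closed-form condition for the final state. The Python sketch below uses classical fourth-order Runge-Kutta and writes the right-hand sides in the form implied by the description above: ignorants are lost at total rate λk̄iρ (a fraction p to spreaders, 1−p directly to stiflers) and spreaders become stiflers at rate αk̄ρ(ρ+r). The step size, time horizon and function names are assumptions; λ=1, α=0.5 and k̄=4 match the values quoted for Fig. 5.

```python
import math


def rates(y, lam, alpha, p, k_bar):
    """Mean-field right-hand sides for y = (i, rho, r) in the apathy model."""
    i, rho, r = y
    to_spreader = lam * p * k_bar * i * rho            # ignorant -> spreader
    to_stifler = lam * (1.0 - p) * k_bar * i * rho     # apathetic ignorant -> stifler
    stifling = alpha * k_bar * rho * (rho + r)         # spreader -> stifler
    return (-to_spreader - to_stifler,
            to_spreader - stifling,
            stifling + to_stifler)


def integrate(lam=1.0, alpha=0.5, p=1.0, k_bar=4, n=10**5, dt=0.01, t_max=200.0):
    """Fourth-order Runge-Kutta from i(0) = 1 - 1/n, rho(0) = 1/n, r(0) = 0."""
    y = (1.0 - 1.0 / n, 1.0 / n, 0.0)

    def shifted(h, k):
        return tuple(a + h * b for a, b in zip(y, k))

    for _ in range(round(t_max / dt)):
        k1 = rates(y, lam, alpha, p, k_bar)
        k2 = rates(shifted(dt / 2, k1), lam, alpha, p, k_bar)
        k3 = rates(shifted(dt / 2, k2), lam, alpha, p, k_bar)
        k4 = rates(shifted(dt, k3), lam, alpha, p, k_bar)
        y = tuple(a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))
    return y


def eq6_residual(r_inf, lam, alpha, p):
    """How far r_inf is from r = 1 - exp(-(1 + lam*p/alpha) r); zero if exact."""
    return r_inf - (1.0 - math.exp(-(1.0 + lam * p / alpha) * r_inf))
```

Because the three rates sum to zero, the integration preserves i+ρ+r=1 up to rounding, and by the end of the horizon ρ has decayed to essentially zero.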

Recalling that i(t)+ρ(t)+r(t)=1, the system of Eqs. (3)–(5) can be treated analytically in the infinite-time limit, where ρ(∞)=0, yielding:

$$ r_{\infty} = 1-e^{- (1+ \frac{\lambda}{\alpha} p ) r_{\infty}}. $$
(6)
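The step leading to Eq. (6) can be filled in as follows, assuming Eqs. (3) and (5) read \(\dot{i} = -\lambda\bar{k}\,i\rho\) (the two terms of Eq. (3) combined) and \(\dot{r} = \alpha\bar{k}\,\rho(\rho+r) + \lambda(1-p)\bar{k}\,i\rho\), and taking \(i(0)\simeq 1\), \(r(0)=0\) for large N. Dividing one equation by the other and using \(\rho + r = 1 - i\):

$$ \frac{dr}{di} = -\frac{\alpha(1-i) + \lambda(1-p)\,i}{\lambda i} = -\frac{\alpha}{\lambda i} + \frac{\alpha}{\lambda} - (1-p) $$

Integrating from i = 1, r = 0 gives

$$ r(t) = -\frac{\alpha}{\lambda}\ln i(t) + \left(\frac{\alpha}{\lambda} - 1 + p\right)\left[i(t) - 1\right] $$

and, since ρ(∞)=0 implies \(i_{\infty} = 1 - r_{\infty}\),

$$ \left(\frac{\alpha}{\lambda} + p\right) r_{\infty} = -\frac{\alpha}{\lambda}\ln\left(1 - r_{\infty}\right) \quad\Longrightarrow\quad r_{\infty} = 1 - e^{-\left(1 + \frac{\lambda}{\alpha}p\right) r_{\infty}} $$

which is Eq. (6). For p = 1 it reduces to the classical mean-field result \(r_{\infty} = 1 - e^{-(1+\lambda/\alpha) r_{\infty}}\).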

The average total stifler density \(r_{\infty}\) for various values of p, obtained by numerically solving this transcendental equation, is shown in Fig. 5. We also performed a series of Monte Carlo (MC) simulations in the homogeneous mixing limit. At t=0 the entire population is ignorant except for a small fraction (≃1/N) of spreaders. At each time step, each spreader contacts \(\bar{k}\) individuals chosen at random from the entire population. If the chosen individual is an ignorant, it becomes a spreader with probability λp or moves directly to the stifler state with probability (1−p)λ. Otherwise, when a spreader comes into contact with a stifler or another spreader, it turns into a stifler with probability α. When the spreading process reaches the absorbing state ρ(t)=0, the final density of stiflers is recorded. The simulation results are also plotted in Fig. 5 for comparison with the analytical solution. The agreement between the two approaches is striking and confirms that our analysis is not missing any fundamental ingredient.
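Both curves of Fig. 5 can be reproduced in outline with a short Python sketch: a bisection solver for the non-trivial root of Eq. (6) (any bracketing root finder would do; bisection is used here only for transparency) and a homogeneous-mixing Monte Carlo run following the rules just listed. Assumptions not fixed by the text: contacts are drawn uniformly from everyone except the spreader itself, a spreader stops contacting once it becomes a stifler, and newly created spreaders act from the next step on; the names `solve_eq6` and `mc_homogeneous` are hypothetical.

```python
import math
import random


def solve_eq6(lam, alpha, p, tol=1e-12):
    """Non-trivial root of r = 1 - exp(-(1 + lam * p / alpha) * r), for p > 0."""
    c = 1.0 + lam * p / alpha

    def f(r):
        return r - 1.0 + math.exp(-c * r)

    lo, hi = 1e-9, 1.0  # f(lo) < 0 because c > 1, and f(1) = exp(-c) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mc_homogeneous(n, lam, alpha, p, k_bar=4, rng=None):
    """One Monte Carlo run of Model II under homogeneous mixing; returns r_inf."""
    rng = rng or random.Random()
    state = ["i"] * n
    state[0] = "s"  # a single initial spreader, i.e. rho(0) = 1/N
    spreaders = [0]
    n_stiflers = 0
    while spreaders:
        next_spreaders = []
        for j in spreaders:
            still_spreading = True
            for _ in range(k_bar):
                m = rng.randrange(n - 1)
                m = m + 1 if m >= j else m  # uniform over everyone except j
                if state[m] == "i":
                    u = rng.random()
                    if u < lam * p:
                        state[m] = "s"
                        next_spreaders.append(m)
                    elif u < lam:
                        state[m] = "r"
                        n_stiflers += 1
                elif rng.random() < alpha:
                    state[j] = "r"
                    n_stiflers += 1
                    still_spreading = False
                    break
            if still_spreading:
                next_spreaders.append(j)
        spreaders = next_spreaders
    return n_stiflers / n
```

For λ=1, α=0.5 and p=1, `solve_eq6` gives roughly 0.94, and the average of a few Monte Carlo runs with a few thousand individuals lands close to it.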

Fig. 5

(Color online) Fraction of stiflers at the end of the rumor spreading, \(r_{\infty}\), for different values of p, compared with the theoretical prediction of Eq. (6). Numerical results are averages over \(10^{3}\) stochastic runs. In both cases λ=1.0, α=0.5 and \(\bar{k} = 4\).

Although the addition of a constant parameter p (every node is assigned the same p) is a crude approximation to the interest that ignorants might have in becoming spreaders, it has profound implications for the system dynamics when compared to the standard setup. Figure 6 shows the behavior of the system with the new rule, for fixed λ=1, α=0.5 and different values of p. As with the power-law activity distribution in Model I and in real data [5, 15, 23], a strong correlation between the k-core of the seed and the final outcome of the spreading is observed. In particular, and although we have made no effort to fit this value, it is clear that a very low probability (\(p=10^{-3}\)) already yields close-to-real behavior. Although this value might seem low (only one in a thousand contacted individuals forwards the rumor), one must consider that this is the probability that an individual will choose to participate in any of the rumors he or she observes. Most Twitter users commonly follow on the order of hundreds of other individuals, so the number of pieces of content they are exposed to daily can easily be on the order of thousands or tens of thousands, of which they are only able, or willing, to participate in a few.

Fig. 6

(Color online) Fraction of stiflers at the end of the rumor spreading, \(r_{\infty}(k_{S})\), for rumors originating at a node of k-core \(k_{S}\) in the ignorant-to-stifler transition model for different p values. For each value of p, both the average and the maximum value over the numerical simulations are presented. Network topology and the other parameters are the same as in Fig. 3.

3 Conclusions

Online social networks are becoming increasingly central in our lives as they come to permeate our daily activity. It then comes as no surprise that they have been welcomed by mass social movements around the world as unique platforms for the diffusion of new ideas and even for the coordination of large numbers of individuals. Understanding the forces that drive the behavior of individuals interacting in these networks is thus one of the great scientific challenges of the coming years.

One interesting aspect is how ideas are shared between individuals and which conditions allow for their wide dissemination. In this context, several works have studied how a rumor can spread in a population of ignorant individuals but, owing to the changes in the way these tools allow us to communicate, most of those works cannot capture the details of rumor dynamics in such large-scale social systems.

Driven by data from a microblogging online platform, we propose two modifications to classical rumor spreading models that are able to qualitatively reproduce the observed differences in the number of individuals reached by the rumor when the seed is located in the most connected circles of the network or in its periphery. The models we present are based on the observation that individuals, both spreaders and ignorants, are not always active in the network. Each model then implements a different effective mechanism consistent with this fact: Model I assigns activity probabilities to each node and allows spreading to occur only when a spreader node is active, while Model II assumes that each node has a finite probability of being interested in spreading each specific rumor and would otherwise choose not to participate in the diffusion process. Both variations have proved effective in bringing the classical model one step closer to reality.

In the case of Model I, numerical results highlight that the more heterogeneous the activation patterns are, the more faithfully we are able to reproduce real data. Moreover, if, as a second approximation, we relate the activity of a node to its degree (as higher degrees are commonly correlated with higher levels of activity in the network), the results become largely independent of the spreader-to-stifler rate α. For Model II, we were also able to give an analytical expression for the final density of stiflers in the system. Interestingly, the numerical simulations suggest that close-to-real results are obtained when the probability for an ignorant to be interested in the rumor is very low, another feature also observed in real social networks.

The results presented in this paper clearly show that classical rumor spreading models fall severely short in their ability to approximate reality. We have shown that even small, empirically based modifications can significantly increase their level of realism. In particular, our results shed some light on the interplay between technology and human interactions that lies at the origin of some of the complex behaviors we observe daily. With this work we have taken a significant first step toward a deeper understanding of how ideas spread through our online and offline social networks and help shape current events and society as a whole.