
1 Introduction

The emergence of online social platforms such as Twitter, Weibo, Facebook, Instagram, WeChat, and QQ has changed our daily lives. An information diffusion cascade occurs when users see messages posted by others on these platforms and then retweet them. Cascade information can play an important role in controlling and predicting information diffusion. However, as the volume of messages on these platforms grows, predicting these complex and dynamic information diffusion cascades becomes increasingly challenging.

Much work so far has focused on information diffusion cascade prediction. It falls into three main categories: feature-based methods, generative process methods, and deep learning-based methods. Information diffusion cascade prediction still faces many challenges: (1) the dynamic graph structure of the network is unknown, since real social networks are constantly changing, which greatly limits the generalizability of a model; (2) accurate prediction must take into account not only structural information but also temporal information.

Here, we introduce an attention-based temporal model called CasGAT to predict information diffusion cascades, which can handle network structure predictions across different time periods. It only models the relationships between nodes and their neighbors in the graph, without first acquiring the structural features of the entire graph. Therefore, it can handle tasks such as evaluating the model on graphs that are completely unseen during training. In addition, CasGAT combines deep learning methods, graph attention networks, and time decay to strengthen our framework.

In the rest of this paper, Sect. 2 reviews the related work. In Sect. 3, we describe the main aspects of the CasGAT methodology in detail. Experimental evaluations quantifying the benefits of our approach are presented in Sect. 4, and Sect. 5 concludes the paper and outlines directions for future work.

2 Related Work

In this section, we briefly review the research on information diffusion cascade prediction. In general, existing methods fall into three main categories.

2.1 Feature-Based Method

The performance of feature-based methods depends heavily on the quality of hand-crafted features, such as those describing early adopters and network structure. These features are mainly extracted from temporal information [8, 10], structural information [1, 12], content [5, 11], and properties of early adopters [4, 6]. Faced with complex problems and massive data, it is difficult to systematically design features that effectively capture the complex characteristics of the relevant information.

2.2 Generative Process Method

Generative process methods model the intensity function of each message's arrival process independently. Generally, they observe each event and learn the parameters by maximizing the probability of the events occurring within the observation time window. Shen et al. [9] used a reinforced Poisson process to model three factors in the social network: the influence of nodes, the decay of information popularity, and the "rich get richer" mechanism. These methods offer good interpretability, but they cannot fully exploit the implicit information in cascade dynamics to obtain satisfactory predictions.

2.3 Deep Learning-Based Method

Deep learning-based methods are inspired by the recent success of deep learning in many fields, and cascade prediction with deep neural networks has achieved significant performance improvements. DeepCas [7], the first deep learning-based information cascade predictor, converts a cascade graph into sequences of nodes by random walks and automatically learns a representation of each graph. Cao et al. [2] proposed a deep learning-based process that inherits the high interpretability of the Hawkes process and the high predictive power of deep learning methods. However, due to sampling bias in cascades and the inefficiency of local structure embedding, these methods struggle to learn cascade structural information and to model its dynamics.

3 Model

The information cascade is affected by many factors. The overall architecture is shown in Fig. 1. The model takes a cascade graph as input and predicts the increment size of the cascade as output. Next, we describe the components of the model one by one.

Fig. 1. The overview of our model architecture.

3.1 Basic Definitions

We denote a social network (e.g., the Weibo network) as \(\mathcal {G} = (\mathcal {V}, \mathcal {E})\), where \(\mathcal {V}\) is the set of all users of \(\mathcal {G}\) and \(\mathcal {E}\subseteq \mathcal {V} \times \mathcal {V}\) is the set of all relationships between users. Suppose we have M messages in the social network, and let \({C}=\left\{ C_m \mid 1 \le {m} \le {M}\right\} \) be the set of M information cascades. We use \(g_{t_{i}}^{i}=\left( v_{t_{i}}^{i}, e_{t_{i}}^{i}\right) \) to represent the cascade path for the i-th diffusion step of message m at time \(t_{i}\), where \(v_{t_{i}}^{i}\) is a subset of \(\mathcal {V}\) and \(e_{t_{i}}^{i}\) represents the relationships between the users in \(v_{t_{i}}^{i}\). The objective of this paper can be expressed as follows: given the information cascades \({C}=\left( {C}_{1}, {C}_{2}, \ldots , {C}_{{M}}\right) \) observed on a social network up to time T, we want to predict the increment size \(\varDelta S_{m}\) of each information cascade \({C}_{{m}}\) over a fixed time window \(\varDelta t\), i.e., \(\varDelta S_{m}=\left| S_{m}^{T+\varDelta t}\right| -\left| S_{m}^{T}\right| \), where \(S_{m}^{T}\) denotes the set of users who have participated in cascade \(C_m\) by time T.
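To make the prediction target concrete, the following minimal sketch (the function name, list representation, and hour-based time units are our own illustrative choices, not part of the original formulation) computes \(\varDelta S_{m}\) from the retweet timestamps of a single cascade:

```python
def increment_size(timestamps, T, delta_t):
    """timestamps: retweet times (in hours) of one cascade C_m."""
    size_T = sum(1 for t in timestamps if t <= T)                 # |S_m^T|
    size_future = sum(1 for t in timestamps if t <= T + delta_t)  # |S_m^{T+dt}|
    return size_future - size_T                                   # Delta S_m

# Example: 3 retweets observed within T = 1 h, 2 more arrive afterwards:
# increment_size([0.1, 0.5, 0.9, 3.0, 10.0], T=1.0, delta_t=24.0) -> 2
```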

3.2 Cascade Embedding

On most social platforms, an information cascade consists of a sequence of retweets, so large cascades become a serious problem when processing the original cascade graph directly. There are different ways to represent the cascade input. In our approach, the cascade \(C_m\) represents the propagation path of message m. We obtain the cascade graph \(G_{t_{i}}^{i}\), represented as the set of cascade paths sampled over the observation window: \(G_{t_{i}}^{i}=\left\{ g_{t_{1}}^{i}, g_{t_{2}}^{i}, g_{t_{3}}^{i}, \ldots , g_{T}^{i}\right\} , t_{i} \in [0, T)\). Each cascade graph contains not only the structural information between users but also temporal information. \(G_{t_{i}}^{i}\) consists of a set of matrices.
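Since the construction of these matrices is not spelled out above, the sketch below shows one plausible reading (the function names and the dense-matrix representation are our assumptions): each snapshot time yields the adjacency matrix of all retweet edges observed up to that time.

```python
import numpy as np

def cascade_snapshots(edges, snapshot_times, num_nodes):
    """edges: list of (u, v, t) retweet edges of one message;
    returns one adjacency matrix per snapshot time t_i in [0, T)."""
    snapshots = []
    for t_i in snapshot_times:
        adj = np.zeros((num_nodes, num_nodes), dtype=np.float32)
        for u, v, t in edges:
            if t <= t_i:            # edge already present at time t_i
                adj[u, v] = 1.0
        snapshots.append(adj)
    return snapshots
```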

3.3 Graph Attention Layer

This part of our model is based on the graph attention network. In our approach, we apply eigendecomposition to compute the eigenvectors of the matrices in \(G_{t_{i}}^{i}\), which gives the set of node features \(\mathbf{h} =\{\overrightarrow{h_{t_{1}}^{i}}, \overrightarrow{h_{t_{2}}^{i}}, \overrightarrow{h_{t_{3}}^{i}}, \ldots , \overrightarrow{h_{T}^{i}}\}\) used as the input of the graph attention layer. To obtain the corresponding transformation between input and output, we need to perform at least one linear transformation of the input features, so we train a weight matrix \(\mathbf{W} \) shared by all nodes. The attention coefficients

$$\begin{aligned} e_{u v}=f(W \overrightarrow{h_{u}^{i}}, W \overrightarrow{h_{v}^{i}}) \end{aligned}$$
(1)

indicate the importance of node v's features to node u. To make the attention coefficients easier to compute and compare, we introduce a softmax function to normalize over all nodes adjacent to u:

$$\begin{aligned} \alpha _{u v}={softmax}_{v}\left( e_{u v}\right) =\frac{\exp \left( e_{u v}\right) }{\sum _{k \in V} \exp \left( e_{u k}\right) } \end{aligned}$$
(2)

Combining formulas (1) and (2) yields the complete attention mechanism:

$$\begin{aligned} \alpha _{u v}={softmax}_{v}\left( e_{u v}\right) =\frac{\exp \left( {LeakyReLU}\left( \overrightarrow{f}^{T}\left[ W \overrightarrow{h_{u}^{i}} \Vert W \overrightarrow{h_{v}^{i}}\right] \right) \right) }{\sum _{k \in V} \exp \left( {LeakyReLU}\left( \overrightarrow{f}^{T}\left[ W \overrightarrow{h_{u}^{i}} \Vert W \overrightarrow{h_{k}^{i}}\right] \right) \right) } \end{aligned}$$
(3)

where \(\cdot ^{T}\) represents transposition and \(\Vert \) is the concatenation operation.

The output of this layer is:

$$\begin{aligned} h_{struc} = {\textit{\textbf{h}}}=\sigma \left( \sum _{v \in V} \alpha _{u v} \overrightarrow{h_{v}^{i}}\right) \end{aligned}$$
(4)

The output features of a node thus depend on all of its adjacent nodes, obtained after a linear transformation followed by a nonlinear activation.
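For concreteness, here is a minimal single-head PyTorch sketch of Eqs. (1)-(4). It is an illustrative re-implementation of the standard graph attention layer, not the authors' released code; the LeakyReLU slope and the choice of ELU as the nonlinearity \(\sigma \) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared weight matrix W
        self.f = nn.Linear(2 * out_dim, 1, bias=False)   # attention vector f
        self.leaky_relu = nn.LeakyReLU(0.2)              # slope is an assumption

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) adjacency matrix,
        # assumed to include self-loops so every row has a neighbor.
        Wh = self.W(h)                                   # W h_u for every node
        N = Wh.size(0)
        # e_uv = LeakyReLU(f^T [W h_u || W h_v]) for all pairs (Eqs. 1 and 3)
        pairs = torch.cat([Wh.unsqueeze(1).expand(N, N, -1),
                           Wh.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = self.leaky_relu(self.f(pairs)).squeeze(-1)   # (N, N)
        # softmax restricted to adjacent nodes (Eq. 2)
        e = e.masked_fill(adj == 0, float('-inf'))
        alpha = F.softmax(e, dim=1)
        # h_struc = sigma(sum_v alpha_uv W h_v) (Eq. 4)
        return F.elu(alpha @ Wh)
```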

3.4 Temporal Embedding

In our model, we embed the temporal information. We use the standard LSTM (Long Short-Term Memory) network to model the temporal embedding. The LSTM is a special kind of RNN that can selectively memorize past information through a memory cell. The specific implementation is as follows:

$$\begin{aligned} \begin{array}{l} f_{t} =\sigma \left( W_{f} \cdot \left[ h_{t-1}, g_{t}^{i}\right] +b_{f}\right) \\ i_{t} =\sigma \left( W_{i} \cdot \left[ h_{t-1}, g_{t}^{i}\right] +b_{i}\right) \\ \tilde{C}_{t} =\tanh \left( W_{C} \cdot \left[ h_{t-1}, g_{t}^{i}\right] +b_{C}\right) \\ C_{t}=f_{t} * C_{t-1}+i_{t} * \tilde{C}_{t} \\ o_{t}=\sigma \left( W_{o}\left[ h_{t-1}, g_{t}^{i}\right] +b_{o}\right) \end{array} \end{aligned}$$
(5)

where \(g_{t}^{i}\) is a matrix representation of the cascade path, and \(f_{t}, i_{t}, o_{t}\) are the forget gate, input gate, and output gate, respectively. The hidden state is then updated by:

$$\begin{aligned} h_{t}=o_{t} * \tanh \left( C_{t}\right) \end{aligned}$$
(6)

Finally, we obtain the collection of embeddings for the M messages, \(h_{tem} = \{\overrightarrow{h_{1}}, \overrightarrow{h_{2}}, \overrightarrow{h_{3}}, \ldots , \overrightarrow{h_{M}}\}\).
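A compact PyTorch sketch of this temporal embedding (dimensions and batching are illustrative assumptions); nn.LSTM implements the gate equations of Eq. (5) and the state update of Eq. (6) internally:

```python
import torch
import torch.nn as nn

class TemporalEmbedding(nn.Module):
    def __init__(self, path_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(path_dim, hidden_dim, batch_first=True)

    def forward(self, paths):
        # paths: (batch, seq_len, path_dim), one row per cascade path g_t^i
        outputs, (h_n, c_n) = self.lstm(paths)
        return h_n.squeeze(0)  # (batch, hidden_dim): one h_tem per message
```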

3.5 Time-Decay Function and Output Layer

A social network is a highly dynamic process. For example, a Weibo post has the greatest influence just after it is published, and its effect then diminishes over time. This is why we add a time decay factor. Existing methods simulate this decay with power-law functions (\(\phi (t)=(t+c)^{-(1+\theta )}\)) or exponential functions (\(\phi (t)=e^{-\theta t}\)). In our model, we instead use a non-parametric method to obtain a more suitable representation of the time decay factor. We assume the overall observation time window is [0, T) and split it into l disjoint time intervals \(\left\{ \left[ t_{0}=0, t_{1}\right) ,\left[ t_{1}, t_{2}\right) , \ldots ,\left[ t_{l-1}, t_{l}=T\right) \right\} \). We then learn a discrete time decay variable \(\lambda _{j}, j \in \{1,2, \dots , l\}\). For each cascade path, we apply the time decay parameter of the interval containing it to obtain a new representation:

$$\begin{aligned} {h_{t}}^{\prime } = \lambda _{j} (h_{tem} + h_{struc}), \quad t \in [t_{j-1}, t_{j}) \end{aligned}$$
(7)

and the m-th cascade can then be represented by:

$$\begin{aligned} h^{\prime }\left( C_{m}\right) =\sum _{t=1}^{T} h_{t}^{\prime } \end{aligned}$$
(8)

The final output of our model consists of a multi-layer perceptron (MLP), expressed as follows:

$$\begin{aligned} \varDelta S_{i}=\mathbf{{MLP}}\left( h^{\prime }\left( C_{i}\right) \right) \end{aligned}$$
(9)
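The following sketch strings Eqs. (7)-(9) together under our reading of the time decay scheme (the interval lookup via an index tensor and the two-layer MLP shape are illustrative assumptions):

```python
import torch
import torch.nn as nn

class DecayOutputLayer(nn.Module):
    def __init__(self, dim, num_intervals):
        super().__init__()
        self.lam = nn.Parameter(torch.ones(num_intervals))  # lambda_1 .. lambda_l
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, 1))

    def forward(self, h_tem, h_struc, interval_ids):
        # h_tem, h_struc: (T, dim) per-step embeddings of one cascade;
        # interval_ids: (T,) index j of the interval containing step t
        h_t = self.lam[interval_ids].unsqueeze(-1) * (h_tem + h_struc)  # Eq. (7)
        h_c = h_t.sum(dim=0)                                            # Eq. (8)
        return self.mlp(h_c)                                            # Eq. (9)
```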

4 Experiments

In this section, we present the details of experiments conducted on a real-world dataset and analyze the results of our proposed model against the baselines.

4.1 Dataset

The dataset is from Sina Weibo, one of the most popular microblogging platforms in China, and is provided by Cao et al. [2]. In Sina Weibo, the network is composed of the relationships among a large group of users, which include retweets, likes, and comments. In this paper, we concentrate on retweets of messages: compared to likes and comments, retweeting is the most direct way to form a cascade. The dataset was crawled from 0:00 on June 1, 2016 to 24:00 on June 1, 2018 and contains 119,313 messages in total. We filter out messages published before 6 am or after 9 pm. For the length T of the observation time window, we consider three settings, i.e., T = 1 h, 2 h, and 3 h. Finally, we sort all remaining cascades by publishing time, taking the first 70% as the training set, the middle 15% as the validation set, and the last 15% as the test set.
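The chronological split can be reproduced in a few lines; the record schema (a publish_time field) is an assumption for illustration:

```python
def chronological_split(cascades, train_frac=0.70, val_frac=0.15):
    """cascades: list of dicts with a 'publish_time' key (assumed schema)."""
    ordered = sorted(cascades, key=lambda c: c["publish_time"])
    n_train = int(len(ordered) * train_frac)
    n_val = int(len(ordered) * val_frac)
    return (ordered[:n_train],                  # first 70%: training
            ordered[n_train:n_train + n_val],   # middle 15%: validation
            ordered[n_train + n_val:])          # last 15%: test
```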

4.2 Baselines and Variants of Our Model

For a comprehensive comparison, we consider a variety of recent alternatives from the categories of methods discussed above. The baselines are DeepCas [7], DeepHawkes [2], and CasCN [3]. In addition to comparing with the existing baselines, we also compare against several variants of CasGAT: CasGAT-GCN, CasGAT-GRU, and CasGAT-Time.

4.3 Evaluation Metric

For the information cascade prediction problem, we choose the standard evaluation metric MSLE (mean square log-transformed error) in our experiments; the smaller the MSLE, the better the prediction performance. Specifically, MSLE measures the prediction accuracy and is defined as:

$$\begin{aligned} MSLE=\frac{1}{M} \sum _{i=1}^{M}\left( \log \varDelta S_{i}-\log \varDelta \widetilde{S}_{i}\right) ^{2} \end{aligned}$$
(10)

where M is the total number of messages, \(\varDelta S_{i}\) is the predicted increment size, and \( \varDelta \widetilde{S}_{i}\) is the actual one.

We also report mSLE, the median of \(\left( \log \varDelta S_{i}-\log \varDelta \widetilde{S}_{i}\right) ^{2}\), which effectively reduces the impact of outliers and is defined as:

$$\begin{aligned} mSLE=\mathop {median}\limits _{i=1,\ldots ,M}\left( \log \varDelta S_{i}-\log \varDelta \widetilde{S}_{i}\right) ^{2} \end{aligned}$$
(11)
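Both metrics are direct to compute; a numpy transcription of Eqs. (10)-(11), assuming strictly positive increment sizes so the logarithm is defined:

```python
import numpy as np

def msle(pred, true):
    # Eq. (10): mean of squared log-errors over all M messages
    return np.mean((np.log(pred) - np.log(true)) ** 2)

def m_sle(pred, true):
    # Eq. (11): median of the same squared log-errors, robust to outliers
    return np.median((np.log(pred) - np.log(true)) ** 2)

# e.g. msle(np.array([10.0, 100.0]), np.array([12.0, 90.0]))
```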

4.4 Result

In this section, we compare the performance of our model with the three baselines. The results are illustrated in Table 1.

Table 1. The performance of the baseline models and CasGAT on the Sina Weibo dataset

Table 1 summarizes the performance comparison between CasGAT and the three baselines on the Sina Weibo dataset. Generally speaking, the proposed CasGAT model performs well on information cascade prediction on the public Sina Weibo dataset. It is superior to traditional approaches, such as feature-based and generative methods, as well as to the latest deep learning methods; both MSLE and mSLE are reduced by a statistically significant margin.

To study and verify the effectiveness of each component of our model, we introduce three variants of CasGAT, each of which modifies one part of our framework. The experimental results are shown in Table 2, from which we can see that the original CasGAT achieves a lower prediction error than the variants. Through the comparison with CasGAT-Time, we find that the time decay effect is essential for cascade size prediction. Similarly, CasGAT-GCN and CasGAT-GRU also degrade prediction performance to some extent. Notably, the error of the CasGAT-GRU variant is smaller than that of the original model for the 1 h window, but the original model still achieves smaller errors for the longer windows.

Table 2. The performance of CasGAT and its variants on the Sina Weibo dataset

In summary, structural and temporal information are two key components of CasGAT. These two factors play an indispensable role in improving prediction accuracy.

5 Conclusion

We propose a new deep learning-based information cascade prediction model, CasGAT. Our model uses structural and temporal information to predict the growth of an information cascade. The model adds a graph attention mechanism, which greatly reduces the complexity of modeling graphs: we focus on neighboring nodes instead of the entire graph, which further improves prediction performance. In the future, we will work on expanding the model's perception capabilities and extending it to more datasets.