
1 Introduction

Recent years have witnessed the prosperity of online social media platforms, which allow users to generate and share content through comments, likes, and retweets. Consequently, the investigation of information diffusion over online social media has attracted much attention [18]. It finds application in many important scenarios such as viral marketing [9] and rumor detection [16]. Among the research topics related to information diffusion, cascade popularity prediction [3], which aims to predict the future popularity of online content based on its early diffusion patterns, is a key issue.

A lot of research effort has been devoted to the cascade popularity prediction problem. Recently, deep learning techniques have shown their superiority in automatically capturing valuable information from cascades and predicting cascade popularity in an end-to-end manner [12]. Some approaches [2, 12] represent cascades as multiple node sequences and feed them into Recurrent Neural Network (RNN) models [5, 10]. To extract underlying diffusion patterns, other studies apply Graph Neural Network (GNN) models [1, 6] to cascade graphs [4] or social networks [3, 11, 14].

Motivation. Although GNN-based approaches have shown high prediction accuracy, they share a significant disadvantage: each cascade is treated independently, and the collaborations among different cascades are ignored. In fact, according to the research of Myers et al. [13], when multiple messages spread over online social media, they implicitly interact with each other, exhibiting both competition and cooperation effects among different cascades. On the one hand, messages with similar content and topics have a higher chance of being shared when they are exposed multiple times to the same user. On the other hand, each user has limited attention with respect to the tremendous amount of online content, so different messages implicitly compete with each other [17]. Therefore, it is worthwhile to consider the implicit interactions among different cascades.

Challenges. There are two key challenges in predicting the popularity of cascades when considering the aforementioned factors. The first challenge is how to capture collaborations among different cascades. To this end, instead of treating each cascade independently, multiple cascades should be considered comprehensively, and fine-grained user-message interactions should be incorporated into the learning model to obtain informative cascade embeddings. The second challenge is how to effectively merge temporal and structural information within each cascade. Temporal information can describe the influence of the message and of predecessors on users' diffusion behavior. Most current methods model temporal information as a chain and use an RNN to capture memory effects. However, modeling temporal information as a chain cannot capture the inter-dependence in tree-like cascade graphs.

To address the above challenges, we propose a novel deep learning model named CollaborateCas, which utilizes collaborations among different cascades to learn node and cascade embeddings directly. Specifically, for the first challenge, a heterogeneous user-message bipartite graph is built, where users and cascades are represented as two types of nodes and the interactions between users and cascades are taken as edges. Then a type-aware Graph Attention Network (GAT) [15] model is designed to learn representations for the two types of nodes. To deal with the second challenge, and based on the observation that users have different reaction times for different early adopters, we take the differences of infection times as users' edge features in the homogeneous cascade graphs. The proposed approach is tested on two real-world datasets, and the results show that our model significantly outperforms state-of-the-art baselines in terms of prediction accuracy.

In general, the main contributions of our work are as follows:

  • For the cascade popularity prediction problem, we make the first attempt to model user-message interactions as a heterogeneous bipartite graph and design a type-aware GAT model to learn user and cascade embeddings simultaneously. Our model is able to capture collaborations among different cascades by learning from the fine-grained user-message interactions.

  • Time differences of early adopters and later users are taken as temporal information and encoded into edge features in homogeneous cascade graphs. The temporal and structural information within each cascade graph are used to capture the inter-dependence and attractiveness among different users.

  • The proposed approach is evaluated on two real-world datasets. Experimental results indicate that CollaborateCas significantly outperforms state-of-the-art baselines and the average prediction error is reduced by 9.01% and 5.68% respectively on the two datasets.

2 Problem Formulation

We first introduce some preliminaries and basic definitions to formulate the investigated problem.

Definition 1 (Cascade Set)

The data can be represented as a cascade set \(\mathcal {C}^T=\{C_c^T|c\in \mathcal {M}\}\) which contains cascades with respect to the set of messages \(\mathcal {M}\) within the observation time window T. Each cascade \(C_c^T\) can be represented as a set of tuples \(\{(u,v,t)|t\le T\}\), where (u, v, t) indicates that user v retweeted the message c from user u at time t within the observation time T.

The purpose of our model is to predict the incremental size of cascade based on observations within a specific time window. Therefore, we define incremental size as follows:

Definition 2 (Incremental Size)

The incremental size of a cascade \(C_c^T\) with observation time T after a given time interval \(\varDelta t\) is defined as \(\varDelta S_c=|C_c^{T+\varDelta t} |-|C_c^T |\), where \(|C_c^T|\) indicates the total number of retweeting behaviors with respect to this cascade by time T.
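Definitions 1 and 2 can be made concrete with a short sketch. The toy cascade, user names, and time values below are hypothetical, used only to illustrate the tuple representation and the computation of \(\varDelta S_c\):

```python
# Sketch of Definitions 1-2: a cascade as a set of (u, v, t) tuples,
# and its incremental size after a time interval delta_t.
def cascade_size(cascade, horizon):
    """Number of retweet tuples (u, v, t) observed by time `horizon`."""
    return sum(1 for (u, v, t) in cascade if t <= horizon)

def incremental_size(cascade, T, delta_t):
    """Delta S_c = |C_c^{T + delta_t}| - |C_c^T| (Definition 2)."""
    return cascade_size(cascade, T + delta_t) - cascade_size(cascade, T)

# Toy cascade: user v retweeted the message from user u at time t.
cascade = [("a", "b", 1.0), ("a", "c", 2.5), ("b", "d", 4.0), ("c", "e", 7.5)]
print(incremental_size(cascade, T=3.0, delta_t=5.0))  # 2 retweets arrive in (3, 8]
```

With an observation window of T = 3.0, two of the four retweets fall inside the window and two arrive afterwards, giving an incremental size of 2.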

Based on the aforementioned definitions, we define the cascade popularity prediction problem as follows:

Definition 3 (Cascade Prediction Problem)

Given a cascade \(C_c^T\in \mathcal {C}^T\) within the observation time window T, the cascade popularity prediction problem aims to learn a function \(f(\cdot )\) that maps the homogeneous cascade graph \(G_c(V,E)\) and the heterogeneous bipartite graph \(\mathcal {G}(\mathcal {V},\mathcal {E})\) to \(\varDelta S_c=|C_c^{T+\varDelta t} |-|C_c^T |\).

3 Methodology

This section gives a detailed description of our CollaborateCas model. The overall architecture is shown in Fig. 1.

Fig. 1.

Overview of CollaborateCas: (a) Input: a cascade set \(\mathcal {C}^T\) within observation time T; (b) A heterogeneous bipartite graph is built based on observed cascades, and embeddings are learned with a type-aware attention mechanism; (c) User embeddings learned in the previous step are fed into the local cascade graph, and temporal information is taken as edge features; (d) Both embeddings are concatenated and then fed into an MLP for the final prediction.

3.1 Heterogeneous Bipartite Graph Learning

Based on observed cascades, we construct a global user-message graph to explicitly show relationships between messages and users. Since our model involves two different types of nodes, we design a type-aware attention mechanism and use different weights, i.e., \(W_{um}\) and \(W_{mu}\) to make a distinction between two different information gathering directions. Let

$$\begin{aligned} \theta _{ij}^{um}=\vec {a}_{um}^T[W_{um}\vec {h}_{c_i}||W_{um}\vec {h}_{u_j}] \end{aligned}$$
(1)
$$\begin{aligned} \theta _{ij}^{mu}=\vec {a}_{mu}^T[W_{mu}\vec {h}_{u_i}||W_{mu}\vec {h}_{c_j}] \end{aligned}$$
(2)

where \(\vec {a}_{um}\) and \(W_{um}\) are weights for the user-to-message direction, and \(\vec {a}_{mu}\) and \(W_{mu}\) are weights for the message-to-user direction. Then, \(\theta ^{um}\) and \(\theta ^{mu}\) are normalized into attention coefficients by the softmax function. The embeddings are updated as follows:

$$\begin{aligned} \vec {h}_{c_i}=f(\sum _{j\in N_i}\alpha _{ij}^{um}W_{um}\vec {h}_{u_j}) \end{aligned}$$
(3)
$$\begin{aligned} \vec {h}_{u_i}=f(\sum _{j\in N_i}\alpha _{ij}^{mu}W_{mu}\vec {h}_{c_j}) \end{aligned}$$
(4)

where \(N_i\) is the set of neighbors of node i in the bipartite graph, and \(\vec {h}_{c_i}\) and \(\vec {h}_{u_i}\) are the updated embeddings.
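A minimal numpy sketch of one message-side update, corresponding to Eqs. (1) and (3), is given below. The embedding dimension, random weights, and choice of \(f\) as tanh are all hypothetical (the paper leaves \(f\) unspecified), and the same \(W_{um}\) is reused for both the attention logits and the aggregation for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimension (hypothetical)

# Toy setup: one message node c_i with three user neighbors.
h_c = rng.normal(size=d)
h_users = rng.normal(size=(3, d))

# Type-aware weights for the user->message direction.
W_um = rng.normal(size=(d, d))
a_um = rng.normal(size=2 * d)

# Eq. (1): unnormalized attention logits theta_ij^{um} over neighbors.
theta = np.array([a_um @ np.concatenate([W_um @ h_c, W_um @ h_u])
                  for h_u in h_users])

# Softmax over neighbors gives attention coefficients alpha_ij^{um}.
alpha = np.exp(theta - theta.max())
alpha /= alpha.sum()

# Eq. (3): update the message embedding by aggregating transformed
# neighbor user embeddings; f is taken as tanh here for illustration.
h_c_new = np.tanh((alpha[:, None] * (h_users @ W_um.T)).sum(axis=0))
print(h_c_new.shape)  # (4,)
```

The message-to-user direction of Eqs. (2) and (4) is symmetric, with \(W_{mu}\) and \(\vec {a}_{mu}\) in place of \(W_{um}\) and \(\vec {a}_{um}\).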

3.2 Homogeneous Cascade Graph Learning

In our work, a modified attention mechanism is designed to incorporate temporal information into the graph attention network model. Specifically, we have:

$$\begin{aligned} \theta _{ij}=f_{mlp}(\varDelta t_{ij})\cdot [W\vec {h}_{u_i}||W\vec {h}_{u_j}] \end{aligned}$$
(5)
$$\begin{aligned} \alpha _{ij}=\frac{\exp (LeakyReLU(\theta _{ij}))}{\sum _{k\in N_i}\exp (LeakyReLU(\theta _{ik}))} \end{aligned}$$
(6)

where \(\varDelta t_{ij}\) is the time difference between user i and user j, c is the corresponding cascade, and \(f_{mlp}(\cdot )\) is an MLP that projects the time-difference scalar into a higher-dimensional embedding. Then the cascade embedding is obtained through attention-based pooling:

$$\begin{aligned} \vec {h}_{c}'=\sum _{i\in c}\alpha _i\vec {h}_{u_i} \end{aligned}$$
(7)

where \(\alpha _{i}\) is the output attention coefficient.
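The temporal modulation of Eqs. (5)-(7) can be sketched as follows. The embeddings, weights, toy time differences, and the one-layer tanh stand-in for \(f_{mlp}\) are all hypothetical, and for illustration the pooling in Eq. (7) simply reuses one row of the pairwise coefficients as the pooling weights:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # embedding dimension (hypothetical)

# Toy user embeddings for a 3-node cascade and pairwise time differences.
H = rng.normal(size=(3, d))
W = rng.normal(size=(d, d))
dt = np.array([[0.0, 1.5, 3.0],
               [1.5, 0.0, 0.5],
               [3.0, 0.5, 0.0]])

W_t = rng.normal(size=(2 * d, 1))
def f_mlp(t):
    """Toy stand-in for the MLP lifting a time-difference scalar to 2d dims."""
    return np.tanh(W_t @ np.array([t])).ravel()

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

# Eq. (5): logits modulated by the temporal embedding.
theta = np.array([[f_mlp(dt[i, j]) @ np.concatenate([W @ H[i], W @ H[j]])
                   for j in range(3)] for i in range(3)])

# Eq. (6): row-wise softmax over LeakyReLU-activated logits.
e = np.exp(leaky_relu(theta))
alpha = e / e.sum(axis=1, keepdims=True)

# Eq. (7): attention-based pooling into a cascade embedding.
h_c_prime = (alpha[0][:, None] * H).sum(axis=0)
print(h_c_prime.shape)  # (4,)
```

Note how a larger \(\varDelta t_{ij}\) changes the logit through \(f_{mlp}\), letting the model weight slow and fast reactions differently.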

3.3 Cascade Prediction and Loss Function

After the embeddings from both the heterogeneous bipartite graph and the homogeneous cascade graph are obtained, they are concatenated and fed into an MLP:

$$\begin{aligned} \hat{y}_i=MLP([\vec {h}_{c_i}||\vec {h}_{c_i}']) \end{aligned}$$
(8)

To optimize parameters of this deep learning model, the loss function is defined as the mean squared error:

$$\begin{aligned} L=\frac{\sum _i(y_i-\hat{y}_i)^2}{n} \end{aligned}$$
(9)

Similar to [7], the label is defined as logarithm of incremental size, i.e., \(y_i=\log (\varDelta S_i+1)\), where \(\varDelta S_i\) is the incremental size.
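The prediction head and loss of Eqs. (8)-(9) amount to the following sketch, where the layer sizes, random weights, and two-layer MLP are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4  # embedding dimension (hypothetical)

def mlp(x, W1, W2):
    """Toy two-layer ReLU MLP standing in for the prediction head."""
    return float(W2 @ np.maximum(W1 @ x, 0))

W1, W2 = rng.normal(size=(8, 2 * d)), rng.normal(size=8)

# Eq. (8): concatenate the bipartite and cascade-graph embeddings.
h_c, h_c_prime = rng.normal(size=d), rng.normal(size=d)
y_hat = mlp(np.concatenate([h_c, h_c_prime]), W1, W2)

# Log-transformed label and squared error (Eq. 9, one sample;
# the loss is averaged over all samples in practice).
delta_S = 37  # toy incremental size
y = np.log(delta_S + 1)
loss = (y - y_hat) ** 2
```

In practice the parameters are optimized by gradient descent on the averaged loss; the log transform compresses the heavy-tailed popularity distribution.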

4 Evaluation

In this section, we evaluate the performance of our proposed model CollaborateCas by comparing it with several state-of-the-art approaches. Some variants of CollaborateCas are also considered for the experimental study. We evaluate our model on two real-world datasets, the Sina Weibo dataset [2] and the HEP-PH dataset [8], and adopt two commonly used metrics, MSE [4] (Mean Square Error) and RMSPE [7] (Root Mean Square Percentage Error).

Table 1. Overall performance of different approaches on the Sina Weibo dataset.
Table 2. Overall performance of different approaches on the HEP-PH dataset.

4.1 Baselines

To show the superiority of our approach, we select 5 state-of-the-art approaches and 3 variants as baselines.

  • Feature-linear & Feature-deep: We feed selected features into a linear regression model (Feature-linear) and an MLP (Feature-deep).

  • Node2Vec: Node2Vec [10] learns node embeddings from cascade graphs.

  • DeepCas: DeepCas [12] applies a GRU network to sequences generated from the cascade graph.

  • CasCN: CasCN [4] combines graph convolutional network with LSTM.

  • Deepcon_str: Deepcon_str [7] regards each cascade as a node and builds two cascade-level graphs.

  • CollaborateCas-bipartite: CollaborateCas-bipartite removes the homogeneous cascade graph component.

  • CollaborateCas-cascade: CollaborateCas-cascade removes the heterogeneous bipartite graph component.

  • CollaborateCas-mean: The attention mechanism at the output of the cascade graph is replaced with a mean operation.

4.2 Performance Comparison

The experimental results of our proposed model and the various baselines are shown in Table 1 and Table 2. CollaborateCas achieves significantly lower MSE and RMSPE than all the baselines. Among the feature engineering-based methods, Feature-linear and Feature-deep show similar predictability on this task. Node2Vec and DeepCas have relatively lower accuracy than the other deep learning models.

CasCN performs worse than Deepcon_str and our model because it treats each cascade independently. Deepcon_str has overall better performance than other deep learning-based baselines. However, this method ignores detailed interactions between users and cascades. CollaborateCas has achieved better results than baselines in all three observation time windows, indicating that our unified modeling of heterogeneous bipartite graph and homogeneous cascade graphs can significantly improve the performance of cascade popularity prediction.

We also compare the performance of different variants of our model, as shown in Table 3. In general, CollaborateCas still performs better than all the variants. The most competitive variant is CollaborateCas-bipartite, which indicates that the heterogeneous bipartite graph is an essential component for cascade prediction.

Table 3. Overall performance of the variants of CollaborateCas.

5 Conclusion

To address the cascade popularity prediction problem, we proposed a novel deep learning model called CollaborateCas, which can capture collaborations among different cascades. To this end, we constructed a heterogeneous bipartite graph based on fine-grained user-message interactions, as well as homogeneous cascade graphs incorporating temporal information as edge features. Experimental results demonstrate that CollaborateCas achieves higher accuracy than state-of-the-art baselines.