
1 Introduction

Social media has empowered human society in many ways. It is easier than ever to keep in touch with those we wish to, allowing an enormous variety of relationships to transcend physical isolation [19]. More than ever before, social media bears responsibility for our mental wellbeing, as the arbiter of interactions between colleagues, friends and loved ones [13, 24]. It is therefore of the utmost importance to make these platforms a safe environment, protected against those who wish to corrupt the service with fake news [20].

Various methods have been used to tackle the misinformation problem. Content-based misinformation analysis models apply natural language processing tools to the text content of claims [23]. On their own, content-based models fail to trace the dynamics of spread for tasks such as early detection or spread forecasting. Recent misinformation analysis models use static graph neural networks to extract geometric propagation patterns; others leverage time-series analysis by treating misinformation spread as a temporal event sequence [4, 15]. However, each of these approaches neglects the other's view of propagation: neither leverages geometric and temporal dissemination features together.

Propagation-based misinformation analysis makes use of patterns that can be attributed to the dynamics of spread. Our principal goal is to exploit as much of this spreading information as possible, so as to make the most effective use of the available data. Specifically, we first formulate misinformation propagation as a dynamic graph, then we employ a continuous-time temporal point process to extract both the temporal evolution patterns and the geometric features. Furthermore, we use a power law to model the growth of the temporal network scale, so as to forecast the future rate of spread for a claim identified as misinformation. The contributions of this study can thus be summarised as follows. (i) We formulate misinformation propagation as a dynamic graph. (ii) We design temporal point processes (TPPs) that utilise both temporal and geometric features of the dynamic graph for misinformation detection. (iii) This study is the first to introduce forecasting of user engagements to misinformation analysis.

2 Related Work

To distinguish true from false statements, most researchers work from three angles: textual content, multimedia features and social context. Misinformation often contains opinionated language [2], which motivates textual content-based detection [1]. Sentiment features such as positive words (e.g., love, sweet) and negating words (e.g., not, never) are reported to help detect rumours [6]. Misinformation also relies on sensational images to provoke an emotional response in consumers. For example, deepfake techniques [3] employ deep learning to generate fake images and videos that convey misleading information.

In social media, every piece of news is correlated to other posts and users. User engagements (e.g., commenting) provide rich reference evidence in two ways: by aggregation with relevant posts for a specific affair, and by temporal evolution. The first way relies on the “wisdom of crowds” to locate potential misinformation [1], while the second way captures temporal propagation patterns. For example, Hawkes processes are used to analyze how user stance changes temporally in [11]. However, these methods neglect geometric propagation features.

Graph neural networks can extract geometric propagation patterns. Graph Convolutional Networks (GCN) are used in [14] to encapsulate the propagation structure of heterogeneous data. The Graph-Aware Co-Attention Network (GCAN) is proposed in [4, 8] to apply a co-attention mechanism to graph modeling. However, each of these works uses static graphs and neglects temporal information.

3 Problem Formulation

This section gives definitions and describes notation. A source claim takes the form of \({{\varvec{c}}} = ({{\varvec{x}}}, t)\), where \({{\varvec{x}}}\) is a concatenation of the posting user account features and the claim’s text features, i.e. \({{\varvec{x}}} = [{{\varvec{u}}} \mid \mid {{\varvec{M}}}]\). Here, \({{\varvec{u}}}\) is the user account representation and \({\varvec{M}}\) is the text message representation. t is initially zero, as ensuing dissemination events are timestamped with respect to the source claim.

Suppose the claim \({\varvec{c}}\) is accompanied by a sequence of social engagements \(\mathcal {S}=\{{\varvec{v}}_{1}, {\varvec{v}}_{2}, \dots , {\varvec{v}}_{j}, \dots , {\varvec{v}}_{N}\}\), where \({\varvec{v}}_{j} = ({\varvec{x}}_{j}, t_j)\). Similarly, \({\varvec{x}}_j\) is the feature of an engaging node and \(t_{j}\) is the engagement time with respect to claim post time. Social engagements include all forms of interactions that users conduct with claims on social media platforms, such as reposting, commenting and tagging.

Our temporal, dynamic graph is represented as a sequence of time-stamped snapshots \(\mathcal {G} = \{\mathcal {G}(t_0), \mathcal {G}(t_1), \cdots , \mathcal {G}(t_j), \cdots , \mathcal {G}(t_N)\}\), where the first snapshot simply represents the source claim node and further snapshots are added with each representing the state of the dissemination network when a new node is connected. Let \(\mathcal {G}(t)= \langle \mathcal {V}(t), \mathcal {E}(t)\rangle \) denote the state of the temporal graph \(\mathcal {G}\) at time t, where \(\mathcal {V}(t) = \{{\varvec{c}}, {\varvec{v}}_{1}, {\varvec{v}}_{2}, \dots , {\varvec{v}}_{j}, \dots , {\varvec{v}}_{N(t)}\}\), with N(t) being the number of nodes that have directly or indirectly interacted with the claim \({\varvec{c}}\) as of time t. A new graph snapshot \(\mathcal {G}(t_{j+1})\) is generated when a node \({\varvec{v}}_{j+1}\) is added to the sequence of social engagements. The graph structure of an example false claim's dissemination tree is shown in Fig. 1.
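For concreteness, the formulation above can be mirrored by a small data structure. The following Python sketch is illustrative only: the class and method names are ours, not from the paper's implementation, and node features are assumed to be plain NumPy vectors.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import numpy as np

@dataclass
class Node:
    x: np.ndarray  # feature vector: [user features || text features] for the claim,
                   # or engaging-node features for an engagement
    t: float       # timestamp relative to the source claim (the claim itself has t = 0)

@dataclass
class TemporalGraph:
    nodes: List[Node] = field(default_factory=list)             # V(t), the claim node first
    edges: List[Tuple[int, int]] = field(default_factory=list)  # E(t), (parent, child) index pairs

    def add_engagement(self, node: Node, parent: int) -> None:
        """Adding an engagement node yields the next snapshot G(t_{j+1})."""
        self.nodes.append(node)
        self.edges.append((parent, len(self.nodes) - 1))

    def snapshot_size(self) -> int:
        """N(t): number of nodes that have interacted with the claim so far."""
        return len(self.nodes)
```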

Fig. 1. Graph representation of a source claim's dissemination tree, where nodes represent interaction events such as comments and retweets.

4 Model Description

With the temporal evolution of the propagation graph \(\mathcal {G}(t)\), new engagement nodes establish edges with existing nodes and thus update the graph. To capture both geometric and temporal propagation features, we view the addition of new engagement nodes as chronological events and develop a temporal point process that generates node embeddings of the dynamic graph \(\mathcal {G}(t)\).

4.1 Propagation by Temporal Point Processes

A temporal point process (TPP) is a stochastic process that is realised as a list of discrete events in the continuous time domain \(t \in \mathbb {R}^{+}\). TPPs usually rely on an intensity function, which is defined as the probability of the occurrence of an event in an infinitesimal time interval [22], to describe the temporal dynamics. They have been used to model dynamic graphs in [10, 17, 25].
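For reference, the conditional intensity mentioned above is conventionally written as the instantaneous event rate given the history \(\mathcal {H}(t)\); this is the standard textbook definition rather than anything specific to our model,

$$\begin{aligned} \lambda (t \mid \mathcal {H}(t)) = \lim _{\Delta t \rightarrow 0} \frac{\mathbb {P}\left( \text {an event occurs in } [t, t+\Delta t) \mid \mathcal {H}(t)\right) }{\Delta t}. \end{aligned}$$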

In our propagation-graph use case, each timestamped event corresponds to a new engagement node joining the graph, and the full event sequence builds the propagation graph whose final snapshot is the complete misinformation dissemination tree. Symbolically, \(\mathcal {S} = \{({\varvec{x}}_j, t_j)\}^N_{j=1}\), where \({\varvec{x}}_j\) are the event features (previously node features) and \(t_j\) is the timestamp of the \(j^{th}\) event in the sequence \(\mathcal {S}\). Intuitively, the added edge \({\varvec{e}}_{i,j}\) between the source node \({\varvec{v}}_i\) and the new node \({\varvec{v}}_j\) is influenced not only by \({\varvec{v}}_i\) and \({\varvec{v}}_j\), but also by the history nodes of \({\varvec{v}}_i\). With this assumption, we define the intensity function associated with adding the new edge \({\varvec{e}}_{i,j}\) as,

$$\begin{aligned} \lambda _{i,j}(t) = g({\varvec{x}}_i, {\varvec{x}}_j) + \sum _{i' \in \mathcal {H}^{i}}\alpha _{i'j}(t) f({\varvec{x}}_{i'}, {\varvec{x}}_j) \kappa (t - t_{i'}). \end{aligned}$$
(1)

where \(\mathcal {H}^i\) contains the history events of node i. The affinity between two nodes is computed by the bilinear function \(f(\cdot )\) with the trainable parameter \(\mathbf {W}_1\), i.e., \(f({\varvec{x}}_{i}, {\varvec{x}}_{j}) = {\varvec{x}}_{i}^{\top } \mathbf {W}_1 {\varvec{x}}_{j}\). A non-linear ReLU activation then defines the base intensity \(g(\cdot ) = \mathrm {ReLU}(f(\cdot ))\).
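As a minimal PyTorch sketch, the affinity \(f(\cdot )\) and base intensity \(g(\cdot )\) can be written as follows; function names and tensor shapes are our own assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def affinity(x_i: torch.Tensor, x_j: torch.Tensor, W1: torch.Tensor) -> torch.Tensor:
    """Bilinear interaction f(x_i, x_j) = x_i^T W_1 x_j with trainable W_1 of shape (d, d)."""
    return x_i @ W1 @ x_j

def base_intensity(x_i: torch.Tensor, x_j: torch.Tensor, W1: torch.Tensor) -> torch.Tensor:
    """Base intensity g(.) = ReLU(f(.))."""
    return F.relu(affinity(x_i, x_j, W1))
```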

The influence from history nodes is measured via the self-attention mechanism proposed in [21, 22]. For the history nodes before time t, we calculate an attention weight for each node,

$$\begin{aligned} \alpha _{i'j} = \frac{\exp (f({\varvec{x}}_{i'}, {\varvec{x}}_j))}{\sum _{k \in \mathcal {H}^{i}} \exp (f({\varvec{x}}_{k}, {\varvec{x}}_j))}. \end{aligned}$$
(2)
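A hedged sketch of Eq. (2): a softmax over the bilinear affinities between each history node of \({\varvec{v}}_i\) and the new node \({\varvec{v}}_j\). The shapes and names below are illustrative.

```python
import torch

def attention_weights(hist_x: torch.Tensor, x_j: torch.Tensor, W1: torch.Tensor) -> torch.Tensor:
    """hist_x: (H, d) features of the history nodes of v_i; x_j: (d,); returns (H,) weights."""
    scores = hist_x @ W1 @ x_j            # f(x_{i'}, x_j) for every history node
    return torch.softmax(scores, dim=0)   # alpha_{i'j}, Eq. (2)
```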

With the intensity function, we derive the probability of having a new node \({\varvec{v}}_j\) following an existing node \({\varvec{v}}_i\) at the timestamp t,

$$\begin{aligned} p\left( {\varvec{v}}_i,{\varvec{v}}_j \mid \mathcal {H}^{i}(t)\right) =\frac{\lambda _{i, j}(t)}{\sum _{i^{\prime } \in \mathcal {H}^{i}(t)} \lambda _{i^{\prime }, j}(t)}. \end{aligned}$$
(3)
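Putting Eqs. (1)–(3) together, one possible PyTorch sketch is shown below. The decay kernel \(\kappa \) is assumed to be exponential, which the text does not specify, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def intensity(x_i, x_j, hist_x, hist_t, t, W1, decay=1.0):
    """Eq. (1): base intensity plus attention-weighted, time-decayed history influence."""
    base = F.relu(x_i @ W1 @ x_j)               # g(x_i, x_j)
    scores = hist_x @ W1 @ x_j                  # f(x_{i'}, x_j) for each history node
    alpha = torch.softmax(scores, dim=0)        # Eq. (2)
    kappa = torch.exp(-decay * (t - hist_t))    # assumed exponential kernel kappa(t - t_{i'})
    return base + (alpha * scores * kappa).sum()

def edge_probability(lambda_ij: torch.Tensor, lambda_candidates: torch.Tensor) -> torch.Tensor:
    """Eq. (3): normalise the intensity of edge (i, j) over the candidate history sources."""
    return lambda_ij / lambda_candidates.sum()
```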

The objective function to minimize is the negative log-likelihood of all the events in the sequence, \( \mathcal {L}_{TPP}=-\sum _{t \in \mathcal {T}} \sum _{({\varvec{v}}_i,{\varvec{v}}_j, t) \in \mathcal {E}} \log p\left( {\varvec{v}}_i,{\varvec{v}}_j \mid \mathcal {H}^{i}(t)\right) . \) Negative sampling is used to generate non-existing edges in the objective function as done in [9], so that the learnt node embeddings are able to distinguish which two nodes are connected and which two are not, i.e., the geometric structure. Maximizing the intensity at occurrence timestamps while minimizing the intensity otherwise will enforce the node embeddings to capture temporal dynamics.
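A hedged sketch of how the objective with negative sampling might be computed; the specific penalty applied to sampled non-edges is our own choice and is not taken from [9].

```python
import torch

def tpp_loss(pos_prob: torch.Tensor, neg_prob: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """pos_prob: p(v_i, v_j | H^i(t)) for observed edges (Eq. 3);
    neg_prob: the same quantity evaluated on sampled non-existing edges."""
    nll = -torch.log(pos_prob + eps).sum()             # L_TPP over observed events
    contrast = -torch.log(1.0 - neg_prob + eps).sum()  # push down probability of sampled non-edges
    return nll + contrast
```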

4.2 Predictive Task

Macro-dynamics describe the evolution pattern of the network scale. We assume the network scale can be described with a certain dynamics equation. Given a dynamic graph \(\mathcal {G}\), we have the cumulative number of nodes N(t) by timestamp t. We empirically find that N(t) increases in a power law, which is presented in Sect. 5. To approximate the power law, we define the following predictive equation

$$\begin{aligned} \hat{N}(t) = N_{max}*(1 - \alpha *\exp (-\beta *t)), \end{aligned}$$
(4)

where \(N_{max}\), \(\alpha \) and \(\beta \) are learnable parameters. \(N_{max}\) is the maximum number of nodes that this graph will contain while \(\alpha \) and \(\beta \) control how fast the graph scale will increase. Predictive loss is measured by \(\mathcal {L}_{Pred} = (N(t) - \hat{N}(t))^2\).
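A small PyTorch sketch of Eq. (4) and the predictive loss; the initial parameter values are placeholders, and summing the squared error over observed timestamps is our assumption.

```python
import torch
import torch.nn as nn

class ScalePredictor(nn.Module):
    """N_hat(t) = N_max * (1 - alpha * exp(-beta * t)), Eq. (4)."""
    def __init__(self):
        super().__init__()
        self.N_max = nn.Parameter(torch.tensor(100.0))  # asymptotic graph size (placeholder init)
        self.alpha = nn.Parameter(torch.tensor(1.0))
        self.beta = nn.Parameter(torch.tensor(0.1))

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        return self.N_max * (1.0 - self.alpha * torch.exp(-self.beta * t))

def pred_loss(N_true: torch.Tensor, N_hat: torch.Tensor) -> torch.Tensor:
    """L_Pred, summed over the observed timestamps."""
    return ((N_true - N_hat) ** 2).sum()
```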

4.3 Veracity Classification

We have designed a temporal point process to capture the geometric structure and temporal evolution of the propagation graph. With the node embeddings, we obtain the graph embedding by concatenating the mean pooling and the maximum pooling of all nodes as well as the source claim being verified, \( {\varvec{x}}_G = \left[ MeanPool(\mathcal {S}) || MaxPool(\mathcal {S}) || {\varvec{c}} \right] . \) The graph embedding is then concatenated with the parameters of the predictive task, i.e., \({\varvec{x}} = [{\varvec{x}}_G || N_{max} || \alpha || \beta ]\). The veracity prediction is conducted by a Multi-Layer Perceptron (MLP), \( \hat{\mathbf {y}}={\text {softmax}}\left( {\text {ReLU}}\left( \mathbf { W}_{2}{\varvec{x}}+\mathbf {b}\right) \right) , \) where \(\mathbf {W}_2\) and \(\mathbf {b}\) are trainable parameters. The classification loss is the cross-entropy: \( \mathcal {L}_{MLP}=-y \log \left( \hat{y}_{1}\right) -(1-y) \log \left( \hat{y}_{0}\right) . \) We take the weighted sum of the TPP loss, the predictive loss and the MLP loss as the final loss function, \( \mathcal {L} = \mathcal {L}_{TPP} + \omega _1 * \mathcal {L}_{Pred} + \omega _2 * \mathcal {L}_{MLP}. \)
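The readout and classifier described above can be sketched as follows; the feature dimensions, the pooling over the engagement embeddings, and the loss weights are placeholders rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VeracityClassifier(nn.Module):
    def __init__(self, node_dim: int):
        super().__init__()
        # input: [mean pool || max pool || claim embedding || N_max || alpha || beta]
        self.W2 = nn.Linear(3 * node_dim + 3, 2)

    def forward(self, node_emb, claim_emb, N_max, alpha, beta):
        # node_emb: (N, d) engagement embeddings; claim_emb: (d,);
        # N_max, alpha, beta: 0-dim tensors from the macro-dynamics predictor
        x_G = torch.cat([node_emb.mean(dim=0), node_emb.max(dim=0).values, claim_emb])
        x = torch.cat([x_G, torch.stack([N_max, alpha, beta])])
        return F.softmax(F.relu(self.W2(x)), dim=-1)   # [y_hat_0, y_hat_1]

def total_loss(l_tpp, l_pred, l_mlp, w1: float = 1.0, w2: float = 1.0):
    """L = L_TPP + w1 * L_Pred + w2 * L_MLP (weights are hyperparameters)."""
    return l_tpp + w1 * l_pred + w2 * l_mlp
```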

Table 1. Statistics of the used datasets.

5 Experiments

We use two Twitter datasets [12], Twitter15 and Twitter16, in the experiments. Each dataset is a collection of stories comprising a source tweet being verified and the sequence of its retweets. We select the “True” and “False” source tweets to build the experimental datasets, and split each dataset into training, validation and test sets of 70%, 10% and 20%, respectively. We train the model on the training set, tune hyperparameters on the validation set and report performance on the test set. We crawl user information according to user IDs via the Twitter API (Table 1).
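A minimal sketch of the 70/10/20 split described above, assuming scikit-learn; the stratification by label and the random seed are our assumptions, not stated in the paper.

```python
from sklearn.model_selection import train_test_split

def split_claims(claims, labels, seed=42):
    # 70% train; the remaining 30% is split into 10% validation and 20% test
    train_x, rest_x, train_y, rest_y = train_test_split(
        claims, labels, test_size=0.3, stratify=labels, random_state=seed)
    val_x, test_x, val_y, test_y = train_test_split(
        rest_x, rest_y, test_size=2 / 3, stratify=rest_y, random_state=seed)
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)
```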

As we set out to tackle the misinformation detection task, we compare our model with state-of-the-art baselines. RFC [5] is a random forest model with features from the source tweets and engaged user profiles. CRNN [7] combines convolutional and recurrent neural networks to extract features from engaged users and retweet texts. CSI [15] incorporates relevant articles and analyses the group behaviour of engaged users. dEFEND [16] uses a co-attention mechanism to study the source claims and user features. The graph-based baseline GCAN was described in Sect. 2.

Fig. 2. Average number of nodes in a dissemination tree as a function of time since the source claim's publication (left: Twitter15; right: Twitter16). The solid curves follow the power law approximation.

6 Results and Analysis

To demonstrate the dissemination trends of true and false claims, we plot the mean number of nodes in the temporal graphs of each veracity class at 5-minute intervals for the first 200 min after a source tweet's posting time. In Fig. 2, we make three interesting observations. (1) Both claim veracity types exhibit a similar power-law trend with a plateauing gradient. (2) Contrary to much of the misinformation literature, which suggests that fake news spreads faster than true news [18], within our datasets true news stories spread faster and reach more users on average. (3) There is a far greater disparity between the mean spreading curves in the Twitter16 dataset than in the Twitter15 dataset. This indicates that it is easier to extract temporal features that are consistent within a given veracity class in Twitter16.
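The curves in Fig. 2 can be reproduced with a short script of the following form; the data layout (one sorted array of engagement times per claim, in minutes) is an assumption.

```python
import numpy as np

def mean_growth_curve(claims, horizon: int = 200, step: int = 5) -> np.ndarray:
    """claims: list of sorted 1-D arrays of engagement times (minutes since the source post)."""
    grid = np.arange(0, horizon + step, step)
    # N(t) per claim: 1 (the source node) + number of engagements observed by time t
    counts = np.stack([1 + np.searchsorted(ts, grid, side="right") for ts in claims])
    return counts.mean(axis=0)  # mean dissemination-tree size at each grid point
```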

Table 2. Test results on the two experimental datasets.

Table 2 reports the misinformation detection performance of our model against state-of-the-art baselines on the test subsets. Our model achieves performance comparable to GCAN and, notably, outperforms it on the Twitter16 dataset. This can be explained by the fact that Twitter16 displays a greater disparity between the mean spreading curves of true and false claims, and our model captures such patterns to reach higher performance.

7 Conclusion

This study sets out to detect and forecast misinformation. We model misinformation propagation as a continuous-time dynamic graph and employ temporal point processes to capture the geometric and temporal patterns of the graph. We also develop a power law equation to forecast the growth of the graph scale. Experiments show that our model achieves state-of-the-art performance in misinformation detection. Future work will investigate more comprehensive methods of combining temporal and geometric features for propagation-based misinformation detection.