1 Introduction

In recent years, the topic of fake news has experienced a resurgence of interest in society. The increased attention stems largely from growing concerns about the widespread impact of fake news on public opinion and events. In January 2017, a spokesman for the German government stated that they were dealing with a phenomenon of a dimension they had not seen before, referring to the proliferation of fake news on social media. Although social media has increased the ease with which real-time information disseminates, its popularity has exacerbated the problem of fake news by expanding the speed and scope at which false information can spread. Fuller et al. [1] noted that with the massive growth of online communication, the potential for people to deceive through computer-mediated communication has also grown, and such deception can have disastrous and far-reaching results on many areas of our lives. Figure 1 shows the social network of the users who share fake news and true news [2,3,4].

Fig. 1

a, b The social network of the users who share fake news and true news

Over the past decade, fake news detection technologies have developed along two main lines: cue- and feature-based methods [5,6,7], which distinguish fake from true news content by designing a set of linguistic cues that are informative of content veracity, and linguistic-analysis-based methods [8, 9], which distinguish fake from true news by exploiting differences in writing style, language, and sentiment. The latter do not require task-specific, hand-engineered cue sets and instead rely on automatically extracting linguistic features from the text. Unfortunately, variation in linguistic cues implies that a new cue set must be designed for each new situation, making cue- and feature-engineering methods hard to generalize across topics and domains. Linguistic analysis methods, although better than cue-based methods, still do not fully extract and exploit the rich semantic and syntactic information in the content.

Neural networks are attractive machine learning models that can learn nonlinear mappings from data, especially deep models. They have been employed for automatic fake news detection [10,11,12] and show impressive practical performance. However, even with the sophisticated feature extraction of deep learning methods, fake news detection remains a challenge, primarily because the content is crafted to resemble the truth in order to deceive readers, and without fact-checking or additional information, it is often hard to determine veracity by text analysis alone. To tackle these challenges, we propose MCNN-TFW, which provides deeper semantic analysis and understanding of a news article's text and its veracity through the relationship between the article text content and the corresponding weights of the sensitive words it invokes.

The multi-level convolutional neural network (MCNN) is designed to condense word-level information into sentence representations and then process these sentence-level representations with a convolutional neural network, effectively capturing semantic information from news texts that can be used to classify an article as fake or not. The method of calculating the weight of sensitive words (TFW) quantifies the importance of these words to true and fake news. In MCNN-TFW, MCNN captures semantic information from the article text by representing it at the sentence and word levels, while TFW calculates the weights of sensitive words to assist fake news detection.

We designed and implemented MCNN-TFW with 1200 lines of Python code. After applying MCNN-TFW to datasets of culture-related news, we found that it exhibits impressive fake news detection performance. In summary, our main contributions include the following:

  1. MCNN employs local convolutional features as well as global semantic features for context feature learning.

  2. We propose a method of calculating the TFW (the weight of sensitive words), whose values show a strong correlation with the fake or true labels.

  3. We design and implement MCNN-TFW, a novel multi-level convolutional neural network-based fake news detection system that can be trained end to end.

  4. We conduct extensive experiments to evaluate the performance of the MCNN-TFW system. The results show that it exhibits impressive fake news detection performance.

The rest of this paper is organized as follows. Section 2 discusses related work on fake news detection. The framework of MCNN-TFW is described in detail in Section 3. The experimental results that show the performance of our framework are in Section 4. We conclude this paper in Section 5.

2 Related work

In order to detect fake news from article text, earlier fake news detection works were mainly based on manually designed features extracted from news articles or information generated during the news propagation process. Figure 2 shows the word cloud of fake news content and true news content [2, 4]. Though intuitive, manual feature engineering is labor intensive, not comprehensive, and hard to generalize. Recent research has focused on deep learning content-based methods. Deep learning methods alleviate the shortcomings of linguistic analysis based methods by automatic feature extraction, being able to extract both simple features and more complex features that are difficult to specify. Deep learning-based methods have demonstrated significant advances in text classification and analysis [10, 12, 13] and are powerful methods for feature extraction and classification with their ability to capture complex patterns relevant to the task. In this section, we will summarize these techniques.

Fig. 2

a, b Word cloud of fake news content and true news content

2.1 Linguistic analysis

The most effective linguistic analysis method applied to fake news detection is the n-gram approach [8, 9, 14]. n-grams are sequences of n contiguous words in a text, constituting words (unigrams) and phrases (bigrams, trigrams), and are widely used in language modeling and text analysis. Apart from word-based features such as n-grams, syntactic features such as part-of-speech (POS) tags are also exploited to capture linguistic characteristics of texts. Ott et al. [9] examined whether variation in POS tag distribution also exists with respect to text veracity. They trained an SVM classifier using relative POS tag frequencies of texts as features on a dataset containing fake reviews. Ott et al. obtained better classification performance with the n-gram approach, but nevertheless found that the POS tag approach is a strong baseline outperforming the best human judge. Later work has considered deeper syntactic features derived from probabilistic context-free grammar (PCFG) trees [15, 16]. Feng et al. [17] examined the use of PCFGs to encode deeper syntactic features for deception detection. In particular, they proposed four variants when encoding production rules as features.
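For concreteness, a minimal sketch of such an n-gram baseline with scikit-learn is shown below; the TF-IDF weighting, linear SVM, and toy documents are illustrative assumptions rather than the exact setup of [9].

```python
# Illustrative n-gram + SVM baseline in the spirit of Ott et al. [9];
# the toy documents, TF-IDF weighting, and hyperparameters are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["the museum exhibit opened to the public",             # toy real-news snippet
         "shocking secret artifact found, experts hide truth"]  # toy fake-news snippet
labels = [1, 0]  # 1 = real, 0 = fake

# Unigram + bigram features weighted by TF-IDF, classified with a linear SVM.
baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
baseline.fit(texts, labels)
print(baseline.predict(["ancient historical site discovered"]))
```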

Even with word-based n-gram features combined with deeper syntactic features from PCFG trees, linguistic analysis methods, although better than cue-based methods [5,6,7], still do not fully extract and exploit the rich semantic and syntactic information in the content. The n-gram approach is simple and cannot model more complex contextual dependencies in the text. Syntactic features used alone are less powerful than word-based n-grams, and a naive combination of the two cannot capture their complex interdependence.

2.2 Convolutional neural networks

Convolutional neural networks (CNN) [18] are widely used in natural language processing tasks such as semantic parsing [12, 19,20,21,22] and text classification [13]. Wang [23] proposed a convolutional neural network to classify short political statements as fake or not using the text features of the statements and available metadata. Qian et al. [24] similarly demonstrated improved CNN performance over linguistic-analysis-based methods such as LIWC, POS, and n-grams when classifying a collection of news articles as fake or true. In addition, to handle longer article texts, Qian et al. suggested a variant of the CNN architecture called the two-level convolutional neural network (TCNN), which first averages the word embedding vectors of the words in a sentence to generate sentence representations and then represents articles as sequences of sentence representations provided as input to the convolutional and pooling layers. Qian et al. found the TCNN variant to be more effective than the CNN in classifying the articles.

2.3 Other variants

Recurrent neural network (RNN) [25, 26] based architectures have also been proposed for fake news detection. RNNs process the word embeddings in the text sequentially, one word/token at a time, using the information from the current word at each step to update a hidden state that aggregates information about the previous words. The final hidden state is generally taken as the feature representation extracted by the RNN for the given input sequence. A specific variant called long short-term memory (LSTM) [27,28,29,30], which alleviates some of the training difficulties of RNNs, is often used due to its ability to effectively capture long-range dependencies in text, and it has been applied to fake news detection, much like convolutional neural networks, in several works [31,32,33,34]. In another variant, LSTMs have been applied to both the article headline and the article body text in an attempt to classify the level of disagreement between the two for deception detection [35].

3 Methodology of MCNN-TFW

In this section, we introduce the architecture of the proposed MCNN-TFW fake news detection model. MCNN-TFW is composed of two parts: (1) MCNN represents each article with local convolutional features as well as global semantic features and is used to detect fake news articles; (2) TFW is employed to construct the sensitive vocabulary set and calculate the weights of sensitive words. The architecture of the proposed model is shown in Fig. 3.

Fig. 3

The overview architecture of MCNN-TFW: (1) multi-level convolutional neural network (Section 3.2) and (2) method of calculating the weight of sensitive words (Section 3.3)

3.1 Notations

We consider the setting where we have a set of news articles D, and each article is denoted as di. Each article di is composed of a sequence of sentences \( {x}_1,{x}_2,\cdots, {x}_{n_i} \), where ni is the number of sentences in article di. In the proposed model, the final feature vector extracted from each article di for classification is denoted y. For each article, there is an associated weight of sensitive words; the weight assigned to article di is denoted WS(di). For each article di, the target is denoted yi, where yi = 1 means the article is real news and yi = 0 means the article is fake news.

3.2 Multi-level convolutional neural network

Due to the sequential nature of sentences, recurrent neural networks are widely employed to produce textual features. In our proposed framework, as shown in Fig. 4, we employ a CNN for sentence encoding instead of an RNN. The hierarchical representations of CNNs make local semantic learning possible in the convolutional layers, which may better reflect characteristic information such as sensitive words. Specifically, instead of only enforcing consistency in the semantic space of global features, we also add consistency constraints on the local convolutional features. This additional constraint encourages the model to take regional semantics into consideration and focus more on sensitive words. Eventually, this design is expected to produce more robust and better global semantics for fake news detection.

Fig. 4

The proposed convolutional neural network architecture for text representation learning. It consists of several convolutional layers with different kernel sizes and hierarchical text representation learning

Therefore, our model has two learning objectives. The first, the global objective, is to learn semantic embeddings using the feature representations of whole sentences. The second, the local objective, flattens the convolutional features into a vector.

Firstly, following the design in [36], the word embeddings are initialized with a word2vec model pre-trained on the Google News corpus [37], where each word is embedded into a feature space of dimension k. Each sentence representation is derived from the embeddings of its words, and the article representation is the concatenation of these sentence representations. Let \( {x}_i\in {\mathbb{R}}^k \) denote the k-dimensional representation of the ith sentence. The article di, containing ni sentences, is represented as:

$$ {x}_{1:{n}_i}={x}_1\oplus {x}_2\oplus \cdots \oplus {x}_{n_i} $$
(1)

where ⊕ is the concatenation operator. In general, let \( {x}_{i:i+j} \) refer to the concatenation of \( {x}_i,{x}_{i+1},\cdots, {x}_{i+j} \). Note that each sentence is represented at the word level, while the news article is now represented at the sentence level, as shown in Eq. (1).

A convolution operation then applies a filter \( w\in {\mathbb{R}}^{hk} \) to a window of h sentences moving through the article to extract semantic features from it. A feature ri is generated from a window of sentences \( {x}_{i:i+h-1} \) by

$$ {r}_i=f\left(w\cdot {x}_{i:i+h-1}+b\right) $$
(2)

Here \( b\in \mathbb{R} \) is a bias term and f is an activation function such as ReLU. This filter is applied to each possible window of sentences in the article, \( \left\{{x}_{1:h},{x}_{2:h+1},\cdots, {x}_{n-h+1:n}\right\} \), to produce a feature map for each article

$$ \mathrm{Loc}=\left[{r}_1,{r}_2,\cdots, {r}_{n-h+1}\right] $$
(3)

where \( \mathrm{Loc}\in {\mathbb{R}}^{n-h+1} \) is the vector of local convolutional features. We then apply a max-over-time pooling operation over the feature map and take the maximum value \( \mathrm{Glo}=\max \left\{\mathrm{Loc}\right\} \) as the feature corresponding to this particular filter; Glo is the global semantic feature. Filters of different lengths, or of the same length but with different parameters, are applied in order to capture features of different scales and meanings.

Finally, we combine the local convolutional features and the global semantic features (Section 3.4) to form a final feature vector, which is used as the input to a fully connected layer with a softmax output.
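As a concrete illustration, the following Keras sketch builds this multi-level CNN over precomputed sentence representations; the maximum number of sentences (MAX_SENTS) and the simple concatenation of Glo and Loc before the classifier are assumptions, with the weighted fusion of Section 3.4 omitted here. The filter sizes, number of filters, embedding dimension, optimizer, and dropout follow the settings in Section 4.1.2.

```python
# Sketch of the multi-level CNN over sentence representations (assumption:
# each article is a (MAX_SENTS, K) matrix of averaged word2vec sentence vectors).
import tensorflow as tf

MAX_SENTS, K, NUM_FILTERS = 30, 128, 64   # MAX_SENTS is an illustrative assumption

inputs = tf.keras.Input(shape=(MAX_SENTS, K))
local_feats, global_feats = [], []
for h in (3, 4, 5):  # filter (window) sizes over sentences
    conv = tf.keras.layers.Conv1D(NUM_FILTERS, h, activation="relu")(inputs)
    local_feats.append(tf.keras.layers.Flatten()(conv))              # Loc: local convolutional features
    global_feats.append(tf.keras.layers.GlobalMaxPooling1D()(conv))  # Glo: max-over-time pooling

loc = tf.keras.layers.Concatenate()(local_feats)
glo = tf.keras.layers.Concatenate()(global_feats)
# Simplified fusion; the weighted combination with WS(d_i) is described in Section 3.4.
features = tf.keras.layers.Concatenate()([glo, loc])
features = tf.keras.layers.Dropout(0.5)(features)
outputs = tf.keras.layers.Dense(2, activation="softmax")(features)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```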

3.3 TFW: method of calculating the weight of sensitive words

Textual information of fake news can reveal important signals for credibility detection, and a set of frequently used words can be extracted from fake news. These extracted words show strong correlations with the fake or true labels.

Therefore, we first extracted frequent words (named sensitive words) from the dataset we constructed (Section 4.1); let WV denote the complete vocabulary set. We then calculated the weight of each sensitive word, which denotes its importance to fake and real news. Given that many sensitive words occur in both fake and real news, the measurement would be biased if the coefficient of a sensitive word were calculated only from its frequency of occurrence in fake news. We therefore propose TFW, a measure of the sensitivity coefficient of sensitive words that exploits the idea of TF-IDF [38,39,40]. To compute it, 5618 fake news articles and 5290 real news articles were downloaded from the above dataset, with the real news spanning eight categories such as art, artifacts, and historical sites. We define six terms for each sensitive word si to characterize its distribution in our fake and real news:

  1. nc(si): fake news count of si. It denotes the number of fake news articles in the fake news dataset that use si.

  2. rc(si, c): real news count of si. It denotes the number of real news articles in category c that use si.

  3. nrt(si): ratio of nc(si) to the total number of fake news articles in the fake news dataset, denoted p. It is obtained as \( nrt\left({s}_i\right)=\frac{nc\left({s}_i\right)}{p} \), where p = 5618 in our work.

  4. rrt(si, c): ratio of rc(si, c) to the total number of real news articles in category c, denoted q(c). It is obtained as \( rrt\left({s}_i,c\right)=\frac{1+ rc\left({s}_i,c\right)}{q(c)} \).

  5. nrk(si): rank of nrt(si) among all the sensitive words.

  6. rrk(si, c): rank of rrt(si, c) among all the sensitive words in category c.

Through analysis of the above dataset using these six terms, we can draw three conclusions:

  1. Several sensitive words are used frequently in both fake and real news.

  2. Several sensitive words are used more frequently in fake news than in real news.

  3. rrt(si, c) and rrk(si, c) differ across categories.

In the text mining literature, TF-IDF is a numerical statistic intended to reflect how discriminating a term is for a document in a corpus. Borrowing this idea, we make the sensitivity coefficient scs of a sensitive word positively correlated with its nrt and negatively correlated with its rrt. For a sensitive word si in a news article that belongs to a specific category c, its sensitivity coefficient scs(si) is calculated with Eq. (4).

$$ scs\left({s}_i\right)= nrt\left({s}_i\right)\times \log \frac{1}{rrt\left({s}_i,c\right)} $$
(4)

The formula shows that the sensitivity coefficients calculated by the TFW measure reflect the importance of sensitive words in different categories. However, some datasets have no category information. For such news, we therefore calculate the sensitivity coefficients of sensitive words as:

$$ scs\left({s}_i\right)= nrt\left({s}_i\right)\times \log \frac{1}{rrt\left({s}_i\right)} $$
(5)

rrt(si) denotes the fraction of all real news articles that use the sensitive word si; it is obtained with Eq. (6), in which C denotes the set of all real news categories.

$$ rrt\left({s}_i\right)=\frac{1+{\sum}_{c\in C} rc\left({s}_i,c\right)}{\sum \limits_{c\in C}q(c)} $$
(6)
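To make the computation concrete, the following Python sketch evaluates the sensitivity coefficient according to Eqs. (4)–(6); the toy documents and the set-of-tokens document representation are illustrative assumptions.

```python
# Sketch of the TFW sensitivity coefficient (Eqs. 4-6); the toy documents
# and the set-of-tokens representation are illustrative assumptions.
import math

def scs(word, fake_docs, real_docs_by_cat, category=None):
    p = len(fake_docs)                                   # total number of fake news articles
    nc = sum(word in doc for doc in fake_docs)           # fake news count of the word
    nrt = nc / p                                         # ratio for fake news

    if category is not None:                             # Eq. (4): category known
        docs = real_docs_by_cat[category]
        rrt = (1 + sum(word in doc for doc in docs)) / len(docs)
    else:                                                # Eqs. (5)-(6): no category info
        rc_total = sum(sum(word in doc for doc in docs)
                       for docs in real_docs_by_cat.values())
        q_total = sum(len(docs) for docs in real_docs_by_cat.values())
        rrt = (1 + rc_total) / q_total

    return nrt * math.log(1.0 / rrt)

# Toy usage (documents represented as sets of tokens):
fake = [{"shocking", "secret"}, {"shocking", "hoax"}]
real = {"art": [{"museum", "exhibit"}], "history": [{"ancient", "site"}]}
print(scs("shocking", fake, real))          # no category -> Eqs. (5)-(6)
print(scs("shocking", fake, real, "art"))   # with category -> Eq. (4)
```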

3.4 Fake news detection using MCNN-TFW

MCNN extracts features, including the local convolutional features and the global semantic features, from the article text and uses them to predict whether the article is fake or not, whereas TFW calculates the weight of the sensitive words that occur in each news article.

In this section, we combine the local convolutional features and the global semantic features, weighted by the sensitive-word weight of each news article, to form a final feature vector. Based on the pre-extracted vocabulary set WV, for a given news article di ∈ D, we calculate its sensitive-word weight as:

$$ WS\left({d}_i\right)=\sum \limits_{s_i\in WV} scs\left({s}_i\right) $$
(7)

The final feature vector is denoted as:

$$ \mathrm{VLG}=\mathrm{Glo}\cdot WS\left({d}_i\right)+\mathrm{Loc} $$
(8)

Then, the final feature vector is fed into a feedforward softmax classifier, as shown in Fig. 3, to predict whether the news is fake or real. During training, the loss for each batch of n samples is the sum of the cross-entropy between the network's predictions and the true labels:

$$ L=-\sum \limits_{i=1}^n\left[{Y}_i\log {\hat{Y}}_i+\left(1-{Y}_i\right)\log \left(1-{\hat{Y}}_i\right)\right] $$
(9)

where Yi denotes the true label of the ith input and \( {\hat{Y}}_i \) denotes the predicted probability of the real class [41,42,43,44,45,46,47,48].
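A minimal sketch of this fusion and the batch loss is given below; the projection that aligns the dimension of Loc with that of Glo and the tensor shapes are assumptions introduced for illustration only.

```python
# Sketch of the feature fusion (Eq. 8) and the batch cross-entropy loss;
# the projection that aligns Loc with Glo's dimension is an assumption.
import tensorflow as tf

proj = tf.keras.layers.Dense(192)                         # map Loc to Glo's size (assumption)
classify = tf.keras.layers.Dense(2, activation="softmax")
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def predict(glo, loc, ws):
    """glo: (batch, 192) global features, loc: (batch, m) local features,
    ws: (batch, 1) sensitive-word weights WS(d_i)."""
    vlg = glo * ws + proj(loc)                            # Eq. (8): VLG = Glo * WS(d_i) + Loc
    return classify(vlg)                                  # softmax over {fake, real}

# For a batch with integer labels y (1 = real, 0 = fake):
#   batch_loss = loss_fn(y, predict(glo, loc, ws))
```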

4 Evaluation

In this section, to empirically validate our developed system MCNN-TFW, we first introduce the study setup of our experiments and then address the following five research questions.

  • RQ 1: does MCNN-TFW achieve high detection performance?

  • RQ 2: does MCNN-TFW outperform the baseline approaches in terms of accuracy?

  • RQ 3: what is the role of MCNN in MCNN-TFW?

  • RQ 4: can MCNN-TFW work efficiently and scale to a large number of articles?

  • RQ 5: does the proposed method have general applicability?

4.1 Study setup

4.1.1 Datasets

We utilize five available fake news datasets in this study, drawn from benchmark datasets commonly used by current methods. The first dataset is the LIAR dataset collected by Wang [23]. For the next two datasets, we utilize two available online fake news datasets, Weibo [49] and Twitter15 [50]. The last two datasets are provided by NewsFN (https://github.com/GeorgeMcIntire/fake_real_news_dataset) and KaggleFN (https://www.kaggle.com/mrisdal/fake-news). In this paper, our focus is the detection of fake news in cultural communication. Therefore, we combine all datasets and extract the culture-related news from them to form two new datasets. Dataset I uses 4180 news articles from Weibo, Twitter15, and NewsFN. Dataset II uses 6728 news articles from LIAR and KaggleFN, including 3518 fake news and 3210 real news articles. Table 1 lists the information about these two datasets.

Table 1 The dataset information used in all the experiments

4.1.2 Experimental setting

In the experiments, we set the word embedding dimension to 128 and the filter sizes to 3, 4, and 5. For each filter size, 64 filters are randomly initialized and trained. The whole network is trained using the Adam optimization algorithm with a learning rate of 0.001 and a dropout rate of 0.5. The mini-batch size is 64, and both the local and global losses are computed within each mini-batch.
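For reference, a training call reflecting these settings might look like the sketch below, reusing the model from the Section 3.2 sketch; the placeholder arrays and the epoch count are assumptions, since the number of epochs is not reported.

```python
# Training with the settings of Section 4.1.2; the data arrays are placeholders
# and the epoch count is an assumption.
import numpy as np

X_train = np.zeros((1000, 30, 128), dtype="float32")   # placeholder article tensors
y_train = np.random.randint(0, 2, size=(1000,))        # placeholder labels (0 = fake, 1 = real)

model.fit(X_train, y_train,
          batch_size=64,          # mini-batch size from Section 4.1.2
          epochs=10,              # illustrative value
          validation_split=0.1)   # held-out split for monitoring (assumption)
```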

We build and train the model using TensorFlow and use tenfold cross-validation to evaluate it. We deployed our framework on an Ubuntu 14.04 machine with an Intel Core i7-5820K CPU, a GeForce GTX TITAN X GPU, and 104 GB of RAM. In the experiments, the GPU was used to accelerate the machine learning algorithms. Table 2 lists the metrics used to evaluate MCNN-TFW.

Table 2 Descriptions of the used metrics

4.2 Accuracy of MCNN-TFW

4.2.1 RQ 1: does MCNN-TFW achieve high detection performance?

We show the performance of MCNN-TFW fake news detection in cultural communication on dataset I and dataset II in Table 3. The classifier achieves an accuracy of 91.67% on dataset I and 92.08% on dataset II, showing that MCNN-TFW can classify fake news with high accuracy. Figure 5 shows the classification accuracy of MCNN-TFW on dataset I and dataset II.

Table 3 Performance of MCNN-TFW fake news detection on dataset I and dataset II
Fig. 5

Classification accuracy of MCNN-TFW on two datasets

As shown in Fig. 6, we present the detection performance for varying percentages (20–80%) of the data samples used as training data, to evaluate the variation and stability in performance of the evaluated methods. Overall, MCNN-TFW can classify fake news with high performance even when the training data is limited.

Fig. 6

a–c Detection performances for MCNN-TFW on dataset I and dataset II. The x-axis represents the percentage of all data used as training data

We can infer that the main reason is that MCNN first introduces local convolutional features as well as global semantic features to effectively capture semantic information from article texts. Then, the weights of sensitive words (TFW) are calculated, and these weights show a strong correlation with the fake or true labels. Finally, in MCNN-TFW, MCNN captures semantic information from the article text by representing it at the sentence and word levels, while TFW calculates the weights of sensitive words to assist fake news detection.

Answer to RQ 1

MCNN-TFW can classify fake news with high accuracy. The accuracy of the classifier is 91.67% on dataset I and 92.08% on dataset II.

4.3 Comparison of MCNN-TFW with other advanced defense methods

4.3.1 RQ 2: does MCNN-TFW outperform the baseline approaches in terms of accuracy?

To show the performance of our framework compared with state-of-the-art detection systems, we investigated similar approaches that have been previously proposed. In this section, we compare the accuracy of MCNN-TFW with three baseline approaches, briefly introduced below:

  1. LIWC. Based on the work of Ott et al. [8], the first baseline uses LIWC (Linguistic Inquiry and Word Count) features for text analysis. LIWC is a widely used lexicon in social science studies proposed by Pennebaker et al. [51,52,53].

  2. CNN. Convolutional neural networks have achieved state-of-the-art results in text classification tasks, and the work of Wang [23] demonstrates superior performance of CNNs over recurrent architectures such as the bidirectional LSTM (long short-term memory) for fake news detection, so we choose a CNN for comparison.

  3. RST. We extract a set of RST relations using the implementation of the method proposed by Ji et al. [54], then vectorize the relations and employ an SVM for classification. This baseline, proposed by Rubin et al. [55], takes into account the hierarchical structure of documents via RST [56].

Experimental results show the detection performances of MCNN-TFW and the three baseline approaches on dataset I and dataset II. We use accuracy as the performance evaluation metric. Table 4 shows the comparison results on dataset I and dataset II, from which we make the following observations:

  1. The poor performance of RST is due to the following reasons: (a) using RST without an annotated corpus is not very effective, and (b) the RST relations are extracted using auxiliary tools optimized for other corpora, which cannot be applied effectively to the fake news corpus at hand. Note that annotating RST relations for our corpus would be extremely unscalable and time consuming.

  2. CNN achieves better performance than LIWC. In line with previous studies, this shows that for fake news detection, representing the text at the word level and feeding it to a CNN that extracts a semantic representation is more effective than employing existing pre-defined dictionaries as LIWC does.

  3. MCNN-TFW outperforms CNN thanks to the proposed multi-level representation. A single-layer CNN built over word-level article representations can only utilize combinations of several nearby words. By first condensing word-level information into each sentence and then deriving sentence-level representations for the news article, higher-level semantic information can be extracted more effectively.

Table 4 Classification accuracy of MCNN-TFW and three baseline approaches on dataset I and dataset II

Answer to RQ 2

MCNN-TFW outperforms the other methods compared against, including RST, LIWC, and CNN, due to its ability to effectively capture semantic information from the article text content. On the one hand, MCNN introduces local convolutional features as well as global semantic features; on the other hand, TFW further improves the performance of MCNN and pushes the accuracy even higher.

4.4 Evaluation of MCNN in MCNN-TFW

4.4.1 RQ 3: what is the effectiveness and significance of MCNN in MCNN-TFW?

In this section, based on dataset II above, we further validate the effectiveness and significance of the proposed MCNN in detecting fake news. We compare MCNN-TFW with a classifier built directly on MCNN; the latter does not use the proposed TFW and instead uses only the MCNN model of Section 3.2 to detect fake news. The MCNN classifier achieves an accuracy of 90.21%, which is close to the state of the art. With TFW, the accuracy of MCNN-TFW increases to 92.08%. We evaluated the performance of MCNN-TFW and MCNN on dataset II, and the experimental results are shown in Fig. 7.

Fig. 7

Detection performances for MCNN-TFW and MCNN

Figure 7 shows the detection performance of the tenfold cross-validations for MCNN-TFW and MCNN on dataset II. The ROC curves indicate that both MCNN-TFW and MCNN achieve a high TPR at a low FPR. In particular, the AUC values of MCNN-TFW and MCNN are near 0.96 and 0.94, respectively, indicating similar detection effectiveness. It is worth noting that MCNN-TFW achieves a TPR of 0.914 at an FPR of 0.008.

Answer to RQ 3

MCNN represents each article with local convolutional features as well as global semantic features and can effectively detect fake news articles.

4.5 Efficiency and scalability of MCNN-TFW

4.5.1 RQ 4: can MCNN-TFW work efficiently and scale to a large number of articles?

The number of both real and fake news articles is growing very quickly, making it increasingly important that fake news analysis scales so that fake news does not remain undetected long enough to do major damage. Therefore, we systematically evaluate the performance of our developed system MCNN-TFW, including its scalability and detection effectiveness.

We first compute the runtime overhead of the three main phases of MCNN-TFW: WV (the complete vocabulary set) construction, MCNN training, and testing. Once the MCNN is trained, using it to detect fake news is almost instantaneous. Therefore, for the evaluation of MCNN-TFW's efficiency, we mainly measure the WV construction and MCNN training running times on dataset I and dataset II.

Table 5 reports the WV construction and MCNN training execution times of MCNN-TFW (in minutes). It is worth noting that the WV construction step can be processed in parallel on multiple servers, so the total time overhead can be greatly reduced when hardware permits and there is a large amount of data.

Table 5 WV construction and MCNN training runtimes in minutes

Finally, we show the detection stability of MCNN-TFW with different sizes of sample sets in Fig. 8. From the results, we conclude that our developed system MCNN-TFW is feasible for practical fake news detection.

Fig. 8

Stability evaluation of MCNN-TFW

Answer to RQ 4

The low runtime overhead allows MCNN-TFW to work efficiently and scale to a large number of news articles.

4.6 General applicability of MCNN-TFW

4.6.1 RQ 5: does the proposed method have general applicability?

In the following experiments, we examine the generality of our proposed method by applying it to the Weibo and NewsFN fake news detection tasks. As with the experiments on fake news in cultural communication, we implement two groups of experiments: one presenting the classification accuracy on each dataset and another comparing against the baseline techniques.

As shown in Fig. 9, we report the performance of MCNN-TFW fake news detection on Weibo and NewsFN. The trend in accuracy is consistent with that found in the earlier experiments (Section 4.2). The classification accuracy is 88.82% on Weibo and 90.10% on NewsFN.

Fig. 9

Classification accuracy on NewsFN and Weibo datasets

As shown in Table 6, we report the classification accuracy of our proposed MCNN-TFW and the above three baseline methods on the Weibo and NewsFN datasets. The experimental results show the following:

  1. MCNN-TFW achieves higher accuracy than the other three methods, indicating that its fake news detection performance is significantly better than that of the other models.

  2. The RST model exhibits poor performance on both the Weibo and NewsFN datasets.

  3. Compared with the LIWC model, the accuracy of CNN is significantly better, improving by 18.18% and 19.30% on the Weibo and NewsFN datasets, respectively.

Table 6 Classification accuracy of MCNN-TFW and three baseline approaches on Weibo and NewsFN datasets

Answer to RQ 5

MCNN-TFW achieves higher accuracy than the other three comparison methods on the Weibo and NewsFN datasets, which demonstrates that our proposed method has general applicability.

5 Conclusion

We first proposed the multi-level convolutional neural network (MCNN), which introduces local convolutional features as well as global semantic features to effectively capture semantic information from article texts that can be used to classify news as fake or not. We then proposed a method for calculating the weight of sensitive words (TFW), whose values show a strong correlation with the fake or true labels. Finally, we developed MCNN-TFW, a multi-level convolutional neural network-based fake news detection system in which MCNN extracts the article representation and TFW calculates the weights of sensitive words for each news article.

Our extensive evaluation results show that MCNN-TFW outperforms state-of-the-art approaches in terms of accuracy and efficiency. Our proposed method for detecting fake news in cultural communication can also be readily applied to other fake news detection tasks. Future work will include studying the performance of our approach in a wider range of applications.