Introduction

The internet has become an inextricable element of our daily lives. When it comes to news consumption, traditional media such as newspapers and television are seeing a continued decline in popularity, and the emergence of various social media platforms has had a substantial influence on this transformation. The popularity of social networking websites such as Twitter [1] and Facebook [2] has grown exponentially. In November 2017, for example, Facebook announced that it had 2.07 billion monthly active users, 1.37 billion of whom logged on daily. As of January 2018, Twitter had 330 million users. These numbers have continued to rise since the establishment of these platforms, as shown in Fig. 1.

An increasingly large number of people rely on social media platforms not only to communicate with friends and family but also to keep up with current events and news from around the world. As a result, the reach of news now relies heavily on social media. According to [3], a British study has shown an increase in social media activity and, more importantly, in its role in media consumption. News consumption also varies by the age group of users, as shown in Fig. 2. According to Zubiaga et al. [4], social media has evolved into an essential publishing tool for journalists [5, 6] as well as a primary source of information for the public [7]. Journalists can use social media to gauge popular perception of current events and even to obtain new information, while citizens can follow the development of news and events through the official accounts of news outlets, along with content from their own circle, such as family and friends, on various platforms. Furthermore, owing to their inherent capacity to propagate information far more quickly than traditional media, social networks have proven to be particularly valuable during crises [8].

Fig. 1 Monthly active users of widely used social networking websites and platforms [9]

Fig. 2 Use of social media platforms as a news outlet by age group [9]

However, social media’s beneficial influence comes at a price: a lack of control and trustworthiness. The ease of posting content, news, and reviews makes social media a fertile platform for propagating unconfirmed and/or misleading material [4]. People frequently publish or share other people’s posts without confirming the source, legitimacy, or reliability of the content. Most of the time, an intriguing headline is enough to have an article shared thousands of times, even if its substance is incorrect or inaccurate.

Fake news may take the form of yellow journalism that uses the media and the internet to spread deliberate misinformation and falsehood. Fake news, however, is not a digital-era invention. Even before the internet, journalists were tasked with verifying and assessing the accuracy of news and sources, thereby limiting the influence of false information on public opinion. The authors in [10, 11], for example, note that in 2016 a bogus rumor about Hillary Clinton being involved in child trafficking prompted a man to open fire in a pizzeria believed to be one of the trafficking hot spots. Today’s social media platforms promote the spread of unconfirmed and incorrect information to large numbers of users, significantly affecting the global perception and comprehension of events [4]. The 2016 US presidential campaign was probably one of the most prominent examples of how misleading news can influence public belief. The authors of [12] examined the matter and reported several intriguing findings, for instance, that voters were exposed to a strong pro-Trump viewpoint during the election campaign. Furthermore, according to polls, Republican voters were more likely to trust both true and fake news headlines [12]. It is therefore safe to conclude that fake news, and misinformation in general, is becoming a huge problem on the internet and, consequently, in society. As a direct outcome, social media platforms and the academic community, along with various independent journalist groups, are quite active in finding and validating potentially fraudulent statements. Facebook, for example, has released a series of recommendations to help people avoid becoming victims of fake news articles.

Fake news online has become a considerable concern in recent years (Fig. 3) because of the proliferation of online media platforms. People frequently lack the time and expertise to examine clues and evaluate news trustworthiness, as seen in Fig. 4. Consequently, given the large number of online content consumers and suppliers, automatic identification of fake news is considered the only viable option for tackling this problem. Fake news detection has thus become an active research topic in both academia and industry.

This research study proposes a novel deep learning-based strategy for automatic detection of fake news after evaluating several machine learning and deep learning approaches. Experiments using 16 distinct configurations of machine learning models were conducted and evaluated. In terms of deep learning models, model configurations with Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), and CNN-BiLSTM models along with 3 different word-embedding techniques (a total of 12 distinct configurations) were utilized for experiments and thereafter evaluated against state-of-the-art approaches.

The major contributions of the proposed work are as follows:

  • A novel deep learning-based approach is presented to automatically detect fake news. The proposed approach significantly outperforms the existing state-of-the-art approaches for fake news detection.

  • Evaluation of 16 configurations of machine learning techniques and 12 configurations of deep learning techniques is presented.

  • The evaluation of the proposed approach, along with the competing approaches, is done using a consolidated dataset consisting of 64,934 news records. Two publicly available datasets, Fake and real news and allData, were integrated to create the final dataset for the experiments.

Fig. 3 Number of articles produced on fake news and rumors [13]

Fig. 4 Awareness of people about the accuracy of news [9]

The rest of the paper is organized as follows. Section “Literature Survey” of the paper briefly reviews the recent additions to the literature, with an emphasis on data processing, machine learning, and deep learning approaches. The proposed pre-processing approach and model are described in Sect. “Methodology”. Section “Results and Analysis” summarizes the findings and analysis based on the various methods. Section “Comparison with State of the Art” compares the proposed approach’s outcomes to current state-of-the-art methodologies. Conclusions and future scope of the study are presented in Sect. “Conclusion and Future Work”.

Literature Survey

Fake news is not a new phenomenon. Presently, however, it propagates mainly through social media platforms, much like related issues such as cyberbullying [14] and hate speech [15, 16], with a concentration on breaking events such as criminal gossip, satirical news, natural calamities, and the like. The emergence of fake news has been related to the availability of digital mass media technology. Unverified information may become part of fake news, and it plays an important role in influencing the society connected through social media platforms such as Facebook and Twitter. With increasing dependency on social media, fake news has transformed from textual to visual forms, increasing its impact on society. It often contains emotionally triggering language and targets users with low emotional intelligence; such emotionally charged fake news reaches these users easily and achieves its objective [17, 18]. According to the studies in [19,20,21], people with higher emotional intelligence can better differentiate fake from real news. A study on the role of emotional intelligence in identifying fake news articles from Facebook is presented in [22]; the authors concluded that people with high emotional intelligence fall for fake news less often than people with low emotional intelligence. An analysis of the psychological characteristics of text on social networking platforms concluded that fake news uses manipulative language that relies on dual meanings and steers away from the original information [23]; it also found that fake news contains fewer motion verbs than real news. Similarly, in [24,25,26], the authors discuss the role of sentiment in the spread of fake news on social media and other platforms, concluding that the senders of fake news exhibit more tension and that fake news carries more negative emotion than real news. Since anybody can contribute news pieces to digital media platforms, online news items feature well-researched snippets as well as opinion-based arguments that are simply false information. As there is no curator and/or moderator enforcing reliability standards for the material on these sites, false news can propagate easily. To make matters worse, it is difficult to tell the difference between factual news and half-true or false news. To address the issue of fake news, automatic identification has been attempted in recent years using deep learning and machine learning techniques, as reviewed briefly below.

In the paper [27], the authors discuss machine learning techniques to detect false news in the Russian language. They used TV shows and fact-checking websites as the dataset and a Support Vector Machine (SVM) model as the classifier, reporting an accuracy of 0.600. In the paper [28], the authors proposed a Vector Space model on TV show transcripts and reported an accuracy of 0.630. In the paper [29], the authors proposed the Credibility Propagation Network of Tweets (CPNT) model on the Sina Weibo dataset (a real-time, self-made dataset) and reported their performance using popular metrics such as accuracy (0.840), precision (0.786), and recall (0.933), noting that the proposed approach obtained significant improvements over state-of-the-art approaches. In the paper [30], the authors proposed the FakeFlow model, which combines topic and affective information extracted from text. They evaluated the model’s performance through several experiments on real-world datasets, achieving F1-scores of 0.960 on TruthShades, 0.880 on PoliticalNews, and 0.850 on FakeNewsNet, and reported results superior to state-of-the-art methods. In the paper [31], the authors discuss fake news detection using the BERT (Bidirectional Encoder Representations from Transformers) model with 0.860 accuracy; they evaluated the results on multiple datasets and concluded that the proposed approach outperforms state-of-the-art methods on all evaluation metrics. In the paper [32], the authors proposed a linguistic feature-driven model for fake news detection, reported an accuracy of 0.860 on the fakenewsdata1 dataset, and compared their model with machine learning and LSTM-based deep learning models. In the paper [33], the authors present a detailed analysis of fake news detection on the LIAR dataset and conclude that Naive Bayes performs better than other machine learning methods on that dataset. In the paper [34], the authors detected fake news using Bidirectional LSTM and Recurrent Neural Network (RNN) models, evaluated the proposed model on two publicly available, unstructured datasets, and reported an accuracy of 0.980. In the paper [35], the authors proposed fake news detection across multiple languages and platforms using k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM) methods; they used three datasets (TwitterBR, FakeBrCorpus, and btvlifestyle) and reported 0.910 accuracy. In the paper [36], the authors proposed automatic filtering of fake news in the Portuguese language, using an SVM model evaluated on the Fake.Br (self-made) dataset. In the paper [37], the authors discuss approaches to detect malicious profiles on Twitter using the Random Forest method and report an accuracy of 0.990. In the paper [38], the authors discuss fake news, its effect on society and individuals, and the challenges and opportunities in detecting it, achieving 0.990 accuracy in their experiments. Finally, in the paper [39], the authors present a detailed study of fake news detection and its implications, along with future directions.

In recent years, deep learning has also played an important role in fake news detection. A hybrid deep model is proposed in the paper [40] for the automatic detection of fake news. The authors used the CSI (Capture, Score, and Integrate) model along with two real-world datasets from Twitter and Weibo, reporting an accuracy of 0.890. An SRSR (Seriously Rapid Source Review) method was developed to distinguish fake from real information in journalism [41]; in this paper, the authors applied a CNN-based model within the SRSR system on a fake and real news dataset and reported a precision of 0.920 and a recall of 0.920. In the paper [42], the authors proposed the WeFEND (WEakly-supervised FakE News Detection) model, a reinforcement learning-based method, applied it to a WeChat dataset, and reported 0.824 accuracy. A deep CNN-based algorithm for fake news detection is proposed in the paper [43]; the method was evaluated on a fake news dataset and reported 0.980 accuracy. In the paper [44], the authors proposed a deep learning-based model for online fake news detection across multiple domains. They evaluated the model on the recently released FakeNewsAMT and Celebrity datasets and reported accuracies of 0.830 on FakeNewsAMT and 0.790 on Celebrity, noting that the proposed system outperforms the current handcrafted feature engineering-based state-of-the-art systems by significant margins of 3.08% and 9.3%, respectively.

A CNN-RNN-based hybrid deep learning model is proposed in the paper [45] for fake news detection. The authors evaluated their model on two new datasets (ISO and FA-KES) and reported 0.600 accuracy, which is significantly better than other non-hybrid baseline models. In the paper [46], a domain-specific pre-trained model is proposed: a self-ensemble SCIBERT (Scientific Bidirectional Encoder Representations from Transformers)-based model evaluated on the FakeHealth dataset, with a reported accuracy of 0.690 and precision of 0.720. In the paper [34], the authors present a BiLSTM-based model for fake news detection and report 0.960 accuracy. In the paper [30], the authors used the Longformer model on the FakeNewsNet dataset and reported an accuracy of 0.970. In the paper [43], the authors proposed deep learning-based methods with the MediaEval, GossipCop, and PolitiFact datasets and reported 0.920 accuracy.

Several crucial findings emerged from this review of past studies on automatic fake news detection. First, machine learning approaches were utilized in the majority of studies, which necessitated a great deal of feature engineering; moreover, because of the numerous privacy regulations on social media sites, collecting user information is not always viable. This research therefore investigates whether pre-trained models, rather than handcrafted features, may be utilized for word representation. Second, there are not many large datasets for detecting fake news. When it comes to learning algorithms, it is common knowledge that larger datasets tend to generalize well and yield more dependable conclusions, especially with deep learning methods. As a result, the current study examines whether larger datasets may be created by intelligently integrating publicly accessible, comparatively smaller datasets and then used in experiments to obtain more accurate and dependable results. Third, the majority of studies reported their findings in terms of only one or two assessment measures, such as accuracy, recall, or precision. When it comes to spotting fake news, accuracy may not be the only statistic to consider; recall, precision, and F-score are also essential, as we do not want to mistakenly label legitimate news as fake. This research therefore explores whether reporting results using all accessible metrics can offer a fairer evaluation of a fake news detection system. Table 1 shows recent work on machine learning and deep learning methods for detecting fake news.

Table 1 Recent work on fake news detection using machine learning and deep learning methods

To address the identified research points, we first employed a variety of machine learning and deep learning methodologies to tackle the first point. To address the second issue, we consolidated two publicly accessible datasets to have enough input data to train and validate the proposed system; with a large number of fake and real news samples, we anticipate the proposed system’s performance to be unaffected by data limitations. Finally, to handle the third point, we analyzed and reported the experimental results using a variety of assessment metrics, namely accuracy, precision, recall, F1-score, and AUC-ROC (area under the ROC curve), to evaluate the system as objectively as possible.
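For illustration, all five of these metrics can be computed with scikit-learn. The sketch below is a minimal example, assuming the ground-truth test labels and the model's predicted scores are already available; the variable names are illustrative and not taken from the original code.

```python
# Minimal sketch: computing the five reported evaluation metrics with
# scikit-learn. y_true and y_prob are illustrative placeholders for the
# ground-truth test labels and the model's predicted probabilities.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = [0, 1, 1, 0, 1]                   # 0 = fake, 1 = real
y_prob = [0.10, 0.92, 0.67, 0.41, 0.85]    # predicted probability of "real"
y_pred = [int(p >= 0.5) for p in y_prob]   # hard labels at a 0.5 threshold

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, y_prob))  # uses scores, not hard labels
```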

Methodology

Fake news detection may be posed as a binary classification problem: determining whether the information in an article is accurate or not. In this regard, the current research contributes a comprehensive comparative analysis of several models based on distinct approaches, demonstrating the utility of deep learning models in detecting fake news. The analysis concludes that a CNN- and BiLSTM-based hybrid approach improves the performance of fake news detection in comparison with recent state-of-the-art approaches. The overall experimental framework used in this work is shown in Fig. 6.

Dataset

In the machine and deep learning community, it is common knowledge that data is the most important aspect of any task. However, there is no single standard dataset that is large enough to be utilized for effectively and reliably detecting fake news, especially with deep learning methods. Therefore, the dataset employed in this study was derived from two sources: the fake and real news dataset [49] and allData [41]. The final dataset was created by merging these datasets using the pandas library’s concat function.

The fake and real news dataset [49] contains news headlines and descriptions labeled as “fake” or “real”. It contains 44,919 labeled news items, of which 21,417 are labeled as real news and 23,502 as fake news. allData [41] contains 20,015 labeled news items, with 11,941 fake and 8,074 real. The final consolidated dataset used for this study contains 64,934 labeled news items in three columns: News Headline, News Description, and Label (0 = fake, 1 = real). Approximately 54.6% of the news items in the consolidated dataset are labeled as fake news and 45.4% as real news. Of the complete dataset, 70% (45,454 items) were used for training, 15% (9,740 items) for validation, and 15% (9,740 items) for testing. The distribution of all three datasets, including the consolidated dataset, is shown in Fig. 5.
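As a minimal sketch, the merge and the 70/15/15 split can be performed as follows; the file names and the random seed are illustrative assumptions, not details from the original implementation.

```python
# Sketch: consolidating the two source datasets with pandas.concat and
# creating the 70/15/15 train/validation/test split. File paths and the
# random seed are assumed for illustration.
import pandas as pd
from sklearn.model_selection import train_test_split

fake_real = pd.read_csv("fake_and_real_news.csv")  # 44,919 items (assumed path)
all_data = pd.read_csv("allData.csv")              # 20,015 items (assumed path)

df = pd.concat([fake_real, all_data], ignore_index=True)  # 64,934 items

# 70% for training; the remaining 30% is split evenly into validation and test.
train_df, temp_df = train_test_split(df, test_size=0.30, random_state=42,
                                     stratify=df["Label"])
val_df, test_df = train_test_split(temp_df, test_size=0.50, random_state=42,
                                   stratify=temp_df["Label"])
print(len(train_df), len(val_df), len(test_df))  # ~45,454 / 9,740 / 9,740
```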

Fig. 5 Distribution of the datasets used in this work

Preprocessing

Preprocessing is the first step, performed to prepare the input data for feeding into the model. This stage incorporates cleaning the input data, removing irregularities in the raw data, normalizing the data, and handling similar issues so that the model can learn more effectively in the later stages of the pipeline.

Many portions of the news descriptions were found to be ineffective for detecting fake news. To reduce the number of distinct terms, the entire dataset was transformed to lowercase, and special characters, hashtags, and other such elements were eliminated. Stop words like “is”, “the”, and “are” were removed, since they are not useful for detecting fake news. Following this initial pre-processing, several feature extraction and word-embedding techniques were utilized to represent the input data in a way that the models can fully exploit. The following textual feature extraction techniques are used in this work (a minimal code sketch follows the list):

  1. BOW (bag of words): In this technique, words from sentences are tokenized and placed into bags or groups marking each token and its count.

  2. TF-IDF (term frequency–inverse document frequency): In this method, statements of tokenized words are converted into sparse matrices. The TF-IDF score is the product of the term frequency (TF) and the inverse document frequency (IDF), i.e., \(\text{TF-IDF}(t,d) = \text{TF}(t,d) \times \log(N/\text{DF}(t))\), where \(N\) is the number of documents and \(\text{DF}(t)\) is the number of documents containing term \(t\).
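A minimal sketch of the cleaning and feature extraction steps described above is given below, assuming NLTK stop words and scikit-learn vectorizers; the original implementation details are not specified in the text.

```python
# Sketch: lowercasing, removing special characters and stop words, then
# extracting BOW and TF-IDF features. NLTK and scikit-learn are assumed;
# the original code may use different tools.
import re
from nltk.corpus import stopwords  # requires nltk.download("stopwords")
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

STOP_WORDS = set(stopwords.words("english"))

def clean(text):
    text = text.lower()                    # reduce the number of distinct terms
    text = re.sub(r"[^a-z\s]", " ", text)  # drop special characters, hashtags, digits
    return " ".join(w for w in text.split() if w not in STOP_WORDS)

docs = [clean(d) for d in ["Breaking: PM visits flood-hit areas!!",
                           "Aliens built the pyramids #hoax"]]

bow = CountVectorizer().fit_transform(docs)    # token counts (bag of words)
tfidf = TfidfVectorizer().fit_transform(docs)  # sparse TF-IDF matrix
```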

In conjunction with the deep learning methods, the following word-embedding techniques are used in this work (an illustrative loading sketch follows the list):

  1. Word2Vec: It represents each distinct word with a particular vector. The vectors are chosen such that a simple mathematical function indicates the level of semantic similarity between the words they represent.

  2. GloVe (global vectors): It is a word vector representation method in which training is performed on aggregated global word–word co-occurrence statistics from the corpus.

  3. FastText: It is an extension of the Word2Vec model that represents each word as an n-gram of characters. This helps capture the meaning of shorter words and allows the embeddings to understand suffixes and prefixes.
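For illustration, pre-trained vectors for all three techniques can be loaded through gensim's downloader API, as sketched below; the specific pre-trained models named here are examples from the gensim-data repository and are not necessarily the ones used in this study.

```python
# Sketch: loading pre-trained Word2Vec, GloVe, and FastText vectors with
# gensim. The model names are illustrative examples, not necessarily the
# embeddings used in this study.
import gensim.downloader as api

w2v = api.load("word2vec-google-news-300")              # Word2Vec (300-d)
glove = api.load("glove-wiki-gigaword-200")             # GloVe (200-d)
fasttext = api.load("fasttext-wiki-news-subwords-300")  # FastText with subwords

# Semantic similarity between words via a simple vector operation.
print(w2v.most_similar("news", topn=3))
```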

Model, Training, and Evaluation

The best-performing model in this study exploits the CNN’s ability to extract spatial features as well as the ability of Long Short-Term Memory (LSTM) networks to learn long-term relationships. A 1D convolutional layer is used first to analyze the input vectors and extract features at the text level. The feature maps of the convolutional layer are then fed to the BiLSTM layer, which learns the long-term interdependence of local features in news descriptions to categorize them as real or fake.

Fig. 6 Experimental framework used in this work. NB stands for Naïve Bayes

Implementation Decisions

The models are implemented using the Google Colab Pro platform. Google Colab is a cloud environment for Jupyter notebooks that provides GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units) for intensive computing. The experiment code was written in Python.

Mapping Text to Vectors

The text is tokenized and then converted to vectors. A tokenizer is used to convert the pre-processed training corpus into integer sequences. The length of each vector (sequence) is set to 2000, and pre-padding is applied. The CNN is given pre-trained word embeddings as input to accomplish local feature extraction; an embedding matrix is created using the FastText, GloVe, and Word2Vec word embeddings. With the current dataset, Word2Vec outperforms the other two embeddings. The dimension of the embeddings was set to 200 in this study.
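A minimal sketch of this mapping is shown below, assuming the Keras preprocessing utilities and a 200-dimensional gensim KeyedVectors model `w2v`; the variable names are illustrative.

```python
# Sketch: tokenizing, pre-padding to length 2000, and building a 200-d
# embedding matrix from pre-trained vectors. train_texts (a list of cleaned
# strings) and w2v (a 200-d gensim KeyedVectors model) are assumed to exist.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN, EMB_DIM = 2000, 200

tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_texts)
X_train = pad_sequences(tokenizer.texts_to_sequences(train_texts),
                        maxlen=MAX_LEN, padding="pre")  # pre-padding, as stated

vocab_size = len(tokenizer.word_index) + 1
embedding_matrix = np.zeros((vocab_size, EMB_DIM))
for word, idx in tokenizer.word_index.items():
    if word in w2v:                       # unknown words keep zero vectors
        embedding_matrix[idx] = w2v[word]
```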

Proposed Model Details

The proposed model is implemented using the Keras API as a Sequential model composed of several layers. The embedding layer is the first layer of the network; this is the input layer through which the model is fed the training data, with the prepared embedding matrix providing the starting weights for the pre-trained word embeddings. To reduce the effect of overfitting, the next layer is a Dropout layer with a rate of 0.3. The third layer is a 1D convolutional layer (Conv1D) with 64 filters and a kernel size of 5 (the convolution is one-dimensional) to extract local features, with ReLU as the activation function. In the next layer, the feature maps are pooled (MaxPooling1D) with a window size of 4. The BiLSTM layer that follows receives the pooled feature maps and learns long-term dependencies of the input while keeping memory; the dimension of its output is set to 128. Next, another Dropout layer is added with a rate of 0.3. The final layer of the model is a Dense layer, where the vectors are classified as real or fake, with sigmoid as the activation function. Binary cross-entropy is used as the loss function and Adaptive Moment Estimation (Adam) as the optimizer. Training of the model is performed with a batch size of 64. The proposed model is depicted in Fig. 7 and Algorithm 1.
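A minimal Keras sketch of this architecture, under the stated hyper-parameters, is given below; `vocab_size` and `embedding_matrix` come from the previous step, and details not stated in the text (the epoch count and the reading of the 128-dimensional BiLSTM output as 64 units per direction) are assumptions.

```python
# Sketch of the described CNN-BiLSTM model in Keras. Hyper-parameters follow
# the text above; the epoch count and per-direction LSTM size are assumptions.
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import (Embedding, Dropout, Conv1D, MaxPooling1D,
                                     Bidirectional, LSTM, Dense)
from tensorflow.keras.models import Sequential

model = Sequential([
    Embedding(vocab_size, 200, input_length=2000,
              embeddings_initializer=Constant(embedding_matrix)),  # starting weights
    Dropout(0.3),                            # reduce overfitting
    Conv1D(64, 5, activation="relu"),        # 64 filters, kernel size 5
    MaxPooling1D(pool_size=4),               # pooling window of 4
    Bidirectional(LSTM(64)),                 # 64 units per direction -> 128-d output
    Dropout(0.3),
    Dense(1, activation="sigmoid"),          # real (1) vs. fake (0)
])

model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          batch_size=64, epochs=5)           # epoch count is an assumption
```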

Fig. 7 Architecture of the proposed hybrid deep learning model

Algorithm 1 Steps in the proposed method

Results and Analysis

This section describes the evaluation of the machine learning and deep learning methods along with the various feature engineering methods. Section “Machine Learning Methods” covers the evaluation of the machine learning methods, and Sect. “Deep Learning Methods” covers the evaluation of the deep learning methods.

Machine Learning Methods

To achieve results as accurate as possible, eight different machine learning models were tried in the fake news detection experiments. Among the various classification algorithms, SVM with a linear kernel was found to be the best, with an accuracy of 0.923. The details of these models, along with their accuracy, precision, recall, F1-score, and AUC-ROC, are given in Tables 2 and 3. The accuracy and F1-score of the machine learning methods are plotted in Fig. 8.

As per the results obtained, the Multinomial Naive Bayes algorithm with the TF-IDF feature performed the worst among all the classifiers. XgBoost performed best in terms of precision (0.950) with both BOW and TF-IDF features, while SVM with a linear kernel and the TF-IDF feature achieved the best accuracy (0.923), F1-score (0.917), and AUC-ROC (0.921), with a precision of 0.949. Logistic Regression performed better than the other methods in terms of recall (0.892), while SVM performed better on most other metrics. The recent advancements in deep learning motivated the authors to apply deep learning-based approaches for fake news detection, as discussed in Sect. “Deep Learning Methods”.
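As an illustration of the best-performing configuration (a linear-kernel SVM on TF-IDF features), a minimal scikit-learn pipeline is sketched below; the use of `LinearSVC`, the absence of further hyper-parameter tuning, and the data frames from the earlier split sketch are assumptions.

```python
# Sketch of the best machine learning configuration: a linear-kernel SVM on
# TF-IDF features. LinearSVC stands in for the linear-kernel SVM; train_df
# and test_df are assumed to come from the dataset split sketched earlier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(train_df["News Description"], train_df["Label"])

y_pred = clf.predict(test_df["News Description"])
print("Accuracy:", accuracy_score(test_df["Label"], y_pred))
```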

Table 2 Performance of different machine learning methods on the consolidated dataset using bag of words count vectorizer feature
Table 3 Performance of different machine learning methods on the consolidated dataset using TF-IDF Feature
Fig. 8 Graphs showing accuracy and F1-score of machine learning methods with different feature extraction techniques

Deep Learning Methods

Different vanilla and hybrid deep learning models were trained with the three word-embedding techniques mentioned above. The details of these models, along with their accuracy, precision, recall, F1-score, and AUC-ROC, are given in Tables 4, 5, and 6, and the accuracy and F1-score of the deep learning methods with different word embeddings are plotted in Fig. 9. Among all these combinations of deep learning models and word embeddings, the hybrid CNN-BiLSTM model with Word2Vec embedding outperforms all others, providing an accuracy of 0.975, precision of 0.984, recall of 0.970, F1-score of 0.977, and AUC-ROC of 0.992. In general, using BiLSTM tends to improve performance over LSTM; this is expected, as BiLSTM models the sequence in both directions. Furthermore, when combined with a CNN, BiLSTM further improves the results, indicating that the features identified by the CNN layer contribute to the final classification even after word embeddings have been used for word representation. Finally, FastText was expected to provide the best results among the word embeddings, as it can, to some extent, handle out-of-vocabulary words; however, this was not observed, and the reason remains to be explored.

Table 4 Performance of deep learning models on the consolidated dataset with Word2Vec embedding
Table 5 Performance of deep learning models on the consolidated dataset with GloVe embedding
Table 6 Performance of deep learning models on the consolidated dataset with FastText embedding
Fig. 9 Graphs showing accuracy and F1-score of deep learning methods with different word embeddings

Comparison with State of the Art

In this section, we compare the current study with other research already done on fake news detection. Since there is no standard dataset available, a completely fair comparison is not possible. Therefore, we attempt to compare the methodology and results of other researchers with the present work as far as practicable.

Machine Learning Methods

In this study, different machine learning models were utilized, among which SVM proved to be the best, yielding an accuracy of 0.923. The dataset was created by merging two publicly available datasets, Fake and real news and allData, containing a total of 64,934 data points. The comparison of the current study with state-of-the-art approaches, with respect to the machine learning methods, is shown in Table 7. The graphical comparison of accuracy and F1-score is illustrated in Fig. 10.

Table 7 Comparison of machine learning methods with state of the art
Fig. 10 Comparison of (a) accuracy & (b) F1-score of machine learning methods with state of the art

Deep Learning Methods

In this study, various vanilla and hybrid deep learning models were trained, among which the CNN-BiLSTM model with Word2Vec word embedding outperformed all others. It yielded an accuracy of 0.975, precision of 0.984, recall of 0.970, and F1-score of 0.977 on the consolidated dataset containing 64,934 sample points. The comparison of the current study with state-of-the-art approaches, with respect to the deep learning methods, is shown in Table 8, and the comparison of accuracy and F1-score is depicted in Fig. 11. It is important to note, however, that a completely fair comparison of the proposed approach with state-of-the-art methods is not possible, because there is no large, standard dataset available for the task; researchers tend to use whatever datasets are available to them or their own scraped datasets. Table 8 therefore lists studies that employed a reasonably large dataset, and among those, the proposed method provides the best performance.

Table 8 Comparison of proposed deep learning method with state of the art
Fig. 11 Comparison of (a) accuracy & (b) F1-score of proposed deep learning model with state of the art

Conclusion and Future Work

In this study, several machine learning and deep learning models were assessed for the task of fake news detection, and a deep learning-based hybrid model was proposed. The study used word embeddings such as Word2Vec, GloVe, and FastText along with models such as LSTM, BiLSTM, CNN-LSTM, and CNN-BiLSTM to arrive at the proposed model. In the experiments, all deep learning models outperformed the machine learning models, and BiLSTM models outperformed their unidirectional counterparts. In terms of accuracy (0.975), precision (0.984), recall (0.970), F1-score (0.977), and AUC-ROC (0.992), the proposed CNN-BiLSTM model with Word2Vec embedding surpassed all other models. Overall, the use of deep neural networks in this domain appears promising.

In the future, further experiments may be carried out using more recent word representation techniques to improve performance. Along with word embeddings, recent high-performing classifiers, such as transformer-based models, may be utilized to improve the separation of fake news from real news. On the product side, the proposed methodology may be utilized to create an effective fake news recognition application for various social media platforms.