1 Introduction

In an era characterized by the rapid dissemination of information through digital platforms, the proliferation of fake news has emerged as a critical concern. The term "fake news" refers to intentionally fabricated or misleading information presented as genuine news, often designed to deceive, manipulate, or exploit the audience's emotions and beliefs (Jain et al. 2022). The rampant spread of fake news has the potential to sway public opinion, influence decision-making processes, and even disrupt social and political landscapes (Khattar et al. 2019). Consequently, the development of effective methods for fake news detection has become an urgent necessity.

Recent years have witnessed significant advancements in the field of fake news detection, driven by a combination of technological innovations and growing awareness about the potential consequences of misinformation (Zhou et al. 2020). This dynamic landscape poses both opportunities and challenges, as the creators of fake news constantly evolve their strategies to bypass detection mechanisms. From sophisticated AI-generated articles to meticulously doctored images and videos (Orhan 2023), the arsenal of fake news has expanded, demanding more sophisticated and adaptable detection techniques.

Several instances of fake news are depicted in Fig. 1. These instances gained significant traction during the COVID-19 pandemic and the 2016 U.S. presidential election (Kaliyar et al. 2021a).

Fig. 1
figure 1

Illustrations of misleading information circulated across social media. (Kaliyar et al. 2021a)

This article delves into the latest trends and challenges surrounding fake news detection. It explores the evolving techniques employed by purveyors of misinformation and highlights the innovative strategies researchers and technologists have devised to counteract them. From natural language processing and machine learning algorithms to data mining and social network analysis, a multitude of approaches (Varghese et al. 2024) are being harnessed to differentiate between genuine news and deceptive content. However, amidst this progress, there remain formidable obstacles such as the lack of a universally agreed-upon definition of fake news, the ethical implications of content moderation, and the balance between freedom of expression and the need to curb misinformation.

As we navigate this intricate landscape, it becomes crucial to understand the technical aspects of fake news detection as well as the broader societal and psychological factors that contribute to its dissemination and impact (Roy et al. 2018). By shedding light on the latest developments, this article aims to contribute to the ongoing discourse on combating fake news and promoting media literacy in an increasingly information-saturated world. As technology and deception continue to intertwine, staying ahead in the battle against fake news requires vigilance, collaboration, and a multidimensional approach encompassing technology, psychology, and critical thinking (Bharadwaj and Shao 2019).

Social media has transformed into a platform for sharing information, ideas, and feelings across the globe. However, this convenience has also facilitated the spread of misinformation, which can be disseminated quickly, cheaply, and maliciously. Spreading false information is often used to damage the reputations of individuals, organizations, and even countries, highlighting the importance of identifying misleading information (Oshikawa et al. 2018). To address this issue, research is being done to create reliable and accurate algorithms that can automatically identify false information on social media. These automated applications are built using cutting-edge technologies such as data mining, machine learning, and natural language processing (NLP) (Shu et al. 2017).

1.1 Motivation and research objective

The field of fake news detection stands as a thriving research domain, drawing the keen interest of researchers worldwide. Significant scope for enhancement emerges within the realm of fake news detection, primarily due to the limited availability of context-specific news data for training purposes. The adoption of deep learning methodologies in fake news detection presents a distinctive advantage over conventional approaches, given their prowess in extracting advanced features from the data. These aforementioned challenges and opportunities serve as the driving force behind our endeavor to construct an efficient deep-learning model dedicated to the task of fake news detection.

1.2 Existing methodologies for the identification of fake news

Identifying fake news poses a significant challenge due to its deliberate intent to distort information. Preceding theories play a crucial role in directing investigations into counterfeit news detection, employing diverse classification models. Current insights into detecting fake news can be broadly grouped into two main categories: (i) Learning based on News Content, and (ii) Learning based on Social Context.

1.2.1 News content-based learning

News Content-based learning (Jain et al. 2022; Zhou et al. 2020; Dong et al. 2020; Sadeghi et al. 2022; Galli et al. 2022; Brașoveanu and Andonie 2021; Verma et al. 2021; Shishah 2021) involves analyzing the textual and linguistic characteristics of news articles to distinguish between genuine information and fake news. This approach hinges on the understanding that deceptive content often exhibits linguistic anomalies, sensationalism, or lacks credible sources. By examining the structural attributes, writing style, and vocabulary usage within articles, machine learning algorithms can be trained to uncover patterns indicative of falsified information.

Through the utilization of Natural Language Processing (NLP) techniques, such as sentiment analysis, text summarization (Galli et al. 2022; Brașoveanu and Andonie 2021; Reddy et al. 2020; Rani et al. 2022; Palani et al. 2022; Rai et al. 2022; Shan et al. 2021; Kaliyar et al. 2021b; Jarrahi and Safari 2023), and language models, this approach aims to identify inconsistencies, exaggerated claims, and linguistic markers commonly associated with misinformation. For instance, excessive use of emotional language, hyperbolic statements, or the absence of verifiable sources can raise red flags about the authenticity of the content.

Furthermore, this method is fortified by the accumulation of labeled datasets (Zhou et al. 2020; Palani et al. 2022; Rai et al. 2022; Jarrahi and Safari 2023) containing both genuine and fake news articles. Machine learning algorithms can then be trained on these datasets, allowing them to learn the nuanced distinctions between reliable and deceptive content. The process involves feature extraction, wherein relevant linguistic attributes are quantified, and classifiers are employed to differentiate between the two categories.
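To make the feature-extraction-and-classification step concrete, the following is a minimal, self-contained sketch, not taken from any of the surveyed systems: TF-IDF weights are computed over tokenized articles, and a nearest-centroid classifier (by cosine similarity) separates the two classes. The toy corpus, class names, and design choices are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

class TfidfCentroidClassifier:
    """Illustrative sketch: TF-IDF features + nearest-centroid classification."""

    def fit(self, docs, labels):
        self.n = len(docs)
        # Document frequency of each term across the training corpus.
        self.df = Counter(t for d in docs for t in set(d))
        centroids = defaultdict(lambda: defaultdict(float))
        counts = Counter(labels)
        for doc, label in zip(docs, labels):
            for term, w in self._vec(doc).items():
                centroids[label][term] += w / counts[label]
        self.centroids = {lab: dict(v) for lab, v in centroids.items()}
        return self

    def _vec(self, doc):
        # TF-IDF vector; terms unseen in training are dropped.
        tf = Counter(t for t in doc if t in self.df)
        return {t: (c / len(doc)) * math.log(self.n / self.df[t])
                for t, c in tf.items()}

    def predict(self, doc):
        v = self._vec(doc)
        return max(self.centroids, key=lambda lab: self._cos(v, self.centroids[lab]))

    @staticmethod
    def _cos(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0
```

In practice, libraries such as scikit-learn provide production-grade TF-IDF vectorizers and classifiers; the sketch above only illustrates the mechanics.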

While News Content-based learning (Kaliyar et al. 2021a, 2021b; Sadeghi et al. 2022; Verma et al. 2021; Souza et al. 2022; Galende et al. 2022; Nassif et al. 2022; Mughaid et al. 2022; Mohapatra et al. 2022) holds promise in detecting fake news, it is not without limitations. The constantly evolving nature of deceptive strategies demands ongoing updates to the algorithms. Additionally, this approach might struggle with subtle instances of misinformation that do not overtly deviate in language use or style. Balancing the need for accurate detection with potential false positives remains a challenge, as certain linguistic features might be shared between genuine news and well-crafted fake stories.

In essence, News Content-based learning (Li et al. 2021, 2020; Ying et al. 2021; Ma et al. 2015; Wang et al. 2021; Liu and Wu 2020) forms a pivotal part of the arsenal against fake news, leveraging linguistic and textual cues to unravel the threads of deception woven within the fabric of information. Its integration with other approaches, such as Social Context-based learning, holds the potential to enhance the accuracy and robustness of fake news detection systems.

1.2.2 Social context-based learning

Social Context-based learning (Li et al. 2021; Ying et al. 2021; Ma et al. 2015; Wang et al. 2021) involves analyzing the social interactions and dynamics surrounding news articles to assess their credibility and authenticity. This approach recognizes that the dissemination and reception of news are deeply intertwined with the social ecosystem in which they exist. By examining factors such as user engagement, sharing patterns, and the credibility of sources, this method aims to uncover signals that can help distinguish between genuine news and fake information.

One of the key components of Social Context-based learning (Kaliyar et al. 2021a, 2021b; Sadeghi et al. 2022; Verma et al. 2021; Souza et al. 2022; Galende et al. 2022; Nassif et al. 2022; Mughaid et al. 2022; Mohapatra et al. 2022) is the analysis of the propagation patterns of news articles across social media platforms. The rapid sharing of fake news often leads to its viral spread, driven by emotional responses and confirmation bias. By tracking the velocity and volume of shares, likes, comments, and retweets, algorithms can identify articles that are gaining traction unusually quickly or within specific echo chambers.
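The velocity-and-volume signal described above can be sketched as a sliding-window counter over share timestamps. This is an illustrative sketch rather than a method from the cited works; the window size and threshold are assumed values that a real system would tune.

```python
from datetime import datetime, timedelta

def share_velocity_flags(share_times, window=timedelta(hours=1), threshold=100):
    """Flag windows in which the number of shares exceeds `threshold`.

    `share_times` is a chronologically sorted list of datetimes at which an
    article was shared. A burst of shares inside one sliding window is a
    crude signal of unusually fast (possibly coordinated) propagation.
    Returns a list of (window_start, window_end, count) tuples.
    """
    flags = []
    start = 0
    for end, t in enumerate(share_times):
        # Shrink the window from the left until it spans at most `window`.
        while t - share_times[start] > window:
            start += 1
        count = end - start + 1
        if count > threshold:
            flags.append((share_times[start], t, count))
    return flags
```

A production pipeline would combine such burst flags with engagement-graph features rather than use them in isolation.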

Furthermore, the credibility of the sources sharing the news plays a critical role. Social Context-based learning (Dong et al. 2020; Ying et al. 2021) involves assessing the authority and authenticity of the accounts sharing the information. Accounts with a history of sharing trustworthy content and a diverse range of sources are more likely to share accurate news. Conversely, accounts that predominantly share sensational or misleading information might raise suspicions.

Contextual analysis also contributes to this approach. Understanding the broader context in which a news article is shared, including the events and conversations surrounding it, can provide insights into its accuracy (Devlin et al. 2018; Reis et al. 2019; Pérez-Rosas et al. 2017). Additionally, detecting inconsistencies between a news article and verifiable facts can help identify potential misinformation.

Social Context-based learning is enhanced through the utilization of network analysis, sentiment analysis, and machine learning methodologies (Liu and Wu 2020; Long et al. 2017; Ozbay and Alatas 2021). By modeling the complex relationships between users, content, and interactions, algorithms can learn to differentiate between genuine news and fake news based on social dynamics. However, challenges exist in this approach as well (Liu and Wu 2020; Li et al. 2020). Misinformation campaigns can manipulate social dynamics, employing tactics to artificially inflate engagement metrics. Moreover, relying solely on social context might not catch sophisticated fake news stories that avoid triggering suspicious patterns.

In conclusion, Social Context-based learning complements (Kaliyar et al. 2021a, 2021b; Nassif et al. 2022; Mughaid et al. 2022; Mohapatra et al. 2022) other fake news detection strategies by tapping into the intricate web of social interactions and human behaviors. Its ability to uncover anomalies in sharing patterns and evaluate the credibility of sources offers a valuable perspective in the ongoing battle against the spread of fake news. When integrated with News Content-based learning and other approaches, it contributes to a more comprehensive and effective fake news detection framework.

1.2.3 Hybrid models

Hybrid models combine both content-based and context-based approaches to leverage the strengths of each methodology (Comito et al. 2023). These models can provide a more comprehensive analysis by considering both the textual content and the contextual information.

2 Recent advancements

  1. Content and social context fusion: a hybrid model that fuses content-based features with social context features (Orhan 2023). This model uses a multimodal neural network that simultaneously processes textual content using BERT and social context using GNNs (Galende et al. 2022). The fusion layer combines these features to improve detection accuracy, particularly in cases where either content or context alone is insufficient.

  2. Multi-view learning: a multi-view learning framework that incorporates multiple perspectives, including content (Galli et al. 2022), user behavior, and propagation patterns. By using attention mechanisms to weigh the importance of each view dynamically, the model can adapt to different types of fake news scenarios, enhancing its robustness.
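The dynamic view weighting used by such multi-view frameworks can be illustrated with a softmax attention over per-view scores. This is a sketch for illustration only, not code from the cited framework; the view names, scores, and attention logits are assumptions (in a real model the logits would be produced by a learned network).

```python
import math

def attention_fuse(view_scores, view_logits):
    """Fuse per-view fake-news scores with softmax attention weights.

    `view_scores` maps each view (e.g. content, user behavior, propagation)
    to that view's estimated probability that the item is fake;
    `view_logits` holds a relevance score per view. Returns the fused
    probability and the normalized attention weights.
    """
    views = list(view_scores)
    # Numerically stable softmax over the logits.
    m = max(view_logits[v] for v in views)
    exps = {v: math.exp(view_logits[v] - m) for v in views}
    z = sum(exps.values())
    weights = {v: exps[v] / z for v in views}
    fused = sum(weights[v] * view_scores[v] for v in views)
    return fused, weights
```

With equal logits the fusion reduces to a plain average; raising one view's logit shifts the verdict toward that view's score, which is exactly the adaptivity the framework exploits.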

These models represent cutting-edge techniques in fake news detection, combining advancements in NLP and network analysis to address the complex challenge of identifying fake news on social media platforms (Khalil et al. 2024). Integrating these recent research findings provides a comprehensive overview (TS and Sreeja 2024) of the current state of the art in this field.

2.1 Characteristics of fake news detection

Fake news detection is a critical area of research, focusing on identifying false or misleading information disseminated through various media channels, especially social media (Rastogi and Bansal 2023). This task involves several distinct characteristics that researchers aim to address. First, fake news often exhibits sensationalist language and exaggerated claims intended to elicit strong emotional reactions from readers, making it important for detection systems to analyze linguistic features. Second, the context in which the news appears is crucial; understanding the source, author, and the spread pattern on social networks helps in evaluating the credibility of the information. Third, fake news frequently leverages multimedia elements like images and videos, requiring advanced detection systems to integrate textual analysis with image and video verification (Athira et al. 2023). Fourth, the temporal aspect is significant, as fake news can spread rapidly, necessitating real-time or near-real-time detection capabilities. Additionally, adversarial techniques (Akdag and Cicekli 2024) are used to bypass detection mechanisms, highlighting the need for robust, adaptive models that can counteract these efforts. Hybrid models, which combine content-based and context-based features, are emerging as effective solutions, offering improved accuracy and resilience against sophisticated fake news tactics (Ozbay and Alatas 2021; Shu et al. 2019). Ultimately, the goal is to develop comprehensive systems that can accurately identify fake news, mitigate its spread, and enhance public trust in information.

2.2 Supervised fake news detection

Supervised fake news detection involves training models using labeled datasets where each news item is pre-annotated as either fake or real. This method relies heavily on the availability of large, accurately labeled datasets, which serve as ground truth for the learning algorithms. Supervised models, such as Support Vector Machines (SVM), Convolutional Neural Networks (CNN), and Long Short-Term Memory networks (LSTM), leverage these datasets (Athira et al. 2023) to learn the distinguishing features of fake news, including linguistic patterns, semantic content, and contextual cues. The effectiveness of supervised learning (Kumar and Taylor 2024) is often high when the training data is comprehensive and representative of the variations in news content. However, obtaining such high-quality labeled data is challenging, time-consuming, and expensive, which limits the scalability of supervised approaches.

2.3 Weakly supervised fake news detection

On the other hand, weakly supervised fake news detection aims to alleviate the dependency on extensive labeled datasets by utilizing partially labeled or noisy data. This approach (Rastogi and Bansal 2023) leverages various strategies, such as semi-supervised learning, where a small amount of labeled data is used in conjunction with a larger pool of unlabeled data, and transfer learning, where models pretrained on related tasks are fine-tuned on the target task. Weakly supervised methods also include distant supervision, where external knowledge sources like fact-checking websites provide weak labels. These approaches enable models to learn effectively from less-than-perfect data, making them more adaptable and scalable. Weakly supervised models (Akdag and Cicekli 2024) are particularly useful in dynamic environments like social media, where new and diverse fake news content emerges rapidly. Despite their flexibility, these models (Zhao et al. 2023) may struggle with accuracy and reliability compared to fully supervised models, necessitating ongoing refinement and validation.
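The semi-supervised strategy above can be sketched as a self-training loop: fit on the labeled pool, promote confidently predicted unlabeled items as pseudo-labels, and repeat. Everything here is an illustrative assumption, including the toy word-vote base classifier and the confidence threshold; real systems would use a proper probabilistic model.

```python
from collections import Counter

class OverlapClassifier:
    """Toy base model: votes by which label each known word was seen with."""

    def fit(self, docs, labels):
        self.words = {}
        for doc, lab in zip(docs, labels):
            for w in doc:
                self.words.setdefault(w, Counter())[lab] += 1
        return self

    def predict_proba(self, doc):
        votes = Counter()
        for w in doc:
            if w in self.words:
                votes.update(self.words[w])
        if not votes:
            return None, 0.0
        label, count = votes.most_common(1)[0]
        return label, count / sum(votes.values())

def self_train(clf, labeled, unlabeled, confidence=0.9, rounds=3):
    """Self-training: repeatedly fit, then promote confident pseudo-labels."""
    pool = list(labeled)          # (features, label) pairs
    remaining = list(unlabeled)   # features only
    for _ in range(rounds):
        clf.fit([x for x, _ in pool], [y for _, y in pool])
        still = []
        for x in remaining:
            label, prob = clf.predict_proba(x)
            if prob >= confidence:
                pool.append((x, label))   # accept confident pseudo-label
            else:
                still.append(x)           # keep for a later round
        if len(still) == len(remaining):  # no progress; stop early
            break
        remaining = still
    return clf, pool
```

The early-stopping check matters: without it, rounds in which nothing clears the confidence threshold would simply refit the same model.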

3 Related work

Our study sought to bridge the knowledge gap in the area by offering a comprehensive review of the current methods for detecting fake news and promoting multidisciplinary research collaboration. The primary goal of this paper is to provide an overview of the current state of research on the topic.

To achieve this goal, we conducted a thorough review of various solutions that are currently being used to detect fake news. We analyzed the use of machine learning models, network propagation models, and fact-checking methodologies for detecting fake news. In particular, our study focused on how researchers develop and use machine learning models to identify and classify fake news, as well as the tools they employ for this purpose. Furthermore, we also discussed the research challenges that are still open in this field.

The paper (Ahmed et al. 2018) presents a novel n-gram model that automatically detects false information, with a particular focus on fake news and misleading reviews. The study employs two distinct attribute abstraction methods and six different machine-learning classification algorithms. Preprocessing involves removing stop words and stemming keywords to identify misleading information effectively. The classifier is trained using two feature extraction techniques, TF and TF-IDF, in the final classification stage. The research evaluates six machine learning algorithms: SGD, SVM (Sadeghi et al. 2022; Verma et al. 2021; Reddy et al. 2020; Palani et al. 2022; Shan et al. 2021), LSVM (Kaliyar et al. 2021a; Shishah 2021; Rai et al. 2022), KNN (Sadeghi et al. 2022; Verma et al. 2021), LR (Sadeghi et al. 2022; Rani et al. 2022), and DT (Kaliyar et al. 2021a; Verma et al. 2021).
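The n-gram feature-extraction stage of such a model can be sketched as follows. This is an illustrative reimplementation under stated assumptions (word-level n-grams, term-frequency normalization), not the authors' code.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of word n-grams in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_features(tokens, max_n=2):
    """Unigram-to-`max_n`-gram term-frequency features for one document.

    Each n-gram is mapped to its frequency relative to the total number
    of extracted n-grams, giving the TF representation that a TF-IDF
    stage would then reweight by inverse document frequency.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}
```

Stop-word removal and stemming, as in the paper, would be applied to `tokens` before this step.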

The paper examines (Hirlekar and Kumar 2020) the current state of research on fake news and proposes theoretical and practical approaches to categorize and intervene in this problem. Text mining is one such machine learning approach used to detect fake news, hoaxes, and misinformation. The study proposes a complex classification scheme, including neural networks that use traditional classification procedures. The report recommends identifying fake news using essential text qualities that can be produced independently of platform and language (Faustini and Covoes 2020).

The study evaluates five datasets, comprising articles and social media posts in three distinct language categories, against standard benchmarks and reports favorable outcomes. It also examines how training factors affect other common natural language processing representations such as Word2Vec and bag-of-words.

The study proposes a hybrid attention LSTM model and uses the LIAR dataset (Wang 2017; Jain et al. 2022; Dong et al. 2020; Sadeghi et al. 2022; Galli et al. 2022) from PolitiFact, which provides subject, text, and speaker profiles for 12,836 news items from 3,341 speakers. The results show that the model outperforms recent reference dataset-based models by 14.5%. This demonstrates the importance of the speaker's profile in determining news trustworthiness (Galli et al. 2022).

The paper begins with an examination of the need for automatic fake news detection, comparing and discussing various techniques' findings on the most critical new standard datasets. The research focuses on the LIAR (Jain et al. 2022; Dong et al. 2020; Sadeghi et al. 2022; Galli et al. 2022; Brașoveanu and Andonie 2021; Comito et al. 2023), FEVER, and FAKENEWSNET (Sadeghi et al. 2022; Verma et al. 2021; Souza et al. 2022; Galende et al. 2022; Nassif et al. 2022; Mughaid et al. 2022) datasets, with the LIAR dataset showing excellent accuracy with LSTM and attention-LSTM-based models. The FEVER dataset also uses an attention-based LSTM model (Kaliyar et al. 2021a; Shishah 2021; Rai et al. 2022) to achieve outstanding accuracy, and the GCN-based model using the FAKENEWSNET dataset achieves complete accuracy (Oshikawa et al. 2018).

Research on how to identify fake news has been ongoing, and numerous algorithms have been created to do so. To detect fake news, researchers have used a variety of models, including convolutional neural networks, long short-term memory networks, and bidirectional LSTM (Comito et al. 2023; Li et al. 2022). They obtained word vector representations using GloVe, an unsupervised machine learning approach, and then modeled a deep neural network using CNN and max-pooling. Bi-LSTM was then applied to mitigate the vanishing-gradient problem and capture long-term dependencies (Sadeghi et al. 2022; Rani et al. 2022; Mughaid et al. 2022). The attention mechanism, which has been effective in tasks such as machine translation and image captioning, was used, and a dropout layer was added in the final phase to prevent overfitting. The researchers achieved a 71.2% accuracy rate on the test dataset (Rani et al. 2022).

In another study, the authors categorized the critical features for each task and investigated the use of multiple supervised learning classifiers, such as KNN (Sadeghi et al. 2022; Verma et al. 2021; Reddy et al. 2020), NB (Sadeghi et al. 2022; Verma et al. 2021; Reddy et al. 2020; Rani et al. 2022; Palani et al. 2022), RF, SVM (Sadeghi et al. 2022; Verma et al. 2021; Reddy et al. 2020; Palani et al. 2022; Shan et al. 2021), and XGB, reporting the accuracy and F1 score obtained by each classifier. RF and XGB outperformed the other classifiers, and it was found that distinguishing fake from genuine articles on a large, recently released, and fully labeled dataset was difficult. They also discussed how supervised learning models could help fact-checkers analyze digital data and reach solid conclusions (Verma et al. 2021).

To detect fake news, another study employed semantic characteristics and text mining, comparing an RNN to Naive Bayes and random forest classifiers using different groups of linguistic features. Random forest outperformed Naive Bayes across trials with different feature sets, achieving 95.66% accuracy. The researchers used a Kaggle real-or-fake news dataset for their experiments (Bharadwaj and Shao 2019).

In yet another study, the authors proposed a Multi-source, Multi-class Fake News Detection system that combines convolutional neural networks to analyze the local structure of every word in a statement, LSTM (Kaliyar et al. 2021a; Shishah 2021; Rai et al. 2022) to analyze temporal relationships across the text, and an integrated network to concatenate the last hidden outputs. This technique combines the best characteristics of both architectures, as LSTM performs better on longer sentences (Karimi et al. 2018).

The authors of another study proposed a novel method for detecting fake news at the knowledge element (KE) level, which entails representing the claims in a news item as a multimedia knowledge graph and recognizing the misleading elements in the form of KEs, for a high degree of explainability. They developed a logically structured approach called InfoSurgeon for detecting disinformation in news articles that draws on source context, semantic representation, multimedia information components, and prior knowledge. They also proposed a new benchmark for identifying fake news at the KE level via a silver-standard annotation dataset (15,000 multimedia article pairs) generated automatically using KG-influenced natural language generation (Fung et al. 2021).

The authors present tools and models that aim to address the challenge of detecting fake news and supporting their study. They created two new datasets, spanning seven different fields, using a combination of human and crowdsourced annotations and data directly obtained from the internet (Monti et al. 2019; Hu et al. 2022). Exploratory tests were conducted using these datasets to identify linguistic features that could potentially be indicative of fraudulent content. Based on these features, the authors developed fake news detectors that achieved up to 78% precision. To provide context for their findings, they compared the efficacy of their detection systems to an objective human baseline (Pérez-Rosas et al. 2017).

Table 1 is an indispensable resource for researchers and practitioners in the field of data analysis and research. This table meticulously catalogs a wide range of datasets, serving as a comprehensive reference guide. Whether one is delving into machine learning, statistical analysis, or any data-driven investigation, Table 1 simplifies the process of dataset selection and comparison. It embodies the foundational principle that robust research relies on the quality and relevance of data, making it an invaluable asset in the pursuit of knowledge and innovation across various domains.

Table 1 Model references

Table 2 serves as a valuable resource for gaining insights into the extensive body of research dedicated to understanding and addressing issues related to bots, clickbaits, rumors, and the analysis of content and context in digital communication. This table presents a consolidated view of the diverse studies, methodologies, and findings within this domain, offering a comprehensive snapshot of the collective efforts made by researchers and scholars. It not only highlights the breadth and depth of research but also provides a convenient reference point for those seeking to explore specific topics within this multifaceted field. In an era marked by the rapid dissemination of information through digital channels, Table 3 plays a pivotal role in promoting informed decision-making and fostering a deeper understanding of the complexities surrounding online content and its impact on society (Fig. 2).

Table 2 Dataset references
Table 3 Comprehensive overview of research utilized in bots, clickbaits, rumors, content, and context analysis
Fig. 2
figure 2

Techniques Utilized for Detecting False Information

4 Different types of fake news detection models

4.1 Machine learning technique

Initially, machine learning algorithms were developed to detect fake news due to the belief that it is created for financial and political gain (Faustini and Covoes 2020). Since fake news often includes persuasive and argumentative language, the retrieval of written text and linguistic elements is required for machine learning. The author utilized the Naive Bayes classifier to recognize linguistic features such as vocabulary, word count, length, and grammatical style, including text summarization and characterization (Oliveira et al. 2020). However, some fake news categories, such as clickbait articles, have high click-through rates due to their alluring nature, which cannot be detected by this technology.
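The linguistic features mentioned above (vocabulary, word count, length, and stylistic markers) can be extracted with a few lines of code. The feature set below is an illustrative assumption of the kind of surface cues such classifiers consume, not the exact features used in the cited work.

```python
import re

def linguistic_features(text):
    """Surface linguistic cues of the kind fed to simple fake-news classifiers."""
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "vocabulary_size": len({w.lower().strip(".,!?") for w in words}),
        "avg_word_length": sum(len(w.strip(".,!?")) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        # Sensationalist style cues: exclamation marks and shouting-case words.
        "exclamation_ratio": text.count("!") / len(sentences),
        "uppercase_word_ratio": sum(w.isupper() for w in words) / len(words),
    }
```

A classifier such as Naive Bayes would then be trained on these feature vectors rather than on raw text.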

The authors suggest a machine learning model as a solution, employing gradient-boosted decision trees to identify fake news effectively, resulting in high classification accuracy. They have pinpointed the significance of a casual tone in the creation of clickbait articles (Elhadad et al. 2020).

Additionally, the author has devised a machine learning model capable of discerning and forecasting whether an article qualifies as clickbait, leveraging features such as URL, content, and title. They used Yahoo aggregate data to construct a training set of 1349 clickbait URLs and a testing set of 2724 non-clickbait URLs. The author categorized eight types of clickbait, such as exaggeration, tease, confusion, provocation, formatting, bait-and-switch, graphic, and wrong, to identify spam mail and websites (Faustini and Covoes 2020; Silva et al. 2020) (Fig. 3).

Fig. 3
figure 3

Machine learning architecture of fake information detection. (Meel and Vishwakarma 2020)

4.2 Natural language processing technique

Natural language processing (NLP) has become a valuable tool in detecting fraud through a variety of techniques, including grammatical and syntactic analysis, correlation, clustering, and boolean text classification, which labels news as true or false. When detection is challenging, a third category may be introduced to differentiate between tentatively real and tentatively fake cases. Utilizing the text segmentation method along with the Natural Language Toolkit, a sentiment score is then calculated by examining carefully selected and structured text for indications of fraudulent activity. In NLP, features like text quality and context are essential for accurate detection (Liu and Wu 2020). The Stanford parser, a lexical and syntactic analyzer, is reported to produce reliable results. Empirical studies (Silva et al. 2020) have shown that NLP is more effective than social authentication. The main aim is to identify syntactic and verbal cues that reveal linguistic disparities between individuals who tell lies and those who tell the truth.
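A lexicon-based sentiment score of the kind computed in this pipeline can be sketched as follows. In practice one would use a full lexicon such as NLTK's VADER; the tiny word lists here are placeholder assumptions, and a strongly polarized score is only one weak signal of sensationalist content.

```python
# Placeholder lexicons; a real system would load a curated sentiment lexicon.
POSITIVE = {"good", "great", "true", "verified", "reliable"}
NEGATIVE = {"shocking", "outrageous", "fake", "scandal", "disaster"}

def sentiment_score(tokens):
    """Crude lexicon-based sentiment in [-1, 1] for a tokenized text.

    +1 means purely positive vocabulary, -1 purely negative, 0 neutral
    (or no lexicon words present).
    """
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)
```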

4.3 Deep learning technique

Detecting different types of fake news in the context of deep learning involves the application of various machine-learning techniques to combat the proliferation of misinformation in today's digital age. The multifaceted nature of fake news necessitates a diverse range of approaches. Text-based fake news detection leverages recurrent neural networks (RNNs) and convolutional neural networks (CNNs) to scrutinize linguistic patterns, sentiment, and contextual cues in textual content. These models analyze the language used in news articles, social media posts, and other textual sources to identify deceptive narratives. Image-based fake news detection, on the other hand, harnesses deep convolutional neural networks (CNNs) to scrutinize visual elements within images. These models can uncover alterations, forgeries, or inconsistencies in visual content, a vital component of debunking photo-manipulated stories. Audio-based fake news detection delves into audio files using models like CNNs (Liu and Wu 2020) and RNNs to identify voice impersonations, audio tampering, or anomalies in spoken content. Combining text, images, and audio, multi-modal fake news detection employs advanced architectures such as Transformer-based models and multi-modal neural networks (MNNs) (Karimi et al. 2018) to fuse and analyze data from various media sources. These models provide a more comprehensive evaluation of information credibility by considering the collective impact of multiple modalities. Overall, the use of deep learning in these various modes empowers researchers and developers to stay ahead in the battle against fake news (Bharadwaj and Shao 2019), offering a multi-pronged approach to safeguard the accuracy and integrity of information in an increasingly digital and interconnected world (Fig. 4).
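The simplest way the per-modality models above can be combined is late (decision-level) fusion: each modality model emits its own fake-probability, and the verdict is a weighted average. This is an illustrative sketch, not a specific published architecture; modality names and weights are assumptions (learned fusion layers, as in the attention-based models above, are the stronger alternative).

```python
def late_fusion(modality_scores, weights=None):
    """Decision-level fusion of per-modality fake-probability scores.

    `modality_scores` maps each modality (e.g. text, image, audio) to the
    probability its model assigns to the item being fake; `weights`
    optionally expresses how much each modality is trusted. Returns the
    weighted average in [0, 1].
    """
    if weights is None:
        weights = {m: 1.0 for m in modality_scores}  # equal trust by default
    z = sum(weights[m] for m in modality_scores)
    return sum(weights[m] * s for m, s in modality_scores.items()) / z
```

A threshold (e.g. 0.5) on the fused score then yields the final fake/real decision.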

Fig. 4

Deep learning architecture of fake information detection. (Meel and Vishwakarma 2020)

Tables 4 and 5 offer a comprehensive overview of ensemble-based machine/deep learning methods that have achieved evaluation metric scores surpassing 90%. The highlighted cells spotlight the researchers’ top accomplishments in these tables.

Table 4 Metrics assessing performance for the most effective approach in the previously cited research using ML approaches
Table 5 Metrics assessing performance for the most effective approach in the previously cited research using DL approaches

5 Dataset

BuzzFeed (Santia and Williams 2018) comprises a full sample of news published on Facebook by 9 news agencies over September 19 to 23 and September 26 and 27, the week preceding the 2016 U.S. Election (Hu et al. 2022). Each post, alongside its associated articles, was validated individually by five BuzzFeed journalists. The dataset encompasses a total of 1627 articles: 826 mainstream, 356 left-wing, and 545 right-wing.

LIAR (Wang 2017) encompasses a collection of 12,836 real-world statements curated from PolitiFact. Each item is categorized on a six-grade truthfulness scale (Wang 2017; Hu et al. 2022): true, mostly-true, half-true, barely-true, false, and pants-on-fire. Additionally, the dataset provides supplementary details about the subjects covered, political affiliations, contextual information, and the speakers of the statements.

Weibo21 (Li et al. 2021) is a multi-domain fabricated-news dataset in the Chinese language; each entry is accompanied by an annotated domain label. The dataset encompasses fabricated as well as authentic news articles sourced from Sina Weibo, spanning the period from December 2014 to March 2021. Its fabricated content comprises news articles officially identified as misinformation by the Weibo Community Management Center.

PolitiFact (Zhou et al. 2020) is a renowned nonprofit fact-checking website that operates within the United States and specializes in evaluating political statements and reports. The dataset from PolitiFact encompasses news articles published between May 2002 and July 2018. Verified by domain experts, the dataset includes definitive labels (false or true) assigned to the news content. The content in the PolitiFact dataset primarily consists of statements or news articles disseminated by political figures (such as Congress members, White House staff, and lobbyists) and political groups, all of which have undergone thorough fact-checking by PolitiFact.

GossipCop (Zhou et al. 2020) operates as a fact-checking platform; its dataset pertains to news articles released between July 2000 and December 2018. Domain experts meticulously assign definitive labels to the news content, upholding the accuracy and reliability of the news tags.

FakeNewsNet (Verma et al. 2021) incorporates data sourced from the fact-checking platforms BuzzFeed and PolitiFact and encompasses news articles along with associated user details and retweet information, for a combined total of 23,196 news articles and 69,733 retweets.

PHEME (Zubiaga et al. 2017) is a collection of tweets from the Twitter platform, gathered from five distinct breaking-news sources, each contributing a set of tweets. Each tweet within the dataset comprises both textual content and accompanying images.

We provide a summary of the publicly available datasets utilized, as depicted in Table 6. These datasets comprise data sourced from platforms such as Sina Weibo and Twitter, along with datasets derived from fact-checking efforts, such as BuzzFeed, LIAR, and FakeNewsNet.

Table 6 Overview of fake news detection datasets

Table 7 presents a comprehensive overview of the results obtained from various associated models employed in the detection of fake news. This table offers valuable insights into the performance and effectiveness of different approaches and methodologies used to tackle the challenging task of identifying deceptive or misleading information in news sources. By summarizing the results from these associated models, Table 7 serves as a valuable reference for researchers, policymakers, and practitioners striving to enhance our understanding of fake news detection and develop more robust solutions to address this pressing issue in today’s information landscape (Tables 8, 9, 10, 11, 12 and 13).

Table 7 Overview of the results from associated models for detecting fake news
Table 8 Performance of various models on the LIAR dataset
Table 9 Performance of various models on the BuzzFeed dataset
Table 10 Performance of various models on the Politifact dataset
Table 11 Performance of various models on the GossipCop dataset
Table 12 Performance of various models on the Pheme dataset
Table 13 Performance of various models on the Weibo dataset

6 Performance measure formula

The analysis of all the gathered articles reveals that, in every case, one or more of the performance metrics depicted in Table 14 have been employed to assess the simulation outcomes. These metrics serve as indicators of a method’s detection capability. Table 14 also includes the formula for each performance metric.

Table 14 Performance measures for our deceptive-information detection system

As indicated in the presented Table 14, a comprehensive evaluation of classifier performance is conducted across multiple dimensions, including Accuracy, Error Rate, Precision, Sensitivity, F1-Score, Specificity, Area Under the Curve, Geometric Mean, Miss Rate, False Discovery Rate, and Fall-Out Rate.

Within Table 14, TP (true positive) and TN (true negative) denote the counts of correctly classified positive and negative instances, respectively, while FP (false positive) denotes the count of negative instances incorrectly classified as positive and FN (false negative) the count of positive instances incorrectly classified as negative.
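The standard confusion-matrix definitions above can be computed directly; the sketch below implements the common Table 14 metrics from the four counts (the example counts are illustrative):

```python
def metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics of the kind listed in Table 14."""
    total = tp + tn + fp + fn
    accuracy    = (tp + tn) / total
    error_rate  = (fp + fn) / total
    precision   = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # recall / true-positive rate
    specificity = tn / (tn + fp)
    f1          = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean      = (sensitivity * specificity) ** 0.5
    miss_rate   = fn / (fn + tp)          # 1 - sensitivity
    fdr         = fp / (fp + tp)          # false discovery rate
    fall_out    = fp / (fp + tn)          # 1 - specificity
    return {"accuracy": accuracy, "error_rate": error_rate,
            "precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1, "g_mean": g_mean,
            "miss_rate": miss_rate, "fdr": fdr, "fall_out": fall_out}

# Illustrative counts: 90 TP, 80 TN, 20 FP, 10 FN.
m = metrics(tp=90, tn=80, fp=20, fn=10)
print(round(m["accuracy"], 2))   # 0.85
print(round(m["precision"], 2))  # 0.82
print(round(m["f1"], 2))         # 0.86
```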

The experimental outcomes stemming from the constructed models were rigorously assessed utilizing all the metrics enumerated in Table 14. The intention was to gauge the performance of distinct detection models from various vantage points rather than relying solely on a singular perspective.

7 Open issues and future research

Fake news has emerged as a significant challenge in the modern information landscape, fueled by the rapid dissemination of information through digital platforms and social media. While substantial progress has been made in developing fake news detection techniques, several open issues and avenues for future research remain to enhance the effectiveness and robustness of these methods.

7.1 Adversarial attacks

One of the pressing challenges in fake news detection is the development of techniques to counter adversarial attacks. Adversaries can manipulate text to bypass detection models, making them vulnerable to subtle alterations. Future research should focus on creating models that are more resistant to such adversarial perturbations by integrating techniques from the field of adversarial machine learning.
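A simple way to probe this vulnerability is to measure label stability under small text perturbations. The sketch below uses a toy character-swap perturbation and a hypothetical keyword classifier as stand-ins; real adversarial attacks and detectors are far more sophisticated:

```python
import random

def perturb(text, rate=0.2, seed=0):
    """Toy adversarial edit: randomly swap adjacent letters."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def robustness(classifier, texts, n_variants=5):
    """Fraction of texts whose label survives all perturbed variants."""
    stable = 0
    for t in texts:
        base = classifier(t)
        if all(classifier(perturb(t, seed=s)) == base for s in range(n_variants)):
            stable += 1
    return stable / len(texts)

def toy_classifier(text):  # hypothetical stand-in for a trained model
    return "fake" if "shocking" in text.lower() else "real"

texts = ["Shocking claim spreads online", "Council approves new budget"]
print(robustness(toy_classifier, texts))
```

A keyword-dependent model tends to score poorly here because a single swapped character destroys the trigger word; adversarial training aims to close exactly this gap.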

7.2 Multi-modal analysis

Fake news is not limited to textual content; images, videos, and audio clips can also be manipulated to spread misinformation. Future research should explore ways to incorporate multi-modal analysis, which involves analyzing multiple forms of media to detect inconsistencies and anomalies that might indicate the presence of fake news.

7.3 Contextual understanding

Current fake news detection models often struggle with understanding the context in which a piece of information is presented. Enhancing models with contextual understanding, including the cultural, social, and historical aspects of a story, can improve their accuracy and reduce false positives.

7.4 Fine-grained labeling

Many fake news datasets currently use binary labels (fake/real). However, fake news exists on a spectrum, ranging from slight distortions to complete fabrications. Introducing finer-grained labels that capture the varying degrees of misinformation can aid in the development of more nuanced detection models.
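One practical step is an ordinal encoding of a LIAR-style six-grade scale, which preserves the spectrum while still allowing binarization for legacy baselines. The numeric values and the threshold below are illustrative choices, not part of the dataset:

```python
# Ordinal encoding of a LIAR-style six-grade truthfulness scale.
# The numeric values and binarization threshold are illustrative.
SCALE = {
    "pants-fire": 0.0,
    "false": 0.2,
    "barely-true": 0.4,
    "half-true": 0.6,
    "mostly-true": 0.8,
    "true": 1.0,
}

def to_binary(label, threshold=0.5):
    """Collapse the fine-grained scale to fake/real for binary baselines."""
    return "real" if SCALE[label] >= threshold else "fake"

print(to_binary("half-true"))    # real
print(to_binary("barely-true"))  # fake
```

Training directly on the ordinal targets (regression or ordinal classification) lets a model exploit the ordering that a binary label throws away.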

7.5 Explainable AI

The interpretability of fake news detection models is crucial for building trust and understanding their decisions. Future research should focus on developing methods to make these models more explainable, allowing users to comprehend why a certain piece of content is flagged as fake.

7.6 Transfer learning

The effectiveness of fake news detection models can be limited by the availability of labeled data. Transfer learning techniques, where models trained on one domain are adapted to another with limited labeled data, could play a crucial role in addressing this issue.

7.7 Long-term dynamics

Fake news detection often focuses on the immediate detection of misinformation. However, understanding the long-term dynamics of how fake news evolves and spreads can provide valuable insights into devising more effective strategies to counter its impact.

7.8 Ethical considerations

As fake news detection models become more powerful, ethical considerations surrounding privacy, bias, and unintended consequences become more crucial. Future research should address these concerns to ensure the responsible deployment of detection technologies (Tables 15 and 16).

Table 15 Evaluating our survey in contrast to an established survey centered on social media platforms and varying utilized features
Table 16 Advancements and models addressing challenges in fake news detection

7.9 Real-time detection

The speed at which fake news spreads demands real-time detection systems. Developing models that can analyze and flag potentially false information in real time is essential to mitigate the rapid dissemination of misinformation.
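A minimal streaming sketch of this idea flags each post as it arrives and raises an alert when the share of fake-labeled items in a sliding window spikes. The window size, alert threshold, and toy classifier are all hypothetical:

```python
import collections

def stream_flagger(posts, classifier, window=100, alert_rate=0.3):
    """Label posts as they arrive; alert when the recent fake rate spikes."""
    recent = collections.deque(maxlen=window)
    for post in posts:
        label = classifier(post)
        recent.append(label == "fake")
        burst = sum(recent) / len(recent) > alert_rate
        yield post, label, burst

def toy(post):  # hypothetical stand-in for a trained real-time model
    return "fake" if "miracle" in post else "real"

feed = ["market update", "miracle cure found", "miracle diet", "weather"]
for post, label, burst in stream_flagger(feed, toy, window=3, alert_rate=0.5):
    print(label, burst)
# prints:
# real False
# fake False
# fake True
# real True
```

Because the flagger is a generator over the feed, latency per post is just one model call plus an O(1) window update, which is the property a real-time deployment needs.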

7.9.1 Recent advancements and models addressing challenges in fake news detection

Recent advancements in fake news detection have introduced several models to tackle the associated challenges. Hybrid models (Zhang and Ghorbani 2020) combining various techniques have shown promise, integrating deep learning with traditional machine learning approaches. For instance, models combining Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks (Khalil et al. 2024) leverage the strengths of both spatial feature extraction and temporal sequence processing. BERT-based models (Devlin et al. 2018; Essa et al. 2023) have been particularly effective due to their contextual understanding, enhancing the detection accuracy of nuanced and context-dependent fake news. Furthermore, graph-based approaches (Mohapatra et al. 2022; Monti et al. 2019) have emerged to address the spread and network influence of fake news, utilizing Graph Neural Networks (GNNs) to capture relational data between news items and their sources. Multi-modal models (Wang et al. 2021) incorporating text, images, and metadata have also been developed to handle diverse forms of misinformation. Additionally, adversarial training methods have been employed to make detection models more robust against sophisticated fake news crafted to evade detection. These models collectively address various challenges such as contextual understanding, relational data processing, and robustness against adversarial examples, providing a comprehensive approach to fake news detection on social media platforms.
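The graph-based intuition can be illustrated without a learned GNN: one round of credibility-score propagation over a news/source graph is a hand-rolled stand-in for the message passing a GNN would learn. The graph and seed scores below are entirely hypothetical:

```python
# Credibility propagation over a toy news/source graph -- a simple stand-in
# for learned GNN message passing. Edges and seed scores are hypothetical.
edges = {  # node -> neighbors (shared sources, retweets, etc.)
    "article_a": ["source_1", "article_b"],
    "article_b": ["source_2", "article_a"],
    "source_1": ["article_a"],
    "source_2": ["article_b"],
}
scores = {"source_1": 0.9, "source_2": 0.1}  # known credibility seeds

def propagate(edges, scores, rounds=2, default=0.5):
    """Average each node's score with its neighbors' mean, synchronously."""
    s = {n: scores.get(n, default) for n in edges}
    for _ in range(rounds):
        s = {n: (s[n] + sum(s[m] for m in nbrs) / len(nbrs)) / 2
             for n, nbrs in edges.items()}
    return s

out = propagate(edges, scores)
print(out["article_a"] > out["article_b"])  # True: backed by the credible source
```

A real GNN replaces the fixed averaging with learned, feature-aware aggregation, but the relational signal it exploits is the same: articles inherit evidence from the sources and articles they connect to.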

Future work in the field of fake news detection could focus on several promising areas. One significant direction is the development and refinement of hybrid models that integrate deep learning and traditional machine learning approaches to further enhance accuracy and efficiency. This includes experimenting with different combinations of CNNs and LSTMs for better spatial and temporal feature extraction, as well as exploring advanced transformer models such as BERT to improve contextual understanding on benchmarks such as the LIAR dataset.

8 Conclusion

In the ever-evolving realm of information dissemination, the emergence of fake news has spurred a dynamic landscape of research and innovation in its detection. This survey paper delved into the recent trends and challenges within this critical domain. As evidenced by the strides made in recent years, the integration of advanced machine learning techniques, natural language processing, and network analysis has yielded promising results in identifying deceptive content. The collaborative efforts of researchers across disciplines have fostered a deeper understanding of the multifaceted nature of fake news, contributing to the refinement of detection models.

Nonetheless, the journey to effective fake news detection is riddled with complexities. Adversarial attacks persistently challenge the robustness of models, urging the need for adversarial training and enhanced security measures. The incorporation of multi-modal analysis, contextual nuances, and explainable AI offers a multi-dimensional approach to addressing the evolving tactics of misinformation.

Ethical considerations loom large, underscoring the importance of striking a balance between privacy, bias mitigation, and the responsible use of AI. The convergence of real-time detection capabilities represents an evolving frontier in the battle against fake news.

In summation, this survey illuminates the substantial progress made and the intricate challenges that lie ahead in the domain of fake news detection. As technology continues to shape the way information is disseminated, a concerted effort to navigate these challenges will pave the way for a more informed, trustworthy, and resilient information ecosystem.