1 Introduction

In an era characterized by the rapid dissemination of information through digital platforms, the proliferation of fake news has emerged as a critical concern. The term "fake news" refers to intentionally fabricated or misleading information presented as genuine news, often designed to deceive, manipulate, or exploit the audience's emotions and beliefs (Jain et al. 2022). The rampant spread of fake news has the potential to sway public opinion, influence decision-making processes, and even disrupt social and political landscapes (Khattar et al. 2019). Consequently, the development of effective methods for fake news detection has become an urgent necessity.

Recent years have witnessed significant advancements in the field of fake news detection, driven by a combination of technological innovations and growing awareness about the potential consequences of misinformation (Zhou et al. 2020). This dynamic landscape poses both opportunities and challenges, as the creators of fake news constantly evolve their strategies to bypass detection mechanisms. From sophisticated AI-generated articles to meticulously doctored images and videos (Orhan 2023), the arsenal of fake news has expanded, demanding more sophisticated and adaptable detection techniques.

Several instances of fake news are depicted in Fig. 1. These instances gained significant traction during the COVID-19 pandemic and the 2016 U.S. presidential election (Kaliyar et al. 2021a).

Fig. 1
figure 1

Illustrations of misleading information circulated across social media. (Kaliyar et al. 2021a)

This article delves into the latest trends and challenges surrounding fake news detection. It explores the evolving techniques employed by purveyors of misinformation and highlights the innovative strategies researchers and technologists have devised to counteract them. From natural language processing and machine learning algorithms to data mining and social network analysis, a multitude of approaches (Varghese et al. 2024) are being harnessed to differentiate between genuine news and deceptive content. However, amidst this progress, there remain formidable obstacles such as the lack of a universally agreed-upon definition of fake news, the ethical implications of content moderation, and the balance between freedom of expression and the need to curb misinformation.

As we navigate this intricate landscape, it becomes crucial to understand the technical aspects of fake news detection as well as the broader societal and psychological factors that contribute to its dissemination and impact (Roy et al. 2018). By shedding light on the latest developments, this article aims to contribute to the ongoing discourse on combating fake news and promoting media literacy in an increasingly information-saturated world. As technology and deception continue to intertwine, staying ahead in the battle against fake news requires vigilance, collaboration, and a multidimensional approach encompassing technology, psychology, and critical thinking (Bharadwaj and Shao 2019).

Social media has transformed into a platform for sharing information, ideas, and feelings across the globe. However, this convenience has also facilitated the spread of misinformation, which can be disseminated quickly, cheaply, and maliciously. Spreading false information is often used to damage the reputations of individuals, organizations, and even countries, highlighting the importance of identifying misleading information (Oshikawa et al. 2018). To address this issue, research is being done to create reliable and accurate algorithms that can automatically identify false information on social media. These automated applications are built using cutting-edge technologies such as data mining, machine learning, and natural language processing (NLP) (Shu et al. 2017).

1.1 Motivation and research objective

The field of fake news detection stands as a thriving research domain, drawing the keen interest of researchers worldwide. Significant scope for enhancement emerges within the realm of fake news detection, primarily due to the limited availability of context-specific news data for training purposes. The adoption of deep learning methodologies in fake news detection presents a distinctive advantage over conventional approaches, given their prowess in extracting advanced features from the data. These aforementioned challenges and opportunities serve as the driving force behind our endeavor to construct an efficient deep-learning model dedicated to the task of fake news detection.

1.2 Existing methodologies for the identification of fake news

Identifying fake news poses a significant challenge due to its deliberate intent to distort information. Preceding theories play a crucial role in directing investigations into counterfeit news detection, employing diverse classification models. Current insights into detecting fake news can be broadly grouped into two main categories: (i) Learning based on News Content, and (ii) Learning based on Social Context.

1.2.1 News content-based learning

News Content-based learning (Jain et al. 2022; Zhou et al. 2020; Dong et al. 2020; Sadeghi et al. 2022; Galli et al. 2022; Brașoveanu and Andonie 2021; Verma et al. 2021; Shishah 2021) involves analyzing the textual and linguistic characteristics of news articles to distinguish between genuine information and fake news. This approach hinges on the understanding that deceptive content often exhibits linguistic anomalies, sensationalism, or lacks credible sources. By examining the structural attributes, writing style, and vocabulary usage within articles, machine learning algorithms can be trained to uncover patterns indicative of falsified information.

Through the utilization of Natural Language Processing (NLP) techniques, such as sentiment analysis, text summarization (Galli et al. 2022; Brașoveanu and Andonie 2021; Reddy et al. 2020; Rani et al. 2022; Palani et al. 2022; Rai et al. 2022; Shan et al. 2021; Kaliyar et al. 2021b; Jarrahi and Safari 2023), and language models, this approach aims to identify inconsistencies, exaggerated claims, and linguistic markers commonly associated with misinformation. For instance, excessive use of emotional language, hyperbolic statements, or the absence of verifiable sources can raise red flags about the authenticity of the content.

Furthermore, this method is fortified by the accumulation of labeled datasets (Zhou et al. 2020; Palani et al. 2022; Rai et al. 2022; Jarrahi and Safari 2023) containing both genuine and fake news articles. Machine learning algorithms can then be trained on these datasets, allowing them to learn the nuanced distinctions between reliable and deceptive content. The process involves feature extraction, wherein relevant linguistic attributes are quantified, and classifiers are employed to differentiate between the two categories.
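To make the feature-extraction-and-classification step concrete, the following is a minimal, self-contained sketch, not taken from any of the surveyed systems: TF-IDF weights are computed over tokenized articles, and a nearest-centroid classifier (by cosine similarity) separates the two classes. The toy corpus, class names, and design choices are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

class TfidfCentroidClassifier:
    """Illustrative sketch: TF-IDF features + nearest-centroid classification."""

    def fit(self, docs, labels):
        self.n = len(docs)
        # Document frequency of each term across the training corpus.
        self.df = Counter(t for d in docs for t in set(d))
        centroids = defaultdict(lambda: defaultdict(float))
        counts = Counter(labels)
        for doc, label in zip(docs, labels):
            for term, w in self._vec(doc).items():
                centroids[label][term] += w / counts[label]
        self.centroids = {lab: dict(v) for lab, v in centroids.items()}
        return self

    def _vec(self, doc):
        # TF-IDF vector; terms unseen in training are dropped.
        tf = Counter(t for t in doc if t in self.df)
        return {t: (c / len(doc)) * math.log(self.n / self.df[t])
                for t, c in tf.items()}

    def predict(self, doc):
        v = self._vec(doc)
        return max(self.centroids, key=lambda lab: self._cos(v, self.centroids[lab]))

    @staticmethod
    def _cos(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0
```

In practice, libraries such as scikit-learn provide production-grade TF-IDF vectorizers and classifiers; the sketch above only illustrates the mechanics.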

While News Content-based learning (Kaliyar et al. 2021a, 2021b; Sadeghi et al. 2022; Verma et al. 2021; Souza et al. 2022; Galende et al. 2022; Nassif et al. 2022; Mughaid et al. 2022; Mohapatra et al. 2022) holds promise in detecting fake news, it is not without limitations. The constantly evolving nature of deceptive strategies demands ongoing updates to the algorithms. Additionally, this approach might struggle with subtle instances of misinformation that do not overtly deviate in language use or style. Balancing the need for accurate detection with potential false positives remains a challenge, as certain linguistic features might be shared between genuine news and well-crafted fake stories.

In essence, News Content-based learning (Li et al. 2021, 2020; Ying et al. 2021; Ma et al. 2015; Wang et al. 2021; Liu and Wu 2020) forms a pivotal part of the arsenal against fake news, leveraging linguistic and textual cues to unravel the threads of deception woven within the fabric of information. Its integration with other approaches, such as Social Context-based learning, holds the potential to enhance the accuracy and robustness of fake news detection systems.

1.2.2 Social context-based learning

Social Context-based learning (Li et al. 2021; Ying et al. 2021; Ma et al. 2015; Wang et al. 2021) involves analyzing the social interactions and dynamics surrounding news articles to assess their credibility and authenticity. This approach recognizes that the dissemination and reception of news are deeply intertwined with the social ecosystem in which they exist. By examining factors such as user engagement, sharing patterns, and the credibility of sources, this method aims to uncover signals that can help distinguish between genuine news and fake information.

One of the key components of Social Context-based learning (Kaliyar et al. 2021a, 2021b; Sadeghi et al. 2022; Verma et al. 2021; Souza et al. 2022; Galende et al. 2022; Nassif et al. 2022; Mughaid et al. 2022; Mohapatra et al. 2022) is the analysis of the propagation patterns of news articles across social media platforms. The rapid sharing of fake news often leads to its viral spread, driven by emotional responses and confirmation bias. By tracking the velocity and volume of shares, likes, comments, and retweets, algorithms can identify articles that are gaining traction unusually quickly or within specific echo chambers.
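The velocity-and-volume signal described above can be sketched as a sliding-window counter over share timestamps. This is an illustrative sketch rather than a method from the cited works; the window size and threshold are assumed values that a real system would tune.

```python
from datetime import datetime, timedelta

def share_velocity_flags(share_times, window=timedelta(hours=1), threshold=100):
    """Flag windows in which the number of shares exceeds `threshold`.

    `share_times` is a chronologically sorted list of datetimes at which an
    article was shared. A burst of shares inside one sliding window is a
    crude signal of unusually fast (possibly coordinated) propagation.
    Returns a list of (window_start, window_end, count) tuples.
    """
    flags = []
    start = 0
    for end, t in enumerate(share_times):
        # Shrink the window from the left until it spans at most `window`.
        while t - share_times[start] > window:
            start += 1
        count = end - start + 1
        if count > threshold:
            flags.append((share_times[start], t, count))
    return flags
```

A production pipeline would combine such burst flags with engagement-graph features rather than use them in isolation.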

Furthermore, the credibility of the sources sharing the news plays a critical role. Social Context-based learning (Dong et al. 2020; Ying et al. 2021) involves assessing the authority and authenticity of the accounts sharing the information. Accounts with a history of sharing trustworthy content and a diverse range of sources are more likely to share accurate news. Conversely, accounts that predominantly share sensational or misleading information might raise suspicions.

Contextual analysis also contributes to this approach. Understanding the broader context in which a news article is shared, including the events and conversations surrounding it, can provide insights into its accuracy (Devlin et al. 2018; Reis et al. 2019; Pérez-Rosas et al. 2017). Additionally, detecting inconsistencies between a news article and verifiable facts can help identify potential misinformation.

Social Context-based learning is enhanced through the utilization of network analysis, sentiment analysis, and machine learning methodologies (Liu and Wu 2020; Long et al. 2017; Ozbay and Alatas 2021). By modeling the complex relationships between users, content, and interactions, algorithms can learn to differentiate between genuine news and fake news based on social dynamics. However, challenges exist in this approach as well (Liu and Wu 2020; Li et al. 2020). Misinformation campaigns can manipulate social dynamics, employing tactics to artificially inflate engagement metrics. Moreover, relying solely on social context might not catch sophisticated fake news stories that avoid triggering suspicious patterns.

In conclusion, Social Context-based learning complements (Kaliyar et al. 2021a, 2021b; Nassif et al. 2022; Mughaid et al. 2022; Mohapatra et al. 2022) other fake news detection strategies by tapping into the intricate web of social interactions and human behaviors. Its ability to uncover anomalies in sharing patterns and evaluate the credibility of sources offers a valuable perspective in the ongoing battle against the spread of fake news. When integrated with News Content-based learning and other approaches, it contributes to a more comprehensive and effective fake news detection framework.

1.2.3 Hybrid models

Hybrid models combine both content-based and context-based approaches to leverage the strengths of each methodology (Comito et al. 2023). These models can provide a more comprehensive analysis by considering both the textual content and the contextual information.

2 Recent advancements

  1. Content and social context fusion: a hybrid model that fuses content-based features with social context features (Orhan 2023). This model uses a multimodal neural network that simultaneously processes textual content using BERT and social context using GNNs (Galende et al. 2022). The fusion layer combines these features to improve detection accuracy, particularly in cases where either content or context alone is insufficient.

  2. Multi-view learning: a multi-view learning framework that incorporates multiple perspectives, including content (Galli et al. 2022), user behavior, and propagation patterns. By using attention mechanisms to weigh the importance of each view dynamically, the model can adapt to different types of fake news scenarios, enhancing its robustness.
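The dynamic view weighting used by such multi-view frameworks can be illustrated with a softmax attention over per-view scores. This is a sketch for illustration only, not code from the cited framework; the view names, scores, and attention logits are assumptions (in a real model the logits would be produced by a learned network).

```python
import math

def attention_fuse(view_scores, view_logits):
    """Fuse per-view fake-news scores with softmax attention weights.

    `view_scores` maps each view (e.g. content, user behavior, propagation)
    to that view's estimated probability that the item is fake;
    `view_logits` holds a relevance score per view. Returns the fused
    probability and the normalized attention weights.
    """
    views = list(view_scores)
    # Numerically stable softmax over the logits.
    m = max(view_logits[v] for v in views)
    exps = {v: math.exp(view_logits[v] - m) for v in views}
    z = sum(exps.values())
    weights = {v: exps[v] / z for v in views}
    fused = sum(weights[v] * view_scores[v] for v in views)
    return fused, weights
```

With equal logits the fusion reduces to a plain average; raising one view's logit shifts the verdict toward that view's score, which is exactly the adaptivity the framework exploits.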

These models represent cutting-edge techniques in fake news detection, combining advancements in NLP and network analysis to address the complex challenge of identifying fake news on social media platforms (Khalil et al. 2024). Integrating these recent research findings provides a comprehensive overview (TS and Sreeja 2024) of the current state of the art in this field.

2.1 Characteristics of fake news detection

Fake news detection is a critical area of research, focusing on identifying false or misleading information disseminated through various media channels, especially social media (Rastogi and Bansal 2023). This task involves several distinct characteristics that researchers aim to address. First, fake news often exhibits sensationalist language and exaggerated claims intended to elicit strong emotional reactions from readers, making it important for detection systems to analyze linguistic features. Second, the context in which the news appears is crucial; understanding the source, author, and the spread pattern on social networks helps in evaluating the credibility of the information. Third, fake news frequently leverages multimedia elements like images and videos, requiring advanced detection systems to integrate textual analysis with image and video verification (Athira et al. 2023). Fourth, the temporal aspect is significant, as fake news can spread rapidly, necessitating real-time or near-real-time detection capabilities. Additionally, adversarial techniques (Akdag and Cicekli 2024) are used to bypass detection mechanisms, highlighting the need for robust, adaptive models that can counteract these efforts. Hybrid models, which combine content-based and context-based features, are emerging as effective solutions, offering improved accuracy and resilience against sophisticated fake news tactics (Ozbay and Alatas 2021; Shu et al. 2019). Ultimately, the goal is to develop comprehensive systems that can accurately identify fake news, mitigate its spread, and enhance public trust in information.

2.2 Supervised fake news detection

Supervised fake news detection involves training models using labeled datasets where each news item is pre-annotated as either fake or real. This method relies heavily on the availability of large, accurately labeled datasets, which serve as ground truth for the learning algorithms. Supervised models, such as Support Vector Machines (SVM), Convolutional Neural Networks (CNN), and Long Short-Term Memory networks (LSTM), leverage these datasets (Athira et al. 2023) to learn the distinguishing features of fake news, including linguistic patterns, semantic content, and contextual cues. The effectiveness of supervised learning (Kumar and Taylor 2024) is often high when the training data is comprehensive and representative of the variations in news content. However, obtaining such high-quality labeled data is challenging, time-consuming, and expensive, which limits the scalability of supervised approaches.

2.3 Weakly supervised fake news detection

On the other hand, weakly supervised fake news detection aims to alleviate the dependency on extensive labeled datasets by utilizing partially labeled or noisy data. This approach (Rastogi and Bansal 2023) leverages various strategies, such as semi-supervised learning, where a small amount of labeled data is used in conjunction with a larger pool of unlabeled data, and transfer learning, where models pretrained on related tasks are fine-tuned on the target task. Weakly supervised methods also include distant supervision, where external knowledge sources like fact-checking websites provide weak labels. These approaches enable models to learn effectively from less-than-perfect data, making them more adaptable and scalable. Weakly supervised models (Akdag and Cicekli 2024) are particularly useful in dynamic environments like social media, where new and diverse fake news content emerges rapidly. Despite their flexibility, these models (Zhao et al. 2023) may struggle with accuracy and reliability compared to fully supervised models, necessitating ongoing refinement and validation.
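The semi-supervised strategy above can be sketched as a self-training loop: fit on the labeled pool, promote confidently predicted unlabeled items as pseudo-labels, and repeat. Everything here is an illustrative assumption, including the toy word-vote base classifier and the confidence threshold; real systems would use a proper probabilistic model.

```python
from collections import Counter

class OverlapClassifier:
    """Toy base model: votes by which label each known word was seen with."""

    def fit(self, docs, labels):
        self.words = {}
        for doc, lab in zip(docs, labels):
            for w in doc:
                self.words.setdefault(w, Counter())[lab] += 1
        return self

    def predict_proba(self, doc):
        votes = Counter()
        for w in doc:
            if w in self.words:
                votes.update(self.words[w])
        if not votes:
            return None, 0.0
        label, count = votes.most_common(1)[0]
        return label, count / sum(votes.values())

def self_train(clf, labeled, unlabeled, confidence=0.9, rounds=3):
    """Self-training: repeatedly fit, then promote confident pseudo-labels."""
    pool = list(labeled)          # (features, label) pairs
    remaining = list(unlabeled)   # features only
    for _ in range(rounds):
        clf.fit([x for x, _ in pool], [y for _, y in pool])
        still = []
        for x in remaining:
            label, prob = clf.predict_proba(x)
            if prob >= confidence:
                pool.append((x, label))   # accept confident pseudo-label
            else:
                still.append(x)           # keep for a later round
        if len(still) == len(remaining):  # no progress; stop early
            break
        remaining = still
    return clf, pool
```

The early-stopping check matters: without it, rounds in which nothing clears the confidence threshold would simply refit the same model.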

3 Related work

Our study sought to bridge the knowledge gap in the area by offering a comprehensive review of the current methods for detecting fake news and promoting multidisciplinary research collaboration. The primary goal of this paper is to provide an overview of the current state of research on the topic.

To achieve this goal, we conducted a thorough review of various solutions that are currently being used to detect fake news. We analyzed the use of machine learning models, network propagation models, and fact-checking methodologies for detecting fake news. In particular, our study focused on how researchers develop and use machine learning models to identify and classify fake news, as well as the tools they employ for this purpose. Furthermore, we also discussed the research challenges that are still open in this field.

The paper (Ahmed et al. 2018) presents a novel n-gram model that automatically detects false information, with a particular focus on fake news and misleading reviews. The study employs two distinct attribute abstraction methods and six different machine-learning classification algorithms. Preprocessing involves removing stop words and stemming keywords to identify misleading information effectively. The classifier is trained using two feature extraction techniques, TF and TF-IDF, in the final classification stage. The research evaluates six machine learning algorithms: SGD, SVM (Sadeghi et al. 2022; Verma et al. 2021; Reddy et al. 2020; Palani et al. 2022; Shan et al. 2021), LSVM (Kaliyar et al. 2021a; Shishah 2021; Rai et al. 2022), KNN (Sadeghi et al. 2022; Verma et al. 2021), LR (Sadeghi et al. 2022; Rani et al. 2022), and DT (Kaliyar et al. 2021a; Verma et al. 2021).
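The n-gram feature-extraction stage of such a model can be sketched as follows. This is an illustrative reimplementation under stated assumptions (word-level n-grams, term-frequency normalization), not the authors' code.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of word n-grams in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_features(tokens, max_n=2):
    """Unigram-to-`max_n`-gram term-frequency features for one document.

    Each n-gram is mapped to its frequency relative to the total number
    of extracted n-grams, giving the TF representation that a TF-IDF
    stage would then reweight by inverse document frequency.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}
```

Stop-word removal and stemming, as in the paper, would be applied to `tokens` before this step.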

The paper examines (Hirlekar and Kumar 2020) the current state of research on fake news and proposes theoretical and practical approaches to categorize and intervene in this problem. Text mining is one such machine learning approach used to detect fake news, hoaxes, and misinformation. The study proposes a complex classification scheme, including neural networks that use traditional classification procedures. The report recommends identifying fake news using essential text qualities that can be produced independently of platform and language (Faustini and Covoes 2020).

The study evaluates five datasets, comprising articles and social media posts in three distinct language categories, against standard benchmarks and reports favorable outcomes. It also examines how training factors affect other common natural language processing representations such as Word2Vec and bag-of-words.

The study proposes a hybrid attention LSTM model and uses the LIAR dataset (Wang 2017; Jain et al. 2022; Dong et al. 2020; Sadeghi et al. 2022; Galli et al. 2022) from PolitiFact, which provides subject, text, and speaker profiles for 12,836 news items from 3,341 speakers. The results show that the model outperforms recent reference dataset-based models by 14.5%. This demonstrates the importance of the speaker's profile in determining news trustworthiness (Galli et al. 2022).

The paper begins with an examination of the need for automatic fake news detection, comparing and discussing various techniques' findings on the most critical new standard datasets. The research focuses on the LIAR (Jain et al. 2022; Dong et al. 2020; Sadeghi et al. 2022; Galli et al. 2022; Brașoveanu and Andonie 2021; Comito et al. 2023), FEVER, and FAKENEWSNET (Sadeghi et al. 2022; Verma et al. 2021; Souza et al. 2022; Galende et al. 2022; Nassif et al. 2022; Mughaid et al. 2022) datasets, with the LIAR dataset showing excellent accuracy with LSTM and attention-LSTM-based models. The FEVER dataset also uses an attention-based LSTM model (Kaliyar et al. 2021a; Shishah 2021; Rai et al. 2022) to achieve outstanding accuracy, and the GCN-based model using the FAKENEWSNET dataset achieves complete accuracy (Oshikawa et al. 2018).

Research on how to identify fake news has been ongoing, and numerous algorithms have been created to do so. To detect fake news, researchers have used a variety of models, including convolutional neural networks, long short-term memory networks, and bidirectional LSTM (Comito et al. 2023; Li et al. 2022). They obtained word vector representations using GloVe, an unsupervised machine learning approach, and then modeled a deep neural network using CNN and max-pooling. Bi-LSTM was then applied to mitigate the vanishing-gradient problem and capture long-term dependencies (Sadeghi et al. 2022; Rani et al. 2022; Mughaid et al. 2022). The attention mechanism, which has been effective in tasks such as machine translation and image captioning, was used, and a dropout layer was added in the final phase to prevent overfitting. The researchers achieved a 71.2% accuracy rate on the test dataset (Rani et al. 2022).

In another study, the authors categorized the critical features for each task and investigated the use of multiple supervised learning classifiers, such as KNN (Sadeghi et al. 2022; Verma et al. 2021; Reddy et al. 2020), NB (Sadeghi et al. 2022; Verma et al. 2021; Reddy et al. 2020; Rani et al. 2022; Palani et al. 2022), RF, SVM (Sadeghi et al. 2022; Verma et al. 2021; Reddy et al. 2020; Palani et al. 2022; Shan et al. 2021), and XGB, reporting the accuracy and F1 score obtained by each classifier. RF and XGB outperformed the other classifiers, and it was found that distinguishing fake from genuine articles on a large, recently released, and fully labeled dataset was difficult. They also discussed how supervised learning models could help fact-checkers analyze digital data and reach solid conclusions (Verma et al. 2021).

To detect fake news, another study employed semantic characteristics and text mining, comparing an RNN to Naive Bayes and random forest classifiers using different groups of linguistic features. Random forest outperformed Naive Bayes across trials with different feature sets, achieving 95.66% accuracy. The researchers used a Kaggle real-or-fake news dataset for their experiments (Bharadwaj and Shao 2019).

In yet another study, the authors proposed a Multi-source, Multi-class Fake News Detection system that combines convolutional neural networks to analyze the local structure of every word in a statement, LSTM (Kaliyar et al. 2021a; Shishah 2021; Rai et al. 2022) to analyze temporal relationships across the text, and an integrated network to concatenate the last hidden outputs. This technique combines the best characteristics of both architectures, as LSTM performs better on longer sentences (Karimi et al. 2018).

The authors of another study proposed a novel method for detecting fake news at the knowledge element (KE) level, which entails representing the claims in a news item as a multimedia knowledge graph and recognizing the misleading elements in the form of KEs, for a high degree of explainability. They developed a logically structured approach called InfoSurgeon for detecting disinformation in news articles that draws on source context, semantic representation, multimedia information components, and prior knowledge. They also proposed a new benchmark for identifying fake news at the KE level via a silver-standard annotation dataset (15,000 multimedia article pairs) generated automatically using KG-influenced natural language generation (Fung et al. 2021).

The authors present tools and models that aim to address the challenge of detecting fake news and supporting their study. They created two new datasets, spanning seven different fields, using a combination of human and crowdsourced annotations and data directly obtained from the internet (Monti et al. 2019; Hu et al. 2022). Exploratory tests were conducted using these datasets to identify linguistic features that could potentially be indicative of fraudulent content. Based on these features, the authors developed fake news detectors that achieved up to 78% precision. To provide context for their findings, they compared the efficacy of their detection systems to an objective human baseline (Pérez-Rosas et al. 2017).

Table 1 is an indispensable resource for researchers and practitioners in the field of data analysis and research. This table meticulously catalogs a wide range of datasets, serving as a comprehensive reference guide. Whether one is delving into machine learning, statistical analysis, or any data-driven investigation, Table 1 simplifies the process of dataset selection and comparison. It embodies the foundational principle that robust research relies on the quality and relevance of data, making it an invaluable asset in the pursuit of knowledge and innovation across various domains.

Table 1 Model references

Table 2 serves as a valuable resource for gaining insights into the extensive body of research dedicated to understanding and addressing issues related to bots, clickbaits, rumors, and the analysis of content and context in digital communication. This table presents a consolidated view of the diverse studies, methodologies, and findings within this domain, offering a comprehensive snapshot of the collective efforts made by researchers and scholars. It not only highlights the breadth and depth of research but also provides a convenient reference point for those seeking to explore specific topics within this multifaceted field. In an era marked by the rapid dissemination of information through digital channels, Table 3 plays a pivotal role in promoting informed decision-making and fostering a deeper understanding of the complexities surrounding online content and its impact on society (Fig. 2).

Table 2 Dataset references
Table 3 Comprehensive overview of research utilized in bots, clickbaits, rumors, content, and context analysis
Fig. 2
figure 2

Techniques Utilized for Detecting False Information

4 Different types of fake news detection models

4.1 Machine learning technique

Initially, machine learning algorithms were developed to detect fake news due to the belief that it is created for financial and political gain (Faustini and Covoes 2020). Since fake news often includes persuasive and argumentative language, the retrieval of written text and linguistic elements is required for machine learning. The author utilized the Naive Bayes classifier to recognize linguistic features such as vocabulary, word count, length, and grammatical style, including text summarization and characterization (Oliveira et al. 2020). However, some fake news categories, such as clickbait articles, have high click-through rates due to their alluring nature, which cannot be detected by this technology.
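The linguistic features mentioned above (vocabulary, word count, length, and stylistic markers) can be extracted with a few lines of code. The feature set below is an illustrative assumption of the kind of surface cues such classifiers consume, not the exact features used in the cited work.

```python
import re

def linguistic_features(text):
    """Surface linguistic cues of the kind fed to simple fake-news classifiers."""
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "vocabulary_size": len({w.lower().strip(".,!?") for w in words}),
        "avg_word_length": sum(len(w.strip(".,!?")) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        # Sensationalist style cues: exclamation marks and shouting-case words.
        "exclamation_ratio": text.count("!") / len(sentences),
        "uppercase_word_ratio": sum(w.isupper() for w in words) / len(words),
    }
```

A classifier such as Naive Bayes would then be trained on these feature vectors rather than on raw text.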

The authors suggest a machine learning model as a solution, employing gradient-boosted decision trees to identify fake news effectively, resulting in high classification accuracy. They have pinpointed the significance of a casual tone in the creation of clickbait articles (Elhadad et al. 2020).

Additionally, the author has devised a machine learning model capable of discerning and forecasting whether an article qualifies as clickbait, leveraging features such as URL, content, and title. They used Yahoo aggregate data to construct a training set of 1349 clickbait URLs and a testing set of 2724 non-clickbait URLs. The author categorized eight types of clickbait, such as exaggeration, tease, confusion, provocation, formatting, bait-and-switch, graphic, and wrong, to identify spam mail and websites (Faustini and Covoes 2020; Silva et al. 2020) (Fig. 3).

Fig. 3
figure 3

Machine learning architecture of fake information detection. (Meel and Vishwakarma 2020)

4.2 Natural language processing technique

Natural language processing (NLP) has become a valuable tool in detecting fraud through a variety of techniques, including grammatical and syntactic analysis, correlation, clustering, and boolean text classification, which labels news as true or false. When detection is challenging, a third category may be introduced to differentiate between tentatively real and tentatively fake cases. Utilizing the text segmentation method along with the Natural Language Toolkit, a sentiment score is then calculated by examining carefully selected and structured text for indications of fraudulent activity. In NLP, features like text quality and context are essential for accurate detection (Liu and Wu 2020). The Stanford parser, a lexical and syntactic analyzer, is reported to produce reliable results. Empirical studies (Silva et al. 2020) have shown that NLP is more effective than social authentication. The main aim is to identify syntactic and verbal cues that reveal linguistic disparities between individuals who tell lies and those who tell the truth.
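A lexicon-based sentiment score of the kind computed in this pipeline can be sketched as follows. In practice one would use a full lexicon such as NLTK's VADER; the tiny word lists here are placeholder assumptions, and a strongly polarized score is only one weak signal of sensationalist content.

```python
# Placeholder lexicons; a real system would load a curated sentiment lexicon.
POSITIVE = {"good", "great", "true", "verified", "reliable"}
NEGATIVE = {"shocking", "outrageous", "fake", "scandal", "disaster"}

def sentiment_score(tokens):
    """Crude lexicon-based sentiment in [-1, 1] for a tokenized text.

    +1 means purely positive vocabulary, -1 purely negative, 0 neutral
    (or no lexicon words present).
    """
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)
```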

4.3 Deep learning technique

Detecting different types of fake news in the context of deep learning involves the application of various machine-learning techniques to combat the proliferation of misinformation in today's digital age. The multifaceted nature of fake news necessitates a diverse range of approaches. Text-based fake news detection leverages recurrent neural networks (RNNs) and convolutional neural networks (CNNs) to scrutinize linguistic patterns, sentiment, and contextual cues in textual content. These models analyze the language used in news articles, social media posts, and other textual sources to identify deceptive narratives. Image-based fake news detection, on the other hand, harnesses deep convolutional neural networks (CNNs) to scrutinize visual elements within images. These models can uncover alterations, forgeries, or inconsistencies in visual content, a vital component of debunking photo-manipulated stories. Audio-based fake news detection delves into audio files using models like CNNs (Liu and Wu 2020) and RNNs to identify voice impersonations, audio tampering, or anomalies in spoken content. Combining text, images, and audio, multi-modal fake news detection employs advanced architectures such as Transformer-based models and multi-modal neural networks (MNNs) (Karimi et al. 2018) to fuse and analyze data from various media sources. These models provide a more comprehensive evaluation of information credibility by considering the collective impact of multiple modalities. Overall, the use of deep learning in these various modes empowers researchers and developers to stay ahead in the battle against fake news (Bharadwaj and Shao 2019), offering a multi-pronged approach to safeguard the accuracy and integrity of information in an increasingly digital and interconnected world (Fig. 4).
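The simplest way the per-modality models above can be combined is late (decision-level) fusion: each modality model emits its own fake-probability, and the verdict is a weighted average. This is an illustrative sketch, not a specific published architecture; modality names and weights are assumptions (learned fusion layers, as in the attention-based models above, are the stronger alternative).

```python
def late_fusion(modality_scores, weights=None):
    """Decision-level fusion of per-modality fake-probability scores.

    `modality_scores` maps each modality (e.g. text, image, audio) to the
    probability its model assigns to the item being fake; `weights`
    optionally expresses how much each modality is trusted. Returns the
    weighted average in [0, 1].
    """
    if weights is None:
        weights = {m: 1.0 for m in modality_scores}  # equal trust by default
    z = sum(weights[m] for m in modality_scores)
    return sum(weights[m] * s for m, s in modality_scores.items()) / z
```

A threshold (e.g. 0.5) on the fused score then yields the final fake/real decision.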

Fig. 4

Deep learning architecture of fake information detection. (Meel and Vishwakarma 2020)

Tables 4 and 5 offer a comprehensive overview of ensemble-based machine/deep learning methods that have achieved evaluation metric scores surpassing 90%. The highlighted cells spotlight the researchers’ top accomplishments in these tables.

Table 4 Metrics assessing performance for the most effective approach in the previously cited research using ML approaches
Table 5 Metrics assessing performance for the most effective approach in the previously cited research using DL approaches

5 Dataset

BuzzFeed (Santia and Williams 2018) comprises a full sample of news published on Facebook by 9 news agencies over September 19 to 23 and September 26 and 27, the week preceding the 2016 U.S. Election (Hu et al. 2022). Each post, alongside its associated articles, was validated individually by five BuzzFeed journalists. The dataset encompasses a total of 1627 articles: 826 mainstream, 356 left-wing, and 545 right-wing.

LIAR (Wang 2017) encompasses a collection of 12,836 real-world statements curated from PolitiFact. Each item is categorized on a six-grade truthfulness scale (Wang 2017; Hu et al. 2022): true, mostly-true, half-true, barely-true, false, and pants-on-fire. Additionally, the dataset provides supplementary details about the subjects covered, political affiliations, contextual information, and the speakers of the statements.

Weibo21 (Li et al. 2021) is a multi-domain fabricated-news dataset in the Chinese language; each entry is accompanied by an annotated domain label. The dataset encompasses fabricated as well as authentic news articles sourced from Sina Weibo, spanning the period from December 2014 to March 2021. Its fabricated content comprises news articles officially identified as misinformation by the Weibo Community Management Center.

PolitiFact (Zhou et al. 2020) is a renowned nonprofit fact-checking website that operates within the United States and specializes in evaluating political statements and reports. The dataset from PolitiFact encompasses news articles published between May 2002 and July 2018. Verified by domain experts, the dataset includes definitive labels (false or true) assigned to the news content. The content in the PolitiFact dataset primarily consists of statements or news articles disseminated by political figures (such as Congress members, White House staff, and lobbyists) and political groups, all of which have undergone thorough fact-checking by PolitiFact.

GossipCop (Zhou et al. 2020) operates as a fact-checking platform; its dataset pertains to news articles released between July 2000 and December 2018. Domain experts meticulously assign definitive labels to the news content, upholding the accuracy and reliability of the news tags.

FakeNewsNet (Verma et al. 2021) incorporates data sourced from the fact-checking platforms BuzzFeed and PolitiFact and encompasses news articles along with associated user details and retweet information, for a combined total of 23,196 news articles and 69,733 retweets.

PHEME (Zubiaga et al. 2017) is a collection of tweets from the Twitter platform, gathered from five distinct breaking-news sources, each contributing a set of tweets. Each tweet within the dataset comprises both textual content and accompanying images.

We provide a summary of the publicly available datasets utilized, as depicted in Table 6. These datasets comprise data sourced from platforms such as Sina Weibo and Twitter, along with datasets derived from fact-checking efforts, such as BuzzFeed, LIAR, and FakeNewsNet.

Table 6 Overview of fake news detection datasets

Table 7 presents a comprehensive overview of the results obtained from various associated models employed in the detection of fake news. This table offers valuable insights into the performance and effectiveness of different approaches and methodologies used to tackle the challenging task of identifying deceptive or misleading information in news sources. By summarizing the results from these associated models, Table 7 serves as a valuable reference for researchers, policymakers, and practitioners striving to enhance our understanding of fake news detection and develop more robust solutions to address this pressing issue in today’s information landscape (Tables 8, 9, 10, 11, 12 and 13).

Table 7 Overview of the results from associated models for detecting fake news
Table 8 Performance of various models on the LIAR dataset
Table 9 Performance of various models on the BuzzFeed dataset
Table 10 Performance of various models on the Politifact dataset
Table 11 Performance of various models on the GossipCop dataset
Table 12 Performance of various models on the Pheme dataset
Table 13 Performance of various models on the Weibo dataset

6 Performance measure formula

The analysis of all the gathered articles reveals that, in every case, one or more of the performance metrics depicted in Table 14 have been employed to assess the simulation outcomes. These metrics serve as indicators of a method’s detection capability. Table 14 also includes the formula for each performance metric.

Table 14 Performance measures for our deceptive-information detection system

As indicated in the presented Table 14, a comprehensive evaluation of classifier performance is conducted across multiple dimensions, including Accuracy, Error Rate, Precision, Sensitivity, F1-Score, Specificity, Area Under the Curve, Geometric Mean, Miss Rate, False Discovery Rate, and Fall-Out Rate.

Within Table 14, TP (true positive) and TN (true negative) denote the counts of correctly classified positive and negative instances, respectively, while FP (false positive) denotes the count of negative instances incorrectly classified as positive and FN (false negative) the count of positive instances incorrectly classified as negative.
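The standard confusion-matrix definitions above can be computed directly; the sketch below implements the common Table 14 metrics from the four counts (the example counts are illustrative):

```python
def metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics of the kind listed in Table 14."""
    total = tp + tn + fp + fn
    accuracy    = (tp + tn) / total
    error_rate  = (fp + fn) / total
    precision   = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # recall / true-positive rate
    specificity = tn / (tn + fp)
    f1          = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean      = (sensitivity * specificity) ** 0.5
    miss_rate   = fn / (fn + tp)          # 1 - sensitivity
    fdr         = fp / (fp + tp)          # false discovery rate
    fall_out    = fp / (fp + tn)          # 1 - specificity
    return {"accuracy": accuracy, "error_rate": error_rate,
            "precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1, "g_mean": g_mean,
            "miss_rate": miss_rate, "fdr": fdr, "fall_out": fall_out}

# Illustrative counts: 90 TP, 80 TN, 20 FP, 10 FN.
m = metrics(tp=90, tn=80, fp=20, fn=10)
print(round(m["accuracy"], 2))   # 0.85
print(round(m["precision"], 2))  # 0.82
print(round(m["f1"], 2))         # 0.86
```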

The experimental outcomes stemming from the constructed models were rigorously assessed utilizing all the metrics enumerated in Table 14. The intention was to gauge the performance of distinct detection models from various vantage points rather than relying solely on a singular perspective.

7 Open issues and future research

Fake news has emerged as a significant challenge in the modern information landscape, fueled by the rapid dissemination of information through digital platforms and social media. While substantial progress has been made in developing fake news detection techniques, several open issues and avenues for future research remain to enhance the effectiveness and robustness of these methods.

7.1 Adversarial attacks

One of the pressing challenges in fake news detection is the development of techniques to counter adversarial attacks. Adversaries can manipulate text to bypass detection models, making them vulnerable to subtle alterations. Future research should focus on creating models that are more resistant to such adversarial perturbations by integrating techniques from the field of adversarial machine learning.
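A simple way to probe this vulnerability is to measure label stability under small text perturbations. The sketch below uses a toy character-swap perturbation and a hypothetical keyword classifier as stand-ins; real adversarial attacks and detectors are far more sophisticated:

```python
import random

def perturb(text, rate=0.2, seed=0):
    """Toy adversarial edit: randomly swap adjacent letters."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def robustness(classifier, texts, n_variants=5):
    """Fraction of texts whose label survives all perturbed variants."""
    stable = 0
    for t in texts:
        base = classifier(t)
        if all(classifier(perturb(t, seed=s)) == base for s in range(n_variants)):
            stable += 1
    return stable / len(texts)

def toy_classifier(text):  # hypothetical stand-in for a trained model
    return "fake" if "shocking" in text.lower() else "real"

texts = ["Shocking claim spreads online", "Council approves new budget"]
print(robustness(toy_classifier, texts))
```

A keyword-dependent model tends to score poorly here because a single swapped character destroys the trigger word; adversarial training aims to close exactly this gap.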

7.2 Multi-modal analysis

Fake news is not limited to textual content; images, videos, and audio clips can also be manipulated to spread misinformation. Future research should explore ways to incorporate multi-modal analysis, which involves analyzing multiple forms of media to detect inconsistencies and anomalies that might indicate the presence of fake news.

7.3 Contextual understanding

Current fake news detection models often struggle with understanding the context in which a piece of information is presented. Enhancing models with contextual understanding, including the cultural, social, and historical aspects of a story, can improve their accuracy and reduce false positives.

7.4 Fine-grained labeling

Many fake news datasets currently use binary labels (fake/real). However, fake news exists on a spectrum, ranging from slight distortions to complete fabrications. Introducing finer-grained labels that capture the varying degrees of misinformation can aid in the development of more nuanced detection models.
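One practical step is an ordinal encoding of a LIAR-style six-grade scale, which preserves the spectrum while still allowing binarization for legacy baselines. The numeric values and the threshold below are illustrative choices, not part of the dataset:

```python
# Ordinal encoding of a LIAR-style six-grade truthfulness scale.
# The numeric values and binarization threshold are illustrative.
SCALE = {
    "pants-fire": 0.0,
    "false": 0.2,
    "barely-true": 0.4,
    "half-true": 0.6,
    "mostly-true": 0.8,
    "true": 1.0,
}

def to_binary(label, threshold=0.5):
    """Collapse the fine-grained scale to fake/real for binary baselines."""
    return "real" if SCALE[label] >= threshold else "fake"

print(to_binary("half-true"))    # real
print(to_binary("barely-true"))  # fake
```

Training directly on the ordinal targets (regression or ordinal classification) lets a model exploit the ordering that a binary label throws away.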

7.5 Explainable AI

The interpretability of fake news detection models is crucial for building trust and understanding their decisions. Future research should focus on developing methods to make these models more explainable, allowing users to comprehend why a certain piece of content is flagged as fake.

7.6 Transfer learning

The effectiveness of fake news detection models can be limited by the availability of labeled data. Transfer learning techniques, where models trained on one domain are adapted to another with limited labeled data, could play a crucial role in addressing this issue.

7.7 Long-term dynamics

Fake news detection often focuses on the immediate detection of misinformation. However, understanding the long-term dynamics of how fake news evolves and spreads can provide valuable insights into devising more effective strategies to counter its impact.

7.8 Ethical considerations

As fake news detection models become more powerful, ethical considerations surrounding privacy, bias, and unintended consequences become more crucial. Future research should address these concerns to ensure the responsible deployment of detection technologies (Tables 15 and 16).

Table 15 Evaluating our survey in contrast to an established survey centered on social media platforms and varying utilized features
Table 16 Advancements and models addressing challenges in fake news detection

7.9 Real-time detection

The speed at which fake news spreads demands real-time detection systems. Developing models that can analyze and flag potentially false information in real time is essential to mitigate the rapid dissemination of misinformation.
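A minimal streaming sketch of this idea flags each post as it arrives and raises an alert when the share of fake-labeled items in a sliding window spikes. The window size, alert threshold, and toy classifier are all hypothetical:

```python
import collections

def stream_flagger(posts, classifier, window=100, alert_rate=0.3):
    """Label posts as they arrive; alert when the recent fake rate spikes."""
    recent = collections.deque(maxlen=window)
    for post in posts:
        label = classifier(post)
        recent.append(label == "fake")
        burst = sum(recent) / len(recent) > alert_rate
        yield post, label, burst

def toy(post):  # hypothetical stand-in for a trained real-time model
    return "fake" if "miracle" in post else "real"

feed = ["market update", "miracle cure found", "miracle diet", "weather"]
for post, label, burst in stream_flagger(feed, toy, window=3, alert_rate=0.5):
    print(label, burst)
# prints:
# real False
# fake False
# fake True
# real True
```

Because the flagger is a generator over the feed, latency per post is just one model call plus an O(1) window update, which is the property a real-time deployment needs.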

7.9.1 Recent advancements and models addressing challenges in fake news detection

Recent advancements in fake news detection have introduced several models to tackle the associated challenges. Hybrid models (Zhang and Ghorbani 2020) combining various techniques have shown promise, integrating deep learning with traditional machine learning approaches. For instance, models combining Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks (Khalil et al. 2024) leverage the strengths of both spatial feature extraction and temporal sequence processing. BERT-based models (Devlin et al. 2018; Essa et al. 2023) have been particularly effective due to their contextual understanding, enhancing the detection accuracy of nuanced and context-dependent fake news. Furthermore, graph-based approaches (Mohapatra et al. 2022; Monti et al. 2019) have emerged to address the spread and network influence of fake news, utilizing Graph Neural Networks (GNNs) to capture relational data between news items and their sources. Multi-modal models (Wang et al. 2021) incorporating text, images, and metadata have also been developed to handle diverse forms of misinformation. Additionally, adversarial training methods have been employed to make detection models more robust against sophisticated fake news crafted to evade detection. These models collectively address various challenges such as contextual understanding, relational data processing, and robustness against adversarial examples, providing a comprehensive approach to fake news detection on social media platforms.
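The graph-based intuition can be illustrated without a learned GNN: one round of credibility-score propagation over a news/source graph is a hand-rolled stand-in for the message passing a GNN would learn. The graph and seed scores below are entirely hypothetical:

```python
# Credibility propagation over a toy news/source graph -- a simple stand-in
# for learned GNN message passing. Edges and seed scores are hypothetical.
edges = {  # node -> neighbors (shared sources, retweets, etc.)
    "article_a": ["source_1", "article_b"],
    "article_b": ["source_2", "article_a"],
    "source_1": ["article_a"],
    "source_2": ["article_b"],
}
scores = {"source_1": 0.9, "source_2": 0.1}  # known credibility seeds

def propagate(edges, scores, rounds=2, default=0.5):
    """Average each node's score with its neighbors' mean, synchronously."""
    s = {n: scores.get(n, default) for n in edges}
    for _ in range(rounds):
        s = {n: (s[n] + sum(s[m] for m in nbrs) / len(nbrs)) / 2
             for n, nbrs in edges.items()}
    return s

out = propagate(edges, scores)
print(out["article_a"] > out["article_b"])  # True: backed by the credible source
```

A real GNN replaces the fixed averaging with learned, feature-aware aggregation, but the relational signal it exploits is the same: articles inherit evidence from the sources and articles they connect to.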

Future work in the field of fake news detection could focus on several promising areas. One significant direction is the development and refinement of hybrid models that integrate deep learning and traditional machine learning approaches to further enhance accuracy and efficiency. This includes experimenting with different combinations of CNNs and LSTMs for better spatial and temporal feature extraction, as well as exploring advanced transformer models such as BERT to improve contextual understanding on benchmarks such as the LIAR dataset.

8 Conclusion

In the ever-evolving realm of information dissemination, the emergence of fake news has spurred a dynamic landscape of research and innovation in its detection. This survey paper delved into the recent trends and challenges within this critical domain. As evidenced by the strides made in recent years, the integration of advanced machine learning techniques, natural language processing, and network analysis has yielded promising results in identifying deceptive content. The collaborative efforts of researchers across disciplines have fostered a deeper understanding of the multifaceted nature of fake news, contributing to the refinement of detection models.

Nonetheless, the journey to effective fake news detection is riddled with complexities. Adversarial attacks persistently challenge the robustness of models, urging the need for adversarial training and enhanced security measures. The incorporation of multi-modal analysis, contextual nuances, and explainable AI offers a multi-dimensional approach to addressing the evolving tactics of misinformation.

Ethical considerations loom large, underscoring the importance of striking a balance between privacy, bias mitigation, and the responsible use of AI. The convergence of real-time detection capabilities represents an evolving frontier in the battle against fake news.

In summation, this survey illuminates the substantial progress made and the intricate challenges that lie ahead in the domain of fake news detection. As technology continues to shape the way information is disseminated, a concerted effort to navigate these challenges will pave the way for a more informed, trustworthy, and resilient information ecosystem.