1 Introduction

Social media have become a popular online medium for sharing information and communicating with people throughout the world. This growth in online activity has also enabled the creation of abundant fake news that circulates globally, and advertisements on social media sometimes carry false information. Social media use increased by 25% during the COVID-19 lockdown, and this increase was a major driver of fake news circulation. Fake news posted in tweets spread much faster than on other social networks; daily Twitter use rose by 30% during the lockdown [1]. COVID-19 spread from Wuhan, China, and news about its spread circulated through social media almost instantly. Social media give users the freedom to publish fake news without verifying real data related to COVID-19 [2]. The extensive connectivity of people on social media is the main reason fake news spreads so widely. Social networks such as Facebook and Twitter carry a great deal of fake content that people share without knowing the real facts [3]. Fake information spreads widely as traffic on social media increases, and a large volume of fake news circulated among people worldwide during the COVID-19 lockdown. During the pandemic, fake news attracted more attention than real news [4], and incorrect, useless, and harmful information was shared among people in different countries [5].

SARS-CoV-2 poses a major health challenge for the world. In the twenty-first century, the spread of fake news has become more prevalent, particularly around the US presidential election of 2016 [6]. Fake news is characterized as misinformation or disinformation: misinformation is fake news shared without knowledge of the facts, while disinformation is fake news spread deliberately to attract people's attention [7]. In recent times, one of the biggest issues circulating worldwide has been COVID-19 fake news, and complete knowledge of a news item can help curb its spread [8]. Sharing information is a central activity on social media. Language differences can make information transfer among people complex, but social media ease the inconvenience of sharing information around the world. Not all information shared through social media is factual, nor does it always provide useful details; users should analyze information thoroughly before sharing it on the internet. Cultural evolution (CE) helps reduce the complexity of sharing information within a cultural system [9]. Twitter and Instagram have become especially popular in recent years because they provide an easy way to share information widely, which also leads to the spread of rumors that create negative thoughts among people [10].

1.1 Novelty

Due to the spread of fake news during the COVID-19 period, the stress caused by the pandemic and the fear of spreading the disease became psychological issues in their own right, beyond the disease itself. Detecting fake news is therefore essential, and many researchers have employed various techniques to detect COVID-19 fake news.

Novel approach: A novel lightweight convolutional random forest-based honey badger (LCRF-HB) approach is proposed for fake news detection, thereby enhancing detection accuracy.

Minimized loss function: Features are selected using the honey badger (HB) optimization algorithm, which has the ability to reduce the loss function, and the lightweight convolutional random forest (LCRF) algorithm is employed for classification. The LCRF classifies the selected features while consuming less memory, and the performance rate is also improved.

The major contributions of this paper are as follows:

  • A novel technique is proposed for detecting COVID-19 fake news in three stages: data pre-processing, feature selection, and classification.

  • A novel lightweight convolutional random forest-based honey badger (LCRF-HB) algorithm is proposed for detecting the fake news of COVID-19 with a higher rate of accuracy.

  • The proposed LCRF-HB approach is compared with various approaches for analyzing the effectiveness of the system.

The remaining section of the paper is arranged as follows: In Sect. 2, various surveys are discussed. The proposed methodology of LCRF-HB is explained in Sect. 3. The experimental result is described in Sect. 4. The conclusion is explained in Sect. 5.

2 Literature survey

Al-Ahmad et al. [11] illustrated an evolution-based approach for detecting COVID-19 fake news. Particle swarm optimization (PSO), the genetic algorithm (GA), and the salp swarm algorithm (SSA) were used to reduce redundant features and to build three wrapper-based evolutionary classifiers. The Koirala dataset was used for implementation. The results indicated that the approach outperformed the other conventional classifiers, achieving an accuracy of 75.4%; however, applying the detection method to other domains required larger datasets. Paka et al. [12] established a fake news detection method to curb the spread of false information. Their Cross-SEAN (cross-stitch semi-supervised neural attention) technique exploits unlabeled data, which form the larger portion of available tweets, and a large-scale CTF dataset was used to identify fake tweets. Accuracy and F1-score were the metrics applied to assess performance; the method reached an accuracy of 0.95 and gave the best performance for real-time detection of fake tweets, although image media could not be processed.

Abdelminaam et al. [13] elaborated a deep learning technique for detecting misleading information on Twitter during the COVID-19 pandemic. Modified LSTM and modified GRU models were used to detect fake news, with the CoAID, PolitiFact, and GossipCop datasets employed for performance evaluation. The results showed a high accuracy rate in separating fake from genuine COVID-19 tweets; the drawback of the method was the lack of a multi-class stage combining context, temporal, and content features. Michail et al. [14] reviewed a novel scheme for detecting fake news in social media utilizing graph convolutional networks (GCN). The method was used for verifying profiles, fake-news-spreading messages, and the graph of participants, with the BuzzFeedNews and LIAR datasets. Fake information could be extracted from textual information in social media, and the best performance achieved an accuracy of 0.913; the remaining challenge was that multimedia fusion was not incorporated into detection.

Dong et al. [15] evaluated a two-path deep semi-supervised learning technique for real-time fake news detection. The supervised path analyzes the small amount of labeled data, while the unsupervised path exploits the large amount of unlabeled data. The PHEME and LIAR datasets were used, with accuracy, precision, F1-score, and recall as metrics. As a result, the method identifies fake news from labeled data; on the other hand, dependency analysis and sentiment analysis on NLP tasks were not included in detection. Meel and Vishwakarma [16] described a self-ensembling, semi-supervised convolutional neural network framework for detecting fake news articles. The method exploits stylometric and linguistic information from unlabeled data, and the Kaggle dataset was used for detection. An accuracy of 93.4% represented the best performance on fake articles among labeled data; meanwhile, online multimedia content was not analyzed alongside text news.

Kaliyar et al. [17] illustrated a deep learning (DL) technique based on BERT for detecting fake news in social media. The FakeBERT (Bidirectional Encoder Representations from Transformers) technique combines BERT with single-layered parallel CNN blocks. The false negative rate (FNR), false positive rate (FPR), accuracy, cross-entropy loss, and confusion matrix were utilized to evaluate performance. The approach outperformed existing methods with a high accuracy of 95.9%; the drawback is that binary and multi-class real-world datasets were not both applicable. Madani et al. [18] demonstrated an artificial intelligence technique for fake news detection during the COVID-19 pandemic, detecting new tweets through machine learning (ML), natural language processing, and deep learning (DL). Accuracy, precision, and F1-score were evaluated, and the technique performed well despite not considering new tweet features, reaching an accuracy of 79%. However, end-to-end encryption made detection difficult, especially for manipulated audio or video.

Khanday et al. [24] analyzed the detection of fake news in social media using machine learning (ML) algorithms, and Khanday et al. [25] likewise identified fake news on social media employing ML algorithms, with the decision tree (DT) achieving the greatest efficiency. Khanday et al. [26] also analyzed an LSTM model for propaganda detection on a Twitter database. An ensemble approach was explained by Khanday et al. [27] for identifying COVID-19 fake news in online social networks (OSN); AdaBoost, which can be employed to adjust the weights of the learning algorithms, attained the greatest efficiency of 95.3%.

Dixit et al. [28] developed a Levy flight honey badger-optimized convolutional neural network to detect fake news, an approach designed to address imbalanced datasets and poor feature selection. Using the ISOT dataset, the approach achieved 95% accuracy but did not scale well to very large datasets. For identifying proton-exchange membrane fuel cells, Han and Ghadimi [29] illustrated a convolutional neural network (CNN) and extreme learning machine (ELM) approach with an improved honey badger algorithm (IHBA). The IHBA enhances the integration of the CNN and ELM models toward optimal results and attains high efficiency, though the approach can be led into a local optimum.

3 Proposed methodology

The block diagram of the proposed LCRF-HB algorithm for detecting COVID-19 fake news on the Twitter platform is portrayed in Fig. 1. The method consists of three stages: data pre-processing, feature selection, and classification [19]. The input data are pre-processed by applying stemming, stop-word deletion, and tokenization. In the second stage, features are selected using the honey badger algorithm. Finally, a novel lightweight convolutional random forest-based honey badger classifier is proposed for identifying fake news. A detailed description of each phase follows.

Fig. 1
figure 1

Proposed workflow based on the identification of COVID-19 fake news

3.1 Data pre-processing phase

Data pre-processing transforms inconsistent, unstructured, and incomplete data and variables into patterns a machine can understand. In this phase, tokenization, stop-word deletion, and stemming are executed.

Stemming process: The objective of stemming is to obtain base words that carry the shared meaning of different derived words; adjectives, adverbs, nouns, and verbs are thus transformed into their source form. For example, the words consultative, consultant, consulting, and consultants all come from the source word consult.

Tokenization: Tokenization segments the original text into small pieces called tokens, and the text's punctuation is eliminated in the process. Number filters are applied to remove number terms from each sentence, case converters transform the textual data into lower case, and N-char filters remove tokens with too few characters.

Stop-word deletion: Stop-words carry little meaning on their own but are used to complete and connect sentences. English has more than 500 stop words, including pronouns, prepositions, and conjunctions, e.g., on, am, under, against, a, once, too, any. Deleting them therefore saves processing time and space.
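The three pre-processing steps above can be sketched in Python. The stop-word list and the suffix-stripping rules below are deliberately tiny illustrative stand-ins, not the actual filters used in the paper:

```python
import re

# Minimal stop-word list for illustration; a real system would use a fuller list.
STOP_WORDS = {"a", "am", "on", "too", "any", "once", "under", "against", "the", "is"}

def simple_stem(word):
    """Crude suffix stripper (an illustrative stand-in for e.g. Porter stemming)."""
    for suffix in ("ations", "ants", "ing", "ant", "ative", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    text = text.lower()                    # case conversion
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation and number terms
    tokens = text.split()                  # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word deletion
    return [simple_stem(t) for t in tokens]              # stemming

tokens = preprocess("The consultant is consulting on 5 new COVID-19 rumours!")
```

With the toy rules above, "consultant" and "consulting" both reduce to the source word "consult", mirroring the stemming example in the text.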

3.2 Feature selection phase

After the data pre-processing phase, the feature selection phase is employed. Feature selection, also referred to as attribute selection, selects the appropriate features from the dataset to attain accurate classification performance. In this paper, the honey badger algorithm (HBA) is utilized for selecting the features; a detailed description of the HB algorithm follows.

3.2.1 Honey Badger (HB) algorithm

The HBA imitates the foraging behavior of the honey badger, which locates a food source either by smelling and digging or by following the honeyguide bird [22]. Accordingly, the HBA operates in two modes, the digging mode and the honey mode: in the former, the honey badger uses its smelling ability to approximate the prey's location; in the latter, it follows the honeyguide bird. The algorithm is thus split into a digging phase and a honey phase, and its mathematical formulation is described below.

The candidate solution population in the HBA is expressed in the equation below:

$$ \left[ {\begin{array}{*{20}c} {z_{11} } & {z_{12} } & {z_{13} } & \cdots & {z_{1F} } \\ {z_{21} } & {z_{22} } & {z_{23} } & \cdots & {z_{2F} } \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ {z_{o1} } & {z_{o2} } & {z_{o3} } & \cdots & {z_{oF} } \\ \end{array} } \right] $$
(1)

Step 1: Initialization

The positions of the honey badgers are initialized as expressed in the equation below:

$$ z_{k} = {\text{LB}}_{k} + \gamma_{1} \times \left( {{\text{UB}}_{k} - {\text{LB}}_{k} } \right) $$
(2)

From Eq. (2), the position of the \(k{\text{th}}\) individual honey badger is depicted as \(z_{k}\), the upper bound is depicted as \({\text{UB}}_{k}\), the lower bound is depicted as \({\text{LB}}_{k}\), and \(\gamma_{1}\) is a random number.
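Eq. (2) is straightforward to implement; the sketch below initializes an example population with NumPy (the population size, dimensionality, and bounds are arbitrary illustrative values, not those of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_population(n_badgers, n_dims, lb, ub):
    """Eq. (2): z_k = LB_k + r1 * (UB_k - LB_k), one row per honey badger."""
    r1 = rng.random((n_badgers, n_dims))  # random numbers gamma_1 in [0, 1)
    return lb + r1 * (ub - lb)

Z = init_population(n_badgers=10, n_dims=5, lb=0.0, ub=1.0)
```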

Step 2: Solution Representation

The objective of the solution representation is to reduce both the number of chosen features and the error between the original and predicted density when distinguishing fake news data from real data. If the dataset contains \(D\) features, \(1 + D\) decision variables are assumed, covering feature selection and bandwidth identification. Each variable ranges between 0 and 1: a feature is chosen from the dataset when its variable value is greater than 0.5, and is not chosen when the value is less than 0.5.

Step 3: Fitness evaluation

The obtained solution representations are converted into binary values \([0,1]\) by the HB algorithm to represent feature selection: a solution vector dimension \(y_{j}^{{{\text{Dm}}}}\) with value '1' indicates the feature is selected, and a value '0' indicates no feature is selected from the data. The conversion is depicted by,

$$ y_{j}^{{{\text{Dm}}}} = \left\{ {\begin{array}{*{20}c} {1,} & {y_{j}^{{{\text{Dm}}}} \ge 0.5} \\ {0,} & {y_{j}^{{{\text{Dm}}}} < 0.5} \\ \end{array} } \right. $$
(3)

Subsequently, fitness is determined by,

$$ F_{{{\text{ITNESS}}}} = \varpi_{1} \cdot \left( {1 - {\text{Accuracy}}({\text{LCRF}})} \right) + \varpi_{2} \cdot \left| {\frac{{{\text{No}}{.}\,{\text{of}}\,{\text{features}}\,{\text{selected}}}}{{{\text{Total}}\,{\text{no}}{.}\,{\text{of}}\,{\text{features}}}}} \right| $$
(4)
$$ {\text{Accuracy}}\,{\text{(LCRF)}} = \frac{{C_{n} }}{{E_{n} + C_{n} }} $$
(5)

From the above equations, the error rate weight \(\varpi_{1} = [0,1]\), feature selection weight \(\varpi_{2} = 1 - \varpi_{1}\); the term \({\text{Accuracy}}\,({\text{LCRF}})\) represents the accuracy rate of the LCRF classification model; \(E_{n}\) signifies incorrectly classified samples; and \(C_{n}\) implies correctly classified samples.
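Eqs. (3)–(5) can be sketched as follows, assuming the classifier's accuracy is supplied as a plain number (training the actual LCRF is outside the scope of this snippet) and with an arbitrary example weight \(\varpi_{1} = 0.9\):

```python
import numpy as np

def binarize(solution):
    """Eq. (3): a dimension >= 0.5 means the corresponding feature is selected."""
    return (np.asarray(solution) >= 0.5).astype(int)

def fitness(solution, accuracy, w1=0.9):
    """Eq. (4) with w2 = 1 - w1: trade classification error against subset size."""
    mask = binarize(solution)
    w2 = 1.0 - w1
    ratio = mask.sum() / mask.size  # selected features / total features
    return w1 * (1.0 - accuracy) + w2 * ratio

f = fitness([0.9, 0.1, 0.7, 0.4], accuracy=0.95)
```

For the example vector, two of four features are selected, giving a fitness of 0.9 × 0.05 + 0.1 × 0.5 = 0.095; smaller values are better.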

Step 4: Defining the intensity \(\left( {\rm I} \right)\).

Intensity relates the prey's concentration strength to its distance from the \(k{\text{th}}\) honey badger, with \({\rm I}_{k}\) representing the smell intensity of the prey: when the smell intensity is high, the motion is fast, and vice versa, following an inverse-square law.

$$ \begin{aligned} {\rm I}_{k} & = \gamma_{2} \times \frac{\delta }{{4\pi f_{k}^{2} }} \\ \delta & = \left( {z_{k} - z_{k + 1} } \right)^{2} \\ f_{k} & = z_{{{\text{PREY}}}} - z_{k} \\ \end{aligned} $$
(6)

From the above equation, the concentration strength is depicted as \(\delta\), and the distance between the \(k{\text{th}}\) badger and the prey is depicted as \(f_{k}\).

Step 5: Density factor update

The density factor controls the time-varying randomization to ensure a smooth transition from exploration to exploitation. It is expressed as:

$$ \gamma = E \times {\text{EXP}}\left( {\frac{ - v}{{v_{{{\text{MAX}}}} }}} \right), $$
(7)

From the above equation, the maximum number of iterations is denoted by \(v_{{{\text{MAX}}}}\), the current iteration by \(v\), and the constant is represented by \(E\).
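Eq. (7) reduces to a one-line helper. The constant E = 2 below is an assumed example value, since the text does not fix it:

```python
import math

def density_factor(v, v_max, E=2.0):
    """Eq. (7): decays over iterations, shifting exploration toward exploitation."""
    return E * math.exp(-v / v_max)
```

The factor starts at E on the first iteration and decays monotonically, so early iterations explore widely while later ones refine.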

Step 6: Escape from local optima

The HBA utilizes a direction flag to give agents a higher opportunity to scan the search space thoroughly and escape local optima.

Step 7: Update the agent’s position

The HBA position update is split into the digging phase and the honey phase.

Step 7-1: Digging stage

In the digging stage, the honey badger moves in a cardioid shape, as expressed in the equation below:

$$ \begin{aligned} z_{{{\text{NEW}}}} & = z_{{{\text{PREY}}}} + H \times \lambda \times K \times z_{{{\text{PREY}}}} \\ & \quad + H \times \gamma_{3} \times \chi \times f_{k} \times \left| {\cos \left( {2\Pi \gamma_{4} } \right) \times \left[ {1 - \cos \left( {2\Pi \gamma_{5} } \right)} \right]} \right| \\ \end{aligned} $$
(8)

The prey position is represented by \(z_{{{\text{PREY}}}}\), and \(\gamma_{3}\), \(\gamma_{4}\), and \(\gamma_{5}\) are random numbers.

$$ H = \left\{ {\begin{array}{*{20}l} 1 \hfill & {{\text{if}}\,\gamma_{6} \le 0.5} \hfill \\ { - 1} \hfill & {{\text{else}}} \hfill \\ \end{array} } \right. $$
(9)

\(f_{k}\) is the distance defined in Eq. (6), and \(\chi\) denotes the honey badger's food-gathering ability, a factor that influences the search.

Step 7-2: Honey stage

In the honey stage, the honey badger follows the honeyguide bird to reach the beehive, as expressed in the equation below:

$$ z_{{{\text{NEW}}}} = z_{{{\text{PREY}}}} + H \times \gamma_{7} \times \chi \times f_{j} $$
(10)

The random number is represented by \(\gamma_{7}\), the prey location is indicated by \(z_{{{\text{PREY}}}}\), and the honey badger's new position is indicated by \(z_{{{\text{NEW}}}}\).
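Putting Eqs. (6)–(10) together, one position update might be sketched as follows. The intensity term \(\lambda \times K\) of Eq. (8) and the food-gathering ability \(\chi\) are folded into assumed constants here, so this is an illustrative sketch rather than the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def update_position(z, z_prey, gamma, intensity=0.5, chi=6.0):
    """One HBA position update: the digging stage (Eq. 8) or the honey stage
    (Eq. 10), chosen at random. H is the direction flag of Eq. (9); gamma is
    the density factor of Eq. (7). intensity and chi are assumed constants
    standing in for the paper's lambda*K and chi."""
    f = z_prey - z                            # distance to the prey, as in Eq. (6)
    H = 1.0 if rng.random() <= 0.5 else -1.0  # Eq. (9)
    r3, r4, r5, r7 = rng.random(4)
    if rng.random() < 0.5:                    # digging stage, Eq. (8)
        return (z_prey + H * intensity * z_prey
                + H * r3 * chi * f
                * abs(np.cos(2 * np.pi * r4) * (1 - np.cos(2 * np.pi * r5))))
    return z_prey + H * r7 * chi * f          # honey stage, Eq. (10)

z_new = update_position(0.3, 0.8, gamma=1.5)
```

In a full HBA loop, the better of the old and new positions would be kept according to the fitness of Eq. (4).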

3.3 Classification using lightweight convolutional random forest (LCRF) algorithm

The classification phase plays a major role in detecting fake news. In this paper, a novel lightweight convolutional random forest (LCRF) algorithm is employed for the optimal classification. A detailed description of each technique involved in classification is discussed below.

3.3.1 Architecture of LCNN model

The lightweight convolutional neural network (LCNN) is a network design consisting of two components: a set of minimum computing unit nodes \({\text{Nd}}\) and a set of edges \({\text{Ed}}\) connecting pairs of nodes [20]. Hence, the LCNN is represented as follows:

$$ {\text{LCNN}} = ({\text{Nd}},{\text{Ed}}) $$
(11)

The minimum computing node set is formulated as \({\text{Nd}} = \{ {\text{Nd}}_{i} \left| {i = 1,..,n\} } \right.\). A computing unit node performs a single convolutional operation or the sum of numerous convolutional operations, which establishes the diversity of convolutional operations. The edge node set is represented as \({\text{Ed}} = \{ {\text{Ed}}_{i,j} \left| {1 < i < j \le n\} } \right.\); because \(i < j\), nodes cannot connect to themselves, and the unaligned network structure contains no loops. The indices \(i,j\) of the edge set must be drawn from the node set \({\text{Nd}}\), whose index set is represented as \(N\). To establish the LCNN, the nodes in the computing unit set \({\text{Nd}}\) are reordered into \(a\) groups of \(b\) nodes each along equal-length paths.

$$ {\text{Nd}} = \{ {\text{Nd}}_{Ki} \left| {K = 1, \ldots ,a;i = 1, \ldots ,b\} } \right.,\,(n = b \times a) $$
(12)

Nodes in different groups are not adjacent to each other, and edges connecting minimum computing unit nodes are rejected with a definite disconnection probability. Hence, the edge node set \({\text{Ed}}\) is represented as follows:

$$ {\text{Ed}} = \{ {\text{Ed}}_{ij} \left| {i \in \{ (1,1),(1,2), \ldots ,(a,b)\} ,} \right. $$
(13)
$$ j \in \{ (i + 1,1), \ldots ,(i + 1,b)\} \cup \beta * d_{{{\text{rop}}}} \{ (i + 2,1), \ldots ,(a,b)\} \} $$
(14)

In the above equation, \(\beta\) denotes the disconnection probability.

3.3.2 Random forest

Random forest is an ensemble classifier in which the final classification is the class receiving the highest number of decision tree votes. In machine learning classification, clustering, and regression problems, the RF algorithm provides excellent performance and can be tuned into a suitable model with minor adjustments to its hyperparameters [21]. Different bootstrap samples are generated independently for each decision tree in the random forest, and classification errors are determined by the classification abilities of the individual trees. Furthermore, the majority vote of the decision trees maximizes the classification accuracy of the random forest.

Steps for random forest generation:

Bagging: Bagging is the first step in generating a random forest. The RF algorithm randomly draws about \({2 \mathord{\left/ {\vphantom {2 3}} \right. \kern-0pt} 3}\) of the initial training set \(Td = \left\{ {\left( {a_{1} ,b_{1} } \right),\left( {a_{2} ,b_{2} } \right), \ldots ,\left( {a_{i} ,b_{i} } \right)} \right\}\) to form the training subset for each decision tree, so that every tree in the forest is built from a different bootstrap sample. The remaining out-of-bag (OOB) data yield an error rate used to measure the classification ability of the random forest; OOB evaluation is more efficient than cross-validation. The Gini index and the OOB error method are both involved in estimating feature importance: the DT computes the error rate, while the misclassification probability \({\text{Mgi}}\) is estimated by the Gini approach.

$$ {\text{Mgi}} = \sum\limits_{cl = 1}^{cl} {P_{cl} } (1 - P_{cl} ) = 1 - \sum\limits_{cl = 1}^{cl} {P_{cl}^{2} } $$
(15)

From Eq. (15), \({\text{cl}}\) denotes the number of classes, \(M\) indicates the node, and \(p_{{{\text{cl}}}}\) represents the class probability. Based on node \(M\), the random forest importance of feature \(X_{j}\) is calculated as follows:

$$ {\text{Mvi}}_{{{\text{Mj}}}}^{{{\text{GI}}}} = {\text{Mgi}} - {\text{Lgi}} - {\text{Rgi}} $$
(16)

\({\text{Lgi}}\) and \({\text{Rgi}}\) depict the Gini indices of the left and right child nodes, respectively.
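Eqs. (15)–(16) rest on the Gini impurity of a node, which is simple to compute; a minimal helper:

```python
def gini_index(class_probs):
    """Eq. (15): Mgi = 1 - sum(p_cl^2); 0 for a pure node, larger when mixed."""
    return 1.0 - sum(p * p for p in class_probs)

pure = gini_index([1.0, 0.0])   # a node with one class only
mixed = gini_index([0.5, 0.5])  # a maximally mixed binary node
```

A pure node scores 0.0 and an evenly mixed binary node scores 0.5, which is why splits that reduce the Gini index (Eq. 16) are preferred when growing each tree.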

3.3.3 Constructing the decision tree

The random forest technique maximizes decision tree diversity by constructing various training subsets, which enhances efficiency. Each decision tree model then produces its own classification result:

$$ B = \left\{ {b_{1} \left( a \right),b_{2} \left( a \right), \ldots ,b_{i} \left( a \right)} \right\} $$
(17)

From the above equation, the combined classification model \(B\) consists of the individual decision tree models. The final classification is determined by voting over the combined models:

$$ {\text{Td}}(a) = \arg \max \sum\limits_{i = 1}^{i} {F(b_{i} } (a) = {\text{BF}}) $$
(18)

From the above equation, the single decision tree classifier is represented as \(b_{i} (a)\), \({\text{BF}}\) is the output class, \({\text{Td}}(a)\) denotes the combined classification model, and \(F(.)\) is the indicator function. Figure 2 illustrates COVID-19 fake news detection using the proposed LCRF-HB approach.
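The majority vote of Eq. (18) can be illustrated with toy threshold "trees" (hypothetical stand-ins for the forest's actual decision trees):

```python
from collections import Counter

def forest_predict(trees, x):
    """Eq. (18): each tree votes; the class with the most votes wins."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three toy "trees": hypothetical threshold classifiers on a single score.
trees = [lambda x: "fake" if x > 0.4 else "real",
         lambda x: "fake" if x > 0.6 else "real",
         lambda x: "fake" if x > 0.9 else "real"]

label = forest_predict(trees, 0.7)  # two of three trees vote "fake"
```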

Fig. 2
figure 2

LCRF-HB approach for the COVID-19 fake news detection

4 Experimental results and discussion

The proposed lightweight convolutional random forest-based honey badger (LCRF-HB) algorithm is applied to analyze COVID-19 fake news in Twitter tweets. The following sub-sections examine the accuracy of COVID-19 fake news prediction.

4.1 Hyperparameter configuration

A hyperparameter configuration is used to identify the optimal parameter values of the LCRF-HB for achieving the best performance. The LCRF-HB algorithm's hyperparameter configuration is tabulated in Table 1.

Table 1 Hyperparameter configuration

4.2 Dataset description

The Twitter dataset [23] is employed for analyzing COVID-19 fake news in tweets, with data gathered from public accounts. COVID-19-related information is identified from COVID-19-related tags. The fake news data were collected from December 2019 to June 2020 from Google and Twitter and filtered through their respective services, then assembled into the Twitter dataset as a .csv file before entering the processing stage. On Twitter, tweets carrying COVID-19 tags were gathered and stored, and each rumor was labeled by carefully analyzing the sentiment of its content and context. In addition, metadata for each tweet were collected, including the content of replies and retweet comments, counts such as retweets and likes, and the date of publication; these data were stored individually. The Google data were gathered via HTTP: results were taken from the Google results page, and absolute URLs and dates were resolved from their relative paths. Each rumor record has an authenticity label together with its content, reply records, date, reply website, etc., although not all source websites contain date information. The features of the Twitter dataset are explained in Table 2.

Table 2 Features of the Twitter dataset
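Assuming a simplified two-column layout (the real dataset in [23] carries the richer metadata described above, and the column names here are hypothetical), reading and filtering such a .csv file with the standard library could look like:

```python
import csv
import io

# In-memory stand-in for the dataset's .csv file; "text" and "label" are
# assumed column names for illustration only.
sample = io.StringIO("text,label\nMasks cause hypoxia,fake\nWHO issues guidance,real\n")

rows = list(csv.DictReader(sample))
fake = [r["text"] for r in rows if r["label"] == "fake"]
```

For a file on disk, `io.StringIO` would be replaced by `open("tweets.csv", newline="")`.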

4.3 Performance analysis

Various approaches, namely the particle swarm optimization (PSO) algorithm, the cross-stitch semi-supervised neural attention model (Cross-SEAN), the modified long short-term memory (modified LSTM), the convolutional neural network-based long short-term memory (C-LSTM), and the proposed lightweight convolutional random forest-based honey badger (LCRF-HB) algorithm, are utilized for evaluating the performance. The accuracy analysis of these approaches is illustrated in Fig. 3. Compared to the existing approaches, the proposed LCRF-HB method acquired the greatest accuracy, 98.7%, detecting COVID-19 fake news more accurately than the other approaches.

Fig. 3
figure 3

Comparative analysis of accuracy

Figure 4 illustrates the precision analysis of the PSO, modified LSTM, C-LSTM, and Cross-SEAN approaches and the proposed LCRF-HB approach. The LCRF-HB approach acquired the greatest precision of 98.3%, while the C-LSTM approach had the lowest precision rate of 88%. This comparison shows that COVID-19 fake news is detected with a high precision rate.

Fig. 4
figure 4

Comparative analysis of precision

The recall rates of the PSO, modified LSTM, C-LSTM, and Cross-SEAN methods and the proposed LCRF-HB approach are portrayed in Fig. 5. The proposed LCRF-HB approach acquired a higher recall rate, 97.6%, than the other methods; the PSO model, modified LSTM, and Cross-SEAN attained recall rates of 92%, 85%, and 88%, respectively.

Fig. 5
figure 5

Comparative analysis of recall

Figure 6 depicts the specificity analysis of the PSO design, modified LSTM approach, C-LSTM model, Cross-SEAN model, and proposed LCRF-HB approach, which attained specificity rates of 84%, 90%, 80%, 92%, and 95.4%, respectively. The proposed LCRF-HB approach attained the highest specificity for detecting COVID-19 fake news compared to the other methods.

Fig. 6
figure 6

Comparative analysis of specificity

Figure 7 illustrates the performance analysis of the proposed LCRF-HB approach with different parameters like recall, precision, accuracy, and specificity. The proposed LCRF-HB approach attained 98.7% of accuracy, 95.4% of specificity, 98.3% of precision, and 97.6% recall for COVID-19 fake news detection.

Fig. 7
figure 7

Performance analysis

Figure 8a, b depicts the training and validation accuracy and loss. The training accuracy reaches 0.8, while the validation accuracy reaches 0.7; the training loss settles at 0.4 and the validation loss at 0.2.

Fig. 8
figure 8

a Efficiency of training and validation accuracy, b efficiency of training and validation loss

Figure 9 depicts the performance of various state-of-the-art algorithms, namely FakeBERT, GCN, and ConvNet, alongside the proposed LCRF-HB approach. FakeBERT attains 95.9%, GCN attains 91.3%, and ConvNet attains 93.4%, while the proposed LCRF-HB attains a greater efficiency of 98.7%.

Fig. 9
figure 9

Performance analysis under various state-of-the-art algorithms

5 Conclusion

The major objective of this paper is COVID-19 fake news detection, carried out in three phases: data pre-processing, feature selection, and classification. In data pre-processing, tokenization, stemming, and stop-word deletion are utilized to pre-process the input data. After pre-processing, the features are selected using the honey badger algorithm, and the LCRF approach classifies COVID-19 news as real or fake. The Twitter dataset is employed for evaluating the performance of the proposed method, with recall, accuracy, specificity, and precision as the efficiency metrics. A comparative analysis is performed against the PSO design, modified LSTM approach, C-LSTM model, and Cross-SEAN model. The proposed LCRF-HB approach attained 98.3% precision, 95.4% specificity, 97.6% recall, and 98.7% accuracy for detecting COVID-19 fake news. In future research, various techniques will be integrated to further enhance the performance of COVID-19 fake news detection.