1 Introduction

Social media have become a popular online medium for sharing information and communicating with people throughout the world. This growth in online activity has also enabled the creation of abundant fake news that circulates globally, and advertisements on social media sometimes carry false information. Social media use increased by 25% during the COVID-19 lockdown, and this increase was a major driver of fake news circulation. Fake news posted in tweets spread much faster than on other social networks; daily Twitter use rose by 30% during the lockdown [1]. COVID-19 spread from Wuhan, China, and news about its spread circulated through social media almost instantly. Social media give users the freedom to publish fake news without verifying real data related to COVID-19 [2]. The extensive connectivity of people on social media is the main reason fake news spreads so widely. Social networks such as Facebook and Twitter carry a great deal of fake content that people share without knowing the real facts [3]. Fake information spreads widely as traffic on social media increases, and a large volume of fake news circulated among people worldwide during the COVID-19 lockdown. During the pandemic, fake news attracted more attention than real news [4], and incorrect, useless, and harmful information was shared among people in different countries [5].

SARS-CoV-2 poses a major health challenge for the world. In the twenty-first century, the spread of fake news has become more prevalent, particularly around the US presidential election of 2016 [6]. Fake news is characterized as misinformation or disinformation: misinformation is fake news shared without knowledge of the facts, while disinformation is fake news spread deliberately to attract people's attention [7]. In recent times, one of the biggest issues circulating worldwide has been COVID-19 fake news, and complete knowledge of a news item can help curb its spread [8]. Sharing information is a central activity on social media. Language differences can make information transfer among people complex, but social media ease the inconvenience of sharing information around the world. Not all information shared through social media is factual, nor does it always provide useful details; users should analyze information thoroughly before sharing it on the internet. Cultural evolution (CE) helps reduce the complexity of sharing information within a cultural system [9]. Twitter and Instagram have become especially popular in recent years because they provide an easy way to share information widely, which also leads to the spread of rumors that create negative thoughts among people [10].

1.1 Novelty

Due to the spread of fake news during the COVID-19 period, the stress caused by the pandemic and the fear of spreading the disease became psychological issues in their own right, beyond the disease itself. Detecting fake news is therefore essential, and many researchers have employed various techniques to detect COVID-19 fake news.

Novel approach: A novel lightweight convolutional random forest-based honey badger (LCRF-HB) approach is proposed for fake news detection, thereby enhancing detection accuracy.

Minimized loss function: Features are selected using the honey badger (HB) optimization algorithm, which has the ability to reduce the loss function, and the lightweight convolutional random forest (LCRF) algorithm is employed for classification. The LCRF classifies the selected features while consuming less memory, and the performance rate is also improved.

The major contributions of this paper are as follows:

  • A novel technique is proposed for detecting COVID-19 fake news in three stages: data pre-processing, feature selection, and classification.

  • A novel lightweight convolutional random forest-based honey badger (LCRF-HB) algorithm is proposed for detecting the fake news of COVID-19 with a higher rate of accuracy.

  • The proposed LCRF-HB approach is compared with various approaches for analyzing the effectiveness of the system.

The remaining section of the paper is arranged as follows: In Sect. 2, various surveys are discussed. The proposed methodology of LCRF-HB is explained in Sect. 3. The experimental result is described in Sect. 4. The conclusion is explained in Sect. 5.

2 Literature survey

Al-Ahmad et al. [11] illustrated an evolution-based approach for detecting COVID-19 fake news. Particle swarm optimization (PSO), the genetic algorithm (GA), and the salp swarm algorithm (SSA) were used to reduce redundant features and to build three wrapper-based evolutionary classifiers. The Koirala dataset was used for implementation. The results indicated that the approach outperformed the other conventional classifiers, achieving an accuracy of 75.4%; however, applying the detection method to other domains required larger datasets. Paka et al. [12] established a fake news detection method to curb the spread of false information. Their Cross-SEAN (cross-stitch semi-supervised neural attention) technique exploits unlabeled data, which form the larger portion of available tweets, and a large-scale CTF dataset was used to identify fake tweets. Accuracy and F1-score were the metrics applied to assess performance; the method reached an accuracy of 0.95 and gave the best performance for real-time detection of fake tweets, although image media could not be processed.

Abdelminaam et al. [13] elaborated a deep learning technique for detecting misleading information on Twitter during the COVID-19 pandemic. Modified LSTM and modified GRU models were used to detect fake news, with the CoAID, PolitiFact, and GossipCop datasets employed for performance evaluation. The results showed a high accuracy rate in separating fake from genuine COVID-19 tweets; the drawback of the method was the lack of a multi-class stage combining context, temporal, and content features. Michail et al. [14] reviewed a novel scheme for detecting fake news in social media utilizing graph convolutional networks (GCN). The method was used for verifying profiles, fake-news-spreading messages, and the graph of participants, with the BuzzFeedNews and LIAR datasets. Fake information could be extracted from textual information in social media, and the best performance achieved an accuracy of 0.913; the remaining challenge was that multimedia fusion was not incorporated into detection.

Dong et al. [15] evaluated a two-path deep semi-supervised learning technique for real-time fake news detection. The supervised path analyzes the small amount of labeled data, while the unsupervised path exploits the large amount of unlabeled data. The PHEME and LIAR datasets were used, with accuracy, precision, F1-score, and recall as metrics. As a result, the method identifies fake news from labeled data; on the other hand, dependency analysis and sentiment analysis on NLP tasks were not included in detection. Meel and Vishwakarma [16] described a self-ensembling, semi-supervised convolutional neural network framework for detecting fake news articles. The method exploits stylometric and linguistic information from unlabeled data, and the Kaggle dataset was used for detection. An accuracy of 93.4% represented the best performance on fake articles among labeled data; meanwhile, online multimedia content was not analyzed alongside text news.

Kaliyar et al. [17] illustrated a deep learning (DL) technique based on BERT for detecting fake news in social media. The FakeBERT (Bidirectional Encoder Representations from Transformers) technique combines BERT with single-layered parallel CNN blocks. The false negative rate (FNR), false positive rate (FPR), accuracy, cross-entropy loss, and confusion matrix were utilized to evaluate performance. The approach outperformed existing methods with a high accuracy of 95.9%; the drawback is that binary and multi-class real-world datasets were not both applicable. Madani et al. [18] demonstrated an artificial intelligence technique for fake news detection during the COVID-19 pandemic, detecting new tweets through machine learning (ML), natural language processing, and deep learning (DL). Accuracy, precision, and F1-score were evaluated, and the technique performed well despite not considering new tweet features, reaching an accuracy of 79%. However, end-to-end encryption made detection difficult, especially for manipulated audio or video.

Khanday et al. [24] analyzed the detection of fake news in social media using machine learning (ML) algorithms, and Khanday et al. [25] likewise identified fake news on social media employing ML algorithms, with the decision tree (DT) achieving the greatest efficiency. Khanday et al. [26] also analyzed an LSTM model for propaganda detection on a Twitter database. An ensemble approach was explained by Khanday et al. [27] for identifying COVID-19 fake news in online social networks (OSN); AdaBoost, which can be employed to adjust the weights of the learning algorithms, attained the greatest efficiency of 95.3%.

Dixit et al. [28] developed a Levy flight honey badger-optimized convolutional neural network to detect fake news, an approach designed to address imbalanced datasets and poor feature selection. Using the ISOT dataset, the approach achieved 95% accuracy but did not scale well to very large datasets. For identifying proton-exchange membrane fuel cells, Han and Ghadimi [29] illustrated a convolutional neural network (CNN) and extreme learning machine (ELM) approach with an improved honey badger algorithm (IHBA). The IHBA enhances the integration of the CNN and ELM models toward optimal results and attains high efficiency, though the approach can be led into a local optimum.

3 Proposed methodology

The block diagram of the proposed LCRF-HB algorithm for detecting COVID-19 fake news on the Twitter platform is portrayed in Fig. 1. The method consists of three stages: data pre-processing, feature selection, and classification [19]. The input data are pre-processed by applying stemming, stop-word deletion, and tokenization. In the second stage, features are selected using the honey badger algorithm. Finally, a novel lightweight convolutional random forest-based honey badger classifier is proposed for identifying fake news. A detailed description of each phase follows.

Fig. 1
figure 1

Proposed workflow based on the identification of COVID-19 fake news

3.1 Data pre-processing phase

Data pre-processing transforms inconsistent, unstructured, and incomplete data and variables into patterns a machine can understand. In this phase, tokenization, stop-word deletion, and stemming are executed.

Stemming process: The objective of stemming is to obtain base words that carry the shared meaning of different derived words; adjectives, adverbs, nouns, and verbs are thus transformed into their source form. For example, the words consultative, consultant, consulting, and consultants all come from the source word consult.

Tokenization: Tokenization segments the original text into small pieces called tokens, and the text's punctuation is eliminated in the process. Number filters are applied to remove number terms from each sentence, case converters transform the textual data into lower case, and N-char filters remove tokens with too few characters.

Stop-word deletion: Stop-words carry little meaning on their own but are used to complete and connect sentences. English has more than 500 stop words, including pronouns, prepositions, and conjunctions, e.g., on, am, under, against, a, once, too, any. Deleting them therefore saves processing time and space.
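The three pre-processing steps above can be sketched in Python. The stop-word list and the suffix-stripping rules below are deliberately tiny illustrative stand-ins, not the actual filters used in the paper:

```python
import re

# Minimal stop-word list for illustration; a real system would use a fuller list.
STOP_WORDS = {"a", "am", "on", "too", "any", "once", "under", "against", "the", "is"}

def simple_stem(word):
    """Crude suffix stripper (an illustrative stand-in for e.g. Porter stemming)."""
    for suffix in ("ations", "ants", "ing", "ant", "ative", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    text = text.lower()                    # case conversion
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation and number terms
    tokens = text.split()                  # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word deletion
    return [simple_stem(t) for t in tokens]              # stemming

tokens = preprocess("The consultant is consulting on 5 new COVID-19 rumours!")
```

With the toy rules above, "consultant" and "consulting" both reduce to the source word "consult", mirroring the stemming example in the text.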

3.2 Feature selection phase

After the data pre-processing phase, the feature selection phase is employed. Feature selection, also referred to as attribute selection, selects the appropriate features from the dataset to attain accurate classification performance. In this paper, the honey badger algorithm (HBA) is utilized for selecting the features; a detailed description of the HB algorithm follows.

3.2.1 Honey Badger (HB) algorithm

The HBA imitates the foraging behavior of the honey badger, which locates a food source either by smelling and digging or by following the honeyguide bird [22]. Accordingly, the HBA operates in two modes, the digging mode and the honey mode: in the former, the honey badger uses its smelling ability to approximate the prey's location; in the latter, it follows the honeyguide bird. The algorithm is thus split into a digging phase and a honey phase, and its mathematical formulation is described below.

The candidate solution population in the HBA is expressed in the equation below:

$$ \left[ {\begin{array}{*{20}c} {z_{11} } & {z_{12} } & {z_{13} } & \cdots & {z_{1F} } \\ {z_{21} } & {z_{22} } & {z_{23} } & \cdots & {z_{2F} } \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ {z_{o1} } & {z_{o2} } & {z_{o3} } & \cdots & {z_{oF} } \\ \end{array} } \right] $$
(1)

Step 1: Initialization

The positions of the honey badgers are initialized as expressed in the equation below:

$$ z_{k} = {\text{LB}}_{k} + \gamma_{1} \times \left( {{\text{UB}}_{k} - {\text{LB}}_{k} } \right) $$
(2)

From Eq. (2), the position of the \(k{\text{th}}\) individual honey badger is depicted as \(z_{k}\), the upper bound is depicted as \({\text{UB}}_{k}\), the lower bound is depicted as \({\text{LB}}_{k}\), and \(\gamma_{1}\) is a random number.
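Eq. (2) is straightforward to implement; the sketch below initializes an example population with NumPy (the population size, dimensionality, and bounds are arbitrary illustrative values, not those of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_population(n_badgers, n_dims, lb, ub):
    """Eq. (2): z_k = LB_k + r1 * (UB_k - LB_k), one row per honey badger."""
    r1 = rng.random((n_badgers, n_dims))  # random numbers gamma_1 in [0, 1)
    return lb + r1 * (ub - lb)

Z = init_population(n_badgers=10, n_dims=5, lb=0.0, ub=1.0)
```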

Step 2: Solution Representation

The objective of the solution representation is to reduce both the number of chosen features and the error between the original and predicted density when distinguishing fake news data from real data. If the dataset contains \(D\) features, \(1 + D\) decision variables are assumed, covering feature selection and bandwidth identification. Each variable ranges between 0 and 1: a feature is chosen from the dataset when its variable value is greater than 0.5, and is not chosen when the value is less than 0.5.

Step 3: Fitness evaluation

The obtained solution representations are converted into binary values \([0,1]\) by the HB algorithm to represent feature selection: a solution vector dimension \(y_{j}^{{{\text{Dm}}}}\) with value '1' indicates the feature is selected, and a value '0' indicates no feature is selected from the data. The conversion is depicted by,

$$ y_{j}^{{{\text{Dm}}}} = \left\{ {\begin{array}{*{20}c} {1,} & {y_{j}^{{{\text{Dm}}}} \ge 0.5} \\ {0,} & {y_{j}^{{{\text{Dm}}}} < 0.5} \\ \end{array} } \right. $$
(3)

Subsequently, fitness is determined by,

$$ F_{{{\text{ITNESS}}}} = \varpi_{1} \cdot \left( {1 - {\text{Accuracy}}({\text{LCRF}})} \right) + \varpi_{2} \cdot \left| {\frac{{{\text{No}}{.}\,{\text{of}}\,{\text{features}}\,{\text{selected}}}}{{{\text{Total}}\,{\text{no}}{.}\,{\text{of}}\,{\text{features}}}}} \right| $$
(4)
$$ {\text{Accuracy}}\,{\text{(LCRF)}} = \frac{{C_{n} }}{{E_{n} + C_{n} }} $$
(5)

From the above equations, the error rate weight \(\varpi_{1} = [0,1]\), feature selection weight \(\varpi_{2} = 1 - \varpi_{1}\); the term \({\text{Accuracy}}\,({\text{LCRF}})\) represents the accuracy rate of the LCRF classification model; \(E_{n}\) signifies incorrectly classified samples; and \(C_{n}\) implies correctly classified samples.
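Eqs. (3)–(5) can be sketched as follows, assuming the classifier's accuracy is supplied as a plain number (training the actual LCRF is outside the scope of this snippet) and with an arbitrary example weight \(\varpi_{1} = 0.9\):

```python
import numpy as np

def binarize(solution):
    """Eq. (3): a dimension >= 0.5 means the corresponding feature is selected."""
    return (np.asarray(solution) >= 0.5).astype(int)

def fitness(solution, accuracy, w1=0.9):
    """Eq. (4) with w2 = 1 - w1: trade classification error against subset size."""
    mask = binarize(solution)
    w2 = 1.0 - w1
    ratio = mask.sum() / mask.size  # selected features / total features
    return w1 * (1.0 - accuracy) + w2 * ratio

f = fitness([0.9, 0.1, 0.7, 0.4], accuracy=0.95)
```

For the example vector, two of four features are selected, giving a fitness of 0.9 × 0.05 + 0.1 × 0.5 = 0.095; smaller values are better.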

Step 4: Defining the intensity \(\left( {\rm I} \right)\).

Intensity relates the prey's concentration strength to its distance from the \(k{\text{th}}\) honey badger, with \({\rm I}_{k}\) representing the smell intensity of the prey: when the smell intensity is high, the motion is fast, and vice versa, following an inverse-square law.

$$ \begin{aligned} {\rm I}_{k} & = \gamma_{2} \times \frac{\delta }{{4\pi f_{k}^{2} }} \\ \delta & = \left( {z_{k} - z_{k + 1} } \right)^{2} \\ f_{k} & = z_{{{\text{PREY}}}} - z_{k} \\ \end{aligned} $$
(6)

From the above equation, the concentration strength is depicted as \(\delta\), and the distance between the \(k{\text{th}}\) badger and the prey is depicted as \(f_{k}\).

Step 5: Density factor update

The density factor controls the time-varying randomization to ensure a smooth transition from exploration to exploitation. It is expressed as:

$$ \gamma = E \times {\text{EXP}}\left( {\frac{ - v}{{v_{{{\text{MAX}}}} }}} \right), $$
(7)

From the above equation, the maximum number of iterations is denoted by \(v_{{{\text{MAX}}}}\), the current iteration by \(v\), and the constant is represented by \(E\).
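Eq. (7) reduces to a one-line helper. The constant E = 2 below is an assumed example value, since the text does not fix it:

```python
import math

def density_factor(v, v_max, E=2.0):
    """Eq. (7): decays over iterations, shifting exploration toward exploitation."""
    return E * math.exp(-v / v_max)
```

The factor starts at E on the first iteration and decays monotonically, so early iterations explore widely while later ones refine.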

Step 6: Escape from local optima

The HBA utilizes a direction flag to give agents a higher opportunity to scan the search space thoroughly and escape local optima.

Step 7: Update the agent’s position

The HBA position update is split into the digging phase and the honey phase.

Step 7-1: Digging stage

In the digging stage, the honey badger moves in a cardioid shape, as expressed in the equation below:

$$ \begin{aligned} z_{{{\text{NEW}}}} & = z_{{{\text{PREY}}}} + H \times \lambda \times K \times z_{{{\text{PREY}}}} \\ & \quad + H \times \gamma_{3} \times \chi \times f_{k} \times \left| {\cos \left( {2\Pi \gamma_{4} } \right) \times \left[ {1 - \cos \left( {2\Pi \gamma_{5} } \right)} \right]} \right| \\ \end{aligned} $$
(8)

The prey position is represented by \(z_{{{\text{PREY}}}}\), and \(\gamma_{3}\), \(\gamma_{4}\), and \(\gamma_{5}\) are random numbers.

$$ H = \left\{ {\begin{array}{*{20}l} 1 \hfill & {{\text{if}}\,\gamma_{6} \le 0.5} \hfill \\ { - 1} \hfill & {{\text{else}}} \hfill \\ \end{array} } \right. $$
(9)

\(f_{k}\) is the distance defined in Eq. (6), and \(\chi\) denotes the honey badger's food-gathering ability, a factor that influences the search.

Step 7-2: Honey stage

In the honey stage, the honey badger follows the honeyguide bird to reach the beehive, as expressed in the equation below:

$$ z_{{{\text{NEW}}}} = z_{{{\text{PREY}}}} + H \times \gamma_{7} \times \chi \times f_{j} $$
(10)

The random number is represented by \(\gamma_{7}\), the prey location is indicated by \(z_{{{\text{PREY}}}}\), and the honey badger's new position is indicated by \(z_{{{\text{NEW}}}}\).
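Putting Eqs. (6)–(10) together, one position update might be sketched as follows. The intensity term \(\lambda \times K\) of Eq. (8) and the food-gathering ability \(\chi\) are folded into assumed constants here, so this is an illustrative sketch rather than the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def update_position(z, z_prey, gamma, intensity=0.5, chi=6.0):
    """One HBA position update: the digging stage (Eq. 8) or the honey stage
    (Eq. 10), chosen at random. H is the direction flag of Eq. (9); gamma is
    the density factor of Eq. (7). intensity and chi are assumed constants
    standing in for the paper's lambda*K and chi."""
    f = z_prey - z                            # distance to the prey, as in Eq. (6)
    H = 1.0 if rng.random() <= 0.5 else -1.0  # Eq. (9)
    r3, r4, r5, r7 = rng.random(4)
    if rng.random() < 0.5:                    # digging stage, Eq. (8)
        return (z_prey + H * intensity * z_prey
                + H * r3 * chi * f
                * abs(np.cos(2 * np.pi * r4) * (1 - np.cos(2 * np.pi * r5))))
    return z_prey + H * r7 * chi * f          # honey stage, Eq. (10)

z_new = update_position(0.3, 0.8, gamma=1.5)
```

In a full HBA loop, the better of the old and new positions would be kept according to the fitness of Eq. (4).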

3.3 Classification using lightweight convolutional random forest (LCRF) algorithm

The classification phase plays a major role in detecting fake news. In this paper, a novel lightweight convolutional random forest (LCRF) algorithm is employed for the optimal classification. A detailed description of each technique involved in classification is discussed below.

3.3.1 Architecture of LCNN model

The lightweight convolutional neural network (LCNN) is a network design consisting of two components: a set of minimum computing unit nodes \({\text{Nd}}\) and a set of edges \({\text{Ed}}\) connecting pairs of nodes [20]. Hence, the LCNN is represented as follows:

$$ {\text{LCNN}} = ({\text{Nd}},{\text{Ed}}) $$
(11)

The minimum computing node set is formulated as \({\text{Nd}} = \{ {\text{Nd}}_{i} \left| {i = 1,..,n\} } \right.\). A computing unit node performs a single convolutional operation or the sum of numerous convolutional operations, which establishes the diversity of convolutional operations. The edge node set is represented as \({\text{Ed}} = \{ {\text{Ed}}_{i,j} \left| {1 < i < j \le n\} } \right.\); because \(i < j\), nodes cannot connect to themselves, and the unaligned network structure contains no loops. The indices \(i,j\) of the edge set must be drawn from the node set \({\text{Nd}}\), whose index set is represented as \(N\). To establish the LCNN, the nodes in the computing unit set \({\text{Nd}}\) are reordered into \(a\) groups of \(b\) nodes each along equal-length paths.

$$ {\text{Nd}} = \{ {\text{Nd}}_{Ki} \left| {K = 1, \ldots ,a;i = 1, \ldots ,b\} } \right.,\,(n = b \times a) $$
(12)

Nodes in different groups are not adjacent to each other, and edges connecting minimum computing unit nodes are rejected with a definite disconnection probability. Hence, the edge node set \({\text{Ed}}\) is represented as follows:

$$ {\text{Ed}} = \{ {\text{Ed}}_{ij} \left| {i \in \{ (1,1),(1,2), \ldots ,(a,b)\} ,} \right. $$
(13)
$$ j \in \{ (i + 1,1), \ldots ,(i + 1,b)\} \cup \beta * d_{{{\text{rop}}}} \{ (i + 2,1), \ldots ,(a,b)\} \} $$
(14)

In the above equation, \(\beta\) denotes the disconnection probability.

3.3.2 Random forest

Random forest is an ensemble classifier in which the final classification is the class receiving the highest number of decision tree votes. In machine learning classification, clustering, and regression problems, the RF algorithm provides excellent performance and can be tuned into a suitable model with minor adjustments to its hyperparameters [21]. Different bootstrap samples are generated independently for each decision tree in the random forest, and classification errors are determined by the classification abilities of the individual trees. Furthermore, the majority vote of the decision trees maximizes the classification accuracy of the random forest.

Steps for random forest generation:

Bagging: Bagging is the first step in generating a random forest. The RF algorithm randomly draws about \({2 \mathord{\left/ {\vphantom {2 3}} \right. \kern-0pt} 3}\) of the initial training set \(Td = \left\{ {\left( {a_{1} ,b_{1} } \right),\left( {a_{2} ,b_{2} } \right), \ldots ,\left( {a_{i} ,b_{i} } \right)} \right\}\) to form the training subset for each decision tree, so that every tree in the forest is built from a different bootstrap sample. The remaining out-of-bag (OOB) data yield an error rate used to measure the classification ability of the random forest; OOB evaluation is more efficient than cross-validation. The Gini index and the OOB error method are both involved in estimating feature importance: the DT computes the error rate, while the misclassification probability \({\text{Mgi}}\) is estimated by the Gini approach.

$$ {\text{Mgi}} = \sum\limits_{cl = 1}^{cl} {P_{cl} } (1 - P_{cl} ) = 1 - \sum\limits_{cl = 1}^{cl} {P_{cl}^{2} } $$
(15)

From Eq. (15), \({\text{cl}}\) denotes the number of classes, \(M\) indicates the node, and \(p_{{{\text{cl}}}}\) represents the class probability. Based on node \(M\), the random forest importance of feature \(X_{j}\) is calculated as follows:

$$ {\text{Mvi}}_{{{\text{Mj}}}}^{{{\text{GI}}}} = {\text{Mgi}} - {\text{Lgi}} - {\text{Rgi}} $$
(16)

\({\text{Lgi}}\) and \({\text{Rgi}}\) depict the Gini indices of the left and right child nodes, respectively.
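Eqs. (15)–(16) rest on the Gini impurity of a node, which is simple to compute; a minimal helper:

```python
def gini_index(class_probs):
    """Eq. (15): Mgi = 1 - sum(p_cl^2); 0 for a pure node, larger when mixed."""
    return 1.0 - sum(p * p for p in class_probs)

pure = gini_index([1.0, 0.0])   # a node with one class only
mixed = gini_index([0.5, 0.5])  # a maximally mixed binary node
```

A pure node scores 0.0 and an evenly mixed binary node scores 0.5, which is why splits that reduce the Gini index (Eq. 16) are preferred when growing each tree.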

3.3.3 Constructing the decision tree

The random forest technique maximizes decision tree diversity by constructing various training subsets, which enhances efficiency. Each decision tree model then produces its own classification result:

$$ B = \left\{ {b_{1} \left( a \right),b_{2} \left( a \right), \ldots ,b_{i} \left( a \right)} \right\} $$
(17)

From the above equation, the combined classification model \(B\) consists of the individual decision tree models. The final classification is determined by voting over the combined models:

$$ {\text{Td}}(a) = \arg \max \sum\limits_{i = 1}^{i} {F(b_{i} } (a) = {\text{BF}}) $$
(18)

From the above equation, the single decision tree classifier is represented as \(b_{i} (a)\), \({\text{BF}}\) is the output class, \({\text{Td}}(a)\) denotes the combined classification model, and \(F(.)\) is the indicator function. Figure 2 illustrates COVID-19 fake news detection using the proposed LCRF-HB approach.
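The majority vote of Eq. (18) can be illustrated with toy threshold "trees" (hypothetical stand-ins for the forest's actual decision trees):

```python
from collections import Counter

def forest_predict(trees, x):
    """Eq. (18): each tree votes; the class with the most votes wins."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three toy "trees": hypothetical threshold classifiers on a single score.
trees = [lambda x: "fake" if x > 0.4 else "real",
         lambda x: "fake" if x > 0.6 else "real",
         lambda x: "fake" if x > 0.9 else "real"]

label = forest_predict(trees, 0.7)  # two of three trees vote "fake"
```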

Fig. 2
figure 2

LCRF-HB approach for the COVID-19 fake news detection

4 Experimental results and discussion

The proposed lightweight convolutional random forest-based honey badger (LCRF-HB) algorithm is applied to analyze COVID-19 fake news in Twitter tweets. The following sub-sections examine the accuracy of COVID-19 fake news prediction.

4.1 Hyperparameter configuration

A hyperparameter configuration is used to identify the optimal parameter values of the LCRF-HB for achieving the best performance. The LCRF-HB algorithm's hyperparameter configuration is tabulated in Table 1.

Table 1 Hyperparameter configuration

4.2 Dataset description

The Twitter dataset [23] is employed for analyzing COVID-19 fake news in tweets, with data gathered from public accounts. COVID-19-related information is identified from COVID-19-related tags. The fake news data were collected from December 2019 to June 2020 from Google and Twitter and filtered through their respective services, then assembled into the Twitter dataset as a .csv file before entering the processing stage. On Twitter, tweets carrying COVID-19 tags were gathered and stored, and each rumor was labeled by carefully analyzing the sentiment of its content and context. In addition, metadata for each tweet were collected, including the content of replies and retweet comments, counts such as retweets and likes, and the date of publication; these data were stored individually. The Google data were gathered via HTTP: results were taken from the Google results page, and absolute URLs and dates were resolved from their relative paths. Each rumor record has an authenticity label together with its content, reply records, date, reply website, etc., although not all source websites contain date information. The features of the Twitter dataset are explained in Table 2.

Table 2 Features of the Twitter dataset
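Assuming a simplified two-column layout (the real dataset in [23] carries the richer metadata described above, and the column names here are hypothetical), reading and filtering such a .csv file with the standard library could look like:

```python
import csv
import io

# In-memory stand-in for the dataset's .csv file; "text" and "label" are
# assumed column names for illustration only.
sample = io.StringIO("text,label\nMasks cause hypoxia,fake\nWHO issues guidance,real\n")

rows = list(csv.DictReader(sample))
fake = [r["text"] for r in rows if r["label"] == "fake"]
```

For a file on disk, `io.StringIO` would be replaced by `open("tweets.csv", newline="")`.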

4.3 Performance analysis

Various approaches, namely the particle swarm optimization (PSO) algorithm, the cross-stitch semi-supervised neural attention model (Cross-SEAN), the modified long short-term memory (modified LSTM), the convolutional neural network-based long short-term memory (C-LSTM), and the proposed lightweight convolutional random forest-based honey badger (LCRF-HB) algorithm, are utilized for evaluating the performance. The accuracy analysis of these approaches is illustrated in Fig. 3. Compared to the existing approaches, the proposed LCRF-HB method acquired the greatest accuracy, 98.7%, detecting COVID-19 fake news more accurately than the other approaches.

Fig. 3
figure 3

Comparative analysis of accuracy

Figure 4 illustrates the precision analysis of the PSO, modified LSTM, C-LSTM, and Cross-SEAN approaches and the proposed LCRF-HB approach. The LCRF-HB approach acquired the greatest precision of 98.3%, while the C-LSTM approach had the lowest precision rate of 88%. This comparison shows that COVID-19 fake news is detected with a high precision rate.

Fig. 4
figure 4

Comparative analysis of precision

The recall rates of the PSO, modified LSTM, C-LSTM, and Cross-SEAN methods and the proposed LCRF-HB approach are portrayed in Fig. 5. The proposed LCRF-HB approach acquired a higher recall rate, 97.6%, than the other methods; the PSO model, modified LSTM, and Cross-SEAN attained recall rates of 92%, 85%, and 88%, respectively.

Fig. 5
figure 5

Comparative analysis of recall

Figure 6 depicts the specificity analysis of the PSO design, modified LSTM approach, C-LSTM model, Cross-SEAN model, and proposed LCRF-HB approach, which attained specificity rates of 84%, 90%, 80%, 92%, and 95.4%, respectively. The proposed LCRF-HB approach attained the highest specificity for detecting COVID-19 fake news compared to the other methods.

Fig. 6
figure 6

Comparative analysis of specificity

Figure 7 illustrates the performance analysis of the proposed LCRF-HB approach with different parameters like recall, precision, accuracy, and specificity. The proposed LCRF-HB approach attained 98.7% of accuracy, 95.4% of specificity, 98.3% of precision, and 97.6% recall for COVID-19 fake news detection.

Fig. 7
figure 7

Performance analysis

Figure 8a, b depicts the training and validation accuracy and loss. The training accuracy reaches 0.8, while the validation accuracy reaches 0.7; the training loss settles at 0.4 and the validation loss at 0.2.

Fig. 8
figure 8

a Efficiency of training and validation accuracy, b efficiency of training and validation loss

Figure 9 depicts the performance of various state-of-the-art algorithms, namely FakeBERT, GCN, and ConvNet, alongside the proposed LCRF-HB approach. FakeBERT attains 95.9%, GCN attains 91.3%, and ConvNet attains 93.4%, while the proposed LCRF-HB attains a greater efficiency of 98.7%.

Fig. 9
figure 9

Performance analysis under various state-of-the-art algorithms

5 Conclusion

The major objective of this paper is COVID-19 fake news detection, carried out in three phases: data pre-processing, feature selection, and classification. In data pre-processing, tokenization, stemming, and stop-word deletion are utilized to pre-process the input data. After pre-processing, the features are selected using the honey badger algorithm, and the LCRF approach classifies COVID-19 news as real or fake. The Twitter dataset is employed for evaluating the performance of the proposed method, with recall, accuracy, specificity, and precision as the efficiency metrics. A comparative analysis is performed against the PSO design, modified LSTM approach, C-LSTM model, and Cross-SEAN model. The proposed LCRF-HB approach attained 98.3% precision, 95.4% specificity, 97.6% recall, and 98.7% accuracy for detecting COVID-19 fake news. In future research, various techniques will be integrated to further enhance the performance of COVID-19 fake news detection.