1 Introduction

In the contemporary world, fake news is considered one of the major threats to the economy, democratic institutions and journalism. Social network platforms provide a convenient channel for consumers to access, create and share diverse data (Monti et al. 2019). The usage of social media networks has increased, since numerous people seek and receive recent updates in a timely manner. On the other hand, social media gives users an opportunity to spread countless pieces of misleading and fake information, and such extensive spreading of fake information has deleterious consequences for the community (Zhou et al. 2020). First, the spreading of false information diminishes public faith in journalism and government departments; the fake information broadcast across the US during the 2016 presidential election was, ironically, spread more widely than accurate news. Second, fake information changes the way people react to legitimate and justifiable news; a survey demonstrated that public confidence in social media has degraded dramatically across various political parties and age categories. Third, unrestrained web-based fake information results in offline social incidents (Zervopoulos et al. 2020). Hence, it is necessary to restrict the spreading of fake information on mass media and thus promote confidence all over the world (Oliveira et al. 2020).

Wardle (2017), in her initial findings, classifies fake news into seven categories: parody or satire (the intention is not to harm, but it has the potential to fool), misleading content (misleading use of an individual or of information), manipulated content (genuine information is manipulated to deceive), false context (genuine content is shared with false contextual information), false connection (the headline and content do not support each other), impostor content (genuine sources are impersonated), and fabricated content (the content is one hundred percent false and designed to harm). In addition, the detection of fake news is obviously a major concern for journalists, news reporters and news industries, and the tools employed in detecting false news have turned out to be a dire requirement (Ruchansky et al. 2017). Investigating fake news manually is a challenging task. Hence, automatic detection of false information has attracted a great deal of attention in the natural language processing community, since it helps alleviate the time consumption and troublesome human endeavour of examining veracity. Even so, determining the reliability of news is a thorny issue for an autonomous system (Kaur et al. 2020). First, for better identification of false news, it is necessary to understand what other news organisations are reporting on the same topic, which is termed "stance detection". Stance detection has consistently been regarded as a significant basis for numerous other tasks, including online controversy examination and the determination of rumours on social media (Le et al. 2020).

Stances are categorized as agree, disagree, discuss and unrelated. This categorization is made in accordance with the level of agreement between a headline and the content allocated to it (Okoro et al. 2018). Furthermore, current approaches to autonomous fake news detection are generally categorized into propagation-based and content-based approaches. In the present scenario, the most common sources of information are media outlets. Individual sharing of news has grown significantly in recent years, and it has become difficult to distinguish news emanating from the dependable source where the original news is generated (Kaliyar et al. 2020). As a consequence, false news receives plenty of investigation on platforms such as Twitter, Google and Facebook. To handle the issue of spreading fake news, numerous statistical and machine learning approaches are employed. Statistical approaches model the relationships among numerous characteristics of the data and analyse patterns, whereas the classification of uncertain content uses machine learning approaches (Lara-Navarra et al. 2020; Sundararaj 2016, 2019; Sundararaj et al. 2018, 2020; Ravikumar and Kavitha 2020; Rejeesh and Thejaswini 2020; Kavitha and Ravikumar 2021; Hassan and Rashid 2020; Hassan 2020; Hassan et al. 2021; Haseena et al. 2014; Gowthul Alam and Baulkani 2019a, 2019b; Nirmal Kumar et al. 2020; Nisha and Madheswari 2016).

This paper proposes an approach comprising four different phases, namely the data pre-processing phase, the feature reduction and extraction phase, and the classification phase. For better reduction of high-dimensional features, PPCA is employed. The LSTM-LF model is employed in feature extraction and classification for optimal detection of fake news. The major contributions of the paper are as follows.

  • Utilizing four different phases, namely the data pre-processing phase, the feature reduction and extraction phase, and the classification phase, for fake news detection.

  • Employing PPCA for feature reduction and extraction, thus reducing high-dimensional features.

  • Proposing LSTM-LF for optimal detection of fake news with a high rate of accuracy.

  • Comparing our proposed approach with other existing approaches to evaluate the effectiveness of the system.

The rest of the paper is organized in the following manner. Prior literature regarding the detection of fake news is discussed in Sect. 2. In Sect. 3, the four phases, namely the data pre-processing phase, feature reduction phase, feature extraction phase and classification phase, for optimal fake news detection are described. The performance analysis and the comparative results of our proposed approach are discussed in Sect. 4. Section 5 concludes the article.

2 Review of related works

In the past few years, numerous researchers have proposed various machine learning approaches and mining techniques to determine and detect the fake news that spreads through social media. To gain better knowledge regarding the detection of fake news, various research articles are summarized in the following section.

Cui et al. (2019) proposed an explainable fake news detection (dEFEND) system to determine and detect fake news. The performance measures employed in evaluating this approach were precision, F-measure, accuracy and mean average precision. GossipCop and PolitiFact were the two datasets used in this approach. The experimental analysis revealed that the detection performance was high, but this approach failed to consider the posts and explainable comments.

The early detection of fake news using a Structure-aware Multi-head Attention Network (SMAN) was demonstrated by Yuan et al. (2020). Three datasets, namely Weibo, Twitter15 and Twitter16, were employed to evaluate fake news detection. The performance measures involved in this approach were accuracy, precision, recall and F1-measure. Moreover, the accuracy and precision values were very high when compared with other approaches. The very high execution time of this approach is considered its major drawback.

Duan et al. (2020) developed an online incremental log keyword extraction technique by employing a multi-layer dynamic particle swarm optimization algorithm along with deep LSTM networks. RMSE, MAPE, MSE and MAE were the metrics employed in simulation with respect to four datasets, namely HDFS, Hadoop, Spark and OpenStack. The rates of robustness and accuracy were high, but this approach failed to integrate log keywords in the LSTM.

The detection of fake news in social media using supervised artificial intelligence algorithms was proposed by Ozbay and Alatas (2020). BuzzFeed, random political news and ISOT fake news were the three datasets employed in this approach. In addition, accuracy, precision, recall and F1-measure were the simulation parameters employed, and a very high rate of accuracy was obtained. However, this approach failed to integrate ensemble approaches to detect fake news.

Kesarwani et al. (2020) utilized a k-nearest neighbour classifier to detect fake news on social media. True label, accuracy and F1-measure were the performance metrics employed in this approach. The dataset used here was the BuzzFeed dataset. From the experimental analysis, the results revealed that the classification accuracy was high. The very complex implementation is considered the major drawback of this approach.

A hierarchical propagation network was proposed by Shu et al. (2020) to detect and determine fake news. Accuracy, precision, recall and F1-measure were the simulation measures evaluated for the GossipCop and PolitiFact datasets. The experimental analysis revealed that the robustness of the proposed approach was high, but unsupervised fake news detection was left unaddressed.

Wang et al. (2020) developed SemSeq4FD, which integrates global semantic relationships and local sequential order to enhance text representation for fake news detection. The paper utilized datasets in two languages, English and Chinese (LUN and SLN; Weibo and RCED). Average and maximum length, accuracy, precision and recall were the performance measures employed for evaluation. This approach recognized multi-view fake news, but its flexibility was poor.

A dual-stage transformer model for COVID-19 fake news detection and fact-checking was demonstrated by Vijjali et al. (2020). Accuracy, precision and MAP were evaluated for a COVID-19 dataset. The efficiency rate was very high, but poor overall effectiveness is the major disadvantage of this approach.

Zhang et al. (2020) proposed a BERT-based domain adaptation neural network for multi-modal fake news detection. The evaluation measures employed in this approach were accuracy, precision, recall and F1-measure, and the datasets utilized were Twitter and Weibo. The fake news detection was enhanced, but this approach failed to design a probabilistic model.

A multimodal variational autoencoder for fake news detection was developed by Khattar et al. (2019). Accuracy, precision, recall and F1-measure were the performance measures employed in this approach for the Twitter and Weibo datasets. During analysis, fake news was detected accurately, but this approach failed to model the propagation of the Twitter data. A summary of the existing literature is provided in Table 1.

Table 1 Review of prior literature works

3 Proposed methodology

A news item is said to be fake if its content is verified to be false. Let us assume \(y = \left\{ {y_{1} ,y_{2} ,...,y_{N} } \right\}\) indicates a dataset containing N news items. Every news item \(k \in [1,N]\) containing i data sources is represented by \(y_{k} = \left\{ {y_{1k} ,y_{2k} ,...,y_{ik} } \right\}\). Furthermore, a class label set is closely associated with the dataset y: every news item \(y_{k} \in y\) obtains a label from the label set \(L = \left\{ {l_{1} ,l_{2} ,...,l_{M} } \right\}\), where M denotes the total number of degrees of fakeness recognized. The block diagram of the proposed approach for fake news detection is represented in Fig. 1. The proposed approach comprises four different phases, namely the data pre-processing phase, the feature reduction and extraction phase, and the classification phase. During data pre-processing, the input data is pre-processed by employing tokenization, stop-word deletion and stemming. In the second phase, the features are reduced by employing PPCA to enhance accuracy and to reduce high-dimensional data. Then the extracted features are provided to the classification phase, where the LSTM-LF algorithm is utilized to classify the news as fake or real. A detailed description of each phase is given in the following sections.

Fig. 1
figure 1

Architecture of the PPCA and Levy Flight based LSTM fake news detection system

3.1 Data pre-processing phase

Data pre-processing, in other terms, is a data mining process capable of transforming unstructured, variable, inconsistent and incomplete data into a machine-understandable pattern. Numerous tasks, namely the conversion of normal text to lowercase, deletion of stop words, stemming and tokenization, are performed in the data pre-processing phase. The following section provides a detailed description of each task involved in the pre-processing phase.


(a) Tokenization


Tokenization is the process of dividing original texts into smaller segments referred to as tokens. Punctuation is also removed from the text data by means of tokenization. To remove number terms from a particular sentence, number filters are employed. The transformation of textual data to upper and lower case utilizes case converters. Finally, words containing fewer than a given number of characters are removed using N-char filters (Mullen et al. 2018).
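As an illustration, a minimal tokenization sketch in Python is given below, using NLTK's word_tokenize (an assumed toolkit; the paper does not name one). It combines the case converter, the punctuation and number filters, and the N-char filter described above.

```python
# A minimal sketch of the tokenization step; NLTK is an assumption,
# any tokenizer with the same filters would do.
from nltk.tokenize import word_tokenize  # pip install nltk; nltk.download('punkt')

def tokenize(text, min_chars=3):
    tokens = word_tokenize(text.lower())                # case converter
    tokens = [t for t in tokens if t.isalpha()]         # punctuation/number filters
    return [t for t in tokens if len(t) >= min_chars]   # N-char filter

print(tokenize("Breaking: 3 reasons THIS headline is fake!"))
# ['breaking', 'reasons', 'this', 'headline', 'fake']
```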


(b) Deletion of stop words


Stop words are insignificant and not essential words that are nevertheless used frequently to connect expressions and complete sentences. Stop words are quite common and occur in almost every sentence while carrying little information. There are approximately five hundred stop words in English; prepositions, conjunctions and pronouns account for most of them. Examples of stop words are what, on, am, under, that, when, and, against, by, a, above, an, once, too, where, any, again, the, etc. Therefore, by deleting the stop words, space and processing time are saved (Umer et al. 2020).
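A hedged sketch of stop-word deletion follows, using NLTK's English stop-word list as a stand-in for the roughly five hundred words mentioned above.

```python
# A minimal sketch of stop-word deletion; NLTK's list is an assumption,
# any fixed stop-word list works the same way.
from nltk.corpus import stopwords  # nltk.download('stopwords')

STOP_WORDS = set(stopwords.words('english'))

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words(['breaking', 'reasons', 'this', 'headline', 'fake']))
# ['breaking', 'reasons', 'headline', 'fake']
```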


(c) Stemming process


The main intention of the stemming process is to obtain the fundamental form of words, so that diverse words with identical meaning are mapped together. During this process, various grammatical forms (noun, adjective, verb, adverb, etc.) are converted to their source form. As an illustration, the words consultant, consulting, consultative, consultants and consult are all stemmed to the word "consult". Thus the reduction of words (i.e. stemming) to a regular fundamental form is considered an effective approach (Dharmendra and Suresh 2015).
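A brief illustration of stemming with NLTK's PorterStemmer (an assumption; the paper does not specify which stemmer is used) reproduces the "consult" example:

```python
# A minimal sketch of the stemming step.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem(tokens):
    return [stemmer.stem(t) for t in tokens]

print(stem(['consultant', 'consulting', 'consultative', 'consultants', 'consult']))
# ['consult', 'consult', 'consult', 'consult', 'consult']
```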

Thus the redundant characters and terms, namely the punctuation, stop words and numbers, are filtered out in the data pre-processing phase.

3.2 Feature reduction using PPCA

Following the data pre-processing phase, the feature reduction phase is employed to reduce the dimensionality of the features. High dimensionality is a major issue arising from the data pre-processing phase, and it is necessary to eliminate redundant and unrelated features to enhance accuracy. By reducing the features, the processing time is minimized, which results in enhanced performance. On the other hand, feature reduction has substantial consequences for the result of the textual classification. Hence, to reduce the dimensionality of the features and to enhance the rate of accuracy, this paper proposes probabilistic principal component analysis (PPCA), which is discussed in the following section.


Probabilistic principal component analysis (PPCA)


The mathematical formulation, respective definitions and derivations of probabilistic principal component analysis are discussed in the subsequent section. Let us assume \(Y_{J}\) is a latent variable with a standard normal distribution (Li et al. 2020). Thus,

$$ Y_{J} \sim n\,(0,\,I_{M} ) $$
(1)

In Eq. (1), the normal distribution function and the identity matrix are represented by n and \(I_{M}\) respectively. The projection residuals \(\omega_{J}\) are likewise normally distributed. Thus,

$$ \omega_{J} \sim \,n\,(0,\,\delta^{2} I_{M} ) $$
(2)

The expectations of the latent variables conditioned on the observations, in accordance with Eqs. (1) and (2), are formulated in the subsequent equations.

$$ E\left[ {Y_{J} } \right] = \left( {w^{T} w + \delta^{2} I_{M} } \right)^{ - 1} w^{T} (t_{J} - \phi );\quad {\text{where}}\;w = \left( {w_{1} ,w_{2} ,...,w_{K} } \right) $$
(3)
$$ E\left[ {Y_{J} Y_{J}^{T} } \right] = \delta^{2} \left( {w^{T} w + \delta^{2} I_{M} } \right)^{ - 1} + E\left[ {Y_{J} } \right]\,E\left[ {Y_{J} } \right]^{T} $$
(4)

In the above equations, the expectations are taken with respect to the conditional distribution \(P\left[ {Y_{J} |t_{J} ,\,w,\,\delta^{2} } \right]\). The sample mean and the transpose operator are denoted by \(\phi\) and \(T\) respectively. The log-likelihood is then obtained by summing over all observations. Hence,

$$ L_{L} = \sum\limits_{J = 1}^{N} {\ln P\left[ {Y_{J} ,t_{J} } \right]} $$
(5)

The model parameters of the PPCA are obtained by employing the EM algorithm (Liu et al. 2020). Thus, by utilizing PPCA, the dimensionality of the features is reduced and the processing time is minimized. Feature selection is considered one of the most effective techniques for reducing high-dimensional data, and the classification process is enhanced by it. In addition, it can eliminate noisy, inappropriate and irrelevant data. It also selects a representative subset of all the data to reduce complications during the classification process. Thus, the elimination of unwanted features minimizes the computation time, thereby attaining high performance with an enhanced accuracy rate (Menaga and Revathi 2020).
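To make the EM-fitted PPCA concrete, the following numpy sketch implements Eqs. (1)-(5); the variable names W, sigma2 and mu stand in for w, \(\delta^{2}\) and \(\phi\), and the routine is an illustrative sketch, not the authors' implementation.

```python
# A minimal sketch of PPCA fitted with EM, assuming rows of t are samples.
import numpy as np

def ppca_em(t, M, n_iter=100, seed=0):
    """Reduce D-dimensional rows of t (N x D) to M latent dimensions."""
    rng = np.random.default_rng(seed)
    N, D = t.shape
    mu = t.mean(axis=0)                      # sample mean (phi)
    Tc = t - mu
    W = rng.standard_normal((D, M))          # projection matrix w
    sigma2 = 1.0                             # residual variance delta^2
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables, Eqs. (3)-(4)
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(M))
        Ez = Tc @ W @ Minv                   # E[Y_J], one row per sample
        sumEzz = N * sigma2 * Minv + Ez.T @ Ez
        # M-step: re-estimate W and sigma2 to increase the likelihood, Eq. (5)
        W = Tc.T @ Ez @ np.linalg.inv(sumEzz)
        sigma2 = (np.sum(Tc**2) - np.sum((Tc @ W) * Ez)) / (N * D)
    Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(M))
    return Tc @ W @ Minv                     # reduced features

X_reduced = ppca_em(np.random.rand(200, 50), M=10)
print(X_reduced.shape)  # (200, 10)
```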

3.3 Classification using LSTM-LF

The classification phase is one of the most significant processes involved in the detection of fake news. Fake news is detected by identifying whether the data displayed in an article is real: the bias of the written article is enumerated, and the interrelation between the headline and the body of the article is evaluated. For optimal classification of fake and real news, this paper proposes an LSTM-based LF approach, which is discussed as follows.


(a) Long Short Term Memory (LSTM)


The most prominent variant of the recurrent neural network (RNN) is the LSTM, which has accomplished successful outcomes in recent years. In the LSTM, the memory cell is the central part and consists of a gating system. This gating system is capable of judging whether information is beneficial or not. In general, every LSTM cell is composed of three major gates: the input gate, the forget gate and the output gate, as shown in Fig. 2. To recognize long-term dependencies, the LSTM utilizes a separate cell state through which the current input value is updated (Duan et al. 2020).

Fig. 2
figure 2

LSTM Architecture

The numerical expressions for the three gates are derived as follows.

$$ i(g) = \sigma \left( {w_{i} \cdot \left( {y_{T - 1} ,\,h_{T} } \right) + b_{i} } \right) $$
(6)
$$ f(g) = \sigma \left( {w_{f} \cdot \left( {y_{T - 1} ,\,h_{T} } \right) + b_{f} } \right) $$
(7)
$$ o(g) = \sigma \left( {w_{o} \cdot \left( {y_{T - 1} ,\,h_{T} } \right) + b_{o} } \right) $$
(8)

In the above equations, the input, forget and output gates are represented by \(i(g)\), \(f(g)\) and \(o(g)\) respectively. The sigmoid activation function is represented by \(\sigma\). The bias and weight terms of the three respective gates are represented by \(b_{i}\), \(b_{f}\), \(b_{o}\) and \(w_{i}\), \(w_{f}\), \(w_{o}\). The previous hidden state and the current input are represented by \(y_{T - 1}\) and \(h_{T}\). In addition, the equations for the cell state and hidden state are derived as follows.

$$ c_{i} = {\text{Tan}}H\left( {w_{c} \cdot \left( {y_{T - 1} ,\,h_{T} } \right) + b_{c} } \right) $$
(9)
$$ c_{T} = f(g) \circ c_{T - 1} + i(g) \circ c_{i} $$
(10)
$$ y_{T} = o(g) \circ {\text{Tan}}H(c_{T} ) $$
(11)

In Eqs. (9), (10) and (11), the hyperbolic tangent activation function and the weight and bias terms of the cell state are denoted by \({\text{Tan}}H\), \(w_{c}\) and \(b_{c}\) respectively; \(c_{i}\) is the candidate cell state and \(c_{T}\) the updated cell state.
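The following numpy sketch runs one LSTM cell step exactly as in Eqs. (6)-(11); the weights are random placeholders, purely for illustration.

```python
# A minimal sketch of a single LSTM cell step, Eqs. (6)-(11).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(h_t, y_prev, c_prev, W, b):
    """One time step: h_t is the input, y_prev/c_prev the previous states."""
    z = np.concatenate([y_prev, h_t])         # (y_{T-1}, h_T) concatenated
    i = sigmoid(W['i'] @ z + b['i'])          # input gate,   Eq. (6)
    f = sigmoid(W['f'] @ z + b['f'])          # forget gate,  Eq. (7)
    o = sigmoid(W['o'] @ z + b['o'])          # output gate,  Eq. (8)
    c_tilde = np.tanh(W['c'] @ z + b['c'])    # candidate cell, Eq. (9)
    c = f * c_prev + i * c_tilde              # cell state,   Eq. (10)
    y = o * np.tanh(c)                        # hidden state, Eq. (11)
    return y, c

rng = np.random.default_rng(0)
d_in, d_hid = 8, 4
W = {k: rng.standard_normal((d_hid, d_hid + d_in)) for k in 'ifoc'}
b = {k: np.zeros(d_hid) for k in 'ifoc'}
y, c = lstm_step(rng.standard_normal(d_in), np.zeros(d_hid), np.zeros(d_hid), W, b)
print(y.shape, c.shape)  # (4,) (4,)
```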


(b) Levy Flight (LF) Distribution


In general, the LF distribution is inspired by numerous physical and natural phenomena in the environment. LF demonstrates enhanced performance in searching for resources, even under uncertain conditions. Living species such as monkeys, humans, fruit flies and spiders trail their paths in a Levy-flight style. The mathematical model involved in determining the Levy flight distribution is described in the subsequent section (Houssein et al. 2020).

To generate a random walk, it is necessary to determine two different features: the step length and the direction. The step length is drawn from the Levy distribution, while the direction moves towards the target. The step length, in accordance with the Mantegna algorithm, is determined in Eq. (12).

$$ SL = \frac{A}{{\left| B \right|^{1/\delta } }};\quad {\text{where}}\;0 < \delta \le 2 $$
(12)

In the above equation, \(SL\) and \(\delta\) signify the step length and the Levy distribution index respectively.

In Eq. (12), A and B are drawn from zero-mean normal distributions:

$$ A\sim P(0,\sigma_{A}^{2} ),\,\,\,B\sim P(0,\sigma_{B}^{2} ) $$
(13)

The standard deviations of A and B are formulated in Eqs. (14) and (15).

$$ \sigma_{A} = \left\{ {\frac{{G(1 + \delta )\,\sin (\pi \delta /2)}}{{G\left[ {(1 + \delta )/2} \right]\,\delta \,2^{(\delta - 1)/2} }}} \right\}^{1/\delta } $$
(14)
$$ \sigma_{B} = 1 $$
(15)

Here, the gamma function of Z is defined in Eq. (16) as,

$$ G(Z) = \int\limits_{0}^{\infty } {T^{Z - 1} {\text{e}}^{ - T} {\text{d}}T} $$
(16)
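A small sketch of Mantegna's step-length generator, implementing Eqs. (12)-(16), is given below; delta is the Levy index \(\delta\).

```python
# A minimal sketch of Mantegna's algorithm for Levy-flight step lengths.
import numpy as np
from math import gamma, sin, pi

def levy_step(delta=1.5, size=1, rng=None):
    rng = rng or np.random.default_rng()
    sigma_A = (gamma(1 + delta) * sin(pi * delta / 2)
               / (gamma((1 + delta) / 2) * delta
                  * 2 ** ((delta - 1) / 2))) ** (1 / delta)   # Eq. (14)
    A = rng.normal(0, sigma_A, size)   # Eq. (13)
    B = rng.normal(0, 1, size)         # sigma_B = 1, Eq. (15)
    return A / np.abs(B) ** (1 / delta)  # Eq. (12)

print(levy_step(size=5))  # mostly small steps with occasional large jumps
```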

In addition to this, the LFD computes the Euclidean distance between two neighbouring search agents. Thus,

$$ E_{D} (X_{k} ,X_{l} ) = \sqrt {(x_{k} - x_{l} )^{2} + (z_{l} - z_{k} )^{2} } $$
(17)

The position coordinates of \(X_{k}\) and \(X_{l}\) are \((x_{k} ,z_{k} )\) and \((x_{l} ,z_{l} )\) respectively. Then \(E_{D}\) is compared with a threshold for each agent until all search agents have been processed. The algorithm adjusts an agent's position if the corresponding distance is less than the threshold value. Hence,

$$ X_{l} (T + 1) = L_{F} (X_{k} (T),\,X_{Lead} ,\,up_{L} ,\,lo_{L} ) $$
(18)

In Eq. (18), the index of the current iteration is represented by T. The Levy flight function, operating in accordance with the step length and direction, is represented by \(L_{F}\); \(up_{L}\) and \(lo_{L}\) signify the upper and lower limits of the search space (Zhao et al. 2020). The position of the agent with the lowest number of neighbours is represented by \(X_{Lead}\), and \(X_{l}\) moves towards this position. Otherwise, the agent is repositioned randomly within the search space:

$$ X_{l} (T + 1) = lo_{L} + (up_{L} - lo_{L} )\,R\,(\,);\,\,{\text{where}}\,\,R \to \left[ {0,1} \right] $$
(19)

The random number r and the comparative scalar value \(C_{V}\) used to decide between these updates are given by,

$$ r = R\,(\,),\,\,\,C_{V} = 0.5 $$
(20)

The exploration capability and the performance of the algorithm are enhanced by varying the solutions. Therefore, the solution update equation is obtained as Eq. (21),

$$ X_{l} (T + 1) = B_{S} + \beta_{1} \times T_{FN} + R(\,) \times \beta_{2} \times \left( {(B_{S} + \beta_{3} X_{Lead} )/2 - X_{l} (T)} \right) $$
(21)

Then the new position is computed as,

$$ X_{l}^{New} (T + 1) = L_{F} \,(X_{l} (T + 1),\,B_{S} ,\,up_{L} ,\,lo_{L} ) $$
(22)

In Eqs. (21) and (22), \(\beta_{1}\), \(\beta_{2}\) and \(\beta_{3}\) represent random numbers where \(0 < \beta_{1} ,\beta_{2} ,\beta_{3} \le 10\). The best solution found so far and the total target fitness function are represented by \(B_{S}\) and \(T_{FN}\) respectively, where

$$ T_{FN} = \sum\limits_{T = 1}^{nn} {\frac{{F_{D} \times X_{T} }}{nn}} $$
(23)

In Eq. (23), \(nn\) and \(F_{D}\) signify the total number of neighbours and the fitness degree respectively. The neighbouring positions of \(X_{l} (T)\) are represented by \(X_{T}\). The fitness degree for every neighbouring solution is derived as follows.

$$ F_{D} = \frac{{\partial_{1} (B - Min(B))}}{Max(B) - Min(B)} + \partial_{2} $$
(24)

Here B is given by Eq. (25),

$$ B = \frac{{fit(X_{T} )}}{{fit(X_{l} (T))}},\quad {\text{where}}\;\partial_{1} > 0\;{\text{and}}\;\partial_{2} \le 1 $$
(25)

The iteration process is repeated until the algorithm attains its best optimal solution. Figure 3 describes the structure of the proposed LSTM-LF approach for fake news detection.
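To summarise the search procedure, the hedged sketch below strings the neighbour-based updates of Eqs. (17)-(20) into one routine; the crowding threshold, the 0.01 step scale and the fitness function are placeholder assumptions, and levy_step is the Mantegna generator sketched earlier.

```python
# A simplified sketch of one LFD position-update pass, Eqs. (17)-(20).
import numpy as np

def lfd_update(X, fitness, threshold=0.5, lo=0.0, up=1.0, rng=None):
    rng = rng or np.random.default_rng()
    n, d = X.shape
    fits = np.array([fitness(x) for x in X])
    X_lead = X[np.argmin(fits)]                     # leader position X_Lead
    X_new = X.copy()
    for l in range(n):
        dists = np.linalg.norm(X - X[l], axis=1)    # Euclidean distances, Eq. (17)
        dists[l] = np.inf
        if dists.min() < threshold:                 # crowded: Levy-flight move
            step = levy_step(delta=1.5, size=d, rng=rng)
            X_new[l] = X[l] + 0.01 * step * (X_lead - X[l])    # Eq. (18)
        elif rng.random() < 0.5:                    # C_V = 0.5, Eq. (20)
            X_new[l] = lo + (up - lo) * rng.random(d)          # Eq. (19)
        X_new[l] = np.clip(X_new[l], lo, up)        # keep within [lo_L, up_L]
    return X_new

X = lfd_update(np.random.rand(10, 3), fitness=lambda x: np.sum(x**2))
print(X.shape)  # (10, 3)
```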

Fig. 3
figure 3

Proposed LSTM-LF approach for fake news detection

4 Experiments and discussions

This section depicts the outcomes of the experimentation and simulation of the proposed approach for fake news detection. Various experimental analyses conducted on four different datasets, namely the BuzzFeed, GossipCop, ISOT and PolitiFact datasets, are discussed. Finally, comparative analyses against various approaches are presented to determine the effectiveness of the system.

4.1 Experimental configuration

The simulation experiments for the proposed fake news detection approach were implemented on the MATLAB 2016a platform, on a machine with an Intel® Xeon® E5-2690 v2 CPU running at 3 GHz, 20 GB RAM and an NVIDIA GPU.

4.2 Parameter specifications

The parameters of the LSTM and Levy flight distribution algorithms are set as follows. The specifications and their respective value ranges are given in Table 2.

Table 2 Parameter specifications

4.3 Simulation metrics

The performance of the proposed approach is evaluated by employing various evaluation measures, namely accuracy, precision, specificity and recall. The mathematical expressions for each respective measure, in terms of fake news detection, are given in the subsequent section.

$$ {\text{Accuracy}} = \frac{{\left| {T_{N} } \right| + \left| {T_{P} } \right|}}{{\left| {T_{N} } \right| + \left| {T_{P} } \right| + \left| {F_{N} } \right| + \left| {F_{P} } \right|}} $$
(26)
$$ {\text{Precision}} = \frac{{\left| {T_{P} } \right|}}{{\left| {T_{P} } \right| + \left| {F_{P} } \right|}} $$
(27)
$$ {\text{Specificity}} = \frac{{\left| {T_{N} } \right|}}{{\left| {T_{N} } \right| + \left| {F_{P} } \right|}} $$
(28)
$$ {\text{Recall}} = \frac{{\left| {T_{P} } \right|}}{{\left| {F_{N} } \right| + \left| {T_{P} } \right|}} $$
(29)

From Eqs. (26) to (29),

True positive \(\left| {T_{P} } \right|\): The prediction is true positive, if the predicted fake news is practically counterfeit.

True negative \(\left| {T_{N} } \right|\): The prediction is true negative, if the predicted real news is practically authentic.

False positive \(\left| {F_{P} } \right|\): The prediction is false positive, if the predicted fake news is practically authentic.

False negative \(\left| {F_{N} } \right|\): The prediction is false negative, if the predicted real news is practically counterfeit.
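For completeness, a one-function sketch computing Eqs. (26)-(29) from the four counts:

```python
# A small sketch of the simulation metrics, Eqs. (26)-(29).
def metrics(tp, tn, fp, fn):
    return {
        'accuracy':    (tp + tn) / (tp + tn + fp + fn),   # Eq. (26)
        'precision':   tp / (tp + fp),                    # Eq. (27)
        'specificity': tn / (tn + fp),                    # Eq. (28)
        'recall':      tp / (tp + fn),                    # Eq. (29)
    }

print(metrics(tp=90, tn=95, fp=5, fn=10))
```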

4.4 Dataset description

This section utilizes four different datasets, namely the BuzzFeed, GossipCop, ISOT and PolitiFact datasets (Ozbay and Alatas 2020; Shu et al. 2020), for the detection of fake news. Here, 80% of each dataset is used for training and the remaining 20% is employed for validation. The training and testing dataset details regarding fake news detection are given in Table 3.

Table 3 Training and testing specifications
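A minimal sketch of the 80/20 split described above, using scikit-learn's train_test_split on placeholder data (the actual feature matrices would come from the PPCA phase):

```python
# An illustrative 80/20 train/validation split; the random arrays are
# placeholders for the reduced feature matrix and fake/real labels.
import numpy as np
from sklearn.model_selection import train_test_split

features = np.random.rand(100, 10)        # placeholder feature matrix
labels = np.random.randint(0, 2, 100)     # placeholder fake/real labels
X_train, X_val, y_train, y_val = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=42)
print(len(X_train), len(X_val))  # 80 20
```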

4.5 Dataset 1: BuzzFeed

BuzzFeed comprises two different news sets, namely fake news and real news. The BuzzFeed dataset was collected from false news articles regarding the 2016 US presidential election. The dataset comprises 1700 news articles collected from Facebook. Selected terms of the BuzzFeed dataset include political, nation, bill, party, country, democrat, etc.

4.6 Dataset 2: GossipCop

The GossipCop dataset consists of news content discussed by various specialized and proficient journalists who are experts in collecting temporal information and social content. The GossipCop dataset comprises 17,520 news items, among which 5500 are fake. The tweets, retweets and replies in this dataset number 1,060,000; 555,550 and 235,750 respectively.

4.7 Dataset 3: ISOT

The ISOT dataset consists of real and fake news acquired from various real-world sources. The dataset comprises 44,900 news items, among which 21,578 are real and the remaining 23,322 are fake.

4.8 Dataset 4: PolitiFact

Like GossipCop, the PolitiFact dataset consists of news content discussed by various specialized and proficient journalists who are experts in collecting temporal information and social content. The PolitiFact dataset comprises 700 news items, among which 450 are fake. The tweets, retweets and replies in this dataset number 278,075; 295,530 and 127,426 respectively. The detailed description of the testing and training data is given in Table 3.

4.9 Performance evaluations

This section depicts the evaluation of the performance of the proposed approach for fake news detection. In addition, graphical analyses of various performance measures, such as accuracy, precision, specificity and recall, are presented for the four datasets, namely the BuzzFeed, GossipCop, ISOT and PolitiFact datasets.

4.9.1 Confusion matrix

Table 4 describes the confusion matrix for fake news with respect to the four values, namely true positive, false positive, true negative and false negative. In the confusion matrix, each news sample is categorized as fake news or real news.

Table 4 Confusion matrix regarding fake news

4.9.2 Dataset analysis of various metrics

This section provides the results on the four datasets, namely the BuzzFeed, GossipCop, ISOT and PolitiFact datasets, for the four metrics accuracy, precision, specificity and recall. Figure 4a provides the graphical representation of the accuracy rate for the four datasets. The experimental analysis reveals that the accuracy rates obtained for the BuzzFeed, GossipCop, ISOT and PolitiFact datasets are 95%, 96%, 94% and 93% respectively.

Fig. 4
figure 4

Analysis of datasets for a accuracy, b specificity, c precision and d recall

In Fig. 4c, the graph is plotted between the precision rate and the respective datasets. The precision rates obtained for the BuzzFeed, GossipCop, ISOT and PolitiFact datasets are 91%, 94%, 89% and 90% respectively; the precision value of the ISOT dataset is comparatively low when compared with the other three datasets. Furthermore, the specificity results with respect to the four datasets are given in Fig. 4b; the specificity values obtained are 94%, 95%, 91% and 97% correspondingly, and again the value for ISOT is a bit lower than for the other datasets. Figure 4d depicts the graphical representation of the recall values for the BuzzFeed, GossipCop, ISOT and PolitiFact datasets; the recall values obtained are 89%, 88%, 87% and 90% respectively.

Figure 5 describes the overall performance for the four datasets, namely the BuzzFeed, GossipCop, ISOT and PolitiFact datasets. By employing our proposed approach, the overall performance rates obtained are 93%, 91%, 89% and 90% respectively. From the discussion, it is clear that the performance rates obtained for the BuzzFeed, GossipCop and PolitiFact datasets are almost similar, but the performance rate obtained for the ISOT dataset is comparatively low when compared with the other three datasets.

Fig. 5
figure 5

Performance rate analysis

4.9.3 Evaluation results

The evaluation results of the proposed approach with respect to four simulation metrics, namely accuracy, precision, specificity and recall, are plotted in Fig. 6. The overall accuracy of our proposed approach is 98.5%, and the specificity achieved is 97.3%. In the case of precision and recall, the rates obtained are 98% and 95% respectively. From the above analysis, it is noted that the proposed approach provides a better performance rate for all simulation metrics.

Fig. 6
figure 6

Overall performance analysis of the proposed approach

4.10 Comparative analysis

This section portrays the state-of-the-art comparison of various performance measures, namely accuracy, precision, specificity and recall, for fake news detection. Figure 7a-d provides the graphical analysis of the various simulation metrics, comparing our proposed approach with other approaches such as CNN-LSTM (Umer et al. 2020), CNN-Bidirectional LSTM (Kumar et al. 2020) and FNDNet (Kaliyar et al. 2020). The analysis reveals that the proposed approach provides accuracy, precision, specificity and recall rates of about 98.5%, 98%, 97.3% and 95% respectively. This shows that the proposed approach provides better performance when compared with other fake news detection approaches.

Fig. 7
figure 7

Comparative analysis for a accuracy, b specificity, c precision and d recall

5 Conclusion

Investigating fake news manually is a challenging task; hence, automatic detection of false information has attracted a great deal of attention in the natural language processing community, as it alleviates the time consumption and troublesome human endeavour of examining veracity. To address this shortcoming, this paper proposed an approach comprising four different phases, namely the data pre-processing phase, the feature reduction phase, the feature extraction phase and the classification phase. During data pre-processing, the input data is pre-processed by employing tokenization, stop-word deletion and stemming. In the second phase, the features are reduced by employing PPCA to enhance accuracy and to handle high-dimensional data. Then the extracted features are provided to the classification phase, where the LSTM-LF algorithm is utilized to classify the news as fake or real optimally. In addition, this paper utilized four different datasets, namely the BuzzFeed, GossipCop, ISOT and PolitiFact datasets, for evaluating our proposed approach. The evaluation of the proposed approach was conducted with respect to four simulation metrics: accuracy, precision, specificity and recall. Finally, our proposed approach was compared with other approaches such as CNN-LSTM, CNN-Bidirectional LSTM and FNDNet, and the analysis reveals that the proposed approach provides accuracy, precision, specificity and recall rates of about 98.5%, 98%, 97.3% and 95% respectively. In future work, various ensemble approaches could be integrated with optimization algorithms to boost performance and detect fake news more optimally.