1 Introduction

The Internet and e-commerce sectors are growing rapidly. With this growth, the number of online reviews is increasing, and so is our reliance on them. Some instances in which we rely on online reviews are:

  1. When buying something from an online retail website, we look at the product reviews and then at the seller reviews.

  2. When buying business software, reviews on different websites are inspected.

  3. Online reviews are also consulted to decide whether or not to watch a movie.

Online reviews have become an essential part of our lives. According to an experiment conducted by Lackermair et al. [1] on 104 German online shoppers, 74.04% of the participants rated online reviews as “important or very important” and 85.57% of the participants claimed that they read reviews “often or very often” before purchasing a product. Presently, e-commerce websites such as Amazon and Flipkart provide an option for writing a review of a particular product. Reviewers can write whatever they feel about the product, which may influence a buyer’s decision. Hence, these reviews may either improve or degrade a product’s reputation and sales. Thus, spam review detection becomes a necessity.

Dixit et al. [2] categorized spam reviews into three classes, namely untruthful reviews, reviews on brands, and non-reviews. Untruthful reviews are completely fake, while reviews on brands target a brand or a seller but do not focus on the product. Non-reviews contain unrelated text or advertisements. Untruthful reviews are the hardest to detect because of their structure. A genuine review (Review 1) and an untruthful review (Review 2) are given below.

Review 1: Great hotel in heart of Chicago for business or pleasure. Rooms are recently upgraded and very modern and large. Flat screen TVs, marble baths, all rooms are suites, great desk, kitchenette, comfortable bed, free wireless Internet... everything you could ask for. Location is easy walk to Magnificent Mile and lots of great restaurants. Staff is friendly and helpful. Short cab ride to Loop.

Review 2: What a terrible experience my family and I had at Affinia Chicago! First of all, we reserved a room with 2 queen size beds and received only 1 King size bed with a cot. When we got to the room, we found hair balls on the floor as if a cat had previously stayed there. What an absolute terror Affinia was and I will never be going back!

It is very difficult for a user to recognize that Review 1 is a genuine review whereas Review 2 is not. Therefore, many baseline methods, such as bag of words and n-grams, have been proposed to identify fake/spam reviews. Bag-of-words-based spam detection methods use individual words as features for spam review classification. Since these methods generally ignore the semantics of words, they are not very effective in review classification. Some researchers have used lexical and syntactical features for spam detection [3,4,5], while Ott et al. [6, 7] and Lin et al. [8] have used unigram-based techniques for fake review detection.
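The limitation of bag-of-words features noted above can be illustrated with a minimal sketch. The two example sentences and the helper name `bag_of_words` are hypothetical and are not taken from the cited works; the sketch only shows that two reviews with opposite meanings can map to the same word counts.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count word occurrences; word order (and hence much of the meaning) is discarded."""
    return Counter(text.lower().split())


# Two hypothetical reviews with opposite meanings.
positive = "the room was clean not dirty"
negative = "the room was dirty not clean"

# Both reviews yield identical feature counts, so a classifier that only
# sees these counts cannot tell them apart.
print(bag_of_words(positive) == bag_of_words(negative))
```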

Furthermore, supervised, unsupervised, and semi-supervised machine learning techniques have also been used for spam review detection. Cheng et al. [9] presented a case study and compared various methods used for detecting fake reviews. Munzel [10] presented various contextual cues that help Internet users to distinguish fake from genuine reviews. Narayan et al. [11] introduced a spam review detection method based on opinion mining and a supervised learning approach. Petrescu et al. [12] studied the evolution and outcomes of incentivized review campaigns and found that these campaigns influence users to post positive reviews of the product. Luca and Zervas [13] used two complementary approaches on Yelp datasets and found that only 16% of restaurant reviews on Yelp are filtered. Gieseke et al. [14] used an efficient recurrent local search policy for unsupervised and semi-supervised models to handle binary classification problems. Further, Behdad et al. [15] investigated the fraud detection problem and how machine learning models can be applied to it. Mani et al. [16] combined multiple classifiers to identify spam reviews. Ghai et al. [17] introduced a spam detection method based on the rating variation score, caps count score, and reviewer’s count score. Heydari et al. [18] examined the doubtful time intervals acquired from the time series of reviews to overcome the rating variation of the reviewers. Liu and Pang [19] introduced an unsupervised framework based on aspect-based review deviation for detecting spamicity. Most spam detection models use hand-crafted features, which cannot reveal the semantics of reviews. Therefore, a neural network-based model has been proposed to learn the semantic representation of reviews [20]. Hu et al. [21] introduced a multi-text summarization approach that uses k-medoids clustering to discover the top-k most significant reviews. Hai et al. [22] used a logistic regression-based multi-task learning method (MTL-LR) followed by a semi-supervised multi-task Laplacian regularized logistic regression method to enhance the performance of the spam detection model.

Moreover, Mateen et al. [23] introduced a hybrid method that uses content-based and graph-based features to identify spam on the Twitter platform. Vishwarupe et al. [24] used a novel feature to enhance the classification model for spammer detection in a Twitter dataset. Sedhai and Sun [25] proposed a semi-supervised spam detection (S3D) scheme for spam detection in Twitter datasets. To study the class imbalance issue in Twitter, Li and Liu [20] surveyed some popular methods and identified the most effective one. Chen et al. [26] performed a deep analysis of the statistical features of tweets to identify spam tweets. Wu et al. [27] surveyed and compared different methods used for spammer detection in tweets. Singh and Singh combined the strengths of particle swarm optimization (PSO) and the correlation-based feature selection technique (CFS) [28] for web spam detection. Li et al. [29] used synthetic minority over-sampling and a de-noising auto-encoder in deep belief networks for the classification of web spam. Singh and Batra [30] proposed an ensemble-based spam detection method in which a quotient filter and locality-sensitive hashing are used for efficient searching and similarity searching, respectively. Wei and Singh [31] discussed current challenges and some future directions for effective surveillance of Twitter data. Bindu et al. [32] proposed an unsupervised method that uses community-based features together with graph and URL characteristics of user accounts for spam detection on Twitter. Liu et al. [33] introduced a hybrid method based on fuzzy redistribution and asymmetric sampling to detect spammer tweets. Inuwa-Dutse et al. [34] used account information features to discover spam-posting accounts on Twitter. Miller et al. [35] used two stream clustering methods, namely StreamKM++ and DenStream, to identify spammer tweets. Singh et al. [36] designed a model to detect and block fake reviews and spam. Narayan et al. [37] introduced a semi-supervised PU-learning-based method for review spam detection.

Recently, metaheuristic algorithms have also been used for spam classification. Salehi et al. [38] introduced a genetic algorithm-based approach for email spam detection. Idris et al. [39] used differential evolution [40] and the negative selection algorithm to detect spam email. A combined approach based on negative selection and particle swarm optimization (PSO) [41] has been used for email spam detection [42]; however, it is sometimes trapped in a local solution and takes more time to stabilize. Metaheuristic algorithms are generally prone to being trapped in local optima; therefore, to maintain diversity in the population and guide the search process, a hybrid method that combines the strengths of evolutionary algorithms and local search methods has been introduced [43].

In this paper, a novel metaheuristic clustering method (spiral cuckoo search-based clustering) is proposed for spam detection. The contribution of this paper is twofold.

First, a novel metaheuristic method based on the cuckoo search and Fermat spiral has been proposed.

Second, the proposed method has been used to solve the spam review detection problem.

In cuckoo search (CS), Lévy flight is used to generate new solutions, which may not be diverse, and the search may be trapped in a local solution. Therefore, to balance exploration and exploitation, a spiral cuckoo search method is proposed. The proposed method uses the Fermat spiral and Lévy flight to generate new solutions. The proposed spiral CS method has been validated on 15 standard benchmark problems, including both unimodal and multimodal problems [44]. Furthermore, a spiral cuckoo search-based clustering method has been introduced for spammer detection. To validate the effectiveness of the proposed clustering method, it is tested on five spammer datasets and compared with particle swarm optimization (PSO), differential evolution (DE), genetic algorithm (GA) [45], K-means [46], cuckoo search (CS) [47], and improved cuckoo search (ICS) [48].

The rest of the paper is structured as follows: the Fermat spiral and the cuckoo search method are reviewed in Sect. 2. The spiral cuckoo search method is discussed in Sect. 3. Section 4 describes the proposed spam detection method. Section 5 discusses the experimental results, and the conclusion is presented in Sect. 6.

2 Preliminaries

2.1 Cuckoo search

Cuckoo search (CS) is a nature-inspired optimization method based on the brood parasitic behavior of some cuckoo species. Owing to their obligate brood parasitism, cuckoos use a suitable host to hatch their eggs [47,48,49]. Ani and Guira are among the cuckoo species that lay their eggs in communal nests [51, 52]. The timing of egg laying in these cuckoo species is also remarkable: they select a nest in which the host bird has just laid its own eggs. Usually, cuckoo eggs hatch earlier than the host’s eggs [50, 53]. Therefore, the cuckoo chicks are born before the host’s chicks and may throw out or remove the host’s eggs, which increases the food share of the cuckoo chicks. The CS method is based upon three principles: (1) each cuckoo lays one egg at a time in an arbitrarily selected nest, (2) the nests having top-quality eggs are carried over to the upcoming iterations, and (3) the total number of host nests is fixed, and \(P_a \in [0, 1]\) is the probability that a host discovers an egg laid by a cuckoo. If the host recognizes the cuckoo’s egg, it either removes the egg from the nest or leaves the nest and constructs another one. In short, using these principles, the poor-quality eggs (solutions) are replaced by new eggs (solutions).

The complete steps of the CS method are depicted in Algorithm 1 [53]. A new solution \(x_i^{(r+1)}\) for cuckoo i is generated using Eq. (1), which relies on the present state and the transition probability.

$$\begin{aligned} x_{i}^{(r+1)}= x_{i}^{(r)} + \alpha \otimes Levy(\lambda ) \end{aligned}$$
(1)

Here, \(\alpha\) scales the step size produced by the Lévy flight, and in most cases \(\alpha\) is set to 1. The \(\otimes\) in Eq. (1) denotes entry-wise multiplication. In CS, Lévy flight is used to explore the complete search space, as its step size is much longer in the long run, while a biased random walk is used for exploitation. For exploitation, a fraction \(P_a\) of the worst nests is abandoned and new ones are constructed.
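The update of Eq. (1) can be sketched as follows. The source does not state how the Lévy-distributed step is drawn; this sketch assumes Mantegna's algorithm with a stability index `beta = 1.5`, a choice commonly seen in cuckoo search implementations. The function names `levy_step` and `levy_update` are hypothetical.

```python
import math
import random


def levy_step(beta: float, rng: random.Random) -> float:
    """Draw one heavy-tailed step using Mantegna's algorithm (an assumption,
    the source does not specify the sampling scheme)."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_update(x, alpha=1.0, beta=1.5, rng=None):
    """Eq. (1): add an independently drawn Levy step, scaled by alpha,
    to every entry of the current solution x."""
    rng = rng or random.Random()
    return [xi + alpha * levy_step(beta, rng) for xi in x]


# Example: one update of a three-dimensional solution with alpha = 1.
current = [0.0, 1.0, 2.0]
candidate = levy_update(current, alpha=1.0, rng=random.Random(0))
```

Most of the drawn steps are small, while an occasional step is very large, which matches the exploratory behavior attributed to Lévy flight in the text.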

figure a

2.2 Fermat’s spiral

The American Heritage Dictionary defines a spiral “as a curve on a plane that winds around a fixed centre point at a continuously increasing or decreasing distance from the point”. A spiral follows a winding path, generally in an upward direction, and displays a twisted form or shape. In mathematics, spirals are categorized into two groups, namely two-dimensional and three-dimensional spirals, based on their movement around the pivot. Two-dimensional spirals can be easily described using polar coordinates. The Archimedean spiral, Fermat’s spiral, and the Cornu spiral are some of the important two-dimensional spirals. A three-dimensional spiral is a two-dimensional spiral with an additional variable, the height h.

The Fermat spiral was discovered by the mathematician Pierre de Fermat in 1636. It is based on a parabolic formula in polar coordinates, as given in Eq. (2); hence, it is also known as the parabolic spiral.

$$\begin{aligned} \displaystyle r=\theta ^{1/2}, \end{aligned}$$
(2)

where radius r is a monotonic continuous function of angle \(\theta\).

The Fermat spiral is the special case \(m=2\) of the general Archimedean spiral \(r = a\theta ^{1/m}\) in polar form. The Fermat spiral produces two r values of opposite sign for any positive value of \(\theta\), as given by Eqs. (3) and (4).

$$\begin{aligned} \displaystyle r=\, & {} a\theta ^{1/2}, \end{aligned}$$
(3)
$$\begin{aligned} \displaystyle r= & {} -a\theta ^{1/2}. \end{aligned}$$
(4)

The Fermat spiral is created by combining the plots generated by the two equations above and is shown in Fig. 1. From Fig. 1, it can be seen that the resulting spiral is symmetrical about the origin.
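As an illustration of Eqs. (3) and (4), the following sketch generates points on both branches of the spiral in Cartesian coordinates. The scale `a = 1.0`, the sampled angles, and the function name `fermat_spiral` are illustrative assumptions rather than settings from the paper.

```python
import math


def fermat_spiral(a: float, thetas):
    """Return Cartesian points of the branches r = +a*sqrt(theta) (Eq. 3)
    and r = -a*sqrt(theta) (Eq. 4) for the given angles."""
    positive, negative = [], []
    for theta in thetas:
        r = a * math.sqrt(theta)
        positive.append((r * math.cos(theta), r * math.sin(theta)))
        negative.append((-r * math.cos(theta), -r * math.sin(theta)))
    return positive, negative


# Sample 100 angles in (0, 10] radians.
thetas = [k * 0.1 for k in range(1, 101)]
positive, negative = fermat_spiral(1.0, thetas)

# Every point of the negative branch is the reflection of the matching
# positive-branch point through the origin.
symmetric = all(
    math.isclose(px, -nx) and math.isclose(py, -ny)
    for (px, py), (nx, ny) in zip(positive, negative)
)
```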

Fig. 1 Fermat spiral

3 Spiral cuckoo search method

CS employs Lévy flight and a biased random walk to find the optimal solution. Generally, CS uses Lévy flight to explore the search region, as its step size is much longer in the long run [53]. In CS, Lévy flight generates some of the new solutions close to the current best solution to expedite the search process, while the remaining solutions are generated far away from the current best solution to avoid premature convergence, as shown in Fig. 2.

Fig. 2 Step sizes drawn from Lévy flight

From Fig. 2, it can be seen that the Lévy flight produces a random walk. The step sizes in the random walk are not equal, since they depend on the step size scaling factor \(\alpha\) and the probability \(P_a\). These unequal step sizes also affect the convergence speed of the method. The convergence speed of CS depends on the parameters \(\alpha\) and \(P_a\), which are fixed in the CS method. The experiments show that CS takes longer to converge if a large value of \(\alpha\) and a small value of \(P_a\) are used, whereas CS converges quickly but with low accuracy if a small value of \(\alpha\) and a large value of \(P_a\) are used. Therefore, to avoid premature convergence and to achieve better precision, many variants of CS have been proposed. In this paper, a novel cuckoo search method based on Fermat spiral movement is proposed. A two-dimensional Fermat’s spiral can be described using Eqs. (3) and (4), as given in Sect. 2.2.

The spiral movement of Fermat’s spiral is shown in Fig. 1. From the figure, it can easily be seen that the movement of the Fermat spiral depends upon the angle \(\theta\). In the Fermat spiral, one positive and one negative value of r are produced for any value of \(\theta\). Thus, the resultant spiral is symmetrical about the origin, as shown in Fig. 1 and noted in Sect. 2.2, which helps to explore the complete search space and avoid premature convergence.

The spiral cuckoo search method uses the property of Fermat spiral and Lévy flight along with variable \(\alpha\) and \(P_a\) to find the optimal solution. To accelerate the local search (exploitation), the proposed spiral cuckoo search method employs Lévy flights that generate some of the solution vectors adjacent to best solution while it uses Fermat spiral to explore the complete search space.

4 Proposed spam review detection method

This paper introduces a spiral cuckoo search-based clustering method to detect spam reviews. The proposed clustering method detects spam reviews in four phases: (i) preprocessing the reviews, (ii) feature extraction, (iii) feature selection and normalization, and (iv) spam review detection using the spiral cuckoo search-based clustering method. The detailed flowchart of the proposed method is shown in Fig. 3.

Fig. 3 Flowchart of the proposed spiral CS clustering method

4.1 Preprocessing reviews

Online reviews usually contain noise, such as stop words and slang words, which is not desired while extracting features. Therefore, the Python Natural Language Toolkit (NLTK) [54] has been used to remove noise and unwanted words from online reviews in the following two phases:

4.1.1 Phase 1

In this phase all the unwanted words and noise are removed from online reviews using the following steps:

  1. All the reviews are converted into lowercase.

  2. Special symbols like ®, @, #, etc. are removed from online reviews.

  3. Stop words such as we, the, a, etc., which do not carry any relevant information, are removed from reviews using the NLTK library.

  4. Multiple white spaces in reviews are replaced by a single white space.

  5. All numbers are removed from reviews.

  6. Some punctuation marks, such as the forward slash, parentheses, backslash, and dash, are removed from reviews.
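The Phase 1 steps above can be sketched with the Python standard library. This is a minimal illustration, not the authors' implementation: the paper uses NLTK's stop-word list, whereas the sketch assumes a tiny hard-coded list, and the exact symbol sets and the function name `clean_review` are also assumptions. Stop words are removed after the symbol and punctuation steps so that each word is already free of attached characters.

```python
import re

# Tiny illustrative stop-word list (an assumption); the paper relies on NLTK's list.
STOP_WORDS = {"we", "the", "a", "an", "and", "is", "of", "to", "in"}


def clean_review(text: str) -> str:
    text = text.lower()                             # step 1: lowercase
    text = re.sub(r"[®@#$%^&*~]", " ", text)        # step 2: special symbols
    text = re.sub(r"\d+", " ", text)                # step 5: numbers
    text = re.sub(r"[/\\()\-]", " ", text)          # step 6: slashes, parentheses, dash
    words = [w for w in text.split() if w not in STOP_WORDS]  # step 3: stop words
    return " ".join(words)                          # step 4: single white spaces


print(clean_review("We LOVED the Room #12 (great/clean)  in 2019"))
# -> loved room great clean
```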

4.1.2 Phase 2

This phase employs a tokenization step to split paragraphs into sentences. Tokenization is also known as lexical analysis or text segmentation. After tokenization, lemmatization is used to reduce words to their root forms. For example, “reading” is converted to “read”.

4.2 Feature extraction

After preprocessing, significant features are extracted using Linguistic Inquiry and Word Count (LIWC 2015) [55]. LIWC 2015 is a text-analysis tool which generally provides 93 features.

4.3 Feature selection

Feature selection, also called attribute selection or variable subset selection, is the process of selecting appropriate features with respect to the target data. Feature selection is important since it:

  1. Removes redundant data.

  2. Selects attributes that are significant.

  3. Reduces the chances of overfitting.

  4. Reduces training time.

The LIWC tool extracts 93 features from the dataset. Since some of the extracted features may be irrelevant or redundant, they may cause overfitting. Moreover, training time also increases with the number of features [56]. Thus, to eliminate irrelevant and redundant features, the whale optimization algorithm with simulated annealing (WOASA) [57] has been used, which dynamically selects the optimal set of features from the dataset. The main objective of the feature selection method is to maximize the classification accuracy and to minimize the number of selected features along with the error rate. After selecting the relevant features, the proposed spam detection method is applied.
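The two competing goals of feature selection (low error rate and few selected features) are often combined into a single weighted fitness value in wrapper-based methods. The sketch below shows one such combination. The weighted-sum form, the weight `0.99`, and the name `selection_fitness` are assumptions for illustration; the source does not give the exact objective used by WOASA.

```python
def selection_fitness(error_rate: float, mask, weight: float = 0.99) -> float:
    """Lower is better. `mask` holds one 0/1 entry per feature (1 = selected).
    The weighted-sum form and the default weight are illustrative assumptions."""
    n_total = len(mask)
    n_selected = sum(mask)
    if n_selected == 0:
        # A subset with no features cannot be used for classification.
        return float("inf")
    return weight * error_rate + (1 - weight) * n_selected / n_total


# Two hypothetical subsets of the 93 LIWC features with the same error rate:
all_features = [1] * 93
reduced = [1] * 40 + [0] * 53
better = selection_fitness(0.10, reduced) < selection_fitness(0.10, all_features)
```

With equal error rates, the smaller subset receives the lower (better) fitness, so the search is nudged towards compact feature sets without sacrificing accuracy.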

4.4 Spam reviews detection using spiral cuckoo search-based clustering method

The cuckoo search method generates its initial population randomly, and because of this random initialization, CS may take longer to converge. Moreover, it may also be trapped in a local solution due to the lack of diversity in the population. Therefore, in this paper a novel variant of CS named spiral CS is proposed. The proposed spiral CS method takes advantage of the Fermat spiral and Lévy flight to generate new solutions. Owing to this modification, the proposed method requires fewer iterations to converge and to find the optimal solution.

Furthermore, the proposed spiral CS method has been used to detect spam reviews. To identify spam reviews, a spiral CS-based clustering method has been introduced. The proposed clustering method uses the following three steps to detect spam and non-spam reviews:

  1. Generate k cluster centers (\(c_1\), \(c_2,\ldots c_k\)) randomly and use them to initialize the population of spiral cuckoo search. For the spam detection problem, cluster centers for spam (\(c_1\)) and non-spam reviews (\(c_2\)) are generated.

  2. Compute the fitness of each pattern (review) using an objective function that minimizes the sum of squared errors, and assign the pattern to one of the clusters.

  3. Optimize the clusters using spiral cuckoo search.

To describe the method mathematically, let \(X = (x_1^d , x_2^d,\ldots , x_r^d)\) be a set of r reviews that are to be divided into k classes \(C_1, C_2, \ldots , C_k\). Each review is represented by a feature vector having L features, scaled to [0, S]. The probability distribution of each feature may be described as follows [58, 59]:

$$\begin{aligned} p_{j}=\frac{O_{j}}{r}. \end{aligned}$$
(5)

where j denotes the \(j^{th}\) feature, i.e. \(0 \le j \le S\), and \(O_j\) is the number of reviews that contain the \(j^{th}\) feature. The total mean of each feature can be expressed by Eq. (6).

$$\begin{aligned} \mu =\sum _{j=1}^{S}{jp_{j}}. \end{aligned}$$
(6)

Any review is categorized into class \(C_k\) for which it has minimum Euclidean distance. Thus, the probability (\(W_k\)) of occurrence of class \(C_k\) (\(k=1,2, \ldots , n\)) is given by Eq. (7).

$$\begin{aligned} W_{k}=\sum _{j\in C_k}{p_{j}}. \end{aligned}$$
(7)

The mean of class \(C_k\) can be calculated by Eq. (8).

$$\begin{aligned} \mu _{k}=\sum _{j\in C_k}\frac{jp_{j}}{W_{k}}. \end{aligned}$$
(8)

If, \(\mu _{k}\) is the mean of class \(C_k\) then, intra-cluster distance can be calculated using Eq. (9).

$$\begin{aligned} D_{intra}=\sum _{i=1}^{k}\sum _{\forall x_i\in C_k}{\left\| {(x_i-\mu _k)}\right\| }^2,\quad i=1,2,\ldots ,k \end{aligned}$$
(9)

where \(x_i\) ranges over the data points in cluster \(C_k\) and \(\mu _k\) is the representative point (cluster centroid) of cluster \(C_k\).

To cluster the data points into their respective classes, intra-cluster distance should be minimized or inter-class variance should be maximized. The proposed clustering method minimizes the intra-cluster distance as given in Eq. (9) [60]. The pseudo-code of the spiral CS-based clustering method is given in Algorithm 2.
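The objective of Eq. (9) can be sketched as follows: each review is assigned to its nearest centroid by Euclidean distance, and the squared distances are summed. The sketch assumes that a candidate solution encodes one centroid per cluster; the toy two-dimensional points and the function names are hypothetical.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def intra_cluster_distance(points, centroids):
    """Sum of squared distances of each point to its assigned centroid, as in Eq. (9)."""
    labels = assign(points, centroids)
    return sum(math.dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))


# Toy data: two well-separated groups standing in for spam and non-spam reviews.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
good = intra_cluster_distance(points, [(0.0, 0.5), (5.5, 5.0)])
poor = intra_cluster_distance(points, [(0.0, 0.0), (1.0, 1.0)])
print(good < poor)
```

A metaheuristic such as spiral CS would use a function like `intra_cluster_distance` as the fitness of each candidate set of centroids and keep the candidates with the smallest value.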

figure b

5 Experimental results

The efficiency of the proposed spam detection method is discussed in two sections. First, Sect. 5.1 analyzes the efficiency of the proposed spiral CS on benchmark functions belonging to two different categories, i.e., unimodal and multimodal [32]. Second, Sect. 5.2 discusses the effectiveness of the proposed clustering method on spam review and Twitter spammer datasets. For fair comparison, all experiments are simulated in Matlab 2016a running on a computer with a 2.30 GHz Intel(R) Core i3 processor, 2 GB of RAM, and a 500 GB hard disk.

5.1 Performance analysis of spiral CS

Spiral CS has been tested on 15 benchmark functions, including both unimodal (\(F_1 - F_8\)) and multimodal (\(F_9 - F_{15}\)) functions [44]. The unimodal functions evaluate the rate of convergence towards the global optimum, while the multimodal functions test the chances of getting stuck in local optima. Table 1 lists the considered benchmark functions along with their optimal values. The comparative analysis has been conducted against four existing nature-inspired algorithms, namely particle swarm optimization (PSO), differential evolution (DE), genetic algorithm (GA), and cuckoo search (CS), and a novel variant of CS (ICS), in terms of mean fitness values along with their standard deviation values. In all the algorithms, the population size (N) is 50 and the maximum number of iterations (max itr) is 1000. The parameter settings of the considered algorithms are given in Table 2. The best fitness and standard deviation values obtained over 30 runs on each benchmark function are averaged and presented in Table 3. From the table, it can be seen that spiral CS has obtained better results than the other methods (PSO, DE, GA, CS, and ICS) on all the considered benchmark functions except \(F_4\) and \(F_{13}\). For the benchmark function \(F_{13}\), ICS performs slightly better than the proposed spiral CS method, while CS returns the best standard deviation value for the benchmark function \(F_4\). Moreover, spiral CS, PSO, and DE have equivalent mean fitness and standard deviation values for the benchmark function \(F_9\). Thus, it can be stated that the proposed spiral cuckoo search outperforms the compared methods on most of the considered benchmark functions.

Table 1 Benchmark functions
Table 2 Parameter values for all the methods
Table 3 Comparative analysis of existing and proposed methods for mean fitness value and corresponding standard deviation values on the standard benchmark functions

5.2 Experimental analysis of proposed spam detection method

The accuracy of the proposed spiral CS-based clustering method has been tested on one Twitter spammer dataset and four spam review datasets. A brief description of these datasets is given in Table 4. From Table 4, it can be seen that the class distributions of the synthetic and Yelp datasets are imbalanced (skewed). It is widely known that poor models usually do not show satisfactory results on skewed datasets [61]. Hence, to show the efficacy of the proposed spam detection method, both skewed (imbalanced) and non-skewed (balanced) datasets have been used.

Table 4 Considered datasets

5.2.1 Spam review dataset

This dataset [6] has been taken from Myle Ott’s website and contains a total of 1600 reviews of 20 Chicago hotels, divided into four labels: negative truthful, negative deceptive, positive truthful, and positive deceptive. Each label has 400 reviews. In this dataset, both positive and negative deceptive reviews are acquired from Amazon Mechanical Turk, while positive truthful reviews are extracted from TripAdvisor. Negative truthful reviews are acquired from Hotels.com, Expedia, Priceline, Orbitz, and TripAdvisor. For better comparison, negative truthful and positive truthful reviews are given the “Not spam” label, and negative deceptive and positive deceptive reviews are given the “spam” label.

5.2.2 Synthetic spam review dataset

This dataset was initially taken from the Database and Information System Laboratory, University of Illinois (TripAdvisor Dataset) and was unlabeled [62]. Thus, to produce spam reviews, a synthetic review spamming method has been used [63]. The synthetic review spamming method produces a dataset that consists of 479 reviews, with 316 spam and 163 non-spam reviews.

5.2.3 Yelp fake review dataset

This dataset has been taken from Yelp.com and contains reviews of 85 hotels and 130 restaurants in the Chicago area [64, 65]. For fair comparison, a mixture of reviews of popular and disliked restaurants and hotels is considered. The detailed statistics of the dataset are given in Table 4. From the table, it can be observed that the class distribution of the dataset is skewed.

  1. Yelp hotel review dataset

    This dataset is a subset of the Yelp fake review dataset and consists of 5678 reviews [64]. There are 802 spam (fake) and 4876 non-spam reviews generated by 5124 reviewers.

  2. Yelp restaurant review dataset

    This dataset is also a subset of the Yelp fake review dataset and contains 58517 reviews generated by 35593 reviewers [64]. There are 8368 spam and 50149 non-spam reviews.

5.2.4 Twitter spam dataset

This dataset has been collected using the Twitter API and contains 600 million tweets. All the tweets of the dataset are manually annotated into two classes, namely spammer and non-spammer. To detect spammer tweets, 12 features are extracted from each tweet. In this paper, a subset of the standard Twitter dataset has been used. This subset consists of 10,000 tweets (5000 spammer and 5000 non-spammer tweets), which are randomly chosen from a fixed continuous time frame.

Table 5 Some of the selected features of synthetic dataset

The spam review datasets are preprocessed to remove noise as discussed in Sect. 4.1. From the preprocessed datasets, 93 features are extracted using LIWC 2015; some of the features of the spam review datasets are given in Table 5. However, not all of the 93 features may be relevant. Therefore, the feature selection method discussed in Sect. 4.3 has been used to select the best set of features from the spam review datasets. As the Twitter spammer dataset contains only 12 features, the WOASA feature selection method has not been applied to it. The total number of selected features and the mean error from WOASA for all the spam review datasets are presented in Table 6. Since the range of values of the feature vectors varies widely, the feature vector matrix is normalized for uniformity and faster convergence. Afterwards, the proposed clustering method is used to identify spam and non-spam reviews. However, classification accuracy alone can be misleading if the classes have unequal numbers of instances. Therefore, to assess the performance of the proposed clustering method and make it comparable with the other considered methods, recall and precision are computed along with accuracy. To compute precision, recall, and accuracy, a confusion matrix is created. The confusion matrix C of size \(n \times n\) corresponds to n classes, and its entry \(C_{ji}\) gives the number of patterns of class j predicted as class i.

Table 6 Error in feature selection using binary whale optimization with simulated annealing

In the confusion matrix, four values are used, namely TP (true positive), TN (true negative), FP (false positive), and FN (false negative), as shown in Table 7, where:

  1. TP represents the number of spam messages that are correctly predicted as spam.

  2. TN represents the number of non-spam messages that are correctly predicted as non-spam.

  3. FP represents the number of non-spam reviews that are incorrectly labeled as spam.

  4. FN represents the number of spam reviews that are wrongly predicted as non-spam.

Based on the confusion matrix, precision, recall, and accuracy are computed using Eqs. (10)–(12):

$$\begin{aligned} Precision= & {} \frac{TP}{TP+FP}, \end{aligned}$$
(10)
$$\begin{aligned} Recall= & {} \frac{TP}{TP+FN},\end{aligned}$$
(11)
$$\begin{aligned} Accuracy= & {} \frac{TP+TN}{TP+TN+FP+FN}. \end{aligned}$$
(12)
Table 7 Confusion matrix
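Equations (10)–(12) can be computed directly from the four confusion-matrix counts, as in the sketch below. The label strings, the toy predictions, and the function names are hypothetical; for a clustering method, it is also assumed that each cluster has already been mapped to the spam or non-spam class.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Return (TP, TN, FP, FN), treating `positive` as the spam class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def precision_recall_accuracy(tp, tn, fp, fn):
    """Eqs. (10)-(12); a zero denominator yields 0.0 instead of an error."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Toy example with three spam and five non-spam reviews.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
counts = confusion_counts(y_true, y_pred)
precision, recall, accuracy = precision_recall_accuracy(*counts)
```

In this toy example, accuracy (0.75) is higher than precision and recall (both 2/3), which illustrates why accuracy alone can be misleading on imbalanced classes.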

Since metaheuristic methods are randomized in nature, each method has been executed 30 times on each dataset, and the experimental outcomes have been examined in terms of mean precision, mean recall, mean accuracy, mean fitness, and standard deviation values. The performance of the proposed spiral cuckoo search clustering method has been analyzed on the original datasets as well as on the datasets with the optimal set of features. The mean precision and mean recall of each method on the original datasets and on the datasets with the optimal set of features are presented in Table 8. From the table, it can be seen that the proposed spam detection method attains the best results in terms of recall and precision on all the datasets.

Table 8 Comparison of proposed spam detection method with other methods in terms of mean precision, mean recall over datasets with original and optimal set of features
Table 9 Comparison of proposed spam detection method with other methods in terms of mean accuracy, mean fitness function and standard deviation values with original set of features

The mean fitness, mean accuracy, and standard deviation values for each dataset with the original set of features are given in Table 9. From Table 9, it is clearly observed that the proposed spam detection method gives better results than the other methods in terms of mean fitness and mean accuracy. K-means and DE give competitive results on the spam review and Yelp hotel review datasets, respectively, in terms of mean computational time, while PSO shows a better standard deviation value on the synthetic spam review dataset.

Furthermore, the proposed clustering method has been tested on the datasets with the optimal set of features. The mean fitness, mean accuracy, and standard deviation values for each dataset with the relevant set of features are given in Table 10. From the table, it is observed that the spiral cuckoo search clustering method outperforms all the other methods. In terms of mean computational time, however, the proposed method shows better results on all the datasets except the spam review and Yelp hotel review datasets. If the results of Tables 9 and 10 are compared, it can be seen that the proposed method shows its most prominent results on the datasets with the optimal set of features.

Table 10 Comparison of proposed spam detection method with other methods in terms of mean accuracy, mean fitness function and standard deviation values over datasets with relevant features returned by feature selection method
Fig. 4 Box plots for all the considered methods and the proposed spiral CS-based clustering method on (a) spam review dataset, (b) synthetic spam review dataset, (c) Yelp hotel review dataset, and (d) Yelp restaurant review dataset for the performance parameter accuracy

To validate the performance of the proposed method, box plots [66] have also been plotted for all the spam review datasets with the relevant set of features, as shown in Figs. 4, 5 and 6. In the box plots, the x-axis denotes the name of the method and the y-axis denotes the parameter under consideration. From the box plots, it is observed that the spiral CS-based clustering method has an edge over the other methods in terms of consistency. Moreover, convergence graphs have been plotted in Fig. 7 to show the convergence behavior of all the considered methods and the proposed method. In the convergence plot, the x-axis denotes the name of the method and the y-axis denotes the parameter under consideration.
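The consistency that a box plot conveys can also be read off numerically from the quartiles of the per-run values. The sketch below computes the summary a box plot is drawn from; the function name and the two accuracy lists are hypothetical and are not taken from the paper's results.

```python
import statistics


def box_summary(values):
    """Five-number summary plus interquartile range for one method's runs."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,  # height of the box; smaller means more consistent
    }


if __name__ == "__main__":
    # Hypothetical per-run accuracies for two methods (illustrative only).
    method_a = [0.90, 0.91, 0.91, 0.92, 0.92]
    method_b = [0.80, 0.85, 0.88, 0.92, 0.95]
    for name, runs in (("A", method_a), ("B", method_b)):
        print(name, box_summary(runs))
```

A method with a narrower interquartile range over its 30 runs produces a shorter box, which is the sense in which one method is more consistent than another.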

Fig. 5 Box plots for all the considered methods and the proposed spiral CS-based clustering method on (a) spam review dataset, (b) synthetic spam review dataset, (c) Yelp hotel review dataset, and (d) Yelp restaurant review dataset for the performance parameter fitness function value

Fig. 6 Box plots for all the considered methods and the proposed spiral CS-based clustering method on (a) spam review dataset, (b) synthetic spam review dataset, (c) Yelp hotel review dataset, and (d) Yelp restaurant review dataset for the performance parameter computational time

Fig. 7 Convergence plots for all the considered methods and the proposed spiral CS-based clustering method on (a) spam review dataset, (b) synthetic spam review dataset, (c) Yelp hotel review dataset, and (d) Yelp restaurant review dataset

Furthermore, to validate the significance of the results, a Wilcoxon rank-sum multiple-problem test is conducted at the 5% level of significance between the proposed method and the existing methods. Table 11 presents the corresponding p-value and \(z\)-value along with the SIG (significance) of each method. The p-value is interpreted in the context of the null hypothesis and determines the significance of the results. The null hypothesis is rejected if the p-value is \(\le 0.05\), which is symbolized by \(+\) or −; otherwise, it is accepted and represented by the \(=\) symbol. The ‘\(+\)’ indicates that the method is different and significantly good, while ‘−’ shows that it is different and significantly poor. From the table, it is seen that the values of SIG are ‘\(+\)’ for all datasets, i.e., the spiral CS-based clustering method is significantly different from the considered methods.
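The decision rule above can be illustrated with a small two-sample rank-sum test. This is a sketch under stated assumptions rather than the authors' procedure: it uses the normal approximation without a tie correction in the variance, treats larger values as better, and the function names and sample values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (z, p) via normal approximation."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share their mean rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p


def significance_symbol(z, p, alpha=0.05, higher_is_better=True):
    """Map a test result to the '+', '-', '=' notation used in the text."""
    if p > alpha:
        return "="  # null hypothesis accepted
    return "+" if (z > 0) == higher_is_better else "-"


if __name__ == "__main__":
    # Hypothetical per-run accuracies (illustrative only).
    proposed = [0.90, 0.91, 0.92, 0.93, 0.94]
    baseline = [0.80, 0.81, 0.82, 0.83, 0.84]
    z, p = rank_sum_test(proposed, baseline)
    print(f"z={z:.3f} p={p:.4f} SIG={significance_symbol(z, p)}")
```

For a minimization objective such as a fitness value, `higher_is_better=False` would flip the sign convention.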

Table 11 Results of the Wilcoxon test for statistical significance at level \(\alpha = 0.05\)

6 Conclusion

In this paper, a novel variant of cuckoo search, namely spiral CS, has been proposed. The proposed method takes advantage of the Fermat spiral and Lévy flight to find the optimal solution in fewer iterations. The proposed spiral cuckoo search method has been validated on 15 benchmark functions, including both unimodal and multi-modal functions. From the experimental results, it can be concluded that the proposed spiral CS method shows more promising results than PSO, DE, GA, CS, and ICS. Additionally, the efficiency of spiral CS has been validated through the proposed spiral CS-based clustering method. The performance of the proposed clustering method has been tested on four spam datasets and one Twitter spammer dataset. Further, the proposed spam detection method has been compared with K-means, PSO, DE, GA, CS, and ICS. Convergence graphs have been plotted to depict the exploration and exploitation capabilities of the proposed method, and box plots have been drawn to show its consistency. From the experimental and statistical evidence, it is found that the proposed spiral cuckoo search clustering method is more efficient than the compared methods. Although the proposed clustering method is better than the existing methods, further effort is required to improve its accuracy. Therefore, future work involves exploring more feature selection techniques and optimization algorithms for better accuracy.