Abstract
The Vortex Search Algorithm (VSA) is a meta-heuristic algorithm inspired by the vortex phenomenon and proposed by Dogan and Olmez in 2015. Like other meta-heuristic algorithms, the VSA has a major weakness: it can easily get stuck in local optima and produce solutions with a slow convergence rate and low accuracy. Chaos theory has therefore been added to the search process of the VSA in order to speed up global convergence and achieve better performance. In the proposed method, various chaotic maps are used to improve the VSA operators and to help control both exploitation and exploration. The performance of the method was evaluated on 24 UCI standard datasets, and it was also evaluated as a Feature Selection (FS) method. The simulation results showed that chaotic maps (particularly the Tent map) are able to enhance the performance of the VSA. Furthermore, the proposed method clearly proved capable of attaining the optimal feature subset with the highest accuracy and the smallest number of features. With 36 features, the accuracy of the VSA and the proposed model is 77.49% and 92.07%, respectively; with 80 features, 36.37% and 71.76%; and with 3343 features, 95.48% and 99.70%. Finally, the results on a real application showed that the proposed method achieves higher accuracy than the other algorithms.
1 Introduction
Metaheuristic optimization algorithms have been proposed over the past decades and applied extensively to complicated problems [1, 2]. The essential goal of optimization is to find the candidate problem variables that minimize or maximize the objective function through global and local search [3, 4]. In order to meet the state-of-the-art goals for any problem, most of these algorithms were designed as approximate techniques for attaining the optimal solution [5, 6]. A number of well-known nature-inspired algorithms include the Invasive Weed Optimization (IWO) [7], the Butterfly Optimization Algorithm (BOA) [8], the Artificial Bee Colony (ABC) [9], the Fruit Fly Optimization Algorithm (FOA) [10], the Firefly Algorithm (FA) [11], the Krill Herd (KH) algorithm [12], the Differential Evolution (DE) algorithm [13], the Flower Pollination Algorithm (FPA) [14], etc. The difference in their natural inspiration is an essential reason why these algorithms deliver different levels of performance [15, 16]. Moreover, this factor may explain why some algorithms can produce the best solution for specific problems while others cannot. It is because of this limitation that no single algorithm is good enough to solve every kind of problem.
During the past decade, a mathematical framework and scientific branch, namely chaos, has been developed and connected deeply with different scientific fields. Chaos involves three major dynamic properties: the quasi-stochastic property, sensitivity to initial conditions, and ergodicity. The application of chaos theory in optimization research has attracted a lot of attention over recent years. The Chaotic Optimization Algorithm (COA) [17] is among the applications of chaos, and uses the nature of chaotic sequences. It has been shown that if random variables are replaced with chaotic variables, the performance of the COA can be enhanced. Therefore, in the literature, there are a number of studies on the hybridization of chaos with other algorithms for the purpose of improving their performance. Some instances include the chaotic ACO [18], chaotic DE algorithm [19, 20], chaotic KH algorithm [21, 22], chaotic FPA [23], chaotic genetic algorithm [24, 25], chaotic PSO [26,27,28], chaotic gravitational search [29,30,31], chaotic bat algorithm [32], etc.
FS is the procedure of selecting a subset of features from an original feature set; it may be considered the most important pre-processing instrument for solving classification problems [33]. Finding a superior subset of features is a quite complicated challenge, and it is decisive for the final classification error rates. The finalized feature subset should retain a high classification accuracy. The purpose is to choose an applicable subset of d features from a set of D features (d < D) in a given dataset [34]. D is made up of all features that are present in a given dataset; it can encompass redundant, noisy, and misleading features. Consequently, an exhaustive search over the whole solution space usually takes a lot of time and often cannot be implemented in practice. To remedy this, FS strategies that maintain the best subset of d relevant features were taken into consideration. Inappropriate features are not only useless, but can also worsen the classification performance. If irrelevant features are deleted, computational efficiency can be improved and classification accuracy increased.
According to the search techniques for feature subsets, the current FS strategies can be classified into two classes: the filter-based approach and the wrapper-based approach. The filter method depends fundamentally on general characteristics of the dataset to assess and choose feature subsets, without involving a specific learning algorithm. Thus, the efficiency of this approach depends predominantly on the dataset itself rather than on the classifier [35, 36]. The wrapper method utilizes a classification algorithm to assess feature subsets and employs a search strategy to look for the optimal subsets. It often leads to better performance, since the wrapper approach includes a classifier in the evaluation or search process [37].
Each meta-heuristic algorithm has a unique search strategy. Meta-heuristic algorithms find optimal solutions based on their own strategies, such as a balance between exploration and exploitation. Furthermore, the VSA has advantages such as a small number of parameters and easy implementation. In this work, the VSA was embedded with chaotic maps (CMs) to obtain a better compromise between exploitation and exploration. This paper uses hybrid methods based on CMs with the VSA for FS. The major contribution of the current paper is a CM-based model of the VSA proposed to enhance its performance. In the proposed methods, a chaotic search is used to choose the optimal feature subset that maximizes the classification accuracy and minimizes the feature subset length. Ten one-dimensional CMs are adopted and exchanged with the random movement parameters of the VSA. The performance of the proposed methods is tested on 24 benchmark datasets. Similarly, the performance of the VSA is compared with seven other metaheuristic algorithms. Based on the mean criterion, the proposed method obtains better solutions using the Tent map in comparison with the other metaheuristic algorithms.
The main contributions of this paper are as follows:
-
The VSA and chaotic maps are combined and applied to FS.
-
The proposed method has faster convergence than the other algorithms and achieves better convergence results on different datasets.
-
The proposed method has been evaluated with 24 UCI standard datasets.
-
The best VSA variant is State 2 with VSAC101, obtained by using the Tent map.
-
The proposed method has been tested on author identification datasets.
-
The obtained results confirmed the validity and superiority of the proposed method in comparison to other algorithms.
The organization of this paper is as follows: Sect. 2 reviews related work on chaotic optimization and FS. Section 3 provides an introduction to the VSA. The detailed description of the proposed method is provided in Sect. 4, while the experimental results and discussion of the proposed VSA are provided in Sect. 5. In Sect. 6, the proposed method is applied to a real application (i.e., author identification). Finally, the conclusion and future work are discussed in Sect. 7.
2 Related works
The Moth Swarm Algorithm (MSA) is among the most recently developed nature-inspired heuristics for solving optimization problems. However, its shortcoming is a slow convergence rate, and chaos theory has been incorporated into it to eliminate this drawback. In [38], ten CMs were embedded within the MSA for the purpose of finding the ideal number of prospectors, in order to increase the exploitation of the most promising solutions. The proposed method was applied in solving seven famous benchmark test functions. The simulation results showed that CMs can enhance the performance of the original MSA with regard to convergence speed. In addition, the sinusoidal map was found to be the best map for enhancing the performance of the MSA.
The Cuckoo Search Algorithm (CSA) is a nature-inspired metaheuristic algorithm that imitates the obligate brood parasitic behavior of cuckoo species. The method has been proven to have promising performance in solving optimization problems. Chaotic mechanisms were incorporated into the CSA to make use of the dynamic features of chaos theory and further improve its search performance. However, in the chaotic CSA (CCSA) [39], the best CM was applied in a single search per iteration, which restricted the exploitation capability of the search. The researchers considered utilizing multiple CMs at the same time to perform the local search within the neighborhood of the global best solution found by the CSA. To attain this goal, three kinds of multiple chaotic CSAs (MCCSA) were proposed by incorporating several CMs into the chaotic local search (CLS) in parallel, in a random or selective manner. The overall performance of MCCSA was validated using 48 widely used benchmark optimization functions. The experimental results indicated that the MCCSAs are generally better than the CCSAs, and that MCCSA-P, which makes use of the CMs in parallel, has the best quality among all sixteen variants of the CSAs.
In [40], a chaos-based Crow Search Algorithm (CCSA) has been proposed to solve fractional optimization problems (FOPs). The proposed CCSA integrated chaos theory (CT) into the CSA for the purpose of refining the global convergence velocity and enhancing the exploration/exploitation tendencies. CT was utilized to tune the standard CSA parameters, which yielded four versions, and the highest-quality chaotic variant was investigated. The incorporation of CT was able to improve the overall performance of the proposed CCSA and allow the search process to achieve better speeds. The performance of the CCSA method was proven on twenty fractional benchmark problems. Furthermore, it was tested on a fractional economic environmental power dispatch problem by attempting to minimize the ratio of the total emissions to the total fuel cost. Ultimately, the proposed CCSA was compared with the PSO, the standard CSA, the FA, the Dragonfly Algorithm (DA), and the GWO. In addition, the efficiency of the proposed CCSA was justified by the non-parametric Wilcoxon signed-rank test. The experimental results proved that the proposed CCSA performs better than similar algorithms with regard to efficiency and reliability.
In [41], a new hybrid algorithm for solving optimization problems based on a chaotic ABC and chaotic simulated annealing has been proposed. The chaotic ABC explores new locations chaotically, and chaos may additionally improve the exploration of the search space. In effect, the proposed hybrid method combines the local search accuracy of simulated annealing with the global search capabilities of the ABC. Moreover, the authors used a distinct method for producing the initial population. Indeed, the initial population is of great significance for population-based techniques, because it directly influences the rate of convergence and the quality of the outcomes. The method is validated using 12 benchmark functions, and the results are compared with those of the artificial bee colony algorithm, the hybrid algorithm of ABC and simulated annealing, and PSO. The simulation results demonstrate the performance of the proposed method.
In [42], an adaptive chaotic Bacterial Foraging Optimization (BFO) is presented. The improved BFO consists of two new features: an adaptive chemotaxis step setting and a chaotic perturbation operation in all chemotactic events. The former feature results in a fast convergence rate and acceptable convergence accuracy, while the latter further allows the search to avoid local optima and attain better convergence accuracy. Firstly, an adaptive exponentially decreasing chemotaxis step is presented, in which the variable of the natural exponential function is a function of the iterations and of the nutritive ratio between the current bacterium position and the best bacterium position in each iteration. Secondly, when each bacterium reaches a new position through swim behavior, a chaotic perturbation is applied to avoid entrapment in local optima. On five benchmark functions, the chaotic BFO is shown to have better performance than the original BFO and the BFO with a linearly decreasing chemotaxis step (BFO-LDC).
Jia et al. [43] proposed an effective memetic DE algorithm (DECLS), which makes use of a CLS with a ‘shrinking’ strategy. The shrinking strategy for the CLS search space was introduced in that paper. In addition, the local search length was determined according to the feedback of the fitness of the objective functions in a dynamic manner in order to save the function evaluations. Furthermore, the parameter settings of the DECLS were adapted in the process of evolution so as to further enhance the optimization efficiency. The hybrid form of the DE and a CLS as well as a parameter adaptation mechanism seemed very reasonable. The CLS is helpful in enhancing the local search capability of DE, whereas the parameter adaptation can improve the global optimization quality. The CLS is helpful in improving the optimization performance of the canonical DE through exploring a very large search space in the early phases so as to avoid the occurrence of premature convergence, and exploiting a tiny region in later phases to refine the finalized solutions. In addition, the settings of parameters in the DECLS were controlled adaptively to further improve the search capability. To assess the efficiency and effectiveness of the proposed DECLS algorithm, it was compared with four state-of-the-art DE variants and the IPOP-CMA-ES algorithm on a set of 20 selected benchmark functions. The findings showed that the DECLS is significantly superior, or at least comparable, to other optimizers with regard to the convergence performance and solution accuracy. Furthermore, the DECLS was shown to have certain advantages in terms of solving problems with high dimensions.
In [44], a modified DE algorithm based on the Opposition-based Learning (OBL) and a chaotic sequence named the OBL Chaotic DE (OBL-CDE) was proposed. The proposed OBL-CDE algorithm is different from the basic DE in two ways. The first one is related to the generation of the initial population that follows the OBL rules, while the second one is the dynamic adaption of the scaling factor F through using the chaotic sequence. The numerical results obtained by the OBL-CDE compared to the results of DE and the opposition-based DE algorithms on 18 benchmark functions indicated that the OBL-CDE is capable of finding more superior solutions and maintaining reasonable convergence rates at the same time.
The standard Glowworm Swarm Optimization (GSO) shows poor ability in global search and easily gets trapped into local optima. A Quantum GSO algorithm based on CMs was proposed [45] in order to solve such problems. First of all, a chaotic sequence was generated to initialize the population. This process results in higher probability to cover more local optimal areas, and provides the ground for further optimization and tuning. Next, the quantum behavior was applied to the elite population, which made it possible for individuals to locate any position of the solution space randomly with a certain probability. This greatly enhanced the capability of the algorithm in global search and avoiding local optima. Finally, it adopted the single dimension loop swimming instead of the original fixed-step movement mode. This not only improved the solution precision and convergence speed, but also solved GSO problems that were too sensitive to the step-size, and indirectly enhanced the robustness of the algorithm. The simulation results indicated that the proposed method was feasible and effective.
The Fruit Fly Algorithm (FOA) has recently been proposed as a metaheuristic technique, and is inspired by the behavior of fruit flies. Mitic et al. [46] improved the standard FOA by introducing a novel parameter in combination with chaos. The performance of this chaotic FOA (CFOA) was studied on ten famous benchmark problems using 10 different CMs. In addition, comparison studies with the basic FOA, the FOA with Levy flight distribution, and other recently published chaotic algorithms were made. Statistical findings on each optimization task showed that the CFOA achieves a very high convergence rate. In addition, the CFOA was compared with recently developed chaos-enhanced algorithms such as the chaotic bat algorithm, chaotic-accelerated PSO, chaotic FA, chaotic ABC, and chaotic CSA. The research findings generally indicate that the FOA with the Chebyshev map shows superiority over similar methods in terms of the reliability of global optimality and the algorithm success rate.
In addition, Gandomi et al. [47] proposed a chaos-enhanced version of the accelerated PSO. Some other instances of chaos-enhanced metaheuristic algorithms include the chaotic Genetic Algorithm [48], Chaotic PSO [49, 50], Chaotic Salp Swarm Algorithm [51], Chaotic Elephant Herding Optimization (EHO) algorithm [52], Chaotic Bat Algorithm [53], Chaotic FOA [46], Chaotic GSO Algorithm [45, 54], Chaotic Black Hole algorithm [55], Chaotic Simulated Annealing PSO Algorithm (CSAPSO) [56], Chaotic Social Spider Optimization Algorithm [57], Chaotic Bean Optimization Algorithm [58], Chaotic Quantum CSA [59], Chaotic Antlion Algorithm [60], Chaotic Hybrid Cognitive Optimization Algorithm [61], Chaotic Simulated Annealing [62], Chaotic Based Quantum Genetic Algorithm [63], Chaotic Teaching Learning Algorithm [64], Chaotic DE algorithm [65], Chaotic Grey Wolf Optimization Algorithm [66], Chaotic Fractal Search [67], Chaotic Brain Storm Optimization Algorithm [68], Multi-Objective CCSA [69], Chaotic Grasshopper Optimization Algorithm [70], Chaotic Krill Herd [21, 71, 72], Chaotic DE [73], Chaotic Firefly Algorithm [74, 75], Chaotic Starling PSO Algorithm [76], Chaotic CCSA [77], Chaotic Grey Wolf Optimization Algorithm [78], etc. Table 1 shows a comparison of different models of meta-heuristic algorithms based on chaotic maps.
3 Vortex search algorithm
The VSA is a recent metaheuristic optimization algorithm inspired by the vortical flow of stirred fluids. Its process consists of simplified generation phases similar to other single-solution algorithms. The VSA population is regenerated at every iteration using values derived entirely from the current single solution. Furthermore, how each iteration updates and searches the search space is an essential part of any single-solution method; in the VSA, this balance is achieved by using a vortex-like search pattern. The vortex pattern is simulated through a number of nested circles. The details of the VSA can be briefly described in four steps, as follows [79].
3.1 Generating the initial solution
The preliminary procedure initializes the 'center' μ0 and the 'radius' r0. In this phase, the initial center (μ0) is calculated using Eq. (1): \(\mu_{0} = \left( {upperlimit + lowerlimit} \right)/2\).
where \(upperlimit\) and \(lowerlimit\) are the bound constraints of the problem, defined as vectors in the d × 1 dimensional space. In addition, σ0 is the initial radius r0, generated with Eq. (2): \(\sigma_{0} = \left( {\max \left( {upperlimit} \right) - \min \left( {lowerlimit} \right)} \right)/2\).
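As a minimal sketch (the function name and array handling are our own, but the formulas follow Eqs. (1) and (2)), this initialization step can be written as:

```python
import numpy as np

def vsa_init(lower, upper):
    """Initial center mu0 and radius sigma0 of the VSA.

    Eq. (1): the center is the midpoint of the bound constraints.
    Eq. (2): the initial radius covers half of the widest range.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    mu0 = (upper + lower) / 2.0                      # Eq. (1)
    sigma0 = (np.max(upper) - np.min(lower)) / 2.0   # Eq. (2)
    return mu0, sigma0
```

For a 2-dimensional problem bounded by [-10, 10] in each dimension, this yields the center (0, 0) and the initial radius 10.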
3.2 Generating the candidate solutions
The procedure of producing candidate solutions is applied to generate the population \(C_{t} \left( s \right)\) at each iteration, where t is the iteration index. The VSA randomly generates solutions around the initial center μ0 by using a Gaussian distribution, where \(C_{0} \left( s \right) = \left\{ {s_{1} ,s_{2} , \ldots ,s_{m} } \right\}, m = 1,2,3, \ldots ,n\) represents the solutions and n is the total number of candidate solutions. The multivariate Gaussian distribution is given in Eq. (3): \(p\left( {x|\mu ,\Sigma } \right) = \frac{1}{{\sqrt {\left( {2\pi } \right)^{d} \left| \Sigma \right|} }}\exp \left( { - \frac{1}{2}\left( {x - \mu } \right)^{T} \Sigma^{ - 1} \left( {x - \mu } \right)} \right)\).
In Eq. (3), d indicates the dimension, x is the d × 1 vector of a random variable, μ is the d × 1 vector of the sample mean (i.e., the center), and Σ is the covariance matrix. Equation (4) indicates that when the diagonal elements (i.e., variances) of Σ are equal and the off-diagonal elements (i.e., covariances) are zero (uncorrelated), the resulting shape of the distribution is spherical. Thus, the value of Σ is computed using equal variances with zero covariance, as in Eq. (4): \(\Sigma = \sigma^{2} \cdot \left[ I \right]_{d \times d}\).
where, in Eq. (4), σ2 is the variance of the distribution, I represents the \(d \times d\) identity matrix, and σ0 is the initial radius (r0), as can be seen in Eq. (2).
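The candidate generation of Eqs. (3) and (4) can be sketched with NumPy (a stand-in, not the authors' code), using the spherical covariance Σ = σ²·I:

```python
import numpy as np

def generate_candidates(mu, sigma, n):
    """Draw n candidate solutions around center mu from a multivariate
    Gaussian with spherical covariance Sigma = sigma^2 * I (Eqs. (3)-(4))."""
    mu = np.asarray(mu, dtype=float)
    d = len(mu)
    cov = (sigma ** 2) * np.eye(d)   # Eq. (4): equal variances, zero covariance
    return np.random.multivariate_normal(mu, cov, size=n)
```

Each row of the returned n × d array is one candidate solution scattered around the current center, with spread controlled by the radius σ.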
3.3 Replacement of the current solution
The replacement of the current solution is conducted during the selection process. The best solution \(s^{\prime} \in C_{0} \left( s \right)\) is selected and memorized from \(C_{0} \left( s \right)\) to replace the current circle center (μ0). Before the selection process, it must be ensured that the candidate solutions are inside the search space; components that fall outside the bounds are redrawn as \(s_{k}^{i} = rand \cdot \left( {upperlimit^{i} - lowerlimit^{i} } \right) + lowerlimit^{i}\) (Eq. (5)).
where \(k = 1, 2, \ldots , n\), \(i = 1, 2, \ldots , d\), and rand is a uniformly distributed random number. The VSA uses \(s^{\prime}\) as the new center and reduces the vortex size using Eq. (3) to generate the next solutions; thus, the new set of solutions \(C_{1} \left( s \right)\) is generated. If the chosen solution is better than the best solution found so far, it becomes the new best solution and is memorized.
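The boundary-handling rule of Eq. (5) can be sketched as follows (a minimal illustration; the function name and the `rng` parameter are our own):

```python
import numpy as np

def shift_into_bounds(candidates, lower, upper, rng=None):
    """Eq. (5): components outside the box constraints are redrawn
    uniformly inside [lower, upper]; in-range components are kept."""
    rng = np.random.default_rng() if rng is None else rng
    c = np.asarray(candidates, dtype=float).copy()
    out = (c < lower) | (c > upper)
    # rand * (upperlimit - lowerlimit) + lowerlimit, element-wise
    redraw = rng.random(c.shape) * (upper - lower) + lower
    c[out] = redraw[out]
    return c
```

Only the out-of-range components are replaced; valid components of the same candidate are left untouched, which matches the per-component indices k and i in Eq. (5).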
3.4 The radius decrement process
In the VSA, the inverse incomplete gamma function is applied to decrease the radius value at each iteration. The incomplete gamma function, given in Eq. (6), often arises in probability theory, especially in applications that involve the chi-square distribution: \(\gamma \left( {x,a} \right) = \int_{0}^{x} {e^{ - t} t^{a - 1} dt}\).
where a > 0 is the shape parameter and x ≥ 0 is a random variable. Alongside the incomplete gamma function, its complementary function \(\Gamma \left( {x,a} \right) = \int_{x}^{\infty } {e^{ - t} t^{a - 1} dt}\) is usually also introduced (Eq. (7)), so that \(\gamma \left( {x,a} \right) + \Gamma \left( {x,a} \right) = \Gamma \left( a \right)\), where \(\Gamma \left( a \right)\) is the complete gamma function.
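The radius schedule can be sketched in self-contained Python. We follow the formulation of the original VSA paper, r_t = σ0 · (1/x) · P⁻¹(a_t, x) with x = 0.1 and a linearly decaying shape parameter a_t; the series-plus-bisection inverse below is our own stand-in for a library routine such as SciPy's `gammaincinv`:

```python
import math

def reg_lower_gamma(a, x, terms=200):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    total = term = 1.0 / a
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total

def inv_reg_lower_gamma(a, p, hi=50.0):
    """Inverse of P(a, .) by bisection: returns x with P(a, x) ~= p."""
    lo = 0.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def radius_at(t, max_iter, sigma0, x=0.1):
    """VSA radius decrement (sketch): the shape parameter a_t decays
    linearly with the iteration t, shrinking the vortex radius."""
    a_t = max(1.0 - t / max_iter, 1e-6)
    return sigma0 * (1.0 / x) * inv_reg_lower_gamma(a_t, x)
```

At t = 0 (a_t = 1), P(1, x) = 1 - e^(-x), so the radius starts near σ0 and then shrinks monotonically as iterations pass, which is what concentrates the search around the current best solution.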
Table 2 describes the pseudocode of the VSA algorithm.
4 Proposed methods
In this section, the hybrid form of the VSA and CMs will be explained. The basic structure of the VSA consists of two important components: the center and the radius. First, the center is the current position from which the VSA is evaluated within the problem search space as iterations pass. With respect to the best solution found so far, the VSA uses this component as the 'center' around which the new positions of the population are placed. Second, the 'radius' is a mechanism used to simplify the problem, turning a large-radius problem into a small-radius one. In addition, the VSA uses a Gaussian distribution to balance exploration and exploitation at each iteration. However, the VSA uses only a single center, i.e., a single strategy to generate candidate solutions around the current best solution. As a consequence, the VSA cannot escape from a local point when it faces problems with numerous local minima. At the same time, the radius used to update the best solution decreases with the iterations through the Gaussian distribution, making it easier for the VSA to become trapped in local optima. This explains some drawbacks of the VSA. The present study focuses on hybridizing the VSA with CMs. This hybridization is referred to as the chaotic VSA, in which 10 CMs are used. These 10 maps are applied in three different locations of the VSA [74]. Figure 1 shows the flowchart of the proposed method. In the first step, the parameters are initialized. In the second step, the VSA equations (Eqs. (9), (10), and (11)) are optimized based on the chaotic maps in order to perform FS. In the third step, the samples are classified, and at the end, the accuracy percentage is displayed.
In the proposed model, we combine the formulas of CMs based on Table 3 with Eqs. (3), (5), and (6). The goal is to find the best CMs to optimize VSA. These places can be expressed as follows:
- State 1:
-
the production of candidate solutions inside the search circle [Eq. (9)].
- State 2:
-
If the solution is out of range, these mappings are used to move it into the desired range [Eq. (10)].
- State 3:
-
The search radius is reduced using the inverse incomplete gamma function and CMs [Eq. (11)].
In Table 3, the CM formulas and methods are shown. The optimization of the VSA is performed based on three methods (State 1, State 2, and State 3). In each method, 10 CMs have been used; thus, in each run there are 30 different modes for a given dataset.
Chaos is described as a phenomenon in which any change in the initial conditions may cause a non-linear change in future behavior. Chaos optimization is one of the optimization models for search algorithms. The primary idea behind it is to transform parameters/variables from the chaos space to the solution space. For finding the global optimum, it relies on chaotic motion properties including ergodicity, regularity, and stochasticity. The major advantages of chaos are fast convergence and the capability of avoiding local minima. CMs are deterministic, with no random factors involved. In this paper, 10 distinguished non-invertible one-dimensional maps were adopted to obtain chaotic sets. The adopted CMs are defined in Table 3, where q denotes the index of the chaotic sequence p, and \(p_{q}\) is the \(q\)-th number in the chaotic sequence. The remaining parameters, including d, c, and μ, are the control parameters that determine the chaotic behavior of the dynamic system. The initial point p0 was set to 0.7 for all CMs, as the initial values of CMs may have a great influence on their fluctuation patterns. In this paper, ten different CMs were applied in the optimization process: the Chebyshev, circle, gauss/mouse, iterative, logistic, piecewise, sine, singer, sinusoidal, and tent maps [74].
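Two of the maps can be sketched as follows. Since Table 3 is not reproduced here, the formulas below are the variants commonly used in chaotic metaheuristics (an assumption, not a verbatim copy of the authors' table): the tent map with breakpoint 0.7, and the logistic map with control parameter μ = 4 (fully chaotic regime).

```python
def tent_map(p):
    """Tent map (common chaotic-metaheuristic variant, breakpoint 0.7)."""
    return p / 0.7 if p < 0.7 else (10.0 / 3.0) * (1.0 - p)

def logistic_map(p, mu=4.0):
    """Logistic map p_{q+1} = mu * p_q * (1 - p_q), chaotic for mu = 4."""
    return mu * p * (1.0 - p)

def chaotic_sequence(map_fn, p0=0.7, length=5):
    """Iterate a 1-D map from the initial point p0 to build a sequence."""
    seq, p = [], p0
    for _ in range(length):
        p = map_fn(p)
        seq.append(p)
    return seq
```

Each map is deterministic: the same p0 always yields the same sequence, yet the values wander over (0, 1) in a quasi-stochastic, ergodic way, which is exactly the property exploited when they replace the uniform random numbers of the VSA.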
Descriptions of State 1, State 2, and State 3 are as follows:
- State 1:
-
The VSA generates candidate solutions using just a single 'center' (μ). This center is transformed into a new center as iterations pass, within the upper and lower bounds of the problem. This mechanism has some problems; one of them is that the VSA tends to be trapped in local minima when facing problems with many local minimum points. To overcome this, the CM-based generation of candidate solutions for the VSA is proposed.
In this method, chaotic maps are used to generate candidate solutions. Several neighbor solutions \( C_{t} \left( s \right)\) (t indicates the iteration index, with t = 0 at the initial stage) are generated randomly around the initial center µ0 in the d-dimensional space by using a Gaussian distribution and CMs. Here, \(C_{0} \left( s \right) = \left\{ {s_{1} ,s_{2} , \ldots ,s_{m} } \right\}, m = 1,2,3, \ldots ,n\) represents the solutions, and n represents the total number of candidate solutions. The formula of the proposed method is given in Eq. (9).
where d represents the dimension, cm is the \(d \times 1\) vector of a CM variable, µ is the d × 1 vector of the sample mean (center), and Σ is the covariance matrix.
- State 2:
-
If the solution is out of range, these mappings are used to move it into the desired range. During the selection phase, the best solution \(s^{\prime} \in C_{0} \left( s \right)\) is selected and memorized from C0(s) to replace the current circle center µ0. Before the selection phase, it must be ensured that the candidate solutions are inside the search boundaries. To attain this goal, the solutions that exceed the boundaries are shifted back into the boundaries, as in Eq. (10), which describes the VSA combined with chaotic sequences. In Eq. (10), \(Cm\left( i \right)\) is the value of the chaotic map obtained at the \(i\)-th iteration.
- State 3:
-
The search radius is reduced using the inverse incomplete gamma function and CMs. In the VSA, the inverse incomplete gamma function is used to decrease the value of the radius at each iteration. The incomplete gamma function has been given in Eq. (11).
where a > 0 is known as the shape parameter and cm ≥ 0 is a CM variable.
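As an illustration of State 2 (a sketch under our reading of Eq. (10), not the authors' code), the uniform random number of Eq. (5) is simply replaced by a chaotic value Cm in (0, 1) when a component leaves the search range:

```python
import numpy as np

def chaotic_shift_into_bounds(candidates, lower, upper, chaos_values):
    """State 2 sketch of Eq. (10): out-of-range components are shifted
    back inside [lower, upper] using a chaotic value instead of the
    uniform rand of Eq. (5). chaos_values must broadcast to the shape
    of candidates."""
    c = np.asarray(candidates, dtype=float).copy()
    out = (c < lower) | (c > upper)
    shifted = chaos_values * (upper - lower) + lower  # chaotic, not uniform
    c[out] = np.broadcast_to(shifted, c.shape)[out]
    return c
```

Because the chaotic sequence is deterministic and ergodic, the repositioned components sweep the feasible range without the clustering that a poor pseudo-random stream can produce.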
In the current study, the chaotic VSA has been implemented as an FS algorithm based on the wrapper method. In VSA, a chaotic sequence is embedded in the search iterations, and the optimal feature subset that describes the dataset is selected using VSA. The FS strategy is aimed at improving the classification efficiency, reducing the length of feature subset, and reducing the computational costs.
4.1 Fitness function
At each iteration, every position is evaluated using a fitness function, fit. The data are randomly divided into two parts, namely training and testing datasets, using the m-fold technique. Two criteria are used for evaluation: the classification accuracy and the number of selected features. The adopted fitness function combines the two criteria into one by setting a weight factor, as in Eq. (12). Here, a is the classification accuracy, calculated by dividing the number of correctly classified instances by the total number of instances. The K-nearest neighbor (KNN) classifier [80] is used, with k equal to three and the absolute (Manhattan) distance. KNN is a supervised learning algorithm that classifies a new instance based on its distances to the training samples. The KNN classifier predicts the class of the testing sample by calculating and sorting the distances between the testing sample and each of the training samples. This process is repeated until each datum in the dataset has been selected once as the testing sample. The classification accuracy of a feature subset is the ratio of the number of correctly predicted samples to the number of all samples. In this paper, KNN was used for determining the fitness of the selected features; the choice of k and of the distance method was decided based on trial and error. Ls is the length of the selected feature subset, Ln is the total number of features, and β is the weight factor, with a value in [0, 1], which controls the relative importance of the classification accuracy and the number of selected features. Since improving accuracy is the primary goal of any classifier, the weight factor is usually set to values near 1 [81]. In this paper, β was set to 0.8. The best solution maximizes the classification accuracy and minimizes the number of selected features [81].
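The wrapper fitness can be sketched as follows. This is a simplified stand-in, not the authors' implementation: the KNN evaluator below uses leave-one-out validation with Manhattan distance rather than the paper's m-fold split, and the fitness assumes the common form fit = β·a + (1 - β)·(1 - Ls/Ln), consistent with the criteria described for Eq. (12):

```python
import numpy as np

def knn_loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a k-NN classifier (Manhattan distance)."""
    n = len(y)
    correct = 0
    for i in range(n):
        d = np.abs(X - X[i]).sum(axis=1)   # L1 distance to every sample
        d[i] = np.inf                      # exclude the test sample itself
        nn = np.argsort(d)[:k]             # indices of the k nearest samples
        votes = np.bincount(y[nn])         # majority vote among neighbors
        correct += int(votes.argmax() == y[i])
    return correct / n

def fitness(mask, X, y, beta=0.8, k=3):
    """Eq. (12) sketch: beta * accuracy + (1 - beta) * (1 - Ls/Ln)."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0                         # empty subset gets worst fitness
    acc = knn_loo_accuracy(X[:, mask], y, k)
    return beta * acc + (1.0 - beta) * (1.0 - mask.sum() / mask.size)
```

With β = 0.8, a subset that keeps half of the features and classifies perfectly scores 0.8·1.0 + 0.2·0.5 = 0.9: accuracy dominates, but shorter subsets break ties.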
5 Result and discussion
In this section, first a summary of the main characteristics of the implemented datasets will be discussed. Second, the proposed methods (State1, State2 and State3) using different CMs will be investigated. Third, comparisons will be made between VSA and the proposed method based on FS. Finally, to emphasize the advantages of the proposed method compared to other algorithms, different experiments will be described and the obtained results will thoroughly be discussed.
5.1 Datasets description
Twenty-four benchmark datasets of different types, including medical/biology and business, were used in the experiments. Four datasets (21, 22, 23, and 24) relate to the identification and classification of text authors. The datasets were collected from the UCI machine learning repository [82]; a short description of each adopted dataset is given in Table 4. As can be observed, the datasets contain missing values in some records. In the current study, each missing numeric value was replaced by the median of all known values of the given feature within its class. The mathematical definition of the median method is given in Eq. (13), where \(S_{i,j}\) is the missing value of the \(j^{th}\) feature for a given \(i^{th}\) class W. Missing categorical values are replaced with the most frequent value of the feature within the given class.
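The imputation rule above can be sketched per feature column. This is an illustrative stdlib-only version, assuming the column contains the known values of one feature for one class, with `None` marking gaps.

```python
from statistics import median, mode

# Hedged sketch of the imputation described above: numeric gaps get the
# median of the known values for that feature within the same class;
# categorical gaps get the most frequent value (statistics.mode).
def impute_column(values):
    known = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in known):
        fill = median(known)
    else:
        fill = mode(known)
    return [fill if v is None else v for v in values]

print(impute_column([1.0, None, 3.0, 5.0]))   # numeric: filled with 3.0
print(impute_column(["a", "b", None, "a"]))   # categorical: filled with "a"
```

In practice this would be applied once per (feature, class) pair before any feature selection takes place.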
Four statistical measurements were adopted: the worst, the best, and the mean fitness value, together with the standard deviation (SD). In the current study, these measures were used to evaluate the performance of each CM and determine the best one. They are mathematically defined as follows:
where BS is the best score obtained at each iteration.
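The four report statistics can be computed over the best scores of the independent runs as sketched below, assuming fitness is minimized (so the largest recorded best score is the "worst").

```python
from statistics import mean, stdev

# Sketch of the four report statistics over BS, the best score obtained
# in each independent run (assuming minimization of fitness).
def summarize(best_scores):
    return {
        "worst": max(best_scores),  # largest best-fitness across runs
        "best": min(best_scores),   # smallest best-fitness across runs
        "mean": mean(best_scores),
        "sd": stdev(best_scores),
    }

print(summarize([0.10, 0.13, 0.11, 0.12]))
```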
5.2 Analysis and discussion
Four criteria (worst, best, mean, and SD) were used to evaluate the methods on the different datasets. Table 5 reports the 30 modes and the VSA under these criteria. Proposed Method (State1) corresponds to modes VSAC11 to VSAC101, Proposed Method (State2) to modes VSAC12 to VSAC102, and Proposed Method (State3) to modes VSAC13 to VSAC103. The results indicate that Proposed Method (State2) performs best; Proposed Method (State2) with the VSAC101 mode, using the Tent map, offers better results than the other modes. The main purpose of this test is to evaluate the efficiency of VSA with different chaotic maps and identify the optimal chaotic map (Tables 6, 7, 8).
5.3 Comparisons between VSA and proposed method based on FS
Table 9 and Fig. 2 show the results of the VSA and the Proposed Method for FS. We chose the Proposed Method for FS because it achieved a high percentage of accuracy. Based on the results, the Proposed Method outperforms the VSA on 19 of the datasets.
5.4 Comparison and evaluation
The Proposed Method was compared with the GA, PSO, ABC, BOA, IWO, FPA, and FA algorithms to evaluate its efficiency. Table 10 lists the control parameters of the algorithms.
The Proposed Method was first compared with PSO, ABC, BOA, IWO, GA, FA, FPA, and VSA according to the worst criterion. As Table 11 and Fig. 3 show, the results of the other algorithms are worse than those of the Proposed Method.
In Table 12, the comparison of the Proposed Method with PSO, ABC, BOA, IWO, GA, FA, FPA, and VSA was performed based on the best criterion. According to Table 12 and Fig. 3, the results of the Proposed Method are better than those of the other algorithms.
In Table 13, the comparison of the Proposed Method with PSO, ABC, BOA, IWO, GA, FA, FPA, and VSA was performed based on the mean criterion. According to Table 13 and Fig. 3, the results of the Proposed Method are again better than those of the other algorithms.
To sum up, the results and discussion in this paper demonstrate that integrating CMs into the VSA is clearly beneficial. The Proposed Method outperforms the other algorithms because the Tent chaotic map helps it strongly emphasize exploration in the initial steps of optimization and then reduce the search radius.
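A Tent map iteration can be sketched in a few lines. The parameterization below (breakpoint at 0.7) is one common form used in chaotic metaheuristics; the paper's exact map definition may differ.

```python
# Minimal sketch of the Tent chaotic map often used in chaotic
# metaheuristics: x_{k+1} = x_k / 0.7 if x_k < 0.7, else (10/3)(1 - x_k).
# This breakpoint (0.7) is one common parameterization, assumed here.
def tent_sequence(x0, n):
    xs, x = [], x0
    for _ in range(n):
        x = x / 0.7 if x < 0.7 else (10.0 / 3.0) * (1.0 - x)
        xs.append(x)
    return xs

# Chaotic values in (0, 1) can scale the VSA search radius per iteration,
# perturbing it enough to escape local optima early on.
print(tent_sequence(0.4, 5))
```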
6 Real application: author identification
Author identification is a stylometric problem that tries to attribute a text to its original author [85, 86]. With the ever-increasing volume of documents uploaded to the internet, new methods for analyzing and extracting data and knowledge are needed. To prevent plagiarism and the copying of copyrighted materials, the best solution is authorship identification. Every writer has his or her own writing style, and that style can be recognized across the writer's other manuscripts [87]. Authorship identification is one of the current problems in the field of natural language processing. It is an effort to reveal the writer's personal characteristics based on a piece of linguistic information [88], such that manuscripts written by different authors can be distinguished. Humans exhibit certain patterns in how they use a language in their writing, which act like fingerprints of the writer (writer print); these patterns are specific to each writer [89].
Authors in [90] proposed the stylometric approach to the problem of author identification. There are four steps in this approach:
-
Calculating word frequencies to find the most frequent words in the entire corpus.
-
Calculating the normalized frequency, by dividing the frequency of the most frequent word in a document by the total number of words in the entire corpus.
-
Applying the Z-score method.
-
Calculating a distance table from the distance between the two matrices.
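The four steps above can be sketched as a small Burrows'-Delta-style pipeline. The whitespace tokenization, the per-document normalization, and the Manhattan distance used here are illustrative assumptions, not the cited setup.

```python
from collections import Counter
from statistics import mean, stdev

# Hedged sketch of the four steps above: frequent words, normalized
# frequencies, column-wise z-scores, and a profile distance.
def zscore_profiles(documents, top_n=3):
    tokenized = [doc.lower().split() for doc in documents]
    corpus = [w for toks in tokenized for w in toks]
    # Step 1: most frequent words over the whole corpus.
    frequent = [w for w, _ in Counter(corpus).most_common(top_n)]
    # Step 2: normalized frequency of each frequent word per document.
    freqs = [[toks.count(w) / len(toks) for w in frequent]
             for toks in tokenized]
    # Step 3: z-score each word column across the documents.
    cols = list(zip(*freqs))
    return [[(f - mean(c)) / (stdev(c) or 1.0) for f, c in zip(row, cols)]
            for row in freqs]

# Step 4: distance between two document profiles (rows of the matrix).
def profile_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))
```

Documents whose z-score profiles lie close together under this distance are attributed to the same author.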
Since the text is thereby converted into a numeric representation (feature extraction), machine learning classification and clustering techniques can be applied to it. The Reuter_50_50 dataset is used for the experiments. It contains 50 authors with 50 documents each, so both the training corpus and the test corpus contain 2500 texts, and the two corpora do not overlap. Applying the stylometry approach and n-gram features to the author identification problem, the SVM classifier achieves an accuracy of about 85%, which is higher than that of the Delta and KNN classifiers.
The Dissimilarity Counter Method (DCM), DCM-Voting, and DCM-Classifier have been applied in [91] to the problem of author identification. Once the representation spaces are selected, similarity measures such as Euclidean distance, correlation coefficient, and cosine can be used to compare the documents, and the document author can then be identified using one of the above-mentioned approaches (DCM, DCM-Voting, or DCM-Classifier). DCM uses only the similarities between vector representations of documents in one space to solve a problem p of P. The other two DCM-based approaches make it possible to combine different representation spaces: DCM-Voting does so through a voting technique, while DCM-Classifier does so through a supervised learning method that requires the definition of predictive features. In the evaluation of the PAN-CLEF 2013 challenge, the DCM-Classifier had the best performance only on the Greek corpus, with 85%, while DCM-Voting and DCM-Classifier obtained results that were the best or equivalent to the winner of the competition for all evaluation measures (F1, precision, and recall) on all the corpora.
The General Impostors Method (GENIM), which took part in the PAN'13 authorship identification competition, has been evaluated in [92]. The basis of this model is a comparison between the given documents and a number of external (impostor) documents, and since their method has two stages, performance had to be measured and parameters optimized at each stage. Between 25 and 33 percent of the training documents of each language were used for measuring and optimizing the IM, while the rest were used for evaluating GENIM. For the IM evaluation set, 3 or 4 documents were used as seed documents to retrieve the web impostors. The test accuracy equals 75.3%.
Blocks containing 140, 280, and 500 characters were investigated. The feature set contains conventional syntactic, lexical, and application-specific features, plus new features extracted from n-gram analysis. Moreover, the proposed approach has a mechanism for handling issues related to unbalanced datasets. It uses a Support Vector Machine (SVM) for data classification and Information Gain and Mutual Information as the FS strategy. The approach was evaluated experimentally on the Enron email and Twitter corpora. The results were very promising, with an Equal Error Rate (EER) between 9.98% and 21.45% for the different block sizes [93].
In [94], a model for email authorship identification (EAI) is presented using a cluster-based classification approach. The contributions of this paper are: (a) developing a new model for email authorship identification; and (b) evaluating the use of additional features together with basic stylometric features, as well as content features based on Info-Gain FS. On the Enron dataset, the proposed model achieved accuracies of 94, 89, and 81 percent for 10, 25, and 50 authors, respectively, whereas on a real email dataset constructed by the authors, it attained an accuracy of 89.5%.
Many studies focus only on enhancing predictive accuracy and pay little attention to the intrinsic value of the collected evidence. In [95], a customized associative classification approach, a well-known data mining technique, is applied to the authorship attribution problem. This method models the features of writing style that are unique to a person, measures the associativity level of these features, and generates an intuitive classifier. The research also concludes that a more accurate write print can be obtained by modifying the rule pruning and ranking system described in the popular Classification by Multiple Association Rule (CMAR) algorithm. More convincing evidence can be provided to a court of law by eliminating patterns common among different authors, since this leads to fairly unique and easy-to-understand write prints. Because this customized associative classification method helps solve the problem of e-mail authorship attribution, it can serve as a powerful tool against cybercrime. The effectiveness of the presented approach is verified by experimental results [95].
An effort is made by the authors in [96] to identify the authors of articles written in Arabic. They introduced a new dataset composed of 12 features and 456 samples from 7 authors. Furthermore, to distinguish the authors from each other, powerful classification techniques were combined with the proposed dataset in their approach. The obtained results revealed that the proposed dataset was very successful, achieving a classification accuracy of 82% in hold-out tests. They also conducted experiments with two well-known classifiers, the SVM and functional trees (FT), to show the efficiency of the proposed feature set. They reported an accuracy of 82% with the FT approach and hold-out testing, which confirmed the robustness of the proposed feature set; moreover, an accuracy of 100% was achieved in one of the classes. They also ran tests on FT using tenfold cross-validation, and the proposed approach largely retained its accuracy.
One family of classifiers that has been used extensively for language processing is the Naive Bayes (NB) classifiers. Nevertheless, the event model used, which can remarkably affect classifier performance, is not often stated. Since NB classifiers had never been applied to authorship attribution in Arabic, the authors of [97] proposed to apply them to this problem, considering various event models: simple NB, multinomial NB (MNB), multi-variate Bernoulli NB (MBNB), and multi-variate Poisson NB (MPNB). MBNB's probability estimation depends on whether a feature exists or not, whereas in MNB and MPNB the probability estimation depends on the frequency of the feature; in the NB model, the mean and standard deviation of the features form the basis of probability estimation. The performances of these models are evaluated on a large Arabic dataset taken from books by 10 different authors and compared with other methods. The obtained results reveal that MBNB outperforms the other techniques and can identify the author of a text with an accuracy of 97.43%. In addition, these results show that MNB and MBNB can be considered good choices for authorship attribution [97].
In [98], authorship identification methods were applied to messages of an Arabic web forum. In this study, syntactic, lexical, structural, and content-specific writing style features were used to identify the authors. Some of the problematic characteristics of the Arabic language were addressed in order to present a model with an acceptable degree of classification accuracy for authorship identification. SVM performed better than C4.5, and the overall accuracy for Arabic was lower than that reported for English; these results are consistent with previous research. Finally, as future work, the authors proposed analyzing the differences between the two languages by evaluating the key features determined by decision trees: highlighting the linguistic differences between English and Arabic provides further insight into possible techniques for enhancing authorship discrimination methodologies in an online, multilingual setting. The results showed accuracies of 85.43 and 81.03 percent for SVM and C4.5, respectively.
In [99], an authorship visualization known as Write prints was developed, which can be used to identify individuals based on their writing style. The visualization creates unique writing style patterns, which can be distinguished in much the same way that fingerprint biometric systems work. Write prints uses an approach based on component analysis and a dynamic feature-based sliding-window algorithm, making it very suitable for visualizing authorship across larger groups of messages. The performance of the visualization on messages taken from three different Arabic and English forums was evaluated and compared with that of SVM. The comparison indicated that Write prints shows excellent classification performance and provides better results than SVM in many instances. The authors also concluded that the visualization can be used to identify cyber criminals and can help users authenticate fellow online members to prevent cyber fraud. Accuracies of 68.92 and 87.00 percent were obtained for Write prints and SVM, respectively.
In [100], approaches were introduced to deal with imbalanced multi-class textual datasets. The main idea is to divide the training texts into text samples based on class size so that a fairer classification model can be generated: majority classes are divided into fewer, longer samples and minority classes into many shorter samples. The authors used text sampling techniques to form a training set with a desirable distribution over the classes; by text sampling, they created new synthetic data that artificially increased the training size of a class. A series of authorship identification experiments was conducted on different multiclass imbalanced cases belonging to two text corpora in two languages: newspaper reportage in Arabic and newswire stories in English. The results revealed the properties of the presented techniques. They tested four methods for dealing with the class imbalance problem [100]:
-
The first method: under-sampling majority classes based on training texts. The same amount of text, equal to the base, was used, and no modification was applied to the length of each text.
-
The second method: under-sampling majority classes based on training text lines. All the training texts of a particular author were merged into one big text. Assuming xmin is the size (in text lines) of the shortest big file, the first xmin text lines of each big file were segmented into text samples of length a (in text lines). Each text line in both corpora contained at least one complete sentence. It was concluded that smaller values of a (such as 2 or 3) lead to better results.
-
The third method: re-balancing the dataset with text samples of varying length. As mentioned earlier, one big file is generated for each author by concatenating the training texts. The length of each text sample is xi/k (where k is a predefined parameter), so short text samples belong to minority authors and long text samples to majority authors. A balanced dataset is thus generated, consisting of k text samples per class. Experiments were conducted for k = 10, 20, and 50. Each text line of the training corpus is used exactly once in the text samples.
-
The fourth method: re-balancing the dataset through text re-sampling. A big file is again generated for each author. Assuming xi is the text length (in text lines) of the \(i^{th}\) author's file and xmax is that of the longest file, then k + xmax/xi text samples are generated for each author, each consisting of xi/k text lines. A variable number of text samples is therefore generated for each author based on the length of the big file, but the relationship is now inverted: longer text samples are generated for the majority classes, while a large number of short text samples are generated for the minority classes.
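The third method above, which is the simplest of the four, can be sketched as follows. The per-author line lists and the equal-segmentation policy are illustrative assumptions.

```python
# Hedged sketch of the third re-balancing method above: each author's
# concatenated training lines are split into k equal-length samples, so
# every class contributes k samples regardless of its original size.
def rebalance_by_segmentation(author_lines, k):
    """author_lines: dict author -> list of text lines.
    Returns dict author -> list of k text samples (each a list of lines)."""
    samples = {}
    for author, lines in author_lines.items():
        size = max(1, len(lines) // k)  # sample length x_i / k (in lines)
        chunks = [lines[i:i + size] for i in range(0, size * k, size)]
        samples[author] = chunks[:k]
    return samples

# A majority author (100 lines) and a minority author (20 lines) both
# end up contributing exactly k = 10 samples of different lengths.
data = {"A": [f"a{i}" for i in range(20)], "B": [f"b{i}" for i in range(100)]}
balanced = rebalance_by_segmentation(data, k=10)
print([len(v) for v in balanced.values()])  # [10, 10]
```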
Using a dataset extracted from Arabic novels, the authors of [101] modified this into two sets of words, AFW54 and AFW65, with 11 words eliminated. These two sets were used to convert several Arabic texts into frequency vectors. They evaluated the performance of these word sets through experiments that used a hybridization of an EA and LDA to generate a classifier, and then fed unseen data to that classifier to test it. The obtained performance was consistent with the results of authorship attribution research on other languages. It is arguable that AFW54 is the more suitable choice; nevertheless, such a claim cannot be made with any statistical significance. For the cases considered here, only a small number of investigations were reported for evaluating the appropriate 'chunk' size. In real-world applications this will probably depend on several factors, but they identified a chunk size of at least about 1,000 for characterizing function-word usage by Arabic authors. Through this work, they confirmed that the concept of function words translates properly into the Arabic language: different authors use this set of words in different ways, which makes it possible to recognize the stylistic features of individual authors and use them to distinguish between authors [101].
High-dimensional datasets bring more computational challenges. One problem with them is that, in most cases, not all features of the data are crucial for the knowledge implicit in it [85, 102]; consequently, reducing the dimensionality of the data is often desirable. Many candidate features for learning are irrelevant or redundant and degrade the efficiency of the learning algorithm [103, 104]: learning accuracy and training speed may worsen with superfluous features. Therefore, choosing the necessary features in the preprocessing phase is essential. In this section, to identify the author, the frequency of words is first obtained using the TF-IDF method [105]. At the second stage, each feature is weighted [106]. At the third stage, FS is performed using metaheuristic algorithms. At the fourth stage, classification is performed via KNN [106].
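Stages one and four of this pipeline can be sketched with the standard library alone. This is an illustrative version: the whitespace tokenization, the smoothed IDF form, the cosine similarity, and the 1-NN decision rule are assumptions, and the metaheuristic FS stage (stage three) would simply mask columns of these vectors before matching.

```python
import math
from collections import Counter

# Hedged stdlib-only sketch of the TF-IDF + KNN stages described above.
def tfidf_vectors(docs):
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    df = Counter(w for toks in tokenized for w in set(toks))  # document freq
    n = len(docs)
    # TF = term count / doc length; IDF smoothed so ubiquitous words -> 0.
    return [[(Counter(toks)[w] / len(toks)) * math.log((1 + n) / (1 + df[w]))
             for w in vocab] for toks in tokenized]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict_author(train_vecs, train_labels, query_vec):
    # 1-NN by cosine similarity over the TF-IDF vectors.
    sims = [cosine(v, query_vec) for v in train_vecs]
    return train_labels[sims.index(max(sims))]
```

Vectorizing the training and query documents together keeps them in the same vocabulary space; the query is then attributed to the author of its most similar training document.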
Furthermore, we used accuracy as the evaluation measure, calculated as \( \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \), where TP is the number of authors correctly placed in the positive class, TN is the number correctly placed in the negative class, FP is the number of authors falsely considered positive by the model, and FN is the number falsely considered negative even though they were positive.
6.1 Reuter_50_50 dataset
In this subsection, the Proposed Method and the other algorithms are applied to the Reuter_50_50 dataset, which contains 2500 documents by 50 writers (https://archive.ics.uci.edu/ml/datasets/reuter_50_50). The results of the discussed algorithms and those from other papers are presented in Table 14 and Fig. 4. The results show that the proposed method has better identification accuracy than the other algorithms; moreover, BOA and FPA also achieve better identification accuracy than the remaining algorithms.
6.2 PAN dataset
These datasets consist of documents in Greek, English, and Spanish, and since 2011 a new dataset has been added to the existing ones every year (https://pan.webis.de). The results of the discussed algorithms and those from other papers on these datasets are evaluated in Table 15 and Fig. 4. The identification accuracies of the proposed method for PAN11, PAN12, PAN13, PAN14, PAN15, and PAN16 are 84%, 80.9%, 81.3%, 82.12%, 83.25%, and 81.79%, respectively. The identification accuracies of the DCM models are lower than those of the other algorithms, while BOA, ABC, and IWO achieve better identification accuracies than GA, PSO, FPA, and FA.
6.3 Enron email dataset
This dataset was collected and prepared by the CALO project (a cognitive assistant that learns and organizes). It includes the emails of 150 users, mostly senior managers of Enron (https://www.cs.cmu.edu/~enron/). The results of the proposed method and those from other papers on the Enron email dataset are presented in Table 16 and Fig. 4. The results show that the accuracy and error rate of the proposed method are 95.04 and 11.68, respectively. The accuracies of the PSO, BOA, and FPA algorithms are 91.02, 93.01, and 90.78, respectively, while the accuracy and error rate of the ABC algorithm are 90.02 and 15.2. Among the other models, CCM-10 has the best accuracy, and the lowest accuracies are seen for the Naïve Bayes and Bayes Net models.
6.4 Arabic scripts
This dataset consists of 30 documents from 10 authors. The authors were chosen from the website (http://www.alwaraq.net); their names are: Aljahedh, Alghazali, Alfarabi, Almas3ody, Almeqrezi, Altabary, Altow7edy, Ibnaljawzy, Ibnrshd, and Ibnsena. The results of the proposed method and those from other papers on this dataset are presented in Table 17 and Fig. 4. The identification accuracy of the proposed method is 93.24%, which is better than that of the other models.
According to the experimental results, the Proposed Method performs better than the other models in terms of identification accuracy. According to Tables 15, 16, and 17, the proposed method comes closest to the minimum on the benchmark functions compared with the FPA, IWO, BOA, ABC, PSO, GA, and FA algorithms, and it also achieves better accuracy on the author identification problem. The accuracy rates of ABC, BOA, and the proposed method are given in Table 17: the proposed method reaches 93.24%, outperforming the ABC (91.00%) and BOA (92.51%) models.
7 Conclusion and future works
In this paper we proposed three States for FS based on hybridizing chaotic maps with the VSA. State2 was compared with State1, State3, and the VSA and achieved better values; we therefore used State2 for FS and text author identification. Ten CMs were employed to enhance the overall performance and precision of the VSA, which was then applied to one of the challenging problems, namely FS. The proposed methods were evaluated on 24 benchmark datasets using four evaluation criteria: worst, best, mean, and SD. The performance of the Proposed Method was also compared with popular and recent algorithms: PSO, ABC, BOA, IWO, GA, FA, FPA, and VSA. The experimental results show that State2 outperforms the other algorithms in terms of best and mean fitness.
Moreover, the results showed that the Proposed Method (State2) with the Tent map can drastically enhance the VSA in terms of classification performance, stability, number of selected features, and convergence speed, and that the Tent map was the best-performing map. Therefore, the following conclusions can be drawn:
-
The CMs improve the exploration phase because they change the search radius, helping trapped masses release themselves from local minima.
-
The CMs allow the proposed method to adaptively adjust exploration and exploitation. In other words, the Proposed Method (State1) encourages the VSA to transit gradually from the exploration stage to the exploitation stage.
Future work will consider integrating CMs into other metaheuristic algorithms, and the VSA's performance will be verified on more challenging scientific and real-world engineering problems.
References
Gharehchopogh FS, Gholizadeh H (2019) A comprehensive survey: whale optimization algorithm and its applications. Swarm Evol Comput 48:1–24
Shayanfar H, Gharehchopogh FS (2018) Farmland fertility: a new metaheuristic algorithm for solving continuous optimization problems. Appl Soft Comput 71:728–746
Razmjooy N, Khalilpour M, Ramezani M (2016) A new meta-heuristic optimization algorithm inspired by FIFA world cup competitions: theory and its application in PID designing for AVR system. J Control Autom Electr Syst 27(4):419–440
Razmjooy N, Ramezani M (2014) An improved quantum evolutionary algorithm based on invasive weed optimization. Indian J Sci Res 4(2):413–422
Gharehchopogh FS, Shayanfar H, Gholizadeh H (2019) A comprehensive survey on symbiotic organisms search algorithms. Artificial Intelligence Review
Harrison KR, Engelbrecht AP, Ombuki-Berman BM (2016) Inertia weight control strategies for particle swarm optimization. Swarm Intell 10(4):267–305
Xing B, Gao W-J (2014) Invasive weed optimization algorithm. In: Xing B, Gao W-J (eds) Innovative computational intelligence: a rough guide to 134 clever algorithms. Springer International Publishing, Cham, pp 177–181
Qi X, Zhu Y, Zhang H (2017) A new meta-heuristic butterfly-inspired algorithm. J Comput Sci 23:226–239
Karaboga D (2005) An idea based on honeybee swarm for numerical optimization. Technical Report TR06, Erciyes University, Engineering Faculty, Computer Engineering Department
Pan W-T (2012) A new fruit fly optimization algorithm: taking the financial distress model as an example. Knowl-Based Syst 26:69–74
Yang XS (2008) Nature-Inspired Metaheuristic Algorithms. Luniver Press, United Kingdom
Gandomi AH, Alavi AH (2012) Krill herd: A new bio-inspired optimization algorithm. Commun Nonlinear Sci Numer Simul 17(12):4831–4845
Storn R, Price K (1996) Minimizing the real functions of the ICEC'96 contest by differential evolution. In: Proceedings of IEEE International Conference on Evolutionary Computation
Yang X-S (2012) Flower pollination algorithm for global optimization. In: Unconventional computation and natural computation. Berlin, Heidelberg
Navid R et al (2019) A comprehensive survey of new meta-heuristic algorithms. In: Recent advances in hybrid metaheuristics for data clustering, p 1–25
Ali N, Mehdi R, Navid R (2016) A New Meta-Heuristic Algorithm for Optimization Based on Variance Reduction of Gaussian distribution. Majlesi J Electr Eng 10(4):49–56
Li B, Jiang W (1998) Optimizing complex functions by chaos search. J Cybern Syst 29:409–419
Li Y-Y, Wen Q-Y, Li L-X (2009) Modified chaotic ant swarm to function optimization. J China Univ Posts Telecommun 16(1):58–63
Yi J, Jian D, Zhenhong S (2017) Pattern synthesis of MIMO radar based on chaotic differential evolution algorithm. Optik 140:794–801
He Y et al (2014) A novel chaotic differential evolution algorithm for short-term cascaded hydroelectric system scheduling. Int J Electr Power Energy Syst 61:455–462
Wang G-G et al (2014) Chaotic krill herd algorithm. Inf Sci 274:17–34
Prasad D, Mukherjee A, Mukherjee V (2017) Application of chaotic krill herd algorithm for optimal power flow with direct current link placement problem. Chaos Solitons Fractals 103:90–100
Yousri D et al (2019) Chaotic flower pollination and grey wolf algorithms for parameter extraction of bio-impedance models. Appl Soft Comput 75:750–774
Yousefi M et al (2018) Chaotic genetic algorithm and Adaboost ensemble metamodeling approach for optimum resource planning in emergency departments. Artif Intell Med 84:23–33
Hong W-C et al (2013) Cyclic electric load forecasting by seasonal SVR with chaotic genetic algorithm. Int J Electr Power Energy Syst 44(1):604–614
Chen K, Zhou F, Liu A (2018) Chaotic dynamic weight particle swarm optimization for numerical function optimization. Knowl-Based Syst 139:23–40
Chuang L-Y, Hsiao C-J, Yang C-H (2011) Chaotic particle swarm optimization for data clustering. Expert Syst Appl 38(12):14555–14563
Liu L et al (2018) Research on ships collision avoidance based on chaotic particle swarm optimization. In: Advances in smart vehicular technology, transportation, communication and applications. Springer International Publishing, Cham
Ji J et al (2017) Self-adaptive gravitational search algorithm with a modified chaotic local search. IEEE Access 5:17881–17895
García-Ródenas R, Linares LJ, López-Gómez JA (2019) A memetic chaotic gravitational search algorithm for unconstrained global optimization problems. Appl Soft Comput 79:14–29
Wang Y et al (2019) A hierarchical gravitational search algorithm with an effective gravitational constant. Swarm Evol Comput 46:118–139
Hong W-C et al (2019) Novel chaotic bat algorithm for forecasting complex motion of floating platforms. Appl Math Model 72:425–443
Wang H, Tan L, Niu B (2019) Feature selection for classification of microarray gene expression cancers using Bacterial Colony Optimization with multi-dimensional population. Swarm Evol Comput 48:172–181
Arora S, Anand P (2019) Binary butterfly optimization approaches for feature selection. Expert Syst Appl 116:147–160
Zakeri A, Hokmabadi A (2019) Efficient feature selection method using real-valued grasshopper optimization algorithm. Expert Syst Appl 119:61–72
Bolón-Canedo V, Alonso-Betanzos A (2019) Ensembles for feature selection: A review and future trends. Inf Fusion 52:1–12
Papa JP et al (2018) Feature selection through binary brain storm optimization. Comput Electr Eng 72:468–481
Guvenc U, Duman S, Hinislioglu Y (2017) Chaotic Moth Swarm Algorithm. In: 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA)
Wang S et al (2017) Multiple chaotic cuckoo search algorithm. In: Advances in Swarm Intelligence. Springer International Publishing, Cham
Rizk-Allah RM, Hassanien AE, Bhattacharyya S (2018) Chaotic crow search algorithm for fractional optimization problems. Appl Soft Comput 71:1161–1175
Chahkandi V, Yaghoobi M, Veisi G (2013) CABC–CSA: a new chaotic hybrid algorithm for solving optimization problems. Nonlinear Dyn 73:475–484
Zhang Y, Zhou W, Yi J (2016) A novel adaptive chaotic bacterial foraging optimization algorithm. In: 2016 International conference on computational modeling, simulation and applied mathematics (CMSAM 2016), p 1–8
Jia D, Zheng G, Khan MK (2011) An effective memetic differential evolution algorithm based on chaotic local search. Inf Sci 181(15):3175–3187
Thangaraj R et al (2012) Opposition-based chaotic differential evolution algorithm for solving global optimization problems. In: 2012 Fourth World Congress on Nature and Biologically Inspired Computing (NaBIC)
Du P, Tian Z, Yan S (2014) A quantum glowworm swarm optimization algorithm based on chaotic sequence. Optimization 7(9)
Mitić M et al (2015) Chaotic fruit fly optimization algorithm. Knowl-Based Syst 89:446–458
Gandomi AH et al (2013) Chaos-enhanced accelerated particle swarm optimization. Commun Nonlinear Sci Numer Simul 18(2):327–340
Yao J-F et al (2001) A new optimization approach-chaos genetic algorithm. Syst Eng 1:015
Li J-W, Cheng Y-M, Chen K-Z (2014) Chaotic particle swarm optimization algorithm based on adaptive inertia weight. In: The 26th Chinese Control and Decision Conference (2014 CCDC). IEEE
Xu X et al (2018) CS-PSO: chaotic particle swarm optimization algorithm for solving combinatorial optimization problems. Soft Comput 22(3):783–795
Sayed GI, Khoriba G, Haggag MH (2018) A novel chaotic salp swarm algorithm for global optimization and feature selection. Appl Intell, p 1–20
Tuba E et al (2018) Chaotic elephant herding optimization algorithm. In: 2018 IEEE 16th World Symposium on Applied Machine Intelligence and Informatics (SAMI). IEEE
Gandomi AH, Yang X-S (2014) Chaotic bat algorithm. J Comput Sci 5(2):224–232
Pan G, Xu Y (2016) Chaotic glowworm swarm optimization algorithm based on Gauss mutation. In: 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). IEEE
Aslani H, Yaghoobi M, Akbarzadeh-T M-R (2015) Chaotic inertia weight in black hole algorithm for function optimization. In: 2015 International Congress on Technology, Communication and Knowledge (ICTCK). IEEE
Yang X, Niu J, Cai Z (2018) Chaotic Simulated Annealing Particle Swarm Optimization Algorithm. In: 2018 2nd IEEE advanced information management, communicates, electronic and automation control conference (IMCEC). IEEE
Aggarwal S et al (2018) A social spider optimization algorithm with chaotic initialization for robust clustering. Proc Comput Sci 143(1):450–457
Zhang X, Feng T (2018) Chaotic bean optimization algorithm. Soft Comput 22(1):67–77
Boushaki SI, Kamel N, Bendjeghaba O (2018) A new quantum chaotic cuckoo search algorithm for data clustering. Expert Syst Appl 96:358–372
Tharwat A, Hassanien AE (2018) Chaotic antlion algorithm for parameter optimization of support vector machine. Appl Intell 48(3):670–686
Zhou Y, Su K, Shao L (2018) A new chaotic hybrid cognitive optimization algorithm. Cogn Syst Res 52:537–542
Mingjun J, Huanwen T (2004) Application of chaos in simulated annealing. Chaos Solitons Fractals 21(4):933–941
Teng H, Cao A (2011) A novel quantum genetic algorithm with piecewise Logistic chaotic map. In: 2011 Seventh International Conference on Natural Computation (ICNC). IEEE
Kumar Y, Singh PK (2018) A chaotic teaching learning based optimization algorithm for clustering problems. Appl Intell, p 1–27
Yüzgeç U, Eser M (2018) Chaotic based differential evolution algorithm for optimization of baker's yeast drying process. Egypt Inf J
Ibrahim RA, Elaziz MA, Lu S (2018) Chaotic opposition-based grey-wolf optimization algorithm based on differential evolution and disruption operator for global optimization. Expert Syst Appl 108:1–27
Rahman TA et al (2017) Chaotic fractal search algorithm for global optimization with application to control design. In: 2017 IEEE Symposium on Computer Applications and Industrial Electronics (ISCAIE). IEEE
Tuba E, Dolicanin E, Tuba M (2017) Chaotic brain storm optimization algorithm. In International conference on intelligent data engineering and automated learning. Springer, Berlin
Hinojosa S et al (2018) Improving multi-criterion optimization with chaos: a novel Multi-Objective Chaotic Crow Search Algorithm. Neural Comput Appl 29(8):319–335
Arora S, Anand P (2018) Chaotic grasshopper optimization algorithm for global optimization. Neural Comput Appl p 1–21
Saremi S, Mirjalili SM, Mirjalili S (2014) Chaotic Krill Herd Optimization Algorithm. Proc Technol 12:180–185
Wang G-G, Gandomi AH, Alavi AH (2013) A chaotic particle-swarm krill herd algorithm for global numerical optimization. Kybernetes 42(6):962–978
Zhenyu G et al (2006) Self-adaptive chaos differential evolution. In: International Conference on Natural Computation. Springer, Berlin
Gandomi AH et al (2013) Firefly algorithm with chaos. Commun Nonlinear Sci Numer Simul 18(1):89–98
dos Santos Coelho L, Mariani VC (2012) Firefly algorithm approach based on chaotic Tinkerbell map applied to multivariable PID controller tuning. Comput Math Appl 64(8):2371–2382
Wang L et al (2018) A new chaotic starling particle swarm optimization algorithm for clustering problems. Math Probl Eng 2018
Sayed GI, Hassanien AE, Azar AT (2017) Feature selection via a novel chaotic crow search algorithm. Neural Comput Appl, p 1–18
Kohli M, Arora S (2018) Chaotic grey wolf optimization algorithm for constrained optimization problems. J Comput Des Eng 5(4):458–472
Doğan B, Ölmez T (2015) A new metaheuristic for numerical function optimization: Vortex Search algorithm. Inf Sci 293:125–145
Martin B (1995) Instance-based learning: nearest neighbour with generalisation. Doctoral dissertation, University of Waikato
Mafarja M et al (2019) Binary grasshopper optimisation algorithm approaches for feature selection problems. Expert Syst Appl 117:267–286
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
Eberhart R, Kennedy J (1995) A new optimizer using particle swarm theory. In: MHS'95, Proceedings of the Sixth International Symposium on Micro Machine and Human Science
Villar-Rodriguez E et al (2016) A feature selection method for author identification in interactive communications based on supervised learning and language typicality. Eng Appl Artif Intell 56:175–184
Digamberrao KS, Prasad RS (2018) Author identification using sequential minimal optimization with rule-based decision tree on indian literature in Marathi. Proc Comput Sci 132:1086–1101
Bay Y, Çelebi E (2016) Feature selection for enhanced author identification of Turkish Text. In: Information sciences and systems. Springer, Cham
Zhang C et al (2014) Authorship identification from unstructured texts. Knowl-Based Syst 66:99–111
Zamani H et al (2014) Authorship identification using dynamic selection of features from probabilistic feature set. In: Information Access Evaluation, Multilinguality, multimodality, and interaction. Springer International Publishing, Cham
Nirkhi S, Dharaskar RV, Thakre VM (2014) Stylometric approach for author identification of online messages. Int J Comput Sci Inf Technol 5(5):6158–6159
Frery J, Largeron C, Juganaru-Mathieu M (2015) Author identification by automatic learning. In: 2015 13th International conference on document analysis and recognition (ICDAR)
Seidman S (2013) Authorship verification using the impostors method. In: Notebook for PAN at CLEF, p 13–16
Brocardo ML, Traore I, Woungang I (2015) Authorship verification of e-mail and tweet messages applied for continuous authentication. J Comput Syst Sci 81(8):1429–1440
Nizamani S, Memon N (2013) CEAI: CCM-based email authorship identification model. Egypt Inf J 14(3):239–249
Schmid MR, Iqbal F, Fung BCM (2015) E-mail authorship attribution using customized associative classification. Digit Investig 14:S116–S126
Otoom AF et al (2014) Towards author identification of Arabic text articles. In: 2014 5th International conference on information and communication systems (ICICS)
Altheneyan AS, Menai MEB (2014) Naïve Bayes classifiers for authorship attribution of Arabic texts. J King Saud Univ Comput Inf Sci 26(4):473–484
Abbasi A, Chen H (2005) Applying authorship analysis to arabic web content. In: Intelligence and Security Informatics. Springer, Berlin
Abbasi A, Chen H (2006) Visualizing authorship for identification. In: Intelligence and security informatics. Springer, Berlin
Stamatatos E (2008) Author identification: Using text sampling to handle the class imbalance problem. Inf Process Manage 44(2):790–799
Shaker K, Corne D (2010) Authorship Attribution in Arabic using a hybrid of evolutionary search and linear discriminant analysis. In: 2010 UK Workshop on Computational Intelligence (UKCI)
Wang Y, Feng L (2018) Hybrid feature selection using component co-occurrence based feature relevance measurement. Expert Syst Appl 102:83–99
Kushwaha N, Pant M (2018) Link based BPSO for feature selection in big data text clustering. Futur Gener Comput Syst 82:190–199
Marie-Sainte SL, Alalyani N (2018) Firefly algorithm based feature selection for arabic text classification. J King Saud Univ Comput Inf Sci
Uğuz H (2011) A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Knowl-Based Syst 24(7):1024–1032
Trstenjak B, Mikac S, Donko D (2014) KNN with TF-IDF based Framework for Text Categorization. Proc Eng 69:1356–1364
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Gharehchopogh, F.S., Maleki, I. & Dizaji, Z.A. Chaotic vortex search algorithm: metaheuristic algorithm for feature selection. Evol. Intel. 15, 1777–1808 (2022). https://doi.org/10.1007/s12065-021-00590-1