Abstract
This chapter presents one of the recently proposed bio-inspired optimization methods, namely, the flower pollination algorithm (FPA). Because it can adaptively search a large search space that may contain many local optima, FPA has been employed to solve many real problems. Here, FPA is used to handle the feature selection problem in a wrapper-based approach, where it searches the space of features for an optimal feature set maximizing a given criterion. The feature selection methodology was applied to classification and regression data sets and was found to be successful. Moreover, FPA was applied to the knapsack problem, where data sets with different dimensions were adopted to assess FPA performance. On all the mentioned problems, FPA was benchmarked against the bat algorithm (BA), genetic algorithm (GA), and particle swarm optimization (PSO) and was found to be very competitive.
Keywords
- Flower pollination algorithm
- Bio-inspired optimization
- Evolutionary computation
- Feature selection
- Knapsack problem
1 Introduction
This chapter presents the flower pollination algorithm (FPA) and its use for feature selection on regression and classification data and for the knapsack problem. In current applications of machine learning and pattern recognition, datasets may contain thousands of features. The vast amounts of data generated today in biology offer more detailed and useful information on the one hand; on the other hand, they make the data analysis process more difficult because not all the information is relevant. Selecting the important features of a given dataset is a complex problem. Feature selection is a technique used in classification and regression problems: it identifies the significant feature subset and removes the unnecessary features. This mechanism is particularly useful when the number of features is large and not all of them are required for describing the data in experiments [1]. Hence, the use of a feature selection method is crucial to reduce the enormous number of features. Feature selection helps in understanding data, decreasing the computation time, reducing the effect of the curse of dimensionality, and enhancing the performance of the prediction model [2]. Furthermore, the feature selection process enhances the visualization and the comprehensibility of the selected feature subset [3].
In real-world applications, due to different reasons not discussed here, many features introduce noise, while others can be totally irrelevant or even misleading, affecting prediction performance. In these cases, feature selection is a must [4]. Two main criteria are employed to differentiate between the feature selection algorithms as follows:
1. Search strategy: the method employed to generate feature subsets or feature combinations.
2. Subset quality (fitness): the criteria used to judge the quality of a feature subset.
There are two major approaches to feature selection: the wrapper-based approach (applying machine learning algorithms) and the filter-based approach (using statistical methods) [5]. The wrapper-based approach employs a machine learning technique as part of the assessment operation, which helps to obtain better results than the filter-based approach [6], but it risks over-fitting the model and can be computationally costly; hence, an efficient search method is required to minimize the computational time [7]. In contrast, the filter-based approach searches for a feature subset that optimizes a given data-dependent criterion rather than the classification-dependent criteria used in the wrapper methods [8].
In general, feature selection is formulated as a multi-objective problem with two goals: (1) minimize the size of the selected feature subset and (2) maximize the classification performance (or minimize the prediction error in regression problems). Commonly, these two goals are contradictory, and the optimal solution is a trade-off between them. Several search methods have been employed, based mainly on greedy search; however, these techniques have at least two drawbacks: stagnation in local optima and high computational time [9]. Evolutionary computing (EC) and population-based algorithms adaptively search the feature space by using a set of search agents that interact in a social manner to reach the optimal solution [10]. EC methods are inspired by the social and biological behavior of animals that live in groups in nature, such as wolves, antlions, dragonflies, and spiders [11].
Most of the recent optimization techniques are nature-inspired [12].
2 Related Work
Feature selection methods are composed of two elements: the search strategy and the evaluation technique (subset goodness). In the wrapper-based approach (alternative to the filter-based approach), the term wrapper refers to the assessment method. Learning boolean is a filter feature selection method that exhaustively explores all potential feature combinations and chooses the minimum feature subset [6].
Various heuristic techniques mimic biological and physical behaviors in nature, and they have been introduced as robust techniques for global optimization. GA was the earliest evolutionary technique proposed in the literature, later enhanced by relying on the evolution operator during reproduction [13]. A GA feature selection method using a fuzzy set as the fitness function has been introduced in [14]. Wrapper-filter based feature selection methods combine GA with local search methods [15].
In particle swarm optimization (PSO) methods, a solution is represented by a particle with specific properties such as position, fitness, and speed [16]. A binary version of PSO (BPSO) modifies the native PSO algorithm to deal with binary optimization problems [17]. Moreover, an extended version of BPSO has been implemented to deal with feature selection [18]. The binary variant of the bat algorithm (BBA) has been applied to feature selection, where the search space is described as an n-cube [19].
An ant colony optimization (ACO) approach that uses the Fisher discrimination rate as heuristic information, combined with a rough set approach, has been employed for feature selection [20]. The artificial fish swarm (AFS) algorithm mimics the stimulus reaction of fish, realized by controlling the tail and fins [21]. The artificial bee colony (ABC) algorithm relies on the natural behavior of honeybees, in which randomly generated employed bees are moved in the direction of the elite bee [22]. The elite bee represents the optimal (or near-optimal) solution [23]. The antlion optimization algorithm (ALO) is a comparatively recent EC method, which simulates the hunting behavior of antlions in nature [24].
Artificial neural networks (ANN), particularly single hidden layer feed-forward neural networks (SLFN), are regarded as among the most widely used machine learning models in regression and classification domains [25]. The learning algorithm is considered the cornerstone of any neural network. Classical gradient-based learning algorithms suffer from over-fitting and local minima, and they take a long time to learn [26]. The back-propagation artificial neural network (BP-ANN) has average learning speed and is likely to get caught in local minima, leading to poor performance and efficiency. The revised back-propagation artificial neural network (RBP-ANN) is applied to overcome the limitations of BP-ANN [27].
In extreme learning machine (ELM) techniques, the output connections are tuned by solving an optimization problem, i.e. finding the minimum of the cost function by linearization [28]. Huang [29] introduced ELM in order to avoid some of the difficulties observed in gradient-based learning methods. ELM is used as a supervised learning method for SLFN neural networks [30, 31]. ELM chooses the weights of the input and hidden layers randomly rather than adapting all the internal parameters. Moreover, ELM can determine the output layer weights analytically [32].
3 Flower Pollination Algorithm (FPA) with Selected Applications
FPA is a metaheuristic optimization technique based on the pollination process of flowering plants, introduced by Yang in 2012 [33]. Pollination is carried out in two modes: self pollination (local search) and cross pollination (global search). The two modes of pollination are detailed as follows [34]:
1. Cross pollination occurs with pollen from a flower of a different plant, carried over a long distance by pollinators that can fly far (global pollination) [34]. In cross pollination, the pollinators carry the flower pollen and can fly long distances, which assures the pollination and proliferation of the optimal solution \(g_{*}\). This first rule may be formulated as in Eq. (1):
$$\begin{aligned} X_{i}^{t+1} = X_{i}^{t}+L(X_{i}^{t}-g_*), \end{aligned} \tag{1}$$
where \(X_{i}^{t}\) represents the vector of solution i at iteration t, \(g_{*}\) denotes the current best solution, and L is the pollination strength, drawn randomly from the Lévy distribution.
2. Self pollination is the fertilization of one flower by pollen of the same flower or of different flowers of the same plant, which usually happens when no pollinator is available. The local pollination and flower constancy are expressed as in Eq. (2):
$$\begin{aligned} X_{i}^{t+1} = X_{i}^{t} +\varepsilon (X_{j}^{t} - X_{k}^{t}), \end{aligned} \tag{2}$$
where \(X_{j}^{t}\) and \(X_{k}^{t}\) denote two random solutions, and \(\varepsilon \) is drawn from the uniform distribution.
Local pollination may account for a substantial fraction p of the overall pollination activities (in our experiments, we used p = 0.5). A switching probability \(p \in [0, 1]\) controls the choice between local and global pollination. The FPA search methodology can be outlined as in Algorithm 1.
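The two update rules and the switching probability can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's Algorithm 1: the function names `levy_step` and `fpa_minimize` are hypothetical, the Lévy step is drawn with Mantegna's method (a common choice that the text does not specify), one step is drawn per dimension, positions are clipped to [0, 1], a candidate replaces a solution only if it improves the fitness, and the local branch is taken with probability p (at p = 0.5 the assignment of branches makes no difference).

```python
import math
import random


def levy_step(rng, beta=1.5):
    """One Levy-distributed step via Mantegna's method (assumed, not from the text)."""
    sigma = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    while v == 0:  # avoid division by zero in the (practically impossible) exact-zero case
        v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)


def clip01(value):
    """Keep a coordinate inside the [0, 1] search range."""
    return min(1.0, max(0.0, value))


def fpa_minimize(fitness, dim, n_agents=20, n_iter=100, p=0.5, seed=0):
    """Minimize `fitness` over [0, 1]^dim with a basic flower pollination loop."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(n_agents)]
    fit = [fitness(x) for x in pop]
    best_index = min(range(n_agents), key=fit.__getitem__)
    best, best_fit = pop[best_index][:], fit[best_index]

    for _ in range(n_iter):
        for i in range(n_agents):
            if rng.random() < p:
                # Local pollination, Eq. (2): move along the difference of two random solutions.
                j, k = rng.sample(range(n_agents), 2)
                eps = rng.random()
                cand = [clip01(x + eps * (a - b))
                        for x, a, b in zip(pop[i], pop[j], pop[k])]
            else:
                # Global pollination, Eq. (1): Levy-scaled step relative to the best solution.
                cand = [clip01(x + levy_step(rng) * (x - g))
                        for x, g in zip(pop[i], best)]
            cand_fit = fitness(cand)
            if cand_fit < fit[i]:  # greedy replacement (an assumption)
                pop[i], fit[i] = cand, cand_fit
                if cand_fit < best_fit:
                    best, best_fit = cand[:], cand_fit
    return best, best_fit
```

For example, `fpa_minimize(lambda x: sum((v - 0.3) ** 2 for v in x), dim=5)` searches for the point closest to 0.3 in every coordinate; any fitness function that accepts a list of floats in [0, 1] can be plugged in the same way.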
3.1 FPA Applied for Feature Selection
FPA is adopted here to exploit the capabilities of the filter and wrapper approaches for feature selection. The filter approach comprises data-oriented methods that are not directly related to classification performance. The wrapper-based approach is more closely tied to prediction performance, but it does not address redundancy and dependency among the selected features.
We seek similarities and differences between the two approaches, based on evaluation criteria that help identify the strengths and weaknesses of each. All swarm intelligence methods regularly share information among their agents. Therefore, at every iteration, all or some agents update their positions based on their own position and the positions of the other agents.
FPA is applied for feature selection in both classification and regression problems. For a vector with N features, the number of possible feature subsets is \(2^N\), a space too vast to be searched exhaustively. Therefore, intelligent optimization is applied to explore the search space adaptively for the best feature subset. Following a common objective in the literature, the optimal feature subset is the one with the least prediction error and the fewest selected features. In classification problems, the general fitness function for the proposed optimization algorithms maximizes the classification accuracy over the validation set given the training set, as shown in Eq. (3), while keeping the number of selected features to a minimum:
$$\begin{aligned} Fitness = \alpha P + \beta \frac{C-R}{C}, \end{aligned} \tag{3}$$
where R indicates the size of the chosen feature set, C is the total number of features in the dataset, \({\alpha }\) and \({\beta }\) weight the significance of the classification performance and of the chosen feature set length, with \({\alpha }\, {\in }\, [0, 1]\) and \({\beta = 1- {\alpha }}\), and P is the classification performance measured as in Eq. (4):
$$\begin{aligned} P = \frac{N_c}{N}, \end{aligned} \tag{4}$$
where \(N_c\) indicates the number of correctly classified instances, and N is the total number of instances.
In the case of regression problems, the general fitness function for the proposed optimization algorithms minimizes the prediction error over the validation set given the training set, as in Eq. (5), while keeping the number of selected features to a minimum:
$$\begin{aligned} Fitness = \alpha E + \beta \frac{R}{C}, \end{aligned} \tag{5}$$
where E indicates the prediction error, and \({\alpha }\) and \({\beta }\) weight the importance of the prediction error and of the selected feature subset size, respectively. E is computed, as in Eq. (6), from \(a_i\) and \(t_i\), the model prediction value and the target value for point i in the validation set.
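The two fitness functions can be sketched as follows. The weighted-sum forms used here, \(\alpha P + \beta (C-R)/C\) for classification and \(\alpha E + \beta R/C\) for regression, are written from the symbol definitions in the text and should be read as assumptions; so should the use of the root mean square error for E and the default \(\alpha = 0.99\), neither of which is stated in this section. The function names are hypothetical.

```python
import math


def classification_fitness(n_correct, n_total, n_selected, n_features, alpha=0.99):
    """Fitness to maximize: weighted accuracy plus weighted feature reduction (assumed form)."""
    beta = 1.0 - alpha
    accuracy = n_correct / n_total  # P = N_c / N
    return alpha * accuracy + beta * (n_features - n_selected) / n_features


def regression_fitness(predictions, targets, n_selected, n_features, alpha=0.99):
    """Fitness to minimize: weighted error plus weighted subset size (assumed form)."""
    beta = 1.0 - alpha
    # RMSE between predictions a_i and targets t_i is an assumed choice for E.
    error = math.sqrt(
        sum((a - t) ** 2 for a, t in zip(predictions, targets)) / len(targets)
    )
    return alpha * error + beta * n_selected / n_features
```

With \(\alpha\) close to 1, prediction quality dominates and the size term mainly breaks ties between subsets of similar quality; lowering \(\alpha\) pushes the search toward smaller subsets.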
The number of variables in a solution is the same as the number of features in the given dataset. All variables are limited to the range [0, 1]; the closer a variable's value is to 1, the stronger the candidacy of its corresponding feature for selection. In the individual fitness calculation, each variable is thresholded to decide whether its feature is selected at the evaluation stage. A static threshold of 0.5 is used, as in Eq. (7):
$$\begin{aligned} y_{ij} = {\left\{ \begin{array}{ll} 1 & \text {if } x_{ij} > 0.5, \\ 0 & \text {otherwise}, \end{array}\right. } \end{aligned} \tag{7}$$
where \(x_{ij}\) is the value of dimension j of solution i in the D-dimensional search space of features, and \(y_{ij} \in \{0,1\}\) is the binary value corresponding to selecting or not selecting feature j in solution i of the solution set.
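The thresholding step can be illustrated with a short sketch. The helper names `binarize` and `select_columns` are hypothetical, and treating a value of exactly 0.5 as not selected is an assumption about the boundary case.

```python
def binarize(position, threshold=0.5):
    """Map a continuous position in [0, 1]^D to a 0/1 feature mask."""
    return [1 if value > threshold else 0 for value in position]


def select_columns(rows, mask):
    """Keep only the columns of each data row whose mask entry is 1."""
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]
```

For instance, `binarize([0.1, 0.5, 0.51, 0.9])` gives the mask `[0, 0, 1, 1]`, and applying that mask to the row `[1, 2, 3, 4]` keeps `[3, 4]`. The optimizer keeps working on the continuous position; only the fitness evaluation sees the binary mask.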
3.2 FPA Applied for Knapsack Problem
Given a set of n elements, each with a profit \(p_j\) and a weight \(w_j\), and a knapsack of capacity C, the objective is to find the most profitable selection of elements without violating the knapsack weight capacity [35]. Whether each element is selected can be represented by an n-dimensional binary vector with individual elements \(x_j\in \{0,1\}\). So, the problem can be mathematically formulated as:
$$\begin{aligned} \text {maximize} \quad \sum _{j=1}^{n} p_j x_j \end{aligned} \tag{8}$$
subject to
$$\begin{aligned} \sum _{j=1}^{n} w_j x_j \le C. \end{aligned} \tag{9}$$
The knapsack problem is NP-hard, so an intelligent optimization method is required to search the huge space of possibilities. FPA is adopted in this work to solve a set of knapsack problems of various dimensions to demonstrate its searching capability. The death penalty [36] is adopted to handle the knapsack constraint, and the fitness is calculated as in Eq. (8) but with a negative sign, which converts the maximization into a minimization.
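A minimal sketch of such a fitness function is given below. It assumes that the death penalty is realized by assigning an infinite (worst possible) fitness to any selection that exceeds the capacity, which is one common way to implement it; the function name and the tiny instance are hypothetical.

```python
from itertools import product


def knapsack_fitness(x, profits, weights, capacity):
    """Negated total profit of selection x, or +inf if the capacity is exceeded."""
    total_weight = sum(w for w, chosen in zip(weights, x) if chosen)
    if total_weight > capacity:
        return float("inf")  # death penalty: infeasible solutions can never win
    return -sum(p for p, chosen in zip(profits, x) if chosen)


# Tiny illustrative instance, solved by brute force for comparison.
profits = [10, 5, 15]
weights = [2, 3, 5]
capacity = 7
best = min(product([0, 1], repeat=3),
           key=lambda x: knapsack_fitness(x, profits, weights, capacity))
```

Here the first and third elements fit exactly (weight 7, profit 25), so the brute-force minimum of the negated profit is reached at `(1, 0, 1)`. In the experiments the search is of course carried out by the optimizers rather than by enumeration, which is only feasible for very small n.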
4 Experimental Results and Discussion
The global and optimizer-specific parameter settings are outlined in Table 1. The parameters are set either according to domain-specific knowledge, as for the \(\alpha \) and \(\beta \) parameters of the fitness function, or by trial and error on small simulations and common practice in the literature, as for the remaining parameters.
In this study, the wrapper approach is used to find a feature subset supervised by the prediction performance. Hence, an intelligent search method is necessary for searching the feature space. In the case of classification datasets, the classifier used in the fitness function of Eq. (3) is KNN [37]. KNN is used in the experiments with \(K=5\), which was selected by trial and error as the best-performing choice on all the datasets.
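The wrapper evaluation of one candidate feature subset can be sketched as follows: a KNN classifier is fitted on the training part restricted to the selected features, and its accuracy on the validation part is returned. This is an illustrative sketch only; the Euclidean distance, the majority vote, the zero score for an empty subset, and the function names are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def validation_accuracy(train_x, train_y, val_x, val_y, mask, k=5):
    """Accuracy of KNN on the validation set using only features where mask is 1."""
    keep = [j for j, selected in enumerate(mask) if selected]
    if not keep:  # an empty subset cannot classify anything (assumed convention)
        return 0.0

    def project(row):
        return [row[j] for j in keep]

    train_proj = [project(row) for row in train_x]
    correct = sum(
        knn_predict(train_proj, train_y, project(query), k) == label
        for query, label in zip(val_x, val_y)
    )
    return correct / len(val_y)
```

The returned accuracy plays the role of the classification performance inside the fitness function, so every fitness evaluation costs one pass of KNN over the validation set.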
4.1 Assessment Indicators
Each algorithm has been applied \(K*M\) times with random positioning of the search agents, except that the solution with all features selected was imposed as the position of one of the search agents. Imposing the full-feature solution ensures that any feature subset subsequently selected as the global best solution is fitter than it. Repeated runs of the optimization algorithms were applied to test their convergence capability. We have applied two types of indicators (measures) to compare the various algorithms.
1. The first group of indicators is applied directly to the fitness values obtained on the validation set and is used to characterize the algorithm performance as follows:
- Mean fitness: the average value of all the solutions in the final sets obtained by an optimizer in a number of individual runs [38].
- Median fitness: assesses the typical performance of the optimizer over all the M runs while tolerating noisy runs [38].
- Best fitness: the minimum value of the fitness function acquired by the optimizer in M independent runs [38].
- Worst fitness: the maximum fitness function value (the worst obtained fitness value) acquired by an optimization method in M independent runs [38].
- Statistical standard deviation (std): represents the variation of the best solutions obtained by running a stochastic optimizer for M different runs. Std is used as an indicator of the optimizer's capability to converge to the same or a similar optimal solution [38].
2. The second group of indicators is applied to assess the performance of the entire prediction model as follows:
- Average classification error: describes how accurate the classifier is on the chosen feature subset, as shown in Eq. (10):
$$\begin{aligned} Perf = \frac{1}{M} \sum _{j=1}^M\frac{1}{N}\sum _{i=1}^N Unmatch(C_i,L_i), \end{aligned} \tag{10}$$
where M represents the total number of runs of the optimization method, N is the total number of instances in the test subset, \(C_i\) is the classifier output label of data instance i, \(L_i\) denotes the reference class label of data instance i, and Unmatch is the function that yields 0 if the two labels are equal and 1 otherwise.
- Mean square error (MSE): measures the mean squared difference between the actual output and the predicted one, as given in Eq. (11):
$$\begin{aligned} MSE = \frac{\sum _{i=1}^n(pred_i-obs_i)^2}{n}, \end{aligned} \tag{11}$$
- Root mean square error (RMSE): measures the difference between the actual outputs and the predicted ones, as given in Eq. (12):
$$\begin{aligned} RMSE=\sqrt{\frac{\sum _{i=1}^n(obs_i-pred_i)^2}{n}}, \end{aligned} \tag{12}$$
where \(obs_i\) and \(pred_i\) are the observed and predicted values respectively, n is the total number of examples, and i is the example number in a given dataset.
- Average selection size: the average ratio of the size of the chosen feature subset to the total number of features, as in Eq. (13):
$$\begin{aligned} Selection\_Size= \frac{1}{M}\sum _{i=1}^M \frac{size(g_*^i)}{N_t}, \end{aligned} \tag{13}$$
where \(size(g_*^i)\) is the number of features selected in the best solution of run i, and \(N_t\) represents the total number of features in a given dataset.
- Average feature reduction: the average ratio of the number of removed features to the total number of features, as in Eq. (14):
$$\begin{aligned} Reduction = 1-\frac{1}{M}\sum _{i=1}^M \frac{size(g_*^i)}{N_t}, \end{aligned} \tag{14}$$
- Average Fisher score (F-score): favors feature subsets for which the distances between data samples in different classes are large, while the distances between data instances in the same class are as small as possible [39]. The F-score is computed for individual features given the class labels and averaged over M independent runs of an algorithm, as shown in Eq. (15):
$$\begin{aligned} F_j=\frac{\sum _{k=1}^cn_k(\mu _k^j-\mu ^j)^2}{(\sigma ^j)^2}, \end{aligned} \tag{15}$$
where \(F_j\) is the Fisher score for feature j, \(\mu ^j\) is the mean of feature j over the entire dataset, \((\sigma ^j)^2\) is the variance of feature j over the whole dataset, \(n_k\) denotes the size of class k, and \(\mu _k^j\) indicates the mean of feature j in class k.
- Wilcoxon: a non-parametric test introduced by Wilcoxon [40]. The test assigns ranks to all the scores considered as one group and afterward sums the ranks of each group. The null hypothesis is that the two samples originate from the same population, so any difference in the two rank sums comes only from sampling error. The rank sum test is regularly described as the non-parametric version of the T-test for two independent groups.
- T-test: a statistical significance test that decides whether or not the difference between two groups' averages most likely reflects a real difference in the populations from which the groups were sampled, as in Eq. (16) [41]:
$$\begin{aligned} t=\frac{\bar{x}-\mu _0}{\frac{S}{\sqrt{n}}} \end{aligned} \tag{16}$$
where \(\bar{x}\) is the sample mean, \(\mu _0\) is the mean under the null hypothesis, S is the sample standard deviation, n is the sample size, and \(\frac{S}{\sqrt{n}}\) is the standard error of the mean.
- Average computational time: the run time of a given optimization algorithm in milliseconds, averaged over the different runs, as given in Eq. (17):
$$\begin{aligned} T_{o}= \frac{1}{M}\sum _{i=1}^M RunTime_{o,i}, \end{aligned} \tag{17}$$
where M is the total number of runs of optimizer o, and \(RunTime_{o,i}\) is the computational time in milliseconds of optimizer o at run number i.
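Several of the indicators above reduce to a few lines of code once the per-run results are collected. The sketch below is illustrative only and makes its assumptions explicit: the fitness statistics are taken over one best fitness value per run, the standard deviation is the sample (n - 1) estimate, the Fisher score uses the population variance of the feature, and all function names are hypothetical.

```python
import math
import statistics


def summarize_fitness(best_per_run):
    """Mean, median, best (minimum), worst (maximum), and std over M runs."""
    return {
        "mean": statistics.mean(best_per_run),
        "median": statistics.median(best_per_run),
        "best": min(best_per_run),
        "worst": max(best_per_run),
        "std": statistics.stdev(best_per_run),  # sample std, an assumption
    }


def rmse(observed, predicted):
    """Root mean square error, as in Eq. (12)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def average_selection_size(masks, n_features):
    """Mean fraction of selected features over the runs, as in Eq. (13)."""
    return sum(sum(mask) for mask in masks) / (len(masks) * n_features)


def fisher_score(values, labels):
    """Fisher score of one feature, as in Eq. (15); undefined for a constant feature."""
    n = len(values)
    mean_all = sum(values) / n
    variance = sum((v - mean_all) ** 2 for v in values) / n  # population variance
    between = 0.0
    for cls in set(labels):
        members = [v for v, label in zip(values, labels) if label == cls]
        class_mean = sum(members) / len(members)
        between += len(members) * (class_mean - mean_all) ** 2
    return between / variance


def t_statistic(sample, mu0):
    """One-sample t statistic, as in Eq. (16)."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
```

The feature reduction of Eq. (14) is simply one minus the value returned by `average_selection_size`. For the P-values reported with the Wilcoxon test and the T-test, a statistics package would normally be used rather than hand-written code.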
4.2 Datasets
The datasets were chosen to cover a variety of feature and instance counts, as representatives of the various problem types on which the introduced methods are examined. In addition, we selected a set of comparatively high-dimensional datasets to assess the performance of the optimization algorithms in huge search spaces. Each dataset is split in cross-validation [42] fashion for evaluation, in which \(K-1\) folds are employed for the training, validation, and testing sets. Each split is repeated M times; hence, each optimizer is evaluated \(K*M\) times on each dataset. Each dataset is divided into equal-sized training, validation, and testing parts. The training part is used to train the classifier during optimization and at the final evaluation. The validation part is used to assess the performance of the classifier during optimization. The testing part is used to evaluate the finally selected features given the trained classifier. The classification and regression models are used to ensure the quality of the selected features and are assessed on the validation set inside the fitness function during the optimization process [6]. In the case of regression datasets, the regression model used in the fitness function of Eq. (5) is the extreme learning machine (ELM) with a different number of hidden layers and a sigmoid basis function. ELM is used for regression and is adopted to evaluate the fitness function. ELM has seven nodes in the input layer and one hundred hidden nodes (chosen by trial and error), because ELM needs more hidden nodes than the classical gradient-based training algorithms [28].
Table 2 outlines twenty-one datasets used in classification problems. The datasets are acquired from the UCI machine learning repository [43, 44]. Table 3 displays the ten datasets applied in the regression experiments. The used datasets are picked from the UCI machine learning repository [43].
4.3 FPA for Feature Selection Using Classification Data
For classification data, the classifier used in the fitness function of Eq. (3) is KNN [37]. KNN is used in the experiments with \(K=5\), which was selected by trial and error as the best-performing choice on all the datasets. The overall purpose of this part is to show that bio-inspired optimization methods for feature selection reduce the selected feature set and improve the classification performance compared with using all the features and with conventional feature selection methods in classification problems.
Table 4 outlines the statistical mean fitness of the FPA [45], BA [46], GA, and PSO optimization algorithms for all 21 classification datasets, calculated over the 20 runs. We can observe that all the optimization methods outperform the full feature set, which demonstrates the capability of the wrapper-based method in the feature selection problem. We can also highlight that CS performs in general better than the other optimizers, which demonstrates the ability of CS to adaptively explore the search space for the optimal feature combination. To evaluate the stability of the stochastic algorithms and their convergence to the same optimal solution, we measure the standard deviation; the results are given in Table 5. We can see that, although FPA depends on the Lévy distribution, which has infinite variance, it still keeps a comparable std measure.
Table 6 outlines the average classification error on the test set of the feature subsets selected by the optimization methods, averaged over the 20 runs. From the table, FPA obtains the best results on average, thus demonstrating its capability to find optimal feature combinations that ensure proper test performance. Regarding the size of the selected feature subset relative to the original size, Table 7 outlines the ratio of kept features to the total number of features. We can notice that FFA gets the best feature subset size results in general. The performance over the test data is to some extent compatible with the F-score calculated over the features selected by the different optimizers, as shown in Table 8. GA has obtained the best F-score values overall. Table 9 outlines the average computational time of the different optimization algorithms. From the table, FPA has the best computational time in comparison to all other algorithms.
4.4 FPA for Feature Selection Using Regression Data
For regression data, the regression model used in the fitness function of Eq. (5) is the extreme learning machine (ELM). The overall purpose of this section is to show that bio-inspired optimization algorithms for feature selection reduce the size of the selected feature subset and reduce the prediction error compared with using the whole feature set and with conventional feature selection techniques in regression problems.
Table 10 outlines the statistical mean fitness of the BA, CS, DA, FFA, FPA, MAKHA, GA, and PSO optimization algorithms for all ten regression datasets, calculated over the 20 runs. We can highlight that FPA performs in general better than the other optimizers, which demonstrates the capability of FPA to adaptively explore the search space for the best feature subset. To evaluate the stability of the stochastic algorithms and their convergence to the same optimal solution, the standard deviation is measured; the results are given in Table 11. We can see that, although FPA depends on the Lévy distribution, which has infinite variance, it still keeps a comparable std measure.
Table 12 describes the mean RMSE on the test data of the feature subsets selected by the optimization algorithms, averaged over the 20 runs. From the table, FFA obtains the best results on average, thus demonstrating its capability to find optimal feature combinations that ensure proper test performance. Regarding the size of the selected feature subset relative to the original size, Table 13 outlines the ratio of kept features to the total number of features. We can highlight that GA obtains the best selected feature size results overall. Table 14 outlines the average computational time of the different optimization algorithms. From the table, DA has the best computational time in comparison to all other algorithms.
4.5 FPA for Knapsack Problem
In this section, FPA is used and benchmarked against BA, GA, and PSO on the binary knapsack problem. A set of 20 benchmark problems with different dimensionalities and capacities was used in the study, as listed in Table 15.
Functions F1–F20 are expected to evaluate the exploitation capability of a given algorithm. We can see in Table 16 that the FPA optimization algorithm on average outperforms the other methods. This result demonstrates the exploitation capability of the FPA algorithm. The same conclusion can be drawn from the median performance presented in Table 17, where FPA still outperforms the BA, GA, and PSO algorithms.
Table 18 gives the best performance indicator for the individual optimizers over 20 runs. This indicator targets optimistic users. We can see from the table that FPA outperforms GA and PSO. Table 19 gives the worst fitness indicator for both simple and composite benchmark functions. This indicator is expected to assess the worst performance of a given optimizer and hence targets the satisfaction of pessimistic users. We can see from the table that the worst performance of FPA still outperforms the other algorithms, which demonstrates the suitability of FPA for pessimistic applications.
Table 20 gives the standard deviation of the best solutions output by the individual optimizers over the 30 runs. This indicator is expected to assess the repeatability of the obtained solutions and the convergence to the same or similar optima. We can see from Table 20 that the standard deviation for FPA is better than that of the other optimizers, which shows that although FPA has strong exploration capability, it can still converge to the same or similar optima and hence can be considered a candidate optimizer for repeatable results.
Tables 21 and 22 give the P-values for two common significance tests, which are expected to assess the significance of the improvement obtained with the proposed variants. The significance tests used are the two-sided Wilcoxon test and the T-test. We can see that the P-values for the Wilcoxon test and the T-test are close to 0, so the null hypothesis is rejected, which indicates that the improvement obtained with FPA over the BA, GA, and PSO algorithms is significant.
5 Conclusions
This work assesses the performance of FPA in two application domains, namely feature selection and the knapsack problem. For feature selection, FPA can outperform BA, GA, and PSO owing to its capability to adaptively search a search space with many local optima while avoiding premature convergence. In the knapsack domain, FPA is also found to be very competitive with PSO, GA, and BA, with a tolerable difference in run time and better optimization performance.
For future work, we have five ideas that can be investigated in addition to the work presented here:
1. Assess the proposed FPA method on complex datasets that have a huge number (thousands) of input features.
2. Add more statistical evaluation measures such as sensitivity, specificity, and F-measure.
3. Employ bio-inspired optimization methods for solving challenging problems in different applications such as big data, bioinformatics, and biomedicine.
4. Use more machine learning techniques for wrapper-based fitness evaluation, such as support vector machine (SVM), random forest (RF), and support vector regression (SVR).
5. Propose a multi-objective fitness function that uses bio-inspired algorithms to find the optimal feature subset.
References
Chizi, B., Rokach, L., Maimon, O.: A survey of feature selection techniques, pp. 1888–1895. IGI Global (2009)
Chandrashekar, G., Sahin, F.: A survey on feature selection methods. Comput. Electr. Eng. 40(1), 16–28 (2014)
Huang, C.L.: ACO-based hybrid classification system with feature subset selection and model parameters optimization. Neurocomputing 73(1–3), 438–448 (2009)
Chen, Y., Miao, D., Wang, R.: A rough set approach to feature selection based on ant colony optimization. Pattern Recognit. Lett. 31(3), 226–233 (2010)
Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artif. Intell. 97(1), 273–324 (1997)
Xue, B., Zhang, M., Browne, W.N.: Particle swarm optimisation for feature selection in classification: novel initialisation and updating mechanisms. Appl. Soft Comput. 18, 261–276 (2014)
Guyon, I., Elisseeff, A.: An introduction to variable and attribute selection. Mach. Learn. Res. 3, 1157–1182 (2003)
Chuang, L.Y., Tsai, S.W., Yang, C.H.: Improved binary particle swarm optimization using catfish effect for feature selection. Expert Syst. Appl. 38(10), 12699–12707 (2011)
Xue, B., Zhang, M., Browne, W.N.: Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans. Cybern. 43(6), 1656–1671 (2013)
Shoghian, S., Kouzehgar, M.: A comparison among wolf pack search and four other optimization algorithms. Comput. Electr. Autom. Control Inf. Eng. 6(12), 1619–1624 (2012)
Valdez, F.: Bio-Inspired Optimization Methods. Handbook of Computational Intelligence, pp. 1533–1538. Springer (2015)
Jr, I.F., Yang, X.S., Fister, I., Brest, J., Fister, D.: A brief review of nature-inspired algorithms for optimization. Elektrotehniski Vestnik 80(3), 116–122 (2013)
Holland, J.H.: Adaptation in natural and artificial systems. MIT Press, Cambridge, MA, USA (1992)
Xue, X., Yao, M., Wu, Z., Yang, J.: Genetic ensemble of extreme learning machine. Neurocomputing 129(1), 175–184 (2014)
Zhu, Z.X., Ong, Y.S., Dash, M.: Wrapper-filter feature selection algorithm using a memetic framework. IEEE Trans. Syst. Man Cybern. Part B: Cybern 37, 70–76 (2007)
Eberhart, R., Kennedy, J.: A new optimizer using particle swarm theory. In: International Symposium on Micro Machine and Human Science, pp. 39–43 (1995)
Kennedy, J., Eberhart, R.C.: A discrete binary version of the particle swarm algorithm. In: IEEE International Conference on Systems, Man, and Cybernetics, vol. 5, pp. 4104–4108 (1997)
Firpi, H.A., Goodman, E.: Swarmed feature selection. In: 33rd Applied Imagery Pattern Recognition Workshop, USA, pp. 112–118 (2004)
Nakamura, R.Y.M., Pereira, L.A.M., Costa, K.A., Rodrigues, D., Papa, J.P., Yang, X.S.: BBA: a binary bat algorithm for feature selection. In: IEEE XXV Conference on Graphics, Patterns and Images, pp. 291–297 (2012)
Ming, H.: A rough set based hybrid method to feature selection. In: International Symposium on Knowledge Acquisition and Modeling, pp. 585–588 (2008)
Li, X.L., Shao, Z.J., Qian, J.X.: An optimizing method based on autonomous animats: fish-swarm algorithm. Methods and Practices of System Engineering, pp. 32–38 (2002)
Karaboga, D., Basturk, B.: A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J. Glob. Optim. 39, 459–471 (2007)
Sundareswaran, K., Sreedevi, V.T.: Development of novel optimization procedure based on honey bee foraging behavior. In: International Conference on Systems, Man and Cybernetics, pp. 1220–1225 (2008)
Mirjalili, S.: The Ant Lion optimizer. Adv. Eng. Softw. 83, 80–98 (2015)
Miche, Y., Sorjamaa, A., Bas, P., Simula, O., Jutten, C., Lendasse, A.: OP-ELM: optimally pruned extreme learning machine. IEEE Trans. Neural Netw. 21(1), 158–162 (2010)
Han, F., Huang, D.S.: Improved extreme learning machine for function approximation by encoding a priori information. Neurocomputing 69(1), 2369–2373 (2006)
Xu, H., Yu, B.: Automatic thesaurus construction for spam filtering using revised back propagation neural network. Expert Syst. Appl. 37, 18–23 (2010)
Jiuwen, C., Zhiping, L.: Extreme Learning Machines on High Dimensional and Large Data Applications: A Survey. Mathematical Problems in Engineering, Hindawi Publishing Corporation, vol. 2015, no. 1, pp. 1–13 (2015)
Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: a new learning scheme of feedforward neural networks. In: International Joint Conference on Neural Networks, pp. 985–990 (2004)
Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70(1), 489–501 (2006)
Li, X., Xie, H., Wang, R., Cai, Y., Cao, J., Wang, F., Min, H., Deng, X.: Empirical analysis: stock market prediction via extreme learning machine. Neural Comput. Appl. 1(3), 1–12 (2014)
Zhao, G.P., Hen, Z.Q., Miao, C.Y., Man, Z.H.: On improving the conditioning of extreme learning machine: a linear case. In: International Conference on Information, Communications and Signal Processing, pp. 1–5 (2009)
Yang, X.S.: Flower pollination algorithm for global optimization. Unconventional Computation and Natural Computation. Lecture Notes in Computer Science, vol. 7445, pp. 240–249 (2012)
Yang, X.S., Karamanoglu, M., He, X.: Multi-objective flower algorithm for optimization. In: International Conference on Computational Science, Procedia Computer Science, vol. 18, pp. 861–868 (2013)
Ghosh, D., Goldengorin, B.: The binary knapsack problem: solutions with guaranteed quality. In: SOM-theme A Primary Processes within Firms (2001)
Yeniay, O.: Penalty function methods for constrained optimization with genetic algorithms. Math. Comput. Appl. 10(1), 45–56 (2005)
Yang, C.S., Chuang, L.Y., Li, J.C., Yang, C.H.: Chaotic binary particle swarm optimization for feature selection using logistic map. In: IEEE Conference on Soft Computing in Industrial Applications, pp. 107–112 (2008)
Tilahun, S.L., Ong, H.C.: Prey-predator algorithm: a new metaheuristic algorithm for optimization problems. Int. J. Inf. Technol. Decis. Mak. 14(6), 1331–1352 (2015)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience (2000)
Wilcoxon, F.: Individual comparisons by ranking methods. Biom. Bull. 1(6), 80–83 (1945)
Rice, J.A.: Mathematical Statistics and Data Analysis, 3rd edn. Duxbury Advanced (2006)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer Series in Statistics. Springer (2009)
Bache, K., Lichman, M.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences (2013). Last checked on 15 May 2017. http://archive.ics.uci.edu/ml
Raman, B., Ioerger, T.R.: Instance-Based Filter for Feature Selection. Machine Learning Research, pp. 1–23 (2002)
Yang, X.S.: Nature-Inspired Metaheuristic Algorithms, 2nd edn. Luniver Press, UK (2010)
Yang, X.S.: A New Metaheuristic Bat-Inspired Algorithm. Nature Inspired Cooperative Strategies for Optimization, vol. 284, pp. 65–74. Springer (2010)
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Zawbaa, H.M., Emary, E. (2018). Applications of Flower Pollination Algorithm in Feature Selection and Knapsack Problems. In: Yang, XS. (eds) Nature-Inspired Algorithms and Applied Optimization. Studies in Computational Intelligence, vol 744. Springer, Cham. https://doi.org/10.1007/978-3-319-67669-2_10
DOI: https://doi.org/10.1007/978-3-319-67669-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67668-5
Online ISBN: 978-3-319-67669-2
eBook Packages: Engineering, Engineering (R0)