
1 Introduction

This chapter presents the application of the flower pollination algorithm (FPA) to feature selection for classification and regression data and to the knapsack problem. Current applications of machine learning and pattern recognition techniques often involve thousands of features. The vast amounts of data generated today in biology offer more detailed and useful information on the one hand; on the other hand, they make the data analysis process more difficult because not all of the information is relevant. Selecting the important features of a given dataset is a complex problem. Feature selection is a technique for solving classification and regression problems: it identifies the significant feature subset and removes the unnecessary features. This mechanism is particularly useful when the feature set is large and not all features are required for describing the data in experiments [1]. Hence, feature selection is crucial to reduce the enormous number of features. Feature selection helps in understanding data, decreasing computation time, reducing the effect of the curse of dimensionality, and enhancing the performance of the prediction model [2]. Furthermore, the feature selection process enhances the visualization and comprehensibility of the selected feature subset [3].

In real-world applications, for various reasons not discussed here, many features introduce noise, while others can be totally irrelevant or even misleading, affecting prediction performance. In these cases, feature selection is a must [4]. Two main criteria are used to differentiate between feature selection algorithms:

  1.

    Search strategy: the method employed to generate feature subsets or feature combinations.

  2.

    Subset quality (fitness): the criteria used to judge the quality of a feature subset.

There are two major approaches to feature selection: the wrapper-based approach (applying machine learning algorithms) and the filter-based approach (using statistical methods) [5]. The wrapper-based approach employs a machine learning technique as part of the evaluation step, which helps it to obtain better results than the filter-based approach [6], but it risks over-fitting the model and can be computationally costly; hence, an efficient search method is required to minimize the computational time [7]. In contrast, the filter-based approach searches for a feature subset that optimizes a given data-dependent criterion rather than the classification-dependent criteria used in wrapper methods [8].

In general, feature selection is expressed as a multi-objective problem with two goals: (1) minimize the size of the selected feature subset and (2) maximize the classification accuracy (or minimize the prediction error in regression problems). Commonly, these two goals are contradictory, and the optimal solution is a trade-off between them. Several search methods have been employed, based mainly on greedy search; however, these techniques have at least two drawbacks: stagnation in local optima and high computational cost [9]. Evolutionary computing (EC) and population-based algorithms adaptively search the feature space by using a set of search agents that interact in a social manner to reach the optimal solution [10]. EC methods are inspired by the social and biological behavior of animals in nature, such as wolves, antlions, dragonflies, and spiders, acting in a group [11].

Most recent optimization techniques are nature-inspired, i.e., they draw their inspiration from natural phenomena [12].

2 Related Work

Feature selection methods are composed of two elements: the search strategy and the evaluation technique (subset goodness). In the wrapper-based approach (the alternative to the filter-based approach), the term wrapper refers to the assessment method. A Boolean-learning-based filter feature selection method exhaustively explores all potential feature combinations and chooses the minimum feature subset [6].

Various heuristic techniques mimic biological and physical behaviors in nature and have been introduced as robust techniques for global optimization. The genetic algorithm (GA) was the earliest evolutionary technique proposed in the literature, and it was later enhanced by refining the evolution operators used during reproduction [13]. A GA feature selection method using a fuzzy set as the fitness function was introduced in [14]. Wrapper-filter feature selection methods combine GA with local search methods [15].

In particle swarm optimization (PSO) methods, a solution is represented by a particle with specific properties such as position, fitness, and speed [16]. A binary version of PSO (BPSO) modifies the native PSO algorithm to deal with binary optimization problems [17], and an extended version of BPSO has been implemented for feature selection [18]. The binary variant of the bat algorithm (BBA) has also been employed for feature selection, where the search space is described as an n-cube [19].

Ant colony optimization (ACO) uses the Fisher discrimination rate as the heuristic information, combined with a rough set approach, for feature selection [20]. The artificial fish swarm (AFS) algorithm mimics fish reacting to stimuli by controlling their tails and fins [21]. Artificial bee colony (ABC) relies on the natural behavior of honeybees, in which randomly generated employer bees move in the direction of the elite bee [22]; the elite bee represents the optimal (or near-optimal) solution [23]. The antlion optimization algorithm (ALO) is a comparatively recent EC method that simulates the hunting behavior of antlions in nature [24].

Artificial neural networks (ANN), particularly single hidden layer feed-forward neural networks (SLFN), are viewed as one of the most conventional machine learning models used in regression and classification domains [25]. The learning algorithm is considered the cornerstone of any neural network. Classical gradient-based learning algorithms suffer from over-fitting and local minima, and they take a long time to learn [26]. The back-propagation artificial neural network (BP-ANN) has a slow learning speed and is likely to get caught in local minima, leading to poor performance and efficiency. The revised back-propagation artificial neural network (RBP-ANN) was proposed to overcome the limitations of BP-ANN [27].

In extreme learning machine (ELM) techniques, the output connections are tuned by solving an optimization problem, i.e., finding the minimum of a cost function by linearization [28]. Huang [29] introduced ELM in order to avoid some of the difficulties observed in gradient-based learning methods. ELM is used as a supervised learning method for SLFN networks [30, 31]. ELM chooses the weights between the input and hidden layers randomly rather than fully adapting all the internal parameters; the output layer weights can then be determined analytically [32].
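The following minimal sketch illustrates this training scheme under stated assumptions (sigmoid hidden activation, output weights obtained with the Moore-Penrose pseudoinverse); the function names and defaults are illustrative and not taken from any specific ELM library:

```python
import numpy as np

def elm_train(X, y, n_hidden=100, seed=None):
    """Fit a single-hidden-layer ELM: random input weights, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input-to-hidden weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # sigmoid hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                             # output weights via pseudoinverse
    return W, b, beta

def elm_predict(model, X):
    """Predict by propagating through the fixed random hidden layer."""
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```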

3 Flower Pollination Algorithm (FPA) with Selected Applications

FPA is a metaheuristic optimization technique inspired by the pollination process of flowering plants and introduced by Yang in 2012 [33]. Pollination is carried out in two modes: self-pollination (local search) and cross-pollination (global search). The two modes of pollination are detailed as follows [34]:

  1.

    Cross-pollination occurs when pollen from a flower of a different plant is transferred over a long distance by pollinators that can fly far (global pollination) [34]. In cross-pollination, the pollinators carry the flower pollen over long distances, which ensures the pollination and proliferation of the optimal solution \(g_{*}\). The first rule may be formulated as in Eq. (1):

    $$\begin{aligned} X_{i}^{t+1} = X_{i}^{t}+L(X_{i}^{t}-g_*), \end{aligned}$$
    (1)

    where \(X_{i}^{t}\) represents solution i at iteration t, \(g_{*}\) denotes the current best solution, and L is the pollination strength, drawn randomly from a Lévy distribution.

  2.

    Self-pollination is the fertilization of a flower by pollen from the same flower or from different flowers of the same plant; it usually occurs when no pollinator is available. Local pollination and flower constancy are expressed as in Eq. (2):

    $$\begin{aligned} X_{i}^{t+1} = X_{i}^{t} +\varepsilon (X_{j}^{t} - X_{k}^{t}), \end{aligned}$$
    (2)

    where \(X_{j}^{t}\) and \(X_{k}^{t}\) denote two randomly chosen solutions, and \(\varepsilon \) is drawn from a uniform distribution.

Local pollination may account for a substantial fraction of the overall pollination activity, so a switching probability \(p \in [0, 1]\) controls the choice between local and global pollination (in our experiments, we used p = 0.5). The FPA search procedure is outlined in Algorithm 1.

Algorithm 1: Flower pollination algorithm (pseudocode).
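As a minimal, hedged sketch of Algorithm 1, the loop below implements Eqs. (1) and (2) for a generic fitness function to be minimized; the Lévy step uses the common Mantegna approximation with exponent 1.5, and all parameter names and defaults are illustrative assumptions rather than the chapter's exact settings:

```python
import math
import numpy as np

def levy_step(dim, lam=1.5, rng=None):
    """Levy-distributed step via the Mantegna approximation (assumed exponent lam)."""
    rng = np.random.default_rng(rng)
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / lam)

def fpa_minimize(fitness, dim, n_agents=10, n_iter=100, p=0.5, bounds=(0.0, 1.0), seed=None):
    """Generic FPA loop: global (cross) pollination with probability p, else local pollination."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_agents, dim))        # initial flower population
    fit = np.array([fitness(x) for x in X])
    best_i = int(fit.argmin())
    g_best, g_fit = X[best_i].copy(), fit[best_i]
    for _ in range(n_iter):
        for i in range(n_agents):
            if rng.random() < p:                          # global pollination, Eq. (1)
                cand = X[i] + levy_step(dim, rng=rng) * (X[i] - g_best)
            else:                                         # local pollination, Eq. (2)
                j, k = rng.choice(n_agents, size=2, replace=False)
                cand = X[i] + rng.random() * (X[j] - X[k])
            cand = np.clip(cand, lo, hi)
            f_cand = fitness(cand)
            if f_cand < fit[i]:                           # greedy replacement of the flower
                X[i], fit[i] = cand, f_cand
                if f_cand < g_fit:                        # update the global best solution
                    g_best, g_fit = cand.copy(), f_cand
    return g_best, g_fit
```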

3.1 FPA Applied for Feature Selection

FPA is adopted here to exploit the capabilities of the filter and wrapper approaches for feature selection. The filter approach can be described as a data-oriented method that is not directly related to classification performance. The wrapper-based approach is more closely tied to prediction performance, but it does not address redundancy and dependency among the selected features.

We seek to find similarities and differences based on evaluation criteria that may help to reveal the strengths and weaknesses of each approach. All swarm intelligence methods regularly share information between their multiple agents; therefore, at every iteration, all or some agents update their positions based on their own position and the positions of the others.

FPA is applied to feature selection in both classification and regression problems. For a vector with N features, the number of possible feature subsets is \(2^N\), which is a vast space to search exhaustively. Therefore, intelligent optimization is applied to explore the search space adaptively for the best feature subset. The optimal feature subset is the one with the lowest prediction error and the smallest number of selected features, which is a common objective in the literature. In classification problems, the general fitness function of the proposed optimization algorithms maximizes the classification accuracy over the validation set given the training set, as shown in Eq. (3), while keeping the number of selected features to a minimum:

$$\begin{aligned} \downarrow Fitness = \alpha (1-P) + \beta {\frac{\mid R \mid }{\mid C \mid }}, \end{aligned}$$
(3)

where \(\mid R \mid \) indicates the size of the chosen feature subset, \(\mid C \mid \) is the total number of features in the dataset, \({\alpha }\) and \({\beta }\) denote the relative importance of the classification performance and of the subset size, with \({\alpha }\, {\in }\, [0, 1]\) and \({\beta = 1- {\alpha }}\), and P is the classification performance measured as in Eq. (4):

$$\begin{aligned} P = \frac{N_c}{N}, \end{aligned}$$
(4)

where \(N_c\) indicates the number of correctly classified instances, and N is the total number of instances.
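As an illustration, the sketch below evaluates Eqs. (3) and (4) for a binary feature mask with a KNN wrapper (K = 5, as used later in the experiments); the weight \(\alpha = 0.99\) and all variable names are assumptions made for the example, not values taken from Table 1:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def classification_fitness(mask, X_train, y_train, X_val, y_val, alpha=0.99):
    """Eq. (3): alpha * (1 - accuracy) + beta * |R| / |C|, evaluated with a KNN wrapper."""
    beta = 1.0 - alpha
    mask = np.asarray(mask)
    selected = np.flatnonzero(mask)                # indices of the selected features (R)
    if selected.size == 0:                         # an empty subset cannot be evaluated
        return np.inf
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train[:, selected], y_train)
    P = knn.score(X_val[:, selected], y_val)       # Eq. (4): N_c / N on the validation set
    return alpha * (1.0 - P) + beta * selected.size / mask.size
```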

In the case of regression problems, the general fitness function of the proposed optimization algorithms minimizes the prediction error over the validation set given the training set, as in Eq. (5), while keeping the number of selected features to a minimum.

$$\begin{aligned} \downarrow Fitness = \alpha * E + \beta {\frac{\mid R \mid }{\mid C \mid }}, \end{aligned}$$
(5)

where E indicates the prediction error, and \({\alpha }\) and \({\beta }\) weight the importance of the prediction error and of the selected feature subset size, respectively. E is defined as:

$$\begin{aligned} E=\sum _{i=1}^M|a_i-t_i|, \end{aligned}$$
(6)

where \(a_i\) and \(t_i\) are the model prediction and the target value for point i in the validation set, and M is the number of points in the validation set.

Each search agent has the same dimensionality as the number of features in a given dataset. Each dimension is bounded in the range [0, 1]: as a dimension's value approaches 1, the corresponding feature becomes a stronger candidate for selection. During individual fitness calculation, the continuous value is thresholded to decide whether the feature is selected at the evaluation stage; a static threshold of 0.5 is used, as in Eq. (7):

$$\begin{aligned} y_{ij} = {\left\{ \begin{array}{ll} 0 \text{ if }\, x_{ij} < 0.5 \\ 1 \text{ Otherwise }, \end{array}\right. } \end{aligned}$$
(7)

where \(x_{ij}\) is element j of the D-dimensional position of solution i in the feature search space, and \(y_{ij} \in \{0,1\}\) is the corresponding binary value indicating whether feature j is selected in solution i.
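A one-line realization of Eq. (7), applied to a continuous position vector before the wrapper fitness is computed (the function name is illustrative):

```python
import numpy as np

def to_binary(position, threshold=0.5):
    """Eq. (7): dimensions >= 0.5 map to 1 (feature selected), others to 0."""
    return (np.asarray(position) >= threshold).astype(int)

# Example: to_binary([0.83, 0.12, 0.55]) -> array([1, 0, 1])
```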

3.2 FPA Applied for Knapsack Problem

Given a set of n elements, where each element j has a profit \(p_j\) and a weight \(w_j\), and a knapsack of capacity C, the objective is to find the most profitable selection of elements without violating the knapsack weight capacity [35]. A solution describing whether each element is selected can be represented as an n-dimensional binary vector with elements \(x_j \in \{0,1\}\). The problem can therefore be formulated mathematically as:

$$\begin{aligned} Maximize \sum _{j=1}^{n} p_j x_j, \end{aligned}$$
(8)

subject to

$$\begin{aligned} \sum _{j=1}^{n} w_j x_j \leqslant C. \end{aligned}$$
(9)

The knapsack problem is NP-hard and requires an intelligent optimizer to search the huge space of possibilities. FPA is adopted in this work to solve a set of knapsack problems of varying dimensions to demonstrate its search capability. A death penalty [36] is adopted to handle the knapsack capacity constraint, while the total fitness is calculated as in Eq. (8) with a negative sign so that the maximization is standardized into a minimization; a minimal sketch of this penalized objective is given below.
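The sketch assumes infeasible solutions are simply rejected with an infinite penalty; the exact penalty handling used in the experiments may differ:

```python
import numpy as np

def knapsack_fitness(x, profits, weights, capacity):
    """Negated Eq. (8) with a death penalty for violating the capacity constraint of Eq. (9)."""
    x = np.asarray(x)
    if np.dot(weights, x) > capacity:     # infeasible: constraint (9) violated
        return np.inf                     # death penalty rejects the solution outright
    return -float(np.dot(profits, x))     # maximize profit == minimize its negative
```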

4 Experimental Results and Discussion

The global and optimizer-specific parameter settings are outlined in Table 1. The parameters are set either according to domain-specific knowledge, as for the \(\alpha \) and \(\beta \) parameters of the fitness function, or based on trial and error on small simulations and common practice in the literature, as for the remaining parameters.

Table 1 The parameter setting for experiments

In this study, the wrapper approach is used to find a feature subset guided by the prediction performance; hence, an intelligent search method is necessary for exploring the feature space. For the classification datasets, the classifier used in the fitness function of Eq. (3) is KNN [37]. KNN is configured on a trial-and-error basis, and \(K=5\) is selected as the best-performing choice on all the datasets.

4.1 Assessment Indicators

Each algorithm is applied \(K*M\) times with random positioning of the search agents, except that the solution with all features selected is forced to be the position of one of the search agents. Forcing the full-feature solution into the population ensures that any subsequent feature subset selected as the global best solution is fitter than it. Repeated runs of the optimization algorithms were performed to test their convergence capability. We apply two types of indicators (measures) to compare the various algorithms.

  1.

    The first group of indicators is applied directly to the fitness values obtained on the validation set and is used to characterize algorithm performance as follows:

    • Mean fitness: the average fitness of the final solutions obtained by an optimizer over a number of individual runs [38].

    • Median fitness: used to assess the average performance of the optimizer over all M runs while tolerating noisy (outlier) runs [38].

    • Best fitness: the minimum fitness value acquired by the optimizer over M independent runs [38].

    • Worst fitness: the maximum (worst) fitness value acquired by an optimization method over M independent runs [38].

    • Statistical standard deviation (std): a representation of the variation of the best solutions obtained when running a stochastic optimizer for M different runs. Std is used as an indicator of the optimizer's ability to converge to the same or a similar optimal solution [38].

  2.

    The second group of indicators is applied to assess the performance of the entire prediction model as follows:

    • Average classification error: depicts how accurately the classifier performs using the chosen feature subset, as shown in Eq. (10):

      $$\begin{aligned} Perf = \frac{1}{M} \sum _{j=1}^M\frac{1}{N}\sum _{i=1}^N Unmatch(C_i,L_i), \end{aligned}$$
      (10)

      where M represents the total number of runs of the optimization method, N is the total number of instances in the test set, \(C_i\) is the classifier output label for data instance i, \(L_i\) is the reference class label for data instance i, and Unmatch is the function that yields 0 when the two labels are identical and 1 otherwise.

    • Mean square error (MSE): measures the mean of the squared differences between the actual output and the predicted one, as given in Eq. (11):

      $$\begin{aligned} MSE = \frac{\sum _{i=1}^n(pred_i-obs_i)^2}{n}, \end{aligned}$$
      (11)
    • Root mean square error (RMSE): measures the difference between the actual outputs and the predicted ones, as given in Eq. (12):

      $$\begin{aligned} RMSE=\sqrt{\frac{\sum _{i=1}^n(obs_i-pred_i)^2}{n}}, \end{aligned}$$
      (12)

      where \(obs_i\) and \(pred_i\) are the observed and predicted values for example i, respectively, and n is the total number of examples in a given dataset.

    • Average selection size: the average ratio of the size of the chosen feature subset to the total number of features, as in Eq. (13):

      $$\begin{aligned} Selection\_Size= \frac{1}{M}\sum _{i=1}^M \frac{size(g_*^i)}{N_t}, \end{aligned}$$
      (13)

      where \(N_t\) represents the total number of features in a given dataset.

    • Average feature reduction: the average ratio of the number of removed features to the total number of features, as in Eq. (14):

      $$\begin{aligned} Reduction = 1-\frac{1}{M}\sum _{i=1}^M \frac{size(g_*^i)}{N_t}, \end{aligned}$$
      (14)
    • Average Fisher score (F-score): rewards feature subsets for which the distances between data samples of different classes are large while the distances between data instances of the same class are as small as possible [39]. The F-score is computed for individual features given the class labels and averaged over M independent runs of an algorithm, as shown in Eq. (15) (a minimal code sketch appears after this list):

      $$\begin{aligned} F_j=\frac{\sum _{k=1}^cn_k(\mu _k^j-\mu ^j)^2}{(\sigma ^j)^2}, \end{aligned}$$
      (15)

      where \(F_j\) is the Fisher score of feature j, \(\mu ^j\) is the mean of feature j over the entire dataset, \((\sigma ^j)^2\) is the variance of feature j over the whole dataset, \(n_k\) denotes the size of class k, and \(\mu _k^j\) indicates the mean of feature j within class k.

    • Wilcoxon rank-sum test: introduced by Wilcoxon [40] as a non-parametric test. The test assigns ranks to all the scores considered as one group and then sums the ranks of each group. The null hypothesis is that both groups come from the same population, so any difference between the two rank sums arises only from sampling error. The rank-sum test is often described as the non-parametric version of the T-test for two independent groups.

    • T-test: a statistical significance test that decides whether or not the difference between the means of two groups most likely reflects a real difference in the populations from which the groups were sampled, as in Eq. (16) [41]:

      $$\begin{aligned} t=\frac{\bar{x}-\mu _0}{\frac{S}{\sqrt{n}}} \end{aligned}$$
      (16)

      where \(\bar{x}\) is the sample mean, \(\mu _0\) is the hypothesized population mean, S is the sample standard deviation, and n is the sample size, so that \(\frac{S}{\sqrt{n}}\) is the standard error of the mean.

    • Average computational time: the run time of a given optimization algorithm in milliseconds, averaged over the different runs as given in Eq. (17):

      $$\begin{aligned} T_{o}= \frac{1}{M}\sum _{i=1}^M RunTime_{o,i}, \end{aligned}$$
      (17)

      where M is the total number of runs of optimizer o, and \(RunTime_{o,i}\) is the computational time in milliseconds of optimizer o at run number i.
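For completeness, the following minimal sketch computes the per-feature Fisher score exactly as Eq. (15) is written (class-weighted between-class scatter divided by the whole-dataset variance); the function and variable names are illustrative:

```python
import numpy as np

def fisher_scores(X, y):
    """Eq. (15): per-feature Fisher score using the whole-dataset variance in the denominator."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    mu = X.mean(axis=0)                       # mean of each feature over the whole dataset
    var = X.var(axis=0) + 1e-12               # whole-dataset variance (small term avoids /0)
    num = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]                        # samples belonging to class c
        num += Xc.shape[0] * (Xc.mean(axis=0) - mu) ** 2
    return num / var
```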

4.2 Datasets

All datasets were collected to provide a variety of features and instances representative of the different problem types on which the introduced methods are examined. In addition, we selected a set of relatively high-dimensional datasets to verify the performance of the optimization algorithms in huge search spaces. Each dataset is split in cross-validation [42] mode for evaluation, in which K-1 folds are employed to form the training, validation, and testing sets. The procedure is repeated M times; hence, each optimizer is evaluated \(K*M\) times per dataset. Each dataset is partitioned into equally sized training, validation, and testing parts. The training part is used to train the classifier during optimization and at the final evaluation. The validation part is used to assess the performance of the classifier during optimization. The testing part is used to evaluate the finally selected features given the trained classifier. The classification and regression models ensure the quality of the selected features and are assessed on the validation set inside the fitness function during the optimization process [6]. For the regression datasets, the regression model used in the fitness function of Eq. (5) is the extreme learning machine (ELM) with a varying number of hidden neurons and a sigmoid basis function. The ELM used for regression and for evaluating the fitness function has seven input nodes and one hundred hidden nodes (chosen on a trial-and-error basis), because ELM needs more hidden nodes than classical gradient-based training algorithms [28].

Table 2 lists the twenty-one datasets used in the classification problems; they were acquired from the UCI machine learning repository [43, 44]. Table 3 lists the ten datasets used in the regression experiments, also drawn from the UCI machine learning repository [43].

Table 2 List of datasets used in classification data
Table 3 List of datasets used in regression data

4.3 FPA for Feature Selection Using Classification Data

In the classification category, the classifier used in the fitness function of Eq. (3) is KNN [37], applied with \(K=5\), selected on a trial-and-error basis as the best-performing value on all the datasets. The overall purpose of this part is to show that bio-inspired optimization methods for feature selection can reduce the selected feature subset and improve the classification performance compared with using the whole feature set and with conventional feature selection methods.

Table 4 outlines the statistical mean fitness of the FPA [45], BA [46], GA, and PSO optimization algorithms on all 21 classification datasets, calculated over the 20 runs. We can observe that all the optimization methods outperform the full-feature solution, which confirms the capability of the wrapper-based method for the feature selection problem. We can also note that CS in general performs better than the other optimizers, which demonstrates its ability to adaptively explore the search space for the optimal feature combination. To evaluate the stability of the stochastic algorithms in the study and their convergence to the same optimal solution, we measure the standard deviation; the results are shown in Table 5. We can see that, although FPA relies on a Lévy distribution with infinite variance, it still maintains a comparable std measure.

Table 6 outlines the average classification error on the test set of the feature subsets selected by the optimization methods, averaged over the 20 runs. From the table, FPA obtains the best results on average, demonstrating its capability to find optimal feature combinations that ensure good test performance. Regarding the size of the selected features relative to the original size, Table 7 outlines the ratio of kept features to the total number of features; FFA obtains the best selection-size results in general. The performance on the test data is to some extent consistent with the F-scores calculated over the features selected by the different optimizers, shown in Table 8, where GA obtains the best F-score values overall. Table 9 outlines the average computational time of the different optimization algorithms; FPA has the best computational time compared with all other algorithms.

Table 4 Mean fitness of 20 runs
Table 5 Std of fitness values for 20 runs
Table 6 Average classification error of 20 runs
Table 7 Average selection size of 20 runs
Table 8 Average F-score of 20 runs
Table 9 Average computational time (milliseconds) of 20 runs for the different optimizers

4.4 FPA for Feature Selection Using Regression Data

For the regression data, the regression model used in the fitness function of Eq. (5) is the extreme learning machine (ELM). The overall purpose of this section is to introduce bio-inspired optimization algorithms for feature selection that reduce the number of selected features and reduce the prediction error compared with using the whole feature set and conventional feature selection techniques in regression problems.

Table 10 outlines the statistical mean fitness of the BA, CS, DA, FFA, FPA, MAKHA, GA, and PSO optimization algorithms on all ten regression datasets, calculated over the 20 runs. We can note that FPA in general performs better than the other optimizers, which proves its capability to adaptively explore the search space for the best feature subset. To evaluate the stability of the stochastic algorithms in the study and their convergence to the same optimal solution, the standard deviation results are shown in Table 11. We can see that, although FPA relies on a Lévy distribution with infinite variance, it still maintains a comparable std measure.

Table 12 reports the mean RMSE on the test data of the feature subsets selected by the optimization algorithms, averaged over the 20 runs. From the table, FFA obtains the best results on average, demonstrating its capability to find optimal feature combinations that ensure good test performance. Regarding the size of the selected features relative to the original size, Table 13 outlines the ratio of kept features to the total number of features; GA obtains the best selection-size results overall. Table 14 outlines the average computational time of the different optimization algorithms; DA has the best computational time compared with all other algorithms.

Table 10 Mean fitness of 20 runs
Table 11 Std of fitness values for 20 runs
Table 12 Average RMSE of 20 runs
Table 13 Average selection size of 20 runs
Table 14 Average computational time (in milliseconds) of 20 runs

4.5 FPA for Knapsack Problem

In this section, FPA is used and benchmarked against BA, GA, and PSO on the binary knapsack problem. A set of 20 benchmark problems with different dimensionalities and capacities, listed in Table 15, was used in the study.

Table 15 Used problem sets and the corresponding dimension of each problem
Table 16 Mean fitness for the different used optimizers on the different problems

Problems F1–F20 are expected to evaluate the exploitation capability of a given algorithm. We can see in Table 16 that the performance of the FPA optimization algorithm on average outperforms the other methods, which demonstrates the exploitation capability of the FPA algorithm. The same conclusion can be drawn from the median performance presented in Table 17, where FPA still outperforms the BA, GA, and PSO algorithms.

Table 18 reports the best-performance indicator obtained by running the individual optimizers over 20 runs; this indicator targets optimistic users. We can see from the table that FPA outperforms GA and PSO. Table 19 reports the worst-fitness indicator for the different problems; it is intended to assess the worst performance of a given optimizer and hence targets pessimistic users. We can see from the table that the worst performance of FPA still outperforms the other algorithms, which supports the use of FPA in pessimistic applications.

Table 20 reports the standard deviation of each optimizer's best output solution across the independent runs. This indicator is expected to assess the repeatability of the obtained solutions and the convergence to the same or similar optima. We can see from Table 20 that the standard deviation of FPA outperforms those of the other optimizers, which shows that although FPA has a strong exploration capability, it can still converge to the same or similar optima and can therefore be considered a candidate optimizer for repeatable results.

Tables 21 and 22 report the p-values of two common significance tests, used to assess the significance of the performance improvement obtained with the proposed approach. The tests used are the two-sided Wilcoxon rank-sum test and the T-test. We can see that the p-values for both tests are close to 0; the null hypothesis is therefore rejected, which confirms that the improvement obtained with FPA over the BA, GA, and PSO algorithms is statistically significant.

Table 17 Median fitness for the different used optimizers on the different problems
Table 18 Best fitness for the different used optimizers on the different problems
Table 19 Worst fitness for the different used optimizers on the different problems
Table 20 Standard deviation of fitness for the different used optimizers on the different problems
Table 21 P-value for T-test of FPA compared to other optimizers
Table 22 P-value for Wilcoxon of FPA compared to other optimizers

5 Conclusions

This work assesses the performance of FPA in two application domains, namely feature selection and the knapsack problem. For feature selection, FPA can outperform BA, GA, and PSO thanks to its capability to adaptively search spaces with many local optima while avoiding premature convergence. In the knapsack domain, FPA is also found to be very competitive with PSO, GA, and BA, with a tolerable difference in run time and better optimization performance.

As future work, we propose five directions that can be investigated in addition to the work presented here:

  1.

    The proposed FPA method will be assessed using complex datasets that have a huge number (thousands) of input features.

  2.

    Add more statistical evaluation measures, such as sensitivity, specificity, and F-measure.

  3.

    Employ bio-inspired optimization methods for solving challenging problems in different applications such as big data, bioinformatics, and biomedicine.

  4.

    Use more machine learning techniques for wrapper-based fitness evaluation such as support vector machine (SVM), random forest (RF), and support vector regression (SVR).

  5.

    Propose a multi-objective fitness function that uses bio-inspired algorithms to find the optimal feature subset.