1 Introduction

Iterative updates in information technology have increased the amount of data generated and retrieved, and much of this data is high-dimensional. To uncover the useful information hidden in such data, researchers have designed various advanced data mining methods. In this context, feature selection (FS) has become a widely studied technique because it reduces the dimensionality of data, using a small combination of features to express most of the information in the original high-dimensional data.

Not all features in high-dimensional data are valid and necessary. The data may contain a large amount of redundant and invalid information [1], which significantly degrades data analysis: redundant features increase the computational cost, while invalid information decreases the classification accuracy. Therefore, an appropriate feature selection method can improve the reliability and validity of data analysis. At present, FS for high-dimensional data has been applied in many fields, such as customer relationship management in enterprises [2], disease prediction in modern medical diagnosis [3], and customer recommendation in electronic commerce [4].

FS is a preprocessing step that uses certain evaluation criteria to select subsets from the original feature space [5]. The general flow of basic FS is shown in Fig. 1.

Fig. 1. General flow of basic FS.

In the process of FS, feature subsets are generated according to certain rules or strategies [1]. Fitness criteria are then used to evaluate the effectiveness of the subsets, and the better-performing features are stored in the final feature subset. The purpose of selecting feature subsets is to filter out invalid, incomplete, and redundant features as much as possible [2]. In other words, a good feature subset uses fewer features to represent most attributes of the original data, so the information conveyed by the original data is better retained. From this process, it can be seen that FS is a typical optimization problem; consequently, it is now being studied extensively.
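As a concrete illustration, the sketch below implements this generate-evaluate-keep loop in Python. The random subset generator here is only a placeholder for a real search strategy, and `evaluate` stands for any fitness criterion (for example, a classifier's accuracy on the selected features); the function names are ours.

```python
import numpy as np

def basic_fs(n_features, evaluate, n_candidates=50,
             rng=np.random.default_rng(0)):
    """Generic wrapper-style FS loop: generate candidate subsets,
    score each with a fitness criterion, and keep the best."""
    best_mask, best_fit = None, -np.inf
    for _ in range(n_candidates):
        mask = rng.random(n_features) < 0.5   # generate a candidate subset
        fit = evaluate(mask)                  # evaluate it with the criterion
        if fit > best_fit:                    # keep the better subset
            best_mask, best_fit = mask, fit
    return best_mask, best_fit
```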

Aiming at reducing feature redundancy, J. Lee [3] designed a multivariate feature ranking mechanism for FS in high-dimensional microarray data. This mechanism considers not only the relevance between features and targets but also the redundancy of repeated features; the ranking mechanism filters out high-quality features to generate new subsets. E. S. Hosseini and M. H. Moattar [4] also noticed the imbalance of data features and instances: candidate feature subsets were formed and evaluated by a multi-interactive information strategy to reduce the analysis bias caused by imbalanced data.

In addition to data redundancy and imbalance, FS should also consider computational complexity. In this regard, some studies used differential evolution to filter features, enhancing the ability of FS to select feature combinations [5]. F. Aghaeipoor and M. M. Javidi [6] adopted a hybrid filter-wrapper FS method, selecting feature subsets by mRMR and fuzzy rules to improve estimation accuracy and decrease computing time.

In the past, many researchers regarded FS as a single-objective task. However, a growing number of researchers are beginning to pay attention to the interpretability and rationality of the feature subsets generated by FS, and the inner connection between the features in a subset needs to be considered. In particular, when classifying high-dimensional data, an algorithm may generate large feature subsets without removing the redundant features, which is antithetical to the original intention of FS to reduce the dimensionality of data. Nevertheless, if the feature subsets are too small, a lot of information may be filtered out. To make the subsets concise and low-dimensional, it is wise to treat FS as a multi-objective problem. This paper takes the classification error rate and the size of the feature subset as the objectives of FS.
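A minimal sketch of this bi-objective view: each candidate subset is scored by the pair (error rate, subset size), both to be minimized, and candidates can be compared by standard Pareto dominance. The function names are ours, for illustration only.

```python
def objectives(error_rate: float, subset_size: int) -> tuple:
    # Both objectives are minimized: classification error and subset size.
    return (error_rate, subset_size)

def dominates(a: tuple, b: tuple) -> bool:
    """Standard Pareto dominance for minimization: a dominates b if it is
    no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))
```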

Swarm intelligence (SI) algorithms are widely used in multi-objective FS due to their simple structure and few mathematical rules. SI algorithms imitate the evolution processes of biological populations to construct systems that search for optimal solutions: a population explores a large number of candidate solutions to a given problem and identifies the best one. Common SI algorithms include artificial bee colony (ABC) [7], particle swarm optimization (PSO) [8], ant colony optimization (ACO) [8], and so on.

Focusing on the problems of multi-objective FS, researchers have made various improvements to the original SI methods. In [9], the authors take the classification accuracy and the number of selected features as the objectives of FS. To reduce the computational cost of handling multiple objectives, they embed the updating mechanism of PSO into ABC to control the displacement of search individuals, and they develop a ladder-like sample combination strategy to increase search efficiency. To reduce the size of the subsets, Y. Zhou et al. [10] used a discrete method to compress the search space of the particle swarm and set the size of the particle population by a flexible cut-point method to better search for optimal solutions.

Only reducing the size of the subsets can easily trap the algorithms in local optima. To escape local optima, W. Wei et al. [11] designed a vector-based elite learning rule and applied it to an immune algorithm to remove redundant features from the data being analyzed. However, elite learning often brings about premature convergence. To control the convergence speed of such algorithms, A. Santiago et al. [12] proposed a fuzzy selection of operators. All in all, SI methods can be applied to multi-objective FS and modified in various ways.

Besides the SI algorithms mentioned above, bacterial foraging optimization (BFO) is also a widely studied SI algorithm. BFO has been used to deal with a variety of complex optimization problems because of its outstanding search ability and versatility. Inspired by the foraging activities of E. coli, BFO simulates the life process of bacteria; Sect. 2 gives more details.

Existing research on BFO mainly focuses on improvement strategies for the algorithm, combinatorial optimization problems, and optimization scheduling. The improvement strategies mainly include parameter optimization, optimization of the bacterial chemotaxis process, and population structure optimization. For example, H. Chen et al. [13] designed a chaotic local search strategy that better controls the convergence speed of BFO and remedies the defect of the constant chemotaxis step length by changing the step length dynamically, which increases search diversity. In addition, the introduction of chaotic strategies expands the search area of the bacterial population.

Beyond improvements to algorithm strategies, BFO also has multiple applications. B. Turanoğlu and G. Akkaya [14], aiming at the dynamic facility layout problem, designed a hybrid algorithm combining BFO and simulated annealing (SA) to solve such complex optimization problems. For the optimal scheduling problem, M. Kaur and S. Kadam [15] proposed a multi-objective flora optimization algorithm to find the scheme closest to the optimal scheduling solution. However, the size of the population significantly influences the search effect of BFO. To modulate the search population of the algorithm, H. Wang et al. [16] designed a multi-dimensional population mechanism for a BFO-based FS method; the algorithm can generate populations of different sizes, thereby producing different feature subsets in classification tasks and increasing the variety of solutions. At present, very little research focuses on BFO-based multi-objective FS, but the existing studies suggest that this direction is useful and meaningful because BFO performs well on multi-objective problems [17,18,19].

1.1 Goals

This research aims to reduce the dimensionality of high-dimensional data in classification tasks. We design a modified BFO for multi-objective FS with three improved strategies: an adaptive foraging process, structural variation, and an updating mechanism for the search population. The specific contributions are as follows:

  • Design a multi-objective BFO with structural variation to improve the problem of slow convergence of the original algorithm.

  • Adaptively adjust the process of chemotaxis, replication, and elimination-dispersal of the improved BFO to increase the diversity of the search process.

  • Design a feature subsets updating mechanism to identify features that can retain better classification effects and eliminate features that will weaken the classification accuracy.

  • Combine the proposed algorithm with the KNN classifier to form a wrapper FS method to achieve efficient classification effects.

1.2 Organization

The rest of this article is structured as follows: Sect. 2 describes the principle of the basic BFO; Sect. 3 describes the proposed algorithm and its improvement strategies; Sect. 4 presents the experimental results and analysis; and Sect. 5 concludes.

2 Basic Bacterial Foraging Optimization

The basic Bacterial Foraging Optimization (BFO) was proposed by K. M. Passino, mimicking the biological habits of E. coli. He designed three main processes for the BFO algorithm: chemotaxis, reproduction, and elimination-dispersal. In addition, each bacterium performs the activities of swimming, tumbling, attraction, and repulsion. BFO is thus a biologically inspired heuristic algorithm, and research shows that it has strong search ability [20].

The process of bacterial foraging is called chemotaxis. During chemotaxis, bacteria swarm towards places with high nutrient concentration. One chemotaxis step contains two actions: tumbling and swimming. By tumbling, a bacterium selects a random direction in which to search for higher nutrient concentration; by swimming, it keeps exploiting a direction where the nutrient concentration is high. In addition, the bacterial colony has a cell-to-cell communication mechanism via special pheromones. Once a unit finds a good place, it releases an attraction pheromone to inform the other units. On the contrary, if a bacterium is exposed to a place with low nutrient concentration or a noxious one, it releases a repulsive pheromone to warn the other units not to approach. In the BFO algorithm, this mechanism speeds up convergence and helps find the global optimum more efficiently. The position update during the chemotaxis stage is [20]:

$$\begin{array}{*{20}c} {\theta^i \left( {j + 1,k,l} \right) = \theta^i \left( {j,k,l} \right) + C\left( i \right)\frac{\Delta \left( i \right)}{{\sqrt {\Delta^T \left( i \right)\Delta \left( i \right)} }}} \\ \end{array}$$
(1)

\(\theta^i \left( {j,k,l} \right)\) denotes the position of the \(i\)th bacterium in the \(j\)th chemotaxis step, where \(k\) and \(l\) index the reproduction and elimination-dispersal processes, respectively. \(C\left( i \right)\) is the chemotaxis step length, and \(\Delta \left( i \right)\) is a randomly generated direction vector that controls the movement of the \(i\)th bacterium. As there are attraction and repulsion activities between bacteria, the swarming effect is formulated as:

$$\begin{array}{*{20}c} {J_{cc} \left( {\theta^i \left( {j,k,l} \right),P\left( {j,k,l} \right)} \right) = \sum_{m = 1}^S {J_{cc} \left( {\theta^i \left( {j,k,l} \right),\theta^m \left( {j,k,l} \right)} \right)} } \\ \end{array}$$
(2)

where \(J_{cc} \left( {\theta^i \left( {j,k,l} \right),P\left( {j,k,l} \right)} \right)\) denotes the cell-to-cell signaling of the \(i\)th bacterium with the rest of the colony, and \(J_{cc} \left( {\theta^i \left( {j,k,l} \right),\theta^m \left( {j,k,l} \right)} \right)\) is the attractant or repellant communication level between the \(i\)th and \(m\)th bacteria. Therefore, the \(i\)th bacterium is updated with the swarming effect, not just its own fitness \(J\left( {i,j,k,l} \right)\):

$$\begin{array}{*{20}c} {J\left( {i,j,k,l} \right) = J\left( {i,j,k,l} \right) + J_{cc} \left( {\theta^i \left( {j,k,l} \right),P\left( {j,k,l} \right)} \right)} \\ \end{array}$$
(3)
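To make the chemotaxis stage concrete, the sketch below transcribes formulas (1)-(3) in Python. The paper does not spell out the inner form of \(J_{cc}\), so the attractant/repellant expression and its four coefficients follow Passino's original BFO and are illustrative only; drawing each component of \(\Delta \left( i \right)\) uniformly from [-1, 1] is likewise the common convention, not a value stated here.

```python
import numpy as np

def chemotaxis_step(theta_i, C_i, rng=np.random.default_rng()):
    """Formula (1): move bacterium i a distance C(i) along the random
    unit direction Delta(i) / sqrt(Delta^T Delta)."""
    delta = rng.uniform(-1.0, 1.0, size=theta_i.shape)  # tumble direction
    return theta_i + C_i * delta / np.sqrt(delta @ delta)

def swarming_term(theta_i, positions,
                  d_attract=0.1, w_attract=0.2,
                  h_repellant=0.1, w_repellant=10.0):
    """Formula (2): cell-to-cell signaling of bacterium i against the whole
    colony, in Passino's classic attractant/repellant form (coefficient
    values are illustrative, not taken from this paper)."""
    sq_dists = np.sum((positions - theta_i) ** 2, axis=1)
    attract = -d_attract * np.exp(-w_attract * sq_dists)
    repel = h_repellant * np.exp(-w_repellant * sq_dists)
    return float(np.sum(attract + repel))

def swarmed_fitness(J_i, theta_i, positions):
    """Formula (3): the effective fitness adds the swarming term."""
    return J_i + swarming_term(theta_i, positions)
```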

Reproduction simulates the most basic criterion in the natural world: survival of the fittest. It improves the overall quality of the bacterial colony and thus accelerates the convergence of BFO. First, the bacteria are ranked according to their health levels. Second, the half of the colony with lower health levels is replaced by copies of the healthier half. As a result, the overall health level of the next generation is improved. The health of a bacterium is computed as follows:

$$\begin{array}{*{20}c} {J_{health}^i = \sum_{j = 1}^{N_c + 1} {J\left( {i,j,k,l} \right)} } \\ \end{array}$$
(4)

where \(J_{health}^i\) is the health level of the \(i\)th bacterium, calculated by summing its fitness values over the \(N_c\) chemotaxis steps.
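A compact sketch of the reproduction step, assuming \(J\) is a cost to be minimized so that a lower accumulated \(J_{health}\) means a healthier bacterium:

```python
import numpy as np

def reproduction(positions, health):
    """Rank bacteria by accumulated health (formula (4)) and let copies of
    the healthier half replace the weaker half. Assumes an even
    population size."""
    order = np.argsort(health)                 # lowest accumulated cost first
    half = len(positions) // 2
    survivors = positions[order[:half]]        # healthier half survives
    return np.concatenate([survivors, survivors.copy()])  # and splits in two
```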

Elimination-dispersal emulates uncertainty in nature. In the life cycle of a bacterial colony, each bacterium could die abruptly or be transported somewhere else at random due to unpredictable natural events. In BFO, the positions of bacteria are reset with a preset probability \(P_{ed}\), which helps the optimization algorithm reduce premature convergence.
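And a matching sketch of elimination-dispersal; the search bounds are a hypothetical parameter of our illustration:

```python
import numpy as np

def elimination_dispersal(positions, p_ed, bounds,
                          rng=np.random.default_rng()):
    """With probability p_ed, each bacterium is dispersed to a random
    position inside the search bounds (lo, hi)."""
    lo, hi = bounds
    for i in range(len(positions)):
        if rng.random() < p_ed:
            positions[i] = rng.uniform(lo, hi, size=positions.shape[1])
    return positions
```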

3 The Proposed Multi-objective Adapting Chemotaxis BFO Method

Although the basic BFO can solve optimization problems, its computational time cost increases sharply when faced with high-dimensional data. In addition, the original algorithm is better suited to single-objective problems. In this paper, we regard FS as a multi-objective problem with two objectives: minimizing the size of the feature subset and maximizing the classification effect (minimizing the error rate).

This paper proposes an effective algorithm, multi-objective adapting chemotaxis bacterial foraging optimization (MOACBFO); its general flow is shown in Fig. 2. 'h' denotes the number of runs of the program, 'iter' the number of iterations per run, and 'm' the number of swim steps of each bacterium.

Fig. 2. General flow of MOACBFO.

Based on the original BFO, MOACBFO includes four improved strategies: structural variation (a structure mutation of the basic BFO), an external matrix mechanism, an adaptive foraging process, and a feature subsets updating mechanism.

3.1 Structure Variation

Traditional BFO is a multi-layer nested algorithm: when the bacterial population faces a high-dimensional search space, many calculation processes are generated. As the scale of the data grows, many repetitive calculation results are also produced, which does not conform to the original intention of creating a concise and efficient algorithm. Therefore, this paper disassembles the nested structure of BFO, setting the three processes (chemotaxis, replication, and elimination-dispersal) to be carried out sequentially instead of nested. We cancel the original nested loop structure of BFO and carry out serial and parallel transformations.

As Fig. 2 shows, in the search process the chemotaxis operation is performed first, and the classification effect of the intermediate result decides whether the algorithm proceeds to the replication operation or the elimination-dispersal operation in the next stage. It is worth noting that replication or elimination-dispersal is not performed in every iteration; one of these activities takes place only when its condition is met. A schematic of this control flow is sketched below.
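The de-nested control flow can be summarized as follows. The `swarm` object and its methods are schematic placeholders of ours, and the 50% threshold anticipates the rule given later in Sect. 3.3:

```python
def moacbfo_iteration(swarm, evaluate):
    """One sequential (de-nested) iteration: chemotaxis always runs, and the
    intermediate classification effect decides the next operation."""
    swarm.chemotaxis()                  # always performed first
    error_rate = evaluate(swarm)        # intermediate classification effect
    if error_rate > 0.5:
        swarm.elimination_dispersal()   # poor subset: re-scatter explorers
    else:
        swarm.reproduction()            # good subset: exploit survivors
    return error_rate
```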

3.2 External Matrix

To make BFO more suitable for solving multi-objective problems, we introduce an external matrix (EMatrix) [21]. In each iteration, the matrix stores the location information of the population generated by the improved bacterial optimization algorithm, the classification error rate of the classifier, and the size of the feature subset generated in that iteration. The classifier used in this paper is the K-nearest neighbor (KNN) classifier, which is lightweight and efficient.

$$\begin{array}{*{20}c} {Fe_i = \left[ {x_{i1} ,x_{i2} , \ldots ,x_{in} } \right],i = 1,2, \ldots ,z.} \\ \end{array}$$
(5)
$$\begin{array}{*{20}c} {Pos_j = \left[ {Fe^{\prime}_1 ,Fe^{\prime}_2 , \ldots ,Fe^{\prime}_k } \right]} \\ \end{array}$$
(6)
$$\begin{array}{*{20}c} {ErrR_j = \left[ {fit\left( {Pos_1 } \right),fit\left( {Pos_2 } \right), \ldots ,fit\left( {Pos_{iter} } \right)} \right]} \\ \end{array}$$
(7)
$$\begin{array}{*{20}c} {FitN_j = \left[ {Num\left( {Pos_1 } \right),Num\left( {Pos_2 } \right), \ldots ,Num\left( {Pos_{iter} } \right)} \right]} \\ \end{array}$$
(8)
$$\begin{array}{*{20}c} {EMatrix = \left[ {Pos_j ,ErrR_j ,FitN_j } \right],j = 1,2, \ldots ,iter} \\ \end{array}$$
(9)

Formula (5) represents the set of all features, where n is the number of features and z the number of samples. 'Pos' records the feature information found by the jth generation of the bacterial colony, and 'k' is the number of selected features, see formula (6). 'ErrR' is the classification error rate obtained by feeding the feature subset generated by the jth generation into the classifier, see formula (7), where 'iter' is the number of iterations. 'FitN' is the size of the feature subset generated by the jth generation, see formula (8).
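One possible realization of the EMatrix described by formulas (5)-(9) is a simple per-iteration record; the field names below are ours, for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class EMatrix:
    """Per-iteration archive of positions, error rates, and subset sizes."""
    positions: list = field(default_factory=list)     # Pos_j, formula (6)
    error_rates: list = field(default_factory=list)   # ErrR_j, formula (7)
    subset_sizes: list = field(default_factory=list)  # FitN_j, formula (8)

    def record(self, pos, err, size):
        self.positions.append(pos)
        self.error_rates.append(err)
        self.subset_sizes.append(size)
```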

3.3 Adapting Foraging Process

The adaptive bacterial foraging process mainly changes the step length of bacterial chemotaxis adaptively; in addition, there are adaptive options for replication and elimination-dispersal. The basic chemotaxis step length in BFO is fixed, but the movement of bacteria is not rigid in the real world. A fixed step makes the bacteria easy to trap in the same local region, which is not beneficial to the diversified development of the population; in other words, a lack of diversity in the colony leads to incomplete results. In this paper, a simple adaptive chemotaxis step-changing method is adopted [22]. In MOACBFO, the initial step control mechanism for each bacterium follows formulas (10)-(11):

$$\begin{array}{*{20}c} { \gamma = \left| {\left( {1 - \frac{i}{Swarm}} \right)*\left( {Che_{start} - Che_{end} } \right) + Che_{end } } \right|} \\ \end{array}$$
(10)
$$\begin{array}{*{20}c} {Che_{start} = \frac{{\left| {J_{new} \left( {i,j} \right)} \right|}}{{\left| {J_{new} \left( { i,j} \right)} \right| + \gamma }} = \frac{1}{{ \left( {1 + \frac{\gamma }{{\left| {J_{new} \left( {i,j} \right)} \right|}}} \right)}}} \\ \end{array}$$
(11)

'γ' is an adaptive parameter following formula (10); it controls the chemotaxis step length \(Che_{start}\). \(J_{new} \left( {i,j} \right)\) is a variant of formula (3): it removes 'k' and 'l' because the structure of BFO has been changed, and the new fitness follows formula (14).

A bacterium may do redundant work if there is no communication between the individual and the population. To reduce the computation cost, this paper introduces a learning strategy from PSO, see formula (12).

$$\begin{array}{*{20}c} {Che_{end} = Che_{start} + c_a R_a \left( {PB_i - P_i } \right) + c_b R_b \left( {B - P_i } \right)} \\ \end{array}$$
(12)

'\(PB_i\)' stands for the best solution found by the \(i\)th bacterium, and '\(B\)' is the best location searched by the entire bacterial group; \(c_a\) and \(c_b\) are learning coefficients and \(R_a\) and \(R_b\) are random numbers, following the PSO convention. Table 1 shows the pseudo-code of MOACBFO.

Table 1. The pseudo-code of MOACBFO

When the error rate exceeds 50%, the bacterial population performs the elimination-dispersal operation: a high error rate means the combination of features in the subset cannot reflect most of the information in the data. To save computational resources, MOACBFO then performs only elimination-dispersal instead of replication. On the contrary, if the error rate is small, MOACBFO performs the replication activity.
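A sketch of the adaptive step mechanism of formulas (10) and (12); the learning coefficients \(c_a\), \(c_b\) default to the common PSO value 2.0, which the paper does not specify, and \(R_a\), \(R_b\) are drawn uniformly from [0, 1]:

```python
import numpy as np

def adaptive_gamma(i, swarm_size, che_start, che_end):
    """Formula (10): the adaptive parameter gamma interpolates between
    Che_start and Che_end according to the bacterium's index i."""
    return abs((1 - i / swarm_size) * (che_start - che_end) + che_end)

def pso_like_update(che_start, pb_i, b, p_i,
                    c_a=2.0, c_b=2.0, rng=np.random.default_rng()):
    """Formula (12): pull the step toward the personal best PB_i and the
    global best B, in the PSO style."""
    r_a, r_b = rng.random(), rng.random()
    return che_start + c_a * r_a * (pb_i - p_i) + c_b * r_b * (b - p_i)
```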

3.4 Feature Subsets Updating Strategy

High-dimensional data often bring thousands of instances with many features. If the search population remains the same all the time, the diversity of feature subsets will stagnate. This paper therefore proposes a feature subsets updating strategy, see Table 2.

Before updating the features in a subset, the position 'P' of each bacterium needs to be determined. The basic position updating formula is as follows:

$$\begin{array}{*{20}c} {P\left( {i,j + 1} \right) = Pos\left( {i,j} \right) + C_{en} \left( i \right)\frac{\Delta \left( i \right)}{{\sqrt {\Delta^T \left( i \right)\Delta \left( i \right)} }}} \\ \end{array}$$
(13)

After the position information is recorded into the EMatrix, the fitness (error rate) is calculated and saved into the EMatrix as well. The fitness value \(J_{new} \left( {i,j} \right)\) is acquired by formula (14), which uses the K-nearest neighbor classifier with 'P' as the input.

$$\begin{array}{*{20}c} {J_{new} \left( {i,j} \right) = Classifier_{KNN} \left( {P\left( {i,j + 1} \right)} \right)} \\ \end{array}$$
(14)
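A sketch of formula (14): the bacterium's position is binarized into a feature mask and the KNN error rate on the selected columns is returned as the fitness. The 0.5 threshold and the use of scikit-learn's KNN with 3-fold cross-validation are our assumptions; the paper does not state the binarization rule or the evaluation protocol.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def knn_fitness(position, X, y, k=5):
    """Formula (14): fitness of a bacterium = KNN classification error
    rate on the feature subset encoded by its position."""
    mask = position > 0.5            # continuous position -> feature mask
    if not mask.any():
        return 1.0                   # empty subset: worst possible error
    clf = KNeighborsClassifier(n_neighbors=k)
    acc = cross_val_score(clf, X[:, mask], y, cv=3).mean()
    return 1.0 - acc                 # fitness is the error rate
```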

The pseudo-code of feature subsets updating strategy is below.

Table 2. The pseudo-code of feature subsets updating

4 Experiments and Analysis

In this section, the proposed algorithm is compared with four derived versions of traditional swarm intelligence algorithms. The experiments are carried out on 11 microarray datasets.

4.1 Experiment Design and Parameter Initialization

The comparison algorithms are BPSO [23], BFO [20], BFOLIW [24], and BWFS [21]; Table 3 records their characteristics. To compare MOACBFO against multi-objective counterparts, all comparison algorithms are extended with the same external matrix mechanism used in this paper. The final comparison algorithms are therefore multi-objective versions of BPSO, BFO, BFOLIW, and BWFS.

Table 3. Algorithms for comparison

To make the comparison fair and reliable, the parameters were set with reference to the majority of the literature. As the population size S is usually 30 to 50, this paper takes 45 as the population size and dimension. The maximum number of iterations is 200, and the program is run 10 times.
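For reference, these settings can be collected into a small configuration block (the names are ours):

```python
# Experiment settings as reported above.
CONFIG = {
    "population_size": 45,   # S, chosen within the usual 30-50 range
    "max_iterations": 200,   # iterations per run
    "independent_runs": 10,  # repetitions of the whole program
}
```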

4.2 Data Sources

The testing datasets are 11 microarray datasets from the UCI machine learning repository [24]. They are high-dimensional cancer data with two or more categories and more than 5,000 features. The experiments compare the error rate and subset size achieved by each algorithm.

Table 4. Microarray datasets

4.3 Experiment Results and Analysis

Figures 3 and 4 show the results of the comparisons. The abscissa represents the size of the subsets generated by the algorithms, and the ordinate records the classification error rate. From the results, MOACBFO consistently achieves the lowest error rate, reflecting its strong ability to search for optimal results.

Fig. 3. Experimental results of all compared algorithms on the microarray datasets.

Besides, compared with the other methods, MOACBFO obtains subsets with a small number of features at a lower error rate. The results in Fig. 3 and Fig. 4 show that MOACBFO has better comprehensive performance; in particular, on the datasets DLBCL, Lung-Cancer, Prostate-Tumor, Brain-Tumor2, Leukemia1, and SRBCT, MOACBFO obtains better diversity of results.

Fig. 4. Experimental results of all compared algorithms on the microarray datasets.

To compare the algorithms in more detail, Table 5 shows the numerical results of the average error rates and subset sizes of all methods. As shown in Table 5, MOACBFO generally reaches an error rate of less than 0.005. Although the number of selected features is not always the smallest on each dataset, the error rate of MOACBFO is low, since the results must balance the two objectives (minimizing the error rate and the feature subset size).

Table 5. Average error rate and subset size of all algorithms

5 Conclusions

In general, this paper proposes a modified FS method, MOACBFO, which achieves good performance compared with four swarm intelligence algorithms on 11 high-dimensional microarray datasets. MOACBFO effectively improves the classification results of the modified BFO method from a multi-objective optimization perspective and achieves the four goals proposed in this paper: good convergence, better diversity of search results, higher classification accuracy, and better overall effect in multi-objective FS for classification. However, MOACBFO needs further comparisons to verify its stability. In the future, we will pay more attention to applying MOACBFO and its variants to practical problems.