
1 Introduction

Among the mathematical tools for dealing with vague and imprecise problems, rough set theory is widely used in many fields. However, the classical rough set model is not suitable for handling continuous data. For this reason, some scholars introduced the concept of the neighborhood rough set to solve such problems. Lin [1] proposed the concept of the neighborhood model relation based on topology from the perspective of granularity. Subsequently, Hu et al. [2] proposed a fast reduction algorithm based on the neighborhood rough set model and a hybrid data reduction algorithm based on the neighborhood rough set, which can directly process raw numerical data and thereby extend the application range of the classical rough set model. In the neighborhood rough model, because the computational workload is much larger than that for discrete data, attribute reduction and feature dimensionality reduction have a higher time complexity, and this complexity grows geometrically with the dimensionality of the decision table. Therefore, reducing the time complexity and the computational workload is a key research topic for many scholars.

In recent years, swarm intelligence algorithms have achieved good results in solving optimization problems [4,5,6,7,8]. Researchers have introduced the artificial bee colony optimization algorithm [4] into attribute reduction for neighborhood rough sets: a fitness function is constructed to guide the colony to search for the smallest feature subset and thus achieve reduction. A well-designed fitness function can reduce the number of iterations, lower the time complexity of reduction, improve the reduction rate of the algorithm, and obtain multiple minimum attribute reductions. For example, literature [5] proposed an information-gain-guided bee colony optimization algorithm, which constructs the mutual information between conditional features and decision features based on the information entropy of the features and finally obtains a feature subset. Literature [6] combined fuzzy rough sets with the bee colony algorithm for attribute reduction: an objective function reflecting the size and importance of the attribute set is constructed from the dependency and the reduction rate, the attribute reduction problem is transformed into an optimization problem, and the objective function is used as the iteration criterion. Literature [7] proposed a new combined bee colony algorithm to solve the minimal attribute reduction (MAR) problem, in which the employed bees, onlooker bees, and scout bees adopt a search mode based on mutation computation. Literature [8] proposed a method combining classical rough sets with the artificial bee colony, whose algorithm can obtain multiple minimum feature subsets.

The neighborhood rough set attribute subset selection algorithms optimized by these artificial bee colony algorithms all use attribute dependency and the number of selected attributes as parameters to construct the fitness function. Although satisfactory results have been obtained in searching attribute subsets, there are limitations. For this reason, literature [9] introduced attribute importance based on information entropy as heuristic information and, combined with attribute dependency, constructed a new fitness function, thereby proposing an attribute selection algorithm that speeds up attribute reduction and reduces the number of iterations. However, when the information-entropy attribute importance is used as the heuristic information in the fitness function, the feature subset found by the bee colony may still contain redundant attributes for some decision tables. This paper draws on the fitness function constructed in literature [9]: based on how attributes distinguish objects in different neighborhoods in the discernibility matrix, a new attribute importance is defined, and the information-entropy attribute importance in the fitness function is replaced by this new importance, yielding an improved attribute selection algorithm based on the artificial bee colony and the neighborhood rough set discernibility matrix. The main advantages of this algorithm are that it provides a new idea for neighborhood rough set attribute reduction, reduces the number of generations, speeds up convergence, and retains the minimum attribute reductions collected during each iteration, which makes the resulting attribute reduction more accurate. The rest of this article is organized as follows. Section 2 introduces the basics of neighborhood rough sets and the artificial bee colony algorithm, Sect. 3 presents the attribute reduction algorithm, Sect. 4 gives the experimental analysis, and Sect. 5 concludes.

2 Related Knowledge

To facilitate understanding, this section briefly introduces the knowledge of neighborhood rough sets, artificial bee colony optimization, and its improved algorithm as used in this article. For more detailed knowledge, please refer to literature [1, 3].

2.1 Basic Knowledge of Neighborhood Rough Set

In neighborhood rough sets, the equivalence relation is extended to a neighborhood relation, so that the information in the original data is retained to the greatest extent.

Definition 1

 [2] Given an N-dimensional real space \(\varOmega \), a mapping \(\varDelta : R^{N} \times R^{N} \rightarrow R\) is called a metric on \(R^{N}\). Given a non-empty finite set \(U=\left\{ x_{1}, x_{2}, \ldots , x_{n}\right\} \) in the real space \(\varOmega \), the \(\delta \)-neighborhood of \(\forall x_{i}\) is defined as:

$$\begin{aligned} \delta \left( x_{i}\right) =\left\{ x \mid x \in U, \varDelta \left( x, x_{i}\right) \le \delta \right\} \end{aligned}$$
(1)

where \(\varDelta \) represents the distance function; the Euclidean distance, i.e., \(p=2\), is used in this article:

$$\begin{aligned} \varDelta _{p}\left( x_{1}, x_{2}\right) =\left( \sum _{k=1}^{N}\left| f\left( x_{1}, a_{k}\right) -f\left( x_{2}, a_{k}\right) \right| ^{p}\right) ^{\frac{1}{p}} \end{aligned}$$
(2)

Definition 2

 [2] Given a neighborhood decision system \(N D S=(U, C \cup D, V, f)\), for \(B \subseteq C\) and \(X \subseteq U\), the lower and upper neighborhood approximations of X with respect to B are defined as:

$$\begin{aligned} \underline{N}_{B}(X)=\left\{ x_{i} \mid \delta _{B}\left( x_{i}\right) \subseteq X, x_{i} \in U\right\} \end{aligned}$$
(3)
$$\begin{aligned} \bar{N}_{B}(X)=\left\{ x_{i} \mid \delta _{B}\left( x_{i}\right) \cap X \ne \varnothing , x_{i} \in U\right\} \end{aligned}$$
(4)

By Definition 2, the positive region and negative region of the neighborhood decision system are:

$$\begin{aligned} {\text {Pos}}_{B}(D)=\underline{N}_{B} D \end{aligned}$$
(5)
$$\begin{aligned} {\text {Neg}}_{B}(D)=U-\bar{N}_{B} D \end{aligned}$$
(6)

Definition 3

 [2] Given a neighborhood decision system \(N D S=(U, C \cup D, V, f)\), the dependency of the decision attribute set D on the conditional attribute subset B is defined as:

$$\begin{aligned} \gamma _{B}(D)=\frac{\left| {\text {Pos}}_{B}(D)\right| }{|U|} \end{aligned}$$
(7)

Definition 4

 [2] Given a neighborhood decision system \(N D S=(U, C \cup D, V, f)\), for \(\forall c \in C\), if \(\gamma _{C-\{c\}}(D) \ne \gamma _{C}(D)\), then c is called a necessary attribute; otherwise it is an unnecessary attribute. All necessary attributes form the core attribute set CORE(C).

Definition 5

 [2] Given a neighborhood decision system \(N D S=(U, C \cup D, V, f)\) and \(\forall A \subseteq C\), suppose the conditional attribute subset A satisfies: (1) \(\gamma _{A}(D)=\gamma _{C}(D)\); (2) \(\forall a \in A, \gamma _{A}(D)>\gamma _{A-\{a\}}(D)\).

Then A is said to be a relative reduction of the conditional attribute set, and the set of all reductions is denoted as RED(C).
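To make Definitions 1–5 concrete, the following minimal sketch (not the authors' code) computes \(\delta \)-neighborhoods, the positive region and the dependency \(\gamma _{B}(D)\) on a small synthetic decision table; the toy data, the function names and the choice \(\delta =0.3\) are illustrative assumptions.

```python
import numpy as np

def neighborhood(X, i, B, delta):
    """Indices of objects within Euclidean distance delta of x_i on attribute subset B (Eqs. 1-2)."""
    d = np.sqrt(((X[:, B] - X[i, B]) ** 2).sum(axis=1))
    return np.where(d <= delta)[0]

def positive_region(X, y, B, delta):
    """Objects whose delta-neighborhood (w.r.t. B) lies inside one decision class (Eqs. 3 and 5)."""
    pos = []
    for i in range(len(X)):
        nb = neighborhood(X, i, B, delta)
        if np.all(y[nb] == y[i]):          # neighborhood contained in x_i's decision class
            pos.append(i)
    return pos

def dependency(X, y, B, delta):
    """gamma_B(D) = |Pos_B(D)| / |U|  (Eq. 7)."""
    return len(positive_region(X, y, B, delta)) / len(X)

# toy decision table: 6 objects, 3 normalized condition attributes, binary decision
X = np.array([[0.1, 0.2, 0.9],
              [0.2, 0.1, 0.8],
              [0.9, 0.8, 0.1],
              [0.8, 0.9, 0.2],
              [0.5, 0.5, 0.5],
              [0.4, 0.6, 0.4]])
y = np.array([0, 0, 1, 1, 0, 1])

print(positive_region(X, y, B=[0, 1, 2], delta=0.3))   # objects 4 and 5 fall out
print(dependency(X, y, B=[0, 1, 2], delta=0.3))        # 4/6
```

In this reading, a candidate subset B is a reduction candidate when it keeps \(\gamma _{B}(D)\) equal to \(\gamma _{C}(D)\) while dropping attributes, which is exactly condition (1) of Definition 5.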

2.2 Artificial Bee Colony and Its Improved Algorithm

Karaboga et al. [4] proposed the artificial bee colony (ABC) algorithm by simulating the nectar-collecting behavior of bee colonies in nature. The algorithm converts the problem of locating nectar sources into the problem of generating an optimal solution of a function: the process of the bee colony searching for nectar sources corresponds to the process of searching for the optimal solution. The idea of the algorithm is as follows:

Suppose the dimension of the problem to be solved is D. The position of nectar source i at iteration t is expressed as \(X_{i}^{t}=\left[ x_{i 1}^{t}, x_{i 2}^{t}, \ldots , x_{i D}^{t}\right] \), where t is the current number of iterations, \(x_{i m} \in \left( K_{m}, U_{m}\right) \), and \(K_{m}\) and \(U_{m}\) respectively represent the lower and upper bounds of the m-th parameter, \(m=1,2, \ldots , D\). The initial position of nectar source i is generated randomly. At the beginning of the search, the employed bee searches around nectar source i to generate a new nectar source.

When the fitness of the new nectar source \(W_{i}=\left[ w_{i 1}, w_{i 2}, \ldots , w_{i D}\right] \) is better than that of \(X_{i}\), greedy selection is used to replace \(X_{i}\) with \(W_{i}\); otherwise \(X_{i}\) is retained. After the employed bee completes its search, it flies back to the information exchange area to share the nectar source information, and the onlooker (follower) bees choose a nectar source to follow with a probability calculated from the information shared by the employed bees.

Then the follower bee uses the roulette-wheel method to select an employed bee. During the search, if nectar source \(X_i\) reaches the threshold limit of trial iterations without a better nectar source being found, \(X_i\) is abandoned and the corresponding employed bee turns into a scout bee. The scout bee randomly generates a new nectar source in the search space to replace \(X_i\).

The flow chart of the artificial bee colony algorithm is shown in Fig. 1.

Fig. 1. Flow chart of the artificial bee colony algorithm.

The artificial bee colony algorithm has strong robustness and strong search ability. However, it tends to fall into local optima and converges slowly in later stages, and researchers have proposed a variety of improvements; among them, the improvement in literature [10] has been widely used. Gao et al. [10] combined the differential evolution algorithm and the globally best solution to improve the bee colony algorithm, proposing a variant with faster convergence. The information of the current best solution is introduced into the generation of candidate nectar sources, and the position update formula is modified to:

$$\begin{aligned} v_{i j}=x_{i j}+\varphi _{i j}\left( x_{i j}-x_{k j}\right) +\alpha _{i j}\left( x_{g j}-x_{k j}\right) \end{aligned}$$
(8)

Among them, \(x_{g j}\) is the best nectar source found by the bee colony so far, and \(\alpha _{i j} \in [0,1]\). This paper draws on Gao et al.'s nectar source position update Formula (8) to optimize the algorithm proposed in this paper.
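As an illustration only, the modified candidate generation in Eq. (8) could be implemented as below; the function name and bound handling are assumptions of this sketch, `x_best` stands for the globally best source \(x_{g}\), and the range \([-1,1]\) assumed for \(\varphi _{ij}\) follows the standard ABC setting while \(\alpha _{ij} \in [0,1]\) follows the description above.

```python
import numpy as np

def update_position(X, i, x_best, lower, upper, rng=np.random.default_rng(0)):
    """Candidate source from Eq. (8): one randomly chosen dimension of x_i is perturbed
    toward a random neighbour x_k and toward the best source found so far."""
    n_sources, dim = X.shape
    k = rng.choice([s for s in range(n_sources) if s != i])   # random neighbour source k != i
    j = rng.integers(dim)                                      # dimension to modify
    phi = rng.uniform(-1.0, 1.0)                               # phi_ij
    alpha = rng.uniform(0.0, 1.0)                              # alpha_ij
    v = X[i].copy()
    v[j] = X[i, j] + phi * (X[i, j] - X[k, j]) + alpha * (x_best[j] - X[k, j])
    return np.clip(v, lower, upper)                            # keep the source within bounds

X = np.random.default_rng(1).uniform(0.0, 1.0, size=(5, 3))    # 5 sources, 3 dimensions
print(update_position(X, i=0, x_best=X[2], lower=0.0, upper=1.0))
```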

3 Improved Attribute Selection Algorithm Based on Artificial Bee Colony and Neighborhood Discernibility Matrix

3.1 Definition of Attribute Importance of Neighborhood Discernibility Matrix

A. Skowron [11] first proposed using a discernibility matrix to represent a knowledge system, defined as follows:

Definition 6

 [11] Given a decision system \(D S=(U, C \cup D, V, f)\), where C is the conditional attribute set, D is the decision attribute set, and the universe is a non-empty finite set of objects \(U=\left\{ x_{1}, x_{2}, \ldots , x_{n}\right\} \) with \(|U|=n\), the discernibility matrix of the decision system is defined as:

$$\begin{aligned} \left( M_{\mathrm {ij}}\right) _{n \times n}=\left( c_{i j}\right) _{n \times n}=\left[ \begin{array}{cccc} c_{11} &{} c_{12} &{} \cdots &{} c_{1 n} \\ c_{21} &{} c_{22} &{} \cdots &{} c_{2 n} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ c_{n 1} &{} c_{n 2} &{} \cdots &{} c_{n n} \end{array}\right] \end{aligned}$$
(9)

where \(i, j=1,2, \ldots , n\) and

$$\begin{aligned} C_{i j}=\left\{ \begin{array}{c} \left\{ a \mid a \in C \wedge f_{a}\left( x_{i}\right) \ne f_{a}\left( x_{j}\right) \right\} , f_{D}\left( x_{i}\right) \ne f_{D}\left( x_{j}\right) \\ \varnothing , \quad f_{D}\left( x_{i}\right) =f_{D}\left( x_{j}\right) \end{array}\right. \end{aligned}$$
(10)

In this paper, the above Definition 6 is extended to neighborhood rough sets, and the discernibility matrix under the neighborhood relation is defined as follows:

Definition 7

Given a neighborhood decision system \(N D S=(U, C \cup D, V, f)\), where C is the conditional attribute set and D is the decision attribute set, for any \(x_i, x_j \in U\) and attribute \(a \in C\), the neighborhood discernibility matrix \(M_{ij}\) of the decision system NDS is defined as:

$$\begin{aligned} \left( M_{\mathrm {ij}}\right) _{n \times n}=\left\{ \begin{array}{c} \left\{ a \mid a \in C \wedge \left| f_{a}\left( x_{i}\right) -f_{a}\left( x_{j}\right) \right| >\delta \left( x_{i}\right) \right\} , f_{D}\left( x_{i}\right) \ne f_{D}\left( x_{j}\right) \\ 0, f_{D}\left( x_{i}\right) \ne f_{D}\left( x_{j}\right) \wedge \left| f_{C}\left( x_{i}\right) -f_{C}\left( x_{j}\right) \right| \le \delta \left( x_{i}\right) \\ \varnothing , f_{D}\left( x_{i}\right) =f_{D}\left( x_{j}\right) \end{array}\right. \end{aligned}$$
(11)

According to Definition 7, it can be seen that \(M_{ij}\) is an upper or lower triangular matrix.
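The following sketch is an illustrative reading of Eq. (11), not the authors' implementation: it builds the neighborhood discernibility matrix for normalized numeric attributes, filling only the upper triangle, and uses a single global \(\delta \) for the per-attribute comparison, which is an assumption.

```python
import numpy as np

def neighborhood_discernibility_matrix(X, y, delta):
    """M[i][j] per Eq. (11): the set of attributes a with |f_a(x_i) - f_a(x_j)| > delta when
    the decisions differ, 0 when the decisions differ but no attribute distinguishes the
    objects, and the empty set when the decisions are equal."""
    n, m = X.shape
    M = [[frozenset() for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] == y[j]:
                continue                                   # equal decisions -> empty set
            attrs = {a for a in range(m) if abs(X[i, a] - X[j, a]) > delta}
            M[i][j] = frozenset(attrs) if attrs else 0     # 0: indistinguishable pair
    return M
```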

Theorem 1

In the neighborhood discernibility matrix \(M_{ij}\), the set of all single-attribute elements is the necessary (core) attribute set of C relative to D, namely \({\text {CORE}}_{C}(D)=\left\{ a \mid a \in C \wedge \left( \exists c_{i j},\left( \left( c_{i j} \in M\right) \wedge \left( c_{i j}=\{a\}\right) \right) \right) \right\} \)

Through the analysis of Eq. (11), we know that objects with equal decision attribute values do not need to be distinguished, so \(c_{i j}=\varnothing \). For objects whose decision attribute values are unequal and which belong to the same neighborhood, the objects cannot be distinguished, so \(c_{i j}=0\). For objects with unequal decision attribute values that belong to different neighborhoods, the objects can be distinguished, and \(c_{i j}\) consists of the conditional attributes in C that distinguish them, which may include attributes in CORE(C) as well as non-core attributes. All the attributes in \(c_{i j}\) play a role in distinguishing the objects \(x_{i}\) and \(x_{j}\) in different neighborhoods, but their effects differ: necessary attributes contribute the most, relatively necessary attributes come second, and unnecessary attributes contribute the least. In order to accurately measure the role of different types of conditional attributes in distinguishing objects in different neighborhoods, this paper uses the frequency of occurrence of each attribute in \(c_{i j}\) together with its weight to measure its importance, defined as follows:

Definition 8

Given a neighborhood decision system \(N D S=(U, C \cup D, V, f)\), where C is the set of conditional attributes and D is the set of decision attributes, let \(c_{i j}\) denote a non-empty element set and K the total number of non-empty element sets. For \(\forall c \in C\), the importance of attribute c with respect to D is defined as:

$$\begin{aligned} {\text {Nsig}}(c)=\left\{ \begin{array}{c} 1, \quad \left| c_{i j}\right| =1 \\ \frac{\sum _{i, j=1,2, \ldots , n} r_{c} \cap c_{i j}}{K},\left| c_{i j}\right| >1 \end{array}\right. \end{aligned}$$
(12)

where \(r_{c} \cap c_{i j}=\left\{ \begin{array}{ll}\frac{1}{\left| c_{i j}\right| }, &{} c \cap c_{i j} \ne \varnothing \\ 0, &{} c \cap c_{i j}=\varnothing \end{array}\right. \), \(i, j=1,2, \ldots , n\), and \(\left| c_{i j}\right| \) denotes the number of attributes in the set \(c_{i j}\).

It can be seen from Formula (12) in Definition 8 that when \(\left| c_{i j}\right| =1\), i.e., \(c_{i j}\) is a single attribute, its importance is 1, which means that a single attribute contributes the most when distinguishing objects in different neighborhoods; this is consistent with the single-attribute (core) elements in Theorem 1. When \(\left| c_{i j}\right| >1\), i.e., the set \(c_{i j}\) consists of multiple attributes, distinguishing objects in different neighborhoods is accomplished by several attributes, which may be necessary attributes, relatively necessary attributes, or unnecessary attributes, and each attribute contributes \(\frac{1}{\left| c_{i j}\right| }\). Obviously, the importance of a core attribute is the sum of its importance 1 as a single attribute and its importance within multi-attribute sets, namely \(1+\frac{\sum _{i, j=1,2, \cdots , n} r_{c} \cap c_{i j}}{K}\), which also reflects that core attributes have the strongest ability to distinguish neighborhood objects. From Definition 8 we obtain two properties:

Property 1:

For the set \(c_{i j}\) of non-empty elements in the discernibility matrix M, if \(\forall c \in C\) and \(c \in c_{i j}\), then \(Nsig(c)>0\).

It can be seen from Property 1 that Definition 8 can obtain not only the importance of necessary attributes but also the importance of relatively necessary and unnecessary attributes. This shows that all attributes in the set play a role in distinguishing objects in different neighborhoods, avoiding the situation in which the importance of non-core attributes is uniformly zero under the algebraic view.

Property 2:

For the non-empty element set \(c_{i j}\) in the discernibility matrix M and \(\forall a,b \in c_{i j}\), if \(a \in CORE(C)\) and \(b \in N\), then \(Nsig(a)>Nsig(b)>0\), where \(N=C-CORE(C)\) denotes the non-core attribute set.

Property 2 shows that the importance of core attributes is the largest and that the importance of non-core attributes is smaller, which is consistent with the properties of neighborhood rough sets. Therefore, Definition 8 accurately measures the role of attributes in distinguishing neighborhood objects.
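Continuing the sketch above, the importance of Definition 8 can be read off the non-empty entries of the matrix; treating the singleton (core) contribution as an added 1, as discussed after Eq. (12), is this sketch's interpretation rather than the authors' code, and the tiny hand-built matrix in the example is hypothetical.

```python
def attribute_importance(M, n_attrs):
    """Nsig per Eq. (12): every attribute in a multi-attribute entry c_ij contributes 1/|c_ij|,
    averaged over the K non-empty entries; attributes that appear as singleton entries
    (core attributes) additionally receive the fixed contribution 1."""
    entries = [c for row in M for c in row if c != 0 and len(c) > 0]
    K = len(entries)
    nsig = [0.0] * n_attrs
    singleton = [False] * n_attrs
    for c in entries:
        if len(c) == 1:
            singleton[next(iter(c))] = True
        else:
            for a in c:
                nsig[a] += 1.0 / len(c)
    nsig = [s / K if K else 0.0 for s in nsig]
    return [1.0 + s if singleton[a] else s for a, s in enumerate(nsig)]

# tiny hand-built example: 3 objects, attributes indexed 0..2, upper triangle only
M = [[frozenset(), frozenset({1}), frozenset({0, 2})],
     [frozenset(), frozenset(),    0],
     [frozenset(), frozenset(),    frozenset()]]
print(attribute_importance(M, n_attrs=3))   # attribute 1 (a core attribute) scores highest
```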

3.2 Fitness Function Construction

The fitness function directly determines the evolution direction of the colony, the number of iterations, and the quality of the solution; therefore, a suitable fitness function plays an important role in the attribute selection method. At present, attribute selection algorithms optimized by the artificial bee colony algorithm commonly use the fitness function in Formula (13):

$$\begin{aligned} f i t=\alpha \cdot \gamma _{B}(D)+\beta \cdot \frac{|C|-|B|}{|C|} \end{aligned}$$
(13)

Among them, \(\alpha +\beta =1\) and \(\alpha ,\beta \in [0,1]\); |C| is the total number of feature attributes of the data set, |B| is the length of the currently selected feature subset, and \(\gamma _{B}(D)\) is the attribute dependency. Equation (13) has been widely used in research on feature selection methods based on neighborhood rough set attribute dependency and has achieved good results. However, this fitness function also has some shortcomings: first, the attribute dependency in the fitness function is calculated only for the selected attributes, and the remaining attributes in the attribute set are not analyzed or evaluated [9]; second, the fitness function selects features based only on the attribute dependency and the number of attributes, ignoring heuristic information that could speed up the search for a reduced subset. For this reason, literature [9] introduced attribute importance based on information entropy as heuristic information and, combined with attribute dependency, constructed a new fitness function, as shown in Formula (14):

$$\begin{aligned} f i t^{2}=\lambda _{1} \cdot \gamma _{B}(D)+\lambda _{2} \cdot \frac{1}{1+e^{s i g(C-B)}}+\lambda _{3} \cdot \frac{|C|-|B|}{|C|} \end{aligned}$$
(14)

Among them, \(\lambda _{1},\lambda _{2},\lambda _{3}\) are weighting factors with \(\lambda _{1}+\lambda _{2}+\lambda _{3}=1\) and \(\lambda _{1},\lambda _{2},\lambda _{3} \in [0,1]\), and \(sig(C-B)\) is the attribute importance of the conditional features outside the selected feature subset B.

However, the information-entropy attribute importance is a quantitative measure of the change in the uncertainty domain: a change in any attribute may cause the uncertainty domain to change. The information-entropy method therefore remedies the problem that, under the algebraic view of rough sets, the importance of non-core attributes is zero. However, in some decision tables the information-entropy method assigns redundant attributes an importance greater than that of core attributes [16]; the attribute subset found by the bee colony may then contain redundant attributes, and the selection accuracy drops significantly.


To address the shortcomings of the fitness function constructed with the conditional-entropy attribute importance in literature [9], this paper replaces that importance with the discernibility matrix attribute importance defined in Sect. 3.1. The new fitness function is given in Formula (15).

$$\begin{aligned} \text {fitness'}=\omega _{1} \cdot \gamma _{B}(D)+\omega _{2} \cdot \frac{1}{1+e^{N s i g(c, C-B, D)}}+\omega _{3} \cdot \frac{|C|-|B|}{|C|} \end{aligned}$$
(15)

Among them, the weighting factors \(\omega _{1},\omega _{2},\omega _{3}\) satisfy \(\omega _{1}+\omega _{2}+\omega _{3}=1\) and \(\omega _{1},\omega _{2},\omega _{3} \in [0,1]\); \(Nsig(c,C-B,D)\) is the attribute importance, under the neighborhood rough set discernibility matrix, of the conditional features outside the selected feature subset B. Let \(C=\left\{ c_{1}, c_{2}, \ldots , c_{n}\right\} \) and let \(\{0,1\}^{n}\) denote the n-dimensional Boolean space; a candidate solution is encoded as \(x=\left( x_{1}, x_{2}, \ldots , x_{n}\right) \in \{0,1\}^{n}\), where \(x_{i}=1\) indicates that attribute \(c_{i}\) is selected.
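A minimal sketch of the improved fitness in Eq. (15): the 0/1 encoding follows the Boolean-space description above, while passing the precomputed dependency \(\gamma _{B}(D)\) and the per-attribute importance values as arguments, and summing the importance of the unselected attributes for \(Nsig(c, C-B, D)\), are assumptions of this sketch.

```python
import numpy as np

def fitness_improved(x, gamma_B, nsig, w=(0.6, 0.3, 0.1)):
    """fitness' = w1*gamma_B(D) + w2 / (1 + exp(Nsig(C-B))) + w3*(|C|-|B|)/|C|  (Eq. 15).

    x       : 0/1 vector over the conditional attributes (1 = attribute selected)
    gamma_B : dependency of D on the selected subset B (Eq. 7), computed elsewhere
    nsig    : per-attribute importance values from Definition 8
    """
    x = np.asarray(x)
    n_selected = int(x.sum())
    nsig_rest = float(sum(s for s, bit in zip(nsig, x) if bit == 0))  # importance outside B
    w1, w2, w3 = w
    return (w1 * gamma_B
            + w2 / (1.0 + np.exp(nsig_rest))
            + w3 * (len(x) - n_selected) / len(x))

# illustrative call: 6 attributes, 3 selected, made-up dependency and importance values
print(fitness_improved([1, 0, 1, 0, 1, 0], gamma_B=0.85, nsig=[0.4, 0.1, 1.2, 0.2, 0.6, 0.05]))
```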

3.3 Neighborhood Discernibility Matrix Importance and Artificial Bee Colony Feature Selection Algorithm

According to the fitness function constructed from the new neighborhood discernibility matrix attribute importance, we design an artificial bee colony feature search algorithm based on the discernibility matrix attribute importance (the ABCNRT algorithm). The main idea is: first, construct the discernibility matrix and compute the importance of each attribute in it; second, construct the fitness function; finally, iterate in the direction of increasing fitness value until the smallest feature subset is found and the algorithm stops; the ABCNRT algorithm follows these steps. In the experimental section, two groups of experiments are conducted with the algorithm's fitness function set to Eq. (13) and Eq. (15), respectively.
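The following is a highly simplified, hypothetical sketch of such a search loop (employed, onlooker and scout phases over 0/1 attribute vectors guided by a fitness function); it is not the authors' ABCNRT procedure, and the single-bit-flip neighborhood move and the toy fitness are illustrative assumptions.

```python
import numpy as np

def abc_subset_search(fitness, n_attrs, n_pop=20, max_iter=50, limit=10, seed=1):
    """Simplified binary ABC search over 0/1 attribute vectors guided by a fitness function."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(n_pop, n_attrs))            # random initial subsets
    fit = np.array([fitness(x) for x in pop], dtype=float)
    trials = np.zeros(n_pop, dtype=int)
    best, best_fit = pop[fit.argmax()].copy(), fit.max()
    for _ in range(max_iter):
        for phase in ("employed", "onlooker"):
            probs = fit / fit.sum()                            # roulette-wheel probabilities
            for i in range(n_pop):
                s = i if phase == "employed" else rng.choice(n_pop, p=probs)
                cand = pop[s].copy()
                cand[rng.integers(n_attrs)] ^= 1               # flip one attribute bit
                f_cand = fitness(cand)
                if f_cand > fit[s]:                            # greedy selection
                    pop[s], fit[s], trials[s] = cand, f_cand, 0
                else:
                    trials[s] += 1
        for i in np.where(trials >= limit)[0]:                 # scout phase: abandon source
            pop[i] = rng.integers(0, 2, size=n_attrs)
            fit[i], trials[i] = fitness(pop[i]), 0
        if fit.max() > best_fit:
            best, best_fit = pop[fit.argmax()].copy(), fit.max()
    return best, best_fit

# toy fitness: reward subsets that keep attributes 0 and 2 while staying small
toy = lambda x: 0.7 * float(x[0] and x[2]) + 0.3 * (len(x) - x.sum()) / len(x)
print(abc_subset_search(toy, n_attrs=8))
```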

Algorithm Complexity Analysis. The time complexity of the algorithm in this paper mainly depends on the colony size nPop, the variable dimension D, the number of conditional attributes |M|, and the number of universe objects |U|. The time complexity of computing the discernibility matrix in the fitness function is \(O\left( |M||U|^{2}\right) \). Suppose the samples are divided into k sets; then the time complexity of computing the importance in the fitness function is \(O\left( (|M|+N) \frac{|U|^{2}}{k}\right) \). Besides the neighborhood rough set computation, the outer loop iterates over the colony size and the variable dimension. Therefore, the overall time complexity of the ABCNRT algorithm is \(O\left( (|M|+N) \frac{|U|^{2}}{k} * k * D * n P o p\right) \). Since the dependency computation can be parallelized, the time complexity can be reduced to \(O\left( (|M|+N) \frac{|U|^{2}}{k} * D * n P o p\right) \).

Table 1. Data set

4 Experiment Analysis

In order to verify the superiority and reliability of the proposed algorithm, we select 8 data sets from the UCI machine learning repository (Table 1) for experiments, and compare the results with the neighborhood rough set NRS algorithm proposed in literature [3], the FSRSWOA feature selection algorithm based on rough sets and an improved whale optimization algorithm in literature [14], and the PSORSFS feature selection algorithm based on rough sets and particle swarm optimization in literature [15]. Each experiment is run 20 times. The software environment is the Windows 10 operating system, the programming language is Python 3.6, the hardware is an Intel i7 1040 at 3.6 GHz, and the memory is 8 GB.

4.1 Selection of \(\delta \)

In order to remove the influence of dimensional differences, the data are normalized first. The experiments are then carried out on the data sets, and the results are shown in Fig. 2.

Fig. 2. Classification accuracy varies with the threshold \(\delta \).

In the experiment, the classification accuracy of different data sets varies with the selected threshold \(\delta \). To select the best \(\delta \) value, four of the data sets are chosen and classified with the CART classifier, and the change of classification accuracy with the threshold \(\delta \) is analyzed using ten-fold cross-validation.
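As a sketch of how such an evaluation could be set up (assuming scikit-learn's DecisionTreeClassifier as the CART implementation, the built-in Wine data as a stand-in for the UCI file, and a hypothetical reduced attribute subset):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# normalize attributes to [0, 1] to remove the influence of dimensions,
# then estimate classification accuracy with CART and ten-fold cross-validation
data = load_wine()
X, y = data.data, data.target
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

selected = [0, 6, 9, 12]                       # hypothetical reduced attribute subset
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X[:, selected], y, cv=10)
print(scores.mean())
```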

It can be seen from Fig. 2 that when the threshold is within the range [0.1, 0.15], the classification accuracy of the data sets is better. Therefore, \(\delta =0.125\) is selected for the experiments in this paper.

4.2 Algorithm Comparison Results

This algorithm is compared with the NRS algorithm [3], the FSRSWOA algorithm [14], and the PSORSFS algorithm [15]. The parameters of the ABCNRT algorithm are: initial colony size N = 20 and maximum number of iterations M = 50. With the neighborhood radius \(\delta \) set to 0.125 for all algorithms, the four algorithms are used to perform attribute reduction on the 8 data sets.

Original Fitness Function. The original fitness function \(f i t=\alpha \cdot \gamma _{B}(D)+\beta \cdot \frac{|C|-|B|}{|C|}\) is selected, and the minimum attribute reduction problem is transformed into the problem of maximizing the fitness value, with \(\alpha =0.7\) and \(\beta =0.3\). First, the ABCNRT algorithm is used to reduce the 8 UCI data sets; Table 2 shows the reduction results of the ABCNRT algorithm.

Table 2. ABCNRT algorithm reduction result

According to Table 2, 5 of the 8 UCI data sets achieve a reduction rate of more than \(80\%\), 2 achieve about \(70\%\), and only the Heart data set has a lower reduction rate of about \(30\%\), with more reduction subsets. Comparing the first six data sets with the initial classification accuracy in Fig. 2, it can be seen that almost all data sets exceed the initial classification accuracy, which shows that the ABCNRT algorithm achieves good reduction results.

Tables 3 and 4 show the comparative experimental results of the PSORSFS, FSRSWOA, NRS, and ABCNRT algorithms in terms of attribute reduction and the best classification accuracy.

According to Table 3 below, an analysis of the number of reduced attributes of the four algorithms on the eight data sets shows the following. Compared with the NRS algorithm, the ABCNRT algorithm is better on all data sets except Heart. Compared with the PSORSFS algorithm, except for Wine and Heart, the PSORSFS algorithm retains more attributes than the ABCNRT algorithm. Compared with the FSRSWOA algorithm, on the Heart, Wine and Ionosphere data sets some runs of FSRSWOA produce fewer attributes than the ABCNRT algorithm, while on the other data sets FSRSWOA retains more attributes than the ABCNRT algorithm.

According to the following Table 4, an analysis of the classification accuracy of the four algorithms on the eight data sets shows that on the Heart data set the other algorithms achieve better accuracy than the ABCNRT algorithm, on the Zoo data set the PSORSFS and FSRSWOA algorithms are better than the ABCNRT algorithm, and on the Sonar data set the FSRSWOA algorithm is better than the ABCNRT algorithm; on the remaining data sets the classification accuracy of the ABCNRT algorithm is better than that of the other three algorithms.

Table 3. Attribute reduction
Table 4. Optimal classification accuracy

In summary, the results above show that the ABCNRT algorithm has advantages.

Improved Fitness Function. The fitness function \(\text {fitness'}=\omega _{1} \cdot \gamma _{B}(D)+\omega _{2} \cdot \frac{1}{1+e^{N s i g(c, C-B, D)}}+\omega _{3} \cdot \frac{|C|-|B|}{|C|}\) improved in this paper is adopted, in which the minimum attribute reduction problem is transformed into the problem of maximizing the fitness value, with \(\omega _{1}=0.6,\omega _{2} =0.3,\omega _{3}=0.1\). Table 5 shows the reduction results obtained with the improved fitness function.

Table 5. ABCNRT algorithm reduction result

It can be seen from Table 5 that the reduction rates of the four data sets Sonar, Ionosphere, Heart, and CMC are significantly improved. The reduction rate of 6 data sets now exceeds \(80\%\), and the reduction rate of the Heart data set rises to about \(50\%\). In terms of classification accuracy, Sonar, Ionosphere, Heart, and CMC also achieve clearly higher accuracy than with the original fitness function, along with the improvement in reduction rate.

This paper improves the fitness function of previous work by replacing the information-entropy attribute importance with a more accurate measure, thus providing a new way to construct the fitness function. The experiments also show that the new fitness function is better than the original one.

In the algorithm comparison, Table 6 shows that except on the Wine data set, where the FSRSWOA algorithm retains slightly fewer attributes than the ABCNRT algorithm, and on the Heart and Ionosphere data sets, where some results equal those of the other algorithms, the ABCNRT algorithm is clearly better than the compared algorithms on the remaining data sets. It can be seen from Table 7 that the optimal classification accuracy of the other algorithms on the Zoo and Sonar data sets is better than that of the ABCNRT algorithm, while on the remaining data sets the classification accuracy of the ABCNRT algorithm is better than that of the comparison algorithms.

Table 6. Attribute reduction
Table 7. Optimal classification accuracy

In summary, the comparison with the other three algorithms fully illustrates the advantages and reliability of the algorithm proposed in this paper.

4.3 Algorithm Performance Comparison

In order to further verify the advantages of the algorithm in this paper, the performance of the above algorithms is compared using the Friedman test and the Nemenyi post-hoc test.

First, according to the classification accuracies in Table 4 and Table 7, the 4 algorithms involved in the experiment are ranked on each of the 8 data sets. Tables 8 and 9 show the rank of each algorithm on each data set and the corresponding average rank.

Table 8. The original function experimental algorithm performance ranking results
Table 9. Improved function experiment algorithm performance ranking results

The Friedman test is then used to determine whether the above algorithms perform identically; the observations above suggest that they do not. Suppose n algorithms are compared on M data sets. The commonly used statistic is \(\tau _{F}=\frac{(M-1) \tau _{\chi ^{2}}}{M(n-1)-\tau _{\chi ^{2}}}\), where \(\tau _{F}\) follows the F distribution with \(n-1\) and \((n-1)(M-1)\) degrees of freedom.

If the algorithms do not perform identically, the Nemenyi post-hoc test is used to compute the critical difference of the average ranks, \(C D=q_{\alpha } \sqrt{\frac{n(n+1)}{6 M}}\), where n is the number of algorithms and M is the number of data sets. For \(n=4\) and \(\alpha =0.1\), table lookup gives \(q_\alpha =2.291\), so the critical value is \(CD=1.4789\).
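For clarity, substituting \(n=4\), \(M=8\) and \(q_{0.1}=2.291\) gives:

$$\begin{aligned} C D=2.291 \sqrt{\frac{4 \times 5}{6 \times 8}} \approx 2.291 \times 0.6455 \approx 1.4789 \end{aligned}$$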

Fig. 3. Test result graphs.

According to the test results (Fig. 3), in the test with the original fitness function the FSRSWOA algorithm performs slightly better than the ABCNRT algorithm and the two are similar; in the test with the improved fitness function, however, it can be clearly seen that the ABCNRT algorithm is better than the other three algorithms. This fully shows that the attribute reduction algorithm in this paper has a significant performance advantage over the other algorithms.

In summary, the above experiments show that the ABCNRT algorithm proposed in this paper has clear advantages in selecting the optimal attribute reduction and obtaining the optimal classification accuracy, which demonstrates the effectiveness of the algorithm.

5 Concluding Remarks

This paper proposes an attribute importance measure based on the discernibility matrix, introduces this importance as heuristic information into the bee colony algorithm, constructs a new fitness function, and designs an attribute selection method based on the artificial bee colony algorithm and the discernibility matrix. This method avoids redundant attributes in the searched attribute subset, reduces the number of iterations, and lowers the time complexity of reduction. Experimental results on UCI data sets show that the method improves the reduction rate, and the tests show that the attribute selection method in this paper achieves better performance and better classification accuracy. However, the algorithm also has shortcomings: first, it cannot calculate the importance of attributes that do not appear in the neighborhood discernibility matrix; second, although the number of colony iterations is reduced, computing the importance from the neighborhood discernibility matrix adds computational cost. Further reducing the computational workload while finding the optimal attribute subset is the direction of future research.