1 Introduction

Rough set theory, introduced by Pawlak [15, 16], is used to analyze information affected by inaccuracy, inconsistency, and incompleteness. Its main idea is to use equivalence relations for classification and to approximate uncertain knowledge with lower and upper approximations. However, the classical rough set model has several shortcomings, so many related extensions have emerged, such as the neighborhood rough set [3, 37], the multi-granulation rough set [4, 24], the soft rough set [2, 20], and the fuzzy rough set [22, 35, 36].

Among these extensions, the neighborhood rough set has proved highly effective for attribute reduction. Hu [8] proposed the neighborhood rough set based on measuring the distance between samples, replacing the equivalence relation between samples with the determination of sample neighborhoods; this allows the neighborhood rough set to process numerical data. To generate high-quality attribute sets, a forward attribute selection strategy was used for attribute reduction by Qing [19]. However, the computation involved in the neighborhood rough set is highly complex, so much work has focused on speeding up the process of obtaining reduced attribute sets. Liu [38] proposed a fast hash attribute reduction algorithm, FHARA, based on hash bucket partitioning. Building on Liu's research, Wang [30] proposed an improved fast reduction algorithm, EasiFFRA, based on symmetry and decision filtering. Xia [18] proposed a pre-sorting method, which provides a large amount of local and global sample information in advance through sorting. Srirekha [21] introduced the concept of object ranking and implemented four attribute reduction methods based on object sorting by defining dual operators. Zou [41] proposed a rough set attribute reduction algorithm based on conditional entropy for a fatigue decision system of aluminum welded joints. Chu [5] introduced the Best-Worst method, constructed a neighborhood rough set attribute reduction model, and took attribute correlation into consideration. In the above work, the geometric distribution information of the samples under the attribute set can additionally be obtained during the computation, and this prior knowledge can be used to reduce unnecessary operations.

In practice, it is also necessary to consider dynamic changes of the dataset and the need for multi-dimensional measurements. The multi-granularity mechanism is a good way to take the impact of different situations into account: by fusing information from different situations, it is possible to further accelerate the algorithm or to obtain an attribute set with better properties. Liu [13] introduced multi-granularity mechanisms and achieved attribute reduction by fusing results under multi-granularity constraints. Li [42] also used multi-granularity mechanisms to adjust the degree of mutual influence between two different neighborhood radii. The GBNRS algorithm proposed by Xia [31] adaptively generates different neighborhoods for each sample, which is more flexible than traditional methods. Dai [40] deleted objects in the dataset to change the granularity. Su [23] proposed an incremental update mechanism for the positive region and the right neighborhood that is suitable for dynamic datasets. Yang [33] used a matrix-based mechanism to solve the multi-granularity problem. Li [12] accelerated the reduction process by constructing a multi-granularity reduction result with multiple neighborhood radii. Tallon [25] introduced a positive approximation mechanism and proposed an acceleration strategy for neighborhood-based multi-granularity attribute reduction. Zhao [39] noticed that the multi-granularity model can be used with continuous parameters and tried to accelerate multi-granularity attribute reduction. Jiang [9] designed an accelerator for varying neighborhood radii, which realizes high-speed computing under a multi-granularity mechanism. Yang [32] fused three-way decision, granular computing, and the multi-granularity approach, proposing multilevel neighborhood granular structures with very good performance. In Liu et al. [14], three-way decision was introduced into multi-granularity attribute reduction, and a data-aware multi-granularity structure is automatically induced from a self-contained distance space to define a novel feature evaluation criterion. Although the multi-granularity mechanism can improve the quality of the reduced attribute set, it remains to be shown whether it can reduce the redundant components in the results.

Most methods have paid relatively little attention to generating a relative attribute reduct. To address this issue, many studies have designed algorithms to obtain high-quality attribute sets directly. Kang [11] proposed attribute reduction for inconsistent gray decision systems based on a variable precision gray multi-granulation rough set. Guo [7] proposed a double fuzzy consistency measure to simultaneously mine the valuable information in the upper and lower approximations from the absolute quantitative perspective and in the boundary from the relative quantitative perspective. Abdolrazzagh-Nezhad [1] proposed a continuous optimization algorithm using a cultural algorithm with a dual inheritance system to generate new individuals, planning a novel heuristic to discretize the population and belief spaces. Peng [17] noticed that the robust variable granular-ball model is effective in label-noise environments and used adaptive granular-balls of different sizes to perform efficient attribute reduction. In Ju et al. [10], a nearest mutual neighbor-based personalized information granularity is introduced into attribute reduction, and attribute reduction is transformed into an optimization problem. Dynamic attribute reduction was proposed in Yang et al. [34] to cope with changes; it incrementally updates attributes to generate attribute sets. Gao [6] removed irrelevant examples to simultaneously exclude redundant attributes. However, most methods aim to obtain high-quality solutions directly, which is not always possible in reality. We instead aim to obtain better solutions through an iterative, randomized process; this approach is more stable and also transforms attribute reduction into an optimization problem. Self-information describes the uncertainty of a signal very well, which is consistent with the idea of fuzzy approximation, so some works [26, 28] combine it with attribute selection and achieve good results. In addition, other works [27, 29] have shown that fuzzy sets can describe the fuzzy relationships and uncertainties between attributes, so the associations between attributes can be considered more comprehensively in attribute reduction.

Fig. 1 The diagram of the algorithms

In this paper, we synthesize the interaction between multiple hash buckets to further reduce the positive region search range of a single sample. Besides, we found that all samples in the same hash bucket share the same search range, which suggests that the positive region search can be performed on all samples in a bucket at the same time; this is a novel multi-granularity mechanism. Based on these two mechanisms, we propose a multi-hash bucket and multi-granularity positive region search algorithm (MMPRSA) to accelerate the positive region search and apply it within a random walk. Finally, an efficient attribute reduction algorithm (WalkNAR) is obtained. WalkNAR performs multiple iterations on the existing attribute set, trying to remove an individual attribute or to replace two attributes with a new one, so as to obtain a more compact result. The diagram of the algorithms we designed is shown in Fig. 1. The main contributions of the paper are the following:

  1.

    A method based on multiple hash buckets is proposed; it reduces the neighborhood search range of a single sample by intersecting the search ranges produced by different independent hash bucket divisions, thereby avoiding a large number of distance computations between samples.

  2.

    An efficient method based on multiple granularities is applied, meaning that some samples can be omitted when processing others. This method exploits the fact that samples with the same hash bucket index list generate the same search space, transforming the positive region search for a single sample into a positive region search for all samples in the bucket.

  3.

    A forward search strategy, which greedily adds the attribute that maximizes the positive region, is used together with the multi-hash bucket and multi-granularity mechanisms to obtain a relative reduction attribute set.

  4.

    Finally, a local search method that explores local solutions through random walks with different strategies is conducted to obtain higher-quality attribute sets. The random walk strategies involve removing a single attribute or replacing two attributes with a new one.

The rest of the paper is organized as follows. In Sect. 2, basic concepts of the neighborhood rough set are introduced. Section 3 presents the multi-hash bucket and multi-granularity positive region search algorithm for attribute reduction. The proposed neighborhood rough set-based attribute reduction approach using random walk is introduced in Sect. 4. Experimental results on UCI datasets are presented in Sect. 5. Conclusions are drawn in Sect. 6.

2 Preliminaries

We first recall some concepts and results that will be used throughout the paper. The concepts mentioned can all be found within the reference [8].

A decision table can be expressed as \(S=(U,C,D,\mathcal{V},f)\), where \(U=\{x_1,x_2,\ldots ,x_{|U |}\}\) is a finite nonempty set of objects, called the universe, \(C=\{a_1,a_2,\ldots ,a_{|C |}\}\) is a finite nonempty set of conditional attributes, D is the set of decision attributes, and \(f:U\times (C\cup D)\rightarrow \mathcal{V}\) is an information function, where \(\mathcal{V}\) is the domain of the attributes in \(C\cup D\). Moreover, for every \(a\in C\cup D\), we have \(\mathcal{V}_a=\{f(x,a) \mid x\in U\}\). An example of a simple decision table is given in Table 1, where \(U=\{x_1,x_2,\ldots ,x_{11}\}\), \(C=\{a_1,a_2\}\), and \(D=\{d\}\).

Definition 1

As defined in Hu et al. [8], given an m-dimensional real space \(\Omega \), a neighborhood radius \(\delta \), \(\forall x_i \in \Omega \) and \(B \subseteq C\), the neighborhood \(\delta _B(x_i)\) of \(x_i\) in the feature space B is defined as

$$\delta _B(x_i)=\{x_j \mid x_j\in U,\Delta ^{B}(x_i,x_j)\leqslant \delta \}$$

where \(\Delta \) is a distance function \(\Delta : \Omega \times \Omega \rightarrow \mathbb {R}\) satisfying:

  1.

    Non-negativity: \(\Delta (x_1,x_2)\geqslant 0\), and \(\Delta (x_1,x_2)=0\) if and only if \(x_1=x_2\);

  2.

    Symmetry: \(\Delta (x_1,x_2)=\Delta (x_2,x_1)\);

  3.

    Triangle inequality: \(\Delta (x_1,x_3)\leqslant \Delta (x_1,x_2)+\Delta (x_2,x_3)\).

\(\Delta \) is usually expressed by the \(\text {L}_2\text {-norm}\):

$$ \Delta _{B}(x_{i},x_{j}) =\Bigl (\sum \limits _{a_k \in B} \mid f(x_i,a_k)-f(x_j,a_k) \mid ^2\Bigr )^{\frac{1}{2}} $$
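To make Definition 1 concrete, the following minimal Python sketch (not the authors' implementation) computes the \(\delta \)-neighborhood of one sample under an attribute subset B with the \(\text {L}_2\text {-norm}\); the data matrix X, the column index list B, and the function name are illustrative assumptions.

```python
import numpy as np

def neighborhood(X, i, B, delta):
    """delta-neighborhood of sample i under attribute subset B (Definition 1).

    X is a hypothetical (n_samples, n_attributes) array of normalized data,
    B is a list of attribute (column) indices, and the L2-norm plays the
    role of the distance Delta_B.
    """
    dist = np.sqrt(((X[:, B] - X[i, B]) ** 2).sum(axis=1))  # Delta_B(x_i, x_j) for all j
    return np.where(dist <= delta)[0]                       # indices j with Delta_B(x_i, x_j) <= delta
```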

Example 1

Considering the decision system in Table 1 and assuming the \(\text {L}_2\text {-norm}\) is used for the distance function \(\Delta (x_i,x_j)\), the distances among the samples are easily computed:

$$\begin{gathered} \Delta ({x}_{1},{x}_{2}) ={\textbf {0.1077}},\ \Delta (x_1,x_3)={\textbf {0.5099}},\ \Delta (x_1,x_4)={\textbf {0.3000}} \\ \Delta ({x}_{2},{x}_{3}) ={\textbf {0.6708}},\ \Delta (x_2,x_4)={\textbf {0.5657}},\ \Delta (x_3,x_4)={\textbf {0.2236}} \end{gathered}$$

Assuming \(\delta = 0.3\), the neighborhood sets of the samples are:

$$\begin{gathered} \delta ({x}_{1})=\{ x_1,x_2,x_4 \},\quad \delta ({x}_{2}) =\{ x_1,x_2 \},\\ \delta ({x}_{3})=\{ x_3,x_4 \},\quad \delta ({x}_{4}) =\{ x_1,x_3,x_4 \} \end{gathered}$$
Table 1 Demo data

Definition 2

As defined in Hu et al. [8], given a universe U, a neighborhood relation N over U, and a subset X of U, the lower and upper approximations of X in \(\left\langle U,N \right\rangle \) are as follows:

$$ \underline{N}X=\{{x}_i \mid \delta ({x}_i)\subseteq X,{x}_i\in U\} $$
$$ \overline{N}X=\{{x}_i \mid \delta ({x}_i)\cap X\ne \emptyset ,{x}_i\in {U}\} $$

Furthermore, let \(X_1,X_2,\ldots ,X_s\) be the partition of U induced by D; then the lower and upper approximations of D with respect to the conditional attribute subset B are:

$$ \underline{N_B}D=\bigcup \limits _{i=1}^s\underline{N_B}X_i $$
$$ \overline{N_B}D=\bigcup \limits _{i=1}^s\overline{N_B}X_i $$

\(\underline{N_B}D\) is also known as the positive region \(\textit{POS}_B\left( D\right) .\)
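As a brute-force illustration of Definition 2 (a sketch under the same assumptions as the previous snippet, not the authors' code), the positive region \(POS_B(D)\) can be computed by collecting every sample whose \(\delta \)-neighborhood lies entirely inside one decision class; X, y, B, and delta are hypothetical names for the data matrix, decision values, attribute indices, and radius.

```python
import numpy as np

def positive_region(X, y, B, delta):
    """POS_B(D) by brute force: the union of the lower approximations of the
    decision classes (Definition 2)."""
    pos = set()
    for i in range(len(X)):
        dist = np.sqrt(((X[:, B] - X[i, B]) ** 2).sum(axis=1))  # Delta_B(x_i, .)
        nb = np.where(dist <= delta)[0]                         # delta_B(x_i)
        if np.all(y[nb] == y[i]):                               # neighborhood inside one decision class
            pos.add(i)
    return pos
```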

Example 2

Given a subset \(X=\{x_1,x_2\}\) of U and the conditional attribute subset \(B = C\), the lower and upper approximations of X can be given as follows:

$$ \underline{N_B}X=\{ x_2\} $$
$$ \overline{N_B}X=\{ x_1,x_2,x_4\} $$

We can easily see from the example that the following relation holds:

$$ \underline{N_B}X \subseteq X \subseteq \overline{N_B}X $$

\(\underline{N_B}X\) contains the samples whose neighborhoods are included in X, and \(\overline{N_B}X\) contains the samples whose neighborhoods intersect X.

3 MMPRSA

In this section, we introduce the two methods used in our positive region search algorithm to avoid redundant operations.

3.1 Multi-hash bucket positive region search algorithm

Based on the spherical shape of neighborhoods in the neighborhood rough set, Liu et al. [38] proposed an attribute reduction algorithm, FHARA, based on hash bucket division, which generates hash buckets according to the distance between each sample and a randomly selected sample center, so as to limit the neighborhood range of each sample to adjacent buckets and reduce the search range of the positive region.

Fig. 2 Hash bucket division

Definition 3

As shown in Fig. 2, given the decision table \(DT=\langle U,C\cup D,\mathcal{V},f\rangle \) and a neighborhood radius \(\delta \), a point X called the sample center is randomly selected in the sample space, and the method maps the samples to mutually exclusive hash buckets \(B_0,B_1,\cdots ,B_c\) according to \(\delta \) and the distance from each sample to X. The possible neighborhood range of a sample in bucket \(B_q\) can then be narrowed to \(B_q\) and the adjacent buckets \(B_{q-1}\), \(B_{q+1}\).

Since the samples are divided according to their distance from the sample center, we have:

$$\begin{aligned} \left\{ \begin{array}{lr} B_{q}\!=\!B(x_i)\!=\!\{x_j\mid \lceil \Delta \left( x_j,X\right) /\delta \rceil \!=\!\lceil \Delta (x_i,X)/\delta \rceil \}&{} \\ q=\lceil \Delta (x_i,X)/\delta \rceil \\ \end{array} \right. \end{aligned}$$
(1)

It is easy to draw the following conclusions:

$$\begin{aligned} \forall i,j\in [0,c],\ i\ne j,\ B_i\cap B_j=\emptyset \end{aligned}$$
(2)
$$\begin{aligned} \bigcup _{i=0}^{c}B_i=U \end{aligned}$$
(3)
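A minimal sketch of the bucket assignment in Eq. (1), consistent with Eqs. (2)-(3): every sample receives the index \(q=\lceil \Delta (x_i,X)/\delta \rceil \) of the bucket it falls into. The names hash_buckets, X, center, and delta are illustrative assumptions, not the published implementation.

```python
import numpy as np

def hash_buckets(X, center, delta):
    """Assign every sample to the bucket with index ceil(Delta(x_i, X)/delta) (Eq. (1)).

    `center` is a randomly chosen sample center; the resulting buckets are
    mutually exclusive and their union is the whole universe (Eqs. (2)-(3)).
    """
    dist = np.sqrt(((X - center) ** 2).sum(axis=1))   # Delta(x_i, X) for every sample
    q = np.ceil(dist / delta).astype(int)             # bucket index of each sample
    buckets = {}
    for i, qi in enumerate(q):
        buckets.setdefault(int(qi), []).append(i)     # group sample indices by bucket
    return q, buckets
```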

Theorem 1

As delineated in reference [38], given a decision table \(DT=\langle U,C\cup D,\mathcal{V},f\rangle \), a neighborhood radius \(\delta \), and buckets \(B_0, \dots , B_k\), then \(\forall {x}_i \in B_q\ (q = 1,2,\dots ,k-1)\), the neighborhood of \(x_i\) is entirely contained in \(B_q \cup B_{q-1} \cup B_{q+1}\).

Proof

According to Eq. (1), \(x_i\in B_q\) implies \((q-1)\delta <\Delta (x_i,X)\leqslant q\delta \). Hence, \(\forall x_j \notin B_q \cup B_{q-1} \cup B_{q+1}\), we have \(\mid \Delta (x_i,X)-\Delta (x_j,X)\mid > \delta \). By the triangle inequality, \(\Delta (x_i,x_j)\geqslant \mid \Delta (x_i,X)-\Delta (x_j,X)\mid \), so \(\Delta (x_i,x_j)> \delta \) and \(x_j\notin \delta (x_i)\). \(\square \)

Since there is no intersection between adjacent buckets, there are no redundant samples when calculating the positive region. Assuming that the total of \(|{U}|\) samples is evenly divided into h buckets, the number of samples to be traversed in a single search is reduced from \(|{U}|\) to \(3|{U}|/h\). Let the number of selected conditional attributes be m; then the computational cost of the positive region over the entire sample set is O(\(m|{U}|^{2}/h\)), so this method can effectively improve the efficiency of calculating the positive region.

Liu argued that the number of buckets h can tend to \(|{U}|\). However, because the data in the sample space are not uniformly distributed, the degree of aggregation is inconsistent and the neighborhood radius \(\delta \) restricts the number of buckets that can be generated, so the computational cost of the positive region cannot approach O(\(m|{U}|\)) in actual operation.

As shown in Eq. (1), using hash buckets to divide the sample space essentially maps the samples into a one-dimensional space, and we only need to search for samples in adjacent buckets in that one-dimensional space; this naturally sorts the samples by distance, so that adjacency in the one-dimensional space directly corresponds to the neighborhood range of a sample. However, mapping a large number of samples into one dimension causes the normalized sample points to become too dense; although the method still accelerates the search, the number of samples in each bucket surges, resulting in poor algorithm efficiency.

The selection of the sample center has no effect on the consistency or correctness of the algorithm. Let the neighborhood search area of \(x_i\) be \(Search(x_i)\). If the bucket corresponding to \(x_i\) is \(B_q\), then the neighborhood search area of \(x_i\) is:

$$\begin{aligned} Search(x_i)=B_q\cup B_{q-1}\cup B_{q+1} \end{aligned}$$
(4)

A simple improvement is to choose multiple sample centers and take the intersection of the search ranges formed by the resulting divisions. This does not depend on the number of attributes of the samples, so it avoids the distance calculations caused by having too many attributes.

Fig. 3 Single-bucket-based search area reduction

Fig. 4 Multi-bucket-based search area reduction

Example 3

Consider Figs. 3 and 4 and Table 2. In Fig. 3 the area within the search range of the point is shaded; let the set of sample centers be \(X=\{X_1\}\), then \(Search(x_1)=\{x_1,x_2,x_3\}\). If two sample centers are used, let \(X=\{X_1,X_2\}\); then \(Search(x_1) = {Search(x_1,X_1)}\cap {Search(x_1,X_2)}= \{x_1,x_2,x_3\}\cap {\{x_1,x_2\}}=\{x_1,x_2\}\), where \(Search(x_1,X_k)\) denotes the search range of \(x_1\) under the division induced by \(X_k\).

Table 2 Demo data

Since different sample centers have no effect on the algorithm results, multiple sample centers can be selected to form different divisions of the sample space. Different neighborhood search ranges are then generated for the same sample, and the search range can be further narrowed by taking their intersection, thereby reducing the amount of computation.

Theorem 2

Assuming that b sample centers \(X=\{X_1,\ldots ,X_b\}\) are selected, and that for \(X_k\) the bucket containing \(x_i\) is denoted \(B_{q(i,k)k}\), Eq. (1) and Eq. (4) can be rewritten as follows:

$$\begin{aligned} \left\{ \begin{array}{l} B_{q(i,k)k}=\{x_j\mid \lceil \Delta \bigl (x_j,X_k\bigr )/\delta \rceil =\lceil \Delta (x_i,X_k)/\delta \rceil \} \\ q(i,k)=\lceil \Delta \bigl (x_i,X_k\bigr )/\delta \rceil \\ \end{array} \right. \end{aligned}$$
(5)
$$\begin{aligned} \forall x_i\in U, Search(x_i)=\bigcap \limits _{j=1}^b\bigcup \limits _{r=q(i,j)-1}^{q(i,j)+1}B_{rj} \end{aligned}$$
(6)

Compared with a single mapping to one-dimensional space, this method copes effectively with sample aggregation in normalized data, and it further reduces the range that each sample needs to search compared with single-bucket-based search area reduction. In most cases, samples of the same class gather in the area adjacent to the sample currently being processed, and samples of different classes are unlikely to appear in the neighborhood, so this method also reduces the distance calculations caused by samples of different classes.

Proof

Take two sample centers \(X_1,X_2\) and let the sample under consideration be \(x_i\). For the sample center \(X_1\), the bucket containing \(x_i\) is \(B_{a1}\); for the sample center \(X_2\), the bucket containing \(x_i\) is \(B_{c2}\). Then we have:

$$\begin{aligned} \delta (x_i)\subseteq (B_{a1}\cup B_{(a-1)1}\cup B_{(a+1)1}) \end{aligned}$$
(7)
$$\begin{aligned} \delta (x_i)\subseteq (B_{c2}\cup B_{(c-1)2}\cup B_{(c+1)2}) \end{aligned}$$
(8)

then we can conclude that:

$$\begin{aligned} \delta (x_i)\subseteq (B_{a1}\cup B_{(a-1)1}\cup B_{(a+1)1})\cap (B_{c2}\cup B_{(c-1)2}\cup B_{(c+1)2}) \end{aligned}$$
(9)

If we choose b sample centers \(X=\{X_1,X_2,\dots ,X_b\}\), the proof for \(b=3,4,\dots \) proceeds in the same way, and finally we obtain the result of Eq. (6). \(\square \)

Assuming that the number of sample centers is b, the number of conditional attributes is m, the samples are evenly divided into h buckets for each sample center, and the different bucket divisions are uncorrelated, the search area for a single sample is \(3|{U}|/h^b\), and the positive region of the entire sample set is computed in \(O(m|{U}|^2/h^b)\) operations. Since \(h^b\) is more likely to approach \(|{U}|\), this method is theoretically more efficient.

In fact, a larger number of sample centers b is not always better: selecting too many sample centers further reduces the search area only marginally, while the computation required to initialize the hash buckets is b times that of a single sample center. The selection of b is tested further in the subsequent experiments.
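The following sketch illustrates Eqs. (5)-(6): one bucket division per sample center, and the search range of a sample as the intersection over all centers of its three adjacent buckets. It assumes the bucket layout produced by the hash_buckets sketch above; all names are illustrative rather than the paper's implementation.

```python
def search_range(idx_lists, buckets_per_center, i):
    """Search(x_i) as in Eq. (6): intersect, over the b centers, the union of
    the bucket of x_i and its two adjacent buckets.

    idx_lists[k][i] is the bucket index q(i, k) of sample i under center k;
    buckets_per_center[k] maps a bucket index to the list of samples it holds.
    """
    search = None
    for k, buckets in enumerate(buckets_per_center):
        q = int(idx_lists[k][i])
        adjacent = set()
        for r in (q - 1, q, q + 1):                   # B_{q-1}, B_q, B_{q+1} for center k
            adjacent.update(buckets.get(r, []))
        search = adjacent if search is None else search & adjacent  # intersection across centers
    return search
```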

3.2 Multi-granularity positive region search algorithm

Liu et al. [13] proposed that the concept of granularity has three parts: parameters, samples, and attributes. Most research tends to process samples at a single granularity, ignoring the use of multiple variable granularities within an algorithm, so such algorithms often have certain limitations.

In order to overcome the loss and neglect of information caused by a fixed granularity, we hope to obtain more useful information by dynamically changing the granularity of the current computation when calculating the positive region, so as to simplify the subsequent reduction process.

Fig. 5 Reduction mechanism based on multi-granularity

According to Eq. (6), in the hash-bucket-based reduction mechanism, the neighborhood search area of any sample is obtained from its adjacent buckets. For some samples the neighborhood search range is completely identical, so we can omit their individual processing and instead handle all samples sharing the same bucket index list together.

Theorem 3

Assuming that the current sample under consideration is \(x_i\), define the area \(E_i=\cap _{j=1}^b B_{q(i,j)j}\). Then \(\forall x_j \in E_i\), \(Search(x_j)=Search(x_i)\), and if it can be judged that all samples in the area \(Search(x_i)\) are of the same class, then \(\forall x_j \in E_i\), \(x_j \in POS_B(D)\).

Proof

We use \(D(x_i)\) to denote the class of sample \(x_i\). According to the definitions of \(Search(x_i)\) and \(POS_B(D)\), the following statements hold:

$$\begin{aligned} \forall x_i\in U,\delta (x_i)\subseteq Search(x_i) \end{aligned}$$
(10)
$$\begin{aligned} \forall x_j\in \delta (x_i),D(x_i)=D(x_j)\rightarrow x_i\in POS_B(D) \end{aligned}$$
(11)

Given the set of samples \(E_i=\cap _{j=1}^b B_{q(i,j)j}\), all samples in \(E_i\) have the same search area, because for any sample center \(X_k\) they fall into the same bucket of the corresponding division. Hence, if all the samples in \(Search(x_i)\) have the same class \(d'\), we can conclude that:

$$\begin{aligned} \forall x_j\in Search(x_i),\ D(x_j)=d'\ \Rightarrow \ \forall x_j\in E_i,\ x_j\in POS_B(D) \end{aligned}$$
(12)

\(\square \)

In actual use, samples can be stored by establishing an index list, that is, setting the index list of sample \(x_i\) to \(\{q(i,1),\ldots ,q(i,b)\}\); the set of samples with the same index list as \(x_i\) is then \(E_i=\cap _{j=1}^b B_{q(i,j)j}\).
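A small sketch of the index-list bookkeeping described above: samples are grouped by the tuple of their bucket indices, and each group is one set \(E_i\) that can be handled together under Theorem 3. The function and argument names are illustrative assumptions.

```python
def granules(idx_lists, n_samples):
    """Group samples whose index lists (q(i,1), ..., q(i,b)) coincide, i.e. the sets E_i."""
    groups = {}
    for i in range(n_samples):
        key = tuple(int(idx_lists[k][i]) for k in range(len(idx_lists)))  # index list of x_i
        groups.setdefault(key, []).append(i)                              # samples sharing one E_i
    return groups
```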

Table 3 Demo data

Example 4

As shown in Fig. 5 and Table 3, we have \(Search(x_1)=\{x_1,x_2,x_3,x_4\}\). All samples in \(Search(x_1)\) are of the same class, \(\forall x_i \in Search(x_1)\), \(D(x_i)=A\), so according to Eq. (12), \(\forall x_i \in Search(x_1)\), \(x_i \in POS(D)\), and the subsequent calculation of the neighborhood of \(x_2\) can be skipped. We also have \(Search(x_3)=\{x_1,x_2,x_3,x_5,x_6\}\). Since \(x_6\) belongs to a different class, regardless of whether \(x_6\) lies in the neighborhood of \(x_3\), it is impossible to assign all samples in \(Search(x_3)\) to the positive region. Note that deciding whether all samples in the bucket belong to the positive region and deciding whether the current sample belongs to the positive region do not affect each other, so if \(\Delta (x_3,x_6)>\delta \), we can still add \(x_3\) to the positive region.

Let p denote the probability that the samples in adjacent hash buckets share the same decision attribute. When an appropriate conditional attribute set is selected, the aggregation of samples of the same class becomes more pronounced and samples with different decision attributes are mapped far apart, so p rises dynamically. For the corresponding part of the samples, it suffices to traverse the decision attributes of all the data in the bucket without any distance calculation; in this case, the computational cost for this part is reduced to \(O(|{U}|)\). This is an encouraging conclusion: the huge amount of computation caused by conditional attribute distance calculations is completely removed from the positive region calculation for this part of the samples, which greatly improves the efficiency of the algorithm.

3.3 MMPRSA

Algorithm 1 Multi-hash bucket and multi-granularity positive region search algorithm (MMPRSA)

Combining the multi-hash bucket and multi-granularity mechanisms, the traditional positive region search algorithm is improved; the pseudocode is shown in Algorithm 1. Our algorithm consists of two parts. The first part maps every \(x_i\) to the corresponding \(Search(x_i)\); since b sample centers are selected, the time complexity of this part is O(\(b|U |\)). In the second part, every sample point is matched against the samples in \(Search(x_i)\) one by one, but the multi-hash bucket mechanism keeps the size of \(Search(x_i)\) at O(\(|U |/h^{b}\)), so the time complexity of this part is O(\(m|U |^2/h^{b}\)). Combining the two parts, the time complexity of the algorithm is O(\(m|U |^2/h^{b}\)). It is worth noting that with the multi-granularity mechanism there is no need for computation between samples of the same class; this does not change the order of the time complexity but further increases the speed.

The algorithm maintains a Noncompute array to record samples that do not require subsequent calculation, including samples \(x_j\) excluded from the positive region because their class differs from that of the current sample, and samples \(x_l\) that can be determined to lie in the positive region through the multi-granularity calculation.
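To show how the two mechanisms fit together, the sketch below composes the hash_buckets, search_range, and granules snippets from the previous subsections into a simplified positive region search. It follows the flow described for Algorithm 1 but is not the published pseudocode; the center sampling, the fallback loop, and all names are illustrative assumptions.

```python
import numpy as np

def mmprsa(X, y, B, delta, b=4, rng=np.random.default_rng(0)):
    """Simplified multi-hash-bucket, multi-granularity positive region search."""
    XB = X[:, B]
    centers = XB[rng.choice(len(XB), size=b, replace=False)]     # b random sample centers
    idx_lists, buckets_per_center = [], []
    for c in centers:                                            # one bucket division per center, Eq. (5)
        q, buckets = hash_buckets(XB, c, delta)
        idx_lists.append(q)
        buckets_per_center.append(buckets)

    pos = set()
    for members in granules(idx_lists, len(XB)).values():        # each granule E_i (Theorem 3)
        search = list(search_range(idx_lists, buckets_per_center, members[0]))
        if len(set(y[search])) == 1:                             # search area pure: whole granule enters POS
            pos.update(members)
            continue
        for i in members:                                        # otherwise fall back to per-sample checks
            dist = np.sqrt(((XB[search] - XB[i]) ** 2).sum(axis=1))
            nb = [search[j] for j in np.where(dist <= delta)[0]]
            if np.all(y[nb] == y[i]):                            # neighborhood pure: x_i enters POS
                pos.add(i)
    return pos
```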

4 WalkNAR

As one of the local search strategies, the random walk mechanism is simple and effective and can gradually converge to a locally optimal solution over the iterative process. For attribute reduction based on random walks, the basic strategy first initializes the reduction attribute set and then, guided by constraints, moves to reduction sets with better characteristics, gradually updating the current solution until the algorithm converges to a local optimum or reaches a preset number of iterations.

In many cases a better initial solution can reduce the number of steps required for convergence, but the results obtained by greedy initialization often still require a large number of iterations to converge. Therefore, obtaining a high-quality attribute set before iteration can make the algorithm converge faster, or even converge to a better attribute set.

Fig. 6 The variance and mean of the gain in the positive region

In order to further understand how to obtain high-quality initial solutions, it is necessary to study the mechanism of the greedy algorithm that generates them. The FHARA algorithm [38] adopts a positive region search based on a single hash bucket, as shown in Fig. 3. Based on a greedy idea, FHARA repeatedly introduces the attribute that maximizes the positive region of the samples under the current reduced attribute set, gradually generating a reduced attribute set. FHARA is shown in Algorithm 2. Assuming a decision table contains \(|{U}|\) records and k attributes are identified as the reduct from a total of m attributes, with each selected attribute typically adding \(|{U}|/k\) samples to the positive region, the computational cost required to determine the reduct is as in Eq. (13):

$$\begin{aligned} m|{U}|+(m-1)|{U}|\frac{k-1}{k}+\cdots +(m-k)|{U}|\frac{1}{k} <\frac{m|{U}|(1+2+\cdots +k)}{k}=\frac{m|{U}|(k+1)}{2} \end{aligned}$$
(13)

Example 5

To further illustrate Algorithm 2, we provide an example. Consider a dataset \(U=\{x_0,x_1,x_2,x_3,x_4,x_5\}\) with conditional attributes \( a_1 \), \( a_2 \), and \( a_3 \). We initialize the reduced attribute set to the empty set, \( Red = \emptyset \), and calculate the positive region for each attribute individually.

  1.

    First Iteration:

    • Calculate the positive region for each attribute:

      $$\begin{aligned} POS(Q,Red\cup \{a_1\},D,\delta )&= \{x_0, x_1\} \\ POS(Q,Red\cup \{a_2\},D,\delta )&= \emptyset \\ POS(Q,Red\cup \{a_3\},D,\delta )&= \emptyset \end{aligned}$$
    • Select attribute \( a_1 \) to be included in the reduced attribute set, \( Red = \{a_1\} \).

    • Update the universal set Q by removing \( POS(Q,\{a_1\},D,\delta ) \), resulting in \( Q = \{x_2, x_3, x_4, x_5\} \).

  2.

    Second Iteration:

    • Calculate the positive regions for combinations with \( a_1 \):

      $$\begin{aligned} POS(Q,Red\cup \{a_2\},D,\delta )&= \{x_2, x_3\} \\ POS(Q,Red\cup \{a_3\},D,\delta )&= \emptyset \end{aligned}$$
    • Choose attribute \( a_2 \) to be included in the reduced attribute set, updating Red to \(\{a_1, a_2\} \).

    • Update the universal set: \( Q = \{x_4, x_5\} \).

  3.

    Third Iteration:

    • Attempt to include \( a_3 \) into the reduced attribute set:

      $$\begin{aligned} POS(Q,Red\cup \{a_3\},D,\delta )&= \emptyset \end{aligned}$$
    • Since the addition of \( a_3 \) does not increase the positive region, the algorithm terminates.

The final reduced attribute set is \( Red = \{a_1, a_2\} \), and the total size of the positive region is 4, covering the samples \(\{x_0,x_1,x_2,x_3\}\).
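A compact sketch of the greedy forward strategy just illustrated in Example 5, given as an outline rather than the published FHARA/MMARA pseudocode: at each step the attribute whose inclusion yields the largest positive region is added, and the loop stops when no attribute enlarges it. For simplicity the positive region is recomputed on the whole universe instead of removing already-covered samples as FHARA does; pos_func stands for any positive region routine, e.g. the mmprsa sketch above.

```python
def forward_reduction(X, y, delta, pos_func):
    """Greedy forward attribute selection driven by positive region gain."""
    remaining = list(range(X.shape[1]))
    red, covered = [], 0
    while remaining:
        gains = {a: len(pos_func(X, y, red + [a], delta)) for a in remaining}
        best = max(gains, key=gains.get)          # attribute giving the largest positive region
        if gains[best] <= covered:                # no attribute enlarges POS: stop, as in Example 5
            break
        red.append(best)
        remaining.remove(best)
        covered = gains[best]
    return red
```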

Fig. 7 Schematic diagram of the random walk strategy

In this paper, four datasets are selected to track and record the mean and variance of the positive region gain obtained by traversing each candidate attribute when it is added during the run of the FHARA algorithm. As shown in Fig. 6, the greedy algorithm lacks consideration of the overall situation when solving the problem. For each greedy choice, if the selected optimal attribute has a clear advantage over the other attributes, using the greedy algorithm is completely feasible. Conversely, if the attribute selected at each step has no clear advantage over the other attributes, the algorithm ignores the interaction between attributes, leading to average or even poor results.

Analyzing the variance and mean of the positive region gain in Fig. 6, we find that the first-added and last-added attributes have the lowest positive region gain, and the gains corresponding to these attributes also have a small variance, suggesting that a decision based on the positive region gain alone is blind for these two types of attributes. Therefore, the results obtained by FHARA contain redundant attributes, and the attributes in them need to be filtered and verified again.

Algorithm 2 Fast hash attribute reduction algorithm (FHARA)

In order to improve the overall quality of the reduced set, we propose a neighborhood rough set-based attribute reduction approach using random walk (WalkNAR), shown in Algorithm 3. This approach replaces the positive region search in the FHARA algorithm with the MMPRSA algorithm proposed in this paper as the method for generating the initial reduction set (the MMARA algorithm), and then applies two random walk update strategies to test the reduction set repeatedly, as shown in Fig. 7:

  1.

    Strategy A: random deletion of a single attribute from the reduction set

  2.

    Strategy B: replacement of two attributes in the reduction set with a new attribute

Both strategies can gradually reduce the attributes in the reduced set. The condition required by strategy A is easier to satisfy, so in practical use it takes precedence over strategy B, which also helps prevent a decline in the representational ability of the reduced set.

The algorithm will perform T trials, with each trial involving a fine-tuning of the attribute set before executing the MMPRSA algorithm. Consequently, the time complexity of the algorithm is primarily determined by the MMPRSA. Assuming that the execution of the MMPRSA algorithm involves an average of k attributes, the time complexity of the algorithm is denoted as \(O(kT|U|^{2}/h^{b})\).
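The sketch below gives one possible reading of the random walk refinement (Strategies A and B); it is an illustrative outline under the acceptance rule "keep a move only if the positive region does not shrink", not the published Algorithm 3. The trial budget, the acceptance test, and all names are assumptions.

```python
import random

def walknar(X, y, delta, red, pos_func, trials=100, seed=0):
    """Random-walk refinement of an initial reduct `red`."""
    rnd = random.Random(seed)
    red = list(red)
    target = len(pos_func(X, y, red, delta))       # positive-region size to preserve

    def accept(candidate):
        return len(pos_func(X, y, candidate, delta)) >= target

    for _ in range(trials):
        if len(red) < 2:
            break
        # Strategy A (tried first): random deletion of a single attribute
        dropped = rnd.choice(red)
        cand_a = [a for a in red if a != dropped]
        if accept(cand_a):
            red = cand_a
            continue
        # Strategy B: replace two attributes with one attribute outside the reduct
        outside = sorted(set(range(X.shape[1])) - set(red))
        if outside:
            out = set(rnd.sample(red, 2))
            cand_b = [a for a in red if a not in out] + [rnd.choice(outside)]
            if accept(cand_b):
                red = cand_b
    return red
```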

Table 4 Datasets
Table 5 Algorithm correctness verification
Algorithm 3 WalkNAR

5 Experiments

The algorithm proposed in this paper uses the \(\text {L}_2\text {-norm}\) as the distance function. All tests are run in the same hardware environment: CPU: Intel(R) Core(TM) i5-11300H @ 2.30 GHz; RAM: 16.0 GB; operating system: Windows 10; Python version: 3.7. Considering that a single experiment involves a degree of chance, all experiments are run 10 times in the same environment and the average value is reported. We also compare our algorithm with EasiFFRA, proposed by Wang et al. [30], which has many good properties, such as exploiting the symmetry of neighborhood relations and a decision value filtering strategy; the average reduction time of EasiFFRA on 12 datasets is only 24.45\(\%\) of that of the comparison method FHARA.

In order to demonstrate the universality of the algorithm, 10 datasets with a large span of sample sizes and attribute numbers are selected for testing; the details of the datasets are shown in Table 4:

5.1 Algorithm correctness verification

In order to verify the correctness of the algorithm proposed in this paper, EasiFFRA and MMARA were used to perform attribute reduction on the same datasets. In the experiment, the parameter b of MMARA takes the values 2, 4, 6, and 8. The same results were obtained for the different values of b, and the attribute reduction results are shown in Table 5:

It can be seen that the results of EasiFFRA and MMARA are consistent, because both follow the same attribute reduction idea with a forward search strategy.

5.2 Comparative experiment of efficiency

5.2.1 Comparison of the running time

In order to verify the efficiency of MMARA, our experiment measured the reduction speed of the two algorithms during the comparison. To simulate realistic conditions, the number of sample centers was not changed during a single complete reduction; the number of sample centers in this experiment ranges from 2 to 8, and the results displayed correspond to the most suitable parameter selection.

Fig. 8 Results of running time on different datasets

Fig. 9 Runtime comparison

As shown in Fig. 8, in order to reflect the efficiency of our algorithm conveniently and intuitively, the reduction times of the two algorithms are displayed to show the average reduction ability. Unless otherwise mentioned in the subsequent experiments, the neighborhood radius is 0.1. The algorithm proposed in this paper achieves better reduction speed on various types of data, and because it provides the optional parameter b, it can be better adapted to data of various sizes by modifying this parameter.

Figure 9 shows the ratio of the time used by our algorithm to that of the comparison algorithm on each dataset. Compared with EasiFFRA, the average time used by the algorithm proposed in this paper on the selected datasets is 45.98\(\%\) of that of EasiFFRA. For the datasets with more than 1000 samples, the reduction time required by our algorithm is less than 37\(\%\) of that of the comparison algorithm, and the time spent by MMARA on the letter dataset is only 9.87\(\%\) of that of the comparison algorithm. This shows that the algorithm in this paper achieves better results on large datasets.

Table 6 Comparison of the times of distance calculations

5.2.2 Comparison of the number of distance calculations of MMARA

In this section, the number of sample centers ranges from 2 to 8 and is fixed during a single reduction. We separately counted the total number of sample distance calculations performed by our algorithm and by the EasiFFRA algorithm on different datasets for comparative analysis.

As shown in Table 6, the data presented in the table are rounded. The number of calculated distances is proportional to the running time of the algorithm; in the MMARA algorithm, unnecessary distance calculations are effectively avoided, so the calculation time of the positive region is greatly reduced. On datasets with more than 1000 samples, as shown in the second row of the table, the algorithm proposed in this paper reduces the number of sample distance calculations during attribute reduction to about \( \frac{1}{3} \) to \( \frac{1}{10} \) of that of the EasiFFRA algorithm.

5.2.3 Influence of sample size on MMPRSA

In order to study the influence of sample size on the MMARA algorithm proposed in this paper, we divided the sample center number b into four groups for testing. To further explore the effect of sample size, we replicated the datasets with fewer than 20,000 samples and selected the first 20,000 samples from the expanded datasets.

The replication and enrichment operation changes the distribution of the dataset: the same type of samples aggregate significantly in datasets with a small amount of original data, while the sample distribution properties are retained for datasets with a larger amount of original data. Therefore, this analysis is mainly based on the original sample sizes of the datasets. Our experiments were repeated 10 times and the average is taken as the final result; the test results on the impact of sample size on the algorithm are shown in Fig. 10.

Fig. 10 Running time of MMPRSA at different sample numbers

Fig. 11 Running time of MMPRSA at different attribute sizes

Fig. 12 Accuracy and number of reduction attributes comparison

For most datasets, a clear phenomenon emerges: as the sample size increases, the algorithm proposed in this paper prefers a larger value of the parameter b, and this phenomenon is more obvious when the original number of samples is small. For smaller datasets, when the sample size exceeds 14,000, \(b=8\) tends to be the optimal parameter; for medium-sized datasets such as german and abalone, \(b=6\) is still the optimal parameter when the sample size reaches 16,000; and for datasets that originally have a large sample size, the preferred parameter depends mainly on the nature of the dataset itself.

Table 7 Parameter recommendation

For datasets other than the shuttle dataset, when \(b=2\) the efficiency of the algorithm decreases severely as the sample size grows, while larger values of b give better operating efficiency. To explain the anomaly on the shuttle dataset, we carried out a specific exploration. In fact, the running speed of our algorithm on the shuttle dataset is significantly higher than on other datasets: the small number of attributes and the multiple sample categories lead to more samples of distinct classes in the neighborhood and an early end to the traversal process. This effect weakens as the number of attributes increases, which is consistent with the conclusion of Section 5.2.4.

In summary, when the sample size is large, the algorithm in this paper prefers larger values of b; in that case its running time grows approximately linearly with the sample size. Choosing a small b leads to poor and unstable performance, because when b is small the randomly initialized sample centers dominate: the efficiency of the algorithm is closely related to the position of a single sample center, and selecting more sample centers reduces the dependence on any single center, making the performance more stable.

5.2.4 Running time of MMPRSA at different attribute sizes

In order to verify the relationship between the algorithm parameters and the number of attributes currently participating in the calculation, we dynamically changed the number of attributes and compared the total calculation time under each parameter setting, so as to give a parameter recommendation scheme for running the MMPRSA algorithm during reduction.

As shown in Fig. 11, when the number of selected attributes is greater than 8, further growth in the number of attributes no longer significantly affects efficiency, while the positive region search time increases to different degrees when the number of attributes is less than 8. When fewer sample centers are selected, our algorithm performs better for smaller sample sizes and smaller numbers of attributes; when the sample size is larger and more attributes are selected, the algorithm tends to prefer more sample centers. In addition, when the number of sample centers ranges from 2 to 6, the positive region search time gradually decreases as the number of sample centers increases, while when the number of sample centers is 8, the reduction efficiency usually varies greatly depending on the nature of the dataset.

In order to comprehensively consider the influence of sample size and attribute number on the MMPRSA algorithm, we present a recommended parameter selection. As shown in Table 7, we preferentially select the parameters that perform well over multiple attribute set sizes, that is, \(b=4\) or \(b=6\). For smaller and larger datasets, taking the influence of the number of attributes on calculation time into account, our algorithm tends to favor small values of b for small sample sizes; as the sample size increases, the choice of parameter b also tends to favor larger values, so as to achieve optimal dynamic parameter selection during MMARA reduction.

5.3 Quality comparison experiment

In order to test the effect of the random walk process on the reduction set, we conducted a reduction set quality comparison experiment, using KNN, a support vector machine classifier, and random forest to perform ten-fold cross-validation on the reduction sets obtained by MMARA and by WalkNAR. The average classification accuracy of the three classifiers is taken as the final result. To allow further reduction, we selected datasets with at least 20 attributes for the experiments, and we mainly compare the results of MMARA and WalkNAR.

As shown in Fig. 12, where each data label is given as dataset-neighborhood radius, WalkNAR compresses the reduced attribute set to 37.49\(\%\) of the original attributes, an improvement of 4.96\(\%\) over MMARA, and the reduced set obtained by the proposed attribute reduction approach has the same or even better classification accuracy than the reduced set directly obtained by MMARA. In fact, the average classification accuracy of the attribute set obtained by WalkNAR over the three classifiers is 84.0476\(\%\), which is 0.6667\(\%\) higher than that of MMARA. The random walk processing of WalkNAR can delete or replace attributes with poor classification ability in the reduction set, achieving further reduction.

6 Conclusion

This paper proposed a positive region search algorithm, MMPRSA, based on multi-hash bucket and multi-granularity mechanisms, which addresses the large number of redundant calculations in the positive region search process by synthesizing information from multiple sample centers to locate the samples in a neighborhood. In addition, the algorithm uses the multi-granularity mechanism to transform the positive region judgement of a single sample into the positive region judgement of all samples in a hash bucket, thereby reducing the subsequent traversal operations.

To evaluate the characteristics of the algorithm, we proposed the improved reduction algorithm MMARA and the attribute reduction approach WalkNAR. Comparative experiments show that the proposed algorithms can greatly accelerate the process of attribute reduction and obtain a better reduction set. The algorithm also offers different parameter choices compared with the comparison algorithms, so it can adapt to datasets with different characteristics.