
1 Introduction

Rough set theory, introduced by Z. Pawlak [1] in 1982, is an efficient tool for processing imprecise, incomplete, and uncertain information [2,3,4,5]. It has been successfully applied to many practical problems, including machine learning [6, 7], pattern recognition [8, 9], data mining [10], and decision support systems [11].

Attribute reduction, the process of obtaining a minimal set of attributes that preserves the same classification ability as the entire attribute set, is one of the core concepts in rough set theory [12]. Maximum distribution reduction was proposed by Zhang et al. [14] in 2003 as a compromise between the capability of generalized decision preservation reduction and the complexity of distribution preservation reduction [13]; it guarantees that the decision values with maximum probability of the objects in an inconsistent decision table remain unchanged. Subsequently, Pei et al. proposed a theorem for judging maximum distribution reducts in 2005. Next, Li et al. [15] paid attention to the computational efficiency of the reduction definition and designed a new definition of maximum distribution reduction to speed up attribute reduction. Considering general reduction on inconsistent decision tables, Ge et al. [16] proposed another new definition of maximum distribution reduction.

Heuristic approaches are among the most important methods for attribute reduction. A heuristic approach is composed of two parts: the attribute reduction heuristic and the search strategy [17]. The attribute reduction heuristic is the fitness function of the approach. Existing heuristics are mainly based on three measures: dependency degree [18], entropy [19,20,21], and consistency [22, 23]. The search strategy is the control structure of the heuristic approach. Roughly speaking, search strategies fall into three kinds: the deletion method, the addition method, and the addition-deletion method [24].

Existing methods for judging maximum distribution reducts are only weakly associated with mainstream heuristics. As a result, the efficiency of heuristic maximum distribution reduction algorithms has been limited by the lack of the acceleration policies that mainstream heuristics enjoy. This paper focuses on quick reduction algorithms for maximum distribution reduction. First, we analyze the defects of the quick maximum distribution reduction algorithm (Q-MDRA) proposed in [15] and explore their root cause. Next, based on existing mainstream heuristic functions, we develop two heuristic maximum distribution reduction algorithms. Finally, we conduct experiments to evaluate the effectiveness and efficiency of the proposed algorithms.

The rest of this paper is organized as follows. In Sect. 2, we review some basic notions related to maximum distribution reduction and three classic heuristic functions. In Sect. 3, we show the defects of Q-MDRA with a calculation example of maximum distribution reduction. After exploring their root cause, we present two novel algorithms for maximum distribution reduction. In Sect. 4, we evaluate the correctness and efficiency of the proposed algorithms through a worked example and comparison experiments.

2 Preliminary

In this section, we review some basic notions related to maximum distribution reduction and three classic heuristic functions.

The research object of rough set theory is the information system. An information system IS can be expressed as a four-tuple \(<U,A,V,f>\), where U, the universe of discourse, is a non-empty finite set of objects; A is the set of attributes; \(V=\bigcup _{a \in A}V_{a}\) is the set of all attribute values; and \(f:U \times A \rightarrow V\) is an information function that maps each object in U to exactly one value in \(V_a\), i.e., for \(\forall x \in U, \forall a \in A\), we have \(f(x,a) \in V_a\). In classification problems specifically, the information table contains two kinds of attributes and is characterized by a decision table \(DT=(U,C \cup D,V,f)\) with \(C \cap D = \emptyset \), where an element of C is called a condition attribute and C is called the condition attribute set, while an element of D is called a decision attribute and D is called the decision attribute set.

For a condition attribute set \(B \subseteq C\), the indiscernibility relation and the discernibility relation of B are defined respectively by \(IND(B) = \{<x,y> \,\in U \times U|\forall a \in B, f(x,a)=f(y,a)\}\) and \(DIS(B)=\{<x,y> \,\in U \times U|\exists a \in B, f(x,a)\ne f(y,a)\}\). For an object \(x \in U\), the equivalence class of x, denoted by \([x]_B\), is defined by \([x]_B = \{ y \in U|<x,y> \in IND(B)\}\). The family of all equivalence classes of IND(B), i.e., the partition determined by B, is denoted by U/IND(B) or simply U/B. Obviously, IND(B) is reflexive, symmetric, and transitive, while DIS(C) is irreflexive, symmetric, but not transitive. Note also that \(DIS(C)\, \cup \,IND(C)=U \times U\) and \(DIS(C) \cap IND(C) = \emptyset \).
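To make these notions concrete, the following minimal Python sketch (our own illustration, not code from the paper) computes the partition U/B, assuming the table is stored as a dictionary f indexed by (object, attribute) pairs:

```python
from collections import defaultdict

def partition(U, B, f):
    """Compute U/B: the equivalence classes of IND(B)."""
    blocks = defaultdict(list)
    for x in U:
        # Objects with identical value tuples on B are B-indiscernible.
        blocks[tuple(f[x, a] for a in B)].append(x)
    return list(blocks.values())
```

With an empty B, every object maps to the same empty tuple, so partition(U, [], f) returns the single block U, matching \(U/\emptyset =\{U\}\).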

On the basis of the above notions, the concept of maximum distribution reduction was proposed by Zhang et al. [14] in 2003.

Definition 1

Let \(DT=(U,C \cup D,V,f)\) be a decision table, \(B \subseteq C\) is a maximum distribution reduct of C if and only if B satisfies

$$\begin{aligned} \begin{aligned}&\forall x \in U,\, \gamma _{B}(x)=\gamma _{C}(x);\\&\forall B' \subset B, \exists x \in U,\,\gamma _{B'}(x) \ne \gamma _{C}(x), \end{aligned} \end{aligned}$$

where \(\gamma _{C}(x)=\lbrace P_{i} : P_{i} \in U/D \wedge \arrowvert P_{i} \cap [x ]_{C}\arrowvert =max_{P_{j} \in U/D }( \arrowvert P_{j} \cap [x ]_{C} \arrowvert ) \rbrace \).

B is said to be a maximum distribution consistent attribute set if B satisfies only condition (1) above. There are two families of methods for maximum distribution reduction: discernibility matrix based methods and heuristic methods. Since discernibility matrix based methods are inefficient, heuristic methods are the more reasonable choice for processing larger scale data. A heuristic attribute reduction algorithm comprises two parts: the heuristic function and the control strategy. We take addition strategy based heuristic algorithms as the research object of this paper. For the heuristic functions, we take three classic ones, i.e., the dependency degree, the conditional entropy, and the consistency, as the alternatives for constructing the improved algorithms.
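Building on the partition sketch above, the following hedged sketch computes \(\gamma _B(x)\) of Definition 1 for every object; representing U/D as a list of object lists is our own assumption:

```python
def max_distribution(U, B, D_blocks, f):
    """gamma_B(x) of Definition 1: for every x, the set of decision
    classes with maximal overlap with [x]_B (classes given by index)."""
    gamma = {}
    for block in partition(U, B, f):
        counts = [len(set(block) & set(P)) for P in D_blocks]
        best = max(counts)
        gamma_block = frozenset(j for j, c in enumerate(counts) if c == best)
        for x in block:
            gamma[x] = gamma_block
    return gamma
```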

Definition 2

Given a decision table \(DT=(U,C \,\cup \,D,V,f)\) and \(B \subseteq C\), with \(U/B=\{X_{1},\,X_{2},\,\cdots ,\,X_{m}\}\) and \(U/D=\{Y_{1},\,Y_{2},\,\cdots ,\,Y_{n} \}\), three classic heuristic functions (the dependency degree, the consistency, and the conditional entropy) are defined by:

(1) \(\varGamma _B(D)=\frac{|POS_B(D)|}{|U|}\), where \(POS_B(D)=\bigcup \lbrace X_{i} \in U/B : \exists Y_{j} \in U/D,\, X_{i} \subseteq Y_{j} \rbrace \) is the positive region;

(2) \(\delta _B(D)=\frac{1}{|U|}\sum _{X_{i} \in U/B}\max \limits _{Y_{j} \in U/D}|X_{i} \cap Y_{j}|\);

(3) \(H(D|B)=-\sum _{i=1}^{m}P(X_{i})\sum _{j=1}^{n} P(Y_{j}|X_{i})\log P(Y_{j}|X_{i})\), where \(P(X_{i})=|X_{i}|/|U|\) and \(P(Y_{j}|X_{i})=|X_{i} \cap Y_{j}|/|X_{i}|\).
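As an illustration, the three heuristics can be computed directly from the partitions U/B and U/D. The sketch below is ours (the consistency follows the reconstruction in (2) above), with base-2 logarithms as used in the worked example of Sect. 4.1:

```python
import math

def dependency_degree(U_blocks, D_blocks):
    # Gamma_B(D): fraction of objects whose B-class lies inside a
    # single decision class (the positive region POS_B(D)).
    n = sum(len(X) for X in U_blocks)
    pos = sum(len(X) for X in U_blocks
              if any(set(X) <= set(Y) for Y in D_blocks))
    return pos / n

def consistency(U_blocks, D_blocks):
    # delta_B(D): fraction of objects covered by the majority decision
    # class of their own B-class.
    n = sum(len(X) for X in U_blocks)
    return sum(max(len(set(X) & set(Y)) for Y in D_blocks)
               for X in U_blocks) / n

def conditional_entropy(U_blocks, D_blocks):
    # H(D|B), computed from the partitions U/B and U/D (log base 2).
    n = sum(len(X) for X in U_blocks)
    h = 0.0
    for X in U_blocks:
        p_x = len(X) / n
        for Y in D_blocks:
            p_yx = len(set(X) & set(Y)) / len(X)
            if p_yx > 0.0:
                h -= p_x * p_yx * math.log2(p_yx)
    return h
```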

3 Novel Heuristic Maximum Distribution Reduction Algorithms

In this section, we first present two defects of Q-MDRA. After analyzing their cause, we construct two quick heuristic maximum distribution reduction algorithms based on classic heuristic functions.

We begin by reviewing the quick maximum distribution reduction algorithm (Q-MDRA) proposed by Li et al. Based upon Definition 1, Li et al. [15] proposed the following theorem for judging a maximum distribution reduct.

Theorem 1

Let \(DT=(U,C \cup D,V,f)\) be a decision table and \(B \subseteq C\). B is a maximum distribution reduct of C if and only if B satisfies

$$\begin{aligned} \begin{aligned} \forall x \in U,\, \gamma ^{Md}_{B}(D)=\gamma ^{Md}_{C}(D);\\ \forall B' \subset B,\,\gamma ^{Md}_{B'}(D) \ne \gamma ^{Md}_{C}(D), \end{aligned} \end{aligned}$$

where \(\gamma ^{Md}_{B}(D)=\sum _{X \in U/B}\frac{\max _{P_{i} \in U/D}\arrowvert X \cap P_{i} \arrowvert }{\arrowvert U \arrowvert }\).

This theorem corresponds to Theorem 6.11 of Ref. [15]. The condition \(\gamma ^{Md}_B(D)=\gamma _C^{Md}(D)\) keeps unchanged only the total size of the maximum decision classes, rather than the maximum decision classes of every object in the decision table. That is to say, B may not be a maximum distribution reduct of C under some special conditions. We present the details in Sect. 3.1. Based on the variant of the dependency degree heuristic function in Theorem 1, Algorithm 1 was constructed by way of the addition strategy. Note that in Algorithm 1 we denote the assignment operation by “\(:=\)” and use “\(=\)” to state that two items are equal.

(Algorithm 1: Q-MDRA; see Ref. [15] for the original pseudocode.)
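Since the original pseudocode is omitted here, the following sketch shows the addition loop of Q-MDRA as we read it from the text: within a pass, any attribute that raises \(\gamma ^{Md}\) over the current candidate set is added, and the loop stops when a full pass adds nothing. The helper `partition` is from the sketch in Sect. 2.

```python
def gamma_md(U_blocks, D_blocks):
    # gamma^Md_B(D) of Theorem 1; note it has the same form as the
    # consistency measure delta_B(D).
    n = sum(len(X) for X in U_blocks)
    return sum(max(len(set(X) & set(Y)) for Y in D_blocks)
               for X in U_blocks) / n

def q_mdra(U, C, D_blocks, f):
    """Addition loop of Q-MDRA as we read it: within a pass, add any
    attribute that raises gamma^Md; stop when a pass adds nothing."""
    red = []
    while True:
        T = list(red)
        for a in [a for a in C if a not in T]:
            before = gamma_md(partition(U, T, f), D_blocks)
            after = gamma_md(partition(U, T + [a], f), D_blocks)
            if after > before:
                T.append(a)
        if T == red:
            return red
        red = T
```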

3.1 The Defects of Q-MDRA

Here, by means of a calculation example, we show in detail that Q-MDRA may not perform as expected. Assume that a decision table is given as Table 1; our task is to obtain a maximum distribution reduct of Table 1.

Table 1. A decision table

For Table 1, we know that \(U=\lbrace x_{1},\,x_{2},\,\cdots ,\,x_{7}\rbrace \), \(C=\lbrace a_{1},\, a_{2},\,a_{3} \rbrace \), and \(D=\lbrace d \rbrace \); obviously we have \(U/C=\lbrace X_{1},\,X_{2},\,X_{3},\,X_{4} \rbrace =\lbrace \lbrace x_{1} \rbrace ,\,\lbrace x_{2} \rbrace ,\,\lbrace x_{3},\,x_{4} \rbrace ,\,\lbrace x_{5},\,x_{6},\, x_{7} \rbrace \rbrace \) and \(U/D=\lbrace P_{1},\,P_{2} \rbrace =\lbrace \lbrace x_{1},\,x_{3},\,x_{5}\rbrace ,\,\lbrace x_{2},\,x_{4},\,x_{6},\,x_{7} \rbrace \rbrace \). According to Definition 1, we know that \(\gamma _{C}(x_{1})=\{P_{1}\}\) and \(\gamma _{C}(x_{2})=\{P_{2}\}\); for \(x \in X_{3}\), \(\gamma _{C}(x)=\{P_{1},\,P_{2}\}\); for \(x \in X_{4}\), we have \(\gamma _{C}(x)=\{P_{2}\}\).

The process of Q-MDRA for obtaining maximum distribution reduct of Table 1 is shown as follows.

Step 1. \(red:=\emptyset \).

Step 2. \(T:=red\); \(\gamma ^{Md}_{T}(D)=|P_{2}|/|U|=4/7\); \(\gamma ^{Md}_{T\cup \{ a_{1}\}}(D)=(|P_{1}\cap X_{1}|+|\{x_{2},\,x_{3},\,\cdots ,\,x_{7}\} \cap P_{2}|)/|U|=5/7\); \(\gamma ^{Md}_{T\cup \{ a_{2}\}}(D)=4/7\); \(\gamma ^{Md}_{T\cup \{ a_{3}\}}(D)=4/7\); hence \(T:= T \cup \{a_1\}\). Because \(T\ne red\), we perform the assignment \(red:=T = \{a_{1}\}\).

Step 3. \(T:= red\); \(\gamma ^{Md}_{T}(D)=5/7\); \(\gamma ^{Md}_{T\cup \{ a_{2}\}}(D)=5/7\); \(\gamma ^{Md}_{T\cup \{ a_{3}\}}(D)=5/7\). Because T is equal to red after this pass, the program terminates.

Using Q-MDRA we obtain the attribute set \(\{a_{1}\}\). According to Theorem 1, \(\{a_{1}\}\) is a maximum distribution reduct of Table 1, because \(\{a_{1}\}\) satisfies \(\gamma ^{Md}_{\{a_{1}\}}(D)=\gamma ^{Md}_{C}(D)=5/7\) and \(\gamma ^{Md}_{\emptyset }(D)\ne 5/7\). But checking it against the original Definition 1, we find that \(\{a_{1}\}\) is not a maximum distribution reduct of Table 1, because \(\gamma _{\{a_{1}\}}(x_{3})=\{P_{2}\} \ne \gamma _{C}(x_{3})=\{P_{1},\,P_{2}\}\). Consequently, Theorem 1 is incorrect.
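The counterexample can be checked numerically with the `gamma_md` helper sketched after Theorem 1, using only the partitions stated above (the raw attribute values of Table 1 are not needed):

```python
# Partitions read off from Table 1, with objects named by index.
U_D  = [[1, 3, 5], [2, 4, 6, 7]]        # U/D = {P1, P2}
U_C  = [[1], [2], [3, 4], [5, 6, 7]]    # U/C
U_a1 = [[1], [2, 3, 4, 5, 6, 7]]        # U/{a1}

# Both values are 5/7, so Theorem 1 accepts {a1} ...
print(gamma_md(U_C, U_D), gamma_md(U_a1, U_D))
# ... yet the maximum decision classes of x3 change: {P1, P2} under C
# but only {P2} under {a1}, so {a1} is not a maximum distribution reduct.
```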

Here we analyze the root of the defect of Theorem 1. Given a decision table \(DT=(U,C \cup D,V,f)\) with \(U/C=\{X_{1},\,X_{2},\,\cdots ,\, X_{n}\}\) and \(U/D=\{P_{1},\,P_{2},\,\cdots ,\,P_{m}\}\), let \(mxcf(X_{i})=\max _{P_j \in U/D}(|P_j \cap X_i|)\); then \(\gamma _C^{Md}(D)=\sum _{X_i \in U/C}\frac{mxcf(X_i)}{|U|}\). Assume that \(x_1 \in X_1,\,x_2 \in X_2,\,\gamma _C(x_1) \ne \gamma _C(x_2),\, |\gamma _{C}(x_1)|>1,\, |\gamma _{C}(x_2)|>1,\,|\gamma _{C}(x_1) \cap \gamma _{C}(x_2)| \ge 1\), and \(B \subseteq C\) with \(U/B=\{X_1 \cup X_2,\,X_3,\,\cdots ,\,X_n\}\). Since \(X_1\) and \(X_2\) share a maximum decision class, it is obvious that \(mxcf(X_1)+mxcf(X_2)=mxcf(X_1 \cup X_2)\) and hence \(\gamma _C^{Md}(D)=\gamma _B^{Md}(D)\). But for \(x \in X_1\cup X_2\), \(\gamma _B(x)=\gamma _C(x_1)\cap \gamma _C(x_2)\), which cannot equal both \(\gamma _C(x_1)\) and \(\gamma _C(x_2)\). The measure \(\gamma _C^{Md}(D)\) used in Theorem 1 is thus not sensitive to changes in the maximum decision classes of objects that have two or more maximum decision classes.

On the other hand, an attribute set red output by Q-MDRA does not always satisfy \(\gamma ^{Md}_{red}(D)=\gamma ^{Md}_{C}(D)\). The reason is that \(\forall a \in C-red,\, \gamma ^{Md}_{red \cup \lbrace a \rbrace }(D)=\gamma ^{Md}_{red}(D)\) does not guarantee \(\gamma ^{Md}_{red}(D)=\gamma ^{Md}_{C}(D)\). That is to say, \(\forall a \in C-red,\, \gamma ^{Md}_{red \cup \lbrace a \rbrace }(D)=\gamma ^{Md}_{red}(D)\) does not conflict with \(\exists B \subseteq C-red,\, \gamma ^{Md}_{red \cup B}(D)>\gamma ^{Md}_{red}(D)\).

3.2 Novel Maximum Distribution Reduction Algorithms

To solve the problems identified in Q-MDRA, we first define the indiscernibility relation and the discernibility relation of maximum distribution with respect to a specific attribute set. Next, the maximum distribution reduct is redefined using the indiscernibility relation of maximum distribution. Finally, we construct heuristic maximum distribution reduction algorithms with classic heuristic functions.

Definition 3

Given a decision table \(DT=(U,C \,\cup \,D,V,f)\), the indiscernibility relation of maximum distribution of U with respect to \(B \subseteq C\) is defined as \(IND_{md}(B)=\{<x,\,y>|x,\,y\in U,\,\gamma _{B}(x)=\gamma _{B}(y)\}\), and the discernibility relation of maximum distribution of U with respect to B stands for \(DIS_{md}(B)=\{<x,\,y>|x,\,y\in U,\,\gamma _{B}(x) \ne \gamma _{B}(y)\}\).

Obviously, \(IND_{md}(C)\) is reflexive, symmetric and transitive; \(DIS_{md}(C)\) is irreflexive, symmetric, but not transitive. It is worth noting that \(IND_{md}(C) \cup DIS_{md}(C)=U \times U,\,IND_{md}(C) \cap DIS_{md}(C)= \emptyset \).
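Reusing the `max_distribution` sketch from Sect. 2, the partition \(U/IND_{md}(C)\) (denoted TGran in the theorems below) can be obtained by grouping objects with identical \(\gamma _C(x)\); a minimal sketch:

```python
def tgran(U, C, D_blocks, f):
    """U/IND_md(C): group objects whose gamma_C(x) coincide, reusing
    max_distribution from the sketch in Sect. 2."""
    gamma = max_distribution(U, C, D_blocks, f)
    blocks = defaultdict(list)
    for x in U:
        blocks[gamma[x]].append(x)   # frozensets are hashable keys
    return list(blocks.values())
```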

Theorem 2

Given \(DT=(U,C \cup D,V,f)\), B is a maximum distribution consistent attribute set of C if and only if B satisfies \(IND(C) \subseteq IND(B) \subseteq IND_{md}(C),\, DIS_{md}(C) \subseteq DIS(B) \subseteq DIS(C)\).

Proof

It is apparent that \(DIS(B) \subseteq DIS(C)\) and \(IND(C) \subseteq IND(B)\). Based on \(IND(B) \cap DIS(B)=\emptyset \), \(IND(B) \cup DIS(B)=U \times U\), \(IND_{md}(C) \cap DIS_{md}(C)=\emptyset \), and \(IND_{md}(C) \cup DIS_{md}(C)=U \times U\), we know that \(DIS_{md}(C) \subseteq DIS(B) \subseteq DIS(C)\) is equivalent to \(IND(C) \subseteq IND(B) \subseteq IND_{md}(C)\). Thus all that remains is to prove that \(DIS_{md}(C) \subseteq DIS(B)\) holds.

  • Sufficiency(\(\Rightarrow \)): Assume that B is a maximum distribution consistent attribute set but \(DIS_{md}(C) \nsubseteq DIS(B)\). \(DIS_{md}(C) \nsubseteq DIS(B)\) means \(\exists<x,\,y>\, \in \, DIS_{md}(C)\) with \(<x,\,y> \notin DIS(B)\). Then \(\gamma _{C}(x) \ne \gamma _{C}(y)\) while \(\gamma _{B}(x) = \gamma _{B}(y)\) (since x and y lie in the same B-equivalence class), which contradicts the consistency of B, i.e., \(\gamma _{B}(x)=\gamma _{C}(x)\) for all \(x \in U\). So if B is a maximum distribution consistent attribute set, then \(DIS_{md}(C) \subseteq DIS(B)\).

  • Necessity(\(\Leftarrow \)): Assume that B satisfies \(DIS_{md}(C) \subseteq DIS(B)\) but \(\exists x \in U,\, \gamma _{B}(x) \ne \gamma _{C}(x)\). Then \(\exists y \in [x]_{B}-[x]_{C}\) with \(\gamma _{C}(y) \ne \gamma _{C}(x)\) (if every \(y \in [x]_B\) shared \(\gamma _{C}(y)=\gamma _{C}(x)\), the majority classes of \([x]_B\) would coincide with \(\gamma _{C}(x)\)). That is to say, \(<x,\,y> \,\in \, DIS_{md}(C)\) and \(<x,\,y> \,\notin \,DIS(B)\), which contradicts \(DIS_{md}(C) \subseteq DIS(B)\). Consequently, if B satisfies \(DIS_{md}(C) \subseteq DIS(B)\), then \(\forall x \in U,\, \gamma _{B}(x) = \gamma _{C}(x)\).

As mentioned above, Theorem 2 is true.   \(\square \)

The above theorem aids understanding, but it is not computation-friendly. Therefore, we reformulate maximum distribution reduction in terms of the classic heuristic functions. Following Definition 2, we can characterize the maximum distribution reduct by conditional entropy.

Theorem 3

Given a decision table \(DT=(U,C \cup D,V,f)\), let TGran stand for \(U/IND_{md}(C)\). \(B \subseteq C\) is a maximum distribution reduct if and only if B satisfies

(1) \(H(TGran|B)=0\);

(2) \(\forall B' \subset B\), \(B'\) does not satisfy condition (1).

Proof

On the basis of Theorem 2, we can prove this theorem by establishing the equivalence between \(H(TGran|B)=0\) and \(DIS_{md}(C) \subseteq DIS(B)\).

  • Sufficiency(\(\Rightarrow \)): According to the definition of H(Q|P), it is easy to see that \(H(TGran|B)=0 \Leftrightarrow \forall Y \in TGran\), Y is a union of blocks of U/B, i.e., \(\forall X \in U/B\), \(X \cap Y \ne \emptyset \Rightarrow X \subseteq Y\). Therefore, we conclude that \(DIS_{md}(C) \subseteq DIS(B)\) and \(IND(B) \subseteq IND_{md}(C)\). As a result, \(H(TGran|B)=0 \Rightarrow B\) is a maximum distribution consistent attribute set.

  • Necessity(\(\Leftarrow \)): Assume that B is a maximum distribution consistent attribute set and that \(H(TGran|B) \ne 0\). According to the definition of conditional entropy, \(H(TGran|B) \ne 0\) means \(\exists Y \in TGran,\, X\in U/B\) satisfying \(X \,\cap \,Y \ne \emptyset \,\wedge \,X \not \subseteq Y\). That is to say, \(\exists p \in X-X \,\cap \,Y,\, q \in X \,\cap \,Y\) with \(\gamma _{C}(p) \ne \gamma _{C}(q)\) and \(\gamma _{B}(p)=\gamma _{B}(q)\), which contradicts the assumption that B is a maximum distribution consistent attribute set. That is to say, if B is a maximum distribution consistent attribute set, then \(H(TGran|B)=0\).

As a result, Theorem 3 is true.   \(\square \)

Likewise, following Definition 2, we can characterize the maximum distribution reduct by the dependency degree.

Theorem 4

Given a decision table \(DT=(U,C \cup D,V,f)\), let TGran stand for \(U/IND_{md}(C)\). \(B \subseteq C\) is a maximum distribution reduct if and only if B satisfies (1) \(\varGamma _B(TGran)=1\); (2) \(\forall B' \subset B\), \(B'\) does not satisfy condition (1).

Proof

According to Theorem 2 and Theorem 3, the conclusion is clearly established.

Since \(\varGamma _C(D)=1 \Leftrightarrow \delta _C(D)=1\), we have \(\varGamma _B(TGran)=1 \Leftrightarrow \delta _B(TGran)=1\). As a result, there is no need to construct a separate theorem for maximum distribution reduction based on \(\delta _B(TGran)\). Based on the above theorems, the significance functions for maximum distribution reduction can be defined as follows.

(1) \(Sig_{1}^{outer}(a,B,TGran)=H(TGran|B)-H(TGran|B \cup \{a\}),\, a \not \in B\);
    \(Sig_{2}^{outer}(a,B,TGran)=\varGamma _{B \cup \{a\}}(TGran)-\varGamma _B(TGran),\, a \not \in B\).

(2) \(Sig_{1}^{inner}(a,B,TGran)=H(TGran|B-\{a\})-H(TGran|B),\, a \in B\);
    \(Sig_{2}^{inner}(a,B,TGran)=\varGamma _B(TGran)-\varGamma _{B-\{a\}}(TGran),\, a \in B\).
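For illustration, the two outer significance functions can be sketched on top of the heuristic functions from Sect. 2 (our own sketch; the \(U'\)-restricted versions used by the algorithms simply receive the current \(U'\) as the universe):

```python
def sig_outer_entropy(a, B, TGran_blocks, U, f):
    # Sig_1^outer: the drop in H(TGran|.) achieved by adding a to B.
    return (conditional_entropy(partition(U, B, f), TGran_blocks)
            - conditional_entropy(partition(U, B + [a], f), TGran_blocks))

def sig_outer_dependency(a, B, TGran_blocks, U, f):
    # Sig_2^outer: the gain in Gamma_.(TGran) achieved by adding a to B.
    return (dependency_degree(partition(U, B + [a], f), TGran_blocks)
            - dependency_degree(partition(U, B, f), TGran_blocks))
```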

For convenience of algorithm description, we denote by \(Sig_{i}^{j}(a,B,TGran,U'),\,i \in \{1,2\},\,j\in \{inner,outer\}\) the significance value computed on \(U'\). Using Theorems 3 and 4, we can construct Algorithms 2 and 3 for maximum distribution reduction; a sketch of their shared skeleton follows the algorithm placeholders below. Algorithms 2 and 3 are, in fact, variants of the discernibility matrix based reduction algorithms; the difference is that they focus on the indiscernibility relation instead of the discernibility relation, which can be seen by extending the relation \(IND(B) \cup DIS(B)=U\times U\) to the reduction algorithms. Intuitively, therefore, the correctness of the two algorithms carries over from the discernibility matrix based algorithm for obtaining maximum distribution reducts.

(Algorithm 2: MDRAUCE. Algorithm 3: MDRAUDD. Pseudocode omitted.)
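Since the pseudocode of Algorithms 2 and 3 is omitted here, the following sketch shows their shared skeleton as we read it from the worked examples in Sect. 4.1: greedily add the most significant attribute, then remove the objects already in the positive region of TGran. Passing `sig_outer_entropy` yields MDRAUCE and `sig_outer_dependency` yields MDRAUDD; any redundancy check via the inner significance functions is omitted from this sketch.

```python
def mdra(U, C, D_blocks, f, sig):
    """Shared skeleton of Algorithms 2 and 3 as we read them from the
    worked examples: add the most significant attribute, then drop the
    objects already in the positive region of TGran."""
    red, U_cur = [], list(U)
    T = tgran(U, C, D_blocks, f)   # target granulation U/IND_md(C)
    while U_cur:
        rest = [a for a in C if a not in red]
        a_max = max(rest, key=lambda a: sig(a, red, T, U_cur, f))
        red.append(a_max)
        # Objects whose red-class fits inside one TGran block are done.
        pos = [x for X in partition(U_cur, red, f) for x in X
               if any(set(X) <= set(Y) for Y in T)]
        U_cur = [x for x in U_cur if x not in pos]
        T = [[y for y in Y if y in U_cur] for Y in T]
        T = [Y for Y in T if Y]    # discard emptied TGran blocks
    return red

# MDRAUCE: mdra(U, C, D_blocks, f, sig_outer_entropy)
# MDRAUDD: mdra(U, C, D_blocks, f, sig_outer_dependency)
```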

4 Correctness Analysis and Experimental Results

The objective of this section is to demonstrate the correctness and the efficiency of the attribute reduction algorithms proposed in this paper, i.e., MDRAUCE and MDRAUDD. To show the correctness of the two algorithms, we calculate the maximum distribution reduct of Table 1 using MDRAUCE and MDRAUDD, and validate their outputs against the definition of maximum distribution reduction. In addition, we employ 12 UCI data sets to compare the time consumption of MDRAUCE, MDRAUDD, and existing maximum distribution reduction algorithms.

4.1 The Validation of Correctness

In this part, we demonstrate the correctness of the two algorithms proposed in Sect. 3 by presenting the process of calculating the maximum distribution reduct of Table 1 using Algorithms 2 and 3. After that, we check the outputs of the two algorithms against the definition of maximum distribution reduction.

The process of MDRAUCE for finding the maximum distribution reduct of Table 1 is presented here. In the following description, “item1 = item2” denotes that the two items are equal, and “\(:=\)” stands for the assignment operation.

Step 1. \(red:=\emptyset ,\,TGran=U/IND_{md}(C)=\{\{x_{1}\},\,\{x_{2},\,x_{5},\,x_{6},\,x_{7}\},\,\{x_{3},\,x_{4}\}\}\), \(U'=\{x_1,\,x_2,\,\cdots ,\,x_7\}\).

Step 2. \(Sig_{1}^{outer}({a_1},red,TGran,U') = H^{U'}(TGran|red)-H^{U'}(TGran|red \cup \{a_1\})=1.38-0.79=0.59\), \(Sig_{1}^{outer}({a_2},red,TGran,U') = H^{U'}(TGran|red)-H^{U'}(TGran|red \cup \{a_2\})=1.38-0.98=0.40\), \(Sig_{1}^{outer}({a_3},red,TGran,U') = H^{U'}(TGran|red)-H^{U'}(TGran|red \cup \{a_3\})=1.38-0.86=0.52\). So \(a_{max}=a_1\), \(red:=red \cup \{a_1\}=\{a_1\}\). We have \(POS_{red}(TGran)=\{x_1\}\). \(U'\) and TGran are updated as follows: \(U':=U'-POS_{red}(TGran)=\{x_2,x_3,\cdots ,x_7\}\), \(TGran:=TGran-POS_{red}(TGran)=\{\{x_2,x_5,x_6,x_7\},\,\{x_3,x_4\}\}\).

Step 3. \(Sig_{1}^{outer}({a_2},red,TGran,U') = H^{U'}(TGran|red)-H^{U'}(TGran|red \cup \{a_2\})=0.92-0.81=0.11\), \(Sig_{1}^{outer}({a_3},red,TGran,U') = H^{U'}(TGran|red)-H^{U'}(TGran|red \cup \{a_3\})= 0.92-0.46=0.46\). So \(a_{max}=a_3\), \(red:=red \cup \{a_3\}=\{a_1,a_3\}\). We have \(POS_{red}(TGran)=\{x_5,x_6,x_7\}\). \(U'\) and TGran are updated as follows: \(U':=U'-POS_{red}(TGran)=\{x_2,x_3,x_4\}\), \(TGran:=TGran-POS_{red}(TGran)=\{\{x_2\},\,\{x_3,x_4\}\}\).

Step 4. \(Sig_{1}^{outer}({a_2},red,TGran,U') = H^{U'}(TGran|red)-H^{U'}(TGran|red \cup \{a_2\})=0.92-0=0.92\). So \(a_{max}=a_2\), \(red:=red \cup \{a_2\}=\{a_1,a_3,a_2\}\). We have \(POS_{red}(TGran)=\{x_2,x_3,x_4\}\). \(U'\) and TGran are updated as follows: \(U':=U'-POS_{red}(TGran)=\emptyset \), \(TGran:=TGran-POS_{red}(TGran)=\emptyset \).

Step 5. Because \(U'=\emptyset \), the program terminates. The algorithm outputs \(red=\{ {a_1},{a_3},{a_2}\}\) as the result.

The process of MDRAUDD for obtaining the maximum distribution reduct of Table 1 is presented as follows.

Step 1. \(red:=\emptyset ,\,TGran=U/IND_{md}(C)=\{\{x_{1}\},\{x_{2},x_{5},x_{6},x_{7}\},\{x_{3},x_{4}\}\},U'=\{x_1,x_2,\cdots ,x_7\}\).

Step 2. \(Sig_{2}^{outer}({a_1},red,TGran,U') = \varGamma ^{U'}_{red \cup \{a_1\}}(TGran)-\varGamma ^{U'}_{red}(TGran)=\frac{1}{7}-0=\frac{1}{7}\), \(Sig_{2}^{outer}({a_2},red,TGran,U') = \varGamma ^{U'}_{red \cup \{a_2\}}(TGran)-\varGamma ^{U'}_{red}(TGran)=0-0=0\), \(Sig_{2}^{outer}({a_3},red,TGran,U') = \varGamma ^{U'}_{red \cup \{a_3\}}(TGran)-\varGamma ^{U'}_{red}(TGran)=\frac{3}{7}-0=\frac{3}{7}\). So \(a_{max}=a_3\), \(red:=red \cup \{a_3\}=\{a_3\}\). We have \(POS_{red}(TGran)=\{x_5,x_6,x_7\}\). \(U'\) and TGran are updated as follows: \(U':=U'-POS_{red}(TGran)=\{x_1,x_2,x_3,x_4\}\), \(TGran:=TGran-POS_{red}(TGran)=\{\{x_1\},\{x_2\},\{x_3,x_4\}\}\).

Step 3. \(Sig_{2}^{outer}({a_1},red,TGran,U') = \varGamma ^{U'}_{red \cup \{a_1\}}(TGran)-\varGamma ^{U'}_{red}(TGran)=\frac{1}{4}-0=\frac{1}{4}\), \(Sig_{2}^{outer}({a_2},red,TGran,U') = \varGamma ^{U'}_{red \cup \{a_2\}}(TGran)-\varGamma ^{U'}_{red}(TGran)=\frac{2}{4}-0=\frac{1}{2}\). So \(a_{max}=a_2\), \(red:=red \cup \{a_2\}=\{a_3,a_2\}\). We have \(POS_{red}(TGran)=\{x_3,x_4\}\). \(U'\) and TGran are updated as follows: \(U':=U'-POS_{red}(TGran)=\{x_1,x_2\}\), \(TGran:=TGran-POS_{red}(TGran)=\{\{x_1\},\{x_2\}\}\).

Step 4. \(Sig_{2}^{outer}({a_1},red,TGran,U') = \varGamma ^{U'}_{red \cup \{a_1\}}(TGran)-\varGamma ^{U'}_{red}(TGran)=1-0=1\). So \(a_{max}=a_1\), \(red:=red \cup \{a_1\}=\{a_3,a_2,a_1\}\). We have \(POS_{red}(TGran)=\{x_1,x_2\}\). \(U'\) and TGran are updated as follows: \(U':=U'-POS_{red}(TGran)=\emptyset \), \(TGran:=TGran-POS_{red}(TGran)=\emptyset \).

Step 5. Because \(U'=\emptyset \), the program terminates. The algorithm outputs \(red=\{ {a_3},{a_2},{a_1}\}\) as the result.

According to Definition 1, we know \(\gamma _{red}(x_{1})=\{P_{1}\}\) and \(\gamma _{red}(x_{2})=\{P_{2}\}\); for \(x \in \{x_{3},\,x_{4}\}\), we have \( \gamma _{red}(x)=\{P_{1},P_{2}\}\); for \(x \in \{x_{5},x_{6},x_{7}\}\), we know \( \gamma _{red}(x)=\{P_{2}\}\). Meanwhile, \(\gamma _{C}(x_{1})=\{P_{1}\}\) and \(\gamma _{C}(x_{2})=\{P_{2}\}\); for \(x \in \{x_{3},x_4\}\), \(\gamma _{C}(x)=\{P_{1},P_{2}\}\); for \(x \in \{x_5,x_6,x_7\}\), we have \(\gamma _{C}(x)=\{P_{2}\}\). It is obvious that \(\forall x \in U,\,\gamma _{red}(x)=\gamma _{C}(x)\). We therefore conclude that the outputs of MDRAUCE and MDRAUDD are correct.

4.2 The Efficiency of Proposed Algorithms

In this part, we employed 12 data sets to compare the time consumption of MDRAUCE, MDRAUDD, Q-MDRA [15], and QGARA-FS [16]. All attribute reduction algorithms in the experiments were run on a personal computer with Windows 10, an Intel(R) Core(TM) i5-8265U CPU at 1.60 GHz, and 8 GB RAM. The software used was Visual Studio Code 1.3.8, and the programming language was Python 3.7.

The data sets used in the experiments were all downloaded from the UCI machine learning repository [25]; their basic information is outlined in Table 2. Since the reduction algorithms can only handle symbolic data, data sets containing continuous attributes were preprocessed with the CAIM [26] discretization algorithm. For each data set, the positive region dependency degree, i.e., \(\gamma _C(D)\), is listed in the last column of Table 2. As we know, a data set is consistent if \(\gamma _C(D)=1\); otherwise, it is inconsistent. As shown in Table 2, Wpbc, Wine, and Sonar are consistent. Taking the value of \(\gamma _C(D)\) into consideration, we regard Sat, Segment, Wdbc, and Wave, whose values of \(\gamma _C(D)\) satisfy \(0.981\le \gamma _C(D) < 1\), as nearly consistent data sets. The other 5 data sets (Vehicle, Ion, Glass, Heart, and Pid) are inconsistent.

Table 2. Description of data sets
Table 3. Time consumption of maximum distribution reduction algorithms

Table 3 shows the computational time of MDRAUCE, MDRAUDD, Q-MDRA, and QGARA-FS for obtaining the maximum distribution reduct on the 12 data sets. We can see that MDRAUDD was the fastest of the four attribute reduction algorithms, being the best on 11 data sets, and MDRAUCE was faster than QGARA-FS. MDRAUCE performed better than Q-MDRA in obtaining the reduct on 9 data sets. Q-MDRA performed better than MDRAUCE and MDRAUDD on small data sets, e.g., the Wine data set. However, in processing large scale data, Q-MDRA consumed more time than MDRAUCE and MDRAUDD. From the results of experiments on both consistent and inconsistent decision tables, the four algorithms were ranked by speed in obtaining the maximum distribution reduct as follows: MDRAUDD \(\ge \) MDRAUCE, Q-MDRA > QGARA-FS. In most cases in the experiments, such as on the data sets Wpbc, Glass, and Heart, the computational time of MDRAUDD was less than half that of QGARA-FS and Q-MDRA under the same conditions. From the row of average time consumption in obtaining the reducts of the 12 data sets, we know that MDRAUCE and MDRAUDD are more efficient and more stable in time consumption than the existing maximum distribution reduction algorithms.

5 Conclusion

In this paper, we focused on maximum distribution reduction for complete, inconsistent decision tables. We pointed out the problems in Li's algorithm for obtaining the maximum distribution reduct and, based on classic heuristic functions, designed two novel heuristic algorithms, i.e., MDRAUCE and MDRAUDD, to efficiently find a maximum distribution reduct. As the scale of the data to be processed grows larger and larger, the efficiency of attribute reduction algorithms remains the focus of our future research.