
1 Introduction

We have witnessed very rapid growth of rough set theory in recent years; it has been successfully applied in fields such as knowledge discovery, decision analysis, pattern classification, and fault diagnosis [1]. Attribute reduction is one of the key issues in rough set theory. Reference [2] proposed a discernibility matrix method, in which any two objects determine a feature subset that distinguishes them, and all attribute reducts of a given data set can be obtained from the matrix. From the discernibility matrix viewpoint, Ref. [3] provided attribute reduction techniques for interval ordered information systems, set-valued ordered information systems, and incomplete ordered information systems, respectively. The idea of attribute reduction using the positive region, which keeps the positive region of the target decision unchanged, originated in Refs. [4, 5], which also extended positive-region reduction to hybrid attribute reduction in the framework of fuzzy rough sets. Among heuristic search strategies for attribute reduction, attribute significance measures such as the dependency function, information gain, consistency, and others are employed to select a feature subset. In fact, several authors [6] have applied variant entropies, combination entropy, or mutual information to measure the uncertainty of an information system and to construct heuristic attribute reduction algorithms. Information entropy [7] can measure both the uncertainty of an information system and the fuzziness of a rough decision in rough set theory.

Each of these methods is a good tool for handling data redundancy in real-life problems and preserves particular properties. However, the spatial complexity of computing discernibility matrices and the time spent computing information entropy are very expensive, the theoretical proofs for entropy-based measures are extremely difficult, and approaches that use intelligent optimization algorithms to solve the reduction problem must answer the question of how to select a suitable model to match the reduction problem.

In this paper, we present a novel attribute reduction algorithm based on equivalence classes with multiple decision values in rough sets. The proposed algorithm fits decision tables with discrete-valued or continuous-valued attributes. We start from the equivalence classes of the partition formed by a single condition attribute with respect to the decision attributes, join the ordered condition attributes that satisfy special conditions in each iterative round, and finally acquire an attribute subset that holds the same descriptive or classification ability as the entire set of attributes.

2 Preliminaries

Rough set theory deals with information represented by a table called a decision system, which consists of objects and attributes. This section reviews some essential definitions of rough sets that are used for attribute reduction. A detailed description and formal definitions of the theory can be found in Ref. [1].

Definition 1

Let \( S = (U,A_{t} ,V,f) \) be a decision system, also called an information system, where \( U = \{x_{ 1} ,x_{ 2} , \ldots ,x_{n} \} \), called the universe, is a nonempty finite set of objects. \( {A_{t} = C \cup D} \) is a nonempty finite set of attributes, where \( {C = \{ a_{ 1} ,a_{ 2} , \ldots ,a_{m} \} } \) is the set of condition attributes describing the objects and \( D \) is the set of decision attributes that indicates the classes of objects. \( V = \mathop \cup \limits_{{a \in A_{t} }} V_{a} \) is the union of attribute domains, where \( V_{a} \) is the domain of attribute \( a \in A_{t} \). \( {f :U \times A_{t} \to V} \) is an information function such that \( f(x,a) \in V_{a} \) for every \( x \in U, a \in A_{t} \).

Definition 2

In a given decision system \( S = (U,A_{t} ,V,f) \), an indiscernibility relation with respect to \( A \subseteq A_{t} \) is defined as \( IND(A) = \{ (x,y) \in U \times U|\forall a \in A[f(x,a) = f(y,a)]\} \).

In rough set, an indiscernibility relation \( IND(A) \) is always deemed as the knowledge in \( S \). Obviously, an indiscernibility relation \( IND(A) \) is reflexive, symmetric and transitive, and thus is an equivalence relation on \( U \) and \( IND(A) = \mathop \cap \limits_{a \in A} IND(\{ a\} ) \). The equivalence relation \( IND(A) \) induces a partition of \( U \), which is denoted by \( U/A \), where an element of \( U/A \) is called an equivalence class or elementary set.

Consider a partition \( U/D = \{ D_{1} ,D_{2} , \ldots ,D_{k} \} \) of the universe U with respect to the decision attribute \( D \) and another partition \( U/A = \{ U_{1} ,U_{2} , \ldots ,U_{r} \} \) defined by a set of condition attributes \( A \). The equivalence classes induced by the partition \( U/A \) are the basic blocks to construct the Pawlak rough set approximations.
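As a concrete illustration of these definitions, the partition \( U/A \) can be computed by grouping objects on their attribute-value tuples. The following Python sketch uses a hypothetical four-object table (not from the paper) to show how \( IND(A) \) induces equivalence classes:

```python
from collections import defaultdict

def partition(universe, table, attrs):
    """Compute U/A: group objects by their value tuples on `attrs`."""
    blocks = defaultdict(set)
    for x in universe:
        key = tuple(table[x][a] for a in attrs)
        blocks[key].add(x)
    return list(blocks.values())

# Hypothetical table: objects 1-4, condition attributes 'a' and 'b'
table = {1: {'a': 0, 'b': 1}, 2: {'a': 0, 'b': 1},
         3: {'a': 1, 'b': 0}, 4: {'a': 1, 'b': 1}}
U = [1, 2, 3, 4]
print(partition(U, table, ['a']))       # [{1, 2}, {3, 4}]
print(partition(U, table, ['a', 'b']))  # [{1, 2}, {3}, {4}]
```

Note how adding the attribute 'b' refines the partition, in line with \( IND(A) = \cap_{a \in A} IND(\{a\}) \).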

Definition 3

Given any subsets \( A \subseteq C \) and \( X \subseteq U \), the \( A \)-lower and \( A \)-upper approximations of the set \( X \) are defined, respectively, as follows:

$$ A_{ - } (X) = \cup \{ Y \in U/A \mid Y \subseteq X \} $$
(63.1)
$$ A^{ - } (X) = \cup \{ Y \in U/A \mid Y \cap X \ne \varnothing \} $$
(63.2)

\( A_{ - } (X) \) is the set of all objects from \( U \) that can be classified with certainty as elements of \( X \) using the set of attributes \( A \). \( A^{ - } (X) \) is the set of objects of \( U \) that can possibly be classified as elements of \( X \) using the set of attributes \( A \).
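Definition 3 can be sketched directly in Python; the partition `blocks` and target set `X` below are invented toy data, not taken from the paper:

```python
def lower_approx(blocks, X):
    # A_-(X): union of the equivalence classes fully contained in X
    return set().union(*(B for B in blocks if B <= X))

def upper_approx(blocks, X):
    # A^-(X): union of the equivalence classes that intersect X
    return set().union(*(B for B in blocks if B & X))

blocks = [{1, 2}, {3}, {4}]   # a partition U/A of U = {1, 2, 3, 4}
X = {1, 3}
print(lower_approx(blocks, X))  # {3}
print(upper_approx(blocks, X))  # {1, 2, 3}
```

By Definition 4 below, `lower_approx` also yields the positive region \( POS_{A}(X) \).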

Definition 4

The \( A \)-lower approximation of \( X \) is usually called the positive region, written \( POS_{A} (X) \), that is

$$ {POS_{A} (X) = A_{ - } (X)} $$
(63.3)

\( POS_{A} (X) \) is the union of all the equivalence classes in \( U/A \) that are contained in \( X \); each of its elements induces a certain decision.

Definition 5

For a given decision system \( S = (U,A_{t} ,V,f) \) and equivalence relations \( P,Q \) defined on \( U \), if all attributes in \( P \) are necessary for \( Q \), then \( P \) is independent with respect to \( Q \).

From these related concepts, one can conclude that attribute reduction based on rough set theory conceptually requires keeping the object-discernibility relation. Therefore, the reduction process has to utilize, implicitly or explicitly, the decision information of the information system.

3 Main Achievement

Attribute reduction is a tool to remove redundant condition attributes on the condition that the classification or decision-making capacity of the information system does not change. In this paper, our attribute reduction algorithm focuses on three core issues: whether joined condition attributes would change the condition-attribute compatibility of the original decision table; under which condition joining attributes achieves the required reduction condition; and which condition attributes to join. This paper first presents the theorem that solves the first core issue: the decision values of certain equivalence classes do not change after joining condition attributes.

Definition 7

For any given subsets \( A \subseteq C \) and \( X \subseteq U \), if the decision values of all objects in some of the equivalence classes in \( U/A \) are the same, these equivalence classes are called equivalence classes with certain decision values; otherwise, they are called equivalence classes with uncertain decision values.

For simplicity, we assume \( D = \{ d\} ,V_{d} = \{ 0,1\} \) in this paper, where d is a decision attribute which describes the decision for each object. A table with multiple decision attributes can be easily transformed into a table with a single decision attribute by considering the Cartesian product of the original decision attributes.
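The transformation mentioned above can be sketched as follows; the two decision attributes `d1`, `d2` and the three rows are invented for illustration:

```python
# Combine several decision attributes into one via their Cartesian product:
# each distinct value tuple is recoded as a single decision value.
rows = [{'d1': 0, 'd2': 1}, {'d1': 1, 'd2': 1}, {'d1': 0, 'd2': 1}]
pairs = [(r['d1'], r['d2']) for r in rows]
codes = {p: i for i, p in enumerate(sorted(set(pairs)))}
merged = [codes[p] for p in pairs]
print(merged)  # [0, 1, 0]
```

Two objects receive the same merged decision value exactly when they agree on every original decision attribute, so the induced partition \( U/D \) is unchanged.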

Theorem 1 Suppose \( d \) is the decision attribute and \( A' \) is a subset of the condition attribute set \( C\ (A' \subseteq C) \). For each equivalence class in \( POS_{A'} (U) \), the set \( A' \cup \{ a\} \) will partition it, and the resulting equivalence classes partly belong to \( POS_{{A' \cup \{ a\} }} (\{ d\} ) \).

Proof

Given a decision system \( S = (U,C \cup D,V,f) \), \( A' \subseteq C \), \( a \in C - A' \). According to the definitions in Sect. 2, let

$$ POS_{A'} (U) = U/A' = \{ E_{1} ,E_{2} , \ldots ,E_{q} \} , $$
(63.4)
$$ POS_{{\{ d\} }} (U) = U/\{ d\} = \{ E_{0}^{'} ,E_{1}^{'} \} , $$
(63.5)
$$ POS_{{A' \cup \{ a\} }} (U) = U/(A' \cup \{ a\} ) = \{ E_{1}^{''} ,E_{2}^{''} , \cdots E_{r}^{''} \} , $$
(63.6)

For any \( E_{j}\ (j \in \{ 1,2, \ldots ,q\} ) \), there must exist an equivalence class \( E_{s}^{''}\ (s \in \{ 1,2, \ldots ,r\} ) \) in the partition \( POS_{{A' \cup \{ a\} }} (U) \) and an equivalence class \( E_{k}^{'}\ (k \in \{ 0,1\} ) \) such that \( E_{s}^{''} \subseteq E_{j} \) and \( E_{j} \subseteq E_{k}^{'} \); furthermore, \( E_{s}^{''} \subseteq E_{k}^{'} \). Hence, we have \( E_{s}^{''} \subseteq POS_{{A' \cup \{ a\} }} (U) \). The theorem holds.

This theorem shows that the equivalence classes induced by a condition attribute subset can be segmented after joining a further condition attribute, and each segment inherits the same decision value as the equivalence class that contains it. This theorem provides a theoretical basis for joining condition attributes.
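The refinement property behind Theorem 1 can be checked mechanically: every block of \( U/(A' \cup \{a\}) \) lies inside exactly one block of \( U/A' \). The following Python sketch uses invented attribute values to demonstrate this:

```python
from collections import defaultdict

def blocks_by(vals, objs):
    """Partition objs by the value vals[x] (a single value or a tuple)."""
    d = defaultdict(set)
    for x in objs:
        d[vals[x]].add(x)
    return list(d.values())

U = [1, 2, 3, 4]
a1 = {1: 0, 2: 0, 3: 1, 4: 1}   # values of the attributes in A' (invented)
a2 = {1: 0, 2: 1, 3: 0, 4: 0}   # values of the extra attribute a (invented)
coarse = blocks_by(a1, U)                            # U/A'
fine = blocks_by({x: (a1[x], a2[x]) for x in U}, U)  # U/(A' ∪ {a})
# each finer block is contained in some coarser block
print(all(any(f <= c for c in coarse) for f in fine))  # True
```

Because each finer block is inside one coarser block, a block that already has a certain decision value passes that value on to all of its segments.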

Theorem 2 answers the key problem of under which condition combining attributes achieves the required reduction condition.

Theorem 2 For a given decision system \( S = (U,C \cup D,V,f) \), assume \( a_{i} ,a_{k} \in C \),

\( E_{{i,q_{j} }} \in U/a_{i} ,E_{{k,q_{j} }} \in U/a_{k} \ (j = 0,1;\ i,k = 1,2, \ldots ,m) \) are equivalence classes whose decision value is 0 or 1, and let \( E_{{i,q_{X} }} ,E_{{k,q_{X} }} \) be the uncertain equivalence classes formed by the condition attributes \( a_{i} ,a_{k} \) with respect to the decision attribute, respectively. When the conditions \( E_{{i,q_{X} }} \cap E_{{k,q_{X} }} = \varnothing \) and \( i \ne k \) are satisfied, we can achieve a reduction ensuring compatibility.

Proof Take the two uncertain equivalence classes \( E_{{i,q_{X} }} \) and \( E_{{k,q_{X} }} \) as an example. \( E_{{i,q_{X} }} \cap E_{{k,q_{X} }} = \varnothing \) shows that, although the decision values of the uncertain equivalence classes in the partition induced by condition attribute \( a_{i} \) are uncertain, the decision values of these classes can be confirmed in the partition induced by condition attribute \( a_{k} \). By Theorem 1, the decision values of the equivalence classes in the partition \( U/\{ a_{i} ,a_{k} \} \) are confirmed. So we obtain a reduction ensuring compatibility. Theorem 2 is proved.

Next we propose an attribute reduction algorithm based on uncertain equivalence classes in rough sets. The algorithm starts from the uncertain equivalence class with the least cardinality over single condition attributes and joins condition attributes one by one until the reduction condition in Theorem 2 is satisfied.

In the rough set model, attribute reduction algorithms mainly deal with categorical data. Thus, we can recode symbolic attributes with a set of consecutive natural numbers. In this paper, we focus on decision tables whose values are integers and apply an efficient sorting algorithm for computing equivalence classes and positive regions [8].

Algorithm 1 Computing \( U/A \) algorithm

  • Input: a decision information system S, \( U = \{ x_{ 1} ,x_{ 2} , \ldots ,x_{n} \} \), \( A = \{ a_{1} ,a_{2} , \ldots ,a_{h} \} \subseteq C \)

  • Output: \( U/A \)

  • Begin

  1. Step 1:

    For every \( a_{i}\ (i = 1,2, \ldots ,h) \), denote the maximum value on attribute \( a_{i} \) by \( M_{i} \). Initialize the elements of array Order with 1, 2, …, n, respectively, and all the elements of array TempOrder with 0.

  2. Step 2:

    For i = 1 to h do

    • For k = 1 to \( M_{i} \) do

    • {Count[k] = 0;}

    • For j = 1 to n do

    • {Count [\( a_{i} (x_{j} ) \)]++; TempOrder[j] = Order[j];} // Count [\( a_{i} (x_{j} ) \)] now contains the number of elements equal to \( a_{i} (x_{j} ) \)

    • For k = 2 to \( M_{i} \) do

    • {Count[k] = Count[k−1] + Count[k];}

    • // Count[k] now contains the number of elements less than or equal to k

    • For j = n to 1 do

    • {Order [Count [\( a_{i} (x_{j} ) \)]] = TempOrder[j]; Count [\( a_{i} (x_{j} ) \)] −−;}

  3. Step 3:

    let the object sequence from Step 2 be \( \{ x_{1}^{'} ,x_{2}^{'} , \ldots ,x_{n}^{'} \} \); t = 1, \( B[t] = \{ x_{1}^{'} \} \)

  • For j = 2 to n do

  • If \( a_{i} (x_{j}^{'} ) = a_{i} (x_{j - 1}^{'} ) \) for all \( a_{i} \in A\ (i = 1,2, \ldots ,h) \), then

  • \( B[t] = B[t] \cup \{ x_{j}^{'} \} ; \)

  • Else {t = t + 1; \( B[t] = \{ x_{j}^{'} \} ;\} \)

  • End
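The steps of Algorithm 1 can be sketched in Python as repeated stable counting sorts followed by grouping equal neighbours. The function below is our own reconstruction (0-indexed, small positive integer attribute values assumed), not the authors' code:

```python
def partition_by_counting_sort(values, n):
    """Compute U/A. `values[i][j]` is the value of attribute a_i on object
    x_j (an integer >= 1). Returns the blocks of the partition as lists."""
    order = list(range(n))                 # current object ordering (Step 1)
    for attr in values:                    # one stable counting sort per attribute (Step 2)
        m = max(attr)
        count = [0] * (m + 1)
        for j in range(n):
            count[attr[order[j]]] += 1     # count[k] = #objects with value k
        for k in range(2, m + 1):
            count[k] += count[k - 1]       # count[k] = #objects with value <= k
        temp = order[:]
        order = [0] * n
        for j in range(n - 1, -1, -1):     # place from the back to keep stability
            count[attr[temp[j]]] -= 1
            order[count[attr[temp[j]]]] = temp[j]
    # Step 3: equal attribute vectors are now adjacent; split them into blocks
    blocks, current = [], [order[0]]
    for prev, cur in zip(order, order[1:]):
        if all(a[cur] == a[prev] for a in values):
            current.append(cur)
        else:
            blocks.append(current)
            current = [cur]
    blocks.append(current)
    return blocks

# two attributes over four objects: value vectors (1,1), (1,2), (2,1), (2,1)
print(partition_by_counting_sort([[1, 1, 2, 2], [1, 2, 1, 1]], 4))
# → [[0], [2, 3], [1]]
```

Each counting-sort pass is O(n + M_i), so the whole partition is computed in O(h·(n + max M_i)) time, which is the efficiency the sorting approach of Ref. [8] aims at.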

Algorithm 2 Attribute reduction algorithm.

Input: Give a decision information system \( S = (U,C \cup D,V,f),C = \{ a_{1} ,a_{2} , \ldots a_{m} \} \).

Output: Output the reduction of information system \( S \).

Begin

  1. Step 1:

    \( L = \phi ;M = \phi ; \) // L and M are attribute-subset variables.

  2. Step 2:

    By Algorithm 1, acquire \( {E_{i,0} },{E_{i,1} },E_{{i,q_{X} }} ,{POS_{{a_{i} }} (U)},i = 1,2, \ldots ,m \).

  3. Step 3:

    Sort the condition attributes in ascending order of the cardinality of \( E_{{i,q_{X} }} ,i = 1,2, \ldots ,m \) and store the sorted attributes' IDs in the array sort_attr[].

  4. Step 4:

    M = sort_attr[1];

  5. Step 5:
    $$ j = 2 $$
  6. Step 6:

    While (\( card(E_{{M,q_{X} }} ) \ne 0 \)) and \( j \le length(sort\_attr[]) \) do: \( K = card(E_{{M,q_{X} }} ) \); \( L = M \cup \{ sort\_attr[j]\} \); if \( card(E_{{L,q_{X} }} ) < K \) then \( M = L \); \( K = card(E_{{L,q_{X} }} ) \); j = j + 1.

  7. Step 7:

    If M is independent, turn to Step 8; otherwise, for \( \forall a_{i} \in M \), if \( IND(M) = IND(M - \{ a_{i} \} ) \), then \( M = M - \{ a_{i} \} \) and turn to Step 7.

  8. Step 8:

    If \( IND(M) = IND(C) \), then the set M is an attribute reduct of the condition attribute set C.

Otherwise, let \( P = IND(C) - IND(M) \); for any \( \bar{a} \in C - M \), if \( P/(M \cup \{ \bar{a}\} ) = \phi \), then turn to Step 4; otherwise, by Algorithm 1, acquire \( E_{{a^{'} ,q_{X} }} ,\forall a^{'} \in M \cup \{ \bar{a}\} \), search for the attribute with minimum \( card(E_{{a^{'} ,q_{X} }} ) \), denoted \( a_{\hbox{min} } \), let \( M = M \cup \{ a_{\hbox{min} }\} \), and turn to Step 7.

End

The set M so obtained is an attribute reduct of the condition attribute set C, and a reduction of the information system \( S \) is thereby obtained.

In the algorithm, we only consider the judgment \( card(E_{{L,q_{X} }} ) < card(E_{{M,q_{X} }} ) \) and do not treat the case \( card(E_{{M,q_{X} }} ) \le card(E_{{L,q_{X} }} ) \) precisely. In fact, when \( card(E_{{M,q_{X} }} ) = card(E_{{L,q_{X} }} ) \), the candidate attributes are not unique and we could obtain multiple solutions.
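The greedy core of Algorithm 2 (ranking attributes by the size of their uncertain region and joining those that shrink it, Steps 1–6) can be sketched as follows. This is a simplified reconstruction in Python on an invented two-attribute table; it omits the independence check of Steps 7–8:

```python
from collections import defaultdict

def uncertain_region(objs, key, decision):
    """Objects lying in equivalence classes with mixed decision values."""
    blocks = defaultdict(list)
    for x in objs:
        blocks[key(x)].append(x)
    return {x for B in blocks.values()
            if len({decision[y] for y in B}) > 1 for x in B}

def greedy_reduct(objs, table, conds, decision):
    def unc(A):  # uncertain region of the attribute subset A
        return uncertain_region(objs,
                                lambda x: tuple(table[x][a] for a in A),
                                decision)
    ranked = sorted(conds, key=lambda a: len(unc([a])))   # Step 3
    M = [ranked[0]]                                       # Step 4
    for a in ranked[1:]:                                  # Steps 5-6
        if not unc(M):
            break                      # all decisions already certain
        if len(unc(M + [a])) < len(unc(M)):
            M.append(a)                # joining a shrinks the uncertain region
    return M

# XOR-like toy table: neither 'p' nor 'q' alone determines the decision
table = {1: {'p': 0, 'q': 0}, 2: {'p': 0, 'q': 1},
         3: {'p': 1, 'q': 0}, 4: {'p': 1, 'q': 1}}
decision = {1: 0, 2: 1, 3: 1, 4: 0}
print(greedy_reduct([1, 2, 3, 4], table, ['p', 'q'], decision))  # ['p', 'q']
```

On the toy table both attributes are kept, because the uncertain region only vanishes for the joined pair, matching the condition of Theorem 2.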

4 Experimental Analysis

In this section, we demonstrate the performance of the algorithm given in the above section. The data set from Ref. [9] is a turbine fault diagnosis example with 11 continuous attributes and 21 objects. We select the 15 objects with IDs 2, 3, 4, 5, 6, 8, 9, 10, 12, 13, 14, 15, 17, 19, 20 as the training set and the remaining 6 objects, with IDs 1, 7, 11, 16, 18, 21, as the test set. We apply our algorithm to the training set for attribute reduction and verify the result.

  1. Step 1:

    Calculate \( {E_{i,0} },{E_{i,1} },E_{{i,q_{X} }} ,{POS_{{a_{i} }} (U)},i = 1,2, \ldots ,11 \); we then obtain Table 63.1, where \( {a_{i} } \) is the condition attribute, \( j = 0,1 \) corresponds to the equivalence classes whose decision value is 0 or 1, \( j = X \) corresponds to the equivalence classes whose decision value is uncertain, the data in Table 63.1 are the objects' IDs, and the IDs within one {} form an equivalence class.

    Table 63.1 The result of step 1
  2. Step 2:

    Sort the condition attributes in ascending order of the cardinality of \( E_{{i,q_{X} }} ,i = 1,2, \ldots ,11 \), acquiring \( Sort\_attr[] = [a_{9} ,a_{7} ,a_{4} ,a_{2} ,a_{1} ,a_{8} ,a_{11} ,a_{6} ,a_{3} ,a_{5} ,a_{10} ] \)

  3. Step 3:

    Let M = Sort_attr[1]

  4. Step 4:

    Since \( card(E_{{M,q_{X} }} ) = 2 \ne 0 \), we start the loop. When \( L = M \cup \{ a_{j} \} = \{ a_{9} ,a_{7} ,a_{2} \} \), we have \( card(E_{{L,q_{X} }} ) = 0 < card(E_{{M,q_{X} }} ) = 1 \), so the loop ends with \( M = \{ a_{9} ,a_{7} ,a_{2} \} \). Calculate the partition \( U/\{ a_{9} ,a_{7} ,a_{2} \} = \{ \{2\} ,\{3\} ,\{4\} ,\{5\} ,\{6\} ,\{8\} ,\{9\} ,\{10\} ,\{12\} ,\{13\} ,\{14\} ,\{15\} ,\{17\} ,\{19\} ,\{20\} \} \); so we achieve the reduct \( \{ a_{9} ,a_{7} ,a_{2} \} \).

After obtaining the attribute reduction of the decision table, we test the performance of the reduct \( \{ a_{9} ,a_{7} ,a_{2} \} \).

Experimental results show that the attribute reduction algorithm is simple and easy to implement. Compared with the algorithm of Ref. [6], it has a smaller number of cuts, a smaller number of rules, and the prediction results reach the required accuracy. The main work of the next step is to improve the accuracy of classification rules and eliminate the unclassified objects.

5 Conclusions

This paper offers a novel attribute reduction algorithm suited for information systems with discrete-valued or continuous-valued attributes. The presented algorithm performs attribute reduction on continuous data directly and achieves classification prediction on the dataset. We have demonstrated the effectiveness and efficiency of the algorithm in both theory and practice.