1 Introduction

With the development of information technology, the means of data acquisition have become increasingly diverse, while the cost of data storage keeps falling. Together, these trends make it possible to acquire and store huge amounts of data, which creates an urgent need for automatic data processing. In many real-world problems, data uncertainty is pervasive, and uncertain data have attracted growing attention in recent years. Rough set theory [1] is a powerful tool for dealing with uncertainty. Generally, the data processed by rough set theory are complete, accurate and atomic. However, the data in many real problems may be incomplete, inaccurate or non-atomic due to the limitations of acquisition means or the requirements of practical problems. How to process incomplete data, interval-valued data and set-valued data has therefore become an important issue.

A set-valued information system is one in which the value of an object on an attribute is not an exact value but a set containing all possible values. A number of studies have addressed the processing of set-valued information systems. Orlowska and Pawlak [3, 4] investigated set-valued information systems containing non-deterministic information and introduced the concept of a non-deterministic information system. Sakai et al. [5, 6] set up the theoretical foundations and algorithmic background for adapting rough set methods to the analysis of non-deterministic information systems. Yao [7, 8] proposed a number of set-based computation methods for set-based information systems. In addition, several new relations for set-valued information systems, together with corresponding attribute reduction methods, have been proposed. Zhang et al. [11, 12] proposed the concept of a set-valued information system and a similarity relation; in their work, a tolerance relation was defined and the largest tolerance class was used to divide the universe. Qian et al. [13] proposed a dominance relation for set-valued information systems and a corresponding attribute reduction method. Dai and Tian [14] gave a fuzzy relation that measures the degree of similarity between two set-valued objects. Wang [15] pointed out that the family of reducts defined by Dai need not be a subset of the family of reducts defined within the standard rough set model for set-valued information systems. Bao and Yang [16] proposed a \( \updelta \)-dominance relation and a corresponding attribute reduction approach. Two types of fuzzy rough approximations, together with two corresponding relative positive domain reductions, were proposed by Wei et al. [17]. Moreover, some researchers have transformed incomplete information systems into set-valued information systems to achieve reduction. Lipski [9, 10] discussed set-valued information systems from the viewpoint of incomplete information systems with missing values. As a special case of set values, incomplete information with missing values has also been dealt with probabilistically by some authors [18, 19].

How to define a binary relation for dealing with set-valued information systems by rough set theory is an important issue. Generally, there are two different semantic interpretations of set-valued data, namely, the conjunctive semantics and the disjunctive semantics [2]. Many definitions have been proposed for these two interpretations. However, we find that the existing ones are not appropriate for set-valued data under the disjunctive semantics. For this reason, we develop a new approach based on probability, which characterizes the relation between two set-valued objects more reasonably under the disjunctive semantics. We then propose an attribute reduction algorithm for set-valued decision information systems that keeps the positive domain unchanged, and conduct experiments to demonstrate the effectiveness of the algorithm.

The study is structured as follows. In Sect. 2, some basic concepts of set-valued information systems are reviewed. In Sect. 3, a new approach based on probability is proposed to measure the degree of similarity between two set-valued objects under the disjunctive semantics. In Sect. 4, we put forward an attribute reduction algorithm for set-valued decision information systems. In Sect. 5, the experimental results and their analysis are presented. Section 6 concludes the paper.

2 Preliminaries

In this section, some basic concepts of set-valued information systems are reviewed.

Definition 1.

[12]. An information system is defined as \( \left( {U,A,V,F} \right) \), where \( U \) is a non-empty finite set of objects called the universe. \( A \) is a non-empty finite set of attributes. \( V \) is a union of attribute domains \( \left( {V = \cup_{a \in A} V_{a} } \right) \), \( V_{a} \) is a set including all possible values for \( a \in A \). \( F:U \times A \to V \) is a function that assigns particular values from attribute domain to objects. For \( \forall a \in A \), \( x \in U \), \( F\left( {a,x} \right) \in V_{a} \), \( F\left( {a,x} \right) \) is the value of \( a \) for \( x \). If for any \( a \) and \( x \), \( F\left( {a,x} \right) \) is a single value, then the information system is called a single-valued information system. Otherwise, it is called a set-valued information system.

There are two different semantic interpretations for set-valued information systems [2]: the conjunctive semantics and the disjunctive semantics. Under the conjunctive semantics, a set value represents all the values of an object on an attribute, and all the values in the set are true. Under the disjunctive semantics, a set value represents all possible values of an object on an attribute, but only one value in the set is true. For example, consider \( x \in U \) and \( a \in A \), where \( a \) denotes the languages that \( x \) can speak, and let \( a\left( x \right) = \) {English, Chinese, French}. If \( a\left( x \right) \) is interpreted conjunctively, it means \( x \) can speak English, Chinese, and French. If \( a\left( x \right) \) is interpreted disjunctively, it means \( x \) can speak exactly one of English, Chinese, and French. In this paper, we mainly study set-valued information systems under the disjunctive semantics; in the following, a set-valued information system is understood to be under the disjunctive semantics unless otherwise specified.

To characterize the relation between two objects in a set-valued information system, many relations have been developed. Here we cite two important ones.

Definition 2.

[10]. Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system. For \( \forall b\,\in\,A \), a tolerance relation can be defined as follows:

$$ R_{b}^{ \cap } = \left\{ {\left( {x,y} \right) \in U \times U\,|\,b\left( x \right) \cap b\left( y \right) \ne \varnothing } \right\}. $$
(1)

Then, for \( B\,\subseteq\,A \), a tolerance relation can be defined as follows:

$$ R_{B}^{ \cap } = \left\{ {\left( {x,y} \right) \in U \times U\,|\,\forall b \in B,b\left( x \right) \cap b\left( y \right) \ne \varnothing } \right\} = \mathop \cap \limits_{b \in B} R_{b}^{ \cap } . $$
(2)

It is obvious that \( R_{B}^{ \cap } \) is reflexive and symmetric, but not necessarily transitive.
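
To make the tolerance relation concrete, here is a minimal Python sketch. Representing an object's description as a dictionary from attribute names to Python sets is our own convention for illustration, not notation from the paper.

```python
def tolerance_related(x_vals, y_vals, B):
    """(x, y) is in R_B^cap iff b(x) and b(y) intersect for every b in B (Eqs. 1-2)."""
    return all(x_vals[b] & y_vals[b] for b in B)

# Two hypothetical objects described on attributes a1, a2
x = {"a1": {"English", "Chinese"}, "a2": {0, 1}}
y = {"a1": {"Chinese", "French"}, "a2": {1, 2}}
print(tolerance_related(x, y, ["a1", "a2"]))  # True: value sets intersect on both attributes
```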

Dai and Tian [14] defined a fuzzy relation, which is then used to measure the similarity between set-valued objects.

Definition 3.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system. For \( \forall b \in A \), a fuzzy relation \( \widetilde{{R_{b} }} \) can be defined as follows:

$$ \mu_{{\widetilde{{R_{b} }}}} \left( {x,y} \right) = \frac{{\left| {b\left( x \right) \cap b\left( y \right)} \right|}}{{\left| {b\left( x \right) \cup b\left( y \right)} \right|}}. $$
(3)

Thus, for \( B\,\subseteq\,A \), a fuzzy relation can be defined as follows:

$$ \mu_{{\widetilde{{R_{B} }}}} \left( {x,y} \right) = \mathop {\inf }\limits_{b \in B} \mu_{{\widetilde{{R_{b} }}}} \left( {x,y} \right). $$
(4)
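
Under the same dictionary representation as above, a minimal sketch of the fuzzy relation of Eqs. (3) and (4):

```python
def fuzzy_similarity(x_vals, y_vals, B):
    """Eqs. (3)-(4): min over b in B of |b(x) & b(y)| / |b(x) | b(y)| (Jaccard-style ratio)."""
    return min(len(x_vals[b] & y_vals[b]) / len(x_vals[b] | y_vals[b]) for b in B)
```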

The above two definitions are not reasonable in some real problems. For example, let \( a\left( x \right) = \) {English, Chinese, French}, \( a\left( y \right) = \) {German, Japanese, English, Chinese}, and \( a\left( z \right) = \) {English, Chinese, French}. According to Definition 2, \( \left( {y,z} \right) \in R_{a}^{ \cap } \) because \( a\left( y \right) \cap a\left( z \right) = \) {English, Chinese} is not empty. That is to say, \( y \) and \( z \) are indiscernible with respect to the tolerance relation; in other words, \( y \) and \( z \) speak the same language. However, they may speak different languages: for example, \( y \) may actually speak only English and \( z \) only French. By Definition 3, we know that \( \mu_{{\widetilde{{R_{a} }}}} \left( {x,z} \right) = 1 \). Thus, \( x \) and \( z \) are indiscernible with respect to the fuzzy relation, which means that \( x \) and \( z \) speak the same language. However, they too may speak different languages: for example, \( x \) may actually speak only English and \( z \) only French.

From this example, we find that, under the disjunctive semantics, two objects satisfying the existing relations are merely possibly similar. Therefore, a new relation that can properly describe the relation between two set-valued objects under the disjunctive semantics should be defined.

3 A New Similarity Relation Based on Probability

As analyzed in Sect. 2, the existing methods may yield unreasonable results under the disjunctive semantics. To solve this problem, a similarity relation based on probability is proposed in this section.

Definition 4.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system. For \( \forall b \in A \), the similarity between \( x \) and \( y \) on \( b \) is defined as follows.

$$ \mu_{{R_{b} }} \left( {x,y} \right) = \left\{ {\begin{array}{*{20}l} {\frac{{\left| {b\left( x \right)\,\cap\,b\left( y \right)} \right|}}{{\left| {b\left( x \right)} \right|\,*\,\left| {b\left( y \right)} \right|}},} \hfill & {x \ne y} \hfill \\ {1,} \hfill & {x = y} \hfill \\ \end{array} } \right.. $$
(5)

From the viewpoint of probability, \( \mu_{{R_{b} }} \left( {x,y} \right) \) is the probability that \( x \) and \( y \) take the same value on \( b \), assuming that each object's true value is drawn uniformly and independently from its value set. For \( B\,\subseteq\,A \), the similarity relation \( R_{B} \) between \( x \) and \( y \) on \( B \) can be defined as follows:

$$ \mu_{{R_{B} }} \left( {x,y} \right) = \prod\limits_{b \in B} {\mu_{{R_{b} }} \left( {x,y} \right)} . $$
(6)
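
The following sketch implements Eqs. (5) and (6) and reproduces the language example from Sect. 2; as above, the dictionary representation is an assumption made for illustration.

```python
def mu_b(bx, by):
    """Eq. (5) for x != y: |b(x) & b(y)| / (|b(x)| * |b(y)|)."""
    return len(bx & by) / (len(bx) * len(by))

def mu_B(x_vals, y_vals, B):
    """Eq. (6): product of the per-attribute similarities over B."""
    prod = 1.0
    for b in B:
        prod *= mu_b(x_vals[b], y_vals[b])
    return prod

# The language example: a(x), a(y), a(z) as in Sect. 2
x = {"a": {"English", "Chinese", "French"}}
y = {"a": {"German", "Japanese", "English", "Chinese"}}
z = {"a": {"English", "Chinese", "French"}}
print(mu_B(x, y, ["a"]))  # 2 / (3 * 4) = 1/6
print(mu_B(x, z, ["a"]))  # 3 / (3 * 3) = 1/3
```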

There are some important properties of the similarity relation defined above:

(1) \( R_{B} \) is reflexive.

Proof.

Since \( \mu_{{R_{b} }} \left( {x,x} \right) = 1 \) for every \( b \in B \), we have \( \mu_{{R_{B} }} \left( {x,x} \right) = 1 \); hence \( R_{B} \) is reflexive.

(2) \( R_{B} \) is symmetric.

Proof.

Since \( \mu_{{R_{b} }} \left( {y,x} \right) = \frac{{\left| {b\left( y \right)\,\cap\,b\left( x \right)} \right|}}{{\left| {b\left( y \right)} \right|\,*\,\left| {b\left( x \right)} \right|}} = \frac{{\left| {b\left( x \right)\,\cap\,b\left( y \right)} \right|}}{{\left| {b\left( x \right)} \right|\,*\,\left| {b\left( y \right)} \right|}} = \mu_{{R_{b} }} \left( {x,y} \right) \), we have \( \mu_{{R_{B} }} \left( {y,x} \right) = \mu_{{R_{B} }} \left( {x,y} \right) \); hence \( R_{B} \) is symmetric.

For the example used above, according to Definition 4, the similarity between \( x \) and \( y \) is \( \mu_{{R_{a} }} \left( {x,y} \right) = \frac{1}{6} \), and the similarity between \( x \) and \( z \) is \( \mu_{{R_{a} }} \left( {x,z} \right) = \frac{1}{3} \). That is to say, the probability that \( x \) and \( y \) speak the same language is \( \frac{1}{6} \), and the probability that \( x \) and \( z \) speak the same language is \( \frac{1}{3} \). Compared with the existing definitions, the proposed definition gives a more reasonable and meaningful description of the relation between two objects with set values.

Definition 5.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system. For \( B\,\subseteq\,A \) and \( x \in U \), the \( \delta \) similarity class of \( x \) with respect to \( B \) can be defined as follows:

$$ \delta_{B} \left( x \right) = \left\{ {y \in U|\mu_{{R_{B} }} \left( {x,y} \right) \ge \delta } \right\}\left( {0 \le \delta \le 1} \right), $$
(7)

where \( \delta \) is a threshold. We can use \( \delta \) to control the size of the information granules generated by \( B \): the larger \( \delta \) is, the smaller the information granules generated by \( B \).
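
Continuing the running sketch (reusing mu_B from above), the \( \delta \) similarity class of Definition 5 can be written as follows. Here U is assumed to be a dictionary mapping object names to their attribute-value dictionaries, and the y == x branch realizes the x = y case of Eq. (5).

```python
def delta_class(U, x, B, delta):
    """Eq. (7): the set of y in U with mu_{R_B}(x, y) >= delta."""
    return {y for y in U if y == x or mu_B(U[x], U[y], B) >= delta}
```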

Theorem 1.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system, \( B \) and \( B^{\prime } \) be two subsets of \( A \), \( \delta \) be a threshold. For \( x \in U \), if \( B \supseteq B^{\prime } \), then \( \delta_{B} \left( x \right)\,\subseteq\,\delta_{{B^{\prime}}} \left( x \right) \).

Proof.

Let \( B^{\prime } = \left\{ {b_{1} ,b_{2} , \ldots ,b_{n} } \right\} \) and \( B = B^{\prime } \cup \left\{ {b_{n + 1} ,b_{n + 2} , \ldots ,b_{n + m} } \right\} \). For any \( x,y \in U \):

$$ \begin{aligned} \mu_{{R_{{B^{\prime}}} }} \left( {x,y} \right) & = \prod\nolimits_{{b \in B^{\prime}}} {\mu_{{R_{b} }} \left( {x,y} \right)} = \mu_{{R_{{b_{1} }} }} \left( {x,y} \right) * \mu_{{R_{{b_{2} }} }} \left( {x,y} \right) * \ldots * \mu_{{R_{{b_{n} }} }} \left( {x,y} \right) \\ \mu_{{R_{B} }} \left( {x,y} \right) & = \prod\nolimits_{b \in B} {\mu_{{R_{b} }} \left( {x,y} \right)} \\ & = \mu_{{R_{{b_{1} }} }} \left( {x,y} \right) * \ldots * \mu_{{R_{{b_{n} }} }} \left( {x,y} \right) * \mu_{{R_{{b_{n + 1} }} }} \left( {x,y} \right) * \ldots * \mu_{{R_{{b_{n + m} }} }} \left( {x,y} \right) \\ & = \mu_{{R_{{B^{\prime}}} }} \left( {x,y} \right) * \mu_{{R_{{b_{n + 1} }} }} \left( {x,y} \right) * \ldots * \mu_{{R_{{b_{n + m} }} }} \left( {x,y} \right) \\ \end{aligned} $$

Because \( \mu_{{R_{b} }} \left( {x,y} \right) \in \left[ {0,1} \right] \) for every \( b \in A \), it follows that \( \mu_{{R_{B} }} \left( {x,y} \right) \le \mu_{{R_{{B^{\prime}}} }} \left( {x,y} \right) \).

Suppose \( y \in \delta_{B} \left( x \right) \). Then \( \mu_{{R_{B} }} \left( {x,y} \right) \ge \delta \), so \( \delta \le \mu_{{R_{B} }} \left( {x,y} \right) \le \mu_{{R_{{B^{\prime}}} }} \left( {x,y} \right) \), that is, \( y \in \delta_{{B^{\prime } }} \left( x \right) \). Therefore, \( \delta_{B} \left( x \right)\,\subseteq\,\delta_{{B^{\prime}}} \left( x \right) \).

Theorem 1 shows that the number of elements in a similarity class changes with the condition attribute set: the smaller the condition attribute set is, the more elements the similarity class contains.

Theorem 2.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system, \( B \) be a subset of \( A \), \( \delta_{1} \) and \( \delta_{2} \) be two thresholds. For \( x \in U \), if \( \delta_{1} \le \delta_{2} \), then \( \delta_{1B} \left( x \right) \supseteq \delta_{2B} \left( x \right) \).

Proof.

Suppose \( y \in \delta_{2B} \left( x \right) \), that is, \( \mu_{{R_{B} }} \left( {x,y} \right) \ge \delta_{2} \). Because \( \delta_{1} \le \delta_{2} \), there must be \( \mu_{{R_{B} }} \left( {x,y} \right) \ge \delta_{1} \), so \( y \in \delta_{1B} \left( x \right) \). Therefore, \( \delta_{1B} \left( x \right) \supseteq \delta_{2B} \left( x \right) \).

Theorem 2 indicates that we can control the number of elements in a similarity class through the threshold: the smaller the threshold is, the more elements the similarity class contains.

Definition 6.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system, \( B \) be a subset of \( A \). Given an arbitrary set \( X\,\subseteq\,U \), the \( \delta \)-upper approximation \( \overline{{B_{\delta } }} \left( X \right) \) and the \( \delta \)-lower approximation \( \underline{{B_{\delta } }} \left( X \right) \) of \( X \) with respect to \( B \) are:

$$ \overline{{B_{\delta } }} \left( X \right) = \left\{ {x \in U\,|\,\delta_{B} \left( x \right) \cap X \ne \varnothing } \right\}, $$
(8)
$$ \underline{{B_{\delta } }} \left( X \right) = \left\{ {x \in U\,|\,\delta_{B} \left( x \right)\,\subseteq\,X} \right\}, $$
(9)

where \( \underline{{B_{\delta } }} \left( X \right) \) contains all the objects that can be classified into \( X \) with certainty, and \( \overline{{B_{\delta } }} \left( X \right) \) contains all the objects that can possibly be classified into \( X \).
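
With the similarity classes in hand, the \( \delta \)-approximations of Definition 6 follow directly. This continues the sketch above, with X given as a set of object names (again our assumed representation).

```python
def upper_approx(U, X, B, delta):
    """Eq. (8): objects whose delta similarity class intersects X."""
    return {x for x in U if delta_class(U, x, B, delta) & X}

def lower_approx(U, X, B, delta):
    """Eq. (9): objects whose delta similarity class is contained in X."""
    return {x for x in U if delta_class(U, x, B, delta) <= X}
```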

Theorem 3.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system, \( B \) and \( B^{\prime } \) be two subsets of \( A \), \( \delta \) be a threshold. For \( X\,\subseteq\,U \), if \( B \supseteq B^{\prime } \), then \( \overline{{B_{\delta } }} \left( X \right)\,\subseteq\,\overline{{B^{\prime }_{\delta } }} \left( X \right) \), and \( \underline{{B_{\delta } }} \left( X \right)\,\supseteq\,\underline{{B^{\prime }_{\delta } }} \left( X \right) \).

Proof.

The proof comes directly from Theorem 1, and hence it is omitted here.

Theorem 3 shows that the sizes of the upper and lower approximation sets change with the condition attribute set: the smaller the condition attribute set is, the more elements the \( \delta \)-upper approximation contains and the fewer the \( \delta \)-lower approximation contains.

Theorem 4.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system, \( B \) be a subset of \( A \), \( \delta_{1} \) and \( \delta_{2} \) be two thresholds. For \( X\,\subseteq\,U \), if \( \delta_{1}\,\le\,\delta_{2} \), then \( \overline{{B_{{\delta_{1} }} }} \left( X \right)\,\supseteq\,\overline{{B_{{\delta_{2} }} }} \left( X \right) \), and \( \underline{{B_{{\delta_{1} }} }} \left( X \right) \subseteq \underline{{B_{{\delta_{2} }} }} \left( X \right) \).

Proof.

The proof comes directly from Theorem 2, and hence it is omitted here.

Theorem 4 indicates that we can control the elements in the upper and lower approximation sets by changing the threshold: the smaller the threshold is, the more elements the \( \delta \)-upper approximation contains and the fewer the \( \delta \)-lower approximation contains.

4 Attribute Reduction of Set-Valued Decision Information System

Generally, most decision information systems contain redundant attributes. These redundant attributes waste storage space and reduce the efficiency of data processing on the one hand, and may interfere with making correct and concise decisions on the other. Next, we discuss how to reduce a set-valued decision information system.

Definition 7.

Let \( S = \left( {U,C \cup D,V,F} \right) \) be a set-valued decision information system, where \( U \) is a non-empty finite set of objects called the universe, \( C \) is a set of condition attributes, \( D \) is a decision attribute, and \( F \) is a function that assigns particular values from the attribute domains to objects. \( {U \mathord{\left/ {\vphantom {U D}} \right. \kern-0pt} D} = \left\{ {d_{1} ,d_{2} , \ldots ,d_{m} } \right\} \) is a partition of \( U \). For \( B\,\subseteq\,C \), the positive domain of \( D \) with respect to \( B \) is defined as:

$$ POS_{B} (D) = \bigcup\limits_{i = 1}^{m} {\underline{{B_{\delta } }} (d_{i} ),} $$
(10)

and the negative domain of \( D \) with respect to \( B \) is:

$$ NEG_{B} \left( D \right) = U - POS_{B} \left( D \right), $$
(11)

where \( POS_{B} \left( D \right) \) is the set of objects in \( U \) that can be classified into the decision classes of \( D \) definitely, and \( NEG_{B} \left( D \right) \) is the set of objects in \( U \) that cannot be classified definitely.
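
In the same running sketch, the positive and negative domains of Definition 7 can be computed from the \( \delta \)-lower approximations of the decision classes. Here D_classes stands for the partition \( U/D \), given as a list of sets of object names (an assumed input format).

```python
def positive_domain(U, B, D_classes, delta):
    """Eq. (10): POS_B(D) is the union of the delta-lower approximations of the d_i."""
    pos = set()
    for d in D_classes:
        pos |= lower_approx(U, d, B, delta)
    return pos

def negative_domain(U, B, D_classes, delta):
    """Eq. (11): NEG_B(D) = U - POS_B(D)."""
    return set(U) - positive_domain(U, B, D_classes, delta)
```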

Theorem 5.

Let \( S = \left( {U,C \cup D,V,F} \right) \) be a set-valued decision information system, \( B \) and \( B^{\prime } \) be two subsets of \( C \). If \( B^{\prime }\,\subseteq\,B \), then \( POS_{{B^{\prime}}} \left( D \right)\,\subseteq\,POS_{B} \left( D \right) \).

Proof.

If \( x \in POS_{{B^{\prime}}} \left( D \right) \), there exists \( d_{i} \in {U \mathord{\left/ {\vphantom {U D}} \right. \kern-0pt} D} \) such that \( \delta_{{B^{\prime}}} \left( x \right)\,\subseteq\,d_{i} \). According to Theorem 1, we have \( \delta_{B} \left( x \right)\,\subseteq\,\delta_{{B^{\prime}}} \left( x \right) \). So \( \delta_{B} \left( x \right)\,\subseteq\,d_{i} \), that is, \( x \in POS_{B} \left( D \right) \). Thus, \( POS_{{B^{\prime}}} \left( D \right)\,\subseteq\,POS_{B} \left( D \right) \).

Theorem 5 shows that the positive domain of \( D \) with respect to an attribute subset is a subset of the positive domain of \( D \) with respect to all the attributes. That is, if some attributes are deleted from a set-valued decision information system, the positive domain may shrink or remain unchanged. If the positive domain remains unchanged after an attribute is deleted, that attribute is redundant for keeping the positive domain; in other words, removing redundant attributes does not affect the correct classification ability of the system.

Definition 8.

Let \( S = \left( {U,C \cup D,V,F} \right) \) be a set-valued decision information system. For \( \forall a \in C \), \( a \) is reducible in \( C \) with respect to \( D \) if \( POS_{{C - \left\{ a \right\}}} \left( D \right) = POS_{C} \left( D \right) \). Otherwise, \( a \) is irreducible in \( C \) with respect to \( D \).

Definition 9.

Let \( S = \left( {U,C \cup D,V,F} \right) \) be a set-valued decision information system, and \( B \) be a subset of \( C \). \( B \) is a reduction of \( C \) with respect to \( D \) if:

(1) \( POS_{B} \left( D \right) = POS_{C} \left( D \right) \), and

(2) \( \forall a \in B\,,\,POS_{{B{ - }\left\{ a \right\}}} \left( D \right) \ne POS_{B} \left( D \right). \)

According to Definitions 8 and 9, it is obvious that a reduction \( B \) of \( C \) has the same classification ability as \( C \), and that deleting any attribute from \( B \) decreases the correct classification ability of the system. To obtain a reduction of a set-valued decision information system, we develop the following algorithm.

Algorithm 1. Attribute reduction based on keeping the positive domain (presented as a figure in the original).

For a given set-valued decision information system, Algorithm 1 checks each attribute against the conditions stated in Definition 9. If the conditions are satisfied, the attribute is deleted; otherwise it is retained. Finally, we obtain a reduction of the set-valued decision information system.
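
Since Algorithm 1 appears only as a figure, the sketch below is our reconstruction of the procedure described in the text: scan the condition attributes and delete an attribute whenever the positive domain is unchanged without it (Definition 8). The scan order is an assumption on our part; as Example 1 suggests, different choices of \( \delta \) (and, in general, different scan orders) can lead to different reducts.

```python
def reduct(U, C, D_classes, delta):
    """Positive-domain-preserving attribute reduction (a sketch of Algorithm 1)."""
    target = positive_domain(U, list(C), D_classes, delta)
    B = list(C)
    for a in list(C):
        trial = [b for b in B if b != a]
        # a is reducible if removing it keeps the positive domain (Definition 8)
        if trial and positive_domain(U, trial, D_classes, delta) == target:
            B = trial
    return B
```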

The following example illustrates how to form a reduction using Algorithm 1.

Example 1.

For the set-valued decision information system \( S \) shown in Table 1, we can compute a reduction of \( S \) as follows.

Table 1. A set-valued decision information system

Let \( \delta = {1 \mathord{\left/ {\vphantom {1 {16}}} \right. \kern-0pt} {16}} \); we have:

$$ \begin{aligned} & \delta_{C} \left( {x_{1} } \right) = \left\{ {x_{1} } \right\}\,,\,\delta_{C} \left( {x_{2} } \right) = \left\{ {x_{2} ,x_{8} } \right\} \\ & \delta_{C} \left( {x_{3} } \right) = \left\{ {x_{3} ,x_{4} ,x_{8} } \right\}\,,\,\delta_{C} \left( {x_{4} } \right) = \left\{ {x_{3} ,x_{4} } \right\} \\ & \delta_{C} \left( {x_{5} } \right) = \left\{ {x_{5} ,x_{6} } \right\}\,,\,\delta_{C} \left( {x_{6} } \right) = \left\{ {x_{5} ,x_{6} } \right\} \\ & \delta_{C} \left( {x_{7} } \right) = \left\{ {x_{7} } \right\}\,,\,\delta_{C} \left( {x_{8} } \right) = \left\{ {x_{2} ,x_{3} ,x_{8} } \right\} \\ & U/D = \left\{ {\left\{ {x_{1} ,x_{7} } \right\},\left\{ {x_{3} ,x_{4} } \right\},\left\{ {x_{2} ,x_{5} ,x_{6} ,x_{8} } \right\}} \right\} \\ & POS_{C} \left( D \right) = \left\{ {x_{1} ,x_{2} ,x_{3} ,x_{4} ,x_{5} ,x_{6} ,x_{7} ,x_{8} } \right\} \\ \end{aligned} $$

After \( a_{1} \) is deleted, we have:

$$ \begin{aligned} & \delta_{{C{ - }a_{1} }} \left( {x_{1} } \right) = \left\{ {x_{1} } \right\}\,,\,\delta_{{C{ - }a_{1} }} \left( {x_{2} } \right) = \left\{ {x_{2} ,x_{3} ,x_{8} } \right\} \\ & \delta_{{C{ - }a_{1} }} \left( {x_{3} } \right) = \left\{ {x_{2} ,x_{3} ,x_{4} ,x_{8} } \right\}\,,\,\delta_{{C{ - }a_{1} }} \left( {x_{4} } \right) = \left\{ {x_{3} ,x_{4} ,x_{7} } \right\} \\ & \delta_{{C{ - }a_{1} }} \left( {x_{5} } \right) = \left\{ {x_{5} ,x_{6} } \right\}\,,\,\delta_{{C{ - }a_{1} }} \left( {x_{6} } \right) = \left\{ {x_{5} ,x_{6} } \right\} \\ & \delta_{{C{ - }a_{1} }} \left( {x_{7} } \right) = \left\{ {x_{4} ,x_{7} } \right\}\,,\,\delta_{{C{ - }a_{1} }} \left( {x_{8} } \right) = \left\{ {x_{2} ,x_{3} ,x_{8} } \right\} \\ & POS_{{C{ - }a_{1} }} \left( D \right) = \left\{ {x_{1} ,x_{5} ,x_{6} } \right\} \\ \end{aligned} $$

It can be seen that the positive domain changes after \( a_{1} \) is deleted, so \( a_{1} \) is irreducible. Likewise, \( a_{2} \), \( a_{3} \), \( a_{4} \) and \( a_{5} \) are all irreducible, so the reduction is \( \left\{ {a_{1} ,a_{2} ,a_{3} ,a_{4} ,a_{5} } \right\} \). For another value of \( \delta \), it can be found that \( a_{1} \), \( a_{2} \) and \( a_{5} \) are irreducible while \( a_{3} \) and \( a_{4} \) are reducible, so the reduction becomes \( \left\{ {a_{1} ,a_{2} ,a_{5} } \right\} \). According to the above analysis, we can get different reducts by changing the value of \( \delta \).

5 Experimental Results

The experiments in this section demonstrate the effectiveness of the method proposed in this paper. Five data sets are used in the experiments; their details are shown in Table 2.

Table 2. Data description

In the experiments, we first reduce the five data sets using the tolerance relation, the fuzzy relation and the similarity relation proposed in this paper, respectively; then J48 and SMO are used to compare the classification accuracy of the reduction results. The reduction results obtained by the different relations are shown in Table 3, and the classification accuracy comparisons are shown in Figs. 1 and 2.

Table 3. Attribute numbers after reduction
Fig. 1. Comparison of classification accuracy by J48

Fig. 2. Comparison of classification accuracy by SMO

Comparing Tables 2 and 3, only one or two attributes were removed by attribute reduction on vote and breast-cancer, and on these two data sets the classification accuracy obtained by the proposed method shows no obvious difference from that of the existing methods. On the other three data sets, the attribute reduction results are significant. By comparison, the method proposed in this paper retains more attributes than the existing methods. On the other hand, the classification accuracy obtained by the proposed method is almost the same as that of the original data, and is obviously higher than that of the other methods. From the experimental results, we can conclude that the proposed method retains more attributes than the existing methods, but it ensures that the classification accuracy of the system does not change significantly. Although the existing methods remove more attributes, the classification accuracy of the system also decreases greatly; that means the existing methods remove some useful attributes, which is unacceptable in some practical problems.

6 Conclusions

Many different methods have been proposed to deal with set-valued information systems. However, they are not appropriate for set-valued data under the disjunctive semantics. To address this problem, a similarity relation based on probability was defined to describe the relation between two set-valued objects, and a corresponding attribute reduction algorithm based on keeping the positive domain was proposed. Finally, a group of experiments was conducted to demonstrate the effectiveness of the proposed methods. The experimental results indicate that the existing methods can obtain a smaller reduction, but they may remove some useful attributes and thus reduce the classification accuracy of the system significantly. The proposed method retains more attributes, but these attributes are all useful, and the resulting classification accuracy stays close to that of the original data. Because the threshold has an important influence on the classification accuracy of the proposed method, how to choose an appropriate threshold will be our future work.