1 Introduction

With the development of information technology, the means of data acquisition have become increasingly diverse, while the cost of data storage keeps falling. Together, these trends make it possible to acquire and store huge amounts of data, which creates an urgent need for automatic data processing. In many real-world problems, data uncertainty is pervasive, and uncertain data have attracted growing attention in recent years. Rough set theory [1] is a powerful tool for dealing with uncertainty. Generally, the data processed by rough set theory are complete, accurate and atomic. However, the data in many real problems may be incomplete, inaccurate or non-atomic due to the limitations of acquisition means or the requirements of practical problems. How to process incomplete data, interval-valued data and set-valued data has therefore become an important issue.

A set-valued information system is one in which the value of an object on an attribute is not an exact value but a set containing all possible values. A number of studies have addressed the processing of set-valued information systems. Orlowska and Pawlak [3, 4] investigated set-valued information systems containing non-deterministic information and introduced the concept of a non-deterministic information system. Sakai et al. [5, 6] set up the theoretical foundations and algorithmic background for adapting rough set methods to the analysis of non-deterministic information systems. Yao [7, 8] proposed a number of set-based computation methods for set-based information systems. In addition, several new relations for set-valued information systems, together with corresponding attribute reduction methods, have been proposed. Zhang et al. [11, 12] proposed the concept of a set-valued information system and a similarity relation; in their work, a tolerance relation was defined and the largest tolerance class was used to divide the universe. Qian et al. [13] proposed a dominance relation for set-valued information systems and a corresponding attribute reduction method. Dai and Tian [14] gave a fuzzy relation that measures the degree of similarity between two set-valued objects. Wang [15] pointed out that the family of reducts defined by Dai need not be a subset of the family of reducts defined within the standard rough set model for set-valued information systems. Bao and Yang [16] proposed a \( \updelta \)-dominance relation and a corresponding attribute reduction approach. Two types of fuzzy rough approximations, together with two corresponding relative positive domain reductions, were proposed by Wei et al. [17]. Moreover, some researchers have transformed incomplete information systems into set-valued information systems to achieve reduction. Lipski [9, 10] discussed set-valued information systems from the viewpoint of incomplete information systems with missing values. As a special case of set values, incomplete information with missing values has also been dealt with probabilistically by some authors [18, 19].

How to define a binary relation for dealing with set-valued information systems by rough set theory is an important issue. Generally, there are two different semantic interpretations of set-valued data, namely, the conjunctive semantics and the disjunctive semantics [2]. Many definitions have been proposed for these two interpretations. However, we find that the existing ones are not appropriate for set-valued data under the disjunctive semantics. For this reason, we develop a new approach based on probability, which characterizes the relation between two set-valued objects more reasonably under the disjunctive semantics. We then propose an attribute reduction algorithm for set-valued decision information systems that keeps the positive domain unchanged, and conduct experiments to demonstrate the effectiveness of the algorithm.

The study is structured as follows. In Sect. 2, some basic concepts of set-valued information systems are reviewed. In Sect. 3, a new approach based on probability is proposed to measure the degree of similarity between two set-valued objects under the disjunctive semantics. In Sect. 4, we put forward an attribute reduction algorithm for set-valued decision information systems. In Sect. 5, the experimental results and their analysis are presented. Section 6 concludes the paper.

2 Preliminaries

In this section, some basic concepts of set-valued information systems are reviewed.

Definition 1.

[12]. An information system is defined as \( \left( {U,A,V,F} \right) \), where \( U \) is a non-empty finite set of objects called the universe. \( A \) is a non-empty finite set of attributes. \( V \) is a union of attribute domains \( \left( {V = \cup_{a \in A} V_{a} } \right) \), \( V_{a} \) is a set including all possible values for \( a \in A \). \( F:U \times A \to V \) is a function that assigns particular values from attribute domain to objects. For \( \forall a \in A \), \( x \in U \), \( F\left( {a,x} \right) \in V_{a} \), \( F\left( {a,x} \right) \) is the value of \( a \) for \( x \). If for any \( a \) and \( x \), \( F\left( {a,x} \right) \) is a single value, then the information system is called a single-valued information system. Otherwise, it is called a set-valued information system.

There are two different semantic interpretations for set-valued information systems [2]: the conjunctive semantics and the disjunctive semantics. Under the conjunctive semantics, a set value represents all the values of an object on an attribute, and all the values in the set are true. Under the disjunctive semantics, a set value represents all possible values of an object on an attribute, but only one value in the set is true. For example, consider \( x \in U \) and \( a \in A \), where \( a \) denotes the languages that \( x \) can speak, and let \( a\left( x \right) = \) {English, Chinese, French}. If \( a\left( x \right) \) is interpreted conjunctively, it means \( x \) can speak English, Chinese, and French. If \( a\left( x \right) \) is interpreted disjunctively, it means \( x \) can speak exactly one of English, Chinese, and French. In this paper, we mainly study set-valued information systems under the disjunctive semantics; in the following, a set-valued information system is understood to be under the disjunctive semantics unless otherwise specified.

To characterize the relation between two objects in a set-valued information system, many relations have been developed. Here we cite two important ones.

Definition 2.

[10]. Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system. For \( \forall b\,\in\,A \), a tolerance relation can be defined as follows:

$$ R_{b}^{ \cap } = \left\{ {\left( {x,y} \right) \in U \times U\,|\,b\left( x \right) \cap b\left( y \right) \ne \varnothing } \right\}. $$
(1)

Then, for \( B\,\subseteq\,A \), a tolerance relation can be defined as follows:

$$ R_{B}^{ \cap } = \left\{ {\left( {x,y} \right) \in U \times U\,|\,\forall b \in B,b\left( x \right) \cap b\left( y \right) \ne \varnothing } \right\} = \mathop \cap \limits_{b \in B} R_{b}^{ \cap } . $$
(2)

It is obvious that \( R_{B}^{ \cap } \) is reflexive and symmetric, but not necessarily transitive.
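
To make the tolerance relation concrete, here is a minimal Python sketch. Representing an object's description as a dictionary from attribute names to Python sets is our own convention for illustration, not notation from the paper.

```python
def tolerance_related(x_vals, y_vals, B):
    """(x, y) is in R_B^cap iff b(x) and b(y) intersect for every b in B (Eqs. 1-2)."""
    return all(x_vals[b] & y_vals[b] for b in B)

# Two hypothetical objects described on attributes a1, a2
x = {"a1": {"English", "Chinese"}, "a2": {0, 1}}
y = {"a1": {"Chinese", "French"}, "a2": {1, 2}}
print(tolerance_related(x, y, ["a1", "a2"]))  # True: value sets intersect on both attributes
```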

Dai and Tian [14] defined a fuzzy relation, which is then used to measure the similarity between set-valued objects.

Definition 3.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system. For \( \forall b \in A \), a fuzzy relation \( \widetilde{{R_{b} }} \) can be defined as follows:

$$ \mu_{{\widetilde{{R_{b} }}}} \left( {x,y} \right) = \frac{{\left| {b\left( x \right) \cap b\left( y \right)} \right|}}{{\left| {b\left( x \right) \cup b\left( y \right)} \right|}}. $$
(3)

Thus, for \( B\,\subseteq\,A \), a fuzzy relation can be defined as follows:

$$ \mu_{{\widetilde{{R_{B} }}}} \left( {x,y} \right) = \mathop {\inf }\limits_{b \in B} \mu_{{\widetilde{{R_{b} }}}} \left( {x,y} \right). $$
(4)
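
Under the same dictionary representation as above, a minimal sketch of the fuzzy relation of Eqs. (3) and (4):

```python
def fuzzy_similarity(x_vals, y_vals, B):
    """Eqs. (3)-(4): min over b in B of |b(x) & b(y)| / |b(x) | b(y)| (Jaccard-style ratio)."""
    return min(len(x_vals[b] & y_vals[b]) / len(x_vals[b] | y_vals[b]) for b in B)
```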

The above two definitions are not reasonable in some real problems. For example, let \( a\left( x \right) = \) {English, Chinese, French}, \( a\left( y \right) = \) {German, Japanese, English, Chinese}, and \( a\left( z \right) = \) {English, Chinese, French}. According to Definition 2, \( \left( {y,z} \right) \in R_{a}^{ \cap } \) because \( a\left( y \right) \cap a\left( z \right) = \) {English, Chinese} is not empty. That is to say, \( y \) and \( z \) are indiscernible with respect to the tolerance relation; in other words, \( y \) and \( z \) speak the same language. However, they may speak different languages: for example, \( y \) may actually speak only English and \( z \) only French. By Definition 3, we know that \( \mu_{{\widetilde{{R_{a} }}}} \left( {x,z} \right) = 1 \). Thus, \( x \) and \( z \) are indiscernible with respect to the fuzzy relation, which means that \( x \) and \( z \) speak the same language. However, they too may speak different languages: for example, \( x \) may actually speak only English and \( z \) only French.

From this example, we find that, under the disjunctive semantics, two objects satisfying the existing relations are merely possibly similar. Therefore, a new relation that can properly describe the relation between two set-valued objects under the disjunctive semantics should be defined.

3 A New Similarity Relation Based on Probability

As analyzed in Sect. 2, the existing methods may yield unreasonable results under the disjunctive semantics. To solve this problem, a similarity relation based on probability is proposed in this section.

Definition 4.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system. For \( \forall b \in A \), the similarity between \( x \) and \( y \) on \( b \) is defined as follows.

$$ \mu_{{R_{b} }} \left( {x,y} \right) = \left\{ {\begin{array}{*{20}l} {\frac{{\left| {b\left( x \right)\,\cap\,b\left( y \right)} \right|}}{{\left| {b\left( x \right)} \right|\,*\,\left| {b\left( y \right)} \right|}},} \hfill & {x \ne y} \hfill \\ {1,} \hfill & {x = y} \hfill \\ \end{array} } \right.. $$
(5)

From the viewpoint of probability, \( \mu_{{R_{b} }} \left( {x,y} \right) \) is the probability that \( x \) and \( y \) take the same value on \( b \), assuming that each object's true value is drawn uniformly and independently from its value set. For \( B\,\subseteq\,A \), the similarity relation \( R_{B} \) between \( x \) and \( y \) on \( B \) can be defined as follows:

$$ \mu_{{R_{B} }} \left( {x,y} \right) = \prod\limits_{b \in B} {\mu_{{R_{b} }} \left( {x,y} \right)} . $$
(6)
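
The following sketch implements Eqs. (5) and (6) and reproduces the language example from Sect. 2; as above, the dictionary representation is an assumption made for illustration.

```python
def mu_b(bx, by):
    """Eq. (5) for x != y: |b(x) & b(y)| / (|b(x)| * |b(y)|)."""
    return len(bx & by) / (len(bx) * len(by))

def mu_B(x_vals, y_vals, B):
    """Eq. (6): product of the per-attribute similarities over B."""
    prod = 1.0
    for b in B:
        prod *= mu_b(x_vals[b], y_vals[b])
    return prod

# The language example: a(x), a(y), a(z) as in Sect. 2
x = {"a": {"English", "Chinese", "French"}}
y = {"a": {"German", "Japanese", "English", "Chinese"}}
z = {"a": {"English", "Chinese", "French"}}
print(mu_B(x, y, ["a"]))  # 2 / (3 * 4) = 1/6
print(mu_B(x, z, ["a"]))  # 3 / (3 * 3) = 1/3
```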

There are some important properties of the similarity relation defined above:

(1) \( R_{B} \) is reflexive.

Proof.

Since \( \mu_{{R_{b} }} \left( {x,x} \right) = 1 \) for every \( b \in B \), we have \( \mu_{{R_{B} }} \left( {x,x} \right) = 1 \); hence \( R_{B} \) is reflexive.

(2) \( R_{B} \) is symmetric.

Proof.

Since \( \mu_{{R_{b} }} \left( {y,x} \right) = \frac{{\left| {b\left( y \right)\,\cap\,b\left( x \right)} \right|}}{{\left| {b\left( y \right)} \right|\,*\,\left| {b\left( x \right)} \right|}} = \frac{{\left| {b\left( x \right)\,\cap\,b\left( y \right)} \right|}}{{\left| {b\left( x \right)} \right|\,*\,\left| {b\left( y \right)} \right|}} = \mu_{{R_{b} }} \left( {x,y} \right) \), we have \( \mu_{{R_{B} }} \left( {y,x} \right) = \mu_{{R_{B} }} \left( {x,y} \right) \); hence \( R_{B} \) is symmetric.

For the example used above, according to Definition 4, the similarity between \( x \) and \( y \) is \( \mu_{{R_{a} }} \left( {x,y} \right) = \frac{1}{6} \), and the similarity between \( x \) and \( z \) is \( \mu_{{R_{a} }} \left( {x,z} \right) = \frac{1}{3} \). That is to say, the probability that \( x \) and \( y \) speak the same language is \( \frac{1}{6} \), and the probability that \( x \) and \( z \) speak the same language is \( \frac{1}{3} \). Compared with the existing definitions, the proposed definition gives a more reasonable and meaningful description of the relation between two objects with set values.

Definition 5.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system. For \( B\,\subseteq\,A \) and \( x \in U \), the \( \delta \) similarity class of \( x \) with respect to \( B \) can be defined as follows:

$$ \delta_{B} \left( x \right) = \left\{ {y \in U|\mu_{{R_{B} }} \left( {x,y} \right) \ge \delta } \right\}\left( {0 \le \delta \le 1} \right), $$
(7)

where \( \delta \) is a threshold. We can use \( \delta \) to control the size of the information granules generated by \( B \): the larger \( \delta \) is, the smaller the information granules generated by \( B \).
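
Continuing the running sketch (reusing mu_B from above), the \( \delta \) similarity class of Definition 5 can be written as follows. Here U is assumed to be a dictionary mapping object names to their attribute-value dictionaries, and the y == x branch realizes the x = y case of Eq. (5).

```python
def delta_class(U, x, B, delta):
    """Eq. (7): the set of y in U with mu_{R_B}(x, y) >= delta."""
    return {y for y in U if y == x or mu_B(U[x], U[y], B) >= delta}
```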

Theorem 1.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system, \( B \) and \( B^{\prime } \) be two subsets of \( A \), \( \delta \) be a threshold. For \( x \in U \), if \( B \supseteq B^{\prime } \), then \( \delta_{B} \left( x \right)\,\subseteq\,\delta_{{B^{\prime}}} \left( x \right) \).

Proof.

Let \( B^{\prime } = \left\{ {b_{1} ,b_{2} , \ldots ,b_{n} } \right\} \) and \( B = B^{\prime } \cup \left\{ {b_{n + 1} ,b_{n + 2} , \ldots ,b_{n + m} } \right\} \). For any \( x,y \in U \):

$$ \begin{aligned} \mu_{{R_{{B^{\prime}}} }} \left( {x,y} \right) & = \prod\nolimits_{{b \in B^{\prime}}} {\mu_{{R_{b} }} \left( {x,y} \right)} = \mu_{{R_{{b_{1} }} }} \left( {x,y} \right) * \mu_{{R_{{b_{2} }} }} \left( {x,y} \right) * \ldots * \mu_{{R_{{b_{n} }} }} \left( {x,y} \right) \\ \mu_{{R_{B} }} \left( {x,y} \right) & = \prod\nolimits_{b \in B} {\mu_{{R_{b} }} \left( {x,y} \right)} \\ & = \mu_{{R_{{b_{1} }} }} \left( {x,y} \right) * \ldots * \mu_{{R_{{b_{n} }} }} \left( {x,y} \right) * \mu_{{R_{{b_{n + 1} }} }} \left( {x,y} \right) * \ldots * \mu_{{R_{{b_{n + m} }} }} \left( {x,y} \right) \\ & = \mu_{{R_{{B^{\prime}}} }} \left( {x,y} \right) * \mu_{{R_{{b_{n + 1} }} }} \left( {x,y} \right) * \ldots * \mu_{{R_{{b_{n + m} }} }} \left( {x,y} \right) \\ \end{aligned} $$

Because \( \mu_{{R_{b} }} \left( {x,y} \right) \in \left[ {0,1} \right] \) for every \( b \in A \), it follows that \( \mu_{{R_{B} }} \left( {x,y} \right) \le \mu_{{R_{{B^{\prime}}} }} \left( {x,y} \right) \).

Suppose \( y \in \delta_{B} \left( x \right) \). Then \( \mu_{{R_{B} }} \left( {x,y} \right) \ge \delta \), so \( \delta \le \mu_{{R_{B} }} \left( {x,y} \right) \le \mu_{{R_{{B^{\prime}}} }} \left( {x,y} \right) \), that is, \( y \in \delta_{{B^{\prime } }} \left( x \right) \). Therefore, \( \delta_{B} \left( x \right)\,\subseteq\,\delta_{{B^{\prime}}} \left( x \right) \).

Theorem 1 shows that the number of elements in a similarity class changes with the condition attribute set: the smaller the condition attribute set is, the more elements the similarity class contains.

Theorem 2.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system, \( B \) be a subset of \( A \), \( \delta_{1} \) and \( \delta_{2} \) be two thresholds. For \( x \in U \), if \( \delta_{1} \le \delta_{2} \), then \( \delta_{1B} \left( x \right) \supseteq \delta_{2B} \left( x \right) \).

Proof.

Suppose \( y \in \delta_{2B} \left( x \right) \), that is, \( \mu_{{R_{B} }} \left( {x,y} \right) \ge \delta_{2} \). Because \( \delta_{1} \le \delta_{2} \), there must be \( \mu_{{R_{B} }} \left( {x,y} \right) \ge \delta_{1} \), so \( y \in \delta_{1B} \left( x \right) \). Therefore, \( \delta_{1B} \left( x \right) \supseteq \delta_{2B} \left( x \right) \).

Theorem 2 indicates that we can control the number of elements in a similarity class through the threshold: the smaller the threshold is, the more elements the similarity class contains.

Definition 6.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system, \( B \) be a subset of \( A \). Given an arbitrary set \( X\,\subseteq\,U \), the \( \delta \)-upper approximation \( \overline{{B_{\delta } }} \left( X \right) \) and the \( \delta \)-lower approximation \( \underline{{B_{\delta } }} \left( X \right) \) of \( X \) with respect to \( B \) are:

$$ \overline{{B_{\delta } }} \left( X \right) = \left\{ {x \in U\,|\,\delta_{B} \left( x \right) \cap X \ne \varnothing } \right\}, $$
(8)
$$ \underline{{B_{\delta } }} \left( X \right) = \left\{ {x \in U\,|\,\delta_{B} \left( x \right)\,\subseteq\,X} \right\}, $$
(9)

where \( \underline{{B_{\delta } }} \left( X \right) \) contains all the objects that can be classified into \( X \) with certainty, and \( \overline{{B_{\delta } }} \left( X \right) \) contains all the objects that can possibly be classified into \( X \).
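
With the similarity classes in hand, the \( \delta \)-approximations of Definition 6 follow directly. This continues the sketch above, with X given as a set of object names (again our assumed representation).

```python
def upper_approx(U, X, B, delta):
    """Eq. (8): objects whose delta similarity class intersects X."""
    return {x for x in U if delta_class(U, x, B, delta) & X}

def lower_approx(U, X, B, delta):
    """Eq. (9): objects whose delta similarity class is contained in X."""
    return {x for x in U if delta_class(U, x, B, delta) <= X}
```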

Theorem 3.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system, \( B \) and \( B^{\prime } \) be two subsets of \( A \), \( \delta \) be a threshold. For \( X\,\subseteq\,U \), if \( B \supseteq B^{\prime } \), then \( \overline{{B_{\delta } }} \left( X \right)\,\subseteq\,\overline{{B^{\prime }_{\delta } }} \left( X \right) \), and \( \underline{{B_{\delta } }} \left( X \right)\,\supseteq\,\underline{{B^{\prime }_{\delta } }} \left( X \right) \).

Proof.

The proof comes directly from Theorem 1, and hence it is omitted here.

Theorem 3 shows that the sizes of the upper and lower approximation sets change with the condition attribute set: the smaller the condition attribute set is, the more elements the \( \delta \)-upper approximation contains and the fewer the \( \delta \)-lower approximation contains.

Theorem 4.

Let \( S = \left( {U,A,V,F} \right) \) be a set-valued information system, \( B \) be a subset of \( A \), \( \delta_{1} \) and \( \delta_{2} \) be two thresholds. For \( X\,\subseteq\,U \), if \( \delta_{1}\,\le\,\delta_{2} \), then \( \overline{{B_{{\delta_{1} }} }} \left( X \right)\,\supseteq\,\overline{{B_{{\delta_{2} }} }} \left( X \right) \), and \( \underline{{B_{{\delta_{1} }} }} \left( X \right) \subseteq \underline{{B_{{\delta_{2} }} }} \left( X \right) \).

Proof.

The proof comes directly from Theorem 2, and hence it is omitted here.

Theorem 4 indicates that we can control the elements in the upper and lower approximation sets by changing the threshold: the smaller the threshold is, the more elements the \( \delta \)-upper approximation contains and the fewer the \( \delta \)-lower approximation contains.

4 Attribute Reduction of Set-Valued Decision Information System

Generally, most decision information systems contain redundant attributes. These redundant attributes waste storage space and reduce the efficiency of data processing on the one hand, and may interfere with making correct and concise decisions on the other. Next, we discuss how to reduce a set-valued decision information system.

Definition 7.

Let \( S = \left( {U,C \cup D,V,F} \right) \) be a set-valued decision information system, where \( U \) is a non-empty finite set of objects called the universe, \( C \) is a set of condition attributes, \( D \) is a decision attribute, and \( F \) is a function that assigns particular values from the attribute domains to objects. \( {U \mathord{\left/ {\vphantom {U D}} \right. \kern-0pt} D} = \left\{ {d_{1} ,d_{2} , \ldots ,d_{m} } \right\} \) is a partition of \( U \). For \( B\,\subseteq\,C \), the positive domain of \( D \) with respect to \( B \) is defined as:

$$ POS_{B} (D) = \bigcup\limits_{i = 1}^{m} {\underline{{B_{\delta } }} (d_{i} ),} $$
(10)

and the negative domain of \( D \) with respect to \( B \) is:

$$ NEG_{B} \left( D \right) = U - POS_{B} \left( D \right), $$
(11)

where \( POS_{B} \left( D \right) \) is the set of objects in \( U \) that can be classified into the decision classes of \( D \) definitely, and \( NEG_{B} \left( D \right) \) is the set of objects in \( U \) that cannot be classified definitely.
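
In the same running sketch, the positive and negative domains of Definition 7 can be computed from the \( \delta \)-lower approximations of the decision classes. Here D_classes stands for the partition \( U/D \), given as a list of sets of object names (an assumed input format).

```python
def positive_domain(U, B, D_classes, delta):
    """Eq. (10): POS_B(D) is the union of the delta-lower approximations of the d_i."""
    pos = set()
    for d in D_classes:
        pos |= lower_approx(U, d, B, delta)
    return pos

def negative_domain(U, B, D_classes, delta):
    """Eq. (11): NEG_B(D) = U - POS_B(D)."""
    return set(U) - positive_domain(U, B, D_classes, delta)
```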

Theorem 5.

Let \( S = \left( {U,C \cup D,V,F} \right) \) be a set-valued decision information system, \( B \) and \( B^{\prime } \) be two subsets of \( C \). If \( B^{\prime }\,\subseteq\,B \), then \( POS_{{B^{\prime}}} \left( D \right)\,\subseteq\,POS_{B} \left( D \right) \).

Proof.

If \( x \in POS_{{B^{\prime}}} \left( D \right) \), there exists \( d_{i} \in {U \mathord{\left/ {\vphantom {U D}} \right. \kern-0pt} D} \) such that \( \delta_{{B^{\prime}}} \left( x \right)\,\subseteq\,d_{i} \). According to Theorem 1, we have \( \delta_{B} \left( x \right)\,\subseteq\,\delta_{{B^{\prime}}} \left( x \right) \). So \( \delta_{B} \left( x \right)\,\subseteq\,d_{i} \), that is, \( x \in POS_{B} \left( D \right) \). Thus, \( POS_{{B^{\prime}}} \left( D \right)\,\subseteq\,POS_{B} \left( D \right) \).

Theorem 5 shows that the positive domain of \( D \) with respect to an attribute subset is a subset of the positive domain of \( D \) with respect to all the attributes. That is, if some attributes are deleted from a set-valued decision information system, the positive domain may shrink or remain unchanged. If the positive domain remains unchanged after an attribute is deleted, that attribute is redundant for keeping the positive domain; in other words, removing redundant attributes does not affect the correct classification ability of the system.

Definition 8.

Let \( S = \left( {U,C \cup D,V,F} \right) \) be a set-valued decision information system. For \( \forall a \in C \), \( a \) is reducible in \( C \) with respect to \( D \) if \( POS_{{C - \left\{ a \right\}}} \left( D \right) = POS_{C} \left( D \right) \). Otherwise, \( a \) is irreducible in \( C \) with respect to \( D \).

Definition 9.

Let \( S = \left( {U,C \cup D,V,F} \right) \) be a set-valued decision information system, and \( B \) be a subset of \( C \). \( B \) is a reduction of \( C \) with respect to \( D \) if:

(1) \( POS_{B} \left( D \right) = POS_{C} \left( D \right) \), and

(2) \( \forall a \in B\,,\,POS_{{B{ - }\left\{ a \right\}}} \left( D \right) \ne POS_{B} \left( D \right). \)

According to Definitions 8 and 9, it is obvious that a reduction \( B \) of \( C \) has the same classification ability as \( C \), and that deleting any attribute from \( B \) decreases the correct classification ability of the system. To obtain a reduction of a set-valued decision information system, we develop the following algorithm.

Algorithm 1. Attribute reduction based on keeping the positive domain (presented as a figure in the original).

For a given set-valued decision information system, Algorithm 1 checks each attribute against the conditions stated in Definition 9. If the conditions are satisfied, the attribute is deleted; otherwise it is retained. Finally, we obtain a reduction of the set-valued decision information system.
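
Since Algorithm 1 appears only as a figure, the sketch below is our reconstruction of the procedure described in the text: scan the condition attributes and delete an attribute whenever the positive domain is unchanged without it (Definition 8). The scan order is an assumption on our part; as Example 1 suggests, different choices of \( \delta \) (and, in general, different scan orders) can lead to different reducts.

```python
def reduct(U, C, D_classes, delta):
    """Positive-domain-preserving attribute reduction (a sketch of Algorithm 1)."""
    target = positive_domain(U, list(C), D_classes, delta)
    B = list(C)
    for a in list(C):
        trial = [b for b in B if b != a]
        # a is reducible if removing it keeps the positive domain (Definition 8)
        if trial and positive_domain(U, trial, D_classes, delta) == target:
            B = trial
    return B
```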

The following example illustrates how to form a reduction using Algorithm 1.

Example 1.

For the set-valued decision information system \( S \) shown in Table 1, we can compute a reduction of \( S \) as follows.

Table 1. A set-valued decision information system

Let \( \delta = {1 \mathord{\left/ {\vphantom {1 {16}}} \right. \kern-0pt} {16}} \); we have:

$$ \begin{aligned} & \delta_{C} \left( {x_{1} } \right) = \left\{ {x_{1} } \right\}\,,\,\delta_{C} \left( {x_{2} } \right) = \left\{ {x_{2} ,x_{8} } \right\} \\ & \delta_{C} \left( {x_{3} } \right) = \left\{ {x_{3} ,x_{4} ,x_{8} } \right\}\,,\,\delta_{C} \left( {x_{4} } \right) = \left\{ {x_{3} ,x_{4} } \right\} \\ & \delta_{C} \left( {x_{5} } \right) = \left\{ {x_{5} ,x_{6} } \right\}\,,\,\delta_{C} \left( {x_{6} } \right) = \left\{ {x_{5} ,x_{6} } \right\} \\ & \delta_{C} \left( {x_{7} } \right) = \left\{ {x_{7} } \right\}\,,\,\delta_{C} \left( {x_{8} } \right) = \left\{ {x_{2} ,x_{3} ,x_{8} } \right\} \\ & U/D = \left\{ {\left\{ {x_{1} ,x_{7} } \right\},\left\{ {x_{3} ,x_{4} } \right\},\left\{ {x_{2} ,x_{5} ,x_{6} ,x_{8} } \right\}} \right\} \\ & POS_{C} \left( D \right) = \left\{ {x_{1} ,x_{2} ,x_{3} ,x_{4} ,x_{5} ,x_{6} ,x_{7} ,x_{8} } \right\} \\ \end{aligned} $$

After \( a_{1} \) is deleted, we have:

$$ \begin{aligned} & \delta_{{C{ - }a_{1} }} \left( {x_{1} } \right) = \left\{ {x_{1} } \right\}\,,\,\delta_{{C{ - }a_{1} }} \left( {x_{2} } \right) = \left\{ {x_{2} ,x_{3} ,x_{8} } \right\} \\ & \delta_{{C{ - }a_{1} }} \left( {x_{3} } \right) = \left\{ {x_{2} ,x_{3} ,x_{4} ,x_{8} } \right\}\,,\,\delta_{{C{ - }a_{1} }} \left( {x_{4} } \right) = \left\{ {x_{3} ,x_{4} ,x_{7} } \right\} \\ & \delta_{{C{ - }a_{1} }} \left( {x_{5} } \right) = \left\{ {x_{5} ,x_{6} } \right\}\,,\,\delta_{{C{ - }a_{1} }} \left( {x_{6} } \right) = \left\{ {x_{5} ,x_{6} } \right\} \\ & \delta_{{C{ - }a_{1} }} \left( {x_{7} } \right) = \left\{ {x_{4} ,x_{7} } \right\}\,,\,\delta_{{C{ - }a_{1} }} \left( {x_{8} } \right) = \left\{ {x_{2} ,x_{3} ,x_{8} } \right\} \\ & POS_{{C{ - }a_{1} }} \left( D \right) = \left\{ {x_{1} ,x_{5} ,x_{6} } \right\} \\ \end{aligned} $$

It can be seen that the positive domain changes after \( a_{1} \) is deleted, so \( a_{1} \) is irreducible. Likewise, \( a_{2} \), \( a_{3} \), \( a_{4} \) and \( a_{5} \) are all irreducible, so the reduction is \( \left\{ {a_{1} ,a_{2} ,a_{3} ,a_{4} ,a_{5} } \right\} \). For another value of \( \delta \), it can be found that \( a_{1} \), \( a_{2} \) and \( a_{5} \) are irreducible while \( a_{3} \) and \( a_{4} \) are reducible, so the reduction becomes \( \left\{ {a_{1} ,a_{2} ,a_{5} } \right\} \). According to the above analysis, we can get different reducts by changing the value of \( \delta \).

5 Experimental Results

The experiments in this section demonstrate the effectiveness of the method proposed in this paper. Five data sets are used in the experiments; their details are shown in Table 2.

Table 2. Data description

In the experiments, we first reduce the five data sets using the tolerance relation, the fuzzy relation and the similarity relation proposed in this paper, respectively; then J48 and SMO are used to compare the classification accuracy of the reduction results. The reduction results obtained by the different relations are shown in Table 3, and the classification accuracy comparisons are shown in Figs. 1 and 2.

Table 3. Attribute numbers after reduction
Fig. 1. Comparison of classification accuracy by J48

Fig. 2. Comparison of classification accuracy by SMO

Comparing Tables 2 and 3, only one or two attributes were removed by attribute reduction on vote and breast-cancer, and on these two data sets the classification accuracy obtained by the proposed method shows no obvious difference from that of the existing methods. On the other three data sets, the attribute reduction results are significant. By comparison, the method proposed in this paper retains more attributes than the existing methods. On the other hand, the classification accuracy obtained by the proposed method is almost the same as that of the original data, and is obviously higher than that of the other methods. From the experimental results, we can conclude that the proposed method retains more attributes than the existing methods, but it ensures that the classification accuracy of the system does not change significantly. Although the existing methods remove more attributes, the classification accuracy of the system also decreases greatly; that means the existing methods remove some useful attributes, which is unacceptable in some practical problems.

6 Conclusions

Many different methods have been proposed to deal with set-valued information systems. However, they are not appropriate for set-valued data under the disjunctive semantics. To address this problem, a similarity relation based on probability was defined to describe the relation between two set-valued objects, and a corresponding attribute reduction algorithm based on keeping the positive domain was proposed. Finally, a group of experiments was conducted to demonstrate the effectiveness of the proposed methods. The experimental results indicate that the existing methods can obtain a smaller reduction, but they may remove some useful attributes and thus reduce the classification accuracy of the system significantly. The proposed method retains more attributes, but these attributes are all useful, and the resulting classification accuracy stays close to that of the original data. Because the threshold has an important influence on the classification accuracy of the proposed method, how to choose an appropriate threshold will be our future work.