Abstract
Rough sets and rule induction in an incomplete and continuous information table are investigated under possible world semantics. We show an approach using possible indiscernibility relations, whereas the traditional approaches use possible tables. This is because the number of possible indiscernibility relations is finite, whereas the number of possible tables in an incomplete and continuous information table is infinite. First, lower and upper approximations are derived directly using the indiscernibility relation on a set of attributes in a complete and continuous information table. Second, we describe how these approximations are derived by applying possible world semantics to an incomplete and continuous information table. Many possible indiscernibility relations are obtained, one of which is the actual indiscernibility relation. The family of possible indiscernibility relations forms a lattice under set inclusion, with the minimum and the maximum indiscernibility relations. Under the minimum and the maximum indiscernibility relations, we obtain four kinds of approximations: certain lower, certain upper, possible lower, and possible upper approximations. Therefore, the approximations can be computed without enumerating all possible indiscernibility relations, so the number of values with incomplete information causes no combinatorial explosion. The approximations in possible world semantics coincide with those in our extended approach directly using indiscernibility relations. We obtain four kinds of single rules: certain and consistent, certain and inconsistent, possible and consistent, and possible and inconsistent ones from certain lower, certain upper, possible lower, and possible upper approximations, respectively. Individual objects in an approximation support single rules. Serial single rules from the approximation are brought into one combined rule. The combined rule has greater applicability than the single rules that individual objects support.
Keywords
- Neighborhood rough sets
- Rule induction
- Possible world semantics
- Incomplete information
- Indiscernibility relations
- Continuous values
1 Introduction
Information generated in the real world includes various types of data. When we deal with character string data, the data is broadly classified into discrete data and continuous data.
Rough sets, constructed by Pawlak [18], are used as an effective method for feature selection, pattern recognition, data mining and so on. The framework consists of lower and upper approximations. This is traditionally applied to complete information tables with nominal attributes. Fruitful results are reported in various fields. However, when we are faced with real-world objects, it is often necessary to handle attributes that take a continuous value. Furthermore, objects with incomplete information ubiquitously exist in the real world. Without processing incomplete and continuous information, the information generated in the real world cannot be fully utilized. Therefore, extended versions of the rough sets have been proposed to handle incomplete information in continuous domains.
An approach to handling incomplete information that is often adopted [7, 20,21,22] is to use the method that Kryszkiewicz applied to nominal attributes [8]. This approach fixes in advance the indiscernibility of objects that have incomplete information with other objects. However, it is natural that there are two possibilities for objects with incomplete information. One possibility is that an object with incomplete information may have the same value as another object. That is, the two objects may be indiscernible. The other possibility is that the object may have a different value from another object. That is, they may be discernible. Fixing the indiscernibility in advance corresponds to neglecting one of the two possibilities. Therefore, the approach leads to loss of information and creates poor results [11, 19].
Another approach is to directly use indiscernibility relations extended to handle incomplete information [14]. Yet another approach is to use possible classes obtained from the indiscernibility relation on a set of attributes [15]. Neither approach suffers from computational complexity that grows with the number of values with incomplete information. We need to give some justification to these extended approaches. It is known in discrete data tables that an approach using possible classes has some justification from the viewpoint of possible world semantics [12]. We focus on an approach directly using indiscernibility relations (see Note 1). To give it some justification, we need to develop an approach that is based on possible world semantics. The previous approaches are developed under possible tables derived from an incomplete and continuous information table. Unfortunately, an infinite number of possible tables can be generated from an incomplete and continuous information table. Possible world semantics cannot be applied to an infinite number of possible tables.
The starting point for a rough set is the indiscernibility relation on a set of attributes. When an information table contains values with incomplete information, we obtain lots of possible indiscernibility relations in place of the indiscernibility relation. The number is finite, even if the number of possible tables is infinite, because the number of objects is finite. We note this finiteness and develop an approach based on the possible indiscernibility relations, not the possible tables.
The paper is organized as follows. Section 2 describes an approach directly using indiscernibility relations in a complete and continuous information table. Section 3 develops an approach applying possible world semantics to an incomplete and continuous information table. Section 4 describes rule induction in a complete and continuous information table. Section 5 addresses rule induction in an incomplete and continuous information table. Section 6 presents the conclusions.
2 Rough Sets by Directly Using Indiscernibility Relations in Complete and Continuous Information Systems
A continuous data set is represented as a two-dimensional table, called a continuous information table. In the continuous information table, each row and each column represent an object and an attribute, respectively. A mathematical model of an information table with complete and continuous information is called a complete and continuous information system. The complete and continuous information system is a triplet expressed by \((U,AT,\{D(a) \mid a \in AT \})\). U is a non-empty finite set of objects, which is called the universe. AT is a non-empty finite set of attributes such that \(a : U \rightarrow D(a)\) for every \(a \in AT\) where D(a) is the continuous domain of attribute a.
We have two approaches for handling continuous values. One approach is to discretize a continuous domain into disjunctive intervals in which objects are considered as indiscernible [4]. How to discretize has a heavy influence over results. The other approach is to use neighborhood [10]. The indiscernibility of two objects is derived from the distance of the values that characterize them. A threshold is given, which is the indiscernibility criterion. When the distance between two objects is less than or equal to the threshold, they are considered as indiscernible. As the threshold changes, the results change gradually. Therefore, we take the neighborhood-based approach.
Binary relation \(R_{A}\) (see Note 2) that represents the indiscernibility between objects on set \(A \subseteq AT\) of attributes is called the indiscernibility relation on A:
\[R_{A} = \{(o,o') \in U \times U \mid |A(o) - A(o')| \le \delta _{A}\}, \tag{1}\]
where A(o) is the value sequence for A of object o and \((|A(o) - A(o')| \le \delta _{A}) = (\wedge _{a \in A} |a(o) - a(o')| \le \delta _{a})\) and \(\delta _{a}\) (see Note 3) is a threshold indicating the range of indiscernibility between a(o) and \(a(o')\).
Proposition 1
If \(\delta 1_{A} \le \delta 2_{A}\), equal to \(\wedge _{a \in A}(\delta 1_{a} \le \delta 2_{a})\), then \(R_{A}^{\delta 1_{A}} \subseteq R_{A}^{\delta 2_{A}}\), where \(R_{A}^{\delta 1_{A}}\) and \(R_{A}^{\delta 2_{A}}\) are the indiscernibility relations with thresholds \(\delta 1_{A}\) and \(\delta 2_{A}\), respectively and \(R_{A}^{\delta 1_{A}} = \cap _{a \in A} R_{a}^{\delta 1_{a}}\) and \(R_{A}^{\delta 2_{A}}= \cap _{a \in A} R_{a}^{\delta 2_{a}}\).
From indiscernibility relation \(R_{A}\), indiscernible class \([o]_{A}\) for object o is obtained:
\[[o]_{A} = \{o' \mid (o,o') \in R_{A}\}, \tag{2}\]
where \([o]_{A} = \cap _{a \in A}[o]_{a}\).
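The definitions of the indiscernibility relation and the indiscernible class can be illustrated with a minimal sketch. The table, the threshold values, and the function names below are hypothetical and chosen only for illustration; they are not taken from the figures of this chapter.

```python
from itertools import product

def indiscernibility_relation(table, attrs, delta):
    """Return R_A as a set of ordered pairs (o, o')."""
    return {
        (o, p)
        for o, p in product(table, repeat=2)
        # Two objects are indiscernible on A when every attribute in A
        # differs by at most its threshold.
        if all(abs(table[o][a] - table[p][a]) <= delta[a] for a in attrs)
    }

def indiscernible_class(relation, o):
    """Return [o]_A, the objects paired with o in the relation."""
    return {p for (q, p) in relation if q == o}

# Hypothetical complete table with two continuous attributes.
table = {
    "o1": {"a1": 1.20, "a2": 3.00},
    "o2": {"a1": 1.24, "a2": 3.40},
    "o3": {"a1": 1.31, "a2": 3.05},
}
delta = {"a1": 0.05, "a2": 0.3}  # assumed thresholds

R_a1 = indiscernibility_relation(table, ["a1"], delta)
R_a2 = indiscernibility_relation(table, ["a2"], delta)
R_A = indiscernibility_relation(table, ["a1", "a2"], delta)

print(sorted(indiscernible_class(R_a1, "o1")))  # class of o1 on a1 only
print(sorted(indiscernible_class(R_A, "o1")))   # class of o1 on {a1, a2}
```

In this sketch the relation on the attribute set equals the intersection of the single-attribute relations, which mirrors the identity \([o]_{A} = \cap _{a \in A}[o]_{a}\). Values close to a threshold should be handled with care when floating-point numbers are used.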
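Directly using indiscernibility relation \(R_{A}\), lower approximation \(\underline{apr}_{A}(\mathcal{O})\) and upper approximation \(\overline{apr}_{A}(\mathcal{O})\) for A of set \(\mathcal{O}\) of objects are:
\[\underline{apr}_{A}(\mathcal{O}) = \{o \mid [o]_{A} \subseteq \mathcal{O}\}, \tag{3}\]
\[\overline{apr}_{A}(\mathcal{O}) = \{o \mid [o]_{A} \cap \mathcal{O} \ne \emptyset \}. \tag{4}\]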
Proposition 2
[14] Let \(\underline{apr}_{A}^{\delta 1_{A}}(\mathcal{O})\) and \(\overline{apr}_{A}^{\delta 1_{A}}(\mathcal{O})\) be lower and upper approximations under threshold \(\delta 1_{A}\) and let \(\underline{apr}_{A}^{\delta 2_{A}}(\mathcal{O})\) and \(\overline{apr}_{A}^{\delta 2_{A}}(\mathcal{O})\) be lower and upper approximations under threshold \(\delta 2_{A}\). If \(\delta 1_{A} \le \delta 2_{A}\), then \(\underline{apr}_{A}^{\delta 1_{A}}(\mathcal{O}) \supseteq \underline{apr}_{A}^{\delta 2_{A}}(\mathcal{O})\) and \(\overline{apr}_{A}^{\delta 1_{A}}(\mathcal{O}) \subseteq \overline{apr}_{A}^{\delta 2_{A}}(\mathcal{O})\).
For object o in the lower approximation of \(\mathcal{O}\), all objects with which o is indiscernible are included in \(\mathcal{O}\); namely, \([o]_{A} \subseteq \mathcal{O}\). On the other hand, for object o in the upper approximation of \(\mathcal{O}\), some objects indiscernible from o are in \(\mathcal{O}\). That is, \([o]_{A} \cap \mathcal{O} \ne \emptyset \). Since \(o \in [o]_{A}\), \([o]_{A} \subseteq \mathcal{O}\) implies \([o]_{A} \cap \mathcal{O} \ne \emptyset \). Thus, \(\underline{apr}_{A}(\mathcal{O}) \subseteq \overline{apr}_{A}(\mathcal{O})\).
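The following sketch computes lower and upper approximations on a single continuous attribute and checks the monotonicity stated in Proposition 2 for two thresholds. The attribute values, the target set, and the function names are hypothetical.

```python
def classes_for(values, delta):
    """Indiscernible class of every object on one attribute."""
    return {
        o: {p for p in values if abs(values[o] - values[p]) <= delta}
        for o in values
    }

def approximations(values, delta, target):
    """Return (lower, upper) approximations of target under threshold delta."""
    classes = classes_for(values, delta)
    lower = {o for o in values if classes[o] <= target}  # [o] is a subset of O
    upper = {o for o in values if classes[o] & target}   # [o] meets O
    return lower, upper

# Hypothetical values of one continuous attribute.
values = {"o1": 1.00, "o2": 1.04, "o3": 1.10, "o4": 1.30}
target = {"o1", "o2"}

lower_small, upper_small = approximations(values, 0.05, target)
lower_large, upper_large = approximations(values, 0.07, target)

print(sorted(lower_small), sorted(upper_small))
print(sorted(lower_large), sorted(upper_large))
```

With the larger threshold the lower approximation shrinks and the upper approximation grows, as Proposition 2 states.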
3 Rough Sets from Possible World Semantics in Incomplete and Continuous Information Systems
An information table with incomplete and continuous information is called an incomplete and continuous information system. In incomplete and continuous information systems, \(a : U \rightarrow s_{a}\) for every \(a \in AT\) where \(s_{a}\) is the union of or-sets of values over domain D(a) of attribute a and sets of intervals on D(a). Note that an or-set is a disjunctive set [9]. Single value \(v \in a(o)\) is a possible value that may be the actual value of attribute a in object o. The possible value is the actual one if a(o) is single; namely, \(|a(o)| = 1\).
We have lots of possible indiscernibility relations from an incomplete and continuous information table. The smallest possible indiscernibility relation is the certain one. Certain indiscernibility relation \(CR_{A}\) is:
In this binary relation, which is unique on A, two objects o and \(o'\) of \((o,o') \in CR_{A} \) are certainly indiscernible on A. Such a pair is called a certain pair. Family \(\mathcal{F}(R_{A})\) of possible indiscernibility relations is:
where each element is a possible indiscernibility relation and \(\mathcal{P}(MPPR_{A})\) is the power set of \(MPPR_{A}\) and \(MPPR_{A}\) is:
A pair of objects in \(MPR_{A}\) is called a possible one. \(\mathcal{F}(R_{A})\) has a lattice structure for set inclusion. \(CR_{A}\) is the minimum possible indiscernibility relation in \(\mathcal{F}(R_{A})\) on A, which is the minimum element, whereas \(CR_{A} \cup MPR_{A}\) is the maximum possible indiscernibility relation on A, which is the maximum element. One of the possible indiscernibility relations is the actual one. However, we cannot know which one it is without additional information.
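The construction of the family of possible indiscernibility relations can be sketched as follows for a single attribute whose values are or-sets. This is only an illustration under stated assumptions: a pair is taken to be certain when the two objects are identical or when every combination of their possible values lies within the threshold, and possible when some but not all combinations do. The table and all function names are hypothetical, and interval values are not handled.

```python
from itertools import chain, combinations, product

def certain_and_possible_pairs(table, delta):
    """Split object pairs into certain ordered pairs and possible unordered pairs."""
    certain, possible = set(), set()
    for o, p in product(table, repeat=2):
        close = [abs(u - v) <= delta for u in table[o] for v in table[p]]
        if o == p or all(close):
            certain.add((o, p))
        elif any(close):
            possible.add(frozenset((o, p)))  # symmetric pair kept as one unit
    return certain, possible

def possible_relations(certain, possible):
    """Each possible relation is the certain relation plus a subset of possible pairs."""
    pairs = sorted(possible, key=sorted)
    subsets = chain.from_iterable(
        combinations(pairs, k) for k in range(len(pairs) + 1)
    )
    family = []
    for subset in subsets:
        relation = set(certain)
        for pair in subset:
            o, p = tuple(pair)
            relation |= {(o, p), (p, o)}
        family.append(frozenset(relation))
    return family

# Hypothetical table: each value is an or-set written as a tuple.
table = {
    "o1": (1.22,),
    "o2": (1.25, 1.31),  # or-set <1.25, 1.31>
    "o3": (1.28,),
    "o4": (1.34, 1.60),
}
certain, possible = certain_and_possible_pairs(table, 0.05)
family = possible_relations(certain, possible)

print(len(possible), len(family))
```

The number of relations is two to the power of the number of possible pairs, so it is finite whenever the universe is finite. The smallest member is the certain relation and the largest adds every possible pair.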
Example 1
Or-set \(<1.25,1.31>\) means 1.25 or 1.31. Let threshold \(\delta _{a_{1}}\) be 0.05 in T of Fig. 1. The set of certain pairs of indiscernible objects on \(a_{1}\) is:
The set of possible pairs of indiscernible objects is:
Applying formulae (5)–(7) to these sets, the family of possible indiscernibility relations and each possible indiscernibility relation \(pr_{i}\) with \(i =1, \ldots , 8\) are:
The family of these possible indiscernibility relations has the lattice structure for set inclusion like Fig. 2. \(pr_{1}\) is the minimum element, whereas \(pr_{8}\) is the maximum element.
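We develop an approach based on possible indiscernibility relations in an incomplete and continuous information table. Applying formulae (3) and (4) to a possible indiscernibility relation pr, lower and upper approximations in pr are:
\[\underline{apr}_{A}(\mathcal{O})^{pr} = \{o \mid [o]_{A}^{pr} \subseteq \mathcal{O}\}, \tag{10}\]
\[\overline{apr}_{A}(\mathcal{O})^{pr} = \{o \mid [o]_{A}^{pr} \cap \mathcal{O} \ne \emptyset \}, \tag{11}\]
where \([o]_{A}^{pr}\) is the indiscernible class of object o in pr.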
Proposition 3
If \(pr_{k} \subseteq pr_{l}\) for possible indiscernibility relations \(pr_{k}, pr_{l} \in \mathcal{F}(R_{A})\),
then \(\underline{apr}_{A}(\mathcal{O})^{pr_{k}} \supseteq \underline{apr}_{A}(\mathcal{O})^{pr_{l}}\) and \(\overline{apr}_{A}(\mathcal{O})^{pr_{k}} \subseteq \overline{apr}_{A}(\mathcal{O})^{pr_{l}}\).
From this proposition the families of lower and upper approximations in possible indiscernibility relations also have the same lattice structure for set inclusion as the family of possible indiscernibility relations.
By aggregating the lower and upper approximations in possible indiscernibility relations, we obtain four kinds of approximations: certain lower approximation \(C\underline{apr}_{A}(\mathcal{O})\), certain upper approximation \(C\overline{apr}_{A}(\mathcal{O})\), possible lower approximation \(P\underline{apr}_{A}(\mathcal{O})\), and possible upper approximation \(P\overline{apr}_{A}(\mathcal{O})\):
Using Proposition 3,
where \(pr_{\min }\) and \(pr_{\max }\) are the minimum and the maximum possible indiscernibility relations on A.
Using formulae (16)–(19), we can obtain the four approximations from only the minimum and the maximum possible indiscernibility relations, without enumerating the whole family, although the number of possible indiscernibility relations grows exponentially as the number of values with incomplete information increases linearly.
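The reduction to the two extreme relations can be checked by brute force on a small case. The sketch below assumes that an object belongs to a certain approximation when it belongs to the corresponding approximation in every possible indiscernibility relation, and to a possible approximation when it does so in at least one. The universe, the certain relation, and the possible pairs are hypothetical.

```python
from itertools import combinations

U = ["o1", "o2", "o3", "o4"]
# Hypothetical certain relation and possible pairs.
certain = {(o, o) for o in U} | {("o2", "o3"), ("o3", "o2")}
possible = [("o1", "o2"), ("o2", "o4")]

def symmetric(pairs):
    """Add the reversed pair for every pair."""
    return {(o, p) for o, p in pairs} | {(p, o) for o, p in pairs}

def family(certain, possible):
    """Yield every possible indiscernibility relation."""
    for k in range(len(possible) + 1):
        for subset in combinations(possible, k):
            yield certain | symmetric(subset)

def approx(relation, target):
    """Lower and upper approximations of target in one relation."""
    cls = {o: {p for (q, p) in relation if q == o} for o in U}
    lower = {o for o in U if cls[o] <= target}
    upper = {o for o in U if cls[o] & target}
    return lower, upper

target = {"o2", "o3"}
results = [approx(rel, target) for rel in family(certain, possible)]

# Aggregation over all possible relations (assumed reading of certain/possible).
C_lower = set.intersection(*(low for low, _ in results))
C_upper = set.intersection(*(up for _, up in results))
P_lower = set.union(*(low for low, _ in results))
P_upper = set.union(*(up for _, up in results))

# Shortcut through the minimum and maximum relations only.
pr_min = certain
pr_max = certain | symmetric(possible)
low_min, up_min = approx(pr_min, target)
low_max, up_max = approx(pr_max, target)

print(C_lower == low_max, C_upper == up_min, P_lower == low_min, P_upper == up_max)
```

In this example the aggregated results agree with the approximations taken in the minimum and maximum relations, which is what Proposition 3 leads one to expect.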
Definability on set A of attributes is defined as follows:
Set \(\mathcal{O}\) of objects is certainly definable if and only if \(\forall pr \in \mathcal{F}(R_{A}) \exists S \subseteq U \ \mathcal{O} = \cup _{o \in S}[o]_{A}^{pr}\).
Set \(\mathcal{O}\) of objects is possibly definable if and only if \(\exists pr \in \mathcal{F}(R_{A}) \exists S \subseteq U \ \mathcal{O} = \cup _{o \in S}[o]_{A}^{pr}\).
These definitions are equivalent to:
Set \(\mathcal{O}\) of objects is certainly definable if and only if \(\forall pr \in \mathcal{F}(R_{A}) \ \underline{apr}_{A}(\mathcal{O})^{pr} = \overline{apr}_{A}(\mathcal{O})^{pr}\).
Set \(\mathcal{O}\) of objects is possibly definable if and only if \(\exists pr \in \mathcal{F}(R_{A}) \ \underline{apr}_{A}(\mathcal{O})^{pr} = \overline{apr}_{A}(\mathcal{O})^{pr}\).
Example 2
We use the possible indiscernibility relations in Example 1. Let set \(\mathcal{O}\) of objects be \(\{o_{2},o_{4}\}\). Applying formulae (10) and (11) to \(\mathcal{O}\), lower and upper approximations from each possible indiscernibility relation are:
\(\mathcal{O}\) is possibly definable on \(a_{1}\).
As with the case of nominal attributes [12], the following proposition holds.
Proposition 4
\(C\underline{apr}_{A}(\mathcal{O})\) \(\subseteq \) \(P\underline{apr}_{A}(\mathcal{O})\) \(\subseteq \) \(\mathcal{O}\) \(\subseteq \) \(C\overline{apr}_{A}(\mathcal{O})\) \(\subseteq \) \(P\overline{apr}_{A}(\mathcal{O})\).
Using the four approximations denoted by formulae (16)–(19), lower approximation \(\underline{apr}_{A}^{\bullet }(\mathcal{O})\) and upper approximation \(\overline{apr}_{A}^{\bullet }(\mathcal{O})\) are expressed in interval sets, as is described in [13] (see Note 4):
The two approximations \(\underline{apr}_{A}^{\bullet }(\mathcal{O})\) and \(\overline{apr}_{A}^{\bullet }(\mathcal{O})\) are dependent through the complementarity property \(\underline{apr}_{A}^{\bullet }(\mathcal{O}) = U - \overline{apr}_{A}^{\bullet }(U - \mathcal{O})\).
Example 3
Applying four approximations in Example 2 to formulae (20) and (21),
Furthermore, the following proposition is valid from formulae (16)–(19).
Proposition 5
Our extended approach directly using indiscernibility relations [14] is justified from this proposition. That is, approximations from the extended approach using two indiscernibility relations are the same as the ones obtained under possible world semantics. A correctness criterion for justification is formulated as
where \(q'\) is the approach for complete and continuous information, which is described in Sect. 2, and q is an extended approach of \(q'\), which directly handles incomplete and continuous information, and \(\bigodot \) is an aggregate operator. This is represented in Fig. 3.
This kind of correctness criterion is usually used in the field of databases handling incomplete information [1,2,3, 6, 17, 23].
When objects in \(\mathcal{O}\) are specified by a restriction containing set B of nominal attributes with incomplete information, elements in domain \(D(B)(=\cup _{b \in B}D(b))\) are used. For example, \(\mathcal{O}\) is specified by restriction \(B = X(=\wedge _{b \in B}(b=x_{b}))\) with \(B \subseteq AT\) and \(x_{b} \in D(b)\). Four approximations: certain lower, certain upper, possible lower, and possible upper ones are:
where
When \(\mathcal{O}\) is specified by a restriction containing set B of numerical attributes with incomplete information, set \(\mathcal{O}\) is specified by an interval where precise values of \(b \in B\) are used.
where
where \(b(o_{m_{b}})\) and \(b(o_{n_{b}})\) are precise and \(\forall b \in B \ b(o_{m_{b}}) \le b(o_{n_{b}})\).
Example 4
In incomplete information table T of Example 1, let \(\mathcal{O}\) be specified by values \(a_{2}(o_{3})\) and \(a_{2}(o_{4})\). Using formulae (32) and (33),
Possible indiscernibility relations \(pr_{\min }\) and \(pr_{\max }\) on \(a_{1}\) are \(pr_{1}\) and \(pr_{8}\) in Example 1. Using formulae (28)–(31),
4 Rule Induction in Complete and Continuous Information Systems
Let single rules that are supported by objects be derived from the lower and upper approximations of O specified by restriction \(B = X\).
- Object \(o \in \underline{apr}_{A}(O)\) supports rule \(A = A(o) \rightarrow B = X\) consistently.
- Object \(o \in \overline{apr}_{A}(O)\) supports rule \(A = A(o) \rightarrow B = X\) inconsistently.
The accuracy, which means the degree of consistency, is \(|[o]_{A} \cap O|/|[o]_{A} |\). This degree is equal to 1 for \(o \in \underline{apr}_{A}(O)\).
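As a small illustration of the accuracy, the sketch below evaluates \(|[o]_{A} \cap O|/|[o]_{A}|\) for two hypothetical indiscernible classes. The object names and the function name are assumptions made for the example.

```python
from fractions import Fraction

def accuracy(indiscernible_class, target):
    """Degree of consistency of the single rule supported by an object.

    The class is never empty because every object is indiscernible from itself.
    """
    return Fraction(len(indiscernible_class & target), len(indiscernible_class))

target = {"o1", "o2"}                      # hypothetical set O
class_in_lower = {"o1", "o2"}              # class contained in O
class_in_upper_only = {"o1", "o2", "o3"}   # class that only meets O

print(accuracy(class_in_lower, target))       # prints 1
print(accuracy(class_in_upper_only, target))  # prints 2/3
```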
In the case where a set of attributes that characterize objects has continuous domains, single rules supported by individual objects in an approximation usually have different antecedent parts. So, we obtain many single rules. The disadvantage of the single rule is that it lacks applicability. For example, let two values a(o) and \(a(o')\) be 4.53 and 4.65 for objects o and \(o'\) in \(\underline{apr}_{a}(O)\). When O is specified by restriction \(b=x\), o and \(o'\) consistently support single rules \(a = 4.53 \rightarrow b = x\) and \(a = 4.65 \rightarrow b = x\), respectively. By using these single rules, we can say that an object with value 4.57 of a, which is indiscernible from 4.53 under \(\delta _{a} = 0.05\), supports \(a = 4.57 \rightarrow b = x\). However, we cannot say anything at all about a rule consistently supported by an object with value 4.59, which is discernible from both 4.53 and 4.65 under \(\delta _{a} = 0.05\). This shows that the single rule has low applicability.
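The limited applicability of single rules can be reproduced with the numbers of this paragraph. The helper below is a hypothetical illustration that lists the single-rule antecedent values from which a new value is indiscernible.

```python
def matching_single_rules(value, antecedents, delta):
    """Antecedent values of single rules that a new value is indiscernible from."""
    return [v for v in antecedents if abs(value - v) <= delta]

antecedents = [4.53, 4.65]  # antecedent values of the two single rules
delta_a = 0.05

print(matching_single_rules(4.57, antecedents, delta_a))  # prints [4.53]
print(matching_single_rules(4.59, antecedents, delta_a))  # prints []
```

The value 4.59 lies between the two antecedent values, yet no single rule applies to it.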
To improve applicability, we bring serial single rules into one combined rule. Let \(o \in U\) be arranged in ascending order of a(o) and be given a serial superscript from 1 to |U|. \(\underline{apr}_{A}(O)\) and \(\overline{apr}_{A}(O)\) consist of collections of serially superscripted objects. For instance, \(\underline{apr}_{A}(O) = \{\cdots , o_{i_{h}}^{h}, o_{i_{h+1}}^{h+1}, \cdots , o_{i_{k-1}}^{k-1}, o_{i_{k}}^{k}, \cdots \}\) \((h \le k)\). The following processing is done for each attribute in A. A single rule supported by \(o^{l} \in \underline{apr}_{A}(O)\) has antecedent part \(a = a(o^{l})\) for attribute a. Then, antecedent parts of serial single rules induced from collection \((o_{i_{h}}^{h}, o_{i_{h+1}}^{h+1}, \cdots , o_{i_{k-1}}^{k-1}, o_{i_{k}}^{k})\) can be brought into one combined antecedent part \(a= [a(o_{i_{h}}^{h}), a(o_{i_{k}}^{k})]\). Finally, a combined rule is expressed as \(\wedge _{a \in A}(a= [a(o_{i_{h}}^{h}), a(o_{i_{k}}^{k})] \rightarrow B =X)\). The combined rule has accuracy
Proposition 7
Let \(\underline{r}\) be the set of combined rules obtained from \(\underline{apr}_{A}(O)\) and \(\overline{r}\) be the set from \(\overline{apr}_{A}(O)\). If \((A = [l_{A}, u_{A}] \rightarrow B = X) \in \underline{r}\), then \(\exists l'_{A} \le l_{A}, \exists u'_{A} \ge u_{A} \ (A = [l'_{A}, u'_{A}] \rightarrow B = X) \in \overline{r}\), where O is specified by restriction \(B = X\) and \((A = [l_{A}, u_{A}]) = \wedge _{a\in A}(a = [l_{a}, u_{a}])\).
Proof
A single rule obtained from \(\underline{apr}_{A}(O)\) is also derived from \(\overline{apr}_{A}(O)\). This means that the proposition holds.
Example 7
Let continuous information table T0 in Fig. 3 be obtained, where U consists of \(\{o_{1}, o_{2}, \cdots ,\) \(o_{19}, o_{20}\}\). Tables T1, T2, and T3 in Fig. 4 are created from T0: T1 is the projection of T0 onto set \(\{a_{1},a_{4}\}\) of attributes, T2 is the projection onto \(\{a_{2},a_{3}\}\), and T3 is the projection onto \(\{a_{3}\}\). In addition, objects included in T1, T2, and T3 are arranged in ascending order of values of attributes \(a_{1}\), \(a_{2}\), and \(a_{3}\), respectively.
Indiscernible classes on \(a_{1}\) of each object under \(\delta _{a_{1}}=0.05\) are:
When O is specified by \(a_{4} = x\), \(O = \{o_{1}, o_{2}, o_{5}, o_{9}, o_{11}, o_{14}, o_{16}, o_{19}, o_{20}\}\). Let O be approximated by objects characterized by attribute \(a_{1}\) whose values are continuous. Using formulae (3) and (4), two approximations are:
In continuous information table T1, which is created from T0, objects are arranged in ascending order of values of attribute \(a_{1}\) and each object is given a serial superscript from 1 to 20. Using the serial superscript, the two approximations are rewritten:
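The lower approximation creates consistent combined rules:
\[a_{1} = [3.96, 3.98] \rightarrow a_{4} = x \quad \text{and} \quad a_{1} = [4.23, 4.43] \rightarrow a_{4} = x\]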
from collections \(\{o_{16}^{7}, o_{11}^{8}\}\) and \(\{o_{9}^{14}, o_{5}^{15}, o_{20}^{16}\}\), respectively, where \(a_{1}(o_{16}^{7}) = 3.96, \ a_{1}(o_{11}^{8}) = 3.98, \ a_{1}(o_{9}^{14}) = 4.23,\) and \(a_{1}(o_{20}^{16}) = 4.43\). The upper approximation creates inconsistent combined rules:
\[a_{1} = [3.90, 3.98] \rightarrow a_{4} = x \quad \text{and} \quad a_{1} = [4.08, 4.92] \rightarrow a_{4} = x\]
from collections \(\{o_{17}^{5}, o_{2}^{6},o_{16}^{7}, o_{11}^{8}\}\) and \(\{o_{10}^{11}, o_{1}^{12},o_{14}^{13},o_{9}^{14},\) \(o_{5}^{15}, o_{20}^{16}, o_{19}^{17},\) \( o_{13}^{18}\}\), respectively, where \(a_{1}(o_{17}^{5}) = 3.90\), \(a_{1}(o_{10}^{11}) = 4.08\), and \(a_{1}(o_{13}^{18}) = 4.92\).
Next, let O be specified by \(a_{3}\) that takes continuous values. In information table T3 projected from T0 the objects are arranged in ascending order of values of \(a_{3}\) and each object is given a serial superscript from 1 to 20. Let lower and upper bounds be \(a_{3}(o_{15}^{6}) = 4.23\) and \(a_{3}(o_{8}^{11}) = 4.50\), respectively. Then, \(O = \{o_{15}^{6}, o_{3}^{7}, o_{17}^{8}, o_{2}^{9}, o_{16}^{10}, o_{8}^{11}\}\). We approximate O by objects restricted by attribute \(a_{2}\). Under \(\delta _{a_{2}}\) = 0.05, indiscernible classes of objects \(o_{1}, \ldots o_{20}\) are:
Formulae (3) and (4) derive the following approximations:
In continuous information table T2, objects are arranged in ascending order of values of attribute \(a_{2}\) and each object is given a serial superscript from 1 to 20. Using objects with superscripts, the two approximations are rewritten:
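Consistent combined rules from collections \(\{o_{8}^{7}, o_{15}^{8}\}\) and \(\{o_{17}^{10}, o_{2}^{11},\) \(o_{16}^{12} \}\) are
\[a_{2} = [2.10, 2.28] \rightarrow a_{3} = [4.23, 4.50] \quad \text{and} \quad a_{2} = [2.50, 2.64] \rightarrow a_{3} = [4.23, 4.50],\]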
where \(a_{2}(o_{8}^{7}) = 2.10, \ a_{2}(o_{15}^{8}) = 2.28, \ a_{2}(o_{17}^{10}) = 2.50,\) and \(a_{2}(o_{16}^{12}) = 2.64\). Inconsistent combined rules from collections \(\{o_{1}^{5}, o_{4}^{6}, o_{8}^{7}, o_{15}^{8}\}\) and \(\{o_{17}^{10}, o_{2}^{11}, o_{16}^{12}, o_{3}^{13}, o_{13}^{14} \}\) are
\[a_{2} = [1.97, 2.28] \rightarrow a_{3} = [4.23, 4.50] \quad \text{and} \quad a_{2} = [2.50, 2.70] \rightarrow a_{3} = [4.23, 4.50],\]
where \(a_{2}(o_{1}^{5}) = 1.97\) and \(a_{2}(o_{13}^{14}) = 2.70\).
Example 7 shows that a combined rule has higher applicability than single rules. For example, by using the consistent combined rule \(a_{2} = [2.10, 2.28] \rightarrow a_{3} = [4.23, 4.50]\), we can say that an object whose value of attribute \(a_{2}\) is 2.16 supports this rule, because 2.16 is included in [2.10, 2.28]. On the other hand, by using single rules \(a_{2} = 2.10 \rightarrow a_{3} = [4.23, 4.50]\) and \(a_{2} = 2.28 \rightarrow a_{3} = [4.23, 4.50]\), we cannot say what rule the object supports, because 2.16 is discernible from both 2.10 and 2.28 under threshold 0.05.
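The way serial single rules are brought into combined rules can be sketched for one attribute as follows. The table below is hypothetical; only the interval end points 2.10, 2.28, 2.50, and 2.64 echo Example 7, and the remaining values, the object names, and the function name are assumptions.

```python
def combine_serial_rules(ranked, approximation):
    """Merge maximal runs of serially ranked objects into interval antecedents.

    ranked: list of (object, value) pairs in ascending order of value.
    approximation: set of objects belonging to the approximation.
    """
    intervals, run = [], []
    for obj, value in ranked:
        if obj in approximation:
            run.append(value)
        elif run:
            # The run is broken by an object outside the approximation.
            intervals.append((run[0], run[-1]))
            run = []
    if run:
        intervals.append((run[0], run[-1]))
    return intervals

# Hypothetical values of one continuous attribute.
values = {"p1": 2.10, "p2": 2.28, "p3": 2.40, "p4": 2.50,
          "p5": 2.57, "p6": 2.64, "p7": 2.90}
lower = {"p1", "p2", "p4", "p5", "p6"}  # hypothetical lower approximation

ranked = sorted(values.items(), key=lambda item: item[1])
intervals = combine_serial_rules(ranked, lower)
print(intervals)  # prints [(2.1, 2.28), (2.5, 2.64)]

# A new value 2.16 is covered by a combined rule but by no single rule.
covered = any(low <= 2.16 <= high for low, high in intervals)
single = any(abs(2.16 - values[o]) <= 0.05 for o in lower)
print(covered, single)  # prints True False
```

For several attributes the same grouping would be applied to each attribute in turn, as the text describes.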
5 Rule Induction in Incomplete and Continuous Information Tables
When O is specified by restriction \(B = X\), the following can be said about rules induced from objects in the approximations:
- Object \(o \in C\underline{apr}_{A}(O)\) certainly supports rule \(A = A(o) \rightarrow B = X\) consistently.
- Object \(o \in C\overline{apr}_{A}(O)\) certainly supports rule \(A = A(o) \rightarrow B = X\) inconsistently.
- Object \(o \in P\underline{apr}_{A}(O)\) possibly supports \(A = A(o) \rightarrow B = X\) consistently.
- Object \(o \in P\overline{apr}_{A}(O)\) possibly supports \(A = A(o) \rightarrow B = X\) inconsistently.
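The four kinds of support listed above can be read off directly from membership in the four approximations. The sketch below uses hypothetical approximations and a hypothetical function name.

```python
def support_kinds(o, c_lower, c_upper, p_lower, p_upper):
    """List the kinds of single rule that object o supports."""
    kinds = []
    if o in c_lower:
        kinds.append("certain and consistent")
    if o in c_upper:
        kinds.append("certain and inconsistent")
    if o in p_lower:
        kinds.append("possible and consistent")
    if o in p_upper:
        kinds.append("possible and inconsistent")
    return kinds

# Hypothetical approximations.
c_lower = {"o3"}
c_upper = {"o2", "o3"}
p_lower = {"o2", "o3"}
p_upper = {"o1", "o2", "o3", "o4"}

for o in ["o1", "o2", "o3"]:
    print(o, support_kinds(o, c_lower, c_upper, p_lower, p_upper))
```

Because the certain lower approximation is contained in the other three, an object in it supports all four kinds of rule, whereas an object that appears only in the possible upper approximation supports just the possible and inconsistent rule.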
We create combined rules from these single rules. Let \(U^{C}_{a}\) be the set of objects with complete and continuous information for attribute a and \(U^{I}_{a}\) be one with incomplete and continuous information.
A combined rule is represented by:
The following treatment is done for each attribute \(a \in A\). Each \(o \in U^{C}_{a}\) is arranged in ascending order of a(o) and is given a serial superscript from 1 to \(|U^{C}_{a}|\). Objects in \((C\underline{apr}_{A}(O) \cap U^{C}_{a})\), in \((C\overline{apr}_{A}(O) \cap U^{C}_{a})\), in \((P\underline{apr}_{A}(O) \cap U^{C}_{a})\), and in \((P\overline{apr}_{A}(O) \cap U^{C}_{a})\) are arranged in ascending order of attribute a values, respectively. The objects are then expressed as collections of objects with serial superscripts like \(\{\cdots , o_{i_{h}}^{h}, o_{i_{h+1}}^{h+1}, \cdots , o_{i_{k-1}}^{k-1}, o_{i_{k}}^{k}, \cdots \}\) \((h \le k)\). From collection \((o_{i_{h}}^{h}, o_{i_{h+1}}^{h+1}, \cdots , o_{i_{k-1}}^{k-1}, o_{i_{k}}^{k})\), the antecedent part for a of the combined rule expressed by \(A = [l_{A}, u_{A}] \rightarrow B = X\) is created. For a certain and consistent combined rule,
where Z is \((C\underline{apr}_{A}(O) \cap U^{I}_{a})\).
In the case of certain and inconsistent, possible and consistent, possible and inconsistent combined rules, Z is \((C\overline{apr}_{A}(O) \cap U^{I}_{a})\), \((P\underline{apr}_{A}(O) \cap U^{I}_{a})\), and \((P\overline{apr}_{A}(O) \cap U^{I}_{a})\), respectively.
Proposition 8
Let \(C\underline{r}\) be the set of combined rules induced from \(C\underline{apr}_{A}(O)\) and \(P\underline{r}\) the set from \(P\underline{apr}_{A}(O)\). When O is specified by restriction \(B = X\), if \((A = [l_{A}, u_{A}] \rightarrow B = X) \in C\underline{r}\), then \(\exists l'_{A} \le l_{A}, \exists u'_{A} \ge u_{A} \ (A = [l'_{A}, u'_{A}] \rightarrow B = X) \in P\underline{r}\).
Proof
A single rule created from \(C\underline{apr}_{A}(O)\) is also derived from \(P\underline{apr}_{A}(O)\) because of \(C\underline{apr}_{A}(O) \subseteq P\underline{apr}_{A}(O)\). This means that the proposition holds.
Proposition 9
Let \(C\overline{r}\) be the set of combined rules induced from \(C\overline{apr}_{A}(O)\) and \(P\overline{r}\) the set from \(P\overline{apr}_{A}(O)\). When O is specified by restriction \(B = X\), if \((A = [l_{A}, u_{A}] \rightarrow B = X) \in C\overline{r}\), then \(\exists l'_{A} \le l_{A}, \exists u'_{A} \ge u_{A} \ (A = [l'_{A}, u'_{A}] \rightarrow B = X) \in P\overline{r}\).
Proof
The proof is similar to that of Proposition 8.
Proposition 10
Let \(C\underline{r}\) be the set of combined rules induced from \(C\underline{apr}_{A}(O)\) and \(C\overline{r}\) the set from \(C\overline{apr}_{A}(O)\). When O is specified by restriction \(B = X\), if \((A = [l_{A}, u_{A}] \rightarrow B = X) \in C\underline{r}\), then \(\exists l'_{A} \le l_{A}, \exists u'_{A} \ge u_{A}\) \( \ (A = [l'_{A}, u'_{A}] \rightarrow B = X) \in C\overline{r}\).
Proof
The proof is similar to that of Proposition 8.
Proposition 11
Let \(P\underline{r}\) be the set of combined rules induced from \(P\underline{apr}_{A}(O)\) and \(P\overline{r}\) the set from \(P\overline{apr}_{A}(O)\). When O is specified by restriction \(B = X\), if \((A = [l_{A}, u_{A}] \rightarrow B = X) \in P\underline{r}\), then \(\exists l'_{A} \le l_{A}, \exists u'_{A} \ge u_{A} \ (A = [l'_{A}, u'_{A}] \rightarrow B = X) \in P\overline{r}\).
Proof
The proof is similar to that of Proposition 8.
Example 8
Let O be specified by restriction \(a_{4}=x\) in IT2 of Fig. 5.
Each \(C[o_{i}]_{a_{1}}\) with \(i =1,\dots ,20\) is, respectively,
Each \(P[o_{i}]_{a_{1}}\) with \(i =1,\dots ,20\) is, respectively,
Four approximations are:
Objects in \(U_{a_{1}}^{C}\) are arranged in ascending order of \(a_{1}(o)\) like this:
Giving serial superscripts to these objects,
And then, the four approximations are rewritten like these:
Objects are separated into two parts: ones with a superscript and ones with only a subscript; namely, ones having complete information and ones having incomplete information for attribute \(a_{1}\), respectively. That is,
From these expressions and formula (38), four kinds of combined rules are created. A certain and consistent rule is:
Possible and consistent rules are:
Certain and inconsistent rules are:
A possible and inconsistent rule is:
6 Conclusions
We have described rough sets that consist of lower and upper approximations and rule induction from the rough sets in continuous information tables.
First, we have handled complete and continuous information tables. Rough sets are derived directly using the indiscernibility relation on a set of attributes.
Second, we have dealt with incomplete and continuous information tables under possible world semantics. We use a possible indiscernibility relation as a possible world. This is because the number of possible indiscernibility relations is finite, although the number of possible tables, which are traditionally used under possible world semantics, is infinite. The family of possible indiscernibility relations has a lattice structure with the minimum and the maximum elements. The families of lower and upper approximations that are derived from each possible indiscernibility relation also have a lattice structure for set inclusion. The approximations are obtained by using the minimum and the maximum possible indiscernibility relations. Therefore, the number of attribute values with incomplete information causes no difficulty in computational complexity, although the number of possible indiscernibility relations increases exponentially as the number of values with incomplete information grows linearly.
Consequently, we derive four kinds of approximations. These approximations are the same as those obtained from an extended approach directly using indiscernibility relations. Therefore, this justifies the extended approach in our previous work.
From these approximations, we derive four kinds of single rules that are supported by individual objects. These single rules have weak applicability. To improve the applicability, we have brought serial single rules into one combined rule. The combined rule has greater applicability than the single ones that are used to create it.
Notes
- 1. See reference [16] for an approach using possible classes.
- 2. \(R_{A}\) is formally \(R_{A}^{\delta _{A}}\). \(\delta _{A}\) is omitted unless confusion arises.
- 3. Subscript a of \(\delta _{a}\) is omitted when no confusion arises.
- 4. Hu and Yao also say that approximations are described by using an interval set in information tables with incomplete information [5].
References
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley Publishing Company (1995)
Bosc, P., Duval, L., Pivert, O.: An initial approach to the evaluation of possibilistic queries addressed to possibilistic databases. Fuzzy Sets and Syst. 140, 151–166 (2003)
Grahne, G.: The problem of incomplete information in relational databases. Lect. Notes Comput. Sci. 554 (1991)
Grzymala-Busse, J.W.: Mining numerical data a rough set approach. In: Peters, J.F., Skowron, A. (eds.) Transactions on Rough Sets XI. LNCS, vol. 5946, pp. 1–13. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-11479-3_1
Hu, M.J., Yao, Y.Y.: Rough set approximations in an incomplete information table. In: Polkowski, L., Yao, Y., Artiemjew, P., Ciucci, D., Liu, D., Ślȩzak, D., Zielosko, B. (eds.) IJCRS 2017. LNCS (LNAI), vol. 10314, pp. 200–215. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60840-2_14
Imielinski, T., Lipski, W.: Incomplete information in relational databases. J. ACM 31, 761–791 (1984)
Jing, S., She, K., Ali, S.: A universal neighborhood rough sets model for knowledge discovering from incomplete heterogeneous data. Expert Syst. 30(1), 89–96 (2013). https://doi.org/10.1111/j.1468-0394.2012.00633_x
Kryszkiewicz, M.: Rules in incomplete information systems. Inf. Sci. 113, 271–292 (1999)
Libkin, L., Wong, L.: Semantic representations and query languages for or-sets. J. Comput. Syst. Sci. 52, 125–142 (1996)
Lin, T.Y.: Neighborhood systems: a qualitative theory for fuzzy and rough sets. In: Wang, P. (ed.) Advances in Machine Intelligence and Soft Computing, vol. IV, pp. 132–155. Duke University (1997)
Nakata, M., Sakai, H.: Applying rough sets to information tables containing missing values. In: Proceedings of 39th International Symposium on Multiple-Valued Logic, pp. 286–291. IEEE Press (2009). https://doi.org/10.1109/ISMVL.2009.1
Nakata, M., Sakai, H.: Twofold rough approximations under incomplete information. Int. J. Gen. Syst. 42, 546–571 (2013). https://doi.org/10.1080/17451000.2013.798898
Nakata, M., Sakai, H.: Describing rough approximations by indiscernibility relations in information tables with incomplete information. In: Carvalho, J.P., Lesot, M.-J., Kaymak, U., Vieira, S., Bouchon-Meunier, B., Yager, R.R. (eds.) IPMU 2016, Part II. CCIS, vol. 611, pp. 355–366. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40581-0_29
Nakata, M., Sakai, H., Hara, K.: Rules induced from rough sets in information tables with continuous values. In: Medina, J., Ojeda-Aciego, M., Verdegay, J.L., Pelta, D.A., Cabrera, I.P., Bouchon-Meunier, B., Yager, R.R. (eds.) IPMU 2018, Part II. CCIS, vol. 854, pp. 490–502. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91476-3_41
Nakata, M., Sakai, H., Hara, K.: Rule induction based on indiscernible classes from rough sets in information tables with continuous values. In: Nguyen, H.S., et al. (eds.) IJCRS 2018, LNAI 11103, pp. 323–336. Springer (2018). https://doi.org/10.1007/978-3-319-99368-3_25
Nakata, M., Sakai, H., Hara, K.: Rough sets based on possibly indiscernible classes in incomplete information tables with continuous values. In: Hassanien, A.B., et al. (eds.) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2019, Advances in Intelligent Systems and Computing 1058, pp. 13–23. Springer (2019). https://doi.org/10.1007/978-3-030-31129-2_2
Paredaens, J., De Bra, P., Gyssens, M., Van Gucht, D.: The Structure of the Relational Database Model. Springer-Verlag (1989)
Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht (1991). https://doi.org/10.1007/978-94-011-3534-4
Stefanowski, J., Tsoukiàs, A.: Incomplete information tables and rough classification. Comput. Intell. 17, 545–566 (2001)
Yang, X., Zhang, M., Dou, H., Yang, Y.: Neighborhood systems-based rough sets in incomplete information system. Knowl.-Based Syst. 24, 858–867 (2011). https://doi.org/10.1016/j.knosys.2011.03.007
Zeng, A., Li, T., Liu, D., Zhang, J., Chen, H.: A fuzzy rough set approach for incremental feature selection on hybrid information systems. Fuzzy Sets Syst. 258, 39–60 (2015). https://doi.org/10.1016/j.fss.2014.08.014
Zhao, B., Chen, X., Zeng, Q.: Incomplete hybrid attributes reduction based on neighborhood granulation and approximation. In: 2009 International Conference on Mechatronics and Automation, pp. 2066–2071. IEEE Press (2009)
Zimányi, E., Pirotte, A.: Imperfect Information in Relational Databases. In: Motro, A., Smets, P. (eds.) Uncertainty Management in Information Systems: From Needs to Solutions, pp. 35–87. Kluwer Academic Publishers (1997)
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Nakata, M., Sakai, H., Hara, K. (2021). Rough Sets and Rule Induction from Indiscernibility Relations Based on Possible World Semantics in Incomplete Information Systems with Continuous Domains. In: Hassanien, A.E., Darwish, A. (eds) Machine Learning and Big Data Analytics Paradigms: Analysis, Applications and Challenges. Studies in Big Data, vol 77. Springer, Cham. https://doi.org/10.1007/978-3-030-59338-4_1
DOI: https://doi.org/10.1007/978-3-030-59338-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59337-7
Online ISBN: 978-3-030-59338-4
eBook Packages: Computer Science, Computer Science (R0)