
1 Introduction

We provide an introduction to attribute reduction, or feature selection, based on rough set theory [35, 36, 39]. Rough set theory addresses uncertainty or inconsistency of set membership caused by incomplete or granular information. In a rough set approach to data analysis, data sets are usually given as decision tables, which consist of objects (items) described by attributes. Moreover, each object in a decision table is classified into a decision class. Because of the incompleteness of the given attributes, some objects are indiscernible from each other by the attributes, which causes uncertainty in the decision classes. Such an uncertain decision class is approximated by two precise sets, called the lower and upper approximations. The difference between the upper and lower approximations is called a boundary.

One of the major topics in rough set based data analysis is (relative) attribute reduction [37, 39]. Attribute reduction is the problem of deleting condition (explanatory) attributes that are redundant for the classification into the decision classes. Minimal sets of attributes preserving a specified part of the classification information are called reducts. Reducts can be interpreted as sets of attributes that are important for the classification. Several types of reducts have been proposed, according to the part of the information that should be preserved [23, 36, 43, 45]. Originally, Pawlak proposed reducts preserving the positive region [36, 43], which is the union of all lower approximations of decision classes, in other words, the set of all certainly classified objects. Ślȩzak proposed reducts preserving all boundaries of decision classes [45]. One of the authors also proposed two types of reducts, which preserve all lower approximations and all upper approximations of decision classes, and showed that they are equivalent to the reducts preserving the positive region and all boundaries, respectively [23].

Inspired by the above studies, we provide a framework for discussing attribute reduction in rough set theory. We regard attribute reduction as removing condition attributes while preserving some part of the lower/upper approximations of the decision classes, because the approximations summarize the classification ability of the condition attributes. Hence, we define several types of reducts according to structures of the approximations [23, 24]. They are called “structure-based” reducts.

When several types of structure-based reducts are defined, we are naturally interested in whether one type is stronger/weaker than another, in other words, whether one preserves more/less structure than the other. Therefore, we have investigated the strong-weak relation among different types of structure-based reducts. As a result, we obtain a strong-weak hierarchy of structure-based reducts. The hierarchy is useful when we search for the best reduct for an application, because it exposes the trade-off between the size of a reduct (the cost of precise classification) and its classification ability. This is an advantage of having several variations of structure-based reducts.

The rough set model has been extended to apply to various kinds of data sets [12, 16, 22, 29, 38, 43, 47, 53, 54]. Two important extensions are the variable precision rough set model [53, 54] and the dominance-based rough set model [16]. The variable precision rough set model is a probabilistic extension: given precision parameters, the requirements for lower and upper approximations are relaxed to tolerate errors in decision tables. The dominance-based rough set model is applied to decision tables with ordinal attributes, where decision classes are ordered and monotonically depend on the ordinal attributes. It deals with inconsistency between the classification into the ordinal decision classes and the monotonic dependence. Instead of the decision classes themselves, upward and downward unions of decision classes are approximated. In these extended rough set models, we have studied structure-based reducts [20, 21, 25, 26, 31].

In the classical rough set model, it is well known that reducts are associated with the prime implicants of a Boolean function [37, 43]. We can efficiently enumerate reducts by converting the problem to enumerating the prime implicants of that Boolean function. The methodology of solving a problem via the solutions of a Boolean equation is called Boolean reasoning [37, 43]. In this chapter, we propose a unified formulation of the Boolean functions corresponding to several types of reducts in the classical and extended rough set models.

In this chapter, we present our theoretical results on structure-based reducts in the classical and extended rough set models, including definitions of reducts and their strong-weak hierarchy. The results are drawn from our papers [20, 23, 25, 31]. Our main contributions are to propose structure-based reducts, investigate the strong-weak relations among them, and connect reducts with the prime implicants of Boolean functions in a unified formulation in the variable precision and dominance-based rough set models. For the structure-based reducts in the variable precision rough set model, we revise the definitions given in our previous work [20]. Parts of the results were independently developed by other authors [33, 45, 49, 52].

This chapter is organized as follows. In Sect. 7.2, we study structure-based reducts in the classical rough set model. Firstly, we define a decision table and the rough set model of the decision table. Then, we introduce several types of reducts, including structure-based reducts and others, and show that all of them reduce to two different types. Finally, we connect all reducts of each type with the prime implicants of a specific Boolean function. Sections 7.3 and 7.4 are devoted to structure-based reducts in the variable precision rough set model and in the dominance-based rough set model, respectively. These sections have almost the same organization as Sect. 7.2: defining a rough set model and reducts, investigating strong-weak relations among the reducts, and connecting reducts with the prime implicants of Boolean functions. Concluding remarks are given in Sect. 7.5.

2 Structure-Based Attribute Reduction in Rough Set Models

2.1 Decision Tables

In rough set theory, analysed data sets form decision tables [36, 39]. A decision table is defined by \(\mathbb {D}=(U, AT =C\cup \{d\},\{V_a\}_{a\in AT })\). \(U\) is a finite set of objects. \( AT \) is a finite set of attributes. \(V\) is a set of attribute values. Each attribute \(a\in AT \) is a function \(a:U\rightarrow V_a\), where \(V_a\subseteq V\) is the set of values for \(a\). For an object \(u\in U\) and an attribute \(a\in AT \), \(a(u)\) is the value of \(u\) with respect to \(a\). For \(A=\{a_{i_1},a_{i_2},\dots ,a_{i_k}\}\subseteq AT \), \(V_A\) is the Cartesian product of \(\{V_{a_{i_l}}\}_{l=1,2,\dots ,k}\), namely, \(V_A = \varPi _{a_{i_l}\in A}V_{a_{i_l}} = \{(v_{i_1},v_{i_2},\dots ,v_{i_k})\;|\;v_{i_l}\in V_{a_{i_l}},l=1,2,\dots ,k\}\). \(A(u)\) is the tuple of the values of \(u\) with respect to \(A\), namely, \(A(u) = (a_{i_1}(u),a_{i_2}(u),\dots ,a_{i_k}(u))\). The attribute set \( AT \) is divided into a condition attribute set \(C\) and a decision attribute \(d\) in order to investigate the dependency of the decision attribute on the condition attributes, or the causal effect of the condition attributes on the decision attribute. Throughout this chapter, we assume that the objects and the condition attributes are indexed as \(U=\{u_1,u_2,\dots ,u_n\}\) and \(C = \{c_1,c_2,\dots ,c_m\}\), where \(n=|U|\) and \(m=|C|\). Moreover, we define the decision attribute values as \(V_d=\{1,2,\dots ,p\}\).

Remark 1

Decision tables are identical to data sets or data tables for the classification problem or supervised learning in the data mining and machine learning literature, in which condition attributes are called attributes or independent variables, the decision attribute is called a class attribute or dependent variable, and objects are called tuples or samples. In that literature, each object is given by a tuple of attribute values with a class label (decision attribute value). However, we use the form of decision tables in this chapter for two reasons. One is that we often deal with subsets of the attributes, so we prefer to make the symbols of the attributes explicit. The other is to emphasise the view that a relation (e.g. the equivalence relation) on the object set is induced from a structure of the attribute value space (e.g. equivalence of values) through the attributes (functions).

Example 1

Consider a decision table \(\mathbb {D}=(U,C\cup \{d\},\{V_a\})\) about car evaluations in Table 7.1, where \(U=\{u_1,u_2,\dots ,u_7\}\), \(C=\{\text {Pr},\text {Ma},\text {Sa}\}\) and \(d=\text {Ev}\). The attribute value sets are given by \(V_{\text {Pr}} = V_{\text {Ma}} = V_{\text {Sa}} = \{\text {low}, \text {med}, \text {high}\}\), \(V_{\text {Ev}} = \{\text {unacc},\text {acc},\text {good}\}\). Condition attributes Pr, Ma, and Sa indicate price, maintenance cost, and safety of a car, respectively, by values high, med (medium), and low. Decision attribute Ev means evaluation of a car by some customer(s).

Table 7.1 Decision table of car evaluations

The value of \(u_1\) with respect to Pr is \(\text {Pr}(u_1)=\text {high}\), and that of \(u_2\) with respect to Ev is \(\text {Ev}(u_2) = \text {unacc}\). The value tuple of \(u_4\) with respect to \(C = \{\text {Pr},\text {Ma},\text {Sa}\}\) is \(C(u_4) = (\text {med},\text {high},\text {low})\).

Given an attribute subset \(A\subseteq AT \), we define an indiscernibility relation on \(U\) with respect to \(A\), denoted by \(R_A\), as follows:

$$\begin{aligned} R_A = \{(u,u')\in U^2\;|\;a(u)=a(u'),\quad \text {for any }a\in A\}. \end{aligned}$$

\(R_A\) is the set of the object pairs each of which is indiscernible by the given attributes \(A\). Obviously, \(R_A\) is an equivalence relation, which is reflexive, symmetric, and transitive. From \(R_A\), we define the equivalence class of an object \(u\in U\), denoted by \(R_A(u)\), as follows:

$$\begin{aligned} R_A(u) = \{u'\in U\;|\;(u',u)\in R_A\}. \end{aligned}$$

\(R_A(u)\) is the set of objects which have the same values as \(u\) for all attributes in \(A\). We denote the set of all equivalence classes with respect to \(R_A\) by \(U/R_A = \{R_A(u)\;|\;u\in U\}\). Every equivalence class with respect to the decision attribute \(d\) is called a decision class. For each value of the decision attribute \(i\in V_d\), we define the corresponding decision class \(X_i = \{u\in U\;|\;d(u)=i\}\). Clearly, \(\fancyscript{X} = \{X_1,X_2,\dots ,X_p\}\) forms a partition of \(U\).
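To make these definitions concrete, the following minimal Python sketch computes equivalence classes and decision classes from a dict-based decision table. The helper names and the toy rows are ours and purely illustrative (the full contents of Table 7.1 are not reproduced here), so treat the data as hypothetical.

```python
# A decision table as a dict: object -> attribute-value assignment.
# The rows below are illustrative only, not the full Table 7.1.
table = {
    'u1': {'Pr': 'high', 'Ma': 'high', 'Sa': 'low', 'Ev': 'unacc'},
    'u2': {'Pr': 'med',  'Ma': 'low',  'Sa': 'med', 'Ev': 'unacc'},
    'u3': {'Pr': 'med',  'Ma': 'low',  'Sa': 'med', 'Ev': 'acc'},
    'u4': {'Pr': 'med',  'Ma': 'high', 'Sa': 'low', 'Ev': 'acc'},
}

def eq_class(u, A):
    """R_A(u): the objects indiscernible from u on every attribute in A."""
    return frozenset(v for v in table
                     if all(table[v][a] == table[u][a] for a in A))

def partition(A):
    """U/R_A: the set of all equivalence classes of R_A."""
    return {eq_class(u, A) for u in table}

def decision_class(i):
    """X_i: the objects whose decision attribute value is i."""
    return frozenset(u for u in table if table[u]['Ev'] == i)

print(partition(['Pr', 'Ma']))  # {{u1}, {u2, u3}, {u4}} for this toy data
print(decision_class('acc'))    # {u3, u4}
```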

Example 2

Remember \(\mathbb {D}=(U,C\cup \{d\},\{V_a\})\) in Table 7.1. Let \(A = \{\text {Pr},\text {Ma}\}\) be an attribute subset. The indiscernibility relation \(R_A\) is described by the following matrix. The symbol \(*\) indicates that the corresponding object pair \(u_i\) and \(u_j\) is in the indiscernibility relation, i.e., \((u_i,u_j)\in R_A\).

\(\begin{array}{l|lllllll} & u_1 & u_2 & u_3 & u_4 & u_5 & u_6 & u_7 \\ \hline u_1 & * & & & & & & \\ u_2 & & * & * & & * & * & \\ u_3 & & * & * & & * & * & \\ u_4 & & & & * & & & \\ u_5 & & * & * & & * & * & \\ u_6 & & * & * & & * & * & \\ u_7 & & & & & & & * \\ \end{array}\)

From the matrix, we can easily see that the equivalence classes by \(R_A\) form a partition of \(U\), namely, \(U/R_A = \{\{u_1\},\{u_4\},\{u_7\},\{u_2,u_3,u_5,u_6\}\}\).

The decision classes of the decision table \(\mathbb {D}\) are obtained as \(X_{\text {unacc}}=\{u_1,u_2\}\), \(X_{\text {acc}}=\{u_3,u_4,u_5\}\), \(X_{\text {good}}=\{u_6,u_7\}\).

2.2 Rough Set Models

Let \(A\) be a subset of the attribute set \( AT \) and \(X\) be a subset of the object set \(U\). When \(X\) can be represented by a union of elements of \(U/R_A\), we can say that the classification by \(X\) is consistent with the information of \(A\). Such subsets of objects are called definable sets with respect to \(A\). On the other hand, when an object subset \(X\) cannot be represented by any union of elements of \(U/R_A\), the classification by \(X\) is inconsistent with \(A\). The classical Rough Set Model (RSM) [35, 36, 39] deals with this inconsistency by two operators on object sets, called the lower and upper approximations. For \(A\subseteq AT \) and \(X\subseteq U\), the lower approximation \(\mathrm {LA}_A(X)\) and the upper approximation \(\mathrm {UA}_A(X)\) of \(X\) with respect to \(A\) are defined by:

$$\begin{aligned} \mathrm {LA}_A(X)&= \{u\in U\;|\;R_A(u)\subseteq X\},\\ \mathrm {UA}_A(X)&= \{u\in U\;|\;R_A(u)\cap X\ne \emptyset \}. \end{aligned}$$

The difference between \(\mathrm {UA}_A(X)\) and \(\mathrm {LA}_A(X)\) is called the boundary of \(X\) with respect to \(A\), which is defined by:

$$\begin{aligned} \mathrm {BN}_A(X) = \mathrm {UA}_A(X)\setminus \mathrm {LA}_A(X). \end{aligned}$$

\(\mathrm {LA}_A(X)\) is interpreted as the set of objects which are certainly classified to \(X\) in view of \(A\). On the other hand, \(\mathrm {UA}_A(X)\) is the set of objects which are possibly classified to \(X\) in view of \(A\). \(\mathrm {BN}_A(X)\) is the set of objects whose membership to \(X\) is doubtful.
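The two operators translate directly into code. Here is a sketch written so that the equivalence-class map is passed in explicitly; the toy partition below is hypothetical.

```python
def lower_approx(U, X, eq_class):
    """LA_A(X): objects whose equivalence class is contained in X."""
    return {u for u in U if eq_class(u) <= X}

def upper_approx(U, X, eq_class):
    """UA_A(X): objects whose equivalence class intersects X."""
    return {u for u in U if eq_class(u) & X}

def boundary(U, X, eq_class):
    """BN_A(X) = UA_A(X) minus LA_A(X)."""
    return upper_approx(U, X, eq_class) - lower_approx(U, X, eq_class)

# Toy usage with the hand-written partition {u1}, {u2, u3}:
U = {'u1', 'u2', 'u3'}
blocks = {'u1': {'u1'}, 'u2': {'u2', 'u3'}, 'u3': {'u2', 'u3'}}
X = {'u1', 'u2'}
print(lower_approx(U, X, blocks.get))  # {'u1'}
print(upper_approx(U, X, blocks.get))  # {'u1', 'u2', 'u3'}
```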

The approximations are definable sets with respect to \(A\), where a definable set with respect to \(A\) is a union of elements of \(U/R_A\):

$$\begin{aligned} \mathrm {LA}_A(X)&= \bigcup _{R_A(u)\subseteq X}R_A(u)=\bigcup _{u\in \mathrm {LA}_A(X)}R_A(u),\\ \mathrm {UA}_A(X)&= \bigcup _{R_A(u)\cap X\ne \emptyset }R_A(u)=\bigcup _{u\in \mathrm {UA}_A(X)}R_A(u). \end{aligned}$$

The boundary is also definable because \(U/R_A\) is a partition of \(U\).

In fact, \(\mathrm {LA}_A(X)\) and \(\mathrm {UA}_A(X)\) are “lower” and “upper” approximations of \(X\):

$$\begin{aligned} \mathrm {LA}_A(X)\subseteq X\subseteq \mathrm {UA}_A(X). \end{aligned}$$
(7.1)

By the above inclusion relations and the definition of the boundary, it holds that

$$\begin{aligned} \mathrm {LA}_A(X)&= X\setminus \mathrm {BN}_A(X), \end{aligned}$$
(7.2)
$$\begin{aligned} \mathrm {UA}_A(X)&= X\cup \mathrm {BN}_A(X). \end{aligned}$$
(7.3)

For \(B\subset A\subseteq AT \), we have,

$$\begin{aligned} \mathrm {LA}_B(X)\subseteq \mathrm {LA}_A(X)\text { and }\mathrm {UA}_B(X)\supseteq \mathrm {UA}_A(X). \end{aligned}$$
(7.4)

When \(B\) is included in \(A\), the approximations with respect to \(B\) are coarser than those with respect to \(A\). This means that dropping attributes, i.e., information, degrades the accuracy of RSM.

So far, we have approximated \(X\) by two definable sets from below and above. We can also approximate the partition \(\{X, U\setminus {}X\}\) by three definable sets. They are called the positive, boundary, and negative regions of \(X\) with respect to \(A\), denoted by \(\mathrm {POS}_A(X)\), \(\mathrm {BND}_A(X)\), and \(\mathrm {NEG}_A(X)\), respectively:

$$\begin{aligned} \mathrm {POS}_A(X)&= \bigcup \{E\in U/R_A\;|\;E\subseteq X\},\\ \mathrm {BND}_A(X)&= \bigcup \{E\in U/R_A\;|\;E\cap X\ne \emptyset \text { and }E\cap U\setminus X\ne \emptyset \},\\ \mathrm {NEG}_A(X)&= \bigcup \{E\in U/R_A\;|\;E\subseteq U\setminus X\}. \end{aligned}$$

\(\mathrm {POS}_A(X)\) is the union of the elements of \(U/R_A\) which are completely included in \(X\), while \(\mathrm {NEG}_A(X)\) is the union of the elements of \(U/R_A\) which are completely excluded from \(X\). \(\mathrm {BND}_A(X)\) is the union of the remaining elements of \(U/R_A\). Clearly, \(\mathrm {POS}_A(X)\), \(\mathrm {BND}_A(X)\), and \(\mathrm {NEG}_A(X)\) form a partition of \(U\). We can easily see the following correspondence:

$$\begin{aligned} \mathrm {POS}_A(X)&= \mathrm {LA}_A(X),\\ \mathrm {BND}_A(X)&= \mathrm {BN}_A(X),\\ \mathrm {NEG}_A(X)&= U\setminus \mathrm {UA}_A(X). \end{aligned}$$

In the rest of this section, we consider RSM for decision tables, namely, we only deal with approximations of decision classes \(\fancyscript{X} = \{X_1,X_2,\dots ,X_p\}\) with respect to subsets of condition attributes \(A\subseteq C\).

Example 3

Remember the decision classes \(X_{\text {unacc}}=\{u_1,u_2\}\), \(X_{\text {acc}}=\{u_3,u_4,u_5\}\) and \(X_{\text {good}}=\{u_6,u_7\}\) of the decision table in Table 7.1. The lower and upper approximations of \(X_{\text {unacc}}\), \(X_{\text {acc}}\) and \(X_{\text {good}}\) with respect to \(C\) are obtained as follows:

\(\begin{array}{lll@{\quad }lll} \mathrm {LA}_C(X_{\text {unacc}}) & = & \{u_1\}, & \mathrm {UA}_C(X_{\text {unacc}}) & = & \{u_1,u_2,u_3\},\\ \mathrm {LA}_C(X_{\text {acc}}) & = & \{u_4\}, & \mathrm {UA}_C(X_{\text {acc}}) & = & \{u_2,u_3,u_4,u_5,u_6\},\\ \mathrm {LA}_C(X_{\text {good}}) & = & \{u_7\}, & \mathrm {UA}_C(X_{\text {good}}) & = & \{u_5,u_6,u_7\}. \end{array}\)

We can see that \(\mathrm {LA}_C(X_i)\subseteq X_i\subseteq \mathrm {UA}_C(X_i)\) for each \(i=\text {unacc},\text {acc},\text {good}\). Moreover, we can also see that each approximation is the union of equivalence classes included in the approximation, e.g., \(\mathrm {UA}_C(X_{\text {acc}}) = \{u_2,u_3\}\cup \{u_4\}\cup \{u_5,u_6\}\).

We reduce condition attributes to \(A = \{\text {Pr}\}\). The approximations become:

\(\begin{array}{lll@{\quad }lll} \mathrm {LA}_A(X_{\text {unacc}}) & = & \{u_1\}, & \mathrm {UA}_A(X_{\text {unacc}}) & = & \{u_1,u_2,u_3,u_4,u_5,u_6\},\\ \mathrm {LA}_A(X_{\text {acc}}) & = & \emptyset , & \mathrm {UA}_A(X_{\text {acc}}) & = & \{u_2,u_3,u_4,u_5,u_6\},\\ \mathrm {LA}_A(X_{\text {good}}) & = & \{u_7\}, & \mathrm {UA}_A(X_{\text {good}}) & = & \{u_2,u_3,u_4,u_5,u_6,u_7\}. \end{array}\)

The approximations with respect to \(A\) are coarser than those with respect to \(C\), namely, \(\mathrm {LA}_A(X_i)\subseteq \mathrm {LA}_C(X_i)\) and \(\mathrm {UA}_A(X_i)\supseteq \mathrm {UA}_C(X_i)\) for each \(i=\text {unacc},\text {acc},\text {good}\).

For every \(X_i\), the lower approximation \(\mathrm {LA}_A(X_i)\) and the boundary \(\mathrm {BN}_A(X_i)\) can be represented using all upper approximations of decision classes \(\mathrm {UA}_A(X_1)\), \(\mathrm {UA}_A(X_2), \ldots , \mathrm {UA}_A(X_p)\):

$$\begin{aligned} \mathrm {LA}_A(X_i)&= \mathrm {UA}_A(X_i)\setminus \bigcup _{j\in V_d\setminus \{i\}}\mathrm {UA}_A(X_j), \end{aligned}$$
(7.5)
$$\begin{aligned} \mathrm {BN}_A(X_i)&= \mathrm {UA}_A(X_i)\cap \bigcup _{j\in V_d\setminus \{i\}}\mathrm {UA}_A(X_j). \end{aligned}$$
(7.6)

All upper approximations form a cover of \(U\):

$$\begin{aligned} U = \bigcup _{i\in V_d}\mathrm {UA}_A(X_i). \end{aligned}$$
(7.7)

A positive region with respect to \(A\subseteq C\) is also defined for the decision attribute \(d\), or equivalently, for the decision table \(\mathbb {D}\). It is the union of the positive regions of all decision classes, i.e., the set of objects which are certainly classified to exactly one of the decision classes:

$$\begin{aligned} \mathrm {POS}_A(d) = \bigcup _{i\in V_d}\mathrm {POS}_A(X_i). \end{aligned}$$

A generalized decision function [1, 45] with respect to \(A\subseteq C\), denoted by \(\partial _A:U\rightarrow 2^{V_d}\), provides a useful representation of RSM. For \(u\in U\), \(\partial _A(u)\) is a set of decision attribute values or decision classes to which \(u\) is possibly classified:

$$\begin{aligned} \partial _A(u) = \{i\in V_d\;|\;X_i\cap R_A(u)\ne \emptyset \}. \end{aligned}$$

The generalized decision function gives an object-wise view of RSM. The lower and upper approximations can be expressed by the generalized decision function:

$$\begin{aligned} \mathrm {LA}_A(X_i)&= \{u\in U\;|\;\partial _A(u)=\{i\}\},\\ \mathrm {UA}_A(X_i)&= \{u\in U\;|\;\partial _A(u)\ni i\}. \end{aligned}$$

Because \(\partial _A(u)\) is defined based on \(R_A(u)\), we have

$$\begin{aligned} \partial _A(u) = \partial _A(u') \text { if }(u,u')\in R_A, \end{aligned}$$

and because each object \(u\) is included in at least one upper approximation, we have

$$\begin{aligned} \partial _A(u)\ne \emptyset . \end{aligned}$$

The monotonic property of upper approximations is represented as:

$$\begin{aligned} B\subseteq A \Rightarrow \partial _B(u) \supseteq \partial _A(u) \quad \text { for all }u\in U. \end{aligned}$$
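The object-wise view is easy to compute. Below is a minimal sketch of \(\partial _A\) over a dict-based table; the function and the toy data are ours, for illustration only.

```python
def generalized_decision(u, A, table, d):
    """∂_A(u): the decision values occurring in R_A(u)."""
    R = [v for v in table
         if all(table[v][a] == table[u][a] for a in A)]
    return frozenset(table[v][d] for v in R)

# Toy table: u1 and u2 are indiscernible on 'c' but differ on 'd'.
toy = {'u1': {'c': 0, 'd': 'a'},
       'u2': {'c': 0, 'd': 'b'},
       'u3': {'c': 1, 'd': 'b'}}
print(generalized_decision('u1', ['c'], toy, 'd'))  # frozenset({'a', 'b'})
print(generalized_decision('u3', ['c'], toy, 'd'))  # frozenset({'b'})
# LA_A(X_i) = {u | ∂_A(u) == {i}};  UA_A(X_i) = {u | i in ∂_A(u)}
```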

Example 4

Remember \(\mathbb {D}=(U,C\cup \{d\},\{V_a\})\) in Table 7.1. The generalized decision function \(\partial _C\) is obtained as follows.

\(\begin{array}{l} \partial _C(u_1) = \{\text {unacc}\},\quad \partial _C(u_2) = \partial _C(u_3) = \{\text {unacc},\text {acc}\}, \\ \partial _C(u_4) = \{\text {acc}\},\quad \partial _C(u_5) = \partial _C(u_6) = \{\text {acc},\text {good}\}, \\ \partial _C(u_7) = \{\text {good}\}. \end{array}\)

For \(A\subseteq C\), a quality of classification (or quality of approximation) of the decision attribute \(d\) with respect to \(A\) is defined by:

$$\begin{aligned} \gamma _A(d) = \frac{|\mathrm {POS}_A(d)|}{|U|}. \end{aligned}$$
(7.8)

It measures the proportion of objects that are classified with certainty by RSM. For instance, in Table 7.1 we have \(\mathrm {POS}_C(d)=\{u_1,u_4,u_7\}\), hence \(\gamma _C(d)=3/7\).

2.3 Reducts in Rough Set Models

2.3.1 Preserving Positive Region, Quality, and Generalized Decisions

Attribute reduction aims to find important subsets of condition attributes by dropping as many of the other condition attributes as possible while preserving some specific information of RSM for a decision table \(\mathbb {D}\). A minimal subset of condition attributes preserving the information is called a relative (or decision) reduct. In this chapter, we call it a “reduct” for short. Reducts were originally defined so as to preserve the positive region \(\mathrm {POS}_C(d)\) [36, 43].

Definition 1

([36, 43]) A reduct is a minimal condition attribute subset \(A \subseteq C\) satisfying the following condition:

$$\begin{aligned} \mathrm {POS}_A(d)=\mathrm {POS}_C(d). \end{aligned}$$
(P)

Here, the minimality is defined in terms of the set inclusion, i.e., there is no proper subset \(A'\subset A\) satisfying (P).

A condition attribute subset \(A\) satisfying (P) preserves the information of the certain classification in the decision table. Generally, more than one reduct exists in a decision table. The intersection of all reducts is called the core. Every element of the core is an essential condition attribute for preserving the information of \(\mathrm {POS}_C(d)\). The core can be empty. On the other hand, the condition attributes which do not belong to any reduct can be dropped without deterioration of the information. We call this original type of reduct a P-reduct.

Remark 2

Condition (P) is monotonic with respect to the set inclusion of condition attributes, i.e., for \(A'\subseteq A\subseteq C\), \(\mathrm {POS}_{A'}(d)=\mathrm {POS}_{C}(d)\) implies \(\mathrm {POS}_{A}(d)=\mathrm {POS}_{C}(d)\). Hence, the above minimality condition for \(A\) is equivalent to the condition that there is no attribute \(a\in A\) such that \(A\setminus \{a\}\) satisfies (P).
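Remark 2 suggests a simple computational test: thanks to monotonicity, a candidate \(A\) is a reduct once it satisfies the condition and no single-attribute deletion does. A brute-force sketch (ours, fine only for small \(|C|\); `preserves` stands for any monotone preserving condition such as (P)):

```python
from itertools import combinations

def is_reduct(A, preserves):
    """A preserves the condition, and no single attribute can be dropped."""
    return preserves(A) and all(not preserves(A - {a}) for a in A)

def all_reducts(C, preserves):
    """Enumerate reducts by brute force over all subsets of C."""
    subsets = [set(s) for r in range(len(C) + 1)
               for s in combinations(sorted(C), r)]
    return [A for A in subsets if is_reduct(A, preserves)]

# Toy usage: pretend the condition holds iff A contains Pr and (Ma or Sa).
cond = lambda A: 'Pr' in A and bool({'Ma', 'Sa'} & A)
print(all_reducts({'Pr', 'Ma', 'Sa'}, cond))  # two reducts: {Pr,Ma}, {Pr,Sa}
```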

Example 5

Remember \(\mathbb {D}=(U,C\cup \{d\},\{V_a\})\) in Table 7.1. The set of all condition attributes \(C\) obviously satisfies condition (P), but it is not a P-reduct because its proper subset \(A = \{\text {Pr},\text {Ma}\}\) satisfies (P). On the other hand, \(A\) is a P-reduct because no proper subset of \(A\) preserves the positive region: \(\mathrm {POS}_{\{\text {Pr}\}}(d) = \{u_1,u_7\}\) and \(\mathrm {POS}_{\{\text {Ma}\}}(d) = \mathrm {POS}_{\emptyset }(d) = \emptyset \).

We can define another kind of reduct, preserving the quality of classification [23, 39, 40].

Definition 2

([40]) A Q-reduct is a minimal condition attribute subset \(A \subseteq C\) satisfying the following condition:

$$\begin{aligned} \gamma _A(d)=\gamma _C(d). \end{aligned}$$
(Q)

Clearly, condition (P) implies (Q). In RSM, the converse is also true, because of the monotonic property of \(\mathrm {POS}(d)\), namely, for \(A'\subseteq A\), \(\mathrm {POS}_{A'}(d)\subseteq \mathrm {POS}_A(d)\). We also call Q-reducts measure-based reducts, because they preserve a predefined measure of the information of RSM.

Bazan et al. [1] and Ślȩzak [45] proposed reducts preserving the generalized decision function \(\partial \).

Definition 3

([1, 45]) A G-reduct is a minimal condition attribute subset \(A \subseteq C\) satisfying the following condition:

$$\begin{aligned} \partial _A(u)=\partial _C(u) \text { for all }u\in U. \end{aligned}$$
(G)

As shown in the next section, condition (G) implies (P).

2.3.2 Structure-Based Reducts

Structure-based reducts, proposed by one of the authors [20, 23], are defined to preserve families of object sets (structures) which are composed of lower and upper approximations, or positive, boundary, and negative regions. Hence, reducts of condition (P) can be seen as structure-based.

Now, we introduce structure-based reducts proposed in [23]. First, we define a reduct preserving all lower approximations. The preservation of the lower approximations implies the sustenance of certain classification ability.

Definition 4

([23]) An L-reduct is a minimal condition attribute subset \(A\subseteq C\) preserving the following condition:

$$\begin{aligned} \mathrm {LA}_A(X_i)=\mathrm {LA}_C(X_i)\text { for all }i\in V_d. \end{aligned}$$
(L)

Clearly, condition (L) implies (P) as well as (Q). In RSM, the converse is also true, because the lower approximations \(\mathrm {LA}(X_i)\), \(i=1,2,\dots ,p\) are pairwise disjoint, and they are monotonically decreasing with respect to the set inclusion of condition attributes.

However, even if we preserve the lower approximations \(\mathrm {LA}_C(X_i)\), \(i=1,2,\ldots ,p\), we may lose the information of the boundaries \(\mathrm {BN}_C(X_i)\), \(i=1,2,\ldots ,p\) and of the upper approximations \(\mathrm {UA}_C(X_i)\), \(i=1,2,\ldots ,p\).

Ślȩzak [45] proposed a type of reducts preserving all boundaries.

Definition 5

([45]) A B-reduct is a minimal condition attribute subset \(A\subseteq C\) preserving the following condition:

$$\begin{aligned} \mathrm {BN}_A(X_i) = \mathrm {BN}_C(X_i) \quad \text { for all }i\in V_d. \end{aligned}$$
(B)

The preservation of boundaries implies the protection against uncertainty expansion. Ślȩzak [45] also showed that condition (B) is equivalent to (G). Hence, we have that a B-reduct is a G-reduct and vice versa.

On the other hand, we proposed a reduct preserving all upper approximations [23].

Definition 6

([23]) A U-reduct is a minimal condition attribute subset \(A\subseteq C\) preserving the following condition:

$$\begin{aligned} \mathrm {UA}_A(X_i) = \mathrm {UA}_C(X_i) \quad \text { for all }i\in V_d. \end{aligned}$$
(U)

By definition, the classification ability of the upper approximations is equal to that of the generalized decision function. Hence, condition (U) is equivalent to (G), and we have that a U-reduct is a G-reduct and vice versa.

From Eqs. (7.2) and (7.5), we know that the lower approximations are obtained from the upper approximations as well as from the boundaries. This fact implies that preserving all upper approximations, or preserving all boundaries, entails preserving all lower approximations.

To sum up the above discussion, we have the next theorem.

Theorem 1

([23, 45]) Let \(A\) be a subset of \(C\). We have the following statements.

  1. (a)

    \(A\) is a Q-reduct if and only if \(A\) is an L-reduct.

  2. (b)

    \(A\) is a P-reduct if and only if \(A\) is an L-reduct.

  3. (c)

    \(A\) is a G-reduct if and only if \(A\) is a U-reduct.

  4. (d)

    \(A\) is a B-reduct if and only if \(A\) is a U-reduct.

  5. (e)

If \(A\) is a U-reduct (equivalently, a B-reduct), then \(A\) satisfies condition (L).

All statements in the theorem can be easily proved by the equations which appeared in Sect. 7.2.2. For example, to prove Theorem 1(d), we show that preserving all boundaries implies preserving all upper approximations by Eq. (7.3), and show the converse by Eq. (7.6).

From Theorem 1(e), if \(A\) is a U-reduct then there exists an L-reduct \(B\subseteq A\). Note that the converse is not always true, i.e., for an L-reduct \(B\), there is no guarantee that there exists a U-reduct \(A \supseteq B\).

The relations among the 6 types of reducts are depicted in Fig. 7.1. Reducts located in the upper part of the figure preserve more regions. Therefore, such reducts are larger, in the sense of set inclusion, than the reducts located in the lower part. A line segment connecting two types of reducts indicates that each reduct \(A\) of the upper type satisfies the preserving condition of the lower type. From the figure, we know that there are only 2 essentially different types of reducts, U-reducts and L-reducts, and that U-reducts are stronger than L-reducts.

Fig. 7.1 Strong-weak hierarchy of 6 types of reducts in RSM

Remark 3

As shown in Theorem 1, the six types of reducts are reduced to two types. However, it is important to define all possible types of reducts and organize them, for two reasons. One is that when we need to refer to different definitions of reducts (e.g. different authors may give different definitions), we can easily identify which of them are equivalent. The other is that reducts which are equivalent in RSM (e.g. U-reducts and B-reducts) may become different in an extended RSM (e.g. the variable precision RSM).

Remark 4

From the discussion above, we know that a U-reduct preserves more information than an L-reduct. However, when \(p=2\), we have the following relation:

$$\begin{aligned} \mathrm {UA}_A(X_1)=U\setminus \mathrm {LA}_A(X_2), \quad \mathrm {UA}_A(X_2)=U\setminus \mathrm {LA}_A(X_1). \end{aligned}$$

Namely, we obtain upper approximations from lower approximations. Hence, in that case, an L-reduct is a U-reduct.

Remark 5

From Theorem 1, we see that preserving the measure \(\gamma \) is equivalent to preserving the lower approximations. Conversely, we can define a measure whose preservation is equivalent to preserving the upper approximations. For example [23], if we define,

$$\begin{aligned} \sigma _A(d)=\frac{\sum _{i\in V_d} |U\setminus \mathrm {UA}_A(X_i)|}{(p-1)|U|}, \end{aligned}$$

then \(\sigma _A(d)=\sigma _C(d)\) is the same as condition (U).
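For instance, applying this measure to the approximations of Example 3 (where \(p=3\) and \(|U|=7\)) gives

$$\begin{aligned} \sigma _C(d)=\frac{|\{u_4,u_5,u_6,u_7\}|+|\{u_1,u_7\}|+|\{u_1,u_2,u_3,u_4\}|}{(3-1)\cdot 7}=\frac{10}{14}=\frac{5}{7}. \end{aligned}$$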

2.4 Boolean Functions Representing Reducts

Boolean reasoning [37] is a methodology in which the solutions of a given problem are associated with those of a Boolean equation. In this section, we develop positive (monotone) Boolean functions whose solutions are given by the condition attribute subsets satisfying the preserving conditions (L) or (U). Moreover, the prime implicants of the Boolean functions exactly correspond to L-reducts or U-reducts. The Boolean functions are useful for enumerating reducts.

The results of this section are well known and have appeared in many papers, e.g. [1, 43, 45, 50], though in slightly different expressions from ours. A unified formulation of the Boolean functions for the different types of reducts is provided using the generalized decision function.

Here, we briefly introduce Boolean functions and Boolean formulas [9, 14]. Let \(q\) be a natural number. A Boolean function is a mapping \(f: \{0,1\}^q\rightarrow \{0,1\}\), where \(w\in \{0,1\}^q\) is called a Boolean vector whose \(i\)th component is \(w_i\). Let \(x_1,x_2, \ldots , x_q\) be Boolean variables. A Boolean formula in the Boolean variables \(x_1,x_2, \ldots , x_q\) is a composition of 0, 1, the variables, and the operators of conjunction \(\wedge \), disjunction \(\vee \), and complementation \(\overline{\cdot }\), such as \(x_1\wedge (x_2\vee \overline{x}_3)\), \((x_1\wedge \overline{x}_2)\vee x_3\), and so on (for a complete definition, see e.g. [9]). A Boolean formula defines a Boolean function of the variables \(x_1,x_2, \ldots , x_q\); conversely, any Boolean function can be expressed by a Boolean formula. For two Boolean functions \(f\) and \(g\), \(g\le f\) means that \(f\) and \(g\) satisfy \(g(w)\le f(w)\) for all \(w\in \{0,1\}^q\), and \(g<f\) means that \(g\le f\) and \(g\ne f\). A Boolean function \(f\) is positive or monotone if \(w\le w'\) implies \(f(w)\le f(w')\) for all \(w,w'\in \{0,1\}^q\).

Boolean variables \(x_1,x_2,\ldots \) and their complements \(\overline{x}_1,\overline{x}_2,\ldots \) are called literals. A clause (resp., term) is a disjunction (resp., conjunction) of at most one of \(x_i\) and \(\overline{x}_i\) for each variable. The empty disjunction (resp., conjunction) is denoted by \(\bot \) (resp., \(\top \)). A clause \(c\) (resp., term \(t\)) is an implicate (resp., implicant) of a function \(f\) if \(f\le c\) (resp., \(t\le f\)). Moreover, it is prime if there is no implicate \(c'<c\) (resp., no implicant \(t'>t\)) of \(f\). A conjunctive normal form (CNF) (resp., disjunctive normal form (DNF)) of a function \(f\) is a Boolean formula of \(f\) which is expressed as a conjunction of implicates (resp., disjunction of implicants) of the function, and it is prime if all its members are prime. The complete CNF (resp., DNF) of \(f\) is the conjunction of all prime implicates (resp., disjunction of all prime implicants) of \(f\). When \(f\) is positive, the prime CNF (resp., DNF) of \(f\) is unique, and it is the complete CNF (resp., DNF) of \(f\).

First, we associate conditions (L) (or (P)) and (U) (or (B)) with the conditions of the generalized decision function. As mentioned in the previous section, condition (U) is equivalent to (G).

Lemma 1

([23, 50]) Let \(A\) be a subset of \(C\). We have the following statements.

  • Condition (L) is equivalent to:

    $$\begin{aligned} \partial _A(u) = \partial _C(u) \quad \text { for all }u\in U \quad \text {such that }|\partial _C(u)|=1. \end{aligned}$$
    (LG)
  • Condition (U) is equivalent to (G), i.e.,

    $$\begin{aligned} \partial _A(u) = \partial _C(u) \quad \text {for all }u\in U. \end{aligned}$$
    (G)

The next lemma is the heart of the Boolean reasoning, which connects two notions: “preserving” and “discerning”.

Lemma 2

For \(u\in U\), the following assertions are equivalent.

  • \(\partial _A(u)=\partial _C(u)\).

  • \(\forall u'\in U,\;(\partial _C(u')\ne \partial _C(u)\Rightarrow \exists a\in A,\;(u',u)\not \in R_{\{a\}})\)

Hence, to preserve the generalized decision of an object \(u\), we should discern \(u\) from other objects \(u'\) having different generalized decisions from that of \(u\).

Using Lemmas 1 and 2, we define two Boolean formulas, called discernibility functions. First, we define the discernibility matrix \(M=(m_{ij})_{i,j=1,2,\dots ,n}\), where the \(ij\)-entry \(m_{ij}\) is the set of condition attributes which discern objects \(u_i\) and \(u_j\),

$$\begin{aligned} m_{ij} = \{c\in C\;|\;c(u_i)\ne c(u_j)\}. \end{aligned}$$

Then, we define discernibility functions.

Definition 7

Discernibility functions \(F^{\mathrm {U}}\) and \(F^{\mathrm {L}}\) are defined as follows:

$$\begin{aligned} F^{\mathrm {U}}(\tilde{c}_1,\dots ,\tilde{c}_m)&= \bigwedge _{i,j|\partial _C(u_j)\ne \partial _C(u_i)}\bigvee _{c\in m_{ij}}\tilde{c},\\ F^{\mathrm {L}}(\tilde{c}_1,\dots ,\tilde{c}_m)&= \bigwedge _{i|\;|\partial _C(u_i)|=1}\bigwedge _{j|\partial _C(u_j)\ne \partial _C(u_i)}\bigvee _{c\in m_{ij}}\tilde{c}, \end{aligned}$$

where \(\tilde{c}_i\) is a Boolean variable corresponding to \(i\)th condition attribute \(c_i\).

For \(A\subseteq C\), we consider a Boolean vector \(\tilde{c}^A=(\tilde{c}^A_1,\dots ,\tilde{c}^A_m)\), where,

$$\begin{aligned} \tilde{c}^A_i = {\left\{ \begin{array}{ll} 1 &{} \text {if }c_i\in A, \\ 0 &{} \text {if }c_i\not \in A. \end{array}\right. } \end{aligned}$$

Let \(F^{\mathrm {U}}(\tilde{c}^A)=1\). Then, for each pair \(u_i\) and \(u_j\) such that \(\partial _C(u_i)\ne \partial _C(u_j)\), the intersection of \(A\) and \(m_{ij}\) is not empty by the definition of \(F^{\mathrm {U}}\). By Lemma 2, in that case, \(\partial _A(u)=\partial _C(u)\) holds for each \(u\). We have a similar consequence when \(F^{\mathrm {L}}(\tilde{c}^A)=1\). Therefore, the following theorem holds. Let \({\phi _A=\bigwedge \{\tilde{c}\;|\;c \in A\}}\).

Theorem 2

([43, 45, 50]) Let \(A\) be a subset of \(C\). We have the following equivalences:

  • \(A\) satisfies (G), i.e., (U) if and only if \(F^{\mathrm {U}}(\tilde{c}^A)=1\). Moreover, \(A\) is a U-reduct in RSM if and only if \(\phi _{A}\) is a prime implicant of \(F^{\mathrm {U}}\),

  • \(A\) satisfies (LG), i.e., (L) if and only if \(F^{\mathrm {L}}(\tilde{c}^A)=1\). Moreover, \(A\) is an L-reduct in RSM if and only if \(\phi _{A}\) is a prime implicant of \(F^{\mathrm {L}}\).

Definition 7 gives CNFs of \(F^{\mathrm {U}}\) and \(F^{\mathrm {L}}\), from which the prime CNFs of the functions can be easily obtained. Because the functions are positive, the terms of the prime DNF of each function are exactly the prime implicants of the function. Therefore, all reducts of each type appear in the prime DNF of the corresponding function. The problem of converting the prime CNF of a positive Boolean function to its prime DNF is called the dualization problem [14]. Below, we show an example of enumerating reducts by solving the dualization problems of the discernibility functions.
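As an illustrative sketch (ours, not the original authors' code), the whole pipeline, discernibility matrix, CNF clauses of Definition 7, and brute-force dualization, fits in a few lines of Python; applied to the data of Table 7.1 it should reproduce the reducts derived in Example 6 below.

```python
from itertools import combinations

def reducts_by_discernibility(table, C, d, kind='U'):
    """Enumerate U-reducts (kind='U') or L-reducts (kind='L') by Boolean reasoning."""
    objs = sorted(table)
    def delta(u):  # generalized decision ∂_C(u)
        return frozenset(table[v][d] for v in objs
                         if all(table[v][c] == table[u][c] for c in C))
    ds = {u: delta(u) for u in objs}
    clauses = set()
    for u, v in combinations(objs, 2):
        if ds[u] == ds[v]:
            continue                      # same generalized decision: no clause
        if kind == 'L' and len(ds[u]) != 1 and len(ds[v]) != 1:
            continue                      # F^L: at least one side has |∂_C| = 1
        clauses.add(frozenset(c for c in C if table[u][c] != table[v][c]))
    satisfies = lambda A: all(A & cl for cl in clauses)      # evaluate the CNF
    subsets = [frozenset(s) for r in range(len(C) + 1)
               for s in combinations(C, r)]
    good = [A for A in subsets if satisfies(A)]
    return [A for A in good if not any(B < A for B in good)]  # minimal = reducts
```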

Example 6

Remember the decision table \(\mathbb {D}=(U,C\cup \{d\},\{V_a\})\) in Table 7.1. In Table 7.2, we show again the decision table \(\mathbb {D}\) with the generalized decision function \(\partial _C\).

Table 7.2 The decision table in Table 7.1 with the generalized decision function

The discernibility matrix is obtained as below. The sign \(*\) attached to an object \(u_i\) means that the generalized decision of \(u_i\) is a singleton, or equivalently, that \(u_i\) is in the positive region \(\mathrm {POS}_C(d)\).

\( \begin{array}{l|ccccccc} & u_1 & u_2 & u_3 & u_4 & u_5 & u_6 & u_7 \\ \hline u_1^* & \emptyset & C & C & \{\text {Pr}\} & C & C & C \\ u_2 & C & \emptyset & \emptyset & \{\text {Ma},\text {Sa}\} & \{\text {Sa}\} & \{\text {Sa}\} & \{\text {Pr}\} \\ u_3 & C & \emptyset & \emptyset & \{\text {Ma},\text {Sa}\} & \{\text {Sa}\} & \{\text {Sa}\} & \{\text {Pr}\} \\ u_4^* & \{\text {Pr}\} & \{\text {Ma},\text {Sa}\} & \{\text {Ma},\text {Sa}\} & \emptyset & \{\text {Ma},\text {Sa}\} & \{\text {Ma},\text {Sa}\} & C \\ u_5 & C & \{\text {Sa}\} & \{\text {Sa}\} & \{\text {Ma},\text {Sa}\} & \emptyset & \emptyset & \{\text {Pr},\text {Sa}\} \\ u_6 & C & \{\text {Sa}\} & \{\text {Sa}\} & \{\text {Ma},\text {Sa}\} & \emptyset & \emptyset & \{\text {Pr},\text {Sa}\} \\ u_7^* & C & \{\text {Pr}\} & \{\text {Pr}\} & C & \{\text {Pr},\text {Sa}\} & \{\text {Pr},\text {Sa}\} & \emptyset \\ \end{array} \)

The discernibility functions \(F^{\mathrm {L}}\) and \(F^{\mathrm {U}}\) for the decision table are calculated from the matrix (after absorbing redundant clauses) as:

$$\begin{aligned} F^{\mathrm {L}}&= \tilde{\text {Pr}}\wedge (\tilde{\text {Ma}}\vee \tilde{\text {Sa}}) = (\tilde{\text {Pr}}\wedge \tilde{\text {Ma}})\vee (\tilde{\text {Pr}}\wedge \tilde{\text {Sa}}),\\ F^{\mathrm {U}}&= \tilde{\text {Pr}}\wedge \tilde{\text {Sa}}\wedge (\tilde{\text {Ma}}\vee \tilde{\text {Sa}}) = \tilde{\text {Pr}}\wedge \tilde{\text {Sa}}. \end{aligned}$$

Therefore, there are two L-reducts {Pr, Ma} and {Pr, Sa}, and one U-reduct {Pr, Sa}.

In this case, we would select the U-reduct {Pr, Sa}: it preserves more structure, while the L-reduct {Pr, Ma} is of the same size.

3 Structure-Based Attribute Reduction in Variable Precision Rough Set Models

3.1 Rough Membership Function

Decision tables can be inconsistent not only because of a lack of knowledge (condition attributes) related to the decision attribute but also because of noise in the observation of attribute values. In the latter case, the classical RSM is not very useful because it does not permit any errors in the classification of objects into the lower approximations.

To overcome this shortcoming of the classical RSM, the variable precision rough set model (VPRSM) was proposed [53, 54]. Let \(\mathbb {D}=(U, AT =C\cup \{d\},\{V_a\}_{a\in AT })\) be a decision table. In the definitions of lower and upper approximations in VPRSM, the following rough membership function of an object \(u\) with respect to an object set \(X\subseteq U\) and an attribute set \(A\subseteq AT \) plays an important role:

$$\begin{aligned} \mu _{X}^A(u)=\frac{|R_A(u)\cap X|}{|R_A(u)|}. \end{aligned}$$

The value \(\mu _X^A(u)\) gives the degree to which the object \(u\) belongs to the set \(X\) under the attribute set \(A\). It can be interpreted as the conditional probability that an object belongs to \(X\) given that it belongs to \(R_A(u)\).
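A one-function sketch of \(\mu _X^A\), under the dict-based table representation used in the earlier snippets (the helper name is ours):

```python
def rough_membership(u, X, A, table):
    """mu_X^A(u) = |R_A(u) ∩ X| / |R_A(u)|."""
    R = [v for v in table
         if all(table[v][a] == table[u][a] for a in A)]
    return sum(1 for v in R if v in X) / len(R)
```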

Because the rough membership function of an object depends not on the object itself but on its equivalence class, we also define a rough membership function of an equivalence class \(E\in U/R_A\) for \(X\):

$$\begin{aligned} \mu _{X}(E)=\frac{|E\cap X|}{|E|}. \end{aligned}$$

An important property of the function is that given two equivalence classes \(E,E'\in U/R_A\) the rough membership of the union \(E\cup E'\) falls between those of \(E\) and \(E'\), namely,

$$\begin{aligned} \min \{\mu _{X}(E),\mu _{X}(E')\}\le \mu _X(E\cup E')\le \max \{\mu _{X}(E),\mu _{X}(E')\}. \end{aligned}$$
(7.9)

3.2 Variable Precision Rough Set Models

Given precision parameters \(0\le \beta <\alpha \le 1\), lower and upper approximations of \(X\) with respect to \(A\) in VPRSM are defined as:

$$\begin{aligned} \mathrm {LA}_{A}^{\alpha }(X)&= \{u\in U\;|\;\mu _{X}^A(u)\ge \alpha \}, \\ \mathrm {UA}_{A}^{\beta }(X)&= \{u\in U\;|\;\mu _{X}^A(u)>\beta \}. \end{aligned}$$

The boundary of \(X\) is defined by \(\mathrm {BN}_A^{\alpha ,\beta }(X)= \mathrm {UA}_{A}^{\beta }(X) \setminus \mathrm {LA}_{A}^{\alpha }(X)\). When \(\alpha =1\) and \(\beta =0\), the approximations of \(X\) are the same as those of the classical RSM. \(\mathrm {LA}_{A}^{\alpha }(X)\) is the set of objects whose degrees of membership to \(X\) are not less than \(\alpha \). On the other hand, \(\mathrm {UA}_{A}^{\beta }(X)\) is the set of objects whose degrees of membership to \(X\) are more than \(\beta \). In this chapter, we restrict our discussion to the case where \(\alpha =1-\beta \) and \(\beta \in [0,0.5)\). In this case, we have the duality \(\mathrm {LA}_A^\alpha (X)=U\setminus \mathrm {UA}_A^\beta (U\setminus X)\), because \(\mu ^A_X(u)=1-\mu ^A_{U\setminus X}(u)\). We call \(\beta \) an admissible error rate. We denote \(\mathrm {LA}_A^{1-\beta }(X)\) and \(\mathrm {BN}_A^{1-\beta ,\beta }(X)\) by \(\mathrm {LA}_A^{\beta }(X)\) and \(\mathrm {BN}_A^{\beta }(X)\), respectively.
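A sketch of the \(\beta \)-approximations (with \(\alpha =1-\beta \)); `mu` is assumed to behave like the rough membership function sketched above:

```python
def vprs_lower(U, X, mu, beta):
    """LA_A^beta(X): membership degree at least 1 - beta."""
    return {u for u in U if mu(u, X) >= 1 - beta}

def vprs_upper(U, X, mu, beta):
    """UA_A^beta(X): membership degree strictly above beta."""
    return {u for u in U if mu(u, X) > beta}

def vprs_boundary(U, X, mu, beta):
    """BN_A^beta(X) = UA_A^beta(X) minus LA_A^beta(X)."""
    return vprs_upper(U, X, mu, beta) - vprs_lower(U, X, mu, beta)
```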

Differently from (7.1) in RSM, we do not always have \(\mathrm {LA}_A^\beta (X)\subseteq X\) and \(\mathrm {UA}_A^\beta (X)\supseteq X\). However, we have

$$\begin{aligned} \mathrm {LA}_A^\beta (X)\subseteq \mathrm {UA}_A^\beta (X), \end{aligned}$$
(7.10)

because \(1-\beta > \beta \) when \(\beta < 0.5\). Moreover, we also have

$$\begin{aligned} \mathrm {LA}_A^\beta (X)\cap \mathrm {LA}_A^\beta (X')=\emptyset , \end{aligned}$$
(7.11)

for any disjoint subsets \(X,X'\subseteq U\), \(X\cap X'=\emptyset \), because \(\beta < 0.5\). Because of the inclusion relation (7.10), each of the lower approximation, the upper approximation, and the boundary can be represented by the other two:

$$\begin{aligned} \mathrm {UA}_A^\beta (X)&= \mathrm {LA}_A^\beta (X)\cup \mathrm {BN}_A^\beta (X),\\ \mathrm {LA}_A^\beta (X)&= \mathrm {UA}_A^\beta (X)\setminus \mathrm {BN}_A^\beta (X). \end{aligned}$$

The monotonic property (7.4) does not hold either, which causes difficulties in defining and enumerating reducts in VPRSM.

We can define positive, boundary, and negative regions in the same manner as in the classical RSM:

$$\begin{aligned} \mathrm {POS}_A^\beta (X)&= \bigcup \{R_A(u)\;|\;\mu _{X}^A(u) \ge 1-\beta \},\\ \mathrm {BND}_A^\beta (X)&= \bigcup \{R_A(u)\;|\;\mu _{X}^A(u) \in (\beta ,1-\beta )\},\\ \mathrm {NEG}_A^\beta (X)&= \bigcup \{R_A(u)\;|\;\mu _{X}^A(u) \le \beta \}. \end{aligned}$$

Clearly, we have,

$$\begin{aligned} \mathrm {POS}_A^\beta (X)&= \mathrm {LA}_A^\beta (X),\\ \mathrm {BND}_A^\beta (X)&= \mathrm {BN}_A^\beta (X),\\ \mathrm {NEG}_A^\beta (X)&= U\setminus \mathrm {UA}_A^\beta (X). \end{aligned}$$

In the rest of this section, we consider VPRSM for a decision table \(\mathbb {D}=(U,C\cup \{d\},\{V_a\})\). For each decision attribute value \(i\in V_d\), the decision class is \(X_i=\{u\in U\;|\;d(u)=i\}\). The set of all decision classes is denoted by \(\fancyscript{X}=\{X_1,X_2,\dots ,X_p\}\).

Example 7

Consider a decision table \(\mathbb {D}=(U,C\cup \{d\},\{V_a\})\) given in Table 7.3. The decision table is composed of 40 objects with a condition attribute set \(C=\{c_1,c_2,c_3,c_4\}\) and a decision attribute \(d\). Each condition attribute takes the value bad or good, i.e., \(V_{c_i}=\){bad, good} for \(i=1,2,3,4\). The decision attribute takes one of three values: \(V_d=\){bad, medium, good}. Hence there are three decision classes \(X_{\text {b}}\), \(X_{\text {m}}\) and \(X_{\text {g}}\), whose objects take the decision attribute values bad, medium and good, respectively. In Table 7.3, the objects are classified into 5 groups \(P_1,P_2,\dots ,P_5\) by the condition attributes \(C\). For example, group \(P_1\) is composed of the objects having the condition attribute tuple \((c_1,c_2, c_3,c_4)=\) (good, good, bad, good)\(\,\in V_C\). The number of objects of each class in each group is shown in the column \(d:\,\)(b, m, g) of Table 7.3. For example, (0,2,9) for group \(P_1\) means that no object is in class \(X_{\text {b}}\), 2 objects are in class \(X_{\text {m}}\) and 9 objects are in class \(X_{\text {g}}\).

Table 7.3 An example of a decision table

The rough membership of an object \(u\) in each group \(P_i\) to each class \(X_k\) with respect to \(C\) is the number of objects in \(P_i\) and \(X_k\) divided by the number of objects in \(P_i\). For example, \(\mu ^C_{X_{\text {b}}}(P_1) = 0/(0+2+9) = 0\), \(\mu ^C_{X_{\text {m}}}(P_1) = 2/(0+2+9) = 0.1818\dots \), \(\mu ^C_{X_{\text {g}}}(P_1) = 9/(0+2+9) = 0.8181\dots \). Given a condition attribute subset \(A = \{c_1,c_2\}\), the objects in \(P_1\) and \(P_2\) are indiscernible to each other. Hence, the rough membership of an object \(u\) in \(P_1\) and \(P_2\) with respect to \(A\) becomes \(\mu ^A_{X_{\text {b}}}(P_1) = \mu ^A_{X_{\text {b}}}(P_2) = 0/(0+21+10) = 0\), \(\mu ^A_{X_{\text {m}}}(P_1) = \mu ^A_{X_{\text {m}}}(P_2) = 21/(0+21+10) = 0.6774\dots \), \(\mu ^A_{X_{\text {g}}}(P_1) = \mu ^A_{X_{\text {g}}}(P_2) = 10/(0+21+10) = 0.3225\dots \).

Let \(\beta =0.39\). The lower approximations and the upper approximations with respect to \(C\) and \(\beta \) are obtained as follows:

\(\begin{array}{ll} \mathrm {LA}_{C}^{\beta }(X_\mathrm{b}) = \emptyset , & \mathrm {UA}_{C}^{\beta }(X_\mathrm{b}) = \emptyset ,\\ \mathrm {LA}_{C}^{\beta }(X_\mathrm{m}) = \{P_2\}, & \mathrm {UA}_{C}^{\beta }(X_\mathrm{m}) = \{P_2,P_4\}, \\ \mathrm {LA}_{C}^{\beta }(X_\mathrm{g}) = \{P_1\}, & \mathrm {UA}_{C}^{\beta }(X_\mathrm{g}) = \{P_1,P_3,P_4\}, \end{array}\)

where we express the approximations in terms of the groups, namely, all members of a group \(P\) belong to an approximation \(X\) if \(P \in X\).

In VPRSM, the properties corresponding to (7.5) and (7.6) are not always satisfied. Consequently, L-reducts, U-reducts, and B-reducts become independent concepts in VPRSM, and there are no strong-weak relations among them.

Additionally, property (7.7) only partially holds:

$$\begin{aligned} \frac{1}{p} > \beta \Rightarrow U = \bigcup _{i\in V_d} \mathrm {UA}_{A}^{\beta }(X_i). \end{aligned}$$
(7.12)

The union of the upper approximations of all decision classes does not always equal \(U\); it does, however, when \(1/p > \beta \). In view of this fact, we define the unpredictable region of \(d\) with respect to \(\beta \) and \(A\), denoted by \(\mathrm {UNP}_A^\beta (d)\), as follows:

$$\begin{aligned} \mathrm {UNP}_A^{\beta }(d) = \bigcap _{i\in V_d} \mathrm {NEG}_{A}^{\beta }(X_i), \end{aligned}$$

equivalently,

$$\begin{aligned} \mathrm {UNP}_A^{\beta }(d) = U\setminus \bigcup _{i\in V_d} \mathrm {UA}_{A}^{\beta }(X_i). \end{aligned}$$

The unpredictable region is the set of all objects which cannot be classified to any decision class.

We can define the positive region of \(d\) with respect to \(\beta \) and \(A\) in the same manner as in RSM,

$$\begin{aligned} \mathrm {POS}_A^\beta (d) = \bigcup _{i\in V_d} \mathrm {POS}_A^\beta (X_i). \end{aligned}$$

The quality of classification of \(d\) can also be defined in the same manner,

$$\begin{aligned} \gamma _A^\beta (d) = \frac{|\mathrm {POS}_A^\beta (d)|}{|U|}. \end{aligned}$$

The generalized decision function of RSM can be extended to VPRSM. However, differently from RSM, we define two functions, called the lower and upper generalized decision functions and denoted by \(\lambda \) and \(\upsilon \), respectively. For each \(u\in U\),

$$\begin{aligned} \lambda _A^\beta (u)&= \{i\in V_d\;|\;\mu _{X_i}^A(u)\ge 1-\beta \},\\ \upsilon _A^\beta (u)&= \{i\in V_d\;|\;\mu _{X_i}^A(u) > \beta \}. \end{aligned}$$

The lower generalized decision of \(u\) is the set of decision values to which the membership degree of \(u\) is at least \(1-\beta \). The upper generalized decision of \(u\) is the set of decision values to which the membership degree of \(u\) is more than \(\beta \). The upper generalized decision corresponds to the generalized decision in RSM. By the definitions, the lower and upper generalized decision functions are closely related to the lower and upper approximations,

$$\begin{aligned} i\in \lambda _A^\beta (u)&\Leftrightarrow u\in \mathrm {LA}_A^\beta (X_i),\\ i\in \upsilon _A^\beta (u)&\Leftrightarrow u\in \mathrm {UA}_A^\beta (X_i). \end{aligned}$$

So, they have the inclusion relation:

$$\begin{aligned} \lambda _A^\beta (u) \subseteq \upsilon _A^\beta (u). \end{aligned}$$
(7.13)

Any two objects in the same equivalence class take the same values of generalized decision functions.

$$\begin{aligned} \text {For each }(u,u')\in R_A,\;\lambda _A^\beta (u) = \lambda _A^\beta (u')\text { and }\upsilon _A^\beta (u) = \upsilon _A^\beta (u'). \end{aligned}$$
(7.14)

The lower generalized decision of any object is a singleton or the empty set,

$$\begin{aligned} |\lambda _A^\beta (u)| \le 1. \end{aligned}$$
(7.15)

Unlike in RSM, it may happen that the upper generalized decision of \(u\) is a singleton, i.e., \(\upsilon _A^\beta (u)=\{i\}\), but \(u\) does not belong to the lower approximation \(\mathrm {LA}_A^\beta (X_i)\). When the lower generalized decision is a singleton, the upper generalized decision is also a singleton, and they coincide,

$$\begin{aligned} |\lambda _A^\beta (u)| = 1 \Rightarrow \upsilon _A^\beta (u) = \lambda _A^\beta (u). \end{aligned}$$
(7.16)

Property (7.12) can be expressed as:

$$\begin{aligned} \frac{1}{p} > \beta \Rightarrow \upsilon _A^\beta (u)\ne \emptyset . \end{aligned}$$
(7.17)

Hence, \(\upsilon _A^\beta (u)\) may be empty unless \(\beta \) is less than \(\frac{1}{p}\).

We define a function \((\upsilon \backslash \lambda )_A^\beta (u)\) as:

$$\begin{aligned} (\upsilon \backslash \lambda )_A^\beta (u) = \upsilon _A^\beta (u)\setminus \lambda _A^\beta (u). \end{aligned}$$

By properties (7.13), (7.15), and (7.16), we have

$$\begin{aligned} (\upsilon \backslash \lambda )_A^\beta (u)=\emptyset&\Rightarrow \upsilon _A^\beta (u)=\emptyset \text { or }\lambda _A^\beta (u)\ne \emptyset ,\end{aligned}$$
(7.18)
$$\begin{aligned} (\upsilon \backslash \lambda )_A^\beta (u)\ne \emptyset&\Rightarrow (\upsilon \backslash \lambda )_A^\beta (u)=\upsilon _A^\beta (u). \end{aligned}$$
(7.19)

By these properties and the relation \(\mathrm {BN}_A^\beta (X_i)=\mathrm {UA}_A^\beta (X_i)\setminus \mathrm {LA}_A^\beta (X_i)\), the following equivalence holds:

$$\begin{aligned} i\in (\upsilon \backslash \lambda )_A^\beta (u) \Leftrightarrow u\in \mathrm {BN}_A^\beta (X_i). \end{aligned}$$
(7.20)

Therefore, we call \((\upsilon \backslash \lambda )\) a boundary generalized decision function.

Example 8

Remember the decision table \(\mathbb {D} = (U,C\cup \{d\},\{V_a\})\) in Table 7.3. Let \(\beta = 0.39\). The lower and upper generalized decision functions with respect to \(C\) and \(\beta \) are,

\(\begin{array}{lllll} \lambda _C^\beta (P_1)=\{\text {g}\}, & \lambda _C^\beta (P_2)=\{\text {m}\}, & \lambda _C^\beta (P_3)=\emptyset , & \lambda _C^\beta (P_4)=\emptyset , & \lambda _C^\beta (P_5)=\emptyset , \\ \upsilon _C^\beta (P_1)=\{\text {g}\}, & \upsilon _C^\beta (P_2)=\{\text {m}\}, & \upsilon _C^\beta (P_3)=\{\text {g}\}, & \upsilon _C^\beta (P_4)=\{\text {m,g}\}, & \upsilon _C^\beta (P_5)=\emptyset , \end{array}\)

where \(\lambda _C^\beta (P_i)\) and \(\upsilon _C^\beta (P_i)\) indicate the lower and upper generalized decisions of an object in the group \(P_i\).
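This computation can be replayed from per-group class counts. The counts of \(P_1\) and \(P_2\) follow from Example 7 (with \(A=\{c_1,c_2\}\), \(P_1\cup P_2\) has counts (0,21,10)); the counts for \(P_3\)-\(P_5\) are not fully determined by the text, so the values below are hypothetical, chosen only to reproduce the memberships stated above.

```python
# (|X_b|, |X_m|, |X_g|) per group; the P3-P5 counts are hypothetical.
groups = {'P1': (0, 2, 9), 'P2': (0, 19, 1),
          'P3': (1, 1, 2), 'P4': (0, 1, 1), 'P5': (1, 1, 1)}
beta = 0.39
for P, counts in groups.items():
    total = sum(counts)
    mu = {k: n / total for k, n in zip(('b', 'm', 'g'), counts)}
    lam = {k for k, m in mu.items() if m >= 1 - beta}  # lower decision λ
    ups = {k for k, m in mu.items() if m > beta}       # upper decision υ
    print(P, sorted(lam), sorted(ups))
# Expected: P1 λ={g} υ={g}, P2 λ={m} υ={m}, P3 λ=∅ υ={g},
#           P4 λ=∅ υ={m,g}, P5 λ=∅ υ=∅
```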

3.3 Structure-Based Reducts in Variable Precision Rough Set Models

Before defining structure-based reducts in VPRSM, we first introduce Q-reducts, which preserve the quality of classification with the parameter \(\beta \).

Definition 8

([4, 5]) Let \(\beta \in [0,0.5)\) be an admissible error rate. A Q-reduct with \(\beta \) in VPRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following conditions:

$$\begin{aligned}&\gamma _A^\beta (d) = \gamma _C^\beta (d), \end{aligned}$$
(VPQ1)
$$\begin{aligned}&B\text { satisfies (VPQ1)} \quad \ \ \text {for all }\ \ B\supseteq A. \end{aligned}$$
(VPQ2)

Remark 6

In VPRSM, approximations are no longer monotonic with respect to the set inclusion of condition attributes. Hence, condition (VPQ1) is not monotonic with respect to condition attributes either; it may happen that \(A\) satisfies (VPQ1) while some \(B\supset A\) does not. For that reason, we modify the preserving condition of reducts by adding a condition like (VPQ2). We note that Beynon [4, 5] originally proposed Q-reducts (which he called \(\beta \)-reducts) using only (VPQ1).

We define 4 kinds of structure-based reducts in VPRSM [20], analogous to those already discussed for the classical RSM.

Definition 9

([20, 33]) Let \(\beta \in [0,0.5)\) be an admissible error rate.

  • A P-reduct with \(\beta \) in VPRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following conditions:

    $$\begin{aligned}&\mathrm {POS}_A^\beta (d) = \mathrm {POS}_C^\beta (d), \end{aligned}$$
    (VPP1)
    $$\begin{aligned}&B\text { satisfies (VPP1)} \quad \ \ \text {for all }\ \ B\supseteq A. \end{aligned}$$
    (VPP2)
  • An L-reduct with \(\beta \) in VPRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {LA}_A^\beta (X_i) = \mathrm {LA}_C^\beta (X_i) \quad \ \ \text { for all }~i\in V_d. \end{aligned}$$
    (VPL)
  • A B-reduct with \(\beta \) in VPRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following conditions:

    $$\begin{aligned}&\mathrm {BN}_A^\beta (X_i) = \mathrm {BN}_C^\beta (X_i) \quad \ \ \text {for all } i\in V_d, \end{aligned}$$
    (VPB1)
    $$\begin{aligned}&B\text { satisfies (VPB1)} \quad \ \ \text {for all }\ \ B\supseteq A. \end{aligned}$$
    (VPB2)
  • A U-reduct with \(\beta \) in VPRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {UA}_A^\beta (X_i) = \mathrm {UA}_C^\beta (X_i) \quad \ \ \text { for all } i\in V_d. \end{aligned}$$
    (VPU)

Mi et al. [33] independently proposed L-reducts and U-reducts under the names of lower and upper distribution reducts.

Additionally, we can define a reduct preserving the unpredictable region.

Definition 10

([20]) Let \(\beta \in [0,0.5)\) be an admissible error rate. A UN-reduct with \(\beta \) in VPRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following conditions:

$$\begin{aligned}&\mathrm {UNP}_A^\beta (d) = \mathrm {UNP}_C^\beta (d), \end{aligned}$$
(VPUN1)
$$\begin{aligned}&B\,\text {satisfies (VPUN1) for all }B\supseteq A. \end{aligned}$$
(VPUN2)

Remark 7

We modify the definitions of B- and UN-reducts from our paper [20], because there were mistakes in the Boolean functions for B- and UN-reducts. By adding the second conditions, the preserving conditions of B- and UN-reducts become monotone with respect to the set inclusion of condition attribute sets.

By definitions, (VPL) and (VPU) obviously imply (VPP1,2) and (VPUN1,2), respectively. Moreover, (VPP1,2) also implies (VPQ1,2). Hence, we have the following relations among different types of reducts.

Theorem 3

([20]) Let \(A\) be a subset of \(C\). We have the following statements in VPRSM with a fixed parameter \(\beta \in [0,0.5)\),

  1. (a)

    \(A\) is an L-reduct then \(A\) satisfies (VPP1,2),

  2. (b)

    \(A\) is a U-reduct then \(A\) satisfies (VPUN1,2),

  3. (c)

    \(A\) is a P-reduct then \(A\) satisfies (VPQ1,2).

Contrary to the classical RSM, (VPB1,2) is not equivalent to (VPU). In RSM, preserving the boundaries implies preserving the upper approximations, i.e., preventing the expansion of ambiguity. In VPRSM, however, the expansion of ambiguity is prevented not by preserving the boundaries alone but by preserving them together with the unpredictable region. Furthermore, we can define other compositions of different types of reducts.

Simply combining the 5 types of structure-based reducts, we obtain \(2^5-1 = 31\) types of reducts (ignoring (a) and (b) of Theorem 3). To reduce this number, we first investigate relationships among the preserving conditions of reducts.

Theorem 4

([20]) Let \(A\) be a subset of \(C\). We have the following statements in VPRSM with a fixed parameter \(\beta \in [0,0.5)\):

  • The conjunction of (VPB1) and (VPP1) is equivalent to that of (VPB1) and (VPUN1),

  • The conjunction of (VPL) and (VPB1) is equivalent to that of (VPL) and (VPU),

  • The conjunction of (VPU) and (VPP1) is equivalent to that of (VPL) and (VPU).

We define 4 different types of reducts.

Definition 11

([20]) Let \(\beta \in [0,0.5)\) be an admissible error rate.

  • An LU-reduct with \(\beta \) in VPRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {LA}_A^\beta (X_i) = \mathrm {LA}_C^\beta (X_i)\text { and } \mathrm {UA}_A^\beta (X_i) = \mathrm {UA}_C^\beta (X_i) \quad \text { for all }i\in V_d. \end{aligned}$$
    (VPLU)
  • An LUN-reduct with \(\beta \) in VPRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following conditions:

    $$\begin{aligned}&\mathrm {LA}_A^\beta (X_i) = \mathrm {LA}_C^\beta (X_i) \quad \text { for all }i\in V_d,\text { and } \mathrm {UNP}_A^\beta (d) = \mathrm {UNP}_C^\beta (d), \end{aligned}$$
    (VPLUN1)
    $$\begin{aligned}&B\text { satisfies (VPLUN1)} \quad \text {for all }B\supseteq A. \end{aligned}$$
    (VPLUN2)
  • A BUN-reduct with \(\beta \) in VPRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following conditions:

    $$\begin{aligned}&\mathrm {BN}_A^\beta (X_i) = \mathrm {BN}_C^\beta (X_i) \quad \text { for all }i\in V_d,\text { and } \mathrm {UNP}_A^\beta (d) = \mathrm {UNP}_C^\beta (d), \end{aligned}$$
    (VPBUN1)
    $$\begin{aligned}&B\text { satisfies (VPBUN1)} \quad \text {for all }B\supseteq A. \end{aligned}$$
    (VPBUN2)
  • A PUN-reduct with \(\beta \) in VPRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following conditions:

    $$\begin{aligned}&\mathrm {POS}_A^\beta (d) = \mathrm {POS}_C^\beta (d) \quad \text { and } \mathrm {UNP}_A^\beta (d) = \mathrm {UNP}_C^\beta (d), \end{aligned}$$
    (VPPUN1)
    $$\begin{aligned}&B\text { satisfies (VPPUN1)} \quad \text {for all }B\supseteq A. \end{aligned}$$
    (VPPUN2)

In Fig. 7.2, we show the relationships among the 9 types of reducts. Names of reducts are abbreviated to their first characters. Reducts located in the upper part of Fig. 7.2 preserve more structure; therefore, they are larger in the sense of set inclusion than reducts located in the lower part. A line segment connecting two types of reducts indicates that every reduct of the upper type satisfies the preserving condition of the lower type. From Fig. 7.2, we see that LU-reducts preserve the most structure, while UN-reducts and P-reducts preserve the least.

Fig. 7.2 Strong-weak hierarchy of 9 types of structure-based reducts in VPRSM

The next proposition says that composite reducts, such as LUN- or BUN-reducts, can be constructed from their base reducts, e.g., L- and UN-reducts or B- and UN-reducts. The proposition is useful for enumerating composite reducts.

Proposition 1

Consider two types of reducts, \(\heartsuit \)-reducts and \(\spadesuit \)-reducts, and their composition, \(\heartsuit \spadesuit \)-reducts. Let \(\fancyscript{H}\) and \(\fancyscript{S}\) be the sets of all \(\heartsuit \)-reducts and all \(\spadesuit \)-reducts, respectively. Then the set of all \(\heartsuit \spadesuit \)-reducts is the set of all minimal elements of \(\{A\cup B\;|\;A\in \fancyscript{H}\text { and }B\in \fancyscript{S}\}\).
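Proposition 1 translates directly into code; the following sketch (the helper names are ours) computes all composite reducts from the two base families by forming all unions and keeping the inclusion-minimal ones.

```python
def minimal_sets(family):
    """Keep only the inclusion-minimal members of a family of sets."""
    fam = [set(s) for s in family]
    return [s for s in fam if not any(t < s for t in fam)]

def composite_reducts(H, S):
    """All heart-spade-reducts from the heart-reducts H and spade-reducts S."""
    unions = {frozenset(a | b) for a in H for b in S}  # deduplicate the unions
    return minimal_sets(unions)
```

For instance, applied to the L-reducts and UN-reducts of Example 9 below, it returns exactly the three L-reducts, matching the observation there that every L-reduct is also an LUN-reduct.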

3.4 Boolean Functions Representing Reducts

As shown above, L- and U-reducts in the classical RSM are characterized by prime implicants of certain Boolean functions. In this section, we discuss Boolean functions for the 9 types of reducts in VPRSM. To do this, we focus on the Boolean functions of reducts pertaining to the lower approximations, the upper approximations, the boundaries, the positive region, and the unpredictable region, since the others can be obtained by taking conjunctions of those Boolean functions or by using Proposition 1.

First, we represent the preserving conditions by the generalized decision functions.

Lemma 3

Let \(\beta \in [0,0.5)\) be an admissible error rate, and \(A\) be a subset of \(C\). We have the following statements:

  • Condition (VPL) with \(\beta \) is equivalent to:

    $$\begin{aligned} \lambda _A^\beta (u) = \lambda _C^\beta (u) \quad \text { for all }u\in U. \end{aligned}$$
    (VPLG)
  • Condition (VPU) with \(\beta \) is equivalent to:

    $$\begin{aligned} \upsilon _A^\beta (u) = \upsilon _C^\beta (u) \quad \text { for all }u\in U. \end{aligned}$$
    (VPUG)
  • Condition (VPB1) with \(\beta \) is equivalent to:

    $$\begin{aligned} (\upsilon \backslash \lambda )_A^\beta (u) = (\upsilon \backslash \lambda )_C^\beta (u) \quad \text { for all }u\in U. \end{aligned}$$
    (VPBG1)
  • Condition (VPP1) with \(\beta \) is equivalent to:

    $$\begin{aligned} \lambda _A^\beta (u)=\emptyset \Leftrightarrow \lambda _C^\beta (u)=\emptyset \quad \text { for all }u\in U. \end{aligned}$$
    (VPPG1)
  • Condition (VPUN1) with \(\beta \) is equivalent to:

    $$\begin{aligned} \upsilon _A^\beta (u)=\emptyset \Leftrightarrow \upsilon _C^\beta (u)=\emptyset \quad \text { for all }u\in U. \end{aligned}$$
    (VPUNG1)

The next lemma is the counterpart of Lemma 2 of RSM; however, only the sufficiency direction of that lemma holds in VPRSM.

Lemma 4

Let \(u\in U\) be an object, \(\beta \in [0,0.5)\) be an admissible error rate, and \(A\) be a subset of \(C\).

  • The following assertion is a sufficient condition of \(\upsilon _A^\beta (u) = \upsilon _C^\beta (u)\):

    $$\begin{aligned} \forall u'\in U,\;(\upsilon _C^\beta (u')\ne \upsilon _C^\beta (u) \Rightarrow \exists a\in A,\;(u',u)\not \in R_{\{a\}}). \end{aligned}$$
  • The following assertion is a sufficient condition of \(\lambda _A^\beta (u) = \lambda _C^\beta (u)\):

    $$\begin{aligned} \forall u'\in U,\;(\lambda _C^\beta (u')\ne \lambda _C^\beta (u) \Rightarrow \exists a\in A,\;(u',u)\not \in R_{\{a\}}). \end{aligned}$$

This lemma holds due to property (7.9). Then, we have the following corollary.

Corollary 1

We have the following equivalences:

$$\begin{aligned}&\forall u\in U,\;\upsilon _A^\beta (u) = \upsilon _C^\beta (u) \\&\quad \Leftrightarrow \forall u,u'\in U,\;(\upsilon _C^\beta (u')\ne \upsilon _C^\beta (u) \Rightarrow \exists a\in A,\;(u',u)\not \in R_{\{a\}}),\\&\forall u\in U,\;\lambda _A^\beta (u) = \lambda _C^\beta (u) \\&\quad \Leftrightarrow \forall u,u'\in U,\;(\lambda _C^\beta (u')\ne \lambda _C^\beta (u) \Rightarrow \exists a\in A,\;(u',u)\not \in R_{\{a\}}). \end{aligned}$$

This corollary says that all L-reducts and all U-reducts can be enumerated by discernibility functions. A similar result is shown in [33]. However, we have no such result for \((\upsilon \backslash \lambda )\) or for conditions (VPPG1) and (VPUNG1).

We introduce a discernibility matrix \(M=(m_{ij})_{i,j=1,2,\dots ,n}\), where the \((i,j)\)-entry \(m_{ij}\) is defined by:

$$\begin{aligned} m_{ij} = \{c\in C\;|\;c(u_i)\ne c(u_j)\}. \end{aligned}$$

It is the same as that of RSM. Then, we define the discernibility functions corresponding to L-reducts and U-reducts, denoted by \(F^{\mathrm {L}}_\beta \) and \(F^{\mathrm {U}}_\beta \), respectively.

Definition 12

Let \(\beta \in [0,0.5)\) be an admissible error rate. Discernibility functions \(F^{\mathrm {U}}_\beta \) and \(F^{\mathrm {L}}_\beta \) are defined as follows:

$$\begin{aligned} F^{\mathrm {U}}_\beta (\tilde{c}_1,\tilde{c}_2,\dots ,\tilde{c}_m)&= \bigwedge _{i,j\;|\;\upsilon _C^\beta (u_i)\ne \upsilon _C^\beta (u_j)}\bigvee _{c\in m_{ij}}\tilde{c},\\ F^{\mathrm {L}}_\beta (\tilde{c}_1,\tilde{c}_2,\dots ,\tilde{c}_m)&= \bigwedge _{i,j\;|\;\lambda _C^\beta (u_i)\ne \lambda _C^\beta (u_j)} \bigvee _{c\in m_{ij}}\tilde{c}, \end{aligned}$$

where, \(\tilde{c}_i\) is a Boolean variable pertaining to a condition attribute \(c_i\in C\).

Function \(F^{\mathrm {U}}_\beta \) is true if and only if, for every pair \((u_i,u_j)\) such that \(\upsilon _C^\beta (u_i)\ne \upsilon _C^\beta (u_j)\), at least one variable \(\tilde{c}\) with \(c\in m_{ij}\) is true. Likewise, function \(F^{\mathrm {L}}_\beta \) is true if and only if, for every pair \((u_i,u_j)\) such that \(\lambda _C^\beta (u_i)\ne \lambda _C^\beta (u_j)\), at least one variable \(\tilde{c}\) with \(c\in m_{ij}\) is true.

Remember that we associate \(A\subseteq C\) with a Boolean vector \(\tilde{c}^A=(\tilde{c}^A_1,\tilde{c}^A_2, \ldots , \tilde{c}^A_m)\) as follows:

$$\begin{aligned} \tilde{c}^A_k = {\left\{ \begin{array}{ll} 1 &{} c_k\in A,\\ 0 &{} \mathrm {otherwise}. \end{array}\right. } \end{aligned}$$

Then, we can prove the next theorem from Corollary 1. Remember that \(\phi _{A}\) is the term \(\bigwedge \{\tilde{a}|a \in A\}\).

Theorem 5

([20, 33]) Let \(A\) be a subset of \(C\), and \(\beta \in [0,0.5)\) be an admissible error rate. We have the following equivalences:

  • \(A\) satisfies (VPUG) as well as (VPU) with \(\beta \) if and only if \(F^{\mathrm {U}}_\beta (\tilde{c}^A)=1\). Moreover, \(A\) is a U-reduct with \(\beta \) if and only if \(\phi _{A}\) is a prime implicant of \(F^U_\beta \),

  • \(A\) satisfies (VPLG) as well as (VPL) with \(\beta \) if and only if \(F^{\mathrm {L}}_\beta (\tilde{c}^A)=1\). Moreover, \(A\) is an L-reduct with \(\beta \) if and only if \(\phi _{A}\) is a prime implicant of \(F^L_\beta \).
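Theorem 5 suggests a simple (exponential, illustration-only) enumeration procedure: build the discernibility matrix, collect one clause per pair of objects with different generalized decisions, and find the minimal attribute subsets hitting every clause; these are the prime implicants of the corresponding monotone Boolean function. The sketch below assumes \(\lambda _C^\beta \) is supplied as a function `lam`, e.g., derived from the regions computed in the earlier sketch.

```python
from itertools import combinations

def discernibility_matrix(objs, C, value):
    """m[i][j]: the attributes on which u_i and u_j take different values."""
    n = len(objs)
    return [[{c for c in C if value(objs[i], c) != value(objs[j], c)}
             for j in range(n)] for i in range(n)]

def reducts_from_clauses(C, clauses):
    """Minimal subsets of C intersecting every clause, i.e., the prime
    implicants of the positive CNF over the clauses (brute force)."""
    found = []
    for k in range(len(C) + 1):
        for A in map(set, combinations(sorted(C), k)):
            if all(A & cl for cl in clauses) and \
               not any(r <= A for r in found):
                found.append(A)
    return found

def l_reducts(objs, C, value, lam):
    """All L-reducts with beta, given lam = lambda_C^beta (cf. Theorem 5)."""
    m = discernibility_matrix(objs, C, value)
    clauses = [m[i][j] for i in range(len(objs))
               for j in range(i) if lam(objs[i]) != lam(objs[j])]
    return reducts_from_clauses(C, clauses)
```

Replacing `lam` by \(\upsilon _C^\beta \) yields the U-reducts in the same way.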

For the preservation of the boundaries, the positive region, and the unpredictable region, we cannot use the discernibility function approach, because we cannot obtain a preserving subset \(A\subseteq C\) by determining which pairs of objects should be discerned. For example, consider the decision table below.

$$\begin{aligned} \begin{array}{l|lll|lll} \hline &{} c_1 &{} c_2 &{} c_3 &{} X_1 &{} X_2 &{} X_3 \\ \hline P_1 &{} 0 &{} 0 &{} 0 &{} 4 &{} 0 &{} 0 \\ P_2 &{} 0 &{} 0 &{} 1 &{} 0 &{} 2 &{} 0 \\ P_3 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 1 \\ P_4 &{} 1 &{} 1 &{} 0 &{} 1 &{} 1 &{} 1 \\ \hline \end{array} \end{aligned}$$

There are 3 condition attributes \(C=\{c_1,c_2,c_3\}\) with the value set \(V=\{0,1\}\), and 3 decision classes \(X_1,X_2,X_3\). \(P_1,P_2,P_3,P_4\) are sets of objects, where the members of each set have the same condition attribute values. The distribution of the decision classes on each set \(P_i\) is shown in the table; for instance, the distribution on \(P_1\) is \(|X_1\cap P_1|=4\), \(|X_2\cap P_1|=0\), and \(|X_3\cap P_1|=0\). Consider P-reducts with \(\beta = 0.4\). The positive region of the table is POS\(_C^\beta (d) = P_1\cup P_2\cup P_3\). If we make \(P_1\) and \(P_2\) indiscernible, merging them into \(P_1\cup P_2\), the positive region is still preserved, because the distribution on \(P_1\cup P_2\) is \((X_1,X_2,X_3) = (4,2,0)\) and \(\mu _{X_1}(P_1\cup P_2)=2/3\ge 0.6\). Similarly, we can make the pair \(P_1\) and \(P_3\) or the pair \(P_2\) and \(P_3\) indiscernible. However, if we make all of \(P_1\), \(P_2\), and \(P_3\) indiscernible by selecting \(\{c_1\}\) as a reduct, \(P_1\cup P_2\cup P_3\) falls outside of POS\(_{\{c_1\}}^\beta (d)\), because the distribution is \((X_1,X_2,X_3)=(4,2,1)\) and \(\mu _{X_1}(P_1\cup P_2\cup P_3)=4/7< 0.6\).
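The membership computations above are easily verified; the following few lines, with the block distributions hard-coded from the table, reproduce them.

```python
# Distributions over (X1, X2, X3) for the blocks of the table above.
P = {"P1": (4, 0, 0), "P2": (0, 2, 0), "P3": (0, 0, 1), "P4": (1, 1, 1)}
beta = 0.4

def mu_X1(*blocks):
    """Rough membership of the merged block in X1."""
    totals = [sum(P[b][k] for b in blocks) for k in range(3)]
    return totals[0] / sum(totals)

print(mu_X1("P1", "P2"))        # 0.666..., >= 1 - beta: merged block stays positive
print(mu_X1("P1", "P2", "P3"))  # 0.571..., <  1 - beta: merged block drops out
```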

To overcome that difficulty, for each type of reducts, we consider two approximate discernibility functions \(\hat{F}_\beta \) and \(\check{F}_\beta \): \(\hat{F}_\beta \) characterizes a sufficient condition of the preservation and \(\check{F}_\beta \) characterizes a necessary condition.

First, we discuss discernibility functions characterizing sufficient conditions. By Theorems 3 and 4, we know that \(F^{\mathrm {L}}_\beta \), \(F^{\mathrm {U}}_\beta \), and \(F^{\mathrm {LU}}_\beta =F^{\mathrm {L}}_\beta \wedge F^{\mathrm {U}}_\beta \) are discernibility functions of sufficient conditions for B-reducts, P-reducts, and UN-reducts with \(\beta \).

Definition 13

Let \(\beta \in [0,0.5)\) be an admissible error rate. Discernibility functions \(\hat{F}^{\mathrm {B}}_\beta \), \(\hat{F}^{\mathrm {P}}_\beta \), and \(\hat{F}^{\mathrm {UN}}_\beta \) are defined as follows:

$$\begin{aligned} \hat{F}^{\mathrm {B}}_\beta = F^{\mathrm {L}}_\beta \wedge F^{\mathrm {U}}_\beta , \quad \hat{F}^{\mathrm {P}}_\beta = F^{\mathrm {L}}_\beta , \quad \hat{F}^{\mathrm {UN}}_\beta = F^{\mathrm {U}}_\beta . \end{aligned}$$

Clearly, we have the following proposition.

Proposition 2

([20]) Let \(A\) be a subset of \(C\), and \(\beta \in [0,0.5)\) be an admissible error rate. We have the following implications:

  • If \(\hat{F}^{\mathrm {B}}_\beta (\tilde{c}^A)=1\) then \(A\) satisfies (VPB1) and (VPB2) with \(\beta \),

  • If \(\hat{F}^{\mathrm {P}}_\beta (\tilde{c}^A)=1\) then \(A\) satisfies (VPP1) and (VPP2) with \(\beta \),

  • If \(\hat{F}^{\mathrm {UN}}_\beta (\tilde{c}^A)=1\) then \(A\) satisfies (VPUN1) and (VPUN2) with \(\beta \).

Next, let us discuss a discernibility function characterizing a necessary condition. Consider necessary discernibility functions for P-reducts. In the sufficient discernibility function \(\hat{F}^{\mathrm {P}}_\beta =F^{\mathrm {L}}_\beta \), pairs of objects included in the positive region are discerned when they have different values of \(\lambda ^\beta _C\). However, such pairs are not necessarily discerned because there may be a P-reduct such that some of pairs become indiscernible. On the other hand, for each pair \(u_i\) and \(u_j\), if they are excluded from the positive region of the common condition attributes, i.e., \(C\setminus m_{ij}\), they should be discerned because no subset \(A\subseteq C\setminus m_{ij}\) satisfies (VPP1) and (VPP2). From this consideration, discernibility functions characterizing necessary conditions for preservation of the boundaries, the positive region, and the unpredictable region are obtained as follows.

Definition 14

Let \(\beta \in [0,0.5)\) be an admissible error rate. Moreover, let \(\tilde{c}_i\) be a Boolean variable pertaining to a condition attribute \(c_i\in C\). Discernibility functions \(\check{F}^{\mathrm {B}}_\beta \), \(\check{F}^{\mathrm {P}}_\beta \), and \(\check{F}^{\mathrm {UN}}_\beta \) are defined as follows:

$$\begin{aligned} \check{F}^{\mathrm {B}}_\beta (\tilde{c}_1,\tilde{c}_2,\dots ,\tilde{c}_m)&= \bigwedge _{(i,j)\in \varDelta ^{\mathrm {B}}_\beta }\bigvee _{c\in m_{ij}}\tilde{c}, \\ \check{F}^{\mathrm {P}}_\beta (\tilde{c}_1,\tilde{c}_2,\dots ,\tilde{c}_m)&= \bigwedge _{(i,j)\in \varDelta ^{\mathrm {P}}_\beta } \bigvee _{c\in m_{ij}} \tilde{c}, \\ \check{F}^{\mathrm {UN}}_\beta (\tilde{c}_1,\tilde{c}_2,\dots ,\tilde{c}_m)&= \bigwedge _{(i,j)\in \varDelta ^{\mathrm {UN}}_\beta }\bigvee _{c\in m_{ij}} \tilde{c}, \end{aligned}$$

where,

$$\begin{aligned} (i,j)\in \varDelta ^{\mathrm {B}}_\beta&\Leftrightarrow {\left\{ \begin{array}{ll} (\upsilon \backslash \lambda )_C^\beta (u_i)\ne (\upsilon \backslash \lambda )_C^\beta (u_j),\text { or}\\ (\upsilon \backslash \lambda )_C^\beta (u_i)=(\upsilon \backslash \lambda )_C^\beta (u_j)=\emptyset \\ \quad \text { and }(\upsilon \backslash \lambda )_{C\setminus m_{ij}}^\beta (u_i)=(\upsilon \backslash \lambda )_{C\setminus m_{ij}}^\beta (u_j)\ne \emptyset , \end{array}\right. } \end{aligned}$$
$$\begin{aligned} (i,j)\in \varDelta ^{\mathrm {P}}_\beta&\Leftrightarrow {\left\{ \begin{array}{ll} \lambda ^\beta _C(u_i)\ne \emptyset \text { and }\lambda ^\beta _C(u_j)=\emptyset ,\text { or}\\ \lambda ^\beta _C(u_i)=\emptyset \text { and }\lambda ^\beta _C(u_j)\ne \emptyset ,\text { or}\\ \lambda ^\beta _C(u_i)\ne \emptyset ,\;\lambda ^\beta _C(u_j)\ne \emptyset ,\text { and }\lambda ^\beta _{C\setminus m_{ij}}(u_i)=\lambda ^\beta _{C\setminus m_{ij}}(u_j)=\emptyset , \end{array}\right. } \end{aligned}$$
$$\begin{aligned} (i,j)\in \varDelta ^{\mathrm {UN}}_\beta&\Leftrightarrow {\left\{ \begin{array}{ll} \upsilon ^\beta _C(u_i)\ne \emptyset \text { and }\upsilon ^\beta _C(u_j)=\emptyset ,\text { or}\\ \upsilon ^\beta _C(u_i)=\emptyset \text { and }\upsilon ^\beta _C(u_j)\ne \emptyset ,\text { or}\\ \upsilon ^\beta _C(u_i)\ne \emptyset ,\;\upsilon ^\beta _C(u_j)\ne \emptyset ,\text { and }\upsilon ^\beta _{C\setminus m_{ij}}(u_i)=\upsilon ^\beta _{C\setminus m_{ij}}(u_j)=\emptyset . \end{array}\right. } \end{aligned}$$

Proposition 3

Let \(A\) be a subset of \(C\), and \(\beta \in [0,0.5)\) be an admissible error rate. We have the following implications:

  • If \(\check{F}^{\mathrm {B}}_\beta (\tilde{c}^A)=0\) then \(A\) does not satisfy (VPB1) or (VPB2) with \(\beta \),

  • If \(\check{F}^{\mathrm {P}}_\beta (\tilde{c}^A)=0\) then \(A\) does not satisfy (VPP1) or (VPP2) with \(\beta \),

  • If \(\check{F}^{\mathrm {UN}}_\beta (\tilde{c}^A)=0\) then \(A\) does not satisfy (VPUN1) or (VPUN2) with \(\beta \).

From Proposition 3, we know that any prime implicant of each of \(\check{F}^\mathrm{B}_\beta \), \(\check{F}^\mathrm{P}_\beta \), and \(\check{F}^\mathrm{UN}_\beta \) can be a subset of some reduct of the corresponding type.

Combining Propositions 2 and 3, we have the following theorem.

Theorem 6

Let \(A\) be a subset of \(C\), and \(\beta \in [0,0.5)\) be an admissible error rate. Let \(\hat{\fancyscript{P}}^{\mathrm {B}}_\beta \), \(\hat{\fancyscript{P}}^{\mathrm {P}}_\beta \) and \(\hat{\fancyscript{P}}^{\mathrm {UN}}_\beta \) be the sets of condition attribute subsets corresponding to the prime implicants of \(\hat{F}^{\mathrm {B}}_\beta \), \(\hat{F}^{\mathrm {P}}_\beta \), and \(\hat{F}^{\mathrm {UN}}_\beta \), respectively. Moreover, let \(\check{\fancyscript{P}}^{\mathrm {B}}_\beta \) , \(\check{\fancyscript{P}}^{\mathrm {P}}_\beta \) and \(\check{\fancyscript{P}}^{\mathrm {UN}}_\beta \) be the sets of condition attribute subsets corresponding to the prime implicants of \(\check{F}^{\mathrm {B}}_\beta \), \(\check{F}^{\mathrm {P}}_\beta \), and \(\check{F}^{\mathrm {UN}}_\beta \), respectively. Then, we have the following implications:

  • If \(A\) is a B-reduct with \(\beta \) then \(A\in \{B\subseteq C\;|\;B\supseteq B'\) for some \(B'\in \check{\fancyscript{P}}^{\mathrm {B}}_\beta \) and \(B\not \supset B''\) for any \(B''\in \hat{\fancyscript{P}}^{\mathrm {B}}_\beta \}\),

  • If \(A\) is a P-reduct with \(\beta \) then \(A\in \{B\subseteq C\;|\;B\supseteq B'\) for some \(B'\in \check{\fancyscript{P}}^{\mathrm {P}}_\beta \) and \(B\not \supset B''\) for any \(B''\in \hat{\fancyscript{P}}^{\mathrm {P}}_\beta \}\),

  • If \(A\) is a UN-reduct with \(\beta \) then \(A\in \{B\subseteq C\;|\;B\supseteq B'\) for some \(B'\in \check{\fancyscript{P}}^{\mathrm {UN}}_\beta \) and \(B\not \supset B''\) for any \(B''\in \hat{\fancyscript{P}}^{\mathrm {UN}}_\beta \}\).
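Theorem 6 suggests a generate-and-test procedure: the candidates are the subsets sandwiched between the prime implicants of the necessary function and those of the sufficient one, and the reducts are the minimal candidates that actually preserve the region. A brute-force sketch, in which `preserves` stands for a hypothetical predicate checking, e.g., (VPB1) and (VPB2) directly:

```python
from itertools import combinations

def candidates(C, necessary_pi, sufficient_pi):
    """Subsets containing some prime implicant of the necessary function and
    not strictly containing any prime implicant of the sufficient one."""
    out = []
    for k in range(len(C) + 1):
        for A in map(set, combinations(sorted(C), k)):
            if any(A >= p for p in necessary_pi) and \
               not any(A > p for p in sufficient_pi):
                out.append(A)
    return out

def reducts_among(cands, preserves):
    """Minimal candidates satisfying the preserving condition."""
    ok = [A for A in cands if preserves(A)]
    return [A for A in ok if not any(B < A for B in ok)]
```

In Example 9 below, this generates the three B-reduct candidates \(\{c_1,c_2\}\), \(\{c_1,c_2,c_3\}\), and \(\{c_1,c_2,c_4\}\), and testing (VPB1) leaves \(\{c_1,c_2\}\) as the unique B-reduct.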

The obtained discernibility functions are shown in Table 7.4. For the approximate discernibility functions, the first function in the parentheses characterizes the necessary condition of the preservation and the second characterizes the sufficient condition. The discernibility functions related to LU-, LUN-, and BUN-reducts can be obtained by taking conjunctions of the discernibility functions related to L-, U-, B-, and UN-reducts. Note that \(\hat{F}^\mathrm{B}_\beta \wedge \hat{F}^\mathrm{UN}_\beta =(F^\mathrm{L}_\beta \wedge F^\mathrm{U}_\beta )\wedge F^\mathrm{U}_\beta =F^\mathrm{L}_\beta \wedge F^\mathrm{U}_\beta \). This is why \(F^\mathrm{L}_\beta \wedge F^\mathrm{U}_\beta \) is the discernibility function characterizing a sufficient condition for the preserving condition of BUN-reducts.

Table 7.4 Discernibility functions related to 9 kinds of reducts
Table 7.5 The decision table in Table 7.3 with the generalized decision functions

Example 9

Remember the decision table \(\mathbb {D} = (U,C\cup \{d\},\{V_a\})\) in Table 7.3. Let an admissible error rate be \(\beta = 0.39\). In Table 7.5, we show the decision table with three generalized decision functions \(\lambda _C^{0.39}\), \(\upsilon _C^{0.39}\), \((\upsilon \backslash \lambda )_C^{0.39}\) with respect to \(C\) and \(\beta =0.39\).

Now let us enumerate reducts as prime implicants of discernibility functions. First, we discuss L-, U-, and LU-reducts with \(\beta =0.39\). The discernibility matrix of the decision table is shown below.

$$\begin{aligned} \begin{array}{c|ccccc} &{} P_1 &{} P_2 &{} P_3 &{} P_4 &{} P_5 \\ \hline P_1 &{} \emptyset &{} \{c_3,c_4\} &{} \{c_1\} &{} \{c_1,c_2\} &{} \{c_2,c_3\} \\ P_2 &{} \{c_3,c_4\} &{} \emptyset &{} \{c_1,c_3,c_4\} &{} C &{} \{c_2,c_4\} \\ P_3 &{} \{c_1\} &{} \{c_1,c_3,c_4\} &{} \emptyset &{} \{c_2\} &{} \{c_1,c_2,c_3\} \\ P_4 &{} \{c_1,c_2\} &{} C &{} \{c_2\} &{} \emptyset &{} \{c_1,c_3\} \\ P_5 &{} \{c_2,c_3\} &{} \{c_2,c_4\} &{} \{c_1,c_2,c_3\} &{} \{c_1,c_3\} &{} \emptyset \end{array} \end{aligned}$$

From the table, we obtain \(F^\mathrm{L}_{0.39}\) and \(F^\mathrm{U}_{0.39}\) as follows:

$$\begin{aligned} F^{\mathrm {L}}_{0.39}(\tilde{c}_1,\tilde{c}_2,\tilde{c}_3,\tilde{c}_4)&= \bigwedge _{i=1,2,j=3,4,5}\bigvee _{c\in m_{ij}}\tilde{c} \wedge \bigvee _{c\in m_{12}}\tilde{c} \\&= (\tilde{c}_1)\wedge (\tilde{c}_2\vee \tilde{c}_3)\wedge (\tilde{c}_2\vee \tilde{c}_4)\wedge (\tilde{c}_3\vee \tilde{c}_4) \\&= (\tilde{c}_1\wedge \tilde{c}_2\wedge \tilde{c}_3)\vee (\tilde{c}_1\wedge \tilde{c}_2\wedge \tilde{c}_4)\vee (\tilde{c}_1\wedge \tilde{c}_3\wedge \tilde{c}_4),\\ F^{\mathrm {U}}_{0.39}(\tilde{c}_1,\tilde{c}_2,\tilde{c}_3,\tilde{c}_4)&= \bigwedge _{(i,j)\in \{(k,l)|k\ne l\}\setminus \{(1,3),(3,1)\}}\bigvee _{c\in m_{ij}}\tilde{c} \\&= (\tilde{c}_2)\wedge (\tilde{c}_1\vee \tilde{c}_3)\wedge (\tilde{c}_3\vee \tilde{c}_4)=(\tilde{c}_2\wedge \tilde{c}_3)\vee (\tilde{c}_1\wedge \tilde{c}_2\wedge \tilde{c}_4). \end{aligned}$$

We obtain \(F^\mathrm{LU}_{0.39}\) as:

$$\begin{aligned} F^{\mathrm {LU}}_{0.39}(\tilde{c}_1,\tilde{c}_2,\tilde{c}_3,\tilde{c}_4)&= F^\mathrm{L}(\tilde{c}_1,\tilde{c}_2,\tilde{c}_3,\tilde{c}_4) \wedge F^\mathrm{U}(\tilde{c}_1,\tilde{c}_2,\tilde{c}_3,\tilde{c}_4) \\&= (\tilde{c}_1)\wedge (\tilde{c}_2)\wedge (\tilde{c}_3\vee \tilde{c}_4) = (\tilde{c}_1\wedge \tilde{c}_2\wedge \tilde{c}_3)\vee (\tilde{c}_1\wedge \tilde{c}_2\wedge \tilde{c}_4). \end{aligned}$$

Therefore, the L-reducts are \(\{c_1,c_2,c_3\}\), \(\{c_1,c_2,c_4\}\) and \(\{c_1,c_3,c_4\}\); the U-reducts are \(\{c_2,c_3\}\) and \(\{c_1,c_2,c_4 \}\); and the LU-reducts are \(\{c_1,c_2,c_3\}\) and \(\{c_1,c_2,c_4\}\). Note that \(\{c_2,c_3\}\) is a U-reduct but not an L-reduct. This is very different from the relation between L- and U-reducts in the classical RSM, where a U-reduct includes an L-reduct but an L-reduct never includes a U-reduct.

Now let us discuss B-, P-, and UN-reducts with \(\beta =0.39\). For these, we can obtain only approximate characterizations. To this end, let us compute the discernibility functions \(\check{F}^\mathrm{B}_{0.39}\), \(\check{F}^\mathrm{P}_{0.39}\), and \(\check{F}^\mathrm{UN}_{0.39}\). For B-reducts, considering the second condition of \(\varDelta ^{\mathrm {B}}_{0.39}\), we check each pair \(P_i\) and \(P_j\) such that \((\upsilon \backslash \lambda )_{C}^{0.39}(P_i)=(\upsilon \backslash \lambda )_{C}^{0.39}(P_j)=\emptyset \).

$$\begin{aligned} (\upsilon \backslash \lambda )_{C\setminus m_{12}}^{0.39}(P_1) = (\upsilon \backslash \lambda )_{C\setminus m_{12}}^{0.39}(P_2) = \emptyset ,&(\upsilon \backslash \lambda )_{C\setminus m_{15}}^{0.39}(P_1) = (\upsilon \backslash \lambda )_{C\setminus m_{15}}^{0.39}(P_5) = \emptyset ,\\ (\upsilon \backslash \lambda )_{C\setminus m_{25}}^{0.39}(P_2) = (\upsilon \backslash \lambda )_{C\setminus m_{25}}^{0.39}(P_5) = \emptyset . \end{aligned}$$

Hence, no such pair belongs to \(\varDelta ^{\mathrm {B}}_{0.39}\). For P-reducts, considering the third condition of \(\varDelta ^{\mathrm {P}}_{0.39}\), we check each pair such that \(\lambda _{C}^{0.39}(P_i)\ne \emptyset \) and \(\lambda _{C}^{0.39}(P_j)\ne \emptyset \).

$$\begin{aligned} \lambda _{C\setminus m_{12}}^{0.39}(P_1) = \lambda _{C\setminus m_{12}}^{0.39}(P_2) = \{\text {medium}\}. \end{aligned}$$

Hence, the pair \((P_1,P_2)\) does not belong to \(\varDelta ^{\mathrm {P}}_{0.39}\) either. Finally, for UN-reducts, considering the third condition of \(\varDelta ^{\mathrm {UN}}_{0.39}\), we check each pair such that \(\upsilon _{C}^{0.39}(P_i)\ne \emptyset \) and \(\upsilon _{C}^{0.39}(P_j)\ne \emptyset \).

$$\begin{aligned} \begin{array}{ll} &{}\upsilon _{C\setminus m_{12}}^{0.39}(P_1) = \upsilon _{C\setminus m_{12}}^{0.39}(P_2) = \{\text {medium}\}, \quad \upsilon _{C\setminus m_{13}}^{0.39}(P_1) = \upsilon _{C\setminus m_{13}}^{0.39}(P_3) = \{\text {good}\},\\ &{}\upsilon _{C\setminus m_{14}}^{0.39}(P_1) = \upsilon _{C\setminus m_{14}}^{0.39}(P_4) = \{\text {good}\}, \quad \upsilon _{C\setminus m_{23}}^{0.39}(P_2) = \upsilon _{C\setminus m_{23}}^{0.39}(P_3) = \{\text {medium}\},\\ &{}\upsilon _{C\setminus m_{24}}^{0.39}(P_2) = \upsilon _{C\setminus m_{24}}^{0.39}(P_4) = \{\text {medium}\}, \quad \upsilon _{C\setminus m_{34}}^{0.39}(P_3) = \upsilon _{C\setminus m_{34}}^{0.39}(P_4) = \{\text {good}\}. \end{array} \end{aligned}$$

Since none of the checked pairs needs to be discerned, the discernibility functions \(\check{F}^\mathrm{B}_{0.39}\), \(\check{F}^\mathrm{P}_{0.39}\), and \(\check{F}^\mathrm{UN}_{0.39}\) are obtained as:

$$\begin{aligned} \check{F}^\mathrm{B}_{0.39}(\tilde{c}_1,\tilde{c}_2,\tilde{c}_3,\tilde{c}_4)&= \bigwedge _{i=1,2,5,j=3,4}\bigvee _{c\in m_{ij}}\tilde{c}\wedge \bigvee _{c\in m_{34}}\tilde{c}=\tilde{c}_1\wedge \tilde{c}_2, \\ \check{F}^\mathrm{P}_{0.39}(\tilde{c}_1,\tilde{c}_2,\tilde{c}_3,\tilde{c}_4)&= \bigwedge _{i=1,2,j=3,4,5}\bigvee _{c\in m_{ij}}\tilde{c}=(\tilde{c}_1)\wedge (\tilde{c}_2\vee \tilde{c}_3) \wedge (\tilde{c}_2\vee \tilde{c}_4) \\&= (\tilde{c}_1\wedge \tilde{c}_2) \vee (\tilde{c}_1\wedge \tilde{c}_3\wedge \tilde{c}_4),\\ \check{F}^{\mathrm {UN}}_{0.39}(\tilde{c}_1,\tilde{c}_2,\tilde{c}_3,\tilde{c}_4)&= \bigwedge _{i=1,2,3,4,j=5}\bigvee _{c\in m_{ij}}\tilde{c}=(\tilde{c}_1\vee \tilde{c}_3) \wedge (\tilde{c}_2\vee \tilde{c}_3) \wedge (\tilde{c}_2\vee \tilde{c}_4) \\&= (\tilde{c}_1\wedge \tilde{c}_2) \vee (\tilde{c}_2\wedge \tilde{c}_3)\vee (\tilde{c}_3\wedge \tilde{c}_4). \end{aligned}$$

Because \(\hat{F}^{\mathrm {B}}_{0.39}=F^{\mathrm {L}}_{0.39}\wedge F^{\mathrm {U}}_{0.39}=(\tilde{c}_1\wedge \tilde{c}_2\wedge \tilde{c}_3)\vee (\tilde{c}_1\wedge \tilde{c}_2\wedge \tilde{c}_4)\), the candidates of B-reducts are,

\(\{c_1,c_2\}, \quad \{c_1,c_2,c_3\}, \quad \{c_1,c_2,c_4\}\).

We can check that all of these satisfy (VPB1); hence, \(\{c_1,c_2\}\) is the unique B-reduct. Because \(\hat{F}^{\mathrm {P}}_{0.39}=F^{\mathrm {L}}_{0.39}=(\tilde{c}_1\wedge \tilde{c}_2\wedge \tilde{c}_3)\vee (\tilde{c}_1\wedge \tilde{c}_2\wedge \tilde{c}_4)\vee (\tilde{c}_1\wedge \tilde{c}_3\wedge \tilde{c}_4)\), the candidates of P-reducts are,

\(\{c_1,c_2\}, \quad \{c_1,c_2,c_3\}, \quad \{c_1,c_2,c_4\}, \quad \{c_1,c_3,c_4\}\).

In this case too, all candidates satisfy (VPP1); hence, the minimal ones, \(\{c_1,c_2\}\) and \(\{c_1,c_3,c_4\}\), are the P-reducts. Similarly, the candidates of UN-reducts are,

\(\{c_1,c_2\}, \quad \{c_2,c_3\}, \quad \{c_3,c_4\}, \quad \{c_1,c_2,c_4\}, \quad \{c_1,c_3,c_4\}\),

and all candidates satisfy (VPUN1); hence, \(\{c_1,c_2\}\), \(\{c_2,c_3\}\), and \(\{c_3,c_4\}\) are the UN-reducts.

All reducts are arranged in Table 7.6. We can observe that the different kinds of reducts generally differ from one another. In this example, each L-reduct is also an LUN-reduct and vice versa; such an equivalence holds in this example but not in general.

Table 7.6 All obtained reducts with \(\beta =0.39\) in Table 7.3

In this example, we would select \(\{c_1,c_2,c_3\}\) or \(\{c_1,c_2,c_4\}\) to preserve all structures; additionally, \(c_1\) and \(c_2\) appear in many other reducts. On the other hand, we would select the U-reduct \(\{c_2,c_3\}\) if we prefer a smaller reduct.

4 Structure-Based Attribute Reduction in Dominance-Based Rough Set Models

4.1 Decision Tables Under Dominance Principle and Dominance-Based Rough Set Models

In the Dominance-based Rough Set Model (DRSM), also known as the Dominance-based Rough Set Approach [16, 18, 49], decision tables with order relations are analyzed. Let \(\mathbb {D} = (U, AT =C\cup \{d\},\{V_a\}_{a\in AT })\) be a decision table. The attribute set \( AT \) is partitioned into \( AT _N\) and \( AT _C\), where \( AT _N\) is the set of nominal attributes and \( AT _C\) is the set of criteria (ordinal attributes). For each criterion \(a\in AT _C\), we suppose a total order \(\ge \) on its value set \(V_a\). Moreover, all criteria are of the gain-type, i.e., the greater the better. We assume that the decision attribute \(d\) is a criterion.

In DRSM, it is supposed that if an object \(u\) is better than or equal to another object \(u'\) with respect to all condition attributes, then the class of \(u\) should not be worse than that of \(u'\). This is called the dominance principle [16].

Remark 8

The setting of DRSM corresponds to the monotone or ordinal classification problem [2, 3, 32], where classifiers are restricted to be monotonic. Let \(f\) be a classifier, which assigns to each object \(u\) a class label (decision class value) \(f(u)\). The classifier \(f\) is monotonic if for any pair of objects \(u\) and \(u'\), \(u\le u'\) implies \(f(u)\le f(u')\). In this chapter, however, we discuss neither classifiers nor algorithms for building them.

Remark 9

We assume a total order, i.e., antisymmetry, transitivity, and comparability, on the value set \(V_a\) of each condition criterion \(a\in AT _C\cap C\). However, the results of this section can be applied without modification even when comparability does not hold. Additionally, we assume that all criteria are of the gain-type. In applications, we may encounter cost-type criteria, i.e., the smaller the better; a cost-type criterion can be treated as gain-type by reversing the order of its values.

Remark 10

Generally, there can be more than one decision attribute in a decision table. In such a case, the set of decision classes (the partition of the objects by the decision attributes) is partially ordered, while it is totally ordered in the case of a single decision attribute. In this section, we focus on the case of a single decision attribute (more generally, the case where the decision classes form a totally ordered set); however, the results of this section could be straightforwardly extended to multiple decision attributes.

For \(A\subseteq C\), a dominance relation \(D_A\) on \(U\) is defined by:

$$\begin{aligned} D_A=&\left\{ \!(u,u')\!\in U^2|a(u)\ge a(u'),\forall a\in AT_C\cap A \right. \\&\quad \left. \text { and } a(u)=a(u'),\forall a\in AT_N\cap A\right\} . \end{aligned}$$

\(D_A\) satisfies reflexivity and transitivity. When \((u,u')\in D_A\), we say that \(u\) dominates \(u'\) with respect to \(A\). The relation \((u,u')\in D_A\) means “\(u\) is better than or equal to \(u'\) with respect to criteria \(A\)”. For \(u\in U\), its dominating set and its dominated set with respect to \(A\) are defined, respectively, by:

$$\begin{aligned} D_A^+(u)&= \{u'\in U~|~(u',u)\in D_A\}, \\ D_A^-(u)&= \{u'\in U~|~(u,u')\in D_A\}. \end{aligned}$$

The dominating set \(D_A^+(u)\) (resp. the dominated set \(D_A^-(u)\)) is the set of the objects dominating (resp. dominated by) \(u\) under \(A\).

Since decision classes are ordered \(X_1 < X_2 < \dots < X_p\), one can define an upward union of decision classes \(X_i^\ge \) and a downward union of decision classes \(X_i^\le \) with respect to each class \(X_i\), \(i\in V_d\), as follows:

$$\begin{aligned} X_i^\ge = \bigcup _{j\ge i}X_j,\quad X_i^\le = \bigcup _{j\le i}X_j. \end{aligned}$$

For convenience, \(X_0^\le = X_{p+1}^\ge = \emptyset \). We have \(X_i^\ge = U\setminus X_{i-1}^\le \).

Example 10

Consider a decision table \(\mathbb {D} = (U,C\cup \{d\},\{V_a\})\) given in Table 7.7. This table shows student evaluations in a school. The objects are seven students, i.e., \(U=\{u_1,u_2,\ldots ,u_7\}\). The condition attributes are scores of mathematics (Ma), physics (Ph) and literature (Li), while the decision attribute (\(d\)) is a comprehensive evaluation (E). Namely, \(C=\) {Ma, Ph, Li} and \(d=\) E. We may assume that the better a student's scores in all subjects, the better the comprehensive evaluation he/she gets.

Table 7.7 A decision table of student records

Let \(A=\){Ma, Ph}. The dominance relation \(D_A\) is described by the following matrix. Symbol \(*\) indicates that the corresponding row object \(u_i\) and column object \(u_j\) are in the dominance relation, i.e., \((u_i,u_j)\in D_A\).

\( \begin{array}{l|lllllll} &{} u_1 &{} u_2 &{} u_3 &{} u_4 &{} u_5 &{} u_6 &{} u_7 \\ \hline u_1 &{} * &{} * &{} * &{} * &{} * &{} * &{} * \\ u_2 &{} * &{} * &{} * &{} * &{} * &{} * &{} * \\ u_3 &{} &{} &{} * &{} * &{} * &{} * &{} * \\ u_4 &{} &{} &{} &{} * &{} &{} &{} * \\ u_5 &{} &{} &{} &{} &{} * &{} * &{} * \\ u_6 &{} &{} &{} &{} &{} * &{} * &{} * \\ u_7 &{} &{} &{} &{} &{} &{} &{} * \\ \end{array} \)

For each object \(u_i\in U\), the symbols \(*\) in the row of \(u_i\) indicate the objects in \(D_A^-(u_i)\), while the symbols \(*\) in the column of \(u_i\) indicate the objects in \(D_A^+(u_i)\). For example, \(D_A^-(u_3) = \{u_3,u_4,u_5,u_6,u_7\}\) and \(D_A^+(u_3) = \{u_1,u_2,u_3\}\).

There are three decision classes \(X_{\text {b}} = \{u_5,u_7\}\), \(X_{\text {m}} = \{u_2,u_4,u_6\}\) and \(X_{\text {g}} = \{u_1,u_3\}\) for bad, med and good, respectively. The upward and downward unions of those decision classes are,

$$\begin{aligned} \begin{array}{ll} &{}X^\ge _{\text {b}} = U, \quad X^\ge _{\text {m}} = \{u_1,u_2,u_3,u_4,u_6\}, \quad X^\ge _{\text {g}} = \{u_1,u_3\}, \\ &{}X^\le _{\text {b}} = \{u_5,u_7\}, \quad X^\le _{\text {m}} = \{u_2,u_4,u_5,u_6,u_7\}, \quad X^\le _{\text {g}} = U. \end{array} \end{aligned}$$

Given a decision table, the inconsistency with respect to the dominance principle is captured by the difference between upper and lower approximations of the unions of decision classes. Given a condition attribute set \(A\subseteq C\), and \(i\in V_d\), the lower approximation \(\mathrm {LA}_{A}(X_i^\ge )\) and the upper approximation \(\mathrm {UA}_{A}(X_i^\ge )\) of \(X^\ge _i\) are defined, respectively, by:

$$\begin{aligned} \mathrm {LA}_{A}(X_i^\ge )&= \{u\in U~|~D_A^+(u)\subseteq X_i^\ge \},\\ \mathrm {UA}_{A}(X_i^\ge )&= \{u\in U~|~D_A^-(u)\cap X_i^\ge \ne \emptyset \}. \end{aligned}$$

Similarly, the lower approximation \(\mathrm {LA}_{A}(X_i^\le )\) of \(X^\le _i\) and upper approximation \(\mathrm {UA}_{A}(X_i^\le )\) are defined, respectively, by:

$$\begin{aligned} \mathrm {LA}_{A}(X_i^\le )&= \{u\in U~|~D_A^-(u)\subseteq X_i^\le \},\\ \mathrm {UA}_{A}(X_i^\le )&= \{u\in U~|~D_A^+(u)\cap X_i^\le \ne \emptyset \}. \end{aligned}$$

If \(u\) belongs to \(\mathrm {LA}_{A}(X_i^{\ge })\) then no object dominating \(u\) belongs to \(X_{i-1}^\le \), i.e., there exists no evidence for \(u\in X_{i-1}^\le \) in view of the monotonicity assumption. Therefore, we can say that \(u\) certainly belongs to \(X_i^{\ge }\). On the other hand, if \(u\) belongs to \(\mathrm {UA}_{A}(X_i^{\ge })\) then \(u\) dominates an object belonging to \(X_i^\ge \), i.e., there exists evidence for \(u\in X_i^\ge \) in view of the monotonicity assumption. Therefore, we can say that \(u\) possibly belongs to \(X_i^{\ge }\). Similar interpretations apply to \(\mathrm {LA}_{A}(X_i^{\le })\) and \(\mathrm {UA}_{A}(X_i^{\le })\).

The difference between the upper and lower approximations is called a boundary. The boundaries of an upward union \(X_i^\ge \) and a downward union \(X_i^\le \), denoted by \(\mathrm {BN}_A(X_i^{\ge })\) and \(\mathrm {BN}_A(X_i^{\le })\), are defined by:

$$\begin{aligned} \mathrm {BN}_A(X_i^{\ge })&= \mathrm {UA}_{A}(X_i^{\ge })\setminus \mathrm {LA}_{A}(X_i^{\ge }),\\ \mathrm {BN}_A(X_i^{\le })&= \mathrm {UA}_{A}(X_i^{\le })\setminus \mathrm {LA}_{A}(X_i^{\le }). \end{aligned}$$

Objects in the boundary region of an upward or downward union are classified neither to that union nor to the complement with certainty.

Example 11

Remember the decision table \(\mathbb {D}=(U,C\cup \{d\},\{V_a\})\) in Table 7.7. Let \(A=\){Ma, Ph}. The lower and upper approximations of the upward and downward unions with respect to \(A\) are obtained as follows.

$$\begin{aligned} \begin{array}{ll} &{}\mathrm {LA}_A(X^\ge _{\text {b}}) = U, \quad \mathrm {LA}_A(X^\ge _{\text {m}}) = \{u_1,u_2,u_3,u_4\}, \quad \mathrm {LA}_A(X^\ge _{\text {g}}) = \emptyset , \\ &{}\mathrm {LA}_A(X^\le _{\text {b}}) = \{u_7\}, \quad \mathrm {LA}_A(X^\le _{\text {m}}) = \{u_4,u_5,u_6,u_7\}, \quad \mathrm {LA}_A(X^\le _{\text {g}}) = U, \\ &{}\mathrm {UA}_A(X^\ge _{\text {b}}) = U, \quad \mathrm {UA}_A(X^\ge _{\text {m}}) = U\setminus \{u_7\}, \quad \mathrm {UA}_A(X^\ge _{\text {g}}) = \{u_1,u_2,u_3\}, \\ &{}\mathrm {UA}_A(X^\le _{\text {b}}) = \{u_5,u_6,u_7\}, \quad \mathrm {UA}_A(X^\le _{\text {m}}) = U, \quad \mathrm {UA}_A(X^\le _{\text {g}}) = U. \end{array} \end{aligned}$$
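The computations of Examples 10 and 11 can be replayed mechanically. The sketch below encodes the dominance matrix of Example 10 (objects numbered 1 to 7) and evaluates the four approximation operators; the helper names are ours.

```python
U = {1, 2, 3, 4, 5, 6, 7}
# Rows of the matrix in Example 10: u_i dominates exactly the objects listed.
rows = {1: {1, 2, 3, 4, 5, 6, 7}, 2: {1, 2, 3, 4, 5, 6, 7}, 3: {3, 4, 5, 6, 7},
        4: {4, 7}, 5: {5, 6, 7}, 6: {5, 6, 7}, 7: {7}}
DOM = {(i, j) for i, js in rows.items() for j in js}       # (i, j): u_i dominates u_j

Dplus = {u: {v for v in U if (v, u) in DOM} for u in U}    # dominating sets D_A^+
Dminus = {u: {v for v in U if (u, v) in DOM} for u in U}   # dominated sets D_A^-

def LA_up(X):   return {u for u in U if Dplus[u] <= X}     # LA_A of an upward union
def UA_up(X):   return {u for u in U if Dminus[u] & X}     # UA_A of an upward union
def LA_down(X): return {u for u in U if Dminus[u] <= X}    # LA_A of a downward union
def UA_down(X): return {u for u in U if Dplus[u] & X}      # UA_A of a downward union

print(LA_up({1, 2, 3, 4, 6}))  # {1, 2, 3, 4} = LA_A(X>=med), as in Example 11
print(UA_up({1, 3}))           # {1, 2, 3}    = UA_A(X>=good)
```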

Now, let us recall properties of the approximations [16, 18, 31]. By the boundary conditions on \(X^\ge \) and \(X^\le \),

$$\begin{aligned} \mathrm {UA}_{A}(X_1^{\ge })&= \mathrm {LA}_{A}(X_1^{\ge })=U, \quad \mathrm {UA}_{A}(X_p^{\le }) = \mathrm {LA}_{A}(X_p^{\le })=U, \nonumber \\ \mathrm {UA}_{A}(X_{p+1}^{\ge })&= \mathrm {LA}_{A}(X_{p+1}^{\ge })=\emptyset , \quad \mathrm {UA}_{A}(X_0^{\le }) = \mathrm {LA}_{A}(X_0^{\le })=\emptyset . \end{aligned}$$
(7.21)

Let \(A\subseteq C\) and \(i\in V_d\). Similarly to RSM, there exist inclusion relations between each union of decision classes and its lower and upper approximations.

$$\begin{aligned} \mathrm {LA}_{A}(X_i^{\ge })\subseteq X_i^{\ge }\subseteq \mathrm {UA}_{A}(X_i^{\ge }), \quad \mathrm {LA}_{A}(X_i^{\le })\subseteq X_i^{\le }\subseteq \mathrm {UA}_{A}(X_i^{\le }). \end{aligned}$$
(7.22)

Approximations are expressed by unions of dominating or dominated sets,

$$\begin{aligned} \mathrm {LA}_{A}(X_i^{\ge })&= \bigcup _{D_A^+(u)\subseteq X_i^{\ge }}D_A^+(u) = \bigcup _{u\in \mathrm {LA}_A(X_i^{\ge })}D_A^+(u), \\ \mathrm {UA}_{A}(X_i^{\ge })&= \bigcup _{D_A^-(u)\cap X_i^{\ge }\ne \emptyset }D_A^+(u) = \bigcup _{u\in \mathrm {UA}_A(X_i^{\ge })}D_A^+(u), \\ \mathrm {LA}_{A}(X_i^{\le })&= \bigcup _{D_A^-(u)\subseteq X_i^{\le }}D_A^-(u) = \bigcup _{u\in \mathrm {LA}_A(X_i^{\le })}D_A^-(u), \\ \mathrm {UA}_{A}(X_i^{\le })&= \bigcup _{D_A^+(u)\cap X_i^{\le }\ne \emptyset }D_A^-(u) = \bigcup _{u\in \mathrm {UA}_A(X_i^{\le })}D_A^-(u). \end{aligned}$$

There exists duality of lower and upper approximations.

$$\begin{aligned} \mathrm {UA}_{A}(X_i^{\ge })=U\setminus \mathrm {LA}_{A}(X_{i-1}^{\le }), \quad \mathrm {UA}_{A}(X_{i}^{\le })=U\setminus \mathrm {LA}_{A}(X_{i+1}^{\ge }). \end{aligned}$$
(7.23)

So, the upper approximations of the pair of complementary unions of decision classes form a cover of \(U\):

$$\begin{aligned} \mathrm {UA}_{A}(X_i^{\ge })\cup \mathrm {UA}_{A}(X_{i-1}^{\le })=U. \end{aligned}$$
(7.24)

By the duality of lower and upper approximations, the boundaries of the pair of complementary unions are the same,

$$\begin{aligned} \mathrm {BN}_A(X_i^{\ge })=\mathrm {BN}_A(X_{i-1}^{\le }). \end{aligned}$$
(7.25)

Lower and upper approximations can be expressed by boundaries. That is useful for investigating relations between different types of reducts:

$$\begin{aligned} \mathrm {UA}_{A}(X_i^{\ge })&= \mathrm {BN}_A(X_i^{\ge })\cup X_i^{\ge }, \quad \mathrm {UA}_{A}(X_i^{\le }) = \mathrm {BN}_A(X_i^{\le })\cup X_i^{\le }, \end{aligned}$$
(7.26)
$$\begin{aligned} \mathrm {LA}_{A}(X_i^{\ge })&= X_i^{\ge }\setminus \mathrm {BN}_A(X_i^{\ge }), \quad \mathrm {LA}_{A}(X_i^{\le }) = X_i^{\le }\setminus \mathrm {BN}_A(X_i^{\le }). \end{aligned}$$
(7.27)

Let \(A,B \subseteq C\) and \(i,j \in V_d\). Then, we have the following monotonicity properties:

$$\begin{aligned} j\ge i&\Rightarrow \mathrm {LA}_{A}(X_j^{\ge })\subseteq \mathrm {LA}_{A}(X_i^{\ge }), \quad \mathrm {UA}_{A}(X_j^{\ge })\subseteq \mathrm {UA}_{A}(X_i^{\ge }), \end{aligned}$$
(7.28)
$$\begin{aligned} j\le i&\Rightarrow \mathrm {LA}_{A}(X_j^{\le })\subseteq \mathrm {LA}_{A}(X_i^{\le }), \quad \mathrm {UA}_{A}(X_j^{\le })\subseteq \mathrm {UA}_{A}(X_i^{\le }), \end{aligned}$$
(7.29)
$$\begin{aligned} B\subseteq A&\Rightarrow \mathrm {LA}_{B}(X_i^{\ge })\subseteq \mathrm {LA}_{A}(X_i^{\ge }), \quad \mathrm {LA}_{B}(X_i^{\le })\subseteq \mathrm {LA}_{A}(X_i^{\le }), \end{aligned}$$
(7.30)
$$\begin{aligned} B\subseteq A&\Rightarrow \mathrm {UA}_{B}(X_i^{\ge })\supseteq \mathrm {UA}_{A}(X_i^{\ge }), \quad \mathrm {UA}_{B}(X_i^{\le })\supseteq \mathrm {UA}_{A}(X_i^{\le }). \end{aligned}$$
(7.31)

Those are important for defining and enumerating reducts.

Furthermore, the authors proposed lower and upper approximations and boundary regions of decision classes [31]. For \(A\subseteq C\) and \(i\in V_d\), lower and upper approximations of \(X_i\) and the boundary region of \(X_i\) are defined by:

$$\begin{aligned} \mathrm {LA}_{A}(X_i)&= \mathrm {LA}_{A}(X_i^{\ge })\cap \mathrm {LA}_{A}(X_i^{\le }), \\ \mathrm {UA}_{A}(X_i)&= \mathrm {UA}_{A}(X_i^{\ge })\cap \mathrm {UA}_{A}(X_i^{\le }), \\ \mathrm {BN}_{A}(X_i)&= \mathrm {UA}_{A}(X_i)\setminus \mathrm {LA}_{A}(X_i). \end{aligned}$$

This definition is an analogy to \(X_i=X_i^\ge \cap X_i^\le \).
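Continuing the sketch given after Example 11, the class approximations are obtained by intersecting the corresponding approximations of unions:

```python
# Decision classes of Table 7.7, ordered bad < med < good (indices 0, 1, 2).
CLASSES = [{5, 7}, {2, 4, 6}, {1, 3}]

def union_ge(i): return set().union(*CLASSES[i:])       # upward union X_i>=
def union_le(i): return set().union(*CLASSES[:i + 1])   # downward union X_i<=

def LA_class(i): return LA_up(union_ge(i)) & LA_down(union_le(i))
def UA_class(i): return UA_up(union_ge(i)) & UA_down(union_le(i))

print(LA_class(1))  # {4}: only u4 certainly belongs to class "med"
```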

Let \(A\subseteq C\) and \(i\in V_d\). The upper approximations of \(X^\ge _i\) and \(X^\le _i\) are represented by upper approximations of decision classes:

$$\begin{aligned} \mathrm {UA}_{A}(X_i^{\ge })&= \bigcup _{j\ge i}\mathrm {UA}_{A}(X_j), \end{aligned}$$
(7.32)
$$\begin{aligned} \mathrm {UA}_{A}(X_i^{\le })&= \bigcup _{j\le i}\mathrm {UA}_{A}(X_j). \end{aligned}$$
(7.33)

The boundary of \(X_i\) is the union of the boundaries of \(X_i^\ge \) and \(X_i^\le \),

$$\begin{aligned} \mathrm {BN}_A(X_i)= \mathrm {BN}_A(X_i^{\ge })\cup \mathrm {BN}_A(X_i^{\le }). \end{aligned}$$
(7.34)

Approximations of decision classes have similar properties as those of unions of decision classes:

$$\begin{aligned} \mathrm {LA}_{A}(X_i)\subseteq&X_i\subseteq \mathrm {UA}_{A}(X_i), \end{aligned}$$
(7.35)
$$\begin{aligned} \mathrm {UA}_{A}(X_i)=&\mathrm {BN}_A(X_i)\cup X_i, \end{aligned}$$
(7.36)
$$\begin{aligned} \mathrm {LA}_{A}(X_i)=&X_i\setminus \mathrm {BN}_A(X_i). \end{aligned}$$
(7.37)

The next properties are analogies to (7.6) and (7.5) of the classical RSM.

$$\begin{aligned} \mathrm {BN}_A(X_i)&= \mathrm {UA}_{A}(X_i)\cap \bigcup _{j\ne i} \mathrm {UA}_{A}(X_j), \end{aligned}$$
(7.38)
$$\begin{aligned} \mathrm {LA}_{A}(X_i)&= U\setminus \bigcup _{j\ne i}\mathrm {UA}_{A}(X_j). \end{aligned}$$
(7.39)

We define the positive region for the decision table in DRSM:

$$\begin{aligned} \mathrm {POS}_A(d) = \bigcup _{i\in V_d} \mathrm {LA}_A(X_i). \end{aligned}$$

The complement of the positive region is exactly the union of all boundaries,

$$\begin{aligned} U\setminus \mathrm {POS}_A(d)=\bigcup _{i\in V_d}\mathrm {BN}_{A}(X_i). \end{aligned}$$
(7.40)

Moreover, the approximations are also monotone with respect to the inclusion relation between condition attribute sets. Let \(A,B \subseteq C\) and \(i\in V_d\).

$$\begin{aligned} B\subseteq A\Rightarrow \mathrm {LA}_{B}(X_i)\subseteq \mathrm {LA}_{A}(X_i), \quad \mathrm {UA}_{B}(X_i)\supseteq \mathrm {UA}_{A}(X_i). \end{aligned}$$
(7.41)

The generalized decision function proposed by Dembczyński et al. [10] also plays an important role for Boolean reasoning in DRSM. It provides an object-wise view of DRSM. Let \(A\subseteq C\) and \(u\in U\); the generalized decision of \(u\) with respect to \(A\) is defined by \(\delta _A(u) = \langle l_A(u), u_A(u)\rangle \), where

$$\begin{aligned} l_A(u)&= \min \{i\in V_d\;|\;D_A^+(u)\cap X_i\ne \emptyset \}, \\ u_A(u)&= \max \{i\in V_d\;|\;D_A^-(u)\cap X_i\ne \emptyset \}. \end{aligned}$$

\(\delta _A(u)\) shows the interval of decision classes to which \(u\) may belong; \(l_A(u)\) and \(u_A(u)\) are the lower and upper bounds of this interval. Obviously, we have

$$\begin{aligned} l_A(u)\le u_A(u). \end{aligned}$$
(7.42)

\(l_A(u)\) and \(u_A(u)\) are monotone with respect to the inclusion relation between condition attribute sets. Namely, for \(B,A\subseteq C\) and \(u\in U\), we have

$$\begin{aligned} B\subseteq A\Rightarrow l_B(u)\le l_A(u), \quad u_B(u)\ge u_A(u). \end{aligned}$$
(7.43)

Let \(i\in V_d\), using the generalized decision function, the lower and upper approximations of unions are represented as:

$$\begin{aligned} \mathrm {LA}_{A}(X_i^{\ge })&=\{u\in U\;|\;l_A(u)\ge i\}, \quad \mathrm {UA}_{A}(X_i^{\ge })=\{u\in U\;|\;u_A(u)\ge i\}, \end{aligned}$$
(7.44)
$$\begin{aligned} \mathrm {LA}_{A}(X_i^{\le })&=\{u\in U\;|\;u_A(u)\le i\}, \quad \mathrm {UA}_{A}(X_i^{\le })=\{u\in U\;|\;l_A(u)\le i\}. \end{aligned}$$
(7.45)

We can represent approximations of classes using the generalized decision,

$$\begin{aligned} \mathrm {LA}_{A}(X_i)&= \{u\in U\;|\;l_A(u)=u_A(u)=i\}, \end{aligned}$$
(7.46)
$$\begin{aligned} \mathrm {UA}_{A}(X_i)&= \{u\in U\;|\;l_A(u)\le i\le u_A(u)\}, \end{aligned}$$
(7.47)
$$\begin{aligned} \mathrm {BN}_A(X_i)&= \left\{ u\in U\;|\;l_A(u)\le i\le u_A(u),\;l_A(u) < u_A(u) \right\} . \end{aligned}$$
(7.48)

Example 12

Remember the decision table \(\mathbb {D}=(U,C\cup \{d\},\{V_a\})\) in Table 7.7. Let \(A=\){Ma, Ph}. The generalized decision function \(\delta _A\) with respect to \(A\) is obtained as follows:

\(\begin{array}{ll} \delta _A(u_1) &{}= \langle \text {med},\text {good}\rangle , \quad \delta _A(u_2) = \langle \text {med},\text {good}\rangle , \quad \delta _A(u_3) = \langle \text {med},\text {good}\rangle , \\ \delta _A(u_4) &{}= \langle \text {med},\text {med}\rangle , \quad \delta _A(u_5) = \langle \text {bad},\text {med}\rangle , \quad \delta _A(u_6) = \langle \text {bad},\text {med}\rangle , \\ \delta _A(u_7) &{}= \langle \text {bad},\text {bad}\rangle . \end{array}\)
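These bounds are easy to compute from the dominating and dominated sets. Continuing the earlier sketch (with `CLASSES` as defined there), the following reproduces \(\delta _A\):

```python
NAMES = ["bad", "med", "good"]

def l_bound(u):  # smallest class index intersecting the dominating set
    return min(i for i, X in enumerate(CLASSES) if Dplus[u] & X)

def u_bound(u):  # largest class index intersecting the dominated set
    return max(i for i, X in enumerate(CLASSES) if Dminus[u] & X)

for u in sorted(U):
    print(u, NAMES[l_bound(u)], NAMES[u_bound(u)])  # e.g. u_1 -> (med, good)
```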

4.2 Structure-Based Reducts in Dominance-Based Rough Set Models

Before defining structure-based reducts in DRSM, we introduce a notion of reducts preserving the quality of sorting, proposed by Susmaga et al. [49]. For \(A\subseteq C\), the quality of sorting \(\gamma _A(d)\), which is the counterpart of the quality of classification in the classical RSM, is defined by:

$$\begin{aligned} \gamma _A(d)=\frac{|U-\bigcup _{i\in V_d}\mathrm {BN}_A(X^{\le }_i)|}{|U|}=\frac{|U-\bigcup _{i\in V_d}\mathrm {BN}_A(X^{\ge }_i)|}{|U|}. \end{aligned}$$

By (7.34) and (7.40), we can see that \(\gamma _A(d)\) is related to the positive region of DRSM,

$$\begin{aligned} \gamma _A(d)=\frac{|U-\bigcup _{i\in V_d}\mathrm {BN}_A(X^{\le }_i)|}{|U|}=\frac{|U-\bigcup _{i\in V_d}\mathrm {BN}_A(X_i)|}{|U|}=\frac{|\mathrm {POS}_A(d)|}{|U|}. \end{aligned}$$

We call this type of reducts Q-reducts.

Definition 15

([16, 49]) A Q-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

$$\begin{aligned} \gamma _A(d) = \gamma _C(d). \end{aligned}$$
(DQ)

Now, we introduce structure-based reducts in DRSM. The lower and upper approximations and boundary regions of the upward and downward unions can be considered as a structure over a given object set \(U\). From this point of view, the following 7 types of union-structure-based reducts are conceivable.

Definition 16

([25, 52]) We define 7 types of reducts as follows.

  • An L\(^\ge \)-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {LA}_{A}(X_i^{\ge }) = \mathrm {LA}_{C}(X_i^{\ge }) \quad \text { for all }i\in V_d. \end{aligned}$$
    (\({\mathrm{{DL}}}^{\ge }\))
  • An L\(^\le \)-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {LA}_{A}(X_i^{\le }) = \mathrm {LA}_{C}(X_i^{\le }) \quad \text { for all }i\in V_d. \end{aligned}$$
    (\(\mathrm{{DL}}^\le \))
  • A U\(^\ge \)-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {UA}_{A}(X_i^{\ge }) = \mathrm {UA}_{C}(X_i^{\ge }) \quad \text { for all }i\in V_d. \end{aligned}$$
    (\({\mathrm{{DU}}}^{\ge }\))
  • A U\(^\le \)-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {UA}_{A}(X_i^{\le }) = \mathrm {UA}_{C}(X_i^{\le }) \quad \text { for all }i\in V_d. \end{aligned}$$
    (\(\mathrm{{DU}}^\le \))
  • An L\(^\diamond \)-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {LA}_{A}(X_i^{\ge }) = \mathrm {LA}_{C}(X_i^{\ge })\text { and } \mathrm {LA}_{A}(X_i^{\le }) = \mathrm {LA}_{C}(X_i^{\le }) \quad \text { for all }i\in V_d. \end{aligned}$$
    (\(\mathrm{{DL}}^\diamond \))
  • A U\(^\diamond \)-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {UA}_{A}(X_i^{\ge }) = \mathrm {UA}_{C}(X_i^{\ge })\text { and } \mathrm {UA}_{A}(X_i^{\le }) = \mathrm {UA}_{C}(X_i^{\le }) \quad \text { for all }i\in V_d. \end{aligned}$$
    (\(\mathrm{{DU}}^\diamond \))
  • A B\(^\diamond \)-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {BN}_{A}(X_i^{\ge }) = \mathrm {BN}_{C}(X_i^{\ge })\text { and } \mathrm {BN}_{A}(X_i^{\le }) = \mathrm {BN}_{C}(X_i^{\le }) \quad \text { for all }i\in V_d. \end{aligned}$$
    (\(\mathrm{{DB}}^\diamond \))

Yang et al. [52] independently proposed four kinds of reducts in DRSM with unknown attribute values, which are an application of the distribution reducts of Mi et al. [33]. Those reducts preserve lower/upper approximations of upward/downward unions; hence, they correspond to our L\(^\ge \)-, L\(^\le \)-, U\(^\ge \)-, and U\(^\le \)-reducts. However, Yang et al. did not consider boundaries or combinations of different types of reducts.

From (7.23), we know that (\({\mathrm{{DL}}}^{\ge }\)) and (\(\mathrm{{DU}}^\le \)) are equivalent. Similarly, (\(\mathrm{{DL}}^\le \)) and (\({\mathrm{{DU}}}^{\ge }\)) are also equivalent. Therefore, (\(\mathrm{{DL}}^\diamond \)) is equivalent to (\(\mathrm{{DU}}^\diamond \)). Moreover, since condition (\(\mathrm{{DL}}^\diamond \)) implies conditions (\({\mathrm{{DL}}}^{\ge }\)) and (\(\mathrm{{DL}}^\le \)), any L\(^\diamond \)-reduct satisfies (\({\mathrm{{DL}}}^{\ge }\)) and also (\(\mathrm{{DL}}^\le \)). Similarly, since condition (\(\mathrm{{DU}}^\diamond \)) implies conditions (\({\mathrm{{DU}}}^{\ge }\)) and (\(\mathrm{{DU}}^\le \)), any U\(^\diamond \)-reduct satisfies (\({\mathrm{{DU}}}^{\ge }\)) and also (\(\mathrm{{DU}}^\le \)). Therefore, we have the following theorem.

Theorem 7

([25, 52]) Let \(A\) be a subset of \(C\). The following statements hold.

  • \(A\) is a U\(^\ge \)-reduct if and only if \(A\) is an L\(^\le \)-reduct.

  • \(A\) is a U\(^\le \)-reduct if and only if \(A\) is an L\(^\ge \)-reduct.

  • \(A\) is a U\(^\diamond \)-reduct if and only if \(A\) is an L\(^\diamond \)-reduct.

  • \(A\) is a B\(^\diamond \)-reduct if and only if \(A\) is an L\(^\diamond \)-reduct.

  • If \(A\) is an L\(^\diamond \)-reduct then \(A\) satisfies (\({\mathrm{{DL}}}^{\ge }\)), (\(\mathrm{{DL}}^\le \)), (\({\mathrm{{DU}}}^{\ge }\)), and (\(\mathrm{{DU}}^\le \)).

As a result of this discussion, we obtain 3 different types of reducts based on the structure induced from rough set operations on unions. They are represented by L\(^\ge \)-reducts, L\(^\le \)-reducts and L\(^\diamond \)-reducts.

Now, we are ready to define other types of structure-based reducts, based on approximations of the decision classes. The first kind, called L-reducts, preserves the lower approximations of the decision classes; the second kind, called U-reducts, preserves the upper approximations; the third kind, called B-reducts, preserves the boundary regions; and the fourth kind, called P-reducts, preserves the positive region. They are parallel to the L-, U-, B-, and P-reducts discussed in the classical RSM.

Definition 17

([31]) We define four types of reducts as follows.

  • An L-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {LA}_{A}(X_i) = \mathrm {LA}_{C}(X_i) \quad \text { for all }i\in V_d. \end{aligned}$$
    (DL)
  • A U-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {UA}_{A}(X_i) = \mathrm {UA}_{C}(X_i) \quad \text { for all }i\in V_d. \end{aligned}$$
    (DU)
  • A B-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {BN}_{A}(X_i) = \mathrm {BN}_{C}(X_i) \quad \text { for all }i\in V_d. \end{aligned}$$
    (DB)
  • A P-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {POS}_{A}(d) = \mathrm {POS}_{C}(d). \end{aligned}$$
    (DP)

From the properties of approximations of decision classes, we have the following theorem.

Theorem 8

([31]) Let \(A\) be a subset of \(C\). We have the following assertions:

  1. (a)

    \(A\) is a B-reduct if and only if \(A\) is a U-reduct,

  2. (b)

    \(A\) is a P-reduct if and only if \(A\) is an L-reduct,

  3. (c)

    If \(A\) is a U-reduct then \(A\) satisfies (DL).

Consequently, we have only 2 kinds of class-structure-based reducts: L-reducts and U-reducts (or B-reducts). This result is also parallel to the result in RSM.

Let us discuss the relations among the union-based reducts, the class-based reducts, and the Q-reducts. We have the following theorem.

Theorem 9

([31]) Let \(A\) be a subset of \(C\). We have the following assertions:

  1. (a)

    \(A\) is an L\(^\diamond \)-reduct if and only if \(A\) is a U-reduct,

  2. (b)

    \(A\) is a Q-reduct if and only if \(A\) is an L-reduct.

Additionally, we propose two more types of reducts, which are compounds of L-reducts with L\(^\ge \)- and L\(^\le \)-reducts, respectively.

Definition 18

We define two types of reducts as follows.

  • An LL\(^\ge \)-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {LA}_{A}(X_i) = \mathrm {LA}_{C}(X_i)\text { and } \mathrm {LA}_{A}(X_i^{\ge }) = \mathrm {LA}_{C}(X_i^{\ge }) \quad \text { for all }i\in V_d. \end{aligned}$$
    (DLL\(^\ge \))
  • An LL\(^\le \)-reduct in DRSM is a minimal condition attribute subset \(A\subseteq C\) satisfying the following condition:

    $$\begin{aligned} \mathrm {LA}_{A}(X_i) = \mathrm {LA}_{C}(X_i)\text { and } \mathrm {LA}_{A}(X_i^{\le }) = \mathrm {LA}_{C}(X_i^{\le }) \quad \text { for all }i\in V_d. \end{aligned}$$
    (DLL\(^\le \))

As a result, all types of reducts proposed in DRSM are arranged in Fig. 7.3. Consequently, there exist six different kinds of reducts in DRSM: U-reducts (equivalently B-, L\(^\diamond \)-, U\(^\diamond \)-, and B\(^\diamond \)-reducts), LL\(^\ge \)-reducts, LL\(^\le \)-reducts, L-reducts (equivalently P- and Q-reducts), L\(^\ge \)-reducts (equivalently U\(^\le \)-reducts), and L\(^\le \)-reducts (equivalently U\(^\ge \)-reducts).

Fig. 7.3 Strong-weak hierarchy of reducts in DRSM

4.3 Boolean Functions Representing Reducts

Because Boolean reasoning is a popular approach for enumerating all reducts of each type in the rough set literature, some authors have already shown Boolean functions representing their own types of reducts [49, 52]. On the other hand, the authors proposed a unified formulation of Boolean functions for all types of reducts using the generalized decision function in [31]. We only discuss Boolean functions for L\(^\ge \)-, L\(^\le \)-, and L-reducts, because U-reducts, LL\(^\ge \)-reducts, LL\(^\le \)-reducts, and their equivalents can be computed from L\(^\ge \)-, L\(^\le \)-, and L-reducts or their Boolean functions.

We represent preserving conditions of reducts by those of the generalized decision function.

Lemma 5

([31]) Let \(A\) be a subset of \(C\). We have the following assertions.

  • Condition (\({\mathrm{{DL}}}^{\ge }\)) is equivalent to:

    $$\begin{aligned} l_A(u) = l_C(u) \quad \text { for all }u\in U. \end{aligned}$$
    (DlG)
  • Condition (\(\mathrm{{DL}}^\le \)) is equivalent to:

    $$\begin{aligned} u_A(u) = u_C(u) \quad \text { for all }u\in U. \end{aligned}$$
    (DuG)
  • Condition (DL) is equivalent to:

    $$\begin{aligned} l_A(u) = l_C(u)\text { and }u_A(u) = u_C(u) \quad \text { for all }u\in U\text { with }l_C(u)=u_C(u). \end{aligned}$$
    (DLG)

The next lemma is parallel to Lemma 2 of RSM. It also connects two notions: “preserving” and “non-dominating”.

Lemma 6

([31]) Let \(u \in U\). The following assertions are equivalent.

  • \(l_A(u)=l_C(u)\).

  • \(\forall u'\in U,\;(l_C(u')<l_C(u) \Rightarrow \exists a\in A,\;(u',u)\not \in D_{\{a\}})\).

Moreover, the following assertions are also equivalent.

  • \(u_A(u)=u_C(u)\).

  • \(\forall u'\in U,\;(u_C(u')>u_C(u) \Rightarrow \exists a\in A,\;(u,u')\not \in D_{\{a\}})\).

Now we are ready to define a non-domination matrix, in place of the discernibility matrix of RSM. The non-domination matrix \(M=(m_{ij})_{i,j=1,2,\dots ,n}\) in DRSM is defined by:

$$\begin{aligned} m_{ij}= \{c\in C~|~(u_j,u_i)\not \in D_{\{c\}}\}. \end{aligned}$$

Based on \(M\), we define three non-domination functions.

Definition 19

Non-domination functions \(F^\ge \), \(F^\le \) and \(F^{\mathrm {L}}\) are defined as follows.

$$\begin{aligned} F^\ge (\tilde{c}_1,\ldots ,\tilde{c}_m)&= \bigwedge _{i,j|l_C(u_j)<l_C(u_i)}\bigvee _{c\in m_{ij}}\tilde{c}, \\ F^\le (\tilde{c}_1,\ldots ,\tilde{c}_m)&= \bigwedge _{i,j|u_C(u_j)>u_C(u_i)}\bigvee _{c\in m_{ji}}\tilde{c}, \\ F^\mathrm{L}(\tilde{c}_1,\ldots ,\tilde{c}_m)&= \bigwedge _{i\,|\, l_C(u_i)=u_C(u_i)} \left( \bigwedge _{j|l_C(u_j)<l_C(u_i)}\bigvee _{c\in m_{ij}}\tilde{c}\wedge \bigwedge _{j|u_C(u_j)>u_C(u_i)}\bigvee _{c\in m_{ji}}\tilde{c}\right) , \end{aligned}$$

where \(\tilde{c}_i\) is a Boolean variable corresponding to \(i\)th condition attribute \(c_i\).

From Lemma 6, we have the following theorem. Let \(A\subseteq C\). Remember that \(\tilde{c}^A\) is the Boolean vector whose \(i\)th element \(\tilde{c}^A_i\) is true iff \(c_i\in A\), and \(\phi _{A}\) is the term \(\bigwedge \{\tilde{c}\;|\;c\in A\}\).

Theorem 10

([31, 49, 52]) Let \(A\) be a subset of \(C\). We have the following equivalences:

  • \(A\) satisfies (DlG), i.e., (\({\mathrm{{DL}}}^{\ge }\)) if and only if \(F^\ge (\tilde{c}^A)=1\). Moreover, \(A\) is an L\(^\ge \)-reduct in DRSM if and only if \(\phi _{A}\) is a prime implicant of \(F^\ge \),

  • \(A\) satisfies (DuG), i.e., (\(\mathrm{{DL}}^\le \)) if and only if \(F^\le (\tilde{c}^A)=1\). Moreover, \(A\) is an L\(^\le \)-reduct in DRSM if and only if \(\phi _{A}\) is a prime implicant of \(F^\le \),

  • \(A\) satisfies (DLG), i.e., (DL) if and only if \(F^{\mathrm {L}}(\tilde{c}^A)=1\). Moreover, \(A\) is an L-reduct in DRSM if and only if \(\phi _{A}\) is a prime implicant of \(F^{\mathrm {L}}\).

From Theorem 10, all L\(^\ge \)-, L\(^\le \)- and L-reducts can be obtained as all prime implicants of Boolean functions \(F^\ge \), \(F^\le \) and \(F^\mathrm{L}\), respectively.

The proposed non-domination matrix has an advantage over the previous ones: we need to calculate neither the lower and upper approximations nor the boundary regions of the unions, but only the lower bounds \(l_C\) and the upper bounds \(u_C\) of all objects. Namely, the computation of the proposed approach is independent of the number of decision classes.

Example 13

Remember the decision table \(\mathbb {D} = (U,C\cup \{d\},\{V_a\})\) given in Table 7.7. In Table 7.8, we show \(\mathbb {D}\) again with the lower bounds \(l_C\) and the upper bounds \(u_C\) of the generalized decisions of the objects, which appear in the rightmost two columns of the table. To obtain \(l_C(u_i)\) and \(u_C(u_i)\), we search for the minimum class in \(D_C^+(u_i)\) and the maximum class in \(D_C^-(u_i)\), respectively.

Table 7.8 The decision table in Table 7.7 with the generalized decision function
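The bounds can be computed along the lines of the following sketch, where decision classes are coded as integers increasing with preference, and `value` and `decision` are assumed accessor functions (our names, not notation of the chapter).

```python
# l_C(x) = minimum class in D_C^+(x); u_C(x) = maximum class in D_C^-(x).
def generalized_decision_bounds(objects, attributes, value, decision):
    l, u = {}, {}
    for x in objects:
        dominating = [y for y in objects        # D_C^+(x)
                      if all(value(y, c) >= value(x, c) for c in attributes)]
        dominated = [y for y in objects         # D_C^-(x)
                     if all(value(y, c) <= value(x, c) for c in attributes)]
        l[x] = min(decision(y) for y in dominating)
        u[x] = max(decision(y) for y in dominated)
    return l, u
```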

The non-domination matrix \(M\) is obtained as follows.

$$\begin{array}{l|ccccccc}
 & u_1 & u_2 & u_3 & u_4 & u_5 & u_6 & u_7 \\ \hline
u_1^* & \emptyset & \{\text{Li}\} & \{\text{Ma},\text{Li}\} & \{\text{Ma},\text{Ph}\} & C & C & C \\
u_2 & \emptyset & \emptyset & \{\text{Ma}\} & \{\text{Ma},\text{Ph}\} & \{\text{Ma},\text{Ph}\} & C & C \\
u_3 & \emptyset & \emptyset & \emptyset & \{\text{Ma},\text{Ph}\} & \{\text{Ph}\} & \{\text{Ph},\text{Li}\} & C \\
u_4^* & \emptyset & \{\text{Li}\} & \{\text{Li}\} & \emptyset & \{\text{Ph},\text{Li}\} & \{\text{Ph},\text{Li}\} & \{\text{Ph},\text{Li}\} \\
u_5 & \emptyset & \emptyset & \emptyset & \{\text{Ma}\} & \emptyset & \{\text{Li}\} & \{\text{Ma},\text{Li}\} \\
u_6 & \emptyset & \emptyset & \emptyset & \{\text{Ma}\} & \emptyset & \emptyset & \{\text{Ma}\} \\
u_7^* & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset \\
\end{array}$$

For example, the entry in row \(u_1\) and column \(u_3\) of \(M\) contains Ma and Li, because \(u_3\) is worse than \(u_1\) with respect to Ma and Li but not with respect to Ph. The symbol \(C\) at some entries stands for {Ma, Ph, Li}. The rows marked with \(*\) indicate objects \(u_i\) such that \(l_C(u_i)=u_C(u_i)\).

The Boolean function \(F^{\ge }\) is obtained from \(M\) as

$$\begin{aligned} F^{\ge }(\tilde{\text {Ma}},\tilde{\text {Ph}},\tilde{\text {Li}})=\bigwedge _{i=1,\;j=2,3,\dots ,7}\bigvee _{c\in m_{ij}}\tilde{c}\wedge \bigwedge _{i=2,3,4,\;j=5,6,7}\bigvee _{c\in m_{ij}}\tilde{c}=\tilde{\text {Ph}}\wedge \tilde{\text {Li}}. \end{aligned}$$

From the last equation, \(F^{\ge }(\tilde{\text {Ma}},\tilde{\text {Ph}},\tilde{\text {Li}})=\mathrm {true}\) only when \(\tilde{\text {Ph}}=\mathrm {true}\) and \(\tilde{\text {Li}}=\mathrm {true}\). By Theorem 10, this implies that only {Ma, Ph, Li} and {Ph, Li} satisfy (\({\mathrm{{DL}}}^{\ge }\)). An L\(^\ge \)-reduct is a minimal set of condition attributes satisfying (\({\mathrm{{DL}}}^{\ge }\)); therefore, {Ph, Li} is the unique L\(^\ge \)-reduct. Moreover, it corresponds to the unique prime implicant of \(F^{\ge }\), i.e., \(\tilde{\text {Ph}}\wedge \tilde{\text {Li}}\).

Similarly, Boolean functions \(F^{\le }\) and \(F^{\mathrm {L}}\) are

$$\begin{aligned} F^{\le }(\tilde{\text {Ma}},\tilde{\text {Ph}},\tilde{\text {Li}})&=\tilde{\text {Ma}}\wedge \tilde{\text {Ph}}, \\ F^{\mathrm {L}}(\tilde{\text {Ma}},\tilde{\text {Ph}},\tilde{\text {Li}})&=\tilde{\text {Ma}}\wedge \tilde{\text {Li}}. \end{aligned}$$

Consequently, we obtain {Ph, Li} as the unique L\(^\ge \)-reduct, {Ma, Ph} as the unique L\(^{\le }\)-reduct, and {Ma, Li} as the unique L-reduct. Moreover, {Ph, Li} \(\cup \) {Ma, Ph} \(=\) {Ma, Ph, Li} \(= C\) is the unique U-reduct, {Ph, Li} \(\cup \) {Ma, Li} \(=\) {Ma, Ph, Li} \(= C\) is the unique LL\(^\ge \)-reduct, and {Ma, Ph} \(\cup \) {Ma, Li} \(=\) {Ma, Ph, Li} \(= C\) is the unique LL\(^\le \)-reduct.

5 Concluding Remarks

In this chapter, we have studied structure-based attribute reduction as a rough set approach to the attribute selection/reduction problem, and we have proposed several concepts of structure-based reducts. In the rough set model, there are two different types of reducts, U-reducts and L-reducts. U-reducts preserve the generalized decisions \(\partial _C(u)\) of all objects \(u\in U\), while L-reducts do so only for the certainly classified objects \(u\), namely those with \(|\partial _C(u)|=1\). The authors studied a refinement of the hierarchy of structure-based reducts (Fig. 7.1) by interpolating reducts which preserve the generalized decisions of all objects \(u\) with \(|\partial _C(u)|\le k\) [24]. The parameter \(k\) provides a trade-off between the size of a reduct and the preserved information.

In VPRSM, because approximations may not be monotone with respect to the set inclusion of condition attributes, the classifications of some objects may become more precise when condition attributes are removed. From that viewpoint, the authors have proposed enhancing reducts [21], which do not merely preserve the classification obtained from all condition attributes but make it more precise.

Attribute reduction has also been studied in other extensions of the rough set model, e.g., tolerance-based RSM [44], RSM for decision tables with missing values [29, 30], Bayesian RSM [47], fuzzy RSM [27, 28], and variable precision DRSM [26]. In general, however, extensions of the rough set model lose some important properties of the approximations. Therefore, in such models, reducts may not be representable by Boolean functions.

When a measure \(\gamma \) (e.g. Eq. 7.8) representing a part of the consistency of a rough set model is given, we can define approximate measure-based reducts as follows: \(A\subseteq C\) is an approximate reduct if \(\gamma _A\ge (1-\varepsilon )\gamma _C\) or \(\gamma _A\ge \gamma _C-\varepsilon \) for a small \(\varepsilon \ge 0\). Several measures for approximate measure-based reducts have been proposed, e.g. measures based on the number of discerned object pairs or on information entropy [45, 46, 51]. Compared with structure-based reducts, the approximate measure-based approach can more easily control the size of reducts, but we cannot predict which parts of the structure of the rough set model deteriorate by the reduction.
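For illustration, the first of these conditions, combined with the usual minimality requirement, can be tested along the following lines; `gamma` is an assumed callable evaluating the measure on an attribute subset, and the sketch is ours.

```python
# Tests the relative criterion gamma_A >= (1 - eps) * gamma_C; minimality
# means dropping any single attribute of A violates the criterion (checking
# single deletions suffices when gamma is monotone w.r.t. set inclusion).
def is_approximate_reduct(A, C, gamma, eps=0.05):
    target = (1 - eps) * gamma(C)
    return gamma(A) >= target and \
           all(gamma(A - {a}) < target for a in A)
```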

We have shown that reducts are (exactly or approximately) represented by prime implicants of Boolean functions (or of pairs of Boolean functions). To compute all reducts of a particular type, we solve the dualization problem (more precisely, positive DNF (or CNF) dualization) for the corresponding Boolean function. This problem is not known to be solvable in polynomial time, but it can be solved in quasi-polynomial time with respect to the combined size of the input and the output [9].

To apply the attribute reduction of this chapter to real-world data sets, we note the following three points. Firstly, we need additional measures to select the best reducts for an application, for example, the size of the reduct or the number of equivalence classes induced by the reduct [1, 13, 27, 42, 48, 50]. Such optimization problems generally cannot be solved in polynomial time. Therefore, there are heuristic methods that compute one or several near-optimal reducts [1, 27, 42, 48]. This does not mean that the Boolean functions studied in this chapter are useless for applications: they can be incorporated into heuristic methods.

Secondly, when data sets include numerical or continuous attribute values, the approach of this chapter does not work well, because neither the order of values nor the degree of difference between values is taken into account (except for criteria in DRSM). There are two approaches to overcoming this drawback. One is discretization [7, 15], where the domain of a numerical attribute is partitioned into a small number of intervals; a toy sketch follows this paragraph. After discretization, we can apply attribute reduction to the data set without modification. The other is to use a similarity relation [12, 28] instead of the indiscernibility relation, or a fuzzy partition [12, 22, 27] instead of the equivalence classes, and to define extensions of RSM. In that case, we can define structure-based reducts for the extended RSMs in the same way as in this chapter.
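As a toy illustration of the first approach (the bin count and names are ours, not from [7, 15]), an equal-width discretization can be as simple as:

```python
# Maps numerical values to bin indices 0..bins-1 of equal width; after such
# preprocessing, the attribute-reduction machinery above applies unchanged.
def discretize(values, bins=3):
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # guard against a constant attribute
    return [min(int((v - lo) / width), bins - 1) for v in values]
```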

Thirdly, reducts could suffer from overfitting because of the rigid definitions of their preserving conditions. One technique to avoid overfitting is dynamic reducts [1], where decision tables restricted to randomly chosen object subsets of a given cardinality are generated repeatedly, and the reducts that appear in more of these subtables than a given threshold are selected as dynamic reducts; a rough sketch is given below.
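The sketch uses illustrative sampling parameters and an assumed routine `reducts_of` returning all reducts of a subtable; it is not the algorithm of [1] verbatim.

```python
import random
from collections import Counter

# Dynamic-reduct style stability filter: sample subtables, collect their
# reducts, and keep the reducts whose frequency exceeds the threshold.
def dynamic_reducts(objects, reducts_of, subtable_size,
                    trials=100, threshold=0.5):
    counts = Counter()
    for _ in range(trials):
        subtable = random.sample(objects, subtable_size)
        for r in reducts_of(subtable):
            counts[frozenset(r)] += 1
    return [set(r) for r, hits in counts.items() if hits / trials >= threshold]
```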

In this chapter, we did not discuss algorithms for computing reducts or numerical experiments; they can be found in [5, 6, 8, 11, 13, 17, 19, 34, 41, 50, 51]. The references show how to select a desirable reduct or find an optimal one, and how to use the selected reduct for building classifiers; they also report experimental results on benchmark and real-world data sets. Although the references do not cover some types of reducts in this chapter, especially most types of reducts in VPRSM, their results suggest that the proposed reducts would be useful in applications.

The proofs of the theoretical results of this chapter are not difficult; parts of them can be found in our papers [20, 23, 25, 31].