1 Introduction

In this paper, we consider one more extension of the notion of a decision table: a decision table with many-valued decisions. In a table with many-valued decisions, each row is labeled with a nonempty finite set of decisions, and for a given row, we should find a decision from the set of decisions attached to this row.

Such tables arise in problems of discrete optimization, pattern recognition, computational geometry, decision making, etc. [10, 17]. However, the main sources of decision tables with many-valued decisions are datasets filled with statistical or experimental data. In such datasets, we often have groups of objects with equal values of conditional attributes but, possibly, different values of the decision attribute. Instead of a group of objects, we can consider one object given by the values of the conditional attributes. We attach to this object a set of decisions: either all decisions for objects from the group, or the k most frequent decisions for objects from the group, etc. As a result, we obtain a decision table with many-valued decisions. In real-life applications, we encounter multi-label data when we study, e.g., the problems of semantic annotation of images [4], music categorization into emotions [35], functional genomics [3], and text categorization [36].

In rough set theory [22, 30, 31], decision tables that have equal rows labeled with different decisions are often considered. The set of decisions attached to equal rows is called the generalized decision for those rows [23–25]. There, the aim is to find the generalized decision for a given row; in this paper, we will call this approach the generalized decision approach. However, the problem of finding an arbitrary decision, or one of the most frequent decisions, from the generalized decision is also interesting. Such a study of decision tables with many-valued decisions can give a new tool for rough set theory. In [2] and [18] we considered the problems of constructing tests (super-reducts) and decision trees for decision tables with many-valued decisions. To choose attributes, we used an uncertainty measure equal to the number of boundary subtables.

A decision table with many-valued decisions can be considered as a decision table with incomplete information, because we do not know which decision should be chosen from the set of decisions. Incomplete information also arises in decision tables where, instead of a single value of a conditional attribute, we have a subset of values of the attribute domain. In [13, 14], approaches to interpreting queries in a database with such incomplete information were discussed. Z. Pawlak [22] and E. Orłowska [21] proposed Non-deterministic Information Systems for dealing with incomplete information. Information incompleteness is also connected with missing values of attributes or intervals of attribute values. M. Kryszkiewicz [11] proposed a method for computing all optimal generalized rules from a decision table with missing values. In [27–29], the authors proposed a rule generation system, based on the Apriori algorithm, in which incomplete information was treated as nondeterministic information.

In the literature, problems connected with multi-label data are often considered from the point of view of classification (multi-label classification problems) [7, 8, 15, 19, 33, 34, 37]. Here, our aim is not to deal with classification but to show that the proposed approach to the construction of decision rules for decision tables with many-valued decisions can be useful for knowledge representation. In various applications, we often deal with decision tables which contain noisy data. In this case, exact rules can be “over-fitted”, i.e., depend essentially on the noise. So, instead of exact rules with many attributes, it is more appropriate to work with approximate rules with a smaller number of attributes. Besides, classifiers based on approximate decision rules often have better accuracy than classifiers based on exact decision rules.

In the proposed approach, a greedy algorithm constructs \(\alpha \)-decision rules (\(\alpha \) is a degree of rule uncertainty), and the number of rules constructed for a given row is equal to the cardinality of the set of decisions attached to this row. Then, for each row of a decision table, we choose a rule with the minimum length. The choice of shorter rules is connected with the Minimum Description Length principle [26].

The problem of constructing rules with minimum length is NP-hard. Therefore, we consider an approximate polynomial algorithm for rule optimization. Based on results of U. Feige [9], it was proved in [16] for decision tables with one-valued decisions that, under some natural assumptions on the class NP, the greedy algorithm is close to the best polynomial approximate algorithms for partial decision rule minimization. It is natural to use these results in our approach. Note that each decision table with one-valued decisions can also be interpreted as a decision table where each row is labeled with a set of decisions containing one element.

The paper, extending a conference publication [6] and some results presented in [17], is devoted to the study of a greedy algorithm for construction of approximate decision rules for decision tables with many-valued decisions. The greedy algorithm for rule construction has polynomial time complexity for the whole set of decision tables with many-valued decisions.

We also discuss a problem of recognition of labels of points in the plane, which illustrates the considered approach and the obtained bounds on the precision of this algorithm relative to the length of rules.

In this paper, we study only binary decision tables with many-valued decisions. However, the obtained results can be extended to decision tables filled with numbers from the set \(\{0, \ldots , k-1\}\), where \(k \ge 3\). We present experimental results based on data sets from the UCI Machine Learning Repository [12], modified (by removal of some conditional attributes) into the form of decision tables with many-valued decisions. The experiments are connected with the length of \(\alpha \)-decision rules, the number of different rules, lower and upper bounds on the minimum length of \(\alpha \)-decision rules, and the 0.5-hypothesis for \(\alpha \)-decision rules. We also present experimental results for the generalized decision approach. This allows us to make a comparative study of the length and the number of different rules for the proposed approach and the generalized decision approach.

The paper consists of eight sections. In Sect. 2, the main notions are discussed. In Sect. 3, a parameter M(T) and an auxiliary statement are presented. This parameter is used for the analysis of the greedy algorithm. Section 4 is devoted to the consideration of a set cover problem. In Sect. 5, the greedy algorithm for construction of approximate decision rules is studied. In this section, we also present lower and upper bounds on the minimum rule length, based on information obtained during the work of the greedy algorithm, and the 0.5-hypothesis for tables with many-valued decisions. In Sect. 6, we discuss the problem of recognition of labels of points in the plane. In Sect. 7, experimental results are presented. Section 8 contains conclusions.

2 Main Definitions

In this section, we consider definitions corresponding to decision tables with many-valued decisions.

A (binary) decision table with many-valued decisions is a rectangular table T filled with numbers from the set \(\{0,1\}\). Columns of this table are labeled with attributes \(f_{1},\ldots ,f_{n}\). Rows of the table are pairwise different, and each row is labeled with a nonempty finite set of natural numbers (a set of decisions). Note that each decision table with one-valued decisions can also be interpreted as a decision table with many-valued decisions: in such a table, each row is labeled with a set of decisions containing one element. An example of a decision table with many-valued decisions, \(T_0\), is presented in Table 1.

Table 1. Decision table \(T_0\) with many-valued decisions

We will say that T is a degenerate table if either T is empty (has no rows), or the intersection of sets of decisions attached to rows of T is nonempty.

A decision which belongs to the maximum number of sets of decisions attached to rows in T is called the most common decision for T. If we have more than one such decision, we choose the minimum one. If T is empty then 1 is the most common decision for T.

Let \(r = (b_1,\ldots ,b_n)\) be a row of T labeled with a set of decisions D(r), and let \(d\in D(r)\). By \(U(T,r,d)\) we denote the set of rows \(r^\prime \) of T for which \(d \notin D(r^\prime )\). We will say that an attribute \(f_i \) separates a row \(r^\prime \in U(T,r,d)\) from the row r if the rows r and \(r^\prime \) have different values at the intersection with the column \(f_i\). The pair \((T,r)\) will be called a decision rule problem.

Let \(\alpha \) be a real number such that \(0\le \alpha < 1\). A decision rule

$$\begin{aligned} f_{i_{1}}=b_{1}\wedge \ldots \wedge f_{i_{m}}=b_{m}\rightarrow d \end{aligned}$$
(1)

is called an \(\alpha \)-decision rule for the pair \((T,r)\) and decision \(d\in D(r)\) if the attributes \(f_{i_{1}},\ldots ,f_{i_{m}}\) separate from r at least \((1-\alpha )|U(T,r,d)|\) rows from \(U(T,r,d)\). The number m is called the length of the rule (1). For example, a 0.01-decision rule means that the attributes contained in the rule should separate from the row r at least \(99\,\%\) of the rows from \(U(T,r,d)\). If \(\alpha \) is equal to 0, we have an exact decision rule (0-decision rule) for \((T,r)\) and d. If \(U(T,r,d)=\emptyset \), then for any \(f_{i_{1}},\ldots ,f_{i_{m}}\in \{f_{1},\ldots ,f_{n}\}\) the rule (1) is an \(\alpha \)-decision rule for \((T,r)\) and d; in particular, the rule (1) with empty left-hand side (when \(m=0\)) is an \(\alpha \)-decision rule for \((T,r)\) and d if \(U(T,r,d)=\emptyset \).

We will say that a decision rule is an \(\alpha \)-decision rule for the pair \((T,r)\) if this rule is an \(\alpha \)-decision rule for the pair \((T,r)\) and some decision \(d\in D(r)\). We denote by \(L_{\mathrm {min}}(\alpha ,T,r,d)\) the minimum length of an \(\alpha \)-decision rule for the pair \((T,r)\) and decision \(d\in D(r)\), and by \(L_{\mathrm {min}}(\alpha ,T,r)\) the minimum length of an \(\alpha \)-decision rule for the pair \((T,r)\). It is clear that

$$L_{\mathrm {min}}(\alpha ,T,r)=\min \{L_{\mathrm {min}}(\alpha ,T,r,d):d\in D(r)\}.$$

Let \(\alpha \), \(\beta \) be real numbers such that \(0\le \alpha \le \beta <1 \). One can show that \(L_{\mathrm {min}}(\alpha ,T,r,d)\ge L_{\mathrm {min}}(\beta ,T,r,d)\) and \(L_{\mathrm {min}}(\alpha ,T,r)\ge L_{\mathrm {min}}(\beta ,T,r)\).
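The notions above translate directly into code. The following minimal Python sketch (the representation and all names are ours, introduced only for illustration; the experiments in Sect. 7 were performed with the DAGGER tool written in C++) represents a row as a pair of a value tuple and a decision set, and checks whether a set of attributes yields an \(\alpha \)-decision rule for \((T,r)\) and d.

```python
# A table T is a list of rows; each row is a pair (values, decisions),
# where values is a tuple over {0, 1} and decisions is a nonempty frozenset.

def U(table, r, d):
    """U(T, r, d): the rows r' of the table with d not in D(r')."""
    return [row for row in table if d not in row[1]]

def is_alpha_rule(attrs, r, d, table, alpha):
    """Check whether f_{i_1} = b_{i_1} /\ ... /\ f_{i_m} = b_{i_m} -> d,
    with attrs = {i_1, ..., i_m} given as 0-based column indices and the
    b_i taken from r, is an alpha-decision rule for (T, r) and d."""
    u = U(table, r, d)
    separated = [row for row in u
                 if any(row[0][i] != r[0][i] for i in attrs)]
    return len(separated) >= (1 - alpha) * len(u)
```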

3 Parameter M(T)

In this section, we consider the definition of a parameter M(T) and an auxiliary statement from [17]. For completeness, we give this statement with a proof. We will use the parameter M(T) to evaluate the precision of the greedy algorithm relative to the length of rules.

Let T be a decision table with many-valued decisions, which has n columns labeled with attributes \(\{f_1,\ldots ,f_n\}\).

Now, we define the parameter M(T) of the table T. If T is a degenerate table then \(M(T)=0\). Let now T be a nondegenerate table. Let

$$\bar{\delta }=(\delta _1,\ldots ,\delta _n)\in \{0,1\}^n.$$

Then \(M(T,\bar{\delta })\) is the minimum natural m such that there exist attributes \(f_{i_{1}},\ldots ,f_{i_{m}}\in \{f_1,\ldots ,f_n\}\) for which \(T(f_{i_{1}},\delta _{i_{1}})\ldots (f_{i_{m}},\delta _{i_{m}})\) is a degenerate table. Here \(T(f_{i_{1}},\delta _{i_{1}})\ldots (f_{i_{m}},\delta _{i_{m}})\) is the subtable of T consisting only of rows that have numbers \(\delta _{i_{1}},\ldots ,\delta _{i_{m}}\) at the intersection with the columns \(f_{i_{1}},\ldots ,f_{i_{m}}\). We denote

$$M(T)=\max \{M(T,\bar{\delta }):\bar{\delta }\in \{0,1\}^n\}.$$
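For small tables, \(M(T,\bar{\delta })\) can be computed directly from the definition by trying attribute subsets in order of increasing cardinality. The following brute-force sketch (exponential in n, so suitable for illustration only; it reuses the row representation from Sect. 2) makes the definition concrete. Note that fixing all n attributes leaves at most one row, since rows are pairwise different, so the search always terminates.

```python
from itertools import combinations, product

def is_degenerate(table):
    """T is degenerate if it is empty or the intersection of the sets of
    decisions attached to its rows is nonempty."""
    return not table or bool(set.intersection(*(set(row[1]) for row in table)))

def M_delta(table, delta):
    """M(T, delta-bar): the minimum m such that fixing some m attributes to
    the corresponding components of delta-bar yields a degenerate subtable."""
    n = len(delta)
    if is_degenerate(table):
        return 0
    for m in range(1, n + 1):
        for attrs in combinations(range(n), m):
            sub = [row for row in table
                   if all(row[0][i] == delta[i] for i in attrs)]
            if is_degenerate(sub):
                return m

def M(table):
    """M(T): maximum of M(T, delta-bar) over all delta-bar in {0, 1}^n.
    Assumes a nonempty table."""
    n = len(table[0][0])
    return max(M_delta(table, delta) for delta in product((0, 1), repeat=n))
```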

Lemma 1

Let T be a nondegenerate decision table with many-valued decisions which has n columns labeled with attributes \(f_{1},\ldots ,f_{n}\), let \(\bar{\delta }=(\delta _{1},\ldots ,\delta _{n})\in \{0,1\}^{n}\), and let \(\bar{\delta }\) be a row of T. Then

$$L_{\mathrm {min}}(0,T,\bar{\delta })\le M(T,\bar{\delta })\le M(T).$$

Proof

By definition, \(M(T,\bar{\delta })\) is the minimum natural m such that there exist attributes \(f_{i_{1}},\ldots ,f_{i_{m}}\in \{f_{1},\ldots ,f_{n}\}\) for which subtable

$$T^{\prime }=T(f_{i_{1}},\delta _{i_{1}})\ldots (f_{i_{m}},\delta _{i_{m}})$$

is a degenerate table. The subtable \(T^{\prime }\) is nonempty since \(\bar{\delta }\) is a row of this subtable. Therefore there is a decision d which, for each row of \(T^{\prime }\), belongs to the set of decisions attached to this row.

One can show that a decision rule

$$ f_{i_{1}}=\delta _{i_{1}}\wedge \ldots \wedge f_{i_{m}}=\delta _{i_{m}}\rightarrow d $$

is a 0-decision rule for the pair \((T,\bar{\delta })\) and decision d. Therefore \(L_{\mathrm {min}}(0,T,\bar{\delta })\le m = M(T,\bar{\delta })\). By definition, \(M(T,\bar{\delta })\le M(T)\).    \(\square \)

4 Set Cover Problem

In this section, we consider a greedy algorithm for construction of an approximate cover (an \(\alpha \)-cover).

Let \(\alpha \) be a real number such that \(0\le \alpha <1\). Let A be a set containing \(N>0\) elements, and let \(F=\{S_{1},\ldots ,S_{p}\}\) be a family of subsets of the set A such that \(A=\bigcup _{i=1}^{p}S_{i}\). We will refer to the pair \((A,F)\) as a set cover problem. A subfamily \(\{S_{i_{1}},\ldots ,S_{i_{t}}\}\) of the family F will be called an \(\alpha \)-cover for \((A,F)\) if \(|\bigcup _{j=1}^{t}S_{i_{j}}|\ge (1-\alpha )|A|\). The problem of searching for an \(\alpha \)-cover with minimum cardinality for a given set cover problem \((A,F)\) is NP-hard [20, 32].

We now consider a greedy algorithm for construction of an \(\alpha \)-cover (see Algorithm 1). At each step, this algorithm chooses a subset from F which covers the maximum number of uncovered elements of A. The algorithm stops when the constructed subfamily is an \(\alpha \)-cover for \((A,F)\).

Algorithm 1. Greedy algorithm for construction of an \(\alpha \)-cover
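Since Algorithm 1 is given as pseudocode, we add a short Python rendering of the same greedy procedure (a sketch; the function name and data layout are ours).

```python
def greedy_alpha_cover(A, F, alpha):
    """Greedy construction of an alpha-cover for the set cover problem (A, F):
    A is a set, F is a list of subsets of A whose union equals A, and
    0 <= alpha < 1.  Returns the indices of the chosen subsets."""
    need = (1 - alpha) * len(A)   # an alpha-cover covers >= (1-alpha)|A| elements
    covered, chosen = set(), []
    while len(covered) < need:
        # choose a subset covering the maximum number of uncovered elements
        i = max(range(len(F)), key=lambda j: len(F[j] - covered))
        chosen.append(i)
        covered |= F[i]
    return chosen
```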

We denote by \(C_{\mathrm {greedy}}(\alpha ,A,F)\) the cardinality of the constructed \(\alpha \)-cover for \((A,F)\), and by \(C_{\mathrm {min}}(\alpha ,A,F)\) the minimum cardinality of an \(\alpha \)-cover for \((A,F)\).

The following statement was obtained by J. Cheriyan and R. Ravi in [5]. We present it with our own proof.

Theorem 1

Let \(0<\alpha <1\) and let \((A,F)\) be a set cover problem. Then

$$ C_{\mathrm {greedy}}(\alpha ,A,F)\le C_{\mathrm {min}}(0,A,F)\ln (1/\alpha )+1. $$

Proof

We denote \(m=C_{\mathrm {min}}(0,A,F)\). If \(m=1\) then, as it is not difficult to show, \(C_{\mathrm {greedy}}(\alpha ,A,F)=1\) and the considered inequality holds. Let \(m\ge 2\) and \(S_{i}\) be a subset of maximum cardinality in F. It is clear that \(|S_{i}|\ge N/m\). So, after the first step we will have at most \(N-N/m=N(1-1/m)\) uncovered elements in the set A. After the first step we have the following set cover problem: the set \(A\setminus S_{i}\) and the family \(\{S_{1}\setminus S_{i},\ldots ,S_{p}\setminus S_{i}\}\). For this problem, the minimum cardinality of a cover is at most m. So, after the second step, when we choose a set \(S_{j}\setminus S_{i}\) with maximum cardinality, the number of uncovered elements in the set A will be at most \(N(1-1/m)^{2}\), etc.

Let the greedy algorithm make g steps in the process of \(\alpha \)-cover construction, thus constructing an \(\alpha \)-cover of cardinality g. Then, after step number \(g-1\), more than \(\alpha N\) elements of A are uncovered. Therefore \(N(1-1/m)^{g-1}>\alpha N\) and \(1/\alpha >(1+1/(m-1))^{g-1}\). Taking the natural logarithm of both sides of this inequality, we obtain \(\ln (1/\alpha ) >(g-1)\ln (1+1/(m-1))\). It is known that for any natural p, the inequality \(\ln (1+1/p)>1/(p+1)\) holds. Therefore \(\ln (1/\alpha )>(g-1)/m\) and \(g<m\ln (1/\alpha )+1\). Since \(m=C_{\mathrm {min}}(0,A,F)\) and \(g=C_{\mathrm {greedy}}(\alpha ,A,F)\), we have \(C_{\mathrm {greedy}}(\alpha ,A,F)< C_{\mathrm {min}}(0,A,F)\ln (1/\alpha )+1\).    \(\square \)

5 Greedy Algorithm for \(\alpha \)-Decision Rule Construction

In this section, we present a greedy algorithm for \(\alpha \)-decision rule construction, lower and upper bounds on the minimum length of \(\alpha \)-decision rules (Sect. 5.1), and the 0.5-hypothesis connected with the work of the greedy algorithm (Sect. 5.2).

We use the greedy algorithm for construction of \(\alpha \)-covers to construct \(\alpha \)-decision rules. Let T be a table with many-valued decisions containing n columns labeled with attributes \(f_{1},\ldots ,f_{n} \), \(r=(b_{1},\ldots ,b_{n})\) be a row of T, D(r) be the set of decisions attached to r, \(d\in D(r)\), and \(\alpha \) be a real number such that \(0<\alpha <1\).

We consider a set cover problem \((A(T,r,d), F(T,r,d))\), where \(A(T,r,d)=U(T,r,d)\) is the set of all rows \(r^{\prime }\) of T such that \(d\notin D(r^{\prime })\), and \(F(T,r,d)=\{S_{1},\ldots ,S_{n}\}\). For \(i=1,\ldots ,n\), the set \(S_{i}\) coincides with the set of all rows from \(A(T,r,d)\) which differ from r in the column \(f_{i}\). One can show that the decision rule

$$ f_{i_{1}}=b_{i_{1}}\wedge \ldots \wedge f_{i_{m}}=b_{i_{m}}\rightarrow d $$

is an \(\alpha \)-decision rule for \((T,r)\) and decision \(d\in D(r)\) if and only if \(\{S_{i_{1}},\ldots ,S_{i_{m}}\}\) is an \(\alpha \)-cover for the set cover problem \((A(T,r,d), F(T,r,d))\). Evidently, for the considered set cover problem, \(C_{\mathrm {min}}(0,A(T,r,d),F(T,r,d))=L_{\mathrm {min}}(0,T,r,d)\), where \(L_{\mathrm {min}}(0,T,r,d)\) is the minimum length of a 0-decision rule for \((T,r)\) and decision \(d\in D(r)\).

Let us apply the greedy algorithm (see Algorithm 1) to the considered set cover problem. This algorithm constructs an \(\alpha \)-cover which corresponds to an \(\alpha \)-decision rule \(rule(\alpha ,T,r,d)\) for \((T,r)\) and decision \(d\in D(r)\). From Theorem 1 it follows that the length of this rule is at most

$$ L_{\mathrm {min}}(0,T,r,d)\ln (1/\alpha )+1. $$

We denote by \(L_{\mathrm {greedy}}(\alpha ,T,r)\) the length of the rule constructed by the following polynomial time algorithm: for a given \(\alpha \), \(0<\alpha <1\), decision table T, row r of T, and decision \(d\in D(r)\), we construct the set cover problem \((A(T,r,d), F(T,r,d))\) and then apply to this problem the greedy algorithm for construction of an \(\alpha \)-cover. We transform the obtained \(\alpha \)-cover into an \(\alpha \)-decision rule \(rule(\alpha ,T,r,d)\). Among the \(\alpha \)-decision rules \(rule(\alpha ,T,r,d)\), \(d\in D(r)\), we choose a rule with the minimum length. This rule is the output of the considered algorithm. Recall that \(L_{\mathrm {min}}(\alpha ,T,r)\) denotes the minimum length of an \(\alpha \)-decision rule for \((T,r)\). From what has been said above, we obtain the following statement.

Theorem 2

Let T be a nondegenerate decision table with many-valued decisions, r be a row of T, and \(\alpha \) be a real number such that \(0<\alpha <1\). Then

$$ L_{\mathrm {greedy}}(\alpha ,T,r)\le L_{\mathrm {min}}(0,T,r)\ln (1/\alpha )+1. $$

Note that the considered algorithm is a generalization of an algorithm studied in [16].

Example 1

Let us apply the considered greedy algorithm with \(\alpha = 0.1\) to the decision table \(T_0\) (see Table 1) and the second row \(r_2\) of this table.

For each \(d\in D(r_2)=\{1,3\}\), we construct the set cover problem \((A(T,r_2,d), F(T,r_2,d))\), where \(A(T,r_2,d)\) is the set of all rows \(r^\prime \) of T such that \(d\notin D(r^\prime )\), \(F(T,r_2,d)=\{S_1,S_2,S_3\}\), and \(S_i\) coincides with the set of rows from \(A(T,r_2,d)\) which differ from \(r_2\) in the column \(f_i\), \(i=1,2,3\). We have:

  • \(A(T,r_2,1)=\{r_3,r_4\}\), \(F(T,r_2,1)=\{S_1=\{r_3\}\), \(S_2=\{r_4\}\), \( S_3=\{r_4\}\}\),

  • \(A(T,r_2,3)=\{r_1,r_3,r_5\}\), \(F(T,r_2,3)=\{S_1=\{r_1,r_3,r_5\}\), \(S_2=\{r_5\}\), \(S_3=\{r_1\}\}\).

Now, we apply the greedy algorithm for the set cover problem (with \(\alpha =0.1\)) to each of the constructed set cover problems, and transform the obtained 0.1-covers into 0.1-decision rules.

For the case \(d=1\), we obtain the 0.1-cover \(\{S_1,S_2\}\) and corresponding 0.1-decision rule \(f_1=0\wedge f_2=1\rightarrow 1\).

For the case \(d=3\), we obtain the 0.1-cover \(\{S_1\}\) and the corresponding 0.1-decision rule \(f_1=0\rightarrow 3\). We choose the shortest rule, \(f_1=0\rightarrow 3\), which is the output of our algorithm.
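The computation of Example 1 can be reproduced with a few lines built on top of the greedy_alpha_cover sketch from Sect. 4. Since Table 1 is not reproduced here, the row values and the decision sets of the rows other than \(r_2\) below are hypothetical, chosen only to be consistent with the sets \(A(T,r_2,d)\) and \(S_i\) listed above.

```python
# Hypothetical data consistent with Example 1 (Table 1 itself is not shown).
T0 = [
    ((1, 1, 1), frozenset({1})),     # r1
    ((0, 1, 0), frozenset({1, 3})),  # r2
    ((1, 1, 0), frozenset({2})),     # r3
    ((0, 0, 1), frozenset({3})),     # r4
    ((1, 0, 0), frozenset({1, 2})),  # r5
]

def rule_for_row(table, r, alpha):
    """For each d in D(r), build (A(T,r,d), F(T,r,d)), run the greedy
    algorithm, and return the shortest resulting rule as (attrs, d).
    Column indices in attrs are 0-based, so index 0 stands for f1."""
    values, decisions = r
    best = None
    for d in sorted(decisions):
        A_rows = [row for row in table if d not in row[1]]   # U(T, r, d)
        F = [{j for j, row in enumerate(A_rows) if row[0][i] != values[i]}
             for i in range(len(values))]                    # S_1, ..., S_n
        attrs = greedy_alpha_cover(set(range(len(A_rows))), F, alpha)
        if best is None or len(attrs) < len(best[0]):
            best = (attrs, d)
    return best

print(rule_for_row(T0, T0[1], 0.1))   # -> ([0], 3), i.e. the rule f1 = 0 -> 3
```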

In order to show that the problem of minimization of \(\alpha \)-decision rule length is NP-hard, let us consider a set cover problem \((A,F)\) where \(A=\{a_{1},\ldots ,a_{N}\}\) and \(F=\{S_{1},\ldots ,S_{m}\}\). We define a decision table \(T(A,F)\); this table has m columns, corresponding to the sets \(S_{1},\ldots ,S_{m}\) respectively, and \(N+1\) rows. For \(j=1,\ldots ,N\), the j-th row corresponds to the element \(a_{j}\). The last, \((N+1)\)-th, row is filled with 0s. For \(j=1,\ldots ,N\) and \(i=1,\ldots ,m\), the value at the intersection of the j-th row and the i-th column is 1 if and only if \(a_{j}\in S_{i}\). The set of decisions corresponding to the last row is \(\{2\}\); all other rows are labeled with the set of decisions \(\{1\}\).

One can show that, for any \(\alpha \), \(0\le \alpha <1\), a subfamily \(\{S_{i_{1}},\ldots ,S_{i_{t}}\}\) is an \(\alpha \)-cover for \((A,F)\) if and only if the decision rule

$$ f_{i_{1}}=0\wedge \ldots \wedge f_{i_{t}}=0\rightarrow 2 $$

is an \(\alpha \)-decision rule for \(T(A,F)\) and the last row of \(T(A,F)\).

So, we have a polynomial time reduction of the problem of minimization of \(\alpha \)-cover cardinality to the problem of minimization of \(\alpha \)-decision rule length for decision tables with many-valued decisions. Since the first problem is NP-hard [20, 32], we have the following.

Proposition 1

For any \(\alpha \), \(0\le \alpha <1\), the problem of minimization of \(\alpha \)-decision rule length for decision tables with many-valued decisions is NP-hard.
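The reduction used above is straightforward to implement. A sketch (under the row representation used earlier; it assumes that distinct elements of A belong to distinct combinations of subsets, so that the resulting rows are pairwise different):

```python
def table_from_cover_problem(A, F):
    """Build the table T(A, F) from the NP-hardness reduction: one column per
    subset in F, one row per element of A, plus a final all-zero row."""
    rows = [(tuple(1 if a in S else 0 for S in F), frozenset({1})) for a in A]
    rows.append((tuple(0 for _ in F), frozenset({2})))   # the (N+1)-th row
    return rows
```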

5.1 Upper and Lower Bounds on \(L_{\mathrm {min}}(\alpha ,T,r)\)

In this section, we present some results connected with lower and upper bounds on the minimum length of \(\alpha \)-decision rules, based on information obtained during the work of the greedy algorithm.

Let T be a decision table with many-valued decisions, r be a row of T, and \(\alpha \) be a real number such that \(0\le \alpha <1\). We apply the greedy algorithm to T, r, \(\alpha \), and each \(d\in D(r)\) (in fact, to the corresponding set cover problem) and obtain, for every \(d\in D(r)\), an \(\alpha \)-decision rule for the pair \((T,r)\) and decision d. Among these rules we choose a rule with the minimum length, and denote its length by \(u(\alpha ,T,r)\). It is clear that

$$ L_{\mathrm {min}}(\alpha ,T,r)\le u(\alpha ,T,r). $$

Let \(d\in D(r)\). We apply the greedy algorithm to T, r, \(\alpha \), and d, and construct the \(\alpha \)-decision rule \(rule(\alpha ,T,r,d)\). Let the length of this rule be equal to t, and let \(\delta _{i}\), \(i=1,\ldots ,t\), be the number of rows from \(U(T,r,d)\) separated from the row r at the i-th step of the work of the greedy algorithm. We denote

$$ l(\alpha ,T,r,d)=\max \left\{ \left\lceil \frac{\lceil (1-\alpha )|U(T,r,d)|\rceil -(\delta _0+\ldots +\delta _i)}{\delta _{i+1}}\right\rceil : i=0,\ldots ,t-1\right\} , $$

where \(\delta _0=0\). Let us denote

$$ l(\alpha ,T,r)=\min _{d\in D(r)}l(\alpha ,T,r,d). $$

We can almost repeat the first part of the proof of Theorem 1.67 from [16] to obtain the following lower bound:

$$ L_{\mathrm {min}}(\alpha ,T,r,d)\ge l(\alpha ,T,r,d), $$

where \(L_{\mathrm {min}}(\alpha ,T,r,d)\) is the minimum length of an \(\alpha \)-decision rule for \((T,r)\) and d. From this inequality it follows that

$$ L_{\mathrm {min}}(\alpha ,T,r)\ge l(\alpha ,T,r). $$
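The bound \(l(\alpha ,T,r,d)\) is a direct function of the greedy trace \(\delta _1,\ldots ,\delta _t\), so it comes almost for free once the greedy algorithm has run. A sketch of its computation (names are ours):

```python
from math import ceil

def lower_bound(alpha, deltas, size_u):
    """l(alpha, T, r, d): deltas[i] is the number of rows of U(T, r, d)
    separated at step i+1 of the greedy algorithm; size_u = |U(T, r, d)|."""
    target = ceil((1 - alpha) * size_u)   # rows that must be separated
    covered, bound = 0, 0
    for delta in deltas:                  # delta runs over delta_1, ..., delta_t
        bound = max(bound, ceil((target - covered) / delta))
        covered += delta
    return bound
```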

5.2 0.5-Hypothesis

In the book [16], the following 0.5-hypothesis was formulated for decision tables with one-valued decisions: for the most part of decision tables, for each row r, during each step of decision rule construction the greedy algorithm chooses an attribute which separates from r at least one-half of the unseparated rows whose decisions differ from the decision attached to the row r.

Let T be a decision table with many-valued decisions and r be a row of T. We will say that the 0.5-hypothesis is true for T and r if, for any decision \(d\in D(r)\), during each step of the construction of a decision rule for the pair \((T,r)\) and decision d, the greedy algorithm chooses an attribute which separates from r at least 50 % of the unseparated rows from \(U(T,r,d)\).

We will say that the 0.5-hypothesis is true for T if it is true for each row of T.
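Whether the 0.5-hypothesis is true for a given table and row can be checked by replaying the greedy algorithm and testing each step. A sketch under the representation used earlier:

```python
def half_hypothesis_holds(table, r):
    """Check the 0.5-hypothesis for row r: for every d in D(r), each step of
    the greedy algorithm must separate at least half of the still-unseparated
    rows of U(T, r, d)."""
    values, decisions = r
    for d in decisions:
        A_rows = [row for row in table if d not in row[1]]
        F = [{j for j, row in enumerate(A_rows) if row[0][i] != values[i]}
             for i in range(len(values))]
        uncovered = set(range(len(A_rows)))
        while uncovered:
            best = max(F, key=lambda S: len(S & uncovered))
            if 2 * len(best & uncovered) < len(uncovered):
                return False
            uncovered -= best
    return True
```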

Now we consider some theoretical results regarding the 0.5-hypothesis for decision tables with many-valued decisions.

A binary information system I is a table with n rows (corresponding to objects) and m columns labeled with attributes \(f_1,\ldots ,f_m\). This table is filled with numbers from \(\{0,1\}\) (values of attributes). For \(j=1,\ldots ,n\), we denote by \(r_j\) the j-th row of the table I.

The information system I will be called strongly saturated if, for any row \(r_j=(b_1,\ldots ,b_m)\) of I, for any \(k\in \{0,\ldots ,n-1\}\) and for any k rows with numbers different from j, there exists a column \(f_i\) which has at least \(\frac{k}{2}\) numbers \(\lnot b_{i}\) (where \(b_i\) is the value of the column \(f_i\) for the row \(r_j\)) at the intersection with the considered k rows.

First, we evaluate the number of strongly saturated binary information systems. After that, we study the work of the greedy algorithm on a decision table with many-valued decisions obtained from a strongly saturated binary information system by adding a set of decisions to each row. It is clear that the 0.5-hypothesis holds for every such table.

Theorem 3

[16]. Let us consider binary information systems with n rows and \(m\ge n+\log _{2}n\) columns labeled with attributes \(f_1,\ldots ,f_m\). Then the fraction of strongly saturated information systems is at least \(1-1/2^{m-n-\log _{2}n+1}\).

For example, if \(m\ge n+\log _{2}n+6\), then at least \(99\,\%\) of binary information systems are strongly saturated.

Let us consider the work of the greedy algorithm on an arbitrary decision table T with many-valued decisions obtained from a strongly saturated binary information system. Let r be an arbitrary row of the table T and \(d\in D(r)\). For \(i=1,2,\ldots \), after step number i at most \(|U(T,r,d)|/2^{i}\) rows from \(U(T,r,d)\) are unseparated from r. It is not difficult to show that \(L_{\mathrm {greedy}}(\alpha ,T,r)\le \lceil \log _{2}(1/\alpha )\rceil \) for any real \(\alpha \), \(0<\alpha <1\), where \(L_{\mathrm {greedy}}(\alpha ,T,r)\) is the length of the \(\alpha \)-decision rule constructed by the greedy algorithm for \((T,r)\). One can prove that \(L_{\mathrm {greedy}}(0,T,r)\le \log _{2}|U(T,r,d)|+1\). It is easy to check that \(l(0,T,r)\le 2\).

6 Problem of Recognition of Labels of Points in the Plane

In this section, we present a problem of recognition of colors of points in the plane (note that we recognize labels attached to the points, and the labels are named colors), which illustrates the considered approach and the obtained bounds on the precision of the greedy algorithm relative to the length of \(\alpha \)-decision rules.

Suppose we have a finite set \(S=\{(a_1,b_1),\ldots ,(a_n,b_n)\}\) of points in the plane and a mapping \(\mu \) which associates with each point \((a_p,b_p)\) a nonempty subset \(\mu (a_p,b_p)\) of the set \(\{green, yellow, red\}\). Colors are interpreted as decisions, and for each point from S we need to find a decision (color) from the set of decisions attached to this point. We denote this problem by \((S,\mu )\).

To solve the problem \((S,\mu )\), we use attributes corresponding to straight lines given by equations of the form \(x=\beta \) or \(y=\gamma \). These attributes are defined on the set S and take values from the set \(\{0,1\}\). Consider the line given by the equation \(x=\beta \). The value of the corresponding attribute on a point \((a,b)\in S\) is equal to 0 if and only if \(a<\beta \). Consider the line given by the equation \(y=\gamma \). The value of the corresponding attribute is equal to 0 if and only if \(b<\gamma \).

We now choose a finite set of straight lines which allows us to construct a decision rule with the minimum length for the problem \((S,\mu )\). It is possible that \(a_i=a_j\) or \(b_i=b_j\) for different i and j. Let \(a_{i_{1}},\ldots ,a_{i_{m}}\) be all pairwise different numbers from the set \(\{a_1,\ldots ,a_n\}\), ordered such that \(a_{i_{1}}<\ldots <a_{i_{m}}\). Let \(b_{j_{1}},\ldots ,b_{j_{t}}\) be all pairwise different numbers from the set \(\{b_1,\ldots ,b_n\}\), ordered such that \(b_{j_{1}}<\ldots <b_{j_{t}}\).

One can show that there exists a decision rule with minimum length which uses only attributes corresponding to the straight lines defined by the equations \(x=a_{i_{1}}-1\), \(x=(a_{i_{1}}+a_{i_{2}})/2,\ldots \), \(x=(a_{i_{m-1}}+a_{i_{m}})/2\), \(x=a_{i_{m}}+1\), \(y=b_{j_{1}}-1\), \(y=(b_{j_{1}}+b_{j_{2}})/2,\ldots \), \(y=(b_{j_{t-1}}+b_{j_{t}})/2\), \(y=b_{j_{t}}+1\).

Now, we describe a decision table \(T(S,\mu )\) with \(m+t+2\) columns and n rows. Columns of this table are labeled with attributes \(f_1,\ldots ,f_{m+t+2}\), corresponding to the considered \(m+t+2\) lines. Attributes \(f_1,\ldots ,f_{m+1}\) correspond to the lines defined by the equations \(x=a_{i_{1}}-1\), \(x=(a_{i_{1}}+a_{i_{2}})/2,\ldots , x=(a_{i_{m-1}}+a_{i_{m}})/2, x=a_{i_{m}}+1\), respectively. Attributes \(f_{m+2},\ldots ,f_{m+t+2}\) correspond to the lines defined by the equations \(y=b_{j_{1}}-1\), \(y=(b_{j_{1}}+b_{j_{2}})/2,\ldots , y=(b_{j_{t-1}}+b_{j_{t}})/2, y=b_{j_{t}}+1\), respectively. Rows of the table \(T(S,\mu )\) correspond to the points \((a_1,b_1),\ldots ,(a_n,b_n)\). The value \(f_l(a_p,b_p)\) is placed at the intersection of the column \(f_l\) and the row \((a_p,b_p)\). For \(p=1,\ldots ,n\), the row \((a_p,b_p)\) is labeled with the set of decisions \(\mu (a_p,b_p)\).
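The construction of \(T(S,\mu )\) can be sketched as follows (\(\mu \) is assumed to be given as a dictionary from points to sets of colors; all names are ours):

```python
def table_from_points(points, mu):
    """Build T(S, mu): one attribute per line x = beta or y = gamma described
    above; a point takes value 0 on x = beta iff its x-coordinate is < beta,
    and similarly for y = gamma."""
    xs = sorted({a for a, _ in points})
    ys = sorted({b for _, b in points})
    x_cuts = [xs[0] - 1] + [(u + v) / 2 for u, v in zip(xs, xs[1:])] + [xs[-1] + 1]
    y_cuts = [ys[0] - 1] + [(u + v) / 2 for u, v in zip(ys, ys[1:])] + [ys[-1] + 1]
    return [(tuple([0 if a < beta else 1 for beta in x_cuts] +
                   [0 if b < gamma else 1 for gamma in y_cuts]),
             frozenset(mu[(a, b)]))
            for (a, b) in points]
```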

Example 2

A problem \((S,\mu )\) with four points and corresponding decision table \( T(S,\mu )\) is depicted in Fig. 1. We write “g” instead of “green”, “r” instead of “red”, and “y” instead of “yellow”.

Fig. 1. Problem \((S,\mu )\) and corresponding decision table \(T(S, \mu )\)

Let us evaluate the parameter \(M(T(S,\mu ))\).

Proposition 2

\(M(T(S,\mu ))\le 4\).

Proof

We denote \(T=T(S,\mu )\). Let \(\bar{\delta }=(\delta _1,\ldots ,\delta _{m+t+2})\in \{0,1\}^{m+t+2}\). If \(\delta _1=0\), or \(\delta _{m+1}=1\), or \(\delta _{m+2}=0\), or \(\delta _{m+t+2}=1\), then \(T(f_1,\delta _1)\), or \(T(f_{m+1},\delta _{m+1})\), or \(T(f_{m+2},\delta _{m+2})\), or \(T(f_{m+t+2},\delta _{m+t+2})\), respectively, is an empty table, and \(M(T,\bar{\delta })\le 1\). Let \(\delta _1=1\), \(\delta _{m+1}=0\), \(\delta _{m+2}=1\) and \(\delta _{m+t+2}=0\). One can show that in this case there exist \(i\in \{1,\ldots ,m\}\) and \(j\in \{m+2,\ldots ,m+t+1\}\) such that \(\delta _i=1\), \(\delta _{i+1}=0\), \(\delta _j=1\), and \(\delta _{j+1}=0\). It is clear that the table \(T(f_i,\delta _i)(f_{i+1},\delta _{i+1})(f_j,\delta _j)(f_{j+1},\delta _{j+1})\) contains exactly one row. So \(M(T,\bar{\delta })\le 4\) and \(M(T)\le 4\).    \(\square \)

From Lemma 1, Theorem 2 and Proposition 2 the next statement follows:

Corollary 1

For any real \(\alpha \), \(0<\alpha <1\), and any row r of the table \(T(S,\mu )\),

$$ L_{\mathrm {greedy}}(\alpha ,T(S,\mu ),r)<4\ln (1/\alpha )+1. $$

Note that \(4\ln (1/0.01)+1<19.43\), \(4\ln (1/0.1)+1<10.22\), \(4\ln (1/0.2)+1<7.44\), and \(4\ln (1/0.5)+1<3.78\).

7 Results of Experiments

This section consists of three parts:

  • experimental results for the many-valued decisions approach (Sect. 7.1),

  • experimental results for the generalized decision approach (Sect. 7.2),

  • comparative study (Sect. 7.3).

We consider a number of decision tables from the UCI Machine Learning Repository [12]. In some tables there were missing values; each such value was replaced with the most common value of the corresponding attribute. Some decision tables contain conditional attributes that take a unique value for each row; such attributes were removed. In some tables there were equal rows with, possibly, different decisions.

Table 2. Characteristics of decision tables with many-valued decisions

In this case, each group of identical rows was replaced with a single row from the group, labeled with the set of decisions attached to the rows from the group. To obtain rows labeled with sets containing more than one decision, we removed additional conditional attributes from the decision tables. Information about these decision tables can be found in Table 2. This table contains the name of the initial table, the number of rows (column “Rows”), the number of attributes (column “Attr”), the spectrum of the table (column “Spectrum”), and the list of names of removed attributes (column “Removed attributes”). The spectrum of a decision table with many-valued decisions is a sequence \(\#1\), \(\#2\),..., where \(\#i\), \(i=1,2,\ldots \), is the number of rows labeled with sets of decisions of cardinality i. All experiments were performed using the DAGGER software tool [1], written in C++.
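The preprocessing just described (merging identical rows and computing the spectrum) is easy to express in code; the following fragment is only an illustration of the transformation, since the actual experiments were performed with DAGGER.

```python
from collections import defaultdict

def merge_equal_rows(raw_rows):
    """Replace each group of rows with equal attribute values by a single row
    labeled with the set of all decisions met in the group.  Each raw row is
    a pair (values, decision) carrying a single decision."""
    groups = defaultdict(set)
    for values, decision in raw_rows:
        groups[tuple(values)].add(decision)
    return [(values, frozenset(ds)) for values, ds in groups.items()]

def spectrum(table):
    """Spectrum #1, #2, ...: #i is the number of rows whose set of decisions
    has cardinality i."""
    counts = defaultdict(int)
    for _, ds in table:
        counts[len(ds)] += 1
    return [counts[i] for i in range(1, max(counts, default=0) + 1)]
```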

7.1 Proposed Approach

We made four groups of experiments which are connected with:

  • length of constructed \(\alpha \)-decision rules,

  • number of different \(\alpha \)-decision rules,

  • lower and upper bounds on the minimum length of \(\alpha \)-decision rules,

  • 0.5-hypothesis.

The first group of experiments is the following. For the decision tables described in Table 2 and \(\alpha \in \{0.0,0.001,0.01,0.1,0.2,0.3\}\), we apply the greedy algorithm to each row of the table. After that, among the constructed rules, we find the minimum (column “min”), average (column “avg”), and maximum (column “max”) length of such rules. Results can be found in Tables 3 and 4.

Table 3. Length of \(\alpha \)-decision rules for \(\alpha \in \{0.0,0.001,0.01\}\) constructed by greedy algorithm
Table 4. Length of \(\alpha \)-decision rules for \(\alpha \in \{0.1,0.2,0.3\}\) constructed by greedy algorithm

One can see that the length of the constructed \(\alpha \)-decision rules decreases as the value of \(\alpha \) increases, and that the greedy algorithm constructs relatively short \(\alpha \)-decision rules.

Table 5 presents the number of different rules constructed by the greedy algorithm for \(\alpha \in \{0.0,0.001,0.01,0.1,0.2,0.3\}\). In the worst case, the number of different rules can be equal to the number of rows in the decision table T. One can see that, with the exception of three tables, the number of different rules is non-increasing as the value of \(\alpha \) increases.

Table 5. Number of different rules constructed by greedy algorithm

The next group of experimental results is connected with lower and upper bounds on the minimum length of \(\alpha \)-decision rules. Figures 2 and 3 present average values of the bounds \(l(\alpha ,T,r)\) and \(u(\alpha ,T,r)\) among all rows r of T for \(\alpha \), \(0\le \alpha <1\), with step 0.01.

The last group of experiments is connected with the 0.5-hypothesis. Table 6 contains, for \(i=1,2,\ldots \), the average percentage of rows separated at the i-th step of the greedy algorithm (average over all rows r and decisions \(d\in D(r)\)).

For the decision tables described in Table 2, we find the number of rows for which the 0.5-hypothesis is true. Table 7 contains the name of the decision table, the number of rows, and the number of rows for which the 0.5-hypothesis is true.

Results in Table 6 show that the average percentage of rows separated at the i-th step of the greedy algorithm during exact decision rule construction is greater than or equal to 50 %, with the exception of the 7-th step of the greedy algorithm for “spect-test-1”. Based on results in Table 7, we can see that the 0.5-hypothesis is true for 12 decision tables and is not true for 9 decision tables: “breast-cancer-1”, “kr-vs-kp-5”, “kr-vs-kp-4”, “lymphography-5”, “poker-hand-train-5”, “poker-hand-train-5a”, “spect-test-1”, “tic-tac-toe-3”, and “zoo-data-5”.

Fig. 2. Lower and upper bounds on \(L_{\mathrm {min}}(\alpha ,T,r)\) (“balance-scale-1” and “nursery-1”)

Fig. 3. Average values of lower and upper bounds on \(L_{\mathrm {min}}(\alpha ,T,r)\) (“kr-vs-kp-5” and “spect-test-1”)

Table 6. Average percentage of rows separated at i-th step of the greedy algorithm work
Table 7. Number of rows in decision tables for which 0.5-hypothesis is true

7.2 Generalized Decision Approach

In this section, we present experimental results for \(\alpha \)-decision rules constructed under the generalized decision approach, relative to:

  • length of constructed \(\alpha \)-decision rules,

  • number of different \(\alpha \)-decision rules,

  • 0.5-hypothesis.

In the generalized decision approach [23–25], the greedy algorithm constructs for each row one \(\alpha \)-decision rule which has on the right-hand side the generalized decision (a number encoding the set of decisions attached to the given row); see Table 8.

Table 8. Transformation of the set of decisions for the generalized decision approach
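Any injective encoding of decision sets by numbers serves the purpose of this transformation; the concrete encoding used in the experiments is the one shown in Table 8. A sketch:

```python
def generalized_decision_table(table):
    """Replace each set of decisions by a one-element set containing a number
    that encodes the original set, so equal sets get equal codes."""
    codes = {}
    result = []
    for values, ds in table:
        code = codes.setdefault(frozenset(ds), len(codes) + 1)
        result.append((values, frozenset({code})))
    return result
```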

For the decision tables described in Table 2 and \(\alpha \in \{0.0,0.001,0.01,0.1,0.2,0.3\}\), we apply the greedy algorithm to each row of the table. After that, among the constructed rules, we find the minimum (column “min”), average (column “avg”), and maximum (column “max”) length of such rules. Results can be found in Tables 9 and 10.

Table 9. Length of \(\alpha \)-decision rules for \(\alpha \in \{0.0,0.001,0.01\}\)–generalized decision approach
Table 10. Length of \(\alpha \)-decision rules for \(\alpha \in \{0.1,0.2,0.3\}\)–generalized decision approach
Table 11. Number of different rules–generalized decision approach
Table 12. Average percentage of rows separated at i-th step of the greedy algorithm work–generalized decision approach
Table 13. Number of rows in decision tables for which 0.5-hypothesis is true–generalized decision approach

We can say that, for this approach as well, the greedy algorithm constructs relatively short \(\alpha \)-decision rules.

We computed the number of different rules constructed by the greedy algorithm for \(\alpha \in \{0.0,0.001,0.01,0.1,0.2,0.3\}\); results can be found in Table 11. For the generalized decision approach, in the worst case, the number of different rules can be equal to the number of rows in the decision table T. With the exception of one table, the number of different rules is non-increasing with the growth of \(\alpha \).

The last group of experiments is connected with the 0.5-hypothesis. Table 12 contains, for \(i=1,2,\ldots \), the average percentage of rows separated at the i-th step of the work of the greedy algorithm (average over all rows r). For two decision tables, the average percentage of separated rows is less than 50 %: for “spect-test-1” at the 7-th and the 8-th steps, and for “mushroom-5” at the 6-th step.

Recall that the 0.5-hypothesis is true for T if it is true for each row of T. Table 13 contains, for the decision tables described in Table 2, the number of rows for which the 0.5-hypothesis is true. Out of 21 decision tables, the 0.5-hypothesis is not true for 8: “breast-cancer-1”, “kr-vs-kp-5”, “kr-vs-kp-4”, “lymphography-5”, “mushroom-5”, “spect-test-1”, “tic-tac-toe-3”, and “zoo-data-5”.

7.3 Comparative Study

In this section, we make a comparative study of \(\alpha \)-decision rules for the proposed approach and the generalized decision approach, relative to:

  • length of constructed \(\alpha \)-decision rules,

  • number of different \(\alpha \)-decision rules,

  • 0.5-hypothesis.

Table 14, based on results from Tables 3 and 9, presents, for \(\alpha \in \{0.0, 0.001, 0.01\}\), a comparison of the minimum (column “min”), average (column “avg”), and maximum (column “max”) length of \(\alpha \)-decision rules for both approaches. Each entry of Table 14 is equal to the (min, avg, max) length of \(\alpha \)-decision rules for the generalized decision approach divided by the corresponding (min, avg, max) length of \(\alpha \)-decision rules for the proposed approach.

Table 14. Comparison of length of \(\alpha \)-decision rules for \(\alpha \in \{0.0,0.001,0.01\}\)

We can find decision tables for which the minimum, average, and maximum length of \(\alpha \)-decision rules constructed using the proposed approach are two or more times smaller than the minimum, average, and maximum length of \(\alpha \)-decision rules constructed using the generalized decision approach. However, for the maximum length of 0.01-decision rules for the decision tables “poker-hand-train-5” and “poker-hand-train-5a”, we have the opposite situation.

Table 15, based on results from Tables 4 and 10, presents, for \(\alpha \in \{0.1, 0.2, 0.3\}\), a comparison of the minimum (column “min”), average (column “avg”), and maximum (column “max”) length of \(\alpha \)-decision rules for both approaches. Each entry of Table 15 is equal to the corresponding entry of Table 10 divided by the corresponding entry of Table 4.

Table 15. Comparison of length of \(\alpha \)-decision rules for \(\alpha \in \{0.1,0.2,0.3\}\)
Table 16. Comparison of number of different rules

The results are similar to those in Table 14.

Table 16, based on results from Tables 5 and 11, presents, for \(\alpha \in \{0.0, 0.001, 0.01, 0.1, 0.2, 0.3\}\), a comparison of the number of different \(\alpha \)-decision rules for both approaches. Each entry of Table 16 is equal to the number of different \(\alpha \)-decision rules for the generalized decision approach divided by the number of different \(\alpha \)-decision rules for the proposed approach. We can see that the number of different \(\alpha \)-decision rules for the generalized decision approach is often two or more times greater than the number of different rules for the proposed approach.

The last group of results is connected with the 0.5-hypothesis. Based on results from Tables 7 and 13, we can see that the 0.5-hypothesis is not true for 9 decision tables under the proposed approach and for 8 decision tables under the generalized decision approach. So, the difference is not significant.

8 Conclusions

We studied a greedy algorithm for construction of approximate decision rules. This algorithm has polynomial time complexity on the whole set of decision tables with many-valued decisions. We obtained a bound on the precision of this algorithm relative to the length of rules, and considered lower and upper bounds on the minimum length of \(\alpha \)-decision rules. We studied binary decision tables with many-valued decisions, but the considered approach can also be used for decision tables with more than two values of attributes, as presented in Sect. 7. The experimental results are connected with the construction of exact and approximate decision rules. Based on them, we can see that the greedy algorithm constructs relatively short \(\alpha \)-decision rules. We also presented results on the length and the number of different \(\alpha \)-decision rules, and on the 0.5-hypothesis, for the approach based on the generalized decision.

Based on the results connected with the comparison of the two approaches, we can see that the length and the number of different rules constructed in the framework of our approach (one decision from the set of decisions attached to a row) are usually smaller than the length and the number of different rules constructed in the framework of the generalized decision approach (all decisions from the set of decisions attached to a row).

Future investigations will be connected with the study of other greedy algorithms and the construction of classifiers for decision tables with many-valued decisions.