
1 Introduction

We have been following rough set based rule generation from table data sets [10, 14, 22] and Apriori based rule generation from transaction data sets [1, 2, 9], and we are investigating a new framework of rule generation from table data sets with information incompleteness [17,18,19,20,21].

Table 1 is a standard table. We term such a table a Deterministic Information System (DIS). For DISs, several rough set based rule generation methods have been proposed [3, 5, 10, 14, 16, 22, 23]. Furthermore, missing values ‘?’ [6, 7, 11] (Table 2) and Non-deterministic Information Systems (NISs) [12, 13, 15] (Table 3) have been investigated to cope with information incompleteness. In [12], question-answering based on possible world semantics was investigated, and an axiom system was given for translating queries to one equivalent normal form.

In a NIS, some attribute values are given as a set of possible attribute values due to information incompleteness. In Table 2, \(\{2,3\}\) in x2 means ‘either 2 or 3 is the actual value, but there is no information to decide which’, and ‘?’ means that there is no information at all. We replace each ‘?’ with the set of all possible attribute values and obtain Table 3. Thus, we can handle ‘?’ within the NIS framework (some discretization may be necessary for continuous attribute values). Formerly, question-answering and information retrieval from NISs were investigated; we are coping with rule generation from NISs.

Table 1. An exemplary DIS \(\psi \).
Table 2. An exemplary NIS \(\varPhi \) with missing value ‘?’, whose value is one of 1, 2, 3.
Table 3. An exemplary NIS \(\varPhi \). Each ‘?’ is replaced with a set \(\{1,2,3\}\) of possible attribute values.

The Apriori algorithm [1] was proposed by Agrawal for handling transaction data sets. We adjust this algorithm to DISs and NISs by using the characteristics of table data sets. The highlights of this paper are the following.

(1) A brief survey of Apriori based rule generation and a rule generator,
(2) Some improvements of the Apriori based algorithm and a rule generator,
(3) An experiment with the improved rule generator in Python.

This paper is organized as follows: Sect. 2 surveys our framework on NISs and the Apriori algorithm [1, 2, 9]. Section 3 connects table data sets to transaction data sets and copes with the manipulation of candidates of rules. Then, more effective manipulation is proposed in DISs and NISs. Section 4 describes a new NIS-Apriori based system in Python and presents the improved results. Section 5 concludes this paper.

2 Preliminary: An Overview of Rule Generation and Examples

This section briefly reviews rule generation from DISs and NISs.

2.1 Rules and Rule Generation from DISs

In Table 1, we consider implications like \([P,3]\Rightarrow [Dec,a]\) from x1 and \([R,2]\wedge [S,1]\Rightarrow [Dec,b]\) from x3. Generally, a rule is defined as an implication satisfying some constraint. The definition below is one standard definition of rules [1, 2, 9, 14, 22]. We follow it and consider the following rule generation from DISs.

(A rule from DIS). A rule is an implication \(\tau \) satisfying \(support(\tau )\ge \alpha \) and \(accuracy(\tau )\ge \beta \) (\(0< \alpha ,~\beta \le 1.0\)) for given threshold values \(\alpha \) and \(\beta \).

(Rule generation from DIS). If we fix \(\alpha \) and \(\beta \) in DIS, the set of all rules is also fixed, but we generally do not know them. Rule generation is to generate all minimal rules (we term a rule with minimal condition part a minimal rule).

Fig. 1. All minimal rules (\(support(\tau )\ge 0.2\), \(accuracy(\tau )\ge 0.9\)) obtained from Table 1. Our system ensures that there are no rules other than these. In the table rule1, the first rule is \(\tau : [P,1]\Rightarrow [Dec,b]\). Even though \(\tau ': [P,1]\wedge [Q,2]\Rightarrow [Dec,b]\) satisfies the constraint of rules, \(\tau '\) is a redundant implication of \(\tau \), and hence \(\tau '\) is not minimal.

Here, \(support(\tau )\) is the ratio of objects supporting an implication \(\tau \) among all objects, and \(accuracy(\tau )\) is the ratio of consistent objects among those satisfying the condition part of \(\tau \). For example, let us consider \(\tau : [R,2]\wedge [S,1]\Rightarrow [Dec,b]\) from x3. Since \(\tau \) occurs once in five objects, we have \(support(\tau )=1/5\). Since \([R,2]\wedge [S,1]\) occurs twice, we have \(accuracy(\tau )=1/2\). Figure 1 shows all minimal rules (redundant rules are not generated) from Table 1.
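These two measures can be computed directly from a table. Below is a minimal Python sketch (the toy table is a hypothetical stand-in consistent with the values quoted above, not the actual Table 1; our rule generator itself is described in Sect. 4):

```python
# A stand-in for a DIS: objects x1..x5 over attributes P, Q, R, S, Dec.
# It is chosen so that [R,2] and [S,1] hold together on exactly two
# objects, only one of which has Dec = b, as in the example above.
table = {
    'x1': {'P': 3, 'Q': 2, 'R': 1, 'S': 2, 'Dec': 'a'},
    'x2': {'P': 1, 'Q': 2, 'R': 1, 'S': 1, 'Dec': 'b'},
    'x3': {'P': 1, 'Q': 2, 'R': 2, 'S': 1, 'Dec': 'b'},
    'x4': {'P': 2, 'Q': 3, 'R': 1, 'S': 2, 'Dec': 'a'},
    'x5': {'P': 3, 'Q': 1, 'R': 2, 'S': 1, 'Dec': 'a'},
}

def support(rows, conds, val):
    """Ratio of objects satisfying both the condition part and the decision."""
    hit = sum(1 for r in rows
              if all(r[a] == v for a, v in conds) and r['Dec'] == val)
    return hit / len(rows)

def accuracy(rows, conds, val):
    """Ratio of consistent objects among those satisfying the condition part."""
    match = [r for r in rows if all(r[a] == v for a, v in conds)]
    hit = sum(1 for r in match if r['Dec'] == val)
    return hit / len(match) if match else 0.0

rows = list(table.values())
tau = ([('R', 2), ('S', 1)], 'b')   # tau: [R,2] and [S,1] => [Dec,b]
print(support(rows, *tau))          # 0.2  (= 1/5)
print(accuracy(rows, *tau))         # 0.5  (= 1/2)
```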

2.2 Rules and Rule Generation from NISs

From now on, we employ the symbols \(\varPhi \) and \(\psi \) for a NIS and a DIS, respectively. In a NIS \(\varPhi \), we replace each set of possible values with one of its elements, and then we have one DIS. We term such a DIS a derived DIS from NIS, and let \(DD(\varPhi )\) denote the set of all derived DISs from \(\varPhi \). Table 1 is a derived DIS from Table 3. In NISs like Table 3, we consider the following two types of rules:

(1) a rule which we certainly conclude from NIS (a certain rule),
(2) a rule which we may conclude from NIS (a possible rule).

These two types of rules seem natural for rule generation with information incompleteness. Yao recalls three-valued logic in rough sets and proposes three-way decisions [23, 24]. Such types of rules concerning missing values were also investigated in [6, 11], and we coped with the following two types of rules based on possible world semantics [18, 20]. The definition in [6, 11] and the following definition are semantically different [18].

(A certain rule from NIS). An implication \(\tau \) is a certain rule if \(\tau \) is a rule in every derived DIS from NIS.

(A possible rule from NIS). An implication \(\tau \) is a possible rule if \(\tau \) is a rule in at least one derived DIS from NIS.

(Rule generation from NIS). If we fix \(\alpha \) and \(\beta \) in NIS, the set of all certain rules and the set of all possible rules are also fixed. Rule generation is to generate all minimal certain rules and all minimal possible rules.

The two types of rules depend on all derived DISs from NIS, and the number of derived DISs increases exponentially. For Table 3, this number is 324 (=\(2^2\times 3^4\)), and it exceeds \(10^{100}\) for the Mammographic data set [4]. Thus, the realization of a system handling the two types of rules seemed hard; however, we gave one solution to this problem.

(Proved Property). For each implication \(\tau \), we developed formulas to calculate the following:

(1) \(minsupp(\tau )=\min _{\psi \in DD(\varPhi )}\{support(\tau ) \text{ in } \psi \}\),
(2) \(minacc(\tau )=\min _{\psi \in DD(\varPhi )}\{accuracy(\tau ) \text{ in } \psi \}\),
(3) \(maxsupp(\tau )=\max _{\psi \in DD(\varPhi )}\{support(\tau ) \text{ in } \psi \}\),
(4) \(maxacc(\tau )=\max _{\psi \in DD(\varPhi )}\{accuracy(\tau ) \text{ in } \psi \}\).

This calculation employs rough set based concepts and is independent of the number of derived DISs [18, 20, 21]. By using these formulas, we proved a method to examine whether \(\tau \) is a certain rule and whether \(\tau \) is a possible rule. This method is also independent of the number of derived DISs [18, 20, 21].
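To make these definitions concrete, the following brute-force sketch enumerates every derived DIS of a tiny, hypothetical NIS and takes the min/max values. This is feasible only for toy data; the point of our proved formulas is precisely to avoid this enumeration:

```python
import itertools

def derived_diss(nis):
    """Yield every derived DIS of a NIS given as
    {object: {attribute: set_of_possible_values}}."""
    objs = sorted(nis)
    attrs = sorted(next(iter(nis.values())))
    cells = [(o, a) for o in objs for a in attrs]
    for combo in itertools.product(*(sorted(nis[o][a]) for o, a in cells)):
        dis = {o: {} for o in objs}
        for (o, a), v in zip(cells, combo):
            dis[o][a] = v
        yield dis

def supp_acc(rows, conds, val):
    """(support, accuracy) of conds => [Dec,val] in one DIS."""
    match = [r for r in rows if all(r[a] == v for a, v in conds)]
    hit = sum(1 for r in match if r['Dec'] == val)
    return hit / len(rows), (hit / len(match) if match else 0.0)

# A two-object NIS: x2 has non-deterministic P and Dec values.
nis = {'x1': {'P': {1}, 'Dec': {'a'}},
       'x2': {'P': {1, 2}, 'Dec': {'a', 'b'}}}
tau = ([('P', 1)], 'a')                      # tau: [P,1] => [Dec,a]
vals = [supp_acc(list(d.values()), *tau) for d in derived_diss(nis)]
print(min(s for s, _ in vals), max(s for s, _ in vals))  # minsupp, maxsupp
print(min(a for _, a in vals), max(a for _, a in vals))  # minacc,  maxacc
```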

Fig. 2. All minimal certain rules (\(support(\tau )\ge 0.2\), \(accuracy(\tau )\ge 0.9\)) obtained from Table 3. There are no certain rules other than these.

Fig. 3. All minimal possible rules (\(support(\tau )\ge 0.2\), \(accuracy(\tau )\ge 0.9\)) obtained from Table 3. There are no possible rules other than these.

We apply this property to the Apriori algorithm for realizing a rule generation system. The Apriori algorithm effectively enumerates itemsets (candidates of rules), and the support and accuracy values of every candidate are calculated by the Proved Property. Figures 2 and 3 show the minimal certain rules and minimal possible rules obtained from Table 3. We discuss the execution time in Sect. 4.

2.3 A Relation Between Rules in DISs and Rules in NISs

Let \(\psi ^{actual}\) be a derived DIS with actual information from NIS \(\varPhi \) (we cannot decide \(\psi ^{actual}\) from \(\varPhi \), but we suppose there is an actual \(\psi ^{actual}\) for \(\varPhi \)), then we can easily have the next inclusion relation.

$$\begin{aligned} \{\tau ~|~\tau \text{ is a certain rule in } \varPhi \} &\subseteq \{\tau ~|~\tau \text{ is a rule in } \psi ^{actual}\} \\ &\subseteq \{\tau ~|~\tau \text{ is a possible rule in } \varPhi \} \end{aligned}$$

Due to information incompleteness, we know lower and upper approximations of a set of rules in \(\psi ^{actual}\). This property follows the concept of rough sets based approximations.

2.4 The Apriori Algorithm for Transaction Data Sets

Let us consider Table 4, which shows four persons’ purchases of items. Such structured data is termed a transaction data set. In this data set, let us focus on a set \(\{ham,beer\}\). Such a set is generally termed an itemset. For this itemset, we consider two implications \(\tau _{1}: ham\Rightarrow beer\) and \(\tau _{2}: beer\Rightarrow ham\). For \(\tau _{1}\), \(support(\tau _{1})\) = 3/4 and \(accuracy(\tau _{1})\) = 3/3. For \(\tau _{2}\), \(support(\tau _{2})\) = 3/4 and \(accuracy(\tau _{2})\) = 3/4. For an itemset \(\{ham,beer,corn\}\), we consider six implications, \(ham\wedge beer\Rightarrow corn\), \(\cdots \), \(beer\Rightarrow corn\wedge ham\). In this way, Agrawal proposed a method to obtain rules from transaction data sets, which is known as the Apriori algorithm [1, 2, 9]. This algorithm makes use of the following.

Table 4. An exemplary transaction data set

(Monotonicity of support). For two itemsets P and Q, if P \(\subseteq \) Q, \(support(Q)\le support(P)\) holds.

By using this property, the Apriori algorithm enumerates all itemsets satisfying \(support\ge \alpha \). Each such itemset is termed a frequent itemset. Let us consider the manipulation of itemsets in Table 4 under \(support\ge 0.5\). Since there are four transactions, each frequent itemset must occur at least twice. Let \(CAN_{i}\) and \(FI_{i}\) (\(i\ge 0\)) denote the set of all candidate itemsets and the set of all frequent itemsets consisting of \((i+1)\) items, respectively. We have the following.

$$\begin{aligned}&CAN_{0}=\{\{bread\}(\text{occurrence}{=}1),\{milk\}(1),\{ham\}(3),\{beer\}(4),\{corn\}(2), \\&\qquad \{cheese\}(2),\{apple\}(1),\{potato\}(1),\{cake\}(1)\}, \\&FI_{0}=\{\{ham\}(3),\{beer\}(4),\{corn\}(2),\{cheese\}(2)\}, \\&CAN_{1}=\{\{ham,beer\},\{ham,corn\},\{ham,cheese\},\{beer,corn\},\\&\qquad \{beer,cheese\},\{corn,cheese\}\}, \\&FI_{1}=\{\{ham,beer\}(3),\{ham,corn\}(2),\{beer,corn\}(2),\{beer,cheese\}(2)\}, \\&CAN_{2}=\{\{ham,beer,corn\},\{ham,beer,cheese\},\{ham,corn,cheese\},\\&\qquad \{beer,corn,cheese\}\}, \\&FI_{2}=\{\{ham,beer,corn\}(2)\}. \end{aligned}$$

Each element of \(CAN_{i}\) (\(i\ge 1\)) is generated by combining two itemsets in \(FI_{i-1}\) [1, 2]. Then, every itemset satisfying the support condition becomes an element of \(FI_{i}\). For example, for \(A:\{ham,corn\}\), \(B:\{beer,cheese\}\in FI_{1}\), we add one element of B to A and obtain \(\{ham,corn,beer\}\), \(\{ham,corn,cheese\}\in CAN_{2}\). We also do the converse and obtain \(\{beer,cheese,ham\}\), \(\{beer,cheese,corn\}\in CAN_{2}\). Only one itemset \(\{ham,beer,corn\}\) satisfies the support condition and becomes an element of \(FI_{2}\). In this way, \(FI_{1}\), \(FI_{2}\), \(\cdots \), \(FI_{n}\) are obtained first, and then the accuracy value of each implication defined by a frequent itemset is evaluated. In the subsequent sections, we change this manipulation by using the characteristics of table data sets.
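The following Python sketch reproduces this enumeration of frequent itemsets. Table 4 itself is not reproduced here, so the four transactions below are a hypothetical reconstruction consistent with the occurrence counts listed above:

```python
from itertools import combinations

# Hypothetical transactions matching the counts in CAN_0 and FI_0..FI_2.
transactions = [
    {'bread', 'milk', 'ham', 'beer', 'corn'},
    {'ham', 'beer', 'corn', 'cheese'},
    {'beer', 'cheese', 'apple'},
    {'ham', 'beer', 'potato', 'cake'},
]

def frequent_itemsets(transactions, alpha):
    """Enumerate all frequent itemsets (support >= alpha), Apriori-style."""
    n = len(transactions)
    occ = lambda s: sum(1 for t in transactions if s <= t)
    items = {i for t in transactions for i in t}
    fi = [frozenset([i]) for i in items if occ(frozenset([i])) / n >= alpha]
    result, k = list(fi), 2
    while fi:
        # join step: unions of two frequent (k-1)-itemsets having size k
        can = {a | b for a, b in combinations(fi, 2) if len(a | b) == k}
        fi = [c for c in can if occ(c) / n >= alpha]   # prune by support
        result += fi
        k += 1
    return result

for s in frequent_itemsets(transactions, 0.5):
    print(sorted(s))
```

Running the sketch with \(\alpha =0.5\) prints exactly the itemsets of \(FI_{0}\), \(FI_{1}\), and \(FI_{2}\) above.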

3 Some Improvements of the NIS-Apriori Based Rule Generator

We describe the improvements in our framework based on Sect. 2.

3.1 From Transaction Data Sets to Table Data Sets

We translate Table 1 to Table 5 and identify each descriptor with an item. Then, we can see that Table 5 is a transaction data set. Thus, we can apply the Apriori algorithm to rule generation.

Table 5. A transaction data set for DIS \(\psi \) in Table 1.

We define the following sets \(IMP_{1}\), \(IMP_{2}\), \(\cdots \), \(IMP_{n}\):

  • \(IMP_{1}=\{[A,val_{A}]\Rightarrow [Dec,val]\}\),

  • \(IMP_{2}=\{[A,val_{A}]\wedge [B,val_{B}]\Rightarrow [Dec,val]\}\),

  • \(IMP_{3}=\{[A,val_{A}]\wedge [B,val_{B}]\wedge [C,val_{C}]\Rightarrow [Dec,val]\}\),

Here, \(IMP_{i}\) denotes the set of implications whose condition parts consist of i attributes. A minimal rule is an implication \(\tau \in \cup _{i}IMP_{i}\), and we could examine every \(\tau \in \cup _{i}IMP_{i}\) exhaustively. However, in the subsequent sections, we consider more effective manipulations to generate minimal rules in \(IMP_{1}\), \(IMP_{2}\), \(\cdots \), sequentially.

3.2 The Manipulation I for Frequent Itemsets by the Characteristics of Table Data Sets

Here, we make use of the characteristics of table data sets below.

(TA1). The decision attribute Dec is fixed. So, it is enough to consider only itemsets including exactly one descriptor whose attribute is Dec. For example, we handle neither \(\{[P,3],[Q,2]\}\) nor \(\{[P,3],[Dec,a],[Dec,b]\}\) in Table 5.

(TA2). Each descriptor is related to an attribute. So, we handle only itemsets whose descriptors have mutually different attributes. For example, we do not handle an itemset like \(\{[P,3],[P,1],[Q,2],[Dec,b]\}\) in Table 5.

(TA3). To consider implications, we handle \(CAN_{1}\), \(FI_{1}\) (\(\subseteq IMP_{1})\), \(CAN_{2}\), \(FI_{2}\) (\(\subseteq IMP_{2})\), \(\cdots \), which are defined in Sect. 2.4.

Fig. 4. The manipulation I for itemsets.

Fig. 5. The Apriori algorithm adjusted to a table data set DIS \(\psi \). We can examine the accuracy value in each while loop (the rectangular area circled by the dotted line in Fig. 4). This examination is not done in the Apriori algorithm for transaction data sets.

Based on the above characteristics, we can consider Fig. 4, where itemsets satisfying (TA1) and (TA2) are enumerated. Generally, in the Apriori algorithm, the accuracy value is examined after all \(FI_{i}\) are obtained, because the decision attribute is not fixed, and each itemset in \(FI_{i}\) corresponds to plural implications. In a table data set, however, one implication corresponds to one frequent itemset. We employed this property and proposed the Apriori algorithm adjusted to table data sets [20, 21] in Fig. 5. We term this algorithm the DIS-Apriori algorithm. Here, we calculate the accuracy value of every frequent itemset in each while loop (the rectangular area circled by the dotted line in Fig. 4 and lines 5-7 in Fig. 5). We can easily handle certain rules and possible rules in NISs by extending the DIS-Apriori algorithm.
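A minimal Python sketch of this loop follows. It is an illustrative reading of Fig. 5 under the conventions of the sketch in Sect. 2.1 (a DIS as a list of attribute-value dictionaries), not the system evaluated in Sect. 4:

```python
def dis_apriori(rows, alpha, beta, dec='Dec'):
    """Sketch of the DIS-Apriori loop: unlike plain Apriori, support AND
    accuracy are examined inside every while loop."""
    n = len(rows)

    def stats(conds, val):
        match = [r for r in rows if all(r[a] == v for a, v in conds)]
        hit = sum(1 for r in match if r[dec] == val)
        return hit / n, (hit / len(match) if match else 0.0)

    attrs = [a for a in rows[0] if a != dec]
    # CAN_1: one condition descriptor plus one decision descriptor (TA1)
    can = {(((a, r[a]),), r[dec]) for r in rows for a in attrs}
    rules, fi1 = [], None
    while can:
        rule_i, rest_i = [], []
        for conds, val in can:
            s, ac = stats(conds, val)
            if s >= alpha:                    # otherwise tau is in NOrule_i
                (rule_i if ac >= beta else rest_i).append((conds, val))
        rules += rule_i
        fi_i = rule_i + rest_i                # FI_i = Rule_i + Rest_i
        if fi1 is None:
            fi1 = fi_i
        # CAN_{i+1}: extend frequent implications with one descriptor from
        # FI_1 carrying the same decision and a fresh attribute (TA1, TA2)
        can = {(tuple(sorted(conds + ((b, vb),))), val)
               for conds, val in fi_i
               for ((b, vb),), v1 in fi1
               if v1 == val and b not in {a for a, _ in conds}}
    # keep minimal rules only (drop implications extending a shorter rule)
    return [(c, v) for c, v in rules
            if not any(set(c2) < set(c) and v2 == v for c2, v2 in rules)]
```

For the toy table of Sect. 2.1 with \(\alpha =0.2\) and \(\beta =0.9\), `dis_apriori(rows, 0.2, 0.9)` returns minimal rules such as \([P,2]\Rightarrow [Dec,a]\).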

Proposition 1

[20, 21]

(1) We replace DIS \(\psi \) with NIS \(\varPhi \), and support and accuracy with minsupp and minacc, respectively. Then, this algorithm generates all minimal certain rules.
(2) We replace DIS \(\psi \) with NIS \(\varPhi \), and support and accuracy with maxsupp and maxacc, respectively. Then, this algorithm generates all minimal possible rules.
(3) We term the algorithm consisting of (1) and (2) the NIS-Apriori algorithm.

Both the DIS-Apriori and NIS-Apriori algorithms are logically sound and complete for rules. They generate all rules without excess or deficiency.
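As an illustration of Proposition 1 only (the actual NIS-Apriori system applies the proved formulas of Sect. 2.2 rather than this brute force), the evaluation step of the DIS-Apriori sketch above can be swapped for min/max values over derived DISs, reusing `derived_diss` and `supp_acc` from the sketch in Sect. 2.2:

```python
def nis_stats(nis, conds, val, kind='certain'):
    """(support, accuracy) criteria for NIS rule generation:
    min over derived DISs for certain rules (minsupp, minacc),
    max over derived DISs for possible rules (maxsupp, maxacc)."""
    pairs = [supp_acc(list(d.values()), conds, val)
             for d in derived_diss(nis)]
    pick = min if kind == 'certain' else max
    return pick(s for s, _ in pairs), pick(a for _, a in pairs)
```

Replacing `stats` in the DIS-Apriori sketch with `nis_stats(nis, conds, val, 'certain')` generates minimal certain rules, and with `'possible'` minimal possible rules, as Proposition 1 states.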

Figures 1, 2 and 3, produced by the rule generator in SQL, are based on the algorithm in Fig. 5 and Proposition 1.

3.3 The Manipulation II for Frequent Itemsets by the Characteristics of Table Data Sets

Now, we advance the manipulation I to the manipulation II. We focus on the statement ‘create \(FI_{i}\)’ in lines 2 and 10 in Fig. 5. In every while loop, we examine each \(\tau \in FI_{i}\subseteq CAN_{i}\subseteq IMP_{i}\), so reducing the sets \(CAN_{i}\) and \(FI_{i}\) will improve the performance of execution. Regarding Fig. 5, we first remark the following.

(Rule generation). The purpose of rule generation is to generate every minimal implication \(\tau \in \cup _{i}IMP_{i}\) satisfying \(support(\tau )\ge \alpha \) and \(accuracy(\tau )\ge \beta \). We obtain \(Rule_{1}, Rest_{1}\subseteq IMP_{1}\) in the 1st while loop, \(Rule_{2}, Rest_{2}\subseteq IMP_{2}\) in the 2nd while loop, \(Rule_{3}, Rest_{3}\) in the 3rd while loop, \(\cdots \).

(Relation between sets in Fig. 5). We clarify the relation and the definition of \(NOrule_{i}\) below.

(1) \(Rule_{i}=\{\tau \in IMP_{i}~|~support(\tau )\ge \alpha ,~accuracy(\tau )\ge \beta \}\),
(2) \(Rest_{i}=\{\tau \in IMP_{i}~|~support(\tau )\ge \alpha ,~accuracy(\tau )<\beta \}\),
(3) \(FI_{i}=\{\tau \in IMP_{i}~|~support(\tau )\ge \alpha \}\),
(4) \(NOrule_{i}=\{\tau \in IMP_{i}~|~support(\tau )<\alpha \}\),
(5) \(IMP_{i}=FI_{i}\cup NOrule_{i}=(Rule_{i}\cup Rest_{i})\cup NOrule_{i}\).

(A case of \(\tau \in Rule_{i}\)). If \(\tau : \wedge _{j} [A_{j},val_{j}]\Rightarrow [Dec,val]\in Rule_{i}\), we do not deal with any redundant implication \(\tau ': (\wedge _{j} [A_{j},val_{j}])\wedge [B,b]\Rightarrow [Dec,val]\in IMP_{i+1}\), because \(\tau '\) cannot be a minimal rule.

(A case of \(\tau \in NOrule_{i}\)). If \(\tau : \wedge _{j} [A_{j},val_{j}]\Rightarrow [Dec,val]\in NOrule_{i}\), any redundant implication \(\tau ': (\wedge _{j} [A_{j},val_{j}])\wedge [B,b]\Rightarrow [Dec,val]\) satisfies \(support(\tau ')<\alpha \). So, \(\tau '\in IMP_{i+1}\) cannot be a rule. Thus, we do not deal with any redundant implication \(\tau '\).

(A case of \(\tau \in Rest_{i}\)). For the accuracy value, the monotonicity enjoyed by support does not hold (an example is in [20]). For instance, if \([A,1]\) matches two objects whose decisions are a and b, then \(accuracy([A,1]\Rightarrow [Dec,a])=1/2\), but adding a descriptor \([B,1]\) matching only the first object yields \(accuracy([A,1]\wedge [B,1]\Rightarrow [Dec,a])=1/1\). Thus, if \(\tau : \wedge _{j} [A_{j},val_{j}]\Rightarrow [Dec,val]\in Rest_{i}\), \(accuracy(\tau ')\ge \beta \) may hold for a redundant implication \(\tau ': (\wedge _{j} [A_{j},val_{j}])\wedge [B,b]\Rightarrow [Dec,val]\in FI_{i+1}\).

Proposition 2

Let us suppose that we obtained \(Rule_{i}\) and \(Rest_{i}\) (\(IMP_{i}=Rule_{i}\cup Rest_{i}\cup NOrule_{i}\)) in the i-th while loop in Fig. 5. Then, every candidate of a minimal rule in \(IMP_{i+1}\) is a redundant implication of some \(\tau \in Rest_{i}\).

(Proof)

For every implication \(\tau \in IMP_{i}\) with \(\tau \not \in FI_{i}\), each of its redundant implications \(\tau '\) satisfies \(support(\tau ')\le support(\tau )<\alpha \). Thus, \(\tau '\) cannot be a minimal rule in \(IMP_{i+1}\). Based on the Apriori algorithm, we need to combine two frequent itemsets in \(FI_{i}=Rule_{i}\cup Rest_{i}\) (an example of this combination is described in Sect. 2.4). However, by the minimality condition of rules, we do not handle any redundant implication of \(\tau \in Rule_{i}\). Thus, we conclude that every candidate of a minimal rule in \(IMP_{i+1}\) is a redundant implication of some \(\tau \in Rest_{i}\).

Definition 1

We define a set \(RCAN_{i}~(\subseteq CAN_{i})\), whose elements are the candidates of minimal rules in \(IMP_{i}\) w.r.t. the rules \(\cup _{j=1,\cdots ,(i-1)}Rule_{j}\), and a set \(RFI_{i}=\{\tau \in RCAN_{i}~|~support(\tau )\ge \alpha \}\) \((\subseteq FI_{i}\subseteq IMP_{i})\).

In the Apriori algorithm, the concept of redundancy is not introduced, so some redundant rules may be generated. The sets \(CAN_{i}\) and \(FI_{i}\) in Fig. 4 are generated from \(FI_{i-1}\) (=\(Rule_{i-1}\cup Rest_{i-1}\)). However, we can generate \(RCAN_{i}\) (\(\subseteq CAN_{i}\)) and \(RFI_{i}\) (\(\subseteq FI_{i}\)) from \(Rest_{i-1}\). Furthermore, our previous system generated itemsets \(\{[A,a],[B,b],[Dec,v1]\},\{[A,a],[B,b],[Dec,v2]\}\in RCAN_{2}\) from \(\{[A,a],[Dec,v1]\}, \{[B,b],[Dec,v2]\}\in Rest_{1}\); we now remove this combination, because no object satisfies both \([Dec,v1]\) and \([Dec,v2]\), and this combination formerly generated meaningless itemsets. This revision is another improvement in the manipulation of itemsets.

Proposition 3

The sets \(RCAN_{i}\) and \(RFI_{i}\) are given as follows:

$$\begin{aligned}&(i=1)~~RCAN_{1}=CAN_{1} \text{ and } RFI_{1}=FI_{1}, \\&(i\ge 2)~~RCAN_{i}=\{\tau : (\wedge _{j} [A_{j},val_{j}])\wedge [B,b]\Rightarrow [Dec,val]~|~\\&\qquad \wedge _{j} [A_{j},val_{j}]\Rightarrow [Dec,val]\in Rest_{i-1},~[B,b]\Rightarrow [Dec,val]\in Rest_{1}\}, \\&\qquad RFI_{i}=\{\tau \in RCAN_{i}~|~support(\tau )\ge \alpha \}. \end{aligned}$$
Fig. 6. The new manipulation II of itemsets. We can handle \(RCAN_{i}\subseteq CAN_{i}\) and \(RFI_{i}\subseteq FI_{i}\) for generating minimal rules. In the Apriori algorithm, \(CAN_{i}\) and \(FI_{i}\) are employed, so redundant rules may be generated. By using \(RCAN_{i}\) and \(RFI_{i}\), the candidates of rules are reduced, and the performance of execution is improved.

(Proof)

(In case of i = 1) \(RCAN_{1}=CAN_{1}\) and \(RFI_{1}=FI_{1}\) hold, because redundant implications occur only from the 2nd while loop onward.

(In case of \(i\ge 2\)) We add one descriptor \([B,b]\) to \(\wedge _{j} [A_{j},val_{j}]\Rightarrow [Dec,val]\in Rest_{i-1}\) and obtain a redundant implication \(\tau : (\wedge _{j} [A_{j},val_{j}])\wedge [B,b]\Rightarrow [Dec,val]\in IMP_{i}\) due to Proposition 2.

(1) In order to handle the same decision, \([B,b]\) must be the condition part of some \(\tau ': [B,b]\Rightarrow [Dec,val]\in RFI_{1}=FI_{1}\). (If \(\tau '\not \in FI_{1}\), then \(support(\tau )<\alpha \) holds and \(\tau \) cannot be a rule, because \(\tau \) is a redundant implication of \(\tau '\).)
(2) \(FI_{1}=Rule_{1}\cup Rest_{1}\) holds. If \(\tau '\in Rule_{1}\), \(\tau \) cannot be a minimal rule, because \(\tau '\) is a minimal rule.

Based on the above discussion, we conclude \(\tau '\in Rest_{1}\).

We propose the manipulation II in Fig. 6 based on the above propositions. In the Apriori algorithm, \(CAN_{i}\) is generated from \(FI_{i-1}\), but we can remove the redundant implications of \(\tau \in Rule_{i-1}\). Thus, we can handle \(RCAN_{i}\), which is a subset of \(CAN_{i}\). If the number of elements in \(Rule_{i-1}\) is large, the number of elements in \(RCAN_{i}\) will be much smaller than that of \(CAN_{i}\).
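The change to the DIS-Apriori sketch of Sect. 3.2 is small: \(CAN_{i}\) generated from \(FI_{i-1}\) is replaced by \(RCAN_{i}\) generated from \(Rest_{i-1}\) and \(Rest_{1}\), following Proposition 3. A hedged sketch of this candidate generation, under the same data conventions as before:

```python
def rcan_next(rest_prev, rest1):
    """RCAN_i per Proposition 3: extend each implication of Rest_{i-1}
    with one descriptor [B,b] such that [B,b] => [Dec,val] is in Rest_1
    (same decision val, fresh attribute B)."""
    return {(tuple(sorted(conds + ((b, vb),))), val)
            for conds, val in rest_prev
            for ((b, vb),), v1 in rest1
            if v1 == val and b not in {a for a, _ in conds}}
```

In the earlier sketch, keeping \(Rest_{1}\) from the first loop and setting `can = rcan_next(rest_i, rest1)` realizes the manipulation II; the final minimality filter is retained, but it now works over far fewer candidates.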

Proposition 4

The DIS-Apriori algorithm with the manipulation II is sound and complete for minimal rules in DIS, and the NIS-Apriori algorithm with the manipulation II is also sound and complete for minimal certain rules and minimal possible rules in NIS. They do not miss any rule defined in DIS \(\psi \) or NIS \(\varPhi \).

(Sketch of Proof). We proved that the DIS-Apriori and NIS-Apriori algorithms are sound and complete [20, 21]. We newly introduced the sets \(RCAN_{i}\subseteq CAN_{i}\) and \(RFI_{i}\subseteq FI_{i}\) by using the redundancy of rules, and we extended the previous two algorithms to those with the manipulation II. The proposed algorithm does not examine each \(\tau \in \cup _{j}IMP_{j}\), but examines each \(\tau \in \cup _{j}RCAN_{j}\). By Propositions 2 and 3, no minimal rule is excluded, so this algorithm generates the same rules as the procedure ‘examine each \(\tau \in \cup _{j}IMP_{j}\)’.

Table 6. The Car Evaluation data set (Objects: 1728, condition attributes: 6). A:\(|Rule_{1}|\), B:\(|CAN_{2}|\) or \(|RCAN_{2}|\), C:\(|Rule_{2}|\), D:\(|CAN_{3}|\) or \(|RCAN_{3}|\), E:\(|Rule_{3}|\), F:\(|CAN_{4}|\) or \(|RCAN_{4}|\), G:\(|Rule_{4}|\).
Table 7. The Phishing data set (Objects: 1353, condition attributes: 9). Here, A, B, \(\cdots \), G are the same as Table 6.
Table 8. The Congressional Voting data set (Objects: 435, condition attributes: 16). There are 392 missing values, thus \(|DD(\varPhi )|\) = \(2^{392}\ge 10^{100}\) (the number of derived DISs exceeds \(10^{100}\)). A certain rule is a rule in each of more than \(10^{100}\) derived DISs. A possible rule is a rule in at least one derived DIS. Here, A, B, \(\cdots \), G are the same as in Table 6.
Table 9. The Lithology data set (Objects: 1923, condition attributes: 10). There are 519 missing values, therefore there are more than \(10^{100}\) (\(2^{519}\fallingdotseq (2^{10})^{50}> (10^{3})^{50}>10^{100}\)) derived DISs. Here, A, B, \(\cdots \), G are the same as Table 6.

4 An Improved Apriori Based Rule Generator and Some Experiments

This section compares the NIS-Apriori algorithm and the NIS-Apriori algorithm with the manipulation II. Of course, the two algorithms generate the same rules due to Propositions 1 and 4, and the latter makes use of the redundancy concept. We newly implemented the two systems in Python (Windows PC, CPU: Intel i7-4600U, 2.7 GHz). Table 6 shows the results on the Car Evaluation data set [4], and Table 7 shows the results on the Phishing data set [4]. They are cases of DISs, and the property \(RCAN_{i}\subseteq CAN_{i}\) is effectively employed.

Now, we show two examples by the NIS-Apriori algorithm: one is the Congressional Voting data set [4], and the other is the Lithology data set [8]. As described in Proposition 1, the NIS-Apriori algorithm (certain rule generation) is the DIS-Apriori algorithm with the criterion values minsupp and minacc. Thus, the number of candidate itemsets is also reduced by the manipulation II. The experiments clearly show the improvement achieved by the manipulation II (Tables 8 and 9).

5 Concluding Remarks

We recently adjusted the Apriori algorithm to table data sets and proposed the DIS-Apriori and NIS-Apriori algorithms. This paper made use of the characteristics of table data sets (one fixed decision attribute Dec) and improved these algorithms. If we did not handle table data sets, there would be no necessity for considering Fig. 6. The framework of the manipulation II (Fig. 6) improves Apriori based rule generation by using the characteristics of table data sets. We can generate minimal rules by using \(RCAN_{i}\subseteq CAN_{i}\) and \(RFI_{i}\subseteq FI_{i}\), which reduces the candidate itemsets. We newly implemented the proposed algorithm in Python and examined the improvement of the performance of execution by experiments.