1 Introduction

Association rule mining is one of the most important and well-researched techniques of data mining. It was first introduced by Agrawal et al. [1], who presented the well-known Apriori algorithm in 1993. Since then, many methods have been proposed to improve and optimize the Apriori algorithm, such as binary coding techniques, genetic algorithms, and matrix-based algorithms [2, 3]. A matrix-based algorithm scans the database only once to convert the transactions into a matrix; the matrix can then be reordered by item support count in non-descending order to reduce the number of candidate itemsets, which greatly improves the time and space efficiency of the Apriori algorithm.

A great deal of work has been done on matrix-based Apriori algorithms [4, 5]. In this chapter, a new improvement of the Apriori algorithm based on a compression matrix is proposed, which achieves better performance.

2 Preliminaries

Some basic preliminaries used in association rule mining are introduced in this section. Let T = {T_1, T_2, ⋯, T_n} be a database of transactions, where T_k (k = 1, 2, ⋯, n) denotes a transaction. Let I = {I_1, I_2, ⋯, I_m} be a set of binary attributes, called items, where I_k (k = 1, 2, ⋯, m) denotes an item. Each transaction T_k in T contains a subset of the items in I. The number of items contained in T_k is called the length of transaction T_k, denoted |T_k|.

An association rule is defined as an implication of the form X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅. The support of the association rule X ⇒ Y is the support (frequency) of the itemset X ∪ Y. If the support of an itemset X is greater than or equal to a user-specified minimum support threshold (min-sup), then X is called a frequent itemset.

In association rule mining, the frequent itemsets are found first, and the association rules are then derived from them. The key step of association rule mining is therefore finding the frequent itemsets. Some properties of frequent itemsets are given as follows:

Property 1 [1]

Every nonempty subset of a frequent itemset is also a frequent itemset.

By the definition of frequent k-itemset, the conclusion below is easily obtained.

Property 2

If the length |T_i| of a transaction T_i is less than k, then T_i is useless for generating frequent k-itemsets.

3 An Improvement on Apriori Algorithm Based on Compression Matrix

A new improvement on the Apriori algorithm based on a compression matrix is introduced. The process of the new algorithm is described as follows:

  1. Generate the transaction matrix.

For a given database with n transactions and m items, the m × n transaction matrix D = (d_ij) is constructed, in which d_ij is set to 1 if item I_i is contained in transaction T_j and to 0 otherwise.

$$ D=\begin{array}{cc} & \begin{array}{cccc} T_1 & T_2 & \cdots & T_n \end{array} \\ \begin{array}{c} I_1 \\ I_2 \\ \vdots \\ I_m \end{array} & \left(\begin{array}{cccc} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \vdots & \vdots & & \vdots \\ d_{m1} & d_{m2} & \cdots & d_{mn} \end{array}\right) \end{array} $$

where \( d_{ij}=\left\{\begin{array}{ll}1, & I_i\in T_j\\ 0, & I_i\notin T_j\end{array}\right. \), i = 1, 2, ⋯, m; j = 1, 2, ⋯, n.

For each item I_k and each transaction T_j, define the row sum \( v_k=\sum_{j=1}^n d_{kj} \), k = 1, 2, ⋯, m, and the column sum \( h_j=\sum_{i=1}^m d_{ij} \), j = 1, 2, ⋯, n; that is, v_k is the support count of item I_k and h_j is the length |T_j| of transaction T_j.
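
As a minimal illustration of this step, the following Python sketch builds the binary transaction matrix and the two families of sums; the toy database and all variable names are assumptions made for the sketch, not taken from the chapter.

```python
import numpy as np

# Hypothetical toy database: each transaction is the set of item numbers it contains.
transactions = [{1, 3, 8}, {1, 3, 4, 5}, {2, 3, 6}, {1, 3, 4, 5, 8}]
m = 8                      # number of items
n = len(transactions)      # number of transactions

# m x n binary transaction matrix: D[i, j] = 1 iff item I_(i+1) occurs in T_(j+1).
D = np.zeros((m, n), dtype=int)
for j, t in enumerate(transactions):
    for i in t:
        D[i - 1, j] = 1

v = D.sum(axis=1)          # v_k: support count of item I_k (row sums)
h = D.sum(axis=0)          # h_j: length |T_j| of transaction T_j (column sums)
```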

  2. Produce the frequent 1-itemset L_1 and the frequent 2-itemset support matrix D_1.

The frequent 1-itemset is L_1 = {I_k | v_k ≥ min-sup}.
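
Selecting the frequent 1-itemset is a single thresholding pass over the row sums; in the small sketch below the values of v and min_sup are illustrative only, not taken from the chapter's tables.

```python
import numpy as np

min_sup = 2                                  # assumed user-specified threshold
v = np.array([3, 1, 4, 2])                   # illustrative row sums v_k for four items
L1 = [k + 1 for k in np.flatnonzero(v >= min_sup)]
print(L1)                                    # [1, 3, 4] -> the frequent 1-itemset L_1
```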

3.1 Matrix Compression Procedure

In order to reduce the storage space and computation cost, useless rows and columns are identified and removed in the “matrix compression procedure”, which is reused repeatedly in the subsequent steps. Useless rows and columns fall into two classes, so the compression procedure consists of two steps:

  (i) A row I_k is considered useless when the corresponding v_k is less than min-sup; a column T_j is considered useless when the corresponding h_j is less than 2, according to Property 2. These rows and columns are dropped one by one, and v_k and h_j are updated immediately after each drop operation. Step (i) is repeated until no such row or column remains.

  (ii) The second class of useless rows is now handled, and their frequent itemsets are stored for use in the next procedure. Every row I_l whose corresponding v_l is less than \( \left[\sqrt{n}\right] \) ([x] denotes the largest integer not greater than x) is removed after its frequent itemsets are computed as follows.

    Let min-sup = b. For such an item I_l, let S_l = {T_j | d_lj = 1} and let S_l′ be the set of b-combinations of the elements of S_l: \( S_l^{\prime}=\left\{\left(T_{j_1},T_{j_2},\dots,T_{j_b}\right)\;\middle|\;T_{j_1},T_{j_2},\dots,T_{j_b}\in S_l\right\} \). Each b-tuple \( \left(T_{j_1},T_{j_2},\dots,T_{j_b}\right) \) in S_l′ is scanned in turn; if there exist items \( I_{l_1},I_{l_2},\cdots,I_{l_k} \) other than I_l such that \( d_{l_i j_1}=d_{l_i j_2}=\dots=d_{l_i j_b}=1 \) (i = 1, 2, ⋯, k), then the collection \( \left(I_{l_1},I_{l_2},\cdots,I_{l_k},I_l\right) \) is a frequent itemset containing I_l. All the frequent itemsets containing I_l are obtained by handling every b-tuple of S_l′ in this way (a Python sketch of both compression steps is given below, after the matrix D_1). After steps (i) and (ii) are repeated until no useless row or column remains in the compressed matrix of D, the frequent 2-itemset support matrix D_1 is produced:

    $$ D_1=\begin{array}{cc} & \begin{array}{cccc} T_{j_1} & T_{j_2} & \cdots & T_{j_q} \end{array} \\ \begin{array}{c} I_{i_1} \\ I_{i_2} \\ \vdots \\ I_{i_p} \end{array} & \left(\begin{array}{cccc} d_{i_1 j_1} & d_{i_1 j_2} & \cdots & d_{i_1 j_q} \\ d_{i_2 j_1} & d_{i_2 j_2} & \cdots & d_{i_2 j_q} \\ \vdots & \vdots & & \vdots \\ d_{i_p j_1} & d_{i_p j_2} & \cdots & d_{i_p j_q} \end{array}\right) \end{array} $$

    where 1 ≤ i_1 < i_2 < ⋯ < i_p ≤ m and 1 ≤ j_1 < j_2 < ⋯ < j_q ≤ n.
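
The following sketch shows one possible reading of the two compression steps, assuming the matrix is kept as a NumPy 0/1 array; the function names are hypothetical, and step (i) batches the removals per pass rather than dropping strictly one row or column at a time.

```python
import numpy as np
from itertools import combinations

def compress_step_i(D, item_ids, trans_ids, min_sup, min_len):
    """Step (i): repeatedly drop rows whose support count is below min_sup and
    columns whose length is below min_len, recomputing the sums after each pass,
    until nothing more can be removed."""
    changed = True
    while changed:
        changed = False
        keep_rows = D.sum(axis=1) >= min_sup          # v_k >= min-sup
        if not keep_rows.all():
            D = D[keep_rows]
            item_ids = [x for x, k in zip(item_ids, keep_rows) if k]
            changed = True
        keep_cols = D.sum(axis=0) >= min_len          # h_j >= 2 for this stage
        if not keep_cols.all():
            D = D[:, keep_cols]
            trans_ids = [x for x, k in zip(trans_ids, keep_cols) if k]
            changed = True
    return D, item_ids, trans_ids

def itemsets_containing(D, item_ids, row, min_sup):
    """Step (ii): for a low-support row `row`, enumerate the min_sup-combinations
    of its supporting columns (S_l') and, for each, collect every other item that
    is present in all of the chosen transactions."""
    support_cols = [j for j in range(D.shape[1]) if D[row, j] == 1]   # S_l
    found = set()
    for cols in combinations(support_cols, min_sup):                  # b-tuples of S_l'
        present = np.flatnonzero(D[:, list(cols)].all(axis=1))        # items with 1 in every chosen column
        others = [item_ids[i] for i in present if i != row]
        if others:
            found.add(tuple(sorted(others + [item_ids[row]])))
    return found
```

In the chapter's procedure the two steps alternate until nothing more can be removed, and the itemsets harvested in step (ii) are kept for assembling L_2 later.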

  3. Produce the frequent 2-itemset L_2 and the frequent 3-itemset support matrix D_2.

The frequent 2-itemset L_2 is the union of the 2-itemset subsets of the frequent itemsets produced in step (ii) of procedure (2) and the set L_2′ determined by comparing the inner product of each pair of row vectors of matrix D_1 with min-sup:

$$ L_2^{\prime }=\left\{\left(I_{i_h},I_{i_r}\right)\;\middle|\;\sum_{k=1}^q d_{i_h j_k} d_{i_r j_k}\ge \min \hbox{-} \sup,\; h<r,\; h,r=1,2,\cdots,p\right\}. $$

The matrix D_2′ is obtained by applying the “and” operation to the two corresponding row vectors of every element \( \left(I_{i_h},I_{i_r}\right) \) in L_2′, that is:

$$ D_2^{\prime }=\begin{array}{cc} & \begin{array}{cccc} T_{j_1} & T_{j_2} & \cdots & T_{j_q} \end{array} \\ \begin{array}{c} \left(I_{i_{h_1}},I_{i_{r_1}}\right) \\ \left(I_{i_{h_1}},I_{i_{r_2}}\right) \\ \vdots \\ \left(I_{i_{h_s}},I_{i_{r_t}}\right) \end{array} & \left(\begin{array}{cccc} d_{i_{h_1}j_1}d_{i_{r_1}j_1} & d_{i_{h_1}j_2}d_{i_{r_1}j_2} & \cdots & d_{i_{h_1}j_q}d_{i_{r_1}j_q} \\ d_{i_{h_1}j_1}d_{i_{r_2}j_1} & d_{i_{h_1}j_2}d_{i_{r_2}j_2} & \cdots & d_{i_{h_1}j_q}d_{i_{r_2}j_q} \\ \vdots & \vdots & & \vdots \\ d_{i_{h_s}j_1}d_{i_{r_t}j_1} & d_{i_{h_s}j_2}d_{i_{r_t}j_2} & \cdots & d_{i_{h_s}j_q}d_{i_{r_t}j_q} \end{array}\right) \end{array} $$

where 1 ≤ h_1 < h_2 < ⋯ < h_s ≤ p, 1 ≤ r_1 < r_2 < ⋯ < r_t ≤ p, and n_1 denotes the number of rows of matrix D_2′.
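
A sketch of this pairing step, under the assumption that D1 is the compressed 0/1 matrix whose rows correspond to item_ids; the function name and return convention are illustrative.

```python
import numpy as np

def pairwise_frequent(D1, item_ids, min_sup):
    """Form L_2' and D_2': keep every pair of rows of D1 whose inner product
    reaches min_sup, together with the elementwise AND of the two rows."""
    pairs, rows = [], []
    p = D1.shape[0]
    for a in range(p):
        for b in range(a + 1, p):
            joint = D1[a] * D1[b]              # AND of two 0/1 row vectors
            if joint.sum() >= min_sup:         # inner product of the two rows
                pairs.append((item_ids[a], item_ids[b]))
                rows.append(joint)
    D2_prime = np.array(rows) if rows else np.empty((0, D1.shape[1]), dtype=int)
    return pairs, D2_prime
```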

  (i) Rows and columns of D_2′ are removed using the same approach as in step (i) of procedure (2), except that a column \( T_{j_k} \) is now considered useless when \( h_{j_k}<2 \), i.e., when its length is less than 3 according to Property 2; these columns are dropped, v_k and h_j are updated immediately, and the rows whose corresponding v_k is less than min-sup are dropped as well. This step is repeated until no such row or column remains.

  (ii) Similarly to step (ii) of procedure (2), every row \( \left(I_{i_s},I_{i_t}\right) \) whose corresponding row sum is less than \( \left[\sqrt{n_1}\right] \) is removed after its frequent itemsets have been found and stored.

Then the frequent 3-itemset support matrix D_2 is produced by repeating the matrix compression steps (i) and (ii) until no more useless row or column can be found. That is,

$$ D_2=\begin{array}{cc} & \begin{array}{cccc} T_{j_{p_1}} & T_{j_{p_2}} & \cdots & T_{j_{p_w}} \end{array} \\ \begin{array}{c} \left(I_{i_{h_{s_1}}},I_{i_{r_{t_1}}}\right) \\ \left(I_{i_{h_{s_1}}},I_{i_{r_{t_2}}}\right) \\ \vdots \\ \left(I_{i_{h_{s_u}}},I_{i_{r_{t_v}}}\right) \end{array} & \left(\begin{array}{cccc} d_{i_{h_{s_1}}j_{p_1}}d_{i_{r_{t_1}}j_{p_1}} & d_{i_{h_{s_1}}j_{p_2}}d_{i_{r_{t_1}}j_{p_2}} & \cdots & d_{i_{h_{s_1}}j_{p_w}}d_{i_{r_{t_1}}j_{p_w}} \\ d_{i_{h_{s_1}}j_{p_1}}d_{i_{r_{t_2}}j_{p_1}} & d_{i_{h_{s_1}}j_{p_2}}d_{i_{r_{t_2}}j_{p_2}} & \cdots & d_{i_{h_{s_1}}j_{p_w}}d_{i_{r_{t_2}}j_{p_w}} \\ \vdots & \vdots & & \vdots \\ d_{i_{h_{s_u}}j_{p_1}}d_{i_{r_{t_v}}j_{p_1}} & d_{i_{h_{s_u}}j_{p_2}}d_{i_{r_{t_v}}j_{p_2}} & \cdots & d_{i_{h_{s_u}}j_{p_w}}d_{i_{r_{t_v}}j_{p_w}} \end{array}\right) \end{array} $$

where \( {j}_1\le {j}_{p_1}<{j}_{p_2}<\cdots <{j}_{p_w}\le {j}_q \), \( \left({I}_{i_{h_{s_y}}},{I}_{i_{r_{t_z}}}\right)\in \left\{\left({I}_{i_{h_m}},{I}_{i_{r_n}}\right)\Big|m=1,2,\cdots, s;n=1,2,\cdots, t\right\} \).

Let \( {L}_2^{{\prime\prime} }=\left\{\left({I}_{i_{h_{s_y}}},{I}_{i_{r_{t_z}}}\right)\right\} \) be the compressed frequent 2-itemset of D 2.

  4. Produce the frequent 3-itemset L_3 and the frequent 4-itemset support matrix D_3.

The frequent 3-itemset is the union of all 3-itemset subsets of the frequent itemsets generated in step (ii) of procedures (2) and (3), and the set \( \left\{\left(I_{i_{h_{s_m}}},I_{i_{r_{t_n}}},I_{i_{r_{t_k}}}\right)\;\middle|\;\left(I_{i_{h_{s_m}}},I_{i_{r_{t_n}}}\right),\left(I_{i_{h_{s_m}}},I_{i_{r_{t_k}}}\right),\left(I_{i_{r_{t_n}}},I_{i_{r_{t_k}}}\right)\in L_2^{{\prime\prime}}\right. \) and the inner product of the corresponding row vectors of \( \left(I_{i_{h_{s_m}}},I_{i_{r_{t_n}}}\right) \) and \( \left(I_{i_{h_{s_m}}},I_{i_{r_{t_k}}}\right) \) in D_2 is not less than \( \left.\min \hbox{-} \sup\right\} \).

Similarly to the previous steps, the intermediate matrix D_3′ is produced by applying the “and” operation to the row vectors of D_2 that correspond to \( \left(I_{i_{h_{s_m}}},I_{i_{r_{t_n}}}\right) \) and \( \left(I_{i_{h_{s_m}}},I_{i_{r_{t_k}}}\right) \) in L_2′′, which are derived from the element \( \left(I_{i_{h_{s_m}}},I_{i_{r_{t_n}}},I_{i_{r_{t_k}}}\right) \) in L_3.

Here n_2 denotes the number of rows of matrix D_3′.
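
A sketch of this candidate-generation step, assuming the rows of D2 are indexed by the pairs in pair_ids (i.e., L_2′′, each pair stored with its items in sorted order); the names are illustrative.

```python
import numpy as np

def extend_to_triples(D2, pair_ids, min_sup):
    """Join pairs of L_2'' that share their first item.  A triple (a, b, c) is
    kept when (a, b), (a, c) and (b, c) all belong to L_2'' and the inner product
    of the rows of (a, b) and (a, c) in D2 reaches min_sup; the AND of those two
    rows becomes the corresponding row of D_3'."""
    index = {pair: i for i, pair in enumerate(pair_ids)}
    triples, rows = [], []
    for (a, b), i in index.items():
        for (a2, c), j in index.items():
            if a2 == a and b < c and (b, c) in index:
                joint = D2[i] * D2[j]          # AND of the two row vectors
                if joint.sum() >= min_sup:
                    triples.append((a, b, c))
                    rows.append(joint)
    D3_prime = np.array(rows) if rows else np.empty((0, D2.shape[1]), dtype=int)
    return triples, D3_prime
```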

  (i) Rows and columns of D_3′ are removed using the same approach as in step (i) of procedures (2) and (3), and the following step is then executed.

  (ii) When the row sum corresponding to \( \left(I_{i_{h_{s_m}}},I_{i_{r_{t_n}}},I_{i_{r_{t_k}}}\right) \) is less than or equal to \( \left[\sqrt{n_2}\right] \), the frequent itemsets containing the items \( \left(I_{i_{h_{s_m}}},I_{i_{r_{t_n}}},I_{i_{r_{t_k}}}\right) \) are found and stored by the same approach as in step (ii) of procedures (2) and (3), and the corresponding row is then removed. The matrix compression procedure is repeated until no more useless row or column can be found.

  5. Analogously, the frequent 4-itemset, …, and the frequent k-itemset are produced by repeating steps (2) to (5) until the frequent k-itemset support matrix D_k is empty (a simplified end-to-end sketch of this level-wise iteration is given below).
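
To show the level-wise iteration as a whole, here is a simplified, self-contained sketch that keeps the AND-of-row-vectors idea but omits the matrix compression and the \( \left[\sqrt{n}\right] \)-row harvesting described above; it illustrates the overall flow, not the proposed algorithm itself.

```python
import numpy as np
from itertools import combinations

def matrix_frequent_itemsets(transactions, n_items, min_sup):
    """Level-wise mining on the binary transaction matrix: every frequent
    k-itemset keeps the AND of its item rows, and candidates are formed by
    joining itemsets that share a (k-1)-prefix."""
    n = len(transactions)
    D = np.zeros((n_items, n), dtype=int)
    for j, t in enumerate(transactions):
        for i in t:
            D[i - 1, j] = 1                      # items are numbered 1..n_items

    level = {(i + 1,): D[i] for i in range(n_items) if D[i].sum() >= min_sup}
    frequent = dict(level)
    while level:
        nxt = {}
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:                 # join on the common (k-1)-prefix
                joint = level[a] * level[b]      # AND of the two row vectors
                if joint.sum() >= min_sup:
                    nxt[a + (b[-1],)] = joint
        frequent.update(nxt)
        level = nxt
    return {items: int(row.sum()) for items, row in frequent.items()}

# A tiny check: with min_sup = 2, the 2-itemset (1, 3) is frequent here.
print(matrix_frequent_itemsets([{1, 3}, {1, 3, 4}, {2, 4}], n_items=4, min_sup=2))
```

Because the support of every candidate is recomputed exactly from the AND of row vectors, the omitted pruning steps only affect efficiency, not the result.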

4 An Example of the Algorithm

Suppose that a transaction database is given as listed in Table 33.1 and that min-sup = 2.

Table 33.1 Transaction database
  (1) Generate the transaction matrix, and calculate the sum v_k of each row and the sum h_j of each column, as described in Table 33.2.

    Table 33.2 Transaction matrix
  (2) Produce the frequent 1-itemset: L_1 = {I_k | v_k ≥ 2} = {I_1, I_2, I_3, I_4, I_5, I_6, I_7, I_8, I_9, I_10}.

  (i) The column of T_5 is dropped first, since its sum is less than 2 (h_5 < 2). After updating the row sums, the row of I_7 is removed because its v_7 < 2. The column sums are then recalculated and the column of T_9 is removed accordingly. The resulting compressed matrix is shown in Table 33.3.

    Table 33.3 Compression matrix 1
  (ii) Because \( \left[\sqrt{n}\right]=\left[\sqrt{9}\right]=3 \) and the corresponding row sums of I_1, I_6, I_8, and I_9 are less than or equal to 3, all the frequent itemsets containing the items I_l (l = 1, 6, 8, 9) must be found, and the rows I_l (l = 1, 6, 8, 9) are then removed.

With min-sup = 2, the frequent itemsets containing I_8 are found first, since v_8 = 2. Here S_8 = {T_1, T_4} (the transactions with d_8j = 1), and the set of 2-combinations of the elements of S_8 is S_8′ = {(T_1, T_4)}. It is obvious that I_1 and I_3 are the rows whose entries in columns T_1 and T_4 are both 1, so (I_1 I_3 I_8) is the only frequent itemset containing I_8; thus (I_1 I_3 I_8) is stored and row I_8 is dropped. The item I_1 is considered next: S_1 = {T_1, T_2, T_4} and the set of 2-combinations of the elements of S_1 is S_1′ = {(T_1, T_2), (T_1, T_4), (T_2, T_4)}. The frequent itemsets containing I_1 are obtained by handling the three 2-tuples in S_1′ in turn. The collection (I_1 I_3) is the frequent itemset determined by (T_1, T_2), using the same approach as for I_8; similarly, (T_1, T_4) determines (I_1 I_3) and (T_2, T_4) determines (I_1 I_3 I_4 I_5). Hence all the frequent itemsets containing I_1 are (I_1 I_3) and (I_1 I_3 I_4 I_5). Scanning the remaining qualifying items in the same way, all the frequent itemsets containing the items I_l (l = 1, 6, 8, 9) are found: L_1′ = {(I_1 I_3 I_8), (I_1 I_3 I_4 I_5), (I_6 I_2 I_3), (I_9 I_2 I_6), (I_9 I_2 I_3)} (a small numerical check of the I_8 case is given after Table 33.4). After removing the rows I_l (l = 1, 6, 8, 9), the newly compressed matrix is shown in Table 33.4.

Table 33.4 Compression matrix 2
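
As a quick numerical check of the I_8 case above, the fragment of Table 33.3 that the text describes (only the rows I_1, I_3, I_4, I_5, I_8 and the columns T_1, T_2, T_4; the 0/1 values are reconstructed from the description, not copied from the original tables) can be fed to the step (ii) enumeration:

```python
import numpy as np
from itertools import combinations

# Fragment of Compression matrix 1 reconstructed from the text:
# rows I_1, I_3, I_4, I_5, I_8 restricted to the columns T_1, T_2, T_4.
item_ids = [1, 3, 4, 5, 8]
D = np.array([[1, 1, 1],    # I_1
              [1, 1, 1],    # I_3
              [0, 1, 1],    # I_4
              [0, 1, 1],    # I_5
              [1, 0, 1]])   # I_8

min_sup = 2
row = item_ids.index(8)                                       # handle I_8 first (v_8 = 2)
support_cols = [j for j in range(D.shape[1]) if D[row, j]]    # S_8 = {T_1, T_4}
found = set()
for cols in combinations(support_cols, min_sup):              # S_8' = {(T_1, T_4)}
    present = np.flatnonzero(D[:, list(cols)].all(axis=1))    # rows that are 1 in both columns
    others = [item_ids[i] for i in present if i != row]
    if others:
        found.add(tuple(sorted(others + [item_ids[row]])))
print(found)                                                  # {(1, 3, 8)} -> (I_1 I_3 I_8)
```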

The column of T_7 is dropped since its sum is less than 2 (h_7 < 2), and the row sums v_k are recalculated. The support matrix of the frequent 2-itemset is then as listed in Table 33.5. Checking each row and column again, no useless element remains; in other words, the support matrix in Table 33.5 is fully compressed.

Table 33.5 Support matrix of the frequent 2-itemset
  (3) The frequent 2-itemset L_2 is the union of the 2-itemset subsets produced by L_1′ in step (ii) of procedure (2) and the set L_2′ obtained from the support matrix in Table 33.5:

$$ L_2^{\prime }=\left\{\left(I_i,I_j\right)\;\middle|\;\sum_{k\in \left\{1,2,3,4,6,8\right\}} d_{ik}d_{jk}\ge 2,\; i<j,\; i,j=2,3,4,5,10\right\}=\left\{\left(I_2,I_3\right),\left(I_2,I_4\right),\left(I_2,I_5\right),\left(I_3,I_4\right),\left(I_3,I_5\right),\left(I_4,I_5\right)\right\}. $$

That is, L_2 = {(I_1 I_3), (I_1 I_8), (I_3 I_8), (I_1 I_4), (I_1 I_5), (I_3 I_4), (I_4 I_5), (I_3 I_5), (I_2 I_3), (I_2 I_4), (I_2 I_5), (I_2 I_6), (I_6 I_3), (I_9 I_2), (I_9 I_3), (I_9 I_6)}.

Subsequently, the uncompressed support matrix of the frequent 3-itemset is constructed as listed in Table 33.6.

Table 33.6 Uncompressed support matrix of the frequent 3-itemset

First, the column of T_1 is removed, since its column sum is less than 2 (h_1 < 2). Here n_1 = 6 and \( \left[\sqrt{n_1}\right]=\left[\sqrt{6}\right]=2 \). Second, because the corresponding row sums of (I_2 I_4) and (I_2 I_5) are equal to 2, all the frequent itemsets containing (I_2 I_4) or (I_2 I_5) are worked out, L_3′ = {(I_2 I_3 I_4), (I_2 I_3 I_5)}, and the corresponding rows are dropped. The result is Table 33.7.

Table 33.7 Support matrix of the frequent 3-itemset
  (4) Produce the frequent 3-itemset.

A frequent 3-itemset (I_3 I_4 I_5) is obtained from Table 33.7. The frequent 3-itemset is therefore L_3 = {(I_3 I_4 I_5)} ∪ {(I_1 I_3 I_8), (I_1 I_3 I_4), (I_1 I_3 I_5), (I_1 I_4 I_5), (I_3 I_4 I_5), (I_9 I_2 I_3), (I_9 I_2 I_6), (I_6 I_2 I_3)} (the 3-itemset subsets of L_1′) ∪ {(I_2 I_3 I_4), (I_2 I_3 I_5)} (from L_3′).

  (5) Produce a frequent 4-itemset from (I_1 I_3 I_4 I_5) in L_1′. Since no further frequent 4-itemset can be found from Table 33.7, the algorithm terminates.

5 Conclusion

A matrix-based association rule mining algorithm can discover all the frequent itemsets by scanning the database only once; it does not generate candidate itemsets but produces the frequent itemsets directly, which makes it more efficient. Many researchers have done a great deal of work in this area. In this chapter, a new matrix-based algorithm for generating association rules has been proposed. It compresses the transaction matrix efficiently by integrating several strategies and achieves better performance than the known matrix-based algorithms. New strategies for compressing the transaction matrix are worthy of further research.