Keywords

1 Introduction

During the past decade, the entire business scenario has been tremendously changed, because of rapid change in the business policies and the organization standards. The ultimate change of their objectivity is to retain customers and the business. The innovations and globalization have generated great opportunities in business and choices in the universal marketplace for firms and customers. The organizations began understanding their customers and their interest to fulfill the requirements of them. As a part of understanding the customers entirely, the organizations are collecting various details of the customers for analysis purpose. The collected data is primarily stored in the fashion of a database, and it allocates a unique identification number to the customer. This kind of primary information is useful to the organizations in maintaining the relationship with the customers. In Data Mining Association rule learning is a popular and well researched technique for discovering interesting relations between variables in large databases. Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence[1][2][3]. Association rules can be extracted using two familiarized algorithms named as Apriori algorithm and FP-Growth algorithm[15][16][17]. The FP-Growth algorithm is completely depends on fp-tree[4]. The previous fp-tree node is labeled only with its current support count, which consumes more time while traversing to extract association rule. In this paper we are more concentrated on design and implementation of novel algorithms for frequent pattern trees. By using the proposed algorithms the traversal time is reduced. In this paper we present six efficient algorithms.

  1. A.

    Frequent Item or Frequent Item set

    An item is said to be a frequent, if it is purchased by minimum number of customers. In other words in any transactional database DB contains any item repeated occurred then the particular item is frequent item. If set of items are repeatedly occurred then that set is called a frequent item set. Generally, these frequent items can be determined using the statistical approach mode. If the data set has only one mode then that is unimodality, if it has two modes the bimodality, if it has three modes then it is called trimodality.

  2. B.

    Association Rule :

    Let D be the database of transactions and ITEMSET = {I1,I2,I3,.. …, In} be the set of items. TR is a transaction includes one or more items in ITEMSET (i.e., TR ⊆ ITEMSET). An association rule has the form P ⇒ Q, where P and Q are non-empty sets of items that is P ⊆ ITEMSET, Q ⊆ ITEMSET such that P ∩ Q = Ø.

2 Literature Review

  1. A.

    Association Rule Mining(ARM) :

    Agarwal et al.(1993)[1] reported frequent item set mining and association rule mining first time. One of the first algorithms proposed for association rule mining was the AIS algorithm. The Apriori algorithm employs the downward closure property that is if an item set is not frequent any superset of it cannot be a frequent. FP-Growth was developed by Grahne & Zhu and it uses an extra array-based structure to decrease the number of traversals of the tree that are required during the analysis. This saves time during general traversal of the tree and also enables direct initialization of the next level of the FP-Tree(s) [A. Ceglar & J. F. Roddick, 2006; G. Grahne & J. Zhu, 2003]. Another FP-Growth based approach is TD-FP-Growth proposed by Wang et. al., and is a top-down variation to the base FP-Growth approach. This approach is said to alleviate the need or demand to generate/build conditional pattern bases and physical projections of the trie [A. Ceglar & J. F. Roddick, 2006; K. Wang et. al., 2002[5][6][7][8][9]].

  2. B.

    Trees :

    James Joseph Sylvester introduced the term graph in 1878[10]. Arthur Cayley conducted a study on a particular type of graphs which are called Trees. The study causes several implications in theoretical chemistry. These results were published by George Polya in 1935. Denes Konig wrote the first text book on the graph theory in 1936. In 1962, AVL Trees was invented by G.M.Adelson, Velskii and E.M.Landis[11]. Heinrich Heesch, in 1969 published a computerised problem solving method. Rudolf Bayer and Edward.M first described the B Trees in 1972.

3 Research Objective

The main objective of the research is to reduce the tree traversal time of a frequent pattern tree. Design and implementation of novel algorithms for new frequent pattern tree with new naming approach. To prove how the novel algorithms are effectively working with an example.

4 Proposed Algorithms

  1. A.

    Algorithm for fp-tree construction:

Algorithm : Novel Fp_tree construction

Input   : Trans_list database DB

Output  : Frequent pattern tree

  1. 1.

    Read Transaction_list into Trans_list from database DB.

  2. 2.

        Let Freq_items be the set of frequent items based on threshold value in Trans_list.

  3. 3.

        Sort Freq_items on descending order of support_count and let the result be F_Sort_List.

  4. 4.

        Create a root node of an FP_Tree T for Trans_list and label it as ‘NULL’.

 For each transaction Trans  in Trans_list do the following

  1. a)

    Select the frequent items from Trans and the result be F_items.

  2. b)

    Sort F_items according to the order of F_Sort_List and the result be  h|R, where h is the first (head) element and R is the remaining elements of the list.

Call insert_tree(h|R,T).

Figure: 1
figure 1

Flow chart for Novel Fp_tree construction Algorithm

  1. B.

    Proposed Algorithm for insert_tree with Two Level  Node Labeling

Algorithm : insert_tree with Two Level  Node Labeling

Input   : each transaction and Tree

Output  : Frequent pattern tree with two level node labels

  1. 1.

    If T has a child N such that N.item-name = h.item-name, then increment N ’s count by 1;

  2. 2.

    else

    {

      create a new node N  

      N’s count initialized to 1,

    }

  1. 3.

    N’s pred data = T’s present data

  2. 4.

    N’s parent link linked to T

  3. 5.

    N’s node-link linked to the nodes with the same item-name via the node-link structure.

  4. 6.

    If P is nonempty, then call insert tree(R, N ) recursively.

    Figure: 2
    figure 2

    Flow chart for insert_tree with Two Level Node Labeling Algorithm

  5. C.

    Novel Algorithm to extract n frequent items from FP_growth tree

Algorithm : NovelFP-growth

Input   : Tree, Key element a and n.

Output   : n frequent patterns associated with the given key element

Procedure NovelFP-growth(Tree, a,n)

{

  //a is the key item

 //n indicates the no of items associated with a particular item.

Step 1: call Traversal_SPP_and_MPP(Tree, a,n)

Step 2: return(freq pattern set(SPP) ∪ freq pattern

set(MPPset) ∪ (freq pattern set(SPP) × freq pattern set(MPPset)))

}

Figure: 3
figure 3

Flow chart for NovelFP-growth Algorithm

  1. D.

    Proposed Algorithm NovelFP-growthSpp

Note  : X represents a pattern in single predecessor path and Y represents a pattern in Multi predecessor path.

Algorithm : NovelFPgrowth_to_traverse_SinglePredecessorPath

Input   : Tree, Key element a and n.

Output   : Traverses in single predecessor path and returns the patterns associated with the key

NovelFP-growthSpp(Tree, a,n)

{

Step 1: let SPP be the Single Predecessors path of n-1 nodes part of Tree;

Step 2: for each combination (denoted as X) of the nodes in the path SPP do

Step 3: generate pattern X ∪ a with support = minimum support of nodes in X;

Step 4: let freq pattern set(SPPSet) be the set of patterns so generated;

}

Figure: 4
figure 4

Flow chart for NovelFP- growth_to_traverse _SinglePredecessorPath

  1. E.

    Proposed Algorithm NovelFP-growthMpp

Algorithm  : NovelFP-growth_to_traverse_MultiPredecessorPath

Input    : Tree, Key element a and n.

Output   : Traverses in multi predecessor path and returns the patterns associated with the key

NovelFP-growthMpp(Tree, a,n)

{

  1. 1.

    let MPP be the Multi Predecessors Path part of Tree with i=1;

  2. 2.

    for each item ai in MPP with i <= n-1 do

    {

  3. 3.

    generate pattern Y = ai ∪ a with support = ai.support;

  4. 4.

    i=i+1

    }

  5. 5.

    construct Y’s conditional pattern-base and then Y’s conditional FP-tree  Tree Y;

  6. 6.

    if Tree Y ≠  Ø then

  7. 7.

    call NovelFP-growth( Y, a, n);

  8. 8.

    let freq pattern set(MPPSet) be the set of patterns so generated;

}

Figure: 5
figure 5

Flow chart for NovelFP-growth_to_traverse_MultiPredecessorPath Algorithm

  1. F.

    Proposed Algorithm Traversal_SPP_and_MPP

Algorithm : Traversal_SPP_and_MPP

Input   : Tree, Key element a and n.

Output   : Traverses either in single predecessor path or in multi predecessor path and returns the patterns associated with the key

 //Freq pattern set SPPSet and MPPSet are global variables.

 Procedure Traversal_SPP_and_MPP(Tree, a,n)

 {

//n indicates the no of items associated with a particular item.

Step 1: if Tree contains a single Predecessors path then

   Call NovelFP-growthSpp(Tree, a,n)

  else

   Call NovelFP-growthMpp(Tree, a,n)

  Step 2: return(freq pattern set(SPPSet) ∪ freq pattern set(MPPset)

  ∪ (freq pattern set(SPP) × freq pattern set(MPPset)))

}

Figure: 6
figure 6

Flow chart for Traversal_SPP_and_MPP Algorithm

5 Performance and results

Theorem :

The time complexity of The Two-level node labeling algorithm(s) is O(n) with reduction of one unit of time in worst case also.

Proof :

The proposed Two Level Node Labeling algorithm constructs a frequent pattern tree which is similar to binary tree. Hence, the time complexity of traversal of frequent pattern tree, which is formed with the proposed algorithm, is similar to the traversal complexity of a binary tree. Since the time complexity of binary tree traversal algorithm is O(n) [13][14], the time complexity of frequent pattern tree is O(n). But the proposed algorithm reduces one level in the tree during traversal.

With taking the training data set as in the table:1 [12]. We get the frequent transactional data base as shown in the table:2 with considering the minimum support value as 3. The final frequent pattern tree will appears as shown in figure:7 by using the proposed algorithms.

Table:1 Transactional Data Base
Table:2 Frequent item Transactional Data Base
Figure: 7
figure 7

Frequent item Transactional Data Base

The traversal time for the predecessor nodes in frequent pattern trees[18][19][20][21] using the traditional frequent pattern tree approach is only O(n) for the nth level node. The proposed algorithm also constructs a binary tree. Hence the time complexity of the newly constructed tree is also O(n). But, one unit of time is less than the time of traditional approach in worst case. The proposed algorithm takes a fewer number of node comparisons to determine the predecessor nodes and its path than that of traditional approach in best and average cases.

6 Conclusion

In this paper, we presented the design and implementation issues of six novel algorithms. One algorithm constructs a frequent pattern tree. Four algorithms are used to label each node and another algorithm extracts the frequent patterns from the frequent pattern tree. In the Example figures the solid line between the nodes represents the relation between the nodes and the dashed line indicates the pointer link between the same nodes, useful to maintain the cumulative node count in the data structure. We also presented the flow charts of each algorithm.This novel approach reduce one level tree traversal of the tree in the worst case also.