Keywords

1 Introduction

1.1 Big Data

Nowadays data is not only growing in its size but also increasing in variety and variability. The big data is defined as database that can store, manipulates, analyze and manage huge amount of data whose size is beyond the ability of traditional database tools. The big data and its attributes can be explained via 3V’s that is volume, variety, varsity and velocity [1]. The volume refers to the amount of data that can be processed at the same time. Big data have ability to process and store terabytes of data. The frequency by which data is generated by some real time application refers to velocity of the big data. The variety includes different formats of data that is structured, semi structured and unstructured data.

Big Data Analytics is the method to analyze large amount of data to uncover unknown relation, hidden patterns and other useful information about the data [2]. Big data analytics is proved to be beneficial in many areas like business analysis, marketing strategies, customer reviews to make useful decisions for the enterprises. The main objective of big data analytics is to help different enterprises to make better business decisions and other users to analyze huge volumes of transaction data as well as other data which is left by Business Intelligence programs. Predictive analytics for further launching the further business strategies [3].

One of such big data analytics is frequent item set mining. In frequent item set mining the set of items purchased together are examined. This analysis helps the enterprise and stores to launch the schemes that give together packs or money saver offers to tempt the customers. The main purpose of frequent itemset mining is analyse transaction data of the supermarket. This is used to examine the behavior of customer in terms of products purchased. There are various algorithms that are used for frequent item set mining [4]. These algorithms can be implemented on horizontal data format as well as on vertical data format [5].

1.2 Frequent Item Set Mining

Horizontal item set mining takes data in horizontal format and that have two basic algorithms. Most fundamental algorithm for frequent item set mining is Apriori algorithm it converts the database in the form of graph before processing. This algorithm makes multiple passes over the data and it uses multiple iterative approach and BFS to explore itemsets [6].

Apriori algorithm goes through the iterative approach and it scans database for many passes and take database as graph that increases its time complexity. Considering the drawbacks of apriori algorithm FP growth algorithm was introduced that uses divide and conquer approach. It compresses the database to tree structure called FP Tree that is frequent pattern tree instead of graph [7].

Apriori algorithm and FP growth algorithm were used for horizontal data base and it is found that comparing and scanning capability of vertical data set is less than horizontal so vertical frequent item set mining techniques were introduced.

Then ECLAT algorithm was implemented for frequent data mining. In this paper will discuss the drawbacks of ECLAT and how the proposed algorithm revised ECLAT is used to reduce the time complexity and its graph and improved results are also being explained.

1.3 ECLAT Algorithm

ECLAT Equivalence class transformation was developed to mine frequent item sets efficiently using vertical data format [8]. The length of transaction set in the item set is simply the support count of an item. Starting with k = 1, all the k-item set are used to form the candidate (k + 1) item sets. The intersection of the TID set is used to compute item sets. This process repeats for all transactions.

This ECLAT algorithm have complexity O(k(k + 1)) where k is the no of items. I had gone through the ECLAT and found some scope to reduce complexity in this paper I had proposed the revised ECLAT in which the time complexity is reduced. In revised ECLAT after scanning the data a lexicography algorithm is being applied and then the data having number of transaction count less than that of support count than those items are removed from the list so after that the scanning time is being reduced. Time of process is less for revised ECLAT than that of ECLAT. Improved results and graph are being explained in this paper.

1.4 Revised ECLAT Algorithm

In Revised ECLAT algorithm, first step is loading the data from database to the data structure. Then the data is being sorted and arranged in descending order. The transactions having transaction count less than the support count are directly be omitted from the data structure and then the algorithm is implemented.

1.5 Revised ECLAT Algorithm Without Sorting

Further, the time can be reduced further by eliminating the sorting algorithm from the algorithm and making one to one comparison and directly omitting the item set which is purchased in transaction whose count is less than the support count. The support count is also a taken directly without computation depending upon the scheme to be launched. The support count is decided prior and then it is being taken with inputs. By eliminating the sorting step the processing time is reduced and a good improvement is being observed. The algorithm after improvement is as follow.

The intersection of transactions and union of itemsets are taken for each transaction with another. When no further intersection is possible or the intersection results in null set then again the support value is compared with the transaction count of each item set and then the transaction set having maximum count will be a printed as resultant itemset and that itemset is called as frequent itemset.

2 Results

2.1 Observation Table

Table 1 shows the time difference analyzed by different algorithm in different format having different data volume.

Table 1 Experimental result shows time required to process the data

2.2 Observation from Graph

From the graphs it is being observed that as the data volume increases the processing time also increases along with it. Different algorithm uses different technique and the processing time changes. For Apriori algorithm the processing time observed is highest then the ECLAT algorithm was implemented and that algorithm have less processing time as compared to Apriori. The proposed algorithm enhanced ECLAT have processing time less than both Apriori and ECLAT Algorithm. In enhanced ECLAT as the data volume was increasing the processing time was also increasing and when the data volume was 2000 MB the processing time of ECLAT is 4.42 s and for enhanced ECLAT the time increases to 5.2 s. (Fig. 1).

Fig. 1
figure 1

Different plot for different algorithm for different data format

The clear comparison can be observed from the bar graph given below. Processing time for different technique for different data volume is being compared and clearly observed that the Enhanced ECLAT1 Algorithm in which sorting is being eliminated gives the best result among all other algorithms (Fig. 2).

Fig. 2
figure 2

Bar graph comparison of different algorithm for different data volume

2.3 Inference

From the observation table, graphs and bar graph it is being observed that as size of data increases the processing time of algorithm also increases. The process time of Apriori Algorithm for horizontal data format is more than that of ECLAT algorithm. ECLAT algorithm is used for vertical data set. The frequent item set mining of data in horizontal format take time more than that of data in vertical data format. After going through ECLAT algorithm new algorithm Enhanced ECLAT is being proposed and observed that processing time is less than that of ECLAT algorithm. In enhanced ECLAT as the data volume was increasing the processing time was also increasing and when the data volume was 2000 MB the processing time of ECLAT is 4.42 s and for enhanced ECLAT the time increases to 5.2 s. The sorting algorithm was eliminated and data is one to one compared and the data with transaction count less than that of support count is directly eliminated from the data structure and then the algorithm is implemented. It is being observed that the Enhanced ECLAT without sorting algorithm give us the best results.

3 Limitation

Limitation of revised ECLAT algorithm is that with increasing the size of data the processing time will not give significant changes. When data volume is small then revised ECLAT is most efficient algorithm but with increase in size of data volume it gives the change but much refined.

4 Conclusion

Frequent data mining is one of the most important big data analytics. In frequent data mining the analyst search the products purchased together and then it is used to find strategies and consider customer reviews. This helps enterprises to launch their business intelligence strategies. This paper explained frequent data mining on vertical data format using ECLAT algorithm and then proposed a new algorithm named revised ECLAT that took less processing time as shown in result table and graphs.