Performance study of distributed Apriori-like frequent itemsets mining

Aouad, Lamine M.; Le-Khac, Nhien-An; Kechadi, Tahar M.

doi:10.1007/s10115-009-0205-3

Performance study of distributed Apriori-like frequent itemsets mining

Regular Paper
Published: 16 April 2009

Volume 23, pages 55–72, (2010)
Cite this article

Download PDF

Access provided by CONRICYT-eBooks

Knowledge and Information Systems Aims and scope Submit manuscript

Performance study of distributed Apriori-like frequent itemsets mining

Download PDF

Lamine M. Aouad¹,
Nhien-An Le-Khac¹ &
Tahar M. Kechadi¹

342 Accesses
23 Citations
1 Altmetric
Explore all metrics

Abstract

In this article, we focus on distributed Apriori-based frequent itemsets mining. We present a new distributed approach which takes into account inherent characteristics of this algorithm. We study the distribution aspect of this algorithm and give a comparison of the proposed approach with a classical Apriori-like distributed algorithm, using both analytical and experimental studies. We find that under a wide range of conditions and datasets, the performance of a distributed Apriori-like algorithm is not related to global strategies of pruning since the performance of the local Apriori generation is usually characterized by relatively high success rates of candidate sets frequency at low levels which switch to very low rates at some stage, and often drops to zero. This means that the intermediate communication steps and remote support counts computation and collection in classical distributed schemes are computationally inefficient locally, and then constrains the global performance. Our performance evaluation is done on a large cluster of workstations using the Condor system and its workflow manager DAGMan. The results show that the presented approach greatly enhances the performance and achieves good scalability compared to a typical distributed Apriori founded algorithm.

Avoid common mistakes on your manuscript.

References

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceeding of the 20th international conference on very large data bases, VLDB, pp 487–499
Brin S, Motwani R, Ullman JD, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. In: SIGMOD 1997, Proceedings ACM SIGMOD international conference on management of data, May 13–15
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Park JS, Chen M-S, Yu PS (1995) An effective hash-based algorithm for mining association rules. In: Proceedings of the 1995 ACM SIGMOD international conference on management of data, SIGMOD ’95
Savasere A, Omiecinski E, Navathe SB (1995) An efficient algorithm for mining association rules in large databases. VLDB J :432–444
Cheung DW, Han J, Ng VT, Fu AW, Fu Y (1996) A fast distributed algorithm for mining association rules. In: PDIS: International conference on parallel and distributed information systems
Cheung DW, Ng VT, Fu AW, Fu Y (1996) Efficient mining of association rules in distributed databases. IEEE Trans Knowl Data Eng 8(6): 911–922
Article Google Scholar
Agrawal R, Shafer JC (1996) Parallel mining of association rules. IEEE Trans Knowl Data Eng 8: 962–969
Article Google Scholar
Ashrafi MZ, Taniar D, Smith KA (2004) ODAM: an optimized distributed association rule mining algorithm. IEEE Distrib Syst Online 5(3)
Zaki M, Ogihara M, Parthasarathy S, Li W (1996) Parallel data mining for association rules on shared memory multi-processors. Proceedings of the 1996 IEEE/ACM conference on supercomputing
Parthasarathy S, Zaki MJ, Ogihara M, Li W (2001) Parallel data mining for association rules on shared memory systems. Knowl Inf Syst 3(1): 1–29
Article MATH Google Scholar
Pramudiono I, Kitsuregawa M (2003) Parallel FP-Growth on PC cluster. In: Proceedings of the 7th Pacific–Asia conference of knowledge discovery and data mining (PAKDD03)
Schuster A, Wolff R (2004) Communication-efficient distributed mining of association rules. Data Min Knowl Disc 8(2): 171–196
Article MathSciNet Google Scholar
Toivonen H (1996) Sampling large databases for association rules. In: VLDB ’96: Proceedings of the 22nd international conference on very large data bases
Assaf Schuster, Ran Wolff, Dan Trock (2005) A high-performance distributed algorithm for mining association rules. Knowl Inf Syst 7(4): 458–475
Article Google Scholar
Chung SM, Congnan L (2008) Efficient mining of maximal frequent itemsets from databases on a cluster of workstations. Knowl Inf Syst 16(3): 359–391
Article Google Scholar
Congnan L, Chung SM (2008) A scalable algorithm for mining maximal frequent sequences using a sample. Knowl Inf Syst 15(2): 149–179
Article Google Scholar
Chung SM, Luo C (2004) Distributed mining of maximal frequent itemsets from databases on a cluster of workstations. In: CCGRID ’04: Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid
Wolff R, Schuster A (2003) Association rule mining in peer-to-peer systems. In: Proceeding of the IEEE conference on data mining ICDM 2003
Zaki M (1999) Parallel and distributed association mining: a survey. IEEE Concurr 7(4): 14–25
Article Google Scholar
Zaki M (2000) Parallel and distributed data mining: an introduction. In: Workshop on Large-Scale Parallel KDD Systems, SIGKDD
Park B, Kargupta H (2002) Distributed data mining: algorithms, systems, and applications. Data Mining Handbook
Le-Khac N-A, Aouad LM, Kechadi MT (2007) Knowledge map: toward a new approach supporting the knowledge management in distributed data mining. In: The third international conference on autonomic and autonomous systems, ICAS
Le-Khac N-A, Kechadi MT, Carthy J (2006) ADMIRE framework: distributed data mining on data grid platforms. First international conference on software and data technologies, ICSOFT
Cannataro M, Congiusta A, Pugliese A, Talia D, Trunfio P (2004) Distributed data mining on grids: services, tools, and applications. IEEE Trans Syst Man Cybern 34(6): 2451–2465
Article Google Scholar
Ghanem VM, Kohler YM, Sayed AJ, Wendel P (2002) Discovery net: towards a grid of knowledge discovery. In: Eighth international conference on knowledge discovery and data mining
Kickinger G, Hofer J, Brezany P, Tjoa AM (2004) Grid knowledge discovery processes and an architecture for their composition. Parallel Distrib Comput Netw. Proceedings of parallel and distributed computing and networks
Berman, Fran (2001) Viewpoint: from TeraGrid to knowledge grid. Commun ACM 44(11): 27–28
Article Google Scholar
Srikant R, Agrawal R (1997) Mining generalized association rules. Future Gener Comput Syst 13(2–3): 161–180
Article Google Scholar
Geerts F, Goethals B, Van Den Bussche J (2005) Tight upper bounds on the number of candidate patterns. ACM Trans Database Syst 30(2): 333–363
Article Google Scholar
Thain D, Tannenbaum T, Livny M (2004) Distributed computing in practice: the condor experience. Concurr Comput Pract Exp 17(2–4): 323–356
Google Scholar
Fahringer T, Jugravu A, Pllana S, Prodan R, Seragiotto C Jr, Truong H-L (2005) ASKALON: a tool set for cluster and grid computing: research articles. Concurr Comput Pract Exp 17(2–4): 143–169
Article Google Scholar
von Laszewski G, Foster I, Gawor J, Lane P (2001) A Java commodity grid kit. Concurr Comput Pract Exp 13(8–9): 645–662
MATH Google Scholar
Delannoy O, Petiton S (2004) A Peer to peer computing framework: design and performance evaluation of YML. HeteroPar 2004, the third international workshop on algorithms, models and tools for parallel computing on heterogeneous networks
Amin K, von Laszewski G, Hategan M, Zaluzec NJ, Hampton S, Rossi A (2004) GridAnt: a client- controllable grid workflow system. In: Proceedings of the 37th annual Hawaii international conference on system science
Cappello F et al (2005) Grid’5000: a large scale, reconfigurable, controlable and monitorable grid platform. In: Sixth IEEE/ACM international workshop on grid computing
Purdom PW, Van Gucht D, Groth DP (2004) Average-case performance of the Apriori algorithm. SIAM J Comput 33(5): 1223–1260
Article MATH MathSciNet Google Scholar
Aouad LM, Le-Khac N-A, Kechadi TM (2007) Grid-based approaches for distributed data mining applications. In: DCABES 2007, the sixth international conference on distributed computing and applications for business, engineering and sciences

Download references

Author information

Authors and Affiliations

School of Computer Science and Informatics, University College Dublin, Belfield, Dublin, 4, Ireland
Lamine M. Aouad, Nhien-An Le-Khac & Tahar M. Kechadi

Authors

Lamine M. Aouad
View author publications
You can also search for this author in PubMed Google Scholar
Nhien-An Le-Khac
View author publications
You can also search for this author in PubMed Google Scholar
Tahar M. Kechadi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lamine M. Aouad.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Aouad, L.M., Le-Khac, NA. & Kechadi, T.M. Performance study of distributed Apriori-like frequent itemsets mining. Knowl Inf Syst 23, 55–72 (2010). https://doi.org/10.1007/s10115-009-0205-3

Download citation

Received: 13 December 2007
Revised: 28 January 2009
Accepted: 14 February 2009
Published: 16 April 2009
Issue Date: April 2010
DOI: https://doi.org/10.1007/s10115-009-0205-3

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Performance study of distributed Apriori-like frequent itemsets mining

Abstract

Article PDF

Similar content being viewed by others

Optimizing the Data-Process Relationship for Fast Mining of Frequent Itemsets in MapReduce

Comprehensive mining of frequent itemsets for a combination of certain and uncertain databases

Data placement in massively distributed environments for fast parallel mining of frequent itemsets

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Performance study of distributed Apriori-like frequent itemsets mining

Abstract

Article PDF

Similar content being viewed by others

Optimizing the Data-Process Relationship for Fast Mining of Frequent Itemsets in MapReduce

Comprehensive mining of frequent itemsets for a combination of certain and uncertain databases

Data placement in massively distributed environments for fast parallel mining of frequent itemsets

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation