A high-performance distributed algorithm for mining association rules

Schuster, Assaf; Wolff, Ran; Trock, Dan

doi:10.1007/s10115-004-0176-3

Published: 01 May 2005

Volume 7, pages 458–475, (2005)

Knowledge and Information Systems

Assaf Schuster¹, Ran Wolff¹ & Dan Trock²

Abstract

We present a new distributed association rule mining (D-ARM) algorithm that demonstrates superlinear speed-up with the number of computing nodes. The algorithm is the first D-ARM algorithm to perform a single scan over the database. As such, its performance is unmatched by any previous algorithm. Scale-up experiments over standard synthetic benchmarks demonstrate stable run time regardless of the number of computers. Theoretical analysis reveals a tighter bound on error probability than the one shown in the corresponding sequential algorithm. As a result of this tighter bound and by utilizing the combined memory of several computers, the algorithm generates far fewer candidates than comparable sequential algorithms, a number of the same order of magnitude as the optimum.
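
To make the term "superlinear speed-up" concrete, the following minimal sketch uses the standard definition of speed-up: the run time on one node divided by the run time on n nodes. Speed-up is superlinear when that ratio exceeds n. The function names and the timings in the usage note are hypothetical and chosen only for illustration; they are not taken from the article's experiments.

```python
def speedup(t_one_node: float, t_n_nodes: float) -> float:
    """Standard speed-up: single-node run time divided by n-node run time."""
    if t_one_node <= 0 or t_n_nodes <= 0:
        raise ValueError("run times must be positive")
    return t_one_node / t_n_nodes


def is_superlinear(t_one_node: float, t_n_nodes: float, n_nodes: int) -> bool:
    """True when the speed-up on n_nodes nodes is strictly greater than n_nodes."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    return speedup(t_one_node, t_n_nodes) > n_nodes


# Hypothetical timings (assumptions, not measurements from the article):
# 100 s on one node and 10 s on 8 nodes give a speed-up of 10, which exceeds 8.
print(speedup(100.0, 10.0))
print(is_superlinear(100.0, 10.0, 8))
```

With these made-up numbers the check reports a superlinear result. The abstract attributes the effect to a single database scan and to the combined memory of several computers, which this sketch does not model.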

Author information

Authors and Affiliations

Department of Computer Science, Technion—Israel Institute of Technology, Haifa, 32000, Israel
Assaf Schuster & Ran Wolff
Department of Electrical Engineering, Technion—Israel Institute of Technology, Haifa, Israel
Dan Trock

Corresponding author

Correspondence to Ran Wolff.

About this article

Cite this article

Schuster, A., Wolff, R. & Trock, D. A high-performance distributed algorithm for mining association rules. Knowl Inf Syst 7, 458–475 (2005). https://doi.org/10.1007/s10115-004-0176-3

Received: 19 November 2003
Revised: 09 January 2004
Accepted: 16 February 2004
Published: 01 May 2005
Issue Date: May 2005
DOI: https://doi.org/10.1007/s10115-004-0176-3
