Abstract
We present a new distributed association rule mining (D-ARM) algorithm that demonstrates superlinear speed-up with the number of computing nodes. The algorithm is the first D-ARM algorithm to perform a single scan over the database. As such, its performance is unmatched by any previous algorithm. Scale-up experiments over standard synthetic benchmarks demonstrate stable run time regardless of the number of computers. Theoretical analysis reveals a tighter bound on error probability than the one shown for the corresponding sequential algorithm. As a result of this tighter bound, and by utilizing the combined memory of several computers, the algorithm generates far fewer candidates than comparable sequential algorithms: the number of candidates is of the same order of magnitude as the optimum.
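For readers new to the D-ARM setting, the following minimal sketch illustrates the underlying problem: the database is partitioned across nodes, each node counts itemsets in its own partition in one pass, and the global support of an itemset is the sum of the local counts. This is a generic illustration only and is not the algorithm presented in the paper; the function names, the `min_freq` parameter, and the toy data are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def local_counts(partition, k):
    """Count every k-itemset in one node's partition with a single pass."""
    counts = Counter()
    for transaction in partition:
        # Deduplicate and sort items so each itemset has one canonical form.
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts

def globally_frequent(partitions, k, min_freq):
    """Sum per-node counts; keep itemsets whose global frequency >= min_freq."""
    total = Counter()
    for partition in partitions:  # in a real system each partition lives on its own node
        total.update(local_counts(partition, k))
    n = sum(len(p) for p in partitions)  # total number of transactions
    return {s: c for s, c in total.items() if c >= min_freq * n}

# Hypothetical toy database split over two nodes.
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "b"], ["b", "c"], ["a", "d"]],
]
frequent_pairs = globally_frequent(partitions, k=2, min_freq=0.5)
```

Here the pair `("a", "b")` occurs in 3 of the 5 transactions and is the only pair that reaches the 50% threshold. A naive scheme like this one counts every itemset on every node; the candidate-generation savings claimed in the abstract concern how many such itemsets an algorithm must count at all.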
Cite this article
Schuster, A., Wolff, R. & Trock, D. A high-performance distributed algorithm for mining association rules. Knowl Inf Syst 7, 458–475 (2005). https://doi.org/10.1007/s10115-004-0176-3
DOI: https://doi.org/10.1007/s10115-004-0176-3