Abstract
The most computationally demanding aspect of Association Rule Mining is the identification and counting of support of the frequent sets of items that occur together sufficiently often to be the basis of potentially interesting rules. The task increases in difficulty with the scale of the data and also with its density. The greatest challenge is posed by data that is too large to be contained in primary memory, especially when high data density and/or low support thresholds give rise to very large numbers of candidates that must be counted. In this paper, we consider strategies for partitioning the data to deal effectively with such cases. We describe a partitioning approach which organises the data into tree structures that can be processed independently. We present experimental results that show the method scales well for increasing dimensions of data and performs significantly better than alternatives, especially when dealing with dense data and low support thresholds.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Agarwal R, Aggarwal C, Prasad V (2000) Depth first generation of long patterns. In: Proceedings of the ACM KDD conference on management of data, Boston, pp 108–118
Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD conference on management of data, Washington, DC, pp 207–216
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th VLDB conference, Santiago, Santiago, Chile, pp 487–499
Bayardo RJ (1998) Efficiently mining long pattern from databases. In: Proceedings of the ACM SIGMOD conference on management of data, pp 85–93
Bayardo RJ, Agrawal R, Gunopulos D (1999) Constraint-based rule mining in large, dense databases. In: Proceedings of the 15th interantional conference on data engineering, pp 188–197
Berry MJ, Linoff GS (1997) Data mining techniques for marketing, sales and customer support. Wiley, New York
Cheung W, Zaiane OR (2003) Incremental mining of frequent patterns without candidate generation or support constraint. In: Proceedings of seventh international database engineering and applications symposium (IDEAS'03), pp 111–116
Coenen F, Goulbourne G, Leng P (2001) Computing association rules using partial totals. In: Proceedings of the 5th European conference on principles of data mining and knowledge discovery (PKDD'01). LNCS, vol 2168. Springer, Berlin Heidelberg New York, pp 54–66
Coenen F, Leng P (2001) Optimising association rule algorithms using itemset ordering. In: Bramer M, Coenen F, Preece A (eds) Research and development in intelligent systems XVIII (Proceedings of ES2001). Springer-Verlag, London, pp 53–66
El-Hajj M, Zaiane OR (2003) Non recursive generation of frequent K-itemsets from frequent pattern tree representations. In: Proceedings of 5th international conference on data warehousing and knowledge discovery (DaWaK 2003). LNCS, vol 2737. Springer-Verlag, Berlin Heidelberg New York, pp 371–380
El-Hajj M, Zaiane OR (2003) Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2003), ACM, pp 109–118
Goulbourne G, Coenen F, Leng P (2000) Algorithms for computing association rules using a partial-support tree. J Knowl Based Syst 13:141–149 (also in Proceedings ES'99, December 1999. Springer, London, pp 132–147)
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD conference on management of data, Dallas, pp 1–12
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87
Huang H, Wu X, Relue R (2002) Association analysis with one scan of databases. In: Proceedings of the 2002 IEEE international conference on data mining (ICDM'02), pp 629–632
Liu G, Lu H, Lou W, Yu J (2003) On computing, storing and querying frequent patterns. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2003), ACM, pp 607–612
Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proceedings of ACM SIGMOD workshop on data mining and knowledge discovery, pp 11–20
Savasere A, Omiecinski E, Navathe S (1995) An efficient algorithm for mining association rules in large databases. In: Proceedings of the 21th VLDB conference, pp 432–444
Toivonen H (1996) Sampling large databases for association rules. In: Proceedings of the 22th VLDB conference, pp 1–12
Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) New algorithms for fast discovery of association rules. Technical report 651, University of Rochester, Computer Science Department, New York
Author information
Authors and Affiliations
Corresponding author
Additional information
Shakil Ahmed received a first class BSc (Hons) degree from Dhaka University, Bangladesh, in 1990; and an MSc (first class), also Dhaka University, in 1992. He received his PhD from The University of Liverpool, UK, in 2005. From 2000 onwards he is a member of the Data Mining Group at the Department of Computer Science of the University of Liverpool, UK. His research interests include data mining, Association Rule Mining and pattern recognition.
Frans Coenen has been working in the field of Data Mining for many years and has written widely on the subject. He received his PhD from Liverpool Polytechnic in 1989, after which he took up a post as a RA within the Department of Computer Science at the University of Liverpool. In 1997, he took up a lecturing post within the same department. His current Data Mining research interests include Association rule Mining, Classification algorithms and text mining. He is on the programme committee for ICDM'05 and was the chair for the UK KDD symposium (UKKDD'05).
Paul Leng is professor of e-Learning at the University of Liverpool and director of the e-Learning Unit, which is responsible for overseeing the University's online degree programmes, leading to degrees of MSc in IT and MBA. Along with e-Learning, his main research interests are in Data Mining, especially in methods of discovering Association Rules. In collaboration with Frans Coenen, he has developed efficient new algorithms for finding frequent sets and is exploring applications in text mining and classification.
Rights and permissions
About this article
Cite this article
Ahmed, S., Coenen, F. & Leng, P. Tree-based partitioning of date for association rule mining. Knowl Inf Syst 10, 315–331 (2006). https://doi.org/10.1007/s10115-006-0010-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-006-0010-1