Abstract
Mining massive, high-dimensional data typically yields a large number of association rules. Interpreting and acting on all of them is difficult, especially because many rules are redundant and contained in other rules. We discuss how the sparseness of the data affects redundancy and containment among the rules, and we propose a new methodology for organizing and grouping association rules that share the same consequent. The methodology consists of finding metarules, that is, rules that express associations between the discovered rules themselves. The information provided by the metarules is then used to reorganize and group related rules, so the grouping relies only on data-determined relationships between the rules. We demonstrate the approach on actual manufacturing data and show its effectiveness on several benchmark data sets.
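To make the idea of a metarule concrete, the following is a minimal sketch and not the procedure from the article, which the abstract does not spell out. It assumes that a metarule Ri -> Rj can be scored by how often the transactions covered by the antecedent of Ri are also covered by the antecedent of Rj, and that rules linked by a metarule are placed in the same group. The toy transactions, the rule names, the `min_conf` threshold, and the helper functions `cover`, `find_metarules`, and `group_rules` are all hypothetical.

```python
from itertools import permutations

# Toy transactions, each a set of items (hypothetical data).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
    {"c"},
]

# Antecedents of discovered rules that share one consequent (hypothetical).
rules = {
    "R1": {"a"},
    "R2": {"a", "b"},
    "R3": {"c"},
}


def cover(antecedent, transactions):
    """Return indices of transactions containing every item of the antecedent."""
    return {i for i, t in enumerate(transactions) if antecedent <= t}


def find_metarules(rules, transactions, min_conf=0.8):
    """Return (Ri, Rj, confidence) for ordered rule pairs meeting min_conf.

    Assumption: confidence of Ri -> Rj is |cover(Ri) & cover(Rj)| / |cover(Ri)|.
    """
    covers = {name: cover(ant, transactions) for name, ant in rules.items()}
    metarules = []
    for ri, rj in permutations(rules, 2):
        if not covers[ri]:
            continue  # Ri covers nothing, so its confidence is undefined
        conf = len(covers[ri] & covers[rj]) / len(covers[ri])
        if conf >= min_conf:
            metarules.append((ri, rj, conf))
    return metarules


def group_rules(rule_names, metarules):
    """Group rules connected by at least one metarule (simple union-find)."""
    parent = {r: r for r in rule_names}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for ri, rj, _ in metarules:
        parent[find(ri)] = find(rj)

    groups = {}
    for r in rule_names:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())


metarules = find_metarules(rules, transactions)
groups = group_rules(rules, metarules)
print(metarules)
print(groups)
```

In this toy example the rule with antecedent {a, b} is contained in the rule with antecedent {a}, so every transaction covered by R2 is also covered by R1 and the two rules end up in the same group, while R3 stays on its own. Because coverage is computed from the transactions themselves, the grouping depends only on the data, in the spirit of the data-determined relationships the abstract mentions.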
Additional information
Communicated by M. J. Zaki.
Cite this article
Berrado, A., Runger, G.C. Using metarules to organize and group discovered association rules. Data Min Knowl Disc 14, 409–431 (2007). https://doi.org/10.1007/s10618-006-0062-6
DOI: https://doi.org/10.1007/s10618-006-0062-6