Efficient Mining of Top Correlated Patterns Based on Null-Invariant Measures

Kim, Sangkyum; Barsky, Marina; Han, Jiawei

doi:10.1007/978-3-642-23783-6_12

Sangkyum Kim²³,
Marina Barsky²⁴ &
Jiawei Han²³

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 6912))

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

3893 Accesses
13 Citations

Abstract

Mining strong correlations from transactional databases often leads to more meaningful results than mining association rules. In such mining, null (transaction)-invariance is an important property of the correlation measures. Unfortunately, some useful null-invariant measures such as Kulczynski and Cosine, which can discover correlations even for the very unbalanced cases, lack the (anti)-monotonicity property. Thus, they could only be applied to frequent itemsets as the post-evaluation step. For large datasets and for low supports, this approach is computationally prohibitive. This paper presents new properties for all known null-invariant measures. Based on these properties, we develop efficient pruning techniques and design the Apriori-like algorithm NICoMiner for mining strongly correlated patterns directly. We develop both the threshold-bounded and the top-k variations of the algorithm, where top-k is used when the optimal correlation threshold is not known in advance and to give user control over the output size. We test NICoMiner on real-life datasets from different application domains, using Cosine as an example of the null-invariant correlation measure. We show that NICoMiner outperforms support-based approach more than an order of magnitude, and that it is very useful for discovering top correlations in itemsets with low support.

Download to read the full chapter text

Chapter PDF

Mining Correlated Patterns with Multiple Minimum All-Confidence Thresholds

Interesting Patterns

SS-FIM: Single Scan for Frequent Itemsets Mining in Transactional Databases

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

Dataset: Dblp (2006), http://www.informatik.uni-trier.de/~ley/db/
Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: SIGMOD (1993)
Google Scholar
Brin, S., Motwani, R., Silverstein, C.: Beyond market baskets: generalizing association rules to correlations. In: SIGMOD (1997)
Google Scholar
Campagna, A., Pagh, R.: Finding associations and computing similarity via biased pair sampling. In: Perner, P. (ed.) ICDM 2009. LNCS, vol. 5633, Springer, Heidelberg (2009)
Google Scholar
Cohen, J., West, S.G., Cohen, P., Aiken, L.: Applied Multiple Regression Correlation Analysis for the Behavioral Sciences, 3rd edn. Lawrence Erlbaum Assoc Inc., Mahwah (2002)
Google Scholar
Hahsler, M.: Groceries dataset (2007), http://rss.acs.unt.edu/Rdoc/library/arules/data/
Hahsler, M., Hornik, K., Reutterer, T.: Implications of probabilistic data modeling for mining association rules. In: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation (2006)
Google Scholar
Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2006)
MATH Google Scholar
Kuo, W.P., Jenssen, T.-K., Butte, A.J., Ohno-Machado, L., Kohane, I.S.: Analysis of matched mrna measurements from two different microarray technologies. Bioinformatics 18(3), 405–412 (2002)
Article Google Scholar
Lee, Y.-K., Kim, W.-Y., Cai, Y.D., Han, J.: Comine: Efficient mining of correlated patterns. In: ICDM (2003)
Google Scholar
Omiecinski, E.R.: Alternative interest measures for mining associations in databases. IEEE Trans. on Knowl. and Data Eng. 15, 57–69 (2003)
Article Google Scholar
Redmond, M.: Communities and crime dataset (2009), http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime
Redmond, M.A., Baveja, A.: A data-driven software tool for enabling cooperative information sharing among police departments. European Journal of Operational Research 141, 660–678 (2002)
Article MATH Google Scholar
Srikant, R., Agrawal, R.: Mining generalized association rules. In: VLDB (1995)
Google Scholar
Sun, Y., Han, J., Gao, J., Yu, Y.: iTopicModel: Information network-integrated topic modeling. In: Perner, P. (ed.) ICDM 2009. LNCS, vol. 5633, Springer, Heidelberg (2009)
Google Scholar
Tan, P.-N., Kumar, V., Srivastava, J.: Selecting the right interestingness measure for association patterns. In: KDD (2002)
Google Scholar
Tan, P.-N., Steinbach, M., Kumar, V.: Introduction to Data Mining, 1st edn. Addison-Wesley Longman Publishing Co., Inc., Amsterdam (2005)
Google Scholar
von Storch, H., Zwiers, F.W.: Statistical analysis in climate research. Cambridge University Press, Cambridge (2002)
Google Scholar
Wu, T., Chen, Y., Han, J.: Re-examination of interestingness measures in pattern mining: a unified framework. Data Min. Knowl. Discov. 21(3), 371–397 (2010)
Article MathSciNet Google Scholar
Xin, D., Han, J., Yan, X., Cheng, H.: Mining compressed frequent-pattern sets. In: VLDB (2005)
Google Scholar
Xiong, H., He, X., Ding, C.H.Q., Zhang, Y., Kumar, V., Holbrook, S.R.: Identification of functional modules in protein complexes via hyperclique pattern discovery. In: Pacific Symposium on Biocomputing (2005)
Google Scholar
Xiong, H., Zhou, W., Brodie, M., Ma, S.: Top-k φ correlation computation. INFORMS Journal on Computing 20(4), 539–552 (2008)
Article MathSciNet MATH Google Scholar
Zhu, S., Wu, J., Xiong, H., Xia, G.: Scaling up top-k cosine similarity search. Data Knowl. Eng. 70, 60–83 (2011)
Article Google Scholar

Download references

Author information

Authors and Affiliations

University of Illinois at Urbana-Champaign, Urbana, IL, USA
Sangkyum Kim & Jiawei Han
Simon Fraser University, BC, Canada
Marina Barsky

Authors

Sangkyum Kim
View author publications
You can also search for this author in PubMed Google Scholar
Marina Barsky
View author publications
You can also search for this author in PubMed Google Scholar
Jiawei Han
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Informatics and Telecommunications, University of Athens, Panepistimioupolis, Ilisia, 15784, Athens, Greece
Dimitrios Gunopulos
Google Switzerland GmbH, Brandschenkestrasse 110, 8002, Zurich, Switzerland
Thomas Hofmann
Department of Computer Science, University of Bari “Aldo Moro”, via Orabona 4, 70125, Bari, Italy
Donato Malerba
Deptartment of Informatics, Athens University of Economics and Business, Patision 76, 10434, Athens, Greece
Michalis Vazirgiannis

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Kim, S., Barsky, M., Han, J. (2011). Efficient Mining of Top Correlated Patterns Based on Null-Invariant Measures. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science(), vol 6912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23783-6_12

Download citation

DOI: https://doi.org/10.1007/978-3-642-23783-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23782-9
Online ISBN: 978-3-642-23783-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Efficient Mining of Top Correlated Patterns Based on Null-Invariant Measures

Abstract

Chapter PDF

Similar content being viewed by others

Mining Correlated Patterns with Multiple Minimum All-Confidence Thresholds

Interesting Patterns

SS-FIM: Single Scan for Frequent Itemsets Mining in Transactional Databases

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Efficient Mining of Top Correlated Patterns Based on Null-Invariant Measures

Abstract

Chapter PDF

Similar content being viewed by others

Mining Correlated Patterns with Multiple Minimum All-Confidence Thresholds

Interesting Patterns

SS-FIM: Single Scan for Frequent Itemsets Mining in Transactional Databases

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation