Contrast Mining from Interesting Subgroups

Langohr, Laura; Podpečan, Vid; Petek, Marko; Mozetič, Igor; Gruden, Kristina

doi:10.1007/978-3-642-31830-6_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7250)

Abstract

Subgroup discovery methods find interesting subsets of objects of a given class. We propose to extend subgroup discovery with a second subgroup discovery step that finds interesting subgroups of objects that are specific to a class in one or more contrast classes. First, a subgroup discovery method is applied. Then, contrast classes of objects are defined by applying set-theoretic functions to the discovered subgroups of objects. Finally, subgroup discovery is performed again to find interesting subgroups within the two contrast classes, pointing out differences between the characteristics of the two. The approach has various application areas, one being biology, where finding interesting subgroups has been addressed widely for gene-expression data. There, our method finds enriched gene sets that are common to samples in a class (e.g., differential expression in virus-infected versus non-infected) and at the same time specific to one or more class attributes (e.g., time points or genotypes). We report experimental results on a time-series data set for virus-infected potato plants. The results present a comprehensive overview of potato’s response to virus infection and reveal new research hypotheses for plant biologists.
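The three steps in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy gene annotations, the restriction to single-term subgroup descriptions, and the use of weighted relative accuracy (WRAcc) as the quality measure are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# (describing term, covered objects, quality); hypothetical representation
Subgroup = Tuple[str, Set[str], float]


def wracc(covered: Set[str], positives: Set[str], universe: Set[str]) -> float:
    """Weighted relative accuracy: coverage * (precision - base rate).

    Assumed quality measure; the abstract does not name one.
    """
    if not covered or not universe:
        return 0.0
    coverage = len(covered) / len(universe)
    precision = len(covered & positives) / len(covered)
    base_rate = len(positives) / len(universe)
    return coverage * (precision - base_rate)


def discover_subgroups(objects: Dict[str, FrozenSet[str]],
                       positives: Set[str],
                       min_quality: float = 0.1) -> List[Subgroup]:
    """Score every single-term description and keep those above the threshold."""
    universe = set(objects)
    terms = set().union(*objects.values())
    found = []
    for term in terms:
        covered = {obj for obj, annotations in objects.items() if term in annotations}
        quality = wracc(covered, positives, universe)
        if quality > min_quality:
            found.append((term, covered, quality))
    # Best quality first; ties broken alphabetically for reproducibility.
    return sorted(found, key=lambda s: (-s[2], s[0]))


def covered_class_members(subgroups: List[Subgroup], positives: Set[str]) -> Set[str]:
    """Class objects covered by at least one discovered subgroup."""
    members: Set[str] = set()
    for _, covered, _ in subgroups:
        members |= covered & positives
    return members


# Invented toy data: gene id -> annotation terms.
genes: Dict[str, FrozenSet[str]] = {
    "g1": frozenset({"stress", "early_response"}),
    "g2": frozenset({"stress", "early_response"}),
    "g3": frozenset({"stress", "photosynthesis"}),
    "g4": frozenset({"photosynthesis"}),
    "g5": frozenset({"photosynthesis"}),
    "g6": frozenset({"transport"}),
    "g7": frozenset({"transport"}),
    "g8": frozenset({"transport"}),
}
# Invented classes: genes differentially expressed at two time points.
de_t1 = {"g1", "g2", "g3", "g4"}
de_t2 = {"g3", "g4", "g5"}

# Step 1: subgroup discovery for each class.
subgroups_t1 = discover_subgroups(genes, de_t1)
subgroups_t2 = discover_subgroups(genes, de_t2)

# Step 2: define a contrast class with a set-theoretic function (here: difference).
contrast_class = (covered_class_members(subgroups_t1, de_t1)
                  - covered_class_members(subgroups_t2, de_t2))

# Step 3: subgroup discovery again, now with the contrast class as the target.
contrast_subgroups = discover_subgroups(genes, contrast_class)

for term, covered, quality in contrast_subgroups:
    print(f"{term}: covers {sorted(covered)}, quality {quality:.4f}")
```

In this toy setting the second discovery step ranks the terms differently from the first one, which is the kind of time-point-specific signal the abstract describes. Other set functions (union, intersection) could be substituted in step 2 to define different contrast classes.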

Author information

Authors and Affiliations

Department of Computer Science and Helsinki Institute for Information Technology (HIIT), University of Helsinki, Finland
Laura Langohr
Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
Vid Podpečan & Igor Mozetič
Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
Marko Petek & Kristina Gruden

Editor information

Editors and Affiliations

Department of Computer and Information Science, University of Konstanz, Konstanz, Germany
Michael R. Berthold

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Langohr, L., Podpečan, V., Petek, M., Mozetič, I., Gruden, K. (2012). Contrast Mining from Interesting Subgroups. In: Berthold, M.R. (eds) Bisociative Knowledge Discovery. Lecture Notes in Computer Science (LNAI), vol 7250. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31830-6_28

DOI: https://doi.org/10.1007/978-3-642-31830-6_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31829-0
Online ISBN: 978-3-642-31830-6
eBook Packages: Computer Science, Computer Science (R0)
