Abstract
Exploratory Data Mining (EDM), the contemporary heir of Exploratory Data Analysis (EDA) pioneered by Tukey in the 1970s, is the task of facilitating the extraction of interesting nuggets of information from possibly large and complexly structured data. Two major conceptual challenges in EDM research are how to formalise a nugget of information (given the diversity of data types of interest), and how to formalise how interesting such a nugget is to a particular user (given the diversity of users and intended purposes). In this Nectar paper, we briefly survey a number of recent contributions made by us and our collaborators towards a theoretically motivated and practically usable resolution of these challenges.
Keywords
- Prior Belief
- Exploratory Data Analysis
- Background Distribution
- Projection Pursuit
- Subjective Interestingness
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
De Bie, T., Spyropoulou, E. (2013). A Theoretical Framework for Exploratory Data Mining: Recent Insights and Challenges Ahead. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol 8190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40994-3_39
DOI: https://doi.org/10.1007/978-3-642-40994-3_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40993-6
Online ISBN: 978-3-642-40994-3
eBook Packages: Computer Science, Computer Science (R0)