Abstract
Given the theoretical framework of Mannila and Toivonen [26], we are interested in the discovery of the positive border of interesting patterns, also called the most specific interesting patterns. Many approaches have been proposed among which we quote the levelwise algorithm and the Dualize and Advance algorithm. In this paper, we propose an adaptive strategy – complementary to these two algorithms – based on four steps: 1) In order to initialize the discovery, eliciting some elements of the negative border, for instance using a levelwise strategy until a certain level k. 2) From the negative border found so far, inferring the optimistic positive border by dualization, i.e. the set of patterns whose all specializations are known to be not interesting patterns. 3) Estimating the distance between the positive border to be discovered and the optimistic positive border. 4) Based on these estimates, carrying out an adaptive search either bottom-up (the jump was too optimistic) or top-down (the solution should be very close).
We have instantiated this proposition to the problem of inclusion dependency (IND) discovery. IND is a generalization of the well known concept of foreign keys in databases and is very important in practice. We will first point out how the problem of IND discovery fits into the theoretical framework of [26]. Then, we will describe an instantiation of our adaptive strategy for IND discovery, called Zigzag, from which some experiments were conducted on synthetic databases. The underlying application of this work takes place in a project called DBA Companion devoted to the understanding of existing databases at the logical level using data mining techniques.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Abiteboul, S., Hull, R., Vianu, V.: Fondements des bases de données. Addison-Wesley, Reading (2000)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Databases, Santiago de Chile, Chile, pp. 487–499 (1994)
Albert-Lorincz, H., Boulicaut, J.-F.: Mining frequent sequential patterns under regular expressions: A highly adaptive strategy for pushing contraints. In: Proceedings of the Third SIAM International Conference on Data Mining, San Francisco, CA, USA. SIAM (2003)
Bailey, J., Manoukian, T., Ramamohanarao, K.: A fast algorithm for computing hypergraph transversals and its application in mining emerging patterns. In: IEEE International Conference on Data Mining (ICDM 2003), Floride, USA, pp. 485–488. IEEE Computer Society, Los Alamitos (November 2003)
Bayardo, R.: Efficiently mining long patterns from databases. In: Haas, L.M., Tiwary, A. (eds.) SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, Seattle, Washington, USA, June 2-4, pp. 85–93. ACM Press, New York (1998)
Bell, S., Brockhausen, P.: Discovery of Data Dependencies in Relational Databases. Technical report, LS-8 Report 14, University of Dortmund, p. 18 (April 1995)
Berge, C.: Graphs and Hypergraphs, 2nd rev. edn. North-Holland Mathematical Library 6. American Elsevier, Amsterdam (1976)
Bonchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D.: Adaptive constraint pushing in frequent pattern mining. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 47–58. Springer, Heidelberg (2003)
Boros, E., Elbassioni, K., Gurvich, V., Khachiyan, L.: An efficient implementation of a quasi-polynomial algorithm for generating hypergraph transversals and its application in joint generation. special issue of Discrete Applied Mathematics
Burdick, D., Calimlim, M., Gehrke, J.: Mafia: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of the 17th IEEE International Conference on Data Engineering, Heidelberg, Germany, pp. 443–452. IEEE Computer Society, Los Alamitos (2001)
De Marchi, F., Lopes, S., Petit, J.-M.: Efficient algorithms for mining inclusion dependencies. In: Jensen, C.S., Jeffery, K., Pokorný, J., Šaltenis, S., Bertino, E., Böhm, K., Jarke, M. (eds.) EDBT 2002. LNCS, vol. 2287, pp. 464–476. Springer, Heidelberg (2002)
De Marchi, F., Lopes, S., Petit, J.-M., Toumani, F.: Analysis of existing databases at the logical level: the dba companion project. ACM Sigmod Record 32(1), 47–52 (2003)
Demetrovics, J., Thi, V.: Some remarks on generating Armstrong and inferring functional dependencies relation. Acta Cybernetica 12(2), 167–180 (1995)
Eiter, T., Gottlob, G., Makino, K.: New results on monotone dualization and generating hypergraph transversals. In: STOC 2002, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, Montreal, Quebec, Canada, June 2-4, 1998, pp. 14–22. ACM Press, New York (2002)
Flouvat, F., Marchi, F.D., Petit, J.-M.: Abs: Adaptive borders search of frequent itemsets. In: FIMI 2004 (2004)
Gouda, K., Zaki, M.J.: Efficiently mining maximal frequent itemsets. In: ICDM 2001, Proceedings IEEE International Conference on Data Mining, pp. 163–170. ACM Press, New York (2001)
Gunopulos, D., Khardon, R., Mannila, H., Saluja, S., Toivonen, H., Sharma, R.S.: Discovering all most specific sentences. ACM Transaction on Database Systems 28(2), 140–174 (2003)
Kantola, M., Mannila, H., Räihä, K.-J., Siirtola, H.: Discovering functional and inclusion dependencies in relational databases. International Journal of Intelligent Systems 7, 591–607 (1992)
Kivinen, J., Mannila, H.: Approximate inference of functional dependencies from relations. Theoritical Computer Science 149(1), 129–149 (1995)
Koeller, A., Rundensteiner, E.: Discovery of high-dimentional inclusion dependencies (poster). In: International Conference on Data Engineering (ICDE 2003). IEEE Computer Society, Los Alamitos (2003)
Levene, M., Loizou, G.: A Guided Tour of Relational Databases and Beyond. Springer, Heidelberg (1999)
Lin, D.-I., Kedem, Z.M.: Pincer search: A new algorithm for discovering the maximum frequent set. In: Schek, H.-J., Saltor, F., Ramos, I., Alonso, G. (eds.) EDBT 1998. LNCS, vol. 1377, pp. 105–119. Springer, Heidelberg (1998)
Lopes, S., De Marchi, F., Petit, J.-M.: DBA companion: A tool for logical database tuning (demo). In: 20th Proceedings of the IEEE International Conference on Data Engineering, Boston, USA, p. 859. IEEE Computer Society, Los Alamitos (2004)
Lopes, S., Petit, J.-M., Lakhal, L.: Functional and approximate dependencies mining: Databases and FCA point of view. Special issue of Journal of Experimental and Theoretical Artificial Intelligence 14(2/3), 93–114 (2002)
Lopes, S., Petit, J.-M., Toumani, F.: Discovering interesting inclusion dependencies: Application to logical database tuning. Information Systems 17(1), 1–19 (2002)
Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery 1(3), 241–258 (1997)
Mitchell, J.-C.: The implication problem for functional and inclusion dependencies. Information and Control 56(3), 154–173 (1983)
Orlando, S., Palmerini, P., Perego, R., Silvestri, F.: Adaptive and resource-aware mining of frequent sets. In: International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, pp. 338–345. IEEE Computer Society, Los Alamitos (2002)
Rymon, R.: Search through systematic set enumeration. In: Nebel, B., Rich, C., Swartout, W.R. (eds.) International Conference on Principles of Knowledge Representation and Reasoning (KR 1992), Cambridge, USA, pp. 539–550. Morgan Kaufmann, San Francisco (1992)
Uno, T., Satoh, K.: Detailed description of an algorithm for enumeration of maximal frequent sets with irredundant dualization. In: Goethals, B., Zaki, M.J. (eds.) FIMI 2003, Frequent Itemset Mining Implementations, ICDM 2003 Workshop, Melbourne, Florida, USA. CEUR Workshop Proceedings, vol. 90 (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
De Marchi, F., Flouvat, F., Petit, JM. (2006). Adaptive Strategies for Mining the Positive Border of Interesting Patterns: Application to Inclusion Dependencies in Databases. In: Boulicaut, JF., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science(), vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_5
Download citation
DOI: https://doi.org/10.1007/11615576_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31331-1
Online ISBN: 978-3-540-31351-9
eBook Packages: Computer ScienceComputer Science (R0)