Abstract
Known pattern discovery algorithms for finding tilings (covers of 0/1-databases consisting of 1-rectangles) cannot be integrated into instant and interactive knowledge discovery (KD) tools, because each of them fails at least one of two key requirements: a) to provide results within a short response time of only a few seconds, and b) to return a concise set of patterns with only a few elements that nevertheless covers a large fraction of the input database. In this paper we present a novel randomized algorithm that works well under these requirements. It is based on the recursive application of a simple tile sampling procedure that can be implemented efficiently using rejection sampling. Although our analysis shows that the theoretical solution distribution can be weak in the worst case, the approach performs very well in practice and outperforms previous sampling algorithms as well as deterministic algorithms.
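To make the idea of covering a 0/1-database with sampled tiles concrete, the following is a minimal illustrative sketch, not the algorithm of the paper. It assumes a database given as a list of 0/1 lists, and every name in it (`sample_tile`, `cover`, `max_tries`) is hypothetical. The proposal step (a random row, then a random subset of its items), the acceptance rule (accept with probability proportional to tile area), and the recursion on the residual database are simplifying assumptions chosen only to show rejection sampling and recursion working together.

```python
import random


def sample_tile(db, rng, max_tries=1000):
    """Rejection-sample one tile (rows, cols) whose cells are all 1 in db.

    Assumed proposal: pick a random non-empty row, keep each of its items
    with probability 1/2, and extend to all rows containing those items.
    Assumed acceptance: probability area / bound, which favours large tiles.
    """
    n_rows = len(db)
    candidates = [r for r in range(n_rows) if any(db[r])]
    if not candidates:
        return None  # no 1-cells left, so no tile exists
    for _ in range(max_tries):
        seed = rng.choice(candidates)
        items = [c for c, v in enumerate(db[seed]) if v]
        cols = [c for c in items if rng.random() < 0.5]
        if not cols:
            continue  # an empty itemset is not a tile, so propose again
        rows = [r for r in range(n_rows) if all(db[r][c] for c in cols)]
        area = len(rows) * len(cols)
        bound = n_rows * len(items)  # upper bound on the area of this proposal
        if rng.random() < area / bound:
            return rows, cols
    # Fallback after too many rejections: a single full row is always a tile.
    seed = candidates[0]
    return [seed], [c for c, v in enumerate(db[seed]) if v]


def cover(db, k, rng):
    """Return at most k tiles, recursing on the cells not yet covered."""
    if k == 0:
        return []
    tile = sample_tile(db, rng)
    if tile is None:
        return []
    rows, cols = tile
    residual = [row[:] for row in db]  # copy, so the input stays untouched
    for r in rows:
        for c in cols:
            residual[r][c] = 0
    return [tile] + cover(residual, k - 1, rng)


if __name__ == "__main__":
    example = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
    for rows, cols in cover(example, 3, random.Random(0)):
        print("rows", rows, "cols", cols)
```

Because each tile is sampled from the residual database, every returned tile is also a 1-rectangle of the original database, and no cell is covered twice. The parameter `k` bounds the size of the cover, which corresponds to requirement b) above; how well such a sketch meets requirement a) depends on the rejection rate and is not something this example measures.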
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Moens, S., Boley, M., Goethals, B. (2014). Providing Concise Database Covers Instantly by Recursive Tile Sampling. In: Džeroski, S., Panov, P., Kocev, D., Todorovski, L. (eds) Discovery Science. DS 2014. Lecture Notes in Computer Science, vol 8777. Springer, Cham. https://doi.org/10.1007/978-3-319-11812-3_19
DOI: https://doi.org/10.1007/978-3-319-11812-3_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11811-6
Online ISBN: 978-3-319-11812-3
eBook Packages: Computer Science, Computer Science (R0)