Abstract
In this paper we introduce a simple probabilistic model, hierarchical tiles, for 0–1 data. A basic tile (X,Y,p) specifies a subset X of the rows and a subset Y of the columns of the data, i.e., a rectangle, and gives a probability p for the occurrence of 1s in the cells of X × Y. A hierarchical tile has additionally a set of exception tiles that specify the probabilities for subrectangles of the original rectangle. If the rows and columns are ordered and X and Y consist of consecutive elements in those orderings, then the tile is geometric; otherwise it is combinatorial. We give a simple randomized algorithm for finding good geometric tiles. Our main result shows that using spectral ordering techniques one can find good orderings that turn combinatorial tiles into geometric tiles. We give empirical results on the performance of the methods.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Aggarwal, C., Yu, P.: Finding generalized projected clusters in high dimensional spaces. In: SIGMOD (2000)
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications. In: SIGMOD (1998)
Agrawal, R., Imielinski, T., Swami, A.: Mining associations between sets of items in large databases. In: SIGMOD (1993)
Atkins, J., Boman, E., Hendrickson, B.: A spectral algorithm for seriation and the consecutive ones problem. SIAM Journal on Computing 28(1) (1999)
Beygelzimer, A., Perng, C.-S., Ma, S.: Fast ordering of large categorical datasets for better visualization. In: SIGKDD (2001)
Calders, T., Goethals, B.: Mining all non-derivable frequent itemsets. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) PKDD 2002. LNCS (LNAI), vol. 2431, p. 74. Springer, Heidelberg (2002)
Cheng, C., Fu, A., Zhang, Y.: Entropy-based subspace clustering for mining numerical data. In: SIGKDD (1999)
Cheng, Y., Church, G.: Biclustering of expression data. In: ISMB (2000)
Chung, F.: Spectral graph theory. American Mathematical Society, Providence (1997)
Edmonds, J., Gryz, J., Liang, D., Miller, R.J.: Mining for empty spaces in large data sets. Theor. Comput. Sci. 296(3), 435–452 (2003)
Fiedler, M.: Algebraic connectivity of graphs. Czech. Math. J. 23 (1973)
Friedman, J., Meulman, J.: Clustering objects on subsets of attributes. JRSS B (2004)
Gaines, B., Compton, P.: Induction of ripple-down rules applied to modeling large databases. JIIS 5(3) (1993)
Han, J., Wang, J., Lu, Y., Tzvetkov, P.: Mining top-k frequent closed patterns without minimum support. In: ICDM (2002)
Jain, A., Murty, M., Flynn, P.: Data clustering: A review. ACM Computing Surveys (1999)
Kivinen, J., Mannila, H., Ukkonen, E.: Learning hierarchical rule sets. In: COLT (1992)
Koren, Y., Harel, D.: Multi-scale algorithm for the linear arrangement problem. Technical Report MCS02-04, The Weizmann Institute of Science (2002)
Liu, B., Ku, L.-P., Hsu, W.: Discovering interesting holes in data. In: IJCAI (1997)
Murali, T., Kasif, S.: Extracting conserved gene expression motifs from gene expression data. In: Pac. Symp. Biocomp., vol. 8 (2003)
Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: Analysis and an algorithm. In: NIPS (2001)
Pothen, A., Simon, H., Wang, L.: Spectral nested dissection. Technical Report CS-92-01, Pennsylvania State University, Department of Computer Science (1992)
Rivest, R.: Learning decision lists. Machine Learning 2(3) (1987)
Zha, H., He, X., Ding, C., Gu, M., Simon, H.: Bipartite graph partitioning and data clustering. In: CIKM (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gionis, A., Mannila, H., Seppänen, J.K. (2004). Geometric and Combinatorial Tiles in 0–1 Data. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. PKDD 2004. Lecture Notes in Computer Science(), vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_18
Download citation
DOI: https://doi.org/10.1007/978-3-540-30116-5_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5
eBook Packages: Springer Book Archive