Abstract
Matrix factorizations are a popular tool for mining regularities from data. There are many ways to interpret a factorization, but one particularly suited to data mining uses the fact that a matrix product can be interpreted as a sum of rank-1 matrices. Factorizing a matrix then becomes the task of finding a small number of rank-1 matrices whose sum is a good representation of the original matrix. Seen this way, many problems in data mining can be expressed as matrix factorizations, given suitable definitions of what a rank-1 matrix and a sum of rank-1 matrices mean. This paper develops a unified theory, based on generalized outer product operators, that encompasses many pattern set mining tasks. The focus is on the computational aspects of the theory, in particular the computational complexity and approximability of many problems related to generalized matrix factorizations. The results apply immediately to a large number of data mining problems and, it is hoped, will allow future results and algorithms to be generalized as well.
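To make the rank-1 view concrete, the following is a minimal sketch, not the paper's own formulation. It builds a product of two factor matrices as a combination of outer products, with the entrywise "multiplication" and "addition" passed in as parameters. The function names (`outer`, `combine`, `factor_product`), the example matrices, and the choice of the Boolean operators (AND, OR) as the second instance are all illustrative assumptions.

```python
from functools import reduce

def outer(u, v, mul):
    """Generalized outer product: entry (i, j) is mul(u[i], v[j])."""
    return [[mul(a, b) for b in v] for a in u]

def combine(mats, add):
    """Entrywise 'sum' of equally shaped matrices under the operator add."""
    def add2(X, Y):
        return [[add(x, y) for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    return reduce(add2, mats)

def factor_product(B, C, mul, add):
    """Combine the k rank-1 matrices built from column i of B and row i of C."""
    k = len(C)
    cols = [[row[i] for row in B] for i in range(k)]
    return combine([outer(cols[i], C[i], mul) for i in range(k)], add)

# Hypothetical 3x2 and 2x3 binary factor matrices (k = 2 rank-1 components).
B = [[1, 0], [1, 1], [0, 1]]
C = [[1, 1, 0], [0, 1, 1]]

# Ordinary algebra: the result equals the usual matrix product B * C.
standard = factor_product(B, C, lambda a, b: a * b, lambda a, b: a + b)

# Boolean algebra (assumed instance): AND as product, OR as sum.
boolean = factor_product(B, C, lambda a, b: a & b, lambda a, b: a | b)

print(standard)  # [[1, 1, 0], [1, 2, 1], [0, 1, 1]]
print(boolean)   # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
```

The two results differ only where the rank-1 components overlap (the middle entry), which is exactly where the definition of the "sum" matters.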
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Miettinen, P. (2015). Generalized Matrix Factorizations as a Unifying Framework for Pattern Set Mining: Complexity Beyond Blocks. In: Appice, A., Rodrigues, P., Santos Costa, V., Gama, J., Jorge, A., Soares, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science, vol 9285. Springer, Cham. https://doi.org/10.1007/978-3-319-23525-7_3
DOI: https://doi.org/10.1007/978-3-319-23525-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23524-0
Online ISBN: 978-3-319-23525-7
eBook Packages: Computer Science (R0)