Abstract
The aim of this paper is to provide some theoretical understanding of quasi-Bayesian aggregation methods for nonnegative matrix factorization. We derive an oracle inequality for an aggregated estimator. This result holds for a very general class of prior distributions and shows how the choice of prior affects the rate of convergence.
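To make the idea of quasi-Bayesian aggregation concrete, here is a minimal sketch. The abstract does not specify the loss, prior, temperature or sampler, so all of these are assumptions for illustration: squared Frobenius loss, i.i.d. exponential priors on the entries of the factors U and V, a temperature parameter `lam`, and naive Monte Carlo sampling from the prior. Each draw of (U, V) is weighted by exp(-lam * ||Y - U V^T||_F^2), and the aggregated estimate is the weighted mean of U V^T. The function names (`quasi_bayes_nmf`, `sample_factor`, and so on) are hypothetical and do not come from the paper. In practice, sampling directly from the prior is very inefficient; MCMC or a variational approximation would typically be used instead.

```python
import math
import random


def matmul_t(u, v):
    """Return U @ V^T for factors U (m x k) and V (n x k) given as lists of rows."""
    return [[sum(ui[l] * vj[l] for l in range(len(ui))) for vj in v] for ui in u]


def sq_frobenius(a, b):
    """Squared Frobenius distance between two matrices of the same shape."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def sample_factor(rows, rank, rng, scale=1.0):
    """Draw a nonnegative factor with i.i.d. exponential entries (mean = scale).

    Assumption: this exponential prior is illustrative, not the paper's prior.
    """
    return [[rng.expovariate(1.0 / scale) for _ in range(rank)] for _ in range(rows)]


def quasi_bayes_nmf(y, rank, lam, n_draws=2000, seed=0):
    """Monte Carlo approximation of an exponentially weighted (Gibbs) aggregate.

    Each prior draw (U, V) receives a weight proportional to
    exp(-lam * ||Y - U V^T||_F^2); the estimate is the weighted mean of U V^T.
    """
    rng = random.Random(seed)
    m, n = len(y), len(y[0])
    products, log_w = [], []
    for _ in range(n_draws):
        u = sample_factor(m, rank, rng)
        v = sample_factor(n, rank, rng)
        p = matmul_t(u, v)
        products.append(p)
        log_w.append(-lam * sq_frobenius(y, p))
    # Subtract the largest log-weight before exponentiating, for numerical stability.
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [[sum(wk * pk[i][j] for wk, pk in zip(w, products)) / total
             for j in range(n)] for i in range(m)]


if __name__ == "__main__":
    y = [[1.0, 2.0], [2.0, 4.0]]  # a rank-1 nonnegative toy matrix
    est = quasi_bayes_nmf(y, rank=1, lam=1.0)
    print(est, sq_frobenius(y, est))
```

With `lam=0`, every draw gets equal weight and the estimate reduces to the prior mean of U V^T. As `lam` grows, the weights concentrate on draws that fit Y well. An oracle inequality of the kind described in the abstract quantifies this trade-off between fit and prior mass.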
Cite this article
Alquier, P., Guedj, B. An oracle inequality for quasi-Bayesian nonnegative matrix factorization. Math. Meth. Stat. 26, 55–67 (2017). https://doi.org/10.3103/S1066530717010045
DOI: https://doi.org/10.3103/S1066530717010045