Abstract
Estimation of the ratio of probability densities has attracted a great deal of attention, since it can be used to address a variety of statistical problems. A naive approach to density-ratio approximation is to estimate the numerator and denominator densities separately and then take their ratio. However, this two-step approach does not perform well in practice, and methods that directly estimate density ratios without going through density estimation have been explored. In this paper, we first give a comprehensive review of existing density-ratio estimation methods and discuss their pros and cons. We then propose a new framework of density-ratio estimation in which a density-ratio model is fitted to the true density ratio under the Bregman divergence. This framework includes existing approaches as special cases and is substantially more general. Finally, we develop a robust density-ratio estimation method under the power divergence, a novel instance within our framework.
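The contrast the abstract draws between the naive two-step approach and direct density-ratio estimation can be sketched numerically. The snippet below is a minimal illustration, not code from the paper: it compares a ratio of two Gaussian kernel density estimates against a direct least-squares fit in the style of unconstrained least-squares importance fitting (uLSIF). The sample sizes, the bandwidth `sigma`, and the regularizer `lam` are arbitrary choices for the example; in practice they would be selected by cross-validation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: x_p ~ p (numerator density), x_q ~ q (denominator density).
x_p = rng.normal(0.0, 1.0, 200)
x_q = rng.normal(0.5, 1.2, 200)

def gauss_kernel(x, centers, sigma):
    """Gaussian kernel matrix K[i, l] = exp(-(x_i - c_l)^2 / (2 sigma^2))."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))

# --- Naive two-step baseline: estimate p and q separately, then divide. ---
def kde(x, data, sigma=0.3):
    return gauss_kernel(x, data, sigma).mean(axis=1) / (np.sqrt(2 * np.pi) * sigma)

grid = np.linspace(-2, 2, 41)
ratio_naive = kde(grid, x_p) / kde(grid, x_q)

# --- Direct fit: model r(x) = sum_l alpha_l K(x, c_l) and minimize the
# empirical squared error (1/2) mean_q[r^2] - mean_p[r] + (lam/2)||alpha||^2,
# whose minimizer solves the linear system (H + lam I) alpha = h. ---
centers = x_p[:50]       # kernel centers taken from numerator samples
sigma, lam = 0.5, 0.1    # bandwidth and regularizer (cross-validated in practice)
Phi_p = gauss_kernel(x_p, centers, sigma)
Phi_q = gauss_kernel(x_q, centers, sigma)
H = Phi_q.T @ Phi_q / len(x_q)   # empirical mean over q of K K^T
h = Phi_p.mean(axis=0)           # empirical mean over p of K
alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)

def r_hat(x):
    return gauss_kernel(x, centers, sigma) @ alpha

# Sanity check: since E_q[p(x)/q(x)] = 1, the fitted ratio averaged over the
# denominator samples should be close to one.
print(float(r_hat(x_q).mean()))
```

The direct estimator avoids the two-step approach's compounding of estimation errors: division by a small, noisy denominator estimate can blow up the naive ratio, whereas the least-squares fit targets the ratio function itself.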
Cite this article
Sugiyama, M., Suzuki, T. & Kanamori, T. Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation. Ann Inst Stat Math 64, 1009–1044 (2012). https://doi.org/10.1007/s10463-011-0343-8