Abstract
In this paper, we introduce a class of local divergences between two probability distributions and illustrate its usefulness in model selection. Explicit expressions for the proposed local divergences are derived when the underlying distributions are members of the exponential family or are multivariate normal. In addition, a local model selection criterion, termed the local divergence information criterion (LDiv.IC), is proposed. Simulations and applications are presented to study and exemplify the performance of the proposed criterion.
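The local divergences above build on the BHHJ power divergence of Basu, Harris, Hjort and Jones; roughly, the local versions concentrate the comparison on a region of interest rather than the whole sample space. As background, the following is a minimal numerical sketch of the global BHHJ divergence between two univariate densities; the function name and the numerical-integration approach are illustrative choices, not the paper's construction.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def bhhj_divergence(g_pdf, f_pdf, alpha):
    """Global BHHJ power divergence d_alpha(g, f), alpha > 0.

    d_alpha(g, f) = int f^(1+a) - (1 + 1/a) g f^a + (1/a) g^(1+a) dx,
    which is nonnegative and zero iff g = f.
    """
    def integrand(x):
        f, g = f_pdf(x), g_pdf(x)
        return f**(1 + alpha) - (1 + 1/alpha) * g * f**alpha + (1/alpha) * g**(1 + alpha)
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

g = norm(0.0, 1.0).pdf
f = norm(0.5, 1.0).pdf
d_same = bhhj_divergence(g, g, alpha=0.5)  # identical densities: divergence is 0
d_diff = bhhj_divergence(g, f, alpha=0.5)  # shifted normal: strictly positive
print(d_same, d_diff)
```

As alpha decreases to 0 this divergence approaches the Kullback-Leibler divergence, which is why criteria built on it interpolate between robustness and efficiency.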
Acknowledgments
The authors would like to thank three referees for their helpful comments and suggestions which improved the exposition of an earlier version of the manuscript.
Appendix A
A.1 Estimated Parameters of the Candidate Models in the Redwood Trees Application
- Model 1: bivariate normal with estimated parameters \(\hat{\mu}_{1} = (0.26,0.25)\) and \(\hat{\Sigma}_{1} = \left(\begin{array}{cc} 0.0170 & 0.0049 \\ 0.0049 & 0.0179 \end{array}\right)\).
- Model 2: mixture of two bivariate normal components with estimated parameters \(\hat{w}_{2}= 0.598\), \(\hat{\mu}_{2} = (0.21,0.17)\), \(\hat{\Sigma}_{2} = \left(\begin{array}{cc} 0.0156 & -0.0032 \\ -0.0032 & 0.0068 \end{array}\right)\), \(\hat{w}_{3}= 0.402\), \(\hat{\mu}_{3} = (0.32,0.36)\) and \(\hat{\Sigma}_{3}=\left(\begin{array}{cc} 0.0087 & 0.0007 \\ 0.0007 & 0.0056 \end{array}\right)\).
- Model 3: mixture of three bivariate normal components with estimated parameters \(\hat{w}_{4} = 0.2764\), \(\hat{\mu}_{4} = (0.12,0.27)\), \(\hat{\Sigma}_{4} = \left(\begin{array}{cc} 0.0036 & 0.0040 \\ 0.0040 & 0.0084 \end{array}\right)\), \(\hat{w}_{6}= 0.2506\), \(\hat{\mu}_{6} = (0.34,0.36)\), \(\hat{\Sigma}_{6} = \left(\begin{array}{cc} 0.0083 & -0.0016 \\ -0.0016 & 0.0123 \end{array}\right)\), \(\hat{w}_{7}= 0.4730\), \(\hat{\mu}_{7} = (0.28,0.22)\) and \(\hat{\Sigma}_{7} = \left(\begin{array}{cc} 0.0105 & 0.0079 \\ 0.0079 & 0.0142 \end{array}\right)\).
- Model 4: mixture of four bivariate normal components with estimated parameters \(\hat{w}_{8} = 0.1565\), \(\hat{\mu}_{8} = (0.38,0.18)\), \(\hat{\Sigma}_{8} = \left(\begin{array}{cc} 0.0055 & 0.0000 \\ 0.0000 & 0.0105 \end{array}\right)\), \(\hat{w}_{9}= 0.3004\), \(\hat{\mu}_{9}=(0.34,0.35)\), \(\hat{\Sigma}_{9} = \left(\begin{array}{cc} 0.0050 & 0.0047 \\ 0.0047 & 0.0071 \end{array}\right)\), \(\hat{w}_{10}= 0.2715\), \(\hat{\mu}_{10}=(0.12,0.29)\), \(\hat{\Sigma}_{10} = \left(\begin{array}{cc} 0.0040 & 0.0049 \\ 0.0049 & 0.0083 \end{array}\right)\), \(\hat{w}_{11}= 0.2715\), \(\hat{\mu}_{11}=(0.22,0.15)\) and \(\hat{\Sigma}_{11} = \left(\begin{array}{cc} 0.0053 & 0.0029 \\ 0.0029 & 0.0049 \end{array}\right)\).
- Model 5: mixture of five bivariate normal components with estimated parameters \(\hat{w}_{12}= 0.2588\), \(\hat{\mu}_{12} = (0.12,0.28)\), \(\hat{\Sigma}_{12} = \left(\begin{array}{cc} 0.0037 & 0.0046 \\ 0.0046 & 0.0077 \end{array}\right)\), \(\hat{w}_{13} = 0.1435\), \(\hat{\mu}_{13} = (0.40,0.17)\), \(\hat{\Sigma}_{13}=\left(\begin{array}{cc} 0.0045 & 0.0010 \\ 0.0010 & 0.0094 \end{array}\right)\), \(\hat{w}_{14}= 0.2776\), \(\hat{\mu}_{14} = (0.33,0.34)\), \(\hat{\Sigma}_{14} = \left(\begin{array}{cc} 0.0050 & 0.0052 \\ 0.0052 & 0.0073 \end{array}\right)\), \(\hat{w}_{15} = 0.0916\), \(\hat{\mu}_{15}=(0.24,0.25)\), \(\hat{\Sigma}_{15} = \left(\begin{array}{cc} 0.0057 & 0.0057 \\ 0.0057 & 0.0095 \end{array}\right)\), \(\hat{w}_{16}= 0.2284\), \(\hat{\mu}_{16} = (0.23,0.15)\) and \(\hat{\Sigma}_{16} = \left(\begin{array}{cc} 0.0049 & 0.0034 \\ 0.0034 & 0.0055 \end{array}\right)\).
- Model 6: mixture of six bivariate normal components with estimated parameters \(\hat{w}_{17} = 0.2478\), \(\hat{\mu}_{17} = (0.12,0.28)\), \(\hat{\Sigma}_{17} = \left(\begin{array}{cc} 0.0036 & 0.0044 \\ 0.0044 & 0.0074 \end{array}\right)\), \(\hat{w}_{18}= 0.0883\), \(\hat{\mu}_{18}=(0.25,0.29)\), \(\hat{\Sigma}_{18} = \left(\begin{array}{cc} 0.0061 & 0.0067 \\ 0.0067 & 0.0108 \end{array}\right)\), \(\hat{w}_{19} = 0.2527\), \(\hat{\mu}_{19} = (0.33,0.34)\), \(\hat{\Sigma}_{19} = \left(\begin{array}{cc} 0.0054 & 0.0059 \\ 0.0059 & 0.0083 \end{array}\right)\), \(\hat{w}_{20}= 0.1372\), \(\hat{\mu}_{20}=(0.41,0.16)\), \(\hat{\Sigma}_{20} = \left(\begin{array}{cc} 0.0039 & 0.0016 \\ 0.0016 & 0.0092 \end{array}\right)\), \(\hat{w}_{21}= 0.1943\), \(\hat{\mu}_{21}=(0.23,0.16)\), \(\hat{\Sigma}_{21} = \left(\begin{array}{cc} 0.0046 & 0.0042 \\ 0.0042 & 0.0067 \end{array}\right)\), \(\hat{w}_{22}= 0.0797\), \(\hat{\mu}_{22} = (0.24,0.20)\) and \(\hat{\Sigma}_{22} = \left(\begin{array}{cc} 0.0055 & 0.0059 \\ 0.0059 & 0.0090 \end{array}\right)\).
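Each fitted candidate model above is a finite mixture of bivariate normals, so its density at any point is a weighted sum of component densities. The following is a minimal sketch of evaluating the fitted Model 2 density from the estimates listed above; the helper name `mixture_pdf` and the use of `scipy.stats.multivariate_normal` are illustrative choices, not part of the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Estimated parameters of Model 2 (two-component mixture) from Appendix A.1.
weights = [0.598, 0.402]
means = [np.array([0.21, 0.17]), np.array([0.32, 0.36])]
covs = [np.array([[0.0156, -0.0032], [-0.0032, 0.0068]]),
        np.array([[0.0087,  0.0007], [ 0.0007, 0.0056]])]

def mixture_pdf(x):
    """Density of the fitted bivariate normal mixture at point x."""
    return sum(w * multivariate_normal(mean=m, cov=S).pdf(x)
               for w, m, S in zip(weights, means, covs))

# Evaluate near the first component's centre and far out in the unit square.
print(mixture_pdf([0.21, 0.17]), mixture_pdf([0.9, 0.9]))
```

The same pattern extends to Models 3 through 6 by appending the corresponding weights, means, and covariance matrices.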
Avlogiaris, G., Micheas, A.C. & Zografos, K. A Criterion for Local Model Selection. Sankhya A 81, 406–444 (2019). https://doi.org/10.1007/s13171-018-0126-x
Keywords and phrases.
- Model selection
- AIC
- Local divergence information criterion
- Local model selection criterion
- Local expected overall discrepancy
- Local BHHJ power divergence
- Mixture models
- Point process theory