Abstract
In this paper, we introduce a class of local divergences between two probability distributions and illustrate its usefulness in model selection. Explicit expressions for the proposed local divergences are derived when the underlying distributions are members of the exponential family or are multivariate normal. In addition, a local model selection criterion, termed the local divergence information criterion (LDiv.IC), is proposed. Simulations and applications are presented to study and exemplify the performance of the proposed criterion.
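The local divergences above build on the BHHJ power divergence of Basu, Harris, Hjort and Jones; roughly, the local versions concentrate the comparison on a region of interest rather than the whole sample space. As background, the following is a minimal numerical sketch of the global BHHJ divergence between two univariate densities; the function name and the numerical-integration approach are illustrative choices, not the paper's construction.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def bhhj_divergence(g_pdf, f_pdf, alpha):
    """Global BHHJ power divergence d_alpha(g, f), alpha > 0.

    d_alpha(g, f) = int f^(1+a) - (1 + 1/a) g f^a + (1/a) g^(1+a) dx,
    which is nonnegative and zero iff g = f.
    """
    def integrand(x):
        f, g = f_pdf(x), g_pdf(x)
        return f**(1 + alpha) - (1 + 1/alpha) * g * f**alpha + (1/alpha) * g**(1 + alpha)
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

g = norm(0.0, 1.0).pdf
f = norm(0.5, 1.0).pdf
d_same = bhhj_divergence(g, g, alpha=0.5)  # identical densities: divergence is 0
d_diff = bhhj_divergence(g, f, alpha=0.5)  # shifted normal: strictly positive
print(d_same, d_diff)
```

As alpha decreases to 0 this divergence approaches the Kullback-Leibler divergence, which is why criteria built on it interpolate between robustness and efficiency.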
Acknowledgments
The authors would like to thank three referees for their helpful comments and suggestions which improved the exposition of an earlier version of the manuscript.
Appendix A
A.1 Estimated Parameters of the Candidate Models in the Redwood Trees Application
- Model 1: bivariate normal with estimated parameters \(\hat{\mu}_{1} = (0.26,0.25)\) and \(\hat{\Sigma}_{1} = \left(\begin{array}{cc} 0.0170 & 0.0049 \\ 0.0049 & 0.0179 \end{array}\right)\).
- Model 2: mixture of two bivariate normal components with estimated parameters \(\hat{w}_{2}= 0.598\), \(\hat{\mu}_{2} = (0.21,0.17)\), \(\hat{\Sigma}_{2} = \left(\begin{array}{cc} 0.0156 & -0.0032 \\ -0.0032 & 0.0068 \end{array}\right)\), \(\hat{w}_{3}= 0.402\), \(\hat{\mu}_{3} = (0.32,0.36)\) and \(\hat{\Sigma}_{3}=\left(\begin{array}{cc} 0.0087 & 0.0007 \\ 0.0007 & 0.0056 \end{array}\right)\).
- Model 3: mixture of three bivariate normal components with estimated parameters \(\hat{w}_{4} = 0.2764\), \(\hat{\mu}_{4} = (0.12,0.27)\), \(\hat{\Sigma}_{4} = \left(\begin{array}{cc} 0.0036 & 0.0040 \\ 0.0040 & 0.0084 \end{array}\right)\), \(\hat{w}_{6}= 0.2506\), \(\hat{\mu}_{6} = (0.34,0.36)\), \(\hat{\Sigma}_{6} = \left(\begin{array}{cc} 0.0083 & -0.0016 \\ -0.0016 & 0.0123 \end{array}\right)\), \(\hat{w}_{7}= 0.4730\), \(\hat{\mu}_{7} = (0.28,0.22)\) and \(\hat{\Sigma}_{7} = \left(\begin{array}{cc} 0.0105 & 0.0079 \\ 0.0079 & 0.0142 \end{array}\right)\).
- Model 4: mixture of four bivariate normal components with estimated parameters \(\hat{w}_{8} = 0.1565\), \(\hat{\mu}_{8} = (0.38,0.18)\), \(\hat{\Sigma}_{8} = \left(\begin{array}{cc} 0.0055 & 0.0000 \\ 0.0000 & 0.0105 \end{array}\right)\), \(\hat{w}_{9}= 0.3004\), \(\hat{\mu}_{9}=(0.34,0.35)\), \(\hat{\Sigma}_{9} = \left(\begin{array}{cc} 0.0050 & 0.0047 \\ 0.0047 & 0.0071 \end{array}\right)\), \(\hat{w}_{10}= 0.2715\), \(\hat{\mu}_{10}=(0.12,0.29)\), \(\hat{\Sigma}_{10} = \left(\begin{array}{cc} 0.0040 & 0.0049 \\ 0.0049 & 0.0083 \end{array}\right)\), \(\hat{w}_{11}= 0.2715\), \(\hat{\mu}_{11}=(0.22,0.15)\) and \(\hat{\Sigma}_{11} = \left(\begin{array}{cc} 0.0053 & 0.0029 \\ 0.0029 & 0.0049 \end{array}\right)\).
- Model 5: mixture of five bivariate normal components with estimated parameters \(\hat{w}_{12}= 0.2588\), \(\hat{\mu}_{12} = (0.12,0.28)\), \(\hat{\Sigma}_{12} = \left(\begin{array}{cc} 0.0037 & 0.0046 \\ 0.0046 & 0.0077 \end{array}\right)\), \(\hat{w}_{13} = 0.1435\), \(\hat{\mu}_{13} = (0.40,0.17)\), \(\hat{\Sigma}_{13}=\left(\begin{array}{cc} 0.0045 & 0.0010 \\ 0.0010 & 0.0094 \end{array}\right)\), \(\hat{w}_{14}= 0.2776\), \(\hat{\mu}_{14} = (0.33,0.34)\), \(\hat{\Sigma}_{14} = \left(\begin{array}{cc} 0.0050 & 0.0052 \\ 0.0052 & 0.0073 \end{array}\right)\), \(\hat{w}_{15} = 0.0916\), \(\hat{\mu}_{15}=(0.24,0.25)\), \(\hat{\Sigma}_{15} = \left(\begin{array}{cc} 0.0057 & 0.0057 \\ 0.0057 & 0.0095 \end{array}\right)\), \(\hat{w}_{16}= 0.2284\), \(\hat{\mu}_{16} = (0.23,0.15)\) and \(\hat{\Sigma}_{16} = \left(\begin{array}{cc} 0.0049 & 0.0034 \\ 0.0034 & 0.0055 \end{array}\right)\).
- Model 6: mixture of six bivariate normal components with estimated parameters \(\hat{w}_{17} = 0.2478\), \(\hat{\mu}_{17} = (0.12,0.28)\), \(\hat{\Sigma}_{17} = \left(\begin{array}{cc} 0.0036 & 0.0044 \\ 0.0044 & 0.0074 \end{array}\right)\), \(\hat{w}_{18}= 0.0883\), \(\hat{\mu}_{18}=(0.25,0.29)\), \(\hat{\Sigma}_{18} = \left(\begin{array}{cc} 0.0061 & 0.0067 \\ 0.0067 & 0.0108 \end{array}\right)\), \(\hat{w}_{19} = 0.2527\), \(\hat{\mu}_{19} = (0.33,0.34)\), \(\hat{\Sigma}_{19} = \left(\begin{array}{cc} 0.0054 & 0.0059 \\ 0.0059 & 0.0083 \end{array}\right)\), \(\hat{w}_{20}= 0.1372\), \(\hat{\mu}_{20}=(0.41,0.16)\), \(\hat{\Sigma}_{20} = \left(\begin{array}{cc} 0.0039 & 0.0016 \\ 0.0016 & 0.0092 \end{array}\right)\), \(\hat{w}_{21}= 0.1943\), \(\hat{\mu}_{21}=(0.23,0.16)\), \(\hat{\Sigma}_{21} = \left(\begin{array}{cc} 0.0046 & 0.0042 \\ 0.0042 & 0.0067 \end{array}\right)\), \(\hat{w}_{22}= 0.0797\), \(\hat{\mu}_{22} = (0.24,0.20)\) and \(\hat{\Sigma}_{22} = \left(\begin{array}{cc} 0.0055 & 0.0059 \\ 0.0059 & 0.0090 \end{array}\right)\).
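Each fitted candidate model above is a finite mixture of bivariate normals, so its density at any point is a weighted sum of component densities. The following is a minimal sketch of evaluating the fitted Model 2 density from the estimates listed above; the helper name `mixture_pdf` and the use of `scipy.stats.multivariate_normal` are illustrative choices, not part of the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Estimated parameters of Model 2 (two-component mixture) from Appendix A.1.
weights = [0.598, 0.402]
means = [np.array([0.21, 0.17]), np.array([0.32, 0.36])]
covs = [np.array([[0.0156, -0.0032], [-0.0032, 0.0068]]),
        np.array([[0.0087,  0.0007], [ 0.0007, 0.0056]])]

def mixture_pdf(x):
    """Density of the fitted bivariate normal mixture at point x."""
    return sum(w * multivariate_normal(mean=m, cov=S).pdf(x)
               for w, m, S in zip(weights, means, covs))

# Evaluate near the first component's centre and far out in the unit square.
print(mixture_pdf([0.21, 0.17]), mixture_pdf([0.9, 0.9]))
```

The same pattern extends to Models 3 through 6 by appending the corresponding weights, means, and covariance matrices.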
Avlogiaris, G., Micheas, A.C. & Zografos, K. A Criterion for Local Model Selection. Sankhya A 81, 406–444 (2019). https://doi.org/10.1007/s13171-018-0126-x
Keywords and phrases.
- Model selection
- AIC
- Local divergence information criterion
- Local model selection criterion
- Local expected overall discrepancy
- Local BHHJ power divergence
- Mixture models
- Point process theory