Abstract
The emergence of the magic number 2 in recent statistical literature is explained by adopting the predictive point of view of statistics with entropy as the basic criterion of the goodness of a fitted model. The historical development of the concept of entropy is reviewed, and its relation to statistics is explained by examples. The importance of the entropy maximization principle as a basis of the unification of conventional and Bayesian statistics is discussed.
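The "magic number 2" referred to above is the factor multiplying the parameter count in Akaike's information criterion, AIC = −2 (maximized log-likelihood) + 2 (number of free parameters), which arises from the entropy-based (Kullback-Leibler) measure of a fitted model's predictive goodness. As a minimal illustrative sketch (the model names and likelihood values below are hypothetical, not from the paper):

```python
def aic(max_log_likelihood: float, num_params: int) -> float:
    # AIC = -2 * (maximized log-likelihood) + 2 * (number of free parameters).
    # The factor 2 on num_params is the "magic number 2" of the abstract.
    return -2.0 * max_log_likelihood + 2.0 * num_params

# Hypothetical candidate fits: richer models gain a little likelihood
# but pay the 2k penalty, so the smallest-AIC model is preferred.
candidates = {
    "model_k2": (-120.4, 2),
    "model_k3": (-119.9, 3),
    "model_k6": (-119.5, 6),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # here: "model_k2"
```

The criterion trades off fit against complexity: an extra parameter must raise the maximized log-likelihood by more than 1 to lower the AIC.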
Copyright information
© 1985 Springer-Verlag New York Inc.
Cite this paper
Akaike, H. (1985). Prediction and Entropy. In: Atkinson, A.C., Fienberg, S.E. (eds) A Celebration of Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-8560-8_1
DOI: https://doi.org/10.1007/978-1-4613-8560-8_1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4613-8562-2
Online ISBN: 978-1-4613-8560-8