
2.1 The Setup

Assume that we have observed data D = x, which is the result of a random experiment X (or can be approximated as such). The data are then modelled using:

  1. A sample space \(\mathcal{X}\) for the observed value of x

  2. A probability density function f(x; θ) for X at x

  3. A parameter space \(\Theta \) for θ

The inference problem is to use x to infer properties of θ.

2.2 Approaches to Statistical Inference

The major approaches to statistical inference are:

  1. Frequentist or classical

  2. Bayesian

  3. Likelihood

2.3 Types of Statistical Inference

There are four major types of statistical inference:

  1. Estimation: Select one value of θ, the estimate, to be reported. Some measure of reliability is assumed to be reported as well.

  2. Testing: Compare two values (or sets of values) of θ and choose one of them as better.

  3. Interval Estimation: Select a region of θ values as being consistent, in some sense, with the observed data.

  4. Prediction: Use the observed data to predict a new result of the experiment.

Note that the first three inferences can be defined as functions from the sample space to subsets of the parameter space. Thus estimation of θ is achieved by defining

$$\displaystyle{\widehat{\theta }:\; \mathcal{X}\rightarrow \Theta }$$

Then the observation of x results in \(\widehat{\theta }(x)\) as the estimated value of θ for the observed data. Similarly hypothesis testing maps \(\mathcal{X}\) into \(\{\Theta _{0},\Theta _{1}\}\) and interval estimation maps \(\mathcal{X}\) into subsets (intervals) of \(\Theta \).
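For example (a standard illustration, not drawn from the text), if x is a random sample of size n from a normal distribution with unknown mean θ and known variance \(\sigma ^{2}\), then

$$\displaystyle{\widehat{\theta }(x) =\bar{ x}\qquad \mbox{ and}\qquad C(x) = \left [\bar{x} - 1.96\,\frac{\sigma } {\sqrt{n}},\;\bar{x} + 1.96\,\frac{\sigma } {\sqrt{n}}\right ]}$$

are an estimate and a 95 % interval estimate of θ; each is a function from \(\mathcal{X}\) into \(\Theta \) or into subsets of \(\Theta \).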

2.4 Statistics and Combinants

2.4.1 Statistics and Sampling Distributions

Since inferences are defined by functions on the sample space, it is convenient to have some nomenclature.

Definition 2.4.1.

A statistic is a real or vector-valued function defined on the sample space of a statistical model.

The sample mean, sample variance, sample median, and sample correlation are all statistics.

Definition 2.4.2.

The probability distribution of a statistic is called its sampling distribution.

A major problem in standard or frequentist statistical theory is the determination of sampling distributions:

  1. Exactly (using probability concepts)

  2. Approximately (using large sample results)

  3. By simulation (using R or similar statistical software; see the sketch below)
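A minimal R sketch of the third approach, using a hypothetical Poisson(θ) model (the parameter value, sample size, and number of replications below are arbitrary choices for illustration): the empirical distribution of the simulated sample means approximates the sampling distribution of the statistic.

  set.seed(1)        # for reproducibility
  theta <- 5         # hypothetical parameter value
  n     <- 25        # sample size
  B     <- 10000     # number of simulated samples

  # Draw B samples of size n and compute the statistic (the sample mean) for each
  xbar <- replicate(B, mean(rpois(n, lambda = theta)))

  # The histogram of 'xbar' approximates the sampling distribution of the mean
  hist(xbar, breaks = 40, main = "Simulated sampling distribution of the sample mean")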

2.4.2 Combinants

Definition 2.4.3.

A combinant is a real or vector-valued function defined on the sample space and the parameter space such that for each fixed θ it is a statistic.

Thus a combinant is defined for pairs (x, θ) where x is in the sample space and θ is in the parameter space. For each θ it is required to be a statistic.

The density function f(x; θ) is a combinant, as are the likelihood and functions of the likelihood.

Definition 2.4.4.

If f(x; θ) is the density of x, the score function is the combinant defined by

$$\displaystyle{s(\theta;x) = \frac{\partial \ln [f(x;\theta )]} {\partial \theta } }$$

(This assumes that f(x; θ) is differentiable with respect to θ.)

Definition 2.4.5.

The score equation is the equation (in θ) defined by

$$\displaystyle{s(\theta;x) = \frac{\partial \ln [f(x;\theta )]} {\partial \theta } = 0}$$

The solution to this equation gives the maximum likelihood estimate (MLE) of θ.
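As a concrete illustration (not from the text), for a random sample \(x_{1},x_{2},\ldots,x_{n}\) from a Poisson distribution with mean θ the score is \(s(\theta;\mathbf{x}) =\sum _{i}x_{i}/\theta - n\), so the score equation has the closed-form solution \(\widehat{\theta }=\bar{ x}\). The following minimal R sketch, using hypothetical data, solves the score equation numerically and checks the answer against the closed form.

  set.seed(2)
  x <- rpois(25, lambda = 3)                 # hypothetical data

  # Score function for the Poisson model: derivative of the log-density in theta
  score <- function(theta) sum(x) / theta - length(x)

  # Solve the score equation numerically and compare with the closed form
  mle <- uniroot(score, interval = c(0.01, 100))$root
  c(numerical = mle, closed_form = mean(x))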

Combinants are used to determine estimates, interval estimates, and tests as well as to investigate the frequency properties of likelihood-based quantities.

2.4.3 Frequentist Inference

In the frequentist paradigm inference is the process of connecting the observed data and the inference (statements about the parameters) using the sampling distribution of a statistic. Note that the sampling distribution is determined by the density function f(x; θ).

2.4.4 Bayesian Inference

In the Bayesian paradigm inference is the process of connecting the observed data and the inference (statements about the parameters) using the posterior distribution of the parameter values. The posterior distribution is determined by the model density and the prior distribution of θ using Bayes' theorem (this implicitly treats f(x; θ) as the conditional density f(x | θ) of X given θ):

$$\displaystyle{p(\theta \vert x) = \frac{f(x;\theta )\mbox{ prior}(\theta )} {f(x)} }$$

where f(x) is the marginal density of X at x:

$$\displaystyle{f(x) =\int _{\Theta }f(x;\theta )\mbox{ prior}(\theta )d\theta }$$
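A minimal R sketch (the Poisson model, the Gamma(2, 1) prior, and the grid of θ values are all hypothetical choices for illustration) of computing the posterior by evaluating the likelihood and prior on a grid and approximating the integral defining f(x) numerically:

  set.seed(3)
  x    <- rpois(25, lambda = 3)                # hypothetical data
  grid <- seq(0.01, 10, length.out = 1000)     # grid of theta values

  lik   <- sapply(grid, function(th) prod(dpois(x, th)))   # f(x; theta)
  prior <- dgamma(grid, shape = 2, rate = 1)                # prior(theta)

  fx   <- sum(lik * prior) * diff(grid)[1]     # numerical approximation of f(x)
  post <- lik * prior / fx                     # posterior density on the grid

  plot(grid, post, type = "l", xlab = expression(theta), ylab = "posterior density")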

2.4.5 Likelihood Inference

In the likelihood paradigm inference is the process of evaluating the statistical evidence for parameter values provided by the likelihood function.

The statistical evidence for \(\theta _{2}\) vis-à-vis \(\theta _{1}\) is defined by

$$\displaystyle{\mbox{ Ev}(\theta _{2}:\theta _{1};x) = \frac{f(x;\theta _{2})} {f(x;\theta _{1})}}$$

Values for this ratio of 8, 16, and 32 are taken as moderate, strong, and very strong evidence, respectively.

Note that if we define the likelihood of θ as

$$\displaystyle{\mathcal{L}(\theta;x) = \frac{f(x;\theta )} {f(x;\widehat{\theta })}}$$

where \(\widehat{\theta }\) is the maximum likelihood estimate of θ, then the statistical evidence for \(\theta _{2}\) versus \(\theta _{1}\) can be expressed as

$$\displaystyle{\mbox{ Ev}(\theta _{2}:\theta _{1};x) = \frac{\mathcal{L}(\theta _{2};x)} {\mathcal{L}(\theta _{1};x)}}$$

and the posterior of θ can then be expressed as

$$\displaystyle{p(\theta \vert x) \propto \mathcal{L}(\theta;x)\mbox{ prior}(\theta )}$$

i.e., the posterior is proportional to the product of the likelihood and the prior.
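A minimal R sketch (the Poisson model and the comparison \(\theta _{2} = 4\) versus \(\theta _{1} = 3\) are hypothetical choices for illustration) of computing \(\mbox{ Ev}(\theta _{2}:\theta _{1};x)\) through the relative likelihood and comparing it with the benchmarks above:

  set.seed(4)
  x <- rpois(25, lambda = 3)                       # hypothetical data

  loglik <- function(th) sum(dpois(x, th, log = TRUE))
  th.hat <- mean(x)                                # MLE of the Poisson mean

  # Relative likelihood L(theta; x) = f(x; theta) / f(x; theta-hat)
  rel.lik <- function(th) exp(loglik(th) - loglik(th.hat))

  Ev <- rel.lik(4) / rel.lik(3)    # Ev(theta_2 : theta_1; x)
  Ev                               # compare with the 8, 16, 32 benchmarks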

2.5 Exercises

As pointed out in the text, if f(x; θ) is the density function of the observed data \((x_{1},x_{2},\ldots,x_{n})\) and θ is the parameter, then

  (a) The likelihood, \(\mathcal{L}(\theta;\mathbf{x})\), is

    $$\displaystyle{\mathcal{L}(\theta;\mathbf{x}) = \frac{f(\mathbf{x};\theta )} {f(\mathbf{x};\widehat{\theta })}}$$

    where \(\widehat{\theta }\) maximizes f(x; θ) and is called the maximum likelihood estimate of θ.

  (b) The score function is

    $$\displaystyle{\frac{\partial \ln [f(\mathbf{x};\theta )]} {\partial \theta } }$$

  (c) The observed Fisher information is

    $$\displaystyle{J(\theta ) = -\:\frac{\partial ^{2}\ln [f(\mathbf{x};\theta )]} {\partial \theta ^{2}} }$$

    evaluated at \(\theta =\widehat{\theta }\).

  (d) The expected Fisher information, I(θ), is the expected value of J(θ), i.e.,

    $$\displaystyle{I(\theta ) = -\mathbb{E}\left \{\frac{\partial ^{2}\ln [f(\mathbf{x};\theta )]} {\partial \theta ^{2}} \right \}}$$
  1. Find the likelihood, the maximum likelihood estimate, the score function, and the observed and expected Fisher information when \(x_{1},x_{2},\ldots,x_{n}\) represent the results of a random sample from

    (i) A normal distribution with expected value θ and known variance \(\sigma ^{2}\)

    (ii) A Poisson distribution with parameter θ

    (iii) A Gamma distribution with known parameter α and unknown parameter θ

  2. For each of the problems in (1) generate a random sample of size 25, i.e.:

    (i) Take \(\sigma ^{2} = 1\) and θ = 3.

    (ii) Take θ = 5.

    (iii) Take α = 3 and θ = 2.

    For (i)–(iii) plot the likelihood functions. (A minimal R sketch for part (i) is given after the exercises.)

  3. Suppose that \(Y _{i}\) for i = 1, 2, …, n are independent, each normal with expected value \(\beta x_{i}\) and variance \(\sigma ^{2}\), where \(\sigma ^{2}\) is known and the \(x_{i}\) are known constants.

    (i) Show that the joint density is

      $$\displaystyle{f(\mathbf{y};\beta ) = (2\pi \sigma ^{2})^{-n/2}\exp \left \{-\frac{1} {2\sigma ^{2}}\sum _{i=1}^{n}(y_{i} -\beta \:x_{i})^{2}\right \}}$$

    (ii) Find the score function.

    (iii) Show that the maximum likelihood estimate for β is

      $$\displaystyle{\widehat{\beta }= \frac{\sum _{i=1}^{n}x_{i}y_{i}} {\sum _{i=1}^{n}x_{i}^{2}} }$$

    (iv) Find the observed Fisher information.

    (v) Using (iii) find the likelihood for β.

    (vi) Find the sampling distribution of \(\widehat{\beta }\). Remember that the sum of independent normal random variables is also normal.

    (vii) Show that the sampling distribution of \(-2\ln [\mathcal{L}(\beta;\mathbf{y})]\) is chi-square with 1 degree of freedom.
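A minimal R sketch for Exercise 2, part (i), as referenced above (a starting point only; the plotting grid around the MLE is an arbitrary choice):

  set.seed(5)
  x <- rnorm(25, mean = 3, sd = 1)       # random sample for part (i): sigma^2 = 1, theta = 3

  loglik <- function(th) sum(dnorm(x, mean = th, sd = 1, log = TRUE))
  th.hat <- mean(x)                      # MLE of theta

  # Relative likelihood of theta on a grid around the MLE
  grid    <- seq(th.hat - 1.5, th.hat + 1.5, length.out = 400)
  rel.lik <- sapply(grid, function(th) exp(loglik(th) - loglik(th.hat)))

  plot(grid, rel.lik, type = "l", xlab = expression(theta),
       ylab = "relative likelihood")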