11.1 Statistical Decision

When we draw an inference after applying statistical tests to observations from a single sample drawn from a population, such a decision is termed a “statistical decision.”

11.2 Statistical Hypothesis (Null Hypothesis)

To take a statistical decision, we must make certain assumptions about the population involved. Such assumptions may prove to be true or false. These assumptions are called hypotheses.

We usually frame the null hypothesis with the intention of rejecting it. Suppose there are two treatments, “A” and “B.” To show that one treatment is more effective than the other, we frame the null hypothesis (H₀) that there is no difference in the effects of the two treatments. Under H₀, the observed differences may only be due to sample fluctuations. Any hypothesis that differs from H₀ is called an alternative hypothesis.

11.3 Test of Hypothesis and Significance

After framing a null hypothesis, we determine the difference between the observed value in the random sample and the population value expected under the null hypothesis. If this difference is markedly larger than can be accounted for by chance fluctuations, according to the theory of sampling distributions, we reject the assumed hypothesis and declare that the difference is significant.

The procedures laid down to decide whether to accept or reject a hypothesis, or to determine whether an observed sample differs significantly from expected results, are called tests of hypothesis, tests of significance, or rules of statistical decision.

11.4 Type I and Type II Errors

If we reject H₀ when it is in fact true, we commit an error. This error is called a “Type I” error. On the other hand, if we accept H₀ when it should be rejected, we again commit an error. This type of error is called a “Type II” error.

For a good statistical decision, both types of errors should be minimized. This is a difficult task because, for a fixed sample size, reducing one type of error tends to increase the other. In practice, one type of error may be more serious than the other, so a compromise is made that limits the more serious error. A deliberate effort to increase the sample size is the way to limit both types of errors.

11.5 Level of Significance

In testing a hypothesis, the maximum probability of a “Type I” error that we are willing to accept is called the level of significance of the test. This probability is often denoted by α and is generally specified before a sample is drawn.

In practice, the level of significance chosen is usually 0.05 (5% level of significance). In designing a test at this level, there are only 5 chances in 100 of rejecting H₀ when it is actually true. In other words, we are 95% confident that we have made the right decision when we reject the null hypothesis.

11.6 Tests Involving Normal Distribution

If observations (X) are normally distributed with mean μ and variance σ², then the “standard normal variate” \( Z=\frac{X-\mu }{\sigma } \) is also normally distributed, with mean 0 and variance 1.

Under the null hypothesis (H₀), the Z-score of the sample statistic will lie between −1.96 and 1.96 in 95 samples out of 100. Hence, we are 95% confident that the Z-score will lie in this region if the hypothesis is true.

However, if a single sample drawn at random has a Z-score lying outside the region −1.96 to 1.96, we would conclude that such an event could happen in only 5 cases out of 100 if the given hypothesis were true. We would then say that the Z-score differs significantly from what is expected under the hypothesis, and the null hypothesis is rejected. This is illustrated graphically in Fig. 11.1, in which the shaded areas lie beyond −1.96 and 1.96.

Fig. 11.1 Normal distribution curve showing shaded areas for the variate beyond −1.96 and 1.96 in a two-tailed test

The shaded area (5%) is the level of significance of the test. It represents the probability of our being wrong in rejecting H₀. If the Z-score falls in the shaded area, the critical region, we reject the null hypothesis at the 5% level of significance. This means that the Z-score of the given sample statistic is significant at the 0.05 level of significance (p < 0.05).
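
As a minimal sketch of this decision rule (not from the text), the Python snippet below tests a hypothetical sample mean against a hypothetical population mean at the 5% level; the values of x_bar, mu, sigma, and n are assumptions, and σ/√n is used because the Z-score of a sample mean, rather than of a single observation, is being tested.

```python
# Minimal sketch of the two-tailed Z-test decision at the 5% level.
# x_bar, mu, sigma and n are hypothetical values, not taken from the text.
from scipy.stats import norm

x_bar, mu, sigma, n = 52.0, 50.0, 10.0, 100
z = (x_bar - mu) / (sigma / n ** 0.5)     # Z-score of the sample mean
p_two_tailed = 2 * norm.sf(abs(z))        # area in both shaded tails

if abs(z) > 1.96:                         # critical region at the 5% level
    print(f"Z = {z:.2f}, p = {p_two_tailed:.4f}: reject H0 at the 5% level")
else:
    print(f"Z = {z:.2f}, p = {p_two_tailed:.4f}: do not reject H0")
```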

11.7 Rules of Statistical Decision

  1. Reject the null hypothesis at the 5% level of significance (p < 0.05) if the Z-score of the sample statistic lies outside the range −1.96 to 1.96.

  2. Accept the null hypothesis otherwise.

  3. Other levels of significance could be 1% (p < 0.01) or 0.1% (p < 0.001).

11.8 One-Tailed or Two-Tailed Tests

Figure 11.1 shows the critical region for rejection of the null hypothesis in both extreme tails. Tests of this type are therefore called two-tailed tests. Sometimes we may be interested in extreme values on only one side of the mean (i.e., in one tail of the distribution). For example, if we are interested in testing the hypothesis that one process is better than the other, rather than testing whether it is better or worse than the other, we apply a one-tailed test. In such cases, the critical region for rejection of the null hypothesis (H₀) lies on only one side of the distribution.

Critical values of the standard normal variate “Z” for both one-tailed and two-tailed tests at the 5% (p < 0.05), 1% (p < 0.01), and 0.1% (p < 0.001) levels of significance are given in Table 11.1.

Table 11.1 Critical values of the standard normal variate “Z”
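
The critical values summarized in Table 11.1 can be reproduced from the inverse of the standard normal distribution function; a short sketch (assuming scipy is available) is given below.

```python
# Sketch reproducing the critical Z values of Table 11.1 with scipy's
# inverse normal CDF (percent-point function).
from scipy.stats import norm

for alpha in (0.05, 0.01, 0.001):
    z_two = norm.ppf(1 - alpha / 2)   # two-tailed critical value
    z_one = norm.ppf(1 - alpha)       # one-tailed critical value
    print(f"alpha = {alpha}: two-tailed ±{z_two:.3f}, one-tailed {z_one:.3f}")
```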

11.9 Student’s “t”-Distribution

When samples are large (n > 30), the sampling distribution is approximately normal. In the case of small samples (n ≤ 30), the distribution cannot be considered normal; it becomes flatter, with heavier tails, as the sample size decreases. The “small sampling theory” that deals with this situation is applicable to both small and large samples. It is also known as “exact sampling theory.”

$$ \mathrm{Statistics}:t=\frac{\left|\overline{x}-\mu \right|}{\raisebox{1ex}{$s$}\!\left/ \!\raisebox{-1ex}{$\sqrt{n}$}\right.} $$

Suppose samples of size “n” are drawn from a normal or approximately normal population with population mean “μ” and standard deviation σ. For each sample, if we compute “t” using the sample mean “\( \overline{X} \)” and standard deviation “s,” we obtain a sampling distribution for “t.” This distribution is given by the formula

$$ Y=\frac{Y_{\mathrm{o}}}{{\left(1+\frac{t^2}{n-1}\right)}^{\raisebox{1ex}{$n$}\!\left/ \!\raisebox{-1ex}{$2$}\right.}} $$

wherein Y o is a constant depending on “n” such that the total area under the curve is 1.

This distribution is known as Student’s “t”-distribution. This was postulated and published by Gosset under the pseudonym “Student” during the early part of the twentieth century.

For large sample sizes (n ≥ 30), this curve closely approximates the “standard normal curve”:

$$ Y=\frac{1}{\sqrt{2\pi }}\cdot {e}^{-\frac{t^2}{2}}. $$
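
The following sketch (again assuming scipy is available) illustrates this approximation numerically by comparing the density of the t-distribution with n − 1 degrees of freedom to the standard normal density at a few points; the chosen sample sizes are arbitrary.

```python
# Sketch comparing the Student's t density (df = n - 1) with the standard
# normal density; the gap shrinks as n grows.
import numpy as np
from scipy.stats import norm, t

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
for n in (5, 30, 1000):
    df = n - 1
    max_gap = np.max(np.abs(t.pdf(x, df) - norm.pdf(x)))
    print(f"n = {n:4d}: largest density difference from normal = {max_gap:.4f}")
```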

11.10 Confidence Intervals

Using the table of t-distribution, we can define 95% and 99% confidence intervals. By doing so, we will be able to estimate the population parameter (μ) within specified limits.

If −t 0.025 and t 0.025 are the values of the “t”-distribution for which 2.5% of the area lies in each tail of the curve, then the 95% confidence interval for t is

$$ -{t}_{0.025}<\frac{\overline{X}-\mu }{\raisebox{1ex}{$s$}\!\left/ \!\raisebox{-1ex}{$\sqrt{n}$}\right.}<{t}_{0.025} $$

Therefore:

$$ -{t}_{0.025}\times \frac{s}{\sqrt{n}}<\overline{X}-\mu <{t}_{0.025}\times \frac{s}{\sqrt{n}}\kern0.5em \mathrm{or} $$
$$ \overline{X}-{t}_{0.025}\times \frac{s}{\sqrt{n}}<\mu <\overline{X}+{t}_{0.025}\times \frac{s}{\sqrt{n}} $$

Hence, μ lies in the interval given above with 95% confidence (i.e., with probability 0.95). Here t 0.025 represents the 97.5th percentile value, while −t 0.025 represents the 2.5th percentile value. In general, we can represent the confidence limits for the population mean by

$$ \overline{X}\pm {t}_{\mathrm{c}}\times \frac{s}{\sqrt{n}} $$

where t c is the critical value or confidence coefficient, which depends on the level of confidence desired and the sample size.
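
As an illustrative sketch, the snippet below computes the 95% confidence limits \( \overline{X}\pm {t}_{\mathrm{c}}\times \frac{s}{\sqrt{n}} \) for hypothetical summary statistics; the values of x_bar, s, and n are assumptions, not taken from a table.

```python
# Sketch of the 95% confidence limits for the population mean using the
# t critical value t_{0.025} with n - 1 degrees of freedom.
from scipy.stats import t

x_bar, s, n = 68.3, 2.214, 10     # hypothetical sample summary
t_c = t.ppf(0.975, n - 1)         # critical value (97.5th percentile)
half_width = t_c * s / n ** 0.5
print(f"95% CI for mu: {x_bar - half_width:.2f} to {x_bar + half_width:.2f}")
```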

11.11 Applications of Student’s t-Test

Student’s t-test is applied in a variety of ways for statistical analysis as listed below:

  1. Comparison of sample with population

  2. Comparison of sample with sample

  3. Comparison of sample with sample by the “paired t-test”

11.11.1 Comparison of Sample with Population

Example 1

Ten individuals are chosen at random from a population, and their heights are measured in inches as 65, 66, 66, 67, 68, 69, 70, 70, 71, and 71. In the light of these data, discuss the suggestion that the mean height in the population is 67 inches.

The following conditions are fulfilled:

  1. The population distribution of height is approximately normal.

  2. The sample has been drawn at random from the population.

Suggestions

Null hypothesis (H₀): μ = 67″

Alternative hypothesis (H₁): μ ≠ 67″

Solution

  • \( \overline{X}=\frac{\Sigma X}{N}=\frac{683}{10}={68.3}^{{\prime\prime} } \)

  • \( s=\sqrt{\frac{\Sigma {\left(X-\overline{X}\right)}^2}{n-1}}=\sqrt{\frac{44.1}{9}}=\sqrt{4.9}=2.214 \)

  • \( t=\frac{\left|\overline{x}-\mu \right|}{\raisebox{1ex}{$s$}\!\left/ \!\raisebox{-1ex}{$\sqrt{n}$}\right.}=\frac{68.3-67}{2.214}\times \sqrt{10}=\frac{1.3}{2.214}\times \sqrt{10}=1.855 \)

    • Degrees of freedom (df) = 10−1 = 9

    • Referring to the table values of t-distribution for 9 df, we get t 0.05 = 2.262.

    • Here, t < t 0.05; therefore p > 0.05. Hence, the result is not significant at the 5% level of significance.

Conclusion

We fail to reject the null hypothesis (H₀). Hence, we accept, with 95% confidence, that the mean height of the population from which this random sample has been drawn may be equal to 67″.
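
This result can be checked with scipy’s one-sample t-test, as in the sketch below, which uses the ten heights given in Example 1; the reported t and p agree with the hand calculation.

```python
# Sketch checking Example 1 with a one-sample t-test.
from scipy.stats import ttest_1samp

heights = [65, 66, 66, 67, 68, 69, 70, 70, 71, 71]   # data from Example 1
result = ttest_1samp(heights, popmean=67)            # H0: mu = 67 inches
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
# t is about 1.86 with p about 0.10 (> 0.05), so H0 is not rejected.
```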

11.11.2 Comparison of Sample with Sample

Requirements for comparison of two samples:

Sample I: mean \( {\overline{X}}_1=\frac{\Sigma X}{n_1} \); standard deviation \( {s}_1=\sqrt{\frac{\Sigma {\left(X-{\overline{X}}_1\right)}^2}{n_1-1}} \)

Sample II: mean \( {\overline{X}}_2=\frac{\Sigma X}{n_2} \); standard deviation \( {s}_2=\sqrt{\frac{\Sigma {\left(X-{\overline{X}}_2\right)}^2}{n_2-1}} \)

If both the samples are drawn from the same population, then the estimates s 1 and s 2 may be pooled, to get a better estimate of the population’s “standard deviation” (s p). The pooled estimate is worked out by the following formula:

$$ {s}_p=\sqrt{\frac{\Sigma {\left(X-{\overline{X}}_1\right)}^2+\Sigma {\left(X-{\overline{X}}_2\right)}^2}{n_1+{n}_2-2}} $$
$$ {\left({s}_p\right)}^2=\frac{\Sigma {\left(X-{\overline{X}}_1\right)}^2+\Sigma {\left(X-{\overline{X}}_2\right)}^2}{n_1+{n}_2-2} $$

Degrees of freedom (df) = n 1 + n 2 − 2

$$ {\displaystyle \begin{array}{c}\mathrm{Now}:t=\frac{\left|{\overline{X}}_1-{\overline{X}}_2\right|}{{s}_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\kern0.5em \mathrm{or}\\ {}t=\frac{\left|{\overline{X}}_1-{\overline{X}}_2\right|}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\end{array}} $$

We have to test whether this value of “t” is >t 0.05 or <t 0.05 for the said degrees of freedom (df). If t > t 0.05, then we reject the null hypothesis (H₀). If t < t 0.05, then we accept the null hypothesis.
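
A minimal sketch of this two-sample comparison is given below; the values in group_a and group_b are hypothetical, and scipy’s ttest_ind with its default pooled-variance setting corresponds to the pooled form of the statistic with df = n1 + n2 − 2.

```python
# Sketch of the two-sample (pooled-variance) t-test on hypothetical data.
from scipy.stats import ttest_ind

group_a = [12.1, 11.8, 13.0, 12.5, 11.9, 12.7]   # hypothetical measurements
group_b = [10.9, 11.2, 10.5, 11.0, 11.4, 10.8]
result = ttest_ind(group_a, group_b)             # pooled variance by default
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```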

Example 2

An experiment was conducted to assess the effect of a vitamin A-deficient diet. Out of 20 inbred rats, 10 were fed a normal diet, and the other 10 were fed a vitamin A-deficient diet. The amount of vitamin A in the serum of the rats of both groups was determined, and the mean and standard deviation were worked out as shown in Table 11.2.

  (a) Find out whether the mean value (\( {\overline{X}}_2 \)) of the rats fed on the vitamin A-deficient diet is the same as the mean value (\( {\overline{X}}_1 \)) of those fed on the normal diet.

  (b) If there is a difference, determine whether it is due to sampling variation or to the deficiency of vitamin A.

Table 11.2 Mean and standard deviation of vitamin A levels in two groups of inbred rats

Solution

  (a) The absolute difference between the means of the two groups: \( \left|{\overline{X}}_1-{\overline{X}}_2\right| \) = 3375 − 2570 = 805 IU.

  (b) Now, 805 IU is the absolute difference of the means \( \left(\left|{\overline{X}}_1-{\overline{X}}_2\right|\right) \) of the two samples of sizes n 1 and n 2 drawn from normal populations.

Under the null hypothesis, the statistical value of “t” for the sampling distributions is worked out as

$$ t=\frac{\left|{\overline{X}}_1-{\overline{X}}_2\right|}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}=\frac{805}{260}=3.096 $$

df = 10 + 10 − 2 = 18

We have to find out whether this value of “t” is >t 0.05 or <t 0.05 at the said degrees of freedom. If the value of “t” is >t 0.05, then we reject the null hypothesis, and if it is <t 0.05, we accept the null hypothesis. When the value of “t” is >t 0.05 at df = 18, we conclude that p < 0.05.

In the example cited above, the value of “t” is 3.096, which is even greater than t 0.01, so p < 0.01. Hence, the null hypothesis is rejected, and the difference is attributed to the deficiency of vitamin A.
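
As a quick check, the sketch below converts the observed t = 3.096 with df = 18 into a two-tailed p-value, using only the summary figures quoted above.

```python
# Sketch computing the two-tailed p-value for the observed t with df = 18.
from scipy.stats import t

t_obs, df = 3.096, 18
p_two_tailed = 2 * t.sf(t_obs, df)
print(f"p = {p_two_tailed:.4f}")   # p < 0.01, consistent with the conclusion above
```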

11.11.3 Comparison of Sample with Sample by the “Paired t-Test”

The “paired t-test” is applied when two sets of observations on the same subjects/patients are to be compared. The data could be sets of clinical, biological, or biochemical investigations on a group of patients.

Example 3

In a group of nine hypertensive patients, systolic blood pressure was recorded in mm of Hg before and after treatment, as shown in Table 11.3. Test the significance of the effect of treatment on the patients or accept the null hypothesis (H₀).

Table 11.3 Systolic blood pressure (BP) in nine patients before and after the treatment

Solution

Formula for the paired t-test:

$$ t=\frac{\left|\overline{d}\right|}{\mathrm{S}.\mathrm{E}.}\kern0.5em \mathrm{OR}\kern0.5em t=\frac{\left|\overline{d}\right|}{\frac{\mathrm{S}\mathrm{D}}{\surd n}}\kern0.5em \mathrm{OR}\kern0.5em t=\frac{\left|\overline{d}\right|}{\sqrt{\frac{s^2}{n}}} $$
$$ \mathrm{Mean}\kern0.5em \mathrm{difference}\kern0.5em \left(\overline{d}\right)=\frac{103}{9}=11.4. $$
$$ {\displaystyle \begin{array}{c}\mathrm{Square}\kern0.5em \mathrm{of}\kern0.5em \mathrm{standard}\kern0.5em \mathrm{deviation}:{s}^2=\frac{\Sigma {d}^2-\frac{{\left(\Sigma d\right)}^2}{n}}{n-1}=\frac{2415-1178.8}{8}=\frac{1236.2}{8}\\ {}=154.5.\end{array}} $$
$$ \mathrm{Standard}\kern0.5em \mathrm{deviation}:s=\sqrt{154.5}=12.43. $$
$$ \mathrm{Now}:t=\frac{\left|\overline{d}\right|}{\frac{\mathrm{SD}}{\surd n}}=\frac{\left|\overline{d}\right|}{s}\times \sqrt{n}=\frac{11.4}{12.4}\times \sqrt{9}=\frac{11.4}{12.4}\times 3=2.76. $$

Degrees of freedom (df) = 9 − 1 = 8.

Table values of “t” at df = 8 are t 0.05 = 2.306 and t 0.01 = 3.355.

Since the computed t-value (2.76) for the given data is greater than t 0.05 (2.306), we have p < 0.05, and we reject the null hypothesis (H₀). In this case, however, t (2.76) is less than t 0.01 (3.355). Hence, we conclude that the difference due to treatment is significant at the 5% level (p < 0.05) but not at the 1% level.
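
The arithmetic of this example can be reproduced from the summary figures quoted above (Σd = 103, Σd² = 2415, n = 9), as in the sketch below; the two-tailed p-value confirms significance at the 5% but not the 1% level.

```python
# Sketch reproducing the paired-test calculation from summary figures.
from math import sqrt
from scipy.stats import t

n, sum_d, sum_d2 = 9, 103, 2415
d_bar = sum_d / n                                    # mean difference
s2 = (sum_d2 - sum_d ** 2 / n) / (n - 1)             # variance of the differences
t_value = d_bar / sqrt(s2 / n)
p_two_tailed = 2 * t.sf(abs(t_value), n - 1)
print(f"t = {t_value:.2f}, p = {p_two_tailed:.3f}")  # t about 2.76, 0.01 < p < 0.05
```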