
1 Introduction

The Pearson χ² goodness-of-fit test is very popular in various applications, including the investigation of distributions of measurement error in problems of metrological support.

Correct usage of the Pearson χ² test for composite hypotheses (including testing normality) requires that the unknown parameters be estimated from the grouped data, since when the parameter estimates are calculated from the original non-grouped sample the distribution of the test statistic differs significantly from the χ² distribution [1, 2]. For this reason, a series of modified χ² tests has been proposed, the best known of which is the Nikulin-Rao-Robson test [3–5]. Moreover, it is necessary to take into account that the power of the Pearson test depends on the number of grouping intervals [6] and on the grouping method used [7].

2 The Pearson χ² Test of Goodness-of-Fit

The procedure for hypothesis testing using χ²-type tests assumes grouping of an original sample \( X_{1} ,\,X_{2} ,\, \ldots ,\,X_{n} \) of size n. The domain of definition of the random variable is divided into k non-overlapping intervals bounded by the points:

$$ x_{0} < x_{1} < \ldots < x_{k - 1} < x_{k} , $$

where \( x_{0} \), \( x_{k} \) are the lower and upper boundaries of the random variable domain. The number of observations \( n_{i} \) in the i-th interval is counted in accordance with this partition, and the probability of falling into this interval,

$$ P_{i} (\theta ) = \int\limits_{{x_{i - 1} }}^{{x_{i} }} {f(x,\,\theta )dx} , $$

corresponds to the theoretical distribution law with the density function f(x, θ), where

$$ n = \sum\limits_{i = 1}^{k} {n_{i} } ,\sum\limits_{i = 1}^{k} {P_{i} (\theta )} = 1. $$

The deviations of the relative frequencies \( n_{i} /n \) from the probabilities \( P_{i} (\theta ) \) form the basis of the statistics used in χ²-type goodness-of-fit tests.

The statistic of the Pearson χ² test is calculated using the formula

$$ X_{n}^{2} = n\sum\limits_{i = 1}^{k} {\frac{{(n_{i} /n - P_{i} (\theta ))^{2} }}{{P_{i} (\theta )}}} . $$
(1)
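As an illustration, the following sketch computes statistic (1) for a simple hypothesis with fixed boundary points. It assumes NumPy/SciPy; the function name pearson_statistic and the equiprobable boundaries in the example are our own illustrative choices, not part of the original exposition.

```python
import numpy as np
from scipy import stats

def pearson_statistic(sample, boundaries, cdf):
    """Pearson statistic (1) for the partition given by `boundaries`.

    `boundaries` holds the inner points x_1 < ... < x_{k-1}; the outer
    points x_0 and x_k are taken as -inf and +inf.  `cdf` is the fully
    specified CDF of the hypothesized law (simple hypothesis H0).
    """
    k = len(boundaries) + 1
    n_i = np.bincount(np.searchsorted(boundaries, sample), minlength=k)
    edges = np.concatenate(([-np.inf], boundaries, [np.inf]))
    p_i = np.diff(cdf(edges))                       # P_i(theta)
    n = len(sample)
    return n * np.sum((n_i / n - p_i) ** 2 / p_i)

# Example: H0 = standard normal, k = 5 equiprobable intervals.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
bnd = stats.norm.ppf([0.2, 0.4, 0.6, 0.8])
X2 = pearson_statistic(x, bnd, stats.norm.cdf)
```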

When a simple hypothesis H 0 is true (i.e., all the parameters of the theoretical law are known), this statistic obeys the \( \chi_{r}^{2} \) distribution with r = k – 1 degrees of freedom as n → ∞. The \( \chi_{r}^{2} \) distribution has the density function

$$ g(s) = \frac{1}{{2^{r/2}\Gamma (r/2)}}s^{r/2 - 1} e^{ - s/2} , $$

where Γ(·) is the Euler gamma function.

The test hypothesis H 0 is not rejected if the achieved significance level (p-value) exceeds a specified level of significance α, i.e., if the following inequality holds:

$$ P\left\{ {X_{n}^{2} > X_{n}^{2*} } \right\} = \frac{1}{{2^{r/2}\Gamma (r/2)}}\int\limits_{{X_{n}^{2*} }}^{\infty } {s^{r/2 - 1} e^{ - s/2} ds} > \alpha , $$

where \( X_{n}^{2*} \) is the statistic calculated in (1).
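A minimal sketch of this decision rule, assuming the value of statistic (1) has already been computed as above (the value X2_obs below is a purely illustrative placeholder):

```python
from scipy import stats

X2_obs = 3.1                               # value of statistic (1), illustrative
k = 5
r = k - 1                                  # degrees of freedom for a simple H0
p_value = stats.chi2.sf(X2_obs, df=r)      # P{X_n^2 > X_n^2*}
reject_H0 = p_value <= 0.1                 # compare with significance level alpha
```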

When testing a composite hypothesis and estimating the parameters by minimizing the statistic \( X_{n}^{2} \) based on the same sample, this statistic asymptotically obeys the \( \chi_{r}^{2} \) distribution with r = k – m – 1 degrees of freedom, where m is the number of estimated parameters.

The statistic \( X_{n}^{2} \) has the same distribution if the parameter estimates are obtained by the maximum likelihood method from grouped data, i.e., by maximizing the likelihood function with respect to θ:

$$ L(\theta ) = \gamma \prod\limits_{i = 1}^{k} {P_{i}^{{n_{i} }} (\theta )} , $$
(2)

where γ is a constant and

$$ P_{i} (\theta ) = \int\limits_{{x_{i - 1} }}^{{x_{i} }} {f(x,\,\theta )dx} $$

is the probability that an observation falls into the i-th interval. This result remains valid for any estimation technique based on grouped data that leads to asymptotically efficient estimates.
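A sketch of parameter estimation from grouped data by maximizing the logarithm of (2), i.e. the sum of n_i ln P_i(θ), for a normal model with fixed boundaries; the helper name grouped_mle_normal and the Nelder-Mead optimizer are illustrative assumptions.

```python
import numpy as np
from scipy import stats, optimize

def grouped_mle_normal(n_i, boundaries):
    """MLE of (mu, sigma) from grouped counts `n_i` by maximizing the
    grouped log-likelihood sum_i n_i * log P_i(theta), cf. (2)."""
    edges = np.concatenate(([-np.inf], boundaries, [np.inf]))

    def neg_loglik(theta):
        mu, sigma = theta
        if sigma <= 0:
            return np.inf
        p = np.diff(stats.norm.cdf(edges, loc=mu, scale=sigma))
        p = np.clip(p, 1e-300, None)                  # guard against log(0)
        return -np.sum(n_i * np.log(p))

    res = optimize.minimize(neg_loglik, x0=np.array([0.0, 1.0]),
                            method="Nelder-Mead")
    return res.x                                       # (mu_hat, sigma_hat)

# Example: counts in k = 5 intervals with equiprobable N(0, 1) boundaries.
bnd = stats.norm.ppf([0.2, 0.4, 0.6, 0.8])
mu_hat, sigma_hat = grouped_mle_normal(np.array([95, 102, 98, 110, 95]), bnd)
```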

If the unknown parameters are estimated by the maximum likelihood method based on non-grouped data, then the Pearson statistic is distributed as the sum of independent terms [1] \( \chi_{k - m - 1}^{2} + \sum\limits_{j = 1}^{m} {\lambda_{j} \xi_{j}^{2} } \), where ξ 1, …, ξ m are standard normal random variables that are independent of each other and of \( \chi_{k - m - 1}^{2} \), and \( \lambda_{1} ,\, \ldots ,\,\lambda_{m} \) are numbers between 0 and 1, the roots of the equation

$$ \left| {(1 - \lambda ){\mathbf{J}}(\theta ) - {\mathbf{J}}_{g} (\theta )} \right| = 0. $$

Here J(θ) is the Fisher information matrix with respect to the non-grouped observations with elements

$$ J(\theta_{l} ,\,\theta_{j} ) = \int {\left( {\frac{{\partial \ln f(x,\,\theta )}}{{\partial \theta_{l} }}\frac{{\partial \ln f(x,\,\theta )}}{{\partial \theta_{j} }}} \right)} f(x,\,\theta )dx; $$

J g (θ) is the Fisher information matrix with respect to the grouped observations, given by

$$ {\mathbf{J}}_{g} (\theta ) = \sum\limits_{i = 1}^{k} {\frac{{\nabla P_{i} (\theta )\nabla^{\tau } P_{i} (\theta )}}{{P_{i} (\theta )}}} . $$

In other words, the distribution of statistic (1) based on maximum likelihood estimates (MLE) calculated from non-grouped data is not the standard χ² distribution and depends, in particular, on the grouping method [2].
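The numbers λ_j can be computed by rewriting |(1 − λ)J(θ) − J_g(θ)| = 0 as the generalized eigenvalue problem (J − J_g)v = λJv. The sketch below assumes both information matrices are already available; the function name and the toy numbers in the example are illustrative.

```python
import numpy as np
from scipy import linalg

def lambda_roots(J, J_g):
    """Roots lambda_1..lambda_m of |(1 - lambda) J - J_g| = 0, obtained as
    the eigenvalues of the generalized problem (J - J_g) v = lambda * J v."""
    lam = linalg.eigh(J - J_g, J, eigvals_only=True)
    return np.clip(lam, 0.0, 1.0)           # theoretically lie in [0, 1]

# Toy illustration with m = 2 parameters (numbers chosen arbitrarily):
J  = np.array([[1.0, 0.0], [0.0, 2.0]])     # full-sample Fisher information
Jg = np.array([[0.9, 0.0], [0.0, 1.7]])     # grouped-data Fisher information
lam = lambda_roots(J, Jg)                   # -> [0.10, 0.15]
```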

3 The Choice of Grouping Intervals

When using chi-squared goodness-of-fit tests, the problem of choosing the boundary points and the number of grouping intervals is always important, as the power of these tests considerably depends on the grouping method used. In the case of complete samples (without censored observations), this problem was investigated in [7–10]. In particular, the power of the Pearson and NRR tests for complete samples was investigated in [11] for various numbers of intervals and grouping methods. As a rule, the partition of the real line into equiprobable intervals (EPG) is not an optimal grouping method. In [12], it was shown for the first time that asymptotically optimal grouping, for which the loss of Fisher information due to grouping is minimized, makes it possible to maximize the power of the Pearson test against close competing hypotheses. For example, it is possible to maximize the determinant of the Fisher information matrix for grouped data J g (θ), i.e., to solve the problem of D-optimal grouping

$$ \mathop {\hbox{max} }\limits_{{x_{0} < x_{1} < \ldots < x_{k - 1} < x_{k} }} {\mkern 1mu} \det {\mkern 1mu} ({\mathbf{J}}_{g} (\theta )). $$
(3)

In the case of the A-optimality criterion, the trace of the information matrix J g (θ) is maximized with respect to the boundary points,

$$ \mathop {\hbox{max} }\limits_{{x_{0} < x_{1} < \ldots < x_{k - 1} < x_{k} }} {\mkern 1mu} {\text{Tr}}{\mkern 1mu} ({\mathbf{J}}_{g} (\theta )), $$
(4)

and the E-optimality criterion maximizes the minimum eigenvalue of the information matrix:

$$ \mathop {\hbox{max} }\limits_{{x_{0} < x_{1} < \ldots < x_{k - 1} < x_{k} }} \,\,{\mkern 1mu} \mathop { \hbox{min} }\limits_{i = 1,2} {\mkern 1mu} {\mkern 1mu} \lambda_{i} ({\mathbf{J}}_{g} (\theta )). $$
(5)
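As an illustration of how problem (3) can be attacked numerically for a normal law, the sketch below builds the grouped-data information matrix from the formulas written out in Sect. 5 and maximizes its determinant over the inner boundary points. The function name, the equiprobable starting point and the Nelder-Mead search are illustrative assumptions, not the method used to compute the published tables; the A- and E-optimal problems (4) and (5) are obtained by replacing the objective with the trace or the minimum eigenvalue of J_g.

```python
import numpy as np
from scipy import stats, optimize

def Jg_normal(t_inner):
    """Grouped-data Fisher matrix J_g for N(0, 1) with standardized inner
    boundary points t_1 < ... < t_{k-1} (formulas of Sect. 5, sigma = 1)."""
    t = np.concatenate(([-np.inf], np.sort(t_inner), [np.inf]))
    f = stats.norm.pdf(t)                   # f(-inf) = f(+inf) = 0
    tf = np.zeros_like(f)
    fin = np.isfinite(t)
    tf[fin] = t[fin] * f[fin]               # t*f(t) -> 0 in the tails
    P = np.clip(np.diff(stats.norm.cdf(t)), 1e-12, None)
    du = f[:-1] - f[1:]                     # f(t_{i-1}) - f(t_i)
    ds = tf[:-1] - tf[1:]                   # t_{i-1} f(t_{i-1}) - t_i f(t_i)
    return np.array([[np.sum(du**2 / P), np.sum(du * ds / P)],
                     [np.sum(du * ds / P), np.sum(ds**2 / P)]])

# D-optimal boundaries for k = 5: maximize det J_g over t_1 < ... < t_4,
# starting from the equiprobable (EPG) points.
k = 5
t0 = stats.norm.ppf(np.linspace(0, 1, k + 1)[1:-1])
res = optimize.minimize(lambda t: -np.linalg.det(Jg_normal(t)), t0,
                        method="Nelder-Mead")
t_D_optimal = np.sort(res.x)
```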

The problem of asymptotically optimal grouping by the A- and E-optimality criteria has been solved for certain distribution families, and tables of A-optimal grouping are given in [13]. These versions of asymptotically optimal grouping maximize the test power relative to a set of close competing hypotheses, but they do not ensure the highest power against a specific given competing hypothesis. For a given competing hypothesis H 1, it is possible to construct the χ² test that has the highest power for testing hypothesis H 0 against H 1. For example, in the case of the Pearson χ² test, it is possible to maximize the non-centrality parameter for a given number of intervals k:

$$ \mathop {\hbox{max} }\limits_{{x_{0} < x_{1} < \ldots < x_{k - 1} < x_{k} }} {\mkern 1mu} n\sum\limits_{j = 1}^{k} {\frac{{\left( {p_{j}^{1} (\theta^{1} ) - p_{j}^{0} (\theta^{0} )} \right)^{2} }}{{p_{j}^{0} (\theta^{0} )}}} , $$
(6)

where \( p_{j}^{0} (\theta^{0} ) = \int\limits_{{x_{j - 1} }}^{{x_{j} }} {f_{0} (u,\,\theta^{0} )du} \), \( p_{j}^{1} (\theta^{1} ) = \int\limits_{{x_{j - 1} }}^{{x_{j} }} {f_{1} (u,\,\theta^{1} )du} \) are the probabilities of falling into the j-th interval under the hypotheses H 0 and H 1, respectively. We refer to this grouping method as optimal grouping.
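For a fixed partition, the non-centrality parameter (6) is computed directly from the two hypothesized distribution functions, as in the following sketch (the example pairs the standard normal H 0 with the unit-variance logistic H 1 introduced below); maximizing this quantity over the boundary points gives the optimal grouping. The function name and the trial boundaries are illustrative.

```python
import numpy as np
from scipy import stats

def noncentrality(boundaries, n, cdf0, cdf1):
    """Non-centrality parameter (6) of the Pearson test for the partition
    `boundaries`, with H0 given by `cdf0` and H1 given by `cdf1`."""
    edges = np.concatenate(([-np.inf], np.sort(boundaries), [np.inf]))
    p0 = np.diff(cdf0(edges))
    p1 = np.diff(cdf1(edges))
    return n * np.sum((p1 - p0) ** 2 / p0)

# Standard normal H0 vs logistic H1 with unit variance (scale = sqrt(3)/pi).
cdf1 = lambda u: stats.logistic.cdf(u, loc=0.0, scale=np.sqrt(3) / np.pi)
d = noncentrality(np.linspace(-2.0, 2.0, 8), 500, stats.norm.cdf, cdf1)
```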

Asymptotically optimal boundary points, corresponding to the different optimality criteria, as well as the optimal points corresponding to (6), differ considerably from each other.

For example, the boundary points maximizing criteria (3)–(6) for the following pair of competing hypotheses are given in Table 1. The null hypothesis H 0 is the normal distribution with density function

Table 1. Optimal boundary points for k = 9
$$ f(x) = \frac{1}{{\sigma \sqrt {2\pi } }}\exp \left\{ { - \frac{{(x - \mu )^{2} }}{{2\sigma^{2} }}} \right\}, $$
(7)

and parameters μ = 0, σ = 1 and the competing hypothesis H 1 is the logistic distribution with density function

$$ f(x) = \frac{\pi }{{\theta_{1} \sqrt 3 }}{\mkern 1mu} \exp \left\{ { - \frac{{\pi (x - \theta_{0} )}}{{\theta_{1} \sqrt 3 }}} \right\}/\left[ {1 + \exp \left\{ { - \frac{{\pi (x - \theta_{0} )}}{{\theta_{1} \sqrt 3 }}} \right\}} \right]^{2} \;, $$
(8)

and parameters θ 0 = 0, θ 1 = 1.

Moreover, in the case of a given competing hypothesis, we can use the so-called Neyman-Pearson classes [14], for which the random variable domain is partitioned into intervals of two types according to the inequalities f 0(t) < f 1(t) and f 0(t) > f 1(t), where f 0(t) and f 1(t) are the density functions corresponding to the competing hypotheses. For H 0 and H 1 from our example, we have the first-type intervals

$$ ( - \infty ;\, - 2.3747],\,( - 0.6828;\,0.6828],\,(2.3747;\,\infty ), $$

and the second-type intervals

$$ ( - 2.3747;\, - 0.6828],\,(0.6828;\,2.3747]. $$
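The boundary points of the Neyman-Pearson classes are simply the crossing points of the two densities. For the normal/logistic pair of our example they can be located numerically, e.g. as in the following sketch (the bracketing intervals for the root search were chosen by inspecting f 0 − f 1):

```python
import numpy as np
from scipy import stats, optimize

# Densities under H0 (standard normal) and H1 (logistic with unit variance).
f0 = stats.norm.pdf
f1 = lambda u: stats.logistic.pdf(u, scale=np.sqrt(3) / np.pi)
diff = lambda u: f0(u) - f1(u)

# By symmetry it suffices to find the positive crossing points.
x1 = optimize.brentq(diff, 0.1, 1.5)    # ~0.6828
x2 = optimize.brentq(diff, 1.5, 4.0)    # ~2.3747

# First-type intervals (f0 < f1):  (-inf, -x2], (-x1, x1], (x2, +inf)
# Second-type intervals (f0 > f1): (-x2, -x1], (x1, x2]
```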

Figures 1 and 2 illustrate the power of the Pearson χ² test for the hypotheses H 0 and H 1 of our example for different grouping methods, depending on the number of intervals (α = 0.1, n = 500). The powers of the well-known nonparametric Kolmogorov, Cramer-von Mises-Smirnov and Anderson-Darling goodness-of-fit tests are given for comparison.

Fig. 1. The power of the Pearson χ² test for the simple hypothesis

Fig. 2. The power of the Pearson χ² test for the composite hypothesis

4 The Pearson χ² Test When Checking Normality

The asymptotically D-optimal groupings (AOG) given in Tables 2 and 3 can be used for testing normality with MLE estimation of the parameters μ and σ. Here, the losses of Fisher information associated with grouping are minimized, and the Pearson χ² test has maximal power relative to very close competing hypotheses [13].

Table 2. Optimal boundary points of grouping intervals for testing simple and composite hypotheses with χ²-type tests (when estimating μ and σ) and the corresponding values of the relative asymptotic information A
Table 3. Optimal probabilities (frequencies) for testing simple and composite hypotheses with χ²-type tests (when estimating μ and σ) and the corresponding values of the relative asymptotic information A

In Table 2, the boundary points t i , i = 1, …, k – 1 are listed in a form that is invariant with respect to the parameters μ and σ of a normal distribution. For calculating statistic (1), the boundaries x i separating the intervals for a specified k are found from the values of t i taken from the corresponding row of the table: \( x_{i} = \hat{\sigma }t_{i} + \hat{\mu } \), where \( \hat{\mu } \) and \( \hat{\sigma } \) are the MLE of the parameters derived from the given sample. Then the number of observations n i within each interval is counted. The probabilities of falling into a given interval for evaluating statistic (1) are taken from the corresponding row of Table 3.
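A sketch of this procedure is given below; the arguments t and P stand for a row of Tables 2 and 3 respectively (the tables themselves are not reproduced here), and the function name is an illustrative assumption.

```python
import numpy as np

def pearson_aog_normal(sample, t, P):
    """Statistic (1) for testing normality with AOG tables: `t` are the
    standardized boundary points t_i (Table 2) and `P` the corresponding
    interval probabilities P_i (Table 3)."""
    sample = np.asarray(sample)
    t, P = np.asarray(t), np.asarray(P)
    mu_hat = np.mean(sample)                        # MLE of mu
    sigma_hat = np.std(sample)                      # MLE of sigma (divisor n)
    x_bounds = sigma_hat * t + mu_hat               # x_i = sigma_hat*t_i + mu_hat
    n_i = np.bincount(np.searchsorted(x_bounds, sample), minlength=len(P))
    n = len(sample)
    return n * np.sum((n_i / n - P) ** 2 / P)
```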

When AOG is used in the Pearson χ² test, the resulting percentage points \( \tilde{\chi }_{k,\alpha }^{2} \) of the distributions of statistic (1), together with the models of the limiting distributions constructed in this paper, are shown in Table 4, where β III(θ 0, θ 1, θ 2, θ 3, θ 4) denotes the type III beta distribution with these parameters and the density

$$ f(x) = \frac{{\theta_{2}^{{\theta_{0} }} }}{{\theta_{3} \beta (\theta_{0} ,\;\theta_{1} )}}\frac{{\left[ {(x - \theta_{4} )/\theta_{3} \;} \right]^{{\theta_{0} - 1}} \left[ {1 - (x - \theta_{4} )/\theta_{3} \;} \right]^{{\theta_{1} - 1}} }}{{\left[ {1 + (\theta_{2} - 1)(x - \theta_{4} )/\theta_{3} \;} \right]^{{\theta_{0} + \theta_{1} }} }}. $$
Table 4. Percentage points \( \tilde{\chi }_{k,\alpha }^{2} \) for the Pearson test statistic when evaluating the parameters μ and σ

To make a decision regarding testing the hypothesis H 0, the value of the statistic \( X_{n}^{2*} \) is compared with the critical value \( \tilde{\chi }_{k,\alpha }^{2} \) from the corresponding row of Table 4, or the attained level of significance \( P\left\{ {X_{n}^{2} > X_{n}^{2*} } \right\} \), determined using the limiting distribution model in the same row of the table, is compared with a specified level of significance α.

The difference between the real distributions \( G(X_{n}^{2} \,|\,H_{0} ) \) of statistic (1) and the corresponding \( \chi_{k - m - 1}^{2} \) distributions, when hypothesis H 0 is true, is shown in Fig. 3.

Fig. 3. Distributions of statistic (1) for maximum likelihood estimates of the parameters of a normal distribution based on non-grouped data, together with the corresponding \( \chi_{k - m - 1}^{2} \) distributions

Tables 2 and 3 also give the relative asymptotic Fisher information

$$ A\, = \,{ \det }\,{\mathbf{J}}_{g} (\theta )\,/\,{ \det }\,{\mathbf{J}}(\theta ). $$

For tests of normality in which the MLE of only one of the parameters μ or σ is calculated from the non-grouped sample, the required AOG tables, percentage points, and limiting distribution models can be found in [15].

For AOG relative to the parameter vector and k = 15 intervals, about 95% of the Fisher information is preserved in the grouped sample. Further increases in the number k of intervals yield insignificant gains; k should be chosen based on the following considerations. For an optimal grouping, the probabilities of falling into an interval are generally not equal (usually these probabilities are smallest for the outermost intervals), so k should be chosen so that nP i (θ) ≥ 5…10 for every interval. At the least, in choosing k the recommendation

$$ \mathop {\hbox{min} }\limits_{i} {\mkern 1mu} \left\{ {nP_{i} (\theta )\left| {{\mkern 1mu} i = \overline{1,k} } \right.} \right\} > 1 $$

should be followed. When this condition holds and the tested hypothesis H 0 is valid, the discrete distribution of the statistic in (1) differs insignificantly from the corresponding asymptotic limiting distribution. If this condition is violated, the difference between the true distribution of the statistic and the limiting distribution leads to an increase in the probability of a type I error relative to the specified significance level α. It should also be noted that for small sample sizes, n = 10–20, the discrete distributions of the statistics differ substantially from the asymptotic distributions. This condition on the choice of k sets an upper bound on the number of intervals (k ≤ k max). The number of grouping intervals affects the power of the Pearson χ² test [6], and its power against a given competing distribution (hypothesis) is by no means necessarily maximal for k = k max.
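As a small sketch, the upper bound k max can be read off directly from the tabulated interval probabilities; probs_by_k below is a hypothetical mapping from k to the corresponding row of Table 3.

```python
import numpy as np

def k_max(n, probs_by_k, min_expected=1.0):
    """Largest k whose row of interval probabilities P_i satisfies
    min_i n*P_i > min_expected (the recommendation above)."""
    feasible = [k for k, P in probs_by_k.items()
                if n * np.min(P) > min_expected]
    return max(feasible) if feasible else None
```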

In order to compare the power of the Pearson χ² test for checking normality with the power of special normality tests and nonparametric goodness-of-fit tests, the power has been estimated relative to the same competing distributions (hypotheses) as in [15].

The test hypothesis H 0 is taken to be that the observed sample obeys the normal distribution (7).

As competing hypotheses for studying the power of the χ² test, we have considered the adherence of the analyzed sample to the following distributions: competing hypothesis H 1 corresponds to a generalized normal distribution (a family of distributions) with the density

$$ f(x) = \frac{{\theta_{2} }}{{2\theta_{1}\Gamma (1/\theta_{2} )}}\exp \left\{ { - \left( {\frac{{\left| {x - \theta_{0} } \right|}}{{\theta_{1} }}} \right)^{{\theta_{2} }} } \right\} $$

and a shape parameter θ 2 = 4; hypothesis H 2 is the Laplace distribution with the density

$$ f(x) = \frac{1}{{2\theta_{1} }}\exp \left\{ { - \frac{{\left| {x - \theta_{0} } \right|}}{{\theta_{1} }}} \right\}; $$

and hypothesis H 3 is the logistic distribution with the density (8), which is very close to a normal distribution. Figure 4 shows the densities of the distributions corresponding to hypotheses H 1, H 2 and H 3 with scale parameters such that they are the closest to the standard normal law. This choice of hypotheses has a certain justification. Hypothesis H 2, corresponding to a Laplace distribution, is the most distant from H 0. Distinguishing them usually presents no problem. The logistic distribution (hypothesis H 3) is very close to normal and it is generally difficult to distinguish them by goodness-of-fit tests.

Fig. 4. Probability density functions corresponding to the considered hypotheses H i

The competing hypothesis H 1, which corresponds to a generalized normal distribution with shape parameter θ 2 = 4, is a “litmus test” for the detection of hidden deficiencies in some tests [15–17]. It turned out that for small sample sizes n and small specified probabilities α of a type I error, a number of tests employed for testing goodness-of-fit to the normal law are not able to distinguish close distributions from the normal one. In these cases, the power 1 – β with respect to hypothesis H 1, where β is the probability of a type II error, is smaller than α. This means that the distribution corresponding to H 1 is “more normal” than the normal law itself and indicates that the tests are biased.

The power of the Pearson χ² test was studied for different numbers of intervals k ≤ k max and specified sample sizes n. Table 5 lists the maximum powers of the χ² test relative to the competing hypotheses H 1, H 2 and H 3, together with the corresponding optimal numbers k opt of grouping intervals. To a certain extent, the values of k opt as a function of n listed in Table 5 can serve as guidance in choosing k.

Table 5. Power of the Pearson χ 2 test with respect to hypotheses H 1, H 2 and H 3

5 The Nikulin-Rao-Robson Goodness-of-Fit Test

A modification of the standard statistic \( X_{n}^{2} \) was proposed [3–5] in which the limiting distribution of the modified statistic is a \( \chi_{k - 1}^{2} \) distribution (the number of degrees of freedom is independent of the number of parameters to be estimated). The unknown parameters of the distribution F(x, θ) must, in this case, be estimated on the basis of the non-grouped data by the maximum likelihood method. Here the vector P = (P 1, …, P k )τ is assumed to be specified, while the boundary points of the intervals are defined using the relations x i (θ) = F −1(P 1 + … + P i ), i = 1, …, k – 1. The proposed statistic has the form [4]:

$$ Y_{n}^{2} (\theta ) = X_{n}^{2} + n^{ - 1} a^{\tau } (\theta ){\varvec{\uplambda}}(\theta )a(\theta ), $$
(9)

where \( X_{n}^{2} \) is calculated using (1). For distribution laws that are determined only by shift and scale parameters,

$$ {\varvec{\uplambda}}(\theta ) = \left[ {{\mathbf{J}}(\theta ) - {\mathbf{J}}_{g} (\theta )} \right]^{ - 1} . $$

In the case of the normal distribution with parameter vector θ τ = (μ, σ), the Fisher information matrix has the form:

$$ {\mathbf{J}}(\theta ) = \left[ {\begin{array}{*{20}c} {1/\sigma^{2} \;} & 0 \\ 0 & {2/\sigma^{2} } \\ \end{array} } \right], $$

with the elements of the information matrix based on grouped data J g (θ) given by

$$ \begin{array}{*{20}c} {J_{g} (\mu ,\;\mu ) = \sum\limits_{i = 1}^{k} {\frac{1}{{\sigma^{2} P_{i} (\theta )}}\left( {f(t_{i - 1} ) - f(t_{i} )} \right)^{2} } ,} \\ {J_{g} (\sigma ,\;\sigma ) = \sum\limits_{i = 1}^{k} {\frac{1}{{\sigma^{2} P_{i} (\theta )}}\left( {t_{i - 1} f(t_{i - 1} ) - t_{i} f(t_{i} )} \right)^{2} } ,} \\ {J_{g} (\mu ,\;\sigma ) = J_{g} (\sigma ,\;\mu ) = \sum\limits_{i = 1}^{k} {\frac{1}{{\sigma^{2} P_{i} (\theta )}}\left( {f(t_{i - 1} ) - f(t_{i} )} \right)\left( {t_{i - 1} f(t_{i - 1} ) - t_{i} f(t_{i} )} \right)} ,} \\ \end{array} $$

where

$$ t_{i} = (x_{i} - \mu )/\sigma ,t_{0} = - \infty ,t_{k} = \infty ,f(t) = \frac{1}{{\sqrt {2\pi } }}e^{{ - t^{2} /2}} $$

is the standard normal density function. The elements of the vector \( a^{\tau } (\theta ) = \left[ {a(\mu ),{\mkern 1mu} {\mkern 1mu} a(\sigma )} \right] \) are given by

$$ \begin{array}{*{20}c} {a(\mu ) = \sum\limits_{i = 1}^{k} {\frac{{n_{i} \left( {f(t_{i - 1} ) - f(t_{i} )} \right)}}{{\sigma P_{i} (\theta )}}} ,} \\ {a(\sigma ) = \sum\limits_{i = 1}^{k} {\frac{{n_{i} }}{{\sigma P_{i} (\theta )}}\left( {t_{i - 1} f(t_{i - 1} ) - t_{i} f(t_{i} )} \right)} .} \\ \end{array} $$

As in the case of the Pearson test, when testing for normality with MLE estimation of the parameters μ and σ based on the non-grouped data, Tables 2 and 3 can be used.

For calculating statistic (9), the boundaries separating the intervals for a given k are found from the values of t i in the corresponding row of Table 2 using the formula \( x_{i} = \hat{\sigma }t_{i} + \hat{\mu } \), where \( \hat{\mu } \) and \( \hat{\sigma } \) are the MLE of the parameters found from the sample data. Then the number of observations n i in each interval is counted. The probabilities P i (θ) of falling into an interval when calculating statistic (9) are taken from the corresponding row of Table 3. The elements of the vector a(θ) and of the matrix λ(θ) are calculated using the tabulated values of t i and P i and the resulting estimate \( \hat{\sigma } \).
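The following sketch assembles statistic (9) for the normal case from the formulas above. As before, t and P stand for a row of Tables 2 and 3 (not reproduced here); the function name and the returned p-value convention are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def nrr_statistic_normal(sample, t, P):
    """Nikulin-Rao-Robson statistic (9) for testing normality with MLE of
    (mu, sigma) from the non-grouped data and tabulated AOG points."""
    sample = np.asarray(sample)
    n = len(sample)
    mu, sigma = np.mean(sample), np.std(sample)     # MLE of mu and sigma
    t = np.concatenate(([-np.inf], np.asarray(t, dtype=float), [np.inf]))
    P = np.asarray(P, dtype=float)

    x_bounds = sigma * t[1:-1] + mu                 # x_i = sigma_hat*t_i + mu_hat
    n_i = np.bincount(np.searchsorted(x_bounds, sample), minlength=len(P))
    X2 = n * np.sum((n_i / n - P) ** 2 / P)         # Pearson part (1)

    f = stats.norm.pdf(t)
    tf = np.zeros_like(f)
    fin = np.isfinite(t)
    tf[fin] = t[fin] * f[fin]
    du = f[:-1] - f[1:]                             # f(t_{i-1}) - f(t_i)
    ds = tf[:-1] - tf[1:]                           # t_{i-1}f(t_{i-1}) - t_i f(t_i)

    # Fisher matrices J and J_g and the vector a for N(mu, sigma), cf. Sect. 5.
    J = np.array([[1.0, 0.0], [0.0, 2.0]]) / sigma**2
    Jg = np.array([[np.sum(du**2 / P), np.sum(du * ds / P)],
                   [np.sum(du * ds / P), np.sum(ds**2 / P)]]) / sigma**2
    a = np.array([np.sum(n_i * du / P), np.sum(n_i * ds / P)]) / sigma

    Y2 = X2 + a @ np.linalg.inv(J - Jg) @ a / n     # statistic (9)
    return Y2, stats.chi2.sf(Y2, df=len(P) - 1)     # (Y2, p-value)
```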

To decide on the test results for hypothesis H 0, the value of the statistic \( Y_{n}^{2*} \) is compared with the corresponding critical value \( \chi_{k - 1,\alpha }^{2} \), or the achieved level of significance (p-value) \( P\left\{ {Y_{n}^{2} > Y_{n}^{2*} } \right\} \) is found from the corresponding \( \chi_{k - 1}^{2} \) distribution and compared with α.

To test for normality with MLE calculation of the parameters μ or σ separately on the basis of non-grouped samples, the required tables of AOG can be found in [15].

Estimates of the power of the Nikulin-Rao-Robson test for the competing hypotheses H 1, H 2 and H 3 at k opt are given in Table 6. This test is generally more powerful than the Pearson test (see, for example, its powers relative to the competing hypotheses H 2 and H 3). Here we often have k opt = k max under the condition

$$ \mathop {\hbox{min} }\limits_{i} {\mkern 1mu} \left\{ {nP_{i} (\theta )} \right\} > 1. $$
Table 6. Power of the Nikulin-Rao-Robson test with respect to hypotheses H 1, H 2 and H 3

However, this is not always so. In terms of its power relative to the “tricky” hypothesis H 1 it is inferior to the Pearson test, and k opt in this case is considerably smaller than k max with AOG.

6 Conclusion

The power of the Pearson χ² test and of the Nikulin-Rao-Robson test can be maximized by optimal selection of the number of grouping intervals and of the interval boundary points.

Combining the obtained results of the power analysis for the Pearson χ² test and the Nikulin-Rao-Robson test with the results presented in [15–17], we can see that with regard to the competing hypothesis H 1, the Pearson χ² test shows very good results, yielding in power only to some special normality tests.

At the same time, with regard to the competing hypotheses H 2 and H 3, the Pearson χ² test and the Nikulin-Rao-Robson test are inferior in power to most special normality tests and to nonparametric goodness-of-fit tests (the Anderson-Darling, Cramer-von Mises-Smirnov, Watson, Kuiper, Zhang, and Kolmogorov tests).

This work is supported by the Russian Ministry of Education and Science (project 2.541.2014 K).