Abstract
When using chi-squared goodness-of-fit tests, the problem of choosing boundary points and the number of grouping intervals is always important, as the power of these tests considerably depends on the grouping method used. In this paper, the power of the Pearson and Nikulin-Rao-Robson chi-squared tests has been investigated for various numbers of intervals and grouping methods. The partition of the real line into equiprobable intervals is not an optimal grouping method, as a rule. It has been shown that asymptotically optimal grouping, for which the loss of Fisher information from grouping is minimized, makes it possible to maximize the power of the Pearson test against close competing hypotheses. In order to find the asymptotically optimal boundary points, it is possible to maximize some functional (the determinant, the trace or the minimum eigenvalue) of the Fisher information matrix for grouped data. The versions of the asymptotically optimal grouping method maximize the test power relative to a set of close competing hypotheses, but they do not ensure the largest power against some given competing hypothesis. For a given competing hypothesis H 1, it is possible to construct the chi-squared test which has the largest power for testing hypothesis H 0 against H 1. For example, in the case of the Pearson chi-squared test, it is possible to maximize the non-centrality parameter for the given number of intervals. So, the purpose of this paper is to give methods for the choice of optimal grouping intervals for chi-squared goodness-of-fit tests.
1 Introduction
The χ 2 Pearson goodness-of-fit test is very popular in various applications, including the investigation of distributions of measurement error in problems of metrological support.
The correct usage of the Pearson χ 2 test for composite hypotheses (including testing normality) requires estimation of the unknown parameters from the grouped data, since when parameter estimates are calculated from the original non-grouped sample, the test statistic distribution differs from the χ 2 distribution significantly [1, 2]. For this reason, a series of modified χ 2 tests has been proposed, the most famous of which is the Nikulin-Rao-Robson test [3–5]. Moreover, it is necessary to take into account that the power of the Pearson test depends on the number of grouping intervals [6] and the grouping method used [7].
2 The Pearson χ 2 Test of Goodness-of-Fit
The procedure for hypothesis testing using χ 2 type tests assumes grouping an original sample X 1, X 2, …, X n of size n. The domain of definition of the random variable is divided into k non-overlapping intervals bounded by the points

\( x_{0} < x_{1} < \cdots < x_{k - 1} < x_{k} , \)
where x 0, x k are the lower and upper boundaries of the random variable domain. The number of observations n i in the i-th interval is counted in accordance with this partition, and the probability of falling into this interval,

\( P_{i} (\theta ) = \int\limits_{{x_{i - 1} }}^{{x_{i} }} {f(x,\theta )dx} ,\quad i = 1, \ldots ,k, \)

corresponds to the theoretical distribution law with the density function f(x, θ), where θ is the vector of the distribution parameters.
Measures of the deviations of n i /n from P i (θ) form the basis of the statistics used in χ 2 type goodness-of-fit tests.
The statistic of the Pearson χ 2 test is calculated using the formula

\( X_{n}^{2} = n\sum\limits_{i = 1}^{k} {\frac{{\left( {n_{i} /n - P_{i} (\theta )} \right)^{2} }}{{P_{i} (\theta )}}} . \quad (1) \)
When a simple hypothesis H 0 is true (i.e., all the parameters of the theoretical law are known), this statistic obeys the \( \chi_{r}^{2} \) distribution with r = k – 1 degrees of freedom as n → ∞. The \( \chi_{r}^{2} \) distribution has the density function

\( g(s) = \frac{1}{{2^{r/2} \Gamma (r/2)}}s^{r/2 - 1} e^{ - s/2} ,\quad s > 0, \)
where Γ(·) is the Euler gamma function.
The test hypothesis H 0 is not rejected if the achieved significance level (p-value) exceeds a specified level of significance α, i.e., if the following inequality holds:

\( P\left\{ {\chi_{r}^{2} > X_{n}^{2*} } \right\} > \alpha , \)
where \( X_{n}^{2*} \) is the statistic calculated in (1).
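To make the procedure concrete, here is a minimal sketch (assuming a simple hypothesis H 0: standard normal and equiprobable intervals; the function name is hypothetical) of computing statistic (1) and its p-value:

```python
import numpy as np
from scipy import stats

def pearson_chi2(sample, boundaries, probs):
    """Pearson X^2 statistic (1) for grouped data: the k intervals are
    (-inf, x_1], ..., (x_{k-1}, inf) with theoretical probabilities P_i."""
    edges = np.concatenate(([-np.inf], boundaries, [np.inf]))
    counts, _ = np.histogram(sample, bins=edges)
    n, probs = len(sample), np.asarray(probs)
    return float(n * np.sum((counts / n - probs) ** 2 / probs))

# Simple hypothesis H0: standard normal, k = 5 equiprobable intervals
k = 5
boundaries = stats.norm.ppf(np.arange(1, k) / k)
probs = np.full(k, 1.0 / k)

rng = np.random.default_rng(0)
sample = rng.standard_normal(500)
x2 = pearson_chi2(sample, boundaries, probs)
p_value = stats.chi2.sf(x2, df=k - 1)   # r = k - 1 degrees of freedom
```

H 0 would be rejected at level α whenever `p_value` falls below α.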
When testing a composite hypothesis and estimating parameters by minimizing the statistic \( X_{n}^{2} \) based on the same sample, this statistic asymptotically obeys the \( \chi_{r}^{2} \) distribution with r = k – m – 1 degrees of freedom, where m is the number of parameters estimated.
The statistic \( X_{n}^{2} \) has the same distribution if the parameter estimates are obtained by the maximum likelihood method from grouped data by maximizing the likelihood function with respect to θ:

\( \ln L(\theta ) = \gamma + \sum\limits_{i = 1}^{k} {n_{i} \ln P_{i} (\theta )} , \)

where γ is a constant and

\( P_{i} (\theta ) = \int\limits_{{x_{i - 1} }}^{{x_{i} }} {f(x,\theta )dx} \)

is the probability that an observation falls into the i-th interval. This result remains valid for any estimation technique based on grouped data that leads to asymptotically efficient estimates.
If unknown parameters are estimated by the maximum likelihood method based on non-grouped data, then the Pearson statistic is distributed as the sum of independent terms [1] \( \chi_{k - m - 1}^{2} + \sum\limits_{j = 1}^{m} {\lambda_{j} \xi_{j}^{2} } , \) where ξ 1, …, ξ m are standard normal random quantities that are independent from each other and from \( \chi_{k - m - 1}^{2} ;\,\lambda_{1} ,\, \ldots ,\,\lambda_{m} \) are numbers between 0 and 1, representing the roots of the equation

\( \left| {(1 - \lambda )J(\theta ) - J_{g} (\theta )} \right| = 0. \)
Here J(θ) is the Fisher information matrix with respect to the non-grouped observations with elements

\( J_{lj} (\theta ) = {\text{E}}\left[ {\frac{{\partial \ln f(x,\theta )}}{{\partial \theta_{l} }}\frac{{\partial \ln f(x,\theta )}}{{\partial \theta_{j} }}} \right], \)
J g (θ) is the Fisher information matrix with respect to the grouped observations with elements

\( \left[ {J_{g} (\theta )} \right]_{lj} = \sum\limits_{i = 1}^{k} {\frac{1}{{P_{i} (\theta )}}\frac{{\partial P_{i} (\theta )}}{{\partial \theta_{l} }}\frac{{\partial P_{i} (\theta )}}{{\partial \theta_{j} }}} . \)
In other words, the distribution of statistic (1), based on maximum likelihood estimates (MLE) calculated by non-grouped data, is unknown and depends, in particular, on the grouping method [2].
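The roots λ 1, …, λ m can be found numerically: writing the classical Chernoff-Lehmann equation as |(1 − λ)J(θ) − J g (θ)| = 0 and substituting μ = 1 − λ turns it into the generalized eigenvalue problem J g v = μJv. A sketch under these assumptions (the function name is hypothetical):

```python
import numpy as np
from scipy.linalg import eigh

def chernoff_lehmann_lambdas(J, Jg):
    """Roots of |(1 - lambda) J - Jg| = 0 via the generalized
    eigenproblem Jg v = (1 - lambda) J v."""
    mu = eigh(Jg, J, eigvals_only=True)   # eigenvalues of J^{-1} Jg
    return np.clip(1.0 - mu, 0.0, 1.0)    # each root lies in [0, 1]

# Toy example: grouping retains exactly half of the Fisher information,
# so every lambda_j equals 0.5
J = np.array([[1.0, 0.0], [0.0, 2.0]])
lambdas = chernoff_lehmann_lambdas(J, 0.5 * J)
```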
3 The Choice of Grouping Intervals
When using chi-squared goodness-of-fit tests, the problem of choosing boundary points and the number of grouping intervals is always important, as the power of these tests considerably depends on the grouping method used. In the case of complete samples (without censored observations), this problem was investigated in [7–10]. In particular, in [11], the power of the Pearson and NRR tests for complete samples was investigated for various numbers of intervals and grouping methods. The partition of the real line into equiprobable intervals (EPG) is not an optimal grouping method, as a rule. In [12], it was shown for the first time that asymptotically optimal grouping, for which the loss of Fisher information from grouping is minimized, enables us to maximize the power of the Pearson test against close competing hypotheses. For example, it is possible to maximize the determinant of the Fisher information matrix for grouped data J g (θ), i.e. to solve the problem of D-optimal grouping

\( \mathop {\max }\limits_{{x_{1} < \cdots < x_{k - 1} }} \det J_{g} (\theta ). \quad (3) \)
In the case of the A-optimality criterion, the trace of the information matrix J g (θ) is maximized by the boundary points

\( \mathop {\max }\limits_{{x_{1} < \cdots < x_{k - 1} }} {\text{tr}}\,J_{g} (\theta ), \quad (4) \)
and the E-optimality criterion maximizes the minimum eigenvalue of the information matrix:

\( \mathop {\max }\limits_{{x_{1} < \cdots < x_{k - 1} }} \lambda_{\hbox{min} } \left( {J_{g} (\theta )} \right). \quad (5) \)
The problem of asymptotically optimal grouping by the A- and E-optimality criteria has been solved for certain distribution families, and the tables of A-optimal grouping are given in [13]. The versions of asymptotically optimal grouping maximize the test power relative to a set of close competing hypotheses, but they do not ensure the highest power against some given competing hypothesis. For a given competing hypothesis H 1, it is possible to construct the χ 2 test which has the highest power for testing hypothesis H 0 against H 1. For example, in the case of the Pearson χ 2 test, it is possible to maximize the non-centrality parameter for the given number of intervals k:

\( \mathop {\max }\limits_{{x_{1} < \cdots < x_{k - 1} }} n\sum\limits_{j = 1}^{k} {\frac{{\left( {p_{j}^{1} (\theta^{1} ) - p_{j}^{0} (\theta^{0} )} \right)^{2} }}{{p_{j}^{0} (\theta^{0} )}}} , \quad (6) \)
where \( p_{j}^{0} (\theta^{0} ) = \int\limits_{{x_{j - 1} }}^{{x_{j} }} {f_{0} (u,\,\theta^{0} )du} \), \( p_{j}^{1} (\theta^{1} ) = \int\limits_{{x_{j - 1} }}^{{x_{j} }} {f_{1} (u,\,\theta^{1} )du} \) are the probabilities of falling into the j-th interval according to the hypotheses H 0 and H 1, respectively. Let us refer to this grouping method as optimal grouping.
Asymptotically optimal boundary points, corresponding to different optimality criteria, as well as the optimal points, corresponding to (6), are considerably different from each other.
For example, the boundary points maximizing criteria (3)–(6) for the following pair of competing hypotheses are given in Table 1. The null hypothesis H 0 is the normal distribution with density function

\( f_{0} (x) = \frac{1}{{\sigma \sqrt {2\pi } }}\exp \left( { - \frac{{(x - \mu )^{2} }}{{2\sigma^{2} }}} \right) \quad (7) \)
and parameters μ = 0, σ = 1 and the competing hypothesis H 1 is the logistic distribution with density function

\( f_{1} (x) = \frac{\pi }{{\theta_{1} \sqrt 3 }}\frac{{\exp \left( { - \pi (x - \theta_{0} )/(\theta_{1} \sqrt 3 )} \right)}}{{\left[ {1 + \exp \left( { - \pi (x - \theta_{0} )/(\theta_{1} \sqrt 3 )} \right)} \right]^{2} }} \quad (8) \)
and parameters θ 0 = 0, θ 1 = 1.
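As a sketch of criterion (6) in the simplest case k = 2 (a single boundary point), the following maximizes the non-centrality parameter for this pair of hypotheses. The rescaling of the logistic law to unit variance (scale √3/π) and the search bounds, chosen to keep the expected count in the outer interval away from zero, are assumptions of the sketch:

```python
import numpy as np
from scipy import stats, optimize

F0 = stats.norm(0.0, 1.0).cdf                        # H0: standard normal
F1 = stats.logistic(0.0, np.sqrt(3.0) / np.pi).cdf   # H1: logistic, unit variance

def noncentrality(x, n=500):
    """Non-centrality parameter for k = 2 intervals split at the point x."""
    p0 = np.array([F0(x), 1.0 - F0(x)])   # interval probabilities under H0
    p1 = np.array([F1(x), 1.0 - F1(x)])   # interval probabilities under H1
    return n * np.sum((p1 - p0) ** 2 / p0)

# By symmetry it is enough to search x > 0; the upper bound keeps
# n * P_i in the outer interval reasonably large
res = optimize.minimize_scalar(lambda x: -noncentrality(x),
                               bounds=(0.1, 2.0), method="bounded")
x_opt = res.x
```

For larger k the same criterion becomes a constrained multivariate maximization over the ordered boundary points.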
Moreover, in the case of the given competing hypothesis, we can use the so-called Neyman-Pearson classes [14], for which the random variable domain is partitioned into intervals of two types, according to the inequalities f 0(t) < f 1(t) and f 0(t) > f 1(t), where f 0(t) and f 1(t) are the density functions, corresponding to the competing hypotheses. For H 0 and H 1 from our example, we have the first-type intervals
and the second-type intervals
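The boundaries of the Neyman-Pearson classes are the points where f 0(t) − f 1(t) changes sign; a sketch that locates them on a grid (the unit-variance rescaling of the logistic density is an assumption):

```python
import numpy as np
from scipy import stats

# Densities under H0 (standard normal) and H1 (logistic rescaled to
# unit variance -- an assumption of this sketch)
f0 = stats.norm(0.0, 1.0).pdf
f1 = stats.logistic(0.0, np.sqrt(3.0) / np.pi).pdf

t = np.linspace(-6.0, 6.0, 120001)
sign = np.sign(f0(t) - f1(t))
# boundary points separating the two types of intervals
crossings = t[:-1][np.diff(sign) != 0]
```

For this pair of densities the logistic law dominates both in the center and in the far tails, so the sign changes four times and the real line splits into five alternating intervals of the two types.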
Figures 1 and 2 illustrate the power of the Pearson χ 2 test for the hypotheses H 0 and H 1 of our example for different grouping methods, depending on the number of intervals (α = 0.1, n = 500). The powers of the well-known nonparametric Kolmogorov, Cramer-von Mises-Smirnov and Anderson-Darling goodness-of-fit tests are given for comparison.
4 The Pearson χ 2 Test When Checking Normality
The asymptotically D-optimal groupings (AOG) given in Tables 2 and 3 can be used for testing normality using the MLE of the parameters μ and σ. Here, the losses in Fisher information associated with grouping are minimized, and the Pearson χ 2 test has maximal power relative to very close competing hypotheses [13].
In Table 2, the boundary points t i , i = 1, …, k – 1 are listed in a form that is invariant with respect to the parameters μ and σ for a normal distribution. For calculating statistic (1), the boundaries x i separating the intervals for a specified k are found using the values of t i taken from the corresponding row of the table: \( x_{i} = \hat{\sigma }t_{i} + \hat{\mu } \), where \( \hat{\mu } \) and \( \hat{\sigma } \) are the MLE of the parameters derived from the given sample. Then, the number of observations n i within each interval is counted. The probabilities of falling into a given interval for evaluating statistic (1) are taken from the corresponding row of Table 3.
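The computation described above can be sketched as follows; the t i values shown are placeholders, not the actual entries of Table 2:

```python
import numpy as np

def aog_boundaries_and_counts(sample, t):
    """Boundaries x_i = sigma_hat * t_i + mu_hat from tabulated points t_i
    and normal MLEs computed on the raw (non-grouped) sample, plus the
    interval counts n_i."""
    mu_hat = np.mean(sample)
    sigma_hat = np.std(sample)   # MLE: the 1/n variance estimator
    x = sigma_hat * np.asarray(t, dtype=float) + mu_hat
    edges = np.concatenate(([-np.inf], x, [np.inf]))
    counts, _ = np.histogram(sample, bins=edges)
    return x, counts

rng = np.random.default_rng(1)
sample = rng.normal(10.0, 2.0, size=200)
t = [-1.6, -0.7, 0.0, 0.7, 1.6]   # placeholder boundary points, k = 6
x, counts = aog_boundaries_and_counts(sample, t)
```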
When AOG is used in the Pearson χ 2 test, the resulting percentage points \( \tilde{\chi }_{k,\alpha }^{2} \) of the distributions of statistic (1) and the models of the limiting distributions constructed in this paper are shown in Table 4, where β III(θ 0, θ 1, θ 2, θ 3, θ 4) is the type III beta distribution with these parameters and the density

\( f(x) = \frac{{\theta_{2}^{{\theta_{0} }} }}{{\theta_{3} B(\theta_{0} ,\theta_{1} )}}\frac{{\left( {(x - \theta_{4} )/\theta_{3} } \right)^{{\theta_{0} - 1}} \left( {1 - (x - \theta_{4} )/\theta_{3} } \right)^{{\theta_{1} - 1}} }}{{\left[ {1 + (\theta_{2} - 1)(x - \theta_{4} )/\theta_{3} } \right]^{{\theta_{0} + \theta_{1} }} }}. \)
To make a decision regarding testing the hypothesis H 0, the value of the statistic \( X_{n}^{2*} \) is compared with the critical value \( \tilde{\chi }_{k,\alpha }^{2} \) from the corresponding row of Table 4, or the attained level of significance \( P\left\{ {X_{n}^{2} > X_{n}^{2*} } \right\} \), determined using the limiting distribution model in the same row of the table, is compared with a specified level of significance α.
The difference between the real distributions \( G(X_{n}^{2} \left| {H_{0} )} \right. \) of statistic (1) and the corresponding \( \chi_{k - m - 1}^{2} \) distributions, when hypothesis H 0 is true, is shown in Fig. 3.
Tables 2 and 3 also indicate the amount of asymptotic Fisher information about the parameters that is retained by the grouped data.
For tests of normality in which the MLE of only one of the parameters μ or σ is calculated from the non-grouped sample, the required AOG tables, percentage points, and limiting distribution models can be found in [15].
For AOG relative to the parameter vector and k = 15 intervals, about 95 % of the Fisher information in the sample is preserved after grouping. Further increases in the number k of intervals yield insignificant gains; instead, k should be chosen based on the following considerations. For an optimal grouping, the probabilities of falling into the intervals are generally unequal (usually these probabilities are minimal for the outermost intervals), so k should be chosen on the basis of the condition nP i (θ) ≥ 5 … 10 for every interval. At the least, this recommendation should be followed when choosing k. When this condition holds and the tested hypothesis H 0 is valid, the discrete distribution of the statistic in (1) differs insignificantly from the corresponding asymptotic limiting distribution. If this condition is violated, the difference between the true distribution of the statistic and the limiting distribution will lead to an increase in the probability of a type I error relative to the specified significance level α. It should also be noted that for small sample sizes, n = 10–20, the discrete distributions of the statistics differ substantially from the asymptotic distributions. This condition on the choice of k sets an upper bound on the number of intervals (k ≤ k max). The number of grouping intervals affects the power of the Pearson χ 2 test [6], and the power against a given competing distribution (hypothesis) is by no means necessarily maximal for k = k max.
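The condition nP i (θ) ≥ 5 … 10 can be turned into a simple selection rule for k max; the table of interval probabilities below is hypothetical, standing in for a table like Table 3:

```python
def k_max(n, probs_by_k, threshold=5.0):
    """Largest number of intervals k whose probabilities all satisfy
    n * P_i >= threshold; probs_by_k maps k to the list [P_1, ..., P_k]."""
    feasible = [k for k, probs in probs_by_k.items()
                if n * min(probs) >= threshold]
    return max(feasible) if feasible else None

# hypothetical interval probabilities for two candidate groupings;
# the outermost intervals carry the smallest probabilities
probs_by_k = {
    3: [0.2, 0.6, 0.2],
    5: [0.05, 0.2, 0.5, 0.2, 0.05],
}
```

For n = 50 only the k = 3 grouping satisfies the rule (50 · 0.05 = 2.5 < 5 rules out k = 5), while n = 200 admits k = 5.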
In order to compare the power of the Pearson χ 2 test for checking normality with the power of special normality tests and nonparametric goodness-of-fit tests, the power has been estimated relative to the same competing distributions (hypotheses) as in [15].
The test hypothesis H 0 is taken to be that the observed sample obeys the normal distribution (7).
As competing hypotheses for studying the power of the χ 2 test, we have considered adherence of the analyzed sample to the following distributions: competing hypothesis H 1 corresponds to a generalized normal distribution (family of distributions) with the density

\( f(x) = \frac{{\theta_{2} }}{{2\theta_{1} \Gamma (1/\theta_{2} )}}\exp \left( { - \left( {\frac{{\left| {x - \theta_{0} } \right|}}{{\theta_{1} }}} \right)^{{\theta_{2} }} } \right) \)
and a shape parameter θ 2 = 4; hypothesis H 2 is the Laplace distribution with the density

\( f(x) = \frac{1}{{2\theta_{1} }}\exp \left( { - \frac{{\left| {x - \theta_{0} } \right|}}{{\theta_{1} }}} \right); \)
and hypothesis H 3 is the logistic distribution with the density (8), which is very close to a normal distribution. Figure 4 shows the densities of the distributions corresponding to hypotheses H 1, H 2 and H 3 with scale parameters such that they are the closest to the standard normal law. This choice of hypotheses has a certain justification. Hypothesis H 2, corresponding to a Laplace distribution, is the most distant from H 0. Distinguishing them usually presents no problem. The logistic distribution (hypothesis H 3) is very close to normal and it is generally difficult to distinguish them by goodness-of-fit tests.
The competing hypothesis H 1, which corresponds to a generalized normal distribution with a shape factor θ 2 = 4, is a “litmus test” for detection of hidden deficiencies in some tests [15–17]. It turned out that for small sample sizes n and small specified probabilities α of type I error, a number of tests employed for testing goodness-of-fit to normal are not able to distinguish close distributions from normal. In these cases, the power 1 – β with respect to hypothesis H 1, where β is the probability of a type II error, is smaller than α. This means that the distribution corresponding to H 1 is “more normal” than the normal law and indicates that the tests are biased.
The power of the Pearson χ 2 test was studied for different numbers of intervals k ≤ k max and specified sample sizes n. Table 5 lists the maximum powers of the χ 2 test relative to the competing hypotheses H 1, H 2 and H 3, together with the corresponding optimal number k opt of grouping intervals. The values of k opt as a function of n listed in Table 5 can, to a certain extent, serve as a guide when choosing k.
5 The Nikulin-Rao-Robson Goodness-of-Fit Test
A modification of the standard statistic \( X_{n}^{2} \) was proposed [3–5] in which the limiting distribution of the modified statistic is the \( \chi_{k - 1}^{2} \) distribution (the number of degrees of freedom is independent of the number of parameters to be estimated). The unknown parameters of the distribution F(x, θ) must, in this case, be estimated on the basis of the non-grouped data by the maximum likelihood method. Here the vector P = (P 1, …, P k )τ is assumed to be specified, while the boundary points of the intervals are defined using the relations x i (θ) = F −1(P 1 + … + P i ), i = 1, …, k – 1. The proposed statistic has the form [4]:

\( Y_{n}^{2} (\theta ) = X_{n}^{2} (\theta ) + \frac{1}{n}a^{\tau } (\theta )\Lambda^{ - 1} (\theta )a(\theta ), \quad (9) \)

where \( X_{n}^{2} \) is calculated using (1) and \( \Lambda (\theta ) = J(\theta ) - J_{g} (\theta ) \). For distribution laws that are determined only by shift and scale parameters, the matrices σ 2 J(θ) and σ 2 J g (θ) do not depend on the values of the parameters.
In the case of the normal distribution with parameter vector θ τ = (μ, σ), the Fisher information matrix has the form:

\( J(\theta ) = \frac{1}{{\sigma^{2} }}\left[ {\begin{array}{*{20}c} 1 & 0 \\ 0 & 2 \\ \end{array} } \right], \)
with the elements of the information matrix based on grouped data J g (θ) given by
where

\( \varPhi (x) = \frac{1}{{\sqrt {2\pi } }}\int\limits_{ - \infty }^{x} {e^{{ - t^{2} /2}} dt} \)

is the standard normal distribution function. The elements of the vector \( a^{\tau } (\theta ) = \left[ {a(\mu ),{\mkern 1mu} {\mkern 1mu} a(\sigma )} \right] \) are given by
As in the case of the Pearson test, when testing for normality with MLE estimation of the parameters μ and σ based on the non-grouped data, Tables 2 and 3 can be used.
For calculating statistic (9), the boundaries separating the intervals for given k are found from the values of t i in the corresponding row of Table 2 using the formula \( x_{i} = \hat{\sigma }t_{i} + \hat{\mu } \), where \( \hat{\mu } \) and \( \hat{\sigma } \) are the MLE of the parameters found from the sample data. Then the number of observations n i in each interval is counted. The probabilities P i (θ) of falling into an interval when calculating statistic (9) are taken from the corresponding row of Table 3. The elements of the vector a(θ) and matrix Λ(θ) are calculated using the tabulated data for t i , P i and the resulting estimate \( \hat{\sigma } \).
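Assuming the classical form Y n 2 = X n 2 + n −1 a τ(θ)Λ−1(θ)a(θ) with Λ(θ) = J(θ) − J g (θ) (an assumption of this sketch), the correction term can be computed with a single linear solve; the function name is hypothetical:

```python
import numpy as np

def nrr_statistic(x2, a, J, Jg, n):
    """Nikulin-Rao-Robson statistic: Pearson X^2 plus the quadratic-form
    correction (1/n) a^T (J - Jg)^{-1} a."""
    Lam = np.asarray(J) - np.asarray(Jg)
    a = np.asarray(a, dtype=float)
    return float(x2 + a @ np.linalg.solve(Lam, a) / n)

# toy numbers: J = 2I, Jg = I, so Lam = I and the correction is |a|^2 / n
y2 = nrr_statistic(3.0, [1.0, 2.0], 2.0 * np.eye(2), np.eye(2), n=100)
```

Using `np.linalg.solve` instead of explicitly inverting Λ(θ) is both cheaper and numerically safer when grouping retains almost all of the Fisher information and Λ(θ) is nearly singular.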
To decide on the test results for hypothesis H 0, the value of the statistic \( Y_{n}^{2*} \) is compared with the corresponding critical \( \chi_{k - 1,\alpha }^{2} \) or the achieved level of significance (p-value) \( P\left\{ {Y_{n}^{2} > Y_{n}^{2*} } \right\} \) is found from the corresponding \( \chi_{k - 1}^{2} \) distribution.
To test for normality with MLE calculation of the parameters μ or σ separately on the basis of non-grouped samples, the required tables of AOG can be found in [15].
Estimates of the power of the Nikulin-Rao-Robson test for the competing hypotheses H 1, H 2 and H 3 for k opt are given in Table 6. This test is generally more powerful than the Pearson test (see, for example, its powers relative to the competing hypotheses H 2 and H 3), and here we often have k opt = k max. However, this is not always so. In terms of its power relative to the “tricky” hypothesis H 1, it is inferior to the Pearson test, and k opt in this case is considerably smaller than k max with AOG.
6 Conclusion
The power of the Pearson χ 2 test and Nikulin-Rao-Robson test can be maximized by the optimal selection of the number of intervals and interval boundary points.
Combining the obtained results of the power analysis for the Pearson χ 2 test and Nikulin-Rao-Robson test with the results presented in [15–17], we can see that in regard to the competing hypothesis H 1, the Pearson χ 2 test shows very good results, yielding in power only to some special normality tests.
At the same time, in regard to competing hypotheses H 2 and H 3, the Pearson χ 2 test and Nikulin-Rao-Robson test are inferior in power to most special normality tests and to nonparametric goodness-of-fit tests (Anderson-Darling, Cramer-von Mises-Smirnov, Watson, Kuiper, Zhang, Kolmogorov tests).
This work was supported by the Russian Ministry of Education and Science (project 2.541.2014 K).
References
Chernoff, H., Lehmann, E.L.: The use of maximum likelihood estimates in χ 2 test for goodness of fit. Ann. Math. Stat. 25(3), 579–586 (1954)
Lemeshko, B.Y., Postovalov, S.N.: Limit distributions of the Pearson χ 2 and likelihood ratio statistics and their dependence on the mode of data grouping. Ind. Lab. 64(5), 344–351 (1998)
Nikulin, M.S.: χ 2 tests for continuous distributions with shift and scale parameters. Teor. Veroyatn. Primen. XVIII(3), 583–591 (1973)
Nikulin, M.S.: On χ 2 tests for continuous distributions. Teor. Veroyatn. Primen. XVIII(3), 675–676 (1973)
Rao, K.C., Robson, D.S.: A chi-squared statistic for goodness-of-fit tests within the exponential family. Commun. Stat. 3(1), 1139–1153 (1974)
Lemeshko, B.Y., Chimitova, E.V.: On the choice of the number of intervals in χ 2-type goodness-of-fit tests. Zavod. Lab. Diagn. Mater. 69(1), 61–67 (2003)
Lemeshko, B.Y.: Asymptotically optimum grouping of observations in goodness-of-fit tests. Ind. Lab. 64(1), 59–67 (1998)
Voinov, V., Pya, N., Alloyarova, R.: A comparative study of some modified chi-squared tests. Commun. Stat. Simul. Comput. 38(2), 355–367 (2009)
Denisov, V., Lemeshko, B.: Optimal grouping in estimation and tests of goodness-of-fit hypotheses. Wissenschaftliche Schriftenreihe der Technischen Universität Karl-Marx-Stadt 10, 63–81 (1989)
Lemeshko, B., Postovalov, S., Chimitova, E.: On statistic distributions and the power of the Nikulin χ 2 test. Ind. Lab. 67(3), 52–58 (2001)
Lemeshko, B., Chimitova, E.: Maximization of the power of χ2 tests. Papers of Siberian Branch of Academy of Science of Higher Education, vol. 2, pp. 53–61 (2000)
Denisov, V., Lemeshko, B.: Optimal grouping in the analysis of experimental data. In: Measuring Information Systems, Novosibirsk, pp. 5–14 (1979). [In Russian]
Lemeshko, B.Y., Lemeshko, S.B., Postovalov, S.N., Chimitova, E.V.: Statistical Data Analysis, Simulation and Study of Probability Regularities. Computer Approach, 888 pp. NSTU Publisher, Novosibirsk (2011). [In Russian]
Greenwood, P.E., Nikulin, M.S.: A Guide to Chi-Squared Testing. John Wiley & Sons Inc., New York (1996)
Lemeshko, B.Y.: Tests for checking the deviation from normal distribution law. In: Guide on the Application. INFRA-M, Moscow (2015). doi:10.12737/6086
Lemeshko, B.Y., Lemeshko, S.B.: Comparative analysis of tests for verifying deviation of a distribution from normal. Metrologiya 2, 3–24 (2005). [In Russian]
Lemeshko, B.Y., Rogozhnikov, A.P.: Features and power of some tests of normality. Metrologiya 4, 3–24 (2009). [In Russian]
© 2017 Springer International Publishing AG
Chimitova, E.V., Lemeshko, B.Y. (2017). Chi-Squared Goodness-of-Fit Tests: The Optimal Choice of Grouping Intervals. In: Szewczyk, R., Kaliczyńska, M. (eds) Recent Advances in Systems, Control and Information Technology. SCIT 2016. Advances in Intelligent Systems and Computing, vol 543. Springer, Cham. https://doi.org/10.1007/978-3-319-48923-0_82