1 Introduction

The beta distribution has the probability density function (PDF)

$$ f(x) = x^{p - 1} (1 - x)^{q - 1} /{\rm B}(p,q),\quad 0 \le x \le 1,p > 0,q > 0 . $$
(1)

where B(p,q) is the beta function. This distribution is used in various fields of environmental research. Gottschalk and Weingartner (1998) used the beta distribution in hydrology for runoff coefficients. Yao (1974) modelled the distribution of relative humidity in meteorology with a beta distribution. Flynn (2004) used the beta distribution in a model for human exposure to airborne contaminants. Nadarajah and Kotz (2007) described and discussed modifications of the beta distribution and their applications.

The problem in the statistical inference of the beta distribution was a lack of practical goodness-of-fit tests. It has recently become possible to test the hypothesis of a beta distribution by combining the biased transformation (BT) of Raschke (2009) with a classical test for normality. However, only the Anderson–Darling test has been investigated for the beta distribution so far. The goal of this paper is to investigate further EDF tests (EDF: empirical distribution function) and their power. This aim might not be very interesting from a theoretical point of view, but it is necessary from a practical one. The goodness-of-fit of models for environmental data has been the subject of previous studies, for example Song and Singh (2010) and Crujeiras et al. (2010). The test procedure and the different tests for normality are explained in Sect. 2, where the selection of the tests is also discussed. The power of the tests for the beta distribution is investigated in Sect. 3. Empirical data are analysed in Sect. 4 to demonstrate the practical relevance. Conclusions are drawn in Sect. 5. Large tables containing simulation results can be found in the Appendix.

2 Test procedures

According to Raschke (2009), the basic procedure for testing the hypothesis that a random variable X is beta distributed consists of the following steps:

  • Maximum likelihood estimation of all parameters in Eq. 1 for the sample,

  • Computation of a sample of Y with $Y = F_{\text{normal}}^{-1}(F_{\text{beta}}(X))$,

  • Maximum likelihood estimation of all parameters for a normal distribution for Y with the PDF

    $$ f(y) = 1/\left( {\sigma \sqrt {2\pi } } \right)\exp \left[ { - (y - \mu )^{2} /(2\sigma^{2} )} \right],\quad \sigma > 0,\;{\text{and}} $$
    (2)
  • Application of a goodness-of-fit test for normal distributions.

The hypothesis $H_0$ is that X is beta distributed. The hypothesis $H_0'$ is that Y is normally distributed. $H_0$ is rejected if $H_0'$ is rejected at the same level of significance. The function $F_{\text{normal}}^{-1}$ is the inverse of the cumulative distribution function (CDF) of a normal distribution with µ = 0 and σ = 1. The function $F_{\text{beta}}$ is the CDF of the beta distribution in Eq. 1 with the estimated parameters. The condition that all observations are independent and identically distributed must be fulfilled.
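The first three steps of the procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: SciPy's `beta.fit` with fixed location and scale is taken as the maximum likelihood estimator, the function names `biased_transformation` and `normal_mle` are hypothetical, and the clipping of the CDF values is a numerical safeguard that is not part of the original procedure.

```python
import numpy as np
from scipy import stats


def biased_transformation(x):
    """Return (p_hat, q_hat, y) for a sample x in the open interval (0, 1)."""
    x = np.asarray(x, dtype=float)
    # Step 1: ML estimation of p and q with the support fixed to [0, 1].
    p_hat, q_hat, _, _ = stats.beta.fit(x, floc=0, fscale=1)
    # Step 2: Y = F_normal^{-1}(F_beta(X)) with the standard normal quantile function.
    u = stats.beta.cdf(x, p_hat, q_hat)
    # Assumption (not from the paper): keep u away from 0 and 1 to avoid infinite y.
    u = np.clip(u, 1e-12, 1.0 - 1e-12)
    y = stats.norm.ppf(u)
    return p_hat, q_hat, y


def normal_mle(y):
    """Step 3: ML estimates of mu and sigma (sigma uses the divisor n)."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(y)), float(np.std(y))  # np.std defaults to ddof=0


# Example with simulated data; the parameter values are arbitrary.
rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=200)
p_hat, q_hat, y = biased_transformation(x)
mu_hat, sigma_hat = normal_mle(y)
print(p_hat, q_hat, mu_hat, sigma_hat)
```

The transformed sample `y` and the estimates `mu_hat` and `sigma_hat` are the input for the fourth step, the goodness-of-fit test for normality.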

Different tests for normality exist. del Barrio et al. (2000) describe the basics and the history of important tests. Landry and Leparge (1992) investigated the power of such tests. Only the EDF tests are considered here because they are more powerful than most of the other tests, according to the results of Landry and Leparge (1992). “More powerful” means that the average and the minimum share of rejections are higher when $H_0$ is actually false. Some other tests, such as the Oja test (Oja 1981, 1983), are also powerful, but critical values do not exist for every sample size. The same combination of high power and missing critical values applies to the Shapiro–Wilk test (Shapiro and Wilk 1965) and its approximation (Shapiro and Francia 1972). Such tests are not considered here. The test based on the L2-Wasserstein distance (del Barrio et al. 1999; del Barrio et al. 2000) is not considered for several reasons. Krauzci (2009) developed a concrete test procedure (critical values) for this approach, but its power has not been investigated and compared satisfactorily with other tests. Results are published only for sample size n = 20, and the classical EDF tests are applied with a mistake: the older critical values for the Anderson–Darling test (Stephens 1974) are used instead of the current values (Stephens 1986). This mistake was also made by Landry and Leparge (1992). Although the basis of the Anderson–Darling and the Kolmogorov–Smirnov tests is the same in these studies, there are considerable differences between the shares of rejections computed by Landry and Leparge (1992, Table 5) and Krauzci (2009, Tables 2, 3). The differences are noted in the cases “Be(1,1)”-“Beta(1,1)”, “Be(2,2)”-“Beta(2,2)”, “We(2,1)”-“Weibull(2)”, “t10”-“Studenten(10)” and “C(0,1)”-“Cauchy”. Last but not least, the computational effort for the test based on the L2-Wasserstein distance seems to be considerable.

Here, the EDF tests according to Stephens (1986) are used. They require only the values of the ordered sample of Y, with

$$ D^{ + } = \mathop {\max }\limits_{i} \left[ {i/n - F(Y_{i} )} \right] $$

and

$$ D^{ - } = \mathop {\max }\limits_{i} \left[ {F(Y_{i} ) - (i - 1)/n} \right] $$

The Kolmogorov–Smirnov test value is represented by D* and is defined by

$$ \begin{aligned} D & = \max (D^{ + } ,D^{ - } )\;{\text{and}} \\ D^{*} & = D\left( {\sqrt n - 0.01 + 0.85/\sqrt n } \right). \\ \end{aligned} $$

The Kuiper test value is represented by V* and defined by

$$ \begin{aligned} V & = D^{ + } + D^{ - } \;{\text{and}} \\ V^{*} & = V\left( {\sqrt n + 0.05 + 0.82/\sqrt n } \right). \\ \end{aligned} $$

The Cramér-von Mises test value is represented by $W^{2*}$ and defined by

$$ \begin{aligned} W^{2} & = \sum\limits_{i = 1}^{n} {\left[ {F(Y_{i} ) - (2i - 1)/(2n)} \right]^{2} } + 1/(12n)\;{\text{and}} \\ W^{2*} & = W^{2} \left( {1.0 + 0.5/n} \right). \\ \end{aligned} $$

The Watson test value is represented by $U^{2*}$ and defined by

$$ \begin{aligned} U^{2} & = W^{2} - n\left\{ {\sum\limits_{i = 1}^{n} {\left[ {F(Y_{i} )} \right]/n - 0.5} } \right\}^{2} \;{\text{and}} \\ U^{2*} & = U^{2} (1.0 + 0.5/n). \\ \end{aligned} $$

The Anderson–Darling test value is represented by $A^{2*}$ and defined by

$$ \begin{aligned} A^{2} & = - n - \sum\limits_{i = 1}^{n} {\left\{ {(2i - 1)\left( {\ln \left[ {F(Y_{i} )} \right] + \ln \left[ {1 - F(Y_{n + 1 - i} )} \right]} \right)} \right\}/n} \;{\text{and}} \\ A^{2*} & = A^{2} \left( {1.0 + 0.75/n + 2.25/n^{2} } \right). \\ \end{aligned} $$

$F(Y_i)$ is the CDF of the normal distribution with the maximum likelihood estimated parameters. The critical values for the significance levels α = 5% and 10% are 0.895 and 0.819 for the Kolmogorov–Smirnov test, 1.489 and 1.386 for the Kuiper test, 0.126 and 0.104 for the Cramér-von Mises test, 0.117 and 0.096 for the Watson test and 0.752 and 0.631 for the Anderson–Darling test (Stephens 1986, Table 4.7, upper tail).
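The five modified statistics defined above can be computed together, as in the following sketch. The function names `edf_statistics` and `rejects` and the dictionary `CRITICAL` are hypothetical; the formulas and the critical values are those quoted in the text, and the sample passed in is assumed to be the transformed sample Y.

```python
import numpy as np
from scipy import stats

# Critical values quoted in the text (Stephens 1986, Table 4.7, upper tail).
CRITICAL = {
    "KS": {0.05: 0.895, 0.10: 0.819},
    "Kuiper": {0.05: 1.489, 0.10: 1.386},
    "CvM": {0.05: 0.126, 0.10: 0.104},
    "Watson": {0.05: 0.117, 0.10: 0.096},
    "AD": {0.05: 0.752, 0.10: 0.631},
}


def edf_statistics(y):
    """Modified EDF statistics for normality with ML-estimated mu and sigma."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    F = stats.norm.cdf(y, loc=y.mean(), scale=y.std())  # y.std() uses divisor n
    i = np.arange(1, n + 1)
    sqrt_n = np.sqrt(n)
    d_plus = np.max(i / n - F)
    d_minus = np.max(F - (i - 1) / n)
    D = max(d_plus, d_minus)
    V = d_plus + d_minus
    W2 = np.sum((F - (2 * i - 1) / (2 * n)) ** 2) + 1 / (12 * n)
    U2 = W2 - n * (F.mean() - 0.5) ** 2
    # F[::-1] supplies F(Y_{n+1-i}) for i = 1, ..., n.
    A2 = -n - np.sum((2 * i - 1) * (np.log(F) + np.log1p(-F[::-1]))) / n
    return {
        "KS": D * (sqrt_n - 0.01 + 0.85 / sqrt_n),
        "Kuiper": V * (sqrt_n + 0.05 + 0.82 / sqrt_n),
        "CvM": W2 * (1.0 + 0.5 / n),
        "Watson": U2 * (1.0 + 0.5 / n),
        "AD": A2 * (1.0 + 0.75 / n + 2.25 / n ** 2),
    }


def rejects(statistics, alpha):
    """Compare each modified statistic with its critical value (alpha = 0.05 or 0.10)."""
    return {name: value > CRITICAL[name][alpha] for name, value in statistics.items()}


# Example: a clearly non-normal sample (exponential) should be rejected.
rng = np.random.default_rng(1)
result = edf_statistics(rng.exponential(size=200))
print(result)
print(rejects(result, 0.05))
```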

To verify the tests, beta distributed samples of X were simulated, the parameters were estimated and the tests were applied for 10,000 samples. The share of rejections is listed in Table 2 for different sample sizes, parameter constellations in Eq. 1 and significance levels of α = 5% and 10%. The considered parameters are relatively small; the largest value is p,q = 20. Small changes of the parameters in the range p,q < 3 influence the shape of the beta distribution considerably. When p and q become large, the beta distribution approaches the shape of a normal distribution (see Johnson et al. 1995), and this remains the case when p and q become even larger. That is why the same shares of rejections can be expected for parameters p,q > 20 as for p,q = 20.

The EDF tests used here work as intended. They also work for α = 1%, but these results are not shown here.
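A verification of this kind can be sketched as a small Monte Carlo experiment. The sketch below is restricted to the Anderson–Darling test at α = 5% and uses only 500 repetitions to keep the run time short, whereas the study uses 10,000 samples. The function names and the chosen values p = q = 2 and n = 50 are assumptions for illustration, and SciPy's `beta.fit` stands in for the maximum likelihood estimation.

```python
import numpy as np
from scipy import stats

AD_CRIT_5 = 0.752  # Anderson-Darling critical value for alpha = 5% quoted in the text


def ad_modified(y):
    """Modified Anderson-Darling statistic A^{2*} for normality with estimated parameters."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    F = stats.norm.cdf(y, loc=y.mean(), scale=y.std())
    i = np.arange(1, n + 1)
    a2 = -n - np.sum((2 * i - 1) * (np.log(F) + np.log1p(-F[::-1]))) / n
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


def bt_sample(x):
    """Biased transformation: fit the beta distribution, then map to the normal scale."""
    p_hat, q_hat, _, _ = stats.beta.fit(x, floc=0, fscale=1)
    # Clipping is a numerical safeguard and an assumption of this sketch.
    u = np.clip(stats.beta.cdf(x, p_hat, q_hat), 1e-12, 1.0 - 1e-12)
    return stats.norm.ppf(u)


def rejection_share(p, q, n, reps, seed=0):
    """Share of rejections of H0 for beta distributed samples, i.e. when H0 is true."""
    rng = np.random.default_rng(seed)
    rejected = 0
    for _ in range(reps):
        x = rng.beta(p, q, size=n)
        rejected += ad_modified(bt_sample(x)) > AD_CRIT_5
    return rejected / reps


share = rejection_share(p=2.0, q=2.0, n=50, reps=500)
print(share)
```

If the test holds its level, the printed share should lie close to the nominal 5%, up to simulation noise.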

3 Research of the power of the tests

3.1 Alternative distributions

In the first case, the power of the EDF tests is investigated with samples of a random variable X that is not beta distributed. The log-normal distribution is used with the PDF

$$ f(x) = 1/\left( {x\sigma \sqrt {2\pi } } \right)\exp \left[ { - (\ln (x) - \mu )^{2} /(2\sigma^{2} )} \right],\quad \sigma > 0,x > 0. $$

This distribution, defined for x > 0, is truncated here at $x = F^{-1}(0.99)$ of the non-truncated case and then scaled so that 0 < x ≤ 1. The gamma distribution is used in the same way. The gamma PDF is defined as

$$ f(x) = \left( {x/\lambda } \right)^{\alpha - 1} \exp \left( { - x/\lambda } \right)/\left[ {\lambda \Gamma (\alpha )} \right],\quad x > 0,\lambda > 0,\alpha > 0. $$

A normal distribution is truncated at $x = F^{-1}(0.01)$ and $x = F^{-1}(0.99)$ of the non-truncated case. Then the distribution is scaled and shifted so that 0 ≤ x ≤ 1.

The triangular distribution is also used, with the density function

$$ f(x) = \left\{ { \begin{array}{*{20}c} {\frac{2x}{c}} \hfill & {{\text{for}}\,0 \le x < c,} \hfill \\ {{\frac{2(1 - x)}{1 - c}}} \hfill & {{\text{for}}\,c \le x \le 1,} \hfill \\ {0,} \hfill & {\text{elsewhere,}} \hfill \\ \end{array} } \right. $$

with 0 < c < 1. The trapezoidal distribution is used with the density function

$$ f(x) = \left\{ {\begin{array}{*{20}c} {{\frac{2x}{c(1 + d - c)}},} \hfill & {{\text{for}}\,0 \le x < c,} \hfill \\ {{\frac{2}{1 + d - c}},} \hfill & {{\text{for}}\,c \le x < d,} \hfill \\ {{\frac{2(1 - x)}{(1 - d)(1 + d - c)}},} \hfill & {{\text{for}}\,d \le x \le 1,} \hfill \\ \end{array} } \right. $$

with 0 < c < d < 1.
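Samples from these alternative distributions can be generated by inverse-CDF sampling, as in the following sketch. The helper names `truncated_scaled` and `trapezoid_ppf` and all parameter values are hypothetical choices for illustration. The quantile function of the trapezoidal distribution is derived here by integrating and inverting the density given above.

```python
import numpy as np
from scipy import stats


def truncated_scaled(dist, size, rng, lower_q=0.0, upper_q=0.99):
    """Sample dist between two of its quantiles and map the result onto [0, 1]."""
    u = rng.uniform(lower_q, upper_q, size=size)
    lo = dist.ppf(lower_q)
    hi = dist.ppf(upper_q)
    return (dist.ppf(u) - lo) / (hi - lo)


def trapezoid_ppf(u, c, d):
    """Quantile function of the trapezoidal density with 0 < c < d < 1."""
    u = np.asarray(u, dtype=float)
    h = 1.0 + d - c
    rising = np.sqrt(u * c * h)                         # branch 0 <= x < c
    flat = (u * h + c) / 2.0                            # branch c <= x < d
    falling = 1.0 - np.sqrt((1.0 - u) * (1.0 - d) * h)  # branch d <= x <= 1
    return np.where(u < c / h, rising,
                    np.where(u < (2.0 * d - c) / h, flat, falling))


rng = np.random.default_rng(0)
n = 1000
# Log-normal with mu = 0 and sigma = 0.5, truncated at the 0.99 quantile.
x_lognorm = truncated_scaled(stats.lognorm(s=0.5, scale=np.exp(0.0)), n, rng)
# Gamma with alpha = 2 and lambda = 1, truncated in the same way.
x_gamma = truncated_scaled(stats.gamma(a=2.0, scale=1.0), n, rng)
# Normal distribution truncated at the 0.01 and 0.99 quantiles.
x_normal = truncated_scaled(stats.norm(), n, rng, lower_q=0.01)
# Triangular distribution on [0, 1] with mode c = 0.3.
x_triangle = rng.triangular(0.0, 0.3, 1.0, size=n)
# Trapezoidal distribution with c = 0.3 and d = 0.7.
x_trapezoid = trapezoid_ppf(rng.uniform(size=n), c=0.3, d=0.7)
```

Each of these samples can then be passed to the test procedure as if it were beta distributed.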

Samples of X were simulated, the parameters of an assumed beta distribution were estimated and the tests were applied for 10,000 repetitions. The share of rejections is listed in Table 3 for different sample sizes, parameter constellations and significance levels of α = 5% and 10%. The Anderson–Darling test is the most powerful and has the highest average share of rejections. It is followed by the Cramér-von Mises and the Watson test.

3.2 Contaminated beta distribution

In this case, beta distributed samples of X are generated, but 20% of the sample values come from a different beta distributed variable with parameters p* and q*. The share of rejections for 10,000 repetitions is listed in Table 4 for different sample sizes, parameter constellations and significance levels of α = 5% and 10%. The Anderson–Darling test is once again the most powerful and has the highest average share of rejections. It is followed again by the Cramér-von Mises and the Watson test.
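A contaminated sample of this kind might be generated as in the sketch below. It assumes that a fixed share of 20% of the values in each sample is drawn from the contaminating distribution; a random share per sample would be an equally plausible reading. The function name and the parameter values are hypothetical.

```python
import numpy as np


def contaminated_beta(p, q, p_star, q_star, n, rng, share=0.2):
    """Sample of size n in which a fixed share comes from Beta(p_star, q_star)."""
    n_cont = int(round(share * n))
    main = rng.beta(p, q, size=n - n_cont)
    contaminant = rng.beta(p_star, q_star, size=n_cont)
    x = np.concatenate([main, contaminant])
    rng.shuffle(x)  # the order is irrelevant for EDF tests; shuffled for realism
    return x


rng = np.random.default_rng(0)
x_cont = contaminated_beta(2.0, 2.0, 20.0, 2.0, n=5000, rng=rng)
print(x_cont.mean())
```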

3.3 Scaled beta distribution

In the final case, the beta distribution is scaled so that 0 ≤ x ≤ s but is assumed to be non-scaled (0 ≤ x ≤ 1). The value s is the scale parameter. The share of rejections for 10,000 repetitions is listed in Table 5 for different sample sizes, parameter constellations and significance levels of α = 5% and 10%. The Anderson–Darling test is again the most powerful and has the highest average share of rejections. It is followed again by the Cramér-von Mises and the Watson test.
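The scaled case can be illustrated with a few lines of Python. The sketch assumes s < 1, so that the scaled values still lie inside the unit interval and a non-scaled beta distribution can be fitted to them; the function name and the values p = q = 2 and s = 0.7 are hypothetical.

```python
import numpy as np
from scipy import stats


def scaled_beta(p, q, s, n, rng):
    """Beta(p, q) sample rescaled to the interval [0, s]."""
    return s * rng.beta(p, q, size=n)


rng = np.random.default_rng(0)
x_scaled = scaled_beta(2.0, 2.0, 0.7, 500, rng)
# The test procedure wrongly assumes the support [0, 1] when estimating p and q.
p_fit, q_fit, _, _ = stats.beta.fit(x_scaled, floc=0, fscale=1)
print(p_fit, q_fit)
```

Because the data never exceed s, the fitted distribution is skewed towards zero although the generating distribution is symmetric on its own support.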

3.4 Discussion of results

The Anderson–Darling test is the most powerful test in almost all investigated constellations with non-beta distributed, contaminated or scaled samples of X (Tables 3, 4, 5). It has the highest share of rejections. The power of the different tests is shown in Fig. 1 as a comparison of the minima and averages of the shares of rejections. According to this figure, it is obvious that the Anderson–Darling test is the most powerful. The Cramér-von Mises test is the second most powerful, followed by the Watson test. The Kolmogorov–Smirnov test and the Kuiper test are the least powerful. As expected, the power of the tests increases with increasing sample size.

Fig. 1 Minima and averages of the shares of rejections in the investigated cases: distributions different from beta (black), contaminated beta distribution (dark gray) and scaled beta distribution (light gray) for α = 5% according to the values in Tables A2–A4

The power of the tests is qualitatively the same for a significance level of α = 1%. These results are not listed here as they provide no additional information.

4 Practical application in environmental data

The EDF test procedure for the beta distribution is now applied to empirical data. To demonstrate the differences from the new tests, the Kolmogorov–Smirnov test for a fully specified distribution is also applied, as done by Chia and Hutchinson (1991), as well as the classical $\chi^2$ test, as done by Yao (1974).
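The behaviour of the Kolmogorov–Smirnov test for a fully specified distribution can be explored with a short simulation. The sketch below applies SciPy's `kstest` with parameters estimated from the same sample, which is the situation criticised here. It is a well-known effect that this makes the test conservative, so the share of rejections under a true $H_0$ is expected to fall far below the nominal level. The function name, the parameter values and the number of repetitions are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def ks_fully_specified_pvalue(x):
    """p-value of the classical KS test, treating the fitted beta parameters as known."""
    p_hat, q_hat, _, _ = stats.beta.fit(x, floc=0, fscale=1)
    return stats.kstest(x, "beta", args=(p_hat, q_hat)).pvalue


rng = np.random.default_rng(0)
reps, n, alpha = 300, 50, 0.10
rejections = sum(
    ks_fully_specified_pvalue(rng.beta(2.0, 5.0, size=n)) < alpha
    for _ in range(reps)
)
ks_share = rejections / reps
print(ks_share)  # expected to be well below alpha when H0 is true
```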

In meteorology, some random variables are assumed to be beta distributed, such as the relative humidity at the surface (Yao 1974), the daily cloud duration (Chia and Hutchinson 1991) and the sunshine duration (Sulaiman et al. 1999). For this reason, an example of meteorological data is analysed here. The relative humidity (RH) of the air in May 2007 and May 2008, recorded at the Haarweg Wageningen weather station of Wageningen University (Netherlands, 2009), has been selected (see Appendix). The two months are analysed separately, just as Yao (1974) analysed data per month. The results of the estimation and the different tests are listed in Table 1.

Table 1 Results of ML estimation and goodness-of-fit tests for the empirical examples of meteorology and hydrology

The observations and the estimated CDF are shown in Fig. 2a. The hypothesis that the relative humidity RH is beta distributed is not rejected for May 2007 at a significance level of α ≥ 10% according to all new EDF tests. The Kolmogorov–Smirnov test (fully specified) also accepts $H_0$ for α ≥ 10%. In contrast, the $\chi^2$ test accepts the hypothesis only for α ≥ 5%. For the data of May 2008, the hypothesis is rejected at a significance level of 1% by all new tests except the Kolmogorov–Smirnov test. This new test rejects the hypothesis with α ≥ 5%, like the classical $\chi^2$ test. In contrast to these tests, the Kolmogorov–Smirnov test for the fully specified distribution accepts the hypothesis at a level of α ≥ 10%.

Fig. 2 Empirical and estimated CDF: a relative humidity RH for the month of May at the Haarweg-Wageningen weather station, b runoff coefficient c for catchments of Switzerland (empirical $F_i = i/(n+1)$ with i as the position of the observation in the ordered sample)

The runoff coefficient c of catchments is modelled in hydrology as a beta distributed random variable (Gottschalk and Weingartner 1998), which is why this variable is selected as a further empirical example. The runoff coefficient data were provided by Gottschalk and Weingartner (personal communication, 2009), except for the data for Emme-Eggiwill and Langeten-Huttwil (Gottschalk and Weingartner 1998, Table 1). Samples from four regions of Switzerland are analysed: Alpine, Pre-alpine, Midland and Southern alpine. The results of the ML estimation and the tests are listed in Table 1. The observations and the estimated CDF are shown in Fig. 2b. According to the new tests, the runoff coefficient c can be assumed to be beta distributed for the Alpine and Midland regions with a significance level of α > 10%. In contrast to these results, the hypothesis of a beta distributed runoff coefficient is rejected for the Pre-alpine and Southern alpine regions with a significance level of α ≥ 1–10%, partly with α < 1%. The hypothesis $H_0$ would be accepted for all regions by the classical $\chi^2$ test and by the Kolmogorov–Smirnov test for the fully specified distribution with α ≥ 10%, except for the $\chi^2$ test for the Southern alpine region, which accepts $H_0$ only with α ≥ 5%.

The EDF test for the beta distribution works well in contrast to the classical $\chi^2$ test and the Kolmogorov–Smirnov test for the fully specified distribution (compare Table 2 with Table 6 in Raschke 2009).

Table 2 Share of rejections of different tests for beta distributed X with different sample sizes n, parameters in Eq. 1 and levels of significance

5 Conclusion

The power of different EDF tests applied to the beta distribution by using the BT has been investigated. In the investigated cases, the Anderson–Darling test has the highest power, followed by the Cramér-von Mises and the Watson test. They are recommended for application in this order. The classical $\chi^2$ test and the Kolmogorov–Smirnov test for the fully specified distribution should not be applied to test the assumption of a beta distribution with estimated parameters, because they do not work well. The test for normality based on the L2-Wasserstein distance (del Barrio et al. 1999; del Barrio et al. 2000; Krauzci 2009) should be considered for testing the beta distribution in the future, once the power of this test for normality has been compared more thoroughly with other tests.