Abstract
The exponential distribution is a popular model both in practice and in theoretical work. As a result, a multitude of tests based on varied characterisations have been developed for testing the hypothesis that observed data are realised from this distribution. Many of the recently developed tests contain a tuning parameter, usually appearing in a weight function. In this paper we compare the powers of 20 tests for exponentiality, some containing a tuning parameter and some not. To ensure a fair ‘apples to apples’ comparison among the tests, we employ a data-dependent choice of the tuning parameter for those tests that contain one. The comparisons are conducted for various sample sizes and for a large number of alternative distributions. The results of the simulation study show that the test with the best overall power performance is the Baringhaus and Henze test, followed closely by the test by Henze and Meintanis; both tests contain a tuning parameter. The score test by Cox and Oakes performs the best among those tests that do not include a tuning parameter.
1 Introduction and motivation
The exponential distribution is a popular choice of model both in practice and in theoretical work. For this reason a great deal of research has been dedicated to the large number of ways in which it can be uniquely characterised. This has ultimately led to a multitude of tests for testing the hypothesis that observed data are realised from the exponential distribution.
Several authors have written review papers on this topic, describing and comparing a number of tests, see, for example, Spurrier (1984), Ascher (1990) and Henze and Meintanis (2002). However, the most recent review paper on this topic was written more than 10 years ago by Henze and Meintanis (2005). Since then, a number of new tests have been proposed, see for example Jammalamadaka and Taufer (2006), Haywood and Khmaladze (2008), Mimoto and Zitikus (2008), Wang (2008), Volkova (2010), Grané and Fortiana (2011), Abbasnejad et al. (2012), Baratpour and Habibi Rad (2012), Volkova and Nikitin (2013), Meintanis et al. (2014) and Zardasht et al. (2015). Furthermore, many of the tests for exponentiality contain a tuning parameter, often appearing in a weight function. The fact that the powers of these tests are functions of the tuning parameter complicates the comparisons between tests. In many papers the authors evaluate the power of the test over a grid of possible values of this parameter, but the problem with this approach is that the optimal choice of the tuning parameter is unknown in practice. In these papers the authors often provide a so-called ‘compromise’ choice; this is a choice of the tuning parameter that provides reasonably high power for the majority of the alternatives considered in their finite sample studies. Examples of papers that contain these compromise choices include Henze and Meintanis (2002, 2005) and Meintanis et al. (2014). However, while these fixed choices of the parameter are able to produce high powers against a number of alternatives, they can also produce abysmally low powers against other alternatives. Naturally, in practice, the distribution of the realised data is unknown, meaning that the power of tests employing the compromise choice might be suspect.
A method to choose the value of the tuning parameter data-dependently is proposed in Allison and Santana (2015). This approach removes the practical problem of choosing the tuning parameter and also allows one to directly compare the powers achieved by various goodness-of-fit tests.
The aim of this paper is to objectively compare the powers of various tests for exponentiality. Where applicable, the methodology detailed in Allison and Santana (2015) is used in order to choose the value of the tuning parameter data-dependently; this allows a fair ‘apples to apples’ comparison between the tests containing a tuning parameter and those without one.
The remainder of the paper is organised as follows: In Sect. 2 we introduce and provide details of the various tests for exponentiality that form part of the simulation study. The data-dependent choice of the tuning parameter is discussed in Sect. 3. Section 4 presents the results of an extensive Monte Carlo study of the empirical powers of the tests against numerous alternatives to the exponential distribution (a distinction is made between alternative distributions with increasing, decreasing, and non-monotone hazard rates). In Sect. 5 we apply all the tests to a real-world data set and the paper concludes in Sect. 6 with some final remarks.
2 Tests for exponentiality
Let \(X_{1},X_{2},\ldots ,X_{n}\) be a sequence of independent and identically distributed continuous realisations of a random variable X. Denote the exponential distribution with expectation \(1/\lambda \) by \(Exp\left( \lambda \right) \). The composite goodness-of-fit hypothesis to be tested is
$$\begin{aligned} H_{0}: \text {the distribution of } X \text { is } Exp\left( \lambda \right) \end{aligned}$$
for some \(\lambda >0\), against general alternatives.
The majority of the test statistics that we consider are based on the scaled values \(Y_{j}=X_{j}\hat{\lambda }\), where \(\hat{\lambda }=1/\bar{X}_{n}\) with \( \bar{X}_{n}=\frac{1}{n}\sum _{j=1}^{n}X_{j}\). The use of scaled values is motivated by the invariance property of the exponential distribution with respect to scale transformations. Since X follows an exponential distribution if and only if cX is exponentially distributed for every \(c>0\), we would not expect a scale transformation to influence the conclusion drawn regarding the exponentiality of X. As a result, the test statistic depends on the data only through scaled versions of the original data, and the conclusions drawn regarding the exponentiality of \(X_{1},\ldots ,X_{n}\) and \( Y_{1},\ldots ,Y_{n}\) should be the same. In the remainder of the paper we denote the order statistics of \(X_{j}\) and \(Y_{j}\) by \(X_{\left( 1\right) }<X_{\left( 2\right) }<\cdots <X_{\left( n\right) }\) and \(Y_{\left( 1\right) }<Y_{\left( 2\right) }<\cdots <Y_{\left( n\right) }\) respectively.
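The scaling step can be illustrated with a short sketch. The helper name `scale_by_mean` is hypothetical and is introduced here only for illustration; it computes \(Y_{j}=X_{j}\hat{\lambda }\) and the example shows that multiplying the data by a constant leaves the scaled values unchanged.

```python
def scale_by_mean(x):
    """Return the scaled values Y_j = X_j * lambda_hat, with lambda_hat = 1 / mean(x)."""
    n = len(x)
    xbar = sum(x) / n        # sample mean
    lam_hat = 1.0 / xbar     # estimate of lambda
    return [xj * lam_hat for xj in x]


# Illustrative data (hypothetical values, not taken from the paper)
x = [0.8, 2.5, 0.3, 1.4]
y = scale_by_mean(x)
# Rescaling the raw data by c = 10 should give the same scaled values
y_rescaled = scale_by_mean([10.0 * xj for xj in x])
```

The scaled sample always has mean one, so any statistic computed from `y` alone is automatically scale invariant.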
In this section we provide short descriptions of the 20 tests for exponentiality that we compare to one another in the Monte Carlo study in Sect. 4. The tests are arranged according to the characteristic of the exponential distribution on which each test is based. They were chosen to provide a diverse selection of established tests (tests that have been shown to perform well in terms of power) and newly developed tests, including tests that contain a tuning parameter as well as tests that do not. In addition to the tests presented in this section, we also provide references to numerous other tests for exponentiality not included in this study.
2.1 Tests based on the empirical characteristic function
In recent years many goodness-of-fit tests have been developed which are based on the characteristic function (CF). Typically in these tests the CF of a random variable X, given by
$$\begin{aligned} \phi (t) = E\left[ e^{itX}\right] , \end{aligned}$$
is estimated by the empirical characteristic function (ECF) of the data \(X_{1},\ldots ,X_{n}\), defined as
$$\begin{aligned} \phi _n(t) = \frac{1}{n}\sum _{j=1}^{n} e^{itX_j}. \end{aligned}$$
Standard methods for testing that employ the ECF utilise the L2-type distance
$$\begin{aligned} \int _{-\infty }^{\infty } \left| \phi _n(t) - \phi (t)\right| ^2 w_\gamma (t)\,dt, \end{aligned}$$
which incorporates the CF, ECF and a parametric weight function \(w_\gamma (t)\), which depends on some tuning parameter \(\gamma \) and usually satisfies the conditions \(\int _{-\infty }^\infty t^2 w_\gamma (t) dt <\infty \), \(w_\gamma (t) = w_\gamma (-t)\), and \(w_\gamma (t) \ge 0,~\forall ~t\).
There has been considerable discussion in the literature on the choice of \(w_\gamma (t)\). Popular choices are \(w_\gamma (t)= e^{-\gamma |t|}\) or \(w_\gamma (t) = e^{-\gamma t^2}\). Both of these correspond to kernel-based choices with \(e^{-\gamma |t|}\) being a multiple of the standard Laplace density as kernel with bandwidth equal to \(1/\gamma \) and \(e^{-\gamma t^2}\) a multiple of the standard normal density as kernel with bandwidth equal to \(1/\sqrt{2\gamma }\) (since \(e^{-\gamma t^2} = e^{-t^2/(2h^2)}\) when \(h = 1/\sqrt{2\gamma }\)).
For various tests for exponentiality that incorporate the ECF, the interested reader is referred to Henze and Meintanis (2002) and Henze and Meintanis (2005) and the references therein. However, for the purposes of this paper we will only focus on the ‘Epps and Pulley’ test proposed in Epps and Pulley (1986) and a more recent test based on the concept of the probability weighted empirical characteristic function (PWECF) proposed in Meintanis et al. (2014).
2.1.1 Epps and Pulley (1986) test (\({ EP}_n\))
The test proposed in Epps and Pulley (1986) is based on the difference between the ECF, \(\phi _n(t)\), of \(X_1, X_2,\ldots ,X_n\) and the CF of the exponential distribution, \(\phi _0(t,\lambda )=\lambda /(\lambda - it)\). If the data are exponentially distributed with parameter \(\lambda \), then \(\phi _n(t)\) should be close to \(\phi _0(t,\lambda )\).
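Estimating \(\lambda \) by \(\widehat{\lambda } = 1/\bar{X}_n\), the test is based on the idea that the quantity
$$\begin{aligned} \int _{-\infty }^{\infty } \left[ \phi _n(t) - \phi _0(t,\widehat{\lambda })\right] w(t)\,dt \end{aligned}$$
should be small under the null hypothesis, where
$$\begin{aligned} w(t) = \frac{\widehat{\lambda }}{2\pi (\widehat{\lambda } + it)}. \end{aligned}$$
The normalised Epps and Pulley test statistic simplifies to
$$\begin{aligned} { EP}_n = \sqrt{48n}\left[ \frac{1}{n}\sum _{j=1}^{n} \exp (-Y_j) - \frac{1}{2}\right] . \end{aligned}$$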
This test rejects \(H_{0}\) for large values of \(|{ EP}_{n}|\). The limiting null distribution of this test statistic was shown to be standard normal in Epps and Pulley (1986). Furthermore, the test was also shown to be consistent against absolutely continuous alternative distributions with monotone hazard rates, strictly positive supports and finite expected values. In a number of studies it has been shown that this test is reasonably powerful, see for example Henze and Meintanis (2005).
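A minimal sketch of the computation follows, assuming the normalised statistic takes the form \({ EP}_n = \sqrt{48n}\,[n^{-1}\sum _{j}\exp (-Y_j) - 1/2]\). The function name `epps_pulley` is hypothetical.

```python
import math


def epps_pulley(x):
    """Normalised Epps and Pulley statistic computed from the scaled data Y_j = X_j / mean(X)."""
    n = len(x)
    xbar = sum(x) / n
    y = [xj / xbar for xj in x]                    # scaled values Y_j
    mean_exp = sum(math.exp(-yj) for yj in y) / n  # (1/n) * sum of exp(-Y_j)
    return math.sqrt(48.0 * n) * (mean_exp - 0.5)


# Illustrative data (hypothetical values)
ep = epps_pulley([0.8, 2.5, 0.3, 1.4])
```

Because the limiting null distribution is standard normal, \(|{ EP}_n|\) can be compared with a normal quantile for large n; for small samples, simulated critical values are the safer choice.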
2.1.2 PWECF (\({ PW}^1_{n,\gamma }\) and \({ PW}^2_{n,\gamma }\))
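There has been a lot of discussion regarding the form of the weight function when using goodness-of-fit tests based on the ECF and CF. Fortunately, Meintanis et al. (2014) provides a statistically meaningful way to choose the weight function. This choice reduces the problem to only choosing a tuning parameter \(\gamma \), typically still contained in the weight function. The probability weighted characteristic function (PWCF) is defined as
$$\begin{aligned} \chi (t;\gamma ) = E\left[ W(X;\gamma t)e^{itX}\right] , \end{aligned}$$
where the probability weight function is given by
$$\begin{aligned} W(x;\beta ) = \left[ F_{\lambda }(x)\{1-F_{\lambda }(x)\}\right] ^{|\beta |}, \end{aligned}$$ (1)
and where \(F_{\lambda }\) denotes the exponential distribution function with parameter \(\lambda \). Note that the weight function in (1) places more weight at the centre of the distribution than in the tails. The probability weighted empirical characteristic function (PWECF) is then defined as
$$\begin{aligned} \chi _n(t;\gamma ) = \frac{1}{n}\sum _{j=1}^{n} \widehat{W}(X_j;\gamma t)e^{itX_j}, \end{aligned}$$
where the estimated probability weight is given by
$$\begin{aligned} \widehat{W}(x;\beta ) = \left[ F_{\widehat{\lambda }}(x)\{1-F_{\widehat{\lambda }}(x)\}\right] ^{|\beta |}, \end{aligned}$$ (2)
and where \(F_{\widehat{\lambda }}\) denotes the exponential distribution function with estimated parameter \(\widehat{\lambda }\).
Meintanis et al. (2014) employs these expressions and develops a test for exponentiality based on the L2-norm between \(\chi _{n}(t;\gamma )\) and \(\chi (t;\gamma )\). The resulting test statistic is given by
$$\begin{aligned} { PW}^{1}_{n,\gamma } = n\int _{-\infty }^{\infty } \left| \chi _n(t;\gamma ) - \chi (t;\gamma )\right| ^2 dt. \end{aligned}$$ (3)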
Note that the weight function that plagues other tests based on the ECF no longer appears in the test statistic, since the weight function has been incorporated within the PWECF and PWCF functions themselves. In Meintanis et al. (2014), the limiting null distribution of the test statistic is derived and it is shown that this test is consistent for a very large class of alternative distributions. In a finite sample simulation study, the test was also found to be quite powerful against a variety of alternative distributions.
The test statistic in (3) can be simplified to
where \(Z_j = \exp (-Y_j)\). In the Monte Carlo simulation study presented in Meintanis et al. (2014) the power of this test was evaluated over a grid of possible choices of the tuning parameter \(\gamma \). However, for practical applications the authors suggest using \(\gamma =1\), because this choice fared well for the majority of the alternatives considered in their paper. We will henceforth refer to this type of recommended choice of the parameter as the compromise choice.
In Meintanis et al. (2014), the weight function is chosen to give more weight to the centre of the distribution. In this paper we also consider a weight function that places greater weight on the tails. This alternative choice for the weight function appearing in (2) is given by
and the test statistic resulting from (3) when employing this weight function is denoted by \({ PW}^{2}_{n,\gamma }\). Based on some preliminary Monte Carlo studies, we recommend using \(\gamma =0.1\) as the compromise choice.
Both \({ PW}^{1}_{n,\gamma }\) and \({ PW}^{2}_{n,\gamma }\) reject for large values.
2.2 Tests based on the empirical Laplace transform
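In general, the Laplace transform (LT) of a random variable X is defined as \(E\left[ e^{-tX}\right] \). For a standard exponential random variable, Y, the Laplace transform is given by
$$\begin{aligned} \psi (t) = E\left[ e^{-tY}\right] = \frac{1}{1+t}, \quad t > -1. \end{aligned}$$
Employing the scaled data \( Y_{1},\ldots ,Y_{n}\), \(\psi (t)\) can be estimated by the empirical Laplace transform (ELT),
$$\begin{aligned} \psi _n(t) = \frac{1}{n}\sum _{j=1}^{n} \exp (-tY_j). \end{aligned}$$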
We consider two test statistics based on the ELT, namely the ‘Baringhaus and Henze (1991)’ test and the ‘Henze and Meintanis (2002)’ test.
2.2.1 Baringhaus and Henze (1991) test (\({ BH}_{n,\gamma }\))
Baringhaus and Henze (1991) developed a test based on the following differential equation that characterises the exponential distribution: \((1 + t)\psi '(t) + \psi (t) = 0\), for all \(t \in \mathbb {R}\).
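Their test makes use of the following weighted L2-norm
$$\begin{aligned} { BH}_{n,\gamma } = n\int _{0}^{\infty } \left[ (1+t)\psi _n'(t) + \psi _n(t)\right] ^2 \exp (-\gamma t)\,dt, \end{aligned}$$ (4)
where \(\gamma > 0\) is a constant tuning parameter. It is easy to show that the statistic in (4) simplifies to
$$\begin{aligned} { BH}_{n,\gamma } = \frac{1}{n}\sum _{j=1}^{n}\sum _{k=1}^{n}\left[ \frac{(1-Y_j)(1-Y_k)}{Y_j+Y_k+\gamma } - \frac{Y_j+Y_k}{(Y_j+Y_k+\gamma )^2} + \frac{2Y_jY_k}{(Y_j+Y_k+\gamma )^2} + \frac{2Y_jY_k}{(Y_j+Y_k+\gamma )^3}\right] . \end{aligned}$$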
Baringhaus and Henze (1991) showed that the test statistic has a nondegenerate limiting null distribution and also that the test is consistent against a class of alternative distributions with strictly positive, finite mean. The compromise choice for \(\gamma \) suggested in Baringhaus and Henze (1991) is \(\gamma =1\). This test rejects exponentiality for large values of \({ BH}_{n,\gamma }\).
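The closed form can be sketched in a few lines. This is an illustration only: it assumes the weighted L2-norm \(n\int _0^\infty [(1+t)\psi _n'(t)+\psi _n(t)]^2 e^{-\gamma t}dt\), integrated term by term using \(\int _0^\infty t^m e^{-st}dt = m!/s^{m+1}\) with \(s = Y_j+Y_k+\gamma \). The function name `bh_statistic` is hypothetical.

```python
def bh_statistic(x, gamma=1.0):
    """Closed-form Baringhaus-Henze type statistic (double sum over the scaled data)."""
    n = len(x)
    xbar = sum(x) / n
    y = [xj / xbar for xj in x]  # scaled values Y_j
    total = 0.0
    for yj in y:
        for yk in y:
            s = yj + yk + gamma
            total += ((1.0 - yj) * (1.0 - yk) / s   # constant term, integral 1/s
                      - (yj + yk) / s ** 2          # linear term in t
                      + 2.0 * yj * yk / s ** 2      # remaining part of the linear term
                      + 2.0 * yj * yk / s ** 3)     # quadratic term, integral 2/s^3
    return total / n


# Illustrative data (hypothetical values), compromise choice gamma = 1
bh = bh_statistic([0.8, 2.5, 0.3, 1.4], gamma=1.0)
```

For a single observation the scaled value is 1 and the integrand reduces to \(t^2e^{-3t}\) when \(\gamma =1\), which integrates to 2/27; this gives a quick sanity check on the implementation.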
2.2.2 Henze and Meintanis (2002) test (\(L_{n,\gamma }\))
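The natural idea of creating a test for exponentiality by measuring the L2-distance between the ELT and the LT for the standard exponential distribution was first proposed in Henze (1993). The proposed test statistic has the following form:
$$\begin{aligned} n\int _{0}^{\infty } \left[ \psi _n(t) - \frac{1}{1+t}\right] ^2 \exp (-\gamma t)\,dt. \end{aligned}$$ (5)
This test statistic should produce a value close to zero if the null hypothesis is true. However, the equation in (5) does not simplify to a simple closed-form expression and requires numerical integration. To overcome this issue Henze and Meintanis (2002) proposes the following form of the test statistic:
$$\begin{aligned} L_{n,\gamma } = n\int _{0}^{\infty } \left[ \psi _n(t) - \frac{1}{1+t}\right] ^2 (1+t)^2 \exp (-\gamma t)\,dt, \end{aligned}$$ (6)
where \(\gamma > 0\). The statistic in (6) simplifies to the following closed-form expression:
$$\begin{aligned} L_{n,\gamma } = \frac{1}{n}\sum _{j=1}^{n}\sum _{k=1}^{n}\frac{1 + (Y_j+Y_k+\gamma +1)^2}{(Y_j+Y_k+\gamma )^3} - 2\sum _{j=1}^{n}\frac{1+Y_j+\gamma }{(Y_j+\gamma )^2} + \frac{n}{\gamma }. \end{aligned}$$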
Two possible compromise choices for the parameter \(\gamma \) are suggested for practical applications in Henze and Meintanis (2002); \(\gamma =0.75\) and \(\gamma =1\). For the purpose of this paper, we will make use of \(\gamma =0.75\). This test rejects \(H_{0}\) for large values of \(L_{n,\gamma }\).
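As a hedged illustration, the closed form can be evaluated as follows. The sketch assumes the statistic \(n\int _0^\infty [\psi _n(t) - 1/(1+t)]^2(1+t)^2e^{-\gamma t}dt\), expanded and integrated term by term; the function name `hm_statistic` is hypothetical.

```python
def hm_statistic(x, gamma=0.75):
    """Closed-form Henze-Meintanis type statistic based on the empirical Laplace transform."""
    n = len(x)
    xbar = sum(x) / n
    y = [xj / xbar for xj in x]  # scaled values Y_j
    # Double sum: integral of (1 + t)^2 * exp(-t * (Y_j + Y_k + gamma))
    double_sum = sum((1.0 + (yj + yk + gamma + 1.0) ** 2) / (yj + yk + gamma) ** 3
                     for yj in y for yk in y)
    # Single sum: integral of (1 + t) * exp(-t * (Y_j + gamma))
    single_sum = sum((1.0 + yj + gamma) / (yj + gamma) ** 2 for yj in y)
    return double_sum / n - 2.0 * single_sum + n / gamma


# Illustrative data (hypothetical values), compromise choice gamma = 0.75
l_stat = hm_statistic([0.8, 2.5, 0.3, 1.4], gamma=0.75)
```

Since the statistic is the integral of a squared quantity it cannot be negative, which is a useful check when implementing the three-term closed form.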
2.3 Tests based on the empirical distribution function
The use of distance measures based on the empirical distribution function (EDF) is one of the earliest approaches to goodness-of-fit testing. The EDF based on the scaled data \(Y_1, \ldots , Y_n\) is defined as
$$\begin{aligned} F_n(x) = \frac{1}{n}\sum _{j=1}^{n} I(Y_j \le x), \end{aligned}$$
where \(I(\cdot )\) denotes the indicator function and \(x \in \mathbb {R}\). The tests considered measure the discrepancy between the standard exponential distribution function and the EDF. The most famous of these include the Kolmogorov–Smirnov and Cramér–von Mises tests (see, for example, D’Agostino and Stephens 1986), which are discussed below. Another test, based on the integrated EDF, can be found in Klar (2001), but is not discussed here.
2.3.1 Kolmogorov–Smirnov (\({ KS}_n\))
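The Kolmogorov–Smirnov test statistic is given by:
$$\begin{aligned} { KS}_n = \sup _{x \ge 0}\left| F_n(x) - (1 - e^{-x})\right| . \end{aligned}$$ (7)
The test statistic in (7) can be simplified to
$$\begin{aligned} { KS}_n = \max \left\{ { KS}_n^{+}, { KS}_n^{-}\right\} , \end{aligned}$$
where
$$\begin{aligned} { KS}_n^{+} = \max _{1 \le j \le n}\left[ \frac{j}{n} - \left( 1 - e^{-Y_{(j)}}\right) \right] , \quad { KS}_n^{-} = \max _{1 \le j \le n}\left[ \left( 1 - e^{-Y_{(j)}}\right) - \frac{j-1}{n}\right] . \end{aligned}$$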
This test rejects the null hypothesis for large values of \({ KS}_n\).
2.3.2 Cramér–von Mises (\({ CM}_n\))
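The Cramér–von Mises test statistic for testing exponentiality is given by
$$\begin{aligned} { CM}_n = n\int _{0}^{\infty }\left[ F_n(x) - (1 - e^{-x})\right] ^2 e^{-x}\,dx. \end{aligned}$$ (8)
The test statistic in (8) can be simplified to
$$\begin{aligned} { CM}_n = \frac{1}{12n} + \sum _{j=1}^{n}\left[ 1 - e^{-Y_{(j)}} - \frac{2j-1}{2n}\right] ^2. \end{aligned}$$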
Large values of \({ CM}_{n}\) will lead to the rejection of the null hypothesis.
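Both EDF statistics can be computed from the ordered scaled data in a single pass. The sketch below assumes the usual computational forms, \({ KS}_n = \max \{{ KS}_n^+, { KS}_n^-\}\) and \({ CM}_n = 1/(12n) + \sum _j [1 - e^{-Y_{(j)}} - (2j-1)/(2n)]^2\); the function name `ks_cm_statistics` is hypothetical.

```python
import math


def ks_cm_statistics(x):
    """Return (KS_n, CM_n) for testing exponentiality with estimated scale."""
    n = len(x)
    xbar = sum(x) / n
    y = sorted(xj / xbar for xj in x)       # ordered scaled values Y_(1) <= ... <= Y_(n)
    u = [1.0 - math.exp(-yj) for yj in y]   # standard exponential CDF at each Y_(j)
    # Index i = j - 1 runs from 0 to n - 1
    ks_plus = max((i + 1) / n - u[i] for i in range(n))
    ks_minus = max(u[i] - i / n for i in range(n))
    ks = max(ks_plus, ks_minus)
    cm = 1.0 / (12.0 * n) + sum((u[i] - (2 * i + 1) / (2.0 * n)) ** 2 for i in range(n))
    return ks, cm


# Illustrative data (hypothetical values)
ks, cm = ks_cm_statistics([0.8, 2.5, 0.3, 1.4])
```

Because \(\lambda \) is estimated, the classical distribution-free critical values for a fully specified null do not apply; critical values would normally be obtained by simulation under the standard exponential distribution.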
2.4 Tests based on mean residual life
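In reliability theory and survival analysis the mean residual life (MRL) of a non-negative random variable X at time t, defined as the expected value of the amount of life time remaining after time t, is expressed as
$$\begin{aligned} m(t) = E\left[ X - t \mid X > t\right] = \frac{1}{S(t)}\int _{t}^{\infty } S(x)\,dx, \end{aligned}$$
where \(S(t) = 1-F(t)\) is the survival function. It was shown in Shanbhag (1970) that the exponential distribution is characterised by a constant MRL, i.e., for the exponential distribution we have that
$$\begin{aligned} m(t) = E(X) = \frac{1}{\lambda }, \quad \text {for all } t > 0. \end{aligned}$$ (9)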
It can be shown that the characterisation in (9) is equivalent to
or
Tests based on the MRL (and the various forms of the characterising properties given in (9) to (11)) to test for exponentiality can be found in Baringhaus and Henze (2000), Jammalamadaka and Taufer (2006) and Taufer (2000). A generalisation of the test in Baringhaus and Henze (2000) which includes a more general weight function can be found in Baringhaus and Henze (2008). The two tests considered in this paper, namely the Jammalamadaka and Taufer test from Jammalamadaka and Taufer (2006) and the Baringhaus and Henze test from Baringhaus and Henze (2000), employ the characterisations in (9) and (10), respectively. The test proposed by Taufer (2000), however, makes use of the characterisation in (11). This test is not considered in this study.
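To make the constant-MRL characterisation concrete, the sketch below computes a simple empirical analogue of \(E[X - t \mid X > t]\). This helper, `empirical_mrl`, is hypothetical and is not one of the test statistics discussed in this paper; it only illustrates the quantity on which the MRL-based tests are built.

```python
def empirical_mrl(x, t):
    """Average remaining lifetime among observations exceeding t (None if there are none)."""
    residuals = [xj - t for xj in x if xj > t]
    if not residuals:
        return None  # no observation survives past t
    return sum(residuals) / len(residuals)


# Illustrative data (hypothetical values): evaluate the empirical MRL on a small grid
data = [1.0, 2.0, 3.0, 4.0]
mrl_curve = [(t, empirical_mrl(data, t)) for t in (0.0, 1.0, 2.0, 3.0)]
```

For exponential data the resulting curve should fluctuate around the sample mean at every t, whereas a systematic upward or downward trend suggests a decreasing or increasing hazard rate, respectively.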
2.4.1 Baringhaus and Henze (2000) (\(\overline{{ KS}}_n\) and \(\overline{{ CM}}_n\))
In Baringhaus and Henze (2000), Kolmogorov–Smirnov and Cramér–von Mises type tests based on the MRL are introduced. The test statistic of the Kolmogorov–Smirnov version of the test is given by
where
The Cramér–von Mises type test statistic is:
The null hypothesis is rejected for large values of \(\overline{{ KS}}_{n}\) and \(\overline{{ CM}}_{n}\). The asymptotic null distributions of \(\overline{{ KS}}_{n} \) and \(\overline{{ CM}}_{n}\) are identical to the asymptotic null distributions of \({ KS}_n\) and \({ CM}_n\) when used to test for a standard uniform distribution. Baringhaus and Henze (2000) showed that these two tests are consistent against each fixed alternative distribution with positive mean.
2.4.2 Jammalamadaka and Taufer (2006) (\(J_{n,\gamma }\))
In Jammalamadaka and Taufer (2006), a test based on the characterization in (9) is developed by first defining what they call the ‘sample MRL after \(X_{(k)}\)’ as follows:
Under exponentiality it follows that
Using (12), a Kolmogorov–Smirnov type statistic is proposed in Jammalamadaka and Taufer (2006) as a possible test for exponentiality:
Unfortunately, it was shown that this version of the test statistic does not converge to zero even under the null hypothesis of exponentiality. To overcome this problem and some other issues plaguing the statistic \(J'_n\), Jammalamadaka and Taufer (2006) constructs a trimmed test statistic whereby some of the last residual means are removed from the calculation. The resulting test statistic has the form
where \(\left\lfloor x\right\rfloor = floor(x)\) and \(\gamma \) is the trimming parameter which indicates how many of the last residual means are discarded. This test rejects the null hypothesis for large values of \(J_{n, \gamma }\).
In Jammalamadaka and Taufer (2006), the authors derive the asymptotic null distribution of \(J_{n,\gamma }\) and also prove that the test is consistent for every fixed non-exponential alternative distribution with finite mean. In addition, it is shown that the powers of the test are highly sensitive to the choice of \(\gamma \), but that a compromise choice of \(\gamma =0.9\) (i.e., when a large proportion of the last mean residuals are trimmed) produces the highest powers for the majority of the alternatives considered.
2.5 Tests based on entropy
For a non-negative continuous random variable X with density function f(x), the entropy (sometimes referred to as the differential entropy) is given by
$$\begin{aligned} { DE}(X) = -\int _{0}^{\infty } f(x)\log f(x)\,dx. \end{aligned}$$ (14)
Initial attempts (see, for example, Grzegorzewski and Wieczorkowski 1999; Ebrahimi et al. 1992) to construct tests for exponentiality based on the entropy exploited the characterisation that, among all distributions with support \([0,\infty )\) and fixed mean, the quantity DE(X) is maximised if X follows an exponential distribution. However, these tests are not explored further in this paper; instead we focus on two more recent tests based on the cumulative residual entropy (CRE). The CRE, introduced in Rao et al. (2004), is an alternative information measure which replaces the density function in (14) with the survival function, and is defined as
$$\begin{aligned} { CRE}(X) = -\int _{0}^{\infty } S(x)\log S(x)\,dx, \end{aligned}$$
where \(S(x) = 1 - F(x)\) is the survival function.
2.5.1 Zardasht et al. (2015) (\({ ZP}_n\))
The first test for exponentiality based on the CRE information measure considered is found in Zardasht et al. (2015). Let X and Z be non-negative random variables with distribution functions F and G, respectively. The test is based on the CRE of the so-called comparison distribution function, \(D(u) = F(G^{-1}(u))\) (Parzen 1998). Calculating the CRE of a random variable with distribution function D(u) and simplifying, the following expression is obtained:
$$\begin{aligned} \mathcal {C}(X,Z) = -\int _{0}^{\infty } \{1-F(x)\}\log \{1-F(x)\}\,dG(x). \end{aligned}$$ (15)
If X is replaced by a random variable W that is exponentially distributed with parameter \(\lambda > 0\), then (15) can be expressed as
$$\begin{aligned} \mathcal {C}(W,Z) = E\left[ \lambda Z\exp (-\lambda Z)\right] , \end{aligned}$$
which is a measure used to compare the distribution function of Z to that of the exponential distribution. If Z is also exponentially distributed, then it easily follows that \(\mathcal {C}(W,Z) = \frac{1}{4}\). The authors of Zardasht et al. (2015) based their test statistic on the difference between an estimator for \(\mathcal {C}(W,Z)\) and \(\frac{1}{4}\). The resulting test statistic is thus
$$\begin{aligned} { ZP}_n = \frac{1}{n}\sum _{j=1}^{n} Y_j\exp (-Y_j) - \frac{1}{4}. \end{aligned}$$
This test rejects exponentiality for both small and large values of \({ ZP}_n\). Zardasht et al. (2015) go on to show that \(\sqrt{n}{} { ZP}_n \mathop {\rightarrow }\limits ^{\mathcal {D}} N(0,5/432)\), but did not formally prove the consistency of the test.
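The following sketch assumes the statistic takes the form \({ ZP}_n = n^{-1}\sum _j Y_j e^{-Y_j} - 1/4\) (the function name `zp_statistic` is hypothetical). It also recomputes, in exact arithmetic, the variance of \(Ye^{-Y}\) for a standard exponential Y, using \(E[Y^2e^{-2Y}] = 2/27\) and \(E[Ye^{-Y}] = 1/4\).

```python
import math
from fractions import Fraction


def zp_statistic(x):
    """ZP_n = mean(Y_j * exp(-Y_j)) - 1/4, computed from the scaled data."""
    n = len(x)
    xbar = sum(x) / n
    y = [xj / xbar for xj in x]  # scaled values Y_j
    return sum(yj * math.exp(-yj) for yj in y) / n - 0.25


# Exact moments for a standard exponential Y:
#   E[Y^2 exp(-2Y)] = integral of y^2 exp(-3y) dy = 2/27
#   E[Y exp(-Y)]    = integral of y exp(-2y) dy   = 1/4
limit_variance = Fraction(2, 27) - Fraction(1, 4) ** 2

# Illustrative data (hypothetical values)
zp = zp_statistic([0.8, 2.5, 0.3, 1.4])
```

The exact calculation gives 5/432. Estimating \(\lambda \) does not alter this value under the assumed form of the statistic, because the first-order correction is proportional to \(E[Y(1-Y)e^{-Y}] = 0\).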
2.5.2 Baratpour and Habibi Rad (2012) (\({ BR}_n\))
The next test considered is based on the cumulative Kullback–Leibler (CKL) divergence (and indirectly on the CRE) introduced in Baratpour and Habibi Rad (2012). If \(W_1\) and \(W_2\) are two non-negative continuous random variables with distribution functions H and G, respectively, then the CKL divergence between these two distributions is defined as
Note that the CKL divergence is somewhat similar to the classical Kullback–Leibler divergence, with the density functions replaced by survival functions.
The authors make use of the fact that, if the null hypothesis is true, then \({ CKL}(F,F_0) = 0\). Rewriting the CKL measure in terms of the CRE measure, and plugging in the necessary estimates, they arrive at the following test statistic
The asymptotic distribution under the null hypothesis is not derived in Baratpour and Habibi Rad (2012); however, it is shown that the test is consistent.
This test rejects \(H_{0}\) for large values of \({ BR}_n\).
2.6 Tests based on normalised spacings
It has been shown (see, for example, Jammalamadaka and Goria 2004) that transforming the data can increase the power of tests for exponentiality against certain alternatives. A widely used transformation is to convert the data to the so-called normalised spacings, defined as
$$\begin{aligned} D_j = (n-j+1)\left( X_{(j)} - X_{(j-1)}\right) , \quad j = 1,\ldots ,n, \end{aligned}$$
with \(X_{(0)} = 0\). To find tests for exponentiality that use normalised spacings, the reader is referred to Epstein (1960), Jammalamadaka and Taufer (2003) and Jammalamadaka and Goria (2004), and for a test where these spacings are used to test for exponentiality in the presence of type-II censoring, see Balakrishnan et al. (2002). We consider two other tests based on spacings; one found in Gail and Gastwirth (1978) and a modification of a test in Gnedenko et al. (1969) which is found in Harris (1976).
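The transformation to normalised spacings can be sketched as follows, assuming the standard definition \(D_j = (n-j+1)(X_{(j)} - X_{(j-1)})\) with the convention \(X_{(0)} = 0\). The function name `normalised_spacings` is hypothetical.

```python
def normalised_spacings(x):
    """Return D_j = (n - j + 1) * (X_(j) - X_(j-1)) for j = 1..n, taking X_(0) = 0."""
    xs = sorted(x)   # order statistics X_(1) <= ... <= X_(n)
    n = len(xs)
    spacings = []
    previous = 0.0   # X_(0) = 0 by convention (assumed here)
    for j, xj in enumerate(xs, start=1):
        spacings.append((n - j + 1) * (xj - previous))
        previous = xj
    return spacings


# Illustrative data (hypothetical values)
d = normalised_spacings([3.0, 1.0, 2.0])
```

A quick check on any implementation is that the spacings sum to the same total as the original observations; under exponentiality the \(D_j\) are again independent exponential variables with the same parameter.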
2.6.1 Gini test (\(G_n\))
A test statistic that employs normalised spacings for testing exponentiality is described in D’Agostino and Stephens (1986) and is given by:
where
and follows a standard uniform distribution under \(H_0\).
This test rejects \(H_0\) for both small and large values of \({ DS}_n\).
An additional test based on the so-called Gini index, proposed in Gail and Gastwirth (1978), makes use of the following test statistic
It is easy to see that the following relationship holds between the test statistics in (16) and (17):
Similar to \({ DS}_n\), this test rejects the null hypothesis for both small and large values.
Unfortunately, both of these tests have been shown not to be universally consistent.
2.6.2 Harris’ modification of Gnedenko’s F-test (\({ HM}_{n,r}\))
In Gnedenko et al. (1969) a test is proposed for exponentiality involving ordering a sample of size n and then splitting the n elements into two groups; the first containing the r smallest elements and the second containing the remaining \(n-r\) elements. The test statistic, given by
$$\begin{aligned} { GD}_{n,r} = \frac{\sum _{j=1}^{r} D_j / r}{\sum _{j=r+1}^{n} D_j / (n-r)}, \end{aligned}$$ (18)
follows an F distribution with 2r and \(2(n-r)\) degrees of freedom under \(H_0\).
A modification of the test in (18) was introduced in Harris (1976). This modification can be used to accommodate testing for exponentiality in the presence of hypercensoring and is referred to as Harris’ modification of Gnedenko’s F-test. For this test, the sample spacings are split into three groups: The first group contains the first r spacings, the last group contains the last r spacings, and the remaining \(n-2r\) spacings form the second group. The test is based on the elements in the second group and the test statistic is given by
$$\begin{aligned} { HM}_{n,r} = \frac{\left( \sum _{j=1}^{r} D_j + \sum _{j=n-r+1}^{n} D_j\right) / (2r)}{\sum _{j=r+1}^{n-r} D_j / (n-2r)}. \end{aligned}$$
In Harris (1976), it is recommended that r is chosen to be equal to n/4, and this is also the value of r used in the simulation study presented in Sect. 4.
The null hypothesis is rejected for small and large values of both \({ GD}_{n,r}\) and \({ HM}_{n,r}\).
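The two ratio statistics can be sketched as below. The forms used are assumptions: \({ GD}_{n,r}\) is taken as the ratio of the mean of the first r normalised spacings to the mean of the remaining \(n-r\), and \({ HM}_{n,r}\) as the ratio of the mean of the first and last r spacings to the mean of the middle \(n-2r\). The function names are hypothetical.

```python
def normalised_spacings(x):
    """D_j = (n - j + 1) * (X_(j) - X_(j-1)), with X_(0) = 0 (assumed convention)."""
    xs = sorted(x)
    n = len(xs)
    previous = 0.0
    spacings = []
    for j, xj in enumerate(xs, start=1):
        spacings.append((n - j + 1) * (xj - previous))
        previous = xj
    return spacings


def gnedenko_statistic(x, r):
    """Mean of the first r spacings divided by the mean of the remaining n - r."""
    d = normalised_spacings(x)
    n = len(d)
    return (sum(d[:r]) / r) / (sum(d[r:]) / (n - r))


def harris_statistic(x, r):
    """Mean of the first and last r spacings divided by the mean of the middle n - 2r."""
    d = normalised_spacings(x)
    n = len(d)
    outer = sum(d[:r]) + sum(d[n - r:])
    middle = sum(d[r:n - r])
    return (outer / (2 * r)) / (middle / (n - 2 * r))


# Illustrative data chosen so that every normalised spacing equals 12
sample = [3.0, 7.0, 13.0, 25.0]
gd = gnedenko_statistic(sample, r=1)
hm = harris_statistic(sample, r=1)
```

With n = 4 the recommended choice r = n/4 gives r = 1; for general n, r would need to be rounded to an integer, and the rounding rule is a choice left open here.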
2.7 A test based on a score function
The score function, defined as the gradient of the log likelihood function, is a powerful tool that can be used to test statistical hypotheses. We consider one test, developed in Cox and Oakes (1984), that employs this score function to test for exponentiality.
2.7.1 Cox and Oakes (1984) (\({ CO}_n\))
A score test is introduced in Cox and Oakes (1984) that, when applied to censored data, has the following form
where \(d \le n\) is the number of uncensored data points. However, when \(d=n\) (i.e., in the uncensored case) and one uses the scaled data \(Y_1, \ldots , Y_n\), the statistic becomes
$$\begin{aligned} { CO}_n = n + \sum _{j=1}^{n}(1 - Y_j)\log (Y_j). \end{aligned}$$
The test rejects \(H_{0}\) for both large and small values of \({ CO}_{n}\) and it is shown using finite sample simulation studies in both Ascher (1990) and Henze and Meintanis (2005) that the test is quite powerful against a wide variety of non-exponential alternatives.
It follows that \(\sqrt{6/n} ({ CO}_n/\pi )\) has a standard normal asymptotic null distribution and the test is consistent against alternative distributions with \(E(X) < \infty \) and \(E(X\ln X - \ln X) \ne 1\), as discussed in, for example, Henze and Meintanis (2002).
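A minimal sketch of the uncensored score statistic and its standardised version follows, assuming the form \({ CO}_n = n + \sum _j (1 - Y_j)\log Y_j\). The function names are hypothetical.

```python
import math


def cox_oakes(x):
    """Score statistic CO_n = n + sum((1 - Y_j) * log(Y_j)) for uncensored, scaled data."""
    n = len(x)
    xbar = sum(x) / n
    y = [xj / xbar for xj in x]  # scaled values Y_j (all observations must be positive)
    return n + sum((1.0 - yj) * math.log(yj) for yj in y)


def cox_oakes_standardised(x):
    """sqrt(6 / n) * CO_n / pi, which is approximately N(0, 1) under exponentiality."""
    n = len(x)
    return math.sqrt(6.0 / n) * cox_oakes(x) / math.pi


# Illustrative data (hypothetical values)
co = cox_oakes([0.8, 2.5, 0.3, 1.4])
z = cox_oakes_standardised([0.8, 2.5, 0.3, 1.4])
```

The standardised value can be compared with two-sided standard normal quantiles for large n; for small samples, simulated critical values are preferable.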
2.8 Tests based on other characterizations and properties
Over the years, a multitude of tests for exponentiality have been developed by utilising a number of interesting and varied characterisations and properties of the exponential distribution, but it would not be possible to address all of them in a single study. These tests utilise characterisations such as the memoryless property (see, for example, Ahmad and Alwasel 1999; Alwasel 2001; Angus 1982), the Arnold–Villasenor characterisation (see Jovanović et al. 2015), the Rossberg characterisation (Volkova 2010), and various other characterisations (see, for example, Abbasnejad et al. 2012; Noughabi and Arghami 2011a). Other tests for exponentiality, not included in this paper, include tests for exponentiality based on the analysis of variance (see Shapiro and Wilk 1972), tests based on order statistics (see Bartholomew 1957; Hahn and Shapiro 1967; Jackson 1967; Wong and Wong 1979), tests based on transformations to uniformity (see Hegazy and Green 1975; Seshadri et al. 1969), and tests based on maximum correlations (see Grané and Fortiana 2011), to name but a few. However, for the purposes of the simulation study conducted in this paper, we consider the following four tests: the Ahsanullah test (Volkova and Nikitin 2013), a test based on likelihood ratios (Noughabi 2015), a test based on transformed data (Noughabi and Arghami 2011b), and the Atkinson test (Mimoto and Zitikus 2008). The Ahsanullah test is chosen because no finite sample results for this test are available in Volkova and Nikitin (2013), whereas the remaining three are chosen because of their good power performance in finite sample studies found in the literature.
2.8.1 Tests based on Ahsanullah’s characterisation (\(AH^1_{n}\) and \(AH^2_{n}\))
Assume that the distribution F belongs to a class of distributions \(\mathcal {F}\) that are all strictly monotone and whose hazard rate function, f(x) / S(x), is either increasing or decreasing monotonically. Ahsanullah proved the following characterisation of the exponential distribution in Ahsanullah (1978): Let \(X_{1},X_{2},\ldots ,X_{n}\) be non-negative iid random variables with distribution function F. A necessary and sufficient condition for F to be exponential is that for some j and k, the statistics \((n-j)(X_{(j+1)} - X_{(j)})\) and \((n-k)(X_{(k+1)} - X_{(k)})\) are identically distributed for \(1\le {j}< k < n\).
In Volkova and Nikitin (2013), the following specific setting of this characterisation is considered: \(n=2\), \(j=0\) and \(k=1\). Under this setting, the characterisation takes the following form: Let X and Y be non-negative iid random variables from the class \(\mathcal {F}\). X is then exponentially distributed if, and only if, \(|X-Y|\) and \(2\min {\{X,Y\}}\) are identically distributed.
The test statistic suggested in Volkova and Nikitin (2013), derived from this characterization, is
where
If the null hypothesis is true, then \(H_{n}\) and \(G_{n}\) should be close to one another. The test therefore rejects \(H_{0}\) for small or large values of \(AH^1_{n}\). The authors showed that
and calculated local Bahadur efficiencies under common parametric alternatives. However, the finite sample performance of their test statistic was not investigated. In addition, we consider the more common Cramér–von Mises type distance where the squared difference between \(H_n\) and \(G_n\) is used; the corresponding statistic is denoted by
This new form of the test will reject \(H_0\) for large values of the test statistic.
2.8.2 A test based on likelihood ratios (\({ ZA}_{n}\))
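Consider the following two generic statistics,
$$\begin{aligned} Z = \int _{-\infty }^{\infty } Z(t)\,{ dw}(t) \end{aligned}$$ (19)
and
$$\begin{aligned} Z_{\max } = \sup _{t}\left\{ Z(t)w(t)\right\} , \end{aligned}$$ (20)
where Z(t), dw(t) and w(t) are appropriately chosen functions. It is easy to show (see, for example, Zhang 2002) that if one chooses \(Z(t) = X^2(t)\), where
$$\begin{aligned} X^2(t) = \frac{n\{F_n(t) - F_0(t)\}^2}{F_0(t)\{1 - F_0(t)\}} \end{aligned}$$
is the Pearson chi-squared statistic, then the statistics in equations (19) and (20) become the traditional Anderson–Darling, Cramér–von Mises, and Kolmogorov–Smirnov test statistics for specific choices of \({ dw}(t)\) and w(t), and where \(F_0(x)=1 - \exp (-\lambda x)\).
However, Zhang (2002) suggests using the likelihood ratio statistic \(G^2(t)\) instead of the \(X^2(t)\) statistic, where \(G^2(t)\) is defined as
$$\begin{aligned} G^2(t) = 2n\left[ F_n(t)\log \left\{ \frac{F_n(t)}{F_0(t)}\right\} + \{1 - F_n(t)\}\log \left\{ \frac{1 - F_n(t)}{1 - F_0(t)}\right\} \right] . \end{aligned}$$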
Choosing \(Z(t)=G^2(t)\), the authors obtain the following easy-to-calculate versions of the test statistics for certain choices of dw(t) and w(t):
- Setting \({ dw}(t) = F_n(t)^{-1}\{1- F_n(t)\}^{-1}{} { dF}_n(t)\) in (19), the following statistic is obtained:
$$\begin{aligned} { ZA}_n = -\sum _{j=1}^n \left( \frac{\log (1 - \exp (-Y_{(j)}))}{n - j + 0.5} - \frac{Y_{(j)}}{j - 0.5} \right) . \end{aligned}$$
- Setting \(dw(t) = F_0(t)^{-1}\{1- F_0(t)\}^{-1}dF_0(t)\) in (19), the following approximate statistic is obtained:
$$\begin{aligned} { ZC}_n = \sum _{j=1}^n \left( \log \left\{ \frac{(1 - \exp (-Y_{(j)}))^{-1} - 1}{ (n- 0.5)/(j - 0.75) - 1 } \right\} \right) ^2. \end{aligned}$$
- Setting \(w(t) = 1\) in (20), the following statistic is obtained:
$$\begin{aligned} { ZK}_n= & {} \max _{1 \le j \le n} \left( (j - 0.5)\log \left\{ \frac{j - 0.5}{n(1 - \exp (-Y_{(j)}))} \right\} \right. \\&\left. +\,(n - j + 0.5) \log \left\{ \frac{n - j + 0.5}{n(\exp (-Y_{(j)}))} \right\} \right) . \end{aligned}$$
All of these tests reject \(H_0\) for large values of the test statistics.
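The three closed-form statistics above translate directly into code. The following Python sketch is a minimal illustration, assuming that \(Y_{(1)} \le \cdots \le Y_{(n)}\) are the order statistics of the data scaled by the sample mean (so that \(\lambda \) is replaced by its estimate); the function name `zhang_statistics` is hypothetical and is not taken from any existing package.

```python
import numpy as np


def zhang_statistics(x):
    """Return (ZA_n, ZC_n, ZK_n) for a sample x of positive values.

    Assumption: Y_(j) are the ordered observations divided by the sample mean.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    y = np.sort(x / x.mean())          # scaled order statistics Y_(1) <= ... <= Y_(n)
    j = np.arange(1, n + 1)
    f0 = -np.expm1(-y)                 # F_0(Y_(j)) = 1 - exp(-Y_(j)), computed stably

    za = -np.sum(np.log(f0) / (n - j + 0.5) - y / (j - 0.5))
    zc = np.sum(np.log((1.0 / f0 - 1.0) / ((n - 0.5) / (j - 0.75) - 1.0)) ** 2)
    zk = np.max(
        (j - 0.5) * np.log((j - 0.5) / (n * f0))
        + (n - j + 0.5) * np.log((n - j + 0.5) / (n * np.exp(-y)))
    )
    return za, zc, zk


# Illustrative call on simulated (not real) data.
rng = np.random.default_rng(0)
print(zhang_statistics(rng.exponential(scale=2.0, size=30)))
```

Because the data are divided by the sample mean before the statistics are evaluated, all three values are unchanged when the sample is multiplied by a positive constant.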
The finite sample performance of these three new tests for testing the hypothesis of normality is investigated in Zhang (2002), where it is found that the \({ ZA}_n\) and \({ ZC}_n\) versions of these statistics perform well, even when compared to traditionally powerful tests for normality, such as the Shapiro–Wilk test. In Noughabi (2015) the finite sample performance of these tests is investigated when testing for exponentiality. The author concludes that, among these three tests, \({ ZA}_n\) performs best. As a result we include only \({ ZA}_n\) in our own Monte Carlo study. Note that while the finite sample performance of these tests was extensively studied in Noughabi (2015), the asymptotic null distribution and consistency of these tests were not discussed.
2.8.3 A test using transformed data (\(N{\!}A_{n}\))
The test proposed in Noughabi and Arghami (2011b) employs the rather simple idea that, for a uniform distribution, the quantity \(xf_U(x)\) will be equal to \(F_U(x)\), where \(x \in [0,1]\), \(f_U\) is the uniform density function and \(F_U\) is the uniform distribution function. Therefore, given data \(V_1, V_2, \ldots , V_n\), a test statistic proposed to test for uniformity is
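where \(\widehat{f}_h\) is the kernel density estimator defined as
$$\begin{aligned} \widehat{f}_h(x) = \frac{1}{nh}\sum _{j=1}^n \phi \left( \frac{x - V_j}{h}\right) , \end{aligned}$$
with \(\phi \) the standard normal density function and h the bandwidth chosen using Silverman’s normal rule of thumb, \(h=1.06sn^{-1/5}\) (see Silverman 1986), where s is the sample standard deviation.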
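As a rough illustration of the density estimate used in this test, the Python sketch below evaluates a Gaussian kernel density estimator with the bandwidth \(h=1.06sn^{-1/5}\) stated above. The function names are hypothetical, and taking s as the sample standard deviation with divisor \(n-1\) is an assumption.

```python
import numpy as np


def silverman_bandwidth(v):
    """Normal rule-of-thumb bandwidth h = 1.06 * s * n^(-1/5)."""
    v = np.asarray(v, dtype=float)
    return 1.06 * np.std(v, ddof=1) * v.size ** (-1.0 / 5.0)


def gaussian_kde_at(x, v, h=None):
    """Evaluate the Gaussian kernel density estimate built from v at the points x."""
    v = np.asarray(v, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    h = silverman_bandwidth(v) if h is None else h
    u = (x[:, None] - v[None, :]) / h                 # standardised distances
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return phi.sum(axis=1) / (v.size * h)


# Illustrative call on simulated uniform data.
rng = np.random.default_rng(1)
v = rng.uniform(size=50)
print(silverman_bandwidth(v), gaussian_kde_at([0.25, 0.5, 0.75], v))
```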
The test for exponentiality proceeds by exploiting the following characterisation of exponentiality (see Alzaid and Al-Osh 1992): For two independent random observations \(W_1\) and \(W_2\) from a distribution G, the random variable \( {W_1}/{(W_1 + W_2)} \) is uniformly distributed if, and only if, G is the exponential distribution.
Subsequently, given the order statistics \(X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}\), construct the transformed data set
Under the hypothesis of exponentiality, these newly transformed values will have a uniform distribution. The test statistic given in (21) can consequently be used to test deviations from exponentiality for these transformed data:
The test rejects the null hypothesis for large values of \({ NA}_n\).
In Noughabi and Arghami (2011b) the authors investigate the finite sample performance of their newly proposed test, but do not derive any asymptotic results.
Another test using transformed data can be found in Dhumal and Shirke (2014), but we will not discuss this test further in this paper.
2.8.4 The Atkinson test (\({ AT}_{n,\gamma }\))
In Lee et al. (1980) the authors propose tests for exponentiality based on the ratio
$$\begin{aligned} Q_F(\gamma ) = \frac{E[X^{\gamma }]}{(E[X])^{\gamma }}, \end{aligned}$$
for \(\gamma >0\), which is equal to \(\varGamma (1 + \gamma )\) if X is exponentially distributed.
However, an approach whereby the quantity \(Q_F(\gamma )\) is raised to the power \(1/\gamma \) to create the following ratio
$$\begin{aligned} R_F(\gamma ) = \frac{(E[X^{\gamma }])^{1/\gamma }}{E[X]} \end{aligned}$$
is adopted in Mimoto and Zitikus (2008). Naturally, if X is exponentially distributed, then \(R_F(\gamma )\) equals \(\varGamma (1 + \gamma )^{1/\gamma }\) for \(\gamma \ne 0\), and tends to \(\exp (-\epsilon )\) as \(\gamma \rightarrow 0\), where \(\epsilon = 0.577215\ldots \) is the Euler constant. The test statistic proposed in Mimoto and Zitikus (2008), called the Atkinson statistic, is based on the difference between an empirical estimator of \(R_F(\gamma )\) and \(\varGamma (1 + \gamma )^{1/\gamma }\), for \(\gamma \in (-1,1)\) with \(\gamma \ne 0\). The test statistic is given by
where
In the limit where \(\gamma \rightarrow 0\) the quantity \(R_F(\gamma )\) has the form
$$\begin{aligned} R_F(0) = \frac{\exp (E[\log X])}{E[X]}, \end{aligned}$$
the numerator of which is consistently estimated by the geometric mean \(G_n = \prod _{j=1}^n X_j^{1/n}\). Therefore, when \(\gamma =0\), the resulting test statistic, called the Moran statistic for exponentiality, has the form
see Moran (1951). For all choices of \(\gamma \), the test rejects the null hypothesis for large values of the test statistic.
Extensive Monte Carlo power studies are presented in Mimoto and Zitikus (2008) where it is found that values of \(\gamma \) close to 0 and close to 0.99 produce the highest power for most alternatives considered. For the purposes of this paper, a compromise choice of \(\gamma =0.01\) is selected. In addition, the authors of Mimoto and Zitikus (2008) establish the asymptotic null distribution and consistency of the test statistic \({ AT}_{n,\gamma }\).
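The null values quoted in this subsection can be checked numerically. The Python sketch below evaluates \(\varGamma (1 + \gamma )^{1/\gamma }\), confirms that it approaches \(\exp (-\epsilon )\) as \(\gamma \rightarrow 0\), and computes a plug-in estimate of the ratio from data. The function names are hypothetical, and replacing expectations by sample means is an assumption made for illustration rather than the exact estimator of Mimoto and Zitikus (2008).

```python
import math

import numpy as np

EULER_CONSTANT = 0.5772156649015329


def null_value(gamma):
    """Gamma(1 + gamma)^(1/gamma), with the limiting value exp(-epsilon) at gamma = 0."""
    if gamma == 0:
        return math.exp(-EULER_CONSTANT)
    return math.exp(math.lgamma(1.0 + gamma) / gamma)


def ratio_estimate(x, gamma):
    """Plug-in estimate of the ratio: sample moments replace expectations (assumption)."""
    x = np.asarray(x, dtype=float)
    if gamma == 0:
        # Geometric mean divided by the arithmetic mean.
        return math.exp(np.mean(np.log(x))) / x.mean()
    return np.mean(x ** gamma) ** (1.0 / gamma) / x.mean()


# Illustrative comparison on simulated exponential data.
rng = np.random.default_rng(2)
x = rng.exponential(scale=3.0, size=100_000)
for g in (0.0, 0.01, 0.5, 0.99):
    print(g, null_value(g), ratio_estimate(x, g))
```

For exponential data the two printed columns should be close for each \(\gamma \), whatever the scale of the data, since the ratio is scale invariant.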
3 A data-dependent choice of the tuning parameter
Many of the tests mentioned in Sect. 2 contain a tuning parameter \(\gamma \) typically appearing in a weight function (see for example the test statistics in (4), (6), and (13)). As stated in the introduction, authors typically approach the selection of this parameter by evaluating the power performance of their tests across a grid of values of the tuning parameter and then suggesting a compromise choice for the parameter by selecting a value that fares well for the majority of the alternatives considered. However, there is general agreement that a data-dependent choice of this parameter is required for practical implementation.
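Consider a generic test statistic \(T_{n,\gamma }\) containing a tuning parameter \(\gamma \), whose critical values, denoted by \(\widetilde{C}_{n,\gamma }(\alpha )\), can be obtained through Monte Carlo simulation. A possible data-dependent choice of the parameter \(\gamma \), proposed by Allison and Santana (2015), can be obtained by maximising the bootstrap power of the test as follows:
$$\begin{aligned} \widehat{\gamma } = \widehat{\gamma }\left( \mathbf {Y}_{n}\right) = \arg \max _{\gamma } P^{*}\left( T_{n,\gamma }\left( \mathbf {Y}_{n}^{*}\right) \ge \widetilde{C}_{n,\gamma }\left( \alpha \right) \right) , \end{aligned}$$(22)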
where \(\mathbf {Y}_{n}^{*}=(Y_{1}^{*},Y_{2}^{*},\ldots ,Y_{n}^{*})\) denotes a bootstrap sample taken with replacement from \(\mathbf {Y}_{n}\), and \(P^{*}\) is the law of \(\mathbf {Y} _{n}^{*}\) given \(\mathbf {Y}_{n}\). In Allison and Santana (2015) the following algorithm used to approximate the ideal bootstrap estimator \(\widehat{\gamma }\) is provided:
1. Fix a grid of \(\gamma \) values: \(\gamma \in \left\{ \gamma _{1},\gamma _{2},\ldots ,\gamma _{k}\right\} \).
2. Obtain a bootstrap sample \(\mathbf {Y}_{n}^{*}\) by sampling with replacement from \(\mathbf {Y}_{n}\).
3. Calculate \(T_{n,\gamma _{j}}\left( \mathbf {Y}_{n}^{*}\right) \), \( j=1,2,\ldots ,k\).
4. Repeat steps 2 and 3 a large number of times (say B times) and denote the resulting test statistics by \(T_{n,\gamma _{j},1}^{*},T_{n,\gamma _{j},2}^{*},\ldots ,T_{n,\gamma _{j},B}^{*}\), \(j=1,2,\ldots ,k\).
5. Calculate
$$\begin{aligned} \widehat{P}_{{ boot},\gamma _{j}}=\frac{1}{B}\sum _{b=1}^{B}{\text {I}}\left( T_{n,\gamma _{j},b}^{*}\ge \widetilde{C}_{n,\gamma _{j}}\left( \alpha \right) \right) ,\quad j=1,2,\ldots ,k. \end{aligned}$$
6. Calculate
$$\begin{aligned} \widehat{\gamma }_{B}=\widehat{\gamma }_{B}\left( \mathbf {Y}_{n}\right) =\arg \max _{\gamma \in \left\{ \gamma _{1},\gamma _{2},\ldots ,\gamma _{k}\right\} }\widehat{P }_{{ boot},\gamma }. \end{aligned}$$(23)
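The six steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used by Allison and Santana (2015): the statistic `atkinson_type_stat` is a hypothetical scale-invariant stand-in for a generic \(T_{n,\gamma }\), the function names are invented for this example, critical values are simulated from the standard exponential distribution, and ties in the bootstrap power are broken by taking the first grid value.

```python
import math

import numpy as np


def atkinson_type_stat(x, gamma):
    """Hypothetical stand-in for T_{n,gamma} (gamma != 0); large values reject."""
    r_n = np.mean(x ** gamma) ** (1.0 / gamma) / np.mean(x)
    target = math.exp(math.lgamma(1.0 + gamma) / gamma)
    return math.sqrt(len(x)) * abs(r_n - target)


def mc_critical_values(stat, n, grid, alpha=0.05, mc=2000, rng=None):
    """Monte Carlo (1 - alpha) quantiles of stat under exponentiality, one per gamma."""
    rng = np.random.default_rng() if rng is None else rng
    sims = np.empty((mc, len(grid)))
    for m in range(mc):
        x = rng.exponential(size=n)
        sims[m] = [stat(x, g) for g in grid]
    return np.quantile(sims, 1.0 - alpha, axis=0)


def choose_gamma(y, stat, grid, crit, B=250, rng=None):
    """Steps 1 to 6: return the grid value maximising the bootstrap power."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    rejections = np.zeros(len(grid))
    for _ in range(B):
        y_star = rng.choice(y, size=y.size, replace=True)      # step 2
        t_star = np.array([stat(y_star, g) for g in grid])     # step 3
        rejections += t_star >= crit                           # indicator in step 5
    p_boot = rejections / B                                    # step 5
    return grid[int(np.argmax(p_boot))], p_boot                # step 6


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    grid = [0.01, 0.25, 0.5, 0.75, 0.99]                       # step 1
    crit = mc_critical_values(atkinson_type_stat, 20, grid, rng=rng)
    y = rng.weibull(2.0, size=20)                              # simulated data
    print(choose_gamma(y, atkinson_type_stat, grid, crit, rng=rng))
```

The critical values depend only on n, \(\alpha \) and the grid, so they can be computed once and reused for every sample of the same size.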
The numerical results reported in Tables 2, 3, 4, 5, 6 and 7 in Sect. 4 relating to test statistics containing a tuning parameter are obtained using the estimated tuning parameter given in (23). The estimated powers obtained using the compromise choice of \(\gamma \) are reported in parentheses in these tables. The details related to the choice of the grid used for each test are discussed in the next section.
4 Monte Carlo methodology and results
In this section Monte Carlo simulations are used to evaluate the power of the various tests discussed in Sect. 2.
4.1 Simulation setting
Throughout the simulation study we use a significance level of 5% and the critical values of all tests are calculated based on 10 000 independent Monte Carlo replications. All calculations are done in R (R Core Team 2013).
Power estimates are calculated for sample sizes \(n \in \{ 10, 20, 30, 50, 75, 100\}\) using 5000 independent Monte Carlo replications for various alternative distributions. These alternative distributions, given in Table 1, are chosen since they are commonly employed alternatives to the exponential distribution, which has a constant hazard rate (CHR). The distributions considered include those with increasing hazard rates (IHR), decreasing hazard rates (DHR), as well as non-monotone hazard rates (NMHR).
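The simulation design described above can be sketched as follows for a single test without a tuning parameter. The Python code below is an illustration only: it uses the \({ ZA}_n\) statistic of Sect. 2.8.2 with data scaled by the sample mean (an assumption), the function names are hypothetical, and the Weibull alternative in the usage lines is an arbitrary example rather than an entry taken from Table 1.

```python
import numpy as np


def za_stat(x):
    """ZA_n for each row of x (rows are samples), using data scaled by the row mean."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n = x.shape[1]
    y = np.sort(x / x.mean(axis=1, keepdims=True), axis=1)
    j = np.arange(1, n + 1)
    log_f0 = np.log(-np.expm1(-y))                  # log(1 - exp(-Y_(j)))
    return -np.sum(log_f0 / (n - j + 0.5) - y / (j - 0.5), axis=1)


def critical_value(n, alpha=0.05, mc=10_000, rng=None):
    """Monte Carlo critical value: (1 - alpha) quantile of ZA_n under exponentiality."""
    rng = np.random.default_rng() if rng is None else rng
    return float(np.quantile(za_stat(rng.exponential(size=(mc, n))), 1.0 - alpha))


def estimated_power(sampler, n, crit, reps=5_000, rng=None):
    """Proportion of simulated samples from `sampler` for which the test rejects."""
    rng = np.random.default_rng() if rng is None else rng
    samples = sampler(rng, (reps, n))
    return float(np.mean(za_stat(samples) >= crit))


if __name__ == "__main__":
    rng = np.random.default_rng(2017)
    for n in (10, 20, 30, 50, 75, 100):
        crit = critical_value(n, rng=rng)
        power = estimated_power(lambda r, size: r.weibull(1.4, size=size), n, crit, rng=rng)
        print(n, round(100 * power))
```

Because the statistic is scale invariant, the critical values can be simulated from the standard exponential distribution without loss of generality.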
In order to determine the power of the six tests containing a tuning parameter (\({ BH}_{n,\gamma }\), \(L_{n,\gamma }\), \({ PW}^1_{n,\gamma }\), \({ PW}^2_{n,\gamma }\), \(J_{n,\gamma }\), \({ AT}_{n,\gamma }\)) when using the data-dependent choice of the parameter (discussed in Sect. 3), we first need to approximate the empirical powers of these tests for each value of \(\gamma \) in a sequence of \(\gamma \) values. The empirical power based on the data-dependent choice is then calculated as described in Allison and Santana (2015). In each case \(B=250\) bootstrap replications are used to evaluate the bootstrap power of the tests. The following grids of values of the parameter are used for the respective tests:
- For \({ BH}_{n,\gamma }\), \(L_{n,\gamma }\), \({ PW}^1_{n,\gamma }\), and \({ PW}^2_{n,\gamma }\) the grid of \(\gamma \) values is given by
$$\begin{aligned} \gamma \in \{0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, 5\}. \end{aligned}$$
- For \(J_{n,\gamma }\), the grid of \(\gamma \) values is
$$\begin{aligned} \gamma \in \{ 0.1, 0.3, 0.5, 0.7, 0.9\}. \end{aligned}$$
- The grid of \(\gamma \) values used for \({ AT}_{n,\gamma }\) is
$$\begin{aligned} \gamma \in \{-0.99, -0.75, -0.5, -0.25, -0.01, 0.01, 0.25, 0.5, 0.75, 0.99 \}. \end{aligned}$$
4.2 Simulation results
Tables 2, 3, 4, 5, 6 and 7 show the estimated powers of the various tests discussed in Sect. 2 for sample sizes \(n \in \{10,20,30,50,75,100\}\) against each of the alternative distributions given in Table 1. The entries in these tables are the percentages of 5000 independent Monte Carlo samples that resulted in the rejection of \(H_0\), rounded to the nearest integer. Note that, for the tests containing a tuning parameter, the primary entry is the approximate power for the test based on the data-dependent choice of the parameter, \(\widehat{\gamma }\), while the approximate power of the test based on the compromise choice appears in parentheses alongside it. To ease comparisons between the results, the highest power for each alternative distribution is highlighted.
The primary aim of this paper is to compare the power of these tests against a wide range of alternative distributions. Below we present some general conclusions relating to the reported estimated powers of the various tests. For the second part of the analysis of the results we consider only the tests containing tuning parameters. Here we compare the powers achieved by tests employing the data-dependent choice proposed in Allison and Santana (2015) with those associated with the compromise choice of the parameter.
The performance of the tests is greatly affected by the shape of the hazard rate of the alternative distribution considered. Consequently, we discuss the overall results, as well as the results categorised according to the shape of the hazard rate, classified as increasing, decreasing, or non-monotone.
4.3 Power comparisons
For the purposes of the comparison between the power of the various tests we use the data-dependent choice (and not the compromise choice) of the tuning parameter for the tests containing such a parameter.
Consider the performance of the tests in general against all alternatives. The powers of \({ HM}_{n}\) do not compare favourably to those of the other tests; this test reveals lower powers against the majority of the alternatives. For small samples, \(AH_{n}^{2}\), \({ BR}_{n}\) and \({ NA}_{n}\) also exhibit lower powers against the majority of the alternatives. The tests that generally perform well are \({ CO}_{n}\), \({ ZA}_{n}\), \({ AT}_{n,\widehat{\gamma }}\), \({ BH}_{n,\widehat{\gamma }}\) and \(L_{n,\widehat{\gamma }}\). The \({ CM}_{n}\) and \(\overline{{ CM}}_{n}\) also perform relatively well against the majority of the alternatives, especially for large samples.
We now consider the results pertaining to the alternatives with increasing hazard rates. Against these alternatives \({ HM}_{n}\), \({ KS}_{n}\), \(AH_{n}^{1}\), \( J_{n,\widehat{\gamma }}\), \(PW_{n,\widehat{\gamma }}^{1}\) and \(PW_{n,\widehat{\gamma }}^{2}\) exhibit lower powers for all sample sizes considered. \({ BR}_{n}\) has higher power in the case of small sample sizes, but its power relative to the other tests decreases with sample size. The opposite is true for \(L_{n,\widehat{\gamma } }\), which reveals a relative increase in power with sample size. The two tests based on mean residual life, \(\overline{{ KS}}_{n}\) and \(\overline{{ CM}}_{n}\), perform relatively well for all sample sizes. The Cramér–von Mises type statistic for Ahsanullah’s test, \(AH_{n}^{2}\), and \({ NA}_{n}\) also perform well, especially for small sample sizes. The following tests exhibit high powers in the case of large sample sizes: \(G_{n}\), \({ EP}_{n}\), \({ ZA}_{n}\) and \({ BH}_{n,\widehat{\gamma }}\).
We now turn our attention to the alternatives with decreasing hazard rates. \( { HM}_{n}\), \(AH_{n}^{2}\), \({ BR}_{n}\) and \({ NA}_{n}\) perform poorly for all sample sizes. In turn, the tests for which large powers are observed are \({ CO}_{n}\), \( { BH}_{n,\widehat{\gamma }}\) and \(L_{n,\widehat{\gamma }}\). Furthermore, \(\overline{{ CM}}_{n}\), \(G_{n}\), \({ EP}_{n}\) and \({ AT}_{n,\widehat{\gamma }}\) perform well, especially for large samples, while \(PW_{n,\widehat{\gamma }}^{2}\) provides higher relative powers in the case of small samples.
The results pertaining to the alternatives with non-monotone hazard rates are as follows. The tests generally demonstrating the lowest powers are \( { HM}_{n}\), \({ BR}_{n}\) and \({ NA}_{n}\). For small sample sizes \(AH_{n}^{2}\) performs poorly, while \(G_{n}\), and \({ EP}_{n}\) exhibit relatively low powers in the case of large samples. However, \({ ZA}_{n}\), \({ AT}_{n,\widehat{\gamma }}\), \({ BH}_{n, \widehat{\gamma }}\) and \(L_{n,\widehat{\gamma }}\) generally perform well for all sample sizes. The original probability weighted characteristic function test, \(PW_{n,\widehat{\gamma }}^{1}\), where the weights emphasise the centre of the distribution, does well in the case of larger samples. On the other hand, the alternative formulation of this test with the weight function allocating the majority of the weight to the tails of the distribution, \(PW_{n,\widehat{\gamma } }^{2}\), exhibits relatively high power, especially for small samples. The same is true for \({ CO}_{n}\).
In summary, the powers achieved by \({ HM}_{n}\) are generally substantially lower than those of the remaining tests. Other tests that do not generally achieve good results are \(AH_{n}^{2}\), \({ BR}_{n}\), and \({ NA}_{n}\). The tests that perform well are \({ BH}_{n,\widehat{\gamma }}\), \(L_{n,\widehat{\gamma }}\), \({ AT}_{n, \widehat{\gamma }}\) and \({ CO}_{n}\). The test that performs the best overall is \( { BH}_{n,\widehat{\gamma }}\), closely followed by \(L_{n,\widehat{\gamma }}\). Note that only one of the tests reported to perform relatively poorly contains a tuning parameter, while only one of the tests reported to achieve high powers does not contain such a parameter; \({ CO}_{n}\) performs the best among those tests that do not include a tuning parameter.
4.4 Comparisons based on the choice of the tuning parameter
Six of the goodness-of-fit test statistics considered contain tuning parameters. Below we compare the powers achieved by these tests using two different values of the tuning parameter. The first value is chosen data-dependently using the method detailed in Allison and Santana (2015), while the second is the compromise choice recommended in the relevant literature. As was the case above, the discussion below does not only refer to the overall performance of the tests; the performance of the tests against alternatives with increasing, decreasing and non-monotone hazard rates is also discussed separately.
We consider the overall results first. For smaller sample sizes there is little to choose between the powers obtained using \({ AT}_{n,\gamma }\) based on the choices of the tuning parameter. However, as the sample size increases, use of the data-dependent choice generally results in a slight increase in relative power. On the other hand, when using \(J_{n,\gamma }\) the choice between the tuning parameters is unimportant for large samples, but for smaller samples the data-dependent choice leads to slightly higher powers. For both \( { BH}_{n,\gamma }\) and \(L_{n,\gamma }\) the data-dependent choice leads to higher powers than the compromise choice. Interestingly, the compromise choice outperforms the data-dependent choice in the case of the original PWECF test, \(PW_{n,\gamma }^{1}\), by a small margin, while the data-dependent choice leads to vast improvements in the powers associated with \(PW_{n,\gamma }^{2}\) (giving more weight towards the tails of the distribution), especially for larger samples.
Next we consider alternative distributions with increasing hazard rates. In this case the use of either method for the choice of the tuning parameter leads to little difference in powers obtained using the \({ AT}_{n,\gamma } \), \({ BH}_{n,\gamma }\), \(PW_{n,\gamma }^{1}\) and \(L_{n,\gamma }\) tests. The performance of \(J_{n,\gamma }\) is slightly improved by using the compromise choice, while the performance of \(PW_{n,\gamma }^{2}\) is greatly improved when using the data-dependent choice of the tuning parameter.
Turning our attention to the alternative distributions with decreasing hazard rates, we see that the observed powers are not substantially affected by the choice of tuning parameter in the case of the following tests: \( { AT}_{n,\gamma }\), \({ BH}_{n,\gamma }\) and \(L_{n,\gamma }\). For both \( PW_{n,\gamma }^{1}\) and \(PW_{n,\gamma }^{2}\) the compromise choice of the tuning parameter outperforms the data-dependent choice. The power of \( J_{n,\gamma }\) is substantially improved when using the data-dependent choice, especially for small samples.
Finally, we consider the performance of the tests against alternatives with non-monotone hazard rates. When using \(PW_{n,\gamma }^{1}\) the powers can be increased by using the compromise choice, especially for small samples. However, substantial improvements in the power of \(PW_{n,\gamma }^{2}\) are realised when the data-dependent choice is used, especially in the case of larger samples. The powers of \({ BH}_{n,\gamma }\) and \(L_{n,\gamma }\) are higher when the data-dependent choice is used than is the case for the compromise choice. The performance of \({ AT}_{n,\gamma }\) is not substantially affected by the choice of the tuning parameter for small samples, but using the data-dependent choice leads to improved power in the case of larger samples. When using \(J_{n,\gamma }\) the data-dependent choice outperforms the compromise choice for small samples.
It is interesting to note that in the cases where the compromise choice of the tuning parameter outperforms the data-dependent choice, the difference in realised power is usually small. However, there are cases where the power associated with the data-dependent choice vastly outperforms the compromise choice. As an example, consider the power of \(PW_{n,\gamma }^{2}\) against samples of size 75 generated from a lognormal distribution with parameter 0.8. The power using the compromise choice is estimated to be 0%, while the power associated with the data-dependent choice is estimated to be 96%. Various other instances of this phenomenon can be observed in the reported powers.
To conclude this section, we provide a short illustration of how the choice of the tuning parameter affects the power of two of the tests considered in the study. For this purpose we consider the tests \(L_{n,\gamma }\) and \(J_{n,\gamma }\) for sample size \(n=20\). In order to more easily visualise the behaviour of the powers across the \(\gamma \) values, Figs. 1 and 2 present the powers obtained for tests \(L_{n,\gamma }\) and \(J_{n,\gamma }\), respectively, for each choice of \(\gamma \) in the grid of selected \(\gamma \) values. The powers are calculated for five different alternative distributions. For each test, the compromise choice of the tuning parameter is indicated by a vertical dashed line in the relevant figure.
It is clear from the figures that the power of the tests is highly dependent on the choice of \(\gamma \). The compromise choice performs moderately well in many of the alternatives, but in some cases it produces low powers relative to other choices of \(\gamma \) (see, e.g., \(L_{n,\gamma }\) for alternative PW(2) and \(J_{n,\gamma }\) for alternatives LN(1.5) and PW(1)). Furthermore, the main entries in Tables 8 and 9 correspond to the powers presented in the figures, whereas the values stated in parentheses in these tables denote the percentage of times (out of 5000 independent Monte Carlo simulations) that the data-dependent procedure selected the \(\gamma \) value that corresponds to the \(\gamma \) value given in the column heading. These tables are provided to show that the procedure for obtaining the data-dependent choice of the tuning parameter most frequently selects the value of \(\gamma \) that produces the highest power for a given alternative. Consider, for example, \(L_{n,\gamma }\) for the alternative PW(2), where the maximum power of 53% is obtained at \(\gamma =0.1\). The percentage of times that the procedure chose \(\gamma =0.1\) is 68%, and the power of the test based on the data-dependent choice is 43%. In contrast, the power associated with the compromise choice is only 21%.
5 Practical application
In this section we apply all of the tests considered in Sect. 2 to a real-world data set: the ‘Leukemia’ data set given in Table 10 (see Kotze and Johnson 1983, for a discussion of the original data set). These data display the survival times (days) of 43 patients diagnosed with a certain type of Leukemia.
Table 11 lists the names of the 20 different tests discussed in this paper along with the value of the test statistic calculated from these data, the p-value for testing the hypothesis of exponentiality, as well as the time (s) taken to compute the p-value and critical value for each test (based on \(MC=10000\) replications). Where applicable, the data-dependent choice of \(\gamma \) used is also displayed in the table. The number of bootstrap replications in the calculation of the data-dependent choice of the tuning parameter is set to \(B=1000\). The final column in the table indicates whether the test is available in the software package R (R Core Team 2013); these tests are primarily available in the package exptest (Novikov et al. 2013).
Only \(J_{n,0.9}\) and \({ BR}_n\) reject the null hypothesis of exponentiality at a significance level of \(\alpha =0.05\); none of the remaining tests do.
As shown in Table 11, none of the tests containing a tuning parameter is available in R. These tests are rather powerful, and therefore it might be a worthwhile avenue for future work to create an R package that includes these tests along with the procedure to obtain the tuning parameter data-dependently.
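A Monte Carlo p-value of the kind reported in Table 11 can be approximated along the following lines. The Python sketch below is an illustration under stated assumptions: `moran_type_stat` is a hypothetical scale-invariant statistic built from the ratio of the geometric to the arithmetic mean described in Sect. 2.8.4, the function names are invented for this example, and the data in the usage lines are simulated and are not the leukemia data of Table 10.

```python
import math

import numpy as np

EULER_CONSTANT = 0.5772156649015329


def moran_type_stat(x):
    """sqrt(n) * |G_n / mean - exp(-epsilon)| for each row of x (hypothetical statistic)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n = x.shape[1]
    ratio = np.exp(np.mean(np.log(x), axis=1)) / x.mean(axis=1)
    return math.sqrt(n) * np.abs(ratio - math.exp(-EULER_CONSTANT))


def mc_p_value(x, stat, mc=10_000, rng=None):
    """Proportion of statistics simulated under exponentiality that are >= the observed one."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    t_obs = stat(x)[0]
    t_null = stat(rng.exponential(size=(mc, x.size)))
    return float(np.mean(t_null >= t_obs))


if __name__ == "__main__":
    rng = np.random.default_rng(43)
    simulated = rng.exponential(scale=100.0, size=43)   # synthetic sample of size 43
    print(mc_p_value(simulated, moran_type_stat, rng=rng))
```

The same function applies to any scale-invariant statistic that rejects for large values; for a statistic with a tuning parameter, the value of \(\gamma \) would first have to be fixed or chosen data-dependently.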
6 Conclusions
In this paper we consider a large number of tests for exponentiality based on a wide variety of characteristics of this distribution. Below we briefly mention these characteristics as well as the tests associated with them.
The tests based on the characteristic function are the Epps and Pulley test (\({ EP}_{n}\)) as well as tests based on the probability weighted empirical characteristic function. We consider two forms of the latter test; the first uses the original test statistic proposed in Meintanis et al. (2014) (\(PW_{n,\gamma }^{1}\)). The weight function used in this test statistic assigns the majority of the weight to the centre of the distribution. The second formulation of the test statistic considered (\(PW_{n,\gamma }^{2}\)) gives more weight to the tails of the distribution.
The tests based on the empirical Laplace transform are those of Baringhaus and Henze (\({ BH}_{n,\gamma }\)) as well as Henze and Meintanis (\(L_{n,\gamma }\)).
Another characteristic of the exponential distribution that some of the tests are based on is the distribution function. The tests associated with this characteristic are the Kolmogorov–Smirnov (\({ KS}_{n}\)) and Cramér–von Mises (\({ CM}_{n}\)) tests.
Next we consider the tests based on the mean residual life of the data. We consider two test statistics based on mean residual life introduced in Baringhaus and Henze (2000): a Kolmogorov–Smirnov type test (\(\overline{{ KS}}_{n}\)) and a Cramér–von Mises type test (\(\overline{{ CM}}_{n}\)). The test of Jammalamadaka and Taufer (\(J_{n,\gamma }\)) is also based on this characteristic.
Another characteristic used to test for exponentiality is entropy. We consider two tests based on entropy; the test of Zardasht et al. (\({ ZP}_{n}\)) and that of Baratpour and Habibi Rad (\({ BR}_{n}\)).
Furthermore, we consider two tests based on the normalised spacings of the observed data. The first of these is the Gini test (\(G_{n}\)) and the second is Harris’ modification of Gnedenko’s F-test (\({ HM}_{n}\)).
The Cox and Oakes test (\({ CO}_{n}\)) is also included in the study. This test is based on a score function.
Various other characteristics are also used. We consider two tests based on Ahsanullah’s characterisation. The first (\(AH_{n}^{1}\)) uses the original test statistic proposed in Volkova and Nikitin (2013). The second test (\(AH_{n}^{2}\)) utilises a Cramér–von Mises type test statistic. Zhang’s test (\({ ZA}_{n}\)), based on likelihood ratios, is included in the study as well as the Noughabi and Arghami test (\({ NA}_{n}\)), which uses transformed data. Finally, the Atkinson test (\({ AT}_{n,\gamma }\)), based on the Atkinson statistic, is considered.
Based on the results of the Monte Carlo study conducted in this paper, we draw some brief conclusions regarding the powers of the tests considered. Generally, \({ HM}_{n}\) achieves powers substantially lower than the remaining tests. In addition, the \(AH_{n}^{2}\), \({ BR}_{n}\), and \({ NA}_{n}\) tests are also relatively poor performers in terms of power. However, tests that do perform well are \({ BH}_{n,\widehat{\gamma }}\), \(L_{n,\widehat{\gamma }}\), \({ AT}_{n, \widehat{\gamma }}\) and \({ CO}_{n}\). \({ BH}_{n,\widehat{\gamma }}\) has the best overall performance, closely followed by \(L_{n,\widehat{\gamma }}\). Note that only one of the tests reported to perform relatively poorly contains a tuning parameter, while only one of the tests reported to achieve high powers does not contain such a parameter; \({ CO}_{n}\) performs the best among those tests that do not include a tuning parameter.
In light of the results discussed above, we would advise using the data-dependent choice of the tuning parameter; this choice generally outperforms the compromise choice. It is important to note that the power associated with the data-dependent choice of the tuning parameter can conceivably be increased further by evaluating the powers over finer grids of tuning parameters than the grids used in the paper. Because of the large number of Monte Carlo replications required for the numerical results shown in the paper, finer grids would substantially increase the computational burden. However, in the case where the hypothesis of exponentiality is to be tested on a single dataset, the computational time required is substantially less.
References
Abbasnejad M, Arghami NR, Tavakoli M (2012) A goodness of fit test for exponentiality based on Lin-Won information. J Iran Stat Soc 11(2):191–202
Ahmad I, Alwasel I (1999) A goodness-of-fit test for exponentiality based on the memoryless property. J R Stat Soc Ser B 61(3):681–689
Ahsanullah M (1978) On a characterization of the exponential distribution by spacings. Ann Inst Stat Math Part A 30:163–166
Allison JS, Santana L (2015) On a data-dependent choice of the tuning parameter appearing in certain goodness-of-fit tests. J Stat Comput Simul 85(16):3276–3288
Alwasel I (2001) On goodness of fit testing for exponentiality using the memoryless property. J Nonparametr Stat 13:569–581
Alzaid AA, Al-Osh MA (1992) Characterisation of probability distributions based on the relation. Sankhya Ser B 53:188–190
Angus JE (1982) Goodness-of-fit tests for exponentiality based on a loss-of-memory type functional equation. J Stat Plan Inference 6:241–251
Ascher S (1990) A survey of tests for exponentiality. Commun Stat Theory Methods 19(5):1811–1825
Balakrishnan N, Ng HKT, Kannan N (2002) A test of exponentiality based on spacings for progressively Type-II censored data. In: Goodness-of-fit tests and model validity, chap 8. Birkhäuser, Boston, pp 89–111
Baratpour S, Habibi Rad A (2012) Testing goodness-of-fit for exponential distribution based on cumulative residual entropy. Commun Stat Theory Methods 41(8):1387–1396
Baringhaus L, Henze N (1991) A class of consistent tests for exponentiality based on the empirical Laplace transform. Ann Inst Stat Math 43(3):551–564
Baringhaus L, Henze N (2000) Tests of fit for exponentiality based on a characterization via the mean residual life function. Stat Pap 41:225–236
Baringhaus L, Henze N (2008) A new weighted integral goodness-of-fit statistic for exponentiality. Stat Probab Lett 78:1006–1016
Bartholomew DJ (1957) Testing for departure from the exponential distribution. Biometrika 44(1):253–257
Cox DR, Oakes D (1984) Analysis of survival data. Chapman and Hall, London
D’Agostino R, Stephens M (1986) Goodness-of-fit techniques. Marcel Dekker, New York
Dhumal B, Shirke D (2014) A modified test for testing exponentiality using transformed data. J Stat Plan Simul 84:397–403
Ebrahimi N, Habibullah M, Soofi ES (1992) Testing exponentiality based on Kullback–Leibler information. J R Stat Soc Ser B 54:739–748
Epps TW, Pulley LB (1986) A test of exponentiality vs. monotone-hazard alternatives derived from the empirical characteristic function. J R Stat Soc Ser B 48(2):206–213
Epstein B (1960) Tests for the validity of the assumption that the underlying distribution of life is exponential Part I. Technometrics 2(1):83–101
Gail MH, Gastwirth JL (1978) A scale-free goodness-of-fit test for exponentiality based on the Gini statistic. J R Stat Soc Ser B 40:350–357
Gnedenko BV, Belyayev YK, Solovyev AD (1969) Mathematical models of reliability theory. Academic Press, Cambridge
Grané A, Fortiana J (2011) A directional test of exponentiality based on maximum correlations. Metrika 73:255–274
Grzegorzewski P, Wieczorkowski R (1999) Entropy-based goodness-of-fit test for exponentiality. Commun Stat Theory Methods 28:1183–1202
Hahn GJ, Shapiro SS (1967) Statistical models in engineering. Wiley, New York
Harris CM (1976) A note on testing for exponentiality. Nav Res Logist Q 23:169–175
Haywood J, Khmaladze E (2008) On distribution-free goodness-of-fit testing of exponentiality. J Econom 143:5–18
Hegazy YAS, Green JR (1975) Some new goodness-of-fit tests using order statistics. J Appl Stat 24(3):299–308
Henze N (1993) A new flexible class of omnibus tests for exponentiality. Commun Stat Theory Methods 22:115–133
Henze N, Meintanis SG (2002) Tests of fit for exponentiality based on the empirical Laplace transform. Statistics 36(2):147–161
Henze N, Meintanis SG (2005) Recent and classical tests for exponentiality: a partial review with comparisons. Metrika 36:29–45
Jackson OAY (1967) An analysis of departure from the exponential distribution. J R Stat Soc Ser B 29(3):540–549
Jammalamadaka SR, Goria MN (2004) A test of goodness-of-fit based on Gini’s index of spacings. Stat Probab Lett 68(2):177–187
Jammalamadaka SR, Taufer E (2003) Testing exponentiality by comparing the empirical distribution function of the normalized spacings with that of the original data. J Nonparametr Stat 15(6):719–729
Jammalamadaka SR, Taufer E (2006) Use of mean residual life in testing departures from exponentiality. J Nonparametr Stat 18(3):277–292
Jovanović M, Milošević B, Nikitin YY, Obradović M, Volkova KY (2015) Tests of exponentiality based on Arnold–Villasenor characterization and their efficiencies. Comput Stat Data Anal 90:100–113
Klar B (2001) Goodness-of-fit tests for the exponential and the normal distribution based on the integrated distribution function. Ann Inst Stat Math 53(2):338–353
Kotz S, Johnson NL (1983) Encyclopedia of statistical sciences, vol 3. Wiley, New York
Lee SCS, Locke C, Spurrier JD (1980) On a class of tests of exponentiality. Technometrics 22:547–554
Lemonte AJ (2013) A new exponential-type distribution with constant, decreasing, increasing, upside-down bathtub and bathtub-shaped failure rate function. Comput Stat Data Anal 62:149–170
Meintanis SG, Swanepoel JWH, Allison JS (2014) The probability weighted characteristic function and goodness-of-fit testing. J Stat Plan Inference 146:122–132
Mimoto N, Zitikis R (2008) The Atkinson index, the Moran statistic, and testing exponentiality. J Jpn Stat Soc 38(2):187–205
Moran PAP (1951) The random division of an interval-Part II. J R Stat Soc Ser B 13(1):147–150
Noughabi HA (2015) Testing exponentiality based on the likelihood ratio and power comparison. Ann Data Sci 2(2):195–204
Noughabi HA, Arghami NR (2011a) Testing exponentiality based on characterizations of the exponential distribution. J Stat Comput Simul 81(11):1641–1651
Noughabi HA, Arghami NR (2011b) Testing exponentiality using transformed data. J Stat Comput Simul 81(4):511–516
Novikov A, Pusev R, Yakovlev M (2013) Exptest: tests for exponentiality. http://CRAN.R-project.org/package=exptest, R Package Version 1.2
Parzen E (1998) Statistical methods mining, two sample data analysis, comparison distributions, and quantile limit theorems. Asymptot Methods Probab Stat 1:611–617
R Core Team (2013) R: A language and environment for statistical computing. http://www.R-project.org/
Rao M, Chen Y, Vemuri BC, Wang F (2004) Cumulative residual entropy: a new measure of information. IEEE Trans Inf Theory 50(6):1220–1228
Seshadri V, Csorgo M, Stephens MA (1969) Tests for the exponential distribution using Kolmogorov-type statistics. J R Stat Soc Ser B 31:499–509
Shanbhag DN (1970) The characterizations for exponential and geometric distributions. J Am Stat Assoc 65(331):1256–1259
Shapiro SS, Wilk MB (1972) An analysis of variance test for the exponential distribution (complete samples). Technometrics 14(2):355–370
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London
Spurrier JD (1984) An overview of tests for exponentiality. Commun Stat Theory Methods 13:1635–1654
Taufer E (2000) A new test for exponentiality against omnibus alternatives. Stoch Model Appl 3:23–36
Volkova KY (2010) On asymptotic efficiency of exponentiality tests based on Rossberg’s characterization. J Math Sci 167(4):486–494
Volkova KY, Nikitin YY (2013) Exponentiality tests based on Ahsanullah’s characterization and their efficiency. J Math Sci 204(1):42–54
Wang B (2008) Goodness-of-fit test for the exponential distribution based on progressively type-II censored sample. J Stat Comput Simul 78(2):125–132
Wong PG, Wong SP (1979) An extremal quotient test for exponential distributions. Metrika 26(1):1–4
Zardasht V, Parsi S, Mousazadeh M (2015) On empirical cumulative residual entropy and a goodness-of-fit test for exponentiality. Stat Pap 56(3):677–688
Zhang J (2002) Powerful goodness-of-fit tests based on the likelihood ratio. J R Stat Soc Ser B 64(2):281–294
Acknowledgements
The first author thanks the National Research Foundation of South Africa for financial support. The authors would also like to thank the Referee and Associate Editor for their constructive and insightful comments that led to an improvement of the paper.
Cite this article
Allison, J.S., Santana, L., Smit, N. et al. An ‘apples to apples’ comparison of various tests for exponentiality. Comput Stat 32, 1241–1283 (2017). https://doi.org/10.1007/s00180-017-0733-3
DOI: https://doi.org/10.1007/s00180-017-0733-3