1 Introduction and motivation

The exponential distribution is a popular choice of model both in practice and in theoretical work. For this reason, a great deal of research has been dedicated to the large number of ways in which it can be uniquely characterised. This has ultimately led to a multitude of tests of the hypothesis that observed data are realised from the exponential distribution.

Several authors have written review papers on this topic, describing and comparing a number of tests; see, for example, Spurrier (1984), Ascher (1990) and Henze and Meintanis (2002). However, the most recent review paper on this topic was written more than 10 years ago by Henze and Meintanis (2005). Since then, a number of new tests have been proposed; see, for example, Jammalamadaka and Taufer (2006), Haywood and Khmaladze (2008), Mimoto and Zitikis (2008), Wang (2008), Volkova (2010), Grané and Fortiana (2011), Abbasnejad et al. (2012), Baratpour and Habibi Rad (2012), Volkova and Nikitin (2013), Meintanis et al. (2014) and Zardasht et al. (2015). Furthermore, many of the tests for exponentiality contain a tuning parameter, often appearing in a weight function. The fact that the powers of these tests are functions of the tuning parameter complicates comparisons between tests. In many papers the authors evaluate the power of the test over a grid of possible values of this parameter, but the problem with this approach is that the optimal choice of the tuning parameter is unknown in practice. In these papers the authors often provide a so-called ‘compromise’ choice; this is a choice of the tuning parameter that provides reasonably high power for the majority of the alternatives considered in their finite sample studies. Examples of papers that contain these compromise choices include Henze and Meintanis (2002, 2005) and Meintanis et al. (2014). However, while these fixed choices of the parameter are able to produce high powers against a number of alternatives, they can also produce abysmally low powers against other alternatives. Naturally, in practice, the distribution of the realised data is unknown, meaning that the power of tests employing the compromise choice might be suspect.

A method to choose the value of the tuning parameter data-dependently is proposed in Allison and Santana (2015). This approach removes the practical problem of choosing the tuning parameter and also allows one to directly compare the powers achieved by various goodness-of-fit tests.

The aim of this paper is to objectively compare the powers of various tests for exponentiality. Where applicable, the methodology detailed in Allison and Santana (2015) is used in order to choose the value of the tuning parameter data-dependently; this allows a fair ‘apples to apples’ comparison between the tests containing a tuning parameter and those without one.

The remainder of the paper is organised as follows: In Sect. 2 we introduce and provide details of the various tests for exponentiality that form part of the simulation study. The data-dependent choice of the tuning parameter is discussed in Sect. 3. Section 4 presents the results of an extensive Monte Carlo study of the empirical powers of the tests against numerous alternatives to the exponential distribution (a distinction is made between alternative distributions with increasing, decreasing, and non-monotone hazard rates). In Sect. 5 we apply all the tests to a real-world data set and the paper concludes in Sect. 6 with some final remarks.

2 Tests for exponentiality

Let \(X_{1},X_{2},\ldots ,X_{n}\) be a sequence of independent and identically distributed continuous realisations of a random variable X. Denote the exponential distribution with expectation \(1/\lambda \) by \(Exp\left( \lambda \right) \). The composite goodness-of-fit hypothesis to be tested is

$$\begin{aligned} H_{0}{:}\,\text {the distribution of }X\text { is }{} { Exp}\left( \lambda \right) , \end{aligned}$$

for some \(\lambda >0\), against general alternatives.

The majority of the test statistics that we consider are based on the scaled values \(Y_{j}=X_{j}\hat{\lambda }\), where \(\hat{\lambda }=1/\bar{X}_{n}\) with \( \bar{X}_{n}=\frac{1}{n}\sum _{j=1}^{n}X_{j}\). The use of scaled values is motivated by the invariance property of the exponential distribution with respect to scale transformations. Since X follows an exponential distribution if and only if cX is exponentially distributed for every \(c>0\), we would not expect a scale transformation to influence the conclusion drawn regarding the exponentiality of X. As a result, the test statistics depend on the data only through scaled versions of the original data, and the conclusions drawn regarding the exponentiality of \(X_{1},\ldots ,X_{n}\) and \( Y_{1},\ldots ,Y_{n}\) should be the same. In the remainder of the paper we denote the order statistics of \(X_{j}\) and \(Y_{j}\) by \(X_{\left( 1\right) }<X_{\left( 2\right) }<\cdots <X_{\left( n\right) }\) and \(Y_{\left( 1\right) }<Y_{\left( 2\right) }<\cdots <Y_{\left( n\right) }\), respectively.

In this section we provide short descriptions of the 20 tests for exponentiality that we compare to one another in the Monte Carlo study in Sect. 4. These tests are arranged according to the characteristics of the exponential distribution that the test is based on. These tests are chosen because they provide a diverse selection of established tests (tests that have been shown to perform well in terms of power) and newly developed tests, while simultaneously including tests that contain a tuning parameter as well as those that do not. In addition to the tests presented in this section, we also provide references to numerous other tests for exponentiality not included in this study.

2.1 Tests based on the empirical characteristic function

In recent years many goodness-of-fit tests have been developed which are based on the characteristic function (CF). Typically in these tests the CF of a random variable X, given by

$$\begin{aligned} \phi (t) = E\left[ e^{{ itX}}\right] , \end{aligned}$$

is estimated by the empirical characteristic function (ECF) of the data \(X_{1},\ldots ,X_{n}\), defined as

$$\begin{aligned} \phi _{n}(t) = \frac{1}{n} \sum _{j=1}^{n}{e^{{ itX}_{j}}}. \end{aligned}$$

Standard methods for testing that employ the ECF utilise the L2-type distance

$$\begin{aligned} \int _{-\infty }^{\infty } |\phi _n(t) - \phi (t)|^2 w_\gamma (t) { dt}, \end{aligned}$$

which incorporates the CF, the ECF and a parametric weight function \(w_\gamma (t)\), which usually satisfies the conditions \(\int _{-\infty }^\infty t^2 w_\gamma (t) dt <\infty \), \(w_\gamma (t) = w_\gamma (-t)\), and \(w_\gamma (t) \ge 0,~\forall ~t\), and depends on some tuning parameter \(\gamma \).

There has been considerable discussion in the literature on the choice of \(w_\gamma (t)\). Popular choices are \(w_\gamma (t)= e^{-\gamma |t|}\) or \(w_\gamma (t) = e^{-\gamma t^2}\). Both of these correspond to kernel-based choices with \(e^{-\gamma |t|}\) being a multiple of the standard Laplace density as kernel with bandwidth equal to \(1/\gamma \) and \(e^{-\gamma t^2}\) a multiple of the standard normal density as kernel with bandwidth equal to \(1/(\gamma \sqrt{2})\).

For various tests for exponentiality that incorporate the ECF, the interested reader is referred to Henze and Meintanis (2002) and Henze and Meintanis (2005) and the references therein. However, for the purposes of this paper we will only focus on the ‘Epps and Pulley’ test proposed in Epps and Pulley (1986) and a more recent test based on the concept of the probability weighted empirical characteristic function (PWECF) proposed in Meintanis et al. (2014).

2.1.1 Epps and Pulley (1986) test (\({ EP}_n\))

The test proposed in Epps and Pulley (1986) is based on the difference between the ECF, \(\phi _n(t)\), of \(X_1, X_2,\ldots ,X_n\) and the CF of the exponential distribution, \(\phi _0(t,\lambda )=\lambda /(\lambda - it)\). If the data are exponentially distributed with parameter \(\lambda \), then \(\phi _n(t)\) should be close to \(\phi _0(t,\lambda )\).

Estimating \(\lambda \) by \(\widehat{\lambda } = 1/\bar{X}_n\), the test is based on the idea that the quantity

$$\begin{aligned} \int _{-\infty }^{\infty }{\left( \phi _{n}(t) - \phi _{0}(t,1/\bar{X}_n)\right) w(t){ dt}}, \end{aligned}$$

should be small under the null hypothesis, where

$$\begin{aligned} w(t) = \frac{1}{2\pi (1 + i\bar{X}_nt)}. \end{aligned}$$

The normalised Epps and Pulley test statistic simplifies to

$$\begin{aligned} { EP}_{n}= & {} \sqrt{48n} \int _{0}^{\infty }{\left( \phi _{n}(t) - \frac{1}{1 - i\bar{X}_{n}t} \right) \frac{\bar{X}_{n}}{2\pi (1+i\bar{X}_{n}t)}{} { dt}}\\= & {} \sqrt{48n}\left[ \frac{1}{n}\sum _{j=1}^{n}{e^{-Y_{j}} - \frac{1}{2}}\right] . \end{aligned}$$

This test rejects \(H_{0}\) for large values of \(|{ EP}_{n}|\). The limiting null distribution of this test statistic was shown to be standard normal in Epps and Pulley (1986). Furthermore, the test was also shown to be consistent against absolutely continuous alternative distributions with monotone hazard rates, strictly positive supports and finite expected values. In a number of studies it has been shown that this test is reasonably powerful; see, for example, Henze and Meintanis (2005).
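Since \({ EP}_n\) has a simple closed form, it is easily computed in practice. The following Python sketch illustrates the calculation; the sample values are hypothetical and serve only as an example:

```python
import math

def ep_statistic(x):
    """Epps-Pulley statistic: EP_n = sqrt(48n) * (mean(exp(-Y_j)) - 1/2),
    where Y_j = X_j / X_bar are the scaled data."""
    n = len(x)
    xbar = sum(x) / n
    y = [xj / xbar for xj in x]
    return math.sqrt(48 * n) * (sum(math.exp(-yj) for yj in y) / n - 0.5)

# hypothetical sample; exponentiality is rejected for large |EP_n|
sample = [0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4]
print(ep_statistic(sample))
```

Note that, because the statistic depends on the data only through the scaled values \(Y_j\), it is invariant under rescaling of the sample.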

2.1.2 PWECF (\({ PW}^1_{n,\gamma }\) and \({ PW}^2_{n,\gamma }\))

There has been a lot of discussion regarding the form of the weight function when using goodness-of-fit tests based on the ECF and CF. Fortunately, Meintanis et al. (2014) provides a statistically meaningful way to choose the weight function. This choice reduces the problem to only choosing a tuning parameter \(\gamma \), typically still contained in the weight function. The probability weighted characteristic function (PWCF) is defined as

$$\begin{aligned} \chi (t;\gamma ) = E\left[ W(X;\gamma t)e^{{ itX}}\right] = \int _{-\infty }^{\infty }{W(x;\gamma t)e^{{ itx}}{} { dF}_\lambda (x)}, \end{aligned}$$

where the probability weight function is given by

$$\begin{aligned} W(x,\beta ) = \left[ F_\lambda (x)(1-F_\lambda (x))\right] ^{|\beta |}, \quad \beta \in \mathbb {R},\quad x\in \mathbb {R}, \end{aligned}$$
(1)

and where \(F_\lambda \) denotes the exponential distribution function with parameter \(\lambda \). Note that the weight function in (1) places more weight at the centre of the distribution than in the tails. The probability weighted empirical characteristic function (PWECF) is then defined as

$$\begin{aligned} \chi _{n}(t;\gamma ) = \frac{1}{n} \sum _{j=1}^{n}{\widehat{W}(X_{j};\gamma t)e^{{ itX}_{j}}},\quad t \in \mathbb {R}, \end{aligned}$$
(2)

where the estimated probability weight is given by

$$\begin{aligned} \widehat{W}(x;\beta ) = \left[ F_{\widehat{\lambda }}(x)(1-F_{\widehat{\lambda }}(x))\right] ^{|\beta |},\quad \beta \in \mathbb {R},\quad x \in \mathbb {R}, \end{aligned}$$

and where \(F_{\widehat{\lambda }}\) denotes the exponential distribution function with estimated parameter \(\widehat{\lambda }\).

Meintanis et al. (2014) employs these expressions and develops a test for exponentiality based on the L2-norm between \(\chi _{n}(t;\gamma )\) and \(\chi (t;\gamma )\). The resulting test statistic is given by

$$\begin{aligned} { PW}^{1}_{n,\gamma } = {n} \int _{-\infty }^{\infty }{|\chi _{n}(t;\gamma ) - \chi (t;\gamma )|^{2}{} { dt}}. \end{aligned}$$
(3)

Note that the weight function that plagues other tests based on the ECF no longer appears in the test statistic, since the weight function has been incorporated within the PWECF and PWCF functions themselves. In Meintanis et al. (2014), the limiting null distribution of the test statistic is derived and it is shown that this test is consistent for a very large class of alternative distributions. In a finite sample simulation study, the test was also found to be quite powerful against a variety of alternative distributions.

The test statistic in (3) can be simplified to

$$\begin{aligned} { PW}^{1}_{n,\gamma } =&-\frac{2}{n^{2}} \sum _{j=1}^{n} { \sum _{k=1}^{n} {\frac{\gamma \ln {\left[ \left( 1-Z_j\right) Z_j\left( 1-Z_k\right) Z_k\right] }}{(X_{j}-X_{k})^{2} + \gamma ^{2}\ln ^{2}{\left[ \left( 1-Z_j\right) Z_j\left( 1-Z_k\right) Z_k\right] }}}}\\&+ \frac{2}{n} \sum _{j=1}^{n}{\int _{0}^{1}{\frac{\gamma \ln {\left[ \left( 1-Z_j\right) Z_j\left( 1-u\right) u\right] }}{\left[ X_{j} + \ln {(1-u)}\right] ^{2} + \gamma ^{2}\ln ^{2}{\left[ \left( 1-Z_j\right) Z_j\left( 1-u\right) u\right] }}{} { du}}}, \end{aligned}$$

where \(Z_j = \exp (-Y_j)\). In the Monte Carlo simulation study presented in Meintanis et al. (2014) the power of this test was evaluated over a grid of possible choices of the tuning parameter \(\gamma \). However, for practical applications the authors suggest using \(\gamma =1\), because this choice fared well for the majority of the alternatives considered in their paper. We will henceforth refer to this type of recommended choice of the parameter as the compromise choice.

In Meintanis et al. (2014), the weight function is chosen to give more weight to the centre of the distribution. In this paper we also consider a weight function that places greater weight on the tails. This alternative choice for the weight function appearing in (2) is given by

$$\begin{aligned} \widetilde{W}(x;\beta ) = \left[ \frac{1}{4} - F_{\widehat{\lambda }}(x)(1-F_{\widehat{\lambda }}(x))\right] ^{|\beta |},\quad \beta \in \mathbb {R},\quad x \in \mathbb {R}, \end{aligned}$$

and the test statistic resulting from (3) when employing this weight function is denoted by \({ PW}^{2}_{n,\gamma }\). Based on some preliminary Monte Carlo studies, we recommend using \(\gamma =0.1\) as the compromise choice.

Both \({ PW}^{1}_{n,\gamma }\) and \({ PW}^{2}_{n,\gamma }\) reject for large values.

2.2 Tests based on the empirical Laplace transform

In general, the Laplace transform (LT) of a random variable X is defined as \(E\left[ e^{-tX}\right] \). For a standard exponential random variable, Y, the Laplace transform is given by

$$\begin{aligned} \psi (t) = E\left[ e^{-tY}\right] =\frac{1}{1 + t}. \end{aligned}$$

Employing the scaled data \( Y_{1},\ldots ,Y_{n}\), \(\psi (t)\) can be estimated by the empirical Laplace transform (ELT),

$$\begin{aligned} \psi _{n}(t) = \frac{1}{n} \sum _{j=1}^{n}{e^{-tY_{j}}}. \end{aligned}$$

We consider two test statistics based on the ELT, namely the ‘Baringhaus and Henze (1991)’ test and the ‘Henze and Meintanis (2002)’ test.

2.2.1 Baringhaus and Henze (1991) test (\({ BH}_{n,\gamma }\))

Baringhaus and Henze (1991) developed a test based on the following differential equation that characterises the exponential distribution: \((1 + t)\psi '(t) + \psi (t) = 0\), for all \(t \in \mathbb {R}\).

Their test makes use of the following weighted L2-norm

$$\begin{aligned} { BH}_{n,\gamma } = n\int _{0}^{\infty } \left[ (1 + t)\psi _{n}'(t) + \psi _{n}(t)\right] ^2 \exp (-\gamma t){ dt}, \end{aligned}$$
(4)

where \(\gamma > 0\) is a constant tuning parameter. It is easy to show that the statistic in (4) simplifies to

$$\begin{aligned} { BH}_{n,\gamma }&= \frac{1}{n} \sum _{j=1}^{n}\sum _{k=1}^{n}\left[ \frac{(1-Y_{j})(1-Y_{k})}{Y_{j}+Y_{k}+\gamma } - \frac{Y_{j}+Y_{k}}{(Y_{j}+Y_{k}+\gamma )^{2}} \right. \\&\quad \left. + \frac{2Y_{j}Y_{k}}{(Y_{j}+Y_{k}+\gamma )^{2}} + \frac{2Y_{j}Y_{k}}{(Y_{j}+Y_{k}+\gamma )^{3}}\right] . \end{aligned}$$

Baringhaus and Henze (1991) showed that the test statistic has a nondegenerate limiting null distribution and also that the test is consistent against a class of alternative distributions with strictly positive, finite mean. The compromise choice for \(\gamma \) suggested in Baringhaus and Henze (1991) is \(\gamma =1\). This test rejects exponentiality for large values of \({ BH}_{n,\gamma }\).
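The double-sum form of \({ BH}_{n,\gamma }\) above translates directly into code. A minimal Python sketch follows; the sample values are hypothetical:

```python
def bh_statistic(x, gamma=1.0):
    """Closed form of BH_{n,gamma}; gamma = 1 is the compromise choice
    suggested in Baringhaus and Henze (1991)."""
    n = len(x)
    xbar = sum(x) / n
    y = [xj / xbar for xj in x]
    total = 0.0
    for yj in y:
        for yk in y:
            s = yj + yk + gamma
            total += ((1 - yj) * (1 - yk) / s
                      - (yj + yk) / s**2
                      + 2 * yj * yk / s**2
                      + 2 * yj * yk / s**3)
    return total / n

print(bh_statistic([0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4]))
```

As a weighted L2-norm, the statistic is non-negative and scale invariant, which provides a simple numerical sanity check.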

2.2.2 Henze and Meintanis (2002) test (\(L_{n,\gamma }\))

The natural idea of creating a test for exponentiality by measuring the L2-distance between the ELT and the LT for the standard exponential distribution was first proposed in Henze (1993). The proposed test statistic has the following form:

$$\begin{aligned} H_{n,\gamma } = n\int _0^\infty \left( \psi _n(t) - \frac{1}{1 + t}\right) ^2 \exp (-\gamma t) { dt}. \end{aligned}$$
(5)

This test statistic should produce a value close to zero if the null hypothesis is true. However, the equation in (5) does not simplify to a simple closed-form expression and requires numerical integration. To overcome this issue Henze and Meintanis (2002) proposes the following form of the test statistic:

$$\begin{aligned} L_{n,\gamma }= n\int _{0}^{\infty }{\left[ \psi _{n}(t) - \frac{1}{1+t}\right] ^{2}(1+t)^{2}\exp (-\gamma t){ dt}}, \end{aligned}$$
(6)

where \(\gamma > 0\). The statistic in (6) simplifies to the following closed-form expression:

$$\begin{aligned} L_{n,\gamma } = \frac{1}{n} \sum _{j=1}^{n}{\sum _{k=1}^{n}{\left[ \frac{1+(Y_{j}+Y_{k}+\gamma +1)^{2}}{(Y_{j}+Y_{k}+\gamma )^{3}}\right] }} - 2\sum _{j=1}^{n}{\left[ \frac{1+Y_{j}+\gamma }{(Y_{j}+\gamma )^{2}}\right] } + \frac{n}{\gamma }. \end{aligned}$$

Two possible compromise choices for the parameter \(\gamma \) are suggested for practical applications in Henze and Meintanis (2002): \(\gamma =0.75\) and \(\gamma =1\). For the purposes of this paper, we will make use of \(\gamma =0.75\). This test rejects \(H_{0}\) for large values of \(L_{n,\gamma }\).
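The closed-form expression for \(L_{n,\gamma }\) is likewise straightforward to evaluate. A minimal Python sketch (hypothetical sample values):

```python
def l_statistic(x, gamma=0.75):
    """Closed form of L_{n,gamma} in (6); gamma = 0.75 is the compromise
    choice used in this paper."""
    n = len(x)
    xbar = sum(x) / n
    y = [xj / xbar for xj in x]
    term1 = sum((1 + (yj + yk + gamma + 1)**2) / (yj + yk + gamma)**3
                for yj in y for yk in y) / n
    term2 = 2 * sum((1 + yj + gamma) / (yj + gamma)**2 for yj in y)
    return term1 - term2 + n / gamma

print(l_statistic([0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4]))
```

Since (6) is a weighted L2-distance, the computed value should always be non-negative.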

2.3 Tests based on the empirical distribution function

The use of distance measures based on the empirical distribution function (EDF) is one of the earliest approaches to goodness-of-fit testing. The EDF based on the scaled data \(Y_1, \ldots , Y_n\) is defined as

$$\begin{aligned} F_n(x) = \frac{1}{n}\sum _{j=1}^n I(Y_j\le x), \end{aligned}$$

where \(I(\cdot )\) denotes the indicator function and \(x \in \mathbb {R}\). The tests considered measure the discrepancy between the standard exponential distribution function and the EDF. The most famous of these include the Kolmogorov–Smirnov and Cramér–von Mises tests (see, for example, D’Agostino and Stephens 1986), which are discussed below. Another test, based on the integrated EDF, can be found in Klar (2001), but is not discussed here.

2.3.1 Kolmogorov–Smirnov (\({ KS}_n\))

The Kolmogorov–Smirnov test statistic is given by:

$$\begin{aligned} { KS}_{n} = \sup _{x\ge 0} \left| F_n(x) - \left( 1-e^{-x}\right) \right| . \end{aligned}$$
(7)

The test statistic in (7) can be simplified to

$$\begin{aligned} { KS}_n = \max \left\{ { KS}_{n}^{+}, { KS}_{n}^{-}\right\} , \end{aligned}$$

where

$$\begin{aligned} { KS}_{n}^{+} =&\max _{1\le {j}\le {n}} \left[ \frac{j}{n} - \left( 1-e^{-Y_{(j)}}\right) \right] , \\ { KS}_{n}^{-} =&\max _{1\le {j}\le {n}} \left[ \left( 1-e^{-Y_{(j)}}\right) - \frac{j-1}{n}\right] . \end{aligned}$$

This test rejects the null hypothesis for large values of \({ KS}_n\).
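The closed form of \({ KS}_n\) over the scaled order statistics is computed as follows; the Python sketch below uses a hypothetical sample:

```python
import math

def ks_statistic(x):
    """KS_n via the closed form with KS+ and KS- over the scaled
    order statistics Y_(1) < ... < Y_(n)."""
    n = len(x)
    xbar = sum(x) / n
    y = sorted(xj / xbar for xj in x)
    f0 = [1 - math.exp(-yj) for yj in y]   # fitted exponential CDF at Y_(j)
    ks_plus = max((j + 1) / n - f0[j] for j in range(n))
    ks_minus = max(f0[j] - j / n for j in range(n))
    return max(ks_plus, ks_minus)

print(ks_statistic([0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4]))
```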

2.3.2 Cramér–von Mises (\({ CM}_n\))

The Cramér–von Mises test statistic for testing exponentiality is given by

$$\begin{aligned} { CM}_{n} = \int _{0}^{\infty } \left[ F_n(x) - \left( 1-e^{-x}\right) \right] ^2 e^{-x}{} { dx}. \end{aligned}$$
(8)

The test statistic in (8) can be simplified to

$$\begin{aligned} { CM}_{n} = \frac{1}{12n} + \sum _{j=1}^{n}\left[ \left( 1-e^{-Y_{(j)}}\right) - \frac{2j-1}{2n}\right] ^{2}. \end{aligned}$$

Large values of \({ CM}_{n}\) will lead to the rejection of the null hypothesis.
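The simplified form of \({ CM}_{n}\) can be coded in a few lines; a Python sketch with hypothetical sample values:

```python
import math

def cm_statistic(x):
    """CM_n = 1/(12n) + sum_j [ (1 - exp(-Y_(j))) - (2j - 1)/(2n) ]^2,
    computed over the scaled order statistics."""
    n = len(x)
    xbar = sum(x) / n
    y = sorted(xj / xbar for xj in x)
    return 1 / (12 * n) + sum(
        ((1 - math.exp(-y[j])) - (2 * (j + 1) - 1) / (2 * n))**2
        for j in range(n))

print(cm_statistic([0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4]))
```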

2.4 Tests based on mean residual life

In reliability theory and survival analysis the mean residual life (MRL) of a non-negative random variable X at time t, defined as the expected value of the amount of life time remaining after time t, is expressed as

$$\begin{aligned} m(t) = E\left[ X-t|X>t\right] = \frac{\int _{t}^{\infty }{S(x){ dx}}}{S(t)}, \end{aligned}$$

where \(S(t) = 1-F(t)\) is the survival function. It was shown in Shanbhag (1970) that the exponential distribution is characterised by a constant MRL, i.e., for the exponential distribution we have that

$$\begin{aligned} m(t) = E(X) = \frac{1}{\lambda }, \quad \forall t > 0. \end{aligned}$$
(9)

It can be shown that the characterisation in (9) is equivalent to

$$\begin{aligned} E\left( \min {\{X,t\}}\right) = \frac{F(t)}{\lambda }, \quad \forall t>0, \end{aligned}$$
(10)

or

$$\begin{aligned} \int _{t}^{\infty } S(x){ dx} = \frac{S(t)}{\lambda }, \quad \forall t > 0. \end{aligned}$$
(11)

Tests based on the MRL (and the various forms of the characterising properties given in (9) to (11)) to test for exponentiality can be found in Baringhaus and Henze (2000), Jammalamadaka and Taufer (2006) and Taufer (2000). A generalisation of the test in Baringhaus and Henze (2000) which includes a more general weight function can be found in Baringhaus and Henze (2008). The two tests considered in this paper, namely the Jammalamadaka and Taufer test from Jammalamadaka and Taufer (2006) and the Baringhaus and Henze test from Baringhaus and Henze (2000), employ the characterisations in (9) and (10), respectively. The test proposed by Taufer (2000), however, makes use of the characterisation in (11). This test is not considered in this study.

2.4.1 Baringhaus and Henze (2000) (\(\overline{{ KS}}_n\) and \(\overline{{ CM}}_n\))

In Baringhaus and Henze (2000), Kolmogorov–Smirnov and Cramér–von Mises type tests based on the MRL are introduced. The test statistic of the Kolmogorov–Smirnov version of the test is given by

$$\begin{aligned} \overline{{ KS}}_{n} = \sqrt{n} \sup _{t\ge {0}}\left| \frac{1}{n}\sum _{j=1}^{n}{\min \{Y_{j},t\}} - \frac{1}{n}\sum _{j=1}^{n}{I\left( Y_{j}\le {t}\right) }\right| = \sqrt{n} \max \left\{ \overline{{ KS}}_{n}^{+},\overline{{ KS}}_{n}^{-}\right\} , \end{aligned}$$

where

$$\begin{aligned} \overline{{ KS}}_{n}^{+}&= \max _{j \in \{0,1,\ldots ,n-1\}}{\left[ \frac{1}{n} \left( Y_{(1)}+\cdots +Y_{(j)}\right) + Y_{(j+1)} \left( 1-\frac{j}{n}\right) - \frac{j}{n}\right] }, \\ \overline{{ KS}}_{n}^{-}&= \max _{j \in \{0,1,\ldots ,n-1\}}{\left[ \frac{j}{n} - \frac{1}{n} \left( Y_{(1)}+\cdots +Y_{(j)}\right) - Y_{(j)} \left( 1-\frac{j}{n}\right) \right] }. \end{aligned}$$

The Cramér–von Mises type test statistic is:

$$\begin{aligned} \overline{{ CM}}_{n}&= n \int _{0}^{\infty }{\left[ \frac{1}{n} \sum _{j=1}^{n}{\min {\{Y_{j},t\}}} - \frac{1}{n} \sum _{j=1}^{n}{I\left( Y_{j}\le {t}\right) }\right] ^{2} e^{-t}dt} \\&= \frac{1}{n} \sum _{j=1}^{n}\sum _{k=1}^{n}\Big [2 - 3\exp {\left( -\min \{Y_{j},Y_{k}\}\right) } - 2\min \{Y_{j},Y_{k}\}\left( e^{-Y_{j}} + e^{-Y_{k}}\right) \\&\quad +\,2\exp {\left( -\max \{Y_{j},Y_{k}\}\right) }\Big ]. \end{aligned}$$

The null hypothesis is rejected for large values of \(\overline{{ KS}}_{n}\) and \(\overline{{ CM}}_{n}\). The asymptotic null distributions of \(\overline{{ KS}}_{n} \) and \(\overline{{ CM}}_{n}\) are identical to the asymptotic null distributions of \({ KS}_n\) and \({ CM}_n\) when used to test for a standard uniform distribution. Baringhaus and Henze (2000) showed that these two tests are consistent against each fixed alternative distribution with positive mean.
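Both MRL-based statistics can be computed directly from the formulas above. The Python sketch below does so for a hypothetical sample (with the convention \(Y_{(0)}=0\) in the \(j=0\) term of \(\overline{{ KS}}_{n}^{-}\)):

```python
import math

def ksbar_statistic(x):
    """KS-type MRL statistic: sqrt(n) * max(KS+, KS-) over j = 0,...,n-1."""
    n = len(x)
    xbar = sum(x) / n
    y = sorted(xj / xbar for xj in x)
    cum = [0.0]
    for yj in y:
        cum.append(cum[-1] + yj)               # cum[j] = Y_(1) + ... + Y_(j)
    ks_plus = max(cum[j] / n + y[j] * (1 - j / n) - j / n for j in range(n))
    ks_minus = max(j / n - cum[j] / n
                   - (y[j - 1] if j > 0 else 0.0) * (1 - j / n)
                   for j in range(n))
    return math.sqrt(n) * max(ks_plus, ks_minus)

def cmbar_statistic(x):
    """CM-type MRL statistic via the closed double-sum form."""
    n = len(x)
    xbar = sum(x) / n
    y = [xj / xbar for xj in x]
    total = 0.0
    for yj in y:
        for yk in y:
            lo, hi = min(yj, yk), max(yj, yk)
            total += (2 - 3 * math.exp(-lo)
                      - 2 * lo * (math.exp(-yj) + math.exp(-yk))
                      + 2 * math.exp(-hi))
    return total / n

sample = [0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4]
print(ksbar_statistic(sample), cmbar_statistic(sample))
```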

2.4.2 Jammalamadaka and Taufer (2006) (\(J_{n,\gamma }\))

In Jammalamadaka and Taufer (2006), a test based on the characterisation in (9) is developed by first defining what they call the ‘sample MRL after \(X_{(k)}\)’ as follows:

$$\begin{aligned} \bar{X}_{>k}&= \frac{1}{n-k+1} \sum _{j=k+1}^{n+1}{\left( X_{(j)}-X_{(k)}\right) } \\&= \frac{1}{n-k+1} \sum _{j=k+1}^{n+1}{(n-j+2)\left( X_{(j)}-X_{(j-1)}\right) }. \end{aligned}$$

Under exponentiality it follows that

$$\begin{aligned} E\left[ \bar{X}_{>k}\right] = E\left[ \bar{X}_{n}\right] = \frac{1}{\lambda }, \quad k=1, 2, \ldots , n. \end{aligned}$$
(12)

Using (12), a Kolmogorov–Smirnov type statistic is proposed in Jammalamadaka and Taufer (2006) as a possible test for exponentiality:

$$\begin{aligned} J'_{n} = \max _{1 \le k \le n } \frac{\left| \bar{X}_{n} - \bar{X}_{>k}\right| }{\bar{X}_{n}}. \end{aligned}$$

Unfortunately, it was shown that this version of the test statistic does not converge to zero even under the null hypothesis of exponentiality. To overcome this problem and some other issues plaguing the statistic \(J'_n\), Jammalamadaka and Taufer (2006) constructs a trimmed test statistic whereby some of the last residual means are removed from the calculation. The resulting test statistic has the form

$$\begin{aligned} J_{n, \gamma } = \max _{1 \le {k} \le {n - \left\lfloor n^{\gamma }\right\rfloor }} \frac{n^{\frac{\gamma }{2}}\left| \bar{X}_{n} - \bar{X}_{>k}\right| }{\bar{X}_{n}},\quad \gamma \in (0,1), \end{aligned}$$
(13)

where \(\left\lfloor x\right\rfloor \) denotes the integer part (floor) of x and \(\gamma \) is the trimming parameter which indicates how many of the last residual means are discarded. This test rejects the null hypothesis for large values of \(J_{n, \gamma }\).

In Jammalamadaka and Taufer (2006), the authors derive the asymptotic null distribution of \(J_{n,\gamma }\) and also prove that the test is consistent for every fixed non-exponential alternative distribution with finite mean. In addition, it is shown that the powers of the test are highly sensitive to the choice of \(\gamma \), but that a compromise choice of \(\gamma =0.9\) (i.e., when a large proportion of the last mean residuals are trimmed) produces the highest powers for the majority of the alternatives considered.
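For illustration, the trimmed statistic in (13) can be sketched as follows. Note that this sketch takes the sample MRL after \(X_{(k)}\) simply as the average of \(X_{(j)}-X_{(k)}\) over \(j>k\), which is a simplifying assumption rather than the paper's exact edge convention; the sample values are hypothetical:

```python
import math

def j_statistic(x, gamma=0.9):
    """Sketch of J_{n,gamma} in (13). The sample MRL after X_(k) is taken
    here as the average of X_(j) - X_(k) over j > k (a simplification;
    assumption, not the paper's exact definition)."""
    n = len(x)
    xs = sorted(x)
    xbar = sum(x) / n
    kmax = n - math.floor(n**gamma)       # trim the last floor(n^gamma) means
    best = 0.0
    for k in range(1, kmax + 1):
        mrl = sum(xs[j] - xs[k - 1] for j in range(k, n)) / (n - k)
        best = max(best, n**(gamma / 2) * abs(xbar - mrl) / xbar)
    return best

print(j_statistic([0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4]))
```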

2.5 Tests based on entropy

For a non-negative continuous random variable X with density function f(x), the entropy (sometimes referred to as the differential entropy) is given by

$$\begin{aligned} { DE}(X) = - \int _{0}^{\infty }{f(x)\ln {f(x)}{} { dx}.} \end{aligned}$$
(14)

Initial attempts (see, for example, Grzegorzewski and Wieczorkowski 1999; Ebrahimi et al. 1992) to construct tests for exponentiality based on the entropy exploited the characterisation that, among all distributions with support \([0,\infty )\) and fixed mean, the quantity DE(X) is maximised if X follows an exponential distribution. However, these tests are not explored further in this paper; instead, we focus on two more recent tests based on the cumulative residual entropy (CRE). The CRE, introduced in Rao et al. (2004), is an alternative information measure which replaces the density function in (14) with the survival function, and is defined as

$$\begin{aligned} { CRE}(X) = - \int _{0}^{\infty }{S(x) \ln {S(x)}{} { dx}}, \end{aligned}$$

where \(S(x) = 1 - F(x)\) is the survival function.

2.5.1 Zardasht et al. (2015) (\({ ZP}_n\))

The first test for exponentiality based on the CRE information measure considered is found in Zardasht et al. (2015). Let X and Z be non-negative random variables with distribution functions F and G, respectively. The test is based on the CRE of the so-called comparison distribution function, \(D(u) = F(G^{-1}(u))\) (Parzen 1998). Calculating the CRE of a random variable with distribution function D(u) and simplifying, one obtains

$$\begin{aligned} \mathcal {C}(X,Z) = - \int _{0}^{\infty }{S(x) \ln {S(x)}{} { dG}(x)}. \end{aligned}$$
(15)

If W is exponentially distributed with parameter \(\lambda > 0\), then (15) can be expressed as

$$\begin{aligned} \mathcal {C}(W,Z) = \int _{0}^{\infty }{x\lambda e^{-x\lambda }{} { dG}(x)}, \end{aligned}$$

which is a measure used to compare the distribution function of Z to that of the exponential distribution. If Z is also exponentially distributed with the same parameter \(\lambda \), then it easily follows that \(\mathcal {C}(W,Z) = \frac{1}{4}\). The authors of Zardasht et al. (2015) base their test statistic on the difference between an estimator for \(\mathcal {C}(W,Z)\) and \(\frac{1}{4}\). The resulting test statistic is thus

$$\begin{aligned} { ZP}_n = \frac{1}{n} \sum _{j=1}^{n}{Y_{j}e^{-Y_{j}}} - \frac{1}{4}. \end{aligned}$$

This test rejects exponentiality for both small and large values of \({ ZP}_n\). Zardasht et al. (2015) go on to show that \(\sqrt{n}{} { ZP}_n \mathop {\rightarrow }\limits ^{\mathcal {D}} N(0,5/382)\), but did not formally prove the consistency of the test.
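\({ ZP}_n\) is among the simplest statistics in this study to compute; a one-line Python sketch (hypothetical sample values):

```python
import math

def zp_statistic(x):
    """ZP_n = mean(Y_j * exp(-Y_j)) - 1/4, with Y_j the scaled data."""
    n = len(x)
    xbar = sum(x) / n
    y = [xj / xbar for xj in x]
    return sum(yj * math.exp(-yj) for yj in y) / n - 0.25

# rejection occurs for both small and large values
print(zp_statistic([0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4]))
```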

2.5.2 Baratpour and Habibi Rad (2012) (\({ BR}_n\))

The next test considered is based on the cumulative Kullback–Leibler (CKL) divergence (and indirectly on the CRE) introduced in Baratpour and Habibi Rad (2012). If \(W_1\) and \(W_2\) are two non-negative continuous random variables with distribution functions H and G, respectively, then the CKL divergence between these two distributions is defined as

$$\begin{aligned} { CKL}(H,G) = \int _0^\infty \left( 1-H(x)\right) \ln \frac{1-H(x)}{1 - G(x)}{} { dx} - \left[ E(W_1) - E(W_2)\right] . \end{aligned}$$

Note that the CKL divergence is somewhat similar to the classical Kullback–Leibler divergence, with the density functions replaced by survival functions.

The authors make use of the fact that, if the null hypothesis is true, then \({ CKL}(F,F_0) = 0\). Rewriting the CKL measure in terms of the CRE measure, and plugging in the necessary estimates, they arrive at the following test statistic

$$\begin{aligned} { BR}_n = \frac{\sum _{j=1}^{n-1}{\frac{n-j}{n} \left( \ln {\frac{n-j}{n}}\right) \left( X_{(j+1)}-X_{(j)}\right) } + \frac{\sum _{j=1}^{n}{X_{j}^{2}}}{2\sum _{j=1}^{n}{X_{j}}}}{\frac{\sum _{j=1}^{n}{X_{j}^{2}}}{2\sum _{j=1}^{n}{X_{j}}}}. \end{aligned}$$

The asymptotic distribution under the null hypothesis is not derived in Baratpour and Habibi Rad (2012), however it is shown that the test is consistent.

This test rejects \(H_{0}\) for large values of \({ BR}_n\).
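The expression for \({ BR}_n\) above is readily computed from the order statistics; a Python sketch with hypothetical sample values:

```python
import math

def br_statistic(x):
    """BR_n: the CKL-based numerator over the common denominator term
    sum(X_j^2) / (2 * sum(X_j)), following the closed form above."""
    n = len(x)
    xs = sorted(x)
    entropy_part = sum(((n - j) / n) * math.log((n - j) / n)
                       * (xs[j] - xs[j - 1])
                       for j in range(1, n))      # j runs over 1,...,n-1
    denom = sum(xj * xj for xj in x) / (2 * sum(x))
    return (entropy_part + denom) / denom

print(br_statistic([0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4]))
```

Both numerator and denominator scale linearly with the data, so the statistic is scale invariant, as required.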

2.6 Tests based on normalised spacings

It has been shown (see, for example, Jammalamadaka and Goria 2004) that transforming the data can increase the power of tests for exponentiality against certain alternatives. A widely used transformation is to convert the data to the so-called normalised spacings, defined as

$$\begin{aligned} D_{j} = \left( n-j+1\right) \left( X_{(j)} - X_{(j-1)}\right) ,\quad j = 1,\ldots ,n, \end{aligned}$$

with \(X_{(0)} = 0\). To find tests for exponentiality that use normalised spacings, the reader is referred to Epstein (1960), Jammalamadaka and Taufer (2003) and Jammalamadaka and Goria (2004), and for a test where these spacings are used to test for exponentiality in the presence of type-II censoring, see Balakrishnan et al. (2002). We consider two other tests based on spacings; one found in Gail and Gastwirth (1978) and a modification of a test in Gnedenko et al. (1969) which is found in Harris (1976).

2.6.1 Gini test (\(G_n\))

A test statistic that employs normalised spacings for testing exponentiality is described in D’Agostino and Stephens (1986) and is given by:

$$\begin{aligned} { DS}_n = \sum _{j=1}^{n-1} U_j = 2n - \frac{2}{n}\sum _{j=1}^n jY_{(j)}, \end{aligned}$$
(16)

where

$$\begin{aligned} U_k = \frac{\sum _{j=1}^k D_j}{\sum _{j=1}^n X_j},\quad \text { for } k = 1, \ldots , n-1, \end{aligned}$$

and where, under \(H_0\), \(U_{1},\ldots ,U_{n-1}\) are distributed as the order statistics of a random sample of size \(n-1\) from the standard uniform distribution.

This test rejects \(H_0\) for both small and large values of \({ DS}_n\).

An additional test based on the so-called Gini index, proposed in Gail and Gastwirth (1978), makes use of the following test statistic

$$\begin{aligned} G_{n} = \frac{\sum _{j=1}^{n}{\sum _{k=1}^{n}{\left| Y_{j} - Y_{k}\right| }}}{2n(n-1)}. \end{aligned}$$
(17)

It is easy to see that the following relationship holds between the test statistics in (16) and (17):

$$\begin{aligned} G_n = 1 - \frac{{ DS}_n}{n-1}. \end{aligned}$$

Similar to \({ DS}_n\), this test rejects the null hypothesis for both small and large values.

Unfortunately, both of these tests have been shown not to be universally consistent.
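The relationship between the statistics in (16) and (17) is easy to verify numerically; the Python sketch below computes both for a hypothetical sample:

```python
def ds_statistic(x):
    """DS_n = 2n - (2/n) * sum_j j * Y_(j), equation (16)."""
    n = len(x)
    xbar = sum(x) / n
    y = sorted(xj / xbar for xj in x)
    return 2 * n - (2 / n) * sum((j + 1) * y[j] for j in range(n))

def gini_statistic(x):
    """Gini statistic G_n, equation (17)."""
    n = len(x)
    xbar = sum(x) / n
    y = [xj / xbar for xj in x]
    return sum(abs(a - b) for a in y for b in y) / (2 * n * (n - 1))

sample = [0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4]
n = len(sample)
# the identity G_n = 1 - DS_n / (n - 1) holds for any sample
print(gini_statistic(sample), 1 - ds_statistic(sample) / (n - 1))
```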

2.6.2 Harris’ modification of Gnedenko’s F-test (\({ HM}_{n,r}\))

In Gnedenko et al. (1969) a test is proposed for exponentiality involving ordering a sample of size n and then splitting the n elements into two groups; the first containing the r smallest elements and the second containing the remaining \(n-r\) elements. The test statistic, given by

$$\begin{aligned} { GD}_{n,r} = \frac{\sum _{j=1}^r D_j / r}{\sum _{j=r+1}^n D_j / (n-r)}, \end{aligned}$$
(18)

follows an F distribution with 2r and \(2(n-r)\) degrees of freedom under \(H_0\).

A modification of the test in (18) was introduced in Harris (1976). This modification can be used to accommodate testing for exponentiality in the presence of hypercensoring and is referred to as Harris’ modification of Gnedenko’s F-test. For this test, the sample spacings are split into three groups: the first group contains the first r spacings, the last group contains the last r spacings, and the remaining \(n-2r\) spacings form the middle group. The test statistic compares the spacings in the two outer groups with those in the middle group and is given by

$$\begin{aligned} { HM}_{n,r} = \frac{\left( \sum _{j=1}^{r}{D_{j}} + \sum _{j=n-r+1}^{n}{D_{j}}\right) /2r}{\left( \sum _{j=r+1}^{n-r}{D_{j}}\right) /{(n-2r)}}. \end{aligned}$$

In Harris (1976), it is recommended that r be chosen equal to n / 4, and this is also the value of r used in the simulation study presented in Sect. 4.

The null hypothesis is rejected for small and large values of both \({ GD}_{n,r}\) and \({ HM}_{n,r}\).
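Both statistics are simple functions of the normalised spacings. A minimal Python sketch follows (our own naming; \(r\) defaults to the Harris (1976) recommendation \(r = n/4\), rounded down here as an assumption for sample sizes not divisible by 4).

```python
import numpy as np

def normalised_spacings(x):
    """D_j = (n - j + 1)(X_(j) - X_(j-1)), j = 1, ..., n, with X_(0) = 0."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    gaps = np.diff(np.concatenate(([0.0], xs)))          # X_(j) - X_(j-1)
    return (n - np.arange(1, n + 1) + 1) * gaps

def gnedenko_statistic(x, r):
    """GD_{n,r}: mean of the first r spacings over the mean of the rest."""
    d = normalised_spacings(x)
    return d[:r].mean() / d[r:].mean()

def harris_statistic(x, r=None):
    """HM_{n,r}: average of the 2r outer spacings over the average middle spacing."""
    d = normalised_spacings(x)
    n = len(d)
    if r is None:
        r = n // 4                                       # Harris' recommendation r = n/4
    outer = (d[:r].sum() + d[n - r:].sum()) / (2.0 * r)
    inner = d[r:n - r].mean()
    return outer / inner
```

Since the normalised spacings telescope, their sum equals the sample total; this identity is a useful implementation check.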

2.7 A test based on a score function

The score function, defined as the gradient of the log likelihood function, is a powerful tool that can be used to test statistical hypotheses. We consider one test, developed in Cox and Oakes (1984), that employs this score function to test for exponentiality.

2.7.1 Cox and Oakes (1984) (\({ CO}_n\))

A score test is introduced in Cox and Oakes (1984) that, when applied to censored data, has the following form

$$\begin{aligned} { CO}_{n} = d + \sum _{j=1}^{n}{\ln \left( X_{j}\right) } - d \frac{\sum _{j=1}^{n}{X_{j}\ln \left( X_{j}\right) }}{\sum _{j=1}^{n}{X_{j}}}, \end{aligned}$$

where \(d \le n\) is the number of uncensored data points. However, when \(d=n\) (i.e., in the uncensored case) and one uses the scaled data \(Y_1, \ldots , Y_n\), the statistic becomes

$$\begin{aligned} { CO}_{n} = n + \sum _{j=1}^{n} (1 - Y_j)\ln (Y_j). \end{aligned}$$

The test rejects \(H_{0}\) for both large and small values of \({ CO}_{n}\) and it is shown using finite sample simulation studies in both Ascher (1990) and Henze and Meintanis (2005) that the test is quite powerful against a wide variety of non-exponential alternatives.

It follows that \(\sqrt{6/n} ({ CO}_n/\pi )\) has a standard normal asymptotic null distribution and the test is consistent against alternative distributions with \(E(X) < \infty \) and \(E(X\ln X - \ln X) \ne 1\), as discussed in, for example, Henze and Meintanis (2002).
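A direct implementation of \({ CO}_n\) and its standardised version is straightforward; the sketch below (Python, with our own function names) follows the uncensored form above. Note that \({ CO}_n\) is scale invariant, since it depends on the data only through \(Y_j = X_j/\bar{X}_n\).

```python
import numpy as np

def cox_oakes_statistic(x):
    """CO_n = n + sum_j (1 - Y_j) ln(Y_j) on the scaled data Y_j = X_j / X-bar."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    y = x / x.mean()
    return n + np.sum((1.0 - y) * np.log(y))

def cox_oakes_standardised(x):
    """sqrt(6/n) * CO_n / pi, asymptotically standard normal under H_0."""
    return np.sqrt(6.0 / len(x)) * cox_oakes_statistic(x) / np.pi
```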

2.8 Tests based on other characterizations and properties

Over the years, a multitude of tests for exponentiality have been developed by utilising a number of interesting and varied characterisations and properties of the exponential distribution, but it would not be possible to address all of them in a single study. These tests utilise characterisations such as the memoryless property (see, for example, Ahmad and Alwasel 1999; Alwasel 2001; Angus 1982), the Arnold–Villasenor characterisation (see Jovanović et al. 2015), the Rossberg characterisation (Volkova 2010), and various other characterisations (see, for example, Abbasnejad et al. 2012; Noughabi and Arghami 2011a). Other tests for exponentiality, not included in this paper, include tests for exponentiality based on the analysis of variance (see Shapiro and Wilk 1972), tests based on order statistics (see Bartholomew 1957; Hahn and Shapiro 1967; Jackson 1967; Wong and Wong 1979), tests based on transformations to uniformity (see Hegazy and Green 1975; Seshadri et al. 1969), and tests based on maximum correlations (see Grané and Fortiana 2011), to name but a few. However, for the purposes of the simulation study conducted in this paper, we consider the following four tests: the Ahsanullah test (Volkova and Nikitin 2013), a test based on likelihood ratios (Noughabi 2015), a test based on transformed data (Noughabi and Arghami 2011b), and the Atkinson test (Mimoto and Zitikus 2008). The Ahsanullah test is chosen because no finite sample results for this test are available in Volkova and Nikitin (2013), whereas the remaining three are chosen because of their good power performance in finite sample studies found in the literature.

2.8.1 Tests based on Ahsanullah’s characterisation (\(AH^1_{n}\) and \(AH^2_{n}\))

Assume that the distribution F belongs to a class of distributions \(\mathcal {F}\) that are all strictly monotone and whose hazard rate function, f(x) / S(x), is either increasing or decreasing monotonically. Ahsanullah proved the following characterisation of the exponential distribution in Ahsanullah (1978): Let \(X_{1},X_{2},\ldots ,X_{n}\) be non-negative iid random variables with distribution function F. A necessary and sufficient condition for F to be exponential is that, for some j and k with \(0\le {j}< k < n\), the statistics \((n-j)(X_{(j+1)} - X_{(j)})\) and \((n-k)(X_{(k+1)} - X_{(k)})\) are identically distributed.

In Volkova and Nikitin (2013), the following specific setting of this characterisation is considered: \(n=2\), \(j=0\) and \(k=1\). Under these settings, the characterisation takes the following form: let X and Y be non-negative iid random variables from the class \(\mathcal {F}\). X is then exponentially distributed if, and only if, \(|X-Y|\) and \(2\min {\{X,Y\}}\) are identically distributed.

The test statistic suggested in Volkova and Nikitin (2013), derived from this characterisation, is

$$\begin{aligned} { AH}^1_{n} = \int _{0}^{\infty }{\left[ H_{n}(t) - G_{n}(t)\right] { dF}_{n}(t)}, \end{aligned}$$

where

$$\begin{aligned} H_{n}(t)&= \frac{1}{n^{2}} \sum _{j=1}^{n}{\sum _{k=1}^{n}{I\left( |X_{j} - X_{k}|< t\right) }},\quad t>0, \\ G_{n}(t)&= \frac{1}{n^{2}} \sum _{j=1}^{n}{\sum _{k=1}^{n}{I\left( 2\min \{X_{j},X_{k}\} < t\right) }},\quad t>0. \end{aligned}$$

If the null hypothesis is true, then \(H_{n}\) and \(G_{n}\) should be close to one another. The test therefore rejects \(H_{0}\) for small or large values of \(AH^1_{n}\). The authors showed that

$$\begin{aligned} \sqrt{n} { AH}^1_n \mathop {\rightarrow }\limits ^{\mathcal {D}} N\left( 0, \frac{647}{42{,}525}\right) , \end{aligned}$$

and calculated local Bahadur efficiencies under common parametric alternatives. However, the finite sample performance of their test statistic was not investigated. We also consider the more common Cramér–von Mises type distance, in which the squared difference between \(H_n\) and \(G_n\) is used; the corresponding statistic is denoted by

$$\begin{aligned} { AH}^2_{n} = \int _{0}^{\infty }{\left[ H_{n}(t) - G_{n}(t)\right] ^2{ dF}_{n}(t)}. \end{aligned}$$

This new form of the test will reject \(H_0\) for large values of the test statistic.
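Both \(AH^1_n\) and \(AH^2_n\) can be computed with a loop over the data, since integration with respect to \({ dF}_n\) amounts to averaging over the sample points. A Python sketch (our own naming) is:

```python
import numpy as np

def ahsanullah_statistics(x):
    """Return (AH1_n, AH2_n): signed and squared distances between the empirical
    cdfs of |X_j - X_k| and 2*min(X_j, X_k), averaged with respect to dF_n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    diff = np.abs(x[:, None] - x[None, :])               # |X_j - X_k|
    twomin = 2.0 * np.minimum(x[:, None], x[None, :])    # 2 min{X_j, X_k}
    ah1 = ah2 = 0.0
    for t in x:                                          # integral w.r.t. dF_n
        delta = np.mean(diff < t) - np.mean(twomin < t)  # H_n(t) - G_n(t)
        ah1 += delta / n
        ah2 += delta ** 2 / n
    return ah1, ah2
```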

2.8.2 A test based on likelihood ratios (\({ ZA}_{n}\))

Consider the following two generic statistics,

$$\begin{aligned} Z = \int _{-\infty }^{\infty } Z(t) { dw}(t) \end{aligned}$$
(19)

and

$$\begin{aligned} Z_{\max } = \sup _{t \in (-\infty , \infty )} \{ Z(t) w(t)\}, \end{aligned}$$
(20)

where Z(t), dw(t) and w(t) are appropriately chosen functions. It is easy to show (see, for example, Zhang 2002) that if one chooses \(Z(t) = X^2(t)\), where

$$\begin{aligned} X^2(t) = \frac{n[F_n(t) - F_0(t)]^2}{F_0(t)[1 - F_0(t)]} \end{aligned}$$

is the Pearson chi-squared statistic, then the statistics in equations (19) and (20) become the traditional Anderson–Darling, Cramér–von Mises, and Kolmogorov–Smirnov test statistics for specific choices of \({ dw}(t)\) and w(t), where \(F_0(x)=1 - \exp (-\lambda x)\).

However, Zhang (2002) suggests using the likelihood ratio statistic \(G^2(t)\) instead of the \(X^2(t)\) statistic, where \(G^2(t)\) is defined as

$$\begin{aligned} G^2(t) = 2n\left\{ F_n(t)\log \left( \frac{F_n(t)}{F_0(t)}\right) + [1 - F_n(t)]\log \left( \frac{1 - F_n(t)}{1 - F_0(t)}\right) \right\} . \end{aligned}$$

Choosing \(Z(t)=G^2(t)\), Zhang (2002) obtains the following easy-to-calculate versions of the test statistics for certain choices of dw(t) and w(t):

  • Setting \({ dw}(t) = F_n(t)^{-1}\{1- F_n(t)\}^{-1}{} { dF}_n(t)\) in (19), the following statistic is obtained:

    $$\begin{aligned} { ZA}_n = -\sum _{j=1}^n \left( \frac{\log (1 - \exp (-Y_{(j)}))}{n - j + 0.5} - \frac{Y_{(j)}}{j - 0.5} \right) . \end{aligned}$$
  • Setting \(dw(t) = F_0(t)^{-1}\{1- F_0(t)\}^{-1}dF_0(t)\) in (19), the following approximate statistic is obtained:

    $$\begin{aligned} { ZC}_n = \sum _{j=1}^n \left( \log \left\{ \frac{(1 - \exp (-Y_{(j)}))^{-1} - 1}{ (n- 0.5)/(j - 0.75) - 1 } \right\} \right) ^2. \end{aligned}$$
  • Setting \(w(t) = 1\) in (20), the following statistic is obtained:

    $$\begin{aligned} { ZK}_n= & {} \max _{1 \le j \le n} \left( (j - 0.5)\log \left\{ \frac{j - 0.5}{n(1 - \exp (-Y_{(j)}))} \right\} \right. \\&\left. +\,(n - j + 0.5) \log \left\{ \frac{n - j + 0.5}{n(\exp (-Y_{(j)}))} \right\} \right) . \end{aligned}$$

All of these tests reject \(H_0\) for large values of the test statistics.

The finite sample performance of these three tests for testing the hypothesis of normality is investigated in Zhang (2002), where it is found that the \({ ZA}_n\) and \({ ZC}_n\) versions of these statistics perform well, even when compared to traditionally powerful tests for normality, such as the Shapiro–Wilk test. In Noughabi (2015) the finite sample performance of these tests is investigated when testing for exponentiality, and it is concluded that, among the three tests, \({ ZA}_n\) performs best. As a result, we include only \({ ZA}_n\) in our own Monte Carlo study. Note that, while the finite sample performance of these tests was extensively studied in Noughabi (2015), the asymptotic null distribution and consistency of the tests were not discussed.
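Since \({ ZA}_n\) is the version carried forward to our study, we note that it is easy to compute from the ordered scaled data; a Python sketch (our own naming, not code from Zhang 2002 or Noughabi 2015) is:

```python
import numpy as np

def za_statistic(x):
    """Zhang's ZA_n for exponentiality, computed on Y_(1) <= ... <= Y_(n)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    y = np.sort(x / x.mean())
    j = np.arange(1, n + 1)
    f0 = 1.0 - np.exp(-y)                # fitted exponential cdf at Y_(j)
    return -np.sum(np.log(f0) / (n - j + 0.5) - y / (j - 0.5))
```

Both terms inside the sum contribute positively (as \(\log F_0 < 0\)), so \({ ZA}_n\) is always positive and, as noted above, \(H_0\) is rejected for large values.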

2.8.3 A test using transformed data (\(N{\!}A_{n}\))

The test proposed in Noughabi and Arghami (2011b) employs the rather simple idea that, for the uniform distribution, the quantity \(xf_U(x)\) equals \(F_U(x)\) for \(x \in [0,1]\), where \(f_U\) is the uniform density function and \(F_U\) is the uniform distribution function. Therefore, given data \(V_1, V_2, \ldots , V_n\), a test statistic proposed to test for uniformity is

$$\begin{aligned} T_n = \frac{1}{n}\sum _{j=1}^n \left| V_j ~ \widehat{f}(V_j) - F_U(V_j)\right| , \end{aligned}$$
(21)

where \(\widehat{f}\) is the kernel density estimator defined as

$$\begin{aligned} \widehat{f}(x) = \frac{1}{nh}\sum _{j=1}^n K\left( \frac{x - V_j}{h}\right) , \end{aligned}$$

with \(K\) the standard normal density function and h the bandwidth chosen using Silverman’s normal rule of thumb, \(h=1.06sn^{-1/5}\) (see Silverman 1986), where s is the sample standard deviation.

The test for exponentiality proceeds by exploiting the following characterisation of exponentiality (see Alzaid and Al-Osh 1992): For two independent random observations \(W_1\) and \(W_2\) from a distribution G, the random variable \( {W_1}/{(W_1 + W_2)} \) is uniformly distributed if, and only if, G is the exponential distribution.

Subsequently, given the order statistics \(X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}\), construct the transformed data set

$$\begin{aligned} Z_{ij} = \frac{X_{(i)}}{X_{(i)} + X_{(j)}}, \qquad i\ne j,~ i,j=1,2,\ldots ,n. \end{aligned}$$

Under the hypothesis of exponentiality, these newly transformed values will have a uniform distribution. The test statistic given in (21) can consequently be used to test deviations from exponentiality for these transformed data:

$$\begin{aligned} { NA}_n = \frac{1}{n(n-1)}\underset{i\ne j}{\sum \sum } \left| Z_{ij}\widehat{f}(Z_{ij}) - F_U(Z_{ij}) \right| . \end{aligned}$$

The test rejects the null hypothesis for large values of \({ NA}_n\).
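A compact implementation is sketched below (Python, with our own function names). One detail left open by the description above is which sample size enters the bandwidth; here we assume, purely for illustration, that Silverman's rule is applied to the \(n(n-1)\) transformed values \(Z_{ij}\) themselves.

```python
import numpy as np

def na_statistic(x):
    """NA_n: uniformity check of z * f_hat(z) against F_U(z) = z on the
    transformed ratios Z_ij = X_i / (X_i + X_j), i != j."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    i, j = np.where(~np.eye(n, dtype=bool))              # all pairs i != j
    z = x[i] / (x[i] + x[j])
    m = len(z)                                           # m = n(n-1)
    h = 1.06 * z.std(ddof=1) * m ** (-1.0 / 5.0)         # Silverman's rule of thumb
    u = (z[:, None] - z[None, :]) / h
    fhat = np.exp(-0.5 * u ** 2).sum(axis=1) / (m * h * np.sqrt(2.0 * np.pi))
    return np.mean(np.abs(z * fhat - z))                 # F_U(z) = z on [0, 1]
```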

In Noughabi and Arghami (2011b) the authors investigate the finite sample performance of their newly proposed test, but do not derive any asymptotic results.

Another test using transformed data can be found in Dhumal and Shirke (2014), but we will not discuss this test further in this paper.

2.8.4 The Atkinson test (\({ AT}_{n,\gamma }\))

In Lee et al. (1980) the authors propose tests for exponentiality based on the ratio

$$\begin{aligned} Q_{F}(\gamma ) = \frac{E[X^{\gamma }]}{\left( E[X]\right) ^\gamma }, \end{aligned}$$

for \(\gamma >0\), which is equal to \(\varGamma (1 + \gamma )\) if X is exponentially distributed.

However, an approach whereby the quantity \(Q_F(\gamma )\) is raised to the power \(1/\gamma \) to create the following ratio

$$\begin{aligned} R_{F}(\gamma ) = \frac{E[X^{\gamma }]^{1/\gamma }}{E[X]}, \end{aligned}$$

is adopted in Mimoto and Zitikus (2008). Naturally, if X is exponentially distributed, then \(R_F(\gamma )\) equals \(\varGamma (1 + \gamma )^{1/\gamma }\) for \(\gamma \ne 0\), and tends to \(\exp (-\epsilon )\) as \(\gamma \rightarrow 0\), where \(\epsilon = 0.577215\ldots \) is the Euler constant. The test statistic proposed in Mimoto and Zitikus (2008), called the Atkinson statistic, is based on the difference between an empirical estimator of \(R_F(\gamma )\) and \(\varGamma (1 + \gamma )^{1/\gamma }\), for \(\gamma \) values between \(-1\) and 1, but \(\gamma \ne 0\). The test statistic is given by

$$\begin{aligned} { AT}_{n,\gamma } = \sqrt{n} \left| R_{n}(\gamma ) - \varGamma (1 + \gamma )^{1/\gamma }\right| , \end{aligned}$$
(22)

where

$$\begin{aligned} R_n(\gamma ) = \frac{1}{\bar{X}_n}\left[ \frac{1}{n} \sum _{j=1}^n X_j^\gamma \right] ^{1/\gamma }. \end{aligned}$$

In the limit where \(\gamma \rightarrow 0\) the quantity \(R_F(\gamma )\) has the form

$$\begin{aligned} R_F(0) = \frac{\exp \left( E[\log (X)] \right) }{E[X]}, \end{aligned}$$

the numerator of which is consistently estimated by the geometric mean \(G_n = \prod _{j=1}^n X_j^{1/n}\). Therefore, when \(\gamma =0\), the resulting test statistic, called the Moran statistic for exponentiality, has the form

$$\begin{aligned} { AT}_{n,0} = \sqrt{n} \left| \frac{G_n}{\bar{X}_n} - \exp (-\epsilon ) \right| , \end{aligned}$$

see Moran (1951). For all choices of \(\gamma \), the test rejects the null hypothesis for large values.
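The statistic is equally simple to implement for \(\gamma \ne 0\) and for the Moran limit \(\gamma = 0\); a Python sketch (our own naming) is given below. A useful check is that the \(\gamma \ne 0\) branch converges to the \(\gamma = 0\) branch as \(\gamma \rightarrow 0\).

```python
import numpy as np
from math import gamma as gamma_fn, exp

EULER = 0.5772156649015329                               # Euler's constant

def atkinson_statistic(x, g):
    """AT_{n,gamma} = sqrt(n) |R_n(gamma) - Gamma(1 + gamma)^(1/gamma)|;
    for gamma = 0 this is Moran's statistic sqrt(n) |G_n / X-bar - exp(-EULER)|."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if g == 0:
        geo_mean = np.exp(np.mean(np.log(x)))            # geometric mean G_n
        return np.sqrt(n) * abs(geo_mean / x.mean() - exp(-EULER))
    r_n = np.mean(x ** g) ** (1.0 / g) / x.mean()
    return np.sqrt(n) * abs(r_n - gamma_fn(1.0 + g) ** (1.0 / g))
```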

Extensive Monte Carlo power studies are presented in Mimoto and Zitikus (2008) where it is found that values of \(\gamma \) close to 0 and close to 0.99 produce the highest power for most alternatives considered. For the purposes of this paper, a compromise choice of \(\gamma =0.01\) is selected. In addition, the authors of Mimoto and Zitikus (2008) establish the asymptotic null distribution and consistency of the test statistic \({ AT}_{n,\gamma }\).

3 A data-dependent choice of the tuning parameter

Many of the tests mentioned in Sect. 2 contain a tuning parameter \(\gamma \) typically appearing in a weight function (see for example the test statistics in (4), (6), and (13)). As stated in the introduction, authors typically approach the selection of this parameter by evaluating the power performance of their tests across a grid of values of the tuning parameter and then suggesting a compromise choice for the parameter by selecting a value that fares well for the majority of the alternatives considered. However, there is general agreement that a data-dependent choice of this parameter is required for practical implementation.

Consider a generic test statistic \(T_{n,\gamma }\) containing a tuning parameter \(\gamma \), whose critical values, denoted by \(\widetilde{C}_{n,\gamma }(\alpha )\), can be obtained through Monte Carlo simulation. A possible data-dependent choice of the parameter \(\gamma \), proposed by Allison and Santana (2015), can be obtained by maximising the bootstrap power of the test as follows:

$$\begin{aligned} \widehat{\gamma }=\widehat{\gamma }\left( \mathbf {X}_{n}\right) =\arg \sup _{\gamma \in \mathbb {R}}P^{*}\left( T_{n,\gamma }\left( \mathbf {Y}_{n}^{*}\right) \ge \tilde{C}_{n,\gamma }\left( \alpha \right) \right) , \end{aligned}$$

where \(\mathbf {Y}_{n}^{*}=(Y_{1}^{*},Y_{2}^{*},\ldots ,Y_{n}^{*})\) denotes a bootstrap sample taken with replacement from \(\mathbf {Y}_{n}\), and \(P^{*}\) is the law of \(\mathbf {Y}_{n}^{*}\) given \(\mathbf {Y}_{n}\). In Allison and Santana (2015), the following algorithm, used to approximate the ideal bootstrap estimator \(\widehat{\gamma }\), is provided:

  1. Fix a grid of \(\gamma \) values: \(\gamma \in \left\{ \gamma _{1},\gamma _{2},\ldots ,\gamma _{k}\right\} \).

  2. Obtain a bootstrap sample \(\mathbf {Y}_{n}^{*}\) by sampling with replacement from \(\mathbf {Y}_{n}\).

  3. Calculate \(T_{n,\gamma _{j}}\left( \mathbf {Y}_{n}^{*}\right) \), \( j=1,2,\ldots ,k\).

  4. Repeat steps (2) and (3) a large number of times (say B times) and denote the resulting test statistics by \(T_{n,\gamma _{j},1}^{*},T_{n,\gamma _{j},2}^{*},\ldots ,T_{n,\gamma _{j},B}^{*}\), \(j=1,2,\ldots ,k\).

  5. Calculate

     $$\begin{aligned} \widehat{P}_{{ boot},\gamma _{j}}=\frac{1}{B}\sum _{b=1}^{B}{\text {I}}\left( T_{n,\gamma _{j},b}^{*}\ge \tilde{C}_{n,\gamma _{j}}\left( \alpha \right) \right) ,\quad j=1,2,\ldots ,k. \end{aligned}$$

  6. Calculate

     $$\begin{aligned} \widehat{\gamma }_{B}=\widehat{\gamma }_{B}\left( \mathbf {X}_{n}\right) =\arg \max _{\gamma \in \left\{ \gamma _{1},\gamma _{2},\ldots ,\gamma _{k}\right\} }\widehat{P}_{{ boot},\gamma }. \end{aligned}$$
     (23)
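The six steps above can be sketched as follows (Python; all function names are ours). The statistic `t_stat` is a deliberately simple illustrative stand-in, not one of the statistics of Sect. 2; any \(T_{n,\gamma }\) can be substituted. The critical values \(\tilde{C}_{n,\gamma }(\alpha )\) are first obtained by Monte Carlo simulation under \(H_0\).

```python
import numpy as np

rng = np.random.default_rng(1)

def t_stat(y, g):
    """Illustrative stand-in for a test statistic T_{n,gamma} (not from the paper)."""
    y = np.asarray(y, dtype=float)
    y = y / y.mean()
    return abs(np.sum((1.0 - y) * np.exp(-g * y))) / np.sqrt(len(y))

def critical_values(n, grid, alpha=0.05, mc=2000):
    """Monte Carlo critical values C~_{n,gamma}(alpha) under H_0 for each gamma."""
    sims = np.empty((mc, len(grid)))
    for b in range(mc):
        x0 = rng.exponential(size=n)                     # sample from H_0
        sims[b] = [t_stat(x0, g) for g in grid]
    return np.quantile(sims, 1.0 - alpha, axis=0)

def choose_gamma(y, grid, crit, B=250):
    """Steps 1-6: pick the gamma maximising the bootstrap rejection rate."""
    n = len(y)
    rejections = np.zeros(len(grid))
    for _ in range(B):                                   # steps 2-4
        ystar = rng.choice(y, size=n, replace=True)      # bootstrap sample Y*
        tvals = np.array([t_stat(ystar, g) for g in grid])
        rejections += tvals >= crit                      # indicator I(T* >= C~)
    p_boot = rejections / B                              # step 5
    return grid[int(np.argmax(p_boot))]                  # step 6
```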

The numerical results reported in Tables 2, 3, 4, 5, 6 and 7 in Sect. 4 relating to test statistics containing a tuning parameter are obtained using the estimated tuning parameter in (23). The estimated powers obtained using the compromise choice of \(\gamma \) are reported in parentheses in these tables. The details related to the choice of the grid used for each test are discussed in the next section.

4 Monte Carlo methodology and results

In this section Monte Carlo simulations are used to evaluate the power of the various tests discussed in Sect. 2.

4.1 Simulation setting

Throughout the simulation study we use a significance level of 5% and the critical values of all tests are calculated based on 10 000 independent Monte Carlo replications. All calculations are done in R (R Core Team 2013).

Power estimates are calculated for sample sizes \(n \in \{ 10, 20, 30, 50, 75, 100\}\) using 5000 independent Monte Carlo replications for various alternative distributions. These alternative distributions, given in Table 1, are chosen since they are commonly employed alternatives to the exponential distribution, which has a constant hazard rate (CHR). The distributions considered include those with increasing hazard rates (IHR), decreasing hazard rates (DHR), as well as non-monotone hazard rates (NMHR).

Table 1 Various choices of the alternative distributions
Table 2 Monte Carlo power estimates for \(n=10\)
Table 3 Monte Carlo power estimates for \(n=20\)
Table 4 Monte Carlo power estimates for \(n=30\)
Table 5 Monte Carlo power estimates for \(n=50\)
Table 6 Monte Carlo power estimates for \(n=75\)
Table 7 Monte Carlo power estimates for \(n=100\)

In order to determine the power of the six tests containing a tuning parameter (\({ BH}_{n,\gamma }\), \(L_{n,\gamma }\), \({ PW}^1_{n,\gamma }\), \({ PW}^2_{n,\gamma }\), \(J_{n,\gamma }\), \({ AT}_{n,\gamma }\)) when using the data-dependent choice of the parameter (discussed in Sect. 3), we first need to approximate the empirical powers of these tests for each value of \(\gamma \) in a sequence of \(\gamma \) values. The empirical power based on the data-dependent choice is then calculated as described in Allison and Santana (2015). In each case \(B=250\) bootstrap replications are used to evaluate the bootstrap power of the tests. The following grids of values of the parameter are used for the respective tests:

  • For \({ BH}_{n,\gamma }\), \(L_{n,\gamma }\), \({ PW}^1_{n,\gamma }\), and \({ PW}^2_{n,\gamma }\) the grid of \(\gamma \) values is given by

    $$\begin{aligned} \gamma \in \{0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, 5\}. \end{aligned}$$
  • For \(J_{n,\gamma }\), the grid of \(\gamma \) values is

    $$\begin{aligned} \gamma \in \{ 0.1, 0.3, 0.5, 0.7, 0.9\}. \end{aligned}$$
  • The grid of \(\gamma \) values used for \({ AT}_{n,\gamma }\) is

    $$\begin{aligned} \gamma \in \{-0.99, -0.75, -0.5, -0.25, -0.01, 0.01, 0.25, 0.5, 0.75, 0.99 \}. \end{aligned}$$

4.2 Simulation results

Tables 2, 3, 4, 5, 6 and 7 show the estimated powers of the various tests discussed in Sect. 2 for sample sizes \(n \in \{10,20,30,50,75,100\}\) against each of the alternative distributions given in Table 1. The entries in these tables are the percentages of 5000 independent Monte Carlo samples that resulted in the rejection of \(H_0\), rounded to the nearest integer. Note that, for the tests containing a tuning parameter, the primary entry is the approximate power for the test based on the data-dependent choice of the parameter, \(\widehat{\gamma }\), while the approximate power of the test based on the compromise choice appears in parentheses alongside it. To ease comparisons between the results, the highest power for each alternative distribution is highlighted.

The primary aim of this paper is to compare the power of these tests against a wide range of alternative distributions. Below we present some general conclusions relating to the reported estimated powers of the various tests. For the second part of the analysis of the results we consider only the tests containing tuning parameters. Here we compare the powers achieved by tests employing the data-dependent choice proposed in Allison and Santana (2015) with those associated with the compromise choice of the parameter.

The performance of the tests is greatly affected by the shape of the hazard rate of the alternative distribution considered. Consequently, we discuss the overall results, as well as the results categorised according to whether the hazard rate of the alternative is increasing, decreasing, or non-monotone.

4.3 Power comparisons

For the purposes of the comparison between the power of the various tests we use the data-dependent choice (and not the compromise choice) of the tuning parameter for the tests containing such a parameter.

Consider the performance of the tests in general against all alternatives. The powers of \({ HM}_{n}\) do not compare favourably to those of the other tests; this test exhibits lower powers against the majority of the alternatives. For small samples, \(AH_{n}^{2}\), \({ BR}_{n}\) and \({ NA}_{n}\) also exhibit lower powers against the majority of the alternatives. The tests that generally perform well are \({ CO}_{n}\), \({ ZA}_{n}\), \({ AT}_{n,\widehat{\gamma }}\), \({ BH}_{n,\widehat{\gamma }}\) and \(L_{n,\widehat{\gamma }}\). The \({ CM}_{n}\) and \(\overline{{ CM}}_{n}\) tests also perform relatively well against the majority of the alternatives, especially for large samples.

We now consider the results pertaining to the alternatives with increasing hazard rates. Against these alternatives \({ HM}_{n}\), \({ KS}_{n}\), \(AH_{n}^{1}\), \( J_{n,\widehat{\gamma }}\), \(PW_{n,\widehat{\gamma }}^{1}\) and \(PW_{n,\widehat{\gamma }}^{2}\) exhibit lower powers for all sample sizes considered. \({ BR}_{n}\) has higher power in the case of small sample sizes, but its power relative to the other tests decreases with sample size. The opposite is true for \(L_{n,\widehat{\gamma } }\), which reveals a relative increase in power with sample size. The two tests based on mean residual life, \(\overline{{ KS}}_{n}\) and \(\overline{{ CM}}_{n}\), perform relatively well for all sample sizes. The Cramér–von Mises type statistic for Ahsanullah’s test, \(AH_{n}^{2}\), and \({ NA}_{n}\) also perform well, especially for small sample sizes. The following tests exhibit high powers in the case of large sample sizes: \(G_{n}\), \({ EP}_{n}\), \({ ZA}_{n}\) and \({ BH}_{n,\widehat{\gamma }}\).

We now turn our attention to the alternatives with decreasing hazard rates. \( { HM}_{n}\), \(AH_{n}^{2}\), \({ BR}_{n}\) and \({ NA}_{n}\) perform poorly for all sample sizes. In turn, the tests for which large powers are observed are \({ CO}_{n}\), \( { BH}_{n,\widehat{\gamma }}\) and \(L_{n,\widehat{\gamma }}\). Furthermore, \(\overline{{ CM}}_{n}\), \(G_{n}\), \({ EP}_{n}\) and \({ AT}_{n,\widehat{\gamma }}\) perform well, especially for large samples, while \(PW_{n,\widehat{\gamma }}^{2}\) provides higher relative powers in the case of small samples.

The results pertaining to the alternatives with non-monotone hazard rates are as follows. The tests generally demonstrating the lowest powers are \( { HM}_{n}\), \({ BR}_{n}\) and \({ NA}_{n}\). For small sample sizes \(AH_{n}^{2}\) performs poorly, while \(G_{n}\) and \({ EP}_{n}\) exhibit relatively low powers in the case of large samples. However, \({ ZA}_{n}\), \({ AT}_{n,\widehat{\gamma }}\), \({ BH}_{n, \widehat{\gamma }}\) and \(L_{n,\widehat{\gamma }}\) generally perform well for all sample sizes. The original probability weighted characteristic function test, \(PW_{n,\widehat{\gamma }}^{1}\), where the weights emphasise the centre of the distribution, does well in the case of larger samples. On the other hand, the alternative formulation of this test, with the weight function allocating the majority of the weight to the tails of the distribution, \(PW_{n,\widehat{\gamma }}^{2}\), exhibits relatively high power, especially for small samples. The same is true for \({ CO}_{n}\).

In summary, the powers achieved by \({ HM}_{n}\) are generally substantially lower than those of the remaining tests. Other tests that do not generally achieve good results are \(AH_{n}^{2}\), \({ BR}_{n}\), and \({ NA}_{n}\). The tests that perform well are \({ BH}_{n,\widehat{\gamma }}\), \(L_{n,\widehat{\gamma }}\), \({ AT}_{n, \widehat{\gamma }}\) and \({ CO}_{n}\). The test that performs the best overall is \( { BH}_{n,\widehat{\gamma }}\), closely followed by \(L_{n,\widehat{\gamma }}\). Note that only one of the tests reported to perform relatively poorly contains a tuning parameter, while only one of the tests reported to achieve high powers does not contain such a parameter; \({ CO}_{n}\) performs the best among those tests that do not include a tuning parameter.

4.4 Comparisons based on the choice of the tuning parameter

Six of the goodness-of-fit test statistics considered contain tuning parameters. Below we compare the powers achieved by these tests using two different values of the tuning parameter. The first value is chosen data-dependently using the method detailed in Allison and Santana (2015), while the second is the compromise choice recommended in the relevant literature. As was the case above, the discussion below does not only refer to the overall performance of the tests; the performance of the tests against alternatives with increasing, decreasing and non-monotone hazard rates are also discussed separately.

We consider the overall results first. For smaller sample sizes there is little to choose between the powers obtained using \({ AT}_{n,\gamma }\) based on the choices of the tuning parameter. However, as the sample size increases, use of the data-dependent choice generally results in a slight increase in relative power. On the other hand, when using \(J_{n,\gamma }\) the choice between the tuning parameters is unimportant for large samples, but for smaller samples the data-dependent choice leads to slightly higher powers. For both \( { BH}_{n,\gamma }\) and \(L_{n,\gamma }\) the data-dependent choice leads to higher powers than the compromise choice. Interestingly, the compromise choice outperforms the data-dependent choice in the case of the original PWECF test, \(PW_{n,\gamma }^{1}\), by a small margin, while the data-dependent choice leads to vast improvements in the powers associated with \(PW_{n,\gamma }^{2}\) (giving more weight towards the tails of the distribution), especially for larger samples.

Next we consider alternative distributions with increasing hazard rates. In this case the use of either method for the choice of the tuning parameter leads to little difference in powers obtained using the \({ AT}_{n,\gamma } \), \({ BH}_{n,\gamma }\), \(PW_{n,\gamma }^{1}\) and \(L_{n,\gamma }\) tests. The performance of \(J_{n,\gamma }\) is slightly improved by using the compromise choice, while the performance of \(PW_{n,\gamma }^{2}\) is greatly improved when using the data-dependent choice of the tuning parameter.

Turning our attention to the alternative distributions with decreasing hazard rates, we see that the observed powers are not substantially affected by the choice of tuning parameter in the case of the following tests: \( { AT}_{n,\gamma }\), \({ BH}_{n,\gamma }\) and \(L_{n,\gamma }\). For both \( PW_{n,\gamma }^{1}\) and \(PW_{n,\gamma }^{2}\) the compromise choice of the tuning parameter outperforms the data-dependent choice. The power of \( J_{n,\gamma }\) is substantially improved when using the data-dependent choice, especially for small samples.

Finally, we consider the performance of the tests against alternatives with non-monotone hazard rates. When using \(PW_{n,\gamma }^{1}\) the powers can be increased by using the compromise choice, especially for small samples. However, substantial improvements in the power of \(PW_{n,\gamma }^{2}\) are realised when the data-dependent choice is used, especially in the case of larger samples. The powers of \({ BH}_{n,\gamma }\) and \(L_{n,\gamma }\) are higher when the data-dependent choice is used than is the case for the compromise choice. The performance of \({ AT}_{n,\gamma }\) is not substantially affected by the choice of the tuning parameter for small samples, but using the data-dependent choice leads to improved power in the case of larger samples. When using \(J_{n,\gamma }\) the data-dependent choice outperforms the compromise choice for small samples.

Fig. 1

Powers for \(L_{n,\gamma }\) for \(n=20\) for various alternatives

It is interesting to note that, in the cases where the compromise choice of the tuning parameter outperforms the data-dependent choice, the difference in realised power is usually small. However, there are cases where the data-dependent choice vastly outperforms the compromise choice. As an example, consider the power of \(PW_{n,\gamma }^{2}\) against samples of size 75 generated from a lognormal distribution with parameter 0.8. The power using the compromise choice is estimated to be 0%, while the power associated with the data-dependent choice is estimated to be 96%. Various other instances of this phenomenon can be observed in the reported powers.

To conclude this section, we provide a short illustration of how the choice of the tuning parameter affects the power of two of the tests considered in the study. For this purpose we consider the tests \(L_{n,\gamma }\) and \(J_{n,\gamma }\) for sample size \(n=20\). In order to more easily visualise the behaviour of the powers across the \(\gamma \) values Figs. 1 and 2 present the powers obtained for tests \(L_{n,\gamma }\) and \(J_{n,\gamma }\), respectively, for each choice of \(\gamma \) in the grid of selected \(\gamma \) values. The powers are calculated for five different alternative distributions. For each test, the compromise choice of the tuning parameter is indicated by a vertical dashed line in the relevant figure.

Fig. 2

Powers for \(J_{n,\gamma }\) for \(n=20\) for various alternatives

It is clear from the figures that the power of the tests is highly dependent on the choice of \(\gamma \). The compromise choice performs moderately well against many of the alternatives, but in some cases it produces low powers relative to other choices of \(\gamma \) (see, e.g., \(L_{n,\gamma }\) for alternative PW(2) and \(J_{n,\gamma }\) for alternatives LN(1.5) and PW(1)). Furthermore, the main entries in Tables 8 and 9 correspond to the powers presented in the figures, whereas the values in parentheses denote the percentage of times (out of 5000 independent Monte Carlo simulations) that the data-dependent procedure selected the value of \(\gamma \) given in the column heading. These tables show that the procedure most frequently selects the value of \(\gamma \) that produces the highest power for a given alternative. Consider, for example, \(L_{n,\gamma }\) for the alternative PW(2), where the maximum power of 53% is obtained at \(\gamma =0.1\). The procedure chose \(\gamma =0.1\) 68% of the time, and the power of the test based on the data-dependent choice is 43%. In contrast, the power associated with the compromise choice is only 21%.

Table 8 Percentage of 5000 samples that resulted in the rejection of \(H_0\) (main entries) and the percentage of times that the procedure selected the specific value of \(\gamma \) (in parentheses) based on test \(L_{n,\gamma }\) for \(n = 20\)
Table 9 Percentage of 5000 samples that resulted in the rejection of \(H_0\) (main entries) and the percentage of times that the procedure selected the specific value of \(\gamma \) (in parentheses) based on test \(J_{n,\gamma }\) for \(n = 20\)

5 Practical application

In this section we apply all of the tests considered in Sect. 2 to a real-world data set: the ‘Leukaemia’ data set given in Table 10 (see Kotze and Johnson 1983, for a discussion of the original data set). These data comprise the survival times (in days) of 43 patients diagnosed with a certain type of leukaemia.

Table 11 lists the names of the 20 different tests discussed in this paper along with the value of the test statistic calculated from these data, the p-value for testing the hypothesis of exponentiality, as well as the time (s) taken to compute the p-value and critical value for each test (based on \(MC=10000\) replications). Where applicable, the data-dependent choice of \(\gamma \) used is also displayed in the table. The number of bootstrap replications in the calculation of the data-dependent choice of the tuning parameter is set to \(B=1000\). The final column in the table indicates whether the test is available in the software package R (R Core Team 2013); these tests are primarily available in the package exptest (Novikov et al. 2013).
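To illustrate how such Monte Carlo p-values can be obtained, the sketch below computes the classical \({ KS}_{n}\) statistic (with the rate estimated by the reciprocal of the sample mean) and a p-value based on simulated exponential samples. This exploits the fact that the statistic is scale-invariant, so its null distribution does not depend on the unknown rate. The function names are ours, and the \((1+\text {count})/(MC+1)\) convention is one common choice, not necessarily the one used in the paper.

```python
import numpy as np

def ks_exp_statistic(x):
    """Kolmogorov-Smirnov distance between the empirical CDF and the
    exponential CDF fitted with rate 1/mean(x)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    f0 = 1.0 - np.exp(-x / x.mean())           # fitted exponential CDF
    d_plus = np.max(np.arange(1, n + 1) / n - f0)
    d_minus = np.max(f0 - np.arange(0, n) / n)
    return max(d_plus, d_minus)

def mc_pvalue(x, stat=ks_exp_statistic, mc=10_000, seed=None):
    """Monte Carlo p-value: simulate standard exponential samples of
    the same size (valid because the statistic is scale-invariant)
    and count how often the simulated statistic exceeds the observed one."""
    rng = np.random.default_rng(seed)
    t_obs = stat(x)
    t_null = np.array([stat(rng.exponential(size=len(x)))
                       for _ in range(mc)])
    return (1 + np.sum(t_null >= t_obs)) / (mc + 1)
```

The same wrapper applies unchanged to any of the scale-invariant statistics in Sect. 2; only `stat` needs to be swapped out.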

With the exception of \(J_{n,0.9}\) and \({ BR}_n\), none of the tests reject the null hypothesis of exponentiality at a significance level of \(\alpha =0.05\).

Table 10 Survival times in days after diagnosis
Table 11 Summary of results for the Leukaemia data set

As shown in Table 11, none of the tests containing a tuning parameter are available in R. Since these tests are rather powerful, a worthwhile avenue for future work would be to create an R package that includes them, along with the procedure for obtaining the tuning parameter data-dependently.

6 Conclusions

In this paper we consider a large number of tests for exponentiality based on a wide variety of characteristics of this distribution. Below we briefly mention these characteristics as well as the tests associated with them.

The tests based on the characteristic function are the Epps and Pulley test (\({ EP}_{n}\)) as well as tests based on the probability weighted empirical characteristic function. We consider two forms of the latter test: the first uses the original test statistic proposed in Meintanis et al. (2014) (\(PW_{n,\gamma }^{1}\)), whose weight function assigns the majority of the weight to the centre of the distribution. The second formulation of the test statistic (\(PW_{n,\gamma }^{2}\)) assigns more weight to the tails of the distribution.

The tests based on the empirical Laplace transform are those of Baringhaus and Henze (\({ BH}_{n,\gamma }\)) as well as Henze and Meintanis (\(L_{n,\gamma }\)).

Another characteristic of the exponential distribution that some of the tests are based on is the distribution function. The tests associated with this characteristic are the Kolmogorov–Smirnov (\({ KS}_{n}\)) and Cramér–von Mises (\({ CM}_{n}\)) tests.

Next we consider the tests based on the mean residual life of the data. We consider two test statistics based on the mean residual life introduced in Baringhaus and Henze (2000): a Kolmogorov–Smirnov type test (\(\overline{{ KS}}_{n}\)) and a Cramér–von Mises type test (\(\overline{{ CM}}_{n}\)). The test of Jammalamadaka and Taufer (\(J_{n,\gamma }\)) is also based on this characteristic.
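The property these tests exploit is that the mean residual life of the exponential distribution is constant. A minimal sketch of the empirical mean residual life, together with a simple unstandardised KS-type distance from the sample mean, is given below; the function names are ours, and the exact standardisation used by Baringhaus and Henze (2000) differs from this illustrative version.

```python
import numpy as np

def empirical_mrl(x):
    """Empirical mean residual life evaluated at the order statistics
    X_(1),...,X_(n-1): the average excess of the observations that
    exceed each point."""
    x = np.sort(np.asarray(x, dtype=float))
    return np.array([x[j + 1:].mean() - x[j] for j in range(len(x) - 1)])

def ks_type_mrl_distance(x):
    """Unstandardised KS-type distance between the empirical mean
    residual life and the constant value implied by exponentiality
    (the sample mean), divided by the mean to make it unit-free."""
    x = np.asarray(x, dtype=float)
    return np.max(np.abs(empirical_mrl(x) - x.mean())) / x.mean()
```

For exponential data the empirical mean residual life fluctuates around the sample mean (apart from the noisy upper tail, where few observations remain), whereas, for example, distributions with increasing hazard rate exhibit a systematically decreasing mean residual life.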

Another characteristic used to test for exponentiality is entropy. We consider two tests based on entropy: the test of Zardasht et al. (\({ ZP}_{n}\)) and that of Baratpour and Habibi Rad (\({ BR}_{n}\)).

Furthermore, we consider two tests based on the normalised spacings of the observed data. The first of these is the Gini test (\(G_{n}\)) and the second is Harris’ modification of Gnedenko’s F-test (\({ HM}_{n}\)).

The Cox and Oakes test (\({ CO}_{n}\)) is also included in the study. This test is based on a score function.
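The Cox and Oakes statistic has a particularly simple closed form. As a sketch, using the form commonly given in the literature with \(Y_{j}=X_{j}/\overline{X}_{n}\) (readers should verify against the definition in Sect. 2):

```python
import numpy as np

def cox_oakes(x):
    """Cox-Oakes statistic CO_n = n + sum_j (1 - Y_j) * log(Y_j),
    with Y_j = X_j / mean(X), as commonly stated in the literature.
    Values far from 0 signal departures from exponentiality; critical
    values can be obtained by Monte Carlo simulation under the null."""
    y = np.asarray(x, dtype=float)
    y = y / y.mean()
    return len(y) + np.sum((1.0 - y) * np.log(y))
```

Note that the statistic depends on the data only through the scaled observations \(Y_{j}\), so it is invariant to the unknown scale parameter.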

Various other characteristics are also used. We consider two tests based on Ahsanullah’s characterisation: the first (\(AH_{n}^{1}\)) uses the original test statistic proposed in Volkova and Nikitin (2013), while the second (\(AH_{n}^{2}\)) utilises a Cramér–von Mises type test statistic. Zhang’s test (\({ ZA}_{n}\)), based on likelihood ratios, is included in the study, as is the Noughabi and Arghami test (\({ NA}_{n}\)), which uses transformed data. Finally, the Atkinson test (\({ AT}_{n,\gamma }\)), based on the Atkinson statistic, is considered.

Based on the results of the Monte Carlo study conducted in this paper, we draw some brief conclusions regarding the powers of the tests considered. Generally, \({ HM}_{n}\) achieves powers substantially lower than the remaining tests. The \(AH_{n}^{2}\), \({ BR}_{n}\) and \({ ZA}_{n}\) tests are also relatively poor performers in terms of power. Tests that do perform well are \({ BH}_{n,\widehat{\gamma }}\), \(L_{n,\widehat{\gamma }}\), \({ AT}_{n, \widehat{\gamma }}\) and \({ CO}_{n}\); \({ BH}_{n,\widehat{\gamma }}\) has the best overall performance, closely followed by \(L_{n,\widehat{\gamma }}\). Note that only one of the tests reported to perform relatively poorly contains a tuning parameter, while only one of the tests reported to achieve high powers does not contain such a parameter; \({ CO}_{n}\) performs the best among the tests that do not include a tuning parameter.

In light of the results discussed above, we would advise using the data-dependent choice of the tuning parameter; this choice generally outperforms the compromise choice. It is important to note that the power associated with the data-dependent choice of the tuning parameter can conceivably be increased further by evaluating the powers over finer grids of tuning parameter values than those used in the paper. Because of the large number of Monte Carlo replications required for the numerical results shown in the paper, finer grids would substantially increase the computational burden of the simulation study. However, when the hypothesis of exponentiality is to be tested on a single data set, the computational time required is substantially smaller.