Abstract
A new characterization of the Pareto distribution is proposed, and new goodness-of-fit tests based on it are constructed. The test statistics are functionals of U-empirical processes. The first statistic is of integral type and is similar to the classical statistic \(\omega _n^1\). The second is a Kolmogorov type statistic. We show that the kernels of our statistics are non-degenerate. The limiting distributions and large deviation asymptotics of the new statistics under the null hypothesis are described. Their local Bahadur efficiency for parametric alternatives is calculated. This type of efficiency is most appropriate for our problem, since the Kolmogorov type statistic is not asymptotically normal, and hence the Pitman approach is not applicable to it. For the second statistic we evaluate the critical values by Monte-Carlo methods. Conditions of local optimality of the new statistics in the Bahadur sense are also discussed, and examples of such special alternatives are given. For small sample sizes we compare the power of these tests with that of some common goodness-of-fit tests.
1 Introduction
Let \({\mathscr {P}}\) be the family of the Pareto distributions with the distribution function (d.f.)
In this paper we develop goodness-of-fit tests for the Pareto distribution using a new characterization based on a property of order statistics. The problem is formulated as follows: let \(X_1,\ldots ,X_n\) be positive i.i.d. random variables (rv’s) with continuous d.f. F. Consider testing the composite hypothesis \(H_0: F \in {\mathscr {P}}\) against the general alternative \(H_1: F \notin {\mathscr {P}}\), assuming that the alternative d.f. is also concentrated on \([1,\infty )\).
Goodness-of-fit tests for the Pareto distribution have been discussed in Beirlant et al. (2006), Gulati and Shapiro (2008), Martynov (2009), Rizzo (2009). We exploit a different idea for constructing and analyzing statistical tests, based on characterizations by the equidistribution of linear statistics, by means of so-called U-empirical d.f.’s (Janssen 1988; Korolyuk and Borovskikh 1994). This method was developed earlier in several articles, in particular in Nikitin (1996), Nikitin and Peaucelle (2004), Nikitin and Tchirina (1996), Nikitin and Volkova (2010), Nikitin and Volkova (2012), Litvinova (2004). Tests for the Pareto distribution using this approach were obtained and analyzed in Jovanovic et al. (2014). One can observe that the new tests based on characterizations have reasonably high efficiencies and can be competitive with previously known goodness-of-fit tests. Let us explain our approach.
We will say that the d.f. F belongs to the class of distributions \(\mathscr {F}\), if \(\forall x_1, x_2\): either \(F(x_1x_2)\le F(x_1)F(x_2)\) or \(F(x_1x_2)\ge F(x_1)F(x_2)\), see Ahsanullah (1989).
Let \(X_1,\ldots ,X_n\) be i.i.d. positive absolutely continuous random variables with d.f. F from the class \(\mathscr {F}\). Denote by \(X_{(1,n)}\le X_{(2,n)}\le \ldots \le X_{(n,n)}\) the order statistics of the random sample \(X_1,...,X_n\).
We present a new characterization within the class \(\mathscr {F}\).
Theorem 1
For fixed k let \(X_1,...,X_k\) be i.i.d., positive and bounded rv’s having an absolutely continuous (with respect to Lebesgue measure) d.f. F(x). Then \(X_1\) and \(X_{(k,k)}/X_{(k-1,k)}\) are equal in law iff \(X_1\) has some d.f. from the family \({\mathscr {P}}\).
Proof
Let \(Y=\ln {X}\) and let G denote the d.f. of Y. It is easily seen that \(F \in \mathscr {F}\) iff G is NBU (“new better than used”) or NWU (“new worse than used”) (Ahsanullah 1977). Further, since the logarithm is a monotone transformation, \(X_1\) and \(X_{(k,k)}/X_{(k-1,k)}\) are identically distributed iff \(Y_1\) and \(Y_{(k,k)}-Y_{(k-1,k)}\) are identically distributed. It follows from Ahsanullah (1977) that this holds iff \(Y=\ln {X}\) has the exponential distribution with some scale parameter \(\lambda \); therefore \(X_1\) has the Pareto distribution with the same parameter \(\lambda \). \(\square \)
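The characterization behind Theorem 1 can be checked numerically. The sketch below (all names and the sample sizes are ours) simulates standard Pareto data with \(\lambda = 1\) and compares the empirical d.f. of \(X_{(k,k)}/X_{(k-1,k)}\) for \(k=3\) with that of a single observation via the two-sample Kolmogorov distance:

```python
# Monte-Carlo sanity check of Theorem 1 for k = 3 (a sketch, not part of the
# formal argument): under the standard Pareto law F(x) = 1 - 1/x, x >= 1,
# the ratio X_(k,k)/X_(k-1,k) should be distributed as a single observation.
import bisect
import random

random.seed(1)

def pareto():
    # Standard Pareto (lambda = 1) via inverse transform: X = 1/(1 - U).
    return 1.0 / (1.0 - random.random())

def top_ratio(xs):
    # X_(k,k) / X_(k-1,k) for an already sorted subsample xs.
    return xs[-1] / xs[-2]

k, m = 3, 20000
ratios = sorted(top_ratio(sorted(pareto() for _ in range(k))) for _ in range(m))
singles = sorted(pareto() for _ in range(m))

# Two-sample Kolmogorov distance between the two empirical d.f.'s;
# small values support the equidistribution claim.
ks = max(
    abs(bisect.bisect_left(ratios, t) / m - bisect.bisect_left(singles, t) / m)
    for t in ratios + singles
)
print(ks)
```

For two samples of this size, the Kolmogorov distance is of order \(0.01\) when the two laws coincide, in agreement with the theorem.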
In the case \(k=2\) our characterization coincides with another characterization of the Pareto distribution considered in Jovanovic et al. (2014), see also Nikitin and Volkova (2012). Thus our characterization extends the one involved in Jovanovic et al. (2014).
According to our characterization we construct the U-empirical d.f. by the formula
where \(X_{(s,\{i_1,\ldots ,i_k\})}, \, s\in \{k-1,k\},\) denotes the \(s\)-th order statistic of the subsample \(X_{i_1},\ldots ,X_{i_k}\). For a single rv X the \(U\)-statistical d.f. is simply the usual empirical d.f. \(F_n(t)=n^{-1}\sum _{i=1}^n\mathbf 1 (X_i<t), t \in R^1\), based on the observations \(X_1,\dots ,X_n\).
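The construction above can be sketched in code. The concrete form of \(H_n(t)\) used below, namely the proportion of k-element subsamples whose ratio of the two largest order statistics lies below t, is our reading of the displayed definition; function names are ours.

```python
# Sketch of the U-empirical d.f. from the characterization: H_n(t) counts
# k-subsamples {X_{i_1},...,X_{i_k}} with X_(k,.)/X_(k-1,.) < t, and
# F_n(t) is the usual empirical d.f. based on the observations.
from itertools import combinations
from math import comb
import random

def H_n(sample, t, k=3):
    hits = 0
    for sub in combinations(sample, k):
        xs = sorted(sub)
        if xs[-1] / xs[-2] < t:     # X_(k,.) / X_(k-1,.) < t
            hits += 1
    return hits / comb(len(sample), k)

def F_n(sample, t):
    return sum(x < t for x in sample) / len(sample)

random.seed(2)
sample = [1.0 / (1.0 - random.random()) for _ in range(30)]  # standard Pareto
print(H_n(sample, 2.0), F_n(sample, 2.0))  # close to each other under H_0
```

Under \(H_0\) both values estimate \(F(2) = 1/2\), so their difference is small, which is exactly the closeness exploited by the test statistics below.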
It is known that the properties of U-empirical d.f.’s are similar to the properties of usual empirical d.f.’s (Helmers et al. 1988; Janssen 1988). Hence the difference \(H_n- F_n\) for large n should be almost surely close to zero under \(H_0\), and we can measure their closeness by using some test statistics, assuming their large values to be critical.
We suggest two test statistics
Note that both proposed statistics under \(H_0\) are invariant with respect to the change of variables \( X \rightarrow X^{\frac{1}{\lambda }}\), so we may set \(\lambda =1\).
We discuss their limiting distributions under the null hypothesis and find the logarithmic asymptotics of large deviations under \(H_0\). Next we calculate their efficiencies against some parametric alternatives from the class \(\mathscr {F}\). We use the notion of local exact Bahadur efficiency (BE) (Bahadur 1971; Nikitin 1995), since the statistic \(D_n^{(k)}\) has a non-normal limiting distribution, so that the Pitman approach to efficiency is not applicable. However, it is known that the local BE and the limiting Pitman efficiency usually coincide, see Wieand (1976), Nikitin (1995).
Finally, we study the conditions of the local optimality of our tests, describe the “most favorable” alternatives for them and compare the powers of our tests with some standard goodness-of-fit tests.
The family of d.f.’s under the null hypothesis is a particular case of the so-called Pareto type I distribution with the d.f. \(P_1(x) = 1-(\frac{x}{\beta })^{-\lambda }, \ x \ge \beta > 0, \ \lambda >0\), see, for example, Arnold (1983). In practical goodness-of-fit testing based on our new tests, the unknown parameters of the hypothesized Pareto distribution can be estimated by a number of methods, see Arnold (1983, Ch. 5), Kleiber and Kotz (2003, Ch. 3), Brazausskas and Serfling (2003), Rizzo (2009). One can first estimate the parameter \(\beta \), for example by the MLE \(\hat{\beta }= \min _{i=1, \ldots , n} X_i\). Then the sample \(X_1,\ldots ,X_n\) can be transformed to the new sample \(Y_1,\ldots ,Y_n\), where \(Y_i=X_i/\hat{\beta }\), which has the d.f. considered in (1).
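The pre-processing step just described can be sketched as follows (the parameter values are ours, chosen for illustration):

```python
# Sketch of the scale pre-processing: estimate beta by its MLE, the sample
# minimum, and rescale so the transformed sample has support [1, infinity).
import random

random.seed(3)
beta, lam = 2.5, 1.7
# Pareto type I sample via inverse transform: X = beta * (1 - U)^(-1/lambda).
xs = [beta * (1.0 - random.random()) ** (-1.0 / lam) for _ in range(50)]

beta_hat = min(xs)                  # MLE of the scale parameter
ys = [x / beta_hat for x in xs]     # transformed sample with d.f. (1)
print(beta_hat, min(ys))            # min(ys) equals 1 by construction
```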
There also exists a second Pareto model, the so-called Pareto type II distribution with d.f. \(P_2(x) = 1-(1+\frac{x-\mu }{\beta })^{-\lambda }, \ x \ge \mu \in \mathbb {R}, \ \lambda >0\). The Pareto type I and type II models are related by the following transformation: if a rv X has a Pareto type II distribution, then \(X-(\mu -\beta )\) has a Pareto type I distribution. Therefore, using one of the estimators of the location parameter \(\mu \), one can reduce a Pareto type II rv to our model. We do not discuss the differences between the parameter estimators and concentrate on constructing the goodness-of-fit tests.
2 Integral statistic \(I_{n}^{(k)}\)
The statistic \(I_{n}^{(k)}\) is asymptotically equivalent to the U-statistic of degree \((k+1)\) with the centered kernel
where \(\pi (i_1, \ldots , i_{k+1})\) means all permutations of different indices from \(\{i_1, \ldots ,i_{k+1}\}\).
Let \(X_1,\ldots , X_{k+1}\) be independent rv’s from the standard Pareto distribution. It is known that non-degenerate U-statistics are asymptotically normal (Hoeffding 1948; Korolyuk and Borovskikh 1994). To prove that the kernel \({\varPsi }_k(X_{1},\ldots , X_{{k+1}})\) is non-degenerate, we calculate its projection \(\psi _k(s)\). For a fixed \(X_{{k+1}}=s, \, s\ge 1\) we have:
It follows from the above characterization that the second probability is equal to:
It remains to calculate the first term. For this purpose we decompose the probability as \(\mathbb {P}(X_{k,\{2,\ldots , k,s \}}/X_{k-1,\{2,\ldots , k,s\}} < X_{1}) = \mathbb {P}_1+\mathbb {P}_2+\mathbb {P}_3\), where \(\mathbb {P}_i, \ i=1,2,3,\) are the corresponding probabilities computed in the following mutually exclusive cases:
-
(1)
Let the sample units be ordered as \(X_2 < \ldots < X_k <s\). Then our probability transforms into
$$\begin{aligned} \mathbb {P}_1= & {} (k-1)! \, \mathbb {P}\left( \frac{s}{X_k} <X_1, X_2 < \ldots < X_k <s\right) \\= & {} (k-1)! \, \mathbb {P}\left( X_k< s, X_1 > \frac{s}{X_k} , X_2 < X_3, X_3 <X_4, \ldots , X_{k-1} < X_{k}\right) . \end{aligned}$$After some calculations we obtain that the last probability is equal to:
$$\begin{aligned}&(k-1)! \int _1^s \left( 1-F\left( \frac{s}{x_k}\right) \right) \frac{F^{k-2}(x_k)}{(k-2)!} d F(x_k)\\&\quad = F^{k-1}(s)-(k-1) \int _1^s \left( 1-\frac{1}{x}\right) ^{k-2} \left( 1-\frac{x}{s}\right) \frac{dx}{x^2}. \end{aligned}$$The integral in the second term can be evaluated using integration by parts and binomial representation of the function \((1-\frac{1}{x})^{k-1}\). Finally we have:
$$\begin{aligned}&\int _1^s \left( 1-\frac{1}{x}\right) ^{k-2} \left( 1-\frac{x}{s}\right) \frac{dx}{x^2} = \frac{1}{s(k-1)}\int _{1}^{s}\sum _{j=0}^{k-1}(-1)^j {{k-1} \atopwithdelims ()j}x^{-j} dx\\&\quad = \frac{1}{s(k-1)}\left( s-1-(k-1)\ln {(s)}+\sum _{j=2}^{k-1}(-1)^j {{k-1} \atopwithdelims ()j}\frac{1-s^{-(j-1)}}{j-1}\right) . \end{aligned}$$Thus the initial probability in this case is equal to
$$\begin{aligned} \mathbb {P}_1=F^{k-1}(s)-F(s)+(k-1)\frac{\ln {s}}{s}-\frac{1}{s}\sum _{j=2}^{k-1}(-1)^j {{k-1} \atopwithdelims ()j}\frac{1-s^{-(j-1)}}{j-1}. \end{aligned}$$ -
(2)
Suppose the sample units are ordered as \(X_2< X_3<\ldots< X_{k-1}< s < X_k\); then we have:
$$\begin{aligned} \mathbb {P}_2= & {} (k-1)! \, \mathbb {P}\left( \frac{X_k}{s} <X_1, X_2 < X_3<\ldots X_{k-1}< s < X_k\right) \\= & {} (k-1)! \, \mathbb {P}\left( X_k> s, X_1 > \frac{X_k}{s} , X_2 < X_3, X_3 < X_4, \ldots , X_{k-1}<s\right) \\= & {} (k-1)! \int _s^{\infty } \left( 1-F\left( \frac{x_k}{s}\right) \right) \frac{F^{k-2}(s)}{(k-2)!} d F(x_k)\\= & {} \frac{(k-1)}{2s}F^{k-2}(s). \end{aligned}$$ -
(3)
The last case we consider is when s is situated in the \(j\)-th position \((1 \le j \le {k-2})\) of the variational series of the sample \(X_2, \ldots , X_{k-2}\). This means that the sample units are ordered as \(X_2< \ldots< s< \ldots<X_{k-2}< X_{k-1} < X_k\), where s may also stand in the first or the \((k-2)\)-th position. Then the required probability is equal to
$$\begin{aligned} \mathbb {P}_3= & {} (k-1)! \, \mathbb {P}\left( \frac{X_k}{X_{k-1}} <X_1, X_2< \ldots < s < \ldots <X_{k-2}< X_{k-1} < X_k\right) \\= & {} \frac{1}{2} C_{k-1}^{j-1}(1-F(s))^{k-j} F^{j-1}(s), \, 1 \le j \le {k-2}. \end{aligned}$$
Combining the results we get that the first term in the projection has the form:
Note that the last sum is equal to \(\sum _{j=1}^{k-1}C_{k-1}^{j-1}(1-F(s))^{k-j} F^{j-1}(s)=1-F^{k-1}(s)\). Thus for the initial probability we get the result:
Hence we get the final expression for the projection of the kernel \({\varPsi }_k\):
The calculation of the variance for the projection \(\psi _k\) in the general case is too complicated, therefore we calculate it only for particular k.
2.1 Integral statistic \(I_{n}^{(3)}\)
The projection \(\psi _k(s)\) for the case \(k=3\) has the form:
The variance of this projection \({\varDelta }_3^2 = E\psi _3^2(X_1)\) under \(H_{0}\) is given by
Therefore the kernel \({\varPsi }_3\) is centered and non-degenerate. We can apply Hoeffding’s theorem on asymptotic normality of U-statistics, see again Hoeffding (1948), Korolyuk and Borovskikh (1994), which implies that the following result holds
Theorem 2
Under the null hypothesis as \(n \rightarrow \infty \) the statistic \(\sqrt{n}I_{n}^{(3)}\) is asymptotically normal so that
Now we evaluate the large deviation asymptotics of the sequence of statistics \(I_{n}^{(3)}\) under \(H_0\). Since the kernel \({\varPsi }_3\) is centered, bounded and non-degenerate, the theorem on large deviations of such statistics from Nikitin and Ponikarov (1999), see also DasGupta (2008), Nikitin (2010), yields the following result.
Theorem 3
For \(a>0\)
where the function \(f_I^{(3)}\) is continuous for sufficiently small \(a>0\), and
2.2 Some notions from the Bahadur theory
Suppose that under the alternative \(H_1\) the observations have the d.f. \(G(\cdot ,\theta )\) and the density \(g(\cdot ,\theta ), \ \theta \ge 0\), such that \(G(\cdot , 0) \in {\mathscr {P}}\). The measure of the Bahadur efficiency (BE) for any sequence \(\{T_n\}\) of test statistics is the exact slope \(c_{T}(\theta )\) describing the rate of an exponential decrease for the attained level under the alternative d.f. \(G(\cdot ,\theta )\). According to the Bahadur theory (Bahadur 1971; Nikitin 1995) the exact slopes may be found by using the following Proposition.
Proposition 1
Suppose that the following two conditions hold:
-
a)
\(T_n \ \mathop {\longrightarrow }\limits ^{{P_\theta }} \ b(\theta ),\qquad \theta > 0\), where \(-\infty < b(\theta ) < \infty \), and \(\mathop {\longrightarrow }\limits ^{{P_\theta }}\) denotes convergence in probability under \(G(\cdot \ ; \theta )\).
-
b)
\(\mathop {\lim }\limits _{n\rightarrow \infty } n^{-1} \ \ln \ P_{H_0} \left( T_n \ge t \ \right) \ = \ - h(t)\)
for any t in an open interval I, on which h is continuous and \(\{b(\theta ), \, \theta > 0\}\subset I\). Then
We have already found the large deviation asymptotics. In order to evaluate the exact slope, it remains to verify condition a) of the Proposition, i.e., to find the limit in probability \(b(\theta )\).
Note that the exact slopes for any \(\theta \) satisfy the inequality (Bahadur 1971; Nikitin 1995)
where \(K(\theta )\) is the Kullback-Leibler “distance” between the alternative and the null-hypothesis \(H_0\). In our case \(H_0\) is composite, hence for any alternative density \(g_j(x,\theta )\) one has
This quantity can be easily calculated as \(\theta \rightarrow 0\) for particular alternatives. According to (6), the local BE of the sequence of statistics \({T_n}\) is defined as
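In the standard notation of Bahadur theory (Bahadur 1971; Nikitin 1995), and consistently with the numerical values obtained below, the exact slope from the Proposition and the local efficiency can be written as:

```latex
% Exact slope via Proposition 1 and the local Bahadur efficiency
c_T(\theta) \;=\; 2\,h\bigl(b(\theta)\bigr), \qquad
e^{B}(T) \;=\; \lim_{\theta \to 0^{+}} \frac{c_T(\theta)}{2K(\theta)} .
```

For instance, for the second Ley–Paindaveine alternative treated below, \(1.363/(2\cdot 0.753)\approx 0.905\).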
2.3 The local Bahadur efficiency of \(I_n^{(3)}\)
According to the Bahadur theory, the alternatives considered should be close to the null hypothesis as \(\theta \rightarrow 0\). We suggest three such alternatives to the Pareto distribution. The first two are obtained by the skewing mechanism of Ley and Paindaveine (2008), and we call them Ley–Paindaveine alternatives.
-
i)
First Ley–Paindaveine alternative with the d.f.
$$\begin{aligned} G_1(x,\theta )=F(x)e^{-\theta (1-F(x))},\quad \theta \ge 0, x \ge 1; \end{aligned}$$ -
ii)
Second Ley–Paindaveine alternative with the d.f.
$$\begin{aligned} G_2(x,\theta )=F(x)-\theta \sin {\pi F(x)}, \quad \theta \in [0,\pi ^{-1}], x\ge 1; \end{aligned}$$ -
iii)
log-Weibull alternative with the d.f.
$$\begin{aligned} G_3(x,\theta )=1-e^{-(\ln {x})^{\theta +1}},\quad \theta \in (0,1), x\ge 1. \end{aligned}$$
Let us find the local BE for the alternatives under consideration.
According to the Law of Large Numbers for U-statistics (Korolyuk and Borovskikh 1994), the limit in probability under \(H_1\) is equal to
It is easy to show (Jovanovic et al. 2014) that
where \(h_1(s)=\frac{\partial }{\partial \theta }g_1(s,\theta )\mid _{\theta =0}\) and \(\psi _3(s)\) is the projection from (5). Therefore for the first Ley–Paindaveine alternative we have
and the local exact slope of the sequence \(I_n^{(3)}\) as \(\theta \rightarrow 0\) admits the representation
The Kullback-Leibler “distance” \(K_1(\theta )\) between the alternative and the null-hypothesis \(H_0\) admits the following asymptotics (Jovanovic et al. 2014):
Therefore in our case
Consequently, the local efficiency of the test is
Omitting the calculations similar to previous cases, we get for the second Ley–Paindaveine alternative \(b_2(\theta )\sim 0.353\, \theta \), \(c_2(\theta )\sim 1.363\,\theta ^2\), \(\theta \rightarrow 0\). It is easy to show that \(K_2(\theta ) \sim 0.753\, \theta ^2\), \(\theta \rightarrow 0\). Therefore the local BE is equal to 0.905.
After some calculations in case of the log-Weibull alternative we have:
and the local exact slope of the sequence \(I_n^{(3)}\) as \(\theta \rightarrow 0\) admits the representation \(c_3(\theta ) \sim 1.295\, \theta ^2\). Moreover for the log-Weibull distribution \(K_3(\theta )\) satisfies \(K_3(\theta ) \sim \frac{\theta ^2}{12}\), \(\theta \rightarrow 0\). Hence the local BE for the last case is equal to 0.787.
Table 1 gathers values of the local BE.
2.4 Integral statistic \(I_{n}^{(4)}\)
For the case \(k=4\) the projection \(\psi _k(s)\) has the form:
The variance of this projection under \(H_{0}\) is equal to
Therefore the kernel \({\varPsi }_4\) is centered, non-degenerate and bounded. Due to Hoeffding’s theorem on asymptotic normality of U-statistics (Hoeffding 1948; Korolyuk and Borovskikh 1994), we have that:
Theorem 4
Under the null hypothesis as \(n \rightarrow \infty \) the statistic \(\sqrt{n}I_{n}^{(4)}\) is asymptotically normal so that
The large deviation asymptotics of the sequence of statistics \(I_{n}^{(4)}\) under \(H_0\) follows from the next result, which is derived by applying the theorem on large deviations (see again Nikitin and Ponikarov 1999; DasGupta 2008; Nikitin 2010) to the centered, bounded and non-degenerate kernel \({\varPsi }_4\).
Theorem 5
For \(a>0\)
where the function \(f_I^{(4)}\) is continuous for sufficiently small \(a>0\), and
2.5 The local Bahadur efficiency of \(I_n^{(4)}\)
For this case the limit in probability under \(H_1\) has the following asymptotics
where again \(h_1(s)=\frac{\partial }{\partial \theta }g_1(s,\theta )\mid _{\theta =0}\) and \(\psi _4(s)\) is the projection from (8). Therefore for the first Ley–Paindaveine alternative we have
and the local exact slope of the sequence \(I_n^{(4)}\) as \(\theta \rightarrow 0\) admits the representation
The Kullback-Leibler “distance” for this alternative was already found above, and it satisfies \(K_1(\theta ) \sim \theta ^2/24, \,\theta \rightarrow 0\). Thus the local efficiency of the test is
For the other alternatives the calculations are similar. Omitting the details, we gather the values of the local BE for this case in Table 2.
In Table 3 we present the efficiencies from Tables 1 and 2 together with the maximal values of the efficiencies against the presumed alternatives.
3 Kolmogorov-type statistic \(D_n^{(k)}\)
Now we consider the Kolmogorov type statistic (3). For fixed t the difference \(H_n(t) - F_n(t)\) is a family of U-statistics with kernels depending on \(t\ge 1\):
The projection of these kernels, \(\xi _k(s;t)\), for fixed \(t \ge 1\) has the form:
It remains to calculate the first term. For this purpose, as in the previous cases, we write the decomposition
where \(\mathbb {P}_i, i=1,2,3\), are the initial probabilities, computed in one of the following cases:
-
(1)
Let the sample units be ordered as \(X_1 < X_2< \ldots < X_{k-1} <s\). Then the probability can be expressed as
$$\begin{aligned} \mathbb {P}_1= & {} (k-1)! \, \mathbb {P}\left( \frac{s}{X_{k-1}} < t, X_1 < X_2< \ldots < X_{k-1} < s\right) \\= & {} (k-1)! \, \mathbf 1 (s\ge t) \mathbb {P}\left( \frac{s}{t}<X_{k-1}< s, X_1 < X_2< \ldots < X_{k-1}\right) \\&+(k-1)! \, \mathbf 1 (s < t) \mathbb {P}(X_1 < X_2< \ldots < X_{k-1} < s)\\= & {} \mathbf 1 ( s\ge t) \left( F^{k-1}(s)-F^{k-1}\left( \frac{s}{t}\right) \right) . \end{aligned}$$ -
(2)
Suppose the sample units are ordered as \(X_1 < X_2<\ldots< X_{k-2}< s < X_{k-1}\); then we have:
$$\begin{aligned} \mathbb {P}_2= & {} (k-1)! \, \mathbb {P}\left( \frac{X_{k-1}}{s} < t, X_1 < X_2<\ldots X_{k-2}< s < X_{k-1}\right) \\= & {} (k-1)! \, \mathbb {P}(s < X_{k-1} < st, X_1 < X_2<\ldots X_{k-2}< s)\\= & {} (k-1)! \frac{F^{k-2}(s)}{(k-2)!}(F(st)-F(s)) =\frac{(k-1)}{s}\left( 1-\frac{1}{s}\right) ^{k-2}\left( 1-\frac{1}{t}\right) . \end{aligned}$$ -
(3)
In the last case let s be situated in the \(l\)-th position \((1 \le l \le {k-2})\) of the variational series of the sample \(X_1, \ldots , X_{k-2}\). Then the required probability transforms into:
$$\begin{aligned} \mathbb {P}_3= & {} (k-1)! \, \mathbb {P}\left( \frac{X_{k-1}}{X_{k-2}} < t, X_1 < \ldots<s< \ldots< X_{k-2} < X_{k-1}\right) \\= & {} \left( 1-\frac{1}{t}\right) C_{k-1}^{l-1}(1-F(s))^{k-l} F^{l-1}(s), \, 1 \le l \le {k-2} . \end{aligned}$$
Combining these results we get that the first term in the projection is equal to:
Again we can see that the last sum can be simplified as
Thus the initial probability is equal to
Hence we get the final expression for the projection of the family of kernels \({\varXi }_k(\cdot ,t)\):
It is easy to show that \(E(\xi _k (X; t))=0\). After some calculations we get that, for any t, the variance of this projection under \(H_{0}\) is
3.1 Kolmogorov-type statistic \(D_n^{(3)}\)
In the case \(k=3\) the projection of the family of kernels \({\varXi }_3 (X,Y, Z;t)\), namely \(\xi _3 (s;t):=E({\varXi }_3 (X, Y, Z; t)\mid X=s)\) is equal to:
Now we calculate variances of these projections \(\delta _3^2(t)\) under \(H_{0}\). Elementary calculations show that
Hence our family of kernels \({\varXi }_3 (X,Y,Z;t)\) is non-degenerate in the sense of Nikitin (2010) and
This value will be important in the sequel when calculating the large deviation asymptotics (Figs. 1, 2, 3).
The limiting distribution of the statistic \(D_n^{(3)}\) is unknown. Using methods of Silverman (1983), one can show that the U-empirical process
weakly converges in \(D(1,\infty )\) as \(n \rightarrow \infty \) to a certain centered Gaussian process \(\eta (t)\) with calculable covariance. Then the sequence of statistics \(\sqrt{n} D_n^{(3)}\) converges in distribution to the rv \(\sup _{t\ge 1} |\eta (t)|\), but it is currently impossible to find its distribution explicitly. Hence it is reasonable to determine the critical values of the statistic \(D_n^{(3)}\) by simulation.
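The simulation of critical values can be sketched as follows. The helper names and the small replication count are ours; the supremum over \(t\ge 1\) is approximated on the grid of jump points of \(H_n\) and \(F_n\), and the paper itself uses 10,000 replications.

```python
# Monte-Carlo sketch of critical values for D_n^(3) = sup_t |H_n(t) - F_n(t)|
# under the standard Pareto null, evaluated on the jump points of H_n, F_n.
from itertools import combinations
from math import comb
import bisect
import random

def top_ratio(xs):
    # X_(k,k) / X_(k-1,k) for an already sorted subsample xs.
    return xs[-1] / xs[-2]

def D_n(sample, k=3):
    n = len(sample)
    ratios = sorted(top_ratio(sorted(sub)) for sub in combinations(sample, k))
    svals = sorted(sample)
    nsub = comb(n, k)
    return max(
        abs(bisect.bisect_left(ratios, t) / nsub
            - bisect.bisect_left(svals, t) / n)
        for t in ratios + svals
    )

random.seed(4)
n, reps = 20, 200      # small illustrative run; the paper uses 10,000
stats = sorted(
    D_n([1.0 / (1.0 - random.random()) for _ in range(n)]) for _ in range(reps)
)
crit_05 = stats[int(0.95 * reps)]   # approximate upper 5% critical value
print(crit_05)
```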
Now we obtain the logarithmic large deviation asymptotics of the sequence of statistics \(D_n^{(3)}\) under \(H_0\). The family of kernels \(\{{\varXi }_3(X, Y, Z; t), t\ge 1\}\) is not only centered but also bounded. Using results from Nikitin (2010) on large deviations of the supremum of non-degenerate U-statistics, we obtain the following result.
Theorem 6
For \(a>0\)
where the function \(f_D^{(3)}\) is continuous for sufficiently small \(a>0\); moreover
3.2 The local Bahadur efficiency of \(D_n^{(3)}\)
To evaluate the efficiency, first consider again the first Ley–Paindaveine alternative with the d.f. \(G_1(x,\theta ),\theta \ge 0, x \ge 1\) given above. By the Glivenko-Cantelli theorem for U-statistics (Janssen 1988) the limit in probability under the alternative for statistics \(D_n^{(3)}\) is equal to
It is not difficult to show that
where again \(h_1(s)=\frac{\partial }{\partial \theta }g_1(s,\theta )\mid _{\theta =0}\) and \(\xi _3(s;t)\) is the projection defined above in (10). Hence for the first Ley–Paindaveine alternative we have for \(t\ge 1\):
Thus \(b_1(\theta )=\sup _{t\ge 1}|b_1(t,\theta )| \sim 0.125\,\theta \), and it follows that the local exact slope of the sequence of statistics \(D_n^{(3)}\) admits the representation:
The Kullback-Leibler information in this case is given by (7). Hence the local Bahadur efficiency of our test is \(e^B_1(D)= 0.599\).
Next we take the second Ley–Paindaveine distribution, where calculations are similar, and the local BE is equal to 0.689. In the case of the log-Weibull density we find that the local BE is 0.467.
We collect the values of the local BE in Table 4.
3.3 Kolmogorov-type statistic \(D_n^{(4)}\)
In the case \(k=4\) the projection of the family of kernels \({\varXi }_4 (X,Y, Z, W;t)\) is equal to:
Therefore the variances of these projections, \(\delta _4^2(t)\), under \(H_{0}\) are
Hence our family of kernels \({\varXi }_4 (X,Y,Z,W;t)\) is non-degenerate in the sense of Nikitin (2010) and
The limiting distribution of the statistic \(D_n^{(4)}\) is unknown as in the previous section.
The logarithmic large deviation asymptotics of the sequence of statistics \(D_n^{(4)}\) under \(H_0\) is given in the next theorem.
Theorem 7
For \(a>0\)
where the function \(f_D^{(4)}\) is continuous for sufficiently small \(a>0\); moreover
3.4 The local Bahadur efficiency of \(D_n^{(4)}\)
In Table 5 we collect the calculated efficiencies for the statistic \(D_n^{(4)}\) together with the results from Table 4 and the maximal values of the efficiencies against our alternatives.
We observe that the efficiencies of the Kolmogorov-type test are lower than those of the integral test. However, this is the usual situation in goodness-of-fit testing (Nikitin 1995; Rank 1999; Nikitin 2010).
3.5 Critical values
Tables 6 and 7 show the critical values of the null distribution of \(D_n^{(3)}\) and \(D_n ^{(4)}\) for the significance levels \(\alpha = 0.1, 0.05, 0.01\) and specific sample sizes n. Each entry is obtained by Monte-Carlo simulation with 10,000 replications.
4 Power comparison
We recall computation formulae for statistics \(I_n^{(k)}\) and \(D_n^{(k)}\) for \(k=\{3, 4\}\):
where \(\pi (i_1, \ldots , i_{k+1})\) means all permutations of different indices from \(\{i_1, \ldots ,i_{k+1}\}\).
This section presents the results of a Monte-Carlo study comparing the powers of the new tests with those of the Kolmogorov-Smirnov (KS) and Cramer-von Mises (CvM) tests widely applied for these types of hypotheses. The comparison is done for the sample size \(n=20\) and the significance level \(\alpha =0.05\). All calculations were done using JAVA (The Apache Commons Mathematics Library) and the R package with 10,000 replications. We consider the following distributions as alternatives to the Pareto distribution:
-
1)
the Gamma alternative \({\varGamma }(\theta )\) with the density \(({\varGamma }(\theta ))^{-1}x^{\theta -1}\exp (-x)\);
-
2)
the log-normal law \(LN(\theta )\) with the density \((\theta x)^{-1}(2\pi )^{-1/2}\exp (-(\log {x})^2/2\theta ^2)\);
-
3)
the first Ley–Paindaveine alternative \(LP1(\theta )\) with the d.f. \((1-\frac{1}{x})\exp (-\theta /x),\theta \ge 0, x \ge 1\);
-
4)
log-Weibull alternative with the d.f. \(1-e^{-(\ln {x})^{\theta +1}},\theta \in (0,1), x\ge 1\);
-
5)
the Weibull distribution \(W(\theta )\) with the density \(\theta x^{\theta -1}\exp (-x^{\theta })\).
The KS and CvM tests are not directly applicable to the composite hypothesis, so first we estimate the parameter \(\lambda \) by its maximum likelihood estimator (MLE) \(\hat{\lambda }=n(\sum _{k=1}^n \ln {X_k})^{-1}\), and then calculate the critical values of the corrected tests and their powers for the sample size \(n=20\) using the Monte-Carlo procedure. The powers are given in Table 8.
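The Monte-Carlo recipe just described can be sketched for the corrected KS test; the replication count and the alternative parameter below are ours, chosen for a quick illustration.

```python
# Sketch of the power computation for the KS test with estimated lambda:
# critical value from null simulations, then power against an alternative.
import math
import random

def ks_pareto(sample):
    # KS distance between the empirical d.f. and the fitted Pareto d.f.
    # 1 - x^(-lambda_hat), with lambda_hat the MLE n / sum(ln X_i).
    n = len(sample)
    lam = n / sum(math.log(x) for x in sample)
    xs = sorted(sample)
    d = 0.0
    for i, x in enumerate(xs):
        f = 1.0 - x ** (-lam)
        d = max(d, abs(f - i / n), abs(f - (i + 1) / n))
    return d

random.seed(5)
n, reps = 20, 500

# Critical value under H_0: standard Pareto samples, lambda re-estimated
# for each replication (the "corrected" test).
null = sorted(ks_pareto([1.0 / (1.0 - random.random()) for _ in range(n)])
              for _ in range(reps))
crit = null[int(0.95 * reps)]

# Empirical power against the log-Weibull alternative with theta = 0.5,
# simulated by inverse transform: X = exp((-ln(1 - U))^(1/(theta + 1))).
theta = 0.5
power = sum(
    ks_pareto([math.exp((-math.log(1.0 - random.random()))
                        ** (1.0 / (theta + 1))) for _ in range(n)]) > crit
    for _ in range(reps)
) / reps
print(crit, power)
```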
We observe that the powers of our tests correspond to the local Bahadur efficiencies for the alternatives considered. On the whole, our new statistics, in comparison with the classical tests, are more favorable to alternatives with density shapes similar to the Pareto distribution, such as the first Ley–Paindaveine alternative. However, they are less sensitive to close alternatives with different shapes (for example, when the density has some twists distinct from the Pareto density), such as the gamma and log-Weibull alternatives.
5 Application to the real data
In this section we apply our tests to a real data example from Hogg and Klugman (1984), where the data are discussed in detail. The data set represents the losses due to wind-related catastrophes in 1977, rounded to the nearest million dollars; only losses of more than $2 million are included:
These data have been widely analyzed in the literature, see Brazausskas and Serfling (2003) for details; new goodness-of-fit tests were proposed in Rizzo (2009). First we apply the same de-grouping method as in Brazausskas and Serfling (2003) and Rizzo (2009). This method is needed because the initial data were rounded; it turns the grouped discrete observations into approximately uniformly distributed data. Put
where (A, B) is a grouping interval containing m observations. Following Brazausskas and Serfling (2003), for example, for the observations recorded as 2 we take (A, B) to be the interval (1.5, 2.5).
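The de-grouping step can be sketched as follows. The exact formula is the one displayed above; the equally spaced variant below is one common deterministic choice and is our assumption, as are the function name and parameters.

```python
# Sketch of de-grouping: spread the m rounded observations of a grouping
# interval (A, B) over that interval at equally spaced points.
def degroup(value, m, half_width=0.5):
    a, b = value - half_width, value + half_width   # e.g. 2 -> (1.5, 2.5)
    return [a + (b - a) * (2 * j - 1) / (2 * m) for j in range(1, m + 1)]

ys = degroup(2, m=5)
print(ys)  # five equally spaced values inside (1.5, 2.5)
```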
We tested the null hypothesis \(H_0: X\) has the Pareto distribution with the scale parameter \(\sigma = 1.5\) and the MLE of the tail parameter \(\lambda = 0.764\). These parameter values were considered in Brazausskas and Serfling (2003), Hogg and Klugman (1984) and Rizzo (2009); Philbrick and Jurschak (1981) applied \(\sigma = 2.0\).
Applying our tests to these data, we get in the Table 9 the following p-values of test statistics \(I_n^{(k)}\) and \(D_n^{(k)}\), based on 10,000 simulations.
Thus we conclude that our tests do not reject the null hypothesis. This corresponds to the results of Brazausskas and Serfling (2003) and Rizzo (2009).
6 Conditions of the local asymptotic optimality
In this section we are interested in conditions of local asymptotic optimality (LAO) in the Bahadur sense for both sequences of statistics \(I_n^{(k)}\) and \(D_n^{(k)}\). This means describing the local structure of the alternatives for which the given statistic has maximal potential local efficiency, so that the relation
holds (Nikitin 1995; Nikitin and Tchirina 1996). Such alternatives form the domain of LAO for the given sequence of statistics.
Consider functions
We will assume that the following regularity conditions are true (see also Nikitin and Tchirina 1996):
Denote by \(\mathscr {G}\) the class of densities \(g(x,\theta )\) with d.f.’s \(G(x,\theta )\), satisfying the regularity conditions (11)–(12). We are going to deduce the LAO conditions in terms of the function h(x).
Recall that for alternative densities from \({\mathscr {G}}\) the following asymptotics is valid:
6.1 LAO conditions for \(I_n^{(k)}\)
First consider the integral statistic \(I_n^{(k)}\) with the kernel \({\varPsi }_k(X_1, \ldots , X_{k+1})\) and its projection \(\psi _k(x)\) from (4). Let us introduce the auxiliary function
Simple calculations show that
Hence the local asymptotic efficiency takes the form
By the Cauchy-Schwarz inequality, the expression on the right-hand side is equal to 1 iff \(h_0(x)=C_1\psi _k(x)\frac{1}{x^2}\) for some constant \(C_1>0\), so that
The set of distributions for which the function h(x) has such a form generates the domain of LAO in the class \(\mathscr {G}\). The simplest examples of such alternative densities \(g(x,\theta )\) for small \(\theta > 0\) are given in Table 10.
6.2 LAO conditions for \(D_n^{(k)}\)
Now consider the Kolmogorov type statistic \(D_n^{(k)}\) with the family of kernels \({\varXi }_k\) and their projections \(\xi _k(x;t)\) from (9). After simple calculations we get
Hence the local efficiency takes the form
We can once again apply the Cauchy-Schwarz inequality to the numerator of the last ratio. It follows that the sequence of statistics \(D_n^{(k)}\) is locally asymptotically optimal, and \(e^B (D_n^{(k)})=1\), iff
for some constants \(C_3>0\) and \(C_4\).
Distributions with such h(x) form the domain of LAO in the class \({\mathscr {G}}\). The simplest examples are given in Table 11.
7 Conclusion
We constructed two new goodness-of-fit tests for the Pareto distribution, based on a new characterization of this distribution. We described their limiting distributions and large deviation asymptotics. The local Bahadur efficiency for some alternatives has been obtained and turned out to be reasonably high. We also derived the conditions of local optimality for our tests. The tests were compared with some commonly used goodness-of-fit tests, and in most cases our tests are more powerful. They can be of use in statistical research, especially when the alternative is close to an alternative from the LAO class.
References
Ahsanullah M (1977) A characteristic property of the exponential distribution. Ann Stat 5(3):580–582
Ahsanullah M (1989) On characterizations of the uniform distribution based on functions of order statistics. Aligarh J Stat 9:1–6
Arnold BC (1983) Pareto distributions. International Co-operative Publishing House, Fairland, MD
Bahadur RR (1971) Some limit theorems in statistics. SIAM, Philadelphia
Beirlant J, de Wet T, Goegebeur Y (2006) A goodness-of-fit statistic for Pareto-type behaviour. J Comput Appl Math 186:99–116
Brazauskas V, Serfling R (2003) Favorable estimators for fitting Pareto models: a study using goodness-of-fit measures with actual data. ASTIN Bull 33(2):365–381
DasGupta A (2008) Asymptotic theory of statistics and probability. Springer, New York
Gulati S, Shapiro S (2008) Goodness of fit tests for the Pareto distribution. In: Vonta F, Nikulin M, Limnios N, Huber C (eds) Statistical models and methods for biomedical and technical systems. Birkhäuser, Boston, pp 263–277
Helmers R, Janssen P, Serfling R (1988) Glivenko–Cantelli properties of some generalized empirical DF’s and strong convergence of generalized L-statistics. Prob Theory Relat Fields 79:75–93
Hoeffding W (1948) A class of statistics with asymptotically normal distribution. Ann Math Stat 19:293–325
Hogg RV, Klugman SA (1984) Loss distributions. Wiley, New York
Janssen PL (1988) Generalized empirical distribution functions with statistical applications. Limburgs Universitair Centrum, Diepenbeek
Jovanovic M, Milosevic B, Obradovic M (2014) Goodness of fit tests for Pareto distribution based on a characterization and their asymptotics. arXiv:1310.5510. Accepted for publication in Statistics doi:10.1080/02331888.2014.919297
Kleiber C, Kotz S (2003) Statistical size distributions in economics and actuarial sciences. Wiley, Hoboken, NJ
Korolyuk VS, Borovskikh YV (1994) Theory of \(U\)-statistics. Kluwer, Dordrecht
Ley C, Paindaveine D (2008) Le Cam optimal tests for symmetry against Ferreira and Steel’s general skewed distribution. J Nonparametr Stat 21:943–967
Litvinova VV (2004) Asymptotic properties of goodness-of-fit and symmetry tests based on characterizations. Dissertation, Saint-Petersburg University
Martynov GV (2009) Cramér-von Mises test for the Weibull and Pareto distributions. In: Proceedings of the Dobrushin International Conference, Moscow, pp 117–122
Nikitin Y (1995) Asymptotic efficiency of nonparametric tests. Cambridge University Press, New York
Nikitin YY (1996) Bahadur efficiency of a test of exponentiality based on a loss of memory type functional equation. J Nonparametr Stat 6(1):13–26
Nikitin YY (2010) Large deviations of \(U\)-empirical Kolmogorov–Smirnov tests, and their efficiency. J Nonparametr Stat 22:649–668
Nikitin YY, Peaucelle I (2004) Efficiency and local optimality of distribution-free tests based on \(U\)- and \(V\)- statistics. Metron LXII:185–200
Nikitin YY, Ponikarov EV (1999) Rough large deviation asymptotics of Chernoff type for von Mises functionals and \(U\)-statistics. In: Proceedings of the St. Petersburg Math Soc, vol 7, pp 124–167. Engl transl (2001) AMS Transl, vol 2(203), pp 107–146
Nikitin YY, Tchirina AV (1996) Bahadur efficiency and local optimality of a test for the exponential distribution based on the Gini statistic. Stat Methods Appl 5:163–175
Nikitin YY, Volkova KY (2010) Asymptotic efficiency of exponentiality tests based on order statistics characterization. Georgian Math J 17:749–763
Nikitin YY, Volkova KY (2012) Asymptotic efficiency of goodness-of-fit test for power distribution based on Puri–Rubin characterization. Zapis Nauchnykh Semin POMI 408:115–130
Philbrick SW, Jurschak J (1981) Discussion of “Estimating casualty insurance loss amount distributions”. In: Proceedings of the Casualty Actuarial Society, LXVIII
Rank RF (1999) Statistische Anpassungstests und Wahrscheinlichkeiten grosser Abweichungen. Dissertation, University of Hannover
Rizzo ML (2009) New goodness-of-fit tests for Pareto distribution. ASTIN Bull 39(2):691–715
Silverman BW (1983) Convergence of a class of empirical distribution functions of dependent random variables. Ann Probab 11:745–751
Wieand HS (1976) A condition under which the Pitman and Bahadur approaches to efficiency coincide. Ann Stat 4:1003–1011
Acknowledgments
The authors express their deep gratitude to the Referees and the Associate Editor for their useful suggestions for the improvement of the paper.
Research supported by Grant RFBR No. 13-01-00172, Grant NSh No. 2504.2014.1 and by SPbGU Grant No. 6.38.672.2013.
Volkova, K. Goodness-of-fit tests for the Pareto distribution based on its characterization. Stat Methods Appl 25, 351–373 (2016). https://doi.org/10.1007/s10260-015-0330-y