Abstract
Stein-type identities are widely recognized for their utility and power in deriving shrinkage estimators that improve on crude estimators in the normal, gamma, Poisson, and negative binomial distributions. These identities also serve to characterize the distributions themselves, and they are used to establish normal approximations. Moreover, they are instrumental in constructing statistical tests to assess the goodness-of-fit for normality, exponentiality, and Poissonity of distributions. This article offers an instructive and comprehensive explanation of the applications of Stein-type identities in these contexts.
1 Introduction
Stein (1973, 1981) introduced the Stein identity, also known as the Stein equation, to derive unbiased estimators for risk functions of shrinkage estimators in the simultaneous estimation of means within normal distributions. This innovative method was employed to enhance the performance of unbiased estimators. The simplicity and potency of this technique have led to significant developments in the field of shrinkage estimation, as extensively documented in the literature. For a comprehensive exploration of this subject, refer to Stein (1981), Strawderman (1971), Shinozaki (1984), Berger (1985), Brandwein and Strawderman (1990), Robert (2007), and Fourdrinier et al. (2018). Komaki (2001) made an intriguing contribution by extending the Stein phenomenon to the prediction of predictive distributions. For further insights into this topic, see Ghosh et al. (2020).
It is crucial to note that Stein identities offer utility not only in the context of shrinkage estimation within decision-theoretic frameworks but also in normal approximations, such as the central limit theorem. Their relevance in normal approximation stems from the fact that Stein identities provide a characterization of the normal distribution. Consequently, these identities can be employed in constructing goodness-of-fit test statistics for normality. Hudson (1978) extended Stein-type identities to gamma, Poisson, and negative binomial distributions. Thus, the threefold applications of shrinkage estimation, distribution characterization, and goodness-of-fit testing can be extended to these alternative distributions.
In this paper, we offer an instructive exposition and review of these expanded applications of Stein identities. While many of the results presented herein are well-established in the literature, readers will appreciate the versatile utility of Stein identities in both statistical theory and practical applications.
In Sect. 2, we explain that the normal distribution is characterized by the Stein identity, or equivalently by a differential equation for the moment-generating function. Many characterizations of normal distributions have been given in the literature, some of which are summarized there. Two applications of the Stein identity are provided in Sect. 3. In particular, we construct a goodness-of-fit test statistic for normality based on the Stein identity and investigate its performance numerically.
Another important application of the Stein identity is the normal approximation of sums of independent random variables. An instructive explanation is provided in Sect. 4 based on Chen et al. (2011).
In Sect. 5, we describe that the gamma distribution is equivalent to the Stein-type identity or the differential equation of the moment-generating function. Some characterizations of gamma and exponential distributions and shrinkage estimation in decision-theoretic frameworks are summarized. A goodness-of-fit test statistic for exponentiality is constructed based on the Stein-type identity and the performance is investigated numerically.
In Sect. 6, we explain that the Poisson distribution is equivalent to the Stein-type identity or the differential equation of the moment-generating function. Shrinkage estimation and goodness-of-fit test for Poissonity are demonstrated. The Stein-type identity in a negative binomial distribution is briefly described. Some remarks and extensions are given in Sect. 7 as concluding remarks.
2 Stein identity and characterization of normal distributions
In normal distributions, Stein (1973, 1981) developed the so-called Stein identity, which is not only useful for calculating higher moments, but also powerful for deriving shrinkage estimators improving on the minimax estimator in the simultaneous estimation of normal means. The Stein identity also provides a characterization of the normal distribution, which means that the central limit theorem can be shown by using the Stein identity. For recent developments and reviews of the Stein identity, see Bellec and Zhang (2021), Chen (2021), Fathi et al. (2022) and Anastasiou (2023).
Theorem 2.1
Let X be a random variable with mean \(\textrm{E}[X]=\mu \) and variance \(\textrm{Var}(X)={\sigma }^2\). Then, the following four conditions are equivalent.
-
(a)
\(X\sim \mathcal{N}(\mu , {\sigma }^2)\).
-
(b)
For any differentiable function \(h(\cdot )\) with \(\textrm{E}[|(X-\mu )h(X)|]<\infty \) and \(\textrm{E}[|h'(X)|]<\infty \), it holds that
$$\begin{aligned} \textrm{E}[(X-\mu )h(X)]={\sigma }^2 \textrm{E}[h'(X)], \end{aligned}$$(2.1)where \(h'(x)\) is the derivative of h(x).
-
(c)
For any real constant t satisfying \(\textrm{E}[|X|e^{tX}]<\infty \), it holds that
$$\begin{aligned} \textrm{E}[(X-\mu )\exp \{tX\}]=t {\sigma }^2 \textrm{E}[\exp \{tX\}]. \end{aligned}$$(2.2) -
(d)
Let \(g(t)=\textrm{E}[e^{t(X-\mu )}]\). Then for any t in the interval \((-c,c)\) for positive constant c, g(t) satisfies the differential equation
$$\begin{aligned} {\textrm{d}\over \textrm{d}t}\log g(t)={g'(t)\over g(t)}={\sigma }^2t\quad \text {or}\quad {\textrm{d}^2\over \textrm{d}t^2}\log g(t)={g''(t)\over g(t)}-\Big ({g'(t)\over g(t)}\Big )^2={\sigma }^2, \end{aligned}$$(2.3)where \(g(0)=1\), \(g'(0)=0\) and \(g''(0)={\sigma }^2\).
Proof
The proof from (a) to (b) can be done by integration by parts, as seen in Stein (1981) and Fourdrinier et al. (2018). We here introduce another approach. Making the transformation \(Y=X-\mu \) gives
$$\begin{aligned} \textrm{E}[h(X)]=\textrm{E}[h(Y+\mu )]=\int _{-\infty }^\infty h(y+\mu ){1\over \sqrt{2\pi {\sigma }^2}}e^{-y^2/(2{\sigma }^2)}\textrm{d}y =\int _{-\infty }^\infty h(x){1\over \sqrt{2\pi {\sigma }^2}}e^{-(x-\mu )^2/(2{\sigma }^2)}\textrm{d}x. \end{aligned}$$
Differentiating both sides with respect to \(\mu \) and using Lebesgue’s dominated convergence theorem, we can demonstrate that
$$\begin{aligned} \textrm{E}[h'(Y+\mu )]={1\over {\sigma }^2}\textrm{E}[Yh(Y+\mu )], \end{aligned}$$
which is rewritten as in (2.1) by turning back with \(X=Y+\mu \).
Clearly, one gets (c) from (b). Also, it is easy to get (d) from (c). For the proof from (d) to (a), the solution of the differential equation in (2.3) is \(g(t)=\exp \{{\sigma }^2 t^2/2\}\), which implies that \(X-\mu \sim \mathcal{N}(0, {\sigma }^2)\). \(\square \)
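As a numerical sanity check (an illustration added here, not part of the original proof), the identity (2.1) can be verified by Monte Carlo simulation. The choice \(h(x)=\sin x\) is arbitrary; for this \(h\), both sides also admit the closed form \({\sigma }^2\cos (\mu )e^{-{\sigma }^2/2}\), which is used for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

# h(x) = sin(x), h'(x) = cos(x): both expectations in (2.1) exist
lhs = np.mean((x - mu) * np.sin(x))      # E[(X - mu) h(X)]
rhs = sigma**2 * np.mean(np.cos(x))      # sigma^2 E[h'(X)]

# closed form: E[cos X] = cos(mu) exp(-sigma^2 / 2) for X ~ N(mu, sigma^2)
exact = sigma**2 * np.cos(mu) * np.exp(-sigma**2 / 2)
print(lhs, rhs, exact)
```

Both Monte Carlo averages agree with each other and with the exact value up to simulation error.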
We briefly provide some conditions for characterizing normality. The study of characterizations of normality has had a long history as explained in Kagan et al. (1973) and Kotz (1974). For a good review of the book of Kagan et al. (1973), see Diaconis et al. (1977). In Theorem 2.2, sufficient condition (b) was given by Cramér (1936), and we provide a simple proof by using the Stein identity. Condition (d) was shown by Kac (1939), Bernstein (1941) and Lukacs (1942).
Theorem 2.2
Assume that independent random variables \(X_1\) and \(X_2\) are identically distributed with \(\textrm{E}[X_i]=\mu \) and \(\textrm{Var}(X_i)={\sigma }^2\) for \(i=1,2\). Then the following four conditions are equivalent.
-
(a)
\(X_i \sim \mathcal{N}(\mu , {\sigma }^2)\) for \(i=1, 2\).
-
(b)
\(X_1+X_2\sim \mathcal{N}(2\mu , 2{\sigma }^2)\).
-
(c)
For \(i=1,2\), the density function of \(X_i-\mu \) is symmetric, and \((X_1-X_2)^2/(2{\sigma }^2) \sim \chi _1^2\).
-
(d)
\(X_1+X_2\) and \(X_1-X_2\) are independent.
Proof
Since (b), (c) and (d) clearly follow from (a), it suffices to demonstrate the opposite directions. For the proof from (b) to (a), the condition in (b) and Theorem 2.1(c) imply that
From the independence of \(X_1\) and \(X_2\), we have
Since \(X_1\) and \(X_2\) have the same distribution, we have \(\textrm{E}[(X_1-\mu )\exp \{t X_1\}] \textrm{E}[\exp \{t X_2\}]= \textrm{E}[(X_2-\mu )\exp \{t X_2\}] \textrm{E}[\exp \{t X_1\}]\), so that we can see that for \(i=1, 2\),
which, from Theorem 2.1, shows that \(X_i\sim \mathcal{N}(\mu , {\sigma }^2)\).
For the proof from (c) to (b), let \(Y_i = X_i-\mu \) for simplicity. Since \((Y_1-Y_2)^2/(2{\sigma }^2)\sim \chi _1^2\), we have \(Y_1-Y_2\sim \mathcal{N}(0,2{\sigma }^2)\). From Theorem 2.1(c), it follows that
Note that \(Y_1\) and \(Y_2\) are independent and \(-Y_i\) has the same distribution as \(Y_i\). Thus, equality (2.5) can be rewritten as
which is identical to equality (2.4). Hence one gets (b).
Finally, we provide the proof from (d) to (a) along with the proof of Lukacs (1942). The independence of \(X_1+X_2\) and \(X_1-X_2\) is equivalent to the independence of \(Y_1+Y_2\) and \(Y_1-Y_2\) for \(Y_i=X_i-\mu \), which implies that
Letting \(g(t)=\textrm{E}[\exp \{tY_i\}]\), we can see that LHS of (2.6) is written as
On the other hand, RHS of (2.6) is written as \(\{g(s)\}^2g(t)g(-t)\), so that Eq. (2.6) is expressed as
equivalently rewritten as
Let \(\psi (t)=(d/dt)\log g(t)\). Differentiating the both sides in (2.7) with respect to s and t, we have
Note that \(\psi (0)=0\). Substituting \(s=0\) in (2.8) gives \(\psi (t)+\psi (-t)=0\), or \(\psi (-t)=- \psi (t)\), which is used to rewrite (2.9) as \(\psi (s+t)-\psi (s-t)=2\psi (t)\). Combining this equality and (2.8) gives
Equation (2.10) implies that \(\psi (t)\) is written as \(\psi (t) = c t\) for constant c, namely \((d/dt)\log g(t)=ct\). From Theorem 2.1(d), the solution is \(g(t)=\exp \{ct^2/2\}\). Since \(g''(0)={\sigma }^2\), we have \(c={\sigma }^2\). Thus, \(Y_i=X_i-\mu \sim \mathcal{N}(0, {\sigma }^2)\). \(\square \)
Normality can also be characterized through a random sample. Conditions (b), (e) and (f) in Theorem 2.3 were derived by Kagan et al. (1965), Lukacs (1942) and Ruben (1974), respectively.
Theorem 2.3
Let \(X_1, \ldots , X_n\) be a random sample from a population with \(\textrm{E}[X_i]=\mu \) and \(\textrm{Var}(X_i)={\sigma }^2\). Let \({{\overline{X}}}=n^{-1}\sum _{i=1}^n X_i\) and \(S^2=n^{-1}\sum _{i=1}^n(X_i-{{\overline{X}}})^2\). Then, the following six conditions are equivalent.
-
(a)
\(X_i \sim \mathcal{N}(\mu , {\sigma }^2)\) for \(i=1, \ldots , n\).
-
(b)
\(\textrm{E}[{{\overline{X}}}| X_1-{{\overline{X}}}, \ldots , X_n-{{\overline{X}}}]=\mu \) for \(n\ge 3\).
-
(c)
\({{\overline{X}}}\) and \((X_1-{{\overline{X}}}, \ldots , X_n-{{\overline{X}}})\) are independent.
-
(d)
\({{\overline{X}}}\sim \mathcal{N}(\mu , {\sigma }^2/n)\).
-
(e)
\({{\overline{X}}}\) and \(S^2\) are independent.
-
(f)
For \(i=1, \ldots , n\), the density function of \(X_i-\mu \) is symmetric, and \(nS^2/{\sigma }^2\sim \chi _{n-1}^2\).
Proof
Using well-known properties of a normal distribution, one gets (b), (c), (d), (e) and (f) from (a). For the proof from (b) to (a), let \(Z_i=(X_i-\mu )/{\sigma }\) for simplicity. Then, (b) is rewritten as \(\textrm{E}[{{\overline{Z}}}| Z_1-{{\overline{Z}}}, \ldots , Z_n-{{\overline{Z}}}]=0\) for \({{\overline{Z}}}=n^{-1}\sum _{i=1}^n Z_i\). This implies that
Let \(s_i=t_i-{{\overline{t}}}\) for \({{\overline{t}}}=n^{-1}\sum _{i=1}^n t_i\). Then the above equality is expressed as
equivalently rewritten as
for \(\psi (t)=(d/dt)\log \textrm{E}[\exp \{t Z_i\}]\). Since \(\sum _{i=1}^n s_i=0\), we have \(s_n = - \sum _{i=1}^{n-1}s_i\). Thus,
Substituting \(s_2=\cdots =s_{n-1}=0\) gives \(\psi (s_1)=-\psi (-s_1)\). Hence, the above equality is expressed as
Since this equality holds for \(n\ge 3\), we can see that the solution is \(\psi (t)=ct\). Since \(\textrm{Var}(Z_i)=1\), we have \(c=1\). Thus, from Theorem 2.1(d), it follows that \(Z_i \sim \mathcal{N}(0, 1)\).
For the proof from (c) to (a), the case of \(n=2\) follows from Theorem 2.2(d). When \(n\ge 3\), the independence between \({{\overline{X}}}\) and \((X_1-{{\overline{X}}}, \ldots , X_n-{{\overline{X}}})\) implies that
which results in (b) and leads to (a).
The proof from (d) to (a) can be done by using the same arguments as in the proof from (b) to (a) in Theorem 2.2.
For the proof from (e) to (a), we provide the proof given by Lukacs (1942). Let \(Y_i=X_i-\mu \) and \({{\overline{Y}}}=n^{-1}\sum _{i=1}^n Y_i\). Note that \(\sum _{i=1}^n(X_i-{{\overline{X}}})^2=\sum _{i=1}^n (Y_i-{{\overline{Y}}})^2\) and \({{\overline{X}}}={{\overline{Y}}}+\mu \). Since \(\sum _{i=1}^n (Y_i-{{\overline{Y}}})^2\) and \({{\overline{Y}}}\) are independent, we have
For \(g(s)=\textrm{E}[e^{sY_i}]\), we can see that \(\textrm{E}[e^{s\sum _{i=1}^n Y_i}]= \{ g(s)\}^n\). Differentiating (2.11) with respect to t and putting \(t=0\) gives
Noting that
we can express (2.12) as
Since \(Y_1, \ldots , Y_n\) are independently and identically distributed, the terms in LHS of (2.13) are evaluated as
Substituting these quantities into (2.13) yields
Thus, from Theorem 2.1(d), it follows that \(X_i-\mu \sim \mathcal{N}(0, {\sigma }^2)\), and one gets (a). For the proof from (f) to (a), see Ruben (1974). \(\square \)
3 Applications to shrinkage estimation and goodness-of-fit test
We now provide two applications of the Stein identity to shrinkage estimation and goodness-of-fit tests for normality.
3.1 Shrinkage estimation
The Stein identity is very powerful for deriving unbiased risk estimators of shrinkage estimators. Let \(X_1, \ldots , X_p\) be independent random variables such that \(X_i \sim \mathcal{N}({\theta }_i, {\sigma }^2)\), \(i=1, \ldots , p\). Consider the problem of estimating \({{\varvec{\theta }}}=({\theta }_1, \ldots , {\theta }_p)^\top \) simultaneously for known \({\sigma }^2\). When estimator \({{\widehat{{{\varvec{\theta }}}}}}\) is evaluated with the risk function relative to the quadratic loss \(\Vert {{\widehat{{{\varvec{\theta }}}}}}-{{\varvec{\theta }}}\Vert ^2/{\sigma }^2=({{\widehat{{{\varvec{\theta }}}}}}-{{\varvec{\theta }}})^\top ({{\widehat{{{\varvec{\theta }}}}}}-{{\varvec{\theta }}})/{\sigma }^2\), Stein (1956) established the inadmissibility of \({{\varvec{X}}}=(X_1, \ldots , X_p)^\top \) in the case of \(p\ge 3\), and James and Stein (1961) suggested the shrinkage estimator
$$\begin{aligned} {{\widehat{{{\varvec{\theta }}}}}}^{\textrm{JS}}=\Big (1-{(p-2){\sigma }^2\over \Vert {{\varvec{X}}}\Vert ^2}\Big ){{\varvec{X}}}. \end{aligned}$$
The improvement over \({{\varvec{X}}}\) was originally proved using somewhat complicated properties of the noncentral chi-square distribution. Stein (1973) provided a new technique based on the Stein identity for the proof. Relying only on a simple integration by parts, the Stein identity has enabled innovative results and important contributions to this research area. The Stein identity was extended to identities for the chi-square and Wishart distributions, and those identities were unified by Konno (2009), which enables us to handle high-dimensional cases. For some developments and extensions, see Berger (1985), Brandwein and Strawderman (1990), Fourdrinier et al. (2018), Ghosh et al. (2020), Tsukuma and Kubokawa (2020), Maruyama et al. (2023) and the references therein.
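The risk gain of the James–Stein estimator over \({{\varvec{X}}}\) can be sketched by a minimal Monte Carlo experiment (our illustration, assuming the familiar known-variance form \((1-(p-2){\sigma }^2/\Vert {{\varvec{X}}}\Vert ^2){{\varvec{X}}}\)), taken at \({{\varvec{\theta }}}={{\varvec{0}}}\), where shrinkage helps most:

```python
import numpy as np

rng = np.random.default_rng(1)
p, sigma2, n_rep = 6, 1.0, 20_000
theta = np.zeros(p)                      # shrinkage gain is largest at theta = 0

X = rng.normal(theta, np.sqrt(sigma2), size=(n_rep, p))
W = np.sum(X**2, axis=1) / sigma2        # W = ||X||^2 / sigma^2

js = (1.0 - (p - 2) / W)[:, None] * X    # James-Stein estimator
risk_mle = np.mean(np.sum((X - theta)**2, axis=1)) / sigma2   # near p
risk_js = np.mean(np.sum((js - theta)**2, axis=1)) / sigma2   # near 2 at theta = 0
print(risk_mle, risk_js)
```

At \({{\varvec{\theta }}}={{\varvec{0}}}\) the theoretical risks are \(p\) for \({{\varvec{X}}}\) and \(p-(p-2)^2\textrm{E}[1/W]=2\) for the James–Stein estimator with \(p=6\), which the simulation reproduces.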
Theorem 3.1
Let \({{\varvec{h}}}({{\varvec{X}}})=(h_1({{\varvec{X}}}), \ldots , h_p({{\varvec{X}}}))^\top \), where \(h_i({{\varvec{X}}})\) is differentiable and satisfies \(\textrm{E}[|(X_i-{\theta }_i)h_i({{\varvec{X}}})|]<\infty \) and \(\textrm{E}[|(\partial /\partial X_i)h_i({{\varvec{X}}})|]<\infty \). Then, the shrinkage estimator \({{\widehat{{{\varvec{\theta }}}}}}_{{\varvec{h}}}={{\varvec{X}}}-{{\varvec{h}}}({{\varvec{X}}})\) has the unbiased risk estimator
$$\begin{aligned} p-2{{\varvec{\nabla }}}^\top {{\varvec{h}}}({{\varvec{X}}})+\Vert {{\varvec{h}}}({{\varvec{X}}})\Vert ^2/{\sigma }^2, \end{aligned}$$(3.1)
where \({{\varvec{\nabla }}}=(\partial /\partial X_1, \ldots , \partial /\partial X_p)^\top \). In particular, \({{\widehat{{{\varvec{\theta }}}}}}_\phi ={{\varvec{X}}}-W^{-1}\phi (W){{\varvec{X}}}\) for \(W=\Vert {{\varvec{X}}}\Vert ^2/{\sigma }^2\) has the unbiased risk estimator
$$\begin{aligned} p-{2(p-2)\phi (W)-\{\phi (W)\}^2\over W}-4\phi '(W). \end{aligned}$$(3.2)
Proof
The risk function of \({{\widehat{{{\varvec{\theta }}}}}}_{{\varvec{h}}}\) is \(R({{\varvec{\theta }}}, {{\widehat{{{\varvec{\theta }}}}}}_{{\varvec{h}}})=p-2\textrm{E}[({{\varvec{X}}}-{{\varvec{\theta }}})^\top {{\varvec{h}}}({{\varvec{X}}})]/{\sigma }^2+\textrm{E}[\{{{\varvec{h}}}({{\varvec{X}}})\}^\top {{\varvec{h}}}({{\varvec{X}}})]/{\sigma }^2\), and the Stein identity gives
$$\begin{aligned} \textrm{E}[({{\varvec{X}}}-{{\varvec{\theta }}})^\top {{\varvec{h}}}({{\varvec{X}}})]={\sigma }^2\sum _{i=1}^p\textrm{E}\Big [{\partial \over \partial X_i}h_i({{\varvec{X}}})\Big ]={\sigma }^2\textrm{E}[{{\varvec{\nabla }}}^\top {{\varvec{h}}}({{\varvec{X}}})]. \end{aligned}$$
This provides the unbiased estimator of the risk function given in (3.1), and (3.2) can be derived from (3.1) by direct calculation. \(\square \)
From (3.1) or (3.2), we can derive conditions on \({{\varvec{h}}}(\cdot )\) or \(\phi (\cdot )\) for improvement over \({{\varvec{X}}}\). For example, Baranchik’s (1970) condition is (a) \(\phi (w)\) is nondecreasing and (b) \(0\le \phi (w)\le 2(p-2)\).
The James–Stein estimator \({{\widehat{{{\varvec{\theta }}}}}}^{\textrm{JS}}\) corresponds to the case of \(\phi (w)=p-2\), and the unbiased risk estimator suggests the equation
$$\begin{aligned} \textrm{E}[\Vert {{\varvec{X}}}-{{\varvec{\theta }}}\Vert ^2]=\textrm{E}[\Vert {{\widehat{{{\varvec{\theta }}}}}}^{\textrm{JS}}-{{\varvec{\theta }}}\Vert ^2]+\textrm{E}[\Vert {{\varvec{X}}}-{{\widehat{{{\varvec{\theta }}}}}}^{\textrm{JS}}\Vert ^2], \end{aligned}$$
which is interpreted as the Pythagorean triangle among \({{\varvec{X}}}\), \({{\varvec{\theta }}}\) and \({{\widehat{{{\varvec{\theta }}}}}}^{\textrm{JS}}\). Kubokawa (1994) constructed a class of estimators improving on the James–Stein estimator. See also Kubokawa (1991).
Theorem 3.2
The estimator \({{\widehat{{{\varvec{\theta }}}}}}_\phi ={{\varvec{X}}}-W^{-1}\phi (W){{\varvec{X}}}\) improves on the James–Stein estimator if (a) \(\phi (w)\) is nondecreasing, and (b) \(\lim _{w\rightarrow \infty }\phi (w)=p-2\) and
Proof
The risk difference is \({\Delta }=R({{\varvec{\theta }}}, {{\widehat{{{\varvec{\theta }}}}}}^{\textrm{JS}})-R({{\varvec{\theta }}}, {{\widehat{{{\varvec{\theta }}}}}}_\phi )=-\textrm{E}[ \{\phi (W)- (p-2)\}^2/W] + 4 \textrm{E}[\phi '(W)]\), and from condition (b), it is noted that
so that, after making the transformation, the first term is written as
where \(f_p(w, {\lambda })\) denotes the density function of noncentral chi-square distribution with p degrees of freedom and noncentrality \({\lambda }=\Vert {{\varvec{\theta }}}\Vert ^2/{\sigma }^2\). Thus,
Since \(\phi '(w)\ge 0\) from condition (a), we have \({\Delta }\ge 0\) if \(\phi (w)\) satisfies \(\phi (w)\ge \phi _{\lambda }(w)\), where
We here show that \(\phi _0(w)\ge \phi _{\lambda }(w)\), which is written as
or
Since the noncentral chi-squared distribution can be expressed as a mixture of Poisson and central chi-squared distributions, it is noted that
for \(P_{\lambda }(k)=({\lambda }/2)^k e^{-{\lambda }/2}/k!\). Hence,
is increasing in y, so that for \(w>y\),
which implies that \(\phi _0(w)\ge \phi _{\lambda }(w)\). Using integration by parts, we can see that
which is given in (3.3). Hence, it is proved that \({\Delta }\ge 0\) under the condition \(\phi (w)\ge \phi _0(w)\). \(\square \)
It is interesting to note that the estimator \({{\widehat{{{\varvec{\theta }}}}}}^{\textrm{GB}}={{\widehat{{{\varvec{\theta }}}}}}_{\phi _0}\) with \(\phi _0(W)\) is the generalized Bayes estimator against the prior distribution \(\pi ({{\varvec{\theta }}})=\Vert {{\varvec{\theta }}}\Vert ^{2-p}\). Since \(\phi _0(w)\) satisfies the above conditions (a) and (b), the generalized Bayes estimator \({{\widehat{{{\varvec{\theta }}}}}}_{\phi _0}\) improves on the James–Stein estimator. It is also interesting to note that the prior distribution \(\pi ({{\varvec{\theta }}})=\Vert {{\varvec{\theta }}}\Vert ^{2-p}\) is a harmonic function, namely \({{\varvec{\nabla }}}^\top {{\varvec{\nabla }}}\pi ({{\varvec{\theta }}})=\sum _{i=1}^p({\partial ^2/\partial {\theta }_i^2})\Vert {{\varvec{\theta }}}\Vert ^{2-p}=0\). The positive-part Stein estimator \({{\widehat{{{\varvec{\theta }}}}}}^{\mathrm{S+}}=\max \{1-(p-2)/W, 0\}{{\varvec{X}}}\) also satisfies conditions (a) and (b).
The risk performances of the shrinkage estimators \({{\widehat{{{\varvec{\theta }}}}}}^{\textrm{JS}}\), \({{\widehat{{{\varvec{\theta }}}}}}^{\mathrm{S+}}\) and \({{\widehat{{{\varvec{\theta }}}}}}^{\textrm{GB}}\), denoted by JS, PS and GB, respectively, are investigated by simulation with 10,000 replications, and the average values of the risks are reported in Table 1 for \(p=6\), \({\sigma }^2=1\) and \({{\varvec{\theta }}}=(k/3){{\varvec{I}}}\), \(k=0, \ldots , 9\). As distributions of the \(X_i\)'s, we treat the normal, double exponential and t-distributions with 5 degrees of freedom. From Table 1, it is seen that the risk improvement of the three shrinkage estimators is robust under the t- and double exponential distributions.
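A simplified version of such a simulation can be sketched as follows (our Python illustration, not the code behind Table 1); it compares the James–Stein and positive-part Stein estimators at \({{\varvec{\theta }}}={{\varvec{0}}}\), where the positive-part correction matters most:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n_rep = 6, 20_000

def risks(theta):
    """Monte Carlo risks of JS and positive-part Stein estimators (sigma^2 = 1)."""
    X = rng.normal(theta, 1.0, size=(n_rep, p))
    W = np.sum(X**2, axis=1)
    shrink = 1.0 - (p - 2) / W
    js = shrink[:, None] * X                       # James-Stein
    ps = np.maximum(shrink, 0.0)[:, None] * X      # positive-part Stein
    return (np.mean(np.sum((js - theta)**2, axis=1)),
            np.mean(np.sum((ps - theta)**2, axis=1)))

r_js, r_ps = risks(np.zeros(p))
print(r_js, r_ps)   # the positive-part estimator has the smaller risk
```

The same function can be re-run over a grid of \({{\varvec{\theta }}}\) values to trace out risk curves in the manner of Table 1.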
3.2 Goodness-of-fit tests for normality
Goodness-of-fit tests for normality have been studied in many articles. For references and explanations including omnibus test procedures, see Madansky (1988) and Thode (2002). The idea of using the Stein identity for testing normality is interesting and reasonable, because the Stein identity characterizes normal distributions. Henze and Visagie (2020) and Betsch and Ebner (2020) constructed test statistics based on the Stein identity.
Let \(X_1, \ldots , X_n\) be a random sample from a population with distribution function \(F(\cdot )\), where the mean and variance are denoted by \(\textrm{E}[X_i]=\mu \) and \(\textrm{Var}(X_i)={\sigma }^2\). The problem is to test the normality of the underlying distribution under the null hypothesis \(H_0: F=\mathcal{N}(\mu , {\sigma }^2)\). From Theorem 2.1, the characterization of a normal distribution of random variable X is
and the sample counterpart of the LHS is expressed by \(w_t/\sqrt{n}\), where
for \(Y_i=(X_i-{{\overline{X}}})/S\) and \(S^2=n^{-1}\sum _{i=1}^n(X_i-{{\overline{X}}})^2\). It is noted that \(w_t\) is invariant under the transformation of location and scale. Then, the normality can be tested based on \(\textrm{ST}_t=|w_t|\). Since it depends on t, however, it is better to take a weighted \(L^2\) distance and integrate over t. Henze and Visagie (2020) considered the test statistic \(\int _{-\infty }^\infty w_t^2 K(t)\textrm{d}t\) for a weight function K(t) and suggested the use of \(K(t)=e^{-{\gamma }t^2}\) for positive \({\gamma }\). The resulting test statistic is
where \(A_{ij}=Y_i+Y_j\). Taking \(K(t)=1\) for \(-c<t<c\) and otherwise \(K(t)=0\) with positive constant c, one gets another test statistic
Based on \(w_t\), we also suggest the test statistic
for positive constant c.
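These statistics can be sketched numerically as follows (our Python illustration; we assume the sample Stein discrepancy \(w_t=n^{-1/2}\sum _{i=1}^n(Y_i-t)e^{tY_i}\) suggested by the characterization (2.2), and approximate \(\textrm{MST}_c\) and \(\textrm{IST}_c\) on a grid of t values):

```python
import numpy as np

def w_stat(x, t_grid):
    """Empirical Stein discrepancy on a grid of t values.
    Assumes w_t = n^{-1/2} sum_i (Y_i - t) exp(t Y_i) for studentized Y_i."""
    y = (x - x.mean()) / x.std()          # location-scale invariant residuals
    n = len(y)
    return np.sum((y[:, None] - t_grid) * np.exp(np.outer(y, t_grid)),
                  axis=0) / np.sqrt(n)

rng = np.random.default_rng(3)
t_grid = np.linspace(-0.5, 0.5, 101)      # c = 0.5; the grid includes t = 0
dt = t_grid[1] - t_grid[0]

w_norm = w_stat(rng.normal(size=500), t_grid)
w_exp = w_stat(rng.exponential(size=500), t_grid)   # skewed alternative

mst_norm, mst_exp = np.max(np.abs(w_norm)), np.max(np.abs(w_exp))  # sup-type
ist_norm, ist_exp = np.sum(w_norm**2) * dt, np.sum(w_exp**2) * dt  # integrated
print(mst_norm, mst_exp, ist_norm, ist_exp)
```

Since \(\sum _i Y_i=0\), the statistic vanishes exactly at \(t=0\); for the skewed exponential sample, the drift \(\sqrt{n}h_0(t)\) makes both statistics much larger than under normality.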
The following lemma is helpful for investigating asymptotic properties of these test statistics.
Lemma 3.1
Let \(Z_i=(X_i-\mu )/{\sigma }\) for \(i=1, \ldots , n\). For \(g(t)=\textrm{E}[e^{tZ_1}]\), let \(h_0(t)=g'(t)-tg(t)\), \(h_1(t)=tg'(t)+(1-t^2)g(t)\) and \(h_2(t)=tg''(t)+(1-t^2)g'(t)\). Assume that \(\textrm{E}[Z_1^2e^{t Z_1}]<\infty \) for t around zero. Then, \(w_t\) is approximated as \(w_t=W_n(t)+\sqrt{n}h_0(t)+o_p(1)\), where
Proof
Let \({{\overline{Z}}}=n^{-1}\sum _{i=1}^n Z_i\). Note that \(\{1+(S/{\sigma }-1)\}^{-1}=1-(S/{\sigma }-1) +O_p(n^{-1})\) and \(S/{\sigma }-1=\sqrt{S^2/{\sigma }^2}-1=2^{-1}(S^2/{\sigma }^2-1)+O_p(n^{-1})\). Then, \((X_i-{{\overline{X}}})/S\) is approximated as
Using this approximation, we evaluate \(w_t\) as
Since \(e^x=1+x + O(x^2)\) and \({{\overline{Z}}}=O_p(1/\sqrt{n})\), \(w_t\) is approximated as
which can be rewritten as
Each term can be evaluated as
Thus, it can be verified that \(w_t\) is approximated as \(w_t=W_n(t)+\sqrt{n}h_0(t)+o_p(1)\), which proves Lemma 3.1. \(\square \)
From (3.5), the central limit theorem shows that \(W_n(t)\) is asymptotically distributed as the normal distribution with mean zero and variance \(\textrm{Var}(W_n(t))\) under the assumption of \(\textrm{E}[Z_1^4]<\infty \), where the variance can be evaluated as
Note that \(h_1(t)=g(t)+t h_0(t)\) and \(h_2(t)=2t g(t)+h_0(t)+t h_0'(t)\). Under the normality hypothesis \(H_0\), we have \(h_0(t)=0\), \(h_1(t)=g(t)\), \(h_2(t)=2tg(t)\) and \(\textrm{E}[(Z_1-t)^2e^{2tZ_1}]=(1+t^2)e^{2t^2}\). Thus, the asymptotic variance of \(w_t\) under the normality is \(\textrm{Var}(W_n(t)) =V(t)^2+o(1)\), where
Henze and Visagie (2020) showed the consistency of \(\textrm{HV}_{\gamma }\), namely \(\textrm{P}_F(\textrm{HV}_{\gamma }>d)\rightarrow 1\) as \(n\rightarrow \infty \) for any positive constant d and any non-normal distribution F. Using Lemma 3.1, we can verify the consistency of \(\textrm{IST}_c\) and \(\textrm{MST}_c\).
Theorem 3.3
Assume that \(\textrm{E}[Z_1^4]<\infty \). Then, the test statistics \(\textrm{IST}_c\) and \( \textrm{MST}_c\) are consistent.
Proof
Concerning the consistency of \(\textrm{MST}_c\), it can be observed that for all t in the interval \((-c, c)\),
Note that \(W_n(t)\) converges in distribution to the normal distribution. When F is not a normal distribution, from Theorem 2.1, there is some \(t_0\) in \((-c, c)\) such that \(h_0(t_0)\not = 0\). Hence, \(\textrm{P}_F\{W_n(t_0)< -\sqrt{n}h_0(t_0)-d+o_p(1)\}\rightarrow 1\) when \(h_0(t_0)<0\), and \(\textrm{P}_F\{W_n(t_0)> -\sqrt{n}h_0(t_0)+d+o_p(1)\}\rightarrow 1\) when \(h_0(t_0)>0\). This shows the consistency of \(\textrm{MST}_c\).
For \(\textrm{IST}_c\), it is observed that for large n,
From (3.5), we have \(\int _{-c}^c W_n(t)h_0(t)\textrm{d}t=n^{-1/2}\sum _{i=1}^n Y_i^*\) for
Since \(Y_1^*, \ldots , Y_n^*\) are independently and identically distributed with zero mean and a finite variance, it can be seen that \(\textrm{E}[\{\int _{-c}^c W_n(t)h_0(t)\textrm{d}t\}^2]\) converges to a positive constant. Thus, it is concluded that \(\textrm{P}_F\Big (\int _{-c}^c w_t^2 \textrm{d}t>d\Big )\rightarrow 1\). \(\square \)
We investigate the power performance of the test statistics \(\textrm{HV}_{\gamma }\) with \({\gamma }=3\) given in Henze and Visagie (2020), \(\textrm{IST}_c\) with \(c=1\) and \(\textrm{MST}_c\) with \(c=1\). We also treat the test statistic \(\textrm{ST}_t=w_t/V(t)\) with \(t=0.5\), which converges in distribution to \(\mathcal{N}(0,1)\) under \(H_0\). From the proof of Theorem 3.3, this test can be seen to be consistent in the sense that \(\textrm{P}_F(\textrm{ST}_{t_0}>d)\rightarrow 1\) for distributions with \(h_0(t_0)\not = 0\) at \(t_0=0.5\). As another competitor, we add the test statistic suggested by De Wet and Venter (1972), who modified the Shapiro–Francia (1972) and Shapiro–Wilk (1965) test statistics, as
where \(X_{(1)}\le \cdots \le X_{(n)}\) are order statistics of the \(X_i\)'s. The idea of this test is simple but powerful. When the data are normally distributed, the points \((x_{(i)}, a_i)\) lie on or near a straight line, so that \(\textrm{DWV}\) is close to one. Thus, \(H_0\) is rejected when \(\textrm{DWV}\) is smaller than a critical value.
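A correlation-type statistic in this spirit can be sketched as follows (our illustration; we take the normal scores \(a_i=\Phi ^{-1}(i/(n+1))\) as a simple choice, and the exact De Wet–Venter weights may differ):

```python
import numpy as np
from statistics import NormalDist

def dwv_like(x):
    """Squared correlation between the order statistics and the normal scores
    a_i = Phi^{-1}(i/(n+1)); close to one under normality."""
    n = len(x)
    a = np.array([NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)])
    return np.corrcoef(np.sort(x), a)[0, 1] ** 2

rng = np.random.default_rng(4)
d_norm = dwv_like(rng.normal(size=200))
d_exp = dwv_like(rng.exponential(size=200))   # skewed: statistic drops below one
print(d_norm, d_exp)
```

The normal sample yields a value near one, while the skewed exponential sample yields a clearly smaller value, mirroring how \(\textrm{DWV}\) separates \(H_0\) from alternatives.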
The powers of the five test statistics \(\textrm{ST}_{0.5}\), \(\textrm{HV}_3\), \(\textrm{IST}_1\), \(\textrm{MST}_1\) and \(\textrm{DWV}\) are investigated by simulation, where their critical values are adjusted so that their type I errors are \({\alpha }=5\%\). We consider the three alternative distributions
where \(\mathcal{N}(0,1)\), Ex(1) and \(DE(0,{\sigma })\) denote the standard normal distribution, the exponential distribution with unit rate and the double exponential distribution with scale parameter \({\sigma }\), respectively.
The values of the powers for \(w=0.2, 0.5, 0.8, 1.0\) and \(n=50\) are obtained from 10,000 simulation replications using Ox (Doornik 2007) and reported in Table 2. The model \(M_1\) is skewed, and the tests \(\textrm{ST}_{0.5}\) and \(\textrm{DWV}\) are more powerful, while \(\textrm{MST}_1\) is less powerful. The model \(M_2\) has heavy kurtosis, and the test \(\textrm{DWV}\) is more powerful, while \(\textrm{ST}_{0.5}\) is less powerful. In the model \(M_3\), with mixed distributions as an alternative, the test \(\textrm{DWV}\) performs well, and the other four tests perform similarly. From Table 2, it is seen that the performances of \(\textrm{HV}_3\) and \(\textrm{IST}_1\) are not bad, but such moment-based tests are less powerful than the quantile-based \(\textrm{DWV}\).
4 Stein’s methods for normal approximations
An important application of the Stein identity is the normal approximation. This approach is called Stein's method, and it has been studied in the literature including Ho and Chen (1978), Stein (1986), Goldstein and Reinert (1997), Shorack (2000), Barbour and Chen (2005), Diaconis and Holmes (2004), Chen and Shao (2005), Chen et al. (2011), Chen et al. (2013), Lehmann and Romano (2022) and the references therein. Of these, Chen et al. (2011) gives a good exposition of Stein's method. For instructive purposes, we here provide a simple introduction based on Chen et al. (2011).
4.1 A basic concept and a simple method
Let \(X_1, \ldots , X_n\) be independent random variables with \(\textrm{E}[X_i]=0\) and \(\textrm{Var}(X_i)=1\). Let
$$\begin{aligned} X=\sum _{i=1}^n \xi _i,\quad \xi _i={X_i\over \sqrt{n}},\quad X^{(i)}=X-\xi _i. \end{aligned}$$
Note that \(X^{(i)}\) is independent of \(\xi _i\). Let Y be a random variable having the \(\mathcal{N}(0,1)\) distribution. Then, we want to show that for any real z,
$$\begin{aligned} \textrm{P}(X\le z)-\textrm{P}(Y\le z)=\textrm{P}(X\le z)-\Phi (z)\rightarrow 0 \end{aligned}$$
as \(n\rightarrow \infty \).
For any nonnegative function \(h(\cdot )\) satisfying \(\textrm{E}[h(X)]<\infty \), in general, the solution of the equation
$$\begin{aligned} h(x)-\mu _h=f'(x)-x f(x) \end{aligned}$$(4.1)
is given by
$$\begin{aligned} f_h(x)={1\over \phi (x)}\int _{-\infty }^x \{h(t)-\mu _h\}\phi (t)\textrm{d}t, \end{aligned}$$(4.2)
where \(\mu _h=\textrm{E}[h(Y)]\) and \(\phi (x)\) is the probability density function of \(\mathcal{N}(0,1)\). In particular, for \(h_z(x)=I(x\le z)-\Phi (z)\), the solution of the equation \(I(x\le z)-\Phi (z) = f_z'(x)-x f_z(x)\) is written as
$$\begin{aligned} f_z(x)={\left\{ \begin{array}{ll} \sqrt{2\pi }e^{x^2/2}\Phi (x){{\overline{\Phi }}}(z), &{} x\le z,\\ \sqrt{2\pi }e^{x^2/2}\Phi (z){{\overline{\Phi }}}(x), &{} x>z, \end{array}\right. } \end{aligned}$$(4.3)
where \({{\overline{\Phi }}}(x)=1-\Phi (x)\). Then, we get the equality
$$\begin{aligned} \textrm{P}(X\le z)-\Phi (z)=\textrm{E}[f_z'(X)-X f_z(X)]. \end{aligned}$$(4.4)
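The closed form of \(f_z\) can be checked numerically (our illustration, using the standard solution \(f_z(x)=\sqrt{2\pi }e^{x^2/2}\Phi (\min (x,z))\{1-\Phi (\max (x,z))\}\)): the finite-difference residual of the Stein equation vanishes away from the kink at \(x=z\).

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def f_z(x, z):
    """Standard solution of the Stein equation f'(x) - x f(x) = I(x<=z) - Phi(z)."""
    lo, hi = min(x, z), max(x, z)
    return math.sqrt(2 * math.pi) * math.exp(x * x / 2) * Phi(lo) * (1 - Phi(hi))

def residual(x, z, h=1e-5):
    """Finite-difference check of the Stein equation away from the kink x = z."""
    fprime = (f_z(x + h, z) - f_z(x - h, z)) / (2 * h)
    indicator = 1.0 if x <= z else 0.0
    return fprime - x * f_z(x, z) - (indicator - Phi(z))

for x in (-1.5, -0.2, 0.8, 1.7):
    print(x, residual(x, 0.3))
```

All residuals are numerically zero, confirming that this \(f_z\) solves the Stein equation on both sides of z.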
For the central limit theorem (CLT), it is sufficient to show that \(\lim _{n\rightarrow \infty }\textrm{E}[f_z'(X)-X f_z(X)]=0\). It is noted that the RHS of (4.4) is exactly zero from the Stein identity if \(X\sim \mathcal{N}(0,1)\). To this end, we prepare the following lemma.
Lemma 4.1
For any nonnegative function \(h(\cdot )\) satisfying \(\textrm{E}[h(X)]<\infty \), it holds that
Proof
The proof is from Chen et al. (2011). From the Taylor series expansion with integral remainder, it follows that
We first write \(\textrm{E}[Xf_h(X)]=\sum _{i=1}^n \textrm{E}[\xi _i f_h(X^{(i)}+\xi _i)]\). From (4.6), \(\textrm{E}[\xi _i]=0\) and independence of \(X^{(i)}\) and \(\xi _i\), we observe that
Since \(\textrm{E}[\xi _i^2f_h'(X^{(i)})]=\textrm{E}[\xi _i^2] \textrm{E}[f_h'(X^{(i)})]=n^{-1}\textrm{E}[f_h'(X^{(i)})]\), on the other hand, it can be seen that
Combining (4.7) and (4.8) yields (4.5) in Lemma 4.1. \(\square \)
Hereafter, we consider the specific function \(h_{z,{\alpha }}(x)\), defined by
$$\begin{aligned} h_{z,{\alpha }}(x)={\left\{ \begin{array}{ll} 1, &{} x\le z,\\ 1-(x-z)/{\alpha }, &{} z<x\le z+{\alpha },\\ 0, &{} x>z+{\alpha }, \end{array}\right. } \end{aligned}$$
for positive constant \({\alpha }\). It is noted that \(h_{z,{\alpha }}(x)\) is absolutely continuous and bounded as \(|h_{z,{\alpha }}(x)|\le 1\). Let \(f_{z,{\alpha }}(x)\) be the function given in (4.2) for \(h(x)=h_{z,{\alpha }}(x)\).
Lemma 4.2
The function \(f_{z,{\alpha }}(x)\) satisfies that \(|f_{z,{\alpha }}(x)|\le \sqrt{\pi /2}\), \(|f_{z,{\alpha }}'(x)|\le 2\) and
where \(d(w, x)=|h_{z,{\alpha }}(w+x)-h_{z,{\alpha }}(w)|\).
Proof
For \(x>0\), from RHS of (4.2), it follows that
Since \(\{1-\Phi (x)\}/ \phi (x)\) is decreasing, we have \(\{1-\Phi (x)\}/ \phi (x)\le \{1-\Phi (0)\}/ \phi (0)=\sqrt{\pi /2}\). For \(x<0\), it follows from (4.2) that
Since \(\Phi (x)/ \phi (x)\) is increasing, we have \(\Phi (x)/\phi (x)\le \Phi (0)/\phi (0)=\sqrt{\pi /2}\). Thus, \(|f_{z,{\alpha }}(x)|\le \sqrt{\pi /2}\).
We next show that \(|f_{z,{\alpha }}'(x)|\le 2\). Note that \(f_{z,{\alpha }}'(x)=xf_{z,{\alpha }}(x)+h_{z,{\alpha }}(x)-\mu _{h_{z,{\alpha }}}\). For \(x>0\), it can be demonstrated that
which is called Mills’ ratio. Then from RHS of (4.2), it follows that
For \(x<0\), Mills’ ratio implies that
namely we have \(x\Phi (x)/\phi (x)<1\). Then from RHS of (4.2), it follows that
Thus, the inequality \(|f_{z,{\alpha }}'(x)|\le 2\) is proved.
Finally, it is noted that \(f_{z,{\alpha }}'(w+x)-f_{z,{\alpha }}'(w)=(w+x)f_{z,{\alpha }}(w+x)-wf_{z,{\alpha }}(w)+h_{z,{\alpha }}(w+x)-h_{z,{\alpha }}(w)\). Since \(|f_{z,{\alpha }}'(x)|<2\), it can be observed that \(|f_{z,{\alpha }}(w+x)-f_{z,{\alpha }}(w)|<2|x|\). Since \(|f_{z,{\alpha }}(x)|<\sqrt{\pi /2}\), we have
which shows (4.9). \(\square \)
We now show the central limit theorem using Lemmas 4.1 and 4.2. It is first noted that for \(Y\sim \mathcal{N}(0,1)\),
where \({\Delta }= |\textrm{E}[h_{z,{\alpha }}(X)]-\textrm{E}[h_{z,{\alpha }}(Y)]|\). Since \( |\Phi (z+{\alpha })-\Phi (z)|<{\alpha }/\sqrt{2\pi }\), we have
From (4.1), (4.2) and Lemma 4.1, it follows that
From (4.9) in Lemma 4.2, it can be seen that
Note that \(d(X^{(i)},\xi _i)\le |\xi _i|/{\alpha }\) and \((\textrm{E}[|X^{(i)}|])^2\le \textrm{E}[(X^{(i)})^2]=(n-1)/n<1\). Then,
Similarly,
Combining (4.10), (4.11) and these observations gives the following theorem.
Theorem 4.1
For \(X=\sum _{i=1}^nX_i/\sqrt{n}\), it holds that
Assume that \(\sum _{i=1}^n \textrm{E}[|X_i|^3]/n\) converges to a positive constant. Let \({\alpha }=n^{-1/4}\). Then, \(|\textrm{P}(X\le z)-\Phi (z)|\rightarrow 0\) as \(n\rightarrow \infty \).
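Theorem 4.1 can be illustrated numerically. The sketch below is a pure-Python Monte Carlo estimate of \(\sup _z |\textrm{P}(X\le z)-\Phi (z)|\) over a small grid of points z; the uniform summands and the grid are our choices, not from the text:

```python
import math
import random

def phi_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_cdf_gap(n, reps=20000, z_grid=(-1.0, 0.0, 1.0), seed=1):
    # Empirical sup_z |P(X <= z) - Phi(z)| over the grid, for
    # X = sum_{i<=n} X_i / sqrt(n) with X_i uniform on [-sqrt(3), sqrt(3)]
    # (mean 0, variance 1); the summand distribution is our choice.
    rng = random.Random(seed)
    a = math.sqrt(3.0)
    counts = dict.fromkeys(z_grid, 0)
    for _ in range(reps):
        x = sum(rng.uniform(-a, a) for _ in range(n)) / math.sqrt(n)
        for z in z_grid:
            if x <= z:
                counts[z] += 1
    return max(abs(counts[z] / reps - phi_cdf(z)) for z in z_grid)
```

For \(n=1\) the gap is visibly positive, while for moderate n it shrinks toward the Monte Carlo noise level, in line with the theorem.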
The inequality in (4.17) is improved if we use the following concentration inequality due to Chen et al. (2011).
Lemma 4.3
For any a and b satisfying \(a<b\),
In the evaluation of \(\textrm{E}[d(X^{(i)},\xi _i)]\) in (4.12), we can see that for \(\xi _i>0\),
From the inequality (4.14), it follows that for all \(\xi _i\),
which gives
Thus, the same arguments used above provide the inequality
where
When \({\alpha }=1/\sqrt{n}\), we have
The inequality provides a Berry–Esseen-type bound.
4.2 A method based on the K-function
Another method based on the K-function is useful for evaluating \(\textrm{E}[f_h'(X)-Xf_h(X)]\) for \(f_h(\cdot )\) in (4.2). Define \(K_i(t)\) by
It can be seen that \(K_i(t)\ge 0\) for \(t\in \Re \) and that
Lemma 4.4
For \(f_h(\cdot )\) in (4.2), it holds that
Proof
Since \(\xi _i\) and \(X^{(i)}\) are independent and \(\textrm{E}[\xi _i]=0\), it is observed that \(\textrm{E}[X f_h(X)]=\sum _{i=1}^n\textrm{E}[\xi _i f_h(X)]=\sum _{i=1}^n\textrm{E}[\xi _i \{f_h(X)-f_h(X^{(i)})\}]\), which is rewritten as
Since \(\sum _{i=1}^n \int _{-\infty }^\infty K_i(t) \textrm{d}t=\sum _{i=1}^n\textrm{E}[\xi _i^2]=1\), we have \(\textrm{E}[f_h'(X)]=\sum _{i=1}^n \textrm{E}[\int _{-\infty }^\infty f_h'(X)K_i(t) \textrm{d}t]\). Combining these observations gives the expression in (4.16). \(\square \)
We treat \(h_z(x)=I(x\le z)-\Phi (z)\) and the function \(f_z(x)\) given in (4.3). From (4.1), it follows that
where \(a\vee b=\max (a,b)\) and \(a\wedge b=\min (a,b)\). Similarly to Lemma 4.2, it can be shown that
because \(|f_z(x)|\le \sqrt{2\pi }/4\). Thus,
From Lemma 4.3, it follows that
Hence from Lemma 4.4, we get
which yields the following bound.
Theorem 4.2
For \(X=\sum _{i=1}^nX_i/\sqrt{n}\), it holds that
Chen and Shao (2001) derived a more refined concentration inequality and obtained the improved bound given by
where \({\beta }_2=\sum _{i=1}^n\textrm{E}[\xi _i^2I(|\xi _i|>1)]\) and \({\beta }_3=\sum _{i=1}^n\textrm{E}[|\xi _i|^3I(|\xi _i|\le 1)]\). This corresponds to Lindeberg's condition. In fact, let \(X_1, \ldots , X_n\) be independent random variables with \(\textrm{E}[X_i]=0\) and \(\textrm{Var}(X_i)={\sigma }_i^2\). Let \(S_n=\sum _{i=1}^n X_i\) and \(B_n^2=\sum _{i=1}^n{\sigma }_i^2\). Then, \(\xi _i\) and X correspond to \(\xi _i=X_i/B_n\) and \(X=S_n/B_n\). It is observed that for any \({\varepsilon }>0\),
Thus from (4.18), it follows that
which converges to zero if the Lindeberg condition
is satisfied.
5 Stein-type identities in gamma and exponential distributions
5.1 Stein-type identities and characterization of gamma and exponential distributions
We treat the gamma distribution \(Ga({\alpha }, {\beta })\) with the density function
where \({\alpha }\) and \({\beta }\) are positive parameters. Hudson (1978) derived the Stein-type identity in the gamma distribution, and Betsch and Ebner (2019) showed that the identity characterizes the gamma distribution.
Theorem 5.1
Let X be a positive random variable with \(\textrm{E}[X]={\alpha }{\beta }\) and \(\textrm{Var}(X)={\alpha }{\beta }^2\). Then, the following four conditions are equivalent.
-
(a)
\(X\sim Ga({\alpha }, {\beta })\).
-
(b)
For any differentiable function \(h(\cdot )\) with \(\textrm{E}[|Xh(X)|]<\infty \) and \(\textrm{E}[|Xh'(X)|]<\infty \), it holds that
$$\begin{aligned} \textrm{E}[(X-{\alpha }{\beta })h(X)]={\beta }\textrm{E}[Xh'(X)]. \end{aligned}$$(5.1) -
(c)
For real constant t with \(t<1/{\beta }\), it holds that
$$\begin{aligned} \textrm{E}[(X-{\alpha }{\beta })\exp \{tX\}]=t {\beta }\textrm{E}[X\exp \{tX\}]. \end{aligned}$$(5.2) -
(d)
\(g(t)=\textrm{E}[\exp \{tX\}]\) satisfies the differential equation
$$\begin{aligned} {\textrm{d}\over \textrm{d}t}\log \{g(t)\}= {{\alpha }{\beta }\over 1-t{\beta }}\quad \text {or}\quad {\textrm{d}^2\over \textrm{d}t^2}\log \{g(t)\}= {{\alpha }{\beta }^2 \over (1-t{\beta })^2}. \end{aligned}$$(5.3)
Proof
For the proof from (a) to (b), the identity (5.1) can be derived by integration by parts, because \({\beta }({\textrm{d}}/ {\textrm{d}}x)\{x f(x|{\alpha },{\beta })\} = - (x-{\alpha }{\beta })f(x|{\alpha },{\beta })\). We here provide another approach. Making the transformation \(y=x/{\beta }\) gives the expression
Differentiating both sides with respect to \({\beta }\), we have
which leads to (5.1). Clearly, one gets (c) from (b). For the proof from (c) to (d), the identity (5.2) is written as \(g'(t) -{\alpha }{\beta }g(t) = t{\beta }g'(t)\), which is expressed as (5.3). For the proof from (d) to (a), the solution of the differential equation in (5.3) is \(\log g(t)=-{\alpha }\log (1-t{\beta })\), namely \(g(t) = (1-t{\beta })^{-{\alpha }}\), which implies that \(X \sim Ga({\alpha },{\beta })\). \(\square \)
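The identity (5.1) is easy to check by Monte Carlo. The sketch below draws from \(Ga({\alpha }, {\beta })\) with Python's standard library and compares the two sides; the test function \(h\) is our choice:

```python
import math
import random

def stein_gamma_gap(alpha, beta, h, h_prime, reps=200000, seed=7):
    # Monte Carlo discrepancy between the two sides of the identity
    # E[(X - alpha*beta) h(X)] = beta E[X h'(X)] for X ~ Ga(alpha, beta).
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(reps):
        x = rng.gammavariate(alpha, beta)  # mean alpha*beta, as in the text
        lhs += (x - alpha * beta) * h(x)
        rhs += beta * x * h_prime(x)
    return abs(lhs - rhs) / reps
```

With a bounded smooth h such as the sine function, the discrepancy is of the Monte Carlo order \(O(1/\sqrt{\text{reps}})\).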
We here provide some conditions for characterizing gamma distributions. Condition (c) in Theorem 5.2 is due to Lukacs (1955). See also Kagan et al. (1973) and Kotz (1974).
Theorem 5.2
Assume that independent positive random variables \(X_1\) and \(X_2\) are identically distributed with \(\textrm{E}[X_i]={\alpha }{\beta }\) and \(\textrm{Var}(X_i)={\alpha }{\beta }^2\). Then, the following three conditions are equivalent.
-
(a)
\(X_i \sim Ga({\alpha }, {\beta })\) for \(i=1, 2\).
-
(b)
\(X_1+X_2\sim Ga(2{\alpha }, {\beta })\).
-
(c)
\(X_1+X_2\) and \(X_1/(X_1+X_2)\) are independent.
Proof
Since (b) and (c) clearly follow from (a), it is sufficient to demonstrate the opposite directions. For the proof from (b) to (a), the condition in (b) and Theorem 5.1(c) imply that
From the independence of \(X_1\) and \(X_2\), equality (5.4) is rewritten as
which, from Theorem 5.1, shows that \(X_i\sim Ga({\alpha }, {\beta })\).
The proof from (c) to (a) can be done along with the proof of Lukacs (1955). From the independence of \(X_1+X_2\) and \(X_1/(X_1+X_2)\), it follows that
Differentiating both sides twice with respect to s and t and putting \(t=0\), we have
Let \(a=\textrm{E}[X_1^2/(X_1+X_2)^2]\) and \(g(s)=\textrm{E}[\exp \{sX_1\}]\). Then,
which rewrites (5.6) as
Let \(\psi (s)=g'(s)/g(s)\), and we have \(\psi '(s)=g''(s)/g(s)-\{\psi (s)\}^2\). The equality (5.7) is expressed as
The solution of the differential equation is
for \(b=2a/(1-2a)\) and constant \(c_0\). Since \(\psi (s)=(d/ds)\log g(s)\), we have
Since \(g(0)=1\), \(g'(0)={\alpha }{\beta }\) and \(g''(0)={\alpha }({\alpha }+1){\beta }^2\), the constants satisfy \(c_1/c_0^{1/b}=1\), \(c_1/c_0^{1+1/b}={\alpha }{\beta }\) and \(c_1(1+b)/c_0^{2+1/b}={\alpha }({\alpha }+1){\beta }^2\), which gives \(c_0=({\alpha }{\beta })^{-1}\), \(c_1=({\alpha }{\beta })^{-{\alpha }}\) and \(b=1/{\alpha }\). This yields \(g(s)=(1-{\beta }s)^{-{\alpha }}\), which means that \(X_i \sim Ga({\alpha },{\beta })\) for \(i=1, 2\). \(\square \)
Condition (b) can easily be extended to the case of a sample of size n. Such an extension of condition (c) was done by Khatri and Rao (1968).
The exponential distribution \(Ex({\lambda })\) corresponds to the case of \({\alpha }=1\) and \({\beta }=1/{\lambda }\). The characterization problem of the exponential distribution has been studied in the literature. Of these, Shanbhag (1970a) showed that the memoryless property characterizes the exponential distribution: \(X\sim Ex({\lambda })\) if and only if for any \(x, y>0\),
In fact, \(\log \textrm{P}(X>x+y)=\log \textrm{P}(X>x)+\log \textrm{P}(X>y)\) implies that \(\log \textrm{P}(X>x)=- cx\) for some \(c>0\), namely \(\textrm{P}(X>x)=\exp (- cx)\), which means that X has an exponential distribution. For independently and identically distributed positive random variables \(X_1\) and \(X_2\), Ferguson (1964) proved that \(X_i\) has an exponential distribution if and only if \(\min (X_1, X_2)\) is independent of \(X_1-X_2\).
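The memoryless factorization can also be checked empirically. A minimal sketch for \(X\sim Ex(1)\) follows; the particular values of x and y are our choices:

```python
import random

def memoryless_gap(x, y, reps=200000, seed=5):
    # Empirical |P(X > x + y) - P(X > x) P(X > y)| for X ~ Ex(1);
    # the gap is zero in expectation by the memoryless property.
    rng = random.Random(seed)
    n_xy = n_x = n_y = 0
    for _ in range(reps):
        v = rng.expovariate(1.0)
        n_xy += v > x + y
        n_x += v > x
        n_y += v > y
    return abs(n_xy / reps - (n_x / reps) * (n_y / reps))
```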
5.2 Shrinkage estimation
The Stein identity is useful for obtaining improved shrinkage estimators in simultaneous estimation in gamma distributions. Let \(X_1, \ldots , X_p\) be independent random variables such that \(X_i\sim Ga({\alpha }_i, {\beta }_i)\), \(i=1, \ldots , p\). We first consider the simultaneous estimation of \({{\varvec{\alpha }}}=({\alpha }_1, \ldots , {\alpha }_p)^\top \) in the case of \({\beta }_1=\cdots ={\beta }_p=1\). When estimator \({{\widehat{{{\varvec{\alpha }}}}}}=({{\widehat{{\alpha }}}}_1, \ldots , {{\widehat{{\alpha }}}}_p)^\top \) is evaluated by the risk relative to the quadratic loss \(\sum _{i=1}^p({{\widehat{{\alpha }}}}_i-{\alpha }_i)^2\), Hudson (1978) suggested the shrinkage estimator
Theorem 5.3
For \(p\ge 3\), the risk functions have the relation
which is interpreted as the Pythagorean triangle among \({{\varvec{X}}}\), \({{\varvec{\alpha }}}\) and \({{\widehat{{{\varvec{\alpha }}}}}}^{\textrm{S}}\).
Proof
The estimator in (5.8) has the risk
The Stein-type identity in (5.1) gives \(\textrm{E}[(X_i-{\alpha }_i)h(X_i)]=\textrm{E}[X_i h'(X_i)]\) for
Noting that
we can see that
Thus,
which is used to rewrite the risk as
This shows that the estimator \({{\widehat{{{\varvec{\alpha }}}}}}^{\textrm{S}}\) improves on \({{\varvec{X}}}\) for \(p\ge 3\). Since \((p-2)^2/\sum _{j=1}^p (\log X_j)^2=\Vert {{\varvec{X}}}- {{\widehat{{{\varvec{\alpha }}}}}}^{\textrm{S}}\Vert ^2\), the above risk function expresses the Pythagorean triangle. \(\square \)
The risk performances of the estimators \({{\varvec{X}}}\) and \({{\widehat{{{\varvec{\alpha }}}}}}^{\textrm{S}}\) are investigated by simulation with 10,000 replications, and the average values of the risks are reported in Table 3 for \(p=3, 6\), \({\beta }=1\) and \({{\varvec{\alpha }}}=(k/3){{\varvec{I}}}\), \(k=1, \ldots , 10\). Table 3 shows that the improvement of the shrinkage estimator \({{\widehat{{{\varvec{\alpha }}}}}}^{\textrm{S}}\) over \({{\varvec{X}}}\) is significant in the case of \(p=6\).
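A minimal version of this simulation can be sketched as follows. Since the display defining \({{\widehat{{{\varvec{\alpha }}}}}}^{\textrm{S}}\) is not reproduced here, the code assumes the form \({{\widehat{{\alpha }}}}_i^{\textrm{S}}=X_i-(p-2)\log X_i/\sum _{j=1}^p(\log X_j)^2\), which is consistent with the relation \(\Vert {{\varvec{X}}}-{{\widehat{{{\varvec{\alpha }}}}}}^{\textrm{S}}\Vert ^2=(p-2)^2/\sum _{j=1}^p(\log X_j)^2\) used in the proof:

```python
import math
import random

def gamma_risks(p=6, alpha=1.0, reps=5000, seed=11):
    # Simulated quadratic risks of the crude estimator X and an assumed
    # shrinkage estimator alpha_i^S = X_i - (p-2) log X_i / sum_j (log X_j)^2,
    # for independent X_i ~ Ga(alpha, 1), i = 1, ..., p.
    rng = random.Random(seed)
    loss_crude = loss_shrink = 0.0
    for _ in range(reps):
        x = [rng.gammavariate(alpha, 1.0) for _ in range(p)]
        logs = [math.log(xi) for xi in x]
        s = sum(l * l for l in logs)
        shrink = [xi - (p - 2) * l / s for xi, l in zip(x, logs)]
        loss_crude += sum((xi - alpha) ** 2 for xi in x)
        loss_shrink += sum((ai - alpha) ** 2 for ai in shrink)
    return loss_crude / reps, loss_shrink / reps
```

With \(p=6\) and \({\alpha }=1\), the crude risk is close to \(p{\alpha }=6\) and the shrinkage risk is visibly smaller, in line with Theorem 5.3.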
We next consider the simultaneous estimation of \({{\varvec{\beta }}}=({\beta }_1, \ldots , {\beta }_p)\) for known common \({\alpha }_1=\cdots = {\alpha }_p={\alpha }\), where estimator \({{\widehat{{{\varvec{\beta }}}}}}=({{\widehat{{\beta }}}}_1, \ldots , {{\widehat{{\beta }}}}_p)\) is evaluated by the risk relative to the quadratic loss \(\sum _{i=1}^p({{\widehat{{\beta }}}}_i-{\beta }_i)^2\). This estimation problem was studied by Berger (1980), Das Gupta (1986) and others. Das Gupta (1986) suggested the shrinkage estimator \({{\widehat{{{\varvec{\beta }}}}}}^{\textrm{S}}=({{\widehat{{\beta }}}}^{\textrm{S}}_1, \ldots , {{\widehat{{\beta }}}}^{\textrm{S}}_p)\) with
and derived a condition for improving on \({{\widehat{{{\varvec{\beta }}}}}}_0={{\varvec{X}}}/({\alpha }+1)\).
Theorem 5.4
When \(p\ge 2\), the estimator \({{\widehat{{{\varvec{\beta }}}}}}^{\textrm{S}}\) improves on \({{\widehat{{{\varvec{\beta }}}}}}_0\) relative to the quadratic loss if
Proof
The risk function of the estimator (5.9) is
The Stein-type identity in (5.1) gives
because \(\partial V/\partial X_i=V/(pX_i)\). Thus, the risk difference is written as
because \({{\overline{X}}}\ge V\). This shows that \({\Delta }\le 0\) under the condition in Theorem 5.4. \(\square \)
The risk performances of the estimators \({{\widehat{{{\varvec{\beta }}}}}}_0\) and \({{\widehat{{{\varvec{\beta }}}}}}^{\textrm{S}}\) are investigated by simulation with 10,000 replications, and the average values of the risks are reported in Table 4 for \(p=2, 6\), \({\alpha }=1\) and \({{\varvec{\beta }}}=(k/3){{\varvec{I}}}\), \(k=1, \ldots , 10\). From the table, the improvement of the shrinkage estimator \({{\widehat{{{\varvec{\beta }}}}}}^{\textrm{S}}\) over \({{\widehat{{{\varvec{\beta }}}}}}_0\) is numerically confirmed.
5.3 Goodness-of-fit tests for exponentiality
We now consider constructing a statistic for testing exponentiality using the Stein identity. Goodness-of-fit tests for exponentiality have been studied extensively in the literature. For some good reviews, see Henze and Meintanis (2005) and Ossai et al. (2022). The idea of constructing test statistics for exponentiality based on the Stein identity appears in Betsch and Ebner (2019) and Henze et al. (2012).
Let \(X_1, \ldots , X_n\) be a positive random sample from a distribution function \(F(\cdot )\) with \(\textrm{E}[X_i]={\sigma }\). Consider the problem of testing the exponentiality of the underlying distribution \(H_0: F=Ex(1/{\sigma })=Ga(1, {\sigma })\). From Theorem 5.1, the characterization of exponential distributions is
and the sample counterpart is \(w_t/\sqrt{n}\), where
for \(Y_i=X_i/{{\overline{X}}}\). It is noted that \(w_t\) is invariant under the transformation of scale. Henze et al. (2012) suggested a couple of test statistics based on \(w_t\), one of which is
where \(A_{ij}=Y_i+Y_j\) and \(B_{ij}=Y_iY_j\). Similarly to the problem of testing normality, we can consider the test statistics \(\textrm{IST}_c^+=\int _0^c w_t^2 \textrm{d}t\), \(\textrm{IST}_c^-=\int _{-c}^0 w_t^2 \textrm{d}t\), \(\textrm{IST}_c=\int _{-c}^c w_t^2 \textrm{d}t\) and \(\textrm{MST}_c=\sup _{-c<t<c} |w_t|\) for positive constant c, where
The following lemma is helpful for investigating asymptotic properties of these test statistics.
Lemma 5.1
Let \(Z_i=X_i/{\sigma }\) for \(i=1, \ldots , n\). Let \(h_0(t)=\textrm{E}[\{(1-t)Z_1-1\}e^{tZ_1}]\) and \(h_1(t)=(1-2t)\textrm{E}[Z_1e^{tZ_1}]+t(1-t)\textrm{E}[Z_1^2e^{tZ_1}]\). Then, \(w_t\) is approximated as \(w_t=W_n(t)+\sqrt{n}h_0(t)+o_p(1)\), where
Proof
Letting \({{\overline{Z}}}=n^{-1}\sum _{i=1}^n Z_i\), we rewrite \(w_t\) as
Since \(Z_i/{{\overline{Z}}}=Z_i/\{1+({{\overline{Z}}}-1)\}=1-({{\overline{Z}}}-1)+O_p(n^{-1})\), \(w_t\) can be approximated as
which leads to the approximation \(w_t=W_n+\sqrt{n}h_0(t)+o_p(1)\). Hence, Lemma 5.1 is proved. \(\square \)
From (5.11), the central limit theorem shows that \(W_n(t)\) is asymptotically normally distributed with mean zero and variance \(\textrm{Var}(W_n(t))\) under the assumption \(\textrm{E}[Z_1^2e^{2tZ_1}]<\infty \), where the variance can be evaluated as
Note that \(h_0(t)=(1-t)g'(t)-g(t)\) for \(g(t)=\textrm{E}[e^{tZ_1}]\). Then, \(h_1(t)=(1-2t)g'(t)+t(1-t)g''(t)=th_0'(t)+\{h_0(t)+g(t)\}/(1-t)\) and
Under the exponentiality hypothesis \(H_0\), we have \(g(t)=1/(1-t)\), \(h_0(t)=0\), \(h_1(t)=g(t)/(1-t)=1/(1-t)^2\) and \(\textrm{E}[Z_1^2]=2\). Also note that
Thus, the asymptotic variance of \(w_t\) under the exponentiality is \(\textrm{Var}(W_n(t)) =V(t)^2+o(1)\), where
for \(0<t<1/2\).
Henze et al. (2012) showed the consistency of \(\textrm{HME}_{\gamma }\). Using Lemma 5.1 and the same arguments as in the proof of Theorem 3.3, we can verify the consistency of the suggested test statistics.
Theorem 5.5
Assume that \(\textrm{E}[Z_1^2e^{tZ_1}]<\infty \) for t around zero. Then, the test statistics \(\textrm{IST}_c^+\), \(\textrm{IST}_c^-\), \(\textrm{IST}_c\) and \(\textrm{MST}_c\) given below (5.10) are consistent.
We investigate the power performance of the suggested tests \(\textrm{IST}_c^+\), \(\textrm{IST}_c^-\), \(\textrm{IST}_c\) and \(\textrm{MST}_c=\sup _{-c<t<c} |w_t|\) for \(c=0.1\). As competitors, we treat the test \(\textrm{HME}_{\gamma }\) of Henze et al. (2012) for \({\gamma }=1\) and two simpler test statistics. One of them is the Cox and Oakes (1984) test
and the hypothesis \(H_0\) is rejected when \(|\textrm{CO}|>z_{{\alpha }/2}\). Another test is based on the coefficient of variation \(\textrm{HS}=nS^2/{{\overline{X}}}^2\), given in Hahn and Shapiro (1967), and the hypothesis \(H_0\) is rejected when \(\textrm{HS}<\chi _{n-1, 1-{\alpha }/2}^2\) or \(\textrm{HS}>\chi _{n-1, {\alpha }/2}^2\). This is also derived from a likelihood ratio for testing the homogeneity \(H_0^{*} : {\lambda }_1=\cdots = {\lambda }_n\) for \(X_i\sim Ex({\lambda }_i)\), and the null hypothesis \(H_0^{*}\) is rejected when \(\textrm{HS}>\chi _{n-1, {\alpha }}^2\).
The powers of these test statistics are examined by simulation with 10,000 replications. For \(n=50\), we adjust the type-I errors before calculating their powers under the following three alternatives for \(w=0.2, 0.5, 0.8, 1.0\).
where Ex(1), Ga(a, b), LogN(0, 1) and InvG(1, 1) denote random variables having the exponential distribution Ex(1), gamma distribution Ga(a, b), log-normal distribution LogN(0, 1) and inverse Gaussian distribution InvG(1, 1), respectively. The values of their powers are reported in Table 5. From the table, it is observed that the tests \(\textrm{IST}_{0.1}\), \(\textrm{HME}_1\) and \(\textrm{CO}\) are more powerful for \(w=0.2\), 0.5 and 0.8 in \(M_1\), \(M_2\) and \(M_3\). In the case of \(w=1\), the test \(\textrm{IST}_{0.1}\) is most powerful for Ga(1.2, 0.8), the tests \(\textrm{IST}_{0.1}^+\), \(\textrm{IST}_{0.1}^-\), \(\textrm{MST}_{0.1}\) and \(\textrm{HS}\) are more powerful for LogN(0, 1), and the tests \(\textrm{IST}_{0.1}\), \(\textrm{HME}_1\) and \(\textrm{CO}\) are more powerful for InvG(1, 1). Overall, the tests \(\textrm{IST}_{0.1}\), \(\textrm{HME}_1\) and \(\textrm{CO}\) have similar performances, and the tests \(\textrm{IST}_{0.1}^+\), \(\textrm{IST}_{0.1}^-\), \(\textrm{MST}_{0.1}\) and \(\textrm{HS}\) perform similarly.
6 Stein-type identities in Poisson and negative binomial distributions
6.1 Stein-type identity in Poisson distributions
In the Poisson distribution \(Po({\lambda })\), Hudson (1978) provided the Stein-type identity, which also characterizes the Poisson distribution, as seen below.
Theorem 6.1
Let X be a non-negative and discrete random variable with \(\textrm{E}[X]={\lambda }\). Then, the following four conditions are equivalent.
-
(a)
\(X\sim Po({\lambda })\).
-
(b)
For any function \(h(\cdot )\) with \(\textrm{E}[|Xh(X)|]<\infty \), it holds that
$$\begin{aligned} \textrm{E}[{\lambda }h(X)]=\textrm{E}[X h(X-1)] \quad \text {or}\quad \textrm{E}[X h(X)]=\textrm{E}[{\lambda }h(X+1)] \end{aligned}$$(6.1) -
(c)
For any real constant t, it holds that
$$\begin{aligned} \textrm{E}[(X-{\lambda }e^t)e^{tX}]=0. \end{aligned}$$(6.2) -
(d)
\(g(t)=\textrm{E}[\exp \{tX\}]\) satisfies the differential equation
$$\begin{aligned} {\textrm{d}\over \textrm{d}t}\log \{g(t)\}= {\lambda }e^t. \end{aligned}$$(6.3)
Proof
For the proof from (a) to (b), it is noted that
which produces the identity (6.1). Clearly, one gets (c) from (b). For the proof from (c) to (d), the identity (6.2) is written as \(g'(t) = {\lambda }e^t g(t)\), which is given in (6.3). For the proof from (d) to (a), the solution of the differential equation in (6.3) is \(g(t) = \exp \{{\lambda }(e^t-1)\}\), which implies that \(X \sim Po({\lambda })\). \(\square \)
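The identity (6.1) can be checked by Monte Carlo. The sketch below builds a Knuth-style Poisson sampler from Python's standard library primitives; the test function \(h\) is our choice:

```python
import math
import random

def poisson(lam, rng):
    # Knuth's method: X is the number of extra uniforms drawn before the
    # running product of uniforms falls below exp(-lam).
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        prod *= rng.random()
        k += 1
    return k

def stein_poisson_gap(lam, h, reps=200000, seed=13):
    # Monte Carlo discrepancy between the two sides of
    # E[lam h(X)] = E[X h(X - 1)] for X ~ Po(lam).
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(reps):
        x = poisson(lam, rng)
        lhs += lam * h(x)
        rhs += x * h(x - 1)  # the x = 0 term vanishes, so h(-1) is harmless
    return abs(lhs - rhs) / reps
```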
The characterization of the Poisson distribution has been studied in many papers. A feature of this distribution is that the sample mean and the unbiased sample variance have the same expectation. Shanbhag (1970b) derived a related condition for characterizing the Poisson distribution.
Theorem 6.2
Assume that nonnegative and discrete random variables \(X_1\) and \(X_2\) are independently and identically distributed. Then, the following three conditions are equivalent.
-
(a)
\(X_i \sim Po({\lambda })\) for \(i=1, 2\).
-
(b)
\(X_1+X_2 \sim Po(2{\lambda })\).
-
(c)
The conditional expectation of \((X_1-X_2)^2\) given \(X_1+X_2\) is equal to \(X_1+X_2\), namely \(\textrm{E}[(X_1-X_2)^2|X_1+X_2]=X_1+X_2\).
Proof
The proof from (a) to (b) is trivial. For the proof from (b) to (a), from Theorem 6.1, we have
which easily leads to
Thus, we get (a) by using Theorem 6.1 again.
For the proof from (a) to (c), it is noted that \(\textrm{E}[(X_1-X_2)^2]=2{\lambda }\) and \(\textrm{E}[X_1+X_2]=2{\lambda }\), namely
Since \(X_1+X_2\) is complete and sufficient, from \(\textrm{E}[ \textrm{E}[(X_1-X_2)^2|X_1+X_2] - (X_1+X_2)]=0\), it follows that \(\textrm{E}[(X_1-X_2)^2|X_1+X_2] - (X_1+X_2)=0\), and we get (c).
For the proof from (c) to (a), it is noted that condition (c) implies
It is observed that
Then from (6.4), for \(g(t)=\textrm{E}[e^{tX_1}]\), we have
Let \(\psi (t)=g'(t)/g(t)\), and this equality is expressed as \(\psi '(t)=\psi (t)\). The solution of this differential equation is \(\log \psi (t)=t + \log c_0\), namely \(\psi (t)=c_0 e^t\). This implies that \(\log g(t)=c_0 e^t + \log c_1\), or \(g(t)=c_1\exp \{c_0 e^t\}\). Since \(g(0)=c_1e^{c_0}=1\) or \(c_1=e^{-c_0}\), we get \(g(t)=\exp \{c_0(e^t-1)\}\), which leads to the Poisson distribution. \(\square \)
Theorem 6.2 can be extended to the case of a random sample with size n, where condition (c) is replaced by \(\textrm{E}[(n-1)^{-1}\sum _{i=1}^n(X_i-{{\overline{X}}})^2|{{\overline{X}}}]={{\overline{X}}}\).
6.2 Two applications of the Stein-type identity in Poisson distributions
We here provide two applications of the Stein-type identity in Poisson distributions. One of them is to obtain improved shrinkage estimators in simultaneous estimation in Poisson distributions. Let \(X_1, \ldots , X_p\) be independent random variables such that \(X_i\sim Po({\lambda }_i)\), \(i=1, \ldots , p\). Consider the problem of simultaneously estimating \({{\varvec{\lambda }}}=({\lambda }_1, \ldots , {\lambda }_p)\) relative to the loss \(\sum _{i=1}^p({{\hat{{\lambda }}}}_i-{\lambda }_i)^2/{\lambda }_i\). Clevenson and Zidek (1975) constructed a class of estimators improving on \({{\varvec{X}}}\), given by \({{\widehat{{{\varvec{\lambda }}}}}}_\phi =({{\hat{{\lambda }}}}_{\phi ,1}, \ldots , {{\hat{{\lambda }}}}_{\phi ,p})\) for
Theorem 6.3
The unbiased risk estimator of the shrinkage estimator \({{\widehat{{{\varvec{\lambda }}}}}}_\phi \) is
Thus, for \(p\ge 2\), \({{\widehat{{{\varvec{\lambda }}}}}}_\phi \) improves on \({{\varvec{X}}}\) if \(\phi (\cdot )\) satisfies the conditions (a) \(\phi (z)\) is nondecreasing in z, and (b) \(0\le \phi (z) \le 2(p-1)\).
Proof
The risk function of \({{\widehat{{{\varvec{\lambda }}}}}}_\phi \) is
From the Stein identity in Theorem 6.1, we use the identity \(\textrm{E}[(X_i/{\lambda }_i) h(X_i)]=\textrm{E}[ h(X_i+1)]\) to write
These observations give the expression in the unbiased risk estimator. \(\square \)
In the case of \(\phi (z)=p-1\), the estimator \({{\widehat{{{\varvec{\lambda }}}}}}^{\textrm{S}}={{\varvec{X}}}-(p-1)(Z+p-1)^{-1}{{\varvec{X}}}\) has the unbiased risk estimator \({{\widehat{R}}({{\varvec{\lambda }}}, {{\widehat{{{\varvec{\lambda }}}}}}^{\textrm{S}})}=p - (p-1)^2(Z+p+1)/\{(Z+p-1)(Z+p)\}\). Clevenson and Zidek (1975) showed that the Bayes estimator \({{\widehat{{{\varvec{\lambda }}}}}}^{\textrm{GB}}={{\varvec{X}}}-(p-1+{\beta })(Z+p-1+{\beta })^{-1}{{\varvec{X}}}\) satisfies the conditions of Theorem 6.3 for \(0\le {\beta }\le p-1\) and that it is admissible for \({\beta }>1\). Numerical investigation of the estimator \({{\widehat{{{\varvec{\lambda }}}}}}^{\textrm{CZ}}\) is given in Table 7.
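A minimal simulation comparing \({{\varvec{X}}}\) with the estimator \({{\widehat{{{\varvec{\lambda }}}}}}^{\textrm{S}}={{\varvec{X}}}-(p-1)(Z+p-1)^{-1}{{\varvec{X}}}\) (the case \(\phi (z)=p-1\) above) under the normalized loss can be sketched as follows; the choices \(p=6\) and \({\lambda }_i=1\) are ours:

```python
import math
import random

def poisson(lam, rng):
    # Knuth-style Poisson sampler built from uniforms.
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        prod *= rng.random()
        k += 1
    return k

def cz_risks(p=6, lam=1.0, reps=5000, seed=17):
    # Simulated risks under the loss sum_i (est_i - lam_i)^2 / lam_i for the
    # crude estimator X and the shrinkage estimator
    # X - (p - 1)(Z + p - 1)^{-1} X, Z = sum_i X_i, at lam_i = lam for all i.
    rng = random.Random(seed)
    loss_x = loss_s = 0.0
    for _ in range(reps):
        x = [poisson(lam, rng) for _ in range(p)]
        z = sum(x)
        shrink = [xi * (1.0 - (p - 1) / (z + p - 1)) for xi in x]
        loss_x += sum((xi - lam) ** 2 / lam for xi in x)
        loss_s += sum((si - lam) ** 2 / lam for si in shrink)
    return loss_x / reps, loss_s / reps
```

The crude risk is close to p, and the shrinkage risk is markedly smaller near the origin, in line with Theorem 6.3.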
We next derive goodness-of-fit test statistics for Poissonity based on the Stein identity. The problem of testing Poissonity has been studied in the literature; see Mijburgh and Visagie (2020) for an overview.
Let \(X_1, \ldots , X_n\) be a discrete and nonnegative random sample from a population with distribution function \(F(\cdot )\) with mean \(\textrm{E}[X_i]={\lambda }\). Consider the problem of testing the Poissonity of the underlying distribution \(H_0: F=Po({\lambda })\). From Theorem 6.1, the characterization of the Poisson distribution is
and the sample counterpart is \(w_t/\sqrt{n}\), where
The idea of Henze et al. (2012) is used to construct the test statistic
for \(A_{ij}=X_i+X_j\), \(B_{ij}=X_iX_j\) and positive constant \({\gamma }\). This was suggested by Treutler (1995) and the related test was proposed by Baringhaus and Henze (1992). For other test statistics, see Gürtler and Henze (2000). Similarly to the problem of testing exponentiality, we can consider the test statistics
and \(\textrm{MST}_c = \sup _{-c<t<0}|w_t|\) for positive constant c.
It is observed that \(w_t\) can be approximated as
where \(h_0(t)=\textrm{E}[(X_1-{\lambda }e^t)e^{tX_1}]\) and \(g(t)=\textrm{E}[e^{tX_1}]\). This gives the following lemma.
Lemma 6.1
\(w_t\) is approximated as \(w_t=W_n(t)+\sqrt{n}h_0(t)+o_p(1)\), where
From the central limit theorem, it follows that \(W_n(t)\) converges in distribution to the normal distribution with mean zero and the variance
Since \(h_0(t)=g'(t)-{\lambda }e^tg(t)\), we have \(h_0'(t)=g''(t)-{\lambda }e^t g'(t)-{\lambda }e^tg(t)\), so that \(\textrm{E}[X_1(X_1-{\lambda }e^t)e^{tX_1}]=g''(t)-{\lambda }e^tg'(t)=h_0'(t)+{\lambda }e^t g(t)\). Under the null hypothesis \(H_0\) of Poissonity, it can be seen that \(g(t)=\exp \{{\lambda }(e^t-1)\}\), \(h_0(t)=0\) and
so that \(\textrm{Var}(W_n(t))=V(t)^2+o(1)\), where
Using Lemma 6.1 and the same arguments as in the proof of Theorem 3.3, we can verify the consistency of the suggested test statistics.
Theorem 6.4
Assume that \(\textrm{E}[X_1^2e^{tX_1}]<\infty \) for t around zero. Then, the test statistics \(\textrm{BHT}_{\gamma }\), \(\textrm{IST}_c\) and \(\textrm{MST}_c\) given below (6.6) are consistent.
We investigate the power performance of the test statistics \(\textrm{BHT}_{\gamma }\) for \({\gamma }=1\), \(\textrm{IST}_c\) for \(c=1\) and \(\textrm{MST}_c\) for \(c=1\). As a competitor, we employ the test based on Fisher's index, given by \(\textrm{FI}=(n-1)S^2/{{\overline{X}}}\). The null hypothesis \(H_0\) is rejected when \(\textrm{FI}<\chi _{n-1, 1-{\alpha }/2}^2\) or \(\textrm{FI}>\chi _{n-1, {\alpha }/2}^2\). The simulation experiments are conducted under the following alternatives for \({\alpha }=0, 2, 3, 4, 5\) and \({\lambda }=1, 3\).
where \(Po({\lambda })\), Po(0.2), Nbin(10, 10/10.2) and Bin(10, 0.1) denote independent random variables having the Poisson distributions \(Po({\lambda })\) and Po(0.2), negative binomial distribution Nbin(10, 10/10.2) and binomial distribution Bin(10, 0.1), respectively. It is noted here that the type-I errors of the tests depend on the unknown parameter \({\lambda }\). To fix this problem, the parametric bootstrap method is useful for obtaining critical values of the tests. For example, we explain how to obtain the critical value of \(\textrm{BHT}_{\gamma }\) in model \(M_2\). We first generate K samples of size n from \(M_2\), where each sample consists of \(u_1, \ldots , u_{n}\) from \(Po({\lambda })\) and \(v_1, \ldots , v_{n}\) from Po(0.2), for \(K=1,000\) and \(n=50\). Then, we generate B bootstrap samples \((u_1^{(b)}, \ldots , u_{n}^{(b)})\), \(b=1, \ldots , B\), from \(Po({{\overline{u}}})\) for \({{\overline{u}}}=\sum _{i=1}^{n}u_i/n\) and calculate the value of \(\textrm{BHT}_{\gamma }^{(b)}\) for \(b=1, \ldots , B\) and \(B=1,000\). We obtain the critical value \(q_{\textrm{BHT}}\) of the test \(\textrm{BHT}_{\gamma }\) as the upper 5% quantile of the histogram of the \(\textrm{BHT}_{\gamma }^{(b)}\)'s. We then calculate K values of \(\textrm{BHT}_{\gamma }\) based on \(x_i=u_i + {\alpha }v_i\) for \(i=1, \ldots , n\) and count the number of cases with \(\textrm{BHT}_{\gamma }>q_{\textrm{BHT}}\). In this way, we obtain approximated values of the size and the power of the test \(\textrm{BHT}_{\gamma }\). These values are reported in Table 6.
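The parametric bootstrap recipe above can be sketched generically. Since the display defining \(\textrm{BHT}_{\gamma }\) is not reproduced here, the sketch plugs in Fisher's index \(\textrm{FI}\), whose formula is given explicitly; any other statistic function can be substituted:

```python
import math
import random
import statistics

def poisson(lam, rng):
    # Knuth-style Poisson sampler built from uniforms.
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        prod *= rng.random()
        k += 1
    return k

def fisher_index(xs):
    # FI = (n - 1) S^2 / xbar, with S^2 the unbiased sample variance.
    return (len(xs) - 1) * statistics.variance(xs) / statistics.mean(xs)

def bootstrap_critical_value(xs, stat, B=1000, level=0.05, seed=3):
    # Parametric bootstrap: estimate lambda by xbar, draw B samples of the
    # same size from Po(lambda-hat), and return the upper `level` quantile
    # of the bootstrapped statistic as the critical value.
    rng = random.Random(seed)
    lam_hat = statistics.mean(xs)
    vals = sorted(stat([poisson(lam_hat, rng) for _ in range(len(xs))])
                  for _ in range(B))
    return vals[int((1.0 - level) * B)]
```

For a Poisson sample of size 50, the bootstrapped upper 5% quantile of FI lands near the \(\chi ^2_{49}\) quantile, as the theory for Fisher's index suggests.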
From the table, the sizes of the tests show small variations for \(n=50\), \(B=1,000\) and \(K=1,000\), and these variations will decrease for larger n, B and K. The tests \(\textrm{BHT}_1\), \(\textrm{IST}_1\) and \(\textrm{MST}_1\) have similar power performance and are more powerful than \(\textrm{FI}\) in the case of \({\lambda }=1\) for \(M_1, M_2\) and \(M_3\). In the case of \({\lambda }=3\), the test \(\textrm{MST}_1\) is more powerful than \(\textrm{BHT}_1\) and \(\textrm{IST}_1\), and \(\textrm{FI}\) is more powerful for \({\alpha }=4, 5\) in \(M_1\) and \(M_2\).
6.3 Stein-type identity in negative binomial distributions
Consider the negative binomial distribution \(Nbin({\alpha }, p)\) with the probability function
where \({\alpha }>0\) and \(0<p<1\). Although \({\alpha }\) is a natural number in the classical negative binomial distribution, we here treat \({\alpha }\) as a positive real number. Hudson (1978) provided the Stein-type identity, which also characterizes the negative binomial distribution, as seen below.
Theorem 6.5
Let X be a non-negative and discrete variable. Then, the following four conditions are equivalent.
-
(a)
\(X\sim Nbin({\alpha },p)\).
-
(b)
For any function \(h(\cdot )\) with \(\textrm{E}[|Xh(X)|]<\infty \), it holds that
$$\begin{aligned} \textrm{E}[X h(X)]=q \textrm{E}[(X+{\alpha }) h(X+1)]. \end{aligned}$$(6.8) -
(c)
For any real constant t, it holds that
$$\begin{aligned} \textrm{E}[X e^{tX}]=q \textrm{E}[(X+{\alpha }) e^{t(X+1)}]. \end{aligned}$$(6.9) -
(d)
\(g(t)=\textrm{E}[\exp \{tX\}]\) satisfies the differential equation
$$\begin{aligned} {\textrm{d}\over \textrm{d}t}\log \{g(t)\}= {{\alpha }q e^t \over 1-qe^t}. \end{aligned}$$(6.10)
Proof
For the proof from (a) to (b), it is noted that
which shows the identity (6.8). Clearly, one gets (c) from (b). For the proof from (c) to (d), the identity (6.9) is written as \(g'(t) =q\{g'(t)+{\alpha }g(t)\}e^t\), which is (6.10). For the proof from (d) to (a), the solution of the differential equation in (6.10) is \(\log g(t)={\alpha }\log p -{\alpha }\log (1-qe^t)\), namely, \(g(t) = {p^{\alpha }/(1-q e^t)^{\alpha }}\), which implies that \(X \sim Nbin({\alpha },p)\). \(\square \)
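The identity (6.8) can likewise be checked by Monte Carlo, sampling \(Nbin({\alpha }, p)\) for integer \({\alpha }\) as the total number of failures before the \({\alpha }\)-th success; the test function \(h\) is our choice:

```python
import random

def nbin(alpha, p, rng):
    # X ~ Nbin(alpha, p) for integer alpha: total failures accumulated
    # over alpha independent runs of Bernoulli(p) trials until success.
    failures = 0
    for _ in range(alpha):
        while rng.random() > p:
            failures += 1
    return failures

def stein_nbin_gap(alpha, p, h, reps=200000, seed=19):
    # Monte Carlo discrepancy between the two sides of
    # E[X h(X)] = q E[(X + alpha) h(X + 1)], q = 1 - p.
    rng = random.Random(seed)
    q = 1.0 - p
    lhs = rhs = 0.0
    for _ in range(reps):
        x = nbin(alpha, p, rng)
        lhs += x * h(x)
        rhs += q * (x + alpha) * h(x + 1)
    return abs(lhs - rhs) / reps
```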
Theorem 6.6
Assume that nonnegative and discrete random variables \(X_1\) and \(X_2\) are independently and identically distributed with \(\textrm{E}[|X_1|e^{tX_1}]<\infty \). Then, the following two conditions are equivalent.
-
(a)
\(X_i\sim Nbin({\alpha },p)\) for \(i=1,2\).
-
(b)
\(X_1+X_2 \sim Nbin(2{\alpha }, p)\).
Proof
It is easy to see (b) from (a). For the proof from (b) to (a), from Theorem 6.5, it follows that
This equality leads to
which, from Theorem 6.5, shows (a). \(\square \)
We briefly describe the Stein problem of simultaneous estimation of the means of p negative binomial distributions. This is due to Tsui (1984). Let \(X_1, \ldots , X_p\) be independent random variables such that \(X_i \sim Nbin({\alpha }_i, \eta _i)\) for \(i=1, \ldots , p\). The mean of \(X_i\) is denoted by \({\theta }_i={\alpha }_i (1-\eta _i)/\eta _i\). Let \({{\varvec{X}}}=(X_1, \ldots , X_p)\) and \({{\varvec{\theta }}}=({\theta }_1, \ldots , {\theta }_p)\), and we consider the estimation of \({{\varvec{\theta }}}\) for known \({\alpha }_i\)'s relative to the loss \(\sum _{i=1}^p ({{\hat{{\theta }}}}_i-{\theta }_i)^2/{\theta }_i\). Tsui (1984) suggested the shrinkage estimator \({{\widehat{{{\varvec{\theta }}}}}}_\phi =({{\hat{{\theta }}}}_{\phi ,1}, \ldots , {{\hat{{\theta }}}}_{\phi ,p})\), where
for \(Z=\sum _{i=1}^p X_i\) and nonnegative function \(\phi (\cdot )\). Let \({\Delta }= R({{\varvec{\theta }}}, {{\widehat{{{\varvec{\theta }}}}}}_\phi )-R({{\varvec{\theta }}}, {{\varvec{X}}})\). The conditions on \(\phi (\cdot )\) for improving on \({{\varvec{X}}}\) were derived by Tsui (1984).
Theorem 6.7
The risk difference \({\Delta }\) is decomposed as \({\Delta }={\Delta }_1+{\Delta }_2\), where
Then, it holds that \({\Delta }\le 0\) if the following conditions are satisfied. (a) \(\phi (z)\) is nondecreasing, (b) \(0<\phi (z) \le 2(p-1)\) and (c) \(\phi (z)/z\) is nonincreasing.
Proof
More generally, we consider the estimator \({{\hat{{\theta }}}}_i=X_i+f_i({{\varvec{X}}})\). Let \({{\varvec{e}}}_i\) be a p-variate vector whose i-th coordinate is one and whose other coordinates are zero. Note that \({\alpha }_i/{\theta }_i=1/q_i -1\) and \(\textrm{E}[h(X_i)/q_i]=\textrm{E}[(X_i+{\alpha }_i)h(X_i+1)/(X_i+1)]\) from Theorem 6.5. Then,
where
Substituting \(f_i({{\varvec{X}}})=- X_i\phi (Z)/(Z+p-1)\) yields the expressions in Theorem 6.7. It can be easily checked that \({\Delta }_1\le 0\) and \({\Delta }_2\le 0\) under the conditions (a), (b) and (c).
\(\square \)
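The estimator of Theorem 6.7 is easy to compute. The following is a minimal sketch, not from the paper: the function name `tsui_estimator` is hypothetical, and the default constant choice \(\phi (z)=p-1\) (a Clevenson–Zidek-type choice) is my assumption; it is nondecreasing, lies in \((0, 2(p-1)]\) for \(p\ge 2\), and \(\phi (z)/z\) is nonincreasing, so conditions (a)–(c) hold.

```python
import numpy as np

def tsui_estimator(x, phi=None):
    """Shrinkage estimate theta_hat_i = (1 - phi(Z)/(Z + p - 1)) * X_i.

    The default phi(z) = p - 1 is constant, hence nondecreasing; it lies in
    (0, 2(p-1)] for p >= 2; and phi(z)/z is nonincreasing.  Thus conditions
    (a)-(c) of Theorem 6.7 are satisfied and the estimator dominates X.
    """
    x = np.asarray(x, dtype=float)
    p = x.size
    z = x.sum()
    val = (p - 1.0) if phi is None else phi(z)  # assumed default choice
    return x * (1.0 - val / (z + p - 1.0))
```

For example, with \(p=3\) and observed counts \((2,2,2)\), the shrinkage factor is \(1-2/8=0.75\), so each coordinate is pulled proportionally toward zero.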
We investigate the risk performances of the estimators \({{\varvec{X}}}\) and \({{\widehat{{{\varvec{\theta }}}}}}^{\textrm{CZ}}\) under three distributions of \(X_i\): the Poisson distribution \(Po({\theta }_i)\), the geometric distribution \(Geo(1/({\theta }_i+1))\) and the negative binomial distribution \(Nbin(5, 5/({\theta }_i+5))\). The simulation experiment has been conducted with \(p=6\) and \({\theta }_i=k/2\) for \(i=1, \ldots , p\) and \(k=1, \ldots , 10\), and the average losses based on 10,000 replications are reported in Table 7. From Table 7, it is seen that the improvements of \({{\widehat{{{\varvec{\theta }}}}}}^{\textrm{CZ}}\) and \({{\widehat{{{\varvec{\theta }}}}}}^{\textrm{TS}}\) over \({{\varvec{X}}}\) are significant and robust across the Poisson, geometric and negative binomial distributions.
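The Poisson arm of this experiment can be reproduced along the following lines. This is a hypothetical sketch, not the authors' code: the helper names `cz_estimate` and `avg_loss`, the fixed seed, and the single setting \(k=1\) are my assumptions; the other two scenarios would replace the Poisson draws with geometric or negative binomial ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def cz_estimate(x):
    # Clevenson-Zidek-type estimator with the constant choice phi(z) = p - 1
    p, z = x.size, x.sum()
    return x * (1.0 - (p - 1.0) / (z + p - 1.0))

def avg_loss(theta, estimator, nrep=10_000):
    # Monte Carlo average of the normalized loss sum_i (hat_i - theta_i)^2 / theta_i
    theta = np.asarray(theta, dtype=float)
    total = 0.0
    for _ in range(nrep):
        x = rng.poisson(theta).astype(float)  # Poisson scenario; swap in
        # negative binomial draws for the geometric / Nbin(5, .) scenarios
        total += np.sum((estimator(x) - theta) ** 2 / theta)
    return total / nrep

theta = np.full(6, 0.5)  # p = 6 and theta_i = k/2 with k = 1
```

Under this loss, the exact risk of \({{\varvec{X}}}\) itself in the Poisson case is \(p=6\) (each coordinate contributes \(\textrm{Var}(X_i)/{\theta }_i=1\)), which gives a quick sanity check on the simulated average losses.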
7 Concluding remarks
We conclude the paper with some remarks and extensions. Throughout the paper, we have used moment-generating functions for characterizing and testing the normal, exponential and Poisson distributions. As is well known, however, the use of moment-generating functions is limited by the requirement that the corresponding expectations exist. To avoid this restriction, it may be preferable to use characteristic functions, and the results given in the paper can be extended to arguments based on characteristic functions.
The test of normality has been treated in Sect. 3.2 in the univariate case. For testing multivariate normality, Ebner (2021) provided a test statistic based on the Stein method. Many other statistics for testing multivariate normality have been suggested in the literature; for example, see Mardia (1970), Mecklin and Mundfrom (2004) and Ebner and Henze (2020).
Although the Stein method in this paper is limited to normal distributions, Ley and Swan (2013) suggested a general density approach that is applicable to general distributions. Barbour (1988) and Götze (1991) introduced the generator approach, which adapts the method to many other distributions. For details, see Reinert (2005).
Finally, we mention some recent developments related to the Stein method. Betsch et al. (2021) suggested new techniques based on the Stein method for estimating parameters. In addition, Betsch et al. (2022) discussed the estimation of parameters in negative binomial distributions and the test of Poissonity based on the Stein method. Betsch and Ebner (2021) provided characterizations of continuous distributions based on indicator functions, and Betsch et al. (2022) applied a similar argument to discrete distributions.
References
Anastasiou, A., et al. (2023). Stein’s method meets computational statistics: A review of some recent developments. Statistical Science, 38, 120–139.
Baranchik, A. J. (1970). A family of minimax estimators of the mean of a multivariate normal distribution. Annals of Mathematical Statistics, 41, 642–645.
Barbour, A. D. (1988). Stein’s method and Poisson process convergence. Journal of Applied Probability, 25, 175–184.
Barbour, A. D., & Chen, L. H. Y. (2005). Stein’s method and applications. World Scientific.
Baringhaus, L., & Henze, N. (1992). A goodness of fit test for the Poisson distribution based on the empirical generating function. Statistics and Probability Letters, 13, 269–274.
Bellec, P. C., & Zhang, C.-H. (2021). Second-order Stein: URE for SURE and other applications in high-dimensional inference. Annals of Statistics, 49, 1864–1903.
Berger, J. O. (1980). Improving on inadmissible estimators in continuous exponential families with applications to simultaneous estimation of gamma problem. Annals of Statistics, 8, 545–571.
Berger, J. O. (1985). Statistical decision theory and Bayesian analysis. Springer.
Bernstein, S. N. (1941). Sur une propriété caractéristique de la loi de Gauss. Leningrad. Polytechnic Institute. Transactions, 3, 21–22.
Betsch, S., & Ebner, B. (2019). A new characterization of the gamma distribution and associated goodness-of-fit tests. Metrika, 82, 779–806.
Betsch, S., & Ebner, B. (2020). Testing normality via a distributional fixed point property in the Stein characterization. TEST, 29, 105–138.
Betsch, S., & Ebner, B. (2021). Fixed point characterizations of continuous univariate probability distributions and their applications. Annals of the Institute of Statistical Mathematics, 73, 31–59.
Betsch, S., Ebner, B., & Klar, B. (2021). Minimum \(L^q\)-distance estimators for non-normalized parametric models. Canadian Journal of Statistics, 49, 514–548.
Betsch, S., Ebner, B., & Nestmann, F. (2022). Characterizations of non-normalized discrete probability distributions and their application in statistics. Electronic Journal of Statistics, 16, 1303–1329.
Brandwein, A. C., & Strawderman, W. E. (1990). Stein estimation: The spherically symmetric case. Statistical Science, 5, 356–369.
Chen, H. Y. (2021). Stein’s method of normal approximation: Some recollections and reflections. Annals of Statistics, 49, 1850–1863.
Chen, L. H. Y., Fang, X., & Shao, Q.-M. (2013). From Stein identities to moderate deviations. Annals of Statistics, 41, 262–293.
Chen, L. H. Y., Goldstein, L., & Shao, Q.-M. (2011). Normal approximation by Stein’s method. Springer.
Chen, L. H. Y., & Shao, Q.-M. (2001). A non-uniform Berry–Esseen bound via Stein's method. Probability Theory and Related Fields, 120, 236–254.
Chen, L. H. Y., & Shao, Q.-M. (2005). Stein’s method for normal approximation. In: A. D. Barbour & L. H. Y. Chen (Eds.), An introduction to Stein’s method, Lecture Notes Series No. 4 (pp. 1–59), Institute for Mathematical Sciences, National University of Singapore, Singapore University Press and World Scientific.
Clevenson, M. L., & Zidek, J. V. (1975). Simultaneous estimation of the means of independent Poisson laws. Journal of the American Statistical Association, 70, 698–705.
Cox, D. R., & Oakes, D. (1984). Analysis of survival data. Chapman and Hall.
Cramér, H. (1936). Über eine Eigenschaft der normalen Verteilungsfunktion. Mathematische Zeitschrift, 41, 405–414.
Das Gupta, A. (1986). Simultaneous estimation in the multiparameter gamma distribution under weighted quadratic losses. Annals of Statistics, 14, 206–219.
De Wet, T., & Venter, J. H. (1972). Asymptotic distributions of certain test criteria of normality. South African Statistical Journal, 6, 135–149.
Diaconis, P., & Holmes, S. (2004). Stein's method: Expository lectures and applications. Institute of Mathematical Statistics Lecture Notes—Monograph Series, 46, IMS, Hayward, CA.
Diaconis, P., Olkin, I., & Ghurye, S. G. (1977). Review of the book of Kagan, Linnik and Rao (1973). Annals of Statistics, 5, 583–592.
Doornik, J. A. (2007). Object-oriented matrix programming using Ox (3rd ed.). Timberlake Consultants Press.
Ebner, B. (2021). On combining the zero bias transform and the empirical characteristic function to test normality. Latin American Journal of Probability and Mathematical Statistics, 18, 1029–1045.
Ebner, B., & Henze, N. (2020). Tests for multivariate normality—A critical review with emphasis on weighted \(L^2\)-statistics. TEST, 29, 845–892.
Efron, B., & Morris, C. (1973). Stein’s estimation rule and its competitors—An empirical Bayes approach. Journal of the American Statistical Association, 68, 117–130.
Fathi, M., Goldstein, L., Reinert, G., & Saumard, A. (2022). Relaxing the Gaussian assumption in shrinkage and SURE in high dimension. Annals of Statistics, 50, 2737–2766.
Fourdrinier, D., Strawderman, W. E., & Wells, M. T. (2018). Shrinkage estimation. Springer Nature.
Ferguson, T. S. (1964). A characterization of the exponential distribution. Annals of Mathematical Statistics, 35, 1199–1207.
Ghosh, M., Kubokawa, T., & Datta, G. S. (2020). Density prediction and the Stein phenomenon. Sankhya, 82–A, 330–352.
Goldstein, L., & Reinert, G. (1997). Stein's method and the zero bias transformation with application to simple random sampling. Annals of Applied Probability, 7, 935–952.
Götze, F. (1991). On the rate of convergence in the multivariate CLT. Annals of Probability, 19, 724–739.
Gürtler, N., & Henze, N. (2000). Recent and classical goodness-of-fit tests for the Poisson distribution. Journal of Statistical Planning and Inference, 90, 207–225.
Hahn, G. J., & Shapiro, S. S. (1967). Statistical models in engineering. Wiley.
Henze, N., & Meintanis, G. (2005). Recent and classical tests for exponentiality: A partial review with comparisons. Metrika, 61, 29–45.
Henze, N., Meintanis, G., & Ebner, B. (2012). Goodness-of-fit tests for the gamma distribution based on the empirical Laplace transform. Communications in Statistics-Theory and Methods, 41, 1543–1556.
Henze, N., & Visagie, J. (2020). Testing for normality in any dimension based on a partial differential equation involving the moment generating function. Annals of the Institute of Statistical Mathematics, 72, 1109–1136.
Ho, S.-T., & Chen, L. H. (1978). An \(L_p\) bound for the remainder in a combinatorial central limit theorem. Annals of Probability, 6, 231–249.
Hudson, H. M. (1978). A natural identity for exponential families with applications in multiparameter estimation. Annals of Statistics, 6, 473–484.
James, W., & Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 361–379). University of California Press, Berkeley.
Kac, M. (1939). On a characterization of the normal distribution. American Journal of Mathematics, 61, 726–728.
Kagan, A. M., Linnik, Y. V., & Rao, C. R. (1965). On a characterization of the normal law based on a property of the sample average. Sankhya, A27, 405–406.
Kagan, A. M., Linnik, Y. V., & Rao, C. R. (1973). Characterization problems in mathematical statistics. Wiley.
Khatri, C. G., & Rao, C. R. (1968). Some characterizations of the gamma distribution. Sankhya, A30, 157–166.
Komaki, F. (2001). A shrinkage predictive distribution for multivariate normal observables. Biometrika, 88, 859–864.
Konno, Y. (2009). Shrinkage estimators for large covariance matrices in multivariate real and complex normal distributions under an invariant quadratic loss. Journal of Multivariate Analysis, 100, 2237–2253.
Kotz, S. (1974). Characterizations of statistical distributions: A supplement to recent surveys. International Statistical Review, 42, 39–65.
Kubokawa, T. (1991). An approach to improving the James–Stein estimator. Journal of Multivariate Analysis, 36, 121–126.
Kubokawa, T. (1994). A unified approach to improving equivariant estimators. Annals of Statistics, 22, 290–299.
Lehmann, E. L., & Romano, J. P. (2022). Testing statistical hypotheses (4th ed.). Springer.
Ley, C., & Swan, Y. (2013). Stein’s density approach and information inequalities. Electronic Communications in Probability, 18, 1–14.
Lukacs, E. (1942). A characterization of the normal distribution. Annals of Mathematical Statistics, 13, 91–93.
Lukacs, E. (1955). A characterization of the gamma distribution. Annals of Mathematical Statistics, 26, 319–324.
Madansky, A. (1988). Prescriptions for working statisticians. Springer.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.
Maruyama, Y., Kubokawa, T., & Strawderman, W. E. (2023). Stein estimation. Springer briefs in statistics. Springer.
Mecklin, C. J., & Mundfrom, D. J. (2004). An appraisal and Bibliography of tests for multivariate normality. International Statistical Review, 72, 123–138.
Mijburgh, P. A., & Visagie, I. J. (2020). An overview of goodness-of-fit tests for the Poisson distribution. South African Statistical Journal, 54, 207–230.
Ossai, E. O., Madukaife, M. S., & Oladugba, A. V. (2022). A review of tests for exponentiality with Monte Carlo comparisons. Journal of Applied Statistics, 49, 1277–1304.
Reinert, G. (2005). Three general approaches to Stein’s method. In A. D. Barbour & L. H. Y. Chen (Eds.), An introduction to Stein’s method. Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore (Vol. 4). Singapore University Press.
Robert, C. P. (2007). The Bayesian choice: From decision-theoretic foundations to computational implementation (2nd ed.). Springer.
Ruben, H. (1974). A new characterization of the normal distribution through the sample variance. Sankhya, A36, 379–388.
Shanbhag, D. N. (1970). The characterizations for exponential and geometric distributions. Journal of the American Statistical Association, 65, 1256–1259.
Shanbhag, D. N. (1970). Another characteristic property of the Poisson distribution. Proceedings of the Cambridge Philosophical Society, 68, 167–169.
Shapiro, S. S., & Francia, R. S. (1972). An approximate analysis of variance test for normality. Journal of the American Statistical Association, 67, 215–216.
Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality. Biometrika, 52, 591–611.
Shinozaki, N. (1984). Simultaneous estimation of location parameters under quadratic loss. Annals of Statistics, 12, 322–335.
Shorack, G. R. (2000). Probability for statisticians. Springer.
Stein, C. (1973). Estimation of the mean of a multivariate normal distribution. In Proceedings of the Prague symposium on asymptotic statistics (pp. 345–381).
Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Annals of Statistics, 9, 1135–1151.
Stein, C. (1986). Approximate computation of expectations. Institute of Mathematical Statistics Lecture Notes—Monograph Series (Vol. 7). IMS, Hayward, CA.
Strawderman, W. E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. Annals of Mathematical Statistics, 42, 385–388.
Thode, H. C. (2002). Testing for normality. Marcel Dekker.
Treutler, B. (1995). Tests for the Poisson distribution. Diploma Thesis. University of Karlsruhe (in German).
Tsui, K.-W. (1984). Robustness of Clevenson–Zidek-type estimators. Journal of the American Statistical Association, 79, 152–157.
Tsukuma, H., & Kubokawa, T. (2020). Shrinkage estimation for mean and covariance matrices. Springer briefs in statistics. Springer.
Acknowledgements
I would like to thank the Editor, the Associate Editor and the reviewer for many valuable comments and helpful suggestions which led to an improved version of this paper. I am also grateful to Dr. Ryo Imai for his valuable comments. This research was supported in part by Grant-in-Aid for Scientific Research (18K11188, 22K11928) from Japan Society for the Promotion of Science.
Funding
Open Access funding provided by The University of Tokyo.
Ethics declarations
Conflict of interest
The author declares that he has no conflict of interest.
Cite this article
Kubokawa, T. Stein’s identities and the related topics: an instructive explanation on shrinkage, characterization, normal approximation and goodness-of-fit. Jpn J Stat Data Sci 7, 267–311 (2024). https://doi.org/10.1007/s42081-023-00239-6