5.1 Introduction

In Chaps. 2 and 3 we studied estimators that improve over the “usual” estimator of the location vector for the case of a normal distribution. In this chapter, we extend the discussion to spherically symmetric distributions discussed in Chap. 4. Section 5.2 is devoted to a discussion of domination results for Baranchik type estimators while Sect. 5.3 examines more general estimators. Section 5.4 discusses Bayes minimax estimation. Finally, Sect. 5.5 discusses estimation with a concave loss.

We close this introductory section by extending the discussion of Sect. 2.2 on the empirical Bayes justification of the James-Stein estimator to the general multivariate (but not necessarily normal) case.

Suppose X has a p-variate distribution with density f(∥x − θ∥²), unknown location vector θ and known scale matrix σ²I_p. The problem is to estimate θ under loss L(θ, δ) = ∥δ − θ∥². Let the prior distribution on θ be given by π(θ) = f^{*n}(θ), the n-fold convolution of the density f(⋅) with itself. Note that the distribution of θ is the same as that of \(\sum _{i=1}^{n} Y_i \) where the Y_i are iid with density f(⋅). Recall from Example 1.3 that the Bayes estimator of θ is given by

$$\displaystyle \begin{aligned}\delta_n(X) = \frac{n}{n+1} X = \left(1 - \frac{1}{n+1}\right)X.\end{aligned}$$

Assume now that n is unknown. Since

$$\displaystyle \begin{aligned} E(X^{\scriptscriptstyle{\mathrm{T}}} X) = E\bigg(\sum_{i=0}^n Y^{\scriptscriptstyle{\mathrm{T}}} _i Y_i\bigg)= (n+1)\,E(Y^{\scriptscriptstyle{\mathrm{T}}} _0 Y_0) = (n+1)\,(\mathrm{tr}\, \sigma^2 I)= (n+1)\, p\,\sigma^2 \, , \end{aligned}$$

an unbiased estimator of n + 1 is XᵀX∕(p σ²), and so p σ²∕(XᵀX) is a reasonable estimator of 1∕(n + 1). Substituting p σ²∕(XᵀX) for 1∕(n + 1) in the Bayes estimator, we have that

$$\displaystyle \begin{aligned} \delta^{EB}(X) = \bigg(1 - \frac{p\,\sigma^2}{X^{\scriptscriptstyle{\mathrm{T}}} X}\bigg)X \end{aligned}$$

can be viewed as an empirical Bayes estimator of θ without any assumption on the form of the density (and in fact there is not even any need to assume there is a density). Hence this Stein-like estimator can be viewed as a reasonable alternative to X from an empirical Bayes perspective regardless of the form of the underlying distribution.
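
As a purely numerical illustration of this argument, the following Python sketch compares, by Monte Carlo, the quadratic risk of δ^EB(X) = (1 − p σ²∕XᵀX)X with that of X when the sampling distribution is a multivariate Student-t, a non-normal spherical law; the dimension p, the degrees of freedom ν, the simulation size and the grid of ∥θ∥ values are arbitrary choices made only for this illustration.

```python
import numpy as np

# Monte Carlo sketch (illustration only): risk of the empirical Bayes
# estimator (1 - p*sigma^2 / X'X) X versus X under squared error loss,
# for a multivariate Student-t sampling distribution.
rng = np.random.default_rng(0)
p, nu, n_rep = 10, 6, 200_000
sigma2 = nu / (nu - 2.0)               # per-coordinate variance of X given theta

def risks(theta_norm):
    theta = np.zeros(p)
    theta[0] = theta_norm
    # multivariate t: normal scale mixture with V ~ nu / chi^2_nu
    v = nu / rng.chisquare(nu, size=n_rep)
    x = theta + rng.standard_normal((n_rep, p)) * np.sqrt(v)[:, None]
    shrink = 1.0 - p * sigma2 / np.sum(x * x, axis=1)
    d_eb = shrink[:, None] * x
    risk_x = np.mean(np.sum((x - theta) ** 2, axis=1))
    risk_eb = np.mean(np.sum((d_eb - theta) ** 2, axis=1))
    return risk_x, risk_eb

for t in (0.0, 2.0, 5.0, 10.0):
    rx, reb = risks(t)
    print(f"||theta|| = {t:5.1f}:  R(X) = {rx:6.2f},  R(EB) = {reb:6.2f}")
```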

Note that Diaconis and Ylvisaker (1979) introduced the prior f^{*n}(θ) as a reasonable conjugate prior for location families since it gives linear Bayes estimators. Strawderman (1992) gave the above empirical Bayes argument. In the normal case the sequence of priors corresponds to that in Sect. 2.2.3 with τ² = n σ². The shrinkage factor p σ² in the present argument differs from (p − 2) σ² in the normal case since in this general case we use a “plug-in” estimator of 1∕(n + 1) as opposed to the unbiased estimator (in the normal case) of 1∕(σ² + τ²).

5.2 Baranchik-Type Estimators

In this section, assuming that X has a spherically symmetric distribution with mean vector θ and that the loss is L(θ, δ) = ∥δ − θ∥², we consider estimators of the Baranchik type, as in (2.19) in the normal setting, for different families of densities. In Sect. 5.3, we consider results for general estimators of the form X + g(X).

5.2.1 Variance Mixtures of Normal Distributions

We first consider spherically symmetric densities which are variance mixtures of normal distributions. Suppose

$$\displaystyle \begin{aligned} f(\Vert x - \theta\Vert^2) = \frac{1}{(2\pi)^{p/2}} \int_0^\infty \frac{1}{v^{p/2}} \exp\left\{-\frac{\Vert x - \theta\Vert^2}{2v}\right\} dG(v), \end{aligned} $$
(5.1)

where G(⋅) is a probability distribution on (0, ∞), i.e., a mixture of \(\mathcal {N}_p(\theta , vI)\) distributions with mixing distribution G(⋅).

Our first result gives a domination result for Baranchik type estimators for such distributions. This result is analogous to Theorem 2.3 in the normal case.

Theorem 5.1 (Strawderman 1974b)

Let X have density of the form (5.1) and let

$$\displaystyle \begin{aligned}\delta_{a,r}^B(X) = \bigg(1 - a \frac{r(\Vert X\Vert^2)}{\Vert X\Vert^2}\bigg) X,\end{aligned} $$

where the function r(⋅) is absolutely continuous. Assume the expectations E[V] and E[V⁻¹] are finite where V has distribution G. Then \(\delta _{a,r}^B(X)\) is minimax for the loss L(θ, δ) = ∥δ − θ∥² provided

  1. (1)

    0 ≤ a ≤ 2(p − 2)∕E[V⁻¹],

  2. (2)

    0 ≤ r(t) ≤ 1 for any t ≥ 0,

  3. (3)

    r(t) is nondecreasing in t, and

  4. (4)

    r(t)∕t is nonincreasing in t.

Furthermore, \(\delta _{a,r}^B(X)\) dominates X provided the inequalities in (1) or (2) (on a set of positive measure) are strict or r(t) is strictly increasing on a set of positive measure.

Proof

The proof proceeds by calculating the conditional risk given V = v, noting that the distribution of X|V = v is normal N(θ, vI_p). First note that E[V] < ∞ is equivalent to E_0[∥X∥²] < ∞ so that the risk of X is finite. Similarly, it can be seen that E[V⁻¹] < ∞ if and only if E_0[∥X∥⁻²] < ∞. Then, thanks to (2), we have E_0[r²(∥X∥²)∥X∥⁻²] < ∞. Actually, we will see below that, for any θ, E_θ[∥X∥⁻²] ≤ E_0[∥X∥⁻²], and hence E_θ[r²(∥X∥²)∥X∥⁻²] < ∞, which guarantees that the risk of \(\delta _{a,r}^B(X)\) is finite. Note that, conditionally on V, ∥X∥²∕V has a noncentral chi-square distribution with p degrees of freedom and noncentrality parameter ∥θ∥²∕V. Hence, since the family of noncentral chi-square distributions has monotone (increasing) likelihood ratio in the noncentrality parameter (and is therefore stochastically increasing in it), ∥X∥²∕V is (conditionally) stochastically decreasing in V and increasing in ∥θ∥².

Hence,

$$\displaystyle \begin{aligned} E_\theta \left[ \frac{1}{\Vert X \Vert ^2/V} \right] \leq E_0 \left[ \frac{1}{\Vert X \Vert ^2 / V} \right] \end{aligned}$$

and, as a result,

$$\displaystyle \begin{aligned} E_\theta \left[ \frac{1}{\Vert X \Vert ^2} \right] & = E \left[ E_\theta \left[\frac{1}{\Vert X \Vert ^2} \bigg| V \right] \right]\\ &= E \left[ \frac{1}{V} E_\theta \left[ \frac{1}{\Vert X \Vert ^2/V} \bigg| V \right] \right]\\ & \leq E \left[ \frac{1}{V} E_0 \left[ \frac{1}{\Vert X \Vert ^2/V} \right] \right]\\ & = E_0 \left[ \frac{1}{\Vert X \Vert ^2} \right]. \end{aligned} $$

This suffices to establish finiteness of the risk of \(\delta _{a,r}^B(X)\). We now deal with the main part of the theorem. Using Corollary 2.1 and Theorem 2.3, we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} R(\theta, \delta_{a,r}^B) &\displaystyle =&\displaystyle E\{ E[ \Vert \delta_{a,r}^B(X) - \theta\Vert^2\, | V]\}\\ &\displaystyle =&\displaystyle E\bigg\{ E\bigg[ \Vert X-\theta\Vert^2 + V^2\bigg(\frac{a^2r^2(\Vert X\Vert^2)}{V^2\Vert X\Vert^2} - \frac{2a (p-2)}{V} \frac{r(\Vert X\Vert^2)}{\Vert X\Vert^2}\bigg)\\ &\displaystyle &\displaystyle \qquad \qquad - 4 \, a V r^\prime (\Vert X\Vert^2) \bigg| V \bigg] \bigg\}\\ &\displaystyle \le&\displaystyle R(\theta, X) + E\bigg\{ a E\bigg[\frac{r(\Vert X\Vert^2)}{\Vert X\Vert^2/V} \bigg| V\bigg] \bigg(\frac{a}{V} - 2(p-2)\bigg)\bigg\}, \end{array} \end{aligned} $$
(5.2)

since r²(∥X∥²) ≤ r(∥X∥²) and r′(∥X∥²) ≥ 0. Now, as a consequence of the above monotone likelihood ratio property, ∥X∥²∕V is stochastically decreasing in V. It follows that the conditional expectation in (5.2) is nondecreasing in V since, if v₁ < v₂, we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} E\bigg[\frac{r(\Vert X\Vert^2)} {\Vert X\Vert^2/V} \bigg| V = v_1\bigg] &\displaystyle =&\displaystyle E\bigg[\frac{r\big(v_1 \frac{\Vert X\Vert^2} {V} \big)} {\Vert X\Vert^2/V} \bigg| V = v_1\bigg] \\ &\displaystyle \le&\displaystyle E\bigg[\frac{r\big(v_2 \frac{\Vert X\Vert^2} {V} \big)} {\Vert X\Vert^2/V} \bigg| V = v_1\bigg] \\ &\displaystyle \le&\displaystyle E\bigg[\frac{r\big(v_2 \frac{\Vert X\Vert^2} {V} \big)} {\Vert X\Vert^2/V} \bigg| V = v_2\bigg] \\ &\displaystyle =&\displaystyle E\bigg[\frac{r(\Vert X\Vert^2)} {\Vert X\Vert^2/V} \bigg| V = v_2\bigg]. \end{array} \end{aligned} $$

The first inequality follows since r(⋅) is nondecreasing, while the second follows since r(t)∕t is nonincreasing and ∥X∥²∕V is stochastically decreasing in V. Finally, using the fact that aV⁻¹ − 2(p − 2) is decreasing in V, and the fact that E[g(Y)h(Y)] ≤ E[g(Y)]E[h(Y)] if g and h are monotone in opposite directions, it follows that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} R(\theta, \delta_{a,r}^B) &\displaystyle \le&\displaystyle R(\theta, X) + a E\bigg[\frac{V r(\Vert X\Vert^2)}{\Vert X\Vert^2}\bigg] E\bigg[\frac {a}{V} - 2(p-2)\bigg] \\ &\displaystyle \le&\displaystyle R(\theta, X) \end{array} \end{aligned} $$
(5.3)

by assumption (1). Hence \(\delta _{a,r}^B(X)\) is minimax, since X is minimax.

The dominance result follows since the inequality in (5.2) is strict if there is strict inequality in (2) or if r′(⋅) is strictly positive on a set of positive measure, and the inequality in (5.3) is strict if the inequalities in (1) are strict. □

Example 5.1 (The multivariate Student-t distribution)

If V has an inverse Gamma(ν∕2, ν∕2) distribution (that is, \(V \sim \nu/\chi _{\nu}^2\)), then the distribution of X is a multivariate Student-t distribution with ν degrees of freedom. Since \(E[V] = E[\nu/\chi _{\nu}^2] = \nu/(\nu - 2)\) for ν > 2 and \(E[V^{-1}] = E[\chi _{\nu}^2/\nu]=1\), the conditions of Theorem 5.1 require 0 ≤ a ≤ 2(p − 2) and ν > 2.
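
A quick Monte Carlo check of these two moments (illustration only; the value ν = 5 and the simulation size are arbitrary):

```python
import numpy as np

# For V ~ nu / chi^2_nu, i.e. V inverse Gamma(nu/2, nu/2), check that
# E[V] = nu/(nu - 2) and E[1/V] = 1, so the Theorem 5.1 bound
# 2(p - 2)/E[1/V] is simply 2(p - 2).
rng = np.random.default_rng(1)
nu = 5.0
v = nu / rng.chisquare(nu, size=1_000_000)
print("E[V]   ~", v.mean(), "   exact:", nu / (nu - 2.0))
print("E[1/V] ~", (1.0 / v).mean(), "   exact:", 1.0)
```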

Example 5.2 (Examples of the function r(t))

The James-Stein estimator has r(t) ≡ 1 and hence satisfies conditions (2), (3) and (4) of Theorem 5.1. Also r(t) = t∕(t + b) satisfies these conditions. Similarly, the positive-part James-Stein estimator (1 − a∕XᵀX)⁺ X is such that

$$\displaystyle \begin{aligned}r(t) = \begin{cases} t/a & \text{for }0 \leq t \leq a\\ 1 & \text{for }t \geq a \end{cases}\end{aligned} $$

and

$$\displaystyle \begin{aligned}\frac{r(t)}{t} = \begin{cases} 1/a & \text{for }0 \leq t \leq a\\ 1/t & \text{for }t \geq a \end{cases}\end{aligned} $$

hence also satisfies the conditions (2), (3) and (4) of Theorem 5.1.
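
The following short Python sketch implements the Baranchik-type estimator for these three choices of r(⋅); the data vector and the constants a and b are hypothetical values chosen only for illustration.

```python
import numpy as np

def baranchik(x, a, r):
    """Baranchik-type estimator (1 - a * r(||x||^2)/||x||^2) x."""
    t = float(np.dot(x, x))
    return (1.0 - a * r(t) / t) * x

def r_js(t):                  # James-Stein: r(t) = 1
    return 1.0

def r_smooth(t, b=1.0):       # r(t) = t/(t + b)
    return t / (t + b)

def r_pospart(t, a):          # positive-part James-Stein: r(t) = min(t/a, 1)
    return min(t / a, 1.0)

x = np.array([2.0, -1.0, 0.5, 3.0, -2.5])
a = 2 * (len(x) - 2)          # the bound 2(p - 2) of Theorem 5.1 when E[1/V] = 1
print(baranchik(x, a, r_js))
print(baranchik(x, a, r_smooth))
print(baranchik(x, a, lambda t: r_pospart(t, a)))
```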

It is worth noting, and easy to see, that if the sampling distribution is N(θ, I_p) and the prior distribution is any variance mixture of normal distributions as in (3.4), then, in the Baranchik representation of the Bayes estimator (see Corollary 3.1), the function r(t)∕t is always nonincreasing. This fact leads to the following observation on the (sampling distribution) robustness of Bayes minimax estimators for a normal sampling distribution. If δ_π(X) = (1 − a r(∥X∥²)∕∥X∥²)X is a Bayes minimax estimator with respect to a scale mixture of normal priors for a N(θ, I_p) sampling distribution, and if r(t) is nondecreasing, this Bayes minimax estimator remains minimax for a multivariate-t sampling distribution as in Example 5.1 as long as the number of degrees of freedom is greater than two.

It is also interesting to note that, in general, there will be no uniformly optimal choice of the shrinkage constant “a” in the James-Stein estimator if the mixing distribution G(⋅) is nondegenerate. The optimal choice will typically depend on ∥θ∥². This is in contrast to the normal sampling distribution case, where G(⋅) is degenerate, and where the optimal choice is a = (p − 2)σ².

5.2.2 Densities with Tails Flatter Than the Normal

In this section we consider the subclass of spherically symmetric densities f(∥x − θ∥²) such that, for any t ≥ 0 for which f(t) > 0,

$$\displaystyle \begin{aligned} \frac{F(t)}{f(t)} \ge c > 0 \end{aligned} $$
(5.4)

for some fixed positive c, where

$$\displaystyle \begin{aligned} F(t) = \frac{1}{2} \int_t^\infty f(u) du.\end{aligned} $$
(5.5)

This class was introduced in Berger (1975) (without the constant 1/2 multiplier).

This class of densities contains a large subclass of variance mixtures of normal densities but also many others. The following lemma gives some conditions which guarantee inclusion or exclusion from the class satisfying (5.4) and (5.5).

Lemma 5.1

Suppose X has density f(∥x − θ∥²).

  1. (1)

    (Mixture of normals). If, for some distribution G on (0, ∞),

    $$\displaystyle \begin{aligned}f(\Vert x - \theta\Vert^2) = \bigg(\frac{1}{\sqrt {2\pi}}\bigg)^p \int_0^\infty v^{-p/2} \exp \left\{- \frac{\Vert x -\theta\Vert^2}{2v}\right\} d G(v)\end{aligned} $$

    where \(E[V^{-p/2}]\) is finite, E denoting the expectation with respect to G, then f(⋅) is in the class (5.4) with \(c = E[V^{-p/2+1}]/E[V^{-p/2}]\) for p ≥ 3.

  2. (2)

    If \(f(t) = h(t)\,e^{-at}\) with h(t) nondecreasing, then f(⋅) is in the class (5.4).

  3. (3)

    If \(f(t) = e^{-atg(t)}\) where g(t) is nondecreasing and \(\lim_{t\to\infty} g(t) = \infty\), then f(t) is not in the class (5.4).

Proof

(1) Applying the definition of F in (5.5) we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} F(t) &\displaystyle =&\displaystyle \frac{1}{2}\int_t^\infty f(u) du\\ &\displaystyle =&\displaystyle \frac{1}{2(\sqrt{2\pi})^p} \int_t^\infty \int_0^\infty v^{-p/2} \exp \left\{-u/2v\right\} dG(v) du \\ &\displaystyle =&\displaystyle \frac{1}{(\sqrt{2\pi})^p} \int_0^\infty v^{-p/2+1} \exp \left\{-t/2v\right\} dG(v).\vspace{-2pt} \end{array} \end{aligned} $$

Hence the ratio in (5.4) equals

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \frac{F(t)}{f(t)} &\displaystyle =&\displaystyle \frac{\int_0^\infty v^{-p/2+1} \, \exp \left\{-t/2v\right\} \, dG(v)} {\int_0^\infty v^{-p/2} \exp \left\{-t/2v\right\} dG(v)} \\ &\displaystyle \geq&\displaystyle \frac{\int_0^\infty v^{-p/2+1} \, dG(v)} {\int_0^\infty v^{-p/2} dG(v)} \\ &\displaystyle =&\displaystyle \frac{E[V^{-p/2+1}]}{E[V^{-p/2}]}.\vspace{-2pt} \end{array} \end{aligned} $$
(5.6)

The inequality follows since the family of densities proportional to the function \(v \mapsto v^{-p/2} \, \exp \left \{-t/2v\right \}\) has monotone (increasing) likelihood ratio in the parameter t. Note that if p ≥ 3, \(E[V^{-p/2}] < \infty\) implies \(E[V^{-p/2+1}] < \infty\). This completes the proof of (1).

(2) In this case it follows

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{F(t)}{f(t)} &\displaystyle =&\displaystyle \frac{ \frac{1}{2} \int_t^\infty h(u) e^{-au} du} { h(t) e^{-at}}\\ &\displaystyle \ge&\displaystyle \frac{1}{2} \int_t^\infty e^{-a(u-t)} du\\ &\displaystyle =&\displaystyle \frac{1}{2a}. \end{array} \end{aligned} $$

Hence (5.4) is satisfied with c = 1∕(2a), which proves (2).

(3) In this case it follows

$$\displaystyle \begin{aligned} \begin{array}{rcl} 2 \lim_{t\to\infty} \frac{F(t)}{f(t)} &\displaystyle =&\displaystyle \lim_{t\to\infty} \frac{\int_t^\infty \exp \left\{-aug(u)\right\} du} {\exp \{-atg(t)\}}\\ &\displaystyle =&\displaystyle \lim_{t\to\infty} \int_t^\infty \exp \left\{-aug (u) + atg(t)\right\} du\\ &\displaystyle =&\displaystyle \lim_{t\to\infty} \int_0^\infty \exp \left\{-a(u+t)g(u+t) + atg(t)\right\} du\\ &\displaystyle \le&\displaystyle \lim_{t\to\infty} \int_0^\infty \exp \left\{-aug(t)\right\} du\\ &\displaystyle =&\displaystyle \lim_{t\to\infty} \frac{1}{ag(t)}\\ &\displaystyle =&\displaystyle 0. \end{array} \end{aligned} $$

Hence f(t) is not in the class (5.4), which proves (3). □

Part (2) of the lemma shows that densities with tails flatter than the normal (and including the normal) are in the class (5.4), while densities with tails “sufficiently lighter” than the normal are not included. Also the condition in part (3) is stronger than necessary in that it suffices that the condition hold only for all t larger than some positive K. See Berger (1975) for further details and discussion.

Example 5.3

Some specific examples in the class (5.4) include (see Berger 1975 for more details)

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathrm{(1)}\qquad f(t) &\displaystyle =&\displaystyle K/\mathrm{cosh} t\qquad \left(c\approx 1/2\right)\\ \mathrm{(2)} \qquad f(t) &\displaystyle =&\displaystyle K t(1+t^2)^{-m}\,\, \mathrm{with}\,\, m > p/4\qquad \left(c = m/2\right) \\ \mathrm{(3)} \qquad f(t) &\displaystyle =&\displaystyle K e^{-\alpha t -\beta}\big/(1 + e^{-\alpha t -\beta})^2\qquad \left(c = \alpha/2 \right)\\ \mathrm{(4)} \qquad f(t) &\displaystyle =&\displaystyle K t^n e^{-t/2}\,\, \mathrm{for}\,\, n\ge 0\qquad (c = 1). \end{array} \end{aligned} $$

The latter two distributions are known as the logistic type and the Kotz distribution, respectively.
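
As a numerical illustration of the last entry, the following sketch evaluates the ratio F(t)∕f(t) for the Kotz-type density f(t) = K tⁿ e^{−t∕2} with n = 1 (the constant K cancels); the computed values stay above c = 1, in agreement with part (2) of Lemma 5.1. The grid of t values is arbitrary.

```python
import numpy as np
from scipy.integrate import quad

def ratio(t, n=1.0):
    # F(t)/f(t) with F(t) = (1/2) * integral_t^infty f(u) du
    f = lambda u: u**n * np.exp(-u / 2.0)
    tail, _ = quad(f, t, np.inf)
    return 0.5 * tail / f(t)

for t in (0.5, 1.0, 2.0, 5.0, 10.0, 50.0):
    print(f"t = {t:5.1f}:  F(t)/f(t) = {ratio(t):.4f}")
```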

The following lemma plays the role of Stein’s lemma (Theorem 2.1) for the family of spherically symmetric densities.

Lemma 5.2

Let X have density f(∥x − θ∥²) and let g(X) be a weakly differentiable function such that \(E_\theta[ |(X-\theta)^{\scriptscriptstyle{\mathrm{T}}} g(X)| ] < \infty\). Then

$$\displaystyle \begin{aligned} \begin{array}{rcl} E_\theta[(X-\theta)^{\scriptscriptstyle{\mathrm{T}}} g(X)] &\displaystyle =&\displaystyle E_\theta\bigg[ \frac{F(\Vert X-\theta\Vert^2)} {f(\Vert X -\theta\Vert^2)} \mathrm{div}\, g(X)\bigg] \\ &\displaystyle =&\displaystyle C \, E_\theta^*\bigg[\mathrm{div}\, g(X)\bigg] \end{array} \end{aligned} $$

where F(t) is defined as in (5.5) and \(E_\theta ^*\) denotes expectation with respect to the density

$$\displaystyle \begin{aligned} x \mapsto \frac{1}{C} \, F(\Vert x - \theta\Vert^2) \end{aligned}$$

and where it is assumed that

$$\displaystyle \begin{aligned} C = \int_{\mathbb{R}^p} F(\Vert x - \theta\Vert^2) \, dx < \infty \, . \end{aligned}$$

Proof

Note that the existence of the expectations in Lemma 5.2 will be guaranteed for any function g(x) such that E_θ[∥g(X)∥²] < ∞ as soon as E_0[∥X∥²] < ∞. The proof will follow along the lines of Sect. 2.4, making use of Stokes' theorem. It follows that

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle E[(X - \theta)^{\scriptscriptstyle{\mathrm{T}}} g(X)]\\ &\displaystyle &\displaystyle \quad = \int_{R^p} (x-\theta)^{\scriptscriptstyle{\mathrm{T}}} g(x) f(\Vert x-\theta\Vert^2) \,dx\\ &\displaystyle &\displaystyle \quad = \int_0^\infty \int_{S_{R,\theta}} (x-\theta)^{\scriptscriptstyle{\mathrm{T}}} g(x) \,f(\Vert x-\theta\Vert^2) \,d\sigma_{R,\theta}(x) \,dR\;\; (\text{by Lemma 1.4}) \end{array} \end{aligned} $$
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \int_0^\infty \int_{S_{R,\theta}} \bigg(\frac{x-\theta} {\Vert x-\theta\Vert}\bigg)^{\scriptscriptstyle{\mathrm{T}}} g(x)\, d\sigma_{R,\theta}(x) \,R\,f(R^2)\,dR \\ &\displaystyle =&\displaystyle \int_0^\infty \int_{B_{R,\theta}} \mathrm{div}\, g(x) \,dx\, R \, f(R^2) \,dR \qquad (\text{Stokes' theorem}) \\ &\displaystyle =&\displaystyle \int_{R^p} \mathrm{div}\, g(x) \int_{\Vert x-\theta\Vert}^\infty Rf(R^2)\, dR \,dx\qquad (\text{Fubini's theorem}) \\ &\displaystyle =&\displaystyle \int_{R^p} \mathrm{div}\, g(x) \, F(\Vert x-\theta\Vert^2) \,dx \\ &\displaystyle =&\displaystyle E_\theta\bigg[ \mathrm{div}\, g(X) \frac{ F(\Vert X-\theta\Vert^2)} {f(\Vert X-\theta\Vert^2)} \bigg] \\ &\displaystyle =&\displaystyle C \, E_\theta^*\big[\mathrm{div}\, g(X)\big].\vspace{-3pt} \end{array} \end{aligned} $$

Now, with the important analog of Stein’s lemma in hand, we can extend some of the minimaxity results from the Gaussian setting to the case of spherically symmetric distributions. The following result gives conditions for minimaxity of estimators of the Baranchik type.

Theorem 5.2

Let X have density f(∥x − θ∥²) which satisfies (5.4) for some 0 < c < ∞. Assume also that E_0[∥X∥²] < ∞ and E_0[∥X∥⁻²] < ∞. Let

$$\displaystyle \begin{aligned} \delta_{a,r}^B(X) = \bigg(1 - \frac{a \,r(\Vert X\Vert^2)} {\Vert X\Vert^2}\bigg) X\end{aligned} $$

where r(⋅) is absolutely continuous. Then \(\delta _{a,r}^B(X)\) is minimax for p ≥ 3 provided

  1. (1)

    0 < a ≤ 2 c (p − 2),

  2. (2)

    0 ≤ r(t) ≤ 1, and

  3. (3)

    r(⋅) is nondecreasing.

Furthermore \( \delta _{a,r}^B(X) \) dominates X provided the inequalities in (1) or in (2) (on a set of positive measure) are strict, or r′(⋅) is strictly positive on a set of positive measure.

Proof

We note that the conditions ensure finiteness of the risk so that Lemma 5.2 is applicable. Hence we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} R(\theta, \delta_{a,r}^B) &\displaystyle =&\displaystyle E_\theta\bigg[ \Vert X-\theta\Vert^2 + \frac{a^2 r^2(\Vert X\Vert^2)} {\Vert X\Vert^2} - 2 \, \frac{a \, r(\Vert X\Vert^2)X^{\scriptscriptstyle{\mathrm{T}}} (X-\theta)} {\Vert X\Vert^2}\bigg]\ \\ &\displaystyle =&\displaystyle R(\theta, X) + a \, E_\theta\bigg[\frac{ar^2(\Vert X\Vert^2)} {\Vert X\Vert^2} - 2 \, \mathrm{div}\, \bigg(\frac{r(\Vert X\Vert^2)X} {\Vert X\Vert^2}\bigg) \frac{F(\Vert X-\theta\Vert^2)} {f(\Vert X-\theta\Vert^2)}\bigg]\vspace{-4pt} \end{array} \end{aligned} $$

by Lemma 5.2. Therefore the risk difference between \(\delta _{a,r}^B(X)\) and X equals

$$\displaystyle \begin{aligned} \begin{array}{rcl} \varDelta_\theta &\displaystyle =&\displaystyle a \, E_\theta\bigg[\frac{ar^2(\Vert X\Vert^2)} {\Vert X\Vert^2} - \bigg(\frac{2(p-2)r(\Vert X\Vert^2)} {\Vert X\Vert^2} + 4 \, r^\prime (\Vert X\Vert^2)\bigg) \frac{F(\Vert X-\theta\Vert^2)} {f(\Vert X-\theta\Vert^2)}\bigg]\\ &\displaystyle \le&\displaystyle a \, E_\theta\bigg[\frac{r(\Vert X\Vert^2)} {\Vert X\Vert^2} \bigg(a - 2(p-2) \, \frac{F(\Vert X-\theta\Vert^2)} {f(\Vert X-\theta\Vert^2)}\bigg)\bigg]\\ &\displaystyle \le&\displaystyle a \, E_\theta\bigg[\frac{r(\Vert X\Vert^2)} {\Vert X\Vert^2} \big(a - 2(p-2) \, c \big)\bigg]\\ &\displaystyle \le&\displaystyle 0. \vspace{-4pt} \end{array} \end{aligned} $$

The domination part follows as in Theorem 5.1. □

Theorem 5.2 applies to certain densities for which Theorem 5.1 is not applicable and additionally lifts the restriction that r(t)∕t be nonincreasing. However, if the density is a mixture of normals, and both theorems apply, the shrinkage constant “a” allowed by Theorem 5.1 (with a = 2(p − 2)∕E[V⁻¹]) is strictly larger than that for Theorem 5.2 (with a = 2(p − 2)c) whenever the mixing distribution G(⋅) is not degenerate. To see this note that

$$\displaystyle \begin{aligned}\frac{1}{E[V^{-1}]} > c = \frac{E[V^{-p/2 +1}]}{E[V^{-p/2}]} \end{aligned}$$

or equivalently

$$\displaystyle \begin{aligned}E[V^{-p/2}] > E[V^{-1}] E[V^{-p/2 +1}] \end{aligned}$$

whenever the positive random variable V is non-degenerate. Note also that E[V⁻¹] < ∞ whenever \(E[V^{-p/2}] < \infty\) and p ≥ 3.

Example 5.4 (The multivariate Student-t distribution, continued)

Suppose X has a p-variate Student-t distribution with ν degrees of freedom as in Example 5.1, so that V has an inverse Gamma(ν∕2, ν∕2) distribution. In this case

$$\displaystyle \begin{aligned}E[V^{-p/2}] = \frac{2^{p/2} \varGamma\big(\frac{p+\nu}{2}\big)} {\nu^{p/2} \varGamma\big(\frac{\nu}{2}\big)} \end{aligned}$$

which is finite for all ν > 0 and p > 0.

The bound on the shrinkage constant, “a”, in Theorem 5.1 is 2(p − 2) as shown in Example 5.1, while the bound on “a”, in Theorem 5.2, as indicated above, is given by

$$\displaystyle \begin{aligned}2(p-2) \frac{E[V^{-p/2+1}]} {E[V^{-p/2}]} = 2(p-2) \bigg(\frac{\nu}{\nu +p-2}\bigg) < 2(p-2). \end{aligned}$$

Hence, for large p, the bound on the shrinkage factor “a” can be substantially less for Theorem 5.2 than for Theorem 5.1 in the case of a multivariate-t sampling distribution. Note that, for fixed p, as ν tends to infinity the smaller bound tends to the larger one (and the Student-t distribution tends to the normal).
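
The following small sketch tabulates the two bounds, 2(p − 2) from Theorem 5.1 and 2(p − 2) ν∕(ν + p − 2) from Theorem 5.2, for a few arbitrary values of p and ν; it shows both the gap for large p and the convergence of the two bounds as ν grows.

```python
# Shrinkage bounds for a p-variate Student-t sampling distribution
# (the values of p and nu below are arbitrary illustration choices).
for p in (5, 10, 50):
    for nu in (5, 20, 200):
        b1 = 2 * (p - 2)                       # Theorem 5.1
        b2 = 2 * (p - 2) * nu / (nu + p - 2)   # Theorem 5.2
        print(f"p = {p:3d}, nu = {nu:4d}:  Thm 5.1 bound = {b1:6.1f},  "
              f"Thm 5.2 bound = {b2:6.2f}")
```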

Example 5.5 (Example 5.3 continued)

All of the distributions in Example 5.3 satisfy the assumptions of Theorem 5.2 (under suitable moment conditions for the second density). It is interesting to note that for the Kotz distribution, the value of c (= 1), as in (5.4), doesn't depend on the parameter n > 0. Hence the bound on the shrinkage factor “a” is 2(p − 2) and is also independent of n, indicating a certain distributional robustness of the minimaxity property of Baranchik type estimators with a < 2(p − 2).

With additional assumptions on the function F(t)∕f(t) in (5.4) (i.e. it is either monotone increasing or monotone decreasing), theorems analogous to Theorem 5.2 can be developed which further improve the bounds on the shrinkage factor “a”. These typically may involve additional assumptions on the function r(⋅). We will see examples of this type in the next section.

5.3 More General Minimax Estimators

We now consider minimaxity of general estimators of the form X + a g(X). The initial results rely on Lemma 5.2. The first result follows immediately from this lemma and gives an expression for the risk.

Corollary 5.1

Let X have a density f(∥x − θ∥²) such that E_0[∥X∥²] < ∞ and let g(X) be weakly differentiable and be such that E_θ[∥g(X)∥²] < ∞.

Then, for loss L(θ, δ) = ∥δ − θ∥², the risk of X + a g(X) can be expressed as

$$\displaystyle \begin{aligned} R(\theta, X + a\,g(X)) = R(\theta, X) + E_\theta\big[a^2\, \Vert g(X)\Vert^2 + 2\,a\, Q(\Vert X-\theta\Vert^2)\, \,\mathrm{div}\, g(X)\big] \end{aligned} $$
(5.7)

where

$$\displaystyle \begin{aligned} Q(\Vert X-\theta\Vert^2) = \frac{F(\Vert X-\theta\Vert^2)} {f(\Vert X-\theta\Vert^2)} \end{aligned} $$
(5.8)

and where F(∥X − θ∥²) is defined in (5.5).

An immediate consequence of Corollary 5.1, when the density f satisfies (5.4), i.e. Q(t) ≥ c > 0 for some constant c, is the following.

Corollary 5.2

Under the assumptions of Corollary 5.1, assume that, for some c > 0, we have Q(t) ≥ c for any t ≥ 0. Then X + g(X) is minimax and dominates X provided, for any \(x\in \mathbb {R}^p\),

$$\displaystyle \begin{aligned}\Vert g(x)\Vert^2 + 2\, c \, \,\mathrm{div}\,\, g(x) \le 0\end{aligned}$$

with strict inequality on a set of positive measure.

The following two theorems establish minimaxity results under the assumption that Q(t) is monotone.

Theorem 5.3 (Brandwein et al. 1993)

Suppose X has density f(∥x − θ∥²) such that E_0[∥X∥²] < ∞ and that Q(t) in (5.8) is nonincreasing. Suppose there exists a nonpositive function h(⋅) such that E_{R,θ}[h(U)] is nondecreasing in R, where \(U \sim \mathcal{U}_{R,\theta}\) (the uniform distribution on the sphere of radius R centered at θ), and such that E_θ[|h(X)|] < ∞. Furthermore suppose that g(X) is weakly differentiable and also satisfies

  1. (1)

    div g(X) ≤ h(X),

  2. (2)

    ∥g(X)∥² + 2 h(X) ≤ 0, and

  3. (3)

    0 ≤ a ≤ E_0(∥X∥²)∕p.

Then δ(X) = X + ag(X) is minimax. Also δ(X) dominates X provided g(⋅) is nonzero with positive probability and strict inequality holds with positive probability in (1) or (2), or both inequalities are strict in (3).

Proof

Note that g(x) satisfies the conditions of Corollary 5.1. Then we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} R(\theta,\delta) &\displaystyle =&\displaystyle R(\theta, X) + a\, E[a\, \Vert g(X)\Vert^2 + 2\,Q(\Vert X-\theta\Vert^2)\, \,\mathrm{div}\, g(X)] \\ &\displaystyle =&\displaystyle R(\theta, X) + a\, E [ E_{R,\theta} [ a\, \Vert g(X)\Vert^2 + 2\,Q\,(R^2)\, \mathrm{div}\, g(X) ]] \end{array} \end{aligned} $$

where E_{R,θ} is as above and E denotes the expectation with respect to the radial distribution. Now, using (1) and (2), we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} R(\theta, \delta) &\displaystyle \le&\displaystyle R(\theta, X) + a\,E [E_{R,\theta} [ - 2\,a\,h(X) + 2\,Q(R^2)\, h(X)]] \\ &\displaystyle =&\displaystyle R(\theta, X) + 2\,a\,E[(a - Q(R^2))\, E_{R,\theta}[- h(X)]] \\ &\displaystyle \le&\displaystyle R(\theta, X) + 2\,a\,E[a - Q(R^2)] \,E_\theta [- h(X)] \end{array} \end{aligned} $$

by the monotonicity assumptions on E_{R,θ}[h(⋅)] and Q(t), as well as the covariance inequality.

Hence, since − h(X) ≥ 0, we have R(θ, δ) ≤ R(θ, X) provided 0 ≤ a ≤ E[Q(R²)]. Now E[Q(R²)] = E_0[∥X∥²]∕p by Lemma 5.3 below, hence δ is minimax. The domination result follows since the additional conditions imply strict inequality between the risks. □

Lemma 5.3

For any k > −p such that \(E[R^{k+2}] < \infty\),

$$\displaystyle \begin{aligned}E[R^k Q(R^2)] = \frac{1}{p+k}\, E[R^{k+2}].\end{aligned}$$

In particular, we have

$$\displaystyle \begin{aligned}E[Q(R^2)] = \frac{1}{p}\, E[R^2] = \frac{1}{p} \,E_0 [\Vert X\Vert^2]\end{aligned}$$

and, for p ≥ 3,

$$\displaystyle \begin{aligned}E\bigg[\frac{Q(R^2)}{R^2}\bigg] = \frac{1}{p-2}.\end{aligned}$$

Proof

Recall that the radial density φ(r) of R = ∥X − θ∥ can be expressed as \(\varphi(r) = \sigma(S)\,r^{p-1} f(r^2)\) where σ(S) is the area of the unit sphere S in \(\mathbb{R}^p\). By (5.8) and (5.5), we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} E[R^k Q(R^2)] &\displaystyle =&\displaystyle \frac{1}{2} \int_{R^p} \Vert x\Vert^k \int^\infty_{\Vert x\Vert^2} f(t)\,dt\, dx \\ &\displaystyle =&\displaystyle \frac{1}{2} \int_0^\infty \int_{B_{\sqrt{t}}} \Vert x\Vert^k dx\, f(t)\, dt\quad \text{by Fubini's theorem}\\ &\displaystyle =&\displaystyle \frac{1}{2} \int_0^\infty \int_0^{\sqrt{t}} \sigma(S)\, r^{k+p-1} dr \,f(t)\,\, dt\quad \text{by Lemma 1.4}\\ &\displaystyle =&\displaystyle \frac{1}{2} \int_0^\infty \sigma (S)\, \frac{t^{(k+p)/2}}{k+p}\, f(t)\, dt\\ &\displaystyle =&\displaystyle \frac{1}{k+p} \int_0^\infty r^{k+2} \varphi(r) \,dr\quad \text{by the change of variable} \,\,\, t = r^2\\ &\displaystyle =&\displaystyle \frac{1}{k+p} \,E[R^{k+2}]. \end{array} \end{aligned} $$

Note that positivity of the integrands and \(E[R^{k+2}] < \infty\) imply \(E[R^k Q(R^2)] < \infty\). □

The next theorem reverses the monotonicity assumption on Q(⋅) and changes the condition on the function h(X) which, in turn, bounds the divergence of g(X).

Theorem 5.4 (Brandwein et al. 1993)

Suppose X has a density f(∥x − θ∥²) such that E_0[∥X∥²] < ∞ and E_0[1∕∥X∥²] < ∞ and such that Q(t) in (5.8) is nondecreasing. Suppose there exists a nonpositive function h(⋅) such that \(E_{R,\theta } \left [R^2 h(U)\right ]\) is nonincreasing in R, where \(U \sim \mathcal{U}_{R,\theta}\), and such that E_θ[−h(X)] < ∞.

Furthermore suppose that g(X) is weakly differentiable and also satisfies

  1. (1)

    div g(X) ≤ h(X),

  2. (2)

    ∥g(X)∥² + 2 h(X) ≤ 0, and

  3. (3)

    \(0 \le a \le \frac {1}{(p-2)E_0(1/\Vert X\Vert ^2)}\).

Then δ(X) = X + a g(X) is minimax. Also δ(X) dominates X provided g(⋅) is nonzero with positive probability and strict inequality holds with positive probability in (1) or (2), or both inequalities are strict in (3).

Proof

As in the proof of Theorem 5.3, we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} R(\theta,\delta) &\displaystyle \le&\displaystyle R(\theta,X) + 2\,a\,E[(a-Q(R^2))\, E_{R,\theta}[-h(X)]]\\ &\displaystyle =&\displaystyle R(\theta,X) + 2\,a\,E\bigg[\bigg(\frac{a}{R^2} - \frac{Q(R^2)}{R^2}\bigg) \,E_{R,\theta}[-R^2 \,h(X)]\bigg] \\ &\displaystyle \le&\displaystyle R(\theta,X) + 2\,a \,E\bigg[\frac{a}{R^2} - \frac{Q(R^2)}{R^2}\bigg] \,E_{R_0,\theta}[-R_0^2\, h(X)] \end{array} \end{aligned} $$

where R_0 is a point such that \(a - Q(R_0^2) = 0\), provided such a point exists. Here we have used the version of the covariance inequality that states

$$\displaystyle \begin{aligned}E[ f(X)\, g(X)] \le E[ f(X)]\, g(X_0) \end{aligned}$$

provided that g(X) is nondecreasing (respectively, nonincreasing) and f(X) changes sign once from + to − (respectively, from − to +) at X_0. But such a point R_0 does exist provided

$$\displaystyle \begin{aligned}E\bigg[\frac{a}{R^2} - \frac{Q(R^2)}{R^2}\bigg] \le 0 \end{aligned}$$

since Q(R²) is nondecreasing.

It follows that R(θ, δ) ≤ R(θ, X) provided that \(aE[\frac {1}{R^2}] \le E [\frac {Q(R^2)}{R^2}]\). However \(E[\frac {Q(R^2)}{R^2}] = \frac {1}{p-2}\) by Lemma 5.3 and hence the result follows as in Theorem 5.3. □

Note that the bound on “a” in both of these theorems is strictly larger than the bound in Theorem 5.2 provided Q(R²) is not constant. This is so since the bound in Theorem 5.2 is based on \(c = \inf Q(R^2)\) while, in these results, the bound is equal to a (possibly weighted) average of Q(R²).

We indicate the utility of these two results by applying them to the James-Stein estimator.

Corollary 5.3

Let X ∼ f(∥x − θ∥²) for p ≥ 4 and let \(\delta _b^{JS} (X) = (1 - b / \Vert X \Vert ^2 )X\). Assume also that E_0[∥X∥²] < ∞ and E_0[1∕∥X∥²] < ∞. Then \(\delta _b^{JS}(X)\) is minimax and dominates X provided either

  1. (1)

    Q(R²) is nonincreasing and

    $$\displaystyle \begin{aligned}0 < b < 2(p-2)\frac{E_0\Vert X\Vert^2}{p}, \; or \end{aligned}$$
  2. (2)

    Q(R²) is nondecreasing and

    $$\displaystyle \begin{aligned}0 < b < \frac{2}{E_0(1/\Vert X\Vert^2)}. \end{aligned}$$

Proof

We apply Theorems 5.3 and 5.4 with g(X) = −[2 (p − 2)∕∥X∥²]X, so that div g(X) = −2 (p − 2)²∕∥X∥² = h(X). It follows from Lemma A.5 in Appendix A.10 that when p ≥ 4, E_{θ,R}[h(U)] is nondecreasing in R and E_{θ,R}[R² h(U)] is nonincreasing in R. Hence, if Q(R²) is nonincreasing, Theorem 5.3 implies that

$$\displaystyle \begin{aligned}\delta_a(X) = X - \frac{2\,(p-2)\, a}{\Vert X\Vert^2} X = \delta_{2\,(p-2)\,a}^{JS}(X) \end{aligned}$$

is minimax and dominates X provided 0 < a < E_0[∥X∥²]∕p, or equivalently 0 < 2 (p − 2) a < 2 (p − 2) E_0(∥X∥²)∕p, which is (1) with b = 2 (p − 2) a. Similarly, applying Theorem 5.4 when Q(R²) is nondecreasing, we find that δ_a(X) is minimax and dominates X if

$$\displaystyle \begin{aligned}0 < a < \frac{1}{(p-2)E_0(1/\Vert X\Vert^2)} \end{aligned}$$

which is (2). □

Example 5.6 (Densities with increasing and decreasing Q(R²))

Note first that variance mixtures of normal distributions have increasing Q(R²) since, by (5.6) and (5.8), Q(R²) may be viewed as the expected value of V with respect to a family of distributions with monotone increasing likelihood ratio in t = R². Note also that the bound for the shrinkage constant “a” in a James-Stein estimator is the same in Corollary 5.3 as it is in Theorem 5.1 for mixtures of normals.

We also note that, if we consider f(t) to be proportional to a density of a positive random variable, then 2 Q(t) is the reciprocal of the hazard rate. There is a large literature on increasing and decreasing hazard rates (see, for example, Barlow and Proschan 1981).

We note that the monotonicity of Q(t) may be determined in many cases by studying the log-convexity or the log-concavity of f(t). In particular, if ln f(t) is convex (concave), then Q(t) is nondecreasing (nonincreasing). To see this, note that

$$\displaystyle \begin{aligned}Q(t) = \frac{1}{2} \frac{\int_t^\infty f(u) \,du}{f(t)} = \frac{1}{2} \int_0^\infty \frac{f(s+t)}{f(t)}\, ds \end{aligned}$$

and hence Q(t) will be nondecreasing (nonincreasing) if \(\frac {f(s+t)}{f(t)}\) is nondecreasing (nonincreasing) in t for each s > 0. But, assuming for simplicity that f is differentiable, for any t ≥ 0 such that f(t) > 0,

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{d}{dt}\bigg(\frac{f(s+t)}{f(t)}\bigg) &\displaystyle =&\displaystyle \frac{f(t)f^\prime (s+t) - f(s+t)f^\prime (t)} {f^2(t)}\\ &\displaystyle =&\displaystyle \frac{f(s+t)}{f(t)} \bigg[\frac{f^\prime (s+t)}{f(s+t)} - \frac{f^\prime (t)}{f(t)}\bigg]\\ &\displaystyle =&\displaystyle \frac{f(s+t)}{f(t)} \bigg[\frac{d}{dt}\,\mathrm{ln}\, f(s+t) - d/dt \,\mathrm{ln}\, f(t)\bigg]. \end{array} \end{aligned} $$

This is positive or negative when ln f(s + t) is convex or concave in t, respectively. For example, if X has a Kotz distribution with parameter n, then \(f(t) \propto t^n e^{-t/2}\). Then \(\mathrm {ln}\, f(t) = K + n\, \mathrm {ln}\, t - \frac {t}{2}\) which is concave if n ≥ 0 and convex if n ≤ 0. Hence Q(t) is decreasing if n > 0 and increasing if n < 0. Of course the log-convexity (log-concavity) of f(t) is not a necessary condition for the nondecreasing (nonincreasing) monotonicity of Q(t). Thus, it is easy to check that \( f(t) \, \propto \, \exp (-t^2) \exp [-1/2 \int ^t_0 \exp (-u^2)\, \,du] \) leads to \(Q(t) = \exp (t^2) \), which is increasing. But \(\log f(t)\) is not convex.
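
A numerical sketch of this dichotomy for the Kotz-type density f(t) = K tⁿ e^{−t∕2} (illustration only; the exponents n and the grid of t values are arbitrary): the computed values of Q(t) decrease in t for n > 0 and increase for n < 0.

```python
import numpy as np
from scipy.integrate import quad

def Q(t, n):
    # Q(t) = F(t)/f(t) with F(t) = (1/2) * integral_t^infty f(u) du
    f = lambda u: u**n * np.exp(-u / 2.0)
    tail, _ = quad(f, t, np.inf)
    return 0.5 * tail / f(t)

ts = (0.5, 1.0, 2.0, 5.0, 10.0)
for n in (1.0, -0.5):
    print(f"n = {n:+.1f}:", [round(Q(t, n), 3) for t in ts])
```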

An important class of distributions is covered by the following corollary.

Corollary 5.4

Let X ∼ f(∥x − θ∥²) for p ≥ 4 with \(f(t) \propto \exp (-\beta t^\alpha )\) where α > 0 and β > 0. Then \(\delta _b^{JS}(X) = (1 - b/\Vert X\Vert ^2)X\) is minimax and dominates X provided either

  1. (1)

    α ≥ 1 and \(0 < b < \frac {2} {\beta ^{1/\alpha }} \, \frac {p-2} {p} \, \frac {\varGamma ((p+2)/2\alpha )} {\varGamma (p/2\alpha )}\) or

  2. (2)

    α ≤ 1 and \(0 < b < \frac {2} {\beta ^{1/\alpha }} \, \frac {\varGamma ( p/2\alpha )} {\varGamma ((p- 2)/2\alpha )}\).

Proof

By the above discussion, Q(R²) is nonincreasing (nondecreasing) for α ≥ 1 (α ≤ 1). Then the result follows from Corollary 5.3 and the fact that

$$\displaystyle \begin{aligned}E_0 [\Vert X\Vert^k] = \frac{1}{\beta^{k/2\alpha}} \frac{\varGamma(\frac{p+k}{2\alpha})} {\varGamma(\frac{p}{2\alpha})}\end{aligned} $$

for k > −p. □
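
As a numerical illustration of this corollary, the following sketch evaluates, for f(t) ∝ exp(−β t^α), the moments E_0[∥X∥²] and E_0[1∕∥X∥²] from the formula above and then the two Corollary 5.3 bounds, 2(p − 2)E_0[∥X∥²]∕p (the nonincreasing-Q case, α ≥ 1) and 2∕E_0[1∕∥X∥²] (the nondecreasing-Q case, α ≤ 1). The values of p, β and the α grid are arbitrary.

```python
from math import gamma

def moment(p, alpha, beta, k):
    # E_0[||X||^k] = beta^(-k/(2 alpha)) * Gamma((p+k)/(2 alpha)) / Gamma(p/(2 alpha))
    return beta ** (-k / (2 * alpha)) * gamma((p + k) / (2 * alpha)) / gamma(p / (2 * alpha))

p, beta = 8, 0.5
for alpha in (0.75, 1.0, 1.5):
    b_noninc = 2 * (p - 2) * moment(p, alpha, beta, 2) / p   # Q nonincreasing case
    b_nondec = 2 / moment(p, alpha, beta, -2)                # Q nondecreasing case
    print(f"alpha = {alpha}:  bound (Q noninc.) = {b_noninc:.3f},  "
          f"bound (Q nondec.) = {b_nondec:.3f}")
```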

The final theorem of this section gives conditions for minimaxity of estimators of the form X + a g(X) for general spherically symmetric distributions. Note that no density is needed for this result which relies on the radial distribution defined in Theorem 4.1.

We first need the following lemma which will play the role of the Stein lemma in the proof of the domination and minimaxity results.

Lemma 5.4

Let X have a spherically symmetric distribution around θ, and let g(X) be a weakly differentiable function such that \(E_\theta[ |(X-\theta)^{\scriptscriptstyle{\mathrm{T}}} g(X)| ] < \infty\). Then

$$\displaystyle \begin{aligned}E_\theta [(X-\theta)^{\scriptscriptstyle{\mathrm{T}}} g(X)] = \frac{1}{p} E\bigg[R^2 \int_{B_{R,\theta}} \mathrm{div}\, g(X)\, d\mathcal{V}_{R,\theta}(X)\bigg]\end{aligned} $$

where E denotes the expectation with respect to the radial distribution and where \(\mathcal {V}_{R,\theta }(\cdot )\) is the uniform distribution on B_{R,θ}, the ball of radius R centered at θ.

Proof

Let ρ be the radial distribution. Then, according to Theorem 4.1, we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} E[(X - \theta)^{\scriptscriptstyle{\mathrm{T}}} g(X)] &\displaystyle =&\displaystyle \int_{\mathbb{R}_+} \int_{S_{R,\theta}} (x-\theta)^{\scriptscriptstyle{\mathrm{T}}} g(x) \,d {\mathcal U}_{R,\theta}(x) \,d\rho(R)\\ &\displaystyle =&\displaystyle \int_{\mathbb{R}_+} \frac{R}{\sigma_{R,\theta}(S_{R,\theta})} \int_{S_{R,\theta}} \frac{(x-\theta)^{\scriptscriptstyle{\mathrm{T}}}}{\Vert x-\theta\Vert} \,g(x)\, d\sigma_{R,\theta}(x) \,d\rho(R)\\ &\displaystyle =&\displaystyle \int_{\mathbb{R}_+} \frac{R}{\sigma_{R,\theta}(S_{R,\theta})} \int_{B_{R,\theta}} \mathrm{div} \, g(x) \,dx\, d \rho(R)\ \ {\text{by Stokes' theorem}}\\ &\displaystyle =&\displaystyle \frac{1}{p} \int_{\mathbb{R}_+} \int_{B_{R,\theta}} \mathrm{div}\, g(x) \,d\mathcal{V}_{R,\theta}(x)\, R^2 d\rho(R) \end{array} \end{aligned} $$

since the volume of B_{R,θ} equals \(\lambda(B_{R,\theta}) = R\,\sigma_{R,\theta}(S_{R,\theta})/p\). □

Theorem 5.5 (Brandwein and Strawderman 1991a)

Let X have a spherically symmetric distribution around θ, and suppose E_0[∥X∥²] < ∞ and E_0[1∕∥X∥²] < ∞. Suppose there exists a nonpositive function h(⋅) such that h(X) is subharmonic and E_{R,θ}[R² h(U)] is nonincreasing in R, where \(U\sim {\mathcal U}_{R,\theta }\), and such that E_θ[|h(X)|] < ∞. Furthermore suppose that g(X) is weakly differentiable and also satisfies

  1. (1)

    div g(X) ≤ h(X),

  2. (2)

    ∥g(X)∥² + 2 h(X) ≤ 0, and

  3. (3)

    \(0 \le a \le \frac {1}{pE_0(1/\Vert X\Vert ^2)}\).

Then δ(X) = X + a g(X) is minimax. Also δ(X) dominates X provided g(⋅) is non-zero with positive probability and strict inequality holds with positive probability in (1) or (2), or both inequalities are strict in (3).

Proof

Using Lemma 5.4 and Conditions (1) and (2), we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} R(\theta,\delta) &\displaystyle =&\displaystyle R(\theta,X) + a\,E_\theta\big[a\,\Vert g(X)\Vert^2 + 2\,(X-\theta)^{\scriptscriptstyle{\mathrm{T}}} g(X)\big] \\ &\displaystyle \le&\displaystyle R(\theta,X) + 2\,a\,E_\theta\big[-a\,h(X) + (X-\theta)^{\scriptscriptstyle{\mathrm{T}}} g(X)\big] \\ &\displaystyle =&\displaystyle R(\theta,X) + 2\,a\bigg\{ E_\theta \big[-a\,h(X)\big] + \frac{1}{p} E \bigg[R^2 \int_{B_{R,\theta}} \mathrm{div}\, g(X) \,d\mathcal{V}_{R,\theta}(X)\bigg]\bigg\} \\ &\displaystyle \le&\displaystyle R(\theta,X) + 2\,a\bigg\{ E_\theta \big[ -a\,h(X)\big] + \frac{1}{p} E \bigg[R^2 \int_{B_{R,\theta}} h(X)\, d\mathcal{V}_{R,\theta}(X) \bigg]\bigg\}. \end{array} \end{aligned} $$

By subharmonicity of h (see Appendix A.8 and Sections 1.3 and 2.5 in du Plessis 1970),

$$\displaystyle \begin{aligned}\int_{B_{R,\theta}} h(X) d\mathcal{V}_{R,\theta}(X) \le \int_{S_{R,\theta}} h(X) d\mathcal{U}_{R,\theta}(X). \end{aligned}$$

Hence,

$$\displaystyle \begin{aligned} \begin{array}{rcl} R(\theta,\delta) &\displaystyle \le&\displaystyle R(\theta,X) + 2\,a\bigg\{E_\theta\big[-a\,h(X)\big] + \frac{1}{p} E\bigg[R^2 \int_{S_{R,\theta}} h(X)\, d\mathcal{U}_{R,\theta}(X) \bigg]\bigg\} \\ &\displaystyle =&\displaystyle R(\theta,X) + 2\,a\,E\bigg[\bigg(\frac{a}{R^2} - \frac{1}{p}\bigg) \cdot \bigg(-R^2 \int_{S_{R,\theta}} h(X) d\mathcal{U}_{R,\theta}(X)\bigg) \bigg] \\ &\displaystyle =&\displaystyle R(\theta,X) + 2\,a\,E\bigg[\bigg(\frac{a}{R^2} - \frac{1}{p}\bigg) \big( - E_{R,\theta} [R^2 h(X)]\big)\bigg] \\ &\displaystyle \le&\displaystyle R(\theta,X) + 2\,a\,E\bigg[\bigg(\frac{a}{R^2} - \frac{1}{p}\bigg) \bigg] E\big[ -E_{R,\theta} [R^2 h(X)]\big]. \\ \end{array} \end{aligned} $$

The last inequality follows from the monotonicity of E_{R,θ}[R² h(X)] and the covariance inequality. Hence R(θ, δ) ≤ R(θ, X) when E[a∕R² − 1∕p] ≤ 0, which is equivalent to (3). The domination part follows as before. □

We note that the shrinkage constant in the above result, 1∕{pE_0[1∕∥X∥²]}, is somewhat smaller than the constant in Theorem 5.4 (a = 1∕{(p − 2)E_0[1∕∥X∥²]}), but Theorem 5.5 has essentially no restrictions on the distribution of X aside from moment conditions (which coincide in Theorems 5.4 and 5.5). In particular we do not even assume that a density exists! However there is an additional assumption of subharmonicity of h.

The following useful corollary gives minimaxity for James-Stein estimators in dimension p ≥ 4 for all spherically symmetric distributions with finite E_0[∥X∥²] and E_0[1∕∥X∥²].

Corollary 5.5

Let X have a spherically symmetric distribution with p ≥ 4, and suppose E_0[∥X∥²] < ∞ and E_0[1∕∥X∥²] < ∞. Then

$$\displaystyle \begin{aligned}\delta_a^{JS}(X) = \bigg(1 - \frac{a}{\Vert X\Vert^2}\bigg)X \end{aligned}$$

is minimax and dominates X provided

$$\displaystyle \begin{aligned}0 < a < \frac{1}{pE_0(1/\Vert X\Vert^2)}. \end{aligned}$$

Proof

Here g(X) = −X∕∥X∥² and is weakly differentiable for p ≥ 3. Then div g(X) = −(p − 2)∕∥X∥² and ∥g(X)∥² = 1∕∥X∥², so that Conditions (1) and (2) of Theorem 5.5 are satisfied with h(X) = −α∕∥X∥² where 1∕2 ≤ α ≤ p − 2. Now the subharmonicity of h(X) and its monotonicity condition hold since it is shown in the appendix that, for p ≥ 4, 1∕∥X∥² is superharmonic (so that E_{R,θ}[1∕∥X∥²] is nonincreasing in R) and that R² E_{R,θ}[1∕∥U∥²] is nondecreasing in R.

Furthermore, it is worth noting that E_{R,θ}[1∕∥U∥²] is nonincreasing in ∥θ∥ (see Lemma A.5 and the remark that follows). Hence, for any \( \theta \in \mathbb {R} ^p \), we have E_θ[−h(X)] < ∞ since

$$\displaystyle \begin{aligned} E_{R, \theta} [ 1 / \Vert X \Vert ^2 ] \leq E_{R, 0} [ 1 / \Vert X \Vert ^2 ] \end{aligned}$$

so that

$$\displaystyle \begin{aligned} E_{\theta} [ 1 / \Vert X \Vert ^2 ] \leq E_{0} [ 1 / \Vert X \Vert ^2 ] < \infty \, , \end{aligned}$$

by assumption. □
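
The following Monte Carlo sketch illustrates Corollary 5.5 for a spherically symmetric distribution that has no density: X = θ + RU, where U is uniform on the unit sphere and the radius R takes the values 1 and 3 with equal probability. The choices p = 6, the radii, the fraction of the bound used for a, and the grid of ∥θ∥ values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n_rep = 6, 400_000
radii = np.array([1.0, 3.0])
e0_inv = np.mean(1.0 / radii**2)        # E_0[1/||X||^2] = E[1/R^2]
a = 0.9 / (p * e0_inv)                  # just inside the Corollary 5.5 bound

def risk(theta_norm):
    theta = np.zeros(p)
    theta[0] = theta_norm
    u = rng.standard_normal((n_rep, p))
    u /= np.linalg.norm(u, axis=1, keepdims=True)     # uniform on the unit sphere
    r = rng.choice(radii, size=n_rep)
    x = theta + r[:, None] * u
    js = (1.0 - a / np.sum(x * x, axis=1))[:, None] * x
    return (np.mean(np.sum((x - theta) ** 2, axis=1)),
            np.mean(np.sum((js - theta) ** 2, axis=1)))

for t in (0.0, 1.0, 3.0, 6.0):
    rx, rjs = risk(t)
    print(f"||theta|| = {t:4.1f}:  R(X) = {rx:6.3f},  R(JS) = {rjs:6.3f}")
```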

Example 5.7 (Nonspherical minimax estimators)

In Sect. 2.4.4, we considered estimators which shrink toward a subspace. Theorem 5.5 allows us to show that estimators of this type are minimax for general spherically symmetric distributions. To be specific, suppose V is an s < p dimensional linear subspace and let

$$\displaystyle \begin{aligned}\delta_a(X) = P_V X + \bigg(1 - \frac{a}{\Vert X - P_V X\Vert^2}\bigg)(X-P_V X). \end{aligned}$$

As in the proof of Theorem 2.6, it can be shown that the risk of δ a(X) equals

$$\displaystyle \begin{aligned} R(\theta,\delta_a(X)) = E_{\nu_1}[\Vert Y_1 - \nu_1\Vert^2] + E_{\nu_2}\bigg[\bigg\Vert\bigg(1 - \frac{a}{\Vert Y_2\Vert^2} \bigg)Y_2 - \nu_2\bigg\Vert^2\bigg], \end{aligned} $$
(5.9)

where Y_1, Y_2, ν_1 and ν_2 are as in Theorem 2.6.

In the present case, Y_2 has a spherically symmetric distribution about ν_2 of dimension p − s. Hence, by Theorem 5.5,

$$\displaystyle \begin{aligned} \begin{array}{rcl} R(\theta, \delta_a(X)) &\displaystyle \le&\displaystyle E_{\nu_1}[\Vert Y_1 - \nu_1\Vert^2] + E_{\nu_2}[\Vert Y_2 - \nu_2\Vert^2] \\ &\displaystyle =&\displaystyle E_\theta \Vert X-\theta\Vert^2 \\ &\displaystyle =&\displaystyle R(\theta, X), \end{array} \end{aligned} $$

provided p − s ≥ 4 and

$$\displaystyle \begin{aligned}0 < a < \frac{1}{(p-s)\,E_0[1/\Vert X- P_V X\Vert^2]}.\end{aligned}$$
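
A short Python sketch of this subspace-shrinkage estimator; the subspace (here the column space of a matrix B), the data vector and the constant a are hypothetical choices made only for illustration.

```python
import numpy as np

def subspace_shrinkage(x, B, a):
    # delta_a(x) = P_V x + (1 - a/||x - P_V x||^2) (x - P_V x),
    # where P_V is the orthogonal projection onto the column space of B
    P = B @ np.linalg.solve(B.T @ B, B.T)
    px = P @ x
    resid = x - px
    return px + (1.0 - a / float(resid @ resid)) * resid

rng = np.random.default_rng(3)
p, s = 8, 2
B = rng.standard_normal((p, s))
x = 2.0 * rng.standard_normal(p)
print(subspace_shrinkage(x, B, a=1.0))
```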

5.4 Bayes Estimators

In this section, we consider (generalized) Bayes estimators of the location vector \(\theta \in \mathbb {R}^p\) of a spherically symmetric distribution. More specifically, let X be a random vector in \(\mathbb {R}^p\) with density f(∥x − θ∥²) and let π(θ) be a prior density. Under quadratic loss ∥δ − θ∥², the (generalized) Bayes estimator of θ is the posterior mean given by

$$\displaystyle \begin{aligned} \delta_\pi(X) = X + \frac{1}{m(X)} \int_{\mathbb{R}^p} (\theta - X) \,f(\Vert X-\theta\Vert^2)\, \pi(\theta)\, d\theta \end{aligned} $$
(5.10)

where m(x) is the marginal

$$\displaystyle \begin{aligned} m(x) = \int_{\mathbb{R}^p} f(\Vert x - \theta\Vert^2) \pi(\theta) \,d\theta. \end{aligned} $$
(5.11)

Recall from Sect. 3.1.1 that, in the normal case (that is, \(f(t) \propto \exp (-t/2\sigma ^2)\) with σ² known), the superharmonicity of \(\sqrt {m(x)}\) is a sufficient condition for minimaxity of δ_π(X). This superharmonicity is implied by that of m(x) and in turn by that of π(θ). While in the nonnormal case minimaxity has been studied by many authors (for example, see Strawderman (1974b); Berger (1975); Brandwein and Strawderman (1978, 1991a)), relatively few results on minimaxity of Bayes estimators are known. The primary technique to establish minimaxity is through a Baranchik representation of the form (1 − a r(∥X∥²)∕∥X∥²)X. The minimaxity conditions are essentially those developed in Theorems 5.3 and 5.4 and most of the derivations are in the context of variance mixtures of normals. See Strawderman (1974b), Maruyama (2003a) and Fourdrinier et al. (2008) for more discussion and results on Bayes estimation in this setting.

The main difficulty in using Theorem 5.1 with mixtures of normals densities for the sampling distribution is to prove the monotonicity (and boundedness) properties of the function r(⋅). Maruyama (2003a) and Fourdrinier et al. (2008) consider priors which are mixtures of normals as well. Their main condition for obtaining minimaxity of the corresponding Bayes estimator is that the mixing density g of the sampling distribution has monotone nondecreasing likelihood ratio when considered as a scale parameter family. In Fourdrinier et al. (2008), explicit use is made of that monotone likelihood ratio property for the mixing (possibly generalized) density h of the prior distribution.

The main result of Fourdrinier et al. (2008) is the following. Consult that paper for the somewhat technical proof.

Theorem 5.6

Let X be a random vector in \(\mathbb {R}^p\) (p ≥ 3) distributed as a variance mixture of multivariate normal distributions with density

$$\displaystyle \begin{aligned} f(x) = \int_0^\infty \frac{1}{(2\pi v)^{p/2}} \, \exp \bigg( - \frac{1}{2}\, \frac{\Vert x-\theta\Vert^2}{v}\bigg) \, g(v) \,dv \end{aligned} $$
(5.12)

where g is the density of a known nonnegative random variable V. Let π be a (generalized) prior with density of the form

$$\displaystyle \begin{aligned} \pi(\theta) = \int_0^\infty \frac{1}{(2\pi t)^{p/2}} \, \exp \bigg( - \frac{1}{2}\, \frac{\Vert \theta\Vert^2}{t}\bigg) \, h(t) \,dt \end{aligned} $$
(5.13)

where h is a function from \(\mathbb {R}_+\) into \(\mathbb {R}_+\) such that this integral exists.

Assume that the mixing density g is such that

$$\displaystyle \begin{aligned} E[V] = \int_0^\infty v \,g(v)\, dv < \infty\,\, \mathrm{and}\,\, E[V^{-p/2}] = \int_0^\infty v^{-p/2}\, g(v) \,dv < \infty. \end{aligned} $$
(5.14)

Assume also that the mixing function h of the (possibly improper) prior density π is absolutely continuous and satisfies

$$\displaystyle \begin{aligned} \lim_{t\to\infty} \frac{h(t)}{t^\beta} = c \end{aligned} $$
(5.15)

for some β < p∕2 − 1 and some 0 < c < ∞. Assume, finally, that h and g have monotone increasing likelihood ratio when considered as scale parameter families.

Then, if there exist K > 0, t_0 > 0 and α < 1 such that

$$\displaystyle \begin{aligned} h(t) \le K\, t^{-\alpha}\quad \mathrm{for}\,\, 0 < t < t_0, \end{aligned} $$
(5.16)

the (generalized or proper) Bayes estimator δ_h with respect to the prior distribution corresponding to the mixing function h is minimax provided that β satisfies

$$\displaystyle \begin{aligned} - (p-2) \bigg[ \frac{E[V^{-p/2 + 1}]}{E[V]E[V^{-p/2}]} - \frac{1}{2} \bigg] \le \beta. \end{aligned} $$
(5.17)

For priors with mixing distribution h satisfying (5.16) and (5.17), an argument as in Maruyama (2003a) using Brown (1979) and a Tauberian theorem suggests that the resulting generalized Bayes estimator is admissible if β ≤ 0. Maruyama and Takemura (2008) have verified this under additional conditions which imply, in the setting of Theorem 5.6, that E_θ[∥X∥³] < ∞.

As an illustration, assume that the sampling distribution is a p-variate Student-t with n_0 degrees of freedom, which corresponds to the inverse gamma mixing density with parameters (n_0∕2, n_0∕2), that is, to \(g(v) \propto v^{-(n_0 + 2)/2} \exp (- n_0/2v)\). Let the prior be a Student-t distribution with n degrees of freedom, that is, with mixing density \(h(t) \propto t^{-(n+2)/2} \exp (-n/2t)\). It is clear that Conditions (5.14) and (5.15) are satisfied with n_0 ≥ 7. It is also clear that Condition (5.16) holds for any α < 1. Finally a simple calculation shows that

$$\displaystyle \begin{aligned}\frac{E[V^{-p/2 + 1}]}{E[V]E[V^{-p/2}]} = \frac{n_0 - 2}{p + n_0 - 2} \end{aligned}$$

so that Condition (5.17) reduces to

$$\displaystyle \begin{aligned}n \le (p-2) \bigg[ \frac{2(n_0 - 2)}{p + n_0 - 2} - 1\bigg] - 2.\end{aligned}$$

Note that, as n > 0, this condition holds if and only if p ≥ 5 and

$$\displaystyle \begin{aligned}n_0 \ge 3 + p \frac{p}{p-4}.\end{aligned}$$

Other examples (including generalized priors) can be found in Fourdrinier et al. (2008).

In the following, we consider broader classes of spherically symmetric distributions which are not restricted to variance mixtures of normals. Minimaxity of generalized Bayes estimators is obtained for unimodal spherically symmetric superharmonic priors π(∥θ∥²) under the additional assumption that the Laplacian of π(∥θ∥²) is a nondecreasing function of ∥θ∥². The results presented below are derived in Fourdrinier and Strawderman (2008a). An interesting feature is that their approach does not rely on the Baranchik representation used in Maruyama (2003a) and Fourdrinier et al. (2008). Note, however, that the superharmonicity property of the priors implies that the corresponding Bayes estimators cannot be proper Bayes (see Theorem 3.2).

First note that, for any prior π(θ), the Bayes estimator in (5.10) can be written as

$$\displaystyle \begin{aligned} \delta_\pi(X) = X + \frac{\nabla M(X)}{m(X)} \end{aligned} $$
(5.18)

where, for any \(x\in \mathbb {R}^p\),

$$\displaystyle \begin{aligned}M(x) = \int_{\mathbb{R}^p} F(\Vert x - \theta\Vert^2) \,\pi(\theta) \,d\theta\end{aligned}$$

with F given in (5.5). Thus δ_π(X) has the general form δ_π(X) = X + g(X) (with g(X) = ∇M(X)∕m(X)). If the density f(∥x − θ∥²) is as in Sect. 5.2.2, that is, such that F(t)∕f(t) ≥ c > 0 for some fixed positive constant c, then Corollary 5.2 applies and δ_π(X) = X + g(X) = X + ∇M(X)∕m(X) is minimax provided, for any \(x\in \mathbb {R}^p\),

$$\displaystyle \begin{aligned}2 \,c\,\, \mathrm{div}\, g(x) + \Vert g(x)\Vert^2 \le 0.\end{aligned}$$

In particular, it follows that if

$$\displaystyle \begin{aligned} 2\,c \,\frac{\varDelta M(x)}{m(x)} - 2\,c\, \frac{\nabla M(x) \cdot \nabla m(x)}{m^2(x)} + \frac{\Vert\nabla M(x)\Vert^2}{m^2(x)} \le 0 \end{aligned} $$
(5.19)

and

$$\displaystyle \begin{aligned}E_\theta \bigg[ \bigg\Vert \frac{\nabla M(X)}{m(X)}\bigg\Vert^2 \bigg] < \infty,\end{aligned}$$

δ_π is minimax.

For a spherically symmetric prior π(∥θ∥²), the main result of Fourdrinier and Strawderman (2008a) is the following.

Theorem 5.7

Assume that X has a spherically symmetric distribution in \(\mathbb {R}^p\) with density f(∥x − θ∥²). Assume that \(\theta \in \mathbb {R}^p\) has a superharmonic prior π(∥θ∥²) such that π(∥θ∥²) is nonincreasing and Δπ(∥θ∥²) is nondecreasing in ∥θ∥². Assume also that

$$\displaystyle \begin{aligned}E_\theta \bigg[ \bigg\Vert \frac{\nabla M(X)}{m(X)}\bigg\Vert^2\bigg] < \infty.\end{aligned}$$

Then the Bayes estimator δ_π is minimax under quadratic loss provided that f(t) is log-convex, \(c = \frac {F(0)}{f(0)} > 0\) and

$$\displaystyle \begin{aligned} \int_0^\infty f(t) t^{p/2} dt \le 4c \int_0^\infty - f^\prime (t) t^{p/2} dt < \infty. \end{aligned} $$
(5.20)

To prove Theorem 5.7 we need some preliminary lemmas whose proofs are given in Appendix A.9. Note first that it follows from the spherical symmetry of π that, for any \(x\in \mathbb {R}^p\), m(x) and M(x) are functions of t = ∥x∥². Then, setting

$$\displaystyle \begin{aligned}m(x) = m(t)\quad \mathrm{and}\quad M(x) = M(t),\end{aligned}$$

we have

$$\displaystyle \begin{aligned} \nabla m(x) = 2m^\prime (t)\,x\quad \mathrm{and}\quad \nabla M(x) = 2\,M^\prime (t)\,x. \end{aligned} $$
(5.21)

Lemma 5.5

Assume that π′(t) ≤ 0 for any t ≥ 0. Then we have M′(t) ≤ 0 for any t ≥ 0.

Lemma 5.6

For any \(x\in \mathbb {R}^p\) ,

$$\displaystyle \begin{aligned}x \cdot\nabla m(x) = -2 \, \int_0^\infty H(u,t) \,u^{p/2}\, f^\prime (u) \,du \end{aligned}$$

and

$$\displaystyle \begin{aligned}x \cdot\nabla M(x) = \int_0^\infty H(u,t)\, u^{p/2}\, f(u) \,du \end{aligned}$$

where, for u ≥ 0 and for t ≥ 0,

$$\displaystyle \begin{aligned} H(u,t) = \lambda(B) \int_{B_{\sqrt{u}, x}} x\cdot\theta\,\pi^\prime (\Vert\theta\Vert^2) \,d\mathcal{V}_{\sqrt{u}, x} (\theta) \end{aligned} $$
(5.22)

and \(\mathcal {V}_{\sqrt {u}, x}\) is the uniform distribution on the ball \(B_{\sqrt {u}, x}\) of radius \(\sqrt {u}\) centered at x and λ(B) is the volume of the unit ball.

Lemma 5.7

For any t ≥ 0, the function H(u, t) in (5.22) is nondecreasing in u provided that Δπ(∥θ∥²) is nondecreasing in ∥θ∥².

Lemma 5.8

Let h(∥θ − x∥²) be a unimodal density and let ψ(θ) be a symmetric function. Then

$$\displaystyle \begin{aligned}\int_{\mathbb{R}^p} x\cdot\theta \,\psi(\theta) \,h(\Vert\theta - x\Vert^2)\, d\theta \ge 0\end{aligned}$$

as soon as ψ is nonnegative.

Proof (Proof of Theorem 5.7)

By the superharmonicity of π(∥θ∥²), we have ΔM(x) ≤ 0 for all \(x\in \mathbb {R}^p\) so that, by (5.19), it suffices to prove that

$$\displaystyle \begin{aligned} -2\,c \,\nabla M(x) \cdot \nabla m(x) + \Vert\nabla M(x)\Vert^2 \le 0 \end{aligned} $$
(5.23)

for all \(x\in \mathbb {R}^p\). Since m and M are spherically symmetric, by (5.21), (5.23) reduces to − 2cM′(t)m′(t) + (M′(t))² ≤ 0 where t = ∥x∥². Since M′(t) ≤ 0 by Lemma 5.5, (5.23) reduces to − 2cm′(t) + M′(t) ≥ 0 or, by (5.21), to − 2 c x ⋅∇m(x) + x ⋅∇M(x) ≥ 0 or, by Lemma 5.6, to

$$\displaystyle \begin{aligned} 4c\, E\bigg[ H(u,t)\, \frac{f^\prime (u)}{f(u)}\bigg] + E[H(u,t)] \ge 0, \end{aligned} $$
(5.24)

where E denotes the expectation with respect to the density proportional to \(u^{p/2} f(u)\). Since, by assumption, Δπ(∥θ∥²) is nondecreasing in ∥θ∥², H(u, t) is nondecreasing in u by Lemma 5.7. Furthermore f′(u)∕f(u) is nondecreasing by log-convexity of f, so that (5.24) is satisfied as soon as

$$\displaystyle \begin{aligned} 4\,c\, E[H(u,t)] \,E\bigg[\frac{f^\prime (u)}{f(u)}\bigg] + E[H(u,t)] \ge 0. \end{aligned} $$
(5.25)

Finally, as π′(∥θ∥²) ≤ 0 by assumption, Lemma 5.8 guarantees that H(u, t) ≤ 0 (note that \(\mathcal{V}_{\sqrt {u}, x}\) has a unimodal density) and hence (5.25) reduces to

$$\displaystyle \begin{aligned}4c E\bigg[ \frac{f^\prime (u)}{f(u)}\bigg] + 1 \le 0 \end{aligned} $$

which is equivalent to (5.20). □

Several examples of priors and sampling distributions which satisfy the assumptions of Theorem 5.7 are given in Fourdrinier and Strawderman (2008a). We briefly summarize these.

Example 5.8 (Priors related to the fundamental harmonic prior)

Let \(\displaystyle \pi (\|\theta \|{ }^2) = \left (\frac {1}{A+\|\theta \|{ }^2}\right )^c\) with A ≥ 0 and \(0\leq c \leq \frac {p}{2}-1\).

Example 5.9 (Mixtures of priors)

Let \((\pi_\alpha)_{\alpha \in A}\) be a family of priors such that the assumptions of Theorem 5.7 are satisfied for any α ∈ A. Then any mixture of the form \(\int_A \pi_\alpha(\|\theta\|^2)\, dH(\alpha)\), where H is a probability measure on A, satisfies these assumptions as well. For instance, Example 5.8 with c = 1, p ≥ 4, A = α and the gamma density \(\displaystyle \alpha \longmapsto \frac {\beta ^{1-v}}{\varGamma (1-v)}\alpha ^{-v} e^{-\beta \alpha }\) with β > 0 and 0 < v < 1 leads to the prior

$$\displaystyle \begin{aligned}\|\theta\|{}^{-2v} \: e^{\beta\|\theta\|{}^2} \: \varGamma(v,\beta \|\theta\|{}^2), \end{aligned}$$

where

$$\displaystyle \begin{aligned}\varGamma(v,y) = \int_y^\infty e^{-x} x^{v-1} \, dx \end{aligned}$$

is the complement of the incomplete gamma function.

Example 5.10 (Variance mixtures of normals)

Let

$$\displaystyle \begin{aligned}\pi(\|\theta\|{}^2) = \int_0^\infty \left(\frac{u}{2\pi}\right)^{p/2} \exp \left(\frac{-u \|\theta\|{}^2}{2}\right) \, h(u) du \end{aligned}$$

a mixture of normals with respect to the inverse of the variance. As soon as, for any u > 0,

$$\displaystyle \begin{aligned}\frac{u h^\prime (u)}{h(u)} \leq -2, \end{aligned}$$

the prior π(∥θ∥²) satisfies the assumptions of Theorem 5.7. Note that the priors in Example 5.8 arise as such a mixture with \(h(u) \propto u^{c-p/2 -1} \exp (- Au/2)\).

Other examples can be given and a constructive approach is proposed in Fourdrinier and Strawderman (2008a).

We now give examples of sampling distributions which satisfy the assumptions of Theorem 5.7.

Example 5.11 (Variance mixtures of normals)

Let

$$\displaystyle \begin{aligned} f(t) = (2 \, \pi)^{-p/2} \int_0^\infty v^{-p/2} \exp \left( - \frac{t}{2 \, v} \right) h(v) \, dv \end{aligned}$$

where h is a mixing density, and let V be a nonnegative random variable with density h. If \(E[V^{-p/2}] < \infty\) and \(E[V]\, E[V^{-p/2}]/E[V^{-p/2+1}] < 2\), then the sampling density f satisfies the assumptions of Theorem 5.7.

Example 5.12 (Densities proportional to \(e^{-\alpha t^\beta }\))

Let

$$\displaystyle \begin{aligned} f(t) = K\, e^{-\alpha t^\beta} \end{aligned}$$

where α > 0, \(\frac {1}{2} < \beta \le 1\) and K is the normalizing constant. Then the sampling density f satisfies the assumptions of Theorem 5.7 as soon as β is in a neighborhood of the form ]1 − 𝜖, 1] with 𝜖 > 0. However, note that these are not satisfied when β = 1∕2.
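
A numerical sketch of condition (5.20) for this family (the normalizing constant K cancels; the values p = 6, α = 1 and the β grid are arbitrary choices): the condition holds for β close to 1 but fails for smaller β.

```python
import numpy as np
from scipy.integrate import quad

p, alpha = 6, 1.0
for beta in (0.55, 0.8, 1.0):
    f = lambda t: np.exp(-alpha * t**beta)
    c = 0.5 * quad(f, 0, np.inf)[0] / f(0.0)            # c = F(0)/f(0)
    lhs = quad(lambda t: f(t) * t**(p / 2.0), 0, np.inf)[0]
    # -f'(t) t^{p/2} = alpha*beta * t^{p/2 + beta - 1} * exp(-alpha * t^beta)
    rhs = 4.0 * c * quad(
        lambda t: alpha * beta * t**(p / 2.0 + beta - 1.0) * np.exp(-alpha * t**beta),
        0, np.inf)[0]
    print(f"beta = {beta}:  lhs = {lhs:8.2f},  rhs = {rhs:8.2f},  "
          f"(5.20) holds: {lhs <= rhs}")
```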

Fourdrinier and Strawderman (2008a) give other examples with densities proportional to \(e^{-\alpha t + \beta \varphi (t)}\) where φ is a convex function.

5.5 Shrinkage Estimators for Concave Loss

In this section we consider improved shrinkage estimators for loss functions that are concave functions of squared error loss. The basic results are due to Brandwein and Strawderman (1980, 1991b) and we largely follow the method of proof in the later paper. The general nature of the main result is that (under mild conditions) if an estimator can be shown to dominate X under squared error loss then the same estimator, with a suitably altered shrinkage constant, will dominate X for a loss which is a concave function of squared error loss.

Let X have a spherically symmetric distribution around θ, and let g(X) be a weakly differentiable function. The estimators considered are of the form

$$\displaystyle \begin{aligned} \delta(X) = X + ag(X). \end{aligned} $$
(5.26)

The loss functions are of the form

$$\displaystyle \begin{aligned} L(\theta,\delta) = \ell(||\delta - \theta||{}^2 ), \end{aligned} $$
(5.27)

where ℓ(⋅) is a differentiable, nonnegative, nondecreasing concave function (so that, in particular, ℓ′(⋅) ≥ 0).

One basic tool needed for the main result is Theorem 5.5, and the other is the basic property of the concave function ℓ(⋅) that ℓ(t + a) ≤ ℓ(t) + aℓ′(t).

The following result shows that shrinkage estimators that improve on X for squared error loss also improve on X for concave loss provided the shrinkage constant is adjusted properly.

Theorem 5.8 (Brandwein and Strawderman 1991a)

Let X have a spherically symmetric distribution around θ, let g(X) be a weakly differentiable function, and let the loss be given by (5.27) .

Suppose there exists a subharmonic function h(⋅) such that E_{θ,R}[R² h(U)] is nonincreasing in R, where \(U\sim {\mathcal U}_{R,\theta }\). Furthermore suppose that the function g(⋅) satisfies \( E^*_\theta [||g(X)||{ }^2]< \infty \) and also satisfies

  1. (1)

    div g(x) ≤ h(x), for any \(x \in \mathbb {R}^p\) ,

  2. (2)

    ∥g(x)∥² + 2h(x) ≤ 0, for any \(x \in \mathbb {R}^p\), and

  3. (3)

    \(0 \le a \le \frac {1}{pE^*_0(1/\Vert X\Vert ^2)}\) ,

where \(E^*_\theta \) refers to the expectation with respect to the distribution whose Radon-Nikodym derivative with respect to the distribution of X is proportional to ℓ′(∥X − θ∥²).

Then δ(X) = X + ag(X) is minimax. Also δ(X) dominates X provided g(⋅) is non-zero with positive probability and strict inequality holds with positive probability in (1) or (2), or both inequalities are strict in (3).

Proof

Note that, by concavity of ℓ(⋅) and the usual identity,

$$\displaystyle \begin{aligned} \begin{array}{rcl} R(\theta,\delta) &\displaystyle =&\displaystyle E_\theta [\ell(||\delta(X)-\theta||{}^2)] \\ &\displaystyle \leq &\displaystyle E_\theta [\ell(||X-\theta||{}^2)]\\ &\displaystyle &\displaystyle \qquad + E _\theta[\ell^{\prime}(||X-\theta||{}^2)(a^{2}||g(X)||{}^{2}+2a(X-\theta)^{\prime}g(X))]. \end{array} \end{aligned} $$

Hence, the difference in risk, R(θ, δ) − R(θ, X), is bounded by

$$\displaystyle \begin{aligned} \begin{array}{rcl} R(\theta,\delta) - R(\theta,X) &\displaystyle \leq &\displaystyle E_\theta[\ell^{\prime}(||X-\theta||{}^2)(a^{2}||g(X)||{}^{2}+2a(X-\theta)^{\prime}g(X)) ] \\ &\displaystyle =&\displaystyle E^{*}_\theta[(a^{2}||g(X)||{}^{2}+2a(X-\theta)^{\prime}g(X))] \\ &\displaystyle \leq&\displaystyle 0, \end{array} \end{aligned} $$

by Theorem 5.5 applied to the distribution corresponding to \(E^{*}_\theta \). □
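
As a final numerical illustration, the following Monte Carlo sketch applies Theorem 5.8 with the loss ℓ(t) = √t (so that L(θ, δ) = ∥δ − θ∥) and the James-Stein-type choice g(X) = −X∕∥X∥² with h(X) = −(p − 2)∕∥X∥², which satisfies conditions (1) and (2) for p ≥ 4 as in Corollary 5.5; the allowable shrinkage constant is then bounded by 1∕{p E*_0(1∕∥X∥²)}. The quantity E*_0(1∕∥X∥²) is estimated by reweighting with ℓ′; the standard normal sampling distribution, the dimension p and the grid of ∥θ∥ values are arbitrary choices made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n_rep = 8, 400_000

# estimate E*_0[1/||X||^2], the expectation reweighted by l'(||X||^2) = 1/(2 sqrt(.))
x0 = rng.standard_normal((n_rep, p))
t0 = np.sum(x0 * x0, axis=1)
w = 1.0 / (2.0 * np.sqrt(t0))
e_star = np.sum(w / t0) / np.sum(w)
a = 0.9 / (p * e_star)                  # just inside the Theorem 5.8 bound

def concave_risk(theta_norm):
    theta = np.zeros(p)
    theta[0] = theta_norm
    x = theta + rng.standard_normal((n_rep, p))
    js = (1.0 - a / np.sum(x * x, axis=1))[:, None] * x
    lx = np.mean(np.sqrt(np.sum((x - theta) ** 2, axis=1)))
    ljs = np.mean(np.sqrt(np.sum((js - theta) ** 2, axis=1)))
    return lx, ljs

for t in (0.0, 2.0, 5.0):
    lx, ljs = concave_risk(t)
    print(f"||theta|| = {t:4.1f}:  E||X - theta|| = {lx:.4f},  "
          f"E||JS - theta|| = {ljs:.4f}")
```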