1 Introduction

The normal distribution and the statistical inference associated with it, such as estimation of its mean, are central to applications. Ever since Stein [19] discovered the inadmissibility of the best invariant estimator of the p-dimensional \((p\ge 3)\) normal mean under quadratic loss, there has been considerable interest in improving upon the best invariant estimator of a location vector by relaxing the normality assumption, studying more general estimators, or considering different loss functions.

Under the quadratic loss, James and Stein [15] presented a class of dominating estimators, \(\left( 1-a/||\mathbf{X}||^2\right) \mathbf{X}\) for \(0<a<2(p-2),\) if \(\mathbf{X}\) has a normal distribution with the identity covariance matrix \(I_p.\) This result remains true if the distribution of \(\mathbf{X}\) is spherically symmetric about its location vector and \(p\ge 4,\) as shown by Brandwein [2], Brandwein and Strawderman [3, 4, 7], Fan and Fang [12,13,14], Maruyama [16], Brown and Zhao [9], Tosh and Dasgupta [21], and others; see the review articles by Brandwein and Strawderman [5, 6]. When the dimension is at least three, Brown [8] also proved that the best invariant estimator of a location vector is inadmissible for a wide class of distributions and loss functions. When the components of \(\mathbf{X}\) are independent, identically and symmetrically (iis) distributed about their respective means, Shinozaki [18] studied the dominance conditions of the James-Stein type estimator

$$\begin{aligned} {\varvec{\delta }}_{a,\,b}(\mathbf{X})=\left( 1-{a\over {b+||\mathbf{X}||^2}}\right) \mathbf{X}, \end{aligned}$$
(13.1)

over \(\mathbf{X}\) and obtained the bounds of a and b in (13.1) that depend on the second and fourth moments of the component distributions. Xu [23] investigated the bounds of a and b in (13.1) when \(\mathbf{X}\) has a sign-invariant distribution.
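For concreteness, here is a minimal sketch of the estimator (13.1) (the function name delta_ab is ours, for illustration only; the admissible ranges of a and b are the subject of the dominance results just cited):

```python
import numpy as np

def delta_ab(x, a, b=0.0):
    """James-Stein type estimator (13.1): (1 - a/(b + ||x||^2)) x.

    The tuning constants a and b must satisfy dominance conditions
    such as those of Shinozaki [18] or Xu [23].
    """
    x = np.asarray(x, dtype=float)
    return (1.0 - a / (b + x @ x)) * x
```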

For more general estimators and different loss functions, Miceli and Strawderman [17] restricted the distribution of \(\mathbf{X}\) to the subclass of iis distributions called independent component variance mixtures of normals and replaced a in (13.1) by \(ar(X_1^2, \ldots , X_p^2),\) where \(r(X_1^2, \ldots , X_p^2)\) is a function of \(X_1^2, \ldots , X_p^2;\) their loss function is nonquadratic. When \(\mathbf{X}\) has a spherically symmetric distribution about its location vector \({\varvec{\theta }}\) and the loss function is quadratic loss, a concave function of quadratic loss, or a general quadratic loss, Brandwein and Strawderman [7] elegantly used the divergence theorem to prove the dominance of the estimators

$$\begin{aligned} {\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X})=\mathbf{X}+a\mathbf{g}(\mathbf{X}) \end{aligned}$$
(13.2)

over \(\mathbf{X}\) under the conditions (i) \(||\mathbf{g}||^2/2\le -h\le -\triangledown \circ \mathbf{g},\) where \(-h\) is superharmonic, (ii) \(E[-R^2h(\mathbf{V})]\) is nondecreasing in R,  where \(\mathbf{V}\) has a uniform distribution in the sphere centered at \({\varvec{\theta }}\) with radius \(R=||\mathbf{X}-{\varvec{\theta }}||,\) and (iii) \(0<a\le 1/[pE(R^{-2})].\) Clearly, the estimators \({\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X})\) given by (13.2), together with conditions (i) and (iii), extend the classical James-Stein estimator to a broader class of estimators, while their condition (ii) is a technical condition. Xu and Izmirlian [24] dropped this technical condition (ii) and obtained the bound \(0<a<[\mu _1/(p^2\mu _{-1})][1-(p-1)\mu _1/(p\mu _{-1}\mu _2)]^{-1}\) for a,  where \(\mu _i=E(R^i)\) for \( i=-1,1,2.\) As stated by Xu and Izmirlian [24], their bound on a is sometimes worse than the bound obtained by Brandwein and Strawderman [7]. This raises a question of theoretical interest: is it possible to improve the bounds of a obtained by Brandwein and Strawderman [7] and Xu and Izmirlian [24] under a condition weaker than Brandwein and Strawderman's [7] technical condition (ii)? In this paper we provide an affirmative answer to this question. Specifically, we use the fact that the average of \(-h\) over the sphere is nonincreasing in the radius to show dominance of \({\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X})\) over \(\mathbf{X}\) and obtain a new bound \(0<a\le \mu _1/(p\mu _{-1})\) for a,  which is always better than \(1/(p\mu _{-2})\) and \([\mu _1/(p^2\mu _{-1})][1-(p-1)\mu _1/(p\mu _{-1}\mu _2)]^{-1}.\)

The paper is organized as follows: In Sect. 13.2 we present the main result that states the dominance conditions of the estimators \({\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X})\) with respect to the quadratic loss. To illustrate the construction of the function h and the performance of the new bound, three examples are also studied in Sect. 13.2. In Sect. 13.3 we extend the main result in Sect. 13.2 to other loss functions that are nondecreasing concave functions of quadratic loss. The estimators of the location vector when the scale is unknown and the observation \((\mathbf{X}^T, \mathbf{Y}^T)^T\) contains a residual vector \(\mathbf{Y}\) are also considered in Sect. 13.3. Section 13.4 is devoted to some concluding remarks, while the last section consists of proofs of results in Sects. 13.2 and 13.3.

2 Main Results

Let \({\varvec{\delta }}=(\delta _1,\ldots ,\delta _p)^T\) be an estimator of \({\varvec{\theta }}\) and let \(R({\varvec{\delta }},{\varvec{\theta }})=E[L({\varvec{\delta }},{\varvec{\theta }})]\) be the risk of \({\varvec{\delta }},\) where the loss function \(L({\varvec{\delta }},{\varvec{\theta }})\) is defined by

$$\begin{aligned} L({\varvec{\delta }},{\varvec{\theta }})=||{\varvec{\delta }}-{\varvec{\theta }}||^2=\sum _{i=1}^p(\delta _i-\theta _i)^2. \end{aligned}$$
(13.3)

That is, the loss function \(L({\varvec{\delta }},{\varvec{\theta }})\) we consider in this section is quadratic. Furthermore, we employ the following notation introduced by Xu and Izmirlian [24]:

$$\begin{aligned} \begin{aligned} m(t)&=-E_\mathbf{U}[h(t\mathbf{U}+{\varvec{\theta }})],\\ M_*(t)&=M(t)-M(0)\,=\,\int _0^tm(z)dz \end{aligned} \end{aligned}$$
(13.4)

for \(t\ge 0,\) where \(-h\) is a nonnegative and superharmonic function and the random vector \(\mathbf{U}\) has a uniform distribution on the surface of the unit sphere. Note that m(t) is a nonincreasing function of t and \(M_*(t)\) is a nonnegative and nondecreasing concave function of t (see Du Plessis [11], p. 54).
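Both quantities in (13.4) are straightforward to approximate numerically. The following sketch (ours, with illustrative names and parameter values) estimates m(t) by averaging \(-h\) over random points on the sphere of radius t and integrates the result to obtain \(M_*(t)\):

```python
import numpy as np

rng = np.random.default_rng(0)

def m_hat(neg_h, t, theta, n=100_000):
    """Monte Carlo estimate of m(t) = E_U[-h(t U + theta)] in (13.4),
    with U uniform on the surface of the unit sphere."""
    p = len(theta)
    u = rng.standard_normal((n, p))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # normalize onto the sphere
    return float(np.mean([neg_h(t * ui + theta) for ui in u]))

def M_star_hat(neg_h, t, theta, grid=100):
    """M_*(t) = integral_0^t m(z) dz, approximated by the trapezoidal rule."""
    z = np.linspace(1e-6, t, grid)
    return np.trapz([m_hat(neg_h, zi, theta, n=5_000) for zi in z], z)

# Illustration: -h(x) = (p - 2)/||x||^2, the choice made in Example 13.1 below.
theta = np.array([2.0, 0.0, 0.0, 0.0, 0.0])
neg_h = lambda x: (len(x) - 2) / (x @ x)
print(m_hat(neg_h, 1.0, theta), M_star_hat(neg_h, 1.0, theta))
```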

Theorem 13.1

Suppose that \(\mathbf{X}\sim \mathrm{SS}_p({\varvec{\theta }}, I_p)\) (spherically symmetric about mean vector \({\varvec{\theta }}\)) and \({\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X})\) is defined by (13.2). Then under quadratic loss (13.3), \({\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X})\) has a smaller risk than \({\varvec{\delta }}_{0,\,\mathbf{g}}(\mathbf{X})=\mathbf{X}\) if

(i)     \(||\mathbf{g}||^2/2\le -h\le -\triangledown \circ \mathbf{g},\) where \(-h\) is superharmonic,

(ii)   \(r\int _0^1m(rz)pz^{p-1}dz\ge c\int _0^1M_*(rz)pz^{p-2}dz\) when \(r>\sqrt{ap},\) where m and \(M_*\) are defined by (13.4) and \(1\le c\le p-1\) is a constant, and

(iii)  \(0<a\le \mu _1/(p\mu _{-1}),\) where \(\mu _{i}=E(R^{i})\) for \(i=-1,1\) and \(R=||\mathbf{X}-{\varvec{\theta }}||.\)

Remark 13.1

The condition (ii) of Theorem 13.1 is slightly weaker than the condition (ii) of Brandwein and Strawderman [7]. To see this, we use integration by parts to obtain that

$$\begin{aligned} r\int _0^1m(rz)pz^{p-1}dz=p\left( M_*(r)-\int _0^1M_*(rz)(p-1)z^{p-2}dz\right) . \end{aligned}$$

Thus, the condition (ii) above is equivalent to \(N(r)\ge 0\) when \(r>\sqrt{ap},\) where N(r) is defined by

$$\begin{aligned} N(r)=M_*(r)-(p-1-c)\int _0^1M_*(rz)z^{p-2}dz. \end{aligned}$$

Taking the derivative of N(r) gives that

$$\begin{aligned} N^\prime (r)=m(r)-(p-1-c)\int _0^1m(rz)z^{p-1}dz. \end{aligned}$$
(13.5)

The condition (ii) of Brandwein and Strawderman [7] is equivalent to

$$\begin{aligned} \int _0^1m(rz)z^{p-1}dz\le {1\over {p-2}}m(r),\quad r>0. \end{aligned}$$
(13.6)

Applying (13.6) to (13.5) yields

$$\begin{aligned} N^\prime (r)\ge m(r)-{p-1-c\over {p-2}}m(r)={c-1\over {p-2}}m(r)\ge 0 \end{aligned}$$

because \(c\ge 1.\) This shows that N(r) is a nondecreasing function of r. Using the fact that \(\lim \limits _{r\rightarrow 0^+}N(r)=0,\) we can conclude that \(N(r)\ge 0\) when \(r>0.\)

It is also worth mentioning that the condition (ii) is only required to hold when \(r>\sqrt{ap};\) no assumption is imposed when \(r\le \sqrt{ap}.\)

Remark 13.2

Let F denote the distribution function (df) of \(R=||\mathbf{X}-{\varvec{\theta }}||.\) Then applying Lemma 13.1 in Sect. 13.5 with \(f_1(r)=r,\,g_1(r)=1/r^2,\,f_2(r)=g_2(r)=1\) and \(d\alpha =dF\) yields that

$$\mu _{-1}=E\left( {1\over {R}}\right) =E\left( R{1\over {R^2}}\right) \le E(R)E\left( {1\over {R^2}}\right) =\mu _1\mu _{-2}.$$

Using this fact, we can conclude that the new bound for a is better than that of Brandwein and Strawderman [7] because

$${1\over {p\mu _{-2}}}\le {\mu _1\over {p\mu _{-1}}}.$$

Remark 13.3

The new bound for a is also better than that of Xu and Izmirlian [24]. This can be seen from a direct comparison with the fact that \(\mu _1\le \mu _{-1}\mu _2,\) which follows from an application of Lemma 13.1 in Sect. 13.5 with \(f_1(r)=1/r,\,g_1(r)=r^2,\,f_2(r)=g_2(r)=1\) and \(d\alpha =dF.\)

Remark 13.4

It should be mentioned that the dimensionality requirement, such as \(p\ge 4,\) usually arises from the condition (i) of Theorem 13.1. Meanwhile, although there are many possible choices for the function h in Theorem 13.1, we usually take \(h(\mathbf{X})=\triangledown \circ \mathbf{g}(\mathbf{X})\) when \(\triangledown \circ \mathbf{g}(\mathbf{X})\) is a subharmonic function.

Example 13.1

Consider the James-Stein [15] estimator which is given by

$$\delta _{a,0}(\mathbf{X})=\left( 1-{a\over {||\mathbf{X}||^2}}\right) \mathbf{X}$$

and discussed by many authors including Brandwein and Strawderman [7] and Fan and Fang [13]. Clearly, taking \(\mathbf{g}(\mathbf{X})=-\mathbf{X}/||\mathbf{X}||^2\) in (13.2) shows that \(\delta _{a,0}(\mathbf{X})\) is a special case of the estimators (13.2). Let

$$h(\mathbf{X})=\triangledown \circ \mathbf{g}(\mathbf{X})=-{p-2\over {||\mathbf{X}||^2}}.$$

Then \(-h\) is superharmonic if \(p\ge 4\) because

$$-\sum _{i=1}^p{\partial ^2h\over {\partial x_i^2}}=-{2(p-2)(p-4)\over {||\mathbf{X}||^4}}\le 0.$$

The condition (i) in Theorem 13.1 is clearly satisfied. Meanwhile, condition (ii) in Theorem 13.1 also holds because Brandwein and Strawderman's [7] technical condition (ii) holds; see Lemma 2.2 of Fan and Fang [13].
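Both computations are easy to verify symbolically; the following sketch (ours, using sympy with the illustrative choice \(p=5\)) checks the divergence of \(\mathbf{g}\) and the sign of the Laplacian of \(-h\):

```python
import sympy as sp

p = 5  # any fixed dimension >= 4 will do
x = sp.symbols(f'x1:{p + 1}', real=True)
r2 = sum(xi**2 for xi in x)

g = [-xi / r2 for xi in x]  # g(X) = -X/||X||^2
div_g = sp.simplify(sum(sp.diff(g[i], x[i]) for i in range(p)))
print(div_g)                # -(p - 2)/||X||^2, here -3/r2

neg_h = (p - 2) / r2        # -h(X)
laplacian = sp.simplify(sum(sp.diff(neg_h, xi, 2) for xi in x))
print(laplacian)            # -2(p - 2)(p - 4)/||X||^4, here -6/r2**2, <= 0
```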

To illustrate the performance of the new bound of a,  we consider two examples below. We use \(a_\mathrm{new}\) to denote the new bound \(\mu _1/(p\mu _{-1})\) of a. We denote by \(a_\mathrm{bs}=1/(p\mu _{-2})\) the bound of a in Brandwein and Strawderman's [7] Theorem 2.1, and by \(a_\mathrm{xi}= [\mu _1/(p^2\mu _{-1})][1-(p-1)\mu _1/(p\mu _{-1}\mu _2)]^{-1}\) the bound of a in Xu and Izmirlian's [24] Theorem 1.
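Since the three bounds are simple functions of the radial moments, they can be compared with a few lines of code; the helper below (our notation, for illustration) is reused in the two examples that follow:

```python
def three_bounds(p, mu_m2, mu_m1, mu1, mu2):
    """Return (a_bs, a_xi, a_new); mu_mk stands for mu_{-k} = E(R^{-k})."""
    a_bs = 1.0 / (p * mu_m2)
    a_xi = (mu1 / (p**2 * mu_m1)) / (1.0 - (p - 1) * mu1 / (p * mu_m1 * mu2))
    a_new = mu1 / (p * mu_m1)
    return a_bs, a_xi, a_new
```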

Example 13.2

Let \(\mathbf{X}\) have a normal distribution with mean \(\varvec{\theta }\) and covariance matrix \(I_p.\) Then \(R^2=||\mathbf{X}-{\varvec{\theta }}||^2\) has a \(\chi ^2_p\)-distribution, which implies that \(\mu _{-2}=1/(p-2),\,\mu _{-1}=\varGamma ((p-1)/2)/[\sqrt{2}\varGamma (p/2)],\) and \(\mu _1=\sqrt{2}\varGamma ((p+1)/2)/\varGamma (p/2).\) Table 13.1 below provides the values of the three bounds of a for different p.

Table 13.1 Bounds of a

One can see from Table 13.1 that the new bound of a is the best; in particular, it is much better than the other two bounds when the dimension is small.
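In fact, all three bounds have closed forms in this example: since \(\varGamma ((p+1)/2)=[(p-1)/2]\varGamma ((p-1)/2)\) gives \(\mu _1/\mu _{-1}=p-1,\) and \(\mu _2=p,\) we obtain

$$\begin{aligned} a_\mathrm{bs}={p-2\over {p}},\qquad a_\mathrm{xi}={p-1\over {2p-1}},\qquad a_\mathrm{new}={p-1\over {p}}. \end{aligned}$$

For \(p=4\) these are \(0.5,\) \(3/7\approx 0.429,\) and \(0.75,\) respectively; note that \(a_\mathrm{new}-a_\mathrm{bs}=1/p,\) so the advantage over \(a_\mathrm{bs}\) is most pronounced in low dimensions.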

Example 13.3

Let \(\mathbf{X}\) have a uniform distribution in the unit sphere centered at \(\varvec{\theta }.\) Then \(R=||\mathbf{X}-{\varvec{\theta }}||\) has a probability density function (pdf) \(pr^{p-1},\, 0\le r\le 1\) and \(\mu _i=p/(p+i),\, i=-2,-1,1,2.\) Thus, \(a_\mathrm{bs}=(p-2)/p^2,\) \(a_\mathrm{xi}=(p-1)/(p^2+3p-2),\) and \(a_\mathrm{new}=(p-1)/[p(p+1)].\) Table 13.2 below provides the values of the three bounds of a for different p.

Table 13.2 Bounds of a

One can see from Table 13.2 that the new bound of a is the best. Meanwhile, all three bounds approach zero as the dimension increases.
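Reusing the three_bounds helper sketched before Example 13.2, the bounds in both examples can be evaluated numerically for a few dimensions:

```python
from math import gamma, sqrt

for p in (4, 6, 10, 20):
    # Example 13.2: R^2 ~ chi^2_p
    mu_m2 = 1.0 / (p - 2)
    mu_m1 = gamma((p - 1) / 2) / (sqrt(2.0) * gamma(p / 2))
    mu1 = sqrt(2.0) * gamma((p + 1) / 2) / gamma(p / 2)
    print('normal ', p, three_bounds(p, mu_m2, mu_m1, mu1, float(p)))
    # Example 13.3: uniform in the unit sphere, mu_i = p/(p + i)
    print('uniform', p, three_bounds(p, p/(p - 2), p/(p - 1), p/(p + 1), p/(p + 2)))
```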

3 Extensions to Other Loss Functions and the Unknown Scale Case

Similar to Xu and Izmirlian [24], we consider two extensions in this section. The first shows that Theorem 13.1 in Sect. 13.2 can be generalized to a larger class of loss functions, while the second treats estimation of the location vector with an unknown scale parameter.

The loss function used in the first extension is

$$\begin{aligned} L({\varvec{\delta }},{\varvec{\theta }})=W\left( ||{\varvec{\delta }}-{\varvec{\theta }}||^2\right) , \end{aligned}$$
(13.7)

where W is a nonnegative and nondecreasing concave function. The loss function (13.7) has been studied for the spherically symmetric distributions by many investigators including Bock [1], Brandwein and Strawderman [4, 6, 7], Fan and Fang [12,13,14], Xu [23], and Xu and Izmirlian [24].

Theorem 13.2

Let F be the df of \(R=||\mathbf{X}-{\varvec{\theta }}||\) satisfying

$$\begin{aligned} 0<\int _0^{\infty }W^\prime \left( r^2\right) dF(r)<\infty , \end{aligned}$$

where \(W^\prime \) is the derivative of W. Suppose that \(\mathbf{X}\) is spherically symmetric about \({\varvec{\theta }}\) and \({\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X})\) is defined by (13.2). Then under loss function (13.7), \({\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X})\) has a smaller risk than \(\mathbf{X}\) if the conditions (i) and (ii) of Theorem 13.1 hold and

(iii) \(0<a<\nu _1/(p\nu _{-1}),\) where \(\nu _i=E_G(R^i)\) for \(i=-1,1\) and G is a weighted df of F with the weight function \(W^\prime \left( r^2\right) \) defined by

$$\begin{aligned} G(t)=\left( \int _0^{\infty }W^\prime \left( r^2\right) dF(r)\right) ^{-1}\int _0^tW^\prime \left( r^2\right) dF(r),\quad t\ge 0. \end{aligned}$$
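As a concrete illustration of condition (iii) (our example, not part of the theorem), take \(W(t)=\sqrt{t},\) so that \(W^\prime (r^2)=1/(2r),\) and let F be the df of Example 13.3 with pdf \(pr^{p-1}\) on [0, 1]. Then

$$\begin{aligned} dG(r)=(p-1)r^{p-2}dr,\qquad \nu _i={p-1\over {p-1+i}},\qquad {\nu _1\over {p\nu _{-1}}}={p-2\over {p^2}} \end{aligned}$$

for \(p\ge 3;\) the weighting by \(W^\prime (r^2)\) shifts the bound from \((p-1)/[p(p+1)]\) under quadratic loss down to \((p-2)/p^2.\)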

Now we investigate the problem of estimating the location vector \({\varvec{\theta }}=(\theta _1,\ldots ,\theta _p)^T\) when the observation \((\mathbf{X}^T,\mathbf{Y}^T)^T\) contains an \(m\times 1\) residual vector \(\mathbf{Y}\) such that \(\mathbf{X}_*^T=(1/\sigma )(\mathbf{X}^T, \mathbf{Y}^T)\) follows a spherically symmetric distribution \(\mathrm{SS}_{p+m}({\varvec{\theta }}_*,\sigma ^2I_{p+m}), \) where \({\varvec{\theta }}_*^T=({\varvec{\theta }}^T, \mathbf{0}_m^T),\) \(\mathbf{0}_m\) is an \(m\times 1\) vector in which all elements are zero, and \(\sigma \) is an unknown scale. The improved estimator we consider is given by

$$\begin{aligned} {\varvec{\delta }}^*_{a,\,\mathbf{g}}(\mathbf{X}_*)=\mathbf{X}+a\mathbf{Y}^T\mathbf{Y}\,\mathbf{g}(\mathbf{X}). \end{aligned}$$
(13.8)

Theorem 13.3

Suppose that \(\mathbf{X}\) is a \(p\times 1\) random vector and \(\mathbf{Y}\) is an \(m\times 1\) random vector such that \(\mathbf{X}_*=(1/\sigma )\,(\mathbf{X}^T, \mathbf{Y}^T)^T\sim \mathrm{SS}_{p+m}({\varvec{\theta }}_*,\sigma ^2\,I_{p+m}). \) Let \({\varvec{\delta }}^*_{a,\,\mathbf{g}}(\mathbf{X}_*)\) be defined by (13.8). Then under the scaled quadratic loss function

$$L({\varvec{\delta }},{\varvec{\theta }})=||{\varvec{\delta }}-{\varvec{\theta }}||^2/\sigma ^2,$$

\({\varvec{\delta }}^*_{a,\,\mathbf{g}}(\mathbf{X}_*)\) dominates \(\mathbf{X}\) if conditions (i) and (ii) of Theorem 13.1 hold and

(iii) \(0<a<(p-1)/[p(m+2)].\)

The bound of a in Theorem 13.3 does not depend on the distribution of \(\mathbf{X}_*.\) Cellier, Fourdrinier and Robert [10] first observed this type of robustness phenomenon for the James-Stein estimator.
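A minimal simulation sketch (ours) illustrating Theorem 13.3 under normal errors, with the James-Stein choice \(\mathbf{g}(\mathbf{X})=-\mathbf{X}/||\mathbf{X}||^2\) of Example 13.1 and a set at the bound \((p-1)/[p(m+2)]\):

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, sigma, n = 6, 4, 2.0, 200_000
theta = np.ones(p)
a = (p - 1) / (p * (m + 2))  # bound of Theorem 13.3, free of the scale

X = theta + sigma * rng.standard_normal((n, p))
Y = sigma * rng.standard_normal((n, m))  # residual vector, mean zero

g = -X / np.sum(X**2, axis=1, keepdims=True)             # g(X) = -X/||X||^2
delta = X + a * np.sum(Y**2, axis=1, keepdims=True) * g  # estimator (13.8)

risk_X = np.mean(np.sum((X - theta)**2, axis=1)) / sigma**2       # ~ p
risk_delta = np.mean(np.sum((delta - theta)**2, axis=1)) / sigma**2
print(risk_X, risk_delta)  # the second value should be the smaller one
```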

4 Discussion

If \(-h\) is superharmonic, Brandwein and Strawderman [7] used the fact that its average over the ball (“volume”) is greater than its average over the sphere (“surface area”) to show the dominance of estimators of the form \({\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X})\) over the estimator \(\mathbf{X}.\) In this paper we instead use the fact that the average of \(-h\) over the sphere is a nonincreasing function of the radius of the sphere. The new approach allows us not only to weaken their technical condition (ii), but also to obtain a bound for a that is better than those of Brandwein and Strawderman [7] and Xu and Izmirlian [24]. In addition, we consider two extensions. The first extends the quadratic loss (13.3) to the loss function (13.7), while the second studies estimators of the location vector when the observation \((\mathbf{X}^T, \mathbf{Y}^T)^T\) contains a residual vector \(\mathbf{Y}\) and the scale is unknown. While the bounds of a given by the theorems in Sects. 13.2 and 13.3 are better than those of Brandwein and Strawderman [7] and Xu and Izmirlian [24], they are not necessarily optimal and should be considered a guidepost. Clearly, one may be able to obtain better bounds than those given here if the distribution of R is known. Stein [20], for example, used integration by parts to obtain \(0<a\le 1\,(=\mu _2/p)\) under normality. Thus, it would be interesting to see whether our new bound \(0<a\le \mu _1/(p\mu _{-1})\) can be further improved to \(0<a\le \mu _2/p\) for \(\mathbf{X}\sim \mathrm{SS}_p(\varvec{\theta }, I_p).\) As a final point, it would be interesting, but perhaps very difficult, to study the dominance conditions of the estimator \({\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X})=\mathbf{X}+a\mathbf{g}(\mathbf{X})\) over \(\mathbf{X}\) for other distributions. As mentioned by Xu and Izmirlian [24], the results available for the estimator (13.1), obtained by Shinozaki [18] for the class of distributions with independent, identically and symmetrically distributed components and by Xu [23] for sign-invariant distributions, are very limited.

5 Proofs

In this section we use \(f_{c,\,s}(z)\) to denote the pdf of the \(\mathrm{Beta}\) distribution \(\mathrm{Beta}(c,s)\) given by

$$\begin{aligned} f_{c,\,s}(z)={\varGamma (c+s)\over {\varGamma (c)\varGamma (s)}}z^{c-1}(1-z)^{s-1},\quad 0<z<1, \end{aligned}$$

where \(c>0\) and \(s>0\) are parameters. To shorten the proofs of results in Sects. 13.2 and 13.3, we need the following lemmas, the first of which is taken from Wijsman's [22] Theorem 2.

Lemma 13.1

Let \(\alpha \) be a measure on the real line \(\mathbb {R}\) and let \(f_j,\,g_j\,(j=1,2)\) be Borel-measurable functions: \(\mathbb {R}\rightarrow \mathbb {R}\) such that \(f_2\ge 0,\,g_2\ge 0,\) and \(\int |f_ig_j|d\alpha <\infty \, (i, j=1, 2).\) If \(f_1/f_2\) and \(g_1/g_2\) are monotonic in the same direction, then

$$\begin{aligned} \int f_1g_1d\alpha \int f_2g_2d\alpha \ge \int f_1g_2d\alpha \int f_2g_1d\alpha , \end{aligned}$$
(13.9)

whereas if \(f_1/f_2\) and \(g_1/g_2\) are monotonic in the opposite directions, then inequality in (13.9) is reversed. The equality in (13.9) holds if and only if \(f_2=0\) or \(g_2=0\) or \(f_1/f_2=\text {constant}\) or \(g_1/g_2=\text {constant}\) almost everywhere with respect to the measure \(\rho \) defined by \(d\rho =(|f_1|+|f_2|)(|g_1|+|g_2|)d\alpha .\)
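For instance, taking \(f_1(r)=r,\) \(g_1(r)=1/r^2,\) and \(f_2=g_2=1,\) the two ratios are monotone in opposite directions, so the inequality in (13.9) is reversed; this is exactly the inequality \(\mu _{-1}\le \mu _1\mu _{-2}\) used in Remark 13.2 and is easy to confirm numerically (our sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
r = np.sqrt(rng.chisquare(df=5, size=1_000_000))  # R with R^2 ~ chi^2_5

lhs = np.mean(r * r**-2)            # E(R * R^{-2}) = mu_{-1}
rhs = np.mean(r) * np.mean(r**-2)   # E(R) E(R^{-2}) = mu_1 mu_{-2}
print(lhs, rhs)                     # lhs <= rhs, as the reversed (13.9) predicts
```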

Lemma 13.2

Let the function \(M_*\) be defined by (13.4). Then

$$\begin{aligned} \int _0^1M_*(rz)f_{c-1,1}(z)dz\le M_*(r)\le \int _0^1M_*(rz)cz^{c-2}dz, \end{aligned}$$

for any \(r>0,\) where \(c>1\) is a constant.

Proof

Since \(M_*\) is a nondecreasing concave function with \(M_*(0)=0\) and the expected value of the \(\mathrm{Beta}(c-1,1)\) distribution is \((c-1)/c,\) Jensen's inequality yields that

$$\begin{aligned} \int _0^1M_*(rz)f_{c-1,1}(z)dz\le M_*\left( r{c-1\over {c}}\right) \le M_*(r). \end{aligned}$$

Furthermore, the concavity of \(M_*\) implies that \(M_*(rz)\ge zM_*(r)\) for \(z\in [0,1]\) and \(r>0.\) Thus,

$$\begin{aligned} \int _0^1M_*(rz)cz^{c-2}dz\ge \int _0^1M_*(r)cz^{c-1}dz=M_*(r)\int _0^1cz^{c-1}dz=M_*(r). \end{aligned}$$
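A quick numerical sanity check of Lemma 13.2 (our illustration), with the concave choice \(M_*(t)=\log (1+t),\) \(r=3,\) and \(c=5,\) recalling that \(f_{c-1,1}(z)=(c-1)z^{c-2}\):

```python
import numpy as np
from scipy.integrate import quad

M_star = lambda t: np.log1p(t)  # nondecreasing, concave, M_*(0) = 0
r, c = 3.0, 5.0

lower = quad(lambda z: M_star(r * z) * (c - 1) * z**(c - 2), 0.0, 1.0)[0]
upper = quad(lambda z: M_star(r * z) * c * z**(c - 2), 0.0, 1.0)[0]
print(lower, M_star(r), upper)  # nondecreasing from left to right
```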

Lemma 13.3

For \(z\in [0,1],\) let

$$\ell (z)={\beta (r)\over {p}}f_{p,1}(z)+1-{\beta (r)\over {p}},$$

where \(\beta (r)=r^2/a\) is considered a parameter. Then \(\ell (z)\) is a pdf on [0, 1] when \(\beta (r)\le p.\) Furthermore, when \(\beta (r)\le p,\) we have

$$\begin{aligned} m(r)-\beta (r)\int _0^1m(rz)z^{p-1}dz\le \left( 1-{\beta (r)\over {p}}\right) {M_*(r)\over {r}} \end{aligned}$$

for \(r>0,\) where m and \(M_*\) are defined by (13.4).

Proof

When \(\beta (r)\le p,\) \(\ell (z)\) is a pdf on [0, 1] because it is a convex combination of pdfs \(f_{p,1}(z)\) and \(f_{1,1}(z)=1\) on [0, 1]. Furthermore, since m is a nonincreasing function, we have

$$\begin{aligned} m(r)\le \int _0^1m(rz)\ell (z)dz, \end{aligned}$$

which leads to

$$\begin{aligned} \begin{aligned} m(r)-\beta (r)\int _0^1m(rz)z^{p-1}dz&\le \int _0^1m(rz)\ell (z)dz-{\beta (r)\over {p}}\int _0^1m(rz)f_{p,1}(z)dz\\&=\int _0^1m(rz)\left( \ell (z)-{\beta (r)\over {p}}f_{p,1}(z)\right) dz\\&=\left( 1-{\beta (r)\over {p}}\right) \int _0^1m(rz)dz\\&=\left( 1-{\beta (r)\over {p}}\right) {M_*(r)\over {r}}. \end{aligned} \end{aligned}$$

Lemma 13.4

When \(\beta (r)=r^2/a>p,\) we have

$$\begin{aligned} m(r)-\beta (r)\int _0^1m(rz)z^{p-1}dz\le \left( 1-{\beta (r)\over {p}}\right) {c\over {r}}\int _0^1M_*(rz)pz^{p-2}dz \end{aligned}$$

for \(r>0,\) where m and \(M_*\) are defined by (13.4) and \(c\in [1,p-1]\) is a constant.

Proof

Since m is a nonincreasing function, we have

$$\begin{aligned} m(r)\le \int _0^1m(rz)f_{p,1}(z)dz, \end{aligned}$$

which leads to

$$\begin{aligned} \begin{aligned} m(r)-\beta (r)\int _0^1m(rz)z^{p-1}dz&\le \left( 1-{\beta (r)\over {p}}\right) \int _0^1m(rz)f_{p,1}(z)dz\\&\le \left( 1-{\beta (r)\over {p}}\right) {c\over {r}}\int _0^1M_*(rz)pz^{p-2}dz. \end{aligned} \end{aligned}$$
(13.10)

Here the last inequality in (13.10) follows from the condition (ii) of Theorem 13.1 and \(\beta (r)>p.\)

Remark 13.5

Lemmas 13.3 and 13.4 can be combined as follows:

$$\begin{aligned} m(r)-\beta (r)\int _0^1m(rz)z^{p-1}dz\le N_1(r)N_2(r), \end{aligned}$$
(13.11)

where \(\beta (r)=r^2/a\) and

$$\begin{aligned} \begin{aligned} N_1(r)&=\left( 1-{\beta (r)\over {p}}\right) {1\over {r}},\\ N_2(r)&=I[\beta (r)\le p]M_*(r)+cI[\beta (r)>p]\int _0^1M_*(rz)pz^{p-2}dz. \end{aligned} \end{aligned}$$
(13.12)

Here I[A] denotes the indicator function of the event A.

Lemma 13.5

For \(r>0,\) let \(N_1(r)\) and \(N_2(r)\) be defined by (13.12). Then \(N_1(r)\) is strictly decreasing in r and \(N_2(r)\) is nondecreasing in r. Furthermore, \(E_R[N_1(R)N_2(R)]\le 0\) if \(a\le \mu _1/(p\mu _{-1}).\)

Proof

Since \(N_1(r)=1/r-r/(ap),\) it is a strictly decreasing function of r. Similarly, since both \(M_*(r)\) and \(\int _0^1M_*(rz)pz^{p-2}dz\) are nondecreasing in r and \(M_*(r)\le \int _0^1M_*(rz)pz^{p-2}dz\) from Lemma 13.2, we can conclude that \(N_2(r)\) is a nondecreasing function of r. Furthermore, applying Lemma 13.1 with \(f_1(r)=N_1(r),\) \(g_1(r)=N_2(r),\) \(f_2(r)=g_2(r)=1,\) and the probability measure \(d\alpha =dF\) yields that

$$\begin{aligned} E_R[N_1(R)N_2(R)]\le E_R[N_1(R)]E_R[N_2(R)]=\left( \mu _{-1}-{\mu _1\over {ap}}\right) E[N_2(R)]\le 0 \end{aligned}$$

if \(a\le \mu _1/(p\mu _{-1})\) because \(N_2(r)\ge 0\) for \(r>0.\)

Proof of Theorem 13.1. When \(\mathbf{X}\sim \mathrm{SS}_p({\varvec{\theta }},I_p),\) we have \(\mathbf{X}-{\varvec{\theta }}=\mathbf{Z}\buildrel d\over =R\mathbf{U},\) where R and \(\mathbf{U}\) are independent, \(R\buildrel d\over =||\mathbf{Z}||,\) and \(\mathbf{U}\) has a uniform distribution on the surface of the unit sphere. Using the argument of Xu and Izmirlian [24], in particular their Eq. (12), we obtain that the difference between the risks of the two estimators \({\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X})\) and \(\mathbf{X}\) is given by

$$\begin{aligned} \begin{aligned} D_1&=R\big ({\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X}), {\varvec{\theta }}\big )-R\big (\mathbf{X},{\varvec{\theta }}\big )\\&=a^2E\left[ ||\mathbf{g}(\mathbf{Z}+{\varvec{\theta }})||^2\right] +2a E\left[ \mathbf{Z}^T\mathbf{g}(\mathbf{Z}+{\varvec{\theta }})\right] \\&=a^2E\left[ ||\mathbf{g}(\mathbf{Z}+{\varvec{\theta }})||^2\right] +2ap^{-1}E\left[ R^2\triangledown \circ \mathbf{g}(R\mathbf{V}+{\varvec{\theta }})\right] \\&\le 2a^2E\left[ -h(R\mathbf{U}+{\varvec{\theta }})\right] +2ap^{-1}E\left[ R^2h(R\mathbf{V}+{\varvec{\theta }})\right] \\&=2a^2E_R\left[ E_\mathbf{U}\left( -h(R\mathbf{U}+{\varvec{\theta }})\big |R\right) +(ap)^{-1}R^2E_\mathbf{V}\left( h(R\mathbf{V}+{\varvec{\theta }})\big |R\right) \right] \\&=2a^2E_R\left[ m(R)-(ap)^{-1}R^2E_\mathbf{V}\left( -h(R\mathbf{V}+{\varvec{\theta }})\big |R\right) \right] \\&=2a^2E_R\left[ m(R)-\beta (R)\int _0^1m(Rv)v^{p-1}dv\right] \\&\le 2a^2E_R\left[ N_1(R)N_2(R)\right] \\&\le 0 \end{aligned} \end{aligned}$$
(13.13)

if \(a\le \mu _1/(p\mu _{-1}).\) Here the first inequality in (13.13) is based on the condition (i); the fifth equality in (13.13) is from the definition of function m;  the last equality in (13.13) follows from the definition of m and the fact that \(\mathbf{V}\buildrel d\over =V\mathbf{U},\) where the random variable \(V\sim \mathrm{Beta}(p,1)\) and \(\mathbf{U}\) having a uniform distribution on the surface of the unit sphere are independent; the second-to-last inequality in (13.13) is based on Lemmas 13.3 and 13.4 or (13.11); the last inequality in (13.13) follows from Lemma 13.5. This completes the proof.
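The following sketch (ours) illustrates Theorem 13.1 numerically in the normal case of Example 13.2, where the bound \(\mu _1/(p\mu _{-1})\) equals \((p-1)/p,\) using the \(\mathbf{g}\) of Example 13.1; the estimated risk difference \(D_1\) should be negative:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 5, 500_000
theta = np.full(p, 0.5)
a = (p - 1) / p  # mu_1/(p mu_{-1}) for R^2 ~ chi^2_p

X = theta + rng.standard_normal((n, p))
delta = X - a * X / np.sum(X**2, axis=1, keepdims=True)  # X + a g(X)

D1 = np.mean(np.sum((delta - theta)**2, axis=1)
             - np.sum((X - theta)**2, axis=1))
print(D1)  # estimated risk difference; should be negative
```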

Proof of Theorem 13.2. Using the same approach as in Brandwein and Strawderman [4, 6] or Xu and Izmirlian [24], we obtain that the difference between the risks of two estimators \({\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X})\) and \(\mathbf{X}\) is given by

$$\begin{aligned} \begin{aligned} D_2&=R\big ({\varvec{\delta }}_{a,\, \mathbf{g}}(\mathbf{X}), {\varvec{\theta }}\big )-R(\mathbf{X}, {\varvec{\theta }})\\&=E\left[ W\left( R^2+\varDelta _a({\mathbf{X}})\right) \right] -E\left[ W\left( R^2\right) \right] , \end{aligned} \end{aligned}$$
(13.14)

where

$$\varDelta _a(\mathbf{X})=||{\varvec{\delta }}_{a,\,\mathbf{g}}(\mathbf{X})-{\varvec{\theta }}||^2-||\mathbf{X}-{\varvec{\theta }}||^2.$$

Since W is a nondecreasing concave function,

$$W\left( R^2+\varDelta _a(\mathbf{X})\right) \le W\left( R^2\right) +W^\prime \left( R^2\right) \varDelta _a(\mathbf{X}).$$

Then we can conclude from (13.14) that

$$\begin{aligned} \begin{aligned} D_2&\le E_{\mathbf{X}}\left[ W^\prime \left( R^2\right) \varDelta _a(\mathbf{X})\right] \\&\le E_R\left\{ W^\prime \left( R^2\right) E_\mathbf{U}[\varDelta _a(R\mathbf{U}+{\varvec{\theta }})|R]\right\} \\&\le 2a^2E_R\left[ W^\prime (R^2)N_1(R)N_2(R)\right] \\&=2a^2E_{R_*}\left[ N_1(R_*)N_2(R_*)\right] E_R\left[ W^\prime \left( R^2\right) \right] , \end{aligned} \end{aligned}$$

where the df G of the random variable \(R_*\) is defined by

$$G(t)=\left( \int _0^{\infty }W^\prime \left( r^2\right) dF(r)\right) ^{-1}\int _0^tW^\prime \left( r^2\right) dF(r),\quad t\ge 0,$$

which is a weighted df of F with the weight function \(W^\prime \left( r^2\right) .\) The result follows immediately from the assumption that \(0<E_R\left[ W^\prime \left( R^2\right) \right] <\infty \) and from the proof of Theorem 13.1, with the df F replaced by the df G.

Proof of Theorem 13.3. As in Brandwein and Strawderman [7] and Xu and Izmirlian [24], the difference \(D_3\) between the risks of the two estimators \({\varvec{\delta }}^*_{a,\,\mathbf{g}}(\mathbf{X}_*)\) and \(\mathbf{X}\) is equal to

$$\begin{aligned} \begin{aligned} D_3&=R\left( {\varvec{\delta }}^*_{a,\,\mathbf{g}}(\mathbf{X}_*),{\varvec{\theta }}\right) -R\left( \mathbf{X},{\varvec{\theta }}\right) \\&={1\over {\sigma ^2}}E\left[ a^2(\mathbf{Y}^T\mathbf{Y})^2||\mathbf{g}(\mathbf{Z}+{\varvec{\theta }})||^2+2a\,\mathbf{Y}^T\mathbf{Y}{} \mathbf{Z}^T\mathbf{g}(\mathbf{Z}+{\varvec{\theta }})\right] \\&={1\over {\sigma ^2}}E\left( a^2D_{31}+2aD_{32}\right) , \end{aligned} \end{aligned}$$
(13.15)

where \(\mathbf{Z}=\mathbf{X}-{\varvec{\theta }}\buildrel d\over =R\mathbf{U},\) and

$$\begin{aligned} \begin{aligned} D_{31}&=E\left[ (\mathbf{Y}^T\mathbf{Y})^2||\mathbf{g}(\mathbf{Z}+{\varvec{\theta }})||^2\bigl |\,||\mathbf{Z}||=R, ||\mathbf{Y}||=S\right] ,\\ D_{32}&=E\left[ \mathbf{Y}^T\mathbf{Y}{} \mathbf{Z}^T\mathbf{g}(\mathbf{Z}+{\varvec{\theta }})\bigl |\,||\mathbf{Z}||=R,||\mathbf{Y}||=S\right] . \end{aligned} \end{aligned}$$

Using the divergence theorem and condition (i), we obtain that

$$\begin{aligned} \begin{aligned} D_{32}&=E\left[ \mathbf{Y}^T\mathbf{Y}{} \mathbf{Z}^T\mathbf{g}(\mathbf{Z}+{\varvec{\theta }})\bigl |||\mathbf{Z}||=R,||\mathbf{Y}||=S\right] \\&=S^2RE_\mathbf{U}\left[ \mathbf{U}^T\mathbf{g}(R\mathbf{U}+{\varvec{\theta }})\bigl |||\mathbf{Z}||=R,||\mathbf{Y}||=S\right] \\&={S^2R^2\over {p}}E_\mathbf{V}\big (\triangledown \circ \mathbf{g}(R\mathbf{V}+{\varvec{\theta }})\bigl |||\mathbf{Z}||=R,||\mathbf{Y}||=S\big )\\&\le -{S^2R^2\over {p}}\int _0^1m(Rz)f_{p, 1}(z)dz, \end{aligned} \end{aligned}$$
(13.16)

where m is defined by (13.4). Similarly, using the condition (i) yields

$$\begin{aligned} \begin{aligned} D_{31}&=E\left[ (\mathbf{Y}^T\mathbf{Y})^2||\mathbf{g}(\mathbf{Z}+{\varvec{\theta }})||^2\bigl |||\mathbf{Z}||=R,||\mathbf{Y}||=S\right] \\&\le -2S^4E\left[ h(\mathbf{Z}+{\varvec{\theta }})\bigl |||\mathbf{Z}||=R,||\mathbf{Y}||=S\right] \\&=-2S^4E\left[ h(R\mathbf{U}+{\varvec{\theta }})\bigl |||\mathbf{Z}||=R,||\mathbf{Y}||=S\right] \\&=2S^4m(R). \end{aligned} \end{aligned}$$
(13.17)

Combining (13.16) and (13.17) with (13.15) and using the same argument as in the proof of Theorem 13.1 yields the following inequality:

$$\begin{aligned} \begin{aligned} D_3&\le {2a\over {\sigma ^2}}E\left( aS^4m(R)-{S^2R^2\over {p}}\int _0^1m(Rz)\,f_{p,1}(z)\,dz\right) \\&={2a\over {\sigma ^2}}E\left[ \left( aS^4\right) \left( m(R)-{R^2\over {aS^2}}\int _0^1m(Rz)z^{p-1}dz\right) \right] \\&\le {2a\over {\sigma ^2}}E\left[ \left( aS^4\right) \left( 1-{R^2\over {apS^2}}\right) {1\over {R}}N_2(R)\right] , \end{aligned} \end{aligned}$$
(13.18)

where the first inequality in (13.18) is based on (13.16) and (13.17), the last inequality of (13.18) follows from (13.11) after replacing a by \(aS^2,\) and \(N_2(R)\) is defined by (13.12). Let \(T^2=R^2+S^2.\) Then \(T^2\) and \(B=R^2/T^2\sim \mathrm{Beta}(p/2,m/2)\) are independent. Let \(C(c,s)=\varGamma (c+s)/[\varGamma (c)\varGamma (s)]\) for \(c>0, s>0\) and let \(C^*=C(p/2,m/2)/C(p/2,(m+2)/2).\) Write \(\lambda =a+1/p.\) Then we can see from (13.18) that

$$\begin{aligned} \begin{aligned} {\sigma ^2\over {2a}}D_3&\le E\left[ aS^4\left( 1-{R^2\over {apS^2}}\right) {1\over {R}}N_2(R)\right] \\&=E\left[ (1-B)\left( a-\lambda B\right) T^4\left( N_1\left( TB^{1/2}\right) +{N\left( TB^{1/2}\right) \over {TB^{1/2}}}\right) \right] \\&=C^*E\left[ \left( a-\lambda B\right) T^4\left( N_1\left( TB^{1/2}\right) +{N\left( TB^{1/2}\right) \over {TB^{1/2}}}\right) \right] \\&=C^*E\left[ \left( a-\lambda B\right) T^4N_1\left( TB^{1/2}\right) \right] \\&+C^*E\left[ \left( aB^{-1/2}-\lambda B^{1/2}\right) T^3N(TB^{1/2})\right] \\&\le C^*\left( a-\lambda {p\over {p+m+2}}\right) E\left[ T^4N_1\left( TB^{1/2}\right) \right] \\&+C^*\left( a{{C(p/2,(m+2)/2)}\over {C((p-1)/2, (m+2)/2)}}-\lambda {{C(p/2,(m+2)/2)}\over {C((p+1)/2,(m+2)/2)}}\right) \\&\times E\left[ T^3N\left( TB^{1/2}\right) \right] \\&\le 0 \end{aligned} \end{aligned}$$
(13.19)

if

$$\begin{aligned} \begin{aligned}&a-\lambda {p\over {p+m+2}}\le 0,\\&a{C\left( p/2,(m+2)/2\right) \over {C\left( (p-1)/2, (m+2)/2\right) }}-\lambda {{C\left( p/2,(m+2)/2\right) }\over {C\left( (p+1)/2,(m+2)/2\right) }}\le 0. \end{aligned} \end{aligned}$$
(13.20)

Here the second-to-last inequality of (13.19) follows from applications of Lemma 13.1 with the measure \(d\alpha =f_{p/2, (m+2)/2}(b)db\) on [0, 1] and \(f_1(b)=a-\lambda b, g_1(b)=T^4N_1(Tb^{1/2}), f_2(b)=g_2(b)=1,\) and \(f_1(b)=ab^{-1/2}-\lambda b^{1/2}, g_1(b)=T^3N(Tb^{1/2}),\) \(f_2(b)=g_2(b)=1,\) respectively. Simple algebra shows that the first inequality in (13.20) is equivalent to \(0<a\le 1/(m+2),\) while the second inequality in (13.20) is equivalent to \(0<a\le (p-1)/[p(m+2)].\) Therefore, \(D_3\le 0\) if \(0<a\le (p-1)/[p(m+2)].\)