4.1 Introduction

In the previous chapters, estimation problems were considered for the normal distribution setting. Stein (1956) showed that the usual estimator of a location vector could be improved upon quite generally for p ≥ 3, and Brown (1966) substantially extended this conclusion to essentially arbitrary loss functions. Explicit results of the James-Stein type, however, have thus far been restricted to the case of the normal distribution. Recall from the geometrical insight of Sect. 2.2.2 that the development did not depend on the normality of X or even on θ being a location vector; this suggests that the improvement for Stein-type estimators may hold for more general distributions. Strawderman (1974a) first explored such an extension and considered estimation of the location parameter for scale mixtures of multivariate normal distributions. Other extensions of James-Stein type results to distributions other than scale mixtures of normal distributions are due to Berger (1975), Brandwein and Strawderman (1978), and Bock (1985). In this chapter, we will introduce the general class of spherically symmetric distributions; we will examine point estimation for variants of this general class in the subsequent three chapters.

4.2 Spherically Symmetric Distributions

The normal distribution has been generalized in two important directions: first, as a special case of the exponential family and, second, as a spherically symmetric distribution. In this chapter, we will consider the latter. There are a variety of equivalent definitions and characterizations of the class of spherically symmetric distributions; a comprehensive review is given in Fang et al. (1990). We now turn our interest to general orthogonally invariant distributions in \(\mathbb {R}^n\) and a slightly more general notion of spherically symmetric distributions.

Definition 4.1

A random vector \(X \in \mathbb {R}^n\) (equivalently the distribution of X) is spherically symmetric about \(\theta \in \mathbb {R}^n\) if X − θ is orthogonally invariant. We denote this by X ∼ SS(θ).

Note that Definition 4.1 states that X ∼ SS(θ) if and only if X = Z + θ where Z ∼ SS(0). As an example, the uniform distribution \({\mathcal U}_{R,\theta }\) (cf. Definition 1.4) on the sphere S R,θ of radius R and centered at θ is spherically symmetric about θ. Furthermore, if P is a spherically symmetric distribution about θ, then

$$\displaystyle \begin{aligned}P(HC+ \theta) = P (C+\theta),\end{aligned}$$

for any Borel set C of \(\mathbb {R}^n\) and any orthogonal transformation H.

The following proposition is immediate from the definition.

Proposition 4.1

If a random vector \(X \in \mathbb {R}^n\) is spherically symmetric about \(\theta \in \mathbb {R}^n\) then, for any orthogonal transformation H, HX is spherically symmetric about Hθ (X − θ has the same distribution as HX − Hθ).

The connection between spherical symmetry and uniform distributions on spheres is indicated in the following theorem.

Theorem 4.1

A distribution P in \(\mathbb {R}^n\) is spherically symmetric about \(\theta \in \mathbb {R}^n\) if and only if there exists a distribution ρ in \(\mathbb {R}_+\) such that \(P(A)= \int _{\mathbb {R}_+} {\mathcal U}_{r, \theta } (A)\, d \rho (r)\) for any Borel set A of \(\mathbb {R}^n\) . Furthermore, if a random vector X has such a distribution P, then the radius ∥X − θ∥ has distribution ρ (called the radial distribution) and the conditional distribution of X given ∥X − θ∥ = r is the uniform distribution \({\mathcal U}_{r, \theta }\) on the sphere S r,θ of radius r and centered at θ.

Proof

Sufficiency is immediate since the distribution \({\mathcal U}_{r, \theta }\) is spherically symmetric about θ for any r ≥ 0.

It is clear that for the necessity it suffices to consider θ = 0. Let X be distributed as P where P is SS(0), ν(x) = ∥x∥, and ρ be the distribution of ν. Now, for any Borel sets A in \(\mathbb {R}^n\) and B in \(\mathbb {R}_+\) and for any orthogonal transformation H, we have (using basic properties of conditional distributions )

$$\displaystyle \begin{aligned} \begin{array}{rcl} \int_B P(H^{-1}(A) \mid \nu = r)\, d \rho (r) &\displaystyle =&\displaystyle P(H^{-1}(A) \cap \nu^{-1} (B))\\ &\displaystyle =&\displaystyle P(H^{-1}(A \cap H (\nu^{-1}(B))))\\ &\displaystyle =&\displaystyle P(A \cap H (\nu^{-1}(B)))\\ &\displaystyle =&\displaystyle P(A \cap \nu^{-1}(B))\\ &\displaystyle =&\displaystyle \int_B P(A \mid \nu = r)\,d \rho (r) \end{array} \end{aligned} $$

where we used the orthogonal invariance of the measure P and of the function ν. Since the above equality holds for any B, we have, almost everywhere with respect to ρ,

$$\displaystyle \begin{aligned}P (H^{-1}(A) \mid \nu = r)= P (A \mid \nu = r).\end{aligned}$$

Equivalently, the conditional distribution given ν is orthogonally invariant on S r. By uniqueness (see Lemma 1.1), it is the uniform distribution on S r and the theorem follows. □

Corollary 4.1

A random vector \(X \in \mathbb {R}^n\) has a spherically symmetric distribution about \(\theta \in \mathbb {R}^n\) if and only if X has the stochastic representation X = θ + R U where R (R = ∥X − θ∥) and U are independent, R ≥ 0 and \(U \sim {}\mathcal U\).

Proof

In the proof of Theorem 4.1, we essentially show that the distribution of (X − θ)∕∥X − θ∥ is \(\mathcal U\) independently of ∥X − θ∥. This is the necessity part of the corollary. The sufficiency part is direct. □
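
For readers who wish to experiment numerically, the representation of Corollary 4.1 translates directly into a sampling scheme: draw R from the radial distribution ρ and, independently, draw U uniformly on the unit sphere (by Lemma 1.2, normalizing any spherically symmetric vector, for instance a standard normal one, yields such a U). The following minimal Python sketch (assuming only numpy; the function name and the radial law used in the example are illustrative choices) implements this.

import numpy as np

def sample_ss(theta, radial_sampler, size, seed=None):
    # Sample from a spherically symmetric distribution about theta via the
    # stochastic representation X = theta + R * U of Corollary 4.1.
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    n = theta.shape[0]
    R = radial_sampler(size, rng)                        # radius, drawn from rho
    Z = rng.standard_normal((size, n))
    U = Z / np.linalg.norm(Z, axis=1, keepdims=True)     # direction, uniform on the unit sphere
    return theta + R[:, None] * U

# Example: a chi_3 radial law recovers the N(theta, I_3) distribution
X = sample_ss(np.zeros(3), lambda m, rng: np.sqrt(rng.chisquare(3, m)), size=10_000)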

Also, the following corollary is immediate.

Corollary 4.2

Let X be a random vector in \(\mathbb {R}^n\) having a spherically symmetric distribution about \(\theta \in \mathbb {R}^n\) . Let h be a real valued function on \(\mathbb {R}^n\) such that the expectation E θ[h(X)] exists. Then

$$\displaystyle \begin{aligned} E_\theta[h(X)] = E[{E_{R,\theta}[h(X)]}] \, , \end{aligned}$$

where E R,θ is the conditional expectation of X given ∥X − θ∥ = R (i.e. the expectation with respect to the uniform distribution \({\mathcal U}_{R, \theta }\) on the sphere S R,θ of radius R and centered at θ) and E is the expectation with respect to the distribution of the radius ∥X − θ∥.

A more general class of distributions where \((X - \theta )/\| X - \theta \| \sim \mathcal U\) but not necessarily independently of ∥X − θ∥ is the class of isotropic distributions (see Philoche 1977). The class of spherically symmetric distributions with a density with respect to the Lebesgue measure is of particular interest. The form of this density and its connection with the radial distribution are the subject of the following theorem.

Theorem 4.2

Let \(X \in \mathbb {R}^n\) have a spherically symmetric distribution about \(\theta \in \mathbb {R}^n\) . Then the following two statements are equivalent.

  (1) X has a density f with respect to the Lebesgue measure on \(\mathbb {R}^n\).

  (2) ∥X − θ∥ has a density h with respect to the Lebesgue measure on \(\mathbb {R}_+\).

Further, if (1) or (2) holds, there exists a function g from \(\mathbb {R}_+\) into \(\mathbb {R}_+\) such that

$$\displaystyle \begin{aligned}f (x)=g (\| x - \theta \|{}^2) \, \, \, a.e.\end{aligned}$$

and

$$\displaystyle \begin{aligned}h(r)= \frac{2 \pi^{n/2}} {\varGamma(n/2)}r^{n-1}g(r^2) \, \, a.e.\end{aligned}$$

The function g is called the generating function and h the radial density.

Proof

The fact that (1) implies (2) follows directly from the representation of X in polar coordinates. We can also argue that (2) implies (1) in a similar fashion using the independence of ∥X − θ∥, angles, and the fact that the angles have a density. The following argument shows this directly and, furthermore, gives the relationship between f, g, and h.

It is clear that it suffices to assume that θ = 0. Suppose then that R = ∥X∥ has a density h. According to Theorem 4.1, for any Borel set A of \(\mathbb {R}^n\), we have

$$\displaystyle \begin{aligned}P[X \in A]= \int_{\mathbb{R}_+} {\mathcal U}_{r} (A)\, h(r)\, dr \, .\end{aligned}$$

This implies that the random vector X has density

$$\displaystyle \begin{aligned}f(x)= \frac{h (\|x \|)}{\sigma_1(S_1) \|x \|{}^{n-1}}= g(\|x\|{}^2)\end{aligned}$$

with h(r) = σ 1(S 1) r n−1 g(r 2), which is the announced formula for h(r) since σ 1(S 1) = 2π n∕2∕Γ(n∕2) by Corollary 1.1. □
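
As a quick numerical check of this relation (a minimal sketch assuming numpy and scipy; the choice of n and of the normal generating function is only an example), note that for \(g(t)= (2\pi )^{-n/2} e^{-t/2}\), i.e. the \({\mathcal N}_n(0,I_n)\) case, the radial density produced by Theorem 4.2 is the χ density with n degrees of freedom.

import numpy as np
from scipy.special import gamma
from scipy.stats import chi

n = 5
g = lambda t: (2 * np.pi) ** (-n / 2) * np.exp(-t / 2)        # normal generating function
h = lambda r: 2 * np.pi ** (n / 2) / gamma(n / 2) * r ** (n - 1) * g(r ** 2)

r = np.linspace(0.01, 6.0, 200)
assert np.allclose(h(r), chi.pdf(r, df=n))                    # radial density of N(0, I_n) is chi_n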

We now turn our attention to the mean and the covariance matrix of a spherically symmetric distribution (when they exist).

Theorem 4.3

Let \(X \in \mathbb {R}^n\) be a random vector with a spherically symmetric distribution about \(\theta \in \mathbb {R}^n\) . Then, the mean of X exists if and only if the mean of R = ∥X − θ∥ exists, in which case E[X] = θ. The covariance matrix of X exists if and only if E[R 2] is finite, in which case

$$\displaystyle \begin{aligned} {\mathit{\mbox{cov}}} (X)= \frac{E\lbrack R^2 \rbrack}{n}I_n .\end{aligned}$$

Proof

Note that X = Z + θ where Z ∼ SS(0) and it suffices to consider the case θ = 0. By the stochastic representation X = R U in Corollary 4.1 with R = ∥X∥ independent of U and \(U \sim {\mathcal U}\), the expectation E[X] exists if and only if the expectations E[R] and E[U] exist. However, since U is bounded, E[U] exists and is equal to zero since E[U] = E[−U] by orthogonal invariance.

Similarly, E[∥X∥2] = E[R 2] E[∥U∥2] = E[R 2] and consequently the covariance matrix of X exists if and only if E[R 2] < ∞. Now

$$\displaystyle \begin{aligned}{\mbox{cov}} (R U)=E \lbrack R^2 \rbrack \, E \lbrack UU^{\scriptscriptstyle{\mathrm{T}}} \rbrack = \frac{E\lbrack R^2 \rbrack}{n}I_n.\end{aligned}$$

Indeed \(E \lbrack U^2_i \rbrack = E \lbrack U^2_j \rbrack = 1/n\) since U i and U j have the same distribution by orthogonal invariance and since \(\sum ^n_{i=1} U^2_i = 1\). Furthermore, E[U i U j] = 0, for i ≠ j, since U i U j has the same distribution as − U i U j by orthogonal invariance. □
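
A short Monte Carlo sketch (assuming numpy; the uniform radial law below is an arbitrary illustrative choice) makes the conclusion of Theorem 4.3 concrete: the sample mean is close to θ = 0 and the sample covariance matrix is close to \((E[R^2]/n)\, I_n\).

import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 200_000
R = rng.uniform(0.0, 2.0, m)                         # radial distribution: uniform on [0, 2]
Z = rng.standard_normal((m, n))
U = Z / np.linalg.norm(Z, axis=1, keepdims=True)     # uniform direction on the unit sphere
X = R[:, None] * U                                   # spherically symmetric about 0

print(np.mean(X, axis=0))                            # approximately 0
print(np.cov(X, rowvar=False))                       # approximately (E[R^2]/n) I_n = (1/3) I_4
print(np.mean(R ** 2) / n)                           # Monte Carlo estimate of E[R^2]/n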

An interesting and useful subclass of spherically symmetric distributions consists of the spherically symmetric unimodal distributions. We only consider absolutely continuous distributions.

Definition 4.2

A random vector \(X \in \mathbb {R}^n\) with density f is unimodal if the set \(\lbrace x \in \mathbb {R}^n \mid f(x) \geq a \rbrace \) is convex for any a ≥ 0.

Lemma 4.1

Let \(X \in \mathbb {R}^n\) be a spherically symmetric random vector about θ with generating function g. Then the distribution of X is unimodal if and only if g is nonincreasing.

Proof

Suppose first that the generating function g is nonincreasing. Take the left continuous version of g. For any a ≥ 0, defining g −1(a) = sup{y ≥ 0∣g(y) ≥ a} we have

$$\displaystyle \begin{aligned}\lbrace x \in \mathbb{R}^n \mid g(\| x \|{}^2) \geq a \rbrace = \lbrace x \in \mathbb{R}^n \mid \| x \|{}^2 \leq g^{-1} (a) \rbrace\end{aligned}$$

which is a ball of radius \(\sqrt {g^{-1}(a)}\) and hence convex. Conversely suppose that the set \(\lbrace x \in \mathbb {R}^n \mid g (\| x \|{ }^2) \geq a \rbrace \) is convex for any a ≥ 0 and let ∥x∥≤∥y∥. Then, for x T = (∥x∥∕∥y∥) y, we have ∥x T∥ = ∥x∥ and x T ∈ [−y, y] and hence, by the unimodality assumption (applied with a = g(∥y∥2)), g(∥x∥2) = g(∥x T∥2) ≥ g(∥y∥2). □

Theorem 4.1 showed that a spherically symmetric distribution is a mixture of uniform distributions on spheres. It is worth noting that, when the distribution is also unimodal, it is a mixture of uniform distributions on balls.

Theorem 4.4

Let \(X \in \mathbb {R}^n\) be a spherically symmetric random vector about \(\theta \in \mathbb {R}^n\) with generating function g. Then the distribution of X is unimodal if and only if there exists a distribution ν in \(\mathbb {R}_+\) with no point mass at 0 such that

$$\displaystyle \begin{aligned} P \lbrack X \in A \rbrack = \int_{R_+} {\mathcal V}_{r, \theta}(A)\,\, d \nu(r) \end{aligned} $$
(4.1)

for any Borel set A of \(\mathbb {R}^n\) , where \({\mathcal V}_{r, \theta }\) is the uniform distribution on the ball \(B_{r, \theta }= \lbrace x \in \mathbb {R}^n \mid \| x - \theta \| \leq r\rbrace \).

Proof

It is clear that it suffices to consider the case where θ = 0. Suppose first that formula (4.1) is satisfied. Then expressing

$$\displaystyle \begin{aligned}{\mathcal V}_{r}(A) = \frac{1}{\lambda (B_r)} \int_{A} \mathbf{1}_{[\|x\| \leq r]}\, dx\end{aligned}$$

gives

$$\displaystyle \begin{aligned}P[X \in A] = \int_{\mathbb{R}_+} \frac{1}{\lambda (B_r)} \int_{A} \mathbf{1}_{[\|x\| \leq r]}\, dx\, d \nu(r) = \int_{A} \int^{\infty}_{\|x\|} \frac{1}{\lambda (B_r)}\,d \nu(r)\, dx\end{aligned}$$

after applying Lemma 1.4 and Fubini’s theorem. Then

$$\displaystyle \begin{aligned}P[X \in A] = \int_{A} g(\| x \|{}^2)\, dx\end{aligned}$$

again by Lemma 1.4 with the nonincreasing function

$$\displaystyle \begin{aligned} g(u^2) = \int^{\infty}_{u} \frac{1}{\lambda (B_r)}\,\,d \nu\,\,(r). \end{aligned} $$
(4.2)

Hence according to Lemma 4.1, the distribution of X is unimodal.

Conversely, suppose that the distribution of X is unimodal. According to the above, this distribution will be a mixture of uniform distributions on balls if there exists a distribution ν on \(\mathbb {R}_+\) with no point mass at 0 such that (4.2) holds. If such a distribution exists, (4.2) implies that ν can be expressed through a Stieltjes integral as

$$\displaystyle \begin{aligned}\nu (u)= \int^{u}_{0} \lambda (B_r)(-dg(r^2)).\end{aligned}$$

It suffices therefore to show that ν is a distribution function on \(\mathbb {R}_+\) with no point mass at 0. Note that, as g is nonincreasing, ν is the Stieltjes integral of a positive function with respect to a nondecreasing function and hence ν is nondecreasing. Since λ(B r) = λ(B 1) r n = σ 1(S 1) r n∕n, an integration by parts gives

$$\displaystyle \begin{aligned} \nu (u)= \sigma_1 (S_1) \int^{u}_{0}r^{n-1}g(r^2)\, dr - \lambda (B_1)\, u^n \,g(u^2). \end{aligned} $$
(4.3)

Note that the first term of the right hand side of (4.3) is the distribution function of the radial distribution (see Theorem 4.2) and approaches 0 (respectively 1) when u approaches 0 (respectively ∞). Therefore, to complete the proof it suffices to show that

$$\displaystyle \begin{aligned}\lim_{u\rightarrow 0} u^n g (u^2)=\lim_{u\rightarrow \infty} u^n g (u^2)= 0\ .\end{aligned}$$

Since

$$\displaystyle \begin{aligned}\int^{\infty}_{0}r^{n-1}g(r^2)\, dr < \infty,\end{aligned}$$

we have

$$\displaystyle \begin{aligned}\lim_{r \rightarrow \infty} \int^{r}_{r/2}u^{n-1}g(u^2)\, du = 0.\end{aligned}$$

By the monotonicity of g, we have

$$\displaystyle \begin{aligned}\int^{r}_{r/2}u^{n-1}g(u^2)\, \,du \geq\, g(r^2) \int^{r}_{r/2}u^{n-1}\, du = g(r^2)\,r^n\, \frac{1}{n}\left(1- \frac{1}{2^n}\right).\end{aligned}$$

Hence, \(\lim _{r \rightarrow \infty } r^n g (r^2)= 0\). The limit as r approaches 0 can be treated similarly and the result follows. □

It is possible to allow the possibility of a point mass at 0 for a spherically symmetric unimodal distribution, but we choose to restrict the class to absolutely continuous distributions. For a more general version of unimodality see Section 2.1 of Liese and Miescke (2008).

4.3 Elliptically Symmetric Distributions

By Definition 1.2, a random vector \(X \in \mathbb {R}^n\) is orthogonally invariant if, for any orthogonal transformation H, HX has the same distribution as X. The notion of orthogonal transformation is relative to the classical scalar product \(\langle x,y\rangle = \sum _{i=1 }^{n} x_i y_i \). It is natural to investigate orthogonal invariance with respect to orthogonal transformations relative to a general scalar product 〈x, yΓ = x T Γy =∑1≤i,jn x i Γ ij y j where Γ is a symmetric positive definite n × n matrix. We define a transformation H to be Γ-orthogonal if it preserves the scalar product in the sense that, for any \(x \in \mathbb {R}^n\) and \(y \in \mathbb {R}^n\), 〈Hx, HyΓ = 〈x, yΓ or, equivalently, if it preserves the associated norm \(\|x\|{ }_\varGamma = \sqrt {\langle x, x\rangle _\varGamma }\), that is, if ∥HxΓ = ∥xΓ . Note that H is necessarily invertible since

$$\displaystyle \begin{aligned} {\mathrm{ker}} \, H = \{x \in \mathbb{R}^n \mid H x = 0\} = \{x \in \mathbb{R}^n \mid \|H x\|{}_\varGamma = 0\} = \{x \in \mathbb{R}^n \mid \|x\|{}_\varGamma = 0\} = \{0\} \, .\end{aligned} $$

Then it can be seen that H is Γ-orthogonal if and only if 〈Hx, yΓ = 〈x, H −1 yΓ, for any \(x \in \mathbb {R}^n\) and \(y \in \mathbb {R}^n\) or, equivalently, if H T ΓH = Γ.

In this context, the Γ-sphere of radius r ≥ 0 is defined as

$$\displaystyle \begin{aligned}S^{\varGamma}_{r}= \lbrace x \in \mathbb{R}^n \mid x^{\scriptscriptstyle{\mathrm{T}}} \varGamma x= r^2 \rbrace \, .\end{aligned}$$

Definition 4.3

A random vector \(X \in \mathbb {R}^n\) (equivalently the distribution of X) is Γ-orthogonally invariant if, for any Γ-orthogonal transformation H, the distribution of Y = HX is the same as that of X.

We can define a uniform measure on the ellipse \(S^{\varGamma }_{r}\) in a manner analogous to (1.3) and the resulting measure is indeed Γ-orthogonally invariant. It is not however the superficial measure mentioned at the end of Sect. 1.3, but is, in fact, a constant multiple of this measure where the constant of proportionality depends on Γ and reflects the shape of the ellipse. Whatever the constant of proportionality is, it allows the construction of a unique uniform distribution on \(S^{\varGamma }_{r}\) as in (1.4). The uniqueness follows from the fact that the Γ-orthogonal transformations form a compact group. We can then adapt the material from Sects. 1.3 and 4.2 to the case of a general positive definite matrix Γ. However, we present an alternative development.

The following discussion indicates a direct connection between the usual orthogonal invariance and Γ-orthogonal invariance. Suppose, for the moment, that \(X \in \mathbb {R}^n\) has a spherically symmetric density given by g(∥x∥2). Let Σ be a positive definite matrix and A be a nonsingular matrix such that AA T = Σ. Standard change of variables gives the density of Y = AX as |Σ|−1∕2 g(y T Σ −1 y). Let H be any Σ −1-orthogonal transformation and let Z = HY . The density of Z is |Σ|−1∕2 g(z T Σ −1 z) since H −1 is also Σ −1-orthogonal and hence, (H −1)T Σ −1 H −1 = Σ −1. This suggests that, in general, \(Y= \varSigma ^{\frac {1}{2}}X\) is Σ −1-orthogonally invariant if and only if X is orthogonally invariant. The following result establishes this general fact.

Theorem 4.5

Let Σ be a positive definite n × n matrix. A random vector \(Y \in \mathbb {R}^n\) is Σ −1 -orthogonally invariant if and only if Y = Σ 1∕2 X with X orthogonally invariant.

Proof

First note that, for any Σ −1-orthogonal matrix H, Σ −1∕2 H Σ 1∕2 is an I n -orthogonal matrix since

$$\displaystyle \begin{aligned} \begin{array}{rcl} (\varSigma^{- {1/2}}H \varSigma^{{1/2}})^{\scriptscriptstyle{\mathrm{T}}} (\varSigma^{- {1/2}}H \varSigma^{{1/2}})&\displaystyle =&\displaystyle \varSigma^{{1/2}} H^{\scriptscriptstyle{\mathrm{T}}} \varSigma^{-1}H \varSigma^{{1/2}}\\ &\displaystyle =&\displaystyle \varSigma^{{1/2}} \varSigma^{-1} \varSigma^{{1/2}}\\ &\displaystyle =&\displaystyle I_n . \end{array} \end{aligned} $$

Then, if X is orthogonally invariant, for any Borel set C, of \(\mathbb {R}^n\) we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} P [H \varSigma^{{1/2}}X \in C ] &\displaystyle =&\displaystyle P[\varSigma^{- {1/2}}H \varSigma^{{1/2}} X \in \varSigma^{- {1/2}}C]\\ &\displaystyle =&\displaystyle P[X \in \varSigma^{- {1/2}}C]\\ &\displaystyle =&\displaystyle P[\varSigma ^{{1/2}}X \in C]. \end{array} \end{aligned} $$

Hence Y = Σ 1∕2 X is Σ −1-orthogonally invariant.

Similarly, for any orthogonal matrix G, Σ 1∕2 G Σ −1∕2 is a Σ −1-orthogonal matrix. So, if Y = Σ 1∕2 X is Σ −1-orthogonally invariant, then X is orthogonally invariant. □

Note that, if X is orthogonally invariant and its covariance matrix exists, it is of the form σ 2 I n by Theorem 4.3. Therefore, if Y = Σ 1∕2 X, the covariance matrix of Y  is σ 2 Σ, while, by Theorem 4.5, Y  is Σ −1-orthogonally invariant. In statistical models, it is often more natural to parametrize through a covariance matrix Σ than through its inverse (graphical models are the exception) and this motivates the following definition of elliptically symmetric distributions.

Definition 4.4

Let Σ be a positive definite n × n matrix. A random vector X (equivalently the distribution of X) is elliptically symmetric about \(\theta \in \mathbb {R}^n\) if X − θ is Σ −1-orthogonally invariant. We denote this by X ∼ ES(θ, Σ).

Note that, if X ∼ SS(θ), then X ∼ ES(θ, I n). If Y ∼ ES(θ, Σ), then Σ −1∕2 Y ∼ SS(Σ −1∕2 θ).
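
In practice, this correspondence gives a simple way to simulate from ES(θ, Σ): generate a spherically symmetric vector X and apply any matrix A with AA T = Σ (a Cholesky factor, for instance, which differs from Σ 1∕2 only by an orthogonal factor and therefore yields the same distribution). A minimal Python sketch (assuming numpy; the particular θ, Σ and radial law are illustrative) follows.

import numpy as np

rng = np.random.default_rng(1)
theta = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.linalg.cholesky(Sigma)                     # A A^T = Sigma

m, n = 100_000, 3
R = 2.0 * np.sqrt(rng.beta(n / 2, 1.0, m))        # some radial law on [0, 2]
Z = rng.standard_normal((m, n))
U = Z / np.linalg.norm(Z, axis=1, keepdims=True)
X = R[:, None] * U                                # X ~ SS(0)
Y = theta + X @ A.T                               # Y ~ ES(theta, Sigma)

print(np.cov(Y, rowvar=False) / (np.mean(R ** 2) / n))   # approximately Sigma (cf. Theorem 4.8)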

In the following, we briefly present some results for elliptically symmetric distributions that follow from Theorem 4.5 and are the analogues of those in Sects. 1.3 and 4.2. The proofs are left to the reader.

For the rest of this section, let Σ be a fixed positive definite n × n matrix and denote by \(S^{\varSigma ^{-1}}_R = \{x \in \mathbb {R}^n \mid x^{\scriptscriptstyle {\mathrm {T}}} \varSigma ^{-1}x = R^2 \}\) the Σ −1-ellipse of radius R and by \({\mathcal U}^{\varSigma }_{R}\) the uniform distribution on \(S^{\varSigma ^{-1}}_{R}\).

Lemma 4.2

  (1) The uniform distribution \({\mathcal U}^{\varSigma }_{R}\) on \(S^{\varSigma ^{-1}}_{R}\) is the image under the transformation \(Y = \varSigma ^{\frac {1}{2}} X\) of the uniform distribution \({\mathcal U}_{R}\) on the sphere S R , that is,

    $$\displaystyle \begin{aligned}{\mathcal U}^{\varSigma}_{R} (\varOmega)={\mathcal U}_{R}(\varSigma^{- \frac{1}{2}} \varOmega)\end{aligned}$$

    for any Borel set Ω of \(S^{\varSigma ^{-1}}_R\).

  (2) If X is distributed as \({\mathcal U}^{\varSigma }_{R}\) , then

    (a) Σ −1∕2 X∕(X T Σ −1 X)1∕2 is distributed as \(\mathcal U\) and

    (b) X∕(X T Σ −1 X)1∕2 is distributed as \({\mathcal U}^{\varSigma }_{1}\).

Theorem 4.6

A random vector \(X \in \mathbb {R}^n\) is distributed as ES(θ, Σ) if and only if there exists a distribution ρ on \(\mathbb {R}_+\) such that

$$\displaystyle \begin{aligned}P[X \in A]= \int_{\mathbb{R}_+} {\mathcal U}^{\varSigma}_{r,\theta}\,(A)\,d \rho\,(r)\end{aligned}$$

for any Borel set A of \(\mathbb {R}^n\) , where \({\mathcal U}^{\varSigma }_{r,\theta }\) is the uniform distribution \({\mathcal U}^{\varSigma }_{r}\) translated by θ. Equivalently X has the stochastic representation X = θ + R U where \(R = \|X- \theta \|{ }_{\varSigma ^{-1}} = ((X- \theta )^{\scriptscriptstyle {\mathrm {T}}}\varSigma ^{-1} (X- \theta ))^{{1/2}}\) and U are independent, R ≥ 0 and \(U \sim {\mathcal U}^{\varSigma }_{1}\) . For such X, the radius R has distribution ρ (called the radial distribution).

Theorem 4.7

Let \(X \in \mathbb {R}^n\) be distributed as ES(θ, Σ). Then the following two statements are equivalent:

  (1) X has a density f with respect to the Lebesgue measure on \(\mathbb {R}^n\) ; and

  (2) \(\| X - \theta \|{ }_{\varSigma ^{-1}}\) has a density h with respect to the Lebesgue measure on \(\mathbb {R}_+\).

Further, if (1) or (2) holds, there exists a function g from \(\mathbb {R}_+\) into \(\mathbb {R}_+\) such that

$$\displaystyle \begin{aligned}f(x)=g(\| x- \theta \|{}^{2}_{\varSigma^{-1}})\end{aligned}$$

and

$$\displaystyle \begin{aligned}h(r)=\frac{2 \pi^{{n/2}}}{\varGamma({n/2})}| \varSigma |{}^{- {1/2}}r^{n-1}g(r^2).\end{aligned}$$

Theorem 4.8

Let \(X \in \mathbb {R}^n\) be distributed as ES(θ, Σ). Then the mean of X exists if and only if the mean of \(R = \| X - \theta \|{ }_{\varSigma ^{-1}}\) exists, in which case E[X] = θ. The covariance matrix exists if and only if E[R 2] is finite, in which case cov(X) = (E[R 2]∕n) Σ.

Theorem 4.9

Let \(X \in \mathbb {R}^n\) be distributed as ES(θ, Σ) with generating function g. Then the distribution of X is unimodal if and only if g is nonincreasing. Equivalently, there exists a distribution ν on \(\mathbb {R}_+\) with no point mass at 0 such that

$$\displaystyle \begin{aligned}P[X \in A]= \int_{\mathbb{R}_+} {\mathcal V}^{\varSigma}_{r, \theta}(A)\,d \nu (r)\end{aligned}$$

for any Borel set A of \(\mathbb {R}^n\) , where \({\mathcal V}^{\varSigma }_{r, \theta }\) is the uniform distribution on the ball (solid ellipse)

$$\displaystyle \begin{aligned}B^{\varSigma}_{r, \theta}= \{x \in \mathbb{R}^n \mid \| x - \theta \|{}_{\varSigma^{-1}} \leq r \}.\end{aligned}$$

4.4 Marginal and Conditional Distributions for Spherically Symmetric Distributions

In this section, we study marginal and conditional distributions of spherically symmetric distributions. We first consider the marginal distributions for a uniform distribution on S R.

Theorem 4.10

Let \(X= (X^{\scriptscriptstyle {\mathrm {T}}} _1, X_2 ^{\scriptscriptstyle {\mathrm {T}}} )^{\scriptscriptstyle {\mathrm {T}}} \sim {\mathcal U}_R\) in \(\mathbb {R}^n\) where dim X 1 = p and dim X 2 = n  p. Then, for 1 ≤ p < n, X 1 has an absolutely continuous spherically symmetric distribution with generating function g R given by

$$\displaystyle \begin{aligned} g_R (t)= \frac{\varGamma(n/2)}{\pi^{p/2}\, \varGamma((n-p)/2)}\, R^{2-n}\, (R^2 - t)^{(n-p)/2 - 1}\, \mathbf{1}_{[0, R^2)}(t) \, . \end{aligned} $$
(4.4)

Proof

The proof is based on the fact that \(R \, Y/\| Y \| \sim {\mathcal U}_R\), for any random variable Y  with a spherically symmetric distribution (see Lemma 1.2), in particular \({\mathcal N}_n (0,I_n)\), and on the fact that X 1 has an orthogonally invariant distribution in \(\mathbb {R}^p\). To see this invariance, note that, for any p × p orthogonal matrix H 1 and any (n − p) × (n − p) orthogonal matrix H 2, the matrix

$$\displaystyle \begin{aligned} \begin{array}{rcl} H &\displaystyle =&\displaystyle \begin{pmatrix} H_1 &\displaystyle 0 \\ 0 &\displaystyle H_2 \end{pmatrix} , \end{array} \end{aligned} $$

is a block diagonal n × n orthogonal matrix. Hence

$$\displaystyle \begin{aligned} H \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} H_1 X_1 \\ H_2 X_2 \end{pmatrix}\end{aligned} $$
(4.5)

is distributed as \((X^{\scriptscriptstyle {\mathrm {T}}} _1, X^{\scriptscriptstyle {\mathrm {T}}} _2 )^{\scriptscriptstyle {\mathrm {T}}} \) and it follows that H 1 X 1 ∼ X 1 and so X 1 is orthogonally invariant.

Therefore, if \(Y= (Y^{\scriptscriptstyle {\mathrm {T}}} _1, Y^{\scriptscriptstyle {\mathrm {T}}} _2 ) ^{\scriptscriptstyle {\mathrm {T}}} \sim {\mathcal N}_{n}(0, I_n)\), then ∥Y 1∥2 is independent of ∥Y 2∥2 and, according to standard results, Z = ∥Y 1∥2∕∥Y ∥2 has a beta distribution, that is Beta(p∕2, (n − p)∕2). It follows that Z′ = ∥X 1∥2∕∥X∥2 = ∥X 1∥2∕R 2 has the same distribution since both X and R Y∕∥Y ∥ have distribution \({\mathcal U}_R\).

Thus ∥X 1∥2 = R 2 Z has a Beta(p∕2, (n − p)∕2) density scaled by R 2. By a change of variable, the density of ∥X 1∥ is equal to

$$\displaystyle \begin{aligned} r \mapsto \frac{2\, \varGamma(n/2)}{\varGamma(p/2)\, \varGamma((n-p)/2)}\, R^{2-n}\, r^{p-1}\, (R^2 - r^2)^{(n-p)/2 - 1}\, \mathbf{1}_{[0, R)}(r) \, . \end{aligned}$$

Hence, by Theorem 4.2, X 1 has the density given by (4.4). □
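
The distributional fact at the heart of this proof, namely that ∥X 1∥2∕R 2 has a Beta(p∕2, (n − p)∕2) distribution when X is uniform on the sphere of radius R, is easy to check by simulation; the sketch below (assuming numpy and scipy; the values of n, p and R are arbitrary) compares the simulated ratios with the Beta cumulative distribution function through a Kolmogorov-Smirnov test.

import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(2)
n, p, R, m = 7, 3, 2.0, 50_000
Z = rng.standard_normal((m, n))
X = R * Z / np.linalg.norm(Z, axis=1, keepdims=True)   # X ~ uniform on the sphere S_R in R^n
W = np.sum(X[:, :p] ** 2, axis=1) / R ** 2             # ||X_1||^2 / R^2

print(kstest(W, beta(p / 2, (n - p) / 2).cdf))         # consistent with Beta(p/2, (n-p)/2)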

Corollary 4.3

Let \(X= (X^{\scriptscriptstyle {\mathrm {T}}} _1, X_2 ^{\scriptscriptstyle {\mathrm {T}}} )^{\scriptscriptstyle {\mathrm {T}}} \sim SS (\theta )\) in \(\mathbb {R}^n\) where dim X 1 = p and dim X 2 = n  p and where \(\theta = (\theta ^{\scriptscriptstyle {\mathrm {T}}}_1 , \theta _2^{\scriptscriptstyle {\mathrm {T}}} )^{\scriptscriptstyle {\mathrm {T}}} \).

Then, for 1 ≤ p < n, the distribution of X 1 is an absolutely continuous spherically symmetric distribution SS(θ 1) on \(\mathbb {R}^p\) with generating function given by \(\int g_R (\| X_1 - \theta _1 \|{ }^2)\,\, d \nu (R)\) where ν is the radial distribution of X and g R is given by (4.4).

Unimodality properties of the densities of projections are given in the following result.

Corollary 4.4

For the setup of Corollary 4.3 , the density of X 1 is unimodal whenever n − p ≥ 2. Furthermore, if p = n − 2 and \(X \sim {\mathcal U}_{R,\theta }\) , then X 1 has the uniform distribution on \(B_{R, \theta _1}\) in \(\mathbb {R}^{n-2}\).

In this book, we will have more need for the marginal distributions than the conditional distributions of spherically symmetric distributions. For results on conditional distributions, we refer the reader to Fang and Zhang (1990) and to Fang et al. (1990). We will however have use for the following result.

Theorem 4.11

Let \(X= (X^{\scriptscriptstyle {\mathrm {T}}} _1, X_2 ^{\scriptscriptstyle {\mathrm {T}}} )^{\scriptscriptstyle {\mathrm {T}}} \sim {\mathcal U}_{R, \theta }\) in \(\mathbb {R}^n\) where dim X 1 = p and dim X 2 = n  p and where \(\theta = (\theta ^{\scriptscriptstyle {\mathrm {T}}}_1, \theta ^{\scriptscriptstyle {\mathrm {T}}}_2)^{\scriptscriptstyle {\mathrm {T}}} \) . Then the conditional distribution of X 1 given X 2 is the uniform distribution on the sphere in \(\mathbb {R}^p\) of radius (R 2 −∥X 2θ 22)1∕2 centered at θ 1.

Proof

First, it is clear that the support of the conditional distribution of X 1 given X 2 is the sphere in \(\mathbb {R}^p\) of radius (R 2 −∥X 2θ 22)1∕2 centered at θ 1. It suffices to show that the translated distribution centered at 0 is orthogonally invariant. To this end, note that, for any orthogonal transformation H on \(\mathbb {R}^p\), the block diagonal transformation with blocks H and I np, denoted by \(\tilde H\), is orthogonal in \(\mathbb {R}^n\). Then

$$\displaystyle \begin{aligned} \tilde H \big( (X_1 - \theta_1)^{\scriptscriptstyle{\mathrm{T}}} , (X_2 - \theta_2)^{\scriptscriptstyle{\mathrm{T}}} \big)^{\scriptscriptstyle{\mathrm{T}}} \sim \big( (X_1 - \theta_1)^{\scriptscriptstyle{\mathrm{T}}} , (X_2 - \theta_2)^{\scriptscriptstyle{\mathrm{T}}} \big)^{\scriptscriptstyle{\mathrm{T}}} \sim {\mathcal U}_{R} \, , \end{aligned}$$

that is,

$$\displaystyle \begin{aligned} \big( (H (X_1 - \theta_1) )^{\scriptscriptstyle{\mathrm{T}}} , (X_2 - \theta_2)^{\scriptscriptstyle{\mathrm{T}}} \big)^{\scriptscriptstyle{\mathrm{T}}} \sim \big( (X_1 - \theta_1)^{\scriptscriptstyle{\mathrm{T}}} , (X_2 - \theta_2)^{\scriptscriptstyle{\mathrm{T}}} \big)^{\scriptscriptstyle{\mathrm{T}}} \sim {\mathcal U}_{R} \, . \end{aligned}$$

Hence

$$\displaystyle \begin{aligned} H (X_1 - \theta_1) | (X_2 - \theta_2) \sim (X_1 - \theta_1) | (X_2 - \theta_2) \, , \end{aligned}$$

and therefore, the conditional distribution of X 1 − θ 1 given X 2 is orthogonally invariant, since θ 2 is fixed. The theorem follows. □

When properly interpreted, Corollaries 4.3 and 4.4 and Theorem 4.11 continue to hold for a general orthogonal projection π from \(\mathbb {R}^n\) onto any subspace V  of dimension p. See also Sect. 2.4.4 where the distribution is assumed to be normal.

4.5 The General Linear Model

This section is devoted to the general linear model, its canonical form and the issues of estimation, sufficiency and completeness.

4.5.1 The Canonical Form of the General Linear Model

Much of this book is devoted to some form of the following general problem. Let (X T, U T)T be a partitioned random vector in \(\mathbb {R}^n\) with a spherically symmetric distribution around a vector partitioned as (θ T, 0T)T where dim X = dim θ = p and dim U = dim 0 = k with p + k = n. Such a distribution arises from a fixed orthogonally invariant random vector \( (X_0^{\scriptscriptstyle {\mathrm {T}}} , U^{\scriptscriptstyle {\mathrm {T}}} _0)^{\scriptscriptstyle {\mathrm {T}}} \) and a fixed scale parameter σ through the transformation

$$\displaystyle \begin{aligned} (X^{\scriptscriptstyle{\mathrm{T}}} , U^{\scriptscriptstyle{\mathrm{T}}})^{\scriptscriptstyle{\mathrm{T}}} = \sigma \, (X_0^{\scriptscriptstyle{\mathrm{T}}} , U^{\scriptscriptstyle{\mathrm{T}}} _0)^{\scriptscriptstyle{\mathrm{T}}} + (\theta^{\scriptscriptstyle{\mathrm{T}}} , 0^{\scriptscriptstyle{\mathrm{T}}})^{\scriptscriptstyle{\mathrm{T}}} \, , \end{aligned} $$
(4.6)

so that the distribution of ((Xθ)T, U T)T is orthogonally invariant. We also refer to θ as a location parameter.

We will assume that the covariance matrix of (X T, U T)T exists, which is equivalent to the finiteness of the expectation E[R 2] where R = (∥X − θ∥2 + ∥U∥2)1∕2 is its radius (in this case, we have cov((X T, U T)T) = (E[R 2]∕n) I n). Then it will be convenient to assume that the radius R 0 = (∥X 0∥2 + ∥U 0∥2)1∕2 of \((X_0^{\scriptscriptstyle {\mathrm {T}}} , U^{\scriptscriptstyle {\mathrm {T}}} _0)^{\scriptscriptstyle {\mathrm {T}}}\) satisfies \(E[R_0^2] = n\) since we have

$$\displaystyle \begin{aligned} {\mathrm{cov}}(X^{\scriptscriptstyle{\mathrm{T}}} , U^{\scriptscriptstyle{\mathrm{T}}})^{\scriptscriptstyle{\mathrm{T}}} = \sigma^2 \, {\mathrm{cov}}(X_0^{\scriptscriptstyle{\mathrm{T}}} , U_0^{\scriptscriptstyle{\mathrm{T}}})^{\scriptscriptstyle{\mathrm{T}}} = \sigma^2 \, I_n \, . \end{aligned}$$

Note that when it is assumed that the distribution in (4.6) is absolutely continuous with respect to the Lebesgue measure on \(\mathbb {R}^n\), the corresponding density may be represented as

$$\displaystyle \begin{aligned} \frac{1}{\sigma^n}\,g \left(\frac{\| z - \theta \|{}^2 + \| u \|{}^2}{\sigma^2}\right) \end{aligned} $$
(4.7)

where g is the generating function.

This model also arises as the canonical form of the following seemingly more general model, the general linear model. For an n × p matrix V  (often referred to as the design matrix and assumed here to be full rank p), suppose that an n × 1 vector Y  is observed such that

$$\displaystyle \begin{aligned} Y = V \beta + \varepsilon \, , \end{aligned} $$
(4.8)

where β is a p × 1 vector of (unknown) regression coefficients and ε is an n × 1 vector with a spherically symmetric error distribution about 0. A common alternative representation of this model is Y = η + ε where ε is as above and η is in the column space of V .

Using partitioned matrices, let \(G = (G _1^{\scriptscriptstyle {\mathrm {T}}} \; G_2^{\scriptscriptstyle {\mathrm {T}}})^{\scriptscriptstyle {\mathrm {T}}}\) be an n × n orthogonal matrix partitioned such that the first p rows of G (i.e. the rows of G 1 considered as column vectors) span the column space of V . Now let

$$\displaystyle \begin{aligned} \begin{pmatrix} X \\ U \end{pmatrix} = G \, Y = \begin{pmatrix} G_1 \\ G_2 \end{pmatrix} V \, \beta + G \, \varepsilon = \begin{pmatrix} \theta \\ 0 \end{pmatrix} + G \, \varepsilon \end{aligned} $$
(4.9)

with θ = G 1 Vβ and G 2 Vβ = 0 since the rows of G 2 are orthogonal to the columns of V . It follows from the definition that (X T, U T)T has a spherically symmetric distribution about (θ T, 0T)T. In this sense, the model given in the first paragraph is the canonical form of the above general linear model.
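
A numerical sketch of this reduction (assuming numpy; the dimensions, the design matrix and the variable names are illustrative, and a complete QR decomposition is used here to produce G, as discussed further in Sect. 4.5.2) is the following.

import numpy as np

rng = np.random.default_rng(3)
n, p = 10, 3
V = rng.standard_normal((n, p))                # design matrix, assumed of full rank p
beta = np.array([1.0, -0.5, 2.0])

Q, _ = np.linalg.qr(V, mode='complete')        # Q is n x n orthogonal; Q[:, :p] spans col(V)
G = Q.T
G1, G2 = G[:p], G[p:]                          # rows of G1 span the column space of V

eps = rng.standard_normal(n)                   # spherically symmetric errors (normal here)
Y = V @ beta + eps
X, U = G1 @ Y, G2 @ Y                          # canonical form (4.9)
theta = G1 @ V @ beta

print(np.allclose(G2 @ V, 0.0))                # G2 V = 0, hence G2 V beta = 0
print(np.allclose(np.concatenate([X - theta, U]), G @ eps))   # ((X - theta)^T, U^T)^T = G eps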

This model has been considered by various authors such as Cellier et al. (1989), Cellier and Fourdrinier (1995), Maruyama (2003b), Maruyama and Strawderman (2005), and Fourdrinier and Strawderman (2010). Also, Kubokawa and Srivastava (2001) addressed the multivariate case where θ is a mean matrix (in which case X and U are matrices as well).

4.5.2 Least Squares, Unbiased and Shrinkage Estimation

Consider the model in (4.9). Since the columns of \(G_1^{\scriptscriptstyle {\mathrm {T}}}\) (the rows of G 1) and the columns of V  span the same space, there exists a nonsingular p × p matrix A such that

$$\displaystyle \begin{aligned} V = G_1^{\scriptscriptstyle{\mathrm{T}}} A \mbox{, which implies } A = G_1 V, \end{aligned} $$
(4.10)

since \(G_1 G_1^{\scriptscriptstyle {\mathrm {T}}} = I_p\). So

$$\displaystyle \begin{aligned} \theta = A \beta \mbox{, that is, } \beta = A^{-1} \theta \, . \end{aligned} $$
(4.11)

Noting that \(V^{\scriptscriptstyle {\mathrm {T}}}V = A^{\scriptscriptstyle {\mathrm {T}}} \, G_1 \, G_1^{\scriptscriptstyle {\mathrm {T}}} \, A = A^{\scriptscriptstyle {\mathrm {T}}}A\), it follows that the estimation of θ by \(\hat \theta (X,U)\) under the loss

$$\displaystyle \begin{aligned} L (\theta, \hat \theta)= (\hat \theta - \theta)^{\scriptscriptstyle{\mathrm{T}}} (\hat \theta - \theta) = \| \hat \theta - \theta \|{}^2 \end{aligned} $$
(4.12)

is equivalent to the estimation of β by

$$\displaystyle \begin{aligned} \hat \beta(Y) = A^{-1} \, \hat \theta (G_1 Y, G_2 Y) = (G_1 V)^{-1} \, \hat \theta (G_1 Y, G_2 Y) \end{aligned} $$
(4.13)

under the loss

$$\displaystyle \begin{aligned} L^* (\beta, \hat \beta)=(\hat \beta - \beta)^{\scriptscriptstyle{\mathrm{T}}} A^{\scriptscriptstyle{\mathrm{T}}}A (\hat \beta - \beta) = (\hat \beta - \beta)^{\scriptscriptstyle{\mathrm{T}}} V^{\scriptscriptstyle{\mathrm{T}}}V (\hat \beta - \beta) \end{aligned} $$
(4.14)

in the sense that the resulting risk functions are equal,

$$\displaystyle \begin{aligned} \begin{array}{rcl} R^*(\beta, \hat \beta) = E[L^*(\beta ,\hat \beta(Y))] = E[L (\theta, \hat \theta)] = R(\theta, \hat \theta) \, . \end{array} \end{aligned} $$

Actually, the corresponding loss functions are equal. To see this, note that

$$\displaystyle \begin{aligned} \begin{array}{rcl} L^*(\beta ,\hat \beta(Y)) &\displaystyle =&\displaystyle (\hat \beta(Y)- \beta)^{\scriptscriptstyle{\mathrm{T}}} A^{\scriptscriptstyle{\mathrm{T}}}A(\hat \beta(Y)- \beta) \\ &\displaystyle =&\displaystyle (A(\hat \beta(Y)- \beta))^{\scriptscriptstyle{\mathrm{T}}}(A(\hat \beta(Y)- \beta)) \\ &\displaystyle =&\displaystyle (\hat \theta(X,U)- \theta)^{\scriptscriptstyle{\mathrm{T}}}(\hat \theta(X,U)- \theta) \\ &\displaystyle =&\displaystyle L(\theta ,\hat \theta(X,U)) \, , \end{array} \end{aligned} $$

where (4.13) and (4.11) were used for the third equality.

Note that the above equivalence between the estimation of θ, the mean vector of X, and the estimation of the regression coefficients β also holds for the respective invariant losses

$$\displaystyle \begin{aligned} L (\theta, \hat \theta, \sigma^2)= \frac{1}{\sigma^2} \, (\hat \theta - \theta)^{\scriptscriptstyle{\mathrm{T}}} (\hat \theta - \theta) = \frac{1}{\sigma^2} \, \| \hat \theta - \theta \|{}^2 \end{aligned} $$
(4.15)

and

$$\displaystyle \begin{aligned} L^* (\beta, \hat \beta, \sigma^2) = \frac{1}{\sigma^2} \, (\hat \beta - \beta)^{\scriptscriptstyle{\mathrm{T}}} A^{\scriptscriptstyle{\mathrm{T}}}A (\hat \beta - \beta) = \frac{1}{\sigma^2} \, (\hat \beta - \beta)^{\scriptscriptstyle{\mathrm{T}}} V^{\scriptscriptstyle{\mathrm{T}}}V (\hat \beta - \beta) \, . \end{aligned} $$
(4.16)

Additionally, the correspondence (4.13) can be reversed as

$$\displaystyle \begin{aligned} \hat \theta (X, U) = A \, \hat \beta(G_1^{\scriptscriptstyle{\mathrm{T}}} \, X + G_2^{\scriptscriptstyle{\mathrm{T}}} \, U) = G_1 V \, \hat \beta(G_1^{\scriptscriptstyle{\mathrm{T}}} \, X + G_2^{\scriptscriptstyle{\mathrm{T}}} \, U) \end{aligned} $$
(4.17)

since, according to (4.9),

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} Y = G^{\scriptscriptstyle{\mathrm{T}}} \begin{pmatrix} X \\ U \end{pmatrix} = \begin{pmatrix} G_1 \\ G_2 \end{pmatrix} ^{{\scriptscriptstyle\mathrm{T}}} \begin{pmatrix} X \\ U \end{pmatrix} = (G_1^{\scriptscriptstyle{\mathrm{T}}} \, G_2^{\scriptscriptstyle{\mathrm{T}}}) \begin{pmatrix} X \\ U \end{pmatrix} = G_1^{\scriptscriptstyle{\mathrm{T}}} \, X + G_2^{\scriptscriptstyle{\mathrm{T}}} \, U \, . \end{array} \end{aligned} $$
(4.18)

There is also a correspondence between the estimation of θ and the estimation of η in the following alternative representation of the general linear model. Here

$$\displaystyle \begin{aligned}\eta = G^{\scriptscriptstyle{\mathrm{T}}} \begin{pmatrix} \theta \\ 0 \end{pmatrix} = \begin{pmatrix} G_1 \\ G_2 \end{pmatrix} ^{{\scriptscriptstyle{\mathrm{T}}}} \begin{pmatrix} \theta \\ 0 \end{pmatrix} = (G_1^{\scriptscriptstyle{\mathrm{T}}} \, G_2^{\scriptscriptstyle{\mathrm{T}}}) \begin{pmatrix} \theta \\ 0 \end{pmatrix} = G_1^{\scriptscriptstyle{\mathrm{T}}} \, \theta + G_2^{\scriptscriptstyle{\mathrm{T}}} \, 0 = G_1^{\scriptscriptstyle{\mathrm{T}}} \, \theta\end{aligned}$$

and

$$\displaystyle \begin{aligned} G_1 \eta = G_1 \, G_1^{\scriptscriptstyle{\mathrm{T}}} \, \theta = \theta \, . \end{aligned}$$

It follows that the estimation of \(\theta \in \mathbb {R}^p\) by \(\hat \theta (X,U)\) under the loss \(\| \hat \theta - \theta \|{ }^2\) (the loss (4.12)) is equivalent to the estimation of η in the column space of V  under the loss \(\| \hat \eta - \eta \|{ }^2\) by

$$\displaystyle \begin{aligned} \hat \eta(Y) = G_1^{\scriptscriptstyle{\mathrm{T}}} \, \hat \theta(G_1 Y, G_2 Y) \end{aligned} $$
(4.19)

in the sense that the risk functions are equal. The easy demonstration is left to the reader.

Consider the first correspondence expressed in (4.13) and (4.17) between estimators in Models (4.8) and (4.9). We will see that it can be made completely explicit for a wide class of estimators. First, note that the matrix G 1 can be easily obtained by the Gram-Schmidt orthonormalization process or by the QR decomposition of the design matrix V , where Q is an orthogonal matrix such that Q T V = R and R is an n × p upper triangular matrix (so that \(G_1 = Q_1^{\scriptscriptstyle {\mathrm {T}}}\) and \(G_2 = Q_2^{\scriptscriptstyle {\mathrm {T}}}\)). Second, a particular choice of A can be made that gives rise to a closed form of G 1.

To see this, let

$$\displaystyle \begin{aligned} A = (V^{\scriptscriptstyle{\mathrm{T}}} V)^{1/2} \end{aligned} $$
(4.20)

(a square root of V T V , which is invertible since V  has full rank) and set

$$\displaystyle \begin{aligned} G_1 = A \, (V^{\scriptscriptstyle{\mathrm{T}}} V)^{-1} V^{\scriptscriptstyle{\mathrm{T}}} = (V^{\scriptscriptstyle{\mathrm{T}}} V)^{-1/2} V^{\scriptscriptstyle{\mathrm{T}}} \, . \end{aligned} $$
(4.21)

Then we have

$$\displaystyle \begin{aligned} G_1 \, V = A, \quad \quad V = G_1^{\scriptscriptstyle{\mathrm{T}}} A , \end{aligned} $$
(4.22)

and

$$\displaystyle \begin{aligned} G_1 \, G_1^{\scriptscriptstyle{\mathrm{T}}} = (V^{\scriptscriptstyle{\mathrm{T}}} V)^{-1/2} V^{\scriptscriptstyle{\mathrm{T}}} V (V^{\scriptscriptstyle{\mathrm{T}}} V)^{-1/2} = I_p \, . \end{aligned} $$
(4.23)

Hence, as in (4.10), (4.22) expresses that the columns of \(G_1^{\scriptscriptstyle {\mathrm {T}}}\) (the rows of G 1) span the same space as the columns of V , noticing that (4.23) means that these vectors are orthonormal. Therefore, completing \(G_1^{\scriptscriptstyle {\mathrm {T}}}\) through the Gram-Schmidt orthonormalization process, we obtain an orthogonal matrix \(G = (G_1^{\scriptscriptstyle {\mathrm {T}}} G_2^{\scriptscriptstyle {\mathrm {T}}})^{\scriptscriptstyle {\mathrm {T}}}\), with G 1 in (4.21), such that

$$\displaystyle \begin{aligned} G \, V = \begin{pmatrix} (V^{\scriptscriptstyle{\mathrm{T}}} V)^{1/2} \\ 0 \end{pmatrix} \, . \end{aligned} $$
(4.24)

The relationship linking A and G 1 in (4.21) is an alternative to (4.10) and is true in general; that is,

$$\displaystyle \begin{aligned} G_1 = A \, (V^{\scriptscriptstyle{\mathrm{T}}} V)^{-1} \, V^{\scriptscriptstyle{\mathrm{T}}} \mbox{ or equivalently } A = (V^{\scriptscriptstyle{\mathrm{T}}} G_1^{\scriptscriptstyle{\mathrm{T}}})^{-1} V^{\scriptscriptstyle{\mathrm{T}}} V \, . \end{aligned}$$

Indeed, we have V T V = A T A so that (V T V )−1 (A T A) = I p. Hence, (V T V )−1 A T = A −1 , which implies \( V \, (V^{\scriptscriptstyle {\mathrm {T}}} V)^{-1} \, A^{\scriptscriptstyle {\mathrm {T}}} = V \, A^{-1} = G_1^{\scriptscriptstyle {\mathrm {T}}} A \, A^{-1} = G_1^{\scriptscriptstyle {\mathrm {T}}} \, , \) according to (4.10).

As a consequence, if \(\hat \beta _{ls}\) is the least squares estimator of β, we have

$$\displaystyle \begin{aligned} \hat \beta_{ls}(Y) = (V^{\scriptscriptstyle{\mathrm{T}}} V)^{-1} \, V^{\scriptscriptstyle{\mathrm{T}}} \, Y \end{aligned} $$
(4.25)

so that the corresponding estimator \(\hat \theta _0\) of θ is the projection \(\hat \theta _0(X,U) = X\) since

$$\displaystyle \begin{aligned} \hat \theta_0(X,U) = A \, \hat \beta_{ls}(Y) = A \, (V^{\scriptscriptstyle{\mathrm{T}}} V)^{-1} V^{\scriptscriptstyle{\mathrm{T}}} \, Y = G_1 \, Y = X \, . \end{aligned} $$
(4.26)

From this correspondence, the estimator \(\hat \theta _{0} (X,U)= X\) of θ is often viewed as the standard estimator. Note that, with the choice of A in (4.20), we have the closed form

$$\displaystyle \begin{aligned} \hat \beta_{ls}(Y) = (V^{\scriptscriptstyle{\mathrm{T}}} V)^{-1/2} X \, . \end{aligned} $$
(4.27)

Furthermore, the correspondence between \(\hat \theta (X,U)\) and \(\hat \beta (Y)\) can be specified when \(\hat \theta (X,U)\) depends on U only through ∥U∥2, in which case, with a slight abuse of notation, we write \(\hat \theta (X,U) = \hat \theta (X,\|U\|{ }^2)\). Indeed, first note that

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \|X\|{}^2 &\displaystyle =&\displaystyle (A \, \hat \beta_{ls}(Y))^{\scriptscriptstyle{\mathrm{T}}} (A \, \hat \beta_{ls}(Y)) \\ &\displaystyle =&\displaystyle (\hat \beta_{ls}(Y))^{\scriptscriptstyle{\mathrm{T}}} \, A^{\scriptscriptstyle{\mathrm{T}}} A \, (\hat \beta_{ls}(Y)) \\ &\displaystyle =&\displaystyle (\hat \beta_{ls}(Y))^{\scriptscriptstyle{\mathrm{T}}} \, V^{\scriptscriptstyle{\mathrm{T}}} V \, (\hat \beta_{ls}(Y)) \\ &\displaystyle =&\displaystyle (V \, \hat \beta_{ls}(Y))^{\scriptscriptstyle{\mathrm{T}}} V \, (\hat \beta_{ls}(Y)) \\ &\displaystyle =&\displaystyle \|V \, \hat \beta_{ls}(Y)\|{}^2 \, . \end{array} \end{aligned} $$
(4.28)

On the other hand, according to (4.9), we have ∥X2 + ∥U2 = ∥G Y ∥2 = ∥Y ∥2. Hence,

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \|U\|{}^2 &\displaystyle =&\displaystyle \|Y\|{}^2 - \|X\|{}^2 \\ &\displaystyle =&\displaystyle \|Y\|{}^2 - \|V \, \hat \beta_{ls}(Y)\|{}^2 \\ &\displaystyle =&\displaystyle \|Y - V \, \hat \beta_{ls}(Y)\|{}^2 \end{array} \end{aligned} $$
(4.29)

since \(Y - V \, \hat \beta _{ls}(Y)\) is orthogonal to \(V \, \hat \beta _{ls}(Y)\). Consequently, according to (4.13) and (4.10), Equations (4.29) and (4.26) give that the estimator \(\hat \beta (Y)\) of β corresponding to the estimator \(\hat \theta (X,\|U\|{ }^2)\) of θ is

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \hat \beta(Y) = (G_1 \, V)^{-1} \, \hat \theta(G_1 \, V \, \hat \beta_{ls}(Y),\|Y - V \, \hat \beta_{ls}(Y)\|{}^2) \, . \end{array} \end{aligned} $$
(4.30)

Note that, when one chooses G 1 as in (4.21), \(\hat \beta (Y)\) in (4.30) has the closed form

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \hat \beta(Y) &\displaystyle =&\displaystyle (V^{\scriptscriptstyle{\mathrm{T}}} \, V)^{-1/2} \, \hat \theta \left((V^{\scriptscriptstyle{\mathrm{T}}} \, V)^{1/2} \, \hat \beta_{ls}(Y), \|Y - V \, \hat \beta_{ls}(Y)\|{}^2\right) \\ &\displaystyle =&\displaystyle (V^{\scriptscriptstyle{\mathrm{T}}} \, V)^{-1/2} \, \hat \theta \left((V^{\scriptscriptstyle{\mathrm{T}}} \, V)^{-1/2} V^{\scriptscriptstyle{\mathrm{T}}} Y, \|Y - V \, \hat \beta_{ls}(Y)\|{}^2\right) \, . \end{array} \end{aligned} $$
(4.31)

In particular, we can see through (4.28), that the “robust” Stein-type estimators of θ,

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \hat \theta_r(X,\|U\|{}^2) = \left( 1 - a \, \frac{\|U\|{}^2}{\|X\|{}^2} \right) X \end{array} \end{aligned} $$
(4.32)

have as a correspondence the “robust” estimators of β

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \hat \beta_r(Y) &\displaystyle =&\displaystyle (G_1 \, V)^{-1} \left( 1 - a \, \frac{\|Y - V \, \hat \beta_{ls}(Y)\|{}^2}{\|V \, \hat \beta_{ls}(Y)\|{}^2} \right) G_1 \, V \, \hat \beta_{ls}(Y) \\ &\displaystyle =&\displaystyle \left( 1 - a \, \frac{\|Y - V \, \hat \beta_{ls}(Y)\|{}^2}{\|V \, \hat \beta_{ls}(Y)\|{}^2} \right) \hat\beta_{ls}(Y) \end{array} \end{aligned} $$
(4.33)

(note that the two G 1 V  terms simplify). We use the term “robust” since, for appropriate values of the positive constant a, they dominate X whatever the spherically symmetric distribution, as we will see in Chap. 5 (see also Cellier et al. 1989; Cellier and Fourdrinier 1995).

According to the correspondence seen above between the risk functions of the estimators of θ and the estimators of β, using these estimators in (4.33) is then a good alternative to the least squares estimator: they dominate the least squares estimator of β simultaneously for all spherically symmetric error distributions with a finite second moment (see Fourdrinier and Strawderman (1996) for the use of these robust estimators when σ 2 is known and also Sect. 5.2).
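
Concretely, (4.28) and (4.29) show that the robust estimator (4.33) can be computed from any standard least squares fit, since \(\|X\|{}^2 = \|V \, \hat \beta _{ls}(Y)\|{}^2\) and \(\|U\|{}^2 = \|Y - V \, \hat \beta _{ls}(Y)\|{}^2\). The sketch below (assuming numpy; the function name, the simulated data and the particular value of a are illustrative, the appropriate range for a being discussed in Chap. 5) implements this correspondence.

import numpy as np

def robust_beta(Y, V, a):
    # Stein-type estimator (4.33), computed directly from the least squares fit
    beta_ls, *_ = np.linalg.lstsq(V, Y, rcond=None)
    fitted = V @ beta_ls
    rss = np.sum((Y - fitted) ** 2)            # ||Y - V beta_ls||^2  (= ||U||^2)
    fit_norm2 = np.sum(fitted ** 2)            # ||V beta_ls||^2      (= ||X||^2)
    return (1.0 - a * rss / fit_norm2) * beta_ls

# Example with a non-normal spherically symmetric error (a scale mixture of normals)
rng = np.random.default_rng(4)
n, p = 30, 5
V = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
eps = rng.standard_normal(n) / np.sqrt(rng.chisquare(3) / 3)   # multivariate t_3 error
Y = V @ beta + eps
print(robust_beta(Y, V, a=0.5))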

4.5.3 Sufficiency in the General Linear Model

Suppose (X T, U T)T has a spherically symmetric distribution about (θ T, 0T)T with dim X = dim θ = p > 0 and dim U = dim 0 = k > 0. Furthermore, suppose that the distribution is absolutely continuous with respect to the Lebesgue measure on \(\mathbb {R}^n\) for n = p + k. The corresponding density may be represented as in (4.7). We refer to θ as a location vector and to σ as a scale parameter. As seen in the previous section, such a distribution arises from a fixed orthogonally invariant random vector \( (X_0^{\scriptscriptstyle {\mathrm {T}}} , U^{\scriptscriptstyle {\mathrm {T}}} _0)^{\scriptscriptstyle {\mathrm {T}}} \) with generating function g through the transformation

$$\displaystyle \begin{aligned}\left( \begin{array}{c} X\\ U \end{array} \right)= \sigma \left( \begin{array}{c} X_0\\ U_0 \end{array} \right)+ \left( \begin{array}{c} \theta\\ 0 \end{array} \right). \end{aligned}$$

Each of θ, σ 2 and g(⋅) may be known or unknown, but perhaps the most interesting case from a statistical standpoint is the following.

Suppose θ and σ 2 are unknown and g(⋅) is known. It follows immediately from the factorization theorem that (X, ∥U2) is sufficient . It is intuitively clear that this statistic is also minimal sufficient since dim(X, ∥U2) = dim(θ, σ 2). Here is a proof of that fact.

Theorem 4.12

Suppose that (X T, U T)T is distributed as (4.7). Then the statistic (X, ∥U2) is minimal sufficient for (θ, σ 2) when g is known.

Proof

By Theorem 6.14 of Casella and Berger (2001), it suffices to show that if, for all \((\theta , \sigma ^2) \in \mathbb {R}^p \times \mathbb {R}_{+}\),

$$\displaystyle \begin{aligned} \frac{g \left(\frac{\|x_1 - \theta\|{}^2 + \|u_1\|{}^2}{\sigma^2}\right)} {g \left(\frac{\|x_2 - \theta\|{}^2 +\|u_2\|{}^2}{\sigma^2}\right)}= c \end{aligned} $$
(4.34)

where c is a constant, then x 1 = x 2 and ∥u 1∥2 = ∥u 2∥2. Note that 0 < c < ∞ since otherwise (4.7) cannot be a density.

Letting τ 2 = 1∕σ 2, (4.34) can be written, for all τ > 0, as

$$\displaystyle \begin{aligned} g (\tau^2 v^{2}_{1})= c g (\tau^2 v^{2}_{2}) \end{aligned} $$
(4.35)

where \(v^{2}_{1}= \|x_1 - \theta \|{ }^2 + \|u_1\|{ }^2\) and \(v^{2}_{2}= \|x_2 - \theta \|{ }^2 + \|u_2\|{ }^2\) for each fixed \(\theta \in \mathbb {R}^p\). First, we will show that \(v^{2}_{1} = v^{2}_{2}\). Note that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} 1 &\displaystyle =&\displaystyle \int_{\mathbb{R}^p \times \mathbb{R}^k} g(\|x\|{}^2 + \|u\|{}^2)\,dx \, du \\ &\displaystyle =&\displaystyle K \int^{\infty}_{0} \, r^{p+k-1}\, g(r^2)\, dr\, \quad \quad \quad \quad \quad \quad \quad \quad \quad (\text{by Theorem 4.2}) \\ &\displaystyle =&\displaystyle K \upsilon^{p+k} \int^{\infty}_{0}\, \tau^{p+k-1}\, g(v^2 \tau^2)\, d\tau \end{array} \end{aligned} $$
(4.36)

for any v > 0. Then it follows from (4.35) and (4.36) that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} 1 &\displaystyle =&\displaystyle K v^{p+k}_{1} \,\int^{\infty}_{0}\,\tau^{p+k-1}g(v^{2}_{1}\tau^2)\,d\tau\\ &\displaystyle =&\displaystyle cK v^{p+k}_{1}\,\int^{\infty}_{0}\,\tau^{p+k-1}g(v^{2}_{2}\tau^2)\,d\tau\\ &\displaystyle =&\displaystyle c \frac{v_{1}^{p+k}}{v_{2}^{p+k}}. \end{array} \end{aligned} $$
(4.37)

Let \(F (b)= \int ^{b}_{0}\,\tau ^{p+k-1}g(\tau ^2)\,d\tau \) and choose b such that F is strictly increasing at b. Suppose v 1 > v 2. Then, for any v > 0,

$$\displaystyle \begin{aligned}F(b)=v^{p+k}\,\int^{b/v}_{0}\, \tau^{p+k-1}\,g(v^2\tau^2)\,d\tau\end{aligned}$$

and consequently

$$\displaystyle \begin{aligned} \begin{array}{rcl} \int^{b/v_1}_{0}\,\tau^{p+k-1}\, g(v^{2}_{1}\tau^2)d\tau &\displaystyle =&\displaystyle \frac{F(b)}{v^{n}_{1}}\\ &\displaystyle =&\displaystyle c \int^{b/v_1}_{0}\, \tau^{p+k-1}\, g(v^{2}_{2}\tau^2)d\tau\\ &\displaystyle <&\displaystyle c \int^{b/v_2}_{0}\, \tau^{p+k-1}\, g(v^{2}_{2}\tau^2)d\tau\\ &\displaystyle =&\displaystyle c \, \frac{F(b)}{v^{n}_{2}}. \end{array} \end{aligned} $$

It follows that \(c \, v_{1}^{p+k} / v_{2}^{p+k} > 1\), which contradicts (4.37). A similar argument would give \(c \, v_{1}^{p+k} / v_{2}^{p+k} < 1\) for v 1 < v 2, and hence v 1 = v 2. Now, setting \(\theta = \frac {x_1 + x_2}{2}\) in the expressions for v 1 and v 2 implies ∥u 1∥2 = ∥u 2∥2. It then follows that ∥x 1 − θ∥2 = ∥x 2 − θ∥2 for all \(\theta \in \mathbb {R}^p\), which implies x 1 = x 2 by setting θ = x 2 (or x 1). □

In the case where θ is unknown, σ 2 is known, and the distribution is multivariate normal, X is minimal sufficient (and complete). However, in the non-normal case, (X, ∥U∥2) is typically minimal sufficient, and may or may not be complete, which is the subject of the next section.

4.5.4 Completeness for the General Linear Model

This section largely follows the development in Fourdrinier et al. (2014). In the case where both θ and σ 2 are unknown and g is known, the minimal sufficient statistic (X, ∥U∥2) can be either complete or incomplete depending on g. If g corresponds to a normal distribution, the statistic is complete by standard results for exponential families. However, when the generating function is of the form \(g(t) = K \, \mathbf{1}_{[r_1, r_2]}(t)\) with 0 < r 1 < r 2 < ∞, where K is the normalizing constant, (X, ∥U∥2) is not complete. In fact incompleteness of (X, ∥U∥2) follows from the fact that the minimal sufficient statistic, when θ is known, σ 2 is unknown and g is known, is incomplete.

Theorem 4.13

  (1) If X ∼ f(x − θ) with \( \theta \in \mathbb {R}^p \) where f has compact support, then X is not complete for θ.

  (2) If X ∼ (1∕σ)f(x∕σ), where f has support contained in an interval [a, b] with 0 < a < b < ∞, then X is not complete for σ.

Before giving the proof of Theorem 4.13, note that if the generating function is of the form \(g(t) = K \, \mathbf{1}_{[r_1, r_2]}(t)\) for 0 < r 1 < r 2 < ∞ and the value of θ is assumed to be known and equal to θ 0, then T = ∥X − θ 0∥2 + ∥U∥2 is minimal sufficient and has a density of the form \(K^{\prime}\, \sigma ^{-n}\, t^{n/2-1}\, \mathbf{1}_{[\sigma ^2 r_1,\, \sigma ^2 r_2]}(t)\).

Therefore, T is not a complete statistic for σ 2 by Theorem 4.13 (2). It follows that there exists a function h(⋅) not equal to zero a.e. such that E σ[h(T)] = 0 for all σ > 0. Since \(E_{\sigma ^{2}} [h(\beta \, T)] = E _{\beta \sigma ^{2}} [h (T)]\), it follows that \(E _{\sigma ^{2}} [h(\beta \, T)] = 0 \) for all σ 2 > 0, β > 0, and also that \(M(\sigma^2) = \int ^1_0 E_{\sigma ^{2}} [h(\beta \, T)] \, m(\beta ) \, d \beta = 0\) for any function m(⋅) for which the integral exists. In particular, this holds when m(⋅) is the density of a Beta(k∕2, p∕2) random variable (where finiteness of the integral is guaranteed since \( E_{\sigma ^{2}} [h(\beta \, T)]\) is continuous in β). Now, since B = ∥U∥2∕T has a Beta(k∕2, p∕2) distribution independently of T, ∥U∥2 = BT, and \(M (\sigma ^2) = E_{\sigma ^{2}} [h(B \, T)] = E _{\sigma ^{2}} [h( \Vert U \Vert ^2 )] \equiv 0\).

Since the distribution of ∥U∥2 does not depend on θ, it follows that when both θ and σ 2 are unknown, \(E_{\theta , \sigma ^2} [ h(\Vert U \Vert ^2) ] \equiv 0 \). Hence, (X, ∥U∥2), while minimal sufficient, is not complete for the case of a generating function of the form \(g(t) = K \, \mathbf{1}_{[r_1, r_2]}(t)\) with 0 < r 1 < r 2 < ∞.

Note that whenever θ is unknown, σ 2 is known, and (X, ∥U∥2) is minimal sufficient (so the distribution is not normal, since then X would be minimal sufficient), ∥U∥2 is ancillary and the minimal sufficient statistic is not complete.

Proof of Theorem 4.13

First, note that part (2) follows from part (1) by the standard technique of transforming a scale family to a location family by taking logs.

We will show the incompleteness of a location family in \(\mathbb {R}\) when F has bounded support. We show first that, if F is a cdf with bounded support contained inside [a, b], the characteristic function (c.f.) \(\hat {f} \) is analytic in \(\mathbb {C}\) (the entire complex plane) and is of order 1 (i.e., \( |\hat {f} (\eta )| \) is \(O( \exp (|\eta |{ }^{1+ \varepsilon })) \) for all ε > 0 and is not \( O (\exp (|\eta |{ }^{1- \varepsilon }))\) for any ε > 0).

To see this, without loss of generality assume 0 < a < b < ∞. Then

$$\displaystyle \begin{aligned} \begin{array}{rcl} |\hat{f} (\eta) | &\displaystyle \leq&\displaystyle \int^b_a \exp(|\eta | x)\,d F (x)\\ &\displaystyle \leq&\displaystyle \exp(b|\eta | ) \int^b_a \,d F (x) \\ &\displaystyle = &\displaystyle \exp(b|\eta|) \\ &\displaystyle = &\displaystyle O (\exp(|\eta|{}^{1 + \varepsilon}))\end{array} \end{aligned} $$

for all ε > 0. Also, if η = −iv for v > 0, then

$$\displaystyle \begin{aligned} |\hat{f} (\eta)| = \int^b_a \exp(v x) \, d F (x) \geq \exp(av) \int^b _a \, dF(x) = \exp(av) \, . \end{aligned}$$

However, \( \exp (av) \) is not \(O ( \exp (v ^{1-\varepsilon })) \) for any ε > 0. Hence \( \hat {f} (\eta ) \) is of order 1.

In the step above, we used 0 < a < b < ∞. Note that if either a and/or b is negative then the distribution of X is equal to the distribution of z + θ 0 where θ 0 is negative and where the distribution of z satisfies the assumptions of the theorem. Hence \( E \exp (i \eta x) = E \exp (i \eta z) e ^{i \eta \theta _{0}}\), so \( | E \exp (i \eta x) | \leq \exp (|\eta |b) \exp (|\eta ||\theta _0|) \), which is \( O( \exp (| \eta |{ }^{1+\epsilon })) \; \mbox{for all}\; \epsilon > 0\).

Similarly, for η = −iv (recall θ 0 < 0),

$$\displaystyle \begin{aligned} \begin{array}{rcl} |E \exp(i \eta x) | &\displaystyle = &\displaystyle E \exp(v z) \, \exp(-v \theta_{0}) \\ &\displaystyle \geq&\displaystyle e ^{v |\theta _{0}|} \exp(av)\\ &\displaystyle = &\displaystyle \exp(v (a + |\theta_{0} |)) \end{array} \end{aligned} $$

and this is not \(O (\exp (v ^{1-\varepsilon })) \) for any ε > 0. □

Note that \( \hat {f} (\eta ) \) exists in all of \(\mathbb {C}\) since F has bounded support and is analytic by standard results in complex analysis (See e.g. Rudin 1966). To complete the proof of Theorem 4.13 we need the following lemma.

Lemma 4.3

If X  F(x) where the cdf F has bounded support in \(\mathbb {R}\) and F is not degenerate, then the characteristic function \( \hat {f} (\eta ) \) has at least one zero in \(\mathbb {C}\).

Proof

This follows almost directly from the Hadamard factorization theorem which implies that a function \( \hat {f} (z) \) that is analytic in all of \(\mathbb {C}\) and of order 1 is of the form \( \hat {f} (z) = \exp (az +b) P(z) \). P(z) is the so called canonical product formed from the zeros of \( \hat {f} (z) \), where P(0) = 1 and P(z) = 0 for each such root. (See e.g., Titchmarsh (1932) for an extended discussion of the form of P(z)). Therefore, either \( \hat {f} (z) \) has no zeros, in which case \( \hat {f}(z) = \exp (az) \) (since \(\hat {f} (0) = 1 = e ^b \Rightarrow b = 0 )\) and P(z) ≡ 1, or \( \hat {f} (z)\) has at least one zero. The case where \( \hat {f} (z) = \exp (a z) \) corresponds to the degenerate case where \( \exp (a z) = \hat {f}(z) = E \exp (i zx) \) with P[X = −ia] = 1. Since F is assumed to not be degenerate, \( \hat {f} (z) \) must have at least one zero by the uniqueness of the Fourier transform. □

To finish the proof of Theorem 4.13 note that, by Lemma 4.3, there exists an η 0 such that

$$\displaystyle \begin{aligned} \hat{f} (\eta_0) = \int^\infty_{-\infty} \exp(i \eta_{0} x) f (x) \, d x = 0 . \end{aligned}$$

This implies that, writing η 0 = a 0 + b 0 i, for any \( \theta \in \mathbb {R}\),

$$\displaystyle \begin{aligned} \begin{array}{rcl} 0 &\displaystyle = &\displaystyle \left( \int^\infty_{-\infty}\exp(i \eta_{0} x) f (x) \, d x \right) \, \exp (i \eta_{0} \theta) \\ &\displaystyle = &\displaystyle \int^\infty_{-\infty}\exp(i \eta_{0}(x + \theta)) f (x) \, d x\\ &\displaystyle = &\displaystyle \int^\infty_{-\infty}\exp(i \eta_{0} x) f (x-\theta) \, d x\\ &\displaystyle = &\displaystyle E _\theta [\exp(i \eta_{0} X)] =E_\theta [ \exp(i( a_0 +b_0 i) X)]\\ &\displaystyle = &\displaystyle E _\theta [\exp(i a_{0} X) \exp(-b_0 X) ]\\ &\displaystyle = &\displaystyle E _\theta [\exp(-b_{0} X) \{ \cos (a_0 X) + i \sin (a_0 X) \}]. \end{array} \end{aligned} $$

Hence, for any \( \theta \in \mathbb {R}\), we have \(E_\theta [ \exp (-b_0 X) \cos (a_0 X) ] \equiv 0\).

Additionally, \( E_\theta [ |\exp (-b_0 X) \cos (a_0 X)| ] < \infty \) for all θ since f(⋅) has bounded support. The theorem then follows, since \( h(X) = e ^{-b_0 X} \cos (a_0 X) \) is an unbiased estimator of 0, which is not equal to 0 almost surely for each θ. This proves the result for p = 1. The extension from \(\mathbb {R}\) to \(\mathbb {R}^p\) is straightforward since the marginal distribution of each coordinate has compact support. □

4.6 Characterizations of the Normal Distribution

There is a large literature, with a long history, on characterizations of the normal distribution. A classical reference that covers a number of these characterizations is Kagan et al. (1973). We give only a small sample of them. The first result gives a characterization in terms of the normality of linear transformations.

Theorem 4.14

Let X ∼ ES(θ) in \(\mathbb {R}^n\) . If A is any fixed linear transformation of positive rank such that AX has a normal distribution then X has a normal distribution.

Proof

First note that it suffices to consider the case θ = 0. Furthermore it suffices to prove the result for X ∼ SS(0) since an elliptically symmetric distribution is the image of a spherically symmetric distribution by a nonsingular transformation. Note also that, if X ∼ SS(0), its characteristic function φ X(t) = Ψ(t T t) since, for any orthogonal transformation H, the characteristic function φ HX of HX satisfies

$$\displaystyle \begin{aligned}\varphi_{H X}(t)= \varphi_X (H^{\scriptscriptstyle{\mathrm{T}}} t)= \varphi_X (t).\end{aligned}$$

Now the characteristic function φ AX of AX equals

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \varphi_{A X}(t)&\displaystyle =&\displaystyle E[\exp\{i t^{\scriptscriptstyle{\mathrm{T}}} AX\}]= E[\exp\{i(A^{\scriptscriptstyle{\mathrm{T}}} t)^{\scriptscriptstyle{\mathrm{T}}} X\}]= \varPsi (t^{\scriptscriptstyle{\mathrm{T}}} AA^{\scriptscriptstyle{\mathrm{T}}} t). \end{array} \end{aligned} $$
(4.38)

Also, by Theorem 4.3, Cov(X) = (E[R 2]∕n) I n. Hence Cov(AX) = (E[R 2]∕n) AA T and the fact that AX is normal implies that E[R 2] < ∞ and that Cov(AX) = α AA T for some α ≥ 0. This implies that \(\varphi _{AX}(t)= \exp \{-\alpha \, t^{\scriptscriptstyle {\mathrm {T}}} AA^{\scriptscriptstyle {\mathrm {T}}} t/2\}\). Therefore, by (4.38), \(\varPsi (z)= \exp \{- \alpha z/2\}\) and \(\varphi _X (t)= \exp \{-\alpha t^{\scriptscriptstyle {\mathrm {T}}} t/2\}\), so X is normal. □

Corollary 4.5

Let X ∼ ES(θ) in \(\mathbb {R}^n\) . If any orthogonal projection ΠX has a normal distribution (and, in particular, any marginal), then X has a normal distribution.

The next theorem gives a characterization in terms of the independence of linear projections.

Theorem 4.15

Let X ∼ ES(θ) in \(\mathbb {R}^n\) . If A and B are any two fixed linear transformations of positive rank such that AX and BX are independent, then X has a normal distribution.

Proof

As in the proof of Theorem 4.14, we can assume that X ∼ SS(0). Then the characteristic function φ X of X is φ X(t) = Ψ(t T t). Hence, the characteristic functions φ AX and φ BX of AX and BX are \(\varphi _{AX}(t_1)= \varPsi (t^{\scriptscriptstyle {\mathrm {T}}}_1 AA^{\scriptscriptstyle {\mathrm {T}}} t_1)\) and \(\varphi _{BX}(t_2)= \varPsi (t^{\scriptscriptstyle {\mathrm {T}}}_2 BB^{\scriptscriptstyle {\mathrm {T}}} t_2)\), respectively. By the independence of AX and BX, we have

$$\displaystyle \begin{aligned}\varPsi (t^{\scriptscriptstyle{\mathrm{T}}}_1 AA^{\scriptscriptstyle{\mathrm{T}}} t_1 + t^{\scriptscriptstyle{\mathrm{T}}}_2 BB^{\scriptscriptstyle{\mathrm{T}}} t_2)= \varPsi (t^{\scriptscriptstyle{\mathrm{T}}}_1 AA^{\scriptscriptstyle{\mathrm{T}}} t_1) \varPsi(t^{\scriptscriptstyle{\mathrm{T}}}_2 BB^{\scriptscriptstyle{\mathrm{T}}} t_2).\end{aligned}$$

Since A and B are of positive rank this implies that, for any u ≥ 0 and v ≥ 0,

$$\displaystyle \begin{aligned}\varPsi (u + v)= \varPsi (u) \varPsi (v).\end{aligned}$$

This equation is known as Hamel’s equation and its only continuous solution is Ψ(u) = e αu for some \(\alpha \in \mathbb {R}\) (see for instance Feller 1971, page 305). Hence, \({\varphi }_X(t)= e^{\alpha t^{\scriptscriptstyle {\mathrm {T}}} t}\) for some α ≤ 0 since φ X is a characteristic function. It follows that X has a normal distribution. □

Corollary 4.6

Let X ∼ ES(θ) in \(\mathbb {R}^n\) . If any two projections of X (in particular, any two marginals) are independent, then X has a normal distribution.