Abstract
We propose new affine invariant tests for multivariate normality, based on independence characterizations of the sample moments of the normal distribution. The test statistics are obtained using canonical correlations between sets of sample moments in a way that resembles the construction of Mardia’s skewness measure and generalizes the Lin–Mudholkar test for univariate normality. The tests are compared to some popular tests based on Mardia’s skewness and kurtosis measures in an extensive simulation power study and are found to offer higher power against many of the alternatives.
1 Introduction
Many classical multivariate statistical methods are based on the assumption that the data comes from a multivariate normal distribution. Consequently, the use of such methods should be followed by an investigation of the assumption of normality. A number of tests for multivariate normality can be found in the literature, but the field has not been investigated to the same extent as have tests for univariate normality.
Let \(\gamma ={ E }(X-\mu )^3/\sigma ^ 3\) denote the skewness of a univariate random variable \(X\) and \(\kappa ={ E }(X-\mu )^ 4/\sigma ^ 4-3\) denote its (excess) kurtosis. Both these quantities are 0 for the normal distribution but nonzero for many other distributions, and some common tests for univariate normality are therefore based on \(\gamma \) and \(\kappa \).
Various analogous multivariate measures of skewness and kurtosis have been proposed, perhaps most notably by Mardia (1970). These measures have been used in a number of tests for multivariate normality over the last few decades. Some of these tests, in particular those that use Mardia’s skewness and kurtosis measures as test statistics, have proved to have high power in many simulation studies (e.g. Mecklin and Mundfrom 2004, 2005), and new tests for normality based on multivariate skewness and kurtosis continue to be published today (Doornik and Hansen 2008; Kankainen et al. 2007).
In many inferential situations, some types of departures from normality are a more serious concern than are others. For instance, MANOVA is known to be sensitive to deviations from normality in the form of asymmetry, but to be relatively robust against deviations in the form of heavy tails. Using skewness and kurtosis allows us to construct tests that are directed toward some particular class of alternatives: skewness is used to detect asymmetric alternatives whereas kurtosis is used to detect alternatives with either short or long tails. This typically results in tests that, in comparison to omnibus tests that are directed to all alternatives, have higher power against the class of alternatives that they are directed to.
While more directed toward certain alternatives, such tests may however still be prone to reject alternatives from other classes. The sample skewness and sample kurtosis are correlated, which for instance can cause a skewness-based test to reject normality for a symmetric distribution with heavy tails. Henze (2002) and others have argued that this is a reason to avoid directed tests for normality. Directed tests will however in general have comparatively low power against alternatives that they are not directed to, lowering the risk of rejecting normality because of an unimportant deviation from normality. It is arguably better to have a test that has high power against interesting alternatives and lower power against uninteresting alternatives, rather than a test that has medium high power against all alternatives.
In this paper six new directed tests for normality, all related to multivariate skewness or kurtosis, are proposed. Their common basis is independence characterizations of sample moments of the multivariate normal distribution.
In Sect. 2 we reexamine Mardia’s measure of multivariate skewness, which leads to two new classes of tests for multivariate normality. In Sect. 3 we state explicit expressions for covariances between multivariate sample moments in terms of moments of \({\varvec{{X}}}=(X_1,\ldots ,X_p)'\). This will allow us to estimate the moments involved and to test whether these sample moments are correlated.
In Sect. 4 we study the first class of new tests for normality, all of which are related to multivariate skewness. These can be viewed as multivariate generalizations of the univariate \(Z_2'\) test (Thulin 2010), which in turn is a modified version of the Lin and Mudholkar (1980) test. In Sect. 5 we study the second class of tests, related to multivariate kurtosis. These, in turn, are generalizations of the Thulin (2010) \(Z_3'\) modification of a test proposed by Mudholkar et al. (2002). The tests are applied to the Iris data in Sect. 6. The results of a simulation study comparing the new tests with tests based on Mardia’s skewness and kurtosis measures is presented in Sect. 7, which is followed by a discussion in Sect. 8. The text concludes with an appendix containing proofs and tables. Additional tables and figures are included in two online supplements.
2 Mardia’s multivariate skewness and kurtosis measures revisited
2.1 Multivariate skewness
A well-known characterization of the multivariate normal distributions is that the i.i.d. \(p\)-variate variables \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are normal if and only if the sample mean vector \({\varvec{{\bar{X}}}}=(\bar{X}_1,\bar{X}_2,\ldots ,\bar{X}_p)'\) and the sample covariance matrix \({\varvec{{S}}}\) are independent. Our aim is to test this independence in order to assess the normality of a population. As testing independence is difficult, we will resort to testing correlations instead.
Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables with nonsingular covariance matrix \({\varvec{{\Sigma }}}\). Let \({\varvec{{\bar{X}}}}=(\bar{X}_1,\bar{X}_2,\ldots ,\bar{X}_p)'\) be the sample mean vector and let
$$\begin{aligned} {\varvec{{S}}}=(S_{ij})_{i,j=1,\ldots ,p} \end{aligned}$$
be the sample covariance matrix with \(S_{ij}=(n-1)^{-1}\sum _{k=1}^n(X_{k,i}-\bar{X}_i)(X_{k,j}-\bar{X}_j)\). Define
so that \({\varvec{{u}}}\) is a vector containing the \(q=p(p+1)/2\) distinct elements of \({\varvec{{S}}}\). Now, consider the covariance matrix of the vector \(({\varvec{{\bar{X}'}}},{\varvec{{u'}}})'\), in the following denoted \(({\varvec{{\bar{X}}}},{\varvec{{u}}})\):
$$\begin{aligned} \mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{u}}}))=\left( \begin{array}{cc} {\varvec{{\Lambda _{11}}}} &{} {\varvec{{\Lambda _{12}}}}\\ {\varvec{{\Lambda _{21}}}} &{} {\varvec{{\Lambda _{22}}}} \end{array}\right) , \end{aligned}$$(1)
where \({\varvec{{\Lambda _{11}}}}=\mathrm{Cov}({\varvec{{\bar{X}}}})\), \({\varvec{{\Lambda _{22}}}}=\mathrm{Cov}({\varvec{{u}}})\), \({\varvec{{\Lambda _{21}}}}={\varvec{{\Lambda }}}_{12}'\) and \({\varvec{{\Lambda _{12}}}}\) contains covariances of the type \(\mathrm{Cov}(\bar{X}_i, S_{jk})\), \(i,j,k=1,\ldots ,p\). If \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\) are uncorrelated, then \({\varvec{{\Lambda _{12}}}}={\varvec{{0}}}\).
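As a concrete illustration, the pair \(({\varvec{{\bar{X}}}},{\varvec{{u}}})\) can be computed as follows. This is a sketch in Python/NumPy rather than the paper's own code; the function name `mean_and_vech` and the row-wise upper-triangular ordering of \({\varvec{{u}}}\) are our own choices.

```python
import numpy as np

def mean_and_vech(X):
    """Sample mean vector and the vector u holding the q = p(p+1)/2
    distinct elements of the sample covariance matrix S."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)       # (n-1)^{-1} normalization, as in the text
    u = S[np.triu_indices(p)]         # one entry per distinct S_{ij}, i <= j
    return xbar, u
```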
Mardia (1970, 1974) noted that for univariate random variables, asymptotically \(\text{ cor }(\bar{X}, S^2)\approx \frac{1}{\sqrt{2}}\gamma \) if \(\kappa \) is assumed to be negligible. Based on this, he used \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{u}}}))\) to construct a multivariate skewness measure. Studying the canonical correlations (see e.g. Mardia et al. 1979) between \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\) he proposed the measure
where \(\lambda _1,\ldots ,\lambda _p\) are the canonical correlations. This expression reduces to \(2\text{ cor }(\bar{X}, S^2)^ 2\approx \gamma ^2\) for univariate random variables.
From the theory of canonical correlations we have that \(\lambda _1^ 2,\ldots ,\lambda _p^ 2\) are the eigenvalues of \( {\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}} \) and thus
Taking these moments to order \(n^{-1}\) Mardia showed that
where \({\varvec{{X}}}\) and \({\varvec{{Y}}}\) are independent and identically distributed random vectors. The sample counterpart of the above expression,
$$\begin{aligned} b_{1,p}=\frac{1}{n^2}\sum _{i=1}^n\sum _{j=1}^n\left[ ({\varvec{{X_i}}}-{\varvec{{\bar{X}}}})'{\varvec{{S}}}^{-1}({\varvec{{X_j}}}-{\varvec{{\bar{X}}}})\right] ^3, \end{aligned}$$(2)
is commonly used as a measure of multivariate skewness and as a test statistic for a test of multivariate normality.
In Section 2.8 of McCullagh (1987) Mardia’s approximation of \(\beta _{1,p}\) is shown to be a natural generalization of \(\gamma ^2\). It is however not necessarily a good approximation of the canonical correlations between \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\). An important assumption underlying Mardia’s skewness measure is that the fourth central moments of the distribution are negligible. Seeing as \(\gamma ^2-2\le \kappa \) for univariate variables (Dubkov and Malakhov 1976), this seems like a rather strong condition. For univariate random variables, Thulin (2010) noted that
and used \(\hat{\rho }_2=Z_2'\), the sample moment version of this quantity, as the test statistic of a modified version of the Lin and Mudholkar (1980) test. In Thulin’s simulation power study \(Z_2'\) was more powerful than \(\hat{\gamma }\) against most of the alternatives under study. Consequently, for \(p=1\) it is better to use the explicit expression for \(\text{ cor }(\bar{X}, S^2)\) than the approximation \(\text{ cor }(\bar{X}, S^2)\approx \frac{1}{\sqrt{2}}\gamma \). It is therefore of interest to use Mardia’s approach without any approximations, in the hope that this will render a more powerful test for normality. In Sect. 3 we give explicit expressions for \(\mathrm{Cov}(\bar{X}_i, S_{jk})\) and \(\mathrm{Cov}(S_{ij}, S_{kl})\), allowing us to study \({\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}}\) without approximations and to construct new test statistics.
2.2 Multivariate kurtosis
Mardia (1970, 1974) proposed the multivariate kurtosis measure
$$\begin{aligned} \beta _{2,p}={ E }\left[ ({\varvec{{X}}}-{\varvec{{\mu }}})'{\varvec{{\Sigma }}}^{-1}({\varvec{{X}}}-{\varvec{{\mu }}})\right] ^2 \end{aligned}$$
with sample counterpart
$$\begin{aligned} b_{2,p}=\frac{1}{n}\sum _{i=1}^n\left[ ({\varvec{{X_i}}}-{\varvec{{\bar{X}}}})'{\varvec{{S}}}^{-1}({\varvec{{X_i}}}-{\varvec{{\bar{X}}}})\right] ^2. \end{aligned}$$(3)
In the univariate setting
where \(\lambda =\frac{\mu _6}{\sigma ^6}-15\kappa -10\gamma ^2-15\) is the sixth standardized cumulant (Thulin 2010). In a simulation power study, Thulin (2010) found the test for normality based on \(\hat{\rho }_3=Z_3'\), the sample counterpart of (4), to have a better overall performance than the popular \(\hat{\kappa }=b_2=b_{2,1}\) test. It is therefore of interest to find a multivariate generalization of \(Z_3'\), in the hope that it will yield a test with higher power than \(b_{2,p}\).
Similarly to what was done above for the covariance, let
and
a vector of length \(p+p(p-1)+p(p-1)(p-2)/6\). We will construct tests based on the fact that \({\varvec{{\bar{X}}}}\) and \({\varvec{{v}}}\) are independent if \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are normal. The covariance matrix of \(({\varvec{{\bar{X}}}},{\varvec{{v}}})\) can be written as
$$\begin{aligned} \mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{v}}}))=\left( \begin{array}{cc} {\varvec{{\Psi _{11}}}} &{} {\varvec{{\Psi _{12}}}}\\ {\varvec{{\Psi _{21}}}} &{} {\varvec{{\Psi _{22}}}} \end{array}\right) , \end{aligned}$$
where \({\varvec{{\Psi _{11}}}}=\mathrm{Cov}({\varvec{{\bar{X}}}})\), \({\varvec{{\Psi _{22}}}}=\mathrm{Cov}({\varvec{{v}}})\), \({\varvec{{\Psi _{21}}}}={\varvec{{\Psi }}}_{12}'\) and \({\varvec{{\Psi _{12}}}}\) contains covariances of the type \(\mathrm{Cov}(\bar{X}_i, S_{jkl})\), \(i,j,k,l=1,\ldots ,p\). If \({\varvec{{\bar{X}}}}\) and \({\varvec{{v}}}\) are uncorrelated, \({\varvec{{\Psi _{12}}}}={\varvec{{0}}}\).
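The vector \({\varvec{{v}}}\) of distinct third-order sample central moments can be sketched as below. The helper name and the \(1/n\) normalization are our own assumptions; the paper's exact normalizing constant for \(S_{ijk}\) is not reproduced in this excerpt.

```python
import numpy as np
from itertools import combinations_with_replacement

def third_moment_vector(X):
    """The vector v of distinct third-order sample central moments
    S_{ijk}, one entry per multiset i <= j <= k.  The 1/n normalization
    is an assumption, not taken from the paper."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    return np.array([(Xc[:, i] * Xc[:, j] * Xc[:, k]).mean()
                     for i, j, k in combinations_with_replacement(range(p), 3)])
```

The length of the returned vector is \(p+p(p-1)+p(p-1)(p-2)/6=p(p+1)(p+2)/6\): \(p\) entries with all indices equal, \(p(p-1)\) with exactly two equal, and \(p(p-1)(p-2)/6\) with all distinct.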
3 Explicit expressions for the covariances
In the following theorems we state explicit expressions for the elements of \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{u}}}))\) and \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{v}}}))\) in terms of moments of \((X_1,\ldots ,X_p)\). These covariances can be obtained by tedious but routine calculations of the moments involved, which are much simplified by the use of tensor notation, as described in McCullagh (1987). All five covariances can be found scattered in the literature, expressed using cumulants: (6)–(8) are all given in Section 4.2.3 of McCullagh (1987), (9) is found in Problem 4.5 of McCullagh (1987) and (10) is expression (7) in Kaplan (1952).
Theorem 1
Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables with \({ E }|X_iX_jX_kX_l|<\infty \) for \(i,j,k,l=1,2,\ldots ,p\). Let \(\mu _{i_1,\ldots ,i_s}={ E }(X_{i_1}-\mu _{i_1})(X_{i_2}-\mu _{i_2})\cdots (X_{i_s}-\mu _{i_s})\). Then, for \(n\ge 2p+p(p-1)/2\) and \(i,j,k,l=1,2,\ldots ,p\)
-
(i)
the elements of \({\varvec{{\Lambda }}}_{11}\) are
$$\begin{aligned} \mathrm{Cov}(\bar{X}_i, \bar{X}_j)=\frac{1}{n}\mu _{ij}, \end{aligned}$$(6) -
(ii)
the elements of \({\varvec{{\Lambda }}}_{12}\) and \({\varvec{{\Lambda }}}_{21}\) are
$$\begin{aligned} \mathrm{Cov}(\bar{X}_i, S_{jk})=\frac{1}{n}\mu _{ijk} \end{aligned}$$(7)and
-
(iii)
the elements of \({\varvec{{\Lambda }}}_{22}\) are
$$\begin{aligned} \mathrm{Cov}(S_{ij}, S_{kl})=\frac{1}{n}(\mu _{ijkl}-\mu _{ij}\mu _{kl})+\frac{1}{n(n-1)}(\mu _{ik}\mu _{jl}+\mu _{il}\mu _{jk}). \end{aligned}$$(8)
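Plug-in estimates of the Theorem 1 blocks can be sketched as follows. The helper `lambda_blocks` is our own, and it uses plain \(1/n\) sample central moments, which may differ from the paper's exact sample-moment convention; the block formulas themselves follow (6)–(8).

```python
import numpy as np

def lambda_blocks(X):
    """Estimate Lambda_11, Lambda_12, Lambda_22 of Theorem 1 by plugging
    sample central moments into (6)-(8).  Plain 1/n moments are assumed."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    mu2 = Xc.T @ Xc / n                                   # mu_hat_{ij}
    mu3 = np.einsum('ni,nj,nk->ijk', Xc, Xc, Xc) / n      # mu_hat_{ijk}
    mu4 = np.einsum('ni,nj,nk,nl->ijkl', Xc, Xc, Xc, Xc) / n

    pairs = [(j, k) for j in range(p) for k in range(j, p)]  # distinct S_{jk}
    L11 = mu2 / n                                            # (6)
    L12 = np.array([[mu3[i, j, k] for j, k in pairs]         # (7)
                    for i in range(p)])
    L22 = np.array([[(mu4[i, j, k, l] - mu2[i, j] * mu2[k, l]) / n  # (8)
                     + (mu2[i, k] * mu2[j, l] + mu2[i, l] * mu2[j, k])
                       / (n * (n - 1))
                     for k, l in pairs] for i, j in pairs])
    return L11, L12, L22
```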
Since \({\varvec{{\Psi }}}_{11}={\varvec{{\Lambda }}}_{11}\), we only give the expressions for \({\varvec{{\Psi _{22}}}}\) and \({\varvec{{\Psi _{12}}}}\) in the following theorem.
Theorem 2
Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables with \({ E }|X_\alpha X_\beta X_\gamma X_\delta X_\epsilon X_\zeta |<\infty \) for \(\alpha ,\ldots ,\zeta =1,2,\ldots ,p\). Let \(\mu _{i_1,\ldots ,i_s}={ E }(X_{i_1}-\mu _{i_1})(X_{i_2}-\mu _{i_2})\cdots (X_{i_s}-\mu _{i_s})\). Then, for \(n\ge 2p+p(p-1)+p(p-1)(p-2)/6\) and \(i,j,k,r,s,t=1,2,\ldots ,p\)
-
(i)
the elements of \({\varvec{{\Psi }}}_{12}\) and \({\varvec{{\Psi }}}_{21}\) are
$$\begin{aligned} \mathrm{Cov}(\bar{X}_i, S_{rst})=\frac{1}{n}\left( \mu _{irst}-\mu _{ir}\mu _{st}-\mu _{is}\mu _{rt}-\mu _{it}\mu _{rs}\right) \end{aligned}$$(9)and
-
(ii)
the elements of \({\varvec{{\Psi }}}_{22}\) are
$$\begin{aligned} \mathrm{Cov}(S_{ijk}, S_{rst})\!&= \!\frac{1}{n}\lambda _{ijkrst} \!+\!\frac{1}{n\!-\!1}\left( \sum ^9\mu _{ir}(\mu _{jkst}\!-\!\sum ^3\mu _{jk}\mu _{st})\!+\!\sum ^9\mu _{ijr}\mu _{kst}\right) \nonumber \\&+\frac{n}{(n-1)(n-2)}\sum ^6\mu _{ir}\mu _{js}\mu _{kt} \end{aligned}$$(10)where \(\lambda _{ijkrst}\) is given below and \(\sum ^k\) denotes summation over \(k\) distinct permutations of \(i,j,k,r,s,t\). In particular, in \(\sum ^9\mu _{ir}(\ldots )\) the summation is taken over all permutations of \(i,j,k,r,s,t\) where \(i\) and either of \(j,k\) switch places and/or \(r\) and either of \(s,t\) switch places. In \(\sum ^9\mu _{ijr}\mu _{kst}\) the summation is taken over all permutations except \(\mu _{ijk}\mu _{rst}\). Finally, in \(\sum ^3\mu _{jk}\mu _{st}\) and
$$\begin{aligned} \lambda _{ijkrst}=\mu _{ijkrst}-\sum ^{15}\mu _{ij}(\mu _{krst}-\sum ^{3}\mu _{kr}\mu _{st})-\sum ^{10}\mu _{ijk}\mu _{rst}-\sum ^{15}\mu _{ij}\mu _{kr}\mu _{st} \end{aligned}$$the sums are taken over all distinct permutations.
4 Tests based on \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\)
4.1 Modifying Mardia’s statistic
The factor 2 in Mardia’s expression
$$\begin{aligned} 2\sum _{i=1}^p\lambda _i^2 \end{aligned}$$
is only of interest if we assume negligible fourth moments (in the sense of Mardia (1970)). We will therefore omit it in the following and instead study the quantity
$$\begin{aligned} \sum _{i=1}^p\lambda _i^2=\mathrm{tr}\left( {\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}}\right) . \end{aligned}$$
Let \({\varvec{{L_{11}}}}\), \({\varvec{{L_{22}}}}\), \({\varvec{{L_{12}}}}\) and \({\varvec{{L_{21}}}}\) be the sample counterparts of \({\varvec{{\Lambda _{11}}}}\), \({\varvec{{\Lambda _{22}}}}\), \({\varvec{{\Lambda _{12}}}}\) and \({\varvec{{\Lambda _{21}}}}\), where the central moments \(\mu _{i_1,\ldots ,i_s}={ E }(X_{i_1}-\mu _{i_1})(X_{i_2}-\mu _{i_2})\cdots (X_{i_s}-\mu _{i_s})\) are estimated by the sample moments
$$\begin{aligned} \hat{\mu }_{i_1,\ldots ,i_s}=\frac{1}{n}\sum _{k=1}^n(X_{k,i_1}-\bar{X}_{i_1})(X_{k,i_2}-\bar{X}_{i_2})\cdots (X_{k,i_s}-\bar{X}_{i_s}), \end{aligned}$$(11)
i.e. where the moments in Theorem 1 are replaced by their sample counterparts. The test statistic for the new test is
The null hypothesis of normality is rejected if \(Z_{2,p}^{({ HL})}\) is sufficiently large.
\(Z_{2,1}^{({ HL})}\) coincides with \(Z_2'^2\) from Thulin (2010) and is thus equivalent to the \(|Z_2'|\) test presented there. \(Z_{2,2}^{({ HL})}\) is a polynomial of degree 10 in 13 moments and the full formula takes up more than two pages. It is however readily computed using a computer, as is \(Z_{2,p}^{({ HL})}\) for higher \(p\).
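The claim that the statistic is readily computed can be illustrated with a self-contained sketch. It assumes that \(Z_{2,p}^{({ HL})}\) equals the trace \(\sum _i\hat{\lambda }_i^2\) with no further normalizing constant, and uses plain \(1/n\) sample central moments; both are assumptions, as the paper's exact conventions are not reproduced here.

```python
import numpy as np

def z2_hl(X):
    """Sketch of a trace-type statistic: sum of the squared sample canonical
    correlations between Xbar and u, i.e. tr(L11^{-1} L12 L22^{-1} L21),
    with the Theorem 1 blocks estimated by 1/n sample central moments."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    mu2 = Xc.T @ Xc / n
    mu3 = np.einsum('ni,nj,nk->ijk', Xc, Xc, Xc) / n
    mu4 = np.einsum('ni,nj,nk,nl->ijkl', Xc, Xc, Xc, Xc) / n
    pairs = [(j, k) for j in range(p) for k in range(j, p)]
    L11 = mu2 / n
    L12 = np.array([[mu3[i, j, k] for j, k in pairs] for i in range(p)])
    L22 = np.array([[(mu4[i, j, k, l] - mu2[i, j] * mu2[k, l]) / n
                     + (mu2[i, k] * mu2[j, l] + mu2[i, l] * mu2[j, k])
                       / (n * (n - 1))
                     for k, l in pairs] for i, j in pairs])
    # trace of L11^{-1} L12 L22^{-1} L21 = sum of squared canonical correlations
    M = np.linalg.solve(L11, L12) @ np.linalg.solve(L22, L12.T)
    return float(np.trace(M))
```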
It should be noted that differences in index notation complicate the situation somewhat here. Mardia’s skewness is denoted \(b_{1,p}\), with 1 as its index, whereas the univariate correlation statistic \(Z_2'\) has 2 as its index. When generalizing \(Z_2'\) to the multivariate setting we will keep the index 2, hoping that it won’t be confused with Mardia’s kurtosis measure \(b_{2,p}\).
4.2 Other test statistics from the theory of canonical correlations
Let \({\varvec{{Y}}}\) and \({\varvec{{Z}}}\) be normal random vectors with
$$\begin{aligned} \mathrm{Cov}(({\varvec{{Y}}},{\varvec{{Z}}}))={\varvec{{\Sigma }}}=\left( \begin{array}{cc} {\varvec{{\Sigma _{11}}}} &{} {\varvec{{\Sigma _{12}}}}\\ {\varvec{{\Sigma _{21}}}} &{} {\varvec{{\Sigma _{22}}}} \end{array}\right) , \end{aligned}$$
partitioned like (1). Let \({\varvec{{\hat{\Sigma }_{11}}}}\), \({\varvec{{\hat{\Sigma }_{22}}}}\) and \({\varvec{{\hat{\Sigma }_{12}}}}={\varvec{{\hat{\Sigma }_{21}'}}}\) be the sample covariance matrices and \(\hat{\nu }_1^2,\ldots ,\hat{\nu }_p^2\) be the eigenvalues of \({\varvec{{\hat{\Sigma }_{11}}}}^{-1}{\varvec{{\hat{\Sigma }_{12}}}}{\varvec{{\hat{\Sigma }_{22}}}}^{-1}{\varvec{{\hat{\Sigma }_{21}}}}\). In Section 10.3 of Kshirsagar (1972) the test statistic of the likelihood ratio test of \(H_0: {\varvec{{\Sigma _{12}}}}={\varvec{{0}}}\) versus \(H_1: {\varvec{{\Sigma _{12}}}}\ne {\varvec{{0}}}\) is shown to be
$$\begin{aligned} \prod _{i=1}^p(1-\hat{\nu }_i^2). \end{aligned}$$(13)
Now, let \(\hat{\lambda }_1^2\ge \hat{\lambda }_2^2\ge \ldots \ge \hat{\lambda }_p^2\) be the eigenvalues of \({\varvec{{L_{11}}}}^{-1}{\varvec{{L_{12}}}}{\varvec{{L_{22}}}}^{-1}{\varvec{{L_{21}}}}\). Assuming that the necessary moments exist, \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\) are asymptotically normal. Although \({\varvec{{L_{22}}}}\) and \({\varvec{{L_{12}}}}\) are not the usual sample covariance matrices, in the light of (13), this suggests the use of the following statistic for a test for normality:
$$\begin{aligned} Z_{2,p}^{(W)}=\prod _{i=1}^p(1-\hat{\lambda }_i^2). \end{aligned}$$
The null hypothesis of normality is rejected if \(Z_{2,p}^{(W)}\) is sufficiently small.
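Once the squared sample canonical correlations are in hand, several of the statistics of this section are simple deterministic functions of them. The sketch below covers a trace-type sum, a Wilks-type product and the largest root; the exact normalizing constants used in the paper (and the precise form of the \(Z_{2,p}^{({ PB})}\) weighting, which is only described qualitatively in the text) are not reproduced here.

```python
import numpy as np

def cc_statistics(eig_sq):
    """Map the squared sample canonical correlations (eigenvalues of
    L11^{-1} L12 L22^{-1} L21) to test statistics.  Normalizing
    constants possibly used in the paper are omitted."""
    lam2 = np.sort(np.asarray(eig_sq, dtype=float))[::-1]
    return {
        'HL':  float(lam2.sum()),          # trace-type: reject when large
        'W':   float(np.prod(1 - lam2)),   # Wilks-type: reject when small
        'max': float(lam2[0]),             # Roy-type:   reject when large
    }
```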
Another quantity that has been considered for a test of \(H_0: {\varvec{{\Sigma _{12}}}}={\varvec{{0}}}\), for instance by Bartlett (1939), is
\(Z_{2,p}^{({ PB})}\) is similar to \(Z_{2,p}^{({ HL})}\), but weighs the correlation coefficients so that larger coefficients become more influential. The null hypothesis should be rejected for large values of \(Z_{2,p}^{({ PB})}\).
Finally, we can consider the statistic
$$\begin{aligned} Z_{2,p}^{(max)}=\hat{\lambda }_1^2, \end{aligned}$$
large values of which imply non-normality. \(Z_{2,p}^{(max)}\) is perhaps the most natural choice of test statistic, as \(\lambda _1=0\) implies that all canonical correlations are 0.
The statistics \(Z_{2,p}^{({ HL})}\), \(Z_{2,p}^{(W)}\), \(Z_{2,p}^{({ PB})}\) and \(Z_{2,p}^{(max)}\) are all related to well-known statistics from multivariate analysis of variance; they are analogs of the Hotelling–Lawley trace, Wilks’ \(\Lambda \), the Pillai–Bartlett trace and Roy’s greatest root, respectively. For \(p=1\) these statistics are all equivalent to the \(|Z_2'|\) test from Thulin (2010).
4.3 Theoretical results
Some fundamental properties of the new test statistics are presented in the following theorem. Its proof is given in the Appendix.
Theorem 3
Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables fulfilling the conditions of Theorem 1. Then, for \(n\ge 2p+p(p-1)/2\) and \(i,j,k=1,2,\ldots ,p\)
-
(i)
\(Z_{2,p}^{({ HL})}, Z_{2,p}^{(W)}, Z_{2,p}^{({ PB})}\) and \(Z_{2,p}^{(max)}\) are affine invariant, i.e. invariant under nonsingular linear transformations \({\varvec{{AX}}}+{\varvec{{b}}}\) where \({\varvec{{A}}}\) is a nonsingular \(p\times p\) matrix and \({\varvec{{b}}}\) is a \(p\)-vector,
-
(ii)
The population canonical correlation \(\lambda _1=\max _{{\varvec{{a}}},{\varvec{{b}}}}|\rho ({\varvec{{a\bar{X}}}},{\varvec{{bu}}})|=0\) if \(\mu _{ijk}= 0\) for all \(i,j,k\) and \(>0\) if \(\mu _{ijk}\ne 0\) for at least one combination of \(i,j,k\), and
-
(iii)
\(Z_{2,p}^{({ HL})}, Z_{2,p}^{(W)}, Z_{2,p}^{({ PB})}\) and \(Z_{2,p}^{(max)}\) converge almost surely to the corresponding functions of the population canonical correlations \(\lambda _1\ge \lambda _2\ge \ldots \ge \lambda _p\).
Since the statistics are affine invariant, their distributions are the same for all \(p\)-variate normal distributions for a given sample size \(n\). These null distributions are easily obtained using Monte Carlo simulation.
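The Monte Carlo step can be sketched as below. The helper names are ours, and Mardia's \(b_{1,p}\) is used merely as a stand-in for any affine-invariant statistic that is large under non-normality.

```python
import numpy as np

def mardia_b1p(X):
    """Sample version b_{1,p} of Mardia's skewness, used here as a
    stand-in for any affine-invariant test statistic."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                   # biased sample covariance
    D = Xc @ np.linalg.solve(S, Xc.T)   # D[i,j] = (x_i - xbar)' S^{-1} (x_j - xbar)
    return (D ** 3).sum() / n ** 2

def mc_critical_value(stat, n, p, alpha=0.05, reps=2000, seed=1):
    """Affine invariance means the null distribution depends only on n and p,
    so it suffices to simulate from N(0, I_p).  Returns the upper alpha
    quantile, for statistics that are large under non-normality."""
    rng = np.random.default_rng(seed)
    null = [stat(rng.standard_normal((n, p))) for _ in range(reps)]
    return float(np.quantile(null, 1 - alpha))
```

For statistics such as \(Z_{2,p}^{(W)}\), which are small under non-normality, the lower \(\alpha \) quantile would be used instead.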
Since \(\lambda _1\ge \lambda _j\) for \(j>1\), \(\lambda _1=0\) implies that all population canonical correlations are 0, as is the case for the normal distribution. By (ii), \(\lambda _1=0\) also holds for any distribution with \(\mu _{ijk}=0\) for all \(i,j,k\), so the tests should not be sensitive to distributions with that kind of symmetry. By (ii) and (iii), however, all four statistics are consistent against alternatives where \(\mu _{ijk}\ne 0\) for at least one combination of \(i,j,k\). In particular, they are sensitive to alternatives with skew marginal distributions.
5 Tests based on \({\varvec{{\bar{X}}}}\) and \({\varvec{{v}}}\)
5.1 Test statistics
The ideas used in Sect. 4 for \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{u}}}))\) can also be used for \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{v}}}))\) in an analog manner, yielding multivariate generalizations of (4). This leads to two new tests for normality, as described below.
Let \({\varvec{{P_{11}}}}, {\varvec{{P_{22}}}}, {\varvec{{P_{12}}}}\) and \({\varvec{{P_{21}}}}\) be the sample counterparts of \({\varvec{{\Psi _{11}}}}, {\varvec{{\Psi _{22}}}}, {\varvec{{\Psi _{12}}}}\) and \({\varvec{{\Psi _{21}}}}\), where the \(\mu _{i_1,\ldots ,i_s}\) are estimated by the sample moments, as in (11) above. Let \(\hat{\psi }_1^2\ge \ldots \ge \hat{\psi }_p^2\) be the eigenvalues of \({\varvec{{P_{11}}}}^{-1}{\varvec{{P_{12}}}}{\varvec{{P_{22}}}}^{-1}{\varvec{{P_{21}}}}\).
The test statistics for the new tests are
$$\begin{aligned} Z_{3,p}^{({ HL})}=\sum _{i=1}^p\hat{\psi }_i^2 \quad \text{ and }\quad Z_{3,p}^{(W)}=\prod _{i=1}^p(1-\hat{\psi }_i^2). \end{aligned}$$
We have also considered other statistics, but found them to have lower power than these two. Large values of \(Z_{3,p}^{({ HL})}\) and small values of \(Z_{3,p}^{(W)}\) imply non-normality. Both statistics are equivalent to \(|Z_3'|\) from Thulin (2010) for \(p=1\).
5.2 Theoretical results
The following theorem mimics Theorem 3 above. Its proof is given in the Appendix.
Theorem 4
Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables fulfilling the conditions of Theorem 2. Then, for \(n\ge 2p+p(p-1)+p(p-1)(p-2)/6\) and \(i,j,k,r,s,t=1,2,\ldots ,p\)
-
(i)
\(Z_{3,p}^{({ HL})}\) and \(Z_{3,p}^{(W)}\) are affine invariant, i.e. invariant under nonsingular linear transformations \({\varvec{{AX}}}+{\varvec{{b}}}\) where \({\varvec{{A}}}\) is a nonsingular \(p\times p\) matrix and \({\varvec{{b}}}\) is a \(p\)-vector,
-
(ii)
The population canonical correlation \(\psi _1=\max _{{\varvec{{a}}},{\varvec{{b}}}}|\rho ({\varvec{{a\bar{X}}}},{\varvec{{bv}}})|=0\) if \(\mu _{irst}-\mu _{ir}\mu _{st}-\mu _{is}\mu _{rt}-\mu _{it}\mu _{rs}=0\) for all \(i,r,s,t=1,\ldots ,p\) and \(>0\) otherwise, and
-
(iii)
\(Z_{3,p}^{({ HL})}\) and \(Z_{3,p}^{(W)}\) converge almost surely to the corresponding functions of the population canonical correlations \(\psi _1\ge \psi _2\ge \ldots \ge \psi _p\).
Using the affine invariance, the null distributions of the statistics can be obtained through Monte Carlo simulation.
By (ii) and (iii) both statistics are consistent against alternatives where \(\mu _{irst}-\mu _{ir}\mu _{st}-\mu _{is}\mu _{rt}-\mu _{it}\mu _{rs}\ne 0\) for at least one combination of \(i,r, s, t\).
6 Analysis of the Iris data set
In Table 1 we present the results for the new tests when applied to the famous Iris data set of Fisher (1936). The tests are applied to each of the three subsets of the Iris data: Setosa, Versicolor and Virginica. For each such subset \(n=50\) and \(p=4\). We also applied Mardia’s skewness test \(b_{1,p}\) (2), Mardia’s kurtosis test \(b_{2,p}\) (3) and the Mardia–Kent omnibus test \(T\) (Mardia and Kent 1991), in which the skewness and kurtosis measures are combined. To compute the critical values and \(p\)-values, we approximated the null distribution of each test statistic by 10,000 simulated samples from a normal distribution. The resulting critical values at the 5 % level are given in the table. Recall that for the \(\mathbf {Z_{2,p}^{(W)}}\) and \(\mathbf {Z_{3,p}^{(W)}}\) statistics, values of the statistic that are smaller than the critical value imply non-normality.
At the 5 % level, normality is rejected only when \(b_{2,p}\) is applied to the Setosa sample.
7 Simulation results
7.1 The simulation study
To evaluate the performance of the new \(Z_{2,p}\) and \(Z_{3,p}\) tests, a Monte Carlo study of their power was carried out. The tests were compared to \(b_{1,p}\), \(b_{2,p}\) (3) and the Mardia–Kent test \(T\). The tests were compared for \(n=20\) and \(n=50\) for \(p=2\) and \(p=3\). For some alternatives, more combinations of \(n\) and \(p\) were used. Since the results for \(p=2\) and \(p=3\) were quite similar, we only present the results for \(p=3\) below. The results for \(p=2\) can be found in Supplement S1.
Many power studies for multivariate tests for normality have used alternatives with independent marginal distributions. We believe that this can be misleading, as distributions with independent marginals are uncommon in practice and indeed of little interest in the multivariate setting, where the dependence structure of the marginals is often paramount. For this reason, we decided to focus mainly on alternatives with a more complex dependence structure in our study. One alternative with independent exponential marginals, which has been used in many previous power studies, is included for reference.
The alternatives used in the study are presented in Tables 2 and 3. Contour plots of the alternatives from Table 2 are given in Supplement S2. The asymmetric multivariate Laplace distribution mentioned in Table 3 is described in Kotz et al. (2000).
In order to see which alternatives the different tests could be sensitive to, the population values of the statistics were determined for all alternatives. For most distributions the values were computed numerically, to one decimal place for Mardia’s statistics and to two decimal places for the \(Z_{2,p}\) and \(Z_{3,p}\) tests. The population values are given in Table 3 in Supplement S1.
Using R, the nine tests were applied to 1,000,000 samples from each alternative and each combination of \(n\) and \(p\). The null distributions for all test statistics were estimated using 100,000 standard normal samples.
7.2 Results for symmetric alternatives
The results for alternatives with symmetric marginal distributions are presented in Table 4 in the “Appendix”. Mardia’s kurtosis test \(b_{2,p}\) had the best overall performance against symmetric alternatives with long-tailed marginal distributions, with the Mardia–Kent \(T\) test as runner-up. The \(Z_{3,p}\) tests had by far the best performance against symmetric alternatives with short-tailed marginal distributions, but performed poorly against heavy-tailed alternatives. They should therefore be regarded as being directed toward short-tailed alternatives.
\(b_{2,p}\) and the \(Z_{3,p}\) tests were somewhat unexpectedly outperformed by the \(b_{1,p}\) test and the \(Z_{2,p}\) tests for the \(Laplace(0,1)\) (type I) distribution. This was likely caused by the fact that a distribution with that particular dependence structure (described in Table 2), while having symmetric marginal distributions, is not symmetric in a multivariate sense, as can be seen from the contour plot in Supplement S2 or in Table 3 in Supplement S1.
Finally, we investigated the size of the tests by computing their power against two normal distributions. All tests attained the desired size \(\alpha =0.05\).
7.3 Results for asymmetric alternatives
The results for alternatives with asymmetric marginal distributions are presented in Figs. 1 and 2 and Table 5 in the Appendix.
Mardia’s skewness test \(b_{1,p}\) and the \(Z_{2,p}\) tests are all directed toward asymmetric alternatives, and outperformed the other tests. However, no directed test was uniformly more powerful than the other directed tests. For \(p=2\), the \(Z_{2,p}^{(max)}\) test had the best overall performance against asymmetric alternatives, while \(b_{1,p}\) and the \(Z_{2,p}^{(W)}\) and \(Z_{2,p}^{({ PB})}\) tests also displayed a good average performance. For \(p=3\) the performance of \(Z_{2,p}^{(max)}\) was somewhat worse, whereas \(b_{1,p}\), \(Z_{2,p}^{(W)}\) and \(Z_{2,p}^{({ PB})}\) still showed good performance.
How varying \(n\) and \(p\) affects the power of the tests is investigated in Figs. 1 and 2. In Fig. 1a, we see that against a distribution with \(Beta(1,2)\) marginal distributions, the \(Z_{3,p}\) tests have the best performance for small \(n\), whereas the \(Z_{2,p}\) tests are superior for larger \(n\). In Fig. 1, it is seen that against a distribution with \(LogN(0,1)\) marginal distributions, the \(Z_{2,p}\) tests have higher power than the \(b_{1,p}\) test for small \(n\), while the relation is reversed for larger \(n\).
In Fig. 2a, we see that against the \(AL(\mathbf {3},\mathbf {\Sigma _0})\) distribution, \(b_{2,p}\) has slightly higher power than the \(Z_{2,p}\) tests for small \(n\), whereas the \(Z_{2,p}\) tests have slightly higher power for larger \(n\). In Fig. 2b however, when \(n/p\) is fixed and \(p\) is increased, the difference in power between the tests remains more or less unchanged.
8 Discussion
Based on the simulation results, our recommendations are that the \(Z_{2,p}^{(max)}\) test should be used against asymmetric alternatives when \(p=2\). For higher \(p\), \(b_{1,p}\), \(Z_{2,p}^{(W)}\) or \(Z_{2,p}^{({ PB})}\) should be used instead. Mardia’s \(b_{2,p}\) test should be used against heavy-tailed symmetric alternatives. For short-tailed symmetric alternatives, one of the \(Z_{3,p}\) tests would be a better choice.
Most previous power studies for multivariate tests for normality have focused on alternatives with independent marginal distributions. Such distributions are likely to be rare in practice, and as is shown by the two distributions with \(Laplace(0,1)\) marginals used in our study, multivariate dependence structures can greatly affect the power of tests for normality.
To complicate matters further, some of the results in the tables highlight the fact that what holds true for one combination of \(p\) and \(n\) can be false for a different combination. For instance, when \(p=2\), \(Z_{2,p}^{(max)}\) had higher power than \(b_{1,p}\) for the \(AL(\mathbf {1},{\varvec{{\Sigma _{0}}}})\) and the multivariate \(\chi ^2_8\) alternatives, but when \(p=3\), \(Z_{2,p}^{(max)}\) had lower power than \(b_{1,p}\). This phenomenon merits further investigation, as it implies that power studies performed for low values of \(p\) can be misleading when choosing between tests to use for higher-dimensional data. Further examples of this phenomenon are given in Figs. 1 and 2.
In recent years, several authors have studied robust testing for normality, i.e. normality tests designed to be robust against outliers; see Stehlík et al. (2012) and Cerioli et al. (2013) for examples. Stehlík et al. (2014) proposed a robustified version of the univariate \(Z_{2,1}\) test. A robustified version of the multivariate \(Z_{2,p}\) test will appear in a future paper by the author.
Looking at the normal mixtures, which can be viewed as contaminated normal distributions, we see that \(Z_{2,p}^{(max)}\) and \(b_{1,p}\) were on a par for the mildly polluted mixtures (with a 9:1 mixing ratio) and that \(Z_{2,p}^{(max)}\) in general had higher power for the heavily polluted mixtures (with a 3:1 mixing ratio). This suggests the use of the \(Z_{2,p}^{(max)}\) statistic for a test for outliers, an idea that could perhaps be investigated further.
Implementations of the \(Z_{2,p}\) and \(Z_{3,p}\) tests in R are available from the author. Some critical values for the new tests are given in Table 6.
References
Bartlett MS (1939) A note on tests of significance in multivariate analysis. Math Proc Camb Philos Soc 35:180–185
Cerioli A, Farcomeni A, Riani M (2013) Robust distances for outlier-free goodness-of-fit testing. Comput Stat Data Anal 65:29–45
Doornik JA, Hansen H (2008) An omnibus test for univariate and multivariate normality. Oxf Bull Econ Stat 70:927–939
Dubkov AA, Malakhov AN (1976) Properties and interdependence of the cumulants of a random variable. Radiophys Quantum Electron 19:833–839
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188
Henderson HV, Searle SR (1979) Vec and vech operators for matrices, with some uses in Jacobians and multivariate statistics. Can J Stat 7:65–81
Henze N (2002) Invariant tests for multivariate normality: a critical review. Stat Pap 43:467–506
Kankainen A, Taskinen S, Oja H (2007) Tests of multinormality based on location vectors and scatter matrices. Stat Methods Appl 16:357–359
Kaplan EL (1952) Tensor notation and the sampling cumulants of k-statistics. Biometrika 39:319–323
Kollo T (2002) Multivariate skewness and kurtosis measures with an application in ICA. J Multivar Anal 99:2328–2338
Kollo T, von Rosen D (2005) Advanced multivariate statistics with matrices. Springer, Berlin. ISBN 978-1-4020-3418-3
Kotz S, Kozubowski TJ, Podgórski K (2000) An asymmetric multivariate Laplace distribution, Technical Report No. 367, Department of Statistics and Applied Probability, University of California at Santa Barbara
Kshirsagar AM (1972) Multivariate analysis. Marcel Dekker, ISBN 0-8247-1386-9
Lin C-C, Mudholkar GS (1980) A simple test for normality against asymmetric alternatives. Biometrika 67:455–461
Mardia KV (1970) Measures of multivariate skewness and kurtosis with applications. Biometrika 57:519–530
Mardia KV (1974) Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya Indian J Stat 36:115–128
Mardia KV, Kent JT (1991) Rao score tests for goodness of fit and independence. Biometrika 78:355–363
Mardia KV, Kent JT, Bibby JM (1979) Multivariate analysis. Academic Press, ISBN 0-12-471250-9
McCullagh P (1987) Tensor methods in statistics. Chapman and Hall, ISBN 0-412-27480-9
Mecklin CJ, Mundfrom DJ (2004) An appraisal and bibliography of tests for multivariate normality. Int Stat Rev 72:123–128
Mecklin CJ, Mundfrom DJ (2005) A Monte Carlo comparison of the type I and type II error rates of tests of multivariate normality. J Stat Comput Simul 75:93–107
Mudholkar GS, Marchetti CE, Lin CT (2002) Independence characterizations and testing normality against restricted skewness–kurtosis alternatives. J Stat Plan Inference 104:485–501
Stehlík M, Fabián Z, Střelec L (2012) Small sample robust testing for normality against Pareto tails. Commun Stat Simul Comput 41:1167–1194
Stehlík M, Střelec L, Thulin M (2014) On robust testing for normality in chemometrics. Chemom Intell Lab Syst 130:98–109
Thulin M (2010) On two simple tests for normality with high power. Pre-print, arXiv:1008.5319
Acknowledgments
The author wishes to thank the editor and two anonymous referees for comments that helped improve the paper, and Silvelyn Zwanzig for several helpful suggestions.
Appendix: proofs and tables
For the proofs of Theorems 3 and 4 we need some basic properties of the Kronecker product \(\otimes \) and the \(\mathrm{vec}\) and \(\mathrm{vech}\) operators from Henderson and Searle (1979). See also Kollo and von Rosen (2005) and Kollo (2002) for more on these tools from matrix algebra.
For a \(p\times q\) matrix \({\varvec{{A}}}=\{a_{ij}\}\) and an \(r\times s\) matrix \({\varvec{{B}}}\), the Kronecker product \({\varvec{{A}}}\otimes {\varvec{{B}}}\) is the \(pr\times qs\) matrix \(\{a_{ij}{\varvec{{B}}}\}\), \(i=1,\ldots ,p\), \(j=1,\ldots ,q\). The \(\mathrm{vec}\) operator stacks the columns of a matrix beneath each other, forming a single vector. If the columns of the \(p\times q\) matrix \({\varvec{{A}}}\) are denoted \({\varvec{{a_1}}},\ldots ,{\varvec{{a_q}}}\) then \(\mathrm{vec}({\varvec{{A}}})=({\varvec{{a_1'}}},\ldots ,{\varvec{{a_q'}}})'\) is a vector of length \(pq\).
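For readers who wish to experiment numerically, these definitions are easy to reproduce in code; the sketch below (with small hand-picked matrices that are not taken from the paper) illustrates the Kronecker product and the \(\mathrm{vec}\) operator.

```python
import numpy as np

# Small hand-picked matrices (not from the paper), used only to
# illustrate the definitions above.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# Kronecker product: the (pr) x (qs) block matrix {a_ij * B}.
K = np.kron(A, B)

def vec(M):
    # Stack the columns of M beneath each other into one long vector.
    return M.reshape(-1, order="F")

v = vec(A)  # (a_11, a_21, a_12, a_22)' = (1, 3, 2, 4)'
```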
We will use that

$$({\varvec{{A}}}\otimes {\varvec{{B}}})({\varvec{{C}}}\otimes {\varvec{{D}}})={\varvec{{AC}}}\otimes {\varvec{{BD}}}$$

and that if \({\varvec{{A}}}\) is a \(p\times p\) matrix and \({\varvec{{B}}}\) a \(q\times q\) matrix,

$$\det ({\varvec{{A}}}\otimes {\varvec{{B}}})=\det ({\varvec{{A}}})^{q}\det ({\varvec{{B}}})^{p}.$$
The \(\mathrm{vech}\) operator works as the \(\mathrm{vec}\) operator, except that it only contains each distinct element of the matrix once. For a symmetric matrix \({\varvec{{A}}}\), \(\mathrm{vech}({\varvec{{A}}})\) thus contains only the diagonal and the elements above the diagonal, whereas \(\mathrm{vec}({\varvec{{A}}})\) contains the diagonal elements and the off-diagonal elements twice.
We have the following relationship between the \(\mathrm{vec}\) operator and the Kronecker product:

$$\mathrm{vec}({\varvec{{ABC}}})=({\varvec{{C'}}}\otimes {\varvec{{A}}})\mathrm{vec}({\varvec{{B}}}).$$
Furthermore, for a given symmetric \(p\times p\) matrix \({\varvec{{A}}}\) there exists a \(p(p+1)/2\times p^2\) matrix \({\varvec{{H}}}\) and a \(p^2\times p(p+1)/2\) matrix \({\varvec{{G}}}\) such that

$$\mathrm{vech}({\varvec{{A}}})={\varvec{{H}}}\mathrm{vec}({\varvec{{A}}})\quad \text{ and }\quad \mathrm{vec}({\varvec{{A}}})={\varvec{{G}}}\mathrm{vech}({\varvec{{A}}}).$$
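The matrices \({\varvec{{H}}}\) and \({\varvec{{G}}}\) can be verified numerically; the index-based construction below is my own sketch (Henderson and Searle 1979 give closed forms).

```python
import numpy as np

# Index-based construction of H and G (my own sketch; Henderson and
# Searle 1979 give closed forms). vech keeps one copy of each distinct
# element of a symmetric matrix; G duplicates them back into vec.
p = 3
pairs = [(i, j) for j in range(p) for i in range(j, p)]  # (i, j) with i >= j
q = len(pairs)                                           # q = p(p+1)/2

H = np.zeros((q, p * p))
G = np.zeros((p * p, q))
for k, (i, j) in enumerate(pairs):
    H[k, j * p + i] = 1.0            # vec index of entry (i, j) is j*p + i
for i in range(p):
    for j in range(p):
        G[j * p + i, pairs.index((max(i, j), min(i, j)))] = 1.0

rng = np.random.default_rng(0)
B = rng.standard_normal((p, p))
A = B + B.T                          # a symmetric test matrix
vecA = A.reshape(-1, order="F")
vechA = np.array([A[i, j] for (i, j) in pairs])
assert np.allclose(H @ vecA, vechA)  # vech(A) = H vec(A)
assert np.allclose(G @ vechA, vecA)  # vec(A) = G vech(A)
```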
As a preparation for the proof of Theorem 3, we prove the following auxiliary lemma.
Lemma 1
Assume that \({\varvec{{X}}},{\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables fulfilling the conditions of Theorem 1. Let \(S_{ij}=(n-1)^{-1}\sum _{k=1}^n(X_{k,i}-\bar{X}_i)(X_{k,j}-\bar{X}_j)\) be the elements of the sample covariance matrix \({\varvec{{S}}}\). Then

$${\varvec{{u_X}}}=\mathrm{vech}({\varvec{{S}}})$$

is a vector with \(q=p(p+1)/2\) distinct elements. Denote its covariance matrix \(\mathrm{Cov}({\varvec{{u_X}}})={\varvec{{\Lambda _{22}}}}\).
Let \({\varvec{{A}}}\) be a nonsingular \(p\times p\) matrix and let \({\varvec{{b}}}\) be a \(p\)-dimensional vector. Then there exists a nonsingular \(q\times q\) matrix \({\varvec{{D}}}\) such that

(i) the sample variances and covariances of \({\varvec{{Y}}}={\varvec{{AX}}}+{\varvec{{b}}}\) are given by \({\varvec{{u_Y}}}={\varvec{{Du_X}}}\),

(ii) \(\mathrm{Cov}({\varvec{{u_Y}}})={\varvec{{D\Lambda _{22}D'}}}\) and

(iii) \(\det ({\varvec{{D}}})=\det ({\varvec{{A}}})^{p+1}\).
Proof
The transformed sample \({\varvec{{AX}}}+{\varvec{{b}}}\) has sample covariance matrix \({\varvec{{ASA'}}}\), so we wish to study \(\mathrm{vech}({\varvec{{ASA'}}})\). We have

$$\mathrm{vec}({\varvec{{ASA'}}})=({\varvec{{A}}}\otimes {\varvec{{A}}})\mathrm{vec}({\varvec{{S}}}).$$

Moreover, since \({\varvec{{S}}}\) is symmetric there exist matrices \({\varvec{{G}}}\) and \({\varvec{{H}}}\) such that

$$\mathrm{vec}({\varvec{{S}}})={\varvec{{G}}}\mathrm{vech}({\varvec{{S}}})\quad \text{ and }\quad \mathrm{vech}({\varvec{{ASA'}}})={\varvec{{H}}}\mathrm{vec}({\varvec{{ASA'}}}).$$

Thus

$$\mathrm{vech}({\varvec{{ASA'}}})={\varvec{{H}}}({\varvec{{A}}}\otimes {\varvec{{A}}}){\varvec{{G}}}\mathrm{vech}({\varvec{{S}}})=:{\varvec{{D}}}\mathrm{vech}({\varvec{{S}}}),$$

which establishes the existence of \({\varvec{{D}}}\). From Section 4.2 of Henderson and Searle (1979) we have

$$\det ({\varvec{{D}}})=\det ({\varvec{{H}}}({\varvec{{A}}}\otimes {\varvec{{A}}}){\varvec{{G}}})=\det ({\varvec{{A}}})^{p+1},$$
which is nonzero, since \({\varvec{{A}}}\) is nonsingular. \({\varvec{{D}}}\) is hence also nonsingular. In conclusion, we have established the existence and nonsingularity of \({\varvec{{D}}}\) as well as (i) and (iii). Finally, (ii) follows immediately from (i). \(\square \)
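A numerical sanity check of the lemma is straightforward; in the sketch below, \({\varvec{{D}}}={\varvec{{H}}}({\varvec{{A}}}\otimes {\varvec{{A}}}){\varvec{{G}}}\), and the index-based constructions of \({\varvec{{H}}}\) and \({\varvec{{G}}}\) are my own.

```python
import numpy as np

# Numerical check of Lemma 1 with D = H (A kron A) G; the index-based
# constructions of H and G are my own sketch.
p = 3
pairs = [(i, j) for j in range(p) for i in range(j, p)]
q = len(pairs)
H = np.zeros((q, p * p))
G = np.zeros((p * p, q))
for k, (i, j) in enumerate(pairs):
    H[k, j * p + i] = 1.0
for i in range(p):
    for j in range(p):
        G[j * p + i, pairs.index((max(i, j), min(i, j)))] = 1.0

rng = np.random.default_rng(1)
X = rng.standard_normal((50, p))       # a sample of size n = 50
S = np.cov(X, rowvar=False)            # sample covariance matrix
A = rng.standard_normal((p, p))        # a (generically) nonsingular A

D = H @ np.kron(A, A) @ G

def vech(M):
    return np.array([M[i, j] for (i, j) in pairs])

assert np.allclose(vech(A @ S @ A.T), D @ vech(S))       # part (i)
assert np.isclose(np.linalg.det(D),
                  np.linalg.det(A) ** (p + 1))           # part (iii)
```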
We now have the tools necessary to tackle Theorem 3.
Proof of Theorem 3
(i) From Theorem 10.2.4 in Mardia et al. (1979) we have that the canonical correlations between the random vectors \({\varvec{{Y}}}\) and \({\varvec{{Z}}}\) are invariant under the nonsingular linear transformations \({\varvec{{AY}}}+{\varvec{{b}}}\) and \({\varvec{{CZ}}}+{\varvec{{d}}}\). Clearly all five statistics are invariant under changes in location, since \({\varvec{{S_{11}}}}\), \({\varvec{{S_{22}}}}\), \({\varvec{{S_{12}}}}\) and \({\varvec{{S_{21}}}}\) all share that invariance property. It therefore suffices to show that the nonsingular linear transformation \({\varvec{{AX}}}\) induces nonsingular linear transformations \({\varvec{{C\bar{X}}}}\) and \({\varvec{{Du}}}\). \({\varvec{{C}}}={\varvec{{A}}}\) is immediate and the existence of \({\varvec{{D}}}\) is given by Lemma 1.

(ii) By part (ii) of Theorem 1, \(\mu _{ijk}=0\) for all \(i,j,k\) implies that \({\varvec{{\Lambda }}}_{12}={\varvec{{0}}}\). But then \({\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}}={\varvec{{0}}}\) and all canonical correlations are 0. If \(\mu _{ijk}\ne 0\) then \(\rho (\bar{X}_i,S_{jk})\ne 0\), so the linear combinations \({\varvec{{a'\bar{X}}}}=\bar{X}_i\) and \({\varvec{{b'u}}}=S_{jk}\) have nonzero correlation. \(\lambda _1\) must therefore be greater than 0.

(iii) This follows from the fact that the statistics are continuous functions of sample moments that converge almost surely. \(\square \)
The proofs of parts (ii) and (iii) of Theorem 4 are analogous to the previous proof. The proof of part (i) is however slightly different, as we do not explicitly give a matrix that yields a nonsingular linear transformation of \({\varvec{{v_X}}}\).
Proof of Theorem 4
(i) Let the third order central moment of a multivariate random variable \({\varvec{{Z}}}\) be

$$\bar{m}_3({\varvec{{Z}}})={ E }\left( ({\varvec{{Z}}}-{ E }{\varvec{{Z}}})\left( ({\varvec{{Z}}}-{ E }{\varvec{{Z}}})\otimes ({\varvec{{Z}}}-{ E }{\varvec{{Z}}})\right) '\right) .$$
Given a sample \({\varvec{{X}}}_1,\ldots ,{\varvec{{X}}}_n\), let \(S_{ijk}=\frac{n}{(n-1)(n-2)}\sum _{r=1}^n(X_{r,i}-\bar{X}_i)(X_{r,j}-\bar{X}_j)(X_{r,k}-\bar{X}_k)\). When the distribution of \({\varvec{{Z}}}\) is the empirical distribution of said sample,
Similarly \(\mathrm{vec}\left( \bar{m}_3({\varvec{{Z}}})\right) \) stacks the elements of \(\bar{m}_3({\varvec{{Z}}})\) in a vector that simply is \(\mathrm{vech}\left( \bar{m}_3({\varvec{{Z}}})\right) \) with a few repetitions:
Thus, for each linear combination \({\varvec{{a'}}}{\varvec{{w_X}}}\) there exists a \({\varvec{{b}}}\) so that \({\varvec{{b'}}}{\varvec{{v_X}}}={\varvec{{a'}}}{\varvec{{w_X}}}\) and therefore, by the definition of canonical correlations, the (sample) canonical correlations between \({\varvec{{\bar{X}}}}\) and \({\varvec{{v_X}}}\) are the same as those between \({\varvec{{\bar{X}}}}\) and \({\varvec{{w_X}}}\).
Writing \({\varvec{{Y}}}={\varvec{{Z}}}-{ E }{\varvec{{Z}}}\), we have \(\bar{m}_3({\varvec{{Z}}})={ E }\left( {\varvec{{Y}}}({\varvec{{Y}}}\otimes {\varvec{{Y}}})'\right) \) and

$$\bar{m}_3({\varvec{{AZ}}})={ E }\left( {\varvec{{AY}}}({\varvec{{AY}}}\otimes {\varvec{{AY}}})'\right) ={\varvec{{A}}}\,{ E }\left( {\varvec{{Y}}}({\varvec{{Y}}}\otimes {\varvec{{Y}}})'\right) ({\varvec{{A}}}\otimes {\varvec{{A}}})'={\varvec{{A}}}\bar{m}_3({\varvec{{Z}}})({\varvec{{A}}}\otimes {\varvec{{A}}})'.$$
Hence

$$\mathrm{vec}\left( \bar{m}_3({\varvec{{AZ}}})\right) =\left( ({\varvec{{A}}}\otimes {\varvec{{A}}})\otimes {\varvec{{A}}}\right) \mathrm{vec}\left( \bar{m}_3({\varvec{{Z}}})\right) =({\varvec{{A}}}\otimes {\varvec{{A}}}\otimes {\varvec{{A}}})\mathrm{vec}\left( \bar{m}_3({\varvec{{Z}}})\right) .$$
Now, \(\det ({\varvec{{A}}}\otimes {\varvec{{A}}}\otimes {\varvec{{A}}})=\det ({\varvec{{A}}}\otimes {\varvec{{A}}})^p\det ({\varvec{{A}}})^{p^2}=\det ({\varvec{{A}}})^{3p^2}\ne 0\), so \({\varvec{{E}}}:=({\varvec{{A}}}\otimes {\varvec{{A}}}\otimes {\varvec{{A}}})\) is a nonsingular matrix such that \(\mathrm{vec}\left( \bar{m}_3({\varvec{{AZ}}})\right) ={\varvec{{E}}}\,\mathrm{vec}\left( \bar{m}_3({\varvec{{Z}}})\right) \). Since canonical correlations are invariant under nonsingular linear transformations of the two sets of variables, this means that the canonical correlations between \({\varvec{{\bar{X}}}}\) and \({\varvec{{w_X}}}\) remain unchanged under the transformation \({\varvec{{AX}}}+{\varvec{{b}}}\). Thus the canonical correlations between \({\varvec{{\bar{X}}}}\) and \({\varvec{{v_X}}}\) must also necessarily remain unchanged. This proves the affine invariance of the statistics. \(\square \)
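The two matrix facts used in this proof can likewise be checked numerically; the sketch below uses a random \(p\times p^2\) matrix as an illustrative stand-in for \(\bar{m}_3({\varvec{{Z}}})\).

```python
import numpy as np

# Illustrative check (random stand-in M for the p x p^2 matrix m3(Z)):
# vec(A M (A kron A)') = (A kron A kron A) vec(M), and
# det(A kron A kron A) = det(A)^(3 p^2).
p = 2
rng = np.random.default_rng(2)
A = rng.standard_normal((p, p))
M = rng.standard_normal((p, p * p))

E = np.kron(np.kron(A, A), A)          # E = A kron A kron A, p^3 x p^3

def vec(Z):
    # Column-major stacking, matching the vec operator above.
    return Z.reshape(-1, order="F")

lhs = vec(A @ M @ np.kron(A, A).T)
rhs = E @ vec(M)
assert np.allclose(lhs, rhs)
assert np.isclose(np.linalg.det(E), np.linalg.det(A) ** (3 * p * p))
```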
Thulin, M. Tests for multivariate normality based on canonical correlations. Stat Methods Appl 23, 189–208 (2014). https://doi.org/10.1007/s10260-013-0252-5