1 Introduction

Many classical multivariate statistical methods are based on the assumption that the data comes from a multivariate normal distribution. Consequently, the use of such methods should be followed by an investigation of the assumption of normality. A number of tests for multivariate normality can be found in the literature, but the field has not been investigated to the same extent as have tests for univariate normality.

Let \(\gamma ={ E }(X-\mu )^3/\sigma ^ 3\) denote the skewness of a univariate random variable \(X\) and \(\kappa ={ E }(X-\mu )^ 4/\sigma ^ 4-3\) denote its (excess) kurtosis. Both these quantities are 0 for the normal distribution but nonzero for many other distributions, and some common tests for univariate normality are therefore based on \(\gamma \) and \(\kappa \).
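To make the definitions concrete, the sample versions of \(\gamma \) and \(\kappa \) are obtained by plugging central sample moments into the expressions above. A minimal sketch (the function names are ours, and the biased moment estimators with denominator \(n\) are one of several possible choices):

```python
import numpy as np

def skewness(x):
    # gamma = E(X - mu)^3 / sigma^3, estimated with central sample moments
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s2 = np.mean((x - m) ** 2)
    return np.mean((x - m) ** 3) / s2 ** 1.5

def excess_kurtosis(x):
    # kappa = E(X - mu)^4 / sigma^4 - 3; zero for the normal distribution
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s2 = np.mean((x - m) ** 2)
    return np.mean((x - m) ** 4) / s2 ** 2 - 3.0

rng = np.random.default_rng(0)
z = rng.normal(size=100_000)
print(skewness(z), excess_kurtosis(z))  # both close to 0 for normal data
```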

Various multivariate analogues of these skewness and kurtosis measures have been proposed, perhaps most notably by Mardia (1970). Such measures have been used in tests for multivariate normality over the last few decades. Some of these tests, in particular those that use Mardia’s skewness and kurtosis measures as test statistics, have proved to have high power in many simulation studies (e.g. Mecklin and Mundfrom 2004, 2005), and new tests for normality based on multivariate skewness and kurtosis continue to be published today (Doornik and Hansen 2008; Kankainen et al. 2007).

In many inferential situations, some types of departures from normality are a more serious concern than are others. For instance, MANOVA is known to be sensitive to deviations from normality in the form of asymmetry, but to be relatively robust against deviations in the form of heavy tails. Using skewness and kurtosis allows us to construct tests that are directed toward some particular class of alternatives: skewness is used to detect asymmetric alternatives whereas kurtosis is used to detect alternatives with either short or long tails. This typically results in tests that, in comparison to omnibus tests that are directed to all alternatives, have higher power against the class of alternatives that they are directed to.

While directed toward certain alternatives, such tests may still be prone to rejecting normality for alternatives from other classes. The sample skewness and sample kurtosis are correlated, which can for instance cause a skewness-based test to reject normality for a symmetric distribution with heavy tails. Henze (2002) and others have argued that this is a reason to avoid directed tests for normality. However, directed tests will in general have comparatively low power against alternatives that they are not directed toward, lowering the risk of rejecting normality because of an unimportant deviation from normality. It is arguably better to have a test that has high power against interesting alternatives and lower power against uninteresting alternatives than a test that has moderate power against all alternatives.

In this paper six new directed tests for normality, all related to multivariate skewness or kurtosis, are proposed. Their common basis is independence characterizations of sample moments of the multivariate normal distribution.

In Sect. 2 we reexamine Mardia’s measure of multivariate skewness, which leads to two new classes of tests for multivariate normality. In Sect. 3 we state explicit expressions for covariances between multivariate sample moments in terms of moments of \({\varvec{{X}}}=(X_1,\ldots ,X_p)'\). This will allow us to estimate the moments involved and to test whether these sample moments are correlated.

In Sect. 4 we study the first class of new tests for normality, all of which are related to multivariate skewness. These can be viewed as multivariate generalizations of the univariate \(Z_2'\) test (Thulin 2010), which in turn is a modified version of the Lin and Mudholkar (1980) test. In Sect. 5 we study the second class of tests, related to multivariate kurtosis. These, in turn, are generalizations of the Thulin (2010) \(Z_3'\) modification of a test proposed by Mudholkar et al. (2002). The tests are applied to the Iris data in Sect. 6. The results of a simulation study comparing the new tests with tests based on Mardia’s skewness and kurtosis measures are presented in Sect. 7, which is followed by a discussion in Sect. 8. The text concludes with an appendix containing proofs and tables. Additional tables and figures are included in two online supplements.

2 Mardia’s multivariate skewness and kurtosis measures revisited

2.1 Multivariate skewness

A well-known characterization of the multivariate normal distributions is that the i.i.d. \(p\)-variate variables \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are normal if and only if the sample mean vector \({\varvec{{\bar{X}}}}=(\bar{X}_1,\bar{X}_2,\ldots ,\bar{X}_p)'\) and the sample covariance matrix \({\varvec{{S}}}\) are independent. Our aim is to test this independence in order to assess the normality of a population. As testing independence is difficult, we will resort to testing correlations instead.

Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables with nonsingular covariance matrix \({\varvec{{\Sigma }}}\). Let \({\varvec{{\bar{X}}}}=(\bar{X}_1,\bar{X}_2,\ldots ,\bar{X}_p)'\) be the sample mean vector and let

$$\begin{aligned} {\varvec{{S}}} = \left[ \begin{array}{l@{\quad }l@{\quad }l@{\quad }l} S_{11} &{} S_{12} &{} \cdots &{} S_{1p} \\ S_{12} &{} S_{22} &{} \cdots &{} S_{2p} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ S_{1p} &{} S_{2p} &{} \cdots &{} S_{pp} \end{array} \right] \end{aligned}$$

be the sample covariance matrix with \(S_{ij}=(n-1)^{-1}\sum _{k=1}^n(X_{k,i}-\bar{X}_i)(X_{k,j}-\bar{X}_j)\). Define

$$\begin{aligned} {\varvec{{u}}}=\mathrm{vech}({\varvec{{S}}})=(S_{11},S_{12},\ldots ,S_{1p},S_{22},S_{23},\ldots ,S_{2p},S_{33},\ldots ,S_{p-1,p},S_{pp})' \end{aligned}$$

so that \({\varvec{{u}}}\) is a vector containing the \(q=p(p+1)/2\) distinct elements of \({\varvec{{S}}}\). Now, consider the covariance matrix of the vector \(({\varvec{{\bar{X}'}}},{\varvec{{u'}}})'\), in the following denoted \(({\varvec{{\bar{X}}}},{\varvec{{u}}})\):

$$\begin{aligned} \mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{u}}})) = \left[ \begin{array}{r@{\quad }r} {\varvec{{{\varvec{{\Lambda }}}_{11}}}} &{} {\varvec{{{\varvec{{\Lambda }}}_{12}}}} \\ {\varvec{{{\varvec{{\Lambda }}}_{21}}}} &{} {\varvec{{{\varvec{{\Lambda }}}_{22}}}}\end{array} \right] \end{aligned}$$
(1)

where \({\varvec{{\Lambda _{11}}}}=\mathrm{Cov}({\varvec{{\bar{X}}}})\), \({\varvec{{\Lambda _{22}}}}=\mathrm{Cov}({\varvec{{u}}})\), \({\varvec{{\Lambda _{21}}}}={\varvec{{{\varvec{{\Lambda }}}_{12}^{'}}}}\) and \({\varvec{{{\varvec{{\Lambda }}}_{12}}}}\) contains covariances of the type \(\mathrm{Cov}(\bar{X}_i, S_{jk})\), \(i,j,k=1,\ldots ,p\). If \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\) are uncorrelated, then \({\varvec{{\Lambda _{12}}}}={\varvec{{0}}}\).
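The vector \({\varvec{{u}}}\) is simply the half-vectorization of \({\varvec{{S}}}\), stacking the rows of the upper triangle in the order given above. A minimal sketch of its construction (the helper name `vech` is ours):

```python
import numpy as np

def vech(S):
    # stack the q = p(p+1)/2 distinct elements of a symmetric matrix in
    # the order (S_11, S_12, ..., S_1p, S_22, S_23, ..., S_pp)
    p = S.shape[0]
    return np.concatenate([S[i, i:] for i in range(p)])

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
S = np.cov(X, rowvar=False)  # sample covariance with (n-1)-denominator
u = vech(S)
print(u.shape)  # (6,) since q = 3 * 4 / 2 = 6
```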

Mardia (1970, 1974) noted that for univariate random variables, asymptotically \(\text{ cor }(\bar{X}, S^2)\approx \frac{1}{\sqrt{2}}\gamma \) if \(\kappa \) is assumed to be negligible. Based on this, he used \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{u}}}))\) to construct a multivariate skewness measure. Studying the canonical correlations (see e.g. Mardia et al. 1979) between \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\) he proposed the measure

$$\begin{aligned} \beta _{1,p}=2\sum _{i=1}^ p\lambda _i^2 \end{aligned}$$

where \(\lambda _1,\ldots ,\lambda _p\) are the canonical correlations. This expression reduces to \(2\text{ cor }(\bar{X}, S^2)^ 2\approx \gamma ^2\) for univariate random variables.

From the theory of canonical correlations we have that \(\lambda _1^ 2,\ldots ,\lambda _p^ 2\) are the eigenvalues of \( {\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}} \) and thus

$$\begin{aligned} \beta _{1,p}=2tr({\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}}). \end{aligned}$$

Taking these moments to order \(n^{-1}\), Mardia showed that

$$\begin{aligned} \beta _{1,p}\approx { E }\left( ({\varvec{{X}}}-{\varvec{{\mu }}})'{\varvec{{\Sigma }}}^{-1}({\varvec{{Y}}}-{\varvec{{\mu }}})\right) ^{3} \end{aligned}$$

where \({\varvec{{X}}}\) and \({\varvec{{Y}}}\) are independent and identical random vectors. The sample counterpart of the above expression,

$$\begin{aligned} b_{1,p}=\frac{1}{n^2}\sum _{i,j=1}^n\left( ({\varvec{{X_i}}}-{\varvec{{\bar{X}}}})'{\varvec{{S^{-1}}}}({\varvec{{X_j}}}-{\varvec{{\bar{X}}}})\right) ^3, \end{aligned}$$
(2)

is commonly used as a measure of multivariate skewness and as a test statistic for multivariate normality.
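For reference, (2) is straightforward to compute directly from a data matrix, since the double sum collects the cubes of all pairwise standardized inner products. A sketch (the function name is ours; numerical stability of the plain matrix inverse is not addressed):

```python
import numpy as np

def mardia_skewness(X):
    # b_{1,p} = n^{-2} * sum_{i,j} ((X_i - Xbar)' S^{-1} (X_j - Xbar))^3
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(X, rowvar=False))
    D = Xc @ Sinv @ Xc.T  # n x n matrix of standardized inner products
    return (D ** 3).sum() / n ** 2

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
print(mardia_skewness(X))  # nonnegative, and small for near-normal data
```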

In Section 2.8 of McCullagh (1987) Mardia’s approximation of \(\beta _{1,p}\) is shown to be a natural generalization of \(\gamma ^2\). It is however not necessarily a good approximation of the canonical correlations between \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\). An important assumption underlying Mardia’s skewness measure is that the fourth central moments of the distribution are negligible. Since, for univariate variables, \(\gamma ^2-2\le \kappa \) (Dubkov and Malakhov 1976), this seems like a rather strong condition. For univariate random variables, Thulin (2010) noted that

$$\begin{aligned} \rho _2=\text{ cor }(\bar{X}, S^2)=\frac{\gamma }{\sqrt{\kappa +3-\frac{n-3}{n-1}}} \end{aligned}$$

and used \(\hat{\rho }_2=Z_2'\), the sample moment version of this quantity, as the test statistic of a test for normality that is a modified version of the test of Lin and Mudholkar (1980). In Thulin’s (2010) simulation power study, \(Z_2'\) was more powerful than \(\hat{\gamma }\) against most of the alternatives under study. Consequently, for \(p=1\) it is better to use the explicit expression for \(\text{ cor }(\bar{X}, S^2)\) than the approximation \(\text{ cor }(\bar{X}, S^2)\approx \frac{1}{\sqrt{2}}\gamma \). It is therefore of interest to use Mardia’s approach without any approximations, in the hope that this will yield a more powerful test for normality. In Sect. 3 we give explicit expressions for \(\mathrm{Cov}(\bar{X}_i, S_{jk})\) and \(\mathrm{Cov}(S_{ij}, S_{kl})\), allowing us to study \({\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}}\) without approximations and to construct new test statistics.
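To make the univariate case concrete, a plug-in version of \(\rho _2\) can be sketched as follows. Whether this matches Thulin’s (2010) exact definition of \(Z_2'\) in every detail (e.g. the choice of moment estimators) is an assumption on our part:

```python
import numpy as np

def z2_prime(x):
    # plug-in version of rho_2 = gamma / sqrt(kappa + 3 - (n-3)/(n-1)),
    # using biased central sample moments for gamma and kappa
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = x.mean()
    s2 = np.mean((x - m) ** 2)
    g = np.mean((x - m) ** 3) / s2 ** 1.5      # sample skewness
    k = np.mean((x - m) ** 4) / s2 ** 2 - 3.0  # sample excess kurtosis
    return g / np.sqrt(k + 3.0 - (n - 3) / (n - 1))

rng = np.random.default_rng(4)
print(abs(z2_prime(rng.normal(size=1000))))       # small for normal data
print(abs(z2_prime(rng.exponential(size=1000))))  # larger for skewed data
```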

2.2 Multivariate kurtosis

Mardia (1970, 1974) proposed the multivariate kurtosis measure

$$\begin{aligned} \beta _{2,p}={ E }\left( ({\varvec{{X}}}-{\varvec{{\mu }}})'{\varvec{{\Sigma }}}^{-1}({\varvec{{X}}}-{\varvec{{\mu }}})\right) ^2 \end{aligned}$$

with sample counterpart

$$\begin{aligned} b_{2,p}=\frac{1}{n}\sum _{i=1}^n\left( ({\varvec{{X_i}}}-{\varvec{{\bar{X}}}})'{\varvec{{S^{-1}}}}({\varvec{{X_i}}}-{\varvec{{\bar{X}}}})\right) ^2. \end{aligned}$$
(3)

In the univariate setting

$$\begin{aligned} \rho _3&= \text{ cor }\left( \bar{X},\frac{n}{(n-1)(n-2)}\sum _{i=1}^n(X_i-\bar{X})^3\right) \nonumber \\&= \frac{\kappa }{\sqrt{\lambda +9\frac{n}{n-1}(\kappa +\gamma ^2)+\frac{6n^2}{(n-1)(n-2)}}}, \end{aligned}$$
(4)

where \(\lambda =\frac{\mu _6}{\sigma ^6}-15\kappa -10\gamma ^2-15\) is the sixth standardized cumulant (Thulin 2010). In a simulation power study, Thulin (2010) found the test for normality based on \(\hat{\rho }_3=Z_3'\), the sample counterpart of (4), to have a better overall performance than the popular \(\hat{\kappa }=b_2=b_{2,1}\) test. It is therefore of interest to find a multivariate generalization of \(Z_3'\), in the hope that it will yield a test with higher power than \(b_{2,p}\).
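Mardia’s sample kurtosis (3) is similarly easy to compute, being the mean of the squared Mahalanobis distances of the observations from the sample mean. A sketch with a hypothetical function name:

```python
import numpy as np

def mardia_kurtosis(X):
    # b_{2,p} = n^{-1} * sum_i ((X_i - Xbar)' S^{-1} (X_i - Xbar))^2
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.einsum('ij,jk,ik->i', Xc, Sinv, Xc)  # squared Mahalanobis distances
    return np.mean(d ** 2)

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
print(mardia_kurtosis(X))  # close to p(p+2) = 15 under normality
```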

Similarly to what was done above for the covariance, let

$$\begin{aligned} S_{ijk}=\frac{n}{(n-1)(n-2)}\sum _{r=1}^n(X_{r,i}-\bar{X}_i)(X_{r,j}-\bar{X}_j)(X_{r,k}-\bar{X}_k) \end{aligned}$$

and

$$\begin{aligned} {\varvec{{v}}}=(S_{111},S_{112},\ldots ,S_{(p-1)pp},S_{ppp})',\end{aligned}$$

a vector of length \(p+p(p-1)+p(p-1)(p-2)/6\). We will construct tests based on the fact that \({\varvec{{\bar{X}}}}\) and \({\varvec{{v}}}\) are independent if \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are normal. The covariance matrix of \(({\varvec{{\bar{X}}}},{\varvec{{v}}})\) can be written as

$$\begin{aligned} \mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{v}}})) = \left[ \begin{array}{r@{\quad }r} {\varvec{{{\varvec{{\Psi }}}_{11}}}} &{} {\varvec{{{\varvec{{\Psi }}}_{12}}}} \\ {\varvec{{{\varvec{{\Psi }}}_{21}}}} &{} {\varvec{{{\varvec{{\Psi }}}_{22}}}}\end{array} \right] \end{aligned}$$
(5)

where \({\varvec{{\Psi _{11}}}}=\mathrm{Cov}({\varvec{{\bar{X}}}})\), \({\varvec{{\Psi _{22}}}}=\mathrm{Cov}({\varvec{{v}}})\), \({\varvec{{\Psi _{21}}}}={\varvec{{{\varvec{{\Psi }}}_{12}^{'}}}}\) and \({\varvec{{{\varvec{{\Psi }}}_{12}}}}\) contains covariances of the type \(\mathrm{Cov}(\bar{X}_i, S_{jkl})\), \(i,j,k,l=1,\ldots ,p\). If \({\varvec{{\bar{X}}}}\) and \({\varvec{{v}}}\) are uncorrelated, \({\varvec{{\Psi _{12}}}}={\varvec{{0}}}\).
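The vector \({\varvec{{v}}}\) collects the distinct third central sample moments \(S_{ijk}\). A sketch of its construction (the ordering over indices \(i\le j\le k\) and the helper name are our choices):

```python
import numpy as np

def third_moment_vector(X):
    # v = (S_111, S_112, ..., S_ppp)': the distinct third central sample
    # moments S_ijk, with the n/((n-1)(n-2)) scaling from the text
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    c = n / ((n - 1) * (n - 2))
    S3 = c * np.einsum('ri,rj,rk->ijk', Xc, Xc, Xc)
    idx = [(i, j, k) for i in range(p) for j in range(i, p) for k in range(j, p)]
    return np.array([S3[i, j, k] for (i, j, k) in idx])

rng = np.random.default_rng(6)
v = third_moment_vector(rng.normal(size=(40, 3)))
print(v.shape)  # (10,) = p + p(p-1) + p(p-1)(p-2)/6 with p = 3
```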

3 Explicit expressions for the covariances

In the following theorems we state explicit expressions for the elements of \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{u}}}))\) and \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{v}}}))\) in terms of moments of \((X_1,\ldots ,X_p)\). These covariances can be obtained by tedious but routine calculations of the moments involved, which are much simplified by the use of tensor notation, as described in McCullagh (1987). All five covariances can be found scattered in the literature, expressed using cumulants: (6)–(8) are all given in Section 4.2.3 of McCullagh (1987), (9) is found in Problem 4.5 of McCullagh (1987) and (10) is expression (7) in Kaplan (1952).

Theorem 1

Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables with \({ E }|X_iX_jX_kX_l|<\infty \) for \(i,j,k,l=1,2,\ldots ,p\). Let \(\mu _{i_1,\ldots ,i_s}={ E }(X_{i_1}-\mu _{i_1})(X_{i_2}-\mu _{i_2})\cdots (X_{i_s}-\mu _{i_s})\). Then, for \(n\ge 2p+p(p-1)/2\) and \(i,j,k,l=1,2,\ldots ,p\)

  1. (i)

    the elements of \({\varvec{{\Lambda }}}_{11}\) are

    $$\begin{aligned} \mathrm{Cov}(\bar{X}_i, \bar{X}_j)=\frac{1}{n}\mu _{ij}, \end{aligned}$$
    (6)
  2. (ii)

    the elements of \({\varvec{{\Lambda }}}_{12}\) and \({\varvec{{\Lambda }}}_{21}\) are

    $$\begin{aligned} \mathrm{Cov}(\bar{X}_i, S_{jk})=\frac{1}{n}\mu _{ijk} \end{aligned}$$
    (7)

    and

  3. (iii)

    the elements of \({\varvec{{\Lambda }}}_{22}\) are

    $$\begin{aligned} \mathrm{Cov}(S_{ij}, S_{kl})=\frac{1}{n}(\mu _{ijkl}-\mu _{ij}\mu _{kl})+\frac{1}{n(n-1)}(\mu _{ik}\mu _{jl}+\mu _{il}\mu _{jk}). \end{aligned}$$
    (8)
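Expression (7) is easy to verify numerically. The following sketch estimates \(\mathrm{Cov}(\bar{X}, S^2)\) by simulation in the univariate case \(p=1\), with \(X\sim \mathrm{Exp}(1)\), for which \(\mu _{111}=2\):

```python
import numpy as np

# numerical check of (7): Cov(Xbar_i, S_jk) = mu_ijk / n; here p = 1,
# i = j = k = 1 and X ~ Exp(1), for which mu_111 = E(X - 1)^3 = 2
rng = np.random.default_rng(7)
n, reps = 20, 200_000
x = rng.exponential(size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)  # S_11 with the (n-1)-denominator
emp = np.mean((xbar - xbar.mean()) * (s2 - s2.mean()))
print(emp)  # close to mu_111 / n = 2 / 20 = 0.1
```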

Since \({\varvec{{\Psi }}}_{11}={\varvec{{\Lambda }}}_{11}\), we only give the expressions for \({\varvec{{\Psi _{22}}}}\) and \({\varvec{{\Psi _{12}}}}\) in the following theorem.

Theorem 2

Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables with \({ E }|X_\alpha X_\beta X_\gamma X_\delta X_\epsilon X_\zeta |<\infty \) for \(\alpha ,\ldots ,\zeta =1,2,\ldots ,p\). Let \(\mu _{i_1,\ldots ,i_s}={ E }(X_{i_1}-\mu _{i_1})(X_{i_2}-\mu _{i_2})\cdots (X_{i_s}-\mu _{i_s})\). Then, for \(n\ge 2p+p(p-1)+p(p-1)(p-2)/6\) and \(i,j,k,r,s,t=1,2,\ldots ,p\)

  1. (i)

    the elements of \({\varvec{{\Psi }}}_{12}\) and \({\varvec{{\Psi }}}_{21}\) are

    $$\begin{aligned} \mathrm{Cov}(\bar{X}_i, S_{rst})=\frac{1}{n}\left( \mu _{irst}-\mu _{ir}\mu _{st}-\mu _{is}\mu _{rt}-\mu _{it}\mu _{rs}\right) \end{aligned}$$
    (9)

    and

  2. (ii)

    the elements of \({\varvec{{\Psi }}}_{22}\) are

    $$\begin{aligned} \mathrm{Cov}(S_{ijk}, S_{rst})&= \frac{1}{n}\lambda _{ijkrst} +\frac{1}{n-1}\left( \sum ^9\mu _{ir}\left( \mu _{jkst}-\sum ^3\mu _{jk}\mu _{st}\right) +\sum ^9\mu _{ijr}\mu _{kst}\right) \nonumber \\&\quad +\frac{n}{(n-1)(n-2)}\sum ^6\mu _{ir}\mu _{js}\mu _{kt} \end{aligned}$$
    (10)

    where \(\lambda _{ijkrst}\) is given below and \(\sum ^k\) denotes summation over \(k\) distinct permutations of \(i,j,k,r,s,t\). In particular, in \(\sum ^9\mu _{ir}(\ldots )\) the summation is taken over all permutations of \(i,j,k,r,s,t\) where \(i\) and either of \(j,k\) switch places and/or \(r\) and either of \(s,t\) switch places. In \(\sum ^9\mu _{ijr}\mu _{kst}\) the summation is taken over all permutations except \(\mu _{ijk}\mu _{rst}\). Finally, in \(\sum ^3\mu _{jk}\mu _{st}\) and

    $$\begin{aligned} \lambda _{ijkrst}=\mu _{ijkrst}-\sum ^{15}\mu _{ij}(\mu _{krst}-\sum ^{3}\mu _{kr}\mu _{st})-\sum ^{10}\mu _{ijk}\mu _{rst}-\sum ^{15}\mu _{ij}\mu _{kr}\mu _{st} \end{aligned}$$

    the sums are taken over all distinct permutations.

4 Tests based on \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\)

4.1 Modifying Mardia’s statistic

The factor 2 in Mardia’s expression

$$\begin{aligned} \beta _{1,p}=2tr({\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}}) \end{aligned}$$

is only of interest if we assume negligible fourth moments (in the sense of Mardia (1970)). We will therefore omit it in the following and instead study the quantity

$$\begin{aligned} tr({\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}}). \end{aligned}$$

Let \({\varvec{{L_{11}}}}\), \({\varvec{{L_{22}}}}\), \({\varvec{{L_{12}}}}\) and \({\varvec{{L_{21}}}}\) be the sample counterparts of \({\varvec{{\Lambda _{11}}}}\), \({\varvec{{\Lambda _{22}}}}\), \({\varvec{{\Lambda _{12}}}}\) and \({\varvec{{\Lambda _{21}}}}\), where \(\mu _{i_1,\ldots ,i_s}={ E }(X_{i_1}-\mu _{i_1})(X_{i_2}-\mu _{i_2})\ldots (X_{i_s}-\mu _{i_s})\) are estimated by the sample moments

$$\begin{aligned} m_{i_1,\ldots ,i_s}=n^{-1}\sum _{k=1}^n (x_{k,i_1}-\bar{x}_{i_1})(x_{k,i_2}-\bar{x}_{i_2})\ldots (x_{k,i_s}-\bar{x}_{i_s}), \end{aligned}$$
(11)

i.e. where the moments in Theorem 1 are replaced by their sample counterparts. The test statistic for the new test is

$$\begin{aligned} Z_{2,p}^{({ HL})}=tr({\varvec{{L_{11}}}}^{-1}{\varvec{{L_{12}}}}{\varvec{{L_{22}}}}^{-1}{\varvec{{L_{21}}}}). \end{aligned}$$
(12)

The null hypothesis of normality is rejected if \(Z_{2,p}^{({ HL})}\) is sufficiently large.

\(Z_{2,1}^{({ HL})}\) coincides with \(Z_2'^2\) from Thulin (2010) and is thus equivalent to the \(|Z_2'|\) test presented there. \(Z_{2,2}^{({ HL})}\) is a polynomial of degree 10 in 13 moments and the full formula takes up more than two pages. It is however readily computed using a computer, as is \(Z_{2,p}^{({ HL})}\) for higher \(p\).
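Although the closed-form expression is unwieldy, \(Z_{2,p}^{({ HL})}\) is short to compute numerically. A sketch under our reading of the construction (the function name is ours): the covariances of Theorem 1 are evaluated with the central sample moments (11) plugged in, and the trace is then taken:

```python
import numpy as np

def z2p_hl(X):
    # Z_{2,p}^{(HL)} = tr(L11^{-1} L12 L22^{-1} L21), where the covariances
    # of Theorem 1 are estimated by plugging in central sample moments
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    m2 = Xc.T @ Xc / n                                        # m_ij
    m3 = np.einsum('ri,rj,rk->ijk', Xc, Xc, Xc) / n           # m_ijk
    m4 = np.einsum('ri,rj,rk,rl->ijkl', Xc, Xc, Xc, Xc) / n   # m_ijkl
    pairs = [(i, j) for i in range(p) for j in range(i, p)]   # vech order
    q = len(pairs)
    L11 = m2 / n                                              # (6) plug-in
    L12 = np.array([[m3[i, j, k] / n for (j, k) in pairs]     # (7) plug-in
                    for i in range(p)])
    L22 = np.empty((q, q))                                    # (8) plug-in
    for a, (i, j) in enumerate(pairs):
        for b, (k, l) in enumerate(pairs):
            L22[a, b] = ((m4[i, j, k, l] - m2[i, j] * m2[k, l]) / n
                         + (m2[i, k] * m2[j, l] + m2[i, l] * m2[j, k])
                         / (n * (n - 1)))
    return np.trace(np.linalg.inv(L11) @ L12 @ np.linalg.inv(L22) @ L12.T)

rng = np.random.default_rng(5)
print(z2p_hl(rng.normal(size=(100, 2))))  # small for normal data
```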

It should be noted that differences in index notation complicate the situation somewhat here. Mardia’s skewness is denoted \(b_{1,p}\), with 1 as its index, whereas the univariate correlation statistic \(Z_2'\) has 2 as its index. When generalizing \(Z_2'\) to the multivariate setting we keep the index 2, trusting that it will not be confused with Mardia’s kurtosis measure \(b_{2,p}\).

4.2 Other test statistics from the theory of canonical correlations

Let \({\varvec{{Y}}}\) and \({\varvec{{Z}}}\) be normal random vectors with

$$\begin{aligned} \mathrm{Cov}(({\varvec{{Y}}},{\varvec{{Z}}})) = \left[ \begin{array}{l@{\quad }l} {\varvec{{{\varvec{{\Sigma }}}_{11}}}} &{} {\varvec{{{\varvec{{\Sigma }}}_{12}}}} \\ {\varvec{{{\varvec{{\Sigma }}}_{21}}}} &{} {\varvec{{{\varvec{{\Sigma }}}_{22}}}}\end{array} \right] \end{aligned}$$

partitioned like (1). Let \({\varvec{{\hat{\Sigma }_{11}}}}\), \({\varvec{{\hat{\Sigma }_{22}}}}\) and \({\varvec{{\hat{\Sigma }_{12}}}}={\varvec{{\hat{\Sigma }_{21}'}}}\) be the sample covariance matrices and \(\hat{\nu }_1^2,\ldots ,\hat{\nu }_p^2\) be the eigenvalues of \({\varvec{{\hat{\Sigma }_{11}}}}^{-1}{\varvec{{\hat{\Sigma }_{12}}}}{\varvec{{\hat{\Sigma }_{22}}}}^{-1}{\varvec{{\hat{\Sigma }_{21}}}}\). In Section 10.3 of Kshirsagar (1972) the test statistic of the likelihood ratio test of \(H_0: {\varvec{{\Sigma _{12}}}}={\varvec{{0}}}\) versus \(H_1: {\varvec{{\Sigma _{12}}}}\ne {\varvec{{0}}}\) is shown to be

$$\begin{aligned} -n\log \prod _{i=1}^p(1-\hat{\nu }_i^2). \end{aligned}$$
(13)

Now, let \(\hat{\lambda }_1^2\ge \hat{\lambda }_2^2\ge \ldots \ge \hat{\lambda }_p^2\) be the eigenvalues of \({\varvec{{L_{11}}}}^{-1}{\varvec{{L_{12}}}}{\varvec{{L_{22}}}}^{-1}{\varvec{{L_{21}}}}\). Assuming that the necessary moments exist, \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\) are asymptotically normal. Although \({\varvec{{L_{22}}}}\) and \({\varvec{{L_{12}}}}\) are not the usual sample covariance matrices, (13) suggests the use of the following statistic for a test for normality:

$$\begin{aligned} Z_{2,p}^{(W)}=\prod _{i=1}^p(1-\hat{\lambda }_i^2). \end{aligned}$$
(14)

The null hypothesis of normality is rejected if \(Z_{2,p}^{(W)}\) is sufficiently small.

Another quantity that has been considered for a test of \(H_0: {\varvec{{\Sigma _{12}}}}={\varvec{{0}}}\), for instance by Bartlett (1939), is

$$\begin{aligned} Z_{2,p}^{({ PB})}=\sum _{i=1}^p\frac{\hat{\lambda }_i^2}{1-\hat{\lambda }_i^2}. \end{aligned}$$
(15)

\(Z_{2,p}^{({ PB})}\) is similar to \(Z_{2,p}^{({ HL})}\), but weights the correlation coefficients so that larger coefficients become more influential. The null hypothesis should be rejected for large values of \(Z_{2,p}^{({ PB})}\).

Finally, we can consider the statistic

$$\begin{aligned} Z_{2,p}^{(max)}=\max (\hat{\lambda }_1^2,\ldots ,\hat{\lambda }_p^2)=\hat{\lambda }_1^2, \end{aligned}$$
(16)

large values of which imply non-normality. \(Z_{2,p}^{(max)}\) is perhaps the most natural choice of test statistic, as \(\lambda _1=0\) implies that all canonical correlations are 0.

The statistics \(Z_{2,p}^{({ HL})}\), \(Z_{2,p}^{(W)}\), \(Z_{2,p}^{({ PB})}\) and \(Z_{2,p}^{(max)}\) are all related to well-known statistics from multivariate analysis of variance; they are analogs of the Hotelling–Lawley trace, Wilks’ \(\Lambda \), the Pillai–Bartlett trace and Roy’s greatest root, respectively. For \(p=1\) these statistics are all equivalent to the \(|Z_2'|\) test from Thulin (2010).
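Given the estimated squared canonical correlations, the four statistics are simple functions of the eigenvalues. A sketch with a hypothetical helper:

```python
import numpy as np

def cancorr_statistics(lam2):
    # the four statistics (12), (14), (15) and (16) as functions of the
    # estimated squared canonical correlations (lambda_1^2, ..., lambda_p^2)
    lam2 = np.sort(np.asarray(lam2, dtype=float))[::-1]  # descending
    return {
        'HL':  lam2.sum(),                   # Hotelling-Lawley trace analog
        'W':   np.prod(1.0 - lam2),          # Wilks' Lambda analog (small => reject)
        'PB':  np.sum(lam2 / (1.0 - lam2)),  # Pillai-Bartlett trace analog
        'max': lam2[0],                      # Roy's greatest root analog
    }

stats = cancorr_statistics([0.4, 0.1, 0.05])
print(stats['HL'], stats['max'])  # 0.55 and 0.4, up to float rounding
```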

4.3 Theoretical results

Some fundamental properties of the new test statistics are presented in the following theorem. Its proof is given in the Appendix.

Theorem 3

Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables fulfilling the conditions of Theorem 1. Then, for \(n\ge 2p+p(p-1)/2\) and \(i,j,k=1,2,\ldots ,p\)

  1. (i)

    \(Z_{2,p}^{({ HL})}, Z_{2,p}^{(W)}, Z_{2,p}^{({ PB})}\) and \(Z_{2,p}^{(max)}\) are affine invariant, i.e. invariant under nonsingular linear transformations \({\varvec{{AX}}}+{\varvec{{b}}}\) where \({\varvec{{A}}}\) is a nonsingular \(p\times p\) matrix and \({\varvec{{b}}}\) is a \(p\)-vector,

  2. (ii)

    The population canonical correlation \(\lambda _1=\max _{{\varvec{{a}}},{\varvec{{b}}}}|\rho ({\varvec{{a\bar{X}}}},{\varvec{{bu}}})|=0\) if \(\mu _{ijk}= 0\) for all \(i,j,k\) and \(>0\) if \(\mu _{ijk}\ne 0\) for at least one combination of \(i,j,k\), and

  3. (iii)

    \(Z_{2,p}^{({ HL})}, Z_{2,p}^{(W)}, Z_{2,p}^{({ PB})}\) and \(Z_{2,p}^{(max)}\) converge almost surely to the corresponding functions of the population canonical correlations \(\lambda _1\ge \lambda _2\ge \ldots \ge \lambda _p\).

Since the statistics are affine invariant, their distributions are the same for all \(p\)-variate normal distributions for a given sample size \(n\). These null distributions are easily obtained using Monte Carlo simulation.
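A Monte Carlo sketch of this procedure, shown here for Mardia’s \(b_{2,p}\) since it is compact to define (any of the affine invariant statistics could be substituted); the two-sided quantiles and the number of replications are illustrative choices:

```python
import numpy as np

def b2p(X):
    # Mardia's sample kurtosis b_{2,p} from (3)
    Xc = X - X.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.einsum('ij,jk,ik->i', Xc, Sinv, Xc)
    return np.mean(d ** 2)

def mc_critical_values(stat, n, p, alpha=0.05, reps=2000, seed=0):
    # by affine invariance the null distribution depends only on (n, p),
    # so samples from N(0, I_p) suffice
    rng = np.random.default_rng(seed)
    null = np.array([stat(rng.normal(size=(n, p))) for _ in range(reps)])
    return np.quantile(null, [alpha / 2, 1 - alpha / 2])

lo, hi = mc_critical_values(b2p, n=50, p=4)
print(lo, hi)  # the interval brackets the null mean p(p+2)(n-1)/(n+1)
```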

Since \(\lambda _1\ge \lambda _j\) for \(j>1\), \(\lambda _1=0\) implies that all population canonical correlations are 0, as is the case for the normal distribution. By (ii), \(\lambda _1=0\) whenever \(\mu _{ijk}=0\) for all \(i,j,k\), so the tests should not be sensitive to distributions with vanishing third central moments. By (ii) and (iii), all four statistics are however consistent against alternatives where \(\mu _{ijk}\ne 0\) for at least one combination of \(i,j,k\). In particular, they are sensitive to alternatives with skew marginal distributions.

5 Tests based on \({\varvec{{\bar{X}}}}\) and \({\varvec{{v}}}\)

5.1 Test statistics

The ideas used in Sect. 4 for \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{u}}}))\) can also be applied to \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{v}}}))\) in an analogous manner, yielding multivariate generalizations of (4). This leads to two new tests for normality, as described below.

Let \({\varvec{{P_{11}}}}, {\varvec{{P_{22}}}}, {\varvec{{P_{12}}}}\) and \({\varvec{{P_{21}}}}\) be the sample counterparts of \({\varvec{{\Psi _{11}}}}, {\varvec{{\Psi _{22}}}}, {\varvec{{\Psi _{12}}}}\) and \({\varvec{{\Psi _{21}}}}\), where the \(\mu _{i_1,\ldots ,i_s}\) are estimated by the sample moments, as in (11) above. Let \(\hat{\psi }_1^2\ge \ldots \ge \hat{\psi }_p^2\) be the eigenvalues of \({\varvec{{P_{11}}}}^{-1}{\varvec{{P_{12}}}}{\varvec{{P_{22}}}}^{-1}{\varvec{{P_{21}}}}\).

The test statistics for the new tests are

$$\begin{aligned} Z_{3,p}^{({ HL})}&= tr({\varvec{{P_{11}}}}^{-1}{\varvec{{P_{12}}}}{\varvec{{P_{22}}}}^{-1}{\varvec{{P_{21}}}})=\sum _{i=1}^p\hat{\psi }_i^2, \end{aligned}$$
(17)
$$\begin{aligned} Z_{3,p}^{(W)}&= \prod _{i=1}^p(1-\hat{\psi }_i^2). \end{aligned}$$
(18)

We have also considered other statistics, but found them to have lower power than these two. Large values of \(Z_{3,p}^{({ HL})}\) and small values of \(Z_{3,p}^{(W)}\) imply non-normality. Both statistics are equivalent to \(|Z_3'|\) from Thulin (2010) for \(p=1\).

5.2 Theoretical results

The following theorem mimics Theorem 3 above. Its proof is given in the Appendix.

Theorem 4

Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables fulfilling the conditions of Theorem 2. Then, for \(n\ge 2p+p(p-1)+p(p-1)(p-2)/6\) and \(i,j,k,r,s,t=1,2,\ldots ,p\)

  1. (i)

    \(Z_{3,p}^{({ HL})}\) and \(Z_{3,p}^{(W)}\) are affine invariant, i.e. invariant under nonsingular linear transformations \({\varvec{{AX}}}+{\varvec{{b}}}\) where \({\varvec{{A}}}\) is a nonsingular \(p\times p\) matrix and \({\varvec{{b}}}\) is a \(p\)-vector,

  2. (ii)

    The population canonical correlation \(\psi _1=\max _{{\varvec{{a}}},{\varvec{{b}}}}|\rho ({\varvec{{a\bar{X}}}},{\varvec{{bv}}})|=0\) if \(\mu _{irst}-\mu _{ir}\mu _{st}-\mu _{is}\mu _{rt}-\mu _{it}\mu _{rs}=0\) for all \(i,r,s,t=1,\ldots ,p\) and \(>0\) otherwise, and

  3. (iii)

    \(Z_{3,p}^{({ HL})}\) and \(Z_{3,p}^{(W)}\) converge almost surely to the corresponding functions of the population canonical correlations \(\psi _1\ge \psi _2\ge \ldots \ge \psi _p\).

Using the affine invariance, the null distributions of the statistics can be obtained through Monte Carlo simulation.

By (ii) and (iii) both statistics are consistent against alternatives where \(\mu _{irst}-\mu _{ir}\mu _{st}-\mu _{is}\mu _{rt}-\mu _{it}\mu _{rs}\ne 0\) for at least one combination of \(i,r, s, t\).

6 Analysis of the Iris data set

In Table 1 we present the results for the new tests when applied to the famous Iris data set of Fisher (1936). The tests are applied to each of the three subsets of the Iris data: Setosa, Versicolor and Virginica. For each such subset \(n=50\) and \(p=4\). We also applied Mardia’s skewness test \(b_{1,p}\) (2), Mardia’s kurtosis test \(b_{2,p}\) (3) and the Mardia–Kent omnibus test \(T\) (Mardia and Kent 1991), in which the skewness and kurtosis measures are combined. To compute the critical values and \(p\)-values, we approximated the null distribution of each test statistic by 10,000 simulated samples from a normal distribution. The resulting critical values at the 5 % level are given in the table. Recall that for the \(Z_{2,p}^{(W)}\) and \(Z_{3,p}^{(W)}\) statistics, values that are smaller than the critical value imply non-normality.

Table 1 Results for the Iris data set

At the 5 % level, normality is rejected only when \(b_{2,p}\) is applied to the Setosa sample.

7 Simulation results

7.1 The simulation study

To evaluate the performance of the new \(Z_{2,p}\) and \(Z_{3,p}\) tests, a Monte Carlo study of their power was carried out. The tests were compared to \(b_{1,p}\) (2), \(b_{2,p}\) (3) and the Mardia–Kent test \(T\). The comparisons were made for \(n=20\) and \(n=50\) with \(p=2\) and \(p=3\). For some alternatives, more combinations of \(n\) and \(p\) were used. Since the results for \(p=2\) and \(p=3\) were quite similar, we only present the results for \(p=3\) below. The results for \(p=2\) can be found in Supplement S1.

Many previous power studies of multivariate tests for normality have used alternatives with independent marginal distributions. We believe that this can be misleading, as distributions with independent marginals are uncommon in practice and indeed of little interest in the multivariate setting, where the dependence structure of the marginals is often paramount. For this reason, we decided to focus mainly on alternatives with a more complex dependence structure in our study. One alternative with independent exponential marginals, which has been used in many previous power studies, is included for reference.

The alternatives used in the study are presented in Tables 2 and 3. Contour plots of the alternatives from Table 2 are given in Supplement S2. The asymmetric multivariate Laplace distribution mentioned in Table 3 is described in Kotz et al. (2000).

Table 2 Alternatives constructed using their marginal distributions
Table 3 Purely multivariate alternatives. Here \({\varvec{{\Sigma _r}}}\) is a covariance matrix with unit variances and correlations \(r\)

In order to see which alternatives the different tests could be sensitive to, the population values of the statistics were determined for all alternatives. For most distributions the values were computed numerically, to one decimal place for Mardia’s statistics and to two decimal places for the \(Z_{2,p}\) and \(Z_{3,p}\) statistics. The population values are given in Table 3 in Supplement S1.

Using R, the nine tests were applied to 1,000,000 samples from each alternative and each combination of \(n\) and \(p\). The null distributions for all test statistics were estimated using 100,000 standard normal samples.

7.2 Results for symmetric alternatives

The results for alternatives with symmetric marginal distributions are presented in Table 4 in the “Appendix”. Mardia’s kurtosis test \(b_{2,p}\) had the best overall performance against symmetric alternatives with long-tailed marginal distributions, with the Mardia–Kent \(T\) test as runner-up. The \(Z_{3,p}\) tests had by far the best performance against symmetric alternatives with short-tailed marginal distributions, but performed poorly against heavy-tailed alternatives. They should therefore be regarded as directed against short-tailed alternatives.

Table 4 Power of tests for normality against symmetric alternatives, \(\alpha =0.05, p=3\)

\(b_{2,p}\) and the \(Z_{3,p}\) tests were somewhat unexpectedly outperformed by the \(b_{1,p}\) test and the \(Z_{2,p}\) tests for the \(Laplace(0,1)\) (type I) distribution. This was likely caused by the fact that a distribution with that particular dependence structure (described in Table 2), while having symmetric marginal distributions, is not symmetric in a multivariate sense, as can be seen from the contour plot in Supplement S2 or in Table 3 in Supplement S1.

Finally, we investigated the size of the tests by computing their power against two normal distributions. All tests attained the desired size \(\alpha =0.05\).

7.3 Results for asymmetric alternatives

The results for alternatives with asymmetric marginal distributions are presented in Figs. 1 and 2 and Table 5 in the Appendix.

Fig. 1
figure 1

a Power against dependent Beta(1,2) marginals, \(p=2\). b Power against dependent LogN(0,1) marginals, \(p=2\)

Fig. 2 a Power against AsL(3,I), \(p=2\). b Power against AsL(3,I), \(n=5p\)

Table 5 Power of tests for normality against asymmetric alternatives, \(\alpha =0.05, p=3\)

Mardia’s skewness test \(b_{1,p}\) and the \(Z_{2,p}\) tests are all directed to asymmetric alternatives, and they outperformed the other tests. However, no directed test was uniformly more powerful than the other directed tests. For \(p=2\), the \(Z_{2,p}^{(max)}\) test had the best overall performance against asymmetric alternatives, while \(b_{1,p}\) and the \(Z_{2,p}^{(W)}\) and \(Z_{2,p}^{({ PB})}\) tests also displayed good average performance. For \(p=3\), the performance of \(Z_{2,p}^{(max)}\) was somewhat worse, whereas \(b_{1,p}\), \(Z_{2,p}^{(W)}\) and \(Z_{2,p}^{({ PB})}\) still showed good performance.

How varying \(n\) and \(p\) affects the power of the tests is investigated in Figs. 1 and 2. In Fig. 1a, we see that against a distribution with \(Beta(1,2)\) marginal distributions, the \(Z_{3,p}\) tests have the best performance for small \(n\), whereas the \(Z_{2,p}\) tests are superior for larger \(n\). In Fig. 1b, it is seen that against a distribution with \(LogN(0,1)\) marginal distributions, the \(Z_{2,p}\) tests have higher power than the \(b_{1,p}\) test for small \(n\), while the relation is reversed for larger \(n\).

In Fig. 2a, we see that against the \(AL(\mathbf {3},\mathbf {\Sigma _0})\) distribution, \(b_{2,p}\) has slightly higher power than the \(Z_{2,p}\) tests for small \(n\), whereas the \(Z_{2,p}\) tests have slightly higher power for larger \(n\). In Fig. 2b, however, when \(n/p\) is fixed and \(p\) is increased, the difference in power between the tests remains essentially unchanged.

8 Discussion

Based on the simulation results, our recommendations are that the \(Z_{2,p}^{(max)}\) test should be used against asymmetric alternatives when \(p=2\). For higher \(p\), \(b_{1,p}\), \(Z_{2,p}^{(W)}\) or \(Z_{2,p}^{({ PB})}\) should be used instead. Mardia’s \(b_{2,p}\) test should be used against heavy-tailed symmetric alternatives. For short-tailed symmetric alternatives, one of the \(Z_{3,p}\) tests would be a better choice.
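The recommendations above can be condensed into a simple decision rule. The following Python helper is our own illustrative summary (the function and argument names are invented; the returned labels are the paper's test names):

```python
def recommend_test(asymmetric, tails=None, p=2):
    """Return the recommended normality test for a suspected alternative.

    Summarizes the recommendations of the simulation study:
    asymmetric alternatives -> skewness-based tests (choice depends on p);
    symmetric alternatives  -> kurtosis-based tests (choice depends on tails).
    """
    if asymmetric:
        # Z_2p^(max) is best for p = 2; for higher p use b_1p, Z_2p^(W) or Z_2p^(PB)
        return "Z_2p^(max)" if p == 2 else "b_1p, Z_2p^(W) or Z_2p^(PB)"
    if tails == "heavy":
        return "b_2p"  # Mardia's kurtosis test
    if tails == "short":
        return "one of the Z_3p tests"
    raise ValueError("for symmetric alternatives, specify tails='heavy' or 'short'")
```

As the study stresses, such a rule is only a summary of average behaviour: no directed test was uniformly most powerful, and the ranking can change with \(n\) and \(p\).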

Most previous power studies for multivariate tests for normality have focused on alternatives with independent marginal distributions. Such distributions are likely to be rare in practice, and as is shown by the two distributions with \(Laplace(0,1)\) marginals used in our study, multivariate dependence structures can greatly affect the power of tests for normality.

To complicate matters further, some of the results in the tables highlight the fact that what holds true for one combination of \(p\) and \(n\) can be false for a different combination. For instance, when \(p=2\), \(Z_{2,p}^{(max)}\) had higher power than \(b_{1,p}\) for the \(AL(\mathbf {1},{\varvec{{\Sigma _{0}}}})\) and the multivariate \(\chi ^2_8\) alternatives, but when \(p=3\), \(Z_{2,p}^{(max)}\) had lower power than \(b_{1,p}\). This phenomenon merits further investigation, as it implies that power studies performed for low values of \(p\) can be misleading when choosing between tests to use for higher-dimensional data. Further examples of this phenomenon are given in Figs. 1 and 2.

In recent years, several authors have studied robust testing for normality, i.e. normality tests designed to be robust against outliers. See Stehlík et al. (2012) and Cerioli et al. (2013) for examples. Stehlík et al. (2014) proposed a robustified version of the univariate \(Z_{2,1}\) test. A robustified version of the multivariate \(Z_{2,p}\) test will appear in a future paper by the author.

Looking at the normal mixtures, which can be viewed as contaminated normal distributions, we see that \(Z_{2,p}^{(max)}\) and \(b_{1,p}\) were on a par for the mildly contaminated mixtures (with a 9:1 mixing ratio) and that \(Z_{2,p}^{(max)}\) in general had higher power for the heavily contaminated mixtures (with a 3:1 mixing ratio). This suggests the use of the \(Z_{2,p}^{(max)}\) statistic for a test for outliers, an idea that perhaps could be investigated further.
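A contaminated normal of the kind referred to above can be sampled as follows. This minimal Python sketch is our own illustration: the mixing ratios match those quoted above, but the location shift and the function name are illustrative assumptions, not the paper's exact mixture components.

```python
import numpy as np

rng = np.random.default_rng(2)

def contaminated_normal(n, p, eps=0.1, shift=3.0):
    """Sample from the mixture (1 - eps) * N(0, I) + eps * N(shift * 1, I).

    eps = 0.1 corresponds to the 9:1 mixing ratio discussed above and
    eps = 0.25 to the 3:1 ratio; the shift is an illustrative choice.
    """
    contaminated = rng.random(n) < eps  # flag each row as clean or contaminated
    x = rng.standard_normal((n, p))
    x[contaminated] += shift            # shift the contaminated rows
    return x

X = contaminated_normal(2000, 2)
```

Because only a minority of observations is shifted, the mixture is asymmetric, which is consistent with the skewness-based \(Z_{2,p}^{(max)}\) statistic detecting it well.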

Implementations of the \(Z_{2,p}\) and \(Z_{3,p}\) tests in R are available from the author. Some critical values for the new tests are given in Table 6.

Table 6 Critical values of the new tests