Abstract
We propose new affine invariant tests for multivariate normality, based on independence characterizations of the sample moments of the normal distribution. The test statistics are obtained using canonical correlations between sets of sample moments in a way that resembles the construction of Mardia’s skewness measure and generalizes the Lin–Mudholkar test for univariate normality. The tests are compared to some popular tests based on Mardia’s skewness and kurtosis measures in an extensive simulation power study and are found to offer higher power against many of the alternatives.
1 Introduction
Many classical multivariate statistical methods are based on the assumption that the data comes from a multivariate normal distribution. Consequently, the use of such methods should be followed by an investigation of the assumption of normality. A number of tests for multivariate normality can be found in the literature, but the field has not been investigated to the same extent as have tests for univariate normality.
Let \(\gamma ={ E }(X-\mu )^3/\sigma ^ 3\) denote the skewness of a univariate random variable \(X\) and \(\kappa ={ E }(X-\mu )^ 4/\sigma ^ 4-3\) denote its (excess) kurtosis. Both these quantities are 0 for the normal distribution but nonzero for many other distributions, and some common tests for univariate normality are therefore based on \(\gamma \) and \(\kappa \).
Various analogous multivariate measures of skewness and kurtosis have been proposed, perhaps most notably by Mardia (1970). These measures have been used in a number of tests for multivariate normality over the last few decades. Some of these tests, in particular those that use Mardia’s skewness and kurtosis measures as test statistics, have proved to have high power in many simulation studies (e.g. Mecklin and Mundfrom 2004, 2005), and new tests for normality based on multivariate skewness and kurtosis continue to be published today (Doornik and Hansen 2008; Kankainen et al. 2007).
In many inferential situations, some types of departures from normality are a more serious concern than are others. For instance, MANOVA is known to be sensitive to deviations from normality in the form of asymmetry, but to be relatively robust against deviations in the form of heavy tails. Using skewness and kurtosis allows us to construct tests that are directed toward some particular class of alternatives: skewness is used to detect asymmetric alternatives whereas kurtosis is used to detect alternatives with either short or long tails. This typically results in tests that, in comparison to omnibus tests that are directed to all alternatives, have higher power against the class of alternatives that they are directed to.
While more directed toward certain alternatives, such tests may however still be prone to reject alternatives from other classes. The sample skewness and sample kurtosis are correlated, which for instance can cause a skewness-based test to reject normality for a symmetric distribution with heavy tails. Henze (2002) and others have argued that this is a reason to avoid directed tests for normality. Directed tests will however in general have comparatively low power against alternatives that they are not directed to, lowering the risk of rejecting normality because of an unimportant deviation from normality. It is arguably better to have a test that has high power against interesting alternatives and lower power against uninteresting alternatives, rather than a test that has medium high power against all alternatives.
In this paper six new directed tests for normality, all related to multivariate skewness or kurtosis, are proposed. Their common basis is independence characterizations of sample moments of the multivariate normal distribution.
In Sect. 2 we reexamine Mardia’s measure of multivariate skewness, which leads to two new classes of tests for multivariate normality. In Sect. 3 we state explicit expressions for covariances between multivariate sample moments in terms of moments of \({\varvec{{X}}}=(X_1,\ldots ,X_p)'\). This will allow us to estimate the moments involved and to test whether these sample moments are correlated.
In Sect. 4 we study the first class of new tests for normality, all of which are related to multivariate skewness. These can be viewed as multivariate generalizations of the univariate \(Z_2'\) test (Thulin 2010), which in turn is a modified version of the Lin and Mudholkar (1980) test. In Sect. 5 we study the second class of tests, related to multivariate kurtosis. These, in turn, are generalizations of the Thulin (2010) \(Z_3'\) modification of a test proposed by Mudholkar et al. (2002). The tests are applied to the Iris data in Sect. 6. The results of a simulation study comparing the new tests with tests based on Mardia’s skewness and kurtosis measures is presented in Sect. 7, which is followed by a discussion in Sect. 8. The text concludes with an appendix containing proofs and tables. Additional tables and figures are included in two online supplements.
2 Mardia’s multivariate skewness and kurtosis measures revisited
2.1 Multivariate skewness
A well-known characterization of the multivariate normal distributions is that the i.i.d. \(p\)-variate variables \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are normal if and only if the sample mean vector \({\varvec{{\bar{X}}}}=(\bar{X}_1,\bar{X}_2,\ldots ,\bar{X}_p)'\) and the sample covariance matrix \({\varvec{{S}}}\) are independent. Our aim is to test this independence in order to assess the normality of a population. As testing independence is difficult, we will resort to testing correlations instead.
Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables with nonsingular covariance matrix \({\varvec{{\Sigma }}}\). Let \({\varvec{{\bar{X}}}}=(\bar{X}_1,\bar{X}_2,\ldots ,\bar{X}_p)'\) be the sample mean vector and let
$$\begin{aligned} {\varvec{{S}}}=(S_{ij})_{i,j=1,\ldots ,p} \end{aligned}$$
be the sample covariance matrix with \(S_{ij}=(n-1)^{-1}\sum _{k=1}^n(X_{k,i}-\bar{X}_i)(X_{k,j}-\bar{X}_j)\). Define
so that \({\varvec{{u}}}\) is a vector containing the \(q=p(p+1)/2\) distinct elements of \({\varvec{{S}}}\). Now, consider the covariance matrix of the vector \(({\varvec{{\bar{X}'}}},{\varvec{{u'}}})'\), in the following denoted \(({\varvec{{\bar{X}}}},{\varvec{{u}}})\):
$$\begin{aligned} \mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{u}}}))=\left( \begin{array}{cc} {\varvec{{\Lambda _{11}}}} &{} {\varvec{{\Lambda _{12}}}}\\ {\varvec{{\Lambda _{21}}}} &{} {\varvec{{\Lambda _{22}}}} \end{array}\right) , \end{aligned}$$(1)
where \({\varvec{{\Lambda _{11}}}}=\mathrm{Cov}({\varvec{{\bar{X}}}})\), \({\varvec{{\Lambda _{22}}}}=\mathrm{Cov}({\varvec{{u}}})\), \({\varvec{{\Lambda _{21}}}}={\varvec{{\Lambda }}}_{12}'\) and \({\varvec{{\Lambda _{12}}}}\) contains covariances of the type \(\mathrm{Cov}(\bar{X}_i, S_{jk})\), \(i,j,k=1,\ldots ,p\). If \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\) are uncorrelated, then \({\varvec{{\Lambda _{12}}}}={\varvec{{0}}}\).
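As a concrete illustration, the pair \(({\varvec{{\bar{X}}}},{\varvec{{u}}})\) can be computed as follows. This is a sketch in Python/NumPy rather than the paper's own code; the function name `mean_and_vech` and the row-wise upper-triangular ordering of \({\varvec{{u}}}\) are our own choices.

```python
import numpy as np

def mean_and_vech(X):
    """Sample mean vector and the vector u holding the q = p(p+1)/2
    distinct elements of the sample covariance matrix S."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)       # (n-1)^{-1} normalization, as in the text
    u = S[np.triu_indices(p)]         # one entry per distinct S_{ij}, i <= j
    return xbar, u
```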
Mardia (1970, 1974) noted that for univariate random variables, asymptotically \(\text{ cor }(\bar{X}, S^2)\approx \frac{1}{\sqrt{2}}\gamma \) if \(\kappa \) is assumed to be negligible. Based on this, he used \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{u}}}))\) to construct a multivariate skewness measure. Studying the canonical correlations (see e.g. Mardia et al. 1979) between \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\) he proposed the measure
where \(\lambda _1,\ldots ,\lambda _p\) are the canonical correlations. This expression reduces to \(2\text{ cor }(\bar{X}, S^2)^ 2\approx \gamma ^2\) for univariate random variables.
From the theory of canonical correlations we have that \(\lambda _1^ 2,\ldots ,\lambda _p^ 2\) are the eigenvalues of \( {\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}} \) and thus
Taking these moments to order \(n^{-1}\) Mardia showed that
where \({\varvec{{X}}}\) and \({\varvec{{Y}}}\) are independent and identically distributed random vectors. The sample counterpart of the above expression,
$$\begin{aligned} b_{1,p}=\frac{1}{n^2}\sum _{i=1}^n\sum _{j=1}^n\left[ ({\varvec{{X_i}}}-{\varvec{{\bar{X}}}})'{\varvec{{S}}}^{-1}({\varvec{{X_j}}}-{\varvec{{\bar{X}}}})\right] ^3, \end{aligned}$$(2)
is commonly used as a measure of multivariate skewness and as a test statistic for a test of multivariate normality.
In Section 2.8 of McCullagh (1987) Mardia’s approximation of \(\beta _{1,p}\) is shown to be a natural generalization of \(\gamma ^2\). It is however not necessarily a good approximation of the canonical correlations between \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\). An important assumption underlying Mardia’s skewness measure is that the fourth central moments of the distribution are negligible. Seeing as \(\gamma ^2-2\le \kappa \) for univariate variables (Dubkov and Malakhov 1976), this seems like a rather strong condition. For univariate random variables, Thulin (2010) noted that
and used \(\hat{\rho }_2=Z_2'\), the sample moment version of this quantity, as the test statistic of a modified version of the Lin and Mudholkar (1980) test. In Thulin’s simulation power study \(Z_2'\) was more powerful than \(\hat{\gamma }\) against most of the alternatives under study. Consequently, for \(p=1\) it is better to use the explicit expression for \(\text{ cor }(\bar{X}, S^2)\) than the approximation \(\text{ cor }(\bar{X}, S^2)\approx \frac{1}{\sqrt{2}}\gamma \). It is therefore of interest to use Mardia’s approach without any approximations, in the hope that this will render a more powerful test for normality. In Sect. 3 we give explicit expressions for \(\mathrm{Cov}(\bar{X}_i, S_{jk})\) and \(\mathrm{Cov}(S_{ij}, S_{kl})\), allowing us to study \({\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}}\) without approximations and to construct new test statistics.
2.2 Multivariate kurtosis
Mardia (1970, 1974) proposed the multivariate kurtosis measure
$$\begin{aligned} \beta _{2,p}={ E }\left[ ({\varvec{{X}}}-{\varvec{{\mu }}})'{\varvec{{\Sigma }}}^{-1}({\varvec{{X}}}-{\varvec{{\mu }}})\right] ^2 \end{aligned}$$
with sample counterpart
$$\begin{aligned} b_{2,p}=\frac{1}{n}\sum _{i=1}^n\left[ ({\varvec{{X_i}}}-{\varvec{{\bar{X}}}})'{\varvec{{S}}}^{-1}({\varvec{{X_i}}}-{\varvec{{\bar{X}}}})\right] ^2. \end{aligned}$$(3)
In the univariate setting
where \(\lambda =\frac{\mu _6}{\sigma ^6}-15\kappa -10\gamma ^2-15\) is the sixth standardized cumulant (Thulin 2010). In a simulation power study, Thulin (2010) found the test for normality based on \(\hat{\rho }_3=Z_3'\), the sample counterpart of (4), to have a better overall performance than the popular \(\hat{\kappa }=b_2=b_{2,1}\) test. It is therefore of interest to find a multivariate generalization of \(Z_3'\), in the hope that it will yield a test with higher power than \(b_{2,p}\).
Similarly to what was done above for the covariance, let
and
a vector of length \(p+p(p-1)+p(p-1)(p-2)/6\). We will construct tests based on the fact that \({\varvec{{\bar{X}}}}\) and \({\varvec{{v}}}\) are independent if \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are normal. The covariance matrix of \(({\varvec{{\bar{X}}}},{\varvec{{v}}})\) can be written as
$$\begin{aligned} \mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{v}}}))=\left( \begin{array}{cc} {\varvec{{\Psi _{11}}}} &{} {\varvec{{\Psi _{12}}}}\\ {\varvec{{\Psi _{21}}}} &{} {\varvec{{\Psi _{22}}}} \end{array}\right) , \end{aligned}$$
where \({\varvec{{\Psi _{11}}}}=\mathrm{Cov}({\varvec{{\bar{X}}}})\), \({\varvec{{\Psi _{22}}}}=\mathrm{Cov}({\varvec{{v}}})\), \({\varvec{{\Psi _{21}}}}={\varvec{{\Psi }}}_{12}'\) and \({\varvec{{\Psi _{12}}}}\) contains covariances of the type \(\mathrm{Cov}(\bar{X}_i, S_{jkl})\), \(i,j,k,l=1,\ldots ,p\). If \({\varvec{{\bar{X}}}}\) and \({\varvec{{v}}}\) are uncorrelated, \({\varvec{{\Psi _{12}}}}={\varvec{{0}}}\).
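The vector \({\varvec{{v}}}\) of distinct third-order sample central moments can be sketched as below. The helper name and the \(1/n\) normalization are our own assumptions; the paper's exact normalizing constant for \(S_{ijk}\) is not reproduced in this excerpt.

```python
import numpy as np
from itertools import combinations_with_replacement

def third_moment_vector(X):
    """The vector v of distinct third-order sample central moments
    S_{ijk}, one entry per multiset i <= j <= k.  The 1/n normalization
    is an assumption, not taken from the paper."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    return np.array([(Xc[:, i] * Xc[:, j] * Xc[:, k]).mean()
                     for i, j, k in combinations_with_replacement(range(p), 3)])
```

The length of the returned vector is \(p+p(p-1)+p(p-1)(p-2)/6=p(p+1)(p+2)/6\): \(p\) entries with all indices equal, \(p(p-1)\) with exactly two equal, and \(p(p-1)(p-2)/6\) with all distinct.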
3 Explicit expressions for the covariances
In the following theorems we state explicit expressions for the elements of \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{u}}}))\) and \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{v}}}))\) in terms of moments of \((X_1,\ldots ,X_p)\). These covariances can be obtained by tedious but routine calculations of the moments involved, which are much simplified by the use of tensor notation, as described in McCullagh (1987). All five covariances can be found scattered in the literature, expressed using cumulants: (6)–(8) are all given in Section 4.2.3 of McCullagh (1987), (9) is found in Problem 4.5 of McCullagh (1987) and (10) is expression (7) in Kaplan (1952).
Theorem 1
Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables with \({ E }|X_iX_jX_kX_l|<\infty \) for \(i,j,k,l=1,2,\ldots ,p\). Let \(\mu _{i_1,\ldots ,i_s}={ E }(X_{i_1}-\mu _{i_1})(X_{i_2}-\mu _{i_2})\cdots (X_{i_s}-\mu _{i_s})\). Then, for \(n\ge 2p+p(p-1)/2\) and \(i,j,k,l=1,2,\ldots ,p\)
-
(i)
the elements of \({\varvec{{\Lambda }}}_{11}\) are
$$\begin{aligned} \mathrm{Cov}(\bar{X}_i, \bar{X}_j)=\frac{1}{n}\mu _{ij}, \end{aligned}$$(6) -
(ii)
the elements of \({\varvec{{\Lambda }}}_{12}\) and \({\varvec{{\Lambda }}}_{21}\) are
$$\begin{aligned} \mathrm{Cov}(\bar{X}_i, S_{jk})=\frac{1}{n}\mu _{ijk} \end{aligned}$$(7)and
-
(iii)
the elements of \({\varvec{{\Lambda }}}_{22}\) are
$$\begin{aligned} \mathrm{Cov}(S_{ij}, S_{kl})=\frac{1}{n}(\mu _{ijkl}-\mu _{ij}\mu _{kl})+\frac{1}{n(n-1)}(\mu _{ik}\mu _{jl}+\mu _{il}\mu _{jk}). \end{aligned}$$(8)
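Plug-in estimates of the Theorem 1 blocks can be sketched as follows. The helper `lambda_blocks` is our own, and it uses plain \(1/n\) sample central moments, which may differ from the paper's exact sample-moment convention; the block formulas themselves follow (6)–(8).

```python
import numpy as np

def lambda_blocks(X):
    """Estimate Lambda_11, Lambda_12, Lambda_22 of Theorem 1 by plugging
    sample central moments into (6)-(8).  Plain 1/n moments are assumed."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    mu2 = Xc.T @ Xc / n                                   # mu_hat_{ij}
    mu3 = np.einsum('ni,nj,nk->ijk', Xc, Xc, Xc) / n      # mu_hat_{ijk}
    mu4 = np.einsum('ni,nj,nk,nl->ijkl', Xc, Xc, Xc, Xc) / n

    pairs = [(j, k) for j in range(p) for k in range(j, p)]  # distinct S_{jk}
    L11 = mu2 / n                                            # (6)
    L12 = np.array([[mu3[i, j, k] for j, k in pairs]         # (7)
                    for i in range(p)])
    L22 = np.array([[(mu4[i, j, k, l] - mu2[i, j] * mu2[k, l]) / n  # (8)
                     + (mu2[i, k] * mu2[j, l] + mu2[i, l] * mu2[j, k])
                       / (n * (n - 1))
                     for k, l in pairs] for i, j in pairs])
    return L11, L12, L22
```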
Since \({\varvec{{\Psi }}}_{11}={\varvec{{\Lambda }}}_{11}\), we only give the expressions for \({\varvec{{\Psi _{22}}}}\) and \({\varvec{{\Psi _{12}}}}\) in the following theorem.
Theorem 2
Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables with \({ E }|X_\alpha X_\beta X_\gamma X_\delta X_\epsilon X_\zeta |<\infty \) for \(\alpha ,\ldots ,\zeta =1,2,\ldots ,p\). Let \(\mu _{i_1,\ldots ,i_s}={ E }(X_{i_1}-\mu _{i_1})(X_{i_2}-\mu _{i_2})\cdots (X_{i_s}-\mu _{i_s})\). Then, for \(n\ge 2p+p(p-1)+p(p-1)(p-2)/6\) and \(i,j,k,r,s,t=1,2,\ldots ,p\)
-
(i)
the elements of \({\varvec{{\Psi }}}_{12}\) and \({\varvec{{\Psi }}}_{21}\) are
$$\begin{aligned} \mathrm{Cov}(\bar{X}_i, S_{rst})=\frac{1}{n}\left( \mu _{irst}-\mu _{ir}\mu _{st}-\mu _{is}\mu _{rt}-\mu _{it}\mu _{rs}\right) \end{aligned}$$(9)and
-
(ii)
the elements of \({\varvec{{\Psi }}}_{22}\) are
$$\begin{aligned} \mathrm{Cov}(S_{ijk}, S_{rst})\!&= \!\frac{1}{n}\lambda _{ijkrst} \!+\!\frac{1}{n\!-\!1}\left( \sum ^9\mu _{ir}(\mu _{jkst}\!-\!\sum ^3\mu _{jk}\mu _{st})\!+\!\sum ^9\mu _{ijr}\mu _{kst}\right) \nonumber \\&+\frac{n}{(n-1)(n-2)}\sum ^6\mu _{ir}\mu _{js}\mu _{kt} \end{aligned}$$(10)where \(\lambda _{ijkrst}\) is given below and \(\sum ^k\) denotes summation over \(k\) distinct permutations of \(i,j,k,r,s,t\). In particular, in \(\sum ^9\mu _{ir}(\ldots )\) the summation is taken over all permutations of \(i,j,k,r,s,t\) where \(i\) and either of \(j,k\) switch places and/or \(r\) and either of \(s,t\) switch places. In \(\sum ^9\mu _{ijr}\mu _{kst}\) the summation is taken over all permutations except \(\mu _{ijk}\mu _{rst}\). Finally, in \(\sum ^3\mu _{jk}\mu _{st}\) and
$$\begin{aligned} \lambda _{ijkrst}=\mu _{ijkrst}-\sum ^{15}\mu _{ij}(\mu _{krst}-\sum ^{3}\mu _{kr}\mu _{st})-\sum ^{10}\mu _{ijk}\mu _{rst}-\sum ^{15}\mu _{ij}\mu _{kr}\mu _{st} \end{aligned}$$the sums are taken over all distinct permutations.
4 Tests based on \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\)
4.1 Modifying Mardia’s statistic
The factor 2 in Mardia’s expression
$$\begin{aligned} 2\sum _{i=1}^p\lambda _i^2 \end{aligned}$$
is only of interest if we assume negligible fourth moments (in the sense of Mardia (1970)). We will therefore omit it in the following and instead study the quantity
$$\begin{aligned} \sum _{i=1}^p\lambda _i^2=\mathrm{tr}\left( {\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}}\right) . \end{aligned}$$
Let \({\varvec{{L_{11}}}}\), \({\varvec{{L_{22}}}}\), \({\varvec{{L_{12}}}}\) and \({\varvec{{L_{21}}}}\) be the sample counterparts of \({\varvec{{\Lambda _{11}}}}\), \({\varvec{{\Lambda _{22}}}}\), \({\varvec{{\Lambda _{12}}}}\) and \({\varvec{{\Lambda _{21}}}}\), where the central moments \(\mu _{i_1,\ldots ,i_s}={ E }(X_{i_1}-\mu _{i_1})(X_{i_2}-\mu _{i_2})\cdots (X_{i_s}-\mu _{i_s})\) are estimated by the sample moments
$$\begin{aligned} \hat{\mu }_{i_1,\ldots ,i_s}=\frac{1}{n}\sum _{k=1}^n(X_{k,i_1}-\bar{X}_{i_1})(X_{k,i_2}-\bar{X}_{i_2})\cdots (X_{k,i_s}-\bar{X}_{i_s}), \end{aligned}$$(11)
i.e. where the moments in Theorem 1 are replaced by their sample counterparts. The test statistic for the new test is
The null hypothesis of normality is rejected if \(Z_{2,p}^{({ HL})}\) is sufficiently large.
\(Z_{2,1}^{({ HL})}\) coincides with \(Z_2'^2\) from Thulin (2010) and is thus equivalent to the \(|Z_2'|\) test presented there. \(Z_{2,2}^{({ HL})}\) is a polynomial of degree 10 in 13 moments and the full formula takes up more than two pages. It is however readily computed using a computer, as is \(Z_{2,p}^{({ HL})}\) for higher \(p\).
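The claim that the statistic is readily computed can be illustrated with a self-contained sketch. It assumes that \(Z_{2,p}^{({ HL})}\) equals the trace \(\sum _i\hat{\lambda }_i^2\) with no further normalizing constant, and uses plain \(1/n\) sample central moments; both are assumptions, as the paper's exact conventions are not reproduced here.

```python
import numpy as np

def z2_hl(X):
    """Sketch of a trace-type statistic: sum of the squared sample canonical
    correlations between Xbar and u, i.e. tr(L11^{-1} L12 L22^{-1} L21),
    with the Theorem 1 blocks estimated by 1/n sample central moments."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    mu2 = Xc.T @ Xc / n
    mu3 = np.einsum('ni,nj,nk->ijk', Xc, Xc, Xc) / n
    mu4 = np.einsum('ni,nj,nk,nl->ijkl', Xc, Xc, Xc, Xc) / n
    pairs = [(j, k) for j in range(p) for k in range(j, p)]
    L11 = mu2 / n
    L12 = np.array([[mu3[i, j, k] for j, k in pairs] for i in range(p)])
    L22 = np.array([[(mu4[i, j, k, l] - mu2[i, j] * mu2[k, l]) / n
                     + (mu2[i, k] * mu2[j, l] + mu2[i, l] * mu2[j, k])
                       / (n * (n - 1))
                     for k, l in pairs] for i, j in pairs])
    # trace of L11^{-1} L12 L22^{-1} L21 = sum of squared canonical correlations
    M = np.linalg.solve(L11, L12) @ np.linalg.solve(L22, L12.T)
    return float(np.trace(M))
```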
It should be noted that differences in index notation complicate the situation somewhat here. Mardia’s skewness is denoted \(b_{1,p}\), with 1 as its index, whereas the univariate correlation statistic \(Z_2'\) has 2 as its index. When generalizing \(Z_2'\) to the multivariate setting we will keep the index 2, hoping that it won’t be confused with Mardia’s kurtosis measure \(b_{2,p}\).
4.2 Other test statistics from the theory of canonical correlations
Let \({\varvec{{Y}}}\) and \({\varvec{{Z}}}\) be normal random vectors with
$$\begin{aligned} \mathrm{Cov}(({\varvec{{Y}}},{\varvec{{Z}}}))={\varvec{{\Sigma }}}=\left( \begin{array}{cc} {\varvec{{\Sigma _{11}}}} &{} {\varvec{{\Sigma _{12}}}}\\ {\varvec{{\Sigma _{21}}}} &{} {\varvec{{\Sigma _{22}}}} \end{array}\right) , \end{aligned}$$
partitioned like (1). Let \({\varvec{{\hat{\Sigma }_{11}}}}\), \({\varvec{{\hat{\Sigma }_{22}}}}\) and \({\varvec{{\hat{\Sigma }_{12}}}}={\varvec{{\hat{\Sigma }_{21}'}}}\) be the sample covariance matrices and \(\hat{\nu }_1^2,\ldots ,\hat{\nu }_p^2\) be the eigenvalues of \({\varvec{{\hat{\Sigma }_{11}}}}^{-1}{\varvec{{\hat{\Sigma }_{12}}}}{\varvec{{\hat{\Sigma }_{22}}}}^{-1}{\varvec{{\hat{\Sigma }_{21}}}}\). In Section 10.3 of Kshirsagar (1972) the test statistic of the likelihood ratio test of \(H_0: {\varvec{{\Sigma _{12}}}}={\varvec{{0}}}\) versus \(H_1: {\varvec{{\Sigma _{12}}}}\ne {\varvec{{0}}}\) is shown to be
$$\begin{aligned} \prod _{i=1}^p(1-\hat{\nu }_i^2). \end{aligned}$$(13)
Now, let \(\hat{\lambda }_1^2\ge \hat{\lambda }_2^2\ge \ldots \ge \hat{\lambda }_p^2\) be the eigenvalues of \({\varvec{{L_{11}}}}^{-1}{\varvec{{L_{12}}}}{\varvec{{L_{22}}}}^{-1}{\varvec{{L_{21}}}}\). Assuming that the necessary moments exist, \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\) are asymptotically normal. Although \({\varvec{{L_{22}}}}\) and \({\varvec{{L_{12}}}}\) are not the usual sample covariance matrices, in the light of (13), this suggests the use of the following statistic for a test for normality:
$$\begin{aligned} Z_{2,p}^{(W)}=\prod _{i=1}^p(1-\hat{\lambda }_i^2). \end{aligned}$$
The null hypothesis of normality is rejected if \(Z_{2,p}^{(W)}\) is sufficiently small.
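Once the squared sample canonical correlations are in hand, several of the statistics of this section are simple deterministic functions of them. The sketch below covers a trace-type sum, a Wilks-type product and the largest root; the exact normalizing constants used in the paper (and the precise form of the \(Z_{2,p}^{({ PB})}\) weighting, which is only described qualitatively in the text) are not reproduced here.

```python
import numpy as np

def cc_statistics(eig_sq):
    """Map the squared sample canonical correlations (eigenvalues of
    L11^{-1} L12 L22^{-1} L21) to test statistics.  Normalizing
    constants possibly used in the paper are omitted."""
    lam2 = np.sort(np.asarray(eig_sq, dtype=float))[::-1]
    return {
        'HL':  float(lam2.sum()),          # trace-type: reject when large
        'W':   float(np.prod(1 - lam2)),   # Wilks-type: reject when small
        'max': float(lam2[0]),             # Roy-type:   reject when large
    }
```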
Another quantity that has been considered for a test of \(H_0: {\varvec{{\Sigma _{12}}}}={\varvec{{0}}}\), for instance by Bartlett (1939), is
\(Z_{2,p}^{({ PB})}\) is similar to \(Z_{2,p}^{({ HL})}\), but weighs the correlation coefficients so that larger coefficients become more influential. The null hypothesis should be rejected for large values of \(Z_{2,p}^{({ PB})}\).
Finally, we can consider the statistic
$$\begin{aligned} Z_{2,p}^{(max)}=\hat{\lambda }_1^2, \end{aligned}$$
large values of which imply non-normality. \(Z_{2,p}^{(max)}\) is perhaps the most natural choice of test statistic, as \(\lambda _1=0\) implies that all canonical correlations are 0.
The statistics \(Z_{2,p}^{({ HL})}\), \(Z_{2,p}^{(W)}\), \(Z_{2,p}^{({ PB})}\) and \(Z_{2,p}^{(max)}\) are all related to well-known statistics from multivariate analysis of variance; they are analogs of the Hotelling–Lawley trace, Wilks’ \(\Lambda \), the Pillai–Bartlett trace and Roy’s greatest root, respectively. For \(p=1\) these statistics are all equivalent to the \(|Z_2'|\) test from Thulin (2010).
4.3 Theoretical results
Some fundamental properties of the new test statistics are presented in the following theorem. Its proof is given in the Appendix.
Theorem 3
Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables fulfilling the conditions of Theorem 1. Then, for \(n\ge 2p+p(p-1)/2\) and \(i,j,k=1,2,\ldots ,p\)
-
(i)
\(Z_{2,p}^{({ HL})}, Z_{2,p}^{(W)}, Z_{2,p}^{({ PB})}\) and \(Z_{2,p}^{(max)}\) are affine invariant, i.e. invariant under nonsingular linear transformations \({\varvec{{AX}}}+{\varvec{{b}}}\) where \({\varvec{{A}}}\) is a nonsingular \(p\times p\) matrix and \({\varvec{{b}}}\) is a \(p\)-vector,
-
(ii)
The population canonical correlation \(\lambda _1=\max _{{\varvec{{a}}},{\varvec{{b}}}}|\rho ({\varvec{{a\bar{X}}}},{\varvec{{bu}}})|=0\) if \(\mu _{ijk}= 0\) for all \(i,j,k\) and \(>0\) if \(\mu _{ijk}\ne 0\) for at least one combination of \(i,j,k\), and
-
(iii)
\(Z_{2,p}^{({ HL})}, Z_{2,p}^{(W)}, Z_{2,p}^{({ PB})}\) and \(Z_{2,p}^{(max)}\) converge almost surely to the corresponding functions of the population canonical correlations \(\lambda _1\ge \lambda _2\ge \ldots \ge \lambda _p\).
Since the statistics are affine invariant, their distributions are the same for all \(p\)-variate normal distributions for a given sample size \(n\). These null distributions are easily obtained using Monte Carlo simulation.
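The Monte Carlo step can be sketched as below. The helper names are ours, and Mardia's \(b_{1,p}\) is used merely as a stand-in for any affine-invariant statistic that is large under non-normality.

```python
import numpy as np

def mardia_b1p(X):
    """Sample version b_{1,p} of Mardia's skewness, used here as a
    stand-in for any affine-invariant test statistic."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                   # biased sample covariance
    D = Xc @ np.linalg.solve(S, Xc.T)   # D[i,j] = (x_i - xbar)' S^{-1} (x_j - xbar)
    return (D ** 3).sum() / n ** 2

def mc_critical_value(stat, n, p, alpha=0.05, reps=2000, seed=1):
    """Affine invariance means the null distribution depends only on n and p,
    so it suffices to simulate from N(0, I_p).  Returns the upper alpha
    quantile, for statistics that are large under non-normality."""
    rng = np.random.default_rng(seed)
    null = [stat(rng.standard_normal((n, p))) for _ in range(reps)]
    return float(np.quantile(null, 1 - alpha))
```

For statistics such as \(Z_{2,p}^{(W)}\), which are small under non-normality, the lower \(\alpha \) quantile would be used instead.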
Since \(\lambda _1\ge \lambda _j\) for \(j>1\), \(\lambda _1=0\) implies that all population canonical correlations are 0, as is the case for the normal distribution. By (ii), \(\lambda _1=0\) also holds for any distribution with \(\mu _{ijk}=0\) for all \(i,j,k\), so the tests should not be sensitive to distributions with that kind of symmetry. By (ii) and (iii), however, all four statistics are consistent against alternatives where \(\mu _{ijk}\ne 0\) for at least one combination of \(i,j,k\). In particular, they are sensitive to alternatives with skew marginal distributions.
5 Tests based on \({\varvec{{\bar{X}}}}\) and \({\varvec{{v}}}\)
5.1 Test statistics
The ideas used in Sect. 4 for \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{u}}}))\) can also be used for \(\mathrm{Cov}(({\varvec{{\bar{X}}}},{\varvec{{v}}}))\) in an analog manner, yielding multivariate generalizations of (4). This leads to two new tests for normality, as described below.
Let \({\varvec{{P_{11}}}}, {\varvec{{P_{22}}}}, {\varvec{{P_{12}}}}\) and \({\varvec{{P_{21}}}}\) be the sample counterparts of \({\varvec{{\Psi _{11}}}}, {\varvec{{\Psi _{22}}}}, {\varvec{{\Psi _{12}}}}\) and \({\varvec{{\Psi _{21}}}}\), where the \(\mu _{i_1,\ldots ,i_s}\) are estimated by the sample moments, as in (11) above. Let \(\hat{\psi }_1^2\ge \ldots \ge \hat{\psi }_p^2\) be the eigenvalues of \({\varvec{{P_{11}}}}^{-1}{\varvec{{P_{12}}}}{\varvec{{P_{22}}}}^{-1}{\varvec{{P_{21}}}}\).
The test statistics for the new tests are
$$\begin{aligned} Z_{3,p}^{({ HL})}=\sum _{i=1}^p\hat{\psi }_i^2 \quad \text{ and }\quad Z_{3,p}^{(W)}=\prod _{i=1}^p(1-\hat{\psi }_i^2). \end{aligned}$$
We have also considered other statistics, but found them to have lower power than these two. Large values of \(Z_{3,p}^{({ HL})}\) and small values of \(Z_{3,p}^{(W)}\) imply non-normality. Both statistics are equivalent to \(|Z_3'|\) from Thulin (2010) for \(p=1\).
5.2 Theoretical results
The following theorem mimics Theorem 3 above. Its proof is given in the Appendix.
Theorem 4
Assume that \({\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables fulfilling the conditions of Theorem 2. Then, for \(n\ge 2p+p(p-1)+p(p-1)(p-2)/6\) and \(i,j,k,r,s,t=1,2,\ldots ,p\)
-
(i)
\(Z_{3,p}^{({ HL})}\) and \(Z_{3,p}^{(W)}\) are affine invariant, i.e. invariant under nonsingular linear transformations \({\varvec{{AX}}}+{\varvec{{b}}}\) where \({\varvec{{A}}}\) is a nonsingular \(p\times p\) matrix and \({\varvec{{b}}}\) is a \(p\)-vector,
-
(ii)
The population canonical correlation \(\psi _1=\max _{{\varvec{{a}}},{\varvec{{b}}}}|\rho ({\varvec{{a\bar{X}}}},{\varvec{{bv}}})|=0\) if \(\mu _{irst}-\mu _{ir}\mu _{st}-\mu _{is}\mu _{rt}-\mu _{it}\mu _{rs}=0\) for all \(i,r,s,t=1,\ldots ,p\) and \(>0\) otherwise, and
-
(iii)
\(Z_{3,p}^{({ HL})}\) and \(Z_{3,p}^{(W)}\) converge almost surely to the corresponding functions of the population canonical correlations \(\psi _1\ge \psi _2\ge \ldots \ge \psi _p\).
Using the affine invariance, the null distributions of the statistics can be obtained through Monte Carlo simulation.
By (ii) and (iii) both statistics are consistent against alternatives where \(\mu _{irst}-\mu _{ir}\mu _{st}-\mu _{is}\mu _{rt}-\mu _{it}\mu _{rs}\ne 0\) for at least one combination of \(i,r, s, t\).
6 Analysis of the Iris data set
In Table 1 we present the results for the new tests when applied to the famous Iris data set of Fisher (1936). The tests are applied to each of the three subsets of the Iris data: Setosa, Versicolor and Virginica. For each such subset \(n=50\) and \(p=4\). We also applied Mardia’s skewness test \(b_{1,p}\) (2), Mardia’s kurtosis test \(b_{2,p}\) (3) and the Mardia–Kent omnibus test \(T\) (Mardia and Kent 1991), in which the skewness and kurtosis measures are combined. To compute the critical values and \(p\)-values, we approximated the null distribution of each test statistic by 10,000 simulated samples from a normal distribution. The resulting critical values at the 5 % level are given in the table. Recall that for the \(\mathbf {Z_{2,p}^{(W)}}\) and \(\mathbf {Z_{3,p}^{(W)}}\) statistics, values of the statistic that are smaller than the critical value imply non-normality.
At the 5 % level, normality is rejected only when \(b_{2,p}\) is applied to the Setosa sample.
7 Simulation results
7.1 The simulation study
To evaluate the performance of the new \(Z_{2,p}\) and \(Z_{3,p}\) tests, a Monte Carlo study of their power was carried out. The tests were compared to \(b_{1,p}\), \(b_{2,p}\) (3) and the Mardia–Kent test \(T\). The tests were compared for \(n=20\) and \(n=50\) for \(p=2\) and \(p=3\). For some alternatives, more combinations of \(n\) and \(p\) were used. Since the results for \(p=2\) and \(p=3\) were quite similar, we only present the results for \(p=3\) below. The results for \(p=2\) can be found in Supplement S1.
Many power studies for multivariate tests for normality have used alternatives with independent marginal distributions. We believe that this can be misleading, as distributions with independent marginals are uncommon in practice and indeed of little interest in the multivariate setting, where the dependence structure of the marginals is often paramount. For this reason, we decided to focus mainly on alternatives with a more complex dependence structure in our study. One alternative with independent exponential marginals, which has been used in many previous power studies, is included for reference.
The alternatives used in the study are presented in Tables 2 and 3. Contour plots of the alternatives from Table 2 are given in Supplement S2. The asymmetric multivariate Laplace distribution mentioned in Table 3 is described in Kotz et al. (2000).
In order to see which alternatives the different tests could be sensitive to, the population values of the statistics were determined for all alternatives. For most distributions the values were computed numerically, to one decimal place for Mardia’s statistics and to two decimal places for the \(Z_{2,p}\) and \(Z_{3,p}\) tests. The population values are given in Table 3 in Supplement S1.
Using R, the nine tests were applied to 1,000,000 samples from each alternative and each combination of \(n\) and \(p\). The null distributions for all test statistics were estimated using 100,000 standard normal samples.
7.2 Results for symmetric alternatives
The results for alternatives with symmetric marginal distributions are presented in Table 4 in the “Appendix”. Mardia’s kurtosis test \(b_{2,p}\) had the best overall performance against symmetric alternatives with long-tailed marginal distributions, with the Mardia–Kent \(T\) test as runner-up. The \(Z_{3,p}\) tests had by far the best performance against symmetric alternatives with short-tailed marginal distributions, but performed poorly against heavy-tailed alternatives. They should therefore be regarded as being directed toward short-tailed alternatives.
\(b_{2,p}\) and the \(Z_{3,p}\) tests were somewhat unexpectedly outperformed by the \(b_{1,p}\) test and the \(Z_{2,p}\) tests for the \(Laplace(0,1)\) (type I) distribution. This was likely caused by the fact that a distribution with that particular dependence structure (described in Table 2), while having symmetric marginal distributions, is not symmetric in a multivariate sense, as can be seen from the contour plot in Supplement S2 or in Table 3 in Supplement S1.
Finally, we investigated the size of the tests by computing their power against two normal distributions. All tests attained the desired size \(\alpha =0.05\).
7.3 Results for asymmetric alternatives
The results for alternatives with asymmetric marginal distributions are presented in Figs. 1 and 2 and Table 5 in the Appendix.
Mardia’s skewness test \(b_{1,p}\) and the \(Z_{2,p}\) tests are all directed toward asymmetric alternatives, and outperformed the other tests. However, no directed test was uniformly more powerful than the other directed tests. For \(p=2\), the \(Z_{2,p}^{(max)}\) test had the best overall performance against asymmetric alternatives, while \(b_{1,p}\) and the \(Z_{2,p}^{(W)}\) and \(Z_{2,p}^{({ PB})}\) tests also displayed a good average performance. For \(p=3\) the performance of \(Z_{2,p}^{(max)}\) was somewhat worse, whereas \(b_{1,p}\), \(Z_{2,p}^{(W)}\) and \(Z_{2,p}^{({ PB})}\) still showed good performance.
How varying \(n\) and \(p\) affects the power of the tests is investigated in Figs. 1 and 2. In Fig. 1a, we see that against a distribution with \(Beta(1,2)\) marginal distributions, the \(Z_{3,p}\) tests have the best performance for small \(n\), whereas the \(Z_{2,p}\) tests are superior for larger \(n\). In Fig. 1, it is seen that against a distribution with \(LogN(0,1)\) marginal distributions, the \(Z_{2,p}\) tests have higher power than the \(b_{1,p}\) test for small \(n\), while the relation is reversed for larger \(n\).
In Fig. 2a, we see that against the \(AL(\mathbf {3},\mathbf {\Sigma _0})\) distribution, \(b_{2,p}\) has slightly higher power than the \(Z_{2,p}\) tests for small \(n\), whereas the \(Z_{2,p}\) tests have slightly higher power for larger \(n\). In Fig. 2b however, when \(n/p\) is fixed and \(p\) is increased, the difference in power between the tests remains more or less unchanged.
8 Discussion
Based on the simulation results, our recommendations are that the \(Z_{2,p}^{(max)}\) test should be used against asymmetric alternatives when \(p=2\). For higher \(p\), \(b_{1,p}\), \(Z_{2,p}^{(W)}\) or \(Z_{2,p}^{({ PB})}\) should be used instead. Mardia’s \(b_{2,p}\) test should be used against heavy-tailed symmetric alternatives. For short-tailed symmetric alternatives, one of the \(Z_{3,p}\) tests would be a better choice.
Most previous power studies for multivariate tests for normality have focused on alternatives with independent marginal distributions. Such distributions are likely to be rare in practice, and as is shown by the two distributions with \(Laplace(0,1)\) marginals used in our study, multivariate dependence structures can greatly affect the power of tests for normality.
To complicate matters further, some of the results in the tables highlight the fact that what holds true for one combination of \(p\) and \(n\) can be false for a different combination. For instance, when \(p=2\), \(Z_{2,p}^{(max)}\) had higher power than \(b_{1,p}\) for the \(AL(\mathbf {1},{\varvec{{\Sigma _{0}}}})\) and the multivariate \(\chi ^2_8\) alternatives, but when \(p=3\), \(Z_{2,p}^{(max)}\) had lower power than \(b_{1,p}\). This phenomenon merits further investigation, as it implies that power studies performed for low values of \(p\) can be misleading when choosing between tests to use for higher-dimensional data. Further examples of this phenomenon are given in Figs. 1 and 2.
In recent years, several authors have studied robust testing for normality, i.e. normality tests designed to be robust against outliers; see Stehlík et al. (2012) and Cerioli et al. (2013) for examples. Stehlík et al. (2014) proposed a robustified version of the univariate \(Z_{2,1}\) test. A robustified version of the multivariate \(Z_{2,p}\) test will appear in a future paper by the author.
Looking at the normal mixtures, which can be viewed as contaminated normal distributions, we see that \(Z_{2,p}^{(max)}\) and \(b_{1,p}\) were on a par for the mildly polluted mixtures (with a 9:1 mixing ratio) and that \(Z_{2,p}^{(max)}\) in general had higher power for the heavily polluted mixtures (with a 3:1 mixing ratio). This suggests the use of the \(Z_{2,p}^{(max)}\) statistic for a test for outliers, an idea that could perhaps be investigated further.
Implementations of the \(Z_{2,p}\) and \(Z_{3,p}\) tests in R are available from the author. Some critical values for the new tests are given in Table 6.
References
Bartlett MS (1939) A note on tests of significance in multivariate analysis. Math Proc Camb Philos Soc 35:180–185
Cerioli A, Farcomeni A, Riani M (2013) Robust distances for outlier-free goodness-of-fit testing. Comput Stat Data Anal 65:29–45
Doornik JA, Hansen H (2008) An omnibus test for univariate and multivariate normality. Oxf Bull Econ Stat 70:927–939
Dubkov AA, Malakhov AN (1976) Properties and interdependence of the cumulants of a random variable. Radiophys Quantum Electron 19:833–839
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188
Henderson HV, Searle SR (1979) Vec and vech operators for matrices, with some uses in Jacobians and multivariate statistics. Can J Stat 7:65–81
Henze N (2002) Invariant tests for multivariate normality: a critical review. Stat Pap 43:467–506
Kankainen A, Taskinen S, Oja H (2007) Tests of multinormality based on location vectors and scatter matrices. Stat Methods Appl 16:357–359
Kaplan EL (1952) Tensor notation and the sampling cumulants of k-statistics. Biometrika 39:319–323
Kollo T (2002) Multivariate skewness and kurtosis measures with an application in ICA. J Multivar Anal 99:2328–2338
Kollo T, von Rosen D (2005) Advanced multivariate statistics with matrices. Springer, Berlin. ISBN 978-1-4020-3418-3
Kotz S, Kozubowski TJ, Podgórski K (2000) An asymmetric multivariate Laplace distribution, Technical Report No. 367, Department of Statistics and Applied Probability, University of California at Santa Barbara
Kshirsagar AM (1972) Multivariate analysis. Marcel Dekker, ISBN 0-8247-1386-9
Lin C-C, Mudholkar GS (1980) A simple test for normality against asymmetric alternatives. Biometrika 67:455–461
Mardia KV (1970) Measures of multivariate skewness and kurtosis with applications. Biometrika 57:519–530
Mardia KV (1974) Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya Indian J Stat 36:115–128
Mardia KV, Kent JT (1991) Rao score tests for goodness of fit and independence. Biometrika 78:355–363
Mardia KV, Kent JT, Bibby JM (1979) Multivariate analysis. Academic Press, ISBN 0-12-471250-9
McCullagh P (1987) Tensor methods in statistics. Chapman and Hall, ISBN 0-412-27480-9
Mecklin CJ, Mundfrom DJ (2004) An appraisal and bibliography of tests for multivariate normality. Int Stat Rev 72:123–128
Mecklin CJ, Mundfrom DJ (2005) A Monte Carlo comparison of the type I and type II error rates of tests of multivariate normality. J Stat Comput Simul 75:93–107
Mudholkar GS, Marchetti CE, Lin CT (2002) Independence characterizations and testing normality against restricted skewness–kurtosis alternatives. J Stat Plan Inference 104:485–501
Stehlík M, Fabián Z, Střelec L (2012) Small sample robust testing for normality against Pareto tails. Commun Stat Simul Comput 41:1167–1194
Stehlík M, Střelec L, Thulin M (2014) On robust testing for normality in chemometrics. Chemom Intell Lab Syst 130:98–109
Thulin M (2010) On two simple tests for normality with high power. Pre-print, arXiv:1008.5319
Acknowledgments
The author wishes to thank the editor and two anonymous referees for comments that helped improve the paper, and Silvelyn Zwanzig for several helpful suggestions.
Appendix: proofs and tables
For the proofs of Theorems 3 and 4 we need some basic properties of the Kronecker product \(\otimes \) and the \(\mathrm{vec}\) and \(\mathrm{vech}\) operators from Henderson and Searle (1979). See also Kollo and von Rosen (2005) and Kollo (2002) for more on these tools from matrix algebra.
For a \(p\times q\) matrix \({\varvec{{A}}}=\{a_{ij}\}\) and an \(r\times s\) matrix \({\varvec{{B}}}\), the Kronecker product \({\varvec{{A}}}\otimes {\varvec{{B}}}\) is the \(pr\times qs\) matrix \(\{a_{ij}{\varvec{{B}}}\}\), \(i=1,\ldots ,p\), \(j=1,\ldots ,q\). The \(\mathrm{vec}\) operator stacks the columns of a matrix beneath each other, forming a single vector. If the columns of the \(p\times q\) matrix \({\varvec{{A}}}\) are denoted \({\varvec{{a_1}}},\ldots ,{\varvec{{a_q}}}\) then \(\mathrm{vec}({\varvec{{A}}})=({\varvec{{a_1'}}},\ldots ,{\varvec{{a_q'}}})'\) is a vector of length \(pq\).
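For readers who wish to experiment numerically, these definitions are easy to reproduce in code; the sketch below (with small hand-picked matrices that are not taken from the paper) illustrates the Kronecker product and the \(\mathrm{vec}\) operator.

```python
import numpy as np

# Small hand-picked matrices (not from the paper), used only to
# illustrate the definitions above.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# Kronecker product: the (pr) x (qs) block matrix {a_ij * B}.
K = np.kron(A, B)

def vec(M):
    # Stack the columns of M beneath each other into one long vector.
    return M.reshape(-1, order="F")

v = vec(A)  # (a_11, a_21, a_12, a_22)' = (1, 3, 2, 4)'
```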
We will use that

$$({\varvec{{A}}}\otimes {\varvec{{B}}})({\varvec{{C}}}\otimes {\varvec{{D}}})={\varvec{{AC}}}\otimes {\varvec{{BD}}}$$

and that if \({\varvec{{A}}}\) is a \(p\times p\) matrix and \({\varvec{{B}}}\) a \(q\times q\) matrix,

$$\det ({\varvec{{A}}}\otimes {\varvec{{B}}})=\det ({\varvec{{A}}})^{q}\det ({\varvec{{B}}})^{p}.$$
The \(\mathrm{vech}\) operator works as the \(\mathrm{vec}\) operator, except that it only contains each distinct element of the matrix once. For a symmetric matrix \({\varvec{{A}}}\), \(\mathrm{vech}({\varvec{{A}}})\) thus contains only the diagonal and the elements above the diagonal, whereas \(\mathrm{vec}({\varvec{{A}}})\) contains the diagonal elements and the off-diagonal elements twice.
We have the following relationship between the \(\mathrm{vec}\) operator and the Kronecker product:

$$\mathrm{vec}({\varvec{{ABC}}})=({\varvec{{C'}}}\otimes {\varvec{{A}}})\mathrm{vec}({\varvec{{B}}}).$$
Furthermore, for a given symmetric \(p\times p\) matrix \({\varvec{{A}}}\) there exists a \(p(p+1)/2\times p^2\) matrix \({\varvec{{H}}}\) and a \(p^2\times p(p+1)/2\) matrix \({\varvec{{G}}}\) such that

$$\mathrm{vech}({\varvec{{A}}})={\varvec{{H}}}\mathrm{vec}({\varvec{{A}}})\quad \text{ and }\quad \mathrm{vec}({\varvec{{A}}})={\varvec{{G}}}\mathrm{vech}({\varvec{{A}}}).$$
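The matrices \({\varvec{{H}}}\) and \({\varvec{{G}}}\) can be verified numerically; the index-based construction below is my own sketch (Henderson and Searle 1979 give closed forms).

```python
import numpy as np

# Index-based construction of H and G (my own sketch; Henderson and
# Searle 1979 give closed forms). vech keeps one copy of each distinct
# element of a symmetric matrix; G duplicates them back into vec.
p = 3
pairs = [(i, j) for j in range(p) for i in range(j, p)]  # (i, j) with i >= j
q = len(pairs)                                           # q = p(p+1)/2

H = np.zeros((q, p * p))
G = np.zeros((p * p, q))
for k, (i, j) in enumerate(pairs):
    H[k, j * p + i] = 1.0            # vec index of entry (i, j) is j*p + i
for i in range(p):
    for j in range(p):
        G[j * p + i, pairs.index((max(i, j), min(i, j)))] = 1.0

rng = np.random.default_rng(0)
B = rng.standard_normal((p, p))
A = B + B.T                          # a symmetric test matrix
vecA = A.reshape(-1, order="F")
vechA = np.array([A[i, j] for (i, j) in pairs])
assert np.allclose(H @ vecA, vechA)  # vech(A) = H vec(A)
assert np.allclose(G @ vechA, vecA)  # vec(A) = G vech(A)
```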
As a preparation for the proof of Theorem 3, we prove the following auxiliary lemma.
Lemma 1
Assume that \({\varvec{{X}}},{\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables fulfilling the conditions of Theorem 1. Let \(S_{ij}=(n-1)^{-1}\sum _{k=1}^n(X_{k,i}-\bar{X}_i)(X_{k,j}-\bar{X}_j)\) be the elements of the sample covariance matrix \({\varvec{{S}}}\). Then

$${\varvec{{u_X}}}=\mathrm{vech}({\varvec{{S}}})$$

is a vector with \(q=p(p+1)/2\) distinct elements. Denote its covariance matrix \(\mathrm{Cov}({\varvec{{u_X}}})={\varvec{{\Lambda _{22}}}}\).
Let \({\varvec{{A}}}\) be a nonsingular \(p\times p\) matrix and let \({\varvec{{b}}}\) be a \(p\)-dimensional vector. Then there exists a nonsingular \(q\times q\) matrix \({\varvec{{D}}}\) such that

(i) the sample variances and covariances of \({\varvec{{Y}}}={\varvec{{AX}}}+{\varvec{{b}}}\) are given by \({\varvec{{u_Y}}}={\varvec{{Du_X}}}\),

(ii) \(\mathrm{Cov}({\varvec{{u_Y}}})={\varvec{{D\Lambda _{22}D'}}}\) and

(iii) \(\det ({\varvec{{D}}})=\det ({\varvec{{A}}})^{p+1}\).
Proof
The transformed sample \({\varvec{{AX}}}+{\varvec{{b}}}\) has sample covariance matrix \({\varvec{{ASA'}}}\), so we wish to study \(\mathrm{vech}({\varvec{{ASA'}}})\). We have

$$\mathrm{vec}({\varvec{{ASA'}}})=({\varvec{{A}}}\otimes {\varvec{{A}}})\mathrm{vec}({\varvec{{S}}}).$$

Moreover, since \({\varvec{{S}}}\) is symmetric there exist matrices \({\varvec{{G}}}\) and \({\varvec{{H}}}\) such that

$$\mathrm{vec}({\varvec{{S}}})={\varvec{{G}}}\mathrm{vech}({\varvec{{S}}})\quad \text{ and }\quad \mathrm{vech}({\varvec{{ASA'}}})={\varvec{{H}}}\mathrm{vec}({\varvec{{ASA'}}}).$$

Thus

$$\mathrm{vech}({\varvec{{ASA'}}})={\varvec{{H}}}({\varvec{{A}}}\otimes {\varvec{{A}}}){\varvec{{G}}}\mathrm{vech}({\varvec{{S}}})=:{\varvec{{D}}}\mathrm{vech}({\varvec{{S}}}),$$

which establishes the existence of \({\varvec{{D}}}\). From Section 4.2 of Henderson and Searle (1979) we have

$$\det ({\varvec{{D}}})=\det ({\varvec{{H}}}({\varvec{{A}}}\otimes {\varvec{{A}}}){\varvec{{G}}})=\det ({\varvec{{A}}})^{p+1},$$
which is nonzero, since \({\varvec{{A}}}\) is nonsingular. \({\varvec{{D}}}\) is hence also nonsingular. In conclusion, we have established the existence and nonsingularity of \({\varvec{{D}}}\) as well as (i) and (iii). Finally, (ii) follows immediately from (i). \(\square \)
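A numerical sanity check of the lemma is straightforward; in the sketch below, \({\varvec{{D}}}={\varvec{{H}}}({\varvec{{A}}}\otimes {\varvec{{A}}}){\varvec{{G}}}\), and the index-based constructions of \({\varvec{{H}}}\) and \({\varvec{{G}}}\) are my own.

```python
import numpy as np

# Numerical check of Lemma 1 with D = H (A kron A) G; the index-based
# constructions of H and G are my own sketch.
p = 3
pairs = [(i, j) for j in range(p) for i in range(j, p)]
q = len(pairs)
H = np.zeros((q, p * p))
G = np.zeros((p * p, q))
for k, (i, j) in enumerate(pairs):
    H[k, j * p + i] = 1.0
for i in range(p):
    for j in range(p):
        G[j * p + i, pairs.index((max(i, j), min(i, j)))] = 1.0

rng = np.random.default_rng(1)
X = rng.standard_normal((50, p))       # a sample of size n = 50
S = np.cov(X, rowvar=False)            # sample covariance matrix
A = rng.standard_normal((p, p))        # a (generically) nonsingular A

D = H @ np.kron(A, A) @ G

def vech(M):
    return np.array([M[i, j] for (i, j) in pairs])

assert np.allclose(vech(A @ S @ A.T), D @ vech(S))       # part (i)
assert np.isclose(np.linalg.det(D),
                  np.linalg.det(A) ** (p + 1))           # part (iii)
```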
We now have the tools necessary to tackle Theorem 3.
Proof of Theorem 3
(i) From Theorem 10.2.4 in Mardia et al. (1979) we have that the canonical correlations between the random vectors \({\varvec{{Y}}}\) and \({\varvec{{Z}}}\) are invariant under the nonsingular linear transformations \({\varvec{{AY}}}+{\varvec{{b}}}\) and \({\varvec{{CZ}}}+{\varvec{{d}}}\). Clearly all five statistics are invariant under changes in location, since \({\varvec{{S_{11}}}}\), \({\varvec{{S_{22}}}}\), \({\varvec{{S_{12}}}}\) and \({\varvec{{S_{21}}}}\) all share that invariance property. It therefore suffices to show that the nonsingular linear transformation \({\varvec{{AX}}}\) induces nonsingular linear transformations \({\varvec{{C\bar{X}}}}\) and \({\varvec{{Du}}}\). \({\varvec{{C}}}={\varvec{{A}}}\) is immediate and the existence of \({\varvec{{D}}}\) is given by Lemma 1.

(ii) By part (ii) of Theorem 1, \(\mu _{ijk}=0\) for all \(i,j,k\) implies that \({\varvec{{\Lambda }}}_{12}={\varvec{{0}}}\). But then \({\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}}={\varvec{{0}}}\) and all canonical correlations are 0. If \(\mu _{ijk}\ne 0\) then \(\rho (\bar{X}_i,S_{jk})\ne 0\), so the linear combinations \({\varvec{{a'\bar{X}}}}=\bar{X}_i\) and \({\varvec{{b'u}}}=S_{jk}\) have nonzero correlation. \(\lambda _1\) must therefore be greater than 0.

(iii) This follows from the fact that the statistics are continuous functions of sample moments that converge almost surely. \(\square \)
The proofs of parts (ii) and (iii) of Theorem 4 are analogous to the previous proof. The proof of part (i) is however slightly different, as we do not explicitly give a matrix that yields a nonsingular linear transformation of \({\varvec{{v_X}}}\).
Proof of Theorem 4
(i) Let the third order central moment of a multivariate random variable \({\varvec{{Z}}}\) be

$$\bar{m}_3({\varvec{{Z}}})={ E }\left( ({\varvec{{Z}}}-{ E }{\varvec{{Z}}})\left( ({\varvec{{Z}}}-{ E }{\varvec{{Z}}})\otimes ({\varvec{{Z}}}-{ E }{\varvec{{Z}}})\right) '\right) .$$
Given a sample \({\varvec{{X}}}_1,\ldots ,{\varvec{{X}}}_n\), let \(S_{ijk}=\frac{n}{(n-1)(n-2)}\sum _{r=1}^n(X_{r,i}-\bar{X}_i)(X_{r,j}-\bar{X}_j)(X_{r,k}-\bar{X}_k)\). When the distribution of \({\varvec{{Z}}}\) is the empirical distribution of said sample,
Similarly \(\mathrm{vec}\left( \bar{m}_3({\varvec{{Z}}})\right) \) stacks the elements of \(\bar{m}_3({\varvec{{Z}}})\) in a vector that simply is \(\mathrm{vech}\left( \bar{m}_3({\varvec{{Z}}})\right) \) with a few repetitions:
Thus, for each linear combination \({\varvec{{a'}}}{\varvec{{w_X}}}\) there exists a \({\varvec{{b}}}\) so that \({\varvec{{b'}}}{\varvec{{v_X}}}={\varvec{{a'}}}{\varvec{{w_X}}}\) and therefore, by the definition of canonical correlations, the (sample) canonical correlations between \({\varvec{{\bar{X}}}}\) and \({\varvec{{v_X}}}\) are the same as those between \({\varvec{{\bar{X}}}}\) and \({\varvec{{w_X}}}\).
Writing \({\varvec{{Y}}}={\varvec{{Z}}}-{ E }{\varvec{{Z}}}\), we have \(\bar{m}_3({\varvec{{Z}}})={ E }\left( {\varvec{{Y}}}({\varvec{{Y}}}\otimes {\varvec{{Y}}})'\right) \) and

$$\bar{m}_3({\varvec{{AZ}}})={ E }\left( {\varvec{{AY}}}({\varvec{{AY}}}\otimes {\varvec{{AY}}})'\right) ={\varvec{{A}}}\,{ E }\left( {\varvec{{Y}}}({\varvec{{Y}}}\otimes {\varvec{{Y}}})'\right) ({\varvec{{A}}}\otimes {\varvec{{A}}})'={\varvec{{A}}}\bar{m}_3({\varvec{{Z}}})({\varvec{{A}}}\otimes {\varvec{{A}}})'.$$
Hence

$$\mathrm{vec}\left( \bar{m}_3({\varvec{{AZ}}})\right) =\left( ({\varvec{{A}}}\otimes {\varvec{{A}}})\otimes {\varvec{{A}}}\right) \mathrm{vec}\left( \bar{m}_3({\varvec{{Z}}})\right) =({\varvec{{A}}}\otimes {\varvec{{A}}}\otimes {\varvec{{A}}})\mathrm{vec}\left( \bar{m}_3({\varvec{{Z}}})\right) .$$
Now, \(\det ({\varvec{{A}}}\otimes {\varvec{{A}}}\otimes {\varvec{{A}}})=\det ({\varvec{{A}}}\otimes {\varvec{{A}}})^p\det ({\varvec{{A}}})^{p^2}=\det ({\varvec{{A}}})^{3p^2}\ne 0\), so \({\varvec{{E}}}:=({\varvec{{A}}}\otimes {\varvec{{A}}}\otimes {\varvec{{A}}})\) is a nonsingular matrix such that \(\mathrm{vec}\left( \bar{m}_3({\varvec{{AZ}}})\right) ={\varvec{{E}}}\,\mathrm{vec}\left( \bar{m}_3({\varvec{{Z}}})\right) \). Since canonical correlations are invariant under nonsingular linear transformations of the two sets of variables, this means that the canonical correlations between \({\varvec{{\bar{X}}}}\) and \({\varvec{{w_X}}}\) remain unchanged under the transformation \({\varvec{{AX}}}+{\varvec{{b}}}\). Thus the canonical correlations between \({\varvec{{\bar{X}}}}\) and \({\varvec{{v_X}}}\) must also necessarily remain unchanged. This proves the affine invariance of the statistics. \(\square \)
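The two matrix facts used in this proof can likewise be checked numerically; the sketch below uses a random \(p\times p^2\) matrix as an illustrative stand-in for \(\bar{m}_3({\varvec{{Z}}})\).

```python
import numpy as np

# Illustrative check (random stand-in M for the p x p^2 matrix m3(Z)):
# vec(A M (A kron A)') = (A kron A kron A) vec(M), and
# det(A kron A kron A) = det(A)^(3 p^2).
p = 2
rng = np.random.default_rng(2)
A = rng.standard_normal((p, p))
M = rng.standard_normal((p, p * p))

E = np.kron(np.kron(A, A), A)          # E = A kron A kron A, p^3 x p^3

def vec(Z):
    # Column-major stacking, matching the vec operator above.
    return Z.reshape(-1, order="F")

lhs = vec(A @ M @ np.kron(A, A).T)
rhs = E @ vec(M)
assert np.allclose(lhs, rhs)
assert np.isclose(np.linalg.det(E), np.linalg.det(A) ** (3 * p * p))
```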
Thulin, M. Tests for multivariate normality based on canonical correlations. Stat Methods Appl 23, 189–208 (2014). https://doi.org/10.1007/s10260-013-0252-5