1 Introduction

As evidenced by the papers Arcones (2007), Batsidis et al. (2013), Cardoso de Oliveira and Ferreira (2010), Ebner (2012), Enomoto et al. (2012), Farrell et al. (2007), Hanusz and Tarasińska (2008), Hanusz and Tarasińska (2012), Henze et al. (2018), Joenssen and Vogel (2014), Jönsson (2011), Kim (2016), Koizumi et al. (2014), Mecklin and Mundfrom (2005), Pudełko (2005), Székely and Rizzo (2005), Tenreiro (2011), Tenreiro (2017), Thulin (2014), Villaseñor-Alva and Estrada (2009), Voinov et al. (2016), Yamada et al. (2015), and Zhou and Shao (2014), there is an ongoing interest in the problem of testing for multivariate normality. Without claiming to be exhaustive, the above list probably covers most of the publications in this field since the review paper of Henze (2002).

Recently, Henze and Koch (2017) provided the hitherto missing theory for a test of univariate normality suggested by Zghoul (2010). The purpose of this paper is twofold. First, we generalize the results of Henze and Koch (2017) to the multivariate case, thus obtaining a class of affine invariant and consistent tests for multivariate normality. Second, whereas that paper (like most of the other publications cited above) considered only independent and identically distributed (i.i.d.) observations, we also provide the asymptotics of our test statistics in the context of GARCH-type dependence.

To be more specific, let (for the time being) \(X, X_1,X_2, \ldots \) be a sequence of i.i.d. d-variate random column vectors that are defined on a common probability space \((\Omega ,\mathcal{A},\mathbb {P})\). We assume that the distribution \(\mathbb {P}^{X}\) of X is absolutely continuous with respect to Lebesgue measure. Let N\(_d(\mu ,\Sigma )\) denote the d-variate normal distribution with mean vector \(\mu \) and non-degenerate covariance matrix \(\Sigma \), and write \(\mathcal{N}_d\) for the class of all non-degenerate d-dimensional normal distributions. A test for multivariate normality is a test of the null hypothesis

$$\begin{aligned} H_0: \ \mathbb {P}^X \in \mathcal{{N}}_d, \end{aligned}$$

and usually such a test should be consistent against any fixed non-normal alternative distribution. Since the class \(\mathcal{N}_d\) is closed with respect to full rank affine transformations, any genuine test statistic \(T_n = T_n(X_1,\ldots ,X_n)\) based on \(X_1,\ldots ,X_n\) should also be affine invariant, i.e., we should have \( T_n(AX_1+b, \ldots , AX_n+b) = T_n(X_1,\ldots ,X_n) \) for each nonsingular \(d \times d\)-matrix A and each \(b \in \mathbb {R}^d\); see Henze (2002) for a critical account of affine invariant tests for multivariate normality.

In what follows, let \(\overline{X}_n = n^{-1}\sum _{j=1}^n X_j\), \(S_n = n^{-1}\sum _{j=1}^n (X_j-\overline{X}_n)(X_j-\overline{X}_n)^\top \) denote the sample mean and the sample covariance matrix of \(X_1,\ldots ,X_n\), respectively, where \(\top \) means transposition of vectors and matrices. Furthermore, let

$$\begin{aligned} Y_{n,j} = S_n^{-1/2}(X_j - \overline{X}_n), \qquad j=1,\ldots , n, \end{aligned}$$

be the so-called scaled residuals of \(X_1,\ldots ,X_n\), which provide an empirical standardization of \(X_1,\ldots ,X_n\). Here, \(S_n^{-1/2}\) denotes the unique symmetric square root of \(S_n\). Notice that \(S_n\) is invertible with probability one provided that \(n \ge d+1\); see Eaton and Perlman (1973). The latter condition is tacitly assumed to hold in what follows. Under \(H_0\), the empirical moment generating function

$$\begin{aligned} M_n(t) = \frac{1}{n} \sum _{j=1}^n \exp \left( t^\top Y_{n,j}\right) , \quad t\in \mathbb {R}^d, \end{aligned}$$
(1.1)

of \(Y_{n,1}, \ldots ,Y_{n,n}\) should be close to

$$\begin{aligned} m(t)=\exp \left( \Vert t\Vert ^2/2\right) , \end{aligned}$$

which is the moment generating function of the standard normal distribution N\(_d(0,\text {I}_d)\). Here and in the sequel, \(\Vert \cdot \Vert \) stands for the Euclidean norm on \(\mathbb {R}^d\), and I\(_d\) is the unit matrix of order d.

The statistic proposed in this paper is the weighted \(L^2\)-statistic

$$\begin{aligned} T_{n,\beta } = n \int _{\mathbb {R}^d} \left( M_n(t) - m(t) \right) ^2 \, w_\beta (t) \, \text {d}t, \end{aligned}$$
(1.2)

where

$$\begin{aligned} w_\beta (t) = \exp \left( -\beta \Vert t\Vert ^2 \right) , \end{aligned}$$
(1.3)

and \(\beta >1\) is some fixed parameter, the role of which will be discussed later. Notice that \(T_{n,\beta }\) is the ‘moment generating function analogue’ to the BHEP-statistics for testing for multivariate normality [see, e.g., Baringhaus and Henze (1988), Henze and Zirkler (1990), and Henze and Wagner (1997)]. The latter statistics originate if one replaces \(M_n(t)\) with the empirical characteristic function of the scaled residuals and m(t) with the characteristic function \(\exp (-\Vert t\Vert ^2/2)\) of the standard normal distribution N\(_d(0,{\text {I}}_d)\). For a general account on weighted \(L^2\)-statistics see, e.g., Baringhaus et al. (2017).

In principle, one could replace \(w_\beta \) in (1.3) with a more general weight function satisfying some general conditions. The above special choice, however, leads to a test criterion with certain extremely appealing features, since straightforward calculations yield the representation

$$\begin{aligned} T_{n,\beta }= & {} \pi ^{d/2} \left( \frac{1}{n} \sum _{i,j=1}^n \frac{1}{\beta ^{d/2}} \exp \left( \frac{\Vert Y_{n,i}+Y_{n,j}\Vert ^2}{4 \beta } \right) + \frac{n}{(\beta -1)^{d/2}} \right. \nonumber \\&\quad \left. -\,2 \sum _{j=1}^n \frac{1}{(\beta -1/2)^{d/2}} \exp \left( \frac{\Vert Y_{n,j}\Vert ^2}{4 \beta -2} \right) \right) , \end{aligned}$$
(1.4)

which is convenient for computational purposes. Notice that the condition \(\beta >1\) is necessary for the integral in (1.2) to be finite. Later, we will impose the further restriction \(\beta >2\) in order to prove that \(T_{n,\beta }\) has a non-degenerate limit null distribution as \(n \rightarrow \infty \). We remark that \(T_{n,\beta }\) is affine invariant, since it depends only on the Mahalanobis angles and distances \(Y_{n,i}^\top Y_{n,j}\), \(1 \le i,j \le n\). Rejection of \(H_0\) is for large values of \(T_{n,\beta }\).
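
For concreteness, the following minimal sketch (in R, the language used for all computations in Sect. 6) evaluates \(T_{n,\beta }\) via (1.4) for an \(n \times d\) data matrix; the function name Tn_beta is ours, and the eigendecomposition is just one of several ways of obtaining the symmetric square root \(S_n^{-1/2}\).

  ## Minimal sketch: T_{n,beta} of (1.4) for an n x d data matrix x
  ## (requires n >= d+1 and beta > 1).
  Tn_beta <- function(x, beta) {
    n <- nrow(x); d <- ncol(x)
    xc <- sweep(x, 2, colMeans(x))     # centered observations
    S <- crossprod(xc) / n             # sample covariance matrix S_n
    e <- eigen(S, symmetric = TRUE)
    Sinvhalf <- e$vectors %*% diag(1 / sqrt(e$values), d) %*% t(e$vectors)
    Y <- xc %*% Sinvhalf               # scaled residuals Y_{n,j} (rows)
    G <- tcrossprod(Y)                 # Gram matrix: G[i, j] = Y_i' Y_j
    sq <- diag(G)                      # squared norms ||Y_j||^2
    t1 <- sum(exp((outer(sq, sq, "+") + 2 * G) / (4 * beta))) / (n * beta^(d / 2))
    t2 <- n / (beta - 1)^(d / 2)
    t3 <- 2 * sum(exp(sq / (4 * beta - 2))) / (beta - 1 / 2)^(d / 2)
    pi^(d / 2) * (t1 + t2 - t3)        # uses ||Y_i + Y_j||^2 = sq_i + sq_j + 2 G[i, j]
  }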

The rest of the paper unfolds as follows. The next section shows that letting \(\beta \) tend to infinity in (1.2) yields a linear combination of two well-known measures of multivariate skewness. In Sect. 3, we derive the limit null distribution of \(T_{n,\beta }\) in the i.i.d. setting. Section 4 addresses the question of consistency of the new tests against general alternatives, while Sect. 5 considers the new criterion in the context of multivariate GARCH models in order to test for normality of innovations, and it provides the pertaining large-sample theory. Section 6 presents a Monte Carlo study that compares the new tests with competing ones, and it considers a real data set from the financial market. The article concludes with a discussion in Sect. 7.

2 The case \(\beta \rightarrow \infty \)

In this section, we show that the statistic \(T_{n,\beta }\), after a suitable scaling, approaches a linear combination of two well-known measures of multivariate skewness as \(\beta \rightarrow \infty \).

Theorem 2.1

We have (elementwise on the underlying probability space)

$$\begin{aligned} \lim _{\beta \rightarrow \infty } \beta ^{3+d/2} \, \frac{96 T_{n,\beta }}{n \pi ^{d/2}} = 2b_{1,d} + 3{\widetilde{b}}_{1,d}, \end{aligned}$$

where

$$\begin{aligned} b_{1,d} = \frac{1}{n^2} \sum _{j,k=1}^n \left( Y_{n,j}^\top Y_{n,k}\right) ^3, \quad {\widetilde{b}}_{1,d} = \frac{1}{n^2} \sum _{j,k=1}^n Y_{n,j}^\top Y_{n,k} \, \Vert Y_{n,j}\Vert ^2 \, \Vert Y_{n,k}\Vert ^2 \end{aligned}$$

are multivariate sample skewness in the sense of Mardia (1970) and Móri et al. (1993), respectively.

Proof

Let \(b_{2,d} = n^{-1}\sum _{j=1}^n \Vert Y_{n,j}\Vert ^4\) denote multivariate sample kurtosis in the sense of Mardia (1970). From (1.4) and

$$\begin{aligned} \exp (y) = 1 + y + \frac{y^2}{2} + \frac{y^3}{6} + O(y^4) \end{aligned}$$

as \(y \rightarrow 0\), the result follows by very tedious but straightforward calculations, using the relations \(\sum _{j=1}^n Y_{n,j} = 0\), \(\sum _{j=1}^n \Vert Y_{n,j}\Vert ^2 = nd\) and \(\sum _{j,k=1}^n \Vert Y_{n,j}+ Y_{n,k}\Vert ^2 = 2n^2d\), as well as the expansions of \(\sum _{j,k=1}^n \Vert Y_{n,j}+ Y_{n,k}\Vert ^4\) and \(\sum _{j,k=1}^n \Vert Y_{n,j}+ Y_{n,k}\Vert ^6\) in terms of \(b_{1,d}\), \({\widetilde{b}}_{1,d}\), \(b_{2,d}\) and \(\sum _{j=1}^n \Vert Y_{n,j}\Vert ^6\).

For the derivation of the last of these expansions, see the proof of Theorem 4.1 of Henze et al. (2018). We stress that although \(b_{2,d}\) and \(\sum _{j=1}^n \Vert Y_{n,j}\Vert ^6\) show up in these expansions, the terms cancel out in the derivation of the final result. \(\square \)

Remark 2.2

Interestingly, as \(\beta \rightarrow \infty \), \(T_{n,\beta }\) exhibits the same limit behavior as both the statistic studied by Henze et al. (2018), which is based on a weighted \(L^2\)-distance involving both the empirical characteristic function and the empirical moment generating function, and the BHEP-statistic for testing for multivariate normality, which is based on the empirical characteristic function; see Theorem 2.1 of Henze (1997). At first sight, Theorem 2.1 seems to differ from Theorem 4 of Henze and Koch (2017), which covers the special case \(d=1\), but a careful analysis shows that (with the notation \(\tau (\beta )\) of that paper) we have \(\lim _{\beta \rightarrow \infty } \beta ^{7/2} \tau (\beta ) =0\).
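
Theorem 2.1 lends itself to a quick numerical check; the following sketch reuses the Tn_beta function from Sect. 1 (an assumption of this illustration). Since (1.4) suffers from floating-point cancellation for very large \(\beta \), a moderate value such as \(\beta =200\) is advisable, for which the two sides should agree up to a relative error of order \(1/\beta \).

  ## Numerical check of Theorem 2.1 on a fixed sample (assumes Tn_beta above).
  set.seed(1)
  n <- 50; d <- 3; beta <- 200
  x <- matrix(rnorm(n * d), n, d)
  xc <- sweep(x, 2, colMeans(x))
  e <- eigen(crossprod(xc) / n, symmetric = TRUE)
  Y <- xc %*% (e$vectors %*% diag(1 / sqrt(e$values), d) %*% t(e$vectors))
  G <- tcrossprod(Y); sq <- diag(G)
  b1  <- sum(G^3) / n^2                  # Mardia (1970) skewness b_{1,d}
  b1t <- sum(G * outer(sq, sq)) / n^2    # Mori et al. (1993) skewness
  c(lhs = beta^(3 + d / 2) * 96 * Tn_beta(x, beta) / (n * pi^(d / 2)),
    rhs = 2 * b1 + 3 * b1t)              # the two entries should nearly coincide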

3 Asymptotic null distribution in the i.i.d. case

In this section, we consider the case that \(X_1,X_2, \ldots \) are i.i.d. d-dimensional random vectors with some non-degenerate normal distribution. The key observation for deriving the limit distribution of \(T_{n,\beta }\) is the fact that

$$\begin{aligned} T_{n,\beta } \ = \ \int _{\mathbb {R}^d} W_n(t)^2 \, w_\beta (t) \, \text {d}t, \end{aligned}$$

where

$$\begin{aligned} W_n(t) = \sqrt{n}\left( M_n(t) - m(t)\right) , \quad t \in \mathbb {R}^d, \end{aligned}$$
(3.1)

with \(M_n(t)\) given in (1.1). Notice that \(W_n\) is a random element of the Hilbert space

$$\begin{aligned} \text {L}^2_\beta := \text {L}^2\left( \mathbb {R}^d,\mathcal{B}^d,w_\beta (t)\text {d}t\right) \end{aligned}$$
(3.2)

of (equivalence classes of) measurable functions \(f:\mathbb {R}^d \rightarrow \mathbb {R}\) that are square integrable with respect to the finite measure on the \(\sigma \)-field \(\mathcal{B}^d\) of Borel sets of \(\mathbb {R}^d\) given by the weight function \(w_\beta \) defined in (1.3). The inner product and the resulting norm in \(\text {L}_\beta ^2\) will be denoted by \(\langle f,g \rangle = \int _{\mathbb {R}^d} f(t)g(t) \, w_\beta (t) \, \text {d}t\) and \(\Vert f\Vert _{\text {L}_\beta ^2} = \sqrt{\langle f,f \rangle }\), respectively. With this notation, \(T_{n,\beta }\) takes the form

$$\begin{aligned} T_{n,\beta } \ = \ \Vert W_n\Vert ^2_{\text {L}^2_\beta }. \end{aligned}$$
(3.3)

Writing “\({\mathop {\longrightarrow }\limits ^{\mathcal{D}}}\)” for convergence in distribution of random vectors and stochastic processes, the main result of this section is as follows.

Theorem 3.1

(Convergence of \(W_n\) under \(H_0\))

Suppose that X has some non-degenerate d-variate normal distribution, and that \(\beta >2\) in (1.3). Then, there is a centered Gaussian random element W of \({\text {L}}^2_\beta \) having covariance kernel

$$\begin{aligned} C(s,t) = \exp \left( \frac{\Vert s\Vert ^2+\Vert t\Vert ^2}{2} \right) \left( \mathrm{{e}}^{s^\top t} -1 - s^\top t - \frac{\left( s^\top t \right) ^2}{2} \right) , \quad s,t \in \mathbb {R}^d, \end{aligned}$$

so that \(W_n {\mathop {\longrightarrow }\limits ^{\mathcal{D}}}W\) as \(n \rightarrow \infty \).

In view of (3.3), the Continuous Mapping Theorem yields the following result.

Corollary 3.2

If \(\beta >2\), then, under the null hypothesis \(H_0\),

$$\begin{aligned} T_{n, \beta } {\mathop {\longrightarrow }\limits ^{\mathcal{D}}}\Vert W\Vert ^2_{{\text {L}}^2_\beta } \ \ \text {as} \ n \rightarrow \infty . \end{aligned}$$

Remark 3.3

The distribution of \(T_{\infty ,\beta } := \Vert W\Vert ^2_{{\text {L}}^2_\beta }\) (say) is that of \(\sum _{j=1}^\infty \lambda _jN_j^2\), where \(\lambda _1, \lambda _2, \ldots \) are the positive eigenvalues of the integral operator \(f \mapsto Af\) on \({\text {L}}^2_\beta \) associated with the kernel C given in Theorem 3.1, i.e., \((Af)(t) = \int C(s,t) f(s) \exp (-\beta \Vert s\Vert ^2) \text {d}s\), and \(N_1,N_2, \ldots \) are i.i.d. standard normal random variables. We did not succeed in obtaining explicit solutions of the associated eigenvalue equation \(Af = \lambda f\). However, since

$$\begin{aligned} \mathbb E (T_{\infty ,\beta })= & {} \int _{\mathbb {R}^d} C(t,t) \, w_\beta (t) \, \text {d}t,\\ \mathbb {V}(T_{\infty ,\beta })= & {} 2 \int _{\mathbb {R}^d} \int _{\mathbb {R}^d} C^2(s,t) w_\beta (s) w_\beta (t) \, \text {d}s \text {d}t \end{aligned}$$

(see Shorack and Wellner 1986, p. 213), tedious but straightforward manipulations of integrals yield the following result, which generalizes Theorem 2 of Henze and Koch (2017).

Theorem 3.4

If \(\beta >2\), we have

  1. (a)
    $$\begin{aligned} \mathbb {E}(T_{\infty ,\beta }) = \pi ^{d/2} \left( \frac{1}{(\beta -2)^{d/2}} - \frac{1}{(\beta -1)^{d/2}} - \frac{d}{2(\beta -1)^{d/2+1}} - \frac{d(d+2)}{8 (\beta -1)^{d/2+2}}\right) ,\\ \end{aligned}$$
  2. (b)
    $$\begin{aligned} \mathbb {V}(T_{\infty ,\beta })= & {} 2\pi ^d\left( \frac{1}{(\beta (\beta -2))^{d/2}} - \frac{2^{d+1}}{\eta ^{d/2}} - \frac{(1+2d)2^d}{\eta ^{d/2+1}} - \frac{d(d+2)2^d}{\eta ^{d/2 +2}} \right. \\&\qquad \left. +\,\frac{1}{(\beta -1)^d} + \frac{d}{2(\beta -1)^{d+2}} + \frac{3d(d+2)}{64 (\beta -1)^{d+4}}\right) , \end{aligned}$$

where \(\eta = 4(\beta -1)^2-1\).
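
For numerical purposes, the formulas of Theorem 3.4 can be transcribed directly; a sketch in R follows (the function name is ours), which may be used, e.g., for a moment-based approximation of the limit null distribution.

  ## Mean and variance of the limit null distribution T_{infty,beta}
  ## according to Theorem 3.4 (requires beta > 2).
  limit_moments <- function(beta, d) {
    stopifnot(beta > 2)
    E <- pi^(d / 2) * (1 / (beta - 2)^(d / 2) - 1 / (beta - 1)^(d / 2)
                       - d / (2 * (beta - 1)^(d / 2 + 1))
                       - d * (d + 2) / (8 * (beta - 1)^(d / 2 + 2)))
    eta <- 4 * (beta - 1)^2 - 1
    V <- 2 * pi^d * (1 / (beta * (beta - 2))^(d / 2) - 2^(d + 1) / eta^(d / 2)
                     - (1 + 2 * d) * 2^d / eta^(d / 2 + 1)
                     - d * (d + 2) * 2^d / eta^(d / 2 + 2)
                     + 1 / (beta - 1)^d + d / (2 * (beta - 1)^(d + 2))
                     + 3 * d * (d + 2) / (64 * (beta - 1)^(d + 4)))
    c(mean = E, var = V)
  }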

Proof of Theorem 3.1

In view of affine invariance, we assume w.l.o.g. that the distribution of X is N\(_d(0,\mathrm{{I}}_d)\). In Henze et al. (2018), the authors considered the “exponentially down-weighted empirical moment generating function process”

$$\begin{aligned} A_n(t) = \exp \left( - \frac{\Vert t\Vert ^2}{2}\right) \, M_n(t), \quad t \in \mathbb {R}^d. \end{aligned}$$
(3.4)

Notice that, with the notation given in (3.2), we have

$$\begin{aligned} \Vert A_n\Vert ^2_{\text {L}^2_{\gamma }} = \Vert M_n\Vert ^2_{\text {L}^2_{\beta }}, \end{aligned}$$

where \(\gamma = \beta -1\). From display (10.5) and Propositions 10.3 and 10.4 of Henze et al. (2018), we have

$$\begin{aligned} \sqrt{n}\left( A_n(t) - 1 \right) = \exp \left( - \frac{\Vert t\Vert ^2}{2} \right) \sqrt{n} \left( \frac{1}{n}\sum _{j=1}^n \mathrm{{e}}^{t^\top X_j} - m(t)\right) + V_{n}(t) + R_n(t), \end{aligned}$$

where \(\int _{\mathbb {R}^d} R_n^2(t) w_\gamma (t)\text {d} t = o_\mathbb {P}(1)\), and

$$\begin{aligned} V_{n}(t) = - \frac{1}{2\sqrt{n}} \sum _{j=1}^n \left( \left( t^\top X_j\right) ^2 - \Vert t\Vert ^2\right) - \frac{1}{\sqrt{n}} \sum _{j=1}^n t^\top X_j. \end{aligned}$$

Since \(W_n(t) = m(t)\sqrt{n}\left( A_n(t)-1\right) \) by (3.1) and (3.4), this representation yields

$$\begin{aligned} W_n(t) = \frac{1}{\sqrt{n}} \sum _{j=1}^n Z_j(t) + m(t)R_n(t), \end{aligned}$$

where

$$\begin{aligned} Z_j(t) = \mathrm{{e}}^{t^\top X_j} - m(t) - \frac{m(t)}{2} \left( \left( t^\top X_j\right) ^2 - \Vert t\Vert ^2\right) - m(t) t^\top X_j. \end{aligned}$$

Notice that \(Z_1,Z_2, \ldots \) are i.i.d. centered random elements of L\(^2_\beta \). Since

$$\begin{aligned} \int _{\mathbb {R}^d} (m(t)R_n(t))^2 w_\beta (t)\, \text {d}t = \int _{\mathbb {R}^d} R_n^2(t) w_\gamma (t) \, \text {d} t = o_\mathbb {P}(1), \end{aligned}$$

a Central Limit Theorem in Hilbert spaces [see e.g., Bosq (2000)] shows that there is a centered Gaussian random element W of L\(^2_\beta \), so that \(W_n {\mathop {\longrightarrow }\limits ^{\mathcal{D}}}W.\) Using the fact that \(t^\top X\) has the normal distribution N\((0,\Vert t\Vert ^2)\) and the relations

$$\begin{aligned} \mathbb {E}\left[ \mathrm{{e}}^{s^\top X} \left( t^\top X\right) ^2 \right]= & {} m(s)\left( \left( s^\top t\right) ^2 + \Vert t\Vert ^2\right) ,\\ \mathbb {E}\left[ \mathrm{{e}}^{s^\top X} t^\top X \right]= & {} m(s) s^\top t,\\ \mathbb {E}\left[ \left( s^\top X\right) ^2 \left( t^\top X\right) ^2 \right]= & {} 2 \left( s^\top t\right) ^2 + \Vert s\Vert ^2 \, \Vert t\Vert ^2, \end{aligned}$$

some straightforward algebra shows that the covariance kernel \(C(s,t)\) figuring in the statement of Theorem 3.1 equals \(\mathbb {E}Z_1(s)Z_1(t)\). \(\square \)

4 Consistency

The next result shows that the test for multivariate normality based on \(T_{n,\beta }\) is consistent against general alternatives.

Theorem 4.1

Suppose X has some absolutely continuous distribution, and that \(M_X(t) = \mathbb {E}[\exp (t^\top X)] < \infty \), \(t \in \mathbb {R}^d\). Furthermore, let \({\widetilde{X}} = \Sigma ^{-1/2}(X-\mu )\), where \(\mu = \mathbb {E}(X)\) and \(\Sigma ^{-1/2}\) is the symmetric square root of the inverse of the covariance matrix \(\Sigma \) of X. Letting \(M_{{\widetilde{X}}}(t) = \mathbb {E}[\exp (t^\top {\widetilde{X}})]\), we have

$$\begin{aligned} \liminf _{n \rightarrow \infty } \frac{T_{n,\beta }}{n} \ \ge \ \int _{\mathbb {R}^d} \left( M_{{\widetilde{X}}}(t) -m(t) \right) ^2 \, w_\beta (t) \, \mathrm {d} t \end{aligned}$$

almost surely.

Proof of Theorem 4.1

Because of affine invariance we may w.l.o.g. assume \(\mathbb {E}X =0\) and \(\Sigma = \text {I}_d\). Fix \(K>0\) and put \(M_n^\circ (t) = n^{-1}\sum _{j=1}^n \exp (t^\top X_j)\). From the proof of Theorem 6.1 of Henze et al. (2018) we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \max _{\Vert t\Vert \le K} \left| M_n(t) - M_n^\circ (t) \right| = 0 \end{aligned}$$

\(\mathbb {P}\)-almost surely. Now, the strong law of large numbers in the Banach space of continuous functions on \(B(K):=\{t \in \mathbb {R}^d:\Vert t\Vert \le K\}\) and Fatou’s lemma yield

$$\begin{aligned} \liminf _{n\rightarrow \infty } \frac{T_{n,\beta }}{n}\ge & {} \liminf _{n \rightarrow \infty } \int _{B(K)} \left( M_n(t) - m(t)\right) ^2 w_\beta (t) \, \text {d}t\\\ge & {} \int _{B(K)} \left( \mathbb {E}\text {e}^{t^\top X}- m(t)\right) ^2 w_\beta (t) \, \text {d}t \end{aligned}$$

\(\mathbb {P}\)-almost surely. Since K is arbitrary, the assertion follows. \(\square \)

Now, suppose that X has an alternative distribution (which is assumed to be standardized) satisfying the conditions of Theorem 4.1. Since \(\mathbb {E}\exp (t^\top X) - m(t) \ne 0\) for at least one t, Theorem 4.1 shows that \(\lim _{n\rightarrow \infty } T_{n,\beta } = \infty \) \(\mathbb {P}\)-almost surely. Since, for any given nominal level \(\alpha \in (0,1)\), the sequence of critical values of a level-\(\alpha \) test that rejects \(H_0\) for large values of \(T_{n,\beta }\) converges by Corollary 3.2, this test is consistent against each such alternative. It should be ‘all the more consistent’ against any distribution that does not satisfy the conditions of Theorem 4.1; in view of the reasoning given in Csörgő (1989), however, the behavior of \(T_{n,\beta }\) against such alternatives is a difficult problem.

5 Testing for normality in GARCH models

In this section, we consider the multivariate GARCH (MGARCH) model

$$\begin{aligned} X_j=\Sigma _j^{1/2}(\theta )\varepsilon _j, \quad j\in \mathbb {Z}, \end{aligned}$$
(5.1)

where \(\theta \in \Theta \subseteq \mathbb R^v\) is a v-dimensional vector of unknown parameters. The unobservable random errors or innovations \(\{\varepsilon _j, \,j \in \mathbb {Z}\}\) are i.i.d. copies of a d-dimensional random vector \(\varepsilon \), which is assumed to have mean zero and unit covariance matrix. Hence,

$$\begin{aligned} \Sigma _j(\theta )=\Sigma (\theta ; X_{j-1},X_{j-2}, \ldots ) \end{aligned}$$

is the conditional covariance matrix of \(X_j\), given \(X_{j-1},X_{j-2}, \ldots \). The explicit expression of \(\Sigma _j(\theta )\) depends on the assumed MGARCH model (see, e.g., Francq and Zakoïan 2010, for a detailed description of several relevant models). The interest in testing for normality of the innovations stems from the fact that this distributional assumption is made in some applications, and that, if erroneously accepted, some inferential procedures can lead to wrong conclusions (see, e.g., Spierdijk 2016, for the effect on the assessment of standard risk measures such as the value at risk).

Therefore, an important step in the analysis of GARCH models is to check whether the data support the distributional hypotheses made on the innovations. For this reason, a number of goodness-of-fit tests have been proposed for the innovation distribution. The papers by Klar et al. (2012) and Ghoudi and Rémillard (2014) contain an extensive review of such tests as well as some numerical comparisons between them for the special case of testing for univariate normality. The proposals for testing goodness-of-fit in the multivariate case are rather scarce.

The class of GARCH models has proved to be particularly valuable in modeling financial data. As discussed, among others, by Rydberg (2000), one of the stylized features of financial data is that they are heavy-tailed. From an extensive simulation study (a summary is reported in Sect. 6), we learnt that, for i.i.d. data, the test of normality based on \(T_{n,\beta }\) exhibits high power against heavy-tailed distributions. For these reasons, this section is devoted to adapting that procedure to testing for normality of the innovations based on data \(X_1, \ldots , X_n\) that are driven by equation (5.1). Therefore, on the basis of the observations, we wish to test the null hypothesis

$$\begin{aligned} {H}_{0,G}: \ \mathrm{{The \ law \ of}} \ \varepsilon \ \mathrm{{is}} \ \text {N}_d(0,{ \mathrm{I}}_d) \end{aligned}$$

against general alternatives. Notice that \({{H}}_{0,G}\) is equivalent to the hypothesis that, conditionally on \(\{X_{j-1},X_{j-2},\ldots \}\), the law of \(X_j\) is N\(_d(0,\Sigma _j(\theta ))\), for some \(\theta \in \Theta \). Two main differences with respect to the i.i.d. case are: (a) the innovations in (5.1) are assumed to be centered at zero with unit covariance matrix; and (b) the conditional covariance matrix \(\Sigma _j(\theta )\) of \(X_j\) is time-varying in a way that depends on the unknown parameter \(\theta \) and on past observations.

Notice that although \({{H}}_{0,G}\) is about the distribution of \(\varepsilon \), the innovations are unobservable in the context of model (5.1). Hence, any inference on the distribution of the innovations should be based on the residuals

$$\begin{aligned} {\widetilde{\varepsilon }}_j(\widehat{\theta }_n)={\widetilde{ \Sigma }}_j^{-1/2}(\widehat{\theta }_n)X_j, \quad 1 \le j \le n. \end{aligned}$$

Recall that \(\Sigma _j(\theta )=\Sigma (\theta ; X_{j-1},X_{j-2},\dots )\), but we only observe \(X_1, \ldots , X_n\). Therefore, to estimate \(\Sigma _j(\theta )\), apart from a suitable estimator \(\widehat{\theta }_n\) of \(\theta \), we also need to specify values for \(\{X_j, \ j\le 0\}\), say \(\{\widetilde{X}_j, \ j\le 0\}\). So we write \({\widetilde{ \Sigma }}_j({\theta })\) for \(\Sigma (\theta ; X_{j-1}, \ldots , X_{1},{\widetilde{X}}_0,{\widetilde{X}}_{-1}, \ldots )\). Under certain conditions, these arbitrarily fixed initial values are asymptotically irrelevant.

Taking into account that the innovations have mean zero and unit covariance matrix, we will work directly with the residuals, without standardizing them. Let \(M_n^G\) be defined as \(M_n\) in (1.1), with \(Y_{n,j}\) replaced by \({\widetilde{\varepsilon }}_j(\widehat{\theta }_n)\), \(1\le j \le n\), and define \(T_{n, \beta }^G\) as \(T_{n, \beta }\) in (3.3), with \(W_n\) replaced by \(W_n^G\), where \(W_n^G\) is defined as \(W_n\) in (3.1) with \(M_n\) replaced by \(M_n^G\). In order to derive the asymptotic null distribution of \(W_n^G\), we will make assumptions (A.1)–(A.6) below. In the sequel, \(C>0\) and \(\varrho \), \(0< \varrho < 1\), denote generic constants, the values of which may vary across the text, \(\theta _0\) stands for the true value of \(\theta \), and, for any matrix \(A=(a_{kj})\), \(\Vert A \Vert =\sum _{k,j}|a_{k j}|\) denotes the \(l_1\)-norm (we use the same notation as for the Euclidean norm of vectors).

  1. (A.1)

The estimator \(\widehat{\theta }_n\) satisfies \( \sqrt{n}(\widehat{\theta }_n-\theta _0)= n^{-1/2} \sum _{j=1}^nL_j +o_{\mathbb {P}}(1), \) where \(L_j=h_j g_j\), \(g_j=g(\theta _0; \varepsilon _j)\) is a vector of \(d^2\) measurable functions such that \( \mathbb E ( g_j)=0\) and \(\mathbb E ( g_j^\top g_j)<\infty \), and \(h_j=h (\theta _0; \varepsilon _{j-1}, \varepsilon _{j-2}, \ldots )\) is a \(v\times d^2\)-matrix of measurable functions satisfying \(\mathbb E ( \Vert h_j h_j^\top \Vert ^2)<\infty \),

  2. (A.2)

    \(\sup _{\theta \in \Theta }\left\| \widetilde{\Sigma }^{-1/2}_{j}(\theta )\right\| \le C,\,\, 1\le j \le n, \quad \sup _{\theta \in \Theta }\left\| \Sigma ^{-1/2}_{j}(\theta )\right\| \le C, \, \, j \in \mathbb {Z},\quad \mathbb {P}\text{-a.s. }\),

  3. (A.3)

    \(\sup _{\theta \in \Theta }\Vert \Sigma ^{1/2}_{j}(\theta )-\widetilde{\Sigma }^{1/2}_{j}(\theta )\Vert \le C\varrho ^j\), \(1\le j \le n\),

  4. (A.4)

    \(\mathbb E \left\| X_j\right\| ^\varsigma <\infty \) and \( \mathbb E\left\| \Sigma ^{1/2}_{j}(\theta _0)\right\| ^\varsigma <\infty \), \(j \in \mathbb {Z}\), for some \(\varsigma >0\),

  5. (A.5)

    for any sequence \(x_1,x_2,\dots \) of vectors of \(\mathbb {R}^d\), the function \(\theta \mapsto \Sigma ^{1/2}(\theta ; x_1, x_2,\dots )\) admits continuous second-order derivatives,

  6. (A.6)

    for some neighborhood \(V(\theta _0)\) of \(\theta _0\), there exist \(p> 1\), \(q> 2\) and \(r> 1\) so that \(2p^{-1}+2r^{-1}=1\) and \(4q^{-1}+2r^{-1}=1\), and, for each \(j \in \mathbb {Z}\),

$$\begin{aligned}&\mathbb E \sup _{\theta \in V(\theta _0)}\left\| \sum _{k,\ell =1}^v\Sigma _j^{-1/2}(\theta )\frac{\partial ^2\Sigma ^{1/2}_j(\theta )}{\partial \theta _k\partial \theta _\ell }\right\| ^p<\infty ,\\&\mathbb E \sup _{\theta \in V(\theta _0)}\left\| \sum _{k=1}^v\Sigma _j^{-1/2}(\theta )\frac{\partial \Sigma ^{1/2}_j(\theta )}{\partial \theta _k}\right\| ^q<\infty ,\\&\mathbb E \sup _{\theta \in V(\theta _0)}\left\| \Sigma _j^{1/2 }(\theta _0)\Sigma _j^{-1/2}(\theta )\right\| ^r<\infty . \end{aligned}$$

The next result gives the asymptotic null distribution of \(W_n^G\).

Theorem 5.1

(Convergence of \(W_n^G\) under \(H_{0,G}\))

Let \(\{X_j\}\) be a strictly stationary process satisfying (5.1), with \(X_j\) being measurable with respect to the sigma-field generated by \(\{\varepsilon _u,u\le j\}\). Assume that (A.1)–(A.6) hold and that \(\beta >2\). Then under the null hypothesis \({{H}}_{0,G}\), there is a centered Gaussian random element \(W_G\) of \({\text {L}}^2_{\beta }\), having covariance kernel \(C_G(s,t) = Cov (U(t),U(s)),\) so that \(W_n^G {\mathop {\longrightarrow }\limits ^{\mathcal{D}}}W_G\) as \(n \rightarrow \infty \), where

$$\begin{aligned} U(t)=\exp \left( t^\top \varepsilon _1\right) -m(t)-m(t)a(t)^\top L_1, \end{aligned}$$

\(a(t)^\top =(t^\top \mu _1 t, \ldots , t^\top \mu _v t)\), \(\mu _k=\mathbb {E}[A_{1k}(\theta _0)]\), \(A_{1k}(\theta )=\Sigma _1^{-1/2}(\theta )\frac{\partial }{\partial \theta _k} \Sigma _1^{1/2}(\theta )\), \(1\le k \le v\).

From Theorem 5.1 and the Continuous Mapping Theorem we have the following corollary.

Corollary 5.2

Under the assumptions of Theorem 5.1, we have

$$\begin{aligned} T_{n, \beta }^G {\mathop {\longrightarrow }\limits ^{\mathcal{D}}}\Vert W_G\Vert ^2_{{\text {L}}^2_{\beta }} \ \ \text {as} \ n \rightarrow \infty . \end{aligned}$$

The standard estimation method for the parameter \(\theta \) in GARCH models is the quasi maximum likelihood estimator (QMLE), defined as

$$\begin{aligned} {\widehat{\theta }}_n= \mathop {\hbox { arg max}}_{{\theta } \in {\Theta }} {L}_n(\theta ), \end{aligned}$$

where

$$\begin{aligned} L_n(\theta )=-\frac{1}{2} \sum _{j=1}^n \widetilde{\ell }_j (\theta ),\quad \widetilde{\ell }_j(\theta )=X^\top _j \widetilde{ \Sigma }_j(\theta )^{-1}X_j +\log \det \widetilde{ \Sigma }_j(\theta ). \end{aligned}$$

Comte and Lieberman (2003) and Bardet and Wintenberger (2009), among others, have shown that under certain mild regularity conditions the QMLE satisfies (A.1) for general MGARCH models.

As observed before, there are many MGARCH parametrizations for the matrix \(\Sigma _j(\theta )\). Nevertheless, there exist only partial theoretical results for such models. The Constant Conditional Correlation model, proposed by Bollerslev (1990) and extended by Jeantheau (1998), is an exception, since its properties have been thoroughly studied. This model decomposes the conditional covariance matrix figuring in (5.1) into conditional standard deviations and a conditional correlation matrix, according to \( \Sigma _j(\theta _0)={ {D}}_j(\theta _0) R(\theta _0) {{D}}_j(\theta _0), \) where \({{D}}_j(\theta )\) and \(R(\theta )\) are \(d\times d\)-matrices, \(R(\theta )\) is a correlation matrix, and \({{D}}_j(\theta )\) is a diagonal matrix so that \(\sigma ^2_j(\theta )=\text{ diag }\left\{ D^2_j(\theta )\right\} \) with

$$\begin{aligned} \sigma ^2_j(\theta )={b}+\sum _{k=1}^p {{B}}_k X_{j-k}^{(2)}+\sum _{k=1}^q {{\Gamma }}_k \sigma ^2_{j-k}(\theta ). \end{aligned}$$

Here, \(X_{j}^{(2)}= X_{j} \odot X_{j}\), where \(\odot \) denotes the Hadamard product, that is, the element by element product, b is a vector of dimension d with positive elements, and \(\{B_k\}_{k=1}^p\) and \(\{{{\Gamma }}_k\}_{k=1}^q\) are \(d\times d\)-matrices with non-negative elements. This model will be referred to as CCC-GARCH(p,q). Under certain weak assumptions, the QMLE for the parameters in this model satisfies (A.1), and (A.2)–(A.6) also hold, see Francq and Zakoïan (2010) and Francq et al. (2017).
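
To fix ideas, a minimal sketch simulating from a CCC-GARCH(1,1) model follows; the function is our own construction (the simulations in Sect. 6 use the package ccgarch instead), and b, B1, G1 and R are assumed to satisfy the usual positivity and stationarity constraints.

  ## Minimal sketch: simulate n observations from a d-variate CCC-GARCH(1,1).
  rccc_garch <- function(n, b, B1, G1, R, burn = 500) {
    d <- length(b)
    U <- chol(R)                           # upper triangular, t(U) %*% U == R
    X <- matrix(0, n + burn, d)
    sig2 <- b                              # start-up value for sigma_j^2
    x2 <- numeric(d)
    for (j in seq_len(n + burn)) {
      sig2 <- b + B1 %*% x2 + G1 %*% sig2  # volatility recursion
      eta <- drop(t(U) %*% rnorm(d))       # eta_j ~ N_d(0, R)
      X[j, ] <- sqrt(drop(sig2)) * eta     # X_j = D_j eta_j has covariance D_j R D_j
      x2 <- X[j, ]^2
    }
    X[(burn + 1):(burn + n), , drop = FALSE]
  }

Note that \(D_j\eta _j\) with \(\eta _j \sim \text {N}_d(0,R)\) has the same conditional distribution as \(\Sigma _j^{1/2}(\theta )\varepsilon _j\) in (5.1), since \(D_jRD_j = \Sigma _j(\theta )\).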

The asymptotic null distribution of \(T_{n, \beta }^G \) depends on the equation defining the GARCH model and on \(\theta _0\) through the quantities \(\mu _1, \ldots , \mu _v\), as well as on which estimator of \(\theta \) has been employed. Therefore, the asymptotic null distribution cannot be used to approximate the null distribution of \(T_{n, \beta }^G \). Following Klar et al. (2012), we will estimate the null distribution of \(T_{n, \beta }^G \) by using the following parametric bootstrap algorithm:

  1. (i)

    Calculate \(\widehat{\theta }_n=\widehat{\theta }_n(X_1, \ldots , X_n)\), the residuals \(\widetilde{\varepsilon }_1,\ldots ,\widetilde{\varepsilon }_n\) and the test statistic \({T}_{n,\beta }^G= {T}_{n,\beta }^G(\widetilde{\varepsilon }_1,\ldots ,\widetilde{\varepsilon }_n)\).

  2. (ii)

    Generate i.i.d. vectors \({\varepsilon }_1^*,\ldots ,{\varepsilon }_n^*\) from a N\(_d(0,{ \mathrm{I}}_d)\) distribution. Let \(X_j^*=\widetilde{\Sigma }_j^{1/2}(\widehat{\theta }_n)\varepsilon _j^*\), \(j=1,\ldots , n\).

  3. (iii)

    Calculate \(\widehat{\theta }_n^*=\widehat{\theta }_n(X_1^*, \ldots , X_n^*)\), the residuals \(\widetilde{\varepsilon }_1^*,\ldots ,\widetilde{\varepsilon }_n^*\), and approximate the null distribution of \({T}^G_{n,\beta }\) by means of the conditional distribution, given the data, of \({T}^{G*}_{n,\beta }={T}^G_{n,\beta }(\widetilde{\varepsilon }_1^*,\ldots ,\widetilde{\varepsilon }_n^*)\).

In practice, the approximation in step (iii) is carried out by generating a large number of bootstrap replications of \({T}^{G*}_{n,\beta }\), whose empirical distribution function is used to estimate the null distribution of \({T}^G_{n,\beta }\). Similar steps to those given in the proof of Theorem 5.1 show that if one assumes that (A.1)–(A.6) continue to hold when \(\theta _0\) is replaced by \(\theta _n\), with \(\theta _n\rightarrow \theta _0\) as \(n\rightarrow \infty \), and \(\varepsilon \sim \text {N}_d(0,{ \mathrm{I}}_d)\), then the conditional distribution of \({T}^{G*}_{n,\beta }\), given the data, converges in law to \( \Vert W_G\Vert ^2_{{\text {L}}^2_{\beta }}\), with \(W_G\) as defined in Theorem 5.1. Therefore, the above bootstrap procedure yields a consistent estimator of the null distribution.
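
In schematic form, the algorithm reads as follows; fit_ccc(), ccc_residuals() and simulate_ccc() are hypothetical placeholders for a CCC-GARCH QML fitting routine, its residual extractor, and the volatility recursion of step (ii), so the sketch is not runnable as it stands.

  ## T^G_{n,beta} applies the sums in (1.4) to the residual matrix E directly,
  ## without the empirical standardization used in the i.i.d. case.
  Tn_beta_G <- function(E, beta) {
    n <- nrow(E); d <- ncol(E)
    G <- tcrossprod(E); sq <- diag(G)
    t1 <- sum(exp((outer(sq, sq, "+") + 2 * G) / (4 * beta))) / (n * beta^(d / 2))
    pi^(d / 2) * (t1 + n / (beta - 1)^(d / 2)
                  - 2 * sum(exp(sq / (4 * beta - 2))) / (beta - 1 / 2)^(d / 2))
  }
  ## Schematic parametric bootstrap p-value; fit_ccc(), ccc_residuals()
  ## and simulate_ccc() are hypothetical placeholders.
  boot_pvalue <- function(X, beta, B = 500) {
    n <- nrow(X); d <- ncol(X)
    fit  <- fit_ccc(X)                      # step (i): QMLE theta_hat
    Tobs <- Tn_beta_G(ccc_residuals(fit, X), beta)
    Tstar <- replicate(B, {
      eps <- matrix(rnorm(n * d), n, d)     # step (ii): N_d(0, I_d) innovations
      Xst <- simulate_ccc(fit, eps)         # X_j* = Sigma_j^{1/2}(theta_hat) eps_j*
      fst <- fit_ccc(Xst)                   # step (iii): re-estimate on X*
      Tn_beta_G(ccc_residuals(fst, Xst), beta)
    })
    mean(Tstar >= Tobs)                     # bootstrap p-value
  }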

Remark 5.3

The practical application of the above bootstrap null distribution estimator entails that the parameter estimator of \(\theta \) and the residuals must be calculated for each bootstrap resample, which results in a time-consuming procedure. Following the approaches in Ghoudi and Rémillard (2014) and Jiménez-Gamero and Pardo-Fernández (2017) for other goodness-of-fit tests for univariate GARCH models, we could use a weighted bootstrap null distribution estimator in the sense of Burke (2000). From a computational point of view, it provides a more efficient estimator. Nevertheless, it can be verified that the consistency of the weighted bootstrap null distribution estimator of \({T}_{n,\beta }^G\) requires the existence of the moment generating function of the true distribution generating the innovations, which is a rather strong condition, especially taking into account that the alternatives of interest are heavy-tailed.

As in the i.i.d. case, the next result shows that the test for multivariate normality based on \(T_{n,\beta }^G\) is consistent against general alternatives.

Theorem 5.4

Let \(\{X_j\}\) be a strictly stationary process satisfying (5.1), with \(X_j\) being measurable with respect to the sigma-field generated by \(\{\varepsilon _u,u\le j\}\). Assume that (A.1)–(A.6) hold, that \(\varepsilon \) has some absolutely continuous distribution, and that \(M_\varepsilon (t) = \mathbb {E}[\exp (t^\top \varepsilon )] < \infty \), \(t \in \mathbb {R}^d\). We then have

$$\begin{aligned} \liminf _{n \rightarrow \infty } \frac{T_{n,\beta }^G}{n} \ \ge \ \int _{\mathbb {R}^d} \left( M_{\varepsilon }(t) -m(t) \right) ^2 \, w_\beta (t) \, \mathrm {d} t \end{aligned}$$

in probability.

Similar comments to those made after Theorem 4.1 for the i.i.d. case apply in this setting.

Proof of Theorem 5.1

From the proof of Theorem 7.1 in Henze et al. (2018), it follows that \(W_n^G(t)=W_{1,n}^G(t)+r_{n,1}(t)\), with \(W_{1,n}^G(t)=n^{-1/2}\sum _{j=1}^nV_j(t)\),

$$\begin{aligned} V_j(t)=\exp \left( t^\top \varepsilon _j\right) -m(t)a(t)^\top \sqrt{n}\left( \widehat{\theta }_{n}-\theta _{0}\right) -m(t), \end{aligned}$$

and \(\Vert r_{n,1}\Vert _{{\text {L}}^2_{\beta }}=o_{\mathbb {P}}(1)\). By Assumption A.1, \(W_{1,n}^G(t)=W_{2,n}^G(t)+r_{n,2}(t)\), with \(W_{2,n}^G(t)=n^{-1/2}\sum _{j=1}^nU_j(t)\),

$$\begin{aligned} U_j(t)=\exp (t^\top \varepsilon _j)-\exp (-\Vert t\Vert ^2/2)a(t)^\top L_j-\exp (-\Vert t\Vert ^2/2), \end{aligned}$$

and \(\Vert r_{n,2}\Vert _{{\text {L}}^2_{\beta }}=o_{\mathbb {P}}(1)\).

To prove the result, we will apply Theorem 4.2 in Billingsley (1968) to \(\{W_{2,n}^G(t), t \in \mathbb {R}^d \}\) by showing that (a) for each positive K, \(\{W_{2,n}^G(t), t \in B(K) \}\) converges in law to \(\{W_G(t), t \in B(K) \}\) in C(B(K)), the Banach space of real-valued continuous functions on \(B(K):=\{t \in \mathbb {R}^d:\Vert t\Vert \le K\}\), endowed with the supremum norm; and that (b) for each \(\varepsilon >0\), there is a positive K so that

$$\begin{aligned}&\int _{\mathbb {R}^d\setminus B(K)} \mathbb E \left[ W_{2,n}^G(t)^2\right] w_\beta (t) \, \text {d}t < \varepsilon , \end{aligned}$$
(5.2)
$$\begin{aligned}&\int _{\mathbb {R}^d\setminus B(K)} \mathbb E\left[ W_G(t)^2\right] w_\beta (t) \, \text {d}t < \varepsilon . \end{aligned}$$
(5.3)


Proof of (a): By applying the central limit theorem for martingale differences, the finite-dimensional distributions of \(\{W_{2,n}^{G}(t), t \in \mathbb {R}^d \}\) converge to those of \(\{W_{G}(t), t \in \mathbb {R}^d \}\). Hence, to prove (a), we must show that \(\{W_{2,n}^{G}(t), t \in B(K) \}\) is tight. With this aim, we write \(W_{2,n}^{G}(t)=W_{3,n}^{G}(t)-W_{4,n}^{G}(t)\), with \(W_{3,n}^{G}(t)=n^{-1/2}\sum _{j=1}^n\{\exp (t^\top \varepsilon _j)-m(t)\}\) and \(W_{4,n}^{G}(t)=m(t)a(t)^\top n^{-1/2} \sum _{j=1}^nL_j\). The mean value theorem gives

$$\begin{aligned} \mathbb E \left[ \{\exp \left( t^\top \varepsilon \right) -\exp (s^\top \varepsilon )\}^2\right] \le \kappa \Vert t-s\Vert ^2, \quad s,t \in B(K), \end{aligned}$$

for some positive \(\kappa \). From Theorem 12.3 in Billingsley (1968), the process \(\{W_{3,n}^G(t), t \in B(K) \}\) is tight. By the central limit theorem for martingale differences, \(n^{-1/2} \sum _{j=1}^nL_j\) converges in law to a v-variate zero mean normal random vector. Hence, \(\{W_{4,n}^G(t), t \in B(K) \}\), being a product of a continuous function and a term which is \(O_{\mathbb {P}}(1)\), is tight, and the same property holds for \(\{W_{2,n}^G(t), t \in B(K) \}\).

Proof of (b): In view of \(\mathbb E \left[ W_{2,n}^G(t)^2 \right] = \mathbb E \left[ U_1(t)^2\right] <\infty \), for each \(\varepsilon >0\), there is some positive constant K so that (5.2) holds. Likewise, (5.3) holds, which completes the proof. \(\square \)

Proof of Theorem 5.4

Let \(\varepsilon _j(\theta )=\Sigma _j^{-1/2}(\theta )X_j\). Notice that \(\varepsilon _j(\theta _0)=\varepsilon _j\). Let \(\widetilde{M}_n^G(t)= n^{-1} \sum _{j=1}^n \exp \{t^\top \widetilde{\varepsilon }_j(\widehat{\theta }_n) \}\), \(\widehat{M}_n^G(t)=n^{-1} \sum _{j=1}^n \exp \{t^\top \varepsilon _j(\widehat{\theta }_n) \}\), \(M_n^\circ (t)=n^{-1} \sum _{j=1}^n \exp \{t^\top \varepsilon _j \}\) and \(B(K):=\{t \in \mathbb {R}^d:\Vert t\Vert \le K\}\). To show the result we will prove

  1. (a)

    \(\sup _{t \in B(K)} |\widehat{M}_n^G(t)-M_n^\circ (t)|=o_{\mathbb {P}}(1)\),

  2. (b)

    \(\sup _{t \in B(K)} |\widetilde{M}_n^G(t)-\widehat{M}_n^G(t)|=o_{\mathbb {P}}(1)\),

and the result will follow by using the same proof as in the i.i.d. case.

Proof of (a): Let \(\widehat{\theta }_n = (\widehat{\theta }_{n1}, \ldots , \widehat{\theta }_{nv})^\top \), \(\theta _0 = (\theta _{01}, \ldots , \theta _{0v})^\top \) and \(A_{jk}(\theta ) = \Sigma _j^{-1/2} (\theta )\frac{\partial }{\partial \theta _k} \Sigma _j^{1/2} (\theta )\). We have \(\varepsilon _j(\widehat{\theta }_n) =\varepsilon _j+\Delta _{n,j}\), with \(\Delta _{n,j}=-\sum _{k=1}^vA_{jk}(\widetilde{\theta }_{n,j})\varepsilon _j(\widehat{\theta }_{nk}-\theta _{0k})\), for some \(\widetilde{\theta }_{n,j}\) between \(\widehat{\theta }_{n}\) and \(\theta _0\). Observe that \(\exp (t^\top \Delta _{n,j})-1=t^\top \Delta _{n,j}\exp (\alpha _{n,j}t^\top \Delta _{n,j})\) for some \(\alpha _{n,j}\in (0,1)\). Now (A.1) and (A.6) yield \(\Vert \Delta _{n,j}\Vert \le D_j \Vert \varepsilon _j\Vert \Vert \widehat{\theta }_{n}-\theta _0\Vert \) for large enough n, with \( \mathbb {E} (D_j^2)<\infty \). The Cauchy–Schwarz inequality gives

$$\begin{aligned} \left| \widehat{M}_n^G(t)-M_n^\circ (t)\right| =\left| \frac{1}{n}\sum _{j=1}^n \exp \left( t^\top \varepsilon _j\right) \left\{ \exp \left( t^\top \Delta _{n,j}\right) -1\right\} \right| \le r_{1,n}(t)^{1/2}r_{2,n}(t)^{1/2}, \end{aligned}$$

where \(r_{1,n}(t)=M_n^\circ (2t)\), and

$$\begin{aligned} r_{2,n}(t)=\Vert t\Vert ^2 \Vert \widehat{\theta }_n-\theta _0\Vert ^2\exp \left\{ 2\Vert t\Vert \Vert \widehat{\theta }_n-\theta _0\Vert \max _{1\le j \le n}D_j \Vert \varepsilon _j\Vert \right\} \frac{1}{n}\sum _{j=1}^n D_j^2 \Vert \varepsilon _j\Vert ^2. \end{aligned}$$

From the strong law of large numbers in the Banach space of continuous functions on B(K), we have

$$\begin{aligned} \sup _{t\in B(K)} r_{1,n}(t) \le \sup _{t\in B(K)} M_\varepsilon (2t) +\sup _{t\in B(K)}\left| M_n^\circ (2t)-M_\varepsilon (2t)\right| <K_1 \quad \mathbb {P}\text {-a.s.} \end{aligned}$$

for some positive constant \(K_1\). From the ergodic theorem, \(n^{-1}\sum _{j=1}^n D_j^2 \Vert \varepsilon _j\Vert ^2<K_2\) \(\mathbb {P}\)-almost surely for some positive constant \(K_2\). Using stationarity and finite second-order moments, it follows that \( \max _{1\le j \le n}D_j \Vert \varepsilon _j\Vert /\sqrt{n} \rightarrow 0\) \(\mathbb {P}\)-almost surely. Hence (A.1) yields \(\sup _{t \in B(K)} r_{2,n}(t) \rightarrow 0\) in probability. This concludes the proof of (a).

Proof of (b): The reasoning follows similar steps as the proof of fact (c.1) in the proof of Theorem 7.1 in Henze et al. (2018) and is thus omitted. \(\square \)

6 Monte Carlo results

This section describes and summarizes the results of an extensive simulation experiment carried out to study the finite-sample performance of the proposed tests. Moreover, we consider a real data set of monthly log returns. All computations have been performed using programs written in the R language.

6.1 Numerical experiments for i.i.d. data

Upper quantiles of the null distribution of \(T_{n,\beta }\) have been approximated by generating 100,000 samples from a law N\(_d(0,\text {I}_d)\). Table 1 displays some critical values with the convention that an entry like \(^{-4}1.17\) stands for \(1.17 \times 10^{-4}\). The results show that large sample sizes are required to approximate the critical values by their corresponding asymptotic values.

Table 1 Critical points for \(\pi ^{-d/2}T_{n,\beta }\)

A natural competitor of the test based on \(T_{n,\beta }\) is the test studied in Henze and Wagner (1997) (HW-test), which is based on the empirical characteristic function. The latter procedure is simple to compute as well as affine invariant, and it has shown good power in comparison with competitors. Another sensible competitor is the test proposed in Henze et al. (2018), which will be called the hybrid test, since it is based on both the empirical characteristic function and the empirical moment generating function. The performance of the test based on \(T_{n,\beta }\) is quite close to that of the hybrid test. Their behavior in relation to the HW-test depends on whether or not the distribution is heavy-tailed. We tried a number of non-heavy-tailed distributions (specifically, the multivariate Laplace distribution, finite mixtures of normal distributions, the skew-normal distribution, the multivariate \(\chi ^2\)-distribution, the Khintchine distribution, the uniform distribution on \([0,1]^d\) and the Pearson type II family). For these distributions, we observed that the powers of the proposed and the hybrid tests are either similar to or smaller than that of the HW-test; for two-sided heavy-tailed distributions, the new and the hybrid tests outperform the HW-test. This observation can be appreciated by looking at Table 2, which displays the empirical power calculated by generating 1000 samples (in each case), for the significance level \(\alpha =0.05\), from the following two-sided heavy-tailed alternatives: (\(\textit{ASE}_{\theta }\)) the \(\theta \)-stable and elliptically-contoured distribution, (\(\textit{GN}_{\theta }\)) the multivariate \(\theta \)-generalized normal distribution, which coincides with the normal distribution for \(\theta =2\) and has heavy tails for \(0<\theta <2\) (Goodman and Kotz 1973), and (\(T_{\theta }\)) the multivariate Student's t distribution with \(\theta \) degrees of freedom. The same fact was also observed in the simulations of Zghoul (2010) for the test based on \(T_{n,\beta }\) in the univariate case. For the alternative distributions presented in Table 2, the hybrid test is slightly more powerful than the new test. Nevertheless, this is not always the case, as can be perceived by looking at Table 3, where the test based on \(T_{n,\beta }\) has larger power than the hybrid test for the following one-sided heavy-tailed alternatives: (\(LN_{\sigma }\)) the multivariate log-normal distribution, built with independent components distributed as \(\exp (\sigma Z)\), Z having a standard normal distribution, and (\(MP_\theta \)) the multivariate Pareto distribution, built with independent components distributed as \(\exp (E/\theta )\), E having an exponential distribution with mean 1. Table 3 exhibits powers for sample size \(n=20\), because for these alternatives with \(n=50\) all powers are close to 1.

Although in all cases tried the power of the test based on \(T_{n,\beta }\) is quite close to that of the hybrid test, we must say that the practical application of the hybrid test is limited to small sample sizes, since the calculation of the associated test statistic requires \(O(n^4)\) operations, while the calculation of \(T_{n,\beta }\) needs only \(O(n^2)\). For this reason, we generated only 1000 Monte Carlo samples for Tables 2 and 3, because with such a computational cost, a larger experiment becomes unaffordable.

Before ending this subsection, we comment on the choice of \(\beta \) in \(T_{n,\beta }\). Although the properties studied so far are valid for any \(\beta >2\), we observed that, for finite sample sizes, the choice of \(\beta \) has an impact on the power of the proposed test. In view of this fact, in our simulations, we tried a large number of values of \(\beta \) for \(T_{n,\beta }\) and for the statistic of the HW-test, as well as of the parameter \(\gamma \) involved in the statistic of the hybrid test. The tables display the results for those values of \(\beta \) (and of \(\gamma \)) giving the highest power in most of the cases considered. The same can be said for the simulations in the next subsection.

Table 2 Percentage of rejection for nominal level \(\alpha =0.05\) and \(n=50\)
Table 3 Percentage of rejection for nominal level \(\alpha =0.05\) and \(n=20\)

6.2 Numerical experiments for GARCH data

In our simulations, we considered a bivariate CCC–GARCH(1,1) model with

$$\begin{aligned} {b}=\left( \begin{array}{c}0.1\\ 0.1\end{array}\right) , \quad {{B}}_1= \left( \begin{array}{cc}0.1 &{} \quad 0.1\\ 0.1 &{} \quad 0.1\end{array}\right) , \quad {{\Gamma }}_1= \left( \begin{array}{cc} \gamma &{} \quad 0.01\\ 0.01 &{} \quad \gamma \end{array}\right) , \quad {{R}}=\left( \begin{array}{cc}1 &{} \quad r\\ r &{} \quad 1 \end{array}\right) , \end{aligned}$$

for \(\gamma =0.3, 0.4, 0.5\) and \(r=0, 0.3\), and a trivariate CCC–GARCH(1,1) model with \({b}=(0.1, 0.1, 0.1)^\top \),

$$\begin{aligned} {B}_1= \left( \begin{array}{ccc}0.1 &{} \quad 0.1 &{} \quad 0.1\\ 0.1 &{}\quad 0.1 &{}\quad 0.1 \\ 0.1 &{} \quad 0.1 &{} \quad 0.1\end{array}\right) , \quad {\Gamma }_1= \left( \begin{array}{ccc} \gamma &{}\quad 0.01 &{}\quad 0.01 \\ 0.01 &{}\quad \gamma &{}\quad 0.01\\ 0.01 &{}\quad 0.01 &{}\quad \gamma \end{array}\right) , \quad {R}=\left( \begin{array}{ccc}1 &{}\quad r &{} \quad r\\ r &{} \quad 1 &{} \quad r\\ r &{} \quad r &{} \quad 1 \end{array}\right) \end{aligned}$$

and \(\gamma \) and r as before. The parameters in the CCC-GARCH models were estimated by their QMLE using the package ccgarch of the language R. For the distribution of the innovations, we first took \({\varepsilon }_1, \ldots , {\varepsilon }_n\) i.i.d. from a multivariate normal distribution (N) to study the level of the resulting bootstrap test. Then, to assess the power, we considered the following heavy-tailed distributions: \(T_\theta \), \(GN_{\theta }\), and (AEP) a vector with independent components, each having an asymmetric exponential power distribution (Zhu and Zinde-Walsh 2009) with parameters \(\alpha =0.4\), \(p_1=1.182\) and \(p_2=1.820\) (settings that have given useful results in practical applications for the errors in GARCH-type models). As in the previous subsection, we also included the HW-test in the comparison. Since practical applications of MGARCH models involve large data sets, the hybrid test was not included in our simulations.

Table 4 reports the percentages of rejections for nominal significance level \(\alpha =0.05\) and sample size \(n=300\), for \(r=0,\, 0.3\) and \(\gamma =0.4\). The resulting pictures for \(\gamma =0.3, \, 0.5\) are quite similar, so, to save space, we omit the results for these values of \(\gamma \). In order to reduce the computational burden, we adopted the warp-speed method of Giacomini et al. (2013), which works as follows: rather than computing critical points for each Monte Carlo sample, one resample is generated for each Monte Carlo sample, and the resampling test statistic is computed for that resample; then the resampling critical values for \(T^G_{n,\beta }\) are computed from the empirical distribution determined by the resampling replications of \(T_{n,\beta }^{G*}\). In our simulations, we generated 10,000 Monte Carlo samples for the level and 2000 for the power. Looking at Table 4, we conclude that the actual level of the proposed bootstrap test is very close to the nominal level; this is also true for the HW-test (although, to the best of our knowledge, the consistency of the bootstrap null distribution estimator of the HW-test statistic has been proved only for the univariate case, in Jiménez-Gamero 2014). With respect to the power, the proposed test outperforms the HW-test in most cases.
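
As a runnable toy illustration of the warp-speed idea, the following sketch works in the i.i.d. setting of Sect. 6.1 with the Tn_beta function sketched in Sect. 1 (an assumption of this illustration); the actual experiments apply the same one-resample-per-sample device to \(T^G_{n,\beta }\) with the parametric bootstrap of Sect. 5.

  ## Toy illustration of the warp-speed method (assumes Tn_beta from Sect. 1):
  ## ONE null resample per Monte Carlo sample instead of a full bootstrap.
  set.seed(2)
  M <- 1000; n <- 50; d <- 2; beta <- 3
  Tmc <- Tboot <- numeric(M)
  for (m in seq_len(M)) {
    X <- matrix(rt(n * d, df = 5), n, d)   # toy heavy-tailed alternative
    Tmc[m] <- Tn_beta(X, beta)
    Xst <- matrix(rnorm(n * d), n, d)      # single resample drawn under the null
    Tboot[m] <- Tn_beta(Xst, beta)
  }
  mean(Tmc > quantile(Tboot, 0.95))        # warp-speed power estimate, alpha = 0.05

In the i.i.d. case the null distribution could of course be simulated directly (as for Table 1, by affine invariance); the toy example merely illustrates the resampling scheme.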

Table 4 Percentage of rejections for nominal level \(\alpha =0.05\), \(\gamma =0.4\) and \(n=300\)

6.3 A real data set application

As an illustration, we consider the monthly log returns of IBM stock and of the S&P 500 index from January 1926 to December 1999, comprising 888 observations. This data set was analyzed in Example 10.5 of Tsay (2010), where it is shown that a CCC-GARCH(1,1) model provides an adequate description of the data; the data set is available from the author's website http://faculty.chicagobooth.edu/ruey.tsay/teaching/fts/. We applied the proposed test and the HW-test for testing \(H_{0,G}\). The p-values were obtained by generating 1000 bootstrap samples. For all values of \(\beta \) in Table 4, we obtained the same p-value, 0.000, which leads us to reject \(H_{0,G}\). This is as expected in view of Fig. 1, which displays the scatter plot of the residuals after fitting a CCC-GARCH(1,1) model to the log returns, and of Fig. 2, which shows histograms of the marginal residuals with the probability density function of the standard normal law superimposed.

Fig. 1 Scatter plot of the residuals

Fig. 2 Histograms of the residuals

7 Conclusions

We have studied a class of affine invariant tests for multivariate normality, both in an i.i.d. setting and in the context of testing that the innovation distribution of a multivariate GARCH model is Gaussian, thus generalizing results of Henze and Koch (2017) in two ways. The test statistics are suitably weighted \(L^2\)-statistics based on the difference between the empirical moment generating function of scaled residuals of the data and the moment generating function of the standard normal distribution in \(\mathbb {R}^d\). As such, they can be considered ‘moment generating function analogues’ of the time-honored class of BHEP-tests that use the empirical characteristic function. As the parameter governing the decay of the weight function figuring in the test statistic tends to infinity, a suitably scaled version of the test statistic approaches a certain linear combination of two well-known measures of multivariate skewness. The tests are easy to implement, and they turn out to be consistent against a wide range of alternatives. In contrast to a recently studied \(L^2\)-statistic of Henze et al. (2018) that uses both the empirical moment generating function and the empirical characteristic function, our test is also feasible for larger sample sizes, since the computational complexity is of order \(O(n^2)\). Regarding power, the new tests outperform the BHEP-tests against heavy-tailed distributions.