1 Introduction

Consider a multiplicative strictly stationary process

$$\begin{aligned} x_t=\sigma _t \varepsilon _t, \end{aligned}$$
(1)

where \(\varepsilon _t\) and \(\sigma _t\) are independent, \(\{\varepsilon _t\}_{t \in \mathbb {Z}}\) is a sequence of independent and identically distributed (IID) random variables with mean zero and unit variance, and suppose \(\sigma ^2_t=\mathrm{{Var}}(x_t|\mathbb I_{t-1})\), where \(\mathbb I_{t}\) denotes the information available at time t. A GARCH model is a specification of the conditional variance which takes the general form

$$\begin{aligned} \sigma ^2_t=h(\varvec{\Upsilon }_{t,p,q}), \end{aligned}$$
(2)

where \({\varvec{\Upsilon }}_{t,p,q}=(x_{t-1},\ldots ,x_{t-p},\log (\sigma _{t-1}^2),\ldots , \log (\sigma _{t-q}^2))^\prime \) and \(h: \mathbb R^r\mapsto \mathbb R\), with \(r=p+q\), \(p\ge 1\) and \(q\ge 0\). The classical GARCH model corresponds to

$$\begin{aligned} h({\varvec{\Upsilon }}_{t,p,q})=\omega +\sum _{j=1}^p \gamma _j x^2_{t-j}+\sum _{j=1}^q \beta _j \sigma ^2_{t-j},\quad \quad t\in \mathbb {Z} \end{aligned}$$
(3)

Although (3) is indeed the most popular specification of the function \(h(\cdot )\) in (2) for the multiplicative time series in (1), there exist numerous alternative instances of (2), such as the asymmetric log-GARCH, the QGARCH and the EGARCH models, among many others, which have proved useful in applications. Some of these models are considered in our Monte Carlo study in Sect. 6. For more information, the reader is referred to Rachev and Mittnik (2000), Francq and Zakoïan (2010), Teräsvirta (2009) and Jondeau et al. (2007).

In this paper, we construct goodness-of-fit tests for correct specification of the function \(h(\cdot )\) in (2) in the context of model (1). This problem has been considered by Berkes et al. (2003a), Hong and Lee (2003), Escanciano (2008), Halunga and Orme (2009), Leucht et al. (2015), Zheng et al. (2018) and Francq et al. (2018), among others.

A number of such tests are based on the following idea: If model (2) is correctly specified, then the sets \(\{\epsilon _t=x_t/h(\varvec{\Upsilon }_{t,p,q})^{1/2}\}\) and \(\{\varepsilon _t\}\) coincide; therefore, the random variables in \(\{\epsilon _t=x_t/h(\varvec{\Upsilon }_{t,p,q})^{1/2}\}\) must be IID. For instance, the tests in Berkes et al. (2003a), Zheng et al. (2018) and Francq et al. (2018) are portmanteau-type tests based on a finite number of correlations of some functions of the residuals. It is well known that these tests are not globally consistent.

Now, when it comes to testing independence in a time series, methods based on characteristic functions, such as the tests of Hong (1999) and the tests based on the so-called distance (auto)covariance originally suggested by Székely et al. (2007) and adapted to the serial dependence setting by Fokianos and Pitsillou (2018) and Davis et al. (2018), all with observable data, are quite suitable. Davis et al. (2018) also applied this distance to the residuals of an autoregressive process. Hong and Lee (2003) adapted the method in Hong (1999) to unobservable innovations \(\{\varepsilon _t\}\) in linear and nonlinear time series models. This proposal is not consistent against all alternatives, since it is centered on the pairwise independence of \(\varepsilon _t\) and \(\varepsilon _{t+s}\), for all s, which is weaker than the joint independence of all innovations. We nevertheless revisit this test in our Monte Carlo study.

The relation between our method and characteristic function-based methods lies entirely in the weighting scheme, which uses the complex exponential function; further discussion of this issue is postponed until Remark 2. In fact, as will be seen further down the paper, our method deals with the specification problem not via independence, but by directly targeting correct specification of the conditional variance. As such, the starting point of our method is in the same spirit as the methods of Halunga and Orme (2009), Leucht et al. (2015) and Escanciano (2008). Nevertheless, Escanciano’s method is in the frequency domain, which is rather different from ours. Halunga and Orme’s approach is based on the fact that under model (2), \(\varepsilon _t\) and \(\varvec{\Upsilon }_{t,p,q}\) must be uncorrelated, and their tests are not universally consistent. Finally, the test in Leucht et al. (2015) is restricted to the GARCH(1,1) case. On the other hand, our method is applicable to a wide range of models of general (fixed) order and, by virtue of Bierens’ characterization, it is consistent against all fixed alternative specifications of the model under test.

The remainder of the paper is outlined as follows. In Sect. 2, we introduce the new test statistic and discuss some computational aspects. Section 3 is devoted to the asymptotic properties of the proposed method. Specifically, we derive the limit (in probability) of the proposed test statistic as well as its asymptotic null distribution, and show that the test which rejects the null hypothesis for large values of the test statistic is consistent against fixed alternatives. Because the asymptotic null distribution depends on certain unknown quantities, Sect. 4 studies in detail a consistent approximation to the null distribution. Specifically, we propose to approximate the null distribution by means of a weighted bootstrap estimator. Section 5 shows that the conditions required for the consistency of the weighted bootstrap estimator are met for certain popular specifications of (2), including conditional duration models. Section 6 summarizes a Monte Carlo experiment carried out to study the finite-sample properties of the method, along with empirical applications. Finally, we end in Sect. 7 with conclusions and discussion. All proofs are included in the Supplementary Material.

Throughout the paper, we will make use of the following notation: all vectors are column vectors; for any vector v, \(v_k\) denotes its kth coordinate, \(\Vert v\Vert \) its Euclidean norm and \(v'\) its transpose; if \(A=(a_{jk})\) is a matrix, then \(\Vert A\Vert =\sum _{j,k}|a_{jk}|\); for any complex number \(x=a+ib\), \(\bar{x}=a-ib\) and \(|x|=\sqrt{a^2+b^2}=\sqrt{x \bar{x}}\), with \(i=\sqrt{-1}\); \(P_0\) denotes probability computed under the null hypothesis; \(P_*\), \(\mathbb {E}_*\) and \(Cov_*\) denote the conditional probability law, expectation and covariance, given the data, respectively; all limits in this paper are taken as \(T \rightarrow \infty \), where T denotes the sample size; \({\mathop {\rightarrow }\limits ^{\mathcal {L}}}\) denotes convergence in distribution; \({\mathop {\rightarrow }\limits ^{P}}\) denotes convergence in probability; \({\mathop {\rightarrow }\limits ^{a.s.}}\) denotes almost sure convergence; \(L_w^2\) stands for the space of all \(L^2\) functions defined on the measure space \((\mathbb {R}^r, \mathcal {B}_r,\mu )\), where \(\mathcal {B}_r\) denotes the \(\sigma \)-field of Borel subsets of \(\mathbb {R}^r\), and the measure \(\mu \) has density w: \(\mathrm{d}\mu (\mathbf{{u}})=w(\mathbf{{u}})\mathrm{d}\mathbf{{u}}\), that is, \(L_w^2=\{f:\mathbb {R}^r \mapsto \mathbb {C}: \Vert f\Vert ^2_w=\int |f(\mathbf{{u}})|^2w(\mathbf{{u}})\mathrm{d}\mathbf{{u}}<\infty \}\), where an unspecified integral denotes integration over \(\mathbb {R}^{r}\); \(\langle \cdot , \cdot \rangle _w\) denotes the scalar product in the Hilbert space \(L_{w}^2\).

2 The test statistic

Let \(x_t\), \(1 \le t \le T\), be observations on the process (1). On the basis of these observations, we wish to test the null hypothesis

$$\begin{aligned} \mathcal{{H}}_0: \sigma ^2_t=h_{\varvec{\vartheta }}(\varvec{\Upsilon }_{t,p,q}), \ \text{ for } \text{ some } \ \varvec{\vartheta }\in \Theta \subseteq \mathbb {R}^v, \end{aligned}$$
(4)

where \(h_{\varvec{\vartheta }}(\cdot )\) denotes a specific function incorporating a multidimensional parameter \(\varvec{\vartheta }\).

In order to formulate our procedure, let \(\sigma ^2_t(\varvec{\vartheta }):=h_{\varvec{\vartheta }}(\varvec{\Upsilon }_{t,p,q})\) and notice that, under the null hypothesis \(\mathcal{{ H}}_0\), we have \( \mathbb {E}(x^2_t|{\varvec{\Upsilon }}_{t,p,q})=\sigma ^2_t(\varvec{\vartheta })\). In this connection, recall the characterization of Bierens (1982) that, for a real random variable x and a given random vector \(\mathbf{y}\) of dimension r, \(\mathbb {E}(x|{ \mathbf y})= 0\) holds almost surely if and only if \(\mathbb {E}(x \mathrm{e}^{i \mathbf{u}'\mathbf{y}})=0\), for all \(\ \mathbf{{u}} \in \mathbb {R}^{r}\). In view of this characterization, let \(\widehat{\varepsilon }_t:= x_t/\sigma _t(\widehat{\varvec{\vartheta }}_T)\) be the residuals, where \(\widehat{\varvec{\vartheta }}_T\) is a consistent estimator of the parameter \(\varvec{\vartheta }\). Recall that \(\sigma ^2_t(\varvec{\vartheta })\) depends on \(\{x_k, \, -\infty <k \le t-1\}\), whereas we only observe \(x_1, \ldots , x_T\). So, in order to calculate the residuals, instead of \(\sigma ^2_t (\widehat{\varvec{\vartheta }}_T)\) we consider \(\widetilde{\sigma }^2_t (\widehat{\varvec{\vartheta }}_T)\), defined as \( \sigma ^2_t (\widehat{\varvec{\vartheta }}_T)\) computed on the basis of the initial values \(\widetilde{\varvec{\Upsilon }}_1:=(\tilde{x}_0,\ldots ,\tilde{x}_{1-p},\log (\tilde{\sigma }_{0}^2),\ldots ,\log (\tilde{\sigma }_{1-q}^2))'\). Thus, in practice, instead of \(\widehat{\varepsilon }_t\), we work with \(\widetilde{\varepsilon }_t:= x_t/\widetilde{\sigma }_t(\widehat{\varvec{\vartheta }}_T)\), \(1 \le t\le T\).
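To fix ideas, the construction of \(\widetilde{\sigma }^2_t\) and \(\widetilde{\varepsilon }_t\) can be sketched in R for the GARCH(1,1) case as follows; the function name is ours, and starting the recursion at the sample variance is an assumed choice of initial value (any fixed choice is asymptotically negligible; see Assumption 3 below).

```r
# Sketch: volatility recursion and residuals of a GARCH(1,1) at a given
# parameter value; sigma2[1] = var(x) is an assumed initial value.
garch11_residuals <- function(x, omega, gamma, beta) {
  n <- length(x)                 # n = T, the sample size
  sigma2 <- numeric(n)
  sigma2[1] <- var(x)            # \tilde\sigma_1^2: assumed initialization
  for (t in 2:n) {
    sigma2[t] <- omega + gamma * x[t - 1]^2 + beta * sigma2[t - 1]
  }
  list(sigma2 = sigma2, eps = x / sqrt(sigma2))   # \tilde\varepsilon_t
}
```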

With this notation, our core test-process is defined as

$$\begin{aligned} S_T(\mathbf{{u}})= \frac{1}{T-m}\sum _{t=m+1}^T \left( \widetilde{\varepsilon }^2_t-1\right) \mathrm{e}^{i\mathbf{{u}}'{\widetilde{\varvec{\Upsilon }}_{t}}}, \quad \mathbf{{u}} \in \mathbb {R}^{r},\end{aligned}$$
(5)

where \(\widetilde{\varvec{\Upsilon }}_{t}:= (x_{t-1},\ldots ,x_{t-p},\log (\widetilde{\sigma }_{t-1}^2 (\widehat{\varvec{\vartheta }}_T)),\ldots ,\log (\widetilde{\sigma }_{t-q}^2 (\widehat{\varvec{\vartheta }}_T)))^\prime \) and \(m=\max \{p,q\}\). Under \(\mathcal{{H}}_0\), \(S_T(\mathbf u)\) is expected to be close to zero for all \(\mathbf{{u}} \in \mathbb {R}^r\) when the sample size T is large. We consider as test statistic the normalized integrated process

$$\begin{aligned} Q_{T,w}=(T-m)\int |S_T(\mathbf{{u}})|^2 w(\mathbf{{u}}) \mathrm{d} \mathbf{u}=(T-m)\Vert S_T\Vert _w^2, \end{aligned}$$
(6)

where \(w(\cdot )\) denotes a weight function whose choice is discussed below. Rejection of the null hypothesis \(\mathcal{{H}}_0\) is for large values of \(Q_{T,w}\).

We would like to underline the computational simplicity of the test statistic in (6): by straightforward algebra, from (5) we obtain

$$\begin{aligned} |S_T(\mathbf{{u}})|^2= \frac{1}{(T-m)^2} \sum _{t,s=m+1}^T \left( \widetilde{\varepsilon }^2_t-1\right) \left( \widetilde{\varepsilon }^2_s-1\right) \cos \{\mathbf{{u}}^\prime (\widetilde{\varvec{\Upsilon }}_{t}-\widetilde{\varvec{\Upsilon }}_{s})\}, \end{aligned}$$

which makes it possible to express the test statistic as

$$\begin{aligned} Q_{T,w}= \frac{1}{T-m} \sum _{t,s=m+1}^T \left( \widetilde{\varepsilon }^2_t-1\right) \left( \widetilde{\varepsilon }^2_s-1\right) I_w(\widetilde{\varvec{\Upsilon }}_{t}-\widetilde{\varvec{\Upsilon }}_{s}), \end{aligned}$$
(7)

where

$$\begin{aligned} I_w(\mathbf{{x}})=\int _{\mathbb {R}^{r}} \cos (\mathbf{{u}}'{} \mathbf{{x}})w(\mathbf{{u}})\mathrm{d}\mathbf u. \end{aligned}$$
(8)

The computation is further simplified by considering specific instances of the weight function \(w(\cdot )\) in (8). For instance, if we choose \(w(\cdot )\) as the density of a spherical stable distribution [see Nolan (2013)], then we readily obtain from (8) that \(I_w(\varvec{x})=\mathrm{e}^{-\Vert \varvec{x}\Vert ^\gamma }\), where \(0<\gamma \le 2\) stands for the characteristic exponent of the spherical stable law. The resulting test statistic in (7) reduces to

$$\begin{aligned} Q_{T,\gamma }= \frac{1}{T-m} \sum _{t,s=m+1}^T \left( \widetilde{\varepsilon }^2_t-1\right) \left( \widetilde{\varepsilon }^2_s-1\right) \mathrm{e}^{-\Vert \widetilde{\varvec{\Upsilon }}_{t}-\widetilde{\varvec{\Upsilon }}_{s}\Vert ^\gamma }. \end{aligned}$$
(9)

Here, we only employ the Gaussian density as weight function, leading to the test statistic in (9) with \(\gamma =2\), since this is the only member of the spherical stable class satisfying Assumption 7, which is instrumental in deriving the asymptotic results of the next section.
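For illustration, a direct R implementation of (9) might look as follows; the function name is ours, and the inputs are assumed to contain the residuals \(\widetilde{\varepsilon }_t\) and the vectors \(\widetilde{\varvec{\Upsilon }}_t\), \(t=m+1,\ldots ,T\), computed as in the sketch above.

```r
# Sketch: the statistic Q_{T,gamma} of (9). Inputs (for t = m+1, ..., T):
# eps = residuals, Ups = matrix whose rows are the vectors \tilde\Upsilon_t';
# gamma = 2 corresponds to the Gaussian weight used in this paper.
Q_T_gamma <- function(eps, Ups, gamma = 2) {
  d <- eps^2 - 1
  K <- exp(-as.matrix(dist(Ups))^gamma)   # exp(-||Ups_t - Ups_s||^gamma)
  drop(d %*% K %*% d) / length(d)         # double sum divided by T - m
}
```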

Remark 1

The null hypothesis \(\mathcal {H}_0\) states, among other things, that \(\sigma ^2_t\) depends on \(\mathbb I_{t-1}\) only through \(\varvec{\Upsilon }_{t,p,q}\). Obviously, \(\varvec{\Upsilon }_{t,p,q}\) could be replaced with any one-to-one transformation of it (by properly modifying the expression of \(h_{\varvec{\vartheta }}( \cdot )\)). The motivation for our choice of \(\varvec{\Upsilon }_{t,p,q}\) is purely technical: a commonly used estimator of the parameter \(\varvec{\vartheta }\) in GARCH models is the quasi-maximum likelihood estimator (QMLE), which minimizes, with respect to \(\varvec{\vartheta }\),

$$\begin{aligned} \sum _{t=1}^T \tilde{\ell }_t(\varvec{\vartheta }), \quad \text{ where } \quad \tilde{\ell }_t(\varvec{\vartheta })=\frac{x_t^2}{\tilde{\sigma }^2_t(\varvec{\vartheta })}+\log \tilde{\sigma }^2_t(\varvec{\vartheta }). \end{aligned}$$

The conditions required for the QMLE to be a consistent and asymptotically normal estimator of \(\varvec{\vartheta }\) coincide with those required to derive asymptotic properties of the test statistic (6) with our choice of \(\varvec{\Upsilon }_{t,p,q}\).
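A bare-bones R version of this criterion for the GARCH(1,1) case is sketched below; positivity constraints on \(\varvec{\vartheta }\) are ignored in this sketch, and the starting value passed to optim is purely illustrative.

```r
# Sketch: quasi-likelihood criterion of Remark 1 for a GARCH(1,1),
# theta = (omega, gamma, beta); sigma2[1] = var(x) is an assumed initial value.
qml_criterion <- function(theta, x) {
  n <- length(x)
  sigma2 <- numeric(n)
  sigma2[1] <- var(x)
  for (t in 2:n) {
    sigma2[t] <- theta[1] + theta[2] * x[t - 1]^2 + theta[3] * sigma2[t - 1]
  }
  sum(x^2 / sigma2 + log(sigma2))
}
# Illustrative use (the starting value is hypothetical):
# theta_hat <- optim(c(0.1, 0.1, 0.8), qml_criterion, x = x)$par
```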

3 Asymptotic properties

In the asymptotics, C (with \(C>0\)) and \(\varrho \) (with \(0<\varrho <1\)) will denote generic constants whose exact values are unimportant and may vary across the text. Also let \(\dot{\sigma }_t^2(\varvec{\vartheta })=\frac{\partial }{\partial \varvec{\vartheta }}{\sigma }_t^2(\varvec{\vartheta })\) and \(\ddot{\sigma }_t^2(\varvec{\vartheta })=\frac{\partial ^2}{\partial \varvec{\vartheta }\partial \varvec{\vartheta }'}{\sigma }_t^2(\varvec{\vartheta })\).

Assumption 1

\(\{x_t\}_{t \in \mathbb {Z}}\) is a strictly stationary and ergodic process.

Assumption 2

  (a) \(\hat{ \varvec{\vartheta }}_T {\mathop {\longrightarrow }\limits ^{P}} {\varvec{\vartheta }}_0\), for some \({\varvec{\vartheta }}_0\) that belongs to the interior of \(\Theta \), \({\varvec{\vartheta }}_0\) being the true parameter value when \(\mathcal{H}_0\) is true.

  (b) \(\displaystyle \mathbb {E} \left[ \left\{ \frac{x_t^2}{\sigma _t^2(\varvec{\vartheta }_0)}\right\} ^{(1+\delta )}\right] <\infty ,\) for some \(\delta >0\).

Assumption 3

\(\displaystyle \sup _{\varvec{\vartheta }\in \Theta }\left| \frac{1}{\sigma _t^2(\varvec{\vartheta })}-\frac{1}{\tilde{\sigma }_t^2(\varvec{\vartheta })}\right| \le C \varrho ^t\), \(\displaystyle \sup _{\varvec{\vartheta }\in \Theta }\left| \log \{\sigma _t^2(\varvec{\vartheta })\}-\log \{\tilde{\sigma }_t^2(\varvec{\vartheta })\}\right| \le C \varrho ^t\).

Assumption 4

\(\mathbb {E}|x_0|^s<\infty \), for some \(s>0\).

Assumption 5

For any \(k>0\), there exists a neighborhood \(\Theta _k\) of \(\varvec{\vartheta }_0\) such that

$$\begin{aligned} \mathbb {E} \left[ \sup _{\varvec{\vartheta }\in \Theta _k}\left\{ \frac{\sigma _t^2(\varvec{\vartheta }_0)}{\sigma _t^2(\varvec{\vartheta })}\right\} ^k\right] <\infty . \end{aligned}$$

Assumption 6

The function \(\varvec{\vartheta }\mapsto {\sigma }_t^2(\varvec{\vartheta })\) is twice continuously differentiable, and the derivatives satisfy: for any \(k>0\), there exists a neighborhood \(\Theta _k\) of \(\varvec{\vartheta }_0\) such that

$$\begin{aligned} \mathbb {E} \left[ \sup _{\varvec{\vartheta }\in \Theta _k}\left\| \frac{\dot{\sigma }_t^2(\varvec{\vartheta })}{\sigma _t^2(\varvec{\vartheta })}\right\| ^k\right]<\infty ,\quad \mathbb {E} \left[ \sup _{\varvec{\vartheta }\in \Theta _k}\left\| \frac{\ddot{\sigma }_t^2(\varvec{\vartheta })}{\sigma _t^2(\varvec{\vartheta })}\right\| ^k\right] <\infty . \end{aligned}$$

Assumption 7

\(w:\mathbb {R}^r \mapsto (0,\infty )\) is an even function: \(w(\mathbf{{u}})=w(-\mathbf{{u}})\), \(\forall \mathbf{{u}}\in \mathbb {R}^r \), satisfying

$$\begin{aligned} \int (1+\Vert \mathbf{{u}}\Vert ^2)w(\mathbf{{u}})\mathrm{d}\mathbf{{u}}<\infty . \end{aligned}$$

Notice that the neighborhoods in Assumptions 5 and 6 may not coincide. Nevertheless, without loss of generality, we can assume that they do coincide and thus use the same symbol to denote both neighborhoods.

Assumption 1 is a condition on the data generation process. Assumption 2 will be required to derive the limit (in probability) of the statistic \((T-m)^{-1}Q_{T,w}\). This assumption is also employed by Leucht et al. (2015) for testing goodness-of-fit to the linear GARCH(1,1) model. It is required to get a closed-form expression of such limit under alternatives. Notice that under the null hypothesis, Assumption 2 typically holds true: part (a) is satisfied by the QMLE, and part (b) entails that \(\mathbb {E}\{\varepsilon _0^{2(1+\delta )}\} <\infty \), for some \(\delta >0\), which is usually assumed for the asymptotic normality of the QMLE. Assumption 3 is required to ensure that the choice of the initial values \(\widetilde{\varvec{\Upsilon }}_1:=(\tilde{x}_0,\ldots ,\tilde{x}_{1-p},\log (\tilde{\sigma }_{0}^2),\ldots ,\log (\tilde{\sigma }_{1-q}^2))'\) is unimportant for the asymptotic behavior of the test statistic. Assumption 4 is a mild moment condition on the data generation process. Assumptions 5 and 6 are conditions on the model under the null hypothesis that must be checked for each particular model.

As mentioned in Sect. 2, rejection of the null hypothesis \(\mathcal{{H}}_0\) is for large values of \(Q_{T,w}\). The next result gives a sound justification for this assertion. Let \({\varvec{\Upsilon }}_t({\varvec{\vartheta }})= (x_{t-1},\ldots , x_{t-p},\log \{\sigma _{t-1}^2({\varvec{\vartheta }})\},\ldots ,\log \{\sigma _{t-q}^2({\varvec{\vartheta }})\})'\).

Theorem 1

Suppose that Assumptions 1–7 hold. Then,

$$\begin{aligned} \frac{Q_{T,w}}{T-m} {\mathop {\longrightarrow }\limits ^{P}} \kappa = \int _{\mathbb {R}^{r}} \left| \mathcal {E}_0(\mathbf u)\right| ^2 w(\mathbf{{u}}) \mathrm{d} \mathbf{u}, \end{aligned}$$
(10)

where \(\mathcal {E}_0(\mathbf u):=\mathcal {E}_{\varvec{\vartheta }_0}(\mathbf u)\) with

$$\begin{aligned} \mathcal {E}_{\varvec{\vartheta }}(\mathbf{{u}})= \mathbb {E}\left\{ \left( \frac{x_t^2}{\sigma _t^2({\varvec{\vartheta }})}-1\right) \mathrm{e}^{i\mathbf{{u}}'{\varvec{\Upsilon }}_t({\varvec{\vartheta }})}\right\} . \end{aligned}$$

Theorem 1 states that \(\frac{1}{T-m}Q_{T,w}\) converges in probability to a nonnegative constant \(\kappa \ge 0\) which, by Bierens’ characterization, satisfies \(\kappa =0\) if and only if the null hypothesis is true. Thus, a reasonable test of \( \mathcal{{H}}_0\) should reject the null hypothesis for large values of \(Q_{T,w}\). Now, to determine what constitutes a large value, we must calculate its null distribution, or at least an approximation to it. The exact null distribution of \(Q_{T,w}\) is clearly unknown. As an approximation, we will derive its asymptotic null distribution. With this aim, a further assumption is needed, which is listed below.

Assumption 8

Under the null hypothesis,

$$\begin{aligned} \widehat{\varvec{\vartheta }}_T={\varvec{\vartheta }}_0+\frac{1}{T}\sum _{t=1}^T L_t({\varvec{\vartheta }}_0)+o_P(T^{-1/2}), \end{aligned}$$

with \(L_{t}({\varvec{\vartheta }}_0)=(g_1(\varepsilon _t)l_1(\varepsilon _{t-1},\, \varepsilon _{t-2}, \ldots ), \, \ldots , g_{v}(\varepsilon _t)l_{v}(\varepsilon _{t-1}, \varepsilon _{t-2}, \ldots ))'\), \(1\le t\le T\), satisfying \(\mathbb {E}\{g_u(\varepsilon _0)\}=0\), \(\mathbb {E}\{g_u(\varepsilon _0)^2\}<\infty \), \(\mathbb {E}\{l_u(\varepsilon _{-1}, \varepsilon _{-2}, \ldots )^2\}<\infty \), \(1 \le u \le v\).

First, the next theorem derives an asymptotic approximation for \(Q_{T,w}\).

Theorem 2

Suppose that Assumptions 1–8 hold and that \(\mathbb {E}(\varepsilon _0^4)<\infty \). If \(\mathcal{H}_0\) is true,

$$\begin{aligned} Q_{T,w}=\left\| W_T(\cdot ;{\varvec{\vartheta }}_0)\right\| _w^2+o_P(1), \end{aligned}$$

where

$$\begin{aligned} W_T(\mathbf{{u}};{\varvec{\vartheta }})&= \frac{1}{\sqrt{T-m}}\sum _{t=m+1}^T H_t(\mathbf{{u}};{\varvec{\vartheta }}),\\ H_t(\mathbf{{u}};{\varvec{\vartheta }})&= (\varepsilon ^2_t-1)\left[ \cos \{ \mathbf{{u}}'{\varvec{\Upsilon }}_t({\varvec{\vartheta }}) \}+ \sin \{ \mathbf{{u}}'{\varvec{\Upsilon }}_t({\varvec{\vartheta }}) \} \right] -G(\mathbf{{u}};{\varvec{\vartheta }})'L_t({\varvec{\vartheta }}),\\ G(\mathbf{{u}};{\varvec{\vartheta }})&= \mathbb {E}\left( \frac{\dot{\sigma }_0^2(\varvec{\vartheta })}{\sigma _0^2(\varvec{\vartheta })} \left[ \cos \{ \mathbf{{u}}'{\varvec{\Upsilon }}_0({\varvec{\vartheta }}) \}+ \sin \{ \mathbf{{u}}'{\varvec{\Upsilon }}_0({\varvec{\vartheta }}) \} \right] \right) . \end{aligned}$$

Next, as a consequence of the result in Theorem 2, the following corollary provides the asymptotic null distribution of \(Q_{T,w}\).

Corollary 1

Suppose that the assumptions of Theorem 2 hold. Then,

$$\begin{aligned} Q_{T,w}{\mathop {\longrightarrow }\limits ^{\mathcal {L}}}\Vert Z\Vert ^2_w, \end{aligned}$$

where \(\{Z(\mathbf{{u}}), \,\mathbf{{u}}\in \mathbb {R}^r\}\) is a zero-mean Gaussian random element in \(L^2_w\) with covariance kernel \(\varrho _0(\mathbf{{u}},\mathbf{{v}})=\mathbb {E}\{H_0(\mathbf{{u}}; {\varvec{\vartheta }}_0)H_0(\mathbf{{v}};{\varvec{\vartheta }}_0) \}\), \(\forall \mathbf{{u}}, \mathbf{{v}} \in \mathbb {R}^r\).

As an immediate consequence of Theorem 1 and Corollary 1, it follows that the test that rejects the null hypothesis for large values of the test statistic \(Q_{T,w}\) is consistent against any fixed alternative such that \(\mathbb {E}\left( x_t^2 \, | \, \varvec{\Upsilon }_{t,p,q}\right) \ne h_{\varvec{\vartheta }}(\varvec{\Upsilon }_{t,p,q})\), for all \(\varvec{\vartheta }\in \Theta \).

Remark 2

In general, Bierens’ characterization, involving \(\mathbb {E}\{x\,g(\mathbf{y}, \mathbf{u})\}\), holds for a wide range of “instrument” functions \(g(\cdot ,\cdot )\). For a thorough discussion of the possible choices of instruments, the reader is referred to Stinchcombe and White (1998). Here, we have restricted ourselves to the exponential function \(g(\mathbf{y}, \mathbf{u})=\mathrm{e}^{i \mathbf{u}'\mathbf{y}}\), which was the original instrument considered in Bierens (1982). There are several reasons for this choice, such as computational convenience (see Sect. 2) and power considerations. In this connection, we refer to Escanciano (2008), where the resulting tests for this choice of g exhibit favorable performance in finite samples. Nevertheless, as will become evident from the proofs in the Supplementary Material, all results still hold true for arbitrary instruments, provided that the associated function \(g(\cdot ,\cdot )\) is bounded and satisfies

$$\begin{aligned} |g(\mathbf{x},\mathbf{u})-g(\mathbf{y},\mathbf{u})|\le C \Vert \mathbf{u}\Vert \Vert \mathbf{x}-\mathbf{y}\Vert , \end{aligned}$$
(11)

for some constant \(C>0\).

Remark 3

As an anonymous referee observed, the test statistic is not scale invariant. The reason is that properly specified GARCH models contain a scale-free error term and thus model the scale of the data (either directly the variance, or otherwise some form of volatility) via a certain regression-type function. In turn, this regression function is in most cases itself not scale equivariant; consequently, neither the model itself nor the corresponding parameter estimates are scale equivariant. Hence, the notion of scale equivariance does not apply to GARCH models, and therefore, the corresponding goodness-of-fit tests do depend on the scale of the data.

Going back to Corollary 1, we note that besides its dependence on \({\varvec{\vartheta }}_0\) and the corresponding estimator, the asymptotic null distribution of the test statistic \(Q_{T,w}\) also depends on the unknown distribution of the innovations. Therefore, the asymptotic null distribution does not provide a useful approximation to the null distribution of \(Q_{T,w}\). In the next section, we circumvent this problem by proposing a resampling scheme that consistently estimates the limit law of \(Q_{T,w}\).

4 Approximating the null distribution using multipliers

The test proposed in Leucht et al. (2015) for goodness-of-fit to a linear GARCH(1,1) model is, in a sense, similar to our proposal (it can be seen as a particular case of our proposal if, instead of taking the exponential function, a general function \(g(\cdot ,\cdot )\) satisfying (11) is considered). These authors proposed a residual-based bootstrap to approximate the null distribution of their test statistic. We could also consider a residual-based bootstrap procedure to approximate the null distribution of our test statistic but, as noted in Jeong (2017), the residual-based bootstrap actively employs the specific dependence structure of the model. In addition, the residual-based bootstrap is very time-consuming, since the parameter must be estimated for each resample. For these reasons, we next prove the consistency of a widely applicable bootstrap method, the weighted bootstrap, also called the multipliers method. This method has been used effectively in the context of testing goodness-of-fit for the error distribution in linear GARCH models, for methods based on the empirical distribution function in Ghoudi and Rémillard (2014, 2018), as well as for methods based on the empirical characteristic function in Jiménez-Gamero and Pardo-Fernández (2017).

In order to discuss the implementation of the multipliers method for approximating the null distribution of \(Q_{T,w}\), recall that Theorem 2 states that \(Q_{T,w} = Q_{1,T,w} +o_P(1)\), with

$$\begin{aligned} Q_{1,T,w}=\left\| W_T(\cdot ;{\varvec{\vartheta }}_0)\right\| _w^2. \end{aligned}$$

Let \(\xi _{m+1}, \ldots , \xi _T\) be IID copies of a random variable \(\xi \) with \(\mathbb {E}(\xi )=0\) and \(\mathrm{{Var}}(\xi )=1\), independent of \(x_1,\ldots , x_T\). Let us consider the following weighted bootstrap version of \(Q_{1,T,w}\),

$$\begin{aligned} Q_{1,T,w}^*=\left\| W_{1,T}^*(\cdot ;{\varvec{\vartheta }}_0)\right\| _w^2, \end{aligned}$$

with

$$\begin{aligned} W_{1,T}^*(\mathbf{{u}};{\varvec{\vartheta }}_0) = \frac{1}{\sqrt{T-m}}\sum _{t=m+1}^T H_t(\mathbf{{u}};{\varvec{\vartheta }}_0)\xi _t. \end{aligned}$$

The following result gives the large-sample conditional distribution of \(Q_{1,T,w}^*\), given \(x_1, \ldots , x_T\).

Theorem 3

Suppose that Assumptions 1 and 6–8 hold and that \(\mathbb {E}(\varepsilon _0^4)<\infty \). Then,

$$\begin{aligned} \sup _x \left| P_*\left( Q_{1,T,w}^*\le x\right) -P\left( \Vert Z_1\Vert ^2_w\le x\right) \right| {\mathop {\longrightarrow }\limits ^{P}}0, \end{aligned}$$

where \(\{Z_1(\mathbf{{u}}), \,\mathbf{{u}}\in \mathbb {R}^r\}\) is a zero-mean Gaussian random element in \(L^2_w\) having covariance kernel \(\varrho _0(\mathbf{{u}},\mathbf{{v}})=\mathbb {E}\{H_0(\mathbf{{u}}; {\varvec{\vartheta }}_0)H_0(\mathbf{{v}};{\varvec{\vartheta }}_0) \}\), \(\forall \mathbf{{u}}, \mathbf{{v}} \in \mathbb {R}^r\), that is, the same kernel as in Corollary 1.

As an immediate consequence of Theorem 3, the following corollary states that when \(\mathcal{H}_0\) is true, the conditional distribution of \(Q_{1,T,w}^*\), given \(x_1,\ldots , x_T\), provides a consistent estimator of the null distribution of \(Q_{T,w}\).

Corollary 2

Suppose that the assumptions of Theorem 2 hold. Then,

$$\begin{aligned} \sup _x \left| P_*\left( Q_{1,T,w}^*\le x\right) -P_0\left( Q_{T,w}\le x\right) \right| {\mathop {\longrightarrow }\limits ^{P}}0. \end{aligned}$$

Although the conditional distribution, given \(x_1,\ldots , x_T\), of \(Q_{1,T,w}^*\) provides a consistent estimator of the null distribution of \(Q_{T,w}\), it presents the same drawbacks as the asymptotic null distribution of \(Q_{T,w}\), since it depends on the unknown values of \(H_{m+1}(\mathbf{{u}};{\varvec{\vartheta }}_0), \ldots , H_T(\mathbf{{u}};{\varvec{\vartheta }}_0)\). These quantities depend on \({\varvec{\vartheta }}_0\), on the function \(G(\mathbf{{u}};{\varvec{\vartheta }}_0)\), as well as on the unobservable innovations. In addition, \(L_t({\varvec{\vartheta }})\) in Assumption 8 is usually unknown. Thus, \(Q_{1,T,w}^*\) cannot be used in practice. To overcome this problem, we replace \(\varepsilon _t\) by \(\widetilde{\varepsilon }_t\) (\(m+1\le t \le T\)), and \(G(\mathbf{{u}};{\varvec{\vartheta }}_0)\) by

$$\begin{aligned} \widetilde{G}(\mathbf{{u}};\widehat{\varvec{\vartheta }}_T) = \frac{1}{T-m}\sum _{t=m+1}^T\widetilde{A}_t(\widehat{\varvec{\vartheta }}_T)\left\{ \cos ( \mathbf{{u}}'\widetilde{\varvec{\Upsilon }}_t )+\sin ( \mathbf{{u}}'\widetilde{\varvec{\Upsilon }}_t )\right\} , \end{aligned}$$

with

$$\begin{aligned} \widetilde{A}_t({\varvec{\vartheta }})=\frac{\partial }{\partial {\varvec{\vartheta }}}\log \{\widetilde{\sigma }_t^2({\varvec{\vartheta }})\}, \end{aligned}$$
(12)

and \(L_t({\varvec{\vartheta }}_0)\) by \(\widehat{L_t({\varvec{\vartheta }}_0)}\) satisfying

$$\begin{aligned} \frac{1}{T-m}\sum _{t=m+1}^T \Vert L_t({\varvec{\vartheta }}_0)-\widehat{L_t({\varvec{\vartheta }}_0)}\Vert ^2 {\mathop {\longrightarrow }\limits ^{P}}0. \end{aligned}$$
(13)

A candidate for \(\widehat{L_t({\varvec{\vartheta }}_0)}\) will be discussed in Sect. 5 for some special cases of specifications in \(\mathcal{H}_0\). So, instead of \(Q_{1,T,w}^*\) we consider

$$\begin{aligned} Q_{T,w}^* = \left\| W_T^*(\cdot ;\widehat{\varvec{\vartheta }}_T) \right\| _w^2, \end{aligned}$$

with

$$\begin{aligned} W_T^*(\mathbf{{u}};\widehat{\varvec{\vartheta }}_T) = \frac{1}{\sqrt{T-m}}\sum _{t=m+1}^T \widetilde{H}_t(\mathbf{{u}};\widehat{\varvec{\vartheta }}_T)\xi _t, \end{aligned}$$

and

$$\begin{aligned} \widetilde{H}_t(\mathbf{{u}};\widehat{\varvec{\vartheta }}_T) = (\widetilde{\varepsilon }^2_t-1)\left\{ \cos ( \mathbf{{u}}'{\widetilde{\varvec{\Upsilon }}}_t )+ \sin ( \mathbf{{u}}'{\widetilde{\varvec{\Upsilon }}}_t ) \right\} -\widetilde{G}(\mathbf{{u}};\widehat{\varvec{\vartheta }}_T)'\widehat{L_t({\varvec{\vartheta }}_0)}. \end{aligned}$$

The next result shows that \(Q_{T,w}^* \) and \(Q_{1,T,w}^*\) have the same conditional limit distribution, given the observations. The main advantage of \(Q_{T,w}^* \) over \(Q_{1,T,w}^*\) is that the former does not depend on unknown quantities, and thus, at least in principle, its distribution can be calculated. This practical issue will be dealt with at the end of the section. From now on, the conditional distribution of \(Q_{T,w}^* \), given the data, will be called the weighted bootstrap distribution of \(Q_{T,w}\).

Assumption 9

$$\begin{aligned} \displaystyle \frac{1}{T-m}\sum _{t=m+1}^T \Vert A_t({\varvec{\vartheta }}_0)-\tilde{A}_t(\widehat{\varvec{\vartheta }}_T)\Vert {\mathop {\longrightarrow }\limits ^{P}}0, \end{aligned}$$

where \(A_t({\varvec{\vartheta }})=\dot{\sigma }_t^2(\varvec{\vartheta })/{\sigma }_t^2(\varvec{\vartheta })\) and \(\tilde{A}_t(\cdot )\) is defined in (12).

Theorem 4

Suppose that Assumptions 1–9 and (13) hold. Then

$$\begin{aligned} \sup _x \left| P_*\left( Q_{T,w}^*\le x\right) -P\left( Q_{1,T,w}^*\le x\right) \right| {\mathop {\longrightarrow }\limits ^{P}}0. \end{aligned}$$

Note that the result in Theorem 4 is valid whether or not the null hypothesis is true. Two immediate consequences follow.

Corollary 3

If \(\mathcal{H}_0\) is true and the assumptions of Theorem 4 hold, then

$$\begin{aligned} \sup _x \left| P_*\left( Q_{T,w}^*\le x\right) -P_0\left( Q_{T,w}\le x\right) \right| {\mathop {\longrightarrow }\limits ^{P}}0. \end{aligned}$$

In view of Corollary 3 and for \(\alpha \in (0,1)\), we consider the test function

$$\begin{aligned} {\Psi }^*={\Psi }^*(x_1, \ldots , x_T)=\left\{ \begin{array}{ll} 1, &{}\quad \text{ if } Q_{T,w}\ge q_{ \alpha }^*,\\ 0, &{}\quad \text{ otherwise, } \end{array}\right. \end{aligned}$$

where \(q_{\alpha }^*\) is the \(1-\alpha \) percentile of the conditional distribution of \(Q_{T,w}^*\), given \(x_1, \ldots , x_T\), or equivalently, \(\Psi ^*=1\) if \(p^*\le \alpha \), where \(p^*=P_*\{Q_{T,w}^*\ge Q_{T,w, \mathrm{obs}}\}\), \(Q_{T,w, \mathrm{obs}}\) being the observed value of the test statistic \(Q_{T,w}\). The result in Corollary 3 states that the test \(\Psi ^*\) is asymptotically correct, in the sense that its type I error is asymptotically equal to the nominal level \(\alpha \). In the next corollary, we prove that \({\Psi }^*\) is asymptotically able to detect any fixed alternative.

Corollary 4

If \(\mathcal{H}_0\) is not true and the assumptions of Theorem 4 hold, then \(P(\Psi ^*=1)\rightarrow 1\).

The calculation of the exact weighted bootstrap distribution of \(Q_{T,w}\) is, from a practical point of view, not feasible. Nevertheless, the weighted bootstrap p-value can be approximated by simulation, following the steps in the next algorithm (an R sketch of steps 2 and 3 is given after the algorithm). Let \(M=(M_{t,s})\) denote the \((T-m)\times (T-m)\) matrix with elements

$$\begin{aligned} M_{t,s}=\int \widetilde{H}_{m+t}(\mathbf{{u}};\widehat{\varvec{\vartheta }}_T) \widetilde{H}_{m+s}(\mathbf{{u}};\widehat{\varvec{\vartheta }}_T)w(\mathbf{{u}})\mathrm{d}{} \mathbf{{u}}, \quad 1\le t,s \le T-m. \end{aligned}$$
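Under the Gaussian weight used in this paper, these entries are available in closed form: since \(w(\cdot )\) is even, \(\int \{\cos (\mathbf{u}'\mathbf{a})+\sin (\mathbf{u}'\mathbf{a})\}\{\cos (\mathbf{u}'\mathbf{b})+\sin (\mathbf{u}'\mathbf{b})\}w(\mathbf{u})\mathrm{d}\mathbf{u}=I_w(\mathbf{a}-\mathbf{b})\), so every \(M_{t,s}\) reduces to evaluations of \(I_w\) at differences of the \(\widetilde{\varvec{\Upsilon }}_t\). The following R sketch exploits this; the function name and input layout are ours, and \(I_w(\mathbf{x})=\exp \{-(1/a)\Vert \mathbf{x}\Vert ^2\}\) (see Sect. 6) is assumed.

```r
# Sketch: the matrix M under the Gaussian weight I_w(x) = exp(-||x||^2 / a).
# Inputs (rows indexed by t = m+1, ..., T): d = \tilde\varepsilon_t^2 - 1,
# Ups = rows \tilde\Upsilon_t', A = rows \tilde A_t', Lhat = rows \hat L_t'.
M_matrix <- function(d, Ups, A, Lhat, a = 1) {
  n <- length(d)                          # n = T - m
  K <- exp(-as.matrix(dist(Ups))^2 / a)   # K[t, s] = I_w(Ups_t - Ups_s)
  B <- Lhat %*% t(A)                      # B[t, k] = \hat L_t' \tilde A_k
  D <- diag(d)
  # M = D K D - D K B'/n - B K D/n + B K B'/n^2, entrywise equal to (above)
  D %*% K %*% D - D %*% K %*% t(B) / n - B %*% K %*% D / n +
    B %*% K %*% t(B) / n^2
}
```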

Algorithm

  1. Compute the test statistic \(Q_{T,w,\mathrm{obs}}\).

  2. For some large integer B, repeat for every \(b \in \{1,\ldots ,B\}\):

     (a) Generate \(\xi ^b=(\xi _{m+1}^b, \ldots , \xi _T^b)'\).

     (b) Calculate \(Q_{T,w}^{*b}={\xi ^b}'M \xi ^b/(T-m)\).

  3. Approximate the p-value corresponding to \(Q_{T,w,\mathrm{obs}}\) by \(\hat{p}^{*}=\frac{1}{B}\sum _{b=1}^B I\{ Q_{T,w}^{*b}>Q_{T,w,\mathrm{obs}}\}\).
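A minimal R sketch of steps 2 and 3, assuming M has already been computed (e.g., as in the sketch above) and taking standard normal multipliers \(\xi _t\) (any law with mean zero and unit variance is admissible):

```r
# Sketch: multiplier resampling for the weighted bootstrap p-value.
wb_pvalue <- function(M, Q_obs, B = 1000) {
  n <- nrow(M)                            # n = T - m
  Q_star <- replicate(B, {
    xi <- rnorm(n)                        # standard normal multipliers (a choice)
    drop(xi %*% M %*% xi) / n             # Q_{T,w}^{*b} = xi' M xi / (T - m)
  })
  mean(Q_star > Q_obs)                    # approximate p-value \hat p^*
}
```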

Before moving to applications, we reiterate that in the weighted bootstrap algorithm, the parameter \({\varvec{\vartheta }}\) needs to be estimated only once, while the residual-based bootstrap counterpart requires estimation of this parameter for each resample.

5 Applications

The theory so far developed for testing \(\mathcal{H}_0\) is very general. This section discusses the applicability of the proposed procedure to testing for some special models.

5.1 Testing for the linear GARCH(p,q) model

Let us now consider the problem of testing for the linear GARCH model defined in (3), for some \(p \ge 1\), \(q \ge 0\). The consistency and the asymptotic normality of the QMLE in this model were first studied in Berkes et al. (2003b). Then, Francq and Zakoïan (2004) derived those properties under somewhat weaker assumptions, so we will follow the approach in the latter paper.

Francq and Zakoïan (2004) have given a necessary and sufficient condition for Assumption 1 to hold. If such a condition holds, then Assumption 4 is met (ibid. Proposition 1). Let \({\varvec{\vartheta }}=(\omega , \gamma _1, \ldots , \gamma _p, \beta _1, \ldots , \beta _q)' \in \Theta \subset (0,\infty ) \times [0, \infty )^{p+q}\), for some compact set \(\Theta \). Assume that \({\varvec{\vartheta }}_0=(\omega _0, \gamma _{01}, \ldots , \gamma _{0p}, \beta _{01}, \ldots , \beta _{0q})'\) belongs to the interior of \(\Theta \), \(\varepsilon _t^2\) has a non-degenerate distribution satisfying \(\mathbb {E}(\varepsilon _0^4)<\infty \), \(\{x_t\}\) is stationary and ergodic, \(\sum _{j=1}^q\beta _{0j}<1\) and that the parameter \({\varvec{\vartheta }}_0\) is identifiable. Then, the QMLE of \({\varvec{\vartheta }}\) satisfies Assumption 8 with

$$\begin{aligned} \begin{array}{l} \displaystyle L_t=L_t({\varvec{\vartheta }})=(\varepsilon _t^2-1)J^{-1}A_t({\varvec{\vartheta }}), \quad 1\le t \le T,\\ \displaystyle J=\mathbb {E}\{A_1({\varvec{\vartheta }})A_1({\varvec{\vartheta }})'\},\\ \displaystyle A_t({\varvec{\vartheta }})=\frac{\dot{\sigma }_t^2(\varvec{\vartheta })}{\sigma _t^2(\varvec{\vartheta })}, \quad 1\le t \le T, \end{array} \end{aligned}$$
(14)

(ibid. Theorem 2.2). Under these conditions, it follows that Assumptions 3, 5 and 6 hold (ibid. Theorems 2.1 and 2.2).

For the consistency of the weighted bootstrap distribution estimator of \(Q_{T,w}\), we also need \(\widehat{L_t({\varvec{\vartheta }}_0)}\) satisfying (13). Let

$$\begin{aligned} \begin{array}{l} \displaystyle \hat{L}_t=(\tilde{\varepsilon }_t^2-1)\hat{J}^{-1}\tilde{A}_t(\widehat{\varvec{\vartheta }}_T), \quad 1\le t \le T, \\ \displaystyle \hat{J}=\frac{1}{T}\sum _{t=1}^T\tilde{A}_t(\widehat{\varvec{\vartheta }}_T)\tilde{A}_t(\widehat{\varvec{\vartheta }}_T)', \end{array} \end{aligned}$$
(15)

and \(\tilde{A}_t({\varvec{\vartheta }}) \) as defined in (12). Proposition 7.1 in Jiménez-Gamero and Pardo-Fernández (2017) shows that \(\{\hat{L}_t, \, 1\le t \le T\}\) satisfies (13), while Lemma B.1 (f) of the same paper guarantees that Assumption 9 holds. Therefore, our procedure can be applied to testing for the linear GARCH model when the parameter is estimated by means of the QMLE.
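In practice, \(\tilde{A}_t(\widehat{\varvec{\vartheta }}_T)\) can be obtained by differentiating the volatility recursion. A sketch for the GARCH(1,1) case follows; starting the derivative recursions at zero and \(\tilde{\sigma }_1^2\) at the sample variance are assumed initializations, and the function name is ours.

```r
# Sketch: \tilde A_t = (d sigma2_t / d theta) / sigma2_t for a GARCH(1,1),
# theta = (omega, gamma, beta); rows of the returned matrix are \tilde A_t'.
garch11_A <- function(x, omega, gamma, beta) {
  n <- length(x)
  sigma2 <- numeric(n)
  dsigma2 <- matrix(0, n, 3)              # assumed zero initial derivatives
  sigma2[1] <- var(x)                     # assumed initial volatility value
  for (t in 2:n) {
    sigma2[t] <- omega + gamma * x[t - 1]^2 + beta * sigma2[t - 1]
    # chain rule: d sigma2_t = (1, x_{t-1}^2, sigma2_{t-1}) + beta * d sigma2_{t-1}
    dsigma2[t, ] <- c(1, x[t - 1]^2, sigma2[t - 1]) + beta * dsigma2[t - 1, ]
  }
  dsigma2 / sigma2
}
```

With these rows, \(\hat{J}\) is the sample average of \(\tilde{A}_t\tilde{A}_t'\) and \(\hat{L}_t\) follows from (15).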

We note that the asymptotic results in Leucht et al. (2015) for the special case of the linear GARCH(1,1) model are obtained under the same assumptions, but the corresponding residual-based bootstrap resampling scheme requires stronger assumptions than our proposal. As mentioned in the Introduction, the approach in Escanciano (2008) is rather different from ours; even the required assumptions are not comparable. Although that paper does not assume that the innovations are IID, when the test in that paper is applied to test for \(\mathcal {H}_0\), it requires \(\mathbb {E}\{(x_0^2-\sigma _0^2)^2\}<\infty \), which is ensured if \(\mathbb {E}\{x_0^4\}<\infty \) and \(\mathbb {E}\{\sigma _0^4\}<\infty \). From Theorem 2.9 in Francq and Zakoïan (2010), the existence of such moments entails certain restrictions on the parametric space \(\Theta \).

5.2 Testing for the asymmetric log-GARCH(p,q) model

Although the linear GARCH model is the most popular, it possesses some limitations, such as (i) the positivity constraints on the components of \({\varvec{\vartheta }}\), which complicate estimation procedures, (ii) the assumption that volatility is bounded from below and (iii) the lack of the leverage effect, whereby negative observations tend to have more impact on volatility than positive observations of the same magnitude. For these reasons, some alternative models have been suggested. Here, we consider the asymmetric log-GARCH(p,q) model, whose specification of the function \(h(\cdot )\) in (2) is given by

$$\begin{aligned} h(\varvec{\Upsilon }_{t,p,q})= \exp \left\{ \omega +\sum _{j=1}^p \left[ \gamma _{j+}I(x_{t-j}>0)+\gamma _{j-}I(x_{t-j}<0)\right] \log (x^2_{t-j}) +\sum _{j=1}^q \beta _j \log (\sigma ^2_{t-j}) \right\} , \quad t\in \mathbb {Z}. \end{aligned}$$

For recent works on this class of models, the reader is referred to Francq et al. (2013, 2018), and the references therein.

Theorem 2.1 in Francq et al. (2013) gives a sufficient condition for Assumption 1 to hold. Theorem 3.1 (see also Proposition 3.3 for the case \(p=q=1\)) in that paper gives a sufficient condition for Assumption 4 to hold. Expressions (7.6) and (7.8) in the proof of Theorem 4.1 in Francq et al. (2013) show that, under the assumptions of that theorem, Assumption 3 is also met. Under those assumptions, our Assumptions 5 and 6 also hold (ibid. Theorem 4.2). Moreover, the QMLE of \({\varvec{\vartheta }}\) satisfies Assumption 8 with \(L_t({\varvec{\vartheta }})\) as in (14).

For the consistency of the weighted bootstrap distribution estimator of \(Q_{T,w}\), we also need \(\widehat{L_t({\varvec{\vartheta }}_0)}\) satisfying (13). With this aim, we can use the approximation defined in (15). Then,

$$\begin{aligned} \sup _{{\varvec{\vartheta }}\in \Theta } \left\| \frac{\partial }{\partial {\varvec{\vartheta }}}\log \{\sigma _t^2(\varvec{\vartheta })\}-\frac{\partial }{\partial {\varvec{\vartheta }}}\log \{\tilde{\sigma }_t^2(\varvec{\vartheta })\}\right\| \le C \varrho ^t, \end{aligned}$$

(ibid. Theorem 4.2), which implies that Assumption 9 holds. Then, Proposition 7.1 in Jiménez-Gamero and Pardo-Fernández (2017) shows that \(\{\hat{L}_t, \, 1\le t \le T\}\) satisfies (13), since the proof is based on the assumptions made, and not on the particular specification of the linear GARCH. Therefore, our procedure can be applied to testing for the asymmetric log-GARCH model when the parameter is estimated by means of the QMLE.

5.3 Testing for specification of duration models

Engle and Russell (1998) proposed the linear autoregressive conditional duration (ACD) model to analyze high-frequency data on durations between successive financial market trades. Later on, since the linear specification is rather restrictive in practice, as pointed out by Dufour and Engle (2000), various nonlinear conditional duration models have been proposed (for example, Bauwens and Giot (2003) proposed logarithmic autoregressive conditional duration models).

Let us consider the autoregressive conditional duration model

$$\begin{aligned} \begin{array}{lll} y_t &=& \psi _t \epsilon _t, \\ \psi _t &=& h_{\varvec{\vartheta }}({\varvec{\Lambda }}_{t, p,q}), \end{array} \end{aligned}$$

where \({\varvec{\Lambda }}_{t, p, q}=(y_{t-1},\ldots ,y_{t-p}, \log \psi _{t-1},\ldots ,\log \psi _{t-q})^\prime \), \(\{h_{\varvec{\vartheta }} : {\varvec{\vartheta }} \in \Theta \}\) denotes a specific parametric family of nonnegative functions involving a parameter vector \(\varvec{\vartheta }\), and \(\{\epsilon _t\}_{t\in \mathbb {Z}}\) is a sequence of IID positive random variables with \(\mathbb {E} (\epsilon _0) = 1\) and \(\mathbb {E}(\epsilon _0^2)<\infty \).

In practice, \(y_t\) is the duration between two consecutive events, such as quotes or price changes, with \(\psi _t\) being its conditional expected value given past observations.

Notice that an autoregressive conditional duration model has the dynamics of the square of a GARCH model. Since our procedure, initially designed for testing model specification in GARCH models, depends on the square of the data and on the specification for \(\sigma _t^2\), all results carry over to the present model, and thus, specification testing may be carried out just by replacing \(x^2_t\) and \(\sigma _t^2\) by \(y_t\) and \(\psi _t\), respectively.
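For instance, for the linear ACD(1,1) specification \(\psi _t=\omega +\gamma y_{t-1}+\beta \psi _{t-1}\) used in Sect. 6.2.3, an R sketch of the resulting statistic, with the Gaussian weight and the assumed initial value \(\widetilde{\psi }_1=\bar{y}\), reads as follows (the function name is ours).

```r
# Sketch: the test statistic for a linear ACD(1,1), obtained from the GARCH
# version via the substitutions x_t^2 -> y_t and sigma_t^2 -> psi_t.
acd11_Q <- function(y, omega, gamma, beta, a = 1) {
  n <- length(y)
  psi <- numeric(n)
  psi[1] <- mean(y)                          # assumed initial value
  for (t in 2:n) psi[t] <- omega + gamma * y[t - 1] + beta * psi[t - 1]
  d <- (y / psi - 1)[-1]                     # \tilde\epsilon_t - 1, t = 2, ..., n
  Ups <- cbind(y[-n], log(psi[-n]))          # rows (y_{t-1}, log psi_{t-1})
  K <- exp(-as.matrix(dist(Ups))^2 / a)      # Gaussian weight I_w
  drop(d %*% K %*% d) / length(d)
}
```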

6 Monte Carlo and real data applications

6.1 Monte Carlo results

This section is devoted to the study of the finite-sample performance of the proposed tests in terms of level approximation and power. With this purpose, a Monte Carlo simulation study was conducted. In all cases, the innovations are sequences of IID variables having a standard normal distribution, and the parameters were estimated by means of the QMLE. The results displayed in the tables are the empirical rejection percentages obtained by generating in each case 1000 samples of size \(T=500\). As weight function \(w(\cdot )\), we took the density function of a \((p+q)\)-variate normal distribution with covariance matrix \((2/a)I\), which from (8) yields \(I_w({\varvec{x}})=\exp \{-(1/a)\Vert {\varvec{x}}\Vert ^2\}\); several values of \(a>0\) were tried. This choice meets Assumption 7. The nominal significance levels are \(\alpha =0.05\) and \(\alpha =0.10\). With the aim of comparing the behavior of the approximations provided by the weighted bootstrap (WB in the tables) studied in Sect. 4 and the residual-based bootstrap (RB in the tables) as defined in Leucht et al. (2015), we calculated both of them, with \(B=1000\) resamples for the weighted bootstrap and \(B=500\) bootstrap samples for the residual-based bootstrap, as in Leucht et al. (2015). All computations in this paper have been performed by using programs written in the R language (R Core Team 2018).

Table 1 Percentages of rejections under \(\mathcal{{H}}_{0, \mathrm{lin}}\) for \(T=500\) (level)

We first considered the problem of testing goodness-of-fit for a linear GARCH(1,1) model, that is,

$$\begin{aligned} \mathcal{{H}}_{0, \mathrm{lin}}: \sigma ^2_t=\omega + \gamma x_{t-1}^2+ \beta \sigma ^2_{t-1}. \end{aligned}$$

The parameter \((\omega , \gamma , \beta )\) was estimated by the QMLE by using the function garch of the R package tseries. In addition to the proposal in this paper, we tried the portmanteau-type test in Berkes et al. (2003a) (headed as Portm. in the tables) for several values of the number of lags considered, k, and the test in Hong and Lee (2003) (headed as HL in the tables) for several values of the bandwidth, p. Table 1 displays the results for the level. For the test proposed in this paper, when \(\beta \le 0.7\), both approximations give actual levels which are quite close to the nominal ones; for \(\beta =0.8\), the weighted bootstrap approximation is a bit liberal, and the residual-based bootstrap approximation becomes very conservative. Unreported simulation results indicate that for both methods a sample size as large as \(T=1000\) is required in order to obtain empirical levels close to the nominal values for \(\beta =0.8\), and that even larger sample sizes are required for more persistent models. Motivated by a real data set application, we also tried the weighted bootstrap approximation for \(T=3300\), \(\omega =0.02\), \(\gamma =0.04\) and \(\beta =0.95\). For this sample size, the residual-based bootstrap approximation becomes unaffordable. Table 2 displays the results, showing that when the model is very persistent, but the sample size is large, the weighted bootstrap provides an adequate approximation.
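For reference, the QMLE fit used in this experiment can be reproduced along the following lines; the simulated parameter values below are illustrative, and tseries reports the GARCH(1,1) coefficients as a0, a1 and b1.

```r
# Sketch: simulate a GARCH(1,1) and fit it by QMLE with tseries::garch.
library(tseries)
set.seed(1)
n <- 500; omega <- 0.1; gam <- 0.1; bet <- 0.7   # illustrative parameter values
x <- numeric(n)
s2 <- omega / (1 - gam - bet)                    # start at the stationary variance
for (t in 1:n) {
  x[t] <- sqrt(s2) * rnorm(1)
  s2 <- omega + gam * x[t]^2 + bet * s2
}
fit <- garch(x, order = c(1, 1))   # QMLE; coef order: a0 (omega), a1 (gamma), b1 (beta)
coef(fit)
```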

The test in Berkes et al. (2003a) is conservative for small values of \(\beta \) and rather liberal for large \(\beta \). In unreported simulations, we observed that such behavior for large \(\beta \) persists even for larger sample sizes. We also observe that the actual level strongly depends on the number of lags considered. The test in Hong and Lee (2003) is conservative for all bandwidths considered, and its level is robust to the choice of the bandwidth, as also observed by the authors in their paper.

Table 2 Percentages of rejections under \(\mathcal{{H}}_{0, \mathrm{lin}}\) for \(T=3300\) with the weighted bootstrap approximation (level)

As alternatives, we considered the QGARCH(1,1) defined through Eq. (1) with

$$\begin{aligned} \sigma ^2_t=\omega + \gamma x_{t-1}^2+ \beta \sigma ^2_{t-1} + \delta x_{t-1}, \end{aligned}$$

and the asymmetric log-GARCH(1,1), as defined in Sect. 5.2. Table 3 displays the obtained results for the power. It also includes the test in Leucht et al. (2015) (labeled as LNK in Table 3) with the same choices for the functions involved in the definition of their test statistic as those considered in their simulations. For the proposal in this paper and the QGARCH alternative, the test based on the weighted bootstrap approximation and the one based on the residual-based bootstrap approximation exhibit quite close powers, but for the asymmetric log-GARCH alternative the test based on the weighted bootstrap approximation is clearly more powerful. Although under the null hypothesis both approximations should be close (at least for large sample sizes), under alternatives they may be rather different, and thus, the powers may differ considerably. In all tried cases, the test proposed in this paper outperforms the ones in Leucht et al. (2015), Berkes et al. (2003a) and Hong and Lee (2003).

Table 3 Percentages of rejections under alternatives for testing \(\mathcal{{H}}_{0, \mathrm{lin}}\) for \(T=500\) (power)

We also considered the problem of testing goodness-of-fit for the asymmetric log-GARCH(1,1) model, that is,

$$\begin{aligned} \mathcal{{H}}_{0, \mathrm{alog}}: \log (\sigma ^2_t)=\omega + \left\{ \gamma _+I(x_{t-1}>0)+\gamma _{-}I(x_{t-1}<0)\right\} \log (x^2_{t-1})+\beta \log (\sigma ^2_{t-1}). \end{aligned}$$

As before, in addition to the proposal in this paper, we tried (an adaptation of) the portmanteau-type test in Francq et al. (2018) (headed as Portm. in the tables) for several values of the number of lags considered, k, and again the test in Hong and Lee (2003) (headed as HL in the tables) for several values of the bandwidth, p. Table 4 displays the results for the level. As for the test proposed in this paper, both approximations give actual levels which are quite close to the nominal ones. Motivated by a real data set application, we also tried the weighted bootstrap approximation for \(T=3300\), \(\omega =0.04\), \(\gamma _+=0.03\), \(\gamma _-=0.02\) and \(\beta =0.96\). For this sample size, the residual-based bootstrap approximation becomes unaffordable. Table 5 displays the results obtained. The empirical levels are quite close to the nominal values. The portmanteau-type test gives actual levels close to the nominal values for large k. The test in Hong and Lee (2003) is conservative for all bandwidths considered.

Table 4 Percentages of rejections under \(\mathcal{{H}}_{0, \mathrm{alog}}\) for \(T=500\) (level)
Table 5 Percentages of rejections under \(\mathcal{{H}}_{0, \mathrm{alog}}\) for \(T=3300\) with the weighted bootstrap approximation (level)

As an alternative, we considered the QGARCH(1,1) model. Table 6 displays the obtained power results. As noticed before, for the proposal in this paper, the tests based on the considered approximations to the null distribution may have different powers. In this case, the test based on the residual-based bootstrap approximation is a bit more powerful than the one based on the weighted bootstrap approximation. In all tried cases, the new test outperforms the portmanteau-type test and the one in Hong and Lee (2003).

Table 6 Percentages of rejections under alternatives for testing \(\mathcal{{H}}_{0, \mathrm{alog}}\) for \(T=500\) (power)

We also compared both approximations in terms of the required CPU time. Table 7 shows the average CPU time in seconds to get one p-value for testing \(\mathcal{{H}}_{0, \mathrm{lin}}\) for \(T=500, 1000\), by generating \(B=1000\) resamples in each case. The quantities in the table clearly show the computational efficiency of the weighted bootstrap in comparison with the residual-based bootstrap. This advantage of the weighted bootstrap over the residual-based bootstrap comes from the fact that the former does not re-estimate the GARCH parameters and the residuals at each iteration.

Table 7 CPU time consumed for the calculation of one p-value (in seconds)

6.2 Real data fits to GARCH and duration models

This subsection reports the results of three real data set applications.

6.2.1 S&P500 daily stock index

First, we applied our methodology to the extensively studied S&P500 daily stock index. Specifically, we considered two periods: the first from January 2, 1990, to December 31, 1993, and the second from January 2, 1998, to August 28, 2002. Figures 1 and 2 in the Supplementary Material display the original time series and the time series of the log returns for each period. The number of observations in these periods is 1013 and 1170, respectively. The reason for choosing such periods is that, according to the results obtained in Hlávka et al. (2017), the S&P500 return series associated with each period is a martingale difference sequence. So we could go a step further and test for some GARCH-type specification.

We tested the null hypothesis \(\mathcal{{H}}_{0, \mathrm{lin}}\). The QMLEs of the parameters are \((\hat{\omega }, \hat{\gamma }, \hat{\beta })=(7.0185 \times 10^{-9}, 1.7753\times 10^{-2}, 9.8129\times 10^{-1})\) for the first period and \((\hat{\omega }, \hat{\gamma }, \hat{\beta })=(7.1723 \times 10^{-6}, 9.4489\times 10^{-2}, 8.6811\times 10^{-1})\) for the second one. Since from the simulations we know that, for the sample size at hand, the resulting test is a bit liberal under persistent models, we refrain from applying the weighted bootstrap approximation to calculate the p-value for the fitted GARCH(1,1) model of the first period. The residual-based bootstrap p-values for the proposed test and the test in Leucht et al. (2015) are 0.014 (for \(a=0.5\), 1.0 and 1.5) and 0.080, respectively, showing that the GARCH(1,1) specification for the conditional variance is not adequate for the first period. For the second period, we got the weighted bootstrap p-values 0.581, 0.564 and 0.544, and the residual-based bootstrap p-values 0.302, 0.334 and 0.350, for \(a=0.5\), 1.0 and 1.5, respectively, and 0.204 for the test in Leucht et al. (2015). No test rejects \(\mathcal{{H}}_{0, \mathrm{lin}}\) for the second period. A visual inspection of Figure 1 in the Supplementary Material suggests a possible dispersion change within the first period, whereas Figure 2 in the Supplementary Material shows more stability for the second period.

6.2.2 Daily exchange rates

We consider the return series of the daily exchange rates of the American Dollar (USD), the Japanese Yen (JPY), the British Pound (GBP), the Swiss Franc (CHF) and the Canadian Dollar (CAD) with respect to the Euro, from January 5, 1999, to January 18, 2012, which corresponds to 3344 observations. The data were obtained from the web site http://www.ecb.int/stats/exchange/eurofxref/html/index.en.html. These data were analyzed in Francq et al. (2013), where the authors fitted a log-GARCH(1,1) model to each exchange rate. Here, we test for the adequacy of such a model as well as of the (linear) GARCH(1,1). Table 8 displays the parameter estimates for each model and each exchange rate (although Table 1 in Francq et al. (2013) displays the fitted parameters for the log-GARCH(1,1) model, we include them here for the sake of completeness) and the weighted bootstrap p-values for testing goodness-of-fit for each model, based on 2000 replications. Looking at the results in this table, we conclude that the log-GARCH(1,1) provides an adequate fit only for the USD and the GBP series. On the other hand, the GARCH(1,1) model is adequate for the GBP, the CHF and the CAD series, while no model provides a good description of the JPY data.

Table 8 Fitted parameter values for the GARCH(1,1) and the log-GARCH(1,1) models on daily returns of exchange rates and weighted bootstrap p-values for testing goodness-of-fit to these models

6.2.3 Exxon price duration

In our third application, we use NYSE price duration data to assess the fit of the linear ACD(1,1) model

$$\begin{aligned} \psi _t = \omega +\gamma y_{t-1}+\beta \psi _{t-1}, \end{aligned}$$

to the adjusted Exxon price durations, which comprise \(n=2716\) observations (see Fernandes and Grammig (2005) for a detailed description of the data). Figure 3 in the Supplementary Material displays the duration plot and histogram. Fernandes and Grammig (2005) and Meintanis et al. (2018) tested goodness-of-fit for the innovation distribution by assuming a linear ACD(1,1) model. In estimating the parameters, we employed the QMLE with a unit exponential innovation distribution. The resulting estimates are \((\hat{\omega }, \hat{\gamma }, \hat{\beta })= (0.0695, 0.0507, 0.8801)\), yielding the weighted bootstrap p-values 0.453, 0.544 and 0.545, and the corresponding residual-based bootstrap p-values 1.0, 1.0 and 1.0, for \(a=0.5\), 1.0 and 1.5, respectively, indicating that the linear ACD(1,1) model cannot be rejected.

7 Conclusion

We suggest a goodness-of-fit test for general conditional moment specification models of arbitrary (but finite) memory, which is consistent against any deviation from the null model and which may be readily computed without approximate numerical techniques. The new procedure was implemented by suitable bootstrap resampling and found to perform well in finite samples for GARCH models as well as for conditional duration models. Thus, our approach suits practitioners’ purposes well, since it can readily be adapted to a wide range of GARCH-type models with applications in diverse fields.

Although not handled here, our method can readily be extended to Poisson autoregressive models for time series of counts, wherein the conditional mean-adjusted observations play the same role as the error terms in GARCH models. An entirely different question is whether the present method can be adapted to test goodness-of-fit to continuous-time volatility models, such as the models tested in Monsalve-Cobis et al. (2011) and González-Manteiga et al. (2017). This issue, however, requires separate investigation.