1 Introduction

Empirical processes play a prominent role in Statistics, since statistical procedures often involve functionals of them. In certain settings, such as linear models or time series, some variables of interest, such as the errors or innovations, cannot be directly observed and inference is based on the residuals, whose calculation involves the estimation of certain parameters in the model. GARCH models, introduced by Bollerslev (1986), belong to this class. The present paper is concerned with the study of the empirical characteristic function (ECF) of the residuals of these models. This study is motivated by the fact that the last decades have witnessed an increasing number of statistical procedures based on functions of the ECF process in a wide range of models and settings: for example, in point estimation (Feuerverger and McDunnough 1981a, b), the \(k\)-sample problem (Hušková and Meintanis 2008; Alba Fernández et al. 2008) and goodness-of-fit (GOF) tests (Epps and Pulley 1983; Baringhaus and Henze 1988; Gürtler and Henze 2000; Meintanis 2004; Matsui and Takemura 2005, 2008; Jiménez-Gamero et al. 2009, for independent and identically distributed (IID) observations, and Hušková and Meintanis 2007, 2010 and Jiménez-Gamero et al. 2005, for the errors in regression models, among many others). Much of the appeal of these procedures is that their application usually requires weaker conditions than their analogues based on the empirical distribution function (EDF). Another advantage of the statistical procedures based on the ECF over those based on the EDF is that while the data dimension plays an important role in the latter (for instance, the Cramér–von Mises test cannot be readily calculated for \(d\)-dimensional data with \(d\ge 2\)), it plays no role for many ECF-based tests, since the Cramér–Wold device (see, for example, Serfling 1980, pp 17–18) is automatically applied. A key step towards the development of statistical procedures based on the ECF for making inferences on GARCH models is to study the ECF process of the residuals.

Some other processes associated with the residuals of GARCH models have been previously studied. For example, Berkes and Horváth (2001) have studied the empirical process of the observations; Berkes and Horváth (2003) have studied the empirical process of the squared residuals; the results in this last paper inspired those in Horváth et al. (2004), where some GOF tests based on the EDF of the squared residuals were numerically studied; Kulperger and Yu (2005) have studied partial sums of \(k\)th powers of residuals, with applications to change-point problems and GOF; Koul and Ling (2006) have studied the empirical process of the residuals with applications to testing GOF for the distribution of the innovations; Horváth et al. (2008) have studied partial sums of the squared observations and of their EDF.

This paper is devoted to the study of the limit behavior of the ECF process of the residuals. Specifically, we study the convergence in the class of continuous functions defined on a compact set, as well as the convergence in the Hilbert space \(L_2(w)=\{f:\mathbb {R} \rightarrow \mathbb {C}: \Vert f\Vert ^2_w=\int |f(t)|^2w(t)\mathrm{d}t<\infty \}\), for some nonnegative function \(w\) satisfying \(0<\int w(t)\mathrm{d}t<\infty \). We also study the convergence in law to a Gaussian process. The covariance structure of the limit process depends on the distribution of the innovations, the estimators employed to approximate the parameters of the GARCH model and the equation defining the model. Applications of the obtained results are reported. Specifically, we consider the problem of testing symmetry, which is equivalent to testing that the imaginary part of the population characteristic function (CF) of the innovations is equal to 0. Surprisingly, the limiting null distribution of the considered test statistic coincides with that derived for IID data, which depends only on the population CF. Another application to the problem of testing GOF for the distribution of the innovations is also given. In both applications, the null distribution of the test statistic is approximated by a bootstrap algorithm. The consistency of these bootstrap estimators is proven.

The paper is organized as follows. Section 2 describes the model and summarizes some properties that will be used throughout the paper. The main results concerning the asymptotic behavior of the ECF process of the residuals are given in Sect. 3. Section 4 provides two applications of the obtained results to testing symmetry and GOF for the distribution of the innovations. All proofs, as well as some intermediate results, are sketched in the Appendix.

Before ending this section, we introduce some notation: all vectors are column vectors; for any vector \(v\), \(v_k\) denotes its \(k\)th coordinate, \(\Vert v\Vert \) its Euclidean norm and \(v'\) its transpose; for any complex number \(x=a+\mathrm{i}b\), \(\bar{x}=a-\mathrm{i}b\) and \(|x|=\sqrt{a^2+b^2}=\sqrt{x \bar{x}}\); for any complex-valued function \(f\), \(\mathrm{Re}f(x)\) and \(\mathrm{Im}f(x)\) denote its real and imaginary parts, respectively, that is to say, \(f(x)=\mathrm{Re}f(x)+\mathrm{i}\,\mathrm{Im}f(x)\); \(P_0\), \(E_0\) and \(\mathrm{Cov}_0\) denote probability, expectation and covariance, respectively, by assuming that the null hypothesis is true; \(P_{*}\), \(E_{*}\) and \(\mathrm{Cov}_{*}\) denote the conditional probability law, expectation and covariance, given \(X_1, X_2, \ldots , X_n\), respectively; all limits in this paper are taken as \(n \rightarrow \infty \); \(\mathop {\rightarrow }\limits ^{\mathcal {L}}\) denotes convergence in distribution; \(\mathop {\rightarrow }\limits ^{P}\) denotes convergence in probability; \(\mathop {\rightarrow }\limits ^{a.s.}\) denotes almost sure convergence; an unspecified integral denotes integration over the whole real line \(\mathbb {R}\); \(\langle \cdot , \cdot \rangle \) denotes the scalar product in the Hilbert space \(L_2(w)\); without loss of generality, it will be assumed throughout the paper that \(\int w(t)\mathrm{d}t=1\).

2 The model

Let \(p,q \in \mathbb {N}\cup \{0\}\). A stochastic process \(\{X_j, \; -\infty <j<\infty \}\) is said to follow a GARCH(\(p\), \(q\)) model if it satisfies the equations

$$\begin{aligned} X_j=\sigma _j\varepsilon _j, \end{aligned}$$
(1)

with

$$\begin{aligned} \sigma _j^2=c+\sum _{k=1}^pa_kX^2_{j-k}+\sum _{l=1}^qb_l\sigma ^2_{j-l}, \end{aligned}$$
(2)

for \(-\infty <j<\infty \), where \(c>0\), \(a_k \ge 0\) and \(b_l \ge 0\). If \(q=0\) then we get an autoregressive conditional heteroscedastic (ARCH) model, introduced by Engle (1982). Throughout this paper, it will be assumed that \(\{X_j, \; -\infty <j<\infty \}\) satisfies (1) and (2), that it is stationary, that \(\{\varepsilon _j,\; -\infty <j<\infty \}\) are IID variables with \(E(\varepsilon _j)=0\) and \(E(\varepsilon _j^2)=1\), and that \(\varepsilon _j\) is independent of \(\{X_{j-k}, \; k \ge 1\}\).
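To fix ideas, the following sketch simulates data from a GARCH(1,1) model according to (1) and (2) with standard normal innovations; the parameter values are those used in the simulation experiment of Sect. 4.1, whereas the burn-in length and the initialization of the recursion at the stationary variance are practical choices made here only for illustration.

```python
import numpy as np

def simulate_garch11(n, c, a1, b1, burn_in=500, rng=None):
    """Simulate n observations from a GARCH(1,1) model, Eqs. (1)-(2),
    with IID standard normal innovations (so E(eps)=0, E(eps^2)=1).
    Requires a1 + b1 < 1 for a stationary solution with finite variance."""
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(n + burn_in)
    x = np.empty(n + burn_in)
    sigma2 = c / (1.0 - a1 - b1)          # start at the stationary variance E(X_j^2)
    for j in range(n + burn_in):
        x[j] = np.sqrt(sigma2) * eps[j]               # X_j = sigma_j * eps_j, Eq. (1)
        sigma2 = c + a1 * x[j] ** 2 + b1 * sigma2     # sigma_{j+1}^2, Eq. (2)
    return x[burn_in:]                    # discard the burn-in period

X = simulate_garch11(400, c=0.1, a1=0.3, b1=0.3, rng=0)
```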

Bougerol and Picard (1992a, b) have given necessary and sufficient conditions for the existence of a unique strictly stationary solution of (1) and (2). A necessary and sufficient condition for the process \(\{X_j, \; -\infty <j<\infty \}\) to be (strictly) stationary with \(E(X_j^2)<\infty \) is (see, for example, Theorem 4.4 in Fan and Yao 2003)

$$\begin{aligned} \sum _{k=1}^pa_k+\sum _{l=1}^qb_l<1. \end{aligned}$$

In this case, \(E(X_j)=0\) and

$$\begin{aligned} E(X_j^2)=c\left( 1-\sum _{k=1}^pa_k-\sum _{l=1}^qb_l\right) ^{-1}. \end{aligned}$$

Let \(\mathcal {F}_{j}\) be the \(\sigma \)-algebra generated by \(\{\varepsilon _k, \, -\infty <k\le j\}\). Since \(E(X_j^2 \, | \, \mathcal {F}_{j-1})=\sigma _j^2\), the expectations of \(X_j^2\) and \(\sigma _j^2\) coincide. If \(E(\log \sigma _0^2)<\infty \), then Theorem 2.1 in Berkes et al. (2003) shows that \(\sigma _j^2\) can be expressed as (see also Hall and Yao 2003)

$$\begin{aligned} \sigma _j^2=\sigma _j^2(\theta )&= \frac{c}{1-\sum _l b_l}+\sum _{k=1}^pa_kX^2_{j-k}\\&\quad +\,\sum _{k=1}^pa_k \sum _{v=1}^{\infty }\sum _{l_1=1}^q \cdots \sum _{l_v=1}^q b_{l_1}\cdots b_{l_v}X^2_{j-k-l_1-\cdots -l_v}, \end{aligned}$$

where \(\theta =(c,a_1,\ldots , a_p, b_1, \ldots , b_q)'\) and the multiple sum vanishes if \(q=0\). From Lemma 2.3 in Berkes et al. (2003), a sufficient condition for \(E(\log \sigma _0^2)<\infty \) to hold is that \(E(|\varepsilon _0^2|^{\delta })<\infty \), for some \(\delta >0\). Since we assume that \(E(\varepsilon _0^2)=1\), the above expansion for \(\sigma _j^2\) holds. Let \(r=1+p+q\) denote the dimension of \(\theta \), which is assumed to be fixed but unknown.

As in Berkes and Horváth (2003), it will also be assumed that \(\theta \in \Theta _0=\Theta (\rho _0,\rho _1,\rho _2)=\{u=(\gamma , \alpha _1, \ldots , \alpha _p,\beta _1, \ldots , \beta _q ): \, \beta _1+ \cdots + \beta _q \le \rho _0,\ \rho _1 \le \min \{\gamma , \alpha _1, \ldots , \alpha _p,\beta _1, \ldots , \beta _q \}\le \max \{\gamma , \alpha _1, \ldots , \alpha _p,\beta _1, \ldots , \beta _q \}\le \rho _2\}\), for some constants \(\rho _0,\rho _1,\rho _2\) satisfying \(0<\rho _0<1\), \(0<\rho _1<\rho _2\), \(q\rho _1 \le \rho _0\). Note that this assumption requires \(p\) and \(q\) to be known, and rules out zero coefficients in \(\theta \).

To estimate \(\theta \), it is often assumed that the errors \(\varepsilon _j\) are normally distributed and the resulting likelihood is maximized. This estimator, \(\hat{\theta }\), is called the Gaussian maximum likelihood estimator (GMLE). If

$$\begin{aligned} E( \varepsilon _0^4)<\infty , \end{aligned}$$
(3)

then \(\sqrt{n}(\hat{\theta }-\theta )\) is asymptotically normally distributed, even if the errors are not normally distributed (see Hall and Yao 2003; Francq and Zakoïan 2004). Moreover, even if (3) does not hold then, under certain conditions, \(n^{\kappa }(\hat{\theta }-\theta )\) is bounded in probability, for some \(\kappa >0\) (see Hall and Yao 2003). Although the GMLE has become the most popular estimator, other estimators have been proposed. Examples are the estimators in Peng and Yao (2003), which are asymptotically normally distributed without requiring (3), and those in Berkes and Horváth (2004), where a class of estimators including the GMLE is studied. From now on, \(\hat{\theta }\) will denote any estimator of \(\theta \).
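As an illustration of how the GMLE is usually computed in practice, the following sketch minimizes the Gaussian quasi-log-likelihood (up to additive constants) for a GARCH(1,1) model. The initialization of the volatility recursion at the sample variance, the box constraints standing in for \(\Theta _0\) and the choice of optimizer are ad hoc choices made for this sketch only; in the simulations of Sect. 4 the estimation is carried out with the R package tseries.

```python
import numpy as np
from scipy.optimize import minimize

def neg_gaussian_quasi_loglik(theta, X):
    """Negative Gaussian quasi-log-likelihood (up to additive constants)
    of a GARCH(1,1) model; theta = (c, a1, b1)."""
    c, a1, b1 = theta
    n = len(X)
    sigma2 = np.empty(n)
    sigma2[0] = np.var(X)             # crude initialization of the volatility recursion
    for j in range(1, n):
        sigma2[j] = c + a1 * X[j - 1] ** 2 + b1 * sigma2[j - 1]
    return 0.5 * np.sum(np.log(sigma2) + X ** 2 / sigma2)

def gmle_garch11(X, theta0=(0.05, 0.1, 0.8)):
    """Gaussian (quasi) MLE of (c, a1, b1): minimize the objective above
    over a box roughly mimicking the parameter space Theta_0."""
    bounds = [(1e-6, 10.0), (1e-6, 0.999), (1e-6, 0.999)]
    res = minimize(neg_gaussian_quasi_loglik, x0=np.asarray(theta0),
                   args=(X,), method="L-BFGS-B", bounds=bounds)
    return res.x                      # (c_hat, a1_hat, b1_hat)
```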

3 Main results

In a GARCH model, the errors are not observable. Thus to make inferences on the errors, we must approximate them by means of the residuals. With this aim, we must first estimate \(\sigma _j^2(\theta )\). Note that \(\sigma _j^2(\theta )\) depends on \(\{X_k,\, -\infty <k\le j-1\}\), whereas we observe \(X_1, \ldots , X_n\). So, in order to calculate the residuals, instead of \(\sigma _j^2(\hat{\theta })\), we consider \(\tilde{\sigma }_j^2(\hat{\theta })\), where

$$\begin{aligned} \tilde{\sigma }_j^2(\theta )&= \frac{c}{1-\sum _l b_l}+\sum _{k=1}^{\min \{p, j-1 \}} a_kX^2_{j-k}\\&\quad +\,\sum _{k=1}^pa_k \sum _{v=1}^{\infty }\sum _{l_1=1}^q \cdots \sum _{l_v=1}^q b_{l_1}\cdots b_{l_v}X^2_{j-k-l_1-\cdots -l_v}\, I(j-k-l_1-\cdots -l_v\ge 1) \end{aligned}$$

where \(I(S)\) denotes the indicator function of the set \(S\); note that \(\tilde{\sigma }_j^2(\theta )\) only depends on the observations \(X_1, \ldots , X_{j-1}\). Let \(\{\tilde{\varepsilon }_j=X_j/\tilde{\sigma }_j(\hat{\theta }), \, 1\le j\le n\}\) be the residuals and let \(\varphi _{n,\nu } (t)\) denote the ECF of the residuals \(\tilde{\varepsilon }_{\nu +1}, \ldots , \tilde{\varepsilon }_n\),

$$\begin{aligned} \varphi _{n, \nu }(t)=\frac{1}{n-\nu }\sum _{j=\nu +1}^n e^{it\tilde{\varepsilon }_j} , \end{aligned}$$

for some integer \(\nu \ge 1\). The reason for considering only the residuals \(\tilde{\varepsilon }_{\nu +1}, \ldots , \tilde{\varepsilon }_n\), instead of all of them, \(\tilde{\varepsilon }_{1}, \ldots , \tilde{\varepsilon }_n\), is that for small \(j\), \(\tilde{\sigma }_{j}^2(\theta )\) is not a good approximation to \(\sigma _{j}^2(\theta )\), and thus the first terms in the series should be avoided.
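In the GARCH(1,1) case it is easy to check that the truncated variance \(\tilde{\sigma }_j^2(\theta )\) defined above reduces to the recursion \(\tilde{\sigma }_j^2(\theta )=c+a_1X_{j-1}^2+b_1\tilde{\sigma }_{j-1}^2(\theta )\) started at \(\tilde{\sigma }_1^2(\theta )=c/(1-b_1)\), so the residuals and the ECF \(\varphi _{n,\nu }\) can be computed as in the following sketch; the grid of \(t\) values is an arbitrary illustrative choice, and the commented call assumes the data and the estimator from the previous sketches.

```python
import numpy as np

def residuals_and_ecf(X, theta_hat, nu, t_grid):
    """Residuals eps_tilde_j = X_j / sigma_tilde_j(theta_hat) of a GARCH(1,1)
    model and the ECF phi_{n,nu}(t) of eps_tilde_{nu+1}, ..., eps_tilde_n."""
    c, a1, b1 = theta_hat
    n = len(X)
    sigma2 = np.empty(n)
    sigma2[0] = c / (1.0 - b1)        # sigma_tilde_1^2
    for j in range(1, n):
        sigma2[j] = c + a1 * X[j - 1] ** 2 + b1 * sigma2[j - 1]
    eps_tilde = X / np.sqrt(sigma2)
    used = eps_tilde[nu:]             # eps_tilde_{nu+1}, ..., eps_tilde_n
    # phi_{n,nu}(t) = (n - nu)^{-1} sum_{j=nu+1}^{n} exp(i t eps_tilde_j)
    ecf = np.exp(1j * np.outer(t_grid, used)).mean(axis=1)
    return eps_tilde, ecf

t_grid = np.linspace(-5.0, 5.0, 201)
# eps_tilde, phi_hat = residuals_and_ecf(X, gmle_garch11(X), nu=10, t_grid=t_grid)
```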

For IID data, it is well known that the ECF of the data consistently estimates the population CF and that the ECF process converges to a complex Gaussian process on finite intervals (see, for example, Feuerverger and Mureika 1977; Csörgő 1981a, b; Marcus 1981). The next theorems state similar results for the ECF of the residuals. Let \(\varphi (t)\) denote the CF of \(\varepsilon _0\).

Theorem 1

Assume that \(\theta \in \Theta _0\) and \(n^{\kappa }(\hat{\theta }-\theta )=O_P(1)\), for some \(\kappa >0\). Let \(\nu =\nu (n)\) be an integer satisfying

$$\begin{aligned} \nu /n \rightarrow 0. \end{aligned}$$
(4)

Then,

  1. (a)

    \(\sup _{t\in S}\left| \varphi _{n, \nu }(t)- \varphi (t)\right| \mathop {\longrightarrow }\limits ^{P} 0\), for every compact interval \(S\).

  2. (b)

    \(\Vert \varphi _{n,\nu }-\varphi \Vert _w \mathop {\longrightarrow }\limits ^{P} 0. \)

Next, we study the convergence in law of the ECF process \(Y_{n, \nu }(t)=\sqrt{n-\nu }\{\varphi _{n, \nu }(t) -\varphi (t)\} \) and of its \(L_2(w)\)-norm, \(\Vert Y_{n, \nu }\Vert _w\). With this aim, we will assume that \(\sqrt{n}(\hat{\theta }-\theta )\) is asymptotically normal. Specifically, we will assume that \(\hat{\theta }\) satisfies the following.

  1. (A.1)

    \(\hat{\theta }\) can be expressed as

    $$\begin{aligned} \hat{\theta }={\theta }+{n}^{-1}\sum _{j=1}^nL_j(\theta )+ o_P(n^{-1/2}), \end{aligned}$$

    where \(L_j(\theta )=(g_1(\varepsilon _j)l_1(\varepsilon _{j-1},\, \varepsilon _{j-2}, \ldots ), \, \ldots , g_{r}(\varepsilon _j)l_{r}(\varepsilon _{j-1}, \varepsilon _{j-2}, \ldots ))',\) \(1\le j\le n\),

    $$\begin{aligned} E\{g_u(\varepsilon _0)\}=0,\quad E\{g_u(\varepsilon _0)^2\}<\infty , \quad E\{l_u(\varepsilon _{-1}, \varepsilon _{-2}, \ldots )^2\}<\infty , \quad 1 \le u \le r. \end{aligned}$$

The GMLE as well as other often used estimators of \(\theta \) satisfy (A.1) (see Sect. 3 of Berkes and Horváth 2003). If \(\hat{\theta }\) satisfies (A.1) then, by the Martingale Central Limit Theorem (see, for example, Kundu et al. 2000), \(\sqrt{n}(\hat{\theta }-\theta )\mathop {\longrightarrow }\limits ^{\mathcal {L}} N_r(0,\Sigma _{\theta })\), an \(r\)-variate zero mean normal law with variance matrix \(\Sigma _{\theta }=var\{L_0(\theta )\}=(\varsigma _{uv})\), where

$$\begin{aligned} \varsigma _{uv}=E\{g_u(\varepsilon _0) g_v(\varepsilon _0)\}E\{l_u(\varepsilon _{-1}, \varepsilon _{-2}, \ldots )l_v(\varepsilon _{-1}, \varepsilon _{-2}, \ldots )\}, \quad 1 \le u,v \le r. \end{aligned}$$

Let \(\mu _c(t)=\frac{\partial }{\partial t}\mathrm{Re}\varphi (t)=E\{-\varepsilon _0 \sin (t\varepsilon _0)\}\) and \(\mu _s(t)=\frac{\partial }{\partial t}\mathrm{Im}\varphi (t)=E\{\varepsilon _0 \cos (t\varepsilon _0)\}\). Observe that these derivatives exist because we assume that the innovations have finite first moment. Finally, let \(\sigma _j^2(\theta )A_j(\theta )\) be the \(r\)-vector of derivatives of \(\sigma _j^2(\theta )\) with respect to \(\theta \), that is, \(A_j(\theta )=\frac{1}{\sigma _j^2(\theta )}\frac{\partial }{\partial \theta }\sigma _j^2(\theta )\).

Theorem 2

Assume that \(\theta \in \Theta _0\) and that \(\hat{\theta }\) satisfies (A.1). Let \(\nu =\nu (n)\) be an integer satisfying (4). Let \(Y_{n, \nu }(t)=\sqrt{n-\nu }\{\varphi _{n, \nu }(t) -\varphi (t)\}\) and let \(Y(t)\) be a zero mean complex valued Gaussian process with covariance structure

$$\begin{aligned} \mathrm{Cov}\{\mathrm{Re}Y(t), \mathrm{Re}Y(s)\}&= \mathrm{Cov}\{C(t), C(s)\},\\ \mathrm{Cov}\{\mathrm{Re}Y(t), \mathrm{Im}Y(s)\}&= \mathrm{Cov}\{C(t), S(s)\},\\ \mathrm{Cov}\{\mathrm{Im}Y(t), \mathrm{Im}Y(s)\}&= \mathrm{Cov}\{S(t), S(s)\}, \end{aligned}$$

\(\forall t,s \in \mathbb {R}\), where \(C(t)=\cos (t\varepsilon _0)-\mathrm{Re}\varphi (t)-\frac{1}{2}t\mu _c(t)E\{A_0(\theta )\}'L_0(\theta )\), \(S(t)=\sin (t\varepsilon _0)-\mathrm{Im}\varphi (t)-\frac{1}{2}t\mu _s(t)E\{A_0(\theta )\}'L_0(\theta )\). Then,

  1. (a)

    \(Y_{n, \nu }(t)\) converges weakly to \(Y(t)\) on every compact interval.

  2. (b)

    If

    $$\begin{aligned} \int t^4w(t)\mathrm{d}t<\infty , \end{aligned}$$
    (5)

    then \(\Vert Y_{n, \nu }\Vert _w^2\mathop {\longrightarrow }\limits ^{\mathcal {L}} \Vert Y\Vert ^2_w\).

Remark 1

Let \(m_{c}(t)=\mathrm{Cov}\{\cos (t \varepsilon _0), L_0(\theta )\}\) and \(m_{s}(t)=\mathrm{Cov}\{\sin (t \varepsilon _0), L_0(\theta )\}\). Note that,

$$\begin{aligned} \mathrm{Cov}\{C(t), C(s)\}&= \frac{1}{2}\{\mathrm{Re}\varphi (t+s)+\mathrm{Re}\varphi (t-s)\}-\mathrm{Re}\varphi (t)\mathrm{Re}\varphi (s)\\&\quad -\frac{1}{2}t\mu _c(t)E\{A_0(\theta )\}'m_c(s) -\frac{1}{2}s\mu _c(s)E\{A_0(\theta )\}'m_c(t)\\&\quad +\frac{1}{4}ts\mu _c(t)\mu _c(s)E\{A_0(\theta )\}'\Sigma _{\theta } E\{A_0(\theta )\},\\ \mathrm{Cov}\{C(t), S(s)\}&= \frac{1}{2}\{\mathrm{Im}\varphi (t+s)-\mathrm{Im}\varphi (t-s)\}-\mathrm{Re}\varphi (t)\mathrm{Im}\varphi (s)\\&\quad - \frac{1}{2} t\mu _c(t)E\{A_0(\theta )\}'m_s(s)-\frac{1}{2}s\mu _s(s)E\{A_0(\theta )\}'m_c(t)\\&\quad + \frac{1}{4}ts\mu _c(t)\mu _s(s)E\{A_0(\theta )\}'\Sigma _{\theta } E\{A_0(\theta )\}, \\ \mathrm{Cov}\{S(t), S(s)\}&= \frac{1}{2}\{-\mathrm{Re}\varphi (t+s)+\mathrm{Re}\varphi (t-s)\}-\mathrm{Im}\varphi (t)\mathrm{Im}\varphi (s)\\&\quad -\frac{1}{2} t\mu _s(t)E\{A_0(\theta )\}'m_s(s) -\frac{1}{2}s\mu _s(s)E\{A_0(\theta )\}'m_s(t)\\&\quad +\frac{1}{4}ts\mu _s(t)\mu _s(s)E\{A_0(\theta )\}'\Sigma _{\theta } E\{A_0(\theta )\}, \end{aligned}$$

\(\forall t,s \in \mathbb {R}\). Therefore, in contrast to the IID case, the limit law of the ECF process depends not only on the CF of the innovations, but also on the estimator of \(\theta \) employed, through \(\Sigma _{\theta }\), \(m_c(t)\) and \(m_s(t)\), and on the equation defining the GARCH model through \(E\{A_0(\theta )\}\).

4 Applications

4.1 Testing for symmetry

Many commonly used packages allow the practitioner to choose between several symmetric distributions for obtaining the (quasi) maximum likelihood estimator of the parameter \(\theta \), usually: normal (obtaining the GMLE), Laplace and Student \(t\). The latter two distributions allow for tails heavier than those of the normal law, a fact frequently observed in financial time series (see Rydberg 2000). Note that all of these distributions are symmetric, a hypothesis questioned by several authors in the light of certain practical applications (see also Rydberg 2000). So, one may wish to test whether the hypothesis of symmetry is supported by the data. This hypothesis is equivalent to the following

$$\begin{aligned} H_{0S}: \text{ the law of the errors is symmetric } \Longleftrightarrow H_{0S}: \,\mathrm{Im}\varphi (t)=0, \ \forall t. \end{aligned}$$

As a consequence of Theorem 1, under the assumptions in this theorem, if \(w\) is a weight function such that \(w(t)>0\), \(\forall t \in \mathbb {R}\), then

$$\begin{aligned} T_{n,\nu }=T_{n,\nu }(X_1,\ldots , X_n)= \Vert \mathrm{Im} \varphi _{n,\nu }\Vert _w^2 \mathop {\longrightarrow }\limits ^{P} \Vert \mathrm{Im} \varphi \Vert _w^2 \ge 0, \end{aligned}$$
(6)

with \(\Vert \mathrm{Im} \varphi \Vert _w=0\) if and only if \(H_{0S}\) is true. So a reasonable test for \(H_{0S}\) should reject the null hypothesis for “large” values of \(T_{n,\nu }\). This statistic (with \(\nu =0\)) was first proposed by Feuerverger and Mureika (1977) for testing symmetry in the IID case (see also Henze et al. 2003). Now, to determine which values of \(T_{n,\nu }\) are large, we need the null distribution of \(T_{n,\nu }\), or at least a consistent approximation to it. Clearly, the null distribution of \(T_{n,\nu }\) is unknown. A classical way to approximate the null distribution of a test statistic is through its asymptotic null distribution. As a consequence of Theorem 2, under \(H_{0S}\),

$$\begin{aligned} (n-\nu )T_{n,\nu }\mathop {\longrightarrow }\limits ^{\mathcal {L}} W_{0S}=\Vert Y_{0S}\Vert _w^2, \end{aligned}$$
(7)

where \(Y_{0S}(t)=\mathrm{Im} Y(t)\), \(Y(t)\) being as defined in Theorem 2. From Theorem 2 and Remark 1, since under \(H_{0S}\) we have that \(\mathrm{Im} \varphi (t)=0\) and \(\mu _s(t)=\frac{\partial }{\partial t}\mathrm{Im} \varphi (t)=0\), \(\forall t\), it follows that the covariance structure of \(Y_{0S}(t)\) is given by

$$\begin{aligned} K(s,t)=E\{Y_{0S}(t) Y_{0S}(s)\} = \frac{1}{2}\{-\varphi (t+s)+\varphi (t-s)\}. \end{aligned}$$
(8)

Note that the asymptotic null distribution of \((n-\nu )T_{n,\nu }\) depends neither on the estimator of \(\theta \) employed nor on the equation defining the GARCH model governing the data, but only on the population CF of the innovations. In fact, the asymptotic null distribution of \((n-\nu )T_{n,\nu }\) coincides with that obtained in Feuerverger and Mureika (1977) for the IID case. In other words, under \(H_{0S}\), \(\sqrt{n-\nu }\, \mathrm{Im}\,\varphi _{n,\nu }(t)\) asymptotically behaves just like \(\sqrt{n}\,\mathrm{Im}\,\varphi _n(t)\) in the sense that both processes have the same weak limit, where \(\varphi _n(t)=\frac{1}{n}\sum _{j=1}^n e^{\mathrm{i}t\varepsilon _j}\) denotes the ECF of the true errors. Let \(0<\alpha <1\). The limit (7) tells us that \((n-\nu )T_{n,\nu }=O_P(1)\), and thus from (6), it follows that the test function for testing \(H_{0S}\)

$$\begin{aligned} {\Psi }_{S}={\Psi }_{S}(X_1, X_2, \ldots , X_n)=\left\{ \begin{array}{l@{\quad }l} 1, &{} \text{ if } (n-\nu )T_{n,\nu }\ge t_{ \alpha },\\ 0, &{}\text{ otherwise, } \end{array}\right. \end{aligned}$$
(9)

where \(t_{\alpha }\) is the \(1-\alpha \) percentile of the null distribution of \((n-\nu )T_{n,\nu }\), or a consistent approximation to it, is consistent against fixed alternatives, that is to say, it rejects \(H_{0S}\) with probability tending to one when it is false.

As observed before, the null distribution of \((n-\nu )T_{n,\nu }\) cannot be exactly calculated. The asymptotic null distribution of \((n-\nu )T_{n,\nu }\) cannot be used to approximate its null distribution, because it depends on the unknown CF of the innovations. Thus, we have to resort to other methods to approximate the null distribution of the test statistic.

The test (9) has been numerically investigated by Klar et al. (2012). To approximate the null distribution of the test statistic, these authors have employed the following bootstrap algorithm, which is quite similar to the bootstrap schemes employed in Hall and Yao (2003), Horváth et al. (2004) and Pascual et al. (2006).

Algorithm 1

  1. (i)

    On the basis of \(X_1,\ldots ,X_n\), compute \(\hat{\theta }=\hat{\theta }(X_1,\ldots ,X_n)=(\hat{c},\hat{a}_1,\ldots ,\hat{a}_p,\hat{b}_1,\ldots ,\hat{b}_q)'\).

  2. (ii)

    Compute the residuals \(\tilde{\varepsilon }_{1}, \ldots , \tilde{\varepsilon }_{n}\).

  3. (iii)

    Define the bootstrap observations

    $$\begin{aligned} X^*_{n,j}=\tilde{\sigma }_j^*(\hat{\theta }) \varepsilon ^*_j, \end{aligned}$$

    where

    $$\begin{aligned} \tilde{\sigma }_j^{*2}(\hat{\theta })=\hat{c}+\sum _{k=1}^{\min \{p,j-1\}}\hat{a}_kX^{*2}_{n,j-k}+\sum _{l=1}^{\min \{q,j-1\}}\hat{b}_l\tilde{\sigma }^{*2}_{j-l}(\hat{\theta }) \end{aligned}$$

    and \(\varepsilon ^*_j=\upsilon _j\tilde{\varepsilon }_j\), \(j=1, \ldots , n\), where \(\upsilon _{1}, \ldots , \upsilon _n\) are IID with \(P(\upsilon _j=-1)=P(\upsilon _j=1)=0.5\) and independent of \(\tilde{\varepsilon }_{1}, \ldots , \tilde{\varepsilon }_{n}\).

  4. (iv)

    Based on the bootstrap data \(\mathbf{X}_n^*=(X^*_{n,1},\ldots ,X^*_{n,n})\), calculate the test statistic, obtaining \(T^*_{n,\nu }=T_{n,\nu }(X^*_{n,1},\ldots ,X^*_{n,n})\). Approximate the null distribution of \((n-\nu )T_{n,\nu }\) through the conditional distribution of \((n-\nu )T_{n,\nu }^{*}\), given the data.
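For a GARCH(1,1) model, step (iii) of Algorithm 1 can be sketched as follows; the re-estimation of \(\hat{\theta }^*\) and the recomputation of the residuals and of the test statistic required in step (iv) are not shown.

```python
import numpy as np

def bootstrap_sample_alg1(eps_tilde, theta_hat, rng=None):
    """Step (iii) of Algorithm 1 for a GARCH(1,1): bootstrap observations
    X*_{n,j} built from sign-flipped residuals eps*_j = upsilon_j * eps_tilde_j."""
    rng = np.random.default_rng(rng)
    c, a1, b1 = theta_hat
    n = len(eps_tilde)
    upsilon = rng.choice([-1.0, 1.0], size=n)     # P(upsilon = -1) = P(upsilon = 1) = 1/2
    eps_star = upsilon * np.asarray(eps_tilde)
    x_star = np.empty(n)
    sigma2_star = np.empty(n)
    for j in range(n):
        sigma2_star[j] = c                        # min{p, j-1} = min{q, j-1} = 0 when j = 1
        if j >= 1:
            sigma2_star[j] += a1 * x_star[j - 1] ** 2 + b1 * sigma2_star[j - 1]
        x_star[j] = np.sqrt(sigma2_star[j]) * eps_star[j]
    return x_star
```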

The above algorithm can be slightly modified by generating the bootstrap innovations from a symmetrization of the EDF of the residuals. We call the resulting bootstrap procedure Algorithm 2.

Algorithm 2

Steps (i), (ii) and (iv) are as in Algorithm 1.

  1. (iii)

    The bootstrap observations are defined as in Algorithm 1, but now \(\varepsilon ^*_{\nu +1},\ldots , \varepsilon ^*_{n}\) are IID from the EDF of \(\tilde{\varepsilon }_{\nu +1}, \ldots , \tilde{\varepsilon }_{n},-\tilde{\varepsilon }_{\nu +1}, \ldots , -\tilde{\varepsilon }_{n}.\)

In practice, the bootstrap estimation of the null distribution of \((n-\nu )T_{n,\nu }\) in step (iv) in Algorithms 1 and 2 must be carried out by simulation, that is, by generating a high number of bootstrap samples, say \(\mathbf{X}_n^{*1},\ldots ,\mathbf{X}_n^{*B}\), and then approximating the null distribution of \((n-\nu )T_{n,\nu }\) through the EDF of \((n-\nu )T_{n,\nu }^{*1}=(n-\nu )T_{n,\nu }(\mathbf{X}_n^{*1}), \ldots , (n-\nu )T_{n,\nu }^{*B}=(n-\nu )T_{n,\nu }(\mathbf{X}_n^{*B})\). This requires the calculation of \(\hat{\theta }^{*1}=\hat{\theta }(\mathbf{X}_n^{*1}), \ldots , \hat{\theta }^{*B}=\hat{\theta }(\mathbf{X}_n^{*B})\) as well as the bootstrap residuals, \(\tilde{\varepsilon }_j^{*b}\), \(\nu +1 \le j\le n\), \(1\le b \le B\).

We can save computing time considerably by taking advantage of the property that, under \(H_{0S}\), \(\sqrt{n-\nu }\, \mathrm{Im}\,\varphi _{n,\nu }(t)\) asymptotically behaves the same as \(\sqrt{n}\,\mathrm{Im}\,\varphi _n(t)\). With this aim, we treat the residuals \(\tilde{\varepsilon }_{\nu +1}, \ldots , \tilde{\varepsilon }_n\) as if they were the true errors \(\varepsilon _{\nu +1},\ldots , \varepsilon _n\) and then act as in the IID setting (following, for example, the approach in Henze et al. 2003). This way we avoid the calculation of \((n-\nu )T_{n,\nu }^{*1}=(n-\nu )T_{n,\nu }(\mathbf{X}_n^{*1}), \ldots , (n-\nu )T_{n,\nu }^{*B}=(n-\nu )T_{n,\nu }(\mathbf{X}_n^{*B})\). Algorithms 3 and 4 give two bootstrap null distribution estimators that make use of this fact; a sketch of both is given after Algorithm 4.

Algorithm 3

Steps (i) and (ii) are as in Algorithm 1.

  1. (iii)

    Let \(\varepsilon ^*_j=\upsilon _j\tilde{\varepsilon }_j\), \(j=1, \ldots , n\), where \(\upsilon _{1}, \ldots , \upsilon _n\) are IID with \(P(\upsilon _j=-1)=P(\upsilon _j=1)=0.5\), and \(\upsilon _{1}, \ldots , \upsilon _n\) are also independent of \(\tilde{\varepsilon }_{1}, \ldots , \tilde{\varepsilon }_{n}\).

  2. (iv)

    Approximate the null distribution of \((n-\nu )T_{n,\nu }\) through the conditional distribution of \((n-\nu )T_{n,\nu }^{*}\), given the data, where \(T_{n,\nu }^{*}= \Vert S_{n,\nu }^{*}\Vert _w^2\) and \(S_{n,\nu }^{*}(t)=\frac{1}{n-\nu }\sum _{j=\nu +1}^n \sin (t\varepsilon ^*_j).\)

Algorithm 4

Steps (i), (ii) and (iv) are as in Algorithm 3.

  1. (iii)

    \(\varepsilon ^*_{\nu +1},\ldots , \varepsilon ^*_{n}\) are IID from the EDF of \(\tilde{\varepsilon }_{\nu +1}, \ldots , \tilde{\varepsilon }_{n},-\tilde{\varepsilon }_{\nu +1}, \ldots , -\tilde{\varepsilon }_{n}.\)
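The following sketch implements Algorithms 3 and 4, taking \(w\) to be the standard normal density (the choice made in the simulations reported below), for which \(I_w(t)=\int \cos (tx)w(x)\mathrm{d}x=e^{-t^2/2}\); the closed-form expression used for \(\Vert \mathrm{Im}\,\varphi \Vert _w^2\) is the one given in Remark 2 below.

```python
import numpy as np

def T_sym(eps, weight_iw=lambda t: np.exp(-t ** 2 / 2.0)):
    """||Im ECF||_w^2 of the values in eps, computed via the closed form
    (1/m^2) sum_{j,k} (1/2){I_w(e_j - e_k) - I_w(e_j + e_k)}; with the
    standard normal weight, I_w(t) = exp(-t^2 / 2)."""
    eps = np.asarray(eps)
    m = len(eps)
    diff = np.subtract.outer(eps, eps)
    summ = np.add.outer(eps, eps)
    return 0.5 * (weight_iw(diff) - weight_iw(summ)).sum() / m ** 2

def bootstrap_pvalue_alg34(eps_tilde_used, B=200, symmetrized_edf=False, rng=None):
    """Bootstrap p value of (n - nu) * T_{n,nu}; eps_tilde_used holds the
    residuals eps_tilde_{nu+1}, ..., eps_tilde_n.  symmetrized_edf=False
    gives Algorithm 3 (sign flips), True gives Algorithm 4."""
    rng = np.random.default_rng(rng)
    eps_tilde_used = np.asarray(eps_tilde_used)
    m = len(eps_tilde_used)                       # m = n - nu
    t_obs = m * T_sym(eps_tilde_used)
    pool = np.concatenate([eps_tilde_used, -eps_tilde_used])
    t_star = np.empty(B)
    for b in range(B):
        if symmetrized_edf:                       # Algorithm 4, step (iii)
            eps_star = rng.choice(pool, size=m, replace=True)
        else:                                     # Algorithm 3, step (iii)
            eps_star = rng.choice([-1.0, 1.0], size=m) * eps_tilde_used
        t_star[b] = m * T_sym(eps_star)           # (n - nu) * T*_{n,nu}
    return np.mean(t_star >= t_obs)
```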

We next show the consistency of the distribution estimators yielded by Algorithms 3 and 4. Observe that no additional assumptions are needed to prove such consistency. Before stating these results, we want to remark that to derive the convergence in (7), it is not necessary to assume that \(\hat{\theta }\) satisfies (A.1), but only that \(\sqrt{n}(\hat{\theta }-\theta )=O_P(1)\) (this is evident from the proof of Theorem 2).

Theorem 3

Assume that \(\theta \in \Theta _0\), that (5) holds and that \(\sqrt{n}(\hat{\theta }-\theta )=O_P(1)\). Let \(\nu =\nu (n)\) be an integer satisfying (4). If \(T_{n,\nu }^{*}\) is as defined in Algorithm 3 or Algorithm 4, then

$$\begin{aligned} \sup _x|P_{*}\{(n-\nu )T_{n,\nu }^{*}\le x\}-P(W_{0S}\le x)|\mathop {\rightarrow }\limits ^{P}0, \end{aligned}$$

where \(W_{0S}\) is as defined in (7).

Let \(\{\lambda _j\}\) be the set of eigenvalues of the operator \(\mathcal {A}\) defined on \(L_2(w)\) by

$$\begin{aligned} \mathcal {A} v(y)=\int K (x,y) v(x)w(x)\mathrm{d}x. \end{aligned}$$

The random variate \(W_{0S}\) is distributed as an (infinite) sum of independent chi-squared variates with one degree of freedom, \(\chi ^2_1\), multiplied by the eigenvalues of \(\mathcal {A}\), \(\sum _{j} \lambda _j \chi ^2_{1j}\). The set \(\{\lambda _j\}\) is unknown because \(K(s,t)\) is unknown. Nevertheless, \(K(s,t)\) can be consistently estimated by

$$\begin{aligned} K_n(t,s)&= \frac{1}{n-\nu }\sum _{j=\nu +1}^n \sin (t\tilde{\varepsilon }_j)\sin (s\tilde{\varepsilon }_j)\\&= \frac{1}{2(n-\nu )}\sum _{j=\nu +1}^n \left[ \cos \{(t-s)\tilde{\varepsilon }_j\}-\cos \{(t+s)\tilde{\varepsilon }_j\}\right] . \end{aligned}$$

From Lemma 6 in Sect. 5, \(K_n(t,s) \mathop {\rightarrow }\limits ^{P} K(t,s)\), \(\forall s,t \in \mathbb {R}\). Thus, we can approximate the distribution of \(W_{0S}\), and thus the null distribution of \((n-\nu )T_{n,\nu }\), by means of

$$\begin{aligned} W_n=\sum _{j} \hat{\lambda }_j \chi ^2_{1j}, \end{aligned}$$
(10)

where \(\{\hat{\lambda }_j\}\) are the eigenvalues of the operator \(\mathcal {A}_n\) defined by \(\mathcal {A}_n v(y)=\int K_n (x,y) v(x)w(x)\mathrm{d}x\). Routine calculations show that \(\{\hat{\lambda }_j\}\) are the eigenvalues of the \((n-\nu )\times (n-\nu )\) matrix \(M=(m_{jk})\) with

$$\begin{aligned} m_{jk}=\frac{1}{2(n-\nu )} \left\{ I_w(\tilde{\varepsilon }_j-\tilde{\varepsilon }_k)-I_w(\tilde{\varepsilon }_j+\tilde{\varepsilon }_k)\right\} , \end{aligned}$$

where \(I_w(t)=\int \cos (tx)w(x)\mathrm{d}x\). Therefore, the set \(\{\hat{\lambda }_j\}\) can be easily calculated using most statistical and mathematical programming languages. \(W_n\) is also a bootstrap estimator of the null distribution of \((n-\nu )T_{n,\nu }\). It is usually called a “bootstrap in the limit” estimator, since it has been built by replacing all unknown quantities in the limit distribution of the test statistic by appropriate estimators. The next result shows that \(W_n\) estimates consistently the null distribution of \((n-\nu )T_{n,\nu }\).

Theorem 4

Under assumptions in Theorem 3,

$$\begin{aligned} \sup _x|P_{*}\{W_n\le x\}-P(W_{0S}\le x)|\mathop {\rightarrow }\limits ^{P}0, \end{aligned}$$

where \(W_{n}\) and \(W_{0S}\) are defined in (10) and (7), respectively.

The bootstrap approximation to the null distribution of \((n-\nu )T_{n,\nu }\) given in Theorem 4 will be called Algorithm 5.

Algorithm 5

Steps (i) and (ii) are as in Algorithm 1.

  1. (iii)

    Calculate the eigenvalues \(\{\hat{\lambda }_j\}\) of the matrix \(M\).

  2. (iv)

    Approximate the null distribution of \((n-\nu )T_{n,\nu }\) through the conditional distribution of \(W_n=\sum _{j} \hat{\lambda }_j \chi ^2_{1j}\), given the data.

Remark 2

Using the trigonometric identity \(2\sin (a) \sin (b)=\cos (a-b)-\cos (a+b)\), the following alternative expression for \((n-\nu )T_{n,\nu }\) can be easily derived,

$$\begin{aligned} (n-\nu )T_{n,\nu }= \sum _{j,k =\nu +1}^nm_{jk}, \end{aligned}$$

which is useful from a computational point of view.
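Combining the matrix \(M\), its eigenvalues and the expression in Remark 2, Algorithm 5 can be sketched as follows; as before, \(w\) is taken to be the standard normal density, so that \(I_w(t)=e^{-t^2/2}\), and the size of the Monte Carlo approximation to the conditional law of \(W_n\) is an arbitrary choice.

```python
import numpy as np

def m_matrix(eps_tilde_used, weight_iw=lambda t: np.exp(-t ** 2 / 2.0)):
    """The (n - nu) x (n - nu) matrix M = (m_jk); with the standard normal
    weight w, I_w(t) = exp(-t^2 / 2)."""
    e = np.asarray(eps_tilde_used)
    m = len(e)
    return (weight_iw(np.subtract.outer(e, e)) - weight_iw(np.add.outer(e, e))) / (2.0 * m)

def algorithm5_pvalue(eps_tilde_used, n_mc=10000, rng=None):
    """Algorithm 5: approximate the null distribution of (n - nu) * T_{n,nu}
    by W_n = sum_j lambda_hat_j chi^2_{1j}, where the lambda_hat_j are the
    eigenvalues of M, and return the corresponding p value."""
    rng = np.random.default_rng(rng)
    M = m_matrix(eps_tilde_used)
    t_obs = M.sum()                   # Remark 2: (n - nu) * T_{n,nu} = sum_{j,k} m_jk
    lam = np.linalg.eigvalsh(M)       # M is symmetric because I_w is an even function
    chi2 = rng.chisquare(1, size=(n_mc, len(lam)))
    w_n = chi2 @ lam                  # Monte Carlo draws from W_n, given the data
    return np.mean(w_n >= t_obs)
```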

Remark 3

In practice, the bootstrap distribution estimators in Algorithms 1–4 must be approximated by simulation. As for the null distribution estimator in Algorithm 5, since the distribution of a linear combination of \(\chi ^2\) variates has no simple closed form, the conditional distribution of \(W_n\) must be approximated either by simulation or by some numerical method (see, for example, Kotz et al. 1967; Castaño-Martínez and López-Blázquez 2005).

We have presented five bootstrap algorithms to estimate the null distribution of \((n-\nu )T_{n,\nu }\). To compare their finite sample performance, we carried out a small simulation experiment. We generated data from a GARCH(1,1) model with \(c = 0.1\), \(a_1 = 0.3\), \(b_1 = 0.3\) and several symmetric distributions for the innovations, namely, normal, Laplace and \(t_5\). We took sample size \(n = 400\) and \(\nu = 10\). As weight function \(w\) in the definition of the test statistic \(T_{n,\nu }\), we took the density of the standard normal distribution. As in Klar et al. (2012), in order to approximate the bootstrap \(p\) value of the observed value of the test statistic, we generated \(B = 200\) bootstrap samples for Algorithms 1–4. The conditional distribution of \(W_n\) was also approximated by simulation. This experiment was repeated 1,000 times. The parameters in the GARCH model were estimated through the GMLE. To calculate the parameter estimators as well as the residuals, we used the package tseries of the R language. Table 1 reports the fraction of bootstrap \(p\) values less than or equal to \(\alpha \), for \(\alpha =0.05,\, 0.10\), which are the estimated type I errors. Looking at this table, we see that the estimated type I errors are quite close to the nominal values in all cases. We also compared the algorithms in terms of the CPU time consumed. The last column of Table 1 displays the obtained results. Algorithm 5 emerges as the cheapest in terms of computing time.

Table 1 Estimated probabilities of type I errors and relative CPU

The power of the test \(\Psi _S\), when the null distribution of the test statistic is estimated by means of Algorithm 1, has been numerically investigated by Klar et al. (2012). To study if the method of approximating the null distribution has any impact on the power for finite sample size, we repeated the above experiment with samples from skewed versions of the symmetric distributions in Table 1. Such skewed versions were obtained by applying the skewing mechanism proposed in Fernández and Steel (1998), namely, the density of the skewed distribution, indexed by a scalar \(\gamma \in (0,\infty )\), is generated from the symmetric density \(f\) as follows

$$\begin{aligned} f_{\gamma }(t)=\frac{2}{\gamma +1/\gamma }\left\{ f(t/\gamma )I(t \ge 0)+f(\gamma t)I(t<0) \right\} . \end{aligned}$$

For \(\gamma =1\), we obtain the symmetric density \(f\); for \(\gamma >1\) (\(\gamma <1\)), \(f_{\gamma }\) is skewed to the right (left). Since \(f_{\gamma }(t)= f_{1/\gamma }(-t)\), it is sufficient to consider values \(\gamma >1\). As in Klar et al. (2012), the values of \(\gamma \) were chosen so that the skewness coefficient (which in our case coincides with the third moment, because \(E(\varepsilon _j)=0\) and \(E(\varepsilon _j^2)=1\)) takes comparable values across the different distributions. Table 2 displays the obtained results for nominal level \(\alpha =0.05\). Looking at this table, we conclude that the method of estimating the null distribution of the test statistic has little effect on the power, since all estimated powers are quite close.
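For completeness, a possible way of drawing standardized innovations from \(f_{\gamma }\) is sketched below. The two-piece representation (return \(\gamma |Z|\) with probability \(\gamma ^2/(1+\gamma ^2)\) and \(-|Z|/\gamma \) otherwise, with \(Z\) distributed according to \(f\)) follows directly from the form of \(f_{\gamma }\); the empirical rescaling to zero mean and unit variance is only a crude stand-in for an exact moment-based standardization.

```python
import numpy as np

def sample_skewed(base_sampler, gamma, size, rng=None):
    """Draw from the Fernandez-Steel skewed version f_gamma of a symmetric
    density f: gamma*|Z| with probability gamma^2/(1+gamma^2), else -|Z|/gamma,
    where Z has density f."""
    rng = np.random.default_rng(rng)
    z = np.abs(base_sampler(size, rng))
    right = rng.random(size) < gamma ** 2 / (1.0 + gamma ** 2)
    return np.where(right, gamma * z, -z / gamma)

def standardized_skew_normal(gamma, size, rng=None):
    """Skewed normal innovations, rescaled (here, empirically) to mean 0 and
    variance 1, as required of the errors in the GARCH model."""
    draws = sample_skewed(lambda m, g: g.standard_normal(m), gamma, size, rng)
    return (draws - draws.mean()) / draws.std()
```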

Table 2 Estimated powers for nominal level \(\alpha =0.05\)

Summarizing, since the levels and the powers yielded by the five algorithms are very close, and Algorithm 5 is, from a computational point of view, the cheapest, we recommend its use.

4.2 Testing goodness-of-fit for the distribution of the innovations

To estimate the parameters of a GARCH model, it is usually assumed that the errors or innovations are normally distributed. Under certain not very restrictive conditions, the resulting estimator is asymptotically normally distributed, even if the errors are not normally distributed (see Hall and Yao 2003; Berkes et al. 2003; Francq and Zakoïan 2004; Escanciano 2009). Nevertheless, as shown in Berkes and Horváth (2004) and numerically observed by Huang et al. (2008), the choice of the correct likelihood leads to more accurate estimates of the parameters. In addition, as argued in Angelidis et al. (2004) and Koul and Ling (2006), among many others, the knowledge of the error distribution plays an important role in evaluating the Value at Risk (VaR), a quantity very useful in economics and finance, whose calculation involves the distribution of the innovations. Hence, for certain purposes, a very important step in the analysis of GARCH models is to check whether the data support the distributional hypothesis made on the innovations.

Some tests have been proposed for testing GOF for the innovations distribution. Since the innovations or errors are not observable, all these tests are necessarily based on the estimated errors or residuals. The proposed tests are “residual versions” of GOF tests for IID data. For example, Horváth et al. (2004) have numerically studied some GOF tests based on the EDF of the squared residuals for testing normality; Kulperger and Yu (2005) have proposed a Jarque–Bera type normality test; Koul and Ling (2006) and Bai and Chen (2008) have proposed a Kolmogorov–Smirnov type GOF test for testing a simple null hypothesis; Horváth and Zitikis (2006), Mimoto (2008) and Koul and Mimoto (2012) have proposed GOF tests for testing a simple null hypothesis, which are based on a kernel-type density estimator calculated from the residuals.

In a recent paper, Klar et al. (2012) have numerically studied a test based on the ECF of the residuals, comparing it with some of the tests cited above, for the problem of testing normality. From the obtained numerical results, they conclude that the test based on the ECF is one of the most powerful. The test statistic based on the ECF considered in Klar et al. (2012) is just \(R_{n,\nu }=\Vert Y_{n, \nu }\Vert _w^2\), with \(\varphi (t)=\varphi _0(t)\) the CF of the normal law and \(w\) the density of the standard normal distribution. Thus, the results in Sect. 3 provide a theoretical basis for this test. Specifically, for testing

$$\begin{aligned} H_{0G}:\, \text{ the } \text{ CF } \text{ of } \,\varepsilon _0 \,\hbox {is } \,\varphi _0(t), \end{aligned}$$

for some \(\varphi _0(t)\) totally specified, from Theorems 1(b) and 2(b) it follows that the test

$$\begin{aligned} {\Psi }_{G}={\Psi }_{G}(X_1, X_2, \ldots , X_n)=\left\{ \begin{array}{l@{\quad }l} 1, &{} \text{ if } R_{n,\nu }\ge r_{ \alpha },\\ 0, &{}\text{ otherwise, } \end{array}\right. \end{aligned}$$

where \(r_{\alpha }\) is the \(1-\alpha \) percentile of the null distribution of \(R_{n,\nu }\), or a consistent approximation to it, is consistent against fixed alternatives, that is to say, it rejects \(H_{0G}\) with probability tending to one when it is false, whenever \(w(t)>0\), \(\forall t \in \mathbb {R}\). The null distribution of \(R_{n,\nu }\) cannot be exactly calculated. The asymptotic null distribution of \(R_{n,\nu }\) cannot be used to approximate its null distribution because it depends on unknowns (recall Remark 1). To approximate the null distribution of \(R_{n,\nu }\), Klar et al. (2012) have employed the following bootstrap algorithm.

Algorithm 6

  1. (i)

    On the basis of \(X_1,\ldots ,X_n\), compute \(\hat{\theta }=(\hat{c},\hat{a}_1,\ldots ,\hat{a}_p, \hat{b}_1,\ldots ,\hat{b}_q)'\).

  2. (ii)

    Define the bootstrap data

    $$\begin{aligned} X^*_{n,j}={\sigma }_j^*(\hat{\theta })\varepsilon _{j}^* \end{aligned}$$

    where \(\{\varepsilon _{j}^*, \, -\infty <j<\infty \}\) are IID with common CF \(\varphi _0(t)\) and

    $$\begin{aligned} {\sigma }_j^{*2}(\hat{\theta })= \hat{c}+\sum _{k=1}^p\hat{a}_kX^{*2}_{n,j-k}+\sum _{l=1}^q\hat{b}_l\sigma ^{*2}_{j-l}(\hat{\theta }), \quad j\in \mathbb {Z}. \end{aligned}$$
  3. (iii)

    Approximate the null distribution of \(R_{n,\nu }=R_{n,\nu }(X_1, \ldots , X_n)\) through the conditional distribution of \(R^*_{n,\nu }=R_{n,\nu }(X_1^*, \ldots , X_n^*)\), given \(X_1, \ldots , X_n\).

To prove that the above bootstrap scheme provides a consistent null distribution estimator of \(R_{n,\nu }\), we will assume that \(\hat{\theta }^*=\hat{\theta }(X_1^*, \ldots , X_n^*)\) satisfies the following assumption, which is equal to assumption (A.1), plus a Lindeberg condition to ensure that \(\sqrt{n}(\hat{\theta }^*-\hat{\theta })\) is asymptotically normal, plus a continuity condition to ensure that, when \(H_{0G}\) is true, \(\sqrt{n}(\hat{\theta }-{\theta })\) and \(\sqrt{n}(\hat{\theta }^*-\hat{\theta })\) both converge in law to the same limit.

  1. (A.2)
    1. (a)

      \(\hat{\theta }^*\) can be expressed as

      $$\begin{aligned} \hat{\theta }^*=\hat{\theta }+{n}^{-1}\sum _{j=1}^nL_{j}(\hat{\theta })+ r^*, \end{aligned}$$

      with \(r^*=o_{P_{*}}(n^{-1/2})\) in probability, that is to say, with probability tending to 1, and

      $$\begin{aligned} L_{j}(\hat{\theta })&= (g_1(\varepsilon _j^*)l_1(\varepsilon _{j-1}^*,\, \varepsilon _{j-2}^*, \ldots ), \, \ldots ,\\&\quad g_{r}(\varepsilon _j^*)l_{r}(\varepsilon _{j-1}^*, \varepsilon _{j-2}^*, \ldots ))', \quad 1\le j\le n. \end{aligned}$$
    2. (b)

      \(E_{*}\{g_u(\varepsilon _0^*)\}=0\), \(E_{*}\{g_u(\varepsilon _0^*)^2\}<\infty \), \(E_{*}\{l_u(\varepsilon _{-1}^*, \varepsilon _{-2}^*, \ldots )^2\}<\infty \), \(1 \le u \le r\), in probability.

    3. (c)

      For every \(b \in \mathbb {R}^r\), \(\frac{1}{n} \sum _{j=1}^n E_{*} \left[ \{b'L_{j}(\hat{\theta })\}^2 \, | \, \mathcal {F}_{0 j-1}\right] \mathop {\rightarrow }\limits ^{P_{*}} b'\Sigma _{0\theta }b\), with probability tending to 1, where \( \mathcal {F}_{0 j}\) is the \(\sigma \)-algebra generated by \(\{\varepsilon _{k}^*, \, -\infty <k \le j\}\) and \(\Sigma _{0\theta }=\mathrm{Cov}_0\{L_0(\theta )\}\).

    4. (d)

      \(\lim L_n(\epsilon , e_k)=0\) for every \(\epsilon >0\) and every \(1\le k \le r\), in probability, where \(\{e_1, \ldots , e_r\} \) is any basis of \(\mathbb {R}^r\) and for \(b \in \mathbb {R}^r\),

      $$\begin{aligned} L_n(\epsilon , b)=\frac{1}{n} \sum _{j=1}^n E_{*}\left[ \{b'L_{j}(\hat{\theta })\}^2 I\{|b'L_{j}(\hat{\theta })|>\epsilon \}\right] . \end{aligned}$$
    5. (e)

      For every \(b \in \mathbb {R}^r\), \(\frac{1}{n} \sum _{j=1}^n E_{*} \left\{ \cos (t\varepsilon _j^*)b'L_{j}(\hat{\theta }) \, | \, \mathcal {F}_{0 j-1}\right\} \mathop {\rightarrow }\limits ^{P_{*}} E_0 \{\cos (t\varepsilon _0) b'L_{0}({\theta })\}\) and \(\frac{1}{n}\! \sum _{j=1}^n E_{*} \!\left\{ \sin (t\varepsilon _j^*)b'L_{j}(\hat{\theta }) | \mathcal {F}_{0 j-1} \right\} \!\mathop {\rightarrow }\limits ^{P_{*}}\! E_0\{\sin (t\varepsilon _0)b' L_{0}({\theta })\}\) in probability, \(\forall t\in \mathbb {R}\).

If \(\hat{\theta }^*\) satisfies (A.2)(a)–(d) then, from Theorem 1.3 in Kundu et al. (2000), it follows that

$$\begin{aligned} \sup _x\left| P_{*}\{ \sqrt{n}(\hat{\theta }^*-\hat{\theta })\le x\}-P(Z\le x)\right| \mathop {\longrightarrow }\limits ^{P} 0, \end{aligned}$$

where \(Z\sim N_r(0,\Sigma _{0\theta })\). If in addition \(\hat{\theta }\) satisfies (A.1) and \(H_{0G}\) is true, then \(\sqrt{n}(\hat{\theta }-{\theta })\) and \(\sqrt{n}(\hat{\theta }^*-\hat{\theta })\) both converge in law to the same limit.

Let \(\mu _{0c}(t)=\frac{\partial }{\partial t}\mathrm{Re}\varphi _0(t)\) and \(\mu _{0s}(t)=\frac{\partial }{\partial t}\mathrm{Im}\varphi _0(t)\). The next result shows the consistency of the bootstrap approximation in Algorithm 6 as an estimator of the null distribution of the test statistic \(R_{n,\nu }\).

Theorem 5

Assume that \(\theta \in \Theta _0\), that \(\sqrt{n}(\hat{\theta }-\theta )=O_P(1)\) and that \(\hat{\theta }^*\) satisfies (A.2). Let \(\nu =\nu (n)\) be an integer satisfying (4). Let \(Y_0(t)\) be a zero mean complex valued Gaussian process with covariance structure

$$\begin{aligned} \mathrm{Cov}\{\mathrm{Re}Y_0(t), \mathrm{Re}Y_0(s)\}&= \mathrm{Cov}_0\{C_0(t), C_0(s)\}\\ \mathrm{Cov}\{\mathrm{Re}Y_0(t), \mathrm{Im}Y_0(s)\}&= \mathrm{Cov}_0\{C_0(t), S_0(s)\}\\ \mathrm{Cov}\{\mathrm{Im}Y_0(t), \mathrm{Im}Y_0(s)\}&= \mathrm{Cov}_0\{S_0(t), S_0(s)\} \end{aligned}$$

\(\forall t,s \in \mathbb {R}\), where \(C_0(t)=\cos (t\varepsilon _0)-\mathrm{Re}\varphi _0(t)-\frac{1}{2}t\mu _{0c}(t)E_0\{A_0(\theta )\}'L_0(\theta )\), \(S_0(t)=\sin (t\varepsilon _0)-\mathrm{Im}\varphi _0(t)-\frac{1}{2}t\mu _{0s}(t)E_0\{A_0(\theta )\}'L_0(\theta )\). Let \(w\) be a non-negative function satisfying (5) and let \(W_0=\Vert Y_0\Vert _w^2\). Then

$$\begin{aligned} \sup _{x}\left| P_{*}(R^*_{n,\nu } \le x)-P(W_0 \le x) \right| \mathop {\rightarrow }\limits ^{P}0. \end{aligned}$$

The result in Theorem 5 holds whether or not \(H_{0G}\) is true. If \(H_{0G}\) is true and \(\hat{\theta }\) satisfies (A.1), then the conditional distribution of \(R^*_{n,\nu }\), given \(X_1, \ldots , X_n\), and the distribution of \(R_{n,\nu }\) are close in the sense that both converge to the same limit.

Remark 4

The following alternative expression of \(R_{n,\nu }\), which can be easily derived using elementary formulas for the sine and the cosine of a sum, is useful from a computational point of view,

$$\begin{aligned} R_{n,\nu }=\frac{1}{n-\nu }\sum _{j=\nu +1}^n\sum _{k=\nu +1}^nh(\tilde{\varepsilon }_j, \tilde{\varepsilon }_k), \end{aligned}$$

where \(h(x,y)=I_w(x-y)-I_{w0}(x)-I_{w0}(y)+I_{w00}\), with \(I_{w0}(x)=\int I_w(x-y)dF_0(y)\) and \(I_{w00}= \int \!\int I_w(x-y)dF_0(x)dF_0(y)\), \(F_0\) being the cumulative distribution function corresponding to \(\varphi _0\).
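In the setting considered by Klar et al. (2012), where \(\varphi _0\) is the CF of the standard normal law and \(w\) is the standard normal density, the quantities entering \(h\) have closed forms obtained from standard Gaussian integrals, namely \(I_w(t)=e^{-t^2/2}\), \(I_{w0}(x)=e^{-x^2/4}/\sqrt{2}\) and \(I_{w00}=1/\sqrt{3}\), so that \(R_{n,\nu }\) can be computed as in the following sketch.

```python
import numpy as np

def R_gof_normal(eps_tilde_used):
    """R_{n,nu} of Remark 4 for H_0G: eps_0 ~ N(0,1), with the standard normal
    density as weight w, using the closed forms
        I_w(t)  = exp(-t^2 / 2),
        I_w0(x) = exp(-x^2 / 4) / sqrt(2),
        I_w00   = 1 / sqrt(3)."""
    e = np.asarray(eps_tilde_used)    # residuals eps_tilde_{nu+1}, ..., eps_tilde_n
    m = len(e)                        # m = n - nu
    iw = np.exp(-np.subtract.outer(e, e) ** 2 / 2.0)
    iw0 = np.exp(-e ** 2 / 4.0) / np.sqrt(2.0)
    # (1/m) * sum_{j,k} h(e_j, e_k) with h as in Remark 4
    return (iw.sum() - 2.0 * m * iw0.sum() + m ** 2 / np.sqrt(3.0)) / m
```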