1 Introduction

Empirical processes play a prominent role in Statistics, since statistical procedures often involve functionals of them. In certain settings, such as linear models or time series, some variables of interest, such as the errors or innovations, cannot be directly observed and inference is based on the residuals, whose calculation involves the estimation of certain parameters in the model. GARCH models, introduced by Bollerslev (1986), belong to this class. The present paper is concerned with the study of the empirical characteristic function (ECF) of the residuals of these models. This study is motivated by the fact that the last decades have witnessed an increasing number of statistical procedures based on functions of the ECF process in a wide range of models and settings: for example, in point estimation (Feuerverger and McDunnough 1981a, b), the \(k\)-sample problem (Hušková and Meintanis 2008; Alba Fernández et al. 2008) and goodness-of-fit (GOF) tests (Epps and Pulley 1983; Baringhaus and Henze 1988; Gürtler and Henze 2000; Meintanis 2004; Matsui and Takemura 2005, 2008; Jiménez-Gamero et al. 2009, for independent and identically distributed (IID) observations, and Hušková and Meintanis 2007, 2010 and Jiménez-Gamero et al. 2005, for the errors in regression models, among many others). Much of the appeal of these procedures is that their application usually requires weaker conditions than their analogues based on the empirical distribution function (EDF). Another advantage of the statistical procedures based on the ECF over those based on the EDF is that while the data dimension plays an important role in the latter (for instance, the Cramér–von Mises test cannot be readily calculated for \(d\)-dimensional data with \(d\ge 2\)), it plays no role for many ECF-based tests, since the Cramér–Wold device (see, for example, Serfling 1980, pp 17–18) is automatically applied. A key step towards the development of statistical procedures based on the ECF for making inferences on GARCH models is to study the ECF process of the residuals.

Some other processes associated with the residuals of GARCH models have been previously studied. For example, Berkes and Horváth (2001) have studied the empirical process of the observations; Berkes and Horváth (2003) have studied the empirical process of the squared residuals; the results in this last paper inspired those in Horváth et al. (2004), where some GOF tests based on the EDF of the squared residuals were numerically studied; Kulperger and Yu (2005) have studied partial sums of \(k\)th powers of residuals, with applications to change-point problems and GOF; Koul and Ling (2006) have studied the empirical process of the residuals with applications to testing GOF for the distribution of the innovations; Horváth et al. (2008) have studied partial sums of the squared observations and of their EDF.

This paper is devoted to the study of the limit behavior of the ECF process of the residuals. Specifically, we study the convergence in the class of continuous functions defined on a compact set, as well as the convergence in the Hilbert space \(L_2(w)=\{f:\mathbb {R} \rightarrow \mathbb {C}: \Vert f\Vert ^2_w=\int |f(t)|^2w(t)\mathrm{d}t<\infty \}\), for some nonnegative function \(w\) satisfying \(0<\int w(t)\mathrm{d}t<\infty \). We also study the convergence in law to a Gaussian process. The covariance structure of the limit process depends on the distribution of the innovations, the estimators employed to approximate the parameters of the GARCH model and the equation defining the model. Applications of the obtained results are reported. Specifically, we consider the problem of testing symmetry, which is equivalent to testing that the imaginary part of the population characteristic function (CF) of the innovations is equal to 0. Surprisingly, the limiting null distribution of the considered test statistic coincides with that derived for IID data, which depends only on the population CF. Another application to the problem of testing GOF for the distribution of the innovations is also given. In both applications, the null distribution of the test statistic is approximated by a bootstrap algorithm. The consistency of these bootstrap estimators is proven.

The paper is organized as follows. Section 2 describes the model and summarizes some properties that will be used throughout the paper. The main results concerning the asymptotic behavior of the ECF process of the residuals are given in Sect. 3. Section 4 provides two applications of the obtained results to testing symmetry and GOF for the distribution of the innovations. All proofs, as well as some intermediate results, are sketched in the Appendix.

Before ending this section, we introduce some notation: all vectors are column vectors; for any vector \(v\), \(v_k\) denotes its \(k\)th coordinate, \(\Vert v\Vert \) its Euclidean norm and \(v'\) its transpose; for any complex number \(x=a+\mathrm{i}b\), \(\bar{x}=a-\mathrm{i}b\) and \(|x|=\sqrt{a^2+b^2}=\sqrt{x \bar{x}}\); for any complex-valued function \(f\), \(\mathrm{Re}f(x)\) and \(\mathrm{Im}f(x)\) denote its real and imaginary parts, respectively, that is to say, \(f(x)=\mathrm{Re}f(x)+\mathrm{i}\,\mathrm{Im}f(x)\); \(P_0\), \(E_0\) and \(\mathrm{Cov}_0\) denote probability, expectation and covariance, respectively, by assuming that the null hypothesis is true; \(P_{*}\), \(E_{*}\) and \(\mathrm{Cov}_{*}\) denote the conditional probability law, expectation and covariance, given \(X_1, X_2, \ldots , X_n\), respectively; all limits in this paper are taken as \(n \rightarrow \infty \); \(\mathop {\rightarrow }\limits ^{\mathcal {L}}\) denotes convergence in distribution; \(\mathop {\rightarrow }\limits ^{P}\) denotes convergence in probability; \(\mathop {\rightarrow }\limits ^{a.s.}\) denotes almost sure convergence; an unspecified integral denotes integration over the whole real line \(\mathbb {R}\); \(\langle \cdot , \cdot \rangle \) denotes the scalar product in the Hilbert space \(L_2(w)\); without loss of generality, it will be assumed throughout the paper that \(\int w(t)\mathrm{d}t=1\).

2 The model

Let \(p,q \in \mathbb {N}\cup \{0\}\). A stochastic process \(\{X_j, \; -\infty <j<\infty \}\) is said to follow a GARCH(\(p\), \(q\)) model if it satisfies the equations

$$\begin{aligned} X_j=\sigma _j\varepsilon _j, \end{aligned}$$
(1)

with

$$\begin{aligned} \sigma _j^2=c+\sum _{k=1}^pa_kX^2_{j-k}+\sum _{l=1}^qb_l\sigma ^2_{j-l}, \end{aligned}$$
(2)

for \(-\infty <j<\infty \), where \(c>0\), \(a_k \ge 0\) and \(b_l \ge 0\). If \(q=0\) then we get an autoregressive conditional heteroscedastic (ARCH) model, introduced by Engle (1982). Throughout this paper, it will be assumed that \(\{X_j, \; -\infty <j<\infty \}\) satisfies (1) and (2), that it is stationary, that \(\{\varepsilon _j,\; -\infty <j<\infty \}\) are IID variables with \(E(\varepsilon _j)=0\) and \(E(\varepsilon _j^2)=1\), and that \(\varepsilon _j\) is independent of \(\{X_{j-k}, \; k \ge 1\}\).
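To fix ideas, the following sketch simulates data from a GARCH(1,1) model according to (1) and (2) with standard normal innovations; the parameter values are those used in the simulation experiment of Sect. 4.1, whereas the burn-in length and the initialization of the recursion at the stationary variance are practical choices made here only for illustration.

```python
import numpy as np

def simulate_garch11(n, c, a1, b1, burn_in=500, rng=None):
    """Simulate n observations from a GARCH(1,1) model, Eqs. (1)-(2),
    with IID standard normal innovations (so E(eps)=0, E(eps^2)=1).
    Requires a1 + b1 < 1 for a stationary solution with finite variance."""
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(n + burn_in)
    x = np.empty(n + burn_in)
    sigma2 = c / (1.0 - a1 - b1)          # start at the stationary variance E(X_j^2)
    for j in range(n + burn_in):
        x[j] = np.sqrt(sigma2) * eps[j]               # X_j = sigma_j * eps_j, Eq. (1)
        sigma2 = c + a1 * x[j] ** 2 + b1 * sigma2     # sigma_{j+1}^2, Eq. (2)
    return x[burn_in:]                    # discard the burn-in period

X = simulate_garch11(400, c=0.1, a1=0.3, b1=0.3, rng=0)
```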

Bougerol and Picard (1992a, b) have given necessary and sufficient conditions for the existence of a unique strictly stationary solution of (1) and (2). A necessary and sufficient condition for the process \(\{X_j, \; -\infty <j<\infty \}\) to be (strictly) stationary with \(E(X_j^2)<\infty \) is (see, for example, Theorem 4.4 in Fan and Yao 2003)

$$\begin{aligned} \sum _{k=1}^pa_k+\sum _{l=1}^qb_l<1. \end{aligned}$$

In this case, \(E(X_j)=0\) and

$$\begin{aligned} E(X_j^2)=c\left( 1-\sum _{k=1}^pa_k-\sum _{l=1}^qb_l\right) ^{-1}. \end{aligned}$$

Let \(\mathcal {F}_{j}\) be the \(\sigma \)-algebra generated by \(\{\varepsilon _k, \, -\infty <k\le j\}\). Since \(E(X_j^2 \, | \, \mathcal {F}_{j-1})=\sigma _j^2\), the expectations of \(X_j^2\) and \(\sigma _j^2\) coincide. If \(E(\log \sigma _0^2)<\infty \), then Theorem 2.1 in Berkes et al. (2003) shows that \(\sigma _j^2\) can be expressed as (see also Hall and Yao 2003)

$$\begin{aligned} \sigma _j^2=\sigma _j^2(\theta )&= \frac{c}{1-\sum _l b_l}+\sum _{k=1}^pa_kX^2_{j-k}\\&\quad +\,\sum _{k=1}^pa_k \sum _{v=1}^{\infty }\sum _{l_1=1}^q \cdots \sum _{l_v=1}^q b_{l_1}\cdots b_{l_v}X^2_{j-k-l_1-\cdots -l_v}, \end{aligned}$$

where \(\theta =(c,a_1,\ldots , a_p, b_1, \ldots , b_q)'\) and the multiple sum vanishes if \(q=0\). From Lemma 2.3 in Berkes et al. (2003), a sufficient condition for \(E(\log \sigma _0^2)<\infty \) to hold is that \(E(|\varepsilon _0^2|^{\delta })<\infty \), for some \(\delta >0\). Since we assume that \(E(\varepsilon _0^2)=1\), the above expansion for \(\sigma _j^2\) holds. Let \(r=1+p+q\) denote the dimension of \(\theta \), which is assumed to be fixed but unknown.

As in Berkes and Horváth (2003), it will also be assumed that \(\theta \in \Theta _0=\Theta (\rho _0,\rho _1,\rho _2)=\{u=(\gamma , \alpha _1, \ldots , \alpha _p,\beta _1, \ldots , \beta _q ): \, \beta _1+ \cdots + \beta _q \le \rho _0,\ \rho _1 \le \min \{\gamma , \alpha _1, \ldots , \alpha _p,\beta _1, \ldots , \beta _q \}\le \max \{\gamma , \alpha _1, \ldots , \alpha _p,\beta _1, \ldots , \beta _q \}\le \rho _2\}\), for some constants \(\rho _0,\rho _1,\rho _2\) satisfying \(0<\rho _0<1\), \(0<\rho _1<\rho _2\), \(q\rho _1 \le \rho _0\). Note that this assumption requires \(p\) and \(q\) to be known, and rules out zero coefficients in \(\theta \).

To estimate \(\theta \), it is often assumed that the errors \(\varepsilon _j\) are normally distributed and the resulting likelihood is maximized. This estimator, \(\hat{\theta }\), is called the Gaussian maximum likelihood estimator (GMLE). If

$$\begin{aligned} E( \varepsilon _0^4)<\infty , \end{aligned}$$
(3)

then \(\sqrt{n}(\hat{\theta }-\theta )\) is asymptotically normally distributed, even if the errors are not normally distributed (see Hall and Yao 2003; Francq and Zakoïan 2004). Moreover, even if (3) does not hold then, under certain conditions, \(n^{\kappa }(\hat{\theta }-\theta )\) is bounded in probability, for some \(\kappa >0\) (see Hall and Yao 2003). Although the GMLE has become the most popular estimator, other estimators have been proposed. Examples are the estimators in Peng and Yao (2003), which are asymptotically normally distributed without requiring (3), and those in Berkes and Horváth (2004), where a class of estimators including the GMLE is studied. From now on, \(\hat{\theta }\) will denote any estimator of \(\theta \).
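As an illustration of how the GMLE is usually computed in practice, the following sketch minimizes the Gaussian quasi-log-likelihood (up to additive constants) for a GARCH(1,1) model. The initialization of the volatility recursion at the sample variance, the box constraints standing in for \(\Theta _0\) and the choice of optimizer are ad hoc choices made for this sketch only; in the simulations of Sect. 4 the estimation is carried out with the R package tseries.

```python
import numpy as np
from scipy.optimize import minimize

def neg_gaussian_quasi_loglik(theta, X):
    """Negative Gaussian quasi-log-likelihood (up to additive constants)
    of a GARCH(1,1) model; theta = (c, a1, b1)."""
    c, a1, b1 = theta
    n = len(X)
    sigma2 = np.empty(n)
    sigma2[0] = np.var(X)             # crude initialization of the volatility recursion
    for j in range(1, n):
        sigma2[j] = c + a1 * X[j - 1] ** 2 + b1 * sigma2[j - 1]
    return 0.5 * np.sum(np.log(sigma2) + X ** 2 / sigma2)

def gmle_garch11(X, theta0=(0.05, 0.1, 0.8)):
    """Gaussian (quasi) MLE of (c, a1, b1): minimize the objective above
    over a box roughly mimicking the parameter space Theta_0."""
    bounds = [(1e-6, 10.0), (1e-6, 0.999), (1e-6, 0.999)]
    res = minimize(neg_gaussian_quasi_loglik, x0=np.asarray(theta0),
                   args=(X,), method="L-BFGS-B", bounds=bounds)
    return res.x                      # (c_hat, a1_hat, b1_hat)
```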

3 Main results

In a GARCH model, the errors are not observable. Thus to make inferences on the errors, we must approximate them by means of the residuals. With this aim, we must first estimate \(\sigma _j^2(\theta )\). Note that \(\sigma _j^2(\theta )\) depends on \(\{X_k,\, -\infty <k\le j-1\}\), whereas we observe \(X_1, \ldots , X_n\). So, in order to calculate the residuals, instead of \(\sigma _j^2(\hat{\theta })\), we consider \(\tilde{\sigma }_j^2(\hat{\theta })\), where

$$\begin{aligned} \tilde{\sigma }_j^2(\theta )&= \frac{c}{1-\sum _l b_l}+\sum _{k=1}^{\min \{p, j-1 \}} a_kX^2_{j-k}\\&\quad +\,\sum _{k=1}^pa_k \sum _{v=1}^{\infty }\sum _{l_1=1}^q \cdots \sum _{l_v=1}^q b_{l_1}\cdots b_{l_v}X^2_{j-k-l_1-\cdots -l_v}\, I(j-k-l_1-\cdots -l_v\ge 1) \end{aligned}$$

where \(I(S)\) denotes the indicator function of the set \(S\); note that \(\tilde{\sigma }_j^2(\theta )\) only depends on the observations \(X_1, \ldots , X_{j-1}\). Let \(\{\tilde{\varepsilon }_j=X_j/\tilde{\sigma }_j(\hat{\theta }), \, 1\le j\le n\}\) be the residuals and let \(\varphi _{n,\nu } (t)\) denote the ECF of the residuals \(\tilde{\varepsilon }_{\nu +1}, \ldots , \tilde{\varepsilon }_n\),

$$\begin{aligned} \varphi _{n, \nu }(t)=\frac{1}{n-\nu }\sum _{j=\nu +1}^n e^{it\tilde{\varepsilon }_j} , \end{aligned}$$

for some integer \(\nu \ge 1\). The reason for considering only the residuals \(\tilde{\varepsilon }_{\nu +1}, \ldots , \tilde{\varepsilon }_n\), instead of all of them, \(\tilde{\varepsilon }_{1}, \ldots , \tilde{\varepsilon }_n\), is that for small \(j\), \(\tilde{\sigma }_{j}^2(\theta )\) is not a good approximation to \(\sigma _{j}^2(\theta )\), and thus the first terms in the series should be avoided.
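In the GARCH(1,1) case it is easy to check that the truncated variance \(\tilde{\sigma }_j^2(\theta )\) defined above reduces to the recursion \(\tilde{\sigma }_j^2(\theta )=c+a_1X_{j-1}^2+b_1\tilde{\sigma }_{j-1}^2(\theta )\) started at \(\tilde{\sigma }_1^2(\theta )=c/(1-b_1)\), so the residuals and the ECF \(\varphi _{n,\nu }\) can be computed as in the following sketch; the grid of \(t\) values is an arbitrary illustrative choice, and the commented call assumes the data and the estimator from the previous sketches.

```python
import numpy as np

def residuals_and_ecf(X, theta_hat, nu, t_grid):
    """Residuals eps_tilde_j = X_j / sigma_tilde_j(theta_hat) of a GARCH(1,1)
    model and the ECF phi_{n,nu}(t) of eps_tilde_{nu+1}, ..., eps_tilde_n."""
    c, a1, b1 = theta_hat
    n = len(X)
    sigma2 = np.empty(n)
    sigma2[0] = c / (1.0 - b1)        # sigma_tilde_1^2
    for j in range(1, n):
        sigma2[j] = c + a1 * X[j - 1] ** 2 + b1 * sigma2[j - 1]
    eps_tilde = X / np.sqrt(sigma2)
    used = eps_tilde[nu:]             # eps_tilde_{nu+1}, ..., eps_tilde_n
    # phi_{n,nu}(t) = (n - nu)^{-1} sum_{j=nu+1}^{n} exp(i t eps_tilde_j)
    ecf = np.exp(1j * np.outer(t_grid, used)).mean(axis=1)
    return eps_tilde, ecf

t_grid = np.linspace(-5.0, 5.0, 201)
# eps_tilde, phi_hat = residuals_and_ecf(X, gmle_garch11(X), nu=10, t_grid=t_grid)
```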

For IID data, it is well known that the ECF of the data consistently estimates the population CF and that the ECF process converges to a complex Gaussian process on finite intervals (see, for example, Feuerverger and Mureika 1977; Csörgő 1981a, b; Marcus 1981). The next theorems state similar results for the ECF of the residuals. Let \(\varphi (t)\) denote the CF of \(\varepsilon _0\).

Theorem 1

Assume that \(\theta \in \Theta _0\) and \(n^{\kappa }(\hat{\theta }-\theta )=O_P(1)\), for some \(\kappa >0\). Let \(\nu =\nu (n)\) be an integer satisfying

$$\begin{aligned} \nu /n \rightarrow 0. \end{aligned}$$
(4)

Then,

  1. (a)

    \(\sup _{t\in S}\left| \varphi _{n, \nu }(t)- \varphi (t)\right| \mathop {\longrightarrow }\limits ^{P} 0\), for every compact interval \(S\).

  2. (b)

    \(\Vert \varphi _{n,\nu }-\varphi \Vert _w \mathop {\longrightarrow }\limits ^{P} 0. \)

Next, we study the convergence in law of the ECF process \(Y_{n, \nu }(t)=\sqrt{n-\nu }\{\varphi _{n, \nu }(t) -\varphi (t)\} \) and of its \(L_2(w)\)-norm, \(\Vert Y_{n, \nu }\Vert _w\). With this aim, we will assume that \(\sqrt{n}(\hat{\theta }-\theta )\) is asymptotically normal. Specifically, we will assume that \(\hat{\theta }\) satisfies the following.

  1. (A.1)

    \(\hat{\theta }\) can be expressed as

    $$\begin{aligned} \hat{\theta }={\theta }+{n}^{-1}\sum _{j=1}^nL_j(\theta )+ o_P(n^{-1/2}), \end{aligned}$$

    where \(L_j(\theta )=(g_1(\varepsilon _j)l_1(\varepsilon _{j-1},\, \varepsilon _{j-2}, \ldots ), \, \ldots , g_{r}(\varepsilon _j)l_{r}(\varepsilon _{j-1}, \varepsilon _{j-2}, \ldots ))',\) \(1\le j\le n\),

    $$\begin{aligned} E\{g_u(\varepsilon _0)\}=0,\quad E\{g_u(\varepsilon _0)^2\}<\infty , \quad E\{l_u(\varepsilon _{-1}, \varepsilon _{-2}, \ldots )^2\}<\infty , \quad 1 \le u \le r. \end{aligned}$$

The GMLE as well as other often used estimators of \(\theta \) satisfy (A.1) (see Sect. 3 of Berkes and Horváth 2003). If \(\hat{\theta }\) satisfies (A.1) then, by the Martingale Central Limit Theorem (see, for example, Kundu et al. 2000), \(\sqrt{n}(\hat{\theta }-\theta )\mathop {\longrightarrow }\limits ^{\mathcal {L}} N_r(0,\Sigma _{\theta })\), an \(r\)-variate zero mean normal law with variance matrix \(\Sigma _{\theta }=var\{L_0(\theta )\}=(\varsigma _{uv})\), where

$$\begin{aligned} \varsigma _{uv}=E\{g_u(\varepsilon _0) g_v(\varepsilon _0)\}E\{l_u(\varepsilon _{-1}, \varepsilon _{-2}, \ldots )l_v(\varepsilon _{-1}, \varepsilon _{-2}, \ldots )\}, \quad 1 \le u,v \le r. \end{aligned}$$

Let \(\mu _c(t)=\frac{\partial }{\partial t}\mathrm{Re}\varphi (t)=E\{-\varepsilon _0 \sin (t\varepsilon _0)\}\) and \(\mu _s(t)=\frac{\partial }{\partial t}\mathrm{Im}\varphi (t)=E\{\varepsilon _0 \cos (t\varepsilon _0)\}\). Observe that these derivatives exist because we assume that the innovations have finite first moment. Finally, let \(\sigma _j^2(\theta )A_j(\theta )\) be the \(r\)-vector of derivatives of \(\sigma _j^2(\theta )\) with respect to \(\theta \), that is, \(A_j(\theta )=\frac{1}{\sigma _j^2(\theta )}\frac{\partial }{\partial \theta }\sigma _j^2(\theta )\).

Theorem 2

Assume that \(\theta \in \Theta _0\) and that \(\hat{\theta }\) satisfies (A.1). Let \(\nu =\nu (n)\) be an integer satisfying (4). Let \(Y_{n, \nu }(t)=\sqrt{n-\nu }\{\varphi _{n, \nu }(t) -\varphi (t)\}\) and let \(Y(t)\) be a zero mean complex valued Gaussian process with covariance structure

$$\begin{aligned} \mathrm{Cov}\{\mathrm{Re}Y(t), \mathrm{Re}Y(s)\}&= \mathrm{Cov}\{C(t), C(s)\},\\ \mathrm{Cov}\{\mathrm{Re}Y(t), \mathrm{Im}Y(s)\}&= \mathrm{Cov}\{C(t), S(s)\},\\ \mathrm{Cov}\{\mathrm{Im}Y(t), \mathrm{Im}Y(s)\}&= \mathrm{Cov}\{S(t), S(s)\}, \end{aligned}$$

\(\forall t,s \in \mathbb {R}\), where \(C(t)=\cos (t\varepsilon _0)-\mathrm{Re}\varphi (t)-\frac{1}{2}t\mu _c(t)E\{A_0(\theta )\}'L_0(\theta )\), \(S(t)=\sin (t\varepsilon _0)-\mathrm{Im}\varphi (t)-\frac{1}{2}t\mu _s(t)E\{A_0(\theta )\}'L_0(\theta )\). Then,

  1. (a)

    \(Y_{n, \nu }(t)\) converges weakly to \(Y(t)\) on every compact interval.

  2. (b)

    If

    $$\begin{aligned} \int t^4w(t)\mathrm{d}t<\infty , \end{aligned}$$
    (5)

    then \(\Vert Y_{n, \nu }\Vert _w^2\mathop {\longrightarrow }\limits ^{\mathcal {L}} \Vert Y\Vert ^2_w\).

Remark 1

Let \(m_{c}(t)=\mathrm{Cov}\{\cos (t \varepsilon _0), L_0(\theta )\}\) and \(m_{s}(t)=\mathrm{Cov}\{\sin (t \varepsilon _0), L_0(\theta )\}\). Note that,

$$\begin{aligned} \mathrm{Cov}\{C(t), C(s)\}&= \frac{1}{2}\{\mathrm{Re}\varphi (t+s)+\mathrm{Re}\varphi (t-s)\}-\mathrm{Re}\varphi (t)\mathrm{Re}\varphi (s)\\&\quad -\frac{1}{2}t\mu _c(t)E\{A_0(\theta )\}'m_c(s) -\frac{1}{2}s\mu _c(s)E\{A_0(\theta )\}'m_c(t)\\&\quad +\frac{1}{4}ts\mu _c(t)\mu _c(s)E\{A_0(\theta )\}'\Sigma _{\theta } E\{A_0(\theta )\},\\ \mathrm{Cov}\{C(t), S(s)\}&= \frac{1}{2}\{\mathrm{Im}\varphi (t+s)-\mathrm{Im}\varphi (t-s)\}-\mathrm{Re}\varphi (t)\mathrm{Im}\varphi (s)\\&\quad - \frac{1}{2} t\mu _c(t)E\{A_0(\theta )\}'m_s(s)-\frac{1}{2}s\mu _s(s)E\{A_0(\theta )\}'m_c(t)\\&\quad + \frac{1}{4}ts\mu _c(t)\mu _s(s)E\{A_0(\theta )\}'\Sigma _{\theta } E\{A_0(\theta )\}, \\ \mathrm{Cov}\{S(t), S(s)\}&= \frac{1}{2}\{-\mathrm{Re}\varphi (t+s)+\mathrm{Re}\varphi (t-s)\}-\mathrm{Im}\varphi (t)\mathrm{Im}\varphi (s)\\&\quad -\frac{1}{2} t\mu _s(t)E\{A_0(\theta )\}'m_s(s) -\frac{1}{2}s\mu _s(s)E\{A_0(\theta )\}'m_s(t)\\&\quad +\frac{1}{4}ts\mu _s(t)\mu _s(s)E\{A_0(\theta )\}'\Sigma _{\theta } E\{A_0(\theta )\}, \end{aligned}$$

\(\forall t,s \in \mathbb {R}\). Therefore, in contrast to the IID case, the limit law of the ECF process depends not only on the CF of the innovations, but also on the estimator of \(\theta \) employed, through \(\Sigma _{\theta }\), \(m_c(t)\) and \(m_s(t)\), and on the equation defining the GARCH model through \(E\{A_0(\theta )\}\).

4 Applications

4.1 Testing for symmetry

Many commonly used packages allow the practitioner to choose between several symmetric distributions for obtaining the (quasi) maximum likelihood estimator of the parameter \(\theta \), usually: normal (obtaining the GMLE), Laplace and Student \(t\). The latter two distributions allow for tails heavier than those of the normal law, a fact frequently observed in financial time series (see Rydberg 2000). Note that all of these distributions are symmetric, a hypothesis questioned by several authors in the light of certain practical applications (see also Rydberg 2000). So, one may wish to test whether the hypothesis of symmetry is supported by the data. This hypothesis is equivalent to the following

$$\begin{aligned} H_{0S}: \text{ the law of the errors is symmetric } \Longleftrightarrow H_{0S}: \,\mathrm{Im}\varphi (t)=0, \ \forall t. \end{aligned}$$

As a consequence of Theorem 1, under the assumptions in this theorem, if \(w\) is a weight function such that \(w(t)>0\), \(\forall t \in \mathbb {R}\), then

$$\begin{aligned} T_{n,\nu }=T_{n,\nu }(X_1,\ldots , X_n)= \Vert \mathrm{Im} \varphi _{n,\nu }\Vert _w^2 \mathop {\longrightarrow }\limits ^{P} \Vert \mathrm{Im} \varphi \Vert _w^2 \ge 0, \end{aligned}$$
(6)

with \(\Vert \mathrm{Im} \varphi \Vert _w=0\) if and only if \(H_{0S}\) is true. So a reasonable test for \(H_{0S}\) should reject the null hypothesis for “large” values of \(T_{n,\nu }\). This statistic (with \(\nu =0\)) was first proposed by Feuerverger and Mureika (1977) for testing symmetry in the IID case (see also Henze et al. 2003). Now, to determine which values of \(T_{n,\nu }\) are large, we need the null distribution of \(T_{n,\nu }\), or at least a consistent approximation to it. Clearly, the null distribution of \(T_{n,\nu }\) is unknown. A classical way to approximate the null distribution of a test statistic is through its asymptotic null distribution. As a consequence of Theorem 2, under \(H_{0S}\),

$$\begin{aligned} (n-\nu )T_{n,\nu }\mathop {\longrightarrow }\limits ^{\mathcal {L}} W_{0S}=\Vert Y_{0S}\Vert _w^2, \end{aligned}$$
(7)

where \(Y_{0S}(t)=\mathrm{Im} Y(t)\), \(Y(t)\) being as defined in Theorem 2. From Theorem 2 and Remark 1, since under \(H_{0S}\) we have that \(\mathrm{Im} \varphi (t)=0\) and \(\mu _s(t)=\frac{\partial }{\partial t}\mathrm{Im} \varphi (t)=0\), \(\forall t\), it follows that the covariance structure of \(Y_{0S}(t)\) is given by

$$\begin{aligned} K(s,t)=E\{Y_{0S}(t) Y_{0S}(s)\} = \frac{1}{2}\{-\varphi (t+s)+\varphi (t-s)\}. \end{aligned}$$
(8)

Note that the asymptotic null distribution of \((n-\nu )T_{n,\nu }\) depends neither on the estimator of \(\theta \) employed nor on the equation defining the GARCH model governing the data, but only on the population CF of the innovations. In fact, the asymptotic null distribution of \((n-\nu )T_{n,\nu }\) coincides with that obtained in Feuerverger and Mureika (1977) for the IID case. In other words, under \(H_{0S}\), \(\sqrt{n-\nu }\, \mathrm{Im}\,\varphi _{n,\nu }(t)\) asymptotically behaves just like \(\sqrt{n}\,\mathrm{Im}\,\varphi _n(t)\) in the sense that both processes have the same weak limit, where \(\varphi _n(t)=\frac{1}{n}\sum _{j=1}^n e^{\mathrm{i}t\varepsilon _j}\) denotes the ECF of the true errors. Let \(0<\alpha <1\). The limit (7) tells us that \((n-\nu )T_{n,\nu }=O_P(1)\), and thus from (6), it follows that the test function for testing \(H_{0S}\)

$$\begin{aligned} {\Psi }_{S}={\Psi }_{S}(X_1, X_2, \ldots , X_n)=\left\{ \begin{array}{l@{\quad }l} 1, &{} \text{ if } (n-\nu )T_{n,\nu }\ge t_{ \alpha },\\ 0, &{}\text{ otherwise, } \end{array}\right. \end{aligned}$$
(9)

where \(t_{\alpha }\) is the \(1-\alpha \) percentile of the null distribution of \((n-\nu )T_{n,\nu }\), or a consistent approximation to it, is consistent against fixed alternatives, that is to say, it rejects \(H_{0S}\) with probability tending to one when it is false.

As observed before, the null distribution of \((n-\nu )T_{n,\nu }\) cannot be exactly calculated. The asymptotic null distribution of \((n-\nu )T_{n,\nu }\) cannot be used to approximate its null distribution, because it depends on the unknown CF of the innovations. Thus, we have to resort to other methods to approximate the null distribution of the test statistic.

The test (9) has been numerically investigated by Klar et al. (2012). To approximate the null distribution of the test statistic, these authors have employed the following bootstrap algorithm, which is quite similar to the bootstrap schemes employed in Hall and Yao (2003), Horváth et al. (2004) and Pascual et al. (2006).

Algorithm 1

  1. (i)

    On the basis of \(X_1,\ldots ,X_n\), compute \(\hat{\theta }=\hat{\theta }(X_1,\ldots ,X_n)=(\hat{c},\hat{a}_1,\ldots ,\hat{a}_p,\hat{b}_1,\ldots ,\hat{b}_q)'\).

  2. (ii)

    Compute the residuals \(\tilde{\varepsilon }_{1}, \ldots , \tilde{\varepsilon }_{n}\).

  3. (iii)

    Define the bootstrap observations

    $$\begin{aligned} X^*_{n,j}=\tilde{\sigma }_j^*(\hat{\theta }) \varepsilon ^*_j, \end{aligned}$$

    where

    $$\begin{aligned} \tilde{\sigma }_j^{*2}(\hat{\theta })=\hat{c}+\sum _{k=1}^{\min \{p,j-1\}}\hat{a}_kX^{*2}_{n,j-k}+\sum _{l=1}^{\min \{q,j-1\}}\hat{b}_l\tilde{\sigma }^{*2}_{j-l}(\hat{\theta }) \end{aligned}$$

    and \(\varepsilon ^*_j=\upsilon _j\tilde{\varepsilon }_j\), \(j=1, \ldots , n\), where \(\upsilon _{1}, \ldots , \upsilon _n\) are IID with \(P(\upsilon _j=-1)=P(\upsilon _j=1)=0.5\) and independent of \(\tilde{\varepsilon }_{1}, \ldots , \tilde{\varepsilon }_{n}\).

  4. (iv)

    Based on the bootstrap data \(\mathbf{X}_n^*=(X^*_{n,1},\ldots ,X^*_{n,n})\), calculate the test statistic, obtaining \(T^*_{n,\nu }=T_{n,\nu }(X^*_{n,1},\ldots ,X^*_{n,n})\). Approximate the null distribution of \((n-\nu )T_{n,\nu }\) through the conditional distribution of \((n-\nu )T_{n,\nu }^{*}\), given the data.
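For a GARCH(1,1) model, step (iii) of Algorithm 1 can be sketched as follows; the re-estimation of \(\hat{\theta }^*\) and the recomputation of the residuals and of the test statistic required in step (iv) are not shown.

```python
import numpy as np

def bootstrap_sample_alg1(eps_tilde, theta_hat, rng=None):
    """Step (iii) of Algorithm 1 for a GARCH(1,1): bootstrap observations
    X*_{n,j} built from sign-flipped residuals eps*_j = upsilon_j * eps_tilde_j."""
    rng = np.random.default_rng(rng)
    c, a1, b1 = theta_hat
    n = len(eps_tilde)
    upsilon = rng.choice([-1.0, 1.0], size=n)     # P(upsilon = -1) = P(upsilon = 1) = 1/2
    eps_star = upsilon * np.asarray(eps_tilde)
    x_star = np.empty(n)
    sigma2_star = np.empty(n)
    for j in range(n):
        sigma2_star[j] = c                        # min{p, j-1} = min{q, j-1} = 0 when j = 1
        if j >= 1:
            sigma2_star[j] += a1 * x_star[j - 1] ** 2 + b1 * sigma2_star[j - 1]
        x_star[j] = np.sqrt(sigma2_star[j]) * eps_star[j]
    return x_star
```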

The above algorithm can be slightly modified by generating the bootstrap innovations from a symmetrization of the EDF of the residuals. We call the resulting bootstrap procedure Algorithm 2.

Algorithm 2

Steps (i), (ii) and (iv) are as in Algorithm 1.

  1. (iii)

    The bootstrap observations are defined as in Algorithm 1, but now \(\varepsilon ^*_{\nu +1},\ldots , \varepsilon ^*_{n}\) are IID from the EDF of \(\tilde{\varepsilon }_{\nu +1}, \ldots , \tilde{\varepsilon }_{n},-\tilde{\varepsilon }_{\nu +1}, \ldots , -\tilde{\varepsilon }_{n}.\)

In practice, the bootstrap estimation of the null distribution of \((n-\nu )T_{n,\nu }\) in step (iv) in Algorithms 1 and 2 must be carried out by simulation, that is, by generating a high number of bootstrap samples, say \(\mathbf{X}_n^{*1},\ldots ,\mathbf{X}_n^{*B}\), and then approximating the null distribution of \((n-\nu )T_{n,\nu }\) through the EDF of \((n-\nu )T_{n,\nu }^{*1}=(n-\nu )T_{n,\nu }(\mathbf{X}_n^{*1}), \ldots , (n-\nu )T_{n,\nu }^{*B}=(n-\nu )T_{n,\nu }(\mathbf{X}_n^{*B})\). This requires the calculation of \(\hat{\theta }^{*1}=\hat{\theta }(\mathbf{X}_n^{*1}), \ldots , \hat{\theta }^{*B}=\hat{\theta }(\mathbf{X}_n^{*B})\) as well as the bootstrap residuals, \(\tilde{\varepsilon }_j^{*b}\), \(\nu +1 \le j\le n\), \(1\le b \le B\).

We can save computing time considerably by taking advantage of the property that, under \(H_{0S}\), \(\sqrt{n-\nu }\, \mathrm{Im}\,\varphi _{n,\nu }(t)\) asymptotically behaves the same as \(\sqrt{n}\,\mathrm{Im}\,\varphi _n(t)\). With this aim, we treat the residuals \(\tilde{\varepsilon }_{\nu +1}, \ldots , \tilde{\varepsilon }_n\) as if they were the true errors \(\varepsilon _{\nu +1},\ldots , \varepsilon _n\) and then act as in the IID setting (following, for example, the approach in Henze et al. 2003). This way we avoid the calculation of \((n-\nu )T_{n,\nu }^{*1}=(n-\nu )T_{n,\nu }(\mathbf{X}_n^{*1}), \ldots , (n-\nu )T_{n,\nu }^{*B}=(n-\nu )T_{n,\nu }(\mathbf{X}_n^{*B})\). Algorithms 3 and 4 give two bootstrap null distribution estimators that make use of this fact; a sketch of both is given after Algorithm 4.

Algorithm 3

Steps (i) and (ii) are as in Algorithm 1.

  1. (iii)

    Let \(\varepsilon ^*_j=\upsilon _j\tilde{\varepsilon }_j\), \(j=1, \ldots , n\), where \(\upsilon _{1}, \ldots , \upsilon _n\) are IID with \(P(\upsilon _j=-1)=P(\upsilon _j=1)=0.5\), and \(\upsilon _{1}, \ldots , \upsilon _n\) are also independent of \(\tilde{\varepsilon }_{1}, \ldots , \tilde{\varepsilon }_{n}\).

  2. (iv)

    Approximate the null distribution of \((n-\nu )T_{n,\nu }\) through the conditional distribution of \((n-\nu )T_{n,\nu }^{*}\), given the data, where \(T_{n,\nu }^{*}= \Vert S_{n,\nu }^{*}\Vert _w^2\) and \(S_{n,\nu }^{*}(t)=\frac{1}{n-\nu }\sum _{j=\nu +1}^n \sin (t\varepsilon ^*_j).\)

Algorithm 4

Steps (i), (ii) and (iv) are as in Algorithm 3.

  1. (iii)

    \(\varepsilon ^*_{\nu +1},\ldots , \varepsilon ^*_{n}\) are IID from the EDF of \(\tilde{\varepsilon }_{\nu +1}, \ldots , \tilde{\varepsilon }_{n},-\tilde{\varepsilon }_{\nu +1}, \ldots , -\tilde{\varepsilon }_{n}.\)
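The following sketch implements Algorithms 3 and 4, taking \(w\) to be the standard normal density (the choice made in the simulations reported below), for which \(I_w(t)=\int \cos (tx)w(x)\mathrm{d}x=e^{-t^2/2}\); the closed-form expression used for \(\Vert \mathrm{Im}\,\varphi \Vert _w^2\) is the one given in Remark 2 below.

```python
import numpy as np

def T_sym(eps, weight_iw=lambda t: np.exp(-t ** 2 / 2.0)):
    """||Im ECF||_w^2 of the values in eps, computed via the closed form
    (1/m^2) sum_{j,k} (1/2){I_w(e_j - e_k) - I_w(e_j + e_k)}; with the
    standard normal weight, I_w(t) = exp(-t^2 / 2)."""
    eps = np.asarray(eps)
    m = len(eps)
    diff = np.subtract.outer(eps, eps)
    summ = np.add.outer(eps, eps)
    return 0.5 * (weight_iw(diff) - weight_iw(summ)).sum() / m ** 2

def bootstrap_pvalue_alg34(eps_tilde_used, B=200, symmetrized_edf=False, rng=None):
    """Bootstrap p value of (n - nu) * T_{n,nu}; eps_tilde_used holds the
    residuals eps_tilde_{nu+1}, ..., eps_tilde_n.  symmetrized_edf=False
    gives Algorithm 3 (sign flips), True gives Algorithm 4."""
    rng = np.random.default_rng(rng)
    eps_tilde_used = np.asarray(eps_tilde_used)
    m = len(eps_tilde_used)                       # m = n - nu
    t_obs = m * T_sym(eps_tilde_used)
    pool = np.concatenate([eps_tilde_used, -eps_tilde_used])
    t_star = np.empty(B)
    for b in range(B):
        if symmetrized_edf:                       # Algorithm 4, step (iii)
            eps_star = rng.choice(pool, size=m, replace=True)
        else:                                     # Algorithm 3, step (iii)
            eps_star = rng.choice([-1.0, 1.0], size=m) * eps_tilde_used
        t_star[b] = m * T_sym(eps_star)           # (n - nu) * T*_{n,nu}
    return np.mean(t_star >= t_obs)
```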

We next show the consistency of the distribution estimators yielded by Algorithms 3 and 4. Observe that no additional assumptions are needed to prove such consistency. Before stating these results, we want to remark that to derive the convergence in (7), it is not necessary to assume that \(\hat{\theta }\) satisfies (A.1), but only that \(\sqrt{n}(\hat{\theta }-\theta )=O_P(1)\) (this is evident from the proof of Theorem 2).

Theorem 3

Assume that \(\theta \in \Theta _0\), that (5) holds and that \(\sqrt{n}(\hat{\theta }-\theta )=O_P(1)\). Let \(\nu =\nu (n)\) be an integer satisfying (4). If \(T_{n,\nu }^{*}\) is as defined in Algorithm 3 or Algorithm 4, then

$$\begin{aligned} \sup _x|P_{*}\{(n-\nu )T_{n,\nu }^{*}\le x\}-P(W_{0S}\le x)|\mathop {\rightarrow }\limits ^{P}0, \end{aligned}$$

where \(W_{0S}\) is as defined in (7).

Let \(\{\lambda _j\}\) be the set of eigenvalues of the operator \(\mathcal {A}\) defined on \(L_2(w)\) by

$$\begin{aligned} \mathcal {A} v(y)=\int K (x,y) v(x)w(x)\mathrm{d}x. \end{aligned}$$

The random variate \(W_{0S}\) is distributed as an (infinite) sum of independent chi-squared variates with one degree of freedom, \(\chi ^2_1\), multiplied by the eigenvalues of \(\mathcal {A}\), \(\sum _{j} \lambda _j \chi ^2_{1j}\). The set \(\{\lambda _j\}\) is unknown because \(K(s,t)\) is unknown. Nevertheless, \(K(s,t)\) can be consistently estimated by

$$\begin{aligned} K_n(t,s)&= \frac{1}{n-\nu }\sum _{j=\nu +1}^n \sin (t\tilde{\varepsilon }_j)\sin (s\tilde{\varepsilon }_j)\\&= \frac{1}{2(n-\nu )}\sum _{j=\nu +1}^n \left[ \cos \{(t-s)\tilde{\varepsilon }_j\}-\cos \{(t+s)\tilde{\varepsilon }_j\}\right] . \end{aligned}$$

From Lemma 6 in Sect. 5, \(K_n(t,s) \mathop {\rightarrow }\limits ^{P} K(t,s)\), \(\forall s,t \in \mathbb {R}\). Thus, we can approximate the distribution of \(W_{0S}\), and thus the null distribution of \((n-\nu )T_{n,\nu }\), by means of

$$\begin{aligned} W_n=\sum _{j} \hat{\lambda }_j \chi ^2_{1j}, \end{aligned}$$
(10)

where \(\{\hat{\lambda }_j\}\) are the eigenvalues of the operator \(\mathcal {A}_n\) defined by \(\mathcal {A}_n v(y)=\int K_n (x,y) v(x)w(x)\mathrm{d}x\). Routine calculations show that \(\{\hat{\lambda }_j\}\) are the eigenvalues of the \((n-\nu )\times (n-\nu )\) matrix \(M=(m_{jk})\) with

$$\begin{aligned} m_{jk}=\frac{1}{2(n-\nu )} \left\{ I_w(\tilde{\varepsilon }_j-\tilde{\varepsilon }_k)-I_w(\tilde{\varepsilon }_j+\tilde{\varepsilon }_k)\right\} , \end{aligned}$$

where \(I_w(t)=\int \cos (tx)w(x)\mathrm{d}x\). Therefore, the set \(\{\hat{\lambda }_j\}\) can be easily calculated using most statistical and mathematical programming languages. \(W_n\) is also a bootstrap estimator of the null distribution of \((n-\nu )T_{n,\nu }\). It is usually called a “bootstrap in the limit” estimator, since it has been built by replacing all unknown quantities in the limit distribution of the test statistic by appropriate estimators. The next result shows that \(W_n\) estimates consistently the null distribution of \((n-\nu )T_{n,\nu }\).

Theorem 4

Under assumptions in Theorem 3,

$$\begin{aligned} \sup _x|P_{*}\{W_n\le x\}-P(W_{0S}\le x)|\mathop {\rightarrow }\limits ^{P}0, \end{aligned}$$

where \(W_{n}\) and \(W_{0S}\) are defined in (10) and (7), respectively.

The bootstrap approximation to the null distribution of \((n-\nu )T_{n,\nu }\) given in Theorem 4 will be called Algorithm 5.

Algorithm 5

Steps (i) and (ii) are as in Algorithm 1.

  1. (iii)

    Calculate the eigenvalues \(\{\hat{\lambda }_j\}\) of the matrix \(M\).

  2. (iv)

    Approximate the null distribution of \((n-\nu )T_{n,\nu }\) through the conditional distribution of \(W_n=\sum _{j} \hat{\lambda }_j \chi ^2_{1j}\), given the data.

Remark 2

Using the trigonometric identity \(2\sin (a) \sin (b)=\cos (a-b)-\cos (a+b)\), the following alternative expression for \((n-\nu )T_{n,\nu }\) can be easily derived,

$$\begin{aligned} (n-\nu )T_{n,\nu }= \sum _{j,k =\nu +1}^nm_{jk}, \end{aligned}$$

which is useful from a computational point of view.
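Combining the matrix \(M\), its eigenvalues and the expression in Remark 2, Algorithm 5 can be sketched as follows; as before, \(w\) is taken to be the standard normal density, so that \(I_w(t)=e^{-t^2/2}\), and the size of the Monte Carlo approximation to the conditional law of \(W_n\) is an arbitrary choice.

```python
import numpy as np

def m_matrix(eps_tilde_used, weight_iw=lambda t: np.exp(-t ** 2 / 2.0)):
    """The (n - nu) x (n - nu) matrix M = (m_jk); with the standard normal
    weight w, I_w(t) = exp(-t^2 / 2)."""
    e = np.asarray(eps_tilde_used)
    m = len(e)
    return (weight_iw(np.subtract.outer(e, e)) - weight_iw(np.add.outer(e, e))) / (2.0 * m)

def algorithm5_pvalue(eps_tilde_used, n_mc=10000, rng=None):
    """Algorithm 5: approximate the null distribution of (n - nu) * T_{n,nu}
    by W_n = sum_j lambda_hat_j chi^2_{1j}, where the lambda_hat_j are the
    eigenvalues of M, and return the corresponding p value."""
    rng = np.random.default_rng(rng)
    M = m_matrix(eps_tilde_used)
    t_obs = M.sum()                   # Remark 2: (n - nu) * T_{n,nu} = sum_{j,k} m_jk
    lam = np.linalg.eigvalsh(M)       # M is symmetric because I_w is an even function
    chi2 = rng.chisquare(1, size=(n_mc, len(lam)))
    w_n = chi2 @ lam                  # Monte Carlo draws from W_n, given the data
    return np.mean(w_n >= t_obs)
```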

Remark 3

In practice, the bootstrap distribution estimators in Algorithms 1–4 must be approximated by simulation. As for the null distribution estimator in Algorithm 5, since the distribution of a linear combination of \(\chi ^2\) variates has no simple closed form, the conditional distribution of \(W_n\) must be approximated either by simulation or by some numerical method (see, for example, Kotz et al. 1967; Castaño-Martínez and López-Blázquez 2005).

We have presented five bootstrap algorithms to estimate the null distribution of \((n-\nu )T_{n,\nu }\). To compare their finite sample performance, we carried out a small simulation experiment. We generated data from a GARCH(1,1) model with \(c = 0.1\), \(a_1 = 0.3\), \(b_1 = 0.3\) and several symmetric distributions for the innovations, namely, normal, Laplace and \(t_5\). We took sample size \(n = 400\) and \(\nu = 10\). As weight function \(w\) in the definition of the test statistic \(T_{n,\nu }\), we took the density of the standard normal distribution. As in Klar et al. (2012), in order to approximate the bootstrap \(p\) value of the observed value of the test statistic, we generated \(B = 200\) bootstrap samples for Algorithms 1–4. The conditional distribution of \(W_n\) was also approximated by simulation. This experiment was repeated 1,000 times. The parameters in the GARCH model were estimated through the GMLE. To calculate the parameter estimators as well as the residuals, we used the package tseries of the R language. Table 1 reports the fraction of bootstrap \(p\) values less than or equal to \(\alpha \), for \(\alpha =0.05,\, 0.10\), which are the estimated type I errors. Looking at this table, we see that the estimated type I errors are quite close to the nominal values in all cases. We also compared the algorithms in terms of the CPU time consumed. The last column of Table 1 displays the obtained results. Algorithm 5 emerges as the cheapest in terms of computing time.

Table 1 Estimated probabilities of type I errors and relative CPU

The power of the test \(\Psi _S\), when the null distribution of the test statistic is estimated by means of Algorithm 1, has been numerically investigated by Klar et al. (2012). To study if the method of approximating the null distribution has any impact on the power for finite sample size, we repeated the above experiment with samples from skewed versions of the symmetric distributions in Table 1. Such skewed versions were obtained by applying the skewing mechanism proposed in Fernández and Steel (1998), namely, the density of the skewed distribution, indexed by a scalar \(\gamma \in (0,\infty )\), is generated from the symmetric density \(f\) as follows

$$\begin{aligned} f_{\gamma }(t)=\frac{2}{\gamma +1/\gamma }\left\{ f(t/\gamma )I(t \ge 0)+f(\gamma t)I(t<0) \right\} . \end{aligned}$$

For \(\gamma =1\), we obtain the symmetric density \(f\); for \(\gamma >1\) (\(\gamma <1\)), \(f_{\gamma }\) is skewed to the right (left). Since \(f_{\gamma }(t)= f_{1/\gamma }(-t)\), it is sufficient to consider values \(\gamma >1\). As in Klar et al. (2012), the values of \(\gamma \) were chosen so that the skewness coefficient (which in our case coincides with the third moment, because \(E(\varepsilon _j)=0\) and \(E(\varepsilon _j^2)=1\)) takes comparable values across the different distributions. Table 2 displays the obtained results for nominal level \(\alpha =0.05\). Looking at this table, we conclude that the method of estimating the null distribution of the test statistic has little effect on the power, since all estimated powers are quite close.
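For completeness, a possible way of drawing standardized innovations from \(f_{\gamma }\) is sketched below. The two-piece representation (return \(\gamma |Z|\) with probability \(\gamma ^2/(1+\gamma ^2)\) and \(-|Z|/\gamma \) otherwise, with \(Z\) distributed according to \(f\)) follows directly from the form of \(f_{\gamma }\); the empirical rescaling to zero mean and unit variance is only a crude stand-in for an exact moment-based standardization.

```python
import numpy as np

def sample_skewed(base_sampler, gamma, size, rng=None):
    """Draw from the Fernandez-Steel skewed version f_gamma of a symmetric
    density f: gamma*|Z| with probability gamma^2/(1+gamma^2), else -|Z|/gamma,
    where Z has density f."""
    rng = np.random.default_rng(rng)
    z = np.abs(base_sampler(size, rng))
    right = rng.random(size) < gamma ** 2 / (1.0 + gamma ** 2)
    return np.where(right, gamma * z, -z / gamma)

def standardized_skew_normal(gamma, size, rng=None):
    """Skewed normal innovations, rescaled (here, empirically) to mean 0 and
    variance 1, as required of the errors in the GARCH model."""
    draws = sample_skewed(lambda m, g: g.standard_normal(m), gamma, size, rng)
    return (draws - draws.mean()) / draws.std()
```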

Table 2 Estimated powers for nominal level \(\alpha =0.05\)

Summarizing, since the levels and the powers yielded by the five algorithms are very close, and Algorithm 5 is, from a computational point of view, the cheapest, we recommend its use.

4.2 Testing goodness-of-fit for the distribution of the innovations

To estimate the parameters of a GARCH model, it is usually assumed that the errors or innovations are normally distributed. Under certain not very restrictive conditions, the resulting estimator is asymptotically normally distributed, even if the errors are not normally distributed (see Hall and Yao 2003; Berkes et al. 2003; Francq and Zakoïan 2004; Escanciano 2009). Nevertheless, as shown in Berkes and Horváth (2004) and numerically observed by Huang et al. (2008), the choice of the correct likelihood leads to more accurate estimates of the parameters. In addition, as argued in Angelidis et al. (2004) and Koul and Ling (2006), among many others, the knowledge of the error distribution plays an important role in evaluating the Value at Risk (VaR), a quantity very useful in economics and finance, whose calculation involves the distribution of the innovations. Hence, for certain purposes, a very important step in the analysis of GARCH models is to check whether the data support the distributional hypothesis made on the innovations.

Some tests have been proposed for testing GOF for the innovations distribution. Since the innovations or errors are not observable, all these tests are necessarily based on the estimated errors or residuals. The proposed tests are “residual versions” of GOF tests for IID data. For example, Horváth et al. (2004) have numerically studied some GOF tests based on the EDF of the squared residuals for testing normality; Kulperger and Yu (2005) have proposed a Jarque–Bera type normality test; Koul and Ling (2006) and Bai and Chen (2008) have proposed a Kolmogorov–Smirnov type GOF test for testing a simple null hypothesis; Horváth and Zitikis (2006), Mimoto (2008) and Koul and Mimoto (2012) have proposed GOF tests for testing a simple null hypothesis, which are based on a kernel-type density estimator calculated from the residuals.

In a recent paper, Klar et al. (2012) have numerically studied a test based on the ECF of the residuals, comparing it with some of the tests cited above, for the problem of testing normality. From the obtained numerical results, they conclude that the test based on the ECF is one of the most powerful. The test statistic based on the ECF considered in Klar et al. (2012) is just \(R_{n,\nu }=\Vert Y_{n, \nu }\Vert _w^2\), with \(\varphi (t)=\varphi _0(t)\) the CF of the normal law and \(w\) the density of the standard normal distribution. Thus, the results in Sect. 3 provide a theoretical basis for this test. Specifically, for testing

$$\begin{aligned} H_{0G}:\, \text{ the } \text{ CF } \text{ of } \,\varepsilon _0 \,\hbox {is } \,\varphi _0(t), \end{aligned}$$

for some \(\varphi _0(t)\) totally specified, from Theorems 1(b) and 2(b) it follows that the test

$$\begin{aligned} {\Psi }_{G}={\Psi }_{G}(X_1, X_2, \ldots , X_n)=\left\{ \begin{array}{l@{\quad }l} 1, &{} \text{ if } R_{n,\nu }\ge r_{ \alpha },\\ 0, &{}\text{ otherwise, } \end{array}\right. \end{aligned}$$

where \(r_{\alpha }\) is the \(1-\alpha \) percentile of the null distribution of \(R_{n,\nu }\), or a consistent approximation to it, is consistent against fixed alternatives, that is to say, it rejects \(H_{0G}\) with probability tending to one when it is false, whenever \(w(t)>0\), \(\forall t \in \mathbb {R}\). The null distribution of \(R_{n,\nu }\) cannot be exactly calculated. The asymptotic null distribution of \(R_{n,\nu }\) cannot be used to approximate its null distribution because it depends on unknowns (recall Remark 1). To approximate the null distribution of \(R_{n,\nu }\), Klar et al. (2012) have employed the following bootstrap algorithm.

Algorithm 6

  1. (i)

    On the basis of \(X_1,\ldots ,X_n\), compute \(\hat{\theta }=(\hat{c},\hat{a}_1,\ldots ,\hat{a}_p, \hat{b}_1,\ldots ,\hat{b}_q)'\).

  2. (ii)

    Define the bootstrap data

    $$\begin{aligned} X^*_{n,j}={\sigma }_j^*(\hat{\theta })\varepsilon _{j}^* \end{aligned}$$

    where \(\{\varepsilon _{j}^*, \, -\infty <j<\infty \}\) are IID with common CF \(\varphi _0(t)\) and

    $$\begin{aligned} {\sigma }_j^{*2}(\hat{\theta })= \hat{c}+\sum _{k=1}^p\hat{a}_kX^{*2}_{n,j-k}+\sum _{l=1}^q\hat{b}_l\sigma ^{*2}_{j-l}(\hat{\theta }), \quad j\in \mathbb {Z}. \end{aligned}$$
  3. (iii)

    Approximate the null distribution of \(R_{n,\nu }=R_{n,\nu }(X_1, \ldots , X_n)\) through the conditional distribution of \(R^*_{n,\nu }=R_{n,\nu }(X_1^*, \ldots , X_n^*)\), given \(X_1, \ldots , X_n\).

To prove that the above bootstrap scheme provides a consistent null distribution estimator of \(R_{n,\nu }\), we will assume that \(\hat{\theta }^*=\hat{\theta }(X_1^*, \ldots , X_n^*)\) satisfies the following assumption, which is equal to assumption (A.1), plus a Lindeberg condition to ensure that \(\sqrt{n}(\hat{\theta }^*-\hat{\theta })\) is asymptotically normal, plus a continuity condition to ensure that, when \(H_{0G}\) is true, \(\sqrt{n}(\hat{\theta }-{\theta })\) and \(\sqrt{n}(\hat{\theta }^*-\hat{\theta })\) both converge in law to the same limit.

  1. (A.2)
    1. (a)

      \(\hat{\theta }^*\) can be expressed as

      $$\begin{aligned} \hat{\theta }^*=\hat{\theta }+{n}^{-1}\sum _{j=1}^nL_{j}(\hat{\theta })+ r^*, \end{aligned}$$

      with \(r^*=o_{P_{*}}(n^{-1/2})\) in probability, that is to say, with probability tending to 1, and

      $$\begin{aligned} L_{j}(\hat{\theta })&= (g_1(\varepsilon _j^*)l_1(\varepsilon _{j-1}^*,\, \varepsilon _{j-2}^*, \ldots ), \, \ldots ,\\&\quad g_{r}(\varepsilon _j^*)l_{r}(\varepsilon _{j-1}^*, \varepsilon _{j-2}^*, \ldots ))', \quad 1\le j\le n. \end{aligned}$$
    2. (b)

      \(E_{*}\{g_u(\varepsilon _0^*)\}=0\), \(E_{*}\{g_u(\varepsilon _0^*)^2\}<\infty \), \(E_{*}\{l_u(\varepsilon _{-1}^*, \varepsilon _{-2}^*, \ldots )^2\}<\infty \), \(1 \le u \le r\), in probability.

    3. (c)

      For every \(b \in \mathbb {R}^r\), \(\frac{1}{n} \sum _{j=1}^n E_{*} \left[ \{b'L_{j}(\hat{\theta })\}^2 \, | \, \mathcal {F}_{0 j-1}\right] \mathop {\rightarrow }\limits ^{P_{*}} b'\Sigma _{0\theta }b\), with probability tending to 1, where \( \mathcal {F}_{0 j}\) is the \(\sigma \)-algebra generated by \(\{\varepsilon _{k}^*, \, -\infty <k \le j\}\) and \(\Sigma _{0\theta }=\mathrm{Cov}_0\{L_0(\theta )\}\).

    4. (d)

      \(\lim L_n(\epsilon , e_k)=0\) for every \(\epsilon >0\) and every \(1\le k \le r\), in probability, where \(\{e_1, \ldots , e_r\} \) is any basis of \(\mathbb {R}^r\) and for \(b \in \mathbb {R}^r\),

      $$\begin{aligned} L_n(\epsilon , b)=\frac{1}{n} \sum _{j=1}^n E_{*}\left[ \{b'L_{j}(\hat{\theta })\}^2 I\{|b'L_{j}(\hat{\theta })|>\epsilon \}\right] . \end{aligned}$$
    5. (e)

      For every \(b \in \mathbb {R}^r\), \(\frac{1}{n} \sum _{j=1}^n E_{*} \left\{ \cos (t\varepsilon _j^*)b'L_{j}(\hat{\theta }) \, | \, \mathcal {F}_{0 j-1}\right\} \mathop {\rightarrow }\limits ^{P_{*}} E_0 \{\cos (t\varepsilon _0) b'L_{0}({\theta })\}\) and \(\frac{1}{n}\! \sum _{j=1}^n E_{*} \!\left\{ \sin (t\varepsilon _j^*)b'L_{j}(\hat{\theta }) | \mathcal {F}_{0 j-1} \right\} \!\mathop {\rightarrow }\limits ^{P_{*}}\! E_0\{\sin (t\varepsilon _0)b' L_{0}({\theta })\}\) in probability, \(\forall t\in \mathbb {R}\).

If \(\hat{\theta }^*\) satisfies (A.2)(a)–(d) then, from Theorem 1.3 in Kundu et al. (2000), it follows that

$$\begin{aligned} \sup _x\left| P_{*}\{ \sqrt{n}(\hat{\theta }^*-\hat{\theta })\le x\}-P(Z\le x)\right| \mathop {\longrightarrow }\limits ^{P} 0, \end{aligned}$$

where \(Z\sim N_r(0,\Sigma _{0\theta })\). If in addition \(\hat{\theta }\) satisfies (A.1) and \(H_{0G}\) is true, then \(\sqrt{n}(\hat{\theta }-{\theta })\) and \(\sqrt{n}(\hat{\theta }^*-\hat{\theta })\) both converge in law to the same limit.

Let \(\mu _{0c}(t)=\frac{\partial }{\partial t}\mathrm{Re}\varphi _0(t)\) and \(\mu _{0s}(t)=\frac{\partial }{\partial t}\mathrm{Im}\varphi _0(t)\). The next result shows the consistency of the bootstrap approximation in Algorithm 6 as an estimator of the null distribution of the test statistic \(R_{n,\nu }\).

Theorem 5

Assume that \(\theta \in \Theta _0\), that \(\sqrt{n}(\hat{\theta }-\theta )=O_P(1)\) and that \(\hat{\theta }^*\) satisfies (A.2). Let \(\nu =\nu (n)\) be an integer satisfying (4). Let \(Y_0(t)\) be a zero mean complex valued Gaussian process with covariance structure

$$\begin{aligned} \mathrm{Cov}\{\mathrm{Re}Y_0(t), \mathrm{Re}Y_0(s)\}&= \mathrm{Cov}_0\{C_0(t), C_0(s)\}\\ \mathrm{Cov}\{\mathrm{Re}Y_0(t), \mathrm{Im}Y_0(s)\}&= \mathrm{Cov}_0\{C_0(t), S_0(s)\}\\ \mathrm{Cov}\{\mathrm{Im}Y_0(t), \mathrm{Im}Y_0(s)\}&= \mathrm{Cov}_0\{S_0(t), S_0(s)\} \end{aligned}$$

\(\forall t,s \in \mathbb {R}\), where \(C_0(t)=\cos (t\varepsilon _0)-\mathrm{Re}\varphi _0(t)-\frac{1}{2}t\mu _{0c}(t)E_0\{A_0(\theta )\}'L_0(\theta )\), \(S_0(t)=\sin (t\varepsilon _0)-\mathrm{Im}\varphi _0(t)-\frac{1}{2}t\mu _{0s}(t)E_0\{A_0(\theta )\}'L_0(\theta )\). Let \(w\) be a non-negative function satisfying (5) and let \(W_0=\Vert Y_0\Vert _w^2\). Then

$$\begin{aligned} \sup _{x}\left| P_{*}(R^*_{n,\nu } \le x)-P(W_0 \le x) \right| \mathop {\rightarrow }\limits ^{P}0. \end{aligned}$$

The result in Theorem 5 holds whether or not \(H_{0G}\) is true. If \(H_{0G}\) is true and \(\hat{\theta }\) satisfies (A.1), then the conditional distribution of \(R^*_{n,\nu }\), given \(X_1, \ldots , X_n\), and the distribution of \(R_{n,\nu }\) are close in the sense that both converge to the same limit.

Remark 4

The following alternative expression of \(R_{n,\nu }\), which can be easily derived using elementary formulas for the sine and the cosine of a sum, is useful from a computational point of view,

$$\begin{aligned} R_{n,\nu }=\frac{1}{n-\nu }\sum _{j=\nu +1}^n\sum _{k=\nu +1}^nh(\tilde{\varepsilon }_j, \tilde{\varepsilon }_k), \end{aligned}$$

where \(h(x,y)=I_w(x-y)-I_{w0}(x)-I_{w0}(y)+I_{w00}\), with \(I_{w0}(x)=\int I_w(x-y)dF_0(y)\) and \(I_{w00}= \int \!\int I_w(x-y)dF_0(x)dF_0(y)\), \(F_0\) being the cumulative distribution function corresponding to \(\varphi _0\).
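In the setting considered by Klar et al. (2012), where \(\varphi _0\) is the CF of the standard normal law and \(w\) is the standard normal density, the quantities entering \(h\) have closed forms obtained from standard Gaussian integrals, namely \(I_w(t)=e^{-t^2/2}\), \(I_{w0}(x)=e^{-x^2/4}/\sqrt{2}\) and \(I_{w00}=1/\sqrt{3}\), so that \(R_{n,\nu }\) can be computed as in the following sketch.

```python
import numpy as np

def R_gof_normal(eps_tilde_used):
    """R_{n,nu} of Remark 4 for H_0G: eps_0 ~ N(0,1), with the standard normal
    density as weight w, using the closed forms
        I_w(t)  = exp(-t^2 / 2),
        I_w0(x) = exp(-x^2 / 4) / sqrt(2),
        I_w00   = 1 / sqrt(3)."""
    e = np.asarray(eps_tilde_used)    # residuals eps_tilde_{nu+1}, ..., eps_tilde_n
    m = len(e)                        # m = n - nu
    iw = np.exp(-np.subtract.outer(e, e) ** 2 / 2.0)
    iw0 = np.exp(-e ** 2 / 4.0) / np.sqrt(2.0)
    # (1/m) * sum_{j,k} h(e_j, e_k) with h as in Remark 4
    return (iw.sum() - 2.0 * m * iw0.sum() + m ** 2 / np.sqrt(3.0)) / m
```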