1 Introduction

The autoregressive conditional heteroscedastic (ARCH) model introduced by Engle (1982) expresses the conditional variance (volatility) of the process as a linear function of the squared past values. This model has many extensions. For instance, Bollerslev (1986) generalized the ARCH model into the GARCH model by adding the past realizations of the volatility. The volatility of a GARCH process is a linear function of the past squared innovations and of the past conditional variances. Thus, by construction, the conditional variance only depends on the modulus of the past variables: past positive and negative innovations have the same effect on the current volatility. This property is in contradiction with many empirical studies on stock return series, which show a negative correlation between the squared current innovation and the past innovations. For instance, Black (1976) showed that past negative returns seem to have more impact on the current volatility than past positive returns. Numerous financial series present this stylized fact, known as the leverage effect. Since 1993, many extensions have been proposed to take the leverage effect into account. Among the various asymmetric GARCH processes introduced in the econometric literature, one of the most general is the asymmetric power GARCH (APGARCH for short) model of Ding et al. (1993). For some positive constant \(\delta _0\), it is defined by

$$\begin{aligned} \left\{ \begin{aligned}&\varepsilon _t = \zeta _t\eta _t\\&\zeta _t^{\delta _0} = \omega _0 + \sum \limits _{i=1}^q \alpha _{0i}^+ \left( \varepsilon _{t-i}^+\right) ^{\delta _0} + \alpha _{0i}^- \left( -\varepsilon _{t-i}^-\right) ^{\delta _0} + \sum \limits _{j=1}^p \beta _{0j} \zeta _{t-j}^{\delta _0}, \end{aligned}\right. \end{aligned}$$
(1)

where \(x^+ = \max (0,x)\) and \(x^- = \min (0,x)\). It is assumed that

A0: \((\eta _t)\) is a sequence of independent and identically distributed (iid, for short) random variables with \({\mathbb {E}}\vert \eta _t\vert ^r< \infty \) for some \(r> 0\).

In the sequel, the vector of parameters of interest (the true parameter) is denoted

$$\begin{aligned} \vartheta _0= \left( \omega _0, \alpha _{01}^+, \ldots , \alpha _{0q}^+, \alpha _{01}^-, \ldots , \alpha _{0q}^-, \beta _{01},\ldots , \beta _{0p},\delta _0\right) ' \end{aligned}$$

and satisfies the positivity constraints \(\vartheta _0\in ]0,\infty [ \times [0,\infty [^{2q+p}\times ]0,\infty [\). The representation (1) includes various GARCH time series models: the standard GARCH of Engle (1982) and Bollerslev (1986) obtained for \(\delta _0=2\) and \(\alpha _{0i}^+=\alpha _{0i}^-\) for \(i=1,\dots ,q\); the threshold ARCH (TARCH) model of Rabemananjara and Zakoïan (1993) for \(\delta _0 = 1\) and the GJR model of Glosten et al. (1993) for \(\delta _0= 2\).
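For illustration, the recursion (1) can be simulated directly. The following R sketch (the function name, initial value and burn-in period are ours and purely illustrative) generates a trajectory of an APGARCH(1, 1) process with Gaussian innovations, using the parameter values of Model (8) of Sect. 4.

```r
## Minimal sketch: simulate an APGARCH(1,1) trajectory from (1).
## Function name, initial value and burn-in are illustrative choices.
simulate_apgarch11 <- function(n, omega, alpha_p, alpha_m, beta, delta,
                               burn = 500) {
  ntot   <- n + burn
  eta    <- rnorm(ntot)               # iid N(0,1) innovations
  eps    <- numeric(ntot)
  zeta_d <- numeric(ntot)             # stores zeta_t^delta
  zeta_d[1] <- omega / (1 - beta)     # arbitrary positive starting value
  eps[1]    <- zeta_d[1]^(1 / delta) * eta[1]
  for (t in 2:ntot) {
    zeta_d[t] <- omega +
      alpha_p * max(eps[t - 1], 0)^delta +
      alpha_m * max(-eps[t - 1], 0)^delta +
      beta * zeta_d[t - 1]
    eps[t] <- zeta_d[t]^(1 / delta) * eta[t]
  }
  eps[(burn + 1):ntot]                # discard the burn-in
}

set.seed(1)
eps <- simulate_apgarch11(n = 2000, omega = 0.009, alpha_p = 0.036,
                          alpha_m = 0.074, beta = 0.879, delta = 1.5)
```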

After the identification and estimation of a GARCH process, the next important step in GARCH modelling consists in checking whether the estimated model fits the data satisfactorily. This adequacy checking step allows one to validate or invalidate the choice of the orders p and q. It is thus important to check the validity of a GARCH(p, q) model for given orders p and q. This paper is devoted to the validation step for APGARCH(p, q) processes of the form (1), when the power \(\delta _0\) is estimated. Based on the residual empirical autocorrelations, Box and Pierce (1970) derived a goodness-of-fit test, the portmanteau test, for univariate strong autoregressive moving-average (ARMA) models (i.e. under the assumption that the error term is iid). Ljung and Box (1978) proposed a modified portmanteau test which is nowadays one of the most popular diagnostic checking tools in ARMA modelling of time series. Since the articles by Ljung and Box (1978) and McLeod (1978), portmanteau tests have been important tools in time series analysis, in particular for testing the adequacy of an estimated ARMA(p, q) model. See also Wai (2004) for a reference book on portmanteau tests.

The intuition behind these portmanteau tests is that if a given time series model with iid innovations \(\eta _t\) is appropriate for the data at hand, the autocorrelations of the residuals \({\hat{\eta }}_t\) should be close to zero, which is the theoretical value of the autocorrelations of \(\eta _t\). The standard portmanteau tests thus consist in rejecting the adequacy of the model for large values of some quadratic form of the residual autocorrelations.

Li and Mak (1994) and Shiqing and Wai (1997) studied a portmanteau test based on the autocorrelations of the squared residuals. Indeed, a test based on the autocorrelations of the residuals themselves is not relevant, because the process used to define a GARCH model, \({\hat{\eta }}_t = \varepsilon _t/ {\hat{\sigma }}_t\) with \({\hat{\sigma }}_t\) measurable with respect to \(\sigma \{\eta _u, u < t\}\), is a martingale difference and is therefore uncorrelated. Concerning the class of GARCH models, Berkes et al. (2003) developed an asymptotic theory of portmanteau tests in the standard GARCH framework. Leucht et al. (2015) suggested a consistent specification test for the GARCH(1, 1) model, based on a test statistic of Cramér–von Mises type. Recently, Dolores et al. (2020) proposed a goodness-of-fit test for certain parametrizations of conditionally heteroscedastic time series with unobserved components. Francq et al. (2018) also proposed a portmanteau test for the Log-GARCH model and the exponential GARCH (EGARCH) model. Carbon and Francq (2011) worked on the APARCH model when the power \(\delta _0\) is known and suggested a portmanteau test for this class of models. However, in terms of power performance, the authors showed that these portmanteau tests are disappointing, since they fail to detect alternatives of the form \(\delta _0>2\) when the null is \(\delta _0=2\) (see the right array in Table 1 of Carbon and Francq (2011)). Moreover, in practice the power \(\delta _0\) is unknown and thus has to be estimated. To circumvent these problems, we propose in this work to adapt these portmanteau tests to the case of the APGARCH model when the power \(\delta _0\) is unknown and is jointly estimated with the model’s parameters. Under the null hypothesis of an APGARCH(p, q) model, we show that the asymptotic distribution of the proposed statistic is a chi-squared distribution, as in Carbon and Francq (2011). To obtain this result, we need the following technical (but not restrictive) assumption:

A1 the support of \((\eta _t)\) contains at least eleven positive values or eleven negative values.

Notice that Carbon and Francq (2011) only need the support of \((\eta _t)\) to contain at least three positive values or three negative values. This is because \(\delta _0\) was known in their work.

In Sect. 2, we recall the results on the asymptotic distribution of the quasi-maximum likelihood estimator (QMLE) obtained by Hamadeh and Zakoïan (2011) when the power \(\delta _0\) is unknown. Section 3 presents our main aim, which is to complete the work of Carbon and Francq (2011) and to extend the asymptotic theory to the wide class of APGARCH models (1) when the power \(\delta _0\) is estimated together with the other parameters. In Sect. 4, we test the null hypothesis of an APGARCH(0, 1) and of an APGARCH(1, 1) model for different values of \(\delta _0\in \{0.5,1,1.5,2,2.5,3\}\). The empirical power is also investigated. Section 5 illustrates the portmanteau test for APGARCH(p, q) models, with varying p and q, applied to exchange rates. To obtain these results, we use the asymptotic properties of the QMLE obtained by Hamadeh and Zakoïan (2011) for the APGARCH model (1).

2 Quasi-maximum likelihood estimation when the power \(\delta _0\) is unknown

Let \(\varDelta \subseteq ]0,\infty [ \times [0,\infty [^{2q+p}\times ]0,\infty [\) be the parameter space.

For all \(\vartheta =(\omega , \alpha _1^+, \ldots , \alpha _q^+, \alpha _1^-, \ldots , \alpha _q^-, \beta _1,\ldots , \beta _p,\delta )' \in \varDelta \), we assume that \(\zeta _t(\vartheta )\) is the strictly stationary and non-anticipative solution of

$$\begin{aligned} \zeta _t(\vartheta ) = \left( \omega + \sum \limits _{i=1}^q \alpha _i^+\left( \varepsilon _{t-i}^+\right) ^\delta + \alpha _i^-\left( -\varepsilon _{t-i}^-\right) ^\delta + \sum \limits _{j=1}^p \beta _j\zeta _{t-j}^\delta (\vartheta )\right) ^{1/\delta }, \end{aligned}$$
(2)

where \(\vartheta \) is equal to an unknown value \(\vartheta _0\) belonging to \(\varDelta \). In the sequel, we let \(\zeta _t(\vartheta _0)=\zeta _t\). Given the realizations \(\varepsilon _1,\dots ,\varepsilon _n\) (of length n) satisfying the APGARCH(p, q) representation (1), the variable \(\zeta _t(\vartheta )\) can be approximated by \({\tilde{\zeta }}_t(\vartheta )\) defined recursively by

$$\begin{aligned} {\tilde{\zeta }}_t(\vartheta ) =\left( \omega + \sum \limits _{i=1}^q \alpha _{i}^+ \left( \varepsilon _{t-i}^+\right) ^\delta + \alpha _{i}^- \left( -\varepsilon _{t-i}^-\right) ^\delta + \sum \limits _{j=1}^p \beta _{j} {\tilde{\zeta }}_{t-j}^\delta (\vartheta )\right) ^{1/\delta },\quad \text { for }t\ge 1, \end{aligned}$$

conditionally on the initial values \(\varepsilon _0, \ldots , \varepsilon _{1-q}\), \({\tilde{\zeta }}_0(\vartheta ) \ge 0, \ldots , {\tilde{\zeta }}_{1-p}(\vartheta )\ge 0\). The quasi-maximum likelihood (QML) method is particularly relevant for GARCH models because it provides consistent and asymptotically normal estimators for strictly stationary GARCH processes under mild regularity conditions (but with no moment assumptions on the observed process). The QMLE is obtained by the standard estimation procedure for GARCH class models. Thus a QMLE of \(\vartheta _0\) for model (1) is defined as any measurable solution \({\hat{\vartheta }}_n\) of

$$\begin{aligned} {\hat{\vartheta }}_n = \underset{\vartheta \in \varDelta }{\arg \min } \dfrac{1}{n} \sum \limits _{t=1}^n {\tilde{l}}_t(\vartheta ),\quad \text { where } {\tilde{l}}_t(\vartheta ) = \dfrac{\varepsilon _t^2}{{\tilde{\zeta }}_t^2(\vartheta )} + \log \left( {\tilde{\zeta }}_t^2(\vartheta )\right) . \end{aligned}$$
(3)
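A minimal R sketch of the criterion in (3) is given below for the APGARCH(1, 1) case, with a crude penalty outside the parameter space and simple initial values; it is only an illustration of the definition, not the implementation used for the numerical results, and it reuses the simulated series eps of the introductory sketch.

```r
## Minimal sketch of the Gaussian quasi-likelihood criterion (3) for an
## APGARCH(1,1); theta = (omega, alpha+, alpha-, beta, delta).  The penalty
## and the initial value are crude illustrative choices.
qml_criterion <- function(theta, eps) {
  omega <- theta[1]; ap <- theta[2]; am <- theta[3]
  beta  <- theta[4]; delta <- theta[5]
  if (omega <= 0 || ap < 0 || am < 0 || beta < 0 || beta >= 1 || delta <= 0)
    return(1e10)                      # penalise values outside Delta
  n <- length(eps)
  zeta_d <- numeric(n)
  zeta_d[1] <- omega                  # crude initial value for tilde zeta
  for (t in 2:n)
    zeta_d[t] <- omega + ap * max(eps[t - 1], 0)^delta +
      am * max(-eps[t - 1], 0)^delta + beta * zeta_d[t - 1]
  zeta2 <- zeta_d^(2 / delta)         # tilde zeta_t^2(theta)
  mean(eps^2 / zeta2 + log(zeta2))    # the criterion minimised in (3)
}

## QMLE by direct minimisation (starting values are arbitrary)
fit <- optim(c(0.05, 0.05, 0.05, 0.8, 1.5), qml_criterion, eps = eps,
             method = "Nelder-Mead", control = list(maxit = 5000))
theta_hat <- fit$par   # estimate of (omega, alpha+, alpha-, beta, delta)
```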

To ensure the asymptotic properties of the QMLE (for the model (1)) obtained by Hamadeh and Zakoïan (2011), we need the following assumptions:

A2 \(\vartheta _0 \in \varDelta \) and \(\varDelta \) is compact.

A3 \(\forall \vartheta \in \varDelta , \quad \sum _{j=1}^p \beta _j < 1\) and \(\gamma (C_{0}) < 0\) where

$$\begin{aligned} \gamma (C_0):=\inf _{t\in {\mathbb {N}}^*}\frac{1}{t} {\mathbb {E}}\left( \log \left\| C_{0t}C_{0t-1}\dots C_{01}\right\| \right) =\lim _{t\rightarrow \infty }\frac{1}{t} \log \left\| C_{0t}C_{0t-1}\dots C_{01}\right\| \;a.s. \end{aligned}$$

is called the top Lyapunov exponent of the sequence of matrices \(C_{0}=\{C_{0t}, t\in \mathbb Z\}\). The matrix \(C_{0t}\) is defined by

$$\begin{aligned} C_{0t} = \begin{pmatrix} \kappa (\eta _t) &{}\quad \beta _{0p} &{}\quad \alpha _{[2:q-1]} &{}\quad \alpha _{[q:q]} \\ I_{p-1} &{}\quad 0_{(p-1)\times 1} &{}\quad 0_{(p-1)\times 2(q-2)} &{}\quad 0_{(p-1)\times 2} \\ {\underline{\eta }}_t &{}\quad 0_{2\times 1} &{}\quad 0_{2\times 2(q-2)} &{}\quad 0_{2\times 2} \\ 0_{2(q-2)\times (p-1)} &{}\quad 0_{2(q-2)\times 1} &{}\quad I_{2(q-2)} &{}\quad 0_{2(q-2)\times 2} \end{pmatrix}, \end{aligned}$$

where \(I_k\) denotes the identity matrix of size k and

$$\begin{aligned} \begin{aligned}&\kappa (\eta _t) = \left( \beta _{01}+\alpha _{01}^+(\eta _t^+)^{\delta _0} + \alpha _{01}^-\left( -\eta _t^-\right) ^{\delta _0} ,\beta _{02},\ldots , \beta _{0p-1}\right) ,\\&\alpha _{[i:j]} = \left( \alpha _{0i}^+,\alpha _{0i}^-,\ldots , \alpha _{0j}^+, \alpha _{0j}^-\right) ,\quad \text {for}\quad i \le j,\quad {\underline{\eta }}_t = \begin{pmatrix} (\eta _t^+)^{\delta _0} &{}\quad 0_{1\times (p-2)} \\ (-\eta _{t}^-)^{\delta _0} &{}\quad 0_{1\times (p-2)}\end{pmatrix}. \end{aligned} \end{aligned}$$

A4 If \(p>0\), \({\mathcal {B}}_{\vartheta _0}(z)=1 - \sum _{j=1}^p \beta _{0j}z^j\) has no common root with \({\mathcal {A}}_{\vartheta _0}^+(z) = \sum _{i=1}^q\alpha _{0i}^+z^i\) and \({\mathcal {A}}_{\vartheta _0}^-(z)= \sum _{i=1}^q\alpha _{0i}^-z^i\). Moreover \({\mathcal {A}}_{\vartheta _0}^+(1) + {\mathcal {A}}_{\vartheta _0}^-(1) \ne 0\) and \(\alpha _{0q}^+ + \alpha _{0q}^- + \beta _{0p} \ne 0\).

A5 \({\mathbb {E}}[\eta _t^2]=1\) and \(\eta _t\) has a positive density on some neighborhood of zero.

A6 \(\vartheta _0 \in \overset{\circ }{\varDelta }\), where \(\overset{\circ }{\varDelta }\) denotes the interior of \(\varDelta \).

To ensure the strong consistency of the QMLE, a compactness assumption is required (i.e. A2). Assumption A3 refers to the strict stationarity condition for model (1). Assumptions A4 and A5 are made for identifiability reasons and Assumption A6 precludes situations where certain components of \(\vartheta _0\) are equal to zero. Then, under Assumptions A0 and A2–A6, Hamadeh and Zakoïan (2011) showed that \({\hat{\vartheta }}_n\rightarrow \vartheta _0\) a.s. as \(n\rightarrow \infty \) and that \(\sqrt{n}({\hat{\vartheta }}_n - \vartheta _0)\) is asymptotically normal with mean 0 and covariance matrix \((\kappa _\eta -1)J^{-1}\), where

$$\begin{aligned}&J := {\mathbb {E}}_{\vartheta _0}\left[ \dfrac{\partial ^2 l_t(\vartheta _0)}{\partial \vartheta \partial \vartheta '}\right] = {\mathbb {E}}_{\vartheta _0}\left[ \dfrac{\partial \log (\zeta _t^2(\vartheta _0))}{\partial \vartheta } \dfrac{\partial \log (\zeta _t^2(\vartheta _0))}{\partial \vartheta '}\right] ,\\&\quad \text {with } {l}_t(\vartheta ) = \dfrac{\varepsilon _t^2}{{\zeta }_t^2(\vartheta )} + \log ({\zeta }_t^2(\vartheta )) \end{aligned}$$

where \(\kappa _\eta := {\mathbb {E}}[\eta _t^4]< \infty \) by A0 and \(\zeta _t(\vartheta )\) is given by (2).
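Assumption A3 can be checked numerically in simple cases. In the first-order case \(p=q=1\), the recursion (2) involves a single random coefficient and the top Lyapunov exponent reduces to \({\mathbb {E}}\log \big (\beta _{01}+\alpha _{01}^+(\eta _t^+)^{\delta _0}+\alpha _{01}^-(-\eta _t^-)^{\delta _0}\big )\), which can be approximated by Monte Carlo integration. The following R sketch is one way to do so (an illustration under Gaussian innovations, with the parameter values of Model (8) of Sect. 4).

```r
## Monte Carlo approximation of the top Lyapunov exponent when p = q = 1,
## namely E log(beta + alpha+ (eta^+)^delta + alpha- (-eta^-)^delta).
## Gaussian innovations and the parameter values of Model (8) are used here.
set.seed(3)
eta   <- rnorm(1e6)
delta <- 1.5
a_eta <- 0.879 + 0.036 * pmax(eta, 0)^delta + 0.074 * pmax(-eta, 0)^delta
mean(log(a_eta))   # a negative value indicates that A3 holds for these parameters
```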

3 Portmanteau test

To check the adequacy of a given time series model, for instance an ARMA(p, q) model, it is common practice to test the significance of the residual autocorrelations. In the GARCH framework this approach is not relevant because the process \(\eta _t=\varepsilon _t/\zeta _t\) is always a white noise (and even a martingale difference) even when the volatility is misspecified. To check the adequacy of a volatility model, under the null hypothesis

$$\begin{aligned} {{\mathbf{H}}_{\mathbf{0}}} : \text {the process } (\varepsilon _t) \text { satisfies the model (1)}, \end{aligned}$$

it is much more fruitful to look at the squared residuals autocovariances

$$\begin{aligned} {\hat{r}}_h = \dfrac{1}{n} \sum \limits _{t = \vert h \vert +1}^n\left( {\hat{\eta }}_t^2-1\right) \left( {\hat{\eta }}_{t-\vert h\vert }^2-1\right) ,\quad \text { with } {\hat{\eta }}_t^2 = \dfrac{\varepsilon _t^2}{{\hat{\zeta }}_t^2}, \end{aligned}$$

for \(\vert h \vert < n\), where \({\hat{\zeta }}_t = {\tilde{\zeta }}_t({\hat{\vartheta }}_n)\) and the \({\hat{\eta }}_t = \varepsilon _t/{\hat{\zeta }}_t\) are the quasi-maximum likelihood residuals. For a fixed integer \(m\ge 1\), we consider the vector of the first m sample autocovariances defined by

$$\begin{aligned} {{\hat{\gamma }}}_m = ({\hat{r}}_1,\ldots ,{\hat{r}}_m)', \text { where } 1\le m < n. \end{aligned}$$
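For instance, \({\hat{r}}_h\) can be computed directly from the squared residuals; a minimal R sketch (the function name is ours) is given below.

```r
## Sample autocovariance of the squared residuals at lag h, as defined above,
## for a vector eta2_hat containing the squared residuals.
sq_res_autocov <- function(eta2_hat, h) {
  n <- length(eta2_hat)
  s <- eta2_hat - 1
  sum(s[(abs(h) + 1):n] * s[1:(n - abs(h))]) / n
}
```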

The following theorem gives the asymptotic distribution of the autocovariances of the squared residuals.

Theorem 1

Under the assumptions A0–A6, if \((\varepsilon _t)\) is the non-anticipative and stationary solution of the APGARCH(p, q) model (1), we have

$$\begin{aligned} \sqrt{n} {{\hat{\gamma }}}_m \xrightarrow [n\rightarrow \infty ]{\mathrm {d}} {\mathcal {N}}\left( 0,D\right) \text { where } {D}=({\kappa }_\eta - 1)^2 I_m - ({\kappa }_\eta -1){C}_m{J}^{-1}{C}'_m \end{aligned}$$

is nonsingular and where the matrix \({C}_m\) is given by (17) in the proof of Theorem 1.

The proof of this result is postponed to Sect. 7.

The standard portmanteau tests of Box and Pierce (1970) and Ljung and Box (1978), for checking that the data are a realization of a strong white noise, are based on the residual autocorrelations \({{\hat{\rho }}}(h)\) and are defined by

$$\begin{aligned} Q^{\textsc {bp}}_m=n\sum _{h=1}^m{\hat{\rho }}^2(h)\quad \text { and }\quad {Q}_m^{\textsc {lb}}=n(n+2)\sum _{h=1}^m\frac{{\hat{\rho }}^2(h)}{n-h}, \end{aligned}$$
(4)

where n is the length of the series and m is a fixed integer. Under the assumption that the noise sequence is iid, the standard test procedure consists in rejecting the strong white noise hypothesis if the statistics (4) are larger than a certain quantile of a chi-squared distribution. These tests are not robust to conditional heteroscedasticity or to other processes displaying second-order dependence. Indeed, such nonlinearities may arise for instance when the observed process \((\varepsilon _t)\) follows a GARCH representation. Other situations where the standard tests are not robust can be found for instance in Relvas and Paula (2016), Cao et al. (2010), Francq et al. (2005) or Yacouba and Abdoulkarim (2018), Yacouba and Bruno (2018), Boubacar (2011). Our main goal is thus to propose a more robust portmanteau statistic in the APGARCH framework.
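For reference, the statistics (4) are readily available in R through the Box.test function of the stats package; a small illustration on a simulated iid sequence is given below (the choice m = 12 is arbitrary).

```r
## Standard Box-Pierce and Ljung-Box tests (4) applied to an iid sequence.
set.seed(2)
x <- rnorm(1000)
Box.test(x, lag = 12, type = "Box-Pierce")
Box.test(x, lag = 12, type = "Ljung-Box")
```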

In order to state our second result, we also need further notations. Let \({\hat{\kappa }}_\eta \), \({\hat{J}}\) and \({\hat{C}}_m\) be weakly consistent estimators of \(\kappa _\eta \), J and \({C}_m\) involved in the asymptotic normality of \(\sqrt{n} {{\hat{\gamma }}}_m\) (see Theorem 1). For instance, \(\kappa _\eta \) and J can be estimated by their empirical or observable counterparts given by

$$\begin{aligned} {\hat{\kappa }}_\eta = \dfrac{1}{n} \sum \limits _{t=1}^n \dfrac{\varepsilon _t^4}{{\tilde{\zeta }}_t^4({\hat{\vartheta }}_n)} \qquad \text{ and }\qquad {\hat{J}} = \dfrac{1}{n}\sum \limits _{t=1}^n \dfrac{\partial \log {\tilde{\zeta }}_t^2({\hat{\vartheta }}_n)}{\partial \vartheta }\dfrac{\partial \log {\tilde{\zeta }}_t^2({\hat{\vartheta }}_n)}{\partial \vartheta '}. \end{aligned}$$

We can write the vector of parameters \(\vartheta := (\theta ', \delta )'\) where \(\theta =(\omega , \alpha _1^+, \ldots , \alpha _q^+\), \(\alpha _1^-, \ldots , \alpha _q^-, \beta _1,\ldots , \beta _p)'\in {\mathbb {R}}^{2q+p+1}\) corresponds to the vector of parameters when the power \(\delta \) is known. The parameter of interest becomes \(\vartheta _0 :=(\theta _0',\delta _0)'\), where

$$\begin{aligned} \theta _0= \left( \omega _0, \alpha _{01}^+, \ldots , \alpha _{0q}^+, \alpha _{01}^-, \ldots , \alpha _{0q}^-, \beta _{01},\ldots , \beta _{0p}\right) '. \end{aligned}$$

With the previous notation, for all \(\vartheta = (\theta ', \delta )'\in \varDelta \), the derivatives in the expression of \({\hat{J}}\) can be recursively computed for \(t> 0\) by

$$\begin{aligned} \begin{aligned} \dfrac{\partial {\tilde{\zeta }}_t^\delta (\vartheta )}{\partial \theta }&= \underline{{{\tilde{c}}}}_t(\vartheta ) + \sum \limits _{j=1}^p \beta _j \dfrac{\partial {\tilde{\zeta }}_{t-j}^\delta (\vartheta )}{\partial \theta },\\ \dfrac{\partial {\tilde{\zeta }}_t^\delta (\vartheta )}{\partial \delta }&= \sum \limits _{i=1}^q \alpha _i^+\log \left( \varepsilon _{t-i}^+\right) \left( \varepsilon _{t-i}^+\right) ^\delta + \alpha _i^-\log \left( -\varepsilon _{t-i}^-\right) \left( -\varepsilon _{t-i}^-\right) ^\delta + \sum \limits _{j=1}^p \beta _j \dfrac{\partial {\tilde{\zeta }}_{t-j}^\delta (\vartheta )}{\partial \delta }, \end{aligned} \end{aligned}$$

with the initial values \(\partial {\tilde{\zeta }}_t(\vartheta )/\partial \vartheta = 0\), for all \(t = 0, \ldots , 1-p\) and

$$\begin{aligned} \underline{{{\tilde{c}}}}_t(\vartheta ) = \left( 1, \left( \varepsilon _{t-1}^+\right) ^\delta , \ldots , \left( \varepsilon _{t-q}^+\right) ^\delta , \left( -\varepsilon _{t-1}^-\right) ^\delta , \ldots , \left( -\varepsilon _{t-q}^-\right) ^\delta , {\tilde{\zeta }}_{t-1}^\delta (\vartheta ), \ldots , {\tilde{\zeta }}_{t-p}^\delta (\vartheta )\right) '. \end{aligned}$$
(5)

By convention we let \(\log (\varepsilon _{t}^+) = 0\) if \(\varepsilon _t \le 0\) and respectively \(\log (-\varepsilon _{t}^-) = 0\) if \(\varepsilon _t \ge 0\).
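For concreteness, a minimal R sketch of these recursions in the first-order case \(p=q=1\) is given below (the function name and the flat treatment of the initial values are ours). The derivatives of \(\log {\tilde{\zeta }}_t^2\) needed for \({\hat{J}}\) and \({\hat{C}}_m\) then follow from the chain rule, since \(\log {\tilde{\zeta }}_t^2 = (2/\delta )\log {\tilde{\zeta }}_t^\delta \).

```r
## Sketch of the recursions above for an APGARCH(1,1), with
## vartheta = (omega, alpha+, alpha-, beta, delta)'.  Returns tilde zeta_t^delta
## and the n x 5 matrix of its derivatives with respect to vartheta.
apgarch11_deriv <- function(theta, eps) {
  omega <- theta[1]; ap <- theta[2]; am <- theta[3]
  beta  <- theta[4]; delta <- theta[5]
  n <- length(eps)
  zeta_d <- numeric(n); zeta_d[1] <- omega             # crude initial value
  d <- matrix(0, n, 5)                                 # initial derivatives set to 0
  logp <- function(x) if (x > 0) log(x) else 0         # convention: log(0) := 0
  for (t in 2:n) {
    ep <- max(eps[t - 1], 0); em <- max(-eps[t - 1], 0)
    zeta_d[t] <- omega + ap * ep^delta + am * em^delta + beta * zeta_d[t - 1]
    ## recursion for the derivative with respect to theta = (omega, alpha+, alpha-, beta)
    d[t, 1:4] <- c(1, ep^delta, em^delta, zeta_d[t - 1]) + beta * d[t - 1, 1:4]
    ## recursion for the derivative with respect to delta
    d[t, 5] <- ap * logp(ep) * ep^delta + am * logp(em) * em^delta +
      beta * d[t - 1, 5]
  }
  list(zeta_d = zeta_d, dzeta_d = d)
}
```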

We define the matrix \({\hat{C}}_m\) of size \(m \times (2q+p+2)\) and we take

$$\begin{aligned} {\hat{C}}_m(h,k) = -\dfrac{1}{n}\sum \limits _{t=h+1}^n({\hat{\eta }}^2_{t- h} -1) \dfrac{1}{{\tilde{\zeta }}_t^2({\hat{\vartheta }}_n)} \dfrac{\partial {\tilde{\zeta }}_t^2({\hat{\vartheta }}_n)}{\partial \vartheta _k} \text { for }1\le h \le m\text { and }1\le k \le 2q+p+2, \end{aligned}$$
(6)

where \({\hat{C}}_m(h,k)\) denotes the (h, k) element of the matrix \({\hat{C}}_m\). Let \({\hat{D}} =({\hat{\kappa }}_\eta -1)^2 I_m-({\hat{\kappa }}_\eta -1){\hat{C}}_m{\hat{J}}^{-1}{\hat{C}}'_m\) be a weakly consistent estimator of the matrix D. The following result gives the asymptotic distribution of quadratic forms of autocovariances of squared residuals and is established in the case where the power is unknown and is estimated with the other parameters.

Theorem 2

Under the assumptions of Theorem 1 and under \(\mathrm {\mathbf {H_0}}\), we have

$$\begin{aligned} n {{\hat{\gamma }}}_m'{\hat{D}}^{-1}{{\hat{\gamma }}}_m \xrightarrow [n\rightarrow \infty ]{\mathrm {d}} \chi ^2_m. \end{aligned}$$

The proof of this result is postponed to Sect. 7. The adequacy of the APGARCH(p, q) model (1) is then rejected at the asymptotic level \(\alpha \) when

$$\begin{aligned} n {{\hat{\gamma }}}_m'{\hat{D}}^{-1}{{\hat{\gamma }}}_m\ >\ \chi ^2_m(1-\alpha ), \end{aligned}$$

where \(\chi ^2_m(1-\alpha )\) represents the \((1-\alpha )\)-quantile of the chi-squared distribution with m degrees of freedom.
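To fix ideas, the following R sketch assembles \({{\hat{\gamma }}}_m\), \({\hat{\kappa }}_\eta \), \({\hat{J}}\), \({\hat{C}}_m\), \({\hat{D}}\) and the statistic \(n {{\hat{\gamma }}}_m'{\hat{D}}^{-1}{{\hat{\gamma }}}_m\) for an APGARCH(1, 1) fit. It relies on the helper apgarch11_deriv sketched above and is only an illustration of the formulas of this section, not the code used for the numerical results.

```r
## Sketch of the portmanteau statistic of Theorem 2 for an APGARCH(1,1) fit.
portmanteau_apgarch11 <- function(theta_hat, eps, m = 12) {
  n <- length(eps); delta <- theta_hat[5]
  der   <- apgarch11_deriv(theta_hat, eps)     # helper sketched after (5)
  zeta2 <- der$zeta_d^(2 / delta)              # tilde zeta_t^2 at theta_hat
  ## derivatives of log(tilde zeta_t^2) with respect to vartheta (chain rule)
  dlog      <- (2 / delta) * der$dzeta_d / der$zeta_d
  dlog[, 5] <- dlog[, 5] - (2 / delta^2) * log(der$zeta_d)
  eta2  <- eps^2 / zeta2                       # squared residuals
  s     <- eta2 - 1
  kappa_hat <- mean(eta2^2)                    # hat kappa_eta
  J_hat <- crossprod(dlog) / n                 # hat J
  r_hat <- numeric(m); C_hat <- matrix(0, m, 5)
  for (h in 1:m) {
    r_hat[h]   <- sum(s[(h + 1):n] * s[1:(n - h)]) / n
    C_hat[h, ] <- -colSums(s[1:(n - h)] * dlog[(h + 1):n, , drop = FALSE]) / n
  }
  D_hat <- (kappa_hat - 1)^2 * diag(m) -
    (kappa_hat - 1) * C_hat %*% solve(J_hat) %*% t(C_hat)
  stat  <- n * drop(t(r_hat) %*% solve(D_hat) %*% r_hat)
  c(stat = stat, p.value = 1 - pchisq(stat, df = m))
}
```

For instance, portmanteau_apgarch11(theta_hat, eps, m = 6) returns the statistic and the corresponding \(p\)-value for the simulated series and the QMLE of the sketches of the previous sections.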

Remark 1

If we focus on the following alternative hypothesis

$$\begin{aligned} {{\mathbf{H}}_{\mathbf{1}}} : \text {the process } (\varepsilon _t)\text { does not admit the representation (1) with parameter } \vartheta _0, \end{aligned}$$

then at least one \( r^0_h= \mathbb E [ (\eta ^2_t(\vartheta _0)-1)(\eta ^2_{t-h}(\vartheta _0)-1)]\), \(1\le h\le m\), is nonzero under \({\mathbf {H_1}}\). One may prove that under \({\mathbf {H_1}}\)

$$\begin{aligned} {{\hat{\gamma }}}_m'{\hat{D}}^{-1}{{\hat{\gamma }}}_m \xrightarrow [n\rightarrow \infty ]{{\mathbb {P}}} {\gamma ^0_m}' D^{-1} \gamma ^0_m \end{aligned}$$

where the vector \(\gamma ^0_m = (r^0_1,\ldots , r^0_m)'\). Therefore the test statistic \(n {{\hat{\gamma }}}_m'{\hat{D}}^{-1}{{\hat{\gamma }}}_m \) is consistent in detecting \(\mathrm {\mathbf {H_1}}\).

The proof of this remark is also postponed to Sect. 7.

4 Numerical illustration

By means of Monte Carlo experiments, we investigate the finite sample properties of the test introduced in this paper. The numerical illustrations of this section are made with the free statistical R software (see https://www.r-project.org/). First, we simulated \(N=1,000\) independent replications of size \(n=500\) and \(n=2000\) of an APGARCH(0, 1) model

$$\begin{aligned} \varepsilon _t = \left( 0.2 + 0.4\left( \varepsilon _{t-1}^+\right) ^{\delta _0} + 0.1 \left( -\varepsilon _{t-1}^-\right) ^{\delta _0}\right) ^{1/\delta _0}\eta _t \end{aligned}$$
(7)

for different values of \(\delta _0\in \{0.5,1,1.5,2,2.5,3\}\). Second, we also simulated \(N=1000\) independent replications of size \(n=2000\) and \(n=5000\) of an APGARCH(1, 1) model

$$\begin{aligned} \varepsilon _t = \left( 0.009 + 0.036\left( \varepsilon _{t-1}^+\right) ^{\delta _0} + 0.074 \left( -\varepsilon _{t-1}^-\right) ^{\delta _0} + 0.879\zeta _{t-1}^{\delta _0}\right) ^{1/\delta _0}\eta _t \end{aligned}$$
(8)

for different values of \(\delta _0\in \{1,1.5,2,2.5,3\}\). Three distributions of \((\eta _t)\) are considered for each model:

  1. (a)

    a symmetric distribution namely a standard \({\mathcal {N}}(0,1)\),

  2. (b)

a centered and standardized two-component Gaussian mixture distribution (\(0.1{\mathcal {N}}(-2,2)+0.9{\mathcal {N}}(2,0.16)\)) to obtain \({\mathbb {E}}(\eta _t^2)=1\), which is highly leptokurtic since \(\kappa _{\eta }=10.53\) (see Hamadeh and Zakoïan (2011)),

  3. (c)

a standardized Student’s distribution with \(\nu \) degrees of freedom (\(t_{\nu }\)), where \(\nu =3\) or \(\nu =9\), rescaled so that \({\mathbb {E}}(\eta _t^2)=1\). In contrast with \(\nu =9\), it should be noted that for \(\nu =3\) the asymptotic distribution of the autocovariances of squared residuals obtained in Theorem 1 is not valid, since \(\kappa _{\eta }=\infty \).

For each of these N replications we use the QML method to estimate the corresponding coefficient \(\vartheta _{0}\) of the APGARCH(0, 1) (resp. the APGARCH(1, 1)) model considered. After estimating the APGARCH(0, 1) (resp. the APGARCH(1, 1)) model, we apply the portmanteau test to the squared residuals for different values of \(m\in \{1,\dots ,12\}\), where m is the number of autocorrelations used in the portmanteau test statistic.
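A schematic version of this Monte Carlo exercise, in the APGARCH(1, 1) case and reusing the helper functions sketched in the previous sections, is given below; N, n and the starting values are deliberately reduced or simplified, so this is only an indication of the workflow, not the code used for the reported results.

```r
## Schematic Monte Carlo size study under Model (8), reusing the helpers
## sketched earlier (simulate_apgarch11, qml_criterion, portmanteau_apgarch11).
set.seed(4)
N <- 100; n <- 500; m <- 6
reject <- logical(N)
for (i in 1:N) {
  eps_i <- simulate_apgarch11(n, omega = 0.009, alpha_p = 0.036,
                              alpha_m = 0.074, beta = 0.879, delta = 1.5)
  fit_i <- optim(c(0.05, 0.05, 0.05, 0.8, 1.5), qml_criterion, eps = eps_i,
                 method = "Nelder-Mead", control = list(maxit = 5000))
  out_i <- portmanteau_apgarch11(fit_i$par, eps_i, m = m)
  reject[i] <- out_i["p.value"] < 0.05
}
mean(reject)   # empirical relative frequency of rejection at the 5% level
```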

Table 1 Empirical size of the proposed test: relative frequencies (in %) of rejection of an APARCH(0, 1) given by (7) with innovation a) (\(\eta _t\sim {\mathcal {N}}(0,1)\))

Tables 1, 2, 3 and 4 (resp. Tables 5, 6, 7 and 8) display the empirical relative frequencies of rejection over the N independent replications for the 3 nominal levels \(\alpha =1\%\), 5% and 10%, when the data generating process (DGP for short) is the APGARCH(0, 1) (resp. the APGARCH(1, 1)) model. For these nominal levels, the empirical relative frequency of rejection over the \(N=1000\) independent replications should vary respectively within the confidence intervals \([0.3\%,1.7\%]\), \([3.6\%, 6.4\%]\) and \([8.1\%,11.9\%]\) with probability 95%, and within \([0.3\%, 1.9\%]\), \([3.3\%, 6.9\%]\) and \([7.6\%, 12.5\%]\) with probability 99%, under the assumption that the true probabilities of rejection are respectively \(\alpha =1\%\), \(\alpha =5\%\) and \(\alpha =10\%\). As expected, Tables 1, 2, 3 and 4 (resp. Tables 5, 6, 7 and 8) show that the error of first kind is better controlled (most of the rejection frequencies are within the 99% significance limits, except for the APGARCH(0, 1) model when \(\delta _0=2\)) when \(n=2000\) (resp. \(n=5000\)) than when \(n=500\) (resp. \(n=2000\)), when the DGP follows an APGARCH(0, 1) (resp. an APGARCH(1, 1)) model. Note also that even in the case where \((\eta _t)\) has an infinite fourth moment, namely when \(\eta _t\sim t_{3}\), the proposed test performs well for the APGARCH(0, 1) model. The opposite conclusion is obtained for the APGARCH(1, 1) model, since the rejection frequencies are globally outside the 99% significance limits.

Consequently, the proposed test controls the error of first kind well for the candidate models when the number of observations is large, which corresponds in practice to the length of daily financial series or of higher-frequency data.

We now repeat the same experiments to examine the empirical power of the test: first for the null hypothesis of an APGARCH(0, 1) against an APGARCH(1, 1) alternative given by (8); second for the null hypothesis of an APGARCH(1, 1) against the following APGARCH(2, 1) alternative defined by

$$\begin{aligned} \varepsilon _t&= \left( 0.2 + 0.07\left( \varepsilon _{t-1}^+\right) ^{\delta _0}+ 0.03 \left( \varepsilon _{t-2}^+\right) ^{\delta _0} \right. \nonumber \\&\quad \left. +\, 0.051\left( -\varepsilon _{t-1}^-\right) ^{\delta _0} + 0.18 \left( -\varepsilon _{t-2}^-\right) ^{\delta _0} + 0.704 \zeta _{t-1}^{\delta _0}\right) ^{1/\delta _0}\eta _t. \end{aligned}$$
(9)

Tables 9, 10, 11 and 12 (resp. Tables 13, 14 and 15) display the empirical powers of the test under Model (8) (resp. Model (9)) with different values of \(\delta _0\) over the N independent replications at different asymptotic levels \(\alpha \). In terms of power performance we observe that:

  1. 1.

In the first experiment the powers of the test are quite satisfactory, except for \(\delta _0=0.5\) and \(m=1\), when the null is an APGARCH(0, 1) model. Even for the sample size \(n=500\), the test is able to clearly reject the APGARCH(0, 1) model when the DGP follows Model (8) (see Tables 9, 10, 11 and 12).

  2. 2.

In the second experiment it is seen that the proposed test has high power in the case of an APGARCH(1, 1) null when the DGP follows Model (9) (Tables 13, 14 and 15). Note that when \(\eta _t\sim t_3\) we do not report the empirical power of the proposed test, which would be hardly interpretable, because we have already seen in Table 7 that the test does not control the error of first kind well in this APGARCH(1, 1) framework.

  3. 3.

    The empirical power of the test in the two experiments is in general decreasing when m increases and is increasing when \(\delta _0\) increases.

Table 2 Empirical size of the proposed test: relative frequencies (in %) of rejection of an APARCH(0, 1) given by (7) with innovation b) (\(\eta _t\sim 0.1{\mathcal {N}}(-2,2)+0.9{\mathcal {N}}(2,0.16))\)
Table 3 Empirical size of the proposed test: relative frequencies (in %) of rejection of an APARCH(0, 1) given by (7) with innovation c) (\(\eta _t\sim t_3\))
Table 4 Empirical size of the proposed test: relative frequencies (in %) of rejection of an APARCH(0, 1) given by (7) with innovation c) (\(\eta _t\sim t_9\))
Table 5 Empirical size of the proposed test: relative frequencies (in %) of rejection of an APARCH(1, 1) given by (8) with innovation a) (\(\eta _t\sim {\mathcal {N}}(0,1)\))
Table 6 Empirical size of the proposed test: relative frequencies (in %) of rejection of an APARCH(1, 1) given by (8) with innovation b) (\(\eta _t\sim 0.1{\mathcal {N}}(-2,2)+0.9{\mathcal {N}}(2,0.16)\))
Table 7 Empirical size of the proposed test: relative frequencies (in %) of rejection of an APARCH(1, 1) given by (8) with innovation c) (\(\eta _t\sim t_3\))
Table 8 Empirical size of the proposed test: relative frequencies (in %) of rejection of an APARCH(1, 1) given by (8) with innovation c) (\(\eta _t\sim t_9\))
Table 9 Empirical power (in %) of the proposed test for the null hypothesis of an APARCH(0, 1) against an APARCH(1, 1) given by (8) with innovation a) (\(\eta _t\sim {\mathcal {N}}(0,1)\))

5 Adequacy of APGARCH models for real datasets

We consider the daily returns of four exchange rates: EUR/USD (Euro/US Dollar), EUR/JPY (Euro/Yen), EUR/GBP (Euro/Pound) and EUR/CAD (Euro/Canadian Dollar). The observations cover the period from November 01, 1999 to April 28, 2017, which corresponds to \(n=4478\) observations. The data were obtained from the website of the National Bank of Belgium (https://www.nbb.be). It may seem surprising to investigate asymmetric models for exchange rate returns, while the conventional view is that leverage is not relevant for such series. However, many empirical studies (see for instance Harvey and Sucarrat (2014), Francq et al. (2018)) show that asymmetry/leverage is relevant for exchange rates, especially when one currency is more liquid or more attractive than the other. It may also be worth mentioning that the sign of the effect depends on which currency appears in the denominator of the exchange rate.

Table 16 displays the \(p\)-values for the adequacy of the APGARCH(p, q) models for the daily returns of exchange rates, based on m squared residual autocovariances, as well as the estimated power denoted \({\hat{\delta }}\). To summarize our empirical investigations, Table 16 shows that the APGARCH(0, q) models (even with large order q) are generally rejected, whereas the APGARCH(0, 5) and APGARCH(0, 6) models seem to be relevant for the EUR/USD and EUR/CAD series. The APGARCH(0, q) assumption is rejected and is not suitable for the EUR/GBP and EUR/JPY series, whereas APGARCH(p, q) models seem the most appropriate for these exchange rates (EUR/GBP and EUR/JPY). This table only concerns the daily exchange rates, but similar conclusions hold for the weekly log returns (see for instance Francq and Zakoïan (2019)). From the two last columns of Table 16, we can also see that the estimated power \({{\hat{\delta }}}\) is not necessarily equal to 1 or 2 and differs across series. The \(p\)-values associated with the corresponding QMLE \({{\hat{\delta }}}\) are given in parentheses. The last column then presents the confidence interval at the asymptotic level \(\alpha = 5\%\) for the parameter \(\delta _0\).

6 Concluding remarks

Three distributions of \((\eta _t)\) have been considered in this paper. We remark that, as expected since the asymptotic distribution of the autocovariances of squared residuals obtained in Theorem 1 is not valid when \(\kappa _{\eta }=\infty \), the test does not control the error of first kind in the APGARCH(1, 1) case when \(\eta _t\sim t_3\). The other distributions yield good results.

Concerning the parameter \(\delta _0\), the proposed test can be recommended whatever its value.

The portmanteau test is thus an important tool in the validation process. From the empirical results and the simulation experiments, we draw the conclusion that the proposed portmanteau test based on the squared residuals of an APGARCH(p, q) model (when the power is unknown and is jointly estimated with the model’s parameters) controls the error of first kind well at different asymptotic levels \(\alpha \) and is able to detect a misspecification of the orders (p, q).

Table 10 Empirical power (in %) of the proposed test for the null hypothesis of an APARCH(0, 1) against an APARCH(1, 1) given by (8) with innovation b) (\(\eta _t\sim 0.1{\mathcal {N}}(-2,2)+0.9{\mathcal {N}}(2,0.16)\))
Table 11 Empirical power (in %) of the proposed test for the null hypothesis of an APARCH(0, 1) against an APARCH(1, 1) given by (8) with innovation c) (\(\eta _t\sim t_3\))
Table 12 Empirical power (in %) of the proposed test for the null hypothesis of an APARCH(0, 1) against an APARCH(1, 1) given by (8) with innovation c) (\(\eta _t\sim t_9\))
Table 13 Empirical power (in %) of the proposed test for the null hypothesis of an APARCH(1, 1) against an APARCH(2, 1) given by (9) with innovation a) (\(\eta _t\sim {\mathcal {N}}(0,1)\))

7 Proofs

We recall that for all \(\vartheta \in \varDelta \), \(\zeta _t(\vartheta )\) is the strictly stationary and non-anticipative solution of (2).

The matrix J can be rewritten as

$$\begin{aligned} J = {\mathbb {E}}_{\vartheta _0}\left[ \dfrac{1}{\zeta _{t}^4(\vartheta _0)} \dfrac{\partial \zeta _{t}^2(\vartheta _0)}{\partial \vartheta } \dfrac{\partial \zeta _{t}^2(\vartheta _0)}{\partial \vartheta '}\right] . \end{aligned}$$

First, we shall need some technical results which are essentially contained in Hamadeh and Zakoïan (2011). Let K and \(\rho \) be generic constants, whose values may change along the proofs, with \(K>0\) and \(\rho \in ]0,1[\).

7.1 Reminder of technical results on the quasi-likelihood method for APGARCH models

The starting point is the asymptotic irrelevance of the initial values. Under A0 and A2–A6, Hamadeh and Zakoïan (2011) show that:

$$\begin{aligned} \sup \limits _{\vartheta \in \varDelta } | \zeta _{t}^\delta (\vartheta ) - {\tilde{\zeta }}_{t}^\delta (\vartheta ) | \le K\rho ^t. \end{aligned}$$
(10)

Similar properties also hold for the derivatives with respect to \(\vartheta \) of \(\zeta _{t}^\delta (\vartheta ) - {\tilde{\zeta }}_{t}^\delta (\vartheta )\). We sum up the properties that we shall need in the sequel. We refer to Hamadeh and Zakoïan (2011) for a more detailed treatment. For some \(s \in ]0,1[\), we have

$$\begin{aligned} {\mathbb {E}}\vert \varepsilon _0 \vert ^{2s}< \infty , \qquad {\mathbb {E}} \sup \limits _{\vartheta \in \varDelta } | \zeta _{t}^{2s} |< \infty , \qquad {\mathbb {E}} \sup \limits _{\vartheta \in \varDelta } | {\tilde{\zeta }}_{t}^{2s} | < \infty . \end{aligned}$$
(11)

Moreover, from (10), the mean-value theorem implies that

$$\begin{aligned} \sup \limits _{\vartheta \in \varDelta } | \zeta _{t}^2(\vartheta ) - {\tilde{\zeta }}_{t}^2(\vartheta )| \le K\rho ^t\sup \limits _{\vartheta \in \varDelta }\max \{\zeta _t^2(\vartheta ), {\tilde{\zeta }}_t^2(\vartheta )\}. \end{aligned}$$
(12)

For all \(d\ge 1\)

$$\begin{aligned} {\mathbb {E}}\left\| \sup \limits _{\vartheta \in \varDelta } \dfrac{1}{\zeta _{t}^\delta (\vartheta )}\dfrac{\partial \zeta _t^\delta (\vartheta )}{\partial \vartheta }\right\| ^d<\infty , \qquad {\mathbb {E}}\left\| \sup \limits _{\vartheta \in \varDelta } \dfrac{1}{\zeta _{t}^\delta (\vartheta )}\dfrac{\partial ^2 \zeta _t^\delta (\vartheta )}{\partial \vartheta \partial \vartheta '}\right\| ^d <\infty . \end{aligned}$$
(13)

There exists a neighborhood \({\mathcal {V}}(\vartheta _0)\) of \(\vartheta _0\) such that for all \(\xi > 0\) and \(a = 1-(\delta _0/\delta )(1-s)> 0\)

$$\begin{aligned} \sup \limits _{\vartheta \in {\mathcal {V}}(\vartheta _0)} \left( \dfrac{\zeta _t^2(\vartheta _0)}{\zeta _t^2(\vartheta )}\right) \le \left( K + K\sum \limits _{i=1}^q\sum \limits _{k=0}^\infty (1 + \xi )^k\rho ^{ak}\vert \varepsilon _{t-i-k}\vert ^{2\delta }\right) ^{2/\delta }, \end{aligned}$$

and it holds that

$$\begin{aligned} {\mathbb {E}}\left| \sup \limits _{\vartheta \in {\mathcal {V}}(\vartheta _0)} \left( \dfrac{\zeta _t^2(\vartheta _0)}{\zeta _t^2(\vartheta )}\right) \right| < \infty . \end{aligned}$$
(14)

The matrix J is invertible and

$$\begin{aligned} \quad \sqrt{n}({\hat{\vartheta }}_n - \vartheta _0) = J^{-1} \dfrac{1}{\sqrt{n}} \sum \limits _{t=1}^ns_t \dfrac{1}{\zeta _t^2}\dfrac{\partial \zeta _t^2(\vartheta _0)}{\partial \vartheta } + \mathrm {o}_{\mathbb P}(1),\quad \text {with } s_t = \eta _t^2-1. \end{aligned}$$
(15)
Table 14 Empirical power (in %) of the proposed test for the null hypothesis of an APARCH(1, 1) against an APARCH(2, 1) given by (9) with innovation b) (\(\eta _t\sim 0.1{\mathcal {N}}(-2,2)+0.9{\mathcal {N}}(2,0.16)\))
Table 15 Empirical power (in %) of the proposed test for the null hypothesis of an APARCH(1, 1) against an APARCH(2, 1) given by (9) with innovation c) (\(\eta _t\sim t_9\))

7.2 Proof of Theorem 1

The proof of Theorem 1 is close to that of Carbon and Francq (2011). Only the invertibility of the matrix D needs to be adapted. However, in order to keep the paper self-contained, we give the whole proof. We decompose it into the following three steps.

  1. (i)

    Asymptotic impact of the unknown initial values on the statistic \({{\hat{\gamma }}}_m\).

  2. (ii)

    Asymptotic distribution of \(\sqrt{n}{{\hat{\gamma }}}_m\).

  3. (iii)

    Invertibility of the matrix D.

We now introduce the vector of m autocovariances \({\gamma }_m = (r_1, \ldots , r_m)'\) whose h-th element is defined as

$$\begin{aligned} r_h = \dfrac{1}{n}\sum \limits _{t=h+1}^n s_ts_{t-h}\ , \quad \text {with } s_t = \eta _t^2-1 \text { and } 0<h < n. \end{aligned}$$

Let \(s_t(\vartheta )=\eta ^2_t(\vartheta )-1\) with \(\eta _t(\vartheta ) = \varepsilon _t / \zeta _t(\vartheta )\) and \({\tilde{s}}_t(\vartheta )= {{\tilde{\eta }}}^2_t(\vartheta )-1\) with \({{\tilde{\eta }}}_t(\vartheta ) = \varepsilon _t / {{\tilde{\zeta }}}_t(\vartheta )\). Let \(r_h(\vartheta )\) be obtained by replacing \(\eta _t\) by \(\eta _t(\vartheta )\) in \(r_h\), and let \({\tilde{r}}_h(\vartheta )\) be obtained by replacing \({\eta }_t\) by \({\tilde{\eta }}_t(\vartheta )\) in \(r_h\). The vectors \({\gamma }_m(\vartheta ) = (r_1(\vartheta ),\ldots , r_m(\vartheta ))'\) and \(\tilde{{\gamma }}_m(\vartheta ) = ({\tilde{r}}_1(\vartheta ),\ldots , {\tilde{r}}_m(\vartheta ))'\) are such that \({\gamma }_m = {\gamma }_m(\vartheta _0)\), \(\tilde{{\gamma }}_m = \tilde{{\gamma }}_m(\vartheta _0)\) and \(\hat{{\gamma }}_m = \tilde{{\gamma }}_m({\hat{\vartheta }}_n)\).

7.2.1 Asymptotic impact of the unknown initial values on the statistic \({{\hat{\gamma }}}_m\)

We have \(s_t(\vartheta )s_{t-h}(\vartheta ) - {\tilde{s}}_t(\vartheta ){\tilde{s}}_{t-h}(\vartheta ) = a_t + b_t\) with \(a_t = \{s_t(\vartheta ) - {\tilde{s}}_t(\vartheta )\}s_{t-h}(\vartheta )\) and \(b_t = {\tilde{s}}_t(\vartheta )\{s_{t-h}(\vartheta ) - {\tilde{s}}_{t-h}(\vartheta )\}\). Using (12) and \(\inf _{\vartheta \in \varDelta } {\tilde{\zeta }}^2_t \ge \inf _{\vartheta \in \varDelta }\omega ^{2/\delta } > 0\), we have

$$\begin{aligned} \vert a_t \vert +\vert b_t \vert \le K \rho ^t \varepsilon _t^2(\varepsilon _{t-h}^2 + 1) \sup \limits _{\vartheta \in \varDelta } \max \{ {\tilde{\zeta }}_t^2, \zeta _t^2\} \ . \end{aligned}$$

Using the inequality \((a+b)^s \le a^s + b^s\), for \(a,b \ge 0\) and \(s\in ]0,1[\), (11) and Hölder’s inequality, we have for some \(s^*\in ]0,1[\) sufficiently small

$$\begin{aligned} {\mathbb {E}}\left| \dfrac{1}{\sqrt{n}}\sum \limits _{t=1}^n \sup \limits _{\vartheta \in \varDelta }\vert a_t \vert \right| ^{s^*} \le K \dfrac{1}{n^{s^*/2}}\sum \limits _{t=1}^n \rho ^{ts^*} \underset{n\rightarrow \infty }{\longrightarrow } 0. \end{aligned}$$

We deduce that \(n^{-1/2}\sum _{t=1}^n \sup _{\vartheta \in \varDelta } \vert a_t \vert = \mathrm {o}_{\mathbb P}(1)\). We have the same convergence for \(b_t\), and for the derivatives of \(a_t\) and \(b_t\). Consequently, we obtain

$$\begin{aligned} \sqrt{n}\Vert \gamma _m - {{\tilde{\gamma }}}_m \Vert = \mathrm {o}_{\mathbb P}(1), \qquad \sup \limits _{\vartheta \in \varDelta }\left\| \dfrac{\partial \gamma _m}{\partial \vartheta } - \dfrac{\partial {{\tilde{\gamma }}}_m}{\partial \vartheta }\right\| = \mathrm {o}_{\mathbb P}(1),\text { as }n\rightarrow \infty . \end{aligned}$$
(16)

The unknown initial values have no asymptotic impact on the statistic \({{\hat{\gamma }}}_m\).

7.2.2 Asymptotic distribution of \(\sqrt{n}{{\hat{\gamma }}}_m\)

We now show that the asymptotic distribution of \(\sqrt{n}{{\hat{\gamma }}}_m\) is deduced from the joint distribution of \(\sqrt{n}\gamma _m\) and of the QMLE. Using (16) and a Taylor expansion of \(\gamma _m(\cdot )\) around \({\hat{\vartheta }}_n\) and \(\vartheta _0\), we obtain

$$\begin{aligned} \begin{aligned} \sqrt{n} {{\hat{\gamma }}}_m&= \sqrt{n}{{\tilde{\gamma }}}_m(\vartheta _0) + \dfrac{\partial {{\tilde{\gamma }}}_m(\vartheta ^*)}{\partial \vartheta } \sqrt{n}({\hat{\vartheta }}_n - \vartheta _0) \\&= \sqrt{n}\gamma _m + \dfrac{\partial \gamma _m(\vartheta ^*)}{\partial \vartheta }\sqrt{n}({\hat{\vartheta }}_n - \vartheta _0) + \mathrm {o}_{\mathbb P}(1), \end{aligned} \end{aligned}$$

for some \(\vartheta _i^*\), \(i=1,\dots ,2q+p+2\) between \({\hat{\vartheta }}_n\) and \(\vartheta _0\). In view of (14), there exists a neighborhood \({\mathcal {V}}(\vartheta _0)\) of \(\vartheta _0\) such that

$$\begin{aligned} {\mathbb {E}} \sup \limits _{\vartheta \in {\mathcal {V}}(\vartheta _0)}\left\| \dfrac{\partial ^2 s_{t-h}(\vartheta )s_t(\vartheta )}{\partial \vartheta \partial \vartheta '}\right\| < \infty . \end{aligned}$$

For a fixed \(r_h\), using these inequalities, (13) and Assumption A0 (\(\kappa _\eta <\infty \)), the almost sure convergence of \(\vartheta ^*\) to \(\vartheta _0\), a second Taylor expansion and the ergodic theorem, we obtain

$$\begin{aligned} \begin{aligned} \dfrac{\partial r_h(\vartheta ^*)}{\partial \vartheta }&=\dfrac{\partial r_h(\vartheta _0)}{\partial \vartheta } + \mathrm {o}_{\mathbb P}(1) \underset{n\rightarrow \infty }{\longrightarrow } c_h := {\mathbb {E}}\left[ s_{t-h}(\vartheta _0) \dfrac{\partial s_t(\vartheta _0)}{\partial \vartheta }\right] \\&= -{\mathbb {E}}\left[ s_{t-h}\dfrac{1}{\zeta _t^2(\vartheta _0)} \dfrac{\partial \zeta _t^2(\vartheta _0)}{\partial \vartheta }\right] \end{aligned} \end{aligned}$$

by the fact that \({\mathbb {E}}[s_t(\vartheta _0)\partial s_{t-h}(\vartheta _0)/\partial \vartheta ] = 0\). Note that \(c_h\) is the almost sure limit of the row h of the matrix \({\hat{C}}_m\). Consequently we have

$$\begin{aligned} \dfrac{\partial \gamma _m(\vartheta _0)}{\partial \vartheta } \underset{n\rightarrow \infty }{\longrightarrow } C_m := \begin{pmatrix} c_1' \\ \vdots \\ c_m' \end{pmatrix}. \end{aligned}$$
(17)

It follows that

$$\begin{aligned} \sqrt{n} {{\hat{\gamma }}}_m = \sqrt{n}\gamma _m + C_m\sqrt{n}({\hat{\vartheta }}_n - \vartheta _0) + \mathrm {o}_{\mathbb P}(1). \end{aligned}$$
(18)

Denote \(\sqrt{n}\gamma _m = n^{-1/2}\sum _{t=1}^n s_tS_{t-1:t-m}\), where \(S_{t-1:t-m} = (s_{t-1},\ldots , s_{t-m})'\). We now derive the asymptotic distribution of \(\sqrt{n}({\hat{\vartheta }}'_n - \vartheta '_0, \gamma '_m)'\). In view of (15), the central limit theorem of Billingsley (1961) applied to the martingale difference process

$$\begin{aligned} \left\{ \Upsilon _t = \left( s_t J^{-1}\dfrac{1}{\zeta _t^2(\vartheta _0)}\dfrac{\partial \zeta _t^2(\vartheta _0)}{\partial \vartheta '}, s_tS_{t-1:t-m}'\right) ' ; \sigma (\eta _u, u \le t)\right\} , \end{aligned}$$

shows that

$$\begin{aligned} \sqrt{n}({\hat{\vartheta }}'_n - \vartheta '_0, \gamma '_m)'=\dfrac{1}{\sqrt{n}}\sum \limits _{t=1}^n\Upsilon _t + \mathrm {o}_{\mathbb P}(1)\xrightarrow [n\rightarrow \infty ]{\mathrm {d}} {\mathcal {N}}\left( 0, {\mathbb {E}}[\Upsilon _t\Upsilon _t']\right) , \end{aligned}$$
(19)

where

$$\begin{aligned} {\mathbb {E}}\left[ \Upsilon _t \Upsilon _t'\right] = (\kappa _\eta -1 ) \begin{pmatrix} J^{-1} &{} -J^{-1}C_m' \\ -C_mJ^{-1} &{} (\kappa _\eta - 1)I_m \end{pmatrix}. \end{aligned}$$

Using (18) and (19) we obtain the distribution of \(\sqrt{n}{{\hat{\gamma }}}_m\).

7.2.3 Invertibility of the matrix D

We now show that D is invertible. Assumption A5 entails that the law of \(\eta _t^2\) is non-degenerate, therefore \(\kappa _\eta > 1\). Thus studying the invertibility of the matrix D amounts to studying the invertibility of \((\kappa _\eta - 1)I_m - C_m J^{-1} C_m'\). Let

$$\begin{aligned} V = S_{t-1:t-m} + C_m J^{-1}\dfrac{1}{\zeta _t^2(\vartheta _0)} \dfrac{\partial \zeta _t^2(\vartheta _0)}{\partial \vartheta }\quad \text {such that}\quad {\mathbb {E}}\left[ VV'\right] = (\kappa _\eta - 1)I_m - C_mJ^{-1}C_m'. \end{aligned}$$

If the matrix \({\mathbb {E}}\left[ VV'\right] \) is singular, then there exists a nonzero vector \(\lambda = (\lambda _1,\ldots , \lambda _m)'\) such that

$$\begin{aligned} \lambda ' V = \lambda ' S_{t-1:t-m} + \lambda 'C_mJ^{-1} \left( \dfrac{1}{\zeta _t^2(\vartheta _0)}\dfrac{\partial \zeta _t^2(\vartheta _0)}{\partial \theta } +\dfrac{1}{\zeta _t^2(\vartheta _0)}\dfrac{\partial \zeta _t^2(\vartheta _0)}{\partial \delta }\right) = 0, \quad a.s. \end{aligned}$$
(20)

since \(\vartheta =(\theta ',\delta )'\). Using the fact that

$$\begin{aligned} \dfrac{1}{\zeta _t^2(\vartheta _0)}\dfrac{\partial \zeta _t^2(\vartheta _0)}{\partial \theta }= & {} \dfrac{2}{\delta } \dfrac{1}{\zeta _t^\delta (\vartheta _0)}\dfrac{\partial \zeta _t^\delta (\vartheta _0)}{\partial \theta }\text { and } \dfrac{1}{\zeta _t^2(\vartheta _0)}\dfrac{\partial \zeta _t^2(\vartheta _0)}{\partial \delta }\\&= -\dfrac{2}{\delta ^2}\log (\zeta _t^\delta (\vartheta _0))+\dfrac{2}{\delta } \dfrac{1}{\zeta _t^\delta (\vartheta _0)}\dfrac{\partial \zeta _t^\delta (\vartheta _0)}{\partial \delta }, \end{aligned}$$

we can rewrite Equation (20) as follows

$$\begin{aligned} \lambda ' V = \lambda ' S_{t-1:t-m} + \mu ' \dfrac{1}{\zeta _t^\delta (\vartheta _0)}\left( \delta \dfrac{\partial \zeta _t^\delta (\vartheta _0)}{\partial \theta }-\zeta _t^\delta (\vartheta _0)\log (\zeta _t^\delta (\vartheta _0))+\delta \dfrac{\partial \zeta _t^\delta (\vartheta _0)}{\partial \delta }\right) = 0, \quad a.s. \end{aligned}$$
(21)

with \(\mu ' = (2/\delta ^2)\lambda 'C_mJ^{-1}\). We remark that \(\mu \ne 0\). Otherwise \(\lambda ' S_{t-1:t-m}=0\) a.s., which implies that there exists \(j\in \{1,...,m\}\) such that \(s_{t-j}\) is measurable with respect to the \(\sigma \)-field generated by \(\{s_r, \ t-m\le r\le t-1, \ r\ne t-j\}\). This is impossible because the \(s_t\)’s are independent and non-degenerate.

We denote \(\mu = (\nu _1', \nu _2)'\), where \(\nu _1 = (\mu _1,\ldots , \mu _{2q+p+1})'\) and \(\nu _2 = \mu _{2q+p+2}\), and we rewrite (21) as

$$\begin{aligned} \lambda ' V&= \lambda ' S_{t-1:t-m} + \nu _1'\delta \dfrac{1}{\zeta _t^\delta (\vartheta _0)}\dfrac{\partial \zeta _t^\delta (\vartheta _0)}{\partial \theta } \\&\quad + \nu _2 \dfrac{1}{\zeta _t^\delta (\vartheta _0)}\left( -\zeta _t^\delta (\vartheta _0)\log (\zeta _t^\delta (\vartheta _0))+\delta \dfrac{\partial \zeta _t^\delta (\vartheta _0)}{\partial \delta }\right) = 0, \quad a.s. \end{aligned}$$

or, equivalently,

$$\begin{aligned}&\lambda ' S_{t-1:t-m}\zeta _t^\delta (\vartheta _0) + \nu _1'\delta \dfrac{\partial \zeta _t^\delta (\vartheta _0)}{\partial \theta } \nonumber \\&\qquad +\, \nu _2 \left( -\zeta _t^\delta (\vartheta _0)\log (\zeta _t^\delta (\vartheta _0))+\delta \dfrac{\partial \zeta _t^\delta (\vartheta _0)}{\partial \delta }\right) = 0, \quad a.s. \end{aligned}$$
(22)

The derivatives involved in (22) are defined recursively by

$$\begin{aligned} \begin{aligned} \dfrac{\partial \zeta _t^\delta (\vartheta )}{\partial \theta }&= \underline{{c}}_t(\vartheta ) + \sum \limits _{j=1}^p \beta _j \dfrac{\partial \zeta _{t-j}^\delta (\vartheta )}{\partial \theta },\\ \dfrac{\partial \zeta _t^\delta (\vartheta )}{\partial \delta }&= \sum \limits _{i=1}^q \alpha _i^+\log (\varepsilon _{t-i}^+)(\varepsilon _{t-i}^+)^\delta + \alpha _i^-\log (-\varepsilon _{t-i}^-)(-\varepsilon _{t-i}^-)^\delta + \sum \limits _{j=1}^p \beta _j \dfrac{\partial \zeta _{t-j}^\delta (\vartheta )}{\partial \delta }, \end{aligned} \end{aligned}$$

where \(c_t(\vartheta )\) is defined by replacing \({{\tilde{\zeta }}}^\delta _t(\vartheta )\) by \(\zeta ^\delta _t(\vartheta )\) in \({{\tilde{c}}}_t(\vartheta )\) (see (5)). We recall that \(\varepsilon _t^+ = \zeta _t\eta _t^+\) and \(\varepsilon _t^- = \zeta _t\eta _t^-\), and we let \(R_t\) be a random variable measurable with respect to \(\sigma \{\eta _u, u \le t\}\), whose value may change along the proof. We decompose (22) into four terms and we have

$$\begin{aligned} \nu '_1\delta \dfrac{\partial \zeta _t^\delta (\vartheta _0)}{\partial \theta }&= \mu _2\delta \zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta + \mu _{q+2}\delta \zeta _{t-1}^\delta (-\eta _{t-1}^-)^\delta + R_{t-2}, \\ \zeta _t^\delta&= \alpha _1^+\zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta + \alpha _1^-\zeta _{t-1}^\delta (-\eta _{t-1}^-)^\delta + R_{t-2},\\ -\nu _2\zeta _t^\delta (\vartheta _0)\log (\zeta _t^\delta (\vartheta _0))&=-\nu _2\left( \alpha _1^+\zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta + \alpha _1^-\zeta _{t-1}^\delta (-\eta _{t-1}^-)^\delta + R_{t-2}\right) \\&\quad \times \log \left( \alpha _1^+\zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta + \alpha _1^-\zeta _{t-1}^\delta (-\eta _{t-1}^-)^\delta + R_{t-2}\right) \\ \lambda ' s_{t-1:t-m}&= \lambda _1 \eta _{t-1}^2 + R_{t-2}, \end{aligned}$$

that gives

$$\begin{aligned} \lambda ' s_{t-1:t-m} \zeta _t^\delta&= \lambda _1\zeta _{t-1}^\delta \left[ \alpha _1^+(\eta _{t-1}^+)^{\delta +2} + \alpha _1^-(-\eta _{t-1}^-)^{\delta + 2}\right] + \lambda _1\eta _{t-1}^2R_{t-2} + R_{t-2}\\&\quad + \left[ (\eta _{t-1}^+)^\delta + (-\eta _{t-1}^-)^\delta \right] R_{t-2}, \end{aligned}$$

and

$$\begin{aligned} \nu _2\delta \dfrac{\partial \zeta _t^\delta (\vartheta _0)}{\partial \delta }= & {} \nu _2\delta \alpha _1^+\log \left( \zeta _{t-1}(\eta _{t-1}^+)\right) \zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta \\&+ \nu _2\delta \alpha _1^-\log \left( \zeta _{t-1}(-\eta _{t-1}^-)\right) \zeta _{t-1}^\delta (-\eta _{t-1}^-)^\delta \\&+ R_{t-2}, \\= & {} \nu _2\alpha _1^+\log \left( \zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta \right) \zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta \\&+ \nu _2 \alpha _1^-\log \left( \zeta _{t-1}^\delta (-\eta _{t-1}^-)^\delta \right) \zeta _{t-1}^\delta (-\eta _{t-1}^-)^\delta + R_{t-2}. \end{aligned}$$

Following these previous expressions, (21) entails that almost surely

$$\begin{aligned} \lambda 'V&= \lambda _1\zeta _{t-1}^\delta \left[ \alpha _1^+(\eta _{t-1}^+)^{\delta +2} + \alpha _1^-(-\eta _{t-1}^-)^{\delta + 2}\right] \\&\quad + \eta _{t-1}^2R_{t-2} + \left[ R_{t-2}+\nu _2\alpha _1^+R_{t-2}\log (\zeta _{t-1}(\eta _{t-1}^+))\right] (\eta _{t-1}^+)^\delta \\&\quad + \left[ R_{t-2}+\nu _2\alpha _1^-R_{t-2}\log (\zeta _{t-1}(-\eta _{t-1}^-))\right] (-\eta _{t-1}^-)^\delta R_{t-2} \\&\quad + R_{t-2} -\nu _2\left( \alpha _1^+\zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta + \alpha _1^-\zeta _{t-1}^\delta (-\eta _{t-1}^-)^\delta + R_{t-2}\right) \\&\quad \log \left( \alpha _1^+\zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta + \alpha _1^-\zeta _{t-1}^\delta (-\eta _{t-1}^-)^\delta + R_{t-2}\right) =0, \end{aligned}$$

or, equivalently, to the two equations

$$\begin{aligned}&\lambda _1\zeta _{t-1}^\delta \alpha _1^+(\eta _{t-1}^+)^{\delta +2} -\left( \nu _2\alpha _1^+\zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta + R_{t-2}\right) \log \left( \alpha _1^+\zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta + R_{t-2}\right) \nonumber \\&\quad +\left[ R_{t-2} + \nu _2\alpha _1^+R_{t-2}\log (\zeta _{t-1}(\eta _{t-1}^+))\right] (\eta _{t-1}^+)^\delta + \eta _{t-1}^2R_{t-2} + R_{t-2} =0,\, a.s. \end{aligned}$$
(23)
$$\begin{aligned}&\quad \lambda _1\zeta _{t-1}^\delta \alpha _1^-(-\eta _{t-1}^-)^{\delta +2} -\left( \nu _2\alpha _1^-\zeta _{t-1}^\delta (-\eta _{t-1}^-)^\delta + R_{t-2}\right) \nonumber \\&\quad \log \left( \alpha _1^-\zeta _{t-1}^\delta (-\eta _{t-1}^-)^\delta + R_{t-2}\right) + \left[ R_{t-2} + \nu _2\alpha _1^-R_{t-2}\log (\zeta _{t-1}(-\eta _{t-1}^-))\right] \nonumber \\&\quad (-\eta _{t-1}^-)^\delta + \eta _{t-1}^2R_{t-2} + R_{t-2} =0,\, a.s. \end{aligned}$$
(24)

Note that an equation of the form

$$\begin{aligned} a\vert x \vert ^{\delta + 2} + [b + c(\vert x \vert ^\delta )]\log [b + c(\vert x \vert ^\delta )]+[d + e\log (\vert x \vert )] \vert x \vert ^\delta + f x^2 + g = 0 \end{aligned}$$

cannot have more than 11 positive roots or more than 11 negative roots, except if \(a=b=c=d=e=f=g=0\). By assumption A1, Eqs. (23) and (24) thus imply that \(\lambda _1(\alpha _1^+ + \alpha _1^-)=0\) and \(\nu _2(\alpha _1^+ + \alpha _1^-)=0\). If \(\lambda _1=0\) and \(\nu _2=0\) then \(\lambda 'S_{t-1:t-m} := \lambda _{2:m}'S_{t-2:t-m}\). By (22), we can write that

$$\begin{aligned}&\left[ \alpha _1^+\zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta + \alpha _1^-\zeta _{t-1}^\delta (-\eta _{t-1}^-)^\delta \right] \lambda _{2:m}' S_{t-2:t-m} \\&\quad = -\mu _2\zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta + \mu _{q+2}\zeta _{t-1}^\delta (-\eta _{t-1}^-)^\delta + R_{t-2}, \end{aligned}$$

which entails

$$\begin{aligned} \alpha _1^+\zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta \lambda _{2:m}' S_{t-2:t-m} = -\mu _2\zeta _{t-1}^\delta (\eta _{t-1}^+)^\delta + R_{t-2} \end{aligned}$$

and a similar expression with \((-\eta _{t-1}^-)^\delta \) can be obtained. Subtracting the conditional expectation with respect to \({\mathcal {F}}_{t-2}=\sigma \{\eta _r^+, \eta _r^- \ ;\ r\le t-2\}\) on both sides of the previous equation, we obtain

$$\begin{aligned} \begin{aligned}&\alpha _1^+ \zeta _{t-1}^\delta \lambda _{2:m}'S_{t-2:t-m} \left[ (\eta _{t-1}^+)^\delta - {\mathbb {E}}[(\eta _{t-1}^+)^\delta \vert {\mathcal {F}}_{t-2}]\right] \\&\quad = \mu _2\zeta _{t-1}^\delta \left[ {\mathbb {E}}[(\eta _{t-1}^+)^\delta \vert {\mathcal {F}}_{t-2}] - (\eta _{t-1}^+)^\delta \right] ,\quad a.s.\\&\alpha _1^+ \zeta _{t-1}^\delta \lambda _{2:m}'S_{t-2:t-m} \left[ (\eta _{t-1}^+)^\delta - {\mathbb {E}}[(\eta _{t-1}^+)^\delta ]\right] \\&\quad = \mu _2\zeta _{t-1}^\delta \left[ {\mathbb {E}}[(\eta _{t-1}^+)^\delta ] - (\eta _{t-1}^+)^\delta \right] ,\quad a.s.. \end{aligned} \end{aligned}$$

Since the law of \(\eta _t\) is non-degenerate, we have \(\alpha _1^+ = \mu _2 = 0\) and symmetrically \(\alpha _1^- = \mu _{q+2} = 0\). But for APGARCH(p, 1) models, it is impossible to have \(\alpha _1^+ = \alpha _1^- = 0\) by Assumption A4. The invertibility of D is thus shown in this case. For APGARCH(p, q) models, by iterating the previous arguments, we can show by induction that (21) entails \(\alpha _1^+ + \alpha _1^- = \ldots = \alpha _q^+ + \alpha _q^-=0\). Thus \(\lambda _1=\dots =\lambda _m=0\), which leads to a contradiction. The non-singularity of the matrix D follows. \(\square \)

7.3 Proof of Theorem 2

The almost sure convergence of \({\hat{D}}\) to D as n goes to infinity is easy to show using the consistency result. The matrix D can be rewritten as \(D = (\kappa _\eta - {\hat{\kappa }}_\eta )B + ({\hat{\kappa }}_\eta -1)A,\) where the matrices A and B are given by

$$\begin{aligned} A&= (C_m - {\hat{C}}_m)J^{-1}C_m' + {\hat{C}}_m(J^{-1} - {\hat{J}}^{-1})C_m' + {\hat{C}}_m{\hat{J}}^{-1}(C_m' - {\hat{C}}_m') + {\hat{A}},\\ B&= (A-{\hat{A}}) + (\kappa _\eta - {\hat{\kappa }}_\eta )I_m + {\hat{B}}, \end{aligned}$$

with \({\hat{A}} = {\hat{C}}_m{\hat{J}}^{-1}{\hat{C}}_m'\) and \({\hat{B}}=({\hat{\kappa }}_\eta -1)I_m - {\hat{A}}\). Finally, we have

$$\begin{aligned} D-{\hat{D}} = (\kappa _\eta - {\hat{\kappa }}_\eta )B + ({\hat{\kappa }}_\eta - 1)\left[ (A-{\hat{A}}) + (\kappa _\eta - {\hat{\kappa }}_\eta )I_m\right] . \end{aligned}$$

For any multiplicative norm, we have

$$\begin{aligned} \Vert D-{\hat{D}}\Vert \le \vert \kappa _\eta - {\hat{\kappa }}_\eta \vert \Vert B \Vert + \vert {\hat{\kappa }}_\eta - 1 \vert \left[ \Vert A - {\hat{A}}\Vert + \vert \kappa _\eta - {\hat{\kappa }}_\eta \vert m \right] \end{aligned}$$

and

$$\begin{aligned}&\Vert A - {\hat{A}} \Vert \le \Vert C_m - {\hat{C}}_m \Vert \Vert J^{-1} \Vert \Vert C_m' \Vert + \Vert {\hat{C}}_m\Vert \Vert J^{-1} \Vert \Vert {\hat{J}} - J \Vert \Vert {\hat{J}}^{-1} \Vert \Vert C_m' \Vert \\&\quad + \Vert C_m \Vert \Vert {\hat{J}}^{-1}\Vert \Vert C_m' - {\hat{C}}_m'\Vert . \end{aligned}$$

In view of (13), we have \(\Vert C_m \Vert <\infty \). Because the matrix J is nonsingular, we have \(\Vert J^{-1} \Vert <\infty \) and

$$\begin{aligned} \Vert {\hat{J}}^{-1}-J^{-1} \Vert \underset{n\rightarrow \infty }{\longrightarrow } 0,\quad a.s. \end{aligned}$$

by consistency of \({\hat{\vartheta _n}}\). Under Assumption A5, we have \(\vert \kappa _\eta - 1\vert \le K\). Using the previous arguments and also the strong consistency of \({\hat{\vartheta _n}}\), we have

$$\begin{aligned} \vert \kappa _\eta - {\hat{\kappa }}_\eta \vert \underset{n\rightarrow \infty }{\longrightarrow } 0,\quad a.s.\text { and } \Vert C_m - {\hat{C}}_m \Vert \underset{n\rightarrow \infty }{\longrightarrow } 0,\quad a.s. \end{aligned}$$

We then deduce that \(\Vert B \Vert \le K\) and the conclusion follows. Thus \({\hat{D}} \underset{n\rightarrow \infty }{\longrightarrow } D\) almost surely.

To conclude the proof of Theorem 2, it suffices to use Theorem 1 and the following result: if \(\sqrt{n} {{\hat{\gamma }}}_m \xrightarrow [n\rightarrow \infty ]{\mathrm {d}} {\mathcal {N}}\left( 0,D\right) \), with D nonsingular, and if \({\hat{D}}\underset{n\rightarrow \infty }{\longrightarrow } D\) in probability, then \( n {{\hat{\gamma }}}_m'{\hat{D}}^{-1}{{\hat{\gamma }}}_m \xrightarrow [n\rightarrow \infty ]{\mathrm {d}} \chi ^2_m.\) \(\square \)

Table 16 Portmanteau test \(p-\)values for adequacy of the APGARCH(pq) models for daily returns of exchange rates, based on m squared residuals autocovariances

7.4 Proof of Remark 1

We suppose that \(\mathrm {\mathbf {H_1}}\) holds true. One may rewrite the above arguments in order to prove that there exists a nonsingular matrix \(D^*\) such that

$$\begin{aligned} \sqrt{n} ( {{\hat{\gamma }}}_m - \gamma ^0_m ) \xrightarrow [n\rightarrow \infty ]{\mathrm {d}} {\mathcal {N}}\left( 0,D^*\right) \ . \end{aligned}$$
(25)

The matrix \(D^*\) is given by \({D^*}= \Sigma _{\gamma ^0_m} + C_m^*({\kappa }_\eta - 1)J^{-1}{C_m^*}' + C_m^*\Sigma _{{\hat{\vartheta }}_n,\gamma ^0_m} + \Sigma _{{\hat{\vartheta }}_n,\gamma ^0_m}'{C_m^*}'\), where the matrices \(\Sigma _{\gamma ^0_m}\) and \(\Sigma _{{\hat{\vartheta }}_n,\gamma ^0_m}\) are obtained from the asymptotic distribution of

$$\begin{aligned} \dfrac{1}{\sqrt{n}} \sum \limits _{t=1}^n \Upsilon ^*_t&:=\dfrac{1}{\sqrt{n}} \sum \limits _{t=1}^n \left( s_t J^{-1}\dfrac{1}{\zeta _t^2(\vartheta _0)}\dfrac{\partial \zeta _t^2(\vartheta _0)}{\partial \vartheta '}, s_tS_{t-1:t-m}'-{\gamma ^0_m}'\right) '\\&\xrightarrow [n\rightarrow \infty ]{\mathrm {d}} {\mathcal {N}}(0, {\mathbb {E}}\left[ \Upsilon ^*_t {\Upsilon ^*_t}'\right] ), \end{aligned}$$

with

$$\begin{aligned} {\mathbb {E}}\left[ \Upsilon ^*_t {\Upsilon ^*_t}'\right] =: \begin{pmatrix} ({\kappa }_\eta - 1)J^{-1} &{} \Sigma _{{\hat{\vartheta }}_n,\gamma ^0_m} \\ \Sigma _{{\hat{\vartheta }}_n,\gamma ^0_m}' &{} \Sigma _{\gamma ^0_m} \end{pmatrix} . \end{aligned}$$

For \(h=1,\dots ,m\) the row h of the matrix \(C^*_m\) is given by

$$\begin{aligned} c^*_h&:= {\mathbb {E}}\left[ s_{t-h}(\vartheta _0) \dfrac{\partial s_t(\vartheta _0)}{\partial \vartheta }+s_{t}(\vartheta _0) \dfrac{\partial s_{t-h}(\vartheta _0)}{\partial \vartheta }\right] \\ {}&= -{\mathbb {E}}\left[ s_{t-h}\dfrac{1}{\zeta _t^2(\vartheta _0)} \dfrac{\partial \zeta _t^2(\vartheta _0)}{\partial \vartheta }+s_{t}\dfrac{1}{\zeta _{t-h}^2(\vartheta _0)} \dfrac{\partial \zeta _{t-h}^2(\vartheta _0)}{\partial \vartheta }\right] . \end{aligned}$$

Consequently we have

$$\begin{aligned} \dfrac{\partial \gamma _m(\vartheta _0)}{\partial \vartheta } \underset{n\rightarrow \infty }{\longrightarrow } C^*_m := \begin{pmatrix} {c^*_1}' \\ \vdots \\ {c^*_m}' \end{pmatrix}. \end{aligned}$$

Now we write

$$\begin{aligned}\sqrt{n} {{{\hat{D}}}}^{-1/2}{\hat{\gamma _m}}&= {{{\hat{D}}}}^{-1/2} \sqrt{n} ( {\hat{\gamma _m}} - \gamma ^0_m ) + {{{\hat{D}}}}^{-1/2} \sqrt{n} \gamma ^0_m \\&= D^{-1/2} \sqrt{n} ( {\hat{\gamma _m}} - \gamma ^0_m ) + D^{-1/2} \sqrt{n} \gamma ^0_m + \mathrm {o}_{\mathbb P} (1) \ . \end{aligned}$$

Then it holds that

$$\begin{aligned} n{{\hat{\gamma }}}_m'{\hat{D}}^{-1}{{\hat{\gamma }}}_m&= \big ( \sqrt{n} {{{\hat{D}}}}^{-1/2}{\hat{\gamma _m}} \big ) ' \times \big ( \sqrt{n} {{{\hat{D}}}}^{-1/2}{\hat{\gamma _m}} \big ) \nonumber \\&= n ( {\hat{\gamma _m}} - \gamma ^0_m )' D^{-1} ( {\hat{\gamma _m}} - \gamma ^0_m ) + 2 n ( {\hat{\gamma _m}} - \gamma ^0_m )' D^{-1} \gamma ^0_m + n { \gamma ^0_m}'D^{-1} \gamma ^0_m + \mathrm {o}_{\mathbb P} (1) \end{aligned}$$
(26)

By the ergodic theorem, \( ( {\hat{\gamma _m}} - \gamma ^0_m )' D^{-1} \gamma ^0_m = \mathrm {o}_{\mathbb P} (1)\). By van der Vaart (1998, Lemma 17.1), the convergence (25) implies that

$$\begin{aligned} ( {\hat{\gamma _m}} - \gamma ^0_m )' D^{-1} ( {\hat{\gamma _m}} - \gamma ^0_m ) \xrightarrow [n\rightarrow \infty ]{\mathrm {d}} \sum _{i=1}^m \lambda _i Z_i^2 \end{aligned}$$

where \((Z_i)_{1\le i\le m}\) are i.i.d. with \({\mathcal {N}}(0,1)\) laws and the \(\lambda _i\)’s are the eigenvalues of the matrix \(D^{-1/2}D^*D^{-1/2}\). Reporting these convergences in (26), we deduce that

$$\begin{aligned} {{\hat{\gamma }}}_m'{\hat{D}}^{-1}{{\hat{\gamma }}}_m&= ( {\hat{\gamma _m}} - \gamma ^0_m )' D^{-1} ( {\hat{\gamma _m}} - \gamma ^0_m ) + 2 ( {\hat{\gamma _m}} - \gamma ^0_m )' D^{-1} \gamma ^0_m + { \gamma ^0_m}'D^{-1} \gamma ^0_m + \mathrm {o}_{\mathbb P} (1) \\&= { \gamma ^0_m}'D^{-1} \gamma ^0_m + \mathrm {o}_{\mathbb P} (1) \end{aligned}$$

and the remark is proved. \(\square \)