1 Introduction

Since Engle (1982) initiated the literature on autoregressive conditional heteroskedasticity (ARCH) and the model proved very useful in empirical applications, an immense amount of research has been directed towards extending Engle’s original ideas, both empirically and theoretically (see Dias-Curto et al. (2009) for an example of the usefulness of GARCH models in practice). Chan, Karolyi, Longstaff and Sanders (1992, CKLS hereafter) proposed introducing the lagged level of the spot interest rate in the conditional variance equation, generating the so-called level-effect ARCH model. This model has subsequently been successfully used and extended by Brenner et al. (1996), Andersen and Lund (1997) and Ball and Torous (1999), among others. More recently, Maheu and Yang (2016) used Bayesian methods and financial data to estimate both the CKLS model and the CKLS model with ARCH disturbances, while Bu et al. (2017) estimated the CKLS model in two different regimes, also using financial data. However, despite its empirical success, the asymptotic behavior of the quasi-maximum likelihood (QML) estimator associated with the level-effect ARCH model has, to the best of our knowledge, not yet been formally established. Most papers on conditionally heteroskedastic time series (see, e.g., Berkes and Horváth (2004), Straumann and Mikosch (2006), Hamadeh and Zakoïan (2011) and Francq et al. (2018)) do not allow the level of a series, such as the interest rate, to enter the conditional variance equation. The double autoregressive model of Ling (2004) is an exception. Triffi (2006) illustrates the convergence results for the Constant Elasticity of Variance (CEV)-ARCH model of Fornari and Mele (2006), but, to the best of our knowledge, the asymptotic normality and consistency of the QML/ML estimator for the level-effect ARCH model are still unknown.
In the following sections we present the model and provide a simple proof of asymptotic normality and consistency of the ML estimator within the traditional level-effect ARCH(1) setting. A simulation section then confirms our theoretical results, and a final section concludes.

2 The level-effect ARCH model

Consider the discrete-time approximation of the CKLS model

$$\begin{aligned} y_{t}^{*}= & {} \Delta y_{t}-\left( a+by_{t-1}\right) =\sigma _{t}\left| y_{t-1}\right| ^{\gamma }z_{t}, \end{aligned}$$
(1)
$$\begin{aligned} \sigma _{t}^{2}= & {} w+\alpha \left( \frac{y_{t-1}^{*}}{\left| y_{t-2}\right| ^{\gamma }}\right) ^{2}, \end{aligned}$$
(2)

for \(t=1,\ldots ,T\), which we refer to as the level-effect ARCH model. Let us denote the parameter vector of interest by \(\theta =\left( \gamma ,w,\alpha \right) ^{\prime }\) and let the true parameter values be given by \(\theta _{0}=\left( \gamma _{0},w_{0},\alpha _{0}\right) ^{\prime }\). In many applications, \( y_{t}^{*}\) is chosen as a transformation of current and lagged values of \(y_{t},\) such as \(y_{t}^{*}=y_{t}^{*}\left( y_{t},y_{t-1},y_{t-2},\ldots ;\delta \right) ,\) where \(\delta \) is a vector of parameters. One such specification is \(y_{t}^{*}=\Delta y_{t}-\left( a+by_{t-1}\right) ,\) where \(\delta =\left( a,b\right) ^{\prime },\) as considered in Andersen and Lund (1997, p. 354) and in Broze et al. (1995, Eq. 2). Note, however, that model (1), (2) is not exactly that of Andersen and Lund (1997), as they use \(\Delta y_{t}=\left( a+by_{t-1}\right) +\sigma _{t}y_{t-1}^{\gamma }z_{t}\) instead of \(\Delta y_{t}=\left( a+by_{t-1}\right) +\sigma _{t}\left| y_{t-1}\right| ^{\gamma }z_{t}.\) The absolute value is needed because the discrete-time model does not ensure that \(y_{t}\) is non-negative, and \(y_{t-1}^{\gamma }\) need not be well defined for negative \(y_{t-1}.\) In practice, \(\delta \) can be pre-estimated in a first stage. This pre-estimation approach is very common in empirical research, particularly when modelling spot interest rates; see, for example, Ball and Torous (1999, p. 2349). To avoid additional complexity in the proofs, we assume throughout this paper that \(\delta \) is known; see also Remark 2. It should be noted that (1)–(2) generalizes the model of Frydman (1994), who considers a discrete-time process with \(\alpha _{0}=0.\) Broze, Scaillet and Zakoïan (1995, Eq. 2, p. 202) also analyze the regular level-effect model without the ARCH component, and they likewise introduce \(\left| y_{t-1}\right| ^{\gamma }z_{t}\) in (1).
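The recursion in (1), (2) can be made concrete with a short simulation. The following is a minimal sketch in standard-library Python, not the code used for the simulations reported later. The function name `simulate_level_arch`, the start value `y0`, and the choice to initialise the lagged scaled residual at zero are assumptions made here for illustration.

```python
import math
import random


def simulate_level_arch(T, gamma, w, alpha, a=0.0, b=-1.0, y0=1.0, seed=0):
    """Simulate y_1, ..., y_T from the level-effect ARCH(1) model (1)-(2).

    y*_t      = dy_t - (a + b*y_{t-1}) = sigma_t * |y_{t-1}|**gamma * z_t
    sigma_t^2 = w + alpha * (y*_{t-1} / |y_{t-2}|**gamma)**2
    """
    rng = random.Random(seed)
    y_prev = y0          # y_{t-1}; the start value is an assumption
    scaled_prev = 0.0    # y*_{t-1} / |y_{t-2}|**gamma, initialised at zero (assumption)
    path = []
    for _ in range(T):
        sigma2 = w + alpha * scaled_prev ** 2
        z = rng.gauss(0.0, 1.0)                    # Gaussian innovation, as in A1
        scaled = math.sqrt(sigma2) * z             # y*_t / |y_{t-1}|**gamma
        y_star = scaled * abs(y_prev) ** gamma     # y*_t
        y = y_prev + a + b * y_prev + y_star       # dy_t = a + b*y_{t-1} + y*_t
        path.append(y)
        y_prev, scaled_prev = y, scaled
    return path
```

With \(\delta =\left( 0,-1\right) ^{\prime }\), the default here, the mean equation drops out and \(y_{t}=y_{t}^{*}\). A burn-in period could be discarded to reduce the influence of the start values.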
Moreover, when \(\gamma _{0}\) is known, (1), (2) is the standard linear ARCH model, for which a complete characterization of the estimation theory has been developed by Jensen and Rahbek (2004a, b) and Kristensen and Rahbek (2005, 2008). The “quasi”-log likelihood function (conditional on past values of \(y_{t}\)) associated with (1), (2) is given by

$$\begin{aligned} L_{T}\left( \theta \right)= & {} \sum _{t=1}^{T}l_{t}\left( \theta \right) =-\frac{ 1}{2}\sum _{t=1}^{T}\ln \left[ \left| y_{t-1}\right| ^{2\gamma }\left( w+\alpha \left( \frac{ y_{t-1}^{*}}{\left| y_{t-2}\right| ^{\gamma }}\right) ^{2}\right) \right] \nonumber \\&-\frac{1}{2}\sum _{t=1}^{T}\frac{\left( y_{t}^{*}\right) ^{2}}{\left| y_{t-1}\right| ^{2\gamma }\left( w+\alpha \left( \frac{y_{t-1}^{*} }{\left| y_{t-2}\right| ^{\gamma }}\right) ^{2}\right) }. \end{aligned}$$
(3)
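To illustrate how (3) is evaluated on data, the sketch below computes the quasi-log likelihood for a given \(\theta \) and a known \(\delta \). It is a hedged illustration in standard-library Python. The function name `quasi_loglik` and the convention that the first two observations serve as initial values are assumptions made here, and the level enters through \(\left| y_{t-1}\right| \), consistent with (1).

```python
import math


def quasi_loglik(y, theta, delta=(0.0, -1.0)):
    """Evaluate L_T(theta) in (3) for a list y = [y_0, y_1, ..., y_n].

    The first two observations are used as initial values, so the sum runs
    over t = 2, ..., n. Constants that do not depend on theta are omitted.
    """
    gamma, w, alpha = theta
    a, b = delta
    # y_star[k] holds y*_{k+1} = dy_{k+1} - (a + b*y_k)
    y_star = [y[t] - y[t - 1] - (a + b * y[t - 1]) for t in range(1, len(y))]
    total = 0.0
    for t in range(2, len(y)):
        scaled_lag = y_star[t - 2] / abs(y[t - 2]) ** gamma     # y*_{t-1}/|y_{t-2}|^gamma
        h = abs(y[t - 1]) ** (2 * gamma) * (w + alpha * scaled_lag ** 2)
        total += -0.5 * math.log(h) - 0.5 * y_star[t - 1] ** 2 / h
    return total
```

Maximising this function over \(\theta \) with any numerical optimiser gives the estimator studied below. The choice of optimiser is left open here.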

We proceed under the following set of maintained assumptions:

Assumption A

A1 :

\(z_{t}\sim N.i.i.d.\left( 0,1\right) ,\)

A2 :

\(\infty>w_{0}>0,\) \(\infty>\alpha _{0}>0,\)

A3 :

\(E\left[ \left( \frac{y_{t-1}^{*}}{\left| y_{t-2}\right| ^{i}}\right) ^{\varphi }\left| \ln \left( \left| y_{t-1}\right| \right) \right| ^{3}\right] <\infty ,\) \(E\left[ \left( \frac{y_{t-1}^{*}}{\left| y_{t-2}\right| ^{i}}\right) ^{\varphi }\left( \ln \left( \left| y_{t-1}\right| \right) \right) ^{2}\left| \ln \left( \left| y_{t-2}\right| \right) \right| \right] <\infty ,\) \(E\left[ \left( \frac{y_{t-1}^{*}}{\left| y_{t-2}\right| ^{i}} \right) ^{\varphi }\left( \ln \left( \left| y_{t-2}\right| \right) \right) ^{2}\left| \ln \left( \left| y_{t-1}\right| \right) \right| \right] <\infty ,\) for \(\varphi =0\) and for some \(\varphi >0,\) with \(i=0,1.\)

Assumption B

B1 :

\(E\ln \left( \alpha _{0}z_{t}^{2}\right) <0\),

B2 :

\(1\ge \gamma _{0}\ge 0,\) \(\left| b+1\right| <1,\) \(E\ln \left| b+1+\alpha _{0}z_{t}^{2}\right| <0.\)

Assumptions A1, A2 and B1 are very common in the traditional ARCH literature (see, e.g., Jensen and Rahbek (2004a, b), p. 1205). In A1 we need to impose Gaussianity because our proofs make use of the results in Broze et al. (1995). A1 also implies that \(E\left( \left( 1-z_{t}^{2}\right) ^{2}\right) =\zeta =2.\) Note also that A3, an assumption very specific to the level-effect ARCH model, explains the need to introduce \(\left| y_{t-1}\right| ^{\gamma }\) in (1), as in Broze et al. (1995, Eq. 2). In B2, we require \(1\ge \gamma _{0}\ge 0\) (see Broze et al. (1995, Proposition 3) for more details). The conditions for stationarity of \( \sigma _{t}^{2},\) \(\left( y_{t}^{*}/\left| y_{t-1}\right| ^{\gamma }\right) ,\) \( y_{t}^{*}\) and \(y_{t}\) are given by the following two lemmas:

Lemma 1

Let Assumption A hold. A necessary and sufficient condition for strict stationarity of \(\sigma _{t}^{2}\) and \(\left( y_{t}^{*}/\left| y_{t-1}\right| ^{\gamma _{0}}\right) \) as generated by (1), (2) is given by

$$\begin{aligned} E\ln \left( \alpha _{0}z_{t}^{2}\right) <0. \end{aligned}$$

Proof of Lemma 1

Lemma 1 is not a new result in the literature: if \(\left( y_{t}^{*}/\left| y_{t-1}\right| ^{\gamma _{0}}\right) \) is known, the level-effect ARCH model reduces to the ARCH(1) model, for which the condition for strict stationarity is well known (see, e.g., Jensen and Rahbek 2004a). \(\square \)
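Under the Gaussianity in A1, the condition of Lemma 1 has a closed form, which may help in choosing parameter values. For \(z_{t}\sim N\left( 0,1\right) \), \(E\ln z_{t}^{2}=-\left( C+\ln 2\right) \) with \(C\) the Euler-Mascheroni constant, so the condition amounts to \(\alpha _{0}<2e^{C}\approx 3.562\). The sketch below encodes this; the function and constant names are chosen here for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def arch1_log_moment(alpha0):
    """E ln(alpha0 * z^2) for z ~ N(0,1), i.e. ln(alpha0) - (EULER_GAMMA + ln 2)."""
    return math.log(alpha0) - (EULER_GAMMA + math.log(2.0))


def is_strictly_stationary(alpha0):
    """True when the condition of Lemma 1 (Assumption B1) holds."""
    return arch1_log_moment(alpha0) < 0.0


# Largest alpha0 compatible with strict stationarity under Gaussian z_t.
ALPHA_UPPER_BOUND = 2.0 * math.exp(EULER_GAMMA)
```

In particular, values of \(\alpha _{0}\) above one can still satisfy B1, whereas values above roughly 3.562 violate it.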

Lemma 2

Let Assumptions A and B1 hold. A necessary and sufficient condition for ergodicity and second order stationarity of \(y_{t}^{*}\) and \(y_{t}\) as generated by (1), (2) is given by

$$\begin{aligned} 1>\gamma _{0}\ge 0,\left| b+1\right|<1,E\ln \left| b+1+\alpha _{0}z_{t}^{2}\right| <0. \end{aligned}$$

When \(\gamma _{0}=1,\) the previous condition is a sufficient condition for second order stationarity and ergodicity.
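For a general \(b\), the expectation \(E\ln \left| b+1+\alpha _{0}z_{t}^{2}\right| \) in this condition can be approximated by simulation. The sketch below is a simple Monte Carlo check in standard-library Python, offered as an illustration only. The function names, the number of draws and the seed are assumptions made here, and the result is subject to simulation error near the boundary.

```python
import math
import random


def b2_log_moment(b, alpha0, n_draws=20000, seed=0):
    """Monte Carlo estimate of E ln|b + 1 + alpha0 * z^2| for z ~ N(0,1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        total += math.log(abs(b + 1.0 + alpha0 * z * z))
    return total / n_draws


def satisfies_b2(gamma0, b, alpha0):
    """Check the three parts of Assumption B2; the last one is checked by simulation."""
    return (0.0 <= gamma0 <= 1.0
            and abs(b + 1.0) < 1.0
            and b2_log_moment(b, alpha0) < 0.0)
```

For \(b=-1\) the third part coincides with the condition of Lemma 1.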

Proof of Lemma 2

The proof follows that of Proposition 3 of Broze et al. (1995, Appendix D), with \(\sigma _{0,h}\) in Broze et al. (1995) replaced by the strict stationarity condition for \(\sigma _{t}^{2}\) established in Lemma 1, and with the Gaussianity of \(z_{t}\) used in the definition of the transition density. \(\square \)

Next, we establish the main result of the paper, the limiting distribution of the ML estimator in the level-effect ARCH model.

Theorem 1

Define \(u_{1t}\left( \theta _{0}\right) =\left( \ln \left| y_{t-1}\right| -w_{0}\left( \frac{1}{w_{0}}-\frac{1}{\sigma _{t}^{2}\left( \theta _{0}\right) }\right) \ln \left| y_{t-2}\right| \right) \), \(u_{2t}\left( \theta _{0}\right) =\left( \frac{1}{\sigma _{t}^{2}\left( \theta _{0}\right) }\right) \) and \(u_{3t}\left( \theta _{0}\right) =\left( \frac{w_{0}}{\alpha _{0}}\right) \left( \frac{1}{w_{0}}- \frac{1}{\sigma _{t}^{2}\left( \theta _{0}\right) }\right) \). Let Assumptions A and B hold, let \(\theta _{0}\) be an interior point of the parameter space, and assume that \(\delta =\left( a,b\right) ^{\prime }\) is known. Consider the log likelihood function given by (3). Then there exists a fixed open neighborhood \(U=U\left( \theta _{0}\right) \) of \(\theta _{0}\) such that, with probability tending to one as \( T\longrightarrow \infty ,\) \(L_{T}\left( \theta \right) \) has a unique maximum point \({\widehat{\theta }}\) in U. In addition, the ML estimator \( {\widehat{\theta }}\) is consistent and asymptotically normal,

$$\begin{aligned} \sqrt{T}\left[ {\widehat{\theta }}-\theta _{0}\right] ^{\prime }\overset{d}{\longrightarrow }N\left( 0,\left( \zeta ^{2}/4\right) \Lambda ^{-1}\right) , \end{aligned}$$

where

$$\begin{aligned} \Lambda =\zeta \left[ \begin{array}{ccc} {\overline{m}}_{11} &{} \frac{1}{2}{\overline{m}}_{12} &{} \frac{1}{2}{\overline{m}} _{13} \\ \frac{1}{2}{\overline{m}}_{12} &{} \frac{1}{4}{\overline{m}}_{22} &{} \frac{1}{4} {\overline{m}}_{23} \\ \frac{1}{2}{\overline{m}}_{13} &{} \frac{1}{4}{\overline{m}}_{23} &{} \frac{1}{4} {\overline{m}}_{33} \end{array} \right] >0, \end{aligned}$$

\(\zeta =2\) and \({\overline{m}}_{ij}=E\left( u_{it}\left( \theta _{0}\right) u_{jt}\left( \theta _{0}\right) \right) \) for \(i=1,2,3\) and \(j=1,2,3\).
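As a reading aid, the structure of \(\Lambda \) can be related to the score of (3). The following is a sketch derived from (1)–(3) and the definitions in Theorem 1, not a reproduction of the Appendix proof; \(h_{t}\left( \theta \right) \) is notation introduced here for the conditional variance of \(y_{t}^{*}\).

$$\begin{aligned} h_{t}\left( \theta \right)&=\left| y_{t-1}\right| ^{2\gamma }\left( w+\alpha \left( \frac{y_{t-1}^{*}}{\left| y_{t-2}\right| ^{\gamma }}\right) ^{2}\right) ,\qquad l_{t}\left( \theta \right) =-\frac{1}{2}\ln h_{t}\left( \theta \right) -\frac{1}{2}\frac{\left( y_{t}^{*}\right) ^{2}}{h_{t}\left( \theta \right) }, \\ \frac{\partial l_{t}\left( \theta _{0}\right) }{\partial \theta }&=-\frac{1}{2}\frac{\partial \ln h_{t}\left( \theta _{0}\right) }{\partial \theta }\left( 1-z_{t}^{2}\right) ,\qquad \frac{\partial \ln h_{t}\left( \theta _{0}\right) }{\partial \theta }=\left( 2u_{1t}\left( \theta _{0}\right) ,u_{2t}\left( \theta _{0}\right) ,u_{3t}\left( \theta _{0}\right) \right) ^{\prime }. \end{aligned}$$

The score at \(\theta _{0}\) is therefore \(-\left( u_{1t},\frac{1}{2}u_{2t},\frac{1}{2}u_{3t}\right) ^{\prime }\left( 1-z_{t}^{2}\right) \). Since \(z_{t}\) is independent of the past and \(E\left( \left( 1-z_{t}^{2}\right) ^{2}\right) =\zeta \), its variance equals \(\Lambda \), which accounts for the weights \(1\), \(\frac{1}{2}\) and \(\frac{1}{4}\) in the matrix above.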

Proof of Theorem 1

The proof of Theorem 1 is given in the Appendix. \(\square \)

Importantly, Theorem 1 applies to the MLE of the stationary level-effect ARCH(1) process. However, if \(\gamma _{0}\) is known and Assumption A holds but \( E\ln \left( \alpha _{0}z_{t}^{2}\right) >0,\) then the asymptotics for the ML estimator of \(\alpha \) can still be established. To see this, notice that in this case the model (1), (2) can be rewritten as

$$\begin{aligned} {\widetilde{y}}_{t}= & {} \sigma _{t}z_{t} \\ \sigma _{t}^{2}= & {} \omega +\alpha {\widetilde{y}}_{t-1}^{2} \end{aligned}$$

for \({\widetilde{y}}_{t}\equiv \frac{y_{t}^{*}}{\left| y_{t-1}\right| ^{\gamma _{0}}}.\) This representation of the process \( {\widetilde{y}}_{t}\) is identical to the model given by equation (1) in Jensen and Rahbek (2004a). Consequently, when \(E\ln \left( \alpha _{0}z_{t-1}^{2}\right) >0\) (Assumption B1 fails), \({\widetilde{y}}_{t}^{2} \overset{a.s.}{\longrightarrow }\infty \) from Lemma 1, as \(\sigma _{t}^{2} \overset{a.s.}{\longrightarrow }\infty \) (see also Klüppelberg et al. (2004)), and the asymptotic results follow directly from Jensen and Rahbek (2004a, Lemmas 1–5). The case \(E\ln \left( \alpha _{0}z_{t-1}^{2}\right) =0\) implies, as shown in Klüppelberg et al. (2004) for an ARCH(1) process (under suitable conditions), that \(\sigma _{t}^{2}\overset{p}{\longrightarrow }\infty ,\) and therefore different arguments are required in this case (see Pedersen and Rahbek (2016)). Three remarks should be added:

Remark 1

It is well known that the stationary level-effect ARCH model can be estimated by non-parametric techniques, since the variance function is smooth and depends only on \(y_{t-1}.\) In the nonstationary case, Han and Zhang (2009) consider ARCH models by applying the results of Wang and Phillips (2009a, b), although they do not allow for a level effect.

Remark 2

In model (1), (2), we assume that \( \delta =\left( a,b\right) ^{\prime }\) is known. In practice, \(\delta \) can be pre-estimated in a first stage. If \(\delta \) and \(\theta \) are estimated jointly in the mean and variance equations (contrary to Ball and Torous (1999)), then our proof will need to be extended to account for the joint estimation. At this stage we do not know whether this would require stronger assumptions than those in Assumptions A and B.

Remark 3

The generalization of the asymptotic results from ARCH(1) to GARCH(1,1) can most likely be obtained in a similar fashion to the extension from Jensen and Rahbek (2004a) to Jensen and Rahbek (2004b), at the cost of added complexity in the proofs.

3 Simulations

In this section we evaluate and discuss, based on simulations, how well the asymptotic results of \({\widehat{\theta }}\) given by Theorem 1 can approximate the finite sample properties. We also report simulation results to check the consequences when some of the assumptions are violated. We set \(\delta =\left( a,b\right) ^{\prime }=\left( 0,-1\right) ^{\prime }\) in all simulations and we do not estimate it.

In Panel A of Table 1, we report point estimates, their associated biases and the root mean squared errors (RMSEs) when Assumption B1 holds, i.e., when the volatility process is stationary. Three alternative data generating processes are considered in Panel A: the first has standard Gaussian innovations, whereas the two remaining processes have fatter tails, as their innovations are standard t-distributed with 5 degrees of freedom. The results show that under all three data generating processes the MLE is relatively accurate, with small biases and small RMSEs even at small sample sizes, i.e., \(T=1000.\) As expected from Theorem 1, the biases and the RMSEs decrease for all the estimators as the sample size increases. Note that Theorem 1 establishes the asymptotic theory for the MLE under Gaussian innovations; in this simulation section we also show what happens under alternative distributions for the innovations.

In Panel B of Table 1, two data generating processes, both in violation of Assumption B1, are considered. As noted in Jensen and Rahbek (2004a, b), \( \omega \) is not identified in this case and is therefore fixed at its true population value. The results in Panel B of Table 1 clearly illustrate that the consistency of the MLE of \(\alpha \) still holds when \(\sigma _{t}^{2}\) and \(\left( y_{t}^{*}/\left| y_{t-1}\right| ^{\gamma _{0}}\right) \) are nonstationary. The main theoretical results of this paper are silent about the asymptotic properties of the estimator of \(\gamma \) when B1 is violated. This is due to the complexity of the expressions for the first-, second- and third-order derivatives of the log likelihood with respect to \(\gamma \), so that we can only analyze the case where B1 holds. However, the simulations strongly indicate that consistency of the MLE of \(\gamma \) is maintained also when Assumption B1 does not hold.
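The nonstationary case can be explored with a small experiment under a simplifying assumption: \(\gamma _{0}\) is treated as known (unlike Panel B, which also estimates \(\gamma \)), so that one works directly with \({\widetilde{y}}_{t}=y_{t}^{*}/\left| y_{t-1}\right| ^{\gamma _{0}}\), and \(\omega \) is held at its true value. The sketch below is not the code behind Table 1. The function names, the search interval and the use of a golden-section search as the optimiser are choices made here for illustration.

```python
import math
import random


def simulate_scaled_arch1(T, w, alpha, seed=0):
    """Simulate y~_t = sigma_t * z_t with sigma_t^2 = w + alpha * y~_{t-1}^2."""
    rng = random.Random(seed)
    prev, out = 0.0, []
    for _ in range(T):
        prev = math.sqrt(w + alpha * prev ** 2) * rng.gauss(0.0, 1.0)
        out.append(prev)
    return out


def neg_loglik_alpha(alpha, x, w):
    """Negative Gaussian log likelihood as a function of alpha, with w held fixed."""
    total = 0.0
    for t in range(1, len(x)):
        h = w + alpha * x[t - 1] ** 2
        total += 0.5 * (math.log(h) + x[t] ** 2 / h)
    return total


def golden_section_min(f, lo, hi, tol=1e-4):
    """Minimise f on [lo, hi], assuming f has a single minimum there."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d
        else:
            lo = c
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)


def estimate_alpha(x, w, lo=0.01, hi=50.0):
    """MLE of alpha with w fixed, searched over the assumed interval [lo, hi]."""
    return golden_section_min(lambda al: neg_loglik_alpha(al, x, w), lo, hi)
```

For example, `estimate_alpha(simulate_scaled_arch1(1000, 1.0, 4.0), 1.0)` uses \(\alpha _{0}=4\), which violates B1 under Gaussian innovations because \(\ln 4>C+\ln 2\), with \(C\) the Euler-Mascheroni constant.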

Table 1 Simulation results on the QMLE, i.e., \({\widehat{\theta }} = \left( {\widehat{\gamma }},{\widehat{\omega }},{\widehat{\alpha }} \right) \)

In Table 2, the consistency of the estimated standard errors of the MLE in finite samples is illustrated. The term SE \(\left( {\widehat{\gamma }}\right) \) denotes the “feasible” standard error of \({\widehat{\gamma }}\), computed according to the expression for the asymptotic variance-covariance matrix derived in Theorem 1 but with all population parameters replaced by sample estimates (sample analogue estimation). ASE \(\left( {\widehat{\gamma }}\right) \) denotes the asymptotic standard error of \({\widehat{\gamma }}\). It is defined similarly to SE \(\left( {\widehat{\gamma }}\right) \) but is computed from the true population parameters.
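A feasible standard error of this kind can be sketched as follows: the moments \({\overline{m}}_{ij}\) in Theorem 1 are replaced by sample averages of \(u_{it}u_{jt}\) evaluated at the supplied parameter values. This is a minimal standard-library illustration, not the authors' implementation. The function names, the use of the first two observations as initial values and the hand-written 3×3 inversion are assumptions made here.

```python
import math


def invert_3x3(m):
    """Invert a 3x3 matrix by Gauss-Jordan elimination with partial pivoting."""
    n = 3
    aug = [list(m[i]) + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def plug_in_standard_errors(y, theta, delta=(0.0, -1.0), zeta=2.0):
    """Standard errors of (gamma, w, alpha) from Theorem 1 using sample moments."""
    gamma, w, alpha = theta
    a, b = delta
    y_star = [y[t] - y[t - 1] - (a + b * y[t - 1]) for t in range(1, len(y))]
    weights = (1.0, 0.5, 0.5)   # Lambda_ij = zeta * weights[i] * weights[j] * m_ij
    sums = [[0.0] * 3 for _ in range(3)]
    n_obs = 0
    for t in range(2, len(y)):
        sigma2 = w + alpha * (y_star[t - 2] / abs(y[t - 2]) ** gamma) ** 2
        shrink = 1.0 - w / sigma2   # equals w * (1/w - 1/sigma_t^2)
        u = (
            math.log(abs(y[t - 1])) - shrink * math.log(abs(y[t - 2])),  # u_1t
            1.0 / sigma2,                                                # u_2t
            shrink / alpha,                                              # u_3t
        )
        for i in range(3):
            for j in range(3):
                sums[i][j] += u[i] * u[j]
        n_obs += 1
    lam = [[zeta * weights[i] * weights[j] * sums[i][j] / n_obs for j in range(3)]
           for i in range(3)]
    lam_inv = invert_3x3(lam)
    scale = zeta ** 2 / 4.0
    return [math.sqrt(scale * lam_inv[i][i] / n_obs) for i in range(3)]
```

Passing the estimate \({\widehat{\theta }}\) gives a feasible standard error in the sense described above. Passing \(\theta _{0}\) together with a very long simulated series would give one possible approximation to the asymptotic standard error.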

We see that for all the models presented in Table 2, there is a very close correspondence between SE \(\left( {\widehat{\gamma }}\right) \) and ASE \(\left( {\widehat{\gamma }}\right) \). These results are very encouraging, implying that the asymptotic variance-covariance matrix provides a good approximation of the parameter estimation uncertainty also in finite samples. It is also noticeable that in the cases of relatively fat-tailed t-distributed innovations (when using the misspecified Gaussian quasi-likelihood), the standard errors of the MLE increase, as expected, relative to the cases where innovations are normally distributed. However, the observed increase in parameter estimation uncertainty seems to be of only limited magnitude. Similar results hold for SE \(\left( \widehat{\omega }\right) \), SE \(\left( {\widehat{\alpha }}\right) \), ASE \(\left( {\widehat{\omega }}\right) \) and ASE \(\left( {\widehat{\alpha }}\right) \).

Table 2 Simulation results on the standard errors of the QMLE

4 Conclusion

In this paper we establish consistency and asymptotic normality of the ML estimator in the level-effect ARCH model. We also show in simulations that the asymptotic theory provides a good approximation in finite samples.