1 Introduction

The k-factor Gegenbauer autoregressive moving average (GARMA) model nests ARIMA, fractionally integrated ARMA (ARFIMA), seasonal ARFIMA, and single-factor GARMA models as special cases and may simultaneously include features of each. The k-factor GARMA(p,q) model is defined as

$$\begin{aligned} \phi (L)\prod _{i=1}^{k}\left( 1-2\eta _{i}L+L^{2}\right) ^{d_i}(x_{t} -\mu )=\theta (L)\,\varepsilon _{t}. \end{aligned}$$
(1)

Here \(|\eta _i|\le 1\), \(d_i\) are memory parameters, and \(\phi (L)\) and \(\theta (L)\) are p and q order polynomials in the lag operator L such that \(\phi \left( z\right) =0\) and \(\theta \left( z\right) =0\) have all roots outside the unit circle and no common zeros. Further, \(\left\{ \varepsilon _{t}\right\}\) is a white noise disturbance sequence.

These long-memory models are especially useful because they can capture complex but commonly observed patterns in the spectral density and autocorrelation functions (ACF) of a stochastic process using only a few parameters. Excellent recent reviews of the estimation methods for GARMA processes were provided by Dissanayake et al. (2018) and Hunt et al. (2022), who discuss difficulties with obtaining theoretical distribution results for estimators of the model parameters. Of particular note, there appears to be little existing distribution theory for maximum likelihood-based estimation methods in the time domain when \(k>1\). In this paper, we address this void by presenting a conditional sum of squares (CSS) estimator along with proposed joint asymptotic distributions for all parameters in the k-factor model. Simulation experiments generally validate the theoretical distributions. As an application, we model the trading volume of IBM equities, finding evidence of complex long-memory dynamics.

Long-memory models were popularized by Granger and Joyeux (1980), Hosking (1981) and Granger (1980, 1981) who introduced fractional differencing as a means of capturing complicated stochastic properties of data in the time and frequency domains. These models have proven especially useful by bridging the gap between infinite variance unit root processes and finite variance short memory processes. One shortcoming of the original fractionally differenced models, however, is that they are incapable of capturing long-memory processes with persistent cycles in the ACF. Gray et al. (1989), along with the correction in Gray et al. (1994), addressed this issue with the GARMA model, which was generalized by Woodward et al. (1998) to allow for multiple sources of cyclic long memory. The general model is capable of generating many complex patterns in the ACF that have previously been very difficult to capture. One particularly interesting case is a process that contains both ARFIMA(0,0) and GARMA(0,0) components, such that the ACF decays non-monotonically at a hyperbolic rate and is asymmetric about zero, such as shown in Fig. 1.

Fig. 1 ACF of a process with both an ARFIMA and a GARMA component

Due to its flexibility, the k-factor GARMA approach has proven very useful for modeling many physical, economic, and financial time series that exhibit complex long-memory features. For solar activity, Gray et al. (1989) and Chung (1996b) estimate a single-factor model for sunspots, while Maddanu and Proietti (2022) considered a model with \(k=4\), ultimately isolating a single long-memory cycle of about 11 years. Woodward et al. (1998) and Diongue and Ndongo (2016) provide evidence supporting the existence of multiple sources of long memory in atmospheric \(\hbox {CO}_2\) and river flows. In economics and finance, these methods have been used to study interest rates (Ramachandran and Beaumont 2001; Gil-Alana 2007; Asai et al. 2020), exchange rates (Smallwood and Norrbin 2006), inflation (Caporale and Gil-Alana 2011; Peiris and Asai 2016), equity prices (Lu and Guegan 2011; Caporale and Gil-Alana 2014) and unemployment (Gil-Alana 2007; Beaumont and Smallwood 2022). The possibility of many sources of long memory was illustrated recently by Leschinski and Sibbertsen (2019), who modeled California electricity load data using 14 independent long-memory components.

Despite the increasing interest in the k-factor GARMA model, a unifying estimation approach does not appear to exist. Almost all studies assume the positions of the singularities are known (for example, Caporale and Gil-Alana (2011) and Arteche (2020)), or employ two-step procedures where the Gegenbauer frequencies are typically first estimated by inspection of the periodogram (for example, Hidalgo and Soulier (2004), Lu and Guegan (2011) and Asai et al. (2020), amongst others). Only a handful of studies have attempted to simultaneously estimate all model parameters, including memory parameters and the positions of the spectral poles, known as Gegenbauer frequencies. In this context, wavelet procedures have been used by Lu and Guegan (2011), Alomari et al. (2020), and Ayache et al. (2022) and offer a promising semi-parametric alternative to estimation of spectral poles. However, these methods have only generally been used to estimate models with \(k=1\). Specifically, Alomari et al. (2020) and Ayache et al. (2022) consider time series processes having spectra encompassing the 1-factor GARMA model as a special case. Alomari et al. (2020) do establish consistency for the frequency parameter using the wavelet-based method of Bardet and Bertrand (2010) who introduced a nonparametric approach to spectral density estimation. The result was extended by Ayache et al. (2022) to establish asymptotic normality for the estimators. In the time domain, Dissanayake et al. (2016) provided distributional results using a state-space approach based on associated Gegenbauer polynomials and the Kalman filter to obtain likelihood-based estimates for the 1-factor GARMA(0,0) model with \(|\eta |<1\). Kouamé and Hili (2008, 2012) use minimum distance estimators and show consistency and asymptotic normality for estimators of differencing parameters, although specific knowledge of \(\eta _i\) is generally required.

A major difficulty in generalizing distribution theory for the full k-factor model is that the distributions of estimators of the parameters dictating the positions of the spectral poles appear to be non-standard, with rates of convergence that may differ from those of other parameters. Additionally, the relevant parameter space is closed, whereas successful attempts to establish distributional results for estimators in the time domain generally exclude the zero frequency as an admissible value (see, for example, Kouamé and Hili (2008) and Dissanayake et al. (2016)). Further, maximum likelihood-based estimators in the frequency domain typically use a discrete set of frequencies for the associated singularities. For these estimators, as argued by Giraitis et al. (2001), a full set of distributional results may not exist.

For inference in the models considered here, we are unaware of any study proposing a full set of distributional findings for any estimator. The strongest results appear to have been offered by Hidalgo (2005), who considers a semiparametric estimator of the memory parameter and position of the spectral pole for processes having spectra consistent with the GARMA process. Hidalgo (2005) rigorously establishes theoretical results for estimation of the underlying model parameters, even when the singularity occurs at the origin. For a single-factor model, Giraitis et al. (2001) establish consistency for the Whittle estimator of the Gegenbauer frequency and provide normality results for the differencing parameter. In the time domain, with a known spectral pole at the origin, Robinson (2006) establishes consistency and asymptotic normality for the CSS estimator of the parameters for a general model that includes stationary ARFIMA processes as a special case. As referenced above, for spectral poles that do not include the origin, partial results are available from Kouamé and Hili (2008) and Dissanayake et al. (2016).

With \(k=1\), promising results for the CSS method were proposed by Chung (1996a, 1996b), who attempted to establish complete distributional results for all parameters. The method of proof of Chung relied on the observation that, for the true parameter values, the expectation of the approximate likelihood function is zero. The results of Chung are seen as somewhat controversial, as there were no attempts to constrain the position of the unknown spectral pole. In fact, Chung (1996a) argues that there is a discontinuity in the distribution at the zero frequency. Perhaps more remarkably, with T denoting the sample size, Chung (1996a) argues that the associated estimate of the Gegenbauer frequency achieves a \(T^2\)-rate of convergence when the spectral pole occurs at 0 or \(\pi\), while it is otherwise \(O_p(T^{-1})\). Most importantly, as initially pointed out by Giraitis et al. (2001), Chung (1996a) was unable to provide a rigorous initial proof establishing consistency. Additionally, Beaumont and Smallwood (2022) provide extensive simulation evidence yielding some support for theoretical concerns when the position of the spectral pole occurs at the origin.

Although the results of Chung (1996a) may appear tenuous, the CSS estimator provides a feasible and relatively simple method to obtain joint estimation results for the GARMA model parameters. Additionally, the consistency proof established by Robinson (2006) for the CSS estimator likely extends to the k-factor GARMA process. Notwithstanding the concerns when the Gegenbauer frequency is 0, the simulation evidence of Beaumont and Smallwood (2022) otherwise generally supports the results of Chung (1996a). Beaumont and Smallwood (2022) also demonstrate that the CSS method generally obtains a smaller bias for estimation of the spectral pole relative to the Whittle counterpart. Diongue and Ndongo (2016) provide similar evidence, demonstrating that, compared to a Whittle-based estimator, the CSS method is relatively efficient in estimating differencing parameters for k-factor GARMA processes with infinite variance disturbances. Given these promising simulation results, it is worthwhile to consider the properties of the CSS estimator when applied to models with multiple Gegenbauer frequencies.

Here, for the k-factor GARMA model, we study the CSS estimator described by Chung and Baillie (1993) for ARFIMA models and by Chung (1996a, 1996b) for single-factor GARMA models. All parameters are simultaneously estimated, including the ARMA components. Furthermore, we propose an asymptotic distribution for all parameters in the model, where, to our knowledge, only partial results are currently available. The results show that the estimates of each Gegenbauer frequency are asymptotically independent of all other model parameters. We provide simulation evidence to help validate the results. The simulation evidence, including additional results in Beaumont and Smallwood (2022), demonstrates that the theory can typically be reliably used to provide inference for the estimated parameters. To the extent that there are concerns with testing for models with a spectral pole at the origin, we provide a simple parametric bootstrap procedure based on our estimator.

The rest of the paper is organized as follows. In the next section, we present the details of the multi-factor GARMA model. We introduce the CSS estimator and derive its properties in Sect. 3. In Sect. 4, we provide Monte Carlo evidence for the finite sample precision of the iterative CSS estimation method that we propose. In Sect. 5, we show that the weekly trading volume of IBM stocks is best modeled with a six-factor GARMA process. We summarize and draw conclusions in Sect. 6, and an appendix contains technical details.

2 k-Factor GARMA processes

The k-factor GARMA model, defined in Eq. (1), was originally discussed by Gray et al. (1989) and presented in greater detail by Woodward et al. (1998). More specifically, with \(i=1,\ldots ,k\), the \(d_i\) are memory parameters, and the \(\eta _i\) dictate the periodic features of the process. Each Gegenbauer polynomial, \((1-2\eta _i L + L^2)^{d_i}\), has a pair of complex roots with modulus one and expands to an infinite order polynomial in L. When \(k=1\), we get the single frequency GARMA model (Hosking 1981; Gray et al. 1989), and when, in addition, \(\eta =1\), the model further reduces to an ARFIMA(p, 2d, q) model (Granger and Joyeux 1980; Hosking 1981). Finally, we get an ARIMA model when \(\eta =1\) and \(d=0.5\), and an ARMA process when \(d=0\).

Assuming that each \(\eta _{i}\) is distinct, the k-factor GARMA model is stationary if for all i, \(d_i<0.5\) whenever \(|\eta _{i}|<1\), and \(d_i<0.25\) when \(|\eta _{i}|=1\). The model is invertible if \(d_i>-\, 0.5\) when \(|\eta _{i}|<1\), and \(d_i>-\,0.25\) when \(|\eta _{i}|=1\). Proofs for these results are available in Woodward et al. (1998).
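The stationarity and invertibility bounds above can be checked mechanically. The following Python sketch is illustrative only: the function name `garma_conditions`, the representation of each factor as an \((\eta _i, d_i)\) pair, and the numerical tolerance are assumptions made here, and the ARMA roots are taken to satisfy the conditions stated after Eq. (1).

```python
def garma_conditions(factors, tol=1e-12):
    """Check the stationarity and invertibility bounds of a k-factor GARMA model.

    `factors` is a list of (eta_i, d_i) pairs with distinct eta_i. The ARMA
    roots are assumed to lie outside the unit circle and are not checked.
    Returns a (stationary, invertible) tuple of booleans.
    """
    etas = [eta for eta, _ in factors]
    if any(abs(eta) > 1.0 + tol for eta in etas):
        raise ValueError("each eta_i must satisfy |eta_i| <= 1")
    if len(set(etas)) != len(etas):
        raise ValueError("the eta_i must be distinct")
    stationary, invertible = True, True
    for eta, d in factors:
        # The bound is 0.25 at |eta_i| = 1 and 0.5 when |eta_i| < 1.
        bound = 0.25 if abs(abs(eta) - 1.0) <= tol else 0.5
        stationary = stationary and d < bound
        invertible = invertible and d > -bound
    return stationary, invertible


# Parameters of the two-factor process illustrated in Fig. 1.
print(garma_conditions([(1.0, 0.15), (0.992, 0.25)]))
```

For the parameters used in Fig. 1, both conditions hold, since \(0.15<0.25\) at \(\eta _1=1\) and \(0.25<0.5\) at \(\eta _2=0.992\).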

For stationary cases, the moving average representation is,

$$\begin{aligned} (x_{t}-\mu )=\frac{\theta (L)}{\phi (L)}\overset{k}{\underset{i=1}{\prod }}\left( 1-2\eta _{i}L+L^{2}\right) ^{-d_i}\,\varepsilon _{t}, \end{aligned}$$
(2)

from which the spectral density function is obtained as

$$\begin{aligned} f(\omega ) = \frac{\sigma ^2}{2\pi } \bigg |\frac{\theta (\textrm{e}^{-i\omega })}{\phi (\textrm{e}^{-i\omega })} \bigg |^{2}\overset{k}{\underset{j=1}{\prod }}\left\{ 2\,|\cos (\omega )-\cos (\upsilon _{j})|\right\} ^{-2d_{j}},\quad \omega \in \left[ 0,\pi \right] \end{aligned}$$
(3)

where \(\upsilon _{j}=\cos ^{-1}(\eta _{j})\) are the Gegenbauer frequencies. The spectral density function is unbounded at \(\upsilon _{j}\) if \(d_{j}>0\) and vanishes there if \(d_{j}<0.\) The autoregressive representation is most relevant for estimation of the CSS function considered here and is given as follows:

$$\begin{aligned} \frac{\phi (L)}{\theta (L)}\overset{k}{\underset{i=1}{\prod }}\left( 1-2\eta _{i}L+L^{2}\right) ^{d_i} (x_{t}-\mu ) = \varepsilon _{t}. \end{aligned}$$
(4)

The autocovariances for a k-factor GARMA model can be computed as

$$\begin{aligned} \gamma _{j}=2\int _{0}^{\pi }f(\omega )\cos (\omega j)\,\textrm{d}\omega , \end{aligned}$$
(5)

where special attention must be given to the singularities in \(f\left( \omega \right)\) as discussed by McElroy and Holan (2016). Convenient approximations for \(\gamma _j\) are only available for single frequency models. For example, when \(\eta =1\) and \(d<0.25\), the autocorrelations exhibit hyperbolic decay as demonstrated by Granger and Joyeux (1980) for fractional processes. For GARMA(0,0) models, Chung (1996a) shows that for large j, the autocorrelation function with \(|\eta |<1\) and \(d<0.5\), \(d \ne 0\), can be approximated as \(\rho _{j} \approx J \cos (j\,\upsilon )\,j^{2d-1}\), where the constant J does not depend upon j. This expression makes clear the hyperbolically damped sinusoidal pattern of the autocorrelation function of a stationary GARMA process with \(|\eta |<1\).

In Fig. 1, we illustrate a model that combines ARFIMA(0,0) and GARMA(0,0) models, which is of particular interest for economic and financial applications. This example uses a model with parameters \(\left( \eta _{1},d_{1}\right) =\left( 1,0.15\right)\) and \(\left( \eta _{2},d_{2}\right) =\left( 0.992,0.25\right)\). Note that the first frequency corresponds to an unbounded spike at the origin of the spectrum. The second frequency corresponds to an unbounded spike at \(\upsilon _{2}=\cos ^{-1}\left( 0.992\right) =0.1266\) radians, or 0.0201 cycles per period, which is very close to the origin and implies a cycle length of approximately 50 periods. The ACF clearly demonstrates long cycles about the hyperbolic decay characteristic of fractional processes.
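The arithmetic behind this example, and the behavior of the spectral density in Eq. (3) at the poles, can be reproduced with a short Python sketch. It covers only the GARMA(0,0) case (no ARMA terms); the function names and the choice \(\sigma ^2=1\) are assumptions made for illustration.

```python
import math


def gegenbauer_frequency(eta):
    """Return upsilon = arccos(eta), the Gegenbauer frequency in radians."""
    return math.acos(eta)


def spectral_density(omega, factors, sigma2=1.0):
    """Evaluate Eq. (3) for a k-factor GARMA(0,0) process.

    `factors` is a list of (eta_j, d_j) pairs; sigma2 = 1 is an assumed
    innovation variance. Returns math.inf exactly at a pole with d_j > 0.
    """
    value = sigma2 / (2.0 * math.pi)
    for eta, d in factors:
        gap = 2.0 * abs(math.cos(omega) - eta)  # eta_j = cos(upsilon_j)
        if gap == 0.0:
            if d > 0:
                return math.inf
            if d < 0:
                return 0.0
            continue
        value *= gap ** (-2.0 * d)
    return value


factors = [(1.0, 0.15), (0.992, 0.25)]      # parameters used for Fig. 1
upsilon2 = gegenbauer_frequency(0.992)
print(round(upsilon2, 4))                    # frequency in radians
print(round(upsilon2 / (2 * math.pi), 4))    # frequency in cycles per period
print(round(2 * math.pi / upsilon2, 1))      # implied cycle length in periods
print(spectral_density(0.0, factors))        # pole at the origin
print(spectral_density(1.0, factors))        # finite value away from the poles
```

The implied cycle length is \(2\pi /\upsilon _2\), which is how the figure of roughly 50 periods arises.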

3 Estimation

As discussed above, several estimation procedures have been proposed for the k-factor model. In this section, we generalize the CSS estimator of Chung (1996a, 1996b) for single-factor GARMA models to models with \(k>1\).

3.1 The conditional sum of squares estimator

In this subsection, we define the CSS estimator we employ for the GARMA process and set preliminaries for the distribution theory proposed in the following subsection. In the case where a spectral pole exists at 0 or \(\pi\), the CSS estimator of the k-factor GARMA model inherits the problems associated with time-domain estimation of \(\mu\) for simple ARFIMA models, as discussed by Cheung and Diebold (1994) and Chung (1996b). Therefore, in this section we impose that \(\mu\) is known, leaving the issue of an unknown mean for future research.

To establish notation, let \(\delta =(d_{1},\ldots ,d_{k})^{\prime }\), \(\tau =(\phi _{1},\ldots ,\phi _{p},\theta _{1},\ldots ,\theta _{q})^{\prime }\), and \(\eta =(\eta _{1},\ldots ,\eta _{k})^{\prime }\), where \(\psi =(\delta ^{\prime },\tau ^{\prime },\eta ^{\prime })^{\prime }\). We further have \(\delta \in \Psi _\delta\), \(\tau \in \Psi _\tau\), and \(\Psi _\eta =\prod _{i=1}^k[-1,1]\), where \(\Psi _\delta\) and \(\Psi _\tau\) are compact subsets of \(\mathbb {R}^k\) and \(\mathbb {R}^{p+q}\), respectively, and where \(\Psi =\Psi _\delta \times \Psi _\tau \times \Psi _\eta\). The sum of squares function considered here is used to estimate the true, unknown values given by the associated vector denoted \(\psi _0=(\delta _0^{\prime }, \tau _0^{\prime }, \eta _0^{\prime })^{\prime }\). If we assume that the initializing disturbances are zero, then the minimization of the CSS function is asymptotically equivalent to maximum likelihood estimation. The following additional assumptions are imposed for the distribution theory presented in the next subsection.

Assumption 1

\(\{\varepsilon _t\}\) are martingale differences with respect to an increasing sequence of sigma-fields, \(\{F_t\}\), such that, for some \(\beta >0\), \(\sup _t E(|\varepsilon _t|^{2+\beta }\, \vert F_{t-1})<\infty\), almost surely, and \(E(\varepsilon _t^2 \vert F_{t-1})=\sigma ^2\), almost surely.

Assumption 2

\(\delta _0\) lies in the interior of the set \(\prod _{i=1}^k [0,\bar{d}_i]\), where \(\bar{d}_i=0.25\) if \(|\eta _{i,0} |=1\), whereas \(\bar{d}_i=0.50\) if \(|\eta _{i,0} |<1\). Further, \(\tau _0\) is in the interior of \(\Psi _\tau\).

Assumption 3

The value of k is known, and \(\eta _0=\left( \eta _{1,0},\eta _{2,0},\ldots ,\eta _{k,0}\right) ^\prime\) has no repeated elements, that is, \(\eta _{i,0} \ne \eta _{j,0}\) for all \(i \ne j\).

The first assumption relaxes an unnecessarily strong normality condition, whereas, as illustrated below, estimation requires only the associated sum of squared errors. The second assumption is standard within the long-memory literature, specifically when developing consistency arguments (Robinson 2006), and the third condition is needed for identification. Below, we discuss methods that can be used to estimate the unknown value of k.

Under the assumptions above, we can use the AR representation from (4) to define the sum of squares function. Specifically, define \(\alpha _j(\psi )\) as the jth coefficient in the expansion of \(\frac{\phi (L)}{\theta (L)}{\prod }_{i=1}^{k}(1-2\eta _{i}L+L^{2})^{d_i}\). We define the truncated disturbances and sum of squares function, \(s_T(\psi )\), as,

$$\begin{aligned} s_T(\psi )=\sum _{t=1}^{T} \varepsilon _t(\psi ) ^2 \end{aligned}$$
(6)

where

$$\begin{aligned} \varepsilon _t(\psi )=\sum _{j=0}^{t-1}\alpha _j(\psi )x_{t-j}. \end{aligned}$$
(7)

Under all above assumptions, the set of CSS estimates, \(\hat{\psi }=(\hat{\delta }^{\prime },\hat{\tau }^{\prime },\hat{\eta }^{\prime })^{\prime },\) is then defined as follows:

$$\begin{aligned} \hat{\psi }= \underset{\psi \in \Psi }{\arg \min }s_T(\psi ). \end{aligned}$$
(8)
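Equations (6) and (7) translate directly into code once the AR coefficients \(\alpha _j(\psi )\) are available. The Python sketch below takes those coefficients as given and assumes the series has already been demeaned with the known \(\mu\); the function names are hypothetical, and the AR(1) example is a toy check rather than a GARMA model.

```python
def css_residuals(x, alpha):
    """Truncated residuals of Eq. (7): e_t = sum_{j=0}^{t-1} alpha_j * x_{t-j}.

    `x` holds the demeaned observations x_1, ..., x_T (mu is assumed known)
    and `alpha` holds alpha_0, alpha_1, ...; coefficients beyond
    len(alpha) - 1 are treated as zero.
    """
    resid = []
    for t in range(1, len(x) + 1):
        upper = min(t, len(alpha))
        resid.append(sum(alpha[j] * x[t - 1 - j] for j in range(upper)))
    return resid


def css_objective(x, alpha):
    """Sum of squares function s_T of Eq. (6)."""
    return sum(e * e for e in css_residuals(x, alpha))


# Toy check: x_t = 0.5 x_{t-1} + e_t with x_0 = 0 has alpha = (1, -0.5),
# so the truncated residuals should reproduce the shocks.
shocks = [1.0, -2.0, 0.5, 0.0, 1.5]
x, prev = [], 0.0
for e in shocks:
    prev = 0.5 * prev + e
    x.append(prev)
print(css_residuals(x, [1.0, -0.5]))
print(css_objective(x, [1.0, -0.5]))
```

The CSS estimate in Eq. (8) is then the value of \(\psi\) that minimizes `css_objective` over \(\Psi\), with `alpha` recomputed for each candidate \(\psi\).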

Conditions for consistency of the CSS estimators have been established by Robinson (2006). Defining \(\alpha (L;\psi )=\sum _{j=0}^\infty \alpha _j(\psi )L^j\), the following two assumptions, combined with those above, establish consistency.

Assumption 4

For the true parameter vector \(\psi _0\), we have \(\psi _0 \in \Psi\), and for all \(\psi \in \Psi \setminus \{\psi _0\}\), \(\alpha (L;\psi ) \ne \alpha (L;\psi _0)\).

Assumption 5

\(\sum _{j=0}^{\infty }{\sup }_{\psi \in \Psi } |\alpha _j(\psi )|< \infty\).

The fourth assumption is also an identification condition, while the last assumption requires absolute summability of the coefficients in the autoregressive representation for \(x_t\). Under the assumptions above, absolute summability is established if \(d_i>0\) for all \(i \in \{1,\ldots ,k\}\), as provided in the following lemma, whose proof is given in the Appendix.

Lemma 1

Under Assumptions 1–4, the coefficients in the \(AR(\infty )\) representation of \(x_t\) in Eq. (4) are absolutely summable provided \(d_i>0\) for all \(i \in \{1,\ldots ,k\}\).

3.2 Asymptotic distributions

Here, we extend the proofs of Chung (1996a, 1996b) to propose distributional theory for the CSS estimator in (8). The proofs augment Chung (1996a, 1996b), and, as such, complications might be expected. Specifically, similar to Chung, the distribution for \(\hat{\eta }_i\) is shown to be non-standard with a discontinuity occurring at \(|\eta _i|=1\). In this specific case, it is not possible to constrain all parameters to lie in the interior of the parameter space, an assumption that would typically be employed in establishing a limiting distribution (see, Andrews and Sun (2004), for example). Consequently, we use an extensive set of simulations to help validate results, especially for the cases when \(\eta _{i,0} = 1\).

To extend Chung (1996a, 1996b), we consider four cases. The first case covers models for which \(|\eta _{i,0}|<1\) for all \(i=1,\ldots ,k.\) The second case covers models for which there exists a value \(\eta _{i,0}=1\), with \(|\eta _{j,0}|<1\) for all \(j \ne i\). The third case covers models for which there exists a value \(\eta _{i,0}=-1\), with \(|\eta _{j,0}|<1\) otherwise. The final case covers models for which there exist two values, \(\eta _{i,0}\) and \(\eta _{j,0}\), such that \(\eta _{i,0}=1\) and \(\eta _{j,0}=-1\). The first theorem establishes that the asymptotic information matrix for the k-factor GARMA model is block diagonal.

Theorem 1

(Asymptotic independence of \(\hat{\eta }\)) Let \(\hat{\psi }_{\delta ,\tau }=(\hat{d}_{1},\ldots ,\hat{d}_{k},\hat{\phi }^{\prime },\hat{\theta }^{\prime })^{\prime }\) and \(\hat{\eta }=(\hat{\eta }_{1},\ldots ,\hat{\eta }_{k})^{\prime }\) be the estimated parameters associated with (8) for the k-factor GARMA model. The asymptotic distribution of \(\hat{\psi }_{\delta ,\tau }\) is independent of \(\hat{\eta }\).

The proof of this theorem is given in “Appendix 1”. The essential idea is to establish the different rates of stochastic convergence for the elements of \(\hat{\psi }_{\delta ,\tau }\) and \(\hat{\eta }\). No conditions are placed on the value of \(\eta _{i,0}\) relative to \(\eta _{j,0}, \,i\ne j\), so this theorem holds for all four cases described above. Consequently, the asymptotic distribution of \(\hat{\psi }_{\delta ,\tau }\) can be considered independently of \(\hat{\eta }\).

Theorem 2 yields the asymptotic distribution of the estimator of \(\psi _{\delta ,\tau }\), where, again, the proof is provided in the Appendix.

Theorem 2

(Asymptotic distribution of \(\hat{\psi }_{\delta ,\tau }\)) Let \(\hat{\psi }_{\delta ,\tau }\) be the CSS estimator of the true value \(\psi _{\delta _0,\tau _0}\) for the stationary and invertible k-factor GARMA model. Then, under Assumptions 1–5,

$$\begin{aligned} \sqrt{T}(\hat{\psi }_{\delta ,\tau }-\psi _{\delta _0,\tau _0})\rightsquigarrow N\left( 0,I_{\psi _{\delta _0,\tau _0}}^{-1}\right) , \end{aligned}$$
(9)

where \(\rightsquigarrow\) denotes the weak convergence of the random vector \(\hat{\psi }_{\delta ,\tau }\), and where

$$\begin{aligned} \underset{(k+p+q)\times (k+p+q)}{I_{\psi _{\delta _0,\tau _0}}}=\left[ \begin{array}{lllll} I_{d_{1,0}} &{}\quad \cdots &{}\quad I_{d_{1,0}d_{k,0}} &{}\quad I_{d_{1,0},\phi _0} &{}\quad I_{d_{1,0},\theta _0}\\ \vdots &{}\quad \ddots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots \\ I_{d_{1,0}d_{k,0}} &{}\quad \cdots &{}\quad I_{d_{k,0}} &{}\quad I_{d_{k,0},\phi _0} &{}\quad I_{d_{k,0},\theta _0}\\ I_{d_{1,0},\phi _0} &{}\quad \cdots &{}\quad I_{d_{k,0},\phi _0} &{}\quad I_{\phi _0} &{}\quad I_{\phi _0,\theta _0}\\ I_{d_{1,0},\theta _0} &{}\quad \cdots &{}\quad I_{d_{k,0},\theta _0} &{}\quad I_{\phi _0,\theta _0} &{}\quad I_{\theta _0} \end{array} \right] . \end{aligned}$$
(10)

With \(\upsilon _{i,0}=\cos ^{-1}(\eta _{i,0})\), the elements of \(I_{\psi _{\delta _0,\tau _0}}\) are defined as follows:

$$\begin{aligned} I_{d_{i,0}}= & {} 2\left[ \frac{\pi ^{2}}{3}-\pi \upsilon _{i,0}+\upsilon _{i,0}^{2}\right] ,\quad i=1,\ldots ,k \end{aligned}$$
(11a)
$$\begin{aligned} I_{d_{i,0}d_{j,0}}= & {} 2\left[ \frac{\pi ^{2}}{3}-\pi \upsilon _{i,0}+\frac{\upsilon _{i,0} ^{2}+\upsilon _{j,0}^{2}}{2}\right] ,\quad \upsilon _{i,0}>\upsilon _{j,0}, \end{aligned}$$
(11b)
$$\begin{aligned} I_{d_{i,0}\phi _{j,0}}= & {} 2\overset{\infty }{\underset{l=0}{\sum }}\phi _{l,0}^{*}\frac{\cos [(l+j)\upsilon _{i,0}]}{(l+j)},\quad i=1,\ldots ,k,\quad j=1,\ldots ,p \end{aligned}$$
(11c)
$$\begin{aligned} I_{d_{i,0}\theta _{m,0}}= & {} 2\overset{\infty }{\underset{l=0}{\sum }}\theta _{l,0}^{*}\frac{\cos [(l+m)\upsilon _{i,0}]}{(l+m)},\quad i=1,\ldots ,k,\quad m=1,\ldots ,q \end{aligned}$$
(11d)

where \(\phi _{l,0}^{*}\) and \(\theta _{l,0}^{*}\) denote the lth coefficients in the infinite order expansions of \(\phi _{0}^{-1}(L)\) and \(\theta _{0}^{-1}(L)\), respectively. The submatrices \(I_{\phi _0}\), \(I_{\phi _0,\theta _0}\) and \(I_{\theta _0}\) consist of elements that are the same as the corresponding submatrices of the usual information matrix of an ARMA model.

To calculate the information matrix in Theorem 2, the coefficients \(\phi _{l,0}^{*}\) and \(\theta _{l,0}^{*}\) are easily calculated recursively using the method of equating coefficients. Equipped with these values, it is straightforward to calculate the information matrix and obtain the standard errors used in inference. In the application below, given the large number of potential permutations, with different values of k, p, and q, we use a straightforward computation that truncates the relevant infinite sums at 10 million terms.
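As a rough illustration of these computations, the Python sketch below builds the memory-parameter block from Eqs. (11a) and (11b), expands an inverse lag polynomial by equating coefficients, and truncates the sums in Eqs. (11c) and (11d). All function names are hypothetical, the sign convention \(\phi (L)=1-\phi _1L-\cdots -\phi _pL^p\) is an assumption, and the truncation point, frequencies, and sample size are chosen for illustration only.

```python
import math


def info_memory_block(upsilons):
    """k x k memory-parameter block of Eq. (10), from Eqs. (11a) and (11b)."""
    k = len(upsilons)
    block = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            hi = max(upsilons[i], upsilons[j])
            lo = min(upsilons[i], upsilons[j])
            block[i][j] = 2.0 * (math.pi ** 2 / 3.0 - math.pi * hi
                                 + (hi ** 2 + lo ** 2) / 2.0)
    return block


def inverse_poly_coefs(phi, n):
    """First n coefficients of 1/phi(L), from phi(L) * phi_star(L) = 1.

    Assumes the convention phi(L) = 1 - phi_1 L - ... - phi_p L^p.
    """
    star = [1.0]
    for l in range(1, n):
        star.append(sum(phi[m - 1] * star[l - m]
                        for m in range(1, min(l, len(phi)) + 1)))
    return star


def info_d_arma(upsilon, coefs, lag, n_terms=10000):
    """Truncated version of the infinite sums in Eqs. (11c) and (11d)."""
    star = inverse_poly_coefs(coefs, n_terms)
    return 2.0 * sum(star[l] * math.cos((l + lag) * upsilon) / (l + lag)
                     for l in range(n_terms))


# Standard errors of (d_1, d_2) in a 2-factor GARMA(0,0) model.
ups = [0.5, 1.5]   # hypothetical Gegenbauer frequencies
T = 1000           # hypothetical sample size
info = info_memory_block(ups)
det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
se_d1 = math.sqrt(info[1][1] / det / T)
se_d2 = math.sqrt(info[0][0] / det / T)
print(se_d1, se_d2)
print(info_d_arma(1.0, [0.5], lag=1))  # d-phi_1 element for an AR(1) term
```

With ARMA terms present, the full matrix in Eq. (10) would be assembled from these pieces together with the usual ARMA information blocks before inversion.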

Throughout, we abstract from the case where \(\mu\) is unknown, although a result is available if \(|\eta _{i,0}|<1\), \(i=1,\ldots , k\). With \(|\eta _{i,0}|<1\) for all i, the CSS estimator of the true mean \(\mu _0\), denoted \(\hat{\mu },\) has the following distribution, where f(0) denotes the spectral density function evaluated at frequency \(\omega =0\),

$$\begin{aligned} \sqrt{T}(\hat{\mu }-\mu _0)\rightsquigarrow N(0,2\pi f(0)). \end{aligned}$$
(12)

The distributions of \(\hat{\mu }\) and the sample mean, \(\bar{x}\), are equivalent. The proof is omitted as these results are a simple extension of Theorem 1 in Chung (1996b).

Theorem 3 is the central result and proposes the asymptotic distribution of \(\hat{\eta }\) for all of our four cases.

Theorem 3

(Asymptotic distribution of \(\hat{\eta }\)) Let \(\hat{\eta }_{1},\ldots ,\hat{\eta }_{k}\) be the estimators of \(\eta _{1,0},\ldots ,\eta _{k,0}\) based on Eq. (8) for a stationary and invertible k-factor GARMA model for a sample \(\{x_{t}\}\), \(t=1,\ldots ,T\). Without loss of generality, order the elements of \(\eta _0\) from smallest to largest. Then let \(D_{\eta _{1,0}}\) denote a dummy variable that takes on the value 1 if \(\eta _{1,0}=-1\) and 0 otherwise, and let \(D_{\eta _{k,0}}\) denote a dummy variable that takes on the value 1 if \(\eta _{k,0}=1\) and 0 otherwise. Under Assumptions 1–5,

$$\begin{aligned} T(\hat{\eta }_{i}-\eta _{i,0})\rightsquigarrow \frac{\sin (\upsilon _{i,0})}{d_{i,0}} \frac{\left[ \int _{0}^{1}W_{2i-1-D_{\eta _{1,0}}}\,\textrm{d}W_{2i-D_{\eta _{1,0}}} -\int _{0}^{1}W_{2i-D_{\eta _{1,0}}}\,\textrm{d}W_{2i-1-D_{\eta _{1,0}}}\right] }{\left[ \int _{0}^{1}W_{2i-1-D_{\eta _{1,0}}}^{2}(r)\,\textrm{d}r+\int _{0}^{1}W_{2i-D_{\eta _{1,0}} }^{2}(r)\,\textrm{d}r\right] } \end{aligned}$$
(13)

with \(\vert \eta _{i,0}|<1,\) where \(i=1+ D_{\eta _{1,0}},\ldots ,k- D_{\eta _{k,0}}\) and,

$$\begin{aligned} T^{2}(\hat{\eta }_{1}+1) \rightsquigarrow -\frac{1}{2d_{1,0}} \frac{\int _{0}^{1}\left[ \int _{0}^{r}W_{1}(s)\textrm{d}s\right] \textrm{d}W_{1}(r)}{\int _{0}^{1}\left[ \int _{0}^{r}W_{1}(s)\,\textrm{d}s\right] ^{2}\textrm{d}r},\quad \text {if}\ \ \eta _{1,0}=-1, \end{aligned}$$
(14)
$$\begin{aligned} T^{2}(\hat{\eta }_{k}-1) \rightsquigarrow \frac{1}{2d_{k,0}}\frac{\int _{0} ^{1}\left[ \int _{0}^{r}W_{2k-1-D_{\eta _{1,0}}}(s)\textrm{d}s\right] \textrm{d}W_{2k-1-D_{\eta _{1,0}}}(r)}{\int _{0}^{1}\left[ \int _{0}^{r}W_{2k-1-D_{\eta _{1,0}}}(s)\textrm{d}s\right] ^{2}\textrm{d}r},\quad \text {if}\ \ \eta _{k,0}=1, \end{aligned}$$
(15)

where \(W_{1},W_{2},\ldots ,W_{2k-D_{\eta _{1,0}}-D_{\eta _{k,0}}}\) are \(2k-D_{\eta _{1,0}}-D_{\eta _{k,0}}\) independent Brownian motions.

The proof is given in “Appendix”. An important result of this theorem relates to the asymptotic independence of the values in the vector \(\hat{\eta }.\) In addition, for each \(\hat{\eta }_{i}\), \(d_{i,0}\) and \(\upsilon _{i,0}\) enter the equation for the asymptotic distribution proportionally, so one only needs the values of the stochastic integrals depicted in Theorem 3 to calculate asymptotic confidence intervals. The values for these integrals are reported in Chung (1996a).
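Where tabulated values are not at hand, the functional in Eq. (13) can be approximated by simulation. The following Python sketch is one such approximation and is not the tabulation procedure of Chung (1996a): the Brownian motions are replaced by scaled Gaussian random walks, the stochastic integrals by left-endpoint sums, and the function names, number of replications, and step count are all assumptions.

```python
import math
import random


def simulate_eta_limit(upsilon, d, reps=2000, steps=500, seed=0):
    """Approximate draws from the limit in Eq. (13), for |eta_0| < 1.

    Two independent Brownian motions on [0, 1] are replaced by Gaussian random
    walks with `steps` increments, so the draws are only approximate.
    """
    rng = random.Random(seed)
    scale = math.sin(upsilon) / d
    sd = math.sqrt(1.0 / steps)
    draws = []
    for _rep in range(reps):
        w1 = w2 = 0.0
        num = den = 0.0
        for _step in range(steps):
            dw1, dw2 = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
            num += w1 * dw2 - w2 * dw1            # int W1 dW2 - int W2 dW1
            den += (w1 * w1 + w2 * w2) / steps    # int W1^2 dr + int W2^2 dr
            w1 += dw1
            w2 += dw2
        draws.append(scale * num / den)
    return sorted(draws)


def quantile(sorted_draws, p):
    """Empirical quantile by nearest rank (no interpolation)."""
    idx = min(len(sorted_draws) - 1, max(0, int(p * len(sorted_draws))))
    return sorted_draws[idx]


# Hypothetical estimates: eta_hat = 0.8, d_hat = 0.3, sample size T = 500.
eta_hat, d_hat, T = 0.8, 0.3, 500
draws = simulate_eta_limit(math.acos(eta_hat), d_hat, reps=500, steps=200)
lo, hi = quantile(draws, 0.025), quantile(draws, 0.975)
print(eta_hat - hi / T, eta_hat - lo / T)  # approximate 95% interval for eta
```

The interval follows from inverting \(T(\hat{\eta }_i-\eta _{i,0})\) between the simulated 2.5% and 97.5% quantiles, with the unknown \(\upsilon _{i,0}\) and \(d_{i,0}\) replaced by estimates.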

3.3 Estimation algorithm

These theorems provide practical information for designing an efficient algorithm. We know that the asymptotic distributions of the memory parameters are not independent of the ARMA parameters. Also, the asymptotic distributions of \(\hat{\psi }_{\delta ,\tau }\) and \(\hat{\eta }\) are independent, but the estimation errors of the elements of \(\hat{\psi }_{\delta ,\tau }\) are \(O_{p}(T^{-1/2})\), whereas that of \(\hat{\eta }_{i}\) is \(O_{p}(T^{-1})\) if \(|\eta _{i,0}|<1\) and \(O_{p}(T^{-2})\) if \(|\eta _{i,0}|=1.\) These results suggest that the algorithm of Woodward et al. (1998), which estimates ARMA parameters independently of \(\left( \eta _{i},d_i\right)\), will produce inconsistent estimates. It might be preferable to use an extension of Chung’s method (Chung 1996a, b) by conducting a grid search over each element of \(\eta\) combined with a gradient method for \(\psi _{\delta ,\tau }.\) However, Monte Carlo simulations indicate that the grid over each value of \(\eta _i\) must be very fine, since the objective function has many local minima. A k-dimensional line search for \(\eta\) coupled with a gradient-based search for \(\psi _{\delta ,\tau }\) would be computationally infeasible, unless the parameter space is bounded in some way or a very coarse grid is used.

The computational complexity of the CSS estimator for a k-factor GARMA model can be better appreciated if we consider the step of recursively computing the residuals. The ith Gegenbauer polynomial in the AR representation of the k-factor GARMA model can be expanded as (Gray et al. 1989)

$$\begin{aligned} (1-2\eta _i z+z^{2})^{d_i}=\sum _{j=0}^{\infty }C_{j}^{\left( -d_i\right) }\left( \eta _i\right) \,z^{j}, \end{aligned}$$
(16a)

where

$$\begin{aligned} C_{j}^{\left( -d_i\right) }\left( \eta _i\right) = \sum _{l=0}^{\left[ j/2\right] }\frac{\left( -1\right) ^{l}\left( 2\eta _i\right) ^{j-2l} \,\Gamma \left( -d_i-l+j\right) }{l!\,\left( j-2l\right) !\,\Gamma \left( -d_i\right) }, \end{aligned}$$
(16b)

and where \(\left[ j/2\right]\) is the integer part of j/2. As Chung (1996a) notes, the best way to calculate the coefficients \(C_{j}^{(-d_i)}\) is via the recursion,

$$\begin{aligned} C_{j}^{(-d_i)}(\eta _i)=2\eta _{i}\left( \frac{-d_i-1}{j}+1\right) C_{j-1}^{(-d_i)}(\eta _i)-\left( 2\frac{-d_i-1}{j}+1\right) C_{j-2}^{(-d_i)}(\eta _i), \end{aligned}$$
(17)

where \(C_{0}^{(-d_i)}(\eta _i)=1\) and \(C_{1}^{(-d_i)}(\eta _i)=-2\,d_i\,\eta _{i}.\) Under the assumption that \(\varepsilon _{0}=\varepsilon _{-1}=\cdots =0,\) \(\varepsilon _t\) can be calculated recursively from the expression

$$\begin{aligned} \frac{\phi (L)}{\theta (L)}\prod _{i=1}^{k}\left[ \sum _{j=0}^{t-1}C_{j}^{(-d_i)}(\eta _i) \,L^{j}\right] (x_{t}-\mu )=\varepsilon _{t}. \end{aligned}$$
(18)

The combination of the k-dimensional product over the above sums creates most of the computational burden.
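As a rough illustration of Eqs. (17) and (18), the following Python sketch computes the Gegenbauer coefficients by recursion and then forms the CSS residuals for a k-factor GARMA(0,0) model. The function names are hypothetical, the ARMA part is omitted (\(\phi (L)=\theta (L)=1\)), and pre-sample deviations \(x_{s}-\mu\), \(s\le 0\), are treated as zero, so that the product of the k expansions is truncated at lag \(t-1\). It is a sketch under these assumptions rather than the implementation used for the reported results.

```python
import math


def gegenbauer_coeffs(eta, d, n):
    """First n coefficients C_j^{(-d)}(eta) of (1 - 2*eta*z + z^2)^d, via recursion (17)."""
    c = [1.0]
    if n > 1:
        c.append(-2.0 * d * eta)
    for j in range(2, n):
        a = (-d - 1.0) / j
        c.append(2.0 * eta * (a + 1.0) * c[j - 1] - (2.0 * a + 1.0) * c[j - 2])
    return c[:n]


def convolve_truncated(a, b):
    """Coefficients of the product of two power series, truncated to len(a) terms."""
    return [sum(a[i] * b[m - i] for i in range(m + 1)) for m in range(len(a))]


def css_residuals(x, mu, factors):
    """Residuals of a k-factor GARMA(0,0) model; factors is a list of (eta_i, d_i).

    Assumption: deviations before the first observation are set to zero.
    """
    T = len(x)
    filt = [1.0] + [0.0] * (T - 1)
    for eta, d in factors:
        # Multiplying the k expansions is the costly step noted in the text.
        filt = convolve_truncated(filt, gegenbauer_coeffs(eta, d, T))
    dev = [xi - mu for xi in x]
    return [sum(filt[j] * dev[t - j] for j in range(t + 1)) for t in range(T)]
```

Each call costs on the order of \(kT^{2}\) operations, and the objective must be re-evaluated at every trial value of the parameters, which is consistent with the burden described above.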

To address these computational issues, along with the differing rates of convergence of the model parameters, we use an extension of the iterative multi-step gradient-based algorithm developed by Ramachandran and Beaumont (2001). First, for a given k, we obtain a grid of starting values for each element of \(\eta\). We use each set of starting values in this grid to estimate \(\psi _{\delta ,\tau }\). Conditional on the estimated value, \(\hat{\psi }_{\delta ,\tau }\), we then estimate the elements of \(\eta\) using an unconstrained gradient-based search.Footnote 2 Using the updated estimates of \(\eta\), a new estimate of \(\psi _{\delta ,\tau }\) is obtained, which is then used to update the estimate of \(\eta\). This procedure continues for all combinations of starting values for \(\eta _i\). The final model results from the set of parameters that jointly produce the smallest sum of squared errors. Although computationally intensive, this multi-step gradient-based iterative algorithm requires far less computation time than the full k-dimensional line search for \(\eta _i\).
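The alternating structure of this procedure can be sketched in a few lines of Python for the simplest case, a single-factor GARMA(0,0) model with zero mean, where d is the only element of \(\psi _{\delta ,\tau }\). Several simplifications are assumptions made purely for illustration: a derivative-free golden-section search stands in for the gradient-based steps, the search for \(\eta\) is restricted to a local bracket inside \([-1,1]\) rather than being unconstrained, and all function names and default settings are hypothetical.

```python
import math
import random


def gegenbauer_coeffs(eta, d, n):
    """Coefficients of (1 - 2*eta*z + z^2)^d from the recursion in Eq. (17)."""
    c = [1.0, -2.0 * d * eta]
    for j in range(2, n):
        a = (-d - 1.0) / j
        c.append(2.0 * eta * (a + 1.0) * c[-1] - (2.0 * a + 1.0) * c[-2])
    return c[:n]


def css(x, eta, d):
    """Conditional sum of squares for a zero-mean 1-factor GARMA(0,0) model."""
    c = gegenbauer_coeffs(eta, d, len(x))
    return sum(sum(c[j] * x[t - j] for j in range(t + 1)) ** 2 for t in range(len(x)))


def golden_min(f, lo, hi, iters=20):
    """Golden-section search (a stand-in for a gradient-based step)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = f(x1), f(x2)
    for _ in range(iters):
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = f(x1)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = f(x2)
    return (x1, f1) if f1 < f2 else (x2, f2)


def multistep_css(x, eta_starts, d_bounds=(0.01, 0.49), half_width=0.2, sweeps=3):
    """Alternate between d given eta and eta given d from each starting value."""
    best = None
    for eta in eta_starts:
        d = 0.5 * (d_bounds[0] + d_bounds[1])
        sse = css(x, eta, d)
        for _ in range(sweeps):
            d_new, s = golden_min(lambda v: css(x, eta, v), d_bounds[0], d_bounds[1])
            if s < sse:  # accept only improvements, so the SSE never increases
                d, sse = d_new, s
            lo, hi = max(-1.0, eta - half_width), min(1.0, eta + half_width)
            e_new, s = golden_min(lambda v: css(x, v, d), lo, hi)
            if s < sse:
                eta, sse = e_new, s
        if best is None or sse < best[2]:
            best = (eta, d, sse)
    return best  # (eta_hat, d_hat, smallest SSE over all starting values)


def simulate_garma(eta, d, T, seed=0):
    """Crude simulator: truncated MA expansion of (1 - 2*eta*L + L^2)^(-d), no burn-in."""
    rng = random.Random(seed)
    psi = gegenbauer_coeffs(eta, -d, T)
    eps = [rng.gauss(0.0, 1.0) for _ in range(T)]
    return [sum(psi[j] * eps[t - j] for j in range(t + 1)) for t in range(T)]
```

A call such as `multistep_css(simulate_garma(0.5, 0.3, 200), [-0.5, 0.0, 0.5])` returns the parameter pair with the smallest sum of squares across the three starting values. Because each step only accepts improvements, the reported sum of squares is never worse than at any starting point, although a local minimum remains possible when the starting grid is coarse.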

Our theoretical results assume that the number of spectral poles, k, is known, although this may be unlikely in many applications. It is beyond the scope of this paper to settle how k should be determined for all applications. However, we provide here some guidance based on the existing literature and also propose an additional method that shows promise. Within the literature, k is most commonly selected through visual inspection of the periodogram of the data to locate the dominant frequencies (Yajima 1991; Hidalgo and Soulier 2004; Arteche 2020). Although there is some theoretical support for this approach, the number of candidate frequencies could be low if the spectrum is dominated by behavior at the origin (Leschinski and Sibbertsen 2019). Hidalgo and Soulier (2004) introduce a procedure to determine the model order that sequentially identifies the largest periodogram frequency and then tests the significance of the persistence parameter at that frequency. If the parameter is found to be insignificant, the iterative procedure ends. Otherwise, the significant \((\eta _i, d_i)\)-pair is added to the Gegenbauer filter, some neighborhood around that pole is excluded, and the sequential search continues. Leschinski and Sibbertsen (2019) propose a related iterative procedure that tests for significant poles in the spectrum after sequentially applying a Gegenbauer filter based on estimated memory parameters obtained using a Whittle method. The procedure terminates when the test statistic for a singularity is insignificant.

We propose a relatively simple method that selects k based on the minimum value of the Bayesian Information Criterion (BIC) for integer values of \(k\le \bar{K}\), where \(\bar{K}\) is some sufficiently large upper bound.Footnote 3 To test this procedure, we simulated various k-factor GARMA models, and present here results for a potentially interesting case with parameters \((\eta _{i,0}, d_{i,0})_{i=1}^2 = (0.5, 0.2) \text { and } (-\,0.5, 0.4)\). The model also includes an AR(1) term with \(\phi _0 = 0.8\). We simulate 1000 replications of the true model with sample sizes of \(T = 100, 200, 300, 500, 1000,\) and 2000. We select \(\bar{K}\) to be 4, which is large enough to explore the sensitivity of our results without placing undue burden on computational resources. In addition to recording the selected value of k based on the BIC, we also consider model selection based on the Hannan–Quinn (HQ) and Akaike (AIC) information criteria.
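For concreteness, the selection rule can be sketched as follows. The exact form of the criteria is not restated here, so the sketch assumes the common Gaussian versions based on the minimized sum of squares, \(T\log (SSE/T)\) plus a penalty, and it assumes a parameter count of \(2k+p+q+1\) (each \(\eta _i\) and \(d_i\), the ARMA coefficients, and the mean). Both choices, along with the function names, are illustrative assumptions.

```python
import math


def info_criteria(sse, T, n_params):
    """AIC, HQ and BIC in their usual Gaussian sum-of-squares form (assumed)."""
    base = T * math.log(sse / T)
    return {
        "AIC": base + 2.0 * n_params,
        "HQ": base + 2.0 * n_params * math.log(math.log(T)),
        "BIC": base + n_params * math.log(T),
    }


def select_k(sse_by_k, T, p, q):
    """sse_by_k maps each candidate k to its minimized CSS; returns the chosen k."""
    chosen = {}
    for name in ("AIC", "HQ", "BIC"):
        scores = {
            k: info_criteria(sse, T, 2 * k + p + q + 1)[name]
            for k, sse in sse_by_k.items()
        }
        chosen[name] = min(scores, key=scores.get)
    return chosen
```

With \(T=500\) and made-up sums of squares of 500 for \(k=2\) and 495 for \(k=3\), the lighter AIC penalty favors \(k=3\) while the BIC and HQ retain \(k=2\), which illustrates why the criteria can disagree in moderate samples.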

The results of these simulations are reported in Table 1, where the true value of k is 2. In the top panel, we report the proportion of times the AIC selects each value of k from 1 to 4. The correct value of k is selected a majority of times for all sample sizes. The success rate for choosing \(k=2\) increases slowly with the sample size, beginning at 52% for \(T=100\) and reaching nearly 70% when \(T=2000\). The second and third panels show that the AIC performs comparatively well only when \(T=100\), whereas it is strictly dominated by the BIC and HQ criteria for larger samples. Specifically, the BIC and HQ criteria are extremely accurate when \(T\ge 500\), with the BIC outperforming the HQ. The final panel of Table 1 shows the bias of the estimated parameters when k is set to 4. For large sample sizes, the bias induced by selecting \(k>2\) is quite small.

Table 1 Proportion of times k is chosen by both sample size T and information criterion for a 2-factor GARMA(1,0) model

Additional simulations (available upon request) show that the consequences of choosing k too large are relatively minor unless T is small. We also observe that the estimation errors, particularly the RMSEs, associated with overestimating k are greater for the ARMA parameters than for \(\eta _i\) and \(d_i\). Consequently, the more important the short-term dynamics are, the more critical it is to accurately estimate k. In many applications, until more definitive theoretical results for estimating k can be derived, we recommend that researchers use several methods to choose k and check the robustness of their estimation results.

4 Finite sample performance

In this section, we report simulation results that examine the finite sample properties of the CSS estimator. We are interested in examining the bias in the parameter estimates and in comparing the finite sample standard errors of the estimates with their asymptotic counterparts. Ramachandran and Beaumont (2001) and Beaumont and Smallwood (2022) have done extensive simulations for the single-factor GARMA model, with the latter paying particular attention to the parametric region where \(\eta\) is close to one and d is close to one-half. Based on those results, we use sample sizes of 500, 1000, and 2000 and concentrate on two-factor models with parameter ranges that we believe are most relevant for economic and financial applications.

The initial simulation results are presented in Tables 2, 3, 4 and 5.Footnote 4 Each column lists the parameters of the simulated model and each block in the tables gives the results from 1000 replications for each specific parameterization. For computational purposes, we use an iterative procedure to generate a large amount of data before discarding all but the last 500 or 2000 observations. Throughout, we report the true parameter values (True), the mean bias, the root mean squared error (RMSE), the mean of the numerical standard errors calculated from the estimated Hessian matrix in the last iteration (MNSE), and the mean of the true asymptotic standard errors (MASE) based on Theorem 2. We use the estimated values of all parameters to compute the true asymptotic standard errors for each of the 1000 replications and then average them to get the MASE. Since the mean bias is small, there will be inconsequential differences between the MASE computed this way and the true ASE computed using the true parameter values. Additionally, our MASE values will vary by sample size, since they are standard errors of the estimates themselves and are not rescaled by \(T^{1/2}\).
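The summary statistics reported in the tables are simple functions of the replication output. A minimal Python sketch, with a hypothetical function name and argument layout, is given below. The MASE is omitted because it requires the asymptotic covariance expression from Theorem 2.

```python
import math


def summarize(estimates, true_value, numerical_se):
    """Mean bias, RMSE and MNSE for one parameter across replications.

    estimates: one point estimate per replication.
    numerical_se: the matching Hessian-based standard errors.
    """
    n = len(estimates)
    bias = sum(e - true_value for e in estimates) / n
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    mnse = sum(numerical_se) / len(numerical_se)
    return {"True": true_value, "Bias": bias, "RMSE": rmse, "MNSE": mnse}
```

The RMSE is purely empirical, whereas the MNSE averages standard errors that each replication computes from its own Hessian, so the two need not coincide for parameters whose limiting distribution is non-standard.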

Table 2 presents the results for six different two-frequency GARMA(0,0) models with values of \(\eta _{i,0}\) set to \(-\frac{1}{2},0,\frac{1}{2}\) and values of \(d_{i,0}\) equal to 0.2 and 0.4. The estimation biases are all quite small, especially for the values of \(\hat{\eta }_1\) and \(\hat{\eta }_2\), which converge at a faster rate than the other parameters. Theoretically, \(\hat{\eta }_i\) is \(O_p(T^{-1})\) whereas the remaining parameters have standard rate-\(\sqrt{T}\) convergence. It is therefore wholly consistent with the theoretical results to observe that the MNSEs of \(\hat{\eta }_i\) in Table 2 are about 4 times larger for \(T=500\) relative to \(T=2000\). In contrast, the MNSEs for \(\hat{d}_i\) are about 2 times larger for samples of 500 relative to samples of 2000.
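The two ratios follow directly from the convergence rates. Writing \(\text {se}_{T}(\cdot )\) for a standard error at sample size T (notation introduced here only for this illustration), the rates imply approximately

$$\begin{aligned} \frac{\text {se}_{500}(\hat{d}_i)}{\text {se}_{2000}(\hat{d}_i)}\approx \left( \frac{2000}{500}\right) ^{1/2}=2, \qquad \frac{\text {se}_{500}(\hat{\eta }_i)}{\text {se}_{2000}(\hat{\eta }_i)}\approx \frac{2000}{500}=4. \end{aligned}$$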

Generally speaking, a larger value of \(d_{i,0}\) mitigates the already small bias in \(\hat{\eta }_i\), which appears to be marginally more sensitive to estimation outliers. This is likely due to the fact that an estimate of \(d_i\) near zero can lead to poor estimates of the corresponding \(\eta _{i}\), because that Gegenbauer polynomial will have very little impact on the objective function no matter what the value of \(\eta _{i,0}\) is. In all cases, \(\mu\) is estimated with the sample mean, which is asymptotically equivalent to the CSS estimator of \(\mu\) provided \(|\eta _{i,0}|<1,\ i=1,\ldots ,k.\) As noted above, the estimator for the mean is \(O_{p}(T^{-1/2})\), the same rate of convergence as the parameters in \(\hat{\psi }_{\delta ,\tau }\), so its bias is also quite small. The true asymptotic standard errors of the corresponding values of \(\hat{d}_{1}\) and \(\hat{d}_{2}\) are quite comparable to their numerical counterparts. Finally, in light of the results of Theorem 3, it is not surprising to see that the MNSE and RMSE for \(\hat{\eta }_1\) and \(\hat{\eta }_2\) are quite different, since the MNSE assumes convergence at the rate \(T^{1/2}\).

Table 2 Simulations for the 2-factor GARMA(0,0) processes

To examine the influence of ARMA parameters, \(\phi\) and \(\theta\), we choose a particular parameterization (second case from Table 2) and estimate various two-factor GARMA(p,q) models with p and q being either zero or one. The results are reported in Table 3 and are similar to those in Table 2. Again, for all of the cases considered in Table 3, the median and mean biases are quite small. We also see that the mean asymptotic standard errors are virtually identical to the RMSE and MNSE for \(\hat{d}_i\), \(\hat{\phi }\), and \(\hat{\theta }\), particularly with \(T=2000\). These simulation results yield particularly strong evidence supporting the theoretical results in Theorem 2.

Table 3 Simulation results for 2-factor GARMA processes with \(p,q\ne 0\)

Table 4 examines the particularly interesting case where \(\eta _{1,0}=1\) and \(|\eta _{2,0}|<1,\) so that we get a combination ARFIMA and GARMA model. Compared to \(\hat{\eta }_2\), \(\hat{\eta }_{1}\) has very little bias and extremely small RMSE and MNSE, reflecting that this parameter may be \(O_{p}(T^{-2})\) as reported in the theoretical results above. As expected, the MNSE for \(\hat{\eta }_1\) is about \((2000/500)^2 = 16\) times larger when the sample size is 500 compared to when the sample size is 2000. The results for \(\hat{\eta }_2\) when \(|\eta _{2,0}|<1\) are similar to those in Tables 2 and 3, as are the results for \(\hat{d}_1\) and \(\hat{d}_2\). When \(\eta _{i,0}=1\), however, the sample mean and CSS estimate of \(\mu\) are no longer asymptotically equivalent. Thus, we use the CSS estimator for \(\mu\) in these cases. The computational difficulties of time domain estimators for ARFIMA models when the mean is unknown have been well documented (Yajima 1991; Chung and Baillie 1993; Cheung and Diebold 1994). In spite of these difficulties, the estimate of the mean shows little bias, albeit with a wide distribution. Again, the remaining parameters suffer from very little distortion.

Table 4 Estimation of simulated ARFIMA/GARMA processes

As noted above, the computational burden of the CSS estimator grows rapidly with the number of spectral poles due to the grid search over each \(\eta _i\). Thus, if we could narrow the range of the grid search, we could improve the efficiency of the algorithm. With \(i \ne j\), since \(\hat{\eta }_i\) is independent of both \(\hat{\eta }_j\) and \(\hat{\psi }_{\delta ,\tau }\), it may be possible to first estimate each value of \(\eta _i\) sequentially to get good starting values. We could then re-estimate the entire model using fairly tight grids over each \(\eta _i\). In Table 5, we investigate this possibility. First, we estimate a 1-factor GARMA model and then filter the data with the resulting Gegenbauer polynomial before estimating the second frequency using a 1-factor model on these filtered data. This process should produce good starting values for \(\eta\) as long as the biases are not too large.
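The filtering step in this sequential scheme can be sketched as follows. The code shows only how a series is passed through one or more truncated Gegenbauer filters, with the first-stage estimates taken as given; the single-factor estimation itself is not shown. The function names are hypothetical and pre-sample values are assumed to be zero.

```python
def gegenbauer_coeffs(eta, d, n):
    """Coefficients of (1 - 2*eta*z + z^2)^d from the recursion in Eq. (17)."""
    c = [1.0, -2.0 * d * eta]
    for j in range(2, n):
        a = (-d - 1.0) / j
        c.append(2.0 * eta * (a + 1.0) * c[-1] - (2.0 * a + 1.0) * c[-2])
    return c[:n]


def apply_filter(x, coeffs):
    """Apply a one-sided filter, treating observations before the sample as zero."""
    return [sum(coeffs[j] * x[t - j] for j in range(t + 1)) for t in range(len(x))]


def sequential_filter(x, factors):
    """Filter x by each estimated (eta_i, d_i) pair in turn."""
    y = list(x)
    for eta, d in factors:
        y = apply_filter(y, gegenbauer_coeffs(eta, d, len(y)))
    return y
```

Under the zero pre-sample convention, applying the filters one at a time gives the same output as applying their product, so after the first factor has been removed the filtered series can be passed to a second single-factor estimation.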

Table 5 Sequential estimation of 2-factor models with single-factor models

The first two models in Table 5 are cases from the previous simulations, and the third case represents a mixed ARFIMA/GARMA model in which the ARFIMA component is short memory (\(d_{i,0}<0\)). The latter process, which is not covered by the theorems above, may result from differencing processes with a non-stationary ARFIMA component. For each of the cases considered in Table 5, the sample mean is used to estimate \(\mu\). We find that the method generally first selects the frequency with the largest corresponding value of \(d_{i,0}\), thus capturing the most dominant feature of the ACF. The results in Table 5 indicate that the small sample biases in \(\hat{\eta }_{1}\) and \(\hat{\eta }_{2}\) are reasonable, suggesting that the method of choosing a tight grid around these point estimates might work, at least when k is small. The relatively large biases in the values of the vector \(\hat{\psi }_{\delta ,\tau }\), however, confirm the results of Theorem 2 that a consistent estimator is obtained only through joint estimation of all parameters.

For a fixed sample size, these results strongly support the use of the multi-step gradient estimation algorithm, while largely validating the proposed distribution theory. Notably, the distribution of \(\hat{\eta }_i\) appears independent of \(\hat{\eta }_j\), \(i \ne j\), and the distribution of these parameters is largely unaffected by the inclusion of ARMA dynamics. Additionally, the proposed distribution theory for \(\hat{d}_i\) is confirmed. Finally, as shown below, and in numerous other simulations that are available upon request, the estimator appears to achieve the proposed rates of convergence, even when we estimate multiple GARMA components.

For the single frequency case, Chung (1996a) uses a line grid search to estimate \(\eta\), along with a gradient-based method for \(\psi _{\delta ,\tau }\). This implies that the parameter space being searched over is a countable finite set that requires the use of boundary constraints, given that a fine grid would be needed to capture an estimate of \(\eta\) near the true value. Based on the limited algorithm, Chung (1996a) provides support for the proposed theory and associated confidence bands for \(\hat{\eta }\) for all cases except when \(\eta _0=1\). Here, it would appear that the associated empirical test sizes for \(\eta _0=1\) under the null are too large to be of practical use. Beaumont and Smallwood (2022) consider the consequences of using a two-dimensional grid search over both \(\eta\) and d without the use of boundary constraints for \(\eta ,\) and show that the exact distributional results of Chung (1996a) are generally supported, with two exceptions. First, similar to Chung (1996a), Beaumont and Smallwood (2022) show that the theory under the hypothesis \(\eta _0=1\) is problematic for testing purposes, with empirical sizes that are often much higher than their associated theoretical counterparts. Secondly, when \(|\eta _{i,0} |<1\), it is shown that with the use of the proposed algorithm, the resulting empirical distribution has slightly fatter tails and a more peaked density relative to the proposed theory. In terms of calculating confidence bands, the issue appears to be very minor and disappears as the sample size increases. Nonetheless, small biases in confidence bands can result, especially as \(d_0 \rightarrow 0\). We now consider more complete simulation evidence to analyze the extent to which these previous results carry over when \(k>1\).

For varying sample sizes, we considered a variety of experiments, including models where there exists a value of \(\eta _{i,0}=1\). For brevity, the full set of results is not reported here, but is available upon request. Here, we report results for four fairly complicated 2-factor parameterizations. Model 1 is a GARMA(0,0) model with \(\{\eta _{1,0}, d_{1,0}\}= \{0.5,0.4\}\), and \(\{\eta _{2,0}, d_{2,0}\} = \{0,0.2\}\). Given the distributional results above, this parameterization represents a case where the process is expected to be especially volatile.Footnote 5 Model 2 is also a GARMA(0,0) model but with \(\{\eta _{1,0},d_{1,0}\}=\{0.98,0.45\}\), and \(\{\eta _{2,0},d_{2,0}\}=\{-0.4,0.3\}\). This parameterization approaches the region of the discontinuity in our theoretical distribution for \(\hat{\eta }\) and is also a strongly persistent process with \(d_{1,0}\) close to 0.50. Model 3 is the same as Model 1, except we add an AR(1) term with \(\phi _0 = 0.80\). Finally, we consider a case with \(\{\eta _{1,0},d_{1,0}\}=\{1.00,0.20\}\), and \(\{\eta _{2,0},d_{2,0}\}=\{-0.6,0.45\}\). The theoretical results suggest that the estimates of \(\eta _1\) and \(\eta _2\) have different rates of convergence, and given the values of \(d_{1,0}\) and \(d_{2,0}\), the process is again close to the non-stationary border. This parameterization will allow us to explore how theoretical concerns regarding the CSS estimator when \(|\eta _{i,0}|=1\) impact results for \(|\eta _{j,0}|<1,i\ne j.\)

First, we compare the theoretical and simulated distributions of \(\hat{\eta }_i\), \(i={1,2}\). Figure 2 shows the empirical and theoretical normalized cumulative distribution functions (cdf) for \(\hat{\eta }_{1}\) from Model 1 for sample sizes of \(500 \text { and } 2000\). For the empirical distributions, we plot \(T(\hat{\eta }_1 - 0.50)\), where the elements of \(\hat{\eta }\) are computed using the estimation algorithm described above, and the theoretical quantities have been calculated using Eq. (13) from Theorem 3. The vertical differences between the theoretical and empirical curves show the disagreements between the theoretically and empirically derived critical values for each percentile. The two shaded regions show areas below the 0.025 and above the 0.975 percentiles, which would be relevant for the construction of a 95% confidence interval.

Fig. 2 Percentiles of theoretical/empirical CDFs of \(\hat{\eta }_{1}\) in model 1 with \(\{\eta _{1,0}, d_{1,0}\}= \{0.5,0.4\}\)

The first observation is that the empirical and theoretical distributions are in fairly close agreement, and this agreement is consistent as the sample size increases. This suggests that the proposed rate-T convergence in Theorem 3 is strongly supported. Second, there is some evidence that the empirical tails are larger than implied by the theory, so we will now explore the consequences of any such differences.

When estimating a k-factor GARMA model, the calculation of confidence bands for \(\hat{\eta }_i\) is likely the most important application of the theory. To get a sense of how applicable our proposed distribution theory and algorithm are, Table 6 provides the estimated biases in calculating the upper and lower 68, 90, 95, and 99% confidence bands for the four models described above. As a reference point, the theoretical bands for each value of \(\hat{\eta }_i\) with T=500 are provided in italic font. Below the theoretical bands, we show the bias associated with the empirical bands for \(\hat{\eta }_{1}\), followed by those of \(\hat{\eta }_2,\) for each sample size.

Table 6 Empirical and theoretical confidence intervals of the \(\eta\)’s

For Model 1, and with relatively small samples of 500 observations, the 99% confidence bands are quite unreliable for \(\eta _{2,0}=0\). From the last two columns in the second block of Model 1 in Table 6, the theoretical confidence band for \(\hat{\eta }_2\) with \(\eta _{2,0}=0\) when \(T=500\) is \([-\,0.0423, 0.0423]\). In contrast, among the 5050 simulations, 99% of the estimated values of \(\hat{\eta }_2\) were within a range of \([-0.0910,0.0683]\), thus producing a bias of the lower 99% band of \(-\) 0.0487 (i.e., \(-\,0.0910+0.0423)\). In general, with small sample sizes, there are small but potentially non-negligible biases when using the 99% confidence bands. Otherwise, the results in Table 6 support the use of the proposed distribution theory in calculating these intervals. First, we note that the differences between the theoretical and estimated bands decrease sharply as T increases and become negligible in most cases when \(T=2000\). Throughout, 68% and 90% bands are surprisingly accurate, such that multiple confidence bands could be presented for researchers wishing to take a conservative approach. Finally, we observe that there are no qualitative differences between the estimated bands from the GARMA(1,0) specification, Model 3, and those from its GARMA(0,0) counterpart, suggesting that the values of \(\hat{\eta }_i\) are independent of ARMA components as implied by the proposed theory.

The simulations for the case with \(\eta _{1,0}=1\) merit additional discussion. First, we see that any potential concerns regarding estimation of \(\eta _1\) likely do not impact estimation of \(\eta _2\). For example, with \(\eta _{2,0}=-0.60\) and with \(T=500\), 99.5% of all values of \(\hat{\eta }_2\) were less than \(-\) 0.5793, which is quite close to the theoretical upper 99% confidence band given by \(-\) 0.5850. Similar to other experiments, the biases in estimating theoretical percentiles decline with the sample size and become negligible for \(T=2000\). For estimates of \(\eta _1\), we see that the biases in calculating confidence bands are negligible, likely reflecting the proposed rate of convergence given by \(T^2\). Nonetheless, it is important to note that the \(T^2\) factor also affects the test statistic for the hypothesis \(\eta _{1,0}=1\). More specifically, using the distribution theory outlined in Theorem 3, we obtained the empirical sizes for the null hypothesis \(H_0: \eta _{1,0}=1\) vs. the alternative \(\eta _{1,0}<1\) based on the test statistic \(T^2(\hat{\eta }_1-1)\). The results show substantial size distortion: the empirical sizes for \(T=500,1000,\) and 2000 observations were equal to 16.89, 18.48, and 18.75%, respectively, based on a 5% test size. This result matches the findings in Beaumont and Smallwood (2022), who show that the distribution theory under the null \(\eta _{1,0}=1\) can be unreliable. Computational methods likely offer resolution for researchers interested in determining if cycles are potentially infinite. In the next section, we briefly outline how to extend Beaumont and Smallwood (2022) to implement a simple parametric bootstrap in order to conduct tests of the hypothesis \(|\eta _{i,0} |=1\) in the multi-factor GARMA model.
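An empirical size of this kind is the share of replications, all generated under the null, in which the statistic \(T^2(\hat{\eta }_1-1)\) falls below the left-tail critical value. A minimal Python sketch follows. The function name is hypothetical, and the critical value, which comes from Theorem 3, is supplied as an input rather than computed.

```python
def empirical_size(eta_hats, T, critical_value):
    """Rejection rate of H0: eta_1 = 1 against eta_1 < 1 across replications.

    eta_hats: estimates of eta_1 from series simulated under the null.
    critical_value: left-tail critical value for T^2 * (eta_hat - 1).
    """
    stats = [T ** 2 * (e - 1.0) for e in eta_hats]
    return sum(s < critical_value for s in stats) / len(stats)
```

If the theoretical critical value were appropriate, a nominal 5% test would return a value close to 0.05. A substantially larger value indicates that the critical value is too small in absolute value.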

5 Application

Emerging research has demonstrated that cyclical long memory is an important characteristic of many financial time series.Footnote 6 To demonstrate the applicability of the CSS estimator and the proposed theory, we consider the weekly trading volume of IBM equities from January 1, 1962, through March 28, 2022. Without loss of generality, the data have been rescaled by dividing by the maximum value for volume. The periodogram of the difference of the resulting series is depicted in Fig. 3.

Fig. 3 Periodogram of the first difference of IBM trading volume

From the visual inspection of Fig. 3, we identified as many as 9 frequencies as candidates for spectral poles, including the origin, which dominates the periodogram for the raw series. Based on the discussion above, we then used the BIC to select k and the number of autoregressive and moving average parameters. For each k, we considered all combinations of models with \(p,q \le 3\). Among the 144 estimated models, the BIC selected the 6-frequency GARMA(2,3) model, while the Hannan–Quinn criterion marginally favored \(k=8\) over \(k=6\) when considering \(p=2\) and \(q=3\). We therefore selected the 6-frequency GARMA(2,3) model whose estimation results appear in Table 7. Results for \(k>6\) produce similar findings that are available on request.
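The count of 144 models is consistent with 9 candidate values of k combined with 4 choices each for p and q. The short Python sketch below enumerates this grid and picks the specification with the smallest criterion value. The range \(k=1,\ldots ,9\) is inferred from the count rather than stated explicitly, and the function names are hypothetical.

```python
from itertools import product


def candidate_models(max_k=9, max_order=3):
    """All (k, p, q) triples with 1 <= k <= max_k and 0 <= p, q <= max_order."""
    orders = range(max_order + 1)
    return list(product(range(1, max_k + 1), orders, orders))


def pick_by_criterion(value_by_model):
    """Return the (k, p, q) triple with the smallest criterion value."""
    return min(value_by_model, key=value_by_model.get)
```

In practice, each triple would be estimated by CSS and its BIC stored in a dictionary keyed by `(k, p, q)` before calling `pick_by_criterion`.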

For the 6-factor model, one isolated frequency is at the origin and the other 5 estimated frequencies are depicted in Fig. 3 by the vertical dotted lines.Footnote 7 Based on the simulation results as discussed above, we show 68% confidence bands under the assumption that \(|\eta _{i,0}|<1\). Additionally, for estimates of \(\psi _{\delta ,\tau }\), we present both numerical and asymptotic standard errors that are very similar and, thus, provide strong support for the proposed distribution theory.Footnote 8

Table 7 Estimation of 6-frequency GARMA(2,3) model for IBM volume

Because the estimated value of \(\eta _1\) is only marginally less than 1, there is strong evidence of a spectral pole at the origin. As discussed above, however, the distribution theory building on Chung (1996a) is suspect when \(\eta _{i,0}=1\). Consequently, we suggest that a bootstrap method may be a reliable alternative. Although the construction of a fully validated bootstrap test statistic is outside the scope of this paper, the existing literature provides guidance that we exploit here. First, note that under the null, \(H_0: \eta _{1,0}=1\), the parameter \(\eta _1\) lies on the boundary of the parameter space. In such cases, it has been established that bootstrap samples generated from unrestricted CSS estimation may yield invalid test statistics, failing to mimic the target distribution under the null (Andrews 2000; Cavaliere et al. 2017; Cavaliere and Rahbek 2021). A resolution to this problem is to use a restricted bootstrap, where samples are formed from residuals and parameters estimated under the null (Cavaliere et al. 2017). Recently, for the single-frequency GARMA model, Beaumont and Smallwood (2022) propose a restricted bootstrap method to compute critical values and demonstrate that bootstrapped test statistics for the null \(H_0: \eta _0=1\) have correct nominal size, even under potential non-stationarity.

Following Beaumont and Smallwood (2022), we generate a test statistic for the null, \(H_0:\eta _{1,0}=1\), through re-estimation of the selected 6-factor GARMA(2,3) model with \(\eta _{1,0}=1\) imposed. We sample with replacement from the estimated residuals to construct 1000 samples under the null hypothesis. We then estimate the unrestricted 6-factor model for each of the 1000 samples to obtain \(T^2(\hat{\eta }_1^{(j)}-1)\), for \(j=1,\ldots ,1000\). The test statistics are sorted to obtain bootstrapped critical values that are presented in Table 7 along with the critical values obtained using Theorem 3. As seen in the table, even with 3144 observations, the theoretical critical values appear to be far too small in absolute value when compared to the bootstrapped critical values. In this example, the discrepancy does not alter the conclusion given an estimated value so close to unity.
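The resampling loop can be outlined in Python as follows. The two callables are hypothetical placeholders for steps that depend on the fitted model: `simulate_under_null` rebuilds a series from resampled residuals with \(\eta _1=1\) imposed, and `estimate_eta1` re-estimates the unrestricted model and returns \(\hat{\eta }_1\). The quantile convention, which takes the \(\lfloor \alpha B\rfloor\)-th order statistic, is also an assumption of this sketch.

```python
import random


def bootstrap_critical_values(residuals, simulate_under_null, estimate_eta1,
                              n_boot=1000, levels=(0.01, 0.05, 0.10), seed=0):
    """Left-tail bootstrap critical values for T^2 * (eta_1_hat - 1)."""
    rng = random.Random(seed)
    T = len(residuals)
    stats = []
    for _ in range(n_boot):
        draws = rng.choices(residuals, k=T)   # resample with replacement
        x_star = simulate_under_null(draws)   # series generated with eta_1 = 1 imposed
        eta1_hat = estimate_eta1(x_star)      # unrestricted re-estimation
        stats.append(T ** 2 * (eta1_hat - 1.0))
    stats.sort()
    # Take the floor(alpha * n_boot)-th order statistic for each level.
    return {a: stats[max(0, int(a * n_boot) - 1)] for a in levels}
```

The null is rejected at level \(\alpha\) when the statistic computed from the original data lies below the corresponding bootstrapped critical value.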

It should be noted that more research is needed to determine the conditions under which the proposed bootstrap test is consistent. The main task would be an analysis of the distributional properties of the bootstrapped test statistic under the alternative. In general, as pointed out by Cavaliere and Rahbek (2021), this is a very difficult problem, and there is reason to believe the current environment presents unique challenges. In particular, the procedure above uses bootstrapped residuals obtained under the null. If the alternative hypothesis is true, the resulting disturbances are expected to possess long memory of a potentially complicated form, since the correct filter, \((1-2\eta _{1,0}L+L^2)^{d_{1,0}}\), has not been applied to the data. Further, as our theory above shows, \(\hat{d}_i\) is not independent of \(\hat{d}_j\), so that there are additional complications that arise under misspecification.Footnote 9 The behavior of the residuals in this context will be important in future research exploring formal proofs for consistency.

To the extent that there is concern with the proposed bootstrap when the null is false, Cavaliere and Rahbek (2021) propose a hybrid approach to obtain bootstrapped samples using parameters estimated under the null, while using disturbances obtained from unrestricted estimation. This avoids the issue of sampling with long-memory residuals. Specifically, let \(\varepsilon _t^*\) denote the set of residuals obtained from the unconstrained model in Table 7. Then, resampling of \(\varepsilon _t^*\) with replacement is used along with parameter estimates with \(\eta _1=1\) imposed to bootstrap samples consistent with the null under investigation. The remaining steps are the same as for the restricted bootstrap. As discussed extensively by Cavaliere and Rahbek (2021), the use of a hybrid bootstrap of this sort can be useful in instances where boundary conditions are met for a given parameter, but concerns also exist about the properties of residuals under the alternative. As a robustness check to the findings above, we conducted the hybrid bootstrap, and the results continue to yield a failure to reject the null \(\eta _{1,0}=1\) for any conventional test-size.

Finally, to put our findings into context relative to traditional time series methods, we provide estimation results associated with ARIMA models in the bottom panel of Table 7. First, unit root tests present somewhat contradictory results. Specifically, the DF-GLS test of Elliott et al. (1996) rejects the unit root null at the 5% level when a linear time trend is included, but fails to reject otherwise. Further, coefficients on linear time trends are insignificant for ARIMA models estimated in levels, where the sum of autoregressive coefficients is quite close to one. We therefore proceed by estimating an ARIMA(1,1,1) model, which yielded the lowest BIC for all model combinations with p and q less than or equal to 3. The estimated moving average coefficient is large and negative, potentially contributing to the confusion rendered from standard unit root tests.

As evidenced by a much lower BIC value (\(-\) 9467.3 vs. \(-\) 9123.3), the estimated 6-factor GARMA model yields a superior in-sample fit relative to ARIMA methods. The GARMA estimation results further yield evidence against a unit root. Perhaps most importantly, the GARMA model can capture very diverse dynamics in the data and provides additional insights for researchers analyzing long-memory cycles in financial time series. In our example of IBM trading volume, the estimated values of the Gegenbauer frequencies, \(\hat{\upsilon }_i\), range from 0.0003 to 2.8897, indicating cycle lengths of 13.05, 6.52, 4.35, 2.61 and 2.17 weeks. In addition, we detect an extremely long, potentially infinite, cycle associated with the value of \(\hat{\eta }_1\) that cannot be distinguished from unity. To our knowledge, we are the first to document the potential for multiple sources of long memory in equity trading volumes, a finding that may improve our understanding of stock market behavior.
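The mapping from a Gegenbauer frequency to a cycle length is \(2\pi /\upsilon\). The Python sketch below applies it, together with the usual Gegenbauer relation \(\upsilon =\cos ^{-1}(\eta )\), which is assumed here because it is not restated in this section. The function names are hypothetical.

```python
import math


def cycle_length(freq):
    """Cycle length, in sampling periods, implied by a frequency in radians."""
    return 2.0 * math.pi / freq


def gegenbauer_frequency(eta):
    """Frequency in radians implied by eta, assuming the relation eta = cos(freq)."""
    return math.acos(eta)
```

For the largest estimated frequency, `cycle_length(2.8897)` is approximately 2.17 weeks, in line with the shortest cycle reported above, while a frequency of 0.0003 implies a cycle of roughly 20,900 weeks, far longer than the 3144 weekly observations in the sample.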

6 Conclusions

In this paper, we review the properties of a model that captures very diverse patterns in the autocorrelation functions of data. The k-factor GARMA model generalizes existing long-memory models and has the particular advantage that the ACF can decay at a non-monotonic rate that is not necessarily symmetric about zero. In addition, the k-factor GARMA model can accommodate multiple poles in the spectral density function.

As noted by Hunt et al. (2022), providing a full set of distributional results for estimators of k-factor GARMA models has proven elusive. Building on the results in Chung (1996a, 1996b), we study a conditional sum of squares estimator and propose its asymptotic properties. The key feature of our results is that, for all possible values, the asymptotic distribution of \(\hat{\eta }_{i}\) is independent of all other parameters, including \(\hat{\eta }_{j}\), whenever \(i\ne j\). It is important to note, however, that remaining parameters, notably memory parameters, are not asymptotically independent of each other, and therefore methods that sequentially estimate these values will likely suffer from severe bias. Finally, the model parameters are shown to converge at differing rates. This greatly complicates attempts to establish rigorous initial consistency proofs, especially given potential discontinuities in the distribution theory for \(\hat{\eta }_i\). We attempt to overcome this shortcoming by conducting extensive simulations and drawing on the recent work of Beaumont and Smallwood (2022) to show that the estimator performs in precisely the way our theory predicts in nearly all cases.

The simulation results show that the estimator performs well and that the finite sample standard errors are close to the asymptotic calculations. Further, the proposed theory can be used to accurately obtain confidence bands for \(\hat{\eta }_i\). Finally, an application demonstrates the practical value of the k-factor GARMA model. The trading volume of IBM is shown to be well modeled by a six-factor GARMA model with a spectral singularity at the origin.

Given the early success of k-factor GARMA models, as discussed in the Introduction, our proposed estimator should find important applications in many fields. Further, the proposed distribution theory will likely be useful in a number of contexts where specific interest lies in uncertainty regarding the periodicity of long-memory cycles. Nonetheless, challenges still remain. More work is likely needed to determine the appropriate number of spectral poles, although we are able to provide recommendations potentially complementing the recent breakthrough by Leschinski and Sibbertsen (2019). Perhaps more importantly, the proposed distribution theory for \(\hat{\eta }_i\) directly follows Chung (1996a) in allowing the true value, \(\eta _{i,0}\), to potentially lie on the boundary of the associated parameter space. Additionally, the theoretical results suggest that a discontinuity in the distribution of \(\hat{\eta }_i\) occurs as \(|\eta _{i,0} |\rightarrow 1\), where standard rate-T convergence gives way to a \(T^2\) rate when \(|\eta _{i,0} |=1\). While extensive simulation evidence provides strong support for the proposed theory when \(|\eta _{i,0} |<1\), the findings also suggest there are concerns when \(\eta _{i,0} =1\). A proposed bootstrap test offers one potential remedy for researchers interested in testing \(\eta _{i,0}=1\), although additional theory will be required to confirm consistency of the test.