
4.1 Tools

4.1.1 Introduction

Most statistical procedures in time series analysis (and in fact statistical inference in general) are based on asymptotic results. Limit theorems are therefore a fundamental part of statistical inference. Here we first review very briefly a few of the basic principles and results needed for deriving limit theorems in the context of long-memory and related processes.

4.1.2 How to Derive Limit Theorems?

To prove the convergence of an appropriately normalized process S n (⋅), one has to verify the convergence of finite-dimensional distributions and tightness. With respect to the first issue, we usually prove just one-dimensional convergence because in most situations extensions to the multivariate case are straightforward. The tools we describe here are applicable to many statistics, not only partial sums. On the other hand, most of the statistics we will consider are just partial sums.

4.1.2.1 How to Verify Finite-Dimensional Convergence?

Suppose that X t (\(t\in\mathbb{N}\)) is a stationary process. One of the common methods for deriving limit theorems is to evaluate its characteristic function. This is, however, rarely successful in a long-memory setting. An alternative method for partial sums of long-memory sequences is to study the asymptotic behaviour of cumulants. Recall that for a given random variable X, its cumulants are the coefficients in the power series expansion of \(\kappa_{X}(z)=\log E(e^{zX})\), i.e. κ j =κ j (X) in

$$\kappa_{X}(z)=\sum_{j=0}^{\infty}\frac{z^{j}}{j!}\kappa_{j}. $$

In particular, \(\kappa_{1}=\mu_{X}=E(X)\) and \(\kappa_{2}=\sigma_{X}^{2}=\operatorname {var}(X)\). If E(X)=0, then \(\kappa_{4}=E(X^{4})-3E^{2}(X^{2})\). One of the useful properties of cumulants is that for a normal random variable X, we have κ j =0 for all j≥3, and this is only the case for the normal distribution. Moreover, a normal distribution is uniquely determined by its moments.
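As a numerical illustration, the lowest-order cumulants can be estimated from data; the sketch below (sample sizes and distributions are arbitrary choices) uses scipy's k-statistics to check the moment formula for κ 4 and the vanishing of the higher cumulants in the Gaussian case.

```python
# Illustrative sketch: estimate cumulants with scipy's k-statistics and check
# kappa_4 = E[X^4] - 3 E^2[X^2] for centred data; for Gaussian data the
# estimates of kappa_3 and kappa_4 should be close to zero.
import numpy as np
from scipy.stats import kstat

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)              # centred Gaussian sample

print(kstat(x, 4), np.mean(x**4) - 3 * np.mean(x**2) ** 2)   # both near 0

y = rng.exponential(size=100_000) - 1.0       # centred, non-Gaussian
print(kstat(y, 3), kstat(y, 4))               # clearly different from 0
```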

The justification for the approach based on cumulants is the following well-known result (see e.g. Rao 1965):

Theorem 4.1

Let S n (\(n\in\mathbb{N}\)) be a sequence of random variables such that \(E[|S_{n}|^{j}]<\infty\) for all j, and let Y be a random variable whose distribution is uniquely determined by its moments \(\mu_{j}=E(Y^{j})\) (\(j\in\mathbb{N}\)). Then the convergence of all cumulants κ j (S n ) of S n (\(j\in\mathbb{N}\)) to the corresponding cumulants κ j (Y) of Y implies that S n converges to Y in distribution.

Cumulants are useful only if all relevant moments exist. An approach that does not require finiteness of higher-order moments is the K-dependent approximation method; the following proposition is adapted from Billingsley (1968, Theorem 4.2).

Proposition 4.1

Let X t (\(t\in\mathbb{N}\)) be a stationary sequence, c n a sequence of constants, and X t,K (\(t\in\mathbb{N}\)) a sequence of K-dependent random variables. Define \(S_{n}=\sum_{t=1}^{n}X_{t}\) and \(S_{n,K}=\sum_{t=1}^{n}X_{t,K}\), and suppose that the following holds:

  1. (a)

    \(c_{n}^{-1}S_{n,K}\overset{d}{\to} S_{K} \) as n→∞;

  2. (b)

    \(S_{K}\overset{P}{\to} S\) as K→∞;

  3. (c)
    $$\lim_{K\rightarrow\infty}\limsup_{n\rightarrow\infty}P\bigl(c_{n}^{-1} |S_{n,K}-S_{n}|>\gamma\bigr)=0 $$

    for each γ>0.

Then, as n→∞,

$$c_{n}^{-1}S_{n}\overset{d}{\rightarrow}S. $$

To apply this proposition, we mention that if \(v_{K}^{2}\rightarrow v^{2}\) as K→∞, then \(N(0,v_{K}^{2})\overset{d}{\rightarrow}N(0,v^{2})\). Furthermore, this approach requires the following result for K-dependent sequences.

Lemma 4.1

Let X t,K (\(t\in\mathbb{N}\)) be a stationary sequence of K-dependent random variables with \(\operatorname {var}(X_{0,K})<\infty\), and define \(S_{n,K}=\sum_{t=1}^{n}X_{t,K}\). Then

$$n^{-\frac{1}{2}}S_{n,K}\overset{d}{\rightarrow}\sigma_{K}N(0,1), $$

where \(\sigma_{K}^{2}=\operatorname {var}(X_{0,K})+2\sum_{j=1}^{K}\mathit{cov}(X_{0,K},X_{j,K})\).
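For illustration, the following sketch (an MA(K) filter of i.i.d. noise serves as the K-dependent sequence; the coefficients and sample sizes are arbitrary) compares the Monte Carlo variance of \(n^{-1/2}S_{n,K}\) with \(\sigma_{K}^{2}\) from Lemma 4.1.

```python
# Sketch: Lemma 4.1 for an MA(K) sequence X_{t,K} = sum_{j<=K} b_j eps_{t-j},
# which is K-dependent; compare the Monte Carlo variance of n^{-1/2} S_{n,K}
# with sigma_K^2 = gamma(0) + 2 * sum_{j=1..K} gamma(j).
import numpy as np

rng = np.random.default_rng(1)
K, n, nrep = 5, 2000, 2000
b = 0.6 ** np.arange(K + 1)                    # illustrative MA coefficients

gamma = np.array([np.sum(b[: K + 1 - j] * b[j:]) for j in range(K + 1)])
sigma_K2 = gamma[0] + 2 * gamma[1:].sum()      # long-run variance

sums = np.empty(nrep)
for r in range(nrep):
    eps = rng.standard_normal(n + K)
    x = np.convolve(eps, b, mode="valid")      # length n, K-dependent
    sums[r] = x.sum() / np.sqrt(n)
print(sums.var(), sigma_K2)                    # the two values should be close
```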

Another useful result is the following martingale central limit theorem.

Lemma 4.2

Let \((X_{t,n},\mathcal{F}_{t})\) (\(t\in \mathbb{N}\), n≥1) be a martingale difference array, and define \(\tilde{X}_{t,n}=X_{t,n}-E(X_{t,n}|\mathcal{F}_{t-1})\). Furthermore, assume that the following conditions hold:

  1. (a)

    for each δ>0,

    $$\sum_{t=1}^{n}E\bigl(\tilde{X}_{t,n}^{2}1 \bigl\{|\tilde{X}_{t,n}|>\delta\bigr\} \bigr)\rightarrow 0, $$
  2. (b)
    $$\sum_{t=1}^{n}E\bigl(\tilde{X}_{t,n}^{2}\big| \mathcal{F}_{t-1}\bigr)\overset {p}{\rightarrow }1. $$

Then

$$\sum_{t=1}^{n}X_{t,n}\overset{d}{\rightarrow}N(0,1). $$

4.1.2.2 How to Verify Tightness?

There are several ways to prove tightness. A particularly useful result given in Theorem 15.6 of Billingsley (1968) provides sufficient conditions for tightness in D (the space of right-continuous functions with left limits):

Lemma 4.3

A stochastic process Y n (u) (u∈[0,1]) is tight if there exist η>1, a>0 and a nondecreasing function g such that for all \(v_{1}<u<v_{2}\) in [0,1],

$$E \bigl[ \big|Y_{n}(v_{2})-Y_{n}(u)\big|^{a}\big|Y_{n}(u)-Y_{n}(v_{1})\big|^{a} \bigr] \leq\bigl(g(v_{2})-g(v_{1})\bigr)^{\eta}. $$

In particular, assume that X t (\(t\in\mathbb{N}\)) is a stationary sequence of random variables and G is a function such that E[G(X t )]=0. Consider the partial sum process

$$ S_{n}(u)=\sum_{t=1}^{[nu]}G(X_{t})\quad\bigl(u\in[0,1]\bigr). $$
(4.1)

Applying Lemma 4.3 to the partial sum process \(Y_{n}(u)=d_{n}^{-1}S_{n}(u)\) yields the following result (see Theorem 2.1 in Taqqu 1975).

Lemma 4.4

Assume that

  1. (a)

    E[G(X 1)]=0 and E[G 2(X 1)]<∞.

  2. (b)

    \(d_{n}^{2}\sim n^{2d+1}L_{S}(n)\) with \(-\frac{1}{2}\leq d<\frac{1}{2}\) and a slowly varying function L S .

  3. (c)

    \({E}[S_{n}^{2}(1)]=O(d_{n}^{2})\).

  4. (d)

    There exists a>(2d+1)−1 such that \({E}(|S_{n}(1)|^{2a})=O((E[S_{n}^{2}(1)])^{a})\).

Then \(d_{n}^{-1}S_{n}(\cdot)\) is tight.

Proof

Assume for simplicity that L S ≡1. We note that the process S n (u), u∈[0,1], has stationary increments. In particular, for 0≤u≤v≤1, \(S_{n}(v)-S_{n}(u)\overset{{d}}{=} S_{n}(v-u)\). Thus, applying the Cauchy–Schwarz inequality and stationarity of increments, we have for v 1<u<v 2 and a suitable constant 0<C<∞,

$$E \bigl[ \big|Y_{n}(v_{2})-Y_{n}(u)\big|^{a}\big|Y_{n}(u)-Y_{n}(v_{1})\big|^{a} \bigr] \leq\bigl(E\big|Y_{n}(v_{2}-u)\big|^{2a}\bigr)^{\frac{1}{2}}\bigl(E\big|Y_{n}(u-v_{1})\big|^{2a}\bigr)^{\frac{1}{2}}\leq C\bigl((v_{2}-u)(u-v_{1})\bigr)^{(2d+1)a/2}\leq C(v_{2}-v_{1})^{(2d+1)a}. $$

Since (2d+1)a>1, Billingsley’s criterion is fulfilled, and the process is tight. □

If we restrict ourselves to d>0, then Lemma 4.3 leads to a particularly useful criterion in the long-memory case because it amounts to finding a bound on \(E[(Y_{n}(v_{2})-Y_{n}(v_{1}))^{2}]\) only.

Lemma 4.5

Assume that Y n (u) (u∈[0,1]) is a stochastic process with stationary increments. If, for some constant 0<C<∞,

$$ E \bigl[ \bigl(Y_{n}(v_{2})-Y_{n}(v_{1})\bigr)^{2} \bigr] \leq C(v_{2}-v_{1})^{2d+1}\quad\bigl(0\leq v_{1}\leq v_{2}\leq1\bigr) $$
(4.2)

with d>0, then the process is tight.

Indeed, if we consider again \(Y_{n}(u)=d_{n}^{-1}S_{n}(u)\), then

$$E \bigl[ \bigl(Y_{n}(v_{2})-Y_{n}(v_{1})\bigr)^{2} \bigr] =\frac{E [ (S_{n}(v_{2})-S_{n}(v_{1}))^{2} ]}{d_{n}^{2}}\leq C \biggl( \frac{[nv_{2}]-[nv_{1}]}{n} \biggr)^{2d+1}\leq C(v_{2}-v_{1})^{2d+1}, $$

and the exponent exceeds one since d>0. We note that this approach does not work when d≤0. Hence, in a sense, showing tightness in the long-memory case is easier than in a weakly dependent or antipersistent situation. We note further that condition (4.2) is almost the same as a moment condition for tightness of processes in C; see Theorem 12.3 in Billingsley (1968).

4.1.2.3 Functional Central Limit Theorem for Processes

The following result is used to establish a functional limit theorem for a sum of independent stochastic processes; see e.g. p. 226 of Whitt (2002).

Lemma 4.6

Let X t (u) (u∈[0,∞), \(t\in\mathbb{N}\)) be an i.i.d. sequence of processes viewed as random elements in D[0,∞). If E(X 1(u))=0, \(E(X_{1}^{2}(u))<\infty\) for each u∈[0,∞) and there exist continuous nondecreasing functions f, g and numbers a>1/2, b>1 such that

for all 0≤u<v≤∞, 0≤v 1<u<v 2<∞, then

$$n^{-1/2}\sum_{t=1}^{n}X_{t}(u)\Rightarrow G(u), $$

where G is a zero-mean Gaussian process with continuous sample paths, cov(G(0),G(u))=cov(X 1(0),X 1(u)), and ⇒ denotes weak convergence in D[0,∞).

4.1.2.4 Functional Central Limit Theorem for Inverses

The following result, known as Vervaat’s lemma (see Vervaat 1972 or De Haan and Ferreira 2006), plays a crucial role in deriving limit theorems for appropriately scaled and normalized quantile processes (as inverses of empirical processes; see Sect. 4.8.2), or counting processes (as inverses of partial sum processes; see Sect. 4.9).

Lemma 4.7

(FCLT for Inverse Functions)

Denote by D 0([0,∞)) the subset of D[0,∞) consisting of non-decreasing, non-negative, unbounded functions. Let y n (⋅) (n≥1) be a sequence of elements of D 0([0,∞)). Moreover, let y(⋅) be a continuous function on [0,∞), and c n (n≥1) a sequence of positive numbers such that c n →0. If

$$\frac{y_{n}(u)-u}{c_{n}}\rightarrow y(u) $$

uniformly on compact sets in [0,∞), then

$$\frac{y_{n}^{-1}(u)-u}{c_{n}}\rightarrow-y(u) $$

uniformly on compact sets in [0,∞), where \(y_{n}^{-1}(u):=\inf \{v:y_{n}(v)>u\}\) is the generalized inverse of y n (⋅).

It is important to mention that the continuity assumption on y(⋅) cannot be relaxed. If the limiting function has jumps, then the uniform convergence of the inverse processes does not necessarily follow. In particular, this theorem will be applicable to situations where we have weak convergence in D[0,1], equipped with the standard J 1-topology, to a continuous process, and from that we will conclude weak convergence in that topology for the inverse processes. If the limiting process has jumps, we may not be able to conclude weak convergence of the inverse processes in the same topology, even though we may have weak convergence of the original processes. Nevertheless, at least finite-dimensional convergence follows. We refer to Whitt (2002, Chap. 13) for more details.

It is also important to see that in this lemma we assume the identity function to be the correct quantity to subtract. Thus, for instance, when dealing with the empirical distribution function \(F_{n}(x)=n^{-1}\sum _{t=1}^{n} 1\{X_{t}\leq x\}\) (where XF X ), the result actually refers to \(\tilde{F}_{n}(x)=n^{-1}\sum_{t=1}^{n} 1\{F_{X}(X_{t})\leq x\}\) and the corresponding inverse. The reason is that F X (X) is uniformly distributed, so that we are in the situation described in Vervaat’s lemma. The result for F n (and \(F_{n}^{-1}\)) then follows by the continuous mapping theorem.
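The following sketch illustrates this uniformization numerically (the sample size and the evaluation grid are arbitrary choices): the empirical distribution function of \(U_{t}=F_{X}(X_{t})\) and its generalized inverse are computed with np.searchsorted, and the two scaled deviation processes are approximately negatives of each other, as in Vervaat's lemma.

```python
# Sketch: generalized inverse of the empirical d.f. of U_t = F_X(X_t),
# i.e. the setting in which Vervaat's lemma is applied.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
u_sample = np.sort(rng.uniform(size=n))        # F_X(X_t) is Uniform(0,1)

def F_n(x):
    """Empirical d.f. of the uniformized sample."""
    return np.searchsorted(u_sample, x, side="right") / n

def F_n_inv(u):
    """Generalized inverse inf{v : F_n(v) > u}, i.e. the empirical quantile."""
    k = np.searchsorted(np.arange(1, n + 1) / n, u, side="right")
    return u_sample[np.minimum(k, n - 1)]

grid = np.linspace(0.05, 0.95, 10)
c_n = 1 / np.sqrt(n)
print((F_n(grid) - grid) / c_n)                # approximately equal entrywise ...
print(-(F_n_inv(grid) - grid) / c_n)           # ... to minus the quantile deviations
```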

4.1.3 Spectral Representation of Stationary Sequences

In this section we collect several standard results on spectral theory for stationary processes. Some of these properties have been used in the preliminary discussion on long memory, see Chap. 1. We state these results without a reference since they can be found in standard textbooks on time series such as Brockwell and Davis (1991).

Recall that for a zero-mean second-order stationary process X t (\(t\in\mathbb{Z}\)) with autocovariances γ X (k), there is a spectral distribution function F such that

$$\gamma_{X}(k)=\int_{-\pi}^{\pi}e^{ik\lambda}\,dF(\lambda). $$

Moreover, X t has a spectral representation of the form

$$X_{t}(\omega)=\int_{-\pi}^{\pi}e^{it\lambda}\,dM(\lambda;\omega), $$

where M(⋅;ω) is a spectral measure (for simplicity, we will often write M(λ) instead of M(λ;ω)). The spectral measure is a complex-valued zero mean stochastic process on [−π,π] with (a.s.) right-continuous sample paths and uncorrelated (but not necessarily independent) increments with a variance that is directly related to F. More specifically, we have

$$E\bigl[\bigl \vert M(\lambda_{2})-M(\lambda_{1})\bigr \vert ^{2}\bigr]=F(\lambda_{2})-F(\lambda_{1})\quad(-\pi\leq\lambda_{1}\leq\lambda_{2}\leq\pi). $$
In particular, if the spectral density exists, then we may write the infinitesimal equation \(\operatorname {var}(dM(\lambda))=E[ | dM(\lambda)| ^{2}] =f(\lambda)\,d\lambda\).

It is important to distinguish between the role of the spectral distribution F and the spectral measure M. The spectral distribution determines the autocovariance structure, i.e. linear dependence, of the process only. In contrast, the spectral measure fully specifies the process (in the sense of the probability distribution of sample paths). In the special case where M=M ε with \(E[| dM_{\varepsilon}(\lambda)| ^{2}] =\sigma_{\varepsilon}^{2}/(2\pi)\cdot d\lambda\) we obtain a white noise process with variance \(\sigma_{\varepsilon}^{2}\) where “white noise” stands for uncorrelated observations. This follows directly from the spectral representation

$$ \varepsilon_{t}=\int_{-\pi}^\pi e^{it\lambda}\,dM_{\varepsilon}(\lambda)\quad (t\in\mathbb{Z)} $$
(4.3)

since

$$\mathit{cov}(\varepsilon_{t},\varepsilon_{s})=E \biggl[ \int_{-\pi}^{\pi}e^{it\lambda}\,dM_{\varepsilon}(\lambda)\overline{\int_{-\pi}^{\pi}e^{is\lambda}\,dM_{\varepsilon}(\lambda)} \biggr] =\int_{-\pi}^{\pi}e^{i(t-s)\lambda}\frac{\sigma_{\varepsilon}^{2}}{2\pi}\,d\lambda=\sigma_{\varepsilon}^{2}1\{t=s\}. $$
The spectral density of ε t is \(f_{\varepsilon}(\lambda )=\sigma_{\varepsilon}^{2}/(2\pi)\). One should bear in mind that, in general, this does not imply the independence of ε t (\(t\in\mathbb {Z}\)). Such a direct conclusion can only be made if M(λ;ω) is a Gaussian process.

A zero mean, purely nondeterministic second-order stationary process always has a Wold decomposition

$$X_{t}=\sum_{j=0}^{\infty}a_{j}\varepsilon_{t-j}=A(B)\varepsilon_{t} \quad (t\in\mathbb{Z}) $$

with uncorrelated (i.e. “white noise”) innovations ε t and A(z)=∑a j z j such that \(\sum_{j=0}^{\infty}a_{j}^{2}<\infty\). Therefore, the spectral measure and spectral distribution have a simple form, namely (with equality in the L 2(Ω) sense)

$$ X_{t}=\int_{-\pi}^{\pi}e^{it\lambda}\,dM_{X}(\lambda) =\int_{-\pi}^{\pi }e^{it\lambda}A\bigl(e^{-i\lambda}\bigr)\, dM_{\varepsilon}(\lambda) \quad (t\in\mathbb{Z}). $$
(4.4)

In other words,

$$dM_{X}(\lambda)= \Biggl( \sum_{j=0}^{\infty}a_{j}e^{-ij\lambda} \Biggr) \,dM_{\varepsilon}(\lambda) =A\bigl(e^{-i\lambda}\bigr) \,dM_{\varepsilon}(\lambda). $$

The spectral density

$$f_{X}(\lambda)=\frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_{X}(k)\exp (-i\lambda k) $$

is then given by

$$f_{X}(\lambda)=\frac{\sigma_{\varepsilon}^{2}}{2\pi} \Biggl \vert \sum_{j=0}^{\infty}a_{j}e^{-ij\lambda} \Biggr \vert ^{2}= \frac{\sigma_{\varepsilon}^{2}}{2\pi} \bigl \vert A\bigl(e^{-i\lambda}\bigr)\bigr \vert ^{2}. $$

These formulas are valid generally. More specifically, if we consider linear processes only, the ε t s in the Wold representation are not only uncorrelated but even independent. This means that the increments of M ε are independent (instead of being just uncorrelated). Even more specifically, a Gaussian process is a linear process that has normally distributed ε t s, namely \(\varepsilon_{t}\sim N(0,\sigma_{\varepsilon}^{2})\). This means that we are in the following situation. The measure M ε is a Gaussian spectral measure such that for all sets A, E[M ε (A)]=0, \(E[M_{\varepsilon}(A)\overline{M_{\varepsilon}(B)}]=0\) for all disjoint sets A and B, and \(E[M_{\varepsilon}(A)\overline{M_{\varepsilon}(A)}] =\sigma_{\varepsilon}^{2}|A|/(2\pi)\), where |⋅| denotes the Lebesgue measure. Moreover, for all λ 1<λ 2≤λ 3<λ 4, the increments M ε (λ 4)−M ε (λ 3) and M ε (λ 2)−M ε (λ 1) are independent. (For simplicity of notation, we will mostly assume that \(\sigma_{\varepsilon}^{2}=1\), which means that M ε (⋅) is a spectral measure of an i.i.d. N(0,1) sequence.) The Gaussian process X t is then given by

$$ X_{t}=\sum_{j=0}^{\infty}a_{j}\varepsilon_{t-j}=\int_{-\pi}^{\pi} e^{it\lambda}\,dM_{X}(\lambda)\quad(t\in\mathbb{N}), $$
(4.5)

where M X is the Gaussian spectral measure defined by

$$dM_{X}(\lambda)= \Biggl( \sum_{j=0}^{\infty}a_{j}e^{-ij\lambda} \Biggr) \,dM_{\varepsilon}(\lambda)=A\bigl(e^{-i\lambda}\bigr)\,dM_{\varepsilon}( \lambda )=:\sqrt{2\pi}a(\lambda)\,dM_{\varepsilon}(\lambda). $$

Note that in the notation with a(λ), the spectral density can be written as

$$f_{X}(\lambda)=\sigma_{\varepsilon}^{2}\bigl \vert a( \lambda)\bigr \vert ^{2}. $$

Thus, for \(\sigma_{\varepsilon}^{2}=1\), we have the identity f X (λ)=|a(λ)|2.
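As a numerical check (with an illustrative MA(2) filter and \(\sigma_{\varepsilon}^{2}=1\)), the sketch below evaluates the spectral density once through the transfer function \(|A(e^{-i\lambda})|^{2}/(2\pi)\) and once through the autocovariance sum; the two expressions coincide.

```python
# Sketch: for an MA(q) process X_t = sum_j a_j eps_{t-j} with var(eps) = 1,
# check that |A(e^{-i lam})|^2/(2 pi) equals (1/2pi) sum_k gamma(k) e^{-ik lam}.
import numpy as np

a = np.array([1.0, 0.5, -0.3])                      # illustrative MA(2) coefficients
q = len(a) - 1
lam = np.linspace(-np.pi, np.pi, 7)

A = np.array([np.sum(a * np.exp(-1j * np.arange(q + 1) * l)) for l in lam])
f_transfer = np.abs(A) ** 2 / (2 * np.pi)

gamma = np.array([np.sum(a[: q + 1 - k] * a[k:]) for k in range(q + 1)])
k = np.arange(-q, q + 1)
gamma_full = np.concatenate([gamma[:0:-1], gamma])  # gamma(-q), ..., gamma(q)
f_acov = np.array([np.real(np.sum(gamma_full * np.exp(-1j * k * l)))
                   for l in lam]) / (2 * np.pi)

print(np.allclose(f_transfer, f_acov))              # True
```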

Another result that is very useful in many situations, such as prediction or (Gaussian) maximum likelihood estimation, is the following factorization of the spectral density. Let us write logf X as a Fourier series

$$\log f_{X}( \lambda) =\sum_{j=-\infty}^{\infty}\alpha_{j}e^{-ij\lambda} $$

with coefficients

$$ \alpha_{j}=\alpha_{-j}=\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{ij\lambda}\log f_{X}( \lambda) \,d\lambda. $$
(4.6)

Then we obtain the factorization

$$ f_{X} ( \lambda ) =\exp ( \alpha_{0} ) \bigl \vert A \bigl( e^{-i\lambda} \bigr) \bigr \vert ^{2}=\frac{\sigma_{\varepsilon}^{2}}{2\pi }\bigl \vert A \bigl( e^{-i\lambda} \bigr) \bigr \vert ^{2}=:\frac{\sigma_{\varepsilon}^{2}}{2\pi}h_{X}( \lambda), $$
(4.7)

where

$$A ( z ) =\sum_{j=0}^{\infty}a_{j}z^{j}= \exp \Biggl( \sum_{j=1}^{\infty} \alpha_{j}z^{j} \Biggr) $$

and

$$\frac{\sigma_{\varepsilon}^{2}}{2\pi}=\exp( \alpha_{0}) . $$

The last equation, together with (4.6), implies

$$\alpha_{0}=\frac{1}{2\pi}\int_{-\pi}^{\pi}\log f_{X}( \lambda) \,d\lambda=\log\sigma_{\varepsilon}^{2}-\log2\pi. $$

For the function h X (⋅) defined in (4.7), we therefore obtain

$$ \int_{-\pi}^{\pi}\log h_{X}( \lambda) \,d\lambda=0. $$
(4.8)

This property is particularly useful for the asymptotic theory of (Gaussian) quasi-maximum likelihood estimation.
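A numerical sketch of the factorization (the AR(1) model and its parameters are illustrative choices): the cepstral coefficient α 0 recovers \(\sigma_{\varepsilon}^{2}/(2\pi)\), and the integral of logh X over [−π,π] vanishes as in (4.8).

```python
# Sketch: factorization (4.7)-(4.8) for an AR(1) process with |phi| < 1:
# alpha_0 = (2pi)^{-1} int log f_X = log(sigma_eps^2 / (2pi)), and
# int log h_X d(lambda) = 0 with h_X = (2pi / sigma_eps^2) f_X.
import numpy as np

phi, sigma2 = 0.7, 2.0
lam = np.linspace(-np.pi, np.pi, 200_001)
f = sigma2 / (2 * np.pi) / np.abs(1 - phi * np.exp(-1j * lam)) ** 2

alpha0 = np.trapz(np.log(f), lam) / (2 * np.pi)
print(np.exp(alpha0) * 2 * np.pi, sigma2)           # innovation variance recovered

h = 2 * np.pi * f / sigma2
print(np.trapz(np.log(h), lam))                     # approximately 0, cf. (4.8)
```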

Finally, the following lemma is useful in spectral analysis of stationary sequences (see Lemma 2 in Moulines et al. 2007a). Consider the spectral radius Sp(A) of an n×n matrix A, defined as the maximal absolute eigenvalue, or

$$\mathit{Sp}(A)=\sup_{\mathbf{x}\in\mathbb{R}^{n}:\| \mathbf{x}\| \leq 1}\mathbf{x}^{T}A\mathbf{x}. $$

Now let A=Σ n =[γ X (i−j)] i,j=1,…,n be the covariance matrix of X=(X 1,…,X n )T, where X t is a zero-mean stationary process with spectral density f X . Then

$$\mathbf{x}^{T}\varSigma_{n}\mathbf{x}=\int_{-\pi}^{\pi}\Biggl \vert \sum_{t=1}^{n}x_{t}e^{it\lambda}\Biggr \vert ^{2}f_{X}(\lambda)\,d\lambda\leq\sup_{\lambda\in[-\pi,\pi]}\big|f_{X}(\lambda)\big|\int_{-\pi}^{\pi}\Biggl \vert \sum_{t=1}^{n}x_{t}e^{it\lambda}\Biggr \vert ^{2}\,d\lambda=2\pi\sup_{\lambda\in[-\pi,\pi]}\big|f_{X}(\lambda)\big|\,\| \mathbf{x}\| ^{2}, $$

where the last expression follows from the Parseval identity. Hence, we have the following result.

Lemma 4.8

Assume that X t (\(t\in\mathbb{Z}\)) is a stationary process with the spectral density f X . Assume that Σ n is the covariance matrix of X 1,…,X n . Then

$$\mathit{Sp}(\varSigma_{n})\le2\pi\sup_{\lambda\in[-\pi,\pi]}\big|f_{X}(\lambda)\big|. $$
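The bound of Lemma 4.8 is easy to check numerically; in the sketch below (an AR(1) covariance structure with arbitrary parameters) the largest eigenvalue of Σ n stays below \(2\pi\sup_{\lambda}f_{X}(\lambda)\).

```python
# Sketch: Lemma 4.8 for an AR(1) process, gamma(k) = sigma2 * phi^|k| / (1 - phi^2);
# the top eigenvalue of Sigma_n stays below 2*pi*sup f_X = sigma2 / (1 - phi)^2.
import numpy as np
from scipy.linalg import toeplitz

phi, sigma2, n = 0.6, 1.0, 400
gamma = sigma2 * phi ** np.arange(n) / (1 - phi ** 2)
Sigma_n = toeplitz(gamma)

sp = np.linalg.eigvalsh(Sigma_n).max()
bound = sigma2 / (1 - phi) ** 2        # 2*pi*sup_lambda f_X(lambda) for AR(1)
print(sp, bound)                        # sp <= bound
```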

4.2 Limit Theorems for Sums with Finite Moments

4.2.1 Introduction

Let X t (\(t\in\mathbb{N}\)) be a stationary process. The asymptotic behaviour of partial sums

$$ S_{n}(u)=S_{n,G}(u)=\sum_{t=1}^{[nu]}G(X_{t}) $$
(4.9)

is at the core of probability theory. In this section we present limit theorems for partial sums associated with long-memory or antipersistent processes. Two types of distinctions have to be made. One is between linear and nonlinear processes. The other is between processes with finite and infinite variance. The case of infinite variance is studied in Sect. 4.3. Depending on which of these cases is considered, different results and mathematical techniques are required.

In this section we discuss finite-variance processes only. We will begin our exposition by assuming that X t (\(t\in\mathbb{N}\)) is a Gaussian process, since computations and proofs are technically less challenging than for instance for general Appell polynomials. The limiting phenomena related to partial sums of subordinated Gaussian sequences were observed first by Rosenblatt (1961) and then developed independently by Taqqu (1975, 1977, 1979), Dobrushin (1980) and Dobrushin and Major (1979). Further developments can be found in Breuer and Major (1983), Giraitis and Surgailis (1985), Ho and Sun (1987, 1990), Dehling and Taqqu (1989a, 1989b) and Arcones (1994). Although the original technique in Taqqu (1975) to show convergence to the so-called Hermite–Rosenblatt distribution was based on characteristic functions, the common method to obtain a non-central limit theorem is based on (multiple) Wiener–Itô integrals, together with the diagram formula. For long-memory linear processes, the first result was obtained in Davydov (1970a, 1970b); see also Gorodetskii (1977), Lang and Soulier (2000), Wang et al. (2003).

As for subordinated linear processes, there are two common approaches: Appell polynomials (Surgailis 1981, 1982; Giraitis 1985; Giraitis and Surgailis 1986, 1989; Avram and Taqqu 1987; Surgailis and Vaičiulis 1999; Surgailis 2000; also see Surgailis 2003 for a review) and a martingale decomposition (Ho and Hsing 1996, 1997; Giraitis and Surgailis 1999; Wu 2003; see also Hsing 2000 for a review).

The theory for nonlinear models with long memory is less well developed. EGARCH-type models were considered in Surgailis and Viano (2002), whereas results for LARCH(∞) processes can be found for instance in Giraitis et al. (2000c), Giraitis and Surgailis (2002), Berkes and Horváth (2003), Beran (2006).

4.2.2 Normalizing Constants for Stationary Processes

Before getting into the details of limiting distributions, a first question can be answered relatively easily, namely which normalizing sequences should be used to obtain nondegenerate limits. Let \(S_{n}=\sum _{t=1}^{n}X_{t}\), where X t (\(t\in\mathbb{N}\)) is a stationary sequence with appropriate moment conditions. We consider the asymptotic behaviour of \(\operatorname {var}(S_{n})\) in three cases: long memory, short memory and antipersistence.

Lemma 4.9

(Long Memory)

Let X t (\(t\in\mathbb{N}\)) be a stationary sequence with γ X (k)∼L γ (k)k 2d−1 (k→∞) for some \(0<d<\frac{1}{2}\), where L γ is slowly varying at infinity. Then, as n→∞,

$$ \operatorname {var}(S_{n})\sim L_{S}(n)n^{2d+1} $$
(4.10)

with

$$ L_{S}(n)=L_{1}(n)=C_{1}L_{\gamma}(n)=\frac{1}{d\left( 2d+1\right) }L_{\gamma}(n). $$
(4.11)

Proof

We have

$$\operatorname {var}(S_{n})=\sum_{t=1}^{n}\sum_{s=1}^{n}\gamma_{X}(t-s)=n\sum_{k=-(n-1)}^{n-1}\biggl(1-\frac{|k|}{n}\biggr)\gamma_{X}(k). $$

The last expression can be written as

$$n\sum_{k=-(n-1)}^{n-1}\biggl(1-\frac{|k|}{n}\biggr)L_{\gamma}\bigl(|k|\bigr)|k|^{2d-1}\bigl(1+o(1)\bigr)\sim n^{2d+1}L_{\gamma}(n)\int_{-1}^{1}\bigl(1-|u|\bigr)|u|^{2d-1}\,du=\frac{1}{d(2d+1)}L_{\gamma}(n)n^{2d+1}. $$

 □

Lemma 4.10

(Short Memory)

Let X t (\(t\in\mathbb{N}\)) be a stationary sequence with \(\sum_{k=-\infty }^{\infty }\gamma_{X}(k)>0\) and \(\sum_{k=-\infty}^{\infty}| \gamma_{X}(k)| <\infty\). Then, as n→∞,

$$ \operatorname {var}(S_{n})\sim c_{S}n $$
(4.12)

with

$$ c_{S}=\sum_{k=-\infty}^{\infty}\gamma_{X}(k). $$
(4.13)

Proof

Cesaro summability implies

$$\sum_{k=-(n-1)}^{n-1}\frac{|k|}{n}\gamma_{X}(k)\rightarrow0, $$

so that

$$\operatorname {var}(S_{n})\sim n\sum_{k=-(n-1)}^{n-1}\gamma_{X}(k)\sim c_{S}n. $$

 □

Lemma 4.11

(Antipersistence)

Let X t (\(t\in\mathbb{N}\)) be a stationary sequence with γ X (k)∼L γ (k)k 2d−1 (k→∞) for some \(-\frac{1}{2}<d<0\), where L γ is slowly varying at infinity, and

$$\sum_{k=-\infty}^{\infty}\gamma_{X}(k)=0. $$

Then, as n→∞,

$$ \operatorname {var}(S_{n})\sim L_{S}(n)n^{2d+1} $$
(4.14)

with

$$ L_{S}(n)=\frac{1}{d( 2d+1) }L_{\gamma}(n). $$
(4.15)

Proof

Since \(\sum_{k=-\infty}^{\infty}\gamma_{X}(k)=0\), we can write

$$\operatorname {var}(S_{n})=n\sum_{k=-(n-1)}^{n-1}\biggl(1-\frac{|k|}{n}\biggr)\gamma_{X}(k)=-n\sum_{|k|\geq n}\gamma_{X}(k)-\sum_{k=-(n-1)}^{n-1}|k|\gamma_{X}(k). $$

Then the result follows by the same arguments as in the long-memory case. □

Note that in the proof of Lemma 4.11, the Riemann approximation could not be applied to \(\sum_{k=-(n-1)}^{n-1}\gamma_{X}(k)\) directly because u 2d−1 is not integrable at the origin for d<0. Note also that in the antipersistent case, L γ (k)<0 for k large enough. However, since L γ (k) is multiplied by d −1, the slowly varying function L S (n) is positive asymptotically.

Taking into account Theorem 1.3, a unified formula including (4.10), (4.12) and (4.14) can be written in terms of the spectral density. Using the notation

$$L_{f}(\lambda)=L_{\gamma}\bigl(\lambda^{-1}\bigr)\pi^{-1}\varGamma(2d)\sin\biggl( \frac{\pi}{2}-\pi d\biggr) $$

and

$$ \nu(d)=\frac{2\varGamma(1-2d)\sin(\pi d)}{d(2d+1)}\quad\Bigl(\nu(0):=\lim_{d\rightarrow0}\nu(d)=2\pi\Bigr), $$
(4.16)

we have

$$\operatorname {var}(S_{n})\sim\nu(d)L_{f} \bigl( n^{-1} \bigr) n^{2d+1}\sim\nu(d)f_{X}\bigl(n^{-1}\bigr)n. $$
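The growth rate in Lemma 4.9 can be verified directly from the exact identity \(\operatorname {var}(S_{n})=\sum_{|k|<n}(n-|k|)\gamma_{X}(k)\); the sketch below uses the hypothetical autocovariance \(\gamma_{X}(k)=(1+k)^{2d-1}\), for which \(L_{\gamma}\rightarrow1\).

```python
# Sketch: var(S_n) = sum_{|k|<n} (n - |k|) gamma_X(k) versus the long-memory
# approximation n^{2d+1} / (d(2d+1)) of Lemma 4.9, for gamma_X(k) ~ k^{2d-1}.
import numpy as np

d = 0.3
gamma = lambda k: (1.0 + k) ** (2 * d - 1)      # gamma_X(k), slowly varying part -> 1

for n in (10**3, 10**4, 10**5):
    k = np.arange(1, n)
    var_Sn = n * gamma(0) + 2 * np.sum((n - k) * gamma(k))
    approx = n ** (2 * d + 1) / (d * (2 * d + 1))
    print(n, var_Sn / approx)                    # ratio approaches 1
```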

4.2.3 Subordinated Gaussian Processes

We begin our exposition by assuming that X t (\(t\in\mathbb{N}\)) are normal random variables because computations and proofs are technically less challenging than in the case of Appell polynomials, for instance. The limiting phenomena related to partial sums of subordinated Gaussian sequences were first observed by Rosenblatt (1961) and then developed independently by Taqqu (1975, 1977, 1979) Dobrushin (1980) and Dobrushin and Major (1979). Further developments can be found in Breuer and Major (1983), Giraitis and Surgailis (1985), Ho and Sun (1987, 1990) and Arcones (1994). Although the original technique in Taqqu (1975) to show convergence to the so-called Hermite–Rosenblatt distribution was based on characteristic functions, the common method to obtain non-central limit theorems is based on (multiple) Wiener–Itô integrals, together with the diagram formula.

4.2.3.1 Moment Bounds and Normalizing Constants

Recall from Sect. 3.1.2 that each function G(⋅) in \(L^{2}(\mathbb{R},\phi)\) with \(\phi (x)=(2\pi)^{-1/2}\*\exp(-x^{2}/2)\) can be expanded as

$$G(X)=E\bigl[G(X)\bigr]+\sum_{l=1}^{\infty} \frac{J(l)}{l!}H_{l}(X)=E\bigl[G(X)\bigr]+\sum _{l=m}^{\infty}\frac{J(l)}{l!}H_{l}(X), $$

where J(l)=E[G(X)H l (X)], X is a standard Gaussian random variable, and m is the Hermite rank of G (i.e. the smallest m≥1 such that \(J(m)\not=0\)). Moreover, recall the formula (3.16) for \(H_{m}(\sum_{j=1}^{l}a_{j}x_{j})\),

$$ H_{m} \Biggl( \sum_{j=1}^{l}a_{j}x_{j} \Biggr) =\sum_{m_{1}+\cdots+m_{l} =m}\frac{m!}{m_{1}!\cdots m_{l}!}\prod _{j=1}^{l}a_{j}^{m_{j}}H_{m_{j}} (x_{j}) . $$
(4.17)

This was used for deriving the formula for covariances of Hermite polynomials given in Lemma 3.5. For convenience, we repeat the result here:

Lemma 4.12

Let X 1, X 2 be a pair of jointly standard normal random variables with covariance γ=cov(X 1,X 2). Then

$$ \mathit{cov} \bigl( H_{l}(X_{1}),H_{l}(X_{2}) \bigr) =l!\gamma^{l}, $$
(4.18)

whereas for jl,

$$ \mathit{cov} \bigl( H_{j}(X_{1}),H_{l}(X_{2}) \bigr) =0. $$
(4.19)
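Lemma 4.12 can be checked by simulation; the sketch below uses numpy's probabilists' Hermite polynomials (hermeval), which coincide with the H l used here, and a bivariate standard normal pair with an arbitrarily chosen correlation.

```python
# Sketch: Monte Carlo check of cov(H_l(X1), H_l(X2)) = l! * gamma^l for
# jointly standard normal (X1, X2) with correlation gamma.
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval   # probabilists' Hermite He_l = H_l

rng = np.random.default_rng(3)
gamma_corr = 0.6
N = 1_000_000
z1 = rng.standard_normal(N)
z2 = gamma_corr * z1 + np.sqrt(1 - gamma_corr**2) * rng.standard_normal(N)

for l in (1, 2, 3):
    c = np.zeros(l + 1); c[l] = 1.0               # coefficient vector selecting He_l
    h1, h2 = hermeval(z1, c), hermeval(z2, c)
    print(l, np.mean(h1 * h2), math.factorial(l) * gamma_corr**l)
```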

In particular, assume now that

$$\gamma_{X}(k)\sim L_{\gamma}(k)k^{2d-1}$$

with d∈(0,1/2), and consider the sum of H m (X t ). From Lemma 4.12 we see that if \(d>\frac{1}{2}(1-m^{-1})\), the autocovariance \(\gamma _{H_{m}}(k)=\mathit{cov}(H_{m}(X_{t}),H_{m}(X_{t+k}))\) of the transformed process H m (X t ) is not summable because it is (up to the slowly varying function) of the order k m(2d−1) with m(2d−1)>−1. Using the same argument as in the proof of Lemma 4.9, we then obtain

$$ \operatorname {var}\Biggl( \sum_{t=1}^{n}H_{m}(X_{t}) \Biggr) =m!\sum_{k=1}^{n}\sum _{j=1}^{n}\gamma_{X}^{m}(j-k) \sim L_{m}(n)n^{(2d-1)m+2}, $$
(4.20)

where

$$ L_{m}(n)=m!C_{m}L_{\gamma}^{m}(n) $$
(4.21)

and

$$ C_{m}=\frac{2}{[(2d-1)m+1][(2d-1)m+2]}. $$
(4.22)

Furthermore, if G has the Hermite rank m, then the variance of G(X) can be decomposed into (orthogonal) contributions of the Hermite coefficients,

$$ \operatorname {var}\bigl( G ( X ) \bigr) =\sum_{l=1}^{\infty} \biggl( \frac{J(l)}{l!} \biggr)^{2}l!=\sum _{l=m}^{\infty}\frac{J^{2}(l)}{l!}. $$
(4.23)

Similarly, if X 1 and X 2 are as in Lemma 4.12,

$$ \mathit{cov} \bigl( G(X_{1}),G(X_{2}) \bigr) =\sum _{l=m}^{\infty}\frac{J^{2}(l)}{l!} \gamma^{l}. $$
(4.24)

Consequently, applying this to the stationary Gaussian sequence X t (\(t\in\mathbb{N}\)), we obtain

$$ \gamma_{G}(k)=\mathit{cov}\bigl(G(X_{t}),G(X_{t+k}) \bigr)=\sum_{l=m}^{\infty}\frac {J^{2}(l)}{l!}\gamma_{X}^{l}(k). $$
(4.25)

Thus, as k→∞, the asymptotic behaviour of cov(G(X t ),G(X t+k )) is determined by the leading term \((J^{2}(m)/m!)\gamma _{X}^{m}(k)\). From (4.25) we therefore conclude that for a function G with the Hermite rank m, the asymptotic behaviour of the autocovariance is given by

$$\gamma_{G}(k)\sim\frac{J^{2}(m)}{m!}L_{\gamma}^{m}(k)k^{m(2d-1)}\quad (k\rightarrow\infty). $$

Therefore, if m(1−2d)<1, then by the same argument as in (4.20),

$$ \operatorname {var}\Biggl( \sum_{t=1}^{n}G(X_{t}) \Biggr) \sim\frac{J^{2}(m)}{m!}C_{m}L_{\gamma}^{m}(n)n^{(2d-1)m+2}= \biggl( \frac{J(m)}{m!} \biggr)^{2}L_{m}(n)n^{(2d-1)m+2}, $$
(4.26)

where C m is the constant in (4.22), and L m (⋅) is the slowly varying function defined in (4.21). Otherwise, if m(1−2d)>1, then

$$\sum_{k=1}^{\infty}\big|\mathit{cov}\bigl(G(X_{t}),G(X_{t+k})\bigr)\big|<\infty. $$

Therefore, one can expect two different types of convergence: either a long-memory type where the normalization for partial sums is

$$ n^{-((d-\frac{1}{2})m+1)}L_{m}^{-\frac{1}{2}}(n)=n^{-(1-m(\frac{1}{2}-d))}L_{m}^{-\frac{1}{2}}(n) $$
(4.27)

or a weakly-dependent type with the usual normalization n −1/2.

We conclude the discussion of normalizing constants by mentioning two useful bounds derived by Arcones (1994):

  • If m(1−2d)<1, then there is a constant C such that for any function G with Hermite rank m,

    $$\operatorname {var}\Biggl( n^{-1}\sum_{t=1}^{n}G(X_{t}) \Biggr) \leq C\gamma_{X}^{m}(n)\operatorname {var}\bigl(G(X_{1})\bigr). $$
  • If m(1−2d)>1, then there is a constant C such that for any function G with Hermite rank m,

    $$\operatorname {var}\Biggl( n^{-1}\sum_{t=1}^{n}G(X_{t}) \Biggr) \leq Cn^{-1}\operatorname {var}\bigl(G(X_{1})\bigr). $$

The first inequality looks very similar to (4.26). However, the important difference is that the constant C depends on the Gaussian process X t only and not on the function G.

4.2.3.2 Limiting Distribution

The Hermite rank of G(x)=x is one. Furthermore, \(\sum_{t=1}^{[nu]}X_{t}\) is normally distributed for all n and u∈[0,1]. Therefore, in view of (4.27), the following result is obvious. Note that it is valid for all values of \(d\in(-\frac{1}{2},\frac {1}{2})\), i.e. for long memory (\(d\in(0,\frac{1}{2})\)), short memory (d=0) and antipersistence (\(d\in(-\frac{1}{2},0)\)). The limiting process is Gaussian. The dependence structure of the increments depends on d.

Theorem 4.2

Assume that X t (\(t\in \mathbb{N}\)) is a stationary sequence of standard normal random variables such that f X (λ)=L f (λ)|λ|−2d with d∈(−1/2,1/2) and the assumptions of Lemma 4.9 (for d>0), Lemma 4.10 (for d=0) or Lemma 4.11 (for d<0) hold respectively. Let \(S_{n}(u)=\sum_{t=1}^{[nu]}X_{t}\). Then

$$n^{-(d+\frac{1}{2})}L_{1}^{-\frac{1}{2}}(n)S_{n}(u)\Rightarrow B_{H}(u)\quad\bigl(u\in[0,1]\bigr), $$

where B H (⋅) is a standard fractional Brownian motion with Hurst parameter \(H=d+\frac {1}{2}\), ⇒ denotes weak convergence in D[0,1], and L 1(n)=L f (n −1)ν(d) with ν(d) defined in (4.16).

Proof

As mentioned in the introduction to this chapter, we prove finite-dimensional convergence just in the one-dimensional case. Clearly, S n (u) is normal, and \(r_{n}^{2}=\operatorname {var}(S_{n}(1))/(n^{2d+1}L_{1}(n))\rightarrow1\). Thus, with \(d_{n}^{2}=n^{2d+1}L_{1}(n)\),

$$E \bigl( e^{i\theta d_{n}^{-1}S_{n}(1)} \bigr) =\exp\biggl(-\frac{1}{2} \theta^{2}r_{n}^{2}\biggr)\rightarrow\exp\bigl(- \theta^{2}/2\bigr). $$

Thus, the one-dimensional distributions of \(d_{n}^{-1}S_{n}(u)\) converge to the corresponding normal distributions.

For tightness, note that S n (1) is normal, so that \(E[S_{n}^{2l}(1)]\) (\(l\in\mathbb{N}\)) is proportional to \(( E[S_{n}^{2}(1)]) ^{l}\). Therefore, the conditions of Lemma 4.4 are fulfilled, and tightness follows. □

We will now present another proof of this theorem. The reason is that it will be easily extendable to more complicated cases of general Hermite polynomials and non-normal random variables. Recall some notions on the spectral representation of stationary time series from Sect. 4.1.3. Let ε t (\(t\in \mathbb{Z}\)) be a centred, finite-variance i.i.d. sequence. Then ε t can be represented in terms of a Gaussian spectral measure with uncorrelated increments,

$$\varepsilon_{t}=\int_{-\pi}^{\pi}e^{it\lambda}\,dM_{\varepsilon}(\lambda)\quad(t\in\mathbb{Z}).$$

Recall also that

$$E\bigl[\bigl \vert dM_{\varepsilon}(\lambda)\bigr \vert ^{2}\bigr]= \frac{\sigma_{\varepsilon }^{2}}{2\pi}\,d\lambda=f_{\varepsilon}(\lambda)\,d\lambda, $$

where \(\sigma_{\varepsilon}^{2}=\operatorname {var}(\varepsilon_{t})\). Without loss of generality, we will assume that \(\sigma_{\varepsilon}^{2}=1\) in the following. Moreover it will be convenient to use instead of M ε the spectral measure

$$M_{0}(A)=\sqrt{2\pi}M_{\varepsilon}(A), $$

so that

$$\varepsilon_{t}=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}e^{it\lambda} \,dM_{0}(\lambda) $$

and \(E[| dM_{0}(\lambda)| ^{2}]=d\lambda\). For a linear process \(X_{t}=\sum_{j=0}^{\infty}a_{j}\varepsilon_{t-j}\) (\(t\in\mathbb{Z}\)) with \(\sum_{j=0}^{\infty}a_{j}^{2}<\infty\) (and \(\sigma_{\varepsilon }^{2}=1\)), one then has the spectral representation

$$ X_{t}=\int_{-\pi}^{\pi}e^{it\lambda}\,dM_{X}(\lambda)\quad(t\in\mathbb{Z}) $$
(4.28)

with

$$dM_{X}(\lambda)=A\bigl(e^{-i\lambda}\bigr)\,dM_{\varepsilon}(\lambda)=\frac{1}{\sqrt{2\pi}}A\bigl(e^{-i\lambda}\bigr)\,dM_{0}(\lambda)=:a(\lambda)\,dM_{0}(\lambda). $$
The spectral density of X t is

$$f_{X}(\lambda)=\frac{1}{2\pi}\bigl \vert A\bigl(e^{-i\lambda} \bigr)\bigr \vert ^{2}=\bigl \vert a(\lambda)\bigr \vert ^{2}. $$

Assume that f X (λ)=L f (λ)|λ|−2d as λ→0 or γ X (k)∼L γ (k)k 2d−1 as k→∞. Recall that, under suitable conditions, these assumptions are equivalent to

$$L_{f}(\lambda)=L_{\gamma}\bigl(\lambda^{-1}\bigr) \pi^{-1}\varGamma(2d)\sin \biggl( \frac{\pi}{2}-\pi d \biggr) $$

and

$$ L_{\gamma}(k)=2L_{f}\bigl(k^{-1}\bigr)\varGamma(1-2d) \sin(\pi d). $$
(4.29)

Then \(| a(\lambda)| =L_{f}^{1/2}(\lambda)|\lambda|^{-d}\). Now, we are ready to present an alternative proof of Theorem 4.2. This type of approach was initiated in Dobrushin (1980), Dobrushin and Major (1979); also see Arcones (1994) and Lang and Soulier (2000). We will use a representation of a fractional Brownian motion that appears in Sect. 3.7.1.

Alternative proof of Theorem 4.2

Let \(S_{n}=S_{n}(1)=\sum_{t=0}^{n-1}X_{t}\) (note that we take summation from t=0 to n−1) and write the spectral representation

$$S_{n}=\int_{-\pi}^{\pi}\sum_{t=0}^{n-1}e^{it\lambda}\,dM_{X}(\lambda)=n\int_{-\pi}^{\pi}D_{n}(\lambda)a(\lambda)\,dM_{0}(\lambda), $$

where

$$ D_{n}(\lambda)=\frac{e^{i \lambda n}-1}{n(e^{i\lambda }-1)}1\bigl\{|\lambda|\le\pi n\bigr\}. $$
(4.30)

Since \(\lim_{u\rightarrow0}(e^{i\lambda u}-1)/u=i\lambda\), we conclude that

$$ \lim_{n\to\infty}D_{n}(\lambda/n)=\frac {e^{i\lambda }-1}{i\lambda}=:D(\lambda). $$
(4.31)

Now, \(E(|dM_{0}(n^{-1}\lambda)|^{2})=n^{-1}\,d\lambda\). Hence, \(n^{1/2}M_{0}(n^{-1}A)\) and \(M_{0}(A)\) have the same distribution (as stochastic processes indexed by A), and we can write

$$S_{n}\overset{\mathrm{d}}{=}n^{1/2}\int_{-n\pi}^{n\pi} D_{n}(\lambda /n)a \biggl( \frac{\lambda}{n} \biggr) \,dM_{0}( \lambda) \approx n^{1/2}\int_{-\infty}^{\infty} D_{n}(\lambda/n) a \biggl( \frac{\lambda}{n} \biggr) \,dM_{0}( \lambda). $$

Consequently, we have two possible scenarios:

  • \(\lim_{\lambda\to0}a(\lambda)= a(0)=\sqrt{ f_{X}(0)}\not=0\). Then we expect

    $$n^{-1/2}S_{n}\overset{\mathrm{d}}{\to} a(0)\int_{-\infty}^{\infty }\frac{e^{i\lambda}-1}{i\lambda}\,dM_{0}(\lambda) . $$
  • \(a(\lambda)=L_{f}^{1/2}(\lambda)|\lambda|^{-d}\), d∈(−1/2,0)∪(0,1/2). Then we expect

    $$ n^{-(1/2+d)}L_{f}^{-1/2}\bigl(n^{-1} \bigr)S_{n}\overset{\mathrm{d}}{\to}\int_{-\infty}^{\infty }D( \lambda) \frac{1}{|\lambda|^{d}}\,dM_{0}(\lambda) . $$
    (4.32)

In the latter case, applying (4.21) and (4.22) with m=1 and (4.29), we obtain

$$L_{1}(n)= \frac{2\varGamma(1-2d)\sin\pi d}{d(2d+1)}L_{f}\bigl(n^{-1}\bigr)=:K_{1}^{-2}(1,d)L_{f} \bigl(n^{-1}\bigr). $$

Thus,

$$n^{-(1/2+d)}L_{1}^{-1/2}(n)S_{n} =K_{1}(1,d)\int_{-\infty}^{\infty} |\lambda|^{-d}\frac{e^{i\lambda}-1}{i\lambda} \,dM_{0}(\lambda). $$

Recall Proposition 3.1. We can verify that K 1(1,d) agrees with K 1(1,H) there by setting \(H=d+\frac{1}{2}\), so that the limiting random variable is B H (1).

To make the argument (4.32) precise, we note that for |λ|<πn,

$$\bigl \vert D_{n}(\lambda/n)\bigr \vert =\biggl \vert \frac{e^{i\lambda}-1}{n(e^{i\lambda/n}-1)}\biggr \vert \leq C\min\bigl(1,|\lambda|^{-1}\bigr) $$

uniformly w.r.t. λ (the bound does not depend on λ). Thus, by dominated convergence,

$$\int_{-\infty}^{\infty}\bigl \vert D_{n}(\lambda/n)1\bigl\{|\lambda|\leq\pi n\bigr\}-D(\lambda)\bigr \vert ^{2}\,d\lambda\rightarrow0. $$

We conclude that D n (λ/n) converges to D(λ) in \(L^{2}(\mathbb{R},d\lambda)\) (here “dλ” stands for the Lebesgue measure). Also,

$$n^{-d}L_{f}^{-1/2}\bigl(n^{-1} \bigr)D_{n}(\lambda/n)a\biggl(\frac{\lambda }{n}\biggr) $$

converges in \(L^{2}(\mathbb{R},d\lambda)\) to \(D(\lambda)|\lambda|^{-d}\). Since

$$E \biggl[ \biggl \vert \int_{-\infty}^{\infty}g_{n}(\lambda)\,dM_{0}(\lambda)-\int_{-\infty}^{\infty}g(\lambda)\,dM_{0}(\lambda)\biggr \vert ^{2} \biggr] =\int_{-\infty}^{\infty}\bigl \vert g_{n}(\lambda)-g(\lambda)\bigr \vert ^{2}\,d\lambda $$

for square-integrable integrands g n , g, we conclude the convergence in L 2. Thus, the result of Proposition 4.2 follows. □

The limiting distribution in formula (4.32) can be also written as

$$ n^{-(1/2+d)} L_{f}^{-1/2}\bigl(n^{-1}\bigr)S_{n}(1) \overset{\mathrm{d}}{\to}\int_{-\infty}^{\infty}D(\lambda) \,dW_{X}(\lambda) , $$
(4.33)

where

$$ dW_{X}(\lambda)=\frac{1}{|\lambda|^{d}}\,dM_{0}(\lambda). $$
(4.34)

The measure W X is called the limiting spectral measure; it depends (via the parameter d) on the sequence X t . This representation will be essential in Sect. 4.4.

The longer, spectral version of the proof of Theorem 4.2 will allow us to obtain the limiting behaviour of subordinated Gaussian sequences. First, we extend the theorem to partial sum processes \(S_{n,H_{m}}(u):=\sum_{t=1}^{[nu]}H_{m}(X_{t})\), where H m is the mth Hermite polynomial. Remarkably, the limit is no longer an fBm process, provided that long memory is strong enough and m≥2. This was first observed in Rosenblatt (1961); also see Taqqu (1975). Note that their method of proof is based on characteristic functions and is different from the one used in the alternative proof of Theorem 4.2.

Theorem 4.3

Assume that X t (\(t\in\mathbb{N}\)) is a stationary sequence of standard normal random variables such that γ X (k)∼L γ (k)k 2d−1 with d∈(0,1/2). Let \(S_{n,H_{m}}(u)=\sum_{t=1}^{[nu]}H_{m}(X_{t})\). If m(1−2d)<1, then

$$n^{-(1-m(\frac{1}{2}-d))}L_{m}^{-1/2}(n)S_{n,H_{m}}(u)\Rightarrow Z_{m,H}(u)\quad\bigl(u\in[0,1]\bigr), $$

where Z m,H (⋅) is a Hermite–Rosenblatt process with \(H=d+\frac{1}{2}\), ⇒ denotes weak convergence in D[0,1], and \(L_{m}(n)=m!C_{m}L_{\gamma}^{m}(n)\), see (4.21) and (4.22).

Note that this type of convergence requires long memory to be strong enough. In particular, if m=2, we require d∈(1/4,1/2). If this is not the case, then the partial sum process has weak dependence properties.

Example 4.1

Assume that m=2. If d∈(1/4,1/2), then

$$n^{-2d}L_{2}^{-1/2}(n)\sum_{t=1}^{[nu]}\bigl(X_{t}^{2}-1\bigr) \Rightarrow Z_{2,H}(u), $$

where

For each fixed u∈[0,1], the limit is non-normal. This will be illustrated by simulations in computer Example 4.3 later in this section.

Proof of Theorem 4.3

The proof is almost a copy of the alternative proof of Theorem 4.2. We replace (4.28) by

$$H_{m}(X_{t})=\int_{-\pi}^{\pi}\cdots\int_{-\pi}^{\pi}e^{it(\lambda _{1}+\cdots+\lambda_{m})}\,dM_{X}(\lambda_{1})\cdots dM_{X}(\lambda_{m}) $$

(we refer to Sect. 3.7.1.3 for the formula and the meaning of this integral). Recalling

$$dM_{X}(\lambda) =\sqrt{2\pi}a(\lambda)\,dM_{\varepsilon}(\lambda )=a(\lambda)\,dM_{0}(\lambda), $$

we have

where the integration is over [−π,π]m. Therefore, if \(a(\lambda )=L_{f}^{1/2}(\lambda)|\lambda|^{-d}\), d∈(0,1/2), then we expect

(4.35)

cf. (4.31). Again, we identify

$$L_{m}(n)=m!C_{m} \bigl(2\varGamma(1-2d)\sin\pi d \bigr)^{m}L_{f}^{m}\bigl(n^{-1} \bigr)=K_{1}^{-2}(m,d)L_{f}^{m} \bigl(n^{-1}\bigr), $$

and from Proposition 3.1 we recognize the representation of the Hermite–Rosenblatt process.

A precise argument for (4.35) is the same as in the case m=1; see the proof of Proposition 4.2. Furthermore, we do not verify tightness here since it will be done in the next theorem. □

Finally, convergence of partial sums \(S_{n,G}(u)=\sum_{t=1}^{[nu]}G(X_{t})\) is just a consequence of Theorem 4.3, using the so-called reduction principle, proven originally in Taqqu (1975).

Theorem 4.4

Assume that X t (\(t\in\mathbb{N}\)) is a stationary sequence of standard normal random variables such that γ X (k)∼L γ (k)k 2d−1 (d∈(0,1/2)). Let \(S_{n,G}(u)=\sum_{t=1}^{[nu]}G(X_{t})\), where G is a function such that E[G(X 1)]=0, E[G 2(X 1)]<∞. If m is the Hermite rank of G and m(1−2d)<1, then

$$n^{-(1-m(\frac{1}{2}-d))}L_{m}^{-1/2}(n)S_{n,G}(u)\Rightarrow\frac{J(m)}{m!}Z_{m,H}(u) \quad\bigl(u\in[0,1]\bigr), $$

where Z m,H (⋅) is a Hermite–Rosenblatt process, \(H=d+\frac{1}{2}\), ⇒ denotes weak convergence in D[0,1], and L m is given in (4.21):

$$L_{m}(n)=m!C_{m}L_{\gamma}^{m}(n). $$

Proof

Decompose

$$G(x)=\frac{J(m)}{m!}H_{m}(x)+\sum_{l=m+1}^{\infty}\frac{J(l)}{l!}H_{l}(x)=:\frac{J(m)}{m!}H_{m}(x)+G^{\ast}(x). $$

Using (4.18) and (4.25), we have

$$\mathit{cov} \biggl[ \frac{J(m)}{m!}H_{m}(X_{0}), \frac{J(m)}{m!}H_{m}(X_{k}) \biggr] =\frac{J^{2}(m)}{m!} \gamma_{X}^{m}(k) $$

and

$$\mathit{cov}\bigl[G^{\ast}(X_{0}),G^{\ast}(X_{k}) \bigr]=\sum_{l=m+1}^{\infty}\frac {J^{2}(l)}{l!}\gamma_{X}^{l}(k). $$

Furthermore, for any t,s, the random variables G (X t ) and H m (X s ) are uncorrelated. Therefore,

$$ \operatorname {var}\Biggl( \sum_{t=1}^{n}G(X_{t}) \Biggr) =\operatorname {var}\Biggl( \sum_{t=1}^{n}G^{\ast}(X_{t}) \Biggr) + \biggl( \frac{J(m)}{m!} \biggr)^{2}\operatorname {var}\Biggl( \sum_{t=1}^{n}H_{m}(X_{t}) \Biggr) . $$
(4.36)

The Hermite rank of the function G ∗ is at least m+1. Consequently, we have two scenarios. Either \(\sum_{k}\gamma_{X}^{m}(k)<\infty\), and then both terms in (4.36) are of the order O(n), or \(\sum_{k}\gamma _{X}^{m}(k)=+\infty\), and then the second term dominates the first one. The latter happens if m(1−2d)<1, and in this case the asymptotic behaviour of \(\sum_{t=1}^{n}G(X_{t})\) is the same as that of \((J(m)/m!)\sum_{t=1}^{n} H_{m}(X_{t})\).

A proof of tightness is immediate. If we set

$$S_{n,G}^{\prime}(u):=n^{-(m(d-1/2)+1)}L_{m}^{-1/2}(n)S_{n,G}(u), $$

we have

$$E \bigl[ \bigl(S_{n,G}^{\prime}(u)-S_{n,G}^{\prime}(v) \bigr)^{2} \bigr] \sim |u-v|^{m(2d-1)+2}. $$

Since m(1−2d)<1, the exponent is greater than one, and tightness follows from Lemma 4.3. □

In contrast, if the Hermite rank is large enough such that m(1−2d)>1, then we have a weakly dependent-type behaviour of partial sums. The statement and proof of this result is postponed to the section on limit theorems for Appell polynomials.

Example 4.2

We illustrate the theoretical findings by a simulation example. First, we generate n=1000 i.i.d. standard normal random variables X t and plot the partial sum sequence \(S_{k}=\sum_{t=1}^{k} X_{t}\), k=1,…,n. This procedure is repeated for a Gaussian fractional ARIMA(0,d,0) process with parameter d=0.4. The corresponding partial sum processes are plotted in Fig. 4.1. They can be considered approximations of a Brownian motion and a fractional Brownian motion with H=0.9 respectively. Note that the path of the fractional Brownian motion is much smoother than the one of Brownian motion. This is due to long memory, which acts like a smoothing filter.

Fig. 4.1 Partial sum sequence \(S_{k}=\sum_{t=1}^{k} X_{t}\) (k=1,…,n) with X t i.i.d. N(0,1) (left) and X t generated by a FARIMA(0,0.4,0) process (right)
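A simulation along the lines of Example 4.2 can be sketched as follows; the FARIMA(0,d,0) series is generated from a truncated MA(∞) representation with coefficients \(a_{j}=\varGamma(j+d)/(\varGamma(j+1)\varGamma(d))\) (the truncation length is an arbitrary choice).

```python
# Sketch: partial-sum paths for i.i.d. N(0,1) noise and for a FARIMA(0,0.4,0)
# process, the latter generated from a truncated MA(infinity) representation
# with a_j = Gamma(j+d) / (Gamma(j+1) Gamma(d)), i.e. a_j = a_{j-1}(j-1+d)/j.
import numpy as np
import matplotlib.pyplot as plt

def farima0d0(n, d, trunc=5000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    j = np.arange(1, trunc)
    a = np.concatenate([[1.0], np.cumprod((j - 1 + d) / j)])   # a_0, ..., a_{trunc-1}
    eps = rng.standard_normal(n + trunc)
    return np.convolve(eps, a, mode="valid")[:n]

rng = np.random.default_rng(4)
n = 1000
iid = rng.standard_normal(n)
lrd = farima0d0(n, d=0.4, rng=rng)

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
axes[0].plot(np.cumsum(iid)); axes[0].set_title("i.i.d. N(0,1)")
axes[1].plot(np.cumsum(lrd)); axes[1].set_title("FARIMA(0,0.4,0)")
plt.show()
```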

Example 4.3

In this example we generate n=1000 random variables X t from a Gaussian fractional ARIMA(0,d,0) process with parameter d=0.4 and compute their sum. This procedure is repeated N=1000 times. A normal probability plot of the N=1000 sums \(\sum_{t=1}^{n} X_{t}\) is displayed in the left panel of Fig. 4.2. The right panel shows a normal probability plot for the sums \(\sum_{t=1}^{n} X_{t}^{2}\). The non-normal behaviour is clearly visible.

Fig. 4.2 Illustration of Theorem 4.3: normal probability plots of partial sums \(\sum_{t=1}^{k} X_{t}\) (left) and \(\sum_{t=1}^{k} X^{2}_{t}\) (right), where X t is generated by a FARIMA(0,0.4,0) process
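The Monte Carlo experiment of Example 4.3 can be sketched as follows (the truncated-MA generator is repeated so that the block is self-contained; the per-path standardization is a rough simplification); scipy's probplot draws the normal probability plots.

```python
# Sketch: N = 1000 replicates of sum X_t and sum (X_t^2 - 1) for FARIMA(0,0.4,0),
# displayed as normal probability plots (cf. Theorem 4.3 / Example 4.3).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.signal import fftconvolve

def farima0d0(n, d, trunc=5000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    j = np.arange(1, trunc)
    a = np.concatenate([[1.0], np.cumprod((j - 1 + d) / j)])
    eps = rng.standard_normal(n + trunc)
    return fftconvolve(eps, a, mode="valid")[:n]

rng = np.random.default_rng(5)
n, N = 1000, 1000
sums = np.empty(N); sums_sq = np.empty(N)
for r in range(N):
    x = farima0d0(n, d=0.4, rng=rng)
    x = x / x.std()                      # rough standardization of the marginals
    sums[r], sums_sq[r] = x.sum(), np.sum(x**2 - 1)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
stats.probplot(sums, plot=axes[0])       # close to a straight line (normal limit)
stats.probplot(sums_sq, plot=axes[1])    # visibly curved (non-normal limit)
plt.show()
```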

4.2.4 Linear Processes

In this section we consider a causal linear process

$$ X_{t}=\sum_{j=0}^{\infty}a_{j}\varepsilon_{t-j}\quad(t\in\mathbb{N}), $$
(4.37)

where, without loss of generality, \(\sum_{j=0}^{\infty}a_{j}^{2}=1\), and ε t (\(t\in\mathbb{Z}\)) are i.i.d. zero mean random variables with \(\operatorname {var}(\varepsilon_{1})=\sigma_{\varepsilon}^{2}<\infty\). Thus, \(\operatorname {var}(X_{1})=\sigma_{X}^{2}=\sigma_{\varepsilon}^{2}\). Note that Gaussian processes are included in this definition, but the class is much more general. Three different assumptions on the coefficients will be considered as j→∞ and with L a denoting a slowly varying function at infinity:

  • (B1) long memory:

    $$a_{j}\sim L_{a} ( j ) j^{d-1}\quad\biggl(0<d< \frac{1}{2}\biggr); $$
  • (B2) short memory:

    $$\sum_{j=0}^{\infty} \vert a_{j}\vert <\infty,\qquad\sum_{j=0}^{\infty}a_{j} \neq0. $$
  • (B3) antipersistence:

    $$a_{j}\sim L_{a} ( j ) j^{d-1}$$

    with \(-\frac{1}{2}<d<0\), and

    $$\sum_{j=0}^{\infty}a_{j}=0. $$

Under the short-memory assumption (B2), the limiting behaviour is classical (see Theorem 4.5 below and Brockwell and Davis 1991). Under long memory (B1), the first result was obtained in Davydov (1970a, 1970b); see also Gorodetskii (1977), Lang and Soulier (2000), Wang et al. (2003).

4.2.4.1 Asymptotic Covariances and Normalizing Constants

The behaviour of the autocovariance function γ X and the spectral density f X for the three cases can be characterized as follows. Combining Lemmas 4.13–4.15 with Lemmas 4.9–4.11, respectively, yields the asymptotic behaviour of \(\operatorname {var}(S_{n})\) (where \(S_{n}(u)=\sum_{t=1}^{[nu]}X_{t}\), S n =S n (1)).

Lemma 4.13

Under assumption (B1), we have, as λ→0 and k→∞ respectively,

$$ f_{X}(\lambda)\sim L_{f}(\lambda)|\lambda|^{-2d},\qquad\gamma_{X}(k)\sim L_{\gamma}(k)k^{2d-1}, $$
(4.38)

where

$$ L_{\gamma}(k)=L_{a}^{2}(k)\cdot\sigma_{\varepsilon}^{2} \int_{0}^{\infty }v^{d-1} ( 1+v )^{d-1} \,dv=\sigma_{\varepsilon}^{2}L_{a}^{2}(k)B(1-2d,d), $$
(4.39)

B(x,y) denotes the Beta function, and L f is obtained from L γ by (cf. (1.1))

$$ L_{f}(\lambda)=L_{\gamma}\bigl(\lambda^{-1}\bigr) \pi^{-1}\varGamma(2d)\sin \biggl( \frac{\pi}{2}-\pi d \biggr) . $$
(4.40)

Hence, via Lemma 4.9,

$$ \operatorname {var}(S_{n})\sim L_{S}(n)n^{2d+1} =\frac{1}{d ( 2d+1 ) }L_{\gamma}(n)n^{2d+1}. $$
(4.41)

Proof

We have

$$\gamma_{X}(k)\sim\sigma_{\varepsilon}^{2}\sum _{j=1}^{\infty}L_{a} ( j ) L_{a} ( j+k ) j^{d-1} ( j+k )^{d-1}=\sigma_{\varepsilon}^{2}S_{\infty,k} \cdot k^{2d-1},$$

where

$$S_{\infty,k}=\lim_{n\rightarrow\infty}S_{n,k}$$

and

$$S_{n,k}=k^{1-2d}\sum_{j=1}^{nk}L_{a}(j)L_{a}(j+k)j^{d-1}(j+k)^{d-1}\approx L_{a}^{2}(k)\frac{1}{k}\sum_{j=1}^{nk}\biggl(\frac{j}{k}\biggr)^{d-1}\biggl(1+\frac{j}{k}\biggr)^{d-1}\approx L_{a}^{2}(k)\int_{0}^{n}v^{d-1}(1+v)^{d-1}\,dv, $$

where the last approximation is uniform in n. The approximation formula for f X follows from Theorem 1.3. □

Example 4.4

(ARFIMA Model)

Consider an ARFIMA(0,d,0) model, d∈(0,1/2). This process has the linear representation \(X_{t}=\sum _{j=0}^{\infty}a_{j}\varepsilon_{t-j}\), where

$$a_{j}=\frac{\varGamma(j+d)}{\varGamma(j+1)\varGamma(d)}\sim\frac{1}{\varGamma(d)}j^{d-1}\quad(j\rightarrow\infty). $$

Thus, L a ∼1/Γ(d), so that

$$\gamma_{X}(k)\sim c_{\gamma}k^{2d-1} $$

with

$$c_{\gamma}=\sigma_{\varepsilon}^{2}\frac{B(1-2d,d)}{\varGamma^{2}(d)}=\sigma_{\varepsilon}^{2}\frac{\varGamma(1-2d)}{\varGamma(d)\varGamma(1-d)}=\frac{\sigma_{\varepsilon}^{2}}{\pi}\varGamma(1-2d)\sin(\pi d). $$

The last equality follows from Γ(d)Γ(1−d)=π/sinπd. Moreover,

$$f_{X}(\lambda)=\frac{\sigma_{\varepsilon}^{2}}{2\pi}\bigl \vert 1-e^{-i\lambda}\bigr \vert ^{-2d}=\frac{\sigma_{\varepsilon}^{2}}{2\pi}\biggl( 2\sin\frac{|\lambda|}{2}\biggr)^{-2d}, $$

so that

$$f_{X}(\lambda)\sim\frac{\sigma_{\varepsilon}^{2}}{2\pi}|\lambda|^{-2d}. $$
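Both asymptotic statements of this example are easy to verify numerically (with \(\sigma_{\varepsilon}^{2}=1\); the lags and frequencies below are arbitrary): the coefficient ratios and the spectral-density ratios tend to one.

```python
# Sketch: ARFIMA(0,d,0) coefficients a_j = Gamma(j+d)/(Gamma(j+1)Gamma(d)) versus
# their asymptotic form j^{d-1}/Gamma(d), and the spectral density
# |1 - e^{-i lam}|^{-2d}/(2 pi) versus the low-frequency form |lam|^{-2d}/(2 pi).
import numpy as np
from scipy.special import gammaln, gamma as Gamma

d = 0.4
j = np.array([10.0, 100.0, 1000.0, 10_000.0])
a_j = np.exp(gammaln(j + d) - gammaln(j + 1) - gammaln(d))   # avoids overflow
print(a_j / (j ** (d - 1) / Gamma(d)))                        # ratios -> 1

lam = np.array([0.5, 0.1, 0.01])
f_exact = np.abs(1 - np.exp(-1j * lam)) ** (-2 * d) / (2 * np.pi)
f_approx = np.abs(lam) ** (-2 * d) / (2 * np.pi)
print(f_exact / f_approx)                                     # ratios -> 1 as lam -> 0
```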

Lemma 4.14

Under assumption (B2), we have

$$\sum_{k=-\infty}^{\infty}\bigl \vert \gamma_{X}(k)\bigr \vert <\infty,\qquad \sum _{k=-\infty}^{\infty}\gamma_{X}(k)>0. $$

If, in addition, \(\sum_{j=0}^{\infty}j|a_{j}|<\infty\), then f X (λ) is continuous on [−π,π].

Proof

We have

$$\sum_{k=-\infty}^{\infty}\bigl \vert \gamma_{X}(k)\bigr \vert \leq2\sigma_{\varepsilon}^{2}\sum_{k=0}^{\infty}\sum_{j=0}^{\infty}\vert a_{j}\vert \vert a_{j+k}\vert \leq2\sigma_{\varepsilon}^{2} \Biggl( \sum_{j=0}^{\infty}\vert a_{j}\vert \Biggr)^{2}<\infty. $$

Furthermore,

$$\sum_{k=-\infty}^{\infty}\gamma_{X}(k)=2\pi f_{X}(0)=2\pi\frac{\sigma_{\varepsilon}^{2}}{2\pi}\Biggl \vert \sum _{j=0}^{\infty}a_{j}\Biggr \vert ^{2}>0. $$

To show that f X is continuous, consider

$$\tilde{a}(\lambda)=\sum_{j=0}^{\infty}a_{j}e^{-ij\lambda}. $$

Since, as x→0, sinx∼x and cosx−1∼x 2/2, we obtain for 0<ε<1,

$$\bigl \vert \tilde{a}(\lambda+\epsilon)-\tilde{a}(\lambda)\bigr \vert \leq\sum_{j=0}^{\infty}\vert a_{j}\vert \bigl \vert e^{-ij\epsilon}-1\bigr \vert \leq\sum_{j=0}^{\infty}\vert a_{j}\vert \bigl( \bigl \vert \cos(j\epsilon)-1\bigr \vert +\bigl \vert \sin(j\epsilon)\bigr \vert \bigr) \leq C\epsilon\sum_{j=0}^{\infty}j\vert a_{j}\vert, $$

so that \(\tilde{a}(\cdot)\) is continuous, and hence so is \(f_{X}(\lambda )=\sigma_{\varepsilon}^{2}/(2\pi)\left| \tilde{a}(\lambda)\right| ^{2}\). □

Lemma 4.15

Under assumption (B3), we have, as λ→0 and k→∞ respectively,

(4.42)
(4.43)

where

and L f is obtained from L γ by (4.40).

Proof

Similarly to the proof of Lemma 4.13,

$$\gamma_{X}(k)=\sigma_{\varepsilon}^{2}\sum_{j=0}^{\infty}a_{j}a_{j+k} =\sigma_{\varepsilon}^{2}S_{\infty,k}\cdot k^{2d-1} $$

with S ∞,k =lim n→∞ S n,k ,

$$S_{n,k}=k^{1-2d}\sum_{j=0}^{nk}a_{j}a_{j+k}=S_{n,k}(1)+S_{n,k}(2) $$

and

where the approximations are uniform in n. Moreover,

$$\sum_{k=-\infty}^{\infty}\gamma_{X}(k)=2\pi f_{X}(0)=2\pi\frac{\sigma_{\varepsilon}^{2}}{2\pi}\Biggl \vert \sum _{j=0}^{\infty}a_{j}\Biggr \vert ^{2}=0. $$

The approximation of f X for λ→0 follows from Theorem 1.3. □

4.2.4.2 Asymptotic Distribution

Proofs of the next results illustrate different techniques that are applicable in various situations:

  • Under short memory (B2), we apply the K-dependent approximation method, i.e. a combination of Proposition 4.1 and Lemma 4.1. This is easier than the cumulant method and does not require restrictive moment assumptions. It is particularly suited for linear processes (see Brockwell and Davis 1991).

  • Under long memory (B1), we apply the method based on random spectral measures, as outlined in the alternative proof of Theorem 4.2; see Lang and Soulier (2000).

Theorem 4.5

Assume that X t (\(t\in\mathbb{N}\)) is a stationary linear process (4.37) such that (B2) holds. Then

$$n^{-1/2}S_{n}=n^{-1/2}\sum_{t=1}^{n}X_{t} \overset{d}{\rightarrow} N\bigl(0,\nu^{2}\bigr), $$

where the variance \(\nu^{2}=\sigma_{X}^{2}+2\sum_{k=1}^{\infty}\gamma_{X}(k)\).

This theorem can be formulated in terms of functional convergence to Brownian motion.

Proof

Let \(X_{t,K}=\sum_{j=0}^{K}a_{j}\varepsilon_{t-j}\). Since the sequence X t,K (\(t\in\mathbb{N}\)) is K-dependent, an application of Lemma 4.1 yields

$$n^{-1/2}S_{n,K}=n^{-1/2}\sum_{t=1}^{n}X_{t,K} \overset{d}{\rightarrow}N\bigl(0,\nu_{K}^{2}\bigr) $$

with \(\nu_{K}^{2}=\operatorname {var}(X_{0,K})+2\sum_{k=1}^{K}\gamma_{X_{K}}(k)\), where

$$\gamma_{X_{K}}(k)=E[X_{t,K}X_{t+k,K}]=\sigma_{\varepsilon}^{2}\sum_{j=0}^{K}a_{j}a_{j+k}. $$

Since ν K ν as K→∞, we conclude \(N(0,\nu_{K}^{2})\overset{d}{\rightarrow}N(0,\nu^{2})\). It suffices to prove that for all δ>0,

$$\lim_{K\rightarrow\infty}\limsup_{n\rightarrow\infty}P \bigl( n^{-1/2}|S_{n}-S_{n,K}|>\delta \bigr) =0. $$

The result of our theorem will then follow by Proposition 4.1. By Markov’s inequality, it is sufficient to verify that

$$\lim_{K\rightarrow\infty}\lim_{n\rightarrow\infty}n^{-1}\operatorname {var}(S_{n}-S_{n,K})=0. $$

Let \(\bar{X}_{t,K}=X_{t}-X_{t,K}\). Then

The lim n→∞ behaviour above is obtained by applying the dominated convergence theorem. For this, we need ∑ k j |a j a j+k |<∞. This is true under the summability condition \(\sum_{j=0}^{\infty}|a_{j}|<\infty\). Under this condition, we can also exchange the summations ∑ k and ∑ j . Finally,

$$\lim_{K\rightarrow\infty}\lim_{n\rightarrow\infty }n^{-1}\operatorname {var}(S_{n}-S_{n,K})\leq\sum_{k=-\infty}^{\infty}|a_{k}|\lim_{K\rightarrow\infty}\sum _{j=K+1}^{\infty}|a_{j}|=0. $$

 □
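A small simulation sketch of Theorem 4.5 (an MA(1) process with an arbitrarily chosen coefficient θ): the Monte Carlo variance of \(n^{-1/2}S_{n}\) approaches \(\nu^{2}=\sum_{k}\gamma_{X}(k)=(1+\theta)^{2}\).

```python
# Sketch: Theorem 4.5 for the MA(1) process X_t = eps_t + theta * eps_{t-1}:
# var(n^{-1/2} S_n) should approach nu^2 = gamma(0) + 2*gamma(1) = (1 + theta)^2.
import numpy as np

rng = np.random.default_rng(6)
theta, n, nrep = 0.5, 5000, 2000
nu2 = (1 + theta) ** 2

sums = np.empty(nrep)
for r in range(nrep):
    eps = rng.standard_normal(n + 1)
    x = eps[1:] + theta * eps[:-1]
    sums[r] = x.sum() / np.sqrt(n)
print(sums.var(), nu2)                         # close to each other
```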

Under (B1), the asymptotic behaviour of partial sums changes. This result was proven first in Davydov (1970a, 1970b). The method below is adapted from Lang and Soulier (2000), where the reader is referred to for details.

Theorem 4.6

Assume that X t (\(t\in\mathbb{N}\)) is a stationary linear process (4.37) such that the long-memory condition (B1) holds, i.e. a j L a (j)j d−1, \(d\in(0,\frac{1}{2})\). Then

$$n^{-(d+\frac{1}{2})}L_{S}^{-1/2}(n)S_{n}(u)=n^{-(d+\frac{1}{2})}L_{S}^{-1/2}(n)\sum_{t=1}^{[nu]}X_{t}\Rightarrow B_{H}(u)\quad\bigl(u\in[0,1]\bigr), $$

where B H (u) is a standard fractional Brownian motion, \(H=d+\frac{1}{2}\), ⇒ denotes weak convergence in D[0,1], and

$$L_{S}(n)=\frac{1}{d(2d+1)}L_{\gamma}(n) $$

with L γ defined in (4.39):

Proof

We use the spectral method, as in the alternative proof of Theorem 4.2. Recall that any stationary sequence with finite variance can be written as

$$\varepsilon_{t}=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}e^{it\lambda}M_{0}\,(d\lambda),\quad t\in\mathbb{Z}. $$

The only difference between the spectral measure M 0 here and M 0 in the proof of Theorem 4.2 is that the measure here is not necessarily Gaussian. In particular, there is no guarantee that n 1/2 M 0(n −1⋅) and M 0(⋅) have the same distribution. Nevertheless, the same argument can be applied (see Lang and Soulier 2000). □

Example 4.5

(ARFIMA)

Assume that X t (\(t\in\mathbb{N}\)) is a FARIMA(0,d,0) model as in Example 4.4. Then

Hence,

$$n^{-(d+\frac{1}{2})}L_{S}^{-1/2}(n)\sum_{t=1}^{[nu]}X_{t}\Rightarrow B_{H}(u) $$

and

$$L_{S}(n)=c_{\gamma}\frac{1}{d(2d+1)}. $$

Note that the innovations ε t do not need to be Gaussian.

4.2.5 Subordinated Linear Processes

Next we consider the case where instead of the linear process X t (\(t\in\mathbb{N}\)) a subordinated process, i.e. a transformation Y t =G(X t ) (\(t\in\mathbb{N}\)), is observed. Recall that in the Gaussian case asymptotic properties of partial sums of X t and H m (X t ) (and, via the reduction principle of Theorem 4.4, of general functionals) can be studied using the spectral method. For linear processes, we applied the spectral method again in Theorem 4.6. However, this extension is not feasible for subordinated linear processes. In this setup, there are two common approaches: Appell polynomials (Surgailis 1982; Giraitis 1985; Giraitis and Surgailis 1986, 1989; Avram and Taqqu 1987; Surgailis and Vaičiulis 1999; Surgailis 2000; see also Surgailis 2003 for an overview) and a martingale decomposition (Ho and Hsing 1996, 1997; Wu 2003; see also Hsing 2000 for an overview).

4.2.5.1 Normalizing Constants: Simple Example

Before we develop a general formula, let us consider the simple case of \(G(X_{t})=X_{t}^{2}\).

Example 4.6

Let X t (\(t\in \mathbb{N}\)) be a linear process defined by (4.37). Assume that \(E[\varepsilon_{1}^{4}]<\infty\) and that the long-memory condition (B1) holds. Using formula (4.38) for the covariance of X t (\(t\in\mathbb{N}\)), we have

$$\gamma_{X}^{2}(k)\sim L_{\gamma}^{2}(k)k^{2(2d-1)}. $$

On the other hand,

$$\gamma_{X^{2}}(k)=\mathit{cov}\bigl(X_{0}^{2},X_{k}^{2}\bigr)=\bigl(E\bigl[\varepsilon_{1}^{4}\bigr]-3\sigma_{\varepsilon}^{4}\bigr)\sum_{j=0}^{\infty}a_{j}^{2}a_{j+k}^{2}+2\gamma_{X}^{2}(k). $$

Note that under (B1) the limiting behaviour of \(\gamma_{X^{2}}(k)\) is determined by the second term. Now,

$$X_{0}^{2}=\sum_{j=0}^{\infty}a_{j}^{2}\varepsilon_{0-j}^{2}+\sum _{{j,l=0;\ j\not=l}}^{\infty}a_{j}a_{l}\varepsilon_{0-j}\varepsilon _{0-l}=:X_{0,1}+X_{0,2}. $$

Analogously, we define \(X_{k}^{2}:=X_{k,1}+X_{k,2}\). Note that X 0,1 and X k,2 are uncorrelated. The same holds for X 0,2 and X k,1. Furthermore,

$$\mathit{cov}(X_{0,1},X_{k,1})=\bigl(E\bigl[\varepsilon_{1}^{4} \bigr]-\sigma_{\varepsilon}^{4}\bigr)\sum_{j=0}^{\infty}a_{j}^{2}a_{j+k}^{2}$$

and

$$\mathit{cov}(X_{0,2},X_{k,2})=2\sigma_{\varepsilon}^{4}\sum_{{j,l=0;\ j\not=l}}^{\infty}a_{j}a_{l}a_{j+k}a_{l+k}. $$

Recalling that the second covariance is of a larger order than the first one, we conclude

$$\gamma_{X^{2}}(k)\sim2\sigma_{\varepsilon}^{4}\sum_{{j,l=0;\ j\not=l}}^{\infty}a_{j}a_{l}a_{j+k}a_{l+k}\sim2\gamma_{X}^{2}(k)\sim2L_{\gamma}^{2}(k)k^{2(2d-1)}. $$
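Numerically, the dominance of the off-diagonal term can be seen directly from the coefficients; the sketch below uses the hypothetical weights \(a_{j}\propto(1+j)^{d-1}\), normalized so that \(\sum a_{j}^{2}=1\), and shows that \(\sum_{j}a_{j}^{2}a_{j+k}^{2}\) becomes negligible relative to \((\sum_{j}a_{j}a_{j+k})^{2}\).

```python
# Sketch: Example 4.6 with a_j proportional to (1+j)^(d-1). The "diagonal" sum
# sum_j a_j^2 a_{j+k}^2 becomes negligible relative to (sum_j a_j a_{j+k})^2,
# so that gamma_{X^2}(k) ~ 2 * gamma_X^2(k) for large k.
import numpy as np

d, trunc = 0.3, 200_000
a = (1.0 + np.arange(trunc)) ** (d - 1)
a /= np.sqrt(np.sum(a**2))                     # normalize so that sum a_j^2 = 1

for k in (10, 100, 1000):
    cross = np.sum(a[:-k] * a[k:])             # sum_j a_j a_{j+k}  (~ gamma_X(k))
    diag = np.sum(a[:-k] ** 2 * a[k:] ** 2)    # sum_j a_j^2 a_{j+k}^2
    off_diag = cross**2 - diag                 # sum_{j != l} a_j a_l a_{j+k} a_{l+k}
    print(k, diag / off_diag)                  # ratio tends to 0
```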

4.2.5.2 Normalizing Constants: Appell Polynomials

Now, we turn our attention to general nonlinear functionals. For a general non-normal distribution, in view of Sect. 3.3, a natural approach is to start with the Wick product Y t =A m (X t )=:X t ,…,X t : where A m is the mth Appell polynomial associated with the marginal distribution of X t . Suppose that γ X (k) is known, either exactly or its asymptotic behaviour. Can we give a simple formula for γ Y (k)? In principle, the diagram formulas given in Theorem 3.10 provide an answer because

$$\kappa ( Y_{t},Y_{t+k} ) = \biggl[ \frac{\partial^{2}}{\partial z_{1}\partial z_{2}}\log E \bigl[ \exp ( z_{1}Y_{t}+z_{2}Y_{t+k} ) \bigr] \biggr]_{z=0}=\gamma_{Y}(k). $$

To apply the diagram formula, consider a table W with two rows W 1, W 2 of length m. The positions in W 1 are associated with X t and those in W 2 with X t+k , i.e. we may write \(W_{1}=\{ \tilde{X}_{(1,1)},\ldots,\tilde{X}_{(1,m)}\} \) with \(\tilde{X}_{(1,t)}=X_{t}\) and \(W_{2}=\{ \tilde{X}_{(2,1)},\ldots,\tilde{X}_{(2,m)}\} \) with \(\tilde{X}_{(2,j)}=X_{j+k}\). Using the same notation as in Theorem 3.10, we obtain from (3.81)

$$ \gamma_{Y}(k)=\kappa \bigl(\mathopen{:}X^{W_{1}}\mathclose{:},\mathopen{:}X^{W_{2}}\mathclose{:} \bigr) =\sum_{\gamma \in\varGamma_{W}^{\not- ,c}}\kappa \bigl( X^{\prime V_{1}} \bigr) \cdot\cdot \cdot\kappa \bigl( X^{\prime V_{r}} \bigr) . $$
(4.44)

Unfortunately, this is a rather complicated expression because in general κ(X V) may not be zero for any subset V. There is one exception where (4.44) simplifies considerably, namely if X t (\(t\in\mathbb{N}\)) is a Gaussian process. In this case, all cumulants κ(X V) are zero except for normal edges, i.e. κ(X V)=0 if |V|≠2, so that the sum in (4.44) is over \(\varGamma_{W}^{\not- ,c,\mathcal{N}}\), and, up to a constant, we obtain a sum of correlations to the power m, see Corollary 3.5.

Although (4.44) is complicated, it is possible to give simple asymptotic formulas for γ Y (k) and, consequently, the variance of \(S_{n,A_{m}}=\sum_{t=1}^{n}A_{m}(X_{t})\). A first simplification can be obtained in the representation of Appell polynomials of linear processes:

Lemma 4.16

Let X t (\(t\in\mathbb{N}\)) be a linear process (4.37) such that the Appell polynomials of its marginal distribution A m (\(m\in\mathbb{N}\)) exist. Then

$$ A_{m} ( X_{t} ) =\sum_{k_{1},\ldots,k_{m}=0}^{\infty}a_{k_{1}}\cdot\cdot\cdot a_{k_{m}} (\mathopen{:}\varepsilon_{t-k_{1}}\cdot\cdot \cdot \varepsilon_{t-k_{m}}\mathclose{:} ) . $$
(4.45)

Proof

The result follows from

$$A_{m}(X_{t})=\mathopen{:}\underset{m}{\underbrace{X_{t},\ldots,X_{t}}}\mathclose{:} $$

and multilinearity of the Wick product. □

A direct consequence of this result is a simplified expression for S n :

Corollary 4.1

Let X t (\(t\in\mathbb{N}\)) be a linear process defined by (4.37) such that the Appell polynomials of its marginal distribution A m (\(m\in\mathbb{N}\)) exist. Let

$$S_{n,A_{m}}=\sum_{t=1}^{n}A_{m}( X_{t} ) . $$

Then

$$S_{n,A_{m}}=\sum_{k_{1},\ldots,k_{m}=0}^{\infty}a_{k_{1}} \cdot\cdot\cdot a_{k_{m}}\sum_{t=1}^{n} (\mathopen{:}\varepsilon_{t-k_{1}}\cdot\cdot\cdot \varepsilon_{t-k_{m}}\mathclose{:}) $$

with a k =0 for k<0.

Furthermore, the diagram formula can be used to obtain an expression for the asymptotic autocovariance function of the subordinated sequence Y t (\(t\in\mathbb{N}\)) under long memory:

Corollary 4.2

Let X t (\(t\in\mathbb{N}\)) be a linear process defined by (4.37) such that the Appell polynomials of its marginal distribution A m (\(m\in\mathbb{N}\)) exist and the long-memory assumption (B1) holds. Then \(Y_{t}=A_{m}\left( X_{t}\right) \) has an autocovariance function γ Y (k) with

$$ \gamma_{Y}(k)\sim m!\gamma_{X}^{m}(k)\sim m!L_{\gamma}^{m}(k)k^{(2d-1)m} $$
(4.46)

as k→∞, cf. (4.39).

Proof

Here, only an outline of the extended proof in Giraitis and Surgailis (1989) and Surgailis and Vaičiulis (1999) is given. Lemma 4.16 and the multilinearity of cumulants imply

$$\kappa \bigl( A_{m} ( X_{t} ) ,A_{m} ( X_{t+k} ) \bigr) =\sum_{\substack{j_{1},\ldots,j_{m}=0\\j_{1}^{\prime},\ldots,j_{m}^{\prime}=0}}^{\infty} \Biggl( \prod_{i=1}^{m}a_{j_{i}}a_{j_{i}^{\prime}} \Biggr) \kappa \bigl(\mathopen{:}\varepsilon_{t-j_{1}},\ldots,\varepsilon_{t-j_{m}}\mathclose{:},\mathopen{:}\varepsilon_{t+k-j_{1}^{\prime}},\ldots,\varepsilon_{t+k-j_{m}^{\prime}}\mathclose{:} \bigr) . $$

Now consider a table W with two rows W i ={ε (i,1),…,ε (i,m)} (i=1,2) with \(\varepsilon _{(1,s)}=\varepsilon_{t_{s}}\) and \(\varepsilon_{(2,s)}=\varepsilon _{t_{s}^{\prime}}\). The diagram formula for cumulants of Wick products implies

$$\kappa (\mathopen{:}\varepsilon_{t-j_{1}},\ldots,\varepsilon_{t-j_{m}}\mathclose{:},\mathopen{:} \varepsilon_{t+k-j_{1}^{\prime}},\ldots,\varepsilon_{t+k-j_{m}^{\prime}}\mathclose{:} ) =\sum _{\gamma\in\varGamma_{W}^{\not- ,c}}\kappa \bigl( \varepsilon^{\prime V_{1}} \bigr) \cdots \kappa \bigl( \varepsilon^{\prime V_{r}} \bigr) . $$

Using this equation, we have

$$\kappa \bigl( A_{m} ( X_{t} ) ,A_{m} ( X_{t+k} ) \bigr) =r_{\mathrm{main}}+r_{k},$$

where

$$r_{\mathrm{main}}=\sum_{\gamma\in\varGamma_{W}^{\not- ,c,\mathcal{N}}}\sum _{\substack{j_{1},\ldots,j_{m}=0\\j_{1}^{\prime},\ldots,j_{m}^{\prime }=0}} \Biggl( \prod_{i=1}^{m}a_{j_{i}}a_{j_{i}^{\prime}} \Biggr) \kappa \bigl( \varepsilon^{\prime V_{1}} \bigr) \cdot\cdot\cdot\kappa \bigl( \varepsilon^{\prime V_{r}} \bigr) $$

and

$$r_{k}=\sum_{\gamma\in\varGamma_{W}^{\not- ,c}\setminus\varGamma_{W}^{\not- ,c,\mathcal{N}}}\sum _{\substack{j_{1},\ldots,j_{m}=0\\j_{1}^{\prime},\ldots,j_{m}^{\prime}=0}} \Biggl( \prod_{i=1}^{m}a_{j_{i}}a_{j_{i}^{\prime}} \Biggr) \kappa \bigl( \varepsilon^{\prime V_{1}} \bigr) \cdot\cdot\cdot \kappa \bigl( \varepsilon^{\prime V_{r}} \bigr) . $$

It can be shown that, as k→∞, r k =o(k (2d−1)m), so that only diagrams in \(\varGamma_{W}^{\not- ,c,\mathcal{N}}\) matter asymptotically. For instance, for \(\gamma=\bigcup_{i=1}^{m-1}V_{i}\) with V i ={(1,i),(2,i)} (i=1,…,m−2) and V m−1={(1,m−1),(2,m−1),(1,m),(2,m)}, we have, because of independence of the random variables ε i ,

$$\kappa \bigl( \varepsilon^{\prime V_{1}} \bigr) \cdot\cdot\cdot\kappa \bigl( \varepsilon^{\prime V_{m-1}} \bigr) =0, $$

unless \(j_{i}^{\prime}=j_{i}+k\) (i=1,…,m−2), j m−1=j m and \(j_{m-1}^{\prime}=j_{m}^{\prime}=j_{m-1}+k\). Thus, the contribution of γ to r k is

$$\sigma_{\varepsilon}^{2} \Biggl( \sum_{j=0}^{\infty}a_{j}a_{j+k} \Biggr)^{m-2}\sum_{j=0}^{\infty}a_{j}^{2}a_{j+k}^{2} \sim\gamma_{X}^{m-2} ( k ) L(k)k^{4d-3}=o \bigl( k^{(2d-1)m} \bigr) . $$

For r main, the calculation simplifies considerably because each \(\gamma\in\varGamma_{W}^{\not- ,c,\mathcal{N}}\) consists of edges V j ={(1,j),(2,π(j))} (j=1,2,…,m), where π is a permutation of {1,2,…,m}. Thus, the number of diagrams in \(\varGamma_{W}^{\not- ,c,\mathcal{N}}\) is \(| \varGamma_{W}^{\not- ,c,\mathcal{N}}| =m!\). Moreover, for each permutation π,

$$\sum_{\substack{j_{1},\ldots,j_{m}=0\\j_{1}^{\prime},\ldots ,j_{m}^{\prime}=0}} \Biggl( \prod _{i=1}^{m}a_{j_{i}}a_{j_{i}^{\prime}} \Biggr) \kappa \bigl( \varepsilon^{\prime V_{1}} \bigr) \cdot\cdot\cdot\kappa \bigl( \varepsilon^{\prime V_{r}} \bigr) =\sigma_{\varepsilon}^{2m} \Biggl( \sum _{j=0}^{\infty}a_{j}a_{j+k} \Biggr)^{m}=\gamma_{X}^{m}(k). $$

Thus, taking the sum over all m! permutations, we have

$$r_{\mathrm{main}}=m!\gamma_{X}^{m}(k). $$

 □

Note that, if X t (\(t\in\mathbb{N}\)) is a Gaussian process, then we have the exact relationship \(\gamma_{A_{m}}(k)=m!\gamma_{X}^{m}(k)\) for any finite k because all cumulants above order 2 are zero, so that all contributions except those from \(\varGamma_{W}^{\not- ,c,\mathcal{N}}\) are zero (cf. Sect. 4.2.3).
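As a quick numerical sanity check of this exact Gaussian relationship (an illustration added here, not taken from the text; the AR(1) model, parameter values and sample size are arbitrary assumptions), one can compare the empirical lag-k autocovariance of \(A_{2}(X_{t})=H_{2}(X_{t})=X_{t}^{2}-1\) with \(2\gamma_{X}^{2}(k)\) for a Gaussian AR(1) process with unit marginal variance:

```python
import numpy as np

rng = np.random.default_rng(0)
phi, n, k = 0.6, 200_000, 3   # AR(1) parameter, sample size, lag (illustrative choices)

# Gaussian AR(1) with unit marginal variance: X_t = phi X_{t-1} + sqrt(1 - phi^2) eps_t
x = np.empty(n)
x[0] = rng.standard_normal()
innov = np.sqrt(1 - phi**2) * rng.standard_normal(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + innov[t]

y = x**2 - 1                          # A_2(X_t) = H_2(X_t) for a standard normal marginal
emp = np.cov(y[:-k], y[k:])[0, 1]     # empirical autocovariance of A_2(X_t) at lag k
theo = 2 * (phi**k) ** 2              # m! * gamma_X(k)^m with m = 2 and gamma_X(k) = phi^k

print(f"empirical: {emp:.4f}   2*gamma_X(k)^2: {theo:.4f}")   # approximately equal
```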

The combination of Lemma 4.9 and formula (4.38) yields an asymptotic formula for the variance of \(S_{n,A_{m}}=\sum_{t=1}^{n}A_{m}(X_{t})\) under the assumption of long memory (see Giraitis and Surgailis 1989; Surgailis and Vaičiulis 1999):

Theorem 4.7

Let X t (\(t\in\mathbb{N}\)) be a linear process defined by (4.37) such that the Appell polynomials A m (\(m\in\mathbb{N}\)) of its marginal distribution exist and the long-memory assumption (B1) holds. Assume further that m(1−2d)<1. Then, as n→∞,

$$\operatorname {var}( S_{n,A_{m}} ) =\operatorname {var}\Biggl( \sum_{t=1}^{n}A_{m}(X_{t}) \Biggr) \sim L_{m} ( n ) n^{(2d-1)m+2}$$

with

$$ L_{m}(n)=m!\,C_{m}L_{\gamma}^{m}(n) $$
(4.47)

and L γ given by (4.39). On the other hand, if m(1−2d)>1, then

$$\operatorname {var}(S_{n,A_{m}})=O(n). $$

We recognize the same formula as in the Gaussian case, see (4.20). Furthermore, note that, in general, antipersistence is not inherited because the condition that autocovariances add up to zero is destroyed much more easily than nonsummability.

4.2.5.3 Asymptotic Distributions: Appell Polynomials

In the previous sections we obtained asymptotic expressions for the autocovariance function \(\gamma_{A_{m}}(k)=\mathit{cov}(A_{m}(X_{t}),A_{m}(X_{t+k}))\) and the variance \(v_{n}^{2}:=\operatorname {var}(S_{n,A_{m}})\). The remaining question is which processes one obtains as limits of \(S_{n,A_{m}}(t)/v_{n}\). It turns out that, under suitable moment conditions, the only possible limiting processes are Hermite–Rosenblatt processes. In fact this question has been answered in the Gaussian case, see Theorem 4.4.

Theorem 4.8

Let X t (\(t\in\mathbb{N}\)) be a linear process defined by (4.37) such that the Appell polynomials A m (\(m\in\mathbb{N}\)) of its marginal distribution exist and the long-memory assumption (B1) holds, i.e. a j L a (j)j d−1, d∈(0,1/2). Let

$$S_{n,A_{m}}(u)=\sum_{t=1}^{ [ nu ] }A_{m} ( X_{t} ) \quad\bigl(u\in[0,1]\bigr) $$

and assume that \(E( \varepsilon_{1}^{2j}) <\infty\) for all j. Then, if m(1−2d)<1,

$$ n^{-(1-m(\frac{1}{2}-d))}L_{m}^{-1/2}(n)S_{n,A_{m}}(u)\Rightarrow Z_{m,H}(u)\quad\bigl(u\in[0,1]\bigr), $$
(4.48)

where Z m,H (⋅) is the Hermite–Rosenblatt process with \(H=d+\frac{1}{2}\), ⇒ denotes weak convergence in D[0,1], and L m is given in (4.47), with L γ given by (4.39):

$$L_{\gamma}(k)=L_{a}^{2}(k)\cdot\sigma_{\varepsilon}^{2} \int_{0}^{\infty}v^{d-1} ( v+1 )^{d-1}\,dv. $$

On the other hand, if m(1−2d)>1, then \(\operatorname {var}(S_{n,A_{m}})\sim\sigma _{S}n\) for some σ S >0, and

$$ n^{-\frac{1}{2}}S_{n,A_{m}}(u)\Rightarrow\sigma_{S}B(u)\quad\bigl(u\in [0,1]\bigr), $$
(4.49)

where B(⋅) is a standard Brownian motion, and ⇒ denotes weak convergence in D[0,1].

In other words, the asymptotic distribution is the same as in the case of Hermite polynomials. Moreover, L m agrees with L m in Theorem 4.3.

Proof

First, consider the case m(1−2d)>1. The proof is rather long, so only a sketch is given here (for details, see e.g. Surgailis 2003). To prove the convergence of finite-dimensional distributions, we use the cumulant method (cf. Theorem 4.1). Recall that for the normal distribution, all cumulants of order j≥3 equal zero, and there is no other distribution with this property. It is therefore sufficient to show that for j≥3,

$$\lim_{n\rightarrow\infty}\kappa_{j} \bigl( n^{-\frac{1}{2}}S_{n,A_{m}} ( t ) \bigr) =\lim_{n\rightarrow\infty}n^{-\frac{j}{2}}\kappa \bigl( \underset{j}{ \underbrace{S_{n,A_{m}} ( t ) ,\ldots ,S_{n,A_{m}} ( t ) }} \bigr) =0. $$

Without loss of generality, we may fix t at t=1, and we write \(S_{n,A_{m}}=S_{n,A_{m}}(1)\). Now for \(s_{1},\ldots,s_{j}\in\mathbb{N}\), consider a table W with rows

$$W_{r}= \{ X_{(r,1)}=X_{s_{r}},\ldots,X_{(r,m)}=X_{s_{r}} \} \quad (1\leq r\leq j). $$

Then, because of multilinearity of κ,

$$\kappa \bigl( \underset{j}{\underbrace{S_{n,A_{m}},\ldots,S_{n,A_{m}}}} \bigr) =\sum_{s_{1},\ldots,s_{j}=1}^{n}\kappa \bigl( A_{m}(X_{s_{1}}),\ldots,A_{m}(X_{s_{j}}) \bigr) =\sum_{s_{1},\ldots,s_{j}=1}^{n}\kappa \bigl(\mathopen{:}X^{W_{1}}\mathclose{:},\ldots,\mathopen{:}X^{W_{j}}\mathclose{:} \bigr) .$$
The diagram formula implies

$$\kappa \bigl(\mathopen{:}X^{W_{1}}\mathclose{:},\ldots, \mathopen{:}X^{W_{j}}\mathclose{:} \bigr) =\sum _{\gamma\in \varGamma _{W}^{\not- ,c}}\kappa \bigl( X^{^{\prime}V_{1}} \bigr) \cdot\cdot \cdot \kappa \bigl( X^{^{\prime}V_{r}} \bigr), $$

and hence,

$$\kappa \bigl( S_{n,A_{m}},\ldots,S_{n,A_{m}} \bigr) =\sum_{\gamma\in\varGamma_{W}^{\not- ,c}}J_{n,\gamma},\qquad J_{n,\gamma}=\sum_{s_{1},\ldots,s_{j}=1}^{n}\kappa \bigl( X^{\prime V_{1}} \bigr) \cdots\kappa \bigl( X^{\prime V_{r}} \bigr) .$$
Since the number of diagrams in \(\varGamma_{W}^{\not- ,c}\) is finite and does not depend on n, it is sufficient to show that \(n^{-\frac {j}{2}}J_{n,\gamma }\) converges to zero. Note first that, for any s 1,…,s j and VW,

$$\kappa \bigl( X^{\prime V} \bigr) =\kappa \bigl( \underset{\vert V\cap W_{1}\vert \mbox{-}\text{times}}{\underbrace{X_{s_{1}}, \ldots,X_{s_{1}}}},\ldots,\underset{\vert V\cap W_{j} \vert \mbox{-}\text{times}}{\underbrace {X_{s_{j}},\ldots,X_{s_{j}}}} \bigr) . $$

Since X t (\(t\in\mathbb{N}\)) is a linear process with i.i.d. innovations ε t (\(t\in\mathbb{Z}\)), this can be written as

$$\kappa \bigl( X^{\prime V} \bigr) =\mathrm{const}\cdot B_{V,s_{1},\ldots,s_{j}}, $$

where

$$B_{V,s_{1},\ldots,s_{j}}=\sum_{i=-\infty}^{\infty}a_{i+s_{1}}^{\vert V\cap W_{1}\vert } \cdots a_{i+s_{j}}^{\vert V\cap W_{j}\vert }. $$

Hence,

$$\kappa \bigl( X^{\prime V_{1}} \bigr) \cdot\cdot\cdot\kappa \bigl( X^{\prime V_{r}} \bigr) =\mathrm{const}\cdot\prod_{u=1}^{r}B_{V_{u},s_{1},\ldots,s_{j}}, $$

so that it is sufficient to show that each \(n^{-\frac{j}{2}}B_{V_{u},s_{1},\ldots,s_{j}}\) converges to zero. This requires a rather laborious detailed argument. However, the essential idea used in Surgailis (2003, Lemma 6.1) is to show this first for a finite moving average process \(X_{t,K}=\sum_{j=0}^{K}a_{j}\varepsilon_{t-j}\) (actually Surgailis allows for a two-sided moving average) and then give an upper bound for the difference between the approximation \(J_{n,\gamma}^{K}\) and J n,γ that converges to zero as K tends to infinity. Note that a similar approximation argument was used to establish convergence of partial sums of weakly dependent linear processes, see Theorem 4.5.

Tightness is easier than fidi-convergence but is omitted here; we refer the reader to Giraitis (1985).

Next, consider the case m(1−2d)<1. This case has been considered for instance in Surgailis (1981, 1982), Giraitis and Surgailis (1986, 1989) and Avram and Taqqu (1987); see also Surgailis (2003) for an overview.

Recall from Corollary 4.1 that

$$S_{n,A_{m}}=\sum_{t=1}^{n}\sum _{j_{1},\ldots,j_{m}=0}^{\infty }a_{j_{1}}\cdot \cdot\cdot a_{j_{m}} (\mathopen{:}\varepsilon_{t-j_{1}}\cdot\cdot\cdot \varepsilon_{t-j_{m}}\mathclose{:} ) . $$

Consider

$$ U_{n,m}:=m!\sum_{t=1}^{n}\sum _{0=j_{1}<j_{2}<\cdots<j_{m}}^{\infty} a_{j_{1}}\cdot\cdot\cdot a_{j_{m}} ( \mathopen{:}\varepsilon_{t-j_{1}}\cdot\cdot \cdot\varepsilon_{t-j_{m}}\mathclose{:} ) . $$
(4.50)

Since the random variables \(\varepsilon_{j_{1}},\ldots,\varepsilon_{j_{m}}\) in this expression are independent (the indices being distinct), we have

$$\mathopen{:}\varepsilon_{j_{1}}\cdot\cdot\cdot\varepsilon _{j_{m}}\mathclose{:}=A_{1}(\varepsilon _{j_{1}})\cdots A_{1}(\varepsilon_{j_{m}})=\varepsilon_{j_{1}}\cdots \varepsilon_{j_{m}}. $$

Therefore, we may write

$$ U_{n,m}=m!\sum_{t=1}^{n}\sum_{0=j_{1}<j_{2}<\cdots<j_{m}}^{\infty}\prod _{s=1}^{m}a_{j_{s}}\varepsilon_{t-j_{s}}=:m!\sum_{t=1}^{n}V_{t,m}. $$
(4.51)

If we recall now (cf. proof of Theorem 4.6) that

$$\varepsilon_{t}=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}e^{it\lambda}M_{0}\,(d\lambda), $$

where M 0 is a spectral measure with independent increments, then, combining the argument from the proof of Theorem 4.3 with that of Theorem 4.6, we expect that

(4.52)

where dW X (λ)=|λ|ddM 0(λ) is the limiting spectral measure defined in (4.34). The spectral-domain function L f is replaced by the time-domain slowly varying function L m using the same argument as in the proof of Theorem 4.3:

$$L_{m}(n)=m!C_{m} \bigl( 2\varGamma(1-2d)\sin(\pi d) \bigr)^{m}L_{f}^{m}\bigl(n^{-1} \bigr). $$

Then,

$$ n^{-(1-m(\frac{1}{2}-d))}L_{m}^{-1/2}(n)U_{n,m}\overset{\mathrm{d}}{\rightarrow}Z_{m,H}(1). $$
(4.53)

Finally,

$$S_{n,A_{m}}=U_{n,m}+r_{n,m}, $$

where the remainder r n,m involves summation over j 1,…,j m such that at least two indices agree. The remainder is of a smaller order (see Avram and Taqqu 1987 for details).

Tightness is very easy. We use the same argument as in the proof of Theorem 4.4, together with the variance estimates in Theorem 4.7. □

As noted in the proof, in the case with m(1−2d)<1, the convergence of \(S_{n,A_{m}}\) is determined by the term U n,m defined in (4.51). In fact, the convergence equation (4.52) will play a crucial role in some of the results following below.

The assumptions of the theorem can be relaxed in various ways. For instance, in order to obtain the usual central limit theorem in (4.49), only \(\sum|\gamma_{X}(k)|^{m}<\infty\) is required instead of the specific decay of γ X (see Surgailis 2003). Moreover, the result can be extended to

$$S_{n,G}(u)=\sum_{t=1}^{[ nu] }G(X_{t}) $$

with

$$G(x)=\sum_{j=m}^{\infty}\frac{a_{\mathrm{app},j}}{j!}A_{j}(x). $$

Assuming that a app,m ≠0 (i.e. G has Appell rank m), the contribution of \(a_{\mathrm{app},m}A_{m}(X_{t})/m!\) dominates, provided that m(1−2d)<1. For example, Surgailis (2000) considers arbitrary polynomials G. Furthermore, Surgailis and Vaičiulis (1999) replace independent ε t (\(t\in\mathbb{Z}\)) by martingale differences, and Surgailis (2000) considers \(\tilde{X}_{t}=X_{t}+V_{t}\), where V t (\(t\in\mathbb{N}\)) is a stationary short-memory process.

In view of the fact that for each distribution different Appell polynomials are obtained, and in general they are not orthogonal, it is quite remarkable that the same asymptotic limit is obtained as under Gaussian subordination and Hermite polynomials. Moreover, it is worth noting that, for fixed m, the condition m(1−2d)<1 means that \(d>\frac{1}{2}(1-m^{-1})\). Thus, a nonstandard limiting behaviour (also called a noncentral limit theorem) is obtained only for sufficiently strong long-range dependence: the higher the degree m of the Appell polynomial, the stronger the dependence has to be to satisfy the condition. This is essentially due to (4.46). Since at the same time d does not exceed \(\frac{1}{2}\), for m=1 the condition is satisfied by every d∈(0,1/2). In other words, for X t (\(t\in\mathbb{N}\)) itself, a noncentral limit theorem holds for all \(0<d<\frac{1}{2}\).

4.2.5.4 Asymptotic Distributions: Martingale Approach and Power Ranks

Recall now that the jth Appell coefficient can be obtained either by

$$ a_{\mathrm{app},j}=E \bigl[ G^{(j)} ( X ) \bigr] $$
(4.54)

if the jth derivative of G exists and its expected value is not zero (see (3.66)) or by

$$ a_{\mathrm{app},j}=(-1)^{j}\int G(x)p_{X}^{(j)}(x)\,dx $$
(4.55)

(see (3.69)), where \(p_{X}=F_{X}^{\prime}\) is the density of X. Note that, motivated by (4.54), a notion similar to the Appell rank, the so-called power rank, has been proposed in the literature.

Definition 4.1

Let X be a random variable. The power rank of a function G (with respect to X) is the smallest integer m≥1 such that \(G_{\infty}^{(m)}(0)\not=0\), where G (x)=E[G(X+x)].

Example 4.7

Let F X be the distribution of a random variable X with E(X)=0. If G(x)=x 2E(X 2), then \(G_{\infty}^{(1)}(0)=2\int u\,dF_{X}(u)=2E(X)=0\). Furthermore, \(G_{\infty}^{(2)}(0)=2\int dF_{X}(u)=2\). This implies that for a centred linear process X t =∑a j ε tj , the power rank of the quadratic function is always 2, regardless of the distribution of ε t (and the marginal distribution of X t ).
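A minimal numerical illustration of this example (added here; not part of the original text — the exponential marginal, sample size and step size are arbitrary choices) estimates \(G_{\infty}(y)=E[G(X+y)]\) by Monte Carlo and approximates its first two derivatives at 0 by finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)

# Centred but clearly non-Gaussian marginal for X (an illustrative choice)
x = rng.exponential(scale=1.0, size=1_000_000) - 1.0

def G_inf(y):
    """Monte Carlo estimate of G_inf(y) = E[G(X + y)] for G(u) = u^2 - E(X^2)."""
    return np.mean((x + y) ** 2) - np.mean(x**2)

h = 1e-2
d1 = (G_inf(h) - G_inf(-h)) / (2 * h)                 # ~ G_inf'(0) = 2 E(X) = 0
d2 = (G_inf(h) - 2 * G_inf(0.0) + G_inf(-h)) / h**2   # ~ G_inf''(0) = 2

print(f"G_inf'(0) ~ {d1:.4f},  G_inf''(0) ~ {d2:.4f}  => power rank 2")
```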

Using the power rank, Ho and Hsing (1996, 1997) developed a different approach to studying limit theorems for functionals of linear processes. To describe the idea, let us again consider the decomposition

$$X_{t,K}=\sum_{j=0}^{K}a_{j}\varepsilon_{t-j},\qquad \tilde{X}_{t,K}=X_{t}-X_{t,K}=\sum_{j=K+1}^{\infty}a_{j}\varepsilon_{t-j}\quad(K\geq0), $$
and

$$ G_{K}(y):=E \bigl[ G(X_{t,K}+y) \bigr] \quad (K\geq0),\qquad G_{\infty }(y)=E \bigl[ G(X_{t}+y) \bigr] . $$
(4.56)

We also use the convention G −1=G and \(\tilde{X}_{0,-1}=X_{0}\). Note now that if \(\mathcal{F}\) is a sigma field, ξ A is a random variable that is \(\mathcal{F}\)-measurable, and ξ B is a random variable that is independent of \(\mathcal{F}\) and has distribution F B , then

$$ E \bigl[ G(\xi_{A}+\xi_{B}+y)|\mathcal{F} \bigr] =\int G( \xi_{A}+v+y)\,dF_{B}(v)=:G_{B,\ast}( \xi_{A}+y) $$
(4.57)

and

$$ G_{\ast}(y):=E \bigl[ G(\xi_{A}+\xi_{B}+y) \bigr] =E \bigl[ G_{B,\ast} (\xi_{A}+y) \bigr] . $$
(4.58)

Now let \(\mathcal{F}_{K}=\sigma(\varepsilon_{j},-\infty<j\leq K)\) (\(K\in\mathbb{Z}\)). We apply (4.57) and (4.58) with \((\xi_{A},\xi_{B},\mathcal {F})=(\tilde {X}_{t,K-1},X_{t,K-1},\mathcal{F}_{t-K})\) and \((\xi_{A},\xi_{B},\mathcal {F})=(\tilde{X}_{t,K},X_{t,K},\mathcal{F}_{t-(K+1)})\) respectively. We obtain

(4.59)
(4.60)

The point of this approximation is that the first term in the last expression is just the partial sum of the linear sequence, multiplied by a constant. The first term is of a larger order than the second term. Consequently, using Theorem 4.6, we expect

$$n^{-(d+\frac{1}{2})}L_{S}^{-1/2}(n)\sum _{t=1}^{n} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\} \overset{d}{\rightarrow}G_{\infty}^{(1)}(0)B_{H}(1). $$

This is useful, of course, only if \(G_{\infty}^{(1)}(0)\) does not vanish, i.e. if G has power rank one. If \(G_{\infty}^{(1)}(0)=0\), then the expansion is continued until we obtain a non-vanishing quantity \(G_{\infty }^{(m)}(0)\). In that case we say that the power rank of G is m. If, for example, the power rank is 2, the expansion reads further

As before, the second term in the last expression is of a smaller order than the first one. We recognize the first term as \(G_{\infty}^{(2)}(0)U_{n,2}/2!\) (cf. (4.51)). Therefore, using the convergence result (4.52), we have

$$n^{-2d}L_{2}^{-1/2}(n)\sum _{j=1}^{n} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\} \Rightarrow G_{\infty}^{(2)}(0)Z_{2,H}(1)/2!. $$

This can be generalized to arbitrary power ranks. The heuristic explanation above omits many technical details; we now make it more precise, using a modified version of Ho and Hsing’s approach (see Wu 2003). In order to do this, let G be a function, and let \(p\in\mathbb{N}\). Define (cf. (4.51))

$$T_{n}(G;p)=\sum_{t=1}^{n} \Biggl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] -\sum _{r=1}^{p}G_{\infty}^{(r)}(0)V_{t,r} \Biggr\} , $$

where

$$V_{t,r}=\sum_{0\leq j_{1}<\cdots<j_{r}}\prod _{s=1}^{r}a_{j_{s}}\varepsilon _{t-j_{s}}. $$

In particular,

$$T_{n}(G;1)=\sum_{t=1}^{n} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] -G_{\infty}^{(1)}(0)X_{t} \bigr\} . $$

For any random variable Y, let \(\Vert Y\Vert_{r}=(E[|Y|^{r}])^{1/r}\). The following theorem establishes a reduction principle for T n (G;p) that can be viewed as a counterpart to the Gaussian case (see the proof of Theorem 4.4). We state the result assuming that the slowly varying function L a in (B1) is constant. The statement can be modified appropriately to incorporate a general slowly varying function L a (j).

Theorem 4.9

Let X t (\(t\in\mathbb {N}\)) be a linear process defined by (4.37) with coefficients satisfying assumption (B1) with L a (j)≡1. Assume that \(E[|\varepsilon_{1}|^{4+\gamma}]<\infty\) for some γ>0 and

$$ \max_{r=1,2,\ldots,p+1}\sup_{y}\big|G_{\infty}^{(r)}(y)\big|<\infty, $$
(4.61)

where G is defined in (4.56).

  • If (p+1)(1−2d)>1, then \(\Vert T_{n}(G;p)\Vert_{2}^{2}=O(n)\).

  • If (p+1)(1−2d)<1, then

    $$ \big\Vert T_{n}(G;p)\big\Vert_{2}^{2}=O\bigl(n^{2-(p+1)(1-2d)}\bigr). $$
    (4.62)

The proof of this result is postponed to the end of this section. At this moment, let us discuss its consequences and technical assumptions. Assumption (4.61) is in the spirit of Ho and Hsing (1997). Another assumption was considered in Wu (2003). Similarly to definition (4.56), one can argue that

For example,

Hence, for instance, if G has uniformly bounded second-order derivatives, then the limit as δ→0 exists. However, such a strong assumption is in fact not needed, and a condition like (4.61) suffices (see Ho and Hsing 1996, Lemma 6.2, Wu 2003). We may thus write \(G_{0}^{(r)}(y)={E}[G^{(r)}(a_{0}\varepsilon_{0}+y)]\) and

Therefore, it is intuitively clear that properties of \(G_{0}^{(r)}\) are transferred to \(G_{1}^{(r)}\) and by induction to any of \(G_{K}^{(r)}\), K≥1.

Example 4.8

Consider G(u)=1{ux 0} for a fixed x 0. Then G (y)=E[1{X+yx 0}]=P(Xx 0y), and

$$G_{\infty}^{(1)}(0)=\frac{d}{dy}P(X\leq x_{0}-y)|_{y=0}=-p_{X}(x_{0}-y)|_{y=0}=-p_{X}(x_{0}), $$

where p X is the density of X.
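The derivative in this example can be checked numerically. The sketch below (an illustration added here; the standard normal marginal and the value of x 0 are chosen purely for convenience, since then p X is known in closed form) compares a central finite difference of the Monte Carlo estimate of P(X≤x 0−y) with −p X (x 0):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.standard_normal(2_000_000)   # standard normal marginal, purely for illustration
x0, h = 0.5, 1e-2

def G_inf(y):
    # Monte Carlo estimate of G_inf(y) = E[1{X <= x0 - y}] = P(X <= x0 - y)
    return np.mean(x <= x0 - y)

d1 = (G_inf(h) - G_inf(-h)) / (2 * h)   # central difference at y = 0
print(f"finite difference: {d1:.3f}   -p_X(x0): {-norm.pdf(x0):.3f}")
```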

What is the consequence of the theorem above? Take p=1. We obtain \(\Vert T_{n}(G;1)\Vert_{2}^{2}=O(\max\{n,n^{4d}\})\). Recall now Theorem 4.6 that describes convergence of partial sums \(\sum_{t=1}^{n}X_{t}\). We conclude that the limiting behaviour of

$$n^{-(\frac{1}{2}+d)}L_{1}^{-1/2}(n)\sum _{t=1}^{n}\bigl\{ G(X_{t})-E \bigl(G(X_{1})\bigr)\bigr\} $$

is the same as that of

$$n^{-(\frac{1}{2}+d)}L_{1}^{-1/2}(n)G_{\infty}^{(1)}(0)\sum_{t=1}^{n}X_{t}, $$

where L 1(n)=(d(2d+1))−1 L γ (n), and L γ (n) is given in (4.39). If the power rank is greater than one, then one has to apply a higher-order expansion (p≥2). The limiting behaviour of the partial sum then follows from the corresponding limit theorem for U n,p , which was considered in (4.51) and (4.52).

Corollary 4.3

Let \(X_{t}=\sum _{j=0}^{\infty }a_{j}\varepsilon_{t-j}\) (\(t\in\mathbb{Z}\)) be a linear process defined by (4.37) with coefficients satisfying assumption (B1), i.e. a j L a (j)j d−1, d∈(0,1/2). Assume that G has the power rank m. If m(1−2d)<1, then, under the conditions of Theorem 4.9,

$$n^{-(1-m(\frac{1}{2}-d))}L_{m}^{-1/2}(n)\sum _{t=1}^{n}\bigl\{ G(X_{t})-E \bigl(G(X_{1})\bigr)\bigr\}\overset{d}{\rightarrow}G_{\infty}^{(m)}(0)Z_{m,H}(1), $$

where L m is given by (4.47) and L γ by (4.39).
Let us apply Corollary 4.3 to \(X_{t}^{2}\), where X t is a linear process such that \(E(X_{1}^{2})=1\). The example shows that in a sense, the power rank method is distribution free. In contrast, limiting results for Appell polynomials are not directly applicable to \(X_{t}^{2}-1\), unless X t are Gaussian.

Example 4.9

Consider a linear process \(X_{t}=\sum_{j=0}^{\infty}a_{j}\varepsilon_{t-j}\) (\(t\in\mathbb{Z}\)) such that \(\sum_{k=0}^{\infty}a_{k}^{2}=1\) and \(E[\varepsilon_{1}^{2}]=1\). Let G(x)=x 2. Then recall from Example 4.6 that

$$\sum_{t=1}^{n}\bigl(X_{t}^{2}-1 \bigr)=\sum_{t=1}^{n}\sum _{j=0}^{\infty}a_{j}^{2}\bigl( \varepsilon_{t-j}^{2}-1\bigr)+\sum _{t=1}^{n}\sum_{{k,l=0;\ k\not =l}}^{\infty }a_{k}a_{l} \varepsilon_{t-k}\varepsilon_{t-l}. $$

The first term can be represented as \(\sum_{t=1}^{n}Y_{t}\), where Y t (\(t\in\mathbb{Z}\)) is the linear process \(Y_{t}=\sum_{j=0}^{\infty}c_{j}\xi_{t-j}\), \(\xi_{t-j}=\varepsilon_{t-j}^{2}-1\), with summable coefficients \(c_{j}=a_{j}^{2}\). Using Theorem 4.5, we have

$$n^{-1/2}\sum_{t=1}^{n}\sum _{j=0}^{\infty}a_{j}^{2}\bigl( \varepsilon_{t-j}^{2}-1\bigr)\overset{d}{\rightarrow}N \bigl(0,v^{2}\bigr), $$

where \(v^{2}=\sigma_{Y}^{2}+2\sum_{k=1}^{\infty}\gamma_{Y}(k)\). The second term can be recognized as U n,2, see (4.51), (4.52) and (4.53). Therefore,

$$n^{-2d}L_{2}^{-1/2}(n)U_{n,2}\overset{d}{ \rightarrow}Z_{2,H}(1) $$

if d∈(1/4,1/2), where Z 2,H (u) is the Hermite–Rosenblatt process with H=d+1/2. On the other hand,

$$n^{-1/2}U_{n,2}\overset{d}{\rightarrow}\sigma_{S}N(0,1) $$

if d<1/4. Furthermore, the two terms in the decomposition of \(\sum_{t=1}^{n}(X_{t}^{2}-1)\) above are uncorrelated. Therefore, if d>1/4, then

$$n^{-2d}L_{2}^{-1/2}(n)\sum _{t=1}^{n}\bigl(X_{t}^{2}-1\bigr) \overset{d}{\rightarrow }Z_{2,H}(1). $$

Otherwise, if d<1/4,

$$ n^{-1/2}\sum_{t=1}^{n} \bigl(X_{t}^{2}-1\bigr)\overset{d}{\rightarrow}N\bigl(0,v^{2}+ \sigma_{S}^{2}\bigr). $$
(4.63)
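The decomposition used at the beginning of this example is purely algebraic and can be verified numerically. The following sketch (added for illustration; the finite MA coefficients and the sample size are arbitrary choices) checks that the two terms indeed add up to \(\sum_{t}(X_{t}^{2}-1)\) for a finite moving average with \(\sum a_{j}^{2}=1\) and \(E(\varepsilon_{1}^{2})=1\):

```python
import numpy as np

rng = np.random.default_rng(3)

# Finite MA coefficients, normalised so that sum a_j^2 = 1 (illustrative values)
a = np.array([0.6, 0.5, 0.4, 0.3])
a = a / np.linalg.norm(a)
q, n = len(a), 5000

eps = rng.standard_normal(n + q)      # i.i.d. innovations with E[eps^2] = 1

# E[t, j] = eps_{t-j} (up to a common index offset), X_t = sum_j a_j eps_{t-j}
E = np.column_stack([eps[q - 1 - j : n + q - 1 - j] for j in range(q)])
X = E @ a

lhs = np.sum(X**2 - 1)
term1 = np.sum((E**2 - 1) @ a**2)        # sum_t sum_j a_j^2 (eps_{t-j}^2 - 1)
term2 = np.sum(X**2 - (E**2) @ a**2)     # sum_t sum_{k != l} a_k a_l eps_{t-k} eps_{t-l}

print(lhs, term1 + term2)                # equal up to floating-point rounding
```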

Example 4.10

(ARFIMA)

Assume that X t (\(t\in\mathbb{N}\)) is a FARIMA(0,d,0) process as in Examples 4.4 and 4.5. Then

$$\gamma_{X}(k)\sim c_{\gamma}k^{2d-1},\qquad c_{\gamma}=\frac{\sigma_{\varepsilon}^{2}}{\pi}\varGamma(1-2d)\sin(\pi d). $$

Hence, for d∈(1/4,1/2),

$$n^{-2d}L_{2}^{-1/2}(n)\sum _{t=1}^{n}\bigl(X_{t}^{2}-1\bigr) \overset{d}{\rightarrow }Z_{2,H}(1), $$

where

$$L_{2}(n)=2C_{2}c_{\gamma}^{2},\quad C_{2}=\frac{1}{(2(2d-1)+1)(2d+1)}. $$

Of course, this is comparable to the Gaussian case, see Example 4.1.
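For concreteness, the constants appearing in this example can be evaluated with a small helper (added here for illustration; the function name and the default σ ε ²=1 are not from the text, and the formulas are implemented exactly as stated above):

```python
from math import gamma, sin, pi

def farima_constants(d, sigma_eps2=1.0):
    """Constants of Example 4.10 for FARIMA(0,d,0); formulas as stated in the text."""
    c_gamma = sigma_eps2 / pi * gamma(1 - 2 * d) * sin(pi * d)  # gamma_X(k) ~ c_gamma k^{2d-1}
    C2 = 1.0 / ((2 * (2 * d - 1) + 1) * (2 * d + 1))
    L2 = 2 * C2 * c_gamma**2
    return c_gamma, C2, L2

print(farima_constants(0.4))
```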

4.2.5.5 Technical Details for Theorem 4.9

We write the proof for p=1 only, leaving out some technical details. They can be found in Ho and Hsing (1996, 1997) and Wu (2003). Using the notation \(\mathcal{V}_{t}=(\varepsilon_{t},\varepsilon_{t-1},\ldots)\), we may write \(T_{n}(G;1)=\sum_{t=1}^{n}U(\mathcal{V}_{t})\), where U(⋅) is a suitable function. Let P K be the conditional expectation operator

$$P_{K}Y=E[Y|\mathcal{V}_{K}]-E[Y|\mathcal{V}_{K-1}]. $$

Noting that P K T n (G;1)=0 if K>n, we can write down the orthogonal decomposition

$$T_{n}(G;1)=\sum_{K=-\infty}^{n}P_{K}T_{n}(G;1). $$

Furthermore,

$$P_{K}T_{n}(G;1)=\sum_{t=\max\{K,1\}}^{n}P_{K}U(\mathcal{V}_{t}), $$

since the terms corresponding to t≤K−1 vanish. Therefore,

$$\big\Vert T_{n}(G;1)\big\Vert_{2}^{2}=\sum _{K=-\infty}^{n}\big\Vert P_{K}T_{n}(G;1)\big\Vert_{2}^{2}=\sum_{K=-\infty}^{n} \Biggl \Vert \sum_{t=\max\{K,1\}}^{n}P_{K}U(\mathcal{V}_{t})\Biggr \Vert _{2}^{2}. $$

Now, for any stationary sequence Y t (\(t\in\mathbb{N}\)), we have \(\Vert \sum_{t=1}^{n}Y_{t}\Vert_{2}\leq\sum_{t=1}^{n}\Vert Y_{t}\Vert_{2}\). Therefore, if we define

$$\psi_{t-K}^{2}=\big\Vert P_{K}U(\mathcal{V}_{t})\big\Vert_{2}^{2}=\big\Vert P_{-(t-K)}U(\mathcal{V}_{0})\big\Vert_{2}^{2} $$

and use Lemma 4.17 below, we obtain

(4.64)
(4.65)

A rough bound for this expression can be established as follows:

Let us evaluate the first term only:

This is statement (4.62) of Theorem 4.9 when p=1. We note that the integral above is well defined. For example, as s→∞, the integrand behaves like \((s^{2(d-1)+1/2})^{2}=s^{4(d-1)+1}\), which is integrable at infinity since d<1/2. A detailed computation can be found in Lemma 5 in Wu (2003).

To finish the proof of Theorem 4.9, we have to prove the following lemma.

Lemma 4.17

Assume that the conditions of Theorem 4.9 are satisfied. Then

$$\big\Vert P_{-K}U(\mathcal{V}_{0})\big\Vert_{2}^{2} =O\bigl(K^{4(d-1)+1}\bigr),\quad K\geq0. $$

Proof

We have

$$P_{-K}U(\mathcal{V}_{0})= \bigl( E\bigl[G(X_{0})|\mathcal{F}_{-K}\bigr]-E\bigl[G(X_{0})|\mathcal{F}_{-(K+1)}\bigr] \bigr) -G_{\infty}^{(1)}(0) \bigl( E[X_{0}|\mathcal{F}_{-K}]-E[X_{0}|\mathcal{F}_{-(K+1)}] \bigr) . $$
Now we use the decomposition \(X_{0}=X_{0,K-1}+\tilde{X}_{0,K-1}\) and note that X 0,K−1 is independent of \(\mathcal{F}_{-K}\), whereas \(\tilde {X}_{0,K-1}\) is measurable w.r.t. this sigma field. Thus, recalling that E(ε 1)=0, the second term in \(P_{-K}U(\mathcal{V}_{0})\) yields

$$E[X_{0}|\mathcal{F}_{-K}]-E[X_{0}|\mathcal{F}_{-(K+1)}]=\tilde{X}_{0,K-1}-\tilde{X}_{0,K}=a_{K}\varepsilon_{-K}. $$

The first term in \(P_{-K}U(\mathcal{V}_{0})\) is

$$G_{K-1}(\tilde{X}_{0,K-1})-G_{K}(\tilde{X}_{0,K}). $$

Applying (4.57) and (4.58) with \((\xi_{A},\xi_{B},\mathcal{F})=(\tilde{X}_{0,K-1},X_{0,K-1},\mathcal{F}_{0-K})\) and \((\xi_{A},\xi_{B},\mathcal{F})=(\tilde{X}_{0,K},X_{0,K},\mathcal{F}_{0-(K+1)})\), our goal is to evaluate the bound

$$\big\Vert P_{-K}U(\mathcal{V}_{0})\big\Vert_{2}^{2}=\big\Vert G_{K-1}(\tilde{X}_{0,K-1})-G_{K}(\tilde{X}_{0,K})-G_{\infty}^{(1)}(0)a_{K}\varepsilon _{0-K}\big\Vert_{2}^{2}. $$

In the first step, we will replace G K−1 by G K . Note first that for any \(y\in\mathbb{R}\),

$$ G_{K}(y)=E \bigl[ G(X_{0,K}+y) \bigr] =E \bigl[ G_{K-1}(a_{K}\varepsilon_{-K}+y) \bigr] . $$
(4.66)

Taking into account that E(ε K )=0 and applying a Taylor expansion, we therefore obtain

Therefore,

The first term I 1 is treated again using a Taylor approximation: it is bounded by \(a_{K}^{4}E^{2}(\varepsilon_{1}^{2})\sup _{y}|G_{K}^{(2)}(y)|\). As for the second term, since \(\tilde{X}_{0,K}\) and ε K are independent, we have

$$I_{2}=a_{K}^{2}E\bigl[\varepsilon^{2}\bigr]\big\Vert G_{\infty}^{(1)}(0)-G_{K}^{(1)}(\tilde{X}_{0,K})\big\Vert_{2}^{2}.$$

Thus, in analogy to (4.66), by conditioning on \(\tilde{X}_{0,K}\),

$$ G_{\infty}^{(1)}(y)=E \bigl[ G^{(1)}(X+y) \bigr] =E \bigl[ G_{K}^{(1)}(\tilde{X}_{0,K}+y) \bigr] . $$
(4.67)

Furthermore, for any two independent random variables η A and η B , we have E[(η A −E[η B ])2]≤E[(η A −η B )2]. Therefore, using (4.67) with \(\tilde{Y}_{0,K}\), an independent copy of \(\tilde{X}_{0,K}\), we obtain, for a generic constant C,

$$\big\Vert G_{\infty}^{(1)}(0)-G_{K}^{(1)}(\tilde{X}_{0,K})\big\Vert_{2}^{2}\leq E \bigl[ \bigl( G_{K}^{(1)}(\tilde{X}_{0,K})-G_{K}^{(1)}(\tilde{Y}_{0,K}) \bigr)^{2} \bigr] \leq CE \bigl[ (\tilde{X}_{0,K}-\tilde{Y}_{0,K})^{2} \bigr] \leq C\sum_{j=K+1}^{\infty}a_{j}^{2}. $$
Hence,

$$I_{2}\leq Ca_{K}^{2}\sum_{j=K+1}^{\infty}a_{j}^{2}\sim Ca_{K}^{2}\sum _{j=K+1}^{\infty}j^{2(d-1)}\sim CK^{4(d-1)+1}. $$

This finishes the proof of the lemma.

Note that we had to assume that, for p=1,

$$\max_{r=1,2}\sup_{y}\big|G_{K}^{(r)}(y)\big|<\infty. $$

This explains the conditions of Theorem 4.9. □

4.2.6 Stochastic Volatility Models and Their Modifications

In this section we consider limit theorems for partial sums of stochastic volatility models. Let X t =σ t ξ t \((t\in\mathbb{N})\), where

$$\sigma_{t}=\sigma(\zeta_{t}),\quad \zeta_{t}=\sum_{j=1}^{\infty} a_{j}\varepsilon_{t-j}, $$

and σ(⋅) is a positive function. It is assumed that (ξ t ,ε t ) (\(t\in\mathbb{Z}\)) is a sequence of i.i.d. random vectors and E(ε 1)=0. The linear process ζ t is assumed to have long memory with autocovariance function γ ζ (k)∼L γ (k)k 2d−1, d∈(0,1/2). However, we do not assume at the moment that E(ξ 1)=0. If the sequences ξ t and ε t are mutually independent, then the model is called LMSV (Long-Memory Stochastic Volatility), but for the purpose of this section, we do not need to make this assumption.

Let \(\mathcal{G}_{j}\) be the sigma field generated by ξ l ,ε l , lj. We consider partial sums

$$S_{n}(u)=\sum_{t=1}^{[nu]} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\} \quad\bigl(u \in[0,1]\bigr), $$

where G is a measurable function such that E[G 2(X 1)]<∞.

The asymptotic behaviour of partial sums is described in the following theorem. For simplicity, we formulate it in a Gaussian setting; however, it can be extended to linear processes, using the results of Sect. 4.2.5 instead of Theorem 4.4.

Theorem 4.10

Consider the stochastic volatility model described above with \(v^{2}=\operatorname {var}(G(X_{1}))<\infty\) (but possibly \(E(\xi_{1})\not=0\)). Assume in addition that ε t (\(t\in\mathbb{Z}\)) are standard normal.

  • If \(E[G(X_{1})|\mathcal{G}_{0}]=0\), then

    $$ n^{-1/2}S_{n} (u)\Rightarrow v B(u), $$
    (4.68)

    where B(u) (u∈[0,1]) is a standard Brownian motion.

  • If \(E[G(X_{1})|\mathcal{G}_{0}]\not=0\), then

    $$ n^{-(1-m(\frac{1}{2}-d))}L_{m}^{-1/2}(n)\sum _{t=1}^{[nu]}\bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\}\Rightarrow\frac{J(m)}{m!}Z_{m,H}(u), $$
    (4.69)

where ⇒ denotes weak convergence in D[0,1], Z m,H (u) (u∈[0,1]) is the Hermite–Rosenblatt process, m is the Hermite rank of

    $$\tilde{G}(y)=\int G\bigl(s\sigma(y)\bigr)\,dF_{\xi}(s) $$

    with F ξ denoting the distribution of ξ, \(L_{m}(n)=m!C_{m}L_{\gamma }^{m}(n)\) (cf. (4.39), (4.21), (4.22)) and \(J(m)=E[\tilde{G}(\zeta_{1})H_{m}(\zeta_{1})]\).

Proof

Note that σ t is measurable w.r.t. \(\mathcal{G}_{t-1}\), whereas ξ t is independent of \(\mathcal{G}_{t-1}\). Thus,

$$S_{n}(u)=\sum_{t=1}^{[nu]} \bigl\{ G(X_{t})-E \bigl[ G(X_{t})|\mathcal{G}_{t-1} \bigr] \bigr\} +\sum_{t=1}^{[nu]} \bigl\{ E \bigl[ G(X_{t})|\mathcal{G}_{t-1} \bigr] -E \bigl[ G(X_{1}) \bigr] \bigr\} =:M_{n}(u)+R_{n}(u). $$
Note that the first part is a martingale. For this part, it suffices to verify the conditions of the martingale central limit theorem; see Lemma 4.2. Set X t,n =n −1/2 G(X t ). The Lindeberg condition is clearly satisfied since

$$E \bigl[ \tilde{X}_{t,n}^{2}1\bigl\{|\tilde{X}_{t,n}|> \delta\bigr\} \bigr] \leq4E \bigl[ X_{t,n}^{2}1\bigl\{|X_{t,n}|> \delta\bigr\} \bigr] \rightarrow0 $$

on account of E[G 2(X 1)]<∞, where \(\tilde{X}_{t,n}=X_{t,n}-E[X_{t,n}|\mathcal{G}_{t-1}]\). Furthermore, \(E[G^{2}(X_{t})|\mathcal{G}_{t-1}]\) is a measurable function of the random variable ζ t and hence of the i.i.d. sequence ε t−1,ε t−2,…. Therefore, the sequence \(E[G^{2}(X_{t})|\mathcal{G}_{t-1}]\) (t≥1) is ergodic, and \(n^{-1}\sum_{t=1}^{n}E[G^{2}(X_{t})|\mathcal{G}_{t-1}]\) converges in probability to E[G 2(X 1)]. Therefore, we conclude (4.68) for the martingale part M n (u).

On the other hand, the second part R n (u) can be written as

$$R_{n}(u)=\sum_{t=1}^{[nu]} \bigl\{ \tilde{G}(\zeta_{t})-E \bigl[ \tilde {G}(\zeta_{t}) \bigr] \bigr\} , $$

and (4.69) can be concluded using Theorem 4.4. □

Several comments have to be made here. We note that the proof of (4.68) does not involve a particular structure of the model. Consider for example the standard stochastic volatility model where E(ξ 1)=0. If we take G(x)=x, then \(n^{-1/2}\sum_{t=1}^{[nu]}X_{t}\) converges to a Brownian motion without the assumption of Gaussianity on ε t . Furthermore, it is worth mentioning that this approach works (in the case (4.68) only) for partial sums of GARCH, ARCH(∞) or LARCH(∞) models; for the latter, see Beran (2006).

Example 4.11

Assume that G(y)=y 2. Then \(\tilde{G}(y)=E[\xi _{1}^{2}]\sigma^{2}(y)\). Therefore, m is the Hermite rank of σ 2(y). In particular, if σ(y)=exp(y), then m=1. We conclude

$$n^{-(d+1/2)}L_{1}^{-1/2}(n)\sum _{t=1}^{[nu]}\bigl(X_{t}^{2}-E \bigl(X_{1}^{2}\bigr)\bigr)\Rightarrow J(1)B_{H}(u), $$

where \(J(1)=E( \zeta_{1}\exp(2\zeta_{1})) E(\xi_{1}^{2})\). This is analogous to Surgailis and Viano (2002); note however that the authors considered general linear processes.

If \(E(\xi_{1})\not=0\) and G(x)=x, then (4.68) is no longer valid; rather (4.69) holds with m=1.

Example 4.12

(Long-Memory Stochastic Duration, LMSD)

For the purpose of this example, we assume that random variables ξ t (\(t\in\mathbb{N}\)) are strictly positive and hence non-centred. Furthermore, it is assumed that the sequences ξ t and σ t are independent. Then X t =ξ t σ t inherits the dependence structure from σ t , i.e.

$$\mathit{cov}(X_{0},X_{k})=E(X_{0}X_{k})-E(X_{0})E(X_{k})=E^{2}[\xi_{1}]\mathit{cov}(\sigma _{0},\sigma_{k}). $$

Assume that G(x)=x and σ(x)=exp(x). Then \(\tilde{G}(y)=E(\xi _{1})\exp(y)\) and m=1. Application of Theorem 4.10 yields

$$n^{-(d+1/2)}L_{1}^{-1/2}(n)\sum _{t=1}^{[nu]}\bigl(X_{t}-E(X_{1}) \bigr)\Rightarrow J(1)B_{H}(u) $$

weakly in D[0,1], where B H (⋅) is a fractional Brownian motion with H=d+1/2, and J(1)=E[ζ 1exp(ζ 1)]E[ξ 1].

Example 4.13

We illustrate the centering effect with a simulation example. First, we generate n=1000 i.i.d. standard normal random variables ξ t . Then we simulate independently n=1000 observations ζ t from a Gaussian FARIMA(0,d,0) process with d=0.4 and compute σ t =exp(ζ t ). Then, we construct two stochastic volatility models: a centred one, X t =ξ t σ t and a non-centred one, \(\tilde{X}_{t}=(\xi _{t}+1)\sigma_{t}\). Finally, we plot the partial sum sequences \(S_{k}=\sum_{t=1}^{k}X_{t}\) and \(\tilde{S}_{k}=\sum_{t=1}^{k}(\tilde{X}_{t}-E(\tilde{X}_{1}))\), k=1,…,n. The corresponding partial sum processes are plotted in Fig. 4.3. The smoother path in the second, non-centred, case indicates an influence of long memory (cf. Fig. 4.1).

Fig. 4.3 Partial sums for a centred and a non-centred stochastic volatility model
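A minimal sketch of this simulation is given below (added for illustration; it is not the authors' code). Here the FARIMA(0,d,0) series is generated from a truncated MA(∞) representation, and the expectation of the non-centred series is replaced by its sample mean; the truncation level, seed and plotting details are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n, d, J = 1000, 0.4, 2000          # sample size, memory parameter, MA truncation level (assumptions)

# FARIMA(0,d,0) via its (truncated) MA(infinity) representation:
# psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j
psi = np.ones(J + 1)
for j in range(1, J + 1):
    psi[j] = psi[j - 1] * (j - 1 + d) / j

eps = rng.standard_normal(n + J)
zeta = np.convolve(eps, psi, mode="valid")      # zeta_t = sum_j psi_j eps_{t-j}

sigma = np.exp(zeta)                            # sigma_t = exp(zeta_t)
xi = rng.standard_normal(n)                     # i.i.d. N(0,1), independent of zeta

X = xi * sigma                                  # centred model
X_tilde = (xi + 1.0) * sigma                    # non-centred model, E(xi + 1) = 1

S = np.cumsum(X)
S_tilde = np.cumsum(X_tilde - np.mean(X_tilde)) # sample mean used as a proxy for E(X_tilde)

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
axes[0].plot(S);       axes[0].set_title("centred: $S_k$")
axes[1].plot(S_tilde); axes[1].set_title("non-centred: $\\tilde S_k$")
plt.tight_layout()
plt.show()
```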

4.2.7 ARCH(∞) Models

Recall from Definition 2.1 that the ARCH(∞) model has the form X t =σ t ξ t , where ξ t (\(t\in\mathbb{Z}\)) are i.i.d. zero mean random variables with variance \(\sigma_{\xi}^{2} \). Also,

$$\sigma_{t}^{2}=b_{0}+\sum_{j=1}^{\infty}b_{j}X_{t-j}^{2}. $$

Furthermore, if \(\sigma_{\xi}^{2}\sum_{j=1}^{\infty}b_{j}<1\), then X t (\(t\in\mathbb{Z}\)) is stationary, and \(E(X_{1}^{2})<\infty\). The sequence X t (\(t\in\mathbb{Z}\)) is a martingale difference sequence. Using the martingale central limit theorem (see Lemma 4.2), we conclude the following result. It can also be stated in a functional form (as convergence to a Brownian motion).

Corollary 4.4

Consider an ARCH(∞) model as in Definition 2.1. Assume that \(\sigma_{\xi}^{2}\sum_{j=1}^{\infty}b_{j}<1\). Then

$$n^{-1/2}\sum_{t=1}^{n}X_{t} \overset{d}{\rightarrow}N\bigl(0,\sigma_{X}^{2}\bigr), $$

where

$$\sigma_{X}^{2}=\frac{\sigma_{\xi}^{2}b_{0}}{1-\sigma_{\xi}^{2}\sum_{j=1}^{\infty}b_{j}}. $$
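A small Monte Carlo check of this corollary for an ARCH(1) specification (an illustration added here; the coefficients, sample size and number of replications are arbitrary, and ξ t is taken standard normal so that \(\sigma_{\xi}^{2}=1\)) compares the empirical variance of \(n^{-1/2}\sum X_{t}\) with \(\sigma_{X}^{2}=b_{0}/(1-b_{1})\):

```python
import numpy as np

rng = np.random.default_rng(7)
b0, b1 = 0.5, 0.4            # ARCH(1) coefficients with b1 < 1 (illustrative values)
n, reps, burn = 2000, 500, 500

sums = np.empty(reps)
for r in range(reps):
    xi = rng.standard_normal(n + burn)   # sigma_xi^2 = 1
    X = np.zeros(n + burn)
    X[0] = np.sqrt(b0) * xi[0]
    for t in range(1, n + burn):
        X[t] = np.sqrt(b0 + b1 * X[t - 1] ** 2) * xi[t]
    sums[r] = X[burn:].sum() / np.sqrt(n)

print(f"empirical var: {sums.var():.3f}   sigma_X^2 = b0/(1 - b1) = {b0 / (1 - b1):.3f}")
```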

Next, we are interested in the asymptotic behaviour of

$$S_{n}=\sum_{t=1}^{n} \bigl(X_{t}^{2}-E\bigl(X_{1}^{2}\bigr) \bigr). $$

To deal with this, we will use the general Definition 2.2 of ARCH(∞) models and set \(Y_{t}=X_{t}^{2}=v_{t}\zeta_{t}=\sigma _{t}^{2}\xi_{t}^{2}\). In contrast to X t (\(t\in\mathbb{Z}\)), the squared sequence is not a martingale difference sequence. However, we recall from Theorem 2.3 that, under the existence condition \(\mu_{\zeta}^{1/2}\sum_{j=1}^{\infty}b_{j}<1\) (which guarantees \(E(Y_{1}^{2})<\infty\)), we have the summability of the covariances, \(\sum_{k=-\infty}^{\infty}|\gamma_{Y} (k)|<\infty\). Thus, we may expect a central limit theorem for the partial sum S n with the rate n −1/2. Indeed, we will argue that the ARCH(∞) model Y t =v t ζ t , \(v_{t}=b_{0}+\sum_{j=1}^{\infty}b_{j}Y_{t-j}\), can be written using the Wold decomposition with respect to a martingale difference.

To see this, assume that \(E(\zeta_{1})=E(\xi_{1}^{2})=1\) and let \(\psi(z)=1-\sum_{j=1}^{\infty}b_{j}z^{j}\). Since \(\sum_{j=1}^{\infty}b_{j}<1\), we conclude that ψ(⋅) is analytic on {z:|z|<1} and has no zeros in {z:|z|≤1}. Hence, it is invertible, and \(\psi^{-1}(z)=\sum _{j=0}^{\infty}\tilde{b}_{j}z^{j}\) with \(\sum_{j=0}^{\infty}|\tilde{b}_{j}|<\infty\). Now, v t =b 0+(1−ψ(B))Y t , which leads to

$$\psi(B)Y_{t}=Y_{t}-v_{t}+b_{0}=v_{t}(\zeta_{t}-1)+b_{0}. $$

On the other hand, taking expectations in \(Y_{t}=v_{t}\zeta_{t}\) and \(v_{t}=b_{0}+\sum_{j=1}^{\infty}b_{j}Y_{t-j}\) gives

$$E(Y_{1})=E(v_{1})=b_{0}+\sum_{j=1}^{\infty}b_{j}E(Y_{1}), $$

so that

$$\psi(B)E(Y_{1})=\psi(1)E(Y_{1})=b_{0}. $$

Hence, \(\psi(B)(Y_{t}-E(Y_{1}))=v_{t}(\zeta_{t}-1)\) and

$$Y_{t}-E(Y_{1})=\sum_{j=0}^{\infty}\tilde{b}_{j}v_{t-j}(\zeta_{t-j}-1). $$

We note that v t (ζ t −1) (\(t\in\mathbb{Z}\)) is a martingale difference sequence. Therefore, the centred Y t has a Wold decomposition with summable coefficients \(\tilde{b}_{j}\), where the innovations v t (ζ t −1) are uncorrelated martingale differences. Consequently, we could in principle apply the same method as in the proof of Theorem 4.5, provided that it can be generalized to possibly dependent innovations that are martingale differences. Since this is possible, we can conclude the following result.

Theorem 4.11

Consider an ARCH(∞) process as in Definition 2.2. Assume that \(\sqrt{E[ \xi_{1}^{4}] }\sum_{j=1}^{\infty}b_{j}<1\). Then

$$n^{-1/2}\sum_{t=1}^{n} \bigl(Y_{t}-E(Y_{1})\bigr) \overset{d}{\rightarrow}N\bigl(0, \sigma_{Y}^{2}\bigr), $$

where \(\sigma_{Y}^{2}=\sum_{k=-\infty}^{\infty}\gamma_{Y}(k)\).
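The coefficients \(\tilde{b}_{j}\) of ψ −1(z) used in the argument preceding Theorem 4.11 can be computed by a standard power-series inversion. The following sketch (added for illustration; the function name and the example coefficients are not from the text) uses the recursion \(\tilde{b}_{0}=1\), \(\tilde{b}_{k}=\sum_{i=1}^{k}b_{i}\tilde{b}_{k-i}\), obtained by comparing coefficients in ψ(z)ψ −1(z)=1:

```python
import numpy as np

def invert_psi(b, n_coef):
    """Coefficients btilde_j of psi^{-1}(z), where psi(z) = 1 - sum_{j>=1} b_j z^j.

    Recursion: btilde_0 = 1, btilde_k = sum_{i=1}^{k} b_i btilde_{k-i},
    from comparing coefficients in psi(z) * psi^{-1}(z) = 1.
    """
    b = np.asarray(b, dtype=float)        # b[0] is b_1, b[1] is b_2, ...
    btilde = np.zeros(n_coef)
    btilde[0] = 1.0
    for k in range(1, n_coef):
        m = min(k, len(b))
        btilde[k] = np.dot(b[:m], btilde[k - 1 :: -1][:m])
    return btilde

# Example with two nonzero coefficients and sum b_j < 1 (illustrative values)
btilde = invert_psi([0.3, 0.2], n_coef=25)
print(btilde[:6], "  partial sum:", btilde.sum())   # partial sum approaches 1/(1 - 0.5) = 2
```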

4.2.8 LARCH Models

Recall that a LARCH(∞) process is defined as

$$X_{t}=\sigma_{t}\xi_{t},\qquad \sigma_{t}=b_{0}+\sum_{j=1}^{\infty}b_{j}X_{t-j}, $$

where b 0≠0, and ξ t (\(t\in\mathbb{Z}\)) are i.i.d. zero mean random variables with \(\sigma_{\xi}^{2}=E(\xi_{1}^{2})=1\). As in the case of ARCH(∞) processes, the sequence X t is a martingale difference. Therefore, the statement of Corollary 4.4 still holds with \(\sigma_{X}^{2}=E[\sigma_{1}^{2}]=b_{0}^{2}/(1-\Vert b\Vert_{2}^{2})\) (cf. (2.51)).

The situation is different when we consider \(X_{t}^{2}\). We can use the decomposition (cf. (2.56))

$$ \sum_{t=1}^{n}\bigl(X_{t}^{2}-E \bigl(X_{1}^{2}\bigr)\bigr)=\sum_{t=1}^{n} \bigl(\sigma_{t}^{2}-E\bigl(\sigma_{1}^{2} \bigr)\bigr)+\sum_{t=1}^{n}\bigl( \xi_{t}^{2}-1\bigr)\sigma_{t}^{2}. $$
(4.70)

The second term is a martingale and therefore of the order \(O_{P}(\sqrt{n})\). Therefore, in the case of a long-memory LARCH(∞) process, the asymptotic behaviour of \(\sum_{t}(X_{t}^{2}-E(X_{t}^{2}))\) is the same as that of \(\sum_{t}(\sigma_{t}^{2}-E(\sigma_{t}^{2}))\). On the other hand, (2.57) of Theorem 2.7 suggests that \(\sum_{t}(\sigma_{t}^{2}-E(\sigma_{1}^{2}))\) behaves (up to a constant) like ∑ t (σ t E(σ 1)). This will be justified below. We then obtain the following result.

Theorem 4.12

Consider a LARCH(∞) process. Let μ p =E[|ξ 1|p]<∞. Assume that \(11\mu_{4}^{1/2}b^{2}<1\), where \(b=\sum_{j=1}^{\infty}b_{j}^{2}\), and that

$$ b_{j}\sim c_{b}j^{d-1}\quad(j\rightarrow\infty), $$
(4.71)

where c b >0, d∈(0,1/2). Then

$$n^{-(d+1/2)}\sum_{t=1}^{[nu]} \bigl(X_{t}^{2}-E\bigl(X_{1}^{2}\bigr) \bigr)\Rightarrow2b_{0}^{-1}E\bigl( \sigma_{1}^{2}\bigr)c_{1} \biggl( \frac{1}{d(2d+1)} \biggr)^{1/2}B_{H}(u), $$

where ⇒ denotes weak convergence in D[0,1], B H (u) is a fractional Brownian motion with the Hurst parameter H=d+1/2, and

$$c_{1}= \biggl( \frac{b_{0}^{2}}{1-\Vert b\Vert^{2}} \biggr)^{1/2}\sqrt{B(d,1-2d)}c_{b}. $$

Remark 4.1

According to Theorem 2.7, the condition \(11\mu _{4}^{1/2}b^{2}<1\) implies that the fourth moment of X t is finite.

Proof

Step 1:

First, we look at \(\sum_{t=1}^{[nu]}(\sigma_{t}-E(\sigma _{1}))\). It can be written as

$$\sum_{t=1}^{n}\bigl(\sigma_{t}-E( \sigma_{1})\bigr)=\sum_{t=1}^{n}\sum _{l=1}^{\infty }b_{l}\sigma_{t-l} \xi_{t-l}=\sum_{t=1}^{n}\sum _{l=-\infty}^{t-1}b_{t-l}\sigma_{l}\xi_{l}. $$

We note that σ t ξ t (\(t\in\mathbb{Z}\)) are uncorrelated and martingale differences. Therefore, we have the partial sum of a process ∑b tl σ l ξ l that is a weighted linear sum with innovations being martingale differences. This is similar, though not identical, to the sum studied in Sect. 4.2.5 (the difference is that the innovations are only uncorrelated, not independent, i.e. we do not have a linear process). To identify asymptotic constants, rewrite the sum as \(\sum_{t=1}^{n}\sum_{l=1}^{\infty}b_{l}\xi_{t-l}\sigma_{t-l}\). Then for t<t′,

$$\mathit{cov} \Biggl( \sum_{l=1}^{\infty}b_{l} \xi_{t-l}\sigma_{t-l},\sum_{l=1}^{\infty }b_{l} \xi_{t^{\prime}-l}\sigma_{t^{\prime}-l} \Biggr) =\operatorname {var}(\xi_{1} \sigma_{1})\sum_{l=1}^{\infty}b_{l}b_{l+t^{\prime}-t}. $$

If (4.71) holds, then, as \(k=t^{\prime}-t\rightarrow\infty\), the covariance behaves like

$$\operatorname {var}(\xi_{1}\sigma_{1})\sum_{l=1}^{\infty}b_{l}b_{l+k}\sim \operatorname {var}(\xi_{1}\sigma_{1})\,c_{b}^{2}B(d,1-2d)k^{2d-1}. $$
Using known results for linear processes (see Lemma 4.9), we obtain, as n→∞,

$$\operatorname {var}\Biggl( \sum_{t=1}^{n} \sigma_{t}\Biggr) \sim \operatorname {var}( \xi_{1}\sigma_{1})\frac{1}{d(2d+1)}c_{b}^{2}B(d,1-2d)n^{2d+1} $$

(note that these results are applicable as long as the innovations are uncorrelated). Now,

$$\operatorname {var}(\xi_{0}\sigma_{0})=\frac{b_{0}^{2}}{1-\Vert b\Vert_{2}^{2}}. $$

Theorem 4.6 can be generalized to the case where innovations are martingale differences. Setting

$$c_{1}= \biggl( \frac{b_{0}^{2}}{1-\Vert b\Vert^{2}} \biggr)^{1/2} \bigl( B(d,1-2d) \bigr)^{1/2}c_{b},$$

one then can apply the generalized version of Theorem 4.6 to obtain

$$ \frac{1}{n^{d+1/2}}\sum_{t=1}^{[nu]}\bigl( \sigma_{t}-E(\sigma_{1})\bigr)\Rightarrow c_{1} \biggl( \frac{1}{d(2d+1)} \biggr)^{1/2}B_{H}(u). $$
(4.72)
Step 2:

To deal with \(\sum_{t=1}^{[nu]}(\sigma_{t}^{2}-E(\sigma _{1}^{2}))\), we recall that (cf. (2.57))

$$\mathit{cov}\bigl(\sigma_{t}^{2},\sigma_{t+k}^{2} \bigr)\sim \biggl( \frac{2E(\sigma_{1}^{2})}{b_{0}} \biggr)^{2}\mathit{cov}( \sigma_{t},\sigma_{t+k})\quad(k\rightarrow \infty). $$

The implication is that the asymptotic behaviour of the partial sum is the same as that of

$$2b_{0}^{-1}E\bigl[\sigma_{1}^{2}\bigr] \sum_{t=1}^{[nu]}\bigl(\sigma_{t}-E( \sigma_{1})\bigr) $$

(though more detailed arguments are required to obtain a similar linear representation as for σ t ). Hence,

$$n^{-(d+1/2)}\sum_{t=1}^{[nu]}\bigl( \sigma_{t}^{2}-E\bigl(\sigma_{1}^{2}\bigr) \bigr)\Rightarrow 2b_{0}^{-1}E\bigl(\sigma_{1}^{2} \bigr)c_{1} \biggl( \frac{1}{d(2d+1)} \biggr)^{1/2}B_{H}(u). $$

Using this and decomposition (4.70), we obtain the result. □

4.2.9 Summary of Limit Theorems for Partial Sums

We summarize the main results for partial sums under long memory in Table 4.1. For simplicity, the slowly varying functions are assumed to be constant in this summary. Also, only \(X_{t}^{2}\) is considered as a representative of nonlinear transformations.

Table 4.1 Limits for partial sums with finite moments

4.3 Limit Theorems for Sums with Infinite Moments

4.3.1 Introduction

In this section we present limit theorems for partial sums of long-memory processes with infinite moments. Although the theory is quite well understood for weakly dependent random variables (Davis and Resnick 1985, Davis and Hsing 1995, Denker and Jakubowski 1989, Dabrowski and Jakubowski 1994, Bartkiewicz et al. 2011), the long-memory case is less well developed, except for linear processes. Results for linear processes with long memory were proven already several decades ago in Astrauskas (1983) and Kasahara and Maejima (1988). Subordinated linear processes were studied in Hsing (1999), Koul and Surgailis (2001), Surgailis (2002, 2004), Vaičiulis (2003). Surprisingly, the martingale decomposition method, used for finite-variance random variables in Theorem 4.9, works here as well. Subordinated Gaussian processes were considered for instance in Davis (1983) and Sly and Heyde (2008). Limiting results for infinite-variance stochastic volatility models with long memory are almost non-existent; see McElroy and Politis (2007), Surgailis (2008), Kulik and Soulier (2012). In particular, both subordinated Gaussian processes and stochastic volatility models can be treated using a point process methodology. A complete list of the by now quite extensive literature would be too long to be included here. However, some important results and more references can be found for instance in Astrauskas et al. (1991), Benassi et al. (2002), Heath et al. (1998), Houdré and Kawai (2006), Kokoszka and Taqqu (1995a, 1995b, 1996, 1997, 1999), Koul and Surgailis (2001), Samorodnitsky (2004), Samorodnitsky and Taqqu (1994), Surgailis (2004), Zhou and Wu (2010).

First, we will summarize (with some details) results on regularly varying distributions, stable laws and point processes, referring the reader for details to standard textbooks such as Bingham et al. (1989), Feller (1971), Kallenberg (1997), Resnick (2007), Samorodnitsky and Taqqu (1994), Embrechts et al. (1997).

4.3.2 General Tools: Regular Variation, Stable Laws and Point Processes

4.3.2.1 Regular Variation

Let X t (\(t\in\mathbb{N}\)) be an i.i.d. sequence whose marginal distribution has regularly varying tails:

$$ P(X_{1}>x)\sim\frac{1+\beta}{2}x^{-\alpha}L_{X}(x),\qquad P(X_{1}<-x)\sim\frac{1-\beta}{2}x^{-\alpha}L_{X}(x)\quad(x\rightarrow\infty), $$
(4.73)

where L X (⋅) is slowly varying at infinity, and β∈[−1,1]. Condition (4.73) is the balanced tail condition. It is equivalent to P(|X 1|>x)∼x α L X (x) and

$$\lim_{x\rightarrow\infty}\frac{P(X_{1}>x)}{P(|X_{1}|>x)}=\frac{1+\beta}{2},\qquad\lim_{x\rightarrow\infty}\frac{P(X_{1}<-x)}{P(|X_{1}|>x)}=\frac{1-\beta}{2}. $$

A typical example is a random variable with Cauchy density \(p_{X}(x)=\pi^{-1}(1+x^{2})^{-1}\). This random variable is symmetric, and P(X 1>x)∼(πx)−1 as x→∞. Therefore, the Cauchy distribution is regularly varying with index α=1. Another example is a (two-sided) Pareto distribution where

$$P\bigl(|X_{1}|>x\bigr)=x^{-\alpha}\quad(x>1). $$

We note that if α∈(0,2), then the random variable X has an infinite second moment. The case α=2 requires special attention.

Example 4.14

Assume that L X (x)≡1 and that for x>x 0>0, we have \(\bar{F}_{|X|} (x):=P(|X|>x)=x^{-\alpha}\) with α=2. Then

$$\int_{x_{0}}^{\infty}x\bar{F}_{|X|}(x)\,dx=\int_{x_{0}}^{\infty}xx^{-\alpha} \,dx=\int_{x_{0}}^{\infty}x^{-1}\,dx=+\infty. $$

On the other hand, if L X (x)=(logx)−2, then

$$\int_{x_{0}}^{\infty}xx^{-\alpha}\frac{1}{(\log x)^{2}}\,dx=\int_{x_{0}} ^{\infty}\frac{1}{x(\log x)^{2}}\,dx=\int_{\log x_{0}}^{\infty}\frac {1}{u^{2}}\,du<+\infty. $$

Therefore, we have infinite and finite variance, respectively, in the first and the second case. This means that for α=2, the slowly varying function plays an important role.

The following result is the appropriately modified Karamata theorem. It provides extremely useful estimates for truncated moments (see e.g. Resnick 2007, pp. 25, 36).

Lemma 4.18

Assume that X is a random variable such that (4.73) holds. Let \(\bar{F}_{|X|}(x)=P(|X|>x)\).

  • If α<η, then

$$E \bigl[ |X|^{\eta}1\bigl\{|X|\leq x\bigr\} \bigr] \sim\frac{\alpha}{\eta-\alpha }x^{\eta } \bar{F}_{|X|}(x). $$

Finally note that

$$ c_{n}=\inf\bigl\{x:P\bigl(|X|>x\bigr)\leq n^{-1}\bigr\} $$
(4.74)

will be the appropriate normalization sequence used to establish convergence of partial sums and point process convergence. In particular, this sequence can be chosen as c n =n 1/α L(n), where L is slowly varying at infinity. If L X (x)≡A (i.e. L X is constant), then c n =A 1/α n 1/α.
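For a pure Pareto-type tail P(|X|>x)=x −α (x≥1), definition (4.74) gives c m =m 1/α exactly. The sketch below (an added illustration; sample size, seed and the levels m are arbitrary, with m kept much smaller than the sample size so that the empirical quantiles are stable) compares empirical (1−1/m) quantiles with m 1/α:

```python
import numpy as np

rng = np.random.default_rng(11)
alpha, n = 1.5, 10**6

# Pareto-type tail: P(|X| > x) = x^(-alpha) for x >= 1, so (4.74) gives c_m = m^(1/alpha)
X = rng.pareto(alpha, size=n) + 1.0   # numpy's pareto is the Lomax form; shifting by 1 gives this tail

for m in (10, 100, 1000):
    c_emp = np.quantile(X, 1.0 - 1.0 / m)   # empirical (1 - 1/m) quantile, m << n for stability
    print(f"m = {m:5d}   empirical: {c_emp:8.2f}   m^(1/alpha): {m ** (1 / alpha):8.2f}")
```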

4.3.2.2 Stable Random Variables

Stable random variables can be considered as a special case of (4.73). There are several equivalent definitions of stable random variables.

Definition 4.2

A random variable X is stable if for any n≥2, there exist constants c n >0 and \(d_{n}\in\mathbb{R}\) such that

$$X_{1}+\cdots+X_{n}\overset{d}{=}c_{n}X+d_{n} , $$

where X 1,X 2,… are independent copies of X. Necessarily, c n =n 1/α, where α∈(0,2]. If d n =0, then X is called strictly stable.

Equivalently, stable random variables are characterized in terms of domains of attraction:

Definition 4.3

A random variable X is stable if there exists an i.i.d. sequence Y t (\(t\in\mathbb{N}\)) and constants c n >0, \(d_{n}\in\mathbb{R}\) such that

$$\frac{Y_{1}+\cdots+Y_{n}}{c_{n}}+d_{n}\overset{d}{\rightarrow}X. $$

The characteristic function of a stable random variable X is given by

$$E \bigl[ e^{i\theta X} \bigr] =\left\{ \begin{array}{l@{\quad}l} \exp ( -\eta^{\alpha}|\theta|^{\alpha} ( 1-i \beta\mathrm{sign}(\theta)\tan\frac{\pi\alpha}{2} ) +i\mu\theta ) & \text{if }\alpha\not=1, \\[6pt] \exp ( -\eta|\theta| ( 1+i\beta\frac{2}{\pi}\mathrm{sign}(\theta )\ln|\theta| ) +i\mu\theta ) & \text{if }\alpha=1. \end{array} \right. $$

Here, 0<α≤2, η>0 is the scale parameter, −1≤β≤1 is a skewness parameter, and \(\mu\in\mathbb{R}\) is a shift parameter. We write X∼S α (η,β,μ). In particular, X is symmetric α-stable (written as X∼SαS) if X∼S α (η,0,0). If β=1, then the random variable X is called totally skewed to the right. If α∈(1,2], then −∞<μ=E(X)<∞. In what follows, we will omit the case α=1 from our discussion.

If α∈(0,2), then stable random variables are heavy tailed in the sense of (4.73). Indeed, if X∼S α (η,β,μ), then

$$ \lim_{x\rightarrow\infty}x^{\alpha}P(X>x)=C_{\alpha}\frac{1+\beta}{2}\eta^{\alpha},\qquad\lim_{x\rightarrow\infty}x^{\alpha }P(X<-x)=C_{\alpha }\frac{1-\beta}{2}\eta^{\alpha}, $$
(4.75)

where

$$C_{\alpha}=\biggl( \int_{0}^{\infty}x^{-\alpha}\sin x\,dx\biggr)^{-1} =\frac{1-\alpha}{\varGamma(2-\alpha)\cos(\pi\alpha/2)}\quad(\alpha\not=1). $$

Therefore, (4.73) holds with L X (x)≡C α η α. If η=1, then the scaling constant c n defined in (4.74) is \(c_{n}=C_{\alpha }^{1/\alpha}n^{1/\alpha}\).

In what follows, we will use several properties of stable random variables. They can be obtained by considering the characteristic function. If \(X_{j}\overset{\mathrm{d}}{=}S_{\alpha}(\eta_{j},\beta_{j},\mu_{j})\) (j=1,2) are independent, then

$$ X_{1}+X_{2}\overset{\mathrm{d}}{=}S_{\alpha} \biggl( \bigl(\eta_{1}^{\alpha}+\eta_{2}^{\alpha} \bigr)^{1/\alpha},\frac{\beta_{1}\eta_{1}^{\alpha}+\beta_{2}\eta_{2}^{\alpha}}{\eta_{1}^{\alpha}+\eta_{2}^{\alpha}},\mu_{1}+\mu_{2} \biggr) $$
(4.76)

and

$$ cX_{1}\overset{\mathrm{d}}{=}S_{\alpha}\bigl(|c| \eta_{1},\mathrm {sign}(c)\beta_{1},c\mu_{1}\bigr). $$
(4.77)

Due to the scaling property, it is sufficient to consider S α (1,β,μ) random variables.

4.3.2.3 Stable Convergence

Stable random variables play a crucial role in the asymptotic theory for heavy-tailed random variables (with α∈(0,2); see Gnedenko and Kolmogorov 1968, Feller 1971). Assume that X t (\(t\in\mathbb{N}\)) is an i.i.d. sequence of S α (1,β,μ) random variables. Using (4.76) and (4.77), we have

$$n^{-1/\alpha}\sum_{t=1}^{n}X_{t} \overset{\mathrm{d}}{=}S_{\alpha} \biggl( 1,\beta,\frac{n\mu}{n^{1/\alpha}} \biggr) . $$

Thus, if α∈(0,1), then n/n 1/α→0 and

$$ n^{-1/\alpha}\sum_{t=1}^{n}X_{t}\overset{\mathrm{d}}{\rightarrow} S_{\alpha}(1,\beta,0). $$
(4.78)

If α∈(1,2), a centering is required:

$$n^{-1/\alpha}\sum_{t=1}^{n}(X_{t}- \mu_{n})\overset{\mathrm {d}}{=}S_{\alpha } \biggl( 1,\beta, \frac{n(\mu-\mu_{n})}{n^{1/\alpha}} \biggr) . $$

Thus, we may choose μ n =μ (recall from Definition 4.3 that for α∈(1,2), we have μ=E(X)) to obtain

$$ n^{-1/\alpha}\sum_{t=1}^{n}(X_{t}-\mu)\overset{\mathrm{d}}{\rightarrow }S_{\alpha}(1,\beta,0). $$
(4.79)

However, we may also choose μ n =E[X1{|X|<n 1/α}]. Then from the Karamata theorem, as n→∞,

$$\frac{n(\mu-\mu_{n})}{n^{1/\alpha}}=\frac{nE[X\cdot1\{|X|\geq n^{1/\alpha}\}]}{n^{1/\alpha}}\rightarrow \beta C_{\alpha}\frac{\alpha}{\alpha-1}. $$

Consequently,

$$n^{-1/\alpha}\sum_{t=1}^{n} \bigl(X_{t}- E\bigl[X\cdot1\bigl\{|X|<n^{1/\alpha}\bigr\}\bigr]\bigr) \overset{\mathrm{d}}{\rightarrow}S_{\alpha} \biggl( 1,\beta,\beta C_{\alpha } \frac{\alpha}{\alpha-1} \biggr) . $$

Of course, we can restate these results using \(c_{n}=C_{\alpha }^{1/\alpha }n^{1/\alpha}\) instead of n 1/α. The convergence results can be proven formally using the characteristic functions.

More generally, a classical result by Skorokhod (1957) states that if the i.i.d. random variables X t (\(t\in\mathbb{N}\)) fulfill (4.73) with L X (x)≡A, then

$$ n^{-1/\alpha}S_{n}(u):=n^{-1/\alpha}\sum_{t=1}^{[nu]}(X_{t}-\mu )\Rightarrow A^{1/\alpha}C_{\alpha}^{-1/\alpha}Z_{\alpha}(u), $$
(4.80)

where Z α (⋅) is an α-stable Lévy motion with \(Z_{\alpha}(u)\overset{\mathrm{d}}{=}u^{1/\alpha}S_{\alpha}(1,\beta,0)\), ⇒ denotes weak convergence in D[0,1] w.r.t. J 1 topology, and μ=E(X) if α∈(1,2) and μ=0 if α∈(0,1). We say then that random variables X t (\(t\in\mathbb{N}\)) are in the domain of attraction of the α-stable law. Of course, if the random variables X t are stable S α (1,β,0) and u=1, then (4.80) reduces to (4.79) since then A=C α .
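The stabilization described by (4.80) can be observed in a small simulation (added for illustration; it is not from the text). For symmetric Pareto-type variables with α=1.5 (so β=0 and μ=E(X)=0), the quantiles of n −1/α S n change little as n grows, in contrast to what a √n normalization would produce; the tail index, number of replications and quantile levels are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, reps = 1.5, 2000

def sym_pareto(size):
    # Symmetric regularly varying sample: |X| has tail P(|X| > x) = x^(-alpha), x >= 1,
    # multiplied by a random sign, so beta = 0, E(X) = 0 and the variance is infinite.
    return (rng.pareto(alpha, size=size) + 1.0) * rng.choice([-1.0, 1.0], size=size)

for n in (10**2, 10**3, 10**4):
    S = np.array([sym_pareto(n).sum() for _ in range(reps)]) / n ** (1 / alpha)
    q = np.round(np.quantile(S, [0.05, 0.5, 0.95]), 2)
    print(f"n = {n:6d}   5%/50%/95% quantiles of n^(-1/alpha) S_n: {q}")
```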

4.3.2.4 Point Processes

Point processes are a useful tool to study limit theorems for partial sums, sample covariances and some other functionals such as extremes. Here, we summarize (with some details) results on convergence of point processes. For a detailed exposition, the reader is referred to Resnick (2007) or Embrechts et al. (1997).

Let X t (\(t\in\mathbb{N}\)) be a stationary sequence, and c n a sequence of constants. Define the point process as

$$N_{n}=\sum_{t=1}^{n}\delta_{(t/n,c_{n}^{-1}X_{t})}. $$

Here, δ is a Dirac measure, which means that δ x (A)=1 if xA and 0 otherwise. A point process N n can be viewed as a random element defined on [0,1]×(−∞,∞), with values in \(\mathbb{N}\). In other words, this is a random element with values in M p (E), the set of all Radon point measures on \(E=\mathbb{R}^{2}\). In particular, if we choose a set U=[0,1]×(0,u), then \(N_{n}(U)=\tilde{N}_{n}(u)=\sum_{t=1}^{n}1\{0<c_{n}^{-1}X_{t}<u\}\) counts points \(c_{n}^{-1}X_{t}\) that lie between 0 and u. The process \(\tilde{N}_{n}(u)\) (\(u\in\mathbb{R}_{+}\)) is called a counting process and is depicted on Fig. 4.4.

Fig. 4.4 Counting process: \(X_{(1)}\leq X_{(2)}\leq X_{(3)}\) are the smallest observations in the sample X 1,…,X n

There are several ways to establish convergence of point processes. The first one is referred to as Kallenberg’s theorem (see Theorem 14.17 in Kallenberg 1997, or Theorem 5.2.2 in Embrechts et al. 1997).

Proposition 4.2

Let N n , \(n\in\mathbb{N}\), and N be point processes on \(\mathbb{R}^{d}\) such that N has no multiple points. Assume that

$$ E\bigl[N_{n}(U)\bigr]\rightarrow E\bigl[N(U)\bigr], $$
(4.81)
$$ P\bigl(N_{n}(U)=0\bigr)\rightarrow P\bigl(N(U)=0\bigr) $$
(4.82)

for \(U=\bigcup_{i=1}^{K}(k_{i},l_{i})\times(s_{i},t_{i})\), K≥1, 0≤k i <l i ≤1, and arbitrary relatively compact open intervals (s i ,t i ) of (−∞,0)∪(0,∞). Then N n converges weakly to N in \(M_{p}(\mathbb{R}^{d})\).

We illustrate this theorem by proving convergence of point processes based on i.i.d. sequences. The proof will be easily adapted to models with (long-range) dependence, such as stochastic volatility or subordinated Gaussian sequences. Define the measure λ on (−∞,∞)∖{0} by

$$ d\lambda(x)=\alpha \biggl[ \frac{1+\beta}{2}x^{-(\alpha+1)}1 \{ 0<x<\infty \} + \frac{1-\beta}{2}(-x)^{-(\alpha+1)} 1 \{ -\infty<x<0 \} \biggr]\, dx, $$
(4.83)

where β∈[−1,1]. We say that ds×(x) is an intensity measure of a Poisson process N on [0,1]×(−∞,∞) if for any A⊂[0,1], B⊂(−∞,∞), we have

$$E\bigl[N(A\times B)\bigr]=\int_{B}\int_{A}d \lambda(x)\,ds. $$

In particular, we note that \(E[N([0,1]\times\{x:|x|>\delta\})]<\infty\) for every δ>0.

Theorem 4.13

Let X t (\(t\in\mathbb{N}\)) be a sequence of i.i.d. random variables such that (4.73) holds. Let

$$P\bigl(|X_{1}|>c_{n}\bigr)\sim n^{-1}. $$

Then N n converges weakly in \(M_{p}([0,1]\times\mathbb{R})\) to a Poisson process N on [0,1]×((−∞,∞)∖{0}) with intensity measure ds×(x).

Before we prove this result, let us state some of its consequences. First, the result can be restated as

$$\sum_{t=1}^{n}\delta_{c_{n}^{-1}X_{t}}\Rightarrow\sum_{l=0}^{\infty}\delta_{j_{l}}, $$

where ⇒ denotes weak convergence in \(M_{p}(\mathbb{R})\), and j l are points of a Poisson process with intensity measure (x). If α∈(0,1), then the continuous mapping theorem yields that

$$c_{n}^{-1}\sum_{j=1}^{n}X_{t}\overset{\mathrm{d}}{\rightarrow}\sum _{l=0}^{\infty}j_{l}. $$

If we assume for a moment that X t (\(t\in\mathbb{N}\)) fulfill (4.73) with L X A, then the scaling constants defined in (4.74) become c n =n 1/α A 1/α, and so

$$n^{-1/\alpha}\sum_{t=1}^{n}X_{t}\overset{\mathrm{d}}{\rightarrow}A^{1/\alpha} \sum_{l=0}^{\infty}j_{l}. $$

For the α-stable random variables X t , we have A=C α . Comparing this expression with (4.78) and using the scaling property (4.77), we conclude that \(\sum_{l=0}^{\infty}j_{l}\) is a series representation of \(S_{\alpha}(C_{\alpha}^{-1/\alpha},\beta,0)\). However, this consideration is not valid for the case where α∈(1,2).

Analogously,

$$\sum_{t=1}^{n}\delta_{c_{n}^{-2}X_{t}^{2}}\Rightarrow\sum_{l=0}^{\infty} \delta_{j_{l}^{2}}, $$

and for α∈(0,2),

$$c_{n}^{-2}\sum_{t=1}^{n}X_{t}^{2} \overset{\mathrm{d}}{\rightarrow}\sum_{l=0}^{\infty}j_{l}^{2}=S_{\alpha/2} \bigl( C_{\alpha/2}^{-2/\alpha },1,0 \bigr) , $$

or

$$n^{-2/\alpha}\sum_{t=1}^{n}X_{t}^{2} \overset{\mathrm{d}}{\rightarrow }A^{2/\alpha}S_{\alpha/2} \bigl( C_{\alpha/2}^{-2/\alpha},1,0 \bigr) . $$

We note that for \(X_{t}^{2}\), the skewness parameter is β=1. Then the stable random variable is called totally skewed to the right. This means that the heavy-tailed property (4.75) of the limiting stable distribution is related to the heavy-tailed behaviour of

$$P\bigl(X^{2}>x\bigr)=P(X>\sqrt{x})+P(X<-\sqrt{x})\sim Ax^{-\alpha/2}, $$

which is valid for positive values of x only. In contrast, when considering X t , the heavy-tailed behaviour of the limiting random variable \(S_{\alpha}(C_{\alpha}^{-1/\alpha},\beta,0)\) is attributed to the heavy-tailed behaviour of both P(X>x) and P(X<−x) as x→∞.

Proof of Theorem 4.13

We verify (4.81). It is enough to consider \(U=\bigcup_{i=1}^{K}(k_{i},l_{i})\times(s_{i},t_{i})\) for K=1. We have

$$E\bigl[N_{n}(U)\bigr]=\sum_{nk_{1}<t<nl_{1}}P\bigl(c_{n}^{-1}X_{t}\in(s_{1},t_{1})\bigr)\sim(l_{1}-k_{1})\,nP\bigl(c_{n}^{-1}X_{1}\in(s_{1},t_{1})\bigr)\rightarrow(l_{1}-k_{1})\lambda\bigl((s_{1},t_{1})\bigr)=E\bigl[N(U)\bigr], $$

where we recall that \(\lambda((s_{i},t_{i}))=\int _{s_{i}}^{t_{i}}d\lambda(x)\), and the measure λ(⋅) is given by (4.83). To prove (4.82), write

$$P\bigl(N_{n}(U)=0\bigr)=\prod_{i=1}^{K}\prod_{nk_{i}<t<nl_{i}} \bigl( 1-P\bigl(c_{n}^{-1}X_{1}\in(s_{i},t_{i})\bigr) \bigr) . $$
Let

$$Q_{n}=\prod_{i=1}^{K}\prod_{nk_{i}<t<nl_{i}}e^{-n^{-1}\lambda ((s_{i},t_{i}))}$$

and note that

as n→∞. Recall the two elementary inequalities

$$\Bigg|\prod_{i=1}^{K}s_{i}-\prod_{i=1}^{K}t_{i}\Bigg|\leq\sum_{i=1}^{K}|s_{i}-t_{i}| \quad(s_{i},t_{i}\in[0,1])\quad\text{and}\quad \big|1-e^{-x}-x\big|\leq x^{1+\varepsilon}$$

for any ε>0. Then we obtain

for some ε>0. □

Another result, due to Davis and Resnick (1988, Proposition 2.1), is useful when studying processes that can be approximated by sequences with finite memory. Their result is stated in fact in a much more general setting, which is omitted here.

We say that a sequence ν n of measures converges vaguely to ν (\(\nu_{n}\overset{v}{\rightarrow}\nu\)) if for all continuous functions \(g:E\rightarrow[0,\infty)\) with compact support (written as \(g\in C^{+}(E)\)), we have

$$\int g(x)\nu_{n}(dx)\rightarrow\int g(x)\nu(dx). $$

We refer to Appendix A for additional precise notions related to vague convergence.
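As a simple illustration of vague convergence (a standard textbook example, not tied to any particular model above), let X be Pareto with P(X>x)=x −α for x≥1, and set c n =n 1/α. Then, for every y>0 and all n≥y −α,

$$nP\bigl(c_{n}^{-1}X>y\bigr)=n\bigl(n^{1/\alpha}y\bigr)^{-\alpha}=y^{-\alpha}=\lambda\bigl((y,\infty)\bigr), $$

which is exactly the type of convergence \(nP(c_{n}^{-1}X\in\cdot)\overset{v}{\rightarrow}\lambda(\cdot)\) required in Proposition 4.3 below (here λ is (4.83) with β=1).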

Proposition 4.3

Assume that X t (\(t\in\mathbb{N}\)) is a stationary K-dependent sequence with values in \(\mathbb{R}^{d}\) and c n →∞ is a sequence of constants such that for the marginal distribution, we have

$$nP\bigl(c_{n}^{-1}X\in\cdot\bigr)\overset{v}{\rightarrow} \lambda(\cdot). $$

Furthermore, assume that for any \(g\in C^{+}(\mathbb{R}^{d})\),

$$\lim_{k\rightarrow\infty}\limsup_{n\rightarrow\infty}n\sum_{t=2}^{[n/k]}E \bigl[g\bigl(c_{n}^{-1}X_{1}\bigr)g \bigl(c_{n}^{-1}X_{t}\bigr)\bigr]=0. $$

Then

$$N_{n}=\sum_{t=1}^{n}\delta_{(t/n,c_{n}^{-1}X_{t})}$$

converges weakly in \(M_{p}([0,1]\times\mathbb{R})\) to a Poisson process N on [0,1]×(−∞,∞) with intensity measure ds×dλ(x).

This result is applicable to sequences X t with regularly varying tails as in (4.73). In fact (see Theorem 3.6 in Resnick 2007), the vague convergence of \(nP(c_{n}^{-1}X\in\cdot)\) is equivalent to regular variation of the distribution of X.

4.3.3 Sums of Linear and Subordinated Linear Processes

In this section we discuss limit theorems for partial sums of linear processes

$$X_{t}=\sum_{j=0}^{\infty}a_{j}\varepsilon_{t-j}, $$

where a j c a j d−1, d∈(0,1/2), and ε t (\(t\in\mathbb{Z}\)) are i.i.d. random variables such that

$$ P(\varepsilon_{1}>x)\sim A\frac{1+\beta}{2}{x^{-\alpha}},\qquad P(\varepsilon_{1}<-x)\sim A\frac{1-\beta}{2}x^{-\alpha}. $$
(4.84)

In both the coefficients a j and the tail P(ε 1>x), we assume for simplicity that possible slowly varying functions are constant. If α∈(1,2), we assume also that E(ε 1)=0.

The infinite series above converges if \(\sum_{j=0}^{\infty}|a_{j}|^{\delta}<\infty\) for some δ<α (see e.g. Avram and Taqqu 1992). In our case this is possible if and only if α(d−1)<−1 and hence d<1−1/α. Thus, if α∈(0,1), then the existence condition implies that \(\sum_{j=0}^{\infty}|a_{j}|<\infty\). Consequently, for α∈(0,1), long memory (in the sense of non-summability of the coefficients) is excluded.
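To spell out the arithmetic behind this condition (with a j ∼c a j d−1 and slowly varying factors suppressed):

$$\sum_{j=1}^{\infty}|a_{j}|^{\delta}\asymp\sum_{j=1}^{\infty}j^{\delta(d-1)}<\infty\iff\delta(1-d)>1\iff\delta>\frac{1}{1-d}, $$

and a δ with 1/(1−d)<δ<α exists if and only if α(1−d)>1, i.e. d<1−1/α.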

Linear processes are the simplest models for describing the interplay between dependence and heavy tails. The asymptotic theory for partial sums is well developed and includes approaches such as convergence of stochastic integrals (Astrauskas 1983, Kasahara and Maejima 1986, 1988) or K-dependent approximations, together with the point process methodology (Davis and Resnick 1985, Davis and Hsing 1995). Interesting results on functional convergence are given in Avram and Taqqu (1992), among others.

4.3.3.1 Tail Behaviour

First, we analyse the tail behaviour of linear processes. We note that if ε t (\(t\in\mathbb{Z}\)) are S α (1,0,0), so that (4.84) holds with β=0 and A=C α , then

$$X_{1}\overset{\mathrm{d}}{=} \Biggl( \sum _{j=0}^{\infty}|a_{j}|^{\alpha } \Biggr)^{1/\alpha}S_{\alpha}(1,0,0)=:D_{\alpha}^{1/\alpha}S_{\alpha }(1,0,0) \overset{\mathrm{d}}{=}D_{\alpha}^{1/\alpha}\varepsilon_{1}, $$

which follows directly from properties (4.76) and (4.77). Therefore, we may conclude that, as x→∞,

$$P\bigl(|X_{1}|>x\bigr)\sim P\bigl(D_{\alpha}^{1/\alpha}| \varepsilon_{1}|>x\bigr)\sim D_{\alpha }C_{\alpha}x^{-\alpha} \sim D_{\alpha}P\bigl(|\varepsilon_{1}|>x\bigr). $$

This property is valid in fact under the general assumption (4.84).

Lemma 4.19

Assume that X t (\(t\in\mathbb {N}\)) is a linear process, ε t (\(t\in\mathbb {Z}\)) are i.i.d. random variables such that (4.84) holds, and E(ε 1)=0 if α∈(1,2).

  • If for some δ<α,

    $$ \sum_{j=0}^{\infty}|a_{j}|+\sum_{j=0}^{\infty}|a_{j}|^{\delta}<\infty, $$
    (4.85)

    then

    $$ \lim_{x\rightarrow\infty}\frac{P( |X_{1}|>x) }{P(|\varepsilon _{1}|>x)}=\sum_{j=0}^{\infty}|a_{j}|^{\alpha}. $$
    (4.86)
  • If a j c a j d−1, d∈(0,1−1/α), and ε t (\(t\in\mathbb{Z}\)) are symmetric with α∈(1,2), then (4.86) holds.

Note that in the second part of the lemma, the coefficients a j are not absolutely summable; however, ∑|a j |α is finite. This turns out to be sufficient. The first part was proven in Cline (1983); see also Davis and Resnick (1985). The second part was proven (under additional assumptions, including symmetry of the innovations) in Kokoszka and Taqqu (1996).

4.3.3.2 Point Process Convergence

In what follows we show that, under the conditions of Lemma 4.19, a point process based on X t (\(t\in\mathbb{N}\)) converges. Its behaviour is the same under short memory (4.85) and under long memory.

Theorem 4.14

Under the assumptions of Lemma 4.19, we have

$$\sum_{t=1}^{n}\delta_{c_{n}^{-1}(X_{t},\ldots,X_{t-K})}\Rightarrow\sum _{l=1}^{\infty}\sum_{r=0}^{\infty}\delta_{j_{l}(a_{r},a_{r-1},\ldots ,a_{r-K})} $$

in \(M_{p}(\mathbb{R}^{K+1})\), where c n is such that P(|ε 1|>c n )∼n −1, i.e. c n A 1/α n 1/α.

Proof

We give the proof for K=0 only. For details, we refer to Davis and Resnick (1985, Theorem 2.4). We note that the authors prove the results under condition (4.85). However, a crucial part of the proof relies on (4.86) only, which due to Lemma 4.19 is valid under more general conditions on a j . We restate Theorem 4.13 in terms of i.i.d. random variables ε t (\(t\in\mathbb{Z}\)),

$$\sum_{t=1}^{n}\delta_{c_{n}^{-1}\varepsilon_{t}}\Rightarrow \sum_{l=1}^{\infty}\delta_{j_{l}} $$

where c n A 1/α n 1/α. Moreover (see Theorem 2.2. in Davis and Resnick 1985), this convergence can be extended to

$$ \sum_{t=1}^{n}\delta_{c_{n}^{-1}(\varepsilon_{t},\ldots,\varepsilon _{t-K})}\Rightarrow\sum_{l=1}^{\infty}\sum_{r=0}^{K}\delta_{j_{l}\mathbf{e}_{r}}, $$
(4.87)

where e r is a unit vector in \(\mathbb{R}^{K+1}\) with the rth coordinate equal to one. In other words, the limiting process has the following structure. It is a Poisson process with values in \(\{0,\ldots ,K\}\times\mathbb{R}\) such that it is a univariate Poisson process on the horizontal line \(\{0\}\times\mathbb{R}\) and its points are repeated on the other horizontal lines. Since the mapping \((z_{t},\ldots,z_{t-K})\rightarrow\sum_{r=0}^{K}a_{r}z_{t-r}\) from \(M_{p}(\mathbb{R}^{K+1})\) to \(M_{p}(\mathbb{R}\setminus\{0\})\) is continuous, (4.87) implies

$$\sum_{t=1}^{n}\delta_{c_{n}^{-1}X_{t,K}}\Rightarrow\sum_{l=1}^{\infty}\sum_{r=0}^{K}\delta_{j_{l}a_{r}}, $$

where \(X_{t,K}=\sum_{r=0}^{K}a_{r}\varepsilon_{t-r}\). Letting K→∞, we obtain

$$\sum_{l=1}^{\infty}\sum_{r=0}^{K}\delta_{j_{l}a_{r}}\overset{\mathrm{p}}{\rightarrow}\sum_{l=1}^{\infty}\sum_{r=0}^{\infty}\delta_{j_{l}a_{r}}. $$

Therefore, to apply Proposition 4.1, we need to verify that the sequence X t can be approximated by the K-dependent sequence X t,K , in the sense that for each γ>0,

$$\lim_{K\rightarrow\infty}\limsup_{n\rightarrow\infty}P \Bigl( c_{n}^{-1}\sup_{1\leq t\leq n}|X_{t}-X_{t,K}|>\gamma \Bigr) =0. $$

The latter probability is bounded by \(nP( c_{n}^{-1}|X_{0}-X_{0,K}|>\gamma) \). Since P(|ε 1|>c n )∼n −1, applying (4.86), we have, as n→∞,

$$nP \bigl( c_{n}^{-1}|X_{0}-X_{0,K}|>\gamma \bigr) \sim\frac{P ( |X_{0}-X_{0,K}|>c_{n}\gamma ) }{P(|\varepsilon_{1}|>c_{n})}\rightarrow\gamma^{-\alpha}\sum _{r=K+1}^{\infty}|a_{r}|^{\alpha}. $$

The last expression converges to zero as K→∞. □

4.3.3.3 Convergence of Partial Sums

Recall our comments following Theorem 4.13. If the innovations ε t have tail index α∈(0,1), then we may conclude directly from Theorem 4.14 that

$$c_{n}^{-1}\sum_{t=1}^{n}X_{t} \overset{\mathrm{d}}{\rightarrow} \Biggl( \sum_{j=0}^{\infty}a_{j} \Biggr) \sum_{l=1}^{\infty}j_{l} \overset{\mathrm {d}}{=} \Biggl( \sum_{j=0}^{\infty}a_{j} \Biggr) S_{\alpha}\bigl(C_{\alpha }^{-1/\alpha },\beta,0\bigr), $$

where j l are points of a Poisson process, and \(\sum _{l=1}^{\infty}j_{l}\) is a series representation of \(S_{\alpha}(C_{\alpha}^{-1/\alpha},\beta,0)\). Equivalently,

$$n^{-1/\alpha}\sum_{t=1}^{n}X_{t} \overset{\mathrm{d}}{\rightarrow }A^{1/\alpha } \Biggl( \sum _{j=0}^{\infty}a_{j} \Biggr) S_{\alpha} \bigl( C_{\alpha }^{-1/\alpha},\beta,0 \bigr) \overset{\mathrm{d}}{=}A^{1/\alpha }C_{\alpha }^{-1/\alpha} \Biggl( \sum _{j=0}^{\infty}a_{j} \Biggr) S_{\alpha}(1, \beta,0). $$

The situation is more complicated for α∈(1,2). Convergence of partial sums does not follow directly from point process convergence (however, as in Davis and Resnick 1985, an implication of point process convergence may serve as an intermediate tool—this will be illustrated for stochastic volatility models in the following section). In particular, for a long-memory sequence, the scaling for partial sums \(\sum_{t=1}^{n}X_{t}\) of linear processes may differ from c n .

Theorem 4.15

Assume that X t (\(t\in\mathbb{Z}\)) is a linear process such that a j ∼c a j d−1, d∈(0,1/2), and ε t (\(t\in\mathbb{Z}\)) are i.i.d. random variables such that (4.84) holds with α∈(1,2) and E(ε 1)=0.

  • If for some δ<α,

    $$ \sum_{j=0}^{\infty}|a_{j}|+\sum_{j=0}^{\infty}|a_{j}|^{\delta}<\infty, $$
    (4.88)

    then

    $$n^{-1/\alpha}S_{n}(u)=n^{-1/\alpha}\sum _{t=1}^{[nu]}X_{t}\overset {\mathrm{f.d.}}{ \rightarrow}A^{1/\alpha}C_{\alpha}^{-1/\alpha} \Biggl( \sum _{j=0}^{\infty}a_{j} \Biggr) Z_{\alpha}(u), $$

    where Z α (⋅) is an α-stable Lévy motion (with independent increments) such that \(Z_{\alpha}(1)\overset{\mathrm{d}}{=}S_{\alpha}(1,\beta,0)\), and \(\overset{\mathrm{f.d.}}{\rightarrow}\) denotes finite-dimensional convergence.

  • If 0<d<1−1/α, then

    $$n^{-H}S_{n}(u) =n^{-H}\sum_{t=1}^{[nu]}X_{t} \Rightarrow A^{1/\alpha}C_{\alpha}^{-1/\alpha}\frac{c_{a}}{d}\tilde{Z}_{H,\alpha}(u), $$

where H=d+1/α, \(\tilde{Z}_{H,\alpha}(\cdot)\) is a linear fractional stable motion, and ⇒ denotes weak convergence in D[0,1] w.r.t. the Skorokhod J 1-topology.

Before we present a proof, we make several comments.

Remark 4.2

If condition (4.88) holds, then the scaling factor and the limiting process are (up to a constant) the same as for i.i.d. random variables; see (4.80). The limiting Lévy process has independent increments and discontinuous sample paths. Thus, in this case the particular structure of the coefficients a j is not really important. On the other hand, if d∈(0,1−1/α), then the scaling factor involves the memory parameter d. This is one reason why such a process is said to have long-range dependence. Also, the limiting process has dependent increments but continuous sample paths. We illustrate this in Example 4.15. Note also that the theorem can be stated more generally by allowing slowly varying functions in both a j and the tail of ε 1.

Remark 4.3

It should be pointed out that in the long-memory case (d∈(0,1−1/α)) we have weak convergence w.r.t. the standard J 1-topology and the limiting process has continuous paths. In contrast, in the case of summable coefficients we have finite-dimensional convergence only, and this cannot be extended to J 1-convergence. This can be seen as follows. Assume for a moment that X t =b 0 ε t +b 1 ε t−1 (\(t\in\mathbb{N}\)). The limiting behaviour of \(S_{n}=\sum_{t=1}^{n}X_{t}\) is determined by large values of X t (\(t\in\mathbb{N}\)). Now, there is only a small chance that both ε t and ε t+1 are large, since P(ε t >x,ε t+1>x)=o(P(ε 1>x)) as x→∞. Therefore, we have one large value of a particular \(\varepsilon_{t^{\ast}}\), say, which implies \(X_{t^{\ast}}\approx b_{0}\varepsilon_{t^{\ast}}\) and \(X_{t^{\ast}+1}\approx b_{1}\varepsilon_{t^{\ast}}\). This produces two “clustered” large jumps in the limiting process, which contradicts the heuristic explanation of the J 1-topology given in Appendix A. However, it is possible to have weak convergence w.r.t. different topologies. We refer to Avram and Taqqu (1992).

Proof

In the case of weak dependence (i.e. where (4.88) holds), the proof mimics the one for normal convergence (see Theorem 4.5). Let \(X_{t,K}=\sum _{j=0}^{K}a_{j}\varepsilon_{t-j}\). Note that (4.80) can be restated for u=1 as

$$n^{-1/\alpha} \Biggl( \sum_{t=1}^{n} \varepsilon_{t},\ldots,\sum_{t=1}^{n} \varepsilon_{t-m} \Biggr) \overset{\mathrm{d}}{\rightarrow }A^{1/\alpha }C_{\alpha}^{-1/\alpha}\bigl(Z_{\alpha}(1), \ldots,Z_{\alpha}(1)\bigr). $$

The continuous mapping theorem implies

$$n^{-1/\alpha}\sum_{t=1}^{n}X_{t,K} \overset{\mathrm{d}}{\rightarrow}A^{1/\alpha}C_{\alpha}^{-1/\alpha} \Biggl( \sum_{j=0}^{K}a_{j} \Biggr) Z_{\alpha}(1). $$

Furthermore, \(( \sum_{j=0}^{K}a_{j}) Z_{\alpha }(1)\overset{\mathrm{p}}{\rightarrow}( \sum_{j=0}^{\infty }a_{j}) Z_{\alpha}(1)\). We finish the proof by verifying

$$\lim_{K\rightarrow\infty}\limsup_{n\rightarrow\infty}P\bigl(n^{-1/\alpha}\big|S_{n}(1)-S_{n,K}(1)\big|>\gamma\bigr)=0 $$

for each γ>0. This requires precise calculations on the tail behaviour of X t . In particular, (4.86) plays a crucial role. We refer to Davis and Resnick (1985) for details. The result then follows from Proposition 4.1.

As for the long-memory case, we assume for simplicity that ε t (\(t\in\mathbb{Z}\)) are S α (1,β,0). We may write

$$S_{n}=\sum_{t=1}^{n}X_{t}=\sum_{l=-\infty}^{n}\varepsilon_{l}\sum _{j=1-l}^{n-l}a_{j}=:\sum_{l=-\infty}^{n}\tilde{a}_{l,n}\varepsilon_{l}$$

with \(\tilde{a}_{l,n}=\sum_{j=1-l}^{n-l}a_{j}\). If a j c a j d−1, then

$$\tilde{a}_{l,n}\sim\frac{c_{a}}{d} \bigl\{ (n-l)^{d}-(1-l)^{d} \bigr\} . $$

Therefore, since S n is a sum of independent stable random variables, on account of (4.76), we expect that

$$\sum_{l=-\infty}^{n}\tilde{a}_{l,n}\varepsilon_{l}\overset{\mathrm{d}}{=}S_{\alpha}(\eta_{n},\beta,0) $$

with the scale parameter such that

$$\eta_{n}^{\alpha}=\sum_{l=-\infty}^{n}|\tilde{a}_{l,n}|^{\alpha}\sim \biggl( \frac{c_{a}}{d} \biggr)^{\alpha}n^{\alpha d+1}\int_{-\infty}^{1} \bigl\{ (1-v)_{+}^{d}-(-v)_{+}^{d} \bigr\}^{\alpha}\,dv. $$

Here, note that the integral above is defined only if 0<d<1−1/α. Therefore, with b n =(c a /d)n H (recall that now C α =A since we consider stable innovations), the distribution of \(b_{n}^{-1}S_{n}(1)\) agrees asymptotically with the distribution of a stable random variable with the scale

$$\eta= \biggl( \int_{-\infty}^{1} \bigl\{ (1-v)_{+}^{d}-(-v)_{+}^{d} \bigr\}^{\alpha}\,dv \biggr)^{1/\alpha} $$

and skewness β. Now, if we have a stable integral ∫g(x) dM(x), then it is a stable random variable with the scale (∫|g(x)|αdx)1/α. Thus, for each u, the Linear Fractional Stable Motion \(\tilde{Z}_{H,\alpha}(\cdot)\) (see Sect. 3.7.2 for additional details)

$$\int_{-\infty}^{u} \bigl\{ (u-v)_{+}^{H-1/\alpha}-(-v)_{+}^{H-1/\alpha} \bigr\} \,dZ_{\alpha}(v) $$

is a stable random variable with the scale

$$\biggl( \int_{-\infty}^{u} \bigl\{ (u-v)_{+}^{H-1/\alpha}-(-v)_{+}^{H-1/\alpha} \bigr\}^{\alpha}\,dv \biggr)^{1/\alpha}=u^{H} \biggl( \int_{-\infty}^{1} \bigl\{ (1-v)_{+}^{H-1/\alpha}-(-v)_{+}^{H-1/\alpha} \bigr\}^{\alpha}\,dv \biggr)^{1/\alpha}=u^{H}\eta. $$

Consequently, the result follows for u=1. In this argument we replaced the coefficients \(\tilde{a}_{l,n}\) by the asymptotically equivalent expressions. This approximation can be made more precise by computing the characteristic function. □

Example 4.15

We illustrate Theorem 4.15 by a simulation study. First, as in Example 4.2, we generate n=1000 i.i.d. standard normal random variables X t and plot the partial sum sequence \(S_{k}=\sum_{t=1}^{k}X_{t}\), k=1,…,n. This procedure is repeated for a Gaussian FARIMA(0,d,0) process with d=0.4. The path of the fractional Brownian motion is much smoother than that of the Brownian motion. This is due to the influence of long memory. The corresponding partial sum processes are plotted in Fig. 4.5. For comparison, we simulate i.i.d. random variables from a t-distribution with 3/2 degrees of freedom (hence, with a finite mean and infinite variance) and a FARIMA(0,0.4,0) process whose innovations have a t-distribution with 3/2 degrees of freedom. The partial sum processes are depicted in Fig. 4.6. In the i.i.d. case, the process clearly has discontinuous sample paths, whereas this effect does not seem to be present in the long-memory case.

Fig. 4.5

Paths of a partial sum sequence \(S_{k}=\sum _{t=1}^{k}X_{t}\) with X t i.i.d. N(0,1) (left) and X t generated by a FARIMA(0,0.4,0) process

Fig. 4.6

Paths of a Lévy stable motion and a fractional stable motion with Hurst parameter H=d+1/α, d=0.4, α=3/2
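The following minimal Python sketch indicates how paths such as those in Figs. 4.5 and 4.6 could be generated; it is only an illustration, not the code used for the figures. It assumes the FARIMA(0,d,0) coefficients a j =Γ(j+d)/(Γ(j+1)Γ(d)), truncates the MA(∞) representation at an arbitrary lag J, and uses a t-distribution with 3/2 degrees of freedom for the heavy-tailed innovations.

```python
import numpy as np
from scipy.special import gammaln

# Sketch: partial sums S_k = X_1 + ... + X_k for
#  (i)  X_t i.i.d. t(3/2)  (heavy tails, alpha = 3/2, no memory),
#  (ii) X_t a FARIMA(0, d, 0) process with the same innovations,
#       X_t = sum_{j=0}^{J} a_j eps_{t-j},  a_j = Gamma(j+d)/(Gamma(j+1)Gamma(d)).

rng = np.random.default_rng(1)
n, d, J = 1000, 0.4, 5000                  # sample size, memory parameter, truncation lag (arbitrary)

j = np.arange(J + 1)
a = np.exp(gammaln(j + d) - gammaln(j + 1) - gammaln(d))   # a_j ~ j^{d-1}/Gamma(d)

eps = rng.standard_t(df=1.5, size=n + J)   # innovations with tail index alpha = 3/2

X_iid = eps[J:]                            # i.i.d. heavy-tailed case
X_farima = np.convolve(eps, a)[J:J + n]    # truncated MA(infinity) filter

S_iid = np.cumsum(X_iid)                   # partial sum paths as in Fig. 4.6
S_farima = np.cumsum(X_farima)
```

Plotting S_iid and S_farima against k reproduces the qualitative contrast described in Example 4.15: visible jumps in the i.i.d. heavy-tailed path and a much smoother path under long memory.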

4.3.3.4 Subordinated Case

Consider the partial sum

$$S_{n,G}(u)=\sum_{t=1}^{[nu]} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\} \quad\bigl(u \in[0,1]\bigr), $$

where G is a measurable function. Subordinated linear processes with infinite second moments were studied in Hsing (1999), Koul and Surgailis (2001), Surgailis (2002, 2004), Vaičiulis (2003). Surprisingly, the martingale decomposition method, used in Theorem 4.9 for variables with finite variance, also works here.

We start with the simple case of polynomials. Let us focus on a quadratic function G(x)=x 2. If α∈(0,2), then we can repeat the argument following point process convergence in Theorem 4.14. First (see the discussion following Theorem 4.13), we can also write

$$\sum_{t=1}^{n}\delta_{c_{n}^{-2}X_{t}^{2}}\Rightarrow\sum_{j=0}^{\infty} \sum_{l=0}^{\infty}\delta_{j_{l}^{2}a_{j}^{2}}. $$

This is valid as long as the conditions of Lemma 4.19 hold. Now, if α∈(0,2), the random variables \(X_{t}^{2}\) (\(t\in\mathbb{N}\)) have infinite means. Therefore, for α∈(0,2),

$$c_{n}^{-2}\sum_{t=1}^{n}X_{t}^{2} \overset{\mathrm{d}}{\rightarrow} \Biggl( \sum_{j=0}^{\infty}a_{j}^{2} \Biggr) \sum_{l=0}^{\infty}j_{l}^{2}= \Biggl( \sum_{j=0}^{\infty}a_{j}^{2} \Biggr) S_{\alpha/2} \bigl( C_{\alpha /2}^{-2/\alpha},1,0 \bigr) , $$

or equivalently,

$$n^{-2/\alpha}\sum_{t=1}^{n}X_{t}^{2} \overset{\mathrm{d}}{\rightarrow } \Biggl( \sum_{j=0}^{\infty}a_{j}^{2} \Biggr) A^{2/\alpha}C_{\alpha/2}^{-2/\alpha }S_{\alpha/2} ( 1,1,0 ) . $$

The case α∈(0,1) was proven in Davis and Resnick (1985, Theorem 4.2), whereas the case α∈(1,2) is addressed in Kokoszka and Taqqu (1996, Theorem 2.1). In other words, long memory does not influence the limiting behaviour.

Now, the situation changes when 2<α<4. The partial sum

$$S_{n,G}(u)=\sum_{t=1}^{[nu]} \bigl(X_{t}^{2}-E\bigl(X_{1}^{2}\bigr) \bigr) $$

can be decomposed as (cf. Example 4.9)

$$S_{n,G,1}(u)+S_{n,G,2}(u):=\sum_{t=1}^{[nu]} \sum_{j=0}^{\infty}a_{j}^{2} \bigl(\varepsilon_{t-j}^{2}-E\bigl(\varepsilon_{1}^{2} \bigr)\bigr)+\sum_{t=1}^{[nu]}\sum _{{j,k=0;\ j\not=k}}^{\infty}a_{j}a_{k} \varepsilon_{t-j}\varepsilon_{t-k}. $$

The first part S n,G,1(u) is a partial sum process based on the linear process with summable coefficients \(a_{j}^{2}\). Therefore, on account of the first part of Theorem 4.15,

$$n^{-2/\alpha}S_{n,G,1}(u)\overset{\mathrm{f.d.}}{\rightarrow}A^{2/\alpha }C_{\alpha/2}^{-2/\alpha} \Biggl( \sum_{j=0}^{\infty}a_{j}^{2} \Biggr) Z_{\alpha/2}(u), $$

where Z α/2(⋅) is a Lévy process such that \(Z_{\alpha/2}(1)\overset{\mathrm{d}}{=}S_{\alpha/2}(1,1,0)\), i.e. Z α/2(1) is an α/2-stable random variable that is completely skewed to the right.

Convergence of the second term follows exactly as in Example 4.9. First, since 2<α<4, the random variables ε t have finite variance, and under the assumption a j ∼c a j d−1 we have γ X (k)=cov(X t ,X t+k )∼L γ (k)k 2d−1 with

$$L_{\gamma}(k)=c_{a}^{2}\sigma_{\varepsilon}^{2} \int_{0}^{\infty}v^{d-1} ( v+1 )^{d-1}\,dv, $$

see Lemma 4.13. If 1/4<d<1/2, then

$$n^{-2d}L_{2}^{-1/2}(n)S_{n,G,2}(u)\Rightarrow Z_{2,H}(u), $$

where H=d+1/2, Z 2,H (u) is the Hermite–Rosenblatt process, and

$$L_{2}(n)=m!C_{m}L_{\gamma}^{m}(n)\quad\text{with } m=2. $$

Otherwise, if 0<d<1/4, then n −1/2 S n,G,2(u)=O P (1). Therefore, we have a dichotomous behaviour depending on the relation between the “memory parameter” d and the tails. Such considerations can be carried out, for instance, for Appell polynomials (see Vaičiulis 2003). Before we state our theorem, we recall for convenience the heavy-tail condition (4.84):

$$ P(\varepsilon_{1}>x)\sim A\frac{1+\beta}{2}{x^{-\alpha}},\qquad P(\varepsilon _{1}<-x)\sim A\frac{1-\beta}{2}x^{-\alpha}. $$
(4.89)

Theorem 4.16

Assume that X t (\(t\in\mathbb{Z}\)) is a linear process such that a j c a j d−1, d∈(0,1/2) and ε t (\(t\in\mathbb{Z}\)) are i.i.d. random variables such that (4.89) holds with α∈(2,4). Also, assume that E(ε 1)=0.

  • If 0<d<1/α, then

    $$n^{-2/\alpha}\sum_{t=1}^{[nu]} \bigl(X_{t}^{2}-E\bigl(X_{1}^{2}\bigr) \bigr)\overset{\mathrm {f.d.}}{\rightarrow}A^{2/\alpha}C_{\alpha/2}^{-2/\alpha} \Biggl( \sum_{j=0}^{\infty }a_{j}^{2} \Biggr) Z_{\alpha/2}(u), $$

    where Z α/2(⋅) is an α/2-stable Lévy motion such that \(Z_{\alpha/2}(1)\overset{\mathrm{d}}{=}S_{\alpha/2}(1,1,0)\).

  • If 1/α<d<1/2, then

    $$n^{-2d}L_{2}^{-1/2}(n)\sum _{t=1}^{[nu]}\bigl(X_{t}^{2}-E \bigl(X_{1}^{2}\bigr)\bigr)\Rightarrow Z_{2,H}(u), $$

    wheredenotes weak convergence in D[0,1], Z 2,H (⋅) is the Hermite–Rosenblatt process, and H=d+1/2.

The next theorem follows from Theorem 4.15 and a reduction principle along the lines of Theorem 4.9. We assume that the innovations in the linear process are symmetric.

Theorem 4.17

Assume that X t (\(t\in\mathbb{Z}\)) is a linear process such that a j c a j d−1, d∈(−∞,1/2), ε t (\(t\in\mathbb {Z}\)) are i.i.d. symmetric random variables such that (4.89) holds with α∈(1,2) and β=0, i.e.

$$P(\varepsilon_{1}>x)\sim\frac{A}{2}{x^{-\alpha}},\qquad P(\varepsilon _{1}<-x)\sim\frac{A}{2}x^{-\alpha}. $$

Furthermore, assume that the distribution F ε of ε 1 fulfills

$$\big|F_{\varepsilon}^{(2)}(x)\big|\leq C\bigl(1+|x|\bigr)^{-\alpha}, \qquad \big|F_{\varepsilon}^{(2)}(x)-F_{\varepsilon}^{(2)}(y)\big| \leq C|x-y|\bigl(1+|x|\bigr)^{-\alpha}, $$

for all \(x,y\in\mathbb{R}\) with |x−y|<1.

  • If 0<d<1−1/α and G is bounded, then

    $$ n^{-H}\sum_{t=1}^{[nu]} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\} \Rightarrow A^{1/\alpha}C_{\alpha}^{-1/\alpha}\frac{c_{a}}{d}G_{\infty}^{(1)}(0) \tilde{Z}_{H,\alpha}(u), $$
    (4.90)

    wheredenotes weak convergence in D[0,1], and \(\tilde{Z}_{H,\alpha}(\cdot)\) is a linear fractional stable motion with H=d+α −1 such that \(\tilde{Z}_{H,\alpha}(1)\) is a symmetric α-stable random variable with scale

    $$\eta= \biggl( \int_{-\infty}^{1} \bigl\{ (1-v)_{+}^{d}-(-v)_{+}^{d} \bigr\}^{\alpha}\,dv \biggr)^{1/\alpha} $$

    and G (x)=E[G(X+x)].

  • If 1−2/α<d<0 and A=1 in (4.89) and G is bounded, then

    $$ n^{-1/\alpha(1-d)}\sum_{t=1}^{[nu]} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\} \Rightarrow c_{G}^{+}\tilde{Z}_{\alpha(1-d)}^{+}(u)+c_{G}^{-} \tilde{Z}_{\alpha(1-d)}^{-}(u), $$
    (4.91)

    where \(\tilde{Z}_{\alpha(1-d)}^{+}(\cdot)\), \(\tilde{Z}_{\alpha(1-d)}^{-} (\cdot)\) are independent copies of an α(1−d)-stable Lévy motion such that \(Z_{\alpha(1-d)}(1)\overset{\mathrm{d}}{=}S_{\alpha(1-d)}(1,1,0)\) and

    $$c_{G}^{\pm}=C_{\alpha(1-d)}^{-1/\alpha(1-d)}\frac{c_{a}^{1/(1-d)}}{1-d}\int_{0}^{\infty} \bigl[ G_{\infty}(\pm v)-G_{\infty}(0) \bigr] v^{-1-1/(1-d)}\,dv, $$

    where G (x)=E[G(X 1+x)].

  • If −∞<d<1−2/α and G is bounded, then

    $$ n^{-1/2}\sum_{t=1}^{[nu]} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\} \Rightarrow \sigma_{S}B(u), $$
    (4.92)

    where B(⋅) is a standard Brownian motion, and σ S is a finite positive constant.

This theorem was proven in Koul and Surgailis (2001), Surgailis (2002) and Hsing (1999). Remarkably, in (4.90) and (4.91), we may obtain a stable limit arising from a summation of bounded random variables. The convergence in (4.90) can be thought of as a long-memory-type behaviour since the scaling involves the memory parameter d and the limiting process has dependent increments. The convergence in (4.91) is a sort of an intermediate case: the scaling involves d, but the limiting process has independent increments. Finally, (4.92) represents a standard behaviour: as in the i.i.d. case, the limiting process is a Brownian motion since \(\operatorname {var}(G(X_{1}))\) is finite.

Below, we give an outline of the proof of (4.90). As for (4.91), the limiting process has independent increments, but the scaling factor involves the memory parameter d. The reason for this is that the process S n,G (u) can be approximated by a sum \(\sum_{t=1}^{n}\eta_{G}(\varepsilon_{t})\) of i.i.d. random variables, where

$$\eta_{G}(\varepsilon_{t})=\sum_{j=0}^{\infty} \bigl\{ G_{\infty}(a_{j}\varepsilon_{t})-E \bigl[ G_{\infty}(a_{j}\varepsilon_{1}) \bigr] \bigr\}, $$

and the variables η G have a tail decaying like |x|α(1−d).

In (4.90) it may happen that the quantity \(G_{\infty}^{(1)}(0)\) vanishes. It is an open question whether it is possible to obtain a nondegenerate limit in this case with 1<α<2. Let us recall that in the case of linear processes with finite moments the solution to this problem is given for example in Theorem 4.4. In the case of infinite moments, this question was studied in Surgailis (2004) under the assumption 2<α<4. It may happen that the limit is an α(1−d)-stable Lévy motion, a Hermite–Rosenblatt process or a Brownian motion.

Proof of Theorem 4.17

Recall the notation from the proof of Theorem 4.9. We denote by \(\mathcal{V}_{t}\) the sigma field generated by (ε t ,ε t−1,…) and set

$$T_{n}(G;1)=\sum_{t=1}^{n} \bigl( G(X_{t})-E \bigl[ G(X_{1}) \bigr] -G_{\infty}^{(1)}(0)X_{t} \bigr) $$

and \(P_{K}Y=E(Y|\mathcal{V}_{K})-E(Y|\mathcal{V}_{K-1})\). We can repeat the computation there, using the rth norm with r<α instead of r=2:

The first inequality follows from a result for martingale differences Y t (\(t\in\mathbb{N}\)), namely

$$\Bigg\Vert\sum_{t=1}^{n}Y_{t}\Bigg\Vert_{r}^{r}\leq2\sum_{t=1}^{n}\Vert Y_{t}\Vert _{r}^{r}$$

for any 1≤r≤2. The second one is the norm inequality used in the proof of Theorem 4.9. Now, instead of Lemma 4.17, we use

$$\big\Vert P_{-(t-K)}U(\mathcal{V}_{0})\big\Vert_{r}\leq(t-K)^{-(1-d)(1+\gamma)}, $$

where (1+γ)r<α. Computations leading to this expression are quite involved; we refer the reader to Koul and Surgailis (2001). Then one obtains

$$\big\Vert T_{n}(G;1)\big\Vert_{r}^{r}\leq C\sum_{K=-\infty}^{n}\Biggl( \sum _{t=K\vee 1}^{n}(t-K)^{-(1-d)(1+\gamma)}\Biggr) ^{r}\leq Cn^{r+1}n^{-(1-d)(1+\gamma)r}$$

by similar calculations as those leading to (4.64), (4.65). Choosing γ sufficiently close to 0, we conclude that

$$\big\Vert T_{n}(G;1)\big\Vert_{r}^{r}=o\bigl(n^{r(d+1/\alpha)}\bigr). $$

In particular, \(\Vert T_{n}(G;1)\Vert_{r}^{r}=o(v_{n}^{r})\), where

$$v_{n}=C_{\alpha}^{-1/\alpha}A^{1/\alpha}\frac{c_{a}}{d}n^{H} $$

with \(H=d+\frac{1}{\alpha}\). Therefore, on account of Theorem 4.15, the limiting behaviour of

$$v_{n}^{-1}\sum_{t=1}^{n} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\} $$

is the same as that of \(v_{n}^{-1}G_{\infty}^{(1)}(0)\sum_{t=1}^{n}X_{t}\). □

4.3.4 Stochastic Volatility Models

In this section we consider Long-Memory Stochastic Volatility (LMSV) sequences with infinite moments. Let X t =σ t ξ t (\(t\in\mathbb{N}\)), where

$$\sigma_{t}=\sigma(\zeta_{t}),\quad\zeta_{t}=\sum_{j=1}^{\infty}a_{j}\varepsilon_{t-j}, $$

σ(⋅) is a positive function, \(\sum_{j=1}^{\infty }a_{j}^{2}<\infty\), and ε t (\(t\in\mathbb{Z}\)) are i.i.d. random variables. It is further assumed that ξ t (\(t\in\mathbb{Z}\)) is a sequence of i.i.d. random variables such that

$$ P(\xi_{1}>x)\sim A\frac{1+\beta}{2}{x^{-\alpha}},\qquad P(\xi_{1}<-x)\sim A\frac{1-\beta}{2}x^{-\alpha}. $$
(4.93)

Also, we assume that the sequences ε t (\(t\in\mathbb{Z}\)) and ξ t (\(t\in\mathbb{Z}\)) are mutually independent. At the moment we do not assume anything about the mean of ξ t .

Limiting results for infinite-variance volatility models with long memory are almost non-existent; see Kulik and Soulier (2012) or Surgailis (2008), the latter in the quadratic LARCH case. In particular, we will show below that stochastic volatility models can be treated using a point process methodology.

4.3.4.1 Tail Behaviour

The first question we have to answer is the following. If ξ is as in (4.93), what is the consequence for the tail of X? The next lemma shows that if the random variables ξ and σ are independent, then σξ is still regularly varying. The result is often referred to as Breiman’s lemma (Breiman 1965), and a proof can be found for example in Resnick (2007, Proposition 7.5).

Lemma 4.20

Assume that (4.93) holds. If σ 1 is a positive random variable independent of ξ 1 and such that for some δ>0,

$$ E\bigl(\sigma_{1}^{\alpha+\delta}\bigr)<\infty, $$
(4.94)

then the distribution of σξ is regularly varying, and

$$ \lim_{x\rightarrow\infty}\frac{P(\sigma_{1}\xi_{1}>x)}{P(|\xi_{1}|>x)}=\frac{1+\beta}{2}E\bigl( \sigma_{1}^{\alpha}\bigr),\qquad\lim_{x\rightarrow \infty } \frac{P(\sigma_{1}\xi_{1}<-x)}{P(|\xi_{1}|>x)}=\frac{1-\beta}{2}E \bigl( \sigma_{1}^{\alpha} \bigr) . $$
(4.95)

Lemma 4.20 implies for the LMSV model and arbitrary p>0 that

$$ P\bigl(|X_{1}|^{p}>x\bigr)=P\bigl(X_{1}>x^{1/p} \bigr)+P\bigl(X_{1}<-x^{1/p}\bigr)\sim A\;E\bigl( \sigma_{1}^{\alpha}\bigr)x^{-\alpha/p}. $$
(4.96)

Thus, if we consider the LMSV model, we may take ξ t as in (4.93), σ(x)=e x and ζ t (\(t\in\mathbb{N}\)) to be e.g. long-memory Gaussian. Then the random variables X t (\(t\in\mathbb{N}\)) have heavy tails and long memory.
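For instance, if σ(x)=e x and ζ 1 is centred Gaussian with variance \(\sigma_{\zeta}^{2}\), then Breiman’s condition (4.94) holds for every δ>0 because of the Gaussian moment generating function:

$$E\bigl[\sigma_{1}^{\alpha+\delta}\bigr]=E\bigl[e^{(\alpha+\delta)\zeta_{1}}\bigr]=\exp \biggl( \frac{(\alpha+\delta)^{2}\sigma_{\zeta}^{2}}{2} \biggr) <\infty, $$

and (4.96) then gives \(P(|X_{1}|>x)\sim A e^{\alpha^{2}\sigma_{\zeta}^{2}/2}x^{-\alpha}\).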

4.3.4.2 Point Process Convergence

Point process convergence results play a crucial role when proving asymptotic results for partial sums based on infinite-variance sequences. Here, we assume that the reader is familiar with material presented in Sect. 4.3.2.4.

We start with a simple generalization of Theorem 4.13 to the LMSV model. Recall the intensity measure

$$d\lambda(x)=\alpha \biggl[ \frac{1+\beta}{2}x^{-(\alpha+1)}1 \{ 0<x<\infty \} + \frac{1-\beta}{2}(-x)^{-(\alpha+1)}1 \{ -\infty<x<0 \} \biggr] \,dx, $$

where β∈[−1,1], and consider the point processes

$$N_{n}=\sum_{t=1}^{n}\delta_{(t/n,c_{n}^{-1}X_{t})}, $$

where c n is chosen to fulfill P(|ξ 1|>c n )∼n −1, i.e.

$$c_{n}=A^{1/\alpha}n^{1/\alpha}. $$

The next result shows that the point process based on the LMSV sequence X t behaves as if the random variables were independent. It will be clear from the proof that the same applies to |X t |r where r is any power. Furthermore, we do not really need the particular structure σ t =σ(ζ t ), where ζ t (\(t\in\mathbb{Z}\)) is a linear process. Only the ergodicity of σ t (\(t\in\mathbb {N}\)) is needed.

Theorem 4.18

Consider the LMSV model X t =σ t ξ t (\(t\in\mathbb{N}\)) such that (4.93) and Breiman’s condition (4.94) hold. Then N n converges weakly in \(M_{p}([0,1]\times\mathbb{R})\) to a Poisson process N with intensity measure \(E(\sigma_{1}^{\alpha })ds\times d\lambda(x)\).

Proof

(Personal communication with P. Soulier) The proof is basically the same as in the i.i.d. case, see Theorem 4.13. We also use the same notation as in Theorem 4.13. Let \(U=\bigcup _{i=1}^{K}(k_{i},l_{i})\times(s_{i},t_{i})\). Then

where \(\mathcal{F}_{\sigma}\) is the sigma field generated by the entire sequence σ t . Let θ t ((s i ,t i )) be the limit of \(nP(c_{n}^{-1}X_{t}\in(s_{i},t_{i})|\mathcal{F}_{\sigma})\) and write

$$Q_{n}=\prod_{i=1}^{K}\prod _{nk_{i}<t<nl_{i}}\exp \bigl\{ -n^{-1}\theta_{t} \bigl((s_{i},t_{i}) \bigr) \bigr\} . $$

Note that θ t is a random variable since it depends on the sequence σ t (\(t\in\mathbb{N}\)). Therefore, the only difference between the LMSV setting and the i.i.d. one is that Q n here is a random variable and λ((s i ,t i )) is replaced by θ t ((s i ,t i )). Nevertheless, Q n converges in probability to

$$\exp \Biggl\{ -E\bigl(\sigma_{1}^{\alpha}\bigr)\sum _{i=1}^{K}(l_{i}-k_{i})\lambda \bigl((s_{i},t_{i})\bigr) \Biggr\} =P\bigl(N(U)=0\bigr). $$

It remains to prove that \(|P_{n}-Q_{n}|\) converges in probability to 0 and apply the bounded convergence theorem. To prove that \(|P_{n}-Q_{n}|\overset{\mathrm{P}}{\rightarrow}0\), we proceed as in Theorem 4.13:

For the second term, we have

$$nE \biggl[ \biggl \vert 1-e^{-n^{-1}\theta_{1}((s_{i},t_{i}))} -\frac{\theta_{1}((s_{i},t_{i}))}{n}\biggr \vert \biggr] \leq Cn^{-\delta}E\bigl[\sigma_{1}^{\alpha+\delta} \bigr]. $$

Furthermore, let us recall Potter’s bound (see Theorem 1.5.6 in Bingham et al. 1989), namely: for v>0,

$$nP\bigl(c_{n}^{-1}v\xi_{1}\in(s_{i},t_{i}) \bigr)\leq C\bigl(\max\{v,1\}\bigr)^{\alpha+\delta}, $$

where δ>0. For the first term, we apply Potter’s bound to get

$$nP\bigl(c_{n}^{-1}X_{1}\in(s_{i},t_{i})| \mathcal{F}_{\sigma }\bigr)=nP\bigl(c_{n}^{-1} \xi_{1}\sigma_{1}\in(s_{i},t_{i})| \mathcal{F}_{\sigma}\bigr)\leq C\bigl(\max\{\sigma_{1},1\} \bigr)^{\alpha+\delta}, $$

and the same bound holds for θ 1((s i ,t i )). We can then apply the bounded convergence theorem to get

$$\lim_{n\rightarrow\infty}E \bigl[ \big|nP\bigl(c_{n}^{-1}X_{1} \in(s_{i},t_{i})\bigr)-\theta_{1} \bigl((s_{i},t_{i})\bigr)\big| \bigr] =0. $$

 □

4.3.4.3 Convergence of Partial Sums

Having established point process convergence, we proceed with its consequences for partial sums. Assume that ξ 1 fulfills (4.93) and that E(ξ 1)=0 (or that ξ 1 is symmetric if α∈(0,1)). Define

$$S_{n}(u)=\sum_{t=1}^{[nu]}X_{t}$$

and

$$S_{n,p}(u)=\sum_{t=1}^{[nu]} \bigl( |X_{t}|^{p}-E\bigl[|X_{1}|^{p}\bigr] \bigr) , $$

assuming that E[|X 1|p]<∞ but E[|X 1|2p]=∞. Due to Lemma 4.20, this is achieved when p<α<2p. In the next theorem we show that depending on an interplay between long memory and tails, partial sums based on the LMSV sequence may converge either to a Lévy process (weakly dependent behaviour) or to a Hermite process (long-memory behaviour).
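The restriction p<α<2p simply encodes the required moment behaviour: by (4.96), |X 1|p has a regularly varying tail with index α/p, so

$$E\bigl[|X_{1}|^{p}\bigr]<\infty\iff\frac{\alpha}{p}>1\iff p<\alpha,\qquad E\bigl[|X_{1}|^{2p}\bigr]=\infty\iff\frac{\alpha}{2p}\leq1\iff\alpha\leq2p. $$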

Theorem 4.19

Consider the LMSV model X t =σ t ξ t (\(t\in\mathbb{N}\)) and assume that the conditions of Theorem 4.18 hold. In addition, we assume that α>1, E(ξ 1)=0 and ζ t (\(t\in\mathbb{N}\)) is a Gaussian linear process with coefficients a j satisfying (B1), i.e. a j =L a (j)j d−1, d∈(0,1/2), and covariance function γ ζ (k)∼L γ (k)k 2d−1. Let m≥1 be the Hermite rank of the function σ p(⋅) and assume further that \(E(\sigma_{1}^{2\alpha+2\varepsilon })<\infty\).

  • If 1<α<2, then

    $$ n^{-1/\alpha}S_{n}(u)\Rightarrow A^{1/\alpha}C_{\alpha}^{-1/\alpha}\bigl(E\bigl[\sigma_{1}^{\alpha}\bigr]\bigr)^{1/\alpha} Z_{\alpha}(u), $$
    (4.97)

    where Z α (⋅) is an α-stable Lévy process such that \(Z_{\alpha}(1)\overset{\mathrm{d}}{=}S_{\alpha}(1,\beta ,0)\), anddenotes weak convergence in D[0,1].

  • If p<α<2p and 1−m(1/2−d)<p/α, then

    $$ n^{-p/\alpha}S_{n,p}(u)\Rightarrow A^{p/\alpha}C_{\alpha/p}^{-p/\alpha } \bigl(E\bigl[\sigma_{1}^{\alpha}\bigr]\bigr)^{p/\alpha} Z_{\alpha/p}(u), $$
    (4.98)

where Z α/p (⋅) is an α/p-stable Lévy process such that \(Z_{\alpha/p}(1)\overset{\mathrm{d}}{=}S_{\alpha/p}(1,1,0)\), and ⇒ denotes weak convergence in D[0,1].

  • If p<α<2p and 1−m(1/2−d)>p/α, then

    $$ n^{-(1-m(\frac{1}{2}-d))}L_{m}^{-1/2}(n)S_{n,p}(u)\Rightarrow \frac {J(m)E[|\xi_{1}|^{p}]}{m!}Z_{m,H}(u), $$
    (4.99)

    where Z m,H (⋅) is a Hermite process of order m, \(H=d+\frac{1}{2}\),

$$L_{m}(n)=m!C_{m}L_{\gamma}^{m}(n), $$

    J(m) is the Hermite coefficient of σ p(⋅), anddenotes weak convergence in D[0,1].

When α∈(1,2), the partial sum S n (u) is a martingale because E(X t )=E(ξ t )E(σ t )=0. Hence, only the stable Lévy limit arises, and (4.97) holds. This can be concluded from a general theory by Surgailis (2008). If S n,p (⋅) is considered, then we observe a dichotomous behaviour. Assume for simplicity that m=1. If long memory is strong enough, then it influences the limiting behaviour. Interestingly, the infinite variance sequence |X t |p yields a limiting process with finite variance. Furthermore, results are readily extendable to the case where ζ t is a general linear process. Instead of Theorem 4.4, one has to use corresponding results for subordinated linear processes; see Theorem 4.6. Furthermore, in contrast to Theorem 4.15 for linear processes with infinite variance, we note that we have weak convergence w.r.t. J 1-topology in all three cases.

Example 4.16

(Cf. Example 4.11)

Assume that X t =ξ t exp(ζ t ), where ζ t is a standard normal sequence with covariance γ ζ (k)∼L γ k 2d−1, d∈(0,1/2). If α∈(2,4) and d+1/2<2/α, then n −2/α S n,2(u) converges to a Lévy process. Otherwise, if α∈(2,4) and d+1/2>2/α, then

$$n^{-(1/2+d)}L_{1}(n)^{-1/2}(n)S_{n,2}(u)\Rightarrow J(1)E\bigl(\xi_{1}^{2}\bigr)B_{H}(u), $$

where L 1(n)=(d(2d+1))−1 L γ (n) and J(1)=E[ζ 1exp(2ζ 1)].
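For a concrete numerical reading of this dichotomy (the value α=3 is an arbitrary choice in (2,4)):

$$\alpha=3:\ \frac{2}{\alpha}=\frac{2}{3};\qquad d=0.1:\ d+\frac{1}{2}=0.6<\frac{2}{3}\ (\text{Lévy limit}),\qquad d=0.3:\ d+\frac{1}{2}=0.8>\frac{2}{3}\ (\text{fractional Brownian limit}). $$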

In the spirit of Example 4.12, if α∈(1,2) and \(E(\xi_{t})\not=0\), then long memory appears already in \(\sum_{t=1}^{[nu]}X_{t}\).

Example 4.17

(LMSD with Infinite Variance)

As in Example 4.12, we assume that the random variables ξ t (\(t\in\mathbb{N}\)) are strictly positive. Suppose that we have heavy tails

$$P(\xi_{1}>x)\sim Ax^{-\alpha}\quad(x\rightarrow\infty) $$

with α∈(1,2). Furthermore, it is assumed that the sequences ξ t and ζ t are independent and the covariance of ζ t is of the asymptotic form γ ζ (k)∼L γ (k)k 2d−1, d∈(0,1/2). Let G(x)=x and σ(x)=exp(x), so that the Hermite rank m=1. Then we have a dichotomous behaviour for \(S_{n}(u):=\sum _{t=1}^{[nu]}(X_{t}-E(X_{1}))\). Specifically, (4.98) and (4.99) hold with p=1:

  • If 1/2+d<1/α, then

    $$ n^{-1/\alpha}S_{n}(u)\Rightarrow A^{1/\alpha}C_{\alpha}^{-1/\alpha}\bigl(E\bigl[\sigma_{1}^{\alpha}\bigr]\bigr)^{1/\alpha} Z_{\alpha}(u), $$
    (4.100)

    where Z α (⋅) is an α-stable Lévy process such that \(Z_{\alpha}(1)\overset{\mathrm{d}}{=}S_{\alpha}(1,1,0)\).

  • If 1/2+d>1/α, then

    $$ n^{-(1/2+d)}L_{1}^{-1/2}(n)S_{n}(u)\Rightarrow J(1)E[\xi_{1}]B_{H}(u), $$
    (4.101)

    where B H (⋅) is a fractional Brownian motion, \(H=d+\frac{1}{2}\), L 1(n)=C 1 L γ (n) and J(1)=E[ζ 1exp(ζ 1)].
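A minimal simulation sketch of the LMSD model in this example is given below; it assumes (for concreteness only) standard Pareto multipliers ξ t with tail index α, a Gaussian FARIMA(0,d,0) sequence ζ t obtained from a truncated MA(∞) filter, and σ(x)=exp(x). The parameter values, the truncation lag and the centring by the sample mean are arbitrary illustrative choices.

```python
import numpy as np
from scipy.special import gammaln

# LMSD sketch: X_t = xi_t * exp(zeta_t), with xi_t > 0 heavy tailed (tail index alpha),
# zeta_t Gaussian FARIMA(0, d, 0), and the sequences xi and zeta independent.

rng = np.random.default_rng(2)
n, d, alpha, J = 1000, 0.4, 1.5, 5000      # illustrative parameters and truncation lag

j = np.arange(J + 1)
a = np.exp(gammaln(j + d) - gammaln(j + 1) - gammaln(d))   # FARIMA(0,d,0) weights
eps = rng.standard_normal(n + J)
zeta = np.convolve(eps, a)[J:J + n]                        # long-memory Gaussian zeta_t

xi = rng.pareto(alpha, size=n) + 1.0       # P(xi > x) = x^{-alpha} for x >= 1
X = xi * np.exp(zeta)                      # LMSD observations

S = np.cumsum(X - X.mean())                # centred partial sums; sample mean stands in for E(X_1)
```

Depending on whether 1/2+d is smaller or larger than 1/α, the path of S should resemble the stable Lévy limit in (4.100) or the fractional Brownian limit in (4.101).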

Proof of Theorem 4.19

Let \(\mathcal{F}_{t}\) be a sigma field generated by ξ j ,ε j (j≤t). We start by studying S n,p (⋅). Write

$$S_{n,p}(u)=\sum_{t=1}^{[nu]} \bigl( |X_{t}|^{p}-E\bigl[|X_{t}|^{p}|\mathcal{F}_{t-1}\bigr] \bigr) +\sum_{t=1}^{[nu]} \bigl( E\bigl[|X_{t}|^{p}|\mathcal{F}_{t-1}\bigr]-E\bigl[|X_{1}|^{p}\bigr] \bigr) =:M_{n}(u)+R_{n}(u). $$
Note that \(E[|X_{t}|^{p}|\mathcal{F}_{t-1}]=E(|\xi_{1}|^{p})\sigma^{p}(\zeta_{t})\) is a function of ζ t and does not depend on ξ t . Therefore, for the long-memory part R n (u), we have

$$ n^{-(1-m(\frac{1}{2}-d))}L_{m}^{-1/2}(n)R_{n}(u)\Rightarrow\frac {J(m)E[|\xi _{1}|^{p}]}{m!}Z_{m,H}(u) $$
(4.102)

if m(1/2−d)<1, where Z m,H (⋅) is a Hermite process of order m, and L m is the slowly varying function defined in Theorem 4.4. If m(1/2−d)>1, then

$$ n^{-1/2}R_{n}(u)\Rightarrow vE\bigl[|\xi_{1}|^{p} \bigr]B(u), $$
(4.103)

where B(⋅) is a standard Brownian motion, and v is a constant.

We will show that, under the stated assumptions, we have

$$ c_{n}^{-p}M_{n}(u)\Rightarrow C_{\alpha/p}^{-p/\alpha} \bigl(E\bigl[\sigma_{1}^{\alpha }\bigr]\bigr)^{p/\alpha} Z_{\alpha/p}(u), $$
(4.104)

or equivalently,

$$n^{-p/\alpha}M_{n}(u)\Rightarrow A^{p/\alpha}C_{\alpha/p}^{-p/\alpha}\bigl(E\bigl[\sigma_{1}^{\alpha}\bigr]\bigr)^{p/\alpha} Z_{\alpha/p}(u). $$

From (4.102), (4.103) and (4.104) we conclude the proof of the theorem. First we prove (4.104). The proof is very similar to the proof of convergence of the partial sum of an i.i.d. sequence in the domain of attraction of a stable law to a Lévy stable process. The difference consists in some additional technicalities (see e.g. the proof of Theorem 7.1 in Resnick 2007 for additional details).

Step 1:

For 0<ε<1, decompose M n (u) further as

The term \(\tilde{M}_{n}^{(\varepsilon )}(\cdot)\) is treated using point process convergence. It excludes small jumps X t defined by \(c_{n}^{-1}|X_{t}|<\varepsilon \). The reason for this is that the summation functional is not continuous on the entire real line; one has to exclude small jumps. For any ε>0, the summation functional is an almost surely (with respect to the distribution of the Poisson point process, see e.g. p. 215 in Resnick 2007) continuous mapping from the set of Radon measures on [0,1]×[ε,∞) to \(D([0,1],\mathbb{R})\). From Theorem 4.18 we then conclude

$$ c_{n}^{-p}\sum_{t=1}^{[nu]}|X_{t}|^{p}1 \bigl\{|X_{t}|>\varepsilon c_{n}\bigr\} \Rightarrow \sum _{k:t_{k}\leq u}|j_{k}|^{p}1\bigl\{|j_{k}|> \varepsilon \bigr\} $$
(4.105)

in \(D([0,1],\mathbb{R})\), where we recall that (t k ,j k ) are points of the limiting Poisson process. Taking expectations in (4.105), we obtain

$$\lim_{n\rightarrow\infty}[nu]c_{n}^{-p}E \bigl[ |X_{1}|^{p}1 \bigl\{ |X_{1}|>\varepsilon c_{n}\bigr\} \bigr] =u\int_{|x|>\varepsilon }|x|^{p}\,d \lambda(x) $$

uniformly with respect to u∈[0,1], since this is a sequence of increasing functions with a continuous limit. Furthermore, we claim that

$$c_{n}^{-p}\Biggl \vert \sum_{t=1}^{[nu]} \bigl( E \bigl[ |X_{1}|^{p}1\bigl\{|X_{1}|> \varepsilon c_{n}\bigr\} \bigr] -E \bigl[ |X_{t}|^{p}1 \bigl\{|X_{t}|>\varepsilon c_{n}\bigr\}\big|\mathcal{F}_{t-1} \bigr] \bigr) \Biggr \vert \overset{\mathrm{p}}{\rightarrow}0 $$

uniformly in u∈[0,1]. The variance of the last expression is in fact bounded by

where γ ζ (k) is the covariance function of the Gaussian sequence ζ t (\(t\in\mathbb{Z}\)), and m is the Hermite rank of σ p(⋅). Recall Potter’s bound (see Theorem 1.5.6. in Bingham et al. 1989): for v>0,

$$nP\bigl(c_{n}^{-1}v\xi_{1}\in(s_{i},t_{i}) \bigr)\leq C\bigl(\max\{v,1\}\bigr)^{\alpha+\delta}, $$

where δ>0. Now, if p<α<2p, then we combine Karamata’s theorem with Potter’s bound to obtain

Since by assumption \(E[\sigma_{1}^{2\alpha+2\varepsilon }]<\infty\) for some ε>0, we have for each t,

(4.106)

where the last bound is obtained for some ε>0 by Potter’s bound. This proves the convergence of finite-dimensional distributions to 0 and tightness in D([0,1]). We now argue that the bounds obtained above imply

$$c_{n}^{-p}\tilde{M}_{n}^{(\varepsilon )}(u)\Rightarrow C_{\alpha /p}^{-p/\alpha }\bigl(E\bigl[\sigma_{1}^{\alpha} \bigr]\bigr)^{p/\alpha}Z_{\alpha/p}^{(\varepsilon )}(u) $$

and also \(Z_{\alpha/p}^{(\varepsilon )}(u)\Rightarrow Z_{\alpha /p}(u)\) as ε→0. Therefore, it suffices to show the negligibility of \(c_{n}^{-p}M_{n}^{(\varepsilon )}\), i.e. that small jumps are negligible. By Doob’s martingale inequality we obtain

Recall that α<2p. By Karamata’s theorem (Lemma 4.18),

$$E \bigl[ |X_{1}|^{2p}1\bigl\{|X_{1}|<\varepsilon c_{n}\bigr\} \bigr] \sim\frac{2\alpha }{2p-\alpha}(\varepsilon c_{n})^{2p} \bar{F}_{X}(\varepsilon c_{n})\sim\frac {2\alpha }{2p-\alpha} \varepsilon ^{2p-\alpha}c_{n}^{2p}n^{-1}. $$

Applying this and letting ε→0, we conclude that \(c_{n}^{-p}M_{n}^{(\varepsilon )}\) is uniformly negligible in L 2 and therefore also in probability. Thus,

$$c_{n}^{-p}M_{n}(u)\Rightarrow C_{\alpha/p}^{-p/\alpha} \bigl(E\bigl[\sigma_{1}^{\alpha }\bigr]\bigr)^{p/\alpha} Z_{\alpha/p}(u). $$

This finishes the proof of (4.98) and (4.99).

As for the sum S n , the long-memory part R n vanishes since E(X 1)=E(ξ 1)E(σ 1)=0. Thus, in this case, too, only the stable limit arises. □

The reader is referred to Kulik and Soulier (2012) for more discussion, a detailed proof and extensions to stochastic volatility with leverage.

4.3.5 Subordinated Gaussian Processes with Infinite Variance

Previously (see Theorem 4.16 or Theorem 4.19, Eq. (4.99)) we have seen that it is possible to obtain limiting distributions with finite variance although we start with innovations with infinite second moments. In this section we illustrate that this type of behaviour can also be achieved in the context of Gaussian subordination with infinite variance. This rather peculiar result depends on specific circumstances to be explained below.

Let X t (\(t\in\mathbb{Z}\)) be a stationary centred Gaussian process with covariance γ X (k)∼L γ (k)k 2d−1, d∈(0,1/2). Assume that G is a function such that, as x→∞,

$$ P\bigl(G(X_{1})>x\bigr)\sim A\frac{1+\beta}{2}x^{-\alpha},\qquad P \bigl(G(X_{1})<-x\bigr)\sim A\frac{1-\beta}{2}x^{-\alpha}, $$
(4.107)

where β∈[−1,1]. If α∈(0,2), then the G(X t ) have infinite (or non-existent) variance. Furthermore, if α∈(0,1), then E(|G(X 1)|)=+∞. A typical example is \(G(x)=|x|^{-1/\alpha}\). After the transformation \(x\mapsto|x|^{-1/\alpha}\), the mass near zero is “sent” to infinity (since for a standard normal density, \(\phi(0)\not=0\)). Another example is \(G(x)=b\exp(cx^{2})\) for some constants \(b\in\mathbb{R}\) and c>0.
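For the first example, the claim can be checked directly: taking X 1 standard normal with density ϕ, as x→∞,

$$P\bigl(|X_{1}|^{-1/\alpha}>x\bigr)=P\bigl(|X_{1}|<x^{-\alpha}\bigr)=\int_{-x^{-\alpha}}^{x^{-\alpha}}\phi(u)\,du\sim2\phi(0)x^{-\alpha}, $$

so that (4.107) holds with A=2ϕ(0) and β=1 (the transformed variable being positive).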

In this section we shall assume that α∈(1,2). Again we consider

$$S_{n,G}(u)=\sum_{t=1}^{[nu]} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\} . $$

With a similar trick as in the proof of Theorem 4.19, i.e. the decomposition into a martingale and a long-memory part, S n,G will be studied using techniques available for weakly dependent processes with infinite variance (see M n (⋅) in the proof of Theorem 4.19) and finite-variance subordinated Gaussian processes (see Sect. 4.2.3). This method was used in Sly and Heyde (2008) for α∈(1,2). The result for α∈(0,1) was proven in Davis (1983).

4.3.5.1 Point Process Convergence

Assume that α∈(1,2), so that \(E[|G(X_{1})|]<\infty\) but \(\operatorname {var}(G(X_{t}))=\infty\). As in the case of the LMSV model, we start with the convergence of point processes

$$N_{n}=\sum_{t=1}^{n}\delta_{(t/n,c_{n}^{-1}G(X_{t}))},$$

where in the present context

$$c_{n}=\inf\bigl\{x:P\bigl(\big|G(X_{1})\big|>x\bigr)\leq n^{-1}\bigr\}. $$

Recall that

$$d\lambda(x)=\alpha \biggl[ \frac{1+\beta}{2}x^{-(\alpha+1)}1 \{ 0<x<\infty \} + \frac{1-\beta}{2}(-x)^{-(\alpha+1)}1 \{ -\infty<x<0 \} \biggr]\,dx . $$

We state the following result without proof. In principle, as in the LMSV case, it says that the random variables G(X t ) behave as if they were independent.

Theorem 4.20

Consider a Gaussian sequence X t (\(t\in\mathbb{N}\)) and a real-valued function G such that (4.107) holds. Then N n converges weakly in \(M_{p}([0,1]\times\mathbb{R})\) to a Poisson process N with intensity measure ds×dλ(x).

4.3.5.2 Hypercontraction Principle for Gaussian Random Variables

We shall explain how it is possible to obtain a finite-variance random variable from infinite-variance variables G(X t ). Recall that for a function G such that E[G 2(X 1)]<∞, we have the following expansion:

$$G(x)=E \bigl[ G(X_{1}) \bigr] +\sum_{l=m}^{\infty} \frac{J(l)}{l!}H_{l}(x), $$

where m is the Hermite rank of G, and J(l)=E[G(X 1)H l (X 1)]. This expansion is also valid for a function G with E[|G(X 1)|1+θ]<∞, where θ∈(0,1). Indeed, the Hermite coefficients J(l) are still well defined. Applying the Hölder inequality, we obtain with r=(1+θ)/θ,

$$ \big|J(l)\big|\leq E^{\frac{1}{1+\theta}}\bigl[\big|G(X_{1})\big|^{1+\theta} \bigr]E^{\frac{1}{r}}\bigl[\big|H_{l}(X_{1})\big|^{r} \bigr]=\Vert G\Vert_{1+\theta}\Vert H_{l}\Vert_{r}<\infty, $$
(4.108)

where \(\Vert G\Vert_{r}^{r}=\int G^{r}(u)\phi(u)\,du\). Now, let X=a 1 X 1+θX 2, where \(a_{1}^{2}+\theta^{2}=1\), and X 1, X 2 are independent standard normal random variables. Let \(\mathcal{F}\) be the sigma field generated by X 2. We will argue below that although E[G 2(X)]=+∞, we have

$$\operatorname {var}\bigl(E\bigl[G(X)|\mathcal{F}\bigr]\bigr)<\infty. $$

We start with the following result.

Lemma 4.21

Assume that E[|G(X 1)|1+θ]<∞, where θ∈(0,1). Then

$$\sum_{l=m}^{\infty}\frac{J^{2}(l)}{l!}\theta^{2l}<\infty. $$

Proof

From Lemma 3.1 in Taqqu (1977) we have the following bound:

$$\Vert H_{l}\Vert_{r}\leq(r-1)^{l/2}\sqrt{l!}. $$

Applying (4.108) (recall that r=(1+θ)/θ), we obtain

$$\frac{J^{2}(l)\theta^{2l}}{l!}\leq\frac{\theta^{2l}}{l!}\Vert G\Vert _{1+\theta}^{2}(r-1)^{l}l!=\theta^{2l}\Vert G\Vert_{1+\theta}^{2}\theta ^{-l}=\Vert G\Vert_{1+\theta}^{2}\theta^{l}. $$

 □

The consequence of this simple lemma is quite remarkable. Applying formula (3.16) and recalling that X 2 is \(\mathcal{F}\)-measurable and Hermite polynomials H l (l≥1) are centred, we obtain

$$E\bigl[G(X)|\mathcal{F}\bigr]=E\bigl[G(X)\bigr]+\sum_{l=m}^{\infty}\frac{J(l)}{l!}E\bigl[H_{l}(X)|\mathcal{F}\bigr]=E\bigl[G(X)\bigr]+\sum_{l=m}^{\infty}\frac{J(l)}{l!}\theta^{l}H_{l}(X_{2}). $$
We recall that \(E[ H_{l}^{2}(X_{2})] =l!\). From Lemma 4.21 we have

$$\sum_{l=m}^{\infty} \biggl( \frac{J(l)}{l!} \biggr)^{2}\theta^{2l}l!<\infty. $$

This expression is however equal to

$$\operatorname {var}\Biggl( \sum_{l=m}^{\infty}\frac{J(l)}{l!} \theta^{l}H_{l}(X_{2}) \Biggr) =\operatorname {var}\Biggl( \sum _{l=m}^{\infty}\frac{J(l)}{l!}E \bigl[H_{l}(X)|\mathcal {F}\bigr] \Biggr) . $$

Thus, \(\sum_{l=m}^{\infty}E[H_{l}(X)|\mathcal{F}]J(l)/l!\) is a well-defined Hermite expansion of a function

$$\tilde{g}(X_{2}):=E\bigl[G(X)|\mathcal{F}\bigr]=E\bigl[ \tilde{g}(X_{2})\bigr]+\sum_{l=m}^{\infty} \frac{J(l)}{l!}\theta^{l}H_{l}(X_{2}) $$

with finite variance. Note also that, since X 2 is \(\mathcal{F}\)-measurable,

$$E\bigl[\tilde{g}(X_{2})H_{l}(X_{2})\bigr]=E \bigl \{ E\bigl[G(X)|\mathcal{F}\bigr]H_{l}(X_{2}) \bigr\} =E\bigl[G(X)H_{l}(X_{2})\bigr]. $$

4.3.5.3 Partial Sums Convergence

Theorem 4.21

Assume that X t (\(t\in \mathbb{Z}\)) is a stationary standard normal sequence with covariance γ X (k)∼L γ (k)k 2d−1, d∈(0,1/2). Let G be a function with Hermite rank m such that (4.107) holds with 1<α<2.

  • If 1<α<2 and 1−m(1/2−d)<1/α, then

    $$ n^{-1/\alpha}\sum_{t=1}^{[nu]} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\} \overset{ \mathrm{f.d.}}{\rightarrow}A^{1/\alpha}C_{\alpha }^{-1/\alpha} Z_{\alpha}(u), $$
    (4.109)

    where Z α (⋅) is an α-stable Lévy process such that \(Z_{\alpha}(1)\overset{\mathrm{d}}{=}S_{\alpha}(1,\beta,0)\).

  • If m is the Hermite rank of G and \(1-m(\frac{1}{2}-d)>1/\alpha\), then

    $$n^{-(1-m(\frac{1}{2}-d))}L_{m}^{-1/2}(n)\sum _{t=1}^{[nu]} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\} \Rightarrow Z_{m,H}(u)\quad \bigl(u \in[0,1]\bigr), $$

    where \(H=d+\frac{1}{2}\), \(L_{m}(n)=m!C_{m}L_{\gamma}^{m}(n)\), Z m,H (u) is the Hermite–Rosenblatt process, anddenotes weak convergence in D[0,1].

Proof

We present just a short heuristic derivation. The Gaussian sequence can be written as a linear process \(X_{t}=\sum _{j=0}^{\infty}a_{j}\varepsilon_{t-j}\), where ε t (\(t\in\mathbb{Z}\)) are i.i.d. standard normal, and \(\sum_{j=0}^{\infty}a_{j}^{2}=1\). Let \(\mathcal{F}_{t}=\sigma (\varepsilon _{t},\varepsilon_{t-1},\ldots)\). Then

$$\sum_{t=1}^{[nu]} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\} =\sum_{t=1}^{[nu]} \bigl\{ G(X_{t})-E \bigl[ G(X_{t})|\mathcal{F}_{t-l} \bigr] \bigr\} +\sum_{t=1}^{[nu]} \bigl\{ E \bigl[ G(X_{t})|\mathcal{F}_{t-l} \bigr] -E \bigl[ G(X_{1}) \bigr] \bigr\} =:M_{n}(u)+R_{n}(u), $$
where l is such that \(\theta:=\sqrt{\sum_{j=l}^{\infty }a_{j}^{2}}<\alpha-1\). The first part M n (⋅) is a martingale. Therefore, its limiting properties are studied in the very same way as M n (⋅) in the proof of Theorem 4.19. As for the second part, write

$$X_{t}:=\sum_{j=0}^{l-1}a_{j}\varepsilon_{t-j}+\theta\tilde{X}_{t,l}, $$

where \(\tilde{X}_{t,l}:=\theta^{-1}\sum_{j=l}^{\infty}a_{j}\varepsilon_{t-j}\). The random variables \(\tilde{X}_{t,l}\) (\(t\in\mathbb{N}\)) are standard normal. Applying Lemma 4.21, the function

$$g(\tilde{X}_{t,l}):=E \bigl[ G(X_{t})|\mathcal{F}_{t-l} \bigr] -E \bigl[ G(X_{1}) \bigr] $$

has finite variance. Therefore, the convergence of the second part R n (u) follows from Theorem 4.4. □

4.3.6 Quadratic LARCH Models

We recall (cf. (2.58)) that the quadratic LARCH(∞) (or LARCH+) process is the unique solution of

$$ X_{t}=b_{0}\eta_{t}+\xi_{t}\sum_{j=1}^{\infty}b_{j}X_{t-j}, $$
(4.110)

where (η t ,ξ t ) (\(t\in\mathbb{Z}\)) is a sequence of i.i.d. random vectors. We assume that b j c b j d−1 (d∈(0,1/2)) and that the random variables η t are heavy tailed in the sense that

$$P\bigl(|\eta_{1}|>x\bigr)\sim Ax^{-\alpha}$$

for some α∈(2,4). In other words, \(E(\eta_{1}^{2})<\infty\), but \(E(\eta_{1}^{4})=\infty\). Furthermore, we assume that \(E(\xi_{1}^{4}+\xi_{1}^{2}\eta_{1}^{2})<\infty\). Surgailis (2008) considers convergence of the sum of the squares and proves that under appropriate technical assumptions we have a dichotomous behaviour as in case of the stochastic volatility model (cf. Theorem 4.19) or the subordinated Gaussian sequence with heavy tails (cf. Theorem 4.21): if \(d+\frac{1}{2}<2/\alpha\), then

$$n^{-2/\alpha}\sum_{t=1}^{[nu]} \bigl(X_{t}^{2}-E\bigl(X_{1}^{2}\bigr) \bigr) $$

converges in a finite-dimensional sense to a Lévy process. Otherwise, if \(d+\frac{1}{2}>2/\alpha\), then

$$n^{-(d+\frac{1}{2})}\sum_{t=1}^{[nu]} \bigl(X_{t}^{2}-E\bigl(X_{1}^{2}\bigr) \bigr) $$

converges to a fractional Brownian motion.

Also, if α∈(1,2), then \(n^{-1/\alpha}\sum_{t=1}^{n}X_{t}\) converges to a stable limit. As in the case of LMSV processes (see Sect. 4.3.4), this can be concluded from a general theory by Surgailis (2008).

4.3.7 Summary of Limit Theorems for Partial Sums

We summarize the main limit theorems. We consider a centred linear process \(X_{t}=\sum_{j=0}^{\infty}a_{j}\varepsilon_{t-j}\) such that, as x→∞,

$$P(\varepsilon_{1}>x)\sim A\frac{1+\beta}{2}{x^{-\alpha}},\qquad P(\varepsilon _{1}<-x)\sim A\frac{1-\beta}{2}x^{-\alpha}$$

with α∈(1,2), and we assume that appropriate regularity conditions (which assure the existence of the process) hold. When the sum of the squares \(X_{t}^{2}\) is considered, we assume instead that α is in the range α∈(2,4).

Another class of processes considered above are stochastic volatility models with infinite second moments. As a representative, we look at \(X_{t}=\xi_{t} \exp( \sum_{j=1}^{\infty}a_{j}\varepsilon_{t-j}) \), where the sequences ξ t and ε t are mutually independent. We assume that

$$P(\xi_{1}>x)\sim A\frac{1+\beta}{2}{x^{-\alpha}},\qquad P(\xi _{1}<-x)\sim A\frac{1-\beta}{2}x^{-\alpha} $$

with α∈(1,2) and E[ξ 1]=0. Again, if the sum of \(X_{t}^{2}\) is considered, then this tail behaviour is assumed to hold for α∈(2,4). Furthermore, the random variables ε t are assumed to be standard normal. We use the notation B(⋅) for a Brownian motion on [0,1], B H (⋅) denotes a fractional Brownian motion on [0,1], Z 2,H (⋅) is the Hermite–Rosenblatt process on [0,1], and \(\tilde {Z}_{H,\alpha}\) is a linear fractional stable motion with Hurst parameter H=d+1/α. Furthermore, c is a generic constant. We summarize the results for partial sums in Table 4.2. For simplicity, the slowly varying functions are assumed to be constant.

Table 4.2 Limits for partial sums with infinite moments

4.4 Limit Theorems for Sample Covariances

In a preliminary analysis of a time series, sample autocovariances play a crucial role. Moreover, limit theorems for quadratic forms can often be deduced from those for sample covariances. In this section we therefore study the limiting behaviour of sample covariances and, more generally, of multivariate functions applied to long-memory sequences. Surprisingly, this theory is not well developed beyond Gaussian (Rosenblatt 1979; Ho and Sun 1987, 1990; Arcones 1994) and linear processes with finite (Hosking 1996; Horváth and Kokoszka 2008) and infinite moments (Kokoszka and Taqqu 1996; Horváth and Kokoszka 2008). Some recent results were developed for stochastic volatility models (Davis and Mikosch 2001; McElroy and Politis 2007; Kulik and Soulier 2012).

4.4.1 Gaussian Sequences

In what follows, all vectors are considered as column vectors. Consider a stationary centred sequence of Gaussian vectors

$$\mathbf{X}_{t}= \bigl( X_{t}^{(1)}, \ldots,X_{t}^{(q)} \bigr)^{T}\quad (t\in\mathbb{Z}) $$

with the marginal covariance matrix Σ and autocovariance function \(\gamma_{i,j}(k)=E[ X_{0}^{(i)}X_{k}^{(j)}] \) (i,j=1,…,q), and assume either

$$ \sum_{k=-\infty}^{\infty}\big|\gamma_{i,j}(k)\big|<\infty $$
(4.111)

or the existence of a parameter d∈(0,1/2) and a slowly varying function L γ such that

$$ \gamma_{i,j}(k)\sim a_{i,j}k^{2d-1}L_{\gamma}(k)\quad (i,j=1,2,\ldots,q), $$
(4.112)

where the constants a i,j are not all equal to zero. We will then use the same notation γ(k)=k 2d−1 L γ (k) as in the univariate case.

Example 4.18

Let q=2 and assume that \(\tilde{X}_{t}^{(1)}\) (\(t\in\mathbb{N}\)) and \(\tilde{X}_{t}^{(2)}\) (\(t\in\mathbb{N}\)) are mutually independent long-memory standard Gaussian sequences with the same covariances \(\gamma_{X}(k)=\gamma_{\tilde{X}}(k)=\gamma(k)\). Then (4.112) holds with a 1,1=a 2,2=1 and a 1,2=a 2,1=0.

Example 4.19

Let X t (\(t\in\mathbb{N}\)) be a stationary standard Gaussian sequence with covariance γ X (k)=c γ k 2d−1. Fix s>0, and let

$$\bigl( X_{t}^{(1)},X_{t}^{(2)} \bigr)^{T}= ( X_{t},X_{t+s} )^{T}\quad(t\in \mathbb{N}). $$

Then

$$\gamma_{1,1}(k)=\gamma_{2,2}(k)={E}[X_{0}X_{k}]=\gamma_{X}(k), $$

so that a 1,1=a 2,2=1. Furthermore,

$$\gamma_{1,2}(k)={E}[X_{0}X_{s+k}]=\gamma_{X}(k+s)\sim\gamma_{X}(k) $$

as k→∞, so that a 1,2=1. Similarly, a 2,1=1.

Example 4.20

Assume that \(\tilde{X}_{t}^{(1)}\) and \(\tilde{X}_{t}^{(2)}\) (\(t\in\mathbb{N}\)) are as in Example 4.18. Fix s>0, and let

$$\bigl( X_{t}^{(1)},X_{t}^{(2)} \bigr)^{T}= \bigl( \tilde{X}_{t}^{(1)},\rho \tilde{X}_{t}^{(1)}+\sqrt{1-\rho^{2}} \tilde{X}_{t}^{(2)} \bigr)^{T}, $$

where ρ=γ X (s). Note that for a fixed t, the vectors \(( X_{t}^{(1)},X_{t}^{(2)}) ^{T}\)in Example 4.19 and here have the same covariance matrix. Now, a 1,1=a 2,2=1, whereas

$$\gamma_{1,2}(k)=\rho\gamma_{X}(k), $$

so that a 1,2=ρ. Similarly, a 2,1=ρ.
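To verify the claim that this construction and the one in Example 4.19 share the same marginal covariance matrix, it suffices to compare the covariances at a fixed t:

$$E\bigl[X_{t}^{(1)}X_{t}^{(2)}\bigr]=E\bigl[\tilde{X}_{t}^{(1)}\bigl(\rho\tilde{X}_{t}^{(1)}+\sqrt{1-\rho^{2}}\tilde{X}_{t}^{(2)}\bigr)\bigr]=\rho=\gamma_{X}(s)=E[X_{t}X_{t+s}], $$

where the last two expressions refer to Example 4.19, and both components have unit variance in either construction.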

After explaining basic structures of dependent Gaussian vectors, we turn our attention to limit theorems. It turns out that limit theorems for multivariate Gaussian vectors can be reduced to the case where the vectors have the identity covariance matrix I q . Therefore, we start with the case of independent components.

4.4.1.1 Independent Components

Consider the collection \(\{\tilde{X}_{t}^{(l)},l\in\mathbb{N},t\in\mathbb {N}\}\) of long-memory Gaussian sequences. For any \(l\not=k\), the sequences \(\tilde{X}_{t}^{(l)}\) and \(\tilde{X}_{t}^{(k)}\) (\(t\in\mathbb{N}\)) are assumed to be independent. Recall the following notation from Sect. 4.2.3 (see also Sect. 4.1.3). Assume for a moment that \(X_{t}=\sum_{j=0}^{\infty}a_{j}\varepsilon_{t-j}\) is a Gaussian process, where ε t (\(t\in\mathbb{Z}\)) are i.i.d. standard normal random variables. Consider the following random measures: M ε (⋅) is a Gaussian random measure with independent increments, associated with the sequence ε t , that is, \(E[|dM_{\varepsilon}(\lambda)|^{2}]=\sigma_{\varepsilon}^{2}/(2\pi)\,d\lambda\), \(dM_{0}(\lambda)=\sqrt{2\pi}\,dM_{\varepsilon}(\lambda)\), and

$$dM_{X}(\lambda)= \Biggl( \sum_{j=0}^{\infty}a_{j} e^{-ij\lambda} \Biggr) \,dM_{\varepsilon}(\lambda)=A\bigl(e^{-i\lambda} \bigr)\,dM_{\varepsilon}(\lambda)= a(\lambda)\,dM_{0}(\lambda) $$

is the spectral random measure associated with a sequence X t (\(t\in\mathbb{N}\)). Recall further that n 1/2 M 0(n −1 A) is another Gaussian random measure with the same distribution as M 0(A). Then

$$\frac{L_{f}^{1/2}((n\lambda)^{-1})}{L_{f}^{1/2}(n^{-1})}|\lambda|^{-d}n^{1/2}\,dM_{0}\bigl(n^{-1}\lambda\bigr) $$

converges vaguely to \(dW_{X}(\lambda):=|\lambda|^{-d}\,dM_{0}(\lambda)\).

As in Sect. 4.2.3, we can represent the Gaussian sequences \(\tilde{X}_{t}^{(l)}\) (\(t\in\mathbb {N}\)) as (cf. (4.28))

$$\tilde{X}_{t}^{(l)}=\int_{-\pi}^{\pi} e^{it\lambda}\,d M_{\tilde{X}^{(l)}} (\lambda)\quad(t\ge1), $$

where

$$dM_{\tilde{X}^{(l)}}(\lambda)=a^{(l)}(\lambda)\,dM_{0}^{(l)}(\lambda) , $$

and \(M_{0}^{(l)}(\cdot)\) (l≥1) are independent Gaussian random measures. Furthermore, \(|a^{(l)}(\lambda)|^{2}=f_{\tilde{X}^{(l)}}(\lambda)\), where \(f_{(l)}=f_{\tilde{X}^{(l)}}\) is the spectral density associated with the sequence \(\tilde{X}_{t}^{(l)}\) (\(t\in\mathbb{N}\)). Also, \(n^{1/2} M_{0}^{(l)}(n^{-1}A)\overset{\mathrm{d}}{=}M_{0}(A)\), and

$$ \frac{L_{f_{(l)}}^{1/2}((n\lambda)^{-1})}{L_{f_{(l)}}^{1/2}(n^{-1})} |\lambda|^{-d} n^{1/2} \,dM_{0}^{(l)}\bigl(n^{-1}\lambda\bigr) $$
(4.113)

converges vaguely to a measure \(dW_{\tilde{X}^{(l)}}(\lambda)=|\lambda|^{-d}\,dM_{0}^{(l)}(\lambda)\).

As in the alternate proof of Theorem 4.2 (see also the proof of Theorem 4.3), we may write

with

$$D_{n}(\lambda)=\frac{e^{i\lambda n}-1}{n(e^{i\lambda}-1)} 1\bigl\{|\lambda|<\pi n\bigr\}. $$

The functions above converge to

$$D(\lambda)= \frac{e^{i\lambda}-1}{i\lambda}. $$

Thus, if

$$a^{(l)}(\lambda)=a_{l,l}L_{f}^{1/2}\bigl( \lambda^{-1}\bigr)|\lambda|^{-d} \quad (l=1,2), $$

then we may conclude that for d∈(1/4,1/2),

This convergence can be extended to nonlinear functionals. The following theorem is adapted from Arcones (1994). For simplicity, we assume that all a l,l in (4.112) are one. (Recall from Example 4.18 that the terms a i,l , \(i\not =l\), vanish.)

Theorem 4.22

Let \(\tilde{X}_{t}=(\tilde{X}_{t}^{(1)},\ldots, \tilde{X}_{t}^{(q)})^{T}\) (\(t\in\mathbb{N}\)), be a stationary sequence of centred Gaussian vectors with the marginal covariance matrix I q , such that (4.112) holds. Let \(G:\mathbb{R}^{q}\to\mathbb{R}\) be a function with the Hermite rank \(m=\tilde{m}(G)\). If m(1−2d)>1, then

where

$$\tilde{Z}_{(r_{1},\ldots,r_{ m}),H}(1)=\int_{\mathbb{R}^{ m}} D(\lambda_{1}+ \cdots+\lambda_{ m})\prod_{l=1}^{ m} \frac{1}{|\lambda_{l}|^{r_{l}}} \,dM_{0}^{(r_{1})}(\lambda_{1})\cdots dM_{0}^{(r_{ m})}(\lambda_{ m}), $$

\(\int_{\mathbb{R}^{ m}}\) is the m-fold multiple Wiener–Ito integral, and

$$\tilde{c}_{r_{1},\ldots,r_{ m}}=\frac{1}{ m!}E \Biggl[ G({\tilde{X}}_{1})\prod_{l=1}^{q} H_{k_{l}(r_{1},\ldots,r_{ m})}\bigl(\tilde{X}^{(l)}_{1}\bigr) \Biggr] , $$

where \(k_{l}(r_{1},\ldots,r_{m})\) is the number of components among r 1,…,r m that are equal to l.

Again, as in (4.33), the limiting random variable \(\tilde{Z}_{(r_{1},\ldots,r_{ m}),H}(1)\) can be expressed as

$$ \int_{\mathbb{R}^{ m}} \frac{e^{iu(\lambda_{1}+\cdots+\lambda_{ m})}-1}{i(\lambda_{1}+\cdots +\lambda_{ m})} \,dW_{\tilde{X}^{(r_{1})}}(\lambda_{1})\cdots dW_{\tilde{X}^{(r_{ m})}}(\lambda_{ m}), $$
(4.114)

where \(dW_{\tilde{X}^{(r)}}(\lambda)=|\lambda|^{-d}\,dM_{0}^{(r)}(\lambda)\).

Example 4.21

Consider G(y 1,y 2)=H 2(y 1)H 2(y 2). Then (see Example 3.8) its Hermite rank with respect to a vector \(\tilde{X}_{1}=(\tilde{X}_{1}^{(1)},\tilde{X}_{1}^{(2)})^{T}\) of independent standard normal random variables is m(G)=4. Then

$$c_{1,1,2,2}=\frac{1}{4!}E \bigl[ G({\tilde{X}}_{1})H_{2} \bigl(\tilde{X}_{1}^{(1)}\bigr)H_{2}\bigl( \tilde{X}_{1}^{(2)}\bigr) \bigr] =\frac{1}{4!}\tilde{J} \bigl(G,(2,2)\bigr) =\frac{4}{4!} . $$

Also, this computation is invariant under permutation of indices (1,1,2,2). All other coefficients \(c_{r_{1},r_{2},r_{3},r_{4}}\) vanish. Note that k(1,1,2,2)=2 for l=1,2. Thus,

$$n^{-(1-4(1/2-d))}L_{f}^{-4/2}\bigl(n^{-1}\bigr)\sum _{t=1}^{n}H_{2}\bigl( \tilde{X}_{t}^{(1)}\bigr)H_{2}\bigl( \tilde{X}_{t}^{(2)}\bigr) $$

converges in distribution to

$$\frac{6\times4}{4!} \int_{\mathbb{R}^{4}} \frac{e^{iu(\lambda_{1}+\cdots+\lambda_{4})}-1}{i(\lambda_{1}+\cdots+\lambda_{4})} \,dW_{\tilde{X}^{(1)}}(\lambda_{1})\,dW_{\tilde{X}^{(1)}}(\lambda_{2})\,dW_{\tilde{X}^{(2)}}(\lambda_{3})\,dW_{\tilde{X}^{(2)}}(\lambda_{4}). $$

This can be also seen by expanding

$$\sum_{t=1}^{n}H_{2}\bigl( \tilde{X}_{t}^{(1)}\bigr)H_{2}\bigl( \tilde{X}_{t}^{(2)}\bigr) $$

and using a representation for H m (X t ), see the proof of Theorem 4.3. The convergence is valid for d∈(1/4,1/2).
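
The value \(\tilde{J}(G,(2,2))=4\) used in this example can be confirmed by a quick Monte Carlo experiment. The sketch below is illustrative only (arbitrary sample size) and uses the probabilists' Hermite polynomial H 2(x)=x²−1.

```python
import numpy as np

rng = np.random.default_rng(1)
z1, z2 = rng.standard_normal((2, 2_000_000))   # independent standard normal components

H2 = lambda x: x ** 2 - 1                      # probabilists' Hermite polynomial of order 2

G = H2(z1) * H2(z2)                            # G(X^{(1)}, X^{(2)}) = H_2(X^{(1)}) H_2(X^{(2)})
# E[G * H_2(X^{(1)}) H_2(X^{(2)})] = E[H_2(Z)^2]^2 = (2!)^2 = 4
print(np.mean(G * H2(z1) * H2(z2)))            # approx. 4
print(np.mean(G * z1), np.mean(G * z2))        # first-order projections vanish (Hermite rank 4)
```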

Example 4.22

Let G(y)=H m (y). Then one can see that Z m,H (1) in Theorem 4.22 is exactly the Hermite–Rosenblatt random variable.

4.4.1.2 From Independent to Dependent Components

In general, let \(X_{t}=(X_{t}^{(1)},\ldots,X_{t}^{(q)})^{T}\) (\(t\in \mathbb{N}\)) be a long-memory Gaussian sequence with cross-autocovariance function \(\gamma_{i,j}(k)=E( X_{0}^{(i)}X_{k}^{(j)}) \) as in (4.112) and marginal covariance matrix Σ. Then the statement of Theorem 4.22 remains valid if we replace \(m=\tilde{m}(G)\) by m=m(G,X 1), where m(G,X 1) is the Hermite rank of G with respect to the Gaussian vector X 1; the spectral measures \(W_{\tilde{X}^{(r_{l})}}\) are replaced by the so-called joint spectral measure

$$\bigl(dW_{X^{(1)}}(\lambda_{1}),\ldots,dW_{X^{(q)}}( \lambda_{q})\bigr), $$

and

$$c_{r_{1},\ldots,r_{m}}=\frac{1}{m!}E \Biggl[ G(X_{1}) \prod _{l=1}^{q} H_{k(r_{1},\ldots,r_{m})}\bigl(X_{1}^{(l)} \bigr) \Biggr] . $$

We do not provide details here; the reader is referred to Arcones (1994). However, we will consider the special case of the covariance matrix Σ since this leads to study of sample covariances.

Example 4.23

Recall Example 3.13. We consider the function

$$G(X_{t},X_{t+s})=e^{pX_{t}}e^{pX_{t+s}}. $$

Then the Hermite rank is one. Thus, we have to evaluate \(c_{r_{1}}\), r 1=1,2. We compute

$$c_{1}={E}\bigl[G(X_{t},X_{t+s})X_{t} \bigr]=p\bigl(1+\gamma_{X}(s)\bigr)e^{p^{2}(1+\gamma_{X} (s))}. $$

Also, c 2=E[G(X t ,X t+s )X t+s ]=c 1. Thus,

$$n^{-(d+1/2)}L_{f}^{-1/2}\bigl(n^{-1}\bigr)\sum_{t=1}^{n} \bigl( e^{pX_{t}}e^{pX_{t+s}}-E\bigl[e^{pX_{1}}e^{pX_{1+s}}\bigr] \bigr) \overset{d}{\rightarrow}2c_{1}\int D(\lambda)\,dW_{X}(\lambda), $$

where W X is the spectral random measure associated with X t (\(t\in\mathbb{N}\)), see (4.34).
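
The coefficient c 1 computed above is easy to verify by simulation. The following sketch (illustrative only, with arbitrary choices of p and of ρ=γ X (s)) draws a bivariate standard normal pair with correlation ρ and compares the Monte Carlo average of \(e^{p(X_{t}+X_{t+s})}X_{t}\) with \(p(1+\rho)e^{p^{2}(1+\rho)}\).

```python
import numpy as np

rng = np.random.default_rng(2)
p, rho = 0.4, 0.5                    # p in G, rho = gamma_X(s); illustrative values
n = 2_000_000

z1, z2 = rng.standard_normal((2, n))
x = z1
y = rho * z1 + np.sqrt(1 - rho ** 2) * z2   # (X_t, X_{t+s}) with correlation rho

mc = np.mean(np.exp(p * (x + y)) * x)       # Monte Carlo estimate of E[G(X_t, X_{t+s}) X_t]
exact = p * (1 + rho) * np.exp(p ** 2 * (1 + rho))
print(mc, exact)                            # the two numbers should be close
```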

4.4.1.3 From Independent to Dependent Components: Sample Covariances

We go back to the original problem of sample covariances. Our vectors \(X_{t}=(X_{t}^{(1)},X_{t}^{(2)})^{T}\) are as in Example 4.19:

$$\bigl( X_{t}^{(1)},X_{t}^{(2)} \bigr)^{T}= ( X_{t},X_{t+s} )^{T}\quad(t\in \mathbb{N}). $$

We write

Recall now the proof of Theorems 4.2 and 4.3. Like in the proof of Theorem 4.3

(4.115)

Note that, as n→∞, \(e^{is\lambda_{2}/n}\rightarrow1\). Therefore, omitting technical details, the limiting behaviour of

$$ n^{-2d}L_f^{-1}\bigl(n^{-1}\bigr)\sum _{t=0}^{n-1} \bigl(X_tX_{t+s}-E(X_tX_{t+s}) \bigr) $$

or, equivalently, of

$$ n^{-2d}L_2^{-1/2}\bigl(n^{-1}\bigr)\sum _{t=0}^{n-1} \bigl(X_tX_{t+s}-E(X_tX_{t+s}) \bigr) $$

is the same as that of \(n^{-2d}L_{2}^{-1/2}(n^{-1})\sum_{t=0}^{n-1}(X_{t}^{2}-E(X_{1}^{2}))\), i.e. it does not involve s. Hence, using Theorem 4.3 with m=2, one can argue that for d∈(1/4,1/2),

(4.116)

where

$$\hat{\gamma}_n(s)=\frac{1}{n}\sum_{t=0}^{n-s}X_tX_{t+s} \quad (s=1,\ldots,K) $$

is the sample covariance at lag s and H=d+1/2. Thus, the limiting random vector has totally dependent components.
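
The complete dependence of the limiting vector in (4.116) can be seen in a small simulation: across Monte Carlo replicates, the sample covariances at different lags are almost perfectly correlated when d is close to 1/2. The sketch below is illustrative only (truncated moving-average generator, arbitrary parameters).

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, trunc, nrep = 0.4, 2_000, 2_000, 200
lags = [0, 1, 5]

a = np.arange(1, trunc + 1) ** (d - 1.0)
a /= np.sqrt(np.sum(a ** 2))

def sample_cov(x, s):
    return np.mean(x[: len(x) - s] * x[s:]) if s > 0 else np.mean(x * x)

gamma_hat = np.empty((nrep, len(lags)))
for r in range(nrep):
    eps = rng.standard_normal(n + trunc)
    x = np.convolve(eps, a, mode="valid")[:n]          # approximate long-memory Gaussian series
    gamma_hat[r] = [sample_cov(x, s) for s in lags]

# Correlations across replicates of gamma_hat(s) for different lags: close to 1 for d near 1/2
print(np.corrcoef(gamma_hat.T).round(3))
```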

We extend this to arbitrary Hermite polynomials. Recall Example 3.15. One can derive the equation (see Lemma 3.4 in Fox and Taqqu 1985)

$$ H_{m}(X_{t})H_{m}(X_{t+s})=m! \gamma_{X}^{m}(s)+\sum_{r=1}^{m}(m-r)! \binom {m}{r}^{2}\gamma_{X}^{m-r}(s)K_{r}(t,{t+s}), $$
(4.117)

where

$$K_r(j,l)=\int_{-\pi}^{\pi}\!\!\cdots\!\! \int_{-\pi}^{\pi}e^{ij(\lambda _1+\cdots+\lambda_r)+i l(\lambda_{r+1}+\cdots+\lambda_{2r})}\prod _{p=1}^{2r}a(\lambda_p)\,dM_0( \lambda_1)\cdots dM_0(\lambda_{2r}). $$

For m=1, the formula reduces to the formula for X t X t+s , used in deriving (4.115). For m=2, the formula yields

$$H_{2}(X_{t})H_{2}(X_{t+s})=2\gamma_{X}^{2}(s)+4\gamma_{X}(s)K_{1}(t,{t+s})+K_{2}(t,{t+s}). $$

The important feature of decomposition (4.117) is that under the condition d∈(1/4,1/2) only the term with r=1 will contribute. In other words, the limiting behaviour of

$$\hat{\gamma}_{n}(s;H_{m}):=\frac{1}{n}\sum_{t=1}^{n-s}H_{m}(X_{t})H_{m}(X_{t+s}) $$

is up to a constant the same for each m≥1. Noting that \((m-1)!\binom{m}{1}^{2}=m!m\) and using (4.117), we have for d∈(1/4,1/2),

(4.118)

where H=d+1/2.

4.4.2 Linear Processes with Finite Moments

In this section we consider second-order stationary linear processes \(X_{t}= \sum_{j=0}^{\infty}a_{j}\varepsilon_{t-j}\) (\(t\in\mathbb{N}\)), where ε t (\(t\in\mathbb{Z}\)) are i.i.d. random variables such that E(ε 1)=0, \(E(\varepsilon_{1}^{2})=\sigma_{\varepsilon}^{2}=1\) and \(E(\varepsilon_{1}^{4})=\eta<\infty\).

Let

$$\hat{\gamma}_{n}(s)=\frac{1}{n}\sum_{t=0}^{n-s}X_{t}X_{t+s}. $$

It converges in probability to the population covariance

$$\gamma_{X}(s)=E(X_{0}X_{s})=\sigma_{\varepsilon}^{2}\sum_{j=0}^{\infty}a_{j}a_{j+s}. $$

Classical results for weakly dependent sequences under \({E}(\varepsilon_{1}^{4})<\infty\) were obtained in Anderson (1971, p. 478); see also Brockwell and Davis (1991, Proposition 7.3.3). For long-memory linear processes, they were obtained in Hosking (1996) and Horváth and Kokoszka (2008).

Theorem 4.23

Let \(X_{t}=\sum _{j=0}^{\infty}a_{j}\varepsilon_{t-j}\) (\(t\in\mathbb{N}\)) be a linear process such that E(ε 1)=0, \({E}(\varepsilon_{1}^{2})=\sigma _{\varepsilon}^{2}=1\) and \({E}(\varepsilon_{1}^{4})=\eta<\infty\). Furthermore, assume that \(\sum_{j=0}^{\infty}a_{j}^{2}=1\).

  1. (a)

    If \(a_{j}\sim L_{a}(j)j^{d-1}\), d∈(0,1/4), or \(\sum _{j=0}^{\infty}|a_{j}|<\infty\), then

    $$n^{1/2} \bigl(\hat{\gamma}_{n}(s) - \gamma_{X}(s) \bigr)\overset{{d}}{\to} N\bigl(0,\nu^{2}\bigr) , $$

    where the variance is

    $$\nu^{2}=(\eta-3)\gamma_{X}^{2}(s)+\sum _{k=-\infty}^{\infty} \bigl( \gamma_{X}^{2}(k)+ \gamma_{X}(k+s)\gamma_{X}(k-s) \bigr) . $$
  2. (b)

    If \(a_{j}\sim L_{a}(j)j^{d-1}\) and d∈(1/4,1/2), then

    $$n^{1-2d}L_{2}^{-1/2}(n) \bigl(\hat{\gamma}_{n}(s)- \gamma_{X}(s)\bigr)\overset{{d}}{\rightarrow}Z_{2,H}(1), $$

    where Z 2,H (u) is a Hermite–Rosenblatt process, \(L_{2}(n)=2C_{2}L_{\gamma}^{2}(n)\),

    $$C_{2}= \bigl[ \bigl(2(2d-1)+1\bigr) (2d+1) \bigr]^{-1}, $$

    and L γ (n) is given in (4.39).

This theorem can be formulated in a multivariate setup. In the first case the limiting distribution is multivariate normal (with dependent components):

$$ n^{1/2}\bigl(\hat{\gamma}_{n}(0)-\gamma_{X}(0), \ldots,\hat{\gamma }_{n}(q)-\gamma_{X}(q)\bigr)\overset{{d}}{ \rightarrow}(G_{0},\ldots,G_{q}), $$
(4.119)

where (G 0,…,G q ) is a zero-mean Gaussian vector with covariance

$$ E[G_{s}G_{t}]=(\eta-3)\gamma_{X}(s) \gamma_{X}(t)+\sum_{k=-\infty }^{\infty } \bigl( \gamma_{X}(k)\gamma_{X}(k+s-t)+\gamma_{X}(k+s) \gamma_{X}(k-t) \bigr) . $$
(4.120)

In the second case, d∈(1/4,1/2), the limit has the form (Z 2,H (1),…,Z 2,H (1)).
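
For a short-memory example, the limiting covariance (4.120) can be evaluated directly and compared with a simulation. The sketch below does this for an MA(1) process X t =ε t +θε t−1 with standard normal innovations (so η=3); all numerical choices are illustrative.

```python
import numpy as np

theta, eta = 0.5, 3.0          # MA(1) coefficient; eta = E[eps^4] = 3 for N(0, 1) innovations

def gamma(k):
    k = abs(k)
    return 1 + theta ** 2 if k == 0 else (theta if k == 1 else 0.0)

def limit_cov(s, t, K=10):
    """Right-hand side of (4.120), with the sum truncated at |k| <= K (exact here, gamma has finite support)."""
    acf_part = sum(gamma(k) * gamma(k + s - t) + gamma(k + s) * gamma(k - t)
                   for k in range(-K, K + 1))
    return (eta - 3) * gamma(s) * gamma(t) + acf_part

# Monte Carlo check of n * cov(gamma_hat(s), gamma_hat(t))
rng = np.random.default_rng(4)
n, nrep, lags = 5_000, 2_000, [0, 1, 2]
g_hat = np.empty((nrep, len(lags)))
for r in range(nrep):
    eps = rng.standard_normal(n + 1)
    x = eps[1:] + theta * eps[:-1]
    g_hat[r] = [np.mean(x[: n - s] * x[s:]) if s else np.mean(x * x) for s in lags]

print("theory:", [[round(limit_cov(s, t), 3) for t in lags] for s in lags])
print("MC    :", (n * np.cov(g_hat.T)).round(3))
```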

Proof

For part (a), we use the standard truncation argument as illustrated in the proof of Theorem 4.5. Let

$$X_{t,K}=\sum_{j=0}^{K}a_{j}\varepsilon_{t-j},\qquad \hat{\gamma}_{n}^{(K)}(s)=\frac{1}{n}\sum_{t=0}^{n-s}X_{t,K}X_{t+s,K},\qquad \gamma_{X}^{(K)}(s)=E[X_{0,K}X_{s,K}]. $$

First, since the sequence X t,K X t+s,K is (K+s)-dependent, its convergence is described by

$$n^{1/2} \bigl( \hat{\gamma}_{n}^{(K)}(s)- \gamma_{X}^{(K)}(s) \bigr) \overset{{d}}{\rightarrow}N\bigl(0, \nu_{K}^{2}\bigr), $$

where

$$\nu_{K}^{2}=(\eta-3) \bigl( \gamma_{X}^{(K)}(s) \bigr)^{2}+\sum_{k=-\infty }^{\infty} \bigl[ \bigl(\gamma_{X}^{(K)}(k)\bigr)^{2}+ \gamma_{X}^{(K)}(k+s)\gamma_{X}^{(K)}(k-s)\bigr] . $$

Since \(\nu_{K}^{2}\rightarrow\nu^{2}\) as K→∞, we also have \(N(0,\nu_{K}^{2})\overset{{d}}{\rightarrow}N(0,\nu^{2})\). It suffices to verify that for all δ>0,

$$\lim_{K\rightarrow\infty}\lim\sup_{n\rightarrow\infty}P \bigl( \bigl \vert n^{1/2} \bigl(\hat{\gamma}_{n}^{(K)}(s)-\gamma_{X}^{(K)}(s) \bigr)-n^{1/2}\bigl(\hat {\gamma }_{n}(s)-\gamma_{X}(s) \bigr)\bigr \vert >\delta \bigr) =0. $$

By Markov’s inequality, to do this, it suffices to verify that

$$\lim_{K\rightarrow\infty}\lim_{n\rightarrow\infty}n\cdot \operatorname {var}\bigl(\hat{\gamma}_{n}^{(K)}(s)-\hat{\gamma}_{n}(s)\bigr)=0. $$

In the case of Theorem 4.5 this was handled by introducing the random variable \(\bar{X}_{t,K}=X_{t}-X_{t,K}\). In our situation here this is not straightforward since

$$\sum_{j,j^{\prime}=0}^{\infty}a_{j}a_{j+s}-\sum_{j,j^{\prime}=0}^{K}a_{j}a_{j+s}\not=\sum_{j,j^{\prime}=K+1}^{\infty}a_{j}a_{j+s}. $$

We have to verify that

We prove the first part only. The expression is

$$\sum_{k=-(n-1)}^{n-1} \biggl( 1-\frac{|k|}{n} \biggr) \Biggl[ (\eta -3)\sigma_{\varepsilon}^{2}\sum _{j=0}^{\infty}a_{j}a_{j+s}a_{j+k}a_{j+k+s}+ \gamma_{X}^{2}(k)+\gamma_{X}(k+s)\gamma_{X}(k-s) \Biggr] . $$

Then the relation follows by the dominated convergence theorem. For this, one needs, in particular, \(\sum_{k}\gamma_{X}^{2}(k)<\infty\), which is achieved if d∈(0,1/4) or \(\sum_{j=0}^{\infty}|a_{j}|<\infty\).

As for part (b), we use the following decomposition:

$$\hat{\gamma}_{n}(s)-\gamma_{X}(s)=M_{n}+R_{n}. $$

We may write the first part as \(M_{n}=n^{-1}\sum_{t=1}^{n}Y_{t}\), where Y t (\(t\in\mathbb{N}\)) is the linear process \(Y_{t}=\sum_{j=0}^{\infty }c_{j}(\varepsilon_{t-j}-\sigma_{\varepsilon}^{2})\) with summable coefficients c j =a j a j+s . Indeed, by the Cauchy–Schwarz inequality,

$$\sum|c_{j}|\leq \Bigl( \sum a_{j}^{2} \Bigr)^{1/2} \Bigl( \sum a_{j+s}^{2} \Bigr)^{1/2}<\infty. $$

Thus, n 1/2 M n converges to a normal distribution on account of Theorem 4.5.

As for the second part, we may recognize that it has almost the same form as the term U n,2 in (4.51), so that its limiting distribution is of Hermite–Rosenblatt type. If d∈(1/4,1/2), then

$$n^{1-2d}L_{2}^{-1/2}(n)R_{n}\overset{{d}}{\rightarrow}Z_{2,H}(1). $$

Thus, the second part dominates if d∈(1/4,1/2).

Note that formally the limit in part (b) may depend on s. However, this is not the case; a precise computation is given in Horváth and Kokoszka (2008). □

4.4.3 Linear Processes with Infinite Moments

Here we consider the same linear processes as in Sect. 4.4.2; however, instead of assuming \({E}[\varepsilon_{1}^{4}]<\infty\), we impose the following regular variation condition:

$$ P(\varepsilon_{1}>x)\sim A\frac{1+\beta}{2}x^{-\alpha},\qquad P(\varepsilon _{1}<-x)\sim A\frac{1-\beta}{2}x^{-\alpha}, $$
(4.121)

where A>0, β∈[−1,1] and α∈(1,4). In particular, E[|ε 1|]<∞, \({E}[\varepsilon_{1}^{4}]=+\infty \).
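
Innovations satisfying (4.121) are easy to generate, for instance as two-sided Pareto variables. The sketch below is illustrative only (arbitrary A, β, α): it draws such innovations, centres them, and checks the tail balance as well as the scaling constant c n with P(|ε 1 |>c n )∼n −1, i.e. c n ≈(An)^{1/α}, which reappears in the proofs below.

```python
import numpy as np

rng = np.random.default_rng(5)
A, beta, alpha = 1.0, 0.3, 1.5             # illustrative choices, alpha in (1, 2)
n = 2_000_000

u = rng.uniform(size=n)
sign = np.where(rng.uniform(size=n) < (1 + beta) / 2, 1.0, -1.0)
eps = sign * (A / u) ** (1 / alpha)        # P(|eps| > x) = A x^{-alpha} for x >= A^{1/alpha}
eps -= beta * A ** (1 / alpha) * alpha / (alpha - 1)   # centre so that E[eps] = 0

# tail balance: P(eps > x) / P(eps < -x) is roughly (1 + beta)/(1 - beta) for large x
x = 50.0
print(np.mean(eps > x) / np.mean(eps < -x), (1 + beta) / (1 - beta))

# c_m with P(|eps_1| > c_m) ~ 1/m, compared with (A m)^{1/alpha}
m = 10_000
print(np.quantile(np.abs(eps), 1 - 1 / m), (A * m) ** (1 / alpha))
```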

There is a vast literature on sample covariances for weakly dependent linear processes with regularly varying innovations. Kanter and Steiger (1974) considered AR(p) models, Davis and Resnick (1985, 1986) considered processes with infinite variance and with finite variance, but infinite fourth moment, respectively. In the latter papers, the authors used point process techniques, as described in the section on partial sums with infinite moments; see Sect. 4.3. This technique was successfully applied to bilinear processes with infinite moments (Davis and Resnick 1996; Basrak et al. 1999) and to GARCH models (Davis and Mikosch 1998; Basrak et al. 2002)

As for long-memory linear processes, Kokoszka and Taqqu (1996) generalized the results by Davis and Resnick (1985) for α∈(1,2), whereas Horváth and Kokoszka (2008) generalized Davis and Resnick (1986) for α∈(2,4). (Recall that there is no long memory if α∈(0,1)).

Recall that the sample covariance is defined as

$$\hat{\gamma}_{n}(s)=\frac{1}{n}\sum_{t=1}^{n-s}X_{t}X_{t+s}\quad (s=1,\ldots,q). $$

The first result deals with α∈(1,2). There is no influence of long memory.

Theorem 4.24

Assume that X t (\(t\in\mathbb{N}\)) is a linear process and ε t (\(t\in \mathbb{Z}\)) are i.i.d. random variables such that (4.121) holds with α∈(1,2) and E(ε 1)=0. Then

(4.122)

where S α (1,1,0) is a stable random variable.

Proof

The proof is given in Davis and Resnick (1985) for the weakly dependent case (cf. (4.88)); however, it applies to the long-memory situation as long as the conditions of Theorem 4.24 are fulfilled. The reason for this is that under the condition \(\sum_{j}a_{j}^{2}<\infty\), the quantity ∑ j a j a j+s is also finite. We give a sketch of the proof for \(\hat{\gamma}_{n}(q)\) only. Recall from Theorem 4.14 that

$$\sum_{t=1}^{n}\delta_{c_{n}^{-1}(X_{t},\ldots,X_{t-K})}\Rightarrow\sum _{l=1}^{\infty}\sum_{r=0}^{\infty}\delta_{j_{l}(a_{r},a_{r-1},\ldots ,a_{r-K})}, $$

where j l are points of the limiting Poisson process, c n is such that P(|ε 1|>c n )∼n −1, i.e. \(c_{n}\sim A^{1/\alpha}n^{1/\alpha}\). The continuous mapping theorem yields

As γ→0, the latter random variable converges to

$$\Biggl( \sum_{j=0}^{\infty}a_{j}a_{j+q} \Biggr) \sum_{l=0}^{\infty}j_{l}^{2} \overset{{d}}{=} \Biggl( \sum_{j=0}^{\infty}a_{j}a_{j+q} \Biggr) S_{\alpha/2}\bigl(C_{\alpha/2}^{-2/\alpha},1,1\bigr). $$

It remains to show that

$$\lim_{\gamma\rightarrow0}\limsup_{n\rightarrow\infty}P \Biggl( c_{n}^{-2}\Bigg|\sum_{t=1}^{n}X_{t}X_{t+q}1 \bigl\{|X_{t}|<c_{n}\gamma,|X_{t+q}|<c_{n}\gamma\bigr\}\Bigg|>\gamma \Biggr) =0. $$

This probability is bounded by

$$\frac{n}{c_{n}^{2}\gamma}{E}\bigl[\big|X_{1}^{2}\big|1\bigl\{|X_{1}|< \gamma c_{n}\bigr\}\bigr]. $$

We conclude the proof by applying Karamata’s theorem (Lemma 4.18) together with the tail estimates in Lemma 4.19. □

The situation is different for α∈(2,4). We have a dichotomous behaviour, depending on the interplay between tails and memory.

Theorem 4.25

Assume that X t (\(t\in\mathbb{N}\)) is a linear process such that \(a_{j}\sim c_{a}j^{d-1}\), d∈(0,1/2) (so that γ X (k)∼L γ (k)k 2d−1, see (4.39)) and ε t (\(t\in\mathbb{Z}\)) are i.i.d. random variables such that (4.121) holds with α∈(2,4) and E(ε 1)=0.

  • If α∈(2,4) and 0<d<1/α, then (4.122) holds.

  • If α∈(2,4) and 1/α<d<1/2, then

    $$n^{1-2d}L_{2}^{-1/2}(n) \bigl(\hat{\gamma}_{n}(s)- \gamma_{X}(s)\bigr)\overset{{d}}{\rightarrow}Z_{2,H}(1), $$

    where Z 2,H (u) is a Hermite–Rosenblatt process, and \(L_{2}(n)=2!C_{2}L_{\gamma}^{2}(n)\).

Proof

Consider the decomposition \(\hat{\gamma}_{n}(s)-\gamma_{X}(s)=M_{n}+R_{n}\) from the proof of Theorem 4.23.

Since the random variables ε t have a finite variance, we again have

$$n^{1-2d}L_{2}^{-1/2}(n)R_{n}\overset{{d}}{\rightarrow}Z_{2,H}(1) $$

if d∈(1/4,1/2) and n −1/2 R n =O P (1) if d∈(0,1/4). The first part, M n , is the partial sum of a linear process with summable coefficients and infinite variance, and hence we can conclude the stable limit for M n . □

4.4.4 Stochastic Volatility Models

Some recent results were developed for stochastic volatility models (McElroy and Politis 2007; Kulik and Soulier 2012). In the latter paper, the authors show differences between LMSV models and models with leverage.

Consider a stochastic volatility model X t =σ t ξ t (\(t\in\mathbb{N}\)) such that the sequences σ t (\(t\in\mathbb {N}\)) and ξ t (\(t\in\mathbb{N}\)) are independent. Assume that E(ξ 1)=0. We are interested in sample covariances of X t and \(X_{t}^{2}\). For the first one, we note that

$$\hat{\gamma}_{n}(s)=\frac{1}{n}\sum_{t=1}^{n-s}\xi_{t}\xi_{t+s}\sigma _{t}\sigma_{t+s}$$

is a martingale with respect to the σ-field generated by (σ j ,ξ j ), j≤t. Therefore, if we assume additionally \(E[\xi_{1}^{2}]<\infty\), then

$$\sqrt{n}\hat{\gamma}_{n}(s)\overset{{d}}{\rightarrow}N \bigl(0,v^{2}\bigr), $$

where \(v^{2}=E[\sigma_{0}^{2}\sigma_{s}^{2}]E^{2}[\xi_{1}^{2}]\). The more interesting situation happens in the second case of squares. Assume that \({E}[\xi_{1}^{4}]<\infty\). Then

Again, the first part is a martingale, and therefore it is O P (n −1/2). The second part is a possible long-memory contribution of the bivariate sequence σ t σ t+s (\(t\in\mathbb{N}\)). For example, if we consider σ t =exp( t ), where ζ t (\(t\in\mathbb{N}\)) is the long-memory Gaussian process as in Example 4.23, then for d∈(1/4,1/2) (refer to Example 4.23 for the precise notation),

$$n^{-(d+1/2)}L_{f}^{-1/2}(n)R_{n}\overset{{d}}{ \rightarrow}2{E}^{2}\bigl[\xi_{1}^{2} \bigr]c_{1}\int D(\lambda)\,dW_{\zeta}(\lambda), $$

where W ζ is the spectral random measure associated with ζ t (\(t\in\mathbb{N}\)). Therefore, since the second part R n dominates, the limiting distribution for

$$n^{1-(d+1/2)}L_{2}^{-1/2}\bigl(n^{-1}\bigr)\hat{ \gamma}_{n}(s) $$

is the same as for R n . If on the other hand d∈(0,1/4), then both terms M n and R n are of the same order.
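
A small simulation illustrates the contrast: for the LMSV model with σ t =exp(cζ t ) and long-memory Gaussian ζ t , the sample autocovariance of X t fluctuates at the usual root-n rate, whereas the autocovariance of X t ² inherits the slower long-memory fluctuations of the volatility term. The sketch below is purely illustrative; the truncated moving-average generator, the vol-of-vol factor 0.3 (kept small to avoid exploding moments) and all sample sizes are arbitrary choices.

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(6)
d, trunc, nrep, s = 0.4, 20_000, 200, 1
a = np.arange(1, trunc + 1) ** (d - 1.0)
a /= np.sqrt(np.sum(a ** 2))

def sd_of_sample_covs(n):
    """Monte Carlo sd of the lag-s sample covariances of X_t and of X_t^2 for the LMSV model."""
    cx, cx2 = np.empty(nrep), np.empty(nrep)
    for r in range(nrep):
        eps = rng.standard_normal(n + trunc)
        zeta = fftconvolve(eps, a, mode="valid")[:n]   # long-memory Gaussian log-volatility driver
        x = np.exp(0.3 * zeta) * rng.standard_normal(n)
        cx[r] = np.mean(x[:-s] * x[s:])                # sample covariance of X_t
        y = x ** 2 - np.mean(x ** 2)
        cx2[r] = np.mean(y[:-s] * y[s:])               # sample covariance of X_t^2 (centred)
    return np.std(cx), np.std(cx2)

sd_small, sd_big = sd_of_sample_covs(2_000), sd_of_sample_covs(32_000)
# For X the sd shrinks roughly like n^{-1/2} (ratio close to 4 here); for X^2 it
# shrinks markedly more slowly, reflecting the dominating long-memory term R_n.
print(sd_small[0] / sd_big[0], sd_small[1] / sd_big[1])
```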

This consideration can be extended to random variables ξ t such that (4.121) holds with α∈(2,4). Then, we have again a dichotomous behaviour: the limit can be either a stable random variable or a Hermite–Rosenblatt random variable. The situation becomes complicated though when one considers models with leverage. We refer to Davis and Mikosch (2001) and Kulik and Soulier (2012).

4.4.5 Summary of Limit Theorems for Sample Covariances

We consider a centred linear process \(X_{t}=\sum _{j=0}^{\infty}a_{j}\varepsilon_{t-j}\) such that either \(E(\varepsilon_{1}^{4})<\infty \) or

$$P(\varepsilon_{1}>x)\sim A\frac{1+\beta}{2}{x^{-\alpha}},\qquad P(\varepsilon_{1}<-x)\sim A\frac{1-\beta}{2}x^{-\alpha} $$

with α∈(1,4) and appropriate regularity conditions (that assure existence of the process). In the table, Z 2,H (⋅) is a Hermite–Rosenblatt process on [0,1], and \(\tilde{S}_{\alpha/2}\) is an α/2-stable random variable. Furthermore, c is a generic constant. The main results are summarized in Table 4.3.

Table 4.3 Limits for sample covariances

4.5 Limit Theorems for Quadratic Forms

In this section we consider quadratic forms,

$$ Q_{n}(u):=\sum_{t,s=1}^{[nu]}b_{t-s} \bigl\{ G(X_{t},X_{s})-E \bigl[ G(X_{t},X_{s}) \bigr] \bigr\} ,\qquad Q_{n}:=Q_{n}(1), $$
(4.123)

where b k (\(k\in\mathbb{Z}\)) is a sequence of constants, and \(G:\mathbb{R}^{2}\rightarrow\mathbb{R}\). We are interested in asymptotic properties of Q n (u).
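
For G(x,y)=xy, the quadratic form Q n is simply X^T B X minus its expectation, with B the Toeplitz matrix built from the coefficients b k . A direct way to compute it is sketched below; this is only an illustration (for long series one would use FFT-based Toeplitz multiplication rather than forming B explicitly), and the toy covariance and weights are arbitrary.

```python
import numpy as np
from scipy.linalg import toeplitz

def quadratic_form(x, b, gamma):
    """Q_n = sum_{t,s} b_{t-s} (x_t x_s - gamma(t-s)) for G(x, y) = x*y.

    b, gamma: callables returning b_k and the autocovariance gamma_X(k); b is assumed symmetric."""
    n = len(x)
    k = np.arange(n)
    B = toeplitz([b(i) for i in k])                          # B[t, s] = b_{t-s}
    expected = np.sum(B * toeplitz([gamma(i) for i in k]))   # sum_{t,s} b_{t-s} gamma_X(t-s)
    return x @ B @ x - expected

# toy example: i.i.d. N(0, 1) data, exponentially decaying weights
rng = np.random.default_rng(7)
x = rng.standard_normal(500)
print(quadratic_form(x, b=lambda k: 0.5 ** abs(k), gamma=lambda k: float(k == 0)))
```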

In the Gaussian case, such studies were conducted in Rosenblatt (1979), Fox and Taqqu (1985, 1987), Avram (1988), Terrin and Taqqu (1990), Beran and Terrin (1994), among others. For linear processes, classical limit theorems for weakly dependent sequences are given in Brillinger (1969) and Hannan (1970) (and references therein); also see Klüppelberg and Mikosch (1996). They follow directly from limit theorems for sample covariances, proven in Theorem 4.23. For long memory such studies were initiated by Giraitis and Surgailis (1990). The authors concluded a weakly dependent behaviour, using approximation of a quadratic form by another quadratic form with weakly dependent variables. Other results along these lines were proven for instance in Horváth and Shao (1999) and Bhansali et al. (1997). The case of the multivariate Appell polynomials is studied in Terrin and Taqqu (1991), Giraitis and Taqqu (1997, 1998, 1999a, 2001), Giraitis et al. (1998). Kokoszka and Taqqu (1997) discuss quadratic forms for infinite-variance processes. We also refer to Giraitis and Taqqu (1999b) for an overview.

There are two principal applications of quadratic forms. First, we can derive the limiting behaviour of the periodogram and the Whittle estimator (see Sect. 5.5 for results and references). Second, quadratic forms can be used to test for possible changes in the long-memory parameter (see e.g. Beran and Terrin 1996; Horváth and Shao 1999).

4.5.1 Gaussian Sequences

In this section we shall assume that X t (\(t\in\mathbb{Z}\)) is a centred Gaussian sequence with autocovariance function γ X (k)=L γ (k)k 2d−1. First, we exploit the relation between sample covariances and quadratic forms. Using results obtained in Sect. 4.4, we obtain a long-memory behaviour I (i.e. of “type I”) of Q n (u) for d∈(1/4,1/2) directly from limit theorems for sample covariances. The result was proven in Fox and Taqqu (1985) and is presented in Theorem 4.26. For d∈(0,1/4), we obtain convergence with rate n −1/2, as proven in Fox and Taqqu (1985) as well. The result is presented in Theorem 4.27 and is referred to as weakly dependent behaviour I.

These results are very similar to those for partial sums \(\sum_{t=1}^{[nu]}(X_{t}^{2}-1)\). These sums were studied in Sect. 4.2.3, and we recall the dichotomous behaviour: convergence to the Hermite–Rosenblatt process or Brownian motion for d∈(1/4,1/2) and d∈(0,1/4) respectively.

In Theorem 4.26 the limiting process will be degenerate if ∑ l b l =0, as happens, for instance, for Fourier coefficients. Another type of weakly dependent behaviour is obtained if in addition to ∑ l b l =0 the coefficients also decay to zero fast enough. Then, the coefficients b l compensate for long memory, and Q n (⋅) converges at rate n 1/2 for all d∈(0,1/2) (weakly dependent behaviour II). Such results were proven in Fox and Taqqu (1985, Theorem 3; 1987), Avram (1988), Beran and Terrin (1994) (also Beran 1986). The authors use the method of cumulants; see the proof of Theorem 4.28. On the other hand, if the coefficients b l do not compensate for long memory, then Terrin and Taqqu (1990) prove that the limiting process is neither Gaussian nor Hermite–Rosenblatt (long-memory behaviour II). The authors use multiple Wiener–Itô integrals; see the proof of Theorem 4.29.

4.5.1.1 Long Memory Behaviour I

Recall that the sample covariances for the sequence X t (\(t\in\mathbb{Z}\)) are defined by

$$\hat{\gamma}_{n}(s)=\frac{1}{n}\sum _{t=1}^{n-\vert s\vert }X_{t}X_{t+\vert s\vert }. $$

Reorganizing indices, we may write

$$Q_{n}(1)=\sum_{t,s=1}^{n}b_{t-s} \bigl(X_{t}X_{s}-E(X_{t}X_{s})\bigr)=n\sum _{|l|\leq n-1}b_{l} \bigl( \hat{\gamma}_{n}(l)- \gamma_{X}(l) \bigr) . $$

Recall that for d∈(1/4,1/2) (see (4.116)),

$$ n^{1-2d}L_{2}^{-1/2}(n) \bigl( \hat{ \gamma}_{n}(1)-\gamma_{X}(1),\ldots ,\hat{ \gamma}_{n}(K)-\gamma_{X}(K) \bigr) \overset{\mathrm {d}}{ \rightarrow }\bigl(Z_{2,H}(1),\ldots,Z_{2,H}(1)\bigr). $$
(4.124)

This, together with the continuous mapping theorem, implies that for any fixed integer K>0,

Clearly, \(( \sum_{l=-K}^{K}b_{l}) Z_{2,H}(1)\overset{\mathrm {p}}{\rightarrow}( \sum_{l=-\infty}^{\infty}b_{l}) Z_{2,H}(1)\). Furthermore,

$$\lim_{K\rightarrow\infty}\limsup_{n\rightarrow\infty}P \bigl( n^{-2d}L_{2}^{-1/2}(n)\big|Q_{n,K}(1)-Q_{n}(1)\big|>\delta \bigr) =0 $$

for each δ>0. The reader is referred to Fox and Taqqu (1985, Theorem 1) for details on the latter approximation and tightness. This leads to the following result, which is formulated more generally in a functional form.

Theorem 4.26

Assume that X t (\(t\in\mathbb{Z}\)) is a stationary sequence of standard normal random variables such that γ X (k)∼L γ (k)k 2d−1, d∈(1/4,1/2). If \(\sum_{l=-\infty}^{\infty}|b_{l}|<\infty\), then

where \(L_{2}(n)=2!C_{2}L_{\gamma}^{2}(n)\) (cf. (4.22)), \(H=d+\frac{1}{2}\), ⇒ denotes weak convergence, and Z 2,H (⋅) is the Hermite–Rosenblatt process.

This result has in fact been proven in a more general setting in Fox and Taqqu (1985). Consider

$$Q_{n}(u;H_{m}):=\sum_{t,s=1}^{[nu]}b_{t-s} \bigl\{ H_{m}(X_{t})H_{m}(X_{s})-E \bigl[ H_{m}(X_{t})H_{m}(X_{s}) \bigr] \bigr\} . $$

The same methodology as above works, given that we use (4.118) instead of (4.124):

We conclude for d∈(1/4,1/2) and under the condition \(\sum_{l=-\infty }^{\infty}|b_{l}|<\infty\),

$$n^{-2d}L_{A_{2}}^{-1/2}(n)Q_{n}(1;H_{m}) \overset{\mathrm{d}}{\rightarrow }m!m \Biggl( \sum_{l=-\infty}^{\infty}b_{l} \gamma_{X}^{m-1}(l) \Biggr) Z_{2,H}(1). $$

4.5.1.2 Weakly Dependent Behaviour I

Theorem 4.26 above requires d∈(1/4,1/2). What about d∈(0,1/4)? As in the case of partial sums \(\sum _{t=1}^{[nu]}(X_{t}^{2}-1)\), one obtains a weakly dependent behaviour, i.e. a central limit theorem with scaling n −1/2 (Fox and Taqqu 1985).

Theorem 4.27

Assume that X t (\(t\in\mathbb{Z}\)) is a stationary sequence of standard normal random variables such that γ X (k)∼L γ (k)k 2d−1, d∈(0,1/4). Then

$$n^{-1/2}Q_{n}(u)=n^{-1/2}\sum _{t,s=1}^{[nu]}b_{t-s}\bigl(X_{t}X_{s}-E(X_{t}X_{s})\bigr)\Rightarrow\sigma_{0}B(u), $$

where B(⋅) is a standard Brownian motion, and σ 0>0.

The constant σ 0 is given in a complicated form, and we refer to Fox and Taqqu (1985) for a precise formula.

4.5.1.3 Weakly Dependent Behaviour II

In Theorem 4.26 it may happen that \(\sum_{l=-\infty}^{\infty}b_{l}=0\) and hence the limit will be degenerate. This can happen when the b l are Fourier coefficients of a real-valued function g. Specifically, let

$$ b_{l}=\int_{-\pi}^{\pi}e^{il\lambda}g(\lambda)\,d\lambda=:2\pi\hat{g}_{l} ,\qquad g(\lambda)\sim c_{g}|\lambda|^{-\gamma}\quad\text{as}\ |\lambda|\rightarrow0. $$
(4.125)

To assure the existence of Fourier coefficients, we assume that γ<1. Then, \(b_{l}\sim c_{b}l^{\gamma-1}\), \(c_{b}=2c_{g}\varGamma(1-\gamma)\sin( \pi\frac{\gamma}{2}) \). The following result was proven in Fox and Taqqu (1987); see also Theorem 3 in Fox and Taqqu (1985) and Avram (1988).

Theorem 4.28

Assume that X t (\(t\in\mathbb{Z}\)) is a stationary sequence of standard normal random variables such that γ X (k)∼L γ (k)k 2d−1, d∈(0,1/2). If

$$ 2d+\gamma<1/2, $$
(4.126)

then

$$ n^{-1/2}Q_{n}(1)\overset{d}{\rightarrow}\sigma_{Q}Z, $$
(4.127)

where

$$\sigma_{Q}^{2}:=16\pi^{3}\int_{-\pi}^{\pi} \bigl(f(\lambda)g(\lambda)\bigr)^{2}\,d\lambda, $$

f=f X is the spectral density of X t (\(t\in\mathbb{Z}\)), and Z is a standard normal random variable.

Let us comment on condition (4.126). First, it assures that \(\sigma_{Q}^{2}\) is finite. Second, it means that the coefficients b l decay appropriately fast, to compensate for long memory in X t (\(t\in\mathbb{Z}\)).

Proof

We present a modified version of the proof in Avram (1988). Let \(\varSigma =[\gamma_{X}(j-l)]_{j,l=1}^{n}\) and \(B=[b_{j-l}]_{j,l=0}^{n-1}\). Then,

$$Q_{n}(1)=(X_{1},\ldots,X_{n})B(X_{1},\ldots,X_{n})^{T}$$

has the pth cumulant equal to (see Grenander and Szegö 1958, p. 218)

$$\mathrm{cum}_{p}\bigl(Q_{n}(1)\bigr)=2^{p-1}(p-1)!\mathrm{Trace}(\varSigma B)^{p}.$$

Note that

$$\gamma_{X}(j-l)=\int_{-\pi}^{\pi}e^{i(j-l)\lambda}f_{X}(\lambda)\,d\lambda =:2\pi\hat{f}_{j-l}, $$

where \(\hat{f}_{j-l}\) is the Fourier coefficient of the spectral density f=f X . Furthermore, \(B=2\pi[\hat{g}_{j-l}]_{j,l=0}^{n-1}\). Recall that the trace of a matrix is the sum of its diagonal elements. We have

$$\frac{1}{n}\mathrm{Trace}(\varSigma)=\frac{2\pi}{n}(\hat{f}_{0}+\cdots +\hat{f}_{0})=2\pi\hat{f}_{0}=\int_{-\pi}^{\pi}f_{X}(\lambda)\,d\lambda. $$

Of course, f X is integrable since d<1/2. Analogously, recall that the trace can be written as the sum of the entries of a Hadamard product: \({\rm Trace}(\varSigma B)=\sum_{j,l}\gamma_{X}(j-l)B_{j,l}\). Since \(\hat{f}_{l}\hat{g}_{l}\) is summable, we then obtain

as n→∞. By the Parseval identity and since g is real,

$$\lim_{n\rightarrow\infty}\frac{1}{n}\mathrm{Trace}(\varSigma B)=4\pi^{2}\frac{1}{2\pi}\int_{-\pi}^{\pi}f_{X}(\lambda)\bar{g}(\lambda)\,d\lambda \;=2\pi\int_{-\pi}^{\pi}f_{X}(\lambda)g(\lambda)\,d\lambda. $$

On the other hand, if λ 1,…,λ n are the eigenvalues of ΣB, then we can write alternatively

$$\lim_{n\rightarrow\infty}\frac{1}{n}\mathrm{Trace}(\varSigma B)=\lim _{n\rightarrow\infty}\frac{1}{n}\sum_{j=1}^{n}\lambda_{j}= \frac{4\pi^{2}}{2\pi}\int_{-\pi}^{\pi}f_{X}(\lambda)g(\lambda)\,d\lambda. $$

The matrix (ΣB)p has eigenvalues \(\lambda_{j}^{p}\), j=1,…,n. One can then argue analogously that

$$\lim_{n\rightarrow\infty}\frac{1}{n}\mathrm{Trace}(\varSigma B)^{p}=\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{j=1}^{n}\lambda_{j}^{p}=\frac{(4\pi^{2})^{p}}{2\pi}\int_{-\pi}^{\pi}(f_{X}(\lambda)g(\lambda ))^{p}\,d\lambda. $$

Thus,

$$\mathrm{cum}_{p}\bigl(n^{-1/2}Q_{n}(1) \bigr)=n^{-p/2}\mathrm{cum}_{p}\bigl(Q_{n}(1) \bigr)=\frac{2^{p-1}}{n^{p/2-1}}\frac{(p-1)!}{n}\mathrm{Trace}(\varSigma B)^{p}. $$

Consequently, lim n→∞cum p (n −1/2 Q n (1))=0 if p>2, and

$$\lim_{n\rightarrow\infty}\mathrm{cum}_{2}\bigl(n^{-1/2}Q_{n}(1) \bigr)=16\pi^{3}\int_{-\pi}^{\pi} \bigl(f_{X}(\lambda)g(\lambda)\bigr)^{2}\,d\lambda, $$

which provides the limiting variance. Application of the method of cumulants (see Theorem 4.1) then yields the result. □
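
The key trace limits used in this proof are easy to check numerically in a short-memory example. The sketch below is illustrative only: it uses a Gaussian AR(1) covariance and g(λ)=cos λ (so that b ±1 =π and all other b l vanish), and compares (1/n)Trace(ΣB) and (1/n)Trace((ΣB)²) with the corresponding spectral integrals.

```python
import numpy as np
from scipy.linalg import toeplitz

phi, n = 0.5, 1_000                              # AR(1) coefficient and matrix size (illustrative)
gamma = (phi ** np.arange(n)) / (1 - phi ** 2)   # AR(1) autocovariances with sigma_eps^2 = 1
b = np.zeros(n)
b[1] = np.pi                                     # g(lambda) = cos(lambda)  =>  b_{+-1} = pi
Sigma, B = toeplitz(gamma), toeplitz(b)
M = Sigma @ B

# spectral density of the AR(1) process, the weight function g, and a midpoint integration grid
f = lambda lam: (1.0 / (2 * np.pi)) / np.abs(1 - phi * np.exp(-1j * lam)) ** 2
g = np.cos
lam = np.linspace(-np.pi, np.pi, 200_000, endpoint=False) + np.pi / 200_000
integral = lambda vals: np.mean(vals) * 2 * np.pi

print(np.trace(M) / n, 2 * np.pi * integral(f(lam) * g(lam)))
print(np.trace(M @ M) / n,
      (4 * np.pi ** 2) ** 2 / (2 * np.pi) * integral((f(lam) * g(lam)) ** 2))
```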

4.5.1.4 Long-Memory Behaviour II

In contrast to Theorem 4.28, if the coefficients b l do not compensate for long memory (i.e., when (4.126) fails to hold), then we have the following result, due to Terrin and Taqqu (1990). Recall that g(λ)∼c g |λ|γ as λ→0 (see (4.125)) and that M 0(⋅) is a random measure that appears in the spectral representation of the linear Gaussian sequence; see Sect. 4.1.3.

Theorem 4.29

Assume that X t (\(t\in\mathbb{Z}\)) is a stationary sequence of standard normal random variables such that γ X (k)∼L γ (k)k 2d−1, d∈(0,1/2). If

$$ 1/2<2d+\gamma<1, $$
(4.128)

then

$$ n^{-(2d+\gamma)}L_{f}^{-1}\bigl(n^{-1} \bigr)Q_{n}(u)\Rightarrow c_{g}Z(u), $$
(4.129)

where

$$Z(u)=\int\!\!\!\int\psi_{u}(\lambda_{1},\lambda_{2})\frac{1}{|\lambda_{1}|^{d}} \frac{1}{|\lambda_{2}|^{d}}\,dM_{0}(\lambda_{1})\,dM_{0}(\lambda_{2}), $$

and

$$\psi_{u}(\lambda_{1},\lambda_{2})=\int_{\mathbb{R}}\frac{e^{iu(\lambda _{1}+\lambda)}-1}{i(\lambda_{1}+\lambda)}\frac{e^{iu(\lambda_{2}-\lambda )}-1}{i(\lambda_{2}-\lambda)}|\lambda|^{-\gamma}\,d\lambda. $$

The limiting process is self-similar with \(H=2d+\gamma\in(\frac{1}{2},1)\), but neither Gaussian nor Hermite–Rosenblatt.

We note that for γ=0, we have b l =1 for l=0 and 0 otherwise. In this case the result of Theorem 4.29 reduces to the asymptotic behaviour of \(\sum _{t=1}^{[nu]}(X_{t}^{2}-1)\), see Theorem 4.3.

Proof

The proof is sketched here. It follows the same idea as in the case of partial sums \(\sum_{t=1}^{n}H_{m}(X_{t})\). Recall that the multiple Wiener–Itô integral “removes” the diagonal (see Appendix A). We write

$$X_{t}X_{s}-E(X_{t}X_{s})=\int_{[-\pi,\pi]^{2}\setminus\{\lambda_{1}=\lambda_{2}\}}e^{it\lambda_{1}}e^{is\lambda_{2}}a(\lambda_{1})a(\lambda _{2})\,dM_{0}(\lambda_{1})\,dM_{0}(\lambda_{2}), $$

where |a(λ)|2=f X (λ).

Thus,

where

Thus, Q n (1) is equal in distribution to

$$\int_{-n\pi}^{n\pi}\!\!\!\int_{-n\pi}^{n\pi} a \biggl( \frac{\lambda_{1}}{n} \biggr) a \biggl( \frac{\lambda_{2}}{n} \biggr) \psi_{1}(\lambda_{1},\lambda_{2};n) \,dM_{0}(\lambda_{1})\,dM_{0}(\lambda_{2}) . $$

Clearly, lim n→∞ ψ 1(λ 1,λ 2;n)=ψ 1(λ 1,λ 2), and as in the alternative proof of Theorem 4.2, one can argue that the convergence is uniform. Therefore, the same method as in Theorem 4.2 applies, and the result (4.129) follows for u=1. A proof of functional convergence is omitted here. □

4.5.2 Linear Processes

As in the case of partial sums, the results on quadratic forms for Gaussian LRD sequences have a counterpart for general linear sequences

$$ X_{t}=\sum_{j=0}^{\infty}a_{j}\varepsilon_{t-j}\quad(t\in\mathbb{Z}), $$
(4.130)

where \(\sum_{j=0}^{\infty}a_{j}^{2}=1\), ε t (\(t\in\mathbb{Z}\)) are i.i.d. zero mean random variables with \(\operatorname {var}(\varepsilon_{1})=\sigma _{\varepsilon}^{2}=1\). We will assume that either \(\sum_{j=0}^{\infty}|a_{j}|<\infty\) or \(a_{j}\sim L_{a}(j)j^{d-1}\) with d∈(0,1/2).

Results for quadratic forms

$$Q_{n}(u)=\sum_{t,s=1}^{[nu]}b_{t-s} \bigl(X_{t}X_{s}-E(X_{t}X_{s})\bigr) $$

based on weakly dependent linear processes are classical (see Brillinger 1969; Hannan 1970; also see Klüppelberg and Mikosch 1996) and follow directly from limit theorems for sample covariances, as proven before in Theorem 4.23.

For long memory, such studies had been initiated by Giraitis and Surgailis (1990). The authors concluded a weakly dependent behaviour, similar to that of Theorem 4.28, using an approximation of the quadratic form by another quadratic form with weakly dependent variables. Other results along this line can be found in Horváth and Shao (1999) and Bhansali et al. (1997).

When one replaces Q n (u) by

$$Q_{n}(u;P_{m_{1},m_{2}})=\sum_{t,s=1}^{[nu]}b_{t-s} \bigl\{ P_{m_{1},m_{2}}(X_{t},X_{s})-E \bigl[ P_{m_{1},m_{2}}(X_{t},X_{s}) \bigr] \bigr\} , $$

where \(P_{m_{1},m_{2}}\) is a multivariate Appell polynomial, then limit theorems are very complicated; see Terrin and Taqqu (1991), Giraitis and Taqqu (1997, 1998, 1999a, 2001). We refer to Giraitis and Taqqu (1999b) for an overview.

4.5.2.1 Weakly Dependent Processes

Assume that \(\sum_{j=0}^{\infty}|a_{j}|<\infty\). Recall Theorem 4.23 and the multivariate convergence (4.119):

$$n^{1/2}\bigl(\hat{\gamma}_{n}(0)-\gamma_{X}(0), \ldots,\hat{\gamma}_{n}(K)-\gamma_{X}(K)\bigr) \overset{\mathrm{d}}{\rightarrow}(G_{0},\ldots,G_{K}), $$

where (G 0,…,G K ) is a Gaussian vector. We apply a similar method as in the proof of Theorem 4.26. There we concluded long-memory behaviour of quadratic forms from long-memory behaviour of sample covariances. Here, we will conclude short-memory behaviour of quadratic forms from short-memory behaviour of sample covariances.

We have

$$Q_{n}(1)=\sum_{t,s=1}^{n}b_{t-s} \bigl(X_{t}X_{s}-E(X_{t}X_{s})\bigr)=n\sum _{|l|\leq n-1}b_{l} \bigl( \hat{\gamma}_{n}(l)- \gamma_{X}(l) \bigr) . $$

The continuous mapping theorem implies

$$n^{-1/2}Q_{n,K}(1):=n^{-1/2}n\sum _{|l|\leq K}b_{l} \bigl( \hat{\gamma}_{n}(l)- \gamma_{X}(l) \bigr) \overset{\mathrm{d}}{\rightarrow}b_{0}G_{0}+2\sum_{l=1}^{K}b_{l}G_{l}. $$

To apply Proposition 4.1, we need to show that

$$\lim_{K\rightarrow\infty}\limsup_{n\rightarrow\infty}P \Biggl( \sqrt {n}\Biggl \vert \sum _{l=K+1}^{n-1}b_{l}\bigl(\hat{ \gamma}_{n}(l)-\gamma_{X}(l)\bigr)\Biggr \vert >\delta \Biggr) =0. $$

This is straightforward since the correlations between \(\hat{\gamma}_{n}(l)\) (l≥1) are absolutely summable. Therefore, we may apply Chebyshev inequality in a suitable way to finish the proof. □

4.5.2.2 Long-Memory Sequences

The following result is a counterpart to Theorem 4.28.

Theorem 4.30

Assume that X t (\(t\in\mathbb{N}\)) is a linear process with long-range dependence defined in (4.130), with spectral density f X (λ)∼c f |λ|−2d. Assume that the coefficients b l are given by (4.125), i.e. \(b_{l}\sim c_{b}l^{\gamma-1}\). Let κ 4 be the fourth cumulant of ε 1. If

$$ 2d+\gamma<1/2, $$
(4.131)

then

$$ n^{-1/2}Q_{n}(1)=n^{-1/2}\sum _{t,s=1}^{n}b_{t-s}\bigl(X_{t}X_{s}-E(X_{t}X_{s})\bigr)\overset{\mathrm{d}}{\rightarrow}\sigma_{Q}Z, $$
(4.132)

where Z is standard normal, and

$$\sigma_{Q}^{2}:=16\pi^{3}\int_{-\pi}^{\pi} \bigl(f_{X}(\lambda)g(\lambda )\bigr)^{2}\,d\lambda+ \kappa_{4} \biggl( 2\pi\int_{-\pi}^{\pi}f_{X}( \lambda )g(\lambda)\,d\lambda \biggr)^{2}. $$

Of course, if the innovations ε t are normal, then κ 4=0, and the result reduces to Theorem 4.28.

Proof

To prove this theorem, Giraitis and Surgailis (1990) do not use the method of cumulants. Instead, they approximate Q n =Q n (1) by a weakly dependent sequence. A similar approach is also used in Bhansali et al. (1997), and we present a sketch of the method there.

Write \(Q_{n,X}=\sum_{t,s=1}^{n}b_{t-s}X_{t}X_{s}\) and \(Q_{n,\varepsilon} =\sum_{t,s=1}^{n}v_{t-s}\varepsilon_{t}\varepsilon_{s}\), where

$$v_{l}=2\pi\int_{-\pi}^{\pi}g(\lambda)f_{X}(\lambda)e^{il\lambda}\,d\lambda. $$

Since Q n,ε is a quadratic form of independent random variables, it is much easier to derive its asymptotic distribution, namely (see Bhansali et al. 1997, Theorem 4.1):

$$\frac{1}{\sqrt{\operatorname {var}(Q_{n,\varepsilon})}} \bigl( Q_{n,\varepsilon}-E ( Q_{n,\varepsilon} ) \bigr) \overset{ \mathrm{d}}{\rightarrow}N(0,1), $$

where

$$\operatorname {var}(Q_{n,\varepsilon})=v_{0}^{2}n\cdot\sigma_{\varepsilon}^{2}+2\sum _{{j,l=1;\ j\not=l}}^{n}v_{j-l}^{2}$$

and \(\sigma_{\varepsilon}^{2}=\operatorname {var}(\varepsilon_{t})\). Under our assumptions,

$$g(\lambda)f_{X}(\lambda)\sim c_{g}|\lambda|^{-\gamma}c_{f}|\lambda |^{-2d}$$

as λ→0. Therefore, the coefficients v l satisfy

$$v_{l}\sim c_{v}l^{2d+\gamma-1},\quad c_{v}=2c_{f}c_{g} \varGamma \bigl(1-(2d+\gamma )\bigr)\sin \biggl( \pi\frac{2d+\gamma}{2} \biggr) . $$

Furthermore, Q n,X Q n,ε =o P (1). Evaluation of this is quite challenging, and the reader is referred to Giraitis and Surgailis (1990). Once this is verified, the convergence of Q n,X follows from the convergence of Q n,ε E(Q n,ε ). □

The limiting behaviour of quadratic forms becomes more involved if one considers nonlinear functionals. Recall the definition of bivariate Appell polynomials. Redefine Q n as

$$Q_{n}(u)=Q_{n}(u;P_{m_{1},m_{2}})=\sum _{t,s=1}^{[nu]}b_{t-s} \bigl\{ P_{m_{1},m_{2}}(X_{t},X_{s})-E \bigl[P_{m_{1},m_{2}}(X_{t},X_{s})\bigr] \bigr\} . $$

Let \(B=[b_{j-l}]_{j,l=1}^{n}\) and \(\varSigma^{(m)}=[\gamma _{X}^{m}(j-l)]_{j,l=1}^{n}\). Also, let \(h^{\ast m}\) denote the m-fold self-convolution of a function h. Giraitis and Taqqu (1997) showed that if

$$ \lim_{n\rightarrow\infty}\frac{\mathrm{Trace}(\varSigma^{(m_{1})}B\varSigma ^{(m_{2})}B)}{n}=\int_{-\pi}^{\pi}f_{X}^{\ast m_{1}}(\lambda)f_{X}^{\ast m_{2}}(\lambda)g^{2}(\lambda)\,d\lambda<\infty, $$
(4.133)

then n −1/2 Q n converges in distribution to a normal random variable; however the formula for the limiting variance is quite complicated. Condition (4.133) holds if

$$ \max\bigl(1-m_{1}(1-2d),0\bigr)\big/2+\max\bigl(1-m_{2}(1-2d),0 \bigr)\big/2+\gamma<1/2. $$
(4.134)

In particular, if m 1=m 2=1, then this is equivalent to 2d+γ<1/2, so that we recover (4.131). On the other hand, if m 1=1, m 2=2, then the condition reads: 3d+γ<1 if d∈(1/4,1/2), and d+γ<1/2 if d∈(0,1/4).
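
Condition (4.134) is straightforward to evaluate numerically; a small helper function (an illustrative sketch, not from the original text) is given below, with the printed cases corresponding to the m 1 =m 2 =1 discussion above.

```python
def appell_clt_condition(m1, m2, d, gamma):
    """Check (4.134): the weak-dependence (CLT) condition for Q_n(u; P_{m1, m2})."""
    term = lambda m: max(1 - m * (1 - 2 * d), 0.0) / 2
    return term(m1) + term(m2) + gamma < 0.5

print(appell_clt_condition(1, 1, 0.1, 0.2))   # 2d + gamma = 0.4 < 0.5  -> True
print(appell_clt_condition(1, 1, 0.2, 0.2))   # 2d + gamma = 0.6 >= 0.5 -> False
```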

If (4.134) does not hold, then there is a variety of different possible limits, as presented in Giraitis and Taqqu (1999b). The proofs involve the familiar method based on the multiple Wiener–Itô integrals.

4.5.3 Summary of Limit Theorems for Quadratic Forms

We summarize the main results for quadratic forms of Gaussian sequences in Table 4.4. We assume that X t (\(t\in\mathbb{Z}\)) is a centred Gaussian sequence with covariance γ X (k)∼c γ k 2d−1, d∈(0,1/2), so that a slowly varying function can be omitted. In what follows, B(⋅) is a Brownian motion on [0,1], Z 2,H (⋅) is a Hermite–Rosenblatt process on [0,1], and Z(⋅) is the self-similar process with Hurst parameter H=2d+γ, as in Theorem 4.29. Furthermore, c is a generic constant.

Table 4.4 Panorama of limits for quadratic forms of Gaussian sequences

4.6 Limit Theorems for Fourier Transforms and the Periodogram

In this section we present some basic properties of the Discrete Fourier Transform (DFT) and the periodogram. We analyse their second-order properties showing a remarkable difference between weakly dependent and long-memory linear processes. In particular, the DFT and the periodogram computed at Fourier frequencies are asymptotically independent under short memory but asymptotically dependent under long memory. To achieve asymptotic independence in the latter case, one has to consider the DFT at appropriately high frequencies. The asymptotic dependence of the DFT and the periodogram ordinates implies a different limiting behaviour of the DFT under short and long memory respectively.

4.6.1 Periodogram and Discrete Fourier Transform (DFT)

For an observed second-order stationary time series X 1,…,X n , let \(\bar{x}=\bar{x}_{n}=n^{-1}\sum_{t=1}^{n}X_{t}\) and define the sample autocovariances by

$$\hat{\gamma}_{n}(k)=\frac{1}{n}\sum_{t=1}^{n-|k|}(X_{t}-\bar{x}) (X_{t+|k|}-\bar{x})\quad \bigl(|k|\leq n-1\bigr). $$

Also, define the (centred) periodogram by

$$I_{n,X}^{\mathrm{centred}}(\lambda)=\frac{1}{2\pi n}\Biggl\vert \sum_{t=1}^{n}(X_{t}-\bar{x})e^{-it\lambda}\Biggr\vert ^{2}. $$

If E[X 1]=μ=0, then \(I_{n,X}^{\mathrm{centred}}(\lambda)\) can be approximated by

$$I_{n,X}(\lambda)=\frac{1}{2\pi n}\Biggl \vert \sum _{t=1}^{n}X_{t}e^{-it\lambda }\Biggr \vert ^{2}. $$

For Fourier frequencies λ j =2πj/n (j=1,…,N n ; N n =[(n−1)/2]), we have the exact identity \(I_{n,X}^{\mathrm{centred}}(\lambda_{j})=I_{n,X}(\lambda_{j})\) since \(\sum_{t=1} ^{n}e^{-it\lambda_{j}}=0\). Therefore, in most applications the non-centred periodogram I n,X is used. The non-centred periodogram can be written in terms of the discrete Fourier transform (DFT). Let

$$d_{n,X}(\lambda)=\frac{1}{\sqrt{2\pi n}}\sum_{t=1}^{n}X_{t}e^{it\lambda}. $$

Then clearly I n,X (λ)=|d n,X (λ)|2.
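
In practice, d n,X (λ j ) and I n,X (λ j ) at the Fourier frequencies are computed with the FFT. The sketch below is illustrative only; note that NumPy's fft uses the sign convention e^{−2πijt/n}, so the computed values agree with d n,X (λ j ) in modulus, which is all that matters for the periodogram.

```python
import numpy as np

def periodogram(x):
    """Periodogram I_{n,X}(lambda_j) at Fourier frequencies lambda_j = 2*pi*j/n, j = 1, ..., [(n-1)/2]."""
    n = len(x)
    dft = np.fft.fft(x) / np.sqrt(2 * np.pi * n)   # |dft[j]| = |d_{n,X}(lambda_j)|; phase differs by the FFT sign convention
    j = np.arange(1, (n - 1) // 2 + 1)
    return 2 * np.pi * j / n, np.abs(dft[j]) ** 2

rng = np.random.default_rng(8)
freqs, I = periodogram(rng.standard_normal(1_000))
print(np.mean(2 * np.pi * I))                      # approx. 1 for i.i.d. N(0, 1), cf. (4.139)
```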

4.6.2 Second-Order Properties of the Fourier Transform and the Periodogram

4.6.2.1 Mean and Covariance of the DFT and the Periodogram

We are interested in a general expression for the expected value and covariance of the DFT and the periodogram ordinates I n,X (λ j ), where λ j are Fourier frequencies.

Lemma 4.22

Assume that X t (\(t\in\mathbb {Z}\)) is a second-order stationary sequence with mean 0, covariance function γ X and spectral density f X . Then E[d n,X (λ j )]=0,

$$E \biggl( \frac{I_{n,X}(\lambda_{j})}{f_{X}(\lambda_{j})} \biggr) =\frac{1}{f_{X}(\lambda_{j})}\int_{-\pi}^{\pi}K_{n}( \lambda_{j}-\lambda )f_{X}(\lambda)\,d\lambda $$

and

$$ E \bigl[ d_{n,X} ( \lambda_{j} ) \,\overline{d_{n,X} ( \lambda_{j} ) } \,\bigr] =\int_{-\pi}^{\pi}K_{n} ( \lambda -\lambda_{j} ) f_{X}(\lambda)\,d\lambda, $$
(4.135)

where

$$K_{n}(\lambda)=\frac{1}{2\pi n} \biggl( \frac{\sin(n\lambda/2)}{\sin (\lambda/2)} \biggr)^{2}$$

is the Fejér kernel.

Proof

The formula is classical (see Priestley 1981 p. 419), but we give a proof for completeness. We have

Furthermore,

Similarly, (4.135) follows from

 □

Note that the Fejér kernel is also defined by

$$K_{n} ( \lambda ) =\frac{1}{2\pi n}\sum_{t,s=1}^{n}e^{-i(t-s)\lambda}=\frac{1}{2\pi n}\bigl \vert D_{n}(\lambda)\bigr \vert ^{2}, $$

where

$$D_{n}(\lambda)=\sum_{t=1}^{n}e^{it\lambda}= \frac{e^{i ( n+1 ) \lambda}-e^{i\lambda}}{e^{i\lambda}-1}$$

is (a version of) the Dirichlet kernel.

4.6.2.2 Weakly Dependent Sequences

Assume that X t (\(t\in\mathbb{Z}\)) is a second-order stationary weakly dependent time series with mean 0. Then (see e.g. Brockwell and Davis 1991) the following holds:

  • The periodogram is an asymptotically unbiased estimator of the spectral density:

    $$ E \bigl[ I_{n,X}(\lambda_{j})-f_{X}( \lambda_{j}) \bigr] =O\bigl(n^{-1}\bigr) $$
    (4.136)

    uniformly in j=1,…,[n/2].

  • The periodogram ordinates at Fourier frequencies are asymptotically uncorrelated with correlations converging to zero uniformly:

    $$ \bigl \vert \mathit{cov} \bigl( I_{n,X} ( \lambda_{j} ) ,I_{n,X} ( \lambda_{l} ) \bigr) \bigr \vert \leq C_{1}n^{-1} $$
    (4.137)

    with some finite constant C 1.

  • $$ \biggl( \frac{I_{n,X} ( \lambda_{j_{1}} ) }{f_{X} ( \lambda_{j_{1}} ) },\ldots,\frac{I_{n,X} ( \lambda_{j_{k}} ) }{f_{X} ( \lambda_{j_{k}} ) } \biggr) \underset{d}{ \rightarrow } ( Z_{1},\ldots,Z_{k} ), $$
    (4.138)

    where Z 1,…,Z k are i.i.d. standard exponential random variables, and \(\lambda_{j_{1}},\allowbreak \ldots,\lambda_{j_{k}}\) are distinct Fourier frequencies.

On the other hand, it will be shown in a subsequent section that these properties are no longer valid for linear time series with long memory.

Of course, the main tool to establish (4.137) and (4.138) is Lemma 4.22. Note that (cf. Gradshteyn and Ryzhik 1965, p. 414) \(\int_{-\pi}^{\pi}K_{n}(\lambda_{j}-\lambda)\,d\lambda=1\). Thus, if X t =ε t is a centred i.i.d. sequence, then \(f_{\varepsilon}(\lambda)=\sigma_{\varepsilon}^{2}/(2\pi)\), and hence,

$$ E \biggl( \frac{I_{n,\varepsilon}(\lambda_{j})}{f_{\varepsilon}(\lambda_{j})} \biggr) =1\quad\bigl(j=1,\ldots,[n/2]\bigr), $$
(4.139)

independently of the chosen Fourier frequency λ j . This justifies (4.136) for an i.i.d. sequence. It should be mentioned, though, that this equality is valid at Fourier frequencies only. Furthermore, if ε t (\(t\in\mathbb{Z}\)) are i.i.d. with mean zero and variance \(\sigma_{\varepsilon}^{2}\), then we have, for distinct Fourier frequencies λ k , λ l (\(k\not=l\)),

$$ E \bigl[ d_{n,\varepsilon}(\lambda_{k})\,\overline{d_{n,\varepsilon }( \lambda_{l})}\, \bigr] =\frac{\sigma_{\varepsilon}^{2}}{2\pi n}\sum _{t=1}^{n}e^{it(\lambda_{k}-\lambda_{l})}=0. $$
(4.140)

If in addition the random variables ε t are standard Gaussian, then the discrete Fourier transform at different Fourier frequencies is also jointly Gaussian and hence independent. Consequently, the periodogram ordinates I n,ε (λ j )=|d n,ε (λ j )|2 computed at distinct Fourier frequencies are independent. Moreover, 2πI n,ε (λ j ) (j=1,…,N n ; N n =[(n−1)/2]) have a standard exponential distribution. In particular,

$$ E \bigl[ 2\pi I_{n,\varepsilon}(\lambda_{j}) \bigr] =1,\qquad \operatorname {var}\bigl(2\pi I_{n,\varepsilon}(\lambda_{j})\bigr)=1. $$
(4.141)

If the random variables ε t are not Gaussian, then d n,ε (λ k ), d n,ε (λ l ) are uncorrelated (i.e. (4.140) still holds), but they are no longer independent. For the periodogram, we have

$$ \mathit{cov}\bigl(I_{n,\varepsilon}(\lambda_{k}),I_{n,\varepsilon}( \lambda_{l})\bigr)=\frac{\kappa_{4}}{4\pi^{2}n}, $$
(4.142)

where κ 4 is the fourth cumulant. Note that in the Gaussian case κ 4=0. Nevertheless, the periodogram ordinates are asymptotically independent and have the standard exponential distribution. This way one obtains (4.138).
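
These properties are easy to see in a simulation: for i.i.d. standard Gaussian data, the values 2πI n,ε (λ j ) at distinct Fourier frequencies behave like independent standard exponential random variables. The following is a minimal sketch with arbitrary choices of n, the replication number and the two frequencies.

```python
import numpy as np

rng = np.random.default_rng(9)
n, nrep = 512, 4_000
vals = np.empty((nrep, 2))                     # 2*pi*I at two fixed Fourier frequencies
for r in range(nrep):
    eps = rng.standard_normal(n)
    I = np.abs(np.fft.fft(eps)) ** 2 / (2 * np.pi * n)
    vals[r] = 2 * np.pi * I[[5, 17]]           # j = 5 and j = 17, arbitrary choices

print(vals.mean(axis=0), vals.var(axis=0))     # both approx. (1, 1), cf. (4.141)
print(np.corrcoef(vals.T)[0, 1])               # approx. 0: asymptotic independence
```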

4.6.2.3 Linear Long-Memory Sequences

Properties (4.136), (4.137) and (4.138) are not valid in the case of linear processes with long memory. The behaviour of the periodogram at frequencies converging to zero can be formulated as follows (Künsch 1986; Hurvich and Beltrao 1993, 1994a, 1994b; Robinson 1995a):

Theorem 4.31

Let \(X_{t}=\sum _{j=0}^{\infty }a_{j}\varepsilon_{t-j}\) be a second-order stationary linear process and assume that f X (λ)∼c f |λ|−2d as |λ|→0 with d∈(0,1/2). Define

$$\mu(j;d)=|2\pi j|^{2d}\frac{2}{\pi}\int_{-\infty}^{\infty} \frac{\sin^{2}(\lambda/2)}{(2\pi j-\lambda)^{2}}\vert \lambda \vert ^{-2d}\,d\lambda. $$

Then for any fixed positive integer j,

$$\lim_{n\rightarrow\infty}E \biggl[ \frac{I_{n,X}(\lambda_{j})}{f_{X}(\lambda_{j})} \biggr] =\mu(j;d). $$

Proof

We use Lemma 4.22. Using the assumption f X (λ)∼c f |λ|−2d, we have

(4.143)

It is easy to see that, as n→∞, the functions

converge pointwise to

$$\biggl \vert \frac{2\pi j}{\lambda}\biggr \vert ^{2d}\frac{2}{\pi} \frac{\sin^{2}(\lambda/2)}{(2\pi j-\lambda)^{2}}. $$

Thus,

$$\lim_{n\rightarrow\infty}E \biggl( \frac{I_{n,X}(\lambda_{j})}{f_{X}(\lambda_{j})} \biggr) =|2\pi j|^{2d} \frac{2}{\pi}\int_{-\infty}^{\infty }\frac{\sin^{2}(\lambda/2)}{(2\pi j-\lambda)^{2}} \vert \lambda \vert ^{-2d}\,d\lambda, $$

given that we can exchange limit with integration (which follows from Lebesgue dominated convergence) and that integration over (−∞,−nπ)∪(nπ,∞) is negligible. □

Detailed calculations can be found in Hurvich and Beltrao (1993). The authors considered a more general spectral density f X (λ)=|λ|−2d f (λ) with a smooth function f . In fact, this computation is valid for d∈(−0.5,1.5); however, if d>0.5, f X is not a spectral density since the model is not stationary (Hurvich and Ray 1995). What is important here is that the normalized periodogram at Fourier frequencies depends on both j and d, as opposed to the i.i.d. case described in (4.139).
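
The limit μ(j;d) can be computed by numerical integration. The sketch below is illustrative: it uses a simple midpoint rule on a large finite interval (the integrand decays like |λ|^{−2−2d}, and the apparent singularity at λ=2πj is removable), and shows that the normalized periodogram is biased at low frequencies when d>0, in contrast to (4.139).

```python
import numpy as np

def mu(j, d, L=2_000.0, m=4_000_000):
    """mu(j; d) of Theorem 4.31 by a midpoint rule on [-L, L]; the tail contribution is negligible."""
    h = 2 * L / m
    lam = -L + (np.arange(m) + 0.5) * h        # midpoints, which avoid the removable singularities
    integrand = np.sin(lam / 2) ** 2 / (2 * np.pi * j - lam) ** 2 * np.abs(lam) ** (-2 * d)
    return (2 * np.pi * j) ** (2 * d) * (2 / np.pi) * np.sum(integrand) * h

for d in (0.0, 0.2, 0.4):
    # mu(j; 0) = 1 for all j; for d > 0 the limit depends on j and d and deviates from 1
    print(d, [round(mu(j, d), 3) for j in (1, 2, 5)])
```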

Furthermore, using the same argument as for the mean, Hurvich and Beltrao (1993) argue that for any two integers \(l\not=k\),

$$\lim_{n\rightarrow\infty}E \biggl[ \frac{d_{n,X}(\lambda_{k})\,\overline {d_{n,X}(\lambda_{l})}}{\sqrt{f_{X}(\lambda_{k})f_{X}(\lambda_{l})}} \,\biggr] =:\gamma_{w}(l,k;d), $$

where

$$\gamma_{w}(l,k;d)=(-1)^{l+k+1}|2\pi k|^{d}|2\pi l|^{d}\frac{2}{\pi}\int_{-\infty}^{\infty}\frac{\sin^{2}(\lambda/2)}{(2\pi k-\lambda)(2\pi l+\lambda)}|\lambda|^{-2d}\,d\lambda. $$

Furthermore, if the random variables X t are Gaussian, then

Thus, unlike the i.i.d. case, the DFTs and the normalized periodogram ordinates are not asymptotically independent.

4.6.2.4 Refined Covariance Bounds for Long-Memory Sequences

One can obtain the following asymptotic independence of the DFT and periodogram ordinates if the Fourier frequencies λ j are not too close to zero.

Recall that f X (λ)∼c f |λ|−2d and let

$$d_{n,X}^{0} ( \lambda ) =\frac{d_{n,X} ( \lambda ) }{\sqrt{c_{f}\lambda^{-2d}}}$$

and γ X (k)=cov(X t ,X t+k ). Then the following holds.

Theorem 4.32

Let \(X_{t}=\sum _{j=0}^{\infty }a_{j}\varepsilon_{t-j}\) be a second-order stationary linear process with

$$ f_{X} ( \lambda ) =\bigl \vert 1-\exp ( -i\lambda ) \bigr \vert ^{-2d} f_{\ast}(\lambda)\approx \vert \lambda \vert ^{-2d} f_{\ast}(\lambda)\approx c_{f} \vert \lambda \vert ^{-2d} $$
(4.144)

and such that

$$ f_{X}(\lambda)=c_{f}\vert \lambda \vert ^{-2d} +O \bigl( \lambda^{\rho -2d} \bigr) $$
(4.145)

for some 0<ρ≤2 and \(-\frac{1}{2}<d<\frac{1}{2}\). Let j n , k n be positive integer-valued sequences such that j n /n→0 and j n >k n . Then,

(4.146)

and

$$ \mathit{cov} \bigl( d_{n,X}^{0} ( \lambda_{j_{n}} ) ,d_{n,X}^{0} ( \lambda_{k_{n}} ) \bigr) =O \biggl( \frac{\log j_{n}}{k_{n}} \biggr) . $$
(4.147)

Before we proceed with the proof, we comment on assumption (4.145). This is a smoothness condition for f . For example, if ρ=2, then f is twice differentiable in the neighbourhood of the origin. This type of condition is crucial in studying for example semiparametric estimators of d.

Proof

The essential arguments can be seen by considering (4.146). Condition (4.145) implies

so that

$$\frac{f_{X}(\lambda_{j})}{c_{f}\lambda_{j}^{-2d}}=1+O \biggl( \biggl( \frac{j}{n} \biggr)^{\rho} \biggr) . $$

In a second step, one shows

$$ E \bigl[ d_{n,X} ( \lambda_{j} ) \,\overline{d_{n,X} ( \lambda_{j} ) } \,\bigr] =f_{X}(\lambda_{j})+O \biggl( \lambda_{j}^{-2d}\frac{\log j}{j} \biggr), $$
(4.148)

so that

$$E \biggl[ \frac{d_{n,X} ( \lambda_{j} )\,\overline{d_{n,X} ( \lambda_{j} ) }}{f_{X}(\lambda_{j})} \biggr] =1+O \biggl( \frac{\log j}{j} \biggr). $$

To show (4.148), we use the general formula for the covariance of DFT; see (4.135). Since K n is 2π-periodic with \(\int_{-\pi}^{\pi}K_{n}(u)\,du=1\), we obtain

$$ E \bigl[ d_{n,X} ( \lambda_{j} ) \,\overline{d_{n,X} ( \lambda_{j} ) } \,\bigr] -f_{X}(\lambda_{j})=\int _{-\pi}^{\pi} \bigl[ f_{X}( \lambda)-f_{X}(\lambda_{j}) \bigr] K_{n} ( \lambda- \lambda_{j} ) \,d\lambda. $$
(4.149)

Now, for n large enough, λ j is smaller than δ/2, so that

$$f_{X}(\lambda_{j})\leq c_{\delta}\lambda_{j}^{-2d}, \qquad\bigl \vert f_{X}^{\prime}(\lambda_{j})\bigr \vert \leq c_{\delta}\lambda_{j}^{-2d-1}$$

for a suitable finite constant c δ . Noting that K n (u)=O(n −1) for δ/2<uπ, we obtain

For \(0<d<\frac{1}{2}\), this is of order O((j/n)1−2dj −1)=o(j −1logj). Similarly, for \(-\frac{1}{2}<d<\frac{1}{2}\), the overall order is O(n −1)=O((j/n)j −1)=o(j −1logj). Therefore, the only relevant range of integration in (4.143) is −δλδ. There are two asymptotic poles that are approached asymptotically on the right-hand side of (4.149): a pole in f X for λ j →0 and an asymptotic singularity in K n (λλ j ) for λ=λ j . The largest order is obtained for the integral over \(\varDelta _{n}=[ \frac{1}{2}\lambda _{j},2\lambda_{j}] \). There, we have

Since |D n (u)|≤2|u|−1 (0<|u|<π), we have

$$\int_{-c\lambda_{j}}^{c\lambda_{j}}\bigl \vert D_{n}(\lambda) \bigr \vert \,d\lambda=O ( \log j ) $$

for any fixed c>0. Moreover, \(\lim_{\lambda\rightarrow\lambda _{j}}| \lambda-\lambda_{j}| K_{n}( \lambda-\lambda_{j}) =0\), and we obtain

and thus,

$$J(\lambda_{j})=O \bigl( n^{-1}\log j \bigr) . $$

Putting the orders together, we have

as required in (4.148). □

4.6.3 Limiting Distribution

4.6.3.1 Fourier Transform and Periodogram for Long-Memory Sequences

Now, we will describe the limiting distribution for the DFT and the periodogram ordinates. Let us write d n,X (λ j )=A(λ j )+iB(λ j ), where

$$A(\lambda)=\frac{1}{\sqrt{2\pi n}}\sum_{t=1}^{n}X_{t}\cos(t\lambda) ,\qquad B(\lambda)=\frac{1}{\sqrt{2\pi n}}\sum_{t=1}^{n}X_{t}\sin(t\lambda). $$

Then I n,X (λ j )=A 2(λ j )+B 2(λ j ). Assume for simplicity that X t is a Gaussian process. It follows from (4.147) that for each fixed K,

$$\biggl( \frac{d_{n,X}(\lambda_{j})}{\sqrt{f_{X}(\lambda_{j})}},j=1,\ldots ,K \biggr) $$

converges to a multivariate Gaussian distribution with dependent components and covariance matrix [γ w (l,k;d)] k,l=1,…,K . Furthermore, for each fixed j, the cosine and the sine parts A(λ j ) and B(λ j ) are uncorrelated with different variances. Therefore,

$$ \frac{I_{n,X}(\lambda_{j})}{f_{X}(\lambda_{j})}=\frac{A^{2}(\lambda_{j})}{f_{X}(\lambda_{j})}+\frac{B^{2}(\lambda_{j})}{f_{X}(\lambda_{j})}\underset{d}{\rightarrow}a\chi_{1}^{2}(1)+b\chi_{1}^{2}(2), $$
(4.150)

where a,b are constants, and \(\chi_{1}^{2}(j)\), j=1,2, are independent χ 2 random variables with one degree of freedom. Thus, in contrast to the i.i.d. case, the normalized periodogram ordinates have a different asymptotic distribution at each frequency. Moreover, the limiting distribution has dependent components.

4.6.3.2 Sum of Periodogram Ordinates

Let ϕ be a deterministic, real-valued function and consider the partial sum

$$S_{n,X}(\phi)=\sum_{j=1}^{N_{n}}\phi \bigl(I_{n,X}(\lambda_{j})\bigr), $$

where N n =[(n−1)/2]. If X t =ε t are i.i.d., then (cf. (4.141))

$$\operatorname {var}\Biggl( \sum_{j=1}^{N_{n}}2\pi I_{n,\varepsilon}(\lambda_{j}) \Biggr) \approx n(1+\kappa_{4}/2). $$

Also,

$$n^{-1/2}\sum_{j=1}^{N_{n}} \bigl( 2\pi I_{n,\varepsilon}(\lambda_{j})-E \bigl[ 2\pi I_{n,\varepsilon}( \lambda_{j}) \bigr] \bigr) \underset {d}{\rightarrow}N(0,1+\kappa_{4}/2). $$

These asymptotic results are obvious when ε t are Gaussian since the periodogram ordinates are independent. If ϕ=log and ε t are Gaussian, then

$$\operatorname {var}\bigl(\log\bigl(2\pi I_{n,\varepsilon}(\lambda_{j})\bigr)\bigr)=\operatorname {var}\bigl(\log\bigl(I_{n,\varepsilon }(\lambda_{j})/f_{\varepsilon}(\lambda_{j})\bigr)\bigr)=\operatorname {var}\bigl(\log(Z)\bigr), $$

where Z is standard exponential. We compute

$$\operatorname {var}\bigl(\log(Z)\bigr)=\int_{0}^{\infty} \bigl( \log z \bigr)^{2}e^{-z}\,dz- \biggl( \int_{0}^{\infty} ( \log z ) e^{-z}\,dz \biggr)^{2}=\frac{\pi^{2}}{6}. $$
(4.151)

Therefore, in the Gaussian i.i.d. case,

$$n^{-1/2}\sum_{j=1}^{N_{n}} \bigl( \log\bigl( 2\pi I_{n,\varepsilon}(\lambda _{j})\bigr) -E \bigl[ \log\bigl( 2\pi I_{n,\varepsilon}(\lambda_{j})\bigr) \bigr] \bigr) \underset{d}{\rightarrow}N\bigl(0,\pi^{2}/6\bigr). $$

In the long-memory case, the periodogram ordinates are asymptotically dependent, so that these convergence results are not valid. However, for a proper choice of asymptotically negligible constants c n,k , it is possible to obtain asymptotic normality of ∑c n,k ϕ(I n,X (λ k )) regardless of whether X t is weakly or strongly dependent. We will illustrate this in the context of semiparametric estimation of the long-memory parameter d.
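As a concrete, purely illustrative instance of such a statistic, the following sketch implements a simple log-periodogram regression of Geweke–Porter-Hudak type: log I n,X (λ j ) is regressed on −2 log λ j over the first m Fourier frequencies, and the slope estimates d because log f X (λ) is approximately constant −2d log λ near the origin. The bandwidth m=n 0.65 and the simulation settings are arbitrary choices, not recommendations taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def farima_0d0(n, d, burn=1000):
    """Gaussian FARIMA(0,d,0) via its truncated MA(infinity) representation."""
    m = n + burn
    a = np.ones(m)
    for j in range(1, m):
        a[j] = a[j - 1] * (j - 1 + d) / j
    return np.convolve(rng.standard_normal(m), a)[:m][burn:]

def log_periodogram_d(x, m):
    """Regress log I_n(lambda_j) on -2 log(lambda_j), j = 1, ..., m; the slope estimates d
    because log f_X(lambda) is approximately constant - 2 d log(lambda) near the origin."""
    n = len(x)
    j = np.arange(1, m + 1)
    lam = 2.0 * np.pi * j / n
    I = np.abs(np.fft.fft(x)[j]) ** 2 / (2.0 * np.pi * n)
    y, r = np.log(I), -2.0 * np.log(lam)
    return np.cov(r, y, bias=True)[0, 1] / np.var(r)

d_true, n = 0.3, 4096
x = farima_0d0(n, d_true)
print("log-periodogram estimate of d:", np.round(log_periodogram_d(x, m=int(n ** 0.65)), 3))
```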

4.7 Limit Theorems for Wavelets

4.7.1 Introduction

In this section we discuss limit theorems for the discrete wavelet transform of long-memory stochastic processes. We refer to Sect. 3.5 for basic definitions of wavelets. At this point we recall that for a scaling function ϕ and a wavelet function ψ, dilated and translated functions are defined as

$$\phi_{j,k}(x)=2^{j/2}\phi\bigl(2^{j}x-k\bigr),\qquad \psi_{j,k}(x)=2^{j/2}\psi \bigl(2^{j}x-k\bigr). $$

However, it is not necessary that the wavelet functions are constructed using the multiresolution analysis, nor that they are orthogonal.

4.7.2 Discrete Wavelet Transform of Stochastic Processes

Assume first that Y(u) (\(u\in\mathbb{R}\)) is a continuous-time stochastic process. Define

$$d_{j,k}^{Y}=\int_{\mathbb{R}}Y(u)\psi_{j,k}(u)\,du,\qquad a_{j,k}^{Y}=\int_{\mathbb{R}}Y(u)\phi_{j,k}(u)\,du\quad(j,k\in\mathbb{Z)}. $$

In other words, \(d_{j,k}^{Y}\) and \(a_{j,k}^{Y}\) are (random) wavelet coefficients of the continuous-time process Y(u) \((u\in\mathbb{R})\). If the continuous-time process has mean zero, then clearly E(d j,k )=0 for each j,k. For simplicity, we write in the following a j,k , d j,k instead of \(a_{j,k}^{Y}\), \(d_{j,k}^{Y}\).

Assume further that Y(u) (\(u\in\mathbb{R}\)) has stationary increments. For each fixed resolution level j, the process d j,k (\(k\in\mathbb{Z}\)) is stationary. Indeed, we may verify, for instance, that the marginal distributions are invariant under translation: the random coefficient

$$d_{j,k+l}=\int Y(u)\psi_{j,k+l}(u)\,du=\int \bigl( Y\bigl(u+2^{-j}l\bigr)-Y\bigl(2^{-j}l\bigr) \bigr) \psi _{j,k}(u)\,du $$
is equal in distribution to

$$\int \bigl( Y(u)-Y(0) \bigr) \psi_{j,k}(u)\,du=\int Y(u) \psi_{j,k}(u)\,du=d_{j,k}. $$

The same applies to the scaling coefficients a j,k =∫Y(u)ϕ j,k (u) du. A more rigorous proof of stationarity can be found in e.g. Houdré (1994). See also Masry (1993) and Cambanis and Houdré (1995) for the DWT of stochastic processes.

If moreover, the process Y(u) is H-self-similar, then for each j, k,

$$d_{j,k}\overset{\mathrm{d}}{=}2^{-j(H+1/2)}d_{0,k}. $$

Indeed, heuristically,

$$d_{j,k}=2^{j/2}\int Y(u)\psi\bigl(2^{j}u-k\bigr)\,du=2^{-j/2}\int Y\bigl(2^{-j}v\bigr)\psi (v-k)\,dv\overset{\mathrm{d}}{=}2^{-j/2}2^{-jH}\int Y(v)\psi(v-k)\,dv=2^{-j(H+1/2)}d_{0,k}. $$
Hence, if the continuous-time process Y(u) \((u\in\mathbb{R})\) is self-similar with stationary increments (H-SSSI), then

$$E \bigl[ d_{j,k+l}^{2} \bigr] =2^{-j(2H+1)}E \bigl[ d_{0,k}^{2} \bigr] =2^{-j(2H+1)}E \bigl[ d_{0,0}^{2} \bigr] . $$

This applies, in particular, to fractional Brownian motion. As we will see later, these formulas can be used to define a wavelet-based estimator of the self-similarity parameter H.
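The scaling E[d j,k 2 ]=2 −j(2H+1) E[d 0,0 2 ] can be checked numerically. The following minimal sketch (my own illustration, not part of the text) uses an exact Cholesky simulation of fractional Brownian motion and Haar coefficients approximated by Riemann sums on a unit grid. The octave index in the code increases towards coarse scales, i.e. it has the opposite sign to the convention j→−∞ used here; the sample size, the octaves and the number of paths are ad hoc, and the resulting estimate of H is rough, but it illustrates the regression idea behind wavelet estimators of H.

```python
import numpy as np

rng = np.random.default_rng(7)

def fbm_cholesky_factor(n, H):
    """Cholesky factor of the fBm covariance 0.5*(s^{2H} + t^{2H} - |t-s|^{2H}) on t = 1..n."""
    t = np.arange(1, n + 1, dtype=float)
    cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * H))
    return np.linalg.cholesky(cov)

def haar_detail_vars(y, octaves):
    """Riemann-sum approximation of Haar coefficients at scale 2**j of a path sampled on a
    unit grid; here larger j means coarser scale (reversed sign compared with the text)."""
    vs = []
    for j in octaves:
        w = 2 ** j
        nb = len(y) // w
        b = y[: nb * w].reshape(nb, w)
        d = 2.0 ** (-j / 2.0) * (b[:, : w // 2].sum(axis=1) - b[:, w // 2:].sum(axis=1))
        vs.append(d.var())
    return np.array(vs)

H, n, octaves, npaths = 0.7, 2048, np.arange(3, 8), 8
L = fbm_cholesky_factor(n, H)
v = np.mean([haar_detail_vars(L @ rng.standard_normal(n), octaves) for _ in range(npaths)], axis=0)
slope = np.polyfit(octaves, np.log2(v), 1)[0]   # theory: slope = 2H + 1
print("slope:", np.round(slope, 2), "  implied H:", np.round((slope - 1) / 2, 2))
```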

4.7.3 Second-Order Properties of Wavelet Coefficients

Now, we turn our attention to stationary processes X(u) (\(u\in\mathbb{R}\)). For example, X(u)=Y(u)−Y(u−1) (\(u\in\mathbb{R}\)) can be defined as increments of the H-SSSI process considered above. Define analogously wavelet and scaling coefficients:

$$d_{j,k}=\int_{\mathbb{R}}X(u)\psi_{j,k}(u)\,du,\qquad a_{j,k}=\int_{\mathbb{R}}X(u)\phi_{j,k}(u)\,du\quad(j,k\in\mathbb{Z}). $$
Then d j,k and a j,k (\(k\in\mathbb{Z}\)) form stationary sequences. We verify for instance that the marginal distributions are shift-invariant: for \(l\in\mathbb{Z}\), we have

Hence, we can analyse the covariance structure of the stationary sequence d j,k (\(k\in\mathbb{Z}\)). Assume that the process X(u) (\(u\in \mathbb{R}\)) is centred, has the covariance function γ X (s) (\(s\in\mathbb{R}\)) and the spectral density

$$f_{X}(\lambda)=\int_{-\infty}^{\infty}\gamma_{X}(s)e^{-i\lambda s}\,ds. $$

Assume further that

$$f_{X}(\lambda)=\lambda^{-2d}f_{\ast}(\lambda),\quad\lambda\rightarrow0, $$

where lim λ→0 f (λ)=c f ∈(0,∞) and d∈[0,1/2). For example, X(u) could be fractional Gaussian noise, i.e. increments of fractional Brownian motion with Hurst parameter \(H=d+\frac{1}{2}\).

One of the most intriguing properties of DWT is the decorrelation (whitening) property. Specifically, if the wavelet ψ has M vanishing moments, then we will argue below that

$$\mathit{cov}(d_{j,0},d_{j,k})=O\bigl(k^{-2M+2d-1}\bigr)\quad(k\rightarrow\infty). $$

That is, the stationary sequence d j,k (\(k\in\mathbb{Z}\)) is weakly dependent (i.e. has summable covariances) if M≥1. For example, the whitening property applies to fractional Gaussian noise X(u)=B H (u)−B H (u−1), where B H (u) is a fractional Brownian motion with Hurst parameter H∈(1/2,1). This phenomenon is discussed for instance in Flandrin (1992), Tewfik and Kim (1992), Abry et al. (1998) or Mielniczuk and Wojdyłło (2007a).

To justify the whitening property, recall that

$$\hat{\psi}(\lambda)=\int_{-\infty}^{\infty}\psi(x)e^{-i\lambda x}\,dx $$

is the Fourier transform of ψ. Hence,

We can then evaluate the covariance structure of the wavelet coefficients of the process X(⋅) as

(4.152)

This formula is crucial for evaluating the variance and covariance structure of the wavelet coefficients of stochastic processes with long memory. A change of variables ω=2 −(j+j′)/2 λ, i.e.

$$\lambda=2^{(j+j^{\prime})/2}\omega, $$

and the form f X (λ)=λ −2d f ∗ (λ) of the spectral density yield

where

$$r=\bigl \vert 2^{(j-j^{\prime})/2}k-2^{(j^{\prime}-j)/2}k^{\prime }\bigr \vert . $$

When j,j′→−∞ (i.e. we are considering coarse resolution levels or “low frequencies”), then 2(j+j′)/2 ω→0, so that

$$f_{\ast}\bigl(2^{(j+j^{\prime})/2}\omega\bigr)\sim f_{\ast}(0)=c_{f}. $$

This motivates the following definition:

$$ \varPsi_{j,j^{\prime}}\bigl(k,k^{\prime}\bigr):=\int_{-\infty}^{\infty} \omega^{-2d}\hat{\psi}\bigl(2^{(j^{\prime}-j)/2}\omega\bigr) \overline{\hat{\psi }\bigl(2^{(j-j^{\prime })/2}\omega\bigr)}e^{-ir\omega}\,d\omega. $$
(4.153)

We note that if \((j,k)\not=(j^{\prime},k^{\prime})\) and d=0, then, due to orthogonality, the covariances vanish whenever the wavelet family ψ j,k is constructed using the MRA. As we will see below, in the case of long memory, orthogonality of the wavelets is not crucial at all. The most important property is the number M of vanishing moments of the wavelet function ψ.

To see this, let d>0 and consider j=j′ and k′=0. Then

$$\mathit{cov}(d_{j,0},d_{j,k})=2^{-2jd}\int \omega^{-2d}f_{\ast}\bigl(2^{j}\omega\bigr)\bigl \vert \hat{\psi}(\omega)\bigr \vert ^{2}e^{-ik\omega}\,d\omega. $$

Again, as j→−∞, we approximate this integral as

$$\mathit{cov}(d_{j,0},d_{j,k})=2^{-2jd}f_{\ast}(0) \int\omega^{-2d}\bigl \vert \hat {\psi }(\omega)\bigr \vert ^{2}e^{-ik\omega}\,d\omega. $$

Next, recall now from Sect. 3.5 that if the wavelet function ψ has M vanishing moments, then

$$\big|\hat{\psi}(\lambda)\big|=\big|\hat{\psi}^{(M)}(0)\big||\lambda|^{M}+o\bigl(|\lambda |^{M}\bigr)\quad(\lambda\rightarrow0). $$

Thus, if k is large enough, then we have to analyse the following integral in a neighbourhood (−ε/k,ε/k) of the origin:

$$2^{-2jd}c_{f} \bigl\{ \hat{\psi}^{(M)}(0) \bigr \}^{2}\int_{-\varepsilon /k}^{\varepsilon /k}\omega^{-2d} \omega^{2M}e^{-ik\omega}\,d\omega. $$

The change of variables λ=kω yields the approximation

$$2^{-2jd}c_{f} \bigl\{ \hat{\psi}^{(M)}(0) \bigr \}^{2}k^{-2M+2d-1}\int_{-\varepsilon }^{\varepsilon } \lambda^{2M-2d}e^{-i\lambda}\,d\lambda. $$

The integral is finite as long as 2M−2d>−1. Of course, in these computations several simplifications and informal approximations are used. Nevertheless, we have obtained heuristically the following decorrelation property.

Lemma 4.23

Assume that X(u) (\(u\in \mathbb{R}\)) is a stationary centred process such that its spectral density is given by f X (λ)=|λ|−2d f (λ), \(\lambda \in\mathbb{R}\), d∈(0,1/2) and lim λ→0 f (λ)=c f ∈(0,∞). Then for each \(j\in\mathbb{Z}\),

$$\mathit{cov}(d_{j,0},d_{j,k})=O\bigl(k^{-2M+2d-1}\bigr)\quad(k\rightarrow\infty). $$

The same result carries over to series X t (\(t\in\mathbb{Z}\)) in discrete time, when transformed into their continuous-time versions as discussed in the introduction to wavelets. In particular, the restrictions \(d<\frac{1}{2}\) and M≥1 imply that we always have cov(d j,0,d j,k )=o(k −2). This means that

$$\sum_{k=-\infty}^{\infty}\big|\mathit{cov}(d_{j,0},d_{j,k})\big|<\infty $$

and the wavelet coefficients d j,k (\(k\in\mathbb{Z}\)) are weakly dependent. Moreover, if the process X(u) (\(u\in\mathbb{R}\)) is Gaussian, then the wavelet coefficients are Gaussian as well. Also, in the Gaussian case we have

$$\mathit{cov}\bigl(d_{j,0}^{2},d_{j,k}^{2} \bigr)=2\mathit{cov}^{2}(d_{j,0},d_{j,k}), $$

so that these autocovariances are summable as well.
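A quick numerical illustration of this whitening effect (my own sketch, not taken from the text) uses the level-1 discrete Haar details, which have M=1 vanishing moment, of a simulated FARIMA(0,d,0) series; the sample size, the value of d and the chosen lags are arbitrary. The sample autocorrelations of the raw series should decay very slowly, whereas those of the detail coefficients should already be small at short lags.

```python
import numpy as np

rng = np.random.default_rng(3)

def farima_0d0(n, d, burn=2000):
    """Gaussian FARIMA(0,d,0) via its truncated MA(infinity) representation."""
    m = n + burn
    a = np.ones(m)
    for j in range(1, m):
        a[j] = a[j - 1] * (j - 1 + d) / j
    return np.convolve(rng.standard_normal(m), a)[:m][burn:]

def sample_acf(x, lags):
    """Sample autocorrelations at the given positive lags."""
    x = x - x.mean()
    return np.array([np.dot(x[:-k], x[k:]) / np.dot(x, x) for k in lags])

def haar_details_level1(x):
    """Level-1 discrete Haar details (M = 1 vanishing moment): (x_{2k} - x_{2k+1}) / sqrt(2)."""
    n = len(x) // 2 * 2
    return (x[0:n:2] - x[1:n:2]) / np.sqrt(2.0)

x = farima_0d0(20000, d=0.4)
lags = [1, 5, 10, 50]
print("ACF of X at lags", lags, ":", np.round(sample_acf(x, lags), 3))
print("ACF of Haar details      :", np.round(sample_acf(haar_details_level1(x), lags), 3))
```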

As indicated above, another very useful property is (4.153): for large enough scales, i.e. for j, j′→−∞,

$$\mathit{cov}(d_{j,k},d_{j^{\prime},k^{\prime}})\approx2^{-(j+j^{\prime})d}f_{\ast }(0) \varPsi_{j,j^{\prime}}\bigl(k,k^{\prime}\bigr). $$

Thus, the weak dependence extends to the wavelet coefficients at different resolution levels \(j\not=j^{\prime }\).

To evaluate the variance of d j,k , set j=j′, k=k′ in (4.152). Then

Again, we approximate f (2j λ)≈f (0)=c f (for j→−∞) and hence

$$ \operatorname {var}(d_{j,k})\approx2^{-2jd}c_{f}\int| \lambda|^{-2d}\big|\hat{\psi}(\lambda )\big|^{2}\,d\lambda=:2^{-2jd}c_{f} \varPsi(2d), $$
(4.154)

where

$$\varPsi(\gamma)=\int\lambda^{-\gamma}\big|\hat{\psi}(\lambda)\big|^{2}\,d\lambda. $$

This heuristic approximation has been derived in Abry et al. (1998). More precise bounds have been obtained in Lemma 1 in Bardet et al. (2000) or Theorem 1 in Moulines et al. (2007a). For instance, under a semiparametric assumption on the spectral density similar to the one used for the DFT, one obtains the following bound:

Lemma 4.24

Assume that for some d∈(0,1/2),

$$f_{X}(\lambda) =\lambda^{-2d} \bigl( f_{\ast}(0)+O \bigl(|\lambda|^{\rho}\bigr) \bigr) . $$

Under appropriate regularity conditions, we have, as j→−∞,

$$\big|\operatorname {var}(d_{j,k})-2^{-2jd}c_{f}\varPsi(2d)\big|\leq2^{-2jd}2^{j\rho}\varPsi(2d-\rho). $$

Proof

In the proof, we omit several details, referring to the papers mentioned above. We note that

$$\big|\operatorname {var}(d_{j,k})-2^{-2jd}c_{f}\varPsi(2d)\big| \leq2^{-2jd}\int|\lambda|^{-2d}\bigl \vert \bigl\{ f_{\ast} \bigl(2^{j}\lambda\bigr)-f_{\ast}(0) \bigr\} \bigr \vert \big|\hat { \psi }(\lambda)\big|^{2}\,d\lambda. $$

Under the assumption

$$f_{X}(\lambda)=\lambda^{-2d}\bigl(f_{\ast}(0)+O \bigl(|\lambda|^{\rho}\bigr)\bigr), $$

the bound is

$$2^{-2jd}\int|\lambda|^{-2d} \bigl( 2^{j}|\lambda| \bigr)^{\rho} \big|\hat {\psi }(\lambda)\big|^{2}\,d\lambda=2^{-2jd}2^{j\rho} \varPsi(2d-\rho). $$

 □

4.8 Limit Theorems for Empirical and Quantile Processes

4.8.1 Linear Processes with Finite Moments

The empirical distribution function plays an essential role in statistical inference. Many statistics that are concerned with inference for the marginal distribution of a process can be written as functionals of the (marginal) empirical distribution function F n (x). Therefore, in principle, their distribution follows “automatically”, once the empirical distribution function is characterized asymptotically. Sometimes, however, the functionals are quite involved, so that the derivation requires some additional work. Relatively simple functionals occur for instance in goodness-of-fit tests, and even more directly in quantile estimation. For obvious reasons, limiting results for quantile processes follow directly from those for the empirical distribution function.

Recall that for a stationary process X t (\(t\in\mathbb{Z}\)) with marginal distribution function F X (x)=P(Xx), a simple nonparametric estimator of F X is the (marginal) empirical distribution function

$$ F_{n,X}(x)=\frac{1}{n}\sum_{t=1}^{n}1 \{ X_{t}\leq x \} \quad (x\in\mathbb{R}). $$
(4.155)

Under very general assumptions (for example ergodicity of the sequence), F n,X is a uniformly consistent estimator of F X , which means that, as n→∞,

$$ \sup_{x\in\mathbb{R}}\big|F_{n,X}(x)-F_{X}(x)\big|\underset{p}{\rightarrow}0. $$
(4.156)

Furthermore, if X t (\(t\in\mathbb{Z}\)) are i.i.d., then the classical Donsker invariance principle states

$$ \sqrt{n}E_{n,X}(x):=\sqrt{n} \bigl[ F_{n,X}(x)-F_{X}(x) \bigr] \Rightarrow \tilde{B}\bigl(F_{X}(x)\bigr), $$
(4.157)

where ⇒ denotes weak convergence in D(−∞,∞), and \(\tilde {B}(u)\) (u∈[0,1]) is a Brownian bridge, i.e. \(\tilde{B}(u)=B(u)-uB(1)\), where B(u) is standard Brownian motion. In other words, the appropriately normalized empirical processes E n,X (x) converge weakly to the time-changed Brownian bridge. An analogous result, with the same normalizing rate but a different limiting process, holds for weakly dependent processes under very general conditions. The situation is quite different, however, under long memory. This can be seen as follows. The indicator function is a very specific transformation of X, i.e. we consider

$$G(X;x)=1 \{ X\leq x \} -F_{X} ( x ) . $$

Let \(p_{X}=F_{X}^{\prime}\) be the density of X. With the function yG(y;x) we can associate the Appell coefficients a app,j (j≥1):

Furthermore, recall also (see Definition 4.1) that G (y)=E[G(X+y)]. Applying this to G(y;x)=1{yx}, we obtain G (y)=P(Xxy), and hence,

$$G_{\infty}^{(1)}(0)=-p_{X}(x-y)|_{y=0}=-p_{X}(x). $$

Therefore, the theory for partial sums of subordinated long-memory processes (considered e.g. in Sects. 4.2, 4.3) will imply the limiting behaviour of the empirical distribution function F n,X (x) when x is fixed.

The asymptotic behaviour of the empirical process based on long-memory linear processes with finite variance was studied in Dehling and Taqqu (1989b), Giraitis and Surgailis (1999), Ho and Hsing (1996), Giraitis et al. (1997), Wu (2003) and Csörgő et al. (2006), Csörgő and Kulik (2008a, 2008b). Here, we state the result under the assumptions that are needed to apply the martingale expansion technique of Ho and Hsing (1996) and Wu (2003), as considered in Theorem 4.9. When dealing with linear processes, this technique seems to be superior to the Appell expansion.

Theorem 4.33

Let X t (\(t\in\mathbb {Z}\)) be a linear process \(X_{t}=\sum_{j=0}^{\infty }a_{j}\varepsilon_{t-j}\) with coefficients satisfying assumption (B1), i.e. a j =L a (j)j d−1, d∈(0,1/2) (so that γ X (k)∼L γ (k)k 2d−1). Also, assume that E(|ε 1|4+γ)<∞ for some γ>0 and that p ε , the density of the innovations, is such that

$$ \sup_{x\in\mathbb{R}}\big|p_{\varepsilon}^{(r)}(x)\big| +\int\big|p_{\varepsilon}^{(r)}(x)\big|^{2}\,dx<\infty\quad(r=0,1,2). $$
(4.158)

Then we have the uniform reduction principle

$$ n^{\frac{1}{2}-d}L_{1}^{-\frac{1}{2}} ( n ) \sup_{x\in\mathbb {R}}\bigl \vert F_{n,X} ( x ) -F_{X} ( x ) +p_{X}(x)\bar{x} \bigr \vert \rightarrow_{p}0. $$
(4.159)

Consequently,

$$ n^{\frac{1}{2}-d}L_{1}^{-\frac{1}{2}} ( n ) \bigl[ F_{n,X} ( x ) -F_{X}(x) \bigr] \Rightarrow p_{X}(x)Z, $$
(4.160)

where L 1(n)=(d(2d+1))−1 L γ (n), ⇒ denotes weak convergence in D(−∞,∞), and Z is a standard normal random variable.

Remark 4.4

Condition (4.158) implies that the same holds for the density p X . In particular, the conditions on \(p_{X}^{(1)}(x)\) and \(p_{X}^{(2)}(x)\) are required to control a remainder term in the second-order expansion leading to (4.159). Note also that the assumptions of the theorem can be modified to E(|ε 1|2+γ)<∞ and

$$ \bigl \vert E \bigl[ \exp ( is\varepsilon_{1} ) \bigr] \bigr \vert \leq C \bigl( 1+\vert s\vert \bigr)^{-\delta} $$
(4.161)

for some δ>0, 0<C<∞. Condition (4.161) means in principle that p X is infinitely often differentiable. These assumptions were used in Giraitis and Surgailis (1999). The authors were also able to deal with double-sided linear processes, however, at the cost of additional moment assumptions.

Remark 4.5

Under the conditions of Theorem 4.33, the finite-dimensional convergence in (4.160) follows directly from Theorem 4.9 and Corollary 4.3. Tightness is usually not proven directly, but rather follows from the reduction principle (4.159). For the latter, we refer to Dehling and Taqqu (1989b) or Csörgö, Szyszkowicz and Wang in the Gaussian case and to Ho and Hsing (1996) and Wu (2003) in the linear case.

Proof

We repeat the martingale approximation argument presented before Theorem 4.9, adapting it to the indicator function G(y;x)=1{yx}. Recall that \(\mathcal{F}_{K}=\sigma( \varepsilon_{j},j\leq K) \) is the σ-algebra generated by ε j (jK). We start with an orthogonal expansion of the indicator function,

$$1 \{ X_{t}\leq x \} -F_{X}(x)\underset{L_{X}^{2} ( \varOmega ) }{=}\sum_{j=0}^{\infty} \zeta_{t}(j), $$

where

$$\zeta_{t}(j)=P ( X_{t}\leq x|\mathcal{F}_{t-j} ) -P ( X_{t}\leq x|\mathcal{F}_{t-j-1} ) . $$

Note that \(\zeta_{t}(0)=1\{ X_{t}\leq x\} -P( X_{t}\leq x|\mathcal{F}_{t-1}) \). As before, the nice feature of this expansion is that, for fixed t, ζ t (j) (j=0,1,2,…) is a martingale difference, so that we indeed obtain orthogonality in the sense that for jj ,

$$\bigl\langle\zeta_{t}(j),\zeta_{t}\bigl(j^{\ast} \bigr) \bigr\rangle=\mathit{cov} \bigl( \zeta_{t}(j),\zeta_{t} \bigl(j^{\ast}\bigr) \bigr) =0. $$

In more concrete terms, we have

$$P ( X_{t}\leq x|\mathcal{F}_{t-j} ) =P \Biggl( \sum _{s=0}^{j-1}a_{s} \varepsilon_{t-s}\leq x-\sum_{s=j}^{\infty}a_{s} \varepsilon_{t-s} \Biggr) =F_{j} ( u_{j} ), $$

where, given \(\mathcal{F}_{t-j}\), the argument

$$u_{j}=x-\sum_{s=j}^{\infty}a_{s}\varepsilon_{t-s}$$

is fixed (of course, u j depends on t as well, but this dependence is omitted). Similarly,

$$F_{j+1} ( u_{j+1} ) =P ( X_{t}\leq x| \mathcal{F}_{t-j-1} ) =P \Biggl( \sum_{s=0}^{j}a_{s} \varepsilon_{t-s}\leq x-\sum_{s=j+1}^{\infty}a_{s} \varepsilon_{t-s} \Biggr) . $$

Note that u j+1=u j a j ε tj and

$$\zeta_{t}(j)=F_{j} ( u_{j} ) -F_{j+1} ( u_{j+1} ) . $$

A heuristic argument leads to the idea how one may obtain a linearization. We will use the notation \(p_{j}(u)=F_{j}^{\prime}(u)\) for the probability density function of \(\sum_{s=0}^{j-1}a_{s}\varepsilon_{t-s}\) and F ε (y)=P(εy). For F j+1(u j+1), we can write

$$F_{j+1} ( u_{j+1} ) =\int p_{j} ( y ) F_{\varepsilon }\bigl( q_{j}(x,y) \bigr) \,dy $$

with

$$q_{j}(x,y)=\frac{u_{j+1}(x)-y}{a_{j}}. $$

For the sake of argument, assume that a j >0 for j large enough. Since a j →0 (as j→∞), we have q j →∞ and F ε (q j (x,y))→1 if y<u j+1(x). On the other hand, q j →−∞ and F ε (q j (x,y))→0, if y>u j+1. Therefore, as j→∞,

$$F_{j+1} ( u_{j+1} ) \approx\int_{-\infty}^{u_{j+1}}p_{j} ( y ) \,dy=F_{j} ( u_{j+1} ) . $$

Furthermore, using u j =u j+1a j ε tj with a j ε tj →0 in probability as j→∞, we obtain in first approximation

$$F_{j} ( u_{j} ) \approx F_{j} ( u_{j+1} ) -p_{j} ( u_{j+1} ) a_{j}\varepsilon_{t-j},$$

so that

Finally, as j→∞, F j converges to F X (and p j to p X ) and u j+1 to x, so that we may hope to obtain the following approximation:

A precise computation establishes the rate in (4.159). □

Taking into account higher-order terms in the Taylor expansions above, a complete orthogonal decomposition can be obtained:

$$ F_{n,X}(x)-F_{X}(x)=\frac{1}{n}\sum _{t=1}^{n}\sum_{r=1}^{\infty} ( -1 )^{r}F_{X}^{(r)}(x)V_{t,r} $$
(4.162)

with

$$V_{t,r}=\sum_{0\leq j_{1}<j_{2}<\cdots<j_{r}}^{\infty}\prod _{s=1}^{r}a_{j_{s}}\varepsilon_{t-j_{s}}, $$

already defined in (4.51).

Theorem 4.33 is remarkable not only because of the slower rate of convergence under long memory, but also because the asymptotic process p X (x)Z (in x) is degenerate. The entire sample path is determined by one normal variable Z and a deterministic function p X (x). In other words, all sample paths have the shape of p X (x)! This is in sharp contrast to the case of weak memory where the asymptotic process is proportional to a Brownian bridge (see (4.157) above).
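The degenerate shape of the limit can be made visible numerically. The following sketch (illustrative only; the FARIMA model, sample size and grid are ad hoc choices, and the strength of the contrast depends on the simulated realization) compares sup x |E n,X (x)| with the remainder in the uniform reduction principle (4.159) for a Gaussian FARIMA(0,d,0) series with known N(0,s 2) marginal; the remainder is typically an order of magnitude smaller, i.e. E n,X (⋅) is essentially −p X (⋅) times the sample mean.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(11)

def farima_0d0(n, d, burn=2000):
    """Gaussian FARIMA(0,d,0) and the variance of its (truncated) N(0, s2) marginal."""
    m = n + burn
    a = np.ones(m)
    for j in range(1, m):
        a[j] = a[j - 1] * (j - 1 + d) / j
    x = np.convolve(rng.standard_normal(m), a)[:m][burn:]
    return x, float(np.sum(a ** 2))

n, d = 20000, 0.3
x, s2 = farima_0d0(n, d)
s = np.sqrt(s2)
grid = np.linspace(-3 * s, 3 * s, 61)
# Centred empirical process E_n(x) = F_n(x) - F_X(x) for the Gaussian N(0, s2) marginal:
E_n = np.array([np.mean(x <= g) - 0.5 * (1 + erf(g / (s * np.sqrt(2)))) for g in grid])
p_X = np.exp(-grid ** 2 / (2 * s2)) / (s * np.sqrt(2 * np.pi))
# Uniform reduction principle (4.159): E_n(x) is close to -p_X(x) * xbar, uniformly in x.
print("sup_x |E_n(x)|                  :", np.round(np.abs(E_n).max(), 4))
print("sup_x |E_n(x) + p_X(x)*mean(x)| :", np.round(np.abs(E_n + p_X * x.mean()).max(), 4))
```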

The convergence (4.160) can be extended further. In addition to (4.158) (which covers r=0,1,2), assume that it also holds with r=3. Then the following holds:

  • If d∈(1/4,1/2), then

    $$ n^{1-2d}L_{2}^{-1/2} ( n ) \bigl[ F_{n,X} ( x ) -F_{X}(x)+p_{X}(x)\bar{x} \bigr] \Rightarrow p_{X}^{(1)}(x)Z_{2,H}(1), $$
    (4.163)

    where Z 2,H (1) is the Hermite–Rosenblatt random variable, and H=d+1/2.

  • If d∈(0,1/4), then

    $$ \sqrt{n} \bigl[ F_{n,X} ( x ) -F_{X}(x)+p_{X}(x) \bar{x} \bigr] \Rightarrow Z(x), $$
    (4.164)

    where Z(⋅) is a Gaussian process.

Essentially, these convergence results are very similar to the case of nonlinear functionals. The asymptotic behaviour of

$$F_{n,X}\left( x\right) -F_{X}(x)+p_{X}(x)\bar{x}$$

is determined by \(\frac{1}{2}p_{X}^{(1)}(x)n^{-1}U_{n,2}\), where

$$U_{n,2}=2!\sum_{t=1}^{n}\sum_{0\leq j_{1}<j_{2}}^{\infty}a_{j_{1}}a_{j_{2}}\varepsilon_{t-j_{1}}\varepsilon_{t-j_{2}}$$

is defined in (4.51).

Furthermore, Theorem 4.33 can be extended to subordinated processes \(Y_{t}=\tilde {G}(X_{t})\). As expected from Theorem 4.4 (Gaussian case) or Theorem 4.8 (the linear case), the rate of convergence and the asymptotic distribution depend on the Appell (or, equivalently, the power) rank of

$$G(X;x)=1 \bigl\{ \tilde{G}(X)\leq x \bigr\} -F_{Y}(x). $$

The limiting process is a Hermite–Rosenblatt random variable multiplied by a deterministic function.

4.8.2 Applications and Extensions

4.8.2.1 Quantile Processes and Trimmed Sums

Weak convergence (4.160) for empirical processes based on LRD linear sequences has immediate implications for sample quantiles. For y∈(0,1), define the quantile function

$$Q_{X}(y)=F_{X}^{-1}(y)=\inf\bigl \{x:F_{X}(x)\geq y\bigr\}. $$

We will assume that F X and Q X are differentiable, so that

$$Q_{X}(y)=\inf\bigl\{x:F_{X}(x)=y\bigr\}. $$

In an analogous manner, the empirical quantile function is defined as \(Q_{n,X}(y)=F_{n,X}^{-1}(y)\) with F n,X defined in (4.155). By definition, Q n,X is left-continuous. Noting that for x=Q X (y),

$$Q_{X}^{\prime}(y)=\frac{1}{p_{X} ( x ) }, $$

(4.160) implies

$$ L_{1}^{-\frac{1}{2}} ( n ) n^{\frac{1}{2}-d} \bigl[ Q_{n,X}(y)-Q_{X}(y) \bigr] \Rightarrow Z, $$
(4.165)

where Z is a standard normal random variable, and the convergence is in D[a,b] equipped with the sup-norm for 0<a<b<1. It is remarkable that the limiting variable does not depend on y (this is of course due to the degenerate structure of the limiting process in (4.160)). A detailed evaluation and further extensions can be found in Ho and Hsing (1996), Wu (2005), Csörgő et al. (2006), Youndjé and Vieu (2006), Csörgő and Kulik (2008a, 2008b) or Coeurjolly (2008a, 2008b).
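The y-free limit in (4.165) has a simple finite-sample interpretation: to first order the empirical distribution function is a location shift of F X by the sample mean, so all empirical quantiles are shifted by roughly the same amount. The sketch below (my own illustration; a Gaussian FARIMA(0,d,0) series with known marginal, and sample size, d and the chosen probability levels are ad hoc) makes this visible.

```python
import numpy as np

rng = np.random.default_rng(17)

def farima_0d0(n, d, burn=2000):
    """Gaussian FARIMA(0,d,0) and the variance of its (truncated) N(0, s2) marginal."""
    m = n + burn
    a = np.ones(m)
    for j in range(1, m):
        a[j] = a[j - 1] * (j - 1 + d) / j
    x = np.convolve(rng.standard_normal(m), a)[:m][burn:]
    return x, float(np.sum(a ** 2))

n, d = 20000, 0.3
x, s2 = farima_0d0(n, d)
ys = [0.10, 0.25, 0.50, 0.75, 0.90]
z = np.array([-1.2816, -0.6745, 0.0, 0.6745, 1.2816])   # standard normal quantiles Phi^{-1}(y)
Q_true = np.sqrt(s2) * z                                 # quantiles of the N(0, s2) marginal
Q_emp = np.quantile(x, ys)
# Degenerate limit (4.165): the quantile errors are roughly the same for every y,
# namely close to the sample mean of the series.
print("Q_n(y) - Q_X(y):", np.round(Q_emp - Q_true, 3))
print("sample mean    :", np.round(x.mean(), 3))
```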

The result for the quantile function can be extended to trimmed sums

$$ T_{n,h}:=\frac{1}{n-2[nh]}\sum_{t=[nh]+1}^{n-[nh]}X_{t:n}, $$
(4.166)

where h∈(0,1/2), and X 1:n X 2:n ≤⋯≤X n:n are the order statistics. Then

$$L_{1}^{-\frac{1}{2}} ( n ) n^{\frac{1}{2}-d}T_{n,h} \rightarrow_{d}Z. $$

See Ho and Hsing (1996), Wu (2003) or Kulik and Ould Haye (2008).

Note, however, that the weak convergence (4.165) cannot be extended to the whole interval (0,1). Similarly, the corresponding result for the trimmed sums (4.166) does not hold for sums of extremes \(\sum_{t=1}^{[nh]}X_{t:n}\) or \(\sum_{t=n-[nh]}^{n}X_{t:n}\). There, the limiting behaviour depends on an interplay between the dependence parameter d and the heaviness of the tails of the random variables X t . We refer to Kulik (2008a) for details. Similar issues will be discussed in Sect. 4.8.5 in connection with tail empirical processes.

4.8.2.2 Goodness-of-Fit Test

An immediate consequence for statistical inference is for instance an unusual behaviour of the Kolmogorov–Smirnov statistic, namely

$$ L_{1}^{-\frac{1}{2}}(n)n^{\frac{1}{2}-d}T_{\mathrm{KS},n}:=L_{1}^{-\frac {1}{2}}(n)n^{\frac{1}{2}-d} \sup_{x\in\mathbb{R}}\bigl \vert F_{n,X} ( x ) -F_{X} ( x ) \bigr \vert \underset{d}{\rightarrow }\vert Z\vert \sup_{x\in\mathbb{R}}p_{X}(x), $$
(4.167)

given that \(\sup_{x\in\mathbb{R}}p_{X}(x)<\infty\). Therefore, we may approximate p-values by

$$ P ( T_{\mathrm{KS},n}>u ) \approx2\bar{\varPhi} \biggl( \frac{u}{\sup_{x\in\mathbb{R}}p_{X}(x)}L_{1}^{-\frac{1}{2}}(n)n^{\frac{1}{2}-d} \biggr), $$
(4.168)

where u≥0, Φ is the cumulative standard normal distribution, and \(\bar{\varPhi}=1-\varPhi\). Note in particular that for a given density, the value \(\sup_{x\in\mathbb{R}}p_{X}(x)\) is known. Of course, in general one has to estimate the dependence parameter d.
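A minimal sketch of how the p-value approximation (4.168) would be evaluated in practice is given below, assuming that d, the slowly varying factor L 1 (n) and sup x p X (x) are known or have been estimated; the numerical values are hypothetical and serve only as an illustration.

```python
from math import erfc, sqrt, pi

def ks_pvalue_longmemory(u, n, d, L1, sup_density):
    """Approximate P(T_KS,n > u) via (4.167)/(4.168): T_KS,n is roughly
    |Z| * sup p_X * L_1^{1/2}(n) * n^{d - 1/2} with Z standard normal."""
    z = u * n ** (0.5 - d) * L1 ** (-0.5) / sup_density
    return erfc(z / sqrt(2.0))          # = 2 * (1 - Phi(z))

# Hypothetical inputs: a Gaussian marginal with variance 2 (so sup p_X = 1/sqrt(4*pi)),
# memory parameter d = 0.3 and the slowly varying factor L_1(n) taken as the constant 1.5.
print(ks_pvalue_longmemory(u=0.05, n=2000, d=0.3, L1=1.5, sup_density=1.0 / sqrt(4 * pi)))
```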

In contrast, for weakly dependent processes, the limiting distribution involves the supremum of the transformed Brownian bridge \(\tilde{B}\circ F\), i.e. the supremum of \(\tilde{B}\) over the interval [0,1].

4.8.3 Empirical Processes with Estimated Parameters

Consider the assumptions of Theorem 4.33. As mentioned previously, a direct statistical application of the limiting behaviour of the empirical process is the Kolmogorov–Smirnov statistic, as established in (4.167). As explained in (4.168), this result can be used, in principle, to test whether the marginal distribution F X of an observed series X 1,…,X n is equal to a specific distribution F 0. Usually, however, one needs to test whether F X belongs to a certain type of distributions, instead of one fixed F 0. For instance, we would like to test whether F X is in a parametric family \(\{F_{X}(\cdot,{\theta}),\theta\in{\mathbb{R}}\}\), without specifying the parameter θ a priori. The nuisance parameter θ has to be estimated from the observed series. Thus, instead of T KS(θ)=T KS,n (θ), one considers

$$T_{\mathrm{KS}} ( \hat{\theta} ) =\sup_{x\in{\mathbb{R}}} \big|F_{n,X}(x)-F_{X}(x;\hat{\theta})\big|, $$

where \(\hat{\theta}\) is a suitable estimate of θ. If the observations are i.i.d., then the rate of convergence for both the original Kolmogorov–Smirnov statistic T KS =T KS (θ) and \(T_{\mathrm{KS}}( \hat{\theta}) \) is the same, though the variances of the limiting distributions are different.

To show what may happen in the long-memory case, let us consider a sequence Y t =X t +μ (\(t\in\mathbb{N}\)). Clearly, F Y (x)=F X (x;μ)=F X (xμ). The empirical processes

$$E_{n,X}(x)=F_{n,X}(x)-F_{X}(x)=\frac{1}{n}\sum _{t=1}^{n}1 \{ X_{t}\leq x \} -F_{X}(x) $$

and

$$E_{n,Y}(x;\mu):=F_{n,Y}(x)-F_{Y}(x)=\frac{1}{n} \sum_{t=1}^{n}1 \{ Y_{t}\leq x \} -F_{Y}(x) $$

are related by

$$ E_{n,Y}(x;\mu)=E_{n,X} ( x-\mu ) . $$
(4.169)

On account of (4.160), \(L_{1}^{-\frac {1}{2}}( n) n^{\frac{1}{2}-d}E_{n,Y}(x;\mu)\) converges weakly to p X (x−μ)Z. Now, consider instead

$$E_{n,Y}(x;\hat{\mu})=F_{n,Y}(x)-F_{X}(x;\hat{\mu}). $$

We will use the estimate \(\hat{\mu}=\bar{y}\), so that \(\hat{\mu}-\mu =\bar{x}\). We then write

Now, we apply Taylor’s expansion to obtain

where R n is of a smaller order than \(\bar{x}^{2}\). Furthermore, the reduction principle (4.159) implies

$$n^{\frac{1}{2}-d}L_{1}^{-\frac{1}{2}} ( n ) \sup_{x\in\mathbb{R}} \bigl \vert E_{n,X}(x-\mu)+p_{X}(x-\mu)\bar{x}\bigr \vert \rightarrow_{p}0. $$

Thus,

where the bound o P (1) is uniform in x given that \(\sup_{x\in \mathbb{R}}|p_{X}^{(2)}(x)|<\infty\). In other words, the empirical processes E n,Y (⋅ ;μ) and \(E_{n,Y}(\cdot\,;\hat{\mu})\) have different rates of convergence. Surprisingly, plugging in the parameter estimate improves the rate of convergence of the empirical process and therefore of goodness-of-fit tests such as the Kolmogorov–Smirnov or Anderson–Darling tests (Beran and Ghosh 1991; Ho 2002; Kulik 2009). The precise convergence rates are described in the following theorem.

Theorem 4.34

Assume that the conditions of Theorem 4.33 are fulfilled. Additionally, assume that (4.158) holds with r=3.

  • If d∈(1/4,1/2) then

    $$ n^{1-2d}L_{1}^{-1/2}(n)E_{n,Y}(x;\hat{\mu}) \Rightarrow p_{X}^{(1)} ( x-\mu ) \biggl( Z_{2}- \frac{1}{2}Z_{1}^{2} \biggr) , $$
    (4.170)

    where Z 1 and Z 2 are uncorrelated random variables, Z 1N(0,1), and Z 2=Z 2,H (1) is the Hermite–Rosenblatt variable.

  • If d∈(0,1/4) then

    $$ \sqrt{n}E_{n,Y}(x;\hat{\mu})\Rightarrow Z ( x-\mu ) , $$
    (4.171)

    where Z(⋅) is a Gaussian process.

Remark 4.6

The limiting Gaussian process has a rather complicated covariance structure. Nevertheless, the result (4.171) suggests that for d∈(0,1/4), we can apply standard resampling techniques available for weakly dependent data, see Chap. 10.

To shed some light on the results of Theorem 4.34, consider the case d∈(1/4,1/2). The expression for the limiting process follows essentially from the approximation

$$E_{n,Y}(x;\hat{\mu})\approx \bigl\{ E_{n,X}(x- \mu)+p_{X}(x-\mu)\bar{x} \bigr\} +\frac{1}{2}p_{X}^{(1)}(x- \mu)\bar{x}^{2}. $$

Now, the result follows from (4.163) and the limiting behaviour of the sample mean.

Furthermore, the limiting behaviour may change if different estimators of the mean μ are considered or if one considers a location-scale family Y=μ+σX (see Beran and Ghosh 1991; Ho 2002; Kulik 2009).

4.8.4 Linear Processes with Infinite Moments

As noted above, finite-dimensional convergence of the appropriately scaled empirical process E n,X (x)=F n,X (x)−F X (x) follows from the result for partial sums of subordinated linear processes, by considering the function y↦G(y;x)=1{y≤x}. We will apply the same idea to linear processes \(X_{t}=\sum_{j=0}^{\infty}a_{j}\varepsilon_{t-j}\) with i.i.d. symmetric infinite-variance innovations, i.e.

$$ P(\varepsilon_{1}>x)\sim A\frac{1+\beta}{2}{x^{-\alpha}},\qquad P(\varepsilon_{1}<-x)\sim A\frac{1-\beta}{2}x^{-\alpha} $$
(4.172)

with β=0. The general result mimics Theorem 4.17. We established there that for 0<d<1−1/α, we have

$$n^{-H}\sum_{t=1}^{[nu]} \bigl\{ G(X_{t})-E \bigl[ G(X_{1}) \bigr] \bigr\} \Rightarrow A^{1/\alpha}C_{\alpha}^{-1/\alpha}\frac{c_{a}}{d}G_{\infty}^{(1)}(0) \tilde{Z}_{H,\alpha}(u), $$

where \(\tilde{Z}_{H,\alpha}(\cdot)\) is a linear fractional stable motion with H=d+α −1 and G (y)=E[G(X+y)]. Setting u=1 and evaluating G (y)=P(Xxy), \(G_{\infty}^{(1)}(0)=-p_{X}(x)\), we may conclude that for a fixed \(x\in\mathbb{R}\),

$$n^{-H}\sum_{t=1}^{n} \bigl( 1 \{X_{t}\leq x\}-P(X_{1}\leq x) \bigr) \overset{\mathrm{d}}{ \rightarrow}A^{1/\alpha}C_{\alpha}^{-1/\alpha}\frac{c_{a}}{d}p_{X}(x)\tilde{Z}_{H,\alpha}(1). $$

This can be extended to convergence of the process E n,X (x) (\(x\in\mathbb{R}\)), see Koul and Surgailis (2001).

Theorem 4.35

Assume that X t (\(t\in\mathbb{Z}\)) is a linear process with a j c a j d−1,

$$0<d<1-1/\alpha, $$

and ε t (\(t\in\mathbb{Z}\)) are i.i.d. symmetric random variables such that (4.89) holds with α∈(1,2) and β=0:

$$P(\varepsilon_{1}>x)\sim A\frac{1+\beta}{2}{x^{-\alpha}},\qquad P(\varepsilon_{1}<-x)\sim A\frac{1-\beta}{2}x^{-\alpha}. $$

Furthermore, assume that the distribution F ε of ε 1 is such that

$$\big|F_{\varepsilon}^{(2)}(x)\big|\leq C\bigl(1+|x|\bigr)^{-\alpha},\qquad \big|F_{\varepsilon }^{(2)}(x)-F_{\varepsilon}^{(2)}(y)\big|\leq C|x-y|\bigl(1+|x|\bigr)^{-\alpha}, $$

where |xy|<1, \(x\in\mathbb{R}\). Then

$$ n^{1-H}E_{n,X}(x)\Rightarrow A^{1/\alpha}C_{\alpha}^{-1/\alpha}\frac {c_{a}}{d}p_{X}(x)\tilde{Z}_{H,\alpha}(1), $$
(4.173)

where \(\tilde{Z}_{H,\alpha}(1)\) is a symmetric α-stable random variable with scale η given by

$$\eta= \biggl( \int_{-\infty}^{1} \bigl\{ (1-v)_{+}^{d}-(-v)_{+}^{d} \bigr\}^{\alpha}\,dv \biggr)^{1/\alpha}. $$

4.8.5 Tail Empirical Processes

Let X t (\(t\in\mathbb{Z}\)) be a stationary sequence with marginal distribution F X . More specifically, we shall assume that X t is the stochastic volatility model considered in Sect. 4.3.4. Recall that the model is X t =ξ t σ t (\(t\in\mathbb{Z}\)), where

$$\sigma_{t}=\sigma(\zeta_{t}),\quad\zeta_{t}=\sum_{j=1}^{\infty}a_{j}\varepsilon_{t-j}, $$

and σ(⋅) is a positive function. It is assumed that ξ t (\(t\in\mathbb{Z}\)) is a sequence of i.i.d. random variables such that

$$ P(\xi_{1}>x)\sim A\frac{1+\beta}{2}{x^{-\alpha}},\qquad P(\xi _{1}<-x)\sim A\frac{1-\beta}{2}x^{-\alpha}. $$
(4.174)

Also, we assume that the sequences ξ t (\(t\in\mathbb{Z}\)) and ε t (\(t\in\mathbb{Z}\)) are mutually independent. In particular (cf. Lemma 4.20), we have

$$P\bigl(|X_{1}|>x\bigr)\sim E\bigl(\sigma^{\alpha}(\zeta_{1}) \bigr)P\bigl(|\xi_{1}|>x\bigr), $$

provided that

$$ E\bigl[\sigma^{\alpha+\delta}(\zeta_{1})\bigr]<\infty $$
(4.175)

for some δ>0. In Theorem 4.19 we saw that the limiting behaviour of partial sums depends on an interplay between the long-memory parameter d and the tail index α. Therefore, it is important to have reliable estimates of both parameters, d and α. With the help of the tail empirical process it is possible to prove asymptotic normality of the so-called Hill estimator of α.

We note first that the tail behaviour of X implies that, as n→∞,

$$T_{n}(x):=P\bigl(X_{1}>(1+x)u_{n}|X_{1}>u_{n} \bigr)=\frac{\bar{F}_{X}((1+x)u_{n})}{ \bar{F}_{X}(u_{n})}\rightarrow T(x):=(1+x)^{-\alpha} $$

for any sequence of constants u n →∞. The tail empirical distribution function \(\tilde{T}_{n}(s)\) and the tail empirical process e n (s) are defined by

$$\tilde{T}_{n}(s)=\frac{1}{n\bar{F}_{X}(u_{n})}\sum_{t=1}^{n}1 \bigl\{X_{t}>u_{n}(1+s)\bigr\} $$

and

$$ e_{n}(s)=\tilde{T}_{n}(s)-T_{n}(s)\ \bigl(s\in[0, \infty )\bigr). $$
(4.176)

We note that for large values of u n , only extreme observations are included in the sum. Hence the name “tail empirical”.

Drees (1998, 2000) and Rootzén (2009) show that for weakly dependent observations X t , scaled processes w n e n converge weakly in D[0,∞) to a Gaussian process w=BT, where B is a standard Brownian motion, and \(w_{n}^{2}=n\bar{F}_{X}(u_{n})\). The situation changes in the long-memory case. The limiting behaviour depends on an interplay between the memory parameter d and the behaviour of u n . If u n grows sufficiently fast (that means that very few extremes are included in the tail empirical distribution), then long memory does not influence the limit: w n e n w with, as before, \(w_{n}=\sqrt{n\bar{F}_{X}(u_{n})}\) and w=BT. However, if u n grows at an appropriately slow rate, then long memory starts to play a role: w n e n converge weakly to a degenerate limiting process w(s)=CT(s)Z m,H (1) (where C is a constant), and the scaling factor is different, namely \(w_{n}=n^{m(\frac {1}{2}-d)}L(n)\), where L is a slowly varying function. The corresponding result is stated in Theorem 4.36.

In order to state the result, let us define the function G n on (−∞,∞)×[0,∞) by

$$ G_{n}(x,s)=\frac{P(\sigma(x)\xi_{1}>(1+s)u_{n})}{P(\xi_{1}>u_{n})}. $$
(4.177)

This function converges pointwise to T(s)G(x)=T(s)σ α(x). Furthermore, the Hermite coefficients J n (m,s) of the function xG n (x,s) converge (as n→∞) to J(m)T(s), uniformly with respect to s≥0, where J(m) is the m-th Hermite coefficient of G. This implies that for large n, the Hermite rank m n (s) of G n (⋅,s) is not greater than the Hermite rank m of G. To avoid further complications, we impose the assumption inf s≥0 m n (s)=m for sufficiently large n.

Theorem 4.36

Consider the stochastic volatility model X t =ξ t σ t (\(t\in\mathbb{Z}\)) and assume that (4.174) and (4.175) hold. Additionally, we assume that ζ t (\(t\in\mathbb{Z}\)) is a Gaussian linear process with coefficients a j satisfying (B1), i.e. a j =L a (j)j d−1, d∈(0,1/2) (so that γ ζ (k)∼L γ (k)k 2d−1). Let m≥1 be the Hermite rank of the function σ α (⋅), and set H=d+1/2. Assume that E[σ 2α+δ (ζ 1 )]<∞.

  1. (i)

    If \(n\bar{F}_{X}(u_{n})\rightarrow\infty\) and \(n^{1-m(1-2d)}L_{m}(n)\bar{F}_{X}(u_{n})\rightarrow0\) as n→∞, then \(\sqrt{n\bar{F}_{X}(u_{n})}\,e_{n}\) converges weakly in D[0,∞) to the Gaussian process BT, where B is a standard Brownian motion.

  2. (ii)

    If \(n\bar{F}_{X}(u_{n})\rightarrow \infty\) and \(n^{1-m(1-2d)}L_{m}(n)\bar{F}_{X}(u_{n})\rightarrow\infty\) as n→∞, then

    $$n^{m(\frac{1}{2}-d)}L_{m}^{-1/2}(n)e_{n}(s)\Rightarrow\frac{J(m)T(s)}{E[\sigma^{\alpha}(\zeta_{1})]}Z_{m,H}(1), $$

    wheredenotes weak convergence in D[0,∞), Z m,H (⋅) is a Hermite–Rosenblatt process, and \(L_{m}(n)=m!C_{m}L_{\gamma}^{m}(n)\).

The practical application of these limit theorems for e n (⋅) is not quite straightforward. First of all, \(\bar{F}_{X}(u_{n})\) is unknown. The second problem is that we would like to centre the tail empirical distribution function by T(s), not T n (s). This second problem can be addressed by introducing the assumption

$$ \lim_{n\rightarrow\infty}w_{n}\Vert T_{n}-T\Vert_{\infty}=0, $$
(4.178)

where

$$\Vert T_{n}-T\Vert_{\infty}=\sup_{t\geq1}\biggl \vert \frac{P(X_{1}>u_{n}t)}{P(X_{1}>u_{n})}-t^{-\alpha}\biggr \vert , $$

and the scaling w n is either \(\sqrt{n\bar{F}_{X}(u_{n})}\) or \(n^{m(\frac{1}{2}-d)}L_{m}^{-1/2}(n)\) in cases (i) and (ii) respectively. In other words, we impose a condition that makes the bias T n T negligible. This is related to the so-called second-order regular variation (see Drees 1998; Kulik and Soulier 2011), but we omit details here. As an example, assume for instance that

$$P(\xi_{1}>x)=cx^{-\alpha}\bigl(1+O\bigl(x^{-\beta}\bigr) \bigr)\quad (x\rightarrow\infty) $$

for some constant c>0. Then the second-order regular variation refers to the second-order term x β in the expansion for the tail of ξ 1.

Now, suppose that the second-order assumption holds. Let X 1:n ≤⋯≤X n:n be the order statistics of X 1,…,X n , define \(k_{n}=n\bar{F}_{X}(u_{n})\) and replace u n by X nk:n in the definition of the tail empirical distribution function. Implicitly, k=k n will become a user-chosen number of extreme order statistics such that k n →∞ and k n =o(n). Thus, we define

$$\hat{T}_{n}(s)=\frac{1}{k}\sum_{t=1}^{n}1 \bigl\{X_{t}>X_{n-k:n}\cdot(1+s)\bigr\} $$

and the practically computable processes

$$\hat{e}_{n}^{\ast}(s)=\hat{T}_{n}(s)-T(s)\qquad\bigl(s\in[0,\infty)\bigr). $$

It follows from Rootzén (2009) and Kulik and Soulier (2011) that

$$w_{n}\,\hat{e}_{n}^{\ast}(s)\Rightarrow w^{\ast}(s)=w(s)-T(s)w(1). $$

In particular, if \(w_{n}=\sqrt{n\bar{F}_{X}(u_{n})}=\sqrt{k_{n}}\) and w(s)=B(T(s)), then \(w^{\ast}(s)=\tilde{B}(T(s))\), where \(\tilde{B}\) is a Brownian bridge. However, if \(w_{n}=n^{m(\frac {1}{2}-d)}L(n)\) and w(s)=CT(s)Z m,H (1), then w ∗ (s)=0. This is a similar effect to that for the standard empirical process with estimated parameters considered in Sect. 4.8.3. More surprisingly, we have the following result for the process \(\hat{e}_{n}^{\ast}(s)\).

Theorem 4.37

Assume that the conditions of Theorem 4.36 are fulfilled. Assume additionally that (4.178) holds. Then \(\sqrt{k}\,\hat {e}_{n}^{\ast }(s)\) converges weakly in D[0,∞) to the Gaussian process \(\tilde {B}(T(s))\), where \(\tilde{B}\) is a standard Brownian bridge, regardless of the behaviour of \(n^{1-m(1-2d)}L_{m}(n)\bar{F}_{X}(u_{n})\).

4.8.5.1 Application to Tail Index Estimation

One of the most important problems when dealing with heavy tails is to estimate the tail index α. The best-known (though not always reliable) method is the Hill estimator. Using the notation γ=α −1, the Hill estimator of γ is defined by

$$\hat{\gamma}_{n}=\frac{1}{k}\sum_{j=1}^{k} \log \biggl( \frac{X_{n-j+1:n}}{X_{n-k:n}} \biggr) . $$

Noting that

$$\log \biggl( \frac{X_{n-j+1:n}}{X_{n-k:n}} \biggr) =\int_{0}^{\infty}\frac{1 \{ X_{n-j+1:n}>X_{n-k:n}(1+s) \} }{1+s}\,ds\quad(j=1,\ldots,k), $$
the estimator can also be written as

$$\hat{\gamma}_{n}=\int_{0}^{\infty}\frac{\hat{T}_{n}(s)}{1+s}\,ds. $$

Since \(\gamma=\int_{0}^{\infty}(1+s)^{-1}T(s)\,ds\), we have

$$\hat{\gamma}_{n}-\gamma=\int_{0}^{\infty}\frac{\hat{e}_{n}^{\ast}(s)}{1+s}\,ds. $$

Thus we can apply Theorem 4.37 to obtain the asymptotic distribution of the Hill estimator. Heuristically,

$$\sqrt{k_{n}} ( \hat{\gamma}_{n}-\gamma ) \rightarrow_{d}\int_{0}^{\infty} \frac{\tilde{B}(T(s))}{1+s}\,ds. $$

This integral is a normal random variable with variance γ 2 (for details, see Kulik and Soulier 2011). In summary, we have the following result.

Corollary 4.5

Under the assumptions of Theorem 4.37, \(\sqrt{k}(\hat{\gamma}_{n}-\gamma)\) converges in distribution to a centred Gaussian distribution with variance γ 2.

This result can be used to construct confidence intervals for γ. It is known that this result gives the best possible rate of convergence for the Hill estimator for i.i.d. data (see Drees 1998). The surprising result is that it is possible to achieve the same i.i.d. rates regardless of the dependence parameter d.
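The following sketch illustrates Corollary 4.5 on simulated data. It is not the estimation procedure of the text: the stochastic volatility model uses positive Pareto(α) noise and σ(z)=e z/2 as ad hoc choices, and the number k of upper order statistics is picked arbitrarily (it governs the usual bias–variance trade-off of the Hill estimator).

```python
import numpy as np

rng = np.random.default_rng(5)

def gaussian_long_memory(n, d, burn=2000):
    """Gaussian long-memory sequence zeta_t (FARIMA(0,d,0)) driving the volatility."""
    m = n + burn
    a = np.ones(m)
    for j in range(1, m):
        a[j] = a[j - 1] * (j - 1 + d) / j
    return np.convolve(rng.standard_normal(m), a)[:m][burn:]

def hill(x, k):
    """Hill estimator of gamma = 1/alpha based on the k largest order statistics."""
    xs = np.sort(x)
    return float(np.mean(np.log(xs[-k:] / xs[-k - 1])))

alpha, d, n, k = 2.5, 0.35, 20000, 1000
xi = rng.pareto(alpha, size=n) + 1.0                  # P(xi_1 > x) = x^{-alpha} for x >= 1
x = xi * np.exp(0.5 * gaussian_long_memory(n, d))     # X_t = xi_t * sigma(zeta_t), sigma(z) = e^{z/2}
g = hill(x, k)
half = 1.96 * g / np.sqrt(k)                          # Corollary 4.5: sqrt(k)(g_hat - g) ~ N(0, g^2)
print("true gamma:", 1 / alpha, " Hill estimate:", np.round(g, 3),
      " 95% CI: [", np.round(g - half, 3), ",", np.round(g + half, 3), "]")
```

The finite-sample estimate is of course affected by the usual Hill bias, so the interval should be read as an illustration of the variance formula rather than as an exact coverage statement.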

4.8.5.2 Proof of Theorem 4.36

Proof

We follow a similar idea as in the proof of Theorem 4.19. Let \(\mathcal{E}\) be the σ-field generated by the Gaussian process ζ t (\(t\in\mathbb{Z}\)). Write

(4.179)

The difference between (4.179) and the decomposition used in the proof of Theorem 4.19 is that here the first part is the sum of conditionally independent random variables, instead of being a martingale. The second part is a function of the Gaussian sequence ζ t (\(t\in\mathbb{N}\)) and does not depend on the sequence ξ t (\(t\in\mathbb{N}\)).

For the first part, it can be shown that, using the conditional independence,

$$\log E \bigl[ \exp \bigl( it\sqrt{n\bar{F}_{X}(u_{n})}M_{n}(0) \bigr) \big|\mathcal{E} \bigr] \rightarrow_{P}-t^{2}/2. $$

The bounded convergence theorem implies

$$\sqrt{n\bar{F}_{X}(u_{n})}M_{n}(0)\rightarrow_{d}T(0)Z, $$

where Z is standard normal. Using the Cramér–Wold device, this is extended to

(4.180)

where the normal random variables are independent. Computations are somewhat involved, but the idea is relatively easy. Since the random variables are conditionally independent, the characteristic function can be evaluated.

Recall that

$$G_{n}(x,s)=\frac{P(\sigma(x)\xi_{1}>(1+s)u_{n})}{P(\xi_{1}>u_{n})} $$

converges pointwise to T(s)G(x)=T(s)σ α(x). Let us now write

with \(R_{n}^{\ast}=\sum_{t=1}^{n}G(\zeta_{t})\). Convergence of \(T(s)R_{n}^{\ast}\) is concluded in the very same way as in (4.102) and (4.103). For m(1/2−d)<1 and m(1/2−d)>1, we have, respectively,

$$n^{-(1-m(\frac{1}{2}-d))}L_{m}^{-1/2}(n)R_{n}^{\ast}\Rightarrow\frac {J(m)}{m!}Z_{m,H}(1) $$

and

$$n^{-1/2}R_{n}^{\ast}\Rightarrow vZ, $$

where v is a constant. The second part, \(\tilde{R}_{n}(s)\), is of smaller order than \(R_{n}^{\ast}\), uniformly in s≥0. Since

$$ R_{n}(s)=\frac{P(\xi_{1}>u_{n})}{n\bar{F}_{X}(u_{n})}\sum_{t=1}^{n} \bigl( G_{n}(\zeta_{t},s)-E\bigl[G_{n}( \zeta_{t},s)\bigr] \bigr) , $$
(4.181)

and \(P(\xi_{t}>u_{n})/\bar{F}_{X}(u_{n})\rightarrow 1/E[\sigma^{\alpha}(\zeta_{1})]\), we conclude that for m(1/2−d)<1,

$$ n^{m(\frac{1}{2}-d)}L_{m}^{-1/2}(n)R_{n}(s) \rightarrow_{d} \frac{J(m)T(s)}{E[\sigma^{\alpha}(\zeta_{1})]} Z_{m,H}(1). $$
(4.182)

This convergence is easily extended to multivariate convergence. If m(1/2−d)>1, then R n (s) is uniformly negligible w.r.t. the conditionally independent part M n (s). Therefore, (4.182) and (4.180) yield the finite-dimensional convergence. For details and proof of tightness, we refer to Kulik and Soulier (2011). □

4.8.5.3 Further Extensions

The results given above are extendable to stochastic volatility models with leverage. Instead of decomposing e n (s) into a conditionally i.i.d. part M n (s) and a long-memory part R n (s), we may apply the martingale decomposition as in the proof of Theorem 4.19. For details, see Luo (2011).

4.9 Limit Theorems for Counting Processes and Traffic Models

In this section we review limit theorems for counting processes and traffic models, such as renewal reward, ON–OFF, shot-noise and infinite source Poisson processes, considered in Sect. 2.2.4.

4.9.1 Counting Processes

Let X j (j≥1) be a stationary sequence of strictly positive random variables with distribution F and finite mean. Let τ 0 have the distribution F (0) and define

$$\tau_{j}=\tau_{0}+\sum_{k=1}^{j}X_{k}\quad(j\geq1) $$

and

$$S_{n}(t)=\sum_{j=1}^{[nt]}X_{j}. $$

Note that the notation X j and S n (t) is different from what was used previously (which was \(S(u)=\sum_{t=1}^{[nu]}X_{t}\)). The reason is that here the natural time parameter is in the upper limit [nt] of the sum.

Now, let N(t) be the associated counting process. Since

$$N(t)=\max\{k\geq0:\tau_{k-1}\leq t\}=\min\{k\geq0:\tau_{k}>t\}, $$

one can view N(t) as the generalized inverse of the partial sums process S n (t). Consequently, if the limiting process for partial sums is Gaussian, Lemma 4.7 will imply the weak convergence of N(t) from that of S n (t). In other words, we apply Lemma 4.7 to

  • y n (t)=S n (t)/(),

  • \(y_{n}^{-1}(t)=N_{n}(t)/n\), where N n (t)=N(nμt).

If \(c_{n}^{-1}(S_{n}(t)/(n\mu)-t)\) converges to a process S(t) with some constants c n , then \(c_{n}^{-1}(N(n\mu t)/n-t)\) converges to −S(t). The same procedure applies to any stationary counting process associated with a stationary sequence X j (\(j\in\mathbb{N}\)) with finite mean.
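This inversion argument can be checked numerically. In the sketch below (my own illustration; i.i.d. exponential interarrival times, an ordinary rather than stationary renewal process with τ 0 =0, and an arbitrary grid of time points), the centred and scaled counting process is compared with the negative of the centred and scaled partial-sum process; their sum is an order of magnitude smaller than either term, as predicted by Vervaat's lemma.

```python
import numpy as np

rng = np.random.default_rng(2)

n, mu = 50000, 2.0
X = rng.exponential(mu, size=n)             # i.i.d. interarrival times with E[X] = mu
tau = np.cumsum(X)                          # renewal epochs (for simplicity tau_0 = 0)

def N(t):
    """Counting process: number of renewal epochs <= t."""
    return np.searchsorted(tau, t, side="right")

t_grid = np.linspace(0.05, 1.0, 20)
S_term = np.array([X[: int(n * t)].sum() for t in t_grid]) / (n * mu) - t_grid
N_term = N(n * mu * t_grid) / n - t_grid
# Inversion: the centred counting process is approximately the mirror image of the
# centred partial-sum process, so their sum is of smaller order than either term.
print("max |S_term|         :", np.round(np.abs(S_term).max(), 5))
print("max |S_term + N_term|:", np.round(np.abs(S_term + N_term).max(), 5))
```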

Example 4.24

Recall Theorem 4.5. There, X j (\(j\in\mathbb{N}\)) is a linear process \(X_{j}=\sum _{k=0}^{\infty}a_{k}\varepsilon_{j-k}\) with summable coefficients a k and i.i.d. centred innovations ε j (\(j\in\mathbb{Z}\)). We can reformulate Theorem 4.5 to accommodate \(\mu=E(X_{1})\not=0\). We have

$$n^{-1/2}\sum_{j=1}^{[nt]}(X_{j}-\mu)\Rightarrow vB(t) $$

in D[0,1], where \(v^{2}=\sigma_{X}^{2}+2\sum _{k=1}^{\infty}\gamma_{X}(k)\), and B(t) (t∈[0,1]) is a standard Brownian motion. Equivalently,

$$\frac{S_{n}(t)/(n\mu)-t}{n^{-1/2}}\Rightarrow v\mu^{-1}B(t), $$

so that S(t)= −1 B(t) and c n =n −1/2. Application of Lemma 4.7 yields

$$n^{-1/2}\bigl(N(n\mu t)-nt\bigr)\Rightarrow v\mu^{-1}B(t). $$

However, we cannot extend this to the situation of Theorem 4.6. The long-range dependent linear process must have zero mean and hence cannot be strictly positive.

Example 4.25

Recall Example 4.12. The model considered there is X j =ξ j σ(ζ j ), where ξ j (j≥1) are strictly positive random variables with mean E(ξ 1), and ζ j is a centred Gaussian sequence with covariance γ ζ (k)∼L γ (k)k 2d−1, d∈(0,1/2). We established in Example 4.12 that for G(x)=x and σ(x)=exp(x), we have

$$n^{-(d+1/2)}L_{1}^{-1/2}(n)\sum _{j=1}^{[nt]}\bigl(X_{j}-E(X_{1}) \bigr)\Rightarrow J(1)B_{H}(t) $$

weakly in D[0,1], where B H (⋅) is fractional Brownian motion with H=d+1/2 and J(1)=E(ζ 1exp(ζ 1))E(ξ 1). Hence, for the inverse processes, we obtain

$$n^{-H}L_{1}^{-1/2}(n) \bigl(N(n\mu t)-nt\bigr) \Rightarrow J(1)\mu^{-1}B_{H}(t). $$

Thus, long memory in the interpoint distances generates long-memory-type behaviour in the functional central limit theorem for the counting process.

Let now X j (\(j\in\mathbb{N}\)) be an i.i.d. sequence of strictly positive random variables such that

$$P(X_{1}>x)\sim Ax^{-\alpha}\quad(A>0,\alpha>1). $$

In Sect. 4.3 we saw that the appropriately centred and normalized S n (t) converges to an α-stable Lévy process with independent increments (cf. (4.80)):

$$c_{n}^{-1}\sum_{j=1}^{[nt]}(X_{j}-\mu)\Rightarrow C_{\alpha}^{-1/\alpha} Z_{\alpha}(t), $$

where c n =inf{x:P(X 1 >x)≤n −1}, c n ∼A 1/α n 1/α, and Z α (t) is an α-stable Lévy motion such that \(Z_{\alpha}(1)\overset{\mathrm{d}}{=}S_{\alpha}(1,1,0)\). The limiting process has discontinuous sample paths, and hence Lemma 4.7 is not applicable. However (see Theorem 7.3.2 in Whitt 2002), one can generalize Vervaat’s result to cover the case of limiting processes with discontinuous sample paths. Note, however, that although S n (t) may converge in the standard Skorokhod topology, the same does not apply to the counting process. One has to consider the weaker M 1 topology (see comments on p. 235 as well as Sects. 13.6 and 13.7 in Whitt 2002). Here, we just illustrate finite-dimensional convergence.

Example 4.26

In the situation described above,

$$ c_{n}^{-1}\bigl(N(n\mu t)-nt\bigr)\overset{\mathrm{fidi}}{ \rightarrow}-C_{\alpha }^{-1/\alpha}\mu^{-1} Z_{\alpha}(t). $$
(4.183)

Thus, a heavy-tailed distribution of the interarrival times X j generates Long-Range count Dependence (LRcD) in the counting process (see Example 2.5), even though the limiting process has independent increments. Furthermore, in Example 2.5 we found that \(\operatorname {var}(N(t))\) is proportional to t 2H (as t→∞) with H=(3−α)/2, whereas n −H (N(nμt)−nt) converges to 0 in probability. Hence, N(⋅) is an example of a process with second-order stationary increments whose standard deviation does not yield an appropriate scaling.
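The variance growth var(N(t))∝t 3−α can be checked by simulation. The sketch below uses an ordinary (non-delayed) renewal process with Pareto(α) interarrival times, for which the same growth exponent is expected; the time points, the number of replications and the tail index are arbitrary choices, and the finite-t slope only roughly matches 3−α.

```python
import numpy as np

rng = np.random.default_rng(9)

alpha = 1.5                                 # tail index of the interarrival times, 1 < alpha < 2
mu = alpha / (alpha - 1.0)                  # mean of a Pareto(alpha) variable on [1, infinity)

def renewal_counts(t, reps):
    """N(t) for `reps` independent ordinary renewal processes with Pareto(alpha) interarrivals."""
    m = int(2.5 * t / mu) + 50              # generate more than enough interarrival times
    X = rng.pareto(alpha, size=(reps, m)) + 1.0     # P(X > x) = x^{-alpha}, x >= 1
    return (np.cumsum(X, axis=1) <= t).sum(axis=1)

ts = np.array([200.0, 400.0, 800.0, 1600.0])
v = np.array([renewal_counts(t, reps=4000).var() for t in ts])
slope = np.polyfit(np.log(ts), np.log(v), 1)[0]
print("log-log slope of var(N(t)):", np.round(slope, 2), "  theoretical 3 - alpha =", 3 - alpha)
```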

Example 4.27

Recall Example 4.17. If d+1/2<1/α, then by Whitt’s approach

$$ n^{-1/\alpha}\bigl(N(n\mu t)-nt\bigr)\overset{\mathrm{fidi}}{\rightarrow}-A^{1/\alpha }C_{\alpha }^{-1/\alpha} \bigl\{ E\bigl(\sigma_{1}^{\alpha}\bigr) \bigr\}^{1/\alpha} \mu^{-1}Z_{\alpha}(t). $$
(4.184)

If however d+1/2>1/α, we can use Vervaat’s Lemma 4.7 to conclude

$$ n^{-(d+1/2)}L_{1}^{-1/2}(n) \bigl(N(n\mu t)-nt\bigr) \Rightarrow J(1)E(\xi_{1})\mu^{-1}B_{H}(t). $$
(4.185)

We summarize our findings in Table 4.5. It should be noted that, in the case of strong dependence, the results refer only to the specific models in Examples 4.25 and 4.27, not to all long-memory models.

Table 4.5 Limits for counting processes—tails vs. dependence

4.9.2 Superposition of Counting Processes

Let N (m)(t) (t≥0, m=1,…,M) be independent copies of a stationary renewal process N(t) associated with a renewal sequence X j (\(j\in\mathbb{N}\)). We assume that, as x→∞,

$$\bar{F}(x)=P(X_{1}>x)\sim x^{-\alpha}L(x)\quad(1<\alpha<2), $$

and that \(P(\tilde{X}_{0}>x)=\mu^{-1}\int_{x}^{\infty}\bar{F}(u)\,du\), where μ=E[X 1]=λ −1. Application of Lemma 4.6 yields

$$ \lim_{M\rightarrow\infty}\frac{1}{M^{1/2}}\sum_{m=1}^{M} \bigl( N^{(m)}(t)-\lambda t \bigr) \Rightarrow G(t), $$
(4.186)

where G(⋅) is a Gaussian process with stationary increments and the same covariance structure as N(t). In particular (see Example 2.5),

$$\operatorname {var}\bigl(G(t)\bigr)=\operatorname {var}\bigl(N(t)\bigr)\sim\frac{2\lambda}{(\alpha-1)(2-\alpha)(3-\alpha )}t^{3-\alpha}L(t)=: \sigma_{0}^{2}t^{3-\alpha}L(t). $$

Indeed, to apply Lemma 4.6, we verify that for t>s,

$$\operatorname {var}\bigl( N(t)-N(s) \bigr) =\operatorname {var}\bigl( N(t-s) \bigr) \sim C(t-s)^{2H}$$

and 2H>1. Also, the second condition of Lemma 4.6 is easily verified.

We recognize that the limiting process has up to a constant the same variance as a fractional Brownian motion with the Hurst index H=(3−α)/2. Now, let us consider the time scaled process N (m)(Tt). For a fixed T>0, application of (4.186) yields

$$\lim_{M\rightarrow\infty}\frac{1}{M^{1/2}}\sum_{m=1}^{M} \bigl( N^{(m)}(Tt)-\lambda Tt \bigr) \Rightarrow G(Tt)= \sigma_{0}B_{H}(Tt) $$

and \(\operatorname {var}(G(Tt))\sim\sigma_{0}^{2}T^{2H}t^{2H}L(Tt)\sim\sigma_{0}^{2}T^{2H}t^{2H}L(T)\) as T→∞. Thus, applying H-self-similarity of fractional Brownian motion, we have

$$\lim_{T\rightarrow\infty}\frac{1}{T^{H}}\lim_{M\rightarrow\infty}\frac{1}{M^{1/2}} \sum_{m=1}^{M} \bigl( N^{(m)}(Tt)- \lambda Tt \bigr) \Rightarrow\sigma_{0}B_{H}(t). $$

On the other hand, (4.183) yields

$$\lim_{T\rightarrow\infty}a_{T}^{-1} \bigl( N^{(m)}(Tt)- \lambda Tt \bigr) \overset{\mathrm{fidi}}{\rightarrow}-\mu^{-1}C_{\alpha}^{-1/\alpha } Z_{\alpha}^{(m)}(\lambda t)\quad(m=1,\ldots,M), $$

where Z α (m) (⋅) (m=1,…,M) are independent α-stable Lévy processes, and the normalizing sequence a T is regularly varying with index 1/α. Consequently, since a sum of independent Lévy processes is again a Lévy process, we obtain

$$\lim_{M\rightarrow\infty}\frac{1}{M^{1/\alpha}}\lim_{T\rightarrow\infty} a_{T}^{-1}\sum_{m=1}^{M} \bigl( N^{(m)}(Tt)-\lambda Tt \bigr) \overset {\mathrm{fidi}}{ \rightarrow}-\lambda^{1+1/\alpha}C_{\alpha}^{-1/\alpha}Z_{\alpha}(t), $$

where Z α (⋅) is an α-stable Lévy process. The limiting constants were obtained by replacing t with λt and using \(Z_{\alpha}(\lambda t)\overset{\mathrm{d}}{=}\lambda^{1/\alpha}Z_{\alpha}(t)\).

We observe that different limiting schemes yield different limiting processes. This feature will also be present in the traffic models considered below.

In contrast, if the renewal sequence has a finite variance and short memory, then application of Example 4.24 yields that both procedures lim M→∞lim T→∞ and lim T→∞lim M→∞ produce the same limit, namely a Brownian motion. Likewise, in the case of strong dependence and a finite variance (as in Example 4.25), both procedures yield a fractional Brownian motion.

We summarize these observations in Table 4.6. We do not fill in the case of strong dependence and heavy tails (situation of Example 4.27). It is clear that there are four possible limits. If the counting process converges to fBm, then the limit for superpositions must be fBm as well. If the counting process converges to a Lévy process, then the superposition converges to either fBm or a Lévy process, depending on the order of taking these limits.

Table 4.6 Limits for superposition of counting processes—tails vs. dependence

4.9.3 Traffic Models

Let W(u) be a traffic model. It can be either a renewal reward, or ON–OFF, or infinite source Poisson or error duration process. In Sect. 2.2.4 we noted that the models have long memory in terms of non-integrable covariances or nonlinear growth of the variance of the integrated process. A very interesting feature is that long memory in a traffic process implies that the integrated process

$$W^{\ast}(t)=\int_{0}^{t} \bigl\{ W(v)-E \bigl[W(v)\bigr] \bigr\}\, dv $$

converges in the sense of finite-dimensional distributions to an α-stable Lévy motion. The scaling factor has to be chosen as T −1/α L(T), where L is a slowly varying function. In particular, this is another example of a second-order long-memory process where the variance grows at rate T 2H, but T −H W ∗ (Tt) converges to zero in probability as T→∞ (see e.g. Example 4.26). Furthermore, as in the case of counting processes, the convergence cannot hold in the space D[0,1] equipped with the J 1 -topology. With respect to J 1, the continuous process W ∗ (Tt) would have to converge to a continuous limit, which is not the case here.

In the context of computer networks, these phenomena describe long memory of an individual source. However, they do not explain long memory at the level of teletraffic, which usually consists of a large number of sources. Assume now that we have M independent copies W (m)(⋅) (m=1,…,M) of the traffic process W(t). Define

$$W_{T,M}^{\ast}(t)=\int_{0}^{Tt}\sum _{m=1}^{M} \bigl\{ W^{(m)}(v)-E\bigl[W(v)\bigr] \bigr\} \,dv=\sum_{m=1}^{M}W^{(m)\ast}(Tt), $$

where W (m)∗(u), m=1,…,M, are i.i.d. copies of the cumulated process W (m)(t). The process \(W_{T,M}^{\ast}(t)\) can be interpreted as (centred) total workload of M workstations at time t or as cumulative packet counts in the network by time t. We are interested in the limiting behaviour of the properly normalized cumulative process \(W_{T,M}^{\ast }(t)\).

We will consider two limiting scenarios. First, we will analyse what happens if we let first M→∞ and then T→∞. In this setup, we will proceed as follows.

Step 1::

Use Lemma 4.6 to establish that with some sequence a M ,

$$\lim_{M\rightarrow\infty}a_{M}^{-1}\sum _{m=1}^{M} \bigl\{ W^{(m)}(t)-E \bigl[W^{(m)}(t)\bigr] \bigr\} $$

converges to a process, say, G(t). If the process is Gaussian, then its covariance structure is the same as that of W(u).

Step 2::

If the process G(t) is Gaussian, then the integral \(G^{\ast}(Tt)=\int_{0}^{Tt}G(u)\,du\) is Gaussian as well. We have

$$\operatorname {var}\bigl(G^{\ast}(Tt)\bigr)=\int_{0}^{Tt} \biggl( \int_{0}^{v}\mathit{cov}\bigl(W(0),W(s) \bigr)\,ds \biggr)\,dv. $$

From the form of the covariance function we will conclude that the limit is either a Brownian motion or a fractional Brownian motion.

Step 3::

The sum of independent (fractional) Brownian motions yields (fractional) Brownian motion. We will conclude that

$$\lim_{T\rightarrow\infty}a_{T}^{-1}\lim_{M\rightarrow\infty }a_{M}^{-1} \int_{0}^{Tt}\sum_{m=1}^{M} \bigl( W^{(m)}(v)-E\bigl[W^{(m)}(v)\bigr] \bigr)\,dv $$

converges to a (fractional) Brownian motion, where a T is proportional to T 1/2 or T H (H>1/2), respectively.

As for the case T→∞ and then M→∞, we will proceed as follows.

Step 1::

For each m=1,…,M, approximate

$$\lim_{T\rightarrow\infty}c_{T}^{-1}\int_{0}^{Tt} \bigl\{ W^{(m)}(v)-E\bigl[W^{(m)}(v)\bigr] \bigr\}\,dv \approx c_{T}^{-1}\sum_{j=1}^{N(Tt)}U_{j}\quad(T\rightarrow\infty), $$

where N(⋅) is an appropriate counting process, and U j (\(j\in \mathbb{N}\)) is an appropriate i.i.d. sequence. Note that both N and U j depend on m. If the random variables U j have a finite variance, then for each m, the limiting process is a Brownian motion, and c T =T 1/2. If the random variables U j are regularly varying with index α, then we obtain a Lévy process as a limit and c T =T 1/α.

Step 2::

The sum of independent Brownian motions (Lévy processes) is a Brownian motion (Lévy process). We conclude the convergence for

$$\lim_{M\rightarrow\infty}d_{M}^{-1}\lim_{T\rightarrow\infty }c_{T}^{-1} \int_{0}^{Tt}\sum_{m=1}^{M} \bigl( W^{(m)}(v)-E\bigl[W^{(m)}(v)\bigr] \bigr)\,dv $$

with some sequence d M .

It should be mentioned, though, that the proofs below are only sketched; some technical details are not verified.

4.9.4 Renewal Reward Processes

Recall from Example 2.12 the renewal reward process

$$W(t)=Y_{0}1 \{ 0<t<\tau_{0} \} +\sum _{j=1}^{\infty}Y_{j}1 \{ \tau_{j-1}\leq t< \tau_{j} \} , $$

where \(X_{j}=\tau_{j}-\tau_{j-1}\). We assume for simplicity that Y j (\(j\in\mathbb{N}\)) is a centred i.i.d. sequence, independent of the renewal sequence τ 0,X j (j≥1), and also that E[X 1]=μ=λ −1 is finite. We are interested in the limiting behaviour of the cumulative process \(W_{T,M}^{\ast}(t)\) defined above. For the purpose of the limiting regime lim M→∞lim T→∞, we represent the cumulative process as follows:

$$ \int_{0}^{Tt}W(u)\,du=\min \{ Tt,\tau_{0} \} Y_{0}+\sum_{j=0} ^{\infty}Y_{j+1} \bigl(\min \{ Tt,\tau_{j+1} \} -\tau_{j}\bigr)_{+}. $$
(4.187)

Indeed, if Tt<τ 0, then \(\int_{0}^{Tt}W(u)\,du=Y_{0}Tt\); if τ 0<Tt<τ 1, then \(\int_{0}^{Tt}W(u)\,du=Y_{0}\tau_{0}+Y_{1}(Tt-\tau _{0})\), etc.

An alternative representation will yield an approximation of the cumulative reward by a sum of i.i.d. random variables. For Tt>τ 0, we may write

$$ \int_{0}^{Tt}W(u)\,du=Y_{0}\tau_{0}+\sum_{j=1}^{N(Tt)}Y_{j}X_{j}-U, $$
(4.188)

where N(t) is the renewal process associated with τ j . The first two terms represent the renewal intervals that are at least partially included in [0,Tt]. For example, if τ 0<Tt<τ 1, then N(Tt)=1, and the first two terms equal Y 0 τ 0+Y 1 X 1. However, not the entire renewal interval X 1 is included in [0,Tt]. We have to subtract a portion (τ 1−Tt)Y 1, and this is “hidden” in the variable U.

In most cases considered below, only \(\sum_{j=1}^{N(Tt)}Y_{j}X_{j}\) contributes to the limiting behaviour of \(\int_{0}^{Tt}W(u)\,du \).
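The two representations can be checked numerically. The following sketch in Python (assuming NumPy; Pareto(α) interarrival times with α=1.5 and standard normal rewards are illustrative choices, and the renewal sequence is simply started at τ 0=X 0 rather than being made stationary) computes \(\int_{0}^{T}W(u)\,du\) from the interval overlaps as in (4.187) and returns the dominant term \(\sum_{j=1}^{N(T)}Y_{j}X_{j}\) of (4.188) alongside it; the two quantities differ only by boundary terms involving the first and the last interval.

import numpy as np

rng = np.random.default_rng(1)

def cumulative_reward(T, alpha=1.5):
    """One renewal reward path on [0, T]: returns (int_0^T W(u) du computed
    from the interval overlaps as in (4.187), dominant term sum_{j<=N(T)} Y_j X_j)."""
    # Pareto(alpha) interarrival times on [1, infinity), so E[X] = alpha/(alpha-1);
    # for simplicity the renewal sequence starts at tau_0 = X_0 (not stationary)
    taus, t = [], 0.0
    while t <= T:
        t += (1.0 - rng.random()) ** (-1.0 / alpha)
        taus.append(t)
    taus = np.array(taus)                          # tau_0 < ... < tau_K, tau_K > T
    X = np.diff(np.concatenate(([0.0], taus)))     # interval lengths
    Y = rng.standard_normal(len(taus))             # centred rewards Y_0, Y_1, ...
    left = np.concatenate(([0.0], taus[:-1]))
    overlap = np.clip(np.minimum(taus, T) - left, 0.0, None)
    integral = float(np.sum(Y * overlap))          # exact integral of W over [0, T]
    N_T = int(np.sum(taus <= T))                   # number of renewal epochs in [0, T]
    dominant = float(np.sum(Y[1:N_T + 1] * X[1:N_T + 1]))
    return integral, dominant

print(cumulative_reward(T=1000.0))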

We start with a standard limiting behaviour. Specifically, we assume first that \(\operatorname {var}(X)=\sigma_{X}^{2}<\infty\) and \(\operatorname {var}(Y)=\sigma_{Y}^{2}<\infty\). In particular, there is no LRcD in the counting process N(t) and hence in the cumulative renewal reward process \(\int_{0}^{t} W(u)\,du\).

Theorem 4.38

Assume that

  • Interarrival times have a finite variance: \(\operatorname {var}(X_{1})=\sigma_{X}^{2}<\infty\);

  • Rewards have a finite variance: \(\operatorname {var}(Y_{1})=\sigma_{Y}^{2}<\infty\).

Then,

$$\lim_{T\rightarrow\infty}\lim_{M\rightarrow\infty}\frac{W_{T,M}^{\ast}(t)}{T^{1/2}M^{1/2}}=\lim_{M\rightarrow\infty}\lim_{T\rightarrow\infty }\frac{W_{T,M}^{\ast}(t)}{T^{1/2}M^{1/2}}\overset{\mathrm{d}}{=} \sigma_{\mathrm{reward},1}B(t), $$

where \((B(t),t\in\mathbb{R})\) is a standard Brownian motion,

$$\sigma_{\mathrm{reward},1}^{2}=\frac {E[X_{1}^{2}]E[Y_{1}^{2}]}{E[X_{1}]},$$

and the convergence is to be understood as a finite-dimensional one.

Proof

First, we consider the limit taken in the order lim M→∞ first, and then lim T→∞.

Step 1::

Since W (m) (m=1,…,M) are independent identically distributed processes with finite variance, application of Lemma 4.6 implies that for each T,

$$\lim_{M\rightarrow\infty}\frac{1}{M^{1/2}}\sum _{m=1}^{M}W^{(m)}(Tt)\Rightarrow G(Tt) $$

in D[0,∞), where G(t) (t≥0) is a centred stationary Gaussian process with covariance function cov(W(0),W(u)).

Step 2::

The cumulative process \(G^{\ast}(Tt)=\int_{0}^{Tt}G(u)\,du\) is still a Gaussian process with variance \(\operatorname {var}(G^{\ast }(Tt))=\operatorname {var}( \int_{0}^{Tt}W(u)\,du) =TtE[X_{1}^{2}]E[Y_{1}^{2}]/\mu\) (see Examples 2.5 and 2.12).

Step 3::

The form of the covariance function yields that the process T −1/2 G (Tt) (t≥0) is a Brownian motion.

Now, we consider the reverse order of taking the limits.

Step 1::

We use an approximation induced by representation (4.188).

$$\frac{1}{T^{1/2}}\sum_{j=1}^{N(Tt)}Y_{j}X_{j}= \biggl( \frac {N(Tt)}{T} \biggr)^{1/2}\frac{1}{\sqrt{N(Tt)}}\sum _{j=1}^{N(Tt)}Y_{j}X_{j}. $$

Recall that for a stationary renewal process, \(N(Tt)/T\to E[N(t)]=\lambda t=\mu^{-1}t\). Thus, as T→∞,

$$\frac{1}{T^{1/2}}\sum_{j=1}^{N(Tt)}Y_{j}X_{j}\approx\frac{t^{1/2}}{\mu ^{1/2}}\frac{1}{(Tt)^{1/2}}\sum_{j=1}^{Tt}Y_{j}X_{j}\Rightarrow\frac{1}{\mu ^{1/2}}\sqrt{\operatorname {var}(Y_{1}X_{1})}B(t). $$

Since X 1 and Y 1 are independent and E[Y 1]=0, we obtain \(\operatorname {var}(Y_{1}X_{1})= E[Y_{1}^{2}]E[X_{1}^{2}]\).

Step 2::

Hence, for each fixed m=1,…,M,

$$T^{-1/2}\int_{0}^{Tt}W^{(m)}(u)\,du\Rightarrow\sigma_{\mathrm{reward},1}B^{(m)}(t), $$

where B (m)(t) are independent standard Brownian motions. Hence, the superposition converges to a Brownian motion.

 □
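The normalization in Theorem 4.38 can be checked by a crude Monte Carlo experiment. The sketch below (Python/NumPy; exponential(1) interarrival times and standard normal rewards are illustrative choices, the boundary terms of (4.188) are ignored and the renewal sequences are not made stationary) estimates the variance of \(W_{T,M}^{\ast}(1)/(T^{1/2}M^{1/2})\), which should be roughly \(\sigma_{\mathrm{reward},1}^{2}=E[X_{1}^{2}]E[Y_{1}^{2}]/E[X_{1}]=2\) for these choices.

import numpy as np

rng = np.random.default_rng(2)

def source_sum(T):
    """Dominant part sum_{j<=N(T)} Y_j X_j of int_0^T {W(u) - E[W(u)]} du for one
    source with exponential(1) interarrival times and N(0,1) rewards."""
    X = rng.exponential(1.0, size=int(3 * T) + 10)   # enough interarrivals w.h.p.
    taus = np.cumsum(X)
    n = int(np.searchsorted(taus, T, side="right"))  # N(T)
    return float(np.sum(rng.standard_normal(n) * X[:n]))

T, M, reps = 200.0, 50, 400
vals = np.array([sum(source_sum(T) for _ in range(M)) for _ in range(reps)])
print("sample variance of W*_{T,M}(1)/(TM)^{1/2}:", np.var(vals) / (T * M))
print("sigma^2_{reward,1} = E[X^2]E[Y^2]/E[X]   :", 2.0)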

Next, we analyse what happens if the finite variance assumption on the rewards still holds, but the renewal process has intervals with an infinite variance. Recall that then the corresponding counting process N(t) has the LRcD property (see Examples 2.5 and 2.12) since its variance grows faster than linearly. Also (see Examples 2.5 and 2.12), the variance of the cumulative process \(\int_{0}^{Tt}W(u)\,du\) grows faster than linearly.

Theorem 4.39

Assume that

  • Interarrival times are regularly varying: P(X 1>x)∼C X x α (α∈(1,2)) as x→∞;

  • Rewards have a finite variance \(\operatorname {var}(Y_{1})=\sigma_{Y}^{2}<\infty \), and they are symmetric.

Then,

$$ \lim_{T\rightarrow\infty}\lim_{M\rightarrow\infty}\frac{W_{T,M}^{\ast}(t)}{T^{H}M^{1/2}}\overset{\mathrm{d}}{=}\sigma_{\mathrm{reward},2}B_{H}(t), $$
(4.189)

where \((B_{H}(t),t\in\mathbb{R})\) is a standard fractional Brownian motion with Hurst index H=(3−α)/2, and

$$\sigma_{\mathrm{reward},2}^{2}=C_{X}\frac{2E[Y_{1}^{2}]}{E[X_{1}](\alpha-1)(2-\alpha)(3-\alpha)}. $$

On the other hand,

$$ \lim_{M\rightarrow\infty}\lim_{T\rightarrow\infty}\frac{W_{T,M}^{\ast}(t)}{T^{1/\alpha}M^{1/\alpha}}\overset{\mathrm{d}}{=}C_{\mathrm {reward},1}Z_{\alpha}(t), $$
(4.190)

where \(Z_{\alpha}(t)\overset{\mathrm{d}}{=}t^{1/\alpha}S_{\alpha }(1,0,0)\) is a symmetric Lévy process, and

$$C_{\mathrm{reward},1}=\mu^{-1/\alpha}E^{1/\alpha}\bigl[|Y_{1}|^{\alpha}\bigr]C_{X}^{1/\alpha}C_{\alpha}^{-1}. $$

Sketch of Proof

First, we proceed with lim T→∞lim M→∞.

Step 1::

As in the case of Theorem 4.38, Lemma 4.6 implies that for each T,

$$\lim_{M\rightarrow\infty}\frac{1}{M^{1/2}}\sum _{m=1}^{M}W^{(m)}(Tt)\Rightarrow G(Tt) $$

in D[0,∞), where G(t) (\(t\in\mathbb{R}\)) is a centred stationary Gaussian process with covariance function cov(W(0),W(t)).

Step 2::

The cumulative process G (Tt) is Gaussian with variance σ reward,2(Tt)2H, H=(3−α)/2 (see Example 2.12).

Step 3::

The form of the variance yields that the scaled process T H G (Tt) is a fractional Brownian motion.

Next, we deal with the reversed order of limits.

Step 1::

As in the proof of Theorem 4.38, the limiting behaviour of \(T^{-1/\alpha}\int_{0}^{Tt}\{W^{(m)}(u)-E[W^{(m)}(u)]\}\,du\) is determined by \(T^{-1/\alpha}\sum_{j=1}^{N(Tt)}Y_{j}X_{j}\). By applying Breiman's lemma, we note that

$$P(Y_{1}X_{1}>x)\sim E\bigl[Y_{+}^{\alpha} \bigr]P(X_{1}>x)\sim E\bigl[Y_{+}^{\alpha}\bigr]C_{X}x^{-\alpha} $$

and

$$P(Y_{1}X_{1}<-x)\sim E\bigl[Y_{-}^{\alpha} \bigr]P(X_{1}>x)\sim E\bigl[Y_{-}^{\alpha}\bigr]C_{X}x^{-\alpha}. $$

Thus, application of (4.80) yields

$$\frac{1}{T^{1/\alpha}}\sum_{j=1}^{N(Tt)}Y_{j}X_{j} \Rightarrow\mu^{-1/\alpha }E^{1/\alpha}\bigl[|Y_{1}|^{\alpha} \bigr]C_{X}^{1/\alpha}C_{\alpha}^{-1} Z_{\alpha}(t). $$
Step 2::

The result follows by taking d M =M 1/α. □
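The second-order long memory driving the fBm limit in (4.189) can also be seen empirically. The sketch below (Python/NumPy; Pareto(α) interarrival times with α=1.5 and standard normal rewards are illustrative choices, the process is started at 0 rather than stationarily, and slowly varying corrections are ignored) estimates \(\operatorname{var}(\int_{0}^{T}W(u)\,du)\) for several T and fits the growth exponent, which should be close to 2H=3−α.

import numpy as np

rng = np.random.default_rng(3)
alpha = 1.5                                   # interarrival tail index, var(X) = infinity

def integral_W(T):
    """int_0^T W(u) du for one renewal reward path with Pareto(alpha) interarrival
    times (each >= 1, so at most T renewals occur in [0, T]) and N(0,1) rewards."""
    X = (1.0 - rng.random(int(T) + 1)) ** (-1.0 / alpha)
    taus = np.cumsum(X)
    k = int(np.searchsorted(taus, T, side="right")) + 1   # intervals meeting [0, T]
    taus, X = taus[:k], X[:k]
    overlap = np.clip(np.minimum(taus, T) - (taus - X), 0.0, None)
    return float(np.sum(rng.standard_normal(k) * overlap))

Ts = np.array([100.0, 400.0, 1600.0, 6400.0])
variances = [np.var([integral_W(T) for _ in range(2000)]) for T in Ts]
slope = np.polyfit(np.log(Ts), np.log(variances), 1)[0]
print("estimated variance growth exponent:", slope, "  theory 2H = 3 - alpha =", 3 - alpha)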

Finally, we analyse the case where both interarrival times and rewards are heavy tailed. We separate both limiting regimes in two theorems below.

Theorem 4.40

Assume that

  • Interarrival times are regularly varying: P(X 1>x)∼C X x α (α∈(1,2)) as x→∞;

  • Rewards are regularly varying: P(Y 1>x)∼C Y x β (β∈(1,2)) as x→∞; and they are symmetric.

We have the following limits under the iterated limit lim M→∞lim T→∞:

  • If α<β<2, then (4.190) still holds.

  • If β<α<2, then

    $$ \lim_{M\rightarrow\infty}\lim_{T\rightarrow\infty}\frac{W_{T,M}^{\ast}(t)}{T^{1/\beta}M^{1/\beta}}\overset{\mathrm{d}}{=}C_{\mathrm{reward},2} Z_{\beta}(t), $$
    (4.191)

    where \(Z_{\beta}(t)\overset{\mathrm{d}}{=}t^{1/\beta}S_{\beta}(1,0,0)\) is a symmetric Lévy process, and

    $$C_{\mathrm{reward},2}=\mu^{-1/\beta}E^{1/\beta}\bigl[X_{1}^{\beta }\bigr]C_{Y}^{1/\beta }C_{\beta}^{-1}. $$

Proof

The proof is very similar to that of Theorem 4.39. Recall that the limiting behaviour of \(\int_{0}^{Tt}W(u)\,du\) is determined by \(\sum_{j=1}^{N(Tt)}Y_{j}X_{j}\). If α<β, we may proceed exactly in the same way as in Theorem 4.39. Otherwise, if β<α, then

$$\frac{1}{T^{1/\beta}}\sum_{j=1}^{N(Tt)}Y_{j}X_{j}= \biggl( \frac{N(Tt)}{T} \biggr)^{1/\beta}\frac{1}{(N(Tt))^{1/\beta}}\sum _{j=1}^{N(Tt)}Y_{j}X_{j}\approx\frac{1}{\mu^{1/\beta}}\frac{1}{T^{1/\beta}}\sum _{j=1}^{Tt}Y_{j}X_{j}. $$

By applying Breiman's lemma, we have

$$P(Y_{1}X_{1}>x)\sim E\bigl[X_{1}^{\beta} \bigr]P(Y_{1}>x)\sim E\bigl[X_{1}^{\beta}\bigr]C_{Y}x^{-\beta} $$

and

$$P(Y_{1}X_{1}<-x)\sim E\bigl[X_{1}^{\beta} \bigr]P(Y_{1}<-x)\sim E\bigl[X_{1}^{\beta}\bigr]C_{Y}x^{-\beta}. $$

Thus, application of (4.80) yields

$$\frac{1}{T^{1/\beta}}\sum_{j=1}^{N(Tt)}Y_{j}X_{j} \Rightarrow\mu^{-1/\beta }E^{1/\beta}\bigl[X_{1}^{\beta} \bigr]C_{Y}^{1/\beta}C_{\beta}^{-1} Z_{\beta}(t). $$

 □

We note also in passing that the case β<α above does not require that X 1 is regularly varying. Therefore, (4.191) holds also when β<2 and \(\operatorname {var}(X_{1})<\infty\).
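Breiman's lemma, which drives the case β<α, is easy to check by simulation. The following sketch (Python/NumPy; a Pareto(α) factor X with α=1.9 and a symmetric Pareto(β) factor Y with β=1.3 are illustrative choices) estimates the ratio P(XY>x)/P(Y>x), which should approach \(E[X^{\beta}]=\alpha/(\alpha-\beta)\) as x grows.

import numpy as np

rng = np.random.default_rng(4)
alpha, beta, n = 1.9, 1.3, 2 * 10**6          # tail indices with beta < alpha

X = (1.0 - rng.random(n)) ** (-1.0 / alpha)                      # Pareto(alpha), X >= 1
Y = (1.0 - rng.random(n)) ** (-1.0 / beta) * rng.choice([-1.0, 1.0], size=n)
E_X_beta = alpha / (alpha - beta)             # E[X^beta] for this Pareto law

for x in [20.0, 100.0, 400.0]:
    ratio = np.mean(X * Y > x) / np.mean(Y > x)
    print(f"x = {x:5.0f}:  P(XY > x)/P(Y > x) = {ratio:.2f}   (E[X^beta] = {E_X_beta:.2f})")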

We consider now the case of the other limit.

Theorem 4.41

Assume that

  • Interarrival times consist of positive integers and are regularly varying: P(X 1>x)∼C X x −(α+1) (α∈(1,2)) as x→∞;

  • Rewards are regularly varying and symmetric: \(P(Y_{1}>x)\sim C_{Y}\beta x^{-\beta}\) (β∈(1,2)) as x→∞.

We have the following limits under the iterated limit lim T→∞lim M→∞:

  • If β<α<2, then (4.191) holds.

  • If α<β<2, then

    $$ \lim_{T\rightarrow\infty}\lim_{M\rightarrow\infty}T^{-(\beta-\alpha +1)/\beta }M^{-1/\beta}W_{T,M}^{\ast}(t) \overset{\mathrm{d}}{=}C_{X}^{1/\beta}C_{Y}^{1/\beta}Z_{\beta}^{\ast}(t), $$
    (4.192)

    where \(Z_{\beta}^{\ast}(t)\) is a symmetric β-stable process with characteristic function

    $$E \Biggl[ \exp \Biggl( i\sum_{l=1}^{h} \theta_{l}Z_{\beta}^{\ast }(t_{l}) \Biggr) \Biggr] =\exp\bigl(-\sigma^{\beta}(\mathbf{\theta},\mathbf{t})\bigr), $$

    where \(\mathbf{t}=(t_{1},\ldots,t_{h})^{T}\), \(\mathbf{\theta}=(\theta_{1},\ldots,\theta_{h})^{T}\), and \(\sigma^{\beta}(\mathbf{\theta},\mathbf{t})\) is a characteristic exponent whose explicit form can be found in Levy and Taqqu (2000).

We observe that if β<α, the order of taking limits does not matter. However, if α<β, we obtain the new process \(Z_{\beta}^{\ast}(t)\). This process has stationary increments and is self-similar with self-similarity parameter H=(β−α+1)/β. For details on this process, we refer to Levy and Taqqu (2000). Furthermore, note that the convergence to \(Z_{\beta}^{\ast}(t)\) requires the additional technical assumption that the interarrival times take positive integer values only.

Sketch of Proof

We note that the technique of the proofs of Theorems 4.38 or 4.39 does not work. We cannot apply Lemma 4.6 because the process does not have a finite variance. Instead, we present a simplified version of the proofs of Theorems 2.2 and 2.3 in Levy and Taqqu (2000).

We use representation (4.187). Assume for a moment that Y k (k≥0) are symmetric β-stable, \(Y_{1}\overset {\mathrm{d}}{=}S_{\beta}(\eta,0,0)\), η>0. Thus, its characteristic function is given by

$$\varphi_{Y}(\theta)=E\exp(i\theta Y_{1})=\exp\bigl(- \eta^{\beta}|\theta |^{\beta}\bigr). $$

We compute the characteristic function of \(R(Tt)=\int_{0}^{Tt}W(u)\,du\). Set τ −1=0. Then, by conditioning on the entire sequence τ j and using the fact that the random variables Y j (j≥0) are i.i.d.,

Since \(W_{T,M}^{\ast}(t)\) is the sum of independent copies of the process R(Tt), we have

$$E \Biggl[ \exp \Biggl( i\sum_{l=1}^{h} \theta_{l}M^{-1/\beta}W_{1,M}^{\ast}(t_{l}) \Biggr) \Biggr] =\exp\bigl(-\sigma^{\beta}(\mathbf{ \theta},\mathbf{t})\bigr). $$

An additional limiting argument applied to random variables Y j that are regularly varying as in the theorem yields

$$\lim_{M\rightarrow\infty}M^{-1/\beta}W_{T,M}^{\ast}(t)\overset{\mathrm {d}}{=}Z_{\beta,T}^{\ast}(t), $$

where \(Z_{\beta,T}^{\ast}(t)\) (t∈[0,1]) is a symmetric β-stable process with characteristic exponent \(\sigma^{\beta}(\mathbf{\theta },T\mathbf{t};C_{Y}/C_{\beta})\). This process is neither self-similar, nor has it stationary increments.

More technical details are required to establish

$$T^{-(\beta-\alpha+1)/\beta}\sigma^{\beta}(\mathbf{\theta},T\mathbf{t} ;C_{Y}/C_{\beta})\rightarrow\sigma^{\beta}(\mathbf{\theta},\mathbf{t}). $$

This implies the finite-dimensional convergence of \(T^{-(\beta-\alpha +1)/\beta}Z_{\beta,T}^{\ast}(t)\) to \(Z_{\beta}^{\ast}(t)\). □

Several bibliographical notes are in order here. Theorem 4.38 was proven in Taqqu and Levy (1986, Theorem 5). Theorem 4.39 was proven in Taqqu and Levy (1986). Theorem 4.40 was proven in Levy and Taqqu (1987), whereas Theorem 4.41 can be found in Levy and Taqqu (2000) and Pipiras and Taqqu (2000b). In particular, in the latter paper, the authors showed that the limiting process \(Z_{\beta}^{\ast}(t)\) is not a linear fractional stable motion. Also see Taqqu (2002) and Willinger et al. (2003) for an overview.

A summary of the results discussed here is given in Table 4.7.

Table 4.7 Limits for superposition of cumulative renewal reward processes—tails of interarrival times vs. tails of rewards. The tail parameters α∈(1,2), β∈(1,2)

4.9.5 Superposition of ON–OFF Processes

Assume now that we have M independent copies W (m)(⋅) (m=1,…,M) of the ON–OFF process W(t) defined in (2.77).

We shall assume that the ON and OFF periods in each model have the same distributions: \(P(X_{j,\mathrm{on}}(m)>x)=\bar{F}_{\mathrm{on}}(x)\), \(P(X_{j,\mathrm{off}}(m)>x)=\bar{F}_{\mathrm{off}}(x)\), where X j,on(m), X j,off(m) (\(j\in\mathbb{N}\)) are the consecutive ON and OFF periods, respectively, in the mth ON–OFF process (m=1,…,M). Since W (m)(u) are stationary and have the same distribution for each m, we obtain

$$E \Biggl[ \int_{0}^{Tt}\sum _{m=1}^{M}W^{(m)}(u)\,du \Biggr] =TM\,E\bigl[W(0) \bigr]t=TM\frac{\mu_{\mathrm{on}}}{\mu_{\mathrm{on}}+\mu_{\mathrm{off}}}t =TM\frac{\mu_{\mathrm{on}}}{\mu}t. $$

Recall from Lemma 2.7 that the ON–OFF process has long memory (in the sense of Definition 1.4), or \(\int_{0}^{t}W(u)du\) has long memory (in the sense of Definition 1.5) if the ON (or OFF) periods are heavy tailed. In this case we are interested in limit theorems for the superposition of ON–OFF processes. Such studies were conducted in Taqqu et al. (1997), Mikosch et al. (2002) or Dombry and Kaj (2011). Specifically, the following two theorems were proven in Taqqu et al. (1997).

Theorem 4.42

Assume that ON and OFF periods satisfy (2.78) and (2.79), i.e.

$$\bar{F}_{\mathrm{on}}(x)=P(X_{1,\mathrm{on}}>x)\sim C_{\mathrm{on}}x^{-\alpha_{\mathrm{on}}}\quad(x\rightarrow\infty), $$
(4.193)

$$\bar{F}_{\mathrm{off}}(x)=P(X_{1,\mathrm{off}}>x)\sim C_{\mathrm{off}}x^{-\alpha_{\mathrm{off}}}\quad(x\rightarrow\infty), $$
(4.194)

with α on<α off. Then,

$$\lim_{T\rightarrow\infty}\lim_{M\rightarrow\infty}\frac{W_{T,M}^{\ast}(t)}{T^{H}M^{1/2}}\overset{\mathrm{d}}{=}C_{\mathrm{on}}^{1/2}\sigma _{\mathrm{on}\mbox{--}\mathrm{off}}B_{H}(t), $$

where (B H (t),t∈(0,1)) is a fractional Brownian motion with Hurst parameter H=(3−α on)/2, and

$$\sigma_{\mathrm{on}\mbox{--}\mathrm{off}}^{2}=\frac{\mu_{\mathrm{on}\mbox{--}\mathrm{off}}^{2}}{(\alpha _{\mathrm{on}}-1)\mu^{3}}. $$

Sketch of Proof

Step 1::

Since W (m)(⋅) (m=1,…,M) are independent identically distributed bounded processes, application of Lemma 4.6 implies

$$\lim_{M\rightarrow\infty}\frac{1}{M^{1/2}}\sum_{m=1}^{M} \bigl\{ W^{(m)}(t)-E\bigl[W^{(m)}(t)\bigr] \bigr\} \Rightarrow G(t), $$

where G(t) (t∈[0,1]) is a centred stationary Gaussian process with the covariance function cov(W(0),W(t)).

Step 2::

Therefore, \(\int_{0}^{Tt}G(u)\,du\) is still a Gaussian process with variance \(\operatorname {var}( \int_{0}^{Tt}W(u)\,du) \). By Lemma 2.7, the variance grows at rate \(C_{\mathrm{on}}\sigma_{\mathrm{on}\mbox{--}\mathrm{off}}^{2}(Tt)^{2H}\) as T→∞, which is the same as for a fractional Brownian motion. We conclude

$$\lim_{T\rightarrow\infty}\frac{1}{T^{H}}\int_{0}^{Tt}G(u)\,du\Rightarrow C_{\mathrm{on}}^{1/2}\sigma_{\mathrm{on}\mbox{--}\mathrm{off}}B_{H}(t). $$
Step 3::

Let

$$U(Tt)=\lim_{M\rightarrow\infty}\frac{W_{T,M}^{\ast}(t)}{T^{H}M^{1/2}}. $$

The tightness is verified by noting that, as T→∞, for t 1<t 2,

$$\operatorname {var}\bigl(U(Tt_{2})-U(Tt_{1})\bigr)\rightarrow C_{\mathrm{on}}\sigma_{\mathrm{on}\mbox{--}\mathrm{off}}^{2}(t_{2}-t_{1})^{2H}, $$

and 2H>1, so that Lemma 4.5 applies. □
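The variance growth used in Step 2 can be inspected numerically. The sketch below (Python/NumPy; Pareto(α on) ON periods with α on=1.5 and exponential(1) OFF periods are illustrative choices, and the path is started with a fresh ON period rather than stationarily) checks the mean level μ on/μ of the cumulative workload and fits the growth exponent of \(\operatorname{var}(\int_{0}^{T}W(u)\,du)\), which should be roughly 2H=3−α on.

import numpy as np

rng = np.random.default_rng(5)
a_on, mu_on, mu_off = 1.5, 3.0, 1.0           # Pareto(1.5) ON (mean 3), exp(1) OFF

def integral_onoff(T):
    """int_0^T W(u) du for one ON-OFF path, started with a fresh ON period at 0."""
    total, t, on = 0.0, 0.0, True
    while t < T:
        length = (1.0 - rng.random()) ** (-1.0 / a_on) if on else rng.exponential(mu_off)
        if on:
            total += min(length, T - t)       # ON time spent inside [0, T]
        t += length
        on = not on
    return total

sample = np.array([integral_onoff(2000.0) for _ in range(1000)])
print("mean of int_0^T W :", sample.mean(), "  theory T*mu_on/mu =", 2000.0 * mu_on / (mu_on + mu_off))

Ts = np.array([250.0, 1000.0, 4000.0])
variances = [np.var([integral_onoff(T) for _ in range(1000)]) for T in Ts]
slope = np.polyfit(np.log(Ts), np.log(variances), 1)[0]
print("variance growth exponent:", slope, "  theory 3 - alpha_on =", 3 - a_on)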

However, similarly to the case of superposition of renewal reward processes, different orders of taking limits yield completely different limiting processes.

Theorem 4.43

Assume that ON and OFF periods satisfy (4.193) and (4.194) with α on<α off and α on∈(1,2). Then

$$ \lim_{M\rightarrow\infty}\lim_{T\rightarrow\infty}(MT)^{-1/\alpha}\int_{0}^{Tt} \Biggl( \sum_{m=1}^{M}\bigl(W^{(m)}(u)-E \bigl[W^{(m)}(u)\bigr]\bigr) \Biggr) \,du\overset{\mathrm{d}}{=}C_{0}Z_{\alpha}(t), $$
(4.195)

where \(Z_{\alpha}(t)\overset{\mathrm{d}}{=}t^{1/\alpha}S_{\alpha }(1,1,0)\) is a Lévy process, and \(C_{0}=( \frac{\mu_{\mathrm{off}}}{\mu^{1+1/\alpha}}) C_{\mathrm{on}}^{1/\alpha}C_{\alpha }^{-1/\alpha}\).

Sketch of Proof

Step 1::

First, we show that for each m=1,…,M,

$$ T^{-1/\alpha}\int_{0}^{Tt} \bigl\{ W^{(m)}(u)-E\bigl[W^{(m)}(u)\bigr] \bigr\}\,du\Rightarrow \biggl( \frac{\mu_{\mathrm{off}}}{\mu^{1+1/\alpha}} \biggr) C_{\mathrm{on}}^{1/\alpha}C_{\alpha}^{-1/\alpha}Z_{\alpha}^{(m)}(t), $$
(4.196)

where \(Z_{\alpha}^{(m)}(t)\overset{\mathrm{d}}{=}t^{1/\alpha}S_{\alpha}(1,1,0)\) are independent Lévy processes.

If Ttτ 0, then there are three scenarios possible: either, at 0, the process is ON, then \(\int_{0}^{Tt}W(u)\,du=\min(Tt,X_{0,\mathrm{on}})\); or at 0, the process is OFF, and X 0,off>Tt, then \(\int_{0}^{Tt}W(u)\,du=0\); or at 0, the process is OFF, and X 0,off<Tt, then \(\int_{0}^{Tt}W(u)\,du=Tt-X_{0,\mathrm{off}}\leq\tau_{0}-X_{0,\mathrm{off}}=X_{0,\mathrm{on}}\) (this last situation is shown on Fig. 4.7). In either case, \(\int_{0}^{Tt}W(u)\,du\leq X_{0,\mathrm{on}}\). Since X 0,on is a random variable with a finite mean, we conclude that X 0,on/T 1/α→0 in probability as T→∞.

Fig. 4.7 ON–OFF process: the 0th interval starts with an OFF period; the marked area shows \(\int_{0}^{Tt}W(u)\,du\)

If Tt>τ 0, then

$$\int_{0}^{Tt}W(u) \,du =X_{0,\mathrm{on}}+ \sum_{j=1}^{N(Tt)}X_{j,\mathrm {on}}-U, $$

where UX N(Tt)+1,on. The first two terms represent the sum of all ON intervals that are at least partially included in [0,Tt]. For example, if τ 0<Tt<τ 1, then N(Tt)=1 and \(\sum_{j=1}^{N(Tt)}X_{j,\mathrm{on}}=X_{1,\mathrm{on}}\); thus, both X 0,on and X 1,on are counted as fully included in [0,Tt]. Now, assume that the renewal intervals X t start with ON periods. It may happen that either τ 0+X 1,on=τ 0+X N(Tt),on<Tt, and then U=0, or τ 0+X 1,on>Tt, and in the latter case we have to subtract a portion (τ 0+X 1,onTt)≤X 2,on that is not included [0,Tt]. A similar consideration is valid if the renewal intervals X t start with OFF periods.

We conclude that the only term that contributes to the limiting behaviour of \(\int_{0}^{Tt}W(u)\,du\) is the sum \(\sum_{j=1}^{N(Tt)}X_{j,\mathrm{on}}\). In the same spirit,

$$Tt=X_{0,\mathrm{on}}+X_{0,\mathrm{off}}+\sum_{j=1}^{N(Tt)}X_{j,\mathrm {on}}+\sum_{j=1}^{N(Tt)}X_{j,\mathrm{off}}-Y, $$

where YX N(Tt)+1,on. Thus, informally,

$$\int_{0}^{Tt}E\bigl[W(u)\bigr]\,du= \frac{\mu_{\mathrm{on}}}{\mu_{\mathrm{on}}+\mu_{\mathrm{off}}}Tt\approx\frac{\mu_{\mathrm{on}}}{\mu_{\mathrm{on}}+\mu_{\mathrm{off}}} \Biggl( \sum _{j=1}^{N(Tt)}X_{j,\mathrm{on}}+\sum _{j=1}^{N(Tt)}X_{j,\mathrm{off}} \Biggr) . $$

Consequently, the limiting behaviour of \(T^{-1/\alpha}\int_{0}^{Tt} \{W(u)-E[W(u)]\}\, du\) is determined by

$$\frac{1}{T^{1/\alpha}}\sum_{j=1}^{N(Tt)} \bigl( J_{j}-E[J_{j}] \bigr) , $$

where after some simple algebra

$$J_{j}=X_{j,\mathrm{on}}-\frac{\mu_{\mathrm{on}}}{\mu} ( X_{j,\mathrm{on}}+X_{j,\mathrm{off}} ) =\frac{\mu_{\mathrm{off}}X_{j,\mathrm{on}}-\mu_{\mathrm{on}}X_{j,\mathrm{off}}}{\mu}. $$

We thus have

$$\frac{1}{T^{1/\alpha}}\sum_{j=1}^{N(Tt)} \bigl( J_{j}-E[J_{j}] \bigr) = \biggl( \frac{N(Tt)}{T} \biggr)^{1/\alpha}\frac{1}{(N(Tt))^{1/\alpha}}\sum_{j=1}^{N(Tt)} \bigl( J_{j}-E[J_{j}] \bigr) . $$

Recall that for a stationary renewal process, \(N(Tt)/T\to E[N(t)]=(\mu_{\mathrm{on}}+\mu_{\mathrm{off}})^{-1}t=\mu^{-1}t\) as T→∞. Therefore, the limiting behaviour of the sum is the same as that of

$$\frac{t^{1/\alpha}}{\mu^{1/\alpha}}\frac{1}{(Tt)^{1/\alpha}}\sum_{j=1}^{Tt} \bigl( J_{j}-E[J_{j}] \bigr) . $$

We note that, as x→∞,

$$P(J_{1}>x)\sim \biggl( \frac{\mu_{\mathrm{off}}}{\mu} \biggr)^{\alpha _{\mathrm{on}}}C_{\mathrm{on}}x^{-\alpha_{\mathrm{on}}}, \quad\ \ \ P(J_{1} <-x)\sim \biggl( \frac{\mu_{\mathrm{on}}}{\mu} \biggr)^{\alpha_{\mathrm{off}}}C_{\mathrm{off}}x^{-\alpha_{\mathrm{off}}}. $$

Since α=α on<α off, application of (4.80) yields

$$T^{-1/\alpha}\sum_{j=1}^{Tt} \bigl(J_{j}-E[J_{j}]\bigr)\Rightarrow \biggl( \frac{\mu_{\mathrm{off}}}{\mu} \biggr) C_{\mathrm{on}}^{1/\alpha}C_{\alpha }^{-1/\alpha} Z_{\alpha}(t), $$

where \(Z_{\alpha}(t)\overset{\mathrm{d}}{=}t^{1/\alpha}S_{\alpha }(1,1,0)\) is a Lévy process. We conclude that (4.196) holds.

Step 2::

Since the Lévy processes Z (m)(t) are independent, the result follows.

 □
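The centred cycle variable J j from Step 1 can be examined directly. A minimal sketch (Python/NumPy; Pareto(α on) ON periods with α on=1.5, so μ on=3, and exponential OFF periods with μ off=1 are illustrative choices, and the Hill estimator used below is a crude one) checks that J j is centred and that its right tail index is close to α on, in line with the tail asymptotics for J 1 displayed above.

import numpy as np

rng = np.random.default_rng(6)
a_on, mu_on, mu_off = 1.5, 3.0, 1.0
mu = mu_on + mu_off
n, k = 10**6, 2000

X_on = (1.0 - rng.random(n)) ** (-1.0 / a_on)            # Pareto(1.5), mean 3
X_off = rng.exponential(mu_off, size=n)
J = X_on - (mu_on / mu) * (X_on + X_off)                 # centred cycle contribution

print("mean of J (should be close to 0):", J.mean())

top = np.sort(J[J > 0])[-k:]                             # k largest positive values
hill = 1.0 / np.mean(np.log(top / top[0]))               # crude Hill estimate
print("Hill estimate of the right tail index:", hill, "  theory: alpha_on =", a_on)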

If the ON and OFF times have a finite variance, similar arguments lead to a Brownian motion as a limit for both limiting regimes. We summarize our observations in Table 4.8.

Table 4.8 Limits for superposition of ON–OFF processes

Similar results to those for renewal reward and ON–OFF processes hold for the infinite source Poisson model; see Konstantopoulos and Lin (1998), Mikosch et al. (2002).

4.9.6 Simultaneous Limits and Further Extensions

What happens when T and M go to infinity simultaneously? The techniques described above fail. Following Mikosch et al. (2002), one can consider the parameter M as an increasing function of T, i.e. M=M(T). Alternatively, following Mikosch and Samorodnitsky (2007), one can let the intensity of the point process τ j depend on the number of sources M. Consequently, we consider the process

$$W_{\lambda_{M},M}^{\ast}(t)=\sum_{m=1}^{M}W^{(m)\ast}(\lambda_{M}t)=\sum _{m=1}^{M}\int_{0}^{\lambda_{M}t}W^{(m)}(u)\,du, $$

where the W(⋅), W (m)(⋅) (m≥1) are independent copies of either a renewal reward, an ON–OFF or an M/G/∞ process. We observe that an increase in the intensity can be interpreted as an increase in time in our original cumulative process \(W_{T,M}^{\ast}(t)\).

Define also a scaling sequence

$$a_{M}=\sqrt{M\operatorname {var}\biggl(\int_{0}^{\lambda_{M}}W(u) \,du\biggr)}. $$

In the examples considered above (i.e. renewal reward, ON–OFF, M/G/∞) we have

$$\operatorname {var}\biggl(\int_{0}^{\lambda_{M}}W(u)\,du\biggr)\sim C \lambda_{M}^{3-\alpha}L(\lambda_{M}). $$

For fixed t, convergence of \(a_{M}^{-1}W_{\lambda_{M},M}^{\ast}(t)\) follows from a classical limit theorem for i.i.d. arrays. Indeed, for some δ>0, using Hölder’s inequality and stationarity of W(u),

$$E \bigl[ \big|W^{\ast}(\lambda_{M}t)\big|^{2+\delta} \bigr] \leq( \lambda_{M}t)^{1+\delta}\int_{0}^{\lambda_{M}t}E \bigl[\big|W(u)-E\bigl[W(u)\bigr]\big|^{2+\delta }\bigr]\,du\leq C(\lambda_{M}t)^{2+\delta}$$

as long as E[|W(0)|2+δ]<∞. In particular, this is fulfilled for the ON–OFF model, and for both the renewal reward and the M/G/∞ model as long as \(E[Y_{1}^{2+\delta}]<\infty\).

If this is the case, we conclude that

$$M^{-\delta/2}\frac{E [ |W^{\ast}(\lambda_{M}t)|^{2+\delta } ]}{ ( \operatorname {var}(\int_{0}^{\lambda_{M}}W(u)\,du) )^{1+\delta/2}} \sim M^{-\delta/2}\frac{(\lambda_{M}t)^{2+\delta}}{\lambda_{M}^{(3-\alpha )(1+\delta/2)}L^{1+\delta/2}(\lambda_{M})}. $$

For each t, the last expression converges to 0 as long as

$$ \lambda_{M}=o\bigl(M^{1/(\alpha-1+\delta)}\bigr) $$
(4.197)

for some δ>0.

For each t, we conclude the convergence of \(a_{M}^{-1}W_{\lambda _{M},M}^{\ast }(t)\) to a normal distribution. The tightness follows clearly from

$$\operatorname {var}\bigl( a_{M}^{-1}W_{\lambda_{M},M}^{\ast}(t-s) \bigr) =a_{M}^{-2}M\operatorname {var}\biggl(\int _{0}^{\lambda_{M}(t-s)}W(u)\,du\biggr)\leq C(t-s)^{3-\alpha}. $$

Therefore, under the fast growth condition (4.197), we conclude the convergence to an fBm. Of course, if we set λ M =T, then, as M→∞, condition (4.197) is clearly fulfilled, and we may recover the convergence in the lim T→∞lim M→∞ scheme.

Condition (4.197) is called a fast growth condition. Indeed, it means that the number M of sources grows faster than the intensity λ M , which as mentioned above, can be interpreted as time.
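The role of the growth condition can be illustrated with a toy calculation (plain Python; the constants, the factor t and the slowly varying function L are ignored, and the exponents are illustrative choices with α=1.5 and δ=1): the polynomial part of the Lyapunov-type ratio displayed above tends to zero for λ M =M 1/4, which satisfies (4.197), and diverges for λ M =M 3, which violates it.

alpha, t = 1.5, 1.0

def lyapunov_ratio(M, gamma, delta):
    """Polynomial part of M^{-delta/2} (lambda_M t)^{2+delta} /
    lambda_M^{(3-alpha)(1+delta/2)} with lambda_M = M^gamma (L and constants ignored)."""
    lam = M ** gamma
    return M ** (-delta / 2) * (lam * t) ** (2 + delta) / lam ** ((3 - alpha) * (1 + delta / 2))

for M in [1e2, 1e4, 1e6, 1e8]:
    print(f"M = {M:9.0e}:  lambda_M = M^0.25 -> {lyapunov_ratio(M, 0.25, 1.0):9.3e}"
          f"   lambda_M = M^3 -> {lyapunov_ratio(M, 3.0, 1.0):9.3e}")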

It should be mentioned that in the original paper, Mikosch et al. (2002), the fast growth for an M/G/∞ process is defined as

$$ \lim_{T\rightarrow\infty}\lambda_{T}T^{1-\alpha}=\infty. $$
(4.198)

On the other hand, the slow growth is defined as

$$ \lim_{T\rightarrow\infty}\lambda_{T}T^{1-\alpha}=0. $$
(4.199)

Similar conditions are imposed in the ON–OFF (Mikosch et al. 2002) or renewal reward context (Taqqu 2002, Pipiras et al. 2004). Roughly speaking, fast growth leads to convergence to an fBm, whereas slow growth leads to a stable limit.

Furthermore, similar results to those presented here can be obtained for very general Poisson shot-noise and cluster processes; see Klüppelberg et al. (2003), Klüppelberg and Kühn (2004), Faÿ et al. (2006), Rolls (2010).

However, the picture may change if we consider more complicated models. In particular, we may obtain an fBm limit even in a slow growth regime (see Mikosch and Samorodnitsky 2007, Fasen and Samorodnitsky 2009).

Furthermore, if the limit in (4.199) is a finite, positive constant, then the limiting process is a fractional Poisson process; see Dombry and Kaj (2011).

4.10 Limit Theorems for Extremes

In this section we study the limiting behaviour of partial maxima based on a stationary sequence X t (\(t\in\mathbb{Z}\)). We start by recalling some basic results for i.i.d. sequences and illustrating Fréchet and Gumbel domains of attraction. Then, for long-memory sequences, we separate our discussion into the Gumbel and the Fréchet case. A primary example for the first situation is a stationary Gaussian sequence. We argue that there is no influence of dependence (in particular, of long memory) on the limiting behaviour of maxima (Berman 1964, 1971; Leadbetter et al. 1978, 1983; Buchmann and Klüppelberg 2005, 2006). On the other hand, there is no available theory for general linear processes with long memory in the Gumbel case. Furthermore, Breidt and Davis (1998) argue that maxima of Gaussian-based stochastic volatility models (with possible long memory) behave as if the random variables were independent.

Next, we turn our attention to the Fréchet domain of attraction. There, the main tool is the point process convergence studied in Sect. 4.3. As we will see, the rate of convergence of maxima of linear processes (weakly or strongly dependent) is the same as for i.i.d. sequences; however, dependence implies that the so-called extremal index is smaller than one (Davis and Resnick 1985). On the other hand, extremes of heavy-tailed stochastic volatility models (with possible long memory) behave again like independent random variables (Davis and Mikosch 2001; Kulik and Soulier 2012, 2013).

These considerations in the Gumbel and Fréchet case may suggest that long memory does not play any role in the limiting behaviour of maxima. However, the picture is much more complicated. This will be illustrated by looking at the extremal behaviour of general stationary stable processes in Sect. 4.10.3. That theory was developed in Samorodnitsky (2004, 2006) and Resnick and Samorodnitsky (2004).

We start our discussion with a sequence X t (\(t\in\mathbb{Z}\)) of i.i.d. random variables with common distribution function F. Define partial maxima by M n =max{X 1,…,X n }. The classical Fisher–Tippett theorem identifies three possible limits for M n . We refer to Chap. 3 in Embrechts et al. (1997) for further details and examples.

Theorem 4.44

Assume that X t (\(t\in\mathbb{Z}\)) is a sequence of i.i.d. random variables. If there exist constants c n >0 and \(d_{n}\in\mathbb{R}\) and a non-degenerate distribution function Λ such that

$$c_{n}^{-1} \bigl( \max\{X_{1},\ldots,X_{n} \}-d_{n} \bigr) \overset {d}{\rightarrow}\varLambda, $$

then Λ is one of the following distributions: Fréchet, Weibull or Gumbel, defined by the cumulative distribution functions

$$\varLambda_{\mathrm{Frechet}}(x)=\exp \bigl( -x^{-\alpha} \bigr)\quad(x>0),\qquad \varLambda_{\mathrm{Weibull}}(x)=\exp \bigl( -(-x)^{\alpha} \bigr)\quad(x\leq0),\qquad \varLambda_{\mathrm{Gumbel}}(x)=\exp \bigl( -e^{-x} \bigr)\quad(x\in\mathbb{R}), $$

respectively, for some α>0.

Example 4.28

Assume that X t (\(t\in\mathbb {N}\)) are standard normal. Choose c n =(2lnn)−1/2 and

$$d_n=\frac{1}{2^{1/2}} \biggl\{2(\log n)^{1/2}- \frac{\log\log n+\log (4\pi)}{2\sqrt{\log n}} \biggr\}. $$

Then the limiting distribution is Gumbel.
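A quick simulation illustrates Example 4.28 (Python/NumPy; n=10 5 and 2000 replications are arbitrary choices). Since the convergence of Gaussian maxima is notoriously slow (of order 1/logn), only rough agreement with the Gumbel distribution should be expected.

import numpy as np

rng = np.random.default_rng(7)
n, reps = 10**5, 2000

c_n = (2 * np.log(n)) ** (-0.5)
d_n = (2 * np.log(n)) ** 0.5 \
      - (np.log(np.log(n)) + np.log(4 * np.pi)) / (2 * (2 * np.log(n)) ** 0.5)
# algebraically the same d_n as in the example

maxima = np.array([rng.standard_normal(n).max() for _ in range(reps)])
standardized = (maxima - d_n) / c_n

for x in [-1.0, 0.0, 1.0, 2.0]:
    print(f"x = {x:4.1f}:  empirical {np.mean(standardized <= x):.3f}"
          f"   Gumbel exp(-exp(-x)) = {np.exp(-np.exp(-x)):.3f}")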

Example 4.29

Assume that X t (\(t\in\mathbb{N}\)) fulfill

$$ P(X_{1}>x)\sim A\frac{1+\beta}{2}{x^{-\alpha}},\qquad P(X_{1}<-x)\sim A\frac{1-\beta}{2}x^{-\alpha}. $$
(4.200)

(The left-tail behaviour is not needed here; we include it for completeness.) Let \(A_{\beta}=A\frac{1+\beta}{2}\). Then,

$$P\bigl((A_{\beta}n)^{-1/\alpha}\max\{X_{1}, \ldots,X_{n}\}\leq x\bigr)=F^{n}\bigl(A_{\beta }^{1/\alpha}xn^{1/\alpha} \bigr)=\bigl(1-\bar{F}\bigl(A_{\beta}^{1/\alpha}xn^{1/\alpha}\bigr)\bigr)^{n}, $$

where \(\bar{F}(x)=1-F(x)\). Hence, for n large enough,

$$P\bigl((A_{\beta}n)^{-1/\alpha}\max\{X_{1}, \ldots,X_{n}\}\leq x\bigr)\approx \biggl( 1-\frac{x^{-\alpha}}{n} \biggr)^{n}\rightarrow\exp\bigl(-x^{-\alpha}\bigr) $$

as n→∞. In this case d n =0, c n =(A β n)1/α, and the limiting law is Fréchet.
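For the Fréchet case the convergence is much faster and is easy to visualize. The sketch below (Python/NumPy) uses the one-sided Pareto law P(X 1>x)=x −α for x≥1, which corresponds to (4.200) with A=1 and β=1, so that A β =1 and c n =n 1/α.

import numpy as np

rng = np.random.default_rng(8)
alpha, n, reps = 1.5, 10**5, 2000

c_n = n ** (1.0 / alpha)                         # A_beta = 1 for this Pareto law
maxima = np.array([((1.0 - rng.random(n)) ** (-1.0 / alpha)).max() for _ in range(reps)])

for x in [0.5, 1.0, 2.0, 4.0]:
    print(f"x = {x:3.1f}:  empirical {np.mean(maxima / c_n <= x):.3f}"
          f"   Frechet exp(-x^(-alpha)) = {np.exp(-x ** (-alpha)):.3f}")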

These examples identify two main classes of distributions and their corresponding extreme value behaviour: (a) the class of regularly varying distributions, that is \(\bar{F}(x)=x^{-\alpha}L(x)\) as x→∞, where L is a slowly varying function; then the limit is Fréchet; and (b) a class of (informally speaking) light-tailed distributions with unbounded support, like normal, log-normal or Gamma; then the limit is Gumbel. The first class is called the domain of attraction of the Fréchet law, and the second one the domain of attraction of the Gumbel law. The third type, Weibull, appears when the distribution has a bounded support, with a regularly varying behaviour at a boundary. This case will not be discussed here.

In the context of the examples above, a natural question is what happens if we drop the i.i.d. assumption. We will discuss this problem separately for the Fréchet and Gumbel domains of attraction respectively.

4.10.1 Gumbel Domain of Attraction

It turns out that the maxima of a (possibly LRD) Gaussian sequence X t (\(t\in\mathbb{N}\)) behave as if the random variables X t (\(t\in \mathbb{N}\)) were independent.

Theorem 4.45

Let X t (\(t\in\mathbb{N}\)) be a stationary Gaussian process with covariance function γ(k) such that Berman’s condition holds:

$$ \lim_{k\rightarrow\infty}\log(k)\gamma(k)=0. $$
(4.201)

Then

$$c_{n}^{-1} \bigl( \max(X_{1},\ldots,X_{n})-d_{n} \bigr) \overset {d}{\rightarrow}\varLambda_{\mathrm{Gumbel}}, $$

where c n =(2logn)−1/2, and

$$d_n=\frac{1}{2^{1/2}} \biggl\{2(\log n)^{1/2}- \frac{\log\log n+\log (4\pi)}{2\sqrt{\log n}} \biggr\}, $$

cf. Example 4.28.

Proof

The proof is only sketched here; some additional technical details can be found in Berman (1964) or Leadbetter et al. (1978, 1983).

We start with the following special version of the normal comparison lemma (see Lemma 3.2 in Leadbetter et al. 1983). For each y,

$$\Bigl\vert P\bigl(\max(X_{1},\ldots,X_{n})\leq y\bigr)-\varPhi^{n}(y)\Bigr\vert \leq Cn\sum_{k=1}^{n}\big|\gamma_{X}(k)\big|\exp \biggl( -\frac{y^{2}}{1+|\gamma_{X}(k)|} \biggr). $$

Next, let us fix x and define u n =c n x+d n . Then, since c n →0 and d n →∞, u n d n as n→∞. Furthermore,

Hence,

$$\exp \bigl(-u_n^2/2 \bigr)\sim\exp\bigl(-d_n^2/2 \bigr)\sim n^{-1}\sqrt{\log n}\sim\frac{u_n}{\sqrt{2}n}. $$

We may write

$$n\big|\gamma_X(k)\big|\exp \biggl(-\frac{u_n^2}{1+|\gamma_X(k)|} \biggr)= n\big|\gamma_X(k)\big|\exp\bigl(-u_n^2\bigr)\exp \biggl(\frac{u_n^2|\gamma_X(k)|}{1+|\gamma_X(k)|} \biggr). $$

Let β>0 and k>n β. Define \(v_{n}=\sup_{k\ge n^{\beta}}|\gamma_{X}(k)|\). Note that

$$v_n u_n^2\sim2 v_n \log n=2\frac{\log n}{\log n^{\beta}}v_n \log n^{\beta}=\frac{2}{\beta}v_n \log n^{\beta}\to0 $$

as γ(n)log(n)→0. We note that this is exactly the place where Berman's condition plays a role. Therefore,

On the other hand, there exists δ>0 such that 1+|γ X (k)|<2−δ. Then

since we may assume without loss of generality that |γ X (k)|≤1. The bound converges to 0 when β<δ/(2+δ). This finishes the proof. □

In Theorem 4.45 we considered a discrete-time process X t (\(t\in\mathbb{Z}\)). The result can be extended to general continuous-time Gaussian processes, in particular to fractional Brownian motion B H (u); see Berman (1971). Furthermore, the result extends to stochastic differential equations driven by fBm. To illustrate this, we consider a continuous-time process Y(u) (\(u\in\mathbb{R}\)) that solves

$$ Y(v)-Y(u)=\int_{u}^{v}\mu\bigl(Y(s)\bigr)\,ds+\int _{u}^{v}\sigma\bigl(Y(s)\bigr)\,dB_{H}(s)\quad(u<v), $$
(4.202)

where μ(⋅) and σ(⋅)>0 are deterministic functions. We recall from Sect. 2.2.5.2 that if μ(x)=μx with μ<0 and σ(x)=σ, then the solution is a fractional Ornstein–Uhlenbeck process

$$Y(u)=\mathrm{FOU}(u)=\sigma\int_{-\infty}^{u}\exp\bigl( \mu(u-v)\bigr)\,dB_{H}(v). $$

The general Berman theory applies and

$$c_{T}^{-1} \Bigl( \max_{0\leq u\leq T}\mathrm{FOU}(u)-d_{T} \Bigr) \overset{d}{\rightarrow}\varLambda_{\mathrm{Gumbel}}, $$

where

with a constant C 0. We note that the rate of convergence does not depend on the Hurst parameter H. This convergence can be treated as the counterpart to the discrete-time situation in Theorem 4.45.

More generally, Buchmann and Klüppelberg (2005, 2006) study processes of the form Y ψ (u)=ψ(FOU(u)), where FOU(u) is a fractional Ornstein–Uhlenbeck process, and ψ is a function. Under general conditions established in those papers, Y ψ (u) solves (4.202), and the inverse function ψ −1 of ψ fulfills

$$\psi^{-1}(u)=\int_{\psi(0)}^{u}\frac{ds}{\sigma(s)}. $$

Furthermore, the authors give general conditions that guarantee

$$ \bigl(c_{T}^{\ast}\bigr)^{-1} \Bigl( \max_{0\leq u\leq T}Y_{\psi}(u)-\psi (d_{T}) \Bigr) \overset{d}{ \rightarrow}\varLambda_{\mathrm{Gumbel}}, $$
(4.203)

where \(c_{T}^{\ast}\) is possibly different than c T . The form of \(c_{T}^{\ast}\) depends on assumptions on ψ. For example, if

$$\lim_{y\rightarrow\infty}\frac{\psi(y+x/y)-\psi(y)}{\psi(y+1/y)-\psi(y)}=x, $$

then

$$c_{T}^{\ast}=\frac{2^{1/2}(-\mu)^{2H}}{\varGamma(2H+1)} \biggl\{ \psi \biggl( d_{T}+\frac{1}{d_{T}} \biggr) -\psi(d_{T}) \biggr\} . $$

In particular, we can choose ψ(x)=exp(x q ), q∈(0,2). Then (4.203) holds with \(c_{T}^{\ast}\) as above, and the limiting distribution is Gumbel. We note further that this is not applicable when q=2: if Z is standard normal, then \(e^{Z^{2}}\) has a regularly varying tail and hence cannot belong to the Gumbel domain of attraction. We refer to Buchmann and Klüppelberg (2005, 2006) for further results.

A natural question arises. Can we generalize the theorem above to linear processes \(X_{t}=\sum_{k=0}^{\infty }a_{k}\varepsilon_{t-k}\), where ε t (\(t\in\mathbb{Z}\)) belong to the domain of attraction of the Gumbel law? The answer is affirmative for weakly dependent sequences. Davis and Resnick (1988, p. 61; see also Rootzén 1986) show that if

$$P \bigl( c_{n}^{-1}\bigl(\max\{\varepsilon_{1}, \ldots,\varepsilon_{n}\}-d_{n}\bigr)<x \bigr) \rightarrow_{d}\varLambda(x), $$

then for the partial maxima of the linear process, we have

$$P \bigl( c_{n}^{-1}\bigl(\max\{X_{1}, \ldots,X_{n}\}-d_{n}\bigr)<x \bigr) \rightarrow_{d} \varLambda^{\theta}(x) $$

with some θ∈(0,1). The parameter θ is called the extremal index and describes the contribution of dependence to the limiting law (see Embrechts et al. 1997 for more details). However, the authors assumed, in particular, that \(\sum_{k=0}^{\infty}|a_{k}|<\infty\), so that long memory is excluded. At the moment there do not seem to be any results for linear processes in the case of long memory.

Breidt and Davis (1998) study stochastic volatility models

$$X_{t}=\xi_{t}\sigma_{t}=\xi_{t}\exp( \eta_{t}/2), $$

where ξ t (\(t\in\mathbb{N}\)) is an i.i.d. standard normal sequence, independent of the stationary zero-mean Gaussian sequence η t . After log-transformation, the sequence

$$Y_{t}:=\log X_{t}^{2}=\eta_{t}+\log\xi_{t}^{2} $$

is represented as the sum of a stationary Gaussian sequence and the log of a \(\chi_{1}^{2}\) random variable. The tail of Y t has a complicated form; nevertheless, it belongs to the domain of attraction of the Gumbel law. A modification of the normal comparison lemma allows us to prove the following result.

Theorem 4.46

Let X t (\(t\in\mathbb{N}\)) be a stochastic volatility model

$$X_{t}=\xi_{t}\exp(\eta_{t}/2), $$

where ξ t (\(t\in\mathbb{N}\)) is an i.i.d. standard normal sequence, independent of the stationary zero-mean Gaussian sequence η t . Assume that the covariance function of η t satisfies Berman’s condition (4.201), and let \(Y_{t}=\log X_{t}^{2}\). Then

$$c_{n}^{-1} \bigl( \max(Y_{1},\ldots,Y_{n})-d_{n} \bigr) \overset {d}{\rightarrow}\varLambda_{\mathrm{Gumbel}}, $$

where c n =(2logn)−1/2,

$$d_{n}\sim2\psi_{1}(\log n)^{1/2}+\psi_{2} \log\bigl((2\log n)^{1/2}\bigr)-\psi_{3}(2\log n)^{-1/2}( \log\log n+\psi_{4})+\psi_{5}, $$

where ψ 1,ψ 2,ψ 3,ψ 4 are positive constants, and \(\psi_{5}\in\mathbb{R}\).

We observe no influence of possible long memory in volatility on the limiting behaviour of maxima. As for Gaussian sequences considered in Theorem 4.45, the only difference appears in the form of the centering constants d n .

4.10.2 Fréchet Domain of Attraction

Recall Example 4.29. If the random variables are i.i.d. such that (4.200) holds, then the limiting distribution is Fréchet. This result can also be obtained using point processes. We recall from Sect. 4.3, Theorem 4.13, that

$$N_{n}:=\sum_{t=1}^{n}\delta_{\tilde{c}_{n}^{-1}X_{t}}\Rightarrow\sum _{l=1}^{\infty}\delta_{j_{l}}=:N, $$

where j l are points of a Poisson process with intensity measure

$$ d\lambda(x)= \alpha \biggl[ \frac{1+\beta}{2}x^{-(\alpha+1)} 1 \{ 0<x<\infty \} +\frac{1-\beta}{2}(-x)^{-(\alpha+1)} 1 \{ -\infty<x<0 \} \biggr]\, dx, $$
(4.204)

and \(\tilde{c}_{n}\) is such that \(P(|X_{1}|>\tilde{c}_{n})\sim n^{-1}\), that is \(\tilde{c}_{n}\sim A^{1/\alpha}n^{1/\alpha}\). We note that the event \(\{\tilde{c}_{n}^{-1}\max\{X_{1},\ldots,X_{n}\}\leq x\}\) is equivalent to {no points of N n in (x,∞)}. Hence, for x>0,

$$P\bigl(\tilde{c}_{n}^{-1}\max\{X_{1},\ldots,X_{n}\}\leq x\bigr)\rightarrow P\bigl(N\bigl((x,\infty)\bigr)=0\bigr)=\exp \biggl( -\frac{1+\beta}{2}x^{-\alpha} \biggr). $$

Changing the scaling from \(\tilde{c}_{n}\) to c n =(A β n)1/α, we immediately conclude

$$P\bigl(c_{n}^{-1}\max\{X_{1},\ldots,X_{n} \}\leq x\bigr)\rightarrow\exp \bigl( -x^{-\alpha} \bigr) = \varLambda_{\mathrm{Frechet}}(x). $$

This approach to extremes via point processes can be generalized to dependent sequences, including series with long memory.

We start with linear processes. As in Sect. 4.3, we assume that \(X_{t}=\sum _{k=0}^{\infty }a_{k}\varepsilon_{t-k}\), where the random variables ε t are i.i.d. with a regularly varying distribution, that is

$$ P(\varepsilon_{1}>x)\sim A\frac{1+\beta}{2}{x^{-\alpha}},\qquad P(\varepsilon_{1}<-x)\sim A\frac{1-\beta}{2}x^{-\alpha}. $$
(4.205)

If α∈(1,2), we assume also that E(ε 1)=0. Of course, since ε t are i.i.d.,

$$P\bigl(c_{n}^{-1}\max\{\varepsilon_{1},\ldots, \varepsilon_{n}\}\leq x\bigr)\rightarrow \exp \bigl( -x^{-\alpha} \bigr) =\varLambda_{\mathrm{Frechet}}(x), $$

where c n =(A β n)1/α.

We saw in Sect. 4.3 that

$$P(X_{1}>x)\sim D_{\alpha}P(\varepsilon_{1}>x),\qquad P(X_{1}<-x)\sim D_{\alpha}P(\varepsilon_{1}<-x), $$

where the constant \(D_{\alpha}=\sum_{j=0}^{\infty}|a_{j}|^{\alpha}\) is assumed to be finite. Hence, if \(X_{t}^{\ast}\) (\(t\in\mathbb{Z}\)) is an i.i.d. sequence with the same marginal distribution as X t , then with the same c n =(A β n)1/α,

$$ P\bigl(c_{n}^{-1}\max\bigl\{X_{1}^{\ast}, \ldots,X_{n}^{\ast}\bigr\}\leq x\bigr)\rightarrow \exp \bigl(-D_{\alpha}x^{-\alpha}\bigr). $$
(4.206)

We note that the constant D α does not play the role of the extremal index (for the definition see e.g. Embrechts et al. 1997) because the i.i.d. random variables \(X_{t}^{\ast}\) have the tail P(X 1>x)∼D α P(ε 1>x). The limiting distribution above will serve as a benchmark for comparison with dependent linear processes X t that have the same marginal distribution as \(X_{t}^{\ast}\). To do this, we will assume without loss of generality that D α =1.

In Theorem 4.14 we showed, in particular, the following convergence of point processes:

$$\sum_{t=1}^{n}\delta_{\tilde{c}_{n}^{-1}X_{t}}\Rightarrow\sum _{l=1}^{\infty }\sum_{r=0}^{\infty}\delta_{j_{l}a_{r}}, $$

where \(\tilde{c}_{n}\sim A^{1/\alpha}n^{1/\alpha}\). Let us also assume for simplicity that all coefficients a j are nonnegative. When restricted to (0,∞), the limiting Poisson process has the intensity measure (cf. Davis and Resnick 1985)

$$\alpha\frac{1+\beta}{2}a_{+}^{\alpha}x^{-(\alpha+1)}dx, $$

where a +=max j a j . The same argument as described above for the i.i.d. case leads to the following result on sample extremes for heavy-tailed processes with possible long memory. Limiting behaviour of extremes follows directly from Lemma 4.19 and Theorem 4.14, under the assumptions therein.

Theorem 4.47

Let X t (\(t\in\mathbb{Z}\)) be a linear process where the innovations ε t (\(t\in\mathbb{Z}\)) are i.i.d. random variables such that (4.205) holds and E(ε 1)=0 if α∈(1,2). Suppose that either for some δ<α,

$$\sum_{j=0}^{\infty}|a_{j}|+\sum_{j=0}^{\infty}|a_{j}|^{\delta}<\infty, $$

or \(a_{j}\sim c_{a}j^{d-1}\), d∈(0,1−1/α), and ε t (\(t\in\mathbb{Z}\)) are symmetric with α∈(1,2). Moreover, assume that D α =1 and a j ≥0. Then with c n =(A β n)1/α,

$$P\bigl(c_{n}^{-1}\max\{X_{1},\ldots,X_{n} \}\leq x\bigr)\rightarrow\exp \bigl(-a_{+}x^{-\alpha}\bigr). $$

This result should be compared with the expression (4.206) for \(X_{1}^{\ast},\ldots,X_{n}^{\ast}\) (with D α =1). The additional term θ:=a +∈(0,1] in the limiting distribution in Theorem 4.47 is the extremal index and describes the effect of dependence on the limiting behaviour of extremes. Since the coefficients a j are positive, extreme values of the sequence X t are generated by large positive values of the sequence ε t . If some of the coefficients are negative, large positive values of X t are possibly due to large negative values of the innovations, and hence the extremal index will change:

$$\theta=a_{+}+a_{-}\frac{1-\beta}{1+\beta}, $$

where a =max{max(−a j ),0}. We refer to Davis and Resnick (1985) and Embrechts et al. (1997) for more details.

We continue our discussion with heavy-tailed stochastic volatility models, as studied in Sect. 4.3.4. We assume that X t =ξ t σ t , where ξ t are i.i.d. such that

$$ P(\xi_{1}>x)\sim A\frac{1+\beta}{2}{x^{-\alpha}},\qquad P(\xi _{1}<-x)\sim A\frac{1-\beta}{2}x^{-\alpha}. $$
(4.207)

We will assume also for simplicity that the sequences σ t and ξ t are independent from each other. Then, \(P(X_{1}>x)\sim AE(\sigma _{1}^{\alpha })\frac{1+\beta}{2}{x^{-\alpha}}\). Hence, if \(X_{1}^{\ast},\ldots ,X_{n}^{\ast }\) are independent copies of X 1, then with c n =(A β n)1/α,

$$P\bigl(c_{n}^{-1}\max\bigl\{X_{1}^{\ast}, \ldots,X_{n}^{\ast}\bigr\}\leq x\bigr)\rightarrow \exp\bigl(-E \bigl(\sigma_{1}^{\alpha}\bigr)x^{-\alpha}\bigr). $$

Again, the constant \(E(\sigma_{1}^{\alpha})\) is related to the marginal behaviour of X t , not to the dependence structure. In Theorem 4.18 we concluded that the point process based on X 1,…,X n has the same limit as for the corresponding i.i.d. copies \(X_{1}^{\ast},\ldots,X_{n}^{\ast}\). Directly from Theorem 4.18 we conclude that the limiting behaviour of maxima associated with heavy-tailed stochastic volatility models is the same as in the i.i.d. case. There is no influence of any dependence in volatility.

Theorem 4.48

Consider the LMSV model X t =ξ t σ t (\(t\in\mathbb{N}\)) such that (4.207), the Breiman condition (4.94) and \(E(\sigma _{1}^{\alpha+\varepsilon })<\infty\) with some ε>0 hold. Also, assume that σ t (\(t\in\mathbb{N}\)) is ergodic. Then

$$P\bigl(c_{n}^{-1}\max\{X_{1},\ldots,X_{n} \}\leq x\bigr)\rightarrow\exp\bigl(-E\bigl(\sigma_{1}^{\alpha} \bigr)x^{-\alpha}\bigr). $$

4.10.3 Stationary Stable Processes

Samorodnitsky (2004, 2006) considers a general stationary symmetric α-stable (SαS) process X t that can be represented by X t =∫g t (s) dM(s), where M is an SαS random measure. As mentioned in Sect. 1.3.6.3, such processes can be decomposed into a dissipative and a conservative part. As we will indicate below, the dissipative part has no influence on the limiting behaviour of maxima, whereas the conservative part does.

Rosiński (1995) argues that the class of ergodic SαS processes that are generated by the dissipative flow coincides with the class of moving averages X t =∫g t (s) dM(s)=∫g(ts) dM(s). In particular, consider a Linear Fractional Stable Motion

$$ Z_{H,\alpha}(u)=\int_{-\infty}^{\infty}Q_{u,1}(x;H,\alpha)\,d Z_{\alpha}(x), $$
(4.208)

where Z α (⋅) is a symmetric α-stable (SαS) Lévy process,

$$ Q_{u,1}(x;H,\alpha)=c_{1} \bigl[ (u-x)_{+}^{H-1/\alpha }-(-x)_{+}^{H-1/\alpha} \bigr] +c_{2} \bigl[ (u-x)_{-}^{H-1/\alpha}-(-x)_{-}^{H-1/\alpha} \bigr], $$
(4.209)

and H>1/α. Let X t =Z H,α (t)−Z H,α (t−1). Samorodnitsky (2004) proves that in this case

$$P\bigl(n^{-1/\alpha}\max\{X_{1},\ldots,X_{n}\}\leq x\bigr) \rightarrow\exp \bigl(-Cx^{-\alpha}\bigr), $$

where C is a positive constant. Hence, the rate of growth of maxima is the same as in the i.i.d. case. We observed this already in the case of moving averages considered in Theorem 4.47.

In contrast, a simple (non-ergodic) example of an SαS process generated by the conservative flow is given by X t =Z 1/β ε t , (\(t\in\mathbb{N}\)), where Z is a strictly positive α/β-stable random variable, and ε t is a sequence of i.i.d. symmetric S β (1,0,0) random variables, independent of Z, and 0<α<β<2. Then, marginally, the random variables X t are α-stable.

We recall that the β-stability and symmetry of random variables ε t yield

$$P(\varepsilon_{1}>x)\sim\frac{1}{2}C_{\beta}x^{-\beta}, $$

cf. (4.75). Choosing c n =(C β /2)1/β n 1/β, we have

$$P\bigl(c_{n}^{-1}\max\{X_{1},\ldots,X_{n}\}\leq x\bigr)=E \bigl[ P\bigl(\max\{\varepsilon_{1},\ldots,\varepsilon_{n}\}\leq c_{n}xZ^{-1/\beta}\mid Z\bigr) \bigr] \rightarrow E \bigl[ \exp\bigl(-Zx^{-\beta}\bigr) \bigr]. $$

Hence, even though the random variables X t are α-stable, the scaling involves β, not α. In other words, maxima grow slower than in the i.i.d. case. This is a general pattern for stable processes generated by a conservative flow. We refer to Samorodnitsky (2004, 2006) and Resnick and Samorodnitsky (2004) for further details.