1 Introduction

As a simple but useful tool, autoregressive (AR) models have been widely used in economics and many other fields. The simplest among them is the autoregressive process of order 1, i.e., the AR(1) model, which is usually defined as

$$\begin{aligned} y_t = \mu + \rho y_{t-1} + e_t, \quad t=1,\cdots ,n. \end{aligned}$$
(1)

where \(y_0\) is a constant and \(e_t\)’s are independent and identically distributed (hereafter, i.i.d.) random errors with mean zero and finite variance. The process \(\{y_t\}\) is (i) stationary if \(|\rho |<1\) independent of n, (ii) unit root if \(\rho =1\), (iii) near unit root if \(\rho =1+c/n\) for some nonzero constant c, (iv) explosive if \(|\rho | > 1\) independent of n, and (v) moderate deviation from a unit root if \(\rho =1+c/k_n\) for some nonzero constant c and a sequence \(\{k_n\}\) satisfying \(k_n\rightarrow \infty \) and \(k_n/n\rightarrow 0\) as \(n\rightarrow \infty \).
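To get a concrete feel for how these regimes differ, the following short Python sketch (our own illustration; NumPy, the Gaussian errors and the specific parameter values, such as \(k_n=\sqrt{n}\), are choices made here rather than anything prescribed by the model) simulates one path of (1) under representative values of \(\rho \) for cases (i)–(v).

```python
import numpy as np

def simulate_ar1(n, mu, rho, y0=0.0, rng=None):
    """Simulate y_t = mu + rho * y_{t-1} + e_t, t = 1, ..., n, with i.i.d. N(0, 1) errors."""
    if rng is None:
        rng = np.random.default_rng(0)
    e = rng.standard_normal(n)
    y = np.empty(n + 1)
    y[0] = y0
    for t in range(1, n + 1):
        y[t] = mu + rho * y[t - 1] + e[t - 1]
    return y[1:]

n, c = 500, -2.0
regimes = {
    "stationary (i)":         0.5,
    "unit root (ii)":         1.0,
    "near unit root (iii)":   1.0 + c / n,
    "explosive (iv)":         1.05,
    "moderate deviation (v)": 1.0 + c / np.sqrt(n),   # k_n = sqrt(n), our choice
}
paths = {name: simulate_ar1(n, mu=0.2, rho=r) for name, r in regimes.items()}
```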

When \(\mu = 0\) and the error variance in model (1) is finite, it is well known in the literature that the least squares estimator of \(\rho \) has quite different limit distributions in the stationary, unit root and near unit root cases; see [17]. The convergence rate of the estimator of the correlation coefficient is \(\sqrt{n}\) in case (i) and n in cases (ii)–(iii), and may even be \((1 + c)^n\) in case (v) for some \(c > 0\), as stated in [19]. More studies on this model can be found in [1, 2, 4, 8, 9, 13, 16, 20] and the references therein.

When \(\mu \ne 0\) and the variance is finite, [21] and [11] studied the limit theory for the AR(1) model in cases (iv) and (v), respectively. It is shown there that the inclusion of a nonzero intercept may drastically change the large-sample properties of the least squares estimator compared to [19]. More recently, [15] studied how to construct a confidence region for \((\mu , \rho )\) that is uniformly valid over cases (i)–(v), based on the empirical likelihood method.

Observe that \(e_t\) may have an infinite variance in practice [5, 18], whereas most of the aforementioned research focused on the case of \(e_t\) having a finite variance. In this paper, we consider model (1) when \(\mu \ne 0\) and the variance of \(e_t\) is possibly infinite. We will derive the limit distribution of the least squares estimator of \((\mu , \rho )\) in the following cases:

  • (P1) \(|\rho |<1\) independent of n;

  • (P2) \(|\rho | > 1\) independent of n;

  • (P3) \(\rho = 1\);

  • (P4) \(\rho = 1 + \frac{c}{n}\) for some constant \(c \ne 0\);

  • (P5) \(\rho = 1 + \frac{c}{n^\alpha }\) for some constants \(c < 0\) and \(\alpha \in (0, 1)\);

  • (P6) \(\rho = 1 + \frac{c}{n^\alpha }\) for some constants \(c > 0\) and \(\alpha \in (0, 1)\).

Since the current paper allows for the inclusion of both an intercept and a possibly infinite variance, it can be treated as an extension of the existing literature, e.g., [11, 14, 17, 19, 21], among others.

We organize the rest of this paper as follows. Section 2 provides the methodology and the main limit results. Detailed proofs are given in Section 3, and Section 4 concludes.

2 Methodology and main results

Under model (1), by minimizing the sum of squares:

$$\begin{aligned} \sum _{t=1}^n (y_t-\mu -\rho y_{t-1})^2, \end{aligned}$$

with respect to \((\mu , \rho )^\top \), we get the least squares estimator for \((\mu ,\rho )^\top \) as follows

$$\begin{aligned} \left\{ \begin{array}{ll} {\hat{\mu }}=\frac{\sum \limits _{s=1}^ny_s\sum \limits _{t=1}^n y_{t-1}^2-\sum \limits _{s=1}^ny_{s-1}\sum \limits _{t=1}^n y_ty_{t-1}}{n\sum \limits _{t=1}^n y_{t-1}^2-\left( \sum \limits _{t=1}^n y_{t-1}\right) ^2}\\ {\hat{\rho }}=\frac{n\sum \limits _{t=1}^n y_ty_{t-1}-\sum \limits _{s=1}^ny_{s-1}\sum \limits _{t=1}^n y_t}{n\sum \limits _{t=1}^n y_{t-1}^2- \left( \sum \limits _{t=1}^n y_{t-1}\right) ^2}. \end{array}\right. \end{aligned}$$
(2)

Here \(A^\top \) denotes the transpose of the matrix or vector A. In the sequel, we will investigate the limit distribution of \(({\hat{\mu }}-\mu , {\hat{\rho }}-\rho )^\top .\)
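For readers who prefer code, (2) is simply an ordinary least squares regression of \(y_t\) on \((1, y_{t-1})\). A minimal sketch is given below (our own; the function and variable names are illustrative only, and NumPy is assumed).

```python
import numpy as np

def ar1_lse(y, y0=0.0):
    """Closed-form least squares estimator (mu_hat, rho_hat) of (mu, rho), mirroring (2)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    y_lag = np.concatenate(([y0], y[:-1]))            # y_0, y_1, ..., y_{n-1}
    s_y, s_lag = y.sum(), y_lag.sum()
    s_lag2, s_cross = (y_lag**2).sum(), (y * y_lag).sum()
    denom = n * s_lag2 - s_lag**2
    mu_hat = (s_y * s_lag2 - s_lag * s_cross) / denom
    rho_hat = (n * s_cross - s_lag * s_y) / denom
    return mu_hat, rho_hat
```

Equivalently, one may feed the design matrix \((1, y_{t-1})\) to any OLS routine; the closed form above just spells out (2).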

To derive the limit distribution of this least squares estimator, we follow [19] by assuming that

  • (C1) The innovations \(\{e_t\}\) are i.i.d. with \(E[e_t]=0\);

  • (C2) The process is initialized at \(y_0=O_p(1)\).

Observing that the variance of the \(e_t\)’s may not exist, we instead use the slowly varying function \(l(x)=E[e_t^2I(|e_t|\le x)]\), as in [14], to characterize the dispersion of the random errors, which is assumed to satisfy

  • (C3) \(l(nx)/l(n) \rightarrow 1\) as \(n \rightarrow \infty \) for any \(x > 0\).

One example of a slowly varying function arises when l(x) has a finite limit, say \(\lim \limits _{x\rightarrow \infty }l(x)=\sigma ^2\), which implies that \(\{e_t\}\) has finite variance \(\sigma ^2\). Another example is \(l(x)=\log (x)\), \(x>1\), in which case the variance of the \(e_t\)’s does not exist. One known property of l(x) is that \(l(x)=o(x^{\varepsilon })\) as \(x\rightarrow \infty \) for any \(\varepsilon >0\); more properties of l(x) can be found in [10]. To deal with the possibly infinite variance, we introduce the following sequence \(\{b_k\}_{k=0}^\infty \), where

$$\begin{aligned} b_0=\inf \{ x\ge 1: l(x)>0\} \end{aligned}$$

and

$$\begin{aligned} b_j=\inf \left\{ s: s\ge b_0+1, \frac{l(s)}{s^2}\le \frac{1}{j}\right\} ,~\text {for}~j=1,2,\ldots ,n, \end{aligned}$$

which imply directly \(n l(b_n)\le b_n^2\) for all \(n\ge 1\); see also [12].
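As a purely numerical illustration of these definitions (no part of the theory depends on it), the sketch below approximates \(l(x)\) and the first few \(b_j\) by Monte Carlo for Student-\(t(2)\) errors, whose variance is infinite while \(l(x)\) grows roughly logarithmically; the error law, sample size and grid are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.standard_t(df=2, size=200_000)       # t(2) errors: mean zero, infinite variance

grid = np.geomspace(1.0, 1e4, 200)
# Monte Carlo approximation of l(x) = E[e_t^2 I(|e_t| <= x)] on a grid of x values
l_grid = np.array([np.mean(np.where(np.abs(e) <= x, e**2, 0.0)) for x in grid])

b0 = grid[np.argmax(l_grid > 0)]             # approximates b_0 = inf{x >= 1 : l(x) > 0}

def b(j):
    """Grid approximation of b_j = inf{s >= b_0 + 1 : l(s)/s^2 <= 1/j}."""
    ok = (grid >= b0 + 1) & (l_grid / grid**2 <= 1.0 / j)
    return grid[np.argmax(ok)] if ok.any() else np.nan

print(l_grid[[49, 99, 199]])                 # l(x) increases slowly in x
print([b(j) for j in (1, 10, 100, 1000)])    # b_j increases in j
```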

For convenience, in the sequel we still call \(|\rho | < 1\) the stationary case, \(\rho = 1\) the unit root case, \(\rho = 1 + \frac{c}{n}\) for some \(c\ne 0\) the near unit root case, \(\rho = 1 + \frac{c}{n^\alpha }\) for some \(c \ne 0\) and \(\alpha \in (0, 1)\) the moderate deviation case, and \(|\rho | > 1\) the explosive case, even when the variance of \(e_t\) is infinite. We divide the theoretical derivations into four separate subsections.

2.1 Limit theory for the stationary case

We first consider the stationary case \(|\rho | < 1\), which is independent of n. Observe that

$$\begin{aligned} y_t = \mu + \rho y_{t-1} + e_t = \frac{\mu }{1 - \rho } + \left( y_0 - \frac{\mu }{1 - \rho }\right) \rho ^t + \sum _{j=1}^{t} \rho ^{t-j} e_j. \end{aligned}$$

We write \({\bar{y}}_t = y_t - \frac{\mu }{1 - \rho }\), and then have

$$\begin{aligned} {{\bar{y}}}_t=\rho {{\bar{y}}}_{t-1}+e_t. \end{aligned}$$

To prove the main result for this case, we need the following preliminary lemma.

Lemma 1

Suppose conditions (C1)–(C3) hold. Under P1, as \(n\rightarrow \infty \), we have

$$\begin{aligned}{} & {} \frac{1}{n} \sum _{t=1}^n y_{t-1} \overset{p}{\longrightarrow }\ \frac{\mu }{1-\rho },\\{} & {} \frac{1}{nl(b_n)} \sum _{t=1}^n y_{t-1}^2 \overset{p}{\longrightarrow } {\left\{ \begin{array}{ll} \frac{1}{1-\rho ^2},&{}~\text {if}~\lim \limits _{m\rightarrow \infty }l(b_m)=\infty ,\\ \frac{1}{1-\rho ^2}+\frac{\mu ^2}{\sigma ^2(1-\rho )^2},&{}~\text {if}~\lim \limits _{m\rightarrow \infty }l(b_m)=\sigma ^2, \end{array}\right. } \end{aligned}$$

and

$$\begin{aligned} \begin{pmatrix} \frac{1}{\sqrt{nl(b_n)}}\sum \limits _{t=1}^n e_t\\ \frac{1}{\sqrt{n}l(b_n)}\sum \limits _{t=1}^n {{\bar{y}}}_{t-1}e_t \end{pmatrix} \overset{d}{\longrightarrow } \begin{pmatrix} W_1\\ W_2 \end{pmatrix} \sim N\left( \begin{pmatrix}0\\ 0 \end{pmatrix}, \begin{pmatrix}1 & 0\\ 0 & \frac{1}{1-\rho ^2} \end{pmatrix} \right) . \end{aligned}$$
(3)

Based on Lemma 1, we can show the following theorem.

Theorem 1

Under conditions (C1)–(C3), as \(n\rightarrow \infty \), we have under P1 that

$$\begin{aligned} \left( \begin{array}{ccc} \sqrt{\frac{n}{l(b_n)}}({\hat{\mu }}-\mu ) \\ \sqrt{n} ({\hat{\rho }}-\rho ) \end{array} \right) \overset{d}{\longrightarrow }\ \left( \begin{array}{ccc} X_1 \\ X_2 \end{array} \right) ; \end{aligned}$$

where \(X_1= W_1 - \frac{\mu (1+\rho )}{\sigma ^2}W_2\) and \(X_2 = (1-\rho ^2)W_2\) if \(\lim \limits _{m \rightarrow \infty } l(b_m) = \sigma ^2\), and \(X_1 = W_1\) and \(X_2 = (1-\rho ^2)W_2\) if \(\lim \limits _{m \rightarrow \infty } l(b_m) = \infty \).

Remark 1

Theorem 1 indicates that a possibly infinite variance may affect the convergence rate of the least squares estimator of the intercept, but has no impact on that of \(\rho \).

Remark 2

When \(\lim \limits _{m\rightarrow \infty } l(b_m)\) exists and is equal to \(\sigma ^2\), we have \((X_1, X_2)^\top \sim N(0, \Sigma _1)\), where \(\Sigma _1 = (\sigma _{ij}^2)_{1\le i,j\le 2}\) with \(\sigma _{11}^2 = 1 + \frac{\mu ^2(1 + \rho )}{\sigma ^4 (1 - \rho )}\), \(\sigma _{12}^2 = \sigma _{21}^2 = - \frac{\mu (1+\rho )}{\sigma ^2}\) and \(\sigma _{22}^2 = 1 - \rho ^2\). That is, the limit distribution reduces to the ordinary case; see [15] and references therein for details.
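As a quick sanity check of Theorem 1 in the finite-variance case, the following Monte Carlo sketch (our own parameter choices, with standard normal errors so that \(l(b_n)\rightarrow \sigma ^2=1\)) compares the empirical variance of \(\sqrt{n}({\hat{\rho }}-\rho )\) with the theoretical value \(1-\rho ^2\) implied by \(X_2\).

```python
import numpy as np

rng = np.random.default_rng(1)
mu, rho, n, reps = 0.5, 0.6, 2000, 2000      # illustrative choices only

rho_err = np.empty(reps)
for r in range(reps):
    e = rng.standard_normal(n)               # finite variance: sigma^2 = 1
    y = np.empty(n + 1)
    y[0] = 0.0
    for t in range(n):
        y[t + 1] = mu + rho * y[t] + e[t]
    X = np.column_stack([np.ones(n), y[:-1]])             # regress y_t on (1, y_{t-1})
    mu_hat, rho_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    rho_err[r] = np.sqrt(n) * (rho_hat - rho)

print(rho_err.var(), 1 - rho**2)             # both should be close to 0.64 for rho = 0.6
```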

2.2 Limit theory for the explosive case

For this case, let \({\tilde{y}}_t=\sum _{i=1}^t \rho ^{t-i}e_i+\rho ^t y_0\), then

$$\begin{aligned} y_t=\mu \frac{1-\rho ^t}{1-\rho }+\rho ^t y_0+\sum _{i=1}^t \rho ^{t-i}e_i=\mu \frac{1-\rho ^t}{1-\rho }+{\tilde{y}}_t. \end{aligned}$$

Along the same line as Section 2.1, we derive a preliminary lemma first as follows.

Lemma 2

Suppose conditions (C1)–(C3) hold. Under P2, as \(n\rightarrow \infty \), we have

$$\begin{aligned} \begin{pmatrix} \frac{1}{\sqrt{nl(b_n)}}\sum \limits _{t=1}^n e_t\\ \frac{1}{\sqrt{l(b_n)}}\sum \limits _{t=1}^n\rho ^{-(n-t)}e_t\\ \frac{\rho }{\sqrt{l(b_n)}}\sum \limits _{t=1}^{n-1}\rho ^{-t}e_t+\rho y_0 \end{pmatrix} \overset{d}{\longrightarrow } \begin{pmatrix} W_1\\ U_1\\ U_2 \end{pmatrix}, \end{aligned}$$

and

$$\begin{aligned} \begin{pmatrix} \rho ^{-(n-2)}\{l(b_n)\}^{-1}\sum \limits _{t=1}^n {{\tilde{y}}}_{t-1}e_t\\ (\rho ^2-1)\rho ^{-2(n-1)}\{l(b_n)\}^{-1}\sum \limits _{t=1}^n {{\tilde{y}}}_{t-1}^2 \end{pmatrix} \overset{d}{\longrightarrow } \begin{pmatrix} U_1U_2\\ U_2^2 \end{pmatrix}, \end{aligned}$$

where \(U_1\) and \(U_2\) denote the distributional limits, as \(m\rightarrow \infty \), of \(\frac{1}{\sqrt{l(b_m)}}\sum \limits _{t=1}^m\rho ^{-(m-t)}e_t\) and \(\rho y_0 + \frac{\rho }{\sqrt{l(b_m)}}\sum \limits _{t=1}^{m-1}\rho ^{-t}e_t\), respectively, \(W_1\) is specified in Lemma 1, and \(W_1\), \(U_1\), \(U_2\) are mutually independent random variables.

Using this lemma, we can obtain the following theorem.

Theorem 2

Under conditions (C1)–(C3), as \(n\rightarrow \infty \), we have

$$\begin{aligned} \left( \begin{array}{ccc} \sqrt{\frac{n}{l(b_n)}}({\hat{\mu }}-\mu ) \\ \rho ^n({\hat{\rho }}-\rho ) \end{array} \right) \overset{d}{\longrightarrow } \left( \begin{array}{ccc} W_1 \\ (\rho ^2-1)\frac{U_1}{U_2+\mu \rho /(\rho -1)} \end{array} \right) , \end{aligned}$$

under P2.

Similar to the case of \(|\rho | < 1\), Theorem 2 indicates that the possibly infinite variance only affects the convergence rate of \({\hat{\mu }}\). The joint limit distribution reduces to that obtained in [21] if \(\lim \limits _{m\rightarrow \infty } l(b_m)\) is finite.

2.3 Limit theory for the unit root and near unit root cases

In these two cases, \(\rho =\rho _n=1+\frac{c}{n}\) with \(c\in {\mathbb {R}}\), where \(c = 0\) corresponds to \(\rho = 1\), i.e., the unit root case. Let \({\tilde{y}}_t=\sum _{i=1}^t \rho ^{t-i}e_i\); then

$$\begin{aligned} y_t=\mu +\rho y_{t-1}+e_t=\mu \left( \sum _{j=0}^{t-1}\rho ^j\right) +\rho ^t y_0+{\tilde{y}}_t. \end{aligned}$$

We have the following lemma:

Lemma 3

Let \(E_n(s)=\frac{\sum \limits _{i=1}^{\lfloor ns\rfloor }e_i}{\sqrt{nl(b_n)}}\), \(s\in [0, 1]\). Then

$$\begin{aligned} E_n(s) \overset{D}{\longrightarrow }\ {{\bar{W}}}(s),~\text {in}~ D[0, 1]~\text {as}~n\rightarrow \infty , \end{aligned}$$

where \(\{{{\bar{W}}}(s), s\ge 0\}\) is a standard Brownian motion, \(\lfloor \cdot \rfloor \) is the floor function, and \(\overset{D}{\longrightarrow }\) denotes weak convergence in D[0, 1]. Moreover, define \(J_c(s) = \lim \limits _{a \rightarrow c} \frac{1-e^{as}}{-a}\); then, as \(n\rightarrow \infty \), we have under P3 and P4 that

$$\begin{aligned} \left\{ \begin{array}{ll} &{}n^{-2}\sum \limits _{t=1}^n\left( \sum \limits _{j=0}^{t-2}\rho ^j\right) \rightarrow \int _0^1 J_{c}(s)\, ds,\\ &{}n^{-3}\sum \limits _{t=1}^n\left( \sum \limits _{j=0}^{t-2}\rho ^j\right) ^2\rightarrow \int _0^1 J_{c}^2(s)\, ds,\\ &{}n^{-3/2}\sum \limits _{t=1}^n\left( \sum \limits _{j=0}^{t-2}\rho ^j\right) \frac{e_t}{\sqrt{l(b_n)}} \overset{d}{\longrightarrow }\ \int _0^1 J_{c}(s)\, d{{\bar{W}}}(s). \end{array}\right. \end{aligned}$$

and in turn

$$\begin{aligned} \frac{\sum \limits _{t=1}^n{\tilde{y}}_t^2}{n^2 l(b_n)} \overset{d}{\longrightarrow }\ \int _0^1 e^{-2c(1-s)} {{\bar{W}}}^2(B_c(s))\,ds, \end{aligned}$$
$$\begin{aligned} \frac{\sum \limits _{t=1}^n{\tilde{y}}_t}{n^{3/2} \sqrt{l(b_n)}}\overset{d}{\longrightarrow }\ \int _0^1 e^{-c(1-s)} \bar{W}(B_c(s))\,ds, \end{aligned}$$
$$\begin{aligned} \frac{\sum \limits _{t=1}^n{\tilde{y}}_{t-1}e_t}{n l(b_n)} \overset{d}{\longrightarrow }\ -c\int _0^1 e^{-2c(1-s)}\bar{W}^2(B_c(s))\,ds\ +\frac{{{\bar{W}}}^2(B_c(1))}{2}-\frac{1}{2}, \end{aligned}$$

where \(B_c(s)=e^{2c}(e^{-2cs}-1)/(-2c)\).
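For orientation, evaluating the limit in the definition of \(J_c\) is a routine calculation (no additional assumptions are involved):

$$\begin{aligned} J_c(s)=\lim _{a\rightarrow c}\frac{1-e^{as}}{-a}=\frac{e^{cs}-1}{c}~\text {for}~c\ne 0, \qquad J_0(s)=s, \end{aligned}$$

and likewise \(B_c(s)\rightarrow s\) as \(c\rightarrow 0\). Hence, in the unit root case P3, the limits above reduce to the familiar functionals \(\int _0^1 s\,ds\), \(\int _0^1 s^2\,ds\), \(\int _0^1 s\, d{{\bar{W}}}(s)\), \(\int _0^1 {{\bar{W}}}^2(s)\,ds\), \(\int _0^1 {{\bar{W}}}(s)\,ds\) and \(({{\bar{W}}}^2(1)-1)/2\).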

This lemma can be proved easily using techniques similar to those in [6], together with the fact that (here \(e_t^{(2)}\) is the truncated tail part defined in (4) below)

$$\begin{aligned} \frac{1}{n^{3/2}\sqrt{l(b_n)}}\sum _{t=1}^n\left( \sum _{j=0}^{t-2}\rho ^j\right) e_t^{(2)} \overset{p}{\longrightarrow }\ 0,~n\rightarrow \infty . \end{aligned}$$

Using this lemma, it is easy to check the following theorem.

Theorem 3

Under the conditions (C1)–(C3), as \(n\rightarrow \infty \), we have

$$\begin{aligned} \left( \begin{array}{ccc} \sqrt{\frac{n}{l(b_n)}} ({\hat{\mu }}-\mu ) \\ \sqrt{\frac{n^3}{l(b_n)}} ({\hat{\rho }}-\rho ) \end{array} \right) \overset{d}{\longrightarrow } \left( \begin{array}{ccc} Y_1/d \\ Y_2/(\mu d) \end{array} \right) ; \end{aligned}$$

under P3 and P4, where

$$\begin{aligned} d= & {} \int _0^1 J_c^2(s)\,ds -\left( \int _0^1 J_c(s)\,ds\right) ^2,\\ Y_1= & {} {{\bar{W}}}(1) \int _0^1 J_c^2(s)\,ds - \int _0^1 J_c(s)\,ds \int _0^1 J_{c}(s)\, d{{\bar{W}}}(s),\\ Y_2= & {} \int _0^1 J_{c}(s)\, d{{\bar{W}}}(s)- {{\bar{W}}}(1)\int _0^1 J_c(s)\,ds. \end{aligned}$$

2.4 Limit theory for the moderate deviation cases

As stated in [19], the moderate deviation cases bridge the different convergence rates of cases P1–P4. That is, case P5 bridges the stationary and near unit root cases, while case P6 bridges the explosive and near unit root cases. The derivations for these two cases need to be handled differently because, for \(c > 0\), the martingale central limit theorem fails to hold. Following [19], we consider them separately.

The following lemma is useful in deriving the limit distribution of the least squares estimator under cases P5–P6.

Lemma 4

Suppose conditions (C1)–(C3) hold.

  • (i) Under P5, as \(n \rightarrow \infty \), we have

    $$\begin{aligned} \begin{pmatrix} \frac{1}{\sqrt{n}}\sum \limits _{t=1}^n\frac{e_t}{\sqrt{l(b_n)}}\\ \frac{1}{\sqrt{n^\alpha }}\sum \limits _{t=1}^n\frac{\rho ^{t-1}e_t}{\sqrt{l(b_n)}}\\ \frac{1}{\sqrt{n^\alpha }}\sum \limits _{t=1}^n\frac{\rho ^{n-t}e_t}{\sqrt{l(b_n)}}\\ \frac{1}{\sqrt{n^{1+\alpha }}}\sum \limits _{t=2}^n \frac{e_t}{\sqrt{l(b_n)}} \sum \limits _{j=1}^{t-1} \frac{\rho ^{t-1-j}e_j}{\sqrt{l(b_n)}} \end{pmatrix} ~\overset{d}{\longrightarrow }~ \begin{pmatrix} V_{11}\\ V_{12}\\ V_{13}\\ V_{14} \end{pmatrix} \sim N(0, \Sigma _2), \end{aligned}$$

    where \(\Sigma _2= diag (1, -\frac{1}{2c},-\frac{1}{2c},-\frac{1}{2c})\), which implies that \(V_{1i}\)’s are independent;

  • (ii) Under P6, as \(n \rightarrow \infty \), we have

    $$\begin{aligned} \begin{pmatrix} \frac{1}{\sqrt{n}}\sum \limits _{t=1}^n\frac{e_t}{\sqrt{l(b_n)}}\\ \frac{1}{\sqrt{n^\alpha }}\sum \limits _{t=1}^n\frac{\rho ^{-t}e_t}{\sqrt{l(b_n)}}\\ \frac{1}{\sqrt{n^{\alpha }}}\sum \limits _{t=1}^n\frac{\rho ^{t-1-n}e_t}{\sqrt{l(b_n)}} \end{pmatrix} ~\overset{d}{\longrightarrow }~ \begin{pmatrix} V_{21}\\ V_{22}\\ V_{23} \end{pmatrix}, \end{aligned}$$

    and

    $$\begin{aligned} \frac{1}{\rho ^n n^\alpha }\sum _{t=2}^n \left( \sum _{i=1}^{t-1}\frac{\rho ^{t-1-i}e_i}{\sqrt{l(b_n)}}\right) \frac{e_t}{\sqrt{l(b_n)}} \overset{d}{\longrightarrow }\ V_{22}V_{23}, \end{aligned}$$

    where \((V_{21}, V_{22}, V_{23})^\top \sim N(0, \Sigma _3)~\text {with}~\Sigma _3=diag(1, \frac{1}{2c},\frac{1}{2c})\), which implies that \(V_{2i}\)’s are independent.

Theorem 4

Suppose conditions (C1)–(C3) hold.

  • (i) Under P5, we have as \(n\rightarrow \infty \)

    $$\begin{aligned} \left( \begin{array}{ccc} a_n({\hat{\mu }}-\mu ) \\ a_nn^\alpha ({\hat{\rho }}-\rho ) \end{array} \right) \overset{d}{\longrightarrow }\left( \begin{array}{ccc} \frac{\mu }{cd} \\ \frac{1}{d} \end{array} \right) ^\top Z; \end{aligned}$$

    where

    $$\begin{aligned} a_n= & {} \frac{\max (n^{1-\alpha /2}l(b_n), n^{3\alpha /2})}{\max (n^{\alpha }\sqrt{l(b_n)}, n^{1/2}l(b_n))},\\ Z= & {} \frac{\mu }{c} V_{12} I(\alpha> 1/2) + V_{14} I(\alpha \le 1/2),\\ d= & {} \frac{\mu ^2}{-2c^3} I(\alpha > 1/2) + \frac{1}{-2c}I(\alpha \le 1/2), \end{aligned}$$

    if \(\lim \limits _{n\rightarrow \infty } l(b_n) = \infty \), and

    $$\begin{aligned} a_n= & {} n^{\max (\alpha , 1/2) - \alpha /2},\\ Z= & {} \frac{\mu }{c} V_{12} I(\alpha \ge 1/2) + V_{14} I(\alpha \le 1/2),\\ d= & {} \frac{\mu ^2}{-2c^3} I(\alpha \ge 1/2) + \frac{1}{-2c}I(\alpha \le 1/2), \end{aligned}$$

    if \(\lim \limits _{m\rightarrow \infty } l(b_m) = \sigma ^2\).

  • (ii) Under P6, we have as \(n\rightarrow \infty \)

    $$\begin{aligned} \left( \begin{array}{ccc} \sqrt{\frac{n}{l(b_n)}}({\hat{\mu }}-\mu ) \\ \frac{n^{3\alpha /2}\rho ^n}{\sqrt{l(b_n)}}({\hat{\rho }}-\rho ) \end{array} \right) \overset{d}{\longrightarrow }\left( \begin{array}{ccc} V_{21} \\ \frac{2c^2}{\mu }V_{23} \end{array} \right) . \end{aligned}$$

Remark 3

Theorem 4 indicates that the possibly infinite variance affects the estimators of both \(\mu \) and \(\rho \). Similar to [21], the limit distribution of the least squares estimators under P5 is degenerate; slightly differently, however, we obtain the exact limit distribution in this case.

Remark 4

Under some mild conditions, it should be possible to extend the current results, by similar arguments, to the case \(\rho = 1 + \frac{c}{k_n}\) for a general sequence \(\{k_n\}\) satisfying \(k_n \rightarrow \infty \) and \(k_n / n \rightarrow 0\) as \(n \rightarrow \infty \), as studied in [19].

3 Detailed proofs of the main results

In this section, we provide all detailed proofs of the lemmas and theorems stated in Section 2.

Proof of Lemma 1

To handle the possible infinite variance, we use the truncated random variables. Let

$$\begin{aligned} \left\{ \begin{array}{ll} e_t^{(1)}=e_tI(|e_t|\le b_n)- E[e_tI(|e_t|\le b_n)],\\ e_t^{(2)}=e_tI(|e_t|> b_n)- E[e_tI(|e_t|> b_n)], \end{array}\right. \end{aligned}$$
(4)

where \(I(\cdot )\) denotes the indicator function. The key step is to show that the error incurred by replacing \(e_t\) with \(e_t^{(1)}\) in the summations is negligible.
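The following toy sketch (our own; the error law, sample size and the rough stand-in used for \(b_n\) are illustrative choices, and the population centerings in (4) are replaced by sample means) shows this split of \(e_t\) into a truncated part and a tail part, and why the tail part is negligible at the \(\sqrt{nl(b_n)}\) scale.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
e = rng.standard_t(df=2, size=n)             # heavy-tailed errors with infinite variance
b_n = np.sqrt(n * np.log(n))                 # rough stand-in for b_n, not the exact sequence

trunc = e * (np.abs(e) <= b_n)
tail = e * (np.abs(e) > b_n)
e1 = trunc - trunc.mean()                    # sample analogue of e_t^{(1)} in (4)
e2 = tail - tail.mean()                      # sample analogue of e_t^{(2)} in (4)

l_bn = np.mean(trunc**2)                     # sample analogue of l(b_n)
print(e1.sum() / np.sqrt(n * l_bn))          # O_p(1): the dominant contribution, cf. (5)
print(e2.sum() / np.sqrt(n * l_bn))          # small: the tail part is negligible, cf. (6)
```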

Let \(\{{{\bar{y}}}_t^{(1)}\}\) and \(\{{{\bar{y}}}_t^{(2)}\}\) be two time series satisfying

$$\begin{aligned} {{\bar{y}}}_t^{(k)}=\rho {{\bar{y}}}_{t-1}^{(k)}+ e_t^{(k)}, \quad k=1,2. \end{aligned}$$

Obviously, \(\{e_t^{(1)} / \sqrt{l(b_n)}: t\ge 1\}\) are i.i.d., and under P1 it is easy to check that \(\{{{\bar{y}}}_{t-1}e_t^{(1)} / \sqrt{l(b_n)} : t\ge 1\}\) is a martingale difference sequence with respect to \({\mathcal {F}}_{t-1} =\sigma (\{e_s: s \le t-1\})\), \(t = 1, 2, \cdots , n\), satisfying Lindeberg’s condition. Hence, by the Cramér-Wold device and the central limit theorem for martingale difference sequences, we have

$$\begin{aligned} \begin{pmatrix} \frac{1}{\sqrt{nl(b_n)}}\sum \limits _{t=1}^n e_t^{(1)}\\ \frac{1}{\sqrt{n}l(b_n)}\sum \limits _{t=1}^n {{\bar{y}}}_{t-1}^{(1)} e_t^{(1)} \end{pmatrix} \overset{d}{\longrightarrow } \begin{pmatrix} W_1\\ W_2 \end{pmatrix}. \end{aligned}$$
(5)

Next, under condition (C3), it follows from [7] that

$$\begin{aligned} E[|e_t|I(|e_t|>b_n)]=o(l(b_n)b_n^{-1}),\quad n\rightarrow \infty . \end{aligned}$$

Then by \(nl(b_n)\le b_n^2\) and the Markov inequality, we have, for any \(\varepsilon > 0\),

$$\begin{aligned} P\left( \left| \sum _{t=1}^n e_t^{(2)}\right| \ge \sqrt{nl(b_n)}\varepsilon \right)\le & {} \frac{\sum \limits _{t=1}^n E|e_t^{(2)}|}{\sqrt{nl(b_n)}\varepsilon }\\\le & {} 2\frac{\sum \limits _{t=1}^nE[|e_t|I(|e_t|>b_n)]}{\sqrt{nl(b_n)}\varepsilon }\\= & {} o\left( \frac{\sqrt{nl(b_n)}}{b_n}\right) = o(1), ~ \text { as } n \rightarrow \infty , \end{aligned}$$

That is,

$$\begin{aligned} \frac{1}{\sqrt{nl(b_n)}}\sum _{t=1}^n e_t^{(2)} = {o_p(1)},\quad n\rightarrow \infty , \end{aligned}$$
(6)

Furthermore, note that \({{\bar{y}}}_{t-1}^{(k)}=\rho ^{t-1}\bar{y}_0^{(k)}+\sum _{i=1}^{t-1}\rho ^{t-1-i}e_i^{(k)},~k=1,2\). By the Hölder inequality, we have

$$\begin{aligned} E\left( \left| \frac{e_t^{(1)}}{\sqrt{l(b_n)}}\right| \right) \le \left\{ E\left( \frac{e_t^{(1)}}{\sqrt{l(b_n)}}\right) ^2\right\} ^{1/2}\le 1. \end{aligned}$$

Using the Markov inequality, we have

$$\begin{aligned}{} & {} P\left( \left| \sum _{t=1}^n {{\bar{y}}}_{t-1}^{(1)}e_t^{(2)}\right| \ge \sqrt{n}l(b_n)\varepsilon \right) \\{} & {} \le \frac{1}{\sqrt{n}l(b_n)\varepsilon }\sum _{t=1}^nE(|{{\bar{y}}}_{t-1}^{(1)}|)E(|e_t^{(2)}|)\\{} & {} \le \frac{1}{\sqrt{n}l(b_n)\varepsilon }\frac{1}{1-\rho }\{E(|{{\bar{y}}}_0^{(1)}|)+n E(|e_t^{(1)}|)\}E(|e_t^{(2)}|)\\{} & {} = o(n^{-1/2}b_n^{-1}) + o\left( \frac{\sqrt{nl(b_n)}}{b_n}\right) =o(1), \end{aligned}$$

i.e.,

$$\begin{aligned} \frac{1}{\sqrt{n}l(b_n)}\sum _{t=1}^n {{\bar{y}}}_{t-1}^{(1)}e_t^{(2)} = o_p(1), \quad \text {as } n \rightarrow \infty . \end{aligned}$$
(7)

Similarly, we can show

$$\begin{aligned} \frac{1}{\sqrt{n}l(b_n)}\sum _{t=1}^n {{\bar{y}}}_{t-1}^{(2)}e_t^{(j)} = o_p(1),\quad n\rightarrow \infty , ~ j = 1, 2. \end{aligned}$$
(8)

Equations (6)–(8), together with the decompositions \(e_t=e_t^{(1)}+e_t^{(2)}\) and \({{\bar{y}}}_{t-1}={{\bar{y}}}_{t-1}^{(1)}+{{\bar{y}}}_{t-1}^{(2)}\), show

$$\begin{aligned} \begin{pmatrix} \frac{1}{\sqrt{nl(b_n)}}\sum \limits _{t=1}^n e_t\\ \frac{1}{\sqrt{n}l(b_n)}\sum \limits _{t=1}^n {{\bar{y}}}_{t-1}e_t \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{nl(b_n)}}\sum \limits _{t=1}^n e_t^{(1)}\\ \frac{1}{\sqrt{n}l(b_n)}\sum \limits _{t=1}^n {{\bar{y}}}_{t-1}^{(1)}e_t^{(1)} \end{pmatrix} + o_p(1), \quad \text {as } n \rightarrow \infty , \end{aligned}$$

which, combined with (5), shows (3).

Note that \({{\bar{y}}}_t=\rho {{\bar{y}}}_{t-1}+e_t\). Multiplying both sides by \({{\bar{y}}}_t\) and \({{\bar{y}}}_{t-1}\), respectively, and summing over t, we have

$$\begin{aligned} {\left\{ \begin{array}{ll} \sum \limits _{t=1}^n {{\bar{y}}}_t^2=\rho \sum \limits _{t=1}^n {{\bar{y}}}_t{{\bar{y}}}_{t-1}+\sum \limits _{t=1}^n {{\bar{y}}}_te_t,\\ \sum \limits _{t=1}^n {{\bar{y}}}_t{{\bar{y}}}_{t-1}=\rho \sum \limits _{t=1}^n {{\bar{y}}}_{t-1}^2+\sum \limits _{t=1}^n {{\bar{y}}}_{t-1}e_t. \end{array}\right. } \end{aligned}$$

Since

$$\begin{aligned} \frac{1}{\sqrt{n}l(b_n)}\sum _{t=1}^n {{\bar{y}}}_{t-1}e_t \overset{d}{\longrightarrow }W_2, \end{aligned}$$

and

$$\begin{aligned} \sum _{t=1}^n {{\bar{y}}}_te_t=\rho \sum _{t=1}^n {{\bar{y}}}_{t-1}e_t+\sum _{t=1}^n e_t^2, \end{aligned}$$

we have

$$\begin{aligned} \frac{\sum _{t=1}^n {{\bar{y}}}_te_t}{nl(b_n)} \overset{p}{\longrightarrow }1,~n\rightarrow \infty , \end{aligned}$$

by noting that \(\frac{\sum _{t=1}^n e_t^2}{nl(b_n)} \overset{p}{\longrightarrow }1\) (see (3.4) in [12]). Hence,

$$\begin{aligned} \left\{ \begin{array}{ll} \frac{\sum \limits _{t=1}^n {{\bar{y}}}_t^2}{nl(b_n)}=\rho \frac{\sum \limits _{t=1}^n {{\bar{y}}}_t{{\bar{y}}}_{t-1}}{nl(b_n)}+1+o_p(1)\\ \frac{\sum \limits _{t=1}^n {{\bar{y}}}_t{{\bar{y}}}_{t-1}}{nl(b_n)}=\rho \frac{\sum \limits _{t=1}^n {{\bar{y}}}_{t-1}^2}{nl(b_n)}+o_p(1), \end{array}\right. \end{aligned}$$

which, noting that \(\sum \limits _{t=1}^n {{\bar{y}}}_t^2-\sum \limits _{t=1}^n {{\bar{y}}}_{t-1}^2={{\bar{y}}}_n^2-{{\bar{y}}}_0^2=o_p(nl(b_n))\), implies that as \(n\rightarrow \infty \)

$$\begin{aligned} \left\{ \begin{array}{ll} \frac{\sum \limits _{t=1}^n {{\bar{y}}}_t^2}{nl(b_n)} \overset{p}{\longrightarrow }\frac{1}{1-\rho ^2}\\ \frac{\sum \limits _{t=1}^n {{\bar{y}}}_t{{\bar{y}}}_{t-1}}{nl(b_n)} \overset{p}{\longrightarrow }\frac{\rho }{1-\rho ^2}. \end{array}\right. \end{aligned}$$

Note that

$$\begin{aligned} \frac{1}{n} \sum _{t=1}^n y_{t-1}= & {} \frac{1}{n} \sum _{t=1}^n \bar{y}_{t-1}+\frac{\mu }{1-\rho } = \frac{\mu }{1-\rho } + o_p(1), \end{aligned}$$

and

$$\begin{aligned} \frac{1}{nl(b_n)} \sum _{t=1}^n y_{t-1}^2=\frac{1}{nl(b_n)} \sum _{t=1}^n {{\bar{y}}}_{t-1}^2 +\frac{2\mu }{1-\rho }\frac{1}{nl(b_n)} \sum _{t=1}^n \bar{y}_{t-1}+\frac{1}{l(b_n)}\frac{\mu ^2}{(1-\rho )^2}. \end{aligned}$$

Using these, the rest of the proof of this lemma follows directly from the law of large numbers.

Proof of Theorem 1

For the least squares estimator, it is easy to check that

$$\begin{aligned} \begin{pmatrix} {\hat{\mu }}-\mu \\ {\hat{\rho }}-\rho \end{pmatrix} = \begin{pmatrix} \frac{\sum \limits _{t=1}^ny_{t-1}^2\sum \limits _{s=1}^ne_s -\sum \limits _{s=1}^ny_{s-1} \sum \limits _{t=1}^ne_ty_{t-1}}{n\sum \limits _{t=1}^ny_{t-1}^2-\left( \sum \limits _{t=1}^ny_{t-1}\right) ^2}\\ \frac{n\sum \limits _{t=1}^ny_{t-1}e_t-\sum \limits _{t=1}^ny_{t-1}\sum \limits _{s=1}^ne_s}{n\sum \limits _{t=1}^ny_{t-1}^2 -\left( \sum \limits _{t=1}^ny_{t-1}\right) ^2} \end{pmatrix}. \end{aligned}$$

For convenience, hereafter write

$$\begin{aligned} \Delta _1= & {} \sum _{t=1}^ny_{t-1}^2\sum _{s=1}^ne_s -\sum _{s=1}^ny_{s-1} \sum _{t=1}^ny_{t-1}e_t,\\ \Delta _2= & {} n\sum _{t=1}^ny_{t-1}e_t-\sum _{t=1}^ny_{t-1}\sum _{s=1}^ne_s,\\ \Delta _3= & {} n\sum _{t=1}^ny_{t-1}^2-\left( \sum _{t=1}^ny_{t-1}\right) ^2. \end{aligned}$$

Observe that

$$\begin{aligned} \Delta _1= & {} \sum _{t=1}^ny_{t-1}^2\sum _{s=1}^n e_s -\sum _{s=1}^ny_{s-1} \sum _{t=1}^ny_{t-1} e_t\\= & {} \left( \sum _{t=1}^n{\bar{y}}_{t-1}^2 + \frac{\mu }{1-\rho }\sum _{t=1}^n{{\bar{y}}}_{t-1}\right) \sum _{s=1}^n e_s-\sum _{s=1}^ny_{s-1}\sum _{t=1}^n{{\bar{y}}}_{t-1} e_t. \end{aligned}$$

Hence, by Lemma 1 we have, as \(n\rightarrow \infty \),

$$\begin{aligned} \frac{1}{(nl(b_n))^{3/2}} \Delta _1 \overset{d}{\longrightarrow } \frac{1}{1-\rho ^2} W_1 - \frac{\mu }{\sigma ^2(1-\rho )}W_2 I\left( \lim \limits _{m\rightarrow \infty } l(b_m) = \sigma ^2\right) . \end{aligned}$$

Next, relying on

$$\begin{aligned} \Delta _2= & {} n\sum _{t=1}^ny_{t-1}e_t-\sum _{t=1}^ny_{t-1}\sum _{s=1}^ne_s\\= & {} n\sum _{t=1}^n{{\bar{y}}}_{t-1}e_t-\sum _{t=1}^n{{\bar{y}}}_{t-1}\sum _{s=1}^ne_s\\= & {} n\left( \sum _{t=1}^n{{\bar{y}}}_{t-1}e_t\right) \{1+o_p(1)\}, \end{aligned}$$

we obtain

$$\begin{aligned} \frac{1}{n^{3/2}l(b_n)} \Delta _2 \overset{d}{\longrightarrow }\ W_2. \end{aligned}$$

In a similar fashion, we have

$$\begin{aligned} \frac{1}{n^2l(b_n)}\Delta _3 = \frac{1}{n l(b_n)} \sum _{t=1}^n y_{t-1}^2 - \frac{1}{l(b_n)} \left( \frac{1}{n}\sum _{t=1}^ny_{t-1}\right) ^2 \overset{p}{\longrightarrow }\ \frac{1}{1 - \rho ^2}. \end{aligned}$$

Then this theorem follows immediately by using the continuous mapping theorem.

Proof of Lemma 2

For the first part, following arguments similar to those in the proof of Lemma 1, we can show that

$$\begin{aligned} \begin{pmatrix} \frac{1}{\sqrt{nl(b_n)}}\sum \limits _{t=1}^n e_t\\ \frac{1}{\sqrt{l(b_n)}}\sum \limits _{t=1}^n\rho ^{-(n-t)}e_t\\ \frac{\rho }{\sqrt{l(b_n)}}\sum \limits _{t=1}^{n-1}\rho ^{-t}e_t+\rho y_0 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{nl(b_n)}}\sum \limits _{t=1}^n e_t^{(1)}\\ \frac{1}{\sqrt{l(b_n)}}\sum \limits _{t=1}^n\rho ^{-(n-t)}e_t^{(1)}\\ \frac{\rho }{\sqrt{l(b_n)}}\sum \limits _{t=1}^{n-1}\rho ^{-t}e_t^{(1)}+\rho y_0 \end{pmatrix} + o_p(1). \end{aligned}$$

The rest of the proof is similar to [3] and [21]; we omit the details.

For the second part, we only prove the case of \(\lim \limits _{m\rightarrow \infty }l(b_m)=\infty \). Let \({\tilde{y}}_t^{(k)}=\sum \limits _{i=1}^t \rho ^{t-i}e_i^{(k)}+\rho ^t y_0\), \(k=1,2\), \(t = 1, 2, \cdots , n\). Similar to Lemma 1, we have

$$\begin{aligned} \rho ^{-n}\{l(b_n)\}^{-1}\sum _{t=1}^n E|{{\tilde{y}}}_{t-1}^{(k)}e_t^{(j)}| \overset{p}{\longrightarrow }\ 0,\\ \rho ^{-2n}\{l(b_n)\}^{-1}\sum _{t=1}^n E|{{\tilde{y}}}_{t-1}^{(k)}{{\tilde{y}}}_{t-1}^{(j)}| \overset{p}{\longrightarrow }\ 0, \end{aligned}$$

for \((k,j) \in \{(1, 2), (2, 1), (2, 2)\}\), as \(n\rightarrow \infty \), and in turn, we can obtain that

$$\begin{aligned} \begin{pmatrix} \rho ^{-(n-2)}\{l(b_n)\}^{-1}\sum \limits _{t=1}^n {{\tilde{y}}}_{t-1}e_t\\ (\rho ^2-1)\rho ^{-2(n-1)}\{l(b_n)\}^{-1}\sum \limits _{t=1}^n {{\tilde{y}}}_{t-1}^2 \end{pmatrix} = \begin{pmatrix} \rho ^{-(n-2)}\{l(b_n)\}^{-1}\sum \limits _{t=1}^n {{\tilde{y}}}_{t-1}^{(1)}e_t^{(1)}\\ (\rho ^2-1)\rho ^{-2(n-1)}\{l(b_n)\}^{-1}\sum \limits _{t=1}^n ({{\tilde{y}}}_{t-1}^{(1)})^2 \end{pmatrix} + o_p(1). \end{aligned}$$

The rest of the proof is similar to that of the first part; we omit the details.

Proof of Theorem 2

Using the same arguments as in [21], it follows from Lemma 2 that

$$\begin{aligned} {\left\{ \begin{array}{ll} \rho ^{-(n-1)}\{l(b_n)\}^{-1/2} y_n \overset{d}{\longrightarrow }\ U_2+\frac{\mu \rho }{\rho -1},\\ \rho ^{-(n-2)}\{l(b_n)\}^{-1}\sum \limits _{t=1}^n y_{t-1}e_t \overset{d}{\longrightarrow }\ U_1(U_2+\frac{\mu \rho }{\rho -1}),\\ (\rho -1)\rho ^{-(n-1)}\{l(b_n)\}^{-1/2}\sum \limits _{t=1}^n y_{t-1} \overset{d}{\longrightarrow }\ U_2+\frac{\mu \rho }{\rho -1},\\ (\rho ^2-1)\rho ^{-2(n-1)}\{l(b_n)\}^{-1}\sum \limits _{t=1}^n y_{t-1}^2 \overset{d}{\longrightarrow }\ (U_2+\frac{\mu \rho }{\rho -1})^2. \end{array}\right. } \end{aligned}$$

Then as \(n\rightarrow \infty \), we have

$$\begin{aligned} \frac{1}{n^{1/2}\rho ^{2n}\{l(b_n)\}^{3/2}}\Delta _1= & {} \rho ^{-2n}\{l(b_n)\}^{-1}\sum _{t=1}^n y_{t-1}^2\times \frac{1}{\sqrt{nl(b_n)}}\sum _{t=1}^n e_t+o_p(1)\\\overset{d}{\longrightarrow } & {} \frac{1}{\rho ^2(\rho ^2-1)} W_1 \left( U_2+\frac{\mu \rho }{\rho -1}\right) ^2\\ \frac{1}{n\rho ^{n}l(b_n)}\Delta _2= & {} \rho ^{-n}\{l(b_n)\}^{-1}\sum _{t=1}^n y_{t-1}e_t+o_p(1)\\\overset{d}{\longrightarrow } & {} \frac{1}{\rho ^2}U_1 \left( U_2+\frac{\mu \rho }{\rho -1}\right) . \end{aligned}$$

and

$$\begin{aligned} \frac{1}{n\rho ^{2n}l(b_n)}\Delta _3= & {} \rho ^{-2n}\{l(b_n)\}^{-1}\sum _{t=1}^n y_{t-1}^2+o_p(1)\\\overset{d}{\longrightarrow } & {} \frac{1}{\rho ^2(\rho ^2-1)}\left( U_2+\frac{\mu \rho }{\rho -1}\right) ^2. \end{aligned}$$

The theorem is thus proved.

Proof of Theorem 3

Similar to the proof of Theorem 1, by Lemma 3, we have, as \(n \rightarrow \infty \),

$$\begin{aligned} \Delta _1= & {} \sum _{t=1}^ny_{t-1}^2\sum _{t=1}^ne_t -\sum _{t=1}^ny_{t-1} \sum _{t=1}^ny_{t-1}e_t\\= & {} \mu ^2 \left\{ \sum _{t=1}^n \left( \sum _{j=0}^{t-2}\rho ^j\right) ^2\sum _{s=1}^ne_s - \left( \sum _{s=1}^n \sum _{j=0}^{s-2}\rho ^j\right) \left( \sum _{t=1}^n\sum _{j=0}^{t-2}\rho ^je_t \right) \right\} \{1+o_p(1)\}, \end{aligned}$$

which implies

$$\begin{aligned} \frac{1}{\sqrt{n^7 l(b_n)}} \Delta _1 \overset{d}{\longrightarrow } \mu ^2\left( {{\bar{W}}}(1)\int _0^1 J_c^2(s)\,ds-\int _0^1 J_c(s)\,ds \int _0^1 J_{c}(s)\, d{{\bar{W}}}(s)\right) , \end{aligned}$$

and

$$\begin{aligned} \Delta _2= & {} n\sum _{t=1}^ny_{t-1}e_t-\sum _{t=1}^ny_{t-1}\sum _{t=1}^ne_t\\= & {} \mu \left\{ n\left( \sum _{t=1}^n\sum _{j=0}^{t-2}\rho ^je_t\right) - \left( \sum _{t=1}^n \sum _{j=0}^{t-2}\rho ^j\right) \sum _{t=1}^ne_t \right\} \{1+o_p(1)\}, \end{aligned}$$

which leads to

$$\begin{aligned} \frac{1}{\sqrt{n^5 l(b_n)}} \Delta _2 \overset{d}{\longrightarrow } \mu \left( \int _0^1 J_{c}(s)\, d{{\bar{W}}}(s) - {{\bar{W}}}(1)\int _0^1 J_c(s)\,ds\right) , \end{aligned}$$

and

$$\begin{aligned} \Delta _3= & {} n\sum _{t=1}^ny_{t-1}^2- \left( \sum _{t=1}^ny_{t-1}\right) ^2\\= & {} \mu ^2 n\sum _{t=1}^n \left( \sum _{j=0}^{t-2}\rho ^j \right) ^2-\mu ^2 \left( \sum _{t=1}^n \sum _{j=0}^{t-2}\rho ^j \right) ^2+o_p(n^4), \end{aligned}$$

which results in

$$\begin{aligned} \frac{1}{n^4} \Delta _3 \rightarrow \mu ^2\left( \int _0^1 J_c^2(s)\,ds -\left( \int _0^1 J_c(s)\,ds \right) ^2\right) ,\quad n\rightarrow \infty . \end{aligned}$$

The theorem is thus proved.

Proof of Lemma 4

(i) Similar to Lemma 1, under P5, by the Markov inequality and the fact \(nl(b_n)\le b_n^2\), we have for any \(\varepsilon > 0\)

$$\begin{aligned} P\left( \left| \sum _{t=1}^n \rho ^{n-t}e_t^{(2)}\right| \ge \sqrt{n^\alpha l(b_n)}\varepsilon \right)\le & {} \frac{\sum _{t=1}^n \rho ^{t-1}E|e_t^{(2)}|}{\sqrt{n^\alpha l(b_n)}\varepsilon }\\\le & {} 2\frac{\sum _{t=1}^n\rho ^{t-1}E[|e_t|I(|e_t|>b_n)]}{\sqrt{n^\alpha l(b_n)}\varepsilon }\\= & {} o\left( \frac{l(b_n)}{b_n}\right) \frac{1 - \rho ^{n}}{1-\rho }\frac{1}{\sqrt{n^\alpha l(b_n)}\varepsilon }\\= & {} o\left( \frac{\sqrt{n^\alpha l(b_n)}}{b_n}\right) \rightarrow 0, ~ \text { as } n \rightarrow \infty . \end{aligned}$$

This implies that

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _{t=1}^n\frac{e_t^{(2)}}{\sqrt{l(b_n)}} \overset{p}{\longrightarrow }0~\text {and} ~\frac{1}{\sqrt{n^\alpha }}\sum _{t=1}^n\frac{\rho ^{t-1}e_t^{(2)}}{\sqrt{l(b_n)}} \overset{p}{\longrightarrow }0 ~ \text { as } n \rightarrow \infty . \end{aligned}$$

Similarly, we can show that

$$\begin{aligned} \frac{1}{\sqrt{n^\alpha }}\sum _{t=1}^n\frac{\rho ^{n-t}e_t^{(2)}}{\sqrt{l(b_n)}} \overset{p}{\longrightarrow }0,~ n\rightarrow \infty . \end{aligned}$$

Next, if one of i or j equals 2, it follows from Lemma A.2 of [14] that

$$\begin{aligned} \frac{1}{\sqrt{n^{1+\alpha }}}\sum _{t=2}^n \left\{ \sum _{k=1}^{t-1}\frac{\rho ^{t-1-k}e_k^{(i)}}{\sqrt{l(b_n)}} \right\} \frac{e_t^{(j)}}{\sqrt{l(b_n)}} \overset{p}{\longrightarrow }0, \quad n\rightarrow \infty . \end{aligned}$$

We thus obtain

$$\begin{aligned} \begin{pmatrix} \frac{1}{\sqrt{n}}\sum \limits _{t=1}^n\frac{e_t}{\sqrt{l(b_n)}}\\ \frac{1}{\sqrt{n^\alpha }}\sum \limits _{t=1}^n\frac{\rho ^{t-1}e_t}{\sqrt{l(b_n)}}\\ \frac{1}{\sqrt{n^\alpha }}\sum \limits _{t=1}^n\frac{\rho ^{n-t}e_t}{\sqrt{l(b_n)}}\\ \frac{1}{\sqrt{n^{1+\alpha }}}\sum \limits _{t=2}^n \frac{e_t}{\sqrt{l(b_n)}} \sum \limits _{j=1}^{t-1} \frac{\rho ^{t-1-j}e_j}{\sqrt{l(b_n)}} \end{pmatrix} ~=~ \begin{pmatrix} \frac{1}{\sqrt{n}}\sum \limits _{t=1}^n\frac{e_t^{(1)}}{\sqrt{l(b_n)}}\\ \frac{1}{\sqrt{n^\alpha }}\sum \limits _{t=1}^n\frac{\rho ^{t-1}e_t^{(1)}}{\sqrt{l(b_n)}}\\ \frac{1}{\sqrt{n^\alpha }}\sum \limits _{t=1}^n\frac{\rho ^{n-t}e_t^{(1)}}{\sqrt{l(b_n)}}\\ \frac{1}{\sqrt{n^{1+\alpha }}}\sum \limits _{t=2}^n \frac{e_t^{(1)}}{\sqrt{l(b_n)}} \sum \limits _{j=1}^{t-1} \frac{\rho ^{t-1-j}e_j^{(1)}}{\sqrt{l(b_n)}} \end{pmatrix} + o_p(1). \end{aligned}$$
(9)

The limit of the first vector on the right-hand side of (9) then follows from the Cramér-Wold device and the central limit theorem for martingale difference sequences, where Lindeberg’s condition can be verified by the same arguments as in [19] and [14]. We omit the details.

(ii) The proof for the case P6 is similar to that of (i) and to [19], and is thus omitted.

Proof of Theorem 4

(i) Under P5, observe that \(\rho ^n = o(n^{-\alpha })\), \(y_0 = o(n^{\alpha /2})\), and

$$\begin{aligned} y_t= & {} \mu + \rho y_{t-1} + e_t\\= & {} \frac{\mu }{1 - \rho } + \left( \frac{\mu }{c} n^\alpha +y_0 \right) \rho ^t+\sum _{i=1}^t\rho ^{t-i}e_i, \end{aligned}$$

which implies

$$\begin{aligned} y_n= & {} \left( \frac{\mu }{1 - \rho } + \sum _{i=1}^n\rho ^{n-i}e_i\right) \{1 +o_p(1)\}. \end{aligned}$$

It is easy to verify that

$$\begin{aligned} \sum _{t=1}^n y_{t-1}= & {} \frac{1}{1 - \rho } \Big (n\mu - y_n + \sum _{t=1}^n e_t\Big ),\\ \sum _{t=1}^n y_{t-1}e_t= & {} \left\{ \sum _{t=1}^n e_t \left( \frac{\mu }{1 - \rho } + \sum _{i=1}^{t-1}\rho ^{t-1-i}e_i\right) \right\} \{1 + o_p(1)\}, \\ \sum _{t=1}^n y_{t-1}^2= & {} \frac{1}{1 - \rho ^2} \Big (n\mu ^2 - y_n^2 + \sum _{t=1}^n e_t^2 + 2\mu \rho \sum _{t=1}^n y_{t-1} + 2\mu \sum _{t=1}^n e_t + 2 \rho \sum _{t=1}^n y_{t-1}e_t\Big )\\= & {} \frac{1}{1 - \rho ^2} \Big (n\mu ^2 - y_n^2 + \sum _{t=1}^n e_t^2 + 2\mu \rho \sum _{t=1}^n y_{t-1}\Big )\{1 + o_p(1)\}. \end{aligned}$$

Hence,

$$\begin{aligned} \Delta _1= & {} \sum \limits _{t = 1}^n y_{t-1}^2 \cdot \sum \limits _{t = 1}^n e_{t} -\sum \limits _{t = 1}^n y_{t-1} \cdot \sum \limits _{t = 1}^n y_{t-1} e_t\\= & {} \left\{ \frac{1}{1 - \rho ^2} \Big (n\mu ^2 - y_n^2 + \sum _{t=1}^n e_t^2 + 2\mu \rho \sum _{t=1}^n y_{t-1} + 2\mu \sum _{t=1}^n e_t + 2 \rho \sum _{t=1}^n y_{t-1}e_t\Big ) \cdot \sum \limits _{t = 1}^n e_{t} \right. \\{} & {} \quad \left. - \frac{1}{1 - \rho } \Big (n\mu - y_n + \sum _{t=1}^n e_t\Big ) \cdot \sum \limits _{t = 1}^n y_{t-1} e_t \right\} \{1 + o_p(1)\}\\= & {} \frac{1}{1 - \rho } \Big \{\Big (\frac{n\mu ^2 - y_n^2 + \sum _{t=1}^n e_t^2}{2} + \mu \rho \sum _{t=1}^n y_{t-1} + \mu \sum _{t=1}^n e_t\Big ) \cdot \sum \limits _{t = 1}^n e_{t} - \Big (n\mu - y_n\Big ) \\{} & {} \quad \times \Big [\frac{\mu }{1 - \rho } \sum _{t=1}^n e_t - \frac{\mu }{1 - \rho } \sum _{t=1}^n \rho ^{t-1}e_t + \sum _{t=1}^n \Big (e_t \sum _{j=0}^{t-1} \rho ^{t-1-j}e_j\Big )\Big ]\Big \}\{1 + o_p(1)\}\\= & {} \frac{n\mu }{1 - \rho } \Big \{\frac{\mu }{1 - \rho } \sum _{t=1}^n \rho ^{t-1}e_t - \sum _{t=1}^n \Big (e_t \sum _{j=0}^{t-1} \rho ^{t-1-j}e_j\Big )\Big \}\{1 + o_p(1)\}\\= & {} \left\{ n^{1+5\alpha /2}\sqrt{l(b_n)} \frac{\mu ^2}{c^2} \left( \frac{1}{\sqrt{n^\alpha }} \sum _{t=1}^n \rho ^{t-1} \frac{e_t}{\sqrt{l(b_n)}}\right) \right. \\{} & {} \left. + n^{3(1+\alpha )/2}l(b_n) \frac{\mu }{c} \left( \frac{1}{\sqrt{n^{1 + \alpha }}} \sum _{t=1}^n \frac{e_t}{\sqrt{l(b_n)}} \sum _{j=0}^{t-1} \rho ^{t-1-j}\frac{e_j}{\sqrt{l(b_n)}}\right) \right\} \{1 + o_p(1)\}, \end{aligned}$$

and

$$\begin{aligned} \Delta _2= & {} -\sum \limits _{t = 1}^n y_{t-1} \cdot \sum \limits _{t = 1}^n e_{t} + n \sum \limits _{t = 1}^n y_{t-1} e_t\\= & {} -\frac{1}{1 - \rho } \Big (n\mu - y_n + \sum _{t=1}^n e_t\Big ) \cdot \sum \limits _{t = 1}^n e_{t} + n \Big [\frac{\mu }{1 - \rho } \sum _{t=1}^n e_t\\{} & {} - \frac{\mu }{1 - \rho } \sum _{t=1}^n \rho ^{t-1}e_t + \sum _{t=1}^n \Big (e_t \sum _{j=0}^{t-1} \rho ^{t-1-j}e_j\Big )\Big ]\\= & {} \left\{ n^{1+3\alpha /2} \sqrt{l(b_n)} \cdot \frac{\mu }{c} \cdot \left( \frac{1}{\sqrt{n^{\alpha }}} \sum _{t=1}^n \rho ^{t-1} \frac{e_t}{\sqrt{l(b_n)}} \right) \right. \\{} & {} + \left. n^{3/2+\alpha /2} l(b_n) \left( \frac{1}{\sqrt{n^{1+\alpha }}} \sum _{t=1}^n\frac{e_t}{\sqrt{l(b_n)}} \sum _{j=0}^{t-1} \rho ^{t-1-j}\frac{e_j}{\sqrt{l(b_n)}}\right) \right\} \{1 + o_p(1)\}, \end{aligned}$$

and

$$\begin{aligned} \Delta _3= & {} n\sum \limits _{t=1}^n y_{t-1}^2 - \Big (\sum \limits _{t=1}^n y_{t-1}\Big )^2\\= & {} \frac{n}{1 - \rho }\Big \{\frac{1}{1+\rho } \Big [\frac{2 n\mu ^2 \rho }{1 - \rho } + \Big (n\mu ^2 + \sum _{t=1}^n e_t^2\Big ) - y_n^2 - \frac{2\mu \rho }{1 - \rho } y_n + \frac{4\rho \mu }{1 - \rho } \sum _{t=1}^n e_t\Big ] \\{} & {} - \frac{1}{1 - \rho } \Big [n\mu ^2 - 2 \mu y_n + 2\mu \sum _{t=1}^n e_t\Big ]\Big \}\{1 + o_p(1)\}\\= & {} \frac{n}{1 - \rho }\Big \{\frac{1}{1+\rho } \Big [\sum _{t=1}^n e_t^2 - y_n^2 - \frac{2\mu \rho }{1 - \rho } y_n\Big ] + \frac{2 \mu y_n}{1 - \rho }\Big \}\{1 + o_p(1)\}\\= & {} \left\{ \frac{n^{2+\alpha }l(b_n)}{-2c} \left( \frac{1}{n} \sum _{t=1}^n \frac{e_t^2}{l(b_n)}\right) + n^{1+3\alpha } \frac{\mu ^2}{-2c^3}\right\} \{1 + o_p(1)\}. \end{aligned}$$

These, together with Lemma 4, lead directly to (i).

(ii) Similarly, writing \(V_j\) for \(V_{2j}\), \(j=1,2,3\), for brevity, we have under P6

$$\begin{aligned} y_n= \frac{\mu }{c} n^\alpha \rho ^n+n^{\alpha /2}\rho ^n\sqrt{l(b_n)}V_2 \{1+o_p(1)\}, \end{aligned}$$

as \(n \rightarrow \infty \), which implies that

$$\begin{aligned} y_n^2= \frac{\mu ^2}{ c^2} n^{2\alpha }\rho ^{2n}+2\frac{\mu }{c}n^{3\alpha /2}\rho ^{2n}\sqrt{l(b_n)}V_2\{1+o_p(1)\} \end{aligned}$$

and

$$\begin{aligned} \sum _{t=1}^n y_{t-1}= & {} \frac{1}{c} n^\alpha y_n-\frac{1}{c} n^\alpha y_0-\frac{\mu }{c} n^{\alpha +1}-\frac{1}{c} n^\alpha \sum _{t=1}^n e_t\\= & {} \frac{\mu }{c^2}n^{2\alpha }\rho ^n+\frac{1}{c} n^{3\alpha /2}\rho ^n\sqrt{l(b_n)}V_2\{1+o_p(1)\}. \end{aligned}$$

Using a similar technique, we obtain, as \(n \rightarrow \infty \),

$$\begin{aligned} \sum _{t=1}^ny_{t-1}e_t= & {} \sum _{t=1}^n \left( -\frac{\mu }{c} n^\alpha +\frac{\mu }{c} n^\alpha \rho ^{t-1}+y_0\rho ^{t-1}+\sum _{i=1}^{t-1}\rho ^{t-1-i}e_i\right) e_t\\= & {} -\frac{\mu }{c}n^\alpha \sum _{t=1}^ne_t+\frac{\mu }{c} n^\alpha \sum _{t=1}^n\rho ^{t-1}e_t+y_0\sum _{t=1}^n\rho ^{t-1}e_t +\sum _{t=1}^n\left( \sum _{i=1}^{t-1}\rho ^{t-1-i}e_i\right) e_t\\= & {} \frac{\mu }{c}n^{3\alpha /2}\rho ^n\sqrt{l(b_n)}V_3\{1+o_p(1)\}, \end{aligned}$$

and

$$\begin{aligned} \sum _{t=1}^ny_{t-1}^2=\frac{\mu ^2}{2c^3}n^{3\alpha }\rho ^{2n}+\frac{\mu }{c^2}n^{5\alpha /2}\rho ^{2n}\sqrt{l(b_n)}V_2 \{1+o_p(1)\}. \end{aligned}$$

Then as \(n\rightarrow \infty \), we have

$$\begin{aligned} \Delta _1= & {} \frac{\mu ^2}{2c^3}n^{3\alpha +1/2}\rho ^{2n}\sqrt{l(b_n)}V_1\{1+o_p(1)\},\\ \Delta _2= & {} \frac{\mu }{c}n^{3\alpha /2+1}\rho ^n\sqrt{l(b_n)}V_3 \{1+o_p(1)\},\\ \Delta _3= & {} \frac{\mu ^2}{2c^3}n^{3\alpha +1}\rho ^{2n}\{1+o_p(1)\} \end{aligned}$$

Therefore, the result holds.

4 Concluding remarks

In this paper, we investigated the limit distribution of the least squares estimator of \((\mu , \rho )\) in the first-order autoregressive model with \(\mu \ne 0\). The analysis was carried out under the assumption that the error variance may be infinite, in which case the existing results fail to hold. Our results show that a possibly infinite variance affects the convergence rate of the estimator of the intercept in all cases, but that of the correlation coefficient only in some cases; see Sections 2.3 and 2.4 for details. Based on the current results, one could build testing procedures, e.g., t-statistics. However, their limit distributions may be quite complex because the least squares estimator has a different limit distribution in each case, and is even degenerate in the moderate deviations from a unit root case. Hence, it would be interesting to construct uniform statistical inference procedures, e.g., a confidence region for \((\mu , \rho )^\top \), that are robust to all of the cases above. Nevertheless, this topic is beyond the scope of the current paper and will be pursued in the future.