Economic and financial time series have frequently been successfully modelled by autoregressive moving-average (ARMA) schemes of the type

$$ a(L){y}_t=b(L){\varepsilon}_t, $$
(1)

where εt is an orthogonal sequence (that is, E(εt) = 0, E(εtεs) = 0 for all t ≠ s), L is the backshift operator for which Lyt = yt−1 and a(L), b(L) are finite-order lag polynomials

$$ a(L)=\sum_{i=0}^p{a}_i{L}^i, \ b(L)=\sum_{j=0}^q{b}_j{L}^j, $$

whose leading coefficients are a0 = b0 = 1. Parsimonious schemes (often with p + q ≤ 3) are usually selected in practice either by informal ‘model identification’ processes such as those described in the text by Box and Jenkins (1976) or by more formal order-selection criteria which penalize choices of large p and/or q. Model (1) is assumed to be irreducible, so that a(L) and b(L) have no common factors. The model (1) and the time series yt are said to have an autoregressive unit root if a(L) factors as (1 − L)a1(L) and a moving-average unit root if b(L) factors as (1 − L)b1(L).

Since the early 1980s, much attention has been focused on models with autoregressive unit roots. In part, this interest is motivated by theoretical considerations such as the importance of martingale models of efficient markets in finance and the dynamic consumption behaviour of representative economic agents in macroeconomics; and, in part, the attention is driven by empirical applications, which have confirmed the importance of random walk phenomena in practical work in economics, in finance, in marketing and business, in social sciences like political studies and communications, and in certain natural sciences. In mathematics and theoretical probability and statistics, unit roots have also attracted attention because they offer new and important applications of functional limit laws and weak convergence to stochastic integrals. The unit root field has therefore drawn in participants from an excitingly wide range of disciplines.

If (1) has an autoregressive unit root, then we may write the model in difference form as

$$ \Delta {y}_t={u}_t={a}_1{(L)}^{-1}b(L){\varepsilon}_t, $$
(2)

where the polynomial a1(L) has all its zeros outside the unit circle. This formulation suggests more general nonparametric models where, for instance, ut may be formulated in linear process (or Wold representation) form as

$$ {u}_t=c(L){\varepsilon}_t=\sum_{j=0}^{\infty }{c}_j{\varepsilon}_{t-j}, \ \mathrm{with} \ \sum_{j=0}^{\infty }{c}_j^2<\infty, $$
(3)

or as a general stationary process with spectrum fu(λ). If we solve (2) with an initial state y0 at t = 0, we have the important partial sum representation

$$ {y}_t=\sum_{j=1}^t{u}_j+{y}_0={S}_t+{y}_0, $$
(4)

showing that St and hence yt are ‘accumulated’ or ‘integrated’ processes proceeding from a certain initialization y0. A time series yt that satisfies (2) or (4) is therefore said to be integrated of order one (or a unit root process or an I(1) process) provided fu(0) > 0. The latter condition rules out the possibility of a moving-average unit root in the model for ut that would cancel the effect of the autoregressive unit root (for example, if b(L) = (1 − L)b1(L) then model (2) is Δyt = Δa1(L)−1b1(L)εt or, after cancellation, just yt = a1(L)−1b1(L)εt, which is not I(1)). Note that this possibility is also explicitly ruled out in the ARMA case by the requirement that a(L) and b(L) have no common factors. Alternatively, we may require that ut ≠ Δvt for some weakly stationary time series vt, as in Leeb and Pötscher (2001), who provide a systematic discussion of I(1) behaviour. The partial sum process St in (4) is often described as a stochastic trend.
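
For concreteness, the following minimal Python sketch (all parameter values illustrative) generates an I(1) series via the partial sum representation (4), with linear-process errors of the form (3):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Linear-process errors u_t = c(L) eps_t with c_j = 0.5**j, so that
# C(1) = sum_j c_j = 2 and f_u(0) = C(1)^2 sigma^2 / (2 pi) > 0: y_t is I(1).
eps = rng.standard_normal(n)
c = 0.5 ** np.arange(50)            # truncate the infinite MA at 50 lags
u = np.convolve(eps, c)[:n]         # u_t = sum_j c_j eps_{t-j}

y0 = 0.0
y = y0 + np.cumsum(u)               # the partial sum representation (4)
```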

The representation (4) is especially important because it shows that the effect of the random shocks uj on yt does not die out as the time distance between j and t grows large. The shocks uj then have a persistent effect on yt in this model, in contrast to stationary systems. Whether actual economic time series have this characteristic or not is, of course, an empirical issue. The question can be addressed through statistical tests for the presence of a unit root in the series, a subject which has grown to be of major importance since the mid-1980s and which will be discussed later in this article. From the perspective of economic modelling the issue of persistence is also important because, if macroeconomic variables like real GNP have a unit root, then shocks to real GNP have permanent effects, whereas in traditional business cycle theory the effect of shocks on real GNP is usually considered to be only temporary. In more recent real business cycle theory, variables like real GNP are modelled in such a way that over the long run their paths are determined by supply side shocks that can be ascribed to technological and demographic forces from outside the model. Such economic models are more compatible with the statistical model (4) or close approximations to it in which the roots are local to unity in a sense that is described later in this essay.

Permanent and transitory effects in (4) can be distinguished by decomposing the process ut in (3) as follows

$$ {u}_t=\left\{C(1)+\left(L-1\right)\tilde{C}(L)\right\}{\varepsilon}_t=C(1){\varepsilon}_t+{\tilde{\varepsilon}}_{t-1}-{\tilde{\varepsilon}}_t, $$
(5)

where \( {\tilde{\varepsilon}}_t=\tilde{C}(L){\varepsilon}_t \), \( \tilde{C}(L)={\sum}_{j=0}^{\infty }{\tilde{c}}_j{L}^j \) and \( {\tilde{c}}_j={\sum}_{s=j+1}^{\infty }{c}_s \). The decomposition (5) is valid algebraically if

$$ \sum_{j=0}^{\infty }{j}^{1/2}\mid {c}_j\mid <\infty, $$
(6)

as shown in Phillips and Solo (1992), where validity conditions are systematically explored. Equation (5) is sometimes called the Beveridge–Nelson (1981) or BN decomposition of ut, although both specialized and more general versions of it were known and used beforehand. The properties of the decomposition were formally investigated and used for the development of laws of large numbers and central limit theory and invariance principles in the paper by Phillips and Solo (1992). When the decomposition is applied to (4) it yields the representation

$$ {y}_t=C(1)\sum_1^t{\varepsilon}_j+{\tilde{\varepsilon}}_0-{\tilde{\varepsilon}}_t+{y}_0=C(1)\sum_1^t{\varepsilon}_j+{\xi}_t+{y}_0, \ \mathrm{say}, $$
(7)

where \( {\xi}_t={\tilde{\varepsilon}}_0-{\tilde{\varepsilon}}_t \). The right side of (7) decomposes yt into three components: the first is a martingale component, \( {Y}_t=C(1){\sum}_1^t{\varepsilon}_j \), where the effects of the shocks εj are permanent; the second is a stationary component, where the effects of shocks are transitory, viz. \( {\xi}_t={\tilde{\varepsilon}}_0-{\tilde{\varepsilon}}_t \), since the process \( {\tilde{\varepsilon}}_t \) is stationary with valid Wold representation \( {\tilde{\varepsilon}}_t=\tilde{C}(L){\varepsilon}_t \) under (6) when εt is stationary with variance σ2; and the third is the initial condition y0. The relative strength of the martingale component is measured by the magnitude of the (infinite dimensional) coefficient \( C(1)={\sum}_{j=0}^{\infty }{c}_j \), which plays a large role in the measurement of long-run effects in applications. Accordingly, the decomposition (7) is sometimes called the martingale decomposition (cf. Hall and Heyde 1980), where it was used in various forms in the probability literature prior to its use in economics.
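
The algebra of (5) can be verified numerically. The following sketch (illustrative coefficients cj = 0.5^j, truncated at J lags, for which condition (6) plainly holds) confirms that ut = C(1)εt + ε̃t−1 − ε̃t:

```python
import numpy as np

rng = np.random.default_rng(1)
J = 60                                    # truncation of the infinite MA
c = 0.5 ** np.arange(J)                   # c_j; C(1) = sum_j c_j ≈ 2
C1 = c.sum()
ctil = C1 - np.cumsum(c)                  # c~_j = sum_{s=j+1}^inf c_s

eps = rng.standard_normal(300)

def filt(w, e, t):                        # (w(L) e)_t = sum_j w_j e_{t-j}
    return sum(w[j] * e[t - j] for j in range(len(w)))

t = 150
u_t = filt(c, eps, t)                                     # u_t = c(L) eps_t
bn  = C1 * eps[t] + filt(ctil, eps, t - 1) - filt(ctil, eps, t)
print(u_t, bn)                            # identical up to rounding error
```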

The leading martingale term \( {Y}_t=C(1){\sum}_{s=1}^t{\varepsilon}_s \) in (7) is a partial sum process or stochastic trend and, under weak conditions on εt (see Phillips and Solo 1992, for details) this term satisfies a functional central limit theorem whereby the scaled process

$$ {n}^{-1/2}{Y}_{\left[ nr\right]}\Rightarrow B(r), $$
(8)

where B is a Brownian motion with variance ω2 = C(1)2σ2 = 2πfu(0), a parameter which is called the long-run variance of ut, and [ · ] signifies the integer part of its argument. Correspondingly, for the observed series yt,

$$ {n}^{-1/2}{y}_{\left[ nr\right]}\Rightarrow B(r), $$
(9)

provided \( {y}_0={o}_p\left(\sqrt{n}\right) \). A related result of great significance is based on the limit

$$ {n}^{-1}\sum_{t=1}^{\left[ nr\right]}{Y}_{t-1}C(1){\varepsilon}_t\Rightarrow {\int}_0^r BdB $$
(10)

of the sample covariance of Yt−1 and its forward increment, C(1)εt. The limit process \( M(r)={\int}_0^r BdB \) is represented here as an Itô (stochastic) integral and is a continuous time martingale. The result may be proved directly (Solo 1984; Phillips 1987a; Chan and Wei 1988) or by means of martingale convergence methods (Ibragimov and Phillips 2004) which take advantage of the fact that \( {\sum}_{t=1}^k{Y}_{t-1}{\varepsilon}_t \) is a martingale. The limit theory given in (9) and (10) was extended in Phillips (1987b, 1988a) and Chan and Wei (1987) to cases where the model (2) has an autoregressive root in the vicinity of unity (\( \rho =1+\frac{c}{n} \), for some fixed c) rather than precisely at unity, in which case the limiting process is a linear diffusion (or Ornstein–Uhlenbeck process) with parameter c. This limit theory has proved particularly useful in the analysis of asymptotic local power functions of unit root tests (Phillips 1987b) and the construction of confidence intervals (Stock 1991). Phillips and Magdalinos (2007) considered moderate deviations from unity of the form \( \rho =1+\frac{c}{k} \), where k → ∞ but \( \frac{k}{n}\to 0 \), so that the roots are local to but more distant from unity, showing that central limit laws rather than functional laws apply in this case (see also Giraitis and Phillips 2006). This theory is applicable to mildly explosive processes (where c > 0) and therefore assists in bridging the gap between the limit theory for the stationary, unit root and explosive cases.
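
A simulation makes the limit (10) concrete in the simplest case C(1) = 1, σ2 = 1, where \( {\int}_0^1 BdB=\left(B{(1)}^2-1\right)/2 \). The sketch below (sample size and replication count illustrative) compares the finite-sample covariance with draws from this limit distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 400, 2000

# With C(1) = 1 and sigma^2 = 1, Y_t = sum_{s<=t} eps_s and the sample
# covariance n^{-1} sum_t Y_{t-1} eps_t converges to int_0^1 B dB.
stats = np.empty(reps)
for r in range(reps):
    eps = rng.standard_normal(n)
    Y = np.cumsum(eps)
    stats[r] = (Y[:-1] @ eps[1:]) / n

limit = (rng.standard_normal(reps) ** 2 - 1) / 2   # (B(1)^2 - 1)/2
print(np.quantile(stats, [0.05, 0.5, 0.95]))
print(np.quantile(limit, [0.05, 0.5, 0.95]))       # close to the line above
```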

Both (8) and (10) have important multivariate generalizations that play a critical role in the study of spurious regressions (Phillips 1986) and cointegration limit theory (Phillips and Durlauf 1986; Engle and Granger 1987; Johansen 1988; Phillips 1988a; Park and Phillips 1988, 1989). In particular, if \( {y}_t={\left({y}_{at}^{\prime },{y}_{bt}^{\prime}\right)}^{\prime } \), \( {u}_t={\left({u}_{at}^{\prime },{u}_{bt}^{\prime}\right)}^{\prime } \) and \( {\varepsilon}_t={\left({\varepsilon}_{at}^{\prime },{\varepsilon}_{bt}^{\prime}\right)}^{\prime } \) are vector processes and \( E\left({\varepsilon}_t{\varepsilon}_t^{\prime}\right)=\Sigma \), then: (i) the decomposition (5) continues to hold under (6), where |cj| is interpreted as a matrix norm; (ii) the functional law (8) holds and the limit process is vector Brownian motion \( B={\left({B}_a^{\prime },{B}_b^{\prime}\right)}^{\prime } \) with covariance matrix Ω = C(1)ΣC(1)′; and (iii) sample covariances converge weakly to stochastic processes with drift, as in

$$ {n}^{-1}\sum_{t=1}^{\left[ nr\right]}{Y}_{at-1}{u}_{bt}^{\prime}\Rightarrow {\int}_0^r{B}_a{dB}_b^{\prime }+{\lambda}_{ab}r, $$
(11)

where \( {\lambda}_{ab}={\sum}_{k=1}^{\infty }E\left({u}_{a0}{u}_{bk}^{\prime}\right) \) is a one-sided long-run covariance matrix. The limit process on the right side of (11) is a semimartingale (incorporating a deterministic drift function λabr) rather than a martingale when λab ≠ 0.

The decomposition (7) plays an additional role in the study of cointegration (Engle and Granger 1987). When the coefficient matrix C(1) is singular and β spans the null space of C(1)′, then β′C(1) = 0 and (7) leads directly to the relationship

$$ {\beta}^{\prime }{Y}_t=0, \ \mathrm{a}.\mathrm{s}., $$

which may be interpreted as a long run equilibrium (cointegrating) relationship between the stochastic trends (Yt) of yt. Correspondingly, we have the empirical cointegrating relationship

$$ {\beta}^{\prime }{y}_t={v}_t, $$

among the observed series yt with a residual vt = β′(ξt + y0) that is stationary. The columns of β span what is called the cointegration space.

The above discussion presumes that the initialization y0 has no impact on the limit theory, which will be so if y0 is small relative to the sample size, specifically, if \( {y}_0={o}_p\left(\sqrt{n}\right) \). However, if \( {y}_0={O}_p\left(\sqrt{n}\right) \), for example if \( {y}_0={y}_{0{\theta}_n} \) is indexed to depend on past shocks uj (satisfying a process of the form (3)) to some point in the distant past θn which is measured in terms of the sample size n, then the results can differ substantially. Thus, if θn = [κn], for some fixed parameter κ > 0, then \( {y}_{0{\theta}_n}={\sum}_1^{\left[\kappa n\right]}{u}_{-j}, \) and \( {n}^{-1/2}{y}_{0{\theta}_n}\Rightarrow {B}_0\left(\kappa \right) \), for some Brownian motion B0(κ) with covariance matrix Ω00 given by the long-run variance matrix of uj. Under such an initialization, (9) and (11) are replaced by

$$ {n}^{-1/2}{y}_{\left[ nr\right]}\Rightarrow B(r)+{B}_0\left(\kappa \right)=: B\left(r,\kappa \right), \ \mathrm{say} $$
(12)

and

$$ {n}^{-1}\sum_{t=1}^{\left[ nr\right]}{y}_{at-1}{u}_{bt}^{\prime}\Rightarrow {\int}_0^r{B}_a\left(s,\kappa \right){dB}_b{(s)}^{\prime }+{\lambda}_{ab}r, $$

so that initializations play a role in the limit theory. This role becomes dominant when κ becomes very large, as is apparent from (12). The effect of initial conditions on unit root limit theory was examined in simulations by Evans and Savin (1981, 1984), via continuous record asymptotics in Phillips (1987a), in the context of power analysis by Müller and Elliott (2003), for models with moderate deviations from unity by Andrews and Guggenberger (2006), and for cases of large κ by Phillips (2006).

Model (4) is of special interest to economists working in finance because its output, yt, behaves as if it has no fixed mean, and this is a characteristic of many financial time series. If the components uj are independent and identically distributed (i.i.d.) then yt is a random walk. More generally, if uj is a martingale difference sequence (mds), that is, orthogonal to its own past history so that Ej−1(uj) = E(uj| uj−1, uj−2, …, u1) = 0, then yt is a martingale. Martingales are the essential mathematical elements in the development of a theory of fair games and they now play a key role in the mathematical theory of finance, exchange rate determination and securities markets. Duffie (1988) provides a modern treatment of finance that makes extensive use of this theory.

In empirical finance much attention has recently been given to models where the conditional variance \( E\left({u}_j^2|{u}_{j-1},{u}_{j-2},\dots, {u}_1\right)={\sigma}_j^2 \) is permitted to be time varying. Such models have been found to fit financial data well and many different parametric schemes for \( {\sigma}_j^2 \) have been devised, of which the ARCH (autoregressive conditional heteroskedasticity) and GARCH (generalized ARCH) models are the most common in practical work. These models come within the general class of models like (1) with mds errors. Some models of this kind also allow for the possibility of a unit root in the determining mechanism of the conditional variance \( {\sigma}_j^2 \) and these are called integrated conditional heteroskedasticity models. The IGARCH (integrated GARCH) model of Engle and Bollerslev (1986) is an example, where for certain parameters ω ≥ 0, β ≥ 0, and α > 0, we have the specification

$$ {\sigma}_j^2=\omega +\beta {\sigma}_{j-1}^2+\alpha {u}_{j-1}^2, $$
(13)

with α + β = 1 and uj = σjzj, where the zj are i.i.d. innovations with E(zj) = 0 and \( E\left({z}_j^2\right)=1 \). Under these conditions, the specification (13) has the alternative form

$$ {\sigma}_j^2=\omega +{\sigma}_{j-1}^2+\alpha {\sigma}_{j-1}^2\left({z}_{j-1}^2-1\right), $$
(14)

from which it is apparent that \( {\sigma}_j^2 \) has an autoregressive unit root. Indeed, since

$$ E\left({\sigma}_j^2|{\sigma}_{j-1}^2\right)=\omega +{\sigma}_{j-1}^2, $$

\( {\sigma}_j^2 \) is a martingale when ω = 0. It is also apparent from (14) that shocks as manifested in the deviation \( {z}_{j-1}^2-1 \) are persistent in \( {\sigma}_j^2 \). Thus, \( {\sigma}_j^2 \) shares some of the characteristics of an I(1) integrated process. But in other ways, \( {\sigma}_j^2 \) is very different. For instance, when ω = 0 then \( {\sigma}_j^2\to 0 \) almost surely as j → ∞ and, when ω > 0, \( {\sigma}_j^2 \) is asymptotically equivalent to a strictly stationary and ergodic process. These and other features of models like (13) for conditional variance processes with a unit root are studied in Nelson (1990).
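
These contrasting features are easy to see by simulation. The following sketch (illustrative parameters α = 0.1, β = 0.9, so that α + β = 1) iterates (13) with ω = 0 and with ω > 0:

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha, beta = 2000, 0.1, 0.9            # alpha + beta = 1: IGARCH

def igarch_sigma2(omega, n=n):
    sig2 = np.empty(n)
    sig2[0] = 1.0
    z = rng.standard_normal(n)
    for j in range(1, n):
        u2 = sig2[j - 1] * z[j - 1] ** 2   # u_{j-1}^2 = sigma_{j-1}^2 z_{j-1}^2
        sig2[j] = omega + beta * sig2[j - 1] + alpha * u2
    return sig2

print(igarch_sigma2(omega=0.0)[-3:])   # -> 0 almost surely (Nelson 1990)
print(igarch_sigma2(omega=0.1)[-3:])   # remains positive; ergodic behaviour
```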

In macroeconomic theory also, models such as (2) play a central role in modern treatments. In a highly influential paper, R. Hall (1978) showed that under some general conditions consumption is well modelled as a martingale, so that consumption in the current period is the best predictor of future consumption, thereby providing a macroeconomic version of the efficient markets hypothesis. Much attention has been given to this idea in subsequent empirical work.

One generic class of economic model where unit roots play a special role is the ‘present value model’ of Campbell and Shiller (1988). This model is based on agents’ forecasting behaviour and takes the form of a relationship between one variable Yt and the discounted, present value of rational expectations of future realizations of another variable Xt+i (i = 0, 1, 2, …). More specifically, for some stationary sequence ct (possibly a constant) we have

$$ {Y}_t=\theta \left(1-\delta \right)\sum_{i=0}^{\infty }{\delta}^i{E}_t\left({X}_{t+i}\right)+{c}_t. $$
(15)

When Xt is a martingale, Et(Xt+i) = Xt and (15) becomes

$$ {Y}_t=\theta {X}_t+{c}_t, $$
(16)

so that Yt and Xt are cointegrated in the sense of Engle and Granger (1987). More generally, when Xt is I(1) we have

$$ {Y}_t=\theta {X}_t+{\overline{c}}_t, $$
(17)

where \( {\overline{c}}_t={c}_t+\theta {\sum}_{k=1}^{\infty }{\delta}^k{E}_t\left(\Delta {X}_{t+k}\right) \), so that Yt and Xt are also cointegrated in this general case. Models of this type arise naturally in the study of the term structure of interest rates, stock prices and dividends and linear-quadratic intertemporal optimization problems.

An important feature of these models is that they result in parametric linear cointegrating relations such as (16) and (17). This linearity in the relationship between Yt and Xt accords with the linear nature of the partial sum process that determines Xt itself, as seen in (4), and has been extensively studied since the mid-1980s. However, in more general models, economic variables may be determined in terms of certain nonlinear functions of fundamentals. When these fundamentals are unit root processes like (4), then the resulting model has the form of a nonlinear cointegrating relationship. Such models are relevant, for instance, in studying market interventions by monetary and fiscal authorities (Park and Phillips 2000; Hu and Phillips 2004) and some of the asymptotic theory for analysing parametric models of this type and for statistical inference in such models is given in Park and Phillips (1999, 2001), de Jong (2004), Berkes and Horváth (2006) and Pötscher (2004). More complex models of this type are nonparametric and different methods of inference are typically required with very different limit theories and typically slower convergence rates (Karlsen et al. 2007; Wang and Phillips 2006). Testing for the presence of such nonlinearities can therefore be important in empirical practice (Hong and Phillips 2005; Kasparis 2004).

Statistical tests for the presence of a unit root fall into the general categories of classical and Bayesian, corresponding to the mode of inference that is employed. Classical procedures have been intensively studied and now occupy a vast literature. Most empirical work to date has used classical methods but some attention has been given to Bayesian alternatives and direct model selection methods. These approaches will be outlined in what follows.

Although some tests are known to have certain limited (asymptotic) point optimality properties, there is no known procedure which uniformly dominates others, even asymptotically. Ploberger (2004) provides an analysis of the class of asymptotically admissible tests in problems that include the simplest unit root test, showing that the conventional likelihood ratio (LR) test (or Dickey–Fuller t test; Dickey and Fuller 1979, 1981) is not within this class, so that the LR test, while it may have certain point optimal properties, is either inadmissible or must be modified so that it belongs to the class. This fundamental difficulty, together with the nonstandard nature of the limit theory and the more complex nature of the asymptotic likelihood in unit root cases, partly explains why there is such a proliferation of test procedures and simulation studies analysing performance characteristics in the literature.

Classical tests for a unit root may be classified into parametric, semiparametric and nonparametric categories. Parametric tests usually rely on augmented regressions of the type

$$ \Delta {y}_t={ay}_{t-1}+\sum_{i=1}^{k-1}{\phi}_i\Delta {y}_{t-i}+{e}_t, $$
(18)

where the lagged variables are included to model the stationary error ut in (2). Under the null hypothesis of a unit root, we have a = 0 in (18) whereas when yt is stationary we have a < 0. Thus, a simple test for the presence of a unit root against a stationary alternative in (18) is based on a one-sided t-ratio test of \( {\mathscr{H}}_0:a=0 \) against \( {\mathscr{H}}_1:a<0 \). This test is popularly known as the ADF (or augmented Dickey–Fuller) test (Said and Dickey 1984) and follows the work of Dickey and Fuller (1979, 1981) for testing Gaussian random walks. It has been extensively used in empirical econometric work since the Nelson and Plosser (1982) study, where it was applied to 14 historical time series for the USA leading to the conclusion that unit roots could not be rejected for 13 of these series (all but the unemployment rate). In that study, the alternative hypothesis was that the series were stationary about a deterministic trend (that is, trend stationary) and therefore model (18) was further augmented to include a linear trend, viz.

$$ \Delta {y}_t=\mu +\beta t+{ay}_{t-1}+\sum_{i=1}^{k-1}{\phi}_i\Delta {y}_{t-i}+{e}_t. $$
(19)

When yt is trend stationary we have a < 0 and β ≠ 0 in (19), so the null hypothesis of a difference stationary process is a = 0 and β = 0. This null hypothesis allows for the presence of a non-zero drift in the process when the parameter μ ≠ 0. In this case a joint test of the null hypothesis \( {\mathscr{H}}_0:a=0 \), β = 0 can be mounted using a regression F-test. ADF tests of a = 0 can also be mounted directly using the coefficient estimate from (18) or (19), rather than its t ratio (Xiao and Phillips 1998).
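
In practice the ADF regression (19) need not be coded by hand. For example, assuming the Python statsmodels library is available, its adfuller function implements the t-ratio test with automatic lag selection (a minimal sketch, with simulated random walk data):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(4)
y = np.cumsum(rng.standard_normal(250))      # a driftless random walk

# ADF t-test of H0: a = 0 in (19), with intercept and linear trend ('ct')
# and lag length chosen by AIC, as is common in applied work.
stat, pvalue, usedlag, nobs, crit, _ = adfuller(y, regression="ct", autolag="AIC")
print(f"ADF_t = {stat:.3f}, p-value = {pvalue:.3f}, lags = {usedlag}")
print(crit)     # Dickey-Fuller critical values at the 1%, 5% and 10% levels
```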

What distinguishes both these and other unit root tests is that critical values for the tests are not the same as those for conventional regression F- and t-tests, even in large samples. Under the null, the limit theory for these tests is nonstandard and involves functionals of a Wiener process. Typically, the critical values for five or one per cent level tests are much further out than those of the standard normal or chi-squared distributions. Specific forms for the limits of the ADF t-test (ADFt) and coefficient (ADFa) test are

$$ {ADF}_t\Rightarrow \frac{\int_0^1 WdW}{{\left({\int}_0^1{W}^2\right)}^{1/2}}, \ {ADF}_a\Rightarrow \frac{\int_0^1 WdW}{\int_0^1{W}^2}, $$
(20)

where W is a standard Wiener process or Brownian motion with unit variance. The limit distributions represented by the functionals (20) are known as unit root distributions. The limit theory was first explored for models with Gaussian errors, although not in the Wiener process form and not using functional limit laws, by Dickey (1976), Fuller (1976) and Dickey and Fuller (1979, 1981), who also provided tabulations. For this reason, the distributions are sometimes known as Dickey–Fuller distributions. Later work by Said and Dickey (1984) showed that, if the lag number k in (18) is allowed to increase as the sample size increases subject to the divergence rate condition \( k=o\left({n}^{1/3}\right) \), then the ADF test is asymptotically valid in models of the form (2) where ut is not necessarily autoregressive.
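
The nonstandard critical values are easily reproduced by Monte Carlo. The sketch below simulates the two statistics in (20) under the null in the simplest case with no deterministic terms (sample size and replication count illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 500, 5000
t_stats, coef_stats = np.empty(reps), np.empty(reps)

for r in range(reps):
    e = rng.standard_normal(n)
    y = np.cumsum(e)                       # the null: a pure random walk
    ylag, dy = y[:-1], np.diff(y)
    a = (ylag @ dy) / (ylag @ ylag)        # OLS in  dy_t = a y_{t-1} + e_t
    s2 = np.sum((dy - a * ylag) ** 2) / (n - 2)
    coef_stats[r] = n * a                               # ADF_a analogue
    t_stats[r] = a / np.sqrt(s2 / (ylag @ ylag))        # ADF_t analogue

print(np.quantile(t_stats, [0.01, 0.05, 0.10]))   # ≈ -2.58, -1.95, -1.62
print(np.quantile(coef_stats, [0.01, 0.05]))      # ≈ -13.8, -8.1 (no constant)
```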

Several other parametric procedures have been suggested, including Von Neumann ratio statistics (Sargan and Bhargava 1983; Bhargava 1986; Stock 1994a), instrumental variable methods (Hall 1989; Phillips and Hansen 1990) and variable addition methods (Park 1990). The latter also allow a null hypothesis of trend stationarity to be tested directly, rather than as an alternative to difference stationarity. Another approach that provides a test of a null of trend stationarity is based on the unobserved components representation

$$ {y}_t=\mu +\beta t+{r}_t+{u}_t, \ {r}_t={r}_{t-1}+{v}_t, $$
(21)

which decomposes a time series yt into a deterministic trend, an integrated process or random walk (rt) and a stationary residual (ut). The presence of the integrated process component in yt can then be tested by testing whether the variance \( \left({\sigma}_v^2\right) \) of the innovation vt is zero. The null hypothesis is then \( {\mathscr{H}}_0:{\sigma}_v^2=0 \), which corresponds to a null of trend stationarity. This hypothesis can be tested in a very simple way using the Lagrange multiplier (LM) principle, as shown in Kwiatkowski et al. (1992), leading to a commonly used test known as the KPSS test. If \( {\widehat{e}}_t \) denotes the residual from a regression of yt on a deterministic trend (a simple linear trend in the case of (21) above) and \( {\widehat{\omega}}_e^2 \) is a HAC (heteroskedasticity and autocorrelation consistent) estimate constructed from \( {\widehat{e}}_t \), then the KPSS statistic has the simple form

$$ LM=\frac{n^{-2}\sum_{t=1}^n{S}_t^2}{{\widehat{\omega}}_e^2}, $$

where \( {S}_t={\sum}_{j=1}^t{\widehat{e}}_j \) is the partial sum process of the residuals. Under the null hypothesis of stationarity, this LM statistic converges to \( {\int}_0^1{V}_X^2 \), where VX is a generalized Brownian bridge process whose construction depends on the form (X) of the deterministic trend function. Power analysis indicates that test power depends importantly on the choice of bandwidth parameter in HAC estimation; some recent contributions to this subject are Sul et al. (2006), Müller (2005) and Harris et al. (2007). Other general approaches to testing I(0) versus I(1) have been considered in Stock (1994a, 1999).
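
Assuming statsmodels is available, the KPSS test can be applied directly. The sketch below (illustrative simulated data) tests the null of trend stationarity for a trend stationary and an I(1) series:

```python
import numpy as np
from statsmodels.tsa.stattools import kpss

rng = np.random.default_rng(6)
t = np.arange(300)
y_ts = 0.5 * t + rng.standard_normal(300)                  # trend stationary
y_i1 = 0.5 * t + np.cumsum(rng.standard_normal(300))       # stochastic trend

for name, y in [("trend stationary", y_ts), ("I(1)", y_i1)]:
    stat, pvalue, lags, crit = kpss(y, regression="ct", nlags="auto")
    print(name, round(stat, 3), crit)    # H0 of stationarity rejected for I(1)
```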

By combining rt and ut in (21) the components model may also be written as

$$ {y}_t=\mu +\beta t+{x}_t, \ \Delta {x}_t={ax}_{t-1}+{\eta}_t. $$
(22)

In this format it is easy to construct an LM test of the null hypothesis that yt has a stochastic trend component by testing whether a = 0 in (22). When a = 0, (22) reduces to

$$ \Delta {y}_t=\beta +{\eta}_t, \ \mathrm{or} \ {y}_t=\beta t+\sum_1^t{\eta}_i+{y}_0, $$
(23)

and so the parameter μ is irrelevant (or surplus) under the null. However, the parameter β retains the same meaning as the deterministic trend coefficient under both the null and the alternative hypothesis. This approach has formed the basis of several unit root tests (see Bhargava 1986; Schmidt and Phillips 1992), and the parameter economy of this model gives these tests some advantage in terms of power over procedures like the ADF in the neighbourhood of the null.

This power advantage may be further exploited by considering point optimal alternatives in the construction of the test and in the process of differencing (or detrending) that leads to (23), as pursued by Elliott et al. (1995). In particular, note that (23) involves detrending under the null hypothesis of a unit root, which amounts to first differencing, whereas if the root were local to unity, the appropriate procedure would be to use quasi-differencing. However, since the value of the coefficient in the locality of unity is unknown (otherwise, there would be no need for a test), it can only be estimated or guessed. The procedure suggested by Elliott et al. (1995) is to use a value of the localizing coefficient in the quasi-differencing process for which asymptotic power is calculated by simulation to be around 50 per cent, a setting which depends on the precise model for estimation that is being used. This procedure, which is commonly known as generalized least squares (GLS) detrending (although the terminology is a misnomer because quasi-differencing, not full GLS, is used to accomplish trend elimination), is then asymptotically approximately point optimal in the sense that its power function touches the asymptotic power envelope at that value. Simulations show that this method has some advantage in finite samples, but it is rarely used in empirical work in practice, partly because of the inconvenience of using specialized tables for the critical values of the resulting test and partly because settings for the localizing coefficient are arbitrary and depend on the form of the empirical model.
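
A minimal sketch of the quasi-differencing (GLS detrending) step is given below, using the localizing constant c̄ = −13.5 suggested by Elliott et al. (1995) for the linear trend case; the detrended series would then be fed into a unit root regression with no deterministic terms:

```python
import numpy as np

def gls_detrend(y, cbar=-13.5):
    """Quasi-difference ('GLS') detrending; cbar = -13.5 is the localizing
    constant suggested by Elliott et al. (1995) for the linear trend case."""
    n = len(y)
    rho = 1 + cbar / n
    z = np.column_stack([np.ones(n), np.arange(1, n + 1)])   # (1, t)
    yq = np.r_[y[0], y[1:] - rho * y[:-1]]                   # quasi-differences
    zq = np.r_[[z[0]], z[1:] - rho * z[:-1]]
    beta, *_ = np.linalg.lstsq(zq, yq, rcond=None)           # OLS on q-diffs
    return y - z @ beta                                      # detrended series

rng = np.random.default_rng(7)
y = 2 + 0.3 * np.arange(400) + np.cumsum(rng.standard_normal(400))
yd = gls_detrend(y)    # feed yd into a DF regression with no deterministics
```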

Some unit root tests based on standard limit distribution theory have been developed. Phillips and Han (2008), for example, give an autoregressive coefficient estimator whose limit distribution is standard normal for all stationary, unit root and local to unity values of the autoregressive coefficient. This estimator may be used to construct tests and valid confidence intervals, but tests suffer power loss because the rate of convergence of the estimator is \( \sqrt{n} \) uniformly over these parameter values. So and Shin (1999) and Phillips et al. (2004) showed that certain nonlinear instrumental variable estimators, such as the Cauchy estimator, also lead to t-tests for a unit root which have an asymptotic standard normal distribution. Again, these procedures suffer power loss from reduced convergence rates (in this case, n1/4), but have the advantage of uniformity and low bias. Bias is a well known problem in autoregressive estimation and many procedures for addressing the problem have been considered. It seems that bias reduction is particularly advantageous in the case of unit root tests in panel data, where cross-section averaging exacerbates bias effects when the time dimension is small. Some simulation and indirect inference procedures for bias removal have been successfully used both in autoregressions (Andrews 1993; Gouriéroux et al. 2000) and in panel dynamic models (Gouriéroux, Phillips and Yu 2006).
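
As an illustration of the Cauchy estimator idea, the following sketch instruments yt−1 by its sign; under the unit root null the resulting t-ratio is asymptotically standard normal. This is a stylized version only, not the full procedures of So and Shin (1999) or Phillips et al. (2004):

```python
import numpy as np

rng = np.random.default_rng(15)
n = 400
y = np.cumsum(rng.standard_normal(n))       # the unit root null holds here

ylag, ycur = y[:-1], y[1:]
s = np.sign(ylag)                           # Cauchy instrument: sgn(y_{t-1})
rho = (s @ ycur) / (s @ ylag)               # IV estimate of rho
u = ycur - rho * ylag
sigma2 = u @ u / (len(u) - 1)

# IV t-ratio for H0: rho = 1; approximately N(0, 1) under the null
tstat = (rho - 1) * (s @ ylag) / np.sqrt(sigma2 * (s @ s))
print(tstat)
```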

Semiparametric unit root tests are among the most commonly used unit root tests in practical work and are appealing in terms of their generality and ease of use. Tests in this class employ nonparametric methods to model and estimate the contribution from the error process ut in (2), allowing for both autocorrelation and heterogeneity. These tests and the use of functional limit theory methods in econometrics, leading to the limit formulae (20), were introduced in Phillips (1987a). Direct least squares regression on

$$ \Delta {y}_t={ay}_{t-1}+{u}_t $$
(24)

gives an estimate of the coefficient and its t-ratio in this equation. These two statistics are then corrected to deal with serial correlation in ut by employing an estimate of the variance of ut and its long-run variance. The latter estimate may be obtained by a variety of kernel-type HAC or other spectral estimates (such as autoregressive spectral estimates) using the residuals \( {\widehat{u}}_t \) of the OLS regression on (24). Automated methods of bandwidth selection (or order selection in the case of autoregressive spectral estimates) may be employed in computing these HAC estimates and these methods typically help to reduce size distortion in unit root testing (Lee and Phillips 1994; Stock 1994a; Ng and Perron 1995, 2001). However, care needs to be exercised in the use of automated procedures in the context of stationarity tests such as the KPSS procedure to avoid test inconsistency (see Lee 1996; Sul et al. 2006).

This semiparametric approach leads to two test statistics, one based on the coefficient estimate, called the Z(a) test, the other based on its t-ratio, called the Z(t) test. The limit distributions of these statistics are the same as those given in (20) for the ADF coefficient and t-ratio tests, so the tests are asymptotically equivalent to the corresponding ADF tests. Moreover, the local power functions are also equivalent to those of the Dickey–Fuller and ADF tests, so that there is no loss in asymptotic power from the use of nonparametric methods to address autocorrelation and heterogeneity (Phillips 1987b). Similar semiparametric corrections can be applied to the components models (21) and (22), leading to generally applicable LM tests of stationarity (\( {\sigma}_v^2=0 \)) and stochastic trends (a = 0).
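
A minimal implementation of the Z(a) statistic for the no-deterministics case is sketched below, with a Bartlett-kernel long-run variance estimate and a common rule-of-thumb bandwidth (both choices illustrative):

```python
import numpy as np

def z_alpha(y, q=None):
    """Semiparametric Z(a) unit root statistic (Phillips 1987a), with no
    deterministic terms; q is the Bartlett-kernel truncation lag."""
    n = len(y) - 1
    ylag, dy = y[:-1], np.diff(y)
    a = (ylag @ dy) / (ylag @ ylag)          # OLS in dy_t = a y_{t-1} + u_t
    u = dy - a * ylag
    if q is None:
        q = int(np.floor(4 * (n / 100) ** (2 / 9)))   # rule-of-thumb bandwidth
    gamma = [u[k:] @ u[:len(u) - k] / n for k in range(q + 1)]
    omega2 = gamma[0] + 2 * sum((1 - k / (q + 1)) * gamma[k]
                                for k in range(1, q + 1))
    return n * a - 0.5 * (omega2 - gamma[0]) / ((ylag @ ylag) / n ** 2)

rng = np.random.default_rng(8)
e = rng.standard_normal(500)
u = e[1:] + 0.5 * e[:-1]                     # MA(1) errors
print(z_alpha(np.cumsum(u)))                 # refer to Z(a) critical values
```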

The Z tests were extended in Phillips and Perron (1988) and Ouliaris, Park and Phillips (1989) to models with drift, and by Perron (1989) and Park and Sung (1994) to models with structural breaks in the drift or deterministic component. An important example of the latter is the trend function

$$ {h}_t=\sum_{j=0}^p{f}_j{t}^j+\sum_{j=0}^p{f}_{m,j}{t}_m^j, \ \mathrm{where} \ {t}_m^j=\begin{cases}0 & t\in \left\{1,\dots, m\right\}\\ {\left(t-m\right)}^j & t\in \left\{m+1,\dots, n\right\}\end{cases} $$
(25)

which allows for the presence of a break in the polynomial trend at the data point t = m + 1. Collecting the individual trend regressors in (25) into the vector xt, there exists a limit function \( X(r)=\left(1,r,\dots, {r}^p\right) \) such that \( {D}_n^{-1}{x}_{\left[ nr\right]}\to X(r) \) as n → ∞ uniformly in r ∈ [0,1], where \( {D}_n=\mathrm{diag}\left(1,n,\dots, {n}^p\right) \). If μ = limn → ∞(m/n) > 0 is the limit of the fraction of the sample where the structural change occurs, then the limiting trend function Xμ(r) corresponding to (25) has a similar break at the point μ. All the unit root tests discussed above continue to apply as given for such broken trend functions, with appropriate modifications to the limit theory to incorporate the limit function Xμ(r). Indeed, (25) may be extended further to allow for multiple break points in the sample and in the limit process. The tests may be interpreted as tests for the presence of a unit root in models where broken trends may be present in the data. The alternative hypothesis in this case is that the data are stationary about a broken deterministic trend of degree p.

In order to construct unit root tests that allow for breaking trends like (25) it is necessary to specify the break point m. (Correspondingly, the limit theory depends on Xμ(r) and therefore on μ.) In effect, the break point is exogenously determined. Perron (1989) considered linear trends with single break points intended to capture the 1929 stock market crash and the 1974 oil price shock in this way. An alternative perspective is that any break points that occur are endogenous to the data and unit root tests should take account of this fact. In this case, alternative unit root tests have been suggested (for example, Banerjee et al. 1992; Zivot and Andrews 1992) that endogenize the break point by choosing the value of m that gives the least favourable view of the unit root hypothesis. Thus, if ADF(m) denotes the ADF statistic given by the t-ratio for a in the ADF regression (19) with a broken trend function like (25), then the trend break ADF statistic is

$$ ADF\left(\widehat{m}\right)=\underset{\underline{m}\le m\le \overline{m}}{\min} \ ADF(m), \ \mathrm{where} \ \underline{m}=\left[n\underline{\mu}\right],\overline{m}=\left[n\overline{\mu}\right], $$
(26)

for some \( 0<\underline{\mu}<\overline{\mu}<1. \) The limit theory for this trend break ADF statistic is given by

$$ ADF\left(\widehat{m}\right)\Rightarrow \underset{\mu \in \left[\underline{\mu},\overline{\mu}\right]}{\inf}\left[{\int}_0^1{W}_{X_{\mu }} dW\right]{\left[{\int}_0^1{W}_{X_{\mu}}^2\right]}^{-1/2}, $$
(27)

where WX is detrended standard Brownian motion defined by

\( {W}_X(r)=W(r)-\left[{\int}_0^1W{X}^{\prime}\right]{\left[{\int}_0^1X{X}^{\prime}\right]}^{-1}X(r) \). The limit process Xμ(r) that appears in the functional \( {W}_{X_{\mu }} \) is dependent on the trend break point μ over which the functional is minimized. Similar extensions to trend breaks are possible for other unit root tests and to multiple breaks (Bai 1997; Bai and Perron 1998, 2006; Kapetanios 2005). Critical values of the limiting test statistic (27) are naturally further out in the tail than those of the exogenous trend break statistic, so it is harder to reject the null hypothesis of a unit root when the break point is considered to be endogenous.
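
The minimization in (26) is straightforward to implement. The following stylized sketch uses a linear trend with a broken slope (a p = 1 version of (25)) and omits lag augmentation for brevity; computed values should be referred to the trend break critical values, not the standard DF tables:

```python
import numpy as np

def tstat_last(X, yvec):
    """OLS t-ratio on the last column of X."""
    beta, *_ = np.linalg.lstsq(X, yvec, rcond=None)
    resid = yvec - X @ beta
    s2 = resid @ resid / (len(yvec) - X.shape[1])
    XtXi = np.linalg.inv(X.T @ X)
    return beta[-1] / np.sqrt(s2 * XtXi[-1, -1])

def trend_break_adf(y, trim=0.15):
    """min-ADF over break points, as in (26): linear trend with broken slope
    and no lag augmentation."""
    n = len(y)
    dy, ylag = np.diff(y), y[:-1]
    t = np.arange(1, n)
    stats = []
    for m in range(int(trim * n), int((1 - trim) * n)):
        tb = np.where(t > m, t - m, 0.0)              # (t - m)^+ regressor
        X = np.column_stack([np.ones(n - 1), t, tb, ylag])
        stats.append(tstat_last(X, dy))
    return min(stats)

rng = np.random.default_rng(9)
y = np.cumsum(rng.standard_normal(300))
print(trend_break_adf(y))
```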

Asymptotic and finite sample critical values for the endogenized trend break ADF unit root test are given in Zivot and Andrews (1992). Simulation studies indicate that the introduction of trend break functions leads to further reductions in the power of unit root tests and to substantial finite sample size distortion in the tests. Sample trajectories of a random walk are often similar to those of a process that is stationary about a broken trend for some particular break point (and even more so when several break points are permitted in the trend). Continuing reductions in the power of unit root tests against competing models of this type are therefore to be expected, and discriminatory power between such different time series models is typically low. In fact, the limit Brownian motion process in (9) can itself be represented as an infinite linear random combination of deterministic functions of time, as discussed in Phillips (1998), so there are good theoretical reasons for anticipating this outcome. Carefully chosen trend stationary models can always be expected to provide reasonable representations of given random walk or unit root data, but such models are certain to fail in post-sample projections as the post-sample data drift away from any given trend or broken trend line. Phillips (1998, 2001) explores the impact of these considerations in a systematic way.

From a practical standpoint, models with structural breaks attach unit weight and hence persistence to the effects of innovations at particular times in the sample period. In effect, break models simply dummy out the effects of certain observations by parameterizing them as persistent effects. To the extent that persistent shocks of this type occur intermittently throughout the entire history of a process, these models are therefore similar to models with a stochastic trend. However, if only one or a small number of such breaks occur then the process does have different characteristics from that of a stochastic trend. In such cases, it is often of interest to identify the break points endogenously and relate such points to institutional events or particular external shocks that are known to have occurred.

More general nonparametric tests for a unit root are also possible. These rely on frequency domain regressions on (24) over all frequency bands (Choi and Phillips 1993). They may be regarded as fully nonparametric because they test in a general way for coherency between the series yt and its first difference Δyt. Other frequency domain procedures involve the estimation of a fractional differencing parameter and the use of tests and confidence intervals based on the estimate. The time series yt is fractionally integrated with memory parameter d if \( {\left(1-L\right)}^d{y}_t={u}_t \) and ut is a stationary process with spectrum fu(λ) that is continuous at the origin with fu(0) > 0, or a (possibly mildly heterogeneous) process of the form given in (3). Under some rather weak regularity conditions, it is possible to estimate d consistently by semiparametric methods irrespective of the value of d. Shimotsu and Phillips (2005) suggest an exact local Whittle estimator \( \widehat{d} \) that is consistent for all d and for which \( \sqrt{m}\left(\widehat{d}-d\right)\Rightarrow N\left(0,\frac{1}{4}\right) \), where m is the number of frequency ordinates used in the estimation, extending earlier work by Robinson (1995) on local Whittle estimation in the stationary case where |d| < 1/2. These methods are narrow band procedures focusing on frequencies close to the origin, so that long run behaviour is captured. The Shimotsu–Phillips estimator may be used to test the unit root hypothesis \( {\mathscr{H}}_0:d=1 \) against alternatives such as \( {\mathscr{H}}_1:d<1 \). The limit theory may also be used to construct valid confidence intervals for d.
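
The local Whittle objective is simple to code. The sketch below implements Robinson's (1995) estimator (the exact version of Shimotsu and Phillips 2005 modifies the objective function so that it remains valid for nonstationary d); the bandwidth rule is illustrative only:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def local_whittle(x, m=None):
    """Robinson's (1995) local Whittle estimate of the memory parameter d."""
    n = len(x)
    if m is None:
        m = int(n ** 0.65)                  # bandwidth rule: illustrative only
    j = np.arange(1, m + 1)
    lam = 2 * np.pi * j / n                 # Fourier frequencies near the origin
    I = np.abs(np.fft.fft(x)[1:m + 1]) ** 2 / (2 * np.pi * n)   # periodogram

    def R(d):                               # concentrated Whittle objective
        return np.log(np.mean(lam ** (2 * d) * I)) - 2 * d * np.mean(np.log(lam))

    return minimize_scalar(R, bounds=(-0.49, 0.49), method="bounded").x

rng = np.random.default_rng(10)
print(local_whittle(rng.standard_normal(2048)))   # white noise: estimate ≈ 0
```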

The Z(a), Z(t) and ADF tests are the most commonly used unit root tests in empirical research. Extensive simulations have been conducted to evaluate the performance of the tests. It is known that the Z(a), Z(t) and ADF tests all perform satisfactorily except when the error process ut displays strong negative serial correlation. The Z(a) test generally has greater power than the other two tests but also suffers from more serious size distortion. All of these tests can be used to test for the presence of cointegration by using the residuals from a cointegrating regression. Modification of the critical values used in these tests is then required, for which case the limit theory and tables were provided in Phillips and Ouliaris (1990) and updated in MacKinnon (1994).

While the Z tests and other semiparametric procedures are designed to cope with mildly heterogeneous processes, some further modifications are required when there is systematic time-varying heterogeneity in the error variances. One form of systematic variation that allows for jumps in the variance has the form \( E\left({\varepsilon}_t^2\right)={\sigma}_t^2={\sigma}^2g{\left(\frac{t}{n}\right)}^2 \), where the variance evolution function \( g\left(\frac{t}{n}\right) \) may be smooth except for simple jump discontinuities at a finite number of points. Such formulations introduce systematic time variation into the errors, so that we may write \( {\varepsilon}_t=g\left(\frac{t}{n}\right){\eta}_t \), where ηt is a martingale difference sequence with variance \( E{\eta}_t^2={\sigma}^2 \). These evolutionary changes then have persistent effects on partial sums of εt, thereby leading to alternate functional laws of the form

$$ {n}^{-1/2}{Y}_{\left[ nr\right]}\Rightarrow {B}_g(r)={\int}_0^rg(s) dB(s), $$

in place of (8). Accordingly, the limit theory for unit root tests changes and some nonparametric modification of the usual tests is needed to ensure that existing asymptotic theory applies (Beare 2006) or to make appropriate corrections in the limit theory (Cavaliere 2004; Cavaliere and Taylor 2007) so that there is less size distortion in the tests.
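
A short simulation illustrates the effect: with a one-time jump in g, scaled partial sums approximate the time-changed process Bg rather than a homogeneous Brownian motion (jump location and magnitudes illustrative):

```python
import numpy as np

rng = np.random.default_rng(13)
n = 1000
t = np.arange(1, n + 1) / n
g = np.where(t < 0.5, 1.0, 3.0)            # variance jumps at mid-sample
eps = g * rng.standard_normal(n)           # eps_t = g(t/n) eta_t
Y = np.cumsum(eps)
# n^{-1/2} Y_[nr] now approximates B_g(r) = int_0^r g dB rather than B(r),
# so unmodified unit root tests no longer have the limit theory in (20).
```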

An extension of the theory that is relevant in the case of quarterly data is to the seasonal unit root model

$$ \left(1-{L}^4\right){y}_t={u}_t. $$
(28)

Here, the polynomial 1 − L4 can be factored as (1 − L)(1 + L)(1 + L2), so that the unit roots (or roots on the unit circle) in (28) occur at 1, −1, i, and −i, corresponding to the long-run or zero (L = 1) frequency, the semi-annual (L = −1) frequency, and the annual (L = i, −i) frequency, respectively. Quarterly differencing, as in (28), is used as a seasonal adjustment device, and it is of interest to test whether the data support the implied hypothesis of the presence of unit roots at these seasonal frequencies. Other types of seasonal processes, say monthly data, can be analysed in the same way. Tests for seasonal unit roots within the particular context of (28) were studied by Hylleberg et al. (1990), who extended the parametric ADF test to the case of seasonal unit roots. In order to accommodate fourth differencing, the autoregressive model is written in the new form

$$ {\Delta}_4{y}_t={\alpha}_1{y}_{1t-1}+{\alpha}_2{y}_{2t-1}+{\alpha}_3{y}_{3t-2}+{\alpha}_4{y}_{3t-1}+{\sum}_{i=1}^p{\phi}_i{\Delta}_4{y}_{t-i}+{\varepsilon}_t, $$
(29)

where Δ4 = 1 − L4, y1t = (1 + L)(1 + L2)yt, y2t = − (1 − L)(1 + L2)yt, and y3t = − (1 − L2)yt. The transformed series y1t, y2t, y3t retain the unit root at the zero frequency (long run), the semi-annual frequency (two cycles per year), and the annual frequency (one cycle per year), respectively. When α1 = α2 = α3 = α4 = 0, there are unit roots at the zero and seasonal frequencies. To test the hypothesis of a unit root (L = 1) in this seasonal model, a t-ratio test of α1 = 0 is used. Similarly, the test for a semi-annual root (L = − 1) is based on a t-ratio test of α2 = 0, and the test for an annual root on the t-ratios for α3 = 0 or α4 = 0. If each of the α’s is different from zero, then the series has no unit roots at all and is stationary. Details of the implementation of this procedure are given in Hylleberg et al. (1990), the limit theory for the tests is developed in Chan and Wei (1988), and Ghysels and Osborn (2001) provide extensive discussion and applications.
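
The transformations and the regression (29) are easy to construct directly. The following sketch (no lag augmentation, simulated seasonal random walk data) computes the four t-ratios, which should be referred to the HEGY tables rather than standard normal critical values:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 400
e = rng.standard_normal(n)
y = np.zeros(n)
for t in range(4, n):
    y[t] = y[t - 4] + e[t]                 # seasonal random walk: (1 - L^4) y = e

def lag(x, k):                             # L^k x, with NaN padding
    return np.r_[np.full(k, np.nan), x[:-k]] if k else x

y1 = y + lag(y, 1) + lag(y, 2) + lag(y, 3)     # (1 + L)(1 + L^2) y
y2 = -(y - lag(y, 1) + lag(y, 2) - lag(y, 3))  # -(1 - L)(1 + L^2) y
y3 = -(y - lag(y, 2))                          # -(1 - L^2) y
d4 = y - lag(y, 4)                             # Delta_4 y

T = np.arange(5, n)                        # common estimation sample
X = np.column_stack([y1[T - 1], y2[T - 1], y3[T - 2], y3[T - 1]])
b, *_ = np.linalg.lstsq(X, d4[T], rcond=None)
res = d4[T] - X @ b
s2 = res @ res / (len(T) - X.shape[1])
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
print(b / se)      # t-ratios for alpha_1,...,alpha_4; use the HEGY tables
```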

Most empirical work on unit roots has relied on classical tests of the type described above. But Bayesian methods are also available and appear to offer certain advantages like an exact finite sample analysis (under specific distributional assumptions) and mass point posterior probabilities for break point analysis. In addressing the problem of trend determination, traditional Bayes methods may be employed such as the computation of Bayesian confidence sets and the use of posterior odds tests. In both cases prior distributions on the parameters of the model need to be defined and posteriors can be calculated either by analytical methods or by numerical integration. If (18) is rewritten as

$$ {y}_t=\rho {y}_{t-1}+\sum_1^{k-1}{\phi}_i\Delta {y}_{t-i}+{e}_t $$
(30)

then the posterior probability of the nonstationary set {ρ ≥ 1} is of special interest in assessing the evidence in support of the presence of a stochastic trend in the data. Posterior odds tests typically proceed with ‘spike and slab’ prior distributions (π) that assign an atom of mass such as π(ρ = 1) = θ to the unit-root null and a continuous distribution with mass 1 − θ to the stationary alternative, so that π(−1 < ρ < 1) = 1 − θ. The posterior odds then show how the prior odds ratio θ/(1 − θ) in favour of the unit root is updated by the data.
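
A minimal sketch of this calculation for the simplest AR(1) case with a flat prior (following the Sims and Uhlig 1991 setting, with σ2 treated as known up to its estimate) computes the posterior probability of the nonstationary region:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(12)
n = 200
y = np.cumsum(rng.standard_normal(n))       # data generated with rho = 1

ylag, ycur = y[:-1], y[1:]
rho_hat = (ylag @ ycur) / (ylag @ ylag)     # OLS/ML estimate of rho
sigma2 = np.sum((ycur - rho_hat * ylag) ** 2) / (len(ycur) - 1)

# Flat prior on rho with sigma^2 treated as known: the posterior for rho is
# N(rho_hat, sigma^2 / sum y_{t-1}^2), as in Sims and Uhlig (1991).
se = np.sqrt(sigma2 / (ylag @ ylag))
print("P(rho >= 1 | data) =", 1 - norm.cdf((1 - rho_hat) / se))
```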

The input of information via the prior distribution, whether deliberate or unwitting, is a major reason for potential divergence between Bayesian and classical statistical analyses. Methods of setting an objective correlative in Bayesian analysis through the use of model-based, impartial reference priors that accommodate nonstationarity are therefore of substantial interest. These were explored in Phillips (1991a), where many aspects of the subject are discussed. The subject is controversial, as the attendant commentary on that paper and the response (Phillips 1991b) reveal. The simple example of a Gaussian autoregression with a uniform prior on the autoregressive coefficient ρ and with an error variance σ2 that is known illustrates one central point of controversy between Bayesian and classical inference procedures. In this case, when the prior on ρ is uniform, the posterior for ρ is Gaussian and symmetric about the maximum likelihood estimate \( \widehat{\rho} \) (Sims and Uhlig 1991), whereas the sampling distribution of \( \widehat{\rho} \) is biased downwards and skewed with a long left-hand tail. Hence, if the calculated value of \( \widehat{\rho} \) were found to be \( \widehat{\rho}=1 \), then Bayesian inference effectively assigns a 50 per cent posterior probability to stationarity {|ρ| < 1}, whereas classical methods, which take into account the substantial downward bias in the estimate \( \widehat{\rho} \), indicate that the true value of ρ is much more likely to be in the explosive region {ρ > 1}.

Another major point of difference is that the Bayesian posterior distribution is asymptotically Gaussian under very weak conditions, which include cases where there are unit roots (ρ = 1), whereas classical asymptotics for \( \widehat{\rho} \) are non-standard, as in (20). These differences are explored in Kim (1994), Phillips and Ploberger (1996) and Phillips (1996). The unit root case is one of very few instances where Bayesian and classical asymptotic theory differ. The reason for the difference in the unit root case is that Bayesian asymptotics rely on the local quadratic shape of the likelihood and condition on a given trajectory, whereas classical asymptotics rely on functional laws such as (9), which take into account the persistence in unit root data that manifests in the limiting trajectory.

Empirical illustrations of the use of Bayesian methods of trend determination for various macroeconomic and financial time series are given in DeJong and Whiteman (1991a, b), Schotman and van Dijk (1991) and Phillips (1991a, 1992), the latter implementing an objective model-based approach. Phillips and Ploberger (1994, 1996) develop Bayes tests, including an asymptotic information criterion PIC (posterior information criterion) that extends the Schwarz (1978) criterion BIC (Bayesian information criterion) by allowing for potential nonstationarity in the data (see also Wei 1992). This approach takes account of the fact that Bayesian time series analysis is conducted conditionally on the realized history of the process. The mathematical effect of such conditioning is to translate models such as (30) to a ‘Bayes model’ with time-varying and data-dependent coefficients, that is,

$$ {y}_{t+1}={\widehat{\rho}}_t{y}_t+\sum_1^{k-1}{\widehat{\phi}}_{it}\Delta {y}_{t-i}+{e}_t, $$
(31)

where \( \left({\widehat{\rho}}_t,{\widehat{\phi}}_{it};i=1,\dots, k-1\right) \) are the latest best estimates of the coefficients from the data available to point ‘t’ in the trajectory. The ‘Bayes model’ (31) and its probability measure can be used to construct likelihood ratio tests of hypotheses such as the unit root null ρ = 1, which relate to the model selection criterion PIC. Empirical illustrations of this approach are given in Phillips (1994, 1995).

Nonstationarity is certainly one of the most dominant and enduring characteristics of macroeconomic and financial time series. It therefore seems appropriate that this feature of the data be seriously addressed both in econometric methodology and in empirical practice. However, until the 1980s this was not the case. Before 1980 it was standard empirical practice in econometrics to treat observed trends as simple deterministic functions of time. Nelson and Plosser (1982) challenged this practice and showed that observed trends can be better modelled if one allows for stochastic trends even when there is some deterministic drift. Since their work there has been a continuing reappraisal of trend behaviour in economic time series and substantial development in the econometric methods of nonstationary time series. But the general conclusion that stochastic trends are present as a component of many economic and financial time series has withstood extensive empirical study.

This article has touched only a part of this large research field and traced only the main ideas involved in unit root modelling and statistical testing. This overview also does not cover the large and growing field of panel unit root testing and panel stationarity tests. The reader may consult the following review articles devoted to various aspects of the field for additional coverage and sources: (a) on unit roots: Phillips (1988b), Diebold and Nerlove (1990), Dolado et al. (1990), Campbell and Perron (1991), Stock (1994b), Phillips and Xiao (1998), and Byrne and Perman (2006); (b) on panel unit roots: Phillips and Moon (1999), Baltagi and Kao (2000), Choi (2001), Hlouskova and Wagner (2006); and (c) special journal issues of the Oxford Bulletin of Economics and Statistics (1986; 1992), the Journal of Economic Dynamics and Control (1988), Advances in Econometrics (1990), Econometric Reviews (1992), and Econometric Theory (1994).

See Also