1 Introduction

Inflation persistence is an important issue for economists and especially for central bankers. This is because the degree of inflation persistence influences the extent to which central banks can control inflation. If inflation persistence is high, a shock to the price level increases inflation for a long period. In the worst-case scenario, inflation might even follow the path of a random walk, making it impossible for central banks to bring it under control. In the best case, inflation is integrated of order zero. This implies that it reverts to its initial level soon after a shock has occurred.

Not only is the level of inflation persistence important in economic analysis but also the question of whether and when it has changed. If the occurrence and/or timing of a break are not accounted for properly, then inflation forecasts and policy decisions might be misguided. Despite its importance, there is still no agreement on the significance and dating of past changes in inflation persistence in the U.S. and elsewhere.Footnote 1 The diverse findings could be due to the fact that many studies ignore the fractionally integrated nature of inflation. This may lead to misspecification and incorrect test results. The early results presented by Geweke and Porter-Hudak (1983), along with the international evidence of Hassler and Wolters (1995) and Baillie et al. (1996), have long since established that inflation exhibits long memory. In view of this evidence, Kumar and Okimoto (2007) argue that tests for a change in inflation persistence using unit root tests or autoregressive coefficients may lead to incorrect conclusions. Their study is the first to use long memory techniques to determine a change in inflation persistence. It applies a visual judgment of rolling window estimates and analyzes two exogenously split subsamples. We go beyond this approach and attempt not only to answer the question of whether there has been a change in U.S. inflation persistence but also to determine the data-driven timing and the number of breaks.

Our paper contributes to the existing literature by proposing and applying a new procedure for determining the timing and the significance of breaks in the degree of fractional integration. We are not aware of any other test allowing for multiple breaks in long memory at unknown points in time. The test builds on a modified version of the lag-augmented LM (Lagrange Multiplier) test proposed by Demetrescu et al. (2008), where dummy variables account for potential breaks. The \(F\) type test statistic is computed from a regression of differences under the null hypothesis. Therefore, no \(I(d)\) series, \(d \ne 0\), enters the test regression under the null, and the estimators converge at the conventional \(\sqrt{T}\) rate, with \(T\) denoting the sample size. Consequently, we can compare the maximum of a sequence of \(F\) statistics to critical values of Bai and Perron (1998, 2003b), see also Andrews (1993) for the case of just one break. The test is able to detect a break in the long-memory parameter even relatively close to the boundaries of the sample because it does not rely on a separate estimation of the long-memory parameter before and after potential breaks. Further, a sequence of tests makes it possible to estimate the number of breaks.

Since Stock’s (2001) comment on the innovative study of Cogley and Sargent (2001), his warning not to confuse a change in volatility with a change in persistence has been taken seriously. Fortunately, our test inherits the properties of the lag-augmented LM test developed by Demetrescu et al. (2008): Using Eicker–White standard errors renders the test robust to unconditional heteroskedasticity of a very general nature, see Kew and Harris (2009). In fact, the variance process is essentially unrestricted, thus allowing for time-varying volatility except for explosive and degenerate cases.

We apply our new tests to monthly U.S. inflation rates in the period 1966–2008. While there is strong evidence for a break in long memory in October 1973, a second potential break in March 1980 turns out to be insignificant. Prior to the long-memory analysis, a significant mean shift, found in 1981, is subtracted from the data. In addition, we observe a considerable decline in volatility during the eighties.

The rest of the paper is organized as follows. In the section which follows, we will discuss the model of fractional integration with a break in the order of integration. Next, in section three, we obtain a new Chow-type test for multiple breaks assuming the break dates are known a priori. Experimental evidence is collected showing that the test works extremely well in finite samples even if the order of integration is misspecified under the null hypothesis of no break. The fourth section is devoted to the case where the break points are not known. We propose performing the test as a max-Chow test in line with Andrews (1993) when testing against just one break, and generalizing this approach for several breaks by adopting tests developed by Bai and Perron (1998). The finite-sample performance is studied through simulations. In Sect. 5, we turn to the analysis of monthly U.S. inflation rates, allowing for breaks in the mean as well as for breaks in the order of integration. Our concluding remarks are made in the final section, while mathematical proofs are contained in the Appendix.

2 Breaks in long memory

As a starting point, let us recall how long memory is defined and interpreted within a fractionally integrated framework. Under the null hypothesis of no break the observed time series \(\{y_t\}\) (\(t=1,\ldots ,T\)) is integrated of order \(d\),

$$\begin{aligned} \Delta ^d y_t = (1-L)^d y_t = e_t \ \sim \ I(0), \end{aligned}$$
(1)

where \(\{e_t\}\) is a stationary and invertible short-memory process integrated of order zero, \(I(0)\), and \(L\) denotes the conventional lag operator. We write the Wold representation of \(\{e_t\}\) in terms of zero mean white noise innovations \(\{\varepsilon _{t}\}\), say \(e_t = \sum _{k=0}^\infty \gamma _k \varepsilon _{t-k}\). Then integration of order zero means that \(\sum _{k=0}^\infty \gamma _k\) is finite and different from zero. Fractional differences are defined through the usual binomial expansion,

$$\begin{aligned} \left( 1-L\right) ^{d}=\sum \limits _{i=0}^{\infty } \pi _{i,d} L^{i}, \quad \pi _{0,d} =1, \ \pi _{i,d}= \frac{i-1-d}{i} \, \pi _{i-1,d}, \quad i\ge 1. \end{aligned}$$
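
For readers who want to compute these coefficients numerically, the following Python sketch implements the recursion and the truncated filter that reappears in (7) below. It is an illustration under the zero-starting-value convention of Sect. 3.1, not code from the paper, and the function names are ours.

```python
import numpy as np

def pi_coefficients(d, n):
    """Coefficients pi_{i,d} of (1-L)^d for i = 0, ..., n-1, via the recursion above."""
    pi = np.empty(n)
    pi[0] = 1.0
    for i in range(1, n):
        pi[i] = pi[i - 1] * (i - 1 - d) / i
    return pi

def frac_diff(y, d):
    """Truncated fractional differences x_t = sum_{j=0}^{t-1} pi_{j,d} y_{t-j}, cf. (7)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    pi = pi_coefficients(d, T)
    return np.array([pi[:t + 1][::-1] @ y[:t + 1] for t in range(T)])
```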

Similarly, one may expand the inverse filter with coefficients \(\{\psi _{i,d}\}\),

$$\begin{aligned} y_t = (1-L)^{-d} e_t = \sum \limits _{i=0}^{\infty } \psi _{i,d} e_{t-i}, \end{aligned}$$

which provides a well defined stationary process only for \(d< 0.5\). The impulse response coefficients \(\{c_i\}\) of \( \{y_t\}\) can be obtained by convolution of \(\psi _{i,d}\) and \(\gamma _k\) such that:

$$\begin{aligned} y_t = \sum \limits _{i=0}^{\infty } c_{i} \varepsilon _{t-i}. \end{aligned}$$

Hassler and Kokoszka (2010) provide a necessary and sufficient condition which \(\{\gamma _k\}\) has to obey for the impulse response coefficients \(\{c_i\}\) to decay hyperbolically at rate \(i^{d-1}\). Under this rather weak condition it holds true for \(d>0\) that

$$\begin{aligned} c_i \sim c \, i^{d-1}, \quad \text{ i.e., } \lim _{i \rightarrow \infty } \frac{c_i}{ i^{d-1}} = c \ne 0, \end{aligned}$$
(2)

where the constant \(c\) is defined in Hassler and Kokoszka (2010, Proposition 2.1). For \(d = 1\), past innovations \(\varepsilon _{t-i}\) have a permanent effect on \(y_t\), while for \(0.5 \le d < 1\) we observe nonstationarity with transitory shocks,Footnote 2 \(c_i \rightarrow 0\) as \(i \rightarrow \infty \). Finally, for \(0<d<0.5\) the impulse response coefficients \(\{c_i\}\) die out fast enough to be square-summable resulting in a stationary process, though still dying out slowly enough that \(\{c_i\}\) is not summable, which characterizes long memory. In view of (2), \(d\) is interpreted as the degree of persistence or the memory parameter measuring how slowly the effect of past shocks dies out.

As an alternative hypothesis to (1) we model \(m\) breaks constituting \(m+1\) regimes,

$$\begin{aligned} y_t= (1-L)^{-d_j} e_t, \quad t=T_{j-1} + 1,\ldots ,T_j, \quad j=1,\ldots ,m+1, \end{aligned}$$
(3)

with \(T_0=0\) and \(T_{m+1}=T\). The null hypothesis of no breaks becomes

$$\begin{aligned} H_0: \, d_2-d_1 = \cdots = d_{m+1}-d_m =0. \end{aligned}$$

In what follows we prefer the parameterization

$$\begin{aligned} d_j = d + \theta _{j-1}, \quad j=1, \ldots , m+1, \ \theta _0=0, \end{aligned}$$
(4)

such that \(\theta _j\) denotes the shift relative to the first period occurring at the \(j^{th}\) break. The null hypothesis of interest may now be recast as

$$\begin{aligned} H_0: \, \theta _1 = \cdots = \theta _m = 0. \end{aligned}$$
(5)

If a sudden shift in \(d\) is considered as too extreme in practice, there still may be a “smooth transition”,Footnote 3

$$\begin{aligned} y_t= \left\{ \begin{array}{ll} y_{0} +\, \sum \limits _{i=0}^{t-1}\psi _{i,d} \, e_{t-i}, &{} t=1,\ldots , T_1 \\ y_{T_{1}} +\, \sum \limits _{i=0}^{t-1-T_1}\psi _{i,d+\theta _1} \, e_{t-i}, &{} t=T_1+ 1,\ldots , T_2 \\ \vdots &{} \\ y_{T_m} +\, \sum \limits _{i=0}^{t-1-T_m}\psi _{i,d+\theta _m} \, e_{t-i}, &{} t=T_m+ 1,\ldots , T \end{array} \right. , \end{aligned}$$
(6)

where \(\psi _{i,d+\theta _j}\) are the coefficients from expanding \((1-L)^{-d-\theta _j}\). In (6), only realizations \(e_t\) after a break contribute to the slowly evolving long memory after \(T_j\).
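
To make the alternative (6) concrete, the following sketch simulates the process for given break fractions and shifts. It assumes iid standard normal innovations and a zero starting value \(y_0=0\); these simplifying choices and the function names are ours, so the code is an illustration rather than part of the paper.

```python
import numpy as np

def psi_coefficients(d, n):
    """Coefficients psi_{i,d} of (1-L)^{-d}, i.e. pi_{i,-d} from the recursion above."""
    psi = np.empty(n)
    psi[0] = 1.0
    for i in range(1, n):
        psi[i] = psi[i - 1] * (i - 1 + d) / i
    return psi

def simulate_breaks(T, d, thetas, fractions, seed=0):
    """Simulate the break process (6): m = len(thetas) breaks at the given fractions,
    memory d + theta_j after the j-th break. Uses iid N(0,1) innovations and a zero
    starting value y_0 = 0 (our simplifying choices)."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(T)
    cuts = [0] + [int(f * T) for f in fractions] + [T]
    orders = [d] + [d + th for th in thetas]          # memory parameter per regime
    y = np.empty(T)
    level = 0.0                                       # y_{T_j}, carried over to the next regime
    for j, dj in enumerate(orders):
        lo, hi = cuts[j], cuts[j + 1]
        psi = psi_coefficients(dj, hi - lo)
        for t in range(lo, hi):
            y[t] = level + psi[:t - lo + 1][::-1] @ e[lo:t + 1]
        level = y[hi - 1]
    return y
```

For instance, simulate_breaks(500, 0.0, [0.4], [0.5]) should mimic one of the single-break designs used in the Monte Carlo study of Sect. 4.2.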

As in Bai and Perron (1998), we assume that the potential break points are determined by break fractions \(\lambda _j\), i.e. \(T_j =[\lambda _j T]\), where \( [\cdot ]\) denotes the integer part. In fact, treating the break points as unknown parameters, it makes sense to distinguish true break fractions \( \lambda _j^0\) from those estimated from the data, \(\widehat{\lambda }_j\). To reduce the notational burden we have ignored such a distinction in the exposition so far. As usual in this literature, we maintain the standard assumption that the true break points all grow with the sample size, such that each subsample contains an increasing number of observations.

Assumption 1

For the true break fractions it holds true that

$$\begin{aligned} 0 = \lambda _0^0 < \lambda _1^0 < \cdots < \lambda _m^0 < \lambda _{m+1}^0=1. \end{aligned}$$

There exists a considerable body of literature that deals with a break from an I(0) to an I(1) process (and vice versa), starting with tests pioneered by Kim (2000) and Busetti and Taylor (2004). Hassler and Scheithauer (2011) showed that those tests have power against fractional alternatives, too. If we wish to allow for \(d \ne 0\) under the null, however, then \(d\) would have to be estimated first in order to apply such tests to differenced data. It is not clear how the preliminary estimation step would affect the subsequent test.

Some recent papers have proposed alternative procedures to detect breaks in long memory at an unknown time. Relying on the least-squares principle, Gil-Alana (2008) discusses a procedure allowing for breaks in the memory parameter and/or the mean and a linear time trend, but this technique does not allow one to establish the significance of the breaks. Sibbertsen and Kruse (2009) discuss a CUSUM of squares-based test and find that the critical values depend on the unknown parameter \(d\). This requires a preliminary consistent estimate \( \widehat{d}\) under \(H_0\); such an estimate can be very volatile in smaller samples, resulting in unreliable subsequent inference. Martins and Rodrigues (2012) propose a related procedure relying on a recursive forward and backward estimation where critical values again depend on the unknown parameter \(d\). Further, Ray and Tsay (2002) adopt a Bayesian perspective and apply Markov Chain Monte Carlo methods to estimate the posterior probability and size of a change in the order of integration. Finally, Beran and Terrin (1996) suggest using non-overlapping subsamples to compute (approximate) maximum likelihood [ML] estimates of \(d\), \(\widehat{d}_1\) for \(t=1,\ldots ,T_1\) and \(\widehat{d}_2\) for \( t=T_1+1,\ldots , T\), where \(T_1\) is varied systematically. The test statistic builds on the maximum difference \(|\widehat{d}_2-\widehat{d}_1|\). The limiting distribution established by Beran and Terrin (1999) coincides, upon squaring, with the one given by Andrews (1993) as the supremum of so-called tied-down Bessel processes. It was derived under the sufficient assumption of \(\sqrt{T}\)-consistent estimators, see Andrews (1993, Theorem 1). Consequently, Beran and Terrin (1996) work with a parametric approximation to ML requiring a fully specified model for the \(I(0)\) component \(\{e_t\}\), see also Yamaguchi (2011), who likewise works with an approximation to ML. The asymptotic theory does not seem to hold for semiparametric estimators converging at a slower rate than \(\sqrt{T}\). This is one further motivation for our proposal, since in the regression framework of Demetrescu et al. (2008) \(\sqrt{T}\)-consistency is maintained, see also Proposition 2 below. The major advantage, however, of the regression approach is that it extends naturally to multiple breaks along the lines of Bai and Perron (1998).

3 Tests with known break points

In some cases, economists have an idea of the timing of a potential break point in persistence or wish to know the impact of a certain event on persistence. In the context of this paper, the inauguration of a new central bank governor might be an event that induces a break in inflation persistence. Alternatively, economists might be interested in the impact of a new inflation target or a new monetary policy regime on inflation persistence. Therefore, the case of known break fractions is an interesting starting point for which we will first derive a test statistic from the Lagrange Multiplier [LM] principle under simplifying assumptions before then turning to extensions that are relevant in practice.

3.1 Under iid assumptions

Working with finite samples of size \(T\), the theoretical difference operator from (1) has to be adjusted. Since the observed series starts with the first observation \(y_1\), the infinite expansion is truncated in practice. We call the truncated differences \(\Delta ^d_{t,y}\) instead of \( \Delta ^d y_t\), and denote them by \(x_t\) for brevity,

$$\begin{aligned} x_t = \Delta ^d_{t,y} = \sum \limits _{j=0}^{t-1} \pi _{j,d} y_{t-j}, \ t=1,\ldots , T. \end{aligned}$$
(7)

This amounts to assuming that past values of \(\{y_t \}\) are zero for \(t \le 0\). Such processes are also classified as “type II” in contrast to the more conventional “type I” processes; see e.g., Robinson (2005) for a discussion. To derive an LM test we will further assume absence of short memory.

Assumption 2

Let \(\{e_t\}=\{ \varepsilon _t\}\), \(t \in \mathbb Z \), from (3) be an iid series with mean 0 and variance \(\sigma ^2\). The starting values are set equal to zero, \(y_t=0\) for \(t \le 0\).

To set up the score function in the Appendix we have to assume a Gaussian pseudo-log-likelihood function, although Gaussianity is not required for the limiting distribution below.

Proposition 1

Under (3) with (4), (5), and Assumption 2, the LM statistic computed with the true break points \(T_j = [\lambda _j^0 T]\) becomes

$$\begin{aligned} LM=\frac{6}{{\sigma }^4 \, \pi ^2} \ \sum _{j=1}^m \ \frac{\left( \sum \limits _{t = T_j +1}^{T_{j+1}} x_t x_{t-1}^* \right) ^2}{T_{j+1} - T_j}, \end{aligned}$$
(8)

where

$$\begin{aligned} x_{t-1}^* = \sum _{i=1}^{t-1} i^{-1} x_{t-i} \end{aligned}$$
(9)

with \(\{x_t\}\) from (7).

Proof

See Appendix. \(\square \)

Note that \(LM\) with \(\{x_t\}\) from (7) is computed under the null of no break, i.e. (5) and (4), which means that the differencing parameters are equal to \(d\) in all subsamples. The summation in \(LM\) starts with the second sample after \(T_1\), but the information of the first sample is contained in \(\{x_{t-1}^* \}\). Along the lines of Breitung and Hassler (2002, Theorem 1), \(LM\) can be approximated by an \(F\) statistic testing for \(\psi _1= \cdots = \psi _m =0\) in the following regression estimated by ordinary least squares [OLS],

$$\begin{aligned} x_t= \sum _{j=1}^m \widehat{\psi }_j \, x_{t-1}^*\, D_{t}(\lambda _j^0)+ \widehat{\varepsilon }_t, \quad t=2,\ldots ,T, \end{aligned}$$
(10)

with the step dummy variables (\(j=1,\ldots ,m\))

$$\begin{aligned} D_t(\lambda _j^0)=\left\{ \begin{array}{ll} 1, \quad t=[\lambda _j^0 T] +1,\ldots , [\lambda _{j+1}^0 T] &{} \\ 0, \quad \text{ else } &{} \end{array} \right. . \end{aligned}$$
(11)

For the usual \(F\) statistic it is straightforward to obtain (with \(SSR=\sum _{t=2}^T {\widehat{\varepsilon }_t}^2\))

$$\begin{aligned} \frac{T-m}{m \, SSR} \ \sum \limits _{j=1}^m \ \frac{\left( \sum \limits _{t = T_j +1}^{T_{j+1}} x_t x_{t-1}^* \right) ^2}{\sum \limits _{t = T_j +1}^{T_{j+1}} \left( x_{t-1}^* \right) ^2} = \frac{LM}{m} + o_p(1), \end{aligned}$$

since \((T_{j+1}-T_j)^{-1} \sum \limits _{t=T_j +1}^{T_{j+1}} \left( x_{t-1}^* \right) ^2\) converges to \(\sigma ^2 \pi ^2 /6\) under Assumption 1.

Breitung and Hassler (2002) consider testing for the parameter value \(d\) (assuming a priori that there is no break) within

$$\begin{aligned} x_t=\widehat{\phi }x_{t-1}^*+\widehat{\varepsilon }_t, \quad t=2,\dots ,T. \end{aligned}$$
(12)

We now merge regressions (10) and (12) (and will argue in Sect. 3.3 that this robustifies against a misspecification of \(d\)):

$$\begin{aligned} x_t=\widehat{\phi }x_{t-1}^* + \sum _{j=1}^m \widehat{\psi }_j \, x_{t-1}^*\, D_{t}(\lambda _j^0) + \widehat{\varepsilon }_t, \quad t=2,\ldots ,T. \end{aligned}$$
(13)

A break in fractional integration is indicated by means of the usual \(F\) statistic \(F(\lambda _1^0, \ldots , \lambda _m^0)\) from (13) testing for the null

$$\begin{aligned} H_0: \, \psi _1 = \cdots = \psi _m = 0. \end{aligned}$$
(14)
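
A minimal implementation of regression (13) and the corresponding \(F\) statistic could look as follows. This is a Python sketch for the iid case without lag augmentation; the function names are ours, and the series is assumed to have been demeaned beforehand as discussed in Remark 1 below.

```python
import numpy as np

def break_F(x, fractions):
    """Chow-type F statistic from regression (13) for H0: psi_1 = ... = psi_m = 0,
    given hypothesised break fractions 0 < lambda_1 < ... < lambda_m < 1.
    Sketch for the iid case: no intercept (x is assumed demeaned, cf. Remark 1)
    and no lag augmentation."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    xstar = np.zeros(T)                      # xstar[t] plays the role of x*_{t-1} for x[t]
    for t in range(1, T):
        i = np.arange(1, t + 1)
        xstar[t] = np.sum(x[t - i] / i)      # weighted sum of past values, cf. (9)
    cuts = [int(lam * T) for lam in fractions] + [T]
    s = np.arange(T)
    cols = [xstar]                           # regressor attached to phi
    for j in range(len(fractions)):          # regime dummies D_t(lambda_j), cf. (11)
        D = ((s >= cuts[j]) & (s < cuts[j + 1])).astype(float)
        cols.append(xstar * D)
    X = np.column_stack(cols)[1:]            # regression runs over t = 2, ..., T
    y = x[1:]
    m = len(fractions)
    beta_u = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr_u = np.sum((y - X @ beta_u) ** 2)
    beta_r = np.linalg.lstsq(X[:, :1], y, rcond=None)[0]   # restricted model: only phi
    ssr_r = np.sum((y - X[:, :1] @ beta_r) ** 2)
    return ((ssr_r - ssr_u) / m) / (ssr_u / (len(y) - m - 1))
```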

The following result can be established, where “\(\overset{d}{\longrightarrow }\)” stands for convergence in distribution.

Proposition 2

Under the assumptions of Proposition 1 and Assumption 1 it follows for the estimators from (13) that

$$\begin{aligned} \sqrt{T} \, \left( \widehat{\phi }, \widehat{\psi }_1, \ldots , \widehat{\psi }_m \right) ^\prime \ \overset{d}{\longrightarrow } \ \mathcal N _{m+1} \left( 0, \Sigma \right) \end{aligned}$$

as \(T \rightarrow \infty \), where \(\Sigma \) has full rank and is given in the Appendix. Hence,

$$\begin{aligned} m \, F(\lambda _1^0, \ldots , \lambda _m^0) \overset{d}{\longrightarrow } \chi ^2 (m), \end{aligned}$$

where \(\chi ^2 (m)\) denotes a chi-squared distribution with \(m\) degrees of freedom.

Proof

See Appendix. \(\square \)

Remark 1

In practice, the variables entering (13) will have a mean different from zero that has to be accounted for. Deterministic components have to be extracted prior to the regression, see e.g. Robinson (1994), such that \(\{x_t\}\) can be considered as a zero mean variable.

3.2 Extensions

Assumption 2 is too restrictive for practical purposes and can be relaxed considerably. We indicate generalizations without going into technical details and omit formal proof, as our test statistic is related to statistics handled in the papers referenced below. A valid set of conditions replacing Assumption 2 is now adopted from Hassler et al. (2009).

Assumption 3

Let \(\{e_t\}\) from (3) be a stable autoregressive process of order \(p\),

$$\begin{aligned} e_t = \sum _{i=1}^p a_i e_{t-i} + \varepsilon _t, \end{aligned}$$

driven by a strictly stationary and ergodic martingale difference series \( \{ \varepsilon _t\}\) with variance \(\sigma ^2\) satisfying an eighth-order cumulant condition.

Let us briefly comment on generalizations going beyond the previous section (Assumption 2).

First, Assumption 3 relaxes the assumption of independence and instead assumes lack of correlation, maintaining that the innovations form a martingale difference series. In case of conditional homoskedasticity, \( \text{ E }(\varepsilon _t^2| \varepsilon _{t-1}, \varepsilon _{t-2},\ldots ) = \sigma ^2 \), the asymptotic results of the previous section will not change, see, for example, Robinson (1991). In case of conditional heteroskedasticity, however, it is necessary to employ Eicker–White standard errors as advocated by Demetrescu et al. (2008). With such robustified standard errors the limiting distribution remains unchanged. More generally, this even holds true for unconditional heteroskedasticity of very general form, \(\text{ E } (\varepsilon _t^2) = \sigma ^2_t\), where the variance process allows for smooth shifts as well as sudden breaks, see Kew and Harris (2009).
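
As an illustration of how such a robustification can be implemented, the sketch below computes a Wald statistic with an Eicker–White (HC0) covariance matrix for a generic OLS design matrix. This is our own generic sketch, not the paper's exact implementation, and HC0 is just one common variant of the White-type estimators.

```python
import numpy as np

def robust_wald(X, y, test_cols):
    """Wald statistic for H0: beta_j = 0 for all j in test_cols, computed with an
    Eicker-White (HC0) covariance matrix; to be compared with chi-square(len(test_cols))
    quantiles. A generic sketch, not the paper's exact implementation."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta                                    # OLS residuals
    meat = X.T @ (X * (u ** 2)[:, None])                # sum_t u_t^2 x_t x_t'
    V = XtX_inv @ meat @ XtX_inv                        # HC0 covariance of beta-hat
    q = len(test_cols)
    R = np.zeros((q, X.shape[1]))
    R[np.arange(q), test_cols] = 1.0
    r = R @ beta
    return float(r @ np.linalg.solve(R @ V @ R.T, r))
```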

Second, upon fractional differencing one often observes additional short memory correlation in \(\{e_t\}\). To account for autocorrelation, we follow Demetrescu et al. (2008) and augment the test regression with lagged endogenous variables,

$$\begin{aligned} x_t=\widehat{\phi }x_{t-1}^*+ \sum _{j=1}^m \widehat{\psi }_j \, x_{t-1}^*\, D_{t}(\lambda _j^0)+\sum _{i=1}^p \widehat{a}_i\,x_{t-i}+\widehat{\varepsilon }_t. \end{aligned}$$
(15)

In fact, Demetrescu et al. (2008) allow for more general processes \(\{e_t\}\) than in Assumption 3. Their assumptions accommodate many short-memory AR \((\infty )\) processes that can be approximated with growing \(p\). Since the regressors in (15) are not orthogonal, Demetrescu et al. (2008) advise against data-driven lag-length selection, as the model selection step affects subsequent inference about \(\psi _j\) even asymptotically, see, for example, Leeb and Pötscher (2005). Instead, they advocate choosing the lag length \(p\) in (15) by deterministically following the rule of thumb

$$\begin{aligned} p = \left[ 4 (T/100)^{1/4} \right] , \end{aligned}$$
(16)

which was originally proposed by Schwert (1989). Although it lacks optimality properties it is widely used in applied econometrics. Demetrescu et al. (2011) collected further experimental support for its usefulness in practice in that it balances the trade-off between power and control of size under \(H_0\).
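
For concreteness, the rule (16) and the construction of the regressor matrix for (15) might be coded as follows. This is a sketch; the function names and the handling of the first \(p+1\) observations are our choices.

```python
import numpy as np

def schwert_lags(T):
    """Deterministic rule of thumb (16): p = [4 (T/100)^(1/4)]."""
    return int(4 * (T / 100) ** 0.25)

def augmented_design(x, dummies, p):
    """Regressor matrix for the lag-augmented regression (15): x*_{t-1}, its interactions
    with the regime dummies and p lags of x_t; the first p + 1 observations are dropped.
    'dummies' is a (T, m) array of 0/1 regime indicators (sketch, names are ours)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    xstar = np.zeros(T)
    for t in range(1, T):
        i = np.arange(1, t + 1)
        xstar[t] = np.sum(x[t - i] / i)
    rows = np.arange(p + 1, T)
    cols = [xstar[rows]]
    cols += [xstar[rows] * dummies[rows, j] for j in range(dummies.shape[1])]
    cols += [x[rows - i] for i in range(1, p + 1)]
    return np.column_stack(cols), x[rows]
```

For instance, schwert_lags(500) returns 5 and schwert_lags(509) returns 6, which are the lag lengths used in Sect. 4.2 and Sect. 5.2, respectively.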

Third, the starting value condition in Assumption 2 is not crucial. Note that the sequence of regressors \(\{x_{t-1}^*\}\) is only asymptotically stationary. Without zero starting values, the stationary, non-observable counterpart is \(x_{t-1}^{**} = \sum _{j=1}^\infty j^{-1} \Delta ^d y_{t-j}\) with \(y_t\) being from (3) under \(H_0\). The difference between \(x_{t-1}^{**}\) and \(x_{t-1}^{*}\) becomes negligible with growing sample size, as already stressed by Demetrescu et al. (2008) and more recently by Hassler et al. (2009).

3.3 Misspecification of \(d\)

When testing for (5), Propositions 1 and 2 assume the true \(d\) to be known a priori, which will rarely be the case in practice. Often, practitioners will estimate the unknown differencing parameter before testing for a break, which will result in fractional misspecification when computing the differences. Therefore, we now consider the model

$$\begin{aligned} \Delta ^{d+\delta } y_t = e_t, \quad t=1,\ldots , T, \end{aligned}$$
(17)

where \(\delta \ne 0\) is the degree of misspecification. Consequently, the differences \(\{x_t\}\) from (7) building on \(\Delta ^d\) are not \(I(0)\) but rather \( I(\delta )\), and hence serially correlated and therefore correlated with \( \{x_{t-1}^*\}\). It is then easy to show for \(\widehat{\psi }_j\) from (10) that \(\widehat{\psi }_j \nrightarrow 0\), and that the \(LM\) statistic diverges as \(T\) increases. To compensate for this effect we propose combining regression (10) with the original proposal of Breitung and Hassler (2002). The latter test, building on (12), is consistent, and a violation of the specified order of integration (\(\delta \ne 0\)) will be captured by \(\widehat{\phi }\rightarrow \text{ E } (x_t x_{t-1}^*)\). This motivates the regression (13) instead of (10). Admittedly, under the more realistic null model (17) instead of (1) the asymptotic distribution of the estimators from (13) is not obvious. Although local power results (for \( \delta =c / \sqrt{T}\)) are available from Tanaka (1999, Theorem 3.1) or Demetrescu et al. (2008, Proposition 3), it is not clear how they generalize for a fixed \(\delta \). Still, we conjecture in Remark 2 that the approximation in Proposition 2 remains a valid guideline under \(\delta \ne 0\), too.

Remark 2

Let us assume the null model (17) with \(\delta \ne 0\), while the test regression is computed with \(\{x_t\}\) from (7) relying on \(\Delta ^d\). We then expect that estimates \(\widehat{\phi }\) significantly different from \(0\) in (13) or (15) will account for (at least moderate) misspecification \(\delta \ne 0\), such that \( \chi ^2(m)\) provides a valuable approximation for the multiple of the \(F\) statistic under the null of no break in fractional integration.

To back Remark 2, we report results from a simulation exercise for the case \(m=1\), which corresponds to a classical Chow test applied to the regression (13) or the lag-augmented version (15). We simulate time series with \(T=500\) observations and test for a break at the assumed break fraction \(\lambda _1 = 0.5\). The data are simulated with standard normal iid innovations \(e_t = \varepsilon _t\) entering (17) with \(d=0\) and no break, such that the observables are integrated of order \(\delta \). The parameter \(\delta \) measures the degree of misspecification. All experiments rely on 1,000 replications. We computed the size (at nominal 1, 5, and 10 % level) of the \(F\) test \(F(0.5)\) from Proposition 2, i.e., from regression (13) without lags. Figure 1 shows experimental sizes for the range of \(-0.4 \le \delta \le 0.4\) and \(\delta =\pm 1\). They tend to be below the nominal ones except for \(\delta =1\), where we observe size distortions which would be unacceptable in practice. The size properties distinctly improve when working with the lag-augmented regression. This is not surprising since the lags capture some of the serial correlation of \(\{x_t\}\) stemming from misspecification. Figure 2 displays the rejection rates of \(F(0.5)\) from (15) when compared with quantiles from \(\chi ^2 (1)\). Even for a misspecification as strong as \( \delta =1\) the size distortion is negligible. Hence, Fig. 2 soundly supports our conjecture in Remark 2 that a (moderate) misspecification due to the estimation of \(d\) in practice leaves the test valid.
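
The structure of this exercise can be replicated along the following lines, reusing psi_coefficients() and break_F() from the sketches above; the chi-square critical value, the seed and the loop design are our own choices.

```python
import numpy as np
from scipy.stats import chi2

def size_experiment(delta, T=500, reps=1000, level=0.05, seed=1):
    """Empirical size of F(0.5) under misspecification delta (cf. Fig. 1): y_t is
    simulated as a type-II I(delta) series and d = 0 is imposed when differencing,
    so that x_t = y_t. Reuses psi_coefficients() and break_F() from the earlier sketches."""
    rng = np.random.default_rng(seed)
    crit = chi2.ppf(1 - level, df=1)          # m * F is compared with chi-square(m), here m = 1
    rejections = 0
    for _ in range(reps):
        e = rng.standard_normal(T)
        psi = psi_coefficients(delta, T)
        y = np.array([psi[:t + 1][::-1] @ e[:t + 1] for t in range(T)])
        if break_F(y, [0.5]) > crit:
            rejections += 1
    return rejections / reps
```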

Fig. 1 Rejection rates from (13) plotted against \( \delta \) from (17)

Fig. 2 Rejection rates from (15) plotted against \( \delta \) from (17)

4 Tests with unknown break points

Let us now turn to the interesting situation where the timing of potential breaks in long memory is not known a priori. First, we adopt the tools of Bai and Perron (1998) to determine the number of breaks and to test for their significance. We then investigate their finite sample behavior in our context through Monte Carlo experimentation.

4.1 Implementation

We retain the regression Eq. (15), except that the break fractions are now not known but varied over the sample. To underline this difference, we write

$$\begin{aligned} x_t=\widehat{\phi }x_{t-1}^*+ \sum _{j=1}^m \widehat{\psi }_j \, x_{t-1}^*\, D_{t}(\lambda _j)+\sum _{i=1}^p \widehat{a}_i\,x_{t-i}+\widehat{\varepsilon }_t, \end{aligned}$$
(18)

where the step dummies \(D_t(\lambda _j)\) are defined as in (11) but with \(\lambda _j\) and hence \(T_j = [\lambda _j T]\) varying. Under Assumption 3 all variables are (asymptotically) stationary, and the stage is set to perform a multiple change analysis along the lines of Bai and Perron (1998).

On top of the model and Assumption 1 concerning the true break fractions, we now assume that in the empirical application each sample segment has a minimal length determined by a trimming parameter \(\epsilon >0\):

$$\begin{aligned} \frac{T_j - T_{j-1}}{T} \ge \epsilon , \quad j=1, \ldots , m+1. \end{aligned}$$

The limiting distributions depend on the trimming, and Bai and Perron (1998) provide critical values for \(\epsilon = 0.05\). Bai and Perron (2003a), however, recommend the usage of \(\epsilon = 0.15\) in order to have better size properties in finite samples. For the rest of the paper we will work with \(\epsilon = 0.15\) relying on corresponding critical values from response surface regressions by Bai and Perron (2003b).Footnote 4 \(F\) statistics \(F(\lambda _1, \ldots , \lambda _m)\) testing for \(\psi _1= \cdots = \psi _m = 0\) from (18) are computed for all possible break points subject to

$$\begin{aligned} \Lambda _\epsilon = \{ (\lambda _1,\ldots ,\lambda _m): \, |\lambda _{j}-\lambda _{j-1}| \ge \epsilon , \ j=1,\ldots ,m+1 \}, \quad \lambda _0=0, \, \lambda _{m+1}=1. \end{aligned}$$

The maximum across all \(F\) statistics is called \(sup F(m)\),

$$\begin{aligned} sup F(m) = \max _{\Lambda _\epsilon } \left( F(\lambda _1, \ldots , \lambda _m) \right) . \end{aligned}$$

It can easily be determined by a grid search for moderate sample sizes and small \(m\). For large values of \(m\), Bai and Perron (2003a) recommend the principle of dynamic programming. Critical values are available up to \(m=9\). For \(m=1\), this corresponds to a max-Chow test in line with Andrews (1993). The candidate break points are the arguments maximizing \(F(\lambda _1, \ldots , \lambda _m)\) (or, equivalently, minimizing the sum of squared residuals from (18)):

$$\begin{aligned} (\widehat{\lambda }_1, \ldots , \widehat{\lambda }_m ) =\arg \max _{\Lambda _\epsilon } \left( F(\lambda _1, \ldots , \lambda _m)\right) . \end{aligned}$$

In many cases, we do not want to fix the number (\(m\)) of potential breaks a priori. We would prefer to determine \(m\) from the data. To this end, Bai and Perron (1998) suggest a so-called double maximum test which we do not investigate here. Instead, we adopt their third proposal to test the null hypothesis of \(\ell \) breaks against the alternative of \( \ell +1\) changes building on a test statistic \(sup F (\ell +1|\ell )\). To determine the number of breaks, Bai and Perron (2003a) advocate a sequence of tests, 0 vs. 1, 1 vs. 2, and so on; the sequence stops at the first \(\ell \) for which \(sup F (\ell +1|\ell )\) is not significant, and the number of breaks is then determined as \( m=\ell \). Obviously, \(sup F (1|0)= sup F(1)\). In general, the statistic \(sup F (\ell +1|\ell )\) is computed in the following way: determine the break points assuming \(\ell \) breaks, \((\widehat{\lambda }_1,\ldots ,\widehat{\lambda } _\ell )\). For each of the \(\ell +1\) segments, determine the \(F\) statistic testing for \(m=1\) break at unknown time in segment \(j\), say \(sup F_j (1)\). If the overall maximal value, \(\max _{j=1,\ldots ,\ell +1} sup F_j(1)\), is sufficiently large, then the null of \(\ell \) breaks is rejected in favor of \( \ell +1\) breaks. The critical values are again available from Bai and Perron (1998, 2003b).
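
A compact sketch of the grid search for \(sup F(1)\) and of one step of the sequential procedure is given below. It reuses break_F() from Sect. 3; the function names, the grid step and in particular the segment-wise search in sup_F2_given_1() are our own simplifications of the exact Bai–Perron computation.

```python
import numpy as np

def sup_F1(x, eps=0.15, step=0.01):
    """sup F(1): maximise F(lambda_1) over the trimmed grid [eps, 1 - eps].
    Reuses break_F() from Sect. 3 (no lag augmentation); illustration only."""
    grid = np.arange(eps, 1 - eps + 1e-9, step)
    stats = [break_F(x, [lam]) for lam in grid]
    j = int(np.argmax(stats))
    return stats[j], float(grid[j])           # (sup F(1), estimated break fraction)

def sup_F2_given_1(x, eps=0.15):
    """sup F(2|1): given the estimated single break, search for one further break within
    each of the two segments and take the larger statistic. A deliberate simplification
    of the exact Bai-Perron computation."""
    _, lam1 = sup_F1(x, eps)
    T1 = int(lam1 * len(x))
    s1, _ = sup_F1(x[:T1], eps)
    s2, _ = sup_F1(x[T1:], eps)
    return max(s1, s2)
```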

4.2 Monte Carlo evidence

For this section, we simulated time series with \(T=500\) observations, based on standard normal iid innovations \(\{\varepsilon _t\}\). The true data generating process is from (6). We computed the size (at nominal 1, 5, and 10 % level) and power. All the rejection frequencies rely on 1,000 replications. In (6) we choose \(d=0\) without loss of generality and vary \(|\theta |\) within the experiments between 0 and 1.

In a preliminary analysis we investigate the effect of a misspecified order of integration when testing for an unknown break point, analogously to the experiment leading to Fig. 2 in Sect. 3.3. For this purpose we evaluated with \(m=1\) the maximum \(sup F(1)\) from (18) with \(p=5\) lags under (17). The resulting empirical sizes resemble the ones in Fig. 2 and are presented in Fig. 3. Hence, the issue of a misspecified order of integration does not seem to be of major concern, which is of course good news for applied purposes.

Fig. 3 Rejection rates from (18) plotted against \( \delta \) from (17)

For all further simulations the true order of integration is not assumed to be known but estimated, which parallels the situation in real life. We use the so-called exact local Whittle [ELW] estimator proposed by Shimotsu and Phillips (2005). With the estimated \(\widehat{d}\), the differences \(\{x_t\}\) are constructed. \(F\) statistics are from regression (18) with \(p=5\) (according to (16)).

4.2.1 Case of one break

First, we focus on the situation where the data-generating process [DGP] has \(m=1\) change. Table 1 shows the empirical size and power of the \(sup F(1)\) test, the mean of the estimated break fractions, their standard deviation and their root mean squared error for different values of \( \theta _1=\theta \) in (3). Figure 4 visualizes the power and the size of the test for \(|\theta _1|=|\theta | \le 0.4\) and \(|\theta | = 1\). The unknown break date lies in the middle of the sample, \( \lambda _1^0 = 0.5\).

Table 1 Rejection of the null hypothesis of \(sup F(1)\) for different values of \(\theta \)
Fig. 4 \(sup F(1)\) plotted against \(\theta \), \(m=1\)

The simulation results in Table 1 correspond to expectations. The larger the difference in the order of integration before and after the break, the easier the break is detected and correctly allocated. In other words, the larger \(\theta \) is in absolute terms, the higher the rejection rate and the smaller the RMSE\((\hat{\lambda }_1)\). Overall, the performance of our test in a finite sample is satisfactory. The size of the test is good: close to 1, 5, 10 % at the corresponding significance levels. The power is extremely high if the difference in the long-memory parameter before and after the break is greater than 0.3 in absolute terms. Even if the difference is only \(\pm 0.2\), the power is still high. Figure 4 depicts the symmetry of the rejection rates with respect to \(\theta \) around zero.

Next, we investigate the performance of the \(sup F(1)\) test in the light of a number of variations in the simulation set-up. In the left-hand graph of Fig. 5, the 5 %-rejection rates are plotted against \(\theta \) for different values of the true unknown break fraction: \(\lambda _1^0 \in {\{0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8\}}\). It is remarkable how well breaks are detected where there are only 150 observations before or after the break if \(\theta >0.2\), that is for \( \lambda _1^0 = 0.2\) and \(\lambda _1^0=1- 0.2\). Where \(\theta \le 0.2\), and \( \lambda _1^0 = 0.2\) or \(\lambda _1^0=1- 0.2\), the power is low. For all other cases, the power is high and the RMSE\((\hat{\lambda }_1)\) (not reported here) are comparable to those reported in Table 1.Footnote 5

Fig. 5 \(sup F(1)\): different break fractions and MA(1), nominal level of 5 %

The right-hand graph in Fig. 5 contains the 5 %-rejection rates plotted against \(\theta \) for three different moving average parameters. To allow for short memory in the time series \(\{e_t\}\) from (3), we consider an MA(1) process,

$$\begin{aligned} e_t=\varepsilon _t + b \, \varepsilon _{t-1}. \end{aligned}$$

The MA(1) coefficient \(b\) takes on the values 0.00 (white noise), 0.50 and 0.75. Due to the lagged variables included in regression (18), the size and power of the \(sup F(1)\) test are hardly affected, see Fig. 5.

Figure 6 shows the power of the test for different sample sizes, \(T\in \{250, 500, 1000, 2000\}\). Unsurprisingly, the power decreases as the sample size decreases. For \(T=250\), the test is only of limited use, but for larger \(T\) it has good and even excellent power properties.

Fig. 6 \(sup F(1)\): different sample sizes, nominal level of 5 %

Next, we present a number of rejection frequencies of \(sup F(2)\) testing against two breaks where the true DGP only has one change. The results from Fig. 7 can be compared with \(sup F(1)\) from Fig. 4. In particular, at the 10 % level we observe that \(sup F(2) \) is mildly conservative under the null hypothesis. Consequently, it displays less power than \(sup F(1)\), which is not surprising since \(sup F(1)\) specifies the number of (potential) breaks correctly. Further, we present results for sequential testing under one break in Fig. 8. The left-hand graph contains rejection frequencies for \(sup F(1|0)\), which coincide of course with \(sup F(1)\) from Fig. 4. The right-hand graph in Fig. 8 shows the empirical sizes of \(sup F(2|1)\) at conventional levels. Given one break, \(sup F(2|1)\) tends to be mildly conservative; only in case of \(\theta =1\) are the experimental sizes above the nominal ones.

Fig. 7 \(sup F(2)\) plotted against \(\theta \), \(m=1\)

Fig. 8 \(sup F(1|0)\) and \(sup F(2|1)\) plotted against \(\theta \), \(m=1\)

We briefly summarize the findings for \(m=1\). The power of the \(sup F(1)\) test depends especially on the difference in the order of integration before and after the break. If the difference is larger than 0.3, the power is very good. Furthermore, the power is almost unaffected by variations in the true break fraction or the value of the moving average coefficient in the case of MA(1) short memory. The power of the test is good for samples with at least 500 observations. Its size properties are quite satisfactory throughout all the simulation set-ups in that the power does not come at the price of a too liberal test.

Testing against two breaks we observe that \(sup F(2)\) and \(sup F(2|1)\) are both mildly conservative in that the experimental size tends to be smaller than the nominal one.

4.2.2 Case of two breaks

The only difference from the previous experiments is that the DGP in this section has two breaks (\(m=2\)). The true break fractions are \(\lambda _1^0=1/3 \) and \(\lambda _2^0=2/3\). We consider the following scheme of breaks

$$\begin{aligned} d_1= 0, \ d_2=\theta , \ d_3= 0, \end{aligned}$$

which means in the notation of (4): \(\theta _1 = \theta \) and \( \theta _2=0\).

The size and power results of \(sup F(1)\) are given in the left-hand graph of Fig. 9. Clearly, the power curve is not as steep as in Fig. 4 because in the present DGP the second break returns the order of integration to its original level.

The size and power of \(sup F(2|1)\) and \(sup F(2)\) are depicted in Figures 9 (right-hand graph) and 10, respectively. While the size is very similar, we observe that \(sup F(2)\) outperforms \(sup F(2|1)\) in terms of power.

Fig. 9 \(sup F(1|0)\) and \(sup F(2|1)\) plotted against \(\theta \), \(m=2\)

Fig. 10 \(sup F(2)\) plotted against \(\theta \), \(m=2\)

5 U.S. inflation

We use the monthly U.S. consumer price index (CPI) collected by the Organization for Economic Cooperation and Development. The sample runs from January 1966 until June 2008, yielding 509 observations. Inflation is computed as the annualized monthly change in CPI: \(p_t = 1,200(\log (CPI_t)-\log (CPI_{t-1}))\).
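
In code, this transformation amounts to a one-liner (sketch; the series name cpi is ours):

```python
import numpy as np
import pandas as pd

def inflation_rate(cpi: pd.Series) -> pd.Series:
    """Annualised monthly inflation p_t = 1200 * (log CPI_t - log CPI_{t-1});
    'cpi' is assumed to be a monthly pandas Series (the name is ours)."""
    return 1200 * np.log(cpi).diff().dropna()
```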

5.1 Preliminary analysis

It has been argued that long memory may be spurious and caused by breaks in the mean or by regime shifts. In particular, Lobato and Savin (1998) raised the question of whether the long memory in inflation is due to deterministic shifts. See also Sibbertsen (2004) for a corresponding survey paper. In order to avoid any confusion between mean shifts and long memory, we allow for a shift in the overall mean while seasonally demeaning at the same time. The demeaned inflation rate becomes

$$\begin{aligned} y_t=\left\{ \begin{array}{ll} p_t -\hat{\mu }_1(\tau _0) - seas_t, &{} t=1,\ldots , [\tau _0 \,T] \\ p_t -\hat{\mu }_2(\tau _0) - seas_t, &{} t=[\tau _0 \,T]+1, \ldots , T \end{array} \right. \end{aligned}$$

where \(\tau _0\) is the unknown, potential break fraction, \(\hat{\mu }_1(\tau _0) \) (\(\hat{\mu }_2(\tau _0)\)) is the estimated mean before (after) the break point and \(seas_t\) is the effect of seasonality.Footnote 6 In order to find \(\tau _0\), we adopt an approach developed by Hsu (2005) who modified the local Whittle [LW] estimator for \(d\), discussed by Robinson (1995). In the same way we modify the more refined exact local Whittle [ELW] estimator by Shimotsu and Phillips (2005). In a grid search over \(\tau \in [0.15, 0.85]\), \(d\) is estimated while accounting for a mean shift and seasonality at the same time. The modified criterion function is

$$\begin{aligned} R(d; \tau ) = \log (G(d;\tau ))-\frac{2d}{B}\sum \limits _{i=1}^{B}\log ( \lambda _i) \end{aligned}$$

with

$$\begin{aligned} G(d;\tau ) = \frac{1}{B} \sum _{i=1}^{B}I_{\Delta ^{d} y}(\lambda _{i}; \tau ), \end{aligned}$$
(19)

where \(\lambda _i\) are harmonic frequencies \(\lambda _i = 2 \pi i /T\), \( i=1,\ldots ,B\), and the bandwidth \(B\) is usually chosen according to

$$\begin{aligned} B= T^\alpha , \quad 0.5 < \alpha < 0.8. \end{aligned}$$

Further, \(I_{\Delta ^{d} y}(\lambda _{i}; \tau )\) denotes the periodogram evaluated from \(\Delta ^{d} y_t\) for a given mean shift fraction \(\tau \). Denote the conditional ELW estimator obtained for given \(\tau \) in a first minimization as \(\widehat{d}(\tau )\), while a second optimization step is necessary to find the change-point estimator \(\widehat{\tau }\):

$$\begin{aligned} \widehat{\tau } = \arg \min _{\tau \, \in \,[0.15,0.85]} R(\widehat{d} (\tau );\tau ). \end{aligned}$$

The modified ELW estimator for the memory parameter \(d\) is \(\widehat{d}( \widehat{\tau })\). Since the estimator \(\widehat{\tau }\) converges to the true normalized change point \(\tau _0\) (see Lavielle and Ludeña 2000), Hsu (2005) argues that the limiting distribution is not affected. From Shimotsu and Phillips (2005) we conclude

$$\begin{aligned} 2\sqrt{B}\,(\widehat{d}(\widehat{\tau })-d) \overset{d}{\rightarrow } \mathcal N (0,1), \end{aligned}$$
(20)

which allows us to compute approximate confidence intervals.
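
A stylized version of this two-step grid search is sketched below. It reuses frac_diff() from Sect. 2, omits the seasonal adjustment for brevity, and the grids, step sizes and brute-force optimization over \(d\) are our own choices; the sketch is meant to convey the structure of criterion (19), not to reproduce the exact estimator.

```python
import numpy as np

def elw_criterion(y, d, B):
    """Whittle-type criterion R(d) built on the periodogram of the truncated fractional
    differences, cf. (19); reuses frac_diff() from the sketch in Sect. 2."""
    T = len(y)
    x = frac_diff(y, d)
    lam = 2 * np.pi * np.arange(1, B + 1) / T                  # harmonic frequencies
    I = np.abs(np.fft.fft(x)[1:B + 1]) ** 2 / (2 * np.pi * T)  # periodogram at lambda_1..lambda_B
    return np.log(I.mean()) - 2 * d / B * np.sum(np.log(lam))

def elw_with_mean_shift(p, alpha=0.70,
                        d_grid=np.arange(-0.2, 0.81, 0.01),
                        tau_grid=np.arange(0.15, 0.851, 0.01)):
    """Hsu-type modification: joint grid search over the mean-shift fraction tau and d.
    Seasonal demeaning is omitted and the grids are our own choices; brute force only
    (in practice one would optimise over d numerically for each tau)."""
    p = np.asarray(p, dtype=float)
    T = len(p)
    B = int(T ** alpha)
    best = (np.inf, None, None)
    for tau in tau_grid:
        T1 = int(tau * T)
        y = p.copy()
        y[:T1] -= y[:T1].mean()          # demean before the candidate break
        y[T1:] -= y[T1:].mean()          # demean after the candidate break
        for d in d_grid:
            R = elw_criterion(y, d, B)
            if R < best[0]:
                best = (R, float(d), float(tau))
    return best[1], best[2]              # (d-hat, tau-hat)
```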

Next, we wish to test whether the mean shift is significant, \(H_0: \mu _1=\mu _2\), using a test statistic proposed by Hidalgo and Robinson (1996):

$$\begin{aligned} HR=T^{d-0.5}\frac{\hat{\mu }_1(\tau _0)-\hat{\mu }_2(\tau _0)}{\sqrt{\Omega }} \sim \mathcal N (0,1), \end{aligned}$$

where \(\Omega \) depends on \(G(d;\tau )\). To obtain a feasible version of the test statistic, the unknown parameters are replaced by the estimators \( \widehat{\tau }\) and \(\widehat{d} (\widehat{\tau })\).

We repeat the empirical analysis for different values of the bandwidth: \(B \in \{T^{0.60},T^{0.65},T^{0.70},T^{0.75}\}\). The candidate for the break fraction \(\tau _0\) lies in the interval 1981/8 to 1982/7, depending on the bandwidth \(B\), see Table 2. For all choices of bandwidth, the mean shift is clearly significant according to the \(HR\) statistic. The timing of the mean shift roughly coincides with the break date found by Hsu (2005) and is consistent with previous literature. Among others, Meltzer (2006) and Stock (2001) describe the level of inflation as high in the 1970s and early 1980s and low afterwards.

Table 2 Estimation of the order of integration

Table 2 reports the estimates of the order of integration. Needless to say, the appropriate choice of \(B\) is of crucial importance. If \( B\) is chosen too small, the estimate has a large standard deviation and might be imprecise. By contrast, choosing \(B\) too large results in a bias due to short memory components. Our estimate of \(d\) seems to stabilize for \( B=T^{0.65}\) and \(B=T^{0.70}\), while the estimate for \(B=T^{0.75}\) seems to exhibit a small downward bias. Therefore, the choice \(B=T^{0.70}\) maximizes the bandwidth without inducing a bias: the results corresponding to this choice of \(B\) are highlighted below. The order of integration of inflation over the whole sample period is 0.35, with \(B=T^{0.70} \), implying that inflation is stationary.

We investigate whether there is a second break in the mean. To this end, we proceeded sequentially, subtracting the first mean-shift from the series and searching for a second mean-shift.Footnote 7 The second break is insignificant, even at the 10 % significance level. For this reason, we only account for one shift in the mean. In Fig. 11, we plot inflation adjusted for seasonal means and illustrate the mean shift. In order to obtain our variable of interest, \(\pi _t\), we then additionally adjusted for the mean shift.

Fig. 11 Monthly U.S. inflation—seasonally demeaned, illustrating mean shift

Next, we visually investigate whether there has been a change in variance by inspecting the rolling standard deviations \(s_t(\pi )\) of inflation \( \pi _t\) (5-year window), depicted in Fig. 12. We observe that the eighties were characterized by a reduction in volatility. To account for this variance heterogeneity, we report Eicker–White standard errors in the next section as advocated by Demetrescu et al. (2008) and Kew and Harris (2009).

Fig. 12 Rolling standard deviations for \(\pi _t\) (window of 5 years)

5.2 Testing against changes in inflation persistence

We now turn to the estimation of a change in persistence in U.S. inflation rates. As a first step, we apply the difference filter to the adjusted inflation rates (\(\pi _t\)):

$$\begin{aligned} x_t=(1-L)^{\hat{d}} \pi _t, \end{aligned}$$

where \(\hat{d}=0.35\) is the estimated order of integration of the whole sample as reported in Table 2. Note that the precise value of \(d\) used for differencing is not of major importance since we observed a considerable robustness with respect to misspecification, see Remark 2. Next, we estimate regression (18) with \(m=1\) using \(p=\left[ 4(509/100)^{1/4}\right] =6\) lags, and compute a sequence of \(F\) statistics, \(F(\lambda _1)\), see Fig. 13. Their maximum values, \(sup F(1)\), are clearly significant, irrespective of whether \( F(\lambda _1)\) is computed using usual or Eicker–White standard errors. Both versions of the test detect the break in October 1973.

Fig. 13 \(F(\lambda _1)\) (usual and Eicker–White standard errors) with critical values for a search in the interval 15–85 % of the observations

Similarly, we observe that \(sup F(2)\) is significant at the 1 % level: the critical value is 9.36, while \(sup F(2)\) takes on the values 14.86 and 10.14 with usual standard errors and with Eicker–White robustified standard errors, respectively. Again, the first break is found in October 1973, while the second one is located in March 1980. Note that \(sup F(1)\) is larger than \(sup F(2)\), suggesting that there is only one break.Footnote 8

To verify whether there is a second change in persistence or not, we apply the \(sup F(2|1)\) test. In Fig. 14 we present \(F_1(1)\) and \( F_2(1)\) computed for the segments before and after October 1973, respectively. The maximum thereof, \(sup F(2|1)\), found in June 1980, is below 8.51 and hence not significant at the 10 % level, irrespective of whether robust Eicker–White standard errors are used or not.

Fig. 14 \(F_1(1)\) and \(F_2(1)\) (OLS and Eicker–White standard errors) with critical values in search for a second break

As a robustness check, we test for a change in persistence of inflation without accounting for mean shifts. The results of these tests are similar and we come to the same conclusion. There is a break taking place in October 1973 with \(sup F(1)=46.56\) and 40.31, for OLS and Eicker–White standard errors, respectively. There is no evidence for a significant second break: \(sup F_1(1)\) \(=\) 0.78 and 1.11 (with the potential break point being June 1967) and \( sup F_2(1) \) \(=\) 6.58 and 4.58 (with the potential break point being April 1986), for OLS and Eicker–White standard errors, respectively.

One virtue of our approach is that it can find a change in \(d\) even with as few as 150 observations before or after the break.Footnote 9 On this account, we were able to detect an early break in persistence taking place in 1973. Moreover, we can deduce the direction of the change in persistence from the sign of \(\widehat{\psi }\). A positive coefficient indicates an increase in persistence after a break, while a negative coefficient indicates a decrease, where the dummy variable \(D_t(\lambda )\) is defined as in (11).Footnote 10 In our estimation, \(\widehat{\psi }\) is positive, leading us to the conclusion that inflation persistence has increased since 1973. Naturally, we would like to know the order of integration before and after the break. However, the short time period does not allow us to reliably estimate the order of integration before the break.Footnote 11

The order of integration after 1973/10 can be estimated more reliably. The point estimate is 0.27 with a 90 % confidence interval of \([0.15, 0.39]\), for \(B=T^{0.70}\). It is worth noting that this confidence interval overlaps with the confidence interval of \(d\) estimated over the whole sample. This is not surprising as the rather long second subsample starting in 1973/10 dominates the estimation results obtained for the whole sample.

To sum up, we conclude that inflation persistence increased after 1973 and stayed constant thereafter. The estimate of \(d\) is about 0.27 after the break. By looking at the confidence intervals, we come to the conclusion that inflation neither has short memory (\(d \le 0\)) nor is nonstationary (\( d \ge 0.5\)). In addition to the break in persistence, we have evidence for a break in the mean and a trending behavior of the variance.

6 Concluding remarks

We proposed new tests against breaks in the order of fractional integration, which are built on the change-test methodology applied to the lag-augmented LM regression by Demetrescu et al. (2008). The procedures are sup \(F\) tests, specifically following Andrews (1993) in the case of one potential break and more generally following Bai and Perron (1998). In particular, the latter authors allow for a sequence of tests to determine the unknown number of changes. Monte Carlo simulations indicate that the power of the tests essentially depends on the size of the changes. Breaks relatively close to the end or beginning of the sample can be detected with remarkable reliability. Not knowing the true order of integration and working with estimated values does not affect the performance of the tests.

Using the new tools, we investigate whether inflation persistence, i.e. the order of integration of inflation, in the U.S. has changed. In order to forestall spuriously high orders of integration, we adjust inflation rates by accounting for a shift in the mean where the break point is determined endogenously. Testing adjusted inflation, we find an increase in its persistence in October 1973. A second potential break in March 1980 is not significant at the 10 % level. This result does not change if we do not account for a mean shift.

Many studies measure inflation persistence as the largest autoregressive root [LARR] or as the sum of autoregressive coefficients [SARC]. Those measures cannot discriminate between different degrees of long-run persistence, see Kumar and Okimoto (2007) and Gadea and Mayoral (2006). Therefore, it is not surprising that most of these studies do not find evidence for a break in persistence. In contrast, most studies using the order of integration as a measure of persistence, which in a wider sense also includes the studies of Cogley and Sargent (2001) and Cogley and Sargent (2005), find time-varying persistence. The studies find breaks taking place in the early 1970s, the early 1980s and/or the early 1990s. Employing Eicker–White standard errors, our tests are robust to the apparent time-varying inflation volatility (see, for example, Stock and Watson (2007) or Pivetta and Reis (2007) for evidence). We are led to the conclusion that there is only one change in persistence and this took place in 1973. This break date coincides with the end of the Bretton Woods system, a sharp increase in oil prices and the start of an episode of high inflation.Footnote 12 Breaks in the eighties, documented in the literature, might be attributed to mean shifts or the decrease in inflation volatility.