1 Introduction

Exchange rates have long fascinated, challenged and puzzled researchers in international finance. Since the seminal papers by Meese and Rogoff (1983a, b), there has been wide agreement that macroeconomic models are not very helpful for exchange rate forecasting. The exchange rate literature provides, however, at least two reasons for cautious optimism.

First, the dismal forecasting performance of exchange rate models can to some extent be explained by estimation error and not only by misspecification error (Engel et al. 2008). The significant role of estimation error is confirmed, among other things, by the relatively good forecasting performance of economic models estimated with large panels of data (Mark and Sul 2001; Engel et al. 2008; Ince 2014) or long time series (Lothian and Taylor 1996). A number of tests have also been developed to take into account the sampling variance of estimated models, which would otherwise unduly give the RW a head start in a forecasting horse race (see Engel 2013, for a review).

The second reason for being cautiously optimistic about the usefulness of exchange rate models comes from the evidence in favor of the PPP model. According to Taylor and Taylor (2004), the exchange rate literature has turned full circle to the pre-1970s view that PPP holds in the long run. The mean reverting nature of real exchange rates has in particular found some support from panel unit root tests, which have higher power than the conventional univariate tests in the case of highly persistent or non-linear processes (Sarno and Taylor 2002; Holmes et al. 2012). Only a handful of studies have instead tested more directly whether the mean-reverting properties of the real exchange rate can be exploited in a forecasting setting. In the late 1980s, Meese and Rogoff (1988) extended their classic analysis to reach the conclusion that, like nominal exchange rates, real exchange rates are disconnected from economic fundamentals. Two studies in the mid-1990s argued instead that the RW can be beaten with larger datasets, for example in the case of long data series (Lothian and Taylor 1996) or in multivariate frameworks (Jorion and Sweeney 1996). More recently, Rogoff (2009) suggested that it is worth investigating whether PPP deviations and current account positions may help predict real exchange rate movements.

In this paper we aim to establish if Rogoff’s insight on PPP is correct. We will show that it is not enough to assume that real exchange rates revert to their mean, but it is also necessary to assume that the reversion pace is rather slow. This was first established in a series of studies conducted between the mid-1980s and early 1990s, which employed more than a hundred years of annual data. From an informal meta–analysis of these studies, Rogoff (1996) inferred that it takes between 3 and 5 years to halve real exchange rate deviation from the mean. More recent studies have been more skeptical about what is typically dubbed as “Rogoff consensus”. For example, Kilian and Zha (2002) proposed a prior probability distribution based on a survey of professional international economists and derived a posterior probability distribution of the half-life on the basis of a Bayesian autoregressive model. Their results provide very limited support for the view that the half-life is between three and five years. In a similar vein Murray and Papell (2002) stressed how univariate methods provide virtually no information on the size of the half-lives. Finally, a large cross-country heterogeneity in terms of point estimates and confidence intervals has also been found in the studies by Murray and Papell (2005) and Rossi (2005). There is however another strand of the literature which finds it instead plausible that at the aggregate level half-lives would be in the range between 3 and 5 years. They argue that aggregation bias both at the time and product dimension helps reconcile this high duration of the adjustment process with faster convergence at the product and sectoral level (Imbs et al. 2005; Crucini and Shintani 2008; Mayoral and Dolores Gadea 2011; Bergin et al. 2013).

In this paper we analyse whether mean-reversion of real exchange rates can be exploited for forecasting purposes. We test whether a calibrated, half-life PPP (henceforth, HL) model, which imposes a very gradual linear adjustment of the real exchange rate toward its mean, is able to forecast real exchange rates better than would be the case with (i) the same autoregressive (henceforth, AR) model, where the pace of mean-reversion is instead estimated rather than imposed and with (ii) a RW model. In our baseline the HL model is calibrated so that half of the adjustment of the real exchange rate toward its mean is completed within five years. We have set this initial value, rather conservatively, i.e. at the top of the range proposed by Rogoff (1996), so that in the short-run the model predictions resemble quite closely those of the RW model. As Rogoff’s consensus was built on the basis of the data from the pre-1990s and our forecast evaluation sample starts in 1990, this choice is not influenced by the data that are used to assess the quality of forecasts. We shall later extend our analysis with a thorough sensitivity analysis to show that, contrary to our initial expectations, all the key results hold true for the whole range of half-lives between 3 and 5 years proposed by Rogoff.

The key findings of our paper are as follows. We show that, exploiting simultaneously the evidence of (i) real exchange rate persistence and (ii) long-term convergence to PPP, leads to a considerable improvement in our ability to forecast real exchange rates even for short samples. To be more specific, we show that the HL model is able to forecast real exchange rates better than the RW for seven out of nine currencies. Particularly persuasive is that this simple approach beats the RW also at short-horizons.

Another remarkable result of our study is that the forecast accuracy of the estimated AR model is considerably worse than that of the RW. We will explain this result both analytically and empirically, emphasizing that it is to be ascribed to the large impact of the estimation error, even if we have as many as 15 years of monthly data. Our empirical investigation is then taken a final step forward: we find that the mean reverting nature of the real exchange rate can also be exploited to outperform the RW model in forecasting nominal effective exchange rates. This is because, for the majority of the currencies in our sample, real exchange rates revert to the mean mainly via nominal exchange rate rather than relative price adjustments. Finally, we will show that the results also hold in bilateral and not just effective terms.

The success of the HL model can be explained by Diebold’s shrinkage principle, which asserts that imposing restrictions on model parameters may be helpful for out-of-sample forecasts, even if the restrictions are false. There is also an interesting parallelism between our work and that of Faust and Wright (2013) on inflation forecasting. These authors show that a minimalistic two-step forecasting strategy proves to be very effective out of sample. The first step requires finding a good long-term forecast, which could be underpinned by economic theory, survey data, etc. The second step consists of connecting the current and long-term value of inflation with a smooth path, giving up any attempt to explain a more complex dynamic of adjustment. Our proposed HL model is an exact application of this strategy, as we postulate a linear adjustment between the current level of the real exchange rate and a good long-term forecast, i.e. PPP.

The rest of the paper is structured as follows. Sections 2 and 3 outline the alternative models that we use in the real exchange rate forecasting horse race and present the outcome of the competition. Sections 4 and 5 provide an analytical investigation that sheds some light on our findings and contain a more general discussion on why the calibrated HL model is so competitive in terms of real exchange rate forecasting. Section 6 shows that the results are very robust to several alternative specifications. We also illustrate that our key results are valid for a broad range of half-lives (i.e. even wider than the range proposed by Rogoff). We conclude by showing that our improved ability to forecast real exchange rates is helpful also in the context of nominal exchange rate forecasting. The results are also valid for bilateral exchange rates of eight currencies vis-à-vis the US dollar. The last section concludes.

2 The Models

Let us define the log of the real exchange rate (RER) according to the convention that \(y_{t}\equiv s_{t} + p_{t} - {p}_{t}^{\ast }\), where \(s_{t}\) is the log of the nominal exchange rate expressed as the foreign currency price of a unit of domestic currency, and \(p_{t}\) and \({p}_{t}^{\ast }\) are the logs of home and foreign price levels, respectively.

Consider a simple autoregression model for the real exchange rate:

$$ y_{t}-\mu = \rho(y_{t-1}-\mu)+\epsilon_{t}, \epsilon_{t} \sim \mathcal{N}(0,\sigma^{2}). $$
(1)

For a stationary AR process the parameter ρ measures the speed of reversion to μ, which we interpret as the level of PPP. The half-life of adjustment toward PPP is equal to:

$$ hl = \log(0.5)/\log(\rho). $$
(2)

For ρ=1 the RER is generated by the RW process.
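As a small illustration of Eq. 2, the sketch below converts between the half-life and ρ. The function names are hypothetical, and the half-life is assumed to be measured in the same periods as the data (months here, so five years corresponds to 60 periods).

```python
import math


def rho_from_half_life(half_life_periods: float) -> float:
    """Invert Eq. 2: rho = 0.5 ** (1 / hl), with hl in data periods."""
    return 0.5 ** (1.0 / half_life_periods)


def half_life_from_rho(rho: float) -> float:
    """Eq. 2: hl = log(0.5) / log(rho), defined for 0 < rho < 1."""
    return math.log(0.5) / math.log(rho)


# Monthly data: a five-year half-life is 60 periods.
rho_5y = rho_from_half_life(60)
print(round(rho_5y, 4))  # 0.9885
```

A monthly ρ this close to one shows why a model with a five-year half-life behaves much like the RW at short horizons.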

In the forecasting contest we employ three alternatives of model (1).

  1. The first is a RW model, for which we calibrate ρ=1 and μ=0. The h-step-ahead forecast is:

    $$ {y}_{T+h|T}^{RW} = y_{T}. $$
    (3)

  2. The second is the HL model, for which we assume that the real exchange rate gradually converges to its sample mean (\(\bar {\mu }\)). The pace of convergence (\(\bar {\rho }\)) is calibrated with Eq. 2 so that the half-life is equal to five years, i.e. at the top of the range proposed by Rogoff. The h-step-ahead forecast is:

    $$ {y}_{T+h|T}^{HL} = \bar{\mu} + \bar{\rho}^{h}(y_{T}-\bar{\mu}). $$
    (4)

  3. The third is an autoregressive (AR) model, whose two parameters, \(\hat {\mu }\) and \(\hat {\rho }\), are estimated with OLS, so that the h-step-ahead forecast is:

    $$ {y}_{T+h|T}^{AR} = \hat{\mu} + \hat{\rho}^{h}(y_{T}-\hat{\mu}). $$
    (5)
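The three forecasting rules in Eqs. 3 to 5 can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and recovering \(\hat{\mu}\) as \(\hat{\delta}/(1-\hat{\rho})\) from an OLS regression of \(y_{t}\) on a constant and \(y_{t-1}\) is an assumption about how the AR parameters are obtained.

```python
def forecast_rw(y_T, h):
    """Eq. 3: the random walk forecast is the last observed value."""
    return y_T


def forecast_hl(y_T, h, mu_bar, half_life=60):
    """Eq. 4: calibrated half-life model (half_life in data periods)."""
    rho_bar = 0.5 ** (1.0 / half_life)
    return mu_bar + rho_bar ** h * (y_T - mu_bar)


def ols_ar1(y):
    """OLS fit of y_t = delta + rho * y_{t-1} + e_t; returns (mu_hat, rho_hat).

    Assumption: mu_hat is recovered as delta_hat / (1 - rho_hat),
    which is undefined if rho_hat equals one.
    """
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    rho_hat = sxz / sxx
    delta_hat = mz - rho_hat * mx
    return delta_hat / (1.0 - rho_hat), rho_hat


def forecast_ar(y_T, h, mu_hat, rho_hat):
    """Eq. 5: estimated AR(1) forecast."""
    return mu_hat + rho_hat ** h * (y_T - mu_hat)
```

With a 60-month half-life, `forecast_hl(y_T, 60, mu_bar)` lies exactly halfway between `y_T` and `mu_bar`, which is the defining property of the calibration.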

3 Empirical Evidence

To assess the predictability of real exchange rates we gather monthly data for nine major currencies of the following economies: Australia (AUD), Canada (CAD), euro area (EUR), Japan (JPY), Mexico (MXN), New Zealand (NZD), Switzerland (CHF), the United Kingdom (GBP) and the United States (USD) for the period between 1975:1 and 2012:3. For all currencies we take the real effective exchange rates provided by the Bank for International Settlements (Klau and Fung 2006). The values of the analyzed series are presented in Fig. 1.

Fig. 1 Real exchange rates (2010=100)

We assess the out-of-sample forecast performance for horizons ranging from one to sixty months ahead. In our baseline specification the models are estimated using rolling samples of 15 years (R=180 months). The first set of forecasts is elaborated with the rolling sample 1975:1-1989:12 for the period 1990:1-1994:12. This procedure is repeated with rolling samples ending in each month from the period 1990:2-2012:2. Since the data available end in 2012:3, the 1-month-ahead forecasts are evaluated on the basis of 267 observations, 2-month-ahead forecasts on the basis of 266 observations, and 60-month-ahead forecasts on the basis of 208 observations.
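The observation counts quoted above can be checked by enumerating forecast origins. The sketch below assumes that the origins run monthly from 1989:12 (the end of the first rolling sample) to 2012:2, and that a forecast is evaluated only when its target month is no later than 2012:3; the helper names are hypothetical.

```python
def month_index(year, month, base_year=1975):
    """Months elapsed since January of base_year (1975:1 -> 0)."""
    return (year - base_year) * 12 + (month - 1)


FIRST_ORIGIN = month_index(1989, 12)  # end of the first 180-month window
LAST_ORIGIN = month_index(2012, 2)    # end of the last rolling window
LAST_OBS = month_index(2012, 3)       # last available observation


def n_evaluated(h):
    """Number of h-month-ahead forecasts whose target is observed."""
    return sum(
        1
        for origin in range(FIRST_ORIGIN, LAST_ORIGIN + 1)
        if origin + h <= LAST_OBS
    )


print([n_evaluated(h) for h in (1, 2, 60)])  # [267, 266, 208]
```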

We measure the forecasting performance of the three competing models with three statistics: the mean squared forecast errors (MSFEs), the correlation coefficient between forecast and realized real exchange rate changes, and the frequency of correct predictions in terms of directional change. Table 1 and Fig. 2 present the values of MSFEs. For the RW we report the MSFEs in levels. For the HL and AR models, we report them divided by the MSFEs of the RW, so that values below unity indicate that the model outperforms the RW. We also test the null of equal forecast accuracy with the two-sided Diebold and Mariano (1995) test (corrected for heteroscedasticity and autocorrelation) to assess whether there is significant evidence that the model under- or over-performs compared to the RW.
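The relative MSFE reported for the HL and AR models can be computed as in the sketch below. It covers only the ratio, not the Diebold and Mariano test, and the function names are hypothetical; the RW forecast is taken to be the value observed at the forecast origin, as in Eq. 3.

```python
def msfe(forecasts, actuals):
    """Mean squared forecast error over paired forecasts and realizations."""
    squared = [(a - f) ** 2 for f, a in zip(forecasts, actuals)]
    return sum(squared) / len(squared)


def relative_msfe(model_forecasts, origin_values, actuals):
    """MSFE of a model divided by the MSFE of the random walk.

    origin_values holds y_T for each forecast, which is also the RW forecast.
    Values below one mean that the model beats the RW.
    """
    return msfe(model_forecasts, actuals) / msfe(origin_values, actuals)


# Toy numbers, for illustration only (not taken from the paper).
ratio = relative_msfe([1.5, 3.0], [1.0, 2.0], [2.0, 4.0])
print(ratio)  # 0.25
```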

Table 1 Mean squared forecast errors (15Y rolling window)
Fig. 2 Mean squared forecast errors. Notes: Each line represents the ratio of MSFE from a given method to MSFE from the random walk, where values below unity indicate better accuracy of point forecasts. The straight and dotted lines stand for AR and HL5, respectively. The forecast horizon is expressed in months

In terms of the MSFE criterion the HL model-based forecasts beat the RW for seven out of nine currencies (EUR, MXN, NZD, CHF, GBP, USD, JPY). The MSFEs of the HL model are on average 9 and 23 % lower than those of the RW model at the two- and five-year horizons, respectively. The HL model-based forecasts are also considerably more precise than those based on the AR model for five currencies (CAD, EUR, JPY, GBP and USD), while they are broadly comparable for the other four. At short horizons the HL model is clearly the best and the AR model the worst. At the one-year horizon the MSFEs of the HL model are on average 3 and 12 % lower than those from the RW and AR models, respectively.

Further evidence that the HL model beats the other two models can be found using our second criterion, which consists in computing the correlation coefficient between the realized and forecast changes of real exchange rates:

$$ r_{M,h} = cor\left( {y}_{T+h|T}^{M} - y_{T}, y_{T+h} - y_{T}\right), $$
(6)

where M stands for the model name. Note that Eq. 3 implies that \(r_{RW,h}\) is zero: for that reason in Table 2 we report only the results for the HL and AR models. The table shows that the correlation coefficients for the HL model are generally positive for all currencies at all horizons, except for the AUD. The average value of \(r_{HL,h}\) also increases with the forecast horizon: from just 0.04 for one-month-ahead forecasts to 0.53 for five-year-ahead forecasts. The results do not instead provide support for the AR model: MXN is the only currency with a positive \(r_{AR,h}\) throughout the forecast horizon. Moreover, the average value of \(r_{AR,h}\) is positive only for horizons above two years. Finally, at all horizons \(r_{AR,h}\) is visibly lower than \(r_{HL,h}\).
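Eq. 6 is a plain Pearson correlation between forecast and realized changes. In the sketch below the function name is hypothetical. Returning 0.0 when one of the series has no variation (as for the RW, whose forecast change is always zero) is a convention adopted here to match the statement above, since the sample correlation is strictly undefined in that case.

```python
import math


def change_correlation(forecasts, origin_values, actuals):
    """Eq. 6: correlation between forecast and realized changes."""
    fc = [f - y for f, y in zip(forecasts, origin_values)]
    rc = [a - y for a, y in zip(actuals, origin_values)]
    n = len(fc)
    mean_f, mean_r = sum(fc) / n, sum(rc) / n
    cov = sum((f - mean_f) * (r - mean_r) for f, r in zip(fc, rc))
    var_f = sum((f - mean_f) ** 2 for f in fc)
    var_r = sum((r - mean_r) ** 2 for r in rc)
    if var_f == 0 or var_r == 0:
        # Convention (assumption): no variation means zero correlation.
        return 0.0
    return cov / math.sqrt(var_f * var_r)
```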

Table 2 Correlation of forecast and realized changes of real exchange rates

We also compare how well the investigated models predict the sign of the change in real exchange rates. Table 3 reports the frequency with which the HL and AR models predict the correct direction of change in the real exchange rate. The RW is by construction agnostic about directional change. The results reveal that at short horizons neither the AR nor the HL model captures directional change systematically better than a simple toss of a coin. At medium-term horizons, however, the HL model anticipates the direction of change in the real exchange rate between 60 and 80 % of the time for seven out of nine currencies, which is somewhat better than in the case of the AR model. Using the Pesaran and Timmermann (1992) Chi-squared test we also establish that at the five-year horizon the null of independence between the projected real exchange rate changes (on the basis of the HL model) and the realized changes is rejected at the 1 % significance level for all currencies except the Canadian dollar.
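The directional statistic can be sketched as a simple hit rate. The function name is hypothetical, and skipping cases where either the predicted or the realized change is exactly zero is an assumption made here; the Pesaran and Timmermann test itself is not implemented.

```python
def sign_hit_rate(forecasts, origin_values, actuals):
    """Share of forecasts whose predicted direction matches the realized one."""
    hits, counted = 0, 0
    for f, y, a in zip(forecasts, origin_values, actuals):
        predicted, realized = f - y, a - y
        if predicted == 0 or realized == 0:
            continue  # assumption: no directional call, so not counted
        counted += 1
        if (predicted > 0) == (realized > 0):
            hits += 1
    return hits / counted if counted else float("nan")
```

A value near 0.5 corresponds to the coin-toss benchmark mentioned above.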

Table 3 Frequency of correct sign forecast for the real exchange rates

To sum up, the evidence suggests that real exchange rates of major currencies tend to be mean reverting and forecastable, as shown by the good performance of the calibrated HL model. At short horizons the HL model performs well compared to the AR model and slightly better than the RW. Over medium term forecasting horizons, however, the HL model strongly outperforms the RW exactly because it captures the tendency of the real exchange rate to be mean reverting. In the next section we provide an analytical explanation of why the estimated AR model instead performs so poorly.

4 Analytical Interpretation of the Results

In what follows we show analytically why the finite sample determines a sizable estimation error, which distorts the results in favor of the RW model even when the rolling estimation window covers several years of monthly data. Let us assume that the data generating process (DGP) for \(y_{t}\) is given by Eq. 1 so that the unbiased and efficient forecast is:

$$ y_{T+h|T} = \mu + \rho^{h}(y_{T}-\mu), $$
(7)

and the variance of the forecast error:

$$ E\left\{\left( y_{T+h}-y_{T+h|T}\right)^{2}\right\} = \sigma^{2}\frac{1-\rho^{2h}}{1-\rho^{2}}. $$
(8)

If the DGP is known the only source of forecast errors comes from the random term. The variance of forecast errors generated by our three competing models is however higher than that in Eq. 8 because the coefficients μ and ρ are unknown and have to be estimated or calibrated.

Let us decompose the variance of the forecast error from a generic model \(M \in \{RW, HL, AR\}\) into three components:

$$ \begin{array}{lll} E\left\{\left( y_{T+h} - {y}_{T+h|T}^{M}\right)^{2}\right\} &=& E\left\{\left( y_{T+h}-y_{T+h|T}\right)^{2}\right\} + E\left\{\left( y_{T+h|T} - {y}_{T+h|T}^{M}\right)^{2}\right\} \\ && + 2E\left\{\left( y_{T+h}-y_{T+h|T}\right)\left( y_{T+h|T} - {y}_{T+h|T}^{M}\right)\right\}. \end{array} $$
(9)

The first component, which is given by Eq. 8, represents the random error that is common to all models. The second component captures the error due to model misspecification. The third component is equal to zero as future shocks are not forecastable. The key component that determines the different performance of our three competing models is therefore the second one. It is particularly advantageous that we can derive such component analytically for all three models.

In the case of the RW model the forecast error equals:

$$ y_{T+h|T} - {y}_{T+h|T}^{RW} = \left( \rho^{h}-1\right)\left( y_{T}-\mu\right) $$
(10)

and thus:

$$ E\left\{\left( y_{T+h|T}-{y}_{T+h|T}^{RW}\right)^{2}\right\} = \left( \rho^{h}-1\right)^{2} \times E\left\{\left( y_{T}-\mu\right)^{2}\right\}, $$
(11)

where:

$$E\left\{(y_{T}-\mu)^{2}\right\} = \frac{\sigma^{2}}{1-\rho^{2}}. $$

For the HL model, such error is equal instead to:

$$ y_{T+h|T}-{y}_{T+h|T}^{HL} = \left( \rho^{h} - \bar{\rho}^{h}\right)(y_{T}-\mu) - \left( 1-\bar{\rho}^{h}\right)(\bar{\mu} - \mu). $$
(12)

The first term describes the forecast error caused by the wrong calibration of parameter ρ and the second one is the error related to the estimation of μ. The resulting variance is:

$$ \begin{array}{lll} E\left\{\left( y_{T+h|T} - {y}_{T+h|T}^{HL}\right)^{2}\right\} &=& \left( \rho^{h} - \bar{\rho}^{h}\right)^{2} \times E\left\{\left( y_{T}-\mu\right)^{2}\right\} + \left( 1-\bar{\rho}^{h}\right)^{2}\\ && \times E\left\{\left( \bar{\mu}-\mu\right)^{2}\right\} - 2\left( \rho^{h} - \bar{\rho}^{h}\right)\left( 1-\bar{\rho}^{h}\right)\\ && \times E\left\{\left( y_{T}-\mu\right)\left( \bar{\mu}-\mu\right)\right\}, \end{array} $$
(13)

where:

$$\begin{array}{@{}rcl@{}} E\left\{\left( \bar{\mu}-\mu\right)^{2}\right\} &=& \frac{\sigma^{2}}{1-\rho^{2}} \times \frac{1}{R^{2}} \times \left( R + 2 \sum\limits_{j=1}^{R-1}(R-j)\rho^{j}\right)\\ E\left\{\left( y_{T}-\mu\right)\left(\bar{\mu}-\mu\right)\right\} &=& \frac{\sigma^{2}}{1-\rho^{2}} \times \frac{1}{R} \times \frac{1-\rho^{R}}{1-\rho} . \end{array} $$

Finally, as derived in Fuller and Hasza (1980), for the AR model the second component is approximately equal to:

$$ E\left\{\left( y_{T+h|T} - {y}_{T+h|T}^{AR}\right)^{2}\right\} \simeq \sigma^{2}\times \frac{1}{R}\times \left[h^{2}\rho^{2(h-1)} + \left( \frac{1-\rho^{h}}{1-\rho}\right)^{2}\right] $$
(14)

and is entirely caused by the estimation error.

Given Eqs. 8–14, the assumptions for the DGP coefficients (μ, ρ and σ) and the sample size (R), one can calculate the theoretical value of MSFE for all competing models (RW, HL and AR) at different forecast horizons (h=1,2,…,H). The theoretical MSFEs of all models do not depend on the value of μ and are proportional to the value of \(\sigma^{2}\). The relative MSFEs depend hence only on the convergence coefficient ρ, the sample size R and the forecast horizon h.
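The theoretical MSFEs can be assembled directly from the formulas above. The sketch below is one possible transcription: the function name and the dictionary it returns are hypothetical, σ² is normalized to one by default, and the AR term uses the approximation in Eq. 14.

```python
def theoretical_msfe(rho, rho_bar, R, h, sigma2=1.0):
    """Theoretical MSFEs of the RW, HL and AR models (Eqs. 8, 11, 13, 14).

    rho: true persistence of the DGP (0 < rho < 1)
    rho_bar: persistence imposed in the HL model
    R: estimation window length, h: forecast horizon
    """
    gamma0 = sigma2 / (1.0 - rho ** 2)                 # E{(y_T - mu)^2}
    common = sigma2 * (1.0 - rho ** (2 * h)) / (1.0 - rho ** 2)  # Eq. 8

    rw_extra = (rho ** h - 1.0) ** 2 * gamma0          # Eq. 11

    # Moments of the sample mean used in Eq. 13.
    var_mean = gamma0 / R ** 2 * (
        R + 2.0 * sum((R - j) * rho ** j for j in range(1, R))
    )
    cov_mean = gamma0 / R * (1.0 - rho ** R) / (1.0 - rho)
    a = rho ** h - rho_bar ** h
    b = 1.0 - rho_bar ** h
    hl_extra = a * a * gamma0 + b * b * var_mean - 2.0 * a * b * cov_mean

    # Eq. 14 (approximation): estimation error of the AR model.
    ar_extra = sigma2 / R * (
        h ** 2 * rho ** (2 * (h - 1))
        + ((1.0 - rho ** h) / (1.0 - rho)) ** 2
    )
    return {
        "RW": common + rw_extra,
        "HL": common + hl_extra,
        "AR": common + ar_extra,
    }


# Example: DGP and HL model both with a 60-month half-life, R = 180, h = 60.
rho_true = 0.5 ** (1.0 / 60)
msfes = theoretical_msfe(rho_true, rho_true, R=180, h=60)
ratios = {name: value / msfes["RW"] for name, value in msfes.items()}
```

As a consistency check, Eqs. 8 and 11 together imply that the RW entry equals \(2\sigma^{2}(1-\rho^{h})/(1-\rho^{2})\).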

Let us now consider values of ρ corresponding to DGPs where the underlying half-life parameter varies from one to ten years. We also postulate the same sample size and forecast horizons as in the empirical application from Section 3. The results are presented in Fig. 3, where the theoretical MSFEs of a given model are shown as a ratio with respect to the RW model.

Fig. 3 Theoretical mean squared forecast errors. Notes: Each line represents the ratio of MSFE from a given method to MSFE from the random walk, where values below unity indicate better accuracy of point forecasts. The straight and dotted lines stand for AR and HL, respectively. The forecast horizon is expressed in months

The analytical results depend on the half-life of the underlying DGP. For half-lives above one year, the HL model beats the AR model; for values below 10 years it also beats the RW. This means that for a very wide range of half-lives, between 1 and 10 years, the HL model beats its competitors. For values higher than three years the AR model also loses to the RW model, as the error from estimating an autoregressive process is more severe than the misspecification error from assuming a RW.

The bottom line is that in most univariate applications, unless the sample is very long, the AR model is likely to produce very imprecise forecasts. It is hence much preferable to simply employ a reasonably calibrated HL model, which assumes a gradual mean reversion to the sample mean.

5 Additional Remarks on the HL Model

While not particularly appealing from the theoretical perspective, apart from being consistent with long-run PPP, the HL model performs extremely well in the forecasting competition. What can explain this success? The shrinkage principle tells us that applying restrictions to economic models may be helpful out of sample even if they deteriorate their in-sample fit. Moreover, studies based on Monte Carlo techniques (e.g. Gilbert 1995), as well as our analytical derivations discussed in Section 4, show that the knowledge of the true structure of the DGP is often not particularly useful for predictive purposes if the parameter values are unknown. This is essentially the reason why the estimated AR model is beaten by the calibrated HL framework. A helpful way of thinking about this is to recall George Box’s (1979) famous remark that “essentially, all models are wrong, but some are useful”.

From the results of our estimated autoregressive model it is straightforward to understand why a most likely false restriction (that the half-life of adjustment is constant and equal to five years) has improved our forecasting performance. Table 4 reports rolling descriptive statistics for the estimated AR model. The first row presents the share of estimated half-lives in Rogoff’s range of 3 to 5 years, which is in general small. The second row broadens the range to between 1 and 10 years. The third row provides the fraction of cases for which the AR model produces explosive forecasts. The remaining rows report the mean, standard deviation, median and different quantiles of the estimated half-lives. These statistics illustrate the high degree of estimation error and why calibration turns out to be the winning strategy out of sample.

Table 4 Rolling descriptive statistics; half-lives expressed in monthly terms

This strategy is remarkably close to that proposed by Faust and Wright (2013) for inflation forecasting. These authors have shown that assuming a smooth path of adjustment, between the last inflation reading and a good long-term forecast, results in a projection that is hard to beat with more sophisticated econometric approaches. This confirms Box’s (1979) intuition that a well-chosen parsimonious model tends to beat methods that give too much weight to in-sample fluctuations.

6 Sensitivity Analysis

In this section we will show that the HL model turns out to be the most competitive model also when we change the forecast setting in our baseline. We shall then exploit its good performance and extend the analysis to nominal exchange rate forecasting.

6.1 Rolling Window Length

We begin by analyzing whether a change in the length of the rolling window has an impact on our findings. A longer rolling window should, in theory, increase the accuracy of the HL and AR models, as implied by Eqs. 13 and 14. In the case of the HL model, a longer rolling window helps the modeler to determine with more precision the PPP level. In the case of the AR model, a longer window also helps one to better determine the degree of real exchange rate persistence. A longer rolling window may, on the other hand, be potentially counterproductive, if we relax the assumption that we had in the analytical section that the equilibrium value of the real exchange rate is time-invariant (see Rossi 2006 for a discussion on the importance of parameter instability). As shown by Tables 5 and 6, for most currencies in our sample this latter effect seems to play a lesser role, considering that both the HL and AR models tend to become more competitive for longer rolling windows. For a 20 year rolling window as well as in the case of recursive estimation, the HL models outperform the RW model for 8 out of 9 currencies at almost all horizons. For shorter samples, as in the case of a 10 year rolling window, the HL model continues to generally beat the RW (but this is no longer the case for the US dollar). The AR model instead generates, as expected, inaccurate forecasts, which confirms that the estimation error is the main source of the weak performance of the AR model. To sum up, for the currencies in our sample a rolling window of at least 15–20 years represents a good choice.

Table 5 Mean squared forecast errors (10Y rolling window)
Table 6 Mean squared forecast errors (20Y rolling window)

6.2 Prior on the Half-Life Parameter

A Bayesian autoregressive process may potentially outperform the HL models. To establish this, we set the mean-reversion parameter ρ as prior information rather than simply just impose it as we had done in the calibrated version of the model. To assess the implication of this choice let us consider a Bayesian autoregressive model (BAR), along the line suggested by Kilian and Zha (2002). We use the standard Minnesota setting for vector autoregressions to elicit our prior on the degree of PPP persistence. In particular, we write down the model (1) in the standard AR form:

$$ y_{t} = \delta + \rho y_{t-1} + \epsilon_{t}, $$
(15)

where δ=(1−ρ)μ. The prior for α=[δ ρ] is assumed to be \(\mathcal {N}(\underline {\alpha }, \underline {V})\) with \(\underline {\alpha } = [(1-\bar {\rho })\bar {\mu } \; \bar {\rho }]^{\prime }\) and \(\underline {V}=diag(\lambda \sigma ^{2},\lambda )\), where σ is the residual standard error from the AR model, \(\bar {\rho }\) is the mean-reversion parameter calibrated so that the half-life is five years and λ is the overall tightness hyperparameter. The expected value of the posterior is:

$$\overline{\alpha}=\overline{V}\left( \underline{V}^{-1}\underline{\alpha} + \sigma^{-2}X^{\prime}X \hat{\alpha}\right), $$

where \(\hat {\alpha }\) is the LS estimate of α, X is the observation matrix and \(\overline {V}=(\underline {V}^{-1} + \sigma ^{-2}X^{\prime }X)^{-1}\). The parameter λ has a simple intuitive interpretation: it allows us to choose an intermediate solution between the calibrated and the estimated autoregressive solution. The two corner solutions are the calibrated solution (for \(\lambda =0\), \(\overline {\alpha }\) collapses to \(\underline {\alpha }\)) and the estimated solution (for \(\lambda \rightarrow \infty\), \(\overline {\alpha }\) equals \(\hat {\alpha }\)).
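The role of λ can be illustrated with a small pure-Python sketch of the posterior mean under a normal prior with known error variance. The function names are hypothetical, σ² is taken to be the OLS residual variance, and λ must be strictly positive here, so the calibrated corner is approached as λ tends to zero rather than evaluated at zero.

```python
def ols_ar1(y):
    """LS fit of y_t = delta + rho * y_{t-1} + e_t.

    Returns (delta_hat, rho_hat, sigma2_hat), with sigma2_hat the residual
    variance using n - 2 degrees of freedom.
    """
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    rho = sxz / sxx
    delta = mz - rho * mx
    rss = sum((zi - delta - rho * xi) ** 2 for xi, zi in zip(x, z))
    return delta, rho, rss / (n - 2)


def bar_posterior_mean(y, mu_bar, rho_bar, lam):
    """Posterior mean of (delta, rho) with prior N(alpha_prior, V).

    V = diag(lam * sigma2, lam) and alpha_prior = ((1 - rho_bar) * mu_bar,
    rho_bar). Requires lam > 0.
    """
    _, _, sigma2 = ols_ar1(y)
    x, z = y[:-1], y[1:]
    n = len(x)
    # Elements of X'X and X'y for X = [1, y_{t-1}]; X'X alpha_hat = X'y.
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    sz, sxz = sum(z), sum(xi * zi for xi, zi in zip(x, z))
    # Prior precision (inverse of V) and prior mean.
    p_delta, p_rho = 1.0 / (lam * sigma2), 1.0 / lam
    prior_delta, prior_rho = (1.0 - rho_bar) * mu_bar, rho_bar
    # Posterior precision matrix and right-hand side.
    a11 = p_delta + n / sigma2
    a12 = sx / sigma2
    a22 = p_rho + sxx / sigma2
    b1 = p_delta * prior_delta + sz / sigma2
    b2 = p_rho * prior_rho + sxz / sigma2
    # Solve the 2x2 system by Cramer's rule.
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det
```

A very small λ returns values close to the calibrated coefficients, while a very large λ returns values close to the LS estimates, which mirrors the two corner solutions described above.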

We report in Table 7 the ratios between the MSFEs from the Bayesian autoregressive model (reported as BAR in the table) and the MSFEs from the RW model for λ equal to 0, 0.1 and ∞. For the intermediate case λ=0.1 such ratios are typically higher than those corresponding to the HL model and lower than those corresponding to the AR model. In other words, the relative MSFEs tend in general to increase monotonically with λ. The best solution is therefore to set λ=0, i.e. the calibrated solution.

Table 7 Mean squared forecast errors (MSFEs) – BAR model

For the one-month horizon we also provide a graphical illustration of this result for values of λ ranging on a continuous scale between zero and infinity (see Fig. 4). On the vertical axis the values of MSFE are normalized so that MSFE is equal to 100 for λ=0, which corresponds to the case of the calibrated HL model. For six currencies (EUR, JPY, NZD, CHF, GBP, USD) the relationship is indeed increasing and monotonic, i.e. the more weight one gives to the estimated parameters, the worse is the forecasting performance of the Bayesian autoregressive model. For one currency (MXN), the estimated model performs the best. Only for two currencies (AUD and CAD) and very specific ranges of λ do we find additional gains from using a Bayesian autoregressive model.

Fig. 4 Sensitivity analysis of MSFE on the λ (forecast horizon: 1 month). Notes: Each line represents the ratio of MSFE from a given method to MSFE from the HL5 multiplied by 100, where values below 100 indicate better accuracy of point forecasts. The straight, dashed and dotted lines stand for BAR, HL and AR, respectively. The value of the λ parameter is expressed using the logarithmic scale

6.3 Other Currencies

As a third robustness check we evaluate whether the results are applicable to other currencies as well. We thus consider the full set of real effective exchange rate indices available in the Bank for International Settlements database. The additional sample consists of eighteen currencies for the following countries: Austria (ATS), Belgium (BEF), Taiwan (TWD), Denmark (DKK), Finland (FIM), France (FRF), Germany (DEM), Greece (GRD), Hong Kong (HKG), Ireland (IEP), Italy (ITL), South Korea (KRW), the Netherlands (NLG), Norway (NOK), Portugal (PTE), Singapore (SGD), Spain (ESP) and Sweden (SEK). The results are reported in Table 8 and lead to similar conclusions to those reached earlier. The forecasts based on the HL model are better than those based on the RW for 9 of the 18 currencies, comparable for 6 and less accurate for 3. The HL model also delivers more precise forecasts than the AR model for most currencies.

Table 8 Mean squared forecast errors for other currencies

6.4 Sensitivity to the HL Parameter

Next, we evaluate if the performance of the HL model is sensitive to the duration of the adjustment process. Table 9 reports the relative performance of the HL model compared to the RW assuming that half of the adjustment is completed in 1, 3 and 10 years respectively. In the large majority of cases the HL model outperforms the RW regardless of this choice: the HL model beats the RW at the lower bound proposed by Rogoff (HL3) but is also quite competitive for half-lives in the broad range of 1 to 10 years. Opting for fast convergence to PPP, such as in the case of the HL1 model, the calibrated half-life model continues to perform satisfactorily for forecast horizons above two years. Opting for a much lower pace of convergence, such as in the HL10 model, the HL beats the RW at all horizons. However, at longer horizons the performance of the HL10 model is not as good as the HL3 or HL5 model, suggesting that it is still preferable to select a faster pace of convergence to PPP.

Table 9 Mean squared forecast errors for other HL durations
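The role of the half-life in these comparisons can be illustrated with a minimal sketch. It assumes that the calibrated forecast takes the standard AR(1) form, in which the deviation of the log real exchange rate from its sample mean decays geometrically and is halved after the chosen number of years of monthly data. The function names `hl_rho` and `hl_forecast` and the numerical inputs are hypothetical and are used for illustration only.

```python
def hl_rho(half_life_years, periods_per_year=12):
    """Per-period persistence implied by a half-life (assumed AR(1) decay)."""
    # rho solves rho ** (half_life_years * periods_per_year) = 0.5
    return 0.5 ** (1.0 / (half_life_years * periods_per_year))


def hl_forecast(y_T, y_bar, horizon, half_life_years, periods_per_year=12):
    """Forecast of the log real exchange rate `horizon` periods ahead.

    y_T   : last observed log real exchange rate
    y_bar : sample mean, used here as a proxy for the PPP level (assumption)
    """
    rho = hl_rho(half_life_years, periods_per_year)
    # The gap to the mean shrinks by the factor rho each period.
    return y_bar + rho ** horizon * (y_T - y_bar)


if __name__ == "__main__":
    # Illustrative numbers: a 10% overvaluation relative to the sample mean.
    y_T, y_bar = 0.10, 0.0
    for hl in (1, 3, 10):
        path = [hl_forecast(y_T, y_bar, h, hl) for h in (12, 36, 60)]
        print(f"HL{hl}:", [round(v, 4) for v in path])
```

Under this sketch a longer half-life keeps the forecast closer to the last observation, so the HL10 variant behaves more like the RW at short horizons, whereas the HL1 variant closes most of the gap within a few years.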

6.5 An Extension to Nominal Exchange Rate Forecasting

The next step in our analysis consists in testing whether the mean reverting nature of the real exchange rate helps us to forecast nominal exchange rates. A simple approach is to assume that the adjustment of the real exchange rate predicted by model M is achieved entirely via changes in the nominal exchange rate, while the relative price channel is absent. The predicted change in the log nominal exchange rate (s) at horizon h is thus simply equal to the predicted adjustment of the log real exchange rate (y):

$$ {s}_{T+h|T}^{M}- s_{T} = y_{T+h|T}^{M}- y_{T}. $$
(16)
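Equation (16) can be sketched in a few lines of code. The example assumes, for illustration only, that model M is a calibrated half-life model in which the gap between the log real exchange rate and its sample mean decays geometrically. The function names and input values are hypothetical.

```python
def real_forecast_hl(y_T, y_bar, horizon, half_life_months):
    """Assumed HL-type forecast: the gap to the sample mean halves
    every `half_life_months` periods."""
    rho = 0.5 ** (1.0 / half_life_months)
    return y_bar + rho ** horizon * (y_T - y_bar)


def nominal_forecast(s_T, y_T, y_forecast):
    """Eq. (16): the predicted change in the log nominal rate equals the
    predicted change in the log real rate (no relative price adjustment)."""
    return s_T + (y_forecast - y_T)


if __name__ == "__main__":
    # Illustrative values (not taken from the paper's data).
    s_T = 0.25      # log nominal exchange rate at time T
    y_T = 0.10      # log real exchange rate at time T
    y_bar = 0.00    # sample mean of the log real exchange rate
    for h in (6, 12, 60):
        y_hat = real_forecast_hl(y_T, y_bar, h, half_life_months=60)
        s_hat = nominal_forecast(s_T, y_T, y_hat)
        print(f"h={h:>2}: real {y_hat:.4f}, nominal {s_hat:.4f}")
```

Because the relative price channel is switched off, the nominal and real forecasts differ from their last observed values by exactly the same amount at every horizon.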

The results presented in Table 10 are based on the same settings as in our baseline for real exchange rate forecasting. The calibrated HL model performs visibly better than the RW for exactly the same seven currencies for which it was superior in the context of real exchange rate forecasting. The forecasts generated by the HL model are also generally much more precise than those generated by the AR model. Comparing the numbers in Tables 1 and 10 clearly shows that, for our set of currencies, the ability to forecast real and nominal exchange rates is not very different. For the same seven currencies for which the real exchange rate is forecastable, the nominal exchange rate has contributed, amid high volatility, to the gradual mean reversion of the real exchange rate rather than simply following a RW.Footnote 7 The role of relative prices may hypothetically be more important for countries that have a fixed exchange rate regime.

Table 10 Mean squared forecast errors for NEERs

6.6 An Extension to Real and Nominal Bilateral Exchange Rates

In our study we use ex-post real effective exchange rates that are subject to revisions of weights. Consequently, these data are not available in real time. Even though weight revisions are generally small, they might still influence the relative performance of our competing models. It is hence important to show that the mean reverting properties of the real exchange rate neither depend on the revision of the weights by the BIS nor are a mere artefact of how different currency pairs are aggregated. The best way to demonstrate this is to show that the HL model can also forecast bilateral exchange rates precisely. In Table 11 we report our forecasting competition for the real and nominal exchange rates of the US dollar vis-à-vis the eight remaining currencies that we analyzed before. As a reference currency we have chosen the US dollar, and not the euro, to avoid dependence of our results on the way we have defined the synthetic euro and the price index for the euro area before EMU creation. The results are exceptionally good. As it turns out, the HL model clearly outperforms all other models for all currency pairs at medium-term horizons. For most currencies the HL model performs comparably to, and often even better than, the RW at short horizons as well.

Table 11 Mean squared forecast errors for bilateral ERs against USD

7 Conclusions

Notwithstanding the important recent progress made in the field of exchange rate economics, we still know very little about what drives the fluctuations of major currencies. Moreover, as evidenced by numerous studies, exchange rate forecasts tend to be inaccurate both in an absolute sense and relative to a naïve RW. Solving the “exchange rate puzzle” has been an endeavor for many economists over the past three decades. The vast exchange rate literature provides, however, at least two reasons for being cautiously optimistic. First, a number of papers have emphasized that the dismal forecasting performance of exchange rate models is partly due to estimation error, which explains why the RW is less competitive for larger datasets. Second, the literature on PPP half-life has shown that there is evidence of mean reversion in real exchange rates.

In this paper we have illustrated how these two findings can be exploited in real exchange rate forecasting. In particular, we have proposed a simple model that assumes a gradual return of the real exchange rate to its sample mean. From the theoretical perspective this alternative is more appealing than the RW for it takes into account that PPP holds in the long-term horizon. It is also appealing from the empirical perspective for it is consistent with the evidence that real exchange rates are highly persistent but mean reverting.

The key finding of our analysis is that the proposed HL model is able to overwhelmingly beat the naïve RW in terms of real effective and bilateral exchange rate forecasting for seven out of nine major world currencies. Moreover, a model of gradual adjustment toward the PPP rate outperforms the RW also at short horizons: the naïve model is beaten already at the 6-month horizon for both the US dollar and the euro. We believe that our results are quite intuitive: our preferred forecasting model for real exchange rates closely resembles the RW in the short run, while it gradually approaches and reaches PPP over medium- to long-term horizons.

A second key finding of our analysis is that if the speed of mean reversion is estimated, then the model performs significantly worse than the RW. We explain this result analytically by showing that estimation error plays an important role in the forecast error even with 15 years of monthly data. We have carried out a comprehensive sensitivity analysis to check the robustness of our results to different rolling windows and to the choice of analyzed currencies. We have found that the results are valid for a wide range of half-lives, as long as they are calibrated at reasonable values rather than estimated, with or without Bayesian techniques. Finally, we have demonstrated that the mean reverting nature of real exchange rates can be exploited to outperform the RW also in terms of nominal exchange rate forecasting. For most currencies in our sample we find that the nominal exchange rate has contributed to the mean reversion of the real exchange rate rather than simply following a RW.