Decisions in the fields of economics and management have to be made in the context of forecasts about the future state of the economy or market. Because forecasts are so important as a basis for these decisions, a great deal of attention has been paid to the question of how best to forecast variables and occurrences of interest. There are several distinct types of forecasting situations, including event timing, event outcome, and time-series forecasts. Event timing is concerned with the question of when, if ever, some specific event will occur, such as the introduction of a new tax law, of a new product by a competitor, or of a turning point in the business cycle. Forecasting of such events is usually attempted by the use of leading indicators, that is, other events that generally precede the one of interest. Event outcome forecasts try to predict the outcome of some uncertain event that is fairly sure to occur, such as the winner of an election or the level of success of a planned marketing campaign. Such forecasts are usually based on data specifically gathered for the purpose, such as a poll of likely voters or of potential consumers. There should clearly be a positive relationship between the amount spent on gathering the extra data and the quality of the forecast achieved.

A time series xt is a sequence of values gathered at regular intervals of time, such as daily stock market closing prices, interest rates observed weekly, or monthly unemployment levels. Irregularly recorded data or continuous-time sequences may also be considered but are of less practical importance. Viewed from time n (now), a future value of the series, xn+h, is a random variable, where h is the forecast horizon. It is usual to ask questions about the conditional distribution of xn+h given some information set In available now, from which forecasts will be constructed. Of particular importance are the conditional mean

$$ {f}_{n,h}=E\left[{x}_{n+h}\left|{I}_n\right.\right] $$

and variance, Vn,h. The value of fn,h is a point forecast and represents, essentially, the best single guess of the value to be taken by the variable x at time n + h.

With a normality assumption, the conditional mean and variance can be used together to determine an interval forecast, such as an interval within which xn+h is expected to fall with 95 per cent confidence. An important decision in any forecasting exercise is the choice of the information set In. It is generally recommended that In include at least the past and present of the individual series being forecast, xn−j, j ≥ 0. Such information sets are called proper, and any forecasting models based upon them can be evaluated over the past. An In that consists just of xn−j, j ≥ 0, is a univariate set, so that future values of x are forecast just from the series' own past. Many simple time-series forecasting methods are based on this information set and have proved successful. If In also includes several explanatory variables, one has a multivariate set. The choice of how much past data to use and which explanatory variables to include is partly a personal one, depending on one's knowledge of the series being forecast, one's degree of belief in any available economic theory, and on data availability. In general terms, the more useful the explanatory variables included in In, the better the forecast that will result. However, having many series allows a confusing number of alternative model specifications, so that using too much data can quickly lead to diminishing marginal returns in terms of forecast quality. In practice, the data to be used in In will often be partly determined by the length of the forecast horizon. If h is small, a short-run forecast is being made, and this may concentrate on frequently varying explanatory variables. Short-term forecasts of savings may be based on interest rates, for example. If h is large, so that long-run forecasts are required, then slowly changing, trending explanatory variables may be of particular relevance.
A long-run forecast of electricity demand might be largely based on population trends, for example. What is considered short run or long run will usually depend on the properties of the series being forecast. For very long forecasts, allowances would have to be made for technological change as well as changes in demographics and the economy. A survey of the special and separate field of technological forecasting can be found in Martino (1993) with further discussion in Martino (2003).
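As a small illustration of the interval forecast described above, the sketch below computes a 95 per cent interval from an assumed conditional mean and variance under normality; the numerical values are purely illustrative, not drawn from any real series.

```python
import math

# Hypothetical conditional moments of x_{n+h} given I_n (illustrative numbers).
f_nh = 2.5   # conditional mean f_{n,h}: the point forecast
v_nh = 0.64  # conditional variance V_{n,h}

# Under normality, a 95% interval forecast is f_{n,h} +/- 1.96 * sqrt(V_{n,h}).
half_width = 1.96 * math.sqrt(v_nh)
interval = (f_nh - half_width, f_nh + half_width)  # roughly (0.93, 4.07)
```

The same two conditional moments thus deliver both the point forecast and the interval forecast.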

If decisions are based on forecasts, it follows that an imperfect forecast will result in a cost to the decision-maker. For example, if fn,h is a point forecast of xn+h made at time n, the eventual forecast error will be

$$ {e}_{n,h}={x}_{n+h}-{f}_{n,h}, $$

which is observed at time n + h. The cost of making an error e might be denoted C(e), where C(e) is positive with C(0) = 0. As there appears to be little prospect of making error-free forecasts in economics, positive costs must be expected, and the quality of a forecast procedure can be measured by the expected or average cost resulting from its use. Several alternative forecasting procedures can be compared by their expected costs and the best one chosen. It is also possible to compare classes of forecasting models, such as all linear models based on a specific, finite information set, and to select the optimum model by minimizing the expected cost. In practice the true form of the cost function is not known, and in the univariate forecasting case a pragmatically useful substitute for the real C(e) is to assume that it is well approximated by ae2 for some positive a. This enables least-squares statistical techniques to be used when a model is estimated, and it is the basis of a number of theoretical results, including the result that the optimal forecast of xn+h based on In is just the conditional mean of xn+h. Machina and Granger (2006) consider cost functions generated by decision-makers and derive implications for their utility functions. This is just one component of considerable developments in the area of forecast evaluation; see West (2006) and Timmermann (2006), for example.
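The comparison of procedures by expected cost can be sketched as follows, using the quadratic approximation C(e) = ae2 and made-up actual and forecast values; the procedure with the lower average cost over the evaluation period would be preferred.

```python
# Two hypothetical forecast procedures evaluated on the same actuals.
# All numbers are invented for illustration.
actuals     = [10.0, 11.0, 9.5, 10.5]
forecasts_A = [9.8, 11.3, 9.0, 10.4]   # procedure A
forecasts_B = [10.5, 10.0, 9.5, 11.5]  # procedure B

def avg_cost(actual, forecast, a=1.0):
    """Average cost with C(e) = a * e**2."""
    errors = [x - f for x, f in zip(actual, forecast)]
    return a * sum(e * e for e in errors) / len(errors)

cost_A = avg_cost(actuals, forecasts_A)  # 0.0975
cost_B = avg_cost(actuals, forecasts_B)  # 0.5625
best = "A" if cost_A < cost_B else "B"   # procedure A is preferred here
```

With a = 1 this is just the familiar mean squared error criterion.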

When using linear models and a least-squares criterion, it is easy to form forecasts under the assumption that the model being used is a plausible generating mechanism for the series of interest. Suppose that a simple model of the form

$$ {x}_t=\alpha {x}_{t-1}+\beta {y}_{t-2}+{\varepsilon}_t $$

is believed to be adequate, where εt is a zero-mean, white noise (unforecastable) series. Standing at time n, according to this model, the next value of x will be generated by

$$ {x}_{n+1}=\alpha {x}_n+\beta {y}_{n-1}+{\varepsilon}_{n+1}. $$

The first two terms are known at time n, and the last term is unforecastable. Thus

$$ {f}_{n,1}=\alpha {x}_n+\beta {y}_{n-1} $$

and

$$ {e}_{n,1}={\varepsilon}_{n+1}. $$

The following value, xn+2, will be generated by

$$ {x}_{n+2}=\alpha {x}_{n+1}+\beta {y}_n+{\varepsilon}_{n+2}. $$

The first of these terms is not known at time n, but a forecast fn,1 is available for it; the second term is known at time n, and the third term is not forecastable, so that

$$ {f}_{n,2}=\alpha {f}_{n,1}+\beta {y}_n $$

and

$$ {e}_{n,2}={\varepsilon}_{n+2}+\alpha \left({x}_{n+1}-{f}_{n,1}\right)={\varepsilon}_{n+2}+{\alpha \varepsilon}_{n+1}. $$

To continue this process for longer forecast horizons, it is clear that forecasts will be required for yn+h−2. The forecast formation rule is that one uses the model available as though it is true, asks how a future xn+h will be generated, uses all known terms as they occur, and replaces all other terms by optimal forecasts. For non-linear models this rule can still be used, but with the additional complication that the optimum forecast of a function of x is not the same function of the optimum forecast of x.
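The forecast formation rule for the model above can be illustrated numerically. The sketch below assumes known (illustrative) values for α, β, and the data, forms fn,1 and fn,2 as in the text, and checks the error decomposition en,2 = εn+2 + αεn+1 on one simulated future.

```python
import random

random.seed(0)
alpha, beta = 0.6, 0.3             # assumed known coefficients (illustrative)
x_n, y_nm1, y_n = 1.0, 2.0, 1.5    # x_n, y_{n-1}, y_n: invented data

# One- and two-step point forecasts, following the rule in the text.
f_n1 = alpha * x_n + beta * y_nm1  # f_{n,1}
f_n2 = alpha * f_n1 + beta * y_n   # f_{n,2}: unknown x_{n+1} replaced by f_{n,1}

# Simulate one possible future to check the error decomposition.
eps_n1 = random.gauss(0, 1)                 # epsilon_{n+1}
eps_n2 = random.gauss(0, 1)                 # epsilon_{n+2}
x_n1 = alpha * x_n + beta * y_nm1 + eps_n1  # realized x_{n+1}
x_n2 = alpha * x_n1 + beta * y_n + eps_n2   # realized x_{n+2}

e_n2 = x_n2 - f_n2
# e_{n,2} = eps_{n+2} + alpha * eps_{n+1}, as derived above.
assert abs(e_n2 - (eps_n2 + alpha * eps_n1)) < 1e-9
```

The pattern extends to any horizon: every unknown future value on the right-hand side is replaced by its own optimal forecast.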

The steps involved in forming a forecast include deciding exactly what is to be forecast, the forecast horizon, the data available for use, the model forms or techniques to be considered, the cost function to be used in the evaluation procedure, and whether a single forecast or several alternatives will be produced. It is good practice to decide on the evaluation procedure before starting a sequence of forecasts. If several alternative forecasting methods are involved, a weighted combination of the available forecasts is both helpful for evaluation and can often provide a superior forecast.
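A minimal sketch of combining two forecasts is given below; the weights follow a common inverse-MSE heuristic, and all numbers are invented for illustration rather than drawn from any real evaluation.

```python
# Past mean squared errors of two competing procedures (made-up values).
mse_A, mse_B = 0.10, 0.30

# Weight each forecast inversely to its past MSE (one common heuristic).
w_A = (1 / mse_A) / (1 / mse_A + 1 / mse_B)  # = 0.75 here

# Competing point forecasts of the same quantity (also made up).
f_A, f_B = 10.2, 10.8
f_comb = w_A * f_A + (1 - w_A) * f_B         # combined forecast, 10.35
```

Even the simple equal-weight average (w_A = 0.5) often performs well in practice; the heuristic above merely tilts the combination towards the historically more accurate procedure.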

The central problem in practical forecasting is choosing the model from which the forecasts will be derived. If a univariate information set is used, it is natural to consider models developed in the field of time-series analysis. A class of models that has proved successful in short-term forecasting is the autoregressive (AR) class. If a series is regressed on its own values up to p lags, the result is an AR(p) model. These models were popularized by Box and Jenkins (1970) as a particularly relevant subclass of their ARMA(p, q) models, which add moving-average components. The number of lags in an AR(p) can be chosen using a selection criterion; the most widely used are the Bayes information criterion (BIC) and the less conservative Akaike information criterion (AIC).
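Order selection by AIC and BIC can be sketched as below. The AR models are fitted by ordinary least squares on a common estimation sample; the simulated series, its AR(2) coefficients, and the maximum order considered are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate an AR(2) series so that a "true" order exists (values illustrative).
n, phi1, phi2 = 400, 0.5, -0.3
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + rng.standard_normal()

pmax = 5  # largest order considered

def ar_resid_var(x, p, pmax):
    """OLS fit of an AR(p) on a common estimation sample; residual variance."""
    y = x[pmax:]
    X = np.column_stack([x[pmax - i: len(x) - i] for i in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return ((y - X @ coef) ** 2).mean()

n_eff = n - pmax
scores = {}
for p in range(1, pmax + 1):
    s2 = ar_resid_var(x, p, pmax)
    aic = n_eff * np.log(s2) + 2 * p            # lighter penalty
    bic = n_eff * np.log(s2) + p * np.log(n_eff)  # heavier penalty
    scores[p] = (aic, bic)

p_aic = min(scores, key=lambda p: scores[p][0])  # AIC choice
p_bic = min(scores, key=lambda p: scores[p][1])  # BIC choice, never larger
```

Because BIC's per-parameter penalty exceeds AIC's once log(n) > 2, the BIC-selected order can never exceed the AIC-selected one, which is the sense in which BIC is the more conservative criterion.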

The natural extension was to vector autoregressive models. Later, when it was realized that many series in macroeconomics and finance had the property of being integrated, and so contained stochastic trends, the natural multivariate form was the error-correction model. It is quite often found that error-correction models improve forecasts, but not inevitably. There are a variety of ways of building models with many predictive variables, including those with unobserved components and using special data, such as survey expectations, real-time macro data, and seasonal components.
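A one-step forecast from a simple error-correction model can be sketched as follows; the model form, coefficients, and data values are all assumed purely for illustration.

```python
# Error-correction sketch: x and y are assumed cointegrated with
#   dx_t = gamma * (x_{t-1} - theta * y_{t-1}) + eps_t,
# so deviations from the long-run relation x = theta * y are partly
# corrected each period. Coefficients and data are invented.
gamma, theta = -0.4, 1.0
x_n, y_n = 5.2, 5.0                       # current values; x sits above the relation

# One-step point forecast: x_n plus the predictable correction term.
f_n1 = x_n + gamma * (x_n - theta * y_n)  # 5.12, pulled back towards y_n
```

The forecast moves x part of the way back towards its long-run relation with y, which is the mechanism by which error-correction terms can improve forecasts of integrated series.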

In recent years the linear models have been joined by a variety of nonlinear forms (see Teräsvirta 2006), including switching models and neural networks, as well as linear models with time-varying coefficients estimated using Kalman filters.

Traditionally, forecasters concentrated on the mean of the predictive distribution. Towards the end of the 20th century considerable attention was given to forecasting the variance of the distribution, particularly in the financial area, often using Engle’s (1995) ARCH model or one of its many generalizations (see the survey by Andersen et al. 2006). Recently, forecasts of the whole distribution have become more common in practice, both in finance and in macroeconomics; see Corradi and Swanson (2006) for a recent discussion. Such forecasts involve quantiles, and the use of copulas provides a route to multivariate distribution forecasts. The topics mentioned in this paragraph are covered by chapters in Elliott et al. (2006).
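A one-step variance forecast from an ARCH(1) model can be sketched as below; the parameter values and the last observed shock are assumptions for illustration only.

```python
# ARCH(1) sketch: the shock eps_t has conditional variance
#   h_t = omega + alpha1 * eps_{t-1}**2,
# so large recent shocks raise the forecast variance. Values invented.
omega, alpha1 = 0.2, 0.5
eps_n = 1.2                        # last observed shock

h_n1 = omega + alpha1 * eps_n**2   # conditional variance of the next shock, 0.92
uncond = omega / (1 - alpha1)      # unconditional variance, 0.4, for comparison
```

Here the recent large shock more than doubles the variance forecast relative to its unconditional level, which is the volatility-clustering behaviour the ARCH family is designed to capture.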