1 Introduction

1.1 Exponential Smoothing method

Since the 1950s, the exponential smoothing method has been a very popular forecasting technique due to its easy implementation and several other advantages. The most popular classical exponential smoothing methods are single exponential smoothing (Brown 1959, 1963), Holt’s linear method (Holt 1957) and the Holt–Winters exponential smoothing method (Winters 1960). Further development of the exponential smoothing method took place after the introduction of a nonlinear state-space framework characterized by a single source of errors (Ord et al. 1997). As a result, Hyndman et al. (2008) listed 30 different models of the state-space version of the exponential smoothing method.

All exponential smoothing methods mentioned above are univariate time series models. These models rely only on the dynamic movement of a single series to produce forecasts of that same series. A number of attempts have been made to integrate the effect of regressors into exponential smoothing methods. Hyndman et al. (2008), for example, introduced an augmented version of the exponential smoothing model with independent variables in which the regressor parameters are time invariant. An example of a local-level model with one regressor (denoted ETSX-TIP) is given by the following equations:

$$ y_{t} = \ell_{t - 1} + p_{1,t - 1} z_{1,t} + \varepsilon_{t} $$
(1a)
$$ \ell_{t} = \ell_{t - 1} + \alpha \varepsilon_{t} $$
(1b)
$$ p_{1,t} = p_{1,t - 1} $$
(1c)

In the above equations, \( y_{t} \) is the observed value of the dependent variable, \( \ell_{t} \) denotes the level of the series, \( z_{1,t} \) represents the regressor or independent variable while \( p_{1,t} \) is its parameter, and \( \varepsilon_{t} \) represents the error with the stochastic assumption that \( \varepsilon_{t} \sim NID\left( {0,\sigma^{2} } \right) \). \( \alpha \) represents the smoothing parameter.

If we want a local-level model with a regressor whose parameter is time varying (ETSX-TVP), based on the formulation given by Eqs. (1a)–(1c), we can simply add another smoothing parameter, \( \beta_{1} \), and an error term to Eq. (1c), so that Eqs. (1a)–(1c) become the following equations.

$$ y_{t} = \ell_{t - 1} + p_{1,t - 1} z_{1,t} + \varepsilon_{t} $$
(2a)
$$ \ell_{t} = \ell_{t - 1} + \alpha \varepsilon_{t} $$
(2b)
$$ p_{1,t} = p_{1,t - 1} + \beta_{1} \varepsilon_{t} $$
(2c)
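Under the assumption \( \varepsilon_{t} \sim NID(0,\sigma^{2}) \), the recursion in Eqs. (2a)–(2c) can be run as a simple filter over observed data. The sketch below is a minimal Python illustration (the function name and the initial states `level0` and `p0` are our own choices, not from the source); it computes one-step-ahead fitted values and updates the level and regressor parameter exactly as in the equations above.

```python
import numpy as np

def etsx_tvp_filter(y, z, alpha, beta1, level0, p0):
    """Run the ETSX-TVP recursion (Eqs. 2a-2c) over an observed series.

    Returns one-step-ahead fitted values and the final states.
    Initial states level0, p0 are assumed given (an implementation choice).
    """
    level, p1 = level0, p0
    fitted = np.empty(len(y))
    for t in range(len(y)):
        yhat = level + p1 * z[t]       # Eq. (2a) without the error term
        eps = y[t] - yhat              # innovation
        fitted[t] = yhat
        level = level + alpha * eps    # Eq. (2b)
        p1 = p1 + beta1 * eps          # Eq. (2c)
    return fitted, level, p1
```

Setting \( \beta_{1} = 0 \) recovers the ETSX-TIP recursion of Eqs. (1a)–(1c), since the regressor parameter is then never updated.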

Osman and King (2015a) then introduced another formulation of the exponential smoothing method with regressors. Their proposed formulation covers both time-invariant and time-varying regressor parameters for k regressors, as given by the following general equations.

$$ y_{t} = \ell_{t - 1} + s_{t - m} + \sum\limits_{i = 1}^{k} {b_{i,t - 1}\Delta _{{z_{i,t} }} + \varepsilon_{t} } $$
(3a)
$$ \ell_{t} = \ell_{t - 1} + \sum\limits_{i = 1}^{k} {b_{i,t - 1}\Delta _{{z_{i,t} }} + \alpha \varepsilon_{t} } $$
(3b)
$$ s_{t} = s_{t - m} + \gamma \varepsilon_{t} $$
(3c)
$$ \begin{aligned} & b_{1,t} = \left\{ {\begin{array}{*{20}l} {b_{1,t - 1} + \frac{{\beta_{1} \left( {\varepsilon_{1,t - 1}^{ + } + \varepsilon_{t} } \right)}}{{\Delta_{{z_{1,t} }}^{*} }},} \hfill & {{\text{if}}\;\left| {\Delta_{{z_{1,t} }}^{*} } \right| \ge L_{{b_{1} }} } \hfill \\ {b_{1,t - 1} ,} \hfill & {{\text{if}}\;\left| {\Delta_{{z_{1,t} }}^{*} } \right| < L_{{b_{1} }} } \hfill \\ \end{array} } \right. \\ &\qquad \quad \qquad \qquad {} \vdots \hfill \\\end{aligned} $$
(3d)
$$ b_{k,t} = \left\{ {\begin{array}{*{20}l} {b_{k,t - 1} + \frac{{\beta_{k} \left( {\varepsilon_{k,t - 1}^{ + } + \varepsilon_{t} } \right)}}{{\Delta _{{z_{k,t} }}^{ *} }},} \hfill & {{\text{if}}\; \left| {\Delta _{{z_{k,t} }}^{ *} } \right| \ge L_{{b_{k} }} } \hfill \\ {b_{k,t - 1} ,} \hfill & {{\text{if}}\;\left| {\Delta _{{z_{k,t} }}^{ *} } \right| < L_{{b_{k} }} } \hfill \\ \end{array} } \right. $$
(3e)
$$ \begin{aligned} & \varepsilon_{1,t}^{ + } = \left\{ {\begin{array}{*{20}l} {0,} \hfill & {{\text{if}}\;\left| {\Delta _{{z_{1,t} }}^{ *} } \right| \ge L_{{b_{1} }} } \hfill \\ {\varepsilon_{1,t - 1}^{ + } + \varepsilon_{t} ,} \hfill & {{\text{if}}\; \left| {\Delta _{{z_{1,t} }}^{ *} } \right| < L_{{b_{1} }} } \hfill \\ \end{array} } \right. \\ &\qquad \quad \qquad \qquad {} \vdots \hfill \\\end{aligned} $$
(3f)
$$ \varepsilon_{k,t}^{ + } = \left\{ {\begin{array}{*{20}l} {0,} \hfill & {{\text{if}}\;\left| {\Delta _{{z_{k,t} }}^{ *} } \right| \ge L_{{b_{k} }} } \hfill \\ {\varepsilon_{k,t - 1}^{ + } + \varepsilon_{t} ,} \hfill & {{\text{if}}\; \left| {\Delta _{{z_{k,t} }}^{ *} } \right| < L_{{b_{k} }} } \hfill \\ \end{array} } \right. $$
(3g)

where

$$ \Delta _{{z_{i,t} }}^{ *} = z_{i,t} - z_{i,t - 1}^{ *} \quad {\text{and}}\;\;z_{i,t}^{ *} = \left\{ {\begin{array}{*{20}l} {z_{i,t} ,} \hfill & {{\text{if}}\;\left| {\Delta _{{z_{i,t} }}^{ *} } \right| \ge L_{{b_{i} }} } \hfill \\ {z_{i,t - 1}^{ *} ,} \hfill & {{\text{if}}\; \left| {\Delta _{{z_{i,t} }}^{ *} } \right| < L_{{b_{i} }} } \hfill \\ \end{array} } \right. $$
(4)

In the above equations, \( s_{t} \) represents the seasonal component while m represents the periodicity of the seasonality. \( \gamma \) is another smoothing parameter. In addition, \( \Delta _{{z_{i,t} }} \) represents the change in the ith regressor, so that \( \Delta _{{z_{i,t} }} = z_{i,t} - z_{i,t - 1} \); \( b_{i,t} \) is the regressor parameter, and \( \varepsilon_{i,t}^{ + } \) is a dummy error.

\( L_{{b_{i} }} \) in Eqs. (3a)–(3g) represents the lower boundary for \( \left| {\Delta _{{z_{i,t} }}^{*} } \right| \) in a switching procedure. The lower boundary is used to avoid extreme changes in the regressor parameters, \( b_{i,t} \). The idea of using a box-plot to flag outliers, as introduced by Tukey (1977), can be used to determine the lower boundary for the switching procedure. Referring to Eqs. (3d) and (3e), extreme changes in the regressor coefficients \( b_{i,t} \) can be avoided if all of \( \beta_{i} /\left| {\Delta _{{z_{i,t} }} } \right| \) satisfy the following condition:

$$ \beta_{i} /\left| {\Delta _{{z_{i,t} }} } \right| \le {\text{upper}}\;{\text{inner}}\;{\text{fence}}\;{\text{of}}\;\left( {\beta_{i} /\left| {\Delta _{{z_{i,t} }} } \right|} \right) = Q_{3} + 1.5\left( {Q_{3} - Q_{1} } \right), $$

with \( Q_{1} \) the first quartile and \( Q_{3} \) the third quartile of \( \left( {\beta_{i} /\left| {\Delta _{{z_{i,t} }} } \right|} \right) \).

The lower boundary can be determined in three steps. The first step is to compute \( \left| {\Delta _{{z_{i,t} }} } \right| \) for each regressor \( z_{i,t} \) and remove the cases where \( \left| {\Delta _{{z_{i,t} }} } \right| = 0 \). The second step is to compute the dispersion summary of \( 1/\left| {\Delta _{{z_{i,t} }} } \right| \) and find the upper inner fence \( \left( {UIF_{{1/\left| {\Delta _{{z_{i,t} }} } \right|}} } \right) \). The final step is to set the lower boundary to \( L_{{b_{i} }} = 1/UIF_{{1/\left| {\Delta _{{z_{i,t} }} } \right|}} \) or 0.5, whichever is smaller. The rationale for this setting is given in Osman and King (2015a).
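The three steps above can be sketched directly in code. The Python function below is a minimal illustration (the name `lower_boundary` and the use of NumPy's default quartile interpolation are our own choices, not from the source):

```python
import numpy as np

def lower_boundary(z, cap=0.5):
    """Compute L_b for one regressor series z via Tukey's upper inner fence.

    Step 1: absolute first differences, with zero changes removed.
    Step 2: upper inner fence of the reciprocals 1/|dz|.
    Step 3: L_b = min(1 / UIF, cap), with the 0.5 cap as described in the text.
    """
    dz = np.abs(np.diff(np.asarray(z, dtype=float)))
    dz = dz[dz > 0]                     # step 1: drop |dz| = 0
    recip = 1.0 / dz                    # step 2: reciprocals
    q1, q3 = np.percentile(recip, [25, 75])   # quartile convention may differ from Tukey's
    uif = q3 + 1.5 * (q3 - q1)          # upper inner fence
    return min(1.0 / uif, cap)          # step 3
```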

The model specification given by Eqs. (3a)–(3g) is a model with time-varying regressor parameters. The corresponding model with time-invariant regressor parameters can be constructed by setting all regressor coefficients to be constant, i.e., \( b_{i,t} = b_{i,t - 1} \). In this case, the model no longer needs the switching procedure.

For the case of the local-level model with one regressor in which the parameter is time varying (ESWR-TVP), the model specification is given by

$$ y_{t} = \ell_{t - 1} + b_{1,t - 1}\Delta _{{z_{1,t} }} + \varepsilon_{t} $$
(5a)
$$ \ell_{t} = \ell_{t - 1} + b_{1,t - 1}\Delta _{{z_{1,t} }} + \alpha \varepsilon_{t} $$
(5b)
$$ b_{1,t} = \left\{ {\begin{array}{*{20}l} {b_{1,t - 1} + \frac{{\beta_{1} \left( {\varepsilon_{1,t - 1}^{ + } + \varepsilon_{t} } \right)}}{{\Delta _{{z_{1,t} }}^{ *} }},} \hfill & {{\text{if}}\;\left| {\Delta _{{z_{1,t} }}^{ *} } \right| \ge L_{{b_{1} }} } \hfill \\ {b_{1,t - 1} ,} \hfill & {{\text{if}}\;\left| {\Delta _{{z_{1,t} }}^{ *} } \right| < L_{{b_{1} }} } \hfill \\ \end{array} } \right. $$
(5c)
$$ \varepsilon_{1,t}^{ + } = \left\{ {\begin{array}{*{20}l} {0,} \hfill & {{\text{if}}\;\left| {\Delta _{{z_{1,t} }}^{ *} } \right| \ge L_{{b_{1} }} } \hfill \\ {\varepsilon_{1,t - 1}^{ + } + \varepsilon_{t} ,} \hfill & {{\text{if}}\;\left| {\Delta _{{z_{1,t} }}^{ *} } \right| < L_{{b_{1} }} } \hfill \\ \end{array} } \right. $$
(5d)

where

$$ \Delta _{{z_{1,t} }}^{ *} = z_{1,t} - z_{1,t - 1}^{ *} \quad {\text{and}}\;\;z_{1,t}^{ *} = \left\{ {\begin{array}{*{20}l} {z_{1,t} ,} \hfill & {{\text{if}}\; \left| {\Delta _{{z_{1,t} }}^{ *} } \right| \ge L_{{b_{1} }} } \hfill \\ {z_{1,t - 1}^{ *} ,} \hfill & {{\text{if}}\; \left| {\Delta _{{z_{1,t} }}^{ *} } \right| < L_{{b_{1} }} } \hfill \\ \end{array} } \right.. $$
(6)
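For concreteness, Eqs. (5a)–(5d) together with the switch in Eq. (6) can be implemented as a short filter. The Python sketch below assumes given initial states (`level0`, `b0`) and a precomputed lower boundary `Lb`; initializing \( z_{1,t}^{*} \) at the first observed regressor value is an implementation choice not specified in the source.

```python
import numpy as np

def eswr_tvp_filter(y, z, alpha, beta1, Lb, level0, b0):
    """ESWR-TVP recursion (Eqs. 5a-5d with the switch of Eq. 6).

    A minimal sketch assuming given initial states level0, b0.
    """
    level, b1 = level0, b0
    eps_plus = 0.0           # dummy error epsilon^+_{1,t}, started at zero
    z_star = z[0]            # last "accepted" regressor value z*_{1,t}
    fitted = np.empty(len(y) - 1)
    for t in range(1, len(y)):
        dz = z[t] - z[t - 1]               # ordinary change Delta_z
        dz_star = z[t] - z_star            # switched change Delta_z^*
        yhat = level + b1 * dz             # Eq. (5a) without the error term
        eps = y[t] - yhat
        fitted[t - 1] = yhat
        level = level + b1 * dz + alpha * eps        # Eq. (5b)
        if abs(dz_star) >= Lb:                       # update branch
            b1 = b1 + beta1 * (eps_plus + eps) / dz_star   # Eq. (5c)
            eps_plus = 0.0                           # Eq. (5d)
            z_star = z[t]                            # Eq. (6)
        else:                                        # hold branch
            eps_plus = eps_plus + eps                # accumulate dummy error
    return fitted, level, b1
```

Note how the hold branch leaves both \( b_{1,t} \) and \( z_{1,t}^{*} \) unchanged and only accumulates the dummy error, so that the deferred information enters the parameter update once the boundary is next exceeded.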

In contrast, the model specification for a local-level model with one regressor in which the parameter is time invariant (ESWR-TIP) is given by

$$ y_{t} = \ell_{t - 1} + b_{1,t - 1}\Delta _{{z_{1,t} }} + \varepsilon_{t} $$
(7a)
$$ \ell_{t} = \ell_{t - 1} + b_{1,t - 1}\Delta _{{z_{1,t} }} + \alpha \varepsilon_{t} $$
(7b)
$$ b_{1,t} = b_{1,t - 1} . $$
(7c)

All models specified above can be represented in the matrix form

$$ y_{t} = \bar{\varvec{w}}_{t}^{{\prime }} \bar{\varvec{x}}_{t - 1} + \varepsilon_{t} $$
(8a)
$$ \bar{\varvec{x}}_{t} = \overline{\varvec{F}}_{t} \bar{\varvec{x}}_{t - 1} + \bar{\varvec{g}}_{t} \varepsilon_{t} , $$
(8b)

with the matrix notation for the four models with one regressor described above given as follows.

ETSX-TIP:

$$ \bar{\varvec{x}}_{t} = \left[ {\begin{array}{*{20}c} {\ell_{t} } \\ {p_{1,t} } \\ \end{array} } \right],\quad \bar{\varvec{w}}_{t} = \left[ {\begin{array}{*{20}c} 1 \\ {z_{1,t} } \\ \end{array} } \right],\quad \overline{\varvec{F}}_{t} = \left[ {\begin{array}{*{20}c} 1 & 0 \\ 0 & 1 \\ \end{array} } \right]\quad {\text{and}}\;\;\bar{\varvec{g}}_{t} = \left[ {\begin{array}{*{20}c} \alpha \\ 0 \\ \end{array} } \right]. $$

ETSX-TVP:

$$ \bar{\varvec{x}}_{t} = \left[ {\begin{array}{*{20}c} {\ell_{t} } \\ {p_{1,t} } \\ \end{array} } \right],\quad \bar{\varvec{w}}_{t} = \left[ {\begin{array}{*{20}c} 1 \\ {z_{1,t} } \\ \end{array} } \right],\quad \overline{\varvec{F}}_{t} = \left[ {\begin{array}{*{20}c} 1 & 0 \\ 0 & 1 \\ \end{array} } \right]\quad {\text{and}}\;\;\bar{\varvec{g}}_{t} = \left[ {\begin{array}{*{20}c} \alpha \\ {\beta_{1} } \\ \end{array} } \right]. $$

ESWR-TIP:

$$ \bar{\varvec{x}}_{t} = \left[ {\begin{array}{*{20}c} {\ell_{t} } \\ {b_{1,t} } \\ \end{array} } \right],\quad \bar{\varvec{w}}_{t} = \left[ {\begin{array}{*{20}c} 1 \\ {\Delta _{{z_{1,t} }} } \\ \end{array} } \right],\quad \overline{\varvec{F}}_{t} = \left[ {\begin{array}{*{20}c} 1 & {\Delta _{{z_{1,t} }} } \\ 0 & 1 \\ \end{array} } \right]\quad {\text{and}}\;\;\bar{\varvec{g}}_{t} = \left[ {\begin{array}{*{20}c} \alpha \\ 0 \\ \end{array} } \right]. $$

ESWR-TVP:

$$ \begin{aligned} \bar{\varvec{x}}_{t} & = \left[ {\begin{array}{*{20}c} {\ell_{t} } \\ {b_{1,t} } \\ {\varepsilon_{1,t}^{ + } } \\ \end{array} } \right],\quad \bar{\varvec{w}}_{t} = \left[ {\begin{array}{*{20}c} 1 \\ {\Delta _{{z_{1,t} }} } \\ 0 \\ \end{array} } \right], \\ \overline{\varvec{F}}_{t} & = \left[ {\begin{array}{*{20}c} 1 & {\Delta _{{z_{1,t} }} } & 0 \\ 0 & 1 & {\beta_{1} /\Delta _{{z_{1,t} }}^{*} } \\ 0 & 0 & 0 \\ \end{array} } \right]\quad {\text{if}}\;\left| {\Delta _{{z_{1,t} }}^{*} } \right| \ge L_{{b_{1} }} \quad {\text{or}}\quad \overline{\varvec{F}}_{t}^{\varvec{*}} = \left[ {\begin{array}{*{20}c} 1 & {\Delta _{{z_{1,t} }} } & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{array} } \right]\quad {\text{if}}\; \left| {\Delta _{{z_{1,t} }}^{*} } \right| < L_{{b_{1} }} \\ \end{aligned} $$

and

$$ \bar{\varvec{g}}_{t} = \left[ {\begin{array}{*{20}c} \alpha \\ {\beta_{1} /\Delta _{{z_{1,t} }}^{*} } \\ 0 \\ \end{array} } \right]\quad {\text{if}}\;\left| {\Delta _{{z_{1,t} }}^{*} } \right| \ge L_{{b_{1} }} \quad {\text{or}}\; \bar{\varvec{g}}_{t}^{\varvec{*}} = \left[ {\begin{array}{*{20}c} \alpha \\ 0 \\ 1 \\ \end{array} } \right]\quad {\text{if}}\; \left| {\Delta _{{z_{1,t} }}^{*} } \right| < L_{{b_{1} }} . $$

1.2 Stability and Forecastability

The objective of this paper is to investigate and compare the forecast properties of the two formulations for integrating regressors into the exponential smoothing method. A forecasting model is considered good if it is able to produce stable forecasts for long periods ahead. There are two related concepts, known as stability and forecastability, that ensure a model will produce stable forecasts. Osman and King (2015b) explain these concepts in detail; the same explanation can also be found in Hyndman et al. (2008). As a summary of the two concepts, consider the following expected value of Eq. (8a), which also represents the forecast equation.

$$ \hat{y}_{t + 1|t} = \bar{\varvec{w}}_{t + 1}^{{\prime }} \left( {\prod\nolimits_{i = 0}^{t - 1} {\varvec{D}_{t - i} } } \right)\bar{\varvec{x}}_{0} + \sum\limits_{j = 1}^{t - 1} {\left[ {\bar{\varvec{w}}_{t + 1}^{{\prime }} \left( {\prod\nolimits_{i = 0}^{t - j - 1} {\varvec{D}_{t - i} } } \right)\bar{\varvec{g}}_{j} y_{j} } \right]} + \bar{\varvec{w}}_{t + 1}^{{\prime }} \bar{\varvec{g}}_{t} y_{t} , $$
(9)

where \( \varvec{D}_{t} = \overline{\varvec{F}}_{t} - \bar{\varvec{g}}_{t} \bar{\varvec{w}}_{t}^{{ {\prime }}} \). The matrix \( \varvec{D}_{t} \) is known as the discount matrix. To produce stable forecasts, a model needs to satisfy either the stability condition or the forecastability condition. Stability requires all eigenvalues of the matrix \( \varvec{D}_{t} \) to be less than 1 in absolute value. In contrast, forecastability requires all eigenvalues of the matrix \( \varvec{D}_{t} \) to be no greater than 1 in absolute value. Forecast values \( \hat{y}_{{\left. {t + 1} \right|t}} \) will explode (be unstable) if one or more eigenvalues of the matrix \( \varvec{D}_{t} \) are greater than 1 in absolute value.
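These conditions are easy to check numerically. The Python sketch below (with illustrative parameter values, not estimated from data) builds the discount matrix \( \varvec{D}_{t} = \overline{\varvec{F}}_{t} - \bar{\varvec{g}}_{t} \bar{\varvec{w}}_{t}' \) for the ETSX-TIP model of Sect. 1.1 and classifies it by its spectral radius:

```python
import numpy as np

def classify(D, tol=1e-8):
    """Classify a discount matrix D = F - g w' by its spectral radius."""
    rho = max(abs(np.linalg.eigvals(D)))
    if rho < 1 - tol:
        return "stable"
    if rho <= 1 + tol:
        return "forecastable"
    return "unstable"

# ETSX-TIP with regressor value z: F = I, w = [1, z]', g = [alpha, 0]'
alpha, z = 0.3, 2.0
F = np.eye(2)
w = np.array([1.0, z])
g = np.array([alpha, 0.0])
D = F - np.outer(g, w)
print(classify(D))   # unit eigenvalue -> "forecastable"
```

The same function applied to the ETSX-TVP discount matrix (replace `g` with `[alpha, beta1]`) returns "unstable" once the regressor value is large enough, consistent with the findings reported in Sect. 3.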

2 Research Methodology

To achieve the research objective, a theoretical inspection and numerical experiments were conducted. The theoretical inspection was used to investigate the characteristic equations of all four local-level models with one regressor described in the previous section. The inspection aimed to determine whether the eigenvalues of the matrix \( \varvec{D}_{t} \) satisfy the stability condition, the forecastability condition, or both. This is done by inspecting the maximum eigenvalue of the matrix \( \varvec{D}_{t} \). If the maximum eigenvalue is less than 1, the model satisfies the stability condition. If the maximum eigenvalue is equal to 1, the model satisfies the forecastability condition. However, if the maximum eigenvalue is greater than 1, the model satisfies neither the stability nor the forecastability condition.

To confirm the findings of the theoretical inspection, an empirical experiment was conducted to numerically inspect the maximum eigenvalue of the matrix \( \varvec{D}_{t} \). For this purpose, simulated series were used to estimate the parameters of each model. The estimated parameters were then used to produce forecasts for short and long horizons, up to 12,000 steps ahead. The maximum eigenvalues of the matrix \( \varvec{D}_{t} \) for the four models were then determined.

3 Results

Table 1 lists the characteristic equations and eigenvalues of the four models under consideration. For the two models with a time-invariant regressor parameter, i.e., ETSX-TIP and ESWR-TIP, the discount matrix \( \varvec{D}_{\varvec{t}} \) contains a unit eigenvalue and the other eigenvalue is \( (1 - \alpha ) \). This means that both models do not satisfy the stability condition due to the presence of a unit eigenvalue. However, by restricting \( (1 - \alpha ) \) to be less than 1 in absolute value, these models are able to satisfy the forecastability condition.

Table 1 Characteristic equation of matrix \( \varvec{D}_{\varvec{t}} \) or \( \varvec{D}_{\varvec{t}}^{\varvec{*}} \varvec{ } \) and eigenvalues

With regard to the models with a time-varying regressor parameter, ETSX-TVP is also unable to satisfy the stability condition as its discount matrix has a unit eigenvalue. The other eigenvalue, \( \left( {1 - \beta_{1} z_{1,t} - \alpha } \right) \), clearly depends on the value of the regressor, \( z_{1,t} \). For the in-sample period, this second eigenvalue can be controlled by restricting its value to be less than 1 in absolute value during the estimation process. However, the same restriction cannot be imposed for the out-of-sample forecast period. This means that the second eigenvalue could drift to be greater than 1 in absolute value and eventually cause the forecast values to explode.
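To illustrate the drift: writing the smoothing parameter of Eq. (2c) as \( \beta_{1} \), the regressor-dependent eigenvalue of the ETSX-TVP discount matrix is \( 1 - \alpha - \beta_{1} z_{1,t} \). The short Python check below (with illustrative parameter values of our own choosing) shows how a growing regressor pushes this eigenvalue outside the unit interval:

```python
# Regressor-dependent eigenvalue of the ETSX-TVP discount matrix:
# lam(z) = 1 - alpha - beta1 * z.  Illustrative parameter values only.
alpha, beta1 = 0.3, 0.2
for z in (0.5, 2.0, 10.0):
    lam = 1.0 - alpha - beta1 * z
    print(f"z = {z:5.1f}  lam = {lam:6.2f}  |lam| > 1: {abs(lam) > 1.0}")
```

No in-sample restriction on \( \alpha \) and \( \beta_{1} \) can rule this out, because \( z_{1,t} \) over the forecast horizon is outside the modeller's control.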

The other version of the model with a time-varying regressor parameter, ESWR-TVP, has two sets of eigenvalues. The first set arises when the model runs the normal updating process of the regressor parameter, so the discount matrix is the matrix \( \varvec{D}_{\varvec{t}} \). The second set occurs when the switching procedure is implemented to put the updating process on hold, so the discount matrix is the matrix \( \varvec{D}_{\varvec{t}}^{\varvec{*}} \). The characteristic equation of the matrix \( \varvec{D}_{\varvec{t}} \) has three roots, the smallest equal to zero, while the other two are the roots of \( p(\lambda ) \). By applying the switching procedure using the methodology explained above, \( \Delta _{{z_{1,t} }} \) and \( \Delta _{{z_{1,t} }}^{*} \) in \( p(\lambda ) \) take the same value most of the time; the two eigenvalues are therefore independent of the regressor and can be restricted to be less than 1 for both in-sample and out-of-sample forecast periods, except for the few occasions when the updating process is re-activated after being put on hold. In contrast, the matrix \( \varvec{D}_{\varvec{t}}^{\varvec{*}} \) contains two unit eigenvalues, and if \( (1 - \alpha ) \) is restricted to be less than 1 in absolute value, the model is able to satisfy the forecastability condition.

The above theoretical findings are supported by the results of the numerical experiment given in Table 2. It can be seen that the maximum eigenvalue of the discount matrix is always equal to 1 for both the ETSX-TIP and ESWR-TIP models. The maximum eigenvalue of the discount matrix for the ETSX-TVP model, however, is only equal to 1 for short forecast horizons. For long horizons, the maximum eigenvalue is greater than 1, exceeding 3 for the 1000-steps-ahead forecast and approaching 34 for the 12,000-steps-ahead forecast. Interestingly, inspection of the maximum eigenvalues of the discount matrix for the ESWR-TVP model shows that the highest value is equal to 1 and most of the maximum eigenvalues are equal to 0.99999.

Table 2 Maximum eigenvalue of matrix \( \varvec{D}_{\varvec{t}} \) or \( \varvec{D}_{\varvec{t}}^{\varvec{*}} \)

These results suggest that both time-invariant regressor parameter models, as well as the time-varying regressor parameter model formulated by Osman and King (2015a), are forecastable models. The time-varying regressor parameter model based on the formulation proposed by Hyndman et al. (2008), on the other hand, is not a forecastable model.

4 Conclusion

To sum up, this study aimed to investigate the ability of two formulations for integrating regressors into the exponential smoothing method to produce stable forecasts. For each formulation, two local-level models with one regressor were considered: one with a time-invariant regressor parameter and another with a time-varying regressor parameter. The inspection of the characteristic equations suggests that the formulation of Hyndman et al. (2008) is able to produce stable forecasts in the case of the time-invariant regressor parameter model. However, the time-varying version of the same formulation fails to produce stable forecasts. In contrast, the formulation of Osman and King (2015a) is able to produce stable forecasts for both the time-invariant and time-varying regressor parameter models.