1 Introduction

Policy makers require reliable forecasts of economic growth. Accurate measurement of gross domestic product (GDP) helps policy makers, economists, and investors determine appropriate policies and financial strategies. Forecasts of real GDP growth depend on many economic variables, while the publication of GDP data by statistical agencies is generally delayed by one or two quarters. In Thailand, GDP is available only as quarterly data, while other economic variables that could serve as leading indicators may be available monthly. There is a large literature, including [6] for the United States of America and [3] for the Euro area, that employs financial variables as leading indicators of GDP growth. Ferrara and Marsilli [7] concluded that the stock index could improve the accuracy of GDP growth forecasts.

Thus, involving data sampled at different frequencies in a forecasting model seems to be beneficial. In the literature, one way of using high frequency indicators to forecast a low frequency variable is the Mixed Data Sampling (MIDAS) model proposed by Ghysels et al. [10]. It has been applied in various fields such as financial economics [13] and macroeconomics [4, 5, 15] to forecast GDP. Clements and Galvão [5] concluded that the predictive ability of the indicators in comparison with an autoregression is stronger. The MIDAS model allows the regressand and the regressors to be sampled at different frequencies and is a parsimonious way of allowing lags of explanatory variables. When large data sets are involved, MIDAS regression models combined with forecast combination schemes are computationally easy to implement and are less prone to specification errors. Based on the parsimony of representation argument, the higher frequency part of the conditional expectation of a MIDAS regression is often formulated in terms of aggregates which depend on a weighting function. There are, however, many weighting schemes, such as Step, Exponential Almon, and Beta (an analogue of the probability density function) [9].

The objective of this paper is to use an important financial leading indicator, the Stock Exchange of Thailand (SET) index, to forecast Thailand's quarterly GDP growth using MIDAS models with different weighting schemes. In addition to the MIDAS models with weighting schemes, we also consider the traditional time-aggregate model and the unrestricted MIDAS model. This allows us to see whether high frequency data render any benefit in predicting lower frequency data and, if so, which model specification performs best in forecasting Thailand's quarterly GDP growth. The results of this study will be useful for the government in imposing appropriate policies and strategies for stabilising the country's economy.

The organisation of this paper is as follows. Section 2 describes the scope of the data used in this study. Section 3 presents the methodology and the estimation approach. Section 4 discusses the empirical results. The conclusion of this study is drawn in Sect. 5.

2 Data

The data in this study consist of Thailand's quarterly gross domestic product (GDP) and the monthly Stock Exchange of Thailand (SET) index. GDP is obtained from the Bank of Thailand, while the SET index is obtained from the Stock Exchange of Thailand. The series cover the period 2001Q1 to 2016Q4. Data from 2001Q1 to 2015Q4 are used for model estimation, and the rest are left for out-of-sample forecast evaluation. All variables are transformed into year-on-year (Y-o-Y) growth rates to reduce the risk of seasonality. Figures 1 and 2 plot GDP growth and SET index growth.
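The Y-o-Y transformation can be sketched as follows. This is a minimal illustration that assumes growth is computed as a simple percentage change against the same period one year earlier (the text does not state whether percentage changes or log differences are used); the function name `yoy_growth` and the sample values are hypothetical.

```python
def yoy_growth(levels, periods_per_year):
    """Return year-on-year percentage growth rates.

    levels: list of positive level observations in time order.
    periods_per_year: 4 for quarterly data, 12 for monthly data.
    The first `periods_per_year` observations have no year-ago value,
    so the output is shorter than the input by that many entries.
    """
    p = periods_per_year
    return [100.0 * (levels[t] / levels[t - p] - 1.0)
            for t in range(p, len(levels))]


# Hypothetical quarterly levels (not actual Thai GDP figures).
gdp_levels = [100.0, 102.0, 101.0, 105.0, 104.0, 107.1, 103.02, 110.25]
gdp_growth = yoy_growth(gdp_levels, periods_per_year=4)
```

Because each observation is compared with the same quarter (or month) of the previous year, a stable seasonal pattern largely cancels out of the growth rate.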

Fig. 1. Quarterly GDP growth, 2001Q1 to 2016Q4

Fig. 2. Monthly SET index growth, 2001Q1 to 2016Q4

It can be seen from the figures that there is a sharp drop in GDP growth around the end of 2008 and the beginning of 2009. SET growth also changes in a similar manner during the same period. This is believed to be the result of the US financial crisis. Around the end of 2011, there is another drop in GDP growth, which was caused by the great flood in Thailand. Again, SET index growth moves in the same direction. These figures suggest that a financial variable such as the SET index is a potential predictor of GDP.

3 Methodology

Prior to model estimation and forecasting, it is recommended to check whether the series in the study are stationary. Therefore, in this section, we begin with brief information regarding the unit root tests, followed by the forecasting models employed in this study.

3.1 Unit Root Tests

We start with the Augmented Dickey-Fuller (ADF) test [18], which is well known and widely used in empirical work. The test model can be specified as

$$\begin{aligned} \varDelta {y_t} = {\alpha _0} + {\alpha _1}{y_{t - 1}} + \sum \limits _{i = 1}^p {{\alpha _{2i}}\varDelta } {y_{t - i}} + \mathrm{{ }}{\varepsilon _t} \end{aligned}$$
(1)

where \(y_t\) is the time series being tested and \({\varepsilon _t}\) is the residual. The hypotheses are \({H_0}:{\alpha _1}=0\) (non-stationary) against \({H_1}:{{\alpha _1}}< 0\) (stationary).
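To make the test regression (1) concrete, the sketch below estimates it by OLS and returns the t-ratio on \({\alpha _1}\). It is an illustration only: the function name `adf_t_statistic` and the simulated series are assumptions, and the resulting t-ratio has to be compared with Dickey-Fuller critical values rather than with the standard t distribution.

```python
import numpy as np


def adf_t_statistic(y, p):
    """OLS estimate of Eq. (1) and the t-ratio on alpha_1.

    y: 1-D array of observations; p: number of lagged differences.
    Returns (alpha_1_hat, t_ratio).
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                      # dy[j] = y[j+1] - y[j]
    n = len(dy) - p                      # usable observations
    target = dy[p:]                      # Delta y_t
    columns = [np.ones(n), y[p:-1]]      # intercept and y_{t-1}
    for i in range(1, p + 1):
        columns.append(dy[p - i:len(dy) - i])   # Delta y_{t-i}
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # OLS covariance matrix
    return coef[1], coef[1] / np.sqrt(cov[1, 1])


# Simulated stationary AR(1) series (hypothetical data, seed fixed).
rng = np.random.default_rng(0)
shocks = rng.standard_normal(500)
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.5 * ar1[t - 1] + shocks[t]
alpha_1, t_ratio = adf_t_statistic(ar1, p=1)
```

For a stationary series such as this one, the estimate of \({\alpha _1}\) is clearly negative and the t-ratio is far below zero, which leads to rejection of the unit root null.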

Next, the Phillips-Perron (PP) test [17] has been frequently used as an alternative to the ADF test. The test employs the same null hypothesis of non-stationarity as the ADF test. Its advantage is that additional lags of the dependent variable are not required in the presence of serial correlation. Additionally, it is robust to the functional form of the error term in the model, since the test is non-parametric. However, the test relies on large-sample properties and therefore requires a large sample in order to perform well.

Unlike the ADF test and the PP test, the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test introduced by Kwiatkowski et al. [16] has the null hypothesis of stationarity. By reversing the null hypothesis, the KPSS test complements the other unit root tests.

By looking at the results from each test, we can have a better view before making a conclusion on whether the series is stationary, non-stationary, or inconclusive. This is important since the stationarity of the series is required for the forecasting models considered in the study. Now, we are going to describe the five approaches that incorporate higher frequency data in forecasting lower-frequency variables.

3.2 Time-Aggregate Model

Traditionally, when one works on forecasting with mixed frequency data, all series must be converted into the same frequency. That is, all series are transformed to match the frequency of the series observed at the lowest frequency. As pointed out by Armesto et al. [1], this can easily be done by averaging the values of the high frequency data within the time frame of the low frequency data. For instance, suppose that variable X is measured monthly and Y is observed quarterly. Then, X is transformed to match the frequency of Y by taking the average of X in each respective quarter. After the transformation, we can use the new X to help predict Y. This is the so-called time-aggregate model. Suppose that we are interested in a one-step forecast; the model can then be specified as

$$\begin{aligned} Y_t=\alpha +\sum ^p_{i=1}{{\beta }_iL^iY_t}+\sum ^r_{j=1}{{\gamma }_jL^j{\overline{X}}_t}+{\varepsilon }_t \end{aligned}$$
(2)

with

$$\begin{aligned} {\overline{X}}_t=\frac{1}{m}\sum ^{m-1}_{k=0}{X^{\left( m\right) }_{t-\left( {k}/{m}\right) }} \end{aligned}$$
(3)

where \(Y_t\) is a lower-frequency variable; \(X^{\left( m\right) }_{t-\left( {k}/{m}\right) }\) denotes the observation of the high frequency variable \(k\) periods prior to the low frequency period t; m is the frequency ratio between the high and low frequency series (in the case of quarterly and monthly data, \(m=3\), since the monthly variable is observed three times within each quarter); \({\overline{X}}_t\) is the average of \(X^{\left( m\right) }_{t-\left( {k}/{m}\right) }\) over the low frequency period t. L is a lag operator such that \(LY_t=Y_{t-1}\), \(L^2Y_t=Y_{t-2}\) and so on. The lag lengths p and r are selected by the Akaike Information Criterion (AIC). This approach is limited by the fact that it assumes the coefficients of \(X^{\left( m\right) }_{t-\left( {k}/{m}\right) }\) within each period t to be the same. In addition, there may be information loss due to the averaging [14].
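The averaging step in (3) can be illustrated with a short sketch. The function name `quarterly_average` and the sample values are hypothetical; the monthly series is assumed to start in the first month of a quarter and to contain only complete quarters.

```python
def quarterly_average(monthly, m=3):
    """Average each block of m high-frequency values, as in Eq. (3)."""
    if len(monthly) % m != 0:
        raise ValueError("length must be a multiple of m")
    return [sum(monthly[q * m:(q + 1) * m]) / m
            for q in range(len(monthly) // m)]


# Hypothetical monthly growth rates covering two quarters.
set_growth_monthly = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
x_bar = quarterly_average(set_growth_monthly)
```

Each month receives the same weight of \(1/m\), which is exactly the equal-coefficient restriction discussed above.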

3.3 MIDAS Regression Models

Ghysels et al. [10] proposed the Mixed Data Sampling (MIDAS) approach to deal with different sampling frequencies in a multivariate model. In particular, a MIDAS regression explains a low-frequency variable using higher frequency explanatory variables through a parsimonious distributed lag. It does not rely on prior aggregation of the data, and the coefficients on the lagged explanatory variables are modelled as a distributed lag function, which allows long lags with only a small number of parameters to be estimated [5]. The general form of the MIDAS model is given by

$$\begin{aligned} Y_t=\alpha +\gamma W\left( \theta \right) X^{\left( m\right) }_{t-h}+{\varepsilon }_t \end{aligned}$$
(4)

where \(X^{\left( m\right) }_{t-h}\) is an exogenous variable measured at a higher frequency than \(Y_t\), and h is the forecasting step. If \(h=1\), we forecast the dependent variable one period ahead using current and historical information on X. \(W\left( \theta \right) \) smooths the historical values of \(X^{\left( m\right) }_{t-h}\). Unlike the time-aggregate model, which simply takes the average, the weighting scheme here is controlled by the estimated parameter \(\theta \), which allows the variable to be converted more efficiently. It can be written as

$$\begin{aligned} W\left( \theta \right) =\sum ^K_{k=1}{\omega \left( k;\theta \right) L^{\left( k-1\right) /m}} \end{aligned}$$
(5)

where K is the optimal number of lags of the high frequency variable employed in the model, and \(L\) is the lag operator such that

$$\begin{aligned} L^{\left( k-1\right) /m}X^{\left( m\right) }_{t-h}=X^{\left( m\right) }_{t-h-\left( \frac{k-1}{m}\right) }, \end{aligned}$$

and \(\omega \left( k;\theta \right) \) is the weighting function that can be in various forms. It can be noticed that it is possible to include the lagged dependent variable into the MIDAS model. Tungtrakul et al. [19] found that it provides a better forecast accuracy. Hence, the general form of MIDAS model becomes

$$\begin{aligned} Y_t=\alpha +\sum ^p_{i=h}{{\beta }_iL^iY_{t-h}}+\gamma \left( \sum ^K_{k=1}{\omega \left( k;\theta \right) L^{\left( k-1\right) /m}X^{\left( m\right) }_{t-h}}\right) +{\varepsilon }_t \end{aligned}$$
(6)

Now, we will discuss the different MIDAS weighting schemes employed in this study.

3.3.1 Step Weighting Scheme

Rather than transforming the high frequency variable to match the lower one, this approach directly includes all lags of high frequency variable into the model. It takes each lagged high frequency variable as an explanatory variable in the model. Thus, no information has been lost. The MIDAS model under step weighting scheme with step length of s can be specified as follows:

$$\begin{aligned} Y_t=\alpha + \sum ^p_{i=h}{{\beta }_i Y_{t-i}}+\sum ^K_{k=1}{{\gamma }_{k,s} X^{\left( m\right) }_{t-h-(k-1)/m}}+{\varepsilon }_t. \end{aligned}$$
(7)

However, this approach puts a restriction on the coefficients of the lagged high frequency variable (\({\gamma }_{k,s}\)), which is determined by the step parameter (s). For instance, if the step parameter is equal to three (\(s=3\)), the first three lags share the same coefficient, and the next three lags share another common coefficient. This pattern continues up to the last lag incorporated in the model.

For demonstration purpose, consider the case that \(p=1\), \(h=1\), \(m=3\), \(K=4\), and \(s=2\), then the MIDAS model with step weighting scheme can be specified as

$$\begin{aligned} Y_t=\alpha + {\beta }_1 Y_{t-1}+ {\gamma }_{1,2} X^{\left( 3\right) }_{t-1}+{\gamma }_{2,2} X^{\left( 3\right) }_{t-1-{1/3}}+{\gamma }_{3,2} X^{\left( 3\right) }_{t-1-{2/3}}+{\gamma }_{4,2} X^{\left( 3\right) }_{t-2}+{\varepsilon }_t. \end{aligned}$$
(8)

If \(Y_t\) is the GDP growth for the third quarter of 2017, then \(X^{\left( 3\right) }_{t-1}\) is the value of an indicator from June 2017, \(X^{\left( 3\right) }_{t-1-{1/3}}\) is from May 2017, \(X^{\left( 3\right) }_{t-1-{2/3}}\) is from April 2017, and \(X^{\left( 3\right) }_{t-2}\) is from March 2017. The restrictions on the parameters are \({\gamma }_{1,2} = {\gamma }_{2,2}\) and \({\gamma }_{3,2} = {\gamma }_{4,2}\).
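Because lags within a step share one coefficient, the restricted sum in (7) equals a regression on block sums of the lags. The sketch below illustrates this for the case \(K=4\), \(s=2\); the function name `step_regressors` and the numerical values are hypothetical.

```python
def step_regressors(lags, s):
    """Collapse K high-frequency lags into ceil(K/s) block sums.

    lags: list [X_{t-h}, X_{t-h-1/m}, ...] of length K for one period t.
    s: step length; lags within the same block share one coefficient.
    """
    return [sum(lags[start:start + s]) for start in range(0, len(lags), s)]


# K = 4, s = 2 as in Eq. (8); hypothetical values for June, May, April, March.
lags = [1.5, 0.5, -1.0, 2.0]
blocks = step_regressors(lags, s=2)

# With gamma_1 = gamma_2 = 0.3 and gamma_3 = gamma_4 = -0.2 (hypothetical),
# the restricted sum over four lags equals the sum over two block regressors.
full_sum = 0.3 * lags[0] + 0.3 * lags[1] - 0.2 * lags[2] - 0.2 * lags[3]
block_sum = 0.3 * blocks[0] - 0.2 * blocks[1]
```

Regressing \(Y_t\) on these block sums by least squares therefore yields one coefficient per step rather than one per lag.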

Another drawback of the step weighting scheme is that the model may suffer from a large number of parameters when the difference in frequency between the high and low frequency series is large [1]. Suppose that we work with annual and monthly series; we then have at least 12 coefficients to be estimated. Thus, the estimation outcome may not be satisfactory.

3.3.2 Exponential Almon Weight

This weighting scheme has been employed in various empirical studies due to its flexibility despite involving only a few parameters in estimation [8]. The weighting scheme can be specified as

$$\begin{aligned} \omega \left( k;\theta \right) =\frac{exp\left( k{\theta }_1+k^2{\theta }_2\right) }{\sum ^K_{j=1}{exp\left( j{\theta }_1+j^2{\theta }_2\right) }}. \end{aligned}$$
(9)

To see how the MIDAS model with the Exponential Almon weighting scheme is specified, let us consider the case in which the optimal number of lags of the high frequency variable is 3 (\(K=3\)), the frequency ratio between the high and low frequency variables is 3 (\(m=3\)), there is no lagged dependent variable, and the forecast is one step ahead (\(h=1\)). The model can then be written as follows

$$\begin{aligned} Y_t=\alpha +\gamma \left( \begin{array}{c} \frac{exp\left( {\theta }_1+{\theta }_2\right) }{\sum ^3_{k=1}{exp\left( k{\theta }_1+k^2{\theta }_2\right) }}\left( X^{\left( 3\right) }_{t-1}\right) \\ +\frac{exp\left( 2{\theta }_1+4{\theta }_2\right) }{\sum ^3_{k=1}{exp\left( k{\theta }_1+k^2{\theta }_2\right) }}\left( X^{\left( 3\right) }_{t-1-\left( \frac{1}{3}\right) }\right) \\ +\frac{exp\left( 3{\theta }_1+9{\theta }_2\right) }{\sum ^3_{k=1}{exp\left( k{\theta }_1+k^2{\theta }_2\right) }}\left( X^{\left( 3\right) }_{t-1-\left( \frac{2}{3}\right) }\right) \end{array} \right) +{\varepsilon }_t. \end{aligned}$$
(10)

Suppose that \(Y_t\) is measured in the 4th quarter of 2017; then \(X^{\left( 3\right) }_{t-1}\) is from September 2017, \(X^{\left( 3\right) }_{t-1-\left( \frac{1}{3}\right) }\) is from August 2017 and \(X^{\left( 3\right) }_{t-1-\left( \frac{2}{3}\right) }\) is from July 2017. \(\alpha \), \(\gamma \), \({\theta }_1\), and \({\theta }_2\) can be estimated using either the maximum likelihood approach or the non-linear least squares (NLS) approach. Ghysels et al. [10] pointed out that the number of parameters in the MIDAS model with exponential Almon weights is not influenced by the number of lags of the high frequency variable. This important feature of the MIDAS regression model allows us to employ a large number of high frequency lags and, at the same time, maintain parsimonious parameter estimation [1, 12].
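The normalised exponential weights of (9) and the weighted aggregate inside the brackets of (10) can be sketched as follows. The function names `exp_almon_weights` and `midas_term` and the parameter values are hypothetical; in practice \({\theta }_1\) and \({\theta }_2\) would be estimated rather than fixed.

```python
import math


def exp_almon_weights(K, theta1, theta2):
    """Exponential Almon weights for lags k = 1..K; they sum to one."""
    scores = [math.exp(k * theta1 + k * k * theta2) for k in range(1, K + 1)]
    total = sum(scores)
    return [s / total for s in scores]


def midas_term(lags, weights):
    """Weighted sum of high-frequency lags, most recent lag first."""
    return sum(w * x for w, x in zip(weights, lags))


flat = exp_almon_weights(3, 0.0, 0.0)         # equal weights
declining = exp_almon_weights(3, 0.0, -0.5)   # weights decay with the lag
aggregate = midas_term([3.0, 6.0, 9.0], flat)
```

Setting both parameters to zero gives equal weights, so the time-aggregate average is a special case; a negative \({\theta }_2\) places more weight on recent months. Whatever the value of K, only two weighting parameters are involved.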

3.3.3 Beta Weight

The Beta weight is another weighting scheme, which is an analogue of the probability density function. It has been considered in empirical work as an alternative to the exponential Almon weight [11]. According to Armesto et al. [1], this weighting scheme can be specified as follows

$$\begin{aligned} \omega \left( k;\theta \right) =\frac{f\left( \frac{k}{K},{\theta }_1,{\theta }_2\right) }{\sum ^K_{j=1}{f\left( \frac{j}{K},{\theta }_1,{\theta }_2\right) }}, \end{aligned}$$
(11)

where

$$\begin{aligned} f\left( x,a,b\right) =\frac{x^{a-1}{\left( 1-x\right) }^{b-1}\mathrm {\Gamma }\left( a+b\right) }{\mathrm {\Gamma }\left( a\right) \mathrm {\Gamma }\left( b\right) }, \end{aligned}$$
(12)

and

$$\begin{aligned} \mathrm {\Gamma }\left( a\right) =\int ^{\infty }_0{e^{-x}x^{a-1}\ dx} \end{aligned}$$
(13)

is the gamma function. \({\theta }_1\) and \({\theta }_2\) are parameters that control the weighting value for each lagged high frequency variable.
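A minimal sketch of the Beta weights in (11) and (12) is given below. The function names are hypothetical, and the sketch assumes \({\theta }_1\ge 1\) and \({\theta }_2\ge 1\) so that the density is finite at \(k/K=1\); it also assumes that at least one weight is non-zero before normalisation.

```python
import math


def beta_density(x, a, b):
    """Beta density f(x, a, b) of Eq. (12)."""
    return (x ** (a - 1) * (1 - x) ** (b - 1)
            * math.gamma(a + b) / (math.gamma(a) * math.gamma(b)))


def beta_weights(K, theta1, theta2):
    """Normalised Beta weights of Eq. (11) for lags k = 1..K."""
    values = [beta_density(k / K, theta1, theta2) for k in range(1, K + 1)]
    total = sum(values)
    return [v / total for v in values]


uniform = beta_weights(4, 1.0, 1.0)     # a = b = 1 gives equal weights
declining = beta_weights(4, 1.0, 5.0)   # weights fall as the lag grows
```

With the evaluation points \(k/K\) used in (11), the last lag sits at \(x=1\) and therefore receives zero weight whenever \({\theta }_2>1\). The gamma-function factor cancels in the normalisation, so it affects the density but not the weights.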

3.4 Unrestricted MIDAS (U-MIDAS) Model

It can be noticed that MIDAS models with weighting schemes may not completely extract all information from the high frequency variable [14], since they still involve a frequency transformation. Kingnetr et al. [14] further asserted that the forecasting outcome may be satisfactory in the MIDAS model with exponential Almon weights when the difference in sampling frequencies between the variables in the study is relatively small. Additionally, the model requires an assumption on the weighting scheme, which may or may not be appropriate for every series. The MIDAS model with exponential Almon weights may work well with one series, but not with another. Foroni and Marcellino [8] suggested an alternative approach, the unrestricted MIDAS (U-MIDAS) regression model, to deal with this issue.

The basic idea of U-MIDAS model is similar to the MIDAS with step weighting scheme, except that the coefficient of each lagged high frequency variable is allowed to differ. Suppose that low frequency data is measured quarterly, while the high frequency is measured monthly, the U-MIDAS model for h-step forecasting can be written as

$$\begin{aligned} Y_t=\alpha + \sum ^p_{i=1}{{\beta }_i Y_{t-h-i}}+\sum ^K_{k=1}{{\gamma }_k X^{\left( m\right) }_{t-h-(k-1)/m}}+{\varepsilon }_t. \end{aligned}$$
(14)

\(Y_t\) is a quarterly variable at period t, \(X^{\left( m\right) }_{t-h-(k-1)/m}\) is a monthly indicator measured at \(k-1\) months prior to the last month of the quarter at period \(t-h\), h is the forecasting step, m is a frequency ratio, K is a number of monthly data used to predict \(Y_t\).

By taking each lagged high frequency variable as an additional explanatory variable in the model, the parameters of the U-MIDAS model can simply be estimated by OLS [8]. However, the U-MIDAS model loses its parsimony if the frequency ratio between the high and low frequency variables is large. For instance, forecasting a monthly series using a daily series involves more than 20 parameters to be estimated, which would lead to undesirable estimation and forecasting results.
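The construction of the U-MIDAS regressors and their estimation by OLS can be sketched as follows. This is an illustration under stated assumptions: the function name `umidas_design` is hypothetical, the autoregressive terms of (14) are omitted for brevity, the monthly series is assumed to start in the first month of the first quarter, and the data are simulated without noise so that the recovered coefficients can be checked.

```python
import numpy as np


def umidas_design(x_monthly, n_quarters, K, h=1, m=3):
    """Build [1, X_{t-h}, X_{t-h-1/m}, ...] for each usable quarter t.

    Returns the design matrix and the indices of the quarters it covers.
    """
    x = np.asarray(x_monthly, dtype=float)
    rows, quarters = [], []
    for t in range(n_quarters):
        last = m * (t - h) + (m - 1)      # last month of quarter t-h
        first = last - (K - 1)            # oldest monthly lag used
        if first < 0:
            continue                      # not enough history yet
        rows.append([1.0] + [x[last - (k - 1)] for k in range(1, K + 1)])
        quarters.append(t)
    return np.array(rows), np.array(quarters)


# Simulated data (hypothetical): 40 quarters, 120 months, K = 3 lags.
rng = np.random.default_rng(1)
T, K = 40, 3
x = rng.standard_normal(3 * T)
X, used = umidas_design(x, T, K)
true_coef = np.array([0.5, 1.0, -0.4, 0.2])   # alpha, gamma_1..gamma_3
y = X @ true_coef                              # noise-free target
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimate
```

Each monthly lag receives its own coefficient, so the number of parameters grows one-for-one with K; this is the source of the parsimony problem noted above when the frequency ratio is large.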

4 Empirical Results

In this section, we begin with the results of unit root tests, followed by the results from each forecasting model considered in the study and discussion on their forecasting performances.

Table 1. Unit root tests

The results of the unit root tests are reported in Table 1. For GDP growth, the null hypothesis of non-stationarity cannot be rejected by the ADF test without intercept (specification C). However, the rest of the tests indicate that GDP growth is stationary. Similarly, all tests, except for the ADF test with trend and intercept, conclude that SET growth is stationary. Therefore, it is reasonable to conclude that both series are stationary and can be used for model estimation and forecasting.

For model estimation, the data sample from 2001Q1 to 2015Q4 is employed. Linear least squares is used to estimate the parameters of the time-aggregate model, while the parameters of the MIDAS models are estimated by non-linear least squares. The optimal lag lengths for all models are chosen by the Akaike information criterion (AIC) with a maximum of 24 lags. Then, we forecast the quarterly GDP growth rates for 2016. Since we are interested in comparing forecasting performance between models, it is useful to first inspect their performance graphically. Figure 3 plots the actual quarterly GDP growth rate and its forecast values from the different models.

Fig. 3. Forecast and actual quarterly GDP growth in 2016

It can be seen from Fig. 3 that the time-aggregate model, MIDAS model with Beta weighting, and U-MIDAS model seem to predict the GDP growth rate closer to the actual values than the MIDAS models with exponential Almon and step weighting schemes. However, as the forecasting horizon expands, the former three models seem to perform worse than the latter two. Table 2 provides forecasting results in details together with lag selection for each model.

Table 2. Quarterly GDP growth forecast in 2016

From these results alone, it is still unclear which model generally performs better. Therefore, we now turn to the root mean square error (RMSE) for evaluation. Table 3 shows the RMSEs for each model at each forecasting horizon.
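For reference, the RMSE criterion can be computed as in the sketch below. The function name `rmse` and the numbers are hypothetical and are not taken from Tables 2 or 3.

```python
import math


def rmse(actual, forecast):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(forecast):
        raise ValueError("sequences must have the same length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical four-quarter evaluation window.
actual = [3.0, 3.5, 3.0, 3.0]
forecast = [2.0, 3.5, 4.0, 3.0]
error = rmse(actual, forecast)
```

A lower RMSE indicates forecasts that are closer to the realised values on average, with large errors penalised more heavily than small ones.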

Table 3. Forecast evaluation based on RMSEs

The results from Table 3 suggest that, overall, the unrestricted MIDAS (U-MIDAS) model exhibits higher forecasting accuracy than the rest of the models in this study. This conclusion is consistent with the recent empirical work by [14]. The superior forecasting precision may be due to the fact that, in the U-MIDAS framework, the information in the high frequency variable is fully utilised. The results also suggest that the forecasting improvement is rather moderate when the MIDAS models with weighting schemes are compared with the traditional time-aggregate model. According to the RMSEs, only the MIDAS model with the beta weighting scheme outperforms the time-aggregate model in this study. This implies that using higher frequency data will not improve the outcome if an inappropriate weighting scheme is chosen. Nevertheless, we can conclude that using a high-frequency variable to predict a lower frequency one improves forecasting precision under the U-MIDAS model, provided that the difference in frequency between the series in a study is small.

5 Conclusion

In this paper, we investigate the forecasting performance of five different forecasting models: the time-aggregate model, the MIDAS model with step weighting, exponential Almon weighting, and beta weighting, and the U-MIDAS model. Thailand's quarterly GDP growth was forecast using a financial variable, the SET index, as a predictor. Unlike the time-aggregate model, the MIDAS model with a weighting scheme allows us to utilise the information in the high frequency variable efficiently to forecast the lower frequency variable. However, it still involves the concept of frequency conversion, as in the time-aggregate model, via the weighting scheme.

On the other hand, the U-MIDAS model fully exhausts information of high frequency variable. The model directly incorporates high frequency variable into forecasting model without frequency conversion. The data in this study spans from 2001Q1 to 2016Q4 with 2016Q1 to 2016Q4 being left out for forecasting performance evaluation. Our results, based on RMSEs, show that the U-MIDAS model has greater forecasting precision than other models in this study. This implies that, under the U-MIDAS framework, using high frequency variable to predict lower frequency variable improves the forecasting accuracy.

In addition, we found that the improvement from using a higher frequency variable to predict a lower frequency variable is rather small for the MIDAS model with a weighting scheme. The forecasting results may even be worse if the weighting scheme is not appropriately chosen. If one wishes to employ such a MIDAS model, the results suggest that the MIDAS model with the Beta weighting scheme performs best among the weighting schemes. Otherwise, the traditional time-aggregate model seems to provide acceptable predictive accuracy at short horizons.

Nevertheless, this study focused on four-period forecasting using a single predictor and ignored the possibility of structural breaks in the time series due to the limitations of the approaches. Therefore, recommendations for future research include additional predictors in the model, a longer forecasting horizon, and controls for potential structural breaks.