1 Introduction

Gross domestic product (GDP) is one of the most important indicators summarizing economic activity. However, it is unavailable at frequencies higher than quarterly and is released with a significant lag. Given the importance of a timely assessment of current and future economic developments, forecasting GDP is an important task for policymakers. To forecast GDP, analysts usually track timelier, higher-frequency indicators, including monthly, weekly or daily data. The literature has pointed out that financial market variables have leading properties for the future state of the economy (Stock and Watson 2003). In this study, we focus on quarter-on-quarter Turkish GDP growth between 2000-Q2 and 2016-Q2 and generate nowcasts and forecasts of it by employing high-frequency financial data in mixed data sampling (MIDAS) models. We evaluate the forward-looking nature of financial variables for the Turkish economy over the period from the second quarter of 2010 to the second quarter of 2016.

When high-frequency data are used to forecast GDP, mapping movements in these indicators into real GDP growth is not an easy task because the data are sampled at mixed frequencies. For example, GDP data are sampled quarterly, employment and inflation data are sampled monthly, and asset price data are sampled daily. The simplest approach is to aggregate the high-frequency data to obtain a balanced data set at the same low frequency. For instance, a forecaster may time aggregate three monthly observations of employment data into a single observation for each quarterly observation of GDP by taking an average of the monthly data. Although this approach is simple, temporal aggregation in general entails a loss of information (Marcellino 1999). Moreover, it also changes the data generating mechanism, which may lead to considerable differences between the dynamics of the aggregate model and those of the high- or mixed-frequency model. This indicates that key econometric features can be spuriously modified when aggregated data are employed (Marcellino 1999). In this paper, we follow the MIDAS approach developed by Ghysels et al. (2004). MIDAS models are one of the alternative approaches to modeling mixed-frequency data that avoid the problems of temporal aggregation mentioned above. They have parsimonious specifications based on distributed lag polynomials, which flexibly deal with data sampled at different frequencies and provide a direct forecast of the low-frequency variable (Ghysels et al. 2004; Clements and Galvão 2008).

The advantages of using MIDAS regressions to improve quarterly macroeconomic forecasts with monthly data, or to improve quarterly and monthly macroeconomic predictions with a small set of daily financial data, have been investigated in the literature (Kuzin et al. 2011; Armesto et al. 2009; Clements and Galvão 2008, 2009; Galvão 2013; Tay 2007; Schumacher and Breitung 2008; Ghysels and Wright 2009; Hamilton 2008; Monteforte and Moretti 2013). However, these studies limit the number of variables in the regression and do not incorporate large data sets into their analysis. In contrast to these works, Andreou et al. (2013) use a large cross section of around one thousand financial time series to forecast the quarterly US GDP growth rate. They extract a small set of daily factors from this large dataset and apply forecast combination methods. Their application shows that MIDAS model forecasts are able to outperform naive models substantially.

Most of the existing studies employing MIDAS regression models to forecast quarterly series focus on the US or the euro area economy (Tay 2007; Clements and Galvão 2009), and few focus on developing economies. In this study, we take Turkey as our laboratory because it is a small open economy that exhibits macroeconomic dynamics similar to those of other emerging economies. Assessing the ability of financial market variables to anticipate Turkish output growth may provide valuable information about the forward-looking nature of financial variables in peer countries.

The literature on forecasting Turkish GDP is quite limited. In one of the rare studies, Akkoyun and Günay (2012) employ a small-scale dynamic factor model. They forecast quarterly GDP growth with monthly data by following a Kalman filter approach. They find that some combinations of hard indicators, such as industrial production and export and import quantity indices, and soft indicators, such as the Purchasing Managers Index (PMI) and PMI new orders, have predictive power for output growth. Sacakli-Sacildi (2015) tests the forecasting performance of the Bayesian vector autoregression (BVAR) methodology for the Turkish GDP growth rate. However, to the best of our knowledge, MIDAS regression modeling has not been used for forecasting Turkish GDP growth, and this paper is the first attempt to explore the forecasting ability of financial data using MIDAS regression models for Turkish output growth. In this study, we follow the forecasting strategy introduced by Andreou et al. (2013) and generate nowcasts and forecasts of Turkish GDP growth using 204 daily financial series. We analyze whether financial data provide information about the future state of the economy and improve forecasting ability.

Our results suggest that there are statistically significant gains from using daily financial data and MIDAS models. MIDAS models with daily series and quarterly factors perform better than the benchmark model and other linear models at every horizon considered. The robustness of the results is supported by different predictive ability tests. Another implication of the results is that daily factors extracted from commodities, forex and domestic corporate risk series give the best forecasts at certain horizons, and at the others, their results are competitive with those of other financial data classes. The results also reveal that a combination of a small set of daily factors extracted from separate financial data classes provides substantial forecasting gains, especially in the medium run. This implies that a small set of series can be employed to track the Turkish economy.

The article is organized as follows. Section 2 describes the MIDAS models and the forecasting methodology. Section 3 describes the dataset, and Sect. 4 presents the empirical results. Finally, Sect. 5 concludes.

2 Forecasting methodology

This section describes the MIDAS models and forecast combination methods that are used in the study. The aim of the section is to give the intuition and details of MIDAS models and how they are implemented in macroeconomic forecasting. The model is compared and contrasted with other methods in the literature to give a clear picture of its flexibility as well as its limitations. To improve forecast performance, the study uses forecast combination methods. The second part of the section explains the forecast combination methods and how they are used.

2.1 MIDAS models

In order to understand the innovations brought by MIDAS to single-equation estimation with mixed-frequency data, first consider a distributed lag (DL) model, which is typically employed in the literature to describe the distribution over time of the lagged effects of a change in the explanatory variable. In this linear regression method, all the high-frequency values are first aggregated to the corresponding low-frequency values:

$$\begin{aligned} Y_{t} = \beta _0 + \beta (L)X_t + \varepsilon _t, \end{aligned}$$
(1)

where \(\beta (L)\) is some finite or infinite lag polynomial operator, \(Y_t\) is the low-frequency variable, \(X_t\) is the high-frequency variable and \(\varepsilon _t\) is the error term. DL regression involves temporally aggregated series, based, for example, on equal weights of daily data. In this setup, reliable estimation of the low-frequency variable requires the correct form of aggregation of the high-frequency variable into the low frequency. Therefore, a crucial drawback of the model is that the imposed aggregation scheme might lead to severe aggregation bias.

A potential solution to the aggregation bias is to abandon aggregation and project \(Y_t\) on, for instance, every observation of the high-frequency variable for each lag of the low frequency. This approach might work well when the fixed number of high-frequency periods in a single low-frequency period is small (e.g., 3, as in the estimation of a quarterly variable with monthly data). However, in a quarterly low-frequency and daily high-frequency setting, for instance, even if only one lag of the quarterly data is used, there are more than 60 parameters to estimate, one for each day in the lagged quarter. Thus, parameter proliferation is another potential problem with single-equation models.

The novelty of MIDAS models lies in how they handle high-frequency data. In its simplest form, the novelty can be described as aggregating data parsimoniously and allowing the data to decide on the aggregation weights. MIDAS does this by using polynomial distributed lag functions to weight the high-frequency data. Consider the basic MIDAS model (Footnote 1):

$$\begin{aligned} Y_t=\beta _0+\beta _1\sum _{j=0}^{(q_X-1)}\sum _{i=0}^{(m-1)}\omega (\mathbf {\theta },i+j\times m)X_{(m-i,t-j)}+\varepsilon _{(t)}, \end{aligned}$$
(2)

where \(\omega (\mathbf {\theta },i+j\times m)\) is the polynomial distributed lag function that depends on the hyperparameter vector \(\mathbf {\theta }\); \(q_X\) is the number of low-frequency lags; m is the fixed number of high-frequency periods in a single low-frequency period (e.g., 3 months in a quarter); the subscript \((m-i,t)\) denotes the ith lagged high-frequency period within the low-frequency period t; and j is the low-frequency lag.

The specification of the weights is central to MIDAS models since it gives the model its parsimony relative to models such as bridge equations and the autoregressive distributed lag (ADL) model. A number of polynomial functions are suggested in the literature, among them the beta function, the exponential Almon lag function, the step function and the Almon distributed lag polynomial. This study uses the Almon distributed lag polynomial (Footnote 2), in which the weights can be written as

$$\begin{aligned} \omega (\mathbf {\theta },i)=\sum _{p=0}^{n_p}\theta _p i^p, \end{aligned}$$
(3)

and assumes that the weight of the \(i\mathrm{th}\) lag can be calculated from the underlying \((n_p+1)\) hyperparameters \(\mathbf {\theta }=(\theta _0,\ldots ,\theta _{n_p})\). Now, assume that one wants to estimate a quarterly variable with daily data and that a quarter consists of 66 working days. Instead of assigning a separate weight to each lag and estimating 66 coefficients, the Almon lag polynomial solves the parameter proliferation problem with two or three hyperparameters. Furthermore, the weights are data driven, which prevents aggregation bias. The parameters of the MIDAS regression model are estimated by nonlinear least squares.
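To make the parsimony argument concrete, the following is a minimal Python sketch of the Almon weighting scheme in Eq. 3. The function names and the numerical values of the hyperparameters are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
def almon_weights(theta, n_lags):
    """Almon polynomial weights w(theta, i) = sum_p theta_p * i**p for i = 0..n_lags-1."""
    return [sum(t * i ** p for p, t in enumerate(theta)) for i in range(n_lags)]


def midas_aggregate(x_lags, theta):
    """Weighted sum of high-frequency lags; x_lags[0] is the most recent observation."""
    weights = almon_weights(theta, len(x_lags))
    return sum(w * x for w, x in zip(weights, x_lags))


# Hypothetical hyperparameters (illustration only, not estimated values).
theta = [1.0, -0.03, 0.0002]

# 66 daily weights are generated from only 3 hyperparameters.
weights = almon_weights(theta, 66)
```

In an actual estimation, the hyperparameters would be chosen by the estimation routine rather than fixed by hand; the sketch only shows how a handful of parameters spans a full quarter of daily lags.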

Ghysels et al. (2004) show that MIDAS is always more efficient than simple data aggregation. MIDAS provides a reduced-form method that does not require the complexity of alternative methods in the literature such as Kalman filters. The performance of MIDAS models against state-space models has been discussed by Andreou et al. (2011) and Bai et al. (2013). These studies show that Kalman filters are more efficient in simple frameworks. However, comparisons on more complex problems imply that MIDAS models are less “prone” to specification errors and are easier to implement with a few parameters. This is why in a data-rich environment, such as the empirical study of Andreou et al. (2013) with over 900 daily financial series, MIDAS can be preferred as the forecasting model.

2.2 Forecasting with MIDAS

This study deals with the problem of forecasting GDP growth in a data-rich environment and follows the methodology sketched in detail by Andreou et al. (2013). For h-step-ahead forecasts considering the persistence of growth in an economy, MIDAS Eq. 2 can be written as follows:

$$\begin{aligned} Y_{t+h}=\beta _0^h+\sum _{k=0}^{(q_Y-1)}\rho _k^hY_{(t-k)}+\beta _1^h\sum _{j=0}^{(q_X-1)}\sum _{i=0}^{(m-1)}\omega (\mathbf {\theta }^h,i+j\times m)X_{(m-i,t-j)}+\varepsilon _{(t+h)}^h, \end{aligned}$$
(4)

where \(\beta _0^h\) is the constant coefficient, \(\beta _1^h\) is the common slope, \(\omega (\mathbf {\theta }^h,i+j\times m)\) is the weight for the lagged high-frequency values, \(q_Y\) is the number of autoregressive lags, \(\rho _k^h\) are the coefficients corresponding to these lags and \(\varepsilon _{(t+h)}^h\) is the error term. Equation 4 is the autoregressive distributed lag MIDAS (ADL-MIDAS) model.

As can be seen in Eq. 4, MIDAS forecasts depend on the forecast horizon; thus, the model has to be re-estimated for each horizon of interest, and it gives direct forecasts for each of these horizons. This property makes MIDAS more robust than methods that use iterated forecasting, since iterated forecasting might accumulate model misspecification errors and produce inferior forecasts (Marcellino et al. 2006; Andreou et al. 2011).

As a deviation from the basic MIDAS model, factors extracted by principal component analysis (PCA) are included in the forecast regressions. As noted by Stock and Watson (2002) and Andreou et al. (2013), quarterly factors (i) reduce the dimensionality of the variables at hand and (ii) might improve the performance of macroeconomic forecasts. PCA is chosen as the method of factor extraction following Andreou et al. (2013). Marcellino and Schumacher (2010) focus on three factor estimators. They find no clear indication of dramatic differences over time between the nowcast accuracy of the three factor models (Footnote 3).

Two types of factors are included in the regressions. The first type of factors is extracted from quarterly macroeconomic series and included in the regressions as the following:

$$\begin{aligned} Y_{t+h}&=\beta _0^h+\sum _{k=0}^{(q_Y-1)}\rho _k^hY_{(t-k)}+\sum _{k=0}^{(q_F-1)}\alpha _k^hF_{(t-k)}\\&\quad +\,\beta _1^h\sum _{j=0}^{(q_X-1)}\sum _{i=0}^{(m-1)}\omega (\mathbf {\theta }^h,i+j\times m)X_{(m-i,t-j)}+\varepsilon _{(t+h)}^h, \end{aligned}$$
(5)

where \(F_{t-k}\) represents quarterly factors and \(\alpha _k^h\) are the corresponding coefficients. Models that include quarterly factors are called factor autoregressive distributed lag MIDAS (FADL-MIDAS) models throughout the study.

The second type of factors is extracted from daily financial series. Estimations are carried out with factors extracted both from all series and from separate financial data classes. These factors are high-frequency variables and are inserted as \(X_{(m-i,t-j)}\) in Eqs. 4 and 5.

In the empirical part of the study, ADL-MIDAS and FADL-MIDAS are compared with their non-MIDAS counterparts (i.e., ADL and FADL) in order to measure the added value of the MIDAS method. Let \(X_t^{Q}\) be the aggregation of the high-frequency data for quarter t, then the counterpart models can be given as:

$$\begin{aligned} Y_{t+h}&=\beta _0^h+\sum _{k=0}^{(q_Y-1)}\rho _k^hY_{(t-k)}+\sum _{j=0}^{(q_X-1)}\beta _j X_{(t-j)}^Q+\varepsilon _{(t+h)}^h, \end{aligned}$$
(6)
$$\begin{aligned} Y_{t+h}&=\beta _0^h+\sum _{k=0}^{(q_Y-1)}\rho _k^hY_{(t-k)}+\sum _{k=0}^{(q_F-1)}\alpha _k^hF_{(t-k)}+\sum _{j=0}^{(q_X-1)}\beta _j X_{(t-j)}^Q+\varepsilon _{(t+h)}^h. \end{aligned}$$
(7)

Equations 6 and 7 are labeled as the ADL and factor ADL (FADL) models, respectively.

Within-quarter forecasts are made by using the MIDAS with leads model, which was first used by Clements and Galvão (2008) and Kuzin et al. (2011) in a monthly–quarterly mixed data context. To illustrate the method within the daily–quarterly context, assume that we want to obtain a 1-month-ahead forecast for a quarterly variable. In that case, we are 2 months into the quarter of interest and have financial data for 44 working days. Let \(J_x\) denote the number of months that have passed in the quarter (i.e., for the current example, \(J_x=2\)). Then, the ADL-MIDAS with leads can be written as:

$$\begin{aligned} \begin{aligned} Y_{t+h}=\beta _0^h&+\sum _{k=0}^{(q_Y-1)}\rho _k^hY_{(t-k)}+\beta _1^h\left[ \sum _{i=(3-J_x)\times m/3}^{(m-1)}\omega (\mathbf {\theta }^h,i-m)X_{(m-i,t+1)}\right. \\&\left. +\,\sum _{j=0}^{(q_X-1)}\sum _{i=0}^{(m-1)}\omega (\mathbf {\theta }^h,i+j\times m)X_{(m-i,t-j)} \phantom {\int _1^2} \right] +\varepsilon _{(t+h)}^h, \end{aligned} \end{aligned}$$
(8)

where the first term in the brackets represents the leads and the second term represents the lags (i.e., information that belongs to the previous quarters).

In the forecasting literature, nowcasting (Footnote 4) refers to making forecasts within the period and updating them when new information becomes available. Andreou et al. (2011) note two main differences between MIDAS with leads and nowcasting. First, while nowcasting gives an update for the current quarter, MIDAS with leads can provide direct forecasts not only for the current quarter but also for longer horizons. The second difference is related to how models that do nowcasting (e.g., state-space models) and MIDAS with leads handle the arrival of new information. Issues such as announcement dates, missing data and ragged ends are taken into account in state-space models. Furthermore, the effects of macroeconomic announcements can also be analyzed with Kalman filters. However, daily financial data are available on a “daily” basis and do not require updates. This study does not deal with the noted problems and assumes that all news effects are already reflected in the daily data.

2.3 Forecast combinations

As a part of the forecasting procedure, individual forecasts are combined by using forecast combination methods suggested in the literature. Combining individual forecasts by a pre-specified weighting rule has been shown to improve forecast accuracy (Bates and Granger 1969). This conclusion rests on the assumption that the individual forecasts are unbiased. The intuition behind the gain from combination is that one forecast might include information that is not considered by the others, so a better forecast can be generated by combining different information sets. In addition, as pointed out by Hendry and Clements (2004) and Stock and Watson (2004), forecast combinations can deal with model instability and structural breaks (Footnote 5).

Assume that R is the number of h-step-ahead forecasts made for \(Y_{t+h}\) at time t. Then, the combination of forecasts, \(\hat{Y}_{(t+h)}^{c_R}\), can be written as

$$\begin{aligned} \hat{Y}_{(t+h)}^{c_R}=\sum _{r=1}^{R}w_{r,t}^c \hat{Y}_{(r,t+h)}, \end{aligned}$$
(9)

where the superscript \(c_R\) denotes the combination of R forecasts, \(\hat{Y}_{(r,t+h)}\) is the forecast from model r and \(w_{r,t}^c\) is the combination weight for the forecast from model r. There are methods suggested in the literature to decide on the weights of individual forecasts. It can be as simple as assigning equal weights to each model (i.e., \(w_{r,t}^c=1/R\)). This study employs the discounted mean squared forecast error (DMSFE) forecast combination method (Footnote 6). With this method, forecast weights can be calculated as follows:

$$\begin{aligned} w_{r,t}^c=\frac{(\lambda _{r,t}^{-1})^\kappa }{\sum _{s=1}^{R}(\lambda _{s,t}^{-1})^\kappa }, \; \; \lambda _{r,t}=\sum _{\tau =T_0}^{t-h}\delta ^{t-h-\tau }(Y_{\tau +h}-\hat{Y}_{r,\tau +h})^2, \end{aligned}$$
(10)

where \(\delta \in [0,1]\) is the discount factor and \(\kappa \) is 1 or 2. The method takes into account the historical performance of a model, and as \(\delta \) gets lower, older forecast errors are discounted more heavily, so the comparative importance of recent forecast accuracy gets higher. When \(\delta =1\), the DMSFE forecast combination method boils down to the MSFE forecast combination method. In this study, \(\delta \) is taken as 0.9 and \(\kappa \) as 2 (Footnote 7).
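The weighting rule can be illustrated with a short Python sketch. It assumes that each model's weight is proportional to the inverse of its discounted sum of squared forecast errors raised to the power \(\kappa \), so that the weights sum to one and models with smaller past errors receive larger weights. The function name and the toy numbers are hypothetical.

```python
def dmsfe_weights(actuals, forecasts, delta=0.9, kappa=2):
    """
    Discounted MSFE combination weights (illustrative sketch).

    actuals:   realized values, oldest first.
    forecasts: one list of forecasts per model, aligned with actuals.
    Returns one weight per model; the weights sum to one.
    Note: a model with zero past error would cause a division by zero.
    """
    n = len(actuals)
    lambdas = []
    for model_fc in forecasts:
        # The most recent squared error gets discount delta**0 = 1,
        # older errors are discounted by increasing powers of delta.
        lam = sum(
            delta ** (n - 1 - k) * (actuals[k] - model_fc[k]) ** 2
            for k in range(n)
        )
        lambdas.append(lam)
    inverse = [(1.0 / lam) ** kappa for lam in lambdas]
    total = sum(inverse)
    return [v / total for v in inverse]


# Toy example: model A has errors of 0.5, model B has errors of 1.0.
w = dmsfe_weights([1.0, 2.0], [[1.5, 2.5], [2.0, 3.0]])
```

In this toy example, model B's discounted squared error is four times that of model A, so with \(\kappa =2\) model A receives sixteen times the weight of model B.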

One of the issues in the forecast combination literature is the selection of forecasts to combine (Timmermann 2006; Aiolfi and Timmermann 2006; Elliott et al. 2015). In the literature, there is no consensus on how to proceed. Therefore, we first combine forecasts of all series and daily factors. Then, forecasts from different financial data classes are combined to see their performances. Finally, based on their root-mean-squared forecast error (RMSFE) performances, the best series for each horizon are selected and the combination of these forecasts is analyzed.

The forecasting stage of the empirical research includes several steps. At the first step, for every daily financial series or factor and for each model (e.g., ADL-MIDAS and FADL-MIDAS), forecasts are calculated for different lag lengths. Afterward, for each high-frequency variable, the best-performing lag specification is selected based on the Akaike information criterion (AIC) and RMSFE performances. At the last step, forecasts are combined by using the DMSFE method.

2.4 Comparing predictive accuracy

Diebold and Mariano (1995) (DM) suggest a test of equal predictive ability for pairwise comparison of non-nested models. In the DM test, the predictive accuracy of forecasts is compared using squared forecast errors. Since we have a small number of forecasts, the small sample adjustment of the DM test by Harvey et al. (1997) (HLN) is employed. It is specified as:

$$\begin{aligned} \text {HLN}=\left( \frac{T+1-2h+h(h-1)/T}{T}\right) ^{1/2}\text {DM}, \end{aligned}$$
(11)

where T is the number of forecasts, h is the forecast horizon and DM is the original DM statistic. HLN follows a Student t distribution with \(T{-}1\) degrees of freedom (Harvey et al. 1997).
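A minimal Python sketch of the adjusted statistic is given below. It assumes squared-error loss, a rectangular-kernel long-run variance with \(h-1\) autocovariances for the DM statistic, and the square-root scaling factor of Harvey et al. (1997). The function names are hypothetical, and the sketch does not guard against a non-positive variance estimate, which can occur for \(h>1\).

```python
import math


def dm_statistic(e1, e2, h=1):
    """Diebold-Mariano statistic based on squared-error loss differentials."""
    d = [a * a - b * b for a, b in zip(e1, e2)]
    T = len(d)
    d_bar = sum(d) / T

    def autocov(k):
        # Sample autocovariance of the loss differential at lag k.
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, T)) / T

    # Long-run variance using autocovariances up to lag h-1.
    lrv = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    return d_bar / math.sqrt(lrv / T)


def hln_statistic(e1, e2, h=1):
    """Small-sample adjusted DM statistic; compare with Student t, T-1 d.o.f."""
    T = len(e1)
    correction = math.sqrt((T + 1 - 2 * h + h * (h - 1) / T) / T)
    return correction * dm_statistic(e1, e2, h)


# Toy forecast errors from two hypothetical models.
errors_a = [1.0, 2.0, 1.0, 2.0]
errors_b = [1.0, 1.0, 1.0, 1.0]
dm = dm_statistic(errors_a, errors_b)
hln = hln_statistic(errors_a, errors_b)
```

Because the scaling factor is below one for the horizons considered here, the adjusted statistic is smaller in absolute value than the original DM statistic, which makes the test more conservative in small samples.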

As can be observed from the models used in forecasting GDP growth, some of the models are nested in others. One issue noted in the literature about the DM test is that, when two nested models are compared, the test statistic is not asymptotically normally distributed and the test is shown to be undersized (McCracken 2007). Clark and West (2007) (CW) develop a test statistic which can be used to compare the out-of-sample forecast accuracy of two nested models. The null hypothesis of the test is that the nesting model and the nested model have equal predictive accuracy, and the alternative hypothesis is that the nesting model performs better than the nested model. Based on their simulation studies, Clark and West (2007) define two critical values for the CW statistic. If the test statistic is greater than 1.282, the null hypothesis is rejected at the 10% level, and if it is greater than 1.645, the null hypothesis is rejected at the 5% level.
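The following Python sketch illustrates the CW procedure for one-step-ahead forecasts. It assumes the MSPE-adjusted loss differential of Clark and West (2007) and a plain t-type statistic without any autocorrelation correction, so it is suited only to \(h=1\). The function names and toy data are hypothetical.

```python
import math


def clark_west_statistic(y, f_nested, f_nesting):
    """t-type statistic on the MSPE-adjusted loss differential (h = 1 sketch)."""
    adj = [
        (a - f1) ** 2 - ((a - f2) ** 2 - (f1 - f2) ** 2)
        for a, f1, f2 in zip(y, f_nested, f_nesting)
    ]
    P = len(adj)
    mean = sum(adj) / P
    var = sum((v - mean) ** 2 for v in adj) / (P - 1)
    return mean / math.sqrt(var / P)


def cw_decision(stat):
    """One-sided decision using the 1.282 and 1.645 critical values."""
    if stat > 1.645:
        return "reject at 5%"
    if stat > 1.282:
        return "reject at 10%"
    return "do not reject"


# Toy data: the nested model always predicts zero.
stat = clark_west_statistic([1, 2, 3, 4], [0, 0, 0, 0], [1, 2, 3, 3])
decision = cw_decision(stat)
```

The adjustment term \((\hat{Y}_{1}-\hat{Y}_{2})^2\) corrects for the noise that the additional parameters of the nesting model introduce under the null hypothesis.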

3 Data

Since this study employs data covering the last two decades of the Turkish economy, we first summarize the developments in this period. After suffering important setbacks during the 1990s, starting with the 1994 domestic crisis and continuing with the Asian and Russian crises of the late 1990s, Turkey faced the most severe economic crisis in its history in 2001, when its GDP shrank by 5.7%. From then until the onset of the global crisis, Turkey experienced strong and robust growth owing to supportive external conditions, improved policy frameworks and growth-enhancing reforms. In addition, Turkey, like many emerging economies, used the decade to implement structural reforms and strengthen policy frameworks, which resulted in remarkable fiscal consolidation, improved macrofinancial stability and flexible exchange rate regimes (Gros and Selçuki 2013). The global crisis hit the Turkish economy severely (it shrank by 4.8% in 2009). The recovery from the global financial crisis peaked during 2010 and the first half of 2011. Growth decelerated in 2012 and toward the end of 2013 as a result of macroprudential measures and weaker external demand (especially weak growth in the EU). The Fed took steps toward gradually ending its quantitative expansion programs during 2014 and announced that it could initiate an interest rate hike in 2015. This period of heightened global uncertainty and weak external demand due to geopolitical developments posed downside risks for the Turkish economy and led to a slow pace of growth during 2014 and 2015.

The data for this period are retrieved from Datastream using several databases, and the most recent data available at the time of writing have been used for the regressions. The study focuses on quarter-on-quarter seasonally adjusted Turkish GDP growth between 2000-Q2 and 2016-Q2. The GDP data series is from the TURKSTAT database. The length of the series is 65, and the last 25 data points of the series (i.e., GDP growth data from 2010-Q2 to 2016-Q2) are selected to evaluate the out-of-sample performance of the MIDAS models. The initial window size is thus 40 (i.e., the initial estimation sample runs from 2000-Q2 to 2010-Q1), and forecasts are made with a recursive window.
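The sample split can be sketched in Python as follows. The sketch assumes a simple one-step-ahead setting in which the estimation sample always starts at the first observation and grows by one quarter per forecast; the function name is hypothetical, and the horizon-specific timing of the actual exercise is not reproduced.

```python
def recursive_window_splits(n_obs, initial_window):
    """Yield (estimation_indices, evaluation_index) for an expanding window."""
    for end in range(initial_window, n_obs):
        # Estimation sample: observations 0..end-1; evaluation point: end.
        yield range(0, end), end


# 65 quarterly observations with an initial window of 40 quarters.
splits = list(recursive_window_splits(65, 40))
```

With 65 observations and an initial window of 40, this produces 25 evaluation points, with the estimation sample growing from 40 to 64 observations.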

Quarterly factors are extracted from six quarterly macroeconomic series (Footnote 8). Like the GDP data, these quarterly series are seasonally adjusted. The series are available since 1989-Q2. The total import and export series are transformed into first differences of their logarithms, and the remaining quarterly series are used in their original form, which gives quarter-on-quarter growth rates. The factors are extracted using PCA with a window size of 40. Only the first lag of the first factor, which explains 50% of the total variation in the quarterly series, is used in the FADL-MIDAS and FADL regressions.

The daily financial series, which run from January 3, 2000, to June 30, 2016, consist of 3 asset classes and a corporate risk class. There are 41 commodity, 35 equity and 12 foreign exchange series in the asset classes and 10 corporate risk series, each with a length of 4301. The commodity prices are mostly from the London Metal Exchange (LME) and the Goldman Sachs Commodity Index (GSCI). We use several forward prices, such as 3-month or 15-month forwards, in order to exploit the forward-looking information in these prices. The equity series include sectoral indices, such as banking or industrials, which summarize developments in a certain sector. In addition to the sectoral information, we also use the Borsa Istanbul (BIST) 100 series to get information on the daily performance of major companies in Turkey. The United States Dollar (USD) and the Euro are the two major currencies in Turkey's trade relations, so we expect the information content of these two currencies to be the most useful for forecasting GDP growth. However, other currency pairs are also added to the list to see whether they can provide improvements beyond our expectations. Finally, an additional foreign corporate risk class is used in order to account for the effect of regional and global risk on GDP growth. There are 106 series in this class. The motivation for including this class stems from the possible linkages between trade, investment and growth for a small open economy. High international risk might negatively influence trade and investment flows and thereby domestic GDP growth.

The whole list of quarterly series and daily financial series used here can be found in Şen Doğan and Midiliç (2016).

The daily data series are transformed to obtain return series in percentage terms as follows:

$$\begin{aligned} x_t=\mathrm{log}\left( \frac{X_{t}}{X_{t-1}}\right) \times 100, \end{aligned}$$
(12)

where \(x_t\) denotes log daily return at time t. The series are then tested for stationarity.
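The transformation can be sketched in Python as follows, using the usual convention that a price increase yields a positive log return. The function name and the toy price series are hypothetical.

```python
import math


def log_returns_pct(prices):
    """Daily log returns in percentage terms; the first observation is lost."""
    return [
        100.0 * math.log(prices[t] / prices[t - 1])
        for t in range(1, len(prices))
    ]


# Toy price series: a 10% rise followed by a 10% fall.
returns = log_returns_pct([100.0, 110.0, 99.0])
```

A convenient property of log returns is that they add up over time: the sum of the daily returns equals the log return over the whole period.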

Five daily factors are extracted for all of the 204 series and for each financial data class separately. The window size used in factor extraction is 40. These factors explain 55–18%, 46–16%, 30–10%, 19–2%, 22–5\(\times 10^{-6}\)% and 26–0.03% of total variation in all series, commodities, equities, foreign exchange series, domestic corporate risk and foreign corporate risk series, respectively.

Daily series and all quarterly series except GDP growth are winsorized at the 1% level in order to avoid any outlier effect.
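As an illustration, winsorization can be sketched in Python as below. The sketch assumes that "the 1% level" means clipping the lowest 1% and the highest 1% of observations to the nearest retained values; the text does not state whether the 1% applies to each tail or to both tails together, so this is only one possible reading. The function name is hypothetical.

```python
def winsorize(values, tail=0.01):
    """Clip the lowest and highest `tail` share of observations (assumed per tail)."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(tail * n)  # number of observations clipped in each tail
    lower, upper = ordered[k], ordered[n - 1 - k]
    return [min(max(v, lower), upper) for v in values]


# Toy series with one extreme observation.
data = list(range(100)) + [1000]
clean = winsorize(data)
```

Unlike trimming, winsorization keeps the sample size unchanged, which matters here because the daily series must stay aligned with the quarterly dates.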

4 Empirical results

We analyze two types of models: models with financial data (ADL and ADL-MIDAS) and models with financial data and macrofactors (FADL and FADL-MIDAS). We evaluate the forecasting performance of these models relative to a benchmark autoregressive model with one lag (AR(1)) using the RMSFE in order to determine the contribution of financial data to forecasting GDP growth. Additionally, we try to find out to what extent employing quarterly factors and MIDAS models in forecasting the GDP growth of Turkey provides forecasting gains.

The selection of the benchmark model in model comparisons relies on the results of Marcellino (2008), who shows that linear models prove to be strong benchmarks for out-of-sample forecasting if they are correctly specified. For Turkish GDP growth, a linear time-series model is specified based on the AIC and the Bayesian information criterion (BIC). According to the information criteria given in Table 1, the minimum values of both the AIC and the BIC are obtained from the AR(1) model with a constant; therefore, this model is selected as the benchmark model.

Table 1 Lag selection

The forecast results are reported in three subgroups: 1-, 2- and 3-month-ahead nowcasts; 1-, 2- and 3-month-ahead forecasts; and horizons beyond two quarters ahead, namely 3-quarter- and 4-quarter-ahead forecasts. The forecasts are labeled according to the time left in the relevant quarter in months. In order to illustrate the distinction between these forecast horizons, assume that we want to make forecasts for 2013-Q1 and 2013-Q2. The forecast for the first quarter of 2013 made at the beginning of the first quarter (i.e., using only data from the previous periods and not from 2013-Q1) is labeled as the 3-month-ahead nowcast, the one made at the end of the first month is labeled as the 2-month-ahead nowcast, and the one made at the end of the second month is labeled as the 1-month-ahead nowcast. Forecasts made for 2013-Q2 during 2013-Q1 are labeled as forecasts, and again the time left in 2013-Q1 in months determines the label (e.g., the forecast made for 2013-Q2 at the beginning of 2013-Q1 is labeled as the 3-month-ahead forecast).

Firstly, we evaluate the contribution of financial data to forecasting real GDP growth. The top panels of Table 2 include forecast combinations with the 204 daily series, while combinations with the 5 daily factors are given in the bottom panels. The results with macrofactors (FADL and FADL-MIDAS) are given after the results without macrofactors (ADL and ADL-MIDAS). The table presents the RMSFE of the benchmark model and the RMSFE of the combined forecasts of the two types of models relative to the RMSFE of the benchmark. If the ratio is less than one, it is interpreted as an improvement of the underlying forecast upon the benchmark. The results reveal that all models with financial data perform better than the benchmark model at all forecast horizons. To be explicit, in the case of the 1-month-ahead nowcast with the daily series, the ADL and ADL-MIDAS improve upon the AR by 2 and 17%, respectively. In Table 3, the predictive ability of the MIDAS models relative to the AR(1) is compared by using the HLN and CW tests. The results are similar and show that financial data provide statistically significant forecasting gains over the benchmark (Footnote 9).
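The relative RMSFE measure can be sketched in Python as follows. The function names and the toy numbers are hypothetical and are not results from this study.

```python
import math


def rmsfe(actuals, forecasts):
    """Root-mean-squared forecast error."""
    n = len(actuals)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / n)


def relative_rmsfe(actuals, model_fc, benchmark_fc):
    """RMSFE of a model divided by the RMSFE of the benchmark."""
    return rmsfe(actuals, model_fc) / rmsfe(actuals, benchmark_fc)


# Toy example: the model's errors are half the size of the benchmark's.
ratio = relative_rmsfe([1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [1.2, 2.2, 3.2])
```

A ratio of 0.83, for example, corresponds to a 17% improvement upon the benchmark.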

Table 2 RMSFE comparisons for MIDAS and non-MIDAS models
Table 3 HLN and CW test statistics for AR(1) versus MIDAS models

Secondly, we compare the forecasting performance of the ADL-MIDAS and FADL-MIDAS models with that of the corresponding ADL and FADL models in order to assess the predictive ability of MIDAS models for quarterly real GDP growth. The RMSFE performances of these models are given in Table 2. An initial comparison shows that ADL-MIDAS models perform better than ADL models in all cases. Similarly, FADL-MIDAS models perform better than FADL models. The statistical significance of the differences between the models is tested by the HLN and CW tests (Footnote 10). The corresponding test statistics are given in Table 4. In these tests, ADL models are compared with ADL-MIDAS models and FADL models with FADL-MIDAS models. According to the HLN test results, MIDAS models do not consistently beat the non-MIDAS models. On the other hand, according to the CW test statistics, the ADL-MIDAS and FADL-MIDAS models perform statistically better except in one case (i.e., FADL vs. FADL-MIDAS at the 1-month-ahead forecast).

Table 4 HLN and CW test statistics for non-MIDAS versus MIDAS models

Third, we evaluate the performance of quarterly macroeconomic factors in forecasting GDP growth. For almost all MIDAS and non-MIDAS models, quarterly factors provide forecasting gains over the corresponding ADL and ADL-MIDAS models, both for the daily series and the daily factor combinations and at each forecast horizon (Table 2). Test statistics for the significance of the better performance of the FADL-MIDAS models relative to the ADL-MIDAS models are given in Table 5. With regard to the observation from Table 2, the test statistics imply that the quarterly factors significantly improve the MIDAS forecasts only with the daily series and only for the 2-month-ahead forecasts.

Table 5 HLN and CW test statistics for ADL-MIDAS versus FADL-MIDAS models

Fourth, we investigate the forecasting performance of each financial data class. Table 6 reports the RMSFE of each financial data class relative to the benchmark model, while the corresponding results for the linear models are tabulated in Table 7. FADL-MIDAS models with the daily factors show the best performance at most horizons. For example, for 3-month-ahead nowcasts, FADL-MIDAS models with forex factors improve upon the benchmark by 19%. For 3-month-ahead forecasts, the combinations of the factors extracted from separate financial data classes provide a 17% improvement over the benchmark. The improvement rises to 38% for the 4-quarter-ahead forecasts with the forecast combination of the equity factors.

Table 6 RMSFE comparisons for MIDAS models, separate financial data classes

Lastly, we examine whether a small group of series stands out among the 204 daily series in terms of forecasting gains, in order to look for evidence that the results can be replicated with a small number of daily series. The forecast combination performances of the best 5 and 10 series for each horizon are given in Table 8. The table shows that the forecasts do not improve in comparison with the results reported above. HLN and CW test statistics comparing the forecasts of the best 5 series are given in Table 9, and they show that the FADL-MIDAS model forecasts perform statistically better than the combination of the best-performing series.
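The idea of combining only the best-performing series can be sketched as follows. This is an illustration under stated assumptions rather than the paper's procedure: the function names and series labels are hypothetical, the series are ranked ex post by RMSFE on the same sample, and the selected forecasts are combined with equal weights, whereas the combination scheme actually used may weight forecasts differently.

```python
import math

def rmsfe(actual, forecast):
    """Root mean squared forecast error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def combine_best(actual, forecasts_by_series, k):
    """Equal-weight combination of the k series with the lowest RMSFE.

    forecasts_by_series maps a (hypothetical) series name to its forecasts.
    Returns the selected names and the combined forecast path.
    """
    ranked = sorted(forecasts_by_series,
                    key=lambda name: rmsfe(actual, forecasts_by_series[name]))
    best = ranked[:k]
    combined = [
        sum(forecasts_by_series[name][t] for name in best) / len(best)
        for t in range(len(actual))
    ]
    return best, combined

# Illustrative numbers only (assumptions, not the paper's data).
actual = [1.0, 2.0, 3.0]
forecasts = {
    "series_a": [1.0, 2.0, 3.0],
    "series_b": [1.5, 2.5, 3.5],
    "series_c": [3.0, 0.0, 5.0],
}
best, combined = combine_best(actual, forecasts, k=2)
print(best, combined)
```

Because the ranking uses realized outcomes, a selection of this kind is only known after the fact, which is one reason a small hand-picked set need not outperform a combination over all series in real time.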

Tables 10 and 11 report the 5 best series for each horizon based on their RMSFE performances. The tables show that, within the daily financial series, some of the equity series, especially the FTSE Oil & Gas index, are very useful in forecasting Turkish GDP growth with both ADL-MIDAS and FADL-MIDAS models. At the longer horizons, the foreign corporate bond indices dominate the tables. This might be a period-specific phenomenon, but the predictive ability of both equity and foreign corporate bond series should be analyzed further.

Table 7 RMSFE comparisons for non-MIDAS models, separate financial data classes
Table 8 RMSFE comparisons for MIDAS models with best series

The tables show that, as the forecast horizon increases, the performance of the MIDAS models improves relative to the benchmark. This is due to the deterioration in the performance of the benchmark model. The RMSFE of the AR(1) rises from 1.08 for the 1-quarter-ahead forecasts to 1.33 for the 4-quarter-ahead forecasts. Therefore, the MIDAS models arguably produce more stable forecasts across horizons than the benchmark model.

Table 9 HLN and CW test statistics for best series versus FADL-MIDAS models
Table 10 The best series ranked based on their RMSFE performances for each MIDAS model and horizon
Table 11 The best series ranked based on their RMSFE performances for each MIDAS model and horizon

Overall, the RMSFE comparisons show that better results are obtained with the combination of a small set of daily factors extracted from separate financial data classes for the 1-month-ahead nowcasts and for the 2-month-, 3-month- and 4-quarter-ahead forecasts. This result implies that the dimension of the financial data can be reduced and that the Turkish economy can be tracked with a small set of daily factors at some horizons.

The results suggest that the framework used in this study works significantly better than naive models. Our findings indicate that employing daily financial data improves the forecasting performance of both types of models. Moreover, because financial data are available daily, one does not have to wait for updates of the dataset. MIDAS regression models provide forecast gains over the models using simple aggregation schemes, and in most cases the gain is substantial. Consistent with the findings of previous studies (Stock and Watson 2003; Andreou et al. 2013), quarterly factors provide forecasting gains over the benchmark models for most of the forecast horizons and financial data classes. However, the models considered here do not make use of any monthly data, which were shown to forecast Turkish GDP quite well in the past. By adding monthly variables to Eq. 5 and updating the forecasts after monthly data releases, the performance of the models could be improved further.

Finally, we may compare our results with one of the few studies on forecasting Turkish GDP growth, namely Akkoyun and Günay (2012). A direct comparison is not possible, however, since we use a different estimation and forecasting period. Thus, we only point out the advantage of our framework over their models. Akkoyun and Günay (2012) employ a small-scale dynamic factor model and forecast quarterly GDP growth with monthly data using a Kalman filter approach. They find that some combinations of hard indicators, such as industrial production and export and import quantity indices, and soft indicators, such as PMI and PMI new orders, have predictive power for output growth. In the last 2 years, the correlation between GDP growth and the monthly data has decreased (see Box 4.1 in the third inflation report of 2015 for details (CBRT 2015)), which leads to a decline in the performance of these models. Instead, we extract information from a large set of daily financial data and make use of the historical performance of the data in generating forecasts. Therefore, the methodology applied in this study can be regarded as more robust to movements in individual series and to the deterioration of explanatory power over time.

5 Conclusions

In this study, our aim is to incorporate the information in daily financial data about the future state of the Turkish economy. To this end, mixed data sampling (MIDAS) models are used for the first time to generate nowcasts and forecasts of quarterly Turkish GDP growth. We follow the approach introduced by Andreou et al. (2013) and employ 204 daily financial series from 4 major financial data classes: commodity price, equity, foreign exchange and corporate risk.

Our results suggest that MIDAS regression models and forecast combination methods provide a considerable advantage in exploiting information from daily financial data compared to the models using simple aggregation schemes when forecasting quarterly GDP growth. Additionally, incorporating daily financial data into the analysis improves our forecasts substantially. These findings indicate that both the information content of the financial data and the flexible data-driven weighting scheme of MIDAS regressions play a significant role in forecasting real GDP growth. Moreover, consistent with the findings of previous studies (Stock and Watson 2003; Andreou et al. 2013), quarterly factors lead to forecasting gains over the benchmark models for most of the forecast horizons and financial data classes. The analysis of best-performing series also reveals the usefulness of daily factors from separate financial data classes in forecasting Turkish GDP growth. This implies that the daily factor series might be a good indicator of Turkish GDP growth and could be used as a daily index to track economic activity. Further research might incorporate monthly hard and soft data into the analysis, since previous studies have shown their significance in improving nowcasting performance.

6 Data and computer code availability

The datasets of this paper (1. code and programs, 2. data, 3. detailed readme files) are collected in the electronic supplementary material of this article.