1 Introduction

GDP growth is surely the most fundamental concept in empirical macroeconomics and business cycle monitoring, yet significant uncertainty still surrounds its estimation. Two often-divergent estimates exist for the U.S.: a widely used expenditure-side version, GDP\(_E\), and a much less widely used income-side version, GDP\(_I\). Nalewaik (2010) makes clear that, at the very least, GDP\(_I\) deserves serious attention and may even have properties superior in certain respects to those of GDP\(_E\). That is, if forced to choose between GDP\(_E\) and GDP\(_I\), a surprisingly strong case exists for GDP\(_I\).

But of course one is not forced to choose between GDP\(_E\) and GDP\(_I\): a combined estimate that pools the information in the two indicators may improve on both. In this chapter, we propose and explore a method for constructing such a combined estimate, and we compare our new GDP\(_C\) (“combined”) series to GDP\(_E\) and GDP\(_I\) over many decades, with particular attention to behavior over the business cycle and, especially, at turning points.

Our work is motivated by, and builds on, five key literatures. First, and most pleasing to us, our work is very much related to Hal White’s in its focus on dynamic modeling while acknowledging misspecification throughout.

Second, we obviously build on the literature examining GDP\(_I\) and its properties, notably Fixler and Nalewaik (2009) and Nalewaik (2010). GDP\(_I\) turns out to have intriguingly good properties, suggesting that it might be usefully combined with GDP\(_E\).

Third, our work is related to the literature distinguishing between “forecast error” and “measurement error” data revisions, as for example in Mankiw et al. (1984), Mankiw and Shapiro (1986), Faust et al. (2005), and Aruoba (2008). In this chapter we work largely in the forecast error tradition.

Fourth, and related, we work in the tradition of the forecast combination literature begun by Bates and Granger (1969), viewing GDP\(_E\) and GDP\(_I\) as forecasts of GDP [actually a mix of “backcasts” and “nowcasts” in the parlance of Aruoba and Diebold (2010)]. We combine those forecasts by forming optimally weighted averages.Footnote 1

Finally, we build on the literature on “balancing” the national income accounts, which extends back almost as far as national income accounting itself, as for example in Stone et al. (1942), who use a quadratic loss criterion to propose weighting different GDP estimates by the inverse of their squared “margins of error.” Stone refined those ideas in his subsequent national income accounting work, and Byron (1978) and Weale (1985) formalized and further developed Stone’s approach. Indeed a number of papers by Weale and coauthors use subjective evaluations of the quality of different U.K. GDP estimates to produce combined estimates; see Barker et al. (1984), Weale (1988), Solomou and Weale (1991), and Solomou and Weale (1993).Footnote 2 For example, Barker et al. (1984) and Weale (1988) incorporate data quality assessments from the U.K. Central Statistical Office. Some of these papers also disaggregate the GDP estimates to incorporate information on the differential quality of the underlying source data. In that tradition, Beaulieu and Bartelsman (2004) use input–output tables to disaggregate GDP\(_E\) and GDP\(_I\), using what they call “tuning” parameters to balance the accounts. We take a similar approach here, weighting competing GDP estimates in ways that reflect our assessment of their quality, but we adopt more of a top-down, macro perspective.

We proceed as follows. In Sect.  2 we consider GDP combination under quadratic loss. This involves taking a stand on the values of certain unobservable parameters (or at least reasonable ranges for those parameters), but we argue that a “quasi-Bayesian” calibration procedure based on informed judgment is feasible, credible, and robust. In Sect.  3 we consider GDP combination under minimax loss. Interestingly, as we show, it does not require calibration. In Sect.  4 we apply our methods to provide improved GDP estimates for the U.S. In Sect.  5 we sketch several extensions, and we conclude in Sect.  6.

2 Combination Under Quadratic Loss

Optimal forecast combination typically requires knowledge (or, in practice, estimates) of forecast error properties such as variances and covariances. In the present context we have two “forecasts” of true GDP, namely GDP\(_E\) and GDP\(_I\), but true GDP is never observed, even after the fact. Hence we never see the “forecast errors,” which complicates matters significantly but not hopelessly. In particular, in this section we work under quadratic loss and show that a quasi-Bayesian calibration based on informed judgment is feasible and credible and, simultaneously, that the efficacy of GDP combination is robust to the precise weights used.

2.1 Basic Results and Calibration

First assume that the errors in GDP\(_E\) and GDP\(_I\) growth are uncorrelated. Consider the convex combinationFootnote 3

$$\begin{aligned} \text{ GDP}_C = \lambda \text{ GDP}_E + (1- \lambda ) \; \text{ GDP}_I, \end{aligned}$$

where \(\lambda \in [0,1]\).Footnote 4 Then the associated errors follow the same weighting,

$$\begin{aligned} e_{C}= \lambda e_{E}+ (1- \lambda ) e_{I}, \end{aligned}$$

where \(e_C= \text{ GDP}-\text{ GDP}_C\), \(e_E= \text{ GDP}-\text{ GDP}_E\) and \(e_I= \text{ GDP}-\text{ GDP}_I\). Assume that both GDP\(_E\) and GDP\(_I\) are unbiased for GDP, in which case GDP\(_C\) is also unbiased, because the combining weights sum to unity.

Given the unbiasedness assumption, the minimum-MSE combining weights are just the minimum-variance weights. Immediately, using the assumed zero correlation between the errors,

$$\begin{aligned} \sigma ^2_{C}= \lambda ^2 \sigma ^2_{E}+ (1- \lambda )^2 \sigma ^2_{I}, \end{aligned}$$
(1)

where \(\sigma ^2_{C}=\text{ var}(e_C)\), \(\sigma ^2_{E}=\text{ var}(e_E)\) and \(\sigma ^2_{I}=\text{ var}(e_I)\). Minimization with respect to \(\lambda \) yields the optimal combining weight,

$$\begin{aligned} \lambda ^*= \frac{\sigma ^2_I}{ \sigma ^2_{{I}}+\sigma ^2_{{E}} } = \frac{1}{ 1+ \phi ^2}, \end{aligned}$$
(2)

where \( \phi = {\sigma _{{E}}}/{\sigma _{{I}}} \).

It is interesting and important to note that in the present context of zero correlation between the errors,

$$\begin{aligned} \text{ var}(e_E) + \text{ var}(e_I) = \text{ var}(\text{ GDP}_E-\text{ GDP}_I). \end{aligned}$$
(3)

The standard deviation of GDP\(_E\) minus GDP\(_I\) can be estimated trivially, so (3) pins down the sum var\((e_E)\) + var\((e_I)\). Hence a view about \(\phi \) is implicitly a view not only about the ratio of var\((e_E)\) to var\((e_I)\), but also about their actual values. We will use this fact (and its generalization to the case of correlated errors) in several places in what follows.
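To make the arithmetic concrete, the following sketch (a minimal illustration in Python, assuming only numpy) backs out \(\sigma _E\) and \(\sigma _I\) from a stated \(\phi \) and an estimate of the standard deviation of GDP\(_E\) \(-\) GDP\(_I\) via (3), and then evaluates \(\lambda ^*\) from (2); the numerical inputs are illustrative rather than estimates.

```python
import numpy as np

def implied_sigmas_uncorrelated(phi, sd_diff):
    """Back out (sigma_E, sigma_I) from phi = sigma_E / sigma_I and the observable
    standard deviation of GDP_E - GDP_I, assuming rho = 0, via eq. (3):
    var(e_E) + var(e_I) = var(GDP_E - GDP_I)."""
    var_I = sd_diff**2 / (1.0 + phi**2)
    return phi * np.sqrt(var_I), np.sqrt(var_I)

def lambda_star_uncorrelated(phi):
    """Optimal weight on GDP_E under quadratic loss with rho = 0, eq. (2)."""
    return 1.0 / (1.0 + phi**2)

# Illustrative inputs: phi = 1.10 and sd(GDP_E - GDP_I) = 1.9 percent.
sigma_E, sigma_I = implied_sigmas_uncorrelated(1.10, 1.9)
print(round(sigma_E, 2), round(sigma_I, 2))        # ~ 1.41, ~ 1.28
print(round(lambda_star_uncorrelated(1.10), 2))    # ~ 0.45
```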

Based on our judgment regarding U.S. GDP\(_E\) and GDP\(_I\) data, which we will subsequently discuss in detail in Sect.  2.2, we believe that a reasonable range for \(\phi \) is \(\phi \in [0.75, 1.45]\), with midpoint 1.10.Footnote 5 One could think of this as a quasi-Bayesian statement that prior beliefs regarding \(\phi \) are centered at 1.10, with a 90 % prior credible interval of [0.75, 1.45]. In Fig.  1 we graph \(\lambda ^*\) as a function of \(\phi \), for \(\phi \in [0.75, 1.45]\). \(\lambda ^*\) is of course decreasing in \(\phi \), but interestingly, it is only mildly sensitive to \(\phi \). Indeed, for our range of \(\phi \) values, the optimal combining weight remains close to 0.5, varying from roughly 0.65 to 0.30. At the midpoint \(\phi =1.10\), we have \(\lambda ^*=0.45\).

Fig. 1: \(\lambda ^*\) versus \(\phi \). \(\lambda ^*\) constructed assuming uncorrelated errors. The horizontal line for visual reference is at \(\lambda ^*= 0.5\). See text for details.

It is instructive to compare the error variance of combined GDP, \(\sigma ^2_C\), to \(\sigma ^2_E\) for a range of \(\lambda \) values (including \(\lambda =\lambda ^*\), \(\lambda =0\), and \(\lambda =1\)).Footnote 6 From (1) we have:

$$\begin{aligned} \frac{\sigma ^2_C}{\sigma ^2_E} = \lambda ^2 + \frac{(1- \lambda )^2}{\phi ^2}. \end{aligned}$$

In Fig.  2 we graph \({\sigma ^2_C} / {\sigma ^2_E}\) for \(\lambda \in [0,1]\) with \(\phi =1.1\). Obviously the maximum variance reduction is obtained using \(\lambda ^*=0.45\), but even for nonoptimal \(\lambda \), such as simple equal-weight combination (\(\lambda =0.5\)), we achieve substantial variance reduction relative to using GDP\(_E\) alone. Indeed, a key result is that for all \(\lambda \) (except those very close to 1, of course) we achieve substantial variance reduction.
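The calculation behind Fig. 2 is equally simple. The sketch below (again assuming only numpy) evaluates the ratio for several \(\lambda \) values at \(\phi =1.1\).

```python
import numpy as np

def variance_ratio(lam, phi):
    """sigma_C^2 / sigma_E^2 under uncorrelated errors (display above)."""
    return lam**2 + (1.0 - lam)**2 / phi**2

phi = 1.1
lam_star = 1.0 / (1.0 + phi**2)                      # ~ 0.45
for lam in (lam_star, 0.5, 0.0, 1.0):
    print(round(lam, 2), round(variance_ratio(lam, phi), 3))
# The optimal weight gives a ratio of ~ 0.45, equal weights ~ 0.457,
# lambda = 0 (GDP_I alone) gives 1/phi^2 ~ 0.826, and lambda = 1 (GDP_E alone) gives 1.
```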

Fig. 2: \({\sigma ^2_C} / {\sigma ^2_E}\) for \(\lambda \in [0,1]\). We assume \(\phi =1.1\) and uncorrelated errors. See text for details.

Now consider the more general and empirically-relevant case of correlated errors. Under the same conditions as earlier,

$$\begin{aligned} \sigma ^2_{C}= \lambda ^2 \sigma ^2_{E}+ (1- \lambda )^2 \sigma ^2_{I} + 2 \lambda (1- \lambda ) \sigma _{EI}, \end{aligned}$$
(4)

so

$$\begin{aligned} \lambda ^*&= \frac{\sigma ^2_{I}- \sigma _{EI}}{ \sigma ^2_{I}+\sigma ^2_{E} -2 \sigma _{EI} } \\&= \frac{1 - \phi \rho }{ 1+ \phi ^2 - 2 \phi \rho }, \end{aligned}$$
(5)

where \(\sigma _{EI}=\text{ cov}(e_{E}, e_{I})\) and \(\rho =\text{ corr}(e_{E}, e_{I})\).

It is noteworthy that, in parallel to the uncorrelated-error case in which beliefs about \(\phi \) map one-for-one into beliefs about \(\sigma _E\) and \(\sigma _I\), beliefs about \(\phi \) and \(\rho \) now map one-for-one into beliefs about \(\sigma _E\), \(\sigma _I\), and \(\sigma _{EI}\). Our definitions of \(\sigma _E^2\) and \(\sigma ^2_I\) imply that

$$\begin{aligned} \sigma ^2_j = \text{ var}[\text{ GDP}_j] - 2 \text{ cov}[\text{ GDP}_j,\text{ GDP}] + \text{ var}[\text{ GDP}], \quad j \in \{E,I\}. \end{aligned}$$
(6)

Moreover, the covariance between the GDP\(_E\) and GDP\(_I\) errors can be expressed as

$$\begin{aligned} \sigma _{EI} = \text{ cov}[\text{ GDP}_E,\text{ GDP}_I] - \text{ cov}[\text{ GDP}_E,\text{ GDP}] - \text{ cov}[\text{ GDP}_I,\text{ GDP}] + \text{ var}[\text{ GDP}]. \end{aligned}$$
(7)

Solving (6) for \(\text{ cov}[\text{ GDP}_j, \text{ GDP}]\) and inserting the resulting expressions for \(j \in \{E,I\}\) into (7) yields

$$\begin{aligned} \sigma _{EI} = \text{ cov}[\text{ GDP}_I,\text{ GDP}_E] - \frac{1}{2} \bigg ( \text{ var}[\text{ GDP}_I] + \text{ var}[\text{ GDP}_E] - \sigma ^2_I - \sigma ^2_E \bigg ). \end{aligned}$$
(8)

Finally, let \(\sigma _{EI} = \rho \sigma _E \sigma _I\) and \(\sigma ^2_E = \phi ^2 \sigma _I^2\). Then we can solve (8) for \(\sigma ^2_I\):

$$\begin{aligned} \sigma ^2_I = \frac{ \text{ cov}[\text{ GDP}_I,\text{ GDP}_E] - \frac{1}{2} \left( \text{ var}[\text{ GDP}_I] + \text{ var}[\text{ GDP}_E] \right)}{ \rho \phi - \frac{1}{2}(1+\phi ^2)} = \frac{N}{D}. \end{aligned}$$
(9)

For given values of \(\phi \) and \(\rho \) we can immediately evaluate the denominator \(D\) in (9), and using data-based estimates of \(\text{ cov}[\text{ GDP}_I,\text{ GDP}_E]\), \(\text{ var}[\text{ GDP}_I]\), and \(\text{ var}[\text{ GDP}_E]\) we can evaluate the numerator \(N\).
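The sketch below implements this mapping. The moment values supplied are hypothetical placeholders for the data-based estimates, not the estimates used in the chapter.

```python
import numpy as np

def implied_error_moments(phi, rho, var_E, var_I, cov_EI):
    """Recover the implied error variances and covariance from a calibration
    (phi, rho) and data-based moments of measured GDP_E and GDP_I growth,
    using eq. (9) together with sigma_E^2 = phi^2 sigma_I^2 and
    sigma_EI = rho * sigma_E * sigma_I."""
    N = cov_EI - 0.5 * (var_I + var_E)
    D = rho * phi - 0.5 * (1.0 + phi**2)
    sigma2_I = N / D
    sigma2_E = phi**2 * sigma2_I
    sigma_EI = rho * np.sqrt(sigma2_E * sigma2_I)
    return sigma2_E, sigma2_I, sigma_EI

# Hypothetical placeholder moments (percent growth rates), for illustration only.
print(implied_error_moments(phi=1.1, rho=0.45, var_E=10.0, var_I=9.5, cov_EI=8.0))
```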

Fig. 3: \(\lambda ^*\) versus \(\phi \) for various \(\rho \) values. The horizontal line for visual reference is at \(\lambda ^*= 0.5\). See text for details.

Fig. 4: \(\lambda ^*\) versus \(\rho \) for various \(\phi \) values. The horizontal line for visual reference is at \(\lambda ^*= 0.5\). See text for details.

Fig. 5: \(\lambda ^*\) versus \(\rho \) and \(\phi \). See text for details.

Based on our judgment regarding U.S. GDP\(_E\) and GDP\(_I\) data (and again, we will discuss that judgment in detail in Sect.  2.2), we believe that a reasonable range for \(\rho \) is \(\rho \in [0.30, 0.60]\), with midpoint 0.45. One could think of this as a quasi-Bayesian statement that prior beliefs regarding \(\rho \) are centered at 0.45, with a 90 % prior credible interval of [0.30, 0.60].Footnote 7

In Fig.  3 we show \(\lambda ^*\) as a function of \(\phi \) for \(\rho = 0\), 0.3, 0.45, and 0.6; in Fig.  4 we show \(\lambda ^*\) as a function of \(\rho \) for \(\phi = 0.95\), 1.05, 1.15, and 1.25; and in Fig.  5 we show \(\lambda ^*\) as a bivariate function of \(\phi \) and \(\rho \). For \(\phi =1\) the optimal weight is 0.5 for all \(\rho \), but for \(\phi \ne 1\) the optimal weight differs from 0.5 and is more sensitive to \(\phi \) as \(\rho \) grows. The crucial observation remains, however, that under a wide range of conditions it is optimal to put significant weight on both GDP\(_E\) and GDP\(_I\), with the optimal weights not differing radically from equality. Moreover, for all \(\phi \) values greater than one, so that less weight is optimally placed on GDP\(_E\) under a zero-correlation assumption, allowance for positive correlation further decreases the optimal weight placed on GDP\(_E\). For a benchmark calibration of \(\phi =1.1\) and \(\rho =0.45\), \(\lambda ^* \approx 0.41\).
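A short sketch reproduces the benchmark weight and the kind of sensitivity analysis underlying Figs. 3, 4 and 5; the \(\phi \) and \(\rho \) grids below are simply points spanning our prior ranges and are illustrative only.

```python
def lambda_star(phi, rho):
    """Optimal weight on GDP_E under quadratic loss with correlated errors, eq. (5)."""
    return (1.0 - phi * rho) / (1.0 + phi**2 - 2.0 * phi * rho)

print(round(lambda_star(1.10, 0.45), 2))   # benchmark calibration: ~ 0.41

# Sensitivity over (roughly) the prior ranges discussed in the text.
for rho in (0.0, 0.30, 0.45, 0.60):
    row = [round(lambda_star(phi, rho), 2) for phi in (0.75, 0.95, 1.10, 1.25, 1.45)]
    print(rho, row)
```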

Fig. 6: \({\sigma ^2_C} / {\sigma ^2_E}\) for \(\lambda \in [0,1]\). We assume \(\phi =1.1\) and \(\rho =0.45\). See text for details.

Let us again compare \(\sigma ^2_C\) to \(\sigma ^2_E\) for a range of \(\lambda \) values (including \(\lambda =\lambda ^*\), \(\lambda =0\), and \(\lambda =1\)). From (4) we have:

$$\begin{aligned} \frac{\sigma ^2_C}{\sigma ^2_E} = \lambda ^2 + \frac{(1-\lambda )^2}{\phi ^2}+ 2 \lambda (1- \lambda ) \frac{\rho }{\phi }. \end{aligned}$$

In Fig.  6 we graph \({\sigma ^2_C} / {\sigma ^2_E}\) for \(\lambda \in [0,1]\) with \(\phi =1.1\) and \(\rho =0.45\). Obviously the maximum variance reduction is obtained using \(\lambda ^*=0.41\), but even for nonoptimal \(\lambda \), such as simple equal-weight combination (\(\lambda =0.5\)), we achieve substantial variance reduction relative to using GDP\(_E\) alone.

2.2 On the Rationale for Our Calibration

We have thus far implicitly asked the reader to defer to our judgment regarding calibration, focusing on \(\phi \in [0.75, 1.45]\) and \(\rho \in [0.30, 0.60]\) with benchmark midpoint values of \(\phi =1.10\) and \(\rho =0.45\). Here we explain the experience, reasoning, and research that supports that judgment.

2.2.1 Calibrating \(\phi \)

The key prior view embedded in our choice of \(\phi \in [0.75, 1.45]\), with midpoint 1.10, is that GDP\(_I\) is likely a somewhat more accurate estimate than GDP\(_E\). This accords with the results of Nalewaik (2010), who examines the relative accuracy of GDP\(_E\) and GDP\(_I\) in several ways, with results favorable to GDP\(_I\), suggesting \(\phi > 1\).

Let us elaborate. The first source of information on likely values of \(\phi \) is detailed examination of the source data underlying GDP\(_E\) and GDP\(_I\). The largest component of GDP\(_I\), wage and salary income, is computed using quarterly data from tax records that are essentially universe counts, contaminated by neither sampling nor nonsampling errors. Two other very important components of GDP\(_I\), corporate profits and proprietors’ income, are also computed using annual data from tax records.Footnote 8 Underreporting and nonreporting of income on tax forms (especially by proprietors) is an issue with these data, but the statistical agencies make adjustments for misreporting, and in any event the same misreporting issues plague GDP\(_E\) as well as GDP\(_I\), as we discuss below.

In contrast to GDP\(_I\), very little of the quarterly or annual data used to compute GDP\(_E\) is based on universe counts.Footnote 9 Rather, most of the quarterly GDP\(_E\) source data are from business surveys where response is voluntary. Nonresponse rates can be high, potentially introducing important sample-selection effects that may, moreover, vary with the state of the business cycle. Many annual GDP\(_E\) source data are from business surveys with mandatory response, but some businesses still do not respond to the surveys, and surely the auditing of these nonrespondents is less rigorous than the auditing of tax nonfilers. In addition, even the annual surveys do not attempt to collect data on some types of small businesses, particularly nonemployer businesses (i.e., businesses with no employees). The statistical agencies attempt to correct some of these omissions by incorporating data from tax records (making underreporting and nonreporting of income on tax forms an issue for GDP\(_E\) as well as GDP\(_I\)), but it is not entirely clear whether they adequately plug all the holes in the survey data.

Although these problems plague most categories of GDP\(_E\), some categories appear more severely plagued. In particular, over most of history, government statistical agencies have collected annual source data on less than half of personal consumption expenditures (PCE) for services, a very large category comprising between a quarter and a half of the nominal value of GDP\(_E\) over our sample. At the quarterly frequency, statistical agencies have collected even less source data on services PCE.Footnote 10 For this reason, statistical agencies have been forced to cobble together less-reliable data from numerous nongovernmental sources to estimate services PCE.

A second source of information on the relative reliability of GDP\(_E\) and GDP\(_I\) is the correlation of the two measures with other variables that should be correlated with output growth, as examined in Nalewaik (2010). Nalewaik (2010) is careful to pick variables that are not used in the construction of either GDP\(_E\) or GDP\(_I\), to avoid spurious correlation resulting from correlated measurement errors.Footnote 11 The results are uniformly favorable to GDP\(_I\) and suggest that it is a more accurate measure of output growth than GDP\(_E\). In particular, from the mid-1980s to the mid-2000s, the period of maximum divergence between GDP\(_E\) and GDP\(_I\), Nalewaik (2010) finds that GDP\(_I\) growth has higher correlation with lagged stock price changes, the lagged slope of the yield curve, the lagged spread between high-yield corporate bonds and Treasury bonds, short and long differences of the unemployment rate (both contemporaneously and at leads and lags), a measure of employment growth computed from the same household survey, the manufacturing ISM PMI (Institute for Supply Management, Purchasing Managers Index), the nonmanufacturing ISM PMI, and dummies for NBER recessions. In addition, lags of GDP\(_I\) growth also predict GDP\(_E\) growth (and GDP\(_I\) growth) better than lags of GDP\(_E\) growth itself.

It is worth noting that, as regards our benchmark midpoint calibration of \(\phi =1.10\), we have deviated only slightly from an “ignorance prior” midpoint of 1.00. Hence our choice of midpoint reflects a conservative interpretation of the evidence discussed above. Similarly, regarding the width of the credible interval as opposed to its midpoint, we considered employing intervals such as \(\phi \in [0.95, 1.25]\), for which \(\phi > 1\) over most of the mass of the interval. The evidence discussed above, if interpreted aggressively, might justify such a tight interval in favor of GDP\(_I\), but again we opted for a more conservative approach with \(\phi < 1\) over more than a third of the mass of the interval.

2.2.2 Calibrating \(\rho \)

The key prior view embedded in our choice of \(\rho \in [0.30, 0.60]\), with midpoint 0.45, is that the errors in GDP\(_E\) and GDP\(_I\) are likely positively correlated, with a moderately but not extremely large correlation value. This again accords with the results in Nalewaik (2010), who shows that 26 % of the nominal value of GDP\(_E\) and GDP\(_I\) is identical. Any measurement errors in that 26 % will be perfectly correlated across the two estimates. Furthermore, GDP\(_E\) and GDP\(_I\) are both likely to miss fluctuations in output occurring in the underground or “gray” economy, transactions that do not appear on tax forms or government surveys. In addition, the same price deflator is used to convert GDP\(_E\) and GDP\(_I\) from nominal to real values, so any measurement errors in that price deflator will be perfectly correlated across the two estimates.

These considerations suggest that the lower bound for \(\rho \) should be well above zero, as reflected in our chosen interval. However, the evidence favoring an upper bound well below one is also quite strong, as also reflected in our chosen interval. First, and most obviously, the standard deviation of the difference between GDP\(_E\) and GDP\(_I\) is 1.9 %, far from the 0.0 % that would obtain if the errors were perfectly correlated and of equal size. Second, as discussed in the previous section, the source data used to construct GDP\(_E\) are quite different from the source data used to construct GDP\(_I\), implying that the measurement errors are likely to be far from perfectly correlated.

Of course, \(\rho \) could still be quite high if GDP\(_E\) and GDP\(_I\) were contaminated with enormous common measurement errors in addition to smaller, uncorrelated measurement errors. But if that were the case, GDP\(_E\) and GDP\(_I\) would show little correlation with other cyclically sensitive variables, such as the unemployment rate, whereas in fact both are strongly correlated with such variables. The \(R^2\) values from regressions of the output growth measures on the change in the unemployment rate are each around 0.50 over our sample, suggesting that at least half of the variance of GDP\(_E\) and GDP\(_I\) is true variation in output growth rather than measurement error. The standard deviation of the residual from these regressions is 2.81 % using GDP\(_I\) and 2.95 % using GDP\(_E\). For comparison, taking our benchmark value \(\phi = 1.1\) and our upper bound \(\rho = 0.6\) produces \(\sigma _I = 2.05\) and \(\sigma _E = 2.25\). Increasing \(\rho \) to \(0.7\) produces \(\sigma _I = 2.36\) and \(\sigma _E = 2.60\), approaching the residual standard errors from our regressions. This seems like an unreasonably large amount of measurement error, since the explained variation from such a simple regression is probably not measurement error, and indeed some of the unexplained variation from the regression is probably not measurement error either. Hence the upper bound of \(0.6\) for \(\rho \) seems about right.
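This back-of-the-envelope check can be approximated using only the reported 1.9 % standard deviation of the difference. Because the chapter’s own figures are computed from the full data moments via (9), the sketch below (assuming only numpy) reproduces them only roughly.

```python
import numpy as np

def sigmas_from_diff(phi, rho, sd_diff):
    """Implied (sigma_E, sigma_I) given phi, rho, and the standard deviation of
    GDP_E - GDP_I, using var(GDP_E - GDP_I) = sigma_E^2 + sigma_I^2 - 2*rho*sigma_E*sigma_I."""
    var_I = sd_diff**2 / (1.0 + phi**2 - 2.0 * rho * phi)
    sigma_I = np.sqrt(var_I)
    return phi * sigma_I, sigma_I

for rho in (0.60, 0.70):
    sigma_E, sigma_I = sigmas_from_diff(phi=1.1, rho=rho, sd_diff=1.9)
    print(rho, round(sigma_E, 2), round(sigma_I, 2))
# rho = 0.6 gives roughly (2.2, 2.0); rho = 0.7 moves both noticeably closer to the
# ~2.8-3.0 residual standard deviations from the unemployment-rate regressions.
```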

3 Combination Under Minimax Loss

Here we take a more conservative perspective on forecast combination, solving a different but potentially important optimization problem. We utilize the minimax framework of Wald (1950), the main decision-theoretic approach for imposing conservatism and therefore of intrinsic interest. We solve a game between a benevolent scholar (the Econometrician) and a malevolent opponent (Nature). In that game the Econometrician chooses the combining weights, and Nature selects the stochastic properties of the forecast errors. The minimax solution yields the combining weights that minimize the Econometrician’s worst-case loss. Under the minimax approach, knowledge or calibration of objects like \(\phi \) and \(\rho \) is unnecessary, enabling us to dispense with judgment, for better or worse.

We obtain the minimax weights by solving for the Nash equilibrium of a two-player zero-sum game. Nature chooses the properties of the forecast errors and the Econometrician chooses the combining weight \(\lambda \). For expositional purposes, we begin with the case of uncorrelated errors, constraining Nature to choose \(\rho = 0\). To constrain the magnitude of the forecast errors that Nature can choose, it is useful to re-parameterize the vector \((\sigma _I, \sigma _E)^{\prime }\) in terms of polar coordinates; that is, we let \(\sigma _I = \psi \cos \varphi \) and \(\sigma _E = \psi \sin \varphi \). We restrict \(\psi \) to the interval \([0,\,\bar{\psi }]\) and let \(\varphi \in [0,\, \pi /2]\). Because \(\cos ^2 \varphi + \sin ^2 \varphi = 1\), the sum of the forecast error variances associated with GDP\(_E\) and GDP\(_I\) is constrained to be less than or equal to \(\bar{\psi }^2\). The error variance of the combined forecast is given by

$$\begin{aligned} \sigma _C^2 (\psi ,\varphi ,\lambda ) = \psi ^2 \left[ \lambda ^2 \sin ^2 \varphi + (1-\lambda )^2 \cos ^2 \varphi \right], \end{aligned}$$
(10)

so that the minimax problem is

$$\begin{aligned} \max _{ \psi \in [0,\bar{\psi }], \, \varphi \in [0,\pi /2]} \; \min _{ \lambda \in [0,1]} \; \sigma _C^2 (\psi ,\varphi ,\lambda ). \end{aligned}$$
(11)

The best response of the Econometrician was derived in (2) and can be expressed in terms of polar coordinates as \(\lambda ^* = \cos ^2 \varphi \). In turn, Nature’s problem simplifies to

$$ \max _{ \psi \in [0,\bar{\psi }], \, \varphi \in [0,\pi /2]} \; \psi ^2 ( 1- \sin ^2 \varphi ) \sin ^2 \varphi , $$

which leads to the solution

$$\begin{aligned} \varphi ^* = \arcsin \sqrt{1/2}, \quad \psi ^* = \bar{\psi }, \quad \lambda ^* = 1/2. \end{aligned}$$
(12)

Nature’s optimal choice implies a unit forecast error standard deviation ratio, \(\phi =\sigma _E/\sigma _I=1\), and hence an optimal combining weight of \(1/2\). If, instead, Nature set \(\varphi = 0\) or \(\varphi = \pi /2\), that is \(\phi =0\) or \(\phi =\infty \), then GDP\(_E\) or GDP\(_I\), respectively, would be perfect, and the Econometrician could choose \(\lambda =1\) or \(\lambda =0\) to achieve a perfect forecast, a suboptimal outcome for Nature.

Now we consider the case in which Nature can choose a nonzero correlation between the forecast errors of GDP\(_E\) and GDP\(_I\). The loss of the combined forecast can be expressed as

$$\begin{aligned} \sigma _C^2 (\psi ,\rho ,\varphi ,\lambda ) = \psi ^2 \left[ \lambda ^2 \sin ^2 \varphi + (1-\lambda )^2 \cos ^2 \varphi + 2\lambda (1-\lambda ) \rho \sin \varphi \cos \varphi \right]. \end{aligned}$$
(13)

It is apparent from (13) that, as long as \(\lambda \) lies in the unit interval, the most devious choice of \(\rho \) is \(\rho ^*=1\). We will now verify that, conditional on \(\rho ^*=1\), the solution in (12) remains a Nash equilibrium. Suppose that the Econometrician chooses equal weights, \(\lambda ^*=1/2\). In this case

$$ \sigma _C^2 (\psi ,\rho ^*,\varphi ,\lambda ^*) = \psi ^2 \left[ \frac{1}{4} + \frac{1}{2} \sin \varphi \cos \varphi \right]. $$

We can deduce immediately that \(\psi ^* = \bar{\psi }\). Moreover, the first-order condition for the maximization with respect to \(\varphi \) implies that \(\cos ^2 \varphi ^* = \sin ^2 \varphi ^*\), which in turn leads to \(\varphi ^* = \arcsin \sqrt{1/2}\). Conditional on Nature choosing \(\rho ^*\), \(\psi ^*\), and \(\varphi ^*\), the Econometrician has no incentive to deviate from the equal-weights combination \(\lambda ^*=1/2\), because

$$ \sigma _C^2 (\psi ^*,\rho ^*,\varphi ^*,\lambda ) = \frac{\bar{\psi }^2}{2} \bigg [ \lambda ^2 + (1-\lambda )^2 + 2\lambda (1-\lambda ) \bigg ] = \frac{\bar{\psi }^2}{2}. $$

In sum, the minimax analysis provides a rationale for combining GDP\(_E\) and GDP\(_I\) with equal weights, \(\lambda = 1/2\).
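The equilibrium is also easy to verify numerically. The brute-force sketch below (assuming only numpy) checks both parts of the argument on a grid: given \(\lambda =1/2\), Nature’s loss is maximized at \(\rho =1\) and \(\varphi = \pi /4\); and at that choice the Econometrician’s loss is flat in \(\lambda \), so \(\lambda =1/2\) remains a best response.

```python
import numpy as np

def loss(psi, rho, varphi, lam):
    """sigma_C^2 in polar coordinates, eq. (13)."""
    s, c = np.sin(varphi), np.cos(varphi)
    return psi**2 * (lam**2 * s**2 + (1 - lam)**2 * c**2
                     + 2 * lam * (1 - lam) * rho * s * c)

psi_bar = 1.0
lam_grid = np.linspace(0.0, 1.0, 501)
varphi_grid = np.linspace(0.0, np.pi / 2, 501)
rho_grid = np.linspace(-1.0, 1.0, 201)

# (a) Given lambda = 1/2, Nature's loss is maximized at rho = 1 and varphi = pi/4.
payoff = np.array([[loss(psi_bar, r, v, 0.5) for v in varphi_grid] for r in rho_grid])
i, j = np.unravel_index(payoff.argmax(), payoff.shape)
print(rho_grid[i], round(varphi_grid[j], 3), round(payoff.max(), 3))  # 1.0 0.785 0.5

# (b) At that choice the Econometrician's loss is flat in lambda, so lambda = 1/2
# remains a (weak) best response and the loss equals psi_bar^2 / 2.
flat = loss(psi_bar, 1.0, np.pi / 4, lam_grid)
print(round(float(flat.min()), 6), round(float(flat.max()), 6))       # 0.5 0.5
```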

To the best of our knowledge, this section’s demonstration of the optimality of equal forecast combination weights under minimax loss is novel. There is of course some related literature, but ultimately our approach and results are very different. For example, a branch of the machine-learning literature (e.g., Vovk 1998; Sancetta 2007) considers games between a malevolent Nature and a benevolent “Learner.” The Learner sequentially chooses weights to combine expert forecasts, and Nature chooses realized outcomes to maximize the Learner’s forecast error relative to the best expert forecast. The Learner wins the game if his forecast loss is only slightly worse than the loss attained by the best expert in the pool, even under Nature’s least favorable choice of outcomes. This game is quite different from, and much more complicated than, ours, requiring different equilibrium concepts and leading to different combining weights.

4 Empirics

We have shown that combining using a quasi-Bayesian calibration under quadratic loss produces \(\lambda \) close to but less than 0.5, given our prior means for \(\phi \) and \(\rho \). Moreover, we showed that combining with \(\lambda \) near 0.5 is likely better—often much better—than simply using GDP\(_E\) or GDP\(_I\) alone, for wide ranges of \(\phi \) and \(\rho \). We also showed that combining under minimax loss always implies an optimal \(\lambda \) of exactly 0.5.

Here we put the theory to work for the U.S., providing arguably superior combined estimates of GDP growth. We focus on quasi-Bayesian calibration under quadratic loss. Because the resulting combining weights are near 0.50, however, one could also view our combinations as approximately minimax. The point is that a variety of perspectives lead to combinations with weights near 0.50 and suggest that such combinations are likely superior to using either GDP\(_E\) or GDP\(_I\) alone, so that empirical examination of GDP\(_C\) is of considerable interest.

Fig. 7: U.S. GDP\(_C\) and GDP\(_E\) growth rates. GDP\(_C\) constructed assuming \(\phi =1.1\) and \(\rho =0.45\). GDP\(_C\) is solid and GDP\(_E\) is dashed. In the top panel we show a long sample, 1947Q2–2009Q3. In the bottom panel we show a recent sample, 2006Q1–2009Q3. See text for details.

4.1 A Combined U.S. GDP Series

In the top panel of Fig.  7 we plot GDP\(_C\) constructed using \(\lambda = 0.41\), which is optimal for our benchmark calibration of \(\phi =1.1\) and \(\rho =0.45\), together with the “conventional” GDP\(_E\). The two appear to move closely together, and indeed they do, at least at the low frequencies emphasized by the long time-series plot. Hence for low-frequency analyses, such as studies of long-term economic growth, use of GDP\(_E\), GDP\(_I\) or GDP\(_C\) is not likely to make a major difference.

At higher frequencies, however, important divergences can occur. In the bottom panel of Fig.  7, for example, we emphasize business cycle frequencies by focusing on a short sample 2006–2010, which contains the severe U.S. recession of 2007–2009. There are two important points to notice. First, the bottom panel of Fig.  7 makes clear that growth-rate assessments on particular dates can differ in important ways depending on whether GDP\(_C\) or GDP\(_E\) is used. For example, GDP\(_E\) is strongly positive for 2007Q3, whereas GDP\(_C\) for that quarter is close to zero, as GDP\(_I\) was strongly negative. Second, the bottom panel of Fig.  7 also makes clear that differing assessments can persist over several quarters, as for example during the financial crisis episode of 2007Q1–2007Q3, when GDP\(_E\) growth was consistently larger than GDP\(_C\) growth. One might naturally conjecture that such persistent and cumulative data distortions might similarly distort inferences, based on those data, about whether and when the U.S. economy was in recession. We now consider recession dating in some detail.
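Once \(\lambda \) is fixed, constructing the combined series is a one-line operation. The sketch below (assuming pandas, with hypothetical series names gdp_e and gdp_i and made-up numbers, not actual data) illustrates the construction with the benchmark weight \(\lambda = 0.41\).

```python
import pandas as pd

def combine_gdp(gdp_e: pd.Series, gdp_i: pd.Series, lam: float = 0.41) -> pd.Series:
    """GDP_C = lam * GDP_E + (1 - lam) * GDP_I, with the default lam = 0.41 taken
    from the benchmark calibration (phi = 1.1, rho = 0.45)."""
    return lam * gdp_e + (1.0 - lam) * gdp_i

# Made-up illustrative growth rates (percent, annualized); not actual data.
gdp_e = pd.Series([0.9, 3.2, 2.3, 2.9])
gdp_i = pd.Series([1.2, 0.5, -0.7, 0.1])
print(combine_gdp(gdp_e, gdp_i))
```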

4.2 U.S. Recession and Volatility Regime Probabilities

Thus far we have assessed how combining changes measured GDP. Now we assess whether and how it changes an important transformation of measured GDP, namely inferred probabilities of recession regimes and high-volatility regimes. We proceed by fitting a regime-switching model in the tradition of Hamilton (1989), generalized to allow for switching in both means and variances, as in Kim and Nelson (1999a),

$$\begin{aligned} (\text{GDP}_{t}-\mu _{s_{\mu t}})&= \beta (\text{GDP}_{t-1}-\mu _{s_{\mu ,t-1}})+\sigma _{s_{\sigma t}}\varepsilon _{t} \\ \varepsilon _t&\sim \text{iid } N(0,1) \\ s_{\mu t}&\sim \text{Markov}(P_{\mu }), \quad s_{\sigma t} \sim \text{Markov}(P_{\sigma }). \end{aligned}$$
(14)

Then, conditional on observed data, we infer the sequences of recession probabilities [\(P(s_{\mu t}=L)\), where \(L\) (“low”) denotes the recession regime] and high-volatility regime probabilities [\(P(s_{\sigma t}=H)\), where \(H\) (“high”) denotes the high-volatility regime]. We perform this exercise using both GDP\(_E\) and GDP\(_C\), and we compare the results.

We implement Bayesian estimation and state extraction using data for 1947Q2–2009Q3.Footnote 12 In Fig.  8 we show posterior median smoothed recession probabilities. Those calculated using GDP\(_C\) appear as solid lines with 90 % posterior intervals; those calculated using GDP\(_E\) appear as dashed lines; and shaded bars mark NBER recession episodes to provide context. Similarly, in Fig.  9 we show posterior median smoothed volatility regime probabilities.
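For readers who want a quick approximation to this exercise, the sketch below fits a simplified version of (14) using the MarkovAutoregression class in statsmodels: a single Markov chain switches both the mean and the variance, and estimation is by maximum likelihood rather than the Bayesian methods used here, so it is only a rough stand-in. The series name gdp_c is a hypothetical input.

```python
import pandas as pd
from statsmodels.tsa.regime_switching.markov_autoregression import MarkovAutoregression

def recession_probabilities(gdp: pd.Series):
    """Fit a two-regime Markov-switching AR(1) to a GDP growth series (e.g., GDP_C
    or GDP_E) and return smoothed regime probabilities. Unlike eq. (14), a single
    chain governs both the mean and the variance."""
    mod = MarkovAutoregression(
        gdp, k_regimes=2, order=1,
        switching_ar=False,        # common AR coefficient beta across regimes
        switching_variance=True,   # regime-dependent sigma
    )
    res = mod.fit()
    # Regime labels are not tied to "recession"; inspect res.summary() and treat
    # the lower-mean regime as the recession regime.
    return res.smoothed_marginal_probabilities

# Usage (hypothetical): probs = recession_probabilities(gdp_c)
```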

Fig. 8: Inferred U.S. recession regime probabilities, calculated using GDP\(_C\) versus GDP\(_E\). Solid lines are posterior median smoothed recession regime probabilities calculated using GDP\(_C\), shown with 90 % posterior intervals. Dashed lines are posterior median smoothed recession regime probabilities calculated using GDP\(_E\). The sample period is 1947Q2–2009Q3. Dark shaded bars denote NBER recessions. See text and appendix for details.

Numerous interesting substantive results emerge. For example, posterior median smoothed recession regime probabilities calculated using GDP\(_C\) tend to be greater than those calculated using GDP\(_E\), sometimes significantly so, as for example during the financial crisis of 2007. Indeed, using GDP\(_C\) one might date the start of the recent recession significantly earlier than did the NBER. As regards volatilities, posterior median smoothed high-volatility regime probabilities calculated using either GDP\(_E\) or GDP\(_C\) tend to show the post-1984 “great moderation” effect asserted by McConnell and Perez-Quiros (2000) and Stock and Watson (2002). Interestingly, however, those calculated using GDP\(_E\) also show the “higher recession volatility” effect in recent decades documented by Bloom et al. (2009) (using GDP\(_E\) data), whereas those calculated using GDP\(_C\) do not.

For our present purposes, however, none of those substantive results are of first-order importance, as the present chapter is not about business cycle dating, low-frequency versus high-frequency volatility regime dating, or revisionist history, per se. Indeed, thorough explorations of each would require separate and lengthy papers. Rather, our point here is simply that one’s assessment and characterization of macroeconomic behavior can, and often does, depend significantly on use of GDP\(_C\) versus GDP\(_E\). That is, choice of GDP\(_C\) versus GDP\(_E\) can matter for important tasks, whether based on direct observation of measured GDP, or on transformations of measured GDP such as extracted regime chronologies.

5 Extensions

Before concluding, we offer sketches of what we see as two important avenues for future research. The first involves real-time analysis and nonconstant combining weights, and the second involves combining from a measurement error as opposed to efficient forecast error perspective.

5.1 Vintage Data, Time-Varying Combining Weights, and Real-Time Analysis

It is important to note that everything we have done in this chapter has a retrospective, or “off-line,” character. We work with a single vintage of GDP\(_E\) and GDP\(_I\) data and combine them, estimating objects of interest (combining weights, regime probabilities, etc.) for any period \(t\) using all data \(t=1, {\ldots }, T\). In all of our analyses, moreover, we have used time-invariant combining weights. Those two characteristics of our work are not unrelated, and one may eventually want to relax them, allowing for time-varying weights and, ultimately, a truly real-time analysis.

Fig. 9: Inferred U.S. high-volatility regime probabilities, calculated using GDP\(_C\) versus GDP\(_E\). Solid lines are posterior median smoothed high-volatility regime probabilities calculated using GDP\(_C\), shown with 90 % posterior intervals. Dashed lines are posterior median smoothed high-volatility regime probabilities calculated using GDP\(_E\). The sample period is 1947Q2–2009Q3. Dark shaded bars denote NBER recessions. See text and appendices for details.

One may want to consider time-varying combining weights for several reasons. One reason is of near-universal relevance and hence of great interest, at least under quadratic loss. For any given vintage of data, error variances and covariances may naturally change as we pass backward from preliminary data for the recent past, all the way through to “final revised” data for the more distant past.Footnote 13 More precisely, let \(t\) index time measured in quarters, and consider moving backward from “the present” quarter \(t=T\). At instant \(v \in T\) (with apologies for the slightly abusive notation), we have vintage-\(v\) data. Consider moving backward, constructing combined GDP estimates GDP\(_{C,T-k}^v\), \(k=1, 2, \ldots \). For small \(k\), the optimal calibrations might be quite far from the benchmark values. As \(k\) grows, however, \(\rho \) and \(\phi \) should approach the benchmark values as the final revision is approached. The obvious question is how quickly, and with what pattern, an optimal calibration should move toward the benchmark values as \(k\) grows. We can offer a few speculative observations.

First consider \(\rho \). GDP\(_I\) and GDP\(_E\) share a considerable amount of source data in their early releases, before common source data are swapped out of GDP\(_I\) (e.g., when tax returns eventually become available and can be used). Indeed Fixler and Nalewaik (2009) show that the correlation between the earlier estimates of GDP\(_I\) and GDP\(_E\) growth is higher than the correlation between the later estimates. Hence \(\rho \) is likely higher for dates near the present (small \(k\)). This suggests calibrations with \(\rho \) dropping monotonically toward the benchmark value of 0.45 as \(k\) grows.

Now consider \(\phi \). How \(\phi \) should deviate from its benchmark calibration value of 1.1 is less clear. On the one hand, early releases of GDP\(_I\) are missing some of their most informative source data (tax returns), which suggests a lower-than-benchmark \(\phi \) for small \(k\). On the other hand, early releases of GDP\(_E\) growth appear to be noisier than the early releases of GDP\(_I\) growth (see below), which suggests a higher-than-benchmark \(\phi \) for small \(k\). All told, we feel that a reasonable small-\(k\) calibration of \(\phi \) is less than 1.1 but still above 1.

Note that our conjectured small-\(k\) effects work in different directions. Other things equal, bigger \(\rho \) pushes the optimal combining weight downward, away from 0.5, and smaller \(\phi \) pushes the optimal combining weight upward, toward 0.5. In any particular data set the effects could conceivably offset more-or-less exactly, so that combination using constant weights for all dates would be fully optimal, but there is of course no guarantee.

Several approaches could be used to implement the time-varying weights sketched in the preceding paragraphs. One is a quasi-Bayesian calibration, elaborating on the approach we have taken in this chapter, although calibration becomes more difficult in an environment with time-varying parameters. Another is to construct a real-time data set, one that records a snapshot of the data available at each point in time, such as the one maintained by the Federal Reserve Bank of Philadelphia. The key is to recognize that each quarter we get not simply one new observation on GDP\(_E\) and GDP\(_I\), but rather an entire new vintage of data, all the elements of which could (in principle) change. One might be able to use the different data vintages, and related objects like revision histories, to infer properties of the “forecast errors” relevant for construction of optimal combining weights across various \(k\).

One could go even further in principle, progressing to a truly real-time analysis, which is of intrinsic interest quite apart from addressing the issue of time-varying combining weights in the above “apples and oranges” environments. Tracking vintages, modeling the associated dynamics of revisions, and putting it all together to produce superior combined forecasts remains an outstanding challenge.Footnote 14 We look forward to its solution in future work, potentially in the state-space framework that we describe next.

5.2 A Model of Measurement Error

In parallel work in progress (Aruoba et al. 2011), we pursue a complementary approach based on a state-space model of measurement error. The basic model is

$$\begin{aligned} \left[ \begin{array}{c} \text{GDP}_{E,t} \\ \text{GDP}_{I,t} \end{array} \right]&= \left[ \begin{array}{c} 1 \\ 1 \end{array} \right]\text{GDP}_t + \left[ \begin{array}{c} \varepsilon _{Et} \\ \varepsilon _{It} \end{array} \right] \end{aligned}$$
(15)
$$\begin{aligned} \text{GDP}_{t} = \beta _0+\beta _1 \text{GDP}_{t-1} + \eta _{t}, \end{aligned}$$
(16)

where \(\varepsilon _{t} = (\varepsilon _{Et}, \varepsilon _{It})^{\prime } \sim WN(\underline{0}, \Sigma _{\varepsilon }) \), \( \eta _{t} \sim WN(0, \sigma ^2_{\eta })\), and \(\varepsilon _t\) and \(\eta _t\) are uncorrelated at all leads and lags. In this model, both GDP\(_E\) and GDP\(_I\) are noisy measures of the latent true GDP process, which evolves dynamically. The expectation of true GDP conditional upon observed measurements may be extracted using optimal filtering techniques such as the Kalman filter.
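A minimal filtering sketch for (15)–(16), with all parameters treated as known and all numerical values hypothetical, is given below; Aruoba et al. (2011) estimate the parameters and extend the model, so this is purely illustrative.

```python
import numpy as np

def kalman_filter_gdp(y, beta0, beta1, sigma2_eta, Sigma_eps, x0, P0):
    """Kalman filter for (15)-(16): scalar state GDP_t with AR(1) transition and a
    2-vector of measurements (GDP_E, GDP_I). y is a (T, 2) array. Parameters are
    treated as known; in practice they would be estimated."""
    H = np.array([[1.0], [1.0]])              # both measures load on true GDP with unit coefficient
    T = y.shape[0]
    x_filt = np.empty(T)
    x, P = x0, P0
    for t in range(T):
        # Predict the state and its variance.
        x_pred = beta0 + beta1 * x
        P_pred = beta1**2 * P + sigma2_eta
        # Update using the two noisy measurements.
        S = P_pred * (H @ H.T) + Sigma_eps    # 2x2 forecast-error covariance
        K = P_pred * H.T @ np.linalg.inv(S)   # 1x2 Kalman gain
        innov = y[t] - H[:, 0] * x_pred       # 2-vector innovation
        x = x_pred + (K @ innov).item()
        P = (P_pred - K @ H * P_pred).item()
        x_filt[t] = x
    return x_filt

# Usage with hypothetical parameter values (Sigma_eps allows correlated measurement errors):
# x_hat = kalman_filter_gdp(y, beta0=0.8, beta1=0.5, sigma2_eta=4.0,
#                           Sigma_eps=np.array([[5.0, 2.3], [2.3, 4.2]]),
#                           x0=3.0, P0=10.0)
```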

The basic state-space model can be extended in various directions, for example to incorporate richer dynamics, and to account for data revisions and missing advance and preliminary releases of GDP\(_I\).Footnote 15 Perhaps most important, the measurement errors \(\varepsilon \) may be allowed to be correlated with GDP, or more precisely, correlated with GDP innovations, \(\eta _t\). Fixler and Nalewaik (2009) and Nalewaik (2010) document cyclicality in the “statistical discrepancy” (GDP\(_E - \text{ GDP}_I\)), which implies failure of the assumption that \(\varepsilon _t\) and \(\eta _t\) are uncorrelated at all leads and lags. Of particular concern is contemporaneous correlation between \(\eta _t\) and \(\varepsilon _{t}\). The standard Kalman filter cannot handle this, but appropriate modifications are available.

6 Conclusions

GDP growth is a central concept in macroeconomics and business cycle monitoring, so its accurate measurement is crucial. Unfortunately, however, the two available expenditure-side and income-side U.S. GDP estimates often diverge. In this chapter, we proposed a technology for optimally combining the competing GDP estimates, we examined several variations on the basic theme, and we constructed and examined combined estimates for the U.S.

Our results strongly suggest the desirability of separate and careful calculation of both GDP\(_E\) and GDP\(_I\), followed by combination, which may lead to different and more accurate insights than those obtained by using expenditure-side or income-side estimates alone. This prescription differs fundamentally from current U.S. practice, where both are calculated but the income-side estimate is routinely ignored.

Our call for a combined U.S. GDP measure is hardly radical, particularly given current best-practice “balancing” procedures used at various non-U.S. statistical agencies to harmonize GDP estimates from different sources. We discussed U.K. GDP balancing at some length in the introduction, and some other countries also use various similar balancing procedures.Footnote 16 All such procedures recognize the potential inaccuracies of source data and have a similar effect to our forecast combination approach: the final GDP number lies between the alternative estimates.

Other countries use other approaches to combination. Indeed Australia uses an approach reminiscent of the one that we advocate in this chapter, albeit not on the grounds of our formal analysis.Footnote 17 In addition to GDP\(_E\) and GDP\(_I\), the Australian Bureau of Statistics produces a production-side estimate, GDP\(_P\), defined as total gross value added plus taxes less subsidies, and its headline GDP number is the simple average of the three GDP estimates. We look forward to the U.S. producing a similarly combined headline GDP estimate, potentially using the methods introduced in this chapter.