1 Introduction

The link between macroeconomic and microeconomic behaviours has attracted a lot of interest among economists. Treating an entire heterogeneous population as if it were a single agent or modelling aggregates without reference to individual behaviour poses the question of how relevant might the results be. The question of aggregation has been, for many years, one of the most interesting and troublesome, both theoretically and in the empirical analyses of consumption (see Deaton 1992, for a detailed survey on these issues). The origin of the contradiction between the theoretical postulates and the data available to validate them is the fact that the life cycle–permanent income hypothesis (LC-PIH) models individual behaviour. However, much of the empirical analyses of these models use macroeconomic data. This contradiction inspires a major part of the studies by Friedman (1957) and many of the first empirical contributions in the consumption literature. The early work by Gorman (1959) is also concerned with the issue of aggregation over goods and the separability of the utility function.

Despite their high econometric fitness and predictive goodness, many of the empirical results with individual data analysing microeconomic demand functions put into question the empirical aggregate Keynesian consumption functions. Sonnenschein (1972, 1973), Debreu (1974) and Mantel (1974) proved that any set of excess of market demand functions satisfying the Walras Law can be derived from the axiomatic of the standard maximisation for individual utility, breaking the link between the assumptions of a representative agent and the aggregate modelisation.Footnote 1 More recently, Kirman (1992) considers that testing for any hypothesis with representative agent models is not feasible with aggregate data, given that it is not possible to discriminate, in the empirical field, between the assumptions of aggregation and the testable theoretical hypotheses.

Deaton (1992) highlights that micro-data are the appropriate information to analyse consumption models, although some problems may arise when using microeconomic household data for the empirical analysis of consumption and income. First, one has to deal with differences among individuals in the sample and control for them if any improvement is to be made in understanding their consumption.Footnote 2 Aggregation not only eliminates individual idiosyncrasies, but could also significantly reduce the effects of measurement error, at least in intermediate levels of aggregation (like cohort data).Footnote 3 Second, there is a lack of household survey data suitable for testing the predictions of the consumption theory (see Gourinchas and Parker 2002). The ideal data would be those that track individuals or households over long time series.

In some cases, as an alternative to individual data, it can be used household income and expenditure surveys that have been in the field for many years. Thus, although households cannot be tracked in these surveys, it is possible to construct time series for independent cross sections.Footnote 4 Following Browning et al. (1985) and Deaton (1985), it is possible to construct synthetic cohorts from these data. One advantage of using cohorts is that, although this procedure uses (semi-) aggregated data, we do not face the usual functional form problems associated with aggregation, as averages can be computed for whatever function of the data desired (Deaton 1992). As with short panel data, econometric complexities arise because we do not have the long panel data that, in principle, would be ideal to test the theory of consumption. In addition to measurement error problems, pseudo-panels can suffer from a trade-off problem when trying to increase the number of cohorts at the cost of reducing the sample to calculate averages.Footnote 5

To analyse the aggregation issue, we focus on the intertemporal consumption theory as this provides a framework to illustrate the aggregation problem. Much of the recent macroeconomic literature on aggregate consumption adopts the theory of the LC-PIH to develop models for a representative agent. Deaton (1992) establishes the conditions for testing LC-PIH models with both macro- and microeconomic data while Attanasio and Low (2004) analyse the approaches and problems for estimating Euler equations for consumption.

This paper aims to show the differences that arise when estimating a simple Euler equation using different aggregation levels of household data (micro-level data, cohort aggregated data and full aggregated data). To estimate our models, we use microeconomic household data, i.e. the Spanish Family Expenditure Survey (Encuesta Continua de Presupuestos Familiares, ECPF from now on), for the period 1985–1997.Footnote 6 From these data, we construct both cohort and complete aggregated averages. The main results we get are in line with the above predictions. We get an estimate of the elasticity of intertemporal substitution (EIS) similar to the studies that use the same kind of data (with comparable levels of aggregation). We provide evidence about the bias incurred when using aggregated rather than panel data (either pseudo- or true panel) and when using pseudo-panel in place of true panel data. We find a larger bias when using aggregated data (due to the transformation of the model in a non-adequate way) than the bias we find when using cohort aggregates instead of individual data. The entropy measure introduces a downward bias in the EIS, which becomes non-significant when estimating it with aggregated data. Thus, our results are in line with Guvenen (2006), who finds that heterogeneity in the stock market participation and differences in the intertemporal behaviour of consumption (depending on the level of wealth) explain the discrepancy between the value of the EIS used in real business cycle models and the value estimated with aggregated data (see Hall 1988).

Our estimate for the EIS with panel data is 0.54 (and statistically significant). With cohort data the estimated EIS is 0.3. These figures are comparable to the estimates found by Attanasio and Weber (1993) with cohort data for the British economy. Further, it is important to note that we estimate an EIS not statistically different from zero with aggregated Spanish data, similar to Hall (1988) for the US economy. Thus, our results point to the need of using panel data in order to get estimates for the EIS significantly different from zero.

The layout of the paper is as follows. In the next section, we describe the theoretical and statistical models. In Sect. 3, we present the empirical application, the data and the results using individual survey data drawn from the ECPF and from the National and Regional Accounts. We also present an exercise trying to derive empirically the biases arising from estimating the models on cohort data when in fact one has real panel data. Finally, Sect. 4 concludes.

2 The life cycle model and the Euler equation

We use the intertemporal utility maximisation framework under intertemporal constraints to model household behaviour. In this model, the explanatory variables for household consumption expenditure in period t are not only current prices and current income at t and the intertemporal utility function, but also the stock of financial assets at the beginning of period t, and the household expectation for future prices, interest rates, income, etc., as well as past prices, interest rates and income.

We analyse the evolution of consumption expenditure for a large and heterogeneous population of households. To analyse household consumption over time, we will use data on different levels of aggregation, i.e. individual household data, intermediate level of aggregated data (cohort data) and full aggregates (national aggregates).

We define mean real consumption expenditure in period t for the group of goods G, Gt,G, as

$$ C_{t,G} = \frac{1}{{N_{t} }}\sum\limits_{i \in N} {c_{it,G} } $$
(1)

where cit,G, is the real consumption expenditure of household i during period t on all goods within the commodity group G and Nt is total population in period t.Footnote 7

To analyse the relationship between average individual consumption and aggregated consumption, we consider the first order condition of a standard intertemporal optimisation setting, or the Euler equation. Under standard assumptions, including absence of problems in achieving the optimal individual utility level and rational expectations, the Euler equation for consumption can be written as follows

$$ \frac{{U_{\text{c}} \left( {c_{it + 1} ,e^{{\theta_{it + 1}^{{}} }} } \right)}}{{U_{\text{c}} \left( {c_{it} ,e^{{\theta_{it}^{{}} }} } \right)}}\frac{{1 + r_{it + 1} }}{{1 + \delta_{i} }} = 1 + \varepsilon_{it + 1} $$
(2)

where Uc is the marginal utility of consumption, rit+1 is the real interest rate for household i and δi is the subjective discount rate for household i, which is assumed to be constant. The parameter θit+1 captures some perturbations in the set of household’s preferences, reflecting the influence of certain variables on household’s tastes. Usually, it is assumed that these perturbations depend on the size of the household and the ages of its members. And, finally, εit+1 is an expectation error term, which is uncorrelated with any information known at time t.

Assuming an isoelastic utility function, we get

$$ U_{\text{c}} \left( {c_{it + 1} ,e^{{\theta_{it + 1}^{{}} }} } \right) = \frac{1}{1 - \alpha }c_{it + 1}^{1 - \alpha } e^{{\theta_{it + 1}^{{}} }} $$
(3)

where α is the coefficient of the relative risk aversion. Now, taking logarithms and making some arrangements we obtain the following testable Euler equation

$$ \Delta \ln c_{it + 1} = \beta_{i} + \frac{1}{\alpha }\ln (1 + r_{it + 1} ) + \frac{1}{\alpha }\Delta \ln \theta_{it + 1} + v_{it + 1} $$
(4)

where \( \beta_{i} = - \frac{1}{\alpha }\ln (1 + \delta_{i} ) \) captures individual heterogeneity and \( \frac{1}{\alpha } \) is the elasticity of intertemporal substitution. Although \( \ln v_{it + 1} = - \frac{1}{\alpha }\ln (1 + \varepsilon_{it + 1} ) \) might not have the appropriate properties to estimate the model, we can obtain these properties by assuming that this error term is log-normally distributed (see Zeldes 1989). Equation (4) has been widely used in empirical analyses for consumption, with both macroeconomic and individual data, in different contexts (see among others Zeldes 1989; Blundell and Stoker 2005; Gourinchas and Parker 2002; or, Crossley and Low 2011). Note that, ceteris paribus, the individual consumption rate in Eq. (4) depends on the difference between the subjective discount rate, \( \beta_{i} = - \frac{1}{\alpha }\ln (1 + \delta_{i} ) \), and the interest rate.Footnote 8 At the aggregate level, it does not exist a variable to capture this subjective discount rate, which explains that we get different estimates of the model with aggregate or individual data.

We can use the previous framework to model aggregate consumption. To do so, we assume that we have a representative agent, so that all individuals in the economy are identical. We take the mean population as a good proxy for the behaviour of the typical representative agent in the economy. In our case, we can apply the Euler equation to these means, getting the following expression from Eq. (2),

$$ \frac{{U_{\text{c}} \left( {\frac{1}{{N_{t + 1} }}\sum\nolimits_{{i \in N_{t + 1} }} {c_{it + 1} } ,e^{{\theta_{it + 1}^{{}} }} } \right)}}{{U_{\text{c}} \left( {\frac{1}{{N_{t} }}\sum\nolimits_{{i \in N_{t} }} {c_{it} } ,e^{{\theta_{it}^{{}} }} } \right)}}\frac{{1 + r_{t + 1} }}{1 + \delta } = 1 + \varepsilon_{t + 1} $$
(5)

where Nt is the number of individuals in the population in period t, relevant for the aggregate definition considered.Footnote 9 This expression is the baseline for the empirical analyses using aggregated consumption data and pseudo-panel data. The aggregation problems posed by using expression (5) might come both from the definition of consumption and from aggregating idiosyncratic shocks.

First, we discuss the problems associated with the definition of consumption. The usual way to deal with the estimation of empirical models for consumption is using a log-linearised Euler equation, such as expression (4). However, the logarithm of the mean of the real aggregated consumption is different to the mean of the logarithms of individual consumption, as it can be checked in the following expression

$$ \ln C_{t} = \ln \left( {\frac{1}{{N_{t} }}\sum\limits_{{i \in N_{t} }} {c_{it} } } \right) \ne \frac{1}{{N_{t} }}\sum\limits_{{i \in N_{t} }} {\ln c_{it} } $$
(6)

This problem can be properly tackled when generating cohort data; however, there is no way to deal with it when using national aggregated data (see Carroll 2001).

Second, one has to face the problem of aggregating idiosyncratic shocks. The two aggregation issues rose above (the problem related to the definition of consumption and the problem associated with the aggregation of the shocks) only occur simultaneously when estimating the Euler equation with National Account (NA) data. If we use individual data from the ECPF, we can separate the two problems. Differences in the results of the EIS found at the highest aggregation level can then be attributed to the aggregation of the shocks at this level. Hildebrand (1998) analyses the conditions that guarantee the appropriateness of the estimates from average aggregates without specifying a rigorous model of microeconomic behaviour. The fitness of these estimates implies that some features of the household characteristics and income distributions do not vary. But this is equivalent to assuming that long time series of aggregated data might not be appropriate to estimate these models. So, the comparison of NA with data fully aggregated from individual households allows assessing Hildebrand’s (1998) conditions. Subsequently, we would be able to check the effects of aggregating idiosyncratic shocks at three different levels of aggregation.

From our point of view, it does not seem plausible that the results from estimating the same Euler equation using data at different aggregation levels yield the same results. Our goal in this paper is to study the existence of a bias in our estimates that might arise from the two problems outlined above. For undertaking these purposes, we use the above framework and estimate a simple Euler equation with data at different levels of aggregation: NA data and different levels of aggregation of the individual data from the ECPF. Specifically, we use this last database as a panel, as a pseudo-panel and as a cross section, with different options considered for this last case.

3 Data and empirical results

3.1 The data

The objective of this work implies using different statistical sources of data on consumption. We use national and microeconomic data sets elaborated by the Spanish National Institute of Statistics, INE. These data have very different levels of aggregation and periodicity. We take data for consumption on final expenditure from the NA and expenditure on food and beverages from the Regional Accounts. The selection of these main aggregates has been motivated by the fact that they meet the definition of total expenditure and expenditure on food and beverages that can be built from the ECPF.Footnote 10

The ECPF is a rotating panel based on a comprehensive survey run by the INE, which involves interviewing 3200 households every quarter (12.5% of the households are replaced every quarter by a new randomly drawn group). In principle, each household is interviewed eight consecutive quarters, although not all of them stay in the sample for the 8 quarters. The ECPF collects information concerning household characteristics (demographics, labour category, education, etc.) and on a wide range of expenditures made on more than 240 commodity categories.

To compare the aggregates obtained from the ECPF with those from the NA, we have selected out those households that report a zero on total expenditure or income. After this selection, we are left with a sample of 150,853 observations. We have been more rigorous in the selection of the cohort and panel data samples. For these data, we have discarded those households whose income and expenditure were below and above the 1st and the 99th percentiles of their distributions, respectively. This makes that our final sample is composed by 132,856 observations.

To build cohorts, we group observations by the date of birth of the head of the household. We use 5-year age bands: the youngest cohort is composed by those households whose head was born after 1960 while the oldest is composed by those whose head was born before 1940. We obtain 10 cohorts per year whose average sizes are presented in Table 2.

In the consumption literature with individual data, models are typically estimated with non-durable expenditure data to avoid intertemporal non-separabilities that might be produced by habits or durability. However, we do not have an appropriate aggregate measure for non-durable consumption for the Spanish economyFootnote 11 and use total expenditure for all aggregation levels considered.Footnote 12

The set of prices we use to transform the data to real terms are as follows. For NA data, we use the appropriate index for each aggregate of consumption, published by the INE (sub-indexes and weights). For the cohort and panel data, we have constructed an individual Stone price index given the heterogeneous composition of the basket of goods. Finally, it is important to note that we have aggregated the data from the ECPF with variables in real terms (deflated using individual Stone price indexes), i.e. we aggregate after removing the effect of prices.

Finally, we had some problems to find an interest rate for the entire period 1970–1997. We use the commercial banks borrowing interest rate between 1 and 3 years (1980–1997). For the NA estimates, we combine different interest rate series (for the Spanish economy) in order to get a rate for the entire period analysed. We have also used other interest rates published by the Bank of Spain (1985–1995), although the results are, in all cases, very similar. It is important to note that in those exercises where we use aggregates from the ECPF, nominal interest rates do not show individual variation, although real magnitudes have variation as they are deflated using individual price indexes.

Table 1 presents the summary statistics for all the categories of expenditure from both statistical sources. We made some corrections in the aggregates we produced from the ECPF, to make them comparable with final private national aggregate consumption from the NA. In particular, we have not considered the imputed rent for homeowners in total expenditure for the ECPF as this item is not included in the definition of consumption in the NA. There are other items (such a self-consumption, self-supply) not included in the NA that we have not been able to remove from the ECPF; as they are mixed with other expenditure categories, we cannot discard them from total expenditure. This explains that there still remains a difference between the data in both surveys. Given the problems to identify some items with the information of the survey, we have not subtracted any amount in the non-durable and food categories. The statistics for the aggregates from the ECPF and from cohort data show the expected gradation. Table 2 presents the average size for the different cohorts defined. As it can be checked, the sizes of the cohorts are fully comparable to the standard values in the literature, if not bigger (except for the 1st and 10th cohort, for obvious reasons).Footnote 13

Table 1 Summary statistics
Table 2 Cohort sizes

Prior to presenting the estimation results, we consider that a graphical analysis could be of interest. First, given that our focus is on the life cycle patterns of consumption, it is important to graph the main features of the data variables that might be important for consumption. We look at the life cycle behaviour of income and consumption (total expenditure). In Figs. 1 and 2, we plot the behaviour of these variables. To construct these figures, we use the constructed cohort data. Each line then represents the evolution over time of the relevant variable for one cohort (defined over a 5-year band). In Fig. 1, we present the life cycle path of consumption (total real expenditure) and get the familiar pattern that consumption raises initially and then falls after the household gets to the late 40s, which is a result also obtained for the British Family Expenditure Survey (FES), see Browning et al. (1985) and Blundell et al. (1994). Figure 2 shows the evolution of real income. Comparing these two graphs, we can conclude (assuming that income is exogeneous) that it seems that consumption tracks income over the life cycle, as in Blundell et al. (1994).

Fig. 1
figure 1

Life cycle total expenditure

Fig. 2
figure 2

Life cycle total income

Toda and Walsh (2015), using the US Consumer Expenditure Survey (CEX), provide some evidence that individual consumption growth data follow the power law in the upper and lower tails of the distribution. They also show that consumption growth for cohort data allows estimating Euler equations, being these data less exposed to suffer from the problems implied by fat tails. This is so because, as proved by Battistin et al. (2009), the consumption distribution is approximately lognormal within age cohorts. Battistin et al. (2009) also use data from the CEX, complemented with information from the Panel Study of Income Dynamics (PSID). These results are of great interest in this work to interpret the differences obtained between individual and cohort equations.

3.2 Empirical results

In this section, we present the estimation results for the Euler equation using our data sets at different aggregation levels. We will also present two robustness check exercises: on the one hand, we check if there is excess sensitivity in the Euler equation, and on the other, we test for robustness using different age cohort samples (young, intermediate and old).

Table 3 presents the results of the Euler equation estimation on different data sets. In the first column, we estimate the Euler equation without controlling for demographics, and in the second we present the specifications including demographics.Footnote 14

Table 3 IV-Estimated Euler equations

Table 3 shows that the results we report are very different in terms of the EIS parameters estimated. First, this parameter is negative (although not statistically significant) with NA data. This result is at odds with the results obtained in the related literature for other countries, where the estimated EIS value is positive, but very low (see Patterson and Pesaran 1992; or, Yogo 2004), but it is comparable to Hall (1988) who estimated an EIS not very different from zero.Footnote 15 When using microeconomic data, the estimated EIS is positive (although less than one), but, unfortunately, the standard errors are very large (except for cohort and panel data). This is in line with the available evidence from other countries, for example Runkle (1991), Attanasio and Weber (1993, 1995) and Biederman and Goenner (2008). We obtain that the estimated EIS value increases with the level of disaggregation, being the highest estimated value the one obtained with panel data (0.550, with a standard error –s.e.– of 0.118). With cohort data, we get an estimated value for the EIS that although is positive and significant, is much smaller than the panel data estimate. Further, the statistical significance of the EIS estimates increases with the level of disaggregation obtaining significant estimates with cohort and panel data. Since we can separate the use of an inadequate transformation of consumption and the aggregation of idiosyncratic shocks only when using individual data, the comparison of all sets of results stresses the importance of this problem for the estimate of the EIS and the superiority of panel data, and somehow cohort data, as compared to other (aggregate) data for this purpose.

Finally, we also report in Table 3 the estimates for the adjusted aggregate sample. For these data, we perform two exercises, one where we correctly aggregate consumption (the sum of the logs), as it can be seen in the right-hand side of Eq. (6), and the alternative incorrect aggregation (the log of the sum), as in the left-hand side of the same equation. We obtain that the estimated EIS with the correct and incorrect aggregation procedures is not statistically significant, as it occurs with the NA estimate.

Following Attanasio and Weber (1993, 1995) and to check for the need to control for household characteristics, in the second column in Table 3 we provide the estimates for the same specifications accounting for some demographics (age and number of household members).Footnote 16 The introduction of demographic variables slightly changes the results presented above, and it does not significantly affect the estimates of the EIS. All in all, we would like to underline that all the above reported results for Spain are very much in line with the results previously obtained by Attanasio and Weber (1993) for the UK and Attanasio and Weber (1995) for the USA, apart from the fact that Attanasio and Weber (1993) found that the estimated EIS changed when demographic variables were introduced in the specification. In their case, the estimated value for the EIS changes from 0.59 (without demographics) to 0.77 (with demographics).

In all estimates in Table 3, we have performed three tests for the validity of the instruments: under-identification (Kleibergen–Paap LM statistic) test, over-identification (Hansen J statistic) test and weak identification (Kragg–Donald Wald F statistic) test. All these tests perform properly for the panel data and cohort estimates. However, this is not so for the aggregate estimates. This confirms the validity of the instrument sets used in each specification.

3.3 Robustness checks

As typical in this literature, it is important to check the extent of the excess sensitivity. Thus, in Table 4 we provide an excess sensitivity test and report the results for the same specifications reported in Table 3, but including lagged growth of income as a covariate.Footnote 17 As before, with NA aggregates the specifications do not include demographics. As regards the excess sensitivity, it is striking to note that we only find excess sensitivity for the NA aggregate results. However, we do not find excess sensitivity for all the other levels of disaggregation. Further, the results we obtain for the EIS, both with cohort aggregates and with panel data, are very similar to those obtained before, which indicates that the excess sensitivity plays no role in these specifications. This is in line with other seminal papers, such as Runkle (1991). However, differently to Attanasio and Weber (1993) we do not find excess sensitivity to income with either cohort or panel data when we do not control for demographic characteristics. Further, our results also contrast with Guariglia and Rossi (2002) who associate the excess sensitivity to the presence of habits in the utility function.Footnote 18

Table 4 Excess sensitivity test

A second robustness check consists on replicating the above exercise grouping individual level data in different cohorts (all individuals, young, intermediate age and old).Footnote 19 These results are reported in Table 5. In the first column, we provide the estimates of the Euler equation without controlling for demographics, in the second column we control for demographics, and in the third one we present the estimates where we check for the excess sensitivity. The estimated values for the EIS vary across the different samples provided. First, all the samples considered provide EIS estimates bigger than the estimates obtained for cohort data in Table 3, except for the case of the young cohort without demographics. However, none of the EIS estimates is statistically significant. This contrasts with the statistically significant (at the 1% level) estimates for the cohort data reported in Table 3. There are some reasons related to aggregation that may explain this difference. Thus, although the increase in the number of individuals in the (young, broad and old) samples might reduce the measurement error associated with averaging, it is also true that creating bigger groups worsens the bias associated with the inclusion of individuals that behave differently in consumption. These two effects act in opposite directions, but might explain the difference. Second, the inclusion of demographics does not change very much the estimated EIS, except for the young cohort. Finally, the results of the excess sensitivity test (reported in column 3) can be used to detect aggregation problems, given that the literature attributes different sensitivity to different population groups.

Table 5 IV-Estimated Euler equations for different samples

Differently to the results presented above, when we include lagged income we find slightly changes (in a non-homogeneous manner) for the EIS estimated. Thus, we find excess sensitivity in two of the samples (young and old cohorts), which confirms previously available evidence related to these demographic groups for which the intertemporal optimisation model does work as expected. However, this failure might be related to different reasons: liquidity constraints in the case of young individuals and uncertainty about the retirement or the bequest motive for the elderly. Young cohorts face more problems than the rest of the population to obtain the credit they need, see Grant (2007), Benito (2006) and Jappelli (1990), while older cohorts usually show a consumption behaviour not explained by the intertemporal substitution model, finding frequently that people decrease consumption after retirement, which produces again excess sensitivity to income, Banks et al. (1998).

The following robustness check provides evidence on aggregation biases due to the incorrect transformation of the model. In this respect, recall that the difference between both terms in Eq. (6) provides the entropy measure. We calculate the difference between the log of consumption at the aggregate level and the sum of the log of consumption at the individual level. We plot our measure of entropy and the interest rate in Fig. 3. The evolution of these two variables provides evidence about the relationship between them, which is confirmed by a positive and highly significant coefficient of correlation. This result implies that there is an important aggregation bias in the estimated EIS from aggregate consumer data when a log-linear specification for consumption growth is used. This result adds further evidence about the inappropriateness of both NA data and aggregated data from the ECPF to estimate the IES, pointing to the existence of a bias in the estimated EIS both with NA and with inadequately aggregated data from the ECPF. Whether the inclusion of demographic variables in the specification has implications for explaining the evolution of consumption as well as for additional bias in the EIS is not confirmed by our results. Moreover, since we do not find significant estimates of the EIS either in the data correctly or incorrectly aggregated from the ECPF, we cannot confirm this effect in our empirical results.

Fig. 3
figure 3

Relationship between the interest rate and the entropy measure. Notes: 1. r1 is the interest rate. 2. y3 is the entropy measure

Finally, we follow Verbeek and Nijman (1992) to compute the existence of a bias when using pseudo-panel instead of true panel data. We propose an exercise where we assume that the heterogeneous household effects are correlated with the interest rate.Footnote 20 For this purpose, we take Eq. (4) for the sample of households (i = 1, …, N; t = 1, …, T) and first assume that we have neither an endogeneity problem of the interest rate nor a measurement error problem. In these circumstances, we could estimate the Euler equation by OLS, except for the fact that βi could still be correlated with the other covariates. Under correlated effects, any transformation of (4), i.e. within groups, first differences or any other, provide consistent estimates for the parameters. In a model with correlated effects, an alternative to get consistent estimates would be to assume a structure for βi (see Mundlak 1978; or Chamberlain 1984). Thus, we assume that the interest rate is correlated with household effects and we model this correlation as a function of age. There are economic reasons for assuming this correlation, and we could think on other variables correlated with the interest rate, but for simplicity, we assume that the only variable correlated with the interest rate is age. Then, following the above reasoning we assume:

$$ \beta_{i} = \lambda \overline{{\ln \left( {1 + r_{it + 1} } \right)}} + v_{i} $$
(7)

and

$$ \ln \left( {1 + r_{it + 1} } \right) = \mu_{t} + \gamma_{t} Z_{i} + \omega_{it} $$
(8)

where Zi is the squared value of the difference between the birthdate of the head of household i and the average birthdate for the whole sample (remember that our original panel data corresponds to quarters, and therefore, we can define individual birthdates quite accurately). We use the squared values of the differences to avoid compensation among positives and negatives and to allow non-linearities in the structure of the heterogeneous effects. Then, we can estimate the model composed by (4), (7) and (8) by OLS (assuming correlated effects) and compare these estimates with the specification estimated by OLS without assuming correlated effects (i.e. without the additional variables defined in expressions 7 and 8). We estimate these specifications with micro- and cohort data and compare the estimates for the EIS. In both cases, we repeat the estimations by instrumental variables in order to account for the endogeneity of the interest rate. These results are reported in Table 6.

Table 6 Bias from pseudo-panel to real panel data

According to the theoretical results (see Verbeek and Nijman 1992), the maximum bias we incur when using within groups based on cohort instead of panel data should be around 40%. In our case, the difference between the parameter estimates using the panel and the cohort data is around 35% for the case of OLS and higher for the specifications estimated by instrumental variables (see results in Table 6). Therefore, our results are in line with the predictions in Verbeek and Nijman (1992), which points to recommend the use panel data, whenever it is possible, to estimate Euler consumption equations.

4 Conclusions

The relevance of the problem of aggregation is well known since the 1960s, although it is surprising the scarce number of works devoted to this relevant issue in the literature. This lack of attention might be probably due to the difficulties, in the empirical applied analysis, to combine information from different levels of aggregation and datasets required for doing so. In this sense, this paper tries to fill this gap in the literature.

In this work, we have used consumption data for different levels of aggregation to assess the importance of the aggregation biases in the estimation of Euler equations for consumption. The first step we have undertaken is to check the adequacy of the different kind of datasets used, given that they have been extracted from different surveys with very different characteristics. Among these differences are the period of availability of the data and the different definitions for the variable expenditure considered. In general, we have homogenised the data as much as possible, generating the adequate aggregates for the comparisons.

In summary, our main result is concerned with the estimation of the EIS. This is a parameter that has been thoroughly investigated with different types of datasets for many economies, due to its economic relevance. The contradictory results obtained for this parameter have been frequently attributed to the problem of aggregation (see, for example, Attanasio and Weber 1993; Attanasio 1999), although very few works have investigated the exact reasons for the differences between the values obtained or the size of the aggregation bias in the estimations as performed in this paper, using all aggregation levels of the data. Our analysis is in line with Attanasio and Weber (1993), who found important differences when comparing results from aggregate cohort data and aggregate national level data. However, we find that including demographics in the specification does not have any effect on the estimated EIS. We further add to the literature the fact that we also study the impact on the EIS of using true panel data as compared to pseudo-panel data. We find that with true panel data, the EIS estimated is positive and significant what provides empirical evidence to Verbeek and Nijman (1992) suspicions about the existence of important biases in the estimates attributable to non-controlling for individual effects. This does not occur with true panel data as we can control for correlated effects, which implies that these data are superior to undertake this task. This is also in line with Alan and Browning (2010) or Alan et al. (2018) who indicate the importance of controlling for individual heterogeneity when adjusting reduced-form, semi-structural or structural equation models.