1 Introduction

The study of convergence in GDP per capita, or similar variables, among territories has been one of the central issues in the literature on economic growth and regional economics. This interest is logical both for the design of economic policy and for the discussion of economic theory, because it is fundamental to find empirical evidence about where and under what conditions processes of economic convergence or divergence can be observed. For example, identifying patterns of divergence or very slow convergence among European territories would provide the empirical evidence to support active and expensive territorial cohesion policies, such as the European Union Cohesion Policy, which is now the most expensive policy in the EU budget. From an academic point of view, finding convergence in different scenarios provides empirical evidence for the hypothesis of decreasing returns, while divergence implies the rejection of this hypothesis and gives evidence in line with endogenous growth models or the framework of urban and regional economics.

There are different ways of studying convergence among territories, but the so-called \(\sigma \)- and, especially, \(\beta \)-convergence are the approaches most commonly applied. \(\sigma \)-Convergence is perhaps the simplest approach. It basically quantifies the dispersion of income per capita or a similar variable at different points in time: if the standard deviation of the variable of interest decreases over time, this is taken as an indication of convergence. This kind of analysis is usually conducted as an exploratory or preliminary step in the study of convergence. The seminal paper of Baumol (1986) introduced the concept of \(\beta \)-convergence: the relation between the growth rate of a particular economic variable during a period of time and the initial level of that variable in a set of territories (countries or regions). The literature on the econometrics of estimating \(\beta \)-convergence has been growing in the last few years. Islam (2003) and Magrini (2004) present surveys of this literature, classifying the studies of \(\beta \)-convergence into different types of approaches: (1) panel data, (2) time series, (3) club convergence and (4) spatial dependence.
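
As a minimal illustration of the \(\sigma \)-convergence check described above, the following sketch compares the dispersion of income per capita at two dates. The five income figures are invented for illustration, and taking the standard deviation of the logged values (rather than of the levels) is an assumption of this sketch, not a choice stated in the text.

```python
import math
import statistics

# Hypothetical income per capita for five territories (illustrative values only)
income_start = [8.0, 12.0, 20.0, 30.0, 45.0]
income_end = [14.0, 18.0, 26.0, 36.0, 50.0]


def log_dispersion(values):
    """Population standard deviation of the logged values."""
    return statistics.pstdev([math.log(v) for v in values])


sigma_start = log_dispersion(income_start)
sigma_end = log_dispersion(income_end)

# A falling dispersion is read as an indication of sigma-convergence
sigma_convergence = sigma_end < sigma_start
print(round(sigma_start, 3), round(sigma_end, 3), sigma_convergence)
```

In this toy case the poorer territories gain proportionally more, so the dispersion of the logs falls between the two dates.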

In general, the literature on \(\beta \)-convergence does not pay much attention to the level of spatial disaggregation at which the data are observable. This could be partially explained by a practical reason: the lack of information at a detailed spatial scale for many economies. For instance, in the case of European countries, information on value added or income is normally available only at the scale of NUTS-2 or NUTS-3 administrative regions, which are constructed as the aggregation of a range of smaller areas with different characteristics. Additionally, a more theoretical reason explains why the role of the spatial scale of the data was neglected: the neoclassical framework on which the initial models were built did not consider this issue important. By contrast, alternative approaches for modelling economic growth explicitly consider the role played by variables at a local level, where agglomeration economies and other centripetal forces have an effect at a sub-regional level. As a consequence, if regions are characterized by a high degree of sub-regional heterogeneity, the aggregation of spatially disaggregated data into larger regions could cast some doubts on the empirical evidence found in convergence studies based on spatially aggregated data.

This paper studies the consequences of the aggregation of spatial data in convergence analysis. More specifically, we aim at quantifying the effect of neglecting small-scale processes when \(\beta \)-convergence equations are estimated from spatially aggregated data. Our research builds on previous studies that have already called attention to the effect of aggregation, such as the work by Theil (1954) for the general case of linear regression models or, more recently, by Arbia and Petrarca (2011) for the case of spatially dependent data.

The paper is structured as follows. Section 2 reviews the literature on economic growth, particularly the role that the spatial scale plays in this literature. Section 3 derives the properties of ordinary least squares (OLS) estimators of \(\beta \)-convergence equations from cross-sectional data under spatial aggregation, and Sect. 4 quantifies the effect of aggregation by means of numerical simulations applied to different structures of spatial data. Finally, Sect. 5 closes the paper with some remarks, potential lines of future research and possible econometric solutions for the empirical analysis.

2 The relevance of the spatial scale in the regional convergence analysis

The neoclassical economic growth theories are mainly based on the role of decreasing returns to the different production factors. Solow’s model (1956) concludes that, in the long run, all territories will converge to the same level of GDP per capita (GDPpc), provided that the relevant factors of an economy have been taken into account and that there are decreasing returns to the production factors. This model also predicts a constant growth of GDPpc in the steady state, equal to the rate of technological growth. The \(\beta \)-convergence approach translates this theoretical framework into a simple empirical equation: the relation between the growth rate of a particular economic variable during a period and the initial level of that variable in a set of territories. When no other regressor is considered, the analysis refers to absolute \(\beta \)-convergence, whereas if other explanatory variables are included, it refers to conditional \(\beta \)-convergence. This estimation framework allows testing whether poorer areas grow faster than richer ones. If the parameter \(\beta \) is estimated with a negative sign, this indicates that lower levels of income per capita are associated with higher growth rates, leading to a process of convergence in the long run. A positive estimate of \(\beta \) would reveal a process of divergence. Under this approach, the spatial scale is not relevant because the logic of decreasing returns operates in the same way at all spatial scales.

Alternative approaches in the literature on economic growth, however, have stressed the relevance of the spatial scale in the empirical analysis by taking into account the presence of local processes of endogenous growth and agglomeration economies. For instance, Myrdal (1957), Boudeville and Montefiore (1968) or Dixon and Thirlwall (1975), among others, highlight the importance of cumulative processes in rich territories due to the movements of capital and workers, which make them even more attractive, while the opposite happens in poor places. Romer (1990) developed the model of endogenous technological change, which was later extended by Mankiw et al. (1992) considering human capital as a relevant factor. These models argue that endogenous growth takes place mainly at the local level. Additionally, a vast body of literature also pays attention to the role of economies of scale and economies of agglomeration, starting from the contributions by Marshall (1920). The gains derived from large-scale production and from positive externalities associated with size lead to the concentration of economic activity in central locations from where the largest possible market is accessible. More recent literature also stresses the positive link between productivity and the presence of a diversified, highly qualified and versatile labour pool in large cities (Duranton and Puga 2000; Glaeser 1994, 1998; or Quigley 1998). In line with all this literature, it is possible to identify central and peripheral areas within regions, which is one of the essential concepts in the New Economic Geography (NEG) models (Krugman 1991; Krugman and Venables 1995 and Fujita and Krugman 1995). According to this literature: (1) there are incentives to largely concentrate the production in central areas; and (2) the intra-regional and inter-country processes of specialization and trade reinforce the processes of concentration and, in consequence, of divergence. Under a NEG approach, cities and metropolises (local areas) are at the centre of the analysis, drawing attention to cities as the missing link between the macroeconomic theories of growth and the spatial empirical analysis.

In sum, approaches such as classical regional economics or the NEG models have a more local-based perspective than their neoclassical counterparts, which pay no attention to spatial aggregation.

Besides the theoretical discussion on the appropriate spatial scale, the scale at which \(\beta \)-convergence equations are estimated is equally interesting from an empirical perspective if the conclusions of the analysis could partially depend on it. This issue is generally known as the modifiable areal unit problem (MAUP), and its consequences have been explored since the 1930s (see Gehlke and Biehl 1934) and later explained in detail by Openshaw and Taylor (1979) or Openshaw (1983). Basically, one of the effects of the MAUP, the so-called scale effect, refers to the aggregation bias that emerges if data are aggregated into larger units (for example, cities into regions).Footnote 1

The study of the effect of data aggregation on the estimation of empirical models has a relatively long tradition in economics. For instance, Theil (1954) already studied these effects for the case of linear regression models. More recently, Arbia and Petrarca (2011) explored the effects of aggregation in a scenario of spatial dependence in the data.Footnote 2 However, the estimation of \(\beta \)-convergence equations has some particularities that make the issue of data aggregation especially interesting. First, the literature generally focuses on the effects of data aggregation in linear models, while the usual functional forms applied for \(\beta \)-convergence equations are nonlinear. Moreover, the study of processes of convergence normally distinguishes between convergence between countries and convergence between regions. While the definition of “country” is unambiguous, the definition of “region” is less clear, as argued previously, and several alternatives for grouping basic spatial units could be used to construct aggregated regions. This makes the empirical study of regional convergence at least partially conditional on the particular configuration of regions on which the study is based.

To illustrate the role played by the spatial scale for regional convergence, a simple absolute \(\beta \)-convergence equation has been estimated for the case of the European Union. Annual data on GDP per capita in Purchasing Power Standard (PPS) have been taken at the scale of NUTS-3 regions, and absolute \(\beta \)-convergence equations were estimated for different definitions of regions, namely NUTS-1, NUTS-2 and NUTS-3. The dependent variable is the growth rate of GDP per capita between 2000 and 2011, which is regressed on the (log of) GDP per capita in 2000. A summary of the results is reported in Table 1.

Table 1 \({{\beta }}\)-Convergence equation for the EU-28 (2000–2011) for different spatial scales

Results in Table 1 show that the estimate of the \(\beta \) parameter at the scale of large NUTS-1 regions is remarkably higher than when the equations are estimated at the scale of NUTS-2 or NUTS-3 regions. Looking not only at the estimates of the \(\beta \) parameter but also at the speed of convergence, the latter ranges between 2.38 % (NUTS-3) and 3.38 % (NUTS-1), again depending on the specific definition of region applied. As a consequence, the time required to reduce the regional differences in the EU to one half of their initial levels, the so-called half-life, would be around 20 years if the regions are defined as NUTS-1 units but approximately 30 years if they are defined as NUTS-3 units.
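
The half-life figures quoted above can be checked with a short calculation. The sketch below assumes the usual exponential-decay definition, half-life \(=\ln (2)/\lambda \), where \(\lambda \) is the annual speed of convergence; the text does not spell out the formula, so this definition and the function name `half_life` are assumptions of the example.

```python
import math


def half_life(speed):
    """Years needed to halve the initial gap at a given annual speed (as a fraction)."""
    return math.log(2.0) / speed


# Speeds of convergence reported in the text for the two extreme spatial scales
half_life_nuts1 = half_life(0.0338)
half_life_nuts3 = half_life(0.0238)

print(round(half_life_nuts1, 1), round(half_life_nuts3, 1))
```

Under this definition the two speeds give roughly 20.5 and 29.1 years, in line with the approximate values of 20 and 30 years mentioned in the text.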

This type of issue in regional convergence analysis has received some attention in the previous empirical literature. Miller and Genc (2005), for example, estimated \(\beta \)-convergence equations under several possible spatial divisions for the US, aggregating data available at county level, and found only a very minor effect of the scale on their results. More recently, Resende (2011) used data collected at several spatial scales for the case of Brazil and found that the results were heavily conditioned by the specific criterion used to form regions: using data grouped by Brazilian states he estimated a significant and negative \(\beta \) parameter, but the conclusion was the opposite when the \(\beta \)-convergence equations were estimated at a municipal scale. Although these studies are interesting, they are limited to specific cases and particular periods of time, which limits the possibilities of drawing general conclusions from them. The next section studies analytically the estimation of \(\beta \)-convergence equations and the properties of a least squares estimator of these equations from spatially disaggregated and aggregated data.

3 The effect of the aggregation on the OLS estimation of \(\beta \)-convergence equations with cross-sectional data

The literature on the empirics of estimating \(\beta \)-convergence equations started with the cross-sectional analyses of Baumol (1986), Barro (1991), Barro and Sala i Martin (1991) or Mankiw et al. (1992), and later accommodated estimators capable of exploiting panel-data structures, as proposed by Islam (1995) or Lee et al. (1997).Footnote 3

While panel-data estimators are by far the most commonly followed estimation strategy in the analysis of country data, in regional analysis it is not uncommon to base the estimation of \(\beta \)-convergence equations on cross-sectional data due to information availability (see, e.g., Azzoni 2001, for Brazil; Rodríguez-Pose and Sánchez-Reaza 2002, for Mexico; Cuadrado 2001, for Europe; or Raiser 1998, for China). This section studies the properties of a traditional ordinary least squares (OLS) estimator of \(\beta \)-convergence equations based on a cross section of data.

Let us assume an economy that is divided into different spatial units that are created according to several criteria for geographical aggregation. More specifically, suppose that the economy is divided into \(i=1,\ldots , n\) basic spatial units—municipalities or cities—that are aggregated into \(j=1,\ldots ,m(m<n)\) groups—regions. In line with the ideas of New Economic Geography and endogenous growth theories, we assume that the process of income generation takes place at the basic spatial scale of n units. This section studies the effects on the conclusions of convergence analysis depending on the scale at which the outcome data are observable: directly observable at the original scale (n local places) or at the aggregated scale (m regions). If the conclusions about the coefficient depend on the level of aggregation, this will be a signal that a potential MAUP is somehow “contaminating” our analysis.

Our starting point will be the formulation developed in Arbia and Petrarca (2011) for the case of cross-sectional data in a linear regression model that are generated at a given spatial level, but then observed at a more aggregate scale. The following equation describes the model to be estimated at a disaggregated scale with n spatial units:

$$\begin{aligned} {\varvec{y}}={\varvec{X}}{\varvec{\beta }} +{\varvec{u}} \end{aligned}$$
(1)

where \({\varvec{y}}\) is the \((n\times 1)\) vector with the dependent variable, \({\varvec{X}}\) is a \((n\times K)\) matrix with the K regressors considered in the equation, \({\varvec{\beta }}\) is the \((K\times 1)\) vector with the parameters to be estimated and \({\varvec{u}}\) is the usual \((n\times 1)\) disturbance vector, which is assumed to be normally distributed around zero with a constant variance \(\sigma ^{2}\). If the data of the n units are aggregated at a higher geographical scale with m locations, the new data set is defined by:

$$\begin{aligned} {\varvec{y}}^{*}= & {} {\varvec{Gy}} \end{aligned}$$
(2)
$$\begin{aligned} {\varvec{X}}^{*}= & {} {\varvec{GX}} \end{aligned}$$
(3)
$$\begin{aligned} {\varvec{u}}^{*}= & {} {\varvec{Gu}} \end{aligned}$$
(4)

where \({\varvec{G}}\) is the aggregation matrix with dimensions \((m\times n)\) and the following structure:

$$\begin{aligned} {\varvec{G}}=\left[ \begin{array}{ccccccccc} g_{11} &{}\quad \ldots &{}\quad g_{1r_1} &{}\quad &{}\quad \ldots &{}\quad &{}\quad 0&{}\quad \ldots &{}\quad 0\\ 0&{}\quad \ldots &{}\quad 0 &{}\quad {g_{21} }&{}\quad \ldots &{}\quad g_{2r_2 } &{}\quad 0&{}\quad \ldots &{}\quad 0\\ &{}\quad \cdots &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \cdots &{} \\ 0&{}\quad \ldots &{}\quad 0 &{}\quad &{}\quad \cdots &{}\quad &{}\quad g_{m1}&{}\quad \ldots &{}\quad g_{mr_m }\\ \end{array} \right] \end{aligned}$$
(5)

where each row corresponds to one of the m aggregated locations into which the original data are grouped, and the number of original spatial units aggregated in each location, \((r_1, r_2,\ldots , r_m)\), may differ across rows.

In this context, the aggregated equation is defined as:

$$\begin{aligned} {\varvec{y}}^{*}= {\varvec{X}}^{*}{\varvec{\beta }}^{*}+{\varvec{u}}^{*} \end{aligned}$$
(6)

where

$$\begin{aligned}&\displaystyle E\left( {{\varvec{u}}^{*}} \right) = E\left( {\varvec{Gu}}\right) =\mathbf{0} \end{aligned}$$
(7)
$$\begin{aligned}&\displaystyle Var\left( {{\varvec{u}}^{*}} \right) = E\left( {\varvec{u}}^{*}{\varvec{u}}^{*\prime } \right) =E\left( {\varvec{Guu}}^{\prime }{\varvec{G}}^{\prime } \right) ={\varvec{GG}}^{\prime }\sigma ^{2} \end{aligned}$$
(8)
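
To make the aggregation scheme in (2) to (8) concrete, the following sketch builds a small block-structured matrix \({\varvec{G}}\), aggregates a data vector and computes \({\varvec{GG}}^{\prime }\sigma ^{2}\). The group sizes, the data and the helper names (`aggregation_matrix`, `matvec`, `gram`) are hypothetical choices for illustration, and unit weights are assumed for simplicity.

```python
def aggregation_matrix(group_sizes, weights=None):
    """Build an (m x n) block matrix G; row j is non-zero only for the units of group j."""
    n = sum(group_sizes)
    rows, start = [], 0
    for size in group_sizes:
        row = [0.0] * n
        for k in range(start, start + size):
            row[k] = 1.0 if weights is None else weights[k]
        rows.append(row)
        start += size
    return rows


def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def gram(A):
    """Product A A' (m x m)."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]


# n = 5 basic units grouped into m = 2 locations of sizes r_1 = 2 and r_2 = 3
G = aggregation_matrix([2, 3])
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_star = matvec(G, y)                      # y* = G y

sigma2 = 0.5                               # illustrative disturbance variance
GGt = gram(G)
var_u_star = [[sigma2 * g for g in row] for row in GGt]   # Var(u*) = G G' sigma^2

print(y_star, GGt, var_u_star)
```

With unit weights, \({\varvec{GG}}^{\prime }\) is diagonal with the group sizes on the diagonal, so the aggregated disturbances are heteroscedastic whenever the groups differ in size.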

In their paper, Arbia and Petrarca (2011) deal with the specific case of perfect aggregation where the elements of this aggregation matrix \({\varvec{G}}\) are unitary values:

$$\begin{aligned} {\varvec{G}}=\left[ \begin{array}{ccccccccc} 1&{}\quad \ldots &{}\quad 1 &{}\quad &{}\quad \ldots &{}\quad &{}\quad 0&{}\quad \ldots &{}\quad 0\\ 0&{}\quad \ldots &{}\quad 0 &{}\quad 1&{}\quad \ldots &{}\quad 1 &{}\quad 0&{}\quad \ldots &{}\quad 0\\ &{}\quad \cdots &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \cdots &{} \\ 0&{}\quad \ldots &{}\quad 0 &{}\quad &{}\quad \ldots &{}\quad &{}\quad 1&{}\quad \ldots &{}\quad 1\\ \end{array} \right] \end{aligned}$$
(9)

The number of ones in every row is always equal to \(n/m\), the inverse of the ratio \(r=m/n\). They show that the OLS estimator \(\widehat{{\varvec{\beta }}}^{*}\) of \({\varvec{\beta }}^{*}\) in equation (6) is an unbiased estimator of \({\varvec{\beta }}\) in the original equation (1), while the variance of the OLS estimator in the aggregated equation (6) is larger than the variance of the OLS estimator in (1):

$$\begin{aligned} E\left( \widehat{\varvec{\beta }}^{*} \right)= & {} E\left( {\left[ {{\varvec{X}}^{\prime *}{\varvec{X}}^{*}} \right] ^{-\mathbf{1}}{\varvec{X}}^{\prime *}{\varvec{y}}^{*}} \right) =E\left( {\left[ {{\varvec{X}}^{\prime }{\varvec{G}}^{\prime }{\varvec{GX}}} \right] ^{-\mathbf{1}}{\varvec{X}}^{\prime }{\varvec{G}}^{\prime }{\varvec{Gy}}} \right) \nonumber \\= & {} E\left( {\left[ {{\varvec{X}}^{\prime }{\varvec{X}}} \right] ^{-\mathbf{1}}{\varvec{X}}^{\prime }{\varvec{y}}} \right) =E\left( \widehat{{\varvec{\beta }}} \right) ={\varvec{\beta }} \end{aligned}$$
(10)
$$\begin{aligned} Var\left( \widehat{\varvec{\beta }}^{*}\right)= & {} {\varvec{GG}}^{\prime }\sigma ^{2}\left[ {{\varvec{X}}^{\prime }{\varvec{G}}^{\prime }{\varvec{GX}}} \right] ^{-1}>Var\left( \widehat{{\varvec{\beta }}} \right) \end{aligned}$$
(11)

In other words, the scale effect does not represent a problem of bias, although it generates an efficiency problem.
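
A small Monte Carlo exercise can illustrate this result for the linear case. The sketch below is not the simulation design of Arbia and Petrarca (2011): the regressor values, the parameter values, the number of trials and the grouping of twelve units into four blocks of three are all hypothetical choices, with unit weights in \({\varvec{G}}\).

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def aggregate(values, group_size):
    """Sum consecutive blocks of group_size values (unit-weight rows of G)."""
    return [sum(values[k:k + group_size]) for k in range(0, len(values), group_size)]


rng = random.Random(0)
true_slope, intercept, sigma = 0.5, 1.0, 1.0
# Fixed regressor; each consecutive block of three units forms one aggregated location
x = [1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12]
x_star = aggregate(x, 3)

slopes_dis, slopes_agg = [], []
for _ in range(2000):
    y = [intercept + true_slope * xi + rng.gauss(0.0, sigma) for xi in x]
    slopes_dis.append(ols_slope(x, y))                       # n = 12 units
    slopes_agg.append(ols_slope(x_star, aggregate(y, 3)))    # m = 4 locations

mean_dis, mean_agg = statistics.mean(slopes_dis), statistics.mean(slopes_agg)
sd_dis, sd_agg = statistics.pstdev(slopes_dis), statistics.pstdev(slopes_agg)
print(mean_dis, mean_agg, sd_dis, sd_agg)
```

Both averages should be close to the true slope, while the spread of the aggregated estimates is clearly larger, which is the efficiency loss without bias described above.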

The \(\beta \)-convergence equations, however, are not characterized by this same response to the scale effect, due to some particularities in the aggregation scheme of the dependent and the independent variables and the logarithmic form of the equation. In order to justify this claim, let us state the typical absolute \(\beta \)-convergence equations estimated for a cross section of n spatial units as:Footnote 4

$$\begin{aligned} \ln \left( {\frac{y_{it} }{y_{i0} }} \right)= & {} \alpha +\beta \, \ln \left( {y_{i0}} \right) +u_{it} ;\text { or }\nonumber \\ \ln \left( {y_{it} } \right)= & {} \alpha +\left( {1+\beta } \right) \, \ln \left( {y_{i0} } \right) +u_{it} \end{aligned}$$
(12)

where the growth in an economic indicator y (GDP, income, value added, etc., per capita) between periods 0 and t in location i is regressed on the log of the initial value per capita \((y_{i0})\) in the same location. One problem with using aggregated data to estimate equations like (12) is that the nonlinearities in the dependent and explanatory variables break the equivalence between the aggregated and disaggregated equations. More specifically, the aggregate version of the absolute \(\beta \)-convergence equation will be:

$$\begin{aligned} \ln \left( {\frac{y_{jt}^{*} }{y_{j0}^{*} }} \right)= & {} \alpha ^{*}+\beta ^{*}\, \ln \left( {y_{j0}^{*} } \right) +u_{jt}^{*} ;\hbox { or }\nonumber \\ \ln \left( {y_{jt}^{*} } \right)= & {} \alpha ^{*}+(1+\beta ^{*})\, \ln \left( {y_{j0}^{*} } \right) +u_{jt}^{*} \end{aligned}$$
(13)

where:

$$\begin{aligned} {\varvec{y}}_{\mathbf{0}}^{*}= & {} {\varvec{Gy}}_{\mathbf{0}} \end{aligned}$$
(14)
$$\begin{aligned} {\varvec{u}}^{*}= & {} {\varvec{Gu}} \end{aligned}$$
(15)

Matrix \({\varvec{G}}\) represents the aggregation scheme for the initial values per capita, with a typical element \(g_{ij} \) indicating the population share of the basic spatial unit i on the aggregated location j measured in the initial period. In contrast to the type of equations aggregated as in (6), the dependent variable of the equation estimated with aggregate data is given by:

$$\begin{aligned} \ln \left( {{\varvec{y}}_{{\varvec{t}}}^*} \right) = \ln \left( {{\varvec{Hy}}_{{\varvec{t}}} } \right) \ne {\varvec{Gy}}_{{\varvec{t}}} \end{aligned}$$
(16)

where \({\varvec{H}}\) is the aggregation matrix where a typical element \(h_{ij} \) indicates the population share of the spatial unit i on region j measured in the final period. In general, this matrix is not necessarily equal to \({\varvec{G}}\), given that the elements of \({\varvec{H}}\) are the population shares in the final period and the populations in each period can be different.

Note that Eq. (10) states that the expected value of the OLS estimator with aggregated data is given by \(E\left( {\left[ {{\varvec{X}}^{\prime }{\varvec{G}}^{\prime }{\varvec{GX}}} \right] ^{-\mathbf{1}}{\varvec{X}}^{\prime }{\varvec{G}}^{\prime }{\varvec{Gy}}} \right) \) and it is equal to \(\beta \), while a different aggregation scheme would modify the form of the estimator being its expected value \(E\left( {\left[ {{\varvec{X}}^{\prime }{\varvec{G}}^{\prime }{\varvec{GX}}} \right] ^{-\mathbf{1}}{\varvec{X}}^{\prime }{\varvec{G}}^{\prime }{\varvec{Hy}}} \right) \). When the elements of matrix \({\varvec{H}}\) are larger than the elements of \({\varvec{G}}\), the estimator will present a positive bias, while a negative bias will be the consequence of the elements of \({\varvec{H}}\) being smaller than those in \({\varvec{G}}\). The comparison between these two matrices can be made in terms of the Euclidean norms of their row vectors, comparing \(\sqrt{{\varvec{h}}^{\prime }_{\varvec{j}} {\varvec{h}}_{\varvec{j}}}\) with \(\sqrt{{\varvec{g}}^{\prime }_{\varvec{j}} {\varvec{g}}_{\varvec{j}}}\). These norms would account for the concentration of population shares on each region j—they can be interpreted as a Herfindahl index for the distribution of population in region j. If population in the final period is more unequally distributed than in the initial period and, in general, \(\sqrt{{\varvec{h}}^{\prime }_{\varvec{j}} {\varvec{h}}_{\varvec{j}} }\ge \sqrt{{\varvec{g}}^{\prime }_{\varvec{j}} {\varvec{g}}_{\varvec{j}}}\) this would lead to a positive bias in the estimation of \(\beta \). The opposite situation will happen when the population in the final period is more evenly distributed within regions than in the initial period.
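
The comparison of row norms described above can be sketched for a single region j. The populations below are invented, and the helper names are hypothetical. Note that the squared Euclidean norm of a vector of shares equals the Herfindahl index, so the norm itself is its square root.

```python
import math


def shares(populations):
    """Population shares of the basic units within one region."""
    total = sum(populations)
    return [p / total for p in populations]


def euclidean_norm(vector):
    """Square root of the sum of squared elements."""
    return math.sqrt(sum(v * v for v in vector))


# Hypothetical populations of three basic units in region j
pop_initial = [50.0, 30.0, 20.0]   # gives the row g_j of G
pop_final = [70.0, 20.0, 10.0]     # gives the row h_j of H

g_j, h_j = shares(pop_initial), shares(pop_final)
norm_g, norm_h = euclidean_norm(g_j), euclidean_norm(h_j)
herfindahl_g, herfindahl_h = norm_g ** 2, norm_h ** 2

print(round(norm_g, 4), round(norm_h, 4))
```

In this example the population is more concentrated in the final period, so the norm of \({\varvec{h}}_{\varvec{j}}\) exceeds that of \({\varvec{g}}_{\varvec{j}}\), which is the case the text associates with a positive bias.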

Even if the aggregation criterion reflected in \({\varvec{H}}\) was the same as the aggregation scheme present in matrix \({\varvec{G}}\), an additional problem derived from the nonlinear nature of the \(\beta \)-convergence equation will be present, affecting the properties of the OLS estimation from aggregated data. Assuming a case where \({\varvec{G}}={\varvec{H}}\), note that \(\ln \left( {{\varvec{y}}_{\varvec{t}}^*} \right) = \ln \left( {{\varvec{Gy}}_{\varvec{t}}} \right) \ne {\varvec{G}} \ln \left( {{\varvec{y}}_{\varvec{t}}} \right) \). The same problem affects the matrix of explanatory variables \({\varvec{X}}^{*}\) (which in the case of absolute \(\beta \)-convergence equations corresponds to the log of the initial levels \({\varvec{y}}_{\mathbf{0}}^*\)), given that \(\ln \left( {{\varvec{y}}_{\mathbf{0}}^*} \right) =\ln \left( {{\varvec{Gy}}_{\mathbf{0}} } \right) \ne {\varvec{G}} \ln \left( {{\varvec{y}}_{\mathbf{0}}} \right) \).Footnote 5 Specifically, because the logarithm is concave and the rows of \({\varvec{G}}\) and \({\varvec{H}}\) contain population shares that sum to one, Jensen’s inequality implies that \(\ln \left( {{\varvec{y}}_{\varvec{t}}^*} \right) \ge {\varvec{H}} \ln \left( {{\varvec{y}}_{\varvec{t}} } \right) \) and \(\ln \left( {{\varvec{y}}_{\mathbf{0}}^*} \right) \ge {\varvec{G}} \ln \left( {{\varvec{y}}_{\mathbf{0}}} \right) \). These inequalities imply that Eqs. (10) and (11) do not hold, affecting the expected value and the variance of the OLS estimator of an aggregate equation such as (6). The dependent variable \({\varvec{y}}_{\varvec{t}}^*\) in the case of \(\beta \)-convergence equations with aggregated data is \(\ln \left( {{\varvec{Hy}}_{\varvec{t}}} \right) \), and the matrix of regressors \({\varvec{X}}^{*}\) is given by \(\ln \left( {\varvec{GX}}\right) \). The expected value and the variance of the OLS estimator for this aggregated equation are, respectively:
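
The gap between the log of an aggregate and the aggregate of the logs can be checked numerically. For a concave function such as the logarithm and weights that are non-negative and sum to one, Jensen's inequality gives \(\ln (\sum _i g_i y_i)\ge \sum _i g_i \ln (y_i)\). The shares and income values in this sketch are invented for illustration.

```python
import math

# Hypothetical population shares of three basic units in one region (they sum to one)
g = [0.5, 0.3, 0.2]
# Hypothetical initial values per capita of the same units
y0 = [10.0, 20.0, 40.0]

# Log of the aggregated value: ln(sum_i g_i * y_i)
log_of_aggregate = math.log(sum(gi * yi for gi, yi in zip(g, y0)))

# Aggregate of the logged values: sum_i g_i * ln(y_i)
aggregate_of_logs = sum(gi * math.log(yi) for gi, yi in zip(g, y0))

jensen_gap = log_of_aggregate - aggregate_of_logs
print(round(log_of_aggregate, 4), round(aggregate_of_logs, 4), round(jensen_gap, 4))
```

The gap is zero only when all units in the region share the same value, so it grows with sub-regional heterogeneity.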

$$\begin{aligned} E\left( \widehat{\varvec{\beta }}^{*} \right)= & {} E\left( {\left[ {{\varvec{X}}^{\prime *}{\varvec{X}}^{*}} \right] ^{-1}{\varvec{X}}^{\prime *}{\varvec{y}}_{{\varvec{t}}}^*} \right) =E\left( {\left[ {\ln \left( {{\varvec{GX}}} \right) ^{\prime }\ln \left( {{\varvec{GX}}} \right) } \right] ^{-1}\ln \left( {{\varvec{GX}}} \right) ^{\prime }\ln \left( {{\varvec{Hy}}_{{\varvec{t}}} } \right) } \right) \nonumber \\\ne & {} E\left( {\left[ {{\varvec{X}}^{\prime }{\varvec{G}}^{\prime }{\varvec{GX}}} \right] ^{-1}{\varvec{X}}^{\prime }{\varvec{G}}^{\prime }{\varvec{Gy}}_{{\varvec{t}}} } \right) \ne {\varvec{\beta }} \end{aligned}$$
(17)
$$\begin{aligned} Var\left( \widehat{\varvec{\beta }}^{*} \right)= & {} Var\left( {\left[ {{\varvec{X}}^{\prime *}{\varvec{X}}^{*}} \right] ^{-\mathbf{1}}{\varvec{X}}^{\prime *}{\varvec{y}}_{{\varvec{t}}}^*}\right) \nonumber \\= & {} Var\left( {\left[ {\ln \left( {{\varvec{GX}}} \right) ^{\prime }\ln \left( {{\varvec{GX}}} \right) } \right] ^{-\mathbf{1}}\ln \left( {{\varvec{GX}}} \right) ^{\prime }\ln \left( {{\varvec{Hy}}_{{\varvec{t}}} } \right) } \right) \ge \sigma ^{2}\left[ {{\varvec{X}}^{\prime }{\varvec{G}}^{\prime }{\varvec{GX}}} \right] ^{-1}\nonumber \\\ge & {} Var\left( \widehat{{\varvec{\beta }}} \right) \end{aligned}$$
(18)

The result in (18) is equivalent to (11), indicating the augmenting effect of the aggregation on the variance of the estimator. However, Eq. (17) shows how a problem of bias emerges now as well, in contrast to the result in (10).Footnote 6 The scale effect in the estimation of the \(\beta \)-convergence equations leads, in summary, to estimates that can be biased and with higher variance than in the original disaggregated equations. The next section of the paper explores by means of a numerical simulation the empirical implications of this problem.

4 Convergence with spatially disaggregated and aggregated data: some numerical experiments

Once the effect of the aggregation level on the OLS estimator has been studied, it is important to quantify its consequences when applied to the empirical analysis of \(\beta \)-convergence. A numerical experiment is conducted in this section with this purpose in mind. Our experiment assumes that the data are generated at the level of \(i=1,\ldots , n\) basic spatial units by the following equation that determines the growth in the relevant variable as:

$$\begin{aligned} \dot{y}_{i}= & {} \alpha + \beta \ln (y_{{i0}} ) + u_{i} ;{\text { or }}\nonumber \\ \ln (y_{i} )= & {} \alpha + ( {1 + \beta } )\, \ln ( {y_{{i0}} } ) + u_{i} \end{aligned}$$
(19)

where \(y_{i0}\) is the value of the relevant variable at the starting period and \(y_i \) is its final value. In the experiment, we have arbitrarily set the value of the intercept \(\alpha \) at 1.1, and \({\varvec{u}}\sim N\left( {0,0.5} \right) \). The idea is to compare the OLS estimates of parameter \(\beta \), which is the key element in the analysis of \(\beta \)-convergence, in two situations that differ in the spatial scale at which the data are observed:

  1. the reference situation or benchmark, that assumes that we have data observable at the same scale at which they are generated, i.e., for the \(i=1,\ldots , n\) basic spatial units

  2. a case where the data are only observable at an aggregated spatial scale into \(j=1,\ldots , m\) units. In this second scenario, we assume that we only have data on \(y_j^*\) and \(y_{j0}^*\) and from them we estimate the parameters of the equation:

    $$\begin{aligned} \ln (y_j^{*} )=\alpha +\left( {1+\beta } \right) \ln (y_{j0}^{*} )+u_j^{*} \end{aligned}$$
    (20)

In order to have a numerical experiment as realistic as possible, we have taken as reference for simulating possible structures of aggregation of spatial data the real sub-regional and regional divisions in three different countries, namely the USA, Germany and Chile. These three countries are taken as examples of developed economies, each of them presenting a particular configuration in its regional divisions. For example, the basic spatial units for the case of Chile are the comunas \((n=100)\) that form the total of \(m=13\) administrative regions. Similarly, in Germany the basic spatial units are the Kreise \((n=393)\), which are aggregated into \(m=14\) Länder. Finally, the USA is divided into \(n=3088\) counties that are aggregated into the \(m=50\) states.

In order to provide sensible values for the growth equation depicted in (20), we have taken real data for the initial value of the variable of interest. In the cases of the USA and Chile, we have defined \(y_{i0} \) as the income per capita, while in the case of Germany, due to data availability at the desired spatial scale, it is defined as GDP per capita. The time span over which we estimate (20) is also different for each country and conditioned by data limitations: for the USA there is a series of income at county level from 1969 to 2011 published by the Bureau of Economic Analysis; in Chile we have data on income for the comunas between 1996 and 2006 available in the Casen Survey of the Ministry of Planning; and for Germany the Destatis Statistisches Bundesamt provides estimates of GDP for the Kreise between 2000 and 2011. Additionally, data on population are required to obtain indicators of income or GDP per capita, and we have opted for using real population data as well. Note that population data for the initial and the final periods are required in order to aggregate spatially the per capita values of the variable of interest. The values per capita in the initial and final periods (the explanatory and dependent variable in (20), respectively) are aggregated by weighting the values in levels at the scale of basic spatial units by their population shares in these periods. Summary statistics of all these variables are given in Table 2.

Table 2 Summary statistics for the different scenarios

All these pieces of information have been used for the data generating process described in Eq. (19). The key element of this equation is the parameter \(\beta \), whose value determines whether we have a process of convergence (if negative) or divergence (if positive). In the experiment, different scenarios have been considered depending on the value of parameter \(\beta \), with values ranging between \(-\)0.3 and 0.3. For each value of the parameter and for each country, we have simulated 5000 trials and we have estimated the parameter by applying OLS in scenarios (1) and (2).
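As a rough illustration of this kind of experiment, the sketch below runs a small Monte Carlo loop on synthetic data. Several things are assumptions and not the authors' implementation: the data generating process is taken to be the standard cross-sectional form \(\log (y_{iT}/y_{i0}) = \alpha + \beta \log y_{i0} + \varepsilon _i\) with normal errors, the values of `alpha` and `sigma` are arbitrary, and the initial values, populations and region labels are randomly generated rather than real county, Kreis or comuna data.

```python
import numpy as np

def ols_slope(x, g):
    """Slope from an OLS regression of g on a constant and x."""
    xc = x - x.mean()
    return float(xc @ (g - g.mean()) / (xc @ xc))

def pop_weighted(y, pop, region, m):
    """Population-weighted per capita value of each of the m regions."""
    return (np.bincount(region, weights=y * pop, minlength=m)
            / np.bincount(region, weights=pop, minlength=m))

def simulate(beta, y0, pop0, popT, region, trials=500,
             alpha=0.0, sigma=0.1, seed=0):
    """Return OLS slopes from local (n units) and aggregated (m regions) data.

    ASSUMED data generating process (not taken from the paper):
        log(y_iT / y_i0) = alpha + beta * log(y_i0) + e_i,  e_i ~ N(0, sigma^2)
    """
    rng = np.random.default_rng(seed)
    m = int(region.max()) + 1
    x = np.log(y0)
    X = np.log(pop_weighted(y0, pop0, region, m))  # aggregated initial level
    local = np.empty(trials)
    aggregated = np.empty(trials)
    for t in range(trials):
        g = alpha + beta * x + rng.normal(0.0, sigma, size=x.size)
        local[t] = ols_slope(x, g)                 # regression on n units
        yT = y0 * np.exp(g)                        # final-period values
        YT = pop_weighted(yT, popT, region, m)     # aggregated final level
        aggregated[t] = ols_slope(X, np.log(YT) - X)
    return local, aggregated

# Synthetic inputs (hypothetical, for illustration only).
rng = np.random.default_rng(42)
n, m = 300, 15
region = np.arange(n) % m                          # every region is non-empty
y0 = np.exp(rng.normal(10.0, 0.5, size=n))
pop0 = np.exp(rng.normal(9.0, 1.0, size=n))
popT = pop0 * np.exp(rng.normal(0.0, 0.2, size=n))

local, aggregated = simulate(-0.1, y0, pop0, popT, region, trials=200)
print("local:      mean=%.4f sd=%.4f" % (local.mean(), local.std()))
print("aggregated: mean=%.4f sd=%.4f" % (aggregated.mean(), aggregated.std()))
```

Repeating the call over a grid of `beta` values and replacing the synthetic arrays with observed initial values and populations would give a table of the same shape as the one reported in the paper, although the numbers produced by this toy setup are not comparable to it.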

Table 3 summarizes the results obtained in each case, reporting the true value of the parameter together with the average OLS estimate, the empirical variability of the estimates (standard deviation) and a measure of deviation between the true values and the OLS estimates (mean squared error).
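The three summary measures can be computed from a vector of simulated estimates as in the following sketch. The helper name `summarize` and the demo numbers are hypothetical, and the choice of the population form of the standard deviation (`ddof=0`) is an assumption, since the text does not say which form was used. With that choice the mean squared error splits exactly into squared bias plus variance.

```python
import numpy as np

def summarize(estimates, beta_true):
    """Mean, standard deviation, bias and MSE of simulated estimates."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    sd = est.std()                           # ddof=0: spread across trials
    bias = mean - beta_true                  # average deviation from the truth
    mse = np.mean((est - beta_true) ** 2)    # mean squared error
    return {"mean": mean, "sd": sd, "bias": bias, "mse": mse}

# Made-up estimates around a true value of -0.05.
demo = summarize([-0.12, -0.08, -0.15, -0.05], beta_true=-0.05)

# Identity: MSE = bias^2 + variance (exact when sd is computed with ddof=0).
print(np.isclose(demo["mse"], demo["bias"] ** 2 + demo["sd"] ** 2))
```

This decomposition is what allows a loss of efficiency to be attributed partly to inflated variance and partly to bias.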

Table 3 Results of an OLS estimation with different spatial configurations (5000 trials)

Additionally, Fig. 1 visually illustrates the results of the simulations reported in Table 3. In these plots the x axis represents the true value of the \(\beta \) parameter considered in Eq. (20). For each value of \(\beta \), the mean estimate obtained in the 5000 trials using disaggregated or aggregated data is represented on the y axis. If the estimates were unbiased, the means would lie on a \(45^{\circ }\) line through the origin, where the estimates equal the true values. The 95 % confidence bands, based on the normal distribution of the estimates, are also plotted.
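One plausible way to build such a band from the simulated estimates is sketched below. It assumes that the band is the mean of the estimates plus or minus the normal quantile times their empirical standard deviation; the text does not spell out the exact construction, so this is an interpretation, and the function name `normal_band` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def normal_band(estimates, level=0.95):
    """Normal-approximation band around the mean of simulated estimates."""
    est = np.asarray(estimates, dtype=float)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    centre = est.mean()
    half = z * est.std(ddof=1)                   # sample standard deviation
    return centre - half, centre + half

# Made-up estimates, only to show the call.
lower, upper = normal_band([0.0, 1.0, 2.0])
print(lower, upper)
```

Computing the band once for the estimates based on the n basic units and once for those based on the m aggregated regions, at each true value of \(\beta \), gives the two pairs of curves described for the figure.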

As expected, the empirical variability of the OLS estimates is substantially lower when they are obtained from the n basic data points than from the m aggregated spatial units, since the sample size is smaller when working with aggregate data. Not surprisingly, these differences are more remarkable for the USA than for the other two countries in the experiment, given that the ratio \(r=m/n\) is much smaller for the USA (about 0.016, against 0.036 for Germany and 0.13 for Chile). The loss of efficiency derived from estimating Eq. (20) with m aggregated regions instead of estimating (19) with n spatial units is not entirely produced, however, by this inflation of the variance. A substantial part can be attributed to the bias, as stated in Eq. (17). The estimates based on aggregated data present a negative bias, underestimating the true value of the \(\beta \) parameter. The negative bias is partially a consequence of populations being generally more uniformly distributed within each type of aggregated region (US states, German Länder or Chilean regiones) in the final period (2011 for the USA and Germany and 2006 for Chile) than in the initial one (1969 for the USA, 2000 for Germany and 1996 for Chile).

Fig. 1 OLS estimator with local and aggregate data, 1000 replications. (a) Germany, 393 Kreise, 14 Länder (2000–2011). (b) USA, 3088 counties, 50 states (1969–2011). (c) Chile, 100 comunas, 13 regions (1996–2006). Solid line: mean of estimates, n spatial units. Dotted line: 95 % confidence band, n spatial units. Square: mean of estimates, m aggregated spatial units. Dashed line: 95 % confidence band, m aggregated spatial units

Although the simulations have been made for countries with different characteristics and spatial configurations, the results seem to be robust. As expected, the mean of the OLS estimates with n data points is practically equal to the true coefficient. In contrast, for each value of the true parameter, the regression based on aggregate regions tends on average to yield estimates smaller than the true coefficient. The mean bias over the eight values set for parameter \(\beta \) in the simulation is \(-\)0.051 for Germany, \(-\)0.061 for the USA and \(-\)0.082 for Chile. In summary, the aggregation of the spatial units in our experiments negatively biases the conclusions drawn from the OLS estimation of \(\beta \)-convergence equations.

5 Conclusions

The study of convergence is one of the most prolific research lines in the literature on regional economics. Conclusions derived from convergence analysis provide the support to maintain, reduce or expand expensive policies, such as the Regional Cohesion Policy in the EU. Different improvements have been proposed in the estimation techniques applied to quantify empirically the speed of convergence or divergence among territories. However, most of this empirical literature pays no attention to the relevance of the geographical scale at which convergence is measured, even though one of the most important differences between neoclassical theoretical equations and alternative approaches is the spatial scale at which economic growth is studied.

The objective of this paper is to evaluate the empirical consequences of changes in the spatial scale for the most commonly used approach to convergence analysis: the estimation of \(\beta \)-convergence equations. The characteristics of an OLS estimator applied to cross-sectional data, which is a relatively common situation in empirical studies, are derived. We found that geographical aggregation produces estimators with higher variance, part of which stems from the reduction in sample size, and that these estimators are also biased relative to the OLS estimator based on the original disaggregated spatial units.

To provide quantitative evidence about the effect of the spatial scale in \(\beta \)-convergence analysis, we conduct numerical simulations with different spatial configurations of real countries: Germany, the USA and Chile. The simulation results confirm the loss of efficiency caused by the aggregation of spatial data, part of which is due to the smaller sample size, but the negative bias generated is also significant. One important implication of our results is that the estimation of \(\beta \)-convergence equations based on aggregated data should take into account that an important part of the information, related to intra-regional dynamics, could be missing.

Our results, however, do not necessarily indicate that estimates of \(\beta \)-convergence equations with aggregated data are misleading or not useful: in some situations the availability of spatially disaggregated data is very limited and some type of aggregation is required. In addition, in economies where aggregate regions are characterized by low levels of intraregional heterogeneity, the aggregation of spatial data may not be a real issue for convergence analysis. They do suggest, though, that the spatial scale at which data are taken for estimating \(\beta \)-convergence equations should be carefully defined, since this choice can partially affect the conclusions of the analysis.

Our analysis opens the discussion about the suitability of econometric techniques that are not affected by the MAUP. In this regard, multilevel estimation (see, among others, Goldstein 1986, 2011; Hox et al. 2010), which allows for using data at different scales, is particularly interesting if we want to identify different spatial scales of convergence while avoiding the potential bias derived from data aggregation.

Finally, there are relevant issues not studied here that would require further research. For instance, this paper studied the MAUP effect on a simple OLS estimator with cross-sectional data. The proliferation of time series with regional data has, however, made it possible to apply estimators based on a panel data structure. The consequences of spatial aggregation for estimators applied to dynamic panels are an important issue that should be included in the research agenda on the estimation of \(\beta \)-convergence equations.