1 Introduction

Whether income levels of poorer regions are converging to those of richer ones is a question of paramount importance for human welfare (Islam 2003). In Europe, interest in this question has increased in recent years with the entry of new countries to the European Union. This paper looks at evidence for regional income convergence in Europe. By Europe we mean the European Union of 27 member states. Convergence is a fuzzy notion that can mean different things (see Quah 1999). In this paper we understand it in the sense of poorer regions catching up with richer ones. The observation units are NUTS-2 regions, which the European Commission has chosen as targets for the convergence process and defined as the geographical level at which the persistence or disappearance of inequalities should be measured.

Measuring regional income and the extent to which convergence across regions (what the European Commission calls regional cohesion) exists is a difficult issue. But per capita gross regional product (GRP) measured in purchasing power units seems like a natural definition if one is interested in an important determinant of average welfare. By focusing upon per capita GRP we are interested in the economic performance of regions and the claims that people living in those regions have over that wealth. Cohesion depends on the degree of equality in the distribution of per capita income and the extent to which there are processes of catch-up, in which less wealthy regions enjoy faster rates of income growth than more developed ones. The data were calculated on the basis of the 1995 European System of Accounts (ESA 95) and refer to the time period from 1995 to 2003, the latest year for which income data are available. This short time span makes apparent the need for a model before we can speak of the underlying dynamic regularities in these data.

Empirical research on regional income convergence has proceeded in many directions, using different definitions and methodologies. Most research has, however, concentrated on the cross-section regression approach to investigate β-convergence, where β is the generic notation for the coefficient on the initial income variable in the growth-initial level regressions. A negative β is interpreted as evidence of convergence in terms of both income level and growth rate. But Quah (1993b), Friedman (1992) and others have emphasised that a negative β can just be an example of the more general phenomenon of reversion to the mean, and that growth analysts who interpret it as convergence fall into Galton’s fallacy.

This study follows the tradition of the non-parametric approach that views the catching-up question as a question about the evolution of the cross-section distribution of income, and diverts attention from the individual or representative region to the entire distribution as the object of interest (see, in particular, Quah 1993a, 1996a, b, 1997a, b, c). The distribution that is relevant here is the distribution of income across regions, not that within a given region. The purpose of the analysis is to find the law of motion that describes the transition dynamics and the implied long-run behaviour of regional income. In the spirit of Quah (1996a, b) we assume that each region’s income follows a first-order Markov process with time-invariant transition probabilities. That is, a region’s (uncertain) income tomorrow depends only on its income today.

Most of the applications of this approach have worked in a discrete state space set up (see Quah 1996a, b; Fingleton 1997, 1999; Paap and van Dijk 1998; López-Bazo et al. 1999; Magrini 1999; Rey 2001; LeGallo 2004 to mention some). This set up has several advantages, but the process of discretising the state space of a continuous variable is necessarily arbitrary. Experience from the study of income distributions shows that this arbitrariness can matter, in the sense that statements on the inferred dynamic behaviour of the distribution in question and the apparent long-run implications of that behaviour are sensitive to the choice of the discretisation (Jones 1997; Reichlin 1999). Indeed, it is well known that the Markov property itself can be distorted by inappropriate discretisation (Bulli 2001).

This paper avoids arbitrary discretisation of the income space and its possible effects on the results by using the stochastic kernel, the continuous equivalent of the transition probability matrix. The remainder of the paper is divided into two parts. The first, Sect. 2, provides an empirical framework that extends current research by incorporating two novel techniques: kernel estimation and graphical devices for the representation of the stochastic kernel (see Hyndman et al. 1996), and Getis’ spatial filtering technique, which makes it possible to account for the effects of spatial autocorrelation. The second part of the paper, Sect. 3, applies this framework to analyse income distribution dynamics and cross-region convergence in Europe, looking at evolving distributions of purchasing power standardised per capita (relative) gross regional product across 257 NUTS-2 regions in 27 EU-countries from 1995 to 2003. Some concluding remarks are given in the final section.

2 The empirical framework

A distribution perspective on the study of income dynamics and cross-region convergence directs attention to the evolution of the entire cross-region income distribution, emphasising shape and intra-distribution dynamics, and long-run (ergodic) behaviour. Section 2.1 introduces a continuous version of the standard model of explicit distribution dynamics, pioneered by Quah (1993a), and argues that the stochastic kernel can be described as a conditional density function. In Sect. 2.2 we present a product kernel estimator for estimating this transition function, and briefly describe a three-step-strategy for solving the bandwidth selection problem, which appears to be crucial for estimation. Section 2.3 combines Getis’ spatial filtering view with stochastic kernel estimation to account for the issue of spatial autocorrelation that may misguide inferences and interpretations if not properly handled.

2.1 A continuous version of the model of distribution dynamics

Let \(F_t\) denote the cross-section distribution of regional incomes at time t. Then the simplest scheme for modelling the intra-distribution dynamics of \(\left\{{F_t \vert t\,\,\hbox{integer}} \right\}\) is a first-order Markov process with time-invariant transition probabilities. The distribution evolves according to

$$F_{t+1} =M\,F_t$$
(1)

where M maps the distribution from time t to time t + 1, and tracks where points in \(F_t\) end up in \(F_{t+1}\). Iteration of Eq. (1) gives a prediction for future distributions of the ex-post probabilities

$$F_{t+\tau} =M^\tau\,F_t \quad \hbox{for}\ \tau > 0 \,\,(\tau =1,\,2,\,\ldots).$$
(2)

In this framework, there are two goals: the estimation of M, which gives information on the persistence of regional income inequalities, and the computation of the ergodic (steady-state) distribution, which provides information on the limiting behaviour of the regional income distribution. Convergence then might manifest in \(\left\{{F_{t+\tau}} \right\}\) tending towards a point mass. A bimodal limit distribution can be interpreted as a tendency towards stratification into two different “convergence clubs”.

In the discrete version of the model, the operator M can be interpreted as the transition probability matrix of the Markov process. The operator is approximated by partitioning the set of possible income values into a finite number of intervals. These intervals then constitute the states of a (time-homogeneous) finite Markov process, and all the relevant properties of M are described by a Markov chain transition matrix whose (i, j) entry is the probability that a region in state i transits to state j in income space, in one time step. The inferred dynamic behaviour and the long-run implications of that behaviour are conditional on the discretisation chosen.
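As a rough illustration of the discrete set up, the following sketch estimates a transition matrix from paired observations and iterates it to a limiting distribution. The function names, the class boundary and the toy data are hypothetical and not taken from the paper. The code stores transition probabilities row-wise, so that the distribution is a row vector updated as pi P, which is the transpose of the convention in Eq. (1).

```python
import numpy as np

def transition_matrix(y, z, edges):
    """Estimate a row-stochastic transition matrix from paired incomes.

    y, z  : incomes of the same regions at t and t + 1
    edges : interior class boundaries (a hypothetical discretisation)
    """
    k = len(edges) + 1
    s0 = np.digitize(y, edges)   # income class at time t
    s1 = np.digitize(z, edges)   # income class at time t + 1
    counts = np.zeros((k, k))
    np.add.at(counts, (s0, s1), 1.0)          # tally transitions i -> j
    rows = counts.sum(axis=1, keepdims=True)
    # Classes never observed at time t keep an all-zero row.
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

def ergodic_distribution(P, tol=1e-12, max_iter=10_000):
    """Iterate pi <- pi P from a uniform start; assumes a unique limit exists."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        new = pi @ P
        if np.abs(new - pi).max() < tol:
            return new
        pi = new
    return pi

# Toy data: four regions, two income classes split at 1.5.
P = transition_matrix([1.0, 1.0, 2.0, 2.0], [1.0, 2.0, 2.0, 1.0], [1.5])
pi = ergodic_distribution(P)
```

Moving the boundary in `edges` changes both `P` and `pi`, which is the sensitivity to discretisation that motivates the continuous approach.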

Regional income, however, is by nature a continuous variable, and hence discretisation may induce a bias. Instead of a state being a fixed interval, we let the states be all possible intervals, including infinitesimally small ones. In this case one may think of the number of distinct cells as tending to infinity and then to the continuum. The corresponding transition probability matrix then tends to a matrix with a continuum of rows and columns. In this case, the operator M in Eq. (1) may be viewed as a stochastic kernel or transition function that describes the (time-invariant) evolution of the cross-section distribution in time. Convergence can then be studied by visualising and interpreting the shape of the income distribution at time t + τ over the range of incomes observed at time t.

For notational convenience let Y and Z denote the variable (per capita) regional income at times t and t + τ (τ > 0), respectively. The sample may then be denoted by \(\left\{{(Y_1 ,\,Z_1),\ldots,\,(Y_n ,\,Z_n)} \right\},\) and the observations by \(\left\{{(y_1 ,\,z_1),\,\ldots,\,(y_n ,\,z_n)} \right\}\) where n indicates the number of regions. We assume that the cross-region distribution of Y can be described by the density function \(f_t (y).\) This distribution will evolve over time so that the density prevailing at t + τ is \(f_{t+\tau} (z).\) If we continue to maintain the assumptions of time-invariance and first-order of the transition process, the relationship between the cross-region income distributions, at time t and τ-periods later, can be written as

$$f_{t+\tau} (z)=\int\limits_0^\infty {g_\tau \,(z\vert y)\,\,f_t (y)} \,\hbox{d}y$$
(3)

where \(g_\tau \,(z\vert y)\) is the conditional density function giving the τ-period ahead density of income z, conditional on income y at time t. Evidently, the (first-order) stochastic kernel can be described by a conditional density function assuming that the marginal and conditional income distributions have density functions.

So long as \(g_\tau \,(z\vert y)\) exists, the long-run (ergodic) density, \(f_\infty (z),\) implied by the estimated \(g_\tau \,(z\vert y)\) function can then be found as the solution to

$$f_\infty (z)=\int\limits_0^\infty {g_\tau \,(z\vert y)\,\,f_\infty (y)} \,\hbox{d}y.$$
(4)

In this paper we will use the solution procedure outlined in Johnson (2004) to estimate this long-run distribution of regional income per capita.
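One simple way to approximate a solution of Eq. (4) is to evaluate the conditional density on an equally spaced grid and iterate the integral as a Riemann sum until the density stops changing. The sketch below does this under stated assumptions and is not claimed to reproduce the procedure of Johnson (2004). The function name, the grid and the illustrative Gaussian kernel with parameters `rho`, `mu` and `s` are all hypothetical.

```python
import numpy as np

def ergodic_density(g, grid, tol=1e-10, max_iter=5000):
    """Approximate the fixed point of f(z) = integral of g(z|y) f(y) dy.

    g[j, i] holds g(z_j | y_i) on a common, equally spaced grid.
    Each column is renormalised so that it integrates to one over the grid.
    """
    dz = grid[1] - grid[0]
    g = g / (g.sum(axis=0, keepdims=True) * dz)
    f = np.full(grid.size, 1.0 / (grid.size * dz))   # flat starting density
    for _ in range(max_iter):
        f_new = (g @ f) * dz                          # Riemann-sum form of Eq. (4)
        done = np.abs(f_new - f).max() < tol
        f = f_new
        if done:
            break
    return f

# Illustrative kernel (an assumption): z | y is normal with mean
# rho * y + (1 - rho) * mu and standard deviation s.
grid = np.linspace(0.0, 2.0, 201)
rho, mu, s = 0.8, 1.0, 0.1
cond_mean = rho * grid[None, :] + (1 - rho) * mu       # one conditional mean per y_i
g = np.exp(-0.5 * ((grid[:, None] - cond_mean) / s) ** 2)
f_inf = ergodic_density(g, grid)
```

For this illustrative kernel the limiting density is known to be normal with mean `mu` and standard deviation `s / sqrt(1 - rho**2)`, which gives a quick check of the iteration.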

2.2 Kernel estimation of the conditional density function

If \(f_{t,\,t+\tau} (y,\,z)\) denotes the joint density of (Y, Z) and \(f_t (y)\) the marginal density of Y, then the conditional density of \(Z\vert (Y=y)\) is given by

$$g_\tau \,(z\vert y)=\frac{f_{t,\,t+\tau} (y,\,z)}{f_t (y)}.$$
(5)

Probably the most obvious estimator of this conditional density function (see Hyndman et al. 1996) is

$$\hat{g}_\tau \,(z\vert y)=\frac{\hat{f}_{t,\,t+\tau} (y,\,z)}{\hat {f}_t (y)}$$
(6)

where

$$\hat{f}_{t,\,t+\tau} (y,\,z)=\frac{1}{n\,h_y\,h_z }\sum\limits_{i=1}^n {K\left({\tfrac{1}{h_y}\left\| {y-Y_i} \right\|_y} \right)\,\,K\left({\tfrac{1}{h_z}\left\| {z-Z_i} \right\|_z} \right)}$$
(7)

is the kernel estimator of \(f_{t,\,t+\tau} (y,\,z),\) and

$$\hat{f}_t (y)=\frac{1}{n\,h_y}\sum\limits_{i=1}^n {K\left( {\tfrac{1}{h_y}\left\| {y-Y_i} \right\|_y} \right)}$$
(8)

the kernel estimator of \(f_t (y)\) (see Hyndman et al. 1996). \(h_y\) and \(h_z\) are bandwidth parameters that control the degree of smoothing applied to the density estimate. \(h_y\) controls the smoothness between conditional densities in the y-direction, and \(h_z\) the smoothness of each conditional density in the z-direction. \(\left\| {\,.\,} \right\|_y\) and \(\left\| {\,.\,} \right\|_z\) are distance metrics on the spaces Y and Z, respectively. In this paper we use the standard Euclidean distances, \(\left\| {\,.\,} \right\|_y =\left| {\,.\,} \right|_y \,\,\hbox{and}\,\,\left\| {\,.\,} \right\|_z =\left| {\,.\,} \right|_z.\)

A multivariate kernel other than the product kernel might be used to define \(\hat{g}_\tau \,(z\vert y).\) But the product kernel is simpler to work with, leads to conditional density estimators with several nice properties and is only slightly less efficient than other multivariate kernels (Wand and Jones 1995). The kernel K(x), where x is variously y or z, is a real, integrable, non-negative, even function on \({{\mathbb{R}}}\) concentrated at the origin so that (Silverman 1986)

$$\int\limits_{{\mathbb{R}}} {K(x)\,\hbox{d}x=1} ,\quad \int\limits_{{\mathbb{R}}} {x\,K(x)\,\hbox{d}x=0 \quad \hbox{and}\quad \sigma _K^2 =\int\limits_{{\mathbb{R}}} {x^2\,K(x)\,\hbox{d}x < \infty}}.$$
(9)

Popular choices for K(x) are defined in terms of univariate and unimodal probability density functions. In this paper we use the Gaussian kernel given by

$$K(x)=\left({\sqrt {2\pi}} \right)^{-1}\,\exp \left({-\frac{1}{2}\,x^2} \right).$$
(10)

Whatever kernel is used, bandwidth parameters chosen to minimise the asymptotic mean square error strike a trade-off between bias and variance. Small bandwidths yield small bias but large variance, while large bandwidths lead to large bias and small variance. The problem of choosing how much to smooth is of crucial importance in conditional density estimation, and the results of the continuous state space approach to distribution dynamics strongly depend on the bandwidth parameters chosen.
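Equations (6) to (8) with the Gaussian kernel of Eq. (10) can be sketched directly. The function names and the synthetic data below are hypothetical. The bandwidth values passed in the example are the ones reported for Fig. 4, used here purely as illustrative inputs on artificial data.

```python
import numpy as np

def gauss(u):
    """Gaussian kernel, Eq. (10)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def cond_density(z_grid, y0, Y, Z, h_y, h_z):
    """Estimate g(z | y0) at each point of z_grid via Eqs. (6)-(8)."""
    n = Y.size
    w = gauss((y0 - Y) / h_y)                            # y-direction weights, shape (n,)
    kz = gauss((z_grid[:, None] - Z[None, :]) / h_z)     # z-direction kernels, shape (m, n)
    joint = (kz @ w) / (n * h_y * h_z)                   # Eq. (7)
    marginal = w.sum() / (n * h_y)                       # Eq. (8)
    return joint / marginal                              # Eq. (6)

# Synthetic paired incomes (an assumption, for illustration only).
rng = np.random.default_rng(0)
Y = rng.uniform(0.3, 2.0, 300)
Z = Y + rng.normal(0.0, 0.05, 300)
z_grid = np.linspace(-1.0, 4.0, 2001)
g_hat = cond_density(z_grid, 1.0, Y, Z, h_y=0.036, h_z=0.023)
```

Because the estimator is a weighted average of Gaussian densities in z, each estimated conditional density integrates to one whenever the grid covers the data. Re-running the example with a much larger or smaller `h_z` shows the bias-variance trade-off described above.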

In this study we follow Bashtannyk and Hyndman (2001) and solve this bandwidth selection problem by a three-step-strategy that combines three different procedures: a Silverman (1986) inspired normal reference rule that has proven useful in univariate kernel density estimation, a bootstrap bandwidth selection approach following Hall et al. (1999) for estimating conditional distribution functions, and a regression-based bandwidth selector (see Fan et al. 1996). Step 1 involves finding an initial value for the smoothing parameter \(h_z\) using the rule with normal marginal density. Given this value of \(h_z\), Step 2 makes use of the regression-based bandwidth selector to find a value for \(h_y\). In Step 3 the bootstrap method is used to revise the estimate of \(h_z\) by minimising the bootstrap estimator of a weighted mean square error function. Step 2 and Step 3 may be repeated one or more times.
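To give a feel for what a normal reference rule does, the sketch below implements the familiar univariate version, h = 1.06 σ n^(-1/5). This is an assumption made for illustration: it is the textbook univariate rule, not the conditional density reference rule of Bashtannyk and Hyndman (2001), and it covers none of the regression-based or bootstrap steps. The function name and the sample values are hypothetical.

```python
import numpy as np

def normal_reference_bandwidth(x):
    """Univariate normal reference rule: h = 1.06 * sd * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    return 1.06 * x.std(ddof=1) * x.size ** (-0.2)

# Hypothetical relative incomes.
h0 = normal_reference_bandwidth([1.0, 2.0, 3.0, 4.0, 5.0])
```

The rule shrinks the bandwidth slowly as the number of observations grows and widens it for more dispersed data, which is why it serves as a starting value rather than a final choice.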

2.3 Spatial autocorrelation and stochastic kernel estimation

Stochastic kernel estimation rests on the implicit assumption that each region represents an independent observation providing unique information that can be used to estimate the transition dynamics of income. In essence, the cross-section observations at one point in time are viewed as a random sample from a univariate distribution, or in other words, X (where X stands variously for Y and Z) is assumed to be univariate and random. If the \(X_i\) (i = 1,..., n) are independent, we say that there is no spatial structure. Independence implies the absence of spatial autocorrelation. Spatial autocorrelation reflects a lack of independence between regions. This lack of independence may arise from a variety of measurement problems, such as boundary mismatches between the NUTS-2 regions. But interactions or externalities across regions such as, for example, knowledge spillovers, trade, commuting and migration flows are also likely to be a major source of the violation of the assumption (see Abreu et al. 2004 for a survey of the existing evidence).

A violation of the independence assumption may result in misguided inferences and interpretations (Rey and Janikas 2005). This problem has been largely neglected in distribution analysis so far. One way of dealing with the problem involves filtering the variable X in order to separate spatial effects from the variable’s total effects. While ensuring spatial independence, this allows us to use the stochastic kernel to properly estimate the underlying regional income distribution and to analyse its evolution over time. The motivation for a spatial filter is simply that a spatially autocorrelated variable can be transformed into an independent variable by removing the spatial dependence embedded in it. The original variable, X, is hence partitioned into two parts, a filtered non-spatial variable, say \(\tilde{X},\) and a residual spatial variable \(L_X\). The transformation procedure depends on identifying an appropriate distance δ within which nearby regions are spatially dependent, and examining each individual observation for its contribution to the spatial dependence embedded in the original variable (Getis and Griffith 2002).

There have been several suggestions for identifying δ, but in this paper we adopt the Getis filtering approach (see Getis 1990, 1995), which is based on the local spatial autocorrelation statistic \(G_i\) (Getis and Ord 1992) evaluated at a series of increasing distances until no further spatial autocorrelation is evident. As distance increases from an observation (region i), the \(G_i\)-value also increases if spatial autocorrelation is present. Once the \(G_i\)-value begins to decrease, the limit on spatial autocorrelation is assumed to have been reached, and the associated critical δ identified. The filtered observation \({\tilde{x}}_i\) is given as

$${\tilde{x}}_i =\frac{x_i \left[ {\tfrac{1}{n-1}\,\,W_i} \right]}{G_i \,(\delta)}$$
(11)

where \(x_i\) is the original income observation for region i, n is the number of observations and

$$W_i =\sum\limits_{j=1}^n {w_{ij} (\delta)} \quad \hbox{for}\,\,j\ne i.$$
(12)

with \(w_{ij} (\delta) = 1\) if the distance from region i to region j (i ≠ j), say \(d_{ij}\), is smaller than the critical distance band δ, and \(w_{ij} (\delta) = 0\) otherwise. \(G_i (\delta)\) is the spatial autocorrelation statistic of Getis and Ord (1992) defined as

$$G_i \,(\delta)=\frac{\sum_{j=1}^n {w_{ij} \,(\delta)\,} x_j }{\sum_{j=1}^n {x_j}}\quad \hbox{for} \ i\ne j.$$
(13)

The numerator of (13) is the sum of all \(x_j\) within δ of i but not including \(x_i\). The denominator is the sum of all \(x_j\) not including \(x_i\).

Equation (11) compares the observed value of \(G_i (\delta)\) with its expected value, \((n-1)^{-1}\,W_i.\) \(E[G_i (\delta)]\) represents the realisation, \(\tilde{X},\) of the variable X at region i when no autocorrelation occurs. If there is no autocorrelation at i to distance δ, then the observed and expected values, \(x_i\;\hbox{and}\;{\tilde{x}}_i,\) will be the same. When \(G_i (\delta)\) is high relative to its expectation, the difference \(x_i -{\tilde{x}}_i\) will be positive, indicating spatial autocorrelation among high observations of X. When \(G_i (\delta)\) is low relative to its expectation, the difference will be negative, indicating spatial autocorrelation among low observations of X. Thus, the difference between \(x_i \ \hbox{and}\ {\tilde{x}}_i\) represents the spatial component of the variable X at i. Taken together for all i, \(L_X\) represents a spatial variable associated, but not correlated, with the variable X. Thus, \(L_X +\tilde{X}=X\) (Getis and Griffith 2002).
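Equations (11) to (13) translate into a few lines of array code. The sketch below is a simplified illustration under two assumptions that are not taken from the paper: distances are Euclidean distances between hypothetical point coordinates, and a single critical distance δ is supplied by the user rather than identified region by region from the profile of \(G_i\). The function name and the toy data are likewise hypothetical.

```python
import numpy as np

def getis_filter(x, coords, delta):
    """Return the filtered variable and the spatial component, Eqs. (11)-(13).

    Regions with no neighbour within delta give 0/0 and come out as NaN.
    """
    x = np.asarray(x, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = x.size
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (d < delta).astype(float)
    np.fill_diagonal(w, 0.0)                 # exclude j = i
    W = w.sum(axis=1)                        # Eq. (12)
    G = (w @ x) / (x.sum() - x)              # Eq. (13), both sums exclude x_i
    x_filtered = x * (W / (n - 1)) / G       # Eq. (11)
    return x_filtered, x - x_filtered        # filtered part and spatial part L_X

# Four hypothetical regions on a line: two rich neighbours, two poor neighbours.
coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
x_f, l_x = getis_filter([10.0, 10.0, 1.0, 1.0], coords, delta=1.5)
```

In this toy example the first region sits in a cluster of high values and receives a positive spatial component, while the last region sits in a cluster of low values and receives a negative one, matching the interpretation given above. With identical incomes in all regions the filter returns the original values unchanged.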

Combining this spatial filtering approach with stochastic kernel estimation as described in the previous section yields the long-run (ergodic) density, \(f_\infty (\tilde{z}),\) implied by the estimated \(g_\tau (\tilde{z}\vert \tilde{y})\) function:

$$f_\infty (\tilde{z})=\int\limits_0^\infty {g_\tau (\tilde{z}\vert \tilde{y})} \,\,f_\infty (\tilde{y})\,\,\hbox{d}\tilde{y},$$
(14)

where \(\tilde{y}\) and \(\tilde{z}\) denote the spatially filtered observations of Y and Z, respectively. To assess the role played by space on income growth and convergence dynamics across the regions, we consider a specific stochastic kernel that maps the distribution Y to the spatially filtered distribution \(\tilde{Y}\vert Y\) so that

$$g(\tilde{y}\vert y)=\frac{f(y,\,\tilde{y})}{f(y)}$$
(15)

where the stochastic kernel does not describe transitions over time, but transitions from unfiltered to spatially filtered regional income distributions, and, thus, quantifies the effects of spatial dependence. If spatial effects caused by spatial interaction among regions and measurement problems did not matter, then the stochastic kernel would be the identity map.

3 Revealing empirics

This section applies the above framework to study regional income dynamics and convergence in Europe. In Sect. 3.1 we describe the data and the observation units. Kernel smoothed densities and Tukey boxplots are used in Sect. 3.2 to study the shape dynamics of the distribution. Cross-profile plots, continuous stochastic kernels and implied ergodic distributions are used in Sect. 3.3 to investigate intra-distribution dynamics and long-run tendencies in the data. Section 3.4 proceeds to the spatial filtering view of the data to gain insights not affected by the spatial autocorrelation problem.

3.1 Data and observation units

We use per capita GRP over the period 1995–2003 expressed in ECUs, the former European currency unit, replaced by the Euro in 1999. The GRP figures were calculated on the basis of the 1995 European System of Integrated Economic Accounts (ESA 95) and extracted from the Eurostat Regio database. We use GRP per capita in national PPS (purchasing power standards) as defined by Eurostat. These units are comparable to ECUs/Euros, with a slight correction.

The time period is relatively short due to a lack of reliable figures for the regions in the new member states of the EU. This comes partly from the substantial change in measurement methods of national accounts in Central and East Europe (CEE) between 1991 and 1995. But more important, even if estimates of the change in the volume of output did exist, these would be impossible to interpret meaningfully because of the fundamental change of production from a centrally planned to a market system. As a consequence, figures for GRP are difficult to compare until the mid-1990s (Fischer and Stirböck 2006).

The observation units of the analysis are NUTS-2 regions. Although varying considerably in size, NUTS-2 regions are those regions that are adopted by the European Commission for the evaluation of regional growth and convergence processes. NUTS is an acronym of the French for “the nomenclature of territorial units for statistics”, which is a hierarchical system of regions used by the statistical office of the European Community for the production of regional statistics. Our sample includes 257 NUTS-2 regions covering the 27 member states of the EU (see the Appendix for a description of the regions):

  • the EU-15 member states: Austria (nine regions), Belgium (eleven regions), Denmark (one region), Finland (five regions), France (22 regions), Germany (40 regions), Greece (thirteen regions), Ireland (two regions), Italy (20 regions), Luxembourg (one region), Netherlands (twelve regions), Portugal (five regions), Spain (16 regions), Sweden (eight regions), UK (37 regions);

  • the 12 new member states: Bulgaria (six regions), Cyprus (one region), Czech Republic (eight regions), Estonia (one region), Hungary (seven regions), Latvia (one region), Lithuania (one region), Malta (one region), Poland (16 regions), Romania (eight regions), Slovakia (four regions), Slovenia (one region).

3.2 Shape dynamics of the distribution

When studying income distribution dynamics across regions in Europe, one can consider incomes per region in absolute terms. Alternatively, one can study regional incomes normalised by the European average. Although there are merits to using the absolute income distribution, it is more natural to take relative incomes when considering changes in income distributions over time. Relative incomes allow us to abstract from overall changes in income levels. A natural approach to assessing how the shape of the distribution changes over the observation period 1995–2003 is to estimate the cross-sectional distributions by using non-parametric kernel smoothing procedures, which avoid the strong restrictions imposed by parametric estimation. In this framework, if there is a bimodal density at a given point in time, indicating the presence of two groups in the population of regions, convergence implies a tendency of the distribution to move progressively towards unimodality.

Figure 1 plots the distribution of (per capita) GRP relative to the average of all 257 regions, which we call the Europe relative (per capita) income or simply the relative income. The plots are densities and can be interpreted as the continuous equivalent of a histogram, where the number of intervals has been let tend to infinity and then to the continuum. All densities were calculated non-parametrically using a Gaussian kernel with bandwidths chosen as suggested in Silverman (1986), restricting the range to the positive interval. The solid line shows the distribution in 2003, and the dashed line that in 1995. To read this type of figure, note that 1.0 on the horizontal axis indicates the European average of regional income, 2.0 indicates twice the average, and so on. The height of the curve over any point gives the probability density at that relative income. Accordingly, the area under the curve between, say, 0.0 and 1.0 gives the probability that a region will have a relative income between 0.0 and 1.0.
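The reading of such a figure can be made concrete with a small sketch: incomes are divided by the cross-region average, a Gaussian kernel density is evaluated on a grid, and the area under the curve over an interval approximates the corresponding probability. The income values, the bandwidth of 0.15 and the function name are hypothetical, and the sketch applies no correction for restricting the domain to non-negative values.

```python
import numpy as np

def kde_gauss(grid, x, h):
    """Gaussian kernel density estimate of x evaluated at each grid point."""
    u = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (x.size * h * np.sqrt(2.0 * np.pi))

# Hypothetical per capita GRP figures for eight regions.
grp = np.array([8.0, 9.5, 12.0, 20.0, 21.0, 22.5, 24.0, 40.0])
rel = grp / grp.mean()                      # 1.0 marks the cross-region average

grid = np.linspace(-1.0, 4.0, 5001)
dens = kde_gauss(grid, rel, h=0.15)
dx = grid[1] - grid[0]

total_area = dens.sum() * dx                # close to one
below_avg = dens[(grid >= 0.0) & (grid <= 1.0)].sum() * dx   # mass between 0.0 and 1.0
```

Comparing `below_avg` across two years is the numerical counterpart of comparing areas under the 1995 and 2003 curves.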

Fig. 1

Distributions of relative (per capita) regional income, 1995 versus 2003. Notes: The plots are densities calculated non-parametrically using a Gaussian kernel with bandwidth chosen as suggested in Silverman (1986), restricting the domain to be non-negative. The solid line shows the density for 2003 and the dashed line that for 1995

The figure shows a distribution with twin peaks (to use the appellation coined by Quah 1993a) in 1995, one corresponding to low income regions and the other to middle-income ranges, and a long tail with two smaller bumps at the upper end of the distribution. Technically, the income distribution is said to show a bimodal shape. The main mode is located at about 110% of the European average, and the second mode at about 38%. The estimated densities reveal several changes over the observation period. The kernel estimated median value decreases by 2%, while the level of dispersion exhibits a small reduction. The kernel estimated standard deviation decreases by 3.3% from 0.393 in 1995 to 0.380 in 2003.

Perhaps most remarkable is the change in the shape of the distributions. By 2003, the peaks have moved closer together, and the richer peak has risen moderately at the expense of the poorer. We see this by noting that the area under the 2003 curve between 0.5 and 1.1 is greater than the corresponding area under the 1995 curve, while the area to the left of 0.5 is smaller. The smaller peak seems to progressively collapse over time. This finding may suggest an improvement in economic conditions of the poorest regions and reflect a trend, in some sense, of catching-up.

Figure 2 gives a sequence of Tukey boxplots for the 257 NUTS-2 regions. Recall that the units of income are PPS units scaled to the EU-27 average. Time appears on the horizontal axis, while the vertical axis maps relative per capita income values. To understand these pictures, recall the construction of a Tukey boxplot. Each boxplot includes a box bounded by \(Q_1\) and \(Q_3\) denoting sample quartiles. Thus, the box contains the middle 50% of the distribution. The thick line in the box locates the median. The upwards and downwards distances from the median to the top and bottom of the box provide information on the shape of the distribution. If these distances differ, then the distribution is asymmetric. Thin dashed vertical lines emanating from the box both upwards and downwards reach upper and lower adjacent values, respectively. The upper adjacent value is the largest value observed that is not greater than the top quartile plus 1.5 times \((Q_3 - Q_1)\). The lower adjacent value is similarly defined, extending downwards from the 25th percentile. Dots indicate upper and lower outside values, that is, observations that lie outside the upper and lower adjacent values, respectively. These denote regions which have performed extraordinarily well or extraordinarily poorly relative to the set of other regions. Of course, upper and lower outside values might not exist. The adjacent values might already be the extreme points in a specific realisation.
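The boxplot construction just described can be written out as a short computation. The function name and the sample values are hypothetical, and quartiles are computed with NumPy's default linear interpolation, which is only one of several conventions and need not match the one behind Fig. 2.

```python
import numpy as np

def tukey_summary(x):
    """Quartiles, adjacent values and outside values of a Tukey boxplot."""
    x = np.sort(np.asarray(x, dtype=float))
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    upper_adjacent = x[x <= q3 + 1.5 * iqr].max()   # largest value within the upper fence
    lower_adjacent = x[x >= q1 - 1.5 * iqr].min()   # smallest value within the lower fence
    outside = x[(x > upper_adjacent) | (x < lower_adjacent)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "lower_adjacent": lower_adjacent,
        "upper_adjacent": upper_adjacent,
        "outside": outside,
    }

# Hypothetical relative incomes with one extreme performer.
summary = tukey_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

In this example the single large value lies beyond the upper adjacent value and is reported as an upper outside value, while no lower outside value exists.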

Fig. 2

Tukey boxplots of relative (per capita) regional income across 257 European regions

There are no extraordinarily poorly performing regions or, more accurately, when regions performed especially badly, they were not alone. On the upside, by contrast, the figure shows several outstanding performers. At the beginning of the sample, five regions showed upper outside values, and by the end of the sample six did. The spreading apart in the regional income distribution has one distinct source, the pulling away of the upper outside values (representing Inner London, Brussels, Luxembourg, Hamburg, Île-de-France and Vienna) from the rest of the regions. The figure, moreover, makes clear that the interquartile range decreases by more than 15%, and that this fall is due to a decrease of \(Q_3\) rather than \(Q_1\).

The matching counterparts in Figs. 1 and 2 use exactly the same data. But they emphasise different empirical regularities. The bimodal shape is striking in Fig. 1, but is far from obvious in Fig. 2. The spreading out of the upper tail of the distribution is apparent in Fig. 2. It appears in form of two smaller bumps in Fig. 1.

3.3 Intra-distribution dynamics and long-run tendencies

Thus far, we have considered only point-in-time snapshots of the income distribution across the regions. This section takes the next step in the analysis, and looks at the intra-distribution dynamics and then at the long-run (ergodic) tendencies. We start with Fig. 3 showing cross-profile dynamics. The vertical axis is the log of relative (per capita) incomes. Each curve in the figure refers to the situation at a given point in time. The lowest curve gives the cross-section of regions at time 1995 in increasing order. This ordering is then maintained throughout the time periods considered. Proceeding upwards, we see curves for 1999 and 2003. The character of the upper plots, thus, depends on 1995 when the ordering is taken.

Fig. 3

Cross-profile dynamics across 257 European regions, retaining the ranking fixed at the initial year, relative (per capita) income, advancing upwards: 1995, 1999 and 2003 (a guide to region codes can be found in the Appendix)

In the plots, increasing jaggedness indicates intra-distribution mobility. In contrast, if each cross-profile remained monotonically increasing over time, then income rankings would be invariant. The most striking feature of Fig. 3 is not this comparative stability through time. It is the change in choppiness through time in the cross-profile plots indicated by local peaks. By 2003, we observe local peaks, for example, at the lower end of the distribution around regions ranked 9th, 19th, 42nd and 66th poorest in 1995, and at the upper end around regions ranked second and fourth richest. These turn out to be Latvia, Estonia, Mazowieckie (Warszawa) and Közép–Magyarország (Budapest), and Inner London and Luxembourg, respectively. By contrast, Moravskoslezsko (57th poorest in 1995) in the Czech Republic, Lüneburg (129th poorest) and Berlin (the 41st richest region) experienced economically significant relative declines by 2003. The cross-profile dynamics are informative. They illustrate when regions overtake one another, fall behind, or pull ahead. But they do not identify underlying dynamic regularities in the data. We thus turn to the stochastic kernel representation of intra-distribution dynamics next.
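The construction of cross-profiles is simple enough to sketch: regions are sorted once by their income in the base year, the same ordering is applied to every later year, and a later profile that is no longer monotonically increasing signals that some regions have changed rank. The function names and the three-region toy data are hypothetical.

```python
import numpy as np

def cross_profiles(income_by_year, base_year):
    """Log relative incomes per year, ordered by the base-year ranking."""
    order = np.argsort(income_by_year[base_year])      # ranking fixed at the base year
    return {
        year: np.log(np.asarray(values, dtype=float)[order])
        for year, values in income_by_year.items()
    }

def ranking_preserved(profile):
    """True if the profile is still non-decreasing, i.e. no rank changes."""
    return bool(np.all(np.diff(profile) >= 0))

# Hypothetical relative incomes of three regions in two years.
data = {1995: [1.0, 0.5, 2.0], 2003: [1.2, 0.4, 1.0]}
profiles = cross_profiles(data, base_year=1995)
unchanged_1995 = ranking_preserved(profiles[1995])
unchanged_2003 = ranking_preserved(profiles[2003])
```

In the toy data the richest region of 1995 is overtaken by 2003, so the 2003 profile shows a local peak followed by a dip, the kind of choppiness discussed above.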

Figure 4 shows the conditional kernel density estimate \(\hat{g}_\tau (z\vert y)\) with fixed bandwidths \((h_y = 0.036,\ h_z = 0.023)\)Footnote 19 that describes the stochastic kernel across the 257 regions, averaging over 1995 through 2003. The stochastic kernel has been estimated for a 5-year transition period, setting τ = 5. The figure displays the estimate using Hyndman’s (1996) visualisation tools. Figure 4a presents the stochastic kernel as a three-dimensional stacked conditional density plot, in which a number of conditional densities are plotted side by side in a perspective plot. For any point y on the period t axis, looking in the direction parallel to the t + 5 axis traces out a conditional probability density. The graph shows how the cross-section income distribution at time t evolves into that at time t + 5. Just as with a transition probability matrix in a discrete set-up, the 45-degree diagonal in the graph indicates persistence properties. When most of the mass is concentrated along this diagonal, the elements in the cross-section distribution remain where they started. As is evident from Fig. 4a, a large portion of the probability mass remains clustered along the main diagonal over the 5-year horizon, and most of the peaks lie along this line, indicating a low degree of mobility and modest change in the regional income distribution.
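A minimal sketch of a fixed-bandwidth conditional density estimator with a Gaussian product kernel is given below. It assumes the usual ratio form, in which the estimate of g(z|y) is a kernel-weighted average of Gaussian bumps in z; whether this matches Eq. (6) term by term cannot be confirmed from this section, and the plots in the paper were produced with Hyndman's R tools rather than with this code. The sample pairs and the bandwidths of 0.1 are illustrative assumptions.

```python
import math


def gauss(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def cond_density(z, y, pairs, h_y, h_z):
    """Estimate g(z | y) from (y_i, z_i) pairs with a Gaussian product kernel."""
    num = 0.0
    den = 0.0
    for y_i, z_i in pairs:
        weight = gauss((y - y_i) / h_y)           # closeness of y_i to the conditioning value
        num += weight * gauss((z - z_i) / h_z) / h_z
        den += weight
    return num / den if den > 0.0 else 0.0


# Hypothetical (income at t, income at t + 5) pairs, relative to the average.
pairs = [(0.3, 0.35), (0.5, 0.52), (0.8, 0.82), (1.0, 1.01),
         (1.1, 1.08), (1.5, 1.45), (2.2, 2.0)]

# Each vertical slice of the stochastic kernel should integrate to one.
dz = 0.01
z_grid = [-1.0 + i * dz for i in range(501)]
total = sum(cond_density(z, 1.0, pairs, 0.1, 0.1) for z in z_grid) * dz
print(round(total, 4))
```

In an application along the lines described above, the pairs would be pooled over all admissible starting years, which is one reading of "averaging over 1995 through 2003".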

Fig. 4
Relative income dynamics across 257 European regions, the estimated \(g_5(z\vert y)\), see Eq. (6): a stacked density plot, and b highest density regions boxplot. Notes: ad b The lighter shaded region in each strip is a 99% HDR, and the darker shaded region a 50% HDR. The mode for each conditional density is shown as a bullet •. Technical notes: The conditional density \(g_\tau(z\vert y)\) is estimated over a 5-year transition horizon τ = 5 between 1995 and 2003. Estimates are based on a Gaussian product kernel density estimator with bandwidth selection \((h_y = 0.036,\ h_z = 0.023)\) based on the three-step strategy suggested by Bashtannyk and Hyndman (2001). The stacked conditional density plot and the high density region boxplot were estimated at 70 and 150 points, respectively. Calculations of the plots were performed using the R package hdrcde, provided by Rob Hyndman

The highest density regions (HDRs) boxplot, given in Fig. 4b, makes this clearer. An HDR is the smallest region of the sample space containing a given probability. Figure 4b shows a plot of the 50 and 99% HDRs,Footnote 20 computed from the density estimates shown in Fig. 4a. Each vertical strip represents the conditional density for one y value. The darker shaded region in each strip is a 50% HDR, and the lighter shaded region is a 99% HDR. The mode for each conditional density is shown as a bullet •. The vertical dashed line at 1.0 marks regions with income equal to the European average at time t, and the horizontal dashed line at 1.0 those with income equal to the average at t + 5. The 45-degree diagonal indicates intra-distribution persistence over the 5-year transition horizon.
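The definition of an HDR can be made concrete with a small grid-based sketch: grid points are added in order of decreasing density until the requested probability is covered. This is only one simple way to approximate an HDR and is not claimed to be the algorithm used for Fig. 4b. The two-component mixture density and the helper names `hdr_mask` and `intervals` are assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def hdr_mask(density, dz, coverage):
    """Mark the grid points that belong to the highest density region."""
    order = sorted(range(len(density)), key=lambda i: density[i], reverse=True)
    inside = [False] * len(density)
    mass = 0.0
    for i in order:
        if mass >= coverage:
            break
        inside[i] = True
        mass += density[i] * dz
    return inside


def intervals(grid, inside):
    """Collapse the boolean mask into a list of (start, end) intervals."""
    out, start, prev = [], None, None
    for z, flag in zip(grid, inside):
        if flag and start is None:
            start = z
        elif not flag and start is not None:
            out.append((start, prev))
            start = None
        prev = z
    if start is not None:
        out.append((start, prev))
    return out


# Hypothetical bimodal conditional density with peaks at 1.0 and 2.5.
dz = 0.01
grid = [i * dz for i in range(351)]
density = [0.5 * normal_pdf(z, 1.0, 0.1) + 0.5 * normal_pdf(z, 2.5, 0.1) for z in grid]

hdr50 = intervals(grid, hdr_mask(density, dz, 0.50))
print(hdr50)
```

For a density with two well-separated peaks, the 50% HDR comes out as two disjoint intervals, which is the pattern read as a two-peaks property in the boxplot.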

To read this type of boxplot, note that strong persistence is evidenced when the main diagonal crosses the 50% HDRs. It means that most of the elements in the distribution remain where they started. There is low persistence and more intra-distribution mobility if that diagonal crosses only the 99% HDRs. Strong (weak) global convergence towards equality would manifest itself in 50% (99%) HDRs crossed by the horizontal line at 1.0. Fifty percent HDRs consisting of two disjoint intervals would indicate a two-peaks property of the distribution.

The plot reveals not only persistence, but also mobility and polarisation features. Regions with an income in the range of 0.8–1.2 times the European average show strong persistence. Some mobility occurs at the extremes of the distribution, more at the upper extreme than at the lower. Some portions of the cross-section in the income range below 0.8 times the average tend to improve their relative position slightly over the 5-year transition horizon, indicating a process of catching-up of the poorest regions with the richer ones. In contrast, portions in the income range of 1.2–1.8 times the average lose relative position, becoming relatively poorer. The boxplot also shows signs of polarisation, the opposite of catching-up. This is indicated by the disjoint intervals of the 50 and 99% HDRs at the upper extreme of the income range. We see that regions starting with an income of 2.0–2.3 times the European average at time t are unlikely to remain there. Most see their European relative income fall and others rise, with the result that this income class appears to vanish. The position of a small, very rich group around 2.3–2.6 times the average either remains unchanged or shifts further away.

The evidence of Fig. 4 is corroborated by the ergodic density function that is obtained by solving Eq. (4). Figure 5 plots the estimated long-run (ergodic) density,Footnote 21 \(\hat{f}_\infty (z),\) implied by the estimated \(g_\tau (z\vert y)\) function for τ = 5, along with the initial income distribution. The solid line shows the point estimate of the ergodic distribution and the dashed line the initial income distribution. Comparing these two distributions we see that the ergodic distribution is wider, both at the top and at the bottom. This reflects a shift in the mass of the distribution away from the lower end to the middle, and from the middle to the upper end. In particular, the peak in the initial distribution between 20 and 50% of the European relative per capita income has shifted upward into the 60–100% range and shows a tendency to disappear.
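The computation of an ergodic density can be sketched numerically under the assumption that Eq. (4) is the usual fixed-point condition, \(f_\infty(z)=\int g_\tau(z\vert y)\,f_\infty(y)\,dy\). Discretising the kernel on a grid turns this into the stationary distribution of a transition matrix, found here by power iteration. The toy kernel (a Gaussian centred on 0.2 + 0.8y) is purely illustrative and is not the estimated kernel of the paper; its stationary law is known in closed form, which makes the sketch easy to check.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def transition_matrix(grid, kernel):
    """Discretise g(z | y) on a grid; each row is renormalised to sum to one."""
    rows = []
    for y in grid:
        row = [kernel(z, y) for z in grid]
        total = sum(row)
        rows.append([value / total for value in row])
    return rows


def ergodic(rows, n_iter=300):
    """Power iteration pi <- pi P, started from a point mass on the first cell."""
    n = len(rows)
    pi = [1.0] + [0.0] * (n - 1)
    for _ in range(n_iter):
        pi = [sum(pi[i] * rows[i][j] for i in range(n)) for j in range(n)]
    return pi


# Toy stochastic kernel (assumption): z | y ~ Normal(0.2 + 0.8 y, 0.1^2).
grid = [i * 0.04 for i in range(51)]          # relative income from 0 to 2
rows = transition_matrix(grid, lambda z, y: normal_pdf(z, 0.2 + 0.8 * y, 0.1))
pi = ergodic(rows)

mean = sum(p * z for p, z in zip(pi, grid))
sd = math.sqrt(sum(p * (z - mean) ** 2 for p, z in zip(pi, grid)))
print(round(mean, 3), round(sd, 3))
```

For this toy kernel the stationary distribution is normal with mean 0.2/(1 − 0.8) = 1 and standard deviation 0.1/√(1 − 0.8²) ≈ 0.167, so the iteration can be compared against a known answer before being applied to an estimated kernel.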

Fig. 5
The ergodic density \(f_\infty(z)\) implied by the estimated \(g_5(z\vert y)\) and the marginal density function \(f_{1995}(y)\). Notes: The solid line shows the point estimate for \(f_\infty(z)\) and the dashed line the estimate for the marginal density \(f_{1995}(y)\). The ergodic function \(f_\infty(z)\) has been found as the solution to Eq. (4)

The stationary distribution across the 257 regions, plotted in Fig. 5, is distinctively bimodal. The dominant peakFootnote 22 represents regions clustered just below the European average income, while a small group of relatively rich regions gathers around three times the average European (per capita) income. The bimodal nature of the ergodic distribution, in comparison with the initial income distribution, indicates two types of processes at work over time: a gradual and slow catching-up of the poorest regions, which turn out to be (with very few exceptions) regions in Central and Eastern Europe, and simultaneously a tendency towards polarisation, with a small group of richer regions separating from the rest of the cross-section.

The bimodal shape of the ergodic distribution contradicts Quah’s (1996a) unimodal ergodic solution, found in a discrete state space set-up with a much smaller set of 78 European regions over 1980–1989. The observation, however, is in line with Pittau and Zelli’s (2006) findings, obtained for a set of 110 regions covering twelve EU member countries over the time period from 1977 to 1996.

To sum up this first pass through the data, we conclude that the data show a wide spectrum of intra-distribution dynamics. Overtaking and catching-up occur simultaneously with persistence and polarisation. Polarisation manifests itself in the emergence of a twin-peak structure in the long-run regional income distribution.

3.4 The spatial filtering perspective

Large, significant and positive values of Moran’s I reveal the presence of spatial association of similar values of relative (per capita) income in neighbouring European regions.Footnote 23 This motivates a spatial filtering passFootnote 24 through the data to avoid inferences and interpretations misguided by the violation of the independence assumption in the previous analysis.
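For reference, Moran's I for values \(x_i\) and spatial weights \(w_{ij}\) is \(I = (n/S_0)\sum_i\sum_j w_{ij}(x_i-\bar{x})(x_j-\bar{x})\,/\,\sum_i (x_i-\bar{x})^2\), with \(S_0=\sum_i\sum_j w_{ij}\). The short sketch below evaluates this formula on a hypothetical chain of four regions with binary contiguity weights; the weight matrix and the values are illustrative assumptions and say nothing about the weighting scheme used in the paper.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)     # total weight S_0
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical chain of four regions: each borders only its immediate neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth = morans_i([1.0, 2.0, 3.0, 4.0], chain)    # similar values sit side by side
checker = morans_i([1.0, 4.0, 1.0, 4.0], chain)   # neighbours are dissimilar
print(smooth, checker)
```

A smooth spatial gradient yields a positive statistic and an alternating pattern a negative one, which is the sense in which large positive values point to clustering of similar incomes.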

Figure 6 presents the spatially filtered counterpart of Fig. 1. Comparing these densities with those in Fig. 1 indicates that the mode, which was situated at around 38% of the European average, has disappeared. Consequently, the economic performance of the regions is well explained by the neighbouring regions’ performances, except perhaps for regions with very high relative (per capita) income.
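To give a feel for what a spatial filtering pass does, the sketch below implements one common form of Getis's filter: each value is multiplied by the ratio of the expected to the observed local \(G_i(\delta)\) statistic, that is \(\tilde{x}_i = x_i\,[W_i/(n-1)]\,/\,G_i(\delta)\), with \(G_i(\delta)=\sum_{j\ne i} w_{ij}(\delta)\,x_j / \sum_{j\ne i} x_j\) and \(W_i=\sum_{j\ne i} w_{ij}(\delta)\). The exact specification used in this paper, including the choice of the distance δ, is not given in this section, so the binary distance-band weights, the coordinates and the values below are all assumptions.

```python
import math


def getis_filter(values, coords, delta):
    """Filter values with binary distance-band weights of radius delta."""
    n = len(values)
    filtered = []
    for i in range(n):
        xi, yi = coords[i]
        w_sum = 0      # W_i: number of other regions within distance delta
        near = 0.0     # sum of x_j over those neighbours
        rest = 0.0     # sum of x_j over all j != i
        for j in range(n):
            if j == i:
                continue
            xj, yj = coords[j]
            rest += values[j]
            if math.hypot(xi - xj, yi - yj) <= delta:
                w_sum += 1
                near += values[j]
        if w_sum == 0 or near == 0.0:
            # No usable neighbours: leave the value unchanged (an assumption).
            filtered.append(values[i])
        else:
            g_i = near / rest                            # observed G_i(delta)
            filtered.append(values[i] * (w_sum / (n - 1)) / g_i)
    return filtered


# Hypothetical regions on a line: a poor cluster and a rich cluster.
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
values = [0.4, 0.5, 1.5, 1.6]
filtered = getis_filter(values, coords, delta=1.5)
flat = getis_filter([1.0, 1.0, 1.0, 1.0], coords, delta=1.5)
print([round(v, 3) for v in filtered], flat)
```

When incomes show no spatial pattern the filter leaves them unchanged, whereas values in a poor cluster are scaled up and values in a rich cluster scaled down, removing the part of the variation attributable to the neighbourhood.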

Fig. 6
Densities of relative (per capita) income, 1995 versus 2003: the spatial filtering view. Notes: The plots are densities calculated non-parametrically using a Gaussian kernel with bandwidth chosen as suggested in Silverman (1986), restricting the domain to be non-negative. The solid line shows the density for 2003 and the dashed line that for 1995

The filtered distributions in this figure are tighter and more concentrated than those in Fig. 1. The boxplots in Fig. 7 make this particularly clear. Upper and lower outliers exist here, but the 25th and 75th percentiles are located close to the average income. Lower and upper adjacent values are compactly situated within about 0.5 and 1.5 times average income levels. The filtered distribution has a kernel estimated standard deviation of 0.262 in 1995, which increases to 0.283 in 1999, and then to 0.310 in 2003. The increase over the period 1995–2003 is 15%. The estimated standard deviations of the unfiltered data were found to be 0.393 in 1995 and 0.380 in 2003, indicating a slight decline of 3.3%. From this, it is clear that the evidence for σ-convergence found in Sect. 3.1 is caused by spatial dependence embedded in the income data.Footnote 25
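The percentage changes in dispersion can be recomputed directly from the rounded standard deviations quoted above. The helper `pct_change` is a hypothetical name introduced only for this check.

```python
def pct_change(start, end):
    """Relative change from start to end, in percent."""
    return 100.0 * (end - start) / start


filtered_change = pct_change(0.262, 0.310)     # spatially filtered, 1995 to 2003
unfiltered_change = pct_change(0.393, 0.380)   # unfiltered, 1995 to 2003
print(round(filtered_change, 1), round(unfiltered_change, 1))
```

On these rounded inputs the unfiltered dispersion falls by about 3.3%, as stated, while the filtered dispersion rises by about 18%. The 15% given in the text presumably rests on unrounded estimates or a different base, which cannot be checked from the figures reported here. The sign pattern, rising dispersion after filtering against falling dispersion before, is the same either way.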

Fig. 7
Tukey boxplots of relative (per capita) income, across 257 European regions: the spatial filtering view

More information on the role of spatial effects becomes evident when looking at the stochastic kernel in Fig. 8, which shows how the original (unfiltered) relative (per capita) income distribution is transformed into the spatially filtered one. Figure 8 displays the conditional kernel density estimate \(\hat{g}(\tilde{y}\vert y)\) with fixed bandwidths \((h_y =0.103,\,\,h_{\tilde{y}} =0.052)\) as a three-dimensional stacked conditional density plot in Fig. 8a, and as an HDR boxplot in Fig. 8b.

Fig. 8
Stochastic kernel mapping from the original to the spatially filtered distribution, the estimated \(g(\tilde{y}\vert y)\): a stacked conditional density plot, and b highest density regions boxplot. Notes: ad b The lighter shaded region in each strip is a 99% HDR, and the darker shaded region a 50% HDR. The mode for each conditional density is shown as a bullet •. Technical notes: The conditional density \(g(\tilde{y}\vert y)\) is estimated over a 5-year transition horizon τ = 5 between 1995 and 2003. Estimates are based on a Gaussian product kernel density estimator with bandwidth selection \((h_{y}= 0.103,\ h_{\tilde{y}} = 0.052)\) based on the three-step strategy suggested by Bashtannyk and Hyndman (2001). The stacked conditional density plot and the high density region boxplot were estimated at 70 and 150 points, respectively. Calculations of the plots were performed using the R package hdrcde, provided by Rob Hyndman, and spatial filtering using the PPA package, provided by Arthur Getis

If spatial effects account for a substantial part of the distribution, then the stochastic kernel mapping from the original (unfiltered) to the spatially filtered distribution would depart from the identity map. Indeed, Fig. 8a conveys precisely this message. The graph shows the kernel mapping the original to the filtered distribution in the same year. The evident clockwise reversal in the lower, but also in the higher part of the distribution indicates that spatial effects do account for a large part of income dynamics in Europe. Figure 8b reinforces this interpretation. The dominant feature in this figure appears to be intra-distribution mobility rather than persistence. Regions with an income less than 0.7 times the European average show a clear tendency towards cohesion. There are strong indications that the probability that the poorest regions move up is negatively affected by the presence of spatial dependence effects. This is evidenced by the 99% HDRs crossing the horizontal line at 1.0 and by the 50% HDRs coming much closer to this line. However, while this is happening, the very highest parts of the income distribution show tendencies away from cohesion, and provide evidence for emerging twin peaks.

Figure 9 provides stochastic kernel representations of 5-year transition dynamics in the spatially filtered income space, again using a stochastic kernel estimator with fixed bandwidths \((h_{\tilde{y}} =0.061,\,\,h_{\tilde{z}} =0.047).\) This figure is the counterpart to Fig. 4 for spatially filtered relative (per capita) regional incomes. Figure 9a presents the stochastic kernel in terms of a three-dimensional stacked conditional density plot, and Fig. 9b in terms of an HDR boxplot. The picture that emerges from the estimates here is that of a substantial degree of intra-distribution mobility at the upper and lower tails of the income distribution. The remarkably different dynamics that emerge, in comparison to the unfiltered regional income case, suggest that the use of spatially filtered data is essential if we are to evaluate growth and convergence dynamics across regions correctly and avoid misleading interpretations.

Fig. 9
The spatial filter view of relative income dynamics: the estimated \(g_{5}(\tilde{z}\vert \tilde{y}),\) a stacked density plot, and b highest density regions boxplot. Notes: ad b The lighter shaded region in each strip is a 99% HDR, and the darker shaded region a 50% HDR. The mode for each conditional density is shown as a bullet •. Technical notes: The conditional density \(g_{\tau}({\tilde{z}\vert \tilde{y}})\) is estimated over a 5-year transition horizon τ = 5 between 1995 and 2003. Estimates are based on a Gaussian product kernel density estimator with bandwidth selection \((h_{\tilde{y}} = 0.061,\ h_{\tilde{z}} = 0.047)\) based on the three-step strategy suggested by Bashtannyk and Hyndman (2001). The stacked conditional density plot and the high density region boxplot were estimated at 70 and 150 points, respectively. Calculations of the plots were performed using the R package hdrcde, provided by Rob Hyndman, and spatial filtering using the PPA package, provided by Arthur Getis

4 Concluding remarks

The study follows the tradition of the non-parametric approach, studying both the shape and the mobility dynamics of cross-sectional distributions of relative (per capita) income. This approach appears to be generally more informative about the actual patterns of cross-sectional growth than convergence empirics within the β-convergence regression approach. The study differs from most previous work in taking a continuous kernel route, which is more informative than research with discretely defined income cells.

This paper incorporates two novel techniques into the continuous analysis: kernel estimation together with more powerful graphical devices for the representation of the stochastic kernel, and Getis’ spatial filtering technique to account explicitly for the spatial dimension of the growth process. The paper illustrates that the use of spatially filtered data is essential to evaluate growth and convergence dynamics across regions. The lack of an appropriate inferential theory, however, restricts the study to a descriptive stage.

The study has produced some interesting results. First, there is no development trap in the long run to which the poorer Central and Eastern European regions are permanently condemned. Second, the findings suggest a tendency of the cross-section distribution of regional per capita income to split up into two separate groups, where a small group of richer metropolitan regions is growing away from the rest of the European regions. This evidence is consistent with Pittau and Zelli’s (2006) stationary distribution estimated on a sample of 110 EU-12 regions over the period 1977–1996. Third, spatial effects explain a substantial part of the income distribution, but not the emergence of the two-club regional world in the long run. Growth theories now need to explain these facts. The distribution dynamics analysis carried out in this paper does not help further in this respect.