1 Introduction

There is strong empirical evidence showing that regional economies grow at different rates which (i) tend to converge over time (see Baumol 1986, and the subsequent literature on “convergence”) and (ii) are spatially correlated (Abreu et al. 2005). Convergence in per capita incomes finds its theoretical basis in models of economic growth (Barro and Sala-i-Martin 1992), while spatial convergence has been theoretically justified by Ertur and Koch (2007), who derive a spatial autoregressive equation from a growth model which takes into account technological interactions between regions based on proximity and neighborhood effects. Furthermore, new economic geography models predict spatial convergence in real wages, since migration between cities/regions leads to real wage equalization in the long run. This idea was proposed by Krugman (1991) in a setting with two regions. I derive a spatial beta-convergence equation for real wages using the simple multi-region model proposed by Tabuchi et al. (2005) in Sect. 3.

This paper tests these spatial convergence predictions for Russian cities. Since Russia is a large country with a very uneven distribution of economic activity, it is natural to conjecture that the “spatial” component of convergence in both real wages and per capita incomes is essential. Until now, the spatial determinants of regional economic growth in Russia have been analyzed in the literature using subject-of-the-federation-level data.Footnote 1 This approach, however, is problematic, for it leads to a loss of information due to extremely high intraregional heterogeneity (Zubarevich 2015). Moreover, the small number of regions limits the use of advanced econometric methods for estimating the relevant effects. In this paper, I suggest a way of overcoming these difficulties.

Using disaggregated geo-coded data, I study whether the properties of city location patterns foster convergence in real wages, and quantify the impact of the spatial structure of the economy on \(\beta \)-convergence in real wages in Russia. This has the advantage of exploiting a finer location pattern as a source of variation, and allows me to design a flexible empirical strategy that yields robust inferences. I apply Bayesian spatial econometric models that allow a comparison of the estimation results for different spatial weight matrices. The flexibility of this econometric procedure also comes from the possibility of fine-tuning the sparsity of the spatial weight matrix, a property that is of paramount importance in modern spatial econometrics (LeSage and Pace 2009). To the best of my knowledge, no similar setting has ever been used in previous empirical work on Russian regional development.

My main findings can be summarized as follows. First, real city wages (i) converge over time and (ii) are significantly affected by the initial levels of real wages in neighboring cities. Second, the radius of significant spatial interactions between Russian cities is around 300–600 km for cities west of the Urals, and around 1000 km for the Far North (cities in the Arctic Circle and some other isolated cities), where convergence is slower. Third, the effect of regional subsidies on real wages is negative, and the effect of natural resources on real wages has become weaker over time. Finally, although no official statistical data on per capita incomes are available at the city level in Russia, I find a strong positive correlation between per capita income and wages at the subject-of-the-federation level, which persists over time. On this basis, I believe that my main results provide indirect evidence for spatial convergence of Russian cities in per capita incomes.

The rest of the paper is organized as follows. Section 2 reviews the literature, mostly based on Russian regional data. Section 3 provides a theoretical foundation for real wage convergence in cities. Section 4 describes the dataset. I discuss spatial weight matrices applied to Russian regional data and show a significant positive autocorrelation of real wages in Russia using a series of spatial matrices. Section 5 shows the findings: (i) conditional sigma-convergence of spatially weighted real wages and (ii) spatial beta-convergence of real city wages. I show that my conclusions are robust to the threshold distance, i.e., the estimation results are not sensitive to the exclusion of remote towns from the dataset. Section 6 concludes.

2 Literature review

There is an extensive empirical literature following Baumol’s (1986) seminal paper on income convergence, using different datasets, explanatory variables and evaluation methods (including Barro 1991; Barro and Sala-i-Martin 1992; Sala-i-Martin 1996; Williamson 1996; Taylor 1999, and many more). Combining the baseline growth model with the fundamentals of the new economic geography (NEG), the regional science literature has introduced spatial dependence into the growth regression model. The earliest studies on spatial growth models include Armstrong (1995), Bernat (1996), and Fingleton and McCombie (1998). Thorough reviews on spatial growth studies are provided by Rey and Montouri (1999), Arbia (2006), Fingleton and López-Bazo (2006), Rey and Le Gallo (2009) and Le Gallo and Fingleton (2014). Most of them find a significant impact of spatial location patterns on regional income convergence.

Empirical work on Russian regional income growth covers different time periods. Extensive surveys are provided by Gluschenko (2010, 2012) and Guriev and Vakulenko (2012). The convergence hypothesis has been systematically rejected in papers focusing on the earliest post-Soviet period, which may be due to either short time horizons, or missing data, or both. By contrast, more recent studies provide growing evidence of regional convergence in per capita income across subjects-of-the-federation in Russia. The results also tend to depend on the choice of a regional income measure: while per capita gross regional product (GRP) differentials persist, the cross-regional gaps in per capita incomes and regional wages have shrunk substantially.

Earlier papers on regional income inequalities fully ignored the spatial structure of Russian regions and therefore did not use the tools of spatial econometrics. There is, however, growing empirical support for a substantial impact of geography on the dynamics of regional incomes; see Table 5 in “Appendix 1” for a summary. In general, it is fair to say that the evidence provided by studies of regional income convergence in Russia is inconclusive. It is also worth noting that very few papers discuss possible dissimilarities in regional interaction for different parts of the country. Demidova (2015) revealed an asymmetric influence of eastern and western regions on each other using a partitioned spatial matrix. Ivanova (2014) tested whether estimates of a spatial growth model are sensitive to the choice of spatial weights. To the best of my knowledge, neither wage convergence nor per capita income convergence in Russia has been studied at a lower level of spatial aggregation than the subject-of-the-federation level.

3 A theoretical model of spatial convergence in real wages

As pointed out in the Introduction, spatial convergence in per capita incomes has found its theoretical grounds in the spatial growth literature (Ertur and Koch 2007). What about the theoretical foundations of spatial beta-convergence in real wages? The purpose of this section is to demonstrate that such convergence can be justified by NEG models. To show this, I use the multi-city setting proposed by Tabuchi et al. (2005) to derive a spatial beta-convergence equation for real wages across cities. For the sake of consistency, I start with a brief description of the model setup.Footnote 2

There are n cities. Denote the share of total population residing in city i by \(\lambda _{i}\), and the distance between cities i and j by \(d_{ij}>0\). Define the distance matrix by \(\mathbf {D}\equiv (d_{ij})\) for \(i,j=1,\ldots ,n\); the vector of population distribution across cities by \(\varvec{\lambda }\equiv (\lambda _{1},\ldots ,\lambda _{n})^{\mathrm{T}}\), where \(\mathrm {T}\) denotes the transpose operator; and the vector of real wages in cities by \(\mathbf{V}(\varvec{\lambda },\,\mathbf {D})\equiv [V_{1}(\varvec{\lambda },\,\mathbf {D}),\ldots ,V_{n}(\varvec{\lambda },\,\mathbf {D})]^{\mathrm{T}}\).Footnote 3 Because \(\lambda _{i}\) is city i’s population share, we have

$$\begin{aligned} \sum _{j=1}^{n}\lambda _{j}=1. \end{aligned}$$
(1)

Equation (1) can be equivalently restated as \(\varvec{\lambda }\in \Delta _{n-1}\), where \(\Delta _{n-1}\) is the standard \((n-1)\)-dimensional simplex. This formulation, although less intuitive than (1), will prove convenient below.

The key assumption of the model is that migration decisions made by workers are fully driven by pairwise comparisons of real wages between cities.Footnote 4 More specifically, the net migration flow from city j to city i is proportional to (i) the real wage differential \(V_i-V_j\) between the two cities, and (ii) the speed of adjustment \(f_{ij}(\varvec{\lambda }, \mathbf {D})\) of migration decisions between cities i and j, which depends, in general, on the whole population distribution \(\varvec{\lambda }\) and the whole distance matrix \(\mathbf {D}\). For simplicity, the pattern of adjustment speeds is assumed to be symmetric: \(f_{ij}(\varvec{\lambda }, \mathbf {D})=f_{ji}(\varvec{\lambda }, \mathbf {D}),\; i,j=1,\ldots ,n.\) Finally, the functions \(f_{ij}(\varvec{\lambda }, \mathbf {D})\) are required to be sufficiently smooth in \(\varvec{\lambda }\).

As shown by Tabuchi et al. (2005, Eq. (6) on p. 431), the migration dynamics in the model can be described by the following ordinary differential equation (ODE) system:

$$\begin{aligned} \frac{\mathrm {d}\varvec{\lambda }}{\mathrm {d}t}= \mathbf {F}(\varvec{\lambda },\mathbf {D})\cdot \mathbf{V}(\varvec{\lambda },\mathbf {D}), \end{aligned}$$
(2)

where t denotes time, while the matrix \(\mathbf {F}(\varvec{\lambda },\mathbf {D})\) is constructed as follows: its off-diagonal entries are the speeds of adjustment taken with the opposite sign, while its \(i\)th diagonal entry is given by \(\sum _{j\ne i}^{n}f_{ij}(\varvec{\lambda }, \mathbf {D})\).

Given the above assumptions, it is readily verified that \(\mathbf {F}(\varvec{\lambda },\mathbf {D})\) is a symmetric positive semi-definite \(n\times n\) matrix, which is continuously differentiable in \(\varvec{\lambda }\) and satisfies the following identity:

$$\begin{aligned} \mathbf {F}(\varvec{\lambda },\mathbf {D})\cdot \mathbf {1}=\mathbf {0}, \end{aligned}$$
(3)

where \(\mathbf {1}\equiv (1,\ldots ,1)^{\mathrm {T}}\).
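The construction of \(\mathbf {F}\) can be checked numerically. The following is a minimal sketch, assuming a hypothetical three-city matrix of constant adjustment speeds; the function name `migration_matrix` and the numbers are illustrative and do not come from Tabuchi et al. (2005).

```python
import numpy as np

def migration_matrix(f):
    """Build F from a symmetric matrix of adjustment speeds f_ij (diagonal ignored)."""
    f = np.asarray(f, dtype=float).copy()
    np.fill_diagonal(f, 0.0)
    if not np.allclose(f, f.T):
        raise ValueError("adjustment speeds must be symmetric")
    # Off-diagonal entries: -f_ij; diagonal entries: sum over j != i of f_ij.
    return np.diag(f.sum(axis=1)) - f

# Hypothetical adjustment speeds for three cities (illustrative values only).
f = np.array([[0.0, 2.0, 0.5],
              [2.0, 0.0, 1.0],
              [0.5, 1.0, 0.0]])
F = migration_matrix(f)
row_sums = F @ np.ones(3)            # identity (3): should be the zero vector
eigenvalues = np.linalg.eigvalsh(F)  # nonnegative for a positive semi-definite matrix
```

With nonnegative symmetric speeds, \(\mathbf {F}\) has the structure of a weighted graph Laplacian, which is why symmetry, positive semi-definiteness and identity (3) hold together.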

Let \(\varvec{\lambda }^{*}\) be a stable spatial equilibrium of the system (2), which requires that \({\mathrm {d}\varvec{\lambda }}/{\mathrm {d}t} = \mathbf {0}\) at \(\varvec{\lambda }=\varvec{\lambda }^{*}\). Then, as implied by (2) and (3), it must be that \(\mathbf {F}(\varvec{\lambda }^{*},\mathbf {D})\cdot \mathbf{V}(\varvec{\lambda }^{*},\mathbf {D}) = {V^*}\cdot \mathbf {F}(\varvec{\lambda }^{*},\mathbf {D})\cdot \mathbf{1}\) for some scalar \(V^{*}\), which yields \(\mathbf{V}(\varvec{\lambda }^{*},\mathbf {D})=V^{*}\mathbf{1}\), i.e., the equilibrium real wage must be the same in all cities. In other words, the long-run equilibrium \(\varvec{\lambda }^{*}\) displays real wage equalization across cities.
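To illustrate real wage equalization, here is a toy simulation of system (2). It is a sketch under strong assumptions that are not part of the model in the text: real wages are taken to be linear in own population share, \(V_i=a_i-c\lambda _i\), adjustment speeds are constant and equal to one, and the ODE is integrated with a plain Euler scheme. All names and numbers are hypothetical.

```python
import numpy as np

def simulate(a, c, f, lam0, dt=0.01, steps=2000):
    """Euler integration of d(lambda)/dt = F V(lambda), with toy wages V_i = a_i - c*lambda_i."""
    f = np.asarray(f, dtype=float)
    F = np.diag(f.sum(axis=1)) - f      # constant adjustment speeds (assumption)
    lam = np.asarray(lam0, dtype=float).copy()
    for _ in range(steps):
        V = a - c * lam                 # toy real wage function (assumption)
        lam = lam + dt * (F @ V)        # population flows toward higher real wages
    return lam, a - c * lam

a = np.array([1.0, 1.2, 0.9])           # hypothetical city-specific wage intercepts
f = np.ones((3, 3)) - np.eye(3)         # f_ij = 1 for all pairs i != j
lam, V = simulate(a, c=1.0, f=f, lam0=np.full(3, 1 / 3))
y = V / V.sum()                          # wage shares, equal to 1/n once wages are equalized
```

In this example the population shares settle at (0.3, 0.5, 0.2) and the real wage equals 0.7 in every city, while the shares keep summing to one along the whole path.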

Assume that the initial state of the system is off-equilibrium: \(\varvec{\lambda }(0)\ne \varvec{\lambda }^{*}\). This can be interpreted in two ways: (i) migration frictions, which were strong in the past, i.e., at times \(t<0\), but have been relaxed drastically at \(t=0\); (ii) the existence of compensating wage differentials (Roback 1982). Both these cases are highly relevant for Russia. First, until the end of the Soviet era, the interplay between agglomeration and dispersion forces, which constitute the whole essence of NEG, was not a key factor for city size distribution or city-level wages in Russia. Therefore, one may consider the process of convergence in real wages from their disequilibrium values (generated by the inertia typical for an ex-planned economy with low and impeded mobility) to their equilibrium values as being shaped by the interaction of centripetal and centrifugal forces. Second, there was a special wage policy in the USSR, to attract labor to settlements with bad amenities by setting higher real wages to compensate, see Sect. 4 for details.

It remains to obtain testable convergence equations. To do so, we need to derive a system of ODEs describing real wage dynamics. Intuitively, this can be done by simply changing variables in (2). There is, however, a technical difficulty. Due to (1), the mapping \(\mathbf {V}(\varvec{\lambda },\mathbf {D})\) is not invertible, since it maps \(\Delta _{n-1}\), which is an \((n-1)\)-dimensional surface, into \(\mathbb {R}_{+}^{n}\), which is an n-dimensional space. Therefore, (i) not every vector \(\mathbf {V}(0,\mathbf {D})\in \mathbb {R}_{+}^{n}\) can serve as a vector of initial real wage levels and (ii) we cannot compute \(\mathrm {d}\mathbf {V}/\mathrm {d}t\) as

$$\begin{aligned} \frac{\mathrm {d}\mathbf {V}}{\mathrm {d}t}= \left( \frac{\partial \mathbf {V}}{\partial \varvec{\lambda }}\right) ^{-1} \frac{\mathrm {d}\varvec{\lambda }}{\mathrm {d}t}, \end{aligned}$$

because the inverse Jacobi matrix \(\left( \partial \mathbf {V}/\partial \varvec{\lambda }\right) ^{-1}\) is not well defined.

To tackle these difficulties, we introduce \(\mathbf {y}(\varvec{\lambda },\mathbf {D})\equiv (y_{1}(\varvec{\lambda },\mathbf {D}),\ldots ,y_{n}(\varvec{\lambda },\mathbf {D}))\), where \(y_{i}(\varvec{\lambda },\mathbf {D})\) is the normalized real wage in city i defined by

$$\begin{aligned} y_{i}(\varvec{\lambda },\mathbf {D})\equiv \frac{V_{i}(\varvec{\lambda },\mathbf {D})}{\sum _{j=1}^{n} V_{j}(\varvec{\lambda },\mathbf {D})}. \end{aligned}$$

The mapping \(\mathbf {y}(\varvec{\lambda },\mathbf {D})\) maps \(\Delta _{n-1}\) onto itself, and it is invertible at any point of the interior of \(\mathbf {y}(\Delta _{n-1},\mathbf {D})\). We are now equipped to obtain the following result.

Proposition 1

Assume the migration dynamics ODE system (2) has a unique steady state \(\varvec{\lambda }^{*}\), which is interior and stable. Then:

(i) in a neighborhood of \(\varvec{\lambda }^{*}\), the system (2) can be equivalently recast into the normalized real wage dynamics ODE system:

$$\begin{aligned} \frac{\mathrm {d}\ln y_{i}}{\mathrm {d}t}=g_{i}(y_{i},\mathbf {y}_{-i},\mathbf {D}); \end{aligned}$$
(4)

(ii) the system (4) has a unique steady state given by \(\mathbf {y}^{*}=\mathbf {1}/n\), and it is asymptotically stable;

(iii) for any fixed moment of time \(T>0\), the following “convergence equations” hold (approximately) in the vicinity of the steady state \(\mathbf {y}^{*}\):

$$\begin{aligned} \ln y_{i}(T)-\ln y_{i}(0)\approx \alpha +\beta _{i}(\mathbf {D})\ln y_{i}(0)+\sum _{j \ne i}w_{ij}(\mathbf {D})\ln y_{j}(0), \end{aligned}$$
(5)

where \(y_i(t)\) is a solution to (4) as a function of t.

Proof

See “Appendix 2.” \(\square \)

Equation (5) can be viewed as a spatial beta-convergence equation for the following reasons. First, it relates the growth rate of real wages to the initial levels of real wages in neighboring cities. Second, the coefficients \(w_{ij}(\mathbf {D})\) depend on the distance matrix \(\mathbf {D}\), whence they can be interpreted as spatial weights.
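The following is a hedged sketch of why a linearization around the steady state produces an equation of the form (5). The symbols \(\mathbf {z}\), \(\mathbf {G}\) and \(\mathbf {M}\) are introduced here for illustration only, and the proof in “Appendix 2” may proceed differently. Let \(z_{i}\equiv \ln y_{i}-\ln y_{i}^{*}\), and let \(\mathbf {G}(\mathbf {D})\) denote the Jacobian of the right-hand side of (4) with respect to \(\ln \mathbf {y}\), evaluated at \(\mathbf {y}^{*}\). Since each \(g_{i}\) vanishes at the steady state, a first-order expansion gives

$$\begin{aligned} \frac{\mathrm {d}\mathbf {z}}{\mathrm {d}t}\approx \mathbf {G}(\mathbf {D})\,\mathbf {z} \quad \Longrightarrow \quad \mathbf {z}(T)-\mathbf {z}(0)\approx \mathbf {M}(\mathbf {D})\,\mathbf {z}(0), \qquad \mathbf {M}(\mathbf {D})\equiv \mathrm {e}^{T\mathbf {G}(\mathbf {D})}-\mathbf {I}. \end{aligned}$$

Because \(\ln y_{i}^{*}=-\ln n\) for all i, writing this componentwise with \(\mathbf {M}\equiv (m_{ij})\) yields

$$\begin{aligned} \ln y_{i}(T)-\ln y_{i}(0)\approx \ln n\sum _{j=1}^{n}m_{ij}+m_{ii}\ln y_{i}(0)+\sum _{j\ne i}m_{ij}\ln y_{j}(0). \end{aligned}$$

Under this sketch, \(\beta _{i}(\mathbf {D})\) and \(w_{ij}(\mathbf {D})\) correspond to the diagonal and off-diagonal entries of \(\mathbf {M}(\mathbf {D})\), and the intercept is common across cities only if the row sums of \(\mathbf {M}(\mathbf {D})\) coincide.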

4 Real wages in Russian cities

4.1 Data description

City wage data. I use average monthly wages in Russian cities provided in the Multistat database.Footnote 5 I omit settlements of the Chechen Republic and the Republic of Ingushetia because of missing data. I also exclude settlements that lost the status of a city and became part of another city between 1996 and 2013.Footnote 6 The resulting dataset covers 997 cities over the period 1996–2013.

To compute real wages, I deflate nominal wages by a year- and region-specific measure of the cost of living. This measure reflects the value of a fixed set of goods and services, typically referred to as the “market basket of subsistence goods” (Remington 2015). The market basket values are not reported by Russian statistical agencies at the city level, but they are available at the subject-of-the-federation level starting from 2000.Footnote 7 For the years before 2000, I use the regional consumer price index (CPI) as a proxy for the basket value. The real wage data reveal very large differences between Russian cities, especially at the end of the 1990s. In 1996, the richest city in the sample had a real wage 29.1 times larger than the poorest one. By 2013, this ratio had decreased to 5.5.
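As a minimal sketch of the deflation step, the snippet below divides nominal city wages by the basket value of the region each city belongs to and computes the richest-to-poorest ratio. The numbers are hypothetical and are not taken from the Multistat data.

```python
import numpy as np

def real_wages(nominal_wage, basket_value):
    """Real wage in market baskets: nominal wage divided by the regional basket value."""
    nominal_wage = np.asarray(nominal_wage, dtype=float)
    basket_value = np.asarray(basket_value, dtype=float)
    return nominal_wage / basket_value

# Hypothetical figures (rubles per month) for four cities in one year.
nominal = np.array([12000.0, 30000.0, 8000.0, 45000.0])
basket = np.array([6000.0, 7500.0, 5000.0, 9000.0])   # basket value of each city's region
w = real_wages(nominal, basket)        # wages expressed in market baskets
max_min_ratio = w.max() / w.min()      # richest-to-poorest ratio
```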

Next, I have geo-coded city-level real wages. The spatial distribution of real wages in 2013 is shown in Fig. 1, where classes are defined as deciles. Richer circles (i.e., cities with higher real wages) tend to correspond to cities with larger population: Moscow, Saint Petersburg, Yekaterinburg. However, cities located in oil and gas regions (e.g., Tyumenskaya or Sakha Yakutia) also display high wages although they have much smaller populations.

Fig. 1 City wages 2013 (in market baskets) and the Far North territories

The Far North. Recent studies on Russian income data show that results may depend substantially on the part of the country (Ahrend 2005; Demidova 2015; Sardadvar and Vakulenko 2016). The standard east-west split, which separates the European and Asian parts of Russia, runs along the Ural Mountains. Although real wages in eastern cities are higher than in western cities, there are no a priori reasons to believe that they converge faster or slower. Therefore, the east-west partition is hardly relevant in the present context. I choose instead to split the sample into Far North (FN) cities, where a special wage policy is implemented, and non-Far North cities.

The notion of FN territories was established during the Soviet period and comprised the regions of the USSR with severe climate conditions and/or poor transport connection to the rest of the country. Most of these territories (see Fig. 1) are in the Arctic Circle, whence the label “Far North.” In order to attract skilled workers, e.g., doctors and teachers, to Far North territories, the Soviet government offered higher wages and other benefits (longer vacation periods, lower retirement ages, etc.) to those who chose to work in the Northern regions. This practice is referred to as the “Northern” benefits.

These benefits were continued in Russia after 1991 and are now regulated by the Federal Law (2004). In 2005, most of the benefits provided by the Law were converted into cash allowances, while the burden of payments was redistributed among the Federal government and local governments. The availability of “Northern” benefits now depends on the ability of enterprises to pay bonuses, which varies greatly both across sectors and cities (Wengle and Rasell 2008).

The whole FN area comprises about 70% of Russia’s total area. It includes the so-called territories equated to the Far North, which have either no road connections with the rest of the country or connections that suffer from seasonal isolation. Loosely speaking, these regions are “island regions.” Settlements located there may hence be considered as remote areas with peculiar conditions of spatial interactions.

I split the dataset into two subsets: the FN cities located in the Far North regions or regions equated to the Far North (119 cities in 19 regions), and the non-FN cities located mainly in the western part of the country (878 cities in 59 regions). The real wages are higher on average in the FN cities than in the rest of the country (see Figs. 1 and 6 in “Appendix 3”).

Fig. 2 Real city wages and GRP per capita in market baskets, 1996–2013

Real wages as a proxy for per capita GRP. Using labor force as weights, I compute the monthly weighted average real wages and the weighted average GRP per capita measured in market baskets. I average real wages across cities and GRP across subjects-of-the-federation, since there are no data available on per capita GRP at the city level. Figure 2 shows that these two indicators are highly correlated. The bottom-right panel of Fig. 2 shows the dynamics of the cross-sectional correlation coefficients between real city wages and GRP per capita at the subject-of-the-federation level (see also Table 6 in “Appendix 3”). The correlation coefficient is always above 0.5. The other three panels of Fig. 2 show the scatterplots of real wages versus GRP per capita in market baskets computed, respectively, for the whole country, for the FN part and for the non-FN part. In all three cases, the fitted lines are very close to the 45° line, and the corresponding correlation coefficients are very close to 1. This strong positive correlation leads me to conclude that real city wages in Russia may serve as a reasonable proxy for gross regional income at a more disaggregated administrative level, at least within the time period of my study.

4.2 Spatial autocorrelation of real wages

Figure 1 suggests that there are spatial patterns in the distribution of real wages across Russian cities and towns. The choice of spatial weights describing the intensity of the spatial effect on a given location is one of the key issues in spatial analysis. Harris et al. (2011) provided an extensive review of the standard approaches to constructing a spatial weights matrix. It is common practice to use either contiguity or distance-based spatial weight matrices. Distances between regions are measured as distances between the regional central (or largest) cities, and this type of spatial weights leads to rather crude measures of the spatial distribution at the regional level. It is, however, better suited to cities. Furthermore, there are papers constructing weights based on distances along highways or railways. In Russia, this way of measuring distances may lead to mixing up the effects of overall spatial interactions between regions with those of particular infrastructural improvements. While land transportation networks are well developed in most of the western part of the country, there are regions that are poorly connected with other territories via railroads (e.g., Tomskaya, Tyva, Altay), but have much better highway connections. There are also remote territories (e.g., Magadanskaya, Chukotka) for which the common way to access other regions is air transport combined with land transport. These considerations reveal the difficulties of constructing sophisticated measures of distance based on detailed information on the Russian transportation system.

An alternative strategy for constructing spatial weights relies on a fixed number of neighbors for each spatial unit. However, this method is also problematic, as the spatial distribution of population in Russia is highly asymmetric, as is the size distribution of the regional units. Therefore, assuming that each region has the same number of neighbors to interact with would mean that the radius of spatial interaction for the FN regions would be several times longer than that for the rest of the country. For example, 10 neighbors, as used by Sardadvar and Vakulenko (2016), in the case of Sakha Yakutia (a region in the Far North) comprise about 37% of the area of Russia, while the same number of nearest regional neighbors for the city of Moscow, which is treated as a separate subject-of-the-federation in the Russian official statistics, covers less than 3% of the country’s area.

Fig. 3 Kernel density estimation of pairwise great circle distances between Russian cities

I construct the spatial weights as follows:

$$\begin{aligned} w_{ij}^{({C})}=\left\{ \begin{array}{ccl} 1/d_{ij} &{}\quad \mathrm {if} &{} d_{ij}<C,\, i\ne j, \\ 0 &{} \quad &{} \mathrm {otherwise}, \end{array} \right. \end{aligned}$$

where \(d_{ij}\) is the great circle distance between cities i and j, while \(C>0\) is a cutoff distance. When C is greater than the maximum distance between spatial units in my data, I get standard inverse distances as a limiting case.

Given the cutoff C, the spatial weights \(w_{ij}^{({C})}\) as defined above are functions of the great circle distances only. The great circle distance may serve as a proxy for distance covered by land transport (for the densely populated part of the country) and by air transport (for remote regions). Meanwhile, the cutoff distance C is not a priori specified, which provides an additional dimension of flexibility.
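The weights described above can be sketched as follows. This is an illustrative implementation, not the code used in the paper: it assumes a spherical Earth with a mean radius of 6371 km, uses the haversine formula for the great circle distance, and takes approximate coordinates of three cities as an example. The function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)

def great_circle_km(lat, lon):
    """Pairwise haversine distances (km) from latitudes/longitudes in degrees."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))

def inverse_distance_weights(d, cutoff):
    """w_ij = 1/d_ij if d_ij < cutoff and i != j, and 0 otherwise."""
    w = np.zeros_like(d)
    # The d > 0 guard avoids division by zero for co-located units (an added safeguard).
    mask = (d < cutoff) & ~np.eye(len(d), dtype=bool) & (d > 0)
    w[mask] = 1.0 / d[mask]
    return w

def row_standardize(w):
    """Divide each row by its sum; rows without neighbors stay zero."""
    s = w.sum(axis=1, keepdims=True)
    return np.divide(w, s, out=np.zeros_like(w), where=s > 0)

# Approximate coordinates (degrees) of Moscow, Saint Petersburg and Yekaterinburg.
lat = [55.76, 59.94, 56.84]
lon = [37.62, 30.31, 60.61]
d = great_circle_km(lat, lon)
W = row_standardize(inverse_distance_weights(d, cutoff=1000.0))
```

With a 1000 km cutoff, only Moscow and Saint Petersburg are neighbors in this example, so Yekaterinburg forms an “island” with an all-zero row.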

Table 1 Number of cities in connected inverse great circle matrices, 100–2000 km

The distribution of pairwise distances between FN cities and between non-FN cities is shown in Fig. 3. The median distance for FN cities is 2305 km, and the mean is 2698 km; for non-FN cities the median distance is 1159 km and the mean is 1545 km. There are no grounds to take median or mean values of pairwise distances as a reasonable cutoff distance for the spatial interactions. As a robustness check, I construct weights with different cutoff distances starting from 100 km with 50-km steps. For small values of C, I get matrices with “islands,” i.e., at least two blocks not connected with each other. In such cases, I restrict my sample to the observations that generate a connected weight matrix of the highest dimensionality. The number of cities in the largest connected matrix for each cutoff distance C is reported in Table 1. The smallest cutoff distance for which all the cities form a connected matrix is 900 km. When \(C<250\) km, no more than 40% of FN cities are involved in the spatial interaction matrix, while the percentage of non-FN cities is greater than 90 within the same cutoff distance.
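Selecting the observations that form the largest connected block for a given cutoff can be sketched with a breadth-first search over the graph whose links join cities closer than C. This is one possible way to do it, shown on a hypothetical set of four cities placed on a line; the paper does not state which algorithm was used.

```python
from collections import deque
import numpy as np

def largest_component(d, cutoff):
    """Indices of the largest set of cities connected by links shorter than cutoff."""
    n = len(d)
    adj = (d < cutoff) & ~np.eye(n, dtype=bool)
    seen = np.zeros(n, dtype=bool)
    best = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        comp, queue = [], deque([start])
        while queue:                      # breadth-first search from `start`
            i = queue.popleft()
            comp.append(i)
            for j in np.flatnonzero(adj[i] & ~seen):
                seen[j] = True
                queue.append(j)
        if len(comp) > len(best):
            best = comp
    return sorted(int(i) for i in best)

# Hypothetical cities on a line at 0, 100, 250 and 900 km.
pos = np.array([0.0, 100.0, 250.0, 900.0])
d = np.abs(pos[:, None] - pos[None, :])
block_200 = largest_component(d, 200.0)   # the remote city is left out
block_700 = largest_component(d, 700.0)   # all cities are connected
```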

As shown in Fig. 3 and Table 1, the radius of interaction up to 300–400 km corresponds to the biggest changes in number of city pairs. The average number of neighbors of each city within the cutoff distance is also shown in Table 1. Within a radius of 1000 km, each city interacts with one-third of all other cities, on average. When the cutoff distance is 2000 km, this share is about 70%. I obtain a weight matrix corresponding to the complete graph when \(C\ge 7600\mathrm{\ km}\).

Fig. 4 Moran’s I for log of city wages (in market baskets), 1996–2013. Spatial weights: row-standardized inverse great circle distances with cutoffs. Number of cities in parentheses

In order to measure the overall tightness of wage co-movements in spatially close cities, I compute Moran’s global spatial autocorrelation index I using row-standardized spatial weights \(w_{ij}\). Figure 4 shows annual changes in the global Moran’s I for log real wages calculated using inverse great circle distances with different cutoffs. All indices are positive and significant at the 1% level, whence city-level wages are spatially autocorrelated. Starting from a cutoff distance of 300 km, the time path of Moran’s I barely changes as C increases further. Hence, a radius of 300 km captures almost all spatial autocorrelation in real city wages.
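For reference, the global Moran’s I can be computed as in the sketch below, which uses the textbook formula \(I=(n/S_{0})\,\mathbf {z}^{\mathrm {T}}\mathbf {W}\mathbf {z}/\mathbf {z}^{\mathrm {T}}\mathbf {z}\), where \(\mathbf {z}\) is the demeaned variable and \(S_{0}\) is the sum of all weights. The four-city weight matrix and the data are hypothetical, and the significance tests reported in the text are not reproduced here.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I for values x and a spatial weight matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()                  # deviations from the cross-sectional mean
    s0 = w.sum()                      # equals n when every row of w sums to one
    return (len(x) / s0) * (z @ w @ z) / (z @ z)

# Hypothetical row-standardized weights for four cities on a line.
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
i_smooth = morans_i([1.0, 2.0, 3.0, 4.0], W)        # similar neighbors: positive I
i_alternating = morans_i([1.0, -1.0, 1.0, -1.0], W)  # dissimilar neighbors: negative I
```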

To sum up, there is significant spatial autocorrelation in real wages in Russian cities. Ignoring possible inter-city spatial links may therefore lead to biased estimates of wage convergence.

5 Spatial convergence

5.1 Spatial sigma-convergence

Figure 5 shows that the variation of real city wages (measured by the standard deviation of log real city wages) decreases over time. Figure 5 also presents the changes in the standard deviation of the spatial lags of log real wages, where the lags are constructed as \(\nu _{i,t}=\sum _{j=1}^{n}w_{ij}y_{j,t}\), with \(y_{j,t}\) the log real wage in city j and year t. Both series provide evidence for the decreasing volatility of wages in city neighborhoods as defined by the cutoff distance.
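The two dispersion series can be sketched as follows: the spatial lag of city i is the weighted average of log real wages in its neighbors, and the cross-sectional standard deviation is then taken year by year. The weight matrix and wage panel below are hypothetical, and the choice of the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np

def sigma_series(log_wages, w):
    """Yearly std of log wages and of their spatial lags.

    log_wages: array (n_cities, n_years); w: row-standardized (n_cities, n_cities).
    """
    lags = w @ log_wages              # lag of city i: weighted average over neighbors j
    return log_wages.std(axis=0, ddof=1), lags.std(axis=0, ddof=1)

# Hypothetical row-standardized weights for four cities on a line.
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
# Hypothetical log real wages: columns are two years, with dispersion shrinking over time.
log_wages = np.array([[0.0, 1.0],
                      [1.0, 1.5],
                      [2.0, 2.0],
                      [3.0, 2.5]])
sd_wages, sd_lags = sigma_series(log_wages, W)
```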

Fig. 5 Standard deviation of real wages (log) and their spatial lags, 1996–2013. Spatial weights: row-standardized inverse great circle distances with cutoffs. Number of cities in parentheses

To test the hypothesis of spatial sigma-convergence, I use a panel unit root test for spatially lagged real wages and consider a first-order autoregressive component

$$\begin{aligned} \nu _{i,t}=b_{i}\nu _{i,t-1}+\gamma _{i}+\delta _{i}t+\varepsilon _{i,t}, \end{aligned}$$
(6)

where \(\gamma _{i}\) is an individual fixed effect; where \(\delta _{i}t\) is a time trend that captures the fact that average city-level wages increase over time (see Fig. 6); and where \(\varepsilon _{i,t}\sim iid(0,\sigma ^2)\) is the error term.

I split the time period into two sub-periods: 1996–2005 and 2006–2013. This is done to capture the possible impacts of: (i) changes in the fixed set of goods and services used to define the market basket, which was revised in late 2005 and (ii) changes in the Federal Law of 2004, when the practice of converting “Northern” benefits into cash allowances started. Note that both time periods are relatively short and the number of observations in each panel is large. This has been taken into account when running panel unit root tests. I test the null hypothesis \(H_{0}\): \(b_{i}=1\) for \(i=1,\ldots ,n\) (the panels contain unit roots) using two panel unit root tests.

The first test by Harris and Tzavalis (1999; henceforth HT) tests the above-mentioned null hypothesis against the “homogeneous” alternative \(H_{a}^{\mathrm {HT}}\): \(-1<b_{i}=b<1\) for \(i=1,\ldots ,n\) (the panels are stationary). Estimates for the HT tests using spatial lags of real wages with different cutoff distances are provided in Table 7 in “Appendix 4.” Spatially determined panels of real city wages are stationary during 1996–2005 when the time trend is not included, and this does not depend on C, except for the FN dataset with a large radius (\(C\ge 4000\) km). When panel-specific time trends are included, only large distances (\(C\ge 900\) km) yield spatially weighted real wages that converge during the first time period. For the second period (2006–2013), neighboring real city wages diverge for distances of less than 900 km and converge for larger distances, when there are no time trends. If I include panel-specific time trends, spatially lagged real city wages diverge between 2006 and 2013.

The alternative \(H_{a}^{\mathrm {HT}}\) imposes the restriction that \(b_{i}=b\) for all i, which implies that the rate of convergence would be the same for all cities (regions). Maddala and Wu (1999) showed that this implication is too restrictive in practice.

I hence run a second unit root test that relaxes this restrictive assumption. This test by Im, Pesaran, and Shin (2003; henceforth, IPS) tests \(H_{0}\) against the “heterogeneous” alternative \(H_{a}^{\mathrm {IPS}}\): \(-1<b_{i}<1\) for \(i=1,\ldots ,n_{1}\) and \(b_{i}=1\) for \(i=n_{1}+1,n_{1}+2,\ldots ,n\) (the fraction of panels that follow stationary processes is nonzero).
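The idea behind the heterogeneous alternative can be sketched with a simplified t-bar statistic: a Dickey-Fuller regression with an intercept and a trend is run separately for each city, and the city-specific t-statistics on \(b_{i}-1\) are averaged. This is a rough illustration only. It omits lag augmentation, the standardization of the t-bar statistic and the critical values of the IPS test, and it is not the implementation behind Table 8. The simulated panels are hypothetical.

```python
import numpy as np

def df_t_stat(series):
    """t-statistic on rho = b_i - 1 in: diff(v)_t = rho * v_{t-1} + gamma + delta * t + e_t."""
    v = np.asarray(series, dtype=float)
    dv, lag = np.diff(v), v[:-1]
    X = np.column_stack([lag, np.ones(len(lag)), np.arange(1, len(lag) + 1)])
    coef, *_ = np.linalg.lstsq(X, dv, rcond=None)
    resid = dv - X @ coef
    sigma2 = resid @ resid / (len(dv) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance matrix
    return coef[0] / np.sqrt(cov[0, 0])

def t_bar(panel):
    """Average of city-specific t-statistics (rows of panel are cities)."""
    return float(np.mean([df_t_stat(row) for row in panel]))

rng = np.random.default_rng(0)
n, T = 50, 40
shocks = rng.normal(size=(n, T))
random_walks = np.cumsum(shocks, axis=1)              # b_i = 1 for all i
stationary = np.zeros((n, T))
for t in range(1, T):                                 # b_i = 0.3 for all i
    stationary[:, t] = 0.3 * stationary[:, t - 1] + shocks[:, t]
tbar_rw, tbar_st = t_bar(random_walks), t_bar(stationary)
```

A strongly negative average t-statistic speaks against the unit root null; in the simulated example it is markedly more negative for the stationary panel than for the random walks.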

The estimation results for the IPS test for spatially determined real wages are given in Table 8 in “Appendix 4.” Relaxing the strong alternative \(H_{a}^{\mathrm {HT}}\) and taking \(H_{a}^{\mathrm {IPS}}\) instead suggests that there is a nonzero share of stationary panels of weighted city real wages, i.e., there is spatial sigma-convergence of real wages during both sub-periods, when city-specific time trends are included.

I thus reject the null hypothesis of a unit root and find support for conditional (in terms of the IPS alternative) sigma-convergence in wages for neighboring cities.

5.2 Spatial beta-convergence

As shown by Quah (1993), beta-convergence is a weaker property than sigma-convergence. I have shown in the previous subsection that the strong alternative of spatial sigma-convergence does not hold for real wages in Russian cities. I thus now relax the assumption and use a test of beta-convergence which is based on the cross-sectional Barro regression:

$$\begin{aligned} \ln y_{i,T}=\alpha +(1+T\beta )\ln y_{i,0}+\gamma ^\mathrm{T}\mathbf {X}_i+\varepsilon _{i}, \end{aligned}$$
(7)

where \(y_{i,0}\) and \(y_{i,T}\) are the real wages in the initial period and the final period; where \(\mathbf {X}_i\) is a vector of controls; and where \(\varepsilon _{i}\sim iid(0,\sigma ^2)\) is the error term.
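As a minimal sketch of how the convergence parameter can be read off Eq. (7), the snippet below fits the regression by ordinary least squares on simulated data and recovers \(\beta \) from the coefficient on \(\ln y_{i,0}\), which equals \(1+T\beta \). The function name `barro_beta`, the simulated sample, and the parameter values are illustrative assumptions; they are not the data or the estimator used in this paper.

```python
import numpy as np

def barro_beta(log_y0, log_yT, X, T):
    """OLS fit of Eq. (7); returns (beta, full coefficient vector).

    The coefficient on log_y0 estimates 1 + T*beta, so beta = (coef - 1) / T.
    """
    n = log_y0.shape[0]
    Z = np.column_stack([np.ones(n), log_y0, X])  # intercept, initial wage, controls
    coef, *_ = np.linalg.lstsq(Z, log_yT, rcond=None)
    beta = (coef[1] - 1.0) / T
    return beta, coef

# Hypothetical simulated sample (not the paper's data)
rng = np.random.default_rng(0)
n, T = 500, 9
true_beta = -0.04                       # assumed value, for illustration only
log_y0 = rng.normal(8.0, 0.5, n)
X = rng.normal(size=(n, 2))             # two made-up controls
noise = rng.normal(0.0, 0.05, n)
log_yT = 1.0 + (1 + T * true_beta) * log_y0 + X @ np.array([0.1, -0.2]) + noise

beta_hat, coef_hat = barro_beta(log_y0, log_yT, X, T)
print(f"estimated beta: {beta_hat:.4f}")
```

A negative estimate of \(\beta \) corresponds to a coefficient on the initial wage below one, i.e., cities with higher initial wages grow more slowly.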

I consider two groups of factors as controls: (i) public sector-related variables and (ii) natural resource-related variables. I use these factors to construct proxies that capture their possible effects on real city wages.

Public sector A substantial part of the labor force in Russia is employed in the public sector. According to the Federal State Statistics Service, 42% of the total labor force in Russia was employed in state- and municipality-owned companies in 1995. That share had decreased to 28% by 2013. The consolidated budgets of Russian regions get revenues from the following sources: corporate income taxes, taxes on personal income, property taxes, and gratuitous receipts. The latter source consists mainly of federal non-refundable transfers, i.e., subsidies. The Russian fiscal system redistributes taxes from the federal budget to regional budgets; regions with poor economic performance receive more subsidies than prosperous ones. Ponomareva and Zhuravskaya (2004) considered the reasons for subsidizing firms and noted that all levels of government may have incentives to subsidize firms to secure inefficiently high employment or bribes. The incentives of regional governments to redistribute subsidies efficiently may be weak, because when a region increases its own budget revenues, federal subsidies to the region are reduced.

There are no available city-level data on the labor force and wages in the public/private sectors. In order to control for the public sector in the Barro regression, I use the percentage of non-refundable subsidies from the federal budget in the local regional budget. Annual subject-of-the-federation-level data on subsidies have been available since 2000. I take the average for 2000–2005 and the average for 2006–2013 for the two time periods, and I include a variable Subs defined as \(\ln (1+ \hbox {average subsidies})\) in the Barro regression.

Natural resources The share of natural resource rents in Russian gross domestic product is large. According to the World Bank Open Data, it was 9.83% in 1996 and 19.35% in 2005.Footnote 8 The share thus increased during 1996–2005, whereas the 2006–2013 period was characterized by a decline in the percentage of total natural resource rents in GDP: it decreased from 19.42% in 2006 to 13.73% in 2013. In order to capture the possible impact of natural resource rents on real wages, I include a variable Emp computed as \(\ln \)(\(1+\) the share of labor employed in the mining sector in total city labor) in the Barro regression. City-level data on labor in the mining sector have been available since 2005. I take the average over 2006–2013. In order to alleviate endogeneity concerns, I instrument the share of city labor in mining in 2005 using the share of mining in GRP for the previous year, 2004.

Spatial model specification The standard approach to studying regional growth involves constructing control variables describing, among others, economic diversity and specialization, measures of physical and human capital, market potential and the geographical characteristics of a spatial unit. Most such data are not available for Russian cities. Hence, I face a problem of omitted variables in the regression. This problem is intrinsic to models run on Russian city-level data. LeSage and Fischer (2008) argued that spatial Durbin models are a natural choice over competing alternatives of spatial models in the case of missing explanatory variables.

Following LeSage and Fischer (2008), I include a spatial lag of the dependent variable on the right-hand side of the Barro regression in order to account for missing explanatory variables. I also include a spatial lag of the initial wage values. My baseline spatial regression of beta-convergence of real city wages in Russia is a spatial Durbin model (SDM) with a “Far North–non-Far North” interaction term:

$$\begin{aligned} \mathbf {y}_{T}= & {} \alpha \mathbf {1}+\left( 1+T\beta ^{nonFN}\right) \mathbf {y}_{0}^{nonFN}+\left( 1+T\beta ^{FN}\right) \mathbf {y}_{0}^{FN} \nonumber \\&+\,\gamma ^{nonFN}\mathbf {X}^{nonFN}+\gamma ^{FN}\mathbf {X}^{FN}+c\cdot \mathbf {dnorth}\nonumber \\&+\,\rho \mathbf {W}\mathbf {y}_{T}+\delta ^{nonFN}\mathbf {W}\mathbf {y}_{0}^{nonFN}+\delta ^{FN}\mathbf {W}\mathbf {y}_{0}^{FN}\nonumber \\&+\,\mathbf {W}\mathbf {X}^{nonFN}\mathbf {\theta }^{nonFN}+\mathbf {W}\mathbf {X}^{FN}\mathbf {\theta }^{FN}+\mathbf {\varepsilon }_{T}, \end{aligned}$$
(8)

where \(\mathbf {y}\) is the \(n\times 1\) vector of log real wages; T and 0 denote the final year and the initial year, respectively; \(\mathbf {1}\) is the \(n\times 1\) vector of ones; \(\beta ^{nonFN}\) and \(\beta ^{FN}\) are the convergence parameters; \(\mathbf {dnorth}\) is an \(n\times 1\) vector of dummy variables indicating whether city i is located in the Far North or not; \(\mathbf {X}\) is a matrix of control variables; \(\mathbf {W}\) is an \(n\times n\) weight matrix built from inverse great circle distances with cutoffs C (the number of cities n depends on C; see Table 1 and Fig. 3); \(\rho \) is the spatial autoregressive parameter; \(\mathbf {\theta }\) is the vector of the parameters of the spatially lagged control variables; and \(\varepsilon _{T}\sim N(0,\sigma ^{2}\mathbf {I}_{n})\) is the error term. In what follows, superscripts FN and nonFN refer to the FN and non-FN datasets, respectively.
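To make the construction of \(\mathbf {W}\) concrete, the following sketch builds a row-standardized inverse great-circle-distance matrix with a cutoff C. It is an illustration under stated assumptions rather than the code used for the estimates: the haversine formula, the Earth radius of 6371 km, the function names, and the three approximate city coordinates are all my own choices for the example. Cities with no neighbor within the cutoff are left with a zero row here; in an application they would have to be handled separately, which may be one reason why n depends on C.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def great_circle_km(lat, lon):
    """Pairwise haversine distances (km) from latitudes/longitudes in degrees."""
    lat = np.radians(lat)
    lon = np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def inverse_distance_weights(lat, lon, cutoff_km):
    """Row-standardized inverse-distance weights, zero beyond the cutoff."""
    d = great_circle_km(lat, lon)
    W = np.zeros_like(d)
    # Exclude the diagonal (d == 0) and pairs farther apart than the cutoff
    mask = (d > 0) & (d <= cutoff_km)
    W[mask] = 1.0 / d[mask]
    row_sums = W.sum(axis=1, keepdims=True)
    # Rows without neighbors stay at zero instead of being divided by zero
    np.divide(W, row_sums, out=W, where=row_sums > 0)
    return W

# Approximate coordinates of Moscow, St. Petersburg, Novosibirsk (illustrative)
lat = np.array([55.75, 59.94, 55.03])
lon = np.array([37.62, 30.31, 82.92])
W = inverse_distance_weights(lat, lon, cutoff_km=1000.0)
print(W)
```

With a 1000-km cutoff, the first two cities are each other's only neighbor, while the third has no neighbor in this toy sample.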

Equation (8) captures spatial general-equilibrium effects in a simple way: changes in one observation drive a series of changes in all regions in the sample until a new long-run steady-state equilibrium arises. For example, high wages in city i may attract migrants from neighboring cities j and k, so the employment rate in i will change. Employers may offer lower wages in i, while the employment rate will decrease in cities j and k, which may lead to higher salaries there. Neighbors of j and k may react to new wages in a similar way, which eventually generates a global spatial feedback effect.

5.3 Estimates and interpretations

Because of the presence of the spatial lag of the dependent variable, the coefficients in the SDM (8) cannot be directly interpreted as marginal effects which describe the magnitude of changes in the dependent variable that arise from changes in the explanatory variables. A change in an explanatory variable in city i will result in a direct impact on the wages of city i in the final year, and an indirect impact arising from spatial relationships with cities \(j\ne i\). The averaged sum of such impacts for each explanatory variable is called the average direct effect of the explanatory variable (LeSage and Pace 2009).
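The scalar summary measures of LeSage and Pace (2009) can be sketched as follows for a single explanatory variable with own coefficient `beta_k` and spatial-lag coefficient `theta_k`: the matrix of partial derivatives is \((\mathbf {I}-\rho \mathbf {W})^{-1}(\beta _{k}\mathbf {I}+\theta _{k}\mathbf {W})\), its average diagonal element is the average direct effect, and its average row sum is the average total effect. The three-city weight matrix and the parameter values below are hypothetical and serve only to illustrate the computation; they are not estimates from this paper.

```python
import numpy as np

def average_effects(W, rho, beta_k, theta_k):
    """Average direct, indirect and total effects of one regressor in an SDM."""
    n = W.shape[0]
    I = np.eye(n)
    # S = (I - rho*W)^{-1} (beta_k*I + theta_k*W), computed without an explicit inverse
    S = np.linalg.solve(I - rho * W, beta_k * I + theta_k * W)
    direct = np.trace(S) / n   # own-city impact, including feedback via neighbors
    total = S.sum() / n        # average row sum
    return direct, total - direct, total

# Hypothetical example: three cities, each weighting the other two equally
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
direct, indirect, total = average_effects(W, rho=0.5, beta_k=0.6, theta_k=0.1)
print(direct, indirect, total)
```

In this toy example, the direct effect exceeds `beta_k` because part of the impact returns to city i through its neighbors.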

When the wages in city i increase, i.e., when the population of city i becomes richer, then, according to the beta-convergence hypothesis, the wages in city i will grow more slowly. The direct impact of spatially close cities is positive and less than 1.

Estimates of the SDM with different spatial matrices are summarized in Tables 2 and 3. I use Bayesian estimation methods (see “Appendix 5” for a detailed description of my estimation strategy), and the spatial weights are row-standardized inverse great circle distances with cutoffs. I estimate Eq. (8) with different distance cutoffs: 200–1000 km with 100-km steps, and 1000–8000 km with 1000-km steps. In order to interpret my estimation results, I do not report coefficients for explanatory variables but provide their average direct effects.

As shown in Table 2, the spatial parameter \(\rho \) for the SDM with different spatial matrices introduced above is positive and highly significant. Therefore, the real city wage is related to the real wages of neighboring cities after conditioning on the effect of initial real wages \(y_{0}\) and various controls. The values of \(\rho \) increase with the cutoff distance, C.

Table 2 Estimates of SDM for the period 1996–2005

The direct effects of the initial wages represent the impact of this variable on wages of the final year, both immediate (own effect) and mediated (feedback effects via neighbors). For the first time period, the direct effects of initial wages vary from 0.577 to 0.609 for non-FN cities and from 0.596 to 0.838 for cities of the Far North (see Table 2). For the second time period, they vary from 0.602 to 0.661 for non-FN cities, and from 0.691 to 0.838 for the FN cities (see Table 3).

One of the main results of the SDM is that the direct effect of initial wages on final wages is stronger for FN cities than for non-FN cities. This holds for both time periods, as shown in Tables 2 and 3. Furthermore, the values of the direct effects of initial wages on real wages in the final year in non-FN cities are fairly robust to the choice of C, which is consistent with LeSage and Pace (2009). A tendency for the stabilization of direct effect estimates starts from about \(C=400\) km, when \(n<997\), i.e., when some remote settlements are excluded. The differences between the direct effects of the initial real wages in the Far North decrease in the case of long distances.

Table 3 Estimates of SDM for the period 2006–2013

The direct effects of regional subsidies on real wages in non-FN cities are significantly negative during both time periods. This captures the effect of federal non-refundable transfers to regional budgets: regions that are able to grow faster do not need federal support. My estimates of the SDM do not provide evidence of a significant impact of federal subsidies on wages in FN cities. A possible reason is that the policy of the “Northern” benefits creates inertia and slows down convergence in wages. This result echoes Sardadvar and Vakulenko (2017), who find that “Eastern regions suffer from circumstances which are difficult or even impossible to change by regional policies”.

Finally, the direct effects of employment in mining are significantly positive only for the non-FN cities during 1996–2005, when the share of natural resource rents in GDP was growing. This finding is in line with Solanko (2008), who studied regional income growth and used the share of extractive industries in GRP as a proxy for natural resources. The impact of employment in mining on real wages is no longer significant for 2006–2013, when the share of the mining sector in GDP was decreasing.

To sum up, estimates of the spatial Durbin model of the Barro regression are consistent with the expected spatial impact of initial wages on final real wages. The parameters for spatial lags of both initial and final wages are significant and quantitatively large.

5.4 Model comparison

My estimates of the SDM provided in Tables 2 and 3 depend on cutoff distances. Starting from \(C=900\) km, the estimates are based on the same number of cities. Bayesian posterior model probabilities are shown in Table 4. These probabilities have been calculated using information stored during the sampling of the log-marginal density vectors for each model (i.e., for each cutoff distance). They can be used for the purpose of model comparison.
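The step from log-marginal likelihoods to posterior model probabilities can be illustrated with a short sketch. It assumes that each candidate model (each cutoff distance) has already been reduced to a single log-marginal likelihood value and that prior model probabilities are equal unless specified otherwise; the function name and the three input values are hypothetical and do not reproduce Table 4.

```python
import numpy as np

def posterior_model_probs(log_marginals, log_priors=None):
    """Posterior model probabilities from log-marginal likelihoods.

    Equal prior probabilities are assumed when log_priors is None.
    """
    lm = np.asarray(log_marginals, dtype=float)
    lp = np.zeros_like(lm) if log_priors is None else np.asarray(log_priors, dtype=float)
    z = lm + lp
    z = z - z.max()          # subtract the maximum to avoid underflow in exp
    w = np.exp(z)
    return w / w.sum()

# Hypothetical log-marginal likelihoods for three cutoff distances
probs = posterior_model_probs([-1000.0, -1002.0, -1001.0])
print(probs)
```

Subtracting the maximum before exponentiating leaves the probabilities unchanged but keeps the computation stable when the log-marginal likelihoods are large in absolute value.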

The highest posterior probability for the regression using the first time period corresponds to the case of \(C=8000\) km, i.e., it captures spatial effects using all cities. For the second time period, the radius of spatial interaction is much smaller at \(C=1000\) km. It should be stressed that this does not mean restricting the spatial impacts only to cities located within 1000 km or geographically closer. The spatial matrix is connected, and in the case of the SDM, the effect can arise from neighbors of neighbors, with corresponding reduced spatial weights.
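The role of neighbors of neighbors can be seen from the expansion \((\mathbf {I}-\rho \mathbf {W})^{-1}=\mathbf {I}+\rho \mathbf {W}+\rho ^{2}\mathbf {W}^{2}+\cdots \), which holds for a row-standardized \(\mathbf {W}\) and \(|\rho |<1\). The sketch below uses a hypothetical chain of four cities in which only adjacent cities are within the cutoff, and an assumed value \(\rho =0.6\); neither comes from the estimates in this paper.

```python
import numpy as np

# Hypothetical chain 1 - 2 - 3 - 4: only adjacent cities are neighbors
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = A / A.sum(axis=1, keepdims=True)   # row-standardized weights
rho = 0.6                              # assumed spatial autoregressive parameter

# Spatial multiplier and its truncated power-series approximation
multiplier = np.linalg.inv(np.eye(4) - rho * W)
series = sum(np.linalg.matrix_power(rho * W, k) for k in range(200))

print("direct weight between cities 1 and 4:", W[0, 3])
print("multiplier entry for cities 1 and 4:", multiplier[0, 3])
```

Although cities 1 and 4 have zero weight in \(\mathbf {W}\), the corresponding entry of the multiplier is positive, because the impact travels through cities 2 and 3 with geometrically declining weights.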

Table 4 Posterior model probabilities for SDM

Speed of spatial convergence The dependence of the outcome (\(y_{t+\tau }\)) on its initial level (\(y_{0}\)) disappears for large \(\tau \). The time required for this is usually analyzed based on the so-called half-life to convergence: \(HL=\tau \cdot T\), where the coefficient \(\tau \) is the number of T-year-long periods it takes to reach the half-way point on the path to the new steady-state equilibrium, \(\tau =-\ln 2/\ln (1+\beta )\). As noted, the coefficients of the SDM cannot be interpreted as a measure of a regressor’s impact on the dependent variable. For this purpose, I use estimates of the direct effects of initial wages instead of \(1+T\beta \), compute a spatial analogue of \(\beta \) and substitute it into the formula for \(\tau \) and HL for both parts of the dataset. See Tables 2 and 3 for values of \(\tau \).
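The computation described above can be sketched in a few lines. The sketch assumes that the spatial analogue of \(\beta \) is obtained by equating the direct effect of initial wages with \(1+T\beta \), so that \(\beta =(\text {direct effect}-1)/T\); the function name and the input value 0.6 are illustrative and are not taken from a specific column of Tables 2 or 3.

```python
import math

def half_life(direct_effect, T):
    """Return (beta, tau, HL) for a direct effect standing in for 1 + T*beta."""
    beta = (direct_effect - 1.0) / T
    if not -1.0 < beta < 0.0:
        raise ValueError("half-life is defined only for -1 < beta < 0")
    tau = -math.log(2.0) / math.log(1.0 + beta)
    return beta, tau, tau * T   # HL = tau * T, as defined in the text

# Hypothetical direct effect of initial wages over a 9-year period
beta, tau, hl = half_life(direct_effect=0.6, T=9)
print(f"beta = {beta:.4f}, tau = {tau:.1f}")
```

By construction \((1+\beta )^{\tau }=1/2\), and a direct effect closer to one yields a larger \(\tau \), i.e., slower convergence.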

Choosing the best model in terms of posterior probabilities (column 10 of Table 2), I find that the half-life to convergence is 14.4 rounds of 9 years for the non-FN cities, and 19.9 rounds of 9 years for FN cities. The best model for the second time period corresponds to \(C=1000\) km (column 6 of Table 3), which implies 12 rounds of 7 years for the non-FN cities and 15.4 rounds for FN cities. Hence, my estimation results reveal that real wages in the Far North converge more slowly than those in the rest of the country during both periods.

During the first period, commodity prices were increasing rapidly. The commodity boom reached its peak in 2007, after which commodity prices fell sharply (Åslund 2013). During boom periods, commodity bases seem to drive divergence (we observe a slower convergence speed during the first time span), but that effect disappears in times of commodity downturns.

In the presence of amenity differentials, one can never see full convergence. Therefore, it is hard to state when convergence will be achieved, since there is no information on which equilibrium real wage differences are compatible with spatial equilibrium.

6 Conclusion

This paper tests the convergence in real wages across Russian cities and studies the impact of spatial proximity on the speed of convergence. I find conditional sigma-convergence of real wages in spatially close cities and show that this result is robust to different cutoff distances and the exclusion of remote towns. I also show that there is spatial beta-convergence of real city wages. The impact of the public sector, the share of which is still large in Russia, on real wage growth is negative. The impact of natural resources on real wage growth was significantly positive during 1996–2005, and this finding is consistent with earlier studies. However, this effect disappeared after 2005, when the share of extractive industries in GDP was decreasing. I also find that settlements of the Far North with a special state wage policy were converging more slowly than the non-Far North cities, including western cities.

A final comment is in order. Little work has been done on city-level income convergence in the Russian economy until now. This is arguably due to poor data quality, which precludes the construction of many required control variables. Using location patterns as a source of additional information, and applying spatial econometric techniques, yields a way to alleviate some of these difficulties.