Introduction

India’s growth story in the last two decades has been watched with great interest in international academic and policy fora. India constitutes more than 17% of the world’s population; its growth performance has been important in shaping the evolution of the world distribution of income because of its large demographic size and its dramatic changes in income distribution (Bourguignon and Morrisson 2002; Panagariya et al. 2013). Its federal structure and a socially diverse population with multiple religions, languages, castes and cultures have added to inter-state socio -economic differentiation. Wide regional variation in the growth performance of the states in India has kept alive the research interest in this area.

One of the key questions in empirical growth economics that is relevant to India currently is whether rich regions will remain rich and the poor remain poor in an era of liberal market forces. The theoretical expectation is that the initially poor ones will grow faster and catch up with the rich (Barro 1991; Baumol 1986; Sala-i-Martin 1996) and in the long run there would be a convergence of growth rates due to a transfer of technology and factors of production (Solow 1956). The study of economic geography had a marginal role in mainstream economics, but it has now been well emphasized that the features of real world economies are connected with location issues (Cheshire and Malecki 2004; Venables 2010). This notion of economic convergence is connected to the ‘New Economic Geography’ literature that tries to explain the formation of a large variety of economic concentration (agglomeration) in geographical space (Fujita et al. 1999). Clustering forces play an important role in an uneven distribution of economic activity and income across space—examples of the same could be formation of cities, emergence of industrial regions and the existence of regional disparities within nations as well as origins of international inequalities (Borts and Stein 1962).

While the earlier empirical literature focused on developed regions of the world, there has been an increasing interest on India (Kotwal et al. 2011). We add to this literature on India by bringing in neighbourhood effect (space) as a determinant of growth apart from the region’s own initial income using panel data across 28 states from 1981 to 2010.

The rest of this paper is organised as follows: in “Background” section we provide the background to this study, followed by empirical evidence on convergence and spatial dependence in “Convergence and spatial dependence” section. “Data and methodology” section describes the Data and Method of spatial data analysis. We present our empirical findings in “Results” section and the paper concludes with “Conclusion” section .

Background

India’s post-independence era was characterised as a closed economy till the mid-1980s (Basu and Maertens 2007), a semi-open economy from the mid-1980s and a more rapidly liberalising economy from the early 1990s (Cashin and Sahay 1996; Ghosh et al. 1998; Kalra and Thakur 2015). It was expected that liberalisation would help all the states and regions benefit from the market-oriented reforms (Ahluwalia 2000; Ghosh et al. 1998). Contrarily, several studies find that there has been divergence in the growth rates and this has been more prominent in the post-reforms period (Cherodian and Thirlwall 2015; Dholakia 1994; Ghate 2008; Ghosh et al. 1998; Kurian 2000; Sachs et al. 2002).

In addition, both fast-growing and slow-growing states are seen to cluster together (Bandyopadhyay 2011; Kar et al. 2011; Lolayekar and Mukhopadhyay 2017). Despite the recognition of this clustering, as predicted by the New Economic Geography school (Krugman 1991), spatial dependence among the states in India is only beginning to be researched rigorously. Kalra and Thakur (2015) provide a descriptive analysis on the role of geography in Indian regional growth but do not test the convergence hypothesis with any explicit econometric model. Since the current economic growth in India is service sector driven, it is not surprising that there is strong spatial clustering of this sector (Desmet et al. 2015). There is also a contrarian view: states which form clusters are not necessarily neighbours (Bandyopadhyay 2012). This uses the distributional dynamics approach popularised by Quah (1997) but has not relied on an econometric model.

Some earlier writing innovates on the idea of space and looks beyond geographical distance as a determinant (Kocornik-Mina 2009). Proximity is explained by nearness of policies observed through sectoral shares using a prey–predator model (Arbia and Paelinck 2003). Their earlier writing provides new insights, but does not explain what drives the neighbourliness in the outcomes. In a similar genre, Sofi and Durai (2015) examine pair-wise income gaps to understand the process of convergence but also back it up with an econometric analysis. By relying on convergence of sectoral shares or pair-wise income gaps we may be missing some critical characteristics of a country the scale and diversity of India. Geographical distance, language and cultural homogeneity are important in explaining trade between regions and gravity models have provided a rich literature studying this in the context of international trade (Anderson 2011). Geographical distances between states can be significant in India and therefore should not be under-estimated.

In addition to pair-wise analysis, Sofi and Durai (2015) also employ spatial interaction models—the Spatial Autoregressive Model (SAR) and Spatial Error Model (SEM). However, the use of geographic proximity to spatial interactions is misleading and the results are sensitive to the choice of weight matrix in defining spatial interaction. The share of the states in club convergence was found to be higher for the states which do not actually share borders or are geographically close. The classical panel fixed effects models are able to overcome the problems of individual heterogeneity and omitted variables but do not control for spatial dependence. This makes spatial panel data models the desirable econometric framework in the presence of spatial dependence. We extend the existing literature on spatial analysis in India by providing the first estimates of convergence rates from spatial panel data models. Our results provide more efficient and unbiased estimates of convergence rates in India.

In the next section, we discuss the frameworks used in measuring convergence and spatial dependence.

Convergence and spatial dependence

Growth theory suggests that if regions have unequal incomes to start with then they will experience unequal growth rates in the short run but will converge towards a common steady state rate of growth in the long run. Solow (1956) provided a formal model which explains this negative relation between the initial income per capita and the growth rates. The convergence hypothesis is based on the standard neo-classical production function which encounters diminishing returns to reproducible capital. Poor countries or regions with low ratios of capital to labour have a higher marginal product of capital. This is expected to attract greater investment resulting in a higher growth rate. The economy experiences growth in the capital stock and level of output along the transition path to the steady state level (Borts and Stein 1962). The equilibrium steady state income level is determined by the rate of technological progress. Convergence is the process wherein poorer regions grow faster than the rich ones. Two measures of convergence that are commonly discussed in the literature are:

  1. (a)

    β convergence—occurs when poor regions catch up with the rich ones by growing faster than the richer regions. The higher the absolute value of “β”, the quicker the convergence or divergence process.

  2. (b)

    σ convergence—occurs when there is a decline in regional dispersion of Per Capita Income (PCI) over time (see sections O1 and O2 in Online Resource).

Early writing on β convergence focused on a strong notion of convergence called “absolute” or “unconditional” convergence assuming that parameters like the savings rate, technological progress, depreciation and the rate of growth of population is same across the regions. In reality, it is unlikely that these assumptions will be fulfilled and parameters will be same across regions. This problem was overcome by allowing for the notion of conditional convergence, where all regions need not converge to one common steady state but there could be different steady state levels.

The σ-convergence hypothesis assumes that there is a one-time shock to the cross-section of economies in the initial period. Thereafter, the economies move towards their steady state following a smooth and monotonic path.

Solow’s (1956) theory of convergence had to wait empirical examination till Baumol’s (1986) convergence test for a small group of nations. This was followed by Barro (1991), Barro and Sala-i-Martin (1992), Barro et al. (1991) and Sala-i-Martin (1996), among others. A large literature has emerged on the idea of unconditional convergence (where economies converge to a common steady state rate of growth) and conditional convergence (where economies reach different steady states).

Different empirical strategies have been employed in the literature. Some have studied a small number of countries over a large number of years (Maddison 1983), while others have used a large number of countries over shorter periods of time (Barro 1991; Barro and Sala-i-Martin 1992; Islam 1995). Some have also undertaken intra-country (state-level) analysis (Evans and Karras 1996; Kanbur and Zhang 2005). There is evidence of regional convergence over long periods—100 years for states in the United States of America and over 60 years for Japanese prefectures—and also over shorter sub-periods within the same sample (Barro et al. 1991; Barro and Sala-i-Martin 1992; Sala-i-Martin 1996). Developing countries, however, have not exhibited growth convergence (Kanbur and Zhang 2005). When countries interact with each other through channels of trade, technology, capital movements, common economic and political policies, externalities may spill over among the countries. These spill over effects could explain the growth of the nations (Ades and Chua 1997; Krugman 1991; Ramirez and Loboguerrero 2005).

Regional convergence across the states in India has been explored by a number of studies. The empirical methods adopted have included time series analysis, panel regressions as well as non-parametric techniques like transition matrix and kernel density analysis. While some found evidence of convergence in PCI growth rates (Bajpai and Sachs 1996; Cashin and Sahay 1996; Dholakia 1994), others found evidence of regional divergence, especially in the post-reforms periods (Ahluwalia 2000; Dasgupta et al. 2000; Ghosh et al. 1998; Kurian 2000; Mitra and Marjit 1996; Rao and Singh 2001). These studies have viewed the states in India as independent entities and the possibility of dependence among them has been ignored (Sanga and Shaban 2017).

The significant role of spatial effects in convergence processes is widely acknowledged in the literature (Anselin 1988). Ignoring neighbourhood effects could lead to serious bias and inefficiency in the estimation of the convergence rate (Arbia et al. 2005; Getis 2008). Spatial effects could be of two types:

  1. 1.

    Spatial dependence (Spatial Autocorrelation) occurs when variables of one region depend on (or are correlated to) values observed in neighbouring regions.

  2. 2.

    Spatial heterogeneity is the variation in relationships across space.

Ramirez and Loboguerrero (2005) found strong evidence of spatial inter-dependence across 98 countries over 1965–1995. Using different specifications as well as different measures of proximity for 93 countries over 1965–1989, Moreno and Trehan (1997) found that demand and technology spill-overs from neighbouring countries strongly influence a country’s growth. In addition to the country-level analysis, many contributions have examined spatial dependence at the regional or sub-national levels too (Baumont et al. 2006; Elias and Rey 2011; Ertur et al. 2007; Fischer and Stumpner 2008; Khomiakova 2008; Magalhães et al. 2005; Patacchini and Rice 2007; Rey and Montouri 1999).

More recently, spatial panel data models have been developed with increasing access to larger data sets for different spatial units over time (Arbia et al. 2005; Chatterjee 2017). The panel data models with greater degrees of freedom, more variation and less amount of collinearity among the variables have more efficiency in estimation (Elhorst 2014). However, they do not control for spatial dependence. This makes spatial panel data models as the desirable econometric framework in the presence of spatial dependence.

In the next section, we discuss our data and the econometric model used to test for spatial impacts.

Data and methodology

We estimate regional income convergence rates in India for 28 regions for the period 1981–1982 (hereafter 1981) to 2010–2011 (hereafter 2010) controlling for spatial dependence in panel data models. We use the Exploratory Spatial Data Analysis (ESDA) to test for spatial effects in regional convergence. Our analysis has relied on three software packages for the analysis: Geoda (1.4.6), Stata (v12) and QGIS (v 2.0.1). The period 1981–2010 saw a reorganisation of states. In 2000, three states—Chhattisgarh, Jharkhand and Uttaranchal (now called Uttarakhand)—were carved out of the existing states of Madhya Pradesh, Bihar and Uttar Pradesh, respectively. In our analysis, these newer states were combined with their parent states to facilitate the comparative analysis.

Data

The primary variable of interest to us is the Per Capita Net State Domestic Product (PCNSDP) (the terms PCNSDP and PCI are used interchangeably in this paper). Macroeconomic data in India is provided by a number of organisations. The data is released officially both by the Central Statistical Office (of the Government of India) and the Reserve Bank of India. However, they do not provide comparable (common base year) long term current and constant price series. We use the PCI series from the Economic and Political Weekly Research Foundation (EPWRF) as it provides a common base year current price series. State-level income data prior to 1981 is available only for the major states, not for all the states and Union Territories (UT). The PCI current price series from the EPWRF is converted to a constant price series in this paper after controlling for price variability over time by using a price deflator (Dornbusch et al. 2002).The deflator was obtained by dividing India’s Net Domestic Product (NDP) at current prices by NDP at constant prices (base 2004–2005 prices). We use data up to 2010 as it allows us to take 5-year averages (since PCI up to 2015 is not available yet). A panel was formed by splitting the time-period of 30 years into six different 5-year sub-periods (that is, 1981–1985, 1986–1990, 1991–1995, 1996–2000, 2001–2005 and 2006–2010).

In each sub-period, the initial value of PCI was measured at the beginning of each 5-year period in the panel. So, for example, the growth equation for 1981–1985 would use the PCI of 1981 as an explanatory variable.

Methodology

Growth empirics typically uses a linear regression model

$$Y_{i} = \alpha_{i} + \beta X_{i} + u_{i}$$
(1)

where the dependent variable is PCI growth rate “\(Y_{i}\)”, with the initial PCI level “\(X_{i}\)” being the explanatory variable in region “i”, α and β are parameters to be estimated, and \(u_{i}\) is the error term (Eq. 1) (also see section O1 in Online Resource). As discussed earlier, ordinary least square (OLS) estimation is not a suitable method when we anticipate spatial dependence between the observations (Anselin 1988).

We start with a non-spatial regression model (like Eq. 1) and then test whether or not the model needs to be extended with spatial interaction effects (Anselin 1988; Elhorst 2014). Normally, there are three kinds of interaction effects that are found:

  1. (a)

    Endogenous interaction effects: when the dependent variable of a particular unit, say, “i”, depends on the dependent variable of other units, say, “j”, and vice versa.

  2. (b)

    Exogenous interaction effects: when the dependent variable of a particular unit “i” depends on independent explanatory variables of unit “i” and its neighbouring unit, “j”.

  3. (c)

    Interaction effects among the error terms: when the error terms in the model are spatially autocorrelated.

The expectation is that if two units (i, j) are in proximity they will influence each other a lot more than units which are located further away. This relation is captured through a weight matrix which we discuss next.

Spatial weight matrix

Spatial dependence is quantified through the Spatial Weight Matrix (SWM) such as:

$${\text{W}}\, = \,[ W_{ij} ],$$

where i and j = 1,…n, which incorporates the spatial relationship among the “n” observation units that are considered as neighbours (see section O3 in Online Resource).

In the SWM, the cells in “\(W_{ij}\)” have different values depending on whether we use the notion of neighbourhood based on contiguity or distance.

Contiguity matrix

A normalised-contiguity matrix is constructed from the boundary information in a coordinates’ dataset of geospatial data. In an SWM matrix, contiguous units (neighbours) are assigned the weights of 1, while non-contiguous units are assigned weights of 0. Contiguity can be further defined either as Queen, Bishop or Rook, and they could be of first order or higher (second or greater) order. We use rook contiguity of the first order—spatial units sharing a common border are considered first order rook contiguous (see Fig. 1). This is a stronger condition that avoids the situation of a single shared boundary point being counted as neighbour.

Fig. 1
figure 1

Distribution of states with first order contiguity

In the histogram in Fig. 1 (left-hand side) we present the distribution of states and their neighbours. It provides a count of the number of links (neighbours) that the 28 spatial units have. Diagnostic tests of the contiguity matrix for India show that we have two states—Madhya Pradesh and Assam—with seven neighbours, while Andaman and Nicobar being an island has no neighbour (zero link). The colour scheme on the India map (right-hand side of Fig. 1) shows the geography of total links.

The shortcoming with using the contiguity option is that only neighbouring units are taken into account. However, sometimes deeper knowledge about distance relationships is important (Anselin 1988). We have, therefore, also used the distance-based matrix for our analysis, which we discuss next.

Inverse distance matrix

An inverse-distance spatial-weight matrix uses the inverse of the distances between the units to generate the cell values of the matrix (see section O4 in Online Resource). This helps us to examine if the distance has any neighbourhood impacts.

These distances between geospatial units are computed from the latitudes and longitudes of the unit’s centroids. We find, following Drukker et al. (2013) that in India the centroids of the two closest states lie within 97 km of each other (that is, 1/0.0102623), while the two most distant states are 300.7 km apart (that is, 1/0.003325) (see Table 5 in Online Resource).

We now proceed to use ESDA to check for the presence of spatial heterogeneity and autocorrelation.

Exploratory spatial data analysis

The tests commonly used for detecting spatial autocorrelation are the Global Moran’s “I” and Local Moran’s “I” (also called the LISA—Local Indicators of Spatial Autocorrelation) tests (see section O5 in Online Resource).

The Global Moran’s “I” test statistic checks for the presence of global spatial dependence among observation units. Spatial dependence is confirmed if the correlation statistic “I” is significant.

The Local Moran’s “I” test statistic, on the other hand, is computed for each location of clusters and spatial outliers to identify the locations contributing most to the overall pattern of spatial clustering (Pisati 2001).

We can visualise the type and strength of spatial autocorrelation with the Moran scatter plot. The p values of Local Moran’s “I” statistic may be regarded as an approximate indicator of statistical significance (Anselin 1988). However, even when spatial autocorrelation statistics indicate a significant pattern of spatial clustering, it is only the first step in the analysis. The next step would be to model the relationship across the spatial units for the different interaction effects.

Spatial dependence models with cross-section and panel data

Four kinds of spatial models are commonly used for cross-section as well for panel data analysis (LeSage and Pace 2009):

  1. (a)

    The Spatial Lag Model or the Spatial Autoregressive Model (SAR): contains endogenous interaction effects.

  2. (b)

    The Spatial Error Model (SEM): considers the interaction effects among the error terms.

  3. (c)

    The Spatial Autocorrelation Model (SAC): considers endogenous interaction effects and the error interaction effects together.

  4. (d)

    The Spatial Durbin Model (SDM): includes both endogenous and exogenous interaction effects.

The four models differ in the way space is expected to influence the dependent variable. The SAR and SEM models pick up local interactions. While the SAR model focuses on the dependent variables of two regions, the SEM model is better suited when the spatial dependence is caused by a shock or a random aberration not picked up by any explanatory variable (Sofi and Durai 2015).

When all types of spatial interactions are considered in a cross-section model it is referred to as the General Nesting Spatial (GNS) model. The cross-section of “n” observations in the Eq. (1) can be extended for a panel of “n” observations over numerous time periods “T”, by adding a subscript “t” to all the variables and the error term in the model.

The simple panel model can be combined with the spatial cross-section model to create a spatial panel model to test for the presence of spatial dependence (Eqs. 24). We account for spatial dependence in the GNS panel model by extending Eq. 1 in the following way (Elhorst 2014):

$$Y_{it} = \alpha_{it} + \beta X_{it - 1} + \rho WY_{it - 1} + \mu_{i} + \eta_{t} + \theta WX_{it - 1} + u_{it} ,$$
(2)

where

$$u_{it} = \lambda Wu + e_{it} ,$$
(3)

implying

$$Y_{it} = \alpha_{it} + \beta X_{it - 1} + \rho WY_{it - 1} + \mu_{i} + \eta_{t} + \theta WX_{it - 1} + \lambda Wu + e_{it}$$
(4)

where “i = 1…n” denotes regions and “t = 1…T” denotes time periods and \(e_{it}\) is the random error term. In the context of this paper the dependent variable “\(Y_{it}\)” is the PCI growth rate and “\(X_{it - 1}\)” is the initial value of PCI in region “i” at time “t − 1”. In the above equations, the intercept “\(\mu_{i}\)” considers the omitted variables which are specific to each spatial unit, and “\(\eta_{t}\)” represents time specific effects. These spatial and time specific effects can be treated as fixed effects or random effects. When unobserved heterogeneity is correlated with the independent variables we use fixed effect to eliminate omitted variables bias. However, if unobserved heterogeneity is uncorrelated with independent variables, the random effects model would overcome the problem of serial correlation in the panel data model. For each spatial unit and for each time period, a dummy variable is introduced for fixed effect model, while for random effects model “\(\mu_{i}\)” and “\(\eta_{t}\)” are treated as random variables. Further, “\(\mu_{i}\)”, “\(\eta_{t}\)” and “uit” are assumed to be independent of each other. In our estimation later we use the Maximum Likelihood Estimator (MLE) technique for estimating the parameters (see sections O6 and O7 in Online Resource).

The QGIS software allows the PCI data to be combined with the spatial data given in the shape file. The spatial component is introduced by using a shape file in QGIS that includes geographic attributes data such as names and identity codes for each state.

We now proceed to present the results of our empirical analysis.

Results

To visually examine spatial variations in PCI over a time span of three decades (1981–2010), two choropleth maps were created using QGIS for 1981 and 2010 for 25 states and three UTs (see Fig. 2).

Fig. 2
figure 2

PCI among Indian states (1981, left and 2010, right)

The states and UTs were divided into four groups depending on their relative PCI—green-coloured states are the ones with highest relative PCI and red states have the lowest relative PCI. States with income levels below the highest PCI group are coloured in blue, followed by the orange-coloured states which are a little above the lowest PCI group. A comparison of these groups, in 1981 and 2010, suggests that central India which was in red in 1981, continues to be in red in 2010 (with the exception of Rajasthan). In 2010, many of the north-eastern states which were in orange in 1981 have joined the red group with the exception of Sikkim. Jammu and Kashmir too has joined the lowest PCI group in 2010. The geographic clustering of states by PCI is suggestive of spatial effects in the growth process in India and we use the global and local Moran’s “I” to confirm this before using an explicit spatial regression model.

Moran’s “I” statistics

The results of Moran’s “I” statistic for global spatial autocorrelation for PCI (1981 and 2010), as well as for real PCI growth (from 1981 to 2010: Growth rate 8110) are reported below (Table 1). The values of Moran’s “I” for the both contiguity and distance-based matrices show significant degrees of spatial dependence.

Table 1 Moran’s “I” global spatial autocorrelation statistic for Indian states

There is a strong positive spatial dependence in the PCI (for both 1981 and 2010) and growth (1981–2010) whether we use the contiguity measure or the distance measure. This can be observed in Moran’s scatter plot (Fig. 3). In Fig. 3, on the vertical axis, “\(W_{z}\)” represents the lag of variable “X” while on the horizontal axis (labelled “z”) is the variable “X”. The oblique line represents the linear regression curve between these two variables. The slope of the regression line obtained by regressing “\(W_{z}\)” (lag of variable “X”) on “\(z\)” (variable “X”) gives us Moran’s “I” (I = 0.070 in 1981 and I = 0.073 in 2010) based on the inverse distance matrix (Anselin 1996, p. 116).

Fig. 3
figure 3

Moran’s scatter plot of Ln PCI 1981 (left) and 2010 (right) based on inverse distance matrix (2004–2005 constant prices). Note 1: Andaman and Nicobar Islands, 2: Andhra Pradesh, 3: Arunachal Pradesh, 4: Assam, 5: Delhi, 6: Goa, 7: Gujarat, 8: Haryana, 9: Himachal Pradesh, 10: Jammu and Kashmir, 11: Karnataka, 12: Kerala, 13: Maharashtra, 14: Manipur, 15: Meghalaya, 16: Mizoram, 17: Nagaland, 18: Orissa, 19: Puducherry, 20: Punjab, 21: Rajasthan, 22: Sikkim, 23: Tamil Nadu, 24: Tripura, 25: West Bengal, 26: Uttar Pradesh, 27: Bihar, 28: Madhya Pradesh

Moran’s scatter plot is divided into four quadrants, and each represents different kinds of spatial association or dependence:

  1. (a)

    Quadrant 1 (upper right quadrant) depicts the spatial clustering of regions with high income and surrounded by high income neighbours (HH). Thus, the locations are associated with positive values of “\(I_{i}\)”.

  2. (b)

    Quadrant 3 (lower left quadrant) shows the spatial clustering of low income states which also have low income states as neighbours (LL). These locations are also associated with positive values of “\(I_{i}\)”.

  3. (c)

    Quadrant 2 (upper left quadrant) shows clustering of low incomes states surrounded by regions with high income states (LH). These locations have negative values of “\(I_{i}\)”.

  4. (d)

    Quadrant 4 (lower right quadrant) shows spatial clustering of high income states surrounded by regions with low income states (HL). These locations are also associated with negative values of “\(I_{i}\)”.

In the two periods 1981 and 2010, there is evidence of spatial concentration of PCI of the states. In 1981, Delhi, Goa and Punjab were the richest states surrounded by high income neighbours. In contrast, Puducherry, a high-income state, was surrounded by regions with low income states. In quadrant 2, Uttar Pradesh, Madhya Pradesh, Rajasthan, Kerala and Andhra Pradesh, the low-income states were surrounded by richer neighbours. In quadrant 3, Assam, Manipur, Meghalaya, Mizoram, Nagaland, Sikkim, Tripura (north-eastern states), along with Bihar and Odisha, were the poorest and also had poor neighbours.

In 2010, Delhi and Goa were the richest states. However, there has been an increase in the number of high income neighbours surrounding them. Kerala and Tamil Nadu, which earlier belonged to a lower income category, have joined the cluster in quadrant 1 in 2010. Similarly, Puducherry, which was surrounded by low income neighbours in quadrant 4 (in 1981) joined the cluster in quadrant 1 in 2010. Unfortunately, the north-eastern states of Assam, Manipur, Meghalaya, Mizoram, Nagaland and Tripura have continued to be in quadrant 3. Arunachal Pradesh and West Bengal have now joined this cluster in 2010. Sikkim has been a remarkable outlier and moved from being a low-income state to a high-income state. Since it is surrounded by low income neighbours, it is placed in quadrant 4. These observations confirm a strong regional concentration of PCI, with most of the richer states located in the southern and the western parts of India, along with Delhi, Haryana and Punjab in the north.

The scatter plots created from the contiguity matrix reveal similar results with respect to the pattern of spatial concentration in India, with a few exceptions (see Online Resource, Figure 4). Delhi, Goa, Haryana, Punjab and Maharashtra were found to be the richer states surrounded by high income neighbours in 1981. In contrast (to the findings of the distance matrix in Fig. 3), Gujarat is located in quadrant 4 surrounded by states with low income.

Since spatial dependence is confirmed by Moran’s “I” and the LISA statistics, the earlier convergence analysis in the literature based on non-spatial models is inadequate, which justifies the use of spatial panel models. For Ordinary Least Squares (OLS) estimation different period combinations are used (see Table 6 in Online Resource). For the OLS spatial and non-spatial results, see sections O8 and O9 in Online Resource. Results of the non-spatial panel model can be seen in section O9.

The question that we need to address is: Which of the 4 spatial panel models best suits our purpose? In India, the 1990s was a period of economic transition with the opening up of the economy but it has been argued that the process actually began much earlier in the mid-’80s (Rodrik and Subramanian 2005). In any case, if the perturbance was temporary and staggered across different states shock-driven models (SAC and SEM) may not be best suited for our purpose. SDM relies on the lag of the dependent and independent variable. Since the drivers of growth in different states (even neighbours) are not common, the use of the SDM, therefore, in our understanding of the economic process does not seem appropriate.

The major role played by human capital in India’s growth story in the form of a demographic dividend is well recognised (Karnik and Lalvani 2012). There has been some impacts of the information and communication technology (ICT) sector in India’s growth process, but its direct impact has been limited to urban centres of some states (Desmet et al. 2015). Though the sector has grown in value, employment generation in this sector has not kept pace.

While a majority of states have a dominant tertiary sector, geographically proximate (neighbouring) states have seen different growth drivers over the last decade. In Maharashtra, for example, the gross value added by “Manufacturing” (within the secondary sector) to the Gross State Domestic product was the highest in comparison to other sub-sectors in 2011–2012, followed closely by “Real Estate, Ownership of dwellings and Professional Services” (MPD (Maharashtra Planning Department) 2017). Its neighbour in the west, Goa, reported the greatest contribution from “Real estate, ownership of dwelling and professional services”, closely followed by “Trade, Repair, Hotels and Restaurants” and “Public Administration” (GoG (Government of Goa) 2017). Goa, as we know, is a tourist driven economy. In another fast-growing southern state, Karnataka (which houses information technology majors based in Bengaluru, the so-called Silicon Valley of India), “Real estate, Ownership of Dwellings and Professional Services” as a sub-sector contributed almost one-third of the Gross State Domestic Product. Presumably, this was triggered by the growth of the ICT sector (GoKar (Government of Karnataka) 2016). Kerala, on the other hand, had the highest contribution coming from “Trade, Repair, Hotels and Restaurants” indicative of a strong tourism focus (GoK (Government of Kerala) 2017). Evidently, there is an absence of a single common factor, policy or sub-sector that can be identified to be the growth stimuli in a country as diverse as India. Therefore, policy commonalities as the defining characteristic of proximity may not provide an accurate description of the neighbourhood in growth dynamics. In our view, therefore, the SAR model would best suit our purpose.

We next proceed to test the suitability of the SAR model with random (RE) and fixed (FE) effects. Since the FE model typically examines the relationship between the dependent and independent variables within an entity, the RE model may be better suited. The commonly used Hausman test confirms our expectation, wherein the differences between the coefficient of the RE and FE models are found to be not significant. Therefore, by default the RE model is chosen for further discussion.

The summary statistics of the two variables of interest to us (growth and initial income) reveal that average growth rate (over a 5-year period) in the panel was 0.053 and average of log PCI as 9.65 (see Table 2).

Table 2 Summary statistics for the dependent and the independent variables

Results of the SAR model using contiguity matrix and the inverse distance matrix and the Hausman test results are presented in Tables 3 and 4, respectively.

Table 3 MLE using different model specifications (SAR panel model with random and fixed effects)
Table 4 The Hausman test for SAR panel model for contiguity and inverse distance matrix models

Normally the Hausman test examines the significance in difference between the coefficients of the RE and FE models. In our case, since coefficient of RE < FE, the χ2 value < 0 and the model fitted on this data would fail to meet the asymptotic assumptions of the Hausman test. This is a known issue in estimation and an alternate that is often recommended under these circumstances is to use the absolute value of the difference to overcome the problem (Schreiber 2008). As evident, the null hypothesis holds according to the Hausman test and therefore RE is the preferred model for both the contiguity as well as inverse distance model.

While we report both the FE and RE model for both the contiguity and inverse distance matrix models (Table 3), we will use the results from the RE contiguity model as it is an indicator of greater connectedness between geographically proximate units. In addition, it is noticed that the difference in the coefficients (of the RE models) from the two spatial models is marginal.

We find that the “β” coefficient is positive and significant, confirming that there is strong evidence of income divergence. The “ρ” coefficient is significant and positive for SAR (RE) in the contiguity model, indicating that apart from the state’s own initial income, the growth rates of neighbouring states also have an influence on a state’s growth. This confirms our claim that estimates from previous studies are biased, inconsistent and inefficient. Expectedly, therefore, the values of “\(\upbeta\)” in the SAR panel RE model (about 0.015) are significantly less than the non-spatial FE model (0.028) and the non-spatial RE model (0.019), implying that the received econometric results of earlier contributions overestimated the value of “β” as they did not control for spatial dependence (see Table 10 in Online Resource).

Conclusion

The literature on convergence in India by a large majority has established that there is divergence in the growth rates, especially post-liberalisation. We find confirmation of these findings in our analysis. However, where we break away from the earlier literature is in the use of spatial analysis in the panel data models. Our results suggest that the earlier OLS and panel data estimates on convergence suffer from bias, inconsistency and inefficiency due to misspecification caused by the omitted spatial component in their analysis. Our estimate from the spatial panel model (SAR contiguity, random effects) confirms that the process of growth in India is divergent and spatially dependent. Further, the impact of initial income on growth is much smaller than earlier anticipated once we control for spatial dependence. Our analysis suggests that neighbourhood effects play a significant role in determining growth outcomes of Indian states.

We believe that this is the first attempt to demonstrate this using spatial panel analysis in the Indian context and has important implications for policy making. Areas of low incomes could benefit from growth spill over effects from richer neighbours. This raises hope that a virtuous circle of growth could emerge in India.