1 Introduction

Housing prices in major Chinese cities have recently experienced an exponential increase. Approximately 46 of the 70 large- and medium-sized cities have witnessed a price increase compared to the same month in the previous year, and 65 of these experienced a year-to-year increase as of April 2016 (National Bureau of Statistics of China 2016). Unaffordable housing prices have attracted significant concerns from both scholars and the public, ranking first in the list of China’s social problems from 2007 to 2012 (Zhang and Tang 2016). As such, identifying an approach to curb booming housing prices became one of the most important social concerns for the central and local governments in China. Further, implementing an affordable and sustainable housing policy is a dilemma for the Chinese government. On one hand, tackling the problem of real estate surplus in third- and fourth-tier cities requires adopting a series of measures to stimulate consumer spending, such as lowering the minimum down payment and mortgage rates. On the other hand, the government is confronted with the pressure to control the increase in housing prices and address housing affordability (Wang and Murie 2011). One difficulty in using a uniform policy system to solve housing problems is posed by the regional differences in housing prices. This regional imbalance exists at both the inter-municipal and inter-city levels and also at the county or district scales (Li 2012; Zang et al. 2015; Zhang et al. 2015; Wang et al. 2017b). Consequently, some scholars consider that a regionally differentiated policy should be designed to stabilize residential housing prices. Additionally, the factors related to Chinese residential real estate value need to be identified before enacting a new policy.

Numerous studies deal with the determinants of housing prices in the Chinese market. Economists focus on macroeconomic drivers, such as GDP, disposable income growth, urbanization level, rural–urban migration, housing policy, and finance (Liang and Cao 2007; Deng et al. 2009; Chen et al. 2011; Zhang et al. 2012, 2016), while geographers are frequently interested in accessibility, infrastructure, and amenity features (Wen et al. 2014a, b, 2015; Huang and Du 2015; Chen 2017; Wang et al. 2017b; Dai et al. 2016; Wen and Tao 2015; Hu et al. 2016; Geng et al. 2015). Although most studies indicate that proximity to transportation infrastructure results in high marginal prices in the Chinese market, the results are mixed. In Beijing, this effect is observed up to 1400 m from a transfer subway station and 1000 m from a non-transfer station (Dai et al. 2016). Moreover, when a residential house is 100 m closer to the subway station, residential unit prices increase on average by 96.5 yuan/m² for transfer stations and 23.0 yuan/m² for non-transfer stations. In Shenzhen, this effect is observed up to approximately 700 m from a metro station (Li et al. 2009; Nie et al. 2010) and, for every 1 m decrease in distance from the metro station, residential housing prices tend to increase by 0.0108% (Li et al. 2009). Unlike metro stations, high-speed railway stations seem to have an inverted ‘U’ influence pattern on housing prices in Beijing (Geng et al. 2015): they are positively correlated with housing prices within 0.475–0.891 km of the station but have negative effects on the prices of houses located 0.891–11.704 km away. Apart from accessibility features, several studies have reported positive effects of environmental amenities on housing prices. If the distance to West Lake in Hangzhou increases by 1%, housing prices are expected to decrease by 0.229% (Wen et al. 2015). Wen et al. (2014a), in another study of Hangzhou’s housing prices, found that the average housing price dropped by 0.159% when the distance to West Lake increased by 1%. Not all lakes and parks, however, were found to affect housing prices. For example, only the Yangtze River, East Lake, and city-level parks showed positive effects on apartment prices in Wuhan (Jiao and Liu 2010). Educational facilities (especially elementary and junior high schools) also had significant positive impacts on housing prices, which is called the ‘school district effect’ (Wen et al. 2014b).

Despite the significant contribution of these studies on understanding the determinants of housing prices in China, some points require further discussion. One of them is the most common methodology for housing price modelling used in most previous studies, namely, the hedonic price method of Rosen (1974). According to this method, the diversified conclusions of the effects of proximity on housing prices in such studies may result from the structures and processes associated with housing bundles. The hedonic price method is based on ordinary least squares (OLS) regressions, which regard all variables as independent and not interactive and assume the error to follow an independent identical distribution (Lee 2009). Explicitly, this method fails to consider the inherent hierarchical characteristics of housing prices (Liou et al. 2016). Considering that houses are nested within neighbourhoods, neighbourhoods are nested within counties, and counties are nested within provinces, houses located in the same neighbourhood show more similar price characteristics than those located in different neighbourhoods. Moreover, housing prices in the same county are more similar than those in other counties. This is especially true in China, where individual housing prices can be grouped by counties, which can then form into municipalities that are themselves nested in provinces. Spatial correlation generally tends to occur ‘within-place’, which is inherently hierarchical when the locational characteristics of data are aggregated at the onward level (Jones and Bullen 1993; Brown and Uyar 2004; Hou 2016). When the hierarchy of housing prices is ignored and individual attributes are tackled, the neighbourhood attributes, as well as the city or province features of housing prices at a certain level, will be affected by heteroscedasticity and spatial autocorrelation, leading to biased estimates of standard errors (Orford 2000; Hou 2016). 
However, how hierarchical regional characteristics are capitalized into residential property values has received less attention in previous works. Further, most previous studies concentrate on the housing prices of a single city, and only a few consider national housing prices when identifying the determinants of housing price. For instance, Wang et al. (2017b) revealed a significant spatial autocorrelation for county-level housing samples and reported that the proportion of renters, floating population, wage level, cost of land, housing market, and city service level have positive effects, whereas living space is negatively related to housing prices. Their study thus demonstrates the importance of recognizing the spatial heterogeneity of housing prices at the county scale. However, although spatial regression models can detect price determinants, they cannot model the mechanism of spatial correlation (Orford 2000). Theoretically, households begin their housing search by choosing a province, followed by a county or city within that province, and finally a house in the chosen county or city. Therefore, understanding the regional characteristics and inherently hierarchical residential location decisions of households is important for identifying the determinants of housing prices.

This study thus examines the effects of environmental characteristics and accessibility features on housing prices by using a hierarchical linear model (HLM). One of its contributions is that we identified the determinants of housing price using a dataset of 146,099 housing observations from 1120 counties in China. As considering a single city was the focus of most previous studies, a cross-sectional and county-level analysis on a national scale will help better understand the determinants of housing prices on the Chinese residential housing market. Another contribution of our study is testing the ‘contextual effects’ of housing value determinants at different levels. In hierarchical data structures, the contextual effects come from the interactions and influence of high- and low-level variables, suggesting that nested environment or background characteristics can influence individuals (Lee 2009). For example, the effects of rural–urban migration in a county (low-level variable) on housing prices can be influenced by the effects of GDP (high-level variable) and number of urban workers of the province it is nested in. We also use the HLM to decompose the spatial heterogeneity effect and examine its magnitude at different levels.

The remainder of this paper is organized as follows. Sections 2, 3, and 4 present the research design, namely the theoretical assumption, data source, and variable selection. A three-level HLM is then proposed in Sect. 4. Section 5 reports the empirical results, which are then discussed in Sect. 6. Section 7 concludes the paper.

2 Housing policy reform, administrative hierarchy, and housing inequality in China

China’s housing policy reform can be divided into three periods: before 1979, 1979–1998, and after 1998 (Deng et al. 2009; Chen et al. 2011) (Fig. 1). Scholars call the period before 1979 the public housing leading period. Under the central planning system, housing provision was dominated by governments and state-owned enterprises. During this period, housing in cities was not a private commodity but was mainly regarded as a social welfare good. Specifically, the government directly controlled all parts of the housing system through work units, which allocated housing to their employees according to their work positions and contributions. Employees only paid a small rent for basic housing maintenance. About 75% of urban residents acquired housing from their work units or municipal housing bureau before 1985 (Li 2003). Naturally, there was no home ownership or private property rights. The drawback of this welfare housing system is apparent. Without economic benefits, the incentive for housing investment was extremely low, so that urban residents faced severe housing shortages and poor living conditions, while the governments were under pressure to provide housing.

Fig. 1 Urban housing reform timelines in China

The second period coincides with China launching its reform and opening-up policy, as 1978 is a milestone in China’s economy. In the early stages of this period, along with the transformation from a centrally planned to a market-oriented economy, several housing reform experiments were carried out and at least two important documents were issued by the central government from 1979 to 1994 (Deng et al. 2009), which contained a set of measures for reducing local governments’ burden and improving living spaces from both the supply and demand sides. However, the experiments and policies were not a deep transformation of the housing system and did not change the fact that work units were still the main suppliers of housing for urban residents. After 1994, the housing provision system became a typical dual-track system. On one hand, work units still participated in housing construction and allocation and, on the other hand, the private housing market, where housing was privately owned and provided by the private sector, was allowed to develop for high-income households. Nevertheless, the Chinese housing market did not undergo real privatization until 1998.

After 1998, housing provision by work units was completely prohibited by the central government, and the housing industry was encouraged to develop rapidly. Urban residents could buy their homes from private developers. To allow urban residents to buy commodity housing, two housing programs, Economical and Comfortable Housing and the Housing Provident Fund, were implemented across the country. The former mainly provided housing at a price much lower than the market price for middle- and low-income families, while the latter provided opportunities for all workers to apply for subsidized mortgage loans based on social and private savings. The 1998 reform abandoned China’s old system of linking housing distribution with work units and marked the start of a new comprehensive market-based housing provision system. Today, urban housing is provided by a diversified provision model, which includes market, privatized pre-reform public, old private or self-built, and post-reform public housing (Chen et al. 2014).

In sum, the urban housing reform was successful in stimulating economic growth and improving urban residents’ living conditions. However, it also posed problems. One of them is that housing inequality still exists and has even worsened in some respects (Zhao and Bourassa 2003), as it was not only attributed to the different social ranks of urban residents but was also induced by the different economic power and administrative ranks of work units. The administrative system in China differs considerably from that of western countries, and this difference has had a profound impact on China’s housing market. Basically, there are four levels of units in the Chinese administrative system, namely provinces (Sheng/Zhixiashi), municipalities (Shi), counties (Xian/Qu), and towns (Zhen/Jiedao), which have different administrative powers in terms of land provision and housing market regulation. A province is the highest-level administrative unit, followed by municipalities and counties. A town is the lowest-level unit and does not have policy-making or legislative powers. Migrants usually begin their search for a place to settle by choosing a province, followed by a city or county in the given province, and finally a house in the given province and county. Given the different levels of economic development, public infrastructure, educational and medical resources, as well as housing policies, there is significant between- and within-city housing inequality at the different administrative unit levels. At the provincial level, housing inequality was obvious during the first decade of the twenty-first century, although it has generally declined over time (Yi and Huang 2014). Additionally, regional inequalities caused by the reform and opening-up policy have resulted in uneven urban land expansion in China (Dennis 2017), which in turn has led to different localized housing sub-markets. 
In the developed eastern coastal regions, housing provision is relatively low and housing demand high because of the dense population and high average household income. Therefore, housing prices are unaffordable for low-income migrants. Conversely, given the sufficient urban land supply and low housing consumption ability, housing prices are lower in central and western China.

3 Estimates of the multilevel determinants of housing prices using the hierarchical linear model

The traditional hedonic price model is a one-level model, which assumes that housing data do not have a hierarchical structure, and only one general relationship between housing prices and their attributes is generated across time and space. In other words, the influence of housing attributes on housing prices is assumed to be homogeneous and reflected in the assumption that estimated parameters in OLS-based hedonic models are invariant. However, housing attributes in the same administrative unit are more similar than those in other units. The implicit prices of housing attributes may vary spatially by sub-markets. Spatial heterogeneity is thus likely to arise when the relationship between housing prices and their determinants is not constant across the country but varies spatially by hierarchical housing sub-markets. For example, considering that the transaction price of a housing sample is a function of an income variable measured by the average wage of urban employment (AWUE) and assuming there are no regional differences in the housing prices of all sub-markets, the traditional hedonic price model can be specified by a single line, which is universal across the entire sample (Fig. 2a). However, if there are variations in the average housing prices of all sub-markets and the effects of AWUE on housing price are assumed to be the same in all sub-markets, the relationship between housing prices and AWUE can be represented as a set of parallel lines that potentially have different intercepts but the same slopes (Fig. 2b). If the effects of AWUE on housing prices are different for different sub-markets and the average housing prices of different sub-market also differ, the entire regression will be decomposed by a series of price–AWUE relationships, with each sub-market having a different intercept and slope (Fig. 2c). 
Therefore, it is not reasonable to hypothesize that housing prices are not dependent on each other, as spatial dependence often exists in housing samples located in geographical proximity. Similarly, assuming the housing prices of different spatial (administrative) units have the same attributes and ignoring the differences of the effects of socio-economic forces of high-level units on low-level housing markets may also result in biased estimated outcomes. Consequently, in hierarchical models, both the micro- and macro-level need to be considered simultaneously. Differences in the mean price can be decomposed into two categories: (1) differences due to mean housing attributes and (2) differences due to sub-market characteristics. A complete multi-level housing price model allows random terms and attribute parameters to vary according to a higher-level distribution:

Fig. 2 Three models for housing price–AWUE relationships

Price of house i in city j in province k = Typical price across province k + Fixed effects for explanatory variable n of house i in province k + Random term for explanatory variable n of house i in province k + Random term for explanatory variable n of house i in city j + random term for province k + random term for city j + random term for house i.
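To make the three cases in Fig. 2 concrete, the following minimal Python sketch fits a least-squares line to a pooled sample and to each sub-market separately. The data are hypothetical and noise-free, the sub-market labels "A" and "B" and the helper names (`ols_line`, `fit_by_submarket`, `fit_pooled`) are illustrative assumptions, and per-group OLS is used only as a simple stand-in for a multi-level model; it is not the estimation method of this study.

```python
from statistics import mean


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical AWUE values shared by two sub-markets, "A" and "B".
awue = [1.0, 2.0, 3.0, 4.0]

# Case (b): same slope (2.0) in both sub-markets, different intercepts (10 and 14).
parallel = {"A": [10 + 2.0 * x for x in awue], "B": [14 + 2.0 * x for x in awue]}

# Case (c): different intercepts and different slopes (2.0 and 3.0).
varying = {"A": [10 + 2.0 * x for x in awue], "B": [14 + 3.0 * x for x in awue]}


def fit_by_submarket(prices):
    """Fit one line per sub-market, as in Fig. 2b and 2c."""
    return {name: ols_line(awue, y) for name, y in prices.items()}


def fit_pooled(prices):
    """Fit a single line to all observations, as in Fig. 2a."""
    xs = [x for _ in prices for x in awue]
    ys = [y for series in prices.values() for y in series]
    return ols_line(xs, ys)
```

With these numbers, the pooled fit in case (b) keeps the common slope of 2.0 but replaces the two intercepts with a single value of 12.0, and the pooled slope in case (c) is 2.5, which matches neither sub-market. This is the kind of information a single universal line discards.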

Additionally, although spatial econometric and geographically weighted models are effective methods to account for spatial dependence and spatial heterogeneity in housing samples, they cannot capture the cross-level interaction effects between independent variables at high- and low-level spatial units. This is one of the main reasons multi-level models should be employed to account for the hierarchical nature of housing data. Understanding the regional characteristics and inherently hierarchical residential location decisions of households is important to identify the determinants of housing prices.

4 Data and methodology

4.1 Data

The original housing sample data are derived from a real estate website named Anjuke (http://www.anjuke.com/), one of the main housing transaction information-sharing websites in China. The dataset covers a total of 146,099 samples from 1120 counties in October 2015, which was chosen as the study period because the housing market was relatively active and housing prices in most provinces increased rapidly during that period. The spatial data mainly include housing price information, community name, and the county and province in which the observation is located. Owing to missing information on the housing characteristics of many observations, structural attributes (e.g. area, floor size, and plot ratio) are not incorporated into the analysis. The 146,099 community samples were retained after duplicated records and records with missing geographical information or prices were removed. Besides housing samples, the spatial dataset also contains the spatial distribution and attributes of the county centre, province centre, main road, highway, railway, high-speed railway, river, lake, and land-use information. The housing sample data are then georeferenced to the same geographic coordinate system as the transport and land-use data, which enables spatial analysis in ArcGIS 10.2.

Accessibility and environmental characteristics are used in our model. At the house unit level, proximity variables are characterized by the nearest distance from a housing sample to the central business district (CBD), rivers, lakes, or transport lines; this distance is calculated by the near tool in ArcGIS 10.2. At the county level, socioeconomic data on environmental characteristics relate to the population and are collected from the sixth nationwide population census. At the province level, population data, GDP, income, and urban infrastructure are derived from the website of the National Bureau of Statistics of China (http://data.stats.gov.cn/easyquery.htm?cn=E0103).
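The nearest-distance calculation behind the proximity variables can be sketched in a few lines of Python. This is a minimal illustration and not the ArcGIS implementation: it assumes planar (projected) coordinates in metres, represents each road, river, or railway as a list of vertices, and uses hypothetical names (`point_segment_distance`, `nearest_distance`) and hypothetical coordinates.

```python
import math


def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b; all are (x, y) tuples."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Position of the projection of p along the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_distance(p, polylines):
    """Smallest distance from p to any segment of any polyline."""
    return min(
        point_segment_distance(p, line[i], line[i + 1])
        for line in polylines
        for i in range(len(line) - 1)
    )


# Hypothetical housing sample and two main-road polylines, in metres.
house = (3.0, 4.0)
roads = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 10.0), (0.0, 20.0)]]
d_mroad = nearest_distance(house, roads)  # distance to the closest road segment
```

Here the first road lies 4 m from the sample and the second about 6.7 m away, so `d_mroad` is 4.0. For unprojected longitude and latitude, a geodesic distance would be needed instead of the planar formula.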

4.2 Hierarchical levels of housing samples

Based on the classification of administrative regions, three levels of the hierarchical structure of the hedonic price model are specified (Fig. 3).

Fig. 3 Hierarchy levels and distribution of observations in October 2015, China

Level 1 is the community level, used to calculate the housing price of each sample. This level captures the variation of housing prices caused by the differences in structural attributes, accessibility, and environmental attributes. The proximity variable of each sample is measured at this level. In summary, a total of 146,099 individual observations are collected at Level 1.

Level 2 is the county level, also the basic administrative unit in China. This level captures the local variations caused by county attributes. A total of 1120 observations are acquired. Population-related variables are available at this level.

Level 3 is the province level and approximates the sub-markets that dominate the socioeconomic variables. Because housing information remains unavailable for three provincial-level regions (Taiwan, Hong Kong, and Macao), only 31 of the 34 provincial-level units are used as observations at Level 3.

4.3 Potential explanatory variables and theoretical assumptions

4.3.1 Community level

Rooted in hedonic pricing theory (Rosen 1974), studies conducted at the community level generally consider that a property can be valued as a bundle of determinants, commonly grouped into physical, accessibility, and environmental characteristics (Debrezion et al. 2011; Efthymiou and Antoniou 2013). Accessibility is a key determinant of the intra-urban spatial variation of housing prices in standard urban economic models (Ottensmann et al. 2008; Hou 2016). Many empirical studies have reported that proximity to transportation (Debrezion et al. 2011; Li et al. 2009; Efthymiou and Antoniou 2013; Geng et al. 2015; Dai et al. 2016; Hou 2016; Liou et al. 2016), the CBD (Ottensmann et al. 2008; Qin and Han 2013; Wen and Tao 2015), and amenities (Jiao and Liu 2010; Wen et al. 2015) has significant effects on surrounding house values. While the CBD plays a significant role in sustaining property values in most cities worldwide, the railway system has both positive and negative effects (Debrezion et al. 2011; Efthymiou and Antoniou 2013). The positive effects come from the improvements in commuting convenience between and within cities. However, noise along railway lines usually imposes localized negative environmental effects. In Beijing, the high-speed railway system (whose maximum operating speed is at least 200 km/h) can attract investment and improve public infrastructure, thus having a positive effect on the surrounding property values (Geng et al. 2015). In sum, the estimated effects of transport infrastructure on housing prices differ in the sign, magnitude, and level of significance of the coefficients. Based on the literature review, we select the distance to the main road (D_Mroad), to railway (D_Railway), to high-speed railway (D_Hisprailway), to river (D_River), and to lake (D_Lake), together with elevation (Elv), which denotes the elevation of the location of a housing observation, as independent variables. 
As most studies reported the positive effects of transportation on housing prices, we assume housing prices are negatively correlated with proximity variables at the community level (H1). Considering the unavailability of most housing attributes of most observations at the house unit level, we do not incorporate structure variables (e.g. age of the house, house area, plot ratio) in our model.

4.3.2 Intra-county level

At county and province levels, existing studies approach housing prices from both the demand and supply sides, as comprehensively reviewed by Zhang et al. (2012) and Wang et al. (2017b). The implicit housing prices vary between counties and provinces and form proxies for sub-markets, which can be explained by socioeconomic variables at the county and province levels. A contextual effect occurs between low- and high-level variables, which may be reflected by the fact that the constant housing price at community level is a function of URBAN_PO, MIGRATIO, and PSPACE at county level. Counties with larger urban populations would have a higher housing demand. Additionally, as residential land and housing are limited in each county, housing resources are relatively scarce in counties with high urban population rates. Housing prices might therefore be more sensitive to housing demand in counties with higher urban population rates. Accordingly, we hypothesize that housing prices at the community level are positively influenced by the urban population rate (URBAN_PO) at the county level (H2). In a recent study, per capita living space was reported as having weak negative impacts on housing prices from the demand perspective (Wang et al. 2017b). When holding residential gross floor area constant, a smaller per capita living space implies a higher housing demand. We thus expect a negative relationship between housing prices and per capita living space (H3). With respect to demographic determinants, migrants are an important source of housing demand. Unlike developed countries, China is characterized by a unique dual rural–urban population structure. To seek jobs and a better life, rural residents in less developed areas migrate to big cities, a typical phenomenon at both regional and national scales. Some scholars argue that rural–urban migration has a positive impact on housing prices (Wang et al. 2017a). 
Therefore, we also assume that the housing prices at the community level are more likely positively impacted by the rural–urban migration in a county (H4). Based on the literature review and the above hypotheses, the following variables are selected as potential determinants at the county level: urban population rate (URBAN_PO), which denotes the proportion of non-agricultural population in the total population; rural–urban migration (MIGRATIO), which represents the number of floating people coming from other counties; and per capita living space (PSPACE).

4.3.3 Intra-province level

At province level, we mainly focus on the contextual effects of macro socioeconomic variables on housing prices. Income is an important determinant of housing value. For example, Eichholtz and Lindenthal (2014) find that housing demand depends not only on the current income of households but also on their future income. By exploring households’ choice within a community, Newman (2005) notes that higher income in urban communities attracts more housing demand. Zhang et al. (2015) find that income inequality causes an increase in housing prices in China. Since the household consumption level is related to household income, we hypothesize that the household consumption level at province level has a contextual effect on housing prices at a lower level (H5). In hedonic studies, parks and open spaces are commonly found to have amenity effects on housing prices, and college students are potential buyers of new housing. We assume that both of these factors, calculated at province level, have positive contextual effects on housing prices at a lower level (H6). Although migration can stimulate local housing prices in a county, rural–urban migration is unlikely to have the same effect on housing prices in all provinces in China. For instance, Chen et al. (2011) find that a change in the number of migrant workers has no significant impact on housing prices in the coastal provinces of China. They argue that this is because urban commodity housing was not affordable to rural–urban migrants. However, over the past two decades, individuals have clearly been migrating from less developed to developed areas in China. The Pearl River Delta, Yangtze River Delta, and Beijing-Tianjin-Hebei Region, which are the most economically developed regions in China, are the main destinations for migrants. The different results for the migration–price relationship might result from the different economic development levels of the regions on which previous studies focused. 
Therefore, we hypothesize that the effect of rural–urban migration at county level on housing prices is affected by province-level economic development (H7). We use GDP to denote economic development. In the Chinese housing market, GDP has both short- and long-run effects on property prices (Liang and Cao 2007). Additionally, the housing market was found to be sensitive to employment (Baffoebonnie 1998). According to Agnew and Lyons (2017), 1000 extra jobs could raise nearby housing prices by 0.5–1% within 1–2 years of their creation. We thus hypothesize that urban employment would affect the migration–price relationship at province level (H8). Since the wages of urban workers may affect their potential ability to purchase housing, we assume that a higher average wage of urban workers at province level increases the per capita living space at county level (H9). Based on these hypotheses, six variables are selected at province level: GDP, household consumption level (HCL), per capita park and green land area (PCPGL), number of college students per 100,000 population (NCSM), urban employment (UE), which denotes the total number of urban workers in a province, and average wage of urban workers (AWUE). Table 1 shows the descriptive statistics of these variables.

Table 1 Descriptive statistics of the dependent and independent variables

4.4 Hierarchical linear models

Because of their hierarchical structure, housing data violate the independence assumption of OLS regression; HLM relaxes this assumption by modelling the dependence within groups explicitly. HLM includes the random intercept and random slope models (Raudenbush and Bryk 2002). A simple two-level HLM can be expressed as follows:

$$Level{\hbox{-}}1:\,\,\,\,Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \,\,\,r_{ij} \sim{\text{N}}\left( {0, \sigma^{2} } \right),$$
(1)
$$Level{\hbox{-}}2:\,\,\,\,\beta_{0j} = \gamma_{00} + \gamma_{01} W_{1j} + \mu_{0j} ,$$
(2)
$$\beta_{1j} = \gamma_{10} + \gamma_{11} W_{1j} + \mu_{1j} ,$$
(3)

where Yij is the housing price for unit i in group j, Xij is the individual characteristics of house i in group j, rij is the normally distributed stochastic error term, β0j and β1j are the coefficients to be estimated at Level 1, γ00 and γ10 are the intercepts at Level 2, W1j is the independent variable of group j, γ01 and γ11 are the coefficients on W1j, and µ0j and µ1j are the stochastic error terms at Level 2, assumed to be normally distributed:

$${\text{E}}\left[ {\begin{array}{*{20}c} {\mu_{0j} } \\ {\mu_{1j} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 0 \\ 0 \\ \end{array} } \right],\quad {\text{Var}}\left[ {\begin{array}{*{20}c} {\mu_{0j} } \\ {\mu_{1j} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\tau_{00} } & {\tau_{01} } \\ {\tau_{10} } & {\tau_{11} } \\ \end{array} } \right] = {\text{T}} .$$
(4)
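As a worked step that is not written out above, substituting Eqs. (2) and (3) into Eq. (1) gives the combined (mixed) form of the two-level model, using only the symbols already defined:

$$Y_{ij} = \gamma_{00} + \gamma_{01} W_{1j} + \gamma_{10} X_{ij} + \gamma_{11} W_{1j} X_{ij} + \mu_{0j} + \mu_{1j} X_{ij} + r_{ij} .$$

The product term γ11 W1j Xij is the cross-level interaction: the effect of the house-level variable Xij on price depends on the group-level variable W1j. The composite error µ0j + µ1j Xij + rij is shared in part by all houses in group j and its variance depends on Xij, which is why the OLS assumptions of independent and homoscedastic errors do not hold for such data.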

Similar to the two-level model, a three-level HLM model can be expressed as follows (Raudenbush and Bryk 2002):

$$Level\,1:\,\,\,Y_{ijk} = \pi_{0jk} + \mathop \sum \limits_{p = 1}^{P} \pi_{pjk} X_{pijk} + e_{ijk} ,$$
(5)

where Yijk denotes the natural log of price for property i in county j, which belongs to province k; π0jk is the intercept of county j in province k; Xpijk (p = 1, …, P) denotes the housing characteristics for property i in county j, which is located within province k; πpjk is the coefficient; and eijk is a stochastic error term assumed to be distributed N(0, σ2)

$$Level\,2:\,\,\,\pi_{pjk} = \beta_{p0k} + \mathop \sum \limits_{q = 1}^{{Q_{p} }} \beta_{pqk} W_{qjk} + r_{pjk} ,$$
(6)

where βp0k is the intercept of province k, Wqjk (q = 1, …, Qp) denotes the independent variables associated with county features, βpqk is the coefficient, and rpjk is the stochastic error term

$$Level\,3:\,\,\,\beta_{pqk} = \gamma_{pq0} + \mathop \sum \limits_{s = 1}^{{S_{pq} }} \gamma_{pqs} Z_{sk} + u_{pqk} ,$$
(7)

where γpq0 is the intercept at province level, Zsk (s = 1, …, Spq) is the independent variable associated with province features, γpqs denotes the related coefficients, and upqk is the stochastic error term.

In a housing-related HLM model, housing price is dependent on individual housing characteristics and county and province level features. Additionally, the relationships between dependent and independent variables for low-level equations may be influenced by high-level variables. For example, the average housing price of a county is correlated with its amount of rural–urban migration. However, this relationship is not constant for the entire housing market, varying across provinces. Generally, the HLM model includes the random intercept and random slope models.

4.4.1 Null model (NM) and intraclass correlation coefficients (ICC)

Initially, a null model (one-way ANOVA with random effects) is estimated to determine whether the analysis should be conducted using HLM. The null model can be used to calculate the share of total variance that is attributable to variance among groups. If the variance among groups differs significantly from 0, significant variation is considered to exist at county and province level. In this instance, the HLM performs better than the traditional regression method. A three-level null model can be expressed as:

$$Level\,1{:}\,\,\,Y_{ijk} = \pi_{0jk} + e_{ijk} ,$$
(8)
$$Level\,2{:}\,\,\,\pi_{0jk} = \beta_{00k} + r_{0jk} ,$$
(9)
$$Level\,3{:}\,\,\,\beta_{00k} = \gamma_{000} + u_{00k} ,$$
(10)

where i is the ith housing unit, j is the jth county, and k is the kth province; π0jk is the mean value of jth county; β00k is the mean value of kth province; γ000 is the mean value of all housing observations, and eijk, r0jk, and u00k are stochastic error terms.

Total variance can be decomposed into the variance between transactions within the same county (Var(eijk) = σ2), variance between counties within the same province (Var(r0jk) = \(\tau_{0}^{2}\)), and variance of mean prices between provinces (Var(u00k) = \(\varphi_{0}^{2}\)). The proportion of variance that is due to differences at the community, county, and province levels can be denoted by intraclass correlation coefficients p1, p2, and p3, respectively:

$$p_{1} = \sigma^{2} /\left( {\sigma^{2} + \tau_{0}^{2} + \varphi_{0}^{2} } \right),$$
(11)
$$p_{2} = \tau_{0}^{2} /\left( {\sigma^{2} + \tau_{0}^{2} + \varphi_{0}^{2} } \right),$$
(12)
$$p_{3} = \varphi_{0}^{2} /\left( {\sigma^{2} + \tau_{0}^{2} + \varphi_{0}^{2} } \right).$$
(13)
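
As a minimal sketch of Eqs. (11)–(13), the following Python function computes the three intraclass correlation coefficients from the variance components of a null model. The function name and the numeric inputs are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
def intraclass_correlations(sigma2, tau2, phi2):
    """Return (p1, p2, p3), the variance shares at Levels 1, 2, and 3.

    sigma2: within-county variance, Var(e_ijk)
    tau2:   between-county, within-province variance, Var(r_0jk)
    phi2:   between-province variance, Var(u_00k)
    """
    total = sigma2 + tau2 + phi2
    if total <= 0:
        raise ValueError("total variance must be positive")
    return sigma2 / total, tau2 / total, phi2 / total


# Hypothetical variance components (illustration only, not from Table 2).
p1, p2, p3 = intraclass_correlations(sigma2=0.60, tau2=0.10, phi2=0.30)
print(f"p1={p1:.3f}, p2={p2:.3f}, p3={p3:.3f}")
```

By construction the three coefficients sum to one, so a large p2 or p3 directly signals how much of the price variation sits above the transaction level.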

4.4.2 Random intercept model (RIM)

Next, the random intercept model identifies statistically significant variables at community level. The intercept is specified as random at the two higher levels. All the coefficients on explanatory variables are assumed to have only fixed effects, which indicates that their effects on housing price do not vary across different spatial levels:

$$Level\,1{:}\,\,\,Y_{ijk} = \pi_{0jk} + \pi_{1jk} \left( {ELEVATION} \right) + \pi_{2jk} \left( {HIGHSPEE} \right) + \pi_{3jk} \left( {LAKE} \right) + \pi_{4jk} \left( {MAJORROA} \right) + \pi_{5jk} \left( {RAILWAY} \right) + \pi_{6jk} \left( {RIVER} \right) + e_{ijk} ,$$
(14)
$$Level\,2{:}\,\,\,\pi_{0jk} = \beta_{00k} + r_{0jk} ,$$
(15)
$$\pi_{qjk} = \beta_{q0k} \,\,\,{\text{for}}\,{\text{q}}\, = \,1,\,2,\,3,\,4,\,5,\,6,$$
(16)
$$Level\,3{:}\,\,\,\beta_{00k} = \gamma_{000} + u_{00k} ,$$
(17)
$$\beta_{q0k} = \gamma_{q00} ,\,\,\,{\text{for}}\,{\text{q}}\, = \,1,\,2,\,3,\,4,\,5,\,6.$$
(18)

4.4.3 Fixed contextual effect model (FCM)

In the FCM, six contextual variables are assumed to have effects on housing price based on H2 and H3. The housing price can be explained both by explanatory variables at level one and contextual variables at the two higher levels. The variance of the average of the dependent variable is assumed to be influenced by contextual variables as fixed contextual effects. The FCM is defined as:

$$Level\,1{:}\,\,\,Y_{ijk} = \pi_{0jk} + \pi_{1jk} \left( {ELEVATION} \right) + \pi_{2jk} \left( {HIGHSPEE} \right) + \pi_{3jk} \left( {LAKE} \right) + \pi_{4jk} \left( {MAJORROA} \right) + \pi_{5jk} \left( {RAILWAY} \right) + \pi_{6jk} \left( {RIVER} \right) + e_{ijk} ,$$
(19)
$$Level\,2{:}\,\,\,\pi_{0jk} = \beta_{00k} + \beta_{01k} \left( {URBAN\_PO} \right) + \beta_{02k} \left( {MIGRATIO} \right) + \beta_{03k} \left( {PSPACE} \right) + r_{0jk} ,$$
(20)
$$\pi_{qjk} = \beta_{q0k} \,\,\,{\text{for}}\,{\text{q}}\, = \,1,\,2,\,3,\,4,\,5,\,6,$$
(21)
$$Level\,3{:}\,\,\,\beta_{00k} = \gamma_{000} + \gamma_{001} \left( {CPI} \right) + \gamma_{002} \left( {PCPGL} \right) + \gamma_{003} \left( {NCSM} \right) + u_{00k} ,$$
(22)
$$\beta_{0qk} = \gamma_{0q0} ,\,\,\,{\text{for}}\,{\text{q}}\, = \,1,\,2,\,3,$$
(23)
$$\beta_{q0k} = \gamma_{q00} ,\,\,\,{\text{for}}\,{\text{q}}\, = \,1,\,2,\,3,\,4,\,5,\,6.$$
(24)

4.4.4 Intercepts- and slopes-as-outcomes model (ISO)

The final HLM is specified as an intercepts- and slopes-as-outcomes model, in which the slopes of explanatory variables at the lower level can vary according to a higher-level distribution. According to Assumption 3, by incorporating two additional macro-models for the slopes of MIGRATIO and PSPACE at Level 2, the ISO is specified as:

$$Level{\hbox{-}}1{:}\,\,\,Y_{ijk} = \pi_{0jk} + \pi_{1jk} \left( {ELEVATION} \right) + \pi_{2jk} \left( {HIGHSPEE} \right) + \pi_{3jk} \left( {LAKE} \right) + \pi_{4jk} \left( {MAJORROA} \right) + \pi_{5jk} \left( {RAILWAY} \right) + \pi_{6jk} \left( {RIVER} \right) + e_{ijk} ,$$
(25)
$$Level{\hbox{-}}2{:}\,\,\,\pi_{0jk} = \beta_{00k} + \beta_{01k} \left( {URBAN\_PO} \right) + \beta_{02k} \left( {MIGRATIO} \right) + \beta_{03k} \left( {PSPACE} \right) + r_{0jk} ,$$
$$\pi_{qjk} = \beta_{q0k} \,\,\,{\text{for}}\,{\text{q}}\, = \,1,\,2,\,3,\,4,\,5,\,6,$$
(26)
$$Level{\hbox{-}}3{:}\,\,\,\beta_{00k} = \gamma_{000} + \gamma_{001} \left( {CPI} \right) + \gamma_{002} \left( {PCPGL} \right) + \gamma_{003} \left( {NCSM} \right) + u_{00k} ,$$
(27)
$$\beta_{01k} = \gamma_{010} ,$$
(28)
$$\beta_{02k} = \gamma_{020} + \gamma_{021} \left( {GDP} \right) + \gamma_{022} \left( {UE} \right) + u_{02k} ,$$
(29)
$$\beta_{03k} = \gamma_{030} + \gamma_{031} \left( {AWUE} \right) + u_{03k} ,$$
(30)
$$\beta_{q0k} = \gamma_{q00} ,\,\,\,{\text{for}}\,{\text{q}}\, = \,1,\,2,\,3,\,4,\,5,\,6,$$
(31)

where γ021(GDP), γ022(UE), and γ031(AWUE) indicate cross-level interactions between the independent variables of Levels 2 and 3. The two additional random terms, u02k and u03k, allow the slopes of MIGRATIO and PSPACE in individual subdivisions to deviate from their slopes in the overall model. The variances of u02k and u03k are denoted by \(\varphi_{1}^{2}\) and \(\varphi_{2}^{2}\), respectively. The coefficients on the six explanatory variables at Level 1 and the coefficient on URBAN_PO at Level 2 only have fixed effects, which indicates that the effects of these variables on housing value do not vary across subdivisions.
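
Substituting Eqs. (26)–(31) into Eq. (25) gives a single combined equation in which the cross-level interactions appear as products. This is a sketch obtained purely by substitution; the symbol Xqijk, standing for the qth of the six Level-1 proximity variables, is shorthand introduced here and does not appear in the original specification:

$$Y_{ijk} = \gamma_{000} + \gamma_{001} \left( {CPI} \right) + \gamma_{002} \left( {PCPGL} \right) + \gamma_{003} \left( {NCSM} \right) + \gamma_{010} \left( {URBAN\_PO} \right) + \left[ {\gamma_{020} + \gamma_{021} \left( {GDP} \right) + \gamma_{022} \left( {UE} \right) + u_{02k} } \right]\left( {MIGRATIO} \right) + \left[ {\gamma_{030} + \gamma_{031} \left( {AWUE} \right) + u_{03k} } \right]\left( {PSPACE} \right) + \mathop \sum \limits_{q = 1}^{6} \gamma_{q00} X_{qijk} + u_{00k} + r_{0jk} + e_{ijk} .$$

The bracketed terms are the province-specific slopes of MIGRATIO and PSPACE, so GDP, UE, and AWUE shift how strongly the two county variables relate to price rather than shifting price directly.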

5 Results and discussion

5.1 Spatial variations of housing prices in China

At community level, the statistics show that the maximum transaction housing price reaches 336,448 Yuan/m2 and the minimum is 1000 Yuan/m2. These figures demonstrate a large spatial disparity. At county level, prices are also distributed very unevenly, with the highest county average being 66,329 Yuan/m2 and the lowest 1711 Yuan/m2, for a mean of 5976 Yuan/m2. Figure 4 shows the spatial distribution of housing prices at county level in China in October 2015. The counties with high housing prices were mainly located in the East Coast Region. Specifically, the Beijing–Tianjin–Hebei Region, the Yangtze River Delta Region, and the Pearl River Delta, which are the most developed and urbanized areas in China, dominate the high housing price sample.

Fig. 4 The spatial distribution of housing price at the county level in China in October 2015

5.2 NM and RIM results

Four hierarchical linear models, which take a three-level nested hierarchical structure into consideration, were estimated using the HLM 7.03 package. Table 2 summarizes the estimates of the four models for random effects. The estimate of γ000 is 8.600, which indicates that the mean housing price is approximately 5431.66 Yuan (\(e^{8.6} = 5431.66\)). The random effects in the null model show that 8.1% of the total variance in housing prices is between counties within provinces, while 31.4% is attributed to differences at the province level. This result suggests that housing prices vary across counties and provinces. The HLM is therefore supported by our data. Compared with the null model, the RIM regression results indicate that only 6.7% of the price variance is located at county level and only 10% of the total unexplained price variance at province level. The decrease in ICC (p2 and p3) indicates that the explanatory variables in Level 1 can capture the between-community price variation. The three levels of housing price variations were then incorporated into the traditional hedonic housing price model by considering the influence of contextual variables at the macrolevel as dependent variables or on the slope of the variables at the microlevel.
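
The back-transformation of the null-model intercept can be checked with a few lines of Python. This is a minimal sketch; the function name is hypothetical, and only the value 8.600 comes from the text. Because the dependent variable is the natural log of price, exponentiating the intercept strictly yields a geometric rather than an arithmetic mean.

```python
import math


def log_intercept_to_price(gamma_000):
    """Convert an intercept on the natural-log scale back to price units."""
    return math.exp(gamma_000)


# 8.600 is the null-model estimate of gamma_000 reported in the text.
price = log_intercept_to_price(8.600)
print(round(price, 2))
```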

Table 2 Estimated random effects

5.3 FCM and ISO results

Model 3 considers the effects of independent variables at Levels 1 and 2 as constant, while Model 4 defines the impacts of two contextual variables at Level 2 on housing value as random changes. From Table 2, the declining trends of \(\tau_{0}^{2}\) and \(\varphi_{0}^{2}\) in FCM and ISO indicate that the explanatory variables in Levels 2 and 3 capture an important part of the variance. Additionally, variances \(\varphi_{1}^{2}\) and \(\varphi_{2}^{2}\) in ISO are 0.02402 and 0.00159, respectively, which indicates that the MIGRATIO–price and PSPACE–price relationships vary between provinces. Table 3 presents the estimates of the fixed effects in the three HLMs. The deviance statistics show that ISO has a better goodness of fit than FCM. In the following section, we only report the results of the ISO model.

Table 3 Estimated fixed effects

5.3.1 Effects of province variables

At province level, three variables are associated with housing prices. One issue in this study is that neither HCL nor NCSM had a significant coefficient in the model, which suggests that they were not multi-level determinants of housing price during the boom period. Among the factors at province level, PCPGL also did not pass the significance test, implying that there is no direct relationship between housing prices and green and open spaces. This finding is inconsistent with previous studies focusing on single cities (Wen et al. 2015; Jiao and Liu 2010), which indicated that green parks had spatial effects on apartment housing prices in China. Waltert and Schläpfer (2010) pointed out that migrants are attracted by amenities nearly as frequently as by low taxes, implying that amenities are an important determinant of housing prices, which is also true for Chinese cities. However, at the province level, although the absolute area of green and open space in one province may exceed that of another, not all green and open spaces can be shared by everyone. The estimated coefficients on GDP and UE are significant at the 5% level, implying that they have significant effects on the magnitude of migration between counties. One difference between them is that GDP exerted a positive effect on the relationship between migration and average housing prices, whereas UE showed a negative effect. Although Zhang and Tang (2016) argued that there is no evidence that GDP has a direct relationship with booming housing prices, GDP showed a significant impact on the magnitude of the migrant population, which increased the demand for housing. The negative coefficient on UE suggests that, holding other variables constant, in provinces with more urban employees, the average housing price would be less sensitive to rural–urban migration in their counties, which is inconsistent with our expectations.

5.3.2 Effects of county variables

At county level, the estimated coefficient on MIGRATIO is positive and significant at the 5% level. This finding implies that, all else being equal, counties with a greater migrant population have higher housing prices. UE is also significant and negatively correlated with housing prices. However, because the coefficient on GDP is positive, it may offset the negative effect of UE. Further calculations indicate that, for a one-unit increase in both GDP and UE, the coefficient on migrant population at county level remains positive, which suggests that high migration at county level is associated with high average housing prices. Additionally, this finding indicates the existence of contextual effects between the county and province levels. As expected, the estimated coefficient on AWUE is positive and statistically significant at the 5% level. This finding indicates that AWUE significantly affected PSPACE, thereby influencing attribute prices at community level. As indicated by Wang et al. (2017b), the wage level determines housing demand by limiting the affordability of a household at the individual level and controlling labour supply at the regional level. Therefore, AWUE is an important determinant of housing prices in China.
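
The "further calculations" mentioned above amount to evaluating the fixed part of Eq. (29) at chosen values of GDP and UE. The sketch below illustrates this with purely hypothetical coefficients; the actual estimates are those in Table 3, and the function and argument names are assumptions made for this illustration. The random term u02k is omitted because its expected value is zero.

```python
def migration_slope(gamma_020, gamma_021, gamma_022, gdp, ue):
    """Expected slope of MIGRATIO for a province with the given GDP and UE.

    Implements the fixed part of Eq. (29):
    gamma_020 + gamma_021 * GDP + gamma_022 * UE.
    """
    return gamma_020 + gamma_021 * gdp + gamma_022 * ue


# Hypothetical coefficients (NOT the estimates from Table 3): a positive
# GDP interaction and a negative UE interaction, as described in the text.
baseline = migration_slope(0.5, 0.3, -0.2, gdp=0.0, ue=0.0)
shifted = migration_slope(0.5, 0.3, -0.2, gdp=1.0, ue=1.0)
print(f"baseline slope: {baseline:.2f}, after one-unit increases: {shifted:.2f}")
```

With these illustrative values the positive GDP term outweighs the negative UE term, so the slope stays positive after a one-unit increase in both, which mirrors the pattern described in the text.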

5.3.3 Effects of proximity variables

Among the several proximity variables, the distance to railway has a significant and positive coefficient at community level, implying that prices tended to rise with distance from the railway during the housing boom. The estimated total impact of the distance to railway in natural logarithmic form is 4.443, which indicates that, holding other variables constant, when all housing observations move one unit closer to the railway, housing prices will decrease by 444.3% on average. This may be because of noise along the railway lines. By contrast, the estimated coefficient on distance to high-speed railway is −2.432, which attains a 1% significance level. This finding suggests that, everything else being equal, a one-unit increase in the distance to the high-speed railway decreases the average housing price by 243.2%. This phenomenon may have occurred because a high-speed railway reduces travel time, improves accessibility among cities, and changes the traditional travel patterns of citizens. Although the high-speed railway also generates a noise effect, residents have a high willingness to pay for its positive effect. The effects of lake, major road, and river, measured by a dwelling’s distance to the nearest lake, road line, and river, respectively, were not statistically significant.

6 Discussion

The hedonic pricing theory established by Rosen (1974) describes housing value as a function of a bundle of attributes, which can be categorized into structural, neighbourhood, and environmental factors. Structural and neighbourhood features, such as size, floor number, plot ratio, year of construction, orientation, and proximity determinants, are associated with community characteristics. Environmental features, measured by macro social and economic indicators, are linked with large-scale units. Implicitly, they are hierarchical, and the macro determinants have direct and indirect impacts on the micro factors for housing premiums. Although the effects of county- and province-level determinants on community-level drivers were not examined in the current study owing to data availability, they exist in the housing market in China; for example, the residents of the more developed northern provinces are more likely to prefer low- and mid-rise buildings. By contrast, high-rise buildings have significant potential in the cities of southern provinces. Consequently, community-level determinants may vary between cities, counties, or provinces, suggesting that interactions and influences exist between high- and low-level variables in housing price modelling, outcomes that are generally described as contextual effects (Lee 2009).

Thirteen socioeconomic variables were initially selected, but only six were included in the analysis and three yielded significant coefficients at the province level. However, this result does not mean that other variables had no effect on housing value. One reason is that collinear effects exist between these variables; for example, GDP is highly correlated with private enterprises and individual employees (PEIE) (Pearson correlation = 0.943) and the number of real estate development enterprises (NREDE) (Pearson correlation = 0.941). PEIE and NREDE are potential indicators of housing demand. Their increase and decrease will lead to the rise and decline of housing demand and are thus associated with the supply and demand mechanisms of the housing market. At county level, the number of individuals above 15 years old (P15), of unmarried persons (PS), and of individuals with a bachelor’s degree (PBD) are correlated with MIGRATIO. No evidence supports the argument that population structure is a key factor in determining housing prices. However, China is experiencing a population-aging period. Theoretically, with the decelerated growth in the number of teenagers, housing demand will decrease in the long run. Such a decline may partly affect housing prices. Moreover, the effects of the unmarried population on housing value were not reported in this study; thus far, whether marriage influences housing demand is unclear and inadequately studied in the literature. Further research could shed some light on this point.

At community level, we use the locations of county governments that do not accommodate employment concentrations, business functions, and services to represent the CBD, as the effect of distance to centres is not clear. Multiple urban centres have been found to determine housing price variations (Qin and Han 2013; Wen and Tao 2015), and our focus on national housing price determinants adds to the difficulty in identifying the CBDs of nearly 1200 cities. Moreover, many of them are polycentric; thus, the effects of multi-level CBDs remain unclear. As expected, the distance to a high-speed railway had a negative effect on housing prices, which differs from a previous study (Geng et al. 2015) that reported combined positive and negative effects. One difference is that this study identifies the effect not only of the railway station but also of the high-speed railway. Another difference is that we examined a large number of observations on a national scale, while Geng et al. (2015) focused on the influence of the Beijing South Railway Station, with relatively few observations. However, the distances to lake, major road, and river were not statistically significant. This finding suggests that no significant price premiums were associated with these proximity variables. These results differ from our expectations, given that some previous studies have shown that a city’s inner rivers and lakes generated remarkable price premiums (Wen et al. 2014a, 2015; Jiao and Liu 2010). One possible explanation is that a scale effect exists in the hedonic price model. The relationship between housing price and the distance to lake, river, and road in one city is unlikely to generalize to other cities.

Two decades ago, the housing system in China changed from a government-dominated planned provision system to a market-oriented one. The national average housing price in 2014 was approximately 3.5 times that in 1999, and the increase in average housing price for several large cities in China was considerably higher than the national average (Shi et al. 2015). A large imbalance also exists between housing prices and incomes among different regions, thus worsening the housing affordability problem. Ensuring housing affordability is a challenge for both the central and local governments, which have already implemented measures such as the Cheap Rental Housing and Economic Comfortable Housing policies for low-income residents. However, considering the limited land resources and incentives to boost economic development, these policies did not achieve their intended objectives. The hierarchical linear regression showed that high GDP levels are associated with high housing prices and that migrants were a determinant of housing price increases. However, the heavy reliance on land revenue prevented the governments from determining the best solution to address the conflict between housing affordability and economic growth. For most rural–urban migrants, housing properties have remained unaffordable over the past 20 years. Therefore, a balance should be achieved between housing affordability and financial revenue when designing a sustainable housing policy. On one hand, a public housing program for low-income households provided by the local government should still be used. For example, the Public Rental Housing provision in Chongqing is a successful model that enabled efficient cooperation between the government and market forces for public housing provision (Zhou and Ronald 2017). On the other hand, to control overheated real estate investment, regionally differentiated housing policies, such as applying different down payments and mortgage rates for different regions or provinces, should be considered, given the significant variance in average housing prices over space and time in China.

7 Conclusions

With the booming of housing prices in Chinese cities in recent years, both scholars and governments have shown interest in exploring fundamental market determinants so as to create smart housing policies or projects that respond to future housing price trends. Numerous studies have reported the effects of amenities, public transit accessibility, educational resources, and structural variables on housing value in China by using traditional hedonic price models. However, to the best of our knowledge, most of these previous studies assume that housing price samples have no hierarchical structure. Thus, they failed to consider the contextual effects of urban space produced by the interactions and influences between high- and low-level factors. We propose a hierarchical linear model approach to contribute to the understanding of how multi-level determinants influence housing value in light of previous studies by examining the housing market in Chinese cities as a case study.

This study examines the impacts of key multi-level socioeconomic drivers and accessibility elements on house value. The null HLM results show that the average housing prices of 1120 counties from 31 provinces exhibit significant variance, indicating that the multi-level hedonic specification is more appealing than OLS regression-based hedonic price models when dealing with housing samples nested within hierarchical spatial structures. The three-level HLM results reveal the following. At community level, ordinary railways are negatively associated with housing value, whereas high-speed railways showed a significant housing price premium. At county level, rural–urban migration (MIGRATIO) and per capita living space (PSPACE) exerted a significant positive influence on housing value. Moreover, both of them are subject to the contextual effects of province-level variables. MIGRATIO is dependent on GDP and urban employment (UE), whereas PSPACE is a function of the average wage of urban employees (AWUE). In particular, these results also indicate that GDP, urban employment, and average wage of urban employees are key determinants at province level. In sum, housing values in China are not only determined by house features but are also driven by multi-level socioeconomic factors.

The findings of this study have important implications for policymakers. The Chinese central government has implemented many housing policies to curb surging housing prices, such as setting buying limits, increasing the down payment for a second house, and increasing mortgage rates. However, they seem to have barely achieved their original goals. One of the reasons is that the central government failed to create different policies for different cities based on socioeconomic conditions. As our study indicates, a large difference exists in the average housing prices of Chinese cities, with housing prices being associated with GDP, urban employment, and migration. A 20% increase in down payment may work immediately on the property market of a third-tier city but would not be very effective in a first-tier city. This is because the housing supply in first-tier cities is limited in China and housing demand is rigid. Increasing the down payment will not affect the housing demand in first-tier cities. Conversely, housing demand is below supply in many third-tier cities in China. Recently, most third- and fourth-tier cities have been facing significant pressure to reduce their stock of unsold houses. As such, the Chinese central government paid attention to real estate destocking in third- and fourth-tier cities. A policy of increasing the purchase threshold will prevent potential homebuyers from buying houses in these cities but will not work well in first-tier cities, where demand is much higher. Therefore, various housing policies should be created according to the socioeconomic determinants of housing values. This study also implies that a high-speed railway (with more efficient transportation routes linked to different regions) generates accessibility premiums from region-wide commuting improvements. Thus, governments could also put the land near a high-speed railway station to residential use to meet demand.

The originality of this study lies in the following two main elements. First, using the hierarchical linear regression approach to model the inherently hierarchical attributes of the determinants of housing prices and their interactions at different levels, a major improvement was observed compared with single-level OLS-based models, such as spatial regression and geographically weighted regression. Second, both accessibility features measured at the community level and socioeconomic factors representing the macro environment were examined based on a large number of transaction samples covering half of Chinese counties. However, we also acknowledge several limitations. First, the data used have constraints, as the structural characteristics (e.g., age, plot ratio, size, building year) are excluded from the dataset, which prevents us from analysing home buyers’ appreciation of these housing features. Second, we only analysed the determinants of housing prices for 1 month in 2015; the changing pattern of housing prices and their determinants over a longer temporal scale thus requires further quantification in future research.