1 Introduction

The housing market has a strong impact on economies worldwide. At the national level, housing-related industries contribute 15–18% of Gross Domestic Product (GDP) in the United States, nearly 15% of GDP in the European Union, and 13% of GDP in Australia (Footnote 1). These numbers show that housing is a leading indicator of a nation’s economic cycle [19] and is important for economic and financial stability and growth [2]. Real estate is the biggest asset for most households and one of the strongest sources of financial assurance for a comfortable retirement. In 2017, the median net wealth (Footnote 2) of property-owning households with at least one occupant over 65 years old was over $934,900, whereas renting households under similar conditions had only $40,800, an almost 23-fold difference.

Housing appraisal is crucial for the housing market. An accurate appraisal leads to rational negotiation and decision making, and thus helps prevent home buyers from buying over-valued homes. Housing appraisal is also highly relevant to financial stability, as most banks require a specific house valuation process to decide a healthy mortgage amount. In practice, however, it is difficult for home buyers to access this information during the house hunting and price negotiation stages, because hiring a professional property valuer is expensive, time-consuming and inconvenient. These difficulties create a strong need for a timely, accurate, automatic, and affordable housing appraisal system.

Although housing appraisal is important in both macro- and microeconomics, “housing price remains as much art as science” [18]. The understanding of housing price is still very limited. Existing studies focus on housing attributes (e.g., size of land, number of bedrooms) but largely ignore the relationship between a house and its surroundings. Traditional econometric models have revealed a strong spatial correlation in housing price, which can be explained by two theories [20]: (1) the spillover effect between regions: when physical and human capital or technological improvement concentrates in one region, it naturally has a positive impact on neighbouring regions [9]; (2) unobserved or latent geolocational factors.

The recent availability and appreciation of social, economic and geographic data have enabled researchers to trace the spillover effect on human capital or technological breakthroughs, and to discover unobserved or latent factors. Housing appraisal has become more viable thanks to rich, fine-grained geospatial information available through multiple online resources, such as satellite, street-view and housing images, and map data. With the support of real-world data, we can empirically investigate how people make decisions. These socioeconomic data sets are multi-source, heterogeneous, and high-dimensional.

Most recent work focuses on finding new spatial factors by leveraging newly available online data and machine learning methods to reveal previously unexplained elements [15, 21, 24]. These findings have shown that housing price is correlated with a safer environment [6, 8], intangible assets from the neighbourhood and associations with neighbouring houses [18], design [25], culture [13], Point of Interest (POI) measures [11], etc. This work mostly takes the perspective of neighbourhood characteristics, normally within a range of 1 to 2 km from the house. However, exploring only the near neighbourhood has a few limitations. First, certain living functions cannot be fulfilled in the near neighbourhood, and these functions are not captured in previous housing price models. For example, shopping malls, hospitals and universities are strategically located to serve at the regional or national level, not at the suburb level. But these functions do influence the demographic distribution in nearby suburbs and hence influence housing value. Second, these regional services are economically highly concentrated clusters. For example, a shopping mall can contain 500 shops, and a university campus can serve 20,000 students. Therefore, economic value and service activities are highly concentrated in these nodes and would influence the price of houses beyond their immediate neighbourhoods.

How can we expand the investigation and identify key factors beyond the immediate neighbourhood?

A few studies [1, 16] have shown that enlarging the neighbourhood area can improve housing price prediction. However, these studies were based on satellite images and did not further investigate influential visual features or areas beyond the neighbourhood. Therefore, we cannot identify whether the improvement is due merely to enlarging the neighbourhood area or to the inclusion of new features in the calculation.

To expand the investigation beyond the neighbourhood, we can either merely increase the neighbourhood area, or identify and add new key features at the metropolitan level. Increasing the neighbourhood area is not an ideal approach, as it can increase computation significantly without providing additional insights. Therefore, we take the second approach of finding new key features. It is more challenging, but it requires less computation and potentially offers practical implications for home buyers, investors and urban planners.

Our approach extends relational closeness by investigating economic proximity. This establishes an economic closeness between the household and the place, namely the economic cluster. This paper aims to study the intangible value of a house beyond its neighbourhood value by evaluating the relationship between the house and existing regional economic clusters. Specifically, we identify economic clusters in some significant categories, such as the CBD, shopping malls and universities. By consolidating these with other influencing factors, we build a housing appraisal framework that includes Housing features, Neighbourhood characteristics, regional Economic clusters and Demographic characteristics, called the HNED model. This approach may help decision-making for home buyers, property investors and urban planners. It may also indicate solutions for affordable living without compromising essential needs.

The rest of this paper is structured as follows. Section 2 explains the conceptual framework and main factors in detail. Section 3 explains the experimental settings. Section 4 discusses the results. Section 5 deals with discussion and implications, followed by related work in Sect. 6. Finally, concluding remarks are offered in Sect. 7.

2 Conceptual Framework

In this section, we introduce our housing price estimation model. The target variable of the model is the price of the house. The feature variables are grouped into four feature vectors, corresponding to four different types of attributes, which together influence the price of the house. The types of attributes are: housing attributes, neighbourhood characteristics, regional economic clusters, and demographic characteristics (HNED model).

2.1 Feature Vector 1: Housing Attributes

There are basic attributes of the house itself about which people can easily acquire information through advertisements or inspection. These attributes are primary functions that fulfil people’s dwelling needs. The first feature vector corresponds to the influence of these housing attributes. This follows the traditional hedonic model for housing appraisal. We select 18 property attributes (corresponding to an 18-dimensional feature vector): area size (total square metres), property type (unit, house, townhouse), number of bedrooms, number of bathrooms, parking, separate study, separate dining, separate family room, rumpus room, fireplace, walk-in wardrobe, air conditioning, balcony, en suite, garage, lockup garage, polished timber floor, and barbeque.

2.2 Feature Vectors 2 and 3: The Housing Location

Feature vectors 2 and 3 both relate to the location of the house. In the context of this paper, we interpret location as the accessibility to certain local amenities from the house, as well as the accessibility to important economic clusters from the house, such as the CBD, large shopping centres, universities, etc. Our model captures the quality, quantity and accessibility of the amenities.

There is a rich literature investigating the relationship between housing value and its neighbourhood [5, 6, 7, 10, 11]. Unlike this existing research, we emphasise the value of location rather than neighbourhood. Neighbourhood is the direct geographical and social influence around the individual property. The existing literature neglects the relationship between the individual property and the regional economic clusters within the metropolitan area. The urban sociologist Burgess [4] emphasised that urban growth expands radially from the CBD and from physically attractive neighbourhoods. This provides theoretical guidance for investigating how location and social networks influence housing prices.

Feature Vector 2: POI-based Neighbourhood Characteristics. The second feature vector focuses on small POIs (Points of Interest) within walking distance (one kilometre), such as shops, restaurants, schools, parks, public transport stops, etc. We consider the location-based social network (LBSN) as a cluster of important dots on a map. This location-based feature does not include larger POIs such as shopping centres and universities, which are covered by Feature Vector 3, described below.

Feature vector 2 is a one-dimensional vector. For this feature vector, we assume that every small POI within 1 km exerts an influence on the price of the house, the magnitude of which is one divided by the physical distance between the POI and the house. The total influence on the price of the house by all of the POIs is equal to the sum of the influences of the individual POIs. Thus, if we have three POIs at 0.2 km, 0.3 km and 0.5 km from the house, the value of Feature 2 for this house is:

$$\begin{aligned} \frac{1}{0.2} + \frac{1}{0.3} + \frac{1}{0.5} \end{aligned}$$
(1)
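
The sum in Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `poi_influence`, the `radius_km` parameter and the rejection of non-positive distances are assumptions introduced here.

```python
def poi_influence(distances_km, radius_km=1.0):
    """Sum of inverse distances over POIs within radius_km of the house.

    distances_km: distances (in km) from the house to each small POI.
    Raising on non-positive distances is an assumption of this sketch;
    the text does not say how a POI at the house location is handled.
    """
    total = 0.0
    for d in distances_km:
        if d <= 0:
            raise ValueError("distance must be positive")
        if d <= radius_km:  # POIs farther than the radius exert no influence
            total += 1.0 / d
    return total


# The three POIs from the worked example: 1/0.2 + 1/0.3 + 1/0.5
print(round(poi_influence([0.2, 0.3, 0.5]), 3))  # 10.333
```

Because the contribution grows as 1/d, a single very close POI can dominate the sum, which is one reason a sketch like this needs an explicit rule for very small distances.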

The dataset we use for feature vector 2 was collected from OpenStreetMap and includes 13 categories of POIs.

Feature Vector 3: Regional-Level Economic Clusters. Regional-level economic clusters bring a new dimension and new features into the model. They capture economic activities that do not happen at the neighbourhood level. As discussed in the introduction, capturing all features at the regional or metropolitan level is time-consuming and unrealistic. The challenge is to find the most important link between a location and the house. Here we assume that value exchange is the important link. Therefore, we try to capture the economic clusters that provide large economic value. As a preliminary exploration, our model simplifies the regional economic clusters to universities, regional shopping centres and the CBD [27]; the features are not limited to these services. Feature Vector 3 is defined in a way similar to Feature Vector 2, but it is multi-dimensional. The influence of each of the super clusters is weighted by some measure of the size of the cluster (e.g. for shopping centres, the number of shops; for universities, the annual revenue), and attenuated by the inverse of the physical distance between the cluster and the house. Unlike with Feature Vector 2, the economic cluster does not have to be within 1 km of the house to exert influence on the housing price.
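
One way to read this definition is as a size-weighted inverse-distance sum per cluster category. The sketch below is illustrative only and rests on several assumptions not stated in the text: great-circle (haversine) distance stands in for "physical distance", one vector component is produced per category, a floor `min_km` guards against division by zero, and all coordinates and sizes are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cluster_features(house_lat, house_lon, clusters, min_km=0.1):
    """Return one component per cluster category: the sum of size / distance.

    min_km is an assumed lower bound on distance (not from the paper).
    There is no distance cut-off, unlike the 1 km radius of Feature Vector 2.
    """
    features = {}
    for c in clusters:
        d = max(haversine_km(house_lat, house_lon, c["lat"], c["lon"]), min_km)
        features[c["category"]] = features.get(c["category"], 0.0) + c["size"] / d
    return features


# Hypothetical clusters: size is shop count for centres, revenue for universities.
clusters = [
    {"category": "shopping_centre", "lat": 0.0, "lon": 0.1, "size": 500},
    {"category": "university", "lat": 0.1, "lon": 0.0, "size": 1200.0},
]
print(cluster_features(0.0, 0.0, clusters))
```

Keeping a separate component per category avoids mixing units such as shop counts and revenue in a single number.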

2.3 Feature Vector 4: Socio-demographic Attributes

Feature Vector 4 involves the socio-demographic attributes of the suburb to which the house belongs, which characterise the social community that is physically closest to the house.

Previous urban economic research has explored the relationships between housing attributes and the demographic characteristics of the population, and found that socio-demographic profiles determine demand segregation and form different trends city-wide [27]. We include socio-demographic profiles in our modelling for two reasons. First, we consider that socio-demographic characteristics can have a long-term effect in shaping the local economy and community. Housing is not as easily transferable as a stock market investment, because of the difficulty of physically moving and the potential extra capital gains tax that penalises short-term ownership. Residents tend to stay in the same suburb for a long period and co-create the taste, economy and culture of the local community. The suburb grows with its residents. Second, human capital can generate externalities and have a spillover effect on neighbouring regions [9]. Related businesses are more likely to be adjacent and form a cluster effect.

Feature vector 4 captures this aspect by adding to the model the relationship between property price and the socio-demographic profile of the suburb. The components of this vector are carefully selected from four sections of the 2016 Australian census data. These include features about income, people and population, education and employment, and family and community. We use these features to understand people and their lives in each suburb.

Selected suburb-level features are shown in Fig. 1, using the Melbourne suburb of Brunswick as an example.

Fig. 1. Components of feature vector 4 and sample dataset

3 Experimental Settings

3.1 Data Description

We use metropolitan Melbourne, Australia as our study city. The Melbourne housing price data was provided by the Australian Urban Research Infrastructure Network (AURIN). We focus mainly on sold housing price data from 2018, which includes 161,179 recorded sold properties. We removed properties whose geographical locations were missing, as well as those whose prices were less than 10,000. The remaining dataset contains 158,588 properties.
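
The two cleaning rules can be sketched as a simple filter. The record layout and field names (`price`, `lat`, `lon`) are hypothetical, as are the sample rows; the actual AURIN schema is not described in the text.

```python
def clean_sales(records, min_price=10_000):
    """Keep sales that have a location and a price of at least min_price."""
    kept = []
    for rec in records:
        if rec.get("lat") is None or rec.get("lon") is None:
            continue  # geographical location missing
        if rec["price"] < min_price:
            continue  # price below the 10,000 threshold
        kept.append(rec)
    return kept


# Made-up sample rows, for illustration only.
sample = [
    {"price": 850_000, "lat": -37.77, "lon": 144.96},
    {"price": 5_000, "lat": -37.80, "lon": 144.97},  # dropped: price too low
    {"price": 620_000, "lat": None, "lon": None},    # dropped: no location
]
print(len(clean_sales(sample)))  # 1
```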

To investigate how location relates to housing price, we collect POIs and public transport stops from OpenStreetMap. We also collect primary school and high school rankings based on their standardised exam results. From TripAdvisor, we collect information on local restaurants and their rankings.

We also collect data to understand regional economic clusters, which include 42 shopping centres and total shop numbers in each centre, 9 universities located in Melbourne and total revenue of each university in 2019.

For the socio-demographic data, we use the 2016 census data from the Australian Bureau of Statistics.

3.2 Algorithms

We cast the problem of housing price appraisal in two forms. The first form is estimating the dollar value of the housing price (so the target variable is a continuous variable). The second form is estimating the price range of the house (so the target variable is a categorical variable, representing the estimated price range).
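
The second form requires mapping each sale price to a range label. A minimal sketch of that mapping follows; the bin edges are hypothetical, since the price ranges actually used are not stated here.

```python
from bisect import bisect_right

# Hypothetical bin edges in dollars (assumption: not taken from the paper).
BIN_EDGES = [500_000, 750_000, 1_000_000, 1_500_000]


def price_range_label(price, edges=BIN_EDGES):
    """Return the index of the price range that contains price.

    Label 0 is below the first edge; label len(edges) is at or above the last.
    A price equal to an edge falls into the higher range.
    """
    return bisect_right(edges, price)


print(price_range_label(620_000))    # 1
print(price_range_label(2_000_000))  # 4
```

The continuous target keeps the raw price, while the categorical target uses these integer labels as classes.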

In the dollar value estimation, we use Linear Regression as the baseline algorithm, as this is what is commonly used for such problems in classical economics. The performance of three algorithms (Support Vector Machine, Multilayer Perceptron, XGBoost) was compared against the performance of the baseline.

In the price range estimation, we use Logistic Regression as the baseline (classical method), and the performance of four algorithms (Support Vector Machine, Multilayer Perceptron, k-Nearest Neighbour, XGBoost) was compared against the performance of the baseline.

Performance evaluation of the dollar value estimation uses the standard metrics of Mean Absolute Error (MAE), Root-Mean-Squared Error (RMSE), and the Coefficient of Determination (\(R^2\)). Performance evaluation of the price range estimation uses the standard metrics for a classification problem of Accuracy, Precision, Recall, and F1.
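
For reference, the three regression metrics can be computed from their standard definitions as sketched below. The function name is an assumption, and the text does not say which library or target scaling was used in the experiments.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired true and predicted values.

    R^2 = 1 - SS_res / SS_tot; it is undefined when y_true is constant,
    in which case this sketch raises ZeroDivisionError.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {"mae": mae, "rmse": rmse, "r2": 1.0 - ss_res / ss_tot}


# Predicting the mean of y_true everywhere gives R^2 = 0.
print(regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

MAE and RMSE are in the units of the target, so they are only comparable across models trained on the same target scale, whereas \(R^2\) is unitless.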

4 Results and Analysis

4.1 Overall Performance

Table 1 shows the performance of the different algorithms in the task of dollar value estimation. The performance of XGBoost is considerably better than that of the other algorithms, achieving an \(R^2\) value of 0.8779, compared to the baseline performance of 0.6422.

Table 1. Housing price estimation

Table 2 shows the performance of the different algorithms in the task of price range estimation. XGBoost again performs considerably better than the other algorithms.

Table 2. Housing price range estimation (Classification)
Table 3. Housing price estimation performance using a subset of the feature vectors

4.2 The Importance of the Regional Cluster Variable

In this section, we discuss an important finding: regional cluster variables played a significant role in prediction [14, 29]. Table 3 reports results for different combinations of the feature vectors. The regional economic cluster vector (feature vector 3) reached an \(R^2\) of 0.6290 on its own, and combinations that included it consistently achieved the highest performance. Notably, feature vectors 1 and 3 combined reached an MAE of 0.1552 and an \(R^2\) of 0.8585, which means that housing attributes together with regional economic cluster variables can give a good prediction.
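
A subset comparison of this kind can be organised as a loop over combinations of the four feature vectors. The sketch below only enumerates the subsets and delegates scoring to a caller-supplied `evaluate` function; the names are hypothetical, and it is an assumption that every non-empty subset is of interest.

```python
from itertools import combinations

# Short names for the four HNED feature vectors (labels chosen for this sketch).
FEATURE_VECTORS = ("housing", "neighbourhood", "economic_clusters", "demographics")


def feature_subsets(names=FEATURE_VECTORS):
    """Yield every non-empty combination of feature-vector names."""
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            yield combo


def run_ablation(evaluate, names=FEATURE_VECTORS):
    """Map each subset to the score returned by evaluate(subset).

    evaluate is supplied by the caller, e.g. a function that trains a model
    on the chosen feature vectors and returns its R^2 on held-out data.
    """
    return {combo: evaluate(combo) for combo in feature_subsets(names)}


# Placeholder scorer that just counts the vectors in each subset.
scores = run_ablation(len)
print(len(scores))  # 15
```

With four feature vectors there are 15 non-empty subsets, so an exhaustive comparison remains cheap relative to model training itself.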

5 Discussions and Implications

Our results show how housing price is related to housing attributes, location and socioeconomic characteristics. Each element has different implications for different social agents, such as home buyers, investors, local and regional councils, and urban planners.

5.1 Implications for Home Buyers

Generally, home buyers consider both the current living functions and the investment value of a property. Firstly, housing attributes are the primary focus in meeting the daily needs of dwelling. An extra bedroom or bathroom can drive the property value up through better functionality. Secondly, people place more value on being in a highly ranked school zone. Walking distance to schools, supermarkets and public transport is preferred by most home buyers. However, the power of a strong connection to regional economic clusters may be neglected in the decision-making process. To capture a long-term investment return, home buyers need to identify a location with a growing, highly educated population and a strong connection to regional economic clusters.

5.2 Implications for Councils and Urban Planning

Based on our results, income and education are strong indicators of housing value, with investment income being especially relevant. High investment income indicates people who have multiple sources of income or who are business owners. Firstly, councils can attract this group of people by stimulating business districts, developing business parks, or building strong industrial clusters. Secondly, councils can attract young, highly educated or highly skilled people to settle down by providing high-quality infrastructure, such as high-quality public schools, welcoming new campuses of highly ranked private schools, or planning shopping centres and sports facilities. Thirdly, councils can make strategic long-term plans to foster regional economic clusters in a prominent industry, such as an IT cluster, education cluster, medical cluster or warehouse cluster. By forming a super cluster, local areas can concentrate human capital in one area of expertise and achieve a high economic growth rate.

5.3 Implications for Real-Estate Investors and Developers

Both investors and developers need to identify areas of high housing demand. Investors focus on both future return and sustainable rental demand. A good location is essential for both rental income and long-term return. Rich POIs in the neighbourhood and a short distance to regional economic clusters indicate a good location. The growth of a highly educated, skilled population in an area will contribute to future demand for that location and hence drive the future property return. Developers also need to consider how best to meet the housing needs of potential buyers: an extra bedroom or bathroom can significantly increase house value.

6 Related Work

In this section, we discuss the development of housing appraisal methods in both traditional housing market research and computer science. Both fields have seen recent development of methods for dealing with spatial data. Spatial autocorrelation and spatial heterogeneity, recognised in both fields, are the two main challenges in models involving spatial data.

In the computer science field, the focus is more on how to incorporate newly available data into the house price prediction model. Most of these new data help to expand the understanding of how geographical characteristics relate to housing valuation. The nature of these newly available data is often beyond the scope of traditional economic variables; therefore, new methods have also been introduced into the real estate valuation field. For example, satellite images [1], street-view images, and house images from real estate websites [22, 25, 30] are used for house price prediction [6, 16, 18]. Other papers investigated how satellite images and street views can help to understand the neighbourhood [23], demography [12] and commercial activities. These results are also highly relevant to housing appraisal, though they did not investigate this question directly.

Besides image data, other types of data are used for housing appraisal beyond the traditional scope of economic modelling. OpenStreetMap data was used for collecting Points of Interest in the neighbourhood [6, 11]. Mobile phone data was used to empirically study human activity and urban vitality [7]. Taxicab trajectory data was mined to estimate neighbourhood popularity, in order to understand the geographic factors for housing appraisal [11]. The Google search index was used in a housing prediction model to understand how people’s attention to real estate would influence the future housing price [28]. Text data from real-estate-related news was studied to learn how sentiment is related to housing price [17, 26].

These data applications are innovative and inspiring for improving housing price modelling. However, most of the newly applied spatial data are focused on the discovery of neighbourhood characteristics, such as crime perception [3], walkability [5] and cultural influence [13]. These applications neglect factors beyond the neighbourhood, which are the main focus of our work.

7 Conclusions and Future Work

We have studied how factors beyond the neighbourhood impact housing values. Specifically, we established regional economic clusters as a significant source of impact beyond the neighbourhood. We presented our housing price appraisal model, which combines housing attributes, neighbourhood characteristics, regional economic clusters, and demographic factors. Using the XGBoost algorithm, our model reached 0.88 in \(R^2\), showing the significant impact of regional economic clusters.

Our work suggests two related research questions worthy of future investigation. (1) Building a customised recommendation system for home buyers. This system would aim to tailor and optimise for personal needs of affordable living, provide smart suggestions for trade-offs between different needs, and open up opportunities for new locations. (2) Developing methods to systematically identify regional economic clusters and appropriately weight these clusters in our model. With a better understanding of this behavioural mechanism, we could improve our community, facilitate sustainable gentrification, and encourage location diffusion, population growth, and the emergence of new regional economic clusters.