1 Difficulties in Using Conventional Hedonic Models for Mass Appraisal

The main challenge in Automated Valuation Modelling (AVM) is to build models with reasonable precision, able to estimate values for a single case or for a large number of properties (a property portfolio, for instance). AVM may help to save time and money in real estate appraisal, but it needs to be based on mass appraisal techniques, using tested models and a sound database.

In general, individual real estate appraisal (sometimes called “commercial appraisal”) uses hedonic price models, calculated by multiple regression analysis and based on a small sample of market cases. The result is an equation, which is then used to estimate values. The precision of the estimated values is linked to a low level of error. Automated Valuation Modelling may follow the same scheme, but using a large quantity of market data and more general statistical models. However, building a single model to evaluate a large number of properties is not a trivial task: real properties may differ greatly from one another, location characteristics are difficult to measure, and so on. Furthermore, the coefficients of regression models are calculated based on the average values of each variable included in the model, so properties whose characteristics are far from these averages will have larger errors. One alternative is to build a set of models (one model for each sub-market, for instance), but in this case the problem is the segmentation criterion. The analyst needs to divide the set of properties into clear and well-defined classes, defining the limits or frontiers of each model. Nevertheless, different models will potentially present different estimates for similar properties near a border. For instance, neighbouring regions A and B may yield different estimated values for two similar properties PA and PB, when calculated by Model A and Model B, respectively, creating a gap or step in the estimated values (Fig. 1).

Fig. 1 Neighbour models with different valuations

These differences may occur because frontiers between sub-markets are commonly not abrupt, but continuous. Values of similar properties are probably similar too, so there is a continuous range of values across neighbouring sub-markets. Fuzzy logic offers a means of softening these frontiers. A possible solution for mass appraisal is therefore to divide a market into sub-markets (with their corresponding models), but in a fuzzy way. In what follows, we introduce elements of the theory, then propose and present an application of a fuzzy system designed for real estate appraisal in AVM.

2 Hedonic Price Models

The representation of a housing market may be developed with econometric models, which include measures representing the most important attributes. Hedonic price models are widespread in urban economics, based on the theory founded by Court in the 1930s (Court 1939), with important contributions from Griliches (1971) and Rosen (1974). These models basically present a relationship between observed prices and property characteristics (Robinson 1979). The analyst should establish hypotheses about the relationships between these characteristics (explanatory variables) and sale price (explained variable), proposing a format for the model. Transaction data (evidence of market behaviour) should be collected to develop the models, which should be tested to verify whether they are able to represent the market segment in question. Statistical tests allow the analyst to evaluate the model itself and the individual importance of the variables included, within a certain degree of accuracy, indicating the overall quality of the formulated model. A conventional model assumes a format such as Eq. (1):

$${\text{Y}} =\upalpha_{0} +\upalpha_{ 1} {\text{X}}_{ 1} +\upalpha_{ 2} {\text{X}}_{ 2} + \cdots +\upalpha_{\text{k}} {\text{X}}_{\text{k}} +\upvarepsilon = {\hat{\text{Y}}} +\upvarepsilon$$
(1)

This format is known as the ‘classical linear model’, in which Y is the explained variable (usually sale price), X1, …, Xk are the explanatory variables (the characteristics of the property, location and sale conditions), α0 is the intercept, α1, …, αk are partial regression coefficients (also known as implicit hedonic prices), Ŷ is the estimate of the explained variable, and ε is the error term (ε = Y − Ŷ). The coefficients αi are often fitted using Ordinary Least Squares (OLS). Of course, several assumptions are needed for OLS to be unbiased and efficient, such as linearity, constant variance and normally distributed errors (Gujarati 2009; Kutner et al. 2010).

3 Classic and Fuzzy Logics

A conventional hedonic model is based on a sample set that follows classic set theory. In classic sets, membership is simple: a case either belongs or does not belong to a particular set. A classic set follows a ‘true or false’ scheme (in mathematical terms, membership takes values in {0, 1}, based on Aristotelian or bivalent logic). Following this reasoning, the limits of the sample set define the limits of model validity (Fig. 1) (Cordón et al. 2001). Properties to be evaluated by a hedonic model need to be within the range of the sample set. For example, a division into three sub-markets by size associates each case with one single set. Under this scheme, a property is either small, medium or large, and its value is calculated by a single model, following a format such as Eq. (1). Therefore, the value of a ‘medium’ property P is estimated by Eq. (2):

$${\text{Value}}\left( {\text{P}} \right) = {\text{Model}}\_{\text{M}}\left( {\text{P}} \right)$$
(2)

where Value(P) is the estimate for the particular characteristics of property P and Model_M is a model estimated using a sample of market cases classified as ‘medium’. There could also be independent models to estimate small (Model_S) and large (Model_L) properties, separated by size limits such as LimSM and LimML. In graphical terms, a group of classic sets looks like Fig. 2. Properties with size in [LimSM; LimML], like ‘P’ in Fig. 2, are named ‘medium’ properties, and so on.

Fig. 2 Example of a group of classical sets divided by property size, with a case P

Fuzzy sets follow a different scheme. There are some works on the application of fuzzy logic to property valuation, such as Byrne (1995), Smith and Bagnoli (1997), and d’Amato and Siniak (2003, 2008). Membership in fuzzy sets may assume any value in the real interval [0, 1]; there is a degree of truth in each fuzzy set. A fuzzy system, basically an organized group of fuzzy sets, has not a simple membership but a multiple membership: each case may belong simultaneously to two or more fuzzy sets, with different membership degrees. The general membership of a case in the fuzzy system is the sum of its partial memberships (Cordón et al. 2001; Zadeh 1965). A fuzzy system applied to mass appraisal with several fuzzy sets follows a weighted sum, such as Eq. (3):

$${\text{Value}}\left( {\text{P}} \right) = \sum\limits_{{\text{i}} = 1}^{\text{n}} \upmu_{\text{i}} \left( {\text{P}} \right) \cdot {\text{Model}}_{\text{i}} \left( {\text{P}} \right)$$
(3)

where Value(P) is the valuation using the particular characteristics of property P, Model_i(P) is the value calculated for P using Model_i, μi(P) is the membership of case P in fuzzy set i, i = {1, …, n}, n is the number of fuzzy sets and ∑i μi = 1. For instance, in a fuzzy system with three fuzzy sets based on property size (such as in Fig. 3), a property P could have a membership degree of μS = 0.30 in the ‘small’ set (S), μM = 0.70 in the ‘medium’ set (M) and zero membership (μL = 0) in the ‘large’ fuzzy set (L). Another property, with a slightly larger floor area, would have μS = 0.29, μM = 0.71 and μL = 0, and so on. Transitions between two classes become soft. Value(P) is a weighted average of three valuations and therefore permits a continuous range of values.
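As a minimal sketch of Eq. (3), the Python fragment below combines three model estimates using the memberships quoted in the example (μS = 0.30, μM = 0.70, μL = 0, and then 0.29, 0.71, 0). The function name `fuzzy_value` and the three model estimates are hypothetical, chosen only for illustration; they are not results from the study.

```python
def fuzzy_value(memberships, estimates):
    """Weighted sum of Eq. (3): sum over i of mu_i(P) * Model_i(P)."""
    if abs(sum(memberships.values()) - 1.0) > 1e-9:
        raise ValueError("memberships must sum to 1")
    return sum(mu * estimates[name] for name, mu in memberships.items())


# Memberships taken from the example in the text.
mu_first = {"S": 0.30, "M": 0.70, "L": 0.0}
mu_second = {"S": 0.29, "M": 0.71, "L": 0.0}

# Hypothetical model outputs for the same property (illustrative only).
estimates = {"S": 100_000.0, "M": 110_000.0, "L": 125_000.0}

value_first = fuzzy_value(mu_first, estimates)
value_second = fuzzy_value(mu_second, estimates)
print(round(value_first, 2), round(value_second, 2))
```

With these illustrative figures, a shift of 0.01 in membership changes the estimate by a small amount rather than by a jump from one model to the other, which is the soft transition described above.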

Fig. 3 Example of a fuzzy system divided by property size, with a case P

Defining a fuzzy system begins by choosing the number of fuzzy sets. Next, the shape and limits of each fuzzy set must be defined. It is very common to use triangular or trapezoidal functions for fuzzy sets; other useful formats are sigmoid and Gaussian (Cordón et al. 2001). A group of trapezoidal fuzzy sets has a form like that in Fig. 3, and the limits could be LimSM and LimML. If the trapezoidal shape is chosen, the first step is to define the limit values at which two neighbouring fuzzy sets have equal membership (a specific value at which μS = μM = 0.5 and another at which μM = μL = 0.5), and the second is to define transition rules (the slope of each transition). The slope of each side of the trapezium may be determined using equal angles or an equal rate of value change. Based on these values, the analyst can define the ranges of full membership in each set (values for which μS = 1.0, μM = 1.0 or μL = 1.0).

4 Fuzzy Sets Designed for a Two-Level Fuzzy-Hedonic Mass Appraisal System

The proposed system for AVM is a combination of conventional hedonic price functions and two fuzzy levels. We chose property size and location as the criteria to create these fuzzy systems. In this particular case, both are known to be important for property values, and they are significant in defining sub-markets. The first varies in one direction (or axis) and the second is spatial. These characteristics make a difference to the fuzzy functions, the calculation of memberships, and other elements.

4.1 Fuzzy Sets Based on Size

To consider the influence of size, the range of floor area may be divided into groups. In this case, it was divided into three groups, but it could be divided into 4, 5, or more classes. Accordingly, Eq. (3) may be written as Eq. (4):

$${\text{Value}}\left( {\text{P}} \right) = \upmu_{\text{S}} \cdot {\text{Model}}\_{\text{S}}\left( {\text{P}} \right) + \upmu_{\text{M}} \cdot {\text{Model}}\_{\text{M}}\left( {\text{P}} \right) + \upmu_{\text{L}} \cdot {\text{Model}}\_{\text{L}}\left( {\text{P}} \right)$$
(4)

where Model_S(P), Model_M(P), Model_L(P) are the estimates of models S, M and L applied to property P, and μS, μM, μL are membership degrees of property P to each class.

We chose a slope of 20 % of the size limits at each crossing (factors 0.8 and 1.2). In this case, the membership functions follow the scheme presented in Fig. 3 and detailed in Exhibit 1:

Exhibit 1 Membership functions for trapezoidal fuzzy sets

  • μS = 1, μM = 0 and μL = 0, if size < LimSM*0.8

  • μM = (size-LimSM*0.8)/(LimSM*0.4), μL = 0 and μS = 1-μM, if size ∈ [LimSM*0.8; LimSM*1.2]

  • μM = 1, μS = 0 and μL = 0, if size ∈ (LimSM*1.2; LimML*0.8)

  • μM = (LimML*1.2-size)/(LimML*0.4), μS = 0 and μL = 1-μM, if size ∈ [LimML*0.8; LimML*1.2]

  • μL = 1, μS = 0 and μM = 0 if size > LimML*1.2

In practice, for any given property size, only one or two membership functions take non-zero values.
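The rules of Exhibit 1 can be sketched in Python as follows. The function name `size_memberships`, the `slope` parameter and the limits of 100 and 200 m² used in the demonstration are assumptions made for illustration; the sketch also assumes that the two transition zones do not overlap, a case Exhibit 1 does not cover.

```python
def size_memberships(size, lim_sm, lim_ml, slope=0.2):
    """Return (mu_S, mu_M, mu_L) for a floor area, following Exhibit 1.

    Each transition runs from (1 - slope) to (1 + slope) times the limit,
    so its width is 0.4 * limit when slope = 0.2.
    """
    sm_lo, sm_hi = lim_sm * (1 - slope), lim_sm * (1 + slope)
    ml_lo, ml_hi = lim_ml * (1 - slope), lim_ml * (1 + slope)
    if sm_hi >= ml_lo:
        raise ValueError("transition zones overlap; not covered by Exhibit 1")

    if size < sm_lo:                      # fully small
        return 1.0, 0.0, 0.0
    if size <= sm_hi:                     # small/medium transition
        mu_m = (size - sm_lo) / (sm_hi - sm_lo)
        return 1.0 - mu_m, mu_m, 0.0
    if size < ml_lo:                      # fully medium
        return 0.0, 1.0, 0.0
    if size <= ml_hi:                     # medium/large transition
        mu_m = (ml_hi - size) / (ml_hi - ml_lo)
        return 0.0, mu_m, 1.0 - mu_m
    return 0.0, 0.0, 1.0                  # fully large


# Hypothetical limits, for illustration only.
for size in (50, 100, 140, 200, 300):
    print(size, size_memberships(size, lim_sm=100.0, lim_ml=200.0))
```

At each limit the two neighbouring sets receive equal membership (0.5 each), and the three memberships always sum to 1.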

4.2 Fuzzy Sets Based on Distance

In the case of location, the relationship of a property with neighbouring properties occurs in all directions (360°). The general influence is the weighted sum of the influences in all directions. The participation of neighbouring cases in the final value depends on the weighting scheme defined for the fuzzy sets. In this format, participation is more significant for nearer units. A format that weights cases by the inverse of distance is an interesting option. Figure 4 presents two inverse-distance fuzzy sets, using 1/d^k, with k = 1 and k = 2 (1/d and 1/d^2). Increasing k reinforces the membership values of neighbouring points (weighting nearby cases more strongly), so the importance of neighbouring cases in the final value increases.

Fig. 4 Two conical fuzzy sets with membership determined by distance to centre point

A fuzzy system for distance composed of m fuzzy sets may be described as in Eq. (5):

$${\text{Value}}\left( {\text{P}} \right) = \sum\limits_{{\text{j}} = 1}^{\text{m}} \upmu_{\text{j}} \left( {\text{P}} \right) \cdot {\text{Value}}_{\text{j}} \left( {\text{P}} \right)$$
(5)

where Value(P) is the valuation applied to property P; μj(P) is the membership of P in each fuzzy set j; Value_j(P) is the value calculated for P using Model_j; μj(P) = distance(P, j)^(−k)/w, with w = ∑j[distance(P, j)^(−k)], so that ∑j μj(P) = 1; distance(P, j) is the linear distance from case P to the reference centre of Model_j, which has coordinates (xj, yj); k is the exponent that sets the weight given to the influence of distance; and j = {1, …, m}.
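The normalised inverse-distance memberships of Eq. (5) can be sketched as below. The function name `distance_memberships`, the 3 × 3 lattice of centres spaced 1 km apart and the position of the property are assumptions for illustration. The sketch raises an error when the property coincides with a module centre, since the inverse distance is undefined there; the text does not say how that case is handled.

```python
import math


def distance_memberships(point, centres, k=2):
    """Return mu_j = d_j**(-k) / w, with w = sum_j d_j**(-k), as in Eq. (5)."""
    raw = []
    for cx, cy in centres:
        d = math.hypot(point[0] - cx, point[1] - cy)
        if d == 0.0:
            raise ValueError("property coincides with a module centre")
        raw.append(d ** -k)
    w = sum(raw)
    return [r / w for r in raw]


# Hypothetical lattice: nine centres 1 km apart, central module at index 4.
centres = [(x, y) for y in (1.0, 0.0, -1.0) for x in (-1.0, 0.0, 1.0)]
point = (0.1, 0.2)  # illustrative property position, in km

mu_k1 = distance_memberships(point, centres, k=1)
mu_k2 = distance_memberships(point, centres, k=2)
print(round(mu_k1[4], 3), round(mu_k2[4], 3))
```

In this illustrative configuration the central module receives a larger share with k = 2 than with k = 1, which matches the remark that increasing k weights nearby cases more strongly.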

The complete fuzzy system consists of a weighted sum of partial estimates by size, which are then weighted by distance. The double-fuzzy value is calculated by combining Eqs. (4) and (5), generating Eq. (6):

$$\begin{aligned} {\text{Double-fuzzy}}\left( {\text{P}} \right) & = \sum\limits_{{\text{j}} = 1}^{\text{m}} \left\{ {\upmu_{\text{j}} \left( {\text{P}} \right) \cdot \sum\limits_{{\text{i}} = 1}^{\text{n}} \left[ {\upmu_{\text{i}} \left( {\text{P}} \right) \cdot {\text{Model}}_{\text{i}} \left( {\text{P}} \right)} \right]} \right\} \\ & = \sum\limits_{{\text{j}} = 1}^{\text{m}} \left\{ {\upmu_{\text{j}} \left( {\text{P}} \right) \cdot \left[ {\upmu_{\text{S}} \cdot {\text{Model}}\_{\text{S}}\left( {\text{P}} \right) + \upmu_{\text{M}} \cdot {\text{Model}}\_{\text{M}}\left( {\text{P}} \right) + \upmu_{\text{L}} \cdot {\text{Model}}\_{\text{L}}\left( {\text{P}} \right)} \right]} \right\} \\ \end{aligned}$$
(6)
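A minimal sketch of Eq. (6) in Python follows. It assumes that each module has its own size memberships and its own three model estimates for the property. The function name `double_fuzzy` and all numbers in the demonstration (two modules only, to keep it short) are hypothetical.

```python
def double_fuzzy(distance_mu, size_mu_by_module, estimates_by_module):
    """Eq. (6): distance-weighted sum of the size-weighted estimates."""
    if not (len(distance_mu) == len(size_mu_by_module) == len(estimates_by_module)):
        raise ValueError("one entry per module is required in every argument")
    total = 0.0
    for mu_j, size_mu, estimates in zip(distance_mu, size_mu_by_module, estimates_by_module):
        # Inner sum, Eq. (4): weight the S, M and L estimates of this module.
        value_j = sum(mu_i * est for mu_i, est in zip(size_mu, estimates))
        # Outer sum, Eq. (5): weight the module value by its distance membership.
        total += mu_j * value_j
    return total


# Hypothetical two-module illustration (memberships and estimates are made up).
distance_mu = [0.75, 0.25]
size_mu_by_module = [(0.0, 0.5, 0.5), (0.0, 1.0, 0.0)]   # (mu_S, mu_M, mu_L)
estimates_by_module = [
    (90_000.0, 100_000.0, 120_000.0),                     # (Model_S, Model_M, Model_L)
    (95_000.0, 104_000.0, 118_000.0),
]
value = double_fuzzy(distance_mu, size_mu_by_module, estimates_by_module)
print(value)
```

Here the first module contributes 0.75 × 110,000 and the second 0.25 × 104,000, so the combined estimate lies between the two module values.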

5 Developing the Fuzzy-AVM System

In what follows, we present an example of this system. This process may be viewed as the first step in building an AVM, and for that reason it may seem a bit complex and time-consuming. However, after this step, it is only necessary to include new market cases to feed the database and to adapt the models to market changes, with minor effort.

The proposed fuzzy system is compared with conventional hedonic and surface models. We use a modular system, considering a lattice of square modules of 1 × 1 km (as in Fig. 5). Location memberships are based on the distance from the case under valuation to the centre of each module in the lattice, and use a weighting based on the inverse of the squared distance.

Fig. 5 Spatial configuration of the proposed system

The centre of each module is the reference for model location. In each module, three models were calculated, designed to measure values in three different classes of property size (small, medium and large). The limits of each size class were determined independently for each module.

Data collected to develop the example. The system uses a database with more than 160 thousand cases of apartment sales in Porto Alegre, a city in southern Brazil. We use a cross-section/time-series sample covering the whole urban area of the city. The sales occurred between 1998 and 2014. The data available include physical, location and sale characteristics. Table 1 presents the variables and basic statistics, and Fig. 6 illustrates the location of the properties and the positioning of the lattice of modules.

Table 1 Variables and statistical properties for data collected
Fig. 6 Positioning of sample cases and the 9-module lattice (the point 0.0 represents the CBD; approximate scale 1:10,000)

Sale price was originally in Brazilian Reais (R$) and was converted to euros using the exchange rate of July 2014. Size, quality and age of the properties are as recorded in the municipal cadastre. District quality is a subjective attribute, determined based on the authors’ experience. Coordinates are measured using the Central Business District as the reference point (CBD, the historical and commercial city centre, located at 0.0 in Fig. 6). Distance from CBD is the linear distance, in km, from each property to the CBD. Sale date was converted to a continuous scale of months, in which the month of the first sale represents Time = 0. Characteristics such as number of rooms and parking spaces were not available in the source data. As shown in Fig. 5, we use 9 fuzzy sets to consider location in the proposed system. The central module (5) is the target used to evaluate the system. The demonstration covers part of the city (almost 6 % of the urban area).

A sample was extracted from the database for the region under analysis (in white in Fig. 6). Nine individual samples of equal size (6,000 cases) were obtained, one for each module centre (the 6,000 nearest cases, using distance to the module centre as the neighbourhood measure). Cases were classified by size into three classes, resulting in three sub-samples of equal size (2,000 in each size class). The size limits are denoted LimSM (division between small and medium properties) and LimML (between medium and large properties); see Fig. 3 and Table 2. Then 80 % of the cases in each sub-sample were randomly selected to calculate the models, and the other 20 % were set aside to verify the statistical quality of these models. Limits differ between modules because the differences in size in each module probably represent real market differences among properties and matter in appraisals.
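The sampling procedure described above can be sketched in Python as follows. The function name `build_module_samples`, the dictionary keys and the synthetic cases are hypothetical. The text does not state exactly how LimSM and LimML are derived from the equal-sized classes; taking the midpoint between adjacent classes is an assumption of this sketch.

```python
import math
import random


def build_module_samples(cases, centre, n_nearest=6000, train_share=0.8, seed=0):
    """Select the nearest cases to a module centre and split them by size.

    `cases` is a list of dicts with keys "x", "y" (km) and "size" (m2).
    Returns the two size limits and, per class, a (train, test) pair.
    """
    cx, cy = centre
    nearest = sorted(cases, key=lambda c: math.hypot(c["x"] - cx, c["y"] - cy))
    nearest = nearest[:n_nearest]

    # Three classes of equal size, ordered by floor area.
    by_size = sorted(nearest, key=lambda c: c["size"])
    third = len(by_size) // 3
    if third == 0:
        raise ValueError("at least three cases are needed")
    classes = {
        "S": by_size[:third],
        "M": by_size[third:2 * third],
        "L": by_size[2 * third:],
    }

    # Assumption: each limit is the midpoint between adjacent classes.
    lim_sm = (classes["S"][-1]["size"] + classes["M"][0]["size"]) / 2
    lim_ml = (classes["M"][-1]["size"] + classes["L"][0]["size"]) / 2

    # Random 80/20 split inside each size class.
    rng = random.Random(seed)
    splits = {}
    for name, members in classes.items():
        shuffled = members[:]
        rng.shuffle(shuffled)
        cut = round(train_share * len(shuffled))
        splits[name] = (shuffled[:cut], shuffled[cut:])
    return lim_sm, lim_ml, splits


# Synthetic cases, for illustration only (not the Porto Alegre data).
gen = random.Random(1)
cases = [
    {"x": gen.uniform(-2, 2), "y": gen.uniform(-2, 2), "size": gen.uniform(40, 300)}
    for _ in range(900)
]
lim_sm, lim_ml, splits = build_module_samples(cases, centre=(0.0, 0.0), n_nearest=600)
print(lim_sm, lim_ml, {k: (len(tr), len(te)) for k, (tr, te) in splits.items()})
```

Because the limits come from each module's own sample, running the function for different centres gives different values of LimSM and LimML, as in Table 2.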

Table 2 Property size in each module

6 Models to be Estimated

Modelling starts with an exploration of the available data. After studying some alternatives for model and variable formats, a set of variables was defined. Initial estimations using a Box-Cox procedure indicated better statistical results with an exponent of θ = 1.25 (Eq. 7; this is equivalent to using Value^0.8 as the dependent variable). The same general format was used for all equations, to keep the system a little simpler. Distance from the Central Business District was the main location characteristic. Distances from other points, such as shopping centres and leisure points, were tested but did not give better results. The hedonic price models follow a quite simple format (Eq. 7).

$$\begin{aligned} {\text{Value}} = & \,\left( {{\text{a}}_{0} + {\text{a}}_{ 1} .{\text{Property}}\_{\text{Size}} + {\text{a}}_{ 2} .{\text{Property}}\_{\text{Quality}} + {\text{a}}_{ 3} .{\text{Age}} + {\text{a}}_{ 4} .{\text{Time}}} \right. \\ & \left. { + \,{\text{a}}_{ 5} .{\text{Distance}}\_{\text{CBD}} + {\text{a}}_{ 6} .{\text{District}}\_{\text{Quality}}} \right)^{\uptheta} \\ \end{aligned}$$
(7)
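The relation between Eq. (7) and the transformed dependent variable can be sketched as follows. The coefficients and the property characteristics are hypothetical, chosen only to make the example run; the function names are also assumptions.

```python
THETA = 1.25  # exponent reported in the text


def predict_value(coeffs, features, theta=THETA):
    """Eq. (7): Value = (a0 + a1*x1 + ... + a6*x6) ** theta."""
    linear = coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], features))
    if linear < 0:
        raise ValueError("linear predictor must be non-negative")
    return linear ** theta


def transform_price(price, theta=THETA):
    """Dependent variable on the linear scale: Value ** (1 / theta)."""
    return price ** (1.0 / theta)


# Hypothetical coefficients a0..a6 and characteristics, in the order
# size, quality, age, time, distance to CBD, district quality.
coeffs = [2000.0, 60.0, 500.0, -20.0, 15.0, -300.0, 400.0]
features = [185.0, 3.0, 10.0, 120.0, 2.5, 4.0]

value = predict_value(coeffs, features)
linear_scale = transform_price(value)
print(round(value, 2), round(linear_scale, 2))
```

Since 1/1.25 = 0.8, raising the price to the power 0.8 recovers the linear predictor, so the coefficients can be fitted by ordinary regression on the transformed price.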

The surface model is very similar (Eq. 8). The location variables (Distance_CBD and District_Quality) are replaced by terms combining the coordinates (X, Y) up to degree 2. Surfaces of degrees 3 and 4 were also estimated, but they were no better than the degree-2 surface.

$$\begin{aligned} {\text{Value}} & = \left( {{\text{b}}_{0} + {\text{b}}_{ 1} .{\text{Property}}\_{\text{Size}} + {\text{b}}_{ 2} .{\text{Property}}\_{\text{Quality}} + {\text{b}}_{ 3} .{\text{Age}}} \right. \\ & \quad \left. { + \,{\text{b}}_{ 4} .{\text{Time}} + {\text{b}}_{ 5} .{\text{X}} + {\text{b}}_{ 6} .{\text{Y}} + {\text{b}}_{ 7} .{\text{X}}^{ 2} + {\text{b}}_{ 8} .{\text{XY}} + {\text{b}}_{ 9} .{\text{Y}}^{ 2} } \right)^{\uptheta} \\ \end{aligned}$$
(8)

Five models were estimated in each module (Table 3), resulting in 45 models, all built using the same routine. Furthermore, two general models (G.C and G.S) were estimated using all cases in the sample (31,618 cases).

Table 3 Scheme for HPM estimated in each module

where M = {1, …, 9}. All models were examined in light of the regression assumptions, with hypothesis tests for the model and the regressors at α = 0.05. In most cases, models and regressors reach the α = 0.01 level.

7 Results Obtained for the Example System

Initially we present a detailed example of the calculations for an individual property. A case located in module 5 was randomly selected to demonstrate the procedure for obtaining fuzzy values. The case has 185.00 m² of floor area and its sale price is €158,401.12. The first stage uses hedonic models. The initial estimate using the general conventional model (G.C) indicates a value of €160,758.32, and the valuation using the general surface model (G.S) reaches €162,922.06. Similarly, the conventional (5.C) and surface (5.S) models of module 5 give estimates of €160,758.38 and €162,922.08, respectively. In this case, P is a medium property, so the valuation by 5.1 is equivalent to using only the equation for medium properties (5.1b) and reaches €159,254.28 (see Table 4).

The second stage consists of the fuzzy system. Values calculated by the fuzzy schemes are presented in Tables 4, 5 and 6. The estimate using the fuzzy model based on size (from module 5) is €159,375.89 (Table 4). The value calculated using the fuzzy model based on location, which weights the values of the nine modules by the inverse of the squared distance between the property and the reference centre of each module (without the effects of the size fuzzy sets), is €159,245.14 (Table 5). Finally, the value using the double-fuzzy model (combining size and distance fuzzy sets) is €159,378.86 (Table 6). Table 4 illustrates the estimates calculated by the three equations (models 5.1a, 5.1b and 5.1c), weighted by the fuzzy sets based on property size. In module 5, the limits between small/medium and medium/large sizes are 118.28 and 214.00 m², respectively (Table 2). The membership values are defined by trapezoidal fuzzy numbers (as in Fig. 3) and the value is calculated by Eq. (4).
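As an illustration, applying the membership functions of Exhibit 1 to the figures quoted above (size = 185.00 m², LimSM = 118.28 m², LimML = 214.00 m²) places P in the transition between the medium and large sets, since 0.8 × 214.00 = 171.20 ≤ 185.00 ≤ 1.2 × 214.00 = 256.80. This is a worked reading of Exhibit 1 rather than a transcription of Table 4, and the rounding is ours:

$$\upmu_{\text{M}} = \frac{1.2 \times 214.00 - 185.00}{0.4 \times 214.00} = \frac{71.80}{85.60} \approx 0.839,\quad \upmu_{\text{L}} = 1 - \upmu_{\text{M}} \approx 0.161,\quad \upmu_{\text{S}} = 0$$

Under this reading, Eq. (4) combines only the estimates of the medium and large models, which would explain why the fuzzy estimate by size differs slightly from the estimate of model 5.1b alone.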

Table 4 Fuzzy estimate by property size
Table 5 Fuzzy estimate by distance
Table 6 Double fuzzy estimate—by size and distance
Table 7 Results to general sample and models from module 5

The second fuzzy system is based only on distance fuzzy sets. The distance between the property and each module is calculated from their reference coordinates. The estimates are calculated by nine equations (models 1.1 to 9.1, each of which has three models: S, M and L), weighted by fuzzy memberships based on the inverse of the squared distance from the property to each module centre, normalised so that ∑(µj) = 1. For instance, for fuzzy set 5 in Table 5, ∑j(1/dj²) = 346.03504, then µ5 = 0.47598/346.03504 = 0.001376, and Value5(P) is €159,254.24, so the partial value for module 5 is 0.001376 * 159,254.24 = €156,486.08. Each Value_j(P) is calculated by Eq. (4) and the final value is calculated by Eq. (5).

Table 6 presents the estimates of the double-fuzzy system, which is calculated by twenty-seven equations (models 1.1a to 9.1c; see Table 3). These values are weighted in a first stage by the fuzzy sets based on property size. In a second stage, the values are weighted by the fuzzy sets based on the distance from the property (P) to each module centre, again using the inverse of the squared distance. The values based on the size fuzzy sets are not presented in Table 6. However, for instance, the estimate for module 5 (€159,375.84) is the same as that presented in Table 4. Values are calculated by Eq. (6).

Next, we present a view of the system for mass appraisal purposes, such as valuing a property portfolio (represented in this case by the test sample of module 5). The RMSE figures of the test sample are similar to the training figures, confirming the statistical quality of these models. Values for model 5.1 are determined using the equation appropriate to each property size: the small, medium or large model (Table 7).
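For reference, the comparison between training and test errors relies on the root mean squared error, sketched below using its standard definition. The function name `rmse` and the sample prices are hypothetical; they are not figures from Table 7.

```python
import math


def rmse(observed, estimated):
    """Root mean squared error between observed and estimated values."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical sale prices and estimates (illustrative only).
observed = [100_000.0, 200_000.0, 300_000.0]
estimated = [110_000.0, 190_000.0, 300_000.0]
error = rmse(observed, estimated)
print(round(error, 2))
```

Computing this figure separately for the 80 % estimation sample and the 20 % test sample gives the two values that are compared in the text.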

8 Final Comments

The proposed system has reasonable statistical behaviour, considering the size of the sample. Of course, a detailed treatment of outliers and the estimation of models with different formats may lead to better results. In this case, we intended to maintain a general format in order to permit comparisons among error figures and value estimates. One advantage of fuzzy logic in valuations is that it adds flexibility to conventional variables and introduces variables based on common language into regression analysis. Some alternatives to the presented mass appraisal system are:

Regarding sample size:

  • Sample size proportional to variability: the number of cases in each module is k * standard deviation in the module;

  • Size semi-proportional to variability: a fixed (minimum) number plus k * standard deviation;

  • In both cases, using round(k * standard deviation), in thousands of cases;

  • Size proportional to the units in the municipal cadastre: sample = k * number of existing units in each module.

Regarding the number of models in each module:

  • Number of models semi-proportional to variability: the number of models is 2 + p * standard deviation. In this case, using round(p * standard deviation), in number of models.

Regarding fuzzy levels:

  • If other variables, such as property type or property quality, prove to be important, one option would be to include a third level, using 3 models to consider quality.