
1 Introduction

Multi-parametric models for real estate appraisal based on quantitative data analysis can be divided into two groups: (a) those based on statistical techniques, such as multiple regression analysis [1–5], neural networks [6–11] and genetic algorithms [12, 13]; and (b) those using only mathematical processing, such as structural equation systems [14, 15], rough set theory [16] and UTA [17, 18]. The statistical and mathematical approaches differ not so much in content as in their way of reasoning: Statistics is an inductive discipline, while Mathematics is essentially deductive.

The model applied in this study is of the second type [19]; it uses linear programming techniques [20–22] and derives its theoretical basis from tools developed as part of decision theory and operational research.

A decision-making process approached with multi-criteria analysis can be formally summarized in an evaluation matrix. The columns of the matrix represent the alternatives, while the rows describe the evaluation criteria. The values in the cells of the matrix are the attributes, namely the qualitative or quantitative level reached by each alternative on each criterion. The appraisal process, in turn, involves comparing the property to be estimated with a sample of properties whose selling prices are known. The comparison is made by measuring the differences in those property characteristics that contribute to the formation of value. The estimate may thus be derived from the solution of a system of equations of the following form:

$$ s = D^{-1} \cdot p, $$

where s is the vector of unknowns, namely the property value and the marginal prices of the characteristics, p is the price vector and D is the matrix of the differences between the characteristics. The analogy with a decision-making process is therefore evident: in this case, the alternatives are the properties of the sample and the criteria are the characteristics used for the comparison.
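As an illustration of the system above, the following minimal sketch solves it for three comparables and two characteristics. The layout of D (a leading column of ones for the value of the subject property, followed by the differences between each comparable and the subject) and all figures are assumptions made for this example only; they are not data from the study.

```python
from fractions import Fraction


def solve_linear_system(D, p):
    """Solve D * s = p by Gauss-Jordan elimination with exact fractions."""
    n = len(D)
    # Augmented matrix [D | p]
    M = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(D, p)]
    for col in range(n):
        # Find a row with a non-zero pivot (StopIteration means D is singular)
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [x / pivot_value for x in M[col]]
        # Eliminate the pivot column from every other row
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


# Hypothetical comparables: columns are [1, area difference (m2), floor difference]
D = [
    [1, 10, 1],
    [1, -5, 0],
    [1, 0, -1],
]
# Hypothetical observed prices of the three comparables
p = [225000, 190000, 195000]

# s = [value of the subject, marginal price of area, marginal price of floor level]
s = solve_linear_system(D, p)
print([float(x) for x in s])
```

With as many comparables as unknowns the system is exactly determined; the model discussed in this paper addresses the more common case in which the sample is larger than the number of unknowns.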

The search for a model able to reproduce as closely as possible the actual decision-making of market participants (supply and demand) has led many researchers to propose applying procedures borrowed from decision theory to real estate appraisal. These analysis techniques approximate the utility functions of the players in the housing market. Here the utility function is to be understood as the function that describes the marginal price of a property characteristic.

In an attempt to reproduce the decision-making process realistically, researchers have tried to increase the flexibility of the models by using non-linear functions. Efforts have also been directed towards developing models that limit the negative impact on the results of strongly correlated explanatory variables.

The model interprets price formation as a multi-criteria choice [23], with a multi-objective approach, in which the features of the properties that the market considers essential represent the selection criteria.

It is therefore a multi-equation model, but unlike other models of the same type, the equations have no endogenous variables except for those that each equation tries to explain. This is why no problem arises for the identification and simultaneous estimation of the parameters. The model equations describe the contributions of the features taken into account in the estimation process of the market value.

The contributions are integrated into a single additive price function [24]. Because the price is expressed as the sum of univariate functions, one for each feature of the property, the model avoids a typical problem of the statistical approach: the dependence of the required sample size on the number of variables considered. This problem, known as the curse of dimensionality, means for example that in a multivariate regression model the amount of data required to maintain the same statistical accuracy grows more rapidly than the number of variables. Hence, compared with multiple linear regression or other multivariate models, the basic assumptions of the model are much weaker. Moreover, the model allows constraints to be imposed on the functions that describe the individual contributions: the operator can specify deductively whether a contribution is positive or negative.

Within this deductive logic, it is also possible to assign piecewise-defined functions. In this way, while preserving the simplicity of linear forms, the marginal contribution of individual features can adapt better to economic logic or to the particular conditions of the housing market. Economic principles such as the law of diminishing marginal utility can thus be taken into account (as the internal floor area of a unit increases, a reduction in the marginal price is expected), or the model can be adapted to the actual trend of observable phenomena (for example, the change in sign of the slope of the price function with respect to floor level, given that intermediate levels generally have the highest values).

This paper is structured as follows. Section 2 describes the model, focusing in particular on how to set the constraints that impose the shape of the utility functions describing marginal prices. Section 3 presents the case study, the properties used for comparison and the constraints imposed. Section 4 illustrates and discusses the results, also comparing them with those obtained using other approaches.

2 The Mathematical Formalism of the Model

In the mathematical formalism of the model, \( A = \left\{ {i, 1 \le i \le m} \right\} \) is the set of the m units of the sample and \( C = \left\{ {j, 1 \le j \le n} \right\} \) is the set of the n criteria (features) that identify the units, chosen from among those that the market considers most significant. Given these two sets, for each criterion j, \( V_{ij} \) is the score, that is, the numerical value taken by the generic element i of the set A (housing unit). A prerequisite for the development of the analytical model is that the scores are always greater than zero (\( V_{ij} > 0 \)). The scores \( V_{ij} \) assigned to the units for each selected criterion j contribute to the formation of the sale price of a property. This contribution is denoted by \( U_{ij} \); it can be positive or negative and is expressed as a piecewise linear function of the score \( V_{ij} \) on criterion j. For this purpose, the range of non-null values of criterion j is split into \( T_{j} \) subintervals (\( T_{j} \) an integer), constructed so that no element belongs to more than one subinterval and that together they contain all measured values of criterion j. Once the subintervals have been defined, the upper limit \( D^{+}_{tj} \) and the lower limit \( D^{-}_{tj} \) are specified for each subinterval t of the j-th criterion, with \( 1 \le t \le T_{j} \). The symbols \( \alpha \) and W indicate, respectively, the constant and the slope (angular coefficient) of a generic linear function. The marginal evaluation function of criterion j then takes the following form:

$$ U^{t}_{ij} = f^{t}_{j} \left( {V_{ij} } \right) = \begin{cases} 0, & \text{if } V_{ij} = 0 \\ \alpha_{tj} + V_{ij} W_{tj}, & \text{if } V_{ij} \ne 0 \text{ and } D^{-}_{tj} \le V_{ij} \le D^{+}_{tj} \end{cases} $$
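The marginal evaluation function can be sketched in a few lines of Python. This is only an illustration of the definition above: the function name, the tuple layout `(d_minus, d_plus, alpha, w)` and the two sections for internal area are hypothetical and do not reproduce the coefficients estimated in the study.

```python
def marginal_contribution(v, sections):
    """Contribution U of a score v for one criterion.

    sections: list of (d_minus, d_plus, alpha, w) tuples, one per subinterval t,
    where d_minus and d_plus are the lower and upper limits, alpha is the
    constant and w is the slope of the linear piece.
    """
    if v == 0:
        return 0.0
    for d_minus, d_plus, alpha, w in sections:
        if d_minus <= v <= d_plus:
            return alpha + v * w
    raise ValueError(f"score {v} is not covered by any subinterval")


# Hypothetical sections for an internal-area criterion (m2)
area_sections = [
    (40, 80, 10000.0, 1500.0),   # steeper slope for small units
    (81, 120, 50000.0, 1000.0),  # flatter slope for large units
]

print(marginal_contribution(60, area_sections))   # 10000 + 60 * 1500
print(marginal_contribution(100, area_sections))  # 50000 + 100 * 1000
```

A score that falls outside every subinterval raises an error here, which mirrors the requirement that the subintervals together contain all measured values of the criterion.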

The piecewise linear representation of the contribution of criterion j provides an approximation of the probably non-linear function that relates the price to the scores of the different characteristics. The subdivision into subintervals, or sections, therefore depends on the nature of the criterion adopted and should reflect the real elasticity of market prices to changes in the scores measured for the j-th criterion. The indispensable condition for the division into sections is that each section contains a number of observations sufficient to represent the function. If, owing to the peculiarities of the survey sample, the scores of a given criterion take only a few discrete values, the problem can be resolved by treating each value as a section. The price function of the i-th property is constructed as the sum of the individual contributions obtained for each criterion j (property feature). Its analytical form is:

$$ f\left( {U_{i1} , \ldots ,U_{in} } \right) = U_{o} + \sum\limits_{j = 1}^{n} U_{ij} $$

The symbols \( d_{i}^{-} \) and \( d_{i}^{+} \) indicate the negative and positive residuals, respectively; at most one of them is non-zero. These residuals are expressed in absolute value as the difference between the observed price \( Pr_{i} \) and the estimated value for the i-th property:

$$ Pr_{i} - \left[ {U_{o} + \sum\limits_{j = 1}^{n} U_{ij} } \right] = d_{i} , \qquad \left| {d_{i} } \right| = \begin{cases} d_{i}^{+} & \text{if } d_{i} \ge 0 \\ d_{i}^{-} & \text{if } d_{i} < 0 \end{cases} $$

DA is the sum of the residuals, each weighted by the reciprocal of the corresponding observed price:

$$ DA = \sum\limits_{i \in A} \frac{1}{{Pr_{i} }}\left( {d_{i}^{-} + d_{i}^{+} } \right) $$
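To make the objective concrete, the sketch below splits each residual into its positive and negative parts and accumulates the price-weighted sum. It follows the convention stated above (the residual is the observed price minus the estimate); the function names and the two price/estimate pairs are hypothetical.

```python
def split_residual(price, estimate):
    """Return (d_plus, d_minus) for one property; at most one is non-zero."""
    d = price - estimate
    if d >= 0:
        return d, 0.0
    return 0.0, -d


def weighted_deviation(prices, estimates):
    """Sum of (d_minus + d_plus) / Pr over the sample, i.e. the DA objective."""
    total = 0.0
    for price, estimate in zip(prices, estimates):
        d_plus, d_minus = split_residual(price, estimate)
        total += (d_minus + d_plus) / price
    return total


# Hypothetical observed prices and model estimates
prices = [100000.0, 200000.0]
estimates = [90000.0, 210000.0]

da = weighted_deviation(prices, estimates)
print(round(da, 4))  # 10000/100000 + 10000/200000
```

Dividing DA by the number of units gives the mean absolute relative error, so minimizing DA amounts to minimizing the average percentage deviation of the estimates from the observed prices.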

The model is solved by minimizing the DA function subject to the constraints imposed.

$$ Min (DA) $$

with the following constraints

$$ \begin{aligned} & U_{o} + \sum\limits_{j = 1}^{n} U_{ij} + d_{i}^{+} - d_{i}^{-} = Pr_{i} , \quad d_{i}^{+} \ge 0, \quad d_{i}^{-} \ge 0, && i \in A \\ & f\left( {D_{tj}^{+} } \right) \le f\left( {D_{t + 1,j}^{-} } \right), && 1 \le t \le T_{j} - 1, \quad j \in C^{+} \\ & W_{tj} \ge 0, && 1 \le t \le T_{j} , \quad j \in C^{+} \\ & f\left( {D_{tj}^{+} } \right) \ge f\left( {D_{t + 1,j}^{-} } \right), && 1 \le t \le T_{j} - 1, \quad j \in C^{-} \\ & W_{tj} \le 0, && 1 \le t \le T_{j} , \quad j \in C^{-} \end{aligned} $$

where \( C^{+} \) and \( C^{-} \) denote the subsets of criteria whose contribution is constrained to be non-decreasing and non-increasing, respectively.
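The shape constraints can be read as two checks per criterion: every slope must have the required sign, and the function must not jump in the wrong direction between the upper limit of one section and the lower limit of the next. The following sketch verifies these conditions for a given set of coefficients; it is a checker for illustration, not the linear programming solver itself, and the function name, tuple layout and section values are hypothetical.

```python
def satisfies_shape_constraints(sections, direction):
    """Check the monotonicity constraints for one criterion.

    sections: list of (d_minus, d_plus, alpha, w) tuples ordered by subinterval.
    direction: +1 for a criterion constrained to be non-decreasing,
               -1 for a criterion constrained to be non-increasing.
    """
    # Slope constraint: W >= 0 (non-decreasing) or W <= 0 (non-increasing)
    for _, _, _, w in sections:
        if direction * w < 0:
            return False
    # Junction constraint between consecutive sections t and t + 1
    for current, following in zip(sections, sections[1:]):
        _, d_plus, alpha, w = current
        d_minus_next, _, alpha_next, w_next = following
        value_at_upper = alpha + d_plus * w
        value_at_next_lower = alpha_next + d_minus_next * w_next
        if direction * (value_at_next_lower - value_at_upper) < 0:
            return False
    return True


# Hypothetical coefficients for an increasing criterion (internal area)
area_sections = [(40, 80, 10000.0, 1500.0), (81, 120, 50000.0, 1000.0)]
# Hypothetical coefficients for a decreasing criterion (age)
age_sections = [(1, 20, 0.0, -500.0), (21, 60, -5000.0, -300.0)]
# Slopes are positive, but the function drops between the two sections
broken_sections = [(40, 80, 10000.0, 1500.0), (81, 120, 0.0, 1000.0)]

print(satisfies_shape_constraints(area_sections, +1))
print(satisfies_shape_constraints(age_sections, -1))
print(satisfies_shape_constraints(broken_sections, +1))
```

In the optimization these conditions are linear in the unknowns \( \alpha_{tj} \) and \( W_{tj} \), because the limits of the sections are fixed in advance, which is why the whole problem remains a linear program.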

3 The Case Study

The data used for the study are those already processed in previous research [25]: 148 sales of residential property units located in a central district of a city in the Campania region, that is, in a homogeneous market area with identical extrinsic characteristics, over a period of eight years (Tables 1 and 2).

Table 1. Variable description
Table 2. Statistical description of variables

Table 3 describes the sub-intervals of variation of the scores assigned to each criterion.

Table 3. Sections of variation of the scores

The empirical knowledge of the likely contribution to the price of the selected features has enabled the following choices: for the age variable the function is constrained to be decreasing; for the variables related to the surfaces, the facilities, the views and the maintenance, the functions are assumed to increase, while the functions related to the floor level and the date of sale are free from constraints.

4 The Results

The lp_solve software, a free Mixed Integer Linear Programming (MILP) solver, is used to analyse the real estate data.

Table 4 shows the coefficients defining the piecewise functions of the individual criteria.

Table 4. Coefficients of the piecewise functions

The residual analysis indicates that the model has good predictive ability: the average percentage error is 7.13 %. The model is therefore more accurate than multiple regression analysis (MRA), 7.84 %, but slightly less accurate than a semi-parametric regression method based on Penalized Spline Smoothing, 6.47 %. A comparison of the residuals of the three models shows, however, that the model proposed in this study performs better than MRA on 85 % of the sample and better than the semi-parametric (non-linear) regression model on 68 % of the sample (Fig. 1).
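The two indicators used in this comparison can be computed as in the sketch below. It assumes that the average percentage error is the mean of the absolute relative residuals and that one model performs better on a unit when its absolute residual is strictly smaller; both definitions, the function names and the data are assumptions for illustration and are not taken from the study.

```python
def mean_abs_pct_error(prices, estimates):
    """Mean absolute percentage error, in percent."""
    ratios = [abs(p - e) / p for p, e in zip(prices, estimates)]
    return 100.0 * sum(ratios) / len(ratios)


def share_better(prices, estimates_a, estimates_b):
    """Percentage of units on which model A has a smaller absolute residual than model B."""
    wins = sum(
        1
        for p, a, b in zip(prices, estimates_a, estimates_b)
        if abs(p - a) < abs(p - b)
    )
    return 100.0 * wins / len(prices)


# Hypothetical observed prices and estimates from two models
prices = [100000.0, 200000.0, 150000.0, 250000.0]
model_a = [95000.0, 210000.0, 150000.0, 240000.0]
model_b = [90000.0, 205000.0, 160000.0, 260000.0]

print(round(mean_abs_pct_error(prices, model_a), 2))
print(round(mean_abs_pct_error(prices, model_b), 2))
print(share_better(prices, model_a, model_b))
```

The two indicators need not agree: a model can have a slightly higher average error and still be closer to the observed price on most units, which is the situation described above for the comparison with the semi-parametric method.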

Fig. 1. Residue analysis

The graphs in Figs. 2, 3, 4 and 5 show the marginal value piecewise functions of some features.

Fig. 2. Piecewise value function of internal area

Fig. 3. Piecewise value function of date of sale

Fig. 4. Piecewise value function of balconies area

Fig. 5. Piecewise value function of floor level

To make the graphs easier to read, a moving average is calculated where the piecewise functions exhibit steps.
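A moving average of a stepped function can be obtained as in the following sketch. The window length and the choice of a simple unweighted average are assumptions, since the text does not specify them, and the stepped values are hypothetical.

```python
def moving_average(values, window=3):
    """Simple unweighted moving average over consecutive windows."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical stepped marginal values along increasing scores
stepped = [0.0, 0.0, 3.0, 3.0, 6.0, 6.0]

smoothed = moving_average(stepped, window=3)
print(smoothed)  # [1.0, 2.0, 4.0, 5.0]
```

The smoothed series is shorter than the original by `window - 1` points, so it would need to be aligned with the corresponding scores before plotting.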

The results are consistent with the expectations, especially with regard to those parameters that are not constrained. The law of diminishing marginal utility is respected (Figs. 2, 4), and the variation of the marginal contribution of the floor level is consistent, first increasing then decreasing (Fig. 5). Even the marginal contribution on the date of sale reflects the dynamics of the market observed by official Observatories of the real estate market (Fig. 3).

5 Conclusion

The model applied in this study is an effective real estate estimation tool. Its results are more reliable than those of linear statistical models. On the one hand, it retains the advantages of a linear approach; on the other hand, it adds some operating advantages resulting from the reduction of the basic assumptions. The improved reliability comes from its ability to interpret the investigated phenomenon, thanks to an approach in which the deductive component plays a decisive role.

A priori knowledge of the phenomenon and the evaluator's experience, together with an adequate perception of the market mechanism, make it possible to set constraints that keep the results of the inductive analysis within a predetermined track. One of the strengths of the model lies precisely in the possibility, which is an option rather than an obligation, of connecting the estimate to the indications that deductive analysis provides about the shape and/or direction of the individual marginal contribution functions.

The essential feature of inductive reasoning is its generalizability; generalization, however, is effective only when the sample is very large and representative of the population. The complexity of the real estate market makes such a sample difficult to construct. By integrating inductive analysis with a deductive approach, the proposed model overcomes these limitations of statistical analysis.

The marginal prices, which are easily derived from the piecewise functions, may also be used as benchmarks in estimates based on a small number of comparable units, which are very frequent in practice.