Introduction

Real estate is the largest single class of investment in the world and the biggest source of household wealth in most industrialized countries. Estimating the risk and return of real estate is therefore important for underwriters, investors, policy makers and households alike.

Increasing attention has been paid to tracking real estate market movements at a detailed level (Constantinescu & Francke 2013). Granular indices give a better representation of the space and asset markets in which the properties are traded, and are thus more informative. Moreover, more granular indices make it possible to create tradable derivatives based on these indices (Bokhari & Geltner 2012; Deng & Quigley 2008). Estimating granular indices is challenging, because properties are traded infrequently and are very heterogeneous, especially in the case of commercial real estate or single-family housing (Deng et al. 2012, 2014). Geltner & Ling (2006) discuss the general trade-off that arises from this: more granular indices are more useful, but, ceteris paribus, also noisier, hence less reliable and, as a consequence, less useful.

To address this trade-off, we introduce a Hierarchical Repeat Sales Model (HRS). The HRS is a repeat sales model with multiple stochastic log price trends that have a hierarchical additive structure: one common log price trend, and cluster-specific log price trends in deviation from the common trend. Both the number of clusters and the number of elements within a cluster are flexible. For example, consider the clusters property type and region: the log price trend for property type A in region B is the sum of the common log price trend, the property type A log price trend and the region B log price trend. The specification of the stochastic trends is also flexible, for example a local linear trend model for the common trend and random walks for the cluster-specific trends. The stochastic trends replace the commonly used fixed coefficients of time dummy variables.

The HRS consists of two sets of equations. The measurement equation relates the individual property log returns to changes in log price levels; the transition equations specify the structure of the common and cluster trends. The hierarchical trend structure has been applied successfully in hedonic price models (see Francke & De Vos (2000), and Francke & Vos (2004)), but never in repeat sales models. The HRS can be interpreted as a generalization of the structural time series repeat sales model (Francke 2010), in which only a common stochastic trend is specified.

To demonstrate the strength of the hierarchical repeat sales model, we apply it to two different markets. The first application is commercial real estate (office, retail, industrial and multi-family) in the Philadelphia/Baltimore region of the United States; the second is owner-occupied housing (single-family houses and apartments) in a small area of Amsterdam, the Netherlands. We are able to estimate reliable quarterly price indices over a period of 15 years using fewer than 100 transaction pairs per index on average, or about 3 transactions per quarter.

As a robustness check, we first estimate our main model on (sub)markets that in themselves have plenty of observations: multi-family properties in Manhattan (New York) and Los Angeles. Subsequently, we (uniformly) sample 25% of the data without replacement, which makes the data scarce again. Our results show that omitting 75% of the data does not affect our indices considerably.

In this paper we make several contributions to the field of index construction. First, we impose a hierarchical trend structure on the repeat sales model. The main benefit of a hierarchical structure is that an aggregate dataset can be used to estimate indices at a more granular level, so that more information is used efficiently. Hierarchical trends have so far not been applied in repeat sales models, because their estimation is technically much more challenging than in hedonic price models.

Second, we show that it is more appropriate to assume a t-distribution for the error term instead of the commonly used normal distribution. This holds in particular for small samples, which are the focus of our application. The estimated degrees of freedom of the t-distribution are mostly between 2.5 and 3, clearly rejecting the normal distribution, and the model fits the data better by a large margin.

Third, our results reveal that there are indeed large differences between real estate markets, even when they are spatially close to each other. For example, Baltimore had consistently more price appreciation than Philadelphia over the period analyzed. Also, the results for Amsterdam show that single-family homes had a higher risk-return profile than apartments, which is consistent with earlier findings.

Finally, to the best of our knowledge, this paper is the first real estate application estimated using a Markov chain Monte Carlo algorithm; in our case, Gibbs sampling.

The remainder of this paper is structured as follows. “Model Specification and Estimation” describes the evolution of repeat sales models for markets with a small number of observations and presents our proposed methodology. “Application” first describes the data and some descriptive statistics, and then presents the results of the models. Finally, “Conclusions” concludes.

Model Specification and Estimation

The Repeat Sales Model in Small Samples

The existing literature on real estate index construction was initially developed around the hedonic (Rosen 1974) and the repeat sales (Bailey et al. 1963) methodologies. The main benefit of the repeat sales model over the hedonic model is that the former is less affected by specification errors (Footnote 1) and by characteristics that are not observed in the data. This issue is exacerbated in markets with heterogeneous goods, such as commercial real estate or single-family housing.Footnote 2 On the negative side, only the selling prices of properties sold more than once are included in a repeat sales model: all single sales are left out. The issue that (repeat) sales are a non-random selection of the entire property stock (sample selection bias) is addressed by Gatzlaff & Haurin (1997), Hwang & Quigley (2004), and Chinloy et al. (2013). In this article, however, we do not address the problem of sample selection bias.

The hedonic price model with time dummy variables (pooled regression) is given by

$$ p_{it} = \mu_{t} + X_{it}\beta + Z_{it}\gamma + \varepsilon_{it}, $$
(1)

where \(p_{it}\) is the log transaction price of property \(i\), \(i = 1, \ldots, n_t\), at time \(t\), \(t = 1, \ldots, T\); \(X_{it}\) are observed characteristics and \(Z_{it}\) are unobserved characteristics, with corresponding parameters \(\beta\) and \(\gamma\). \(\mu_t\) is a time-varying constant capturing the log price movement over time. The error term \(\varepsilon_{it}\) is typically assumed to be independently normally distributed with mean zero and variance \(\sigma^{2}_{\varepsilon}\).

Under the assumption that both the observed and unobserved characteristics X and Z do not change over time, we can take ‘first differences’ for pairs of sales. The repeat sales model is then given by

$$ \tilde{p}_{i} = p_{it} - p_{is} = \mu_{t} - \mu_{s} + \varepsilon_{it} - \varepsilon_{is} = {\Delta} \mu_{s+1} + {\cdots} + {\Delta} \mu_{t} + \varepsilon_{it} - \varepsilon_{is}, $$
(2)

where \(s\) is the time of purchase (as opposed to the time of sale \(t\), with \(s < t\)), and \(\mu_1 = 0\) for identification. Note that we follow the broad strand of literature in assuming pair fixed effects instead of property fixed effects, which implies zero correlation between different pairs of sales of the same property. Traditionally, the time effect is specified simply through dummy variables with fixed parameters \(\mu_t\).
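As a point of reference for the models that follow, the traditional dummy-variable estimator of Eq. 2 can be sketched in a few lines. This is a minimal Python/NumPy sketch, assuming periods are coded as integers starting at 0 (so the text's \(\mu_1 = 0\) becomes `mu[0] = 0`) and every pair satisfies \(s < t\). The function name `bmn_index` is our own hypothetical label, and plain ordinary least squares is used rather than the weighted procedure of Case & Shiller (1987).

```python
import numpy as np


def bmn_index(buy_period, sell_period, log_return, n_periods):
    """Repeat sales log index by OLS on time dummies (Eq. 2).

    Each pair contributes one row: +1 in the column of its sale period t and
    -1 in the column of its purchase period s. The base-period column is
    dropped, which imposes mu[0] = 0 for identification.
    """
    buy = np.asarray(buy_period)
    sell = np.asarray(sell_period)
    rows = np.arange(len(buy))
    X = np.zeros((len(buy), n_periods))
    X[rows, sell] += 1.0
    X[rows, buy] -= 1.0
    X = X[:, 1:]  # drop base period
    coef, *_ = np.linalg.lstsq(X, np.asarray(log_return, dtype=float), rcond=None)
    return np.concatenate([[0.0], coef])  # estimated log index mu_t
```

With only a handful of pairs per period, each coefficient rests on very few returns, which is exactly the small-sample volatility problem discussed next.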

Equation 2 can alternatively be expressed as

$$ \left( \begin{array}{c} p_{is} \\ p_{it} \end{array} \right) = \left( \begin{array}{c} \mu_{s} \\ \mu_{t} \end{array} \right) + \left( \begin{array}{c} 1 \\ 1 \end{array} \right) \delta_{i} + \left( \begin{array}{c} \varepsilon_{is} \\ \varepsilon_{it} \end{array} \right) , $$
(3)

where \(\delta_i\) is a pair fixed effect.Footnote 3 Conditional on \(\delta_i\), the estimate of \(\mu_t\) is the average log selling price at time \(t\), net of the pair fixed effects. This means that the estimate of \(\mu_t\) does not depend on preceding and subsequent periods. However, it is sensitive to transaction price noise, in particular in small samples where the number of transactions per period is low. This happens, for example, with local price indices, short time periods, and/or severe outliers, where the transaction price differs from the true market value by a large amount. The resulting price indices may then become very volatile (Francke 2010).

Multiple approaches have been proposed to avoid volatile indices in small samples. One method is to ‘simply’ smooth the estimated index \(\hat{\mu}_{t}\) ex post. Examples can be found in Cleveland (1979) and Clapp (2004), who both use a form of local polynomial regression to smooth the index. The main drawback of this two-step procedure is that it does not take into account the uncertainty in the estimates of \(\mu_t\). One can also replace the dummy variables by a deterministic trend, such as cubic or piecewise linear splines. Thorsnes & Reifel (2007), for example, use the Fourier form approach to estimate a hedonic model. Bokhari & Geltner (2012) use a two-stage data frequency conversion procedure in a repeat sales model: they first estimate lower-frequency indices (using the Case & Shiller methodology) staggered in time, and then apply a generalized inverse estimator to convert the lower-frequency return series into higher-frequency ones. Recently, authors have also created ‘artificial pairs’ (thus increasing the number of observations) by matching transactions on their hedonic characteristics; see for example McMillen (2012) and Guo et al. (2014).
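The two-step smoothing idea can be illustrated with a simplified local linear regression. This Python/NumPy sketch is a non-robust variant in the spirit of Cleveland (1979), not a reproduction of the cited authors' procedures; the function name `local_linear_smooth`, the tricube kernel and the default `span` are assumptions. Note that the output is just a smoothed point estimate: as the text says, the sampling uncertainty of the first-step \(\hat{\mu}_t\) is ignored.

```python
import numpy as np


def local_linear_smooth(index, span=0.3):
    """Ex-post smoothing of an estimated (log) index by local linear regression.

    Simplified, non-robust loess: for every period, fit a weighted straight
    line over time with tricube weights on roughly the ceil(span * n) nearest
    periods, and keep the fitted value at that period.
    """
    y = np.asarray(index, dtype=float)
    n = len(y)
    x = np.arange(n, dtype=float)
    k = min(n, max(3, int(np.ceil(span * n))))  # neighbours per local fit
    smoothed = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # +1 so the k nearest periods (and any ties) all get positive weight
        h = np.sort(dist)[k - 1] + 1.0
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3  # tricube weights
        X = np.column_stack([np.ones(n), x - x[i]])
        XtW = X.T * w  # equivalent to X.T @ diag(w)
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        smoothed[i] = beta[0]  # intercept = fitted value at period i
    return smoothed
```

Because the local fit is linear, a perfectly linear index passes through unchanged, while period-to-period noise is damped.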

Goetzmann (1992) introduced into the real estate literature what is perhaps the major approach to date for addressing small-sample problems in property price indices, namely the use of Bayesian inference within repeat sales regression models. Goetzmann (1992) specifies the prior distribution for the periodic returns in Eq. 2 to be normal, \({\Delta} \mu_{t} \sim N(\kappa, \sigma^{2}_{\eta})\), implying a random walk with drift for the log price index: \(\mu_{t+1} = \mu_{t} + \kappa + \eta_{t}\). The variance parameters of the model (the signal \(\sigma^{2}_{\eta}\) and the noise \(\sigma^{2}_{\varepsilon}\)) are estimated in a first step and ‘plugged in’ to the second step of the Bayesian procedure, which can lead to biased estimates of the variances.Footnote 4
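For plugged-in values of \(\kappa\), \(\sigma^{2}_{\eta}\) and \(\sigma^{2}_{\varepsilon}\), the posterior mean of the periodic returns under this normal prior has a closed form, which is essentially ridge-type shrinkage of the returns towards the drift. The Python/NumPy sketch below illustrates that second step only. It assumes normal errors, treats the pair noise \(\varepsilon_{it} - \varepsilon_{is}\) as having variance \(2\sigma^{2}_{\varepsilon}\), and uses a hypothetical function name; periods are coded from 0.

```python
import numpy as np


def goetzmann_posterior_mean(buy, sell, log_return, n_periods, kappa, var_eta, var_eps):
    """Posterior mean log index with prior Delta mu_u ~ N(kappa, var_eta), variances known."""
    buy = np.asarray(buy)
    sell = np.asarray(sell)
    u = np.arange(1, n_periods)  # return periods u = 1..T-1
    # X[i, u-1] = 1 if pair i spans return period u, i.e. s_i < u <= t_i
    X = ((u[None, :] > buy[:, None]) & (u[None, :] <= sell[:, None])).astype(float)
    noise_var = 2.0 * var_eps  # Var(eps_it - eps_is)
    precision = X.T @ X / noise_var + np.eye(n_periods - 1) / var_eta
    rhs = X.T @ np.asarray(log_return, dtype=float) / noise_var + kappa / var_eta
    d_mu = np.linalg.solve(precision, rhs)
    return np.concatenate([[0.0], np.cumsum(d_mu)])  # log index with mu_0 = 0
```

Two limiting cases make the shrinkage visible: a return period not spanned by any pair gets exactly the drift \(\kappa\), and a very large \(\sigma^{2}_{\eta}\) brings the result back to the dummy-variable estimate.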

Francke (2010) generalized this model by assuming that the trend component follows a local linear trend, and by providing the loglikelihood function \(\ell (\tilde {p};\sigma ^{2}_{\eta },\sigma ^{2}_{\varepsilon })\) to estimate the signal \(\sigma ^{2}_{\eta }\) and noise \(\sigma ^{2}_{\varepsilon }\) directly (for example by maximum likelihood), avoiding the somewhat ad hoc two-step procedure proposed by Goetzmann (1992). The local linear trend repeat sales model is given by

$$\begin{array}{@{}rcl@{}} p_{it} - p_{is} & = & \mu_{t} - \mu_{s} + \varepsilon_{it} - \varepsilon_{is}, \qquad \qquad \varepsilon_{it} \sim N(0,\sigma^{2}_{\varepsilon}), \end{array} $$
(4)
$$\begin{array}{@{}rcl@{}} \mu_{t+1} & = & \mu_{t} + \kappa_{t} + \eta_{t}, \qquad \qquad \eta_{t} \sim N(0,\sigma^{2}_{\eta}), \end{array} $$
(5)
$$\begin{array}{@{}rcl@{}} \kappa_{t+1} & = & \kappa_{t} + \zeta_{t}, \qquad \qquad \zeta_{t} \sim N(0,\sigma^{2}_{\zeta}). \end{array} $$
(6)

Note that the local linear trend is very flexible and nests different specifications, such as the random walk (\(\sigma^{2}_{\zeta} = 0\) and \(\kappa_1 = 0\)), the random walk with drift (\(\sigma^{2}_{\zeta} = 0\); Goetzmann 1992) and the smooth trend (\(\sigma^{2}_{\eta} = 0\)); see Harvey (1989) for a description of the different functional forms.
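The nesting of these special cases can be checked by simulating Eqs. 5 and 6 directly. In this Python/NumPy sketch (function name and parameter values are illustrative assumptions), setting \(\sigma^{2}_{\zeta} = 0\) keeps the slope fixed at \(\kappa_1\), and setting \(\sigma^{2}_{\eta} = 0\) makes the first differences of \(\mu_t\) equal the lagged slope, which is the smooth trend.

```python
import numpy as np


def simulate_local_linear_trend(n_periods, var_eta, var_zeta, kappa_1=0.0, seed=0):
    """Simulate mu_t and kappa_t from Eqs. 5-6, starting from mu_1 = 0."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(n_periods)
    kappa = np.zeros(n_periods)
    kappa[0] = kappa_1
    for t in range(n_periods - 1):
        # Eq. 5: level moves by the current slope plus a level shock eta_t
        mu[t + 1] = mu[t] + kappa[t] + rng.normal(0.0, np.sqrt(var_eta))
        # Eq. 6: slope follows a random walk with shocks zeta_t
        kappa[t + 1] = kappa[t] + rng.normal(0.0, np.sqrt(var_zeta))
    return mu, kappa
```

For example, `simulate_local_linear_trend(40, 0.02**2, 0.0)` gives a driftless random walk, and adding `kappa_1=0.01` turns it into a random walk with drift.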

The local linear trend repeat sales model is an example of a structural time series model: a model in which the trend, the error terms and other relevant components are modeled explicitly. In contrast to the dummy variable approach, the structural time series model predicts the price level using both preceding and subsequent information. This means that an estimate of the price level can be provided even for time periods in which no observations are available. State space models have also been used successfully to estimate hedonic price indices; see Schwann (1998), Francke & De Vos (2000) and Francke & Vos (2004). We build on the work of Goetzmann (1992) and Francke (2010) by estimating the repeat sales model in a structural time series framework.
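To make the point about periods without observations concrete, the sketch below runs a Kalman filter and Rauch-Tung-Striebel smoother for the local linear trend of Eqs. 5 and 6 in Python/NumPy. It deliberately simplifies the measurement equation: it assumes that a noisy log price level \(y_t = \mu_t + e_t\) is observed in each period, rather than the repeat sales pairs used in this paper, which require the augmented filter discussed under Estimation. The function name, the large finite prior variance standing in for a diffuse initialization, and the parameter values in the usage are assumptions.

```python
import numpy as np


def llt_smoother(y, var_eps, var_eta, var_zeta, prior_var=1e4):
    """Kalman filter + RTS smoother for a noisily observed local linear trend.

    State a_t = (mu_t, kappa_t); y_t = mu_t + e_t. NaN entries in y are
    missing: the update step is skipped and the state is only propagated.
    Returns the smoothed level mu_t and its posterior standard deviation.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    T = np.array([[1.0, 1.0], [0.0, 1.0]])  # mu_{t+1} = mu_t + kappa_t
    Q = np.diag([var_eta, var_zeta])        # variances of eta_t, zeta_t
    Z = np.array([1.0, 0.0])                # y_t loads on mu_t only
    a_pred, P_pred = np.zeros((n, 2)), np.zeros((n, 2, 2))
    a_filt, P_filt = np.zeros((n, 2)), np.zeros((n, 2, 2))
    a, P = np.zeros(2), prior_var * np.eye(2)  # vague start
    for t in range(n):
        a_pred[t], P_pred[t] = a, P
        if not np.isnan(y[t]):  # update only when period t is observed
            F = Z @ P @ Z + var_eps
            K = P @ Z / F
            a = a + K * (y[t] - Z @ a)
            P = P - np.outer(K, Z @ P)
        a_filt[t], P_filt[t] = a, P
        a, P = T @ a, T @ P @ T.T + Q
    a_s, P_s = a_filt.copy(), P_filt.copy()
    for t in range(n - 2, -1, -1):  # backward pass uses subsequent information
        J = P_filt[t] @ T.T @ np.linalg.inv(P_pred[t + 1])
        a_s[t] = a_filt[t] + J @ (a_s[t + 1] - a_pred[t + 1])
        P_s[t] = P_filt[t] + J @ (P_s[t + 1] - P_pred[t + 1]) @ J.T
    return a_s[:, 0], np.sqrt(P_s[:, 0, 0])
```

Periods with NaN still receive a level estimate, interpolated from both sides, but with a wider standard deviation than observed periods.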

The Hierarchical Repeat Sales Model

One way to extend the repeat sales model is to impose a hierarchical trend structure. The main benefit of a hierarchical structure is that an aggregate dataset can be used to estimate indices at a more granular level, so that more information is used efficiently. This has been done for the hedonic price model in Francke & De Vos (2000) and Francke & Vos (2004). In this model, \(\mu_t\) represents a common trend, to which we add cluster-specific trends. The Hierarchical Repeat Sales Model (HRS) is specified by

$$\begin{array}{@{}rcl@{}} \lambda^{j}_{t+1} & = & {\lambda^{j}_{t}} + {\zeta^{j}_{t}}, \qquad \qquad {\zeta^{j}_{t}} \sim N(0,\sigma^{2}_{\zeta^{j}}I_{n_{j}}),\quad \end{array} $$
(7)
$$\begin{array}{@{}rcl@{}} p_{it} - p_{is} & = & \mu_{t} - \mu_{s} + \sum\limits_{j=1}^{k}{{d^{j}_{i}} ({\lambda^{j}_{t}} - {\lambda^{j}_{s}})} + \varepsilon_{it} - \varepsilon_{is}, \qquad\varepsilon_{it} \sim N(0,\sigma^{2}_{\varepsilon}).\quad \end{array} $$
(8)

The vectors of cluster trends \({\lambda^{j}_{t}}, \, j = 1, \ldots, k\), are modeled as (random walk) deviations from the common trend \(\mu_t\), where \(k\) is the number of clusters. The common trend is specified as a local linear trend model, given by Eqs. 5 and 6.Footnote 5 The number of elements in cluster \(j\) is \(n_j\), so the vector \({\lambda^{j}_{t}}\) has dimension \(n_j\). The selection row vector \({d^{j}_{i}}\) has dimension \(n_j\) and consists of zeros and a single one, which selects the appropriate element of cluster \(j\) for observation \(i\). \(I_{n_{j}}\) is an identity matrix of dimension \(n_j\). For identification we impose \(\mu_1 = \lambda^{j}_{l,1} = 0\), \(j = 1, \ldots, k\), \(l = 1, \ldots, n_j\). The total number of different price trends that can be calculated from the HRS is \(\prod_{j=1}^{k} n_j\).
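The mechanics of Eqs. 7 and 8 can be sketched in Python/NumPy: simulate the random-walk cluster trends with a common variance per cluster (the simplest case), and evaluate the systematic part of the measurement equation for one pair. Representing each selection vector \(d^{j}_{i}\) by the position of its single one is an implementation choice of ours; the function names are hypothetical.

```python
import numpy as np


def simulate_cluster_trends(n_periods, sizes, sds, seed=0):
    """Random-walk cluster trends as in Eq. 7, with lambda^j_{l,1} = 0.

    sizes[j] is n_j; sds[j] is the common standard deviation of zeta^j_t
    within cluster j. Returns one (n_periods, n_j) array per cluster.
    """
    rng = np.random.default_rng(seed)
    trends = []
    for n_j, sd in zip(sizes, sds):
        steps = rng.normal(0.0, sd, size=(n_periods - 1, n_j))
        trends.append(np.vstack([np.zeros((1, n_j)), np.cumsum(steps, axis=0)]))
    return trends


def hrs_pair_return(mu, trends, labels, s, t):
    """Systematic part of Eq. 8 for a pair bought at s and sold at t.

    labels[j] is the position of the single one in d_i^j, so trends[j][:, labels[j]]
    is the cluster-j trend that applies to this pair.
    """
    r = mu[t] - mu[s]
    for lam, l in zip(trends, labels):
        r += lam[t, l] - lam[s, l]
    return r
```

With clusters of sizes 4 and 3, `np.prod([4, 3])` gives the 12 distinct price trends that the HRS can produce.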

In our empirical application we use two clusters, one for property types, \({\lambda^{1}_{t}}\), and one for locations, \({\lambda^{2}_{t}}\). For example, for the US market we have 4 property types (Multi-family, Industrial, Office and Retail) and 3 locations (Baltimore, Philadelphia and Rest), so in total 4 × 3 = 12 price indices are calculated by the HRS. The index for offices (Off) in Philadelphia (Phi) is the sum of the common trend, an office-specific trend and a Philadelphia-specific trend, \(\mu_{t} + \lambda^{1}_{\text{Off},t} + \lambda^{2}_{\text{Phi},t}\).
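The full set of sub-market indices follows from the same additive rule. This Python/NumPy sketch assumes, for illustration, that a log level is turned into an index with base 100 via \(100 \exp(\cdot)\); the function name is hypothetical and the trends would in practice come from the estimated model.

```python
import numpy as np


def hrs_index_grid(mu, lam_type, lam_loc, types, locations):
    """Index (base 100) for every property type x location combination.

    mu: (T,) common log trend; lam_type: (T, len(types)), i.e. lambda^1_t;
    lam_loc: (T, len(locations)), i.e. lambda^2_t. The log level of
    sub-market (type a, location b) is mu_t + lambda^1_{a,t} + lambda^2_{b,t}.
    """
    return {
        (ty, loc): 100.0 * np.exp(mu + lam_type[:, a] + lam_loc[:, b])
        for a, ty in enumerate(types)
        for b, loc in enumerate(locations)
    }
```

Because every trend is zero in the first period, all 12 series start at 100; the Office-Philadelphia series uses column "Office" of \(\lambda^{1}_t\) and column "Philadelphia" of \(\lambda^{2}_t\).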

The Hierarchical Repeat Sales model (HRS) has multiple benefits over the structural time series repeat sales model (STRS) of Francke (2010). First, it uses more information for each trend, so that reliable indices can be estimated in markets with even fewer observations. In the extreme case of no observations at all in a specific sub-market (a combination of elements from the clusters), it is still possible to produce an index based on the common trend and the trends for location and property type. Second, indices estimated by the HRS are less likely to ‘break’. A break is defined as an index that ends prematurely because of missing data at the end (or start) of the sample. Even though the STRS can handle missing observations for intermediate time periods, it provides no price level for the final (initial) period(s) if the corresponding data are missing.Footnote 6

More Flexible Specifications of the Hierarchical Repeat Sales Model

Note that the variance of \({\zeta ^{j}_{t}}\) in Eq. 7 is assumed to be equal for all elements within a cluster. Another assumption is that the variance of the noise term ε i t in Eq. 8 does not vary over combinations of clusters. In this subsection we will relax both assumptions: all elements are allowed to have their own variances.

Another generalization is to replace the normality assumption for the error term \(\varepsilon\) by a potentially heavy-tailed distribution, the t-distribution \(\varepsilon_{it} \sim t_{\nu}(0, \sigma^{2}_{\varepsilon})\), where \(\nu\) is the degrees of freedom. The t-distribution has fatter tails than the normal distribution and can therefore better deal with outliers. For large values of \(\nu\) (\(\nu \to \infty\)) the t-distribution converges to the normal distribution. If \(\nu = 1\), the t-distribution is the Cauchy distribution; the second moment exists only for \(\nu > 2\). The degrees of freedom are treated as a parameter to be estimated from the data.
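Why the t-distribution copes better with outliers shows up directly in the log densities. In this Python sketch (function names are hypothetical), the t density is parameterized by a scale rather than a variance, which is our assumption; for \(\nu > 2\) the variance equals scale² · ν/(ν − 2). A residual of six scale units is penalized far less under \(\nu = 3\) than under the normal, while for very large \(\nu\) the two log densities agree.

```python
import math


def normal_logpdf(x, sigma):
    """Log density of N(0, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * (x / sigma) ** 2


def student_t_logpdf(x, nu, scale):
    """Log density at x of a Student-t with nu degrees of freedom, location 0 and scale `scale`."""
    z = x / scale
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(scale)
            - (nu + 1) / 2 * math.log1p(z * z / nu))
```

With `nu=1` the formula reduces to the Cauchy log density \(-\log(\pi(1 + x^2))\) for unit scale.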

Table 1 provides an overview of the different models that are applied in the empirical section. The local linear trend methodology proposed by Francke (2010) is denoted STRS, and the Hierarchical Repeat Sales model is denoted HRS. The suffix FE (VE) is added when all elements in vector \({\zeta^{j}_{t}}\) have the same (different) variance(s). The VEN model additionally has different variances of the error term \(\varepsilon_t\) for combinations of clusters. For all models except the HRS FEn model, we assume a t-distribution for the error term \(\varepsilon_t\).

Table 1 Summary of the models

Estimation

Hierarchical trends have not previously been applied to repeat sales models, mainly because of the more complex estimation procedure compared to hedonic price models with a hierarchical trend structure. In this section we provide two different estimation methods for the hierarchical repeat sales model. The first is by using state space methods (Kalman filter) and the second by Markov chain Monte Carlo techniques (Gibbs sampling).

For the Kalman filter we cannot use the ‘first difference’ specification (8), because each individual return depends on the state vector at two moments in time, with varying time spans between them, whereas the Kalman filter assumes that the state vector is a Markov chain and that each measurement depends only on the current state. Therefore we have to use the alternative specification including pair or property fixed effects δ, such that the measurement equation does not depend on preceding states,

$$\begin{array}{@{}rcl@{}} p_{t} = D^{\delta}_{t} \delta + \mu_{t} + \sum\limits_{j=1}^{k}{D^{j} {\lambda^{j}_{t}}} + \varepsilon_{t} , \qquad \qquad \varepsilon_{t} \sim N(0,\sigma^{2}_{\varepsilon} I), \end{array} $$
(9)

where \(D^{\delta}_{t}\) is a selection matrix for the pair or property fixed effects, and the state vector \(\alpha_t\) is defined by \(\alpha_{t} = (\mu_{t}, \kappa_{t}, \lambda^{1\prime}_{t}, \ldots, \lambda^{k\prime}_{t})^{\prime}\); see Eqs. 5 to 7. Define \(\psi\) as the vector of all variance parameters in the state and measurement equations.

The downside of this specification is that we have to deal with a large number of pair or property fixed effects \(\delta_i\). This can be done efficiently with the augmented Kalman filter (De Jong 1991). This filter gives estimates of the state vector \(\alpha_t\) conditional on all information up to time \(t\) and on \(\psi\).Footnote 7 Moreover, the augmented Kalman filter produces the loglikelihood \(\ell(p; \psi)\), which can be used to obtain maximum likelihood estimates of \(\psi\) or for Bayesian inference. Estimates of the state vector conditional on the full sample and \(\psi\) are computed by the Kalman smoother.

The standard (and augmented) Kalman filter assumes that the error terms in both the measurement and transition equations are normally distributed. Shephard (1994) and Durbin & Koopman (2012; for details see Chapter 10.8.3) provide efficient ways to estimate state space models with t-distributed error terms in the measurement and transition equations. For example, if the error term \(\varepsilon_{it}\) in the measurement equation has a t-distribution with \(\nu\,(> 2)\) degrees of freedom and variance \(\sigma^{2}_{\varepsilon}\), it has the representation \(\varepsilon_{it} = (\nu - 2)^{1/2} \varepsilon^{*}_{it} / c_{it}^{1/2}\), where \(c_{it} \sim \chi^{2}_{\nu}\) (a Gamma distribution with shape \(\nu/2\) and rate \(1/2\)) and \(\varepsilon^{*}_{it} \sim N(0, \sigma^{2}_{\varepsilon})\) are independent. Conditional on \(c_{it}\) the model is linear Gaussian, and the standard (augmented) Kalman filter can be applied. Markov chain Monte Carlo methods can be used to integrate \(c\) out. An identical procedure can be used for t-distributed error terms in the transition equation.
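The scale-mixture representation is easy to verify by simulation. This Python/NumPy sketch (function name and settings are illustrative assumptions) draws \(c_{it} \sim \chi^{2}_{\nu}\) and \(\varepsilon^{*}_{it} \sim N(0, \sigma^{2})\), forms \(\varepsilon_{it}\), and also returns the conditional variance \((\nu - 2)\sigma^{2}/c_{it}\), which is the time-varying Gaussian measurement variance a filter would use once \(c_{it}\) is given.

```python
import numpy as np


def sample_t_errors(n, nu, sigma, rng):
    """t_nu errors with variance sigma**2 (nu > 2) via the normal scale mixture."""
    c = rng.chisquare(nu, size=n)              # mixing variables c_it ~ chi^2_nu
    eps_star = rng.normal(0.0, sigma, size=n)  # eps*_it ~ N(0, sigma^2)
    eps = np.sqrt(nu - 2.0) * eps_star / np.sqrt(c)
    cond_var = (nu - 2.0) * sigma**2 / c       # Var(eps_it | c_it): Gaussian given c
    return eps, cond_var
```

Since \(E[1/c_{it}] = 1/(\nu - 2)\), both the sample variance of the errors and the average conditional variance should be close to \(\sigma^{2}\).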

In the second estimation method we do not apply state space techniques, but directly use the model formulated in ‘first differences’ given by Eqs. 5 to 8, avoiding the inclusion of the pair or property fixed effects \(\delta\). This second method is full Bayesian inference, where non-informative priors are specified for all variance parameters as \(\sigma^{-2} \sim \text{Gamma}(0.001, 0.001)\). The degrees of freedom \(\nu\) of the t-distribution are assumed to have an exponential prior distribution, \(\nu \sim \text{Exp}(1/3)\). Although a full Bayesian analysis could be based on the loglikelihood function produced by the Kalman filter together with the specified prior distributions for the variance parameters and the degrees of freedom, we used the program WinBUGS (Bayesian inference Using Gibbs Sampling); see Lunn et al. (2000). The main reason is the flexibility of the program: it only requires the model specification (in the ‘first difference’ representation (8), so without the pair or property fixed effects) and the specification of the priors. Appendix C provides an excerpt of the actual WinBUGS code used (Sturtz et al. 2005). All presented estimation results are from WinBUGS.
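To illustrate what Gibbs sampling does with such priors, the following hand-written Python/NumPy sketch samples a deliberately simplified model: a single common trend with random-walk-with-drift returns \(\Delta\mu_t \sim N(\kappa, \sigma^{2}_{\eta})\), normal rather than t-distributed errors, no clusters, and the pair noise \(\varepsilon_{it} - \varepsilon_{is}\) treated as one \(N(0, \sigma^{2}_{e})\) term. Both precisions get Gamma(0.001, 0.001) priors (shape, rate); the flat prior on \(\kappa\) and the function name are our assumptions. It cycles through the full conditional distributions, but it is not the WinBUGS code of Appendix C.

```python
import numpy as np


def gibbs_repeat_sales(buy, sell, y, n_periods, n_iter=3000, burn=1000, seed=0):
    """Gibbs sampler for p_it - p_is = sum of returns over (s, t] + e_i."""
    rng = np.random.default_rng(seed)
    buy, sell, y = np.asarray(buy), np.asarray(sell), np.asarray(y, dtype=float)
    u = np.arange(1, n_periods)
    X = ((u[None, :] > buy[:, None]) & (u[None, :] <= sell[:, None])).astype(float)
    n, m = X.shape
    a0 = b0 = 0.001  # Gamma(0.001, 0.001) priors on both precisions
    XtX, Xty = X.T @ X, X.T @ y
    kappa, tau_e, tau_eta = 0.0, 1.0, 1.0
    draws = []
    for it in range(n_iter):
        # 1. returns | kappa, precisions: multivariate normal via Cholesky
        prec = tau_e * XtX + tau_eta * np.eye(m)
        mean = np.linalg.solve(prec, tau_e * Xty + tau_eta * kappa)
        L = np.linalg.cholesky(prec)
        d_mu = mean + np.linalg.solve(L.T, rng.standard_normal(m))
        # 2. drift | returns (flat prior): normal around the mean return
        kappa = rng.normal(d_mu.mean(), 1.0 / np.sqrt(tau_eta * m))
        # 3. precisions | rest: conjugate gamma draws (numpy uses scale = 1/rate)
        resid = y - X @ d_mu
        tau_e = rng.gamma(a0 + n / 2, 1.0 / (b0 + resid @ resid / 2))
        dev = d_mu - kappa
        tau_eta = rng.gamma(a0 + m / 2, 1.0 / (b0 + dev @ dev / 2))
        if it >= burn:
            draws.append(np.concatenate([[0.0], np.cumsum(d_mu)]))
    return np.array(draws)  # posterior draws of the log index, mu_0 = 0
```

Averaging the returned draws over iterations gives a posterior mean index; quantiles of the draws give credible bands.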

Application

Data and Descriptive Statistics

The Kadaster (Dutch Land Registry Office) provided the transaction data of individual houses in the Netherlands. The Kadaster is the official institution responsible for the registration of real estate properties. The Kadaster together with Statistics Netherlands construct an index based on the Sales Price Appraisal Ratio (SPAR) method (Bourassa et al. 2006). Price indices are provided on a national level as well as on regional (including Amsterdam) and house type levels. More details on the index construction method and the database can be found in De Vries et al. (2009). In this research we limit ourselves to a small area in Amsterdam-West, consisting of only five four-digit zip code areas. This area is of interest for us as it has a good mix of single-family housing and apartments (whereas Amsterdam is usually dominated by apartments). See Appendix A for more details on the exact location in Amsterdam.

Real Capital Analytics (RCA) provided us with repeat sales data of commercial real estate for the Philadelphia/Baltimore region, as defined by RCA. The Philadelphia/Baltimore region consists of the entire Delaware Valley and Baltimore metro region. It thus spans parts of the states: Pennsylvania, Maryland, Delaware and New Jersey.Footnote 8 RCA publishes a repeat sales index on a quarterly basis for the entire Philadelphia/Baltimore region, based on the frequency conversion methodology described by Bokhari & Geltner (2012).

In both cases, the total number of pairs is low: 915 pairs for our 5 zip codes in Amsterdam-West and 795 pairs for the combined Philadelphia/Baltimore region, between 2000 and the first quarter of 2016. Since we are interested in a quarterly index, we end up with an average of 28 transactions per quarter in Amsterdam-West and 24 transactions per quarter in Philadelphia/Baltimore. The resulting ‘standard’ Bailey et al. (1963) repeat sales quarterly index for both regions is shown in Fig. 1.
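Since every pair consists of two transactions, these averages follow directly from the pair counts, assuming the sample spans the 65 quarters from 2000Q1 through 2016Q1:

$$ \frac{2 \times 915}{65} \approx 28.2 \ \text{(Amsterdam-West)}, \qquad \frac{2 \times 795}{65} \approx 24.5 \ \text{(Philadelphia/Baltimore)}. $$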

Fig. 1 Bailey et al. (1963) indices for both regions

Both indices are very volatile (noisy), and other estimation techniques would probably fit the data better. Both indices show high price appreciation before the crisis, followed by a price drop and a subsequent recovery. In Amsterdam the price drop is observed a few years after the start of the crisis. The most common explanation is the high mortgage debt of Dutch households and the resulting negative equity, which led to a large drop in transaction volume but a smaller drop in prices (Genesove & Mayer 1997).Footnote 9

In our research we are interested in even more granular indices. For example, the Philadelphia market is expected to differ, at least to some extent, from the Baltimore market. Also, offices, retail, industrial and multi-family housing complexes each have their own demand and supply characteristics. The same holds for ‘the’ housing market in Amsterdam-West: housing in each zip code, and of each property type (apartment and single-family housing), can have a different price appreciation, volatility, etc. Obviously, the number of observed pairs per sub-market is even lower. Tables 2 and 3 give the number of pairs per sub-market for Amsterdam and Philadelphia/Baltimore, respectively.Footnote 10

Table 2 Number of pairs in Amsterdam-West
Table 3 Number of pairs Philadelphia/Baltimore

We end up with an average of only 90 pairs per sub-market, or about 3 transactions per quarter, for both markets. Note, however, that the HRS uses information from the entire region and all property types while still producing the individual indices. In other words, the number of pairs used by the HRS is still 795 for Philadelphia/Baltimore and 915 for Amsterdam-West. This is inherently different from the ‘standard’ Bailey et al. (1963) and Case & Shiller (1987) and the ‘Bayesian’ Goetzmann (1992) and Francke (2010) repeat sales methodologies.

Comparison of the Index Methodologies

In this section the results of 5 different methodologies are compared; see Table 1 for an overview. Figures 2 and 3 show the price indices for the different combinations of location and property type in Philadelphia/Baltimore and Amsterdam-West, respectively. Note that we omit the indices for the ‘Rest’ area in Philadelphia/Baltimore to conserve space; these indices are available upon request. The indices of the HRS FEn model for both Philadelphia/Baltimore and Amsterdam-West are presented in Appendix B. Table 4 shows the estimation results for the variance parameters and the degrees of freedom \(\nu\) for different data sets and models. Table 5 provides some descriptive statistics of the (log) returns of the indices: mean, absolute mean, standard deviation, autocorrelation and correlation between sub-indices.
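The return statistics listed above can be computed from index levels as in the following Python/NumPy sketch. The function name, the use of first-order autocorrelation and the sample (ddof = 1) standard deviation are our assumptions about how such a table could be produced, not a description of the exact computations behind Table 5.

```python
import numpy as np


def return_stats(index_levels):
    """Log-return statistics for a (T, n_indices) array of index levels."""
    levels = np.asarray(index_levels, dtype=float)
    r = np.diff(np.log(levels), axis=0)  # quarterly log returns
    centred = r - r.mean(axis=0)
    # first-order sample autocorrelation per sub-index
    ac1 = (centred[1:] * centred[:-1]).sum(axis=0) / (centred**2).sum(axis=0)
    return {
        "mean": r.mean(axis=0),
        "abs_mean": np.abs(r).mean(axis=0),
        "std": r.std(axis=0, ddof=1),
        "ac1": ac1,
        "corr": np.corrcoef(r, rowvar=False),  # correlation between sub-indices
    }
```

A correlation close to 1 between sub-indices, as reported below for the HRS FE model in Amsterdam-West, would appear as off-diagonal entries near 1 in `corr`.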

Fig. 2 Price indices for the Philadelphia/Baltimore metro area

Fig. 3 Price indices for Amsterdam-West

Table 4 Estimation results from repeat sales models over different data samples
Table 5 (log) Return statistics

For the Philadelphia/Baltimore region it is evident (Fig. 2 and Tables 4 and 5) that the STRS model results in smoother indices. The standard deviation of the quarterly returns is 2.3% on average for the STRS model, compared to 3.5% for the HRS-type models (see Table 5). An obvious reason for the smoothness of the STRS indices is that, unlike the HRS index, the STRS index is based only on observations in the specific sub-market. When very little data is available in a sub-market, the STRS model simply interpolates (according to the local linear trend structure) over the periods without observations, resulting in relatively smooth indices. This ‘over-smoothing’ is not evident in Amsterdam-West. However, it is difficult to compare the STRS models with the HRS models, as so many indices ‘break’ under the former methodology. Finally, the timing of the STRS models seems off at certain points in time: in both Philadelphia/Baltimore and Amsterdam-West, the big downturn occurs a full year earlier than one would expect on average.

One issue with the HRS FE model is that the sub-indices seem to be very highly correlated; see Table 5. In Amsterdam-West, the correlation of the (log) index returns is even close to 1. The reason is that when sub-clusters have only a few observations, the cluster indices from the HRS FE model closely follow the common trend. The correlation drops considerably when a more flexible structure is allowed (the VE and VEN models).

The HRS VE and VEN results reveal that in Philadelphia/Baltimore offices and retail are the most volatile markets, and multi-family the least. Within Amsterdam-West, the largest differences are between the zip codes and not so much between the property types (single family versus apartments).

The estimates of the scale parameter \(\sigma_{\varepsilon}\) for the individual STRS models largely coincide with those of the HRS VEN model. This holds for both Philadelphia/Baltimore and Amsterdam-West; see Fig. 4. On average \(\hat{\sigma}_{\varepsilon}\) is larger in the Philadelphia market (compared to Baltimore) and for offices and retail. There is a large difference in noise between property types in Amsterdam-West: in all cases single-family houses have a larger \(\hat{\sigma}_{\varepsilon}\) than apartments. This is not surprising, as single-family housing is in general more heterogeneous than apartments.

Fig. 4 Estimates of scale parameters in the different repeat sales models

Interestingly, the difference between the trends from the model with a normal distribution for the error term in the measurement equation (HRS FEn) and its t-distributed counterpart (HRS FE) is quite large. For example, the Philadelphia office index ends at 120 (100) for the t-distributed (normally distributed) version, and the multi-family index in Baltimore ends at 200 (220). In general, the results clearly show that the errors in the measurement equation are better described by a t-distribution than by a normal distribution: in almost all cases the estimated degrees of freedom (\(\nu\)) are between 2.5 and 3. Only in Amsterdam-West are the error terms in the STRS and HRS VEN models somewhat closer to normality.

Finally, note that the Deviance Information Criterion (DIC) (Spiegelhalter et al. 2002) is much lower, and thus much better, for the HRS FE model with a t-distribution than for its normal counterpart, in both Philadelphia/Baltimore and Amsterdam-West. The DIC is lowest for the HRS VEN model in Philadelphia/Baltimore and for the HRS VE model in Amsterdam-West.Footnote 11
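For reference, Spiegelhalter et al. (2002) define the deviance \(D(\theta) = -2 \log L(\theta)\), its posterior mean \(\bar{D}\), the effective number of parameters \(p_D = \bar{D} - D(\bar{\theta})\), and \(\text{DIC} = \bar{D} + p_D\); lower values are better. The Python/NumPy sketch below computes these quantities from posterior draws for a user-supplied log-likelihood. The function name is hypothetical, and the usage is a toy normal model with known standard deviation, not the HRS.

```python
import numpy as np


def dic(loglik, draws):
    """Deviance Information Criterion from posterior draws.

    loglik: function mapping a parameter vector theta to the data log-likelihood.
    draws:  (n_draws, n_params) array of posterior draws.
    Returns (DIC, p_D).
    """
    draws = np.asarray(draws, dtype=float)
    deviances = np.array([-2.0 * loglik(theta) for theta in draws])
    d_bar = deviances.mean()                                  # posterior mean deviance
    p_d = d_bar - (-2.0 * loglik(draws.mean(axis=0)))         # D_bar - D(theta_bar)
    return d_bar + p_d, p_d
```

If all draws are identical, \(p_D = 0\) and the DIC reduces to the deviance at that point; spread in the posterior draws increases \(p_D\).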

Robustness

In this Section we will estimate our model (the HRS VEN) on large (sub)markets with plenty of observations. Subsequently we (uniformly) sample without replacement 25% of the data, which makes the data scarce again. This random sampling is done three times. The idea is to compare the different indices and assess to what extent the indices based on the ‘scarce’ dataset follow the index on the full sample. We also include the standard Bailey et al. (1963) repeat sales index (using the full sample of the submarket) for comparison.
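The subsampling step itself is straightforward; this Python/NumPy sketch draws three independent 25% subsamples of pair indices without replacement and counts how many pairs of each sub-market survive in a run. Function names, the seed and the label encoding are hypothetical choices for illustration.

```python
import numpy as np


def subsample_runs(n_pairs, fraction=0.25, n_runs=3, seed=0):
    """Pair indices kept in each run: uniform sampling without replacement."""
    rng = np.random.default_rng(seed)
    k = int(round(fraction * n_pairs))
    return [np.sort(rng.choice(n_pairs, size=k, replace=False)) for _ in range(n_runs)]


def submarket_counts(labels, kept):
    """Number of kept pairs per sub-market label in one run."""
    values, counts = np.unique(np.asarray(labels)[kept], return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))
```

Applied to the 3,417 New York pairs, each run keeps 854 pairs; because the draw is over the full market, the number of Manhattan multi-family pairs differs from run to run.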

More specifically, we are interested in a multi-family index for both Manhattan (739 observations) and Los Angeles proper (1,388 observations), using, again, the RCA data.Footnote 12 For both indices, the multi-family properties are placed in a property type cluster together with retail, industrial and offices. The location cluster for the Manhattan index consists of the (other) New York City boroughs and the rest of the New York metro area (according to the RCA definition, which closely follows the official MSA of New York). Inland Empire and Orange County are the other regions in the location cluster for the LA index. The total number of observations, after adding the other property types and locations, is 3,417 (4,724) for the New York index (Los Angeles index).

Note that the sampling was done on the ‘full’ markets (all locations and property types). This adds further randomness to the data underlying the different runs. For example, one of the New York runs contained only 100 Manhattan multi-family observations, whereas another run contained 195 observations for that submarket. The exact numbers of observations, the descriptive statistics of these markets and the estimation results for the hyperparameters are omitted for brevity, but are available upon request. The resulting indices are displayed in Fig. 5.
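The subsampling step can be sketched as three independent uniform draws, without replacement, of 25% of the row indices of the full market. The number of observations that happen to fall in a given submarket then varies from run to run. This is a minimal sketch: the function name, the seed and the market sizes are hypothetical. Which unit was sampled (individual transactions or repeat-sale pairs) is not specified here, so the code simply samples rows.

```python
import numpy as np


def draw_subsamples(n_obs, fraction, n_runs, seed):
    """Return n_runs sorted index arrays, each a uniform sample without replacement."""
    rng = np.random.default_rng(seed)
    size = int(round(fraction * n_obs))
    return [np.sort(rng.choice(n_obs, size=size, replace=False)) for _ in range(n_runs)]


# Hypothetical full market: 1,000 observations, of which the first 200 belong
# to the target submarket (e.g. one location x property type combination).
n_total = 1000
in_target = np.zeros(n_total, dtype=bool)
in_target[:200] = True

runs = draw_subsamples(n_total, fraction=0.25, n_runs=3, seed=42)
target_counts = [int(in_target[idx].sum()) for idx in runs]
print(target_counts)  # the submarket size varies from run to run
```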

Fig. 5 Multi-family indices on sub-samples

First, Fig. 5 reinforces earlier findings in the field of structural time series models: the HRS index is less volatile and more positively autocorrelated than the ‘standard’ Bailey et al. (1963) repeat sales index. Contrary to prior expectation, the ‘standard’ repeat sales index for Los Angeles is considerably more volatile than the New York index, even though it is based on about twice as many observations. This might be caused by differences in the heterogeneity of the properties between the two markets. More interesting, however, is that omitting 75% of the data does not affect the indices considerably. Arguably, only the third run of the Los Angeles index deviates somewhat in the final year or two (see Footnote 13).
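Volatility and first-order autocorrelation, the two properties used above to contrast the HRS with the standard index, can be computed from an index series as sketched below. The sketch assumes volatility means the sample standard deviation of period log returns. The helper names and the two toy series are hypothetical.

```python
import numpy as np


def volatility_and_ac1(index):
    """Standard deviation and first-order autocorrelation of period log returns."""
    r = np.diff(np.log(np.asarray(index, dtype=float)))
    vol = r.std(ddof=1)
    ac1 = np.corrcoef(r[:-1], r[1:])[0, 1]
    return vol, ac1


def index_from_returns(returns, start=100.0):
    """Build an index level series from period log returns."""
    return start * np.exp(np.concatenate(([0.0], np.cumsum(returns))))


# Hypothetical quarterly returns: a smooth trend versus a jagged, mean-reverting series
smooth = index_from_returns([0.01, 0.02, 0.03, 0.04, 0.05])
jagged = index_from_returns([0.05, -0.03, 0.05, -0.03, 0.05, -0.03])
print(volatility_and_ac1(smooth), volatility_and_ac1(jagged))
```

A noisy time-dummy index typically shows the jagged pattern: a positive estimation error in one period is followed by a negative one. This produces negative return autocorrelation and inflated volatility.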

Conclusions

In this paper we focus on computing property price indices in small samples using hierarchical repeat sales (HRS) models. The HRS model can be seen as an extension of the structural time series repeat sales model (STRS) by Francke (2010). The STRS model assumes one common trend for the total market, modeled by a local linear trend model. The HRS model assumes both a common trend and cluster trends in deviation from the common trend (modeled by random walks). The clusters can, for example, represent locations and property types. Both the STRS and the HRS model specify the trends as stochastic time series models, replacing the fixed time dummy variables of the original repeat sales model by Bailey et al. (1963).
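The hierarchical additive trend structure can be illustrated by simulation. The common trend follows a local linear trend model, each cluster entry follows a random walk, and the log price trend of a property type in a region is the sum of the common, type and region trends. This is a sketch only. The noise standard deviations are hypothetical, any identification restrictions the model may impose on the cluster trends are not reproduced, and in the HRS these trends are latent states estimated from repeat sales rather than simulated.

```python
import numpy as np


def simulate_hierarchical_trends(n_periods, n_types, n_regions, sd, rng):
    """Simulate common + cluster log price trends with a hierarchical additive structure.

    sd: dict of noise standard deviations (values are hypothetical).
    Returns the common trend, type trends, region trends and the combined
    trend array of shape (n_periods, n_types, n_regions).
    """
    # Common trend: local linear trend model with level mu and slope beta.
    mu = np.zeros(n_periods)
    beta = np.zeros(n_periods)
    for t in range(1, n_periods):
        mu[t] = mu[t - 1] + beta[t - 1] + rng.normal(0.0, sd["level"])
        beta[t] = beta[t - 1] + rng.normal(0.0, sd["slope"])

    # Cluster trends: random walks (starting at zero) in deviation from the common trend.
    def random_walks(n_series, scale):
        steps = rng.normal(0.0, scale, size=(n_periods, n_series))
        steps[0] = 0.0
        return np.cumsum(steps, axis=0)

    type_trends = random_walks(n_types, sd["type"])
    region_trends = random_walks(n_regions, sd["region"])

    # Log price trend of (type j, region k) = common + type j + region k.
    combined = mu[:, None, None] + type_trends[:, :, None] + region_trends[:, None, :]
    return mu, type_trends, region_trends, combined


rng = np.random.default_rng(7)
sd = {"level": 0.01, "slope": 0.002, "type": 0.01, "region": 0.01}
mu, types, regions, trends = simulate_hierarchical_trends(40, 4, 3, sd, rng)
# e.g. trends[:, 0, 2] is the log price trend of property type 0 in region 2
```

With 4 property types and 3 regions, 12 sub-index trends are driven by only 1 + 4 + 3 = 8 stochastic components. This pooling is what allows thin submarkets to borrow strength from the rest of the market.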

Another departure from commonly used versions of repeat sales models is that we allow for a heavy-tailed error distribution to better deal with potential outliers in the data. We specify the error term by a t-distribution with ν degrees of freedom. The degrees of freedom parameter is not fixed a priori but is estimated from the data.
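A standard way to handle a t-distributed error in a Gibbs sampler is to write it as a scale mixture of normals. Under this representation, e_i | λ_i ~ N(0, σ²/λ_i) with λ_i ~ Gamma(ν/2, rate ν/2), and the conditional draw is λ_i | e_i ~ Gamma((ν+1)/2, rate (ν + e_i²/σ²)/2). Large residuals therefore receive small weights λ_i, whose conditional mean is (ν+1)/(ν + e_i²/σ²). The sketch below shows this well-known data augmentation. It is not necessarily the paper's exact sampler, and the function names are hypothetical.

```python
import numpy as np


def draw_t_errors(n, nu, sigma, rng):
    """Draw t_nu errors with scale sigma as a scale mixture of normals."""
    # lambda_i ~ Gamma(shape = nu/2, rate = nu/2); NumPy parameterizes by scale = 1/rate.
    lam = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    # e_i | lambda_i ~ N(0, sigma^2 / lambda_i)
    e = rng.normal(0.0, sigma / np.sqrt(lam))
    return e, lam


def draw_weights(e, nu, sigma, rng):
    """Gibbs step: lambda_i | e_i ~ Gamma(shape=(nu+1)/2, rate=(nu + (e_i/sigma)^2)/2)."""
    rate = (nu + (np.asarray(e) / sigma) ** 2) / 2.0
    return rng.gamma(shape=(nu + 1) / 2.0, scale=1.0 / rate)


rng = np.random.default_rng(3)
e, lam = draw_t_errors(200_000, nu=5.0, sigma=1.0, rng=rng)
print(e.var())  # population variance is nu / (nu - 2) = 5/3 for nu = 5
```

Given the weights, the measurement equation is conditionally Gaussian with heteroskedastic variances σ²/λ_i. This is what makes the t-distribution compatible with Gaussian state space machinery.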

We provide estimation methods that efficiently deal with the complex error structure induced by the stochastic trends and the t-distribution. The first method uses the augmented Kalman filter; the second uses Markov chain Monte Carlo techniques (Gibbs sampling).
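As a simplified illustration of the filtering idea, the sketch below runs a plain Kalman filter for a local linear trend observed directly with Gaussian noise. It is not the augmented Kalman filter, and it does not use the repeat sales measurement equation, which links observed log returns to differences in log price levels. A large initial state variance stands in, crudely, for the exact diffuse initialization that the augmented filter handles. The function name and parameter values are hypothetical.

```python
import numpy as np


def kalman_local_linear_trend(y, var_eps, var_level, var_slope, p0=1e6):
    """Kalman filter for y_t = mu_t + eps_t with a local linear trend for mu_t.

    Returns the filtered states [level, slope] after each observation.
    A large initial variance p0 is a crude stand-in for an exact diffuse start.
    """
    T = np.array([[1.0, 1.0], [0.0, 1.0]])  # level_{t+1} = level_t + slope_t
    Z = np.array([1.0, 0.0])                # only the level is observed
    Q = np.diag([var_level, var_slope])     # state disturbance variances
    a = np.zeros(2)                         # predicted state mean
    P = p0 * np.eye(2)                      # predicted state covariance
    filtered = []
    for obs in y:
        v = obs - Z @ a                     # one-step prediction error
        F = Z @ P @ Z + var_eps             # its variance
        K = P @ Z / F                       # Kalman gain
        a = a + K * v                       # filtered state mean
        P = P - np.outer(K, Z @ P)          # filtered state covariance
        filtered.append(a.copy())
        a = T @ a                           # predict the next state
        P = T @ P @ T.T + Q
    return np.array(filtered)


y = 2.0 + 0.5 * np.arange(20)  # hypothetical noise-free linear series
states = kalman_local_linear_trend(y, var_eps=1e-4, var_level=1e-8, var_slope=1e-8)
print(states[-1])  # filtered [level, slope] at the last period
```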

We apply the hierarchical repeat sales model to two different markets: commercial property in the Philadelphia/Baltimore region in the United States and residential property in Amsterdam-West in the Netherlands. In both markets we have on average about 30 transactions per quarter. Unlike the standard repeat sales model, the STRS model is able to produce reliable indices at the total market level. The HRS goes a step further: it subdivides the market into sub-markets based on location and property type, resulting in 10 (Amsterdam-West) and 12 (Philadelphia/Baltimore) sub-indices. In particular, versions of the HRS model perform well when the noise components follow a t-distribution and the variance of the cluster trend noise differs across the elements of a cluster. The estimated degrees of freedom are small, rejecting the commonly made assumption of normally distributed error terms.

For robustness we also estimate our main model on (sub)markets with ample data, namely multi-family properties in Manhattan (New York) and Los Angeles proper (739 and 1,388 observations, respectively). These samples are large enough to be expected to yield reliable indices. As before, the two clusters are completed by (1) other property types and (2) other (sub)locations within the metro area. We then draw a uniform random sample, without replacement, of 25% of the data and re-estimate the model. This allows us to assess how accurate our index methodology is in a scarce data environment by comparing the resulting indices with the ‘true’ full-sample index (for example, in terms of volatility and mean returns). Our robustness check shows that omitting 75% of the data does not affect our indices considerably.
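A comparison of a subsample index with the full-sample (‘true’) index could be summarized by a few statistics: a level tracking error, the difference in mean period log returns, and the ratio of return volatilities. The sketch below is one hypothetical way to compute them. The metric choices, the function name and the toy index values are assumptions, not the paper's procedure.

```python
import numpy as np


def compare_to_reference(index, reference):
    """Summary statistics of how closely an index tracks a reference index."""
    index = np.asarray(index, dtype=float)
    reference = np.asarray(reference, dtype=float)
    log_gap = np.log(index) - np.log(reference)
    r, r_ref = np.diff(np.log(index)), np.diff(np.log(reference))
    return {
        "rmse_log_gap": float(np.sqrt(np.mean(log_gap ** 2))),   # level tracking error
        "mean_return_diff": float(r.mean() - r_ref.mean()),      # drift difference
        "vol_ratio": float(r.std(ddof=1) / r_ref.std(ddof=1)),   # relative volatility
    }


full = np.array([100.0, 102.0, 101.0, 105.0, 108.0, 110.0])  # hypothetical full-sample index
sub = np.array([100.0, 101.0, 102.5, 104.0, 109.0, 111.0])   # hypothetical subsample index
print(compare_to_reference(sub, full))
```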