Introduction and objectives

Trip generation models are the first and most important phase in traditional transport modelling given that they provide the basic input for the remaining simulation stages (McNally 2008). The classic specification of four stage models starts from the assumption that changes in the distribution of activities and changes in the transport network do not affect the number of trips being generated, a hypothesis which is difficult to justify over the medium to long term (Martínez 2000; Ortúzar and Willumsen 2001).

This article proposes the inclusion of accessibility indicators in trip generation models to mitigate this problem by making them sensitive to changes which may occur within the system of activities and the transport network. The theoretical relationship between accessibility and trip generation is supported by two basic premises (Thill and Kim 2005). Firstly, making a journey is not generally an end in itself but a means to perform other activities. Secondly, trip generation depends on the services provided by the transport infrastructure, in other words, on the existence of the phenomenon known as induced demand (Cervero 2002; Goodwin 1996; Hymel et al. 2010). The first premise suggests that a greater presence of opportunities should lead to a greater disposition to make journeys. The second implies that lower journey costs will create greater demand for transport, both in terms of the number of journeys made and their length. Furthermore, greater accessibility to opportunities could also lead to a reduction in journeys made by private vehicle (PV), an effect detected in previous research (Cervero and Kockelman 1997). Therefore, in a given zone, changes made both to the location of activities and to the network should have an effect on trip generation if they coincide with any change in opportunities or travelling costs. If the trip generation models are not sensitive to these changes, then the transport models could provide biased predictions, for example, about the demand resulting from the construction of new transport infrastructure.

Nevertheless, in spite of its theoretical importance, the inclusion of accessibility indicators in trip generation models has proven to be problematic. In some estimations the indicators used have turned out not to be significant or have produced an unexpected sign, which is why some authors have recommended against using them in practice (Ewing et al. 1996; Hanson and Schwab 1987).

This article proposes a new evaluation of the explanatory capacity of the accessibility variable in trip generation models, controlling for the spatial effects which may occur. Accessibility will be measured by three indicators: a gravity accessibility indicator from each of the zones to employment, a gravity accessibility indicator from each of the zones to education places and an indicator of the journey time to the city centre. The possible presence of spatial autocorrelation in the error term has scarcely been addressed by traditional linear regression trip generation models, in spite of the bias and inefficiency it may cause in the estimated parameters. Spatial autocorrelation could occur due to social effects resulting from interaction between households, creating different patterns of trip generation in the study area. This hypothesis links in with the growing interest in the influence of social factors on transport choices (Carrasco and Farber 2014; Scott et al. 2013). Furthermore, the omission from the model of any relevant variable which affects the population in a spatially differentiated way could also cause spatial autocorrelation in the residuals. It is therefore important to control for these spatial effects in order to estimate the models’ parameters correctly.

The proposed models have been estimated using disaggregated household data. Multiple linear regression models (MLR), spatial autoregressive models (SAR), spatial autoregressive models in the error term (SEM) and spatially filtered Poisson regression models (SPO) have been estimated. The four types of models will be compared in terms of the theoretical coherence and statistical significance of the estimated parameters as well as their goodness of fit. Finally, the models will be validated using a K-fold type technique, a recommended method for testing the predictive capacity of models which has not previously been applied in this line of research (James et al. 2014).

The results show that household trip production for work and other purposes was sensitive to the accessibility conditions. However, whereas greater accessibility implied a reduction in PV trips for work purposes, it also implied an increase in the number of trips made using other transport modes (NPV) for non-mandatory purposes. Trips made for study purposes could decrease in zones providing greater accessibility by PV, although the effect was only significant using the journey time to the city centre as an indicator. These results provide evidence supporting the idea that accessibility is a relevant factor in modelling discretionary trip production using modes other than the car (greater accessibility coincides with greater household trip production, ceteris paribus) and that it could be a factor in the reduction of PV trip generation, especially for work purposes.

Furthermore, the models which consider the presence of spatial dependence between observations (SAR, SEM and SPO) showed that there was a significant degree of spatial dependence in trip production. Taking this dependence into account was important, especially through the direct and indirect effects of the variables in the SAR production models, in order to interpret the estimated parameters correctly.

Finally, the validation of the models showed that the SAR and SEM models had a mean squared error which was slightly lower than that of the MLR models in predicting overall trip production. This was because the MLR models showed a significant degree of spatial correlation, with positive residuals concentrated in the periphery of the study area for trips made using PV and concentrated in the centre for trips made for other purposes using NPV. This effect was reduced by most of the spatial models until it lost its significance. It could be due to the interaction between households in different zones of the study area or, alternatively, to the omission of a spatial variable with a different effect between the centre and the periphery. Furthermore, the Poisson regression models performed better both for work journeys and for those made for other purposes, underlining the importance of considering the truncated and discrete nature of trip generation.

The article is structured as follows. The “State of the art” section reviews previous work on the inclusion of accessibility as a variable and on the consideration of spatial effects in trip generation models. The methodology and the study area used for estimating the models are presented in the “Methodology and application” section. The results are then presented in the “Results and discussion” section and, finally, the “Conclusions” section covers the main conclusions that have been drawn.

State of the art

The relationship between accessibility and trip generation has not yet been satisfactorily clarified in the specialised literature. Some authors have found that this relationship is positive and significant whereas others have found no relationship whatsoever between accessibility and trip generation.

One of the first studies was by Vickerman (1974), who tested the relationship between trip generation and accessibility using data from Oxford (United Kingdom). The author found evidence that variations in accessibility were positively linked to trip generation. Vickerman, however, also underlined the difficulty of estimating the model given the high level of correlation between the independent variables, a factor which the author believed was the main difficulty when trying to differentiate between the effects accessibility and other factors have on trip generation.

Later, Koenig (1980) studied the relationship between trip generation and accessibility using data from five French cities. The author used a gravity type indicator of accessibility and correlated it with trip generation rates, controlling for car ownership, age group and journey purpose. The relationship was positive in all the groups, leading the author to conclude that accessibility was a strong determining factor in trip generation.

Hanson and Schwab (1987) examined the relationship between accessibility and different journey choices, one of which was frequency, using data from the town of Uppsala (Sweden). The analysis controlled for various sociodemographic characteristics and the authors concluded that, unlike Koenig (1980), the relationship between accessibility and trip generation was weak and lower than expected from the theoretical hypothesis. This led the authors to state that on an intraurban scale the level of accessibility does not need to be incorporated into trip generation models. A similar result was obtained by Ewing et al. (1996), who tested the impact of accessibility and other land use variables on trip rates using data from Florida (USA). The results showed that density, land use diversity and accessibility did not have significant effects on trip generation, so the authors claimed that conventional trip generation models without regard to location were good enough. Negative results were also obtained by Kitamura et al. (2001), who examined data from the Kyoto-Osaka-Kobe metropolitan area (Japan) and southern California (USA) in order to test the effect of accessibility on trip behaviour. The authors concluded that accessibility did not affect automobile use in study areas characterized by high industrial and urban development.

Thill and Kim (2005) estimated multiple regressions to predict the generation of private vehicle journeys with data from Minneapolis-St. Paul (USA). The authors used two types of accessibility indicators, gravity and accumulated opportunities, as well as different manually calibrated impedance parameters. The researchers concluded that both the trip production and the trip attraction models were significantly affected by accessibility, even though the sign of the variable was positive or negative depending on the model.

Another line of research has focused on the role of other land use characteristics in trip generation. Cervero and Kockelman (1997) studied the importance of the built environment in PV and NPV trip generation considering three dimensions: density, diversity and design, using data from the San Francisco Bay Area (USA). The authors considered accessibility to jobs and services as part of the density concept. This research found that the three land use factors (including accessibility) can reduce PV trip rates, encouraging the use of more sustainable transport modes. Cervero et al. (2009) examined how different facilities and built environment characteristics (density, diversity, accessibility and proximity to transit) influenced walking and cycling behaviour in Bogotá (Colombia). The results showed that factors such as connectivity and proximity to cycle lanes were associated with more physical activity whereas other factors such as density and diversity were not. According to the authors, these results showed that, given the high accessibility of most neighbourhoods in Bogotá, only the design factors were relevant. Purvis et al. (1996) estimated non-work trip generation models incorporating the effect of work trip duration as a measure of accessibility using data from the San Francisco Bay Area (USA). The authors found that decreases in work trip duration were correlated with increases in trip generation for shopping and other purposes. In the same study area, Wu et al. (2012) compared the auto trip rate predictions of an activity-based model with and without considering accessibility. The authors applied the model to the San Francisco urban area (USA) and concluded that if trip generation had been calculated without considering accessibility, the model would have significantly overestimated the number of trips made by car in denser neighbourhoods.

Other authors such as Walters et al. (2013), Clifton et al. (2015) and Millard-Ball (2015) have focused on methods for adjusting the vehicle trip generation estimates of the Institute of Transportation Engineers’ (ITE) Trip Generation Manual (Institute of Transportation Engineers 2012). In general, the authors have shown that the ITE trip rates overestimate private vehicle trips by significant percentages, such as 55 % in the case of Millard-Ball (2015) and 49 % for AM peak traffic in Walters et al. (2013). This is because the rates were often based on surveys conducted before the 1960s in areas with low accessibility to opportunities, without mixed use and where there was less congestion and fewer transport alternatives. This line of research has highlighted the importance of taking into account the built environment and the accessibility of every place in order to estimate more accurate trip generation rates.

MLR is among the more commonly used methods for estimating the future evolution of trip generation. However, MLR presents certain problems associated with modelling trip generation as a continuous variable and not considering the possible presence of spatial autocorrelation in the residuals of the model. Given that the data used in trip generation models contain a strong spatial component, independently of their aggregated or disaggregated nature, specific econometric techniques need to be applied to minimize the problem. If not, then the estimated parameters could show bias and may provide erroneous inferences about the influence of variables such as accessibility.

Spatial dependence can be defined simply as the impact of the dependent variable or the independent variables of neighbouring areas on the dependent variable of a given area (Anselin 1988). In order to define the relationship between areas the analyst needs to specify a neighbourhood matrix. Gamas et al. (2006) estimated trip generation models for Mexico City (Mexico) considering the trips generated per unit area and the presence of spatial effects. After finding that several of their variables were autocorrelated, the authors applied a SAR model and thereby avoided the estimation of biased parameters resulting from conventional MLR models (LeSage and Pace 2009).

Kwigizile and Teng (2009) estimated trip production and attraction models using data from Las Vegas Valley (USA). Initially, the authors specified and estimated trip production and attraction models using MLR. After performing a Moran I test on the models’ variables, the authors detected the presence of residual autocorrelation in the trip attraction model but not in the trip production model. This led the authors to estimate a trip attraction model considering the presence of spatial dependence between observations. Comparing the fit of the spatial and non-spatial models, the former showed an average deviation from the data of 17 % compared with 31 % for the latter.

Finally, Kim et al. (2009) detected the presence of spatial effects in the non-home-based (NHB) trip generation models in the metropolitan area of Daegu (South Korea). The NHB trip production and attraction models showed spatial correlation in the error term leading the authors to estimate SEM models. The NHB attraction model also showed spatial correlation. The authors concluded that spatial econometric models were a useful tool for making estimations about trip generation in study areas in which there are agglomeration economies, i.e. positive externalities derived from proximity among firms and people that induce more travel demand (Glaeser 2010).

Although the authors in the three studies mentioned above showed that the models considering the presence of spatial correlation in the residuals had a better fit to the data, they did not validate the models using different data from that used in the estimation phase. However, Lopes et al. (2014) did validate trip generation models estimated using data from 1974 with an origin–destination survey in 2003 for the city of Porto Alegre (Brazil). The authors found that the spatial models made better predictions for the 2003 journeys than the traditional linear regression models, although in both cases the predictions were not good enough due to the huge changes experienced by the city over 30 years. In any case, the predictive capacity of these models compared to the traditional MLR models is yet to be completely clarified.

Furthermore, additional evidence is required to check whether the theoretical hypothesis that accessibility affects trip generation is correct, while also accounting for the spatial effects between observations that are usually present in cross-sectional data. These spatial effects in trip production may be caused by various factors. The first is spatial dependence between nearby households due to mutual social influence. Recent research has placed more emphasis on the importance of considering social factors together with traditional factors when explaining transport choices. Social norms (Bamberg et al. 2007), transport habits (Minnen et al. 2015), social networks (Sherwin et al. 2014) or social cohesion (Clark and Scott 2013) may imply spatial dependencies in the behaviour of nearby households, which generate a greater or lesser trip production than households in other areas. Another factor to consider is the inability to measure, or to measure precisely, variables that have a differential effect in space. This specification error may also imply autocorrelation in the residuals of the MLR models, leading to the estimation of biased or inefficient parameters. Both effects can be captured with the support of models which explicitly consider situations of spatial dependence, such as those presented in this study.

Methodology and application

This section presents the theoretical formulation of the trip generation models incorporating the accessibility factor and the presence of spatial autocorrelation in the residuals of the models. The available data for estimating the trip generation models are also presented here.

Modelling trip generation considering accessibility to opportunities

In trip generation modelling, a distinction must be made between trip production and trip attraction models. Trip production models aim to predict home-based (HB) journeys, where the origin or destination is the home, and the origins of non-home-based (NHB) trips. Trip attraction models instead cover the non-home end of HB journeys and the destinations of NHB journeys (Ortúzar and Willumsen 2001).

Trip generation models can be classified by purpose. The three most commonly used categories include: journeys for work, journeys for education and journeys for other purposes. This latter class can be further divided into multiple sub-classes depending on the requirements of the model (shopping, leisure, etc.). The estimation can be carried out using disaggregated household data in the case of trip production or using aggregated zonal data in modelling trip attraction. The conventional form of trip generation models corresponds to a multiple linear regression:

$$y = X\beta + \varepsilon$$
(1)

where y is a vector with information about the dependent variable (produced or attracted trips), β is a vector of estimated parameters and ɛ is a vector of independent and identically distributed (IID) errors. The matrix X contains information about the independent variables.

Some authors have proposed the use of models which consider the truncated and discrete nature of trip production (Chang et al. 2014; Lim and Srinivasan 2011). Among the available alternatives is the non-linear Poisson regression model with probability given by (Greene 2003):

$$P(y_{i} ) = \frac{{\lambda_{i}^{{y_{i} }} e^{{ - \lambda_{i} }} }}{{y_{i} !}}$$
(2)

The Poisson regression assumes that each observation yi of the dependent variable is drawn from a discrete Poisson distribution with parameter λi, which is commonly linked logarithmically to a linear combination of independent variables:

$$\ln (\lambda_{i} ) = x_{i}^{'} \beta$$
(3)

However, this model has the limitation of assuming that the conditional mean and variance of the Poisson distribution are equal, which could be problematic if overdispersion in yi is detected. This assumption can be relaxed by estimating an additional dispersion parameter, leading to a quasi-Poisson regression model (Zeileis et al. 2008).
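The equidispersion assumption can be illustrated with a minimal sketch. The trip counts and the coefficients below are invented for illustration and are not taken from the survey; the unconditional variance-to-mean ratio is only a rough screening device, since the assumption in the model refers to the conditional moments.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """Eq. (2): probability of observing y trips when the expected number is lam."""
    return lam ** y * math.exp(-lam) / math.factorial(y)

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; values well above 1 hint at overdispersion."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean

# Hypothetical household trip counts, invented for illustration only.
trips = [0, 0, 1, 2, 2, 3, 4, 8]

# Eq. (3) with made-up coefficients beta = (0.2, 0.5) and regressors x = (1, 1.0).
lam = math.exp(0.2 * 1 + 0.5 * 1.0)

print(round(poisson_pmf(2, lam), 4))
print(round(dispersion_ratio(trips), 3))
```

A ratio close to 1 would be compatible with the Poisson assumption, whereas a clearly larger value would point towards the quasi-Poisson alternative.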

Among the most commonly used independent variables in trip generation models are the number of vehicles, household income, household size, land use and accessibility. Various indicators have been proposed in the literature to capture the effect of accessibility. Handy and Niemeier (1997) proposed a classification with three broad types: accumulated opportunities, gravity type and utility based. The most frequently used indicators are of the gravity type and generally take the following form:

$$A_{i} = \sum\limits_{j} {f(E_{j} ,C_{ij} )}$$
(4)

where Ej is a measure of the attraction of zone j and Cij is a measure of the journey cost between zones i and j. The indicators of accumulated opportunities can be interpreted as a special case of (4) in which the cost term takes the value 1 if the opportunities are found within a cost cut-off point defined by the analyst and 0 if not (Koenig 1980). The gravity indicators have an advantage over accumulated opportunities in that they can differentiate the opportunities according to journey cost without setting a binary cut-off point. The gravity indicators are also zonal, meaning that they provide a better understanding of the aggregate relationship between transport and land use than those based on utility at the individual level (Horner 2004).
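The difference between the two families of indicators can be sketched as follows. The three zones, the job counts, the travel times, the negative exponential impedance and the values of `beta` and `cutoff` are all assumptions chosen for illustration; they do not come from the study data.

```python
import math

def accessibility(i, opportunities, cost, f):
    """Eq. (4): sum over destination zones j of f(E_j, C_ij)."""
    return sum(f(opportunities[j], cost[i][j]) for j in range(len(opportunities)))

def gravity(beta):
    """Negative exponential impedance (one common choice; beta is an assumed value)."""
    return lambda e, c: e * math.exp(-beta * c)

def cumulative(cutoff):
    """Counts an opportunity fully if its cost is within the cut-off, otherwise not at all."""
    return lambda e, c: e if c <= cutoff else 0.0

# Hypothetical three-zone example: jobs per zone and travel times in minutes.
jobs = [500, 200, 50]
minutes = [[2, 10, 25],
           [10, 3, 15],
           [25, 15, 4]]

for zone in range(3):
    grav = accessibility(zone, jobs, minutes, gravity(0.1))
    cum = accessibility(zone, jobs, minutes, cumulative(12))
    print(zone, round(grav, 1), cum)
```

In this toy case the first two zones receive the same accumulated opportunities score, while the gravity form still separates them according to travel time.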

Three accessibility indicators were chosen for this work. The simplest consisted of measuring access times from the centroids to the network plus travel times along the arcs of the network to the city centre, from each of the zones in the study area. This type of indicator can be useful in monocentric urban areas where there is a high centralised concentration of opportunities. Access and travel times are not the only costs that could be considered, but they provide a good proxy to the generalized travel cost of the trip. A gravity indicator proposed by Cascetta (2009) and Coppola and Nuzzolo (2011) has also been used for urban areas with a polycentric nature. These authors differentiate between the active accessibility of a zone which can be defined as the capacity to reach the opportunities present in other zones, and passive accessibility which is defined as the ability of a zone to be reached from other zones. Active accessibility can be seen as a valid indicator to predict the number of journeys produced by a zone and is formulated in the following way:

$$ACC\_ACT_{i} = \sum\limits_{j} {\left[ \exp (\alpha_{2} \cdot C_{ij} ) \cdot E_{j}^{\alpha_{1}} \right]}$$
(5)

where Ej and Cij are the same as those present in Eq. (4) and α1 and α2 are parameters to be estimated. This indicator can be easily calibrated using ordinary least squares (OLS) by transforming both sides of expression (5) logarithmically.
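A minimal sketch of such a calibration is given below. It assumes that the dependent variable is the observed number of trips Tij between zones and that Tij is proportional to each term of (5), so that ln Tij = const + α1 ln Ej + α2 Cij. The trip matrix is synthetic and noise-free, and the helper names (`solve`, `ols`, `calibrate`) are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """Ordinary least squares through the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * v for r, v in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)

def calibrate(trips, opportunities, cost):
    """Regress ln(T_ij) on a constant, ln(E_j) and C_ij; returns (const, alpha1, alpha2)."""
    rows, y = [], []
    for i, trip_row in enumerate(trips):
        for j, t in enumerate(trip_row):
            if t > 0:  # the logarithm is undefined for empty cells
                rows.append([1.0, math.log(opportunities[j]), cost[i][j]])
                y.append(math.log(t))
    return ols(rows, y)

# Synthetic trip matrix built from known parameters (illustration only).
jobs = [500, 200, 50]
minutes = [[2, 10, 25], [10, 3, 15], [25, 15, 4]]
true_a1, true_a2 = 0.8, -0.12
trips = [[math.exp(true_a2 * minutes[i][j]) * jobs[j] ** true_a1 for j in range(3)]
         for i in range(3)]

const, a1, a2 = calibrate(trips, jobs, minutes)
print(round(a1, 3), round(a2, 3))
```

With real survey data the fit would not be exact, and cells with zero observed trips would need a treatment other than simply dropping them.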

Models considering the presence of spatial dependence between observations

LeSage and Pace (2009) provide an introduction to the spatial econometric models developed in the literature. The best known spatial model is the SAR which assumes the existence of a spillover effect in the dependent variable. The model is specified as:

$$y = \rho Wy + X\beta + \varepsilon$$
(6)

where ρ is the spatial autocorrelation parameter, W is an N × N spatial weights matrix, N is the number of observations and the remaining variables are the same as in (1).
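As a worked step that may help in reading (6), and assuming that the matrix I − ρW is invertible (I denotes the N × N identity matrix, a symbol introduced here), the model can be written in reduced form following the standard treatment in LeSage and Pace (2009):

$$y = (I - \rho W)^{-1} X\beta + (I - \rho W)^{-1} \varepsilon$$

so that the matrix of partial derivatives of y with respect to the k-th independent variable is:

$$\frac{\partial y}{\partial x_{k}^{'}} = (I - \rho W)^{-1} \beta_{k}$$

The mean of the diagonal elements of this matrix is usually reported as the direct impact, the mean of its row sums as the total impact, and the difference between the two as the indirect impact. If W is row-standardised and |ρ| < 1, the total impact reduces to βk/(1 − ρ), which shows why βk alone cannot be read as a marginal effect in the SAR model.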

If the only requirement is to specify the presence of spatial dependence in the error term then a SEM model can be used, as follows:

$$y = X\beta + u$$
(7)
$$u = \lambda Wu + \varepsilon$$
(8)

where λ is the autocorrelation parameter of the errors u, and ɛ is a vector of IID errors. In this model the dependent variable at a location is therefore a function not only of the independent variables but also of the errors u of the neighbouring locations. The matrix W can be defined in different ways depending on whether zonal or point data are available. The four most common types of neighbourhood are: queen, rook, a predetermined number of closest neighbours and the specification of a maximum neighbourhood distance. Queen type contiguity considers as neighbours all the adjacent locations sharing a border or a vertex with the given location, while rook type contiguity considers as neighbours only those locations that share a border with the reference location (Anselin 1988). LeSage and Pace (2010a) proposed different measurements of the correlation between neighbourhood matrices and showed that the influence of the specification of W on the estimated parameters is minimal if they are correctly interpreted from the true partial derivatives (direct impacts + indirect impacts) and if the model is well specified.
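The queen and rook definitions can be sketched on a regular grid of zones. The grid layout and the row-standardisation of W (each row summing to one) are assumptions made for illustration; real zoning systems are irregular polygons and would require a GIS-based adjacency test.

```python
def contiguity_weights(n_rows, n_cols, kind="queen"):
    """Row-standardised contiguity matrix W for a regular grid of zones.

    kind="rook" links cells sharing an edge; kind="queen" also links cells
    sharing only a vertex.
    """
    n = n_rows * n_cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) == (0, 0):
                        continue  # a zone is not its own neighbour
                    if kind == "rook" and dr != 0 and dc != 0:
                        continue  # diagonal neighbours only count for queen
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        w[r * n_cols + c][rr * n_cols + cc] = 1.0
    for row in w:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total  # row-standardisation (assumed convention)
    return w

rook = contiguity_weights(3, 3, "rook")
queen = contiguity_weights(3, 3, "queen")
centre = 4  # index of the middle cell in a 3 x 3 grid
print(sum(1 for v in rook[centre] if v > 0), sum(1 for v in queen[centre] if v > 0))
```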

Griffith (2013) has proposed another method for addressing the spatial autocorrelation which could be present in the residuals of linear and non-linear models. Spatial filtering, a method based on spatial filters defined as linear combinations of eigenvectors chosen from a neighbourhood matrix, allows this correlation to be removed. The eigenvectors are chosen in such a way that they show a spatial autocorrelation index which is higher than a critical value according to the specified neighbourhood matrix. The first eigenvector chosen, E1, will be the set of real numbers with the greatest index of spatial correlation according to the Moran I indicator. The vector E2 will present the highest index of spatial correlation among the sets of real numbers not correlated with the E1 vector, and so on. In this way these eigenvectors capture the spatial correlation present in the data, acting as control variables for the model.
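The Moran I indicator used to rank the eigenvectors, and more generally to test residuals, can be sketched with its usual textbook formula. The four zones on a line and the two value patterns are invented for illustration.

```python
def morans_i(values, w):
    """Moran's I: (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2, with d the deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

# Four hypothetical zones on a line; each zone neighbours the adjacent ones (binary weights).
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

clustered = [1.0, 1.0, 5.0, 5.0]    # similar values sit next to each other
alternating = [1.0, 5.0, 1.0, 5.0]  # neighbours always differ

print(morans_i(clustered, w), morans_i(alternating, w))
```

Positive values indicate that similar values cluster in space and negative values that neighbours tend to differ; applied to model residuals, a value not significantly different from its expectation under no autocorrelation suggests that the spatial pattern has been absorbed.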

Application to the urban area of Santander

The MLR, SAR, SEM and SPO trip generation models were estimated using data from the city of Santander (Spain). Santander is a small-to-medium-sized city with a population of 178,400 in its urban nucleus, rising to more than 280,000 in its sphere of influence. Most trips in the city are made on foot (45 %) or by car (45 %), with the remainder made by bus (8 %) or using other modes.

The data for the models’ estimation came from two sources: firstly, a survey of 1,000 households conducted in 2010 by the University of Cantabria and, secondly, the national population and housing census (INE 2011), which provided data about the socio-demographic characteristics and housing in the city. The households were chosen randomly from a list of 73,395 households on the municipal register. The sample size was determined by assuming a 90 % confidence level and a 5 % error level. In order to avoid bias, the interviewer presented and collected the survey in person, giving instructions about how to fill in the forms. If there were problems when returning to collect the survey, the interviewer was instructed to return to the household up to three times in order to minimise non-response. Given that the postal address was available for each of the surveyed households, their positions were geocoded in a geographic information system (GIS) to identify the zone in which each was located, among the total of 86 zones into which the city was divided. This allowed the authors to relate the household database to the census zonal data in preparation for estimating the trip production models.

A total of 28 variables were considered from the household and zonal data (see Tables 1, 2). Note that of the 1,000 households surveyed only 817 provided complete answers. These were the ones used to estimate the trip production models.

Table 1 Descriptive statistics of the variables contained in the household survey database (N = 817 households)
Table 2 Descriptive statistics of the variables contained in the zonal database (N = 86 zones)

The variables HBW-PV, HBS-PV and HBO-PV represent the trips produced by the households using private vehicle (car or motorcycle): home-based work trips, home-based study trips and home-based trips for other purposes. The variables HBW-NPV, HBS-NPV and HBO-NPV represent the same trip purposes but for trips made using non-private vehicle modes (mainly on foot but also using public transport). These were the dependent variables used in the trip production models.

The AGE variable represents the mean age of the household members. Given that the relationship between age and trip production is probably quadratic, the variable has also been specified squared in the models (AGE2). The variable CHILDREN represents the number of children in the household under 6 years old. The variables WOMEN, WORKERS and SIZE represent, respectively, the number of women, workers and the total number of people in the household.

The income variable, referring to the overall monthly household income, falls within one of the following ranges: (1) income less than 600 euros (€), (2) income from 600 to 1200 €, (3) income from 1200 to 2500 € and (4) income over 2500 €. To avoid non-linearity problems, the income variable has been transformed into three dummy variables: D_INCOME2, D_INCOME3 and D_INCOME4. As each household can only belong to one of the intervals, the corresponding dummy variable takes a value of 1 for that class and 0 for the others. If the household belongs to interval (1), the three dummy variables take the value 0.
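The coding of the income classes can be sketched as follows. The function name `income_dummies` is hypothetical; the mapping itself follows the description above, with class (1) acting as the reference category.

```python
def income_dummies(income_class):
    """Map income class 1-4 to (D_INCOME2, D_INCOME3, D_INCOME4); class 1 is the reference."""
    if income_class not in (1, 2, 3, 4):
        raise ValueError("income class must be 1, 2, 3 or 4")
    return tuple(1 if income_class == k else 0 for k in (2, 3, 4))

for cls in (1, 2, 3, 4):
    print(cls, income_dummies(cls))
```

Each estimated dummy coefficient is then read as the difference in trip production relative to households in the lowest income range.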

The accessibility indicators ACCW-PV, ACCS-PV, ACCW-NPV and ACCS-NPV were calculated from expression (5). These four accessibility indicators represent, respectively: accessibility to work by PV, accessibility to places of education by PV, accessibility to work by NPV and accessibility to places of education by NPV. The dependent variables used in their calibration were, respectively: the PV trips for work and other purposes, the PV trips for study purposes, the NPV trips for work and other purposes and the NPV trips for study purposes. Given that the accessibility indicators try to evaluate the interactive potential of an area, the real trip distribution provides the most suitable data for their calibration. The number of jobs and the education places in zone j (see variables EMP and EDU in Table 2) were introduced into ACCW and ACCS, respectively, as variables of opportunities. The places of education covered all levels: primary, secondary and university. Accessibility to employment is considered to be a good indicator for both work related trips and trips for other purposes, while places of education are a better accessibility indicator for trips made for study purposes. For the PV accessibility indicators, the cost matrix used corresponded to the sum of access times, from each centroid to the transport network, and travel times at free flow by car. The composite cost (Ortúzar and Willumsen 2001) of access plus travel times by foot and bus modes was used for the NPV accessibility indicators.

As can be seen in Table 3, the estimated parameters gave fits, as measured by R2, of around 0.6, except for accessibility to education places by PV, where the fit was poorer. The parameters had the expected signs (positive for opportunities and negative for travel times), and the impedance parameter was somewhat greater for accessibility to education places and for NPV, as was to be expected.

Table 3 Estimation of the parameters of the accessibility indicators

The CBD-PV and CBD-NPV variables can also be interpreted as measures of accessibility, but in this case only to the opportunities present in the city’s urban centre. They represent, respectively, the access and travel time, in minutes, by private car at free flow and the composite access and travel cost, in minutes, of the foot and bus modes. The city of Santander has a marked monocentric character, so the accessibility and CBD variables show a strong negative correlation, with correlation coefficients below −0.7 for all the indicators (see the times to CBD in Fig. 1). The employment density (see Fig. 2) also highlights this monocentric nature of the city, with a great many jobs concentrated along the central axis.

Fig. 1 Distribution of households, total trips by household and travel time to CBD by private vehicle

Fig. 2 Employment density in the study area

The meanings of the remaining variables: HOUSEHOLDS, EMP, EDU, POP and POP_DEN are described in Table 2.

Results and discussion

Trip production for work purposes with and without spatial dependence between observations

Table 4 presents the parameters of the HBW trip production models with and without spatial dependence between observations. The models were estimated using OLS (MLR), maximum likelihood (SAR and SEM) and quasi-maximum likelihood (SPO). The specifications of the MLR models include all the variables present in the database except for HOUSEHOLDS and POP_DEN, as these showed a high correlation with the variable POP. In addition, all the models are shown with only the ACC variable in the specification, given that ACC and CBD could not be used simultaneously because of their strong correlation. Only the variables which were clearly significant in the MLR models were considered in the spatial models. The p-values of the t test are shown in brackets below the estimated parameters; a parameter is considered significantly different from zero if its p-value is equal to or lower than 0.1 (a confidence level of at least 90 % against a type I error).

Table 4 Estimated parameters of the HBW trip production models

Eight models were estimated corresponding to the home-based trips for work purpose. The spatial models were specified considering the possible presence of spatial dependence in the dependent variable (SAR), in the residuals of the model (SEM) and using spatial filtering (SPO). The programming language R and the software package spdep (Bivand et al. 2015) were used to estimate the models.

The MLR models explained 20 and 30 % of the variance. The variable WORKERS was clearly significant and positive, with an effect of 0.5 trips per additional worker in the PV model and 1 trip per additional worker in the NPV model. The variable AGE was significant at a 95 % confidence level and had a positive sign only in the NPV model. In this case AGE2 was negative, suggesting that after a certain age the journeys for work purposes start to decrease. The variable WOMEN was not significant in any case, while CHILDREN was negative for HBW-NPV journeys, suggesting that households with children tend to make less use of alternative transport modes when the workers travel to their jobs. The vehicles owned by the households (VEH variable) were clearly significant in the two models, although with opposite signs (car ownership reduces the journeys made using modes other than the private vehicle). Household income was not a significant factor in explaining trip production in any of the cases, and POP was only significant and positive in some of the PV models.

The variable ACC is the most relevant to the aims of this study. It was significant for the PV trips and clearly not significant for the NPV trips. For the PV trips, ACC showed a negative sign, meaning that these trips are reduced, ceteris paribus, as accessibility to work opportunities increases. The variable was also tested in logarithmic form to check whether accessibility had decreasing returns, but this did not improve the goodness of fit of the models or the significance of the variable. When the models were estimated using the CBD variable, the results for the other parameters were very similar, and the CBD parameter showed a positive sign and was more clearly significant than the accessibility indicator (parameter value: 0.125 and p-value = 0.000 in the MLR-HBW-PV model). The effect is therefore the same: as the distance from the centre increased, so did the number of journeys made by private vehicle.

For the two MLR models, the Moran I index was calculated for the residuals of the regression. The index was significant in the case of the PV trips, leading to the acceptance of the hypothesis of spatial autocorrelation in the residuals of the model. The Robust Lagrange Multiplier Test (Robust LM Test, see Table 4) was also performed on the MLR models in order to confirm the presence of spatial dependence in the dependent variable (LM-lag) or in the error term (LM-error). Both tests were clearly significant for the HBW-PV trips, although the LM-lag test showed a higher value.
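As an illustration of the statistic behind the first of these diagnostics, the following minimal sketch computes the Moran I index of a vector of residuals for a given spatial weights matrix. It returns only the index, not its significance test, and the function name `morans_i` and the toy weights are assumptions made for the example; they are not taken from the study.

```python
def morans_i(values, weights):
    """Moran's I: (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2).

    values  : list of residuals, one per observation
    weights : n x n list of lists, weights[i][j] > 0 if j neighbours i
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                 # deviations from the mean
    s0 = sum(sum(row) for row in weights)          # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


# Four observations along a line, each linked to its adjacent neighbours.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
print(morans_i([1.0, 1.0, -1.0, -1.0], chain))   # similar neighbours: positive
print(morans_i([1.0, -1.0, 1.0, -1.0], chain))   # alternating signs: negative
```

Values close to zero indicate no spatial pattern in the residuals, while clearly positive values indicate that similar residuals cluster together.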

The trip production models considering spatial dependence were specified with the same variables which had been shown to be significant and not correlated in the MLR models. Different specifications were also tried to check whether the variables that had previously not been significant became significant in the spatial models; in all cases the results were negative. Spatial autoregressive models with autoregressive disturbances (SARAR models) were also tested, but the results were very similar to those of the SAR and SEM models in terms of the estimated parameters and the goodness of fit.

Given the nature of the household data (point data structure), a neighbourhood matrix based on the ten nearest neighbours was used. This type of matrix was preferred to a distance matrix because the latter may generate a very irregular distribution of neighbours between observations. This phenomenon is to be avoided because it may violate the regularity hypothesis which is required to obtain the asymptotic properties of the estimators and the statistical tests (Anselin 2002).
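A nearest-neighbour weights matrix of this kind could be built roughly as in the sketch below. It assumes planar coordinates, Euclidean distances and row-standardised weights (each neighbour receiving weight 1/k); the text does not state which standardisation was applied, so that choice and the function name `knn_weights` are assumptions of the example.

```python
import math


def knn_weights(points, k):
    """Row-standardised k-nearest-neighbour weights.

    points : list of (x, y) coordinates, one per household
    k      : number of neighbours per observation (10 in the text)
    Returns an n x n list of lists with w[i][j] = 1/k when j is one
    of the k nearest neighbours of i, and 0 otherwise.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    weights = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(points):
        # Sort every other observation by its distance to observation i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi),
        )
        for j in others[:k]:
            weights[i][j] = 1.0 / k
    return weights


# Toy example with four points and two neighbours each.
w = knn_weights([(0, 0), (1, 0), (2, 0), (10, 0)], k=2)
print(w[3])  # the isolated point still receives exactly two neighbours
```

Every observation has the same number of neighbours regardless of how isolated it is, which is the regularity property that motivates the choice. Note that the resulting matrix is generally not symmetric.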

The parameters obtained in the MLR models are similar to those from the SAR and SEM models, although the MLR models tend to slightly overestimate some parameters compared to the spatial models (for example, the parameter of the variable ACC is equal to −0.013 in the MLR model vs. −0.009 in the SAR model). The SPO model was estimated as a quasi-Poisson model, given that overdispersion was significant according to a regression-based test (Cameron and Trivedi 1990).

The accessibility indicator ACC was nearly significant in the SAR model and significant, at a 90 % confidence level, in the SEM and SPO models. If the CBD variable is introduced instead, its parameter is clearly significant (e.g. parameter value: 0.094 and p-value = 0.000 in the SAR-HBW-PV model). By calculating the elasticity of the ACC variable in the HBW trips at the sample means, using the expression \(\beta (\overline{X}/\overline{Y})\), it can be seen that the trips generated by PV are elastic: depending on the model, they change by between 7.4 and 10.7 % in absolute value per 1 % increase in the accessibility indicator. In the case of the Poisson model, the estimated parameter gives a reduction of 1.6 % in trip production by PV per additional unit of the ACC indicator.
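The two calculations in this paragraph can be sketched as follows. The elasticity function applies the expression given in the text; the percentage change for the Poisson model assumes the usual log link, under which a coefficient \(\beta\) implies a change of \((e^{\beta} - 1) \times 100\) % per additional unit. The numerical inputs below are hypothetical and chosen only to illustrate the arithmetic; they are not the estimates reported in Table 4.

```python
import math


def elasticity_at_means(beta, mean_x, mean_y):
    """Elasticity of a linear model evaluated at the sample means."""
    return beta * mean_x / mean_y


def poisson_percent_change(beta, delta=1.0):
    """Percentage change in the expected count of a log-link (Poisson)
    model when the explanatory variable increases by `delta` units."""
    return (math.exp(beta * delta) - 1.0) * 100.0


# Hypothetical values: a coefficient of -0.5, a mean indicator of 20
# and a mean of 2 trips per household.
print(elasticity_at_means(-0.5, 20.0, 2.0))

# A hypothetical Poisson coefficient of -0.016 implies a reduction of
# roughly 1.6 % per additional unit of the explanatory variable.
print(round(poisson_percent_change(-0.016), 1))
```

For small coefficients the percentage change is close to \(100\beta\), which is why a Poisson parameter can be read almost directly as a proportional effect.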

The fit of the SAR and SEM PV models was very similar, although slightly better in the case of the SAR model. In both cases the ρ or λ parameters were clearly significant. The LR test, which allows the fit of the SAR and SEM models to be compared with the fit of the MLR models (Ortúzar and Willumsen 2001), presented significant values (i.e. the spatial models had a significantly better fit to the data). The Moran I test performed on the residuals also showed that the spatial correlation decreased and was no longer statistically significant at a 95 % confidence level in any of the spatial models.
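A likelihood ratio comparison of this kind can be sketched as below, assuming that the spatial model nests the MLR model and adds a single parameter (ρ or λ), so that the statistic is compared with a chi-squared distribution with one degree of freedom. The log-likelihood values are hypothetical, and `lr_test` is a name chosen for the example.

```python
import math


def lr_test(loglik_mlr, loglik_spatial):
    """Likelihood ratio test of a spatial model against the nested MLR model.

    Returns the statistic 2 * (llf_spatial - llf_mlr) and its p-value
    under a chi-squared distribution with 1 degree of freedom, for which
    P(X > x) = erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (loglik_spatial - loglik_mlr)
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = lr_test(loglik_mlr=-100.0, loglik_spatial=-98.0)
print(stat, round(p, 4))
```

A small p-value indicates that the additional spatial parameter improves the fit by more than would be expected by chance.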

Therefore, the results seem to indicate the presence of a moderate degree of spatial autocorrelation in the error term in the case of the MLR-HBW-PV model. Figure 3 shows the Getis-Ord Gi* statistic (Ord and Getis 1995) estimated with the residuals of this model. The point observations of the households have been transformed into polygons using the Thiessen polygons technique. Note how the positive residuals, where the model makes predictions below the observed values, are significantly concentrated in the city outskirts, whereas the negative residuals are significantly concentrated closer to the city centre. This would suggest that the accessibility variables ACC and CBD capture only part of the increase in HBW trip generation by PV, which produces spatial correlation in the residuals. This effect is minimized by the spatial models, in which the spatial correlation ceases to be significant.

Fig. 3 Getis-Ord Gi* statistic in the residuals of the MLR-HBW-PV model

Trip production for study purposes with and without spatial dependence between observations

The models of trip production for study purposes can be seen in Table 5. The fit of the MLR models was similar to that of the models for work purposes, although slightly higher in the case of the NPV trips and slightly lower for the PV trips. The variable accessibility to places of education was clearly not significant for the PV and NPV trips in the MLR models. However, when the models were specified using the CBD variable, this was significant for the PV trips, with an estimated parameter of 0.04 (p-value: 0.02) and an elasticity of 2.5 % for an increase of 1 % in journey time. The LR tests indicated a significantly better fit in the cases of SAR-HBS-PV and SEM-HBS-NPV, although with moderate values. The Poisson models also showed significant overdispersion, because of which they were estimated as quasi-Poisson models. In the case of SPO-HBS-PV, when the model was specified with the CBD variable, the estimated parameter was also significantly different from 0, with a value of 0.163 (p-value: 0.018).

Table 5 Estimated parameters of the HBS trip production models

Trip production models for other purposes with and without spatial dependence between observations

Table 6 shows the parameters for the trip production for other purposes in the MLR, SAR, SEM and SPO models.

Table 6 Estimated parameters of the HBO trip production models

The MLR models had a poor fit in the case of the PV trips and performed better for the NPV trips, with 30 % of the variance explained. The most noteworthy change with respect to the HBW and HBS models was the positive sign of the ACC parameter in the case of NPV trips. This means that in this case greater accessibility to opportunities significantly increases the number of trips that are produced. In the case of the journey time to the city centre, the longer the journey time, the fewer household trips were produced by NPV, with a parameter value equal to −0.032 (p-value: 0.000). The Moran I index presented clearly significant values, whereas the LM test was only significant for the lag type model.

In the spatial models, the accessibility variables continue to be clearly significant for the HBO-NPV trips. If the elasticities of the ACC variable are calculated for the HBO-NPV trips, it can be seen that the trips being produced are inelastic, with a value between 0.4 and 0.8 % depending on the model. The quasi-Poisson SPO-HBO-NPV model estimated an increase of 0.1 % per additional unit of the indicator of accessibility to opportunities. This shows that as accessibility increases, so does the production of HBO trips using modes other than the private vehicle, although the growth is not very high.

Figure 4 shows the Getis-Ord Gi* statistic for the residuals of the MLR-HBO-NPV model, once again representing the households as Thiessen polygons. In this case, contrary to what is shown in Fig. 3, the positive residuals are mainly concentrated in the city’s central zone, whereas the negative residuals are located in the peripheral areas. This pattern explains the spatial correlation in the residuals of the MLR models, an effect which is minimized by the spatial models, in which the Moran I index ceased to be significantly different from zero with the exception of the SAR-HBO-NPV model.

Fig. 4 Getis-Ord Gi* statistic in the residuals of the MLR-HBO-NPV model

Total impacts of the SAR models, eigenvector coefficients and comparison of the results

The total impacts on the dependent variable were also calculated for the SAR models. These impacts take into account the direct effect of each independent variable on the dependent variable, as well as the indirect effects of spatial spillover from the dependent variable towards neighbouring observations and from neighbouring observations towards the local dependent variable (see Table 7) (LeSage and Pace 2010b). Taking these direct and indirect effects into account, the SAR models imply that the total effect of changing certain variables, particularly SIZE, WORKERS and VEH, is greater than that estimated using the MLR models. These results provide evidence of the existence of spatial effects on the trip production rates. The parameters estimated in the Poisson models for the eigenvectors of the spatial filter can be seen in Table 8. The SPO-HBO-NPV model was the one with the greatest number of significant eigenvectors in the construction of the spatial filter.
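The decomposition into direct, indirect and total impacts can be illustrated with the following sketch, which uses the summary measures based on the matrix \((I - \rho W)^{-1}\beta\): the average direct impact is the mean of its diagonal and the average total impact is the mean of its row sums. To stay free of external libraries, the inverse is approximated by the power series \(\sum_k \rho^k W^k\), which assumes \(|\rho| < 1\) and a row-standardised \(W\). The function name and the toy inputs are assumptions of the example, not values from Table 7.

```python
def sar_average_impacts(beta, rho, weights, terms=200):
    """Average direct, indirect and total impacts of one regressor in a
    SAR model, using (I - rho*W)^-1 ~ sum_k (rho*W)^k for |rho| < 1."""
    n = len(weights)
    # power holds (rho*W)^k, starting from the identity matrix (k = 0).
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    trace_sum = 0.0   # accumulates the trace of the inverse
    total_sum = 0.0   # accumulates the sum of all its entries
    for _ in range(terms):
        trace_sum += sum(power[i][i] for i in range(n))
        total_sum += sum(sum(row) for row in power)
        power = [[rho * sum(power[i][m] * weights[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
    direct = beta * trace_sum / n
    total = beta * total_sum / n
    return direct, total - direct, total


# Toy case: two households that are each other's only neighbour.
direct, indirect, total = sar_average_impacts(1.5, 0.5, [[0, 1], [1, 0]])
print(direct, indirect, total)
```

With a row-standardised matrix the average total impact reduces to \(\beta / (1 - \rho)\), which makes clear why a positive ρ yields total effects larger than the estimated coefficient itself.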

Table 7 Total impacts of the SAR models
Table 8 Eigenvector coefficients for constructing the spatial filters

Comparing these results with those of other studies, they agree with Vickerman (1974), who showed that accessibility was a positive factor in trip production, mainly for optional journeys for shopping or leisure purposes. Furthermore, the fits of the models were similar to those obtained by Vickerman after removing the collinearity problems associated with the models. Using a study area similar to the one in this research (a monocentric small-medium sized city), Hanson and Schwab (1987) also found a significant, although weak, effect of accessibility on optional journeys. However, the only accessibility indicator used by these authors was of the accumulated opportunities type; they did not use gravity type indicators or journey time to the city centre, even though the authors themselves recognised that “If distance of the residence from the central business district (CBD) can be taken as a valid surrogate for the density of opportunities (…) then there is some evidence that low accessibility levels are related to lower trip frequencies”. Thill and Kim (2005) found that trip production was affected positively or negatively by accessibility, similarly to the results of the models presented in this research when the difference between PV and NPV is considered. Cervero and Kockelman (1997) also found that accessibility could be a negative factor in trip production for the PV mode, as has been found in the models presented here.

If these results are compared to those obtained in other research that includes the possible presence of spatial dependence between observations, Gamas et al. (2006) found that the MLR models they estimated showed biased parameters that overestimated the influence of specific variables compared to spatial models. This phenomenon is similar to that found in the present study when only the direct effects of each independent variable are addressed. However, if the indirect effects are also considered in the case of the SAR model (which Gamas et al. (2006) did not test), the effect of certain variables may be greater than that found by the MLR models. Kwigizile and Teng (2009) also found that the ordinary regression models generally overestimated the parameters when compared with the spatial models, although the authors did not address the influence of indirect effects. Finally, Kim et al. (2009) found spatial autocorrelation in both non-home-based (NHB) trip production and attraction. There seems to be evidence that both trip production and attraction may present the effects of spatial correlation and that this is dependent on the characteristics of the study area.

Validation of the models

The predictive performance of the models was validated using k-fold cross-validation (James et al. 2014). The models chosen for the validation were MLR-HBW-PV, SAR-HBW-PV, SEM-HBW-PV and SPO-HBW-PV among the PV models, and MLR-HBO-NPV, SAR-HBO-NPV, SEM-HBO-NPV and SPO-HBO-NPV among the models for other modes of transport. These models were selected because they were the ones that showed a greater degree of spatial correlation and where the LR test demonstrated a greater improvement in the fit compared with the MLR models.

The SAR and SEM models can be used for prediction with the technique proposed by Bivand (2002). Bivand proposes separating the model terms into the trend, the signal and the noise. The trend in both the SAR and SEM models is given by \(X\hat{\beta}\), in other words, the product of the matrix of independent variables and the vector of estimated parameters. The signal in the SAR model can be found using:

$$\hat{\rho} W \hat{y} = \hat{\rho} W (I - \hat{\rho} W)^{-1} X \hat{\beta}$$
(9)

This approach has the drawback of losing the part of the signal belonging to the error term. In the SEM model the signal is equal to 0, meaning that the prediction is given only by the trend.
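The split between trend and signal in expression (9) can be sketched as follows. Rather than inverting \(I - \hat{\rho} W\) explicitly, the example solves \(\hat{y} = X\hat{\beta} + \hat{\rho} W \hat{y}\) by fixed-point iteration, which converges when \(|\hat{\rho}| < 1\) and \(W\) is row-standardised. The function name and the toy inputs are assumptions made for illustration, and the sketch does not address how \(W\) is defined for out-of-sample observations.

```python
def sar_predict(trend, rho, weights, iterations=200):
    """Trend-plus-signal prediction of a SAR model.

    trend   : list with the values of X * beta_hat, one per observation
    rho     : estimated spatial autoregressive parameter (|rho| < 1)
    weights : row-standardised n x n spatial weights (list of lists)
    Returns (signal, prediction), where prediction = trend + signal.
    """
    n = len(trend)
    y_hat = list(trend)  # start from the trend alone
    for _ in range(iterations):
        # y_hat <- X*beta + rho * W * y_hat
        y_hat = [trend[i] + rho * sum(weights[i][j] * y_hat[j]
                                      for j in range(n))
                 for i in range(n)]
    signal = [y_hat[i] - trend[i] for i in range(n)]
    return signal, y_hat


# Toy case: two households that are each other's only neighbour.
signal, prediction = sar_predict([3.0, 0.0], 0.5, [[0, 1], [1, 0]])
print(signal, prediction)
```

For the SEM model the corresponding prediction is simply the trend, since its signal is zero.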

The validation was done by randomly dividing the sample into 10 groups, successively estimating the models with 9 of the groups and performing the prediction for the remaining group. This process was repeated for all 10 groups until every observation had a predicted value in validation mode. The mean squared error (MSE) of the predictions with respect to the observed values was then calculated. In various empirical applications, cross-validation with 5 or 10 groups has shown no excessive bias or variance in the estimation of the MSE (James et al. 2014; Kohavi 1995). The cross-validation estimate is computed for k groups as:

$$CV_{k} = \frac{1}{k}\sum\limits_{i = 1}^{k} {MSE_{i} }$$
(10)
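The procedure summarised in expression (10) can be sketched as follows. The example uses a simple one-variable least-squares line as a stand-in for the trip production models, and the names `k_fold_cv`, `fit_line` and `predict_line`, the synthetic data and the random seed are all assumptions made for illustration. A spatial model would additionally need its weights matrix rebuilt for each training subset, which this sketch does not cover.

```python
import random


def k_fold_cv(xs, ys, fit, predict, k=10, seed=0):
    """Average of the k fold-level mean squared errors, as in expression (10)."""
    indices = list(range(len(ys)))
    random.Random(seed).shuffle(indices)       # random assignment to groups
    folds = [indices[i::k] for i in range(k)]  # k groups of near-equal size
    fold_mse = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - predict(model, xs[i])) ** 2 for i in held_out]
        fold_mse.append(sum(errors) / len(errors))
    return sum(fold_mse) / k


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def predict_line(model, x):
    a, b = model
    return a + b * x


# Synthetic, noise-free data: the cross-validated MSE should be close to 0.
xs = [float(i) for i in range(30)]
ys = [2.0 + 3.0 * x for x in xs]
print(k_fold_cv(xs, ys, fit_line, predict_line, k=10))
```

Because every observation is predicted by a model that was estimated without it, the resulting MSE reflects out-of-sample rather than in-sample performance.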

The results of the validation were similar for the MLR, SAR and SEM models (see Table 9). However, the SAR and SEM models have a slight advantage over the MLR based model in terms of their ability to predict trip production in both the PV and NPV modes. This improvement in prediction is almost certainly due to the spatial models helping to reduce spatial correlation in the residuals generated in the periphery and the urban centre with the HBW and HBO trips, respectively. The Poisson regression models SPO-HBW-PV and SPO-HBO-NPV presented the lowest MSE. This was probably due to the better adaptation of the Poisson distribution to the patterns of household trip generation.

Table 9 Cross-validation of the models

Conclusions

This article has described work to estimate MLR, SAR, SEM and quasi-Poisson models considering spatial effects between observations, to predict the number of trips made by a sample of households in the urban area of Santander (Spain). The models were specified with indicators of accessibility to opportunities in order to determine if this is a significant factor when explaining trip generation. This aspect is important when considering that journey demand should be sensitive to the transport costs implied in accessing the opportunities, an expected fact according to the available theory about induced demand and microeconomic logic.

The estimated MLR models showed that the production of journeys is sensitive to the ACC variable, a gravity type accessibility indicator, and to the CBD variable, the time it takes to get to the city centre. This sensitivity had opposite signs for the trips produced by private vehicle and by other modes of transport. While in the case of PV trip production for work purposes accessibility elastically reduced the number of trips, for other modes of transport (mainly on foot) accessibility could increase trip production, especially for optional trips, although inelastically. These results provide evidence that, on an intraurban scale and for an urban area like the one being studied, accessibility could be a significant factor in the production of trips and that trips made by private vehicle could be reduced by increasing accessibility to opportunities. It is therefore advisable to use some type of accessibility indicator when simulating the possible effects of changes made to the transport supply or the pattern of activities. If this is not done, the predictions would surely misestimate the transport demand of optional trips by NPV or mandatory trips by PV in the network being studied. In the case of the urban area of Santander, a city with a monocentric distribution of opportunities, the journey time from each zone to the city centre was as good an indicator for estimating trip production as the gravity type indicator in terms of the goodness of fit of the models. In other urban areas with more decentralised distributions of opportunities, gravity accessibility indicators could be more suitable for predicting trip production.

The models which considered the presence of spatial dependence between observations (SAR, SEM and SPO) were able to support these results and deal with the existence of a significant degree of spatial dependence in the production of HBW-PV and HBO trips and, to a lesser degree, HBS trips. Considering the presence of spatial dependence and the direct and indirect effects of the variables in the SAR production models was important for the estimation and correct interpretation of the parameters, which in the cases of the variables SIZE, WORKERS and VEH showed greater effects than those estimated by the MLR models. This phenomenon points to the effects of spatial correlation in the residuals, derived from the fact that in the urban periphery the MLR models underestimate the production of HBW trips by private vehicle, while the production of HBO trips by alternative modes of transport was underestimated in the urban centre. Although this effect is not of any great magnitude in the study area (a small-medium sized city without significant congestion problems), it could be higher in larger urban areas with worse traffic conditions. This effect could be due to mutual social influence between households resulting from different social characteristics between city centre households and those in the suburbs. In addition, the lack of a variable with spatially differentiated effects could also cause spatial autocorrelation in the residuals of the linear models, although the fit of the SAR models and the LM-lag tests were higher than the fit of the SEM models and the LM-error tests for both the MLR-HBW-PV and the MLR-HBO-NPV specifications.

Finally, the validation of the models using cross-validation with 10 groups showed that the spatial models had a slightly lower mean squared error than the MLR models in the prediction of overall trip production, owing to the explicit consideration of the effect of spatial correlation in the residuals in the centre and periphery of the study area. In addition, the SPO models performed better than the continuous models by considering the truncated and discrete nature of trip generation.

Future lines of research stemming from the present work could involve the application of similar models in other study areas to determine the importance of differences in accessibility and spatial correlation between observations in larger cities with greater problems of traffic congestion. Furthermore, it would also be of interest to test the existence of spatial heterogeneity in the parameters of trip generation models, a path already opened by other researchers (Roorda et al. 2010) which could also be applied in evaluating the accessibility effects.