Introduction

In the past, little attention has been given to the task of establishing transportation analysis zones (TAZ), at least in comparison with other elements of the modelling process. A literature review of the subject reveals scarce information and few guidelines on how zoning systems are defined.

One of the first insights into spatial data aggregation came from Ward (1963), who developed a procedure to form hierarchical groups of mutually exclusive subsets on the basis of their similarity with respect to specified characteristics. Several authors subsequently criticised existing procedures and introduced new ones for spatial data aggregation (Scott 1971; Batty 1974; Masser and Brown 1975; Keane 1975; Batty 1976).

The first systematic algorithmic approach to the definition of TAZ was pursued by Openshaw (1977), who developed a hierarchical heuristic procedure for the automatic zoning problem (AZP) that optimises an objective function measuring partition performance in terms of the model and a predefined target value.

More recently, other authors have applied clustering techniques to census data within GIS to achieve an optimal zone design (O’Neill 1991; Crevo 1991; Chesbon and Cheryl 1992; Ding et al. 1993; Bennion and O’Neill 1994; Gan 1994; Ding 1994; Xie 1995; You et al. 1997a, b; Morphet 1997; Eagleson et al. 2002; Binetti and Ciani 2002). Because each proposed algorithm uses different constraints and objective functions, their results are difficult to compare, and the algorithms have mostly been confined to individual applications in specific case studies.

As in spatial analysis, transport demand modelling requires aggregating spatial data into TAZ, and the arbitrary delineation of TAZ boundaries is one source of inaccuracy in transportation analyses. In transportation planning, sensitivity analysis to variations in the zoning systems used to collect data and in the scale at which data are reported (the scale or size-of-unit problem) (Fotheringham and Rogerson 1993) is either ignored or given little attention. TAZ boundary delineation may affect the socioeconomic characteristics of the subarea (core problem area), which in turn influence trip origin-destination (O/D) patterns (Viegas et al. 2008).

Some authors have analyzed the modifiable areal unit problem (MAUP) in spatial analysis and transport demand modelling and discussed the commonality of spatial data aggregation in both topics. The MAUP addresses the ‘modifiable’ nature of area data used in spatial analysis and the influence it has on the analysis and modelling results. A classic example of MAUP effects is the relationship between estimated correlation coefficients and the geographic level of census data used in the computation (Robinson 1950). MAUP effects have two components: the scale effect relates to the level of spatial data aggregation, and the zoning effect relates to the definition or partitioning of units for which data are collected (Openshaw and Taylor 1979; Wong and Amrhein 1996). Effects of the MAUP have been reported in bivariate regression (Clark and Avery 1976), multiple regression (Fotheringham and Wong 1991), spatial interaction (Batty and Sikdar 1982a, b, c, d; Openshaw 1977), and location-allocation (Goodchild 1979). At least one study has shown that MAUP effects can vary with the statistics calculated: no apparent effects on means and variances but dramatic effects on regression coefficients and correlation statistics (Amrhein 1995).

This paper presents a comprehensive approach to the definition of TAZ and attempts to overcome previous limitations by using geocoded travel demand data to achieve an optimal zone design embedded in GIS software.

The algorithm aims to minimise the loss of information incurred when moving from a continuous representation of trip ends to their discrete representation through zones. It focuses on the trade-off between statistical and geographical error, and also considers the percentage of intra-zonal trips in the resulting OD matrix and the homogeneity of trip generation/attraction.

The new methodology was developed with the Lisbon Metropolitan Area (LMA), an area of approximately 320,000 ha, as a case study, using the 1994 Mobility Survey of this metropolitan area as the data set. The survey sample includes 30,681 respondents who described their daily individual trips (only 21,954 of whom made trips on the survey day), yielding a total of 58,818 trips, all with geocoded origins and destinations. After weighting, the survey estimate of daily trips in the LMA is 11,125,000.
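As a quick consistency check, a few ratios can be computed directly from the survey figures just reported. The average expansion factor is only a mean: the survey's actual weighting scheme is not described here, so individual trip weights almost certainly differ from it.

```python
# Figures reported for the 1994 LMA Mobility Survey.
respondents = 30_681
mobile_respondents = 21_954      # respondents who made at least one trip that day
sampled_trips = 58_818           # trips with geocoded origin and destination
weighted_daily_trips = 11_125_000

share_mobile = mobile_respondents / respondents
trips_per_mobile_respondent = sampled_trips / mobile_respondents
# Mean weight per sampled trip; real weights vary from trip to trip.
average_expansion_factor = weighted_daily_trips / sampled_trips

print(f"share of respondents with trips: {share_mobile:.1%}")
print(f"trips per mobile respondent:     {trips_per_mobile_respondent:.2f}")
print(f"average expansion factor:        {average_expansion_factor:.1f}")
```

On average, then, each sampled trip stands for roughly 189 daily trips, which is why the statistical precision of small OD cells becomes a central concern later in the paper.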

The paper first presents the main constraints to zone design identified in an extensive literature review, then discusses the methodology of the proposed TAZ definition algorithm and some local improvement algorithms, and ends with a comprehensive analysis of the results, compared with current analytical practices for the case study.

Theoretical constraints to TAZ definition

Four decades of research have produced a set of guidelines and constraints for the definition of TAZ. This section summarises the constraints discussed in the literature (Table 1) and discusses some contradictions and difficulties in their implementation.

Table 1 Constraints to the definition of TAZ

It is difficult to consider and implement all of the above criteria in a single process of TAZ design because some rules contradict others (Ding et al. 1993; O’Neill 1991).

Some of these rules stem from the use of a fixed zoning scheme over time and across different study scales and purposes. The literature states that transport demand modelling requires a consistent spatial aggregation of data into TAZ throughout the entire process, from data collection to the trip assignment step (Chang et al. 2002).

The problem of choosing the number of zones can be addressed by developing a hierarchical zoning system, as in the London Transportation Studies (Ortúzar and Willumsen 2001), where subzones are aggregated into zones, which in turn are combined into districts, traffic boroughs, and finally sectors. This allows different types of decisions to be analysed at the appropriate level of detail (Ortúzar and Willumsen 2001).
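A hierarchical system works because each level nests strictly inside the next, so totals at any level are obtained by summing the level below. The sketch assumes simple parent lookup tables; all unit identifiers and counts are hypothetical and are not the London zones.

```python
from collections import Counter

# Hypothetical hierarchy: each table maps a unit to its parent at the next level.
subzone_to_zone = {"s1": "z1", "s2": "z1", "s3": "z2", "s4": "z3"}
zone_to_district = {"z1": "d1", "z2": "d1", "z3": "d2"}


def aggregate(counts, parent_of):
    """Sum trip-end counts of child units into their parent units."""
    out = Counter()
    for unit, n in counts.items():
        out[parent_of[unit]] += n
    return dict(out)


subzone_trips = {"s1": 120, "s2": 80, "s3": 40, "s4": 60}
zone_trips = aggregate(subzone_trips, subzone_to_zone)        # {"z1": 200, "z2": 40, "z3": 60}
district_trips = aggregate(zone_trips, zone_to_district)      # {"d1": 240, "d2": 60}
```

Because every subzone has exactly one parent, the totals at every level are automatically consistent with each other.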

Unfortunately, predetermined zoning systems do not take into account ongoing land use changes (spatial and temporal), which can strongly affect TAZ homogeneity and compactness and produce significant misestimates of trip generation and OD matrices (Edwards 1992).

These ongoing spatial and temporal land use changes indicate that traffic management models, which are sometimes based directly on the results of the data collection process and bypass the trip generation and distribution steps of the classical travel demand model (Ding 1998), should use a different zoning scheme from that of transportation demand forecasting models (Openshaw and Rao 1995).

A better solution to the data collection zoning constraint is a survey process in which trip ends are geocoded. Like any other data collection process, this one requires an initial zoning system to determine the expansion coefficient of each trip (Ortúzar and Willumsen 2001); however, once the process is complete, the travel data are not tied to this or any other zoning system, but only to their geospatial coordinates (Chapleau 1997). Each study that uses the database can then develop a new zoning scheme that better fits its scale and goals, giving the transportation analyst more flexibility and increasing the utility of the available database (Chapleau 1997; Trepanier and Chapleau 2001).
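Keeping trips tied to coordinates means the OD matrix becomes a function of whichever zoning the analyst chooses. The sketch assumes trips are stored as (origin, destination, expansion weight) tuples in projected coordinates; `od_matrix` and `square_grid` are hypothetical helper names, and a square grid is only one possible zoning function.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Tuple

Point = Tuple[float, float]
# (origin coordinates, destination coordinates, expansion weight)
Trip = Tuple[Point, Point, float]


def od_matrix(trips: Iterable[Trip], zone_of: Callable[[Point], Hashable]) -> Counter:
    """Weighted OD matrix for an arbitrary zoning, given as a point -> zone function."""
    od: Counter = Counter()
    for origin, destination, weight in trips:
        od[(zone_of(origin), zone_of(destination))] += weight
    return od


def square_grid(side: float) -> Callable[[Point], Tuple[int, int]]:
    """One possible zoning: square cells of the given side (same units as the coordinates)."""
    return lambda p: (int(p[0] // side), int(p[1] // side))


trips = [((100.0, 100.0), (900.0, 900.0), 1.0), ((50.0, 50.0), (150.0, 150.0), 2.0)]
coarse = od_matrix(trips, square_grid(1000.0))  # both trips become intra-zonal
fine = od_matrix(trips, square_grid(500.0))     # the first trip now crosses zones
```

Rezoning only requires calling `od_matrix` again with a different `zone_of`; the survey records themselves never change.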

TAZ delineation algorithm

Problem formulation

A methodology for zone delineation is defined here to reduce the noise level of the data for traffic modelling while, at the same time, minimising the geographical error of the trip end locations.

The methodology defines zones such that:

  • zone boundaries correspond to places with very low trip generation densities (reducing the probability of misallocating trips to zones near the zone boundaries);

  • intra-zonal trips are minimised;

  • the definition of zones with a very low number of trips or a very large area (high geographical error) is avoided;

  • the density of trip production inside a zone should be as homogeneous as possible.

Some of the constraints to the definition of TAZ presented in the preceding section were not included in this methodology. These constraints were considered as local improvements to TAZ borders (i.e., the adjustment of TAZ boundaries to political, administrative, or statistical boundaries), and were introduced only at the end of the TAZ definition process. The local improvements to the basic algorithm are presented below.

Methodology of analysis and the selection of zone limits

The methodology for determining zones starts by aggregating the geocoded trip ends (origins and destinations) into a relatively fine cell grid. The cell size can vary and depends on the size of the study area and the precision intended for the study (Viegas et al. 2008). For the Lisbon Metropolitan Area, a square cell grid with a 200 m side length was used.
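The aggregation step can be sketched as follows, assuming trip-end coordinates in a projected reference system measured in metres and a grid anchored at a chosen origin. Each trip contributes to T (total origins and destinations of trips per cell, the quantity used later by the algorithm) once at its origin cell and once at its destination cell. The function names and grid-origin parameters are illustrative assumptions.

```python
import math
from collections import Counter

CELL_SIDE = 200.0  # metres; the cell side used for the LMA


def cell_of(x, y, side=CELL_SIDE, x0=0.0, y0=0.0):
    """(column, row) index of the square cell containing (x, y); (x0, y0) is the grid origin."""
    return (math.floor((x - x0) / side), math.floor((y - y0) / side))


def trip_end_totals(trips, side=CELL_SIDE):
    """T per cell: each trip adds its weight at its origin cell and at its destination cell."""
    totals = Counter()
    for (ox, oy), (dx, dy), weight in trips:
        totals[cell_of(ox, oy, side)] += weight
        totals[cell_of(dx, dy, side)] += weight
    return totals
```

Sorting the resulting cells by value in decreasing order gives the ordering from which the delineation algorithm starts.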

A thin plate spline was used to smooth the resulting surface and interpolate values for cells without observations.
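The paper does not specify how the thin plate spline was fitted. As an illustration only, the sketch below fits a standard thin plate spline (radial basis φ(r) = r² log r plus a linear polynomial) to cell values and returns a function that can be evaluated at cells without observations. The optional `smoothing` term is the usual ridge added to the kernel diagonal and is an assumption, not a parameter taken from the study. A small dense solver keeps it free of external libraries, which suits a handful of points but not a metropolitan grid.

```python
import math


def _phi(r):
    # Thin plate spline basis r^2 log r, with phi(0) = 0 by continuity.
    return r * r * math.log(r) if r > 0 else 0.0


def _solve(a, b):
    # Gaussian elimination with partial pivoting on the augmented matrix [a | b].
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_tps(points, values, smoothing=0.0):
    """Fit f(x, y) = a0 + ax*x + ay*y + sum_i w_i * phi(|(x, y) - p_i|)."""
    n = len(points)
    size = n + 3
    a = [[0.0] * size for _ in range(size)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            a[i][j] = _phi(math.hypot(xi - xj, yi - yj))
        a[i][i] += smoothing
        for k, p in enumerate((1.0, xi, yi)):  # polynomial block P and its transpose
            a[i][n + k] = p
            a[n + k][i] = p
    coef = _solve(a, list(values) + [0.0, 0.0, 0.0])
    w, (a0, ax, ay) = coef[:n], coef[n:]

    def surface(x, y):
        return a0 + ax * x + ay * y + sum(
            wi * _phi(math.hypot(x - px, y - py)) for wi, (px, py) in zip(w, points))

    return surface
```

With `smoothing=0.0` the surface passes exactly through the observed cell values; a positive value trades exactness for a smoother surface. For a real grid, a library implementation (for example SciPy's RBF interpolation with a thin plate spline kernel) would be the practical choice.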

The result of this analysis is presented in Fig. 1, where it is possible to identify the high concentration of trip ends in the Lisbon city centre and near some locations at the Lisbon municipality boundary.

Fig. 1

Total origins and destinations in the Lisbon municipality—3D view

The TAZ delineation algorithm uses the results of this analysis as background data: each peak is the centre of a zone, whose limits should be defined by the surrounding valleys. The algorithm starts by identifying the local “highest peaks” and their surrounding areas, sorts them in decreasing order of magnitude, and then uses a local search algorithm to design the zones. A search is performed for each peak following a defined set of rules. These search rules, presented in Fig. 2, were developed to avoid delineating zones with complex spatial structures, which would hinder both the applicability of the model in complex urban structures and the assessment of its results, even by an experienced transportation analyst.
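The peak-identification step can be sketched as a scan for cells strictly higher than all eight neighbours, followed by a sort in decreasing order of value. This is a simplified, assumed reading of the text: it ignores the minimum influence area that the algorithm also requires around each peak, and plateaus of equal values produce no peak.

```python
def local_peaks(grid):
    """Return [((row, col), value), ...] for strict local maxima, highest first."""
    rows, cols = len(grid), len(grid[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = grid[r][c]
            # Up to eight neighbours, clipped at the grid edges.
            neighbours = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbours):
                peaks.append(((r, c), v))
    peaks.sort(key=lambda p: p[1], reverse=True)
    return peaks
```

Each returned peak would then seed a local search, in the order given.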

Fig. 2

Search example

These rules were translated into equations using a search matrix built around each local “highest peak,” which occupies the centre of the matrix. A set of variables was used to codify these rules; an example explaining these variables is presented in Fig. 3.

Fig. 3

Explanation of the variables used in the search rules equations

The search rules equations distinguish two cases within the same level of aggregation, as shown in Fig. 3: Case 1, where |x − d| differs from |y − d| (except when x = d and y = d, the seed cell of the aggregation process), and Case 2, where |x − d| equals |y − d|. This distinction within each level of aggregation is needed because cells of the same level must be aggregated before a Case 2 aggregation can take place (see Fig. 2).

The search rules equations contain two statements: the first identifies the cell in the search matrix (e.g., n = 0 and m ≠ 0 and m ≠ 2 × l), and the second states the requirements for aggregating this cell in the search matrix (vicinity conditions; e.g., if matrix(d + m − l, d − l + 1) = 1). Nine different equations constrain the search matrix, five for Case 1 and four for Case 2. These rules and their application are not presented in detail in this paper but are described in Martínez (2006).
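Since the nine published equations are not reproduced here, the sketch only illustrates the Case 1/Case 2 distinction under two stated assumptions: the level of aggregation of a cell is its ring index max(|x − d|, |y − d|) around the peak at (d, d), and a simplified vicinity rule applies in which a Case 1 cell needs its inward neighbour already aggregated, while a Case 2 (diagonal) cell needs both of its same-level orthogonal neighbours aggregated first. The functions `classify` and `eligible` are hypothetical and are not the rules of Martínez (2006).

```python
def classify(x, y, d):
    """Return (level, case) of cell (x, y) in a search matrix centred on (d, d)."""
    dx, dy = abs(x - d), abs(y - d)
    if dx == 0 and dy == 0:
        return 0, "seed"
    # Assumption: level = ring index (Chebyshev distance) from the peak.
    return max(dx, dy), ("case2" if dx == dy else "case1")


def sign(v):
    return (v > 0) - (v < 0)


def eligible(matrix, x, y, d):
    """Hypothetical, simplified vicinity rule; matrix[x][y] == 1 marks aggregated cells."""
    if matrix[x][y] == 1:
        return False
    _, case = classify(x, y, d)
    if case == "seed":
        return True
    sx, sy = sign(x - d), sign(y - d)
    if case == "case1":
        # The inward neighbour along the dominant axis (one level closer) must be aggregated.
        if abs(x - d) > abs(y - d):
            return matrix[x - sx][y] == 1
        return matrix[x][y - sy] == 1
    # Case 2 (diagonal corner): both orthogonal neighbours on the same level come first.
    return matrix[x - sx][y] == 1 and matrix[x][y - sy] == 1
```

The corner rule mirrors the text's requirement that same-level cells be aggregated before a Case 2 aggregation, which keeps zones free of thin diagonal spurs.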

The TAZ delineation algorithm is defined by five constraints and an objective function with two variables. The constraints can be divided into two groups: those derived from the algorithm (four constraints) and the geographic constraint for TAZ border delineation (avoiding overlaps between zones). The constraints derived from the algorithm are:

  1. The total origins (O_i) or destinations (D_i) of trips of each TAZ should be greater than 70% of the average origins or destinations of trips by zone (total origins or destinations of trips divided by the number of zones). This is a requirement for quasi-homogeneity of trip quantities across zones, which indirectly controls the relative statistical error of the resulting zones.

  2. Each TAZ area should be at least 70% of the size of the influence area of local predefined “highest peaks,” which avoids the formation of zones with very low geographic precision.

  3. The average statistical (relative) error in the estimation of OD flow matrix cells should be lower than 50%, which directly controls the statistical precision of each TAZ.

  4. The number of zones should fall within the range previously defined by the analyst, which forces the algorithm to follow the analyst’s preferences.
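Constraints 1, 2 and 4 reduce to simple threshold checks, sketched below with the 70% thresholds as defaults. The text states constraint 1 with an “or,” which is read here as satisfied when either origins or destinations exceed the threshold; that reading, the function name, and the units of `zone_area` and `influence_area` (any consistent area measure, such as a cell count) are assumptions. Constraint 3 requires an error model and is not covered by this check.

```python
def algorithm_constraints(zone_origins, zone_destinations, zone_area,
                          total_origins, total_destinations, n_zones,
                          influence_area, n_min, n_max,
                          trip_share=0.7, area_share=0.7):
    """Return the pass/fail status of constraints 1, 2 and 4 for one candidate TAZ."""
    avg_o = total_origins / n_zones
    avg_d = total_destinations / n_zones
    return {
        # 1: origins OR destinations above 70% of the per-zone average
        "trips": zone_origins > trip_share * avg_o or zone_destinations > trip_share * avg_d,
        # 2: zone area at least 70% of the peak's influence area
        "area": zone_area >= area_share * influence_area,
        # 4: number of zones within the analyst's range
        "n_zones": n_min <= n_zones <= n_max,
    }
```

Both thresholds are exposed as parameters because, as noted below, they could be revised for a different application.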

These are the main constraints of the algorithm. The remaining (geographic) constraint is used only once a neighbouring TAZ has already been delineated: it forces the limits of the zone being delineated to stop at the frontier of an already formed TAZ (the search ends when it reaches that boundary, which then defines the outer boundary of the zone), so that the zoning scheme covers the entire study area without overlaps.

The first constraint is defined to avoid zones with very low and heterogeneous statistical precision in the resulting zoning schemes, since statistical precision is strongly correlated with the number of trip origins or destinations per zone. The second constraint is intended to avoid the formation of very small zones, which could have good geographic and statistical precision individually but would lead to heterogeneous geographic precision at the global level (with large zones compensating for small ones). Together, these two indicators act as an indirect control over the statistical and geographic precision of the TAZ delineation. The 70% thresholds proposed in this application could, of course, be revised for a different application.

The third constraint attempts to directly control the statistical precision in the estimation of the OD matrix cells of each TAZ through the average statistical error of each matrix cell, assuming that the trips originating in each TAZ are distributed to all TAZs in proportion to the total number of trips those zones receive as destinations. This constraint does not guarantee that all of the OD matrix cells in which the zone is the origin are statistically significant, but it considerably increases the likelihood that they are (Martínez 2006).
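The paper does not give the error formula behind constraint 3. The sketch follows the text in distributing the trips of origin zone i in proportion to destination totals, T_ij = O_i·D_j / T, and then assumes a simple-random-sampling error model: the relative error of a cell with expected share p of all trips, estimated from n surveyed trips, is taken as z·sqrt((1 − p)/(n·p)) with z = 1.96. Both the error model and the function names are illustrative assumptions.

```python
import math


def mean_relative_error(i, origins, destinations, sample_trips, z=1.96):
    """Average relative error of the OD cells (i, j) of origin zone i.

    Assumes T_ij = O_i * D_j / T (proportional distribution) and a
    simple-random-sampling error z * sqrt((1 - p) / (n * p)) for a cell with
    expected share p of all trips, where n is the number of surveyed trips.
    """
    total = sum(destinations)  # assumed equal to sum(origins)
    if origins[i] == 0:
        return math.inf
    errors = []
    for d_j in destinations:
        if d_j == 0:
            continue  # no trips are expected in this cell
        p = origins[i] * d_j / (total * total)
        errors.append(z * math.sqrt((1 - p) / (sample_trips * p)))
    return sum(errors) / len(errors)


def satisfies_constraint_3(i, origins, destinations, sample_trips, limit=0.5):
    """Constraint 3: average relative error below 50%."""
    return mean_relative_error(i, origins, destinations, sample_trips) < limit
```

Under this model the error shrinks with the square root of the sample size, so the same zoning can pass with the full survey and fail with a small subsample.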

The fourth constraint ensures that the number of zones stays within the previously defined range. It also serves as one of the stopping criteria of the algorithm, applied when the constraint cannot be satisfied because the given range of numbers of zones is too high for the available number of cells and data.

The objective function of the TAZ algorithm contains two components: the density of trips and the percentage of intra-zonal trips of each zone. It tries to optimise both simultaneously by minimising the standard deviation of trip density across the cells inside each TAZ, which leads to more homogeneous zones, and by minimising the sum of the percentages of intra-zonal trips across all zones.

These components can have minimum values at different points, with trade-offs solved through the use of a ranking function, which minimises the sum of the rankings of the two variables (see Fig. 4).

Fig. 4

Decision tree for determining the optimum of the TAZ algorithm objective function

If the ranking function retrieves the same result for different cases, the objective function considers the result with the lower trip density standard deviation to be the “most suitable.” The decision tree used for the objective function is presented in Fig. 4.
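The ranking rule can be sketched as follows for the set of admissible candidate cells in one iteration. Each candidate is assumed to carry two precomputed values (the trip-density standard deviation and the intra-zonal percentage that would result from adding it). Ties in the rank sum go to the lower standard deviation, as the text specifies; any remaining tie is broken by cell identifier, which is an arbitrary assumption.

```python
def rank(values):
    """Competition ranking: 1 for the smallest value; tied values share the lowest rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def best_candidate(candidates):
    """candidates: list of (cell_id, density_std, intra_zonal_pct) for admissible cells."""
    std_ranks = rank([c[1] for c in candidates])
    intra_ranks = rank([c[2] for c in candidates])
    scored = [
        (rs + ri, c[1], c[0])  # rank sum, then lower std as tie-break, then cell id
        for c, rs, ri in zip(candidates, std_ranks, intra_ranks)
    ]
    return min(scored)[2]
```

Using ranks rather than raw values avoids having to put a standard deviation and a percentage on a common scale.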

The algorithm requires setting some parameters, which are important for the establishment of constraints, as well as for the optimization of some additional features of the algorithm (e.g., the optimal number of zones for a given range, which depends on some macroscopic indicators that will be presented below).

These parameters are:

  • The range for the number of zones within which the most suitable solution is sought (compulsory input);

  • minimum size of the influence area of local “highest peaks” for TAZ delineation;

  • definition of a core problem area (compulsory input);

  • percentage of zones belonging to the core problem area;

  • maximum ratio between the areas of zones in the core problem area and those in the “rest of the world;”

  • some parameters of the indicators (macroscopic indicators) used to define the optimal number of zones of the given range. This optimal number is found with the help of a multi-criteria additive function, which is defined below.

If the user omits any non-compulsory parameter, the algorithm uses its default value (e.g., a percentage of zones belonging to the core problem area of 50%).

After this overview of the TAZ delineation algorithm, its mathematical structure and data flow are presented to give a deeper understanding of its mechanics.

TAZ delineation algorithm structure

This section describes all of the input data needed for the algorithm, the algorithm (functioning and data flow), and the equations used in the model (constraints and objective function).

The model needs as basic input:

  • The square grid cell, which will be used to form the TAZs (this input has a very strong impact on the delineation quality of the zones and the running time of the algorithm);

  • the geocoded trip ends to be aggregated into spatial grid cells;

  • the total number of trips of the survey, which is used to determine the relative statistical error of estimation of the OD matrix;

  • the definition of the core problem area;

  • the definition of the local “highest peaks” (seed of the TAZ formation) influence area;

  • the definition of the parameters of the multi-criteria function to be used for selecting the optimal number of zones based on the macroscopic indicators produced by the algorithm.

Using these inputs and the parameters listed earlier, particularly the square cell grid and the range of the number of zones, the algorithm first sorts all cells in decreasing order of T (total origins and destinations of trips per cell). The algorithm is then structured in three nested cycles (see Fig. 5). The first cycle computes the zoning scheme for each number of zones in the given range: it iterates from L, the lower bound of the range, to U, the upper bound, and contains the two other cycles, which delineate each TAZ.

Fig. 5

Delineation algorithm functioning and data flow

The second cycle delineates each TAZ. The cycle starts by defining the local “highest peak,” the initiator of the cell aggregation for the TAZ delineation, and its spatial search. This cycle iterates until the set of zones has a dimension of N (the number of zones given by the first cycle), or until the number of analysed cells (possible local “highest peaks”) reaches the total number of cells available.

The third cycle aggregates cells into each TAZ. At the beginning of each TAZ delineation, the search matrix is empty. In the first iteration, the local “highest peak” is inserted at the centre of the search matrix, becoming the seed of the delineation process. Subsequent iterations aggregate cells to the local “highest peak” using the search rules above; in each iteration, the objective function selects the most suitable cell to aggregate to the TAZ from the set of cells that respect the search rules (minimization of the standard deviation of the TAZ trip density and of the intra-zonal trips).
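The three cycles can be summarised as a skeleton in which the third cycle is abstracted into a caller-supplied function. `grow_zone` is a hypothetical callback assumed to apply the search rules, the objective function and the constraints, returning the set of cells of the new zone (or an empty set when no valid zone can be grown from that seed). The macroscopic indicators and the final compensatory selection are omitted.

```python
def delineate(cells_by_t, n_low, n_high, grow_zone):
    """Return {n_zones: [zone_cells, ...]} for every n in [n_low, n_high]."""
    schemes = {}
    for n in range(n_low, n_high + 1):            # cycle 1: each number of zones in [L, U]
        zones, assigned = [], set()
        for seed in cells_by_t:                   # cycle 2: candidate peaks, decreasing T
            if len(zones) == n:
                break                             # scheme complete
            if seed in assigned:
                continue                          # already inside an earlier zone
            zone = grow_zone(seed, assigned)      # cycle 3: cell-by-cell aggregation
            if zone:                              # empty when constraints cannot be met
                zones.append(zone)
                assigned |= zone
        schemes[n] = zones
    return schemes
```

Cycle 2 also stops naturally when the list of candidate cells is exhausted, matching the second stopping condition described above.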

After all of the zoning schemes for the range of numbers of zones (L to U) have been determined, the algorithm uses a compensatory rule to determine the optimal number of zones within that range (see Fig. 5). The compensatory rule takes as attributes the macroscopic indicators of each zoning scheme, which are presented below. All the attributes (M_i) are first scaled to values between 0 and 1 using a linear function between their maximum and minimum values. The weights of the attributes (w_i) are specified by the user, as indicated above. The global compensatory function to be maximised is given in (1), where N is tested between L and U (the lower and upper bounds of the input range of numbers of zones) and MI is the number of macroscopic indicators.

$$ \max_{N \in \{L, \ldots, U\}} \quad \sum_{i=1}^{\text{MI}} w_{i} \left( \frac{\max(M_{i}) - M_{i}(N)}{\max(M_{i}) - \min(M_{i})} \right) \tag{1} $$

The algorithm selects a number of zones within the available range that maximises the global compensatory function and stores the result (numeric and spatial data) in a warehouse that can be accessed with GIS software. All of the microscopic and macroscopic indicators of the analysed zoning schemes are stored in a spreadsheet, making it possible for the user to quickly assess the results.
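Equation (1) can be sketched as follows. Its scaling maps the smallest value of each indicator across the tested range to 1 and the largest to 0, so it implicitly treats lower indicator values as better; indicators for which higher is better would need to be negated first. The dictionary layout, the function name, and the choice to give zero contribution to an indicator that does not vary across the range are assumptions.

```python
def select_number_of_zones(indicators, weights):
    """indicators: {n_zones: [M_1, ..., M_MI]}; weights: [w_1, ..., w_MI]."""
    ns = sorted(indicators)
    k = len(weights)
    lo = [min(indicators[n][i] for n in ns) for i in range(k)]
    hi = [max(indicators[n][i] for n in ns) for i in range(k)]

    def score(n):
        total = 0.0
        for i, w in enumerate(weights):
            span = hi[i] - lo[i]
            # Linear scaling of eq. (1): best (lowest) value -> 1, worst -> 0.
            total += w * ((hi[i] - indicators[n][i]) / span if span else 0.0)
        return total

    return max(ns, key=score)
```

For example, with two equally weighted indicators `{50: [0.30, 10.0], 53: [0.25, 12.0], 56: [0.20, 15.0]}`, the scores are 0.5, 0.55 and 0.5, so 53 zones would be selected.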

TAZ indicators of compliance

The TAZ delineation algorithm presented above uses several indicators of compliance. Some are used to verify the constraints or to calculate the objective function, while others only characterise each TAZ or zoning scheme. These indicators are either microscopic or macroscopic, depending on the unit they describe: microscopic indicators are calculated for each TAZ, while macroscopic indicators are calculated for a zoning scheme. The microscopic indicators are recomputed dynamically during the run, each time a new cell is aggregated to a TAZ, as stated above (see Table 2).

Table 2 Microscopic and macroscopic indicator summary

The macroscopic indicators of the TAZ delineation algorithm were developed to evaluate the resulting zoning schemes and to determine the optimal number of zones within the defined range using a compensatory rule. These indicators are only obtained at the end of running the algorithm, using in some cases the final TAZ microscopic indicators for their calculation (see Table 2). All of the indicators in Table 2 are presented in detail and discussed by Martínez (2006).

Local improvements to the TAZ delineation algorithm

Some local improvements to the TAZ delineation algorithm were also developed to improve the results and make the algorithm more adaptable to different study areas. The main improvements are listed below.

  • A border conversion algorithm that adjusts the borders of TAZ (obtained from the square cell grid) into administrative or statistical land units, which allows a better understanding of the zoning systems results, easier TAZ data assessment, and a more feasible way to use the results as a transportation and urban planning tool. This algorithm takes as its objective function the maximisation of the overlapping areas between the borders of TAZ and the administrative or statistical land units, and as constraints, the preservation of the number of zones of the zoning system and the continuity of zones (split zones not accepted).

  • A bi-criterion optimization algorithm that locally optimises the statistical precision of the transport demand estimates associated with the zoning scheme by reassigning frontier cells from one zone to a neighbouring zone.

  • An urban barriers correction algorithm that respects physical geographic separators of the territory, such as railways, rivers, and common urban and mobility barriers. This algorithm determines the splitting areas generated by the intersection of each TAZ with the defined urban barriers, identifies the TAZ members (cells or institutional borders), and reassigns the cells/institutional borders located “outside” the urban barriers to the neighbouring TAZs.

  • A major flow corridors correction algorithm that adjusts the TAZ borders to the main study area flow corridors in order to maximise the intersection area of TAZ with these corridors (making only some frontier cell rearrangements), ensuring that these major flows have as small an intra-zonal representation as possible. The algorithm considers two types of constraints: The preservation of the number of zones and the continuity of zones (split zones are not accepted).
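The core of the border conversion step, maximising the overlap between TAZ and administrative or statistical units, can be sketched greedily: each unit is assigned to the TAZ it overlaps most. This simplification does not enforce the constraints stated for that algorithm (preserving the number of zones and zone continuity), so the sketch also reports TAZ left without any unit. The input format of intersection pieces (unit, TAZ, area), as produced by a GIS overlay, is an assumption.

```python
from collections import defaultdict


def convert_borders(pieces):
    """pieces: iterable of (unit_id, taz_id, overlap_area) from a GIS overlay."""
    overlap = defaultdict(lambda: defaultdict(float))
    for unit, taz, area in pieces:
        overlap[unit][taz] += area
    # Greedy rule: each unit joins the TAZ it overlaps most.
    return {unit: max(areas, key=areas.get) for unit, areas in overlap.items()}


def lost_zones(assignment, all_taz):
    """TAZ that received no unit, i.e. where the number of zones would not be preserved."""
    return set(all_taz) - set(assignment.values())
```

A non-empty result from `lost_zones` is exactly the situation that the published algorithm's zone-count constraint is meant to prevent.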

These four local improvement algorithms lead to zoning systems with better performances over the variables and constraints considered relevant for TAZ definition. A more detailed assessment of these algorithms is given by Martínez (2006).

Results of the case study and data analysis

The results of the application to the Lisbon municipality within the LMA are presented below. The analysis was restricted to the Lisbon municipality because the Mobility Survey geocoded trip ends there with greater geographic precision than in the rest of the LMA. Mixing different geographic precisions in the location of trip ends could bias the results, owing to spurious concentrations of trip ends at specific points in the suburbs.

The analysis of results was focused on the measurement of the value added by the TAZ delineation algorithm. An example was developed to compare the evolution of several indicators with the different kinds of zoning systems (different methodologies and geometries). This example uses the same number of zones (53 zones, equal to the number of “freguesias”–boroughs in the Lisbon municipality) for all methodologies in order to allow a direct comparison. The following zoning schemes are compared:

  • The usual administrative units (with statistical information) quite often used as TAZ in traffic modelling at a municipal scale (denoted as freguesias–boroughs);

  • the square cell grid yielding the most suitable values of these indicators (1,620 m side and 20° angle) (denoted as grid);

  • the zoning system resulting from the TAZ delineation algorithm with grid based boundaries (denoted as TAZ);

  • the zoning system resulting from the TAZ delineation algorithm with irregular boundaries “adjusted” to an aggregation of small scale statistical units using the TAZ border conversion algorithm (denoted as TAZ BGRI—individual city blocks or a small group of them).

These zoning systems are presented in Fig. 6. The differences between them are clear; only the TAZ and TAZ BGRI zoning systems are similar. The main difference lies in the greater geographical homogeneity of the last two zoning systems and, consequently, their greater statistical precision, which results from the constraints of the TAZ delineation algorithm.

Fig. 6

Zoning systems compared in the Lisbon municipality example

The resulting values of the indicators of statistical and geographical precision obtained for the different zoning systems and the information loss generated by the intra-zonal trips of the resulting OD matrix are presented next:

  • The percentage of trips in statistically non-significant OD matrix cells (defining a relative estimation error of 50% as the limit for statistical significance), which defines the statistical precision in the estimation of OD matrix cells for each zoning system;

  • the 75th percentile of the zone equivalent radius (in the subset of statistically significant zones), which defines the geographical precision of each zoning system;

  • the percentage of intra-zonal trips of the OD matrix (resulting from the sum of the OD matrix main diagonal cells), which defines the information loss for each zoning system.
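The three indicators can be computed from an OD matrix and the zone areas roughly as follows. The zone equivalent radius is assumed here to be the radius of a circle with the same area as the zone, and the per-cell relative errors are taken as an input rather than computed; both are assumptions, as are the function names. As in the text, the caller would pass only the areas of statistically significant zones to `p75_equivalent_radius`.

```python
import math
from statistics import quantiles


def intra_zonal_share(od):
    """Share of trips on the main diagonal of the OD matrix {(origin, destination): trips}."""
    total = sum(od.values())
    return sum(v for (o, d), v in od.items() if o == d) / total


def equivalent_radius(area):
    """Radius of the circle with the same area as the zone (assumed definition)."""
    return math.sqrt(area / math.pi)


def p75_equivalent_radius(areas):
    """75th percentile of equivalent radii; needs at least two zones."""
    radii = [equivalent_radius(a) for a in areas]
    return quantiles(radii, n=4, method="inclusive")[2]


def share_in_nonsignificant_cells(od, rel_error, limit=0.5):
    """Share of trips in OD cells whose relative estimation error exceeds the limit."""
    total = sum(od.values())
    return sum(v for cell, v in od.items() if rel_error[cell] > limit) / total
```

All three are "lower is better" indicators, which is consistent with the scaling used in the compensatory function.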

These three indicators were calculated for the different zoning systems under analysis. The results are presented in Table 3, where it can be seen that the values obtained for the freguesias zoning system are worse than any of the other zoning systems. This result is due to the heterogeneity of zone size for this (historic) zoning system, which has very small zones in Lisbon city centre and very large zones near the Lisbon municipal borders. This fact generates a high percentage of intra-zonal trips with low geographic precision in the large border zones and, at the same time, a high percentage of trips in statistically non-significant OD matrix cells, which are caused by trips that have one of their trip ends in a very small zone in the city centre.

Table 3 Indicator values for the zoning systems of the comparison example (53 zones)

The grid zoning system, which is formed by regular larger zones, leads to overall better results compared to the freguesias zoning system (see Table 3). This fact reveals that the zoning system commonly used in traffic modelling on a municipality scale (freguesias) presents higher information loss than a square cell grid with large and regular boundary zones (grid).

This information loss can be very costly and can lead to two situations: either traffic modelling results using this zoning system present a high level of uncertainty, which can lead to poorly informed decisions, or the uncertainty is so high that data beyond the available Mobility Survey are needed to increase the sample and reduce the errors (at a severe increase in cost). These results show that zoning is not a trivial matter: it has significant consequences for the accuracy of the results or for the additional costs incurred to ensure that accuracy.

The zoning system resulting from the TAZ delineation algorithm with regular geometry boundaries (TAZ) presents considerably better indicator values than the previous zoning systems (freguesias and grid; see Table 3), except for the percentage of trips in statistically non-significant OD matrix cells, where its value is slightly higher than that of the grid zoning system. This small increase can be explained by the absence of statistical precision from the objective function of the TAZ delineation algorithm. Even so, TAZ zoning greatly reduces the percentage of intra-zonal trips while maintaining approximately the same statistical error as the square cell grid (with larger and uniform zones).

The last analysed zoning system (TAZ BGRI) adjusts the results of the TAZ delineation algorithm to an aggregation of small-scale statistical units using the TAZ border conversion algorithm, yielding irregular geometry boundaries. It presents better indicator values than all other zoning systems (see Table 3). The improvement over the TAZ zoning system is obtained solely by replacing regular (cell grid) boundaries with irregular ones. This confirms the assumption underlying the TAZ border conversion algorithm: a zoning system with irregular geometry boundaries should perform better than an equivalent zoning system with regular geometry boundaries.

The evolution of the results obtained using the different kinds of zoning systems (grid, TAZ, and TAZ BGRI) as the number of zones was varied is presented next (see Figs. 7, 8). The results show that TAZ BGRI consistently provides better results compared to the grid and TAZ zoning systems.

Fig. 7 Evolution of the percentage of trips in statistically non-significant OD matrix cells versus the percentage of intra-zonal trips as the number of zones in the zoning system varies

Fig. 8 Evolution of the 75th percentile of the zone equivalent radius versus the percentage of trips in statistically non-significant OD matrix cells as the number of zones in the zoning system varies

It is important to note that, for 40 zones, the grid zoning system presents a lower value of the statistical precision indicator, the percentage of trips in statistically non-significant OD matrix cells (approximately 20% less than TAZ BGRI and 28% less than TAZ), because its zones are larger than in the other zoning systems. At the same time, it presents a significantly higher percentage of intra-zonal trips (approximately 40% more than TAZ BGRI and 32% more than TAZ) and a 75th percentile of the zone equivalent radius approximately 33% higher than that of TAZ BGRI and 33% lower than that of TAZ, as presented in Fig. 8.
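The equivalent radius and the relative differences quoted above can be computed as in the sketch below. It assumes that the equivalent radius of a zone is the radius of a circle with the same area, sqrt(A/π), and uses linear interpolation for the 75th percentile. Both conventions, the helper names and the example areas are assumptions for illustration.

```python
import math
import statistics
from typing import Sequence


def equivalent_radius(area_km2: float) -> float:
    """Radius (km) of a circle with the same area as the zone (assumed definition)."""
    return math.sqrt(area_km2 / math.pi)


def p75_equivalent_radius(areas_km2: Sequence[float]) -> float:
    """75th percentile of the zone equivalent radii (needs at least two zones)."""
    radii = [equivalent_radius(a) for a in areas_km2]
    # Third quartile cut point, interpolating linearly between order statistics.
    return statistics.quantiles(radii, n=4, method="inclusive")[2]


def relative_difference(value: float, reference: float) -> float:
    """Percentage by which `value` is above (+) or below (-) `reference`."""
    return 100.0 * (value - reference) / reference


# Hypothetical zone areas (km2) of two zoning systems covering the same 32 km2.
grid = [4.0] * 8
taz = [1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 9.5]
print(f"{relative_difference(p75_equivalent_radius(grid), p75_equivalent_radius(taz)):+.1f}%")
```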

Having assessed the results obtained for this example, we next considered the TAZ and TAZ BGRI zoning systems corrected by the bi-criterion optimization algorithm, denoted optimised TAZ and optimised TAZ BGRI, respectively.

Table 4 presents the results of this analysis, which clearly show the trade-off among the three indicators. The bi-criterion optimization algorithm improves the statistical precision of the zoning system by rearranging TAZ borders but, at the same time, produces lower geographical precision and, in the TAZ case, a greater percentage of intra-zonal trips.
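To make this kind of trade-off explicit, candidate zoning systems can be compared on the two precision criteria, keeping only those that no other candidate beats on both. The sketch below is a generic Pareto filter, not the bi-criterion optimization algorithm (which rearranges TAZ borders); the candidate names and indicator values are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    pct_non_significant: float  # statistical imprecision (lower is better)
    p75_radius_km: float        # geographical imprecision (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both criteria and strictly better on one."""
    no_worse = (a.pct_non_significant <= b.pct_non_significant
                and a.p75_radius_km <= b.p75_radius_km)
    better = (a.pct_non_significant < b.pct_non_significant
              or a.p75_radius_km < b.p75_radius_km)
    return no_worse and better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Non-dominated candidates, ordered from best to worst statistical precision."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c.pct_non_significant)


# Hypothetical candidate zonings of the same study area.
candidates = [
    Candidate("A", 30.0, 1.0),
    Candidate("B", 27.0, 1.2),
    Candidate("C", 31.0, 1.3),  # worse than A on both criteria
]
print([c.name for c in pareto_front(candidates)])
```

Every zoning on the resulting front trades statistical precision against geographical precision; choosing among them is a judgement about which loss is more acceptable.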

Table 4 Indicator values for the optimised TAZ and optimised TAZ BGRI zoning systems of the comparison example (53 zones)

The trade-off among the three indicators is very significant. For the TAZ zoning system, a reduction of approximately 3% in the percentage of trips in statistically non-significant OD matrix cells leads to an increase of approximately 12% in the 75th percentile of the zone equivalent radius and of approximately 1% in the percentage of intra-zonal trips. These results reveal a very high cross-elasticity between the percentage of trips in statistically non-significant OD matrix cells and the 75th percentile of the zone equivalent radius (approximately −300%), showing that the gains in statistical precision are considerably smaller than the losses in geographical precision.
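The cross-elasticity discussed above is the ratio of the percentage change in one indicator to the percentage change in another. A minimal sketch follows; the helper names and the before/after values are hypothetical and do not come from Table 4. A negative value means that gains in statistical precision come with losses in geographical precision, and a magnitude above 1 means the loss is proportionally larger than the gain.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


def cross_elasticity(d_response_pct: float, d_driver_pct: float) -> float:
    """Ratio of the % change in the response indicator to the % change in the driver."""
    if d_driver_pct == 0:
        raise ValueError("driver indicator did not change")
    return d_response_pct / d_driver_pct


# Hypothetical indicator values before and after a bi-criterion correction.
d_non_sig = pct_change(25.0, 24.0)  # % trips in non-significant OD cells
d_radius = pct_change(2.0, 2.2)     # 75th percentile equivalent radius (km)
print(f"cross-elasticity: {cross_elasticity(d_radius, d_non_sig):.1f}")
```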

A similar trade-off is observed for the TAZ BGRI zoning system, where a reduction of approximately 2% in the percentage of trips in statistically non-significant OD matrix cells leads to an increase of approximately 24% in the 75th percentile of the zone equivalent radius and to a decrease of approximately 0.5% in the percentage of intra-zonal trips. These results indicate that experimenting with the bi-criterion optimization algorithm is of great value, as it makes it possible to assess the cross-elasticity between statistical and geographical precision for the zoning system under consideration.

Conclusions

In most transport planning studies, considerable effort is put into data collection, parameter estimation and model sophistication, but zoning rarely receives similar attention, normally being done on top of administrative units or “by common sense”. The results obtained show that zoning is not a trivial matter, given the significant consequences it may have for the generation of statistical and geographical errors. This study introduces a new methodology for zone delineation that deals with the trade-offs inherent in the process.

One important step is still needed to consolidate this method: testing the validity of the results with a different database, in order to quantify their sensitivity to different travel patterns and land use distributions. Due to a lack of geocoded mobility surveys in Portugal, it was not possible during the project to validate the results with other case studies.

Four main directions are envisaged for the next steps of this research:

  • Examine the consequences of these different zoning strategies for traffic load estimates on roads of intermediate and lower hierarchies, especially after all the local refinements to the base algorithm have been implemented.

  • Investigate the consequences of these findings for the process of matrix estimation from traffic counts, bearing in mind that assigning a trip to a zone directly affects the TAZ definition process. For this reason, studies based on traffic counts might define the zoning system and the OD matrix simultaneously, rather than treating zoning as an exogenous, predefined step. This iterative process, repeated until the zoning system and the OD matrix converge, might have a structure similar to k-means clustering algorithms, where the cluster centroids change position at every iteration.

  • Quantify in economic terms the improvements in statistical precision and the reduction in the percentage of intra-zonal trips obtained with the algorithm. This significant decrease in information loss could allow smaller samples in data collection or a significant increase in the quality of the data used in typical transportation planning studies, yielding more robust results for the same input data and costs.

  • Evaluate the possible application of the algorithm with other indicators derived from zonal data rather than travel vector data (e.g., population vs. employment).
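The k-means analogy in the second research direction can be illustrated with Lloyd's algorithm applied to hypothetical trip-end coordinates: points are assigned to the nearest centroid, centroids move to the mean of their members, and the loop stops when the assignments no longer change. This shows only the clustering structure; a joint definition of zones and OD matrix would also need an OD update step at each iteration, which is not sketched here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], init: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate assignment and centroid update until stable."""
    centroids = list(init)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels


# Hypothetical trip-end coordinates (km) forming two obvious groups.
trip_ends = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans(trip_ends, init=[(0.0, 0.0), (10.0, 10.0)]))
```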