1 Introduction

Street maps and transportation networks are of fundamental importance in a wealth of applications. In the past, the production of street maps required expensive field surveying and labor-intensive postprocessing. Proprietary data vendors such as NAVTEQ (now Nokia), TeleAtlas (now TomTom) and Google therefore dominated the market. Over the last several years, Volunteered Geographic Information (VGI) [24] efforts such as OpenStreetMap (OSM) [26, 34] have complemented commercial map datasets. They provide map coverage especially in areas that are of less commercial interest. VGI efforts, however, still require dedicated users to author maps using specialized software tools. Lately, on the other hand, the commoditization of GPS technology and its integration in mobile phones, coupled with the advent of low-cost fleet management and positioning software, has triggered the generation of vast amounts of tracking data. As a size indicator, one can consider the contribution of tracking data to OpenStreetMap, which is steadily increasing in size and currently amounts to 2.6 trillion points [33]. Besides the use of such data in traffic assessment and forecasting [18], i.e., map-matching vehicle trajectories to road networks to obtain travel times [9], there has been a recent surge of actual map construction algorithms that derive not only travel-time attributes but actual road network geometries from tracking data, e.g., [1, 2, 4, 7, 8, 10–13, 16, 17, 20, 23, 25, 27, 28, 30, 39–41, 43, 45]. Among those, only a few algorithms give theoretical quality guarantees [1, 4, 13]. An example of a constructed map is given in Fig. 1, which shows (a) the vehicle trajectories collected for Berlin in grey and (b) the respective constructed map, shown in black, using the algorithm of [28], with an OpenStreetMap background map shown in grey.

Fig. 1

Vehicle tracking data vs constructed map overlaid on ground-truth

A major challenge in the research community is to compare the performance and to evaluate the quality of the various map construction algorithms. Visual inspection remains the most common evaluation approach throughout the literature and only a few recent papers incorporate quantitative distance measures [3, 7, 8, 28, 30]. However, the cross-comparison of different algorithms remains rare, since algorithms and constructed maps are generally not publicly available. Also, there is a lack of benchmark data, and the quantitative evaluation with suitable distance measures is in its infancy. A cultural shift has recently been triggered by Biagioni and Eriksson [7]: In addition to providing an extensive survey of eleven map construction algorithms, they have performed a quantitative evaluation of three representative map construction algorithms. They have made their implementations of these algorithms, as well as their dataset, publicly available. The present work complements and significantly expands these benchmarking efforts to provide an evaluation and comparison of more map construction algorithms on more diverse datasets using various quality measures suitable for different applications. Such an effort can only be sustained in a culture of sharing that makes data, methods and source code publicly available.

In this work, we evaluate and compare seven map construction algorithms using four benchmark tracking datasets and four different distance measures. The algorithms we compare represent the state of the art of the past several years and constitute representatives of different map construction algorithm classes. They include the recent algorithms by Ahmed and Wenk [4], by Ge et al. [23], and by Karagiorgou and Pfoser [28], in addition to the algorithms by Cao and Krumm [11], Davies et al. [16], Edelkamp and Schrödl [17], and Biagioni and Eriksson [8]. Among those, the algorithms of [11], [16] and [17] were previously compared by Biagioni and Eriksson [7]. We have used the publicly available implementations of the algorithms of [11], [16, 17] and [8], and the authors of [23] ran their algorithm for us. We have also made our own implementations of the algorithms of [4, 28] publicly available (see below). The four distance measures used to assess the constructed map quality comprise two novel distance measures that have not been used for comparative evaluations of map construction before and that work with unmodified and unbiased ground-truth maps: the Directed Hausdorff distance [6] and the path-based distance measure presented by Ahmed et al. [3]. We also use the distance measure based on shortest paths by Karagiorgou and Pfoser [28] and the graph-sampling-based distance measure by Biagioni and Eriksson [8]. The implementation of the latter distance measure [8] has been made available to us by the authors. The tracking datasets include the Chicago dataset provided by Biagioni and Eriksson [7, 8], and three additional tracking datasets: two from Athens, Greece and one from Berlin, Germany (see details in Section 4). They are available together with unmodified ground-truth maps obtained from OpenStreetMap. We use different datasets because they cover diverse road types (e.g., highways, secondary roads), different sampling rates, and different scales.

In addition to providing the largest comprehensive comparison of map construction algorithms to date, we make our three new benchmark datasets, the map construction algorithms and outputs by Ahmed and Wenk [4] and by Karagiorgou and Pfoser [28], as well as the code for computing three of the distance measures (the Directed Hausdorff distance [6], the path-based distance [3], and the shortest-path-based measure [28]) publicly available on the internet at mapconstruction.org. We have established this Web site as a repository for map construction data and algorithms, and we invite other researchers to contribute by uploading code and benchmark data supporting their map construction algorithms. We expect that such a central repository will encourage a culture of sharing and will enable the development of improved map construction algorithms.

Our main goal with this work is to provide a common platform for the comparative analysis of map construction algorithms. As different distance measures capture different features of a constructed map, it is hard to combine them into a single score and rank the algorithms accordingly. Also, which algorithm is best depends highly on the quality of the input data and on the purpose the map will be used for. For example, for the Chicago dataset the KDE-based algorithm by Davies et al. [16] generates a very good-quality map in terms of spatial distance to the ground-truth map (captured by the path-based and Directed Hausdorff distances), but if the user is interested in maps with good coverage (captured by the shortest-path-based and graph-sampling-based distance measures), this algorithm will not be the best choice, as it discards tracks in sparse areas as outliers/noise. So, we leave it to the user to pick the distance measure that suits her needs best.

The outline of the paper is as follows. Section 2 surveys map construction algorithms by introducing categories for types of algorithms and gives more details on the algorithms that we will use in our evaluation. Section 3 discusses quality measures that will allow us to assess the quality of the constructed maps. The tracking datasets that we provide for evaluation purposes are briefly discussed in Section 4. The datasets are available for download and also include the respective ground-truth map data. A comprehensive performance study comparing the various algorithms across datasets is given in Section 5. Finally, Section 6 provides conclusions and directions for future work.

2 Map construction algorithms

We assume that the input is given as a set of tracks, where each track is a sequence of measurements. Each measurement consists of a point (latitude/longitude or \(\left (x,y\right )\)-coordinates after suitable projection), a time stamp, and optionally additional information such as vehicle heading or speed. The desired output is to construct a street map, which we model as an undirected geometric graph in the plane. There are many possible models for street maps, mostly depending on the desired application and granularity. For example, an intersection can be modeled as a single vertex embedded as a point in the plane, or it could be a set of vertices possibly annotated with turn restrictions, or it could be a region. An edge can be modeled as an abstract connection between vertices, as a curve embedded in the plane, as a set of curves to model multiple lanes, and an edge might be directed to model one-way streets. We focus on the most basic model of a street map as an undirected geometric graph, where each vertex is embedded as a point in the plane and each edge is a polygonal curve that connects two vertices. The map construction algorithms in the literature generally model the maps as undirected graphs or different variants of directed graphs. But often, an undirected graph is computed as a first step and additional information such as edge directions, number of lanes, turn restrictions, and mean speed are computed in an additional post-processing step, e.g. [8, 11, 16, 17, 39]. We therefore perform the comparison on a common street map representation based on undirected graphs, although some algorithms may produce some additional attributes.
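This basic street map model, an undirected geometric graph whose edges carry polyline geometry, can be sketched as a small data structure. The class and field names below are hypothetical and not taken from any of the compared implementations:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    x: float  # planar coordinates after a suitable projection
    y: float


@dataclass
class Edge:
    u: Vertex
    v: Vertex
    geometry: list  # polygonal curve from u to v, as (x, y) tuples


@dataclass
class StreetMap:
    """Undirected geometric graph: embedded vertices, polyline edges."""
    vertices: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, pts):
        """Add an undirected edge given its polyline geometry."""
        u, v = Vertex(*pts[0]), Vertex(*pts[-1])
        self.vertices.update((u, v))
        self.edges.append(Edge(u, v, list(pts)))
        return self.edges[-1]
```

Attributes such as edge direction, lane count, or turn restrictions, computed in post-processing by several of the surveyed algorithms, would be additional labels on `Edge` in this representation.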

2.1 Related work

There exist several different approaches in the literature for constructing street maps from tracking data. These can be organized into the following categories: Point clustering (this includes k-means algorithms and Kernel Density Estimation (KDE) as described in Biagioni and Eriksson [8]), incremental track insertion, and intersection linking.

2.1.1 Point clustering

Algorithms in this general category assume the input to be a set of points, which are then clustered in different ways to obtain street segments that result in a street map. The input point set either comprises the set of all raw input measurements, or a dense sample of all input tracks. Here, the input tracks are assumed to be continuous curves obtained from interpolating (usually piecewise-linearly) between measurements. The point clustering techniques can be reduced to two types of methods for obtaining a street map. One approach type (cf. [17]) initially clusters the points to generate intersections and then computes the connecting segments as centerlines based on the trace points connecting the respective intersection clusters. Other approaches, such as KDE methods, compute the street map in one sweep. The set of points are interpreted as a skeleton image of the road network. The street map is computed as the set of centerlines derived from this image using, e.g., kernel density estimates.

The first approach type, spearheaded by Edelkamp and Schrödl [17], employs the k-means algorithm to cluster the input point set, using distance measures (e.g., Euclidean distance) and possibly also the vehicle heading of the measurements, and introducing seeds at fixed distances along a path. Their map construction algorithm incorporates new algorithms for road segmentation, map-matching, and lane clustering. In [39] this approach was used to refine an existing map rather than building it entirely from scratch. In their short paper [25], Guo et al. make use of statistical analysis of GPS tracks in order to extract a centerline representation of a street, assuming that the GPS data follows a symmetric 2D Gaussian distribution. This assumption may become unrealistic, especially in error-prone environments. Worrall et al. [43] compute point clusters based on location and heading, and in a second step link these clusters together using non-linear least-squares fitting. They emphasize compression of the input tracks to infer a digitized road map and present their results only for small datasets. They are mostly concerned with topological elements and not with connected way points. Agamennoni et al. [2] present a machine-learning method to consistently build a representation of the map, mostly in dynamic environments such as open-pit mines. They focus on estimating a set of principal curves from the input traces to represent the constructed map. Liu et al. [30] first cluster line segments based on proximity and direction, and then fit polylines to the resulting point clusters to extract road segments. In our comparisons, we use the algorithm by Edelkamp and Schrödl [17].

Another approach employs KDE methods to first transform the input point set to a density-based discretized image. Most of the KDE algorithms function well either when the data is frequently sampled (i.e., once per second) [12], or when there is a lot of data redundancy [8, 16, 40, 41]. A similar approach to [8] is presented in Liu et al. [30]. Generally, KDE algorithms have a hard time overcoming the problem of noisy samples when they accumulate in an area. Recently, Wang et al. [42] addressed the problem of map updates by applying their approach to OpenStreetMap data using a KDE-based approach. In our comparisons, we use the algorithms by Davies et al. [16] and by Biagioni and Eriksson [8], which are both KDE-based but use very different approaches to extract the map from the kernel density estimate.

In the computational geometry community, map construction algorithms have been proposed that cluster the input points using local neighborhood properties by employing Voronoi diagrams, Delaunay triangulations [13, 23], or other neighborhood complexes such as the Vietoris-Rips complex [1]. All these algorithms assume a densely sampled input point set, and provide theoretical quality guarantees for the constructed output map, under certain assumptions on the underlying street map and the input tracks. Aanjaneya et al. [1] view street maps as metric graphs, and they focus on computing the combinatorial structure by computing an almost isometric space with lower complexity, but they do not compute an explicit embedding of vertices and edges. Chen et al. [13] focus on detecting “good” street portions in the map and connect them subsequently. The theoretical quality guarantees, however, assume dense point sample coverage and error bounds, and make assumptions on the road geometry. In our comparisons, we use the algorithm by Ge et al. [23].

2.1.2 Incremental track insertion

Algorithms in this category construct a street map by incrementally inserting tracks into an initially empty map [32], often making use of map-matching ideas [36]. Distance measures and vehicle headings are also used to perform additions and deletions during the incremental construction of the map. Intuitively, these methods use the first trace as a base map and refine it incrementally by adding more traces. With each insertion, new map detail is added and existing geometry is updated using interpolation.

One of the first algorithms in this category [38] clusters the tracks merely to refine an existing map and not to compute it from scratch. Cao and Krumm [11] first introduce a clarification step in which they modify the input tracks by applying physical attraction to group similar input tracks together. Then they incrementally insert each track by using local criteria such as distance and direction. Bruntrup et al. [10] propose a spatial-clustering based algorithm that requires high quality tracking data (sampling rate and positional accuracy). The work in [45] discusses a map update algorithm based on spatial similarity. It uses a method similar to GPS trace merging to continuously refine existing road maps. Ahmed and Wenk [4] present an incremental method that employs the Fréchet distance to partially match the tracks to the map. In our comparisons, we use the algorithms by Cao and Krumm [11] and by Ahmed and Wenk [4], which use very different approaches for incremental track insertion.

2.1.3 Intersection linking

The intersection linking approach emphasizes the correct detection of intersection nodes and the linkage of these nodes, both in terms of connectivity and actual geometry. Intersections are identified based on movement characteristics (speed, direction) or point density. The intersections are then linked by interpolating the geometry of the connecting traces.

Fathi and Krumm [20] provide an approach that detects intersections by using a prototypical detector trained on ground truth data from an existing map. While a map is finally derived, their approach works best for geometries of vertically aligned street maps and it uses frequently sampled data of 1s or 5s. The method by Karagiorgou and Pfoser [28] relies on detecting changes in the direction of movement to infer intersection nodes, and then “bundling” the trajectories around them to create the map edges. It uses less frequently sampled data (>30s) and produces street maps for geometries of arbitrary road networks. In our comparisons, we use the algorithm by Karagiorgou and Pfoser [28].

2.2 Compared algorithms

Here we give some more details on the map construction algorithms that we compare in Section 5. The algorithms and respective categories are shown in Table 1.

Table 1 Algorithm categories

2.2.1 Ahmed and Wenk [4]

The algorithm by Ahmed and Wenk [4] is a simple and practical incremental track insertion algorithm. It models the street map as an undirected graph and uses one parameter ε to model the error associated with the GPS tracks and with the street width. The insertion of one track proceeds in three steps. The first step performs a partial map-matching of the track to the partially constructed map in order to identify matched and unmatched portions. Figure 2a gives an example of a track with its matched portions shown in dark green (Fréchet distance of the sub-curve to the map is less than ε) and its unmatched portions shown in red (Fréchet distance of the sub-curve to the map is greater than ε). This partial map-matching is based on a variant of the Fréchet distance. In the second step, the unmatched portions of the track are inserted into the partially constructed map, which requires creating new vertices and creating and splitting edges. In a third step, the already existing edges in the map that are covered by the matched portions of the track are updated using a minimum-link algorithm to compute a new representative edge (cf. Fig. 2b). This last step is only needed to provide a guaranteed bound on the complexity of the output map; in the implementation of this algorithm that we use in Section 5, this last step has been omitted. Ahmed and Wenk also give theoretical quality guarantees for the output map computed by their algorithm, which include a one-to-one correspondence between well-separated “good” portions of the underlying map and the output map, with a guaranteed Fréchet distance between those portions.
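At a high level, the insertion loop can be sketched as follows. This is a simplified sketch, not the authors' implementation: `partial_match` is a hypothetical stand-in for the Fréchet-based partial map-matching of the first step, and the map is represented simply as a list of polylines.

```python
def insert_track(street_map, track, eps, partial_match):
    """Insert one track into the partially constructed map.

    `partial_match` is assumed to return (is_matched, subcurve) pairs
    covering the track; in the actual algorithm this decision is made
    with a Fréchet-distance map-matching at threshold eps.
    """
    for is_matched, subcurve in partial_match(street_map, track, eps):
        if not is_matched:
            # unmatched portions become new vertices and edges
            street_map.append(list(subcurve))
        # matched portions would update existing edges (omitted, as in
        # the implementation used in Section 5)
    return street_map


def construct_map(tracks, eps, partial_match):
    """Incrementally insert all tracks into an initially empty map."""
    street_map = []
    for track in tracks:
        insert_track(street_map, track, eps, partial_match)
    return street_map
```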

Fig. 2

Incremental track insertion algorithm (images from presentation of [4])

2.2.2 Biagioni and Eriksson [8]

Biagioni and Eriksson [8] describe a point-clustering-based algorithm that uses KDE methods. Their algorithm proceeds by applying KDE with various thresholds to compute successive versions of a skeleton map. They annotate the map by performing a map-matching pass of the input tracks with the skeleton map. Figure 3 gives three example stages of the skeleton construction process using high to low KDE thresholds. The skeleton is represented as an undirected graph, and in a final step the edges are replaced with one directed edge per direction, as well as turn lanes at vertices. Just as in [8], for evaluation purposes we use the final undirected skeleton as the output.

Fig. 3

KDE-based map construction using threshold ranges (images from presentation of [8])

2.2.3 Cao and Krumm [11]

This incremental track insertion approach models the street map as a directed graph with one directed edge per direction. It proceeds in two stages. In the first stage, a simulation of physical attraction is used to modify the input tracks so that similar portions of the tracks are grouped together. This results in a cleaner dataset in which track clusters are more pronounced and the two differently directed portions of a road segment are more clearly separated. This much cleaner data is then used as the input for a fairly simple incremental track insertion algorithm, which makes local decisions based on distance and direction to either merge a vertex into an existing edge, or add a new edge and vertex.

Figure 4 gives a respective map construction example. The three trajectories of Fig. 4a are used to incrementally build the graph in Fig. 4b by (i) either merging nodes to existing nodes if the distances are small and the directions of the traces match (nodes in boxes), or (ii) by creating new nodes and edges otherwise (nodes in circles).
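The local merge-or-create decision can be illustrated with a small sketch. The threshold values and the flat node representation are hypothetical; the actual criteria in [11] are more involved:

```python
import math


def merge_or_create(nodes, point, heading, max_dist=10.0, max_angle=45.0):
    """Return an existing node if one is close enough in both distance
    and direction; otherwise create (and return) a new node.

    `nodes` is a list of (x, y, heading_degrees) tuples.
    """
    for node in nodes:
        dist = math.hypot(node[0] - point[0], node[1] - point[1])
        # smallest absolute angular difference, in [0, 180]
        dangle = abs((node[2] - heading + 180.0) % 360.0 - 180.0)
        if dist <= max_dist and dangle <= max_angle:
            return node  # merge into existing node
    new_node = (point[0], point[1], heading)
    nodes.append(new_node)
    return new_node
```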

Fig. 4

The incremental track insertion algorithm - adapted from [11]

2.2.4 Davies et al. [16]

This is a classical KDE-based map construction algorithm. It first computes for each grid cell the density of tracks that pass through it (cf. the example of Fig. 5a). Then it computes the contour of the resulting bit map (Fig. 5b), and it uses the Voronoi diagram of the contour to compute a center line representation, followed by additional cleanup and assignment of edge directions (Fig. 5c). The final output is a directed graph in which each edge is labeled as directed or bidirected.
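The first step, computing per-cell track densities on a grid and thresholding them into a bitmap, can be sketched as follows. Cell size and threshold are hypothetical parameters; [16] additionally smooths the grid with a kernel before thresholding:

```python
from collections import defaultdict


def grid_density(tracks, cell_size=10.0):
    """Count, per grid cell, how many distinct tracks pass through it."""
    counts = defaultdict(int)
    for track in tracks:
        # each track contributes at most once per cell
        cells = {(int(x // cell_size), int(y // cell_size)) for x, y in track}
        for cell in cells:
            counts[cell] += 1
    return counts


def threshold_to_bitmap(counts, min_tracks=2):
    """Keep only cells traversed by at least `min_tracks` tracks."""
    return {cell for cell, c in counts.items() if c >= min_tracks}
```

The resulting bitmap plays the role of Fig. 5b; contour extraction and the Voronoi-based centerline computation would follow on top of it.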

Fig. 5

Clustering-based map construction algorithm (images from [16])

2.2.5 Edelkamp and Schrödl [17]

Edelkamp and Schrödl [17] were the first to propose a map construction approach based on the k-means method. Their point clustering algorithm creates road segments based on tracking data, represents the center line of the road using a fitted spline and performs lane finding. The lanes are found by clustering tracks based on their distance from the road center line. The final graph is directed with a directed edge per lane (Fig. 6).
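Lane finding by clustering tracks on their perpendicular offset from the road centerline can be sketched as a simple 1-D k-means. This is a simplified sketch with a fixed number of lanes k; [17] determines the number of lanes as part of the clustering:

```python
def kmeans_1d(offsets, k, iters=20):
    """Cluster scalar offsets (meters from the centerline) into k lane
    centers using plain Lloyd-style k-means in one dimension."""
    # seed centers spread over the sorted offsets
    centers = sorted(offsets)[:: max(1, len(offsets) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for o in offsets:
            i = min(range(len(centers)), key=lambda i: abs(o - centers[i]))
            groups[i].append(o)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return sorted(centers)
```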

Fig. 6

Clustering-based map construction algorithm - adapted from [17]

2.2.6 Ge et al. [23]

This algorithm is a point clustering approach that applies topological tools to extract the underlying undirected graph structure. The main idea of this algorithm is to decompose the input data set into sets, each corresponding to a single branch in the underlying graph. The authors assume that the input point set is densely sampled, and their algorithm only needs a distance matrix or proximity graph of the point set as input. They then define a function on the proximity graph, which assigns to every point in the graph its geodesic distance to an arbitrary base point. They employ the Reeb graph to model the connected components of the level sets of this function. Finally, there is a canonical way to measure the importance of features in the Reeb graph, which allows them to easily simplify the resulting graph. Runtime guarantees are provided, as well as partial quality guarantees for the correspondence of cycles. They compare street maps as sets of cycles: if a cycle on one map does not correspond to a cycle on the other map, then a street or a turn is missing from the second map. An embedding for the edges is then obtained by using a principal curve algorithm [29] that fits a curve to the points contributing to the edge. Figure 7 gives an example of a constructed graph based on a point cloud shown as light (yellow) dots.
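The function on the proximity graph, geodesic distance to a base point, can be computed with a standard single-source shortest-path pass. This sketch covers only that step; the Reeb-graph construction on the level sets of the resulting function is omitted:

```python
import heapq


def geodesic_distances(proximity_graph, base):
    """Dijkstra from `base` on a weighted proximity graph given as
    {node: [(neighbor, edge_length), ...]}; returns distances dict."""
    dist = {base: 0.0}
    heap = [(0.0, base)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in proximity_graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```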

Fig. 7

Reeb graph based map construction (images from [23])

2.2.7 Karagiorgou and Pfoser [28]

This intersection-linking map construction algorithm is a heuristic approach that “bundles” trajectories around intersection nodes. It represents the street map as a directed graph in which each edge is labeled as directed or bidirected. The main contribution of the TraceBundle algorithm is its methodology to derive intersection nodes. It relies on detecting changes in movement to cluster “similar” nodes. A node at which a change in direction and speed occurs is considered a turn indicator. Turn clusters are produced based on (i) spatial proximity and (ii) turn type. The centroid of a turn cluster then becomes an intersection node. Links between intersection nodes are derived by compacting the trajectories connecting the intersection nodes. Figure 8 visualizes the steps of the algorithm. Figure 8a shows the constructed intersection nodes as black stars. The constituting turn clusters are shown as x and o markers. Figure 8b shows the links between intersection nodes as black lines. The constituting trajectories are shown as dashed lines. The TraceBundle algorithm has three tunable parameters: angular difference, speed, and spatial proximity. Angular difference is the relative change of the vehicle direction, measured in degrees. The speed threshold, measured in km/h, is an empirical maximum used to separate high-speed turns from turns at intersections. Spatial proximity is the distance threshold (in m) for clustering turn clusters into intersection nodes.
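The turn-indicator test on a single trajectory can be sketched as follows. The parameter values are hypothetical; in [28] the thresholds are tuned per dataset:

```python
import math


def turn_indicators(track, max_speed_kmh=30.0, min_angle_deg=15.0):
    """Return indices of track points flagged as turn indicators.

    `track` is a list of (x, y, speed_kmh) samples; a point is flagged
    when the direction change exceeds `min_angle_deg` while the speed
    is below `max_speed_kmh`.
    """
    indicators = []
    for i in range(1, len(track) - 1):
        (x0, y0, _), (x1, y1, v1), (x2, y2, _) = track[i - 1], track[i], track[i + 1]
        h_in = math.degrees(math.atan2(y1 - y0, x1 - x0))
        h_out = math.degrees(math.atan2(y2 - y1, x2 - x1))
        # smallest absolute angular difference, in [0, 180]
        dangle = abs((h_out - h_in + 180.0) % 360.0 - 180.0)
        if dangle >= min_angle_deg and v1 <= max_speed_kmh:
            indicators.append(i)
    return indicators
```

The flagged points would then be grouped by spatial proximity and turn type into turn clusters, whose centroids become intersection nodes.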

Fig. 8

The TraceBundle algorithm [28]

3 Quality measures for map comparison

There are two key ingredients for evaluating the quality of a constructed map: (i) the availability of an adequate ground-truth map G as part of the benchmark data and (ii) a quality measure used to evaluate the similarity between the constructed map C and the ground-truth map G.

There are essentially two cases of what can be considered as a ground-truth map G. Ideally, G is the underlying map consisting of all streets, and only those streets, that have been traversed by the entities that generated the set of input tracks. If such a G were available, then a suitable quality measure would compare C to all of G, and the ideal would be for C to equal G. However, in practice it is hard to obtain an unbiased ground-truth map that exactly corresponds to the coverage of the tracking data. This non-trivial task has been addressed in the past by pruning the ground-truth map either manually, by proximity to the tracking data, or by map-matching the tracking data to the map [7, 8, 28, 30]. Since the resulting graph topologies depend on human judgment or on the cropping behavior of the particular pruning algorithm, all these approaches clearly introduce an undesired bias.

In practice, it is much easier to obtain a ground-truth map that covers a superset of all the streets covered by the input tracks, e.g., a street map provided by a proprietary vendor or OpenStreetMap. Therefore, if G is such a superset, the quality measure should attempt to partially match C to G. Of course, another possible scenario is that C contains additional streets that are not present in either variation of G.

3.1 Related work

In the graph theory literature, there are various distance measures for comparing two abstract graphs that do not necessarily have a geometric embedding [15, 22, 37]. Most closely related to street map comparison are the subgraph isomorphism problem and the maximum common isomorphic subgraph problem, both of which are NP-complete. These, however, rely on one-to-one mappings of graphs or subgraphs, and they do not take any geometric embedding into account. Graph edit distance [21, 44] is a way to allow noise by seeking a sequence of edit operations to transform one graph into the other, however it is NP-hard as well. Cheong et al. [14] consider a graph edit distance for geometric graphs (embedded in two different coordinate systems, however), and also show that it is NP-hard to compute.

For comparing street maps, distance measures based on point sets and distance measures based on sets of paths have been proposed. Point-set-based distance measures treat each graph as the set of points in the plane that is covered by all its vertices and edges. The idea is then to compute a distance between the two point sets. Straightforward distance measures for point sets are the directed and undirected Hausdorff distances [6]. The main drawback of such an approach is that it does not use the topological structure of the graph. Biagioni and Eriksson [7, 30] use two distance measures that essentially both use a variant of a partial one-to-one matching that is based on sampling both graphs densely. The two distance measures compare the total number of matched sample points to the total number of sample points in the graph, thus providing a measure of how much of the graph has been matched. They do, however, require as input a ground-truth graph that closely resembles the underlying map and not a superset.

For path-based distance measures, on the other hand, the underlying idea is to represent the graphs by sets of paths, and then to define a distance measure based on distances between the paths. This captures some of the topological information in the graphs, and paths are of particular importance for street maps since the latter are often used for routing applications, for which similar connectivity is desirable. Mondzech and Sester [31] use shortest paths to compare the suitability of two road networks for pedestrian navigation by considering basic properties such as the respective path lengths. Karagiorgou and Pfoser [28] also use shortest paths, but to actually assess the similarity of road network graphs. Computing random sets of start and end nodes, the computed paths are compared using the discrete Fréchet distance and the average vertical distance. From those sets of distances, a global network similarity measure is derived. In another effort, Ahmed et al. [3] cover the networks to be compared with paths of link-length k and map-match the paths to the other graph using the Fréchet distance. They are the first to introduce the concept of a local signature to identify how and where two graphs differ.

3.2 Quality measures used for comparison

Here we give some more detail on the quality measures that we use in Section 5 to compare the different road network construction algorithms. Note that in our experiments the ground-truth map G is an unmodified street map from OpenStreetMap and is thus expected to be a superset of the underlying graph. We use the Directed Hausdorff distance [6], the path-based distance measure presented by Ahmed et al. [3], the distance measure based on shortest paths by Karagiorgou and Pfoser [28], and the graph-sampling-based distance measure by Biagioni and Eriksson [7]. The first two measures have not been used for comparative evaluations of road network constructions before.

3.2.1 Directed Hausdorff distance [6]

The directed Hausdorff distance of two sets of points A,B is defined as \(\overrightarrow {d}(A,B) = \max _{a \in A}\min _{b \in B} d(a,b)\). Here, d(a,b) is usually the Euclidean distance between two points a and b. Intuitively, the directed Hausdorff distance assigns to every point \(a \in A\) its nearest neighbor \(b \in B\) and takes the maximum of all distances between assigned points. In order to compare two graphs, we identify each graph with the set of points that is covered by all its vertices and edges. If the directed Hausdorff distance from graph C to graph G is at most ε, this means that for every point on any edge or vertex of C there is a point on G at distance at most ε. Or equivalently, every point of C is contained in the Minkowski sum of G with a disk of radius ε; the Minkowski sum intuitively “fattens” G by “drawing” each of its edges with a thick circular pen. This distance measure gives a notion of spatial distance for graphs. If C is the constructed graph and G is the ground-truth, then the lower the distance from C to G, the closer C is to G.
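On finite point samples of the two graphs, the directed Hausdorff distance can be computed by brute force. This is a sketch; in practice both graphs are first sampled densely along their edges so that the point sets represent the edges well:

```python
import math


def directed_hausdorff(A, B):
    """max over a in A of the min over b in B of the Euclidean distance
    d(a, b); A and B are sequences of (x, y) tuples."""
    return max(min(math.dist(a, b) for b in B) for a in A)
```

Note the asymmetry: the distance from C to G can be small even when G contains many streets absent from C, which is exactly why this measure suits a superset ground truth.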

3.2.2 Path-based distance [3]

The path-based map distance considers graphs as sets of paths. The distance between two sets of paths is then computed in the Hausdorff setting, while the Fréchet distance, a natural distance measure for curves that takes monotonicity and continuity into account, is used to compute the distance between two paths.

For curves f,g, the Fréchet distance is defined as

$$ \delta_{F}(f,g) = \inf\limits_{\alpha,\beta:[0,1]\rightarrow [0,1]} \max\limits_{t\in\left[ 0,1\right] }d(f(\alpha(t)),g(\beta(t))), $$
(1)

where α,β range over continuous, surjective and non-decreasing reparametrizations.

A common intuition is to describe it as the minimum leash length required such that a man and his dog can each walk monotonically along one of the two curves from beginning to end.
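The continuous Fréchet distance of Eq. (1) is typically computed with the Alt-Godau algorithm; for intuition, the discrete variant, which couples only the vertices of the two polylines, can be written as a small dynamic program (a sketch for illustration, not the implementation used in the experiments):

```python
import numpy as np

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between polylines P and Q (sequences of
    2D points), following the classic Eiter-Mannila recurrence: couplings
    advance monotonically along both curves, and we minimize the longest
    'leash' over all couplings."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    n, m = len(P), len(Q)
    ca = np.empty((n, m))
    ca[0, 0] = np.linalg.norm(P[0] - Q[0])
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            d = np.linalg.norm(P[i] - Q[j])
            # Cheapest way to have reached a predecessor cell.
            prev = min(ca[x, y] for x, y in ((i-1, j), (i-1, j-1), (i, j-1))
                       if x >= 0 and y >= 0)
            ca[i, j] = max(prev, d)
    return ca[n-1, m-1]
```

The discrete variant upper-bounds the continuous one up to the longest edge length, which is why densely sampled polylines make it a serviceable stand-in.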

Under this scope, let C and G be two planar geometric graphs, let \(\varPi _{C}\) be a set of paths generated from C, and let \(\varPi _{G}\) be a set of paths generated from G. The path-based distance is defined as:

$$ \overset{\rightarrow}{d}_{C,G}(\varPi_{C},\varPi_{G}) = \max\limits_{p_{C} \in \varPi_{C}}\min\limits_{p_{G} \in \varPi_{G}}\delta_{F}(p_{C},p_{G}) $$
(2)

Ideally, \(\varPi _{C}\) and \(\varPi _{G}\) should be the sets of all paths in C and G, which however have exponential size. Ahmed et al. [3] showed that \(\overset {\rightarrow }d_{C,G}({\varPi _{C}}{,\varPi _{G}})\) can be approximated by \(\overset {\rightarrow }d_{C,G}({{\varPi ^{3}_{C}}}{,\varPi _{G}})\) in polynomial time using the map-matching algorithm of [5], under some assumptions on C. Here, \(\varPi _{C}\) is the set of all paths and \({{\varPi }^{3}_{C}}\) is the set of all link-3 paths of C. A link-k path consists of k “edges”, where vertices of degree two in the graph are not counted as vertices. Using this asymmetric distance measure \(\overset {\rightarrow }d_{C,G}({{\varPi ^{k}_{C}}}{,\varPi _{G}})\), which can be computed in polynomial time for constant k, the following properties have been shown in [3], under some assumptions on C:

  1. k=1: For each edge in C, there is a path in G which is within Fréchet distance \(\overset {\rightarrow }d_{C, G}(\varPi _{C}^{1}, \varPi _{G})\).

  2. k=2: For each vertex v in C there is a vertex in G within bounded distance \(\overset {\rightarrow }d_{C, G}(\varPi _{C}^{2}, \varPi _{G})/\sin {\frac {\theta }{2}}\), where θ is the minimum incident angle at v between its adjacent edges.

  3. k=3: \(\overset {\rightarrow }{d}_{C, G}(\varPi ^{3}_{C}, \varPi _{G})\) approximates \(\overset {\rightarrow }d_{C, G}(\varPi _{C}, \varPi _{G})\) within a factor of \(1/\sin {\frac {\theta }{2}}\) if the vertices of C are reasonably well separated and have degree ≠ 3.

Similar to the directed Hausdorff distance, the lower the value of \(\overset {\rightarrow }{d}_{C, G}(\varPi _{C}, \varPi _{G})\), the more closely the constructed map C resembles the ground-truth map G.

The local signature of a vertex \(v \in C\) is defined as \({\Delta }_{v}=\overset {\rightarrow }{d}_{C, G}({\varPi _{C}}_{v}, \varPi _{G})\), where \({\varPi _{C}}_{v}\) is the set of paths in \(\varPi _{C}\) that contain v. Similarly, the local signature of an edge \(e \in C\) is defined as \({\Delta }_{e}=\overset {\rightarrow }{d}_{C, G}({\varPi _{C}}_{e}, \varPi _{G})\), where \({\varPi _{C}}_{e}\) is the set of paths in \(\varPi _{C}\) that contain e. Based on the values of these signatures one can identify which vertices or edges match the ground-truth well and which do not.
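The asymmetric path-based distance of Eq. (2) can be sketched by brute force over small path sets, with the discrete Fréchet distance standing in for δ_F. This is our simplification: the actual evaluation uses the polynomial-time map-matching algorithm of [5] rather than enumerating candidate paths in G, and all names here are ours:

```python
from functools import lru_cache
import math

def discrete_frechet(P, Q):
    """Compact discrete Fréchet distance (stand-in for δ_F in Eq. (2));
    P and Q are sequences of (x, y) points."""
    @lru_cache(maxsize=None)
    def c(i, j):
        d = math.dist(P[i], Q[j])
        if i == 0 and j == 0:
            return d
        opts = [c(x, y) for x, y in ((i - 1, j), (i - 1, j - 1), (i, j - 1))
                if x >= 0 and y >= 0]
        return max(min(opts), d)
    return c(len(P) - 1, len(Q) - 1)

def path_based_distance(paths_C, paths_G):
    """Directed path-based distance of Eq. (2): the worst path of C,
    measured against its best match among the candidate paths of G."""
    return max(min(discrete_frechet(pC, pG) for pG in paths_G)
               for pC in paths_C)
```

The local signatures Δ_v and Δ_e fall out of the same computation by restricting the outer maximum to the paths containing a given vertex or edge.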

3.2.3 Shortest path based distance [28]

Karagiorgou and Pfoser [28] propose a measure that essentially samples each graph using random sets of shortest paths. First, a ground-truth network is derived using the tracking data as a filter: augmenting the geometry of the ground-truth network with buffer regions around its edges and intersecting it with the tracking data yields a reduced network graph that we use as ground-truth network G. In both the constructed network C and the ground-truth network G, we randomly select the same set of origin and destination nodes and compute the respective shortest paths in both networks. The geometric difference/similarity between these shortest paths is used to assess the similarity between C and G, and thus the quality of the constructed network. The Discrete Fréchet distance and the Average Vertical distance are used to compare the shortest paths. The rationale for this approach is that measuring the similarity of sets of paths instead of individual links allows one to better reason about the connectivity of the generated network. The more “similar” the shortest paths in the constructed network are to those in the ground-truth network, the higher the quality of the network. The results of this shortest path comparison can be assessed by plotting the distances of all paths against each other, or by deriving parameters of the entire set of paths. We employ both approaches in our experiments, as shown in the following.
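The sampling skeleton of this measure can be sketched with networkx. We assume, for brevity, that the two graphs share vertex identifiers and carry a "length" edge attribute; in practice the constructed map has its own vertices, so origin and destination locations are snapped to the nearest vertex of each map. Any curve distance (the paper uses the Discrete Fréchet and Average Vertical distances) can then be applied to each pair of paths:

```python
import random
import networkx as nx

def shortest_path_samples(C, G, n_paths=500, seed=0):
    """Sample random origin-destination pairs present in both graphs and
    return the shortest-path node sequences in each graph. Assumes shared
    vertex IDs and that enough connected pairs exist."""
    rng = random.Random(seed)
    common = sorted(set(C.nodes) & set(G.nodes))
    pairs = []
    while len(pairs) < n_paths:
        s, t = rng.sample(common, 2)
        try:  # skip pairs unreachable in either graph (missing links)
            pC = nx.shortest_path(C, s, t, weight="length")
            pG = nx.shortest_path(G, s, t, weight="length")
        except nx.NetworkXNoPath:
            continue
        pairs.append((pC, pG))
    return pairs
```

The fraction of origin-destination pairs for which a path exists in the constructed map is itself informative: it is exactly the coverage percentage reported in Table 5.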

3.2.4 Graph-sampling based distance [7]

Biagioni and Eriksson [7] introduce a graph-sampling based distance measure in order to evaluate the geometry and topology of the constructed road networks represented by graphs. The main idea is as follows: starting from a random street location, explore the topology of both graphs by traversing them outward from that location, up to a maximum radius, while placing point samples on each graph. This produces two sets of locations, which are essentially spatial samples of a local graph neighborhood. These two point sets are compared using one-to-one bottleneck matching [19] and counting the unmatched points in each set. Note that the graph traversal can take directions of edges into account if desired; in [7] the authors apply this distance measure to directed graphs, while in [8] the authors apply it to undirected graphs.

The sampling process is repeated for several seed locations. For the bottleneck matching, the sample points on one graph can be considered as “marbles” and those on the other graph as “holes”. The algorithm considers one-to-one matchings between the point sets and only allows points to be matched that are within a given distance threshold. Intuitively, if a marble lands close to a hole it falls in, marbles that are too far from a hole remain where they land, and holes with no marbles nearby remain empty. If one of the graphs is the ground truth, this difference represents the accuracy of the other graph. Counting the number of unmatched marbles and empty holes quantifies the accuracy of the generated road network with respect to the ground truth according to two metrics. The first metric is the proportion of spurious marbles, \(spurious = spurious\_marbles/\left (spurious\_marbles + matched\_marbles\right )\), and the second is the proportion of missing locations (empty holes), \(missing = empty\_holes/\left (empty\_holes + matched\_holes\right )\).
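The marbles-and-holes step can be sketched with a greedy nearest-neighbor assignment. Note the simplification: [19] computes an optimal one-to-one bottleneck matching, which a single greedy pass only approximates, and the function name is ours:

```python
import math

def match_marbles_to_holes(marbles, holes, max_dist):
    """Greedy one-to-one matching of sample points: each marble falls
    into the nearest still-unoccupied hole within max_dist, or stays
    unmatched. Returns (matched, spurious_marbles, empty_holes)."""
    free = list(holes)
    matched = 0
    for m in marbles:
        best = min(free, key=lambda h: math.dist(m, h), default=None)
        if best is not None and math.dist(m, best) <= max_dist:
            free.remove(best)  # each hole accepts at most one marble
            matched += 1
    spurious_marbles = len(marbles) - matched
    empty_holes = len(free)
    return matched, spurious_marbles, empty_holes
```

The three returned counts feed directly into the spurious and missing proportions defined above.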

To produce a combined performance measure from these two values, the well-known F-score is used, which is computed as follows:

$$ F\text{-}\textit{score} = 2 \cdot \frac{\textit{precision} \cdot \textit{recall}}{\textit{precision}+\textit{recall}} $$
(3)

where precision = 1 − spurious and recall = 1 − missing.
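Putting the counts together, the scoring step amounts to a few lines (variable names ours):

```python
def f_score(spurious_marbles, matched_marbles, empty_holes, matched_holes):
    """Combine the two error rates into the F-score of Eq. (3)."""
    spurious = spurious_marbles / (spurious_marbles + matched_marbles)
    missing = empty_holes / (empty_holes + matched_holes)
    precision, recall = 1 - spurious, 1 - missing
    return 2 * precision * recall / (precision + recall)

# e.g. 80 of 100 marbles matched, 60 of 100 holes filled:
# precision = 0.8, recall = 0.6, F-score ≈ 0.686
```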

The higher the F-score, the closer the match. Sampling the graphs locally is an important aspect of this approach as it provides the ability to capture the connectivity of the graphs at a very detailed level, allowing the topological similarity to be measured. Repeated local sampling at randomly chosen locations yields an accurate view of local geometry and topology throughout the graph.

A modified version of the measure is used in [8], where the method ignores parts of the road network for which no correspondence could be found between generated and ground-truth networks. For our experiments we used this modified version.

3.3 Comparison of distance measures

All the distance measures described in Section 3.2 capture different properties of graphs. Depending on the desired type of similarity, different distance measures should be employed. For example, if one is interested in ensuring similar shortest paths in the two graphs, requiring that independent queries produce similar routes, then the shortest path based measure [28, 31] would be the perfect choice. If, however, one wants to know the spatial displacement between the two graphs without considering any kind of topology or path similarity, then the directed Hausdorff distance [6] would be the distance measure to choose.

On the other hand, the two distance measures described in [8] and [3] maximize the use of topology in comparing graphs. Using the concept of local signature described in [3], one can visualize the exact differences between graphs using either of these two measures. Figure 9 shows an example where the graph sampling based distance [8] fails to identify local differences (the dotted graph has a broken connection in the gray square region). As it samples small sub-graphs starting from a root location, it cannot capture this kind of broken connection when another connecting detour between the two parts is available within that small sub-graph. The path-based distance [3], in contrast, exploits every adjacency transition around a vertex and hence verifies all connectivity.

Fig. 9
figure 9

Graph G (dotted edges) overlayed on H (gray). G and H differ in the shaded square region. The distance measure in [8] fails to capture the broken connection in G, as a detour is always available to reach every edge and sample it

Among these four measures, only the graph sampling based distance [8] ensures a one-to-one correspondence. So, if one of the graphs has missing streets or extra edges, this is reflected in the overall score as well as in the local signatures.

4 Datasets

A basic means for assessing map construction algorithms is the underlying dataset comprising vehicle trajectories and ground-truth map datasets. The datasets are in a projected coordinate system (UTM, GGRS87). All the visualizations of the datasets are also available on the mapconstruction.org Web site. The statistics of the datasets are provided in Table 2.

Table 2 Statistics for datasets used

Our experiments use several tracking datasets from different cities (Fig. 10). While other publicly available GPS-based vehicle tracking datasets exist, e.g., GeoLife [46] and OpenStreetMap GPX track data [35], the selected range covers the various types of existing datasets, produced by different types of vehicles, at varying sampling rates, and representing different network sizes. In the experiments, it becomes obvious that map construction algorithms produce less accurate street maps for lower sampling rates and larger networks.

Fig. 10
figure 10

Tracking data

The Athens large dataset consists of 511 trajectories with a total length of 6,781 km (average: 13.27 km, standard deviation: 10.79 km) obtained from school buses covering an area of 12 km × 14 km; the tracks range from 32 to 80 position samples, with a sampling rate of 20 s to 30 s (average: 30.14 s, standard deviation: 24.77 s) and an average speed of 20.16 km/h. The Athens small dataset consists of 129 tracks with a total length of 443 km (average: 3.82 km, standard deviation: 1.45 km) obtained from school buses covering an area of 2.6 km × 6 km; the tracks range from 13 to 47 position samples, with a sampling rate of 20 s to 30 s (average: 34.07 s, standard deviation: 31.92 s) and an average speed of 19.55 km/h. The Berlin dataset consists of 26,831 tracks with a total length of 41,116 km (average: 1.53 km, standard deviation: 634.51 m) obtained from a taxi fleet covering an area of 6 km × 6 km; the tracks comprise from 22 up to 58 position samples, with a sampling rate of 15 s to 127 s (average: 41.98 s, standard deviation: 38.70 s) and an average speed of 35.23 km/h. The Chicago dataset [7, 8] consists of 889 tracks with a total length of 2,869 km (average: 3.22 km, standard deviation: 894.28 m) obtained from university shuttle buses covering an area of 7 km × 4.5 km; the tracks range from 100 to 363 position samples, with a sampling rate of 1 s to 29 s (average: 3.61 s, standard deviation: 3.67 s) and an average speed of 33.14 km/h.

For all cases, we consider as ground-truth map data the corresponding OpenStreetMap excerpt.

5 Experiments

What follows is a description of the map construction experiments that were conducted for the range of algorithms, datasets and evaluation measures, with the aim of assessing the quality of the constructed maps. In the experiments, we used C, Java, Python, and Matlab implementations of seven map construction algorithms. For evaluation purposes, we used only the underlying undirected graph structures computed by the algorithms, and we dropped directions and any other additional annotations. The experiments for six algorithms have been performed by the authors and the implementations have been made available on the mapconstruction.org Web site. The authors of [23] performed the experiments themselves, since we did not have access to their implementation. Given the implementations, (i) their difference in code base, (ii) their scope, i.e., to construct small-scale maps from GPS trajectories, and (iii) their quality, i.e., all are academic prototypes, we did not assess the characteristics of the algorithms themselves by means of a performance study or theoretical analysis. However, to at least give an impression of their running times: for the Chicago dataset the running times of the algorithms range from 10 min to 20 h, and for the larger Berlin dataset from 2 h to 4 days. Given the quality of the implementations, another problem we encountered was that some algorithms could not cope with the size of the input dataset (trajectories), resulting in runtime crashes attributed to memory consumption and leaks. Hence, not all algorithms could be tested on the large datasets, and results for all algorithms are only available for the smaller datasets, i.e., Athens small and Chicago.

5.1 Constructed Maps

What follows is an initial overview of the experiments in terms of constructed maps and the respective result quality. Figure 11 illustrates the ground-truth map (light gray) and the generated maps (black) for the small Chicago dataset. On larger datasets, i.e., Athens large and Berlin, we ran the algorithms described in Sections 2.2.1, 2.2.6 and 2.2.7. Figure 12 illustrates the ground-truth map (light gray) and the generated maps (black) for the case of the larger Berlin dataset.

Fig. 11
figure 11

Constructed maps (in black) overlayed on ground-truth map (in gray) (small dataset)

Fig. 12
figure 12

Constructed maps (in black) overlayed on ground-truth map (in gray) (large dataset)

Each of the algorithms uses different parameter settings. The values of the clustering parameter for Ahmed and Wenk [4] are 180, 90, 170 and 80 meters for Athens large, Athens small, Berlin and Chicago, respectively; for Cao the value is 20 m [11]. The minimum bearing difference between two streets at any intersection is 45° [11]. The respective parameters of intra-cluster distance threshold (minimum separation between streets) and intra-cluster bearing threshold (minimum angular difference between two streets at an intersection) for the kmeans algorithm of Edelkamp are 50 m and 45° [17]. The respective parameters of minimum density threshold for the KDE-based algorithms are Biagioni 50 m [8] and Davies 16 m [16]. For Karagiorgou and Pfoser [28], the values of direction, speed and proximity used to extract intersection nodes and to merge trajectories into links are 15°, 40 km/h and 25 m, respectively. We evaluated all constructed maps using the distance measures described in Section 3.2.

A summary of the complexities of the constructed maps is shown in Table 3. Here, the number of vertices includes vertices of degree two (which may lie on a polygonal curve describing a single edge), the number of edges refers to the number of undirected line segments between these vertices, and the total length refers to the total length of all undirected line segments. It appears that the point clustering algorithms based on kernel density estimation, such as Biagioni et al. [7, 8] and Davies et al. [16], produce maps with lower complexity (fewer vertices and edges) but often fail to reconstruct streets that are not traversed frequently enough by the input tracks. In particular, the maps reconstructed by Davies et al.’s algorithm are very small. On the other hand, the algorithm by Ge et al. [23] subsamples all tracks to create a much denser output set; hence the complexity of their constructed maps is always higher.

Table 3 Complexities of the generated maps

Map construction algorithms based on incremental track insertion, such as Ahmed et al. [4] and Cao et al. [11], fail to cluster tracks together when the variability and error associated with the input tracks are large. As a result, the constructed street maps contain multiple edges for a single street, which implies larger values in the total edge length column of Table 3.

Several examples of generated maps are shown in Figs. 11 and 12. Since not all algorithms produced results for all maps, we showcase examples of the smaller Chicago map in Fig. 11. It can be clearly seen that the coverage and quality of the constructed map varies considerably. Three examples for the Berlin map are also given in Fig. 12. More examples can be found on the mapconstruction.org Web site.

5.2 Path-based and Hausdorff distance

For the path-based distance measure we generated all paths of link-length 3 for each generated map. For each path, we computed the Fréchet distance between the path and the ground-truth map. We then computed the minimum, maximum, median, and average of all the obtained distances. We also computed the d %-distance, defined as the maximum of the distances after removing the d % largest distances (“outliers”). For the directed Hausdorff distance, we computed all link-length 1 paths and computed the directed Hausdorff distance of the union of all edges to the ground-truth map. Our results are summarized in Table 4. In the case of Athens small, the Cao algorithm produced a very small map, and thus it was not possible to perform a quantitative evaluation.
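The d %-distance can be computed directly from the list of per-path distances. This is a sketch; the exact rounding and tie-breaking conventions used in the experiments may differ:

```python
import math

def d_percent_distance(distances, d=5.0):
    """Maximum distance after discarding the d% largest values
    ("outliers"), as used for the d%-distance column in Table 4."""
    keep = sorted(distances)
    # Drop ceil(d% of n) values from the top, then take the new maximum.
    cutoff = len(keep) - math.ceil(len(keep) * d / 100.0)
    return keep[cutoff - 1]
```

With d = 0 this reduces to the plain maximum, so the max column in Table 4 is the 0 %-distance.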

Table 4 Path-Based and Directed Hausdorff distance measure evaluation

The maps reconstructed using the algorithms by Karagiorgou et al. [28] and by Biagioni et al. [7, 8] generally have a better path-based distance than the others. Note that Davies et al.’s [16] map is unusually small for the Athens small dataset. Their idea of averaging trajectories, or computing skeletons, however, seems to help to improve the quality of the edges of the produced map.

For further analysis of the results, we selected the Chicago dataset, as all map construction algorithms produced results for it. From Table 4 one can see that the path-based distance and the directed Hausdorff distance are smaller for the maps generated by Biagioni, Davies and Karagiorgou compared to the maps generated by the other algorithms. A visual inspection of the maps in Fig. 11 justifies this result. Note that Davies et al.’s [16] map is comparatively smaller than the others (cf. Table 3). Although the algorithms by Ahmed et al. and by Ge et al. produce maps with good coverage, their path-based distances are larger, since they employ less aggressive averaging techniques that would help cope with noise in the input tracks.

To illustrate the appropriateness of the path-based distance, consider the path in Fig. 13 from the map generated by Biagioni et al. This is an example where the Fréchet-based distance measure is more effective than any point-based measure. As the Fréchet distance ensures a continuous mapping, the whole path needs to be matched with the bottom horizontal edge of the ground-truth map. The Fréchet distance for this path is 71 m. For the same path, the Hausdorff distance is 53 m, as it only requires each point on the path to have a point on the graph close by. Thus, to evaluate the connectivity of a map, the Fréchet distance is a more suitable distance measure than any point-based measure.

Fig. 13
figure 13

A path with Fréchet distance greater than Hausdorff distance

In addition, if desired, one can discard outliers by computing the d %-distance. Figure 14 shows the distribution of both the path-based measure and the directed Hausdorff distance for Biagioni et al. In both cases, a very small number of paths attain the maximum distance, and the distances for most of the paths are distributed within a small range. Removing only the 5 % largest distances (“outliers”) brings the path-based distance from 71 m (max) down to 38 m and the directed Hausdorff distance from 53 m (max) down to 25 m. Figure 15 shows edges of maps with smaller distances in lighter shades and larger distances in darker shades. Such a visual representation helps to identify areas of the map that have a higher distance to the ground-truth map.

Fig. 14
figure 14

Distributions of individual path distances (Biagioni alg. - Chicago)

Fig. 15
figure 15

Reconstructed graph overlayed on ground-truth map (light gray). Based on link-length 3 paths, edges in lighter shades have smaller distances and edges in darker shades have larger distances

5.3 Shortest path based measure

Another means to compare the constructed maps is the shortest path based distance. For each city, we computed a set of 500 random shortest paths, with origin and destination nodes uniformly distributed over the maps, and compared the paths using the Discrete Fréchet and the Average Vertical distance measures.

A first impression of how different constructed maps affect such paths is given in Fig. 16. Given a specific origin and destination for the Chicago map, the shortest path has length 3.66 km in the ground-truth map (black dotted line). The computed shortest path for the map generated by each algorithm is shown in red. In the map generated by Ahmed et al.’s algorithm the shortest path has length 4.67 km (a Discrete Fréchet distance with respect to the ground-truth map of 65 m and an Average Vertical distance of 21 m). The respective results for the other algorithms are Biagioni 3.71 km (36 m, 5 m), Cao 3.76 km (24 m, 6 m), Davies 3.39 km (35 m, 4 m), Edelkamp 3.64 km (26 m, 8 m), Ge 7.33 km (174 m, 98 m), and Karagiorgou 3.73 km (21 m, 5 m). For most algorithms the resulting paths have a small distance to the shortest path in the ground-truth map. However, in the case of Ahmed (Fig. 16a) and Ge (Fig. 16f), due to significant differences in the generated maps, different shortest paths have been computed that have a larger distance to the shortest path in the ground-truth map. This result is in line with the path-based measure of Section 5.2, where Biagioni, Davies and Karagiorgou also produced the best constructed maps.

Fig. 16
figure 16

Examples of shortest paths for the Chicago dataset

Figure 17a and b show the Discrete Fréchet distance and the Average Vertical distance measures for each of the 500 paths per algorithm for the Athens large map. The paths are ordered by increasing distance of the shortest path length with respect to the ground-truth map. Some paths could not be computed for some maps due to connectivity problems (missing links). Some other paths experience greater distance measures due to spatial accuracy problems. The graph shows that some algorithms produce maps which resemble the actual map more closely, as assessed by this shortest path sampling approach.

Fig. 17
figure 17

Map comparison

Finally, the shortest path based evaluation is summarized in Table 5. The first column shows the percentage (%) of shortest paths that could be computed in each case; an algorithm might construct an accurate but small map, for which only few paths can be computed. The second and third columns show the two different distance measures used to compare the resulting paths. The fourth column gives some statistics with respect to the computed shortest paths. Considering the example of Berlin and the Ahmed algorithm in Table 5, this algorithm produces a map that in turn generates paths with a min, max, and avg. Discrete Fréchet distance of 21 m, 469 m, and 192 m, respectively. An aspect not captured by these distances is missing paths due to limited map coverage. Consider the case of Davies for Chicago and Cao for Athens small. In both cases, the distance measures suggest good map quality. However, in both cases the constructed map has a small coverage, as only 92.6 % and 7.0 % of the 500 total paths could be computed. In this evaluation, Karagiorgou produces maps that have both good coverage and high path similarity (cf. the dark-shaded entry for Berlin: good coverage and a small distance measure, indicating similar paths between constructed and ground-truth map).

Table 5 Shortest path measure evaluation summary

Overall, shortest path sampling provides an effective means for assessing the quality of constructed maps as it not only considers similarity, but also the coverage of the map.

5.4 Graph-Sampling Based Distance

For this measure we use the source code obtained from the authors of [7]. We modified the code to use Euclidean distance, as our data uses projected coordinate systems. The method that computes this measure has four parameters: (i) sampling density: how densely the map should be sampled (marbles for the generated map and holes for the ground-truth map); we use 5 m; (ii) matched distance: the maximum distance between a matched marble-hole pair; we vary this distance from 10 to 120 m; (iii) maximum distance from root: the maximum distance from a randomly selected start location that is explored; we use 300 m; and (iv) number of runs: the number of start locations to consider; we use 1,000. According to the implementation, a start location is selected on the 2D plane, then a corresponding set of start locations on each graph is selected based on the matched distance, and finally the union of all edges traversed from all corresponding start locations is used to produce the set of sample points. A larger matched distance might yield a larger number of sample points. To make our comparison of all generated maps consistent, we generated a sequence of random locations for each dataset and used the first 1,000 locations from the same sequence for each algorithm for which both maps (ground-truth and generated) had correspondences within the matched distance. When two maps are very similar, they should have very few unmatched marbles and holes, which implies that the precision, recall and F-score values should be very close to 1. In our case, as we used a superset of the ground-truth map, there will be a large number of unmatched holes, which implies lower recall and F-score values than in [7]; still, the relative comparison of F-score values should provide an idea of whether one algorithm performs better than another.
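The local sampling step can be sketched as follows, assuming a networkx graph with vertex coordinates in a `pos` dictionary. The parameter names, the restriction to edges whose endpoints both lie within the radius, and the even spacing along straight edges are our simplifications of the implementation from [7]:

```python
import math
import networkx as nx

def sample_neighborhood(G, pos, root, max_radius=300.0, density=5.0):
    """Place point samples every `density` meters along all edges
    reachable from `root` within `max_radius` (network distance),
    mimicking the marble/hole sampling of the graph-sampling measure."""
    # Network distances from the root, with edge weights taken from geometry.
    lengths = nx.single_source_dijkstra_path_length(
        G, root, cutoff=max_radius,
        weight=lambda u, v, d: math.dist(pos[u], pos[v]))
    samples = []
    for u, v in G.edges:
        if u in lengths and v in lengths:  # edge lies in the neighborhood
            (x1, y1), (x2, y2) = pos[u], pos[v]
            n = max(1, int(math.dist(pos[u], pos[v]) // density))
            for k in range(n + 1):  # evenly spaced points along the edge
                t = k / n
                samples.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return samples
```

Running this on the constructed map yields the marbles and on the ground-truth map the holes for one seed location; repeating it over many seeds and matching the two point sets gives the precision and recall values discussed below.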

We chose matching distance thresholds up to 120 m to be consistent with the error associated with the input data. As mentioned in [7], in some areas of the Chicago dataset the traces show consistent errors well over 100 m.

Figure 18 shows F-score values for the Chicago dataset for the different generated maps. As our ground-truth is essentially a superset of the actual ground-truth represented by the tracking dataset, a larger matching distance creates unexpected results for algorithms that generate extra edges and vertices. For example, in the case of Cao and Edelkamp for Chicago, the precision is low as there are many unmatched marbles (cf. the entries for Cao and Edelkamp for Chicago in Table 6). However, a larger matching distance decreases the number of unmatched marbles by matching them with available holes that are probably not part of the actual ground-truth. A higher recall value yields a higher F-score, which does not necessarily reflect better-quality maps (cf. Figs. 11 and 12).

Fig. 18
figure 18

Comparison of F-scores - Chicago

Table 6 Precisions for varying matched distance

In Fig. 18 we also see that the performance based on the F-score declines for Biagioni, Davies and Karagiorgou as the matching distance threshold increases. Investigating the reason for this unexpected behavior, we found that although precision increases with the matching distance, recall declines for these three algorithms; a smaller recall indicates a larger number of unmatched sample points on the ground-truth (empty holes). Figure 11 and Table 3 show that these three algorithms reconstruct fewer streets than the others, which means they produce a smaller number of marbles to match with a larger number of holes. We explained earlier in this subsection how the total number of holes might increase with the choice of a larger matched distance.

Hence, in Table 6 we ignore the F-score and recall values and showcase only precision values. According to the precision values, the algorithms by Biagioni, Davies and Karagiorgou perform best for the Chicago dataset, which is consistent with our findings using the other three distance measures.

5.5 Summary

The best way to characterize the constructed maps is in terms of coverage and accuracy. Here, it appears that KDE-based point clustering algorithms such as Biagioni and Davies produce maps with lower complexity (fewer vertices and edges) but often fail to reconstruct streets that are not traversed frequently enough by the input tracks. On the other hand, the algorithm by Ge subsamples all tracks to create a much denser output set; hence the complexity of its constructed maps is always higher. A similar observation can be made for algorithms based on incremental track insertion, such as the Ahmed and Cao algorithms. They fail to cluster tracks together when the variability and error associated with the input tracks are large. As a result, the constructed street maps contain multiple edges for a single street, which implies a larger constructed, but not necessarily more accurate, road network.

In terms of map quality and accuracy, the maps reconstructed using the algorithms by Karagiorgou, Davies, and Biagioni generally have the smallest path-based and directed Hausdorff distances, and their constructed maps can be considered more accurate. Although the algorithms by Ahmed and Ge produce maps with good coverage and provide quality guarantees, their path-based distances are larger, since they employ less aggressive averaging techniques that would help cope with noise in the input tracks. In an effort to assess both accuracy and coverage, the shortest path based measure shows good map quality but only limited coverage for Davies on Chicago and for Cao on Athens small. In this evaluation, Karagiorgou produces maps that have both good coverage and high path similarity.

An overall observation to be made based on our experiments is that map construction algorithms tend to produce either accurate maps or maps with good coverage, but not both. The algorithm of Karagiorgou, however, seems to be a good compromise, in that it produces maps of good coverage and accuracy at the same time.

6 Conclusions

This survey has considered the active field of road network construction and a variety of such construction algorithms. In the past, the lack of benchmark data and quantitative evaluation methods has hindered cross-comparison between algorithms. In this paper, the contribution of benchmark datasets, code for road network construction algorithms, and evaluation measures for the first time enables a standardized assessment and comparison of road network construction algorithms. All data, road network construction algorithms, and evaluation algorithms are available with detailed execution instructions on the mapconstruction.org Web site. Directions for future work include expanding the Web site to include more algorithms and source code. The final goal is to provide an easy-to-use benchmark suite and automated quality measurements for generated maps.