1 Introduction

Transport planners usually model roads as one single edge between two nodes (e.g. intersections) in transportation networks, irrespective of the number of lanes. Therefore, single lines as part of an entire road graph represent the road sections. Often lane-specific information like the number of lanes is included in additional attributes of the graph. This generalization of road geometries reduces the resolution of the data as well as costs in the development and maintenance of a transportation network [1].

However, existing and emerging ITS services might require digital road network graphs with a higher level of detail and accuracy regarding the representation of lane center lines. Cooperative services, for instance, often either need the lane specific localization of messages or provide information for specific lanes [2]. Examples would be lane departure warnings, local hazard messages (e.g. road bumps, accidents, congestion) or lane specific route information (speed limit, turn relations, curvature).

In the context of (highly) automated driving, transportation networks act as a priori basic information, so that a vehicle can localize itself on the road using its own position relative to the road geometry. For this purpose, highly detailed maps are required, which include, among others, the lane center positions, the exact lane widths, associations between neighboring lanes and the road hierarchy of single lanes [3].

The development of such detailed maps, which contain a lane-specific transportation network, requires extensive measurement campaigns using highly accurate localization equipment or technologies. This is a costly and time-consuming process, especially for wide areas or extensive transportation networks. On the other hand, positioning data from moving observers (vehicles), so-called floating-car data (FCD), is a GNSS-based data source that is often available for wide areas.

Methodologies to derive geometries and topologies for digital street maps using GNSS-based FCD have been the focus of several research projects and studies in recent years. Davies et al. [4] focused on determining road center lines by assigning GNSS positions to raster cells and creating histograms. Cells with high sums of allocated GNSS points were assumed to represent the road center line. Sato et al. [5] also observed the frequency distribution of GNSS points in raster cells, but focused on identifying the correct number of lanes. While they could reliably identify the correct number of lanes, they did not evaluate the exact center line positions.

This was the aim of a study by Knoop et al. [6], who introduced the Precise Point Positioning (PPP) technique in order to determine the lane a vehicle is travelling on and to create a self-learning street map in real-time. Uduwaragoda et al. [7] also focused on identifying the number of lanes and their center lines using GNSS data. They analyzed the probability density distribution of vehicle trajectories at road cross sections using a non-parametric Kernel Density Estimation. Results showed that lane center lines can be computed accurately enough if a minimum of 150 trajectories are available, independently of road type and characteristic.

Traffic management operators often use FCD for applications such as traffic monitoring and forecasting [8]. Generally, particular vehicle fleets (e.g. city taxi fleets) are equipped with GNSS positioning systems (e.g. GPS receivers) and provide FCD to traffic management centers in different forms, either as raw positioning data (vehicle trajectories) or as processed and map-matched data (e.g. link-related travel times). The quality and accuracy of raw FCD in terms of positioning depend strongly on the measurement equipment. In general, low-cost GPS receivers are used, which are either installed permanently in the vehicle itself or contained in other devices inside the vehicle (smartphone, route guidance system, GPS data logger).

Herrera et al. [9] analyzed traffic data obtained via GPS-enabled phones for traffic management applications and found that FCD is suitable for average speed estimation on roads if 2–3 % of all vehicles are equipped with GPS-enabled phones. Zheng et al. [10] evaluated the accuracy of GPS-based taxi trajectory records in Guangzhou, China. They identified different types of erroneous data using four filter criteria. Most outliers were detected by the low-accuracy signal criterion. They conclude that only 65 % of records seem valid, so GPS often fails to provide correct coordinates.

The development of a lane-specific transportation network based on vehicle trajectories from FCD is the key objective of the research project “LaneS”, funded by the Austrian Federal Ministry for Transport, Innovation and Technology. The idea is to estimate the center line of each lane based on a wide set of lane-specific trajectories obtained from measurements with low-cost GPS devices. The quality of the measured vehicle trajectories is evaluated in advance by comparing them with trajectories from highly accurate positioning measurements.

2 Methodology

2.1 General Approach

The general approach of this study within the research project “LaneS” is summarized in Fig. 1. The basis is a broad data collection of vehicle trajectories (VT) from test runs with different GNSS-based positioning technologies on various road sections. For positioning, highly accurate differential GPS (D-GPS) measurement equipment is used as well as common GPS devices like smartphones and data loggers.

Fig. 1 General methodology to generate a lane-specific transportation network

The quality of the VT was evaluated afterwards in a roadway based (lateral deviations of several VT) and a trip based (longitudinal and lateral deviations of single VT) distance analysis. For this purpose, the VT were compared with the highly accurate D-GPS measurements to identify outliers and erroneous trajectories.

The generation of a lane-specific graph was realized with kernel density estimation (KDE), a non-parametric method for estimating a probability density function. First, perpendicular lines (PL) were created every 5 m on an input graph (e.g. from OpenStreetMap) of the considered road section. Then the VT were intersected with all PL to obtain intersection points. Applying KDE, the positions of the lane center lines (maxima of the probability density function) were estimated for each PL. Finally, connecting the center points per lane over all PL yields a lane-specific transportation network.

2.2 Study Area and Measurement Systems

The measurements of the floating-car data (FCD) took place at three different measurement sites (section A, B, C) near Graz (Austria) to cover various road categories. Section A is an urban 3-lane section on the freeway A2 near the city of Graz with a length of 14 km (8.7 mi). Section B is a 2-lane section on the urban arterial road Triesterstrasse in the city of Graz. A characteristic of urban sections is that shadowing effects caused by buildings may occur when measuring the vehicle position with a GPS receiver. Section C is a rural 2-lane section on the freeway S35 north of Graz with a length of 12 km (7.5 mi). When choosing these sections, we took care to avoid tunnels and bridges, because these sites can disturb the sensitive GPS receivers.

In total, 369 test runs over more than 4000 km (2500 mi) were conducted on the three measurement sections. For some of these trips, one vehicle was equipped with a differential GPS measurement system (D-GPS), which consists of an inertial measurement unit (IMU) combined with a GPS receiver. The correction data of a reference station are received with a GSM antenna. The achieved positioning accuracy is about 0.02 m at a recording rate of 100 Hz. During the measurements, we installed several low-cost GPS receivers in the vehicles. For this purpose, several Qstarz GPS data loggers with an update rate of 1 or 5 Hz and some smartphones with different GPS logging applications were used. Four apps for Android and one for iPhone were tested, all of them recording with an update rate of 1 Hz.

At all three measurement sites, each lane was surveyed separately at several constant vehicle speeds. First, test runs without lane changes were conducted, in which the vehicle moved as close as possible to the center of the lane. This is necessary especially for the generation of the reference trajectory based on D-GPS. After that, we also performed measurements with lane changes, because FCD commonly available in practice, as used for generating map data, will contain irregularly distributed lane changes rather than data from only one defined lane.

2.3 Distance Analysis of GNSS Based Vehicle Trajectories

The position accuracy of the measured GNSS based vehicle trajectories (VT) was evaluated with two different approaches of distance analysis. For this purpose, we chose only test runs without lane changes. In the roadway based distance analysis, only lateral deviations from a reference graph were analyzed for similar test runs (trips) to obtain results for spatial positioning errors. To this end, VTs of the same lane, direction and GPS device were considered separately. Afterwards, the results were compared between different lanes and GPS devices. Additionally, in the trip based distance analysis, lateral and longitudinal deviations of VTs from the same trip (test run) but from different devices were analyzed to also obtain results for time-based positioning errors.

Roadway Based Distance Analysis. First, a reference graph was generated which models the center line of each lane in the study area. This was realized with the open-source statistics software R. The reference graph is the result of smoothing several VTs from the highly accurate D-GPS measurements per lane, using spline curve estimation in R. Then we calculated the Euclidean distances of each GNSS based VT, i.e. the shortest distances from each trajectory point perpendicular to the reference graph. All distances of similar test runs (same lane, direction and device) were merged. To evaluate the quality of lateral positioning of the VTs, we established two graphical analyses: a boxplot showing the distribution of distances and a barplot in which all distances are grouped into classes of positioning accuracy. To compare different measurement devices and road characteristics, average distance and deviation measures for each measurement device were calculated over all lanes and both directions per measurement section (section A, B and C).
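The distance computation described above can be illustrated with a minimal Python sketch. It is not the R implementation used in the study: the reference graph is assumed to be a simple polyline in planar metre coordinates, and the function names and sample coordinates are hypothetical.

```python
import math
import statistics

def point_segment_distance(p, a, b):
    """Shortest Euclidean distance from point p to segment a-b (planar metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distances_to_reference(trajectory, reference):
    """Distance of every trajectory point to the nearest part of the reference polyline."""
    return [
        min(point_segment_distance(p, a, b) for a, b in zip(reference, reference[1:]))
        for p in trajectory
    ]

# Hypothetical data: a straight reference lane and three GPS fixes.
reference = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0)]
trajectory = [(10.0, 1.5), (40.0, -2.0), (90.0, 0.5)]
dists = distances_to_reference(trajectory, reference)
median_dist = statistics.median(dists)
```

Merging the resulting lists per lane, direction and device would give the samples behind the boxplots and accuracy classes.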

Trip Based Distance Analysis. This type of analysis focuses on the total two-dimensional error of the position fixes contained in typical VTs. Thus, the complete horizontal position error is determined for each instant in time for which the respective test receiver provides a valid position fix. In order to quantify the error of all fixes accumulated during the test runs, the “true” trajectory, which the vehicle was actually driving, has to be known with high precision. This “true” trajectory of the vehicle was determined on the basis of the D-GPS measurement equipment. Due to the combination of dual-frequency GNSS and an inertial navigation unit, the accuracy of these “true” reference trajectories (RT) is in the range of a few centimeters for all RT position fixes. This quality makes the RTs well suited to determine the position errors in all valid fixes of the VTs, which are expected to be in the range of a few meters. While the roadway based distance analysis cannot distinguish between position errors of the test receiver and the deviation of the vehicle from the exact center line due to the driver, the trip based analysis captures the horizontal position error with high precision. For the error determination, the location of the GNSS antenna of the test receiver inside the vehicle has to be known accurately with respect to the reference point of the high-performance equipment. In the current test setup, the respective lever arms were determined prior to the test runs. These body offsets between test receiver and reference equipment inside the test vehicle are taken into account, and the RTs are transformed to the exact location of the VTs before the residuals are computed. The position error along the driving trajectory does not harm the process of center line determination; only the deviation perpendicular to the driving direction contributes errors to the algorithm of this study. Thus, a distinction between longitudinal and lateral position error was made. Determining both parts of the position error requires knowledge of the exact driving direction, which is also provided with the RT coordinates. For the goal of this study, the lateral part of the determined position error is relevant, and it was analyzed whether this part of the VTs is accurate enough to support the developed approach.
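The split into longitudinal and lateral error can be sketched as a projection of the horizontal residual onto the driving direction and its normal. The following Python example is illustrative only: it assumes planar (east, north) coordinates in metres, a heading given in degrees clockwise from north, and a lever-arm correction that has already been applied. The function name and sample values are hypothetical.

```python
import math

def split_position_error(test_fix, ref_fix, heading_deg):
    """Split the horizontal error (test - reference) into along-track and cross-track parts.

    Coordinates are (east, north) in metres; heading_deg is the driving
    direction in degrees clockwise from north (an assumed convention).
    """
    de = test_fix[0] - ref_fix[0]
    dn = test_fix[1] - ref_fix[1]
    h = math.radians(heading_deg)
    along = (math.sin(h), math.cos(h))    # unit vector in driving direction
    right = (math.cos(h), -math.sin(h))   # unit vector to the right of it
    longitudinal = de * along[0] + dn * along[1]
    lateral = de * right[0] + dn * right[1]
    total = math.hypot(de, dn)
    return longitudinal, lateral, total

# Vehicle driving north; the test receiver reports a fix 3 m east and 4 m north
# of the reference position at the same instant.
lon_err, lat_err, tot_err = split_position_error((103.0, 204.0), (100.0, 200.0), 0.0)
```

In this example the total error is 5 m, of which 4 m is longitudinal and 3 m lateral; only the lateral part would affect the estimated lane center line.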

2.4 Lane-Specific Transportation Network Based on Kernel Density Estimation

The central assumption for estimating lane center lines from a set of standard GNSS vehicle trajectories (VT) is that the probability to determine a vehicle’s position on a lane is highest along its central axis. Thus, the density of a population of vehicle trajectories is highest in the center of a lane and lowest around the edges. It follows that the density maxima of vehicle positions at a road cross section should correspond to the positions of lane center lines of a road. Moreover, the number of estimated density maxima indicates the number of lanes on a road.

For the computation of density distributions of GNSS based VTs, a Kernel Density Estimation (KDE) is applied. It is a non-parametric method for estimating a probability density function, which centers a smooth kernel function at each data point and sums them to estimate densities. Deng and Wickham [11] define it as follows in Eq. (1),

$$ \hat{f}_{kde}(x) = \frac{1}{n}\sum_{i = 1}^{n} K\left( \frac{x - x_{i}}{h} \right) $$
(1)

where n is the number of data points x_i, K is the kernel function and h is the bandwidth. In this work, a Gaussian kernel function is applied. In order to find an appropriate bandwidth as smoothing factor, a data-driven “solve-the-equation” plug-in approach developed by Sheather et al. [12] is applied. To further deal with distinct data outliers, confidence intervals of 5 % from the median lateral vehicle position are introduced. Trajectories outside these confidence intervals are not considered in the computation of the KDE.
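As a rough illustration of Eq. (1) with a Gaussian kernel, the following Python sketch evaluates the density on a grid and picks the grid points that exceed both neighbours. Several things here are assumptions rather than the study's implementation: the bandwidth is fixed by hand instead of being chosen by the plug-in selector, the outlier filter is omitted, the sample offsets are synthetic, and the common 1/h normalisation is included (it rescales the density but does not move the maxima).

```python
import math

def gaussian_kde(samples, h):
    """Return a density function built from Gaussian kernels of bandwidth h."""
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)
    return density

def local_maxima(density, lo, hi, step=0.05):
    """Grid positions where the density is higher than at both neighbours."""
    count = int(round((hi - lo) / step)) + 1
    xs = [lo + i * step for i in range(count)]
    ys = [density(x) for x in xs]
    return [xs[i] for i in range(1, count - 1) if ys[i - 1] < ys[i] > ys[i + 1]]

# Synthetic lateral offsets (metres) scattered around two lane centres
# at 0 m and 3.5 m; the bandwidth of 0.5 m is an arbitrary choice.
offsets = [-0.6, -0.3, 0.0, 0.0, 0.3, 0.6, 2.9, 3.2, 3.5, 3.5, 3.8, 4.1]
density = gaussian_kde(offsets, h=0.5)
peaks = local_maxima(density, lo=-3.0, hi=7.0)
```

With these inputs the two detected peaks lie close to the two simulated lane centres, and their count corresponds to the number of lanes.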

Systematically erroneous GNSS trajectories within the underlying input data can lead to wrong maxima estimations, i.e. maxima that do not represent an actually existing lane. Thus, a geographic distance matrix is calculated which contains the pairwise distances between all elements of the found maxima set. If there are n maxima in the maxima set, the distance matrix is a symmetric n × n two-dimensional array with n(n − 1)/2 distinct off-diagonal elements. The probability of the distance relations within the maxima set is evaluated, so that potentially implausible lane center lines can be detected and omitted. In this way, the potential effects of accumulated erroneous GNSS trajectories and over-smoothed bandwidths are minimized.
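The text does not specify how the probability of the distance relations is evaluated, so the following Python sketch only illustrates the distance matrix itself and substitutes a simple, hypothetical plausibility rule: maxima are treated as lateral positions on one cross section, and a maximum is dropped if it lies closer than an assumed minimum lane spacing to a stronger one. The threshold and sample values are illustrative assumptions.

```python
from itertools import combinations

def distance_matrix(maxima):
    """Symmetric n x n matrix of pairwise distances between maxima positions (metres)."""
    return [[abs(a - b) for b in maxima] for a in maxima]

def drop_implausible(maxima, densities, min_spacing):
    """Keep the strongest maxima; drop any that lies closer than min_spacing to a kept one.

    This rule is a stand-in for the plausibility check, not the study's method.
    """
    order = sorted(range(len(maxima)), key=lambda i: densities[i], reverse=True)
    kept = []
    for i in order:
        if all(abs(maxima[i] - maxima[j]) >= min_spacing for j in kept):
            kept.append(i)
    return sorted(maxima[i] for i in kept)

# Four candidate maxima; the weak one at 1.2 m sits implausibly close to 0.1 m.
maxima = [0.1, 1.2, 3.6, 7.1]
densities = [0.30, 0.05, 0.28, 0.25]
matrix = distance_matrix(maxima)
distinct_pairs = list(combinations(range(len(maxima)), 2))   # n(n - 1)/2 pairs
plausible = drop_implausible(maxima, densities, min_spacing=2.5)
```

Here the four maxima give six distinct pairs, and the filter keeps the three maxima at 0.1 m, 3.6 m and 7.1 m.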

The developed algorithm is applied on equidistant road cross sections every 5 m along the observed road. For these road cross sections, perpendicular lines are drawn. The positions of intersections between GNSS based VTs and perpendicular lines are determined and assigned with IDs. As a result, the lateral positions of VTs at cross sections every 5 m along the observed road are obtained. Based on these positions, the KDE is computed. Then, the local maxima of the derived density distributions are estimated. For this, first and second derivative tests are conducted. The maxima of consecutive cross sections are connected with line strings using a shortest distance algorithm. In this way, the geometries and the basic topology of the lane-specific road network are constructed. The resulting lane numbers and geometries are then compared to the lane center geometries based on the highly precise D-GPS measurements.
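The first steps of this procedure, creating cross sections every 5 m and intersecting them with trajectories, could look as follows in Python. This is a simplified sketch under stated assumptions: the input graph is a polyline in planar metre coordinates, each cross section extends a hypothetical 10 m to either side, and all function names and sample data are illustrative.

```python
import math

def cross_sections(axis, spacing=5.0, half_width=10.0):
    """Perpendicular lines every `spacing` metres along a polyline axis.

    Returns (station_point, left_end, right_end) tuples in planar metres.
    """
    sections, carried = [], 0.0
    for (ax, ay), (bx, by) in zip(axis, axis[1:]):
        seg = math.hypot(bx - ax, by - ay)
        if seg == 0.0:
            continue
        ux, uy = (bx - ax) / seg, (by - ay) / seg   # unit direction of the axis
        nx, ny = -uy, ux                            # unit normal, pointing left
        d = carried
        while d <= seg:
            px, py = ax + d * ux, ay + d * uy
            sections.append(((px, py),
                             (px + half_width * nx, py + half_width * ny),
                             (px - half_width * nx, py - half_width * ny)))
            d += spacing
        carried = d - seg   # distance into the next segment for the next station
    return sections

def lateral_offset(section, p, q):
    """Signed offset (metres, left positive) where segment p-q crosses the section, else None."""
    (sx, sy), (lx, ly), _ = section
    half_width = math.hypot(lx - sx, ly - sy)
    nx, ny = (lx - sx) / half_width, (ly - sy) / half_width
    ux, uy = ny, -nx   # direction of the axis at this station
    # Along-axis coordinates of both trajectory points relative to the station.
    sp = (p[0] - sx) * ux + (p[1] - sy) * uy
    sq = (q[0] - sx) * ux + (q[1] - sy) * uy
    if (sp > 0) == (sq > 0):
        return None    # both points on the same side: no crossing
    t = sp / (sp - sq)   # interpolation factor of the crossing along p-q
    cx, cy = p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])
    offset = (cx - sx) * nx + (cy - sy) * ny
    return offset if abs(offset) <= half_width else None

def offsets_per_section(sections, trajectories):
    """Collect the lateral crossing positions of all trajectories for every cross section."""
    result = []
    for section in sections:
        hits = []
        for traj in trajectories:
            for p, q in zip(traj, traj[1:]):
                off = lateral_offset(section, p, q)
                if off is not None:
                    hits.append(off)
        result.append(hits)
    return result

axis = [(0.0, 0.0), (20.0, 0.0)]
trajectory = [(-1.0, 1.0), (7.0, 1.4), (13.0, 1.8), (21.0, 1.6)]
sections = cross_sections(axis)
offsets = offsets_per_section(sections, [trajectory])
```

Each inner list of `offsets` holds the lateral positions for one cross section, which is the input a density estimate would need at that station.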

3 Results and Discussion

3.1 Results of Roadway Based Distance Analysis

In the roadway based distance analysis, the distribution and quantity of distances between the measured GPS based vehicle trajectories and the reference graph (which represents the center line of each lane) were analyzed for each lane and direction of each section (A, B, C) in the study area. As an example, the results over all lanes and both directions for section A (urban 3-lane freeway) are presented in Fig. 2 for three different measurement devices (one data logger and two smartphone apps).

Fig. 2 Distances to reference graph on urban 3-lane freeway A2 (section A) for measured GPS vehicle trajectories based on Qstarz Data Logger (left), Android GPS Logger (middle) and iPhone GPS Logger (right). The barplot above shows the quantity of distances within classes of lateral positioning accuracy. The boxplot below shows distribution and statistics (quantity n, median, mean and standard deviation SD) of the distances

In the example of section A in Fig. 2, the lateral position accuracy is similar for the Qstarz Data Logger and the Android GPS Logger, although the update rate (5 Hz vs. 1 Hz) and the number of trips (77 vs. 26) differ. Both show a similar distribution over the classes of positioning accuracy (about 60 % of distances are less than 2 m from the reference graph) as well as a similar median (about 1.5 m) and mean (about 2 m), but the standard deviation is higher for the Android GPS Logger (3.21 m versus 1.78 m). The quality is comparatively worse for the iPhone GPS Logger (median 3.31 m); only 31 % of distances are less than 2 m from the reference graph.

Finally, average distance and distribution measures were calculated for all sections in the study area for different GPS devices (see Table 1). We achieved the best results with the Qstarz Logger at 5 Hz (median 1.2–1.5 m). An update rate of 1 Hz for the Qstarz Logger is not recommended here (median 2.5–3 m). Except for the iPhone GPS Logger, the smartphone apps provide results similar to the Qstarz at 5 Hz in terms of the median, but the standard deviation is higher, especially for the Android GPS Logger.

Table 1 Average distance and distribution measures for distances of different GPS devices for the three measurement sections A, B and C (SD means the standard deviation)

3.2 Results of Trip Based Distance Analysis

With respect to the approach of this study, we expect that most vehicle trajectories would arise from the use of smartphones, as modern devices contain GNSS and data transmission to provide their tracks. In this regard, two different Android smartphones and a Qstarz data logger were placed in the same vehicle equipped with the D-GPS measurement system. The data logger is used as a reference device to check whether the GNSS chipset inside the smartphones can achieve performance values similar to those of a typical mass-market receiver. These tests were conducted for different road categories in order to capture the influence of environmental conditions on freeways and urban streets. All three devices were analyzed individually for each of the three road sections to detect whether vehicle speed or environmental conditions have a significant impact on the overall performance.

The test area and its surroundings show good GNSS reception conditions on the urban 3-lane freeway (section A), some influence from topography on the rural 2-lane freeway (section C) and minor urban challenges in the city (section B), since the buildings mostly have 4–6 floors and stand at some distance from the road. As an example, the results of the trip based distance analysis on the urban 3-lane freeway (section A) for the Android GPS logger are shown in Fig. 3, which is representative of the performance that can be expected from low-cost GPS receivers.

Fig. 3 Total (left) and lateral (right) position error distribution on the urban 3-lane freeway (section A) for the trajectories based on the Android GPS logger application for smartphones

The resulting position errors are presented as position error density, since this representation is most suitable for the current assessment objectives. In Fig. 3 such an error density is shown for the Android GPS logger over a sample of 25 test runs within one day. The resulting errors are distributed over eight error classes, from the half-meter class (the leftmost bin in both diagrams) to the hundred-meter class (the rightmost bin). The boundary between two neighboring classes is the mean of their two center values. For example, all position errors greater than 1.5 m and smaller than 3.5 m are accumulated in the two-meter class, which is the biggest bin in the left diagram of Fig. 3 with a share of 46.5 % of all determined position errors. The next bin with a high accumulation of error values is the five-meter class; it is typical for mass-market receivers to have most hits in these two error classes.
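The class assignment can be made concrete with a short Python sketch. Only the half-meter, one-meter, two-meter, five-meter and hundred-meter classes are named in the text; the 10 m, 20 m and 50 m centers used below are an assumption made to complete the eight classes, and the sample errors are invented for illustration.

```python
from bisect import bisect_right

# Class centres in metres. The 10, 20 and 50 m classes are assumed here;
# the text names only the 0.5, 1, 2, 5 and 100 m classes.
CENTRES = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
# Boundaries are the means of neighbouring centres: 0.75, 1.5, 3.5, 7.5, ...
BOUNDS = [(a + b) / 2.0 for a, b in zip(CENTRES, CENTRES[1:])]

def error_class(error_m):
    """Centre value of the class that a position error (metres) falls into."""
    return CENTRES[bisect_right(BOUNDS, error_m)]

def class_shares(errors):
    """Share of errors per class in percent, using absolute error values."""
    counts = {c: 0 for c in CENTRES}
    for e in errors:
        counts[error_class(abs(e))] += 1
    return {c: 100.0 * k / len(errors) for c, k in counts.items()}

# Invented sample of position errors in metres.
errors = [0.3, 0.9, 1.6, 2.4, 3.4, 4.0, 8.0, 120.0]
shares = class_shares(errors)
```

With these boundaries, errors between 1.5 m and 3.5 m fall into the two-meter class, matching the example given above; how a value exactly on a boundary is assigned is a choice of this sketch.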

With this understanding, the left diagram in Fig. 3 shows that only 22.8 % of the errors fall into either the half-meter class or the one-meter class, which would be sufficient to be on the correct lane. The right diagram in Fig. 3 depicts the error density for the lateral part of the same Android receiver; here the two left bins contain 51.2 % of the errors, which are thus smaller than 1.5 m. In other words, approximately half of all valid fixes from smartphones are on the correct lane.

The results are not perfect, but they encourage the application of the kernel density estimation, since mass-market receivers would have the majority of all fixes on the correct lane. This approach also shows the limits of ordinary vehicle trajectories coming from mass-market devices with respect to their applicability in other domains. The quality of the lateral position error is suitable for the purposes of this study, but it has to be noted that the same quality cannot be assumed for other applications. The analysis of mass-market receivers and the derivation of adequate parameters were carried out with respect to the specific requirements of this study.

3.3 Results of Generating a Lane-Specific Transportation Network

The algorithm described in Sect. 2.4 was applied to each of the three measurement sites in the study area (section A, B, C). In Fig. 4, exemplary cross sections of each section with the resulting KDEs are visualized. The dotted lines show the x-coordinate positions of the detected maxima along the perpendicular lines of the road cross section. The light grey lines correspond to the positions of the highly precise D-GPS measurements. Furthermore, the respective derived lane geometries are depicted next to the diagrams. In these examples, the number of lanes was estimated correctly. The density distributions show distinct maxima corresponding to the estimated lane center lines. These estimated maxima are situated close to the lane center lines from the reference measurements. This indicates that the developed algorithm is capable of estimating the positions of lane center lines with high accuracy.

Fig. 4 Exemplary road sections in the three study areas, for which lane center lines are estimated with high accuracy (estimation close to reference). Background Map: basemap.at

The overall performance of the estimation of lane center lines is evaluated based on the reference measurements. The following boxplot (Fig. 5) shows the distribution of distances of the estimated lane center line positions to the reference lane center lines which were measured using the highly precise D-GPS equipment. The median distance is 0.135 m for section A, 0.123 m for section B and 0.056 m for section C. The distribution of distances leans towards the upper quartile in all observed study areas, with outliers up to 1.882 m. Considering lane widths between 2.75 and 3.75 m, the estimation of lane center lines performs with high accuracy.

Fig. 5 Boxplot of the distribution of deviations of lane center lines between the algorithmic estimations and the reference measurements in the three study areas

4 Conclusion

This study was carried out within the Austrian research project “LaneS” with the goal of generating a lane-specific transportation network as a basis for future ITS applications and automated driving. A wide set of test runs with different measurement equipment (highly precise D-GPS and low-cost GPS receivers like Qstarz data loggers and different smartphone GPS positioning apps) was conducted on three road sections (urban 3-lane freeway, urban arterial, rural 2-lane freeway).

First, the position accuracy of the measured vehicle trajectories from the low-cost GPS receivers was checked against the reference trajectory from D-GPS in a roadway based and a trip based distance analysis. Lateral errors with a median of 1.5–3.3 m were determined in the roadway based distance analysis. The best results were achieved with the Qstarz data logger at an update rate of 5 Hz. Some smartphone apps lead to similarly good results, but with a higher standard deviation, especially on the freeway sections. In the trip based distance analysis, the total positioning error (lateral and longitudinal) was also checked for trajectories from measurement devices inside the same vehicle. The exemplary results for the Android GPS logger on the urban 3-lane freeway showed that 22.8 % of the total errors are less than 1.5 m, compared with 51.2 % when only the lateral error is considered. This means that the longitudinal error, which can also be time based, is an essential part of the total positioning error in the analysis.

The generation of the lane-specific transportation network was realized with kernel density estimation, a non-parametric method for estimating a probability density function. The idea here is that the density maxima of vehicle positions at a road cross section should correspond to the positions of the lane center lines of a road. Moreover, the number of estimated density maxima indicates the number of lanes on a road. As a result, each maximum of the density curve computed from several trajectories is an estimate of the center of a lane. Compared with the measured lane center lines based on the D-GPS measurements, the median distances are less than 0.14 m for all three road sections. Hence, the presented methodology for generating lane-specific transportation networks provides accurate estimations of lane center lines for several road characteristics in terms of different speeds, lane widths and topology.

The accuracy of the estimated lane center lines is diminished especially in areas with unfavorable environmental conditions or complex road situations. This underscores the dependency of the developed methodology on the positional accuracy of the input data. The assumption that the highest density of vehicle trajectories corresponds to the lane center lines does not hold for erroneous input datasets with a significant accumulation of positional errors. Thus, further research is required on dealing with a high level of distortion in the positions of GNSS vehicle trajectories in order to apply the developed methodology comprehensively to a road network, irrespective of road complexity and environmental conditions.