1 Introduction

Road transport causes more fatalities than any other transport mode. It is estimated that more than 40,000 people die every year in Europe in traffic accidents. In recent years, advances in both vehicles and roads have helped to reduce the number of road deaths, but there is still a lot of work to be done.

In recent years, efforts have focused on creating applications that use advances in information technologies to increase road safety. One example of these applications is Advanced Driver Assistance Systems (ADAS), whose purpose is to assist the driver and warn of hazardous situations.

Among the sensors available for road safety applications, it is difficult to find a single one able to fulfill the demanding requirements of these applications. In this article an approach based on data fusion is presented, which tries to overcome the limitations of each sensor by fusing the information they provide. A classic vision-based ADAS is enhanced by adding a 2D laser scanner, providing pedestrian and vehicle detection with a high positive rate. The resulting application is already available on the IVVI 2.0 platform (Fig. 1).

Fig. 1 Intelligent vehicle based on visual information (IVVI) 2.0

2 State of the Art

Fusion approaches can be divided according to the level at which fusion is performed:

In Low Level approaches raw data is fused, creating a new data set that combines information from the different sources. These methods usually depend on the technology or sensor used. In computer vision, stereovision is an example of low level fusion: images from two cameras are combined into a richer data set able to provide 3D information; in [1, 2] this information is used for pedestrian detection.

Medium Level fusion requires a separate preprocessing stage for each sensor; the features from the different sensors are combined into a single feature set, which is then used to perform the final classification. In [3, 4] the authors present works that combine the features and perform classification in different ways: Naïve Bayes, GMMC, NN, FLDA, and SVM.

High Level fusion approaches perform detection and classification for each sensor independently, and a final stage combines the results. In [5] pedestrian detection is performed using AdaBoost for vision and a Gaussian Mixture Model (GMM) for the laser scanner; a Bayesian decisor combines the detections at high level. In [6] pedestrians are detected with the laser scanner using multidimensional features, and with computer vision using Histograms of Oriented Gradients (HOG) features and a Support Vector Machine (SVM); finally a Bayesian model provides the high level fusion.

The work presented here is an example of high level fusion, with independent classifiers for pedestrian and vehicle detection, providing a robust system able to fulfill the requirements of safety applications. Moreover, the independence of the low level classifiers allows them to be used separately, even in extreme situations where one of them is not available.

3 Low Level Detection

As remarked before, the first stage of the approach consists of low level detection, based on the information given by the laser scanner and the camera independently. A later, higher level stage fuses this information.

Several configurations were tested for obstacle detection: pattern based monocular camera detection, stereo based obstacle detection, and laser scanner obstacle detection. In the final configuration the laser scanner provides obstacle detection for both subsystems. The higher reliability of the laser scanner helps to reduce the number of false positives in the vision system, since only the regions of the image endorsed by the laser are checked. Besides, the laser scanner provides obstacle detection faster and more efficiently than the stereo based system.

3.1 Laser Scanner Detection

The laser scanner is mounted on the bumper of the vehicle, and the points of a scan are acquired sequentially, so the detections must be corrected according to the movement of the vehicle. Translation and rotation are performed according to the information provided by the GPS with inertial measurement available on the platform, as shown in (1) and Fig. 2a. Afterwards, the shapes of the detected obstacles are estimated; this shape reconstruction is based on polylines [7], as shown in Fig. 2b.

Fig. 2 Vehicle movement compensation of laser scanner information: (a) detection points, raw data in blue and compensated data in red; (b) shape reconstructed after the movement compensation; (c) alignment of the laser scanner data and the image

$$ \begin{gathered} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = R\left( \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix} + T_v + T_0 \right), \quad \text{with}\; T_v = \begin{bmatrix} vT_i \cos(\Delta\theta) \\ vT_i \sin(\Delta\theta) \\ 0 \end{bmatrix}, \; T_0 = \begin{bmatrix} x_t \\ y_t \\ z_t \end{bmatrix}, \;\text{and} \\ R = \begin{bmatrix} \cos(\Delta\delta) & 0 & \sin(\Delta\delta) \\ 0 & 1 & 0 \\ -\sin(\Delta\delta) & 0 & \cos(\Delta\delta) \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\Delta\varphi) & -\sin(\Delta\varphi) \\ 0 & \sin(\Delta\varphi) & \cos(\Delta\varphi) \end{bmatrix} \begin{bmatrix} \cos(\Delta\theta) & -\sin(\Delta\theta) & 0 \\ \sin(\Delta\theta) & \cos(\Delta\theta) & 0 \\ 0 & 0 & 1 \end{bmatrix} \end{gathered} $$
(1)

where ∆δ, ∆φ and ∆θ correspond to the increments of the Euler angles roll, pitch and yaw, respectively, over a given period of time Ti. Coordinates (x, y, z) and (x0, y0, z0) are the Cartesian coordinates of a given point after and before the vehicle movement compensation, respectively. R is the rotation matrix, Tv the translation due to the velocity of the vehicle, T0 the translation according to the relative position of the laser scanner and the inertial sensor, v is the velocity of the car, and Ti the time between the given point and the first one of the scan. Finally, (xt, yt, zt) is the displacement from the laser scanner coordinate system to the inertial measurement system.
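
For illustration, the following is a minimal Python/NumPy sketch of the compensation in (1); the function and variable names are ours, not taken from the original system:

```python
import numpy as np

def compensate_point(p0, v, Ti, d_delta, d_phi, d_theta, t0):
    """Ego-motion compensation of a single laser point, after Eq. (1).

    p0       -- (x0, y0, z0), raw point in laser coordinates
    v        -- vehicle speed
    Ti       -- time between this point and the first point of the scan
    d_delta, d_phi, d_theta -- Euler angle increments (roll, pitch, yaw)
    t0       -- (xt, yt, zt), laser-to-inertial-sensor offset
    """
    Ry = np.array([[np.cos(d_delta), 0.0, np.sin(d_delta)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(d_delta), 0.0, np.cos(d_delta)]])
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(d_phi), -np.sin(d_phi)],
                   [0.0, np.sin(d_phi), np.cos(d_phi)]])
    Rz = np.array([[np.cos(d_theta), -np.sin(d_theta), 0.0],
                   [np.sin(d_theta), np.cos(d_theta), 0.0],
                   [0.0, 0.0, 1.0]])
    R = Ry @ Rx @ Rz                       # composition used in Eq. (1)
    # Translation due to the vehicle's own displacement during Ti
    Tv = np.array([v * Ti * np.cos(d_theta), v * Ti * np.sin(d_theta), 0.0])
    return R @ (np.asarray(p0, dtype=float) + Tv + np.asarray(t0, dtype=float))
```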

After the shape reconstruction, classification is performed using a pattern matching approach. The obstacle classes that can be differentiated are: big obstacles, small obstacles, road borders, L-shaped obstacles, pedestrians and vehicles. The most important ones for this application are pedestrians and vehicles. Both detections are based on different patterns, obtained through detailed studies of the movement of the different obstacles.

3.1.1 Vehicles

The pattern is based on the delay between the spots given by the laser scanner: since the points of a scan are acquired sequentially, a moving vehicle produces a characteristic pattern of spots that depends on its movement. Thanks to this pattern, classification can be performed and information about the movement of the vehicle estimated (Fig. 3). A deeper description of this algorithm is provided in [7].

Fig. 3 Typical pattern for vehicle detection

3.1.2 Pedestrians

A pattern for pedestrians was defined based on the position of the pedestrian's legs (Fig. 4).

Fig. 4 Pattern for pedestrian detection: (a) the pattern used; (b) examples of the pattern with real pedestrians

In this pattern, three polylines are present, and the angles connecting them are required to lie within [0, π/2]; the similarity measure is given in (2).

$$ \text{Similarity} = \frac{2\theta_1}{\pi} \cdot \frac{2\theta_2}{\pi} $$
(2)

where θ1 and θ2 are the angles between consecutive polylines.

This similarity is computed for the two consecutive angles and, if they match the pattern, the obstacle is labeled as a possible pedestrian.
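
A direct transcription of (2) in Python, with an illustrative acceptance threshold (the threshold value is our assumption, not taken from the paper):

```python
import math

def leg_pattern_similarity(theta1, theta2):
    """Similarity of Eq. (2) for the two angles joining three consecutive
    polylines; both angles are expected to lie in [0, pi/2]."""
    return (2.0 * theta1 / math.pi) * (2.0 * theta2 / math.pi)

# Label as a possible pedestrian when the similarity is high enough
# (0.7 is a placeholder threshold).
if leg_pattern_similarity(math.radians(80), math.radians(75)) > 0.7:
    print("possible pedestrian")
```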

Finally, a simple tracking stage is added at this level to track the movement of the obstacles and provide reliable detection. A voting scheme that takes into account the last 10 frames was created. Furthermore, several filters are added to avoid false positives; these filters check for anomalous behavior over time, such as impossible movements, velocities or accelerations.
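
A minimal sketch of such a voting scheme follows; the confirmation threshold is illustrative, as the paper only specifies the 10-frame window:

```python
from collections import deque

class VotingFilter:
    """Voting over the last n_frames: an obstacle is confirmed when it was
    detected in at least min_votes of them (min_votes is a placeholder)."""
    def __init__(self, n_frames=10, min_votes=6):
        self.history = deque(maxlen=n_frames)
        self.min_votes = min_votes

    def update(self, detected_this_frame):
        self.history.append(bool(detected_this_frame))
        return sum(self.history) >= self.min_votes
```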

3.2 Computer Vision

As remarked above, the vision approach makes use of the reliability of the laser scanner to provide regions of interest. Only the obstacles detected by the laser scanner with a size similar to the object to be found are provided to the vision system, reducing the parts of the image where the algorithms perform the search; this reduces both computational cost and false positives (Fig. 5).

Fig. 5 Examples of obstacle sets for pedestrians (left) and vehicles (right)

A coordinate change based on the pin-hole model and an accurate extrinsic calibration, (3) and (4), transforms the laser scanner information into the camera coordinate system:

$$ \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = R\left( \begin{bmatrix} x_l \\ y_l \\ z_l \end{bmatrix} + T \right) $$
(3)

where R is a rotation matrix of the form shown in (1), corresponding to the Euler angles that represent the rotation between the two coordinate systems; \( T = \left[ x_t \; y_t \; z_t \right]^{T} \) is the translation vector corresponding to the distance between the coordinate systems; (xc, yc, zc) are the camera coordinates and (xl, yl, zl) the laser scanner coordinates.

$$ \lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_c \\ z_c \\ y_c \end{bmatrix} $$
(4)

where (u0, v0) is the principal point of the camera in pixels, (u, v) are the image coordinates in pixels, (xc, yc, zc) are the Cartesian camera coordinates, and f is the focal length. Note that, with the axis ordering of (4), the projective scale is λ = yc.
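
As an illustration, a minimal Python/NumPy sketch of (3) and (4) might look as follows (function and parameter names are ours):

```python
import numpy as np

def laser_to_pixel(p_laser, R, T, f, u0, v0):
    """Project a laser point into the image, following Eqs. (3) and (4).

    R, T      -- extrinsic rotation matrix and translation vector
    f, u0, v0 -- intrinsics: focal length and principal point (pixels)
    The (x, z, y) ordering in Eq. (4) is kept, so yc acts as the depth.
    """
    xc, yc, zc = R @ (np.asarray(p_laser, dtype=float) + np.asarray(T, dtype=float))  # Eq. (3)
    K = np.array([[f, 0.0, u0],
                  [0.0, f, v0],
                  [0.0, 0.0, 1.0]])
    uvw = K @ np.array([xc, zc, yc])       # Eq. (4)
    return uvw[0] / uvw[2], uvw[1] / uvw[2]  # divide by lambda = yc
```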

The computer vision algorithms used were different according to the obstacle to be found:

  • Vehicles. A Haar-like features approach with cascade classifiers was used (Fig. 6, right). The common appearance of the rear of vehicles made it possible to use this fast algorithm, originally proposed for face detection [8].

    Fig. 6 Visual detection examples for pedestrians (left) and vehicles (right). Red boxes represent visual detections, blue boxes laser scanner based pedestrian detections, and yellow boxes laser scanner based vehicle detections

  • Pedestrians. Based on Histogram of Oriented Gradients (HOG) features (Fig. 6, left). This approach is a classic in intelligent vehicles and was proposed in [9]. A minimal usage sketch of both detectors is given after this list.
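
Both detector families are available in OpenCV; the following sketch shows a possible usage (the vehicle cascade file name is hypothetical, since OpenCV does not ship one, while the default HOG people detector is built in):

```python
import cv2

# Haar cascade for vehicles: a cascade trained on vehicle rears would be
# loaded here; "cascade_vehicles.xml" is a placeholder file name.
vehicle_cascade = cv2.CascadeClassifier("cascade_vehicles.xml")

# HOG + linear SVM pedestrian detector with OpenCV's default people model.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("frame.png")  # e.g., a region of interest from the laser
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

vehicles = vehicle_cascade.detectMultiScale(gray)
pedestrians, _weights = hog.detectMultiScale(img)
```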

4 Fusion

The fusion stage retrieves the detections from both subsystems and provides fused detections (tracks). A Kalman filter was used to estimate the movement of the different obstacles.

Two kinds of tracks were defined: consolidated and non-consolidated. The first corresponds to tracks detected by both sensors, the second to tracks detected by a single sensor.
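
The paper does not detail the filter's state model; the following sketch assumes a constant-velocity model in the ground plane:

```python
import numpy as np

dt = 0.1  # frame period (illustrative)
# State [x, y, vx, vy]; constant-velocity transition and position-only
# measurement. This model is our assumption, not specified in the paper.
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)

def kf_predict(x, P, Q):
    """Propagate state and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, Rm):
    """Correct the prediction with a fused detection z = (x, y)."""
    S = H @ P @ H.T + Rm
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P
```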

The association technique used to match new detections with existing tracks was Global Nearest Neighbors (GNN), based on the distance between the track estimate and the position of the detections. A distance based on the stability of the measurements was defined (5), and the gate used to eliminate unlikely pairs was based on a square approach (6).

$$ d^{2} = \frac{\left( x_i - \bar{x} \right)^2}{\sigma_x^2} + \frac{\left( y_i - \bar{y} \right)^2}{\sigma_y^2} + \ln\left( \sigma_x \sigma_y \right) $$
(5)
$$ K_{Gl} \sigma_{r} $$
(6)

where \( \sigma_{r} \) is the residual standard deviation, KGl is an empirically chosen constant, d is the computed distance between the track estimate and the detection to be associated, and \( \left( \sigma_{x}, \sigma_{y} \right) \) are the corresponding values from the Kalman filter covariance matrix.
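
A sketch of the association test, under the assumption that track estimates expose the per-axis Kalman standard deviations (all names are illustrative):

```python
from dataclasses import dataclass
import math

@dataclass
class Estimate:
    """Position estimate with per-axis standard deviations."""
    x: float
    y: float
    sigma_x: float = 1.0
    sigma_y: float = 1.0

def association_distance(det: Estimate, trk: Estimate) -> float:
    """Distance of Eq. (5) between a detection and a track estimate."""
    return ((det.x - trk.x) ** 2 / trk.sigma_x ** 2
            + (det.y - trk.y) ** 2 / trk.sigma_y ** 2
            + math.log(trk.sigma_x * trk.sigma_y))

def inside_gate(det: Estimate, trk: Estimate, k_gl: float, sigma_r: float) -> bool:
    """Square gate of Eq. (6): reject pairs too far apart on either axis."""
    g = k_gl * sigma_r
    return abs(det.x - trk.x) <= g and abs(det.y - trk.y) <= g

# GNN step: each detection is matched to the nearest gated track, e.g.
# best = min((t for t in tracks if inside_gate(det, t, k_gl, sigma_r)),
#            key=lambda t: association_distance(det, t), default=None)
```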

An M/N policy was used to create and eliminate tracks: a given track is created after M detections and eliminated after N frames with no matches. A non-consolidated track that is never corroborated by the other sensor is considered a false positive.
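
A minimal sketch of such an M/N policy; the M and N values here are placeholders, not the tuned ones:

```python
class Track:
    """M/N track management: a tentative track is confirmed after
    m_confirm matched detections and deleted after n_delete consecutive
    frames without a match."""
    def __init__(self, m_confirm=3, n_delete=5):
        self.hits = 0
        self.misses = 0
        self.confirmed = False
        self.m_confirm = m_confirm
        self.n_delete = n_delete

    def on_match(self):
        self.hits += 1
        self.misses = 0
        if self.hits >= self.m_confirm:
            self.confirmed = True

    def on_miss(self):
        self.misses += 1
        return self.misses >= self.n_delete  # True -> delete the track
```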

5 Results

Different tests were performed in both urban and interurban scenarios, with more than 10,000 frames in real road situations. A comparison of the results is shown in Table 1.

Table 1 Results

The results show that the fusion system improves on the low level approaches. Among the low level detections, the high positive rate obtained with the limited information provided by the laser scanner was remarkable, mainly in the case of vehicles; on the other hand, the number of misdetections was also very high.

It should be remarked that the training of the camera approaches took the behavior of the laser scanner into account: due to the high number of false positives of the laser scanner based system, the vision system was trained to obtain the lowest false positive rate possible. Moreover, the camera systems did not include a tracking stage, so the expected positive rate was lower. This is visible in the case of vision based vehicle detection: although the number of missed detections per frame was high, all the vehicles in the sequences were eventually detected, so even in the worst case a vehicle is detected after one or two frames. Finally, the low number of false positives of the visual approaches made it possible to overcome the excessive number of these errors in the laser scanner approach.

6 Conclusions

Given the results presented in Table 1, we can conclude that the fusion process combines the information from the camera and the laser scanner, overcoming the limitations of each sensor, enhancing the capabilities of classic ADAS, and providing detection reliable enough to be included in a real road application.

The limitations inherent to each sensor and its algorithms are overcome thanks to the data fusion approach. On the one hand, computer vision has reliability limitations due to the unstructured nature of its information, and the reliability of the laser scanner detection reduces its false positives. On the other hand, the limited information given by the laser scanner is complemented by the rich data provided by the camera.

Although nowadays a road application based on both technologies would be expensive, mainly due to the cost of the laser scanner compared with camera based approaches, modern cars are already available with this technology to detect obstacles and perform avoidance maneuvers.