
1 Introduction

Vision-based road detection refers to the accurate detection of the free road surface ahead of a vehicle; it is a key module of advanced driver assistance systems and robot vision navigation systems. A large number of road detection methods have been proposed in recent decades. These methods can be categorized into appearance based [1–6], structure-from-motion based [7, 8] and stereo vision based [9]. Among these, appearance based methods have been the most extensively investigated, with color and texture as the two main appearance cues. Lu [1] directly uses RGB color and a Gaussian mixture model (GMM) to obtain a road probability distribution map, Rotaru [2] and Wang [3] use the Hue-Saturation-Intensity and CIE-Lab color spaces, respectively, to model road patterns, and Christopher [4] combines color and texture cues to improve road detection. Alvarez [5] comprehensively compares the performance of different color spaces for road detection. To overcome shadows on the road, Alvarez [6] also considers the physics behind color formation and designs an illuminant-invariant image, a shadow-free image converted from the RGB image.

To capture road appearance online in the absence of other road information, the above methods all assume that a region located in the center-lower part of a road image is a confident road region and use it to train their road appearance models. However, for one thing, this region is sometimes conservative and may miss important appearance information when multiple road materials exist; for another, the assumption holds only when the host vehicle is moving along a road with a regular pose, and it may be violated when the pose of the host vehicle undergoes severe variations, for instance when turning or avoiding obstacles, where the center-lower region often contains many non-road pixels that degrade the appearance models. In this paper, we propose a more sensible assumption: the true road region is always connected to the bottom boundary of the image, and, due to the perspective effect, the boundary of the road region shares a large common part with the image bottom boundary. Under this assumption, a road region can be identified by measuring the extent to which it connects to the bottom boundary of the image. The road region can therefore be inferred more accurately, insensitive to pose variations and to road surfaces with multiple materials. This idea is motivated by Zhu's work [9], originally proposed for salient object detection based on a background prior. Different from [9], we apply the idea of a boundary prior to the road detection problem, and rather than using the whole image boundary, we use only the bottom boundary. We also consider embedding an illumination invariance space to remove the disturbance of shadows.

2 Related Work

Over the years, the most popular road detection approaches have been appearance based. Since modeling the appearance of non-road regions is very difficult due to their diversity, most methods try to model road appearance only. These methods usually adopt the assumption that the road appears in the lower part of the image, so that road samples can be taken there to learn an appearance model, which is then used to classify the remaining regions of the image. The appearance can be well modeled by one-class classifiers such as GMM and nearest neighbors. Color and texture [1–6] are the most popular features for this one-class learning. Unfortunately, the center-lower part does not belong to the road in some scenarios, such as under pose variations of the ego-vehicle or when turning. Moreover, the samples in the center-lower part may be insufficient for modeling the whole road appearance, for example in the presence of shadows or a non-uniform road surface.

When training road images are available, binary or multi-class classifiers can also be learned to distinguish the appearance of road and non-road regions. Reference [2] uses a support vector machine to identify road and non-road regions with an online incremental learning strategy. In Ref. [4], a convolutional neural network is trained offline to classify the road scene into sky, vertical and road regions. In Ref. [3], an Adaboost classifier is trained offline first and then adapted to the specific scene by online updating.

In this paper, we propose an appearance based road detection method built on a road image boundary prior. It can deal with road detection under irregular vehicle poses, surfaces with multiple materials and shadows to some extent. The rest of this paper is organized as follows: Sect. 2 reviews related work on road detection, Sect. 3 introduces the image boundary prior based road detection method, and Sects. 4 and 5 report our experimental results and draw conclusions, respectively.

3 Method

3.1 Image Boundary Prior

Common road detection methods assume that the center-lower part of a road image belongs to the confident road surface. For example, Ref. [1] defines a "safe" window in the road image, depicted as the red region in Fig. 1(a), and in Ref. [4], nine patches of size 10 \( \times \) 10 uniformly located in the center-bottom are used for model construction; see the green patches in Fig. 1(a). However, this assumption has a main drawback: that area does not always belong to the road surface, e.g. when overtaking or turning left/right, see Fig. 1(b). Fortunately, we observe that the image boundary itself provides more reliable road/non-road information. Owing to the perspective projection effect, road regions are strongly connected to the image bottom boundary, whereas non-road regions (sky, trees, buildings, etc.) are weakly connected to the bottom boundary but strongly connected to the top, left and right boundaries. Motivated by this, we relax the center-lower part assumption to a more general one: a road region should share a large common boundary with the bottom boundary of the image relative to the perimeter of the region. The non-road region marked in green in Fig. 1(d) can therefore be excluded, since it has a long boundary of which only a small part is common with the image bottom boundary. We refer to this as the bottom boundary prior. By considering the whole bottom boundary, we not only include more confident road regions than the center-lower or limited-seed-patch assumptions when multiple materials exist, but also remain insensitive to the inclusion of non-road regions caused by pose variations of the vehicle, because we measure the extent to which a region connects to the bottom boundary.

Fig. 1.

The traditional "safe" road region or seed patches (a) used by researchers may be corrupted by the inclusion of non-road regions when the pose of the vehicle changes (b). The image boundary prior overcomes this problem and can classify road and non-road regions accurately, as shown in (c) and (d).

To measure the extent to which a region connects to the bottom boundary, we first over-segment the road image into non-overlapping patches. Here we use uniform square patches for simplicity, i.e., if \( N \) patches are needed, then the side length of each patch equals \( \sqrt {\frac{w \cdot h}{N}} \), where \( w \) and \( h \) are the width and height of the input image. Note that more accurate over-segmentation methods, such as superpixels [31], can also be used. Afterwards, an undirected graph is constructed as shown in Fig. 2, where each node denotes an image patch. Let \( p \) denote one of the patches and let the set of all patches be \( \{ p_{i} \}_{i = 1}^{N} \). We define the geodesic distance \( d_{geo} (p,p_{i} ) \) as the length of the shortest path from \( p \) to each \( p_{i} \):
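The patch-grid construction can be sketched in a few lines; this is a minimal illustration in Python (the paper's implementation is in Matlab), and the function name and return convention are our own:

```python
import numpy as np

def make_patch_grid(w, h, n_patches):
    """Split a w x h image into roughly n_patches uniform square patches.

    Returns the patch side length L = sqrt(w*h/N) (rounded) and the
    number of patch rows and columns that fit in the image.
    """
    side = int(round(np.sqrt(w * h / n_patches)))
    rows, cols = h // side, w // side
    return side, rows, cols
```

For example, a 640 x 480 image with N = 300 gives 32-pixel patches arranged on a 15 x 20 grid.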

$$ d_{geo} (p,q) = \mathop {\hbox{min} }\limits_{{r_{1} ,r_{2} , \cdots ,r_{N} \in \{ p \to q\} }} \sum\limits_{i = 1}^{N - 1} {d_{pair} (r_{i} ,r_{i + 1} )} $$
(1)
Fig. 2.

An illustration of the geodesic distance from \( p \) to \( q \) on the graph. The red, blue and green paths are three candidate paths from \( p \) to \( q \), among which the red one is the shortest (Color figure online).

where the pairwise distance \( d_{pair} (r_{i} ,r_{i + 1} ) \) is the Euclidean distance between the mean colors of the adjacent patches \( r_{i} \) and \( r_{i + 1} \). We choose the CIE-Lab color space to compute this distance since it is perceptually linear: equal differences between two color vectors in this space correspond to approximately equally important color changes for a human observer, making it an appropriate choice for the Euclidean distance. In Eq. (1), the symbol \( \{ p \to q\} \) denotes the set of all paths from \( p \) to \( q \); e.g., in Fig. 2, the red, blue and green paths are three different paths. If a path in \( \{ p \to q\} \) is denoted by \( r_{1} ,r_{2} , \cdots ,r_{N} \), then the geodesic distance \( d_{geo} (p,q) \) is the length of the shortest of all paths from \( p \) to \( q \) on the graph, which can be computed efficiently by Johnson's algorithm [32]. If the distances on all edges of the graph in Fig. 2 are assumed equal, the shortest path from \( p \) to \( q \) is \( p \to a \to q \); in this case, \( d_{geo} (p,q) \) equals \( d_{pair} (p,a) + d_{pair} (a,q) \).
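As a hedged sketch of this computation (not the authors' code), the patch graph and Johnson's algorithm can be realized with SciPy; the 4-connected grid topology, the epsilon that keeps zero-weight edges in the sparse graph, and the function name are our own choices:

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import shortest_path

def geodesic_distances(mean_color, rows, cols):
    """All-pairs geodesic distances d_geo on a 4-connected patch grid.

    mean_color: (rows*cols, 3) mean CIE-Lab color per patch, row-major.
    Edge weights are Euclidean distances between adjacent patches'
    mean colors (d_pair); a tiny epsilon prevents zero-distance edges
    from being dropped by the sparse representation.
    """
    n = rows * cols
    graph = lil_matrix((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((0, 1), (1, 0)):      # right and down neighbors
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    d = np.linalg.norm(mean_color[i] - mean_color[j])
                    graph[i, j] = graph[j, i] = d + 1e-12
    # method='J' selects Johnson's algorithm, as cited in the text [32]
    return shortest_path(graph.tocsr(), method='J', directed=False)
```

On a 2 x 2 grid where one patch differs in color, the shortest path correctly routes around the dissimilar patch through same-colored neighbors.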

Each distance is then turned into a similarity between \( p \) and \( p_{i} \) using the \( \exp ( \cdot ) \) function as follows:

$$ sim(p,p_{i} ) = \exp ( - \frac{{d_{geo}^{2} (p,p_{i} )}}{{2\sigma_{1}^{2} }}) $$
(2)

where \( \sigma_{1} \) is a factor controlling the smoothness of the distance \( d_{geo} (p,p_{i} ) \). We sum all the similarities to obtain the contribution from all patches:

$$ A(p) = \sum\limits_{i = 1}^{N} {sim(p,p_{i} )} $$
(3)

We also sum the similarities of those patches that lie on the image bottom boundary:

$$ B(p) = \sum\limits_{i = 1}^{N} {sim(p,p_{i} ) \cdot \delta (B(p_{i} ) = 1)} $$
(4)

where \( B(p_{i} ) = 1 \) indicates that \( p_{i} \) lies on the image bottom boundary, and \( \delta ( \cdot ) \) is an indicator function that returns 1 if its condition holds and 0 otherwise.

With \( A(p) \) and \( B(p) \) computed, \( B(p) \) can be viewed as the length that \( p \) shares with the bottom boundary, while \( A(p) \) is a (soft) area rather than the region's perimeter. Since the shape of the region is arbitrary, we assume it is approximately circular, so its perimeter can be estimated from \( A(p) \) as \( \sqrt {4\pi A(p)} \). Neglecting the constant \( \sqrt {4\pi } \), we formulate the bottom boundary prior as:

$$ \alpha (p) = \frac{B(p)}{{\sqrt {A(p)} }} $$
(5)

where \( \alpha (p) \) reflects the extent to which \( p \) connects with the image bottom boundary and can be used to identify the road.
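Equations (2)–(5) can be sketched together in a few lines of Python; the value of sigma1 here is illustrative, since the paper does not report the one it uses:

```python
import numpy as np

def boundary_connectivity(D, rows, cols, sigma1=10.0):
    """Bottom-boundary connectivity alpha(p) = B(p) / sqrt(A(p)).

    D: (N, N) matrix of geodesic distances between patches, indexed
    row-major so the last `cols` entries are the bottom-row patches.
    """
    sim = np.exp(-D ** 2 / (2.0 * sigma1 ** 2))        # Eq. (2): similarity
    A = sim.sum(axis=1)                                # Eq. (3): soft area
    bottom = np.arange((rows - 1) * cols, rows * cols)
    B = sim[:, bottom].sum(axis=1)                     # Eq. (4): shared bottom length
    return B / np.sqrt(A)                              # Eq. (5)
```

In the degenerate case where all patches are identical (all geodesic distances zero), every patch shares the full bottom row, so alpha is uniform.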

Figure 3 gives a simple undirected weighted graph to illustrate the road inference process. The graph includes only three classes, sky, tree and road, shown in different colors. We assume that the pairwise distance \( d_{pair} (p,p_{i} ) \) associated with an edge connecting two neighboring patches of the same class equals 0, and that the distance is infinite for patches of different classes. The \( \alpha (p) \) values of all nodes are shown in matrix form at the bottom left of Fig. 3, and the normalized version of this matrix is shown as an image at the bottom right.

Fig. 3.

Illustration of the road inference based on image boundary prior

The \( \alpha (p) \) values of all patches yield a probability map \( Pb \) via Eq. (6), where \( \sigma_{2} \) is a smoothness factor empirically set to 1.

$$ Pb(p) = 1 - \exp ( - \frac{{\alpha^{2} (p)}}{{2\sigma_{2}^{2} }}) $$
(6)
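Eq. (6) amounts to a one-line mapping from connectivity scores to probabilities (sigma2 = 1 as in the text):

```python
import numpy as np

def road_probability(alpha, sigma2=1.0):
    """Eq. (6): map connectivity scores alpha(p) to road probabilities.

    alpha = 0 maps to probability 0; large alpha saturates toward 1.
    """
    return 1.0 - np.exp(-alpha ** 2 / (2.0 * sigma2 ** 2))
```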

3.2 Embedding Illumination Invariance Space

From Fig. 3, we can see clearly that the bottom boundary prior infers the road well provided that small color distances are assigned within the road region while large distances are assigned across the boundary between road and non-road regions. However, the road region often exhibits large color distances due to shadows. In practice, shadows are one of the main challenges in road detection. Under different illumination conditions, shadows may be cast on the road with arbitrary shapes and locations, ranging from shallow to heavy. The Lab color space used above is not photometrically invariant: large intensity changes due to shadows influence all three coordinates. In [11], Shannon's entropy is applied to find the intrinsic quality of surfaces' spectral properties. Based on [11], the authors of [4] proposed an illumination invariant color space to suppress shadows on the road. They extract the intrinsic feature of an RGB road image to obtain an illumination invariance space, in which the road and the shadows on it look more similar. Specifically, for an RGB road image, the 3-D data are transformed to a 2-D log-chromaticity space \( (\rho_{1} ,\rho_{2} ) \), where \( \rho_{1} = \log \frac{R}{B} \) and \( \rho_{2} = \log \frac{G}{B} \); pixels on the same surface under different illumination form a straight line in this space. A 1-D space \( I_{\theta } \) is obtained by projecting the points in the \( (\rho_{1} ,\rho_{2} ) \) space onto a line \( l_{\theta } \) that makes an angle \( \theta \) with the horizontal axis, as in Eq. (7).

$$ I_{\theta } = (\rho_{1} ,\rho_{2} ) \cdot (\cos \theta ,\sin \theta )^{T} $$
(7)

Finally, the angle \( \theta^{\prime} \) with minimum Shannon entropy gives the best projection direction, and \( I_{{\theta^{\prime}}} \) is the illumination invariance space (IIS).

$$ \theta^{\prime} = \mathop {\arg \hbox{min} }\limits_{\theta } \left\{ { - \sum\limits_{j} {P_{j} (I_{\theta } )\log (P_{j} (I_{\theta } ))} } \right\} $$
(8)
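A sketch of the entropy-minimizing projection of Eqs. (7)–(8) follows; the histogram binning, the angle sweep in whole degrees, and the epsilon guarding the logarithm are our own choices, as the paper does not specify them:

```python
import numpy as np

def illumination_invariant(rgb, n_angles=180):
    """1-D illumination-invariant projection via entropy minimization.

    rgb: (H, W, 3) float array with positive values. Returns the
    projected image I_theta' and the best angle theta' in degrees.
    """
    eps = 1e-6
    rho1 = np.log(rgb[..., 0] + eps) - np.log(rgb[..., 2] + eps)  # log(R/B)
    rho2 = np.log(rgb[..., 1] + eps) - np.log(rgb[..., 2] + eps)  # log(G/B)
    best_angle, best_entropy, best_img = 0, np.inf, None
    for deg in range(n_angles):
        t = np.deg2rad(deg)
        proj = rho1 * np.cos(t) + rho2 * np.sin(t)                # Eq. (7)
        hist, _ = np.histogram(proj, bins=64)
        p = hist / hist.sum()
        p = p[p > 0]
        entropy = -(p * np.log(p)).sum()                          # Eq. (8)
        if entropy < best_entropy:
            best_angle, best_entropy, best_img = deg, entropy, proj
    return best_img, best_angle
```

The returned 1-D image is what the text denotes \( I_{{\theta^{\prime}}} \) and feeds into the combined distance of Eq. (9).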

However, we observed in our experiments that this space loses some discrimination between road and non-road, and better performance is obtained by combining it with the Lab color space. Therefore, in computing the color distance between two patches, we linearly weight the distance of the average color vectors in the \( Lab \) color space and in the illumination invariance space \( I \), as in Eq. (9), where \( \gamma \) is a constant balancing the importance of the two distances. The image bottom boundary prior combined with Lab, with IIS, and with both Lab and IIS yields three methods, denoted IBP-Lab, IBP-IIS and IBP-Lab-IIS, respectively, in the following.

$$ d_{pair} (p,q) = \sqrt {(L_{p} - L_{q} )^{2} + (a_{p} - a_{q} )^{2} + (b_{p} - b_{q} )^{2} } + \gamma \left| {I_{p} - I_{q} } \right| $$
(9)

The final road detection result is obtained by segmenting the probability map with a simple adaptive threshold \( T = u + \alpha \cdot \sigma \), where \( u \) and \( \sigma \) are the mean and standard deviation of the probability map and \( \alpha \) is a constant taking a value in the interval [1, 3]. However, we use the probability map directly for the experimental comparison, since it is the core of both our approach and the methods used for comparison.
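The adaptive threshold can be sketched as follows (the parameter `a` stands in for the constant \( \alpha \) of the threshold formula, to avoid clashing with the connectivity score):

```python
import numpy as np

def segment_road(prob_map, a=1.0):
    """Binarize the probability map with T = u + a * sigma, a in [1, 3].

    u and sigma are the mean and standard deviation of the map itself,
    so the threshold adapts to each image.
    """
    T = prob_map.mean() + a * prob_map.std()
    return prob_map >= T
```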

4 Experiments

The proposed approach and the methods used for comparison were implemented in Matlab on a PC with a Core 2 Duo CPU (2.4 GHz) and 2 GB of memory, without any code optimization.

We evaluate our approach on the open BCN-2 dataset and on a dataset collected by ourselves; a detailed description of the two datasets is given in Table 1. They contain many challenging and complicated scenarios. Example images from these datasets are shown in Figs. 4 and 5. We compare our method with the GMM based method [3] and the illumination invariance based method [4]. As in [10] and [12], we report Maximum F1-measure (MaxF), Average Precision (AP), Precision (PRE), Recall (REC), False Positive Rate (FPR) and False Negative Rate (FNR).

Table 1. Description of used datasets
Fig. 4.

Sampled detection results on the NUST dataset. The original images are shown in the first row and the ground truth in the second row. Rows three to seven show the detection results of GMM-HSV, IIS, IBP-Lab, IBP-IIS and IBP-Lab-IIS, respectively.

4.1 Result on Irregular Vehicle Pose and Multiple Materials Surface

Our approach is first assessed on the NUST dataset, which includes some adverse conditions due to irregular vehicle poses and non-uniform road surfaces. Figure 4 shows five sampled images from the NUST dataset together with their detection results. It can be observed that the GMM and IIS methods depend on the pose of the vehicle and often fail when non-road pixels dominate their "safe" road regions. Moreover, on non-uniform road surfaces, the GMM based method often produces false negatives since the road samples in the center-lower region are not adequate, while IIS often produces false positives when the bottom boundary includes both road and non-road regions. Our method is robust to these problems, since we take all patches on the bottom boundary as the road reference region and improve road discrimination by measuring the extent of connection to the bottom boundary, which is insensitive to the inclusion of non-road regions. The performance comparison on the whole dataset is shown in Table 2. The three IBP-X methods are all superior to both GMM and IIS, with IBP-IIS yielding the best result.

Table 2. Performance comparison on NUST dataset (%)

4.2 Result in Shadows Scenarios and Non-Uniform Road Surface

Our approach is also assessed on the BCN-2 dataset, which includes two sub-datasets, one with sunny shadow scenarios and the other with after-rain scenarios; both often lead to non-uniform road surfaces. Figure 5 gives some illustrative detection results, and Tables 3 and 4 show the corresponding quantitative results. We find that IBP-IIS performs best in the sunny shadow scenario and IBP-Lab-IIS performs best in the after-rain scenario.

Fig. 5.

Sampled detection results on the BCN-2 dataset. The original images are shown in the first row; the first two frames are from the sunny shadow scenario (frames 105 and 153) and the last three are from the after-rain scenario (frames 47, 107 and 475). Ground truths are shown in the second row. Rows three to seven show the detection results of GMM-HSV, IIS, IBP-Lab, IBP-IIS and IBP-Lab-IIS, respectively.

Table 3. Performance comparison on the sunny shadows dataset of BCN-2 (%)
Table 4. Performance comparison on the after rain dataset of BCN-2 (%)

4.3 Parameter Sensitivity and Time Cost Analysis

We also conducted parameter selection and a sensitivity test. In our method, \( \gamma \) is a key parameter. Varying \( \gamma \) within the range [0, 50], we find that \( \gamma \in [2,8] \) always yields similarly good performance, so we set \( \gamma = 5 \) in all experiments. Finally, we report the time cost of the methods (Table 5).

Table 5. Comparison of average time cost (s)

5 Conclusion

Road detection is a key technique in ADAS, robot vision navigation and other applications, and road detection in arbitrary road scenes is still an open problem. In this paper, we proposed an image boundary prior to infer the road region, which can deal with pose variations of the host vehicle, non-uniform road surfaces and shadows. Experiments demonstrate that the probability map generated from the image boundary prior is superior to the GMM and IIS based ones. IBP with the illumination invariance space, and with the combination of IIS and the Lab color space, always yields the best performance on the datasets used. Performance could be further improved by using superpixel segmentation instead of rectangular patches, at the cost of higher time complexity. Moreover, our method can be used to flexibly locate the "safe" road region so as to boost many appearance based road detection methods.