1 Introduction

The United Nations estimates that between 2010 and 2020 the number of road deaths will increase by up to 50%, reaching about 1.9 million people per year. To reverse this trend, the UN established, in 2011, the first "Decade of Action for Road Safety." Driver Assistance Systems can help reduce the number of accidents by automating tasks such as lane departure warning and traffic sign recognition.

The recognition of traffic signs has received increasing attention in recent years; it is even considered a highly important feature of intelligent vehicles. Traffic signs carry substantial useful information that may be disregarded by drivers due to fatigue or because they are searching for an address. Drivers are also likely to pay less attention to traffic signs when driving in threatening weather. Therefore, enhancement initiatives that increase driving safety, such as improving automatic traffic sign detection and recognition systems, are becoming indispensable to help decrease the road death toll. These enhancements, however beneficial they may seem, face several external non-technical challenges, such as lighting variations, scale and weather changes, occlusions, and rotations, which may eventually decrease the performance of traffic sign recognition systems.

The main issue in traffic sign recognition is not how to detect or recognize, with high recall, a traffic sign in a single still image. It is rather how to obtain high precision on large volumes of video. To illustrate the problem of false alarms, consider a traffic sign recognition system installed on a smartphone shooting 30 frames per s (108,000 frames in a 1-h video). If we suppose that a sign is encountered every 4 min and that, given the speed of the car, every sign remains visible for 2 s, then in the course of 1 h we will find a total of 15 signs, each appearing in 60 frames (900 frames contain signs while 107,100 do not). Supposing also that the system has a 1% false positive rate, this yields about 18 false alarms per minute (1071 per hour), that is, roughly 1 true sign against 71 false alarms in every 4-min window. This may eventually lead most users to disable their applications.
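As a sanity check, these figures can be re-derived in a few lines; every quantity below comes from the assumptions stated above (30 fps, one 2-s sign every 4 min, 1% frame-level false positive rate):

```python
FPS = 30
frames_per_hour = FPS * 3600                 # 108,000 frames in 1 h
signs_per_hour = 60 // 4                     # one sign every 4 min -> 15 signs
frames_per_sign = 2 * FPS                    # each sign spans 2 s -> 60 frames

sign_frames = signs_per_hour * frames_per_sign    # 900 frames containing signs
empty_frames = frames_per_hour - sign_frames      # 107,100 frames without signs

fp_rate = 0.01                               # 1% false positive rate
fp_per_hour = fp_rate * empty_frames         # 1071 false alarms per hour
print(fp_per_hour / 60)                      # ~18 false alarms per minute
print(fp_per_hour / signs_per_hour)          # ~71 false alarms per true sign
```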

Traffic sign recognition systems consist of three main stages: localization, detection, and classification. Any false alarm in the detection stage lowers performance in the classification stage, since the classifier is not usually trained on false alarms.

Road signs have many discriminating features on the basis of which they are classified. According to their shapes and colors, there are five main classes: warning signs (red triangle), prohibition signs (red circle), reservation signs (blue rectangle), mandatory signs (blue circle), and temporary signs (yellow triangle). Examples of traffic signs for each of these categories are shown in Fig. 1.

Fig. 1

Traffic sign types: a mandatory sign, b temporary sign, c warning sign, d prohibition sign, e reservation sign

The aim of this paper is to present an overview of some recent and efficient traffic sign detection and classification methods; other studies of this domain can be found in [1,2,3,4, 15, 36].

In Sect. 2, traffic sign detection methods are presented; they are divided into three categories: color-based, shape-based, and learning-based methods, the latter including deep learning methods. In Sect. 3, traffic sign classification methods are addressed: first, we cite learning methods based on hand-crafted features, and then deep learning methods. Moreover, the different publicly available traffic sign detection and classification datasets are also presented. Section 4 describes future research directions that researchers can build on in their future work. Finally, Sect. 5 concludes the paper.

2 Detection methods

As mentioned above, we can classify detection (localization) methods into three fundamental classes: color-based, shape-based, and learning-based methods. Depending on the nature of the problem and the system requirements, we can decide upon the best method to apply; for example, methods based on color information can be used with color datasets but not with grayscale images.

2.1 Color-based methods

Color-based segmentation of the dominant colors is applied to detect regions of interest. Traffic signs have specific characteristic colors: red, blue, and yellow. These colors, however, are sensitive to various factors, such as the age of the signs and lighting variations, which makes segmentation an arduous process. To overcome this problem, authors work in different color spaces, among which we mention the following:

2.1.1 RGB space

De La Escalera et al. [5] adopt the RGB space because the HSI conversion formulas are nonlinear. They use relations between the color components, as presented in the following expression:

$$\begin{aligned} g(x,y)={\left\{ \begin{array}{ll} k_{1} &{} \text {if } R_{a}\le f_{\mathrm{r}}(x,y)\le R_{b},\quad TG_{a}\le \frac{f_{\mathrm{g}}(x,y)}{f_{\mathrm{r}}(x,y)}\le TG_{b},\quad TB_{a}\le \frac{f_{\mathrm{g}}(x,y)}{f_{\mathrm{b}}(x,y)}\le TB_{b}\\ k_{2} &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(1)

\(f_{\mathrm{r}}(x,y), f_{\mathrm{g}}(x,y)\), and \(f_{\mathrm{b}}(x,y)\) are, respectively, the functions that provide the red, green, and blue levels of each point of the image.
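As an illustration, Eq. (1) amounts to a per-pixel test on the red level and on two channel ratios. A minimal NumPy sketch follows; the interval bounds are placeholder values chosen for illustration, not the thresholds used in [5]:

```python
import numpy as np

def segment_rgb(img, k1=255, k2=0,
                Ra=100, Rb=255,        # red-level bounds (illustrative)
                TGa=0.0, TGb=0.8,      # G/R ratio bounds (illustrative)
                TBa=0.0, TBb=0.8):     # G/B ratio bounds (illustrative)
    """Per-pixel segmentation following Eq. (1) on an RGB image."""
    r, g, b = (img[..., i].astype(np.float64) for i in range(3))
    eps = 1e-6                         # avoid division by zero
    gr, gb = g / (r + eps), g / (b + eps)
    cond = ((Ra <= r) & (r <= Rb) &
            (TGa <= gr) & (gr <= TGb) &
            (TBa <= gb) & (gb <= TBb))
    return np.where(cond, k1, k2).astype(np.uint8)
```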

Thresholding is used by several authors, such as [6,7,8]. Their approaches, however, depend strongly on the selected thresholds, which makes comparing their performances a difficult task.

Ruta et al. [7] compute, for each pixel \(X=[x_\mathrm{R} ,x_\mathrm{G} ,x_\mathrm{B} ]\) with \( S=x_\mathrm{R}+ x_\mathrm{G}+ x_\mathrm{B} \), three color enhancement maps (red, blue, and yellow):

$$\begin{aligned} f_\mathrm{R} (X)= & {} \hbox {max}(0,\hbox {min}(x_\mathrm{R}-x_\mathrm{G} ,x_\mathrm{R}-x_\mathrm{B})/S) \end{aligned}$$
(2)
$$\begin{aligned} f_\mathrm{B} (X)= & {} \hbox {max}(0,\hbox {min}(x_\mathrm{B}-x_\mathrm{R} ,x_\mathrm{B}-x_\mathrm{G})/S) \end{aligned}$$
(3)
$$\begin{aligned} f_\mathrm{Y} (X)= & {} \hbox {max}(0,\hbox {min}(x_\mathrm{R}-x_\mathrm{B} ,x_\mathrm{G}-x_\mathrm{B})/S) \end{aligned}$$
(4)
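These three maps can be computed in a vectorized way; the sketch below assumes an RGB image with floating-point channels:

```python
import numpy as np

def color_enhancement_maps(img):
    """Red, blue, and yellow maps of Eqs. (2)-(4) for an RGB float image."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    s = r + g + b + 1e-6                                    # S, guarded against 0
    f_red = np.maximum(0, np.minimum(r - g, r - b) / s)     # Eq. (2)
    f_blue = np.maximum(0, np.minimum(b - r, b - g) / s)    # Eq. (3)
    f_yellow = np.maximum(0, np.minimum(r - b, g - b) / s)  # Eq. (4)
    return f_red, f_blue, f_yellow
```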

In each map, the dominant color has a high intensity, while deteriorated signs yield low intensities. King et al. [9] prefer the R'G'B' space. At first, they normalize the three RGB channels by the intensity I:

$$\begin{aligned} I= & {} \frac{R'+G'+B'}{3} \end{aligned}$$
(5)
$$\begin{aligned} r= & {} \frac{R'}{I}, \quad g=\frac{G'}{I},\quad b=\frac{B'}{I} \end{aligned}$$
(6)

Then, they construct four new images according to the equations proposed in [10]:

$$\begin{aligned} R= & {} r-\frac{(g+b)}{2} \end{aligned}$$
(7)
$$\begin{aligned} G= & {} g-\frac{(r+b)}{2}\end{aligned}$$
(8)
$$\begin{aligned} B= & {} b-\frac{(r+g)}{2} \end{aligned}$$
(9)
$$\begin{aligned} Y= & {} \frac{r+g}{2}-\frac{\mid r-g \mid }{2}-b \end{aligned}$$
(10)

The dominant color has a high intensity, which facilitates the extraction of the signs. Thresholding is used to binarize the four images (R, G, B, Y), and morphological operations are then applied to remove unwanted pixels. It is worth highlighting that this approach detects up to 93.63% of the signs.

King et al.'s approach is adopted in [11], where a filter is proposed to eliminate undesirable pixels in order to reduce the execution time.

2.1.2 HSV space

Yakimov [12] considered it impossible to detect traffic signs in real images by applying a simple threshold directly in the RGB space, due to lighting variations; this is what led them to choose the HSV space. They determined the optimal threshold values for the red color experimentally, as presented in the following expression:

$$\begin{aligned} \begin{aligned}&(0.0\le H<23) \parallel (350<H<360)\\&(0.85<S\le 1)\\&(0.85<V\le 1) \\ \end{aligned} \end{aligned}$$
(11)

After segmentation, they used a modified version of the algorithm presented in [13] to denoise the segmented images. The advantage of this denoising algorithm is that only noise is removed while the regions of interest remain unfiltered.

Wang et al. [14] also chose the HSV space, and they found that the classical thresholding method gives good results under many different lighting conditions, except in cases of color cast or poor lighting. They proposed a new thresholding method that uses the color information of neighboring pixels. Firstly, the red degree of each pixel c is calculated to obtain a new image \(f_\mathrm{R} (c)\) with the following equation:

$$\begin{aligned} f_{R}(c)={\left\{ \begin{array}{ll} S(c)\, \frac{\hbox {sin}(H(c)-300^{\circ })}{\hbox {sin}(60^{\circ })}&{}\hbox {if } H(c)\in [300^{\circ },360^{\circ }] \\ S(c)\, \frac{\hbox {sin}(60^{\circ }-H(c))}{\hbox {sin}(60^{\circ })}&{}\hbox {if } H(c)\in [0^{\circ },60^{\circ }] \\ 0 &{}\hbox {otherwise} \end{array}\right. } \end{aligned}$$
(12)

Secondly, the normalized red degree \(f_{NR} (x)\) is calculated as follows:

$$\begin{aligned} f_{NR}(x)=\frac{(f_\mathrm{R} (x)-\mu _R (\omega _x))}{(\sigma _R (\omega _x))} \end{aligned}$$
(13)

\(\mu _R(\omega _x)\) and \(\sigma _R (\omega _x)\) are the mean and the standard deviation of the red degrees of the pixels in the window \(\omega _x\) centered on x.

Thirdly, the normalized intensity \(f_{NI}\) is calculated with the following equation:

$$\begin{aligned} f_{NI} (x)=\frac{(f_I (x)-\mu _I (\omega _x))}{(\sigma _I (\omega _x))} \end{aligned}$$
(14)

\(f_I (x)\) is the intensity of the pixel x; \(\mu _I (\omega _x)\) and \(\sigma _I (\omega _x)\) are the mean and the standard deviation of the intensities of the pixels in the window \(\omega _x\). Finally, the red bitmap B is given as follows:

$$\begin{aligned} B(x)={\left\{ \begin{array}{ll} 1&{}\quad \hbox {if} \quad f_{NR}(x) > \hbox {max}(THR1, f_{NI}(x)+THR2)\\ 0&{}\quad \hbox {otherwise} \end{array}\right. } \end{aligned}$$
(15)
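The whole chain of Eqs. (12)-(15) can be sketched as follows; the window size and the thresholds THR1 and THR2 are illustrative assumptions, not the parameters chosen in [14]:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def red_bitmap(h_deg, s, v, win=15, thr1=2.0, thr2=0.5):
    """Red bitmap of Eqs. (12)-(15); h_deg in degrees, s and v in [0, 1]."""
    sin60 = np.sin(np.deg2rad(60))
    f_r = np.zeros_like(s)
    m1 = (h_deg >= 300) & (h_deg <= 360)
    m2 = (h_deg >= 0) & (h_deg <= 60)
    f_r[m1] = s[m1] * np.sin(np.deg2rad(h_deg[m1] - 300)) / sin60   # Eq. (12)
    f_r[m2] = s[m2] * np.sin(np.deg2rad(60 - h_deg[m2])) / sin60

    def local_norm(x):                       # z-score over a window w_x
        mu = uniform_filter(x, win)
        var = np.maximum(uniform_filter(x * x, win) - mu * mu, 0)
        return (x - mu) / (np.sqrt(var) + 1e-6)

    f_nr = local_norm(f_r)                   # Eq. (13)
    f_ni = local_norm(v)                     # Eq. (14)
    return (f_nr > np.maximum(thr1, f_ni + thr2)).astype(np.uint8)  # Eq. (15)
```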

Bascón et al. [16] combine thresholding on the H and S components with achromatic decomposition. This method, although simple and fast, is not robust to sign deterioration and illumination changes. Fleyeh et al. [17] use thresholding on the H, S, and V components; this method is resistant to lighting changes but costly in computation time. Vitabile et al. [18], on the other hand, use a dynamic pixel aggregation technique to segment the image.

2.1.3 HSI space

Several authors have chosen the HSI space because its hue component is invariant to changes in luminance.

Escalera et al. [19] chose the HSI space to detect signs; only the H and S components are used, to compensate for brightness variations. The authors constructed two look-up tables (LUTs): one for the H component and the other for the S component. The idea is that each LUT makes up for the other: if one component yields false values, the other can compensate. Once the LUTs are applied, the resulting images are multiplied, an operation comparable to a logical AND.

Fang et al. [20] assume that each particular sign color can be represented by a hue value distributed in a Gaussian manner with variance \(\sigma ^{2}\). The set of all these hue values is denoted as \(\left[ h _{1}, h_{2},\ldots \ldots ,h_{q}\right] \); then, the degree of similarity z between the hue h of a pixel and the sign hues \(h_k\) is calculated as follows:

$$\begin{aligned} \begin{aligned}&z=\hbox {max}_{k=1,\ldots ,q}\; z_k \\&z_k=\frac{1}{\sqrt{2\pi }\,\sigma } \hbox {exp}\left( -\frac{(h-h_k )^2}{2\sigma ^2}\right) \end{aligned} \end{aligned}$$
(16)

The result is not a segmented image but an image in which each pixel value represents the similarity between the pixel color and the standard one. One drawback of this method is that its calculations are nonlinear.
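A minimal sketch of Eq. (16) follows; the value of \(\sigma \) is an assumption for illustration:

```python
import numpy as np

def hue_similarity(h, sign_hues, sigma=10.0):
    """Similarity z of Eq. (16) between a pixel hue h and a set of
    reference sign hues; sigma = 10 degrees is an assumed value."""
    hk = np.asarray(sign_hues, dtype=np.float64)
    z_k = np.exp(-(h - hk) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    return z_k.max()                       # best match over k = 1..q
```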

2.1.4 YUV space

The YUV space is a three-component model based on the separation of luminance and chrominance information:

$$\begin{aligned} \begin{aligned}&Y=0.299 R+0.587 G+0.114 B\\&U=0.493(B-Y)\\&V=0.877(R-Y) \end{aligned} \end{aligned}$$
(17)

In [21], rectangular information panels are detected by colorimetric thresholding in YUV, followed by horizontal and vertical projections of the gradient and recognition of Kanji characters (Japanese writing system); the performance of this approach is only illustrated by a few examples. The authors of [22] also chose the YUV space, after a prior color correction based on the pixel values of the road surface, which is theoretically gray \((R = G = B)\).

In [23], the authors compared different segmentation methods in order to find the best one for road sign recognition. They classified segmentation methods into three main categories: segmentation with binarization, chromatic/achromatic decomposition, and edge detection, and then proposed a new segmentation method combining an SVM with a look-up table (LUT).

After implementing the different methods and conducting extensive experiments to find the best one, they concluded that:

  • For single images, the best results were obtained with the RGB method; for videos, the best results came from the LUT-SVM method;

  • Edge detection can complement a segmentation method but cannot be used alone;

  • Normalization of the RGB color space gives good performance with fewer operations; the HSV space, though it gives slightly better results, takes a long time to execute, which makes it less efficient. Why use a nonlinear transformation if simple normalization is good enough?

Unlike the color thresholding and extreme region extraction methods used in previous approaches, the recent approach of [24] uses High-Contrast Region Extraction (HCRE), motivated by cascaded detection methods, to extract regions of interest with high local contrast, maintaining a compromise between the detection and extraction rates. Exploiting the observation that the different types of traffic signs have relatively high contrast in local regions, the HCRE can reject approximately 83.10% of the non-interesting regions with low local contrast, such as the sky, roads, and some buildings, thus boosting the detection speed of the SFC-tree detector from 5 to more than 10 frames per s in their experiments.

Table 1 Results achieved by Youssef et al. [81]

2.2 Shape-based methods

The authors of these approaches do not consider color a reliable discriminative feature, due to its sensitivity to various factors such as the distance to the target, weather conditions, time of day, and sign reflections. Instead, signs are detected from the edges of the image, analyzed by structural or global approaches. Shape-based methods are generally more robust than colorimetric methods because they can process grayscale images and work on their gradients; however, they are costly in computation time, since the processing rate depends largely on the number of detected edges. Moreover, although shape-based methods can handle grayscale images, some countries, such as Japan, have pairs of different signs in the highway code that look exactly the same when converted to grayscale; to distinguish them, some color information is absolutely needed [7]. For this reason, some authors use color features to localize regions of interest and then apply shape-based methods to detect the sign position and recognize its geometric form.

Vitabile et al. [18] use all the pixels to recognize the geometric shape of the sign, where each road sign shape is represented by a binary image of fixed size (\(36\times 36\) pixels). After detecting regions of interest with the dynamic pixel aggregation technique, they rescale them to the size of the binary images and compute a similarity measure with all road sign shapes using the Tanimoto coefficient. The value of this coefficient is normalized: the closer it is to 1, the more similar the model and the region of interest. If X and Y are the sets of pixels of the binary images to be compared, the Tanimoto coefficient S is defined as follows:

$$\begin{aligned} S= \frac{|X\cap Y|}{|X\cup Y| } \end{aligned}$$
(18)
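For two binary masks of equal size, the coefficient reduces to a ratio of pixel counts:

```python
import numpy as np

def tanimoto(x, y):
    """Tanimoto coefficient of Eq. (18) for two boolean masks."""
    x, y = x.astype(bool), y.astype(bool)
    union = np.logical_or(x, y).sum()
    return np.logical_and(x, y).sum() / union if union else 1.0
```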

The results obtained show that more than 86% of signs are detected among 620 images of 24 classes. The authors of [25] used template matching to filter out regions that do not contain traffic signs: a sliding window of the same size as the template slides over the image to find the most similar region, using the mean square error (MSE) as similarity function:

$$\begin{aligned} MSE=\frac{1}{MN} \sum _{x=1}^{M}\sum _{y=1}^{N}\left[ T(x,y)-I(x,y)\right] ^{2} \end{aligned}$$
(19)
  • T(x, y): the intensity value of the template image at position (x, y);

  • I(x, y): the intensity of the input image at position (x, y);

  • M and N: the width and height of the image, respectively.
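A naive, deliberately unoptimized sketch of this sliding-window search is shown below; a practical implementation would instead use an optimized routine such as OpenCV's matchTemplate with a squared-difference criterion:

```python
import numpy as np

def mse_match(image, template):
    """Slide the template over a grayscale image and return the top-left
    corner of the window minimizing the MSE of Eq. (19)."""
    H, W = image.shape
    h, w = template.shape
    t = template.astype(np.float64)
    best, best_pos = np.inf, (0, 0)
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = image[i:i + h, j:j + w].astype(np.float64)
            mse = np.mean((t - window) ** 2)   # Eq. (19)
            if mse < best:
                best, best_pos = mse, (i, j)
    return best_pos, best
```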

A Hough transform is used in [26] to detect the edges of signs and select closed contours, which makes the approach sensitive to noise and occlusion. It detects 97% of 435 speed limit signs and 94% of 312 danger signs, in a time ranging between 20 and 200 ms/image depending on the number of processed contours. Miura et al. [26] have also applied the Hough transform to circular signs.

Youssef et al. [81] also chose the HOG descriptor, with a \(40\times 40\) detection window (\(10\times 10\) blocks with \(2\times 2\) striding). The GTSRB dataset is used for the first training round, DITS (the Data Set of Italian Traffic Signs) for the second, and both for the third. To reduce both false positives and execution time, a color segmentation step in an improved HSV space is applied before the detection step. The results obtained are shown in Table 1.

In [27], the authors preferred the Radial Symmetry Transform to detect speed limit signs. This method, a variant of the circular Hough transform, is specifically designed to detect possible occurrences of circular signs. The authors reached a detection rate of 96% at 10% false positives, in 50 ms/image for \(320\times 240\) images. The authors of [28] also use the Radial Symmetry Transform to detect other geometric shapes such as octagons, squares, and triangles. They reached a detection rate of 100% for triangles, but the major drawback is that the method can only find the size and position of shapes: it cannot distinguish a Yield sign from an intersection one.

In [29], contours are extracted together with their associated tangents, and three RANSAC-type algorithms are proposed for quadrilateral, ellipse, and triangle detection. The final shape is chosen based on the compatibility degrees provided by each algorithm. The method presented in [30] is adopted to estimate the center of the shape using only three points and their tangents. Over 80% of 1400 test images are properly detected, with 5% false positives, in a processing time between 15 and 20 s for a \(1980\times 1024\) image.

Qin et al. [31] exploit Fourier descriptors (FDs) to describe the contour; the main reason for using FDs is their robustness to rotation, scaling, and translation. A matching process over the FDs is then applied. A database containing over 20,000 images, created by recording sequences over 350 km of Swedish highways and city roads, is used to evaluate the proposed approach. The average detection rate achieved is 77.08% over 641 images.

Several works, like [8], preferred the distance transform (DT), which converts a binary image into an image where each pixel value represents the distance from that pixel to the nearest feature pixel. The authors propose a variant of DT called the Color Distance Transform (CDT), in which the real image is compared with a template in a discrete color representation. To facilitate the comparison, they compute a distance transform for each discrete color: pixels having that discrete color are considered feature pixels while the others are not. The result of this process is illustrated in Fig. 2.

Fig. 2

Normalized CDT (Color Distance Transform): a original image in discrete colors, b black CDT, c white CDT, d red CDT. Shaded regions indicate a small distance; extract from [8] (color figure online)

Qin et al. [32] use the distance-to-border (DTB) vector, i.e., the distance between the outer contour and the bounding box: for each segmented blob, they compute the four DTB vectors (left, right, top, and bottom), as shown in Fig. 3. This feature is robust to translations and rotations; however, since the blobs must be scaled to \(36 \times 36\), the approach is not robust to scale changes. The DTB vectors are then used as features to classify blobs with an SVM. The use of four distance vectors per segmented blob makes the DTB robust to occlusions, which is why it is also used in [63] as a feature to recognize the geometric shape of signs.

Fig. 3

Distance to border (DTB): a segmented blob, b binary image of a, c DTB of the blob [32]
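One possible reading of the DTB feature, assuming a binary blob already scaled to \(36\times 36\), is sketched below:

```python
import numpy as np

def dtb_vectors(blob):
    """Four distance-to-border vectors (left, right, top, bottom) of a
    binary blob, following [32]; the feature for each row/column is the
    gap between the bounding box side and the first foreground pixel."""
    blob = blob.astype(bool)
    n_rows, n_cols = blob.shape
    left = np.array([r.argmax() if r.any() else n_cols for r in blob])
    right = np.array([r[::-1].argmax() if r.any() else n_cols for r in blob])
    top = np.array([c.argmax() if c.any() else n_rows for c in blob.T])
    bottom = np.array([c[::-1].argmax() if c.any() else n_rows for c in blob.T])
    return left, right, top, bottom      # concatenated as the SVM input
```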

Another approach is proposed in [11] to recognize the shape of a sign candidate; the idea is to compare the detected pattern with the bounding rectangle (BoxOut) that encompasses it. A score is computed from the intersections between the contour of the pattern and the four sides of the BoxOut, as shown in Fig. 4. This approach detects 95.65% of signs with 2.17% false alarms. The dataset used consists of 48 images of \(360\times 270\) pixels containing three different traffic signs. The disadvantage of this approach is its weakness to occlusions and noise.

Fig. 4

The detection approach proposed in [11]: a rectangle, b triangle, c circle, d octagon

Recently, a new circle detection algorithm, EDCircles [75], was used in [76] to detect circular traffic signs. The algorithm first uses Edge Drawing Parameter Free (EDPF) to detect edges in grayscale images, then extracts circular arcs from the edges and combines arcs of similar radius; finally, the candidate circles are validated. The detection accuracy achieved on the GTSDB dataset is 93.78% with 0.99% false positives for prohibitive signs and 75.51% with 2.04% false positives for mandatory signs.

Table 2 summarizes the shape-based methods; it is clear from the table that these detection methods do not use the same database, which prevents us from comparing them to one another. The authors of [29] use a big dataset with large images and reach an acceptable detection rate, but one that is still very far from real-time application. The method used in [11] runs in real time, but its dataset is small, which keeps us from considering it a highly efficient method.

Table 2 Shape-based detection methods

2.3 Learning-based methods

The previous methods share a common weakness to several factors such as lighting changes, occlusions, scale changes, rotations, and translations. These problems can also be treated using machine learning, but this requires a large database of annotated data.

A cascade of detectors of increasing complexity is used by Viola and Jones [33]: each detector is a set of classifiers based on Haar wavelets, trained with the AdaBoost learning algorithm.

Authors in [34] used the Viola–Jones detector to detect triangular traffic signs. The detector was trained on about 1000 images of relatively poor quality and achieved a very high true positive rate (from 90 to 96%) depending on the training set and the detector configuration. In their experiments [34, 35], they observed two main weaknesses of the Viola–Jones detector:

  • the requirement of a large number of training images;

  • high false positive rates.

Nevertheless, their research also indicates that the Viola–Jones detector is robust to noise and low-quality training data [36].

Bario et al. [37] proposed an attentional cascade composed of a set of classifiers, where the input of each stage of the cascade is the region of interest detected by the previous classifier; the AdaBoost algorithm is used to train the classifiers. They also proposed a classification strategy, Forest-ECOC (Error-Correcting Output Codes), to overcome the multiclass problem; the idea is to embed several trees in the ECOC framework. The authors obtained the following results:

  • Prohibition circle: 70% of detections with 3.65% false positives per image;

  • Obligation circle: 60% of detections with 0.95% false positives per image;

  • Danger triangle: 65% of detections with 2.25% false positives per image;

  • Right-of-way triangle: 75% of detections with 2.8% false positives per image.

The processing rate is not given because the algorithm operates offline. The case of rectangular signs is not addressed, and color information is not used.

Priscariu et al. [38] used an AdaBoost classifier based on the Viola–Jones detector, followed by an SVM operating on normalized RGB channels. The system is robust to motion blur thanks to 3D region-based tracking.

Chen et al. [85] combine AdaBoost and support vector regression (SVR) to detect traffic signs. The proposed approach is evaluated on three datasets (GTSD, BTSD, STSD) using an Intel Core i7-4770 with 8 GB RAM. The approach is not real time: the detection time varies from 0.05 to 0.5 s, and the training time is 16 min. The recall obtained on STSD is 80.85% with a precision of 94.52%; Table 3 shows the results achieved on GTSD and BTSD.

Table 3 Results achieved on GTSD and BTSD by Chen et al. [85]

Researchers in [19] used genetic algorithms in the detection step, applying a parallel search in different directions followed by an optimization process that mimics natural evolution and selection. Although their approach is robust to scale changes, rotation, weather conditions, and partial occlusion, it is not a real-time application. Neural networks are used in [39] to recognize the shape of signs, but this process is slow, at 2 s/image.

Zaklouta et al. [40] use the histogram of oriented gradients (HOG), proposed by Dalal and Triggs [41] for pedestrian detection, thanks to its scale invariance, local contrast normalization, coarse spatial binning, and weighted gradient orientations. The HOG descriptors are computed and used as features to train a linear SVM classifier. To improve the precision of the SVM detector, they use a morphological operator (black-hat) to filter the detected candidates. The black-hat transform is defined as the difference between the closing of the image and the input image. A large part of the image is eliminated using this filter, and the number of false alarms is reduced.
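The black-hat filtering step can be sketched with OpenCV as follows; the kernel size and the response threshold are illustrative assumptions, not the parameters of [40]:

```python
import cv2

# Black-hat = closing(image) - image: it highlights dark structures on a
# brighter background, such as sign pictograms on a light sign plate.
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input frame
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
# Candidate windows with a weak black-hat response can be discarded before
# the HOG + linear SVM stage, reducing the number of false alarms.
mask = blackhat > 20                                    # illustrative threshold
```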

The HOG descriptor is also used by Wang et al. [42], with LDA and SVM as classifiers. The proposed approach achieves high recall and precision ratios on the GTSDB dataset; it is robust to bad lighting conditions, partial occlusion, low quality, and small projective deformations, but it is not real time.

In [43], the authors use the German Traffic Sign Detection Benchmark, presented as a competition at IJCNN 2013 (International Joint Conference on Neural Networks), to evaluate some of the most popular detection approaches, such as the Viola–Jones detector based on Haar features and a linear classifier relying on HOG descriptors; they also evaluate a recent algorithm exploiting shape and color in a model-based Hough-like voting scheme presented in [44]. The detection rates of the three algorithms are presented in Table 4.

Table 4 Detection rate of algorithms cited in Houben et al. [43]

Salti et al. [84] propose an approach based on interest region extraction rather than sliding-window detection. The authors test their approach on a ground-truth dataset containing 6580 images. The dataset was created using 5 cameras to correctly geo-reference the signs; however, using 5 cameras is not practical for a real-time application because it increases the execution time (instead of processing 1 image, 5 images must be processed). They achieved 78.21% for prohibitory signs, 82.13% for danger signs, and 72.78% for mandatory signs.

An Integral Channel Features detector built on HOG features and boosted decision trees is used in [45]. On the BTSD dataset, they obtain 97.96% for mandatory signs, 97.40% for warning signs, and 94.44% for prohibitory signs; on the GTSDB dataset, they reach 96.98% for mandatory signs and 100% for both warning and prohibitory signs.

To address the multiclass traffic sign detection (TSD) problem, [48] presented a traffic sign localization framework capable of rapidly detecting multiclass traffic signs in high-resolution images. They proposed three new ideas. First, multi-block normalized local binary patterns (MN-LBP) and tilted MN-LBP (TMN-LBP) are used as discriminant features to represent multiclass traffic signs effectively. Second, a tree structure called the Split-Flow Cascade (SFC) tree is designed, which uses features common to multiple classes to build a coarse-to-fine pyramid structure. Third, to train an efficient SFC tree, a Common-Finder AdaBoost (CF AdaBoost) is developed to find common features among different training sets. These contributions make the system work in real time with high recall, and they obtain good results on the GTSDB dataset: 100% for prohibitory signs, 99.2% for warning signs, 98.57% for mandatory signs, and 97.24% for other signs.

In [47], a good run time is achieved (6–8 fps on video sequences). The solution presents a novel approach, called the Categories-First-Assigned Tree (CFA-Tree), which integrates the detection and classification phases in one module. This system has a high accuracy of about 93.5%; however, the search tree can only detect three categories and has low efficiency in handling high-resolution images [48].

Due to the success of CNNs in traffic sign classification, the authors in [77] propose a lightweight, optimized ConvNet used with a sliding window to detect traffic signs in high-resolution images. The detection accuracy achieved on the German Traffic Sign Detection Benchmark is 99.89%. The execution time on a GPU (GeForce GTX 980) is 26.506 ms per frame, equivalent to processing about 37.7 frames per s, which makes this approach a real-time application.

Wu et al. [46] use convolutional neural networks (CNNs) to localize and recognize traffic signs. First, they use a support vector machine to transform the original image from RGB to grayscale, to avoid sensitivity to color differences under various lighting conditions; second, they use the fixed layers of the CNN to localize regions of interest similar to traffic signs, while the learnable layers extract discriminant features for classification. Using GTSDB as the dataset, they obtain 99.73% on warning signs and 97.62% on mandatory signs, but the approach is too far from a real-time application.

Cascaded convolutional neural networks (CNNs) have recently been used in [83] to reduce the false positive regions detected by a local binary pattern (LBP) feature detector combined with an AdaBoost classifier. The results achieved on GTSDB using an Intel Core 2 Duo 2.2 GHz are illustrated in Table 5.

In Table 7, we summarize the detection methods based on machine learning. The best accuracies (>99%) are achieved by [84], [45], and [42]; however, can these methods really become commercial applications and maintain high recall and precision on ground truth under other, more complex conditions? Although they give the best results, they are not real-time applications. On the other hand, [82] achieves good results with real-time execution (35 ms). Overall, most learning-based detection methods achieve a detection rate higher than 95%, so it is time to create new, more complicated datasets, because the current ones are saturated.

Table 5 Results achieved by Zang et al. [83]

2.4 Publicly available detection datasets

  • The German Traffic Sign Detection Benchmark—GTSDB dataset [43]: The GTSDB is a single-image traffic sign detection benchmark. It consists of 900 images of \(1360\times 800\) pixels, divided into 600 training images and 300 evaluation images, and classified into three categories: mandatory, warning, and prohibitory. It offers an online evaluation system with immediate analysis and ranking of the submitted results.

  • The Belgium Traffic Sign Dataset—BTSD dataset [49]: It consists of more than 10,000 annotations; images are divided into three categories: mandatory, warning, and prohibitory. It contains four video sequences captured in Belgium, which can be used for tracking experiments.

  • Laboratory for Intelligent and Safe Automobiles—LISA dataset [50]: It contains videos and annotated frames. It is composed of 7855 images covering 47 categories of traffic signs, of which only 6610 are annotated; image sizes vary from \(640\times 480\) to \(1024\times 522\).

  • STSD dataset (Swedish Traffic Signs Dataset) [31]: It consists of more than 20,000 images created from recordings over 350 km of Swedish highways and city roads; every fifth frame of each sequence is manually annotated. STSD sequences can be used for tracking applications.

  • DITS dataset (Data Set of Italian Traffic Signs) [81]: A recent dataset generated from 43,289 images extracted from 14 h of video (\(1280\times 720\) at 10 fps) captured in Italy during day and night. The detection subset is composed of 1416 training images and 471 test images, accompanied by annotation text files; three shape-based super-classes are defined: Prohibitive, Indication, and Warning.

Traffic sign detection datasets are summarized in Table 6.

Table 6 Publicly available traffic sign detection datasets
Table 7 Learning-based detection methods

3 Classification methods

In this section, we highlight some recent and efficient traffic sign classification methods. First, we describe methods using hand-crafted features such as HOG, LBP, SIFT, and BRISK; then, we cite deep learning methods that have surpassed human performance (Table 7).

3.1 Learning methods based on hand-crafted features

Zaklouta et al. [52] used histogram of oriented gradients (HOG) descriptors of different sizes and Distance Transforms to evaluate the performance of K-d trees and random forests; random forests are more robust to background variations than K-d trees. The classification rate achieved by the random forest is 97.2% with HOG descriptors and 81.8% with the Distance Transform; K-d trees reach 92.9% and 67%, respectively.

Ellahyani et al. [86] compute histogram of oriented gradients (HOG) features in the HSI color space and combine them with local self-similarity (LSS) features, using a random forest as classifier. The recognition rate achieved is 97.43% on the GTSDB, and 94.21% for the entire system at 8–10 frames/s.

Authors in [53] compared the traffic sign recognition performance of humans and machine learning methods; they also show the results of a linear classifier trained by linear discriminant analysis (LDA). The performance of LDA depended on the feature representation: the authors achieved their best result, 95.68% accuracy, with the HOG2 representation, versus 93.18% for HOG1 and 92.34% for HOG3.

The analysis of training data carried out in [62] demonstrates an imbalance in the distribution of samples across traffic sign classes: the biggest class can contain more than 1000 images while the smallest contains only a few. This imbalance can negatively impact classification performance. To overcome this problem, the authors proposed a hierarchical classification method for traffic sign recognition, with a classification tree composed of two layers. In the first layer, an AdaBoost classifier combined with Aggregate Channel Features (ACF) classifies signs into three categories according to their geometric shape. The ACF representation uses 10 channels (the three RGB color channels, the gradient magnitude, and six oriented gradient maps: horizontal, vertical, 30°, 60°, 120°, and 150°), and these features are used to train the AdaBoost classifier. In the second layer, a traffic sign is identified by a random forest classifier trained on three features: histogram of oriented gradients (HOG), local binary patterns (LBP), and HSV. They achieved an accuracy of 95.97% on the GTSRB dataset and 97.94% on the STSD dataset.

Yakimov et al. [61] achieved real-time traffic sign recognition using CUDA multithreaded programming on a mobile GPU, an Nvidia Tegra K1, which contains 192 graphics cores and 4 ARM CPU cores. The authors proposed a modified generalized Hough transform (GHT) algorithm to classify traffic signs, and they show that the algorithm achieves a good compromise between execution time and accuracy on preprocessed images compared with the results obtained in [43], as illustrated in Table 8. The German traffic sign dataset was used, but only 9987 of the 50,000 images were taken into account.

Table 8 Results achieved by Yakimov [61]

LBP and HOG features are used by Li et al. [74] to recognize traffic signs. The accuracy achieved on the GTSRB dataset is 95.16% for HOG and 95.38% for LBP. Authors in [76] use three categories of feature descriptors, HOG, LBP, and Gabor filters, as input features to an SVM; the results obtained are shown in Table 9.

Table 9 Results achieved by Li et al.  [74]

He and Dai [72] propose a new variant of the local binary pattern (LBP), named multiscale center-symmetric local binary pattern (MS-CSLBP), used as a local feature, together with the low-frequency coefficients of the discrete wavelet transform (DWT) as a global feature, to classify traffic signs. The main difference between LBP and CSLBP is that the latter compares each neighbor with the pixel center-symmetric to it rather than with the center pixel, which reduces the dimension of the feature vector from \(2^N\) to \(2^{N/2}\), as illustrated in the following equation:

$$\begin{aligned} \begin{aligned}&CSLBP_R^N=\sum _{i=0}^{\frac{N}{2}-1}S(g_{i}-g_{i+N/2})\,2^{i}\\&S(x)={\left\{ \begin{array}{ll}1 &{} x >\tau \\ 0 &{} \hbox {otherwise} \end{array}\right. } \end{aligned} \end{aligned}$$
(20)

A threshold \(\tau \) is used to reduce the influence of noise. CSLBP reduces the dimension of the feature vector and is very simple to compute compared with HOG and SIFT; however, a single-scale CSLBP feature is not discriminative enough to represent traffic signs. To solve this problem, the authors compute CSLBP at multiple scales (radius = 1, 2, and 3). The MS-CSLBP is then used as the input feature to classify traffic signs with an SVM; the results achieved by [72] on the GTSRB dataset are compared with two algorithms of [73], as shown in Table 10.
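A minimal sketch of the single-scale CSLBP of Eq. (20) for \(N=8\) neighbors (hence a 4-bit code per pixel) follows; the value of \(\tau \) is an assumption for an image with float intensities in [0, 1]:

```python
import numpy as np

def cslbp(img, radius=1, tau=0.01):
    """Single-scale CSLBP (Eq. (20)) with N = 8 sampled neighbors."""
    offs = [(-radius, 0), (-radius, radius), (0, radius), (radius, radius),
            (radius, 0), (radius, -radius), (0, -radius), (-radius, -radius)]
    img = img.astype(np.float64)
    h, w = img.shape

    def shifted(dy, dx):                 # neighbor image, cropped to valid area
        return img[radius + dy:h - radius + dy, radius + dx:w - radius + dx]

    code = np.zeros((h - 2 * radius, w - 2 * radius), dtype=np.uint8)
    for i in range(4):                   # N/2 = 4 center-symmetric pairs
        g_i, g_opp = offs[i], offs[i + 4]            # g_i and g_{i+N/2}
        diff = shifted(*g_i) - shifted(*g_opp)
        code |= (diff > tau).astype(np.uint8) << i   # S(g_i - g_{i+N/2}) 2^i
    return code                          # values in [0, 2^{N/2}) = [0, 16)
```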

Table 10 Results achieved by He et al. [72] compared to results of Tang et al. [73]

The SIFT feature matching algorithm is used in [66]; the authors test their approach on eight videos captured in India. The rate achieved ranges from 75 to 100%, with a false positive rate that does not exceed 2%. Hua et al. [67] also use SIFT, combined with a bag of visual words to construct a codebook, and then classify with an SVM. The dataset used for evaluation consists of 130 images (\(50\times 50\) pixels), and they obtain an accuracy of 93% with an execution time of 0.098 ms per image. To exploit the intrinsic structure of the pre-learned visual codebook, a new feature approach using group sparse coding is proposed in [71]; the recognition rate achieved is 97.83% on the GTSRB dataset.

The authors of [63] use the Speeded-Up Robust Features (SURF) descriptor with an Artificial Neural Network (ANN) classifier to recognize traffic signs. They create a new dataset of 200 images captured on highway roads of Bangladesh under different weather and illumination conditions. The true positive rate achieved is 97% with a 3% false positive rate; however, despite these good results, the execution time is not mentioned and their dataset does not contain all traffic sign types.

The authors of [64] combine SURF with a K-nearest neighbor (K-NN) search and propose a novel feature selection strategy. Unhelpful interest points are eliminated by thresholding the determinant of the Hessian matrix: only interest points with a larger determinant are kept. To find good matches, they compute the first and second minimum distances; a good match is then defined as follows:

$$\begin{aligned} \frac{d_{1}}{d_{2}}<t_{1} \quad \hbox {and} \quad d_{1}<t_{2} \end{aligned}$$
(21)

\(d_{1}\), \(d_{2}\): the first and the second minimum distances; \(t_{1}\), \(t_{2}\): relative and absolute thresholds, respectively.
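This matching rule can be sketched as a brute-force nearest-neighbor search; the threshold values below are illustrative assumptions:

```python
import numpy as np

def good_matches(query_desc, train_desc, t1=0.7, t2=0.3):
    """Keep the matches satisfying Eq. (21); assumes train_desc has at
    least two descriptors so a second minimum distance exists."""
    matches = []
    for qi, q in enumerate(query_desc):
        d = np.linalg.norm(train_desc - q, axis=1)   # distances to all candidates
        i1, i2 = np.argsort(d)[:2]                   # two nearest neighbors
        if d[i1] / (d[i2] + 1e-12) < t1 and d[i1] < t2:
            matches.append((qi, i1))
    return matches
```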

To evaluate their approach, they create a new dataset including more than 1200 images. They obtain a total false recognition rate of 12.81% and a false classification rate of 0.99%. The proposed approach is not robust to blur: after eliminating blurred images from the dataset, the total false recognition rate drops to 4%.

Chen et al. [69] use SURF features to classify traffic signs. First, they divide the template signs into eight categories based on color and train AdaBoost classifiers to reduce processing time; then an approximate nearest neighbor (ANN) algorithm is used for the matching step. The recognition accuracy achieved is 92.7% on 200 images containing 281 traffic signs.

The SURF descriptor is also chosen in [11] because of its efficient execution time and its robustness to lighting changes compared with SIFT and PCA-SIFT. The authors create a template model for each sign class to eliminate the interest points detected in the background and keep only the points inside the sign. The recognition rate obtained is 97.72% on 48 images.

Hoferlin et al. [70] present an architecture for the recognition of circular traffic signs consisting of two multilayer perceptrons (MLPs): the first uses SIFT as input features, and the second uses SURF. System performance is evaluated on a 30-min sequence containing 133 traffic signs, achieving a rate of 96.4%.

A comparative analysis of three feature matching techniques, SIFT, SURF, and Binary Robust Invariant Scalable Keypoints (BRISK), is presented in [65]. The authors create a new dataset of 172 images classified into 32 categories; after evaluation, they observed that SIFT outperforms SURF and BRISK. In execution time, BRISK is almost twice as fast as SIFT, but still not real time. The authors evaluate the system in two different scenarios: in the first, signs are manually segmented, while in the second, signs are the output of a segmentation step. Table 11 shows the results obtained, which demonstrate that classification performance depends on the results of the segmentation step.

Table 11 Results achieved by Malik et al. [65]

Recently, SURF was chosen in [68], where the authors consider it the method achieving the best results in a reasonable time. The accuracy achieved is 94.28% on their 179-image dataset, with an execution time of 0.04 ms per SURF keypoint and 5 ms per image. In this approach, objects detected in similar areas of the scene are merged into one object, so two superimposed signs can be treated as a single sign, which can affect the matching performance.

Learning methods based on hand-crafted features are less accurate than ConvNets; however, neither kind of method is scalable: they cannot classify signs from another region or from a novel dataset. Due to their scalability, visual attributes are used in [89], combined with Bayesian networks that can estimate the probable class of a novel input from the observations. The classification rates achieved on GTSRB are 97.01% for speed limit signs, 97.09% for mandatory signs, and 96.31% for danger signs, among other classes; overall, they achieved 98.04%.

We summarize the classification methods based on hand-crafted features in Table 12. Although all these methods achieve a good classification rate, there is not yet a method combining a high classification rate and few false positives with real-time processing; this remains an open field of research.

Table 12 Traffic sign classification methods based on hand-crafted features

3.2 Deep learning methods

Traditional hand-crafted features have a limited representation power and are strongly tied to expert knowledge; consequently, they cannot remain discriminative on a very large dataset. To overcome this problem and push recognition performance, deep features are necessary. Sermanet and LeCun [54] use Convolutional Networks (ConvNets) to learn invariant traffic sign features in a supervised way, using \(32\times 32\) color input images of the GTSRB dataset, and reach an accuracy of 98.97%, which is above human performance (98.81%). Moreover, by increasing the network capacity and depth while ignoring color information, they established a new record of 99.17%. Since the best result was obtained without color information, they suspected that normalized color channels may be more informative than raw color. This method is still far from a real-time application.
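For illustration, a small ConvNet for \(32\times 32\) traffic sign images can be sketched in PyTorch as follows; the layer sizes are assumptions for the sketch and do not reproduce the exact architecture of [54]:

```python
import torch.nn as nn

class SmallConvNet(nn.Module):
    """Minimal ConvNet sketch for 32x32 inputs and 43 GTSRB classes."""
    def __init__(self, n_classes=43, in_channels=3):  # in_channels=1 for grayscale
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                        # 64 maps of 8x8 after two pools
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):                        # x: (batch, channels, 32, 32)
        return self.classifier(self.features(x))
```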

Table 13 Accuracies and time processing obtained by Aghdam et al. [58] compared with results of [60]

The fully connected layers of a convolutional neural network (CNN) are trained through back-propagation; the sensitivity of back-propagation and the overtraining of the fully connected layers can make the generalization performance of a CNN suboptimal. Ordinarily, authors use the CNN as both feature extractor and classifier and do obtain impressive results, but with vast, complex networks on huge datasets. In contrast, the authors in [57] propose an approach where the CNN works only as a deep feature extractor: the first eight layers are retained and the fully connected layers are eliminated. An extreme learning machine (ELM), which receives its input from the CNN, is then used as the classifier due to its generalization performance. The proposed method takes 5–6 h to train without GPU implementation and achieves a recognition rate of 99.40% without any data augmentation or preprocessing as in [56], but it is not robust to motion blur.

Qian et al. [87] also use a CNN as a feature extractor and a multilayer perceptron (MLP) as the classifier. Compared with a classical ConvNet, in the max pooling layer the authors do not use the max values but their positions: the max pooling positions (MPPs) encode each max value position as a 4-bit binary value, and all max position values are then concatenated to obtain the MPP feature. The accuracy achieved using MPPs is boosted to 98.86% on GTSRB.

Xie et al. [88] observe that 80% of misclassified signs share the same color, shape, and pictogram. To overcome this problem, they propose a two-stage cascaded CNN: the first-stage CNN is trained on the class labels, while the second stage is trained separately on super-classes defined by shape and pictogram. The accuracy of the proposed method is 97.94% on GTSRB, and the cascaded CNN decreases the number of misclassified signs from 430 to 202. The execution time is not mentioned.

A new ConvNet architecture proposed in [58] reduces the number of parameters by 27, 22, and 3% compared with the ConvNets used in [54, 59, 60], respectively; the authors then propose a compact ConvNet that reduces the number of parameters by a further 52% compared with their proposed ConvNet. To improve classification accuracy, they also propose a method for creating an optimal ensemble of ConvNets, selecting the smallest number of ConvNets with the highest possible accuracy, with 88 and 73% fewer arithmetic operations than [59, 60], respectively. The accuracy achieved by their method on the GTSRB dataset is 99.23% with 2 ConvNets (compact ConvNet) and 99.61% with only 5 ConvNets, which greatly reduces the execution time, as illustrated in Table 13. By comparison, [54] reaches an accuracy of 99.46% with an ensemble of 25 ConvNets, and [60] achieves 99.65% using 20 ConvNets. To test the scalability and cross-dataset performance of their ConvNet, [58] use the network already trained on GTSRB to identify the traffic signs of the BTSC dataset, obtaining an accuracy of 92.12%.

Table 14 Deep learning classification methods
Table 15 Available classification datasets

A simple deep neural network architecture is used in [80] to recognize circular traffic signs, achieving a recognition rate of 97.5% on GTSRB. Two other CNN architectures are proposed in [81]: the single-scale architecture consists of two stages of convolutional layers and two local fully connected layers followed by a softmax classifier; in the multiscale architecture, the output of the first convolutional layer feeds both the second convolutional layer and the first fully connected layer. The two architectures are evaluated on the GTSRB dataset and on their novel DITS dataset (Data Set of Italian Traffic Signs). The accuracy achieved on GTSRB is 97.2% for the single-scale architecture and 98.2% for the multiscale one; on DITS, the accuracy is 93.1% for single scale and 95.0% for multiscale.

Ciresan et al. [55] used a GPU implementation of a convolutional neural network. They preprocessed the input images by resizing them to \(48\times 48\) pixels and tested three types of normalization to overcome high contrast variations, obtaining a recognition rate of 99.15% using a committee of multilayer perceptrons (MLPs) trained on HOG feature descriptors and CNNs trained on raw pixel intensities. They then won the final phase of the German traffic sign recognition benchmark with a recognition rate of 99.46% [56], using a multi-column deep neural network (MCDNN) that averages the output activations of several DNN columns.

Aghdam et al. [78] propose a new CNN architecture that sets a new classification accuracy record of 99.51%. The proposed architecture reduces the number of parameters by 85% and the number of multiplications by 88% compared with the winning network of the German Traffic Sign Benchmark competition [56]. This reduction is achieved by using Leaky Rectified Linear Units (Leaky ReLU) [79] as the activation function, which needs only one comparison and one multiplication to compute its output.

A new traffic sign classification record is achieved again by Aghdam et al. [77], who propose a new variant of their previous CNN [78]. The authors replace the color images with grayscale ones and remove the linear transformation layer. To increase flexibility, a fully connected layer is added to the network; they also reduce the size of the first and middle kernels, and the input images are resized to \(44\times 44\) pixels to reduce processing time. The new best accuracy achieved is 99.55% with a single CNN and 99.70% with an ensemble of 3 CNNs. The new CNN is real time, with a processing time of 0.7 ms per image.

In Table 14, we summarize deep learning methods.

3.3 Publicly available classification datasets

  • German Traffic Sign Recognition Benchmark—GTSRB dataset [53]: The GTSRB classification dataset is composed of 43 classes containing more than 50,000 images. Each class contains at least 9 traffic signs, with sizes varying between \(15\times 15\) and \(222\times 193\) pixels. To allow researchers without an image processing background to participate in the competition, three pre-calculated feature sets are provided: three configurations of HOG features (HOG1, HOG2, HOG3), Haar-like features, and hue histograms (Table 15).

  • Belgium Traffic Sign Classification—BTSC dataset [49]: The BTSC dataset is an extraction of the regions of interest containing traffic signs in the BTSD dataset. It is composed of more than 4000 training images, classified into 62 classes, and more than 2000 testing images.

  • Revised MASTIF dataset [51]: A subset of the MASTIF dataset obtained by extracting traffic sign examples from its regions of interest. It is composed of 4028 training images and 1644 testing images, distributed over 30 classes.

  • DITS dataset (Data Set of Italian Traffic Signs) [81]: The classification subset is composed of 8048 training images and 1206 testing images. It contains 58 sign classes in total, with varying sizes.

4 Future research directions

The problem with the current state of the art is the lack of a universal ground-truth dataset containing signs from different regions (including regions not adhering to the Vienna Convention on road signs and signals) captured under different conditions. Since the available datasets have reached saturation, we regard a new, more complex, universal dataset as necessary. Another problem of the currently available datasets is the imbalance in the distribution of samples across traffic sign classes, which can negatively impact classification performance; to overcome this, a new balanced dataset is required.

For future research, we suggest using high-resolution images in detection datasets: if a car travels at 100 km/h and a sign is about 27 m away, the sign will be passed in roughly 1 s. For this reason, dataset images should be of high resolution so that distant signs can be detected clearly. Researchers can also focus more on the tracking module: if the system uses a camera shooting 30 frames per s, it must otherwise detect and recognize the signs in all 30 frames of each second; with a tracking module, signs already detected and tracked are not reclassified in every captured frame but only once, until new signs appear.

Traffic sign recognition systems are composed of detection and classification stages. Since classification performance depends on detection results, determining the best overall solution is a tedious process. As demonstrated in the introduction, the core of the problem is not detecting traffic signs with high recall; obtaining high precision is more crucial. We suggest focusing research on how to decrease false alarms in traffic sign detection.

To enhance the accuracy of the classification stage, researchers should focus on finding and analyzing more discriminant features that better represent the different classes of traffic signs. Currently, deep features are more discriminant than hand-crafted ones; nevertheless, there are no studies on learning methods proving their scalability to new datasets, which paves the way for future research.

5 Conclusion

In this paper, we have presented an overview of some recent and efficient traffic sign detection and classification methods. Detection methods are divided into three categories: color-based methods, classified according to the color space; shape-based methods; and learning-based methods, including deep learning. Recent detection methods achieve detection rates varying from 90 to 100% on the available datasets described briefly in this paper. Nevertheless, it is arduous to decide which method is the best.

Obtaining a high classification rate requires discriminative features and a powerful classifier. Acceptable results are achieved by learning methods using hand-crafted features; furthermore, classification performance is boosted by deep learning methods such as CNNs, which achieve accuracy rates above 99%. Since the available datasets have reached saturation, a new, more complex, universal dataset is indispensable.

Although detection and classification methods achieve high accuracy rates, they are still far from a real-time ADAS application, where signs must be detected and classified in real time.

The remaining question is: can recent traffic sign detection and classification methods achieve the same performance in real-world applications or on other ground-truth datasets? Can they achieve on smartphone devices the same real-time execution obtained in CPU and GPU environments? Finally, a universal traffic sign recognition system is still an open field of research.