1 Introduction

The control of traffic and promotion of road safety are both greatly aided by traffic signs. Accidents and traffic jams are still frequently brought on by people disobeying traffic laws, such as failing to follow traffic signs. A driver alert system with the ability to precisely detect and recognize traffic signs in real-time is necessary to address this problem.

In this study, we present a method for detecting and recognizing traffic signs in real time in a busy traffic environment. The proposed method converts images of the road scene to grayscale and filters them with simplified Gabor wavelets (SGW) to enhance the edges of traffic signs for improved detection [1]. The Maximally Stable Extremal Regions (MSER) approach is then used to extract the regions of interest (ROI). Finally, a classifier assigns each detected ROI to a traffic sign superclass.

Extensive tests were carried out on actual traffic scenarios to assess the detection accuracy, recognition accuracy, and processing speed of our suggested method. The findings demonstrate that our approach performs competitively in terms of accuracy and processing speed, qualifying it as a viable option for real-time traffic sign detection and identification.

In addition, we have explored the use of multiple machine learning algorithms with various feature extraction techniques to classify the detected ROI from the MSER [2]. This was done to compare the performance of different approaches and determine which is best suited for our proposed system. The results of these tests are included in our findings, providing a comprehensive evaluation of the effectiveness of our proposed approach.

The ultimate objective of the proposed method is to create a driver warning system that, by giving timely and precise notifications to drivers when they approach traffic signs, will dramatically minimize traffic rule breaches. A system like this has the potential to significantly improve traffic flow, lessen accidents, and improve road safety, resulting in more effective and sustainable transportation systems.

Authors' Contribution

  1. We recorded videos on busy local streets to capture real-world traffic conditions.

  2. We created a database of images extracted from these videos for testing purposes.

  3. We compared different feature descriptors, such as HOG and LBP.

2 Literature Review

The process of recognising traffic signs typically consists of two stages: the initial stage is traffic sign detection, which looks at the position and size of the signs in the images of the traffic scene, and the second is traffic sign recognition, which is focused on classifying the signs into the correct category. Traffic sign detection often relies on the colour and form characteristics of the signs, while traffic sign identification frequently uses classifiers like convolutional neural networks (CNN) and SVM with discriminative features.

Red, blue, and white make up the limited colour palette used on traffic signs, which helps distinguish them from their surroundings [3]. Colour-based detection is regarded as the most direct and basic approach to traffic sign detection. However, colour-based detection techniques frequently fall short of expectations under intense light, weak light, and unfavourable weather conditions.

Deep learning approaches have recently attracted many researchers in this field, as they have shown strong representational ability and produced remarkable performance in traffic sign identification. One approach combined a hinge-loss stochastic gradient descent method with a convolutional neural network and attained a high recognition rate. Another proposed a multi-column deep neural network running on a graphics processing unit (GPU) for traffic sign classification and achieved a recognition rate superior to that of a human. To categorise traffic signs, Qian employed a CNN as the classifier and learned discriminative features from max-pooling positions, achieving performance comparable to that of the state-of-the-art technique [4]. Nevertheless, to satisfy the needs of real-time applications, the selection of discriminative features and the network topology require further investigation in order to increase classification accuracy and processing speed.

For a long time, there were no challenging public datasets in this domain, but that changed in 2011. Larsson and Felsberg, as well as Stallkamp et al., presented challenging datasets containing annotations for traffic sign detection and classification. The available datasets include the German Traffic Sign Recognition Benchmark (GTSRB), the German Traffic Sign Detection Benchmark (GTSDB), and the Belgian Traffic Sign Classification (BTSC) dataset. The GTSDB and, in particular, the GTSRB have attracted many researchers, some of whom have used these databases to validate novel techniques.

3 Methodology

3.1 Dataset

The dataset is an important part of this approach. A substantial amount of traffic sign data is required to train and validate a deep convolutional neural network traffic sign recognition model, and such data is available in well-known traffic sign databases. The GTSRB and GTSDB traffic sign datasets are used in this work. These two datasets cover a wide range of challenging conditions, including tilted signs, uneven illumination, distracting elements, occlusion, and background colours similar to those of the signs, and they consist of images of real scenes (Fig. 1).

Fig. 1. Different traffic signs in the dataset

3.2 Proposed Method

Our proposed method has three steps: feature extraction, detection, and classification. First, we extract the road images from the streaming video and convert them to grayscale. In the feature extraction phase, we apply the simplified Gabor filter for edge enhancement and smoothing, using different parameter settings to smooth the image. In the resulting Gabor feature map, the borders of the traffic signs are reinforced and the regions inside the traffic signs are smoothed. In the detection phase, we use the maximally stable extremal regions (MSER) method to determine the regions of interest (ROI) and, based on our filter rules, discard regions that are unlikely to contain traffic signs.

The proposed regions were then categorised using several combinations of pre-processors and classifiers to determine the superclass of traffic signs to which each region belonged (Fig. 2).

Fig. 2. Flowchart of the methodology

4 Feature Extraction

Feature extraction is the process of converting raw data into numerical features that can be processed while retaining the information in the original data set. It generally produces better results than applying machine learning directly to raw data. Typical features include corners, edges, and regions of interest. In our system, the purpose of feature extraction is to distinguish the traffic sign from the background using image colour, shape, size, and orientation.

4.1 Gabor Filter

A Gabor filter is a linear filter that is used in image processing to identify edges, classify textures, and extract features. When a Gabor filter is applied to a picture, it produces the best results near the edges and where the texture varies. A bank of Gabor filters with varying orientations is used to analyse texture or extract features from images.

It is particularly useful for detecting edges and other texture features in images with complex textures. It is also used to smooth the non-edge areas of the images (Fig. 3).
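As an illustration of how a bank of oriented Gabor filters responds to edges, the following is a minimal NumPy sketch. It uses the standard (full) Gabor kernel rather than the simplified Gabor wavelets of our method, and the kernel size, sigma, wavelength, number of orientations, and all function names are assumptions chosen for the example.

```python
import numpy as np

def gabor_kernel(ksize, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor kernel of shape (ksize, ksize); ksize must be odd."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate system by theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength + psi)
    kernel = envelope * carrier
    # Assumption: remove the mean so that flat regions give a zero response.
    return kernel - kernel.mean()

def filter2d(image, kernel):
    """Same-size 2-D correlation with zero padding (slow, but dependency-free)."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out

def gabor_bank_response(image, n_orientations=4, ksize=9, sigma=2.0, wavelength=4.0):
    """Maximum absolute response over a bank of equally spaced orientations."""
    thetas = [k * np.pi / n_orientations for k in range(n_orientations)]
    responses = [np.abs(filter2d(image, gabor_kernel(ksize, sigma, t, wavelength)))
                 for t in thetas]
    return np.max(responses, axis=0)

# Example: a vertical step edge in the middle of a 32 x 32 image.
step = np.zeros((32, 32))
step[:, 16:] = 1.0
response = gabor_bank_response(step)
```

In this example the response is large along the step edge and close to zero inside the two flat halves, which is the behaviour exploited for edge enhancement.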

Fig. 3. Gabor filter output

4.2 Canny

An effective edge detection technique that is frequently used in image processing is the Canny feature extractor. It functions in several steps, including hysteresis thresholding, non-maximum suppression, gradient computation, and Gaussian blur. By examining an image’s intensity gradient, keeping just the local maximum in the gradient direction, and suppressing non-maximum gradients, it successfully finds edges. The precision of the Canny feature extractor’s edge detection while reducing noise and false positives is well recognised. It is a well-liked option in the field of image analysis and object identification since it generates binary edge maps, which are commonly used in several computer vision applications (Table 1).
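To make the final stage of the Canny detector concrete, the sketch below implements only the hysteresis thresholding step on a precomputed gradient magnitude map. The Gaussian blur, gradient computation, and non-maximum suppression stages are omitted, and the function name and threshold values are assumptions chosen for the example.

```python
import numpy as np

def hysteresis_threshold(magnitude, low, high):
    """Keep strong pixels (>= high) and weak pixels (>= low) 8-connected to them."""
    strong = magnitude >= high
    weak = magnitude >= low
    edges = np.zeros(magnitude.shape, dtype=bool)
    rows, cols = magnitude.shape
    # Start from every strong pixel and grow into connected weak pixels.
    stack = [(int(r), int(c)) for r, c in zip(*np.nonzero(strong))]
    for r, c in stack:
        edges[r, c] = True
    while stack:
        r, c = stack.pop()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if inside and weak[rr, cc] and not edges[rr, cc]:
                    edges[rr, cc] = True
                    stack.append((rr, cc))
    return edges

# Example: the two weak pixels next to the strong one are kept,
# while the isolated weak pixel at the end is discarded.
magnitude = np.array([[0.9, 0.4, 0.4, 0.0, 0.4]])
edge_map = hysteresis_threshold(magnitude, low=0.3, high=0.8)
```

The output is a binary edge map, which illustrates why information about gradient strength inside the image is no longer available after this step.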

Table 1. Comparison of the computational complexities of the Canny, TGW, and SGW algorithms

When compared to the Canny and traditional Gabor wavelet (TGW) approaches, the SGW-based strategy produced the best computational results.

Although the SGW was only slightly faster than Canny, the Canny method produces a binary image, which results in a significant loss of image information and could not guarantee adequate detection and classification results.

5 Traffic Sign Detection

Traffic signs are strictly classified by shape and colour, so an automated system can identify them from these attributes. As a result, traffic sign recognition depends on shape or colour.

In our suggested technique, we enhanced the shape information of traffic signs by using simplified Gabor filters to smooth the image and strengthen the edges. The Gabor feature maps were then processed by the MSER algorithm to find the regions of interest. Finally, the ROIs were classified into different signs.

5.1 MSER Algorithm

Maximally Stable Extremal Regions (MSER) is an image segmentation algorithm that identifies and extracts regions of interest (ROI) from an image, and it has many practical applications in computer vision and image processing. The algorithm works by identifying regions that are both extremal and stable with respect to an intensity threshold [5].

In this study, the MSER algorithm was employed to identify the regions of interest (ROI) in the image, with the delta parameter set to 18 in our implementation. A region in MSER is a connected component of pixels obtained by thresholding the image at a given intensity. The algorithm sweeps the threshold through the intensity range and tracks how each connected component grows and merges with its neighbours. Extremal regions are those whose pixels are all brighter (or all darker) than the pixels on their boundary, and the maximally stable regions are the ones whose area changes least as the threshold varies by delta, that is, the ones that persist across multiple thresholds.

The delta value of 18 was determined empirically, based on the characteristics of the specific image dataset used in this context. It enabled the algorithm to capture the regions of interest effectively. MSER demonstrates several advantages over other image segmentation methods, such as its robustness to noise and its ability to handle images with varying illumination and contrast. Its versatility has been leveraged in various applications, including object detection and text detection in images.
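The role of delta can be illustrated with a simplified, single-region sketch. For one seed pixel, it measures the area of the dark connected component containing the seed at every threshold and reports the threshold at which the relative area change over a range of plus or minus delta is smallest. This is not a full MSER implementation (which tracks all components at once), and the function names, the 4-connectivity, and the toy image are assumptions made for the example.

```python
import numpy as np
from collections import deque

def component_area(image, seed, threshold):
    """Area of the 4-connected region of pixels <= threshold that contains seed."""
    if image[seed] > threshold:
        return 0
    rows, cols = image.shape
    seen = np.zeros(image.shape, dtype=bool)
    seen[seed] = True
    queue = deque([seed])
    area = 0
    while queue:
        r, c = queue.popleft()
        area += 1
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= rr < rows and 0 <= cc < cols
                    and not seen[rr, cc] and image[rr, cc] <= threshold):
                seen[rr, cc] = True
                queue.append((rr, cc))
    return area

def most_stable_threshold(image, seed, delta=18):
    """Threshold minimising q(t) = (A(t + delta) - A(t - delta)) / A(t)."""
    areas = [component_area(image, seed, t) for t in range(256)]
    best_t, best_q = None, float("inf")
    for t in range(delta, 256 - delta):
        if areas[t - delta] == 0:
            continue  # the region does not exist yet at t - delta
        q = (areas[t + delta] - areas[t - delta]) / areas[t]
        if q < best_q:
            best_t, best_q = t, q
    return best_t, best_q

# Toy image: a dark 6 x 6 square (value 50) on a bright background (value 200).
toy = np.full((20, 20), 200, dtype=np.uint8)
toy[5:11, 5:11] = 50
stable_t, stable_q = most_stable_threshold(toy, seed=(7, 7), delta=18)
```

For the toy image the dark square keeps the same area over a wide range of thresholds, so its stability score is zero and it would be reported as a maximally stable region. A larger delta demands that a region persist over a wider intensity range before it is accepted.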

5.2 Image Pre-processing

To prepare the image data before feeding it to the different classifiers, we implemented both HOG and LBP image pre-processors. These techniques extract relevant features from the grayscale image data, which are then used as input for the subsequent classification steps. By using both HOG and LBP, we are able to capture complementary information about traffic signs, improving the accuracy of our system.

5.2.1 HOG

To determine which superclass each proposed region should be assigned to, we first extracted Histogram of Oriented Gradients (HOG) features from the SGW feature map; we call these SGW-HOG features [6]. We use the hog function from the Python library scikit-image (skimage). The HOG technique extracts features from the detected region of interest by analysing the orientation and distribution of gradients. To compute the HOG features, the road sign blob is first divided into smaller sections termed ‘cells’, and a histogram of gradient orientations is computed for each cell. Because gradient strengths vary with illumination and contrast, they must be locally normalised; the cells are therefore grouped into larger, spatially connected blocks, and the resulting histograms are normalised per block. This normalisation yields better invariance to variations in illumination and shadowing, and the result is a condensed representation of the image that is reasonably robust to changes in lighting, contrast, and image distortion. The HOG parameter values are orientations = 9, pixels_per_cell = (6, 6), and cells_per_block = (6, 6).

skimage computes the HOG by:

  1. computing the gradient image in x and y;

  2. computing gradient histograms;

  3. normalising across blocks.
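The three steps listed above can be sketched in plain NumPy as follows. This is a simplified illustration and not the skimage implementation: it assigns each pixel to a single orientation bin without interpolation and uses plain L2 block normalisation, and the function name and the 48 x 48 test patch are assumptions. The default parameter values are the ones stated in the text.

```python
import numpy as np

def hog_sketch(image, orientations=9, pixels_per_cell=(6, 6),
               cells_per_block=(6, 6), eps=1e-5):
    """Simplified HOG descriptor; image must span at least one full block."""
    image = image.astype(float)

    # Step 1: gradient image in x and y (central differences, zero at borders).
    gx = np.zeros_like(image)
    gy = np.zeros_like(image)
    gx[:, 1:-1] = image[:, 2:] - image[:, :-2]
    gy[1:-1, :] = image[2:, :] - image[:-2, :]
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation

    # Step 2: one magnitude-weighted orientation histogram per cell.
    ch, cw = pixels_per_cell
    n_rows, n_cols = image.shape[0] // ch, image.shape[1] // cw
    bin_width = 180.0 / orientations
    bin_index = np.minimum((angle / bin_width).astype(int), orientations - 1)
    cells = np.zeros((n_rows, n_cols, orientations))
    for r in range(n_rows):
        for c in range(n_cols):
            b = bin_index[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw].ravel()
            m = magnitude[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw].ravel()
            cells[r, c] = np.bincount(b, weights=m, minlength=orientations)

    # Step 3: L2-normalise the histograms over sliding blocks of cells.
    bh, bw = cells_per_block
    blocks = []
    for r in range(n_rows - bh + 1):
        for c in range(n_cols - bw + 1):
            block = cells[r:r + bh, c:c + bw].ravel()
            blocks.append(block / np.sqrt(np.sum(block ** 2) + eps ** 2))
    return np.concatenate(blocks)

# Example on a random 48 x 48 patch (8 x 8 cells, 3 x 3 block positions).
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(48, 48)).astype(float)
features = hog_sketch(patch)
```

With these parameters a 48 x 48 patch yields 3 x 3 block positions of 6 x 6 cells with 9 bins each, that is, 2916 values. Because of the block normalisation, doubling the brightness of the patch leaves the descriptor essentially unchanged.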

5.2.2 LBP

Local Binary Patterns (LBP) is a widely used feature extraction technique that describes the local texture of an image by comparing the intensity of each pixel with those of its neighbors. In our proposed system, we use LBP as one of the image pre-processors to extract features from the grayscale images of traffic signs [7]. Specifically, we divide the image into small regions and, for each region, calculate a histogram of the LBP values of its pixels. These histograms are combined to generate a feature vector, which reflects the image’s texture information. By using LBP as a feature extraction technique, we are able to capture the local texture and pattern information of the traffic signs, which can be useful for accurate classification. LBP is originally defined on a neighborhood of eight pixels, with the centre pixel’s grey value used as the threshold. Each neighbor is assigned a value of 1 if its value is greater than or equal to that of the centre pixel and 0 otherwise, and the resulting bits form the binary pattern of the centre pixel. The LBP parameter values are radius = 1, n_points = 8 * radius, and method = ‘uniform’.
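The thresholding rule and the histogram construction can be sketched as follows for radius 1 and 8 neighbors. As an assumption, the sketch samples the square 3 x 3 ring instead of interpolating points on a circle, and it follows the common convention for uniform patterns: a pattern with at most two 0/1 transitions around the ring is coded by its number of ones (0 to 8), and every other pattern receives the code 9. Function names are illustrative.

```python
import numpy as np

# Offsets of the 8 neighbours on the 3 x 3 ring, in circular order.
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_uniform_codes(image):
    """Uniform LBP code (0..9) for every interior pixel of a grayscale image."""
    image = image.astype(float)
    centre = image[1:-1, 1:-1]
    h, w = centre.shape
    # bits[k] is 1 where the k-th neighbour is >= the centre pixel.
    bits = np.stack([image[1 + dr:1 + dr + h, 1 + dc:1 + dc + w] >= centre
                     for dr, dc in RING]).astype(int)
    # Count 0/1 transitions when walking once around the ring.
    transitions = np.sum(bits != np.roll(bits, 1, axis=0), axis=0)
    ones = bits.sum(axis=0)
    return np.where(transitions <= 2, ones, 9)

def lbp_histogram(image):
    """Normalised 10-bin histogram of uniform LBP codes for one region."""
    codes = lbp_uniform_codes(image)
    hist = np.bincount(codes.ravel(), minlength=10).astype(float)
    return hist / hist.sum()

# A flat region: every neighbour equals the centre, so all bits are 1 (code 8).
flat_hist = lbp_histogram(np.full((5, 5), 7))

# An alternating ring has 8 transitions, so it is non-uniform (code 9).
alternating = np.array([[9, 0, 9], [0, 5, 0], [9, 0, 9]])
alternating_code = lbp_uniform_codes(alternating)
```

Concatenating the histograms of all regions of an ROI gives the LBP feature vector that is passed to the classifier.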

6 Traffic Sign Classification

We classified the ROIs detected by MSER using multiple machine learning algorithms combined with various feature extraction techniques. Specifically, we used three classifiers, namely Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Random Forest, and compared their performance in terms of accuracy, precision, recall, and F1 score. These tests were conducted to determine the best classifier for our proposed system and to provide a more comprehensive evaluation of the effectiveness of our approach. In each case, the classifier predicts the superclass of the traffic sign.
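For reference, the four evaluation measures can be computed from predicted and true labels as in the following sketch. The function name and the toy labels are assumptions made for the example and are not results from our experiments.

```python
def per_class_metrics(y_true, y_pred):
    """Overall accuracy plus precision, recall, and F1 score for each class."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    report = {"accuracy": sum(t == p for t, p in pairs) / len(pairs)}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report

# Toy example with the three superclasses (illustrative labels only).
y_true = ["circular", "circular", "triangular", "negative"]
y_pred = ["circular", "triangular", "triangular", "negative"]
metrics = per_class_metrics(y_true, y_pred)
```

In the toy example one circular sign is mistaken for a triangular one, which lowers the recall of the circular class and the precision of the triangular class while the negative class is unaffected.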

6.1 SVM

SVM is a strong and extensively used classification technique that has proven to be useful in a wide range of real-world applications. SVM has several advantages over other classification methods, including the ability to handle high-dimensional data, noise robustness, and effectiveness in dealing with non-linear data distributions. SVM has been utilised effectively in a range of applications, including image classification, due to its adaptability and dependability [8].

In our technique, we used a multi-class SVM classifier, which divides data into more than two groups depending on the attributes of the input data. The traffic signs had two superclasses (triangular and circular), and a third superclass represented negative data, so we trained a three-class SVM. This enabled us to categorise traffic signs efficiently, supporting precise detection and identification.

SVM’s capacity to handle high-dimensional data was especially useful in our approach because the input images yield a large number of features that need to be analysed in order to identify and recognise traffic signs effectively. Its resistance to noise and its ability to deal with non-linear data distributions further improve its performance in real-world applications.

Overall, the use of SVM in our approach proved to be highly effective, achieving high levels of accuracy in both detection and recognition tasks. We believe that SVM will continue to play an important role in the development of transportation systems, providing a reliable and efficient method for traffic sign detection and recognition.

6.2 Random Forest

Random Forest is a well-known and widely used classification and regression algorithm. It is an ensemble learning strategy that combines multiple decision trees and, for classification, takes the majority vote of their predictions to improve the performance of the model [9]. Random forest has the advantage of being able to accommodate missing values and a high number of features with a reduced risk of overfitting. It is simple, adaptable, and capable of handling binary and categorical data.
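The majority-vote step can be illustrated in a few lines. The sketch assumes that each decision tree has already produced one label per sample; training the trees is not shown, and the function name and the toy predictions are assumptions.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Combine per-tree label lists into one label per sample by majority vote."""
    n_samples = len(tree_predictions[0])
    return [
        Counter(tree[i] for tree in tree_predictions).most_common(1)[0][0]
        for i in range(n_samples)
    ]

# Three hypothetical trees, each predicting labels for two ROIs.
tree_predictions = [
    ["circular", "negative"],
    ["circular", "triangular"],
    ["triangular", "triangular"],
]
forest_labels = majority_vote(tree_predictions)
```

Each ROI receives the label chosen by most trees, so a single tree that errs on a sample is outvoted by the others.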

6.3 KNN

The k-nearest neighbor (k-NN) classifier is a non-parametric supervised learning technique for classification and regression. It categorises a new observation in the feature space according to the majority class of its k nearest neighbors. Euclidean distance, Manhattan distance, and Minkowski distance are among the commonly used metrics for measuring how close two observations are. The value of k, which is the number of nearest neighbors taken into account, is usually chosen through cross-validation or other tuning techniques. The k-NN algorithm is easy to understand and straightforward to implement. However, it can be computationally expensive for large datasets, and it is subject to the curse of dimensionality [10].

It is also known as a lazy learner algorithm because it does not build a model during training. Instead, it stores the training dataset and defers all computation until a new observation has to be classified.
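A minimal sketch of k-NN classification with Euclidean distance is given below. The two-dimensional feature vectors, the labels, and the function name are assumptions for illustration; in our system the inputs would be HOG or LBP feature vectors.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label of query by majority vote among its k nearest training points."""
    # No model is fitted: the stored training set is scanned at query time.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy training set with two well-separated classes.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["circular"] * 3 + ["triangular"] * 3

near_origin = knn_predict(train_X, train_y, (0.5, 0.5), k=3)
near_far_cluster = knn_predict(train_X, train_y, (5.5, 5.5), k=3)
```

Every prediction computes the distance to all stored samples, which is the source of the computational cost noted above for large datasets.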

7 Experimental Results

Based on our proposed approach for real-time traffic sign detection and recognition using simplified Gabor wavelets (SGW) and Maximally Stable Extremal Regions (MSER), we conducted extensive tests to evaluate the performance of our system using various machine learning classifiers and feature extraction techniques.

We specifically evaluated the performance of our suggested method, which uses Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP) as feature extraction techniques, against three classifiers: Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Random Forest (Table 2, Table 3).

Table 2. Comparison of SVM, Random Forest and KNN with LBP
Table 3. Comparison of SVM, Random Forest and KNN with HOG

The experimental results showed that HOG and LBP both yielded high accuracy, precision, recall, and F1 score for all three classifiers. However, we observed that HOG generally outperformed LBP in terms of accuracy and F1 score for all three classifiers.

SVM had the greatest accuracy and F1 score among the three classifiers for HOG. For LBP, on the other hand, the Random Forest classifier fared best, with the greatest accuracy and F1 score.

Overall, our experimental results demonstrate the effectiveness of our proposed approach for real-time traffic sign detection and recognition, with HOG as the preferred feature extraction technique and SVM as the preferred classifier for optimal performance (Fig. 4).

Fig. 4. LBP vs. HOG

8 Conclusion

Our proposed approach, which applies an SVM to HOG features extracted from Gabor filter feature maps, proved to be a promising method for real-time traffic sign detection and recognition. The results showed high accuracy and F1 score for circular and triangular shapes, which are common shapes for traffic signs. Additionally, we compared the performance of three classifiers (SVM, KNN, and Random Forest) with two feature extraction techniques (HOG and LBP) to further evaluate the effectiveness of our proposed approach.

For future improvement, we can explore the use of deep learning algorithms, such as Convolutional Neural Networks (CNNs), which have been shown to achieve state-of-the-art performance in image classification tasks. We can also investigate the use of more advanced feature extraction techniques, such as Scale-Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), to further enhance the detection and recognition of traffic signs.

Furthermore, we can extend the scope of our proposed approach to other road-related tasks, such as pedestrian detection, lane detection, and vehicle detection, to develop a more comprehensive and advanced driver assistance system. Ultimately, the implementation of our proposed approach and future improvements can lead to the development of more effective and sustainable transportation systems by reducing traffic rule breaches, improving traffic flow, and enhancing road safety.