1 Introduction

With the rapid development of the national economy, automobiles have entered millions of households. Cars have expanded people's range of activities, facilitated person-to-person exchanges, and become an indispensable tool in modern society. Smart cars have become a hot research field, and vehicles are gradually being made intelligent through different channels. In this context, the Advanced Driver Assistance System (ADAS) came into being and has come a long way. An ADAS uses vehicle-mounted sensors to monitor the driving state of the vehicle and to obtain and process road information in real time, which can effectively improve vehicle safety. With the integration of information technology and traditional automobiles, the intelligentization of automobiles has become an irreversible trend [23], and ADAS also has huge industrial prospects [4, 8]. Pavement traffic signs control and guide vehicles in traffic systems, and driving decisions such as turning and changing lanes must rely on road markings. Every year, tens of thousands of people are killed in traffic accidents caused by human error [5]. If road traffic sign detection technology is integrated into an ADAS to provide drivers with reliable road traffic sign information [17], the driver can use this information to enter the correct lane, which will effectively reduce the occurrence of traffic accidents. An ADAS can also use road marking recognition to locate the lane in which the vehicle is travelling and to plan the driving path [25], which will greatly improve the driving experience. Realizing a rapid, accurate and efficient detection system for road markings therefore has important research significance.

2 Related works

With the upgrading of the automobile industry and the intelligentization of on-board equipment, road traffic sign detection, as a branch of intelligent automobile systems, has also developed considerably in recent years. At present, pavement traffic sign identification mainly follows two approaches: lidar and vehicle-mounted cameras. Lidar is well suited to detecting three-dimensional moving objects, while cameras are more efficient for optical identification. Cameras are also more cost-effective and depend less on traditional mechanical structures. Therefore, detecting road markings from images collected by a vehicle camera is cheaper and has more practical significance for product promotion.

For the detection of crosswalk signs, the algorithm principles are somewhat similar to those for recognizing upright traffic signs, and scholars at home and abroad have produced a large number of research results. In 2004, Rebut [12] used global binarization and morphological operators to filter candidate regions of road markings, described the candidate regions with Fourier descriptors, and identified the road traffic sign type of each candidate region with a k-nearest-neighbor classifier. In 2009, Wang [19] et al. combined Haar wavelet features with an SVM classifier to identify road markings: the image is first transformed by inverse perspective mapping, the road markings between lane lines are then extracted by OTSU thresholding, and finally improved Haar wavelet features describe the target while a cascaded SVM classifies the traffic signs. In 2012, FAST (Features from Accelerated Segment Test) features and the Histogram of Oriented Gradients (HOG) were used to detect road signs [21]. The method first uses the Maximally Stable Extremal Regions (MSER) algorithm to extract regions of interest that may contain road markings, then extracts feature points in each region with the FAST detector and describes them with HOG features to generate feature vectors. A template is matched against the feature points and structural features of each region of interest to decide whether it is a road traffic sign. The road traffic sign detection algorithm proposed by Wu has fast calculation speed, a high recall rate and good results. In 2014, He [9] et al. proposed a local intersection feature (L-junction) to detect road markings: the image is first binarized, and the contours of connected components are extracted to detect contour nodes.
By coding the locations of and angles between the nodes, the string similarity between the target code and the sign template is calculated to identify the specific type of sign. In 2015, Suhr [18] et al. used prior information from lane lines to assist in identifying road markings. The two lane lines are detected first, and regions of interest between the lanes are then detected with a projection histogram. Each region of interest is described with HOG features and classified with a total-error-rate (TER)-based classifier. This method reduces the search area and improves the speed of the algorithm, but it depends heavily on lane recognition and cannot work when lane detection fails. In 2015, Chen [3] proposed an algorithm that uses BING (Binarized Normed Gradients for Objectness Estimation) to extract candidate regions and then classifies them with a principal component analysis network (PCANet). PCANet is easy to train and classifies road signs well on small sample sets. In 2016, Hyeon [11] proposed grouping connected components under convexity conditions to detect road signs: a difference-of-Gaussians method extracts the set of connected components on the pavement, and a convexity-condition algorithm groups them, alleviating the poor recognition of fragmented traffic signs on the road. In 2017, Bailo [1] et al. proposed an improved MSER method to extract road markings and a convolutional neural network to classify them. Their work proposes a robust region-of-interest extraction algorithm that extracts and merges regions of interest; finally, PCANet and SVM are used to classify traffic signs, and symbols such as numbers and letters can be identified.

Inspired by the above algorithms, a road traffic sign recognition algorithm is designed for domestic urban road scenes. The main contributions of this article are as follows:

  1. A road traffic sign extraction algorithm is proposed to segment road markings under various complex road conditions.

  2. HOG features combined with an SVM classifier are used to detect and identify nine kinds of road markings in China.

  3. The inter-frame relationship between consecutive frames is used to improve the robustness of the algorithm.

3 Method of road marking detection

3.1 Image preprocessing

The vehicle camera captures RGB three-channel color images, and most road markings are white. The identification of road markings does not depend strongly on color; it mainly depends on edge characteristics, and the colors actually captured in the road environment are seriously disturbed, so adding color information would reduce the robustness of the algorithm. Graying reduces the dimension of the image while retaining its overall and local edge and brightness information, effectively speeding up the algorithm.

In this paper, the weighted average method, the most commonly used graying method, is applied to the input image. Since the human eye is most sensitive to green and least sensitive to blue, the weighted average method combines red, green and blue in a ratio of 0.30:0.59:0.11, as shown in Eq. (1).

$$ f\left(i,j\right)=0.30R\left(i,j\right)+0.59G\left(i,j\right)+0.11B\left(i,j\right) $$
(1)

Noise generation is inevitable during image acquisition, transmission and storage. Part of the noise comes from the camera itself, such as sensor noise during acquisition; the rest is caused by the environment, such as reflective highlights on the road surface [22]. In this paper, a median filter is used to suppress the noise. To protect image details, a small 3 × 3 window is used for median filtering, retaining as much detail as possible.
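The preprocessing steps above can be sketched as follows; this is a minimal illustration of Eq. (1) and of 3 × 3 median filtering on plain nested lists, not the paper's actual implementation (which presumably operates on camera frames):

```python
from statistics import median

def to_gray(rgb):
    """Weighted-average graying, Eq. (1): 0.30 R + 0.59 G + 0.11 B."""
    h, w = len(rgb), len(rgb[0])
    return [[0.30 * rgb[i][j][0] + 0.59 * rgb[i][j][1] + 0.11 * rgb[i][j][2]
             for j in range(w)] for i in range(h)]

def median_filter_3x3(img):
    """3 x 3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = [img[i + di][j + dj] for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = median(window)
    return out
```

The small window keeps fine marking edges intact while removing isolated bright outliers such as road-surface highlights.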

3.2 Inverse perspective transform

Most image feature descriptors have good invariance to rotation, scale and translation, but not to perspective deformation. In the original perspective view, road markings undergo large perspective deformation and are difficult to detect, whereas in a top-down view they change only in rotation and scale. Inverse perspective mapping (IPM) is an image transformation that produces a top view (bird's-eye view), in which the features of road markings are much more distinct. Transforming the image from the original perspective to the top view by inverse perspective mapping is therefore of great significance.

According to the principle of camera imaging, Bertozzi [2] proposed an inverse perspective transformation based on camera parameters and successfully applied it to the GOLD autonomous driving system developed by his team. The basic principle is to determine the relationship between the world coordinate system and the image coordinate system, i.e., the mapping between the two coordinate systems.

Assume that the camera is located at (l, d, h) in the world coordinate system, the camera resolution is m × n pixels, the field of view is 2α × 2β, the yaw angle is γ, and the pitch angle is θ.

Using the formula put forward by Bertozzi, we can derive the relation between a point (x, y, 0) in the world coordinate system and a point (u, v) in the image coordinate system:

$$ \left\{{\displaystyle \begin{array}{l}\mathrm{x}\left(u,v\right)=h\times \cot \left[\left(\theta -\alpha \right)+u\times \frac{2\alpha }{m-1}\right]\times \cos \left[\left(\gamma -\beta \right)+v\times \frac{2\beta }{n-1}\right]+l\\ {}\mathrm{y}\left(u,v\right)=h\times \cot \left[\left(\theta -\alpha \right)+u\times \frac{2\alpha }{m-1}\right]\times \sin \left[\left(\gamma -\beta \right)+v\times \frac{2\beta }{n-1}\right]+d\end{array}}\right. $$
(2)

By inverting Eq. (2), a point (x, y, 0) on the ground plane of the world coordinate system can be expressed in terms of the point (u, v) in the image coordinate system:

$$ \left\{{\displaystyle \begin{array}{l}\mathrm{u}\left(x,y,0\right)=\frac{\arctan \left[\frac{h\times \sin \left(\arctan \frac{y-d}{x-l}\right)}{y-d}\right]-\left(\theta -\alpha \right)}{2\alpha /\left(m-1\right)}\\ {}\mathrm{v}\left(x,y,0\right)=\frac{\arctan \left(\frac{y-d}{x-l}\right)-\left(\gamma -\beta \right)}{2\beta /\left(n-1\right)}\end{array}}\right. $$
(3)

Through Eq. (3), we obtain the correspondence between any point (x, y, 0) on the bird's-eye view of the world coordinate system and the point (u, v) in the image coordinate system. By traversing x and y and assigning the gray value of the original image to the corresponding position of the top view, we obtain the inverse-perspective-transformed image.
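The ground-to-image mapping of Eq. (3) and the top-view sampling step can be sketched as below. This is an illustrative reading of the formulas, not the paper's code: the `cam` dictionary keys and the nearest-neighbor sampling are assumptions, and all angles are in radians.

```python
import math

def world_to_image(x, y, cam):
    """Map a ground-plane point (x, y, 0) to pixel coordinates (u, v), per Eq. (3).

    cam holds the Section 3.2 parameters: position (l, d, h), pitch theta,
    yaw gamma, half field-of-view angles alpha/beta, resolution m x n.
    """
    phi = math.atan2(y - cam["d"], x - cam["l"])   # horizontal direction to the point
    r = math.hypot(x - cam["l"], y - cam["d"])     # ground distance from the camera
    # Vertical viewing angle arctan(h / r), as in the first line of Eq. (3)
    u = (math.atan2(cam["h"], r) - (cam["theta"] - cam["alpha"])) \
        / (2 * cam["alpha"] / (cam["m"] - 1))
    v = (phi - (cam["gamma"] - cam["beta"])) / (2 * cam["beta"] / (cam["n"] - 1))
    return u, v

def build_top_view(gray, cam, x_range, y_range):
    """Fill a bird's-eye-view grid by sampling the source image at (u, v)."""
    rows = []
    for x in x_range:
        row = []
        for y in y_range:
            u, v = world_to_image(x, y, cam)
            ui, vi = int(round(u)), int(round(v))
            if 0 <= ui < len(gray) and 0 <= vi < len(gray[0]):
                row.append(gray[ui][vi])
            else:
                row.append(0)                      # outside the camera's view
        rows.append(row)
    return rows
```

As a sanity check, a point on the ground straight ahead at distance h / tan(θ) lies exactly on the optical axis and should map to the image center.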

In practical use, the position of the camera in the world coordinate system can be measured: the X-axis offset can default to 0, the Y-axis offset is the distance between the camera and the front of the vehicle, and the Z-axis offset is the camera height. The field of view and imaging resolution can be obtained from the camera's specific model. The remaining unknown parameters are the pitch angle θ and the yaw angle γ. When the vehicle runs along a straight road, its own pitch and yaw can be taken as 0, and θ and γ can then be estimated from the location of the vanishing point in the collected image together with the camera's field of view. Let (xp, yp) be the coordinates of the vanishing point in the original image:

$$ \left\{{\displaystyle \begin{array}{l}\gamma =\left({x}_p-\frac{\mathrm{m}}{2}\right)\times \frac{2\alpha }{m}\\ {}\theta =\left({y}_p-\frac{n}{2}\right)\times \frac{2\beta }{n}\end{array}}\right. $$
(4)

This research finds the vanishing point by detecting straight lines in the picture and solving for their intersection. The vanishing point does not need to be computed for every frame: when the camera position is fixed, detecting it once is enough to solve the IPM equations. Applying the inverse perspective transformation to the original image then yields the top view. The transformation result is shown in Fig. 1.
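Solving for the intersection of the detected lines can be done in least squares. The sketch below assumes each detected line is supplied in the form a·x + b·y = c (the paper does not state its line representation); with two or more non-parallel lines, the normal equations give the point closest to all of them:

```python
def vanishing_point(lines):
    """Least-squares intersection of lines a*x + b*y = c, given as (a, b, c)."""
    # Accumulate A^T A and A^T c for the two-unknown least-squares problem
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for a, b, c in lines:
        s_aa += a * a; s_ab += a * b; s_bb += b * b
        s_ac += a * c; s_bc += b * c
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; no unique intersection")
    # Cramer's rule on the 2 x 2 normal equations
    x = (s_bb * s_ac - s_ab * s_bc) / det
    y = (s_aa * s_bc - s_ab * s_ac) / det
    return x, y
```

In practice one would feed in the lane-line segments detected in a straight-road frame and reuse the resulting (xp, yp) until the camera is remounted.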

Fig. 1
figure 1

Result diagram of inverse perspective transformation

3.3 Target candidate region segmentation

Common target detection algorithms often generate candidate regions with a multiscale sliding window: the image is resized to different scales, and a fixed-size window is moved in fixed steps, with a detector applied to each window. This method is widely used in tasks such as pedestrian and vehicle recognition, but its search space is huge and its detection speed is slow. Road markings are white and brighter than the surrounding environment, so algorithms based on maximally stable extremal regions (MSER) [14] are often applied to segment road traffic sign elements; however, MSER is computationally slow and difficult to use in real-time systems. This paper proposes an image segmentation algorithm for road markings. In the picture, the gray values of the road markings and the road background are not uniform, which hinders extraction of the target elements. First, a top-hat transformation is applied to the original top-view image to preliminarily remove the road background. Reference [16] proposes binarizing a gray image row by row. In a road scene, if a marking element (road traffic sign or lane line) is present within a pixel interval of fixed length, it occupies roughly 50% of the pixels in that interval. Therefore, this paper processes the picture line by line: for each pixel, the median gray value of its neighboring pixels is computed and a fixed threshold is added; pixels above this value are considered marking elements, and pixels below it are considered background.
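The row-wise local-median thresholding described above can be sketched as follows. The window half-width and offset here are illustrative placeholders, not the paper's tuned values:

```python
from statistics import median

def segment_row_median(img, half_window=25, offset=40):
    """Row-wise local-median thresholding of a grayscale image.

    For each pixel, the median gray value of its horizontal neighbourhood is
    computed; pixels brighter than (median + offset) are kept as candidate
    marking elements (mask value 1), the rest as background (mask value 0).
    """
    h, w = len(img), len(img[0])
    mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            lo, hi = max(0, j - half_window), min(w, j + half_window + 1)
            local_median = median(img[i][lo:hi])
            mask[i][j] = 1 if img[i][j] > local_median + offset else 0
    return mask
```

Because the threshold adapts to each neighbourhood's median, a bright marking that covers only part of the window stands out even when the overall road brightness varies across the image.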

From the experimental results in Fig. 2, it can be seen that the algorithm segments well in a variety of situations and is robust to conditions such as road reflections and vehicle interference.

Fig. 2
figure 2

Results of segmentation algorithm. a Image to be segmented. b The results of segmentation algorithm

3.4 Candidate region screening

Because of the complexity of roads, vehicles and shadows can still interfere with marking extraction, and many connected components that are not road markings remain. In actual scenes, the size of road markings conforms to the national standard, so these components can be screened by certain criteria.

In this paper, a morphological closing operation is applied to the binary segmentation result to fill holes and connect discontinuous regions. For each connected component, the minimum bounding box is then computed, giving the major-axis length, the minor-axis length and the inclination angle σ [15], as shown in Fig. 3.

Fig. 3
figure 3

Minimum outer bounding box for the connection area

In order to remove connected components that are not road markings, the following features of each component can be considered:

  1. The major-axis length of the component's minimum bounding box.

  2. The minor-axis length of the component's minimum bounding box.

  3. The ratio H/W of the major-axis length to the minor-axis length.

  4. The angle between the component's minimum bounding box and the camera coordinate system.

  5. The fill ratio of the component, defined as the ratio of the area of the minimum bounding box to the area of the component.

The features extracted from real road markings fall within certain ranges. By collecting statistics on these feature distributions from actual samples, constraints can be imposed on each feature, and any connected component that fails the constraints can be rejected directly as a non-road-marking. Candidate-region screening removes a large number of negative samples, and the fewer the remaining components, the fewer classifications are required later. However, because road markings can be deformed, worn or non-standard, the constraints should be relaxed appropriately to ensure that all positive instances pass the screening.
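A minimal screening predicate over the five bounding-box features might look like the sketch below. The default limits are illustrative placeholders only; the paper derives its actual ranges statistically from real samples:

```python
def is_candidate(comp, limits=None):
    """Screen a connected component by its minimum-bounding-box features.

    comp is a dict with major/minor axis lengths, inclination angle in
    degrees, and fill ratio. The default limits are hypothetical values
    for illustration, not the paper's fitted ranges.
    """
    if limits is None:
        limits = {
            "major": (0.3, 6.0),    # metres in the top view (assumed units)
            "minor": (0.1, 1.5),
            "aspect": (1.0, 12.0),  # H / W
            "angle": (-30.0, 30.0),
            "fill": (0.2, 1.0),
        }
    aspect = comp["major"] / comp["minor"]
    return (limits["major"][0] <= comp["major"] <= limits["major"][1]
            and limits["minor"][0] <= comp["minor"] <= limits["minor"][1]
            and limits["aspect"][0] <= aspect <= limits["aspect"][1]
            and limits["angle"][0] <= comp["angle"] <= limits["angle"][1]
            and limits["fill"][0] <= comp["fill"] <= limits["fill"][1])
```

Under these example limits, an arrow-sized component passes while a very long, thin component such as a lane line is rejected, matching the behavior shown in Fig. 4.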

Figure 4 shows the result of candidate-region screening. The smaller regions and the long lane lines in the original segmentation result are all excluded; only the road traffic sign regions and a few other connected components remain, and the segmentation effect is good. The enclosing rectangular boxes mark the final candidate regions.

Fig. 4
figure 4

Candidate region screening. a Candidate region segmentation. b Minimum surrounding contour. c Candidate region screening results

4 Road marking recognition

4.1 Algorithmic framework

When identifying road markings, this paper first extracts features from each region of interest and then uses a machine-learning method, an offline-trained classifier, to decide whether these regions are road markings [13]. The principles and concrete operations are largely the same as in most target detection applications. Unlike deep-learning methods [10] or accurate lateral positioning from map data [6, 24], the detection process here is divided into three parts: training data preparation, model learning and target detection. First, a large amount of data containing targets is prepared, features are extracted, and a classification model is trained. At test time, the input picture is processed to extract regions of interest that may be markings, and the model then classifies them to obtain the specific detection results. The main process is shown in Fig. 5.

Fig. 5
figure 5

Road markings recognition algorithm flow chart

4.2 Feature extraction and training

  • Feature extraction and dimensionality reduction

This paper uses PCA to reduce the dimension of the feature. Feature dimensionality reduction maps high-dimensional data into a low-dimensional space. If there are too many feature parameters, the so-called "curse of dimensionality" occurs; the direct consequence is overfitting, where the model performs well on the training set but poorly on new data. Moreover, in the original high-dimensional space the features contain redundant and noisy information, which affects classification and reduces the accuracy of the system. Reducing the feature dimension removes redundant information from the feature vectors, improves the precision of the model, shortens its running time, and improves the recognition accuracy of the system.

How many dimensions to retain must be tuned experimentally. Through experiments, we analyze how keeping different numbers of feature dimensions affects the classification performance of the final SVM classifier. Since the detection system is also sensitive to running time, the time spent on dimensionality reduction is likewise taken as a measurement factor.
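The core of the PCA step, finding the leading directions of variance and projecting onto them, can be illustrated with a dependency-free power-iteration sketch. This computes only the first principal component for clarity (the paper keeps 150 of the 324 HOG dimensions, which in practice would use a full eigendecomposition):

```python
def pca_top_component(samples, iters=200):
    """Power iteration for the first principal component of a data set.

    samples is a list of equal-length feature vectors. Returns the mean
    vector and the unit-norm leading direction. A minimal sketch; it assumes
    the starting direction is not orthogonal to the component.
    """
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(dim)]
    centered = [[s[k] - mean[k] for k in range(dim)] for s in samples]
    v = [1.0] * dim                                  # starting direction
    for _ in range(iters):
        # w = (X^T X) v computed as X^T (X v), avoiding the covariance matrix
        xv = [sum(row[k] * v[k] for k in range(dim)) for row in centered]
        w = [sum(centered[i][k] * xv[i] for i in range(n)) for k in range(dim)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return mean, v

def project_1d(sample, mean, v):
    """Project one feature vector onto the leading component."""
    return sum((sample[k] - mean[k]) * v[k] for k in range(len(v)))
```

Repeating this with deflation (subtracting each found component from the data) would yield the 150-dimensional projection used in Section 5.2.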

  • Classifier structure

Because the marking regions in actual roads are far fewer than the extracted non-marking regions, a cascaded structure can effectively improve classification efficiency [20]. In this paper, a cascaded SVM classifier is designed to classify and recognize the features of the detected regions. The structure of the classifier is as follows [7]:

As shown in Fig. 6, the first-level classifier (SVM1) removes some non-marking regions, and the second-level classifier (SVM2) classifies the specific types of traffic signs. This paper mainly uses 15 types of traffic signs; the road background class contains all unrecognized signs and other interference regions. Because most regions of interest are non-marking regions, most of the road background is filtered out at the first level, and the marking elements are categorized at the second level. This design improves the classification efficiency of the classifier and the accuracy of the whole system.
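The cascade logic of Fig. 6 amounts to a short dispatch loop. In this sketch, the two trained SVMs are passed in as callables, which is an assumption for illustration: `svm1` answers whether a feature vector is a marking at all, and `svm2` assigns the specific sign type:

```python
def cascade_classify(features, svm1, svm2):
    """Two-stage cascade: stage 1 rejects road background early,
    stage 2 labels the surviving regions with a specific sign type.

    features is a list of feature vectors; svm1 is a binary callable
    (True = marking), svm2 returns a class label.
    """
    results = []
    for f in features:
        if not svm1(f):
            results.append("background")   # cheap early rejection
        else:
            results.append(svm2(f))        # full multi-class decision
    return results
```

Because most regions of interest are background, the expensive multi-class stage runs on only a small fraction of the candidates, which is where the efficiency gain comes from.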

Fig. 6
figure 6

Classifier structure

4.3 Post-processing of recognition results

In actual detection, the quality of a single frame may be low, and target blur leads to poor detection results. In practice, however, detection does not rely on single frames: an image sequence is available, and combining information across multiple frames effectively improves the robustness of the system and reduces false and missed detections.

By recording the detection results of the previous 3 frames and comparing them with the current frame's results, false detections can be cross-checked and eliminated. For each target judged to be a road traffic sign, if the same target was detected in the previous three frames, the detection is considered correct; if the current frame does not detect a traffic sign but the previous frames did, the result is corrected in the current frame. Since there may be multiple targets per frame, the system matches them by location and type: two detections are considered the same target when their center distance, aspect ratio and area difference satisfy certain thresholds.
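The matching and confirmation steps can be sketched as below. The detection dictionary keys and all three thresholds are illustrative assumptions, not the paper's tuned values:

```python
def same_target(a, b, max_center_dist=20.0, max_aspect_diff=0.3, max_area_ratio=0.3):
    """Match two detections by type, center distance, aspect ratio and area.

    Detections are dicts with keys cx, cy, w, h, label (assumed layout).
    """
    dist = ((a["cx"] - b["cx"]) ** 2 + (a["cy"] - b["cy"]) ** 2) ** 0.5
    aspect_a, aspect_b = a["h"] / a["w"], b["h"] / b["w"]
    area_a, area_b = a["w"] * a["h"], b["w"] * b["h"]
    return (a["label"] == b["label"]
            and dist <= max_center_dist
            and abs(aspect_a - aspect_b) <= max_aspect_diff
            and abs(area_a - area_b) / max(area_a, area_b) <= max_area_ratio)

def confirm(current, history):
    """Keep a current detection only if it is matched in each previous frame.

    history is a list of per-frame detection lists (e.g. the last 3 frames).
    """
    return [det for det in current
            if all(any(same_target(det, old) for old in frame) for frame in history)]
```

A symmetric pass over `history` could also re-insert a target that the previous frames agreed on but the current frame missed, implementing the correction described above.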

5 Analysis of experimental results

5.1 The establishment of road traffic sign dataset

Machine-learning algorithms need a large number of data samples to train models and evaluate algorithms, and there is no public road-marking data set in China. Therefore, this paper builds a Chinese road traffic sign data set comprising 381 video files. The data were collected by a dedicated vehicle-mounted camera installed at the front of the vehicle, and cover the common road markings in China. The shooting locations are urban roads and expressways near Wuhan; the time range is from morning to night, and the weather conditions include sunny, cloudy and rainy. For uniform algorithm operation, all data were resized to 1290 * 720. The data set is randomly divided into two parts: one serves as the training set used to train the algorithms in this paper, and the other as the validation set used to test and compare their actual performance. To ensure sufficient data and better-trained models, data augmentation is applied: image processing effectively expands the number of samples in each category and improves the generalization ability of the model.

5.2 Dimensionality reduction experiment of classifier

Each region of interest is normalized to 32*32, and the HOG feature is extracted to obtain a 324-dimensional feature vector. The PCA mapping is then applied, and the reduced feature vectors are fed to the SVM classifier for classification.

In this paper, K-fold cross-validation is used to find the best classifier. The data set is divided into K parts (K > 2); one part is selected as the test set, and the remaining K−1 parts are used to train the classifier model. The experiment is repeated so that each subset serves once as the test set, with the rest as the training set, yielding K groups of experimental results. The average of the K results is taken as the performance index of the classifier under K-fold cross-validation.
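The K-fold splitting procedure described above can be sketched in a few lines; this is a generic index-level sketch, not the paper's experiment code:

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for K-fold cross-validation.

    The indices 0..n_samples-1 are split into k contiguous folds; each fold
    serves once as the test set while the remaining k-1 folds form the
    training set.
    """
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

In practice, the indices would be shuffled before splitting, and the classifier's score on each of the K test folds would be averaged to give the reported performance index.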

In this paper, the accuracy rate is used to evaluate the classifier; it is defined as the ratio of correctly classified samples to the total number of samples. We test the classification performance of classifiers trained with different feature dimensions on the training set and the test set respectively. The results are shown in Table 1.

Table 1 Analysis of the effect of reducing dimension

Experiments show that reducing the features to 150 dimensions gives the trained SVM classifier the best results. When 150 principal components are retained, the average classification time drops to 57% of the original, while the accuracy on both positive and negative samples is not significantly reduced, which basically meets the requirements. If the dimension is reduced further, the classifier's ability to describe positive samples declines and classification degrades. In addition, the test results show that markings with few samples, such as left-turn or left-merge arrows, right-bend and right-merge arrows, and left-turn-or-right-turn arrows, are detected poorly; more samples need to be collected for training in the future to improve the detection effect.

5.3 Comparison experiment of different methods

The purpose of this section is to verify the feasibility of the algorithm and to evaluate its effect quantitatively; the experimental results are also analyzed to identify remaining problems and summarize relevant experience. This research compares different algorithms, scenes and models, and evaluates the detection and recognition of road markings with precision and recall. In addition, to evaluate the running time of the algorithm, the average computation time over all test frames is used as a reference index of speed.
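The evaluation metrics used here and in Tables 2 and 3 follow the standard definitions, which can be written down directly:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from detection counts.

    tp = correctly detected markings, fp = false detections,
    fn = missed markings.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, 90 correct detections with 10 false alarms and 10 misses gives precision, recall and F1 of 0.9 each, comparable to the scores reported for the common arrow classes.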

In this paper, 130 videos are tested. The videos were captured mainly in the daytime, and the collection sites include urban roads, suburban roads, viaducts and so on. Fig. 7 shows the results of road-marking detection under different road conditions.

Fig. 7
figure 7

Detection results of road markings

Fig. 7 (a)-(d) show road-marking identification results under different traffic conditions; (e) and (f) show identification results in curve and wear scenes respectively; (g) and (h) show false and missed detections (marked by red frames). To evaluate the accuracy of the algorithm, the statistical results of the experiments are shown in Table 2.

Table 2 Experimental results

As can be seen from Table 2, all kinds of arrows are recognized well. The F1 scores of common signs such as the straight arrow, left-turn arrow, right-turn arrow, straight-or-left-turn arrow and crosswalk preview line are all above 0.9, with high precision. The recall of the right-turn arrow, turn arrow and left-turn arrow is lower; analysis shows that these arrows are under-represented in the data set and may not have been trained adequately, so the model easily misjudges them. Moreover, under reflection, occlusion and shadow, region-of-interest extraction may fail, leading to missed detections.

To evaluate the recognition effect of this algorithm, we compare it with a template-matching method and a deep-learning method. The test platform is (i7-6700 CPU@3.40GHz, 16GB RAM), and the comparison results are summarized in Table 3.

Table 3 Experimental results

In summary, the precision of this algorithm is higher than that of the two comparison algorithms in various scenes; only its recall is lower than that of the deep-learning method, and its running time is the lowest.

6 Conclusion

In this paper, a fast road traffic sign detection algorithm is designed that runs at good speed on an embedded platform. The algorithm first uses an image segmentation method based on a local median threshold, which effectively separates road traffic sign elements from the image and is robust to interference from other objects and to uneven illumination. Second, this paper combines PCA-HOG features with an SVM to realize fast road traffic sign detection: using the specific characteristics of road markings, the algorithm quickly extracts regions that may contain traffic signs and identifies their types. Finally, the accuracy and reliability of the algorithm are verified by experimental comparison and analysis on the data set collected in this paper.

7 Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant Numbers: 61540059, 41671441); the Plan Project of Guangdong Provincial Science and Technology (2015B010131007); the Joint Fund Project (NSFC-Guangdong Big Data Science Center Project, U1611262); the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (17YJCZH203); the Key Research Projects of Hubei Provincial Department of Education (D20182702); the Teaching Research Project of Hubei University of Science and Technology (2018-XB-023); and the Innovation Training Program for College Students (201810927045, S201910927028).