1 Introduction

According to a 2018 study [13], the Brazilian circulating fleet of cars, trucks and buses comprised roughly 30 million vehicles in 2009. By the end of 2017 the same fleet had grown to over 43 million vehicles, an increase of about 44%. Given this large number of vehicles, problems such as traffic control and surveillance, stolen car detection and access control to restricted areas have become harder to solve without automated tools. Automated vehicle recognition is at the core of systems designed with this objective in mind.

Since the license plate is the most discriminative feature when analyzing vehicles, the vehicle recognition task based on finding and identifying the license plate is known as Automatic License Plate Recognition (ALPR). An ALPR system is usually composed of three stages: license plate location, character segmentation and character recognition. A large number of approaches have been developed over the last decades for each of these stages. However, they are mostly limited to the license plates of their home country and are also greatly affected by the existence of many plate models and by image or video capture conditions.

The license plate location can be described as the stage in which the image regions that contain license plates are located. By finding these regions, the next stage has a much smaller and more characteristic region to work with. It is also considered the most critical stage in this type of ALPR system, since the license plate characters can only be segmented and recognized by the following stages if the location stage provides a region containing the complete license plate number.

In the current Brazilian vehicle licensing system there are two types of plates based on shape: car plates and motorcycle plates. As for further variations, there are many license plate colors, considering both plate and character color. There are also irregular plates, which are mostly distinguished by the use of typefaces other than the standard one. In addition, some Brazilian states have begun licensing vehicles with the new Mercosur plate model, which has the same shapes as the old model but different color patterns and letter-number formatting.

Several authors have proposed using the Viola-Jones [14] cascade to detect the license plate in an image. One of the main reported issues with this approach is the large number of false positives detected by the cascade. In this paper we present an approach that combines the Viola-Jones detector with a pre-processing stage applied to the images it analyzes. The pre-processing stage eliminates the color pattern problem, making plates of the same shape look similar. It also reduces the effects of some bad plate conditions, such as physical deformation, partially erased characters and weak lighting, while also reducing the false positive ratio when compared to a Viola-Jones detector trained with no pre-processing stage.

2 Related Work

According to Peng et al. [10], the majority of plate detection methods can be classified into two categories: rule-based heuristic methods and machine learning based methods. Other approaches, such as the ones proposed by [5, 6, 8], make use of morphological operations in the plate location stage.

Several works, such as the ones proposed by [1, 3, 4, 9, 18, 22], make use of border detection techniques to perform the plate localization, since the rectangular outer border as well as the character borders allow for a robust detection. Usually, methods of this type are robust to changes in illumination or plate inclination but are susceptible to noise. Thus, they are usually coupled with a blurring step.

The machine learning based methods treat the object localization problem as a pattern recognition problem. They perform a training step, using a large number of samples to extract object features that are used by classifiers. The features are grouped in descriptors that are learned by the classifiers. The main task in these methods is then the choice of the descriptors and classifiers to be used. Naturally, the number of samples in the training set and their variability are directly tied to the accuracy and generalization capability of the trained classifier.

Several researchers have used the original Viola-Jones detector [14] for plate localization. Several works mention that the original Viola-Jones detector generates a large number of false positives when applied to plate localization. Thus, several works, such as [2, 10, 16, 17, 19, 20, 21], propose changes to it in order to make it more accurate. Some of these changes include the addition of new types of features [10, 18] to the original feature set, alternative feature selection methods, alternative training algorithms [20] and changes to the internal classifier cascade structure.

More specifically, the work of [21] proposes a framework that uses two Viola-Jones detectors in parallel, both employing the extended feature set proposed by [7]. Both detectors are trained with the same positive samples but with different negative samples. In the pre-processing phase, they perform an image size reduction, followed by histogram equalization and Gaussian blurring.

In a different approach, Peng et al. [10] add a new type of feature to the Viola-Jones detector, the line segment features. The idea is to reduce the number of Haar-like features in the final detector through the early elimination of background regions using these new features, thus reducing the detector’s training and processing time. The line segment features are computed from the vertical line segments of the processed region. These are obtained during the pre-processing phase by performing histogram equalization and border detection and by using the Hough Transform.

Finally, we mention the short survey of Reji and Dharun [12], which references and comments on the results of 17 different works.

3 The Proposed Method

The proposed method was developed with the objective of detecting Brazilian license plates regardless of their model, while maintaining a low computational cost. The proposed approach is divided into two stages: the pre-processing of the image and the location of the plates by a Viola-Jones cascade detector on the resulting image representation. The Brazilian plates, both the old and the new Mercosur models, have a 3:1 aspect ratio and similar character placement and font size. The base resolution of the detector is set to \({60}\times {20}\).

Fig. 1. Stages of the proposed license plate localization method.

As mentioned above, using the original Viola-Jones detector to detect license plates results in many false positives. Since the plate regions have many lighter-to-darker color transitions, which are key to the detector's internal calculations, most of the problems are likely related to similar geometric patterns on other objects and to the existence of many plate patterns. Considering this, a pre-processing stage was conceived both to reduce the possibility of confusion with other object patterns and to reduce the plate variations. Another objective of this first stage is to reduce the influence of problems such as physical deformation of the plate, partially erased characters and uneven lighting.

The original Viola-Jones detector works internally with grayscale images and fixed-size regions that are scaled during the detection. When dealing with old model red-white plates, we have a pattern in which the characters have higher intensities than the background. This is the opposite of the most common old model gray-black plates and of the Mercosur model plates, in which the characters have lower intensities than the background.

As the red-white old model plates have a different pattern from the gray-black old model plates and the Mercosur model plates, it is necessary to give them a similar pattern so that the detector can detect any of them without having to learn multiple patterns, which would lead to bigger and likely less accurate detectors. Both red-white and gray-black old model plates, as well as all Mercosur model plates, have borders that are represented by lighter-to-darker color transitions. Since their character placement, font and plate sizes are all similar, the borders should have approximately the same positions on all of them. As the plate region is rich in vertical borders, a border detection algorithm should be a valid way to create a shared pattern between the plate models. Horizontal borders are avoided to reduce the number of patterns shared with other objects. Also, by changing the representation to a binary image, the influence of the aforementioned problems is lessened.

However, the contrast in the plate region must be high enough for the border detection to correctly find the vertical borders with a high threshold, which is intended to erase most of the image. After some experimentation, the Contrast Limited Adaptive Histogram Equalization (CLAHE) [11] was chosen, especially because of its reduced susceptibility to lighting peaks.

After the contrast enhancement by CLAHE, the vertical borders are detected by the Sobel operator. It is among the lowest-cost border detectors, yet it does a good job of finding the vertical edges needed to create a plate pattern that differs from other object patterns.
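The vertical border detection can be illustrated with a minimal, dependency-free sketch. It assumes a grayscale image stored as a list of rows and a hypothetical `threshold` parameter (the text does not report the value used); it convolves the image with the 3×3 Sobel kernel for the horizontal derivative and keeps only strong responses, producing a binary edge map. The function name and the border handling (border pixels left at 0) are illustrative choices, not the authors' implementation.

```python
def sobel_vertical_edges(gray, threshold):
    """Return a binary (0/1) map of strong vertical edges.

    gray: list of rows, each a list of intensities in 0..255.
    threshold: minimum absolute gradient to keep (hypothetical value,
               not reported in the text).
    """
    # Sobel kernel for the horizontal derivative; it responds to
    # vertical edges and ignores purely horizontal ones.
    kx = [[-1, 0, 1],
          [-2, 0, 2],
          [-1, 0, 1]]
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    # Border pixels are left at 0 because the 3x3 window does not fit there.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            g = sum(kx[j][i] * gray[y + j - 1][x + i - 1]
                    for j in range(3) for i in range(3))
            out[y][x] = 1 if abs(g) >= threshold else 0
    return out


# A dark-to-light vertical step produces a two-pixel-wide edge response,
# while a horizontal step produces none.
step = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_vertical_edges(step, threshold=200)
```

Taking the absolute value of the gradient makes light-on-dark and dark-on-light plates yield the same edge map, which is the property the shared pattern relies on.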

Since most of the vertical borders detected by the Sobel operator are thin, the majority of the plate region is now empty, which should make it harder for the training algorithm to find favorable thresholds for the internal classifiers. Moreover, because the operator is applied to the whole image and the resulting edge map is likely to be sparse, many false positives could be found. Thus, an enhancement step is added as a final step in order to create better patterns.

Morphological operations such as dilation and erosion require a structuring element. The first one, denoted A, is a cross-shaped matrix of ones, while the second, denoted B, is a matrix with ones only in its central column. By using A, we hope to connect the borders by enlarging them in four directions. By using B, we hope to connect vertical border fragments that were likely part of the same border.
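The two structuring elements and the enhancement variants evaluated in the experiments can be sketched as follows. This is a minimal pure-Python illustration on binary images stored as lists of rows, assuming 3×3 elements with the origin at the center; the function names and the border handling (pixels outside the image are ignored) are assumptions made for this example.

```python
# 3x3 structuring elements with the origin at the center.
A = [[0, 1, 0],
     [1, 1, 1],
     [0, 1, 0]]  # cross shaped
B = [[0, 1, 0],
     [0, 1, 0],
     [0, 1, 0]]  # central column


def _offsets(se):
    """(dy, dx) offsets of the active cells relative to the center."""
    return [(j - 1, i - 1) for j in range(3) for i in range(3) if se[j][i]]


def dilate(img, se):
    """A pixel becomes 1 if any neighbour under the element is 1."""
    h, w = len(img), len(img[0])
    offs = _offsets(se)
    return [[1 if any(0 <= y + dy < h and 0 <= x + dx < w
                      and img[y + dy][x + dx] for dy, dx in offs) else 0
             for x in range(w)] for y in range(h)]


def erode(img, se):
    """A pixel stays 1 only if all in-image neighbours under the element are 1."""
    h, w = len(img), len(img[0])
    offs = _offsets(se)
    return [[1 if all(img[y + dy][x + dx] for dy, dx in offs
                      if 0 <= y + dy < h and 0 <= x + dx < w) else 0
             for x in range(w)] for y in range(h)]


def enhance_mor1(edges):
    """One dilation followed by one erosion, both with A."""
    return erode(dilate(edges, A), A)


def enhance_mor2(edges):
    """Dilation with A, then dilation and erosion with B."""
    return erode(dilate(dilate(edges, A), B), B)
```

For instance, dilating and then eroding with B bridges a one-pixel gap in a vertical line without widening it, which is the behaviour expected from the column-shaped element.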

After some experimentation, the best alternative for the enhancement was found to be two dilations followed by an erosion. The second best alternative, one dilation followed by one erosion, was also considered in the results. To summarize, the pre-processing consists of the following steps:

  1. Grayscale conversion;
  2. Contrast Limited Adaptive Histogram Equalization (CLAHE);
  3. Vertical border detection by the Sobel operator;
  4. Morphological dilation with the A structuring element;
  5. Morphological dilation followed by erosion with the B structuring element.

When all the pre-processing steps are done, the image is searched using the standard multi-scale sliding-window procedure of the Viola-Jones detector. The detector is trained with images that underwent the same pre-processing steps.
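To give a sense of the search space of the multi-scale procedure, the sketch below enumerates the window sizes visited when a \({60}\times {20}\) base window is enlarged by a constant factor until it no longer fits in a \({320}\times {240}\) image. The scale factor of 1.1 is the value reported in the experiments; the function name and the rounding are assumptions, and an actual implementation may instead shrink the image while keeping the window fixed, which explores the same set of relative scales.

```python
def window_sizes(img_w, img_h, base_w=60, base_h=20, scale=1.1):
    """List the (width, height) detection windows that fit in the image."""
    sizes = []
    factor = 1.0
    while True:
        w, h = round(base_w * factor), round(base_h * factor)
        if w > img_w or h > img_h:
            break
        sizes.append((w, h))
        factor *= scale  # each step enlarges the window by 10%
    return sizes


sizes = window_sizes(320, 240)
print(len(sizes), sizes[0], sizes[-1])
```

Because the windows keep the 3:1 aspect ratio of the plates, the image width is the limiting dimension here.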

4 Results

To obtain the results with the proposed method, two sets of experiments were carried out. In the first, 5 Viola-Jones detectors were trained on a first training set, each using the training images taken from a different stage of the pre-processing. After establishing which one had the best results, 2 more detectors were trained using the same pre-processing as the selected detector, but on a second training set, extended from the first one. This two-part approach was adopted to reduce the time spent on training the Viola-Jones detectors, which can take from hours to days.

4.1 Training Sets

To build the first training set, both positive and negative samples are needed. A total of 838 old model plates were manually cropped from car images in different positions and lighting conditions, composing the positive set. The car images used were part of a private database used in a commercial system called Kapta. The database is composed of low resolution (\({320}\times {240}\)) vehicle images captured automatically with magnetic presence sensors under many different capture conditions.

A total of 2,461 images were obtained from web sources to compose the negative samples of this training set. Two public databases were included in this set: the Caltech background dataset, containing 900 grayscale background images, and the Google “thing” dataset, containing 520 color images of assorted things. Both were obtained from the University of Oxford Visual Geometry Group web page [15]. The remaining images were obtained with a web search engine and by modifying images from the Kapta set not used in the positive samples. The modifications consisted of covering plates with black rectangles and cropping images enough to remove part of the license plate number. The second training set was an extension of the first, bringing the totals to 3,497 positive samples and 13,616 negative samples.

4.2 Test Sets

For testing, two test sets were assembled. The first one used 1,803 images from the Kapta database, whose plate regions were manually selected. Those images contained one license plate each. The second contained 123 manually collected images of vehicles with Mercosur license plates. In this second set, each image was cropped to a \({320} \times {240}\) sub-window placed over the car bearing the Mercosur license plate, so that it would have the same size as the Kapta database images.

4.3 Analysis Criteria

To evaluate the accuracy of the trained detectors we used the IoU (Intersection over Union) metric to decide which of the detected regions were good enough to be considered acceptable detections. A threshold of 0.6 was set to separate good from bad detections. Along with the accuracy, we also report the number of undetected plates and the false positive ratio. A false positive is a detected region that does not intersect the license plate region of an image. When multiple detected regions intersect the license plate region, only the one with the highest IoU value is considered a possible good detection and the remaining ones are treated as false positives. The false positive ratio is the average number of false positives per image.
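These criteria can be made concrete with a short sketch for images that contain a single plate. Boxes are assumed to be `(x, y, width, height)` tuples, and the function names are illustrative. Two points are assumptions not spelled out in the text: a plate is counted as undetected when no detection intersects it, and a best detection whose IoU falls below the threshold is not counted as a false positive.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0  # the boxes do not overlap
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def evaluate_image(plate, detections, threshold=0.6):
    """Return (good_detection, false_positives, undetected) for one image."""
    scores = [iou(plate, d) for d in detections]
    best = max(scores, default=0.0)
    if best == 0.0:
        # Assumption: no detection intersects the plate, so the plate is
        # undetected and every detection is a false positive.
        return False, len(detections), True
    # Only the best-overlapping detection can be a good detection; all
    # other detections are treated as false positives.
    return best >= threshold, len(detections) - 1, False


plate = (100, 100, 60, 20)
good, fps, missed = evaluate_image(plate, [(110, 100, 60, 20), (0, 0, 60, 20)])
```

Averaging the false positive counts over all test images gives the false positive ratio described above.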

The IoU metric gives the spatial precision of one detection by measuring how much a region found by the detector overlaps the plate region. However, the detector can detect multiple plates in a single image or even no plates at all. The following S metric is defined as a means to score the overall detection quality on an image. It is calculated by Eq. 1.

$$S = \left( \frac{1}{tn} \sum_{i=1}^{tn} \max_{1 \le j \le gn} IoU(t_{i},g_{j}) \right) e^{\frac{\min (0,\,tn-gn)}{k}} \tag{1}$$

In Eq. 1, the S value associated with an image is the IoU value, or its average when the image contains more than one license plate, penalized by the number of detected regions (gn) in excess of the number of plates (tn). The max function returns the IoU value of the best detected region (\(g_{j}\)), defined as the region with the highest IoU value with respect to the analyzed plate region (\(t_{i}\)). The penalty is weighted by an empirically defined constant k, which controls the severity of having false positives. As the defined test sets contain images with one plate only, the equation is simplified by removing the averaging of the IoU value.
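A direct transcription of Eq. 1 into code may help clarify how the penalty behaves. The sketch below assumes boxes given as `(x, y, width, height)`, at least one plate per image, and a score of 0 when nothing is detected (the maximum over an empty set is not defined in the text, so this is an assumption); the function names are illustrative.

```python
import math


def iou(a, b):
    """Intersection over Union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def detection_score(plates, detections, k=15.0):
    """S value of Eq. 1 for one image (plates: t_i, detections: g_j)."""
    tn, gn = len(plates), len(detections)
    if tn == 0:
        raise ValueError("the image must contain at least one plate")
    if gn == 0:
        return 0.0  # assumption: no detections yields the lowest score
    # Average, over the plates, of the IoU of the best detected region.
    mean_best = sum(max(iou(t, g) for g in detections) for t in plates) / tn
    # Penalty: only detections in excess of the number of plates count.
    return mean_best * math.exp(min(0, tn - gn) / k)


plate = (100, 100, 60, 20)
extras = [(0, 0, 60, 20), (200, 50, 60, 20)]
s_clean = detection_score([plate], [plate])           # perfect, no extras
s_noisy = detection_score([plate], [plate] + extras)  # perfect, two extras
```

With k = 15, each extra detected region multiplies the score by exp(-1/15), so two extra regions reduce a perfect score by roughly 12%.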

4.4 Experiments

We first analyze the 5 detectors of the first set. The RAW detector was trained with the original color images, while the AEQ detector was trained with the images resulting from the second step of the pre-processing: CLAHE applied after grayscale conversion. The SOBEL detector was trained with images resulting from the third step of the pre-processing: vertical border detection using the Sobel operator. The MOR1 and MOR2 detectors both enhance the borders found by the third step. The first uses a dilation followed by an erosion, both with the structuring element A (\({3}\times {3}\) cross shaped). The second starts with a dilation with the structuring element A, followed by a dilation and an erosion with the structuring element B (\({3}\times {3}\) central column shaped).

The detectors were trained with the opencv_traincascade application from OpenCV 3.3. Both training and testing were executed on the same machine, a PC equipped with an Intel Core i5-3330 processor and 8 GB of memory. As for parameters, the base resolution of the detector was set to the previously defined \({60}\times {20}\) and the scale factor to 1.1. A maximum false alarm rate of 25% and a minimum detection rate of 99.5% were used, and the number of negative samples per level during training was twice the number of positive samples. These 5 detectors use the base Haar feature set. The k value used to calculate the S value was empirically set to 15.
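Assuming the two rates above are the per-stage targets of opencv_traincascade (its `minHitRate` and `maxFalseAlarmRate` options), the rates of the whole cascade can be estimated by raising them to the number of stages. The number of stages is not reported in the text, so the value used below is purely hypothetical.

```python
def cascade_bounds(num_stages, min_hit_rate=0.995, max_false_alarm=0.25):
    """Estimate overall (hit rate, false alarm rate) of a cascade.

    A window must pass every stage to be accepted, so the per-stage
    rates multiply across stages.
    """
    return min_hit_rate ** num_stages, max_false_alarm ** num_stages


# Hypothetical 10-stage cascade (the actual stage count is not reported).
hit, false_alarm = cascade_bounds(10)
print(f"overall hit rate ~ {hit:.4f}, false alarm rate ~ {false_alarm:.2e}")
```

This shows why a loose per-stage false alarm target is acceptable: the false alarm rate per evaluated window shrinks geometrically with depth, while the hit rate degrades only slowly.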

Table 1 summarizes the results of these 5 detectors on the 1,803 old model plate test images. The accuracy is given by applying the IoU threshold of 0.6 to the best detection of each image. Undetected is the number of undetected plates. FPR stands for false positive ratio. The following two columns contain the average S value over the images and its standard deviation (\(\sigma \)). Table 1 also contains the data obtained with the second set of detectors, which is explained below.

Table 1. Accuracy of detectors by using IoU \(\ge \) 0.6 as detection threshold, number of undetected plates, false positive ratio, average detection score and its standard deviation on old model plates

The MOR2 detector, while not exhibiting the best false positive ratio, still has a higher average image score than the other 4 detectors of the first set. As the S value is penalized by false positives, this means that the regions detected by the MOR2 detector must be more precise, even though it has a lower number of undetected plates. Its accuracy considering the best detection is also the highest among the first 5 detectors. With that in mind, the MOR2 pre-processing proved effective, improving accuracy, reducing the number of undetected plates and producing more precise regions.

The second set of detectors is composed of the detectors MOR2b and MOR2b-ALL, both trained on the second training set. Their training parameters are the same as those of the other 5 detectors, with the exception of MOR2b-ALL, which uses the extended Haar feature set proposed by Lienhart et al. [7]. They both use the same pre-processing steps as the MOR2 detector, since it was selected as the best among the initial 5. By training these detectors on a bigger training set, we aim to decrease the false positive ratio of the MOR2 detector.

From the results in Table 1, it can be seen that even though the MOR2b-ALL detector seems the best in terms of accuracy, the MOR2b detector can be considered the best in some situations because of its very low false positive ratio. It is stricter than the other detectors, greatly reducing the false positive ratio at the cost of fewer detected regions. That also leads to an increase in the average S value.

In addition, as can be seen in Fig. 2, the old model test set has images of varying quality with respect to plate visibility and associated noise. The obtained results also suggest that the pre-processing successfully decreases the influence of problems such as physical deformation, partially erased characters and weak lighting conditions. Table 2 summarizes the results obtained on the Mercosur model plates test set.

Table 2. Accuracy of detectors by using IoU \(\ge \) 0.6 as detection threshold, number of undetected plates, false positive ratio, average detection score and its standard deviation on Mercosur model plates

Although the MOR2b detector works well with both plate models considering its low false positive ratio, its number of undetected plates is very high. The MOR2b-ALL detector can therefore be seen as a better fit for the task of detecting both plate models, since it achieves a greater average image detection score and higher accuracy compared to the MOR2 detector. Both plate models are very similar in appearance, so the MOR2b-ALL training likely did a better job by selecting, from the extended set, features that are more common between the models. Table 3 lists the average time spent per image by each of the trained detectors, in seconds, followed by its standard deviation.

Table 3. Average time spent by the detectors on test images, in seconds, including pre-processing

All detectors aside from RAW spent a similar average time per image, including the cost of all pre-processing steps. The CLAHE step used to enhance image contrast is considered the highest-cost step of the proposed method. The following steps have low cost, given the low resolution of the images and the use of morphological operations on binary images. In terms of cost-benefit, the proposed method doubled the time spent per image but increased the accuracy by over 25% on the old model plates and 10% on the Mercosur model plates, while greatly decreasing the false positive ratio. Further experimentation would be required to attain better results on the Mercosur plates while keeping the old model results. Some of the detection results of the detectors RAW, MOR2, MOR2b and MOR2b-ALL can be seen in Fig. 2.

Fig. 2. Detection results of the detectors on some of the test images, with the green rectangle being the ground truth and the red rectangles being the detections; the first row contains an old model image of good quality, the second row an old model image of average quality, the third row an old model image of low quality, and the fourth and fifth rows Mercosur model images.

5 Conclusion

We proposed a new metric for evaluating the task of localizing license plates in images. We also proposed a simple and efficient method to perform license plate location on Brazilian license plates that uses the well-known Viola-Jones cascade detector. The addition of the pre-processing step made it possible to improve the accuracy and reduce the false positive ratio of the detector while keeping a low computational cost, which was our main objective, since the method is intended to work on an embedded system. It also leaves room for improvement, such as the use of different contrast enhancement options and even changes to the internal structure of the detector. Further experimentation is needed to improve the detection rate on Mercosur model plates and to incorporate low-cost algorithms for performing the segmentation and classification of the plate’s characters. We plan to do that and to compare with the results of several approaches available in the literature [12] using our proposed measure, in order to validate it under a more general setting. Finally, we plan to investigate variations of the detection metric used, by incorporating penalties for missed characters in the detected plate areas, to better encode the quality of the results of this task.