
1 Introduction

The Automatic License Plate Recognition (ALPR) problem consists of detecting and reading one or more license plates (LP) in images. ALPR systems are used in a wide range of real-world applications, such as road traffic monitoring, automatic toll collection, traffic law enforcement, and parking lot access control [3].

In general, ALPR systems are composed of the following stages: the vehicle's LP region is located and cropped from the input image, then the LP's characters are segmented and classified, thus reading the LP. However, recent approaches first detect the vehicle regions before the LP detection in order to reduce false positives and the processing time [7, 9, 10, 16, 17]. Also, recent works replace character segmentation and classification with character detection, combining both stages [10, 16, 17], while some works use completely segmentation-free approaches [2, 5, 6].

In each country or region, LPs obey some established patterns. In Brazil, for example, most LPs are composed of a white background with black characters or a red background with white characters. Besides handling the possible differences between LPs, ALPR systems have to deal with variations in image quality, since images may differ in illumination, shadows, blur, inclination, and other kinds of distortion.

Therefore, many recent approaches have used robust deep learning techniques [9, 10, 16, 17], as deep learning has improved the state of the art in object detection, speech recognition, and other tasks [11]. These methods achieve good efficiency on a high-end Graphics Processing Unit (GPU) but may be too costly for local processing on weaker devices, being more suitable for cloud computing deployment. Such an approach based on heavy deep learning methods incurs significant latency, energy, and financial overheads and also raises privacy concerns [13], thus limiting the possibilities of real-world applications.

In this context, a better approach is to use edge computing, i.e., processing the data locally on small, low-powered edge devices, which is more attractive for several applications, such as robotics, drone-based surveillance, and autonomous driving [13].

In this work, we propose a complete ALPR system for Brazilian LPs based on the combination of deep learning object detection techniques that is efficient enough to run on embedded systems. Our main goal is to achieve a balance between accuracy and execution time, using convolutional neural networks (CNNs) to detect and read an LP with suitable performance in realistic scenarios, enabling the extension of real-world applications through edge computing.

The remainder of this paper is organized as follows. We review related works in Sect. 2. The details of the proposed system are described in Sect. 3, and the materials used in our experiments are presented in Sect. 4. In Sect. 5, we report and discuss the experimental results. Finally, conclusions and future works are given in Sect. 6.

2 Related Works

In this section, we present several works proposed for the different stages of an ALPR system: Vehicle Detection and License Plate Detection and Recognition (LPDR). We report performance data, in terms of precision and execution time, for each method; since the focus of our work is real-time processing, we must pay attention not only to the accuracy of a method but also to its response time.

2.1 Vehicle Detection

The first stage of an ALPR system is the detection of the vehicle, since the LP is attached to its body. The system's hit rate is highly dependent on the quality of the vehicle detection: if the detection method returns a cropped image of the vehicle in which the LP is cut off, or in which no LP appears at all, the system will not be able to recognize all, or even any, of the characters belonging to that vehicle's LP. Next, we discuss vehicle detection methods proposed by different authors.

Wang et al. [18] proposed a new structure called Evolving Boxes, which determines and refines the object boxes through different representations of the attributes of each object. A fine-tuning network (FTN) is responsible for the refinement of the boxes. The method was evaluated with Faster R-CNN on the DETRAC benchmark, where it achieved an improvement of 9.5% mAP, running at 9–13 FPS on an Nvidia Titan X GPU.

Sang et al. [15] proposed a CNN based on Yolov2, called Yolov2_Vehicle. During the training of the network, the k-means++ algorithm clusters the ground-truth bounding boxes to determine the anchor boxes. Other improvements were introduced, such as the normalization of the box coordinates to improve the loss computation, the removal of repeated convolutional layers, and the merging of features from different layers in order to improve feature extraction. Yolov2_Vehicle was tested on the vehicle dataset of the Beijing Institute of Technology (BIT), reaching 94.78% mAP and running on four Nvidia Tesla K80 GPUs.
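For illustration, a minimal sketch of this kind of anchor clustering is given below (in Python). It is not the authors' exact procedure: YOLO-style implementations often use an IoU-based distance rather than the plain Euclidean distance, and the box data here is synthetic; the sketch only shows the general idea of clustering (width, height) pairs with k-means++ and using the centroids as anchors.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy ground-truth (width, height) pairs, normalized to the network input size.
# In practice, these values come from the training-set annotations.
rng = np.random.default_rng(0)
wh = rng.uniform(0.05, 0.6, size=(200, 2))

# k-means++ initialization clusters the box shapes; the centroids act as anchors.
kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0).fit(wh)
anchors = kmeans.cluster_centers_
print(anchors)  # five (width, height) anchor boxes
```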

The Faster R-CNN with Evolving Boxes proposed by Wang et al. [18] proved unable to process images in real time (30 FPS, for example) on a medium-performance GPU, while Yolov2_Vehicle, proposed by Sang et al. [15], required four GPUs for training and validation. Therefore, both methods may not achieve a satisfactory framerate on the limited hardware used in embedded applications, such as the Jetson TX2.

2.2 License Plate Detection and Recognition

After detecting the vehicle, there are two more crucial steps for the operation of an ALPR system: the detection and recognition of LP characters (LPDR). Below, we describe works that proposed LPDR methods using classical computer vision feature extractors, convolutional neural networks (CNNs), and machine learning algorithms.

Bulan et al. [2] proposed a method for recognizing LPs whose first step consists of identifying failures in the LP detection through the legibility classification of the characters present in the LP, using transfer learning with the AlexNet CNN as feature extractor and a linear-kernel Support Vector Machine (SVM) as classifier. The character recognition stage, in turn, used the HOG and LeNet extractors in conjunction with the linear SVM classifier. The method achieved more than 99% accuracy, running on an Nvidia GTX 570 GPU, but at a frame rate of 0.5 FPS.

Björklund et al. [1] used synthetic LP images from the European Union (EU) to train LP detection and recognition CNNs. The detection task was validated on the AOLP dataset, where it achieved an accuracy of 99.30%, while the recognition task reached 99.80%. Together, the methods required an average of 845 ms to process each \(640\times 480\) image on a Jetson TX1 embedded system, while on an Nvidia GTX 1080 GPU the same procedure took 25.5 ms.

The method proposed by Bulan et al. [2] is not feasible for a real-time application, since its framerate during validation was 0.5 FPS. Björklund et al. [1] used synthetic images to train their detection and recognition networks, but did not address vehicle detection, and their validation was performed with images with and without vehicles, all of which contained license plates, which does not match the purpose of our method.

2.3 Complete ALPR Systems

Some studies have proposed complete ALPR systems, spanning vehicle detection, LP detection, and LP recognition. This type of system receives an image containing vehicles and returns the license plate characters of each vehicle in the image. Some real-time applications use this type of system, such as parking lots, speed cameras, and police vehicles. We cite some examples of complete ALPR systems below.

Laroca et al. [10] implemented a complete ALPR system composed of three networks based on the YOLO (You Only Look Once) architecture: Yolov2, used for vehicle detection; Fast-Yolov2, responsible for plate detection on a given vehicle; and CR-Net, a version of YOLO adapted for the detection and recognition of license plate characters. This method reached 95.90% accuracy on the license plates present in the UFPR-ALPR dataset [9], also proposed by the authors. The experiment ran at 73 FPS on a high-capacity GPU.

Silva and Jung [17] proposed an ALPR method for LP images under different conditions of visibility, perspective, and projection. This method uses a network architecture called Warped Planar Object Detection Network (WPOD-NET), which detects the LP and performs a perspective rectification to assist in character recognition, the step that follows LP detection. This method was evaluated on the OpenALPR datasets (BR and EU), SSIG, and AOLP (RP), as well as on the dataset proposed in their work, CD-HARD, which contains LPs at different angles and distances, presenting greater difficulty for ALPR systems. The method reached 93.52% on OpenALPR-US and 91.23% on OpenALPR-BR, datasets with LPs in frontal perspectives, while on CD-HARD, with LPs at more extreme angles and distances, it reached 75.00% accuracy, running at 5 FPS on an Nvidia Titan X GPU.

In our work, we propose a complete ALPR system for real-time embedded applications, motivated by the growing need to recognize LPs in public security systems in general, whether for storing information, controlling access to parking lots, supporting queries carried out by security agents, or recording infractions in the case of speed cameras. We compare our method with the previously mentioned works of Laroca et al. [10] and Silva and Jung [17], both on a server machine and on a device designed for embedded applications, since both validated their methods on Brazilian LPs, which is the type of LP on which we validate our work.

3 Proposed Methodology

Fig. 1. Proposed methodology flow chart (we covered the LP to maintain the vehicle's anonymity).

The proposed pipeline for LP recognition is illustrated in Fig. 1 and is composed of: (A) car detection, (B) LP detection, and (C) LP character recognition. The first step is vehicle detection using MobileNet-SSD; the detected vehicles are isolated from the rest of the input image. These isolated vehicles are the input for the LPD-net, because detecting the vehicle first may decrease the number of false positives and yield larger, easier LPs for the subsequent detection. The LPD-net identifies the LP in each vehicle image. The next step consists of LP character recognition using the OCR-net. Finally, characters are replaced according to the Brazilian LP pattern.
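To make the data flow concrete, a minimal sketch of the pipeline is given below (in Python). The functions `detect_vehicles`, `detect_plate`, `recognize_characters`, and `apply_lp_heuristics` are hypothetical placeholders standing in for the MobileNet-SSD, LPD-net, and OCR-net stages described in this section and for the heuristics of Table 2; they are not part of the original implementation.

```python
# Hypothetical end-to-end sketch of the proposed pipeline; the detector/OCR
# callables are placeholders for MobileNet-SSD, LPD-net and OCR-net.
import cv2


def read_plates(image_path, detect_vehicles, detect_plate,
                recognize_characters, apply_lp_heuristics):
    """Return one recognized LP string per detected vehicle."""
    frame = cv2.imread(image_path)
    results = []
    # (A) Vehicle detection: each box is (x, y, w, h) in pixel coordinates.
    for (x, y, w, h) in detect_vehicles(frame):
        vehicle = frame[y:y + h, x:x + w]               # isolate the vehicle
        plate_box = detect_plate(vehicle)               # (B) LP detection
        if plate_box is None:
            continue
        px, py, pw, ph = plate_box
        plate = vehicle[py:py + ph, px:px + pw]         # isolate the LP
        raw_text = recognize_characters(plate)          # (C) character recognition
        results.append(apply_lp_heuristics(raw_text))   # Brazilian LLLNNNN fix-up
    return results
```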

Table 1. License Plate Detection Network (LPD-net): a modification of Tiny YOLOv3 for one class output

3.1 Vehicle Detection

Since vehicles are among the common object classes in the pre-trained weights of usual deep learning object detection approaches, we decided not to train new weights from scratch. The SSD-300 [12] with MobileNet [8] as the backbone and PASCAL-VOC [4] pre-trained weights is fast and accurate enough for this stage, even without any additional changes or fine-tuning of the model. Thus, the MobileNet-SSD detects the vehicles in the input image, which are then isolated into separate images to be used in the next stages.
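A minimal sketch of this stage using OpenCV's DNN module is given below. It is only an illustration under assumptions: the file names, the 0.5 confidence threshold, and the mapping of the "car" class to index 7 follow a commonly distributed Caffe release of MobileNet-SSD trained on PASCAL-VOC and are not taken from the paper.

```python
import cv2
import numpy as np

# Assumed file names for a PASCAL-VOC pre-trained MobileNet-SSD (Caffe format).
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")
CAR_CLASS_ID = 7        # index of "car" in the usual 21-class VOC label map
CONF_THRESHOLD = 0.5    # hypothetical threshold, not stated in the paper


def detect_vehicles(frame):
    """Return (x, y, w, h) boxes for cars detected by MobileNet-SSD."""
    h, w = frame.shape[:2]
    # SSD-300 expects a 300x300 input; scale/mean follow the Caffe release.
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 scalefactor=0.007843, size=(300, 300),
                                 mean=127.5)
    net.setInput(blob)
    detections = net.forward()          # shape: (1, 1, N, 7)
    boxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        class_id = int(detections[0, 0, i, 1])
        if class_id == CAR_CLASS_ID and confidence > CONF_THRESHOLD:
            x1, y1, x2, y2 = (detections[0, 0, i, 3:7] *
                              np.array([w, h, w, h])).astype(int)
            boxes.append((x1, y1, x2 - x1, y2 - y1))
    return boxes
```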

3.2 License Plate Detection

For each vehicle image, the LP must be detected and isolated. In this work, we use the Tiny YOLOv3 architecture [14], changing the last layer for one-class detection, resulting in the License Plate Detection Network (LPD-net), as shown in Table 1. The network is small and, thus, fast enough for most systems, including embedded ones.

For training the LPD-net, we used a private dataset, which is presented in Sect. 4.2. The four corners of each LP were manually labeled, and no data augmentation techniques were required. The training of the network used the following parameters: \(416\times 416\) input size; 50k iterations of mini-batches containing 64 images; learning rate of 0.001 for the first 25k iterations and 0.0001 for the remaining ones.
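To illustrate the single-class modification, the sketch below notes the typical Darknet configuration change for Tiny YOLOv3 (one class implies 18 filters in each convolutional layer preceding a YOLO layer, since filters = (classes + 5) × 3 anchors) and shows how such an LPD-net could be loaded for inference with OpenCV. The file names and the 0.5 threshold are assumptions; only the \(416\times 416\) input size mirrors the training setup described above.

```python
import cv2

# In the Tiny YOLOv3 .cfg, one-class detection means setting classes=1 in both
# [yolo] sections and filters=(1 + 5) * 3 = 18 in the [convolutional] layer
# immediately before each of them (cf. Table 1).
net = cv2.dnn.readNetFromDarknet("lpd-net.cfg", "lpd-net.weights")  # assumed names


def detect_plate(vehicle_img, conf_threshold=0.5):
    """Return the highest-scoring (x, y, w, h) LP box, or None."""
    h, w = vehicle_img.shape[:2]
    blob = cv2.dnn.blobFromImage(vehicle_img, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    best = None
    for output in outputs:
        for det in output:            # det = [cx, cy, bw, bh, objectness, class0]
            score = float(det[5])     # confidence for the single LP class
            if score > conf_threshold and (best is None or score > best[0]):
                # Coordinates are relative; convert to pixel (x, y, w, h).
                cx, cy = det[0] * w, det[1] * h
                bw, bh = det[2] * w, det[3] * h
                best = (score, (int(cx - bw / 2), int(cy - bh / 2),
                                int(bw), int(bh)))
    return best[1] if best else None
```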

3.3 Optical Character Recognition

Finally, once the LP image is isolated, the characters can be recognized using an Optical Character Recognition network (OCR-net) [16]. We decided not to train a model from scratch, since there are pre-trained weights with satisfying results [17]. Also, the vast majority of Brazilian LPs follow a pattern of three letters followed by four numbers on a uniform background color. Thus, some heuristics are applied to replace digits and letters when needed, as shown in Table 2.

Table 2. Replacing heuristics for correcting the recognized text
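As a minimal illustration of such heuristics, the sketch below enforces the Brazilian pattern of three letters followed by four digits. The substitution tables here (e.g., "0" → "O", "I" → "1") are hypothetical look-alike swaps standing in for the actual mapping, which is the one given in Table 2.

```python
# Hypothetical look-alike substitutions; the actual mapping used is in Table 2.
DIGIT_TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "8": "B"}
LETTER_TO_DIGIT = {"O": "0", "I": "1", "Z": "2", "S": "5", "B": "8", "Q": "0"}


def apply_lp_heuristics(text):
    """Force the Brazilian LLLNNNN pattern onto the raw OCR output."""
    text = text.upper()[:7]
    fixed = []
    for pos, ch in enumerate(text):
        if pos < 3:   # first three positions must be letters
            fixed.append(DIGIT_TO_LETTER.get(ch, ch))
        else:         # last four positions must be digits
            fixed.append(LETTER_TO_DIGIT.get(ch, ch))
    return "".join(fixed)


# Example: a misread "0BC1234" becomes "OBC1234".
```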

4 Materials

This section provides information about the Jetson TX2 and the datasets evaluated in this paper.

4.1 Jetson TX2

The Jetson TX2 is a power-efficient computing device. It has a powerful processor that helps bring artificial intelligence processing power to end products. The Jetson TX2 is composed of an Nvidia Pascal GPU and a dual-core 64-bit ARM processor. Furthermore, it has 8 GB of RAM with a bandwidth of 59.7 GB/s. It has standard connections for cameras, displays, mouse, and keyboard, as well as GPIO pins, which allow for fast prototyping.

4.2 Datasets

In this paper, we used two datasets of Brazilian LP images to evaluate the proposed methodology. The first one is a private dataset composed of 1988 images of cars obtained from traffic monitoring cameras in Brazil. The resolution of the images is \(752 \times 540\), and they were captured during both daytime and nighttime, resulting in some black-and-white pictures. This dataset was split into 1331 images for training and 657 for testing. Samples of the private dataset are presented in Fig. 2. Since this is a private dataset, we omitted characteristics of the vehicles that could identify them, including the license plates.

Fig. 2. Samples of the private dataset of car images.

The second dataset, called the UFPR-ALPR dataset, contains real-world scenarios in which a camera was placed inside a moving vehicle. Three different cameras were used for image acquisition; for each camera, approximately 1,500 images of \(1920\times 1080\) pixels were captured, totaling 4,500 images, 150 vehicles, and over 30,000 characters. This dataset was split into 40% for training, 40% for testing, and the remaining for validation. The UFPR-ALPR dataset can only be used for academic research. Because the private dataset has only images of cars, we filtered the UFPR-ALPR validation set in order to keep only car images. Samples of the UFPR-ALPR dataset are shown in Fig. 3. We can observe that this dataset has a proper perspective for an embedded system.

Fig. 3. Samples of the UFPR-ALPR dataset of car images.

5 Experimental Results

In this section, we evaluate the proposed ALPR system in two steps. The first step consists of experiments using the UFPR-ALPR dataset on a computer with an Nvidia GTX 1070 GPU and 8 GB of RAM, running Ubuntu 16.04 LTS as the operating system. In the second step, we evaluate the proposed ALPR system on an embedded platform, the Jetson TX2.

Since the LPD-net was trained only with Brazilian plates and cars, we compare our results with the works of Silva and Jung [17] and Laroca et al. [10]. Both papers proposed a complete ALPR system and used a larger training dataset than ours; Silva and Jung [17] state that their system is tuned for Brazilian plates, and Laroca et al. [10] used Brazilian plates in their plate recognition training.

We performed ten runs on the validation set of each dataset on each system; all runs resulted in the same values. The final results of the proposed system are presented in bold in Table 3. This table also presents the results, on both datasets, of the methods against which we compare the proposed approach.

For the UFPR-ALPR dataset, our system recognized all seven characters in 85.27% of the images, an improvement of 6.24% over [17] but a reduction of 9.52% compared to [10]. However, the proposed system identified at least six characters in 96.87% of the LPs, while [10] recognized at least six characters in 97.57% of the images, so the proposed system is only 0.7% inferior.

For the private dataset, we can note that the proposed approach achieved a recognition rate 3.51% inferior to [10] when all characters must be correct, but a result 2.13% better when at least six characters are correct. This indicates that the dataset used in the training stage influences the results. In both datasets, the OCR-net could be responsible for the large difference between a completely correct LP and a correct six-character plate.

In Fig. 4, we can observe that the proposed method surpasses the method proposed by [17] in both datasets and is better than [10] in the private dataset. In addition, even though the proposed method did not have samples of the UFPR-ALPR dataset in its training, it obtains results similar to those of the method proposed by [10].

Table 3. Recognition rates of the proposed ALPR system in the Nvidia GTX 1070 and Jetson TX2.

5.1 Evaluation Using Nvidia GTX 1070

In Table 4, we present the average time required by each stage of the proposed ALPR system on the UFPR-ALPR dataset using an Nvidia GTX 1070 GPU. The LPD-net runs at 105 FPS, making it feasible for embedded systems. The slowest step is the vehicle detection, running at 22 FPS. This speed is due to the MobileNet-SSD with Pascal-VOC weights, which covers 20 classes; an image may therefore contain objects of other classes, increasing the processing time.

Fig. 4. Graphical representation of the results for at least six characters correct.

Table 5 shows a comparison of the average processing times of different ALPR systems using an Nvidia GTX 1070 GPU. For the UFPR-ALPR-cars dataset, we can observe that [17] achieved only 2 FPS and [10] reached 5 FPS, meaning that the proposed system is three times faster than the system proposed by [10]. On the private dataset, the proposed system stood out again, being about five times faster than the compared approaches. All three systems achieved a higher FPS on the private dataset because it has only one car per image.

Table 4. Average time required for processing the ALPR system on the UFPR-ALPR dataset using an Nvidia GTX 1070.

5.2 Evaluation Using Jetson TX2

Since the results on the Nvidia GTX 1070 were promising, we decided to embed the system on a Jetson TX2. Table 6 exhibits the average time required by each stage of the proposed ALPR system on the UFPR-ALPR dataset using a Jetson TX2. We can note that the complete system took approximately 122 ms to execute the three stages per image. Although this time is double that of the Nvidia GTX 1070, it is still efficient for an embedded platform, considering that we have a complete ALPR system.

Table 5. Comparison of average processing time between different ALPR systems using an Nvidia GTX 1070.

Table 7 presents a comparison of the average processing time of different ALPR systems using a Jetson TX2. For the UFPR-ALPR-cars dataset, we can observe that the proposed system is significantly faster than [17] and [10]. For the private dataset, all systems improved their times due to the characteristics of the dataset, and the proposed system stands out, being approximately five times faster than the other approaches.

Table 6. Average time required for processing the ALPR system on the UFPR-ALPR dataset using an Nvidia Jetson TX2.

In Fig. 5, we can observe that the proposed system is faster than the other approaches in both datasets used in the experiments, processing more frames per second. Thus, the proposed system is feasible and can be applied as a real-time application on a Jetson TX2.

Table 7. Comparison of average processing time between different ALPR systems using an Nvidia Jetson TX2.
Fig. 5. Chart comparing the average framerate of the ALPR systems on the UFPR-ALPR-cars and private datasets using the Jetson TX2.

6 Conclusion and Future Works

In this paper, we proposed a complete ALPR system for Brazilian LPs. We used two existing CNNs: MobileNet-SSD with Pascal-VOC weights for car detection, and OCR-net for character recognition. In addition, we created the LPD-net, a CNN modified from Tiny Yolov3 for plate detection.

In order to evaluate the proposed system, we used two datasets: a private dataset and a public one, the UFPR-ALPR dataset. Furthermore, we assessed the results on two platforms, the Nvidia GTX 1070 and the Nvidia Jetson TX2. This comparison was made mainly because many published papers assume that unlimited computing power is available, which is not the case when dealing with mobile or portable systems.

When considering the complete ALPR system and the recognition of at least six characters, the proposed approach achieved 96.87% on the UFPR-ALPR dataset and 90.56% on the private dataset. Besides, the proposed system accomplished the best processing times on both datasets and on both platforms, the Nvidia GTX 1070 and the Nvidia Jetson TX2: on the Nvidia GTX 1070, the system obtained 65.01 ms and 19.29 ms for the UFPR-ALPR dataset and the private dataset, respectively; on the Jetson TX2, it reached 122.81 ms and 93.97 ms for the UFPR-ALPR dataset and the private dataset, respectively. Thus, the results indicate that the system is efficient and feasible for embedded platforms.

For future works, we first aim to deploy the system as a real-time application inside a car, connecting cameras to the Jetson TX2. Also, we intend to train a CNN dedicated to car detection so we can speed up this step and, consequently, the overall process. We also want to extend the proposed system to motorcycles and other types of vehicles.