1 Introduction

A traffic surveillance camera system is an important part of an intelligent transportation system [30]. It mainly consists of automatic digital cameras that take snapshots of passing vehicles and other moving objects, as shown in Fig. 1. The recorded images are high-resolution still images that can provide valuable clues for police and other security departments, such as a vehicle's plate number, the time it passed, its movement path, and the driver's face. Previously, these massive stores of images were processed manually, which was laborious and inefficient. With the rapid development of computer technology, automatic license plate recognition software is now being used in the field at an increasing rate and with great success [4]. Unfortunately, the license plate of a vehicle sometimes cannot be used because it has been cloned, is missing, or cannot be recognized. This is why automatic vehicle detection and recognition is becoming an urgent requirement for traffic surveillance applications [22]. The technology can save a great deal of time and effort for users trying to identify blacklisted vehicles or searching for specific vehicles in a large surveillance image database [16, 28].

Fig. 1 Traffic surveillance camera system

2 Related work

Vehicle detection and recognition is a vital yet challenging task, since vehicle images are distorted and affected by many factors. First, the number of vehicle types keeps rising as new car models are released regularly. Second, some vehicle models bear a great deal of similarity to one another. Finally, images of vehicles also differ significantly because of variations in road environment, weather, illumination, and the cameras used.

Nowadays, most published research focuses on classifying vehicles into broad categories, such as motorbikes, cars, buses, or trucks [5, 17, 23], but this does not provide sufficient functionality to satisfy users' demands. Some researchers studied vehicle logo detection and recognition on frontal vehicle images to obtain information revealing the vehicle's manufacturer [19, 21]. More recently, researchers have adapted feature extraction and machine learning algorithms to classify vehicles into precise classes. For vehicle recognition, Munroe used Canny edges as the extracted features and tested three different classifiers: k-NN, a neural network, and a decision tree [13]; the data set comprised 5 classes with 30 samples each. In Clady's processing, Sobel edges are extracted and oriented-contour points are then obtained [3]; the training image set contains 50 classes, each comprising 291 frontal-view images, and the final correct recognition rate is about 93 %. Petrovic and Cootes investigated feature representations and recognition to create a rigid structure recognition framework for automatic identification of vehicle types, with recognition rates of over 93 % [18]. P. Negri et al. developed an oriented-contour point-based voting algorithm to represent a vehicle type for multi-class vehicle type identification, which is robust to partial occlusion and lighting variations [14]. F. M. Kazemi investigated transform-based image features, including wavelet transforms, fast Fourier transforms, and discrete curvelet transforms, for classifying five models of vehicles [10]. B. L. Zhang first studied two feature extraction methods for image description, the wavelet transform and the Pyramid Histogram of Oriented Gradients, and then proposed a reliable classification scheme for vehicle type recognition using cascaded classifier ensembles [27]. M. A. Hannan introduced automatic vehicle classification for traffic monitoring using image processing; this technique applies a fast neural network (FNN) as a primary classifier and a classical neural network as a final classifier to achieve high classification performance [8].

In recent years, computer vision and pattern recognition have made great progress in image feature description and recognition, especially in the field of face recognition [6]. Face recognition remains an active research topic in image processing and computer vision, and it has yielded many useful and effective methods and algorithms [11, 24]. Vehicle recognition is closely analogous to face recognition: just as every face consists of the same components (eyes, mouth, nose), every frontal vehicle consists of the same components (lights, bumper, windscreen). Building on current, highly effective face recognition methods, this paper proposes an integrated vehicle detection and classification system. The first part of the paper concentrates on vehicle detection. To detect a vehicle in a static image, almost all researchers use the license plate location to extract the vehicle area from the image [20, 29]. However, this technique fails for vehicles with non-symmetrical front license plates, and the resulting vehicle contour may not be accurate enough for some vehicle types, particularly those that are either larger or smaller than average.

First, this paper proposes a robust vehicle detection scheme based on an AdaBoost algorithm. The basic idea is to extract Haar-like features from vehicle samples and then use the AdaBoost algorithm to train classifiers for detection, which is distinct from previous research on vehicle detection in static images. The second part of this paper concentrates on vehicle recognition, which can also be called vehicle type classification. Since vehicle images are subject to their environment and vehicle positions can vary, a Gabor wavelet transform and a local binary pattern (LBP) operator are used to extract multi-scale and multi-orientation vehicle features; principal component analysis (PCA) is then used to reduce the feature vector dimensions; finally, a Euclidean distance comparison is used to measure the similarity of the lower-dimensional vectors and determine the vehicle type.

3 Vehicle detection and recognition principle

This method of processing is divided into two stages: vehicle detection (or location) and vehicle recognition. First, a machine learning algorithm based on Haar-like features [15] and AdaBoost is applied to train a classifier for vehicle detection in the input image, which finds the region of interest (ROI) to be passed on for recognition. Then, a PCA classifier is trained on samples of different vehicle types to perform recognition, as illustrated in Fig. 2.

Fig. 2 Flowchart of the vehicle recognition system

4 Vehicle detection

4.1 Haar-like feature

A Haar-like feature is a well-known local texture descriptor for describing the local appearance of an object [2], and it has been used successfully in object detection and classification. A standard Haar-like feature value is calculated by subtracting the pixel sum of the white region from the pixel sum of the black region, as shown in Fig. 3. By changing the position, size, or scale of the Haar-like template, object feature information such as intensity gradients, edges, or contours can be captured. As shown in Fig. 3, a vehicle image contains salient rectangle, contour, and edge characteristics, for which Haar-like features are especially suitable. Moreover, the value of a Haar-like feature is easy to calculate using an integral image [12].

Fig. 3 Haar-like features
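
To illustrate the integral-image trick, the short NumPy sketch below (function names are ours, not from the paper) evaluates a two-rectangle Haar-like feature with a handful of table lookups, so its cost is independent of the rectangle size:

```python
import numpy as np

def integral_image(img):
    """Integral image with a zero top row/left column:
    ii[y, x] = sum of img[0:y, 0:x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Pixel sum over the rectangle at (x, y) of size w x h: 4 lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, x, y, w, h):
    """Two-rectangle (edge) feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# usage on a synthetic 30 x 30 grayscale patch
patch = np.random.randint(0, 256, (30, 30))
feat = haar_two_rect(integral_image(patch), 4, 4, 12, 8)
```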

4.2 AdaBoost algorithm

The purpose of the AdaBoost algorithm [25] is to use the features to discover the best weak classifiers and combine them into a strong classifier; it has repeatedly been shown to improve the performance of detection and classification applications. The strong classifier is in fact an ensemble composed of many weak classifiers, each only slightly better than random guessing. The AdaBoost algorithm can be described as follows:

 A. Given training samples (x 1, y 1), (x 2, y 2), …, (x n , y n ), where y i  = 1 denotes a positive sample (vehicle), y i  = 0 denotes a negative sample (non-vehicle), and n is the number of samples.

 B. Initialize the weights \( {w}_{1,i}=D(i) \).

 C. For t = 1, 2, 3 … T :

 (1). Normalize the weights: \( {q}_{t,i}=\frac{w_{t,i}}{{\displaystyle {\sum}_{j=1}^n{w}_{t,j}}} \)

 (2). For each feature f, train a weak classifier h(x, f, p, θ) and compute its weighted error rate \( {\varepsilon}_f={\displaystyle {\sum}_i{q}_i\left|h\left({x}_i,f,p,\theta \right)-{y}_i\right|} \). The weak classifier h(x, f, p, θ) is defined as: \( h\left(x,f,p,\theta \right)=\left\{\begin{array}{ll}1 & pf(x)<p\theta \\ {}0 & \mathrm{otherwise}\end{array}\right. \), where θ is a threshold and p = ±1 is a polarity indicating the direction of the inequality.

 (3). Choose the best weak classifier h t (x), namely the one with the lowest error \( {\varepsilon}_t={ \min}_{f,p,\theta }{\displaystyle {\sum}_i{q}_i\left|h\left({x}_i,f,p,\theta \right)-{y}_i\right|} \).

 (4). For each training round, update the weights: \( {w}_{t+1,i}={w}_{t,i}{\beta}_t^{1-{e}_i} \), where \( {\beta}_t=\frac{\varepsilon_t}{1-{\varepsilon}_t} \), and e i  = 0 if sample x i is classified correctly, e i  = 1 otherwise.

 D. The final strong classifier is: \( H(x)=\left\{\begin{array}{ll}1 & {\displaystyle \sum_{t=1}^T{\alpha}_t{h}_t(x)}\ge \frac{1}{2}{\displaystyle \sum_{t=1}^T{\alpha}_t} \\ {}0 & \mathrm{otherwise}\end{array}\right. \), where \( {\alpha}_t= \log \frac{1}{\beta_t} \).
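
As a didactic illustration of steps A–D, the following sketch trains AdaBoost over single-feature threshold stumps on a plain feature matrix. It is our own minimal rendering, not the paper's training code: a real detector would operate on integral-image Haar features and arrange the strong classifiers into a cascade.

```python
import numpy as np

def train_adaboost(X, y, T):
    """AdaBoost with threshold stumps, following Sec. 4.2.
    X: (n, d) feature matrix; y: labels in {0, 1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)          # step B: uniform initial weights D(i)
    classifiers = []                 # tuples (feature f, polarity p, theta, alpha)
    for _ in range(T):
        q = w / w.sum()              # step (1): normalize the weights
        best = None
        for f in range(d):           # step (2): search f, p, theta exhaustively
            for theta in np.unique(X[:, f]):
                for p in (1, -1):
                    h = (p * X[:, f] < p * theta).astype(int)
                    eps = float(np.sum(q * np.abs(h - y)))
                    if best is None or eps < best[0]:
                        best = (eps, f, p, theta, h)
        eps, f, p, theta, h = best   # step (3): lowest weighted error
        eps = min(max(eps, 1e-10), 1 - 1e-10)   # numerical guard
        beta = eps / (1.0 - eps)
        e = np.abs(h - y)            # e_i = 0 iff classified correctly
        w = w * beta ** (1 - e)      # step (4): down-weight correct samples
        classifiers.append((f, p, theta, np.log(1.0 / beta)))
    return classifiers

def strong_classify(classifiers, x):
    """Step D: weighted vote of the weak classifiers."""
    s = sum(a for (f, p, t, a) in classifiers if p * x[f] < p * t)
    return int(s >= 0.5 * sum(a for (_, _, _, a) in classifiers))
```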

4.3 Vehicle detection

The strong classifier is applied to vehicle detection by sliding a sub-window across the image at all locations with a fixed step. To reduce the detection time, scaling is applied to the detector itself rather than to the image. The initial size of the classifier is 15 × 15 pixels, the sub-window is scaled by a factor of 1.2 at each pass, and the sliding step is 2 pixels, as shown in Fig. 4. Neighbouring detections should be merged into one target object, because the same vehicle may be detected two or more times.

Fig. 4 Procedure of vehicle detection
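
Since the experiments in Section 6 use OpenCV, this sliding-window procedure corresponds to OpenCV's CascadeClassifier.detectMultiScale. A minimal usage sketch, with placeholder file names, might look like:

```python
import cv2

# a cascade trained as in Sec. 4 (the XML path is a placeholder)
detector = cv2.CascadeClassifier("vehicle_cascade.xml")

img = cv2.imread("snapshot.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# scaleFactor=1.2 grows the detector by 1.2 per pass;
# minNeighbors merges overlapping hits on the same vehicle
vehicles = detector.detectMultiScale(gray, scaleFactor=1.2,
                                     minNeighbors=3, minSize=(15, 15))
for (x, y, w, h) in vehicles:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```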

5 Vehicle recognition

The vehicle image can be translated into a model called a Local Gabor Binary Pattern Histogram Sequence [9], as illustrated in Fig. 5. The approach consists of the following procedures: (1) collecting vehicle images as input samples for each vehicle type and transforming the average image of a type into Gabor magnitude pictures in the frequency domain using Gabor wavelet filters; (2) extracting the LBP of each Gabor magnitude picture; (3) dividing each LBP picture into rectangular regions \( {R}_0,{R}_1,\dots, {R}_{m-1} \) and computing the histogram of each region; (4) concatenating the histograms of all regions to form the final histogram sequence, which represents the original vehicle image; (5) measuring vehicle similarity with the histogram feature vector after dimension reduction via PCA. These procedures are described in detail in the following sub-sections.

Fig. 5 Framework of the proposed vehicle recognition approach

5.1 Gabor wavelet transform

The Gabor wavelet filter has been widely used in face recognition since the pioneering work in that field. Considering the advantages of Gabor filters in object recognition [31], we adopt multi-resolution, multi-orientation Gabor filters to process input vehicle images for subsequent feature extraction. The Gabor wavelet filters are defined as follows [26]:

$$ {\psi}_{u,v}(z)=\frac{{\left\Vert {k}_{u,v}\right\Vert}^2}{\sigma^2} \exp \left(-\frac{{\left\Vert {k}_{u,v}\right\Vert}^2{\left\Vert z\right\Vert}^2}{2{\sigma}^2}\right)\left[{e}^{i{k}_{u,v}\cdot z}-{e}^{-{\sigma}^2/2}\right] $$
(1)

In formula (1), u and v denote the orientation and scale of the Gabor kernels, z = (x, y) is the pixel position, and the wave vector is \( {k}_{u,v}={k}_v{e}^{i{\phi}_u} \), where \( {k}_v={k}_{ \max }/{f}^v \) with k max being the maximum frequency and f the spacing factor between kernels in the frequency domain. This approach uses Gabor filters with 5 scales and 8 orientations, so each vehicle image yields a total of 40 Gabor Magnitude Pictures (GMPs), as shown in Fig. 6.

Fig. 6 Gabor magnitude pictures of the vehicle
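
A sketch of such a 5-scale, 8-orientation filter bank built with OpenCV's getGaborKernel is shown below. The kernel size, σ, and wavelength schedule are illustrative choices of ours, not the paper's exact parameters; the real and imaginary parts of each complex kernel are obtained with phase offsets 0 and π/2, and their responses are combined into a magnitude picture.

```python
import cv2
import numpy as np

def gabor_magnitude_pictures(gray):
    """Return 40 GMPs (5 scales x 8 orientations) of a grayscale image."""
    src = gray.astype(np.float32)
    gmps = []
    for v in range(5):                       # scales
        lambd = 4.0 * (2 ** (v / 2.0))       # wavelength grows with scale v
        for u in range(8):                   # orientations phi_u = u * pi / 8
            theta = u * np.pi / 8.0
            re = cv2.getGaborKernel((31, 31), 4.0, theta, lambd, 0.5, psi=0)
            im = cv2.getGaborKernel((31, 31), 4.0, theta, lambd, 0.5,
                                    psi=np.pi / 2)
            r = cv2.filter2D(src, cv2.CV_32F, re)
            i = cv2.filter2D(src, cv2.CV_32F, im)
            gmps.append(cv2.magnitude(r, i))  # |Gabor response| per pixel
    return gmps
```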

5.2 Local Gabor binary pattern (LGBP) and histogram sequence

After the Gabor transform, we encode the magnitude values with the LBP operator to enhance the information. Ojala introduced the LBP texture operator for 2D texture analysis; the operator was later extended to neighborhoods of different sizes [7]. Using the \( {\mathrm{LBP}}_{8,3} \) operator, the histogram of the labeled image can be defined as follows:

$$ {H}_i={\displaystyle \sum_{x,y}I\left\{{f}_l\left(x,y\right)=i\right\}},\kern1em i=0,\dots, n-1 $$
(2)

where n is the number of distinct labels produced by the LBP operator (at most 256), \( {f}_l\left(x,y\right) \) is the labeled image, and I(A) is the indicator function, equal to 1 if the event A is true and 0 otherwise.
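
For illustration, here is a minimal NumPy sketch of the basic 8-neighbour LBP and the histogram of formula (2); we use radius 1 for simplicity, whereas the paper's \( {\mathrm{LBP}}_{8,3} \) samples a wider ring around each pixel:

```python
import numpy as np

def lbp_image(gray):
    """Basic 8-neighbour LBP (radius 1): compare each neighbour with the
    centre pixel and pack the 8 bits into a label in [0, 255]."""
    g = np.asarray(gray, dtype=np.int32)
    c = g[1:-1, 1:-1]                       # centre pixels (border dropped)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    labels = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        labels |= (nb >= c).astype(np.int32) << bit
    return labels

def lbp_histogram(labels, n_bins=256):
    """Histogram H_i of formula (2) over one (sub-)region."""
    h, _ = np.histogram(labels, bins=n_bins, range=(0, n_bins))
    return h
```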

To form the LBP histogram sequence, the LBP histogram of each sub-region has to be computed. The LBP histogram of a sub-region captures the local features of that sub-region; by combining the LBP histograms of all sub-regions, the resulting histogram sequence represents the global characteristics of the whole image.

For representation efficiency, every magnitude image is divided into 6 × 6 sub-regions \( \left({R}_0,{R}_1,\dots, {R}_{35}\right) \), and the 36 sub-region histograms are combined to form the histogram sequence of that magnitude image. In the same way, the histogram pieces computed from the regions of all 40 LGBP maps are concatenated into one large histogram sequence that produces the final vehicle representation, as shown in Fig. 7.

Fig. 7 LGBP histogram sequence
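
Combining the two previous sketches, the histogram-sequence construction might look like the following (grid size and bin count as stated above; lbp_image and lbp_histogram are the helpers defined earlier):

```python
import numpy as np

def lgbp_histogram_sequence(gmps, grid=6, n_bins=256):
    """Concatenate the per-region LBP histograms of all 40 GMPs (Sec. 5.2)."""
    pieces = []
    for gmp in gmps:
        labels = lbp_image(gmp)             # LBP map of one magnitude picture
        H, W = labels.shape
        for r in range(grid):
            for c in range(grid):
                region = labels[r * H // grid:(r + 1) * H // grid,
                                c * W // grid:(c + 1) * W // grid]
                pieces.append(lbp_histogram(region, n_bins))
    return np.concatenate(pieces)           # final vehicle representation
```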

5.3 Feature dimension reduction using PCA

Finally, the cell-level histograms are concatenated to produce a high-dimensional global descriptor vector. For example, with 40 Gabor magnitude pictures, 255 histogram bins, and a 6 × 6 sub-region grid, the descriptor has 6 × 6 × 255 × 40 = 367,200 dimensions. We therefore apply simple PCA-based dimensionality reduction [1].

Assume that we have M feature vectors \( {x}_i\;\left(i=1,2,\dots, M\right) \), each representing a sampled vehicle image in a high-dimensional space. PCA finds a low-dimensional subspace whose basis vectors correspond to the maximum-variance directions in the original space of the vectors x i . Let m denote the mean vector of the x i :

$$ m=\frac{1}{M}{\displaystyle \sum_{i=1}^M{x}_i} $$
(3)

And let \( {w}_i \) be the mean-centered vector:

$$ {w}_i={x}_i-m $$
(4)

So the covariance matrix C can be defined as:

$$ C=W{W}^T $$
(5)

where W is the matrix whose columns are the vectors w i placed side by side. The eigenvalues and eigenvectors of C are then computed with a singular value decomposition, and the eigenvectors \( {u}_i \) are sorted in descending order of their corresponding eigenvalues λ i . The projection matrix M k is composed of the eigenvectors corresponding to the k largest eigenvalues:

$$ {M}_k=\left[{u}_1,{u}_2,\dots, {u}_k\right] $$
(6)

With M k , each mean-centered vector is projected to a low-dimensional vector \( \varOmega ={M}_k^T{w}_i \). The simplest way to determine which vehicle class best describes an input vehicle image is then to find the class k that minimizes the Euclidean distance:

$$ {\varepsilon}_k=\left\Vert \varOmega -{\varOmega}_k\right\Vert $$
(7)

where \( {\varOmega}_k \) is the vector describing the k-th vehicle class. If ε k is below a predefined threshold, the vehicle is classified as belonging to class k.
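
A compact sketch of this PCA projection and nearest-class rule, using SVD on the centred data as the text suggests (array shapes, names, and the per-class template vectors are our own choices):

```python
import numpy as np

def pca_fit(X, k):
    """X: (M, D) matrix whose rows are vehicle descriptors.
    Returns the mean vector and the (D, k) projection matrix M_k."""
    m = X.mean(axis=0)
    W = X - m
    # rows of Vt are unit eigenvectors of the covariance of the centred
    # data, sorted by decreasing singular value (hence eigenvalue)
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    return m, Vt[:k].T

def classify(x, m, Mk, class_templates):
    """Project x and pick the class minimizing the distance of formula (7).
    class_templates: list of projected per-class vectors Omega_k."""
    omega = (x - m) @ Mk
    dists = [np.linalg.norm(omega - ok) for ok in class_templates]
    return int(np.argmin(dists)), float(min(dists))
```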

6 Experimental evaluations

This section presents the results of the vehicle detection and vehicle recognition experiments, described separately below. All results were computed on a desktop computer with an Intel Core i7 3.4 GHz CPU, 4 GB RAM, and an NVIDIA Quadro 2000 GPU. The software was developed on Windows 7 with Visual Studio 2010 and OpenCV 2.4.3.

The local police department in Maanshan City provided a large collection of vehicle images recorded by traffic surveillance cameras over one week. The capture times were between 7:00 and 17:00, covering a wide range of illumination conditions. All original images have a resolution of over 1600 × 1200 pixels. For more reliable experiments, the vehicle image data set was split randomly into training and testing sets.

6.1 Vehicle detection result

5,000 positive images were used for classifier training, obtained by manually cropping the vehicle areas from images recorded in the traffic surveillance system database. All images were resized to 30 × 30 pixels for training. To obtain sufficient negative samples, we downloaded several image sets from the internet, yielding more than 15,000 negative samples. During training, the minimum per-stage detection rate was set at 99.5 % and the maximum per-stage false alarm rate at 50 %. Part of the image set is shown in Fig. 8.

Fig. 8 Examples of the positive samples and negative samples. a Positive images b Negative images

The detection rate (DR), the total detection rate (TDR), and the false alarm rate (FAR) are defined as follows to quantify the vehicle detection results of our experiment:

$$ DR=\frac{TP}{P}\times 100\%\kern0.36em TDR=\frac{TP+TN}{P+N}\times 100\%\kern0.24em FAR=\frac{FP}{N}\times 100\% $$
(8)

where P is the number of positive samples and N the number of non-vehicle (negative) samples; TP is the number of vehicle images detected correctly, TN the number of negative samples classified correctly, and FP the number of negative samples falsely detected as vehicles. The aim of vehicle detection is to achieve high DR and TDR values and a low FAR value. The results of the experiment are displayed in Table 1, and an example detection image is shown in Fig. 9.

Table 1 Detection results

Fig. 9 Result of vehicle detection
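
For completeness, a tiny helper that evaluates the three rates of formula (8) from raw counts:

```python
def detection_metrics(TP, TN, FP, P, N):
    """DR, TDR and FAR of formula (8), as percentages."""
    DR = 100.0 * TP / P
    TDR = 100.0 * (TP + TN) / (P + N)
    FAR = 100.0 * FP / N
    return DR, TDR, FAR
```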

A total of 720 images were used in our test, and processing them took about 43 s in total; the average detection time per image is therefore about 60 ms. As shown in Table 1, we achieved a high detection rate and a low false alarm rate, better than the previous method. Detection may fail because of occlusion between multiple vehicles or incomplete vehicles in the image, as demonstrated in Fig. 10. Unsuccessfully segmented vehicles are not included in the classification image sets.

Fig. 10 Examples of unsuccessful vehicle detection. a Multiple vehicles occluding one another in an image. b An incomplete vehicle in an image

6.2 Vehicle recognition result

A total of 223 images of different vehicle types were selected for testing, covering 8 classes, including the Buick Excelle, Volkswagen Lavida, Volkswagen Santana, Volkswagen Tiguan, Skoda Octavia, Chevrolet Cruze, and Toyota Corolla, all of them older models. The training sets were collected randomly from the traffic surveillance system image database, kept separate from the test set, as shown in Fig. 11.

Fig. 11 Testing samples of vehicle images

For each class, we calculate the number and rate of correct recognitions; the results for all vehicle types are shown in Table 2.

Table 2 Recognition results

From Table 2 we can conclude that the final recognition rate reaches 91.6 %, and recognizing all 200 images took about 13 s (including image reading time), so the average recognition time per image is well under 300 ms, which indicates very high time efficiency.

Unlike some other object detection and recognition fields, especially face recognition, there are no standard benchmark image sets for testing, so it is very difficult to make a fair comparison with other published investigations of vehicle recognition. We therefore had to run experiments on our own image sets; nevertheless, a rough comparison is possible. From the available information, recognition rates in other studies generally range from 85 to 96 %, and our result is above the average level of current vehicle recognition methods.

7 Conclusion

Accurate and robust vehicle detection and recognition remain challenging tasks in the field of intelligent transportation surveillance systems. In this paper, we presented a cascade of boosted classifiers based on the characteristics of vehicle images, to be used for vehicle detection in on-road scene images. Haar-like features and an AdaBoost algorithm were used to construct the classifier for vehicle detection, which is distinct from previously published research on vehicle detection. Next, the histogram intersection was used to measure the similarity of different LGBP histogram sequences, and nearest-neighbour classification under the Euclidean distance was exploited for the final decision, which is impressively insensitive to appearance variations due to lighting or vehicle pose. We tested this method on a realistic data set of over 800 frontal car images for vehicle detection and achieved a high accuracy of 97.3 %. Over seven types of vehicle with 227 images were tested in our recognition experiment; the recognition rate was over 92 % with a fast processing time, above the average level of current vehicle recognition methods. However, the images we used were captured during the day, so our future efforts will focus on detecting and recognizing vehicles at night, which is a very difficult problem to solve with existing technology.