1 Introduction

Traffic surveillance, an important technology in Intelligent Transportation Systems (ITS), plays a key role in providing the traffic parameters needed for effective transportation guidance and control [1,2,3,4,5]. Because the vehicle is the core traffic object, reliable and robust vehicle detection is a fundamental component of traffic surveillance.

Many detection methods exist, but vision-based methods have been the focus of recent years. They avoid the installation and maintenance difficulties of the induction coil method and are less affected by the scattering and weather sensitivity of infrared and ultrasonic methods. However, vision-based vehicle detection is challenging because of the large within-class variability. Complex urban conditions such as bad weather, illumination changes, and poor or strong lighting are hard to control. Moreover, light reflected from the ground sometimes causes a shadow to be detected as a vehicle. In congested traffic, vehicles occlude each other, so separate vehicles are easily merged into a single detection. To address these difficulties, researchers have proposed a number of constructive methods. Early work, which assumed good and stable lighting, applied traditional moving object detection to vehicles in traffic surveillance. These methods fall into three categories: background modeling, frame differencing and optical flow [6,7,8]. Their drawbacks are that they cannot detect a stationary vehicle and that a detected moving object is not necessarily a vehicle. Later work therefore used the visual features of vehicles to localize them in a static image. Features such as color [9], Gabor [10], SURF [11], Haar [12], edge [13] and corner [14] are commonly used to represent the vehicle. They are then fed into a discriminative classifier or a generative model to identify vehicles. The classifier can be trained with supervised machine learning methods such as SVM [15], AdaBoost [16] and neural networks [17]. This works well in sunny daytime conditions but may fail in poor lighting such as nighttime. In addition, the detection precision of this kind of method depends on large training sets, so its real-time performance and robustness are relatively weak.

Many recent studies have developed part-based models to recognize vehicles. According to a human cognitive study [18], a vehicle is perceived as a composition of a window, a roof, wheels, and other parts. These parts are usually easy to learn and detect from their appearance, edge and shape features, so the part detection results, or a model combining them, can localize a vehicle quickly. In [19, 20], vehicle headlights and taillights are used to represent the vehicle. Qing et al. [14] detected red regions in the rear view as candidate taillights. Juric et al. [21] detected bright blobs as candidate headlights. To filter out false detections, they assumed that the two headlights or taillights were aligned horizontally. Similarly, in [21, 22] the license plate is used to represent the vehicle. Cho et al. [21] first recognized character regions as candidate license plates and then localized the plate region from the inter-character distance within it. Chang et al. [22] first detected vertical edges in feature space to determine the y-axis extent of the license plate candidates and then identified the x-axis extent of the regions for accurate plate localization. These methods work well in normal traffic but may fail when occlusion occurs. To handle this, Winn et al. [23] used the relationships between parts to improve detection performance through a layout conditional random field model. Hoiem et al. [24] extended this method by using a 3D model to label the training samples. In addition, vehicle parts can be selected and learned automatically in a deformable part-based model [25]. Niknejad et al. [26] employed such a model with five components: front, back, side, front truncated and back truncated. Each component contained a root filter and six part filters, which were learned using a latent support vector machine on histogram of oriented gradients features.

Great progress has been made in the literature above. In particular, the method based on Deformable Part Models (DPM) in [26] reached the top level of average precision on the PASCAL 2009 and 2010 standard databases and handles occlusion effectively. However, these methods still struggle to balance detection speed and accuracy. Our method inherits the idea of DPM and, to address this issue, uses a Gaussian mixture model (GMM) of spatial relationships to localize vehicles in daytime and nighttime. The license plate, rear lamps and headlights are first localized from their distinctive color, texture and region features. The spatial relationships of the detected components are then compared with the learned GMMs (plate and rear lamp, both rear lamps, and both headlights), and vehicles are recognized from the resulting similarity probabilities.

This paper is organized as follows. Sections 2, 3 and 4 introduce the spatial relationship modeling, rear-view vehicle detection for daytime traffic using the GMM, and forward-view vehicle detection for nighttime traffic using the GMM, respectively. Experimental results and conclusions are given in Sects. 5 and 6.

2 Spatial Relationship Modeling Among Vehicle Components

We assume that the vehicle components visible in the rear view and forward view are the license plate, rear lamps and headlights. In China, these components generally follow unified standards in both size and installation position. We can therefore exploit their geometrical characteristics and spatial relationships for vehicle detection. Using component spatial relationships is a new idea in vehicle detection; the difficulty lies in how to model such an abstract relationship.

Because of symmetry, the following sections discuss the modeling only for the license plate and rear lamps in the rear view. We first select a topological structure to model the relationships between them, as shown in Fig. 1.

Fig. 1 Spatial relationship among license plate and rear lamps

In the model, node L represents the left rear lamp, node R the symmetric right rear lamp, and node P the license plate. Together they form a triangle. Localizing a vehicle with this geometric model yields far fewer false detections than localizing it with a single component. However, the license plate and rear lamps are easily occluded in a traffic jam, and if any one of the components goes undetected, the full triangle model cannot localize the vehicle. We therefore decompose the spatial geometry model into three pairwise relationships: between the two rear lamps, between the license plate and the left rear lamp, and between the license plate and the right rear lamp.

As shown in Fig. 1, the three relationships are denoted by three edges. Edge LP, connecting node L and node P, represents the relationship between the license plate and the left rear lamp; edge LR, connecting node L and node R, represents the relationship between the two rear lamps; and edge RP, connecting node R and node P, represents the relationship between the license plate and the right rear lamp.

To facilitate the modeling, each edge is described by a few geometrical characteristics. For edges LP and RP, we choose a distance and an angle. That is, edge LP is described by the spatial distance d(LP) between the license plate and the left rear lamp and the angle \(\theta (LP,Hor)\) between the edge and the horizontal line. Similarly, edge RP is described by the spatial distance d(RP) between the license plate and the right rear lamp and the angle \(\theta (RP,Hor)\) between the edge and the horizontal line. For edge LR, we choose three distances: the horizontal distance \(d_x (L,R)\) and the vertical distance \(d_y (L,R)\) between the two rear lamps, and the average height \(d_h (L,R)\) of the two rear lamps above the ground. From statistics of these five characteristic values over 1000 vehicle samples, including sedans, sports utility vehicles (SUVs) and buses, we draw the following conclusions.

(1) The probabilities that \(\theta (LP,Hor)\) is smaller than \(40{^{\circ }}\), that d(LP) lies between 0.6 m and 1.2 m, that \(d_y (L,R)\) is 0 m, and that \(d_h (L,R)\) is smaller than 0.7 m all approach 100%.

(2) The probability that \(d_x (L,R)\) lies between 0.8 m and 1.5 m is nearly 70%, and the probability that it lies between 1.5 m and 2.5 m is nearly 30%. The two ranges arise from the different types of vehicles.

(3) The probability that a value falls outside these ranges is close to 0.
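As an illustration of the five edge descriptors defined above, the following minimal Python sketch computes them from component centres. It assumes that each component is summarised by a centre point (x, y) in metres, with y measured upward from the ground; the function names and the sample coordinates are hypothetical and do not come from the paper.

```python
import math

def plate_lamp_edge(plate, lamp):
    """Return (d, theta_deg) for the edge joining a plate centre and a lamp centre."""
    dx = lamp[0] - plate[0]
    dy = lamp[1] - plate[1]
    d = math.hypot(dx, dy)  # spatial distance, e.g. d(LP)
    # Angle between the edge and the horizontal line, folded into [0, 90] degrees.
    theta = math.degrees(math.atan2(abs(dy), abs(dx)))
    return d, theta

def lamp_pair_edge(left, right):
    """Return (d_x, d_y, d_h) for the edge joining the two rear lamps."""
    d_x = abs(right[0] - left[0])      # horizontal distance
    d_y = abs(right[1] - left[1])      # vertical distance
    d_h = (left[1] + right[1]) / 2.0   # average height above the ground
    return d_x, d_y, d_h

# Hypothetical component centres in metres.
plate = (0.0, 0.5)
left_lamp = (-0.6, 0.6)
right_lamp = (0.6, 0.6)
d_lp, theta_lp = plate_lamp_edge(plate, left_lamp)
d_x, d_y, d_h = lamp_pair_edge(left_lamp, right_lamp)
```

With these made-up coordinates the descriptors fall inside the ranges reported in conclusions (1) and (2); real measurements would of course come from the detected components.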

If we describe the probability distribution of each of these characteristic variables with a function, we find that most of the data lie in the middle of the distribution and the fewest lie in the two tails. Treated as independent variables, \(d_x (L,R)\), \(d_y (L,R)\), \(d_h (L,R)\), \(\theta (LP,Hor)\) and d(LP) therefore each follow a Gaussian distribution. The density function is expressed as follows:

$$\begin{aligned} f(x)=\frac{1}{\sqrt{2\pi }\sigma }e^{-\frac{(x-u)^{2}}{2\sigma ^{2}}},\quad -\infty<x<\infty \end{aligned}$$
(1)

Here f(x) is the probability density function of the random variable X, u is the distribution mean, and \(\sigma ^{2}\) is the distribution variance; both parameters can be estimated from the sample mean and the sample variance.
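The estimation of u and \(\sigma ^{2}\) in Eq. (1) from samples can be sketched as follows. This is a minimal illustration, assuming the unbiased (n - 1) sample variance; the function names and the three sample values are hypothetical.

```python
import math

def fit_gaussian(samples):
    """Estimate the mean and variance of Eq. (1) from a list of measurements."""
    n = len(samples)
    mean = sum(samples) / n
    # Unbiased sample variance (an assumption; the paper does not say which form it uses).
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var

def gaussian_pdf(x, mean, var):
    """Evaluate the one-dimensional Gaussian density of Eq. (1)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Hypothetical horizontal lamp distances in metres.
mean, var = fit_gaussian([1.0, 1.2, 1.4])
density_at_mean = gaussian_pdf(mean, mean, var)
```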

For the spatial relationship between the two rear lamps, \(d_y (L,R)\) is almost always equal to 0, so we ignore the difference in the y direction and take only the sample values of \(d_x (L,R)\) and \(d_h (L,R)\) as inputs to train the model. For the spatial relationship between the license plate and the rear lamps, the left and right rear lamps are horizontally symmetric about the license plate, so we only need the sample values of \(\theta (LP,Hor)\) and d(LP) as training inputs; the model for the license plate and the right rear lamp follows by symmetry.

Fig. 2 Spatial relationship model of the both rear-lamps

Figure 2 shows the learned model of the two rear lamps based on X-Diff and Y-Height. The model has two peaks, which means that the spatial relationship of the two rear lamps concentrates at two locations. The narrower peak represents the distribution for small cars, and the broader peak represents the distribution for big cars.

Figure 3 shows the learned model of the plate and a rear lamp based on angle and distance. This model also has two peaks. The left peak represents the spatial relationship between the left rear lamp and the license plate, while the right peak represents that between the right rear lamp and the license plate.

Therefore, the relationships between components, whether between the two rear lamps or between the plate and a rear lamp, may be multimodal. Since each single peak in Figs. 2 and 3 has the characteristics of a Gaussian model, each relationship can be modeled as a GMM with two components, which satisfies the following expression:

$$\begin{aligned} \Pr (x)=\sum _{k=1}^2 {\pi _k } N(x;\mu _k ,\Sigma _k ) \end{aligned}$$
(2)

where \(\pi _k\) is the weight coefficient of the kth Gaussian component. Its value is determined by the component detection accuracy of the kth relationship model: the higher the detection accuracy of the single component, the larger the weight, subject to the condition \(\sum \nolimits _{k=1}^2 {\pi _k } =1\). In Eq. (2),

$$\begin{aligned} N\left( x;\mu _k ,\Sigma _k \right) =\frac{1}{2\pi \left| \Sigma _k \right| ^{1/2}}\exp \left[ {-\frac{1}{2}(x-\mu _k )^{T}\Sigma _k^{-1}(x-\mu _k )} \right] \end{aligned}$$

is the probability density function of the kth Gaussian component, \(\mu _k\) is its expectation, and \(\Sigma _k \) is its covariance matrix. Unlike the one-dimensional Gaussian distribution in Eq. (1), the variable x here is a two-dimensional column vector. Hence \(\mu _k\) is a \(2\times 1\) column vector and \(\Sigma _k \) is a \(2\times 2\) matrix. The parameters are estimated with the EM (Expectation Maximization) method introduced in [27], which alternates two main steps, the E step and the M step, to update the parameters until the convergence condition is met. The resulting GMM parameters of the license plate and a rear lamp are as follows: \(K=2, \pi _1 = \pi _2 =0.5, \mu _1 =[15.0625,66.8438]^{T}, \mu _2 =[164.9375,67.3750]^{T}\), and those of the two rear lamps are as follows: \(K=2, \pi _1 =0.7143, \pi _2 =0.2857, \mu _1 =[221.6064,84.4693]^{T}, \mu _2 =[138.5620,68.8671]^{T}, \Sigma _1 =\left[ \begin{array}{ll} 304.5098&{} 40.9447 \\ 40.9447&{} 214.8851 \\ \end{array}\right] \).
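To make the E step and M step concrete, the sketch below runs a generic EM loop for a two-dimensional GMM in plain Python. It is a textbook-style illustration under stated assumptions, not necessarily the exact procedure of [27]: the initial means are supplied by the caller, the weights start equal, the covariances start from the overall diagonal sample variance, and a small regularisation term `reg` is added to the diagonal. The function names and the synthetic data points are hypothetical.

```python
import math

def gauss2d(x, mu, s):
    """Density of a 2-D Gaussian with mean mu and 2x2 covariance s."""
    a, b, c, d = s[0][0], s[0][1], s[1][0], s[1][1]
    det = a * d - b * c
    u, v = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x-mu)^T s^{-1} (x-mu), using the closed-form 2x2 inverse.
    q = (d * u * u - (b + c) * u * v + a * v * v) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def em_gmm(data, init_mus, n_iter=50, reg=1e-6):
    """Fit a K-component 2-D GMM by EM; K is the number of initial means."""
    n, k = len(data), len(init_mus)
    pis = [1.0 / k] * k
    mus = [list(m) for m in init_mus]
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    vx = sum((p[0] - mx) ** 2 for p in data) / n + reg
    vy = sum((p[1] - my) ** 2 for p in data) / n + reg
    sigmas = [[[vx, 0.0], [0.0, vy]] for _ in range(k)]
    for _ in range(n_iter):
        # E step: responsibility of each component for each sample.
        resp = []
        for x in data:
            w = [pis[j] * gauss2d(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(w) or 1e-300  # guard against numerical underflow
            resp.append([wj / total for wj in w])
        # M step: re-estimate weights, means and covariances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            pis[j] = nj / n
            mus[j] = [sum(r[j] * x[i] for r, x in zip(resp, data)) / nj for i in (0, 1)]
            sxx = sum(r[j] * (x[0] - mus[j][0]) ** 2 for r, x in zip(resp, data)) / nj + reg
            syy = sum(r[j] * (x[1] - mus[j][1]) ** 2 for r, x in zip(resp, data)) / nj + reg
            sxy = sum(r[j] * (x[0] - mus[j][0]) * (x[1] - mus[j][1])
                      for r, x in zip(resp, data)) / nj
            sigmas[j] = [[sxx, sxy], [sxy, syy]]
    return pis, mus, sigmas

# Synthetic (made-up) samples forming two clusters of (X-Diff, Y-Height) values.
samples = [(138, 68), (140, 70), (142, 72), (139, 71), (141, 69),
           (218, 83), (220, 85), (222, 87), (219, 86), (221, 84)]
pis, mus, sigmas = em_gmm(samples, init_mus=[(150, 60), (210, 90)])
```

A fixed iteration count is used here for brevity; a convergence test on the change in log-likelihood would be the more usual stopping rule.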

Fig. 3 Spatial relationship model of the license plate and a rear lamp

3 Rear-View Vehicle Detection for Daytime Traffic Using the GMM

When the rear-view GMM is used for vehicle detection, license plate and rear-lamp localization is a critical step. In the daytime scene, the color, texture and region features of these components are distinctive, so we adopt the method in [28] for license plate and rear-lamp localization. License plate localization relies on the unique texture features of the plate and proceeds through derivation of a plate color conversion model, plate hypothesis score calculation and cascade plate refining. The result is shown in Fig. 4.

Similarly, rear lamp localization relies on color and region features and uses multi-threshold segmentation and connected component analysis. The result is shown in Fig. 5.
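The general idea of thresholding followed by connected component analysis can be sketched in plain Python as follows. This is only a generic illustration, not the procedure of [28]: the intensity bands, the 4-connectivity, the function names and the tiny sample image are all assumptions made for the example.

```python
from collections import deque

def segment(gray, bands):
    """Mark a pixel as foreground if its value falls in any (low, high) band."""
    return [[1 if any(lo <= p <= hi for lo, hi in bands) else 0 for p in row]
            for row in gray]

def connected_components(mask):
    """Label 4-connected foreground regions; return area and bounding box of each."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            area, r0, r1, c0, c1 = 0, r, r, c, c
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                area += 1
                r0, r1, c0, c1 = min(r0, y), max(r1, y), min(c0, x), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append({"area": area, "bbox": (r0, c0, r1, c1)})
    return regions

# Tiny made-up gray image: two bright blobs on a dark background.
gray = [[0, 200, 200, 0, 0],
        [0, 200, 200, 0, 210],
        [0, 0, 0, 0, 210]]
regions = connected_components(segment(gray, bands=[(180, 255)]))
```

Regions could then be filtered by area or aspect ratio before being treated as rear-lamp candidates.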

Fig. 4 License plate location process. a Input image, b converted license plate gray image, c image obtained by gradient statistics, d license plate localization result

Fig. 5 Rear-lamp location process. a Input image, b converted rear lamp gray image, c binary image after multi-threshold segmentation and connected domain tag, d rear lamp localization result

Fig. 6 Vehicle detection process in daytime. a Vehicle detection result, b the whole column represents license plate location process, c the whole column represents both rear lamps localization process

Once candidate license plates and rear lamps are identified, we select the license plates in the current frame as root nodes and connect each node with its neighboring candidate rear lamps to construct a model. If the model exists and satisfies the condition \(P(e_i |\Omega )>T\), we consider that the components belong to one vehicle, and the rear vehicle region can be labeled. Here T is a constant equal to 0.2, and \(P(e_i |\Omega )\) is the similarity probability between the model and the GMM, which may be the GMM of the plate and a rear lamp or the GMM of the two rear lamps. The vehicle detection results are shown in Fig. 6, where all detected targets are marked with red rectangles. During detection, if any one relationship model is successfully matched, we recognize a vehicle, which effectively handles the occlusion and adhesion problems. In the challenging case where the vehicle body color is similar to that of the rear lamps, however, the proposed method does not perform well. For this special case, we observe that the whole rear region except the license plate is red and that, after color model transformation, the bright area always exceeds 1/5 of the total area. We therefore fuse the bright region with the detected license plate region: if they intersect, we consider the target a vehicle, as shown in Fig. 7.
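The threshold test \(P(e_i |\Omega )>T\) can be illustrated with the sketch below. The text does not state how the similarity probability is normalised, so this example makes a loudly labelled assumption: the score is the peak-normalised Gaussian term \(\exp (-\tfrac{1}{2}d_M^2)\), where \(d_M\) is the Mahalanobis distance, maximised over the supplied components, so that it lies in (0, 1]. Only \(\mu _1\) and \(\Sigma _1\) of the two-rear-lamp GMM are reported in Sect. 2, so the example uses that component alone; the function names and the test points are hypothetical.

```python
import math

# Parameters reported in Sect. 2 for the first component of the two-rear-lamp GMM.
MU_1 = (221.6064, 84.4693)
SIGMA_1 = ((304.5098, 40.9447), (40.9447, 214.8851))
T = 0.2  # threshold stated in the text

def similarity(x, mu, sigma):
    """ASSUMED score: exp(-0.5 * squared Mahalanobis distance), in (0, 1]."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    u, v = x[0] - mu[0], x[1] - mu[1]
    q = (d * u * u - (b + c) * u * v + a * v * v) / det
    return math.exp(-0.5 * q)

def matches_model(x, components, threshold=T):
    """True if the best score over all (mu, sigma) components exceeds the threshold."""
    return max(similarity(x, mu, s) for mu, s in components) > threshold

components = [(MU_1, SIGMA_1)]
near = matches_model((225.0, 80.0), components)   # close to the learned mean
far = matches_model((400.0, 84.0), components)    # implausibly wide lamp spacing
```

Under this assumed score, a threshold of 0.2 corresponds to accepting feature vectors within roughly 1.8 Mahalanobis units of a component mean.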

Fig. 7 For the case of red vehicle detection. a Vehicle detection result, b license plate location result, c rear bright region (Color figure online)

4 Forward-View Vehicle Detection for Nighttime Traffic Using the GMM

Vehicle detection at nighttime is a major challenge for traffic surveillance because the poor lighting makes the daytime method unsuitable. We therefore propose a method for nighttime traffic that uses a GMM of the two headlights. Because the nighttime background is simple and only the light regions are bright, frame differencing is first adopted to segment the bright targets, as shown in Fig. 8.
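A minimal sketch of this segmentation step is given below, assuming gray frames stored as nested lists and a difference taken against a background image (as suggested by the panels of Fig. 8). The threshold value of 40 and the function name are hypothetical.

```python
def frame_difference(frame, background, threshold=40):
    """Binary mask of pixels whose gray value differs from the background by more than threshold."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]

# Made-up 2x2 gray frames: one pixel becomes bright.
background = [[10, 12], [11, 10]]
frame = [[10, 200], [12, 11]]
mask = frame_difference(frame, background)
```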

Fig. 8 Target segmentation process. a Input image, b gray image after frame difference, c background image, d binary image

Fig. 9 Correct paired result of both headlights

The binary result shows that light reflected from the ground is a serious problem at night and interferes with vehicle identification. To resolve this problem, we apply a method [29] based on the symmetry and circular shape of the two headlights, which uses circularity and area as classification features to filter candidate headlights. The two headlights are then paired by calculating the geometric similarity of symmetrical candidates, as shown in Fig. 9. Yet two questions arise.

Question 1: Adjacent vehicle headlights are mistakenly matched

When two vehicles of the same type travel side by side, candidate headlights belonging to different vehicles can be paired as the headlights of one vehicle, as in Fig. 10. In the image, two sports utility vehicles (SUVs) travel side by side in the current frame. After frame differencing and geometric similarity calculation, the right headlight of the vehicle in the left lane shows the highest similarity with the left headlight of the vehicle in the right lane, so the method described above pairs them by mistake.

Question 2: The vehicle is identified more than once

When vehicle headlights coexist with shadow lights or other small lights in the scene, the vehicle is identified more than once. In Fig. 11, the two shadow lights are wrongly labeled as vehicle headlights, so the vehicle is identified twice.

To resolve the two questions, we model the spatial relationship of the paired headlights, taking their horizontal distance and average height into consideration. If the model satisfies the condition \(P(e_i |\Omega )>T\), we consider that the paired headlights belong to one vehicle. T is again a constant equal to 0.2, and \(P(e_i |\Omega )\) is the similarity probability between the model and the GMM of the two headlights.
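The candidate filtering and preliminary pairing that precede this GMM check can be sketched as follows. This is an illustrative simplification rather than the method of [29]: circularity is taken as \(4\pi A/P^{2}\) (equal to 1 for a perfect circle), and the geometric similarity is reduced to an area ratio between vertically aligned blobs. The thresholds, dictionary keys and function names are hypothetical.

```python
import math

def circularity(area, perimeter):
    """4*pi*A / P^2: 1.0 for a perfect circle, smaller for elongated shapes."""
    return 4.0 * math.pi * area / (perimeter ** 2)

def filter_headlights(blobs, min_area=30.0, min_circ=0.7):
    """Keep blobs that are large enough and round enough to be headlights."""
    return [b for b in blobs
            if b["area"] >= min_area and circularity(b["area"], b["perimeter"]) >= min_circ]

def pair_headlights(blobs, max_dy=5.0):
    """Greedily pair each blob with the most similar unpaired blob at a similar height."""
    pairs, used = [], set()
    for i, a in enumerate(blobs):
        if i in used:
            continue
        best, best_score = None, 0.0
        for j in range(i + 1, len(blobs)):
            b = blobs[j]
            if j in used or abs(a["cy"] - b["cy"]) > max_dy:
                continue
            # ASSUMED similarity: ratio of the smaller area to the larger one.
            score = min(a["area"], b["area"]) / max(a["area"], b["area"])
            if score > best_score:
                best, best_score = j, score
        if best is not None:
            used.update((i, best))
            pairs.append((i, best))
    return pairs

# Made-up blobs: two round headlights and one elongated ground reflection.
blobs = [
    {"area": math.pi * 25, "perimeter": 2 * math.pi * 5, "cx": 100.0, "cy": 100.0},
    {"area": 100.0, "perimeter": 60.0, "cx": 130.0, "cy": 140.0},
    {"area": math.pi * 25, "perimeter": 2 * math.pi * 5, "cx": 180.0, "cy": 101.0},
]
candidates = filter_headlights(blobs)
pairs = pair_headlights(candidates)
```

Each resulting pair would then be passed to the spatial-relationship test \(P(e_i |\Omega )>T\), which is what rejects pairs spanning two vehicles or formed by shadow lights.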

Fig. 10 Adjacent vehicle headlights are matched by mistake

Fig. 11 Shadow lights are matched by mistake

5 Results and Analysis

To test and evaluate our method, we applied it to vehicle detection on urban roads. We selected Xi’an southern 2nd road and Shanghai huaxia road in China as the test environments and collected image sequences with a high-resolution charge coupled device camera. The proposed system is implemented in Visual C\(++\) and processes video in the mts format on a Windows 8 platform with an Intel 3.6-GHz central processing unit (CPU) and 4 GB of random access memory (RAM). Each frame is \(1920\times 1080\) pixels, and the sampling rate of the sequence is 50 frames/s. The detection rate \(t_p \), false detection rate \(f_p \) and missing detection rate \(f_n\) of our method were evaluated for both the vehicle and its components, as shown in Table 1.

Table 1 The system performance

They are defined in Eq. (3).

$$\begin{aligned} t_p =\frac{N_{TP}}{N_{TP} +N_{FP} }\quad f_n =\frac{N_{FN} }{N_{TP} +N_{FP}}\quad f_p =\frac{N_{FP} }{N_{TP} +N_{FP} } \end{aligned}$$
(3)

where, \(N_{TP} , N_{FP}\), and \(N_{FN} \) are the numbers of the targets identified as true positives, false positives and false negatives respectively.
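The three rates in Eq. (3) can be computed directly from the counts, as in the short sketch below. It follows the equation exactly as written, with all three rates sharing the denominator \(N_{TP} +N_{FP}\); the function name and the example counts are made up for illustration and are not results from Table 1.

```python
def detection_rates(n_tp, n_fp, n_fn):
    """Return (t_p, f_n, f_p) as defined in Eq. (3)."""
    denom = n_tp + n_fp  # common denominator used by all three rates in Eq. (3)
    t_p = n_tp / denom
    f_n = n_fn / denom
    f_p = n_fp / denom
    return t_p, f_n, f_p

# Made-up counts, for illustration only.
t_p, f_n, f_p = detection_rates(n_tp=90, n_fp=10, n_fn=5)
```

With these definitions \(t_p\) and \(f_p\) always sum to 1, while \(f_n\) is measured against the number of reported detections rather than the number of ground-truth vehicles.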

The performance of component detection is closely related to that of vehicle detection. We tried to ensure a high detection rate for the vehicle components, since they determine the upper limit of the vehicle detection rate. As shown in Table 1, the average detection rate reaches 94.7% for the license plate, 92.8% for the rear lamp and 93% for a single headlight, and the final vehicle detection performance is superior to that of the individual components. Our method achieved an average detection precision of 95.5% over all the scenarios. Furthermore, the average running time per frame was 20 ms, which satisfies the real-time requirement.

In order to further prove the superiority of our method, Fig. 12 shows some special cases in daytime scene.

Fig. 12 Vehicle detection at daytime in some special cases. a License plate occlusion, b right rear lamp occlusion, c left rear lamp occlusion

The incomplete detections of rear lamps in Fig. 12b, c were caused by occlusion from passing pedestrians or from vehicles behind, and that of the license plate in Fig. 12a was caused by the blue car. However, they did not affect the final vehicle localization, because with the proposed method a vehicle is recognized as long as two components are correctly detected. This effectively resolves the occlusion and adhesion problems in traffic jams.

Figure 13 shows some special cases in nighttime scene too.

Fig. 13 Vehicle detection results at nighttime in some cases. a Small vehicle, b big vehicle, c vehicles move side by side, d with both shadow lights

Interference from vehicles moving side by side and from shadow lights is present in Fig. 13c, d. Before the spatial relationship model of the two headlights was applied, these lights were mistakenly paired and identified as a vehicle. Once the GMM of the two headlights was added to the method, both kinds of interference were effectively eliminated.

6 Conclusion

In this paper, we have proposed a vehicle detection method using a spatial relationship GMM for complex urban surveillance based on a high-resolution camera. It is composed of two detectors: one based on the rear view in daytime and the other based on the forward view in nighttime.

For the daytime scene, the components of the vehicle rear, including the license plate and rear lamps, are first detected using traditional methods. We then construct a GMM to model their spatial relationship and estimate the likelihood under the model to identify the vehicle, which significantly improves the correct detection rate, especially when occlusion occurs.

For the nighttime scene, light regions are first detected by frame differencing, and a preliminary pairing of headlights is then completed based on their symmetry and circular shape. Finally, to counter the interference of shadow lights and other warning lights, the GMM of the two headlights is added to correct the detection results, which greatly increases the vehicle detection rate at nighttime.

The effect is verified through experiments in practical urban scenarios. In the future, we will focus on selecting more components to construct the model in 3D space for high-precision vehicle detection.