1 Introduction

Intelligent transportation systems (ITS) refer to a variety of tools, software, hardware, and communication technologies that can be applied in an integrated fashion to improve the efficiency and safety of vehicular traffic [1, 2]. ITS support the operation of transportation services, transit management, and traveler information [3, 4]. Research in ITS targets improvements in the safety of future transportation systems by integrating safety-enhancing functions within vehicles. Technologies such as radar/lidar, loop detectors, and traffic video analysis have been used to provide such safety features; we discuss these technologies in Sect. 2 below.

On-road automated vehicle detection and tracking has been one of the most active research topics over the past decades [5, 6] and plays a vital role in the evolution of intelligent transportation systems (ITS). Many techniques have been developed for on-road vehicle detection. They can be classified into software-based computer vision approaches and hardware-based active-sensor approaches using millimeter-wave radar and lidar. Computer vision offers a strong foundation for analyzing traffic videos: vehicle interaction, automated traffic warning systems, traffic-rule violations, and congestion are all problems that can be addressed with on-road surveillance cameras. Foreground estimation, background estimation, and motion tracking are classical visual techniques for detecting and classifying vehicles on highways of interest. Video analysis of urban areas remains more challenging because it depends on road parameters such as traffic density, the variety of road users, and the degree of occlusion [7]. Comparing proposed algorithms is also difficult because no standardized benchmark dataset exists [8].

ITS provide the opportunity to establish functions in the infrastructure and/or vehicle to mitigate these deficiencies. For example, sensors on the main-line highway could give side-street traffic at a stop-controlled intersection advance warning of an oncoming vehicle, compensating for any sight-distance deficiencies for the side-street traffic. In-vehicle sensors could warn inattentive or drowsy drivers before they hit another vehicle or object, or before they run off the road. The possibilities for improving the safety of our transportation system are endless [9].

In this paper, video analysis is applied to vehicular tracking under different weather conditions (e.g., rain, fog, and reduced light). Such analysis has rarely been performed despite being highly desirable. The proposed traffic video analysis applies a Kalman filter in the presence of these conditions. Three types of video degradation noise (salt and pepper, speckle, and Gaussian) are added at different levels of occurrence, and a system of filters is then applied to the degraded test video, producing a new record for each filter mask. As an example application of vehicular tracking, we calculate the distance between two vehicles. This distance supports a safety function that provides automated warnings to drivers, which may reduce aggressive driving behavior and give drivers more time to react to road events. Figure 1 shows the main block diagram of the study.

Fig. 1
figure 1

Proposed analysis block diagram

The rest of this article is organized as follows: Sect. 2 reviews ITS vehicular tracking technologies; Sect. 3 presents the distance estimation and tracking algorithm; Sect. 4 describes the code setup; Sect. 5 presents the experimental results; Sect. 6 concludes.

2 ITS vehicular tracking technologies

This section gives an overview of the following ITS technologies: radar/lidar and computer vision. One of the most widely used vehicle detection techniques is the millimeter-wave radar active sensor. Typically, a frequency-modulated continuous-wave signal is emitted; once the waveform is received and demodulated, its frequency content is analyzed. The distance between the active sensor and the vehicle of interest is then easily calculated from the frequency shift between the transmitted and received signals. Detected objects are tracked according to their motion characteristics [10].

Vehicle detection and tracking based on active millimeter-wave radar works fairly well in challenging weather conditions (rain, fog, and darkness), although noisy measurements require an extensive cleaning process. Millimeter-wave radar detects and tracks all moving objects, so a classification step is needed to identify which objects are vehicles according to their relative acceleration, motion, and size; detection of stopped vehicles is also reasonably feasible [10, 11]. Like radar, lidar detects and tracks all moving objects and requires the same kind of classification step. Lidar provides cleaner measurements but is more sensitive to precipitation than radar. It uses a rotating hexagonal mirror that splits the laser beam [11]: the upper three beams detect vehicles and obstacles, while the lower three detect road features and landmarks [12]. Lidar cost remains a challenging issue. Vehicle detection and tracking based on computer vision uses a system of installed surveillance cameras. Camera-based acquisition provides a wide field of view, allowing detection and tracking across multiple lanes [13]. The imaging system contains a lens and a charge-coupled device (CCD). Computer vision requires sophisticated computation because of the large number of homogeneous pixels in each digital video frame, but successive frames give researchers a rich source of visual information, and installation and maintenance cause no traffic disruption. With computer vision, recognizing objects as vehicles is easier than with active lidar and radar, and no separate classification process is needed [14].

One of the most foundational computer vision techniques is the Kalman filter (KF). The KF is an estimator that infers the parameters of interest from inaccurate, indirect, and uncertain observations [15]. Its filtering is based on linear minimum mean-square-error estimation: the goal is to minimize the estimated error covariance under certain presumed conditions. The KF produces good results thanks to its optimality and structure, and its form is convenient for online real-time processing. It does not just clean up the observations; it also projects them into an enhanced version of the measurements [16]. The basic mathematical model of the KF involves a discrete-time (possibly nonlinear) dynamic system as follows:

$$ x_{k + 1} = F(x_{k} ,v_{k} ), $$
(1)
$$ y_{k} = H(x_{k} ,n_{k} ), $$
(2)

where xk represents the unobserved state of the system, yk the observed output, vk the process noise that drives the dynamic system, and nk the observation noise [15]. The system models F and H are assumed known. In state estimation, the KF is the best predictor, achieving a recursive maximum-likelihood estimate of the state [16]. The KF model can be expressed as follows:

$$ \widehat{{X_{k} }} = K_{K} \cdot Z_{K} + \left( {1 - K_{K} } \right) \cdot \widehat{{X_{K - 1} }}, $$
(3)

where \( \widehat{{X_{k} }} \) is the current estimate, KK is the Kalman gain taking discrete values (K1, K2, K3, …), ZK is the measured value, and \( \widehat{{X_{K - 1} }} \) is the previous estimate.
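The scalar update in Eq. (3) can be illustrated with a short Python sketch (the paper's implementation is in MATLAB; the gain schedule KK = 1/k used here is an assumed example that reduces the filter to a running mean, not the gain used in the paper):

```python
# Minimal sketch of the scalar Kalman update in Eq. (3):
# x_hat_k = K_k * z_k + (1 - K_k) * x_hat_{k-1}.
def scalar_kalman(measurements):
    x_hat = measurements[0]          # initialize with the first measurement
    estimates = [x_hat]
    for k, z in enumerate(measurements[1:], start=2):
        K = 1.0 / k                  # assumed gain: reduces to a running mean
        x_hat = K * z + (1 - K) * x_hat
        estimates.append(x_hat)
    return estimates
```

With this gain the estimate is simply the average of all measurements so far, which makes the blending behavior of Eq. (3) easy to verify by hand.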

The Kalman filter has five performance indices that directly affect the accuracy of the tracking methodology:

  1.

    Motion model The KF can follow one of two motion models. The first is the constant-velocity model, where the velocity of the moving object is assumed constant. The second is the constant-acceleration model, where the object is assumed to accelerate at a constant rate.

  2.

    Initial location For the KF to track a certain object, its initial location must be known. This parameter is given in X and Y coordinates.

  3.

    Initial estimation error This parameter expresses the amount of error in the X and Y directions that the KF should accept before deeming a certain track “unacceptable” and dropping it altogether. It affects only the accuracy of the first few predictions, since the KF later adapts and computes its own estimation error values from previous results.

    Increasing this value lets the KF adapt faster, but it may also make the first few predictions inaccurate.

  4.

    Motion noise This parameter represents the acceptable deviation from the chosen motion model, which may not fit the object’s velocity or acceleration perfectly. Increasing it can make the KF fit the object’s movement better, but it may also introduce some inaccuracy.

  5.

    Measurement noise This value is given as a scalar. Increasing it causes the KF to adhere more closely to the chosen motion model.
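The five indices above can be sketched together in a minimal 1-D constant-velocity Kalman filter. This is a standard textbook formulation in Python, not the MATLAB toolbox configuration used in the paper; the matrix forms and the way the noise scalars enter Q and R are assumptions:

```python
import numpy as np

def track_1d(zs, init_loc=0.0, init_err=100.0,
             motion_noise=1.0, meas_noise=1.0, dt=1.0):
    F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity motion model
    H = np.array([[1.0, 0.0]])             # we observe position only
    Q = motion_noise * np.eye(2)           # process (motion) noise
    R = np.array([[meas_noise]])           # measurement noise (scalar)
    x = np.array([init_loc, 0.0])          # initial location, zero velocity
    P = init_err * np.eye(2)               # initial estimation error
    out = []
    for z in zs:
        x = F @ x                          # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R                # update
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([z]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return out
```

Feeding in measurements from a constant-velocity object shows the estimate converging quickly when the initial estimation error is large, matching the observation in Sect. 5 that this index mainly affects the first few predictions.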

3 Distance estimation and tracking algorithm

Figure 2 presents the main algorithm: how the distance between two adjacent vehicles is estimated after a vehicle is detected in a video, how that vehicle is kept tracked through the video according to its centroid, and how other moving objects are eliminated by specifying a minimum threshold size. The following steps summarize the algorithm:

Fig. 2
figure 2

Shows the flow chart for the detection, distance estimation, and tracking procedure in the proposed system

Step 1 Read a traffic video containing multiple moving vehicles according to Algorithm 1:

figure a

Step 2 Separate the background from the foreground (vehicles). A number of consecutive frames are taken and the pixels are divided into static and dynamic pixels. The foreground detector uses background subtraction, applied as follows:

  a)

    Specify a reference frame that represents the background of interest; this initializes the background model.

  b)

    Estimate the threshold value required to satisfy the desired detection rate. The choice of threshold plays a vital role in the subtraction operation.

  c)

    Identify and classify each pixel according to its brightness and chromaticity relative to the corresponding pixel in the background frame. The four pixel classes are summarized as follows:

Class 1: moving foreground pixels Both chromaticity and brightness differ from the expected values in the background frame.

Class 2: shaded background pixels Chromaticity is similar to that in the background frame, but brightness is lower.

Class 3: ordinary background pixels Both chromaticity and brightness are similar to those in the background frame.

Class 4: highlighted background pixels Chromaticity is similar to that in the background frame, but brightness is higher.

The result is a binary frame in which black represents the background and white represents the moving objects, as shown in Figs. 3 and 4. The mathematical representation of background subtraction is as follows:

$$ x_{t} \left( s \right) = \left\{ {\begin{array}{*{20}c} 1 & {d(I_{s,t} ,B_{s} ) > \tau } \\ 0 & {\text{otherwise}} \\ \end{array} } \right., $$
(4)

where I is the observed video sequence, Is,t the foreground model at pixel s at time t, Bs the background model at pixel s, B the static background, τ the threshold, xt the motion mask, and d the distance between Is,t and Bs. The occurrence probability of color I at pixel s is given by:

$$ p\left( {I_{s,t} } \right) = \mathop \sum \limits_{i = 1}^{k} w_{i,s,t} \cdot N\left( {\mu_{i,s,t} ,\varSigma_{i,s,t} } \right), $$
(5)

where \( N\left( {\mu_{i,s,t} ,\varSigma_{i,s,t} } \right) \) is the ith Gaussian model and wi,s,t are the corresponding weights.
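The thresholding in Eq. (4) can be sketched in a few lines of Python. Grayscale frames and an absolute-difference distance d are assumptions here; the paper's foreground detector uses the Gaussian-mixture background model of Eq. (5):

```python
import numpy as np

# Motion mask of Eq. (4): a pixel is foreground (1) when its distance
# from the background model exceeds the threshold tau, else 0.
def motion_mask(frame, background, tau=30):
    d = np.abs(frame.astype(int) - background.astype(int))  # assumed d
    return (d > tau).astype(np.uint8)
```

As noted in the text, the choice of tau directly controls the detection rate of the subtraction.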

Fig. 3
figure 3

Shows detected vehicles

Fig. 4
figure 4

Shows the next frame of interest

Step 3 Apply morphological operations to preprocess the test video and remove undesirable objects. This analysis has two phases. The first phase removes undesirable moving objects that are smaller (e.g., birds) or larger (e.g., pedestrians) than the desired moving vehicles [17, 18]; it relies on adaptive thresholding. Algorithm 2 presents the phase-1 morphological operations.

figure b

The second phase of the morphological analysis fills undetected pixels inside each vehicle window by means of morphological closing. This technique applies a dilation followed by an erosion: closing enlarges the bright boundaries of the foreground objects (vehicles) in each frame and shrinks the background holes inside those vehicle regions. Algorithm 3 presents phase 2 [19].

figure c

The proposed system operates regardless of possible object losses due to conditions that change the objects’ appearance.
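The dilation-then-erosion closing of phase 2 can be sketched with plain NumPy (a 3×3 structuring element is an assumed choice; zero padding at the borders means erosion shrinks the image edge, which is the standard convention):

```python
import numpy as np

def _neigh_max(m):
    # dilation: maximum over each pixel's 3x3 neighborhood
    p = np.pad(m, 1)
    return np.max([p[i:i + m.shape[0], j:j + m.shape[1]]
                   for i in range(3) for j in range(3)], axis=0)

def close_mask(mask):
    dil = _neigh_max(mask)
    # erosion: minimum over each pixel's 3x3 neighborhood
    p = np.pad(dil, 1)
    return np.min([p[i:i + mask.shape[0], j:j + mask.shape[1]]
                   for i in range(3) for j in range(3)], axis=0)
```

Running this on a solid blob with a single-pixel hole fills the hole, which is exactly the effect the phase-2 analysis relies on for vehicle windows.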

Step 4 Apply blob analysis to isolate the blobs (vehicles) in each binary frame. A blob consists of a group of connected pixels representing one vehicle. Blob analysis extracts the most salient statistical features: area, perimeter, centroid, and bounding box [20]. These features are used to classify blobs and to decide whether they contain the objects of interest. In this paper, we calculate and assign the following properties to each detected vehicle: the object’s area, its bounding box, and the X and Y coordinates of the blob’s centroid [24]. Figures 3 and 4 show the output of the proposed system for two successive frames after applying blob analysis and calculating the centroids of the moving objects [21, 22]. Blob detection is based on the normalized Laplacian of Gaussian, (LoG)norm:

$$ L\left( {x,y,t} \right) = g\left( {x,y,t} \right) \ast f(x,y,t), $$
(6)

where \( g\left( {x,y,t} \right) = \frac{1}{{2\pi t^{2} }}e^{{ - \frac{{x^{2} + y^{2} }}{{2t^{2} }}}} \) is the Gaussian kernel and f(x, y, t) is the video frame of interest. The (LoG)norm can be estimated as follows:

$$ \nabla_{norm}^{2} L = t\left( {L_{xx} + L_{yy} } \right). $$
(7)
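The per-blob properties used in Step 4 can be sketched for a single connected blob given as a binary mask (labeling multiple blobs is omitted for brevity; the function name and return layout are illustrative, not the toolbox's own):

```python
import numpy as np

# Extract the Step-4 blob features: area, bounding box, and centroid.
def blob_features(mask):
    ys, xs = np.nonzero(mask)            # coordinates of the blob's pixels
    area = len(xs)                       # number of connected pixels
    bbox = (xs.min(), ys.min(), xs.max(), ys.max())  # (x0, y0, x1, y1)
    centroid = (xs.mean(), ys.mean())    # (X, Y) used later for tracking
    return area, bbox, centroid
```

The centroid returned here is the quantity fed to the KF and to the distance computation in Step 6.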

Step 5 For each detected blob, perform the following:

  • Assign an ID number that identifies the vehicle throughout the duration of its appearance in the video as shown in Figs. 3 and 4.

  • Apply KF to actually track the appropriate vehicle and associate its detections throughout the video to a single track.

  • Calculate the track’s age: how long the particular object has been detected.

  • Estimate the total visible count, which indicates in how many frames the object has been detected.

  • Estimate the consecutive invisible count, which indicates for how many consecutive frames the object has been undetectable.

Step 6 Estimate the distance between two adjacent vehicles from the spatial coordinates of the centroid of each blob (calculated in Step 4). The distance between the centroids (x1, y1) and (x2, y2) of two blobs is the Euclidean distance:

$$ d = \sqrt {\left[ {\left( {x_{2} - x_{1} } \right)^{2} + \left( {y_{2} - y_{1} } \right)^{2} } \right]} . $$
(8)
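Eq. (8) is a one-liner in Python:

```python
import math

# Euclidean distance of Eq. (8) between two blob centroids (x, y).
def centroid_distance(c1, c2):
    return math.hypot(c2[0] - c1[0], c2[1] - c1[1])
```

This same function serves both the inter-vehicle distance of Step 6 and the prediction-to-detection cost of Step 8.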

Step 7 For each of the following frames, the KF predicts the new location of each blob and places a bounding box around it [21]. The KF is used instead of other object trackers [e.g., the Hidden Markov Model (HMM)] [22, 23] because it offers several advantages: it predicts the future locations of moving vehicles, it reduces the noise introduced by inaccurate detections, it facilitates associating multiple objects with their tracks, and it tracks multiple moving vehicles with low processing time.

Likewise, the Euclidean distance is used instead of other distance estimation techniques [e.g., the k-nearest neighbors algorithm (k-NN)] [24, 25] because it offers acceptable processing time and computational simplicity [26, 27], whereas k-NN is a learning method requiring more sophisticated analysis, being mainly applied as a classifier and predictor [28,29,30,31]. The proposed analysis needs a simple way of calculating this distance because the centroid values of each detected blob vary continuously from frame to frame.

Step 8 As a particular vehicle’s position is predicted, some error is generally expected. To ensure that vehicles are tracked in spite of changing position, speed, and acceleration, we calculate the distance between the centroids of each blob in two consecutive frames using the Euclidean distance of Eq. (8). If this difference is less than a specified threshold, the prediction is deemed “accurate” and the track’s confidence level is incremented; if it is greater, the prediction is deemed “inaccurate” and the confidence level is decremented. To implement this mechanism, a cost matrix, shown in Fig. 5, is created. The matrix has M rows and N columns, where M is the number of tracks (predictions) and N is the number of detections; each element represents the cost of matching the mth prediction to the nth detection, calculated as the Euclidean distance. If this cost is low, the prediction is matched to the detection; otherwise no match is made. Another parameter in deciding whether a track is assigned to a detection is the “cost of non-assignment”, which represents the cost of leaving a prediction or a detection unassigned: the higher this parameter, the more likely detections and predictions are to be matched. Figure 5 shows the cost matrix produced by the proposed system.

Fig. 5
figure 5

Shows cost matrix for detection and prediction
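The cost-matrix mechanism of Step 8 can be sketched as follows. The Euclidean cost matches the text; the greedy matcher and the default cost-of-non-assignment value are stand-in assumptions for whatever solver the toolbox uses:

```python
import numpy as np

# Build the M x N cost matrix (Euclidean distance between each predicted
# track position and each detection) and match greedily, leaving pairs
# unassigned when their cost exceeds the cost of non-assignment.
def assign_tracks(predictions, detections, cost_of_non_assignment=20.0):
    cost = np.linalg.norm(predictions[:, None, :] - detections[None, :, :],
                          axis=2)
    matches, used = [], set()
    for m in range(len(predictions)):
        n = int(np.argmin(cost[m]))
        if n not in used and cost[m, n] < cost_of_non_assignment:
            matches.append((m, n))
            used.add(n)
    return matches
```

Raising `cost_of_non_assignment` makes more prediction–detection pairs acceptable, mirroring the behavior described in the text.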

Step 9 For “accurate” predictions obtained from Steps 7 and 8, perform the following:

  • Update the object’s bounding box to the current one instead of the previously predicted one.

  • Add 1 to the age of the track.

  • Add 1 to the track’s visibility count.

  • Set the invisibility count to 0.

Step 10 For “inaccurate” predictions obtained from Steps 7 and 8, perform the following:

  • Add 1 to the age of the track.

  • Add 1 to its invisibility count.

Step 11 Delete tracks with frequently inaccurate predictions (i.e., tracks that stay invisible for a certain number of consecutive frames).

Step 12 Create a new track for every new vehicle that enters the camera’s scope and assign it a corresponding track structure so the KF can start tracking it.
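The bookkeeping of Steps 9–12 can be sketched with a small track structure (field names and the pruning threshold are illustrative, not the toolbox's own structure):

```python
from dataclasses import dataclass

@dataclass
class Track:
    track_id: int
    age: int = 0
    total_visible: int = 0
    consecutive_invisible: int = 0

    def update(self, matched: bool):
        self.age += 1                     # Steps 9 and 10: age always grows
        if matched:                       # "accurate" prediction (Step 9)
            self.total_visible += 1
            self.consecutive_invisible = 0
        else:                             # "inaccurate" prediction (Step 10)
            self.consecutive_invisible += 1

# Step 11: drop tracks invisible for too many consecutive frames.
def prune(tracks, max_invisible=5):
    return [t for t in tracks if t.consecutive_invisible <= max_invisible]
```

Step 12 then corresponds to appending a fresh `Track` with a new ID whenever a detection matches no existing track.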

4 Code setup

The software of the proposed system was developed using MATLAB R2015a on a PC with 4 GB RAM and a 2.5 GHz dual-core processor, according to the following procedure. A graphical user interface was integrated to collect the required observations for the two phases of the computer vision analysis.

figure d
figure e

5 Experimental results

The objective is to calculate the distance between adjacent vehicles as an application of vehicular tracking. The main procedure was to measure tracking efficiency by analyzing a raw test video under ideal weather conditions (i.e., no noise) and under simulated challenging weather conditions (i.e., with added noise). A complete GUI was built with the following indicators: number of assigned tracks, average number of assigned tracks, number of cars, distance alarm, and processing time. Figure 6 shows screenshots of the interface.

Fig. 6
figure 6

Shows the interface of phase 1 system topology

5.1 The no-noise case

The number of assigned tracks (correct predictions) was measured in each frame of the test video in response to varying the following KF parameters: initial estimation error, motion noise, and measurement noise. The motion model was set to the constant-velocity model and the initial location was set to the coordinates of the centroids. Figures 7, 8, and 9 plot the number of assigned tracks in each frame (vertical axis) against the frame number (horizontal axis).

Fig. 7
figure 7

Initial estimation error representation for all cases with no change

Fig. 8
figure 8

Motion noise representation with each case study

Fig. 9
figure 9

Shows the measurement noise with each case study

In Fig. 7, the initial estimation error, which the KF should accept in the [X Y] direction before deeming a track “unacceptable” and dropping it altogether, is varied. Table 1 presents all case studies of the initial estimation error as its coordinate value is gradually varied over a 500-frame test video. In each case, the average number of assigned tracks over all frames and the processing time were calculated. The value of the initial estimation error is observed to have no significant effect on the number of assigned tracks (correct predictions); all case studies remain unchanged. This is because the KF adapts and updates its estimation error based on the input data.

Table 1 Initial estimation error with [X Y] direction

Figure 8 shows the effect of changing the motion noise over four categories of observations, and Table 2 presents the corresponding case studies. The results show that choosing a motion noise above [150, 150] is unsuitable: increasing its value can cause a large deviation from the chosen motion model, which appears as a drop in the average number of assigned tracks. This drop produces wrong predictions and hence inaccurate tracking.

Table 2 Motion noise

Figure 9 shows the effect of changing the measurement noise over five categories of observations, and Table 3 presents the corresponding case studies. Increasing the measurement noise causes inaccurate estimates of the average number of assigned tracks. The number of assigned tracks does increase with this value, but the increase has an adverse impact in the long run. From all of the above observations, a first-stage fine-tuned tracking criterion was concluded by adjusting the KF performance indices (for a 500-frame test video) as follows:

Table 3 Measurement noise
  • Choose initial estimation error to have any coordinates below 500.

  • Choose motion noise to have any coordinates below 150.

  • Choose measurement noise to have any value below 150.

5.2 The with-noise case

Next, the analysis of a non-ideal test video is started by adding noise to the test video. The conclusions discussed above informed the required modifications to the proposed GUI system, which are needed to analyze the test video under different disturbances. The tracking accuracy was then tested again.

The main purpose of this phase is to create simple simulation criteria for some of the most challenging weather conditions. Salt and pepper, Gaussian, and speckle noise were added as three types of video disturbance, at different levels of occurrence. A system of filters was also added to the interface to measure the tracking performance after cleaning. Several filter masks were applied for each type of noise: average, maximum, minimum, Wiener, disk, Laplacian of Gaussian (LoG), motion, Sobel, Prewitt, median, and Gaussian filters. The observations were recorded as follows. For salt and pepper noise, increasing the percentage of occurrence causes the number of assigned tracks to deviate from the ideal (zero-noise) value. For Gaussian noise with varying mean and variance, the number of assigned tracks is degraded relative to the ideal case, although the degradation can be considered negligible for small mean values. Finally, for speckle noise, the degradation is very small at large variance. Tables 4, 5, and 6 present the effect of video degradation on tracking accuracy.

Table 4 Effect of salt and pepper degraded video
Table 5 Effect of Gaussian noise degraded video
Table 6 Effect of speckle noise degraded video

The GUI was then switched to filtered-video mode, as shown in Fig. 10. New observations were recorded after applying each filter mask to each type of noise at its different percentages of occurrence; they are listed in Tables 8, 9, and 10. The main target was to measure the efficiency of each filter mask against each type of noise disturbance at different levels of occurrence. This was accomplished by measuring the deviation of the average number of assigned tracks from the zero-noise test video (recorded as approximately 1.996). These deviations are summarized in Table 7, obtained by comparing the observations in Tables 8, 9, and 10 (calculated after cleaning) with the ideal value. The observations were as follows. For speckle noise, the Wiener and disk filters scored the smallest deviation from the original values, so they appear to be the most accurate, with approximately equal processing times. For Gaussian noise, the median filter showed the smallest deviation and is thus the most suitable filter for this type of noise, although its processing time was high. Finally, for salt and pepper noise, the median filter again scored the smallest deviation and is the most suitable, although its processing time was high. Table 11 presents snapshots of the system output for the three types of video noise after cleaning. From all of the above observations, a second-stage fine-tuned tracking criterion is recommended: use the median, Wiener, and disk filter masks according to the type of video degradation.

Fig. 10
figure 10

Shows two snapshots for the interface of phase 2

Table 7 The deviation from original value after cleaning
Table 8 Average number of assigned tracks after adding salt and pepper noise
Table 9 Average number of assigned tracks after adding Gaussian noise
Table 10 Average number of assigned tracks after adding Speckle noise
Table 11 Some snapshots from the system output in each case study
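The cleaning-and-evaluation loop can be sketched with the filter recommended for salt and pepper noise and the deviation metric used in Table 7. The 3×3 mask size and edge padding are assumptions; the 1.996 baseline is the no-noise value reported above:

```python
import numpy as np

# 3x3 median filter (the mask recommended for salt-and-pepper noise).
def median3(img):
    p = np.pad(img, 1, mode='edge')
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(3) for j in range(3)]
    return np.median(stack, axis=0)

# Deviation of the average number of assigned tracks from the
# no-noise baseline (approximately 1.996 in these experiments).
def deviation(avg_tracks_noisy, avg_tracks_clean=1.996):
    return abs(avg_tracks_noisy - avg_tracks_clean)
```

A single impulse pixel is removed entirely by the median mask, which is why it dominates the salt-and-pepper rows of Table 7.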

6 Conclusion

Measuring and enhancing tracking efficiency for moving vehicles in urban video is an important and challenging research problem, especially under abnormal weather conditions. A new system has been developed for calculating the distance between adjacent vehicles as an application of vehicular tracking; this distance supports a safety function providing automated warnings to drivers. A two-phase analysis was used to establish the system under both ideal and challenging weather conditions. Phase 1 adjusts the performance indices of the KF to achieve the best tracking on a noise-free test video. We recommend setting the initial estimation error, motion noise, and measurement noise below one quarter of the total number of frames of the video of interest (here, a 500-frame test video, with performance indices below 150 for best tracking). For a noisy test video, the first step is to set the KF parameters as discussed; we then recommend the Wiener, disk, and median filters according to the type of disturbance (Wiener and disk masks for speckle noise, and the median filter for both Gaussian and salt and pepper noise). Future work should address cascaded filters, with evaluation under realistic conditions and consideration of complexity, as well as a comparative study of other detection and tracking algorithms such as Otsu's method, the k-nearest neighbors algorithm (k-NN), and the Hidden Markov Model (HMM).