1 Introduction

Ensuring a fair level of security across multiple scales of time and space in public places such as airports and railway stations is an extremely multifaceted challenge for a smart video surveillance system, which must also enhance situational awareness. Security relies on several systems, such as screening, database, biometric and video surveillance systems, which are used for object tracking, identity verification and activity monitoring. Today, video surveillance systems focus on compressing data for storage and transmission. Locating, identifying and learning object behaviour in a video sequence requires two main steps.

  • Detection of foreground objects that are in motion.

  • Object’s behaviour recognition.

A smart video surveillance system requires fast and robust algorithms for background estimation, motion segmentation, object tracking and scene analysis, and it should also alert the operator to important scene events. Smart and intelligent video surveillance has been one of the most researched topics of the last decade because of the importance given to security and military applications [1].

T. Reeve [2] and Rajiv Shah [3] surveyed the penetration and importance of surveillance systems in the United Kingdom and the United States. They reported that large amounts of surveillance data were monitored by human operators over long periods, and yet this did not yield vigilant monitoring. Modern researchers are concentrating on real-time processing of visual observations because of the tremendous growth in computing power and the availability of low-cost high-resolution cameras. For more than a decade, researchers have focused their attention on object detection and object tracking for smart surveillance systems.

Figure 1 shows the operations of a video surveillance system and the components that an automated system requires. The incoming video sequences may or may not include the background, so it is necessary to generate the background from subsequent frames or by means of frame analysis.

Fig. 1 Simplified block diagram of video surveillance system

Pre-processing is required to remove dataset noise and outliers. One of the important components of every surveillance system is a background model that is appropriate for every video sequence. Robust background modeling gives accurate foreground detection. Finally, tracking is achieved with the help of a tracker.

A typical visual surveillance system consists of static Pan-Tilt-Zoom (PTZ) video cameras that transmit video sequences to a central surveillance room or store them on a surveillance monitoring server. These video sequences are observed by human operators. Because monitoring is monotonous, an operator might miss important incidents as they occur, and such lapses make reliable monitoring a challenge.

Figure 2 is a typical example of an outdoor surveillance system, which operates under various constraints. A smart surveillance system must be able to perceive and identify a new scene, because a human operator cannot change algorithm parameters every time. A system that operates on the go with minimum error under different environmental conditions, such as sudden changes in illumination, cluttered background and occlusions, is a robust video surveillance system.

Fig. 2 Example of an outdoor surveillance system

2 Related work

The Video Surveillance and Monitoring (VSAM) system [1], funded by the United States (U.S.) government, is used to detect and track moving vehicles and pedestrians, and it also detects activities and interactions among objects. Stauffer et al. [4] proposed a popular background modeling approach, the Gaussian Mixture Model (GMM). They proposed a multivariate background model using a GMM for each pixel and compared each pixel of a frame with the GMM to determine the foreground and background components. This approach gives robust detection against illumination variations, cluttered background and moving background such as trees swaying in the breeze and twinkling water. It fails against sudden changes in illumination and it also takes more computational time. Bowden et al. [5] presented a system that deals with the slow learning rate of Stauffer et al. [4]. The authors explained that an improved learning rate creates the background model faster for a stable background, and this approach also deals with shadows. Wang et al. [6] proposed an additional approach for shadow removal and background updating. Harville [7] proposed a system that performs foreground segmentation based on depth and color features and uses the traditional MoG approach for foreground detection. Harville [8] proposed an extension of the Gaussian Mixture Model that allows certain high-level feedback in order to adapt to faster changes in the background. For almost stationary objects, the weight of the mixture remains low, so the weight is updated and the object can still be identified correctly as foreground. Wren et al. [9] proposed Gaussian background modeling in which every background pixel of a scene is estimated as a Gaussian distribution whose mean and covariance represent the background color and its covariance matrix. The Mahalanobis distance can be used as the distance metric.

Tan et al. [10] proposed a GMM algorithm in which the number of Gaussians can be estimated at each pixel. Carminati et al. [11] proposed a GMM algorithm in which the number of Gaussians is optimally estimated using the ISODATA algorithm. That approach gives limited adaptation as it is restricted by the training period. Pavlidis et al. [12] proposed a GMM approach that adopts the Expectation–Maximization (EM) algorithm for initialization.

Yong et al. [13] presented a comparative analysis of background modeling. They reviewed pixel-based, region-based and hybrid background modeling methods, and compared eight state-of-the-art approaches on the very challenging CDnet 2014 video dataset. The comparison covers motion segmentation and background subtraction and reports Recall, Precision, FPR, FNR and F-measure. Yannick et al. [14] presented a review and comparative analysis of background subtraction algorithms. They compared seven background subtraction techniques on challenging video and synthetic video datasets. Andrews et al. [15] gave an exhaustive literature review of background subtraction algorithms implemented on real and synthetic videos. They compared statistical and non-statistical background subtraction approaches. Yi Wang et al. [16] presented CDnet 2014, a video dataset for motion (change) detection that incorporates challenges encountered in many indoor and outdoor video surveillance settings. They provided the ground truth labels and detection evaluation metrics. Xia et al. [17] proposed a modified GMM designed to handle dynamic scenes. Spatial and temporal distributions in the background model improve on the performance of the traditional GMM, and a decision fusion strategy improves the recovery speed of the background model. Yazdi et al. [18] presented a detailed survey of moving object detection in video frames, especially for moving cameras. The survey focused on the various classifications of moving object detection and also covers various learning strategies and motion prediction models.

Ata et al. [19] proposed a novel Kalman filtering based tracking approach. They used the performance indices of the Kalman filter to develop a robust traffic video tracking system and thereby increase the efficacy of vehicle tracking. Tamilkodi et al. [20] presented a cluster local mean wavelet transform approach for image retrieval. The algorithm uses image features such as color, texture and silhouette for the extraction. Joshi et al. [21] suggested a novel approach for video analysis. Because huge databases of audio and video are available for manual surveillance, they used a passive tampering detection algorithm that computes differences to find and localize tampering in digital videos. Ahmad et al. [22] proposed an efficient image retrieval tool for image management systems. A large number of surveillance databases are in the form of audio and video and hence require huge storage space. Selma et al. [23] proposed a solution for precise and accurate trajectory tracking of aerial vehicles. The algorithm uses swarm optimization to improve the particles and produce the desired trajectory.

3 Proposed method

Mixture of Gaussians (MoG) and Gaussian Mixture Model (GMM) are used to model multi-modal backgrounds and to handle cluttered and dynamic backgrounds. The Mixture of Gaussians was introduced by Stauffer et al. [4].

It represents each pixel of the video frame by a mixture of normal distributions in order to handle the multimodality of the background. With the Gaussian mixture model, foreground detection is achieved by modeling the background and subtracting it from the current frame; this operation is carried out pixel by pixel instead of with a region-based approach.

A. Background modeling

Background modeling describes the type of model used to represent the background. It mainly determines the ability of the model to deal with uni-variate or multivariate backgrounds. Background modeling for the detection of foreground or moving objects is used in various applications, such as indoor and outdoor video surveillance, optical motion, multimedia, robot navigation and assistance, traffic monitoring and automated driver assistance. In video surveillance, the primitive operation is to segment the foreground or moving object in every frame of the input video sequence. The easiest technique to model the background is to obtain a background image that does not contain any foreground. Background modeling can be classified into two broad categories, parametric and non-parametric models. Some classify it into three categories: pixel-based, region-based and hybrid approaches [13]. Background modeling can also be classified into recursive and non-recursive approaches [24]. The two approaches used to locate and detect the positions of objects along with the object’s foreground are Background Subtraction and Optical Flow.

B. Uni-Variate Gaussian

The uni-variate model is also referred to as the single Gaussian. In this approach, the background model is based on fitting a Gaussian probability density function to a pixel’s value. For background model maintenance, a running average is used to update the mean and variance of the single Gaussian. The uni-variate Gaussian distribution is defined as [9],

$$g\left( {x{|}\mu ,\sigma } \right) = \frac{1}{{\sqrt {2\pi \sigma^{2} } }}e^{{\frac{{ - \left( {x - \mu } \right)^{2} }}{{2\sigma^{2} }}}}$$
(1)

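where \(\mu\) and \(\sigma\) are the mean and standard deviation of the normal distribution, so that \(\sigma^{2}\) is the variance. A pixel value \(X_{t+1}\) is treated as background when it lies within \(k\) standard deviations of the mean, where \(k\) is a constant [9],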

$$\left| {\mu_{t + 1} - X_{t + 1} } \right| < k\sigma_{t + 1}$$
(2)

The uni-variate model cannot handle dynamic backgrounds, e.g. waving trees, rippling water or moving algae, and cannot compensate for illumination variations.
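As a rough illustration of Eqs. 1 and 2, the following Python sketch maintains a running-average single Gaussian for one pixel. The learning rate `alpha`, the constant `k = 2.5` and the function names are assumptions made for this example and are not taken from the paper; the sketch also tests each value against the current model before updating it.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Eq. 1: univariate normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def is_background(x, mu, sigma, k=2.5):
    """Eq. 2: background if the value lies within k standard deviations of the mean."""
    return abs(mu - x) < k * sigma

def update_model(x, mu, sigma, alpha=0.05):
    """Running-average update of mean and variance (alpha is a hypothetical learning rate)."""
    new_mu = (1.0 - alpha) * mu + alpha * x
    new_var = (1.0 - alpha) * sigma ** 2 + alpha * (x - new_mu) ** 2
    return new_mu, math.sqrt(new_var)

# One pixel observed over five frames; 180 simulates a passing object.
mu, sigma = 100.0, 5.0
for value in [101, 99, 102, 180, 100]:
    label = "background" if is_background(value, mu, sigma) else "foreground"
    print(value, label)
    if label == "background":  # in this sketch only background values refine the model
        mu, sigma = update_model(value, mu, sigma)
```

Because a single mean and variance describe the pixel, a value that alternates between two intensities (for example leaf and sky) would be flagged as foreground on every switch, which is the limitation noted above.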

C. Multi-Variate Gaussian distribution

The multi-variate Gaussian distribution is also referred to as Mixture of Gaussians and Gaussian Mixture Model. It estimates the background model by modeling each pixel with a mixture of a number of Gaussians. Generally, the model is parameterized by the mean, the variance and the probability (weight) of each Gaussian component [4].

$$P\left( {X{|}\mu ,{\Sigma }} \right) = \mathop \sum \limits_{k = 1}^{K} \omega_{k} {\mathcal{N}}\left( {X{|}\mu_{k} ,{\Sigma }_{k} } \right)$$
(3)

where \(\mathcal{N}\left(X|\mu_{k} ,\Sigma_{k} \right)\) represents the normal distribution of the \(k\)-th component, \(\mu_{k}\) and \(\Sigma_{k}\) are its mean and covariance, \(\omega_{k}\) is its weight and \(K\) is the number of Gaussians in the mixture.

The multi-variate Gaussian distribution adapts easily to dynamic scenes and also handles gradual changes in illumination. A background with fast variations or a sudden change in illumination, however, cannot be modeled accurately with just a few Gaussians, so the model cannot deliver highly sensitive detection. Intrinsic and extrinsic improvements enable the Gaussian Mixture Model to handle these constraints.
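The mixture density of Eq. 3 can be sketched for a scalar (grayscale) pixel as follows. The three components, their weights and the matching constant `k = 2.5` are illustrative assumptions, not values reported in the paper.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sigmas):
    """Eq. 3 for scalar X: weighted sum of K normal densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

def matching_component(x, means, sigmas, k=2.5):
    """Index of the first component within k standard deviations of x, or None."""
    for idx, (m, s) in enumerate(zip(means, sigmas)):
        if abs(x - m) < k * s:
            return idx
    return None

# Hypothetical pixel that alternates between two background surfaces
# (for example leaf and sky) plus one broad, low-weight component.
weights = [0.6, 0.3, 0.1]
means = [90.0, 140.0, 200.0]
sigmas = [6.0, 8.0, 20.0]

print(mixture_pdf(92.0, weights, means, sigmas))
print(matching_component(92.0, means, sigmas))  # matches the first component
print(matching_component(30.0, means, sigmas))  # no component matches
```

A value that matches no component is a candidate foreground pixel, whereas both background modes are accepted, which a single Gaussian cannot do.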

D. Proposed–Modified GMM

The background is described by the history over time of each pixel's intensity values, which is modeled by a mixture of Gaussians. A background with fast variations or a sudden change in illumination cannot be modeled accurately with just a few Gaussians, so such a model cannot deliver highly sensitive detection. Figure 3 represents the Modified GMM, in which intrinsic and extrinsic improvements enable the Gaussian Mixture Model to handle various constraints.

Fig. 3 Proposed object detection algorithm

Outdoor objects are affected by dynamic scenes, e.g. waving trees, gradual illumination changes, and full and partial occlusions by static or moving objects.

Intrinsic Improvements: Motion segmentation:

  • To select model parameters appropriately using parameter optimization algorithm.

  • To develop a robust foreground detection algorithm through Adaptive Thresholding for motion segmentation

a. Parameter initialization

The background model parameters \(\omega , \mu , \sigma\) and the learning parameters \(\alpha\) and \(\rho\) are initialized by the parameter initialization algorithm. Figure 4 shows the detailed flow chart of this algorithm. For outdoor surveillance systems, the literature proposes initializing the parameters with approaches such as the k-means algorithm, the Expectation Maximization (EM) algorithm and background reconstruction algorithms. For every video sequence, our proposed algorithm estimates the model parameters, and the background model is initialized with the estimated parameters each time; hence the proposed background model is able to handle cluttered and dynamic backgrounds. The flow chart indicates that the initialization relies on the number of iterations and the ground truth image. The initialization is achieved with \({f}_{\mathrm{minsearch}}\), which finds the minimum of an unconstrained multivariable function using a derivative-free method. After the entire set of parameters has been initialized, the mixture parameter file is loaded using the available parameters. For the outdoor environment, the available parameters are optimized. Fifty iterations give optimum values of the mixture parameters at acceptable run time: with more iterations the algorithm takes longer with no significant difference in the parameter values, and with fewer iterations the parameter values change significantly. Considering this tradeoff between time and optimum parameter values, the number of iterations is set to 50.

Fig. 4 Parameter initialization
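The idea of derivative-free initialization capped at 50 iterations can be sketched as below. This is not the authors' routine: a simple compass (pattern) search stands in for the simplex method behind \({f}_{\mathrm{minsearch}}\), and the objective (the negative log-likelihood of pixel values that a ground-truth mask labels as background) is a hypothetical cost function, since the paper does not state the one it minimizes.

```python
import math

def compass_search(objective, start, step=8.0, max_iter=50, min_step=1e-6):
    """Derivative-free minimization: probe +/- step on each axis, halve the step on failure."""
    best = list(start)
    best_val = objective(best)
    for _ in range(max_iter):
        improved = False
        for axis in range(len(best)):
            for direction in (1.0, -1.0):
                trial = list(best)
                trial[axis] += direction * step
                val = objective(trial)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2.0
            if step < min_step:
                break
    return best, best_val

def negative_log_likelihood(params, samples):
    """Hypothetical cost: how poorly a Gaussian (mu, sigma) explains the background samples."""
    mu, sigma = params
    if sigma <= 0.0:
        return float("inf")
    return sum(0.5 * math.log(2.0 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2.0 * sigma ** 2)
               for x in samples)

# Intensities of one pixel in frames where the ground truth marks it as background.
background_samples = [98.0, 101.0, 100.0, 103.0, 99.0, 102.0, 97.0, 100.0]
(mu0, sigma0), cost = compass_search(
    lambda p: negative_log_likelihood(p, background_samples), start=[128.0, 20.0])
print(round(mu0, 2), round(sigma0, 2))
```

Capping `max_iter` at 50 mirrors the tradeoff described above: further iterations only refine the estimate by amounts that are small compared with the pixel noise.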

Extrinsic Improvements: Performance evaluation:

  • To remove dataset noise with a pre-processing technique.

  • To improve the performance evaluation parameters with a post-processing technique.

b. Adaptive Thresholding

Foreground segmentation is the process of classifying pixels as background or foreground. The literature offers a number of segmentation approaches, such as intensity-based, region-based, texture-based, edge-based and motion-based approaches. Among these, an intensity-based threshold gives fast and simple classification. A static threshold leads to certain difficulties. If the threshold value is too large, the system detects moving objects that are not present in the ground truth, which increases the false positives and decreases the system precision. A large threshold value will sometimes also fail to detect small color differences. On the other hand, if the threshold value is too small, the system fails to detect moving objects that are present in the ground truth, which increases the false negatives and decreases the system recall. A low threshold is sometimes unable to distinguish the foreground color if it is similar to the background distribution, and it also leads to noise that cannot be removed.

Instead of a predefined threshold, an adaptive threshold handles all of these constraints and also improves the performance evaluation. The adaptive threshold is expressed as,

$$T_{a} = {\min}\left[ A \right]$$
(4)

where \(A=\left({Fg}_{\omega }\times {Fg}_{v}\right)+\left({Bg}_{\omega }\times {Bg}_{v}\right)\), \({T}_{a}\) is the adaptive threshold, \({Fg}_{\omega }\) and \({Bg}_{\omega }\) are the foreground and background weights, and \({Fg}_{v}\) and \({Bg}_{v}\) are the foreground and background variances.

The foreground and background weights are calculated as,

$$Fg_{\omega } = \frac{{\mathop \sum \nolimits_{i = 1}^{n} H\left( {1,i} \right)}}{{\mathop \sum \nolimits_{i = 1}^{n} H}}$$
(5)
$$Bg_{\omega } = \frac{{\mathop \sum \nolimits_{i = 1}^{g} H\left( {i,255} \right)}}{{\mathop \sum \nolimits_{i = 1}^{n} H}}$$
(6)

where \(H\) = image (frame) histogram, \(g\) = gray level and \(n\) = number of pixels.

The foreground variance and the mean \(\mu\) are calculated as,

$$Fg_{v} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {A_{i} - \mu } \right)^{2} H\left( {1,i} \right)}}{{\mathop \sum \nolimits_{i = 1}^{n} H}}$$
(7)

where \(\mu =\frac{\sum_{i=1}^{n}H\left(1,i\right) \times A_{i}}{\sum_{i=1}^{n}H}\) and \({A}_{i}\in \left(1,i\right)\).

The background variance and the mean \(\mu\) are calculated as,

$$Bg_{v} = \frac{{\mathop \sum \nolimits_{i = 1}^{g} \left( {A_{i} - \mu } \right)^{2} H\left( {i,255} \right)}}{{\mathop \sum \nolimits_{i = 1}^{n} H}}$$
(8)

where \(\mu =\frac{\sum_{i=1}^{n}H\left(i,255\right) \times A_{i}}{\sum_{i=1}^{n}H}\) and \({A}_{i}\in \left(i,255\right)\).

One of the major contributions of the proposed approach, in terms of the intrinsic improvements to the GMM, is an adaptive threshold technique that selects an appropriate threshold value for better classification of background and foreground objects.
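One possible reading of Eqs. 4 to 8 is a search over candidate gray levels for the split that minimizes the weighted sum of the two class variances. The sketch below follows that reading under stated assumptions: \(H(1,i)\) is taken as the histogram bins below the candidate level, \(H(i,255)\) as the bins at or above it, and each variance is normalized by its own class count, which may differ from the authors' normalization. The function and variable names are illustrative.

```python
def adaptive_threshold(histogram):
    """Return the gray level t that minimizes Fg_w * Fg_v + Bg_w * Bg_v (Eq. 4)."""
    total = sum(histogram)
    best_t, best_a = None, float("inf")
    for t in range(1, len(histogram)):
        low, high = histogram[:t], histogram[t:]  # bins below t, bins at or above t
        n_low, n_high = sum(low), sum(high)
        if n_low == 0 or n_high == 0:
            continue  # one class is empty, so its variance is undefined
        fg_w, bg_w = n_low / total, n_high / total  # class weights (Eqs. 5 and 6)
        mean_low = sum(i * h for i, h in enumerate(low)) / n_low
        mean_high = sum((i + t) * h for i, h in enumerate(high)) / n_high
        # class variances (Eqs. 7 and 8), normalized within each class
        fg_v = sum((i - mean_low) ** 2 * h for i, h in enumerate(low)) / n_low
        bg_v = sum(((i + t) - mean_high) ** 2 * h for i, h in enumerate(high)) / n_high
        a = fg_w * fg_v + bg_w * bg_v
        if a < best_a:
            best_t, best_a = t, a
    return best_t

# Synthetic 256-bin histogram with two well-separated intensity clusters.
hist = [0] * 256
for level in range(40, 61):
    hist[level] = 10
for level in range(200, 221):
    hist[level] = 10
print(adaptive_threshold(hist))
```

Because the threshold is recomputed from each frame's histogram, it follows changes in scene brightness that a fixed value would not.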

c. Pre-processing

Pre-processing has always been an important phase of the entire process. It not only improves the performance evaluation but also helps to reduce the execution time. In the proposed algorithm, an Adaptive Local Noise Reduction filter serves as the pre-processing phase to reduce dataset noise and outliers. The simplest statistical measures of a random variable are its mean and variance. In the adaptive filter, the mean gives the average intensity in the region over which it is computed, and the variance gives a measure of contrast in that region.
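A minimal sketch of an adaptive local noise reduction filter is given below, using the common textbook form in which each pixel is pulled toward its local mean by the ratio of the noise variance to the local variance. Whether the authors use exactly this form, and which window size and noise variance they choose, is not stated; `noise_var` and `radius` are assumptions of this example.

```python
def adaptive_local_noise_filter(image, noise_var, radius=1):
    """Filter a 2-D list of gray values: out = g - (noise_var / local_var) * (g - local_mean)."""
    rows, cols = len(image), len(image[0])
    output = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Window clipped at the image border.
            window = [image[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            local_mean = sum(window) / len(window)
            local_var = sum((v - local_mean) ** 2 for v in window) / len(window)
            # The ratio is capped at 1 so flat regions are simply replaced by their mean.
            ratio = 1.0 if local_var <= noise_var else noise_var / local_var
            output[r][c] = image[r][c] - ratio * (image[r][c] - local_mean)
    return output

# A flat patch with one noisy pixel in the centre.
patch = [[100, 100, 100],
         [100, 110, 100],
         [100, 100, 100]]
smoothed = adaptive_local_noise_filter(patch, noise_var=1000.0)
print(smoothed[1][1])
```

Where the local variance is much larger than the noise variance (an edge or a moving object), the ratio is small and the pixel is left almost unchanged, so contrast is preserved while flat regions are smoothed.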

d. Post processing

Once the foreground object is detected, the noise remaining in the foreground objects must be reduced. The extrinsic improvements focus purely on improving the performance of the model and its results. The smallest of the isolated segmented regions are removed and object regions are merged. In our proposed approach, morphological closing is used as post-processing to reduce the foreground noise. The morphological procedure fills gaps inside the moving objects and diminishes the noise that persists in them.
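Morphological closing (dilation followed by erosion) on a binary foreground mask can be sketched as follows. The 3 × 3 structuring element and the border handling, in which neighbours outside the image are ignored, are assumptions of this example rather than settings reported in the paper.

```python
def _neighbourhood(mask, r, c):
    """Values in the 3 x 3 window around (r, c), clipped at the image border."""
    rows, cols = len(mask), len(mask[0])
    return [mask[rr][cc]
            for rr in range(max(0, r - 1), min(rows, r + 2))
            for cc in range(max(0, c - 1), min(cols, c + 2))]

def dilate(mask):
    """A pixel becomes foreground if any neighbour is foreground."""
    return [[1 if any(_neighbourhood(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]

def erode(mask):
    """A pixel stays foreground only if all neighbours are foreground."""
    return [[1 if all(_neighbourhood(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]

def closing(mask):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask))

# A detected object with a one-pixel hole in the middle.
mask = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        mask[r][c] = 1
mask[3][3] = 0

for row in closing(mask):
    print(row)
```

Closing fills the hole without growing the object outline. It does not remove isolated specks on its own, so the removal of the smallest segmented regions mentioned above would need a separate step.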

e. Mixture parameter maintenance

Mixture parameter maintenance is the process of updating the background model so that it adapts to scene changes over the entire process, using an online IIR update algorithm. It is a learning process and must be carried out online at every pixel and at frame level. The maintenance component updates the background using the previous background, the foreground binary mask and the current frame. The process adapts to changes that occur in the background, but it is not limited to the background only: sometimes model parameters need to be updated when a foreground object remains stationary for a long duration.
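A first-order IIR update of the kind commonly used with the mixture model of [4] can be sketched for one grayscale pixel as below. The exact update used by the authors is not given, so the default values of `alpha`, `rho` and `k`, the replacement of the lowest-weight component when nothing matches, and the initial variance and weight of a new component are all assumptions of this example.

```python
def update_mixture(x, weights, means, variances, alpha=0.01, rho=0.05, k=2.5,
                   init_variance=900.0, init_weight=0.05):
    """Update one pixel's mixture in place; return the matched component index or None."""
    matched = None
    for idx in range(len(means)):
        if abs(x - means[idx]) < k * variances[idx] ** 0.5:
            matched = idx
            break
    if matched is None:
        # No component explains x: replace the lowest-weight one (a simplification).
        weakest = min(range(len(weights)), key=lambda i: weights[i])
        means[weakest] = float(x)
        variances[weakest] = init_variance
        weights[weakest] = init_weight
    else:
        # IIR (exponential) updates: the weights use alpha, the matched Gaussian uses rho.
        for idx in range(len(weights)):
            hit = 1.0 if idx == matched else 0.0
            weights[idx] = (1.0 - alpha) * weights[idx] + alpha * hit
        means[matched] = (1.0 - rho) * means[matched] + rho * x
        variances[matched] = (1.0 - rho) * variances[matched] + rho * (x - means[matched]) ** 2
    total = sum(weights)
    for idx in range(len(weights)):
        weights[idx] /= total  # keep the weights summing to one
    return matched

weights, means, variances = [0.7, 0.3], [100.0, 200.0], [25.0, 25.0]
for value in [102, 101, 30, 31]:
    print(value, update_mixture(value, weights, means, variances))
print(weights, means)
```

In this sketch a value that keeps reappearing, such as an object that has stopped moving, first replaces a weak component and then gains weight frame by frame, which corresponds to the stationary-foreground case mentioned above.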

4 Results and discussion

The object detection algorithm has been designed and examined in an outdoor surveillance environment. Memory consumption and execution time are important to evaluate when selecting a background modeling method. All the comparative analysis was implemented in MATLAB 2013a on an Intel Core i3-5005U computer with a 3.3 GHz CPU and 4 GB of memory. We used four videos with frame sizes of 768 × 576, 320 × 240 and 160 × 120.

The proposed algorithm consists of two major modules: (a) Background Modeling Analysis and (b) Foreground Detection. The improved Gaussian Mixture Model is used to estimate the background model of every video sequence. Adaptive Thresholding is used to segment the foreground objects under various constraints and challenging conditions. Figures 5 and 6 show motion segmentation on sequences from the standard CDnet 2014 dataset [25] under different constraints. The first sequence is affected by cluttered and dynamic background, such as baseline and floating leaves.

Fig. 5 Outdoor standard dataset sequences [25]

Fig. 6 Outdoor standard dataset sequences [25]

Compared with the ground truth, the adaptive threshold segments the moving vehicles efficiently. The sequence in Fig. 6 is affected by dynamic background, and the algorithm efficiently detects the moving objects against dynamic and high-intensity backgrounds. For both sequences, the background model is built and continuously updated at every new pixel and frame.

Figure 7 shows a sequence from the standard PETS 2009 dataset, which is affected by high illumination, partial occlusions and near-field and far-field objects. In all video sequences, the proposed object detection approach detects almost all the objects accurately.

Fig. 7 Outdoor standard dataset sequences [26]

Figure 8 shows a sequence from the standard ViSOR dataset, which is affected by cluttered background, partial occlusions by static and moving objects, and near-field and far-field objects. The proposed approach efficiently detects almost all the objects.

Fig. 8 Outdoor standard dataset sequences [27]

  • Performance evaluation

Performance evaluation metrics and measures are available to verify the strengths and weaknesses of an object detection approach. They also allow the robustness of the proposed approach to be compared with the ground truth and with other similar approaches. Various performance metrics are used to evaluate the proposed algorithm, such as Fail Rate Detector and Multiple Object Detection Accuracy (MODA) [28].

$${\text{MODA}} = 1 - \frac{{F_{n} + F_{p} }}{{T_{p} + F_{n} }}$$
(9)

where, \(F_{n}\) = False Negative, \(F_{p}\) = False Positive and \(T_{p}\) = True Positive.
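Eq. 9 can be evaluated directly from the detection counts. The counts in the example below are made up for illustration and are not results from the paper.

```python
def moda(true_positives, false_negatives, false_positives):
    """Eq. 9: MODA = 1 - (Fn + Fp) / (Tp + Fn)."""
    ground_truth = true_positives + false_negatives  # number of objects actually present
    if ground_truth == 0:
        raise ValueError("MODA is undefined when there are no ground-truth objects")
    return 1.0 - (false_negatives + false_positives) / ground_truth

# Hypothetical counts: 90 objects found, 10 missed, 5 spurious detections.
print(moda(true_positives=90, false_negatives=10, false_positives=5))
```

Since false positives appear only in the numerator, MODA can become negative when spurious detections outnumber the correctly detected objects.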

Figure 9 shows a comparative graphical representation of the various sequences. The proposed approach was tested on the standard challenging datasets and compared with other similar approaches for moving object detection, namely GMM [4], KDE [29] and CODEBOOK [30]. For the False Negative Rate, Fig. 10 shows significant improvements for the Highway and Pedestrian sequences of CDnet 2014 and for the ViSOR dataset.

Fig. 9 Multiple object detection accuracy

Fig. 10 False negative rates

Figure 11 shows the comparative analysis of the standard video data sequences for false positives. The plot shows significant improvements in false positives. In the proposed approach, the parameter initialization algorithm helps to decrease false positives. Compared with the other approaches, the proposed algorithm produces fewer false positives for the CDnet 2014 and PETS datasets.

Fig. 11 False positives

5 Conclusion

The proposed Modified Gaussian Mixture Model algorithm for outdoor object detection has been implemented, and its results have been presented and discussed. The intrinsic and extrinsic improvements to the GMM, together with Adaptive Thresholding, detect objects effectively under different circumstances. The proposed algorithm is robust and efficiently detects objects in an outdoor environment with complex and dynamic backgrounds. It can also handle a partial amount of occlusion and a certain amount of shadow. The proposed algorithm has certain limitations: complete shadow detection becomes difficult in video frames with small texture variations between foreground and background, stationary foreground objects are not detected, and fully occluded objects are difficult to detect. Pre-processing not only improves the performance evaluation but also helps to reduce the execution time; in the proposed algorithm, an Adaptive Local Noise Reduction filter serves as the pre-processing phase to reduce dataset noise and outliers. The morphological procedure fills gaps inside the moving objects and diminishes the noise that persists in them. The robustness of the proposed algorithm has been compared with the ground truth and with other similar approaches through several performance evaluation metrics.