1 Introduction

In computer vision and image processing, moving object detection and tracking play an important role in higher-level video analysis applications. They are the first and foremost steps in applications such as intelligent traffic surveillance [1, 2], real-time detection of suspicious behavior in shopping malls [3], path detection in video surveillance [4], and intelligent observation of birds for purposes such as aviation safety and protection [5]. Object detection can be performed by extracting the shape of a moving object from a video sequence using techniques such as background subtraction (BS), feature extraction, and statistical methods. Object detection and tracking go hand in hand; once an object is detected, it is tracked throughout the video sequence using different techniques. All object detection methods involve two major activities, namely background construction and foreground extraction. In background construction, the background information is built using methods such as taking the arithmetic mean of pixels between successive frames, statistical methods, or hybrid methods. The constructed background is later subtracted from successive frames to obtain the desired object. Foreground extraction-based object detection uses temporal, spatial, or spatiotemporal information to extract an initial video object and later utilizes motion-based information to detect objects in successive frames. Because of issues such as illumination changes, fake motion, crowded video scenes, and Gaussian noise, it is challenging to detect and track objects perfectly in a dynamic, practical environment; the performance of detection systems can be drastically affected by even small movements within the environment. These challenges make the detection and tracking of moving objects in videos an interesting, demanding, and multidisciplinary research problem in computer vision and image processing. Furthermore, it is desirable to devise a mechanism that achieves higher accuracy in the detection and tracking of objects with as few misses as possible.

This paper is organized as follows: Section 2 summarizes various studies on background detection available in the literature. Section 3 details the methods used in this study, namely Big Bang–Big Crunch Optimization (BBBCO) and fuzzy 2-partition entropy. Section 4 elaborates the proposed approach; the algorithms used in the proposed modified BBBCO-based BS are described in detail. Experiments and results are presented in Section 5, followed by conclusions in Section 6.

2 Related work

Background subtraction approaches find the absolute difference between the current frame and a background model so that relevant changes can be detected. The success of the algorithm lies in devising an effective method by which the background can be modeled and updated. Background subtraction performs well in extracting objects from videos, but its effectiveness degrades when the background is not static, when lighting fluctuates, and when the videos are noisy. A number of methods have been proposed to improve the standard BS algorithm, but these target specific applications or environments, and it is difficult to devise a general method that is accurate in all situations. The various BS methods differ in the mechanism by which they model the background and in the metrics they use for classification. The following paragraphs summarize studies available in the literature.

Heikkila et al. [6] used the conventional BS approach to identify moving objects in videos; their study used a first-order recursive filter to update the background model for reliable moving object detection. Mandellos et al. [7] presented a background reconstruction algorithm that uses a histogram filtering procedure. The most common algorithms in BS models are based on Gaussian mixture models (GMMs). Yoshinaga et al. [8] presented a spatiotemporal GMM-based BS model that performs a region-level statistical analysis with a selection of nine sensitive parameters; these parameters were continuously updated as the video was analyzed. Chen and Ellis [9] proposed a self-adaptive GMM with an adaptive parameter, the median of quotient, which implements special filters to suppress image noise and sudden illumination changes. The model was able to handle shadows and reflections, and a final morphological operation was incorporated to fill holes in the output frame; however, the detections were not very accurate. Spampinato et al. [10] proposed a texton-based kernel density estimation that used color and texture features. The texton-based method was validated using three different databases: I2R (http://perception.i2r.a-star.edu.sg/bkmodel/bkindex.html), Fish4Knowledge (http://groups.inf.ed.ac.uk/f4k/F4KDATASAMPLES/INTERFACE/DATASAMPLES/search.ph), and the Background Modeling Challenge (BMC, http://bmc.iut-auvergne.com/). The results indicated problems with night videos, scarce illumination, highly dynamic backgrounds, and rain/snow scenarios. Comprehensive surveys of the state-of-the-art techniques for conventional and recent approaches to background modeling and foreground detection were presented by Bouwmans [11], Shaikh et al. [12], Piccardi [13], Sobral et al. [14], and Karasulu et al. [15].

Entropy-based detection methods for videos are rather sparse in the literature, and a few of them are discussed in this section. Qin et al. [16] proposed a layered segmentation approach for detecting stationary and moving objects in surveillance video. Their method extended the maximum entropy statistical model to segment layers with certain features, which were collected by developing a codebook with a set of codewords for each pixel. Wang et al. [17] proposed a block-based approach that utilizes entropy information collected from an H.264 bit stream to extract moving objects. They applied a block entropy feature from compressed-domain video to handle bit streams with variable quantization values. Two background entropy models were built for each video block to reflect the difference in coding efficiency between inter- and intra-block types. The foreground frame was built using BS after a normalization process, and over-segmentation when extracting a moving object was avoided by applying a hysteresis thresholding approach. Morphological filters and connected component labeling were applied at the final stage to extract the list of moving objects. Badri et al. [18] calculated the threshold value for difference images using an adaptive thresholding algorithm whose window size was chosen by the entropy measure over the window under consideration. An entropy-based window-growing approach was used, in which the ideal window size was generated by comparing the entropy of the window with the image histogram, and Otsu's thresholding method was used for binarization. The results were poor, and no benchmark dataset was utilized for the evaluation. Ma and Zhang [19] proposed a spatiotemporal entropy-based object detection method that measures the color variation in successive frames and builds a spatiotemporal entropy image. They claimed that the method is more robust to noise than traditional difference-based methods; however, it generates false detections when light flickering and shadows exist in the scene. The study most relevant to the present work was proposed by Karasulu and Korukoglu [20]. They implemented a modified BS method based on spatial entropy and formulated the threshold detection problem as a p-median problem, with the entropies of the foreground and background regions serving as the demand and facility points. A modified BS algorithm based on simulated annealing (SA) was also implemented.

The literature discussed in this section summarizes the various directions in which the standard BS algorithm has been explored. A good BS algorithm requires accurate pixel classification, should be robust, and must adapt to environmental changes. Such challenges have been successfully handled using soft computing techniques. Some soft-computing-based solutions have been reported in the literature to deal with noisy and insufficient data, but their applicability to real, noisy videos is still an open issue. Designing an effective system requires a soft computing approach that can optimize the detection process in a real, dynamic environment to improve accuracy.

3 Problem analysis and basic methods

This section briefly explains Big Bang–Big Crunch Optimization (BBBCO), the entropy method, and fuzzy 2-partition entropy, and details how to effectively determine the threshold value using the BBBCO method.

3.1 Big bang big crunch optimization

In recent times, meta-heuristic algorithms such as Genetic Algorithms, Particle Swarm Optimization, and Ant Colony Optimization have been successfully applied to a wide variety of problems. The Big Bang–Big Crunch Optimization (BBBCO) algorithm is inspired by a theory of the origin and evolution of the universe and consists of two phases: the Big Bang and the Big Crunch. In the Big Bang phase, energy dissipation produces a disorderly state of particles, whereas in the Big Crunch phase the randomly distributed particles are drawn back into order. Randomness is the key feature of the Big Bang phase: it resembles energy dissipation in nature, while convergence to a local or global optimum can be seen as gravitational attraction. Since energy dissipation creates disorder from ordered particles, randomness is used as a transformation from a converged solution (order) to the birth of entirely new candidate solutions (disorder). Figure 1 summarizes the steps of the standard BBBCO algorithm. The Big Bang phase initializes the population of candidate solutions randomly over the entire search space, as in other evolutionary search algorithms. In the subsequent Big Crunch phase, a contraction operator calculates the center of mass from the current position of each candidate solution and its associated fitness value. The explosion (Big Bang) and contraction (Big Crunch) phases are repeated until specified criteria are met. BBBCO thus aims to shrink the randomly generated solutions towards a single point, with the center-of-mass computation as its basis. The center of mass is calculated using the formula given in (1)

$$ C=\frac{\sum_{i=1}^{N} \frac{x_{i}}{F_{i}}}{\sum_{i=1}^{N} \frac{1}{F_{i}}} $$
(1)

where C is the position of the center of mass; \(x_{i}\) is the position of candidate i in an n-dimensional search space; \(F_{i}\) is the fitness value of candidate i; and N is the population size in the Big Bang phase. The algorithm then generates new candidate solutions around this center of mass using formula (2)

$$ x_{\text{new}}^{k+1}= x_{c}^{k}\pm \frac{l\cdot r}{k} $$
(2)

where \(x_{c}^{k}\) is the center of mass at the kth iteration; \(x_{\text{new}}^{k+1}\) is a new candidate solution generated for the (k + 1)th iteration; r is a random number; l is the upper limit of the parameter; and k is the current iteration step.

Fig. 1 Basic Big Bang–Big Crunch Optimization (BBBCO) algorithm

The aim of standard BBBCO is to find a solution for which the value of the fitness function is minimized. The major advantage of BBBCO lies in its ability to avoid becoming trapped in local minima, owing to its random search strategy. The BBBCO method is also computationally inexpensive and converges quickly, as required by real-time processing applications. Readers may refer to [21–23] for more details.
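As a concrete illustration, the following minimal Python sketch implements the loop of Fig. 1. It is not the authors' code: the function and variable names are ours, the fitness here is maximized (as required by the entropy objective used later in this paper), and the perturbation in the Big Bang phase is one reasonable reading of (2).

```python
import numpy as np

def bbbco(fitness, lower, upper, n_pop=30, n_iter=40, rng=None):
    """Minimal Big Bang-Big Crunch sketch following (1) and (2).

    fitness maps a candidate vector to a scalar to be maximized;
    lower/upper are per-dimension search-space bounds.
    """
    rng = np.random.default_rng() if rng is None else rng
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Big Bang: scatter the initial population over the whole search space.
    pop = rng.uniform(lower, upper, size=(n_pop, lower.size))
    best, best_fit = pop[0].copy(), -np.inf
    for k in range(1, n_iter + 1):
        fit = np.array([fitness(x) for x in pop])
        i = int(np.argmax(fit))
        if fit[i] > best_fit:
            best, best_fit = pop[i].copy(), fit[i]
        # Big Crunch: contract to the center of mass of eq. (1),
        # weighting each candidate by 1/F as the paper's formula does.
        w = 1.0 / np.maximum(fit, 1e-12)
        center = (w[:, None] * pop).sum(axis=0) / w.sum()
        # Big Bang again: new candidates around the center, eq. (2);
        # the spread shrinks as 1/k, driving convergence.
        r = rng.uniform(-1.0, 1.0, size=pop.shape)
        pop = np.clip(center + r * (upper - lower) / k, lower, upper)
        pop[0] = best  # elitism: never lose the best solution found so far
    return best, best_fit
```

The elitism step mirrors step 4.2 of the procedure given later in Section 4.1.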

3.2 Fuzzy 2-partition entropy approach

Image processing is generally believed to contain some inherent fuzziness, and the fuzzy entropy method is a well-known threshold detection approach in image processing. Several researchers have utilized image entropy to extract suitable threshold values for different image sequences. In this paper, we use the concepts of a fuzzy 2-partition and the maximum fuzzy entropy principle to select an optimal threshold value for performing background subtraction in videos.

Consider a digital image I of size M × N, and let f(x, y) be the gray-level value of pixel (x, y), ranging over {0, 1, 2, …, L−1}. If the number of pixels with gray level k is denoted h(k), then the probability of occurrence of gray level k in the image can be calculated as:

$$ p(k)=\frac{h(k)}{M\times N} $$
(3)

To partition the image into foreground and background, two different formulas for calculating the fuzzy membership values μ_F and μ_B have been used. The foreground region is modeled using μ_F and the background region using μ_B. The fuzzification of an image pixel into either region is modeled by the membership curves for μ_F and μ_B, calculated using (4) and (5), respectively.

$$ \mu_{F}\left( j \right)=\left\{ {\begin{array}{ll} 0& j\le a \\ \frac{\left( j-a \right)^{2}}{\left( c-a \right)\left( b-a \right)}&a<j\le b \\ 1- \frac{\left( j-c \right)^{2}}{\left( c-a \right)\left( c-b \right)}&b< j\le c \\ 1&j>c \\ \end{array}} \right. $$
(4)
$$ \mu_{B}\left( j \right)=1-\mu_{F}\left( j \right) $$
(5)

where 0 ≤ a ≤ L−1; 0 ≤ b ≤ L−1; 0 ≤ c ≤ L−1; and j = 0, 1, 2, …, 255.
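For illustration, (4) and (5) translate directly into the following Python sketch (names are ours; the code is vectorized over all gray levels and assumes a < b < c to avoid division by zero):

```python
import numpy as np

def mu_foreground(j, a, b, c):
    """S-shaped fuzzy membership of gray levels j in the foreground, eq. (4).

    j is an array of gray levels; a < b < c is assumed.
    """
    j = np.asarray(j, dtype=float)
    mu = np.zeros_like(j)
    m1 = (j > a) & (j <= b)
    mu[m1] = (j[m1] - a) ** 2 / ((c - a) * (b - a))
    m2 = (j > b) & (j <= c)
    mu[m2] = 1.0 - (j[m2] - c) ** 2 / ((c - a) * (c - b))
    mu[j > c] = 1.0
    return mu

# Background membership is the complement, eq. (5).
levels = np.arange(256)
mu_f = mu_foreground(levels, a=40, b=120, c=200)  # sample (a, b, c)
mu_b = 1.0 - mu_f
```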

Figure 2 shows the shapes of these membership functions for sample values of (a, b, c). The optimal threshold value for classifying a pixel as foreground or background can be extracted from the intersection of the curves μ_F and μ_B, as shown in Fig. 2. Clearly, μ_F = 1 − μ_B and, according to [28], μ_F and μ_B form a fuzzy 2-partition of the image. The two probability distributions for foreground and background can be found from the fuzzy distribution of the gray-level values of the frame; they are given by the formulas defined in (6) and (7).

$$ P_{F}=\sum\limits_{j=0}^{L-1} {\mu_{F}\left( j \right)p(j)} $$
(6)
$$ P_{B}=\sum\limits_{j=0}^{L-1} {\mu_{B}\left( j \right)p(j)} $$
(7)

Chang et al. [28] specified that the total entropy of such an image can be calculated from the probabilities of the fuzzy events of the different fuzzy partitions, defined as:

$$ H=-\sum\limits_{i=1}^{c} P(A_{i})\log_{2} P(A_{i}) $$
(8)

where P(A_i) refers to the probability of the fuzzy event for partition i; in this case, c = 2 for background and foreground. H is the total entropy of the image or frame. In other words, H may be rewritten as:

$$ H=-P_{F}\log_{2}(P_{F})-P_{B}\log_{2}(P_{B}) $$
(9)
$$ H=H_{F}+H_{B} $$
(10)

where −P_F log_2(P_F) and −P_B log_2(P_B) are the entropies of the foreground and background regions, respectively, also referred to as H_F and H_B for brevity.

Fig. 2 μ_F and μ_B membership curves with sample values of (a, b, c)

Using (6) and (7), we can calculate H_F and H_B as follows:

$$ H_{F}=-\left(\sum\limits_{j=0}^{L-1} \mu_{F}(j)\,p(j)\right)\log_{2}\left(\sum\limits_{j=0}^{L-1} \mu_{F}(j)\,p(j)\right) $$
(11)
$$ H_{B}=-\left(\sum\limits_{j=0}^{L-1} \mu_{B}(j)\,p(j)\right)\log_{2}\left(\sum\limits_{j=0}^{L-1} \mu_{B}(j)\,p(j)\right) $$
(12)

According to information-theoretic and statistical concepts, the largest value of entropy leads to better solutions. Because the total entropy combines the foreground and background regions, its value is governed by the corresponding curves μ_F and μ_B, defined for the foreground and background, respectively. These curves are contradictory in nature: an increase in the entropy of the foreground tends to decrease the entropy of the background. Because the total entropy is a combination of both entropies, they eventually settle into an optimal combination. The intersection point of the optimal curves for μ_F and μ_B corresponds to the threshold value that can be used for classification (see Fig. 2). According to (4) and (5), μ_F and μ_B are modeled by the triplet (a, b, c); different triplets generate different fuzzy entropy values for the foreground and background pixels. Therefore, the problem reduces to finding an optimal combination of a, b, and c such that the total entropy of the foreground and background pixels is maximized. The parameter values for which the total entropy H is maximal determine the optimal threshold value for separating foreground from background, using (13):

$$ (a^{*},b^{*},c^{*})=\arg\max\left\{ H(a,b,c) \right\}\quad \text{subject to}\quad 0\le a\le b\le c\le L-1 $$
(13)
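Collecting (3) and (6)–(9), the objective H(a, b, c) for a single frame can be sketched in Python as follows (illustrative code reusing the mu_foreground helper above; the epsilon guard against log2(0) is our addition):

```python
import numpy as np

def gray_histogram_probs(frame):
    """Probability p(j) of each gray level in an 8-bit frame, eq. (3)."""
    h = np.bincount(frame.ravel(), minlength=256)
    return h / frame.size

def fuzzy_total_entropy(p, a, b, c):
    """Total fuzzy 2-partition entropy H = H_F + H_B, eqs. (6)-(9)."""
    levels = np.arange(256)
    mu_f = mu_foreground(levels, a, b, c)  # eq. (4), defined above
    p_f = np.sum(mu_f * p)                 # eq. (6)
    p_b = np.sum((1.0 - mu_f) * p)         # eq. (7)
    eps = 1e-12                            # guard against log2(0)
    return -(p_f * np.log2(p_f + eps) + p_b * np.log2(p_b + eps))
```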

4 Proposed approach

This paper proposes an automatic and fast approach for the detection of moving objects in videos using an updated BS algorithm. The BS algorithm detects moving objects by calculating the difference between the current frame and a background frame and binarizing that difference using a threshold value. The background is continuously updated at every frame to absorb the dynamics of the environment. The background update mechanism is a crucial component of the BS algorithm, as effective background modeling helps push static or motionless objects into the new background. The threshold also plays an important role in classifying pixels into the foreground or background regions, and it helps suppress false detections and holes in frames. In this paper, the standard background detection algorithm has been modified, and an optimized threshold value is generated using the BBBCO algorithm, which aids pixel classification and the detection of true object boundaries. BBBCO has also been utilized to extract other parameters for the standard BS algorithm, which has paved the way for the effective and correct modeling of the background. Figure 3 shows the methodology of the proposed approach. The motivation for employing BBBCO lies in its advantages over other algorithms: it is computationally inexpensive and converges quickly, which makes it suitable for real-time solutions and for use in low-end devices such as video and CCD cameras. In addition, the calculation of the center of mass at every step incorporates more random behavior into the population space and helps the algorithm avoid local minima/maxima.

Fig. 3 BBBCO-based modified background subtraction methodology

The proposed BBBCO-based method generates a threshold value using the concept of fuzzy 2-partition entropy and formulates threshold detection as a parameter optimization problem: search for an optimal combination of parameters, in this case (a, b, c), within prescribed constraints, such that the total entropy of the frame is maximized. The objective function can be specified as:

$$\begin{array}{@{}rcl@{}} H&=&-\left(\sum\limits_{j=0}^{L-1} \mu_{F}(j)\,p(j)\right)\log_{2}\left(\sum\limits_{j=0}^{L-1} \mu_{F}(j)\,p(j)\right)\\ &&-\left(\sum\limits_{j=0}^{L-1} \mu_{B}(j)\,p(j)\right)\log_{2}\left(\sum\limits_{j=0}^{L-1} \mu_{B}(j)\,p(j)\right) \end{array} $$
(14)

The aim is to maximize the above objective function subject to the constraints specified in (15):

$$ 0\le a\le 255,\qquad 0\le b\le 255,\qquad 0\le c\le 255,\qquad \text{and}\qquad 0\le a\le b\le c $$
(15)

For gray-level videos, the search space for (a, b, c) lies in the range (0, 255), and these bounds act as constraints on the variables. Once an optimal combination of values for (a, b, c) is found, the optimal threshold value can be obtained by solving (4) and (5) [28, 29]. The resulting formula is specified in (16)

$$ T=\begin{cases} a^{*}+\sqrt{\dfrac{(c^{*}-a^{*})(b^{*}-a^{*})}{2}}, & (a^{*}+c^{*})/2\le b^{*}\le c^{*}\\[2ex] c^{*}-\sqrt{\dfrac{(c^{*}-a^{*})(c^{*}-b^{*})}{2}}, & a^{*}\le b^{*}\le (a^{*}+c^{*})/2 \end{cases} $$
(16)

where a*, b*, and c* are the values for which (14) is maximized.
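A direct transcription of (16) in Python (the helper name is ours):

```python
import numpy as np

def threshold_from_abc(a, b, c):
    """Optimal threshold T from the fitted triplet (a*, b*, c*), eq. (16)."""
    if b >= (a + c) / 2.0:
        return a + np.sqrt((c - a) * (b - a) / 2.0)
    return c - np.sqrt((c - a) * (c - b) / 2.0)
```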

4.1 BBBCO based fuzzy 2-partition entropy thresholding algorithm

This paper uses the BBBCO-based fuzzy 2-partition entropy thresholding approach to find an optimal threshold value and utilizes this value to effectively model a real-time solution for detecting moving objects in videos. The contribution is novel in that a standard BS method has been optimized and made adaptive using the Big Bang–Big Crunch technique for the effective detection of objects in videos. The methodology uses the concept of fuzzy 2-partition entropy together with BBBCO to threshold video frames into two regions, foreground and background. BBBCO is also used to determine parameters that help model the background effectively. The complete steps of the modified BS method using BBBCO are given below:

Steps of BBBCO-BS Procedure

1. Read a video sequence; extract frame information and other video parameters such as the number of frames (NFrames), frames per second (fps), FrameHeight, and FrameWidth.

2. Repeat for i = 1 to NFrames. /* run the loop for all frames */

3. Read the new frame as F(i).

4. Initialization: initialize variables.

   4.1. if (first frame) /* initialize the variables used in the algorithm for the first time */

   4.2. Initialize Background(frame) and Foreground(frame) to F(i).

   4.3. Use the first frame to initialize the matrices used in the background subtraction algorithm, namely Difference_matrix, Foreground_matrix, and Background_matrix. /* Difference_matrix = Foreground_matrix = Background_matrix = first frame */

   4.4. end if

5. Generate an optimized threshold value along with other parameters for the current frame.

   5.1. if (mod(i, fps) == 0) /* call the BBBCO procedure only periodically, rather than on every frame, to improve performance */

   5.2. threshold = BBBCO(F(i)); /* get the threshold and other parameters for the frame */

   5.3. end if

6. Evaluation: perform the evaluation and update the background from the extracted parameters.

   6.1. Read the next frame as current_frame.

   6.2. Update Foreground(frame) and Background(frame) to current_frame.

      6.2.1. if (mod(i, track_refreshrate) == 0) /* track_refreshrate controls how often the background is refreshed; a lower value means more frequent updates */

      6.2.2. temp_background_matrix = Foreground(frame);

      6.2.3. Background_matrix = temp_background_matrix;

      6.2.4. Difference_matrix = Foreground_matrix − Background_matrix;

   6.3. end if

   6.4. if (mod(i, track_refreshrate) ~= 0) /* the background requires only a small update */

      6.4.1. Foreground_matrix = Foreground(frame);

      6.4.2. temp_background_matrix = 0;

      6.4.3. Difference_matrix = Foreground_matrix − Background_matrix;

      6.4.4. Perform background modeling using the running-average method with the updated value of Alpha returned by the BBBCO procedure:

        for r = 1:FrameHeight
          for c = 1:FrameWidth
            if (Foreground_matrix(r,c) ~= 0)
              Background_matrix(r,c) = (1 − Alpha)*Background_matrix(r,c) + Alpha*Foreground_matrix(r,c);
            end if
          end for
        end for
        Background(frame) = Background_matrix;

   6.5. end if

7. Generate a binary frame from Difference_matrix using the threshold value returned by BBBCO in step 5.

8. Apply morphological processing to enhance the detected objects.

   8.1. Apply morphological open, close, and fill operations to clear holes and enhance detections.

   8.2. Perform blob analysis to obtain the objects of interest with bounding boxes and other parameters.

9. Plotting with bounding boxes: find contours and draw bounding boxes on the detected objects.

10. Set i = i + 1. /* loop started at step 2 ends */

11. end loop

The modified BBBCO-based BS method reads a video sequence, extracts useful parameters such as frame rate, height, and width, and initializes the background and foreground frames with the first frame. Moving objects are then found by subtracting the background frame from the current frame. The proposed modified BS method calls the BBBCO procedure to find the optimal values of various parameters and the threshold for the specific frame. The threshold detected by BBBCO is used in a pixel-wise comparison to separate foreground from background. The background refresh rate is modeled by another parameter, whose value is governed by the relative changes in the foreground and background regions and which determines whether a background refresh is required. A learning parameter (Alpha), modeled by the BBBCO algorithm as the ratio between the optimal and maximum threshold values, is employed to update the background periodically. Once an object is detected, it is processed through a sequence of morphological operations to smooth its boundaries and suppress holes and false detections. Blob analysis is performed on the detected objects to obtain contours and bounding boxes, which are then used to plot the objects on the actual frames.
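The per-frame core of this loop can be condensed into the following Python sketch (a simplification under our own naming; the full procedure above additionally schedules threshold recomputation, background refreshes, and morphological cleanup):

```python
import numpy as np

def bs_step(frame, background, threshold, alpha):
    """One simplified iteration of the modified BS loop (steps 6-7)."""
    frame = frame.astype(float)
    diff = np.abs(frame - background)   # Difference_matrix
    mask = diff > threshold             # binarize with the BBBCO threshold
    # Running-average background modeling (step 6.4.4): blend the current
    # frame into the background wherever it is nonzero, learning rate Alpha.
    nz = frame != 0
    new_bg = background.copy()
    new_bg[nz] = (1 - alpha) * background[nz] + alpha * frame[nz]
    return mask, new_bg
```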

The BBBCO algorithm generates an optimized threshold value for the modified BS algorithm in a total of nine steps (four major and five minor), discussed here. The first step, initialization, generates a set of initial candidate solutions; the required variables are distributed randomly below an upper bound. The pre-processing step includes all operations from reading a frame to finding the fuzzy entropies of the foreground and background regions: the input frame is read, the probability distribution of each gray level is calculated and normalized to the range 0–1, and the frame is fuzzified into foreground and background regions using the initial population. The fuzzy 2-partition entropies of these regions are calculated using Shannon's entropy, and the collective entropy of both regions measures the fitness of each candidate. The objective function is the overall entropy of both regions, and the aim is to maximize its value. The maximization process iterates the BBBCO algorithm over several generations. Given a set of candidate solutions, the algorithm finds an effective center of mass in every iteration (step 3). In step 4, the next generation of candidate solutions is generated around the center of mass obtained in the previous step, with additional randomness incorporated into the population using a random function. The best candidate of the previous step is also carried into the new population to prevent losing the best solution; this speeds convergence and avoids trapping the search in local minima. BBBCO runs through a series of iterations and stores the best candidate solution in the variable S_best (step 6). The optimal parameter values (a*, b*, c*) are extracted from the best candidate S_best (step 7) and used to calculate the threshold value with (16). The threshold, along with the other parameters, is returned by the algorithm in step 9; it is used to binarize the difference frame to extract moving-object regions, while the other parameters are used to model the background. The steps are listed below, followed by a short sketch composing them in code.

Steps of BBBCO based parameter extraction procedure

1. Initialization: initialize required variables.

1.1: Initialize the BBBCO parameters: population size (N), number of iterations (M), and number of parameters (p). /* for this algorithm p = 3 for the three variables a, b, c; N = 30, M = 40 */

1.2: Initialize the population randomly according to the population size (N) and the number of parameters (p), and store it in the variable [pop]_{N×p}, subject to the constraints 0 ≤ a ≤ (L−1), 0 ≤ b ≤ (L−1), 0 ≤ c ≤ (L−1), and 0 ≤ a ≤ b ≤ c.

1.3: Evaluate the fitness of each candidate solution and store it in the variable [F]_{N×1}.

1.4: Each element of the population is a candidate feasible solution, and the solution is represented as [S]_{N×4}, where each S_i holds feasible values of (a_i, b_i, c_i) together with the corresponding fitness value of the solution. The population, fitness, and solution-space variables are represented as

$$ \text{pop}(N,p)=\left[{\begin{array}{ccc} a_{1} & b_{1} & c_{1}\\ a_{2} & b_{2} & c_{2}\\ \vdots & \vdots & \vdots\\ a_{N} & b_{N} & c_{N} \end{array}}\right]_{N\times 3} \qquad F(N,1)=\left[{\begin{array}{c} F_{1}\\ F_{2}\\ \vdots\\ F_{N} \end{array}}\right]_{N\times 1} $$
$$ S(N,4)=\left[{\begin{array}{c} S_{1}\\ S_{2}\\ \vdots\\ S_{N} \end{array}}\right]_{N\times 4}=\left[{\begin{array}{cccc} a_{1} & b_{1} & c_{1} & F_{1}\\ a_{2} & b_{2} & c_{2} & F_{2}\\ \vdots & \vdots & \vdots & \vdots\\ a_{N} & b_{N} & c_{N} & F_{N} \end{array}}\right]_{N\times 4} $$
2. Preprocess: perform initial operations such as histogram generation and calculation of the fuzzy entropies.

2.1: Read the new frame as thisFrame and convert it to grayscale.

2.2: Normalize the histogram of thisFrame between zero and one.

2.3: Find the number of pixels h(j) in thisFrame with gray level j, where j = 0, 1, 2, …, (L−1).

2.4: Find the probability distribution of the gray-level values in the frame using

$$ p(j)=\frac{h(j)}{\text{height}\times \text{width}} $$

2.5: Repeat for k = 1 to the maximum number of iterations (M).

2.6: For each gray level j, where j = 1, 2, …, L−1, and for each value of a_i, b_i, c_i, where i = 1, 2, …, N, from pop(i), perform fuzzification of the foreground \(\mu_{\text{Fg}}^{k}\) and background \(\mu_{\text{Bg}}^{k}\) through the parameters, using formulas (4) and (5).

2.7: Find the pixel-wise fuzzy entropies of the foreground and background regions, \(H_{\text{Fg}}^{k}(i)\) and \(H_{\text{Bg}}^{k}(i)\), using formulas (11) and (12).

3. Fitness evaluation: calculate the fitness of the individuals and find the center of mass.

3.1: Evaluate the total fitness as the sum of the entropies of the background and foreground:

$$ F^{k}(i)=H_{\text{Fg}}^{k}(i)+H_{\text{Bg}}^{k}(i) $$

3.2: Reorder all feasible candidate solutions of the population in descending order of their respective fitness values F^k(i).

3.3: Calculate the center of mass for the current population of candidates at the kth iteration using

$$ C^{k}(l)=\frac{\sum_{i=1}^{N} \frac{\text{pop}^{k}(i,l)}{F^{k}(i)}}{\sum_{i=1}^{N} \frac{1}{F^{k}(i)}} $$

where C^k(l) is the center of mass at the kth iteration, l indexes one of the parameters, and pop^k(i,l) is the ith candidate solution.

4. Generation of a new population of candidates.

4.1: Generate a new set of (N−1) candidate solutions randomly around the center of mass using

$$ \text{pop}(i,l)^{k+1}=C^{k}(l)\pm \frac{\text{rand}\times \left(\max\left(\text{pop}^{k}(i,l)\right)-\min\left(\text{pop}^{k}(i,l)\right)\right)}{k+1} $$

where rand generates a random number between 0 and 1; max and min select the maximum and minimum values of the respective variable from the pool of candidates; and l = 1, 2, 3 for the variables a, b, and c.

4.2: Copy the best candidate of the previous step to the first index of the new population to prevent losing the best solution.

5: Set k = k + 1 and return to step 2.5.

6: Save the best solution as S_best ← S(1,:).

7: Extract the values of (a*, b*, c*) from the first three columns of S_best as (a*, b*, c*) = S(1, 1:3).

8: Calculate the final threshold from the best solution using

$$ T=\begin{cases} a^{*}+\sqrt{\dfrac{(c^{*}-a^{*})(b^{*}-a^{*})}{2}}, & (a^{*}+c^{*})/2\le b^{*}\le c^{*}\\[2ex] c^{*}-\sqrt{\dfrac{(c^{*}-a^{*})(c^{*}-b^{*})}{2}}, & a^{*}\le b^{*}\le (a^{*}+c^{*})/2 \end{cases} $$

9: Return the threshold T together with the other parameters.

10: End
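Composing the earlier sketches, the whole parameter-extraction procedure for one frame reduces to roughly the following (hypothetical glue code; gray_histogram_probs, fuzzy_total_entropy, bbbco, and threshold_from_abc are the helpers sketched above, and the degeneracy guard is our addition):

```python
import numpy as np

def bbbco_threshold(frame):
    """Sketch of the BBBCO-based parameter extraction for one frame."""
    p = gray_histogram_probs(frame)        # step 2, eq. (3)

    def fitness(x):                        # steps 2.6-3.1: total entropy
        a, b, c = np.sort(x)               # enforce a <= b <= c
        if not a < b < c:
            return 0.0                     # skip degenerate triplets
        return fuzzy_total_entropy(p, a, b, c)

    best, _ = bbbco(fitness, lower=[0.0] * 3, upper=[255.0] * 3,
                    n_pop=30, n_iter=40)   # N = 30, M = 40 (step 1.1)
    a, b, c = np.sort(best)                # step 7: best triplet
    return threshold_from_abc(a, b, c)     # step 8, eq. (16)
```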

5 Performance evaluations

Evaluating the proposed algorithm is an important and complex task; it can be done by comparing the detected objects with the ground-truth values. A large number of metrics have been proposed in the literature, and a few are discussed in this section for completeness. Nascimento and Marques [24] proposed a framework for comparing different motion detection algorithms based on false alarms, detection failures, and correct detections. Readers may refer to the studies of Karasulu et al. [25], Kasturi et al. [26], and McManus et al. [27] for detailed discussions of further metrics. These may be categorized as object-level and frame-level metrics: frame-based metrics are computed for every frame, while object-based metrics are based on the coherency between the ground-truth object and the detected object. Frame-based metrics include the true positive rate or sensitivity (TPR), false positive rate (FPR), false alarm ratio (FAR), and specificity (Sp). Such metrics are computed from the true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) as follows:

$$ TPR=\frac{\left| \text{TP} \right|}{\left| \text{TP} \right|\mathrm{+}\left| \text{FN} \right|} $$
(17)
$$ FPR=\frac{\left| \text{FP} \right|}{\left| \text{FP} \right|\mathrm{+}\left| \text{TN} \right|} $$
(18)
$$ FAR=\frac{\left| \text{FP} \right|}{\left| \text{TP} \right|\mathrm{+}\left| \text{FP} \right|} $$
(19)
$$ Sp=\frac{\left| \text{TN} \right|}{\left| \text{FP} \right|\mathrm{+}\left| \text{TN} \right|} $$
(20)

where |TP|, |FP|, |TN|, and |FN| represent the numbers of objects identified as true positives, false positives, true negatives, and false negatives, respectively.
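For instance, (17)–(20) translate directly into Python (the zero-denominator guards are our addition):

```python
def frame_metrics(tp, fp, tn, fn):
    """Frame-level rates from detection counts, eqs. (17)-(20)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (sensitivity)
    fpr = fp / (fp + tn) if fp + tn else 0.0  # false positive rate
    far = fp / (tp + fp) if tp + fp else 0.0  # false alarm ratio
    sp = tn / (fp + tn) if fp + tn else 0.0   # specificity
    return tpr, fpr, far, sp
```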

Apart from the metrics defined above, five other frame-based metrics have been used: Accuracy of Object Count (AOC), Precision (P), Recall (R), Average Fragmentation (AFR), and F-Measure (FM), i.e., F-Score (FS). Overall values for these metrics are obtained by averaging the frame-wise values. These measures range from 0 to 1, with 1 being the best case. They are calculated as follows:

Let GT(t) be the set of GT objects in a single frame t, and let DT(t) be the set of output boxes produced by an algorithm for that frame. The AOC for frame t is defined as in (21)

$$ AOC(t)=\frac{\min\left(\left|GT(t)\right|,\left|DT(t)\right|\right)}{\left(\left|GT(t)\right|+\left|DT(t)\right|\right)/2},\quad \text{if } \left|GT(t)\right|+\left|DT(t)\right|\ne 0 $$
(21)

where |GT(t)| and |DT(t)| are the number of ground-truth objects and the number of boxes detected by the proposed method in frame t. This measure considers only the count of boxes in each frame, not their spatial information. The case |GT(t)| + |DT(t)| = 0 occurs when no object was present in the original frame and none was detected by the algorithm; the AOC value for that frame is simply ignored. Precision for a frame t is defined as the percentage of selected items that are correct and is calculated using the formula given in (22):

$$ Precision\left( t \right)=\frac{\left| \text{TP}\left( \mathrm{t} \right) \right|}{\left| \text{TP}\left( \mathrm{t} \right) \right|\mathrm{+}\left| \text{FP}\left( \mathrm{t} \right) \right|} $$
(22)

Recall is defined as the percentage of correct items that are selected and it is measured as

$$ Recall\left( t \right)=\frac{\left| \text{TP}\left( \mathrm{t} \right) \right|}{\left| \text{TP}\left( \mathrm{t} \right) \right|\mathrm{+}\left| \text{FN}\left( \mathrm{t} \right) \right|} $$
(23)

The fragmentation of the output boxes overlapping a GT(t) in frame t is defined as

$$ Frag\left(GT(t)\right)=\frac{1}{1+\log_{10}\left(\left|GT(t)\cap DT(t)\right|\right)},\quad \text{if } \left|GT(t)\cap DT(t)\right|\ne 0 $$
(24)

When |GT(t) ∩ DT(t)| = 0, the fragmentation value is ignored, as this refers to the case in which no bounding boxes are present in frame t. The overall fragmentation value is calculated as the average of all frame-level fragmentation values and is known as the AFR (Average Fragmentation Rate). Precision and recall are often contradictory in nature, so it is advisable to evaluate any method on a combination of the two. The F-measure has been used for this purpose in a number of studies in the literature; it is also known as the effectiveness measure and is defined as

$$ FM=\frac{(\beta^{2}+1)\,Precision\cdot Recall}{\beta^{2}\,Precision+Recall} $$
(25)

To balance the weights of precision and recall, we substitute β = 1, which gives the balanced F1 measure:

$$ F1=\frac{2\times Precision\times Recall}{Precision+Recall} $$
(26)
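These frame-level measures are equally direct to compute; the sketch below uses our own helper names, and the None return marks frames that are ignored in the averages, as described above:

```python
import math

def precision(tp, fp):   # eq. (22)
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):      # eq. (23)
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p, r):      # eq. (26), balanced F-measure
    return 2 * p * r / (p + r) if p + r else 0.0

def aoc(n_gt, n_dt):     # eq. (21)
    if n_gt + n_dt == 0:
        return None      # no boxes at all: frame ignored in the average
    return min(n_gt, n_dt) / ((n_gt + n_dt) / 2)

def frag(n_overlap):     # eq. (24)
    if n_overlap == 0:
        return None      # no overlapping boxes: frame ignored
    return 1.0 / (1.0 + math.log10(n_overlap))
```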

5.1 Experiment results

The current study was tested on the benchmark BMC and Context Aware Vision using Image-based Active Recognition (CAVIAR) datasets, which are available at http://bmc.iut-auvergne.com/ and http://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1/, respectively. The BMC dataset consists of 29 videos categorized into two sets: a learning set and an evaluation set. The videos in the learning set are used for parameter tuning, while the evaluation set contains more complex synthetic and real videos bundled with their ground-truth values. The CAVIAR dataset consists of 27 video clips recorded with a wide-angle camera in the INRIA lab entrance hall, together with ground truth (GT) values. A number of tests and evaluations were carried out and are presented in this section. Precision P(t) and recall R(t) were plotted over all frames t for videos taken from the datasets, as shown in Figs. 4, 5, 6, 7, 8 and 9. A precision-versus-recall plot is useful for evaluating the performance of a classifier: the ideal point lies at the upper right corner, depicting 100 % precision and recall, while the worst behavior occurs when precision and recall tend to zero and the curves shift towards the bottom left corner. Statistical tests were also conducted and are summarized in Tables 1 and 2.

Fig. 4 P vs. R plot for the synthetic-322 video clip

Fig. 5 P vs. R plot for the wandering students video clip

Fig. 6 P vs. R plot for the Browse3 video clip

Fig. 7 P vs. R plot for the Browse4 video clip

Fig. 8 P vs. R plot for the BrW2 video clip

Fig. 9 P vs. R plot for the FrA1 video clip

Table 1 Statistical comparison of various background subtraction methods
Table 2 Performance comparison of various background subtraction methods

Two videos from the evaluation set of the BMC dataset, namely “synthetic 322” and “wandering students,” were chosen at random for testing. “Synthetic 322” consists of 1501 PNG files, while “wandering students” consists of 795 frames together with ground-truth values. Figures 4 and 5 show the precision-versus-recall plots obtained for these two sample videos, respectively. The results of the proposed method are compared with the standard BS algorithm for both videos. For “wandering students,” in which many moving objects are present in a single frame, BBBCO-based BS obtains good results. Relatively higher values of precision and recall are observed in the graphs because the modified BS algorithm extracts finer details of the detected objects and models the background effectively. The BBBCO-based BS algorithm makes a large number of detections with correct object boundaries, thereby moving the results towards the upper right corner of the graph.

The proposed study was also evaluated on videos from the CAVIAR dataset and compared with the similar SA-based method of Karasulu and Korukoglu [20] and with the standard BS algorithm. The same video clips used in [20], namely “Browse3,” “Browse4,” “Browse_WhileWaiting2,” and “Fight_RunAway1,” are used in this evaluation. These are referred to as Br3, Br4, BrW2, and FrA1 in the present study and consist of 912, 1,140, 1,896, and 552 video frames, respectively. Figures 6–9 show the precision-versus-recall plots for the four video sequences chosen from the CAVIAR dataset. The curves of the SA-BS and BBBCO-BS methods are similar for almost all the videos. However, the proposed BBBCO-based BS method performs better than the simple BS algorithm and the SA-based BS method, as its precision and recall values are higher. The total number of misses by the standard BS algorithm is higher, while almost the same number of detections is obtained by both entropy-based methods. For Br3 and Br4, the proposed method outperforms both SA-BS and simple BS and makes more accurate detections (Figs. 6 and 7). For the BrW2 video clip, the number of detections obtained by the proposed method is significantly higher than that obtained by both comparison methods, as is clear from Fig. 8; this can be attributed to the fact that the proposed scheme efficiently detects slow-moving and small objects. For the FrA1 video clip, the detections of all the methods were almost the same, as there were few intensity variations in the video, and even the simple BS method was competitive with the other methods and obtained good results. Overall, it can be concluded that the proposed approach is preferable to the other methods for effective detection.

Various statistical measures were also calculated for the proposed method, and the corresponding results are shown in Tables 1 and 2. Table 1 lists the values of the true positive rate (TPR), false positive rate (FPR), false alarm rate (FAR), and specificity (Sp). The values of AOC, P, R, AFR, and FM are shown in Table 2; larger values of these measures indicate a better algorithm. Both tables clearly show that the proposed method is preferable to the other methods. The proposed methodology can handle the effects of changing illumination, dynamic backgrounds, and noisy environments, and the number and accuracy of the detections obtained by the proposed method are higher for almost all video clips.

6 Conclusion

This study presented a novel method for object detection in videos using the fuzzy entropy principle and BBBCO. It proposed a new variant of the standard BS algorithm that extracts objects from video sequences by employing the concept of fuzzy partition entropy. The modified algorithm determines the optimal threshold value for classifying pixels as foreground or background; this is done using the concept of fuzzy 2-partition entropy and by framing threshold detection as an optimization problem, with the recent BBBCO technique used to find the effective threshold. The proposed technique is novel, as the concept of BS had not previously been explored using fuzzy entropy and optimization theory. The technique was tested on benchmark datasets, and a number of metrics were evaluated to gauge its effectiveness; it was also compared with the traditional BS method and another recent method. The obtained results are highly encouraging and show improvement over state-of-the-art methods.