
1 Introduction

Background subtraction is an important research direction in the field of computer vision. It is widely used in multi-modal video registration, target detection, and target tracking. Background modeling is very important for video processing and can even directly determine the success or failure of a system. To a certain degree it can be considered a classification problem, also known as background/foreground segmentation. There are many ways to model the background in background subtraction, including: the Single Gaussian (SG) used in real-time tracking of the human body, the Mixture of Gaussians model (MoG) [15], the Running Gaussian Average (RGA) [3], CodeBook [8], self-organizing background subtraction (SOBS) [17], SACON [19], the ViBe algorithm [1], Color [5], the temporal median filter [10], W4, and so on. Barnich et al. proposed the visual background extraction (ViBe) algorithm, which can establish the background model from just one frame and uses a conservative update strategy; the algorithm is simple and runs in real time. The LBP algorithm uses the imaging principle of the human eye to create a background. There is also a trajectory-controlled watershed segmentation algorithm which effectively improves the edge-preserving performance and prevents the over-smoothing problem [20].

Our method improves on the original LBSP. Our main contributions are: (1) combining color and LBSP, which increases the accuracy of the foreground edge; (2) redefining persistence [12, 14] to consist of color, LBSP, and the time t, covering temporal and spatial information, which removes the shadow of the target and the holes inside the target.

2 Related Work

According to the needs of the project, our method mainly deals with static-background video. The basic idea of a pixel-based model is to establish a background model for each pixel and compare it with successive frames constantly. The target is detected from the difference between the two, and pixels detected as background are used to update the background model. As time goes on, each new frame is compared with the background and the model is updated constantly; after several frames, the background model gradually stabilizes, reflects the background more accurately, and can adapt to background scene changes such as illumination variation.

The general method includes four steps: pre-processing, background modeling, foreground segmentation, and data verification. Pre-processing consists of denoising, enhancement, etc.; background modeling comes next, followed by foreground segmentation. While the background model is being updated, the segmented data is re-applied to it in the form of feedback, completing the background (bg) and foreground (fg) segmentation at last. A minimal sketch of this loop is shown below, after which the following subsections describe the models we have studied and improved.
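As a concrete illustration, here is a minimal sketch of the four-step loop, written in Python with OpenCV for brevity; the `model` object and its `segment`/`update` methods are hypothetical placeholders, not the interface of our implementation.

```python
import cv2

def run_pipeline(video_path, model):
    """Pre-process each frame, segment it, and feed the result back to the model."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (3, 3), 0)  # pre-processing: denoising
        fg_mask = model.segment(gray)             # foreground segmentation (hypothetical API)
        model.update(gray, fg_mask)               # feedback: background pixels update the model
    cap.release()
```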

2.1 ViBe

The ViBe algorithm considers background modeling as a classification problem. When the selection is made in the color space, a pixel is judged to be foreground by comparing the value of the pixel at the current position with the values of its neighborhood pixels; \(v_1, v_2, \dots, v_N\) represent the values of the pixels at the periphery of each pixel or in adjacent frames, and each background pixel is modeled by a collection of background sample values, represented as in Eq. 1:

$$\begin{aligned} M(x) =\left\{ v_{1},v_{2},\dots ,v_{N}\right\} . \end{aligned}$$
(1)

According to its corresponding model M(x), we can classify a pixel value v(x). The specific approach is as follows:

First, define a sphere \(S_R(v(x))\) of radius R centered on v(x). Second, compare v(x) with the values of M(x) that fall within \(S_R(v(x))\). Finally, let # denote the number of elements in the intersection of M(x) and \(S_R(v(x))\), see Eq. 2, and let #min be a threshold: if # is larger than or equal to #min, the pixel value v(x) is classified as background, otherwise as foreground. The algorithm is illustrated for grayscale images and color images in Fig. 1:

$$\begin{aligned} \#\left\{ S_{R}\left( v\left( x\right) \right) \bigcap \left\{ v_{1},v_{2},\dots ,v_{N}\right\} \right\} . \end{aligned}$$
(2)
Fig. 1. (a), (b) respectively represent the algorithm in grayscale images and color images
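As an illustration, the classification rule for the grayscale case can be sketched as follows; the defaults R = 20 and #min = 2 are the values commonly reported for ViBe [1], not parameters fixed in this section.

```python
import numpy as np

def vibe_classify(v, samples, R=20, min_matches=2):
    """Classify pixel value v against its ViBe sample set M(x) = {v_1, ..., v_N}.

    A sample matches when it falls inside the sphere S_R(v(x)); the pixel is
    background when at least #min (= min_matches) samples match (Eq. 2).
    """
    matches = np.count_nonzero(np.abs(np.asarray(samples, dtype=float) - v) < R)
    return 'bg' if matches >= min_matches else 'fg'
```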

2.2 Local Binary Pattern: LBP

One of the significant background subtraction algorithms is LBP [11]. LBP is a feature-based algorithm that creates a descriptor for each pixel. The specific approach is as follows: each pixel serves as the center of a 3*3 window, and the basic version considers only the eight neighbors of a pixel. Starting from the top left corner and going clockwise, the eight pixels around the window form the descriptor of the center pixel: a neighbor is marked as 1 if its value is larger than or equal to the center pixel, and 0 otherwise, see Eq. 3:

$$\begin{aligned} s \left( x\right) =\left\{ \begin{array}{rcl} 1 &{} &{} {\quad I_x \ge I_c} \\ 0 &{} &{} {\quad I_x < I_c} \end{array} \right. \end{aligned}$$
(3)

The eight flag bits, written down from the upper left corner and clockwise, produce an eight-bit binary number, which is converted to a decimal number; this decimal number is the LBP value of the center pixel. Through this method, each pixel obtains an LBP value from its neighborhood information. Experiments show that the LBP operator is strongly robust to illumination variation, but its disadvantage is that the coverage area is too small to describe the texture accurately.
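A minimal sketch of this 3*3 LBP computation (Python/NumPy), reading the neighbors from the top-left corner clockwise as described above:

```python
import numpy as np

def lbp_value(window):
    """Basic LBP: window is a 3x3 grayscale patch with the center at [1, 1]."""
    c = window[1, 1]
    # the eight neighbors, read from the top-left corner, clockwise
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = ['1' if window[r, k] >= c else '0' for r, k in order]
    return int(''.join(bits), 2)  # the 8-bit binary number as a decimal LBP value

patch = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
print(lbp_value(patch))  # -> 30 (binary 00011110)
```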

2.3 Local Binary Similar Pattern: LBSP

The original LBSP is an improvement based on LBP. The difference is that whether a pixel (recorded as the observe-pixel in this paper) matches a background pixel (recorded as the feature-pixel in this paper) is not decided by a plain order comparison; instead, the absolute difference between the two is compared against a threshold, see Eqs. 4 and 5.

$$\begin{aligned} LBSP \left( x,y\right) =\sum _{p=0}^{P-1}d\left( v_p-v_c\right) \cdot 2^p \end{aligned}$$
(4)
$$\begin{aligned} d\left( v_p-v_c\right) =\left\{ \begin{array}{rcl} 1 &{} &{} {\quad if \ |v_p-v_c| \le T} \\ 0 &{} &{} \quad otherwise \end{array} \right. \end{aligned}$$
(5)

The advantage of this improvement is a broader and more meaningful operating range in background segmentation, which further enhances the robustness of the background modeling algorithm to illumination variation.
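A sketch of Eqs. 4 and 5 on the same 3*3 layout as the LBP example; note that the original LBSP actually samples a larger predefined P-length pattern, so the 8-neighbor window here is a simplification.

```python
def lbsp_descriptor(window, T=30):
    """Original LBSP: bit p is 1 iff |v_p - v_c| <= T (Eqs. 4 and 5)."""
    c = float(window[1][1])
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    desc = 0
    for p, (r, k) in enumerate(order):
        if abs(float(window[r][k]) - c) <= T:  # similarity test, Eq. 5
            desc |= 1 << p                     # weight 2^p, Eq. 4
    return desc
```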

3 The Generation of New Methods

Through the study of these three background modeling methods, we find many ideas worth learning. What the three have in common is that they establish a background descriptor for each pixel and compare the corresponding background pixels with the next frame. The difference is that ViBe is built on the assumption that the background model of a pixel can be described by randomly selected pixels; for this reason, the ViBe algorithm does not segment well at initialization if the target is present in the background model. Both LBP and LBSP use the background pixels around a pixel to describe the background, and this description is more representative; the key points of this approach are the selection of the neighborhood and the determination of the threshold. Next, we introduce our new method in detail.

3.1 Combine Color and LBSP Features to Establish Background Models

The Disadvantage of Only Using LBSP Feature Modeling. Segmentation using LBSP alone has a drawback. During segmentation, an observe-pixel is classified against the feature-pixel background model by comparing the descriptor computed around the observe-pixel (marked as the observe-pixel LBSP in this paper) with the one stored in the background model (marked as the feature-pixel LBSP in this paper) [2]. Assume that during background modeling the intensity of pixel i is 100 and its neighborhood pixels are all 40; then \(|v_p - v_c| = 60\) for every neighbor and the feature-pixel LBSP is 0000000000000000. If the neighborhood pixels later become 160, \(|v_p - v_c|\) is again 60 and the observe-pixel LBSP is also 0000000000000000. The two descriptors match even though the scene has changed, so the segmentation result is clearly wrong because the observe-pixel's own intensity information is lost. In this case the experiment produces the noise (stain) shown in (e) of Fig. 2.

Fig. 2. The drawback of LBSP. (a) Nobody on the road. (b) A pedestrian on the road. (c) The feature-pixel LBSP. (d) The observe-pixel LBSP. (e) The noise in the results (marked with a green box). (Color figure online)
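The failure case can be checked numerically. This small demonstration uses a 5*5 patch with the intensities 40/100/160 from the example above and the relative test of Eq. 5 with T = 30 (the value used in our experiments):

```python
import numpy as np

T = 30
bg = np.full((5, 5), 40.0);  bg[2, 2] = 100.0   # model patch: center 100, neighbors 40
cur = np.full((5, 5), 160.0)                     # current frame: the neighborhood became 160

# feature-pixel LBSP bits: model neighbors vs. the model center (Eq. 5)
feat_bits = np.abs(bg - bg[2, 2]) <= T
# observe-pixel LBSP bits: current-frame neighbors vs. the same model center
obs_bits = np.abs(cur - bg[2, 2]) <= T

# every off-center difference is |v_p - v_c| = 60 > T, so both descriptors are
# all zeros and the changed pixel is wrongly matched to the background model
print(feat_bits.sum() - 1, obs_bits.sum())  # -> 0 0 (the model center matches only itself)
```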

Combining Color and LBSP Information to Establish the Background Model. In the previous section, we demonstrated experimentally that the segmentation result is wrong because the observe-pixel's own intensity information is lost. In this part, the LBSP method is improved (marked as improved-LBSP in this paper) by combining LBSP and color for background modeling, based on the pixel-level model [13]. We also refer to the color attribute in [9]. The modeling procedure takes each pixel as the center of a 5*5 window; the description includes the 24 neighboring pixels and the color value of the center.

The pixel at the center position is marked as \(v_c\). First, the center pixel is described by LBSP, as shown in Eq. 6:

$$\begin{aligned} LBSP \left( x,y\right) =\sum _{p=0}^{P-1}d\left( v_p-v_c\right) \cdot 2^p \end{aligned}$$
(6)

Where \(v_p\) is the pixel intensity of the p-th neighbor of \(v_c\) on the predefined P-length LBSP pattern. Taking into account that the distances between the surrounding pixels and the central pixel differ, the LBSP feature description treats them differently; the specific definition is given in Eqs. 7 and 8:

$$\begin{aligned} d\left( v_p-v_c\right) =\left\{ \begin{array}{rcl} 1 &{} &{} {\quad if \ |v_p-v_c| \le T\cdot \alpha \left( p,c\right) } \\ 0 &{} &{} \quad otherwise \end{array} \right. \end{aligned}$$
(7)
$$\begin{aligned} \alpha \left( p,c\right) =\left( \frac{2\cdot d\left( p,c\right) }{d\left( p,c\right) +1}\right) ^2 \end{aligned}$$
(8)

Where T is the relative LBSP threshold, \(\alpha \)(p,c) expresses the different weights due to the differences in distance and is used to scale the threshold, and d(p,c) represents the Euclidean distance between the pixels.
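Eqs. 6 to 8 can be sketched as below: the threshold applied to each neighbor grows with its Euclidean distance from the center. Using all 24 neighbors of the 5*5 window as the pattern is our reading of the description above; the equations themselves do not fix the bit layout.

```python
import numpy as np

def improved_lbsp(window, T=30):
    """Improved-LBSP over a 5x5 patch: |v_p - v_c| <= T * alpha(p, c), Eqs. 7-8."""
    c = float(window[2, 2])
    desc, p = 0, 0
    for r in range(5):
        for k in range(5):
            if (r, k) == (2, 2):
                continue
            dist = np.hypot(r - 2, k - 2)              # Euclidean distance d(p, c)
            alpha = (2.0 * dist / (dist + 1.0)) ** 2   # distance weight, Eq. 8
            if abs(float(window[r, k]) - c) <= T * alpha:
                desc |= 1 << p                         # Eq. 6
            p += 1
    return desc
```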

Combining the Color and LBSP Information for Pixel Segmentation. In this paper, we use the following conventions: when \(v_p\) and \(v_c\) belong to the same frame, we use the feature-improved-LBSP; when \(v_c\) is a pixel in the background model and \(v_p\) is a surrounding pixel of the frame waiting for segmentation, we use the observe-improved-LBSP. The segmentation proceeds as follows: if the color of the pixel to be segmented is far from the color of the background pixel, it is classified as foreground directly. Otherwise, the Hamming distance between the observe-improved-LBSP and the feature-improved-LBSP is calculated; the smaller the Hamming distance, the more similar they are, see Eq. 9. The process is shown next.

$$\begin{aligned} F\left( x\right) =\left\{ \begin{array}{rcl} bg &{} &{} \quad if \ \left( H\left( a,b\right) <T_H\right) \\ fg &{} &{}\quad if \ \left( H\left( a,b\right) \ge T_H\right) \\ \end{array} \right. \end{aligned}$$
(9)
Algorithm (figure a): the per-pixel color/LBSP segmentation procedure.

Where H() represents the Hamming distance; a and b respectively represent the observe-improved-LBSP of \(v_p\) and the feature-improved-LBSP of \(v_c\); \(T_H\) is the Hamming-distance threshold and \(T_I\) represents the intensity threshold. V(X,Y) is the pixel being segmented, whose color information is \(i_{(x,y)observe}\); \(v_c\) and \(i_c\) respectively represent the central pixel and its intensity; \(v_p\) is the pixel intensity of the p-th neighbor of \(v_c\). The update strategy follows reference [13] and is not repeated here.
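A condensed sketch of this decision rule (color test first, then the Hamming test of Eq. 9), with the thresholds defaulting to the values used in Sect. 4.1; the listing above gives the full per-pixel procedure.

```python
def segment_pixel(i_observe, i_c, observe_lbsp, feature_lbsp, T_I=30, T_H=4):
    """Classify one pixel: intensity test first, then the Hamming test of Eq. 9."""
    # if the color is far from the background color, it is foreground directly
    if abs(i_observe - i_c) >= T_I:
        return 'fg'
    # Hamming distance between the two descriptors = popcount of their XOR
    hamming = bin(observe_lbsp ^ feature_lbsp).count('1')
    return 'bg' if hamming < T_H else 'fg'
```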

3.2 Modeling the Background with Background Feature (BFs)

BFs Establishment and Foreground Segmentation. We propose a new background modeling method based on the modeling of Local Self-Similarity (LSS) [16] and the codebook model [4]. We refer to the expression method in reference [12] and change the conditions for establishing the background model. For each background pixel, the LBSP information derived from the pixel alone clearly cannot represent the texture of its small area, so an LBSP feature is established for each pixel and, according to the positional mapping relation, the eight neighborhood LBSP signatures of each position are taken as the position feature of the pixel; this feature reflects the texture of the local neighborhood of the central pixel. In addition, taking time and space information into account during segmentation, we combine color, LBSP, and the time t into a persistence index, built from the number of occurrences of a pixel and the time since its last occurrence. In this paper, the background model of a pixel is represented by its background features (BFs), and how well a BF describes the background is expressed by its persistence. The establishment of BFs for each pixel is shown in Fig. 3:

Fig. 3. The process of building BFs

In pixel segmentation, we first need to calculate the persistence of the pixel to be segmented, that is, the similarity between the pixel to be segmented and the BFs at the corresponding position, expressed as in Eqs. 10, 11 and 12:

$$\begin{aligned} Q_t\left( x\right) =\sum _{\omega \in C_L\left( x\right) } q\left( \omega ,t\right) \mid \left( d \left( I_t\left( x\right) ,\omega \right) < D\left( x\right) \right) \end{aligned}$$
(10)

Where \(C_L(x)\) is the set of BFs of the current pixel, \(\omega \) is a BF, \(d(I_t(x),\omega )<D(x)\) indicates that the pixel to be segmented and the BF \(\omega \) meet the color/LBSP threshold, and \(Q_t(x)\) is the persistence.

$$\begin{aligned} q\left( \omega ,t\right) = \frac{n \cdot \left( p\cdot \alpha + \frac{1}{t-t_{last}} \cdot \beta \right) }{S_{num}} \end{aligned}$$
(11)

Where n is the number of occurrences of \(\omega \) in the BFs, p is the position information of \(\omega \), \(\alpha \) is the position weight, \(t_{last}\) is the time of the last occurrence of \(\omega \), \(\beta \) is the time weight, and \(S_{num}\) is the total number of samples, which is a constant in the experiment.

$$\begin{aligned} S_t\left( x\right) =\left\{ \begin{array}{rcl} 1 &{} &{} {\quad if \ Q_t\left( x\right) < W\left( x\right) }\\ 0 &{} &{} \quad otherwise \end{array} \right. \end{aligned}$$
(12)

Where W(x) represents the threshold on the persistence of a background pixel; if the pixel is classified as foreground, the result is 1, otherwise 0. D(x) and W(x) are adjusted dynamically, and the result of the segmentation is fed back into the BFs for the next segmentation [12].
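Eqs. 10 to 12 can be folded into one sketch. The record layout of a BF (fields n, p, t_last, color, lbsp), the split of \(d(I_t(x),\omega )<D(x)\) into separate color and LBSP thresholds, and the default weights (with \(S_{num}\) = 50, matching the fixed number of BFs in Sect. 4.1) are our assumptions for illustration.

```python
def segment_with_bfs(bfs, color, lbsp, t, D_color, D_lbsp, W,
                     alpha=0.5, beta=0.5, s_num=50.0):
    """Compute the persistence Q_t(x) (Eqs. 10-11) and threshold it (Eq. 12)."""
    Q = 0.0
    for w in bfs:  # each w is one background feature (BF)
        color_ok = abs(color - w['color']) < D_color          # L1 color test
        lbsp_ok = bin(lbsp ^ w['lbsp']).count('1') < D_lbsp   # Hamming test
        if color_ok and lbsp_ok:
            dt = max(1, t - w['t_last'])                        # time since last occurrence
            Q += w['n'] * (w['p'] * alpha + beta / dt) / s_num  # q(w, t), Eq. 11
    return 1 if Q < W else 0  # Eq. 12: 1 means foreground
```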

Update Strategy. When calculating persistence, we compare colors by the L1 distance and compare the LBSP by the Hamming distance. Combining the three kinds of information improves the robustness to illumination variation. When the lighting of the observed scene changes, we need to modify a small part of the BFs immediately. Only when the color/LBSP distortion is very small do we make random updates. If the change is only temporary, the small part that has not changed will quickly restore the model to its original state. We also draw on the sample consistency model mentioned in [7, 18]: if a pixel is classified as background, neighboring pixel models may be replaced by its local expression. This update strategy differs from the spatial background improvement mentioned in the codebook [11]; it is more predictable because it can operate under external disturbances such as camera shake [12].
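A sketch of this neighbor-diffusion step; the 1/16 update probability and the per-pixel `insert()` interface are assumptions for illustration, not values fixed by [7, 18].

```python
import random

def diffuse_update(models, y, x, color, lbsp, prob=1.0 / 16):
    """When the pixel at (y, x) is classified as background, occasionally push
    its observation into the model of a randomly chosen neighboring pixel."""
    if random.random() >= prob:
        return
    ny = y + random.choice((-1, 0, 1))
    nx = x + random.choice((-1, 0, 1))
    if 0 <= ny < len(models) and 0 <= nx < len(models[0]):
        models[ny][nx].insert(color, lbsp)  # hypothetical per-pixel model API
```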

4 Experiments and Conclusions

The experimental platform is an Intel(R) Core(TM) i5-4570 CPU @ 3.20 GHz with 8 GB RAM; the development tools for the algorithm are OpenCV 2.4.10 + VS2010.

4.1 Experiment

Experiment 1: For the improved LBSP described in Sect. 3.1, we used the CDnet dataset [6]. This dataset contains a rich set of video sequences. We selected one of them for experimentation and performed qualitative and quantitative analyses of the results. We refer to the experimental parameters in [2]; the parameters we use are \(T_H = 4, T = 30, T_I = 30\). The results are shown in Fig. 4:

Fig. 4. (a) The results of reference [1]. (b) The results of the method in Sect. 3.1. (c) The difference between the two (edge information).

In fact, we also vary the parameters in the experiment: \(T_H\) = 1 (low), 28, 50 (high) together with T = 3 (low), 30, 90 (high). We evaluate the model quantitatively using the following six indexes mentioned in [11].

(1) Recall (Re): TP/(TP + FN)

(2) Specificity (Sp): TN/(TN + FP)

(3) False Positive Rate (FPR): FP/(FP + TN)

(4) False Negative Rate (FNR): FN/(FN + TP)

(5) Precision (Pr): TP/(TP + FP)

(6) F-measure: 2 \(*\) Pr \(*\) Re/(Pr + Re)

Where TP is the number of true positives (foreground pixels detected at the correct location), TN is the number of true negatives, FN is the number of false negatives, and FP is the number of false positives (foreground detected at the wrong location).
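For reference, the six indexes follow directly from the four counts:

```python
def evaluate(TP, TN, FP, FN):
    """The six evaluation indexes used in this section."""
    re = TP / (TP + FN)           # Recall
    sp = TN / (TN + FP)           # Specificity
    fpr = FP / (FP + TN)          # False Positive Rate
    fnr = FN / (FN + TP)          # False Negative Rate
    pr = TP / (TP + FP)           # Precision
    f = 2 * pr * re / (pr + re)   # F-measure
    return {'Re': re, 'Sp': sp, 'FPR': fpr, 'FNR': fnr, 'Pr': pr, 'F-measure': f}
```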

From the experimental results, it can be seen that T has the greater effect on the results. When T is small, the model is not robust to noise; when T becomes larger, our model is more robust to noise. The experimental results are shown in Table 1 and Fig. 5:

Table 1. The values of the indexes under different thresholds
Fig. 5. (a) The original image in the dataset. (b) Background subtraction result when \(T_H=1,T=3\). (c) Background subtraction result when \(T_H=1,T=30\). (d) Background subtraction result when \(T_H=1,T=90\).

Experiment 2: We also used the CDnet dataset [6] to model the BFs. In the experiment, the number of BFs is fixed, with a default of 50. In a static background, each background model needs to activate only two or three BFs to achieve a good segmentation; in a dynamic background, up to 20 BFs are activated. The experimental results show that our method can keep a target complete even when it is occluded or camouflaged. Our improved method is denoted as the BFs-method; the experimental results are shown in Fig. 6:

Fig. 6. (a) Experimental result of [12]. (b) Experimental result of the BFs-method.

We also applied the quantitative assessment from the previous section; the results are shown in Table 2. The table shows that the BFs-method achieves a higher Recall and a good Precision (Pr). In addition, the FPR decreases compared with the other six methods, and the FNR is lower than that of PAWCS. This result shows that a persistence consisting of color, LBSP, and t reflects the characteristics of the pixel background better, with the number of occurrences reflecting the lifetime of a pixel. In addition, the segmentation results clearly exceed those of the improved-LBSP described in Sect. 3.1.

Table 2. Quantitative assessment of several methods

4.2 Conclusion

We change the condition for establishing the LBSP by adding a distance-dependent weight, building on [1]. The experimental results show that, with suitable parameter values, our method obtains more accurate target edges and removes the noise in the background. In addition, the BFs-method combines LBSP, color, and persistence to build the BFs, in which persistence is the index for classifying background words; because it combines temporal and spatial information, it is superior for the classification of background words. The BFs-method can remove the shadow of the target and the holes inside the target.