Keywords

1 Introduction

The video segmentation refers to decomposing a video data into meaningful elementary parts that have strong application and correlation with the real world contained in the video data. The collective numbers of segments resulted from video segmentation cover the entire real-time video data. The video signal keeps temporal information, which introduces the object motion and includes camera motion concept. Therefore, video has both temporal nature and spatial (static) nature [1]. Segmentation of video can be set with respect to temporal, spatial, or spatiotemporal [25]. In a video, the static image can be obtained by segmenting a frame in spatial domain. In temporal segmentation, the sequence of video frames is segmented.

The segmentation criteria may be motion-based or color/texture-based which can be used both individually and combine form. In motion-based segmentation methods [6], there is a group of pixels with similar motion, and they are separated by multiple layers. In most of the methods [79], the optical flow estimation is preprocessed, and then the pixels are divided based on the pattern of motion. Some methods [6, 10, 11] are transformed and together have an effect on motion and performance evaluation.

In the last few years, many ways of segmentation have been analyzed for different images as normalized cuts [12], mean-shift [13], segmentation based on lossy compression [14], weighted aggregation (SWA) [15], and the fuzzy C-means technique [16]. Many works have been done in application of skin tumor border in medical images using segmentation which is unsupervised [17]. In these methods [16, 17], adaptive thresholding has been implemented by using scale-space filter for the extraction of the required number of clusters using histograms. Another method is proposed for fuzzy C-means algorithm which is based on histogram [18, 19] and also purposed for simplification of mean-shift filter and K-means clustering both by using background modeling. One of the commonly used models is mixture of Gaussian (MOG), kernel density estimation (KDE), etc., for background estimation.

Here, we have studied and analyzed the mean-shift algorithm [13], for the segmentation of each frame independently with the given predefined parameters. Fuzzy K-means (also called fuzzy C-means) and K-means are two simple clustering methods. K-means finds complex clusters (fusion of single clusters), and fuzzy C-means is the more formalized statistical method and finds a soft cluster with more than one block with a certain probability.

The paper is prepared as follows: Sect. 2 deals with methodology of video segmentation methods, i.e., mean-shift, K-means, and fuzzy C-means; Sect. 3 gives the performance evaluation metrics. The experimental and discussion results are presented in Sect. 4, and the conclusions of this study are given in Sect. 5.

2 Study of Cluster-Based Video Segmentation Methods

We have analyzed and studied three video segmentation algorithms such as mean-shift, K-means, and fuzzy C-means that are discussed in the following sections.

2.1 Video Segmentation Using Mean-Shift

Mean-shift is a simplest procedure where each data point moves to the center of data in neighboring countries, introduced by Fukunaga and Hostetler [13]. The kernel density function was introduced by Cheng’s research into the mean-shift procedure.

In video segmentation, mean-shift algorithm is a direct resumption to protect the smoothness of interruptions. Each pixel is in its neighborhood. Then {x1, …, xn} is a set of data points that are based on geometric data that measure the dimension whose function dimension is d, and then the mean-shift vector of x is defined as follows:

$$ M\left( x \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{n} x_{i} K_{w } \left( {x_{\text{i}} - x} \right) }}{{\mathop \sum \nolimits_{i = 1}^{n} K_{{w }} \left( {x_{i} - x} \right) }} $$
(1)

where x is the center of the kernel and \( K_{w} \left( x \right) = \left| W \right|^{ - 1/2} K\left( {W^{ - 1/2} x} \right) \). W is the approximate density of the dimension d × d, and matrix values determine the solution, the detection, the mode, and the smallest number of consecutive pixels. W is usually a diagonal matrix and W is a symmetric positive bandwidth matrix in which the values determine the smallest number of contiguous pixels. K(x) is a d-variate nucleus and is regarded as nonnegative free radical symmetric functions centered at zero.

The normal kernel defined below was used in our segmentation algorithm:

$$ K\left( x \right) = \left( {2\pi } \right)^{{ - \frac{d}{2}}} \exp \left( { - \frac{1}{2}\left\| x \right\|^{2} } \right) $$
(2)

Because of the mean-shift, the M(x) axis always shows the direction of the maximal density, and the average intermediate procedure is obtained by calculating the core change and changing the vectors of the vector K(x). At the end of the approximate procedure, it is guaranteed at the nearest point where the estimate is zero gradients. Define as

$$ y_{j + 1} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} x_{i} K_{w } \left( {y_{j} - x_{i} } \right) }}{{\mathop \sum \nolimits_{i = 1}^{n} K_{w } \left( {y_{j} - x_{i} } \right) }}\quad j = 1 $$
(3)

Equation (2) can be modified as

$$ y_{j + 1} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} x_{i} \exp \left( { - \frac{1}{2}\mathop \sum \nolimits_{l = 1}^{d} \left( {\left( {y_{j}^{l} - x_{i}^{l} } \right)/w_{l} } \right)^{2} } \right) }}{{\mathop \sum \nolimits_{i = 1}^{n} \exp \left( { - \frac{1}{2}\mathop \sum \nolimits_{l = 1}^{d} \left( {\left( {y_{j}^{l} - x_{i}^{l} } \right)/w_{l} } \right)^{2} } \right) }} $$
(4)

where \( y_{j}^{l} \) and \( x_{i}^{l} \) are the lth component of yj and xi, respectively. Thus, mean-shift algorithm is an update procedure for yj.

The mean-shift segmentation algorithm is as follows:

  • Step 1: Two-dimensional image size is a map of the two-dimensional space and looks at each pixel by \( x_{i} \in R^{d} \).

  • Step 2: Initialize the value j = 1 and yi,j = xi.

  • Step 3: Calculate the value of yi,j+1 according to Eq. (2) when it is approaching a certain point and then set zi = yi,c, where yi,c is the convergence value of yi,j.

  • Step 4: Imagine domain sharing {Cp}p=1,…, m by grouping together all zi than w1 to the lth function (l = 1, 2,…, d).

  • Step 5: For each i = 1, 2, …, n, assign \( L_{i} = \{ p\left| {z_{i} \in C_{p} \} } \right. \).

  • Step 6: Neglect the spatial regions having less than N pixels.

2.2 Video Segmentation Using K-Means

The simplest algorithm for learning without follow-up are the K-means, which can solve cluster problems. This is one of the easiest ways to divide datasets provided by a number of clusters (assume k clusters) that are supposed to fix priority. The main purpose is to set up a wider group for all (k centroids). These centroids are placed in the way different locations lead to different results. Therefore, everything is put as far as possible to get a good result. So each point in the dataset is selected at one time and connects it to the nearest center. When each point is processed, the first step is complete. So, at this point, the centroid is recalculated by clusters that came from the previous result. When we have these new centroids, we have to create a new connection between the same data and the nearest new center. Since it is a method, a loop is created and uses this cycle; we can notice that centroids k change their position step by step to further changes. Finally, we minimized an objective function. The objective function is given as

$$ J = \sum\limits_{j = 1}^{k} {\sum\limits_{i = 1}^{n} {\left\| {x_{i}^{\left( j \right)} - c_{j} } \right\|^{2} } } $$
(5)

\( \sum\nolimits_{i = 1}^{n} {\left\| {x_{i}^{\left( j \right)} - c_{j} } \right\|^{2} } \) is the objective function within group I. \( x_{i} \) is a vector and \( c_{j} \) the corresponding cluster center.

The K-means segmentation algorithm is as follows:

  • Step 1: Consider the first vector of the K function in clusters.

  • Step 2: Sample vector is calculated using the minimum distance principle.

  • Step 3: New average is a new cluster for each cluster.

  • Step 4: Repeat step if the center is moved to Step 2, or stop.

2.3 Video Segmentation Using Fuzzy C-Means

Clear compilation plays an important role in problem-solving in recognizing patterns and sample identities and organizing video images. Different separation methods are provided by clusters, and most of them are based on distance criteria. The widely used algorithm is the C-means (FCM) delaying algorithm. This is useful when the cluster count is predetermined. In this way, the algorithms try to place each data point on a cluster. What makes FCM different is that it cannot resolve the absolute membership in the data points provided in the cluster. Instead, it calculates the probability (membership) that the data point given will belong to a cluster. So depending on the accuracy of grouping required to exercise proper tolerance, it may be advisable. Since absolute membership is not calculated, FCM can be fast because of the number of changes required to achieve special components.

In the following FCM algorithm iteration, the following goals (objective function) are minimized:

$$ J = \sum\limits_{{i = 1}}^{N} {\sum\limits_{{j = 1}}^{C} {\left\| {x_{i} - c_{j} } \right\|^{2} } }$$
(6)

Here N is the number of data points, C is the necessary number of clusters, cj is the central vector j and δij is the membership level for the data point i in the j group. The \( \left\| {x_{i} - c_{i} } \right\| \) standard measures the (or near) approximation of the xi data point with the central vector cj of j clusters. Note that in each iteration, the rules maintain a vector center for each cluster. These data are calculated on the average weight of the weighted data displayed by membership.

The fuzzy C-means segmentation algorithm is as follows:

  • Step 1: Start the membership-matrix U with values between 0 and 1.

  • Step 2: Calculate the center cluster c.

  • Step 3: Calculate the function that costs according to the equation. Stop if either it is below a certain tolerance value or its improvement, than the previous repetition is below certain threshold levels.

  • Step 4: Calculate the new U. Please go to Step 2

3 Performance Evaluation Metrics

The quality rating of the video segmentation algorithm is by comparing the result of the segment with part of the ground truth and obtaining the following statistics:

True Positive (TP): Intersection of pixels in the foreground of the segmented output and ground truth.

False Positive (FP): Intersection of pixels in the foreground of the segmented output and pixels in the background of ground truth.

True Negative (TN): Intersection of pixels in the background of the segmented output and ground truth.

False Negative (FN): Intersection of pixels in the background of the segmented output and pixels in the foreground of ground truth.

Using these statistics, assessments including FPR, FNR, Precision (Pr), Recall (Re), Accuracy (Ac), and Specificity (Sp) are calculated.

$$ \begin{aligned} {\text{FPR}} & = \frac{\text{number of FP}}{{\left( {{\text{number of FP}} + {\text{number of TN}}} \right)}} \\ {\text{FNR}} & = \frac{\text{number of FN}}{{\left( {{\text{number of FN}} + {\text{number of TP}}} \right)}} \\ \end{aligned} $$
$$ {\text{Precision}}\left( { \Pr } \right) = \frac{\text{number of TP}}{{\left( {{\text{number of TP}} + {\text{number of FP}}} \right)}} $$
$$ {\text{Recall}}\left( {\text{Re}} \right) = \frac{\text{number of TP}}{{\left( {{\text{number of TP}} + {\text{number of FN}}} \right)}} $$
$$ {\text{Specificity}}\left( {\text{Sp}} \right) = \frac{\text{number of TN}}{{\left( {{\text{number of FP}} + {\text{number of TN}}} \right)}} $$
$$ {\text{Accuracy}}\left( {\text{Ac}} \right) = \frac{{{\text{number of TP}} + {\text{number of TN}}}}{{({\text{number of }} {{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}} )}} $$

These above parameters describe the performance [2022] of the moving object.

4 Experimental Results and Discussion

Analysis of video segment algorithms has been tested and implemented using Matlab. We have selected some of the testing segments, the standard video clips called CDnet 2014 [23] in the section of the video. These sets of videos differ in size with corresponding ground truth. The mean-shift, K-means, and fuzzy C-means segmentation algorithms are applied over the above dataset and got the segmentation result.

In mean-shift algorithm, we have taken the spatial_bandwidth = 40, color_bandwidth = 3 and convergence factor = 3. Different values of bandwidth W can be taken for the performance measurement.

In K-means algorithm, we have taken the cluster number = 8 and we choose the cluster center randomly.

In fuzzy C-means algorithm, we choose the cluster center randomly (Fig. 1).

Fig. 1
figure 1

Segmentation results obtained for various sequences of the CDnet dataset: (i) ground truth maps; (ii)–(iv) output results of mean-shift, K-means, and fuzzy C-means methods, respectively

To measure the quality of the segmentation algorithms, we calculate different metric measurements [2022]. The metrics were reported for CDnet for 2014 [23]. We have provided a comparative analysis of mean-shift, K-means, and fuzzy C-means clustering-based segmentation. We can see that for baseline category, the accuracy is more for K-means method. The baseline and shadow categories consist of simple sequences where the normal order of pedestrian and car is the focus. With different types of fraud and variation related to Interm.Object motion, fuzzy C-means method gives very good recall compared to other. In shadow category, the accuracy is more than 90% for mean-shift segmentation. But the recall is high for fuzzy C-means method. In dynamic background category, the accuracy is more than 90% both in mean-shift and K-means methods but recall is more in fuzzy C-means method (Tables 1, 2, 3 and 4).

Table 1 Performance measures (FPR, FNR, Pr, Re, Ac, Sp) of different methods for baseline image
Table 2 Performance measures (FPR, FNR, Pr, Re, Ac, Sp) of different methods for Interm. Object motion image
Table 3 Performance measures (FPR, FNR, Pr, Re, Ac, Sp) of different methods for shadow image
Table 4 Performance measures (FPR, FNR, Pr, Re, Ac, Sp) of different methods for dynamic background image

5 Conclusion

In the process of visualizing a video and a computer, one of the most difficult and most active areas of the research is the part of the video material. One of the most important issues for successful use of successful video sequences is the video segment that focuses on dividing video files into essential video frames. Various techniques are being used by the developers in reviewed. We analyzed three video segmentation algorithms based on the discovery of the changes in the video clip. We have used mean-shift, K-means, and fuzzy C-means clustering-based segmentation algorithm for video segmentation and measured the performance evaluation metrics. Mean-shift and K-means methods provide good accuracy in comparison to fuzzy C-means. But the recall is generally more in case of fuzzy C-means method.