1 Introduction

In today’s Internet age, with the advance of cutting-edge technologies, the volume of multimedia data is growing by leaps and bounds. From social networking to online shopping, the demand for digital data is increasing exponentially everywhere, which makes it cumbersome to handle and process such data effectively. The lion’s share of this digital data (in size per item) consists of multimedia videos. Because video is inherently unstructured, it is difficult to manage and retrieve. Using the file name as the prime attribute for indexing and accessing data items has become obsolete.

To address this problem, the structural characteristics of the video are exploited for indexing and retrieval, which gives rise to the need for content-based video retrieval systems. To capture the full extent of a video, it is subdivided into sequences of mutually related frames called shots. In brief, a shot is a series of correlated frames covering a specific contiguous span of the video. The task of detecting the boundary between two consecutive shots is called shot boundary detection (SBD). In SBD, transitions are of basically two types: abrupt and gradual [1, 2]. An abrupt transition, also known as a cut, occurs suddenly between two successive frames. A gradual transition arises from editing effects spread over multiple consecutive frames and extends over a certain number of frames. Gradual transitions are mainly grouped into fades, dissolves, and wipes. Fades are further classified into fade-in and fade-out transitions: when a shot emerges gradually from a monochromatic frame, it is a fade-in; when a shot fades away to a monochromatic frame, it is a fade-out. The overlap of a fade-out and a fade-in (without the monochromatic frames), in which the current shot recedes while the next shot simultaneously emerges, is called a dissolve. A wipe transition occurs with an animated effect and is commonly found in sports and news videos.

The major challenges still persisting in shot boundary detection are to develop an approach invariant to illumination changes and to object and camera motion (OCM), and to define an adaptive threshold that separates transition and non-transition frames across videos. Sudden illumination changes and OCM lead to a high false positive rate in abrupt transition detection [1]. Many researchers in this field prefer histogram-based approaches because of their low computational cost and motion invariance [1, 3, 4]. The discrete cosine transform (DCT) plays an efficient role in reducing sudden illumination changes in a video, after which shot boundaries are detected using histogram difference approaches [2]. Some researchers have also experimented with DCT and wavelet transforms to mitigate illumination changes [5]; this work also used an adaptive threshold for detecting shot boundaries. Other algorithms are based on the cross-correlation coefficient and stationary wavelet transform features [6]; in such approaches, shot boundaries are detected using a combination of local and adaptive thresholds. In some cases, a dual-tree complex wavelet transform is used to analyze the structural features of every frame [7], and shot boundaries are computed from these structural similarity features together with an adaptive threshold. A block-based center-symmetric local binary feature vector has also been employed to identify shot boundaries [8]. A few sparse representation-based approaches [9, 10] have been proposed as well; in [9], a boundary detection approach using sparse coding is proposed for efficiently selecting keyframes for video summarization.

Edge-based features play a crucial role in mitigating the effects of sudden illumination changes and motion [1]. A transition is declared when the edges of the current frame differ substantially from the already-faded edges of the preceding frame [11, 12]. A block matching algorithm is used to compute motion strength and thereby reduce motion effects [13]. A fast SBD algorithm using pixel information is proposed in [14].

Machine learning techniques also find application in shot boundary detection. In [15], a genetic and fuzzy logic-based technique is proposed to detect boundaries. In some cases, a convolutional neural network combined with an adaptive threshold is used to extract candidate segments in the preprocessing step [16,17,18,19].

Many researchers have employed essential features such as the structural similarity index and standard deviation [20], quantized hue, saturation, and value (HSV) color space features [12], histogram intersection [21], and feature differences using the absolute sum of gradients [22] to reduce illumination and motion effects in a video. In object-based SBD approaches, a time stamp is attached to an object to identify it across frames [23, 24]. Drawbacks of object-oriented SBD include an object leaving the frame, a large moving object being mistaken for a wipe transition, and irregular illumination in a video [24]. Object tracking algorithms are developed in [25,26,27] to tackle sudden illumination effects in videos. The multi-view spatiotemporal feature points and grid-based matching approach used for video stitching in [38] can also be helpful in SBD.

In a few approaches, the local binary pattern (LBP) is used as an illumination-invariant feature to detect shot boundaries [28,29,30]. Owing to some pitfalls of LBP, some researchers prefer the local ternary pattern (LTP) feature, a generalization of LBP [31]. LTP is a local texture descriptor that is more discriminant and less susceptible to noise in near-uniform regions. Because of this lower noise sensitivity, the effect of sudden illumination changes is largely nullified while the essential properties of the descriptor are preserved.

The literature review shows that sudden illumination changes and OCM effects are the major hurdles in abrupt transition detection. Frames suffering from these effects are falsely classified as abrupt transitions, reducing the precision of the system, and devising an illumination- and OCM-invariant method is a challenging task. This paper proposes a bifold-stage abrupt transition detection approach in which the LTP feature is used in the initial stage and the Lab color difference, which is invariant to irregular illumination and OCM effects, in the confirmation stage.

The notable findings of the paper are:

  i. A bifold-stage shot boundary detection technique is designed to counter sudden illumination and motion effects across videos.

  ii. Adaptive thresholds are proposed to efficiently classify possible transition frames in the initial stage and actual transition frames in the confirmation stage.

The rest of the paper is organized as follows: Sect. 2 gives brief background knowledge of the features used in the proposed approach. Section 3 details the proposed approach. Section 4 presents a detailed discussion of the experimental results and parameter settings, and Sect. 5 concludes the work.

2 Background knowledge

This section briefly discusses the frame features used in the proposed system.

2.1 Local ternary patterns

In texture classification, features that are reliable and highly discriminative are preferred. LBP features fall into this category, as they tend to resist lighting effects [32]. In most cases, LBP features are invariant to monotonic gray-level transformations. However, research clearly shows that LBP can be sensitive to noise, particularly in near-uniform image regions.

To overcome this problem, a more robust texture operator, the local ternary pattern (LTP), was formulated, which is more resistant to noise [31]. LTP encodes the neighboring pixels into a three-valued code \((-1, 0, 1)\), in contrast to LBP, which uses a two-valued code (0, 1). The encoding is controlled by a user-defined threshold t. In LTP, the three-valued code is computed with respect to the center pixel \(i_c\) as given in Eq. 2: a neighbor whose value exceeds the center pixel by more than the threshold yields +1, one below it by more than the threshold yields −1, and a neighbor whose difference from the center falls within the threshold interval yields 0.

Here, the gray values of the center pixel and the neighboring pixels are denoted \(i_c\) and \(i_p\) \((p=0,\ldots ,P-1)\), respectively. R denotes the radius of the circle on which the neighbors lie, and P the number of neighboring pixels. When a neighbor does not fall exactly at a pixel center, its value is estimated by bilinear interpolation:

$$\begin{aligned} \mathrm{LTP}_{P,R}= & {} \sum \limits _{p=0}^{P-1} 2^{p}\,s\left( i_{p}\right) , \end{aligned}$$
(1)
$$\begin{aligned} s(x)= & {} {\left\{ \begin{array}{ll} 1,&{}\quad x\ge i_c + t \\ 0,&{}\quad i_c-t<x<i_c+t\\ -1,&{}\quad x \le i_c-t \end{array}\right. } \end{aligned}$$
(2)

where x is the neighbor pixel value \(i_p\).

Here, the use of the threshold makes LTP more robust to noise. The LTP encoding procedure is illustrated in Fig. 1, where the threshold is set to \(t=5\), so the tolerance interval around the center pixel is [49, 59].

Fig. 1 Illustration of the basic LTP operator
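For concreteness, the following minimal NumPy sketch computes the LTP operator of Eqs. 1 and 2 for \(P=8\), \(R=1\) (a 3×3 neighborhood). Splitting the ternary code into "upper" and "lower" binary maps and histogramming each into 256 bins is the usual convention for LTP, not a detail fixed by the paper; border handling is also our simplification.

```python
import numpy as np

def ltp_codes(gray, t=5):
    """LTP sketch for P=8, R=1 (3x3 neighbourhood).

    Returns the upper/lower binary pattern maps that together encode
    the three-valued (-1, 0, +1) code of Eq. 2. Border pixels are
    skipped for brevity (an assumption, not stated in the paper).
    """
    g = gray.astype(np.int32)
    h, w = g.shape
    # Offsets of the 8 neighbours, p = 0..7, clockwise from top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    upper = np.zeros((h - 2, w - 2), np.uint8)  # bits where code == +1
    lower = np.zeros((h - 2, w - 2), np.uint8)  # bits where code == -1
    center = g[1:-1, 1:-1]
    for p, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        upper |= (nb >= center + t).astype(np.uint8) << p
        lower |= (nb <= center - t).astype(np.uint8) << p
    return upper, lower

def ltp_histogram(gray, t=5):
    """Concatenated 256-bin histograms of the upper and lower maps."""
    upper, lower = ltp_codes(gray, t)
    return np.concatenate([np.bincount(upper.ravel(), minlength=256),
                           np.bincount(lower.ravel(), minlength=256)])
```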

2.2 CIEDE 2000 color difference

CIEDE 2000 is an efficient color-difference formula that closely matches how colors are distinguished by the human eye. The formula \((\varDelta E)\) is based on the CIELab color space [33]. The difference between a pair of color values in CIELab space is computed using Eq. 3:

$$\begin{aligned} \varDelta E_{00}&= \varDelta E_{00}\left( L_1^*,a_1^*, b_1^*,L_2^*, a_2^*, b_2^*\right) \\&= \sqrt{\left( \frac{\varDelta L'}{K_LS_L}\right) ^2+\left( \frac{\varDelta C'}{K_CS_C}\right) ^2+\left( \frac{\varDelta H'}{K_HS_H}\right) ^2+R_T\left( \frac{\varDelta C'}{K_CS_C}\right) \left( \frac{\varDelta H'}{K_HS_H}\right) } \end{aligned}$$
(3)

Here, the lightness, chroma, and hue differences between the pair of samples in CIEDE2000 are given by \(\varDelta L'\), \(\varDelta C'\), and \(\varDelta H'\), respectively. The rotation function \(R_{T}\) accounts for the interaction between the chroma and hue differences in the blue region. The formula also modifies \(a^*\) of CIELab, adds specific weighting functions and parametric factors, and includes a rotation term that mainly affects colors with low chroma values. In total, CIEDE2000 introduces five major corrections over CIELab. Compared with the previous version of the formula, \(S_{L}\), \(S_{C}\), and \(S_{H}\) compensate for lightness, chroma, and hue, respectively, and \(K_L\), \(K_C\), and \(K_H\) are weighting parameters. The primed quantities in Eq. 3 refer to the transformed lightness, chroma, and hue differences. In this work, the similarity between frames of a video sequence is computed using the CIEDE 2000 color difference.
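For reference, scikit-image ships an implementation of this formula. The sketch below converts two RGB frames to Lab and averages the per-pixel \(\varDelta E_{00}\) map into a single frame-pair score; the averaging step is an assumption on our part, since the paper does not state how the pixel differences are aggregated.

```python
import numpy as np
from skimage import color

def frame_color_difference(frame_a, frame_b):
    """Mean per-pixel CIEDE2000 difference between two RGB frames.

    frame_a/frame_b are HxWx3 RGB arrays (uint8 is converted to
    float internally by rgb2lab).
    """
    lab_a = color.rgb2lab(frame_a)
    lab_b = color.rgb2lab(frame_b)
    delta_e = color.deltaE_ciede2000(lab_a, lab_b)  # per-pixel map
    return float(delta_e.mean())
```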

3 Proposed approach

This section discusses the proposed approach, whose block diagram is shown in Fig. 2.

Fig. 2 Block diagram of the proposed system

3.1 Preprocessing

Preprocessing is the first step in the proposed approach and includes the following two operations (a minimal sketch follows the list):

  1. Converting the video frames from RGB color space to grayscale.

  2. Resizing each frame to \(R\times S\), where \(R=130\) and \(S = 150\).
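A minimal OpenCV sketch of these two steps. Note that cv2.resize takes (width, height), so the \(R\times S = 130\times 150\) frame corresponds to dsize = (150, 130).

```python
import cv2

def preprocess(frame_bgr, size=(150, 130)):
    """Preprocessing sketch: grayscale conversion and resizing.

    OpenCV decodes frames as BGR; size is (width, height), giving
    a 130 x 150 (rows x columns) grayscale frame.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, size)
```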

3.2 Abrupt transition detection

To properly classify abrupt transitions, an automatic bifold-stage shot boundary detection approach is designed. The detection procedure is divided into a possible stage and a confirmation stage: the possible stage eliminates non-transition frames, while the confirmation stage detects the actual transition frames. Together, the two stages drastically reduce the false detection rate.

3.2.1 Possible stage

This section briefly explains the possible stage of the proposed system which consists of feature extraction, similarity measure, threshold selection, and possible transition detection.

3.2.1.1 Feature extraction and similarity measure

After the preprocessing stage, LTP features are computed for every frame of a video. The LTP feature is chosen because it is more discriminant and less sensitive to noise in uniform regions than histogram-based approaches.

The dissimilarity \(D_{i}\) between the LTP code histograms of consecutive frames is calculated using the Chi-square histogram distance [34] given in Eq. 4. Possible abrupt transitions are then marked based on the thresholds discussed in Sect. 3.2.1.2:

$$\begin{aligned} D_i(x, y)= \sqrt{\frac{1}{2}\sum _{j=1}^{n}\frac{(h_j(x)-h_j(y))^2}{(h_j(x)+h_j(y))}} \end{aligned}$$
(4)

Here, \(h_j\) denotes the value of the jth histogram bin, and \(D_i(x,y)\) stores the dissimilarity between the consecutive frames x and y.
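A direct transcription of Eq. 4 in code. The small eps guarding empty bins is an added assumption, since the paper does not say how zero denominators are handled.

```python
import numpy as np

def chi_square_distance(h_x, h_y, eps=1e-10):
    """Chi-square histogram distance of Eq. 4.

    h_x, h_y are the LTP-code histograms of two consecutive frames.
    """
    h_x = np.asarray(h_x, dtype=np.float64)
    h_y = np.asarray(h_y, dtype=np.float64)
    return np.sqrt(0.5 * np.sum((h_x - h_y) ** 2 / (h_x + h_y + eps)))
```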

3.2.1.2 Threshold selection

An important criterion in shot boundary detection is the proper separation of transition and non-transition frames, which requires a suitable threshold: an abrupt transition is declared when the distance between consecutive frames exceeds it. Selecting the right threshold is essential for high classification accuracy. Since the structure and behavior of video data are highly variable, it is impractical to set a single hard threshold that works well for all videos; what is needed instead is an adaptive threshold that adjusts to the structure of each video. In the proposed algorithm, a pair of adaptive thresholds, \(\gamma \) and \(\beta \), is selected to filter out non-transition frames, as given in Eqs. 5 and 6, respectively:

$$\begin{aligned} \gamma= & {} \mu _{D}+\sigma _D\times \kappa _1 \end{aligned}$$
(5)
$$\begin{aligned} \beta= & {} \sigma _{D}+\mu _{D} \end{aligned}$$
(6)

where \(\mu _{D}\) and \(\sigma _D\) are the mean and standard deviation of the dissimilarity values of a video, and \(\kappa _1\) is a user-defined constant set to 1.9 experimentally.
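The two thresholds in code form, assuming D is the vector of chi-square dissimilarities over the whole video:

```python
import numpy as np

def possible_stage_thresholds(D, kappa1=1.9):
    """Adaptive thresholds of Eqs. 5 and 6."""
    mu, sigma = np.mean(D), np.std(D)
    gamma = mu + sigma * kappa1   # Eq. 5
    beta = sigma + mu             # Eq. 6
    return gamma, beta
```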

3.2.1.3 Possible transition detection

In the possible stage, the LTP features of every frame of a video are first extracted, and the dissimilarity between consecutive frames is evaluated using Eq. 4. A possible abrupt transition is marked at frame i when the dissimilarity value \(D_i\) between the ith and \((i+1)\)th frames is greater than or equal to the threshold \(\gamma \) and, in addition, \(D_{i}\) exceeds both its neighbors \(D_{i-1}\) and \(D_{i+1}\) by more than the adaptive threshold \(\beta \), as shown in Eq. 7:

$$ \begin{aligned} \mathscr {P}\mathscr {A} = {\left\{ \begin{array}{ll} \mathrm{true}, &{}\quad (D_i \ge \gamma ) \& \& (D_i- D_{i-1}> \beta )\\ &{}\quad \& \& (D_i-D_{i+1}> \beta ) \\ \mathrm{false}, &{}\quad \mathrm{Otherwise} \end{array}\right. } \end{aligned}$$
(7)

Because a large number of non-transition frames are eliminated, only a handful of frames remain for consideration in the confirmation stage.
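A sketch of the decision rule of Eq. 7. Skipping the first and last indices, where one neighbor is missing, is an assumption the paper leaves implicit.

```python
def possible_transitions(D, gamma, beta):
    """Mark possible abrupt transitions using Eq. 7.

    Index i is kept when D[i] crosses gamma and stands out from
    both neighbours by more than beta.
    """
    return [i for i in range(1, len(D) - 1)
            if D[i] >= gamma
            and D[i] - D[i - 1] > beta
            and D[i] - D[i + 1] > beta]
```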

3.2.2 Confirmation stage

This section explains the confirmation stage of the proposed system, which consists of feature extraction, dissimilarity measure, threshold selection, and actual transition detection.

3.2.2.1 Feature extraction and dissimilarity measure

In this stage, only the possible transition frames are considered. The detected frames are converted to the Lab color space, f. The Lab color space is designed to approximate human vision: it aims for perceptual uniformity, and its L component closely matches the human perception of lightness. It can be used to make accurate color balance adjustments by modifying the output curves of the a and b components, or to adjust the lightness contrast using the L component.

Then, the Lab color difference, denoted \(\varDelta E\), is calculated using Eq. 3.

3.2.2.2 Threshold selection

For the confirmation stage, an adaptive threshold \(\delta \) is proposed which is given in Eq. 8:

$$\begin{aligned} \delta =\mu _{\varDelta _{E}}+ \kappa _2\times \sigma _{\varDelta _{E}} \end{aligned}$$
(8)

where \(\mu _{\varDelta _{E}}\) and \(\sigma _{\varDelta _{E}}\) are the mean and standard deviation of the CIEDE2000 dissimilarity values, and \(\kappa _2\) is a user-defined constant set to 0.8 experimentally.

3.2.2.3 Actual transition detection

The possible stage yields a handful of possible transition frames. The sole task of the confirmation stage is to verify which of these possible transition frames are actual transitions. The initial screening stage removes most frames affected by sudden illumination and motion, yet a few such frames may still slip into the set of possible transitions fed to this stage. A post-processing (confirmation) stage for confirming correct transitions and reducing false detections is therefore required.

In the confirmation stage, the frames \(f_{\mathscr {P}\mathscr {A}_i -\zeta }\) and \(f_{\mathscr {P}\mathscr {A}_i +\zeta }\) are used to verify each possible transition frame, where \(f_{\mathscr {P}\mathscr {A}_i}\) is the possible abrupt frame at index \(\mathscr {P}{\mathscr {A}_i}\). Thus, \(f_{\mathscr {P}\mathscr {A}_i\pm \zeta }\) are the frames \(\zeta \) positions before and after \(f_{\mathscr {P}\mathscr {A}_i}\), where \(\zeta \) is fixed at 4. \(f_{\mathscr {P}\mathscr {A}_i}\) is declared an actual abrupt transition if the Lab color difference (\(\varDelta E\)) between \(f_{\mathscr {P}\mathscr {A}_i -\zeta }\) and \(f_{\mathscr {P}\mathscr {A}_i +\zeta }\) is greater than or equal to the threshold \(\delta \). Equation 9 is thus used to confirm each probable abrupt transition (\(\mathscr {P}\mathscr {A}_i\)).

$$\begin{aligned} \mathscr {A} = {\left\{ \begin{array}{ll} \mathrm{true}, &{}\quad \text {if} \,\varDelta E\left( f_{\mathscr {PA}_{i}-\zeta },f_{\mathscr {PA}_{i}+\zeta }\right) \ge \delta \\ \mathrm{false}, &{}\quad \mathrm{otherwise}. \end{array}\right. } \end{aligned}$$
(9)
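A sketch of Eqs. 8 and 9 together, reusing the frame_color_difference helper from Sect. 2.2. Computing \(\mu _{\varDelta _{E}}\) and \(\sigma _{\varDelta _{E}}\) over the candidate differences only is our reading of Eq. 8, as is skipping candidates too close to the video ends.

```python
import numpy as np

def confirm_transitions(frames_rgb, candidates, kappa2=0.8, zeta=4):
    """Confirmation stage sketch (Eqs. 8 and 9).

    frames_rgb holds the decoded RGB frames; candidates are the
    possible abrupt indices from the first stage.
    """
    valid = [i for i in candidates if zeta <= i < len(frames_rgb) - zeta]
    diffs = [frame_color_difference(frames_rgb[i - zeta], frames_rgb[i + zeta])
             for i in valid]
    delta = np.mean(diffs) + kappa2 * np.std(diffs)          # Eq. 8
    return [i for i, d in zip(valid, diffs) if d >= delta]   # Eq. 9
```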

The pseudocode of the proposed system is given in Algorithm 1.

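Since the original pseudocode figure is not reproduced here, the following driver assembles the helper sketches above in the spirit of Algorithm 1; the function names are ours, not the paper's.

```python
import cv2

def detect_abrupt_transitions(video_path):
    """End-to-end sketch of the bifold-stage detector."""
    cap = cv2.VideoCapture(video_path)
    gray_frames, rgb_frames = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb_frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        gray_frames.append(preprocess(frame))
    cap.release()

    # Possible stage: chi-square distance between LTP histograms of
    # consecutive frames, filtered by the adaptive thresholds (Eqs. 4-7).
    hists = [ltp_histogram(g) for g in gray_frames]
    D = [chi_square_distance(hists[i], hists[i + 1])
         for i in range(len(hists) - 1)]
    gamma, beta = possible_stage_thresholds(D)
    candidates = possible_transitions(D, gamma, beta)

    # Confirmation stage: CIEDE2000 check around each candidate (Eqs. 8-9).
    return confirm_transitions(rgb_frames, candidates)
```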

4 Experimental results and discussion

4.1 Database

The database is an important component for validating the results obtained by the proposed approach. Here, the test videos consist of selected TRECVid 2001 and 2007 videos. To make the database more dynamic, we have included some video and short movie clips subject to stronger lighting, illumination, and motion effects, for example, “Transformer (T1),” “Mission impossible (M1),” and a song from the movie “Massom.” Our experiments are carried out on an HP-Z220 workstation. The overall details of the selected videos in our dataset are given in Table 1.

Table 1 Ground truth data of the test videos

4.2 Evaluation parameters

The performance of the proposed system is evaluated using the recall (R), precision (P), and F1 score (F1) metrics, given in Eqs. 10, 11, and 12, respectively:

$$\begin{aligned} R= & {} \frac{\mathrm{Correctly}\, \mathrm{detected}}{\mathrm{Correctly}\, \mathrm{detected}~ +~ \mathrm{Miss}\, \mathrm{detected}}\times 100 \end{aligned}$$
(10)
$$\begin{aligned} P= & {} \frac{\mathrm{Correctly}\, \mathrm{detected}}{\mathrm{Correctly}\, \mathrm{detected}~+~\mathrm{Wrongly}\, \mathrm{detected}}\times 100 \end{aligned}$$
(11)
$$\begin{aligned} F1= & {} \frac{2\times R\times P}{R+P} \end{aligned}$$
(12)
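These metrics in code, given counts of correctly detected, missed, and wrongly detected transitions:

```python
def evaluation_metrics(correct, missed, wrong):
    """Recall, precision, and F1 score of Eqs. 10-12 (R and P in %)."""
    recall = correct / (correct + missed) * 100
    precision = correct / (correct + wrong) * 100
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1

# Example: 92 correct, 8 missed, 1 false detection.
print(evaluation_metrics(92, 8, 1))
```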

4.3 Parameters selection

The performance of the system depends on the proper selection of the parameters used in the proposed approach. We use three adaptive thresholds, \(\gamma \), \(\beta \), and \(\delta \), discussed in Sects. 3.2.1.2 and 3.2.2.2.

The thresholds \(\gamma \) and \(\beta \) extract probable abrupt changes, and \(\delta \) is used in the confirmation stage to verify them; the adaptation of these thresholds across videos can be seen in Table 2.

The experimental results show that the appropriate ranges for the constants \(\kappa _1\) and \(\kappa _2\) used in Eqs. 5 and 8 are [1, 3] and [0.5, 1.5], respectively. Throughout the experiments, \(\kappa _1\) and \(\kappa _2\) are set to 1.9 and 0.8, respectively.

4.4 System performance

The overall performance of the proposed system on the selected TRECVid 2001 and 2007 videos is shown in Table 3.

Table 2 Adaptive thresholds for different videos
Table 3 Performance of the system for TRECVid 2001 and 2007

Table 3 reports the performance on the selected videos of the TRECVid 2001 and TRECVid 2007 datasets. On TRECVid 2001, the proposed system records an average recall, precision, F1 score, and computation time of \(92.3\%\), \(99.1\%\), \(95.5\%\), and 757 s, respectively. On TRECVid 2007, the corresponding averages are \(99.3\%\), \(96.7\%\), \(97.9\%\), and 1012 s. Overall, the proposed system achieves \(96.7\%\), \(98.0\%\), \(96.8\%\), and 812 s on average. Figure 3 shows a correctly detected abrupt transition between frame numbers 468 and 469 of the “D6.mpg” video.

Fig. 3 An example of a correctly detected abrupt transition from video

4.5 Discussion

The proposed algorithm correctly detects most of the abrupt transitions present in the test clips and discards frames affected by illumination changes and object motion. To show the advantage of the adaptive thresholds, Table 4 compares the proposed system using adaptive and hard thresholds; it clearly shows that the system performs better with the proposed adaptive thresholds. To show the advantage of the confirmation stage, Table 5 compares the performance of the proposed system with and without it.

Table 4 System performance using hard thresholds and the proposed adaptive thresholds
Table 5 Experimental results without and with confirmation stage

Videos D2 and D4 are chosen for this experiment because both exhibit sudden illumination and motion effects. Table 5 shows that the proposed method successfully overcomes most of the challenges mentioned above. Notably, the F1 score of every video in Table 5 improves as a result of the increased precision of the proposed system. Figure 4 shows major challenges, such as uniform and non-uniform illumination changes from videos D2 and D4, respectively; these problems are handled easily by the proposed system.

Fig. 4 An example of a uniform illumination and b non-uniform illumination

An example from video \(\textit{D4}\) is shown in Fig. 5, where a large object (a fan) obstructs another object (a dummy airplane) in front of the camera. Figure 5a, b shows the obstacle spanning multiple consecutive frames and a single frame, respectively; both cases are handled effectively in the confirmation stage of the proposed system. Figure 6 shows the flashlight effect, due to which frames are misclassified as abrupt transitions; the proposed approach handles this scenario effectively.

Fig. 5 An example of an obstacle in front of the camera: a multiple frames, b single frame

Fig. 6 An example of a correctly discarded flashlight effect in detecting abrupt transition

The proposed algorithm correctly detects all the abrupt boundaries present in \(\textit{Clip T1}\), as shown in Fig. 7, and successfully discards most of the illumination and object motion frames, thereby increasing the precision of the system.

Fig. 7 An example of a and b correctly detected abrupt transitions, c a correctly discarded illumination effect, and d a correctly discarded large object motion

4.6 Comparison

For comparison with the proposed system, several state-of-the-art techniques are considered: WHT-SBD [13], gradient-oriented feature distance (GOFD) [22], stationary wavelet transform (SWT) [6], the fast framework of [14], PSO-GSA [35], ST-CNN [16], a dual-stage approach using LBP-HF and Canny edge difference [30], an adaptive low-rank and SVD-updating approach [36], and SBD using a color histogram and a modified multilayer perceptron neural network [37]. Table 6 shows the comparison of the proposed system with these techniques.

Table 6 Comparison of the proposed system with state-of-the-art techniques

Table 6 shows that the performance of the proposed approach is comparable to the state-of-the-art techniques, owing to the illumination-invariant nature of the LTP features and to the dual-stage design, whose double confirmation of boundaries ensures comparable precision and F1 score. From Table 6, [36] attains the highest F1 score thanks to a good recall combined with decent precision. The target of the proposed system, however, is to eliminate illumination and motion effects, that is, to reduce false positives. A high precision indicates that a system is largely free of false positives; [6] records the highest precision, but at the cost of recall, which drastically reduces its F1 score. The proposed system instead achieves comparable recall and precision, yielding a good F1 score.

5 Conclusion and future work

This paper has presented a solution for shot boundary detection in the presence of sudden illumination changes and object motion across videos. The bifold-stage SBD technique achieves the highest precision among the compared state-of-the-art approaches. In the first stage, possible abrupt transitions (\(\mathscr {P}\mathscr {A}\)) are extracted using the illumination-invariant LTP feature and the adaptive thresholds \(\gamma \) and \(\beta \). In the confirmation stage, CIEDE2000 features are extracted from the possible abrupt transition frames and, together with the adaptive threshold \(\delta \), used to classify the actual abrupt transition frames (\(\mathscr {A}\)). The advantage of the Lab color space is that it can represent all the colors perceived by the human visual system, enabling more accurate discrimination of color features. The experimental results show that the proposed system effectively handles illumination and object motion effects across videos; persisting challenges such as non-uniform illumination and obstructions in front of the camera over multiple frames are addressed effectively, as reflected in the high precision recorded by the system.

Future work will aim to improve the recall of the proposed system and to effectively detect gradual and wipe transitions, which will in turn make the system complete enough to be applied in content-based video retrieval systems.