
1 Introduction

In the past decade, the Internet has witnessed an explosion of video, especially on the mobile Internet. Videos of all kinds have greatly enhanced user experiences. However, the diversity of video content poses a new challenge: how to identify the types of Internet video. Large video sites such as Netflix, YouTube, and Amazon need to classify their videos to provide high-quality video services. Schools need to ensure that students are exposed to wholesome videos. From the perspective of Internet management, it is necessary to separate illegal videos from the rest.

Traditional video classification generally falls into four approaches: text-based, audio-based, visual-based, and those that combine text, audio, and visual features [3, 7, 11, 15, 20, 21, 23]. Many standard classifiers, such as Bayesian classifiers and support vector machines (SVM), can be used for video classification. Gaussian mixture models (GMMs) and hidden Markov models (HMMs) have been particularly popular for video classification in the past few years [3]. Much progress has been made in recent years, and Convolutional Neural Networks (CNNs) have been proven to perform very well on video classification tasks [4, 12, 13]. However, all these techniques rest on the condition that the video content can be inspected. If the video is encrypted or observed at a network device, we can only measure the sizes of the video frames. In such cases, all these traditional techniques fail because the video content cannot be inspected. Hence, in this work, we explore a time series classification method based on video frame size.

Variable Bit Rate (VBR) Trace. Most popular streaming services use variable bitrate encoding, so the bitrate of an encoded video varies with its content. Variable bitrate encoding is also used in H.264, the most widely used encoding standard for Internet video. In this standard, a video is encoded into a series of consecutive groups of pictures (GOPs). Each GOP contains one I frame and several B and P frames. The I frame (intra-coded picture) is a complete, self-contained image and is independent of the other frame types. Each GOP starts with an I frame. A P frame (predictive coded picture) contains the difference information from the previous I or P frame. B frames (bidirectionally predictive coded pictures) contain difference information from previous and/or subsequent I or P frames. VBR allocates a higher bitrate to more complex segments of a media file and less space to less complex segments. Hence, the frame size changes with the bit rate. In this work, we choose B frame size traces as VBR traces.

The main contributions of this paper are summarized as follows:

  • To the best of our knowledge, this paper is the first to apply the change point method to video frame size traces for scene classification. We propose a new fusion cost function to locate change points more accurately.

  • We extract statistical features from the variable bit rate (VBR) trace, resulting in a larger and more effective feature set.

  • We verify the effectiveness of the extracted features on our own data set, and further explore the impact of different parameters on classification.

The remainder of the paper is structured as follows. First, we outline the related work in Sect. 2. Then, we present the framework of this work in Sect. 3. In Sect. 4, we describe the feature extraction methods in detail. Implementation details and experimental results are described in Sect. 5. Conclusions and future work are provided in Sect. 6.

2 Related Work

In general, video classification and matching are mainly studied in two fields: computer vision and network security.

In the field of computer vision, there are many studies on video classification [4, 12, 13], mostly based on deep learning methods and largely focused on recognizing various actions. There are also explorations of video classification using zero-shot learning [1, 2, 10, 24].

In the field of cybersecurity, R. Schuster et al. [22] showed that, due to the segmentation prescribed by the MPEG-DASH standard, many video streams are uniquely characterized by their burst patterns. R. Dubin et al. [6] and J. Gu et al. [9] also exploited burst patterns to identify encrypted video streams. For the same reason, H. Li et al. [16] and X. Liu et al. [18] explored action recognition on surveillance traffic.

Last but not least, F. H. P. Fitzek et al. [8] presented a publicly available library of frame size traces of long MPEG-4 and H.263 encoded videos, together with a thorough statistical analysis of the traces. Q. Liang et al. [17] used fuzzy techniques to model and classify MPEG VBR videos.

Inspired by these efforts, we explore a novel method to classify H.264-encoded videos. We extract features of VBR traces in the following sections, and further present the effect of different parameters and feature combinations on the results.

3 The Framework

Fig. 1. Framework with four steps

3.1 Video Pre-processing

We first convert the videos with different formats to a single format using Axiom [19]. The target parameters are shown in Table 1.

Table 1. Format parameter

Then each movie is split into multiple segments with a fixed length of 120 s. We then pick out all segments containing action scenes to build the action movie set; the remaining segments form the other movie set.

3.2 VBR Traces Building

The VBR data are extracted from the movie sets using FFmpeg, as the VBR trace is the most effective way to obtain video information without inspecting the video content. B frames are the dominant frame type in video data, and most inter-frame differences are contained in B frames; therefore we only use the VBR trace of B frames. Each VBR trace is an array \(D_i = (t_1, t_2, \dots ,t_n) \), and all labeled traces form the set \(D = \{D_1, D_2,\dots ,D_m\}\). Note that the traces differ in length because the number of frames per segment is not equal.
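As a rough sketch of this step, the B-frame size trace of one segment could be collected with FFmpeg's ffprobe (assumed to be on the PATH; the JSON field names follow ffprobe's frame output):

```python
import json
import subprocess

def b_frame_trace(video_path):
    """Return the B-frame size trace (bytes) of one 120 s segment.

    Calls ffprobe to list each video frame's picture type and packet
    size, then keeps only the B frames.
    """
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "frame=pict_type,pkt_size",
         "-of", "json", video_path],
        capture_output=True, text=True, check=True,
    ).stdout
    frames = json.loads(out).get("frames", [])
    return [int(f["pkt_size"]) for f in frames if f.get("pict_type") == "B"]
```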

3.3 Feature Extraction

Due to the high dimensionality of the raw data and the varying lengths of the VBR traces, it is necessary to extract high-level semantic features to reduce computational complexity and achieve high classification performance. Inspired by [18], a basic idea is to calculate the trace rate change C for each row \(D_i = (t_1, t_2, t_3, \dots ,t_n)\), where C is defined in Eq. 1.

Since each VBR trace is essentially a time series, we slide a window over each trace to extract windowed features, and then compute statistics from each window as additional features. The detailed techniques are introduced in the next section.

3.4 Classification

In the final step we utilize machine learning algorithms to discriminate the action movies from the other movies. To validate the effectiveness of the VBR trace and its features, we carry out empirical studies using six well-known classic machine learning algorithms: Random Forests (RF), Logistic Regression (LR), Gaussian Naive Bayes (GNB), Decision Tree (DT), linear Support Vector Classification (LinearSVC), and SVM with an RBF kernel (SVM-rbf).
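A plausible instantiation of these six classifiers with scikit-learn is sketched below; apart from the 600-tree forest mentioned in Sect. 5.4, the hyperparameters are library defaults rather than values reported in the paper:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# The six classifiers used in the empirical study; only n_estimators=600
# is taken from the paper (Sect. 5.4), everything else is a default.
models = {
    "RF": RandomForestClassifier(n_estimators=600),
    "LR": LogisticRegression(max_iter=1000),
    "GNB": GaussianNB(),
    "DT": DecisionTreeClassifier(),
    "LinearSVC": LinearSVC(),
    "SVM-rbf": SVC(kernel="rbf"),
}
```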

4 Feature Extraction

In this section, two novel feature extraction methods will be illustrated in detail. The first one is inspired by the method which is proposed in [18]. The second one is based on the change point detection techniques in statistics [14, 16]. Features we extracted are shown in Table 2.

Table 2. Features

In order to capture the information of VBR rate change, we first calculate the VBR trace rate change \(C_i = (a_1, a_2, a_3,\dots ,a_{n-1})\) of the \(i\)-th VBR trace \(D_i = (t_1, t_2,\dots ,t_n) \), where \(a_j\) denotes the difference in frame size between the \((j+1)\)-th and the \(j\)-th frame:

$$\begin{aligned} a_j = t_{j+1} - t_j, \qquad j \in [1,n-1] \end{aligned}$$
(1)

The mean values of the VBR trace and the frame size rate change: These features can show the intensity of scene changes in the videos. Given a VBR trace \(D_i = (t_1, t_2, t_3, \dots ,t_n) \) and \(C_i = (a_1, a_2, a_3,\dots ,a_{n-1})\), the mean values \(\overline{t}\) and \(\overline{a}\) are defined as:

$$\begin{aligned} \overline{t} = \frac{1}{n}\sum _{i=1}^n t_i, \qquad \overline{a} = \frac{1}{n-1}\sum _{i=1}^{n-1} a_i \end{aligned}$$
(2)

The variances of the VBR trace and the frame size rate change: These features can show the complexity of scene changes. Given a frame size trace \(D_i = (t_1, t_2, t_3, \dots ,t_n) \) and \(C_i = (a_1, a_2, a_3,\dots ,a_{n-1})\), the variances \(t^{var}\) and \(a^{var}\) are defined as:

$$\begin{aligned} \begin{aligned} t^{var} = \frac{1}{n}\sum _{i = 1}^n (t_i - \overline{t})^2 \\ a^{var} = \frac{1}{n-1} \sum _{i=1}^{n-1}(a_i - \overline{a}) ^ 2 \end{aligned} \end{aligned}$$
(3)

The skewness of the VBR trace and the frame size rate change: These features describe the symmetry of the data distribution. Given a frame size trace \(D_i = (t_1, t_2, t_3, \dots ,t_n) \) and \(C_i = (a_1, a_2, a_3,\dots ,a_{n-1})\), the skewness values \(t^{sk}\) and \(a^{sk}\) are defined as:

$$\begin{aligned} \begin{aligned} t^{sk} = \frac{ \frac{1}{n} \sum _{i = 1}^n (t_i - \overline{t})^3}{(\frac{1}{n-1} \sum _{i = 1}^n (t_i - \overline{t})^2)^{\frac{3}{2}}} \\ a^{sk} = \frac{ \frac{1}{n-1} \sum _{i = 1}^{n-1} (a_i - \overline{a})^3}{(\frac{1}{n-2} \sum _{i = 1}^{n-1} (a_i - \overline{a})^2)^{\frac{3}{2}}} \end{aligned} \end{aligned}$$
(4)

The kurtosis of the VBR trace and the frame size rate change: These features describe the shape of the distribution of the original data. Kurtosis measures whether the distribution is peaked or flat relative to a normal distribution. Given a frame size trace \(D_i = (t_1, t_2, t_3, \dots ,t_n) \) and \(C_i = (a_1, a_2, a_3,\dots ,a_{n-1})\), the kurtosis values \(t^{ku}\) and \(a^{ku}\) are defined as:

$$\begin{aligned} \begin{aligned} t^{ku} = \frac{ \frac{1}{n} \sum _{i=1}^n (t_i - \overline{t})^4}{ (\frac{1}{n} \sum _{i=1}^n (t_i - \overline{t})^2)^2} - 3\\ a^{ku} = \frac{ \frac{1}{n-1} \sum _{i=1}^{n-1} (a_i - \overline{a})^4}{ (\frac{1}{n-1} \sum _{i=1}^{n-1} (a_i - \overline{a})^2)^2} - 3 \end{aligned} \end{aligned}$$
(5)

Amplitude and phase from the DFT: Following [18], we use the DFT to obtain coefficients containing frequency information, applying the DFT directly within each sliding window. Given a frame size trace \(D_i = (t_1, t_2, t_3, \dots ,t_n) \), the coefficients are complex numbers of the form \(z = a + bi\). Amplitude and phase have been proven effective in practice [18], so we use them as features. They are defined as:

$$\begin{aligned} Amplitude = \sqrt{a^2 + b^2}, \qquad Phase = \arctan \frac{b}{a} \end{aligned}$$
(6)
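As an illustrative sketch, the amplitude and phase features of Eq. 6 can be computed from a windowed trace with NumPy; keeping only the first k coefficients is our own assumption, since the paper does not state how many are retained:

```python
import numpy as np

def dft_features(window, k=4):
    """Amplitude and phase of the first k DFT coefficients (Eq. 6).
    k is an illustrative choice, not a value from the paper."""
    z = np.fft.rfft(np.asarray(window, dtype=float))[:k]
    amplitude = np.abs(z)   # sqrt(a^2 + b^2)
    phase = np.angle(z)     # atan2(b, a), a robust form of arctan(b/a)
    return np.concatenate([amplitude, phase])
```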

Sliding window: For each frame size trace \(D_i = (t_1, t_2, t_3, \dots ,t_n)\), a sliding window can capture more detailed information. We do not fix the size of the windows; instead, the number of windows m is fixed. The impact of the parameter m is explored in the empirical studies. Given a frame size trace \(D_i = (t_1, t_2, t_3, \dots ,t_n)\), the result is \(D_i = (d_1,d_2,\dots ,d_j,\dots ,d_m)\), where each sub-sequence \(d_j\) is defined as:

$$\begin{aligned} d_j = {\left\{ \begin{array}{ll} \{ t_{[(j-1)\frac{n}{m}]},\dots , t_{j\frac{n}{m}}\}, \qquad 1 \le j \le (m-1)\\ \{ t_{[(j-1)\frac{n}{m}]},\dots , t_n \}, \qquad j = m \end{array}\right. } \end{aligned}$$
(7)
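A minimal sketch of the windowed statistical features of Eqs. 1-5, using np.array_split as an approximation of the index scheme in Eq. 7 (the feature ordering is our own choice, and scipy's moment conventions differ slightly from Eqs. 4-5):

```python
import numpy as np
from scipy.stats import kurtosis, skew

def windowed_stats(trace, m):
    """Split a trace into m near-equal windows and compute the mean,
    variance, skewness and kurtosis (Eqs. 2-5) of each window and of
    its rate change (Eq. 1)."""
    feats = []
    for d in np.array_split(np.asarray(trace, dtype=float), m):
        a = np.diff(d)  # rate change, Eq. (1)
        for x in (d, a):
            # kurtosis() returns excess kurtosis, matching the "-3" in Eq. (5)
            feats += [x.mean(), x.var(), skew(x), kurtosis(x)]
    return np.array(feats)
```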

Since the length of a scene varies within each video, a fixed sliding-window length is not reasonable. Ideally, a single window corresponds to a single scene. Therefore, we first apply the PELT algorithm [14] to detect change points, which yields more reasonable window lengths.

Change point detection: for a time series \(y_{1:n}\), assume we obtain m change points with positions \(\tau = \{\tau _{1}, \tau _{2}, \dots ,\tau _{m}\}\) in \(y_{1:n}\). Let \(\tau _{0} = 0\) and \(\tau _{m+1} = n\). A commonly used method to identify multiple change points is to minimize

$$\begin{aligned} \sum _{i=1}^{m+1}[\mathcal {C}(y_{(\tau _{i-1}+1):\tau _{i}})] + \beta f(m) \end{aligned}$$
(8)

Here \(\mathcal {C}\) is a cost function for a segment and \(\beta f(m)\) is a penalty that guards against overfitting.

We present a weighted fusion cost function that combines the cost function \(\mathcal {C}_{1}\) for an exponential distribution with changing mean and the cost function \(\mathcal {C}_{2}\) for a normal distribution with variable variance. More formally, for a segment of \(y\) between \(\tau _{i-1} + 1\) and \(\tau _{i}\), of length \( n = \tau _{i} - \tau _{i-1}\), we have

$$\begin{aligned} \mathcal {C}_{1} = -n(\log (\sum _{j = 1}^{n} y_{j}) ) \end{aligned}$$
(9)
$$\begin{aligned} \mathcal {C}_{2} = n \log \sigma ^{2} + \frac{1}{\sigma ^{2}} \sum _{j = 1}^{n}(y_{j}-\mu )^{2} \end{aligned}$$
(10)

Here,

$$\begin{aligned} \sigma ^2 = \frac{1}{n} \sum _{j=1}^{n}(y_{j} - \bar{y})^{2} \end{aligned}$$
(11)
$$\begin{aligned} \mu = \frac{1}{n} \sum _{j = 1}^{n} y_{j} \end{aligned}$$
(12)

Figure 2 shows a case study comparing the methods using \(\mathcal {C}_{1}\), \(\mathcal {C}_{2}\), and the combined cost function. As shown in Fig. 2, the change points obtained with cost function \(\mathcal {C}_2\) are relatively dense, while those obtained with cost function \(\mathcal {C}_1\) are relatively sparse. Hence, we use the fused cost function \(\mathcal {C} = \theta _{1} \mathcal {C}_1 + \theta _{2} \mathcal {C}_{2}\). In our study, we set the parameters to \(\theta _{1} = 0.7, \theta _{2} = 0.3\); we also carry out empirical studies to explore the impact of these parameters.

Another problem is that some neighbouring change points are too close together to support reasonable segments. For the times \(T_{\tau _{i}}\) and \(T_{\tau _{i-1}}\) of two successive change points, if \(T_{\tau _{i}} - T_{\tau _{i-1}} < 0.5\) s, we ignore \(\tau _{i}\).
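A sketch of this detection step using the ruptures library is given below; the penalty value, the minimum segment size, and the frame rate used to convert index gaps into seconds are all illustrative assumptions, not values from the paper:

```python
import numpy as np
import ruptures as rpt
from ruptures.base import BaseCost

class FusedCost(BaseCost):
    """Weighted fusion of the exponential-mean cost C1 (Eq. 9) and the
    normal variable-variance cost C2 (Eq. 10)."""
    model = "fused"
    min_size = 2

    def __init__(self, theta1=0.7, theta2=0.3):
        self.theta1, self.theta2 = theta1, theta2

    def fit(self, signal):
        self.signal = np.asarray(signal, dtype=float).ravel()
        return self

    def error(self, start, end):
        seg = self.signal[start:end]
        n = len(seg)
        c1 = -n * np.log(seg.sum())  # Eq. (9)
        var = seg.var() + 1e-8       # guard against log(0)
        c2 = n * np.log(var) + ((seg - seg.mean()) ** 2).sum() / var  # Eq. (10)
        return self.theta1 * c1 + self.theta2 * c2

def detect_change_points(trace, fps=24.0, min_gap_s=0.5, pen=10.0):
    """PELT with the fused cost, then drop any change point closer than
    min_gap_s to its predecessor. Treating consecutive B frames as
    uniformly spaced at fps is only an approximation."""
    algo = rpt.Pelt(custom_cost=FusedCost(), min_size=5)
    algo.fit(np.asarray(trace, dtype=float))
    kept, last = [], -np.inf
    for cp in algo.predict(pen=pen):
        if (cp - last) / fps >= min_gap_s:
            kept.append(cp)
            last = cp
    return kept
```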

Fig. 2. The change points obtained by different cost functions. The first and second rows show the results for the normal and exponential distributions, respectively; the last row shows the result of fusing the two functions.

5 Evaluation

5.1 Data Collection

Table 3. Collection of videos

The collection of videos is shown in Table 3. We first convert the videos with the parameters in Table 1 using Axiom [19]. Each movie is then split into multiple segments with a fixed length of 120 s, and B frame traces are extracted from each segment. In total, we obtained 2144 samples, including 379 action scene samples and 1765 other scene samples. Since the data set is clearly imbalanced, we over-sample the minority class with the Synthetic Minority Oversampling Technique (SMOTE) [5]. We use cross validation to evaluate the performance of the extracted features, splitting the samples randomly into 5 folds.
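A sketch of this evaluation setup with imbalanced-learn and scikit-learn is shown below; the file names are hypothetical placeholders, and placing SMOTE inside the pipeline ensures it is fit only on the training folds:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# X: feature matrix, y: binary labels (action vs. other), assumed to be
# produced by the feature extraction of Sect. 4 (file names hypothetical).
X, y = np.load("features.npy"), np.load("labels.npy")

# SMOTE is re-fit per training fold inside the pipeline, so synthetic
# samples never leak into the validation folds.
pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("clf", RandomForestClassifier(n_estimators=600, random_state=0)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1")
print(scores.mean())
```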

5.2 Experimental Setup

We use six models to evaluate the effectiveness of the features. First, we test the features extracted with fixed windows, as shown in Table 4.

This lets us evaluate the effect of different numbers of windows. Second, for the features extracted by the change point method, we choose the best model from the previous step to evaluate their performance. We arrange the features between change points \(\tau _{i}\) and \(\tau _{i+1}\) as \((time \ span, mean, variance, skewness, kurtosis)\) and define them as the features of an interval, called span features. Different combinations of features are then tested according to Table 5. We also use Wilcoxon's signed-rank test to assess the differences between feature sets.

Table 4. Combinations of features
Table 5. Different combinations of features

5.3 Performance Metrics

We use G-mean, accuracy, and F1-score to evaluate the performance of our features.

G-mean. We define g-mean as

$$\begin{aligned} G-mean = \sqrt{\frac{TP}{TP+FN} \times \frac{TN}{TN+FP}} \end{aligned}$$
(13)

where TP and FP denote the true positives and false positives, respectively, and TN and FN denote the true negatives and false negatives, respectively.

Accuracy. We define accuracy as

$$\begin{aligned} accuracy = \frac{TP + TN}{TP + TN + FN + FP} \end{aligned}$$
(14)

F1-score. We define F1-score as

$$\begin{aligned} f1-score = \frac{2TP}{2TP+FP+FN} \end{aligned}$$
(15)
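These metrics can be sketched with scikit-learn as follows; g_mean is implemented directly from Eq. 13, while accuracy and F1-score map onto library functions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

def g_mean(y_true, y_pred):
    """G-mean of Eq. (13): geometric mean of the recall on the
    positive class and the recall on the negative class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return np.sqrt(tp / (tp + fn) * tn / (tn + fp))

# accuracy (Eq. 14) and F1-score (Eq. 15) correspond directly to
# accuracy_score(y_true, y_pred) and f1_score(y_true, y_pred).
```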
Fig. 3. The impact of different features on different models. (a) Performance of the features on six models with the number of windows \(m = 1\); (b) performance of the features obtained with different numbers of windows on the random forest; (c) performance of Feature V1 on six models; (d) performance of combinations of features from the two methods. Here, \(0.7\_0.3\_0.5\_V3\) means \(\theta _1 = 0.7\) and \(\theta _2 = 0.3\) with a span of 0.5 s, so some change points are ignored as mentioned above.

5.4 Experimental Results

In this part, we carry out experiments to evaluate the effectiveness of the features. As shown in Fig. 3(a), when the number of windows m is 1, random forest performs best. The SVM with an RBF kernel outperforms the one with a linear kernel, indicating that our data requires a nonlinear method to fit. The random forest with 600 trees has the strongest fitting capacity.

As shown in Fig. 3(b), we experiment with features obtained using different numbers of windows under the random forest model. As the number of windows m increases, the g-mean and f1-score decrease. The best g-mean score, that of feature \(1windows\_DFT\), is 5.41% higher than that of feature \(9windows\_DFT\). We conjecture that fewer windows imply larger window sizes, which capture more macro-level fluctuation characteristics. Another observation is that there is no significant difference between features with and without the Fourier transform. In theory, we believe that selecting appropriate frequency domain characteristics should better reflect the fluctuation characteristics of the data; the reason for this result may be that our method of selecting frequency domain features is not suitable.

As shown in Fig. 3(c) and (d), for the features extracted by the change point method, random forest again performs best, with decision trees and Gaussian Naive Bayes also performing well. In the random forest experiment, there are no obvious differences between the three feature combinations based on the change point method. Moreover, compared with the features extracted with only one window, the change point method does not show obvious advantages. In theory, we believe the change point method is more reasonable, and the problem may lie in our treatment of the features. Since the change points of each sample differ, the number of extracted features also differs. To obtain training samples of equal length, we zero-pad the features, which causes substantial information redundancy, so the classifier does not achieve good results.

We conduct experiments on the change point data obtained with different combinations of \(\theta _1\) and \(\theta _2\). As shown in Fig. 4 and Fig. 5, the best result is achieved with \(\theta _1 = 0.4\) and \(\theta _2 = 0.6\). In terms of f1-score and accuracy, feature \(0.4\_0.6\_V2\) scores slightly higher than feature 1window.

Fig. 4. Results for various selections of \(\theta _1\) and \(\theta _2\). For example, \(0.1\_0.9\_V2\) means \(\theta _1 = 0.1\) and \(\theta _2 = 0.9\). Feature V2 is used for testing.

Fig. 5. Comparison between Feature 1window and Feature \(0.4\_0.6\_V2\)

In addition, we compared the differences between feature combinations using Wilcoxon's signed-rank test. First, we test the best features of the two approaches against each other; second, we test the different combinations of features within each approach. The results are shown in Table 6.

Table 6. The results of Wilcoxon's signed-rank test

The results of the hypothesis tests show that there is no significant difference in performance between most of the feature sets in our experiments. However, for the best model in our experiments, random forest, the differences between features remain evident.

6 Conclusion and Future Research

In this paper, we explored a novel method to classify videos. The VBR trace exhibits semantic features that are useful for classification. Further, we explored more effective features obtained by segmenting the data with the change point method. Overall, our features are effective for the binary classification of videos, although the hypothesis test results did not fully match our expectations. Several issues remain open: the zero-padding of change point features needs a better solution; a faster and more effective change point algorithm could be used; and the fusion of the change point cost functions and their weights leaves considerable room for tuning. Finally, in the future, we plan to attempt multi-class classification tasks.