1 Introduction

The High Efficiency Video Coding (HEVC) standard is the successor of the well-known H.264/MPEG-4 AVC standard (MPEG-4 Part 10). It is designed to target a wide range of applications, especially those dealing with Ultra High Definition (UHD) content. The HEVC project was formally initiated when a joint Call for Proposals was issued by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) in January 2010 [30]. The prime focus was directed towards significantly improving the compression performance relative to existing standards.

Completed in January 2013, the HEVC standard provides roughly twice the compression capability of its predecessor. With appropriate encoder settings, a bit-rate reduction of around 50% is achievable with negligible loss in video quality [21]. However, this coding efficiency comes at the cost of increased encoding computational complexity, which can be up to 40% higher than that of H.264/AVC [34].

Among other factors, both the enhanced compression efficiency and the increased encoding computational complexity can be attributed to HEVC’s flexible partitioning structures. HEVC uses quad-tree Coding Tree Units (CTUs), Prediction Units (PUs) and Residual Quad-Trees (RQTs) rather than macroblocks (MBs). To select the optimal partitioning structure, an exhaustive rate-distortion optimization (RDO) process takes place, which is the main source of the increased computational complexity. Most of the encoding time is spent recursively repeating the RDO process at each Coding Unit (CU) depth level for each structure, where every combination of encoding structures is tested and the one that minimizes the rate-distortion (RD) cost is chosen [31].
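To make the source of this complexity concrete, the following is a minimal, illustrative Python sketch of the exhaustive quad-tree search; rd_cost and quad_split are hypothetical stand-ins for the encoder's actual RD evaluation and partitioning routines and are not part of the HM software.

```python
# Illustrative sketch of the exhaustive quad-tree RDO search (not the HM implementation).
# A CU is represented here as a dict with a per-depth "cost" list and optional "children".

MAX_DEPTH = 3  # depths 0..3 correspond to 64x64, 32x32, 16x16 and 8x8 CUs

def rd_cost(cu, depth):
    # Hypothetical stand-in: the real encoder evaluates distortion + lambda * rate here.
    return cu["cost"][depth]

def quad_split(cu):
    # Hypothetical stand-in: the real encoder partitions the CU into four sub-CUs.
    return cu.get("children", [])

def best_rd_cost(cu, depth=0):
    """Minimum RD cost over all split / no-split combinations rooted at this CU."""
    no_split = rd_cost(cu, depth)
    children = quad_split(cu)
    if depth == MAX_DEPTH or not children:
        return no_split
    split = sum(best_rd_cost(sub, depth + 1) for sub in children)
    return min(no_split, split)
```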

Several early termination algorithms for optimizing the HEVC encoding process can be found in the literature; their aim is to reduce the computational complexity while minimizing any performance degradation. Among many, some approaches utilize the textural or structural characteristics of a given CU [1, 7, 9, 12, 22, 24, 28, 35, 38, 40, 41], while others use machine learning techniques [4, 6, 11, 14,15,16,17, 29, 36, 39]. The optimizations are not limited to HEVC inter-coding, as some works also consider enhancing intra-coding [16, 22, 24, 28]. In this field of research, the use of machine learning techniques to minimize RD efficiency losses remains limited, and most proposed algorithms do not achieve substantial reductions in computational complexity without introducing significant losses in video quality.

In this work, we use a sequence-dependent approach to model the relationship between CU feature variables and split decisions. The feature variables are extracted from both the present CU and its surrounding spatial and temporal CUs. Feature extraction and modeling are performed at the three CU coding depths of 64 × 64, 32 × 32 and 16 × 16, and dimensionality reduction techniques are applied at each depth to reduce the number of extracted features. We use stepwise regression, random forest-based feature selection and principal component analysis (PCA) for dimensionality reduction. Moreover, we utilize polynomial networks and random forests for classification.

This paper is organized as follows. Section 2 presents a review of algorithms proposed in the literature to reduce the encoding computational complexity. The proposed CU split prediction system is overviewed in Section 3. The feature extraction process and dimensionality reduction are discussed in detail in Section 4, in addition to the classification tools and arrangements used in this work. The experimental setup and results are presented in Section 5. Lastly, Section 6 concludes the paper.

2 Related work

As mentioned earlier, HEVC introduced significant coding efficiency improvements at the cost of increased computational complexity. Therefore, considerable research has been conducted to limit this computational complexity whilst minimizing the adverse effect on compression efficiency. The work reported in [1, 7, 9, 12, 22, 24, 28, 35, 38, 40, 41] investigated the textural or structural characteristics of CUs at a given CU depth to optimize the HEVC encoding procedure. The authors of [41] proposed an inter-prediction optimization scheme in which the CTU structure is analyzed in reverse order. Alternatively, a subjective-driven complexity control approach is presented in [7], which examines the relationship between visual distortion and the maximum depth of all largest CUs. Another complexity control algorithm is proposed in [12], where an early termination condition is defined at each CU depth based on the content of the video sequence being encoded, the configuration files and the target complexity.

In [40], the authors present a hierarchical structure-based fast mode decision scheme. A fast CU decision algorithm is presented in [38], where the coded block flag and RD costs are checked to determine whether intra- and inter-PUs can be skipped. In [35], a two-layered motion estimation based fast CU decision process is proposed, which uses the sum of absolute differences (SAD) estimation to extract the SAD costs for a CU and its sub-CUs. The work in [22] speeds up the HEVC intra-coding process mainly by using the encoded CU depths and RD costs of the co-located CTU to predict both the current CU’s depth search range and the RD cost for CU splitting termination. Local texture descriptors or image characteristics were used in [9, 24, 28] to allow faster CU size selection. A spatiotemporal CU encoding technique is explored in [1], where sample-adaptive-offset (SAO) parameters were utilized to predict the textural complexity of the CU being encoded. The work in [5] introduces an interesting approach for predicting CU splitting based on deep learning using a reinforcement learning algorithm. The algorithm is also capable of predicting the reduction in rate-distortion cost. The solution is applied to the all-intra configuration and results in a BD-rate loss of 2.5%. In [8], the authors proposed an offline training algorithm based on random forests to predict early termination of CU splitting. Neighboring CU sizes are also used in determining the depth of the current CU. The algorithm is applied to the all-intra mode and reported a complexity reduction of 48.3% with a BD-rate increase of 0.8%.

Other approaches utilized the Bayesian decision rule and other machine learning techniques to improve the time complexity of an HEVC encoder. For instance, the work in [15, 16] uses Bayes’ rule to optimize PU and CU skip algorithms, respectively. In [14], the authors present a joint online and offline learning-based fast CU partitioning method that uses the Bayesian decision rule to optimize the CU partitioning process. Bayesian decision theory is also utilized in [29], along with the correlation between the variances of the residual coefficients and the transform size, to enhance the PU size decision process. Alternatively, a fast CU splitting and pruning algorithm is proposed in [4], which is applied at each CU depth according to a Bayes decision rule based on low-complexity RD costs and full RD costs. A fast CU size and PU mode prediction algorithm that uses the k-means clustering method is introduced in [17].

On the other hand, [11] presents an early mode decision algorithm based on the Neyman-Pearson approach. In [36], a fast pyramid motion divergence (PMD)-based CU selection algorithm is proposed, where a k-nearest neighbors (k-NN) like method is used to determine the optimal CU size. The work in [39] used a machine learning-based fast coding unit (CU) depth decision method, where the quad-tree CU depth levels are modeled as a three-level hierarchical binary decision problem. The work proposed in [6] implemented early termination techniques on CUs, PUs, and TUs using a set of decision trees grown with the aid of the Waikato Environment for Knowledge Analysis (WEKA) [10], an open-source data mining tool.

3 System overview

In the proposed prediction system, the first 10% of frames of a video sequence are used for training. Hence, modeling and prediction are specific to one video, as opposed to training the classification system using many video sequences. The former training approach is known as “video-dependent” modeling, while the latter is known as “video-independent” modeling. The problem with video-independent modeling is that it follows a one-size-fits-all approach in which there is an implicit assumption that the videos used for training are suitable for predicting the CU split decisions of all other videos. Video-dependent modeling, on the other hand, ensures that the prediction model is best suited to predicting the CU split decisions of the remaining content of the video being encoded.

The concept of video-dependent modeling was previously introduced by the author in [23, 25, 27]. The first 10% of video frames were used for training and the prediction model was then used throughout the sequence in a video transcoding context. If needed, the training can be repeated periodically or when scene cuts are detected. Reducing the percentage of training frames might result in a less accurate classification model, as the number of feature vectors in the training set is reduced. Increasing the percentage, on the other hand, might result in a more accurate classification model. However, the point at which the model starts predicting CU split decisions would then be delayed, which decreases the overall gain in terms of computational complexity.
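As a simple illustration of this trade-off, the sketch below (an assumption-laden Python fragment, not the encoder's actual control code) partitions the frame indices of a sequence into training and prediction portions given a configurable training fraction.

```python
# Minimal sketch of the video-dependent split: the first fraction of frames of the
# sequence being encoded serves as the training portion; the rest is predicted.

def split_frames(num_frames, train_fraction=0.10):
    n_train = max(1, int(round(train_fraction * num_frames)))
    return list(range(n_train)), list(range(n_train, num_frames))

# Example: a 500-frame sequence gives 50 training frames and 450 prediction frames.
train_idx, predict_idx = split_frames(500)
```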

Figures 1 and 2 present the flowcharts of the proposed training system, where FV refers to Feature Vectors. Figure 1 illustrates the data collection process of the training system. The video encoder runs its normal compression operations for the first 10% of the video frames, during which, for each CU, features are extracted and recorded at the highest level, which is typically 64 × 64. The corresponding split decision is also recorded. If the encoder decides to split the CU, then the split decisions at the 32 × 32 and 16 × 16 levels are recursively computed, and the training system records the features and corresponding split decisions at the 32 × 32 and 16 × 16 CU levels as well. The details of the selected feature variables are discussed in the next section.
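The following Python fragment sketches this data collection pass under the assumption that the encoder exposes, for every CU, a feature extractor and the split flag chosen by its normal RDO; extract_features and split_decision are hypothetical callbacks, and cu.children denotes the four sub-CUs of a split.

```python
# Sketch of recording (feature vector, split flag) pairs during normal encoding of the
# first 10% of frames. Only the 64x64, 32x32 and 16x16 levels are modeled.

def collect(cu, extract_features, split_decision, train_sets, size=64):
    features = extract_features(cu)        # e.g. the 70 variables of Table 1
    split = split_decision(cu)             # flag produced by the unmodified RDO
    train_sets[size].append((features, split))
    if split and size > 16:                # recurse only where the encoder actually split
        for sub in cu.children:
            collect(sub, extract_features, split_decision, train_sets, size // 2)

# train_sets would be initialized as {64: [], 32: [], 16: []} before the pass.
```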

Fig. 1 Flowchart of data collection during the training phase

Fig. 2 Flowchart of CU split modeling in the training phase

The output of this data collection process is three data sets. Each data set contains feature vectors and the corresponding split flags for the 64 × 64, 32 × 32 and 16 × 16 CU levels. The second step in the training system is to map the feature vectors to the split decisions. This is illustrated in Fig. 2. The result of this step is three training models that can be used for the prediction of CU split decisions at the 64 × 64, 32 × 32 and 16 × 16 CU levels. Prior to model generation, there is an optional dimensionality reduction step. Again, this is applied at the three CU levels, and the fitted dimensionality reduction models are stored and used for reducing the dimensionality of the feature vectors during the testing phase, as shall be explained next. The system modeling and dimensionality reduction techniques used in this work are explained in the next section.
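A compact sketch of this per-level model generation step is given below, assuming the train_sets structure from the data collection sketch above and scikit-learn-style estimators; reducer_factory and classifier_factory are placeholders for the concrete techniques of Section 4 (pass None for reducer_factory to skip dimensionality reduction).

```python
from sklearn.pipeline import make_pipeline

# One model per CU level (64, 32, 16), with an optional dimensionality reduction step
# fitted only on the training data, as in Fig. 2.

def train_models(train_sets, reducer_factory, classifier_factory):
    models = {}
    for size, samples in train_sets.items():
        X = [f for f, _ in samples]        # feature vectors
        y = [s for _, s in samples]        # split flags
        steps = ([reducer_factory()] if reducer_factory else []) + [classifier_factory()]
        models[size] = make_pipeline(*steps).fit(X, y)
    return models
```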

Once the system is trained, the generated models are used to predict the split decisions of the remaining CUs of the underlying video sequence. This process is illustrated in Fig. 3. Basically, feature variables are extracted at the highest CU level, which in this work is 64 × 64. The corresponding training model is then used to predict the split flag. If the prediction is ‘no split,’ then early termination is applied. Otherwise, the second training model is applied to each of the 32 × 32 CU levels and four split flags are predicted. If any of these flags is predicted as ‘split,’ then the process is repeated at the 16 × 16 CU level using the third training model. At each level, feature vectors are calculated and, if required, reduced in dimensionality using the dimensionality reduction models computed during the training phase.
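The fragment below sketches this hierarchical application of the three models, assuming the models dictionary produced by the training sketch above (each entry already embeds any dimensionality reduction) and a hypothetical extract_features helper; it illustrates the control flow of Fig. 3, not the actual HM integration.

```python
# Apply the three trained models top-down with early termination, as in Fig. 3.

def predict_splits(ctu, models, extract_features):
    if not models[64].predict([extract_features(ctu)])[0]:
        return {"split64": False}                 # early termination: code the CTU whole
    result = {"split64": True, "split32": [], "split16": []}
    for cu32 in ctu.children:                     # the four 32x32 CUs
        s32 = bool(models[32].predict([extract_features(cu32)])[0])
        result["split32"].append(s32)
        if s32:
            flags16 = [bool(models[16].predict([extract_features(cu16)])[0])
                       for cu16 in cu32.children]  # the four 16x16 CUs
            result["split16"].append(flags16)
    return result
```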

Fig. 3 Flowchart of applying the training models to predict the CU split flags

4 System training

This section introduces the proposed feature extraction and dimensionality reduction process. It also reviews the machine learning techniques used.

4.1 Feature extraction and dimensionality reduction

In this work, feature extraction is applied at each of the three coding levels (i.e. 64 × 64, 32 × 32 and 16 × 16). Common to all levels are features extracted from surrounding CTUs. The surrounding CTUs are previously encoded and comprise the CTUs at the following locations relative to the current CU: left, top-left, top, top-right and the co-located CTU in the previous frame, giving a total of five surrounding CTUs. The complete list of extracted features and their descriptions are given in Table 1, where MVs refers to Motion Vectors. The first 15 features in Table 1 belong to the current CU, whereas the remaining 55 features belong to the surrounding CTUs, giving a total of 70 features.
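As a rough illustration, one 70-dimensional feature vector can be assembled as below; cu_features and ctu_features are hypothetical helpers standing in for the per-CU measurements listed in Table 1 (15 for the current CU and, assuming an even split, 11 per surrounding CTU).

```python
import numpy as np

# Assemble a single feature vector: current CU features followed by those of the five
# previously coded neighbours (left, top-left, top, top-right, temporally co-located).

def build_feature_vector(cu, neighbours, cu_features, ctu_features):
    """cu_features(cu) -> 15 values; ctu_features(n) -> values per neighbour (55 in total)."""
    own = list(cu_features(cu))
    context = [v for n in neighbours for v in ctu_features(n)]
    return np.asarray(own + context)   # 15 + 55 = 70 features
```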

Table 1 Feature variables representing CUs

As illustrated in Fig. 2 above, the dimensionality of these features can be reduced prior to generating the training model. In this work, we generate experimental results with and without dimensionality reduction. We propose the use of the following dimensionality reduction techniques: stepwise regression, principal component analysis (PCA) and reduction based on random forests. In the following, we briefly summarize the use of each relative to the proposed solution. It is important to mention that all dimensionality reduction techniques are applied to the training data set, as illustrated in Fig. 2. The generated model is then applied to the test data set.

Stepwise regression is a feature selection algorithm; however, it can also be used as a dimensionality reduction technique, as reported in [26]. In this work, we treat the feature vectors of CUs as predictors and the split decisions as response variables. As such, the problem can be formalized in a regression context. The idea of stepwise regression is to start with one feature variable and compute its correlation with the split decision. Then, another feature variable is added and the correlation is computed again. The significance of adding another feature variable is assessed by examining the p-value at a 0.05 level of significance. If the added feature variable is found significant, it is retained; otherwise, it is removed from the list of variables. Likewise, once a variable is retained, the stepwise regression algorithm proceeds by revisiting the previously retained feature variables and reassessing their significance, taking into account that a new variable has been added. The algorithm terminates when there are no further feature variables to add or eliminate. A full description of the algorithm can be found in [20].
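For concreteness, a compact forward/backward stepwise selection in this spirit can be written as follows; this is an illustrative re-implementation using ordinary least squares p-values (via statsmodels), not necessarily the exact routine of [20] used in the paper.

```python
import numpy as np
import statsmodels.api as sm

def stepwise_select(X, y, alpha=0.05):
    """Return indices of retained feature columns of X for response y (split flags)."""
    selected = []
    changed = True
    while changed:
        changed = False
        # Forward step: add the most significant remaining variable, if any.
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        pvals = {}
        for j in remaining:
            fit = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
            pvals[j] = fit.pvalues[-1]           # p-value of the newly added variable
        if pvals:
            best = min(pvals, key=pvals.get)
            if pvals[best] < alpha:
                selected.append(best)
                changed = True
        # Backward step: drop a previously retained variable that became insignificant.
        if selected:
            fit = sm.OLS(y, sm.add_constant(X[:, selected])).fit()
            p = fit.pvalues[1:]                  # skip the intercept
            worst = int(np.argmax(p))
            if p[worst] > alpha:
                del selected[worst]
                changed = True
    return selected
```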

Once applied to the training data set at the 64 × 64, 32 × 32 and 16 × 16 CU levels, the result of the stepwise regression is simply three sets of indices of the retained feature variables, one set for each CU coding depth. These indices can be used to reduce the dimensionality of the feature vectors during the testing phase. Since we are using a video-dependent approach to learning in this work, the number of retained feature variables varies from one video sequence to another. Full information about the experimental setup is given in the experimental results section; nonetheless, for completeness, we briefly discuss the results of applying the stepwise regression algorithm here. The number of retained variables for each video sequence is given in Table 2. The table lists the average number of retained CU variables over QP values of {22, 27, 32 and 37}. Since three models are generated, the table lists the retained variables at the 64 × 64, 32 × 32 and 16 × 16 CU coding levels. A full example showing the specific names of the retained variables for the RaceHorses sequence is shown in Table 3.

Table 2 Retained variables using stepwise regression
Table 3 Stepwise regression, example retained variables for 32 × 32 depth level with QP = 32

In this work, we also experiment with dimensionality reduction using random forests. In this approach, we grow a large set of trees against the CU split decision. Each tree is trained on a small number of feature variables. The usage statistics of each feature variable can then be used to find an informative subset of features. More specifically, if a feature variable is repeatedly selected as the best split, it is a good candidate to retain. More information about this algorithm can be found in [18].

Here, a random forest of 100 trees is grown, where the maximum number of decision splits or branch nodes is set to the size of the initial set of 70 features. The training dataset is sampled for each decision tree with replacement, and the feature variables selected at random for each decision split are chosen without replacement within the same decision tree. The importance of each of these features in predicting the correct classification of a test instance from the out-of-bag data is computed and used to select the features whose raw importance scores make up 80% of the total importance score. The out-of-bag data is the set of instances that were left out during the training process of a given tree in the random forest.
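A scikit-learn-based sketch of this selection step is shown below; note that sklearn's feature_importances_ attribute is impurity-based and is used here as a convenient stand-in for the out-of-bag importance estimate described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_select(X, y, keep_fraction=0.80, n_trees=100, seed=0):
    """Return indices of the features whose importances cover keep_fraction of the total."""
    forest = RandomForestClassifier(n_estimators=n_trees, bootstrap=True,
                                    oob_score=True, random_state=seed)
    forest.fit(X, y)
    importance = forest.feature_importances_
    order = np.argsort(importance)[::-1]               # most important features first
    cumulative = np.cumsum(importance[order])
    n_keep = int(np.searchsorted(cumulative, keep_fraction) + 1)
    return np.sort(order[:n_keep])                     # indices of retained features
```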

The number of retained variables, using random forests, for each video sequence is given in Table 4. Similar to Table 2 above, the table lists the average number of retained CU variables with QP values of {22, 27, 32 and 37}. Since 3 models are generated, the table lists the retained variables at 64 × 64, 32 × 32 and 16 × 16 CU coding levels. A full example showing specific names of retained variables for the RaceHorses sequence is shown in Table 5.

Table 4 Retained variables using feature importance with random forests
Table 5 Random forest feature importance, example retained variables for 32 × 32 depth level with QP = 32

Lastly, we also experiment with dimensionality reduction using PCA. In this approach, an orthogonal transformation is used to transfer the data from the feature domain to the principal component domain. The first principal component of the transformed data accounts for the highest variability in the feature data. One drawback of PCA is that it results in a reduced data set that cannot be directly interpreted, which is not the case for the other two dimensionality reduction techniques used in this paper. The number of principal components retained depends on the chosen Proportion of Variance (PoV) explained, which is 90% in this work. Again, PCA is applied to the training dataset at the 64 × 64, 32 × 32 and 16 × 16 CU levels. The resulting principal components are then stored and used for reducing the test data set.
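A minimal scikit-learn sketch of this step is shown below; the standardization step is an illustrative addition (PCA is scale-sensitive) and is not necessarily part of the paper's pipeline.

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Fit PCA on the training feature vectors of one CU level, keeping enough principal
# components to explain 90% of the variance; the fitted objects are then reused to
# transform the test-phase feature vectors.

def fit_pca(X_train, pov=0.90):
    scaler = StandardScaler().fit(X_train)
    pca = PCA(n_components=pov).fit(scaler.transform(X_train))
    return scaler, pca

def apply_pca(scaler, pca, X):
    return pca.transform(scaler.transform(X))
```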

The number of retained variables, using PCA, for each video sequence is given in Table 6. Similar to Tables 2 and 4 above, the table lists the average number of retained CU variables with QP values of {22, 27, 32 and 37}. Since 3 models are generated, the table lists the retained variables at 64 × 64, 32 × 32 and 16 × 16 CU coding levels. Unlike stepwise regression and random forest variable selection, PCA results in a reduced data set that cannot be directly interpreted; hence, there are no specific retained variable names to list.

Table 6 Retained variables using PCA

4.2 Classification methods

In this work, we model the relationship between the CU features and split decisions using two classification tools, namely polynomial networks [33] and random forests [3].

For the polynomial networks, we experiment with a second-order polynomial classifier. In the case of random forests, 100 trees are grown, where the maximal number of branch nodes is the square root of the number of retained feature variables. This is determined based on the out-of-bag estimates of the features’ importance in the tree ensemble. Based on the retained features, the training dataset is sampled for each decision tree with replacement. The variables selected at random for each decision split are chosen without replacement within the same decision tree. As the purpose of growing the trees is classification, only one observation or class label is seen per tree leaf. The arrangements in which we combine the classification tools with the dimensionality reduction techniques are presented in Table 7.
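The two classifiers can be approximated with off-the-shelf components as sketched below; the second-order polynomial expansion followed by a linear decision layer is only a stand-in for the polynomial network of [33], and the forest parameters mirror the description above rather than reproduce it exactly.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def make_polynomial_classifier():
    # Second-order polynomial expansion of the (reduced) features + linear decision layer.
    return make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                         LogisticRegression(max_iter=1000))

def make_random_forest():
    # 100 trees; each split considers a random subset of sqrt(#features) variables,
    # and each tree is trained on a bootstrap sample of the training set.
    return RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                  bootstrap=True, oob_score=True)
```

Either factory could be passed as the classifier_factory argument of the train_models sketch in Section 3.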

Table 7 Arrangement of classification solutions

5 Experimental results

We assess the performance and efficiency of the proposed solutions by implementing them in the HM reference software version 13.0 [13]. All video sequences are encoded with QPs of {22, 27, 32, and 37}. The main profile and the standard random-access temporal configuration are used. The same motion estimation parameters are used in both the original and proposed video coders. For a fair comparison with [14], we use the HEVC video test sequences. The sequence resolutions are Class A (2560 × 1600), Class B (1920 × 1080), Class C (832 × 480), and Class D (416 × 240). A total of 17 video sequences are used, as reported in Table 8. We ran the experiments on a PC with an Intel Core i7-3740QM 2.7-GHz CPU and 16 GB of DDR3 RAM. Compression efficiency is quantified in terms of BD-rate and BD-PSNR. The compression times of the proposed solutions are also computed and compared with the corresponding times obtained from running the unmodified HEVC encoder. Furthermore, the classification accuracy of the proposed classification systems is presented.

We start by reporting the classification accuracy of the CU split prediction. Table 9 presents the overall classification accuracies of the 4 proposed solutions. These results are the average for all video sequences coded with QPs of {22, 27, 32 and 37}. The table reports the classification results for the 3 models generated according to the CU coding depth. The results indicate that the average classification accuracy improves slightly as the CU coding depth increases. The results also show that the classification solutions using random forests are the most accurate. In Table 10, the average classification results for individual video sequences using the “R.F. Select & R.F.” solution are shown. The resolution of the video does not seem to affect the classification accuracy. Moreover, the accuracies do not seem to vary much with the underlying video sequence either.

Table 8 List of video sequences used
Table 9 Overall classification average of CU split prediction
Table 10 Classification accuracy of CU splits for the “R.F. Select & R.F.” solution

In the following experiments, we report the coding efficiency of the proposed solutions using a number of metrics, namely BD-rate, BD-PSNR [13] and the Computational Complexity Reduction (CCR). The CCR is computed as

$$ \mathrm{CCR}=\left({Time}_{ref.}-{Time}_{proposedSolution}\right)/{Time}_{ref.} $$
(1)

where Time_ref. is the time needed for regular encoding and Time_proposedSolution is the time needed to encode a sequence using the proposed fast encoder. In Table 11, the coding results of the 4 proposed solutions are listed. As mentioned earlier, all results are obtained with QPs of {22, 27, 32 and 37}. The best coding results in terms of BD-rate and BD-PSNR are achieved by the “R.F. Select & R.F.” and “R.F.” solutions. The reported results also indicate that, in general, the coding efficiency improves as the resolution of the video increases. The effect of the proposed solution on the video quality in terms of BD-PSNR ranges from −0.05 to −0.02 dB. This gives a clear indication that the effect of the proposed solution on the visual quality is minimal. Nonetheless, we show example images of regular encoding and encoding using the proposed solution in Fig. 4. As shown in the figure, there are no subjective artifacts as a result of the proposed solution.

Table 11 Coding efficiency of proposed solutions
Fig. 4 Example images of regular encoding (a and c) and encoding with the proposed R.F. Select & R.F. solution (b and d). Frame 20 of RaceHorses and PartyScene with a QP of 27

We report the time complexity in terms of CCR% in Table 12, where sequences are referred to by the IDs listed in Table 8. We present the CCR% for two cases: the first is the time savings without taking the training or model generation time into account, while the second is the CCR% with the training time taken into account. Table 12 shows that, without giving consideration to the training time, all solutions provide similar complexity reductions. However, because some of the training solutions are more computationally demanding, when the training time is taken into account, the range of CCR% widens from (37.5% – 39.1%) to (31.8% – 38.9%). Clearly, the most demanding training solution is the one that uses random forests, followed by random forests with dimensionality reduction. In this work, no attempt was made to reduce the training time; however, one solution would be to reduce the number of training feature vectors by means of sampling, therefore reducing the model generation time. The reported results also indicate that, in general, the CCR% improves as the resolution of the video increases. For comparison with existing work, we refer to the work reported in [14, 38]. These results contain the BD-rate and time savings only. However, the time savings are computed as

$$ \Delta \mathrm{Time}=\left({Time}_{ref.}-{Time}_{proposedSolution}\right)/{Time}_{proposedSolution} $$
(2)
Table 12 Complexity reduction of proposed solutions using CCR (%)

This is different from the CCR reported in Table 12 above, where the time savings are calculated by dividing by Time_ref. instead of Time_proposedSolution. For a fair comparison, in Table 13 we calculated the time savings of the proposed solution accordingly and compare the results of our best solution against existing work.
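The two time-saving conventions are contrasted in the short sketch below (sign conventions aside); the example numbers are illustrative only.

```python
# CCR (Eq. 1) normalizes by the reference encoder's time, while the Delta-Time
# convention of Eq. (2) normalizes by the proposed encoder's time, so for the same
# measurement Delta-Time always reports a larger magnitude.

def ccr(time_ref, time_proposed):
    return (time_ref - time_proposed) / time_ref           # Eq. (1)

def delta_time(time_ref, time_proposed):
    return (time_ref - time_proposed) / time_proposed      # Eq. (2)

# Illustrative example: a CCR of 39.2% corresponds to a Delta-Time of roughly 64.5%.
print(ccr(100.0, 60.8), delta_time(100.0, 60.8))
```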

Table 13 Comparison with existing work [14, 38]

It was not clear to the authors whether the reviewed work considered the training time when reporting ∆Time%; hence, in Table 13, we report our results both with and without taking the training time into account. In the table, ∆1Time% refers to the time savings without the training time and ∆2Time% refers to the time savings taking the training time into consideration.

As previously mentioned, the results of the proposed solution seem to improve with increasing sequence resolution. This is in contrast to the results reported in [14], where the BD-rate seems to improve with decreasing sequence resolution. Moreover, the time reduction in [14] does not seem to be affected by the resolution of the video sequences.

All in all, the results in Table 13 indicate that the proposed “R.F. Select & R.F.” solution has a clear advantage in terms of BD-rate and time savings. With reference to the average reported BD-rate of existing work (i.e. 0.765), the proposed solution provides an enhancement of 27.1%. Additionally, this quality enhancement comes with a further reduction in computational time. More specifically, with reference to the average reported time savings (i.e. -45.5%), the proposed solution provides an enhancement of 57% and 34% for ∆1Time% and ∆2Time%, respectively.

Lastly, the results in Table 14 present a comparison with the recent work reported in [36], which uses CCR% for measuring the time complexity reduction. Table 14 lists the results for 15 video sequences that are used in both [36] and the proposed solution.

Table 14 Comparison with existing work [32, 36]

The results in the table show that the BD-rate of the proposed solution is 0.56 on average, whereas those of the solutions in [32, 36] are 2.2 and 1.36, respectively. This noticeable enhancement in BD-rate comes with a slight advantage in computational complexity reduction as well, where the CCR of the reviewed work in [36] is 37.5% and that of the proposed work is 39.2%. The work in [32] has a higher CCR of 43.9%, but this comes at the expense of affecting the video quality, as evident in its BD-rate.

Clearly, higher computational savings are possible, but at the expense of BD-rate. For example, the work reported in [19] proposed a solution for CU split prediction based on the motion features and rate-distortion cost of the N×N inter mode. The solution reuses motion vectors to expedite compression. The reported CCR results range from 55% to 61%. However, this speedup comes at the expense of a high BD-rate loss of 1.93% to 2.33%. The work in [37] proposed a fast CU mode decision solution for HEVC trans-rating.

The solution uses the modes and motion vectors of correlated CUs to decide on an early SKIP decision. It also restricts the CU depth search by estimating a weighted average of the depth levels of correlated CUs. Additionally, CUs with higher rate-distortion costs are divided into smaller CUs without evaluating the RD costs of the remaining partitioning modes. The reported CCR is around 55%, but again, this speedup comes at the expense of a high BD-rate loss of 2.26%.

6 Conclusion

We proposed a solution for reducing the complexity of determining the CU split decisions in HEVC video coding. The solution uses a video sequence-dependent approach to collect features that represent CUs at various coding depths. The features are extracted from both the underlying CU and the previously encoded CUs. A classification model is then built based on these feature vectors and the corresponding split decisions at various CU coding depths. Additionally, dimensionality reduction is optionally used as a preprocessing step prior to model generation. Therefore, the output of the training phase is a set of classification models for predicting split decisions and a set of dimensionality reduction models. We have used a number of dimensionality reduction and classification techniques, including stepwise regression, random forest variable selection, PCA, polynomial classifiers and random forest classifiers. Experiments were carried out on many test video sequences with different resolutions. Comparison with existing work revealed that the proposed solution has an advantage in terms of coding efficiency and time savings. It was shown that, on average, the classification accuracy of the CU split models is 86.5%. In comparison to regular HEVC coding, the proposed solutions resulted in a BD-rate of 0.55 and a BD-PSNR of −0.02 dB. The average reported computational complexity reduction was 39.2%. Future work includes the investigation of model suitability over very long sequences and the possibility of model regeneration on demand. Future work can also include investigating the suitability of the proposed approach for model generation using a sequence-independent approach.