1 Introduction

In recent years, rapidly growing multimedia technologies [31], such as 4K ultra-high-definition (UHD) video, \(360^{\circ } \) immersive multimedia, and high dynamic range (HDR) video, improve the visual quality but impose a tremendous workload on storage and transmission. The previous High-Efficiency Video Coding (HEVC) standard can hardly satisfy the compression demand of such explosive video data. To overcome this bottleneck of digital media development, the Joint Video Exploration Team (JVET), formed by the ISO Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG), started in 2015 to explore the next-generation video standard beyond HEVC, named Future Video Coding (FVC). One of the new techniques of FVC is the quadtree plus binary tree (QTBT) structure, which is adopted in the joint exploration model (JEM) [17]. The coding unit (CU) is first partitioned into four equal-sized blocks by the quadtree (QT) structure. Then, the binary tree (BT) partition is performed sequentially to obtain rectangular CUs. Based on such a partition tool in FVC [7], JVET further introduced a novel partition technology, named quadtree with nested multi-type tree (QTMT), and developed a new video coding standard, called Versatile Video Coding (VVC). VVC aims to achieve more than 50% coding efficiency improvement while maintaining the same video quality as the HEVC standard. In addition to the QT partition structure, there are four multi-type tree (MT) partition structures, including vertical binary tree partition (BV), horizontal binary tree partition (BH), vertical ternary tree partition (TV), and horizontal ternary tree partition (TH). As illustrated in Fig. 1, the visual QTMT result of VVC shows a very complicated partition pattern, which indicates that a significant amount of running time is needed to reach the optimal CU partition.

Fig. 1

The visual partition results of the QTMT structure in Versatile Video Coding (VVC). a The partition results of the first frame of “Johnny” (\(1280\times 720\)) under quantization parameter (QP) = 32. b The example of the horizontal partition. c The example of the vertical partition

Recursive MT partition brings more flexibility to the CU shape, but the related rate-distortion optimization (RDO) process increases the computational complexity. If the MT partition mode can be predicted in advance, the encoding time will be saved significantly. As can be seen in Fig. 1b and c, the dominant direction of the best MT partition mode is closely related to the texture direction. CUs with vertical textures are more likely to be partitioned by the vertical MT modes, while CUs with horizontal textures are more likely to be partitioned by the horizontal MT modes. Inspired by this observation, we consider determining the partition direction by making use of texture features. We introduce the sum of the mean absolute deviation (SMAD) of the sub-blocks to evaluate the complexity along the vertical and horizontal directions. Moreover, the ratio of the vertical SMAD to the horizontal SMAD is exploited to decide the texture direction. Based on the observation and statistical results, we propose an efficient low-complexity CU partition method to reduce the encoding complexity of VVC. The unlikely MT partition patterns are removed by comparing the ratio of the vertical SMAD to the horizontal SMAD under different quantization parameters (QPs). The experimental results demonstrate that the proposed algorithm significantly reduces the encoding time of intra-prediction with a negligible coding loss.

Fig. 2

An exemplary QTMT structure and the corresponding recursive partition tree in VVC

The rest of this paper is organized as follows: Sect. 2 briefly introduces the background of VVC intra coding and reviews related fast algorithms. In Sect. 3, the motivation and statistical analysis are presented, and then the proposed fast CU decision algorithm is described in detail. Sect. 4 presents the experimental results and comparisons with other recent methods. The conclusion is provided in Sect. 5.

2 Background and related work

2.1 Background of the VVC intra coding

Block partition plays a crucial role in video coding. VVC integrates a set of block partition structures for intra-prediction. It splits the frame into luminance and chrominance components [10], and unifies the concepts of the coding unit (CU), prediction unit (PU), and transform unit (TU). VVC employs the QTMT partition structure to determine the final CU size. Figure 2 demonstrates a schematic example of the QTMT structure and the corresponding recursive partition trees. Unlike the QT structure that splits a CU into four square sub-CUs, the BT structure divides a CU into two rectangular blocks of equal area, while the ternary tree (TT) structure splits a CU into three rectangular blocks with an area ratio of 1:2:1. The determination process of VVC intra-prediction and partitioning is presented in Fig. 3. The coding tree unit (CTU) is first partitioned into four equal-sized sub-CUs by QT, and the resulting sub-CUs are recursively partitioned into equal-sized CUs by QT or rectangular CUs by MT. The QTMT structure makes the CU more adaptive to the video content and improves the coding efficiency substantially. However, it also inevitably increases the computational complexity, since all possible partition combinations in a CTU are traversed one by one.
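
For illustration, the following minimal Python sketch (not taken from the VTM source) enumerates the sub-CU sizes produced by each of the five split types described above, assuming the symmetric BT split and the 1:2:1 TT ratio:

```python
# Minimal sketch: sub-CU sizes produced by each split type, following the
# QT (four equal sub-CUs), BT (two symmetric halves), and TT (1:2:1 ratio)
# rules described above. Sizes are (width, height) tuples.

def split_sizes(w, h, mode):
    """Return the list of sub-CU (width, height) pairs for a given split mode."""
    if mode == "QT":        # quadtree: four equal square sub-CUs
        return [(w // 2, h // 2)] * 4
    if mode == "BT_H":      # horizontal binary split: top / bottom halves
        return [(w, h // 2)] * 2
    if mode == "BT_V":      # vertical binary split: left / right halves
        return [(w // 2, h)] * 2
    if mode == "TT_H":      # horizontal ternary split, 1:2:1 along the height
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    if mode == "TT_V":      # vertical ternary split, 1:2:1 along the width
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    raise ValueError(f"unknown split mode: {mode}")

# Example: a 32x32 CU split by a vertical ternary tree yields 8x32, 16x32, 8x32.
print(split_sizes(32, 32, "TT_V"))
```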

Fig. 3

Pipeline of the VVC intra-prediction and partition determination process

To avoid overlapping partitions, there are some predefined restrictions on the QTMT structure. For instance, the \(128\times 128\) CTU is divided into four \(64 \times 64\) quadtree nodes by default, because the maximum CU size is \(64\times 64\), while the minimum CU size is \(4 \times 4\). The minimum QT partition size is \(16 \times 16\), while the maximum MT partition size is \(32 \times 32\). If a CU is split by MT, the QT partition is forbidden for its descendant nodes. If the current CU performs the TT partition, its inner sub-blocks are not allowed to be further partitioned by BT. Examples of such redundant partitions are shown in Fig. 4.
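
As a rough illustration of these size constraints only (the redundancy rules of Fig. 4 are deliberately omitted), the following sketch enumerates which split types a CU of a given size may still try; the function, its parameters, and the simplified checks are our own illustration, not VTM logic:

```python
# Illustrative sketch using only the size limits quoted above: minimum CU 4x4,
# minimum QT leaf 16x16, maximum MT root 32x32, and no QT below an MT node.

MIN_CU, MIN_QT, MAX_MT = 4, 16, 32

def candidate_splits(w, h, mt_ancestor=False):
    """Roughly enumerate the split types a WxH CU may still try under the size limits."""
    modes = []
    # QT: square CUs only, leaves no smaller than MIN_QT, forbidden below an MT node
    if w == h and w // 2 >= MIN_QT and not mt_ancestor:
        modes.append("QT")
    if max(w, h) <= MAX_MT:                 # MT splits only within the MT root size
        if h // 2 >= MIN_CU:
            modes.append("BT_H")
        if w // 2 >= MIN_CU:
            modes.append("BT_V")
        if h // 4 >= MIN_CU:                # TT outer parts are a quarter of the height
            modes.append("TT_H")
        if w // 4 >= MIN_CU:                # TT outer parts are a quarter of the width
            modes.append("TT_V")
    return modes

print(candidate_splits(32, 32))                    # all five split types remain possible
print(candidate_splits(8, 16, mt_ancestor=True))   # only MT splits keeping sub-CUs >= 4
```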

Fig. 4

Illustration of the redundant partitions in VVC

As for intra-prediction in VVC, the candidate modes are extended from 35 to 67, including a DC mode, a planar mode, and 65 directional modes, as shown in Fig. 5. The DC mode is suitable for large flat areas, and the planar mode is for regions where the pixel values change slowly. The black lines denote the prediction modes inherited from HEVC, while the red lines denote the additional modes in VVC. The intra-frame coding mode is determined as follows. First, the rough mode decision (RMD) is performed on the traditional 35 modes from HEVC. Then, the N candidates with the smallest sum of absolute Hadamard transformed difference (SATD) from the RMD are further considered in a fine mode decision process. The direct neighboring modes of these N modes are also added to the candidates, and the list is updated dynamically according to the SATD value. Finally, the most probable modes from the left and above neighboring blocks are merged into the list, and the full RDO process is performed.
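
The following toy sketch illustrates the candidate-list idea only; the number of surviving candidates, the neighbor derivation for the extended modes, and the caller-supplied cost function are simplified assumptions rather than the exact VTM rules:

```python
# Toy sketch of the SATD-driven candidate selection described above (not the
# exact VTM rules): rank the 35 HEVC-style modes by a caller-supplied SATD
# cost, keep the N best, then also consider the direct neighbours of the
# surviving angular modes before handing the short list to the full RDO stage.

def rmd_candidates(satd_cost, n_best=3):
    """satd_cost: callable mapping a mode index (0..66) to its SATD value."""
    ranked = sorted(range(35), key=satd_cost)[:n_best]       # rough mode decision
    neighbours = set()
    for m in ranked:
        if m >= 2:                                            # 0 = planar, 1 = DC
            neighbours.update({max(2, m - 1), min(66, m + 1)})
    # re-rank the enlarged set and keep the N best for the fine decision / RDO
    return sorted(set(ranked) | neighbours, key=satd_cost)[:n_best]

# Usage with a made-up cost that is smallest near mode 26 (a vertical-ish texture):
print(rmd_candidates(lambda m: abs(m - 26) + 0.1 * m))
```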

Fig. 5

Illustration of directional intra-prediction modes in VVC

Moreover, VVC adopts many new encoding tools for intra-prediction, such as the position-dependent intra-prediction combination (PDPC) [23], cross-component linear model (CCLM) [18], intra-sub-partition (ISP) [12], and wide angular intra-prediction (WAIP) [37]. These new technologies have been integrated into the VVC test model (VTM). It is worth noting that they benefit the compression efficiency but considerably increase the computational complexity.

2.2 Related work

Many researchers have devoted efforts to developing fast algorithms for video encoding. For example, the global and local edge complexities of the current CU and its sub-CUs were exploited in [22] to classify a CU as split, non-split, or undetermined at each depth level. [25] and [24] proposed fast algorithms for HEVC intra coding by employing the texture property and the depth information to make an early CU pruning or splitting decision. In [33], the content similarity between the current CU and its temporally co-located CU, as well as the neighboring CUs in the same frame, was referenced to determine the depth early. In [38], the partition modes of different-size CUs were reduced according to the correlation between the quadtree partition and the texture partition features. Wang et al. [30] proposed a fast prediction mode and CU size decision algorithm by grouping the 35 intra-prediction modes into 5 sub-lists according to the texture complexity. Hosseini et al. [16] designed a rate-complexity model to allocate the complexity budget and reach the target complexity by considering the complexity distribution and video content. Varma et al. [29] adopted a fast test zonal search, used in the grid, raster, and refinement search stages, to reduce the encoding complexity. Meanwhile, machine learning solutions were also developed for fast encoding. Lim et al. [21] designed a fast CU size decision based on the Bayes decision rule, where PU skip and split termination were performed by utilizing the ratio of the rate-distortion (RD) cost in RMD of the current PU to that of the neighboring PU. In [35], the CU size decision was modeled as a classification problem, and representative features were fed to a support vector machine (SVM) to perform the early CU decision. Grellert et al. [14] trained an automatic learning model by combining statistics-based heuristics and fast decision methods. Bouaafia et al. [6] proposed two fast CU partition algorithms for HEVC inter-mode decision, one based on a trained SVM model and the other on a convolutional neural network (CNN) model. Tahir et al. [26] devised a fast algorithm based on a combination of online and offline random forests (RF) to handle the CU and transform unit partition. Apart from the above fast methods for HEVC, some fast methods were also investigated for 3D-HEVC, such as correlation-based [20], edge direction-guided [15, 34], and decision tree-driven [3] approaches.

The fast encoding schemes mentioned above achieve considerable complexity reduction on the encoder side. However, they were all designed for HEVC rather than VVC. Recently, some fast CU algorithms have been proposed for VVC. Inspired by the RMD process, [19] adopted an SATD-based mode decision to determine the best intra-prediction mode for sub-block partitions. In [13], the correlations between the parent CU and its horizontally split sub-CUs were exploited to skip the vertical split through the Bayesian rule, and the RD cost of the vertical binary split was then applied to terminate the horizontal ternary split early. [9] devised a fast algorithm for the new QTMT structure, where the directional gradient and the variance of the variance of sub-CUs were exploited for the CU partition decision. An innovative partition determination framework was proposed in [32], in which the QT partition decision is judged before the MT partition decision. The model transformed the multi-classification problem into multiple binary-classification problems at each decision level, which were handled by decision tree classifiers; this work also proposed a fast intra-mode decision with gradient descent search. [28] skipped the vertical or horizontal partition modes and performed early termination for intra coding according to the edge features extracted by the Canny operator. Cui et al. [11] calculated the gradient of each CU along the vertical, horizontal, and two diagonal directions, and then pre-determined the likelihood of BT or TT partition in the horizontal or vertical direction. [2] proposed a lightweight and tunable partitioning method for QTBT and QTMT using an RF classifier to decide the likely partition modes. [8] established the relationships between the extracted features and the splitting modes of online learning frames, and then trained an SVM classifier to predict the direction of the MT structure. [27] performed a pre-decision algorithm for homogeneous blocks based on the Sobel gradient operator to skip the calculation of sub-CUs, and then implemented a pooling-variable CNN, which learns the partition decision for CUs of various sizes with only one network. [36] developed an adaptive CU partition algorithm based on deep learning and multi-feature fusion to terminate the iteration of the RDO process.

Different from the previous methods, this article introduces a new fast block partition decision algorithm based on the complexity of the whole CU. Specifically, the vertical and horizontal complexity deviations of the sub-blocks are characterized by SMAD. Then, the vertical or horizontal partition is predicted based on the texture direction, and the unnecessary MT partition patterns are skipped accordingly. Experimental results show that the proposed method achieves a satisfying balance between prediction accuracy and complexity reduction.

Fig. 6

Statistical data of the MT partition results in the VVC reference software. a The proportions of MT partition modes for five video sequences. b The average proportions of the encoding time by the MT partition modes for Class A1–E

3 Methodology

VVC adopts a flexible QTMT structure, which improves the coding efficiency at the cost of substantial complexity (i.e., ten times or more that of HEVC). To reduce the complexity, we devise a fast CU size decision algorithm based on the texture direction to avoid redundant MT partitions. In this section, the proposed scheme is described in detail.

Fig. 7

An example of the equal-sized sub-CUs

3.1 Motivation and statistical analysis

In the VVC reference software, the MT partition structures account for almost one-third of the total partitions and constitute the most complicated operation, taking more than 90% of the total encoding time on average [8]. The statistical proportions of the MT partition modes (from \(32\times 32\) to \(8\times 8\)) for five sequences are shown in Fig. 6a, and the corresponding running time proportions of the MT modes are shown in Fig. 6b. The MT partitioning process offers great flexibility in the partition mode decision, but only one of the vertical and horizontal directions is adopted in the final decision. Therefore, if the unlikely MT partition modes can be removed in advance, the coding complexity of the expensive RDO process will be reduced significantly.

Moreover, it can be observed from Fig. 1a that for regions with smooth textures, like the blue background, VVC tends to choose larger CUs, while for areas with complicated textures, like the outlines of the characters, VVC is more likely to select smaller CUs. In addition, a region with a directional texture tends to be divided into several sub-CUs along that direction, which can be roughly categorized as vertical or horizontal. The vertical and horizontal partitions are indicated by the red and green boxes in Fig. 1b and c, respectively.

A toy example is shown in Fig. 7, in which block1 and block2 have the same pixel values, and block3 and block4 have the same pixel values. It is apparent that the complexity difference between sub-CUs in the vertical direction is larger than that in the horizontal direction, so the texture direction is considered horizontal. Therefore, it is preferable to skip the vertical partition modes and choose the horizontal partition modes in advance.

3.2 Establishment of the texture complexity measurement

The mean absolute deviation (MAD) is the mean of the absolute deviations of the data around its mean, which is computationally simpler than the standard deviation. In this section, we adopt MAD to measure the complexity (or variability) of an image block. The complexity measurement, \({\mathcal {C}}_{\text {mad}}\), is formulated as

$$\begin{aligned} {\mathcal {C}}_{\text {mad}} = \frac{1}{{W \times H}}\sum \nolimits _{x = 1}^W {\sum \nolimits _{y = 1}^H {|p(x,y) - m (\mathbf{p })|} }, \end{aligned}$$
(1)

where W and H are the width and height of the current CU, and \(p(x,y)\) represents the pixel value at position \((x,y)\). \(m (\mathbf{p })\) denotes the mean of the pixel values, which is defined by

$$\begin{aligned} m (\mathbf{p }) = \frac{1}{{W {\times } H}}\sum \nolimits _{x = 1}^W {\sum \nolimits _{y = 1}^H {p(x,y)} }. \end{aligned}$$
(2)
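
For illustration, a direct Python translation of Eqs. (1) and (2) is sketched below (the use of NumPy is an implementation choice of ours):

```python
import numpy as np

# C_mad from Eqs. (1)-(2): the mean absolute deviation of a luma block
# around its own mean, used as the complexity measure of the block.

def mad(block):
    """block: 2-D array of pixel values (H x W). Returns C_mad from Eq. (1)."""
    block = np.asarray(block, dtype=np.float64)
    return np.abs(block - block.mean()).mean()

# A flat block has C_mad = 0; a textured block has a larger value.
print(mad(np.full((8, 8), 128)))                 # 0.0
print(mad(np.tile([100, 160], (8, 4))))          # 30.0
```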

The straightforward idea of the proposed fast block partition determination scheme is as follows: if the \({\mathcal {C}}_{\text {mad}}\) value is less than a threshold \({\mathcal {T}}\), there is little variation among the pixels in the CU; the CU is classified as homogeneous, and the early determination is performed. Otherwise, the current CU is classified as complex, and the subsequent bypass method is performed.

Based on the above analysis, we propose to divide the current CU into four equal-sized blocks, and compute the complexity of each sub-CU in terms of \({\mathcal {C}}_{\text {mad}}\). Considering the new characteristic of the VVC codec, we first calculate the mean complexity deviation between sub-CUs in both the vertical and horizontal directions. Then, we compare the complexity difference from such two directions to estimate the texture direction of the whole CU.

\({\mathcal {C}}_{\text {mad}}^{bl}\), \({\mathcal {C}}_{\text {mad}}^{br}\), \({\mathcal {C}}_{\text {mad}}^{tl}\), and \({\mathcal {C}}_{\text {mad}}^{tr}\) represent the texture complexities of the bottom-left, bottom-right, top-left, and top-right sub-CUs, respectively, which are calculated by Eqs. (3a)–(3d)

$$\begin{aligned} {\mathcal {C}}_{\text {mad}}^{bl}= & {} \frac{1}{{\left( {\frac{W}{2}} \right) {\times } \left( {\frac{H}{2}} \right) }}\sum \nolimits _{x = 1}^{W/2} {\sum \nolimits _{y = 1}^{H/2} {|p(x,y) - m (\mathbf{p }_1) |} }, \end{aligned}$$
(3a)
$$\begin{aligned} {\mathcal {C}}_{\text {mad}}^{br}= & {} \frac{1}{{\left( {\frac{W}{2}} \right) {\times } \left( {\frac{H}{2}} \right) }}\sum \nolimits _{x = \frac{W}{2} + 1}^W {\sum \nolimits _{y = 1}^{H/2} {|p(x,y) - m (\mathbf{p }_2) |} }, \end{aligned}$$
(3b)
$$\begin{aligned} {\mathcal {C}}_{\text {mad}}^{tl}= & {} \frac{1}{{\left( {\frac{W}{2}} \right) {\times } \left( {\frac{H}{2}} \right) }}\sum \nolimits _{x = 1}^{W/2} {\sum \nolimits _{y = \frac{H}{2} + 1}^H {|p(x,y) - m (\mathbf{p }_3) |} }, \end{aligned}$$
(3c)
$$\begin{aligned} {\mathcal {C}}_{\text {mad}}^{tr}= & {} \frac{1}{{\left( {\frac{W}{2}} \right) {\times } \left( {\frac{H}{2}} \right) }}\sum \nolimits _{x = \frac{W}{2} + 1}^W {\sum \nolimits _{y = \frac{H}{2} + 1}^H {|p(x,y) - m (\mathbf{p }_4) |} } . \end{aligned}$$
(3d)

We estimate the texture complexity of each CU according to the SMAD of the \({\mathcal {C}}_{\text {mad}}\) values, considering the vertical and horizontal complexities separately. The vertical SMAD, \(S_{\text {ver}}\), is defined as the complexity difference between the two left sub-CUs plus that between the two right sub-CUs. Similarly, the horizontal SMAD, \(S_{\text {hor}}\), is defined as the complexity difference between the two bottom sub-CUs plus that between the two top sub-CUs. Furthermore, we utilize the ratio of \(S_{\text {ver}}\) to \(S_{\text {hor}}\) to represent the likelihood of each partition direction, which is formulated as

$$\begin{aligned} {\mathcal {R}}_{\text {ver/hor}} =\frac{{\mathcal {S}}_{\text {ver}}}{{\mathcal {S}}_{\text {hor}}} = \frac{ {|{\mathcal {C}}_{\text {mad}}^{bl} - {\mathcal {C}}_{\text {mad}}^{tl} | + |{\mathcal {C}}_{\text {mad}}^{br} - {\mathcal {C}}_{\text {mad}}^{tr}|} }{ {|{\mathcal {C}}_{\text {mad}}^{bl} - {\mathcal {C}}_{\text {mad}}^{br} | + |{\mathcal {C}}_{\text {mad}}^{tl} - {\mathcal {C}}_{\text {mad}}^{tr}|} }. \end{aligned}$$
(4)
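
The following sketch implements Eqs. (3a)–(3d) and (4) for a square luma block; the small epsilon guarding against a zero denominator on flat CUs is our own implementation assumption. The usage mimics the toy example of Fig. 7, where the top half is flat and the bottom half is textured, so the ratio becomes much larger than 1 (horizontal texture):

```python
import numpy as np

def mad(block):
    """C_mad from Eq. (1), as in the previous sketch."""
    block = np.asarray(block, dtype=np.float64)
    return np.abs(block - block.mean()).mean()

def smad_ratio(cu, eps=1e-6):
    """R_ver/hor from Eq. (4) for a 2-D luma block, split into four equal sub-blocks."""
    cu = np.asarray(cu, dtype=np.float64)
    h, w = cu.shape
    tl, tr = cu[:h // 2, :w // 2], cu[:h // 2, w // 2:]
    bl, br = cu[h // 2:, :w // 2], cu[h // 2:, w // 2:]
    c_tl, c_tr, c_bl, c_br = (mad(b) for b in (tl, tr, bl, br))
    s_ver = abs(c_bl - c_tl) + abs(c_br - c_tr)      # Eq. (4), numerator
    s_hor = abs(c_bl - c_br) + abs(c_tl - c_tr)      # Eq. (4), denominator
    return s_ver / (s_hor + eps)                     # eps avoids division by zero

# Fig. 7-style toy CU: flat top sub-blocks, textured bottom sub-blocks, so the
# MADs differ within each column and R_ver/hor >> 1 (horizontal texture).
flat, textured = np.full((8, 8), 128.0), np.tile([100.0, 160.0], (8, 4))
cu = np.block([[flat, flat], [textured, textured]])
print(smad_ratio(cu))
```
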
Table 1 The statistical data of the vertical and horizontal MT modes for \(32\times 32\) and \(16\times 16\) CUs when \({\mathcal {R}}_{\text {ver/hor}} < 1 \) or \({\mathcal {R}}_{\text {ver/hor}} > 1 \)

Experiments on the VVC reference software [1] were conducted to investigate the relationship between the final partition mode and the SMAD feature. Six video sequences from the JVET common test condition (CTC) [5] were used in the experiments, including “FoodMarket4” (\(3840 \times 2160\)), “DaylightRoad2” (\(3840\times 2160\)), “Cactus” (\(1920 \times 1080 \)), “BQMall” (\(832 \times 480 \)), “BasketballPass” (\(416 \times 240 \)), and “FourPeople” (\(1280\times 720 \)). The first five frames of each sequence were encoded under the all-intra (AI) configuration with \(\hbox {QP}=27\). We collected the statistics of \({\mathcal {R}}_{\text {ver/hor}}\) and the corresponding vertical and horizontal MT partition modes, and the overall results are presented in Table 1.

As can be seen from Table 1, the percentages of the vertical and horizontal partition modes vary across sequences. However, when the \({\mathcal {R}}_{\text {ver/hor}}\) value is less than 1, the percentages of vertical partition modes are 74%, 61%, 66%, 75%, 71%, and 78%, respectively. In such a case, the average probability of being encoded by a vertical partition is more than twice that of a horizontal partition. Meanwhile, when the \({\mathcal {R}}_{\text {ver/hor}}\) value is greater than 1, the average probability of being encoded by a horizontal partition is more than twice that of a vertical partition.

Based on the statistical data in Table 1, the proposed fast CU decision scheme follows the rule below. When the \({\mathcal {R}}_{\text {ver/hor}}\) value is greater than the higher threshold \({\mathcal {T}}_{h}\), the complexity deviation between the top and bottom sub-CUs dominates, and the current CU is more likely to have a horizontal texture. In this case, we abandon the vertical MT partition modes in advance. In contrast, when the \({\mathcal {R}}_{\text {ver/hor}}\) value is less than the lower threshold \({\mathcal {T}}_{l}\), the current CU is more likely to have a vertical texture, so we skip the horizontal MT partition modes. When \({\mathcal {R}}_{\text {ver/hor}}\) lies between \({\mathcal {T}}_{l}\) and \({\mathcal {T}}_{h}\), there is little difference in texture complexity between the vertical and horizontal directions, and every partition direction has roughly the same probability; in this case, the original VVC scheme is kept to ensure the prediction accuracy. Although the CU is divided into equal-sized blocks for the measurement, the proposed method is also applied to the TT partitions. Considering that the maximum MT size is \(32\times 32\), we only apply our method to \(32\times 32\) and \(16\times 16\) luminance blocks to keep the trade-off between complexity reduction and encoding efficiency.

3.3 Determination of the partition condition

To verify the assumptions, we statistically count the cumulative distribution of the splitting modes against the \({\mathcal {R}}_{\text {ver/hor}} \) values. The mistaken cases are defined as follows: (1) when \({\mathcal {R}}_{\text {ver/hor}} > {\mathcal {T}}_{h} \), the best splitting mode should theoretically be horizontal, but the actual splitting mode is vertical; (2) when \({\mathcal {R}}_{\text {ver/hor}} < {\mathcal {T}}_{l} \), the best splitting mode should theoretically be vertical, but the actual splitting mode is horizontal. The error ratio is formulated as Eq. (5)

$$\begin{aligned} {\mathcal {R}}_{\text {err}} = \frac{{N_{err} }}{{N_{all} }} \times 100\%, \end{aligned}$$
(5)

where \(N_{\text {err}}\) represents the number of misclassified vertical and horizontal partitions, and \(N_{\text {all}}\) represents the total number of tested CUs. Accordingly, the correct ratio, \({\mathcal {R}}_{\text {acc}}\), is calculated as \({\mathcal {R}}_{\text {acc}} = 1 - {\mathcal {R}}_{\text {err}}\). These two indicators estimate how closely the decisions of our method match those of the original scheme.
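
For completeness, Eq. (5) and the derived accuracy can be written as the following small sketch; the counts in the usage line are made-up numbers for illustration only:

```python
# Direct reading of Eq. (5): the share of CUs whose measured texture direction
# disagrees with the splitting direction chosen by the exhaustive RDO search.

def error_and_accuracy(n_err, n_all):
    r_err = n_err / n_all * 100.0          # R_err, percentage of misclassified CUs
    return r_err, 100.0 - r_err            # (R_err, R_acc), both in percent

print(error_and_accuracy(150, 1000))       # (15.0, 85.0)
```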

The threshold plays a crucial role in the proposed scheme. If the \({\mathcal {T}}_{h}\) value is too small, the algorithm obtains greater complexity reduction, because more partition modes are determined in advance. If the \({\mathcal {T}}_{h}\) value is too large, the coding performance is better, but the computational complexity increases. To test the prediction accuracy, the encoding results of \( 32 \times 32\) and \( 16 \times 16\) CUs are collected from “BasketballDrive”, “BQMall”, and “BasketballPass” under QP = 27 and QP = 37. The relationship between the threshold and \({\mathcal {R}}_{\text {acc}}\) is demonstrated in Fig. 8.

Fig. 8

Ablation results of the prediction accuracy under various thresholding values

Moreover, we also carry out experiments to reveal the partitioning results under different QPs. As shown in Fig. 9, the number of CUs encoded as \(32 \times 32 \) increases with a higher QP, while the number of CUs encoded as \(16 \times 16 \) decreases. This indicates that the threshold value can be updated as the QP increases. In view of the RD performance, since the misclassification of a larger CU generates a larger BD rate increase than that of a smaller CU, we adopt a stricter condition for high QPs. In our method, when the QP value is greater than 28, the threshold \({\mathcal {T}}_{h}\) is set to 1.7; otherwise, \({\mathcal {T}}_{h}\) is set to 1.2. As presented in Fig. 8, the overall accuracy of the proposed method ranges from 80% to 90%. To simplify the modeling process, \({\mathcal {T}}_{l}\) is defined as the reciprocal of \({\mathcal {T}}_{h}\), because the numbers of vertical and horizontal partition modes are generally of a similar magnitude for a specific sequence.
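
The QP-dependent threshold rule described above can be written compactly as:

```python
# Threshold selection as described above: a stricter T_h for QP > 28,
# and T_l taken as the reciprocal of T_h.

def mt_thresholds(qp):
    t_h = 1.7 if qp > 28 else 1.2
    return t_h, 1.0 / t_h                  # (T_h, T_l)

print(mt_thresholds(27))   # (1.2, 0.833...)
print(mt_thresholds(32))   # (1.7, 0.588...)
```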

Fig. 9

Illustrations of the relationship between the block partition and quantization parameters. Two different CU sizes (\(16 \times 16 \) and \(32 \times 32 \) ) and four different QPs (22, 27, 32, 37) are evaluated in the experiments

3.4 The overall algorithm

The pseudo-code of the overall algorithm is summarized in Algorithm 1. The main idea is as follows. If the size of the current CU is \(32 \times 32 \) or \(16 \times 16 \), the fast algorithm is performed for intra-prediction. We calculate the \({\mathcal {C}}_{\text {mad}}\) values for all sub-blocks to measure the complexity in the vertical and horizontal directions, and the vertical-to-horizontal ratio \( {\mathcal {R}}_{\text {ver/hor}}\) is calculated to estimate the texture direction. If the \( {\mathcal {R}}_{\text {ver/hor}}\) value is larger than the higher threshold \({\mathcal {T}}_{h}\), the vertical MT partition modes, BV and TV, are skipped. If the \( {\mathcal {R}}_{\text {ver/hor}}\) value is less than the lower threshold \({\mathcal {T}}_{l}\), the horizontal MT partition modes, BH and TH, are skipped. Otherwise, the texture direction is considered not apparent, and the default partition modes of the VVC reference software are kept. The flowchart of the proposed algorithm is shown in Fig. 10.
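
Putting the pieces together, a hedged end-to-end sketch of the decision in Algorithm 1 (reusing the `mad`, `smad_ratio`, and `mt_thresholds` helpers from the earlier sketches, and not the actual VTM integration) could look as follows:

```python
import numpy as np

# Illustration only: return the set of MT split modes that should still be
# evaluated by the RDO loop for a given luma CU and QP, following the rule
# described above. smad_ratio() and mt_thresholds() come from the earlier sketches.

ALL_MT = {"BV", "TV", "BH", "TH"}

def prune_mt_modes(cu, qp):
    """cu: 2-D luma block (numpy array); qp: quantization parameter."""
    h, w = np.asarray(cu).shape
    if (w, h) not in {(32, 32), (16, 16)}:    # fast decision only for these sizes
        return set(ALL_MT)
    t_h, t_l = mt_thresholds(qp)              # QP-dependent thresholds
    r = smad_ratio(cu)                        # vertical-to-horizontal SMAD ratio
    if r > t_h:                               # horizontal texture: skip BV and TV
        return {"BH", "TH"}
    if r < t_l:                               # vertical texture: skip BH and TH
        return {"BV", "TV"}
    return set(ALL_MT)                        # unclear texture: keep all MT modes
```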

Algorithm 1 The overall fast CU partition decision algorithm (pseudo-code)
Fig. 10

Flowchart of the proposed algorithm

4 Experimental results

To evaluate the performance of the proposed fast method, we conduct experiments on the VVC reference software [1]. A total of 22 common test sequences belonging to six classes with various resolutions are used: A1 \((3840\times 2160)\), A2 \((3840\times 2160)\), B \((1920\times 1080)\), C \((832\times 480)\), D \((416\times 240)\), and E \((1280\times 720)\). The encoding parameters follow the settings recommended by the JVET common test condition (CTC) [5], with QP = 22, 27, 32, and 37, respectively.

The experiments are carried out on a computing platform with an Intel(R) Core(TM) i5-6200U CPU @ 2.30 GHz and 8 GB RAM, using the Microsoft Visual Studio C++ 2017 compiler and the open-source computer vision (OpenCV) 3.4.1 library. The encoding efficiency is measured by the Bjøntegaard delta peak signal-to-noise ratio (BDPSNR) and the Bjøntegaard delta bit rate (BDBR) [4]. The time saving, \(T_s\), is measured by

$$\begin{aligned} T_s = \frac{{T_{o} - T_{p} }}{{T_{o} }} \times 100\%, \end{aligned}$$
(6)

where \(T_o\) denotes the total encoding time of the original VVC, and \(T_p\) denotes the total encoding time of the proposed method.

4.1 Overall performance

Table 2 provides the overall encoding results of the proposed method. The proposed texture-based fast decision strategy achieves time savings from 26.55% to 35.63% for the various video sequences, with an average encoding time saving of 30.33% compared with the anchor. More importantly, the complexity reduction has a negligible effect on the encoding efficiency: only a 0.57% BDBR increase and a 0.03 dB BDPSNR decrease on average.

Table 2 Experimental results of the proposed method under the all-intra configuration

4.2 Rate-distortion comparison

Figure 11 illustrates the overall rate-distortion curves of six typical sequences from Classes A1 to E: “Campfire” (\(3840 \times 2160 \)), “ParkRunning3” (\(3840 \times 2160 \)), “BasketballDrive” (\(1920 \times 1080 \)), “BQMall” (\(832 \times 480 \)), “BasketballPass” (\(416 \times 240 \)), and “Johnny” (\(1280 \times 720 \)) with QP = 22, 27, 32, and 37, respectively. It can be seen that the RD curves of the original VVC anchor and the proposed method are approximately identical for all video sequences. The results verify that the proposed scheme reduces the computational complexity while maintaining the coding efficiency.

Fig. 11

Rate-distortion curve comparison between the proposed method and the VVC anchor reference software

4.3 Complexity reduction efficiency comparison

Complexity reduction efficiency is an important indicator that measures the complexity reduction against the bit-rate increment. It is defined by

$$\begin{aligned} E_c = \frac{{T_{s} }}{{\text {BDBR} }} \times 100\%. \end{aligned}$$
(7)
Table 3 Comparison results of the proposed algorithm and three recent advances on four QPs (22, 27, 32, and 37)

The performance comparison results in terms of complexity reduction efficiency are summarized in Table 3. Three recent advances are used for comparison, including FU2019 [13], CHEN2019 [9], and YANG2019 [32]. Since CHEN2019 [9] only performed experiments on 8-bit sequences, we provide the results on Classes B–E for a fair comparison. FU2019 reduces the encoding time by 42.36% at the cost of a 0.93% BDBR increase, CHEN2019 saves 52.48% of the encoding time at the cost of a 1.59% BDBR increase, and YANG2019 achieves a 52.98% complexity reduction at the cost of a 1.86% BDBR increase.

In terms of the complexity reduction efficiency, our method outperforms the three methods with an average performance of \(E_c=54.60\%\). The other three methods have noticeably lower efficiency: the average complexity reduction efficiency results of FU2019 [13], CHEN2019 [9], and YANG2019 [32] are 45.44%, 33.08%, and 28.47%, respectively.

5 Conclusion

In this paper, we propose a highly efficient, low-complexity block partition scheme for VVC intra coding. We evaluate the block texture complexity in the vertical and horizontal directions by the sum of the mean absolute deviations of sub-blocks. The relationship between the vertical and horizontal texture complexities is statistically analyzed for \(32\times 32\) and \(16\times 16\) blocks, which verifies the relevance between the directional texture complexity feature and the best partition mode. Experimental results demonstrate that the proposed method achieves a complexity reduction efficiency of up to 103.34% under the common test condition.