1 Introduction

In recent years, rapidly growing multimedia technologies [31], such as 4K ultra-high-definition (UHD) video, \(360^{\circ } \) immersive multimedia, and high dynamic range (HDR) video, improve the visual quality but impose a tremendous workload on storage and transmission. The previous High-Efficiency Video Coding (HEVC) standard can hardly satisfy the compression demand of such explosive video data. To overcome this bottleneck of digital media development, the Joint Video Exploration Team (JVET), formed by the ISO Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG), started in 2015 to explore the next-generation video standard beyond HEVC, named Future Video Coding (FVC). One of the new techniques of FVC is the quadtree plus binary tree (QTBT) structure, which is adopted in the joint exploration model (JEM) [17]. The coding unit (CU) is first partitioned into four equal-sized blocks by the quadtree (QT) structure. Then, the binary tree (BT) partition is performed sequentially to obtain rectangular CUs. Based on such a partition tool in FVC [7], JVET further introduced a novel partition technology, named quadtree with nested multi-type tree (QTMT), and developed a new video coding standard, called Versatile Video Coding (VVC). VVC aims to achieve more than 50% coding efficiency improvement while maintaining the same video quality as the HEVC standard. In addition to the QT partition structure, there are four multi-type tree (MT) partition structures, including vertical binary tree partition (BV), horizontal binary tree partition (BH), vertical ternary tree partition (TV), and horizontal ternary tree partition (TH). As illustrated in Fig. 1, the visual QTMT result of VVC shows a very complicated partition pattern, which indicates that a significant amount of running time is needed to reach the optimal CU partition.

Fig. 1

The visual partition results of the QTMT structure in Versatile Video Coding (VVC). a The partition results of the first frame of “Johnny” (\(1280\times 720\)) under quantization parameter (QP) = 32. b The example of the horizontal partition. c The example of the vertical partition

Recursive MT partition brings more flexibility to the CU shape, but the related rate-distortion optimization (RDO) process increases the computational complexity. If the MT partition mode can be predicted in advance, the encoding time will be saved significantly. As can be seen in Fig. 1b and c, the dominant direction of the best MT partition mode is closely related to the texture direction. CUs with vertical textures are more likely to be partitioned by the vertical MT modes, while CUs with horizontal textures are more likely to be partitioned by the horizontal MT modes. Inspired by this observation, we consider determining the partition direction by making use of texture features. We introduce the sum of the mean absolute deviation (SMAD) of the sub-blocks to evaluate the complexity along the vertical and horizontal directions. Moreover, the ratio of the vertical SMAD to the horizontal SMAD is exploited to decide the texture direction. Based on the observation and statistical results, we propose an efficient low-complexity CU partition method to reduce the encoding complexity of VVC. The unlikely MT partition patterns are removed by comparing the ratio of the vertical SMAD to the horizontal SMAD under different quantization parameters (QPs). The experimental results demonstrate that the proposed algorithm significantly reduces the encoding time of intra-prediction with a negligible coding loss.

Fig. 2

An exemplary QTMT structure and the corresponding recursive partition tree in VVC

The rest of this paper is organized as follows: Sect. 2 briefly introduces the background of VVC intra coding and reviews related fast algorithms. In Sect. 3, the motivation and statistical analysis are presented, and then the proposed fast CU decision algorithm is described in detail. Sect. 4 presents the experimental results and comparisons with other recent methods. The conclusion is provided in Sect. 5.

2 Background and related work

2.1 Background of the VVC intra coding

Block partition plays a crucial role in video coding. VVC integrates a set of block partition structures for intra-prediction. It splits the frame into luminance and chrominance components [10], and unifies the concepts of the coding unit (CU), prediction unit (PU), and transform unit (TU). VVC employs the QTMT partition structure to determine the final CU size. Figure 2 demonstrates a schematic example of the QTMT structure and the corresponding recursive partition trees. Unlike the QT structure that splits a CU into four square sub-CUs, the BT structure divides a CU into two rectangular blocks of equal area, while the ternary tree (TT) structure splits a CU into three rectangular blocks with an area ratio of 1:2:1. The determination process of VVC intra-prediction and partitioning is presented in Fig. 3. The coding tree unit (CTU) is first partitioned into four equal-sized sub-CUs by QT, and the resulting sub-CUs are recursively partitioned into equal-sized CUs by QT or rectangular CUs by MT. The QTMT structure makes the CU more adaptive to the video content and improves the coding efficiency substantially. However, it also inevitably increases the computational complexity, since all possible partition combinations in a CTU are traversed one by one.
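
For illustration, the following minimal Python sketch (not taken from the VTM source) enumerates the sub-CU sizes produced by each of the five split types described above, assuming the symmetric BT split and the 1:2:1 TT ratio:

```python
# Minimal sketch: sub-CU sizes produced by each split type, following the
# QT (four equal sub-CUs), BT (two symmetric halves), and TT (1:2:1 ratio)
# rules described above. Sizes are (width, height) tuples.

def split_sizes(w, h, mode):
    """Return the list of sub-CU (width, height) pairs for a given split mode."""
    if mode == "QT":        # quadtree: four equal square sub-CUs
        return [(w // 2, h // 2)] * 4
    if mode == "BT_H":      # horizontal binary split: top / bottom halves
        return [(w, h // 2)] * 2
    if mode == "BT_V":      # vertical binary split: left / right halves
        return [(w // 2, h)] * 2
    if mode == "TT_H":      # horizontal ternary split, 1:2:1 along the height
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    if mode == "TT_V":      # vertical ternary split, 1:2:1 along the width
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    raise ValueError(f"unknown split mode: {mode}")

# Example: a 32x32 CU split by a vertical ternary tree yields 8x32, 16x32, 8x32.
print(split_sizes(32, 32, "TT_V"))
```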

Fig. 3

Pipeline of the VVC intra-prediction and partition determination process

To avoid overlapping partitions, there are some predefined restrictions on the QTMT structure. For instance, the \(128\times 128\) CTU is divided into four \(64 \times 64\) quadtree nodes by default, because the maximum CU size is \(64\times 64\), while the minimum CU size is \(4 \times 4\). The minimum QT partition size is \(16 \times 16\), while the maximum MT partition size is \(32 \times 32\). If a CU is split by MT, the QT partition is forbidden for its descendant nodes. If the current CU performs the TT partition, its inner sub-blocks are not allowed to be further partitioned by BT. Examples of such redundant partitions are shown in Fig. 4.
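
As a rough illustration of these size constraints only (the redundancy rules of Fig. 4 are deliberately omitted), the following sketch enumerates which split types a CU of a given size may still try; the function, its parameters, and the simplified checks are our own illustration, not VTM logic:

```python
# Illustrative sketch using only the size limits quoted above: minimum CU 4x4,
# minimum QT leaf 16x16, maximum MT root 32x32, and no QT below an MT node.

MIN_CU, MIN_QT, MAX_MT = 4, 16, 32

def candidate_splits(w, h, mt_ancestor=False):
    """Roughly enumerate the split types a WxH CU may still try under the size limits."""
    modes = []
    # QT: square CUs only, leaves no smaller than MIN_QT, forbidden below an MT node
    if w == h and w // 2 >= MIN_QT and not mt_ancestor:
        modes.append("QT")
    if max(w, h) <= MAX_MT:                 # MT splits only within the MT root size
        if h // 2 >= MIN_CU:
            modes.append("BT_H")
        if w // 2 >= MIN_CU:
            modes.append("BT_V")
        if h // 4 >= MIN_CU:                # TT outer parts are a quarter of the height
            modes.append("TT_H")
        if w // 4 >= MIN_CU:                # TT outer parts are a quarter of the width
            modes.append("TT_V")
    return modes

print(candidate_splits(32, 32))                    # all five split types remain possible
print(candidate_splits(8, 16, mt_ancestor=True))   # only MT splits keeping sub-CUs >= 4
```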

Fig. 4

Illustration of the redundant partitions in VVC

As for intra-prediction in VVC, the candidate modes are extended from 35 to 67, including a DC mode, a planar mode, and 65 directional modes, as shown in Fig. 5. The DC mode is suitable for large flat areas, and the planar mode is for regions where the pixel values change slowly. The black lines denote the prediction modes inherited from HEVC, while the red lines denote the additional modes in VVC. The intra-frame coding mode is determined as follows. First, the rough mode decision (RMD) is performed on the traditional 35 modes from HEVC. Then, the N candidates with the smallest sum of absolute Hadamard transformed difference (SATD) from the RMD are further considered in a fine mode decision process. The direct neighboring modes of these N modes are also added to the candidates, and the list is updated dynamically according to the SATD value. Finally, the most probable modes from the left and above neighboring blocks are merged into the list, and the full RDO process is performed.
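
The following toy sketch illustrates the candidate-list idea only; the number of surviving candidates, the neighbor derivation for the extended modes, and the caller-supplied cost function are simplified assumptions rather than the exact VTM rules:

```python
# Toy sketch of the SATD-driven candidate selection described above (not the
# exact VTM rules): rank the 35 HEVC-style modes by a caller-supplied SATD
# cost, keep the N best, then also consider the direct neighbours of the
# surviving angular modes before handing the short list to the full RDO stage.

def rmd_candidates(satd_cost, n_best=3):
    """satd_cost: callable mapping a mode index (0..66) to its SATD value."""
    ranked = sorted(range(35), key=satd_cost)[:n_best]       # rough mode decision
    neighbours = set()
    for m in ranked:
        if m >= 2:                                            # 0 = planar, 1 = DC
            neighbours.update({max(2, m - 1), min(66, m + 1)})
    # re-rank the enlarged set and keep the N best for the fine decision / RDO
    return sorted(set(ranked) | neighbours, key=satd_cost)[:n_best]

# Usage with a made-up cost that is smallest near mode 26 (a vertical-ish texture):
print(rmd_candidates(lambda m: abs(m - 26) + 0.1 * m))
```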

Fig. 5

Illustration of directional intra-prediction modes in VVC

Moreover, VVC adopts many new encoding tools for intra-prediction, such as the position-dependent intra-prediction combination (PDPC) [23], cross-component linear model (CCLM) [18], intra-sub-partition (ISP) [12], and wide angular intra-prediction (WAIP) [37]. These new technologies have been integrated into the VVC test model (VTM). It is worth noting that they benefit the compression efficiency but considerably increase the computational complexity.

2.2 Related work

Many researchers have devoted efforts to developing fast algorithms for video encoding. For example, the global and local edge complexities of the current CU and its sub-CUs were exploited in [22] to classify a CU as split, non-split, or undetermined at each depth level. [25] and [24] proposed fast algorithms for HEVC intra coding by employing the texture property and the depth information to make an early CU pruning or splitting decision. In [33], the content similarity between the current CU and its temporally co-located CU, as well as the neighboring CUs in the same frame, was referenced to determine the depth early. In [38], the partition modes of different-size CUs were reduced according to the correlation between the quadtree partition and the texture partition features. Wang et al. [30] proposed a fast prediction mode and CU size decision algorithm by grouping the 35 intra-prediction modes into 5 sub-lists according to the texture complexity. Hosseini et al. [16] designed a rate-complexity model to allocate the complexity budget and reach the target complexity by considering the complexity distribution and video content. Varma et al. [29] adopted a fast test zonal search, used in the grid, raster, and refinement search stages, to reduce the encoding complexity. Meanwhile, machine learning solutions were also developed for fast encoding. Lim et al. [21] designed a fast CU size decision based on the Bayes decision rule, where PU skip and split termination were performed by utilizing the ratio of the rate-distortion (RD) cost in RMD of the current PU to that of the neighboring PU. In [35], the CU size decision was modeled as a classification problem, and representative features were fed to a support vector machine (SVM) to perform the early CU decision. Grellert et al. [14] trained an automatic learning model by combining statistics-based heuristics and fast decision methods. Bouaafia et al. [6] proposed two fast CU partition algorithms for HEVC inter-mode decision, one based on a trained SVM model and the other on a convolutional neural network (CNN) model. Tahir et al. [26] devised a fast algorithm based on a combination of online and offline random forests (RF) to handle the CU and transform unit partition. Apart from the above fast methods for HEVC, some fast methods were also investigated for 3D-HEVC, such as correlation-based [20], edge direction-guided [15, 34], and decision tree-driven [3] approaches.

The fast encoding schemes mentioned above achieve considerable complexity reduction on the encoder side. However, they were all designed for HEVC rather than VVC. Recently, some fast CU algorithms have been proposed for VVC. Inspired by the RMD process, [19] adopted an SATD-based mode decision to determine the best intra-prediction mode for sub-block partitions. In [13], the correlations between the parent CU and its horizontally split sub-CUs were exploited to skip the vertical split through the Bayesian rule, and the RD cost of the vertical binary split was then applied to terminate the horizontal ternary split early. [9] devised a fast algorithm for the new QTMT structure, where the directional gradient and the variance of the variance of sub-CUs were exploited for the CU partition decision. An innovative partition determination framework was proposed in [32], in which the QT partition decision is judged before the MT partition decision. The model transformed the multi-classification problem into multiple binary-classification problems at each decision level, which were handled by decision tree classifiers; this work also proposed a fast intra-mode decision with gradient descent search. [28] skipped the vertical or horizontal partition modes and performed early termination for intra coding according to the edge features extracted by the Canny operator. Cui et al. [11] calculated the gradient of each CU along the vertical, horizontal, and two diagonal directions, and then pre-determined the likelihood of BT or TT partition in the horizontal or vertical direction. [2] proposed a lightweight and tunable partitioning method for QTBT and QTMT using an RF classifier to decide the likely partition modes. [8] established the relationships between the extracted features and the splitting modes of online learning frames, and then trained an SVM classifier to predict the direction of the MT structure. [27] performed a pre-decision algorithm for homogeneous blocks based on the Sobel gradient operator to skip the calculation of sub-CUs, and then implemented a pooling-variable CNN, which learns the partition decision for CUs of various sizes with only one network. [36] developed an adaptive CU partition algorithm based on deep learning and multi-feature fusion to terminate the iteration of the RDO process.

Different from the previous methods, this article introduces a new fast block partition decision algorithm based on the complexity of the whole CU. Specifically, the vertical and horizontal complexity deviations of the sub-blocks are characterized by SMAD. Then, the vertical or horizontal partition is predicted based on the texture direction, and the unnecessary MT partition patterns are skipped accordingly. Experimental results show that the proposed method achieves a satisfying balance between prediction accuracy and complexity reduction.

Fig. 6

Statistical data of the MT partition results in the VVC reference software. a The proportions of MT partition modes for five video sequences. b The average proportions of the encoding time by the MT partition modes for Class A1–E

3 Methodology

VVC adopts a flexible QTMT structure, which improves the coding efficiency at the cost of substantial complexity (i.e., ten times or more that of HEVC). To reduce the complexity, we devise a fast CU size decision algorithm based on the texture direction to avoid redundant MT partitions. In this section, the proposed scheme is described in detail.

Fig. 7

An example of the equal-sized sub-CUs

3.1 Motivation and statistical analysis

In the VVC reference software, the MT partition structures account for almost one-third of the total partitions and constitute the most complicated operation, taking more than 90% of the total encoding time on average [8]. The statistical proportions of the MT partition modes (from \(32\times 32\) to \(8\times 8\)) for five sequences are shown in Fig. 6a, and the corresponding running time proportions of the MT modes are shown in Fig. 6b. The MT partitioning process offers great flexibility in the partition mode decision, but only one of the vertical and horizontal directions is adopted in the final decision. Therefore, if the unlikely MT partition modes can be removed in advance, the coding complexity of the expensive RDO process will be reduced significantly.

Moreover, it can be observed from Fig. 1a that for regions with smooth textures, like the blue background, VVC tends to choose larger CUs, while for areas with complicated textures, like the outlines of the characters, VVC is more likely to select smaller CUs. In addition, a region with a directional texture tends to be divided into several sub-CUs along that direction, which can be roughly categorized as vertical or horizontal. The vertical and horizontal partitions are indicated by the red and green boxes in Fig. 1b and c, respectively.

A toy example is shown in Fig. 7, in which block1 and block2 have the same pixel values, and block3 and block4 have the same pixel values. It is apparent that the complexity difference between sub-CUs in the vertical direction is larger than that in the horizontal direction, so the texture direction is considered horizontal. Therefore, it is preferable to skip the vertical partition modes and choose the horizontal partition modes in advance.

3.2 Establishment of the texture complexity measurement

The mean absolute deviation (MAD) is the mean of the absolute deviations of the data around its mean, which is computationally simpler than the standard deviation. In this section, we adopt MAD to measure the complexity (or variability) of an image block. The complexity measurement, \({\mathcal {C}}_{\text {mad}}\), is formulated as

$$\begin{aligned} {\mathcal {C}}_{\text {mad}} = \frac{1}{{W \times H}}\sum \nolimits _{x = 1}^W {\sum \nolimits _{y = 1}^H {|p(x,y) - m (\mathbf{p })|} }, \end{aligned}$$
(1)

where W and H are the width and height of the current CU, and \(p(x,y)\) represents the pixel value at position \((x,y)\). \(m (\mathbf{p })\) denotes the mean of the pixel values, which is defined by

$$\begin{aligned} m (\mathbf{p }) = \frac{1}{{W {\times } H}}\sum \nolimits _{x = 1}^W {\sum \nolimits _{y = 1}^H {p(x,y)} }. \end{aligned}$$
(2)
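
For illustration, a direct Python translation of Eqs. (1) and (2) is sketched below (the use of NumPy is an implementation choice of ours):

```python
import numpy as np

# C_mad from Eqs. (1)-(2): the mean absolute deviation of a luma block
# around its own mean, used as the complexity measure of the block.

def mad(block):
    """block: 2-D array of pixel values (H x W). Returns C_mad from Eq. (1)."""
    block = np.asarray(block, dtype=np.float64)
    return np.abs(block - block.mean()).mean()

# A flat block has C_mad = 0; a textured block has a larger value.
print(mad(np.full((8, 8), 128)))                 # 0.0
print(mad(np.tile([100, 160], (8, 4))))          # 30.0
```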

The straightforward idea of the proposed fast block partition determination scheme is as follows: if the \({\mathcal {C}}_{\text {mad}}\) value is less than a threshold \({\mathcal {T}}\), there is little variation among the pixels in the CU; the CU is classified as homogeneous, and the early determination is performed. Otherwise, the current CU is classified as complex, and the subsequent bypass method is performed.

Based on the above analysis, we propose to divide the current CU into four equal-sized blocks, and compute the complexity of each sub-CU in terms of \({\mathcal {C}}_{\text {mad}}\). Considering the new characteristic of the VVC codec, we first calculate the mean complexity deviation between sub-CUs in both the vertical and horizontal directions. Then, we compare the complexity difference from such two directions to estimate the texture direction of the whole CU.

\({\mathcal {C}}_{\text {mad}}^{bl}\), \({\mathcal {C}}_{\text {mad}}^{br}\), \({\mathcal {C}}_{\text {mad}}^{tl}\), and \({\mathcal {C}}_{\text {mad}}^{tr}\) represent the texture complexities of the bottom-left, bottom-right, top-left, and top-right sub-CUs, respectively, which are calculated by Eqs. (3a)–(3d)

$$\begin{aligned} {\mathcal {C}}_{\text {mad}}^{bl}= & {} \frac{1}{{\left( {\frac{W}{2}} \right) {\times } \left( {\frac{H}{2}} \right) }}\sum \nolimits _{x = 1}^{W/2} {\sum \nolimits _{y = 1}^{H/2} {|p(x,y) - m (\mathbf{p }_1) |} }, \end{aligned}$$
(3a)
$$\begin{aligned} {\mathcal {C}}_{\text {mad}}^{br}= & {} \frac{1}{{\left( {\frac{W}{2}} \right) {\times } \left( {\frac{H}{2}} \right) }}\sum \nolimits _{x = \frac{W}{2} + 1}^W {\sum \nolimits _{y = 1}^{H/2} {|p(x,y) - m (\mathbf{p }_2) |} }, \end{aligned}$$
(3b)
$$\begin{aligned} {\mathcal {C}}_{\text {mad}}^{tl}= & {} \frac{1}{{\left( {\frac{W}{2}} \right) {\times } \left( {\frac{H}{2}} \right) }}\sum \nolimits _{x = 1}^{W/2} {\sum \nolimits _{y = \frac{H}{2} + 1}^H {|p(x,y) - m (\mathbf{p }_3) |} }, \end{aligned}$$
(3c)
$$\begin{aligned} {\mathcal {C}}_{\text {mad}}^{tr}= & {} \frac{1}{{\left( {\frac{W}{2}} \right) {\times } \left( {\frac{H}{2}} \right) }}\sum \nolimits _{x = \frac{W}{2} + 1}^W {\sum \nolimits _{y = \frac{H}{2} + 1}^H {|p(x,y) - m (\mathbf{p }_4) |} } . \end{aligned}$$
(3d)

We estimate the texture complexity of each CU according to the SMAD of the \({\mathcal {C}}_{\text {mad}}\) values, considering the vertical and horizontal complexities separately. The vertical SMAD, \(S_{\text {ver}}\), is defined as the complexity difference between the two left sub-CUs plus that between the two right sub-CUs. Similarly, the horizontal SMAD, \(S_{\text {hor}}\), is defined as the complexity difference between the two bottom sub-CUs plus that between the two top sub-CUs. Furthermore, we utilize the ratio of \(S_{\text {ver}}\) to \(S_{\text {hor}}\) to represent the likelihood of each partition direction, which is formulated as

$$\begin{aligned} {\mathcal {R}}_{\text {ver/hor}} =\frac{{\mathcal {S}}_{\text {ver}}}{{\mathcal {S}}_{\text {hor}}} = \frac{ {|{\mathcal {C}}_{\text {mad}}^{bl} - {\mathcal {C}}_{\text {mad}}^{tl} | + |{\mathcal {C}}_{\text {mad}}^{br} - {\mathcal {C}}_{\text {mad}}^{tr}|} }{ {|{\mathcal {C}}_{\text {mad}}^{bl} - {\mathcal {C}}_{\text {mad}}^{br} | + |{\mathcal {C}}_{\text {mad}}^{tl} - {\mathcal {C}}_{\text {mad}}^{tr}|} }. \end{aligned}$$
(4)
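
The following sketch implements Eqs. (3a)–(3d) and (4) for a square luma block; the small epsilon guarding against a zero denominator on flat CUs is our own implementation assumption. The usage mimics the toy example of Fig. 7, where the top half is flat and the bottom half is textured, so the ratio becomes much larger than 1 (horizontal texture):

```python
import numpy as np

def mad(block):
    """C_mad from Eq. (1), as in the previous sketch."""
    block = np.asarray(block, dtype=np.float64)
    return np.abs(block - block.mean()).mean()

def smad_ratio(cu, eps=1e-6):
    """R_ver/hor from Eq. (4) for a 2-D luma block, split into four equal sub-blocks."""
    cu = np.asarray(cu, dtype=np.float64)
    h, w = cu.shape
    tl, tr = cu[:h // 2, :w // 2], cu[:h // 2, w // 2:]
    bl, br = cu[h // 2:, :w // 2], cu[h // 2:, w // 2:]
    c_tl, c_tr, c_bl, c_br = (mad(b) for b in (tl, tr, bl, br))
    s_ver = abs(c_bl - c_tl) + abs(c_br - c_tr)      # Eq. (4), numerator
    s_hor = abs(c_bl - c_br) + abs(c_tl - c_tr)      # Eq. (4), denominator
    return s_ver / (s_hor + eps)                     # eps avoids division by zero

# Fig. 7-style toy CU: flat top sub-blocks, textured bottom sub-blocks, so the
# MADs differ within each column and R_ver/hor >> 1 (horizontal texture).
flat, textured = np.full((8, 8), 128.0), np.tile([100.0, 160.0], (8, 4))
cu = np.block([[flat, flat], [textured, textured]])
print(smad_ratio(cu))
```
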
Table 1 The statistical data of the vertical and horizontal MT modes for \(32\times 32\) and \(16\times 16\) CUs when \({\mathcal {R}}_{\text {ver/hor}} < 1 \) or \({\mathcal {R}}_{\text {ver/hor}} > 1 \)

Experiments on the VVC reference software [1] were conducted to investigate the relationship between the final partition mode and the SMAD feature. Six video sequences from the JVET common test condition (CTC) [5] were used in the experiments, including “FoodMarket4” (\(3840 \times 2160\)), “DaylightRoad2” (\(3840\times 2160\)), “Cactus” (\(1920 \times 1080 \)), “BQMall” (\(832 \times 480 \)), “BasketballPass” (\(416 \times 240 \)), and “FourPeople” (\(1280\times 720 \)). The first five frames of each sequence were encoded under the all-intra (AI) configuration with \(\hbox {QP}=27\). We collected the statistics of \({\mathcal {R}}_{\text {ver/hor}}\) and the corresponding vertical and horizontal MT partition modes, and the overall results are presented in Table 1.

As can be seen from Table 1, the percentages of the vertical and horizontal partition modes vary across sequences. However, when the \({\mathcal {R}}_{\text {ver/hor}}\) value is less than 1, the percentages of vertical partition modes are 74%, 61%, 66%, 75%, 71%, and 78%, respectively. In such a case, the average probability of being encoded by a vertical partition is more than twice that of a horizontal partition. Meanwhile, when the \({\mathcal {R}}_{\text {ver/hor}}\) value is greater than 1, the average probability of being encoded by a horizontal partition is more than twice that of a vertical partition.

Based on the statistical data in Table 1, the proposed fast CU decision scheme follows the rule below. When the \({\mathcal {R}}_{\text {ver/hor}}\) value is greater than the higher threshold \({\mathcal {T}}_{h}\), the complexity deviation between the top and bottom sub-CUs dominates, and the current CU is more likely to have a horizontal texture. In this case, we abandon the vertical MT partition modes in advance. In contrast, when the \({\mathcal {R}}_{\text {ver/hor}}\) value is less than the lower threshold \({\mathcal {T}}_{l}\), the current CU is more likely to have a vertical texture, so we skip the horizontal MT partition modes. When \({\mathcal {R}}_{\text {ver/hor}}\) lies between \({\mathcal {T}}_{l}\) and \({\mathcal {T}}_{h}\), there is little difference in texture complexity between the vertical and horizontal directions, and every partition direction has roughly the same probability; in this case, the original VVC scheme is kept to ensure the prediction accuracy. Although the CU is divided into equal-sized blocks for the measurement, the proposed method is also applied to the TT partitions. Considering that the maximum MT size is \(32\times 32\), we only apply our method to \(32\times 32\) and \(16\times 16\) luminance blocks to keep the trade-off between complexity reduction and encoding efficiency.

3.3 Determination of the partition condition

To verify the assumptions, we statistically count the cumulative distribution of the splitting modes against the \({\mathcal {R}}_{\text {ver/hor}} \) values. The mistaken cases are defined as follows: (1) when \({\mathcal {R}}_{\text {ver/hor}} > {\mathcal {T}}_{h} \), the best splitting mode should theoretically be horizontal, but the actual splitting mode is vertical; (2) when \({\mathcal {R}}_{\text {ver/hor}} < {\mathcal {T}}_{l} \), the best splitting mode should theoretically be vertical, but the actual splitting mode is horizontal. The error ratio is formulated as Eq. (5)

$$\begin{aligned} {\mathcal {R}}_{\text {err}} = \frac{{N_{err} }}{{N_{all} }} \times 100\%, \end{aligned}$$
(5)

where \(N_{\text {err}}\) represents the number of misclassified vertical and horizontal partitions, and \(N_{\text {all}}\) represents the total number of tested CUs. Accordingly, the correct ratio, \({\mathcal {R}}_{\text {acc}}\), is calculated as \({\mathcal {R}}_{\text {acc}} = 1 - {\mathcal {R}}_{\text {err}}\). These two indicators estimate how closely the decisions of our method match those of the original scheme.
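
For completeness, Eq. (5) and the derived accuracy can be written as the following small sketch; the counts in the usage line are made-up numbers for illustration only:

```python
# Direct reading of Eq. (5): the share of CUs whose measured texture direction
# disagrees with the splitting direction chosen by the exhaustive RDO search.

def error_and_accuracy(n_err, n_all):
    r_err = n_err / n_all * 100.0          # R_err, percentage of misclassified CUs
    return r_err, 100.0 - r_err            # (R_err, R_acc), both in percent

print(error_and_accuracy(150, 1000))       # (15.0, 85.0)
```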

The threshold plays a crucial role in the proposed scheme. If the \({\mathcal {T}}_{h}\) value is too small, the algorithm obtains greater complexity reduction, because more partition modes are determined in advance. If the \({\mathcal {T}}_{h}\) value is too large, the coding performance is better, but the computational complexity increases. To test the prediction accuracy, the encoding results of \( 32 \times 32\) and \( 16 \times 16\) CUs are collected from “BasketballDrive”, “BQMall”, and “BasketballPass” under QP = 27 and QP = 37. The relationship between the threshold and \({\mathcal {R}}_{\text {acc}}\) is demonstrated in Fig. 8.

Fig. 8

Ablation results of the prediction accuracy under various thresholding values

Moreover, we also carry out experiments to reveal the partitioning results under different QPs. As shown in Fig. 9, the number of CUs encoded as \(32 \times 32 \) increases with a higher QP, while the number of CUs encoded as \(16 \times 16 \) decreases. This indicates that the threshold value can be updated as the QP increases. In view of the RD performance, since the misclassification of a larger CU generates a larger BD rate increase than that of a smaller CU, we adopt a stricter condition for high QPs. In our method, when the QP value is greater than 28, the threshold \({\mathcal {T}}_{h}\) is set to 1.7; otherwise, \({\mathcal {T}}_{h}\) is set to 1.2. As presented in Fig. 8, the overall accuracy of the proposed method ranges from 80% to 90%. To simplify the modeling process, \({\mathcal {T}}_{l}\) is defined as the reciprocal of \({\mathcal {T}}_{h}\), because the numbers of vertical and horizontal partition modes are generally of a similar magnitude for a specific sequence.
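
The QP-dependent threshold rule described above can be written compactly as:

```python
# Threshold selection as described above: a stricter T_h for QP > 28,
# and T_l taken as the reciprocal of T_h.

def mt_thresholds(qp):
    t_h = 1.7 if qp > 28 else 1.2
    return t_h, 1.0 / t_h                  # (T_h, T_l)

print(mt_thresholds(27))   # (1.2, 0.833...)
print(mt_thresholds(32))   # (1.7, 0.588...)
```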

Fig. 9

Illustrations of the relationship between the block partition and quantization parameters. Two different CU sizes (\(16 \times 16 \) and \(32 \times 32 \) ) and four different QPs (22, 27, 32, 37) are evaluated in the experiments

3.4 The overall algorithm

The pseudo-code of the overall algorithm is summarized in Algorithm 1. The main idea is as follows. If the size of the current CU is \(32 \times 32 \) or \(16 \times 16 \), the fast algorithm is performed for intra-prediction. We calculate the \({\mathcal {C}}_{\text {mad}}\) values for all sub-blocks to measure the complexity in the vertical and horizontal directions, and the vertical-to-horizontal ratio \( {\mathcal {R}}_{\text {ver/hor}}\) is calculated to estimate the texture direction. If the \( {\mathcal {R}}_{\text {ver/hor}}\) value is larger than the higher threshold \({\mathcal {T}}_{h}\), the vertical MT partition modes, BV and TV, are skipped. If the \( {\mathcal {R}}_{\text {ver/hor}}\) value is less than the lower threshold \({\mathcal {T}}_{l}\), the horizontal MT partition modes, BH and TH, are skipped. Otherwise, the texture direction is considered not apparent, and the default partition modes of the VVC reference software are kept. The flowchart of the proposed algorithm is shown in Fig. 10.
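
Putting the pieces together, a hedged end-to-end sketch of the decision in Algorithm 1 (reusing the `mad`, `smad_ratio`, and `mt_thresholds` helpers from the earlier sketches, and not the actual VTM integration) could look as follows:

```python
import numpy as np

# Illustration only: return the set of MT split modes that should still be
# evaluated by the RDO loop for a given luma CU and QP, following the rule
# described above. smad_ratio() and mt_thresholds() come from the earlier sketches.

ALL_MT = {"BV", "TV", "BH", "TH"}

def prune_mt_modes(cu, qp):
    """cu: 2-D luma block (numpy array); qp: quantization parameter."""
    h, w = np.asarray(cu).shape
    if (w, h) not in {(32, 32), (16, 16)}:    # fast decision only for these sizes
        return set(ALL_MT)
    t_h, t_l = mt_thresholds(qp)              # QP-dependent thresholds
    r = smad_ratio(cu)                        # vertical-to-horizontal SMAD ratio
    if r > t_h:                               # horizontal texture: skip BV and TV
        return {"BH", "TH"}
    if r < t_l:                               # vertical texture: skip BH and TH
        return {"BV", "TV"}
    return set(ALL_MT)                        # unclear texture: keep all MT modes
```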

Algorithm 1 The overall fast CU partition decision algorithm (pseudo-code)
Fig. 10

Flowchart of the proposed algorithm

4 Experimental results

To evaluate the performance of the proposed fast method, we conduct experiments on the VVC reference software [1]. A total of 22 common test sequences belonging to six classes with various resolutions are used: A1 \((3840\times 2160)\), A2 \((3840\times 2160)\), B \((1920\times 1080)\), C \((832\times 480)\), D \((416\times 240)\), and E \((1280\times 720)\). The encoding parameters follow the settings recommended by the JVET common test condition (CTC) [5], with QP = 22, 27, 32, and 37, respectively.

The experiments are carried out on a computing platform with an Intel(R) Core(TM) i5-6200U CPU @ 2.30 GHz and 8 GB RAM, using the Microsoft Visual Studio C++ 2017 compiler and the open-source computer vision (OpenCV) 3.4.1 library. The encoding efficiency is measured by the Bjøntegaard delta peak signal-to-noise ratio (BDPSNR) and the Bjøntegaard delta bit rate (BDBR) [4]. The time saving, \(T_s\), is measured by

$$\begin{aligned} T_s = \frac{{T_{o} - T_{p} }}{{T_{o} }} \times 100\%, \end{aligned}$$
(6)

where \(T_o\) denotes the total encoding time of the original VVC, and \(T_p\) denotes the total encoding time of the proposed method.

4.1 Overall performance

Table 2 provides the overall encoding results of the proposed method. The proposed texture-based fast decision strategy achieves time savings from 26.55% to 35.63% for the various video sequences, with an average encoding time saving of 30.33% compared with the anchor. More importantly, the complexity reduction has a negligible effect on the encoding efficiency: only a 0.57% BDBR increase and a 0.03 dB BDPSNR decrease on average.

Table 2 Experimental results of the proposed method under the all-intra configuration

4.2 Rate-distortion comparison

Figure 11 illustrates the overall rate-distortion curves of six typical sequences from Classes A1 to E: “Campfire” (\(3840 \times 2160 \)), “ParkRunning3” (\(3840 \times 2160 \)), “BasketballDrive” (\(1920 \times 1080 \)), “BQMall” (\(832 \times 480 \)), “BasketballPass” (\(416 \times 240 \)), and “Johnny” (\(1280 \times 720 \)) with QP = 22, 27, 32, and 37, respectively. It can be seen that the RD curves of the original VVC anchor and the proposed method are approximately identical for all video sequences. The results verify that the proposed scheme reduces the computational complexity while maintaining the coding efficiency.

Fig. 11

Rate-distortion curve comparison between the proposed method and the VVC anchor reference software

4.3 Complexity reduction efficiency comparison

Complexity reduction efficiency is an important indicator that measures the complexity reduction against the bit-rate increment. It is defined by

$$\begin{aligned} E_c = \frac{{T_{s} }}{{\text {BDBR} }} \times 100\%. \end{aligned}$$
(7)
Table 3 Comparison results of the proposed algorithm and three recent advances on four QPs (22, 27, 32, and 37)

The performance comparison results in terms of complexity reduction efficiency are summarized in Table 3. Three recent advances are used for comparison, including FU2019 [13], CHEN2019 [9], and YANG2019 [32]. Since CHEN2019 [9] only performed experiments on 8-bit sequences, we provide the results on Classes B–E for a fair comparison. FU2019 reduces the encoding time by 42.36% at the cost of a 0.93% BDBR increase, CHEN2019 saves 52.48% of the encoding time at the cost of a 1.59% BDBR increase, and YANG2019 achieves a 52.98% complexity reduction at the cost of a 1.86% BDBR increase.

In terms of the complexity reduction efficiency, our method outperforms the three methods with an average performance of \(E_c=54.60\%\). The other three methods have noticeably lower efficiency: the average complexity reduction efficiency results of FU2019 [13], CHEN2019 [9], and YANG2019 [32] are 45.44%, 33.08%, and 28.47%, respectively.

5 Conclusion

In this paper, we propose a highly efficient, low-complexity block partition scheme for VVC intra coding. We evaluate the block texture complexity in the vertical and horizontal directions by the sum of the mean absolute deviations of sub-blocks. The relationship between the vertical and horizontal texture complexities is statistically analyzed for \(32\times 32\) and \(16\times 16\) blocks, which verifies the relevance between the directional texture complexity feature and the best partition mode. Experimental results demonstrate that the proposed method achieves a complexity reduction efficiency of up to 103.34% under the common test condition.