1 Introduction

With the development of capture and display technologies, High Definition (HD) video is gradually applied into many fields, such as, television, games, education, virtual reality and augmented reality [37, 38]. The data amount of HD video is tremendous, which exceeds the network transmission bandwidth. It is critical to develop efficient compression technologies. The most well-known compression methods are the Joint Photographic Experts Group (JPEG) and JPEG 2000 [7,8,9]. These methods are based on transform coding and perform better than others. However, they are only suitable for image. To efficiently store and transmit video signal, many video compression standards such as MPEG4, H.264/AVC [28] are successively proposed. Recently, ISO and ITU have jointly established a new video coding standard, High Efficiency Video Coding (HEVC) [35]. Compared with traditional standard H.264/AVC, HEVC achieves half bitrate saving for high definition video with same reconstructed video quality.

In HEVC, intra coding [12] is employed to exploit the spatial redundancy. Compared with H.264/AVC standard, HEVC intra coding not only introduces the quad-tree-based recursive partitioning and fine-grain intra prediction technology, but also adopts advanced Coding Unit (CU), Prediction Unit (PU) and Transform Unit (TU) [40]. Although the use of these new technologies and coding tools provides high flexibility for encoding the video with smooth or complex content, it also brings huge encoding complexity, which hinders the real-time application of HEVC.

Many algorithms have been proposed to reduce the computational complexity for H.264/AVC, which can be roughly classified into two categories. The first category [11, 16, 36] reduces the impossible candidate modes by making full use of statistical Rate-Distortion (RD) cost, dominant edge direction and the spatial and temporal correlation. The second category [10, 14, 15] reduces the computational complexity of motion estimation. However, these methods cannot be directly applied to HEVC due to the introduction of new coding tools and technologies.

Recently, a number of fast algorithms have been proposed to reduce the computational complexity for H.265/HEVC. These algorithms can be classified into two types, CU depth early termination or skip and fast intra mode decision. For CU depth early termination or skip, numerous algorithms have been developed by analyzing the texture complexity of video content, spatiotemporal depth correlation and RD cost. In [34], the current CU is firstly classified as homogeneous or complex according to texture homogeneity measured by mean absolute deviation. Then, the classification result and coding information from neighboring coded CUs are used to determine the optimal CU size. In [30], the CU smoothness parameter is generated based on the variance of the SADs in horizontal, vertical, left and right diagonal directions. According to the variance, the CU split process is early terminated or skipped based on two predefined thresholds. In [26], the global and local edge complexity in horizontal, vertical, left and right diagonal directions are proposed and used for CU splitting. If the global and local edge complexity are lower than predefined thresholds decided by the wrong hit rate, the searching of CU split is early terminated. In [2], early termination of simple CTU is achieved according to the edge information detected by Sobel operator. In [24], the unnecessary coding block will be skipped adaptively by calculating the texture complexity for each coding depth. In [25], an adaptive double thresholds scheme is employed to classify the video content. Then, the classification and depth information of temporally co-located CU and the spatially neighboring CUs are jointly utilized to constrain the CTU depth range. In [33], some specific depth levels rarely used in spatially neighboring CUs are skipped. In [32], the depth information of spatially neighboring CUs is exploited to make an early CU splitting decision or CU pruning decision. In [43], based on the temporal correlation, the depth information of temporally co-located CTU is used to predict the size of current CTU. In [39], the aggregated RD cost of sub-CUs is estimated according to Hadamard cost and RD cost of partial sub-CUs. If the aggregated RD cost is larger than the RD cost of current CU, the CU split process is early terminated. In [5], a rough mode cost (RMC) based algorithm is proposed for early CU splitting or termination. Whether to split or terminate the current CU is decided by comparing the least RMC value with the adaptive threshold.

For fast intra mode decision, related works have been proposed by analyzing the CU texture direction, RD cost and spatial mode correlation. In [24], the pixel values deviation technique is utilized to obtain the dominate texture direction for each PU. Based on the dominate direction, the number of candidate modes for Rough Mode Decision (RMD) and RD Optimization (RDO) is reduced from 35 to 13 and 8 to 3, respectively. In [33], the RD cost and prediction mode correlations among different depth levels or spatially nearby CUs are exploited to reduce some prediction modes. In [32], the prediction mode correlation between the higher layer PU and current layer PU is utilized to terminate some prediction modes which are rarely selected as the optimal mode. In [43], the dominant direction of the block is detected by discrete cross difference to reduce the prediction modes for RMD process. In addition, four early termination methods are proposed to terminate the RDO process properly. In [39], a progressive rough mode search method is proposed to check the potential modes instead of traversing all modes. Besides, an early RDO skip scheme is introduced for further complexity reduction. If the mode distance between current mode and any evaluated mode is lower than 2, the current mode is excluded for RDO process. In [5], the ratios between the least RMC value and all other RMC values are employed to reduce the number of intra modes for RDO process. Based on the correlation between Hadamard cost and RD cost, a selective function is established to reduce some modes for RDO process in [17]. In [31], the dominant texture orientation is detected by computing the local directional variance along a set of co-lines. According to the dominant texture orientation, a reduced number of candidate modes are selected to be further checked by RDO process. In [27], the prediction modes for RMD are decreased based on mode correlation among upper CU or spatially nearby PUs. For further complexity reduction, an early termination method for RD optimization (RDO) and RQT processes is introduced by an adaptive threshold based on the average value of RD costs. In [18], a two-step fast mode decision algorithm is proposed for HEVC. Firstly, the depth information is used to skip some intra modes for RMD process. Then, the order of RMD and MPM is adjusted to skip some unnecessary check for RDO. In [1], the texture of block diagonals is used to reduce the number of modes for RMD process.

In addition to traditional statistical methods mentioned above, many learning methods are extensively used in prediction and recognition [19,20,21,22,23, 29]. Some advanced learning algorithms are introduced for fast video coding. In [41, 42, 44], the authors modelled CU partitioning and mode selection as a binary or multi-class problem. Then, joint Support Vector Machine (SVM) based classifier is applied to early CU termination, early CU split decision and fast mode selection. In [13], Kim et al. propose a fast CU size decision algorithm, which early terminates the CU split process based on Bayesian decision rule. In [6], a decision tree is built through data mining techniques to decide whether the CU split process should be early terminated or not. For these learning algorithms, multiple features can be jointly used. Besides, sophisticated learning algorithms can improve the prediction accuracy when compared with traditional statistical methods.

For the content complexity-based fast CU size algorithm proposed in previous work, the variance or Sobel gradient are used to constrain the depth range of CTU. These algorithms fail on video content with single direction texture, which can be predicted well by corresponding angular mode. For the correlation based fast CU size algorithm mentioned above, the depth information of spatially neighboring CUs or co-located CU is exploited to accelerate the CU split process. However, these algorithms do not make full use of the intra prediction mechanism characteristic in HEVC. After RMD process, some methods are proposed to reduce the number of modes checked by RDO process based on the information of mode candidate list. These early termination algorithms could make the optimal prediction mode skipped, which leads to considerable loss of RD performance.

To alleviate abovementioned drawbacks, we propose a simple and effective intra coding algorithm based on depth range prediction and mode reduction. In HEVC intra coding, up to 35 modes are used for prediction, and each of them can be used to predict a block with single direction texture. There exists implicit relationship between intra modes and video texture characteristics. After encoding, it can be inferred that the texture of current block is rich if various intra modes are included. Based on this observation, instead of variance and Sobel gradient, a novel feature is proposed to measure the complexity of video content. Then, the relationship between this feature and CTU depth range is modeled. According to the model, the unnecessary CU split operations are skipped. After RMD process, the candidate modes in list are sorted in descending order according to Hadmard cost. That is, the modes rank ahead have high probability to be selected as optimal mode. Conversely, the modes rank behind in candidate list are rarely chosen as best mode. For each CU, the prediction is conducted only when the split flag of parent CU is set true. Based on aforementioned analyses, a fast mode decision algorithm is proposed to reduce the encoding complexity. For the prediction of unlikely CU, the last few intra modes in candidate list are eliminated before RDO process. Experiment result shows that the proposed overall algorithm can save significant encoding time reduction with negligible loss of RD performance, achieving well trade-off between complexity reduction and RD performance. This paper has two main contributions. 1) A novel feature is proposed to measure the complexity of video content. Based on the analysis about the feature, we build a depth prediction model for CTU. 2) We propose a fast mode decision algorithm for RDO process based on the statistical analyses of CU distribution structure.

The rest of this paper is organized as follows. In section 2, the procedure and its complexity analysis of HEVC intra coding is presented. Section 3 presents a fast intra coding algorithm based on correlation between PU prediction and CU split. The experimental result and conclusion are illustrated in Section 4 and Section 5, respectively.

2 HEVC intra coding and complexity analysis

To improve compression performance, HEVC introduces many novel technologies with the cost of rapid increase in computational complexity at encoder side. The significant computational load for intra coding is primarily due to its quad-tree-based recursive partitioning and fine-grain intra prediction [5]. The retailed analysis is presented in this section.

Compared to traditional encoding standard, HEVC still adopts block-based encoding style. Firstly, each frame is divided into non-overlapping CTUs. Then, flexible quad-tree structure is employed to recursively split each CTU into different size CUs. The size of CU can be 64 × 64, 32 × 32, 16 × 16 and 8 × 8, corresponding to depth of 0, 1, 2 and 3. In CU split process, each CU can be further split into one or multiple PUs and TUs by prediction and transform.

In HEVC, the CU split and mode decision are determined by the RDO rule, which represents a trade-off between the bit-rate and reconstructed quality. Aiming at achieving the best RD performance, HM encoder traverses every possible CU combination to get the optimal split scheme which leads to minimum RD cost. Figure 1 shows the recursive split of a CTU. Block with different color represents different depth CUs. The number and red line inside each CU represents the traversing order and split flag, respectively. The specific split process can be described in two steps. Firstly, traverse every depth CU by the Z-scan order and calculate RD cost. And then, backward pruning is performed according to the RD cost until the coding structure of CTU is determined.

Fig. 1
figure 1

Diagram for CTU split

Intra prediction is executed to get the optimal intra mode by encoder at each depth of CTU. It includes RMD, Most Probable Mode (MPM) derivation and RDO process, as show in Fig. 2. Firstly, RMD is performed to select N candidate modes according to the Hadamard cost, where the value of N is set 3, 3, 3, 8 and 8 for 64 × 64, 32 × 32, 16 × 16, 8 × 8 and 4 × 4 PU, respectively. After this process, the number of intra modes for RDO is reduced from 35 to N. Then, additional MPMs from left and above PUs will be added into candidate list if they are not included yet. Due to high correlation of spatially nearby PUs, this process can improve the coding efficiency. In the end, full RDO process is executed to choose the optimal mode from candidate list.

Fig. 2
figure 2

Intra prediction process

With the above analyses, the decision of coding parameters for each CTU requires encoder traversing different depth CUs, which needs 85(1 + 4 + 16 + 64) recursions. At each depth of CTU, PU prediction is executed to get the optimal intra mode. At depth 3, PU can be further divided into four sub-PUs. HEVC intra coding supports 35 prediction modes, including DC, Planar and 33 angular modes. For luminance component, 11,935 rough mode predictions for RMD process and at least 2623 refined mode predictions for RDO process need to be performed to determine the optimal mode for each CTU [24]. And the RDO process includes a series of complex processes such as prediction, transform, quantization and reconstruction. Hence, the introduction of new coding tools and technologies in HEVC requires massive computing resources, which blocks the practical encoding applications.

3 Proposed algorithm

In order to reduce the computational complexity for HEVC intra coding, an optimization algorithm is proposed. Figure 3 shows the overall flowchart of proposed algorithm, where C is current CTU and DL is the CU depth level. The overall algorithm includes Mode-based CTU Depth Range Prediction (MCDRP) and Distribution-based Intra Mode Fast Selection (DIMFS). The MCDRP is used to predict probable depth range of current CTU based on mode information obtained from a previous frame. For each depth, the DIMFS algorithm is performed to reduce the candidate modes for RDO process. Before encoding a CTU, the proposed MCDRP algorithm is performed to determine the DL range. The DL range of original HM14.0 platform is from 0 to 3. After depth prediction, the DL range is from min to max. For each depth of DL range, the DIMFS algorithm is recursively conducted. Since the two algorithms work for different stage, the overall algorithm can achieve considerable encoding time reduction.

Fig. 3
figure 3

Overall flowchart of proposed algorithm

3.1 Mode-based CTU depth range prediction algorithm

In HEVC, CTU split process needs to traverse all depth CUs, which exists large depth redundancy for CTUs whose maximum split depth is only 0, 1 and 2. The unnecessary split operations will be reduced if the depth range of CTU can be predicted before encoding. This paper utilizes the mode information of co-located CTU from a previous frame to predict the depth range of current CTU.

Because the contents between successive frames of each video are almost the same, the encoding parameters adopted by temporally co-located CTU and current CTU are similar [43]. For example, Fig. 4(a) shows the optimal split result for the second and third frames of BasketballPass sequence when QP is 32. As shown in the figure, the split depth between co-located CTUs is quite similar. HEVC adopts 35 prediction modes for intra coding, and each of them represents a directional texture. It is obvious that one CTU includes rich texture if many prediction modes are used. For nature video, the split structure of the region with rich texture tends to be complex. The number of different modes used for a CTU is defined as R. Figure 4(b) shows the mode distribution for a CTU with R equals to 12, where the number inside CUs represents a prediction mode. Figure 4(c) lists the six CTUs and its R value from the third frame of BasketballPass sequence. With the increase of R value, the split structure of CTU is more complex. Above analysis suggests that the R value of co-located CTU from a previous frame can be used to predict the depth range of current CTU.

Fig. 4
figure 4

Encoding parameters for the second and third frames for BasketballPass sequence

It is obvious that the smooth region is more inclined to be coded by large CUs. In contrast, the region with rich texture is more likely to be coded into small CUs. Firstly, this paper classifies different CTUs into three categories according to depth information. A CTU is classified as simple CTU (SCTU) when all of CUs take a depth no larger than 1 and as complex CTU (CCTU) when all of CUs take a depth no smaller than 2. All other CTUs are defined as median CTU (MCTU). It is noted that the SCTU has the maximum R value 4 when all of CUs inside CTU take depth 1 and the CCTU has the maximum R value 16 when all of CUs inside CTU take depth 2. Then, the relation between R and CTU Complexity Type (CT) can be modeled as

$$ CT=\left\{\begin{array}{l}\mathrm{SCTU},\kern0.5em 1\le R\le 4\\ {}\mathrm{MCTU},\kern0.5em 4<R<16\\ {}\begin{array}{cc}\mathrm{CCTU},& 16\le R\le 35\end{array}\end{array}\right. $$
(1)

To analyze the relation between the practical split depth and classification results, the optimal depth information of SCTU, MCTU and CCTU is investigated respectively, as shown in Table 1. The nd3 column represents the percentage of CU whose depth smaller than 3 while the nd0 and nd01 columns show the percentage of CU with depth not less than 1 and depth not less than 2. The average value of nd3 is 93.1% for SCTU. For MCTU and CCTU, the average value of nd0 is 98.0% and 100.0%, respectively. Furthermore, the average value of nd01 for CCTU is 89.9%. That is, SCTU is more inclined to be coded by large CUs. Conversely, the small CUs are better for MCTU and CCTU. A state-of-art depth prediction algorithm is proposed by Zhao [43], which the depth information of co-located CTU is used. The algorithm is implemented for comparison under the same condition, as shown in Table 1. Whether the CTU belongs to SCTU or CCTU, the accuracy of proposed algorithm is higher than Zhao’s algorithm. To reduce encoding complexity as much as possible, the relation between the CT and depth range (DR) is modeled as

$$ DR=\left\{\begin{array}{l}\left[0,2\right],\kern0.5em CT\in \mathrm{SCTU}\\ {}\begin{array}{cc}\left[\;1,3\right],& CT\in \mathrm{MCTU}\end{array}\\ {}\begin{array}{cc}\left[2,3\right],& CT\in \mathrm{CCTU}\end{array}\end{array}\right. $$
(2)
Table 1 Optimal depth distribution of different CTUs (%)

In this way, the average accuracy of proposed algorithm can reach to 89.9%, guaranteeing the quality of video.

3.2 Distribution-based intra mode fast selection

Refined mode prediction process is one of main responsible for the increase of encoder’s complexity. Although the predefined 35 modes are reduced to 3 or 8 by RMD algorithm, the encoding complexity is still high and there exists some redundancy. In HEVC intra coding, a PU partitioning structure has its root at the respective CU level. This means that the PU prediction depends on the CU structure. If the split structure of CTU is known, the unnecessary predictions can be skipped without changing the optimal result. For example, if the DR value of one CTU is [9, 38], encoder only needs to traverse the depth from 2 to 3. In this section, we proposed to reduce the complexity of prediction process by exploring the correlation between CU structure and mode prediction.

Using the HEVC test model HM14.0, 5 different resolution sequences from class A to class E given by common test conditions [4] are tested to analyze the CU distribution structure, as shown in Table 2. The average percentage of CU from 0 to 3 depth is 8.12%, 25.64%, 26.67% and 39.58%, respectively. That is, the 0 and 1 depth CUs have a high probability to be split into smaller CUs. For CU at 3 depth, there is only 39.58% percentage to be selected as the optimal depth. There is no need to check all candidate modes for RDO process when the current CU has a lower probability to be selected as optimal depth. After RMD process, the modes inside candidate list are sorted by Hadamard cost in descending order. The distribution of Hadamard cost of a PU is highly correlated with the distribution of RD cost [17]. This implies that the mode ranked behind in candidate list has a low probability to be selected as the best mode by RDO process.

Table 2 Statistics of optimal CU structure (%)

Based on above analysis, a simple and effective algorithm is proposed to reduce the number of candidate mode for RDO process. Firstly, the prediction accuracy of different size PU is calculated when adjusting the N value, as shown in Fig. 5. And then, we propose revising the accuracy by taking the CU structure into consideration. We define a set ψ = {mode1, mode2, ∙∙∙, modeN} represents the first N modes in the candidate list. The original accuracy and revised accuracy are denoted by P and Pr, and they are calculated as shown in Eqs. (3) and Eqs. (4), respectively. Where nN is the number of PU whose best prediction mode belongs to ψ and nHM is the total number of PU. The factor is a probability related to CU distribution. When the PU size is 64 × 64, the CU has a probability of 91.88% to be split. That means, the wrong predictions have the same probability not to influence the final prediction result. Therefore, the factor value of 64 × 64 size PU is set as 0.9188. Similarly, the factor value of 32 × 32, 8 × 8 and 4 × 4 size PU are 0.6625, 0.6142 and 0.6142, respectively.

$$ P=\frac{n_N}{n_{\mathrm{HM}}} $$
(3)
$$ {P}_r=P+\left(1-P\right)\times factor $$
(4)
Fig. 5
figure 5

Mode prediction accuracy when adjusting N value

As shown in Fig. 5, the accuracy can be improved a lot after revising. In the mode prediction process, the number of candidate mode can be reduced by eliminating such structural redundancy. Under the condition that the revised accuracy is less than 3%, the N value of 64 × 64, 32 × 32, 8 × 8 and 4 × 4 size PU are sequentially set to 1, 2, 4, and 4. After processing, the calculations for the RD cost of each CTU is reduced from 2623 to 1337(1 + 2 × 4 + 3 × 16 + (64 + 256) × 4 = 1337), which significantly reduces the complexity of RDO process. Compared with 32 × 32 PU, the factor for 16 × 16 PU is lower than 40%, and for this reason the proposed algorithm is not applied for 16 × 16 PUs.

figure g

According to the above analyses, the proposed whole algorithm could be summarized in Algorithm 1. Firstly, the R value of current CTU is calculated based on the mode information obtained from a previous frame. Then, the min_depth and max_depth of current CTU are determined according to Eqs. (1) and (2). On the basis of proposed DIMFS algorithm, the number of modes for RDO process is predetermined. For CU depth from 0 to 3, the number of modes for RDO process is set 1, 2, 3 and 4, respectively. In the end, the prediction is recursively conducted at CU depth from min_depth to max_depth.

4 Experimental results

To verify the performance of the proposed CU size and PU mode decision algorithms, we implemented them on the HEVC reference platform HM 14.0 with the All-intra-main configuration. All the coding experiments are performed on the computer with Inter(R) Core(TM) i5–4590 CPU @3.3GHz and8GB RAM, Windows 7 64-bit operating system. Four QPs, 22, 27, 32 and 37 are used. The Size of largest CU and smallest CU are 64 × 64 and 8 × 8, respectively. The GOP size is 1 and the other coding parameter settings follow the definitions in common test condition [4]. In addition, encoding time is used to measure the computational complexity, and time saving ratio (∆T) is employed to measure the complexity reduction. RD performance is measured with BD-rate, which is defined in [3]. ∆T is defined as.

$$ \Delta T\left(\%\right)=\frac{1}{4}\sum \limits_{i=1}^4\frac{T_{\mathrm{HM}}^{QP_i}-{T}_{proposed}^{QP_i}}{T_{\mathrm{HM}}^{QP_i}}\times 100,{QP}_i=\left\{22,27,32,37\right\} $$
(5)

where \( {T}_{\mathrm{HM}}^{QP_i} \) and \( {T}_{\mathrm{Proposed}}^{QP_i} \)is the encoding time of original HM and proposed algorithm with QPi, respectively.

Twenty standard video sequences with various resolutions and contents, as shown in left part of Table 3, are encoded. Table 3 shows the encoding time reduction and RD performance of proposed algorithm. Compared with HM14.0, the proposed DIMFS and MCDRP algorithms can respectively save encoding time by 18.5% and 31.5% on average with just 0.23% and 0.24% BD-rate increase. The encoding time reduction of different sequence for MCDRP algorithm exists obvious difference and for DIMFS algorithm is almost the same. There are two mainly reasons for this phenomenon. On the one hand, the strategy adopted by the DIMFS algorithm between different sequences is same due to the global CU structure feature. On the other hand, the feature adopted by MCDRP algorithm is related to the video content. The video content of the partial sequence like Kimono is relatively simple and the CTU type is mainly SCTU. For such CTU, the complex prediction process of 8 × 8 and 4 × 4 PU will be early terminated according to 3.1 section. Therefore, the encoding time reduction for this kind of sequence is significant. The proposed overall algorithm can save encoding time by 43.2% on average with just a 0.47% BD-rate increase.

Table 3 RD performance and encoding time reduction of the proposed algorithm compared to HM14.0

To intuitively compare the RD performance of proposed algorithm and original HM14.0 algorithm, we draw the RD curves and local enlargements of “BasketballDrive” and “Kimono” which are shown in Fig. 6. The blue and red curves respectively represent the RD performance of original HM14.0 platform and our proposed algorithm. The two curves nearly overlap each other which implies that the RD performance of the proposed algorithm is almost the same with HM14.0 platform. In Table 3, the BD-rate increase of “BasketballDrive” is largest among all test sequences. Still, the RD performance degradation of ”BasketballDrive” is negligible. It proves that the proposed algorithm can guarantee the RD performance.

Fig. 6
figure 6

RD curve of “BasketballDrive” and “Kimono”

We compared the proposed algorithm with the state-of-art algorithms proposed by Shang [32] and Ismail [27]. As shown in Table 4, Ismail’s algorithm can save encoding time by 40.0% on average with 1.30% loss of BD-rate. Although additional threshold is employed to improve the accuracy, RDO early skip algorithm could make several optimal modes skipped, which results in significant loss of RD performance. Shang’s algorithm can save encoding time by 37.9% on average with only 0.66% loss of BD-rate. Compared with Ismail’s algorithm, the proposed algorithm can achieve 3% more time reduction. Meanwhile, the BD-rate loss of our algorithm is 0.83% lower than Ismail’s algorithm. Compared with Shang’s algorithm, the proposed algorithm can achieve more 5% time reduction. Meanwhile, the loss of BD-rate of our algorithm is 0.19% lower than Shang’s algorithm. Especially for large definition sequences including “Nebuta”, “SteamLocomotive”, “Kimono”, etc., the proposed algorithm can achieve 10% more time reduction than these reference algorithms.

Table 4 Performance comparison between proposed algorithm and other two state-of- art algorithms

Figure 7 shows the BD-rate and time saving among Shang [32], Ismail [27] and the proposed algorithm. Nine sequences with different resolutions and contents are contained. The sequence number from 1 to 9 are “PeopleOnStreet”, “Traffic”, “BQTerrace”, “Cactus”, “Kimono”, “BasketballDrill”, “BQMall”, “BasketballPass”, “BlowingBubbles”, respectively. It is observed that the loss of BD-rate of proposed algorithm is lower than 0.60%. The RD penalty of the proposed algorithm is always minimal among three algorithms. For most sequences, the difference of time reduction among three algorithms is quiet small. For sequence Kimono, it is evident that the time reduction of proposed algorithm is better than other two algorithms. In addition, we can observe that RD penalty of Ismail’s algorithm is large for some sequences such as “BasketballDrill”, “BQMall”, “BasketballPass” and “BlowingBubbles”. This mainly because the threshold for RDO early skip algorithm is not always effective for all sequences. In the proposed algorithm, the strongly temporal correlation is exploited to constrain the depth range of each CTU, rather than to exhaustively search at each depth level. In the mode decision, some unlikely candidate modes are skipped, which make full use of the intra prediction mechanism characteristic. Therefore, compared to the benchmark, the encoding speed is accelerated, and the RD performance is guaranteed.

Fig. 7
figure 7

BD-rate and time saving among reference algorithms and proposed algorithm

Besides the two classical algorithms above, we also select the two latest learning algorithms proposed by Zhang [41] and Zhang [42] for comparison, as shown in Table 5. It is observed that these two learning algorithms can respectively save 53.9% and 52.8% encoding time on average with 0.70% and 1.58% BD-rate penalty. In terms of time saving, the proposed algorithm is lower than learning algorithms, and superior to Ismail’s algorithm and Shang’s algorithm. For the comparison of RD performance, the BD-rate penalty of proposed algorithm is far superior to other algorithms. To make a fair comparison, the ratio of BD-rate to ∆T proposed by Correa [6] is employed to measure the overall performance of various algorithms. It is obvious that the lower BD-rate/∆T ratio represents the better performance as it is interpreted as a good balance between low penalty and high speed up. It is observed that the BD-rate/∆T value of proposed algorithm is 0.011 which is least among these algorithms. Although the complexity reduction of proposed algorithm is still not as much as learning algorithms, the proposed algorithm can achieve the better trade-off between RD performance and complexity reduction than current state-of-art algorithms.

Table 5 Performance comparison between the existing algorithms and the proposed algorithm

5 Conclusions

To reduce the complexity of HEVC, this paper proposes a simple and effective intra coding algorithm based on depth range prediction and mode reduction. In the CU depth early termination or skip, the number of different modes adopted by co-located CTU from a previous frame is employed to predict the depth range of current CTU. In the mode selection, CU distribution structure is employed to reduce the number of candidate mode for RDO process. Experimental results show that the proposed overall algorithm can save encoding time by 43.2% with negligible rate penalty of 0.47%. Compared with current state-of-art algorithms, the proposed algorithm can achieve the better trade-off between RD performance and complexity reduction. In the future, we will try hard to achieve more encoding time reduction while maintain the coding efficiency.