1 Introduction

High Definition (HD) and Ultra High Definition (UHD) video can bring rich details to us in our daily lives. Unfortunately, the huge size of such contents also creates considerable burden for the transmission system and the storage system. The state-of-the-art H.264/AVC standard may not be still the best choice for these mass data compression. To meet the new demands, ITU-T and ISO/IEC jointly called for the next-generation coding standard and formed the High Efficiency Video Coding Test Model (HM) in 2010 [24]. A formal milestone of the H.265/HEVC project is that the committee draft has been approved for balloting in 2012 [5]. The standardization has been published by ISO/IEC and ITU-T in 2013 [4].

Intra coding in H.265/HEVC is also made up of hybrid framework: prediction coding, transform coding and entropy coding [15]. There are three types of blocks in H.265/HEVC: coding unit (CU), prediction unit (PU) and transform unit (TU) [23]. CU is a basic coding unit which is similar to the macro block in H.264/AVC [25]. The size of CU in intra coding can range from 64 × 64 (Largest Coding Unit, LCU) to 8 × 8 (Smallest Coding Unit, SCU). LCU can be further split into smaller CUs according to the Quad-tree scheme [15]. PU is used for the prediction unit and TU is defined as transform unit. PU can also be split into TU based on the Quad-tree structure. The structure of the block partition method establishes complex interactions among the CU, PU and TU. The prediction mode of a block must be checked after each splitting. In addition, there are as many as 35 intra modes featured by fine directions in intra mode checking procedure [14]. Mode decision contributes more than half of the computational complexity of the intra frame coding in H.265/HEVC [17].

Recently, many fast intra coding methods have been proposed in different layers, which can be categorized as three classes: fast intra coding in CU layer [6, 13, 19, 20], in PU layer [1, 9, 12, 22, 27] and in CU&PU layer [18, 21, 26]. Since it is well-known that the complexity of CU layer is much higher than PU layer, most of the existed works mainly focus on the complexity reduction on CU layer. For instance, Kim [13] introduces an early termination method based on the statistics of rate-distortion (RD) cost in CU splitting process. If the RD cost is less than a threshold, the current CU is selected as the best one with the optimal size. [19] and [20] adopt weighted SVM and Bayesian decision rule to implement CU splitting early termination and fast coding unit size selection, respectively. Both of the methods can save the encoding time with negligible loss in performance.

Regarding the reduction of the number of modes for the RMD (Rough Mode Decision), [1] and [9] propose fast intra mode decision algorithm in PU layer, respectively. The proposed method in [1] is divided into two steps: coarse step and refinement step. The coarse step is to find the dominant direction by checking the intra directions with different angular step-size. In the refinement step, the angular step size reduction process is repeated until the best direction is confirmed. According to the direction energy strength of the four main directions at degree 0, 45, 90, and 135, Fang [9] chooses the direction with minimum energy as the best direction, and then constructs a set of 9 or 11 directions for rough mode decision rather than 35 candidate modes. Meanwhile, some authors devote themselves to reducing the number of candidate modes for the FMD (Fine Mode Decision) in PU layer. Zhang [27] analyzes the relationship between the texture characteristic of PU and its best coding mode, followed by a presentation of an adaptive strategy for fast mode decision in intra coding. The number of the modes participated in the FMD is controlled by the adjacent distance of angles. Shi [22] detects that the local salient modes, whose RMD costs have a significant drop compared with adjacent modes, tend to be promising competitors for the optimal mode. Thus, a few non-salient candidate modes are cancelled to reduce the number of the candidates for the FMD process. [12] presents a fast mode decision algorithm based on gradient direction. According to the distribution of the gradient-mode histogram, a small candidate mode set with 1, 2, 4, 8 and 10 is chosen for PU size from 64 × 64 to 4 × 4 in the RMD process and 1, 2, 3 and 4 in the FMD process.

In addition, other fast mode decision algorithms are used not only in CU layer but also in PU layer. Shen [21] skips some specific depth levels rarely used in spatially nearby CUs because optimal CU depth level is highly content-dependent. Meanwhile, by fully exploiting mode and RD cost correlations among nearby CU or depth levels, they can skip some modes in PU. In [18], the edge map built by Sobel edge operators is used to determine the range of CU size and the set of candidate modes. Zhang [26] proposes a fast intra mode decision algorithm with different schemes. At the micro-level, the Hadamard cost based rough mode search checks the potential modes instead of traversing all the candidates. At the macro-level, an early termination method is introduced if the estimated RD cost is larger than that of the current CU. [11] presents an outlier-based fast intra-mode decision in PU layer and a transparent composite model based split decision in CU layer. [28] proposes a hierarchical structure-based fast mode decision algorithm. It predicts the best CU size by analyzing the collocated block from a previous frame in CU layer and terminates the mode decision process in PU layer.

The current PU layer based technique tries to use the texture features of the adjacent blocks to limit the mode searching range. It is based on the assumption that the texture feature of the current block is similar to its neighbors. There are two main drawbacks of this strategy. Firstly, this strategy fails on images with fine details because the texture of the current block is far from its neighbors and it cannot be guaranteed that the optimal prediction mode of the center block falls in a subset of its neighbor blocks’ modes. Secondly, the feature based technique fails to characterize the uniqueness of the center block since the chosen subset comes from the neighbor blocks instead of itself.

In this paper, we propose a new strategy to fully explore the inherent features of the blocks to reduce the complexity in PU layer. We find that the distribution of SATD (Sum of Absolute Transformed Differences) cost of a block is highly correlated with the distribution of RD cost. This implies that the mode with large SATD cost has a low probability to be selected as the best mode by FMD. Therefore, the modes with large SATD cost can be cut before the PU prediction. To explore the relationship of the current block and its neighbors, we further proposed an AR (Auto Regression) model to estimate the mode cutting parameter adaptively. Meanwhile, a MPM (Most Probable Mode) joint fast mode selection algorithm is also proposed to further reduce the number of candidate modes for the FMD based on the inclusion relationship between MPM and the mode set after the RMD. Experimental results show that the proposed AR fast intra mode decision method can save encoding time by 32 % with negligible coding loss in all-intra main configuration.

The remainder of this paper is organized as follows: Section 2 presents the overview of mode decision in intra coding. In Section 3, an AR based fast mode decision method is presented. The experimental evaluation of the proposed technique is illustrated in Section 4. Section 5 briefly concludes our work.

2 Mode decision of intra coding

In order to exploit the spatial statistical dependencies in the source signal of a single frame, several new intra prediction modes are introduced to predict one block by referring to its neighboring parts, which is an efficient coding technique. Different from the intra prediction modes in H.264/AVC, the new intra prediction for H.265/HEVC has features of fine direction, various block size and better fractional pixel interpolation. In the encoder, intra prediction supports up to 33 directional modes, plus planar and DC prediction modes. To select the best mode for a block, each mode needs to be checked before coding. Obviously, computational complexity of the prediction is very high.

In the encoder, a best mode is selected by two procedures: RMD and FMD. The RMD is known as the fast mode decision and the FMD is the full RDO (Rate Distortion Optimization) mode checking. Figure 1 shows the scheme of the intra mode decision.

Fig. 1
figure 1

The scheme of the intra mode decision

The RMD process checks all the 35 intra modes to get the M promising candidate modes (also known as the rough mode set) based on the SATD cost. The value of M is predefined according to the size of current PU. M is set to {8, 8, 3, 3} for 4 × 4, 8 × 8, 16 × 16 and 32 × 32, respectively. The SATD cost function is defined as follows:

$$ SatdCost=SATD+{\lambda}_{mode}\cdot {B}_{mode} $$
(1)

where SATD is the absolute sum of Hadamard transformed residual of a PU. λmode is the Lagrangian constant and Bmode is the bit cost of the prediction mode. MPM derived from encoded neighbor PUs is also joined together with the rough mode set to form a fine mode set for full RDO mode checking. Finally, the FMD checks the fine mode set to obtain the optimal mode with minimal RD cost.

Both the RMD and FMD for a block need to estimate the distortion and coding bits. They also share the same prediction unit for calculating the distortion. Considering that the function of Hadamard transform in RMD and DCT transform in FMD is similar in essence, the major difference between them is the estimation of the coding bits. Therefore, the SATD cost is highly correlated with the RD cost. The mode with smaller SATD cost in the RMD has a higher probability to be selected as the best mode in the FMD. The average percentage of first two candidates selected as the optimal mode is as high as 79.31 % in some video sequences [27].

Figure 2 shows the SATD cost distribution and the RD cost distribution of a frame in Traffic sequence [3] with 2560 × 1600 resolutions in terms of the HM13.0 [12] intra main profile. In Fig. 2, the horizontal axis denotes the index of the mode and the vertical axis is the cost. The quantization parameter (QP) value is 32. The 9 modes (including the MPM) in Fig. 2 are the candidate modes selected by the RMD and they are also checked by the FMD. Obviously, the distribution of the SATD cost in Fig. 2a and c is similar to that of the RD cost in Fig. 2b and d, respectively. The best mode selected by the FMD can also be selected by the RMD.

Fig. 2
figure 2

The cost distribution of RMD and FMD for a block

Figure 3 provides the relationship between SATD and RD cost in various QPs (four QPs, 22, 27, 32 and 37) and sequences (class A to class E, 18 sequences). They are also including the statistics of different block sizes. Figure 3a is the probability of the optimal model falls into the first two candidates (two modes with minimum SATD costs) in different QPs and sequences. Figure 3b is the probability of chosen the first candidate as the optimal mode with different QPs and sequences. From the two figures, we can find that the SATD cost has a high correlation to the RD cost. The mode with smaller SATD cost in the RMD has a higher probability to be selected as the best mode by RD.

Fig. 3
figure 3

The relationship between SATD and RD cost

In intra coding of the encoder, the high complexity of the mode decision procedure is mainly caused by the FMD process due to the complicated RDO checking. A direct way to decrease the complexity of intra coding is to control the number of the modes which are used by the FMD.

3 The proposed AR based fast mode decision

In this section, an AR based fast mode decision method for H.265/HEVC intra coding is proposed. Based on the relationship between the RMD and FMD, a selective model is proposed to remove some modes in the rough mode set. In this model, the parameter selection is critical. According to the upper and left block around current block, the parameter in the selective model is selected based on the AR model. An MPM joint fast mode selection algorithm is also introduced to reduce the number of the modes for the FMD.

3.1 AR based fast mode selection

As we have discussed in Section 2, the distribution of SATD cost of a block partly is highly correlated with the distribution of RD cost. The complexity of the FMD is higher than the RMD on account of the computation of RD cost. The mode with larger SATD cost has a lower probability to be selected as the best mode by FMD. If these modes are removed, the FMD process can save the encoding time considerably. We must have a strategy to decide which modes can be removed from the rough mode set.

Suppose there are m(m = 3 or 8) modes in the rough mode set selected by the RMD for current block. These modes are sorted in ascending order based on the SATD costs. In this sequence, SatdCost i is smaller than SatdCost i + 1. So, the ith mode has a higher probability to be selected as the optimal mode than the (i + 1)th mode according to the analysis in Section 2. If the ith mode is selected as a candidate mode for FMD process, the difference between SatdCost i and SatdCost i + 1 can be used to decide whether the (i + 1)th mode is also selected as a candidate mode for FMD. This rule can be formulated by the following formula:

$$ \begin{array}{l} SatdCos{t}_{i+1}- SatdCos{t}_i\le {\alpha}_j\cdot \frac{SatdCos{t}_{i+1}+ SatdCos{t}_i}{2}\\ {}\kern1em \left(2\le i\le 8,0\le \alpha \le 1,j=4,8,16,32\right)\end{array} $$
(2)

where α j is the parameter for current block. j is the size the block. (2) is used to estimate the difference between SatdCost i and SatdCost i + 1. If the ith mode has been selected for the FMD and formula (2) is true, which means SatdCost i + 1 is close to SatdCost i , we believe that the selection of the (i + 1)th mode as a candidate mode for FMD is reasonable. Otherwise, we can discard the (i + 1)th mode.

It is possible that many candidate modes are checked by the FMD if their SATD costs differ slightly in a block. In contrast, only a few modes are selected for the FMD if their SATD costs differ greatly. Therefore, the number of the candidate modes for the FMD is changing dynamically for different blocks. Obviously, this dynamical changing process is an adaptive procedure to select the candidate modes for the FMD. The number of candidate modes for the FMD is controlled by two parameters: distribution of SATD costs and adjusting parameter α. The latter one is a critical element in (2). If α increases, more candidate modes will be selected for the FMD and vice versa.

If the local area is similar and the adjusting parameters α of the neighbor coded blocks are known, we can decrease the value of the current α to reduce the number of candidate modes. Otherwise, we need to increase α to avoid the case that the optimal mode is not contained in the candidate mode set. The two situations imply us that the current α has a relationship with the neighbor coded blocks. However, the coding process is a causal scan of the image. In other words, the right and lower blocks are not available when we encode the current block. The adaptive selection of α only depends on the left and upper blocks. For simplicity reasons, we use linear weighting of the adjust parameters of neighbor blocks to estimate the α in the current block. This process in mathematics can be regarded as an AR process:

$$ \begin{array}{l}\kern4em {\alpha}_j={\displaystyle {\sum}_{k=0}^{\frac{j}{4}\cdot 2}{\omega}_k\cdot {\alpha}_k}\\ {}\left(0\le \alpha \le 1,0\le \omega \le 1,0\le k\le 32,j=4,8,16,32\right)\end{array} $$
(3)

where j denotes the size of the current block. α k is the adjusting parameter of the upper and left coded 4 × 4 basic unit. If the size of the optimal PU in upper block or left block is not 4 × 4, we can divide the PU into 4 × 4 basic units and the adjusting parameter of each basic unit is equal to that of the optimal PU. The adjusting parameter of the left-up block is also included in (3). The distribution map of α k and α j in spatial domain is given in Fig. 4.

Fig. 4
figure 4

The distribution map of adjusting parameters in spatial domain

In H.265/HEVC, the parameters in current PU always depend on the blocks around it. But experimental results show that the effects of the right-up and left-below blocks are negligible. Therefore, k is determined by the number of the adjusting parameters in left block, left-up block and upper block. When j is 4, 8, 16 and 32, the value of k is 3, 5, 9 and 17 respectively. ω k is the corresponding weight of α k and \( {\displaystyle {\sum}_{k=0}^{\frac{j}{4}\cdot 2}{\omega}_k}=1 \). The weight ω k relies on the distance d between the current block and the nearby block. The adjacent blocks are assigned the highest weight and the weight decreases with the increment of the distance. Therefore, we use the conventional Gaussian function exp(−d/δ) to weight the distance. The density function is defined as follows:

$$ {\omega}_k=\frac{1}{\sqrt{2\pi}\sigma }{e}^{-\frac{{\left(k-\mu \right)}^2}{2{\sigma}^2}} $$
(4)

where μ is the mean and σ is the variance of the distribution, which are empirically set to 10 and 5, respectively. The setting of the Gaussian parameters in Eq. (4) is based on a training process of the test sequence. To be specific, all the optimal α j can be obtained by traversing all the possible modes according to the Eq. (5).

$$ \begin{array}{l}{\alpha}_j=\frac{SatdCos{t}_{i+1}- SatdCos{t}_i}{\left( SatdCos{t}_{i+1}+ SatdCos{t}_i\right)/2}\\ {}\left(2\le i\le 8,0\le \alpha \le 1,j=4,8,16,32\right)\end{array} $$
(5)

The training is to find the ω k vector based on a set of α k . In (3), the initialization of the adjusting parameter in the boundary block of a frame is critical. In this area, the information of upper PU or the left PU is missed. We can also get a precise adjusting parameter α j by Eq. (5). When the RMD is completed, a SATD cost sequence in ascending order is obtained. After the optimal mode is selected by the FMD, we can find the SATD cost of this mode in the sequence. This value is defined as SatdCost i , and SatdCost i + 1 is the successive one in the sequence. In this situation, (2) is false. This means the mode of SatdCost i is selected in candidate mode set for FMD but the mode of SatdCost i + 1 and its successive modes are discarded.

The flow of AR based fast mode selection is illustrated in Fig. 5. For the boundary blocks, the adjusting parameters are initialized by (5). For other blocks, the rough mode set with 3 or 8 modes is obtained after the RMD. We can check these modes using their SATD costs by formula (2) until formula (2) is false or the last mode has been checked. The modes whose SATD costs conform to formula (2) can constitute a new candidate mode set {N} for the FMD.

Fig. 5
figure 5

The flow of AR based fast mode selection

3.2 MPM joint fast mode selection

In H.265/HEVC, the MPM is critical because of the similarity of the local texture in a frame. Thus, MPMs needs to be checked carefully by the FMD in both H.264/AVC and H.265/HEVC. With the AR based fast mode selection method, the candidate mode set {N} for FMD may not be included the MPM. If the MPMs are discarded, the loss of the coding performance is not negligible. In this paper, an MPM joint fast mode selection is also proposed to further exploit space correlation and preserve the coding performance.

MPMs are the optimal modes of upper PU and left PU around current block. There are three cases for MPMs: {N} contains two MPMs, {N} contains one MPM and {N} contains none of them. The solutions for these cases are shown in Table 1.

Table 1 The solution of MPMs

In Table 1, if both of the MPMs (MPM1 and MPM2) are in set {N}, set {N} is used for the FMD process. If only one MPM belongs to set {N}, we just add the other MPM into set {N} to construct a new set for the FMD. In the case that the both MPMs are not included in set {N}, we adopt the proposed MPM joint fast mode selection method to choose one or two MPMs and add them into set {N} to form a new mode set for FMD.

Now we consider choosing MPMs in the last case. The selection strategy is based on the direction difference between MPM1 and MPM2. If the direction difference between MPM1 and MPM2 is below a threshold (empirically set to 5), it implies that the directions of MPM1 and MPM2 are close. Thus we only chose the MPM with minimum SATD cost in set {N}. Otherwise, it is intractable to evaluate which MPM is better for the current block, and we have to add both of them into set {N}.

3.3 Overall algorithm

The proposed algorithm includes AR based fast mode selection and MPM joint fast mode selection. The flow of the proposed AR based fast mode decision algorithm is shown in Fig. 6. The main steps of the proposed AR based fast mode decision method are summarized as follows:

Fig. 6
figure 6

The overflow of proposed AR based fast mode decision algorithm

  1. Step 1:

    Get a mode set {M} after rough mode decision for current block.

  2. Step 2:

    Acquire the adjusting parameter α j by (3).

  3. Step 3:

    Adopt AR based fast mode selection to get a mode set {N}.

  4. Step 4:

    Decide whether the MPMs are selected for FMD by the MPM joint fast mode selection method.

  5. Step 5:

    Check fine mode set for FMD and determine the optimal mode.

We can observe that the complexity of the proposed AR based fast mode selection and the MPM joint fast mode selection is only a few logical operators. Its complexity is negligible compared with that of the FMD.

4 Experimental results

4.1 Performance evaluation

In order to verify the performance of our proposed AR based fast mode decision method, we implement the proposed method on the H.265/HEVC reference software (HM13.0 with main profile) [10]. There are 18 test sequences recommended in [3] and they are divided into five classes based on the resolution: class A (4 K), class B (1080p), class C (WVGA), class D (QWVGA) and class E. Experiments are carried out for all I-frames sequence. The test platform is 16 GB memory, Intel i7-4770 K with 3.50GHz frequency. The proposed method is evaluated with QPs 22, 27, 32 and 37 for different bit-rates.

In this experiment, coding capability is measured by Bjontegaard delta bit rates (BDBR) (%) [2]. The time efficiency is measured by Average Time Saving (ATS) (%) defined in (6) that represents coding time reduction in percentage.

$$ ATS=\frac{1}{4}{\displaystyle \sum_{i=1}^4\frac{T_{HM13.0}\left(Q{P}_i\right)-{T}_{proposed}\left(Q{P}_i\right)}{T_{HM13.0}\left(Q{P}_i\right)}}\times 100\% $$
(6)

where T HM13.0(QP i ) is the encoding time of HM13.0 and T proposed (QP i ) is that of our proposed method.

The performance of the proposed AR based fast mode decision method in coding capability and time efficiency is shown in Table 2. In this table, positive and negative values represent increment and decrement, respectively.

Table 2 Coding capability and time efficiency of the proposed method

In Table 2, we can see that our proposed algorithm can achieve 32 % encoding time reduction in average for all sequences. This is due to the fact that the number of candidate modes used for the FMD is reduced. Meanwhile, BDBR is only 0.9 % increase, which means the loss of the coding performance is negligible. The main reason is that the discarded modes have a lower possibility to be selected as the optimal mode. We can achieve a decent tradeoff between the coding performance and the complexity reduction.

We also draw RD curves of five classic test sequences: Traffic, Kimono, Basketball Drill, Race Horses and Viryo1. The details of the RD curves are shown in Fig. 7. In Fig. 7, the performance of our proposed algorithm is close to the HM13.0.

Fig. 7
figure 7

RD-curves for five typical test sequences

4.2 Performance comparison

Since HM10.0 is the version after FDIS (Final draft international standard), we only compare with the techniques based on HM10.0 or higher. After investigating the recent published works, we find three works, [7, 8, 16] acted on PU layer for fast mode decision, and [28] provides its PU-only mode fast selection experimental results. They are implemented on HM version 12.0, 12.0, 13.0 and 11.0, respectively. For fairness of the comparison, the proposed technique is also launched in related versions. Both the differences in BDBR increment and encoding time saving of our method are marginal among them. In addition, all the techniques share the same configuration (all-intra main configuration). The comparison between the proposed method and other three methods is shown in Table 3.

Table 3 The comparison between the proposed method and other methods

In Table 3, [16] can save 28 % encoding time with as high as 1.7 % performance loss in BDBR. Although [28] and [7] can save more time than our work, the performance loss of them is up to 1.6 % and 2.44 %, respectively. This performance loss is not negligible in practice. Here, we also find that the time-saving and performance-loss are two trade-offs. To get a fair comparison, we use the (A-TS)/(A-BDBR) to evaluate the performance of the five methods. “A-BDBR” represents the average BDBR and “A-TS” is the average coding time saved. The ratio represents the time saving in per loss of the performance unit (the larger the better), as the metric to evaluate the trade-off. Obviously, our proposed AR based fast mode decision method has a better tradeoff between coding efficiency and encoding time.

In addition, we observe that most of the intra optimization techniques in complexity reduction concentrate on the CU layer. It is particularly worth mentioning that the proposed AR based fast mode decision algorithm is designed for PU layer instead of the CU layer. Thus, the proposed method can be incorporated with CU optimization techniques to obtain further improvements.

4.3 The evaluation for MPM joint fast mode selection

MPMs are critical in intra coding. In order to evaluate the performance of the MPM joint fast mode selection, the statistic experiment based on our proposed AR based fast mode decision method consists of three cases: 1) the MPM with least SATD cost, 2) the other MPM and 3) both of them. The results are shown in Table 4. From Table 4, it is not surprising to find that choosing both of the MPMs achieves the best performance but consumes the highest executing time. Choosing the MPM with least SATD cost can achieve comparable performance with the selection of both MPMs method. But the former one saves around 3 % extra executing time.

Table 4 The experiment for MPMs selection

4.4 Comparison with the least SATD one and the first two methods

We have also compared the results of the proposed method with the fast selection methods directly determined by the least SATD cost one and the first two. The results are shown in Table 5.

Table 5 The comparison with the least SATD ones and the first two

In this table, we can find that our proposed method could achieve close complexity reduction compared with the method which benefits by first two candidates. However, the method of first two candidates with least SATD cost produces larger performance loss compared with our method.

5 Conclusion

This paper presents an AR based fast mode decision method for H.265/HEVC intra coding. Considering that the distribution of the SATD cost partly reflects that of the RD cost, the modes with the smaller SATD cost have a higher probability to be selected as the best one. We utilize an adaptive factor based on AR model to remove some candidate modes. Thus, the number of the candidate modes for the FMD checking is reduced considerably. The MPM joint fast mode selection method is also proposed to further reduce the number of candidate modes for FMD. Standard test conditions for 18 test sequences which belong to Class A, Class B, Class C, Class D and Class E, are adopted to verify the performance of the proposed algorithm. The comprehensive experiments show that the proposed algorithm can save 32 % encoding time with only 0.9 % BDBR increase. It implies that our proposed algorithm can reduce the computational complexity of H.265/HEVC with negligible loss in coding performance. Compared with other works, our proposed algorithm achieves the state-of-art performance in the balance between performance loss and time saving. In addition, because this optimization technique is proposed for PU layer, it can be further incorporated with CU optimization techniques to obtain further improvements.