1 Introduction

Established by the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, H.264/AVC is the present international video coding standard. Compared with previous video coding standard, this standard obtains about half bitrate saving [16]. In H.264/AVC, the maximal size of block is 16 × 16 pixels, which has exhibited fine coding efficiency in the application of videos of 720p and lower resolution. However, the high-definition (HD) video of 1080p and higher resolution has received increased attention recently. Higher resolution means that there will be bigger blocks, where the objectives have the same motion vector. The maximal 16 × 16 block size is too small to be applied in HD video and leads to inefficiency [1, 13]. Therefore, the joint collaborative team on video coding (JCT-VC) aims to develop the next generation video coding standard, high efficiency video coding (HEVC) [2, 4, 15, 17]. HEVC expands its maximal block size to 64 × 64 pixels to enhance the coding efficiency and its aim is to save half bitrate than H.264/AVC. However, it has yielded tremendous computational complexity simultaneously.

Guilherme Corrêa [6] proposed a complexity control method based on a decision algorithm that dynamically adjusts the depth of the Coding Units (CU) defined by quad-tree structures. In their method, the relationship between CU depth and coding complexity is used to selectively constrain the CU depth in order to not exceed a predefined complexity target. Jongho Kim [10] proposed an adaptive coding unit early termination algorithm for HEVC by analyzing the correlation of whether the coding mode is SKIP not only between the neighbouring CU and current one, but also between the current depth of CU and the deeper depth of CUs. Jaehwan Kim [9] checked the differential motion vector (DMV) and the coded block flag (CBF) after searching the optimal Inter 2Nx2N mode, then proposed an SKIP mode early detection algorithm in one CU level. Xiaolin Shen [19] employed important and computational-friendly features to help making a precise and fast selection on CU size based on a Bayesian decision rule, which avoid exhaustive RDO search on all possible CU sizes and its modes. They [18] also proposed a CU splitting early termination algorithm. In this algorithm, CU splitting is modeled as a binary classification problem, on which a support vector machine (SVM) is applied. Guilherme Corrêa [8] found an observation that the maximum coding tree depth remains constant during quite long periods of the video sequence. Based on this observation they proposed an algorithm that maintaining the maximum coding tree depth for a relatively long period and skipping all the remaining tree possibilities tests would incur in a small decrease in terms of RD performance. Kiho Choi [3] Proposed an early TU decision method for HEVC, which can prune a residual quadtree in the early stages based on the number of nonzero DCT coefficients. In [7] the maximum coding tree depth allowed foreach coding tree block is determined during encoding time based on both spatial and temporal correlation. Then a complexity control method for HEVC was proposed by dynamically adjusting quadtree-based data structures depth according to computational or power resources availability. In [20], based on the depth information correlation between spatio-temporal adjacent CTUs and the current CTU, some depths can be adaptively excluded from the depth search process in advance. Kalyan Goswami [11] analyzed correlation information between skip modes in both the temporal and spatial domains. Based on correlation information, a statistical model is developed to predict an early skip for the current CU. Liquan Shen [14] observed two phenomena. The former is that the optimal depth level of a certain treeblock is the same or very close to the depth level of its spatially adjacent blocks due to the high correlation between adjacent blocks. The latter is that the depth information of the co-located treeblock in the previous frame affects the depth level determination process of the current treeblock. Based on the spatial and temporal correlations, they proposed an effective CU size decision method for HEVC.

In order to further reduce the unnecessary partition modes, the temporal and spatial correlation is used to reduce the coding computational complexity in this paper. The temporal correlation of the CU segmentation and partition mode between two adjacent frames is analyzed. Based on the temporal correlation, at most two partition modes are evaluated for each CU to reduce the computational complexity. The spatial correlation of CU segmentation and partition mode between the corresponding located (co-located) CU and its four surrounding CUs are analyzed. Based on the spatial correlation, only the optimal partition mode is evaluated for the deeper depths of CUs of the current CU to reduce the computational complexity. The simulation results show that the proposed algorithm further reduces the coding computational complexity with negligible loss in bitrates and video quality compared to ref.[14].

The outline of the paper is as follows. Section 2 introduces the previous fast inter mode decision method for HEVC. In Section 3, the fast inter-prediction algorithm is proposed. In Section 4 simulation results of the fast inter-mode are presented and discussed. Section 5 concludes the paper.

2 Overview of Previous Fast Inter-prediction in HEVC

Liquan Shen proposed a fast CU size decision algorithm for HM. Their algorithm is composed of a CU depth range decision method and three early termination (ET) algorithms based on motion homogeneity checking, RD cost checking and SKIP mode checking, respectively.

An optimal depth level of a treeblock Depth pred based on the spatial and temporal correlations is defined in their CU depth range decision method:

$$ Dept{h}_{pred}={\displaystyle \sum_{i=0}^{N-1}{w}_i\cdot {\delta}_i} $$
(1)

Where N is the number of the CU depth. δ i is the value of the depth level. w i is the weight value determined based on correlations between the current treeblock and its neighboring treeblocks. The four weights are normalized to have

$$ {\displaystyle \sum_{i=0}^3{w}_i=1} $$
(2)

weights for L, U, col and treeblocks are set to 0.3, and the weight for l-U treeblock is set to 0.1 as shown in Fig. 1.

Fig. 1
figure 1

temporal and spatial correlations of CUs

According to the predicted value of the optimal depth level, each treeblock is divided into five types: G0, G1, G2, G3 and G4. Type G0 to G4 denote the depth ranges of [0,0],[0,1],[0,2],[1, 16] and [1, 13], respectively. That is, the ME computation of the CU depth out of the depth ranges can be skipped to reduce the computational complexity.

In the ET method based on motion homogeneity checking, MVs from the current CU and those from 4 × 4 blocks covered by neighboring CUs are used to evaluate the motion homogeneity. If the motion homogeneity is smaller than a threshold, the current CU is considered with homogeneous motion. In this situation, its motion can be efficiently represented using the current CU size. Therefore, it is not necessary to spit the current CU into four sub-CUs and perform ME on sub-CUs. We can skip the ME on the next depth level.

In the ET method based on RD cost checking, the RD cost based correlation is used to determine the ET threshold. RD costs of spatial neighboring CUs (left CU, upper CU, and left-upper CU) and the co-located CU at the previously coded frame is computed to evaluate the RD cost of the current CU. When RD cost of the current CU size is smaller than the determined threshold, the ME on the next depth level can be skipped.

In the ET method based on SKIP mode checking, the prediction modes from the current CU and the parent CU in the upper depth level are checked. If both of them have SKIP as the best mode, the motion can be efficiently represented using the current CU size. Therefore, the ME on the next depth level can be skipped.

Their overall algorithm incorporates the depth range decision method and the three ET methods. Firstly, the predicted value of the optimal depth level of the current CU is evaluated to decide the depth range. The ME process of the treeblock out of the depth range is skip. Secondly, in the quad-tree CU depth partition process, the three ET conditions are checked. When one of them is satisfied, an early termination algorithm will be triggered. Finally, the unnecessary CU depths of the current LCU can be skipped to reduce the coding complexity.

3 Proposed Fast Inter-prediction algorithm

The aforementioned CU size decision method achieves about 30 % coding time with negligible loss in video quality and bitrate compared to HEVC test model HM2.0. However, there are only four partition modes in HM2.0. In current HM 13.0 [12], there will be eight partition modes for the CU size decision method. In order to further reduce the coding complexity, the temporal similarity between the current CU and the co-located block in the previous frame can be used to skip the unnecessary partition modes in this paper, further reducing the computational complexity. According to the similarity, the partition mode of the current CU depends on the structure of the co-located block in the previous frame. Furthermore, the largest size of CU in HEVC is 64 × 64. If the co-located CU and its four surrounding CUs have a large size and PART_2N × 2 N partition mode, it can be assumed that the region of the co-located CU is smooth or the objects in it have the similar vectors. According to temporal correlation, the region of the current CU may be smooth or the objects in it have the similar vectors, and then some unnecessary partition modes for the deeper depths of CUs can be skipped to reduce the computational complexity. Considering that some objects in the surrounding CUs in the previous frame may move to the current CU in the current frame, the spatial correlation of partition modes between the co-located CU and its surrounding CUs should be analyzed. In order to skip the unnecessary partition modes, the optimal partition mode of the CU in HEVC should be selected. To utilize the temporal and spatial correlation, the temporal similarity of CU segmentation and partition mode between two adjacent frames and the spatial correlation between the co-located CU and its four surrounding CUs should be analyzed. There are three relationships between the current CU and the co-located CU, the former is larger than, equal to or smaller than the latter. Therefore, the temporal and spatial correlation should be analyzed under the three relationships.

3.1 Statistical Analysis of the Optimal Partition Mode for CU

In order to skip the unnecessary PU partition modes in all depths of CU, their probabilities should be counted. Therefore, we counted the probabilities of the partition modes of six sequences with different resolutions and motions in HM13.0. The group of picture (GOP) structure is IPPP and the GOP size is set to 4. The Max Partition Depth (MPD) is set to 4, that is, the CU size is set from 8 × 8 to 64 × 64 pixels. For motion search, fast search and hadamard ME are set on and the search range is set to 64. For quantization, transform skip (TS) is on. For deblock filter, all the options are disabled. Internal bit-depth is set to 10. All the coding tools: Sample adaptive offset (SAO), Adaptive loop filter (ALF), Chroma from Luma intra prediction mode (LMC Chroma), Non-square transforms (NSQT) and Asymmetric motion partitions (AMP) are enabled. Four QP values are used: 22, 27, 32 and 37.

It is found that the partition mode PART_2N × 2 N is the most probable mode for all depths of CUs in all the different sequences. The results are listed in Table 1. Based on this phenomenon, PART_2N × 2 N is selected as the optimal partition mode in our algorithm. On the other hand, another partition mode can be selected according to the similarity of CU segmentations and partition modes between two adjacent frames, which is the main task in our proposed algorithm.

Table 1 Probabilities of partition mode PART_2N × 2 N for different depths of CU

3.2 Analysis of the Three Relationships between the Co-located CU and the Current CU

For the current CU, three relationships between the co-located block and the current CU need to be considered: the size of the current CU is larger than, equal to and smaller than that of the co-located CU.

3.2.1 The Size of the Current CU is Larger than that of the Co-located CU

When the size of the current CU is larger than that of the co-located CU, their relationship can be shown in Fig. 2. In Fig. 2, the size of the current CU is 2 N × 2 N, and the one of each CU in the co-located block is N × N. In addition, not all the CUs in the co-located block have a partition mode of PART_2N × 2 N. In this case, the objects in the co-located block are more likely to be not static or have different motion vectors. Therefore, the current CU is unlikely the leaf node CU and the other partition modes except PART_2N × 2 N for the current CU segmentation may be skipped.

Fig. 2
figure 2

Condition of skipping the current CU segmentation

In order to analyze the relationship between the current CU and the co-located block, two conditions are defined as follows:

$$ Con1=\left\{{L}_{co}<{L}_{cu}\right\} $$
(3)
$$ Con2=\left\{{L}_{co}<{L}_{cu}\right\}\cap \left\{{P}_{cu}= PART\_2 N\times 2 N\right\} $$
(4)

where L co and L cu are side lengths of the CUs in the co-located and the current CU, respectively. P cu is the partition mode of the current CU. Condition Con1 means that the CUs in the co-located block are smaller than the current CU. Condition Con2 means that the CUs in the co-located block are smaller than the current CU and the partition mode of the current CU is PART_2N × 2 N. Then, three probabilities are defined by:

$$ {P}_{i, c}=\frac{N_{i, c}}{N_i} $$
(5)
$$ {P}_{i, e}=\frac{N_{i, e}}{N_i} $$
(6)
$$ {P}_i=\frac{P_{i, e}}{P_{i, c}}=\frac{N_{i, e}}{N_{i, c}} $$
(7)

where N i is the number of the current CUs to be encoded. N i,c is the number of the current CUs when the condition Con1 is satisfied. N i,e is the number of the current CUs when the condition Con2 is satisfied. The variable i denotes the CU depth (i.e. 1, 2 and 3) and c denotes the current CU and e denotes the current CUs with a partition mode PART_2N × 2 N. P i,c denotes the probability that the condition Con1 is satisfied. P i,e denotes the probability that the condition Con2 is satisfied. P i denotes the probability that the partition mode of the current CU is PART_2N × 2 N when the condition Con1 is satisfied. In order to analyze the three probabilities, six sequences with different resolutions and motions are evaluated in HM13.0 and the results are shown in Table 2.

Table 2 Results of P i,c , P i,e and P i under all depths of CU

It can be seen from Table 2 that there are some cases when the current CU is larger than the CUs in the co-located block. Among these cases, most of the current CUs have a partition mode of PART_2N × 2 N. Therefore, a conclusion can be drawn according to the result in Table 2: when the CUs in the co-located block is smaller than the current CU, only partition mode PART_2N × 2 N needs to be evaluated for the current CU and the other partition modes can be skipped to save coding computational complexity.

3.2.2 The size of the current CU is equal to that of the co-located CU

When the size of the current CU is equal to that of the co-located CU, there are two cases need to be analyzed. The first is the similarity between the current CU and the co-located CU, which can be utilized to skip the unnecessary partition modes for the current CU. The second is the condition when the other partition modes except PART_2N × 2 N for the deeper depths of sub-CUs segmentation of the current CU can be skipped.

There is a high degree of similarity between the current CU and its co-located CU, the partition mode of the former will depend on that of the latter. It can be assumed that if the region of the co-located CU and its surrounding CUs is smooth or the objects in it have similar motion vectors, the region of the current CU will have the same characteristics. Considering the motion in time domain, some of the objects in the current CU may have moved from the surrounding CUs in the previous frame. All the objects in current CU are still static or have the same motion vector. Therefore, the regions of the deeper depths of CUs of the current CU will be smooth or the objects in them will have similar motion vectors. Duo to that there are four types of depths for CU in HEVC, the spatial correlation between the co-located CU and its four surrounding CUs should be analyzed at all CU depths. Among the eight partition modes in HEVC, PART_2N × 2 N represents the smoothest region for a CU. Therefore, if the region of the co-located CU and its surrounding CUs is smooth or the objects in it have similar motion vectors, the partition modes of the deeper depths of CUs of the current CU may be PART_2N × 2 N so the other partition modes can be skipped to reduce coding computational complexity, as shown in Fig. 3. In the following contents, the above assumptions will be tested in HM13.0 and the results will be analyzed to construct the algorithm when some condition is satisfied.

Fig. 3
figure 3

Condition of skipping the other partition modes except PART_2N × 2 N of the deeper depths of CUs based on the spatial correlation

In order to describe the smooth degree of the region where the co-located CU and its surrounding CUs are located, a set of the condition when the region is believed to be smooth is defined as follows:

$$ Su{r}_i={\displaystyle \underset{j=0}{\overset{3}{\cap }}\left(\left({L}_j={L}_{co}\right)\cap \left({P}_j= PART\_2 N\times 2 N\right)\right)\cup \left({L}_j>{L}_{co}\right)} $$
(8)

where L j is the side length of one of the four surrounding CUs, j is at a range of 0 to 3, denote the upper, left, right and lower neighboring CUs, respectively. P j are partition modes of the upper, left, right and lower neighboring CUs of the co-located CU. The suffix i is the CU depth and at a range of 0 to 3. Condition Sur i means that the co-located CU is equal to its four surrounding CUs and the partition modes of the surrounding CUs are PART_2N × 2 N, or the former is smaller than the latter.

Based on the condition Sur i , another set of condition is defined as follows:

$$ C1=\left({L}_{cu}<{L}_{co}\right)\cap Su{r}_i $$
(9)

C1 denotes the condition when L cu  < L co and condition Sur i is satisfied.

Then, three proportions are defined by:

$$ {R}_{i, s}=\frac{N_{i, s}}{N_i} $$
(10)
$$ {R}_{i, d}=\frac{N_{i, d}}{N_{i, s}} $$
(11)
$$ {R}_{i, e}=\frac{N_{i, p}}{N_{i, c}} $$
(12)

In (10), N i,s is the number of the co-located CUs when condition Sur i is satisfied and N i is the number of the co-located CUs. The suffix s denotes the condition Sur i is satisfied and i denotes the CU depth. In (11), N i,d is the number of the co-located CUs when condition C1 is satisfied. The suffix d denotes the deeper depths of segmentation. R i,d illustrates the probability that the CUs are smaller than the co-located CU under condition Sur i . In (12), N i,p is the number of the CUs with partition mode of PART_2N × 2 N in the current block under condition C1 and N i,c is the number of all the CUs in the current block under condition C1. R i,e illustrates the probability that the CUs in the current block have a partition mode of PART_2N × 2 N when they are smaller than the co-located CU under condition C1. The suffix e denotes the current CUs with a partition mode PART_2N × 2 N.

In order to illustrate the phenomenon more intuitively, another proportion is defined as follows:

$$ {R}_i={R}_{i, d}\times \left(1-{R}_{i, e}\right) $$
(13)

where R i is the probability that the CUs in the current block are small than the co-located CU and have other partition modes except PART_2N × 2 N when condition C1 is satisfied.

In order to analyze the probabilities R i,s , R i,d , R i.e and R i , six sequences with different resolutions and motions are tested in HM13.0 and the results under all CU depths are shown in Table 3.

Table 3 Results of R i,s , R i,d , R i.e and R i under all CU depths (%)

Table 3 shows that R 0,s , R 1,d and R 2,d are at ranges of 29.35–52.78 % ,28.32–57.62 % and 29.35–71.92 % respectively. These results indicate that the condition Sur i is satisfied frequently. Therefore, if the redundant partition modes under condition Sur i are skipped, the computational complexity will be reduced obviously.

It can be seen from Table 3 that R 0,d is at a range of 5.56–11.96 % yet R 1,d and R 2,d are of 12.78–20.74 % and 21.19–29.80 % respectively. This phenomenon indicates that for CUs with 64 × 64 pixels size, when the condition Sur i is satisfied, there is a probability of only 5.56–11.96 % that the current CU is smaller than the co-located CU. However, for the CUs of 32 × 32 and 16 × 16 pixels sizes, the probabilities are 12.78–20.74 % and 21.19–29.80 %. This phenomenon can be analyzed as follows. Condition Sur0 means that the four surrounding CUs of the co-located CU are all 64 × 64 pixels and have a same partition mode: PART_2N × 2 N. When Sur0 is satisfied, the co-located CU and its surrounding CUs have largest CU size and smooth partition mode. This phenomenon also indicates that the region of the co-located CU is smooth or the objects in the region have similar motion vectors, thus, the region of the current CU will be smoother or the objects in the region will have more similar motion vectors even if some objects in the surrounding CUs in the previous frame have moved to the current CU in the current frame. However, for the other two conditions Sur1 and Sur2, the co-located CU and its surrounding CUs have no the largest CU segmentation, which means the region has more texture or the objects in the co-located CU have more different motion vectors.

It can also be seen from Table 3 that R 0,e is at a range of 90.35–96.04 % yet R 1,e and R 2,e are of 67.21–72.79 % and 52.45–58.53 % respectively. These results indicate that the CUs in the current block have more partition mode of PART_2N × 2 N under 64 × 64 than 32 × 32 and 16 × 16 when condition C1 is satisfied.

It can be seen again from Table 3 that R 0 is at a range of 0.23–1.15 % yet R 1 and R 2 are of 3.66–6.80 % and 7.93–13.80 %, respectively. These results indicate the following phenomenon. When the condition Sur i is satisfied and the size of the co-located CU is 64 × 64 pixels, almost all the partition modes for the CUs in the current block are PART_2N × 2 N when they are smaller than the co-located CU. However, when the size of the co-located CU is 32 × 32 or 16 × 16 pixels, some of them are the other partition modes. Therefore, when the size of the co-located CU is 64 × 64 pixels and condition Sur i is satisfied, only the partition mode PART_2N × 2 N needs to be evaluated for the deeper depth of CUs of the current CU. For the other sizes of the co-located CUs, other partitions should not be skipped.

Based on the above analysis, when the size of the current CU is equal to that of the co-located CU, the flow chart of our algorithm can be drawn in Fig. 4. In addition, its corresponding pseudo code can be shown as follows.

Fig. 4
figure 4

Flow chart of the proposed algorithm when the size of the current CU is equal to that of the co-located CU

If Size of CU == Size of CoCU

Then Bestmode of CU = Bestmode of CoCU

If Size of CU == 64 × 64

Then If (Sizes of surCUs ==64 × 64) and (Bestmode of surCUs == ‘2 N × 2 N’)

Then Bestmodes of SubCUs (32 × 32, 16 × 16, 8 × 8) = ‘2 N × 2 N’

Else split CU

EndIf

Else split CU

EndIf

EndIf

3.2.3 When the size of the current CU is smaller than that of the co-located CU

When condition Sur0 in section 3.2.2 is not satisfied and the size of the current CU is smaller than that of its corresponding CU, the partition mode of the corresponding CU should be taken into consideration, as shown in Fig. 5.

Fig. 5
figure 5

Relation between the current CU and the co-located CU when the former is smaller than the latter

In Fig. 6, K defines the current block, and L, M, N, and O define four probable co-located block. When L gives the position of the co-located block and the partition mode of the co-located CU is PART_nL × 2 N, the most likely partition mode of the current CU is PART_N × 2 N. Similarly, PART_2N × nU in the co-located block will result in PART_2N × N in the current CU (see Fig. 6).

Fig. 6
figure 6

Relation between the partition mode of the co-located CU and that of the current CU when L gives the position of the co-located block: a The partition mode of the corresponding CU is PART_nL × 2 N b The partition mode of the corresponding CU is PART_2N × nU

Based on the above analysis, the flow chart of our algorithm when the current CU is smaller than the co-located CU can be summarized as Fig. 7. In Fig. 7 , L co and L cu are the side lengths of the co-located CU and the current CU, respectively, and P co denotes the partition mode of the co-located CU. Note that the partition mode PART_2N × 2 N is calculated compulsively for the current CU in our algorithm, which is not written in Fig. 7 for simplicity.

Fig. 7
figure 7

Flow chart of our algorithm when Current CU is smaller than the corresponding CU

4 Simulation Results and Analysis

In order to validate the effectiveness of the proposed fast inter-prediction algorithm, two comparisons are implemented. One is between the proposed algorithm and HM13.0, the another is between the proposed one and the algorithm in Refs.[14]. Both of the comparisons are implemented in HM13.0. All of the sequences for HEVC standard are tested. The group of picture (GOP) structure is IPPP and the GOP size is set to 4. Two coding conditions are selected in our simulation: Constraint set1 (CS1) and constraint set2 (CS2). CS1 (random access (RA) main 10) corresponds to a broadcast scenario with a maximum GOP size of 8 and a maximum intra frame period of approximately 1.1 s. CS2 (low-delay (LD) P main 10) corresponds to a low-delay scenario with no picture reordering. The two constraint sets restricting coding parameters and the temporal coding structure are used in all tests. The Max Partition Depth (MPD) is set to 4, that is, the CU size is set from 8 × 8 to 64 × 64 pixels. For motion search, fast search and hadamard ME are set on and the search range is set to 64. For quantization, transform skip (TS) is on. Search mode “EPZS” and “FEN” (fast encoder decision) are enabled. For deblock filter, all the options are disabled. Internal bit-depth is set to 10. All the coding tools: Sample adaptive offset (SAO), Adaptive loop filter (ALF), Chroma from Luma intra prediction mode (LMC Chroma), Non-square transforms (NSQT) and Asymmetric motion partitions (AMP) are enabled. Four QP values are used: 22, 27, 32 and 37.

In addition to BD-rate [5], which denotes the bitrates difference under the same video quality, the coding performance is based on the following three measures:

$$ \varDelta Bitrate=\frac{Bitrat{e}_{pro}- Bitrat{e}_{ref}}{Bitrat{e}_{ref}}\times 100\% $$
(14)
$$ \varDelta PSNR= PSN{R}_{pro}- PSN{R}_{ref} $$
(15)
$$ \varDelta Time=\frac{Tim{e}_{pro}- Tim{e}_{ref}}{Tim{e}_{ref}}\times 100\% $$
(16)

where ∆Bitrate is the bitrate increment, Bitrate pro and Bitrate ref are the bitrates of the proposed fast mode decision algorithm and HM13.0 or that in Refs.[14], respectively. ∆PSNR is the increment in peak signal-to-noise ratio (PSNR), PSNR pro and PSNR ref are the PSNR of the proposed algorithm and HM13.0 or that in Refs.[14], respectively. ∆Time is the simulation time increment, Time pro and Time ref are the simulation time of the proposed algorithm and HM13.0 or that in Refs.[14], respectively.

4.1 Comparison between the Proposed Algorithm and HM13.0

In order to indicate the performance, Table 4 shows comparison results of the proposed algorithm and HM13.0 for only six sequences. The comparison results for the other sequences are similar to them. It can be seen from Table 5 that our proposed algorithm achieves 57.8–73.2 % simulation time saving with a 0.09–1.95 % bitrates increment and a decrease of 0.01–0.09 dB in PSNR, compared to HM13.0. These results have confirmed that the proposed algorithm can skip most of the unnecessary partition modes and choose most of the true optimal partition modes with negligible degradation in coding efficiency and video quality. In the following, we analyze the performance results in different QPs, resolutions and object motions of the different sequences.

Table 4 Comparison of performance results between the proposed inter-prediction algorithm and HM13.0
Table 5 Coding efficiency of the contribution compared to HM13.0

4.1.1 Analysis of ∆Bitrate of comparison one

As shown in Table 4, the proposed algorithm yields less ∆Bitrate yet more simulation time saving in higher QP levels than lower ones. This phenomenon can be analyzed as follows. It can be seen from Table 1 that partition mode PART_2N × 2 N will appear more frequently at higher QP levels than lower ones. The mode PART_2N × 2 N has been selected as the first optimal partition mode in our proposed algorithm. Therefore, our algorithm will select more true partition modes at higher QP levels than lower ones. More true partition modes mean less ∆Bitrate. On the second side, it is well known that there will be less high frequency signal of picture texture in higher QP levels than lower ones. This objective law has led to that there will be higher similarity between two adjacent encoded frames in higher QP levels than lower ones. Note that we refer to two adjacent encoded (reconstructed) frames, not the two original adjacent frames, which have nothing to do with QP. In our algorithm, higher similarity between two adjacent encoded frames will lead to more unnecessary partition modes being skipped, which results in that our algorithm saves more coding time in higher QP levels than lower ones. Table 2 has shown that P i will have larger value in higher QP values than lower ones, which means the partition mode of the current CU have a larger probability to be PART_2N × 2 N in higher QP values than lower ones. Partition mode PART_2N × 2 N has been selected the optimal partition mode in our algorithm. Therefore, larger probability of PART_2N × 2 N will led to more precise mode decision, which means less ∆Bitrate in higher QP values than lower ones. In Table 3, R i has a smaller value in higher QP values than lower ones, which also means that the partition mode of the current CU will have a higher probability to be PART_2N × 2 N in higher QP values than lower ones.

The performance results of our algorithm also depend on the resolutions of the sequences. It can be seen from Table 4 that the proposed algorithm can achieve less ∆Bitrate in higher resolution than lower one. This phenomenon can be analyzed as follows. For the same number of pixels, higher resolution means smaller natural scene area than lower resolution. It is obvious that there will be higher similarity between two adjacent frames in smaller natural scene area than larger one, which leads to less ∆Bitrate in higher resolution than lower one. Tables 1, 2 and 3 have shown that the partition mode of the current CU will have a higher probability to be PART_2N × 2 N in higher resolutions than lower ones, which will result in more precise mode decision and less ∆Bitrate.

The performance results of our algorithm also depend on the object motion directions of the sequences. It can be found in Table 4: the bigger objects, basketball players in sequence BasketballDrive , are moving at high speed, but they have not yielded too much ∆Bitrate. On the contrary, the small water bubbles, which are moving to different directions in sequence Partyscreen, have yielded more ∆Bitrate than other sequences in our algorithm. This result implies that small objects with different motion vectors will worsen the coding efficiency in the proposed fast inter-prediction algorithm. This phenomenon can be explained as follows. When bigger object in the CU in the co-located block moves to other position or the smaller objects move to the same positions, the CU segmentation and partition mode in the co-located block in the previous frame is likely similar to that of the current CU in the current frame. However, when the smaller objects in the CU in the co-located block move to different positions, the CU segmentation and partition mode in the co-located block are not likely similar to those of the current CU.

4.1.2 Analysis of ∆Time

It can be found form Table 4 that the proposed algorithm will save more simulation time in higher QP levels than lower ones. This phenomenon can be analyzed as follows. Table 2 has shown that P i,c will have a larger value in higher QP levels than lower ones, which means that the condition Con1 will appear more frequently in higher QP levels than lower ones. Under condition Con1, the other partition modes except PART_2N × 2 N will be skipped. More frequently appearance of Con1 means more partition modes are skipped and so more simulation time will be saved in higher QP levels than lower ones. Table 3 has shown that R i,s will have a larger value in higher QP levels than lower ones, which means that the condition Sur i will apprear more frequently in higher QP levels than lower ones. Under condition Sur i , the other partition modes except PART_2N × 2 N for the deeper depths of CUs will be skipped. More frequently appearance of Sur i means more partition modes are skipped and so more simulation time will be saved in higher QP levels than lower ones.

It can also be found from Table 4 that the proposed algorithm will save more simulation time in higher resolutions than lower ones. This phenomenon can be analyzed as follows. Table 2 has shown that P i,c has a larger value in higher resolutions than lower ones, which will bring about more simulation time saving in higher resolutions than lower ones. Table 3 has shown that R i,s has a larger value in higher resolutions than lower ones, which will also result in more simulation time saving in higher resolutions than lower ones.

4.1.3 Analysis of BD-rate

In Table 5, Class A-E denote subsets of sequences with different resolutions, including clips of 4 K × 2 K, 1080p, WVGA, WQVGA and 720p, and Class F denotes screen Content Coding sequences. BD-Rate is used to Measure the coding performance. It can be seen that BD-rate for all sequences are average 0.9 % reduction for RA Main10 and 1.0 % for LD P Main10.

4.2 Comparison between the Proposed Algorithm and that in Ref.[14]

Table 6 shows that our proposed algorithm further reduces 42.6–50.0 % coding time with only 0.10–0.66 % loss in bitrates and 0.00–(0.03) dB decrement in PSNR compared to that in Ref.[14].

Table 6 Comparison of performance results between the proposed inter-prediction algorithm and that in Ref.[14]

4.2.1 Analysis of Bitrate and BD-rate

It can be observed from Table 6 that the proposed algorithm yields less bitrate increment yet more simulation time saving at higher QP levels than lower ones and in higher motion videos than motionless ones which is similar to the results in Table 7. However, the bitrates savings for different resolutions is constant in whole. This result indicates that our algorithm is close to Refs.[14] in bitrates saving.

Table 7 Coding efficiency of the contribution compared to Rf.[14]

4.2.2 Analysis of simulation time

It can also be seen from Table 6 that our proposed algorithm achieves more coding time saving at lower QP levels than higher ones, which is in contrary with that in Table 4. This result is caused by the reason that the algorithms achieve less coding time saving at lower QP levels than higher ones in Refs.[14]. This phenomenon implies that our proposed algorithm still can achieve low computational complexity in high quality videos.

It can be seen from Table 7 that BD-rate for all sequences are average 0.4 % reduction for RA Main10 and 0.5 % for LD P Main10.

5 Conclusion

In this paper, a fast inter-prediction for HEVC is proposed. The optimal partition modes in HEVC are selected based on the partition modes distribution of six video sequences with different resolutions and motions in HM13.0. Then, the fast inter-prediction algorithm is analyzed under three relationships between two adjacent frames, respectively. When the current CU is larger than the co-located CU, the other partition modes except PART_2N × 2 N for the current CU are skipped based on the statistical results. When they have the same size, the similarity of CU segmentation and PU partition mode between two adjacent frames is counted and a condition when the other partition modes except PART_2N × 2 N can be skipped is presented, under these analyses, the unnecessary partition modes for the current CU and its deeper depths of CUs are skipped. When the current CU is small than the co-located CU, a group of conditions are proposed, under which at most two partition modes are evaluated for the current CU to save computational complexity. The simulation results show that this algorithm can achieve a reduction of 65 % encoding time with negligible loss in bitrate and PSNR, compared to HM13.0.